I'm dealing with slightly different types, so to make clear what I'm trying to achieve I have decided to use a metaphor.
Let's say you need to create tables that describe projects by two architectural bureaus:
1st only deals with 3D plans
2nd only deals with 2D sketches
I have the following table:
mysql> describe sketch;
+------------------+-------------------------------+------+-----+-------------------+
| Field | Type | Null | Key | Default |
+------------------+-------------------------------+------+-----+-------------------+
| project_id | binary(16) | NO | PRI | NULL |
| company_id | binary(16) | NO | PRI | NULL |
| type | enum('2D','3D','N/A') | YES | |'N/A' |
+------------------+-------------------------------+------+-----+-------------------+
As you can see project_id & company_id form the PRIMARY KEY
The issue arises when, in some exceptional circumstances, the same company takes on both a 2D and a 3D task under the same project ID.
Or the same company starts working on two or more sub-projects of the same type (e.g. both are 2D sketches) but within the realm of let's call it parent project with exactly the same ID.
One quick and dirty fix would be simply to add a unique ID to the above table, but it wouldn't work for me, because there are various reports and other functions which basically do this: SELECT blah FROM sketch WHERE project_id=XXX AND company_id
I could add code to filter the results from the above SQL, but I can't really change the structure of the table.
Any ideas what options I have?
Appreciate any ideas!
And thank you very much beforehand!
As you describe the problem, company/project is not a primary key. You describe circumstances where uniqueness is violated.
Then company/project/type does seem to be a unique key and a candidate primary key. I would say that you should have a numeric primary key and declare the tripartite key as unique.
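A minimal sketch of that suggestion, using Python's built-in sqlite3 as a stand-in for MySQL. The surrogate id column, the NOT NULL on type, and the sample byte values are assumptions made for illustration. It covers only the 2D-plus-3D case; two sub-projects of the same type under one project would still collide on the unique key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sketch (
        id         INTEGER PRIMARY KEY,  -- surrogate key (AUTO_INCREMENT in MySQL)
        project_id BLOB NOT NULL,        -- binary(16) in the original table
        company_id BLOB NOT NULL,
        type       TEXT NOT NULL DEFAULT 'N/A'
                   CHECK (type IN ('2D', '3D', 'N/A')),
        UNIQUE (project_id, company_id, type)
    )
""")

# Made-up 16-byte identifiers, for illustration only.
project, company = b"P" * 16, b"C" * 16
insert = "INSERT INTO sketch (project_id, company_id, type) VALUES (?, ?, ?)"

# The same company can now hold a 2D and a 3D row under one project.
conn.execute(insert, (project, company, "2D"))
conn.execute(insert, (project, company, "3D"))

# A second 2D row for the same project/company violates the unique key.
try:
    conn.execute(insert, (project, company, "2D"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Existing reports that filter on project_id and company_id keep working,
# but may now get more than one row back.
rows = conn.execute(
    "SELECT type FROM sketch WHERE project_id = ? AND company_id = ? ORDER BY type",
    (project, company),
).fetchall()
```
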
I'm playing around with MySQL at the moment, learning about database design, and wondered about something I couldn't find an answer to on Google.
Imagine a table named 'products' with the primary key 'id' and two additional columns named 'name' and 'primary_image_id', where 'primary_image_id' is a foreign key linking to a second table.
The second table is named 'product_images' also with the primary key 'id' and two additional columns this time called 'path' (path to the image) and 'product_id'. 'product_id' is of course a foreign key linking back to the first table.
+----+-----------+------------------+
| id | name | primary_image_id |
+----+-----------+------------------+
| 1 | product_A | 3 |
+----+-----------+------------------+
| 2 | product_B | 6 |
+----+-----------+------------------+
+----+-----------+------------------+
| id | path | product_id |
+----+-----------+------------------+
| 1 | /image_01 | 2 |
+----+-----------+------------------+
| 2 | /image_02 | 1 |
+----+-----------+------------------+
| 3 | /image_03 | 1 |
+----+-----------+------------------+
| 4 | /image_04 | 1 |
+----+-----------+------------------+
| 5 | /image_05 | 2 |
+----+-----------+------------------+
| 6 | /image_06 | 2 |
+----+-----------+------------------+
The idea is to have a table with all product images while only one image per product is the preview image (primary image). Is this type of foreign key linking even possible? And if yes, is it good database design or should I use another method?
Thank you in advance!
This is a valid use case and the table design looks good if your intention is to just read data using foreign key like "Get all image paths for product id 1" or "Get primary image of product id 1" or "Get paths of all primary images".
People tend to avoid cyclic foreign key references between tables, especially if there is a cascade dependency on delete/update events. You need to answer questions like "What should happen to images 2, 3, 4 if product 1 is deleted?" or "What should happen to product 1 if image 3 is deleted?".
The answers will help you come up with a design that fulfills your requirements.
Just use indexes without FOREIGN KEYs.
A more typical approach would be to move the primary flag to the images table. Both of these approaches have the potential for illogical data:
Your way would allow product 1 to name image A as its primary while image A could identify product 2 as its product.
My way would allow products to have 0 or 2+ primary images if the flag wasn’t well-managed.
Depending on how worried you are about either inconsistency, you could try to manage it via triggers or constraints, although MySQL is a little lacking in these areas compared to other DBMSs.
One way to absolutely prevent a problem would be to keep the primary flag in the images table, but store it as an int (a rank) rather than a Boolean, with the convention that the image with the minimum rank is the “primary”. Create a unique index on the combination of (product ID, rank), and access this data via a stored procedure or view that implements the rank convention for you, e.g. select * from images a where product_id = whatever and not exists (select 1 from images b where a.product_id = b.product_id and a.rank > b.rank).
Seems like overkill, but you need to be the judge how important potential data integrity issues are for your application.
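The rank convention can be sketched as follows, using Python's built-in sqlite3 as a stand-in for MySQL. The column name sort_rank (chosen because RANK is a reserved word in MySQL 8.0), the view name primary_images, and the rank values assigned to the sample images are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_images (
        id         INTEGER PRIMARY KEY,
        path       TEXT    NOT NULL,
        product_id INTEGER NOT NULL,
        sort_rank  INTEGER NOT NULL,      -- lowest value per product = primary image
        UNIQUE (product_id, sort_rank)    -- no two images of a product share a rank
    );

    INSERT INTO product_images (id, path, product_id, sort_rank) VALUES
        (1, '/image_01', 2, 2),
        (2, '/image_02', 1, 2),
        (3, '/image_03', 1, 1),
        (4, '/image_04', 1, 3),
        (5, '/image_05', 2, 3),
        (6, '/image_06', 2, 1);

    -- The view hides the convention: an image is primary when no other
    -- image of the same product has a lower rank.
    CREATE VIEW primary_images AS
        SELECT a.id, a.path, a.product_id
        FROM product_images a
        WHERE NOT EXISTS (
            SELECT 1
            FROM product_images b
            WHERE b.product_id = a.product_id
              AND b.sort_rank < a.sort_rank
        );
""")

primary = conn.execute(
    "SELECT product_id, id FROM primary_images ORDER BY product_id"
).fetchall()
```

With this layout a product that has at least one image always has exactly one primary image, and an image can never be the primary of a product it does not belong to.
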
I have 2 tables in my MySQL database.
Let's call 1st main, 2nd final.
TABLE `main` has the structure | TABLE `final` has the structure
|
`id` --> PRIMARY KEY (Auto Increment) | `id` --> PRIMARY KEY (Auto Increment)
| `id_main` --> ?? (Need help here)
|
id | name | info | id | id_main | name | info(changed)
--------------------- | ---------------------------------------
1 | Peter | 5,9 | 1 | 2 | Butters | 0.3,34
2 | Butters | 3,3 | 2 | 4 | Stewie | 1.2,4.4
3 | Stan | 2,96 | 3 | 1 | Peter | 5.7,0.9
4 | Stewie | 1,84 | 4 | 3 | Stan | 4.8,0.74
After analysing data in main the results get put into final.
As you can see final has an extra column (id_main) which points back to main.id
In actuality these 2 tables are 100 million+ rows each, my problem arises while performing SQL queries.
How should final (especially id & id_main) be configured so that querying from main to final is the fastest?
Can I do away with final.id (PRIMARY KEY, Auto Increment) & keep
final.id_main (As an UNIQUE Index?)
OR
Should I keep id AS PRIMARY KEY (AI) & final.id_main AS UNIQUE Index?
I would be making calls like:
int id_From_Main= 10000;
SELECT `id_main` FROM `final` WHERE `id`='"+id_From_Main+"'
If there's a 1:1 relation between those tables, I don't see any reason why they would need two separate auto-incremented primary keys.
I would remove the final.id column and have the final.id_main as a non-auto-incremented primary key and a foreign key to the main.id column.
In general, you can also have a table without a primary key at all. It depends on if you want to be able to select specific individual rows or not.
I don't understand your query SELECT id_main FROM final WHERE id = '"+id_From_Main+"' — you're trying to select the value of ID from main by ID from main. What's the purpose, why are you trying to get the value you already have?
Anyway, you're not providing enough information to give you a qualified answer. You have to optimize your data structures according to the queries you'll be running.
Make sure you have indexes on the columns you use in the WHERE clause. If you're selecting by final.id_main, have an index on that column. If you're selecting by final.id_main and final.name, have a composite index on both columns, etc.
Do you really need to have the name column in both tables? It's a bad database design, unless it's some performance optimization (to avoid a join).
So, you should:
collect all queries you're currently using, set proper indexes according to them
remove any unnecessary columns (e.g. final.id, final.name)
use the EXPLAIN on your queries to get execution information (you can also use the Explain analyzer to help you interpret the results)
you can try query profiling
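The layout suggested in this answer (final.id_main as a non-auto-incremented primary key that is also a foreign key to main.id, with the redundant id and name columns dropped) might look like the sketch below. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE main (
        id   INTEGER PRIMARY KEY,         -- AUTO_INCREMENT in MySQL
        name TEXT NOT NULL,
        info TEXT NOT NULL
    );

    -- 1:1 with main: the primary key doubles as the foreign key,
    -- so a lookup by main.id hits the primary key index directly.
    CREATE TABLE final (
        id_main INTEGER PRIMARY KEY REFERENCES main (id),
        info    TEXT NOT NULL
    );

    INSERT INTO main (id, name, info) VALUES
        (1, 'Peter', '5,9'), (2, 'Butters', '3,3'),
        (3, 'Stan', '2,96'), (4, 'Stewie', '1,84');

    INSERT INTO final (id_main, info) VALUES
        (2, '0.3,34'), (4, '1.2,4.4'), (1, '5.7,0.9'), (3, '4.8,0.74');
""")

# Querying from main to final by the shared key; name comes from main via a join.
row = conn.execute(
    "SELECT m.name, f.info FROM main m JOIN final f ON f.id_main = m.id WHERE m.id = ?",
    (2,),
).fetchone()

# A row in final that points at a missing main.id is rejected.
try:
    conn.execute("INSERT INTO final (id_main, info) VALUES (99, '0,0')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```
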
In MySQL, an AUTO_INCREMENT column must be defined as a key, so if you keep id, keep it as the PK. Define id_main as UNIQUE.
I have mysql table, which has structure
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| content | longtext | NO | | NULL | |
| valid | tinyint(1) | NO | | NULL | |
| created_at | timestamp | YES | | NULL | |
| updated_at | timestamp | YES | | NULL | |
+------------+------------------+------+-----+---------+----------------+
I need to remove duplicate entries by the content column. Everything would be easy if it weren't longtext; the main issue is that entries in that column vary in length from 1 character to 12,000 characters or more, and I have over 4,000,000 entries. A simple query like select id from table where content like "%stackoverflow%"; takes 15s to execute. What would be the best approach to remove duplicate entries without waiting 2 days for the query to execute?
md5 is your friend here. Make a separate hashvalues table (to avoid locking/contention with this table in production) with columns for the id and hash. The primary key for this table should lead with the hash column rather than id, e.g. (hash, id); a key on hash alone would reject the very duplicates you are trying to find.
Once the new empty table is created, use MySQL's md5() function to populate the new table from your original data, with the original id and the md5(content) for the field values. If necessary you can even populate the table in batches, if it would take too long or slow things down too much to do it all at once.
When the new table is fully populated with data, you can JOIN it to itself like this:
SELECT h1.*
FROM hashvalues h1
INNER JOIN hashvalues h2 on h1.hash = h2.hash and h1.id <> h2.id
This should be MUCH faster than comparing the content directly, since the database only has to compare pre-computed hash values. I'd expect it to run almost instantly. It will tell you which records are potential duplicates. There is still a potential for hash collisions, so you also need to compare this back to the original data to be sure, or include an originalcontent column in the new table you can use with the query above. That done, you will know which records to remove.
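The whole procedure can be sketched end to end with Python's built-in sqlite3 standing in for MySQL. SQLite has no md5() function, so one is registered from hashlib here; the entries table name and its sample rows are made up, and the hashvalues key is (hash, id) so that equal hashes can coexist. The query keeps the lowest id of each duplicate group and confirms matches against the real content to rule out collisions.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for MySQL's built-in MD5(): hex digest of the UTF-8 text.
conn.create_function(
    "md5", 1, lambda text: hashlib.md5(text.encode("utf-8")).hexdigest()
)

conn.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
    INSERT INTO entries (id, content) VALUES
        (1, 'alpha'), (2, 'beta'), (3, 'alpha'), (4, 'gamma'), (5, 'alpha');

    -- Key leads with hash so the self-join can use it.
    CREATE TABLE hashvalues (
        hash TEXT    NOT NULL,
        id   INTEGER NOT NULL,
        PRIMARY KEY (hash, id)
    );
    INSERT INTO hashvalues (hash, id)
        SELECT md5(content), id FROM entries;
""")

# Candidate duplicates share a hash; h1.id < h2.id keeps the oldest row.
# The final content comparison guards against hash collisions.
doomed = [
    row[0]
    for row in conn.execute("""
        SELECT DISTINCT h2.id
        FROM hashvalues h1
        JOIN hashvalues h2 ON h1.hash = h2.hash AND h1.id < h2.id
        JOIN entries e1 ON e1.id = h1.id
        JOIN entries e2 ON e2.id = h2.id
        WHERE e1.content = e2.content
        ORDER BY h2.id
    """)
]

conn.executemany("DELETE FROM entries WHERE id = ?", [(i,) for i in doomed])
remaining = conn.execute("SELECT id, content FROM entries ORDER BY id").fetchall()
```

On a table with millions of rows the insert into hashvalues and the final delete would presumably be run in batches, as the answer suggests.
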
This system can be even better if you can add a column to the original table to keep the md5() hash of your content field up to date every time it changes. A Generated Column will work well for this if you have the right storage engine. Otherwise, you can use a trigger. This column will allow you to re-run your duplicates check as needed, without all the extra work with the separate table.
Finally, there are also Sha(), Sha1(), and Sha2() functions that might be more collision-resistant. However, the md5() will be much faster and the additional collision resistance isn't enough to avoid the need for also comparing the original data. This also isn't a security situation where collision potential will matter, and so md5() is the better choice here. These aren't passwords, after all.
I'm having a difficult time figuring out what to use for my primary key.
My table:
| gender | age   | value | date updated | page id (the foreign key) |
| M      | 15-24 | 100   | some date    | 1                         |
| M      | 25-34 | 120   | some date    | 1                         |
| M      | 35-44 | 110   | some date    | 1                         |
| F      | 15-24 | 190   | some date    | 1                         |
| F      | 25-34 | 230   | some date    | 1                         |
Now I need to add a primary key. I could add an id field with auto increment and make that the PK, but that id will not be used as a foreign key or anything else in another table, so it would be kind of useless to add it.
I could also combine the page, gender and age and make them the primary key but I am not sure what the advantage on that would be. I tried googling for a while but still not sure what to do.
Please read the documentation of MySQL:
The primary key for a table represents the column or set of columns
that you use in your most vital queries. It has an associated index,
for fast query performance. Query performance benefits from the NOT
NULL optimization, because it cannot include any NULL values. With the
InnoDB storage engine, the table data is physically organized to do
ultra-fast lookups and sorts based on the primary key column or
columns.
If your table is big and important, but does not have an obvious
column or set of columns to use as a primary key, you might create a
separate column with auto-increment values to use as the primary key.
These unique IDs can serve as pointers to corresponding rows in other
tables when you join tables using foreign keys.
Thanks to @AaronDigulla for his explanation:
Necessary? No. Used behind the scenes? Well, it's saved to disk and
kept in the row cache, etc. Removing will slightly increase your
performance (use a watch with millisecond precision to notice).
But ... the next time someone needs to create references to this
table, they will curse you. If they are brave, they will add a PK (and
wait for a long time for the DB to create the column). If they are not
brave or dumb, they will start creating references using the business
key (i.e. the data columns) which will cause a maintenance nightmare.
Conclusion: Since the cost of having a PK (even if it's not used ATM)
is so small, let it be.
From my experience and knowledge, if you do not define a primary key, the database will create a hidden one. So in your situation the best solution is to create it anyway.
I don't think that using an auto increment key, or using page id, gender and age as a composite primary key, would significantly change performance.
Anyway, a primary key on page id, gender and age should be a nice choice, as it also prevents duplicate entries (you can't repeat the same combination of values in other records) and makes the table structure clearer.
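A small sketch of the composite-key option, using Python's built-in sqlite3 as a stand-in for MySQL. The table name page_stats and the column types are assumptions, and page_id is included in the key because the question mentions combining page, gender and age.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_stats (              -- hypothetical table name
        page_id      INTEGER NOT NULL,     -- foreign key to the pages table
        gender       TEXT    NOT NULL,
        age          TEXT    NOT NULL,     -- age bracket such as '15-24'
        value        INTEGER NOT NULL,
        date_updated TEXT,
        PRIMARY KEY (page_id, gender, age)
    )
""")

insert = "INSERT INTO page_stats (page_id, gender, age, value) VALUES (?, ?, ?, ?)"
conn.executemany(insert, [
    (1, "M", "15-24", 100),
    (1, "M", "25-34", 120),
    (1, "M", "35-44", 110),
    (1, "F", "15-24", 190),
    (1, "F", "25-34", 230),
])

# The composite key rejects a second row for the same page/gender/age.
try:
    conn.execute(insert, (1, "M", "15-24", 999))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The same gender/age bracket on a different page is still allowed.
conn.execute(insert, (2, "M", "15-24", 80))
count = conn.execute("SELECT COUNT(*) FROM page_stats").fetchone()[0]
```
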
I am not totally sure I am naming this right, but please bear with me.
I am wondering if it is possible to do something like this in SQL (MySQL specifically):
Let's say we have tree-like data that is persisted in the database in the following table:
mysql> desc data_table;
+------------------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+---------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| parent_id | int(10) unsigned | YES | MUL | NULL | |
| value | text | YES | | NULL | |
So each row has a parent, except for the 'root' row and each row has children except for leaf rows.
Is it possible to find all descendants of any given row utilizing solely SQL?
It's possible to fetch all descendants utilizing solely SQL, but not in a single query. But I'm sure you figured that out; I assume you mean you want to do it in a single query.
You might be interested in reading about some alternative designs to store tree structures, that do enable you to fetch all descendants using a single SQL query. See my presentation Models for Hierarchical Data with SQL and PHP.
You can also use recursive SQL queries with other brands of database (e.g. PostgreSQL). MySQL did not support this feature before version 8.0, which added recursive common table expressions (WITH RECURSIVE).
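For databases that do support recursive queries, fetching all descendants in one statement can be sketched as below. Python's built-in sqlite3 stands in here; the same WITH RECURSIVE syntax is accepted by PostgreSQL and by MySQL from 8.0. The sample tree and the helper name descendants are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_table (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES data_table (id),  -- NULL for the root row
        value     TEXT
    );
    INSERT INTO data_table (id, parent_id, value) VALUES
        (1, NULL, 'root'),
        (2, 1,    'a'),
        (3, 1,    'b'),
        (4, 2,    'a1'),
        (5, 4,    'a1x'),
        (6, 3,    'b1');
""")


def descendants(conn, node_id):
    """Return the ids of every row below node_id, at any depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree (id) AS (
            -- anchor: direct children of the starting row
            SELECT id FROM data_table WHERE parent_id = ?
            UNION ALL
            -- recursive step: children of rows already found
            SELECT d.id
            FROM data_table d
            JOIN subtree s ON d.parent_id = s.id
        )
        SELECT id FROM subtree ORDER BY id
        """,
        (node_id,),
    )
    return [row[0] for row in rows]


below_a = descendants(conn, 2)
below_root = descendants(conn, 1)
below_leaf = descendants(conn, 5)
```
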
You're probably better off with the nested set model instead (see http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/ - further down). It's far more efficient for selects and you can get the complete path to each node with a simple self join.
However, in practice it is a good idea to pre-cache path and depth if you want to do things like " where depth = 3" or want to show the complete path for multiple nodes if you have more than 1000 records in your table.
I was just asking myself the same question.
This is what I found on Google:
http://moinne.com/blog/ronald/mysql/manage-hierarchical-data-with-mysql-stored-procedures
It works with stored procedures.
But so much logic in the DB is not a good thing, in my opinion.