Do I need a Primary Key If I'm using 1 to Many Relationship? - mysql

I have a table called branch
It looks something like.
+-----------+-------------+
| branch_id | branch_name |
+-----------+-------------+
| 1         | TestBranch1 |
| 2         | TestBranch2 |
+-----------+-------------+
I've set the branch_id as primary key.
Now my question is related to the next table called item
It looks like this.
+-----------+---------+--------------------------+
| branch_id | item_id | item_name                |
+-----------+---------+--------------------------+
| 1         | 1       | Apple                    |
| 1         | 2       | Ball                     |
| 2         | 1       | Totally Difference Apple |
| 2         | 2       | Apple Apple 2            |
+-----------+---------+--------------------------+
I'd like to know if I need to create a primary key for my item table?
UPDATE
They do not share the same items. Sorry for the confusion. A branch can create a product that doesn't exist in the other branch. They are like two stores sharing the same database.
UPDATE
Sorry for the incomplete information.
These tables are actually from two local databases.
I'm trying to create a database that can exist on its own but would still have no problem when merged with another. The system would just append all the item data from another branch without mixing them up. The branches don't take the item_id values of the other branches into consideration when generating a unique id for their items. All the databases, however, may share the same branch table as a reference.
Thank you guys in advance.

I'd like to know if I need to create a primary key for my item table?
You always [1] need a key [2], whether the table is involved in a relationship [3] or not. The only question is what kind of key?
Here are your options in this case:
Make {item_id} alone a key. This makes the relationship "non-identifying" and item a "strong" entity...
Which produces a slimmer key (compared to the second option), therefore any child tables that may reference it are slimmer.
Any ON UPDATE CASCADE actions are cut-off at the level of the item and not propagated to children.
May play better with ORMs.
Make a composite [4] key on {branch_id, item_id}. This makes the relationship "identifying" and item a "weak" entity...
Which makes item itself slimmer (one less index).
Which may be very useful for clustering.
May help you avoid a JOIN in some cases (if there are child tables, branch_id is propagated to them).
May be necessary for correctly modelling "diamond-shaped" dependencies.
So pick your poison ;)
Of course, branch_id is a foreign key (but not a key on its own) in both cases.
And orthogonal to all that, if item_name has to be unique per-branch (as opposed to per whole table), you need a composite key on {branch_id, item_name} as well.
[1] From the logical perspective, you always need a key, otherwise your table would be a multiset, therefore not a relation (which is a set), therefore your database would no longer be "relational". From the physical perspective, there may be some special cases for breaking this rule, but they are rare.
[2] Whether it's primary or not is immaterial from the logical standpoint, although it may be important if the DBMS ascribes a special meaning to it, as is the case with InnoDB, which uses the primary key as the clustering key.
[3] Please make a distinction between "relation" and "relationship".
[4] Aka. "compound".
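The second option (composite key) can be tried out with a minimal sketch like the one below. It is not the answerer's code: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the helper try_insert and the extra rows ("Another Apple", "Cherry", branch 9) are made up for illustration, and the key declarations are plain SQL that should carry over to MySQL with at most small type changes.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption made so the sketch is runnable).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript("""
CREATE TABLE branch (
    branch_id   INTEGER PRIMARY KEY,
    branch_name TEXT NOT NULL
);
-- Composite key: an item is identified only within its branch.
CREATE TABLE item (
    branch_id INTEGER NOT NULL REFERENCES branch(branch_id),
    item_id   INTEGER NOT NULL,
    item_name TEXT NOT NULL,
    PRIMARY KEY (branch_id, item_id)
);
""")
conn.executemany("INSERT INTO branch VALUES (?, ?)",
                 [(1, "TestBranch1"), (2, "TestBranch2")])
conn.executemany("INSERT INTO item VALUES (?, ?, ?)", [
    (1, 1, "Apple"), (1, 2, "Ball"),
    (2, 1, "Totally Difference Apple"), (2, 2, "Apple Apple 2"),
])

def try_insert(row):
    """Hypothetical helper: True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO item VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = try_insert((1, 1, "Another Apple"))  # repeats key (1, 1)
new_item_accepted = try_insert((2, 3, "Cherry"))          # new id within branch 2
orphan_accepted = try_insert((9, 1, "No such branch"))    # branch 9 does not exist
```

The same item_id may appear in both branches, which is what lets one branch's rows be appended to another database without renumbering. Only a repeated (branch_id, item_id) pair or an unknown branch_id is rejected.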

According to your example data you are using n to m relations and not 1 to m. It should be like this
item table
----------
item_id | item_name
1 | Apple
2 | Ball
branch_item table
-----------------
item_id | branch_id
1 | 1
1 | 2
2 | 1
2 | 2
And your branch_item table should have a compound unique key containing branch_id and item_id to make sure no duplicate entries can be added.

Yes you do. The Primary key is what allows the many to one relationship to exist.
This requirement is already catered for by the branch_id column.
The item_id column is not required for the one-to-many relationship in your example.

Related

Is it possible to have a table linking to a table that is linking back to the first table using foreign keys?

I'm playing around with MySQL at the moment, learning about database design, and wondered about something I couldn't find an answer to on Google.
Imagine a table named 'products' with the primary key 'id' and two additional columns named 'name' and 'primary_image_id', where 'primary_image_id' is a foreign key linking to a second table.
The second table is named 'product_images' also with the primary key 'id' and two additional columns this time called 'path' (path to the image) and 'product_id'. 'product_id' is of course a foreign key linking back to the first table.
+----+-----------+------------------+
| id | name | primary_image_id |
+----+-----------+------------------+
| 1 | product_A | 3 |
+----+-----------+------------------+
| 2 | product_B | 6 |
+----+-----------+------------------+
+----+-----------+------------------+
| id | path | product_id |
+----+-----------+------------------+
| 1 | /image_01 | 2 |
+----+-----------+------------------+
| 2 | /image_02 | 1 |
+----+-----------+------------------+
| 3 | /image_03 | 1 |
+----+-----------+------------------+
| 4 | /image_04 | 1 |
+----+-----------+------------------+
| 5 | /image_05 | 2 |
+----+-----------+------------------+
| 6 | /image_06 | 2 |
+----+-----------+------------------+
The idea is to have a table with all product images while only one image per product is the preview image (primary image). Is this type of foreign key linking even possible? And if yes, is it good database design or should I use another method?
Thank you in advance!
This is a valid use case and the table design looks good if your intention is to just read data using foreign key like "Get all image paths for product id 1" or "Get primary image of product id 1" or "Get paths of all primary images".
People tend to avoid cycles of foreign key references between tables, especially if there is a cascade dependency on delete/update events. You need to answer questions like "What should happen to images 2, 3 and 4 if product 1 is deleted?" or "What should happen to product 1 if image 3 is deleted?".
The answers will help you come up with a design that fulfills your requirements.
Just use indexes without FOREIGN KEYs.
A more typical approach would be to move the primary flag to the images table. Both of these approaches have the potential for illogical data:
Your way would allow product 1 to name image A as its primary while image A could identify product 2 as its product.
My way would allow products to have 0 or 2+ primary images if the flag wasn’t well-managed.
Depending on how worried you are about either inconsistency, you could try to manage it via triggers or constraints, although MySQL is a little lacking in these areas compared to other DBMSs.
One way to absolutely prevent a problem would be to keep the primary flag in the images table, but store it as an int (rank) rather than a Boolean, with the convention that the minimum rank is the “primary”. Create a unique index on the combination of (product ID, rank), and access this data via a stored proc or view that implements the rank convention for you, e.g. select * from images a where product_id = whatever and not exists (select 1 from images b where a.product_id = b.product_id and a.rank > b.rank). Note that rank is a reserved word in MySQL 8.0, so the column would need backticks or a different name there.
Seems like overkill, but you need to be the judge how important potential data integrity issues are for your application.
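The rank convention described above could look roughly like this. It is only a sketch under assumptions: Python's built-in sqlite3 stands in for MySQL, the column name img_rank and the function primary_image are invented for the example, and the rank values are chosen so that the primary images match the ones in the question (image 3 for product 1, image 6 for product 2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used as a stand-in for MySQL
conn.executescript("""
CREATE TABLE product_images (
    id         INTEGER PRIMARY KEY,
    path       TEXT NOT NULL,
    product_id INTEGER NOT NULL,
    img_rank   INTEGER NOT NULL,      -- hypothetical column: lowest value = primary image
    UNIQUE (product_id, img_rank)     -- no two images of a product share a rank
);
""")
conn.executemany(
    "INSERT INTO product_images (id, path, product_id, img_rank) VALUES (?, ?, ?, ?)",
    [(1, "/image_01", 2, 3), (2, "/image_02", 1, 2), (3, "/image_03", 1, 1),
     (4, "/image_04", 1, 3), (5, "/image_05", 2, 2), (6, "/image_06", 2, 1)],
)

def primary_image(product_id):
    """Return the path of the lowest-ranked image of a product, or None if it has no images."""
    row = conn.execute("""
        SELECT a.path
        FROM product_images a
        WHERE a.product_id = ?
          AND NOT EXISTS (SELECT 1
                          FROM product_images b
                          WHERE b.product_id = a.product_id
                            AND b.img_rank < a.img_rank)
    """, (product_id,)).fetchone()
    return row[0] if row else None
```

With this layout a product with at least one image always has exactly one primary image, because the unique index rules out ties for the minimum rank. Deleting the current primary image simply promotes the next-lowest rank.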

MySQL Database Table optimization for FASTER Querying & Performance

I have 2 tables in a my MySQL Database.
Let's call 1st main, 2nd final.
TABLE `main` has the structure | TABLE `final` has the structure
|
`id` --> PRIMARY KEY (Auto Increment) | `id` --> PRIMARY KEY (Auto Increment)
| `id_main` --> ?? (Need help here)
|
id | name | info | id | id_main | name | info(changed)
--------------------- | ---------------------------------------
1 | Peter | 5,9 | 1 | 2 | Butters | 0.3,34
2 | Butters | 3,3 | 2 | 4 | Stewie | 1.2,4.4
3 | Stan | 2,96 | 3 | 1 | Peter | 5.7,0.9
4 | Stewie | 1,84 | 4 | 3 | Stan | 4.8,0.74
After analysing data in main the results get put into final.
As you can see final has an extra column (id_main) which points back to main.id
In actuality these 2 tables are 100 million+ rows each, my problem arises while performing SQL queries.
How should final especially (id & id_main) be configured so that Querying from main to final is the fastest.
Can I do away with final.id (PRIMARY KEY, Auto Increment) & keep
final.id_main (As an UNIQUE Index?)
OR
Should I keep id AS PRIMARY KEY (AI) & final.id_main AS UNIQUE Index?
I would be making calls like:
int id_From_Main= 10000;
SELECT `id_main` FROM `final` WHERE `id`='"+id_From_Main+"'
If there's a 1:1 relation between those tables, I don't see any reason why they would need two separate auto-incremented primary keys.
I would remove the final.id column and have the final.id_main as a non-auto-incremented primary key and a foreign key to the main.id column.
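As a rough illustration of that layout (not a tuned production schema), the sketch below uses Python's built-in sqlite3 in place of MySQL and the sample rows from the question. The helper name lookup is made up, and the name column is left out of final on the assumption that it can be read from main through the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used as a stand-in for MySQL
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE main (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    info TEXT NOT NULL
);
-- final has no id of its own: id_main is both its primary key and
-- a foreign key to main.id, which models the 1:1 relation.
CREATE TABLE final (
    id_main INTEGER PRIMARY KEY REFERENCES main(id),
    info    TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO main (id, name, info) VALUES (?, ?, ?)",
                 [(1, "Peter", "5,9"), (2, "Butters", "3,3"),
                  (3, "Stan", "2,96"), (4, "Stewie", "1,84")])
conn.executemany("INSERT INTO final (id_main, info) VALUES (?, ?)",
                 [(2, "0.3,34"), (4, "1.2,4.4"), (1, "5.7,0.9"), (3, "4.8,0.74")])

def lookup(id_from_main):
    """Hypothetical helper: fetch name and changed info for one main.id.

    The value is bound as a parameter instead of being concatenated into
    the SQL string as in the question's example.
    """
    return conn.execute("""
        SELECT m.name, f.info
        FROM main m
        JOIN final f ON f.id_main = m.id
        WHERE m.id = ?
    """, (id_from_main,)).fetchone()
```

Going from main to final is then a single primary-key lookup on final.id_main, with no second unique index to maintain.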
In general, you can also have a table without a primary key at all. It depends on if you want to be able to select specific individual rows or not.
I don't understand your query SELECT id_main FROM final WHERE id = '"+id_From_Main+"'. You're trying to select the value of the ID from main by the ID from main. What's the purpose? Why are you trying to get a value you already have?
Anyway, you're not providing enough information for a qualified answer. You have to optimize your data structures according to the queries you'll be running.
Make sure you have indexes on the columns you use in the WHERE clause. If you're selecting by final.id_main, have an index on that column. If you're selecting by final.id_main and final.name, have a composite index on both columns, etc.
Do you really need to have the name column in both tables? It's a bad database design, unless it's some performance optimization (to avoid a join).
So, you should:
collect all queries you're currently using, set proper indexes according to them
remove any unnecessary columns (e.g. final.id, final.name)
use the EXPLAIN on your queries to get execution information (you can also use the Explain analyzer to help you interpret the results)
you can try query profiling
In MySQL, an AUTO_INCREMENT column must be indexed, so if you keep id, define it as the PK. Define id_main as UNIQUE.

mysql auto increment id vs combined fields primary key

I'm having a difficult time figuring out what to use for my primary key.
My table:
| gender | age   | value | date updated | page id (the foreign key) |
| M      | 15-24 | 100   | some date    | 1                         |
| M      | 25-34 | 120   | some date    | 1                         |
| M      | 35-44 | 110   | some date    | 1                         |
| F      | 15-24 | 190   | some date    | 1                         |
| F      | 25-34 | 230   | some date    | 1                         |
Now I need to add a primary key. I could add an id field with auto increment and make that the PK, but that id would not be used as a foreign key or anything else in another table, so it would be kind of useless to add it.
I could also combine the page, gender and age and make them the primary key but I am not sure what the advantage on that would be. I tried googling for a while but still not sure what to do.
Please read the documentation of MySQL:
The primary key for a table represents the column or set of columns
that you use in your most vital queries. It has an associated index,
for fast query performance. Query performance benefits from the NOT
NULL optimization, because it cannot include any NULL values. With the
InnoDB storage engine, the table data is physically organized to do
ultra-fast lookups and sorts based on the primary key column or
columns.
If your table is big and important, but does not have an obvious
column or set of columns to use as a primary key, you might create a
separate column with auto-increment values to use as the primary key.
These unique IDs can serve as pointers to corresponding rows in other
tables when you join tables using foreign keys.
Thanks @AaronDigulla for his explanation:
Necessary? No. Used behind the scenes? Well, it's saved to disk and
kept in the row cache, etc. Removing will slightly increase your
performance (use a watch with millisecond precision to notice).
But ... the next time someone needs to create references to this
table, they will curse you. If they are brave, they will add a PK (and
wait for a long time for the DB to create the column). If they are not
brave or dumb, they will start creating references using the business
key (i.e. the data columns) which will cause a maintenance nightmare.
Conclusion: Since the cost of having a PK (even if it's not used ATM)
is so small, let it be.
From my experience and knowledge, if you do not define a primary key, the database (InnoDB, at least) will create a hidden one behind the scenes. So in your situation the best solution is to create it anyway.
I don't think that choosing between an auto-increment key and a composite primary key on page id, gender and age would significantly change performance.
Anyway, a primary key on page id, gender and age should be a nice choice, as it also prevents duplicate entries (you can't repeat the same combination of values in other records) and makes the table structure clearer.

Is it okay to have non sequential ids as primary keys for a table in your database?

I don't know enough about databases to find the right words to ask this question, so let me give an example to explain what I'm trying to do: Suppose I want the primary key for a table to be an ID I grab from an API, but the majority of those API requests result in 404 errors. As a result, my table would look like this:
I also don't know how to format a table-like structure on Stack Overflow, so this is going to be a rough visual:
API_ID_PK | name
------------------
1 | Billy
5 | Timmy
23 | Richard
54 | Jobert
104 | Broccoli
Is it okay for the IDs not to be consecutive (separated by exactly 1)? Or should I do this:
ID PK | API_ID | NAME
----------------------------------------
1 | 1 | Billy
2 | 5 | Timmy
3 | 23 | Richard
4 | 54 | Jobert
5 | 104 | Broccoli
Would the second table be more efficient for indexing reasons? Or is the first table perfectly fine? Thanks!
No, there won't be any effect on efficiency if you have non-consecutive IDs. In fact, MySQL (and other databases) allow for you to set a variable auto_increment_increment to have the ID increment by more than 1. This is commonly used in multi-master setups.
It's fine to have IDs that are not sequential. I regularly use GUIDs for IDs when dealing with enterprise software where multiple businesses could share the same object, and they're never sequential.
The one thing to watch out for is if the numbers are the same. What's determining the ID value you're storing?
If you have a clustered index (SQL Server) on an ID column and insert IDs with random values (like GUIDs), this can have a negative effect, as the physical order of the clustered index corresponds to the logical order. This can lead to a lot of index reorganisations. See: Improving performance of cluster index GUID primary key.
However, ordered but non consecutive values (values not separated by 1) are not a problem for clustered indexes.
For non-clustered indexes the order doesn't matter. It is okay to insert random values for primary keys as long as they are unique.
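A quick way to convince yourself that gaps are harmless is the sketch below. It assumes the first layout from the question (the API id is the primary key), uses Python's built-in sqlite3 rather than MySQL, and the table name people, the helper name_for and the late-arriving row (7, "Gordon") are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used as a stand-in for MySQL
conn.execute("""
CREATE TABLE people (
    api_id INTEGER PRIMARY KEY,   -- value comes from the API, gaps are expected
    name   TEXT NOT NULL
)
""")
conn.executemany("INSERT INTO people (api_id, name) VALUES (?, ?)",
                 [(1, "Billy"), (5, "Timmy"), (23, "Richard"),
                  (54, "Jobert"), (104, "Broccoli")])

# An id that falls into an existing gap can still be added later.
conn.execute("INSERT INTO people (api_id, name) VALUES (?, ?)", (7, "Gordon"))

def name_for(api_id):
    """Look up one row by its (non-consecutive) primary key."""
    row = conn.execute("SELECT name FROM people WHERE api_id = ?", (api_id,)).fetchone()
    return row[0] if row else None

# Keys come back in order regardless of gaps or insertion order.
ids_in_order = [r[0] for r in conn.execute("SELECT api_id FROM people ORDER BY api_id")]
```

The only thing the key has to guarantee is uniqueness. A second surrogate ID column would add an index to maintain without making lookups by api_id any faster.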

What's the most efficient way to store an array of integers in a MySQL column?

I've got two tables
A:
plant_ID | name.
1 | tree
2 | shrubbery
20 | notashrubbery
B:
area_ID | name | plants
1 | forrest | *needhelphere*
Now I want the area to store any number of plants, in a specific order, and some plants might show up a number of times, e.g. 2,20,1,2,2,20,1
What's the most efficient way to store this array of plants?
Keep in mind that if I search for areas with plant 2, I must not get areas whose list is e.g. 1,20,232,12,20 (pad with leading 0s?). What would be the query for that?
if it helps, let's assume I have a database of no more than 99999999 different plants. And yes, this question doesn't have anything to do with plants....
Bonus Question
Is it time to step away from MySQL? Is there a better DB to manage this?
If you're going to be searching both by forest and by plant, sounds like you would benefit from a full-on many-to-many relationship. Ditch your plants column, and create a whole new areas_plants table (or whatever you want to call it) to relate the two tables.
If area 1 has plants 1 and 2, and area 2 has plants 2 and 3, your areas_plants table would look like this:
area_id | plant_id | sort_idx
-----------------------------
1 | 1 | 0
1 | 2 | 1
2 | 2 | 0
2 | 3 | 1
You can then look up relationships from either side, and use simple JOINs to get the relevant data from either table. No need to muck about in LIKE conditions to figure out if it's in the list, blah, bleh, yuck. I've been there for a legacy database. No fun. Use SQL to its greatest potential.
How about this:
table: plants
plant_ID | name
1 | tree
2 | shrubbery
20 | notashrubbery
table: areas
area_ID | name
1 | forest
table: area_plant_map
area_ID | plant_ID | sequence
1 | 1 | 0
1 | 2 | 1
1 | 20 | 2
That's the standard normalized way to do it (with a mapping table).
To find all areas with a shrubbery (plant 2), do this:
SELECT *
FROM areas
INNER JOIN area_plant_map ON areas.area_ID = area_plant_map.area_ID
WHERE plant_ID = 2
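To show that the mapping table also covers the "specific order, with repeats" requirement from the question, here is a small sketch. It runs on Python's built-in sqlite3 instead of MySQL, keys the mapping table on (area_ID, sequence) so the same plant can appear several times, and adds a second area ("meadow") purely as made-up test data. The helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used as a stand-in for MySQL
conn.executescript("""
CREATE TABLE areas (
    area_ID INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE area_plant_map (
    area_ID  INTEGER NOT NULL,
    plant_ID INTEGER NOT NULL,
    sequence INTEGER NOT NULL,
    PRIMARY KEY (area_ID, sequence)   -- one plant per position, repeats allowed
);
CREATE INDEX idx_map_plant ON area_plant_map (plant_ID);  -- for searches by plant
""")
conn.executemany("INSERT INTO areas VALUES (?, ?)", [(1, "forest"), (2, "meadow")])

def store_plants(area_id, plant_ids):
    """Store an ordered list of plant ids, one row per position."""
    conn.executemany(
        "INSERT INTO area_plant_map (area_ID, plant_ID, sequence) VALUES (?, ?, ?)",
        [(area_id, plant_id, pos) for pos, plant_id in enumerate(plant_ids)],
    )

def plants_in(area_id):
    """Rebuild the ordered list, including repeats."""
    rows = conn.execute(
        "SELECT plant_ID FROM area_plant_map WHERE area_ID = ? ORDER BY sequence",
        (area_id,))
    return [r[0] for r in rows]

def areas_with(plant_id):
    """Ids of areas containing the plant at least once."""
    rows = conn.execute(
        "SELECT DISTINCT area_ID FROM area_plant_map WHERE plant_ID = ? ORDER BY area_ID",
        (plant_id,))
    return [r[0] for r in rows]

store_plants(1, [2, 20, 1, 2, 2, 20, 1])
store_plants(2, [1, 20, 232, 12, 20])
```

Because plant ids are compared as integers, a search for plant 2 cannot accidentally match 20, 12 or 232, so no zero-padding is needed.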
You know this violates normal form?
Typically, one would have an areaplants table: area_ID, plant_ID with a unique constraint on the two and foreign keys to the other two tables. This "link" table is what gives you many-many or many-to-one relationships.
Queries on this are generally very efficient, they utilize indexes and do not require parsing strings.
8 years after this question was asked, here's 2 ideas:
1. Use the JSON type
As of MySQL 5.7.8, MySQL supports a native JSON data type defined by RFC 7159 that enables efficient access to data in JSON (JavaScript Object Notation) documents.
2. Use your own codification
Turn the plants column into a string field (varchar or text, your choice, think about performance). Then you can represent values as, for example, -21-30-2-4-20- and filter using %-2-%.
If you somehow try one of these, I'd love it if you shared your performance results, with 100M rows as you suggested.
--
Remember that either of these breaks the first rule of normalization (first normal form), which says every column should hold a single value.
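For the second idea, a sketch of the delimiter trick might look like this. It uses Python's built-in sqlite3 in place of MySQL, the function names encode and areas_with are invented, and the second area is made-up test data taken from the "false positive" list in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used as a stand-in for MySQL
conn.execute("""
CREATE TABLE areas (
    area_ID INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    plants  TEXT NOT NULL          -- encoded list, e.g. -2-20-1-
)
""")

def encode(plant_ids):
    """Wrap every id in delimiters so that each one can be matched exactly."""
    return "-" + "-".join(str(p) for p in plant_ids) + "-"

conn.executemany("INSERT INTO areas VALUES (?, ?, ?)", [
    (1, "forest", encode([2, 20, 1, 2, 2, 20, 1])),
    (2, "meadow", encode([1, 20, 232, 12, 20])),
])

def areas_with(plant_id):
    """Ids of areas whose encoded list contains the given plant id."""
    pattern = "%-" + str(plant_id) + "-%"
    rows = conn.execute(
        "SELECT area_ID FROM areas WHERE plants LIKE ? ORDER BY area_ID", (pattern,))
    return [r[0] for r in rows]
```

The leading and trailing delimiter is what keeps plant 2 from matching 20, 12 or 232. The price is that a pattern starting with % generally cannot use an ordinary index, so every search scans the whole table, which is one reason the mapping-table answers tend to scale better.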
Your relation attributes should be atomic, not made up of multiple values like lists. It is too hard to search them. You need a new relation that maps the plants to the area_ID and the area_ID/plant combination is the primary key.
Use many-to-many relationship:
CREATE TABLE plant (
    plant_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255)
) ENGINE=INNODB;
CREATE TABLE area (
    area_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255)
) ENGINE=INNODB;
-- Intersection table: one row per plant occurrence in an area
CREATE TABLE plant_area_xref (
    plant_id INT NOT NULL,
    area_id INT NOT NULL,
    sort_idx INT NOT NULL,
    FOREIGN KEY (plant_id) REFERENCES plant(plant_id) ON DELETE CASCADE,
    FOREIGN KEY (area_id) REFERENCES area(area_id) ON DELETE CASCADE,
    PRIMARY KEY (plant_id, area_id, sort_idx)
) ENGINE=INNODB;
EDIT:
Just to answer your bonus question:
Bonus Question Is it time to step away from MySQL? Is there a better DB to manage this?
This has nothing to do with MySQL. This was just an issue with bad database design. You should use intersection tables and many-to-many relationship for cases like this in every RDBMS (MySQL, Oracle, MSSQL, PostgreSQL etc).