MySQL auto-increment id vs. combined-fields primary key

I'm having a difficult time figuring out what to use for my primary key.
My table:
| gender | age | value | date updated | page id (the foreign key) |
| M | 15-24 | 100 | some date | 1 |
| M | 25-34 | 120 | some date | 1 |
| M | 35-44 | 110 | some date | 1 |
| F | 15-24 | 190 | some date | 1 |
| F | 25-34 | 230 | some date | 1 |
Now I need to add a primary key. I could add an id field with auto increment and make that the PK, but that id would not be used as a foreign key or anything else in another table, so it would be kind of useless to add it.
I could also combine the page, gender and age and make them the primary key, but I am not sure what the advantage of that would be. I tried googling for a while but I'm still not sure what to do.

Please read the documentation of MySQL:
The primary key for a table represents the column or set of columns
that you use in your most vital queries. It has an associated index,
for fast query performance. Query performance benefits from the NOT
NULL optimization, because it cannot include any NULL values. With the
InnoDB storage engine, the table data is physically organized to do
ultra-fast lookups and sorts based on the primary key column or
columns.
If your table is big and important, but does not have an obvious
column or set of columns to use as a primary key, you might create a
separate column with auto-increment values to use as the primary key.
These unique IDs can serve as pointers to corresponding rows in other
tables when you join tables using foreign keys.
Thanks to @AaronDigulla for his explanation:
Necessary? No. Used behind the scenes? Well, it's saved to disk and
kept in the row cache, etc. Removing will slightly increase your
performance (use a watch with millisecond precision to notice).
But ... the next time someone needs to create references to this
table, they will curse you. If they are brave, they will add a PK (and
wait for a long time for the DB to create the column). If they are not
brave or dumb, they will start creating references using the business
key (i.e. the data columns) which will cause a maintenance nightmare.
Conclusion: Since the cost of having a PK (even if it's not used ATM)
is so small, let it be.
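From my experience and knowledge, if you do not define a primary key, the database will create a hidden one (InnoDB, for instance, builds a hidden clustered index on an internal row ID when the table has no primary key and no unique NOT NULL index). So in your situation the best solution is to create one anyway.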

I don't think that choosing between an auto increment key and a composite primary key on page id, gender and age would significantly change performance.
Anyway, a primary key on page id, gender and age would be a nice choice, as it also prevents duplicate entries (you can't repeat the same combination of values in another record) and makes the table structure clearer.
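The composite-key option can be sketched roughly as follows. This is only an illustration: the table names page and page_demographics and all column types are assumptions, since the question gives only the column headings.

```sql
-- Hypothetical parent table; the question only says that page id is a foreign key.
CREATE TABLE page (
  page_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
) ENGINE=InnoDB;

-- Hypothetical name and types for the table from the question.
CREATE TABLE page_demographics (
  page_id      INT UNSIGNED NOT NULL,
  gender       CHAR(1)      NOT NULL,
  age          VARCHAR(5)   NOT NULL,   -- age bracket such as '15-24'
  `value`      INT          NOT NULL,
  date_updated DATETIME     NOT NULL,
  PRIMARY KEY (page_id, gender, age),   -- one row per page, gender and age bracket
  FOREIGN KEY (page_id) REFERENCES page (page_id)
) ENGINE=InnoDB;

INSERT INTO page (page_id) VALUES (1);

INSERT INTO page_demographics (page_id, gender, age, `value`, date_updated)
VALUES (1, 'M', '15-24', 100, NOW());

-- Same (page_id, gender, age) again: the key turns this into an update
-- of the existing row instead of a second, duplicate row.
INSERT INTO page_demographics (page_id, gender, age, `value`, date_updated)
VALUES (1, 'M', '15-24', 130, NOW())
ON DUPLICATE KEY UPDATE `value` = 130, date_updated = NOW();
```

With this layout the key itself enforces "one value per page, gender and age bracket", which a separate auto-increment id would not do without an additional unique index.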

Related

MySQL Database Table optimization for FASTER Querying & Performance

I have 2 tables in my MySQL database.
Let's call 1st main, 2nd final.
TABLE `main` has the structure | TABLE `final` has the structure
|
`id` --> PRIMARY KEY (Auto Increment) | `id` --> PRIMARY KEY (Auto Increment)
| `id_main` --> ?? (Need help here)
|
id | name | info | id | id_main | name | info(changed)
--------------------- | ---------------------------------------
1 | Peter | 5,9 | 1 | 2 | Butters | 0.3,34
2 | Butters | 3,3 | 2 | 4 | Stewie | 1.2,4.4
3 | Stan | 2,96 | 3 | 1 | Peter | 5.7,0.9
4 | Stewie | 1,84 | 4 | 3 | Stan | 4.8,0.74
After analysing data in main the results get put into final.
As you can see final has an extra column (id_main) which points back to main.id
In actuality these 2 tables have 100 million+ rows each; my problem arises while performing SQL queries.
How should final (especially id & id_main) be configured so that querying from main to final is the fastest?
Can I do away with final.id (PRIMARY KEY, Auto Increment) & keep
final.id_main (As an UNIQUE Index?)
OR
Should I keep id AS PRIMARY KEY (AI) & final.id_main AS UNIQUE Index?
I would be making calls like:
int id_From_Main= 10000;
SELECT `id_main` FROM `final` WHERE `id`='"+id_From_Main+"'
If there's a 1:1 relation between those tables, I don't see any reason why they would need two separate auto-incremented primary keys.
I would remove the final.id column and have the final.id_main as a non-auto-incremented primary key and a foreign key to the main.id column.
In general, you can also have a table without a primary key at all. It depends on if you want to be able to select specific individual rows or not.
I don't understand your query SELECT id_main FROM final WHERE id = '"+id_From_Main+"' — you're trying to select the value of ID from main by ID from main. What's the purpose, why are you trying to get the value you already have?
Anyway, you're not providing enough information to give you a qualified answer. You have to optimize your data structures according to the queries you'll be running.
Make sure you have indexes on the columns you use in the WHERE clause. If you're selecting by final.id_main, have an index on that column. If you're selecting by final.id_main and final.name, have a composite index on both columns, etc.
Do you really need to have the name column in both tables? It's a bad database design, unless it's some performance optimization (to avoid a join).
So, you should:
collect all queries you're currently using, set proper indexes according to them
remove any unnecessary columns (e.g. final.id, final.name)
use the EXPLAIN on your queries to get execution information (you can also use the Explain analyzer to help you interpret the results)
you can try query profiling
In MySQL, an AUTO_INCREMENT column has to be defined as a key, so if you keep id as auto_increment, keep it as the PK. Define id_main as UNIQUE.
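As a rough sketch of the 1:1 layout suggested earlier in this thread (drop final.id and make final.id_main both the primary key and a foreign key), something like the following could be tried. The column types and the constraint name fk_final_main are assumptions; only the table and column names come from the question.

```sql
CREATE TABLE `main` (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(64)  NOT NULL,
  info VARCHAR(64)  NOT NULL
) ENGINE=InnoDB;

-- No separate auto-increment id: the row is identified by the main row it belongs to.
CREATE TABLE `final` (
  id_main INT UNSIGNED NOT NULL PRIMARY KEY,
  info    VARCHAR(64)  NOT NULL,
  CONSTRAINT fk_final_main FOREIGN KEY (id_main) REFERENCES `main` (id)
) ENGINE=InnoDB;

-- Lookup from main to final by the shared id.
SELECT m.name, f.info
FROM `main` AS m
JOIN `final` AS f ON f.id_main = m.id
WHERE m.id = 10000;

-- Check which indexes the optimizer actually picks for that lookup.
EXPLAIN
SELECT m.name, f.info
FROM `main` AS m
JOIN `final` AS f ON f.id_main = m.id
WHERE m.id = 10000;
```

Because InnoDB stores rows in primary key order, a lookup by id_main in this layout goes straight to the row rather than through a secondary index first. Whether that matters in practice depends on the real queries, so the EXPLAIN output is worth checking on actual data.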

Quite large (400k) mysql database design for multiple users

I'm seeking advice of experienced admins.
I'm working on a website where you solve word anagrams. Once a word is solved, it should never be displayed again.
Wordbase contains ~400k entries. What would be the most effective solution to storing such data?
One way could be:
+---------+------------------------+
| word_id | user1 | user2 | user...|
+---------+------------------------+
| 1 | null | null | 1 |
| 2 | 1 | null | null |
| ... | | | |
| 400000 | null | 1 | null |
+---------+------------------------+
Where let's say 1 = solved.
But wouldn't it become a monster quite quickly?
(+even a simple query of extending it by a new user takes forever)
The other solution is to store every solved word_id for all users, but then it can be 6 digits for every entry, growing massively and rapidly as well.
Also which engine would be more effective in this example? MyISAM or InnoDB?
You would not put the users as columns. If I understand the question, you would have a table, called something like WordUsers, with one row per "word"/"user" combination:
create table WordUsers (
WordUserId int not null primary key auto_increment,
WordId int not null,
UserId int not null,
. . .
constraint fk_WordId foreign key (WordId) references Words(WordId),
constraint fk_UserId foreign key (UserId) references Users(UserId)
);
When a word is shown to a user, then you add a row to this table. The . . . can include other information, such as the date/time of the interaction.
If your database supports it (and I think all of them do now), why not just put a text field on the user's table, fill it with a string of "N"s for "No, they haven't seen this word yet", and when they are given a word just change the "N" to "Y" for that record/word and re-save the new string? A MySQL TEXT column can hold up to 65,535 bytes. So you make your string something like 5,000 "N"s.
Or if you want to beat yourself up a bit - use the BIT field and make it something like 5000 flags. Same concept but harder to use.
BTW: On the string of "N"s and "Y"s you should be able to make an SQL query which has something like "WHERE SUBSTR(SEEN_IT,WORD_ID,1)='N'" kind of test.
You should use a relational database, like it should be, relational:
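CREATE TABLE user (user_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, user CHAR(16));
CREATE TABLE word (word_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, word CHAR(16));
-- One row per solved (user, word) pair; the composite key prevents duplicates.
CREATE TABLE solved (word_id INT NOT NULL, user_id INT NOT NULL, PRIMARY KEY (user_id, word_id));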
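With tables like the user, word and solved ones sketched above, recording a solved word and picking a word the user has not solved yet could look roughly like this. The ids 7 and 42 are made-up example values, and the query is only one possible way to write the lookup.

```sql
-- Record that user 7 solved word 42 (example ids, purely illustrative).
INSERT INTO solved (word_id, user_id) VALUES (42, 7);

-- Pick one word that user 7 has not solved yet:
-- the LEFT JOIN finds no matching solved row, so s.word_id is NULL.
SELECT w.word_id, w.word
FROM word AS w
LEFT JOIN solved AS s
       ON s.word_id = w.word_id
      AND s.user_id = 7
WHERE s.word_id IS NULL
LIMIT 1;
```

The solved table only grows by one small row per solved word, so adding a new user costs nothing until that user starts solving words.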

Do I need a Primary Key If I'm using 1 to Many Relationship?

I have a table called branch
It looks something like.
+----------------+--------------+
| branch_id | branch_name |
+----------------+--------------+
| 1 | TestBranch1 |
| 2 | TestBranch2 |
+----------------+--------------+
I've set the branch_id as primary key.
Now my question is related to the next table called item
It looks like this.
+----------------+-----------+---------------------------+
| branch_id | item_id | item_name |
+----------------+-----------+---------------------------+
| 1 | 1 | Apple |
| 1 | 2 | Ball |
| 2 | 1 | Totally Difference Apple |
| 2 | 2 | Apple Apple 2 |
+----------------+-----------+---------------------------+
I'd like to know if I need to create a primary key for my item table?
UPDATE
They do not share the same items. Sorry for the confusion.. A branch can create a product that doesn't exist in the other branch. They are like two stores sharing the same database.
UPDATE
Sorry for the incomplete information.
These tables are actually from two local database...
I'm trying to create a database that can exist on its own but would still have no problem when mixed with another. So the system would just append all the item data from another branch without mixing them up. The branches don't take the item_id of the other branches into consideration when generating a unique id for their items. All the databases, however, may share the same branch table as a reference.
Thank you guys in advance.
I'd like to know if I need to create a primary key for my item table?
You always[1] need a key[2], whether the table is involved in a relationship[3] or not. The only question is: what kind of key?
Here are your options in this case:
Make {item_id} alone a key. This makes the relationship "non-identifying" and item a "strong" entity...
Which produces a slimmer key (compared to the second option), therefore any child tables that may reference it are slimmer.
Any ON UPDATE CASCADE actions are cut-off at the level of the item and not propagated to children.
May play better with ORMs.
Make a composite[4] key on {branch_id, item_id}. This makes the relationship "identifying" and item a "weak" entity...
Which makes item itself slimmer (one less index).
Which may be very useful for clustering.
May help you avoid a JOIN in some cases (if there are child tables, branch_id is propagated to them).
May be necessary for correctly modelling "diamond-shaped" dependencies.
So pick your poison ;)
Of course, branch_id is a foreign key (but not a key on its own) in both cases.
And orthogonal to all that, if item_name has to be unique per-branch (as opposed to per whole table), you need a composite key on {branch_id, item_name} as well.
[1] From the logical perspective, you always need a key, otherwise your table would be a multiset, therefore not a relation (which is a set), therefore your database would no longer be "relational". From the physical perspective, there may be some special cases for breaking this rule, but they are rare.
[2] Whether it's primary or not is immaterial from the logical standpoint, although it may be important if the DBMS ascribes a special meaning to it, as is the case with InnoDB, which uses the primary key as the clustering key.
[3] Please make a distinction between "relation" and "relationship".
[4] A.k.a. "compound".
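The second option (a composite key on branch_id and item_id) might be sketched like this. The column types are assumptions; the table names, column names and sample rows follow the question.

```sql
CREATE TABLE branch (
  branch_id   INT UNSIGNED NOT NULL PRIMARY KEY,
  branch_name VARCHAR(64)  NOT NULL
) ENGINE=InnoDB;

-- item_id only has to be unique within a branch, so the key is the pair.
CREATE TABLE item (
  branch_id INT UNSIGNED NOT NULL,
  item_id   INT UNSIGNED NOT NULL,
  item_name VARCHAR(64)  NOT NULL,
  PRIMARY KEY (branch_id, item_id),
  FOREIGN KEY (branch_id) REFERENCES branch (branch_id)
) ENGINE=InnoDB;

INSERT INTO branch (branch_id, branch_name)
VALUES (1, 'TestBranch1'), (2, 'TestBranch2');

-- Both branches use item_id 1 for different products without colliding.
INSERT INTO item (branch_id, item_id, item_name)
VALUES (1, 1, 'Apple'),
       (2, 1, 'Totally Difference Apple');
```

This fits the "append another branch's items" scenario from the question, since rows from different branches cannot clash as long as each branch keeps its own branch_id.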
According to your example data you are using n to m relations and not 1 to m. It should be like this
item table
----------
item_id | item_name
1 | Apple
2 | Ball
branch_item table
-----------------
item_id | branch_id
1 | 1
1 | 2
2 | 1
2 | 2
And your branch_item table should have a compound unique key containing branch_id and item_id to make sure no duplicate entries can be added.
Yes you do. The Primary key is what allows the many to one relationship to exist.
This requirement is already catered for by the branch_id column.
The item_id column is not required for the one-to-many relationship in your example.

setting MySQL auto_increment to be dependent on two other primary keys

I'm trying to set up a MySQL database for storing biological data. I have to extract this data from a file and I have a Perl script for that. The problem I have is that I need three columns in the primary key for rows to be unique, and I want one of them to be an auto-increment integer. I would like, however, the auto-incremented value to reset each time the combination of the first two columns changes.
sequence1 | hit1 | 1
sequence1 | hit1 | 2
sequence1 | hit2 | 1
sequence2 | something | 1
sequence2 | something | 2
sequence2 | something | 3
sequence3 | something | 1
etc. etc.
Is that possible, or do I have to implement that directly in the script?
thank you
It is possible with MyISAM tables only and will not work in InnoDB or any other storage engine MySQL has.
Just create a primary key on (col1, col2, id) and set the auto_increment flag on the id column. And make sure there is no unique constraint on id alone. MyISAM will generate a new sequence of values for each unique pair of (col1, col2).
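A minimal sketch of that MyISAM behaviour, using the sample data from the question. The table name hits, the column names and the types are assumptions made for illustration.

```sql
-- AUTO_INCREMENT as the last column of a multi-column key, MyISAM only.
CREATE TABLE hits (
  sequence_name VARCHAR(64)  NOT NULL,
  hit_name      VARCHAR(64)  NOT NULL,
  id            INT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (sequence_name, hit_name, id)
) ENGINE=MyISAM;

-- id is omitted, so MyISAM assigns MAX(id) + 1 within each
-- (sequence_name, hit_name) group.
INSERT INTO hits (sequence_name, hit_name) VALUES
  ('sequence1', 'hit1'),
  ('sequence1', 'hit1'),
  ('sequence1', 'hit2'),
  ('sequence2', 'something');

SELECT sequence_name, hit_name, id
FROM hits
ORDER BY sequence_name, hit_name, id;
-- Expected rows:
--   sequence1 | hit1      | 1
--   sequence1 | hit1      | 2
--   sequence1 | hit2      | 1
--   sequence2 | something | 1
```

Note that with this scheme deleting the row with the highest id in a group allows that id to be reused, which may or may not be acceptable for the data being stored.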

What's the most efficient way to store an array of integers in a MySQL column?

I've got two tables
A:
plant_ID | name.
1 | tree
2 | shrubbery
20 | notashrubbery
B:
area_ID | name | plants
1 | forrest | *needhelphere*
Now I want the area to store any number of plants, in a specific order, and some plants might show up a number of times, e.g. 2,20,1,2,2,20,1
What's the most efficient way to store this array of plants?
Keep in mind that if I perform a search to find areas with plant 2, I must not get areas which are e.g. 1,20,232,12,20 (pad with leading 0s?). What would be the query for that?
If it helps, let's assume I have a database of no more than 99999999 different plants. And yes, this question doesn't have anything to do with plants....
Bonus Question
Is it time to step away from MySQL? Is there a better DB to manage this?
If you're going to be searching both by forest and by plant, sounds like you would benefit from a full-on many-to-many relationship. Ditch your plants column, and create a whole new areas_plants table (or whatever you want to call it) to relate the two tables.
If area 1 has plants 1 and 2, and area 2 has plants 2 and 3, your areas_plants table would look like this:
area_id | plant_id | sort_idx
-----------------------------
1 | 1 | 0
1 | 2 | 1
2 | 2 | 0
2 | 3 | 1
You can then look up relationships from either side, and use simple JOINs to get the relevant data from either table. No need to muck about in LIKE conditions to figure out if it's in the list, blah, bleh, yuck. I've been there for a legacy database. No fun. Use SQL to its greatest potential.
How about this:
table: plants
plant_ID | name
1 | tree
2 | shrubbery
20 | notashrubbery
table: areas
area_ID | name
1 | forest
table: area_plant_map
area_ID | plant_ID | sequence
1 | 1 | 0
1 | 2 | 1
1 | 20 | 2
That's the standard normalized way to do it (with a mapping table).
To find all areas with a shrubbery (plant 2), do this:
SELECT *
FROM areas
INNER JOIN area_plant_map ON areas.area_ID = area_plant_map.area_ID
WHERE plant_ID = 2
You know this violates normal form?
Typically, one would have an areaplants table: area_ID, plant_ID with a unique constraint on the two and foreign keys to the other two tables. This "link" table is what gives you many-many or many-to-one relationships.
Queries on this are generally very efficient, they utilize indexes and do not require parsing strings.
8 years after this question was asked, here are 2 ideas:
1. Use the JSON type
As of MySQL 5.7.8, MySQL supports a native JSON data type defined by RFC 7159 that enables efficient access to data in JSON (JavaScript Object Notation) documents.
2. Use your own codification
Turn the plants column into a string field (varchar or text, your choice, think about performance); then you can represent the values as, for example, -21-30-2-4-20- and filter using LIKE '%-2-%'.
If you somehow try one of these, I'd love it if you shared your performance results, with 100M rows as you suggested.
--
Remember that using either of these breaks the first rule of normalization (first normal form), which says every column should hold a single value.
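For the JSON idea, a rough sketch could look like the following, assuming MySQL 5.7.8 or later. The table name area_json and the second sample row ('meadow') are made up for illustration.

```sql
CREATE TABLE area_json (
  area_id INT UNSIGNED NOT NULL PRIMARY KEY,
  name    VARCHAR(255) NOT NULL,
  plants  JSON         NOT NULL      -- ordered array, repeats allowed
) ENGINE=InnoDB;

INSERT INTO area_json (area_id, name, plants) VALUES
  (1, 'forest', '[2, 20, 1, 2, 2, 20, 1]'),
  (2, 'meadow', '[1, 20, 232, 12, 20]');

-- Areas whose array contains the number 2 as an element.
-- 20, 232 and 12 do not match, because elements are compared as
-- whole JSON values and not as substrings.
SELECT area_id, name
FROM area_json
WHERE JSON_CONTAINS(plants, '2');
```

This keeps order and repeats in one column and avoids the padding problem from the question, but without additional indexing the search most likely has to examine every row, so the mapping-table designs in the other answers remain the safer default.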
Your relation attributes should be atomic, not made up of multiple values like lists. It is too hard to search them. You need a new relation that maps the plants to the area_ID and the area_ID/plant combination is the primary key.
Use many-to-many relationship:
CREATE TABLE plant (
plant_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255)
) ENGINE=INNODB;
CREATE TABLE area (
area_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255)
) ENGINE=INNODB;
CREATE TABLE plant_area_xref (
plant_id INT NOT NULL,
area_id INT NOT NULL,
sort_idx INT NOT NULL,
FOREIGN KEY (plant_id) REFERENCES plant(plant_id) ON DELETE CASCADE,
FOREIGN KEY (area_id) REFERENCES area(area_id) ON DELETE CASCADE,
PRIMARY KEY (plant_id, area_id, sort_idx)
) ENGINE=INNODB;
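Assuming the three tables above have been created, the ordered list with repeats from the question (2,20,1,2,2,20,1) could be stored and read back roughly like this. The sample rows and the plant_list alias are illustrative only.

```sql
INSERT INTO plant (plant_id, name)
VALUES (1, 'tree'), (2, 'shrubbery'), (20, 'notashrubbery');

INSERT INTO area (area_id, name) VALUES (1, 'forest');

-- One row per position; the same plant may appear at several positions.
INSERT INTO plant_area_xref (plant_id, area_id, sort_idx) VALUES
  (2, 1, 0), (20, 1, 1), (1, 1, 2), (2, 1, 3),
  (2, 1, 4), (20, 1, 5), (1, 1, 6);

-- Rebuild the ordered list for area 1.
SELECT GROUP_CONCAT(plant_id ORDER BY sort_idx SEPARATOR ',') AS plant_list
FROM plant_area_xref
WHERE area_id = 1;

-- Areas that contain plant 2, each listed once.
SELECT DISTINCT a.area_id, a.name
FROM area AS a
JOIN plant_area_xref AS x ON x.area_id = a.area_id
WHERE x.plant_id = 2;
```

Since the primary key starts with plant_id, the "which areas contain plant 2" search can use it directly. The per-area listing would probably benefit from the index on area_id that InnoDB creates for the foreign key.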
EDIT:
Just to answer your bonus question:
Bonus Question Is it time to step away from MySQL? Is there a better DB to manage this?
This has nothing to do with MySQL. This was just an issue with bad database design. You should use intersection tables and many-to-many relationship for cases like this in every RDBMS (MySQL, Oracle, MSSQL, PostgreSQL etc).