Should the auto increment column be the primary key? - mysql

I'm confused as to how to assign primary keys.
For example, let's say I have these two tables:
users table where the user_id is unique:
+---------+----------+--------------+
| user_id | username | password     |
+---------+----------+--------------+
|       1 | hello    | somepassword |
|       2 | world    | another      |
|       3 | stack    | overflow     |
+---------+----------+--------------+
posts table where the post_id is unique:
+---------+---------+--------------+
| post_id | user_id | content      |
+---------+---------+--------------+
|       1 |       1 | Hello World! |
|       2 |       1 | Another.     |
|       3 |       3 | Number 3.    |
|       4 |       2 | Stack.       |
|       5 |       1 | Overflow.    |
+---------+---------+--------------+
Obviously for the users table the primary key should be user_id, but what should the primary key be in the posts table? post_id or user_id? Please explain.

The primary key for the posts table should also be the auto-increment column, post_id, because it is the only column that uniquely identifies each post. The user_id can't serve that purpose: the same user can have multiple posts, so the value repeats (user 1 has three posts in your example). If you need to relate information between the tables, you can join on user_id, which appears in both; to identify an individual post, post_id is your best bet.

Presumably you have this scenario:
A user can write several posts.
A post is written, logically, by one user only.
Thus, you are dealing with a one-to-many relationship.
Once this is clear, it follows that the primary key of users must appear as a foreign key in posts, which is what you have already done.
Whether post_id alone is enough as the primary key of posts depends on the rest of your entity-relationship model (how many other entities you have and how they relate to each other).
For this specific scenario, however, you do not need to make the foreign key user_id part of the primary key of posts.
Note: when you implement your tables, declare users.user_id and posts.post_id as auto_increment and not null. The foreign key posts.user_id should be not null but must not be auto_increment.
Let's summarize all this in SQL:
Table users:
mysql> create table users (
    ->   user_id int not null auto_increment,
    ->   username varchar(15) not null,
    ->   password varchar(20) not null,
    ->   primary key (user_id)
    -> );
Query OK, 0 rows affected (0.33 sec)
Table posts:
mysql> create table posts (
    ->   post_id int not null auto_increment,
    ->   user_id int not null,
    ->   content varchar(50) not null,
    ->   primary key (post_id),
    ->   foreign key (user_id) references users (user_id)
    -> );
Query OK, 0 rows affected (0.26 sec)
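To make the one-to-many relationship concrete, here is a small sketch that loads the sample rows from the question into the two tables and joins them on user_id. It assumes both tables are freshly created and empty, so that auto_increment hands out 1, 2, 3 in order; the query is only an illustration and not part of the original answer.

```sql
-- Sample rows from the question (plain-text passwords only for illustration).
INSERT INTO users (username, password) VALUES
    ('hello', 'somepassword'),
    ('world', 'another'),
    ('stack', 'overflow');

-- user_id repeats here, which is why it cannot be the primary key of posts.
INSERT INTO posts (user_id, content) VALUES
    (1, 'Hello World!'),
    (1, 'Another.'),
    (3, 'Number 3.'),
    (2, 'Stack.'),
    (1, 'Overflow.');

-- One user, many posts: each post is identified by post_id,
-- and the foreign key user_id links it back to its author.
SELECT u.username, p.post_id, p.content
FROM users AS u
JOIN posts AS p ON p.user_id = u.user_id
ORDER BY p.post_id;
```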

Of course it should be user_id, because when you use an ORM it maps tables automatically based on properly named keys.
For ORM options, see: Good PHP ORM Library?

Related

What kind of relationship I have among database tables if the link between two fields is not enough to identify a record?

I have a MySQL database with several tables. Most tables contain data and both translatable and non-translatable texts.
When a table contains a text that can be translated to one or more languages, I do not store the text in that table, but I use an id referring to another table that contains all the possible translations for all the possible texts. Let us make a practical example:
Table USERS
user_id BIGINT(20) UNIQUE PRIMARY KEY
user_name VARCHAR(190)
user_surname VARCHAR(190)
user_age INT
user_motto_id BIGINT(20)
The user's name and surname are NOT translatable, so they are stored in the table. The user's motto can be translated, so I store in the table only an id that corresponds to the actual motto. To identify the translation to retrieve, however, I ALSO use the language code dynamically returned by get_locale(). So user_motto_id corresponds to text_id, NOT trans_id. This is important.
Table TEXTS
trans_id BIGINT(20) UNIQUE PRIMARY KEY
text_id BIGINT(20)
lang_code VARCHAR(7)
text LONGTEXT
So, for example
Table USERS
1 | Joe | Doe | 25 | 101
2 | Mary | Foo | 31 | 107
Table TEXTS
1 | 101 | en_US | Raise your hearts
2 | 101 | it_IT | In alto i cuori
3 | 101 | fr_FR | Élevez vos cœurs
4 | 107 | en_US | To the stars and beyond
5 | 107 | it_IT | Fino alle stelle e oltre
6 | 107 | fr_FR | Vers les étoiles et au-delà
I am using MySQL Workbench to model the database. In my EER diagram no connection is visible between TEXTS and USERS because text_id is not enough to identify which translation should be used. I also need the lang_code that I obtain at run-time.
However I would like to make it visible that some connection exists, in some way, but how? Is there a way to say that user_motto_id is a value that should correspond to some text_id, even if it is not enough to retrieve the record I need?
If I understand your description correctly, you need to declare user_motto_id in the USERS table as a foreign key that references the TEXTS table.
Like the example below:
CREATE TABLE categories(
    categoryId INT AUTO_INCREMENT PRIMARY KEY,
    categoryName VARCHAR(100) NOT NULL
) ENGINE=INNODB;
CREATE TABLE products(
    productId INT AUTO_INCREMENT PRIMARY KEY,
    productName VARCHAR(100) NOT NULL,
    categoryId INT,
    CONSTRAINT fk_category
        FOREIGN KEY (categoryId)
        REFERENCES categories(categoryId)
        ON UPDATE SET NULL
        ON DELETE SET NULL
) ENGINE=INNODB;
For better information, read the following link:
https://www.mysqltutorial.org/mysql-foreign-key/
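One way to make the link visible in the EER diagram, sketched here under assumptions that are not in the question: introduce a parent table with one row per translatable text, and let both USERS.user_motto_id and TEXTS.text_id reference it. The table name text_keys, the unique key, and the constraint names are hypothetical; the other columns follow the question.

```sql
-- Hypothetical parent table: one row per translatable text, independent of language.
CREATE TABLE text_keys (
    text_id BIGINT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

-- One row per (text, language). The unique key prevents two
-- translations of the same text into the same language.
CREATE TABLE texts (
    trans_id  BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    text_id   BIGINT NOT NULL,
    lang_code VARCHAR(7) NOT NULL,
    `text`    LONGTEXT,
    UNIQUE KEY uq_text_lang (text_id, lang_code),
    CONSTRAINT fk_texts_key FOREIGN KEY (text_id) REFERENCES text_keys (text_id)
) ENGINE=InnoDB;

-- user_motto_id points at the language-independent text, not at one translation.
CREATE TABLE users (
    user_id       BIGINT NOT NULL PRIMARY KEY,
    user_name     VARCHAR(190),
    user_surname  VARCHAR(190),
    user_age      INT,
    user_motto_id BIGINT,
    CONSTRAINT fk_users_motto FOREIGN KEY (user_motto_id) REFERENCES text_keys (text_id)
) ENGINE=InnoDB;

-- The language is still supplied at run time (here a literal stands in for get_locale()).
SELECT u.user_name, t.`text` AS motto
FROM users AS u
JOIN texts AS t
  ON t.text_id = u.user_motto_id
 AND t.lang_code = 'it_IT';
```

With this layout the relationship from USERS to the text shows up as an ordinary foreign key, while the choice of translation stays in the join condition.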

MySQL Adding Foreign Key Error 1215

I know this kind of question has been asked before. I made sure that the columns have the same data type and also checked my syntax, but I am still getting the error:
ALTER TABLE meetings ADD FOREIGN KEY (ownerName) REFERENCES employees(name);
ERROR 1215 (HY000): Cannot add foreign key constraint
mysql> desc `meetings`;
+-----------+-------------+------+-----+---------+----------------+
| Field     | Type        | Null | Key | Default | Extra          |
+-----------+-------------+------+-----+---------+----------------+
| id        | int(11)     | NO   | PRI | NULL    | auto_increment |
| room      | int(6)      | NO   |     | NULL    |                |
| ownerName | varchar(30) | NO   |     | NULL    |                |
| ownerID   | varchar(30) | NO   |     | NULL    |                |
+-----------+-------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
mysql> desc `employees`;
+----------+--------------+------+-----+---------+-------+
| Field    | Type         | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| name     | varchar(30)  | NO   |     | NULL    |       |
| username | varchar(30)  | NO   | PRI | NULL    |       |
| pswd     | varchar(255) | YES  |     | NULL    |       |
+----------+--------------+------+-----+---------+-------+
What am I doing wrong?
name is not the primary key of the employees table, so try referencing username instead:
ALTER TABLE meetings ADD FOREIGN KEY (ownerName) REFERENCES employees(username);
Or follow DanielE's suggestion. You can also keep referencing the name column, but it then needs a UNIQUE index.
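A minimal sketch of the unique-index route, assuming employee names really are unique and that ownerName and name share the same character set and collation. The index name uq_employees_name is hypothetical.

```sql
-- Give the referenced column a unique index first ...
ALTER TABLE employees ADD UNIQUE INDEX uq_employees_name (name);

-- ... then the original statement from the question can be retried.
ALTER TABLE meetings ADD FOREIGN KEY (ownerName) REFERENCES employees (name);
```

The first statement fails if two employees already share a name, which is itself a hint that name is a fragile key.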
Change the primary key of 'employees' from username to name. Then you can use:
ALTER TABLE meetings ADD FOREIGN KEY (ownerName) REFERENCES employees(name);
"What am I doing wrong?" Error 1215 is probably the least of the problems here.
Some of the answers given for this question suggest making username the referenced column in employees rather than name, which is fine as far as it goes, but ignores some really fundamental issues in the schema and quite possibly wasn't the poster's intention for these columns. The rest of this answer is based on my own set of assumptions.
meetings table
Looking at the meetings table, I'm left wondering about the purpose of the ownerID column. Since the intention is to have ownerName as a foreign key to employees, what exactly is ownerID? The name suggests it also somehow references employees, but there is no id or ownerID in employees. Also, if any column starting owner... refers to an employee, then why would you need both in the meetings table? One of them is surely redundant. Why is ownerID a VARCHAR(30)? ID columns tend to be INT. Of course, I may be reading too much into this and ownerID may have some other purpose that has nothing to do with an employee, but if that's the case the name is likely to cause confusion in the future.
The meetings table also has an INT surrogate key in id. There's another INT for room. Since room isn't a foreign key, it suggests that rooms are consistently identified only by number (which would be strange in my experience) and that there is nothing more to a 'room' worth capturing (e.g. location, capacity, equipment) to justify modelling rooms in a separate table (again unlikely). Alternatively, room might itself be intended as a foreign key referencing an INT id column in an as yet undefined rooms table.
employees table
If we accept ownerID as a more appropriate foreign key to the employee that owns the meeting (an INT takes less space to index than either name or username), then consistency would suggest another surrogate key id as the primary key in the employees table. It's not necessary to do this, since username would be unique and is fine on its own, but it's simpler and more efficient. The other suggestion, that name should be the PK in employees, is wrong: it presupposes that names are always unique.
A single column to cover an employee name would also be unusual.
The point about referencing a PK or a unique index is well made (even if it's not strictly necessary in InnoDB). I'd just say that ownerName is the wrong foreign key, and username and name are the wrong references, because there is a better alternative.
And, finally, is a NULL password (pswd) a good idea?
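A rough sketch of the schema this answer argues for, under the answer's own assumption that ownerID is meant to identify the employee who owns the meeting. The surrogate key employees.id, the INT type for ownerID, the NOT NULL on pswd, and the constraint name are all assumptions for illustration; the poster's real intent may differ.

```sql
CREATE TABLE employees (
    id       INT NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- assumed surrogate key
    username VARCHAR(30) NOT NULL UNIQUE,              -- still unique, no longer the PK
    name     VARCHAR(30) NOT NULL,                     -- free to repeat
    pswd     VARCHAR(255) NOT NULL                     -- no NULL passwords
) ENGINE=InnoDB;

CREATE TABLE meetings (
    id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    room    INT NOT NULL,
    ownerID INT NOT NULL,                              -- replaces ownerName and the VARCHAR ownerID
    CONSTRAINT fk_meetings_owner FOREIGN KEY (ownerID) REFERENCES employees (id)
) ENGINE=InnoDB;

-- The owner's name is then looked up with a join instead of being stored twice.
SELECT m.id, m.room, e.name AS ownerName
FROM meetings AS m
JOIN employees AS e ON e.id = m.ownerID;
```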

How to design a simple and efficient MySQL blacklist table

Suppose I have user table my_users in which there is a primary key id. Also, I wish to design (in MySQL) a simple blacklist table, whose declaration looks like this:
CREATE TABLE IF NOT EXISTS black_list (
user_id INT NOT NULL,
bad_string VARCHAR(100) NOT NULL,
FOREIGN KEY (user_id) REFERENCES my_users(id),
PRIMARY KEY (user_id, bad_string));
The interpretation of any row in black_list is that the user with ID user_id wants to blacklist the string bad_string. Obviously, user_id cannot be unique, since a single user may have more than one blacklisted string. Conversely, bad_string cannot be unique, since more than one user may have blacklisted the same string. However, the pair (user_id, bad_string) should be unique, since it makes no sense for a user to blacklist the same string more than once.
When we select a black list via a user ID (SELECT * FROM black_list WHERE user_id = X) in the worst case, MySQL will have to scan the entire black_list table.
My question here is: is there a way for running the above SELECT statement in sublinear time with regard to the number of rows in the black_list table? If yes, how can I accomplish that?
Your assertion that SELECT * FROM black_list WHERE user_id = X will have to scan the entire black_list table is incorrect.
In this sql fiddle, you can see it's using an index:
+----+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table      | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | black_list | ref  | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | Using index |
+----+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
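The plan uses the index because user_id is the leftmost column of the composite primary key (user_id, bad_string), so a lookup by user_id alone is a B-tree seek rather than a full scan. A minimal sketch for checking this yourself; the literal values and the index name idx_bad_string are placeholders, not from the question.

```sql
-- Served by PRIMARY KEY (user_id, bad_string): a leftmost-prefix lookup.
EXPLAIN SELECT * FROM black_list WHERE user_id = 42;

-- A lookup by bad_string alone cannot seek on that leftmost prefix.
-- If you need it, a separate (hypothetical) secondary index would help:
ALTER TABLE black_list ADD INDEX idx_bad_string (bad_string);
EXPLAIN SELECT * FROM black_list WHERE bad_string = 'spam';
```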

MySQL 5.6 long WHERE IN query very slow

Since version 5.6 of MySQL, a very simple albeit long query takes several orders of magnitude longer than in 5.5.
The schema: three tables, one with elements, one with categories, and an M:N table between them. CREATE statements:
CREATE TABLE element (
    id int(11) NOT NULL AUTO_INCREMENT,
    name varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=4257455 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE category (
    id int(11) NOT NULL AUTO_INCREMENT,
    name varchar(255) COLLATE utf8_unicode_ci NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=76 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE elements_categories (
    id int(11) NOT NULL AUTO_INCREMENT,
    element_id int(11) NOT NULL,
    category_id int(11) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY element_id (element_id,category_id),
    KEY elements_categories_element_id (element_id),
    KEY elements_categories_category_id (category_id),
    CONSTRAINT D7d489b06a407a0c1c70f108712c815e FOREIGN KEY (category_id) REFERENCES category (id),
    CONSTRAINT co_element_id_57f4f2ec0db9441c_fk_element_id FOREIGN KEY (element_id) REFERENCES element (id)
) ENGINE=InnoDB AUTO_INCREMENT=88131737 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The query:
SELECT elements_categories.element_id, category.id, category.name
FROM category
INNER JOIN elements_categories
    ON category.id = elements_categories.category_id
WHERE elements_categories.element_id IN (1, 2, 3, ...)
So, the element table does not even play a role in this query; I already got a bunch of IDs from it with a previous query. (Disclaimer: I'm using an ORM, and inlining the first query did not make things faster.) The number of values in the IN clause can become very large, in my example 14240. That's not a problem; it takes a tenth of a second or so. This is the execution plan:
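+----+-------------+---------------------+--------+---------------------------------------------------------------------------+------------+---------+---------------------------------+-------+--------------------------+
| id | select_type | table               | type   | possible_keys                                                             | key        | key_len | ref                             | rows  | Extra                    |
+----+-------------+---------------------+--------+---------------------------------------------------------------------------+------------+---------+---------------------------------+-------+--------------------------+
|  1 | SIMPLE      | elements_categories | range  | element_id,elements_categories_element_id,elements_categories_category_id | element_id | 4       | NULL                            | 42720 | Using where; Using index |
|  1 | SIMPLE      | category            | eq_ref | PRIMARY                                                                   | PRIMARY    | 4       | elements_categories.category_id |     1 | NULL                     |
+----+-------------+---------------------+--------+---------------------------------------------------------------------------+------------+---------+---------------------------------+-------+--------------------------+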
When I add one more element, the execution time explodes to 60 seconds plus a fetch time of 200 seconds. The execution plan also changes to this:
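+----+-------------+---------------------+------+---------------------------------------------------------------------------+---------------------------------+---------+-------------+------+-------------+
| id | select_type | table               | type | possible_keys                                                             | key                             | key_len | ref         | rows | Extra       |
+----+-------------+---------------------+------+---------------------------------------------------------------------------+---------------------------------+---------+-------------+------+-------------+
|  1 | SIMPLE      | category            | ALL  | PRIMARY                                                                   | NULL                            | NULL    | NULL        |   75 | NULL        |
|  1 | SIMPLE      | elements_categories | ref  | element_id,elements_categories_element_id,elements_categories_category_id | elements_categories_category_id | 4       | category.id |  760 | Using where |
+----+-------------+---------------------+------+---------------------------------------------------------------------------+---------------------------------+---------+-------------+------+-------------+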
range and eq_ref lookups exchanged for ALL and ref, order of tables switched, not using elements_categories.category_id as ref although it is the foreign key between those two tables. I don't get why the plan gets changed like this.
There are 75 categories and 4,300,000 elements and 1,600,000 assignments.
My guess is that I'm exceeding some size limit here, but cannot figure out which one. Also I didn't change anything from the MySQL 5.5 installation which stuck to the former execution plan all the time.
There are several ways to trick the optimizer into using the correct plan:
Add an index hint: ... JOIN elements_categories FORCE INDEX (element_id)...
Swap the tables around and make category a LEFT JOIN (assuming every elements_categories row has a category). This is not a generic solution, but should work in this case.
Make a temp table with the element_ids and JOIN it in all of your queries instead of using IN (1,2,3...). You should also be able to use IN (SELECT id FROM <temp table>) instead of literals.
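The first and third options above could look roughly like this. The table aliases and the temporary table name wanted_elements are made up for the sketch, and the three literal IDs stand in for the real list of about 14,000 values.

```sql
-- Option 1: index hint. element_id here is the name of the
-- UNIQUE KEY (element_id, category_id) from the schema above.
SELECT ec.element_id, c.id, c.name
FROM category AS c
INNER JOIN elements_categories AS ec FORCE INDEX (element_id)
    ON c.id = ec.category_id
WHERE ec.element_id IN (1, 2, 3);

-- Option 3: load the IDs into a (hypothetical) temporary table and join it.
CREATE TEMPORARY TABLE wanted_elements (
    id INT NOT NULL PRIMARY KEY
);
INSERT INTO wanted_elements (id) VALUES (1), (2), (3);

SELECT ec.element_id, c.id, c.name
FROM wanted_elements AS w
INNER JOIN elements_categories AS ec ON ec.element_id = w.id
INNER JOIN category AS c ON c.id = ec.category_id;
```

Running EXPLAIN on each variant is the quickest way to confirm that elements_categories is read through the element_id index again.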
The reason the optimizer chooses another plan when the parameters differ is that it looks at table statistics and guesses which index will eliminate the most rows. It is only an estimate and can often be wrong.
If you know better, you need to tell the optimizer what to do with an index hint, as in the first example @Vatev gives.
An interesting thing about the optimizer is that an index adds an extra layer of indirection and thus potentially more reads, so it has to eliminate a large share of the table before the optimizer considers it worthwhile. (I don't remember the exact threshold...)
Another interesting feature of the optimizer is that if the index contains all the information needed from a table, it can avoid looking up the actual row, so depending on your situation you might benefit from adding an extra column to the index. This optimization is used in the first query plan ("Using index") but not in the second. Thus, adding element_id to your index elements_categories_category_id might speed things up. See http://dev.mysql.com/doc/refman/5.6/en/explain-output.html
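A sketch of that covering-index idea, assuming the extra write cost on a table of this size is acceptable. The index name idx_category_element is hypothetical; whether the optimizer then picks a better plan still has to be checked with EXPLAIN.

```sql
-- Covering index for lookups that start from category_id:
-- both columns the query needs from elements_categories are in the index.
ALTER TABLE elements_categories
    ADD INDEX idx_category_element (category_id, element_id);

-- Re-check the plan; "Using index" in the Extra column for
-- elements_categories means the row lookup is being skipped.
EXPLAIN
SELECT ec.element_id, c.id, c.name
FROM category AS c
INNER JOIN elements_categories AS ec
    ON c.id = ec.category_id
WHERE ec.element_id IN (1, 2, 3);
```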

MySQL JOIN extremely poor performance

I've been messing around all day trying to find why my query performance is terrible. It is extremely simple, yet can take over 15 minutes to execute (I abort the query at that stage). I am joining a table with over 2 million records.
This is the select:
SELECT
    audit.MessageID, alerts.AlertCount
FROM
    audit
LEFT JOIN (
    SELECT MessageID, COUNT(ID) AS 'AlertCount'
    FROM alerts
    GROUP BY MessageID
) AS alerts ON alerts.MessageID = audit.MessageID
This is the EXPLAIN
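+----+-------------+------------+-------+---------------+----------------------+---------+------+---------+----------+-------------+
| id | select_type | table      | type  | possible_keys | key                  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+------------+-------+---------------+----------------------+---------+------+---------+----------+-------------+
|  1 | PRIMARY     | AL         | index | NULL          | IDX_audit_MessageID  | 4       | NULL | 2330944 |   100.00 | Using index |
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL                 | NULL    | NULL |  124140 |   100.00 |             |
|  2 | DERIVED     | alerts     | index | NULL          | IDX_alerts_MessageID | 5       | NULL |  124675 |   100.00 | Using index |
+----+-------------+------------+-------+---------------+----------------------+---------+------+---------+----------+-------------+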
This is the schema:
# Not joining, just showing types
CREATE TABLE messages (
    ID int NOT NULL AUTO_INCREMENT,
    MessageID varchar(255) NOT NULL,
    PRIMARY KEY (ID),
    INDEX IDX_messages_MessageID (MessageID)
);
# 2,324,931 records
CREATE TABLE audit (
    ID int NOT NULL AUTO_INCREMENT,
    MessageID int NOT NULL,
    LogTimestamp timestamp NOT NULL,
    PRIMARY KEY (ID),
    INDEX IDX_audit_MessageID (MessageID),
    CONSTRAINT FK_audit_MessageID FOREIGN KEY(MessageID) REFERENCES messages(ID)
);
# 124,140
CREATE TABLE alerts (
    ID int NOT NULL AUTO_INCREMENT,
    AlertLevel int NOT NULL,
    Text nvarchar(4096) DEFAULT NULL,
    MessageID int DEFAULT 0,
    PRIMARY KEY (ID),
    INDEX IDX_alert_MessageID (MessageID),
    CONSTRAINT FK_alert_MessageID FOREIGN KEY(MessageID) REFERENCES messages(ID)
);
A few very important things to note - the MessageID is not 1:1 in either 'audit' or 'alerts'; The MessageID can exist in one table, but not the other, or may exist in both (which is the purpose of my join); In my test DB, none of the MessageID exist in both. In other words, my query will return 2.3 million records with 0 as the count.
Another thing to note is that the 'audit' and 'alert' tables used to use MessageID as varchar(255). I created the 'messages' table expecting that it would fix the join. It actually made it worse. Previously, it would take 78 seconds, now, it never returns.
What am I missing about MySQL?
Subqueries are very hard for the MySQL engine to optimize. Try:
SELECT
    audit.MessageID, COUNT(alerts.ID) AS AlertCount
FROM
    audit
LEFT JOIN alerts ON alerts.MessageID = audit.MessageID
GROUP BY audit.MessageID
You're joining to a subquery.
The subquery results are effectively a temporary table - note the <derived2> in the query execution plan. As you can see there, they're not indexed, since they're ephemeral.
You should execute the query as a single unit with a join, rather than joining to the results of a second query.
EDIT: Andrew has posted an answer with one example of how to do your work in a normal join query, instead of in two steps.
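If you do want to keep the pre-aggregated counts (one output row per audit record rather than per MessageID), another option is to materialize the subquery yourself into a table that has an index, so the join is no longer against an unindexed derived table. This is only a sketch; the temporary table name alert_counts is made up, and it has not been timed against the data in the question.

```sql
-- Materialize the per-message counts once, with a primary key to join on.
CREATE TEMPORARY TABLE alert_counts (
    MessageID  INT NOT NULL PRIMARY KEY,
    AlertCount INT NOT NULL
);

-- alerts.MessageID is nullable in the schema, so NULLs are filtered out
-- before they reach the NOT NULL primary key column.
INSERT INTO alert_counts (MessageID, AlertCount)
SELECT MessageID, COUNT(ID)
FROM alerts
WHERE MessageID IS NOT NULL
GROUP BY MessageID;

-- Same shape as the original query, but the right-hand side is now indexed.
SELECT audit.MessageID, alert_counts.AlertCount
FROM audit
LEFT JOIN alert_counts ON alert_counts.MessageID = audit.MessageID;
```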