Should I use a unique index here? And why? - mysql

I've read up on this, but I'm still confused about when to use a normal index versus a unique index in MySQL. I have a table that stores posts and responses (id, parentId). I have set up three normal indices, for parentId, userId, and editorId.
Would using unique indices benefit me in any way given the following types of queries I will generally run? And why?
Most of my queries will return a post and its responses:
SELECT * FROM posts WHERE id = #postId OR parentId = #postId ORDER BY postTypeId
Sometimes I will add a join to get user data:
SELECT * FROM posts
JOIN users AS owner ON owner.id = posts.userId
LEFT JOIN users AS editor ON editor.id = posts.editorId
WHERE posts.id = #postId OR parentId = #postId ORDER BY postTypeId
Other times I may ask for a user and his/her posts:
SELECT * FROM users
LEFT JOIN posts ON users.id = posts.userid
WHERE users.id = #userId
My schema looks like this:
CREATE TABLE `posts` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`posttypeid` int(10) NOT NULL,
`parentid` int(10) DEFAULT NULL,
`body` text NOT NULL,
`userid` int(10) NOT NULL,
`editorid` int(10) NOT NULL,
`updatedat` datetime DEFAULT NULL,
`createdat` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `userId` (`userid`),
KEY `editorId` (`editorid`),
KEY `parentId` (`parentid`)
) ENGINE=InnoDB AUTO_INCREMENT=572 DEFAULT CHARSET=utf8

When an index is created as UNIQUE, it only adds consistency to your table: inserting a new entry that reuses an existing key by mistake will fail, instead of being accepted and leading to strange errors later.
So you should use it for your IDs when you know there won't be duplicates (uniqueness is implied and mandatory for primary keys), but it won't give you any performance benefit. It just guarantees that you won't have to deal with a specific kind of data corruption caused by a bug in the client code.
However, if you know there can be duplicates (which I assume is the case for your columns userId, editorId, and parentId), adding the UNIQUE attribute would be a serious bug: it would forbid multiple posts with the same userId, editorId, or parentId.
In short: use it everywhere you can, but in this case you can't.
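To make that concrete, here is a minimal sketch with a hypothetical table (not your posts schema) of what a UNIQUE index enforces:
CREATE TABLE demo_users (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `uq_email` (`email`)  -- duplicate emails are rejected
) ENGINE=InnoDB;
INSERT INTO demo_users (email) VALUES ('a@example.com');  -- accepted
INSERT INTO demo_users (email) VALUES ('a@example.com');  -- fails with ERROR 1062: Duplicate entry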

Unique is a constraint that just happens to be implemented by an index.
Use UNIQUE when you need unique values, i.e. no duplicates; otherwise don't. It's really that simple.

Unique keys do not have any benefit over normal keys for data retrieval. A unique key is an index with a constraint: it prevents insertion of duplicate values, so its effect is on writes (inserts and updates), not on reads.

Related

MySQL — Query possible without three SELECT?

Happy New Year's, everyone!
I'll jump right into it. I've inherited a project that includes a very large database. Some tables are upwards of 285.6GiB.
One of the larger tables is user-ratings. The table has the following columns (simplified):
from — VARCHAR(19)
reason — VARCHAR(512)
stars — TINYINT
timestamp — TIMESTAMP
to — VARCHAR(19)
Currently, users can check the ratings of other users. This shows a summary of the ratings they have given and received, as well as their last 5 ratings received. To do this, we currently use the following queries (simplified):
# First query — ratings given from the user
SELECT Avg(`stars`),
Min(`timestamp`),
Count(*),
Count(DISTINCT( `to` ))
INTO avgStarsGiven, firstRatingGivenAt, totalRatingsGiven,
totalUniqueRatingsGiven
FROM `ratings`
WHERE `from` = user;
# Second query — ratings received by the user
SELECT Avg(`stars`),
Min(`timestamp`),
Count(*),
Count(DISTINCT( `from` ))
INTO avgStarsReceived, firstRatingReceivedAt, totalRatingsReceived,
totalUniqueRatingsReceived
FROM `ratings`
WHERE `to` = user;
# Third query — get the last 5 ratings to the user
SELECT * FROM `ratings` WHERE `to` = user ORDER BY `timestamp` DESC LIMIT 5;
Is it possible to retrieve all of this information without having to go over the entire table 3 times?
Thanks in advance!
Edit: The table and version are below:
# 8.0.27-0ubuntu0.20.04.1
CREATE TABLE `ratings` (
`no` int NOT NULL AUTO_INCREMENT,
`from` varchar(19) NOT NULL,
`to` varchar(19) NOT NULL,
`reason` varchar(512) NOT NULL,
`stars` tinyint NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`no`),
KEY `i-ratings-from` (`from`) /*!80000 INVISIBLE */,
KEY `i-ratings-to` (`to`) /*!80000 INVISIBLE */,
KEY `i-ratings-from-to-timestamp` (`from`,`to`,`timestamp`),
CONSTRAINT `fk-ratings-from` FOREIGN KEY (`from`) REFERENCES `users` (`user`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `fk-ratings-to` FOREIGN KEY (`to`) REFERENCES `users` (`user`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=59 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
You probably don't need to go over the entire table in any of these queries, if you define indexes on the columns from and to. When you look for a person by name in a telephone book, do you read the entire book every time?
ALTER TABLE ratings
ADD INDEX (`from`),
ADD INDEX (`to`, `timestamp`);
You can use EXPLAIN to confirm that it's using the index:
EXPLAIN SELECT * FROM `ratings` WHERE `to` = <example-value>
ORDER BY `timestamp` DESC LIMIT 5;
The EXPLAIN report should show you in its rows field that it will examine a small subset of the rows of the table. This is one of the benefits of an index, to narrow down the search efficiently, so a query doesn't need to scan the entire table.
You edited your question above to add the CREATE TABLE definition.
I see that your table already has some indexes, but these indexes aren't tailored very well to the queries you show. You might like to review my presentation How to Design Indexes, Really, or the video.
Also I see that some of your indexes are defined with the INVISIBLE option, which means the optimizer won't use these indexes. Read https://dev.mysql.com/doc/refman/8.0/en/invisible-indexes.html for details.
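If you do want the optimizer to consider those two single-column indexes again, they can be switched back on; a sketch (MySQL 8.0 syntax):
ALTER TABLE `ratings` ALTER INDEX `i-ratings-from` VISIBLE;
ALTER TABLE `ratings` ALTER INDEX `i-ratings-to` VISIBLE;
Whether that helps more than the (`to`, `timestamp`) index suggested above depends on which queries you actually run.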

MySQL composite index effect on joins

I have the following SQL query (DB is MySQL 5):
select
event.full_session_id,
DATE(min(event.date)),
event_exe.user_id,
COUNT(DISTINCT event_pat.user_id)
FROM
event AS event
JOIN event_participant AS event_pat ON
event.pat_id = event_pat.id
JOIN event_participant AS event_exe on
event.exe_id = event_exe.id
WHERE
event_pat.user_id <> event_exe.user_id
GROUP BY
event.full_session_id;
"SHOW CREATE TABLE event":
CREATE TABLE `event` (
`id` int(12) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`session_id` varchar(64) DEFAULT NULL,
`full_session_id` varchar(72) DEFAULT NULL,
`pat_id` int(12) DEFAULT NULL,
`exe_id` int(12) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `SESSION_IDX` (`full_session_id`),
KEY `PAT_ID_IDX` (`pat_id`),
KEY `DATE_IDX` (`date`),
KEY `SESSLOGPATEXEC_IDX` (`full_session_id`,`date`,`pat_id`,`exe_id`)
) ENGINE=MyISAM AUTO_INCREMENT=371955 DEFAULT CHARSET=utf8
"SHOW CREATE TABLE event_participant":
CREATE TABLE `event_participant` (
`id` int(12) NOT NULL AUTO_INCREMENT,
`user_id` varchar(64) NOT NULL,
`alt_user_id` varchar(64) NOT NULL,
`username` varchar(128) NOT NULL,
`usertype` varchar(32) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ALL_UNQ` (`user_id`,`alt_user_id`,`username`,`usertype`),
KEY `USER_ID_IDX` (`user_id`)
) ENGINE=MyISAM AUTO_INCREMENT=5397 DEFAULT CHARSET=utf8
Also, the query itself seems ugly, but this is legacy code on a production system, so we are not expected to change it (at least for now).
The problem is that there are around 36 million records in the event table (on the production system), so there have been frequent crashes of the DB machine due to "Using temporary; Using filesort" processing (they provided these EXPLAIN outputs; unfortunately, I don't have them right now. I'll try to add them to this post later).
The customer asks for a "quick fix" by adding indices. Currently we have indices on full_session_id, pat_id, date (separately) on event and user_id on event_participant.
Thus I'm thinking of creating a composite index (pat_id, exe_id, full_session_id, date) on event. This index covers the fields in the join (equivalent to a WHERE?), then the GROUP BY, then the aggregate (MIN) parts.
This is just an idea; we currently don't have that kind of data volume to test with, so we are trying our best guess first.
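For reference, the index I have in mind would be created roughly like this (just a sketch of the idea; we haven't been able to benchmark it yet):
ALTER TABLE `event` ADD INDEX (`pat_id`, `exe_id`, `full_session_id`, `date`);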
My questions are:
Could the index above help performance? (It's quite confusing because I have found two really contrasting results: https://dba.stackexchange.com/questions/158385/compound-index-on-inner-join-table versus Separate Join clause in a Composite Index, where the latter suggests that a composite index on joins won't work and the former that it will.)
Does this path (adding indices) have any hope? Or should we forget it and just try to optimize the query instead?
Thanks in advance for your help :)
Update:
I have updated the full table description for the two related tables.
MySQL version is 5.1.69. But I think we don't need to worry about the ambiguous data issue mentioned in the comments, because it seems there won't be ambiguity in our data. Specifically, for each full_session_id there is only one "event_exe.user_id" returned (it's just business logic in the application).
So, what do you think about my 2 questions?

Subquery processing more rows than necessary

I am optimising my queries and found something I can't get my head around.
I am using the following query to select a bunch of categories, combining them with an alias from a table containing old and new aliases for categories:
SELECT `c`.`id` AS `category.id`,
(SELECT `alias`
FROM `aliases`
WHERE category_id = c.id
AND `old` = 0
AND `lang_id` = 1
ORDER BY `id` DESC
LIMIT 1) AS `category.alias`
FROM (`categories` AS c)
WHERE `c`.`status` = 1 AND `c`.`parent_id` = '11';
There are only 2 categories with a value of 11 for parent_id, so it should look up 2 categories from the alias table.
Still, if I use EXPLAIN it says it has to process 48 rows. The alias table contains 1 entry per category (in this case; there can be more). Everything is indexed, so if I understand correctly it should find the correct alias immediately.
Now here's the weird thing: when I don't match the aliases against the categories from the outer query, but manually against the category ids the query returns, it processes only 1 row, as intended with the index.
So I replace WHERE category_id = c.id with WHERE category_id IN (37, 43) and the query gets faster.
The only thing I can think of is that the subquery isn't run over the results from the query but before some filtering is done. Any kind of explanation or help is welcome!
Edit: silly me, the WHERE IN doesn't work as it doesn't make a unique selection. The question still stands though!
Create table schema
CREATE TABLE `aliases` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lang_id` int(2) unsigned NOT NULL DEFAULT '1',
`alias` varchar(255) DEFAULT NULL,
`product_id` int(10) unsigned DEFAULT NULL,
`category_id` int(10) unsigned DEFAULT NULL,
`brand_id` int(10) unsigned DEFAULT NULL,
`page_id` int(10) unsigned DEFAULT NULL,
`campaign_id` int(10) unsigned DEFAULT NULL,
`old` tinyint(1) unsigned DEFAULT '0',
PRIMARY KEY (`id`),
KEY `product_id` (`product_id`),
KEY `category_id` (`category_id`),
KEY `page_id` (`page_id`),
KEY `alias_product_id` (`product_id`,`alias`),
KEY `alias_category_id` (`category_id`,`alias`),
KEY `alias_page_id` (`page_id`,`alias`),
KEY `alias_brand_id` (`brand_id`,`alias`),
KEY `alias_product_id_old` (`alias`,`product_id`,`old`),
KEY `alias_category_id_old` (`alias`,`category_id`,`old`),
KEY `alias_brand_id_old` (`alias`,`brand_id`,`old`),
KEY `alias_page_id_old` (`alias`,`page_id`,`old`),
KEY `lang_brand_old` (`lang_id`,`brand_id`,`old`),
KEY `id_category_id_lang_id_old` (`lang_id`,`old`,`id`,`category_id`)
) ENGINE=InnoDB AUTO_INCREMENT=112392 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
SELECT ...
WHERE x=1 AND y=2
ORDER BY id DESC
LIMIT 1
will be performed in one of several ways.
Since you have not shown us the indexes you have (SHOW CREATE TABLE), I will cover some likely cases...
INDEX(x, y, id) -- This can find the last row for that condition, so it does not need to look at more than one row.
Some other index, or no index: Scan DESCending from the last id checking each row for x=1 AND y=2, stopping when (if) such a row is found.
Some other index, or no index: Scan the entire table, checking each row for x=1 AND y=2; collect them into a temp table; sort by id; deliver one row.
Some of the EXPLAIN clues:
Using where -- does not say much
Using filesort -- it did a sort, apparently for the ORDER BY. (It may have been entirely done in RAM; ignore 'file'.)
Using index condition (not "Using index") -- this indicates an internal optimization in which it can check the WHERE clause more efficiently than it used to in older versions.
Do not trust the "Rows" in EXPLAIN. Often they are reasonably correct, but sometimes they are off by orders of magnitude. Here is a better way to see "how much work" is being done in a rather fast query:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
With the CREATE TABLE, I may have suggestions on how to improve the index.
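Now that the CREATE TABLE is in the question: mapping the INDEX(x, y, id) case above onto your subquery (equalities on category_id, lang_id and old, then ORDER BY id DESC LIMIT 1), a candidate would be something like:
ALTER TABLE `aliases` ADD INDEX (`category_id`, `lang_id`, `old`, `id`);
Note that none of the existing indexes put all three equality columns ahead of `id`; `id_category_id_lang_id_old`, for example, has `category_id` after `id`, so it cannot use the category to narrow the scan before ordering.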

Relatively simple SQL query with join refuses to be efficient

I'm having some problems optimizing a certain query in SQL (using MariaDB). To give you some context: I have a system with "events" (think of them as log entries) that can occur on tickets, but also on some other objects besides tickets (which is why I separated the event and ticket_event tables). I want to get all ticket_events sorted by display_time. The event table has ~20M rows right now.
CREATE TABLE IF NOT EXISTS `event` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(255) DEFAULT NULL,
`data` text,
`display_time` datetime DEFAULT NULL,
`created_time` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_for_display_time_and_id` (`id`,`display_time`),
KEY `index_for_display_time` (`display_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `ticket_event` (
`id` int(11) NOT NULL,
`ticket_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `ticket_id` (`ticket_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `ticket_event`
ADD CONSTRAINT `ticket_event_ibfk_1` FOREIGN KEY (`id`) REFERENCES `event` (`id`),
ADD CONSTRAINT `ticket_event_ibfk_2` FOREIGN KEY (`ticket_id`) REFERENCES `ticket` (`id`);
As you can see, I already played around with some keys (I also made one for (id, ticket_id) that doesn't show up here since I removed it again). The query I execute:
SELECT * FROM ticket_event
INNER JOIN event ON event.id = ticket_event.id
ORDER BY display_time DESC
LIMIT 25
That query takes quite a while to execute (~30s if I filter on a specific ticket_id; I can't even complete it reliably without filtering on it). If I run an EXPLAIN on the query, it shows it does a filesort + temporary.
I played around with force index etc. a bit, but that doesn't seem to solve anything or I did it wrong.
Does anyone see what I did wrong or what I can optimize here? I would very much prefer not to make "event" a wide table by adding ticket_id/host_id etc. as columns and just making them NULL if they don't apply.
Thanks in advance!
EDIT: I added an extra image of the EXPLAIN with the actual rows in the table.
OK what if you try to force the index?
SELECT * FROM ticket_event
INNER JOIN event
FORCE INDEX (index_for_display_time)
ON event.id = ticket_event.id
ORDER BY display_time DESC
LIMIT 25;
Your query selects every column from every row, even if you use a LIMIT. Have you tried to select one specific row by id?
KEY `index_for_display_time_and_id` (`id`,`display_time`),
is useless; DROP it. It is useless because you are using InnoDB, which stores the data "clustered" on the PK (id).
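A sketch of that drop:
ALTER TABLE `event` DROP INDEX `index_for_display_time_and_id`;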
Please change ticket_event.id to event_id. id is confusing because it feels like the PK of the mapping table, which it is. But wait! That does not make sense? There is only one ticket for each event? Then why does ticket_event exist at all? Why not put ticket_id in event?
For a many-to-many table, do
CREATE TABLE IF NOT EXISTS `ticket_event` (
`event_id` int(11) NOT NULL,
`ticket_id` int(11) NOT NULL,
PRIMARY KEY (`event_id`, `ticket_id`), -- for lookups in one direction
KEY (`ticket_id`, `event_id`) -- for lookups in the other direction
) ENGINE=InnoDB;
Maybe you will achieve better performance by trying this:
SELECT *
FROM ticket_event
INNER JOIN (select * from event ORDER BY display_time DESC limit 25) as b
ON b.id = ticket_event.id;

Mysql Join Query optimization

I have two tables in mysql:
Results Table : 1046928 rows.
Nodes Table : 50 rows.
I am joining these two tables with the following query and the execution of the query is very very slow.
select res.TIndex, res.PNumber, res.Sender, res.Receiver,
sta.Nickname, rta.Nickname from ((Results res join
Nodes sta) join Nodes rta) where ((res.sender_h=sta.name) and
(res.receiver_h=rta.name));
Please help me optimize this query. Right now, even if I want to pull just the top 5 rows, it takes about 5-6 minutes. Thank you.
CREATE TABLE `nodes1` (
`NodeID` int(11) NOT NULL,
`Name` varchar(254) NOT NULL,
`Nickname` varchar(254) NOT NULL,
PRIMARY KEY (`NodeID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `Results1` (
`TIndex` int(11) NOT NULL,
`PNumber` int(11) NOT NULL,
`Sender` varchar(254) NOT NULL,
`Receiver` varchar(254) NOT NULL,
`PTime` datetime NOT NULL,
PRIMARY KEY (`TIndex`,`PNumber`),
KEY `PERIOD_TIME_IDX` (`PTime`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
SELECT res.TIndex ,
res.PNumber ,
res.Sender ,
res.Receiver ,
sta.Nickname ,
rta.Nickname
FROM Results AS res
INNER JOIN Nodes AS sta ON res.sender_h = sta.name
INNER JOIN Nodes AS rta ON res.receiver_h = rta.NAME
Create an index on Results (sender_h)
Create an index on Results (receiver_h)
Create an index on Nodes (name)
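Expressed as DDL, that would be roughly the following (a sketch; the index names are made up, and the table/column names follow the query rather than the slightly different CREATE TABLE statements shown):
ALTER TABLE Results ADD INDEX idx_results_sender_h (sender_h);
ALTER TABLE Results ADD INDEX idx_results_receiver_h (receiver_h);
ALTER TABLE Nodes ADD INDEX idx_nodes_name (Name);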
Joining on the node's name rather than NodeId (the primary key) doesn't look good at all.
Perhaps you should be storing NodeID as a foreign key for sender and receiver in the Results table instead of the name. Adding foreign key constraints is a good idea too; among other things, InnoDB will create an index on the foreign key column automatically if a usable one doesn't already exist.
If this change is difficult, at the very least you should enforce uniqueness on the node's Name field.
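A sketch of that uniqueness constraint (the key name is made up):
ALTER TABLE Nodes ADD UNIQUE KEY uq_nodes_name (Name);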
If you change the table definitions in this manner, switch your query to John's recommendation, and add the indexes, it should run a lot better and be in much more readable form.