I'm having some problems optimizing a certain query in SQL (using MariaDB). To give you some context: I have a system with "events" (think of them as log entries) that can occur on tickets, but also on some other objects besides tickets (which is why I separated the event and ticket_event tables). I want to get all ticket_events sorted by display_time. The event table has ~20M rows right now.
CREATE TABLE IF NOT EXISTS `event` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(255) DEFAULT NULL,
`data` text,
`display_time` datetime DEFAULT NULL,
`created_time` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_for_display_time_and_id` (`id`,`display_time`),
KEY `index_for_display_time` (`display_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `ticket_event` (
`id` int(11) NOT NULL,
`ticket_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `ticket_id` (`ticket_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `ticket_event`
ADD CONSTRAINT `ticket_event_ibfk_1` FOREIGN KEY (`id`) REFERENCES `event` (`id`),
ADD CONSTRAINT `ticket_event_ibfk_2` FOREIGN KEY (`ticket_id`) REFERENCES `ticket` (`id`);
As you can see, I already played around with some keys (I also made one for (id, ticket_id), which doesn't show up here because I removed it again). The query I execute:
SELECT * FROM ticket_event
INNER JOIN event ON event.id = ticket_event.id
ORDER BY display_time DESC
LIMIT 25
That query takes quite a while to execute (~30s if I filter on a specific ticket_id; without that filter it doesn't even complete reliably). If I run EXPLAIN on the query, it shows that it uses a filesort and a temporary table.
I played around with force index etc. a bit, but that doesn't seem to solve anything or I did it wrong.
Does anyone see what I did wrong or what I can optimize here? I would very much prefer not to make "event" a wide table by adding ticket_id/host_id etc. as columns and just making them NULL if they don't apply.
Thanks in advance!
OK what if you try to force the index?
SELECT * FROM ticket_event
INNER JOIN event
FORCE INDEX (index_for_display_time)
ON event.id = ticket_event.id
ORDER BY display_time DESC
LIMIT 25;
Your query selects every column from every row, even if you use a LIMIT. Have you tried to select one specific row by id?
KEY `index_for_display_time_and_id` (`id`,`display_time`),
is useless; DROP it. It is useless because you are using InnoDB, which stores the data "clustered" on the PK (id).
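A minimal sketch of dropping that index, using the table and index names from the schema in the question. Because InnoDB secondary indexes already carry the primary key, the remaining index on display_time alone still gives access to id:

```sql
-- Remove the redundant (id, display_time) index; the PK already clusters rows on id.
ALTER TABLE `event` DROP INDEX `index_for_display_time_and_id`;

-- Check which indexes remain on the table.
SHOW INDEX FROM `event`;
```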
Please change ticket_event.id to event_id. id is confusing because it feels like the PK of the mapping table, which it is. But wait! That does not make sense? There is only one ticket for each event? Then why does ticket_event exist at all? Why not put ticket_id in event?
For a many-to-many table, do
CREATE TABLE IF NOT EXISTS `ticket_event` (
`event_id` int(11) NOT NULL,
`ticket_id` int(11) NOT NULL,
PRIMARY KEY (`event_id`, `ticket_id`), -- for lookup in one direction
KEY (`ticket_id`, `event_id`) -- for the other direction
) ENGINE=InnoDB;
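Maybe you will get better performance by trying the query below. Note that it takes the 25 newest events first and joins afterwards, so it can return fewer than 25 rows if some of those events are not ticket events: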
SELECT *
FROM ticket_event
INNER JOIN (select * from event ORDER BY display_time DESC limit 25) as b
ON b.id = ticket_event.id;
Happy New Year's, everyone!
I'll jump right into it. I've inherited a project that includes a very large database. Some tables are upwards of 285.6GiB.
One of the larger tables is user-ratings. The table has the following columns (simplified):
from — VARCHAR(19)
reason — VARCHAR(512)
stars — TINYINT
timestamp — TIMESTAMP
to — VARCHAR(19)
Currently, users can check the ratings of other users. This shows a summary of the ratings they have given and received, as well as their last 5 ratings received in full. To do this, we currently use the following queries (simplified):
# First query — ratings given from the user
SELECT Avg(`stars`),
Min(`timestamp`),
Count(*),
Count(DISTINCT( `to` ))
INTO avgStarsGiven, firstRatingGivenAt, totalRatingsGiven,
totalUniqueRatingsGiven
FROM `ratings`
WHERE `from` = user;
# Second query — ratings received by the user
SELECT Avg(`stars`),
Min(`timestamp`),
Count(*),
Count(DISTINCT( `from` ))
INTO avgStarsReceived, firstRatingReceivedAt, totalRatingsReceived,
totalUniqueRatingsReceived
FROM `ratings`
WHERE `to` = user;
# Third query — get the last 5 ratings to the user
SELECT * FROM `ratings` WHERE `to` = user ORDER BY `timestamp` DESC LIMIT 5;
Is it possible to retrieve all of this information without having to go over the entire table 3 times?
Thanks in advance!
Edit: The table and version are below:
# 8.0.27-0ubuntu0.20.04.1
CREATE TABLE `ratings` (
`no` int NOT NULL AUTO_INCREMENT,
`from` varchar(19) NOT NULL,
`to` varchar(19) NOT NULL,
`reason` varchar(512) NOT NULL,
`stars` tinyint NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`no`),
KEY `i-ratings-from` (`from`) /*!80000 INVISIBLE */,
KEY `i-ratings-to` (`to`) /*!80000 INVISIBLE */,
KEY `i-ratings-from-to-timestamp` (`from`,`to`,`timestamp`),
CONSTRAINT `fk-ratings-from` FOREIGN KEY (`from`) REFERENCES `users` (`user`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `fk-ratings-to` FOREIGN KEY (`to`) REFERENCES `users` (`user`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=59 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
You probably don't need to go over the entire table in any of these queries, if you define indexes on the columns from and to. When you look for a person by name in a telephone book, do you read the entire book every time?
ALTER TABLE ratings
ADD INDEX (`from`),
ADD INDEX (`to`, `timestamp`);
You can use EXPLAIN to confirm that it's using the index:
EXPLAIN SELECT * FROM `ratings` WHERE `to` = <example-value>
ORDER BY `timestamp` DESC LIMIT 5;
The EXPLAIN report should show you in its rows field that it will examine a small subset of the rows of the table. This is one of the benefits of an index, to narrow down the search efficiently, so a query doesn't need to scan the entire table.
You edited your question above to add the CREATE TABLE definition.
I see that your table already has some indexes, but these indexes aren't tailored very well to the queries you show. You might like to review my presentation How to Design Indexes, Really, or the video.
Also I see that some of your indexes are defined with the INVISIBLE option, which means the optimizer won't use these indexes. Read https://dev.mysql.com/doc/refman/8.0/en/invisible-indexes.html for details.
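As a sketch of what that could look like on MySQL 8.0, the two invisible indexes from the posted table definition can be switched back to visible and then checked. Whether you want them visible at all, or would rather replace them with the (`to`, `timestamp`) index suggested above, is a design decision this example does not make for you:

```sql
-- Make the two invisible indexes usable by the optimizer again (MySQL 8.0+).
ALTER TABLE ratings
  ALTER INDEX `i-ratings-from` VISIBLE,
  ALTER INDEX `i-ratings-to` VISIBLE;

-- Confirm the visibility of every index on the table.
SELECT INDEX_NAME, COLUMN_NAME, IS_VISIBLE
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'ratings';
```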
I have a query that takes about 20 seconds, and I would like to understand if there is a way to optimize it.
Table 1:
CREATE TABLE IF NOT EXISTS `sessions` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=9845765 ;
And table 2:
CREATE TABLE IF NOT EXISTS `access` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`session_id` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `session_id` (`session_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=9467799 ;
Now, what I am trying to do is count all the access rows connected to all sessions of one user, so my query is:
SELECT COUNT(*)
FROM access
INNER JOIN sessions ON access.session_id=sessions.id
WHERE sessions.user_id='6';
It takes almost 20 seconds, and for user_id 6 there are about 3 million sessions stored.
Is there anything I can do to optimize that query?
Change this line from the sessions table:
KEY `user_id` (`user_id`)
To this:
KEY `user_id` (`user_id`, `id`)
What this will do for you is allow you to complete the query from the index, without going back to the raw table. As it is, you need to do an index scan on the sessions table for your user_id, and for each item go back to the table to find the id for the join to the access table. By including the id in the index, you can skip going back to the table.
Sadly, this will make your inserts into that table slower, and it seems like this may be a big deal, given that just one user has 3 million sessions. SQL Server and Oracle would address this by allowing you to include the id column in your index without actually indexing on it, saving a little work at insert time, and also by allowing you to specify a lower fill factor for the index, reducing the need to rebuild or reorder the indexes at insert, but MySQL doesn't support these.
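The index change described above could be applied roughly as follows. The table and column names come from the question; the check via EXPLAIN is only a suggestion, and the exact plan will depend on your data and server version:

```sql
-- Replace the single-column index with one that also contains id,
-- so the join column can be read from the index itself.
ALTER TABLE sessions
  DROP INDEX user_id,
  ADD INDEX user_id (user_id, id);

-- Inspect the plan; "Using index" in the Extra column for sessions
-- would indicate the lookup is served from the index alone.
EXPLAIN
SELECT COUNT(*)
FROM access
INNER JOIN sessions ON access.session_id = sessions.id
WHERE sessions.user_id = 6;
```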
So I've got a table with all users and their values, and I want to order them by how much "money" they have. The problem is that they have money in two separate fields: users.money and users.bank.
So this is my table structure:
CREATE TABLE IF NOT EXISTS `users` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(54) COLLATE utf8_swedish_ci NOT NULL,
`money` bigint(54) NOT NULL DEFAULT '10000',
`bank` bigint(54) NOT NULL DEFAULT '10000',
PRIMARY KEY (`id`),
KEY `users_all_money` (`money`,`bank`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci AUTO_INCREMENT=100 ;
And this is the query:
SELECT id, (money+bank) AS total FROM users FORCE INDEX (users_all_money) ORDER BY total DESC
Which works fine, but when I run EXPLAIN it shows "Using filesort", and I'm wondering if there is any way to optimize it?
Because you want to sort by a derived value (one that must be calculated for each row) MySQL can't use the index to help with the ordering.
The only solution that I can see would be to create an additional total_money (or similar) column and, whenever you update money or bank, update that value too. You could do this in your application code, or it would be possible to do this in MySQL with triggers if you wanted.
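A minimal sketch of the trigger variant, assuming the users table from the question. The column name total_money follows the suggestion above; the index and trigger names are made up for illustration. On MySQL 5.7+ a stored generated column would be an alternative to the triggers, but that is a different technique from the one described here:

```sql
-- Add the precomputed column plus an index that can serve the ORDER BY.
ALTER TABLE users
  ADD COLUMN total_money BIGINT NOT NULL DEFAULT 0,
  ADD INDEX users_total_money (total_money);

-- Backfill existing rows once.
UPDATE users SET total_money = money + bank;

-- Keep the column in sync on every insert and update (hypothetical trigger names).
CREATE TRIGGER users_total_money_bi BEFORE INSERT ON users
FOR EACH ROW SET NEW.total_money = NEW.money + NEW.bank;

CREATE TRIGGER users_total_money_bu BEFORE UPDATE ON users
FOR EACH ROW SET NEW.total_money = NEW.money + NEW.bank;

-- The sort now runs on a plain indexed column instead of an expression.
SELECT id, total_money AS total
FROM users
ORDER BY total_money DESC;
```

Running EXPLAIN on the final SELECT is the way to check whether "Using filesort" has actually gone away on your server.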
I read but I'm still confused when to use a normal index or a unique index in MySQL. I have a table that stores posts and responses (id, parentId). I have set up three normal indices for parentId, userId, and editorId.
Would using unique indices benefit me in any way given the following types of queries I will generally run? And why?
Most of my queries will return a post and its responses:
SELECT * FROM posts WHERE id = #postId OR parentId = #postId ORDER BY postTypeId
Some times I will add a join to get user data:
SELECT * FROM posts
JOIN users AS owner ON owner.id = posts.userId
LEFT JOIN users AS editor ON editor.id = posts.editorId
WHERE posts.id = #postId OR posts.parentId = #postId ORDER BY posts.postTypeId
Other times I may ask for a user and his/her posts:
SELECT * FROM users
LEFT JOIN posts ON users.id = posts.userid
WHERE users.id = #userId
My schema looks like this:
CREATE TABLE `posts` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`posttypeid` int(10) NOT NULL,
`parentid` int(10) DEFAULT NULL,
`body` text NOT NULL,
`userid` int(10) NOT NULL,
`editorid` int(10) NOT NULL,
`updatedat` datetime DEFAULT NULL,
`createdat` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `userId` (`userid`),
KEY `editorId` (`editorid`),
KEY `parentId` (`parentid`)
) ENGINE=InnoDB AUTO_INCREMENT=572 DEFAULT CHARSET=utf8
When an index is created as UNIQUE, it only adds consistency to your table: inserting a new entry that reuses the same key by mistake will fail, instead of being accepted and leading to strange errors later.
So, you should use it for your IDs when you know there won't be duplicates (it's the default and mandatory for primary keys), but it won't give you any benefits performance-wise. It only gives you a guarantee that you won't have to deal with a specific kind of database corruption caused by a bug in the client code.
However, if you know there can be duplicates (which I assume is the case for your columns userId, editorId, and parentId), using the UNIQUE attribute would be a serious bug: it would forbid multiple posts with the same userId, editorId or parentId.
In short: use it everywhere you can, but in this case you can't.
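To illustrate the difference, here is a small sketch with a hypothetical accounts table (it is not part of the question's schema). The unique key rejects the second insert, whereas a plain KEY such as the ones on posts.userid would accept any number of rows with the same value:

```sql
-- Hypothetical table used only to show the constraint in action.
CREATE TABLE accounts (
  id INT NOT NULL AUTO_INCREMENT,
  email VARCHAR(191) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uq_accounts_email (email)
) ENGINE=InnoDB;

INSERT INTO accounts (email) VALUES ('a@example.com');  -- accepted
INSERT INTO accounts (email) VALUES ('a@example.com');  -- rejected: duplicate entry (error 1062)
```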
Unique is a constraint that just happens to be implemented by the index.
Use unique when you need unique values, i.e. no duplicates. Otherwise don't. It's that simple, really.
Unique keys do not have any benefit over normal keys for data retrieval. Unique keys are indexes with a constraint: they prevent insertion of the same value, so they only matter when rows are inserted or updated.
I have two tables in mysql:
Results Table : 1046928 rows.
Nodes Table : 50 rows.
I am joining these two tables with the following query and the execution of the query is very very slow.
select res.TIndex, res.PNumber, res.Sender, res.Receiver,
sta.Nickname, rta.Nickname from ((Results res join
Nodes sta) join Nodes rta) where ((res.sender_h=sta.name) and
(res.receiver_h=rta.name));
Please help me optimize this query. Right now, if I want to pull just the top 5 rows, it takes about 5-6 MINUTES. Thank you.
CREATE TABLE `nodes1` (
`NodeID` int(11) NOT NULL,
`Name` varchar(254) NOT NULL,
`Nickname` varchar(254) NOT NULL,
PRIMARY KEY (`NodeID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `Results1` (
`TIndex` int(11) NOT NULL,
`PNumber` int(11) NOT NULL,
`Sender` varchar(254) NOT NULL,
`Receiver` varchar(254) NOT NULL,
`PTime` datetime NOT NULL,
PRIMARY KEY (`TIndex`,`PNumber`),
KEY `PERIOD_TIME_IDX` (`PTime`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
SELECT res.TIndex,
res.PNumber,
res.Sender,
res.Receiver,
sta.Nickname,
rta.Nickname
FROM Results AS res
INNER JOIN Nodes AS sta ON res.sender_h = sta.name
INNER JOIN Nodes AS rta ON res.receiver_h = rta.name;
Create an index on Results (sender_h)
Create an index on Results (receiver_h)
Create an index on Nodes (name)
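The three indexes listed above could be created like this. The column names sender_h and receiver_h are taken from the query in the question (they do not appear in the posted CREATE TABLE statements), and the index names are made up for illustration:

```sql
-- Indexes on the join columns of the large table.
CREATE INDEX idx_results_sender_h ON Results (sender_h);
CREATE INDEX idx_results_receiver_h ON Results (receiver_h);

-- Index on the lookup column of the small table.
CREATE INDEX idx_nodes_name ON Nodes (name);
```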
Joining on the node's name rather than NodeId (the primary key) doesn't look good at all.
Perhaps you should be storing NodeId as a foreign key for sender and receiver in the Results table instead of the name. Adding foreign key constraints is a good idea too. Among other things, this might create the indexes automatically, depending on your configuration.
If this change is difficult, at the very least you should enforce uniqueness on the node's name field.
If you change the table definitions in this manner, change your query to John's recommendation, and add indexes, it should run a lot better and be a lot more readable.
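A rough sketch of that schema change, assuming the table names Results and Nodes and the sender_h/receiver_h columns used in the question's query. The new column names, constraint names, and the unique key name are hypothetical, and on a table of about a million rows the backfill should be tested on a copy first:

```sql
-- At the very least, enforce uniqueness on the node name.
ALTER TABLE Nodes ADD UNIQUE KEY uq_nodes_name (Name);

-- Add integer reference columns (hypothetical names) to Results.
ALTER TABLE Results
  ADD COLUMN sender_id INT NULL,
  ADD COLUMN receiver_id INT NULL;

-- Backfill the new columns from the existing name-based relationship.
UPDATE Results AS res
JOIN Nodes AS sta ON res.sender_h = sta.Name
JOIN Nodes AS rta ON res.receiver_h = rta.Name
SET res.sender_id = sta.NodeID,
    res.receiver_id = rta.NodeID;

-- Foreign keys; InnoDB creates the supporting indexes if none exist.
ALTER TABLE Results
  ADD CONSTRAINT fk_results_sender FOREIGN KEY (sender_id) REFERENCES Nodes (NodeID),
  ADD CONSTRAINT fk_results_receiver FOREIGN KEY (receiver_id) REFERENCES Nodes (NodeID);

-- The join then runs on the primary key of Nodes.
SELECT res.TIndex, res.PNumber, res.Sender, res.Receiver,
       sta.Nickname AS sender_nickname,
       rta.Nickname AS receiver_nickname
FROM Results AS res
INNER JOIN Nodes AS sta ON res.sender_id = sta.NodeID
INNER JOIN Nodes AS rta ON res.receiver_id = rta.NodeID;
```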