MySQL: Query possible without three SELECTs?

Happy New Year, everyone!
I'll jump right into it. I've inherited a project that includes a very large database. Some tables are upwards of 285.6GiB.
One of the larger tables is user-ratings. The table has the following columns (simplified):
from — VARCHAR(19)
reason — VARCHAR(512)
stars — TINYINT
timestamp — TIMESTAMP
to — VARCHAR(19)
Currently, users can check the ratings of other users. This shows a summary of the ratings they have given and received, as well as the full details of the last 5 ratings they received. To do this, we currently use the following queries (simplified):
# First query — ratings given from the user
SELECT Avg(`stars`),
Min(`timestamp`),
Count(*),
Count(DISTINCT( `to` ))
INTO avgStarsGiven, firstRatingGivenAt, totalRatingsGiven,
totalUniqueRatingsGiven
FROM `ratings`
WHERE `from` = user;
# Second query — ratings received by the user
SELECT Avg(`stars`),
Min(`timestamp`),
Count(*),
Count(DISTINCT( `from` ))
INTO avgStarsReceived, firstRatingReceivedAt, totalRatingsReceived,
totalUniqueRatingsReceived
FROM `ratings`
WHERE `to` = user;
# Third query — get the last 5 ratings to the user
SELECT * FROM `ratings` WHERE `to` = user ORDER BY `timestamp` DESC LIMIT 5;
Is it possible to retrieve all of this information without having to go over the entire table 3 times?
Thanks in advance!
Edit: The table and version are below:
# 8.0.27-0ubuntu0.20.04.1
CREATE TABLE `ratings` (
`no` int NOT NULL AUTO_INCREMENT,
`from` varchar(19) NOT NULL,
`to` varchar(19) NOT NULL,
`reason` varchar(512) NOT NULL,
`stars` tinyint NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`no`),
KEY `i-ratings-from` (`from`) /*!80000 INVISIBLE */,
KEY `i-ratings-to` (`to`) /*!80000 INVISIBLE */,
KEY `i-ratings-from-to-timestamp` (`from`,`to`,`timestamp`),
CONSTRAINT `fk-ratings-from` FOREIGN KEY (`from`) REFERENCES `users` (`user`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `fk-ratings-to` FOREIGN KEY (`to`) REFERENCES `users` (`user`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=59 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

You probably don't need to go over the entire table in any of these queries, if you define indexes on the columns from and to. When you look for a person by name in a telephone book, do you read the entire book every time?
ALTER TABLE ratings
ADD INDEX (`from`),
ADD INDEX (`to`, `timestamp`);
You can use EXPLAIN to confirm that it's using the index:
EXPLAIN SELECT * FROM `ratings` WHERE `to` = <example-value>
ORDER BY `timestamp` DESC LIMIT 5;
The rows column of the EXPLAIN output should show that it will examine only a small subset of the table's rows. This is one of the benefits of an index: it narrows down the search efficiently, so a query doesn't need to scan the entire table.
You edited your question above to add the CREATE TABLE definition.
I see that your table already has some indexes, but these indexes aren't tailored very well to the queries you show. You might like to review my presentation How to Design Indexes, Really, or the video.
Also I see that some of your indexes are defined with the INVISIBLE option, which means the optimizer won't use these indexes. Read https://dev.mysql.com/doc/refman/8.0/en/invisible-indexes.html for details.
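To make the point about tailoring indexes to the queries concrete, here is a minimal sketch assuming MySQL 8.0 as in the question. The index names are made up for illustration. Each index starts with the column tested in the WHERE clause and then carries every other column the query reads, so the two aggregate queries can be answered from the index alone, and the (`to`, `timestamp`, ...) order also lets the last-5 query read the newest entries without a sort.

```sql
-- Hypothetical index names; adjust to your own naming scheme.
-- ALGORITHM=INPLACE, LOCK=NONE asks InnoDB to build the indexes online;
-- if that is not possible, the statement fails instead of blocking writes.
ALTER TABLE `ratings`
  -- Query 1: WHERE `from` = ?, reads `to`, `timestamp`, `stars`
  ADD INDEX `i-ratings-from-to-timestamp-stars` (`from`, `to`, `timestamp`, `stars`),
  -- Queries 2 and 3: WHERE `to` = ?, query 3 also ORDER BY `timestamp` DESC LIMIT 5
  ADD INDEX `i-ratings-to-timestamp-from-stars` (`to`, `timestamp`, `from`, `stars`),
  ALGORITHM=INPLACE, LOCK=NONE;

-- Alternative: make an existing invisible index usable by the optimizer again.
-- ALTER TABLE `ratings` ALTER INDEX `i-ratings-to` VISIBLE;
```

On a table this size the index build will take a while, so trying it on a copy first is wise. Once the first new index exists, the old `i-ratings-from-to-timestamp` is a leading prefix of it and is probably redundant.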

Related

MySQL examines many more rows (estimated) than expected

I have a user_rates table with two foreign references to users: user_id_owner and user_id_rated.
This is my create table query:
CREATE TABLE `user_rates` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id_owner` int(10) unsigned NOT NULL,
`user_id_rated` int(10) unsigned NOT NULL,
`value` int(11) NOT NULL COMMENT '0 - dislike, 1 - like',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_rates_user_id_rated_foreign` (`user_id_rated`),
KEY `user_rates_user_id_owner_foreign` (`user_id_owner`),
CONSTRAINT `user_rates_user_id_owner_foreign` FOREIGN KEY (`user_id_owner`) REFERENCES `users` (`id`),
CONSTRAINT `user_rates_user_id_rated_foreign` FOREIGN KEY (`user_id_rated`) REFERENCES `users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1825767 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
When I execute this query:
EXPLAIN SELECT
user_id_rated
FROM
`user_rates` AS ur
WHERE
ur.user_id_owner = 10101;
EXPLAIN estimates that 107000 rows will be examined, but the query returns only 60000.
Can you explain why it's examining so many rows when it compares with the equality operator and the compared field is a foreign key?
EDIT
This is what I get from EXPLAIN:
I also want to add several WHERE conditions. My final query looks like this:
Explain SELECT
user_id_rated
FROM
`user_rates` AS ur
WHERE
ur.user_id_owner = 10101
AND (ur.value IN (1, 2, 3)
OR (ur.value = 0
AND ur.created_at > '2020-02-04 00:00:00'));
Output:
It would be nice if the query could be optimized further. I don't understand why the estimated row count isn't reduced.
Steps I tried when optimizing:
Added a composite index on (user_id_owner, value, created_at)
But the estimated row count did not go down, and it is filtering even more rows.
Maybe I am indexing wrong? I really don't know how to make proper indexes. Sorry for the bad question, I am new here. Thanks in advance.
The "rows" is an estimate, often quite far off -- sometimes even worse than your example. The incorrectness of the estimate rarely impacts performance.
You can run ANALYZE TABLE tablename to improve the estimate. But it may still not be better.
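A short sketch of refreshing the statistics and re-checking the estimate. EXPLAIN ANALYZE exists only in MySQL 8.0.18 and later, and the question doesn't say which version is in use; unlike plain EXPLAIN, it actually runs the query and reports real row counts next to the estimates.

```sql
-- Refresh the index statistics the optimizer uses for its row estimates.
ANALYZE TABLE user_rates;

-- Re-check the estimate (the rows column is still only an estimate).
EXPLAIN SELECT user_id_rated
FROM user_rates AS ur
WHERE ur.user_id_owner = 10101;

-- MySQL 8.0.18+ only: executes the query and shows actual rows alongside estimated rows.
EXPLAIN ANALYZE SELECT user_id_rated
FROM user_rates AS ur
WHERE ur.user_id_owner = 10101;
```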
For the current query, use:
( SELECT user_id_rated
FROM `user_rates` AS ur
WHERE ur.user_id_owner = 10101
AND ur.value IN (1, 2, 3)
)
UNION ALL
( SELECT user_id_rated
FROM `user_rates` AS ur
WHERE ur.user_id_owner = 10101
AND ur.value = 0
AND ur.created_at > '2020-02-04 00:00:00'
);
And have the composite (and "covering") indexes:
INDEX(user_id_owner, value, user_id_rated)
INDEX(user_id_owner, value, created_at, user_id_rated)
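Creating those two indexes could look like the following; the index names are hypothetical. Because each index contains every column the branch reads, EXPLAIN for each SELECT of the UNION ALL should show "Using index" in the Extra column when the index is used as a covering index.

```sql
-- Hypothetical index names.
ALTER TABLE user_rates
  -- Covers the branch with value IN (1, 2, 3)
  ADD INDEX owner_value_rated (user_id_owner, value, user_id_rated),
  -- Covers the branch with value = 0 AND created_at > ...
  ADD INDEX owner_value_created_rated (user_id_owner, value, created_at, user_id_rated);
```

The (user_id_owner, value, created_at) index tried in the question is a leading prefix of the second index, so it would become redundant.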
If there are other variations of the query, show us. As you may guess, the details are important.
(The simplified version of the query does not provide any useful information when discussing the real query.)

Relatively simple SQL query with join refuses to be efficient

I'm having some problems optimizing a certain query in SQL (using MariaDB). To give you some context: I have a system with "events" (think of them as log entries) that can occur on tickets, but also on some other objects besides tickets (which is why I separated the event and ticket_event tables). I want to get all ticket_events sorted by display_time. The event table has ~20M rows right now.
CREATE TABLE IF NOT EXISTS `event` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(255) DEFAULT NULL,
`data` text,
`display_time` datetime DEFAULT NULL,
`created_time` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_for_display_time_and_id` (`id`,`display_time`),
KEY `index_for_display_time` (`display_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `ticket_event` (
`id` int(11) NOT NULL,
`ticket_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `ticket_id` (`ticket_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `ticket_event`
ADD CONSTRAINT `ticket_event_ibfk_1` FOREIGN KEY (`id`) REFERENCES `event` (`id`),
ADD CONSTRAINT `ticket_event_ibfk_2` FOREIGN KEY (`ticket_id`) REFERENCES `ticket` (`id`);
As you can see, I already played around with some keys (I also made one for (id, ticket_id), which doesn't show up here since I removed it again). The query I execute:
SELECT * FROM ticket_event
INNER JOIN event ON event.id = ticket_event.id
ORDER BY display_time DESC
LIMIT 25
That query takes quite a while to execute (~30 s if I filter on a specific ticket_id; without that filter it can't even complete reliably). If I run EXPLAIN on the query, it shows it does a filesort + temporary:
I played around with FORCE INDEX etc. a bit, but that doesn't seem to solve anything, or I did it wrong.
Does anyone see what I did wrong or what I can optimize here? I would very much prefer not to make "event" a wide table by adding ticket_id/host_id etc. as columns and just making them NULL if they don't apply.
Thanks in advance!
EDIT: Extra image of EXPLAIN with actual rows in the table:
OK, what if you try to force the index?
SELECT * FROM ticket_event
INNER JOIN event
FORCE INDEX (index_for_display_time)
ON event.id = ticket_event.id
ORDER BY display_time DESC
LIMIT 25;
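Another way to push the optimizer toward the display_time index is to fix the join order as well. This is a sketch, not a guaranteed fix: STRAIGHT_JOIN (supported by both MySQL and MariaDB) makes event the first table read, so the server can walk index_for_display_time newest-first, probe ticket_event by its primary key, and stop after 25 matches. It only works well if ticket events are a reasonable share of all events; if they are rare, a large part of the index may be scanned before 25 matches turn up, so check the plan with EXPLAIN.

```sql
-- STRAIGHT_JOIN: join tables in the order written (event first, then ticket_event).
-- FORCE INDEX: read event through index_for_display_time, newest rows first.
SELECT STRAIGHT_JOIN event.*, ticket_event.ticket_id
FROM event FORCE INDEX (index_for_display_time)
INNER JOIN ticket_event ON ticket_event.id = event.id
ORDER BY event.display_time DESC
LIMIT 25;
```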
Your query selects every column, and because the ORDER BY isn't served by an index here, every joined row has to be read and sorted before the LIMIT is applied. Have you tried selecting one specific row by id?
KEY `index_for_display_time_and_id` (`id`,`display_time`),
is useless; DROP it. It is useless because you are using InnoDB, which stores the data "clustered" on the PK (id).
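Dropping it is a one-line change, and SHOW INDEX then lists what remains. The PRIMARY KEY on id still serves the foreign key from ticket_event, so nothing else depends on the dropped index.

```sql
-- Drop the redundant index; InnoDB already clusters rows by the PRIMARY KEY (id).
ALTER TABLE event DROP INDEX index_for_display_time_and_id;

-- Confirm which indexes remain on the table.
SHOW INDEX FROM event;
```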
Please change ticket_event.id to event_id. id is confusing because it feels like the PK of the mapping table, which it is. But wait! That does not make sense? There is only one ticket for each event? Then why does ticket_event exist at all? Why not put ticket_id in event?
For a many-to-many table, do
CREATE TABLE IF NOT EXISTS `ticket_event` (
`event_id` int(11) NOT NULL,
`ticket_id` int(11) NOT NULL,
PRIMARY KEY (`event_id`, ticket_id), -- for lookup one direction
KEY (`ticket_id`, event_id) -- for the other direction
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
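Maybe you will get better performance by trying this:
SELECT *
FROM ticket_event
INNER JOIN (SELECT * FROM event ORDER BY display_time DESC LIMIT 25) AS b
ON b.id = ticket_event.id
ORDER BY b.display_time DESC;
-- Note: the derived table picks the 25 newest events of any type,
-- so fewer than 25 rows come back if some of them are not ticket events.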

Optimize MySQL count query with JOIN

I have a query that takes about 20 seconds, I would like to understand if there is a way to optimize it.
Table 1:
CREATE TABLE IF NOT EXISTS `sessions` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=9845765 ;
And table 2:
CREATE TABLE IF NOT EXISTS `access` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`session_id` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `session_id` (`session_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=9467799 ;
Now, what I am trying to do is count all the access rows connected to all sessions of one user, so my query is:
SELECT COUNT(*)
FROM access
INNER JOIN sessions ON access.session_id = sessions.id
WHERE sessions.user_id = 6;
It takes almost 20 seconds... and for user_id 6 there are about 3 million sessions stored.
Is there anything I can do to optimize that query?
Change this line in the sessions table:
KEY `user_id` (`user_id`)
To this:
KEY `user_id` (`user_id`, `id`)
What this will do for you is allow you to complete the query from the index, without going back to the raw table. As it is, you need to do an index scan on the sessions table for your user_id, and for each entry go back to the table to find the id for the join to the access table. By including the id in the index, you can skip going back to the table. (This applies because the tables are MyISAM; an InnoDB secondary index already contains the primary key.)
Sadly, this will make your inserts into that table slower, and it seems like this may be a big deal, given that just one user has 3 million sessions. SQL Server and Oracle would address this by allowing you to include the id column in your index without actually indexing on it, saving a little work at insert time, and also by allowing you to specify a lower fill factor for the index, reducing the need to rebuild or reorder the indexes on insert, but MySQL doesn't support these.
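A sketch of the index change, using the table names from the CREATE TABLE statements above. Dropping and re-adding an index under the same name in one ALTER TABLE is allowed, so no separate step is needed. Afterwards, EXPLAIN ideally shows "Using index" for both tables, meaning neither data file is touched.

```sql
-- Replace the single-column index with a composite one, so the join key (id)
-- can be read straight from the index (MyISAM secondary indexes don't contain the PK).
ALTER TABLE sessions
  DROP INDEX user_id,
  ADD INDEX user_id (user_id, id);

-- Check the plan: look for "Using index" in the Extra column.
EXPLAIN SELECT COUNT(*)
FROM access
INNER JOIN sessions ON access.session_id = sessions.id
WHERE sessions.user_id = 6;
```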

Order by two fields - Indexing

So I've got a table with all users and their values, and I want to order them by how much "money" they have. The problem is that they have money in two separate fields: users.money and users.bank.
So this is my table structure:
CREATE TABLE IF NOT EXISTS `users` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(54) COLLATE utf8_swedish_ci NOT NULL,
`money` bigint(54) NOT NULL DEFAULT '10000',
`bank` bigint(54) NOT NULL DEFAULT '10000',
PRIMARY KEY (`id`),
KEY `users_all_money` (`money`,`bank`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci AUTO_INCREMENT=100 ;
And this is the query:
SELECT id, (money+bank) AS total FROM users FORCE INDEX (users_all_money) ORDER BY total DESC
Which works fine, but when I run EXPLAIN it shows "Using filesort", and I'm wondering if there is any way to optimize it?
Because you want to sort by a derived value (one that must be calculated for each row), MySQL can't use the index to help with the ordering.
The only solution that I can see would be to create an additional total_money or similar column and as you update money or bank update that value too. You could do this in your application code or it would be possible to do this in MySQL with triggers too if you wanted.
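On MySQL 5.7 or later, a generated column can keep that total up to date without application code or triggers. A minimal sketch; the column and index names are hypothetical, and adding a STORED column rebuilds the table, so expect it to take a while on a large table.

```sql
-- Requires MySQL 5.7+ (generated columns). total_money is maintained automatically
-- whenever money or bank changes.
ALTER TABLE users
  ADD COLUMN total_money BIGINT AS (money + bank) STORED,
  ADD INDEX users_total_money (total_money);

-- Sort on the indexed column instead of the expression.
SELECT id, total_money
FROM users
ORDER BY total_money DESC;
```

Because an InnoDB secondary index also contains the primary key (id), this query can be answered from users_total_money alone, read in descending order; EXPLAIN should then show "Using index" instead of "Using filesort".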

Should I use a unique index here? And why?

I have read about it, but I'm still confused about when to use a normal index or a unique index in MySQL. I have a table that stores posts and responses (id, parentId). I have set up three normal indices for parentId, userId, and editorId.
Would using unique indices benefit me in any way given the following types of queries I will generally run? And why?
Most of my queries will return a post and its responses:
SELECT * FROM posts WHERE id = #postId OR parentId = #postId ORDER BY postTypeId
Some times I will add a join to get user data:
SELECT * FROM posts
JOIN users AS owner ON owner.id = posts.userId
LEFT JOIN users AS editor ON editor.id = posts.editorId
WHERE posts.id = #postId OR posts.parentId = #postId ORDER BY postTypeId
Other times I may ask for a user and his/her posts:
SELECT * FROM users
LEFT JOIN posts ON users.id = posts.userid
WHERE users.id = #userId
My schema looks like this:
CREATE TABLE `posts` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`posttypeid` int(10) NOT NULL,
`parentid` int(10) DEFAULT NULL,
`body` text NOT NULL,
`userid` int(10) NOT NULL,
`editorid` int(10) NOT NULL,
`updatedat` datetime DEFAULT NULL,
`createdat` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `userId` (`userid`),
KEY `editorId` (`editorid`),
KEY `parentId` (`parentid`)
) ENGINE=InnoDB AUTO_INCREMENT=572 DEFAULT CHARSET=utf8
When an index is created as UNIQUE, it only adds consistency to your table: inserting a new entry that reuses an existing key by mistake will fail, instead of being accepted and leading to strange errors later.
So, you should use it for your IDs when you know there won't be duplicates (it's implicit and mandatory for primary keys), but it won't give you any performance benefits. It only guarantees that you won't have to deal with a specific kind of database corruption caused by a bug in the client code.
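A tiny illustration of that consistency guarantee, using a throwaway table whose name and columns are made up for the example.

```sql
-- Hypothetical demo table: the UNIQUE key turns a duplicate into an error.
CREATE TABLE unique_demo (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(100) NOT NULL,
  UNIQUE KEY uq_email (email)
) ENGINE=InnoDB;

INSERT INTO unique_demo (email) VALUES ('a@example.com');  -- succeeds
INSERT INTO unique_demo (email) VALUES ('a@example.com');  -- fails: ERROR 1062 (23000): Duplicate entry ...
```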
However, if you know there can be duplicates (which I assume is the case for your columns userId, editorId, and parentId), using the UNIQUE attribute would be a serious bug: it would forbid multiple posts with the same userId, editorId or parentId.
In short: use it everywhere you can, but in this case you can't.
Unique is a constraint that just happens to be implemented by the index.
Use unique when you need unique values, i.e. no duplicates. Otherwise don't. That simple, really.
Unique keys do not have any benefit over normal keys for data retrieval. Unique keys are indexes with a constraint: they prevent insertion of the same value and so they only benefit inserts.