MySQL estimates far more rows to examine than expected

I have a user_rates table with two foreign key references to users: user_id_owner and user_id_rated.
This is my CREATE TABLE statement:
CREATE TABLE `user_rates` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id_owner` int(10) unsigned NOT NULL,
`user_id_rated` int(10) unsigned NOT NULL,
`value` int(11) NOT NULL COMMENT '0 - dislike, 1 - like',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_rates_user_id_rated_foreign` (`user_id_rated`),
KEY `user_rates_user_id_owner_foreign` (`user_id_owner`),
CONSTRAINT `user_rates_user_id_owner_foreign` FOREIGN KEY (`user_id_owner`) REFERENCES `users` (`id`),
CONSTRAINT `user_rates_user_id_rated_foreign` FOREIGN KEY (`user_id_rated`) REFERENCES `users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1825767 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
When I execute this query:
EXPLAIN SELECT
user_id_rated
FROM
`user_rates` AS ur
WHERE
ur.user_id_owner = 10101;
EXPLAIN estimates 107000 rows to examine, but the query returns only 60000.
Can you explain why it examines so many rows when the comparison uses the equality operator and the compared column is a foreign key?
EDIT
I am getting this on EXPLAIN
I also want to add several WHERE conditions. My final query looks like this:
EXPLAIN SELECT
user_id_rated
FROM
`user_rates` AS ur
WHERE
ur.user_id_owner = 10101
AND (ur.value IN (1, 2, 3)
OR (ur.value = 0
AND ur.created_at > '2020-02-04 00:00:00'));
Output:
It would be nice if the query could be optimized further. I don't understand why the estimated row count is not going down.
Steps I tried when optimizing:
Added a composite index on (user_id_owner, value, created_at)
But the estimated row count did not go down, and the query is filtering even more rows.
Maybe I am indexing wrong? I really don't know how to design proper indexes. Sorry for the bad question, I am new here. Thanks in advance.

The "rows" is an estimate, often quite far off -- sometimes even worse than your example. The incorrectness of the estimate rarely impacts performance.
You can run ANALYZE TABLE tablename to refresh the statistics behind the estimate, but the estimate may still be off.
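As a rough sketch of that step, using only the table and the simplified query from the question, the statistics can be refreshed and the estimate checked again like this:

```sql
-- Recompute the index statistics that the optimizer's row estimates are based on.
ANALYZE TABLE user_rates;

-- Re-run EXPLAIN and compare the "rows" column with the earlier estimate.
EXPLAIN SELECT user_id_rated
FROM user_rates AS ur
WHERE ur.user_id_owner = 10101;
```

InnoDB derives these statistics from a sample of index pages, so the number can still differ from the real row count after the refresh.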
For the current query, use:
( SELECT user_id_rated
FROM `user_rates` AS ur
WHERE ur.user_id_owner = 10101
AND ur.value IN (1, 2, 3)
)
UNION ALL
( SELECT user_id_rated
FROM `user_rates` AS ur
WHERE ur.user_id_owner = 10101
AND ur.value = 0
AND ur.created_at > '2020-02-04 00:00:00'
);
And have the composite (and "covering") indexes:
INDEX(user_id_owner, value, user_id_rated)
INDEX(user_id_owner, value, created_at, user_id_rated)
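One way to create the two indexes, as a sketch: the index names owner_value_rated and owner_value_created_rated are made up here for illustration and do not come from the answer.

```sql
-- Covering index for the first branch of the UNION ALL (value IN (1, 2, 3)).
-- Covering index for the second branch (value = 0 AND created_at > ...).
-- Both index names are hypothetical.
ALTER TABLE user_rates
  ADD INDEX owner_value_rated (user_id_owner, value, user_id_rated),
  ADD INDEX owner_value_created_rated (user_id_owner, value, created_at, user_id_rated);
```

If the indexes are picked up, running EXPLAIN on each branch separately should ideally show "Using index" in the Extra column, which indicates that the branch was answered from the index alone.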
If there are other variations of the query, show us. As you may guess, the details are important.
(The simplified version of the query does not provide any useful information when discussing the real query.)

Related

MySQL Sum Very Slow On Small Number of Rows

I can't figure out why MySQL is so slow summing fewer than 400 rows. Both u and t have indexes and return the rows quickly.
SELECT sum(t) FROM `s_table`
WHERE `u` LIKE 'dogs%'
AND `t`> 10000
Query took 3.5299 seconds.
If I remove the sum part of the query.
SELECT t FROM `s_table`
WHERE `u` LIKE 'dogs%'
AND `t`> 10000
Query took 0.0090 seconds returns 397 rows.
So to sum 397 rows takes over 3 seconds!
Then I tried.
SELECT SUM(t)
FROM ( SELECT t
FROM s_table
WHERE `u` LIKE 'dogs%'
AND `t`> 10000
) AS total;
Query took 3.5767 seconds, so basically the same as the first query.
I'm going insane here. Why is it taking MySQL over 3 seconds to sum only 397 numbers?
Here is the explain:
CREATE TABLE `s_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`s` varchar(100) NOT NULL,
`v` int(12) NOT NULL,
`c` float NOT NULL,
`r` int(3) NOT NULL,
`u` varchar(350) NOT NULL,
`w` int(1) NOT NULL,
`t` int(12) NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_v` (`v`),
KEY `idx_c` (`c`),
KEY `idx_r` (`r`),
KEY `idx_u` (`u`),
KEY `idx_t` (`t`),
KEY `idx_date` (`date`),
KEY `s` (`s`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED
Change these
KEY `idx_u` (`u`),
KEY `idx_t` (`t`),
to
KEY `u_t` (`u`, `t`),
KEY `t_u` (`t`, `u`),
I said "change", not "add". I have seen cases where "adding" a composite index did not change the Optimizer's choice of index; I think this is a bug. Note that these 2-column indexes are "covering", which, by itself, gives a performance boost.
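A minimal sketch of that change as a single statement, using the index names from the question and the answer:

```sql
-- Replace the two single-column indexes with the two composite ones
-- in one ALTER, so the table is rebuilt at most once.
ALTER TABLE s_table
  DROP INDEX idx_u,
  DROP INDEX idx_t,
  ADD INDEX u_t (u, t),
  ADD INDEX t_u (t, u);
```

Dropping and adding in the same statement avoids a window in which neither the old nor the new index exists.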
Don't use ROW_FORMAT=COMPRESSED; it may be expending a lot of effort in uncompressing to run the query.
The UI you are using seems to stop at 25 rows -- this could explain the extra speed.
What do you get from these? (They may help in analyzing things.)
How many rows "need" to be looked at by each single-column index:
SELECT SUM(u LIKE 'dogs%'), SUM(t > 10000) FROM s_table;
More details than a plain EXPLAIN:
EXPLAIN FORMAT=JSON SELECT ... -- your query
This will definitively say whether 397 or 272580 rows were fetched:
FLUSH STATUS;
SELECT ...; -- your query
SHOW SESSION STATUS LIKE 'Handler%';

MySQL — Query possible without three SELECT?

Happy New Year's, everyone!
I'll jump right into it. I've inherited a project that includes a very large database. Some tables are upwards of 285.6GiB.
One of the larger tables is user-ratings. The table has the following columns (simplified):
from — VARCHAR(19)
reason — VARCHAR(512)
stars — TINYINT
timestamp — TIMESTAMP
to — VARCHAR(19)
Currently, users can check the ratings of other users. This shows a summary of the ratings they have given and received, as well as the full details of the last 5 ratings they received. To do this, we'd currently use the following queries (simplified):
# First query — ratings given from the user
SELECT Avg(`stars`),
Min(`timestamp`),
Count(*),
Count(DISTINCT( `to` ))
INTO avgStarsGiven, firstRatingGivenAt, totalRatingsGiven,
totalUniqueRatingsGiven
FROM `ratings`
WHERE `from` = user;
# Second query — ratings received by the user
SELECT Avg(`stars`),
Min(`timestamp`),
Count(*),
Count(DISTINCT( `from` ))
INTO avgStarsReceived, firstRatingReceivedAt, totalRatingsReceived,
totalUniqueRatingsReceived
FROM `ratings`
WHERE `to` = user;
# Third query — get the last 5 ratings to the user
SELECT * FROM `ratings` WHERE `to` = user ORDER BY `timestamp` DESC LIMIT 5;
Is it possible to retrieve all of this information without having to go over the entire table 3 times?
Thanks in advance!
Edit: The table and version are below:
# 8.0.27-0ubuntu0.20.04.1
CREATE TABLE `ratings` (
`no` int NOT NULL AUTO_INCREMENT,
`from` varchar(19) NOT NULL,
`to` varchar(19) NOT NULL,
`reason` varchar(512) NOT NULL,
`stars` tinyint NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`no`),
KEY `i-ratings-from` (`from`) /*!80000 INVISIBLE */,
KEY `i-ratings-to` (`to`) /*!80000 INVISIBLE */,
KEY `i-ratings-from-to-timestamp` (`from`,`to`,`timestamp`),
CONSTRAINT `fk-ratings-from` FOREIGN KEY (`from`) REFERENCES `users` (`user`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `fk-ratings-to` FOREIGN KEY (`to`) REFERENCES `users` (`user`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=59 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
You probably don't need to go over the entire table in any of these queries, if you define indexes on the columns from and to. When you look for a person by name in a telephone book, do you read the entire book every time?
ALTER TABLE ratings
ADD INDEX (`from`),
ADD INDEX (`to`, `timestamp`);
You can use EXPLAIN to confirm that it's using the index:
EXPLAIN SELECT * FROM `ratings` WHERE `to` = <example-value>
ORDER BY `timestamp` DESC LIMIT 5;
The EXPLAIN report should show you in its rows field that it will examine a small subset of the rows of the table. This is one of the benefits of an index, to narrow down the search efficiently, so a query doesn't need to scan the entire table.
You edited your question above to add the CREATE TABLE definition.
I see that your table already has some indexes, but these indexes aren't tailored very well to the queries you show. You might like to review my presentation How to Design Indexes, Really, or the video.
Also I see that some of your indexes are defined with the INVISIBLE option, which means the optimizer won't use these indexes. Read https://dev.mysql.com/doc/refman/8.0/en/invisible-indexes.html for details.
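A sketch of how the invisible indexes could be inspected and switched back on, assuming MySQL 8.0 (the question reports 8.0.27) and the index names from the CREATE TABLE above:

```sql
-- List each index on the table and whether the optimizer is allowed to use it.
SELECT DISTINCT INDEX_NAME, IS_VISIBLE
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'ratings';

-- Make the two hidden single-column indexes usable by the optimizer again.
ALTER TABLE ratings
  ALTER INDEX `i-ratings-from` VISIBLE,
  ALTER INDEX `i-ratings-to` VISIBLE;
```

Changing visibility only updates metadata and does not rebuild the index, so it is a cheap way to test whether an index helps before deciding to keep or drop it.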

Indexing needs to be sped up

I have a table with the following details:
CREATE TABLE `test` (
`seenDate` datetime NOT NULL DEFAULT '0001-01-01 00:00:00',
`corrected_test` varchar(45) DEFAULT NULL,
`corrected_timestamp` timestamp NULL DEFAULT NULL,
`unable_to_correct` tinyint(1) DEFAULT '0',
`fk_zone_for_correction` int(11) DEFAULT NULL,
PRIMARY KEY (`sightinguid`),
KEY `corrected_test` (`corrected_test`),
KEY `idx_seenDate` (`seenDate`),
KEY `idx_corrected_test_seenDate` (`corrected_test`,`seenDate`),
KEY `zone_for_correction_fk_idx` (`fk_zone_for_correction`),
KEY `idx_corrected_test_zone` (`fk_zone_for_correction`,`corrected_test`,`seenDate`),
CONSTRAINT `zone_for_correction_fk` FOREIGN KEY (`fk_zone_for_correction`) REFERENCES `zone_test` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I am then using the following query:
SELECT
*
FROM
test
WHERE
fk_zone_for_correction = 1
AND (unable_to_correct = 0
OR unable_to_correct IS NULL)
AND (corrected_test = ''
OR corrected_test IS NULL)
AND (last_accessed_timestamp IS NULL
OR last_accessed_timestamp < (NOW() - INTERVAL 30 MINUTE))
ORDER BY seenDate ASC
LIMIT 1
Here is a screenshot of the optimiser output. The ORDER BY is slowing things down, even though in my opinion the table is indexed properly and the correct index (idx_corrected_test_zone) is being selected. What can be done to improve it?
There is no INDEX that will help much.
This might help:
INDEX(fk_zone_for_correction, seenDate)
Both columns can perhaps be used -- the first for filtering, the second for avoiding having to sort. But, it could backfire if it can't find the 1 row quickly.
The killer is OR. If you could avoid ever populating any of those 3 columns with NULL, then this might be better:
INDEX(fk_zone_for_correction, unable_to_correct, corrected_test, last_accessed_timestamp)
-- the range thing needs to be last
-- this index would do the filtering, but fail to help with `ORDER` and `LIMIT`.
Even though it is using idx_corrected_test_zone, it is probably not using more than the first two columns -- because of OR.
You have two cases of redundant indexes. For example, the first of these is the left part of the second; so the first is redundant and can be DROPped:
KEY `corrected_test` (`corrected_test`),
KEY `idx_corrected_test_seenDate` (`corrected_test`,`seenDate`),
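As a sketch, the redundant index shown above could be dropped as follows. The second statement is an assumption about which pair the answer means by the other case: zone_for_correction_fk_idx covers only fk_zone_for_correction, which is also the leftmost column of idx_corrected_test_zone.

```sql
-- corrected_test is the leftmost part of idx_corrected_test_seenDate.
ALTER TABLE test DROP INDEX corrected_test;

-- Assumed second case (not spelled out in the answer): the single-column
-- index is a left prefix of idx_corrected_test_zone. The foreign key still
-- needs an index that starts with fk_zone_for_correction, and
-- idx_corrected_test_zone should be able to serve that purpose.
ALTER TABLE test DROP INDEX zone_for_correction_fk_idx;
```

It is worth running SHOW CREATE TABLE test afterwards to confirm that the foreign key and the remaining indexes look as expected.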

Optimise mysql where, group by, scanning too many rows

This table represents statuses. A user can re-share a status, as on Facebook, hence the original_id column.
CREATE TABLE `user_status` (
`status_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`destination_user_id` int(11) NOT NULL,
`original_id` int(11) DEFAULT NULL,
`type` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`status_id`),
KEY `IDX_1E527E21A76ED395` (`user_id`),
KEY `IDX_1E527E21C957ECED` (`destination_user_id`),
KEY `core_index` (`destination_user_id`,`original_id`),
CONSTRAINT `FK_1E527E21A76ED395` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`),
CONSTRAINT `FK_1E527E21C957ECED` FOREIGN KEY (`destination_user_id`) REFERENCES `users` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=161362 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
I am trying to optimise a query for newsfeed (I removed everything unnecessary just to be able to optimise the core of the query but it's still simply slow).
Query:
EXPLAIN SELECT MAX(us.status_id)
FROM user_status us
WHERE us.destination_user_id IN (25,30,31,32,33,34,35,36,37,38,39,40,42,43,44,46,49,50,51,52,53,55,56,57,58,59,60,62,64,66,68,74,78,79,81,88,91,92,94,96,98,99,100,101,102,106,110,112,113,114,117,124,128,129,133,138,140,144,149,150,151,154,155,156,158,159,160,164,170,174,175,180,184,186,187,210,211,222,225,227,228,231,234,235,236,237,240,264,269,271,276,282,287,289,295,297,298,301,302,311,315,318,322,326,328,345,350,379,396,398,403,404,418,426,428,431,449,460,471,476,477,495,496,506,538,539,540,542,546,551,554,557,559,561,564,571,572,575,585,586,588,590,616,617,624,629,630,641,645,649,654,655,656,657,658,659,660,662,663,673,685,690,693,696,698,724,728,734,737,746,757,760,762,769,791,797,808,829,833,841,857,858,865,878,879,881,888,889,898,919,921,932,937,944,949,950,958,961,965,966,974,980,986,994,996,1005,1012,1013,1019,1020,1027,1044,1062,1079,1081,1097,1121,1122,1131,1140,1174,1178,1199,1214,1219,1221,1259,1261,1262,1268,1277,1282,1294,1300,1307,1320,1330,1331,1333,1336,1350,1361,1371,1388,1393,1440,1464,1482,1497,1507,1509,1511,1513,1514,1525,1537,1558,1569,1572,1573,1577,1584,1588,1591,1593,1627,1644,1645,1666,1688,1716,1729,1735,1751,1756,1803,1818,1828,1867,1871,1876,1914,1935,2038,2047,2058,2072,2074,2085,2106,2153,2168,2197,2232,2279,2355,2359,2511,2560,2651,2773,2803,2812,2818,2829,2835,2841,2865,2891,3032,3051,3095,3100,3148,3412,3476,3578,3623,3808,3853,3968,3976,3992,4045,4047,4069,4077,4119,4156,4237,4271,4280,4285,4337,4348,4644,4711,4872,4898,5084,5108,5110,5248,5254,5266,5268,5315,5318,5553,5716,5744,5768,5782,5784,5794,5815,5883,5920,5921,5985,5987,6016,6070,6364,7067,7522,7571,7733,7800,8259,8421,8640,9743,10039,11900,12344,12794,13419,13468,13548,13778,13829,13892,13902,13910,13976,13977,14042,14056,14171,14175,14176,14210,14255,14258,14279,14301,14343,14394,14465,14501,14538,14650,14656,14657,14805,14807,14813,14970,14975,15110,15174,15277,15284,15306,15354,15404,15649,15710,15776,16084,16099,14752,16516,1130,9770,1127,14200,13950,15842,16406,1561
4,16566,16209,16672,13887,16122,14857,16877,10093,15752,16131,17618,17767,5783,17867,16081,18224,6972,14273,18471,15403,16261,6641,18669,15153,18708,18534,17447,18843,18840,27,61,18656,18336,18006,15337,17197,18999,14360,19023,19002,16856,2885,17237,16560,15575,16297,11199,17836,14313,759,18403,19421,19514,2828,14562,1792,18131,19703,1280,18314,15944,17078,18316,19695,20017,16493,19566,17028,19104,17518,2045,16312,15508,20092,5060,18207,1773,17129,17154,18786,17077,15155,17640,2845,19480,20943,107,2775,21247,3989,20292,19077,20046,18230,18241,18102,19225,
14230,21011,5765,15344,21732,11249,15532,14105,4136,17373,14612,17944,17040,15505,17528,20461,22200,14059,11701,19410,3085,12180,22730,22631,17673,2820,20826,21895,23992,24080,24249,25144,25146,25171,25177,25181,25222,25223,25232,25245,25248,25250,25252,25255,25264,25267,25276,25279,25280,25284,25294,25298,25300,25312,25324,25332,25359,25373,25374,25381,25402,25412,25430,25434,25437,25442,25444,25446,25454,25465,25474,25486,25490,25491,25494,25535,25540,25549,25555,25568,25671,25711,25713,25714,25722,25737,25755,25768,25774,
25783,25784,25839,25854,25886,25889,25891,25913,25926,25956,25967,26026,26043)
GROUP BY us.original_id
ORDER BY us.status_id DESC
LIMIT 0,10;
Explain of the query:
Execution time: 10 rows in set (0,41 sec), MySQL 5.7 (strict mode turned off)
In my opinion, a table as small as 100k rows should perform much better. I tried changing the indexes in various ways, but they seem to be set properly.
Any idea how I could optimise this query to run in 0.0x or 0.1x seconds?
Update
The linked duplicate is not related to my issue and, in my opinion, shouldn't be linked.
Removing the unnecessary extra join resolved the issue.
Now it works as it is supposed to, using a tight index scan: http://dev.mysql.com/doc/refman/5.7/en/group-by-optimization.html
I can't get rid of "Using temporary; Using filesort" even if I change the ORDER BY to "us.original_id", but the execution time is now as expected: 0.08 sec.
Remove the two extra tables, since us seems to be the only one relevant.
It is not valid to ORDER BY status_id since it is not in the GROUP BY, nor (technically) in the SELECT.
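One possible way to make the ordering well defined, as a sketch: it assumes the intent is "newest status per original_id first", introduces the alias latest_status_id for illustration, and uses a shortened, hypothetical subset of the id list.

```sql
SELECT MAX(us.status_id) AS latest_status_id
FROM user_status AS us
WHERE us.destination_user_id IN (25, 30, 31)  -- shortened, hypothetical subset of the ids
GROUP BY us.original_id
ORDER BY latest_status_id DESC                -- sort on the aggregate that is actually selected
LIMIT 10;
```

Sorting on the aggregate still needs a sort step after grouping, so "Using temporary; Using filesort" may remain in EXPLAIN. The difference is that the result order no longer depends on which status_id the server happens to pick for each group.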

Subquery processing more rows than necessary

I am optimising my queries and found something I can't get my head around.
I am using the following query to select a bunch of categories, combining them with an alias from a table containing old and new aliases for categories:
SELECT `c`.`id` AS `category.id`,
(SELECT `alias`
FROM `aliases`
WHERE category_id = c.id
AND `old` = 0
AND `lang_id` = 1
ORDER BY `id` DESC
LIMIT 1) AS `category.alias`
FROM (`categories` AS c)
WHERE `c`.`status` = 1 AND `c`.`parent_id` = '11';
There are only 2 categories with a value of 11 for parent_id, so it should look up 2 categories from the alias table.
Still if I use EXPLAIN it says it has to process 48 rows. The alias table contains 1 entry per category as well (in this case, it can be more). Everything is indexed and if I understand correctly therefore it should find the correct alias immediately.
Now here's the weird thing. When I don't compare the aliases by the categories from the conditions, but manually by the category ids the query returns, it does process only 1 row, as intended with the index.
So I replace WHERE category_id = c.id by WHERE category_id IN (37, 43) and the query gets faster:
The only thing I can think of is that the subquery isn't run over the results from the query but before some filtering is done. Any kind of explanation or help is welcome!
Edit: silly me, the WHERE IN doesn't work as it doesn't make a unique selection. The question still stands though!
Create table schema
CREATE TABLE `aliases` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lang_id` int(2) unsigned NOT NULL DEFAULT '1',
`alias` varchar(255) DEFAULT NULL,
`product_id` int(10) unsigned DEFAULT NULL,
`category_id` int(10) unsigned DEFAULT NULL,
`brand_id` int(10) unsigned DEFAULT NULL,
`page_id` int(10) unsigned DEFAULT NULL,
`campaign_id` int(10) unsigned DEFAULT NULL,
`old` tinyint(1) unsigned DEFAULT '0',
PRIMARY KEY (`id`),
KEY `product_id` (`product_id`),
KEY `category_id` (`category_id`),
KEY `page_id` (`page_id`),
KEY `alias_product_id` (`product_id`,`alias`),
KEY `alias_category_id` (`category_id`,`alias`),
KEY `alias_page_id` (`page_id`,`alias`),
KEY `alias_brand_id` (`brand_id`,`alias`),
KEY `alias_product_id_old` (`alias`,`product_id`,`old`),
KEY `alias_category_id_old` (`alias`,`category_id`,`old`),
KEY `alias_brand_id_old` (`alias`,`brand_id`,`old`),
KEY `alias_page_id_old` (`alias`,`page_id`,`old`),
KEY `lang_brand_old` (`lang_id`,`brand_id`,`old`),
KEY `id_category_id_lang_id_old` (`lang_id`,`old`,`id`,`category_id`)
) ENGINE=InnoDB AUTO_INCREMENT=112392 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
SELECT ...
WHERE x=1 AND y=2
ORDER BY id DESC
LIMIT 1
will be performed in one of several ways.
Since you have not shown us the indexes you have (SHOW CREATE TABLE), I will cover some likely cases...
INDEX(x, y, id) -- This can find the last row for that condition, so it does not need to look at more than one row.
Some other index, or no index: Scan DESCending from the last id checking each row for x=1 AND y=2, stopping when (if) such a row is found.
Some other index, or no index: Scan the entire table, checking each row for x=1 AND y=2; collect them into a temp table; sort by id; deliver one row.
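Applied to the subquery and the aliases schema from the question, the first case might look like the sketch below. The index name category_old_lang_id is hypothetical, and 37 is one of the category ids mentioned in the question.

```sql
-- Equality columns first, then id so the newest matching row is found
-- at one end of the index range without a separate sort.
ALTER TABLE aliases
  ADD INDEX category_old_lang_id (category_id, old, lang_id, id);

-- Check the plan for a single category.
EXPLAIN SELECT alias
FROM aliases
WHERE category_id = 37
  AND old = 0
  AND lang_id = 1
ORDER BY id DESC
LIMIT 1;
```

The index is not covering, because alias is not part of it, but the query should only need to fetch the one row it returns.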
Some of the EXPLAIN clues:
Using where -- does not say much
Using filesort -- it did a sort, apparently for the ORDER BY. (It may have been entirely done in RAM; ignore 'file'.)
Using index condition (not "Using index") -- this indicates an internal optimization in which it can check the WHERE clause more efficiently than it used to in older versions.
Do not trust the "Rows" in EXPLAIN. Often they are reasonably correct, but sometimes they are off by orders of magnitude. Here is a better way to see "how much work" is being done in a rather fast query:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
With the CREATE TABLE, I may have suggestions on how to improve the index.