Optimise mysql where, group by, scanning too many rows - mysql

The table below stores statuses. A user can re-share a status, as on Facebook, hence the original_id column.
CREATE TABLE `user_status` (
`status_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`destination_user_id` int(11) NOT NULL,
`original_id` int(11) DEFAULT NULL,
`type` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`status_id`),
KEY `IDX_1E527E21A76ED395` (`user_id`),
KEY `IDX_1E527E21C957ECED` (`destination_user_id`),
KEY `core_index` (`destination_user_id`,`original_id`),
CONSTRAINT `FK_1E527E21A76ED395` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`),
CONSTRAINT `FK_1E527E21C957ECED` FOREIGN KEY (`destination_user_id`) REFERENCES `users` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=161362 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
I am trying to optimise a query for newsfeed (I removed everything unnecessary just to be able to optimise the core of the query but it's still simply slow).
Query:
EXPLAIN SELECT MAX(us.status_id)
FROM user_status us
WHERE us.destination_user_id IN (25,30,31,32,33,34,35,36,37,38,39,40,42,43,44,46,49,50,51,52,53,55,56,57,58,59,60,62,64,66,68,74,78,79,81,88,91,92,94,96,98,99,100,101,102,106,110,112,113,114,117,124,128,129,133,138,140,144,149,150,151,154,155,156,158,159,160,164,170,174,175,180,184,186,187,210,211,222,225,227,228,231,234,235,236,237,240,264,269,271,276,282,287,289,295,297,298,301,302,311,315,318,322,326,328,345,350,379,396,398,403,404,418,426,428,431,449,460,471,476,477,495,496,506,538,539,540,542,546,551,554,557,559,561,564,571,572,575,585,586,588,590,616,617,624,629,630,641,645,649,654,655,656,657,658,659,660,662,663,673,685,690,693,696,698,724,728,734,737,746,757,760,762,769,791,797,808,829,833,841,857,858,865,878,879,881,888,889,898,919,921,932,937,944,949,950,958,961,965,966,974,980,986,994,996,1005,1012,1013,1019,1020,1027,1044,1062,1079,1081,1097,1121,1122,1131,1140,1174,1178,1199,1214,1219,1221,1259,1261,1262,1268,1277,1282,1294,1300,1307,1320,1330,1331,1333,1336,1350,1361,1371,1388,1393,1440,1464,1482,1497,1507,1509,1511,1513,1514,1525,1537,1558,1569,1572,1573,1577,1584,1588,1591,1593,1627,1644,1645,1666,1688,1716,1729,1735,1751,1756,1803,1818,1828,1867,1871,1876,1914,1935,2038,2047,2058,2072,2074,2085,2106,2153,2168,2197,2232,2279,2355,2359,2511,2560,2651,2773,2803,2812,2818,2829,2835,2841,2865,2891,3032,3051,3095,3100,3148,3412,3476,3578,3623,3808,3853,3968,3976,3992,4045,4047,4069,4077,4119,4156,4237,4271,4280,4285,4337,4348,4644,4711,4872,4898,5084,5108,5110,5248,5254,5266,5268,5315,5318,5553,5716,5744,5768,5782,5784,5794,5815,5883,5920,5921,5985,5987,6016,6070,6364,7067,7522,7571,7733,7800,8259,8421,8640,9743,10039,11900,12344,12794,13419,13468,13548,13778,13829,13892,13902,13910,13976,13977,14042,14056,14171,14175,14176,14210,14255,14258,14279,14301,14343,14394,14465,14501,14538,14650,14656,14657,14805,14807,14813,14970,14975,15110,15174,15277,15284,15306,15354,15404,15649,15710,15776,16084,16099,14752,16516,1130,9770,1127,14200,13950,15842,16406,1561
4,16566,16209,16672,13887,16122,14857,16877,10093,15752,16131,17618,17767,5783,17867,16081,18224,6972,14273,18471,15403,16261,6641,18669,15153,18708,18534,17447,18843,18840,27,61,18656,18336,18006,15337,17197,18999,14360,19023,19002,16856,2885,17237,16560,15575,16297,11199,17836,14313,759,18403,19421,19514,2828,14562,1792,18131,19703,1280,18314,15944,17078,18316,19695,20017,16493,19566,17028,19104,17518,2045,16312,15508,20092,5060,18207,1773,17129,17154,18786,17077,15155,17640,2845,19480,20943,107,2775,21247,3989,20292,19077,20046,18230,18241,18102,19225,
14230,21011,5765,15344,21732,11249,15532,14105,4136,17373,14612,17944,17040,15505,17528,20461,22200,14059,11701,19410,3085,12180,22730,22631,17673,2820,20826,21895,23992,24080,24249,25144,25146,25171,25177,25181,25222,25223,25232,25245,25248,25250,25252,25255,25264,25267,25276,25279,25280,25284,25294,25298,25300,25312,25324,25332,25359,25373,25374,25381,25402,25412,25430,25434,25437,25442,25444,25446,25454,25465,25474,25486,25490,25491,25494,25535,25540,25549,25555,25568,25671,25711,25713,25714,25722,25737,25755,25768,25774,
25783,25784,25839,25854,25886,25889,25891,25913,25926,25956,25967,26026,26043)
GROUP BY us.original_id
ORDER BY us.status_id DESC
LIMIT 0,10;
Explain of the query:
Execution time: 10 rows in set (0,41 sec), MySQL 5.7 (strict mode turned off)
IMO, a table as small as 100k rows should perform much better. I tried changing the indexes back and forth, but they seem to be set properly.
Any idea how I could optimise this query to run in 0.0x or 0.1x seconds?
Update
The linked duplicate is not related to my issue and, IMO, shouldn't be linked.
Removing the unnecessary extra join resolved the issue.
It now works as it is supposed to, using a tight index scan: http://dev.mysql.com/doc/refman/5.7/en/group-by-optimization.html
I can't get rid of "Using temporary; Using filesort", even if I change the ORDER BY to us.original_id, but the execution time is now as expected: 0.08 sec.

Remove the two extra tables, since us seems to be the only relevant one.
It is not valid to ORDER BY status_id, since it is neither in the GROUP BY nor (technically) in the SELECT list.
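One way to make the ordering valid is to sort by the aggregate itself. The following is only a sketch: the IN list is shortened to three of the original ids, the alias max_status_id is an illustrative name, and it assumes the intent is "newest status per original_id, latest first".

```sql
-- Sketch only: IN list shortened, alias name is hypothetical.
SELECT us.original_id,
       MAX(us.status_id) AS max_status_id
FROM user_status AS us
WHERE us.destination_user_id IN (25, 30, 31)
GROUP BY us.original_id           -- every non-aggregated column is grouped
ORDER BY max_status_id DESC       -- sort by the aggregate, not a raw column
LIMIT 10;
```

Written this way, the statement should also be accepted with ONLY_FULL_GROUP_BY enabled, because the ORDER BY no longer refers to a non-grouped, non-aggregated column.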

Related

MySQL estimates many more rows to examine than expected

I have a user_rates table with two foreign-key references to users: user_id_owner and user_id_rated.
This is my create table query:
CREATE TABLE `user_rates` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id_owner` int(10) unsigned NOT NULL,
`user_id_rated` int(10) unsigned NOT NULL,
`value` int(11) NOT NULL COMMENT '0 - dislike, 1 - like',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_rates_user_id_rated_foreign` (`user_id_rated`),
KEY `user_rates_user_id_owner_foreign` (`user_id_owner`),
CONSTRAINT `user_rates_user_id_owner_foreign` FOREIGN KEY (`user_id_owner`) REFERENCES `users` (`id`),
CONSTRAINT `user_rates_user_id_rated_foreign` FOREIGN KEY (`user_id_rated`) REFERENCES `users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1825767 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
When I execute this query:
EXPLAIN SELECT
user_id_rated
FROM
`user_rates` AS ur
WHERE
ur.user_id_owner = 10101;
EXPLAIN estimates 107000 rows to examine, but the query returns only 60000.
Can you explain why it examines so many rows, when the comparison uses the equality operator and the compared column is a foreign key?
EDIT
I am getting this on EXPLAIN
I also want to add several more WHERE conditions. In the end, my query looks like this:
Explain SELECT
user_id_rated
FROM
`user_rates` AS ur
WHERE
ur.user_id_owner = 10101
AND (ur.value IN (1, 2, 3)
OR (ur.value = 0
AND ur.created_at > '2020-02-04 00:00:00'));
Output:
It would be nice if the query could be optimized further. I don't understand why the row estimate is not going down.
Steps I tried when optimizing:
Added a composite index on (user_id_owner, value, created_at).
But the row estimate is not going down; it is filtering even more rows.
Maybe I am indexing incorrectly? I really don't know how to build proper indexes. Sorry for the bad question, I am new here. Thanks in advance.
The "rows" is an estimate, often quite far off -- sometimes even worse than your example. The incorrectness of the estimate rarely impacts performance.
You can run ANALYZE TABLE tablename to improve the estimate. But it may still not be better.
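As a small sketch of that step, using the table and query from the question: refresh the statistics, then re-run EXPLAIN and compare the "rows" column. The new estimate may or may not be closer to the real count.

```sql
-- Recompute index statistics for the table.
ANALYZE TABLE user_rates;

-- Re-check the estimate for the same lookup.
EXPLAIN
SELECT user_id_rated
FROM user_rates AS ur
WHERE ur.user_id_owner = 10101;
```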
For the current query, use:
( SELECT user_id_rated
FROM `user_rates` AS ur
WHERE ur.user_id_owner = 10101
AND ur.value IN (1, 2, 3)
)
UNION ALL
( SELECT user_id_rated
FROM `user_rates` AS ur
WHERE ur.user_id_owner = 10101
AND ur.value = 0
AND ur.created_at > '2020-02-04 00:00:00'
);
And have the composite (and "covering") indexes:
INDEX(user_id_owner, value, user_id_rated)
INDEX(user_id_owner, value, created_at, user_id_rated)
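Written out as DDL, the two indexes could be added as follows. The index names are made up for illustration; only the column lists come from the recommendation above.

```sql
-- Index names are hypothetical; column order matters.
ALTER TABLE user_rates
  ADD INDEX idx_owner_value_rated (user_id_owner, value, user_id_rated),
  ADD INDEX idx_owner_value_created_rated (user_id_owner, value, created_at, user_id_rated);
```

Each index starts with the equality column, then the column used for the IN or range test, and ends with the selected column, so each half of the UNION ALL can, in principle, be answered from the index alone ("Using index" in EXPLAIN).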
If there are other variations of the query, show us. As you may guess, the details are important.
(The simplified version of the query does not provide any useful information when discussing the real query.)

MySQL composite index effect on joins

I have the following SQL query (DB is MySQL 5):
select
event.full_session_id,
DATE(min(event.date)),
event_exe.user_id,
COUNT(DISTINCT event_pat.user_id)
FROM
event AS event
JOIN event_participant AS event_pat ON
event.pat_id = event_pat.id
JOIN event_participant AS event_exe on
event.exe_id = event_exe.id
WHERE
event_pat.user_id <> event_exe.user_id
GROUP BY
event.full_session_id;
"SHOW CREATE TABLE event":
CREATE TABLE `event` (
`id` int(12) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`session_id` varchar(64) DEFAULT NULL,
`full_session_id` varchar(72) DEFAULT NULL,
`pat_id` int(12) DEFAULT NULL,
`exe_id` int(12) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `SESSION_IDX` (`full_session_id`),
KEY `PAT_ID_IDX` (`pat_id`),
KEY `DATE_IDX` (`date`),
KEY `SESSLOGPATEXEC_IDX` (`full_session_id`,`date`,`pat_id`,`exe_id`)
) ENGINE=MyISAM AUTO_INCREMENT=371955 DEFAULT CHARSET=utf8
"SHOW CREATE TABLE event_participant":
CREATE TABLE `event_participant` (
`id` int(12) NOT NULL AUTO_INCREMENT,
`user_id` varchar(64) NOT NULL,
`alt_user_id` varchar(64) NOT NULL,
`username` varchar(128) NOT NULL,
`usertype` varchar(32) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ALL_UNQ` (`user_id`,`alt_user_id`,`username`,`usertype`),
KEY `USER_ID_IDX` (`user_id`)
) ENGINE=MyISAM AUTO_INCREMENT=5397 DEFAULT CHARSET=utf8
Also, the query itself seems ugly, but this is legacy code on a production system, so we are not expected to change it (at least for now).
The problem is that there are around 36 million records in the event table (in the production system), so the DB machine has crashed frequently because of "Using temporary; Using filesort" processing. (They provided the EXPLAIN outputs, but unfortunately I don't have them right now. I'll try to add them to this post later.)
The customer asks for a "quick fix" by adding indices. Currently we have separate indices on full_session_id, pat_id and date on event, and on user_id on event_participant.
So I'm thinking of creating a composite index (pat_id, exe_id, full_session_id, date) on event. This index consists of the fields used in the join (equivalent to a WHERE?), then the GROUP BY, then the aggregate (MIN) parts.
This is just an idea, because we currently don't have that kind of data volume to test with, so we are trying to do the best we can first.
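For reference, the index proposed above could be created and checked roughly like this. This is a sketch: the index name is invented, and whether the optimizer would actually use it for this query is exactly the open question. On a large MyISAM table the ALTER may also take a long time, so it is best tried on a copy first.

```sql
-- Hypothetical index name; column order as proposed in the question.
ALTER TABLE event
  ADD INDEX idx_pat_exe_session_date (pat_id, exe_id, full_session_id, `date`);

-- Compare the plan before and after adding the index.
EXPLAIN
SELECT event.full_session_id,
       DATE(MIN(event.date)),
       event_exe.user_id,
       COUNT(DISTINCT event_pat.user_id)
FROM event AS event
JOIN event_participant AS event_pat ON event.pat_id = event_pat.id
JOIN event_participant AS event_exe ON event.exe_id = event_exe.id
WHERE event_pat.user_id <> event_exe.user_id
GROUP BY event.full_session_id;
```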
My questions are:
Could the index above help performance? (The effect is quite confusing, because I have found two contrasting results: https://dba.stackexchange.com/questions/158385/compound-index-on-inner-join-table
versus "Separate Join clause in a Composite Index"; the former suggests that a composite index on joins will work and the latter that it won't.)
Is there any hope for this path (adding indices)? Or should we forget it and just try to optimize the query instead?
Thanks in advance for your help :)
Update:
I have updated the full table description for the two related tables.
The MySQL version is 5.1.69. I don't think we need to worry about the ambiguous data issue mentioned in the comments, because there should be no ambiguity in our data. Specifically, for each full_session_id only one event_exe.user_id is returned (that is simply business logic in the application).
So, what do you think about my two questions?

Indexes for a large MYSQL table

I hope you will allow me to pick your brains so I can gain some knowledge in the process.
We have 3 tables: data_product, data_issuer and data_accountbalance.
CREATE TABLE `data_issuer` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`issuer_name` varchar(128) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
CREATE TABLE `data_product` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
`issuer_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `data_product_name_issuer_id_260fec65_uniq` (`name`,`issuer_id`),
KEY `data_product_issuer_id_d07fa696_fk_data_issuer_id` (`issuer_id`),
CONSTRAINT `data_product_issuer_id_d07fa696_fk_data_issuer_id` FOREIGN KEY
(`issuer_id`) REFERENCES `data_issuer` (`id`)
) ENGINE=InnoDB
CREATE TABLE `data_accountbalance` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date` date NOT NULL,
`nominee_name` varchar(128) NOT NULL,
`beneficiary_name` varchar(128) NOT NULL,
`nominee_id` varchar(128) NOT NULL,
`account_id` varchar(16) NOT NULL,
`product_id` int(11) NOT NULL,
`register_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `data_accountbalance_date_product_id_nominee__7b8d2c6a_uniq` (`date`,`product_id`,`nominee_id`,`beneficiary_name`),
KEY `data_accountbalance_product_id_nominee_id_date_8ef8754f_idx` (`product_id`,`nominee_id`,`date`),
KEY `data_accountbalance_register_id_4e78ec16_fk_data_register_id` (`register_id`),
KEY `data_accountbalance_product_id_date_nominee_i_c3a41e39_idx` (`product_id`,`date`,`nominee_id`,`beneficiary_name`,`balance_amount`),
CONSTRAINT `data_accountbalance_product_id_acfb18f6_fk_data_product_id` FOREIGN KEY (`product_id`) REFERENCES `data_product` (`id`),
CONSTRAINT `data_accountbalance_register_id_4e78ec16_fk_data_register_id` FOREIGN KEY (`register_id`) REFERENCES `data_register` (`id`)
) ENGINE=InnoDB
When running the query below, the system takes about an hour to respond -
SELECT SQL_NO_CACHE *
from data_product
INNER JOIN `data_issuer` ON (`data_issuer`.`id` = `data_product`.`issuer_id`)
INNER JOIN `data_accountbalance` ON (`data_accountbalance`.`product_id` = `data_product`.`id`)
LIMIT 100000000;
Both data_issuer and data_product have only a few hundred records in them, but data_accountbalance is huge, with about 15,384,358 records.
The explain plan produced is below -
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | data_product | | ALL | PRIMARY,data_product_issuer_id_d07fa696_fk_data_issuer_id | | | | 459 | 100 |
1 | SIMPLE | data_issuer | | eq_ref | PRIMARY | PRIMARY | 4 | pnl.data_product.issuer_id | 1 | 100 |
1 | SIMPLE | data_accountbalance | | ref | data_accountbalance_product_id_nominee_id_date_8ef8754f_idx,data_accountbalance_product_id_date_nominee_i_c3a41e39_idx | data_accountbalance_product_id_date_nominee_i_c3a41e39_idx | 4 | pnl.data_product.id | 493 | 100 |
Can someone help tune the query so it does not take an hour to run please? Appreciate any pointers you might have for me.
If your query is literally what you are showing there, then that's the problem: it has no WHERE clause.
That query would literally return 15,384,358 rows. As the two smaller tables are typical domain tables with NOT NULL relations all the way across, it returns exactly one row for every row in data_accountbalance.
The actual time cost is probably in creating a massive temp table (though I'm not sure about that). If you really need to download the entire database, all 3 tables, you could look into tuning the temp table settings in your MySQL config to possibly speed this up, or, preferably, read the results as MySQL gets them ready once the query starts executing (which avoids a temp table). Alternatively, maybe the script that runs this query is trying to read the whole data set into memory, which takes a long time?
Is there a particular reason to download all the data? Usually you download only the data you mean to operate on, or you have MySQL do the grouping, summing, etc. and return just the answer you wanted based on all the data.
How many rows did you expect the query to return? If you were thinking of something less than 15 million, then the answer is to add some kind of WHERE clause or an aggregate function. Whichever table and columns you use to reduce the result set, those columns will have to be indexed.
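As a sketch of both options, using only columns that appear in the posted table definitions: the product id and the date below are placeholder values, not values from the question.

```sql
-- Option 1: restrict the result set (filter values are hypothetical).
SELECT data_issuer.issuer_name,
       data_product.name,
       data_accountbalance.*
FROM data_product
INNER JOIN data_issuer
        ON data_issuer.id = data_product.issuer_id
INNER JOIN data_accountbalance
        ON data_accountbalance.product_id = data_product.id
WHERE data_accountbalance.product_id = 42
  AND data_accountbalance.`date` = '2020-01-31';

-- Option 2: let MySQL aggregate and return one row per product.
SELECT data_product.id,
       data_product.name,
       COUNT(*) AS balance_rows
FROM data_product
INNER JOIN data_accountbalance
        ON data_accountbalance.product_id = data_product.id
GROUP BY data_product.id, data_product.name;
```

The filter in option 1 uses product_id and date, which are the leading columns of an index that already exists on data_accountbalance, so no new index should be needed for that particular shape of query.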
I hope this helps. :)

Indexing needs to be sped up

I have a table with the following details:
CREATE TABLE `test` (
`seenDate` datetime NOT NULL DEFAULT '0001-01-01 00:00:00',
`corrected_test` varchar(45) DEFAULT NULL,
`corrected_timestamp` timestamp NULL DEFAULT NULL,
`unable_to_correct` tinyint(1) DEFAULT '0',
`fk_zone_for_correction` int(11) DEFAULT NULL,
PRIMARY KEY (`sightinguid`),
KEY `corrected_test` (`corrected_test`),
KEY `idx_seenDate` (`seenDate`),
KEY `idx_corrected_test_seenDate` (`corrected_test`,`seenDate`),
KEY `zone_for_correction_fk_idx` (`fk_zone_for_correction`),
KEY `idx_corrected_test_zone` (`fk_zone_for_correction`,`corrected_test`,`seenDate`),
CONSTRAINT `zone_for_correction_fk` FOREIGN KEY (`fk_zone_for_correction`) REFERENCES `zone_test` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I am then using the following query:
SELECT
*
FROM
test
WHERE
fk_zone_for_correction = 1
AND (unable_to_correct = 0
OR unable_to_correct IS NULL)
AND (corrected_test = ''
OR corrected_test IS NULL)
AND (last_accessed_timestamp IS NULL
OR last_accessed_timestamp < (NOW() - INTERVAL 30 MINUTE))
ORDER BY seenDate ASC
LIMIT 1
Here is a screenshot of the optimiser. The ORDER BY is slowing things down, even though the table seems to me to be indexed properly and the correct index (idx_corrected_test_zone) is being selected. What can be done to improve it?
There is no INDEX that will help much.
This might help:
INDEX(fk_zone_for_correction, seenDate)
Both columns can perhaps be used -- the first for filtering, the second for avoiding having to sort. But, it could backfire if it can't find the 1 row quickly.
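A sketch of trying that index out: the index name is invented, and the EXPLAIN is only there to see whether the "Using filesort" note disappears for the original query.

```sql
-- Hypothetical index name; equality column first, ORDER BY column second.
ALTER TABLE test
  ADD INDEX idx_zone_seenDate (fk_zone_for_correction, seenDate);

EXPLAIN
SELECT *
FROM test
WHERE fk_zone_for_correction = 1
  AND (unable_to_correct = 0 OR unable_to_correct IS NULL)
  AND (corrected_test = '' OR corrected_test IS NULL)
  AND (last_accessed_timestamp IS NULL
       OR last_accessed_timestamp < (NOW() - INTERVAL 30 MINUTE))
ORDER BY seenDate ASC
LIMIT 1;
```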
The killer is OR. If you could avoid ever populating any of those 3 columns with NULL, then this might be better:
INDEX(fk_zone_for_correction, unable_to_correct, corrected_test, last_accessed_timestamp)
-- the range thing needs to be last
-- this index would do the filtering, but fail to help with `ORDER` and `LIMIT`.
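If the three columns were never NULL, the query could drop its OR branches. The following is a sketch under that assumption: it presumes existing NULLs have been replaced (0 for unable_to_correct, '' for corrected_test, and some old sentinel timestamp for "never accessed"), and the index name is invented.

```sql
-- Assumes NULLs have been eliminated from the three filtered columns.
ALTER TABLE test
  ADD INDEX idx_zone_flags_accessed
    (fk_zone_for_correction, unable_to_correct, corrected_test, last_accessed_timestamp);

SELECT *
FROM test
WHERE fk_zone_for_correction = 1
  AND unable_to_correct = 0
  AND corrected_test = ''
  AND last_accessed_timestamp < (NOW() - INTERVAL 30 MINUTE)  -- range test, last in the index
ORDER BY seenDate ASC
LIMIT 1;
```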
Even though it is using idx_corrected_test_zone, it is probably not using more than the first two columns -- because of OR.
You have two cases of redundant indexes. For example, the first of these is the left part of the second; so the first is redundant and can be DROPped:
KEY `corrected_test` (`corrected_test`),
KEY `idx_corrected_test_seenDate` (`corrected_test`,`seenDate`),
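Dropping the redundant indexes might look like the sketch below. The second DROP is an assumption about which pair the answer means: zone_for_correction_fk_idx (fk_zone_for_correction) is the left part of idx_corrected_test_zone, which should still be able to back the foreign key. Verify on a copy of the table before running this in production.

```sql
ALTER TABLE test
  DROP INDEX corrected_test,               -- left part of idx_corrected_test_seenDate
  DROP INDEX zone_for_correction_fk_idx;   -- assumed second case: left part of idx_corrected_test_zone
```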

mysql select with order by using filesort no index used

Sorry for the long post, but this is really strange and I am close to giving up. There are 2 tables:
CREATE TABLE `endu_results` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`base_name` varchar(200) NOT NULL,
`base_nr` int(11) DEFAULT NULL,
`base_yob` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `endu_results_206a6355` (`base_name`),
KEY `endu_results_63df4402` (`base_nr`),
KEY `base_yob` (`base_yob`)
) ENGINE=InnoDB AUTO_INCREMENT=3424028 DEFAULT CHARSET=utf8;
and 2nd:
CREATE TABLE `endu_resultinterest` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`result_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `endu_resultinterest_3b529087` (`result_id`),
CONSTRAINT `result_id_refs_id_19e24435` FOREIGN KEY (`result_id`) REFERENCES `endu_results` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=48590 DEFAULT CHARSET=utf8;
There are about 2 million records in the endu_results table and fewer than 100K in endu_resultinterest. I have a slow query:
explain select base_yob from endu_resultinterest
inner join endu_results
on (endu_results.id = endu_resultinterest.result_id)
order by endu_results.base_yob;
1 SIMPLE endu_resultinterest index endu_resultinterest_3b529087 endu_resultinterest_3b529087 4 NULL 47559 Using index; Using temporary; Using filesort
The question is: why is MySQL using the index endu_resultinterest_3b529087, when it should use base_yob, the column the sort is requested on?
To test this further, I manually created 2 additional identical tables, endu_testresults and endu_testresultinterest, and filled them with some records:
CREATE TABLE `endu_testresults` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`base_yob` int(11) DEFAULT NULL,
`base_name` varchar(200) NOT NULL,
`base_nr` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `endu_testresults_a65b2616` (`base_yob`),
KEY `endu_testresults_ba0ab39c` (`base_name`),
KEY `endu_testresults_d75ba04d` (`base_nr`)
) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=utf8;
So I go again for explain:
explain select base_yob from endu_testresultinterest
inner join endu_testresults
on (endu_testresults.id = endu_testresultinterest.result_id)
order by endu_testresults.base_yob;
and surprise, surprise:
1 SIMPLE endu_testresults index PRIMARY endu_testresults_a65b2616 5 NULL 19 Using index
The index on the sort column base_yob (endu_testresults_a65b2616) is now used.
Why is the index used in one case, while in the other I get "Using temporary; Using filesort"? Does size matter? I will try copying records from one table to the other, but I don't understand what is happening with the indexes. MySQL is 5.6.16.
Short answer: Because it is faster.
Long answer...
Your EXPLAINs seem to be incomplete -- I would expect 2 lines in each.
The first table is 20 (70?) times as big as the second. The optimizer picked the smaller table to start with. Hence it is initially doing 1/20th the amount of work. The sort that comes later (ORDER BY ...) is much less work than if it had to do 20 times as much work to start with.
The output is only 48K rows, correct? And that is how many rows in the 2nd table, correct?
Your test tables did not have the same bigger/smaller ratio, did they? Hence the different EXPLAIN.
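To see the plan the optimizer rejected, one could force the other join order and the base_yob index, then compare the EXPLAIN output and the timings with the original. This is only a diagnostic sketch, not a recommendation to keep the hints.

```sql
-- STRAIGHT_JOIN makes MySQL read the tables in the order they are listed;
-- FORCE INDEX asks it to scan endu_results via the base_yob index.
EXPLAIN
SELECT STRAIGHT_JOIN endu_results.base_yob
FROM endu_results FORCE INDEX (base_yob)
INNER JOIN endu_resultinterest
        ON endu_results.id = endu_resultinterest.result_id
ORDER BY endu_results.base_yob;
```

With this order the sort can be avoided, but every row of the roughly 2-million-row table has to be probed against endu_resultinterest, which is the extra work the answer above refers to.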