I have a query that is taking an embarrassingly long time. ~7 minutes embarrassing. I would really appreciate some help. Missing indexes? Rewrite the query? All of the above?
Many thanks
mysql Ver 14.14 Distrib 5.7.25, for Linux (x86_64)
The query looks like:
SELECT COUNT(*) AS count_all, name
FROM api_events ae
INNER JOIN products p on p.token=ae.product_token
WHERE (ae.created_at > '2019-01-21 12:16:53.853732')
GROUP BY name
Here are the two table definitions
api_events has ~31 million records
CREATE TABLE `api_events` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`api_name` varchar(200) NOT NULL,
`hostname` varchar(200) NOT NULL,
`controller_action` varchar(2000) NOT NULL,
`duration` decimal(12,5) NOT NULL DEFAULT '0.00000',
`view` decimal(12,5) NOT NULL DEFAULT '0.00000',
`db` decimal(12,5) NOT NULL DEFAULT '0.00000',
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`product_token` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `product_token` (`product_token`)
) ENGINE=InnoDB AUTO_INCREMENT=64851218 DEFAULT CHARSET=latin1;
and
products has only 12 records
CREATE TABLE `products` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`code` varchar(30) NOT NULL,
`name` varchar(100) NOT NULL,
`description` varchar(2000) NOT NULL,
`token` varchar(50) NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=19 DEFAULT CHARSET=latin1;
You could improve the join performance adding index
create index idx1 on api_events(product_token, created_at);
create index idx2 on products(token);
You could also trying inverting the columns ofr api_events
create index idx1 on api_events(created_at, product_token);
and trying add redundancy to product index
create index idx2 on products(token, name);
For the query as stated, you needed
api_events: INDEX(created_at, product_token)
products: INDEX(token, name)
Because the WHERE mentions api_events, the Optimizer is likely to start with that table. created_at is in the WHERE, so the index starts with that, even though starting with a 'range' is usually wrong. In this case, the pair is "covering".
Then, INDEX(token, name) is also "covering".
"Covering" indexes give a small, but widely varying, amount of performance improvement.
What happens if you group by the token instead of the name?
SELECT ae.product_token, COUNT(*) AS count_all
FROM api_events ae
WHERE ae.created_at > '2019-01-21 12:16:53.853732')
GROUP BY ae.product_token;
For this query, an index on api_events(created_at, product_token) will probably help.
If this is faster, then you can bring in the name using a subquery.
It seems like the criteria on created_at is very selective (looking at only the past 7 days?). That's crying out to explore an index with created_at as a leading column.
The query is also referencing the product_token column from the same table, so we can include that column in the index, to make it a covering index.
api_events_IX ON api_events ( created_at, product_token )
Using that index, we can probably avoid looking at the vast majority of the 31 million rows, and quickly narrow in on the subset of rows we actually need to look at.
Using the index, the query will still need a "Using filesort" operation to satisfy the GROUP BY.
(My guess here is that the join to the 12 rows in product doesn't exclude a lot of rows... that on the vast majority of rows in api_event the product_token refers to a row that exists in product.
Use MySQL EXPLAIN to see the query execution plan.
A further possible refinement (to test the performance of) would be to do some of the aggregation in an inline view:
SELECT SUM(s.count_all) AS count_all
, p.name
FROM ( SELECT COUNT(*) AS count_all
, ae.product_token
FROM api_events ae
WHERE ae.created_at > '2019-01-21 12:16:53.853732'
GROUP
BY ae.product_token
) s
JOIN products p
ON p.token = s.product_token
GROUP
BY p.name
If the assumption about product_token is misinformed, if there are lots of rows in api_event that have product_token values that don't reference a row in product ... we might take a different tack ...
I have a one table with millions of entry.Below is table structure.
CREATE TABLE `useractivity` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`userid` bigint(20) NOT NULL,
`likes` bigint(20) DEFAULT NULL,
`views` bigint(20) DEFAULT NULL,
`shares` bigint(20) DEFAULT NULL,
`totalcount` bigint(20) DEFAULT NULL,
`status` bigint(20) DEFAULT NULL,
`createdat` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `userid` (`userid`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And Below is query in which i am getting slow performance.
SELECT userid,
(sum(likes)+SUM(views)+SUM(shares)+SUM(totalcount)+SUM(`status`)) as total
from useractivity
GROUP BY userid
ORDER BY total DESC
limit 0, 20;
When i am executing above query without ORDER BY then it gives me fast result set But when using ORDER BY then this query became slow,though i used limit for pagination.
What can I do to speed up this query?
You can't speed up the query as it is, MySQL needs to visit every single row and calculate the sum before sorting and finally returning the first rows. That is bound to take time. You can probably cheat though.
The most obvious approach would be to create a summary table with userid and total. Update it when the base table changes or recompute it regularly, whatever makes sense. In that table you can index total, which makes the query trivial.
Another option may be to find the top users. Most sites have users that are more active than the others. Keep the 1000 top users in a separate table, then use the same select but only for the top users (i.e. join with that table). Only the useractivity rows for the top users need to be visited, which should be fast. If 1000 users are not enough perhaps 10000 works.
I have a table defined as follows:
| book | CREATE TABLE `book` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`provider_id` int(10) unsigned DEFAULT '0',
`source_id` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`description` longtext COLLATE utf8_unicode_ci,
PRIMARY KEY (`id`),
UNIQUE KEY `provider` (`provider_id`,`source_id`),
KEY `idx_source_id` (`source_id`),
) ENGINE=InnoDB AUTO_INCREMENT=1605425 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
when there are about 10 concurrent read with following sql:
SELECT * FROM `book` WHERE (provider_id = '1' AND source_id = '1037122800') ORDER BY `book`.`id` ASC LIMIT 1
it becomes slow, it takes about 100 ms.
however if I changed it to
SELECT * FROM `book` WHERE (provider_id = '1' AND source_id = '221630001') LIMIT 1
then it is normal, it takes several ms.
I don't understand why adding order by id makes query much slower? could anyone expain?
Try to add desired columns (Select Column Name,.. ) instead of * or Refer this.
Why is my SQL Server ORDER BY slow despite the ordered column being indexed?
I'm not a mysql expert, and not able to perform a detailed analysis, but my guess would be that because you are providing values for the UNIQUE KEY in the WHERE clause, the engine can go and fetch that row directly using an index.
However, when you ask it to ORDER BY the id column, which is a PRIMARY KEY, that changes the access path. The engine now guesses that since it has an index on id, and you want to order by id, it is better to fetch that data in PK order, which will avoid a sort. In this case though, it leads to a slower result, as it has to compare every row to the criteria (a table scan).
Note that this is just conjecture. You would need to EXPLAIN both statements to see what is going on.
I have a site where there is an activity feed, similar to how social sites like Facebook have one. It is a "newest first" list that describes actions taken by users. In production, there's about 200k entries in that table.
Since this is going to be asked anyway, I'll first share the full table structure:
CREATE TABLE `karmalog` (
`id` int(11) NOT NULL auto_increment,
`guid` char(36) default NULL,
`user_id` int(11) default NULL,
`user_name` varchar(45) default NULL,
`user_avat_url` varchar(255) default NULL,
`user_sec_id` int(11) default NULL,
`user_sec_name` varchar(45) default NULL,
`user_sec_avat_url` varchar(255) default NULL,
`event` enum('EDIT_PROFILE','EDIT_AVATAR','EDIT_EMAIL','EDIT_PASSWORD','FAV_IMG_ADD','FAV_IMG_ADDED','FAV_IMG_REMOVE','FAV_IMG_REMOVED','FOLLOW','FOLLOWED','UNFOLLOW','UNFOLLOWED','COM_POSTED','COM_POST','COM_VOTE','COM_VOTED','IMG_VOTED','IMG_UPLOAD','LIST_CREATE','LIST_DELETE','LIST_ADMINDELETE','LIST_VOTE','LIST_VOTED','IMG_UPD','IMG_RESTORE','IMG_UPD_LIC','IMG_UPD_MOD','IMG_GEO','IMG_UPD_MODERATED','IMG_VOTE','IMG_VOTED','TAG_FAV_ADD','CLASS_DOWN','CLASS_UP','IMG_DELETE','IMG_ADMINDELETE','IMG_ADMINDELETEFAV','SET_PASSWORD','IMG_RESTORED','IMG_VIEW','FORUM_CREATE','FORUM_DELETE','FORUM_ADMINDELETE','FORUM_REPLY','FORUM_DELETEREPLY','FORUM_ADMINDELETEREPLY','FORUM_SUBSCRIBE','FORUM_UNSUBSCRIBE','TAG_INFO_EDITED','IMG_ADDSPECIE','IMG_REMOVESPECIE','SPECIE_ADDVIDEO','SPECIE_REMOVEVIDEO','EARN_MEDAL','JOIN') NOT NULL,
`event_type` enum('follow','tag','image','class','list','forum','specie','medal','user') NOT NULL,
`active` bit(1) NOT NULL,
`delete` bit(1) NOT NULL default '\0',
`object_id` int(11) default NULL,
`object_cache` text,
`object_sec_id` int(11) default NULL,
`object_sec_cache` text,
`karma_delta` int(11) NOT NULL,
`gold_delta` int(11) NOT NULL,
`newkarma` int(11) NOT NULL,
`newgold` int(11) NOT NULL,
`migrated` int(11) NOT NULL default '0',
`date_created` timestamp NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `user_sec_id` (`user_sec_id`),
KEY `image_id` (`object_id`),
KEY `date_event` (`date_created`,`event`),
KEY `event` (`event`),
KEY `date_created` (`date_created`),
CONSTRAINT `karmalog_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `user` (`id`) ON DELETE SET NULL,
CONSTRAINT `karmalog_ibfk_2` FOREIGN KEY (`user_sec_id`) REFERENCES `user` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Before optimizing this table, my query had 5 joins and I ran into slow query times. I have denormalized all of that data, so that not a single join is there anymore. So the table and query is flat.
As you can see in the table design, there's an "event" field which is an enum, holding a few dozen possible values. Throughout the site, I show activity feeds based on specific event types. Typically that query looks like this:
SELECT * FROM karmalog as k
WHERE k.event IN ($events) AND k.delete=0
ORDER BY k.date_created DESC, k.id DESC
LIMIT 0,30
What this query does is to find the latest 30 entries in the total set that match any of the events passed in $events, which can be multiple.
Due to removing the joins and having indices on most fields, I was expecting this to perform very well, but it doesn't. On 200k entries, it still takes over 3 seconds and I don't understand why.
Regarding solutions, I know I could archive older entries or partition the table per event type, but that will have quite a code impact, and I first would like to understand why the above is so slow.
As a temporary work-around, I'm now doing this:
SELECT * FROM
(SELECT * FROM karmalog ORDER BY date_created DESC, id DESC LIMIT 0,1000) as karma
WHERE karma.event IN ($events) AND karma.delete=0
LIMIT $page,$pagesize
What this does is to limit the baseset to search in to the latest 1000 entries only, hoping and guessing that there's 30 entries to be found for the filters that I pass in. It's not very robust though. It will not work for more rare events, and it brings pagination issues.
Therefore, I first like to get to the root cause of why my initial query is slow, against my expectation.
Edit: I was asked to share the execution plan. Here's the test query:
EXPLAIN SELECT * FROM karmalog
WHERE event IN ('FAV_IMG_ADD','FOLLOW','COM_POST','IMG_VOTE','LIST_VOTE','JOIN','CLASS_UP','LIST_CREATE','FORUM_REPLY','FORUM_CREATE','FORUM_SUBSCRIBE','IMG_GEO','IMG_ADDSPECIE','SPECIE_ADDVIDEO','EARN_MEDAL') AND karmalog.delete=0
ORDER BY date_created DESC, id DESC
LIMIT 0,36
Execution plan:
id = 1
select_type = SIMPLE
table = karmalog
type = range
possible_keys = event
key = event
key_len = 1
red = NULL
rows = 80519
Extra = Using where; Using filesort
I'm not sure how to read into the above, but I do know that the sort clause really seems to kill this query. With this sorting, it takes 4.3 secs, without 0.03 secs.
SELECT * sometimes slows down ordered queries by a huge amount, so let's start by refactoring your query as follows:
SELECT k.*
FROM karmalog AS k
JOIN (
SELECT id
FROM karmalog
WHERE event IN ($events)
AND delete=0
ORDER BY date_created DESC, id DESC
LIMIT 0,30
) AS m ON k.id = m.id
ORDER BY k.date_created DESC, k.id DESC
This will do your ORDER BY ... LIMIT operation without having to haul the whole table around in the sorting phase. Finally it will look up the appropriate thirty rows from the original table and sort just those again. This might save a whole lot of I/O and in-memory data shuffling.
Second, if id column values are assigned in ascending order as records are inserted, then the use of date_created in your ORDER BY operation is redundant. But MySQL doesn't know that, so leaving it out might help. This will be true if you always use the current date when inserting, and never update the dates.
Third, you might be able to use a compound covering index for the selection (inner) query. This is an index that contains all the fields you need. When you use a covering index, the whole query can be satisfied from the index, and there's no need to bounce back to the original table. This saves disk access time.
Try this compound covering index: (delete, event, id). If you decide you can't get rid of the use of date_created in your ordering, try this instead: (delete, event, date_created, id)
Add a compound index over the two relevant questions. In your table, you can do that by specifying e.g.
KEY `date_created` (`date_created`, `event`)
This key can still be used to satisfy plain old date_created range searching. But in addition to that, the event data is included as well, so the DBS will be able to detect the relevant rows by only looking at the index.
If you want, you can try the other order as well: first event and then date. This might allow some optimization if there are many event types but your filter only contains few. On the other hand, I'm not sure the system will be able to make use of the LIMIT clause in this case, so I'm not certain that this other order will be any help at all.
Edit: I completely missed that your date_event index already has this info. According to your execution plan, though, that one isn't used. Looks like the optimizer is getting things wrong. You could try removing the event index, and perhaps the date index as well, and see what happens then.
There is a structure:
CREATE TABLE IF NOT EXISTS `categories` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`parent_id` int(11) unsigned NOT NULL DEFAULT '0',
`title` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Query_1:
SELECT * FROM `categories` WHERE `id` = 1234
Query_2:
SELECT * FROM `categories` WHERE `id` = 1234 LIMIT 1
I need to get just one row. Since we apply WHERE id=1234 (finding by PRIMARY KEY) obviously that row with id=1234 is only one in whole table.
After MySQL has found the row, whether engine to continue the search when using Query_1?
Thanks in advance.
Look at this SQLFiddle: http://sqlfiddle.com/#!2/a8713/4 and especially View Execution Plan.
You see, that MySQL recognizes the predicate on a PRIMARY column and therefore it does not matter if you add LIMIT 1 or not.
PS: A little more explanation: Look at the column rows of the Execution Plan. The number there is the amount of columns, the query engine thinks, it has to examine. Since the columns content is unique (as it's a primary key), this is 1. Compare it to this: http://sqlfiddle.com/#!2/9868b/2 same schema but without primary key. Here rows says 8. (The execution plan is explained in the German MySQL reference, http://dev.mysql.com/doc/refman/5.1/en/explain.html the English one is for some reason not so detailed.)