I have a one table with millions of entry.Below is table structure.
CREATE TABLE `useractivity` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`userid` bigint(20) NOT NULL,
`likes` bigint(20) DEFAULT NULL,
`views` bigint(20) DEFAULT NULL,
`shares` bigint(20) DEFAULT NULL,
`totalcount` bigint(20) DEFAULT NULL,
`status` bigint(20) DEFAULT NULL,
`createdat` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `userid` (`userid`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And Below is query in which i am getting slow performance.
SELECT userid,
(sum(likes)+SUM(views)+SUM(shares)+SUM(totalcount)+SUM(`status`)) as total
from useractivity
GROUP BY userid
ORDER BY total DESC
limit 0, 20;
When i am executing above query without ORDER BY then it gives me fast result set But when using ORDER BY then this query became slow,though i used limit for pagination.
What can I do to speed up this query?
You can't speed up the query as it is, MySQL needs to visit every single row and calculate the sum before sorting and finally returning the first rows. That is bound to take time. You can probably cheat though.
The most obvious approach would be to create a summary table with userid and total. Update it when the base table changes or recompute it regularly, whatever makes sense. In that table you can index total, which makes the query trivial.
Another option may be to find the top users. Most sites have users that are more active than the others. Keep the 1000 top users in a separate table, then use the same select but only for the top users (i.e. join with that table). Only the useractivity rows for the top users need to be visited, which should be fast. If 1000 users are not enough perhaps 10000 works.
Related
I have a table defined as follows:
| book | CREATE TABLE `book` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`provider_id` int(10) unsigned DEFAULT '0',
`source_id` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`description` longtext COLLATE utf8_unicode_ci,
PRIMARY KEY (`id`),
UNIQUE KEY `provider` (`provider_id`,`source_id`),
KEY `idx_source_id` (`source_id`),
) ENGINE=InnoDB AUTO_INCREMENT=1605425 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
when there are about 10 concurrent read with following sql:
SELECT * FROM `book` WHERE (provider_id = '1' AND source_id = '1037122800') ORDER BY `book`.`id` ASC LIMIT 1
it becomes slow, it takes about 100 ms.
however if I changed it to
SELECT * FROM `book` WHERE (provider_id = '1' AND source_id = '221630001') LIMIT 1
then it is normal, it takes several ms.
I don't understand why adding order by id makes query much slower? could anyone expain?
Try to add desired columns (Select Column Name,.. ) instead of * or Refer this.
Why is my SQL Server ORDER BY slow despite the ordered column being indexed?
I'm not a mysql expert, and not able to perform a detailed analysis, but my guess would be that because you are providing values for the UNIQUE KEY in the WHERE clause, the engine can go and fetch that row directly using an index.
However, when you ask it to ORDER BY the id column, which is a PRIMARY KEY, that changes the access path. The engine now guesses that since it has an index on id, and you want to order by id, it is better to fetch that data in PK order, which will avoid a sort. In this case though, it leads to a slower result, as it has to compare every row to the criteria (a table scan).
Note that this is just conjecture. You would need to EXPLAIN both statements to see what is going on.
I have a setup like such:
using mysql 5.5
a building has a set of measurement equipment
measurements are stored in measurements -table
there are multiple different types
users can have access to either a whole building, or a set of equipment
a few million measurements
The table creation looks like this:
CREATE TABLE `measurements` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
`building_id` INT(10) UNSIGNED NOT NULL,
`equipment_id` INT(10) UNSIGNED NOT NULL,
`state` ENUM('normal','high','low','alarm','error') NOT NULL DEFAULT 'normal',
`measurement_type` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `building_id` (`building_id`),
INDEX `equipment_id` (`equipment_id`),
INDEX `state` (`state`),
INDEX `timestamp` (`timestamp`),
INDEX `measurement_type` (`measurement_type`),
INDEX `building_timestamp_type` (`building_id`, `timestamp`, `measurement_type`),
INDEX `building_type_state` (`building_id`, `measurement_type`, `state`),
INDEX `equipment_type_state` (`equipment_id`, `type`, `state`),
INDEX `equipment_type_state_stamp` (`equipment_id`, `measurement_type`, `state`, `timestamp`),
) COLLATE='utf8_unicode_ci' ENGINE=InnoDB;
Now I need to query for the last 50 measurements of certain types that the user has access to. If the user has access to a whole building, the query runs very, very fast. However, if the user only has access to separate equipments, the query execution time seems to grow linearly with the number of equipment_ids. For example, I tested having only 2 equipment_ids in the IN query and the execution time was around 10ms. At 130 equipment_ids, it took 2.5s. The query I'm using looks like this:
SELECT *
FROM `measurements`
WHERE
`measurements`.`state` IN ('high','low','alarm')
AND `measurements`.`equipment_id` IN (
SELECT `equipment_users`.`equipment_id`
FROM `equipment_users`
WHERE `equipment_users`.`user_id` = 1
)
AND (`measurements`.`measurement_type` IN ('temperature','luminosity','humidity'))
ORDER BY `measurements`.`timestamp` DESC
LIMIT 50
The query seems to favor the measurement_type index which makes it take 15seconds, forcing it to use the equipment_type_state_stamp index drops that down to the listed numbers. Still, why does the execution time go up linearly with the number of ids, and is there something I could do to prevent that?
I am optimising my queries and found something I can't get my head around.
I am using the following query to select a bunch of categories, combining them with an alias from a table containing old and new aliases for categories:
SELECT `c`.`id` AS `category.id`,
(SELECT `alias`
FROM `aliases`
WHERE category_id = c.id
AND `old` = 0
AND `lang_id` = 1
ORDER BY `id` DESC
LIMIT 1) AS `category.alias`
FROM (`categories` AS c)
WHERE `c`.`status` = 1 AND `c`.`parent_id` = '11';
There are only 2 categories with a value of 11 for parent_id, so it should look up 2 categories from the alias table.
Still if I use EXPLAIN it says it has to process 48 rows. The alias table contains 1 entry per category as well (in this case, it can be more). Everything is indexed and if I understand correctly therefore it should find the correct alias immediately.
Now here's the weird thing. When I don't compare the aliases by the categories from the conditions, but manually by the category ids the query returns, it does process only 1 row, as intended with the index.
So I replace WHERE category_id = c.id by WHERE category_id IN (37, 43) and the query gets faster:
The only thing I can think of is that the subquery isn't run over the results from the query but before some filtering is done. Any kind of explanation or help is welcome!
Edit: silly me, the WHERE IN doesn't work as it doesn't make a unique selection. The question still stands though!
Create table schema
CREATE TABLE `aliases` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lang_id` int(2) unsigned NOT NULL DEFAULT '1',
`alias` varchar(255) DEFAULT NULL,
`product_id` int(10) unsigned DEFAULT NULL,
`category_id` int(10) unsigned DEFAULT NULL,
`brand_id` int(10) unsigned DEFAULT NULL,
`page_id` int(10) unsigned DEFAULT NULL,
`campaign_id` int(10) unsigned DEFAULT NULL,
`old` tinyint(1) unsigned DEFAULT '0',
PRIMARY KEY (`id`),
KEY `product_id` (`product_id`),
KEY `category_id` (`category_id`),
KEY `page_id` (`page_id`),
KEY `alias_product_id` (`product_id`,`alias`),
KEY `alias_category_id` (`category_id`,`alias`),
KEY `alias_page_id` (`page_id`,`alias`),
KEY `alias_brand_id` (`brand_id`,`alias`),
KEY `alias_product_id_old` (`alias`,`product_id`,`old`),
KEY `alias_category_id_old` (`alias`,`category_id`,`old`),
KEY `alias_brand_id_old` (`alias`,`brand_id`,`old`),
KEY `alias_page_id_old` (`alias`,`page_id`,`old`),
KEY `lang_brand_old` (`lang_id`,`brand_id`,`old`),
KEY `id_category_id_lang_id_old` (`lang_id`,`old`,`id`,`category_id`)
) ENGINE=InnoDB AUTO_INCREMENT=112392 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
SELECT ...
WHERE x=1 AND y=2
ORDER BY id DESC
LIMIT 1
will be performed in one of several ways.
Since you have not shown us the indexes you have (SHOW CREATE TABLE), I will cover some likely cases...
INDEX(x, y, id) -- This can find the last row for that condition, so it does not need to look at more than one row.
Some other index, or no index: Scan DESCending from the last id checking each row for x=1 AND y=2, stopping when (if) such a row is found.
Some other index, or no index: Scan the entire table, checking each row for x=1 AND y=2; collect them into a temp table; sort by id; deliver one row.
Some of the EXPLAIN clues:
Using where -- does not say much
Using filesort -- it did a sort, apparently for the ORDER BY. (It may have been entirely done in RAM; ignore 'file'.)
Using index condition (not "Using index") -- this indicates an internal optimization in which it can check the WHERE clause more efficiently than it used to in older versions.
Do not trust the "Rows" in EXPLAIN. Often they are reasonably correct, but sometimes they are off by orders of magnitude. Here is a better way to see "how much work" is being done in a rather fast query:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
With the CREATE TABLE, I may have suggestions on how to improve the index.
I've been working on a small Perl program that works with a table of articles, displaying them to the user if they have not been already read. It has been working nicely and it has been quite speedy, overall. However, this afternoon, the performance has degraded from fast enough that I wasn't worried about optimizing the query to a glacial 3-4 seconds per query. To select articles, I present this query:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
WHERE ciid NOT
IN (
SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)
AND (
cid =117
OR cid =308
OR cid =310
)
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
The list of possible cid's varies and could be quite a bit more. In any case, I noted that about 2-3 seconds of the total time to make the query is devoted to "ORDER BY." If I remove that, it only takes about a half second to give me the query back. If I drop the subquery, the performance goes back to normal... but the subquery didn't seem to be problematic until just this afternoon, after working fine for a week or so.
Any ideas what could be slowing it down so much? What might I do to try to get the performance back up to snuff? The table being queried has 45,000 rows. The subquery's table has fewer than 3,000 rows at present.
Update: Incidentally, if anyone has suggestions on how to do multiple queries or some other technique that would be more efficient to accomplish what I am trying to do, I am all ears. I'm really puzzled how to solve the problem at this point. Can I somehow apply the order by before the join to make it apply to the real table and not the derived table? Would that be more efficient?
Here is the latest version of the query, derived from suggestions from #Gordon, below
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
LEFT JOIN (
SELECT ciid, dateRead
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)alreadyRead ON channelitem.ciid = alreadyRead.ciid
WHERE (
alreadyRead.ciid IS NULL
)
AND `cid`
IN ( 6648, 329, 323, 6654, 6647 )
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
Also, I should mention what my db structure looks like with regards to these two tables -- maybe someone can spot something odd about the structure:
CREATE TABLE IF NOT EXISTS `channelitem` (
`newsversion` int(11) NOT NULL DEFAULT '0',
`cid` int(11) NOT NULL DEFAULT '0',
`ciid` int(11) NOT NULL AUTO_INCREMENT,
`description` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`url` varchar(222) DEFAULT NULL,
`creationdate` datetime DEFAULT NULL,
`urgent` varchar(10) DEFAULT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`lastchanged` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`author` varchar(255) NOT NULL,
PRIMARY KEY (`ciid`),
KEY `newsversion` (`newsversion`),
KEY `cid` (`cid`),
KEY `creationdate` (`creationdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1638554365 ;
CREATE TABLE IF NOT EXISTS `uninet_channelitem_read` (
`ciid` int(11) NOT NULL,
`uid` int(11) NOT NULL,
`dateRead` datetime NOT NULL,
PRIMARY KEY (`ciid`,`uid`),
KEY `ciid` (`ciid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
It never hurts to try the left outer join version of such a query:
SELECT ci.ciid, ci.cid, ci.name, ci.description, ci.url, ci.creationdate, ci.author
FROM `channelitem` ci left outer join
(SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
) cr
on ci.ciid = cr.ciid
where cr.ciid is null and
ci.cid in (117, 308, 310)
ORDER BY ci.`creationdate` DESC
LIMIT 0 , 100
This query will be faster with an index on uninet_channelitem_read(ciid) and probably on channelitem(cid, ciid, createddate).
The problem could be that you need to create an index on the channelitem table for the column creationdate. Indexes help a database to run queries faster. Here is a link about MySQL Indexing
The basic form the the query is:
EXPLAIN SELECT SUM(impressions) as impressions, SUM(clicks) as clicks, SUM(cost) as cost, SUM(conversions) as conversions, keyword_id FROM `keyword_track` WHERE user_id=1 AND campaign_id=543 AND `recorded`>1325376071 GROUP BY keyword_id
It seems that I can either index say user_id, campaign_id and keyword_id and get the GROUP BY without a file sort, although a range index on the recorded is really going to more aggressively cut down on rows, this example has a big range but other queries have a much smaller time range.
Table looks like:
CREATE TABLE IF NOT EXISTS `keyword_track` (
`track_id` int(11) NOT NULL auto_increment,
`user_id` int(11) NOT NULL,
`campaign_id` int(11) NOT NULL,
`adgroup_id` int(11) NOT NULL,
`keyword_id` int(11) NOT NULL,
`recorded` int(11) NOT NULL,
`impressions` int(11) NOT NULL,
`clicks` int(11) NOT NULL,
`cost` decimal(10,2) NOT NULL,
`conversions` int(11) NOT NULL,
`max_cpc` decimal(3,2) NOT NULL,
`quality_score` tinyint(4) NOT NULL,
`avg_position` decimal(2,1) NOT NULL,
PRIMARY KEY (`track_id`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
I have left any keys I currently have out of that. Basically by question is what would the best way be to get in index on the range which still indexing at least the campaign_id and ideally not needing to filesort (although that might be an acceptable tradeoff to get a range index on the recorded time).
Whenever we have range constraint and order by constraint on different attributes of a table, we can either take the advantage of the fast filtering or fast ordering for result set but not BOTH.
My answer is...
If your range constraint really cut down huge number of records and result a small set of rows out, better index to support the range constraint. i.e (user_id, campaign_id, recorded)
if not, i mean if there are really big number of rows even after the range condition validated and are supposed to be sorted, then go for for an index that support ordering.
i.e(user_id, campaign_id, key_id)
To better understand this, have a look at the below link where the same thing is explained very clearly.
http://explainextended.com/2009/04/01/choosing-index/
The best index for you in this case is composite one user_id + campaign_id + recorded
Though this will not help to avoid filesort as long as you have > comparison with recorded and group by field that isn't included in the index at all.