MySQL query performance seems to scale linearly with number of parameters?

I have a setup like this:
using mysql 5.5
a building has a set of measurement equipment
measurements are stored in measurements -table
there are multiple different measurement types
users can have access to either a whole building, or a set of equipment
a few million measurements
The table creation looks like this:
CREATE TABLE `measurements` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
`building_id` INT(10) UNSIGNED NOT NULL,
`equipment_id` INT(10) UNSIGNED NOT NULL,
`state` ENUM('normal','high','low','alarm','error') NOT NULL DEFAULT 'normal',
`measurement_type` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `building_id` (`building_id`),
INDEX `equipment_id` (`equipment_id`),
INDEX `state` (`state`),
INDEX `timestamp` (`timestamp`),
INDEX `measurement_type` (`measurement_type`),
INDEX `building_timestamp_type` (`building_id`, `timestamp`, `measurement_type`),
INDEX `building_type_state` (`building_id`, `measurement_type`, `state`),
INDEX `equipment_type_state` (`equipment_id`, `measurement_type`, `state`),
INDEX `equipment_type_state_stamp` (`equipment_id`, `measurement_type`, `state`, `timestamp`)
) COLLATE='utf8_unicode_ci' ENGINE=InnoDB;
Now I need to query for the last 50 measurements of certain types that the user has access to. If the user has access to a whole building, the query runs very, very fast. However, if the user only has access to individual pieces of equipment, the query execution time seems to grow linearly with the number of equipment_ids. For example, with only 2 equipment_ids in the IN list the execution time was around 10ms; at 130 equipment_ids, it took 2.5s. The query I'm using looks like this:
SELECT *
FROM `measurements`
WHERE
`measurements`.`state` IN ('high','low','alarm')
AND `measurements`.`equipment_id` IN (
SELECT `equipment_users`.`equipment_id`
FROM `equipment_users`
WHERE `equipment_users`.`user_id` = 1
)
AND (`measurements`.`measurement_type` IN ('temperature','luminosity','humidity'))
ORDER BY `measurements`.`timestamp` DESC
LIMIT 50
The query seems to favor the measurement_type index, which makes it take 15 seconds; forcing it to use the equipment_type_state_stamp index drops that down to the numbers listed above. Still, why does the execution time go up linearly with the number of ids, and is there something I could do to prevent that?
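For reference, the forced-index variant of the query above is roughly this (using MySQL's FORCE INDEX hint):
SELECT *
FROM `measurements` FORCE INDEX (`equipment_type_state_stamp`)
WHERE
`measurements`.`state` IN ('high','low','alarm')
AND `measurements`.`equipment_id` IN (
SELECT `equipment_users`.`equipment_id`
FROM `equipment_users`
WHERE `equipment_users`.`user_id` = 1
)
AND (`measurements`.`measurement_type` IN ('temperature','luminosity','humidity'))
ORDER BY `measurements`.`timestamp` DESC
LIMIT 50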

Related

Mysql Very Slow Performance Using Order by Clause

I have one table with millions of entries. Below is the table structure.
CREATE TABLE `useractivity` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`userid` bigint(20) NOT NULL,
`likes` bigint(20) DEFAULT NULL,
`views` bigint(20) DEFAULT NULL,
`shares` bigint(20) DEFAULT NULL,
`totalcount` bigint(20) DEFAULT NULL,
`status` bigint(20) DEFAULT NULL,
`createdat` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `userid` (`userid`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And below is the query for which I am getting slow performance.
SELECT userid,
(sum(likes)+SUM(views)+SUM(shares)+SUM(totalcount)+SUM(`status`)) as total
from useractivity
GROUP BY userid
ORDER BY total DESC
limit 0, 20;
When I execute the above query without ORDER BY it returns a fast result set, but with ORDER BY the query becomes slow, even though I used LIMIT for pagination.
What can I do to speed up this query?
You can't speed up the query as it is: MySQL needs to visit every single row and calculate the sums before sorting and finally returning the first rows. That is bound to take time. You can probably cheat, though.
The most obvious approach would be to create a summary table with userid and total. Update it when the base table changes or recompute it regularly, whatever makes sense. In that table you can index total, which makes the query trivial.
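A minimal sketch of such a summary table (the names here are illustrative, not from the original post):
CREATE TABLE `useractivity_totals` (
`userid` bigint(20) NOT NULL,
`total` bigint(20) DEFAULT NULL,
PRIMARY KEY (`userid`),
KEY `total` (`total`)
) ENGINE=InnoDB;

-- recompute periodically, or maintain it whenever the base table changes
INSERT INTO useractivity_totals (userid, total)
SELECT userid, SUM(likes)+SUM(views)+SUM(shares)+SUM(totalcount)+SUM(`status`)
FROM useractivity
GROUP BY userid
ON DUPLICATE KEY UPDATE total = VALUES(total);

-- the paginated query then becomes a short index scan
SELECT userid, total FROM useractivity_totals ORDER BY total DESC LIMIT 0, 20;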
Another option may be to find the top users. Most sites have users that are more active than the others. Keep the 1000 top users in a separate table, then use the same select but only for the top users (i.e. join with that table). Only the useractivity rows for the top users need to be visited, which should be fast. If 1000 users are not enough perhaps 10000 works.
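A rough sketch of that approach, assuming a hypothetical top_users table that holds the pre-selected user ids:
SELECT u.userid,
(SUM(u.likes)+SUM(u.views)+SUM(u.shares)+SUM(u.totalcount)+SUM(u.`status`)) as total
FROM top_users t
JOIN useractivity u ON u.userid = t.userid
GROUP BY u.userid
ORDER BY total DESC
LIMIT 0, 20;
With the existing userid index on useractivity, only the rows belonging to the listed users need to be read.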

Reduce processing power on query

I'm dealing with a table where new rows are added several times per minute. Just as often, I need to run the query below, which I believe is the cause of recent lag spikes.
Is there any way I can make this query more efficient?
SELECT IFNULL((SELECT SUM(amount)
FROM transactions
WHERE to_account=:account), 0) - IFNULL((SELECT SUM(amount)
FROM transactions WHERE from_account=:account), 0) as balance;
CREATE TABLE IF NOT EXISTS `transactions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`from_account` int(11) NOT NULL,
`to_account` int(11) NOT NULL,
`amount` int(11) unsigned NOT NULL,
`fee` int(11) NOT NULL DEFAULT '0',
`ip` varchar(39) DEFAULT NULL,
`time` int(10) NOT NULL,
`type` enum('CLAIM','REFERRAL','DEPOSIT','WITHDRAWAL') NOT NULL,
`is_processed` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
KEY `from_account` (`from_account`),
KEY `to_account` (`to_account`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=24099 ;
It looks like your two table-reading queries are
SELECT SUM(amount)
FROM transactions
WHERE from_account = constant
and
SELECT SUM(amount)
FROM transactions
WHERE to_account = constant
The first of these can be optimized by a compound index on (from_account, amount) because it allows the query to be satisfied by an index range scan. The second query can be satisfied with a similar index. You can get rid of your indexes on from_account and to_account, replacing them with these compound indexes.
It is still possible that you have contention between your INSERT and SELECT queries. But indexing for the SELECTs should reduce that.
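One possible way to make that swap (the index names are only placeholders):
ALTER TABLE `transactions`
DROP INDEX `from_account`,
DROP INDEX `to_account`,
ADD INDEX `from_account_amount` (`from_account`, `amount`),
ADD INDEX `to_account_amount` (`to_account`, `amount`);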
OK, the lag you experience might not be 100% preventable, but there is a small index trick which can reduce the chance of this lag:
Use a multi-column index on all the values you use and/or return in your query.
In your case that would be:
CREATE INDEX idx_nn_1 ON transactions(from_account,amount);
CREATE INDEX idx_nn_2 ON transactions(to_account,amount);
What this does is ensure that the actual table rows do not have to be read to produce your result; the index alone covers the query.
There is a drawback: updating/inserting records in a multi-column index is slower than in a single-column index. The trade-off here can only be determined by testing.
Also, simplify the query:
SELECT
( SELECT IFNULL(SUM(amount), 0) FROM transactions WHERE to_account = ... ) -
( SELECT IFNULL(SUM(amount), 0) FROM transactions WHERE from_account = ... )
AS balance;
The table does not seem very big, but check your setting of innodb_buffer_pool_size.

MySQL: SUM/MAX/MIN GROUP BY query optimize

I have a table of bitcoin transactions:
CREATE TABLE `transactions` (
`trans_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`trans_exchange` int(10) unsigned DEFAULT NULL,
`trans_currency_base` int(10) unsigned DEFAULT NULL,
`trans_currency_counter` int(10) unsigned DEFAULT NULL,
`trans_tid` varchar(20) DEFAULT NULL,
`trans_type` tinyint(4) DEFAULT NULL,
`trans_price` decimal(15,4) DEFAULT NULL,
`trans_amount` decimal(15,8) DEFAULT NULL,
`trans_datetime` datetime DEFAULT NULL,
`trans_sid` bigint(20) DEFAULT NULL,
`trans_timestamp` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`trans_id`),
KEY `trans_tid` (`trans_tid`),
KEY `trans_datetime` (`trans_datetime`),
KEY `trans_timestmp` (`trans_timestamp`),
KEY `trans_price` (`trans_price`),
KEY `trans_amount` (`trans_amount`)
) ENGINE=MyISAM AUTO_INCREMENT=6162559 DEFAULT CHARSET=utf8;
As you can see from the AUTO_INCREMENT value, the table has over 6 million entries. There will eventually be many more.
I would like to query the table to obtain max price, min price, volume and total amount traded during arbitrary time intervals. To accomplish this, I'm using a query like this:
SELECT
DATE_FORMAT( MIN(transactions.trans_datetime),
'%Y/%m/%d %H:%i:00'
) AS trans_datetime,
SUM(transactions.trans_amount) as trans_volume,
MAX(transactions.trans_price) as trans_max_price,
MIN(transactions.trans_price) as trans_min_price,
COUNT(transactions.trans_id) AS trans_count
FROM
transactions
WHERE
transactions.trans_datetime BETWEEN '2014-09-14 00:00:00' AND '2015-09-13 23:59:00'
GROUP BY
transactions.trans_timestamp DIV 86400
That should select transactions made over a year period, grouped by day (86,400 seconds).
The idea is that the timestamp field, which contains the same value as the datetime field but as a Unix timestamp (I found this faster than calling UNIX_TIMESTAMP(trans_datetime)), is divided by the number of seconds I want in each time interval.
The problem: the query is slow. I'm getting 4+ seconds processing time. Here is the result of EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE transactions ALL trans_datetime,trans_timestmp NULL NULL NULL 6162558 Using where; Using temporary; Using filesort
The question: is it possible to optimize this better? Is this structure or approach flawed? I have tried several approaches, and have only succeeded in making modest millisecond-type gains.
Most of the data in the table is for the last 12 months? So you need to touch most of the table? Then there is no way to speed that query up. However, you can get the same output orders of magnitude faster...
Create a summary table. It would have a DATE as the PRIMARY KEY, and the columns would be effectively the fields mentioned in your SELECT.
Once you have initially populated the summary table, then maintain it by adding a new row each night for the day's transactions. More in my blog.
Then the query to get the desired output would hit this summary table (with only a few hundred rows), not the table with millions of rows.
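A sketch of what that might look like; the table and column names below are an assumption, not from the original answer:
CREATE TABLE `transactions_daily` (
`trans_date` date NOT NULL,
`trans_volume` decimal(20,8) DEFAULT NULL,
`trans_max_price` decimal(15,4) DEFAULT NULL,
`trans_min_price` decimal(15,4) DEFAULT NULL,
`trans_count` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`trans_date`)
);

-- nightly job: add one row for yesterday's transactions
INSERT INTO transactions_daily
SELECT DATE(trans_datetime),
SUM(trans_amount), MAX(trans_price), MIN(trans_price), COUNT(*)
FROM transactions
WHERE trans_datetime >= CURDATE() - INTERVAL 1 DAY
AND trans_datetime < CURDATE()
GROUP BY DATE(trans_datetime);
The year-long report then becomes a simple range scan over at most a few hundred rows of transactions_daily.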

mySQL query optimisation, range/composite index/group by

The basic form of the query is:
EXPLAIN SELECT
SUM(impressions) as impressions,
SUM(clicks) as clicks,
SUM(cost) as cost,
SUM(conversions) as conversions,
keyword_id
FROM `keyword_track`
WHERE user_id=1 AND campaign_id=543 AND `recorded`>1325376071
GROUP BY keyword_id
It seems that I can index, say, user_id, campaign_id and keyword_id and get the GROUP BY without a filesort, although a range index on recorded is really going to cut down on rows much more aggressively. This example has a big range, but other queries have a much smaller time range.
Table looks like:
CREATE TABLE IF NOT EXISTS `keyword_track` (
`track_id` int(11) NOT NULL auto_increment,
`user_id` int(11) NOT NULL,
`campaign_id` int(11) NOT NULL,
`adgroup_id` int(11) NOT NULL,
`keyword_id` int(11) NOT NULL,
`recorded` int(11) NOT NULL,
`impressions` int(11) NOT NULL,
`clicks` int(11) NOT NULL,
`cost` decimal(10,2) NOT NULL,
`conversions` int(11) NOT NULL,
`max_cpc` decimal(3,2) NOT NULL,
`quality_score` tinyint(4) NOT NULL,
`avg_position` decimal(2,1) NOT NULL,
PRIMARY KEY (`track_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
I have left out any keys I currently have. Basically, my question is: what would be the best way to get an index on the range while still indexing at least campaign_id, and ideally not needing a filesort (although that might be an acceptable trade-off to get a range index on the recorded time)?
Whenever we have a range constraint and a grouping/ordering constraint on different attributes of a table, we can take advantage of either fast filtering or fast ordering of the result set, but not BOTH.
My answer is...
If your range constraint really cuts down a huge number of records and leaves only a small set of rows, it is better to index in support of the range constraint, i.e. (user_id, campaign_id, recorded).
If not, that is, if there is still a really big number of rows to be sorted even after the range condition is applied, then go for an index that supports the ordering,
i.e. (user_id, campaign_id, keyword_id).
To better understand this, have a look at the below link where the same thing is explained very clearly.
http://explainextended.com/2009/04/01/choosing-index/
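In concrete terms, the two candidate indexes for the table above would be something like this (pick one based on how selective the range really is):
ALTER TABLE `keyword_track` ADD INDEX `user_campaign_recorded` (`user_id`, `campaign_id`, `recorded`);
-- or, to support the grouping instead:
ALTER TABLE `keyword_track` ADD INDEX `user_campaign_keyword` (`user_id`, `campaign_id`, `keyword_id`);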
The best index for you in this case is the composite one: user_id + campaign_id + recorded.
Though this will not help avoid the filesort, as long as you have a > comparison on recorded and group by a field that isn't included in the index at all.

Optimization of a query with GROUP BY clause by using indexes

I need to optimize indexes in a table that stores more than 10 million rows. The query that is particularly time-consuming takes up to 10 seconds to run (when the WHERE clause filters out only about 2 million rows, so 8 million must be grouped). I have created a few indexes (some of them are complex, some simpler) and tried to find out how to speed this up. Perhaps I'm doing something wrong. MySQL is using the optimized_5 index (based on EXPLAIN).
Here is the table's structure and the query:
CREATE TABLE IF NOT EXISTS `geo_reverse` (
`fid` mediumint(8) unsigned NOT NULL,
`tablename` enum('table1','table2') NOT NULL default 'table1',
`geo_continent` varchar(2) NOT NULL,
`geo_country` varchar(2) NOT NULL,
`geo_region` varchar(8) NOT NULL,
`geo_city` mediumint(8) unsigned NOT NULL,
`type` varchar(30) NOT NULL,
PRIMARY KEY (`fid`,`tablename`,`geo_continent`,`geo_country`,`geo_region`,`geo_city`),
KEY `geo_city` (`geo_city`),
KEY `fid` (`fid`),
KEY `geo_region` (`geo_region`,`geo_city`),
KEY `optimized` (`tablename`,`type`,`geo_continent`,`geo_country`,`geo_region`,`geo_city`,`fid`),
KEY `optimized_2` (`fid`,`tablename`),
KEY `optimized_3` (`type`,`geo_city`),
KEY `optimized_4` (`geo_city`,`tablename`),
KEY `optimized_5` (`tablename`,`type`,`geo_city`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
An example query:
SELECT type, COUNT(*) AS objects FROM geo_reverse WHERE tablename = 'table1' AND geo_city IN (5847207,5112771,4916894,...) GROUP BY type
Do you have any idea of how to speed the computation up?
I would use the following index: (geo_city, tablename, type). geo_city is obviously more selective than tablename, so it should be on the left. After the condition is applied, the rest is already sorted by type for the grouping.
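That suggestion would look something like this (the index name is arbitrary):
ALTER TABLE `geo_reverse` ADD INDEX `geo_city_tablename_type` (`geo_city`, `tablename`, `type`);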