Any chance of speeding up this simple MySQL Query? - mysql

I'm stumped on this query. It searches an important table that contains about 213k rows. The purpose of the query is to report traffic data for a month: the amount of traffic for each day of that month, and the sum of a decimal value for each day. This query is run frequently, so I need to optimize it as much as possible. It currently takes about 2 seconds on average.
SQL Fiddle: http://sqlfiddle.com/#!2/171f5/3/0
All suggestions will be greatly appreciated! Thank you.
Query:
SELECT `date_day`, COUNT(*) AS num, SUM(decval) AS sum_decval FROM (`tbl_traffic`)
WHERE `uuid` = '1' AND `date_year` = '2012' AND `date_month` = '11'
GROUP BY `date_day`;
Explain Result:
id: 1
select_type: SIMPLE
table: adb1_analytics
type: ref
possible_keys: keys1,keys2,keys3
key: keys1
key_len: 7
ref: const,const,const
rows: 106693
Extra: Using where
1 row in set (0.13 sec)
Table structure:
CREATE TABLE IF NOT EXISTS `tbl_traffic` (
`id` int(100) unsigned NOT NULL AUTO_INCREMENT,
`uuid` int(100) unsigned NOT NULL,
`country` char(2) CHARACTER SET latin1 DEFAULT NULL,
`browser` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
`platform` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
`referrer` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`decval` decimal(15,5) NOT NULL,
`date_year` smallint(4) unsigned NOT NULL,
`date_month` tinyint(2) unsigned NOT NULL,
`date_day` tinyint(2) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `keys1` (`uuid`,`date_year`,`date_month`,`date_day`),
KEY `keys2` (`date_year`,`date_month`,`referrer`),
KEY `keys3` (`date_year`,`date_month`,`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

You have no effective indexes on date_day.
I would recommend creating a key specifically for what you are fetching and calculating: (date_day, decval)

Use count(id) and add keys on date_day and decval.
A covering key over both fields might be even better
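A minimal sketch of what a covering key could look like here. This is one possible reading of the suggestion, not necessarily what the answerer had in mind: it assumes the WHERE columns should lead the index so that the same index can filter, group and supply decval. The index name idx_traffic_covering is made up for illustration.

```sql
-- Hypothetical index name; the column order is an assumption:
-- equality-filtered columns first, then the GROUP BY column,
-- then the summed column so the table rows need not be read.
ALTER TABLE tbl_traffic
  ADD INDEX idx_traffic_covering (uuid, date_year, date_month, date_day, decval);

-- Re-check the plan afterwards.
EXPLAIN
SELECT date_day, COUNT(*) AS num, SUM(decval) AS sum_decval
FROM tbl_traffic
WHERE uuid = 1 AND date_year = 2012 AND date_month = 11
GROUP BY date_day;
```

If the optimizer picks the new index as a covering index, the Extra column of EXPLAIN should include "Using index". Whether it is actually faster depends on the data, so time it before and after.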

Any time you do a GROUP BY that MySQL cannot resolve from an index, you're asking it to scan every matching row and build a temporary table.
http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html
If you can't index it better (see @njk's answer), then you might want to select your specific values first and then do the GROUP BY, SUM, etc. on that subset. If nothing else, at least it will be a smaller set to sort.

Related

mysql query is slow, adding indexes doesnt work

I have a table with 100k+ rows but my queries are slow (they take about 3 seconds).
I tried making an index like this but this doesn't seem to do anything.
ALTER TABLE pm ADD INDEX (sender,reciever)
This is my query:
SELECT id,message FROM pm WHERE reciever = '28075' OR sender = '28075'
That takes 3 seconds more or less.
Explain of table PM
EXPLAIN of the query:
SHOW CREATE TABLE PM:
CREATE TABLE `pm` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`datetime` int(11) NOT NULL,
`sender` int(11) NOT NULL,
`reciever` int(11) NOT NULL,
`users` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`readm` int(11) NOT NULL DEFAULT '0',
`forOp` int(11) NOT NULL DEFAULT '0',
`bussy` int(11) NOT NULL DEFAULT '0',
`bericht` longtext CHARACTER SET utf8mb4,
`aantal` int(11) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
KEY `users` (`users`(191)),
KEY `sender` (`sender`,`reciever`)
) ENGINE=InnoDB AUTO_INCREMENT=1637118 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
The query cannot use your index because it uses OR, and the index can't be used to match reciever on its own (a compound index requires that you match the leftmost column before the second one).
MySQL 5.0 added the index_merge access method, which allows using multiple indexes on the same table in one query, so if you have separate indexes on sender and reciever it could pick those.
An alternative would be to rewrite the query to use UNION, again with separate indexes instead of the compound one:
SELECT id,message FROM pm WHERE reciever = '28075'
UNION
SELECT id,message FROM pm WHERE sender = '28075'
You can read more at this article
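As a rough sketch of the "separate indexes" part: the existing KEY sender (sender, reciever) already serves lookups on sender alone, because sender is its leftmost column, so only reciever needs an index of its own. The index name idx_pm_reciever is invented for illustration, and the column list is copied from the question as written.

```sql
-- Hypothetical index name. The existing compound key on
-- (sender, reciever) already covers "WHERE sender = ...".
ALTER TABLE pm
  ADD INDEX idx_pm_reciever (reciever);

-- Each branch can now use its own index; UNION removes rows
-- that match both branches.
EXPLAIN
SELECT id, message FROM pm WHERE reciever = 28075
UNION
SELECT id, message FROM pm WHERE sender = 28075;
```

With both indexes in place, the original OR query may also be resolved through index_merge; EXPLAIN on that query would show type: index_merge if the optimizer chooses it.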

How to find the reason for the difference in the execution time of a query against different databases?

I have two databases with identical schemas. One is from production; the other is a test database. I'm running a query against a single table. On the production table the query takes around 4.3 seconds, while on the test database it takes about 130 ms. However, the production table has fewer than 50,000 records, while I've seeded the test table with more than 100,000. I've compared the two tables and both have the same indexes. To me, it seems that the problem is in the data. While seeding, I tried to generate data that was as random as possible so that I could simulate production conditions, but I still couldn't reproduce the slow query.
I looked at the results from EXPLAIN for the two queries. They differ significantly in the last two columns.
Production:
+-------+-------------------------+
| rows | Extra |
+-------+-------------------------+
| 24459 | Using where |
| 46 | Using where; Not exists |
+-------+-------------------------+
Test:
+------+------------------------------------+
| rows | Extra |
+------+------------------------------------+
| 3158 | Using index condition; Using where |
| 20 | Using where; Not exists |
+------+------------------------------------+
The create statement for the table on production is:
CREATE TABLE `usage_logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`operation` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
`check_time` datetime NOT NULL,
`check_in_log_id` int(11) DEFAULT NULL,
`daily_usage_id` int(11) DEFAULT NULL,
`duration_units` decimal(11,2) DEFAULT NULL,
`is_deleted` tinyint(1) NOT NULL DEFAULT '0',
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`facility_id` int(11) NOT NULL,
`notes` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`mac_address` varchar(20) COLLATE utf8_unicode_ci NOT NULL DEFAULT '00:00:00:00:00:00',
`login` varchar(40) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_usage_logs_on_user_id` (`user_id`),
KEY `index_usage_logs_on_check_in_log_id` (`check_in_log_id`),
KEY `index_usage_logs_on_facility_id` (`facility_id`),
KEY `index_usage_logs_on_check_time` (`check_time`),
KEY `index_usage_logs_on_mac_address` (`mac_address`),
KEY `index_usage_logs_on_operation` (`operation`)
) ENGINE=InnoDB AUTO_INCREMENT=145147 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
while the same in the test database is:
CREATE TABLE `usage_logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`operation` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
`check_time` datetime NOT NULL,
`check_in_log_id` int(11) DEFAULT NULL,
`daily_usage_id` int(11) DEFAULT NULL,
`duration_units` decimal(11,2) DEFAULT NULL,
`is_deleted` tinyint(1) NOT NULL DEFAULT '0',
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`facility_id` int(11) NOT NULL,
`notes` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`mac_address` varchar(20) COLLATE utf8_unicode_ci NOT NULL DEFAULT '00:00:00:00:00:00',
`login` varchar(40) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_usage_logs_on_check_in_log_id` (`check_in_log_id`),
KEY `index_usage_logs_on_check_time` (`check_time`),
KEY `index_usage_logs_on_facility_id` (`facility_id`),
KEY `index_usage_logs_on_mac_address` (`mac_address`),
KEY `index_usage_logs_on_operation` (`operation`),
KEY `index_usage_logs_on_user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=104001 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The full query is:
SELECT `usage_logs`.*
FROM `usage_logs`
LEFT OUTER JOIN usage_logs AS usage_logs_latest ON usage_logs.facility_id = usage_logs_latest.facility_id
AND usage_logs.user_id = usage_logs_latest.user_id
AND usage_logs.mac_address = usage_logs_latest.mac_address
AND usage_logs.check_time < usage_logs_latest.check_time
WHERE `usage_logs`.`facility_id` = 5
AND `usage_logs`.`operation` = 'checkIn'
AND (usage_logs.check_time >= '2018-06-08 00:00:00')
AND (usage_logs.check_time <= '2018-06-08 11:23:05')
AND (usage_logs_latest.id IS NULL)
I execute the query on the same machine against two different databases, so I don't think that other processes are interfering in the result.
What does this result mean and what further steps can I take in order to find out the reason for the big difference in the execution time?
What MySQL version(s) are you using?
There are many factors that lead to the decision by the Optimizer as to
which table to start with; (we can't see if they are different)
which index(es) to use; (we can't see)
etc.
Some of the factors:
the distribution of the index values at the moment,
the MySQL version,
the phase of the moon.
These can also lead to different numbers (estimates) in the EXPLAIN, which may lead to different query plans.
Also other activity in the server can interfere with the availability of CPU/IO/etc. In particular caching of the data can easily show a 10x difference. Did you run each query twice? Is the Query cache turned off? Is innodb_buffer_pool_size the same? Is RAM size the same?
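The questions above can be checked from the MySQL client on each database. This is only a sketch: the query cache variables exist in MySQL 5.7 and earlier and were removed in 8.0, so the second SHOW may return an empty set on newer servers.

```sql
-- Compare these on both servers.
SELECT VERSION();
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'query_cache%';

-- Run the timing query twice on each side so that the second run
-- measures a warm buffer pool rather than disk reads.
SELECT COUNT(*)
FROM usage_logs
WHERE facility_id = 5
  AND operation = 'checkIn'
  AND check_time >= '2018-06-08 00:00:00'
  AND check_time <= '2018-06-08 11:23:05';
```

The COUNT(*) query is a simplified stand-in for the real query, used here only to show the "run it twice" idea.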
I see Using index condition and no "composite" indexes. Often performance can be improved by providing a suitable composite index.
I gotta see the query!
Seeding
Random, or not-so-random, rows can influence the Optimizer's choice of which index (etc) to use. This may have led to picking a better way to run the query on 'test'.
We need to see EXPLAIN SELECT ... to discuss this angle further.
Composite indexes
These are likely to help on both servers:
INDEX(facility_id, operation, -- either order
check_time) -- last
INDEX(facility_id, user_id, mac_address, check_time, -- any order
id) -- last
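Written out as a statement, the two suggested indexes might look like the sketch below. The index names are hypothetical, and the column is spelled mac_address as in the table definition.

```sql
-- Hypothetical index names; column order follows the suggestion above.
ALTER TABLE usage_logs
  ADD INDEX idx_ul_facility_op_time (facility_id, operation, check_time),
  ADD INDEX idx_ul_facility_user_mac_time (facility_id, user_id, mac_address, check_time, id);
```

On InnoDB the primary key is appended to every secondary index anyway, so listing id last is harmless but not strictly required.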
There is a quick improvement. Instead of joining to all the later rows and then ignoring their contents, use a NOT EXISTS subquery (an anti-join, often loosely called a 'semi-join'), which only asks whether any such row exists:
SELECT `usage_logs`.*
FROM `usage_logs`
WHERE `usage_logs`.`facility_id` = 5
AND `usage_logs`.`operation` = 'checkIn'
AND (usage_logs.check_time >= '2018-06-08 00:00:00')
AND (usage_logs.check_time <= '2018-06-08 11:23:05')
AND NOT EXISTS ( SELECT 1 FROM usage_logs AS latest
WHERE usage_logs.facility_id = latest.facility_id
AND usage_logs.user_id = latest.user_id
AND usage_logs.mac_address = latest.mac_address
AND usage_logs.check_time < latest.check_time )
(The same indexes will be fine.)
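The query seems to be getting only the rows that have no later row for the same facility_id, user_id and mac_address, that is, the latest row per combination; is that what you wanted?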

MySQL selecting wrong index

The table in question has millions of records. The following query is really slow (takes up to four minutes to perform), because MySQL picks the primary key as index.
SELECT
*
FROM
`activities`
WHERE
`integration_id` = 11
ORDER BY `id` DESC
LIMIT 10 OFFSET 2
The table has multiple indexes, including an index on integration. When forced, the use of this index reduces the query time to about a second:
SELECT
*
FROM
`activities`
USE INDEX (integration)
WHERE
`integration_id` = 11
ORDER BY `id` DESC
LIMIT 10 OFFSET 2
Using EXPLAIN, MySQL shows the following:
first query: (the slow one)
key: PRIMARY
key_len: 4
rows: 5472
second query: (the fast one)
key: integration
key_len: 5
rows: 24028
Seeing MySQL's explanation, I understand why it picks the primary key. In reality, however, the number of rows examined by the first query is in the millions, and the query is indeed excruciatingly slow to execute.
Unfortunately I cannot just force the index in the query, because the query is dynamic (some other clauses can be included) and an ORM is used. So I'm looking for a solution in MySQL's configuration, if possible.
Extra info
I found altering the query in some cases resulted in MySQL picking the correct index. E.g. removing the ORDER BY clause, and replacing SELECT * with SELECT <insert every table column here>. I've no clue why. Removing the order is no option for me, and manually selecting every column is tricky with the ORM (apart from that, I would like to know why this fixes the index picking).
The EXPLAIN
id: '1'
select_type: 'SIMPLE'
table: 'activities'
type: 'index'
possible_keys: 'integration_category,integration_level,integration_category_level,integration'
key: PRIMARY
key_len: 4
ref: null
rows: '5473'
Extra: Using where
SHOW CREATE TABLE
CREATE TABLE `activities` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`content` text COLLATE utf8_unicode_ci NOT NULL,
`parameters` text COLLATE utf8_unicode_ci,
`input` text COLLATE utf8_unicode_ci,
`output` text COLLATE utf8_unicode_ci,
`response` text COLLATE utf8_unicode_ci,
`integration_id` int(10) unsigned DEFAULT NULL,
`level` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`category` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `integration_category` (`integration_id`,`category`),
KEY `integration_level` (`integration_id`,`level`),
KEY `integration_category_level` (`integration_id`,`category`,`level`),
KEY `integration` (`integration_id`),
CONSTRAINT `activities_integration_id_foreign` FOREIGN KEY (`integration_id`) REFERENCES `integrations` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=11262471 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

MySQL not using index WHERE IN

I have a simple SELECT query that's using WHERE IN across 5 integer values to get 50 results in a table of roughly 20 million rows. Here's the query:
select `prf_profiles_id` as `id`, `first`, `last`, `suffix` from `prf_names`
where `prf_names`.`src_sources_id` in (9, 10, 11, 34, 37) limit 50 offset 0;
And the explain values for the query:
select_type: SIMPLE
table: prf_names
type: ALL
possible_keys: prf_names_src_sources_id_index
key: NULL
ref: NULL
rows: 20012960
Extra: Using where
As can be seen, the query knows the index is possible, but chooses not to use it and as a result the query takes about 4.5 seconds.
src_sources_id is indexed, and contains about 20 distinct values applied to all 20 million rows.
I know that the index would help immensely because when I run the same query using FORCE INDEX:
select `prf_profiles_id` as `id`, `first`, `last`, `suffix` from `prf_names`
force index (prf_names_src_sources_id_index)
where `prf_names`.`src_sources_id` in (9, 10, 11, 34, 37) limit 50 offset 0;
The query takes under 0.0 seconds.
I'd like to avoid forcing MySQL to use the index as the query is being called through an ORM and forcing it would defeat some of the purpose of having the ORM in the first place.
What can I do to ensure MySQL uses this index going forward?
EDIT:
Here's the create table statement:
CREATE TABLE `prf_names` (
`prf_profiles_id` int(10) unsigned NOT NULL,
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`created` datetime NOT NULL,
`verified` datetime NOT NULL,
`title` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`first` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`middle` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`last` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`suffix` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
`sanitized` enum('yes','no') COLLATE utf8_unicode_ci NOT NULL DEFAULT 'no',
`src_sources_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`prf_profiles_id`,`id`),
UNIQUE KEY `prf_names_id_unique` (`id`),
KEY `prf_names_first_index` (`first`),
KEY `prf_names_middle_index` (`middle`),
KEY `prf_names_last_index` (`last`),
KEY `prf_names_src_sources_id_index` (`src_sources_id`)
) ENGINE=InnoDB AUTO_INCREMENT=31633081 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

optimize query (2 simple left joins)

SELECT fcat.id,fcat.title,fcat.description,
count(DISTINCT ftopic.id) as number_topics,
count(DISTINCT fpost.id) as number_posts FROM fcat
LEFT JOIN ftopic ON fcat.id=ftopic.cat_id
LEFT JOIN fpost ON ftopic.id=fpost.topic_id
GROUP BY fcat.id
ORDER BY fcat.ord
LIMIT 100;
There are indexes on ftopic.cat_id, fpost.topic_id and fcat.ord.
EXPLAIN:
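+----+-------------+--------+------+------------------+------------+---------+-----------------+------+---------------------------------+
| id | select_type | table  | type | possible_keys    | key        | key_len | ref             | rows | Extra                           |
+----+-------------+--------+------+------------------+------------+---------+-----------------+------+---------------------------------+
|  1 | SIMPLE      | fcat   | ALL  | PRIMARY          | NULL       | NULL    | NULL            |   11 | Using temporary; Using filesort |
|  1 | SIMPLE      | ftopic | ref  | PRIMARY,cat_id_2 | cat_id_2   | 4       | bloki.fcat.id   |   72 |                                 |
|  1 | SIMPLE      | fpost  | ref  | topic_id_2       | topic_id_2 | 4       | bloki.ftopic.id |  245 |                                 |
+----+-------------+--------+------+------------------+------------+---------+-----------------+------+---------------------------------+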
fcat - 11 rows,
ftopic - 1106 rows,
fpost - 363000 rows
The query takes 4.2 sec.
TABLES:
CREATE TABLE IF NOT EXISTS `fcat` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(250) collate utf8_unicode_ci NOT NULL,
`description` varchar(250) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`ord` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `ord` (`ord`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=12 ;
CREATE TABLE IF NOT EXISTS `ftopic` (
`id` int(11) NOT NULL auto_increment,
`cat_id` int(11) NOT NULL,
`title` varchar(100) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`updated` timestamp NOT NULL default CURRENT_TIMESTAMP,
`lastname` varchar(200) collate utf8_unicode_ci NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`closed` tinyint(4) NOT NULL default '0',
`views` int(11) NOT NULL default '1',
PRIMARY KEY (`id`),
KEY `cat_id_2` (`cat_id`,`updated`,`visible`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1116 ;
CREATE TABLE IF NOT EXISTS `fpost` (
`id` int(11) NOT NULL auto_increment,
`topic_id` int(11) NOT NULL,
`pet_id` int(11) NOT NULL,
`content` text collate utf8_unicode_ci NOT NULL,
`imageName` varchar(300) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`reply_id` int(11) NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`md5` varchar(100) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `md5` (`md5`),
KEY `topic_id_2` (`topic_id`,`created`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=390971 ;
Thanks,
hamlet
You need to create a key with both fcat.id and fcat.ord.
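A sketch of that suggestion, with a made-up index name. Whether it changes the plan is something to verify with EXPLAIN rather than assume, since fcat is very small.

```sql
-- Hypothetical index name.
ALTER TABLE fcat
  ADD INDEX idx_fcat_id_ord (id, ord);

-- Check whether the plan for the original query changes.
EXPLAIN
SELECT fcat.id, fcat.title, fcat.description,
       COUNT(DISTINCT ftopic.id) AS number_topics,
       COUNT(DISTINCT fpost.id) AS number_posts
FROM fcat
LEFT JOIN ftopic ON fcat.id = ftopic.cat_id
LEFT JOIN fpost ON ftopic.id = fpost.topic_id
GROUP BY fcat.id
ORDER BY fcat.ord
LIMIT 100;
```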
Bold rewrite
This code is not functionally identical, but...
Because you want to know about distinct ftopic.id and fpost.id, I'm going to be bold and suggest two INNER JOINs instead of LEFT JOINs.
Each fpost row then appears exactly once in the joined result, so you can drop the DISTINCT on fpost.id. ftopic.id still repeats once for every post in the topic, so keep the DISTINCT there.
SELECT
fcat.id
, fcat.title
, fcat.description
, count(DISTINCT ftopic.id) as number_topics
, count(fpost.id) as number_posts
FROM fcat
INNER JOIN ftopic ON fcat.id = ftopic.cat_id
INNER JOIN fpost ON ftopic.id = fpost.topic_id
GROUP BY fcat.id
ORDER BY fcat.ord
LIMIT 100;
It depends on your data if this is what you are looking for, but I'm guessing it will be faster.
All your indexes seem to be in order though.
MySQL does not use indexes for small sample sizes!
Note that the EXPLAIN output shows that MySQL only has 11 rows to consider for fcat. This is not enough for MySQL to really start worrying about indexes, so it doesn't.
Going to the index for small row counts slows things down.
MySQL is trying to speed things up, so it chooses not to use the index. This confuses a lot of people because we are trained so hard on indexes. Small sample sizes don't give good EXPLAINs!
Increase the size of the test data so MySQL has more rows to consider and you should start seeing the index being used.
Common misconceptions about force index
Force index does not force MySQL to use an index as such.
It hints to MySQL that it should use a different index from the one it might naturally use, and it pushes MySQL into using an index by setting a very high cost on a table scan.
(In your case MySQL is not using a table scan, so force index has no effect)
MySQL (like most other DBMSs on the planet) has a very strong urge to use indexes, so if it doesn't use any, that's because the optimizer estimates that using no index at all is faster.
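To see how a hint changes the optimizer's choice, the plans can be compared side by side. This is an illustrative sketch against the fcat table and its ord index from the question; the simplified SELECT is an assumption made to keep the comparison readable.

```sql
-- Plan the optimizer picks on its own.
EXPLAIN SELECT id, title FROM fcat ORDER BY ord LIMIT 100;

-- Same query with a table scan made very expensive.
EXPLAIN SELECT id, title FROM fcat FORCE INDEX (ord) ORDER BY ord LIMIT 100;

-- Same query with the index taken off the table.
EXPLAIN SELECT id, title FROM fcat IGNORE INDEX (ord) ORDER BY ord LIMIT 100;
```

Comparing the key and Extra columns across the three outputs shows whether the hint had any effect on this data set.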
How does MySQL know which index to use
One of the parameters the query optimizer uses is the stored cardinality of the indexes.
Over time these values change... But studying the table takes time, so MySQL doesn't do that unless you tell it to.
Another parameter that affects index selection is the predicted disk-seek-times that MySQL expects to encounter when performing the query.
Tips to improve index usage
ANALYZE TABLE will instruct MySQL to re-evaluate the indexes and update its key distribution (cardinality). (consider running it daily/weekly in a cron job)
SHOW INDEX FROM table will display the key distribution.
MyISAM tables and indexes fragment over time. Use OPTIMIZE TABLE to defragment the tables and recreate the indexes.
FORCE/USE/IGNORE INDEX limits the options MySQL's query optimizer has to perform your query. Only consider it on complex queries.
Time the effect of your meddling with indexes on a regular basis. A forced index that speeds up your query today might slow it down tomorrow because the underlying data has changed.
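The maintenance statements mentioned in these tips, applied to the largest table from the question as an example:

```sql
-- Refresh the stored key distribution (cardinality) for the table.
ANALYZE TABLE fpost;

-- Inspect the indexes; the Cardinality column shows the estimate
-- the optimizer works from.
SHOW INDEX FROM fpost;

-- Defragment the table and rebuild its indexes.
OPTIMIZE TABLE fpost;
```

On MyISAM tables, as used here, these statements lock the table while they run, so a quiet period is the safer time to schedule them.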