I cannot understand why the first query, which is using a derived table, is slower than the second one.
My table:
CREATE TABLE `test` (
`someid` binary(16) NOT NULL,
`indexedcolumn1` int(11) NOT NULL,
`indexedcolumn2` int(10) unsigned NOT NULL,
`data` int(11) NOT NULL,
KEY `indexedcolumn1` (`indexedcolumn1`),
KEY `indexedcolumn2` (`indexedcolumn2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
This table contains 4,514,856 rows.
The faster query:
SELECT SUM(isSame) AS same, SUM(isDifferent) AS diff, SUM(isNotSet) AS notSet, indexedcolumn1 FROM (
SELECT
CASE WHEN t.indexedcolumn1 = t.data
THEN 1
ELSE 0
END AS isSame,
CASE WHEN t.indexedcolumn1 != t.data
THEN 1
ELSE 0
END AS isDifferent,
CASE WHEN t.data = 0
THEN 1
ELSE 0
END AS isNotSet,
indexedcolumn1
FROM
test as t
WHERE
t.indexedcolumn2 >= 10000000
) AS tempTable GROUP BY indexedcolumn1;
Result:
72 rows in set (4.70 sec)
The slower query:
SELECT
SUM(CASE WHEN t.indexedcolumn1 = t.data
THEN 1
ELSE 0
END) AS same,
SUM(CASE WHEN t.indexedcolumn1 != t.data
THEN 1
ELSE 0
END) AS diff,
SUM(CASE WHEN t.data = 0
THEN 1
ELSE 0
END) AS notSet,
indexedcolumn1
FROM
test as t
WHERE
t.indexedcolumn2 >= 10000000
GROUP BY indexedcolumn1;
Result:
72 rows in set (5.90 sec)
I thought you should avoid derived tables whenever possible. Even EXPLAIN does not give any hint:
for query1:
+----+-------------+------------+------+----------------+------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+----------------+------+---------+------+---------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2257428 | Using temporary; Using filesort |
| 2 | DERIVED | t | ALL | indexedcolumn2 | NULL | NULL | NULL | 4514856 | Using where |
+----+-------------+------------+------+----------------+------+---------+------+---------+---------------------------------+
for query 2:
+----+-------------+-------+-------+---------------------------+------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------------------+------------+---------+------+---------+-------------+
| 1 | SIMPLE | t | index | indexedcolumn1,indexedcolumn2 | indexedcolumn1 | 4 | NULL | 4514856 | Using where |
+----+-------------+-------+-------+---------------------------+------------+---------+------+---------+-------------+
I also ran the tests several times, always with the same result: the first query was faster. But why? The results are the same.
EDIT:
I did an additional test: I removed the WHERE clause. Even then I get better results for the first query (EXPLAIN):
+----+-------------+------------+------+---------------+------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+---------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4514856 | Using temporary; Using filesort |
| 2 | DERIVED | t | ALL | NULL | NULL | NULL | NULL | 4514856 | NULL |
+----+-------------+------------+------+---------------+------+---------+------+---------+---------------------------------+
Explain Query 2:
+----+-------------+-------+-------+---------------+------------+---------+------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------------+---------+------+---------+-------+
| 1 | SIMPLE | t | index | indexedcolumn1 | indexedcolumn1 | 4 | NULL | 4514856 | NULL |
+----+-------------+-------+-------+---------------+------------+---------+------+---------+-------+
I am initially surprised at the difference in performance. The derived table incurs the overhead of materialization, although MySQL might combine that with the first step of the sorting used for the GROUP BY, so the materialization itself might make little difference.
Given that you are only working with 72 rows, the overhead for the GROUP BY would seem to be minimal, so I would expect the two to be pretty similar.
But the key is in the index usage. The first version uses an index to filter the data -- essentially looking up each of the 72 rows -- and then doing the group by. I'm surprised this takes multiple seconds.
The second is using the index on the group by. This saves a sort, but it requires a full table scan.
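One way to test that explanation (a hedged sketch, not part of the original answer) is to force the second query off the indexedcolumn1 index with an index hint, so it has to filter first and then sort the handful of groups, just like the derived-table version; MySQL treats a boolean expression as 1/0, so SUM(expr) is used here as shorthand for the CASE expressions:
SELECT
    SUM(t.indexedcolumn1 = t.data)  AS same,
    SUM(t.indexedcolumn1 != t.data) AS diff,
    SUM(t.data = 0)                 AS notSet,
    indexedcolumn1
FROM test AS t IGNORE INDEX (indexedcolumn1)  -- skip the index scan otherwise chosen for the GROUP BY
WHERE t.indexedcolumn2 >= 10000000
GROUP BY indexedcolumn1;
If the timing then drops to roughly that of the derived-table query, the index-scan-versus-filter choice is indeed the difference.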
I have 5 comments, of varying degrees of relevance:
(1) Using a "covering" index is likely to be faster:
INDEX(indexedcolumn2, indexedcolumn1, data)
I expect this will make the non-subquery approach beat the other.
(2) You really should have a PRIMARY KEY on any InnoDB table.
(3) Slight shortening:
CASE WHEN t.indexedcolumn1 = t.data
THEN 1
ELSE 0
END AS isSame,
-->
t.indexedcolumn1 = t.data AS isSame,
and
SUM(CASE WHEN t.indexedcolumn1 = t.data
THEN 1
ELSE 0
END) AS same,
-->
SUM(t.indexedcolumn1 = t.data) AS same,
(4) When running timings, run the query twice -- the first run may involve more I/O (to warm the caches) than the second.
(5) The query is in a gray area where the Optimizer does not have enough knowledge of the distribution of data to necessarily pick the best way to perform the query. The faster query made better use of an index to help with the filtering (WHERE) than the slower query, which banked on avoiding the 'sort' to deal with the GROUP BY.
Version 5.7 has a different "cost model" and may have picked the 'right' approach. What version are you using?
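Putting comments (1) and (3) together, a sketch of what the non-subquery version could look like with the covering index in place (the index name cover_idx is just a suggestion):
ALTER TABLE test ADD INDEX cover_idx (indexedcolumn2, indexedcolumn1, data);

SELECT
    SUM(t.indexedcolumn1 = t.data)  AS same,
    SUM(t.indexedcolumn1 != t.data) AS diff,
    SUM(t.data = 0)                 AS notSet,
    indexedcolumn1
FROM test AS t
WHERE t.indexedcolumn2 >= 10000000
GROUP BY indexedcolumn1;
The range on indexedcolumn2 and all referenced columns can then be read from the index alone, at the cost of a filesort over the few dozen groups.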
Related
I've been struggling when it comes to optimizing the following query (Example 1):
SELECT `service`.*
FROM
(
SELECT `storeUser`.`storeId`
FROM `storeUser`
WHERE `storeUser`.`userId` = 1
UNION
SELECT `store`.`storeId`
FROM `companyUser`
INNER JOIN `store` ON `companyUser`.`companyId` = `store`.`companyId`
WHERE `companyUser`.`userId` = 1
UNION
SELECT `store`.`storeId`
FROM `accountUser`
INNER JOIN `company` ON `company`.`accountId` = `accountUser`.`accountId`
INNER JOIN `store` ON `company`.`companyId` = `store`.`companyId`
WHERE `accountUser`.`userId` = 1
) AS `storeUser`
INNER JOIN `service` ON `storeUser`.`storeId` = `service`.`storeId`
LIMIT 10;
The subquery should be returning something like "1", "2", "3", "4".
Anyway, it's super slow and takes about 48 seconds to return a response, even though the subquery by itself, run in a separate console, takes about 0.0020 ms to give results.
The same applies if I place the subquery inside an IN instead (Example 2):
SELECT `service`.*
FROM `service`
WHERE 1
AND `service`.`storeId` IN (
SELECT `storeUser`.`storeId` FROM `storeUser` WHERE `storeUser`.`userId` = 1
UNION
SELECT `store`.`storeId` FROM `companyUser`
INNER JOIN `store` ON `companyUser`.`companyId` = `store`.`companyId`
WHERE `companyUser`.`userId` = 1
UNION
SELECT `store`.`storeId`
FROM `accountUser`
INNER JOIN `company` ON `company`.`accountId` = `accountUser`.`accountId`
INNER JOIN `store` ON `company`.`companyId` = `store`.`companyId`
WHERE `accountUser`.`userId` = 1
)
LIMIT 10;
However if I simply put the values returned by that query, manually, it's basically instantly:
SELECT
`service`.*
FROM
`service`
WHERE 1
AND `service`.`storeId` IN (
"1", "2", "3", "4", "5"
)
LIMIT 10;
It's important to mention that I've reviewed the indexes used in the joins and everything seems to be in place, and EXPLAIN [query] returns a filtered value of 100 for basically everything.
Edit:
Sorry for not providing enough information before, hope this can be more helpful:
MySQL 5.7,
Storage engine: InnoDB
EXPLAINs
1.) StoreUser
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | storeUser | NULL | ref | PRIMARY, storeUserUser | PRIMARY | 4 | const | 1 |100.00 | Using index
2.) CompanyUser
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | companyUser | NULL | ref | PRIMARY,companyUserCompany,companyUserUser | companyUserUser | 4 | const | 30 | 100.00 | Using index
1 | SIMPLE | store | NULL | ref | storeCompany | storeCompany | 4 | Table.companyUser.companyId | 5 | 100.00 | Using index
3.) AccountUser
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | accountUser | NULL | ref | PRIMARY,accountUserUser | accountUserUser | 4 | const | 1 | 100.00 | Using index
1 | SIMPLE | company | NULL | ref | PRIMARY,companyAccount | companyAccount | 4 | Table.accountUser.accountId | 305 | 100.00 | Using index
1 | SIMPLE | store | NULL | ref | storeCompany | storeCompany | 4 | Table.company.companyId | 5 | 100.00 | Using index
4.) Whole query (Example 2)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | PRIMARY | service | NULL | ALL | NULL | NULL | NULL | NULL | 2836046 | 100.00 | Using where
2 | DEPENDENT SUBQUERY | storeUser | NULL | eq_ref | PRIMARY,storeUserStore,storeUserUser | PRIMARY | 8 | const,func | 1 | 100.00 | Using index
3 | DEPENDENT UNION | store | NULL | eq_ref | PRIMARY,storeCompany | PRIMARY | 4 | func | 1 | 100.00 | NULL
3 | DEPENDENT UNION | companyUser | NULL | eq_ref | PRIMARY,companyUserCompany,companyUserUser | PRIMARY | 8 | const,Table.store.companyId | 1 | 100.00 | Using index
4 | DEPENDENT UNION | companyUser | NULL | ref | PRIMARY,accountUserUser | accountUserUser | 4 | const | 1 | 100.00 | Using index
4 | DEPENDENT UNION | store | NULL | eq_ref | PRIMARY,storeCompany | PRIMARY | 4 | func | 1 | 100.00 | NULL
4 | DEPENDENT UNION | company | NULL | eq_ref | PRIMARY,companyAccount | PRIMARY | 4 | Table.store.companyId | 1 | 100.00 | Using where
NULL | UNION RESULT | <union2,3,4>| NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary
You didn't show us your indexes or EXPLAIN output, so all this is guesswork.
Clearly it's the subquery in your second example that's not optimized. That subquery is a UNION with three branches. The way you address performance trouble? Analyze and optimize each branch of the UNION separately.
You certainly need some better indexes, unless your database server is too small or misconfigured. That's very rare, so let's work on indexes.
The first branch is
SELECT storeUser.storeId
FROM storeUser
WHERE storeUser.userId = 1
This compound index covers that query. Try adding it. If you have a separate index on just userId, drop it when you add this one.
ALTER TABLE storeUser ADD INDEX userId_storeId (userId, storeId);
The second branch is
SELECT store.storeId
FROM companyUser
INNER JOIN store ON companyUser.companyId = store.companyId
WHERE companyUser.userId = 1
Subqueries with JOIN operations are a little trickier to optimize without access to EXPLAIN output, so this is guesswork. I guess these indexes will help, though. (Assuming you use InnoDB and the PK on store is storeId.)
ALTER TABLE companyUser ADD INDEX userId_companyId (userId, companyId);
ALTER TABLE store ADD INDEX companyId (companyId);
Similar analysis applies to the third branch of the UNION.
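For that third branch, the same sort of guesswork (assuming the PKs on company and store are companyId and storeId respectively) suggests:
ALTER TABLE accountUser ADD INDEX userId_accountId (userId, accountId);
ALTER TABLE company ADD INDEX accountId_companyId (accountId, companyId);
-- store already gets INDEX(companyId) from the second branch above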
And, add this index. Your EXPLAIN points to it being missing, and so a full table scan of that large table being required.
ALTER TABLE service ADD INDEX storeId (storeId);
Again, helping you would be far easier if you showed us your table definitions with indexes. SHOW CREATE TABLE service, for example, would show us what we need for your service table. Pro tip: when troubleshooting this kind of performance problem, always double-check your indexes. Ask me how I know that when you have a couple of hours to spare.
Pro tip: Be obsessive about formatting your queries so they're readable. You, yourself a year from now, and your co-workers yet unborn need to read and reason about them. To my way of thinking, that means skipping those silly backticks.
Perhaps you need to rethink the schema. It seems like you need a table for "user" instead of, or in addition to, the 3 tables for different types of "users".
Meanwhile, these composite indexes are likely to help performance in either formulation:
storeUser: INDEX(storeId, userId)
storeUser: INDEX(userId, storeId)
service: INDEX(storeId)
store: INDEX(companyId, storeId)
companyUser: INDEX(userId, companyId)
company: INDEX(accountId, companyId)
accountUser: INDEX(userId, accountId)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
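For example, for storeUser (the old index name userId here is hypothetical; use whatever name SHOW CREATE TABLE actually reports):
ALTER TABLE storeUser
    ADD INDEX userId_storeId (userId, storeId),
    DROP INDEX userId;  -- hypothetical single-column index made redundant by the composite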
In particular, storeUser smells like a many-to-many mapping table. If so, see Many:many mapping for more discussion.
In general IN( SELECT ... ) does not optimize well, but you might find otherwise for your query.
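If the IN ( SELECT ... UNION ... ) form keeps being executed as a DEPENDENT SUBQUERY, one more formulation worth trying (a sketch only, with no guarantee it is faster on your data) is to express each branch as its own EXISTS:
SELECT service.*
FROM service
WHERE EXISTS (SELECT 1 FROM storeUser su
              WHERE su.storeId = service.storeId AND su.userId = 1)
   OR EXISTS (SELECT 1 FROM companyUser cu
              JOIN store st ON cu.companyId = st.companyId
              WHERE st.storeId = service.storeId AND cu.userId = 1)
   OR EXISTS (SELECT 1 FROM accountUser au
              JOIN company c ON c.accountId = au.accountId
              JOIN store st ON c.companyId = st.companyId
              WHERE st.storeId = service.storeId AND au.userId = 1)
LIMIT 10;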
Sorry for not giving more details about the schemas, but I wasn't allowed to share them here. Anyway, the problem turned out to be elsewhere:
The service table was receiving a huge number of requests, and some actions were even locking it up, which led to slow response times whenever we accessed that table. We have fixed the other process and it's working great now. Hugely appreciate your time and effort, thanks.
I have a MySQL query that I am having performance problems with that I do not understand. When I try to debug and run the overall query as a sequence of separate subqueries they seem to perform reasonably well, given the volume of data. When I combine them into a single nested query I get much much much longer execution times.
The main ratings table mentioned below is approx 30 million rows (4GB of disk space), with a couple of foreign keys (it's a many-to-many table linking users and items, with a small amount of supplementary user-specific item information - approx 13 fields and 30 bytes).
Query 1 - approx 23s
SELECT COUNT(1) FROM (SELECT fields FROM ratings WHERE (id >= 0 AND id < 10000)
AND item_type = 1) AS t1;
Query 1 saved to table - approx 65s if I save the results to a temporary table
CREATE TABLE temp_table SELECT fields FROM ratings WHERE (id >= 0 AND id < 10000)
AND item_type = 1;
Query 2 - approx 3s
SELECT COUNT(1) FROM temp_table WHERE id IN (SELECT id from item_stats WHERE
ratings_count > 1000);
Based on this I would expect a combined query to take approx 30s or so, and not more than approx 70s.
Combined query (Query 1 + Query 2) - indeterminate time (10s of minutes before I give up and cancel)
SELECT COUNT(1) from (SELECT * FROM (SELECT fields FROM ratings WHERE (id >= 0
AND id < 10000) AND item_type = 1) AS t1 WHERE t1.id IN (SELECT id FROM
item_stats WHERE ratings_count > 1000)) as t2;
Can anyone help explain this difference and guide me in creating a query that works? If I need to I can rely on the sequential queries (which would take approx 70s), but that is cumbersome and does not seem the right way to go.
I have tried using INNER JOIN instead of IN but this did not seem to make much difference. The ID count from the item_stats table is about 2700 IDs.
It's using MySQL 8.0 on a laptop (16GB RAM, SSD).
Response to suggestions / questions:
Query 1
EXPLAIN select user_id, game_id, item_type_id, rating, plays, own, bgg_last_modified from collections where (user_id >= 0 and user_id < 10000) and item_type_id = 1;
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | collections | NULL | ALL | user_id | NULL | NULL | NULL | 32898400 | 1.31 | Using where |
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Query 2
EXPLAIN select * from temp_coll where game_id in (select game_id from games_ratings_stats where (ratings_count > 1000) or (ratings_count > 500 and ratings_avg >= 7.0));
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | NULL |
| 1 | SIMPLE | temp_coll | NULL | ALL | NULL | NULL | NULL | NULL | 1674386 | 10.00 | Using where; Using join buffer (hash join) |
| 2 | MATERIALIZED | games_ratings_stats | NULL | ALL | NULL | NULL | NULL | NULL | 81585 | 40.74 | Using where |
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
Combined query
EXPLAIN select * from (select user_id, game_id, item_type_id, rating, plays, own, bgg_last_modified from collections where (user_id >= 0 and user_id < 10000) and item_type_id = 1) as t1 where t1.game_id in (select game_id from games_ratings_stats where (ratings_count > 1000) or (ratings_count > 500 and ratings_avg >= 7.0));
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
| 1 | SIMPLE | <subquery3> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | Using where |
| 1 | SIMPLE | collections | NULL | ref | user_id,game_id | game_id | 5 | <subquery3>.game_id | 199 | 1.31 | Using where |
| 3 | MATERIALIZED | games_ratings_stats | NULL | ALL | NULL | NULL | NULL | NULL | 81585 | 40.74 | Using where |
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
Your query appears to be functionally identical to the following (rather implausible) query:
SELECT COUNT(*) total
FROM ratings r
JOIN item_stats s
ON s.id = r.id
WHERE r.id >= 0
AND r.id < 10000
AND r.item_type = 1
AND s.ratings_count > 1000
r.id is, presumably, the PRIMARY KEY, so it's automatically included in any InnoDB secondary index, which leaves just item_type and ratings_count needing indexes.
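Under those assumptions, the indexes could be added along these lines (index names are just suggestions):
ALTER TABLE ratings    ADD INDEX item_type_id (item_type, id);
ALTER TABLE item_stats ADD INDEX ratings_count (ratings_count);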
You would benefit a lot from an online tutorial on learning how to read the EXPLAIN plan. The EXPLAINS you shared clearly show missing indexes.
As a general rule, queries should not take 23 seconds or 65 seconds, even with millions of rows. Proper indexes + partitioning should resolve the slowness.
Query 1: The user_id index on that table is not helping performance, as 99% of users are within the range in the where clause. You can add an index on item_type_id
ALTER TABLE collections ADD KEY (item_type_id)
Query 2: The temp_coll table is missing a game_id index. Also, I'm not sure whether the underlying code for games_ratings_stats has an index on ratings_count and whether that would help. I don't have experience with MySQL materialized tables.
ALTER TABLE temp_coll ADD KEY (game_id)
Query 3:
Would benefit from above indexes.
Increasing the InnoDB Buffer Pool Size (now set to 8GB) seems to have made a significant improvement. If anyone has any further setup or tuning advice on MySQL then that would be appreciated!
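For reference, a hedged sketch of how to check and, on MySQL 5.7.5+/8.0 where the variable is dynamic, resize the buffer pool without a restart (the 8 GB value mirrors the setting mentioned above):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW STATUS LIKE 'Innodb_buffer_pool_read%';  -- compare read requests vs. reads that hit disk
SET GLOBAL innodb_buffer_pool_size = 8589934592;  -- 8 GB, in bytes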
I have this select query, ItemType is varchar type and ItemComments is int type:
select * from ItemInfo where ItemType="item_type" order by ItemComments desc limit 1
You can see this query has 3 conditions:
(1) where 'ItemType' equals a specific value;
(2) order by 'ItemComments';
(3) with descending order.
The interesting thing is, when I select rows with all three conditions, it's getting very slow. But if I drop any one of the three (except condition 2), the query runs quite fast. See:
select * from ItemInfo where ItemType="item_type" order by ItemComments desc limit 1;
/* Affected rows: 0 Found rows: 1 Warnings: 0 Duration for 1 query: 16.318 sec. */
select * from ItemInfo where ItemType="item_type" order by ItemComments limit 1;
/* Affected rows: 0 Found rows: 1 Warnings: 0 Duration for 1 query: 0.140 sec. */
select * from ItemInfo order by ItemComments desc limit 1;
/* Affected rows: 0 Found rows: 1 Warnings: 0 Duration for 1 query: 0.015 sec. */
Plus,
I'm using MySQL 5.7 with InnoDB engine.
I have created indexes on both ItemType and ItemComments and table ItemInfo contains 2 million rows.
I have searched many possible explanations, like MySQL support for descending indexes, composite indexes, and so on. But these still can't explain why query #1 runs slowly while queries #2 and #3 run well.
It would be much appreciated if anyone could help me out.
Updates: CREATE TABLE and EXPLAIN info
Create code:
CREATE TABLE `ItemInfo` (
`ItemID` VARCHAR(255) NOT NULL,
`ItemType` VARCHAR(255) NOT NULL,
`ItemPics` VARCHAR(255) NULL DEFAULT '0',
`ItemName` VARCHAR(255) NULL DEFAULT '0',
`ItemComments` INT(50) NULL DEFAULT '0',
`ItemScore` DECIMAL(10,1) NULL DEFAULT '0.0',
`ItemPrice` DECIMAL(20,2) NULL DEFAULT '0.00',
`ItemDate` DATETIME NULL DEFAULT '1971-01-01 00:00:00',
PRIMARY KEY (`ItemID`, `ItemType`),
INDEX `ItemDate` (`ItemDate`),
INDEX `ItemComments` (`ItemComments`),
INDEX `ItemType` (`ItemType`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB;
Explain result:
mysql> explain select * from ItemInfo where ItemType="item_type" order by ItemComments desc limit 1;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | i | NULL | index | ItemType | ItemComments | 5 | NULL | 83 | 1.20 | Using where |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
mysql> explain select * from ItemInfo where ItemType="item_type" order by ItemComments limit 1;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | i | NULL | index | ItemType | ItemComments | 5 | NULL | 83 | 1.20 | Using where |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
mysql> explain select * from ItemInfo order by ItemComments desc limit 1;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
| 1 | SIMPLE | i | NULL | index | NULL | ItemComments | 5 | NULL | 1 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
Query from O. Jones:
mysql> explain
-> SELECT a.*
-> FROM ItemInfo a
-> JOIN (
-> SELECT MAX(ItemComments) ItemComments, ItemType
-> FROM ItemInfo
-> GROUP BY ItemType
-> ) maxcomm ON a.ItemType = maxcomm.ItemType
-> AND a.ItemComments = maxcomm.ItemComments
-> WHERE a.ItemType = 'item_type';
+----+-------------+------------+------------+-------+----------------------------------------+-------------+---------+---------------------------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+----------------------------------------+-------------+---------+---------------------------+---------+----------+--------------------------+
| 1 | PRIMARY | a | NULL | ref | ItemComments,ItemType | ItemType | 767 | const | 27378 | 100.00 | Using where |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 772 | mydb.a.ItemComments,const | 10 | 100.00 | Using where; Using index |
| 2 | DERIVED | ItemInfo | NULL | index | PRIMARY,ItemDate,ItemComments,ItemType | ItemType | 767 | NULL | 2289466 | 100.00 | NULL |
+----+-------------+------------+------------+-------+----------------------------------------+-------------+---------+---------------------------+---------+----------+--------------------------+
I'm not sure if I executed this query right, but I couldn't get the records back even after quite a long time.
Query from Vijay, but I added the ItemType join condition because joining on max_comnt alone returned items of other ItemTypes:
SELECT ifo.* FROM ItemInfo ifo
JOIN (SELECT ItemType, MAX(ItemComments) AS max_comnt FROM ItemInfo WHERE ItemType="item_type") inn_ifo
ON ifo.ItemComments = inn_ifo.max_comnt and ifo.ItemType = inn_ifo.ItemType
/* Affected rows: 0 Found rows: 1 Warnings: 0 Duration for 1 query: 7.441 sec. */
explain result:
+----+-------------+------------+------------+-------------+-----------------------+-----------------------+---------+-------+-------+----------+-----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------------+-----------------------+-----------------------+---------+-------+-------+----------+-----------------------------------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | ifo | NULL | index_merge | ItemComments,ItemType | ItemComments,ItemType | 5,767 | NULL | 88 | 100.00 | Using intersect(ItemComments,ItemType); Using where |
| 2 | DERIVED | ItemInfo | NULL | ref | ItemType | ItemType | 767 | const | 27378 | 100.00 | NULL |
+----+-------------+------------+------------+-------------+-----------------------+-----------------------+---------+-------+-------+----------+-----------------------------------------------------+
And I want to explain why I used ORDER BY with LIMIT in the first place: I was planning to fetch records from the table randomly with a specific probability. The random index was generated in Python and sent to MySQL as a variable. But then I found it cost so much time that I decided to just use the first record I got.
Inspired by O. Jones and Vijay, I tried using the MAX function, but it doesn't perform well:
select max(ItemComments) from ItemInfo where ItemType='item_type'
/* Affected rows: 0 Found rows: 1 Warnings: 0 Duration for 1 query: 6.225 sec. */
explain result:
+----+-------------+------------+------------+------+---------------+----------+---------+-------+-------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+----------+---------+-------+-------+----------+-------+
| 1 | SIMPLE | ItemInfo | NULL | ref | ItemType | ItemType | 767 | const | 27378 | 100.00 | NULL |
+----+-------------+------------+------------+------+---------------+----------+---------+-------+-------+----------+-------+
Thanks to all who contributed to this question. Hope you can bring more solutions based on the information above.
Please provide CURRENT SHOW CREATE TABLE ItemInfo.
For most of those queries, you need the composite index
INDEX(ItemType, ItemComments)
For the last one, you need
INDEX(ItemComments)
For that especially slow query, please provide EXPLAIN SELECT ....
Discussion - Why does INDEX(ItemType, ItemComments) help with where ItemType="item_type" order by ItemComments desc limit 1?
An index is structured in a BTree (see Wikipedia), thereby making searching for an individual item very fast, and making scanning in a particular order very fast.
where ItemType="item_type" says to filter on ItemType, but there are a lot of such entries in the index. Within this index they are ordered by ItemComments (for a given ItemType). The direction desc says to start with the highest value of ItemComments; that is the 'end' of the index entries. Finally limit 1 says to stop after one item is found. (Somewhat like finding the last "S" in your Rolodex.)
So the query 'drills down' the BTree to the end of the entries for ItemType in the composite INDEX(ItemType, ItemComments) and grabs one entry -- a very efficient task.
Actually SELECT * implies that there is one more step, namely to get all the columns for that one row. That info is not in the index, but over in the BTree for ItemInfo -- which contains all the columns for all the rows, ordered by the PRIMARY KEY.
The "secondary index" (INDEX(ItemType, ItemComments)) implicitly contains a copy of the relevant PRIMARY KEY columns, so we now have the values of ItemID and ItemType. With those, we can drill down this other BTree to find the desired row and fetch all (*) the columns.
Your query with ascending order can take advantage of your index on ItemComments.
SELECT * ... ORDER BY ... LIMIT 1 is a notorious performance antipattern. Why? The server must sort a whole mess of rows, just to discard all but the first.
You might try this (for your descending order variant). It's a little more verbose but much more efficient.
SELECT a.*
FROM ItemInfo a
JOIN (
SELECT MAX(ItemComments) ItemComments, ItemType
FROM ItemInfo
GROUP BY ItemType
) maxcomm ON a.ItemType = maxcomm.ItemType
AND a.ItemComments = maxcomm.ItemComments
WHERE a.ItemType = 'item_type'
Why does this work? It uses GROUP BY / MAX() to find the maximum value rather than ORDER BY ... DESC LIMIT 1. The subquery does your search.
To make this work as efficiently as possible you need a compound (multicolumn) index on (ItemType, ItemComments). Create that with
ALTER TABLE ItemInfo ADD INDEX ItemTypeCommentIndex (ItemType, ItemComments);
When you create the new index, drop your index on ItemType, because the new index is redundant with that one.
MySQL's query planner is smart enough to see the outer WHERE clause before it runs the inner GROUP BY query, so it doesn't have to aggregate the whole table.
With that compound index MySQL can use a loose index scan to satisfy the subquery. Those are almost miraculously fast. You should read up on the topic.
Your query will select all the rows based on the WHERE condition. After that it will sort the rows according to the ORDER BY clause, and then it will select the first row. A better query would be something like
SELECT ifo.* FROM ItemInfo ifo
JOIN (SELECT MAX(ItemComments) AS max_comnt FROM ItemInfo WHERE ItemType="item_type") inn_ifo
ON ifo.ItemComments = inn_ifo.max_comnt
This query only finds the maximum value of the column. Finding MAX() is only O(n), but the fastest general sorting algorithms are O(n log n). So if you avoid the ORDER BY, the query will perform faster.
Hope this helped.
I'm working on an application that needs to get the latest values from a table with currently > 3 million rows and counting. The latest values need to be grouped by two columns/attributes, so it runs the following query:
SELECT
m1.type,
m1.cur,
ROUND(m1.val, 2) AS val
FROM minuteCharts m1
JOIN
(SELECT
cur,
type,
MAX(id) id,
ROUND(val) AS val
FROM minuteCharts
GROUP BY cur, type) m2
ON m1.cur = m2.cur AND m1.id = m2.id;
The database server is quite the heavyweight, but the above query takes 3,500ms to complete and this number is rising. I suspect this wasn't a real problem when the application was just launched (as the database was pretty much empty back then), but it's becoming a problem and I haven't found a better solution. In fact, similar questions on SO actually had something like the above as their answers (which is probably where the developer got it from).
Is there anyone out there who knows how to get the same results more efficiently?
UPDATE: I submitted this too early.
EXPLAIN minuteCharts;
Field Type Null Key Default Extra
id int(255) NO PRI NULL auto_increment
time datetime NO MUL NULL
cur enum('EUR','USD') NO NULL
type enum('GOLD','SILVER','PLATINUM') NO NULL
val varchar(80) NO NULL
id is the primary index and there's an index on time.
The subquery with GROUP BY is doing a table-scan and a temporary table, because there's no index to support it.
mysql> EXPLAIN SELECT m1.type, m1.cur, ROUND(m1.val, 2) AS val FROM minuteCharts m1 JOIN (SELECT cur, type, MAX(id) id, ROUND(val) AS val FROM minuteCharts GROUP BY cur, type) m2 ON m1.cur = m2.cur AND m1.id = m2.id;
+----+-------------+--------------+------+---------------+-------------+---------+------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+-------------+---------+------------------------+------+---------------------------------+
| 1 | PRIMARY | m1 | ALL | PRIMARY | NULL | NULL | NULL | 1 | NULL |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 6 | test.m1.cur,test.m1.id | 2 | NULL |
| 2 | DERIVED | minuteCharts | ALL | NULL | NULL | NULL | NULL | 1 | Using temporary; Using filesort |
+----+-------------+--------------+------+---------------+-------------+---------+------------------------+------+---------------------------------+
You can improve this with the following index, sorted first by your GROUP BY columns, then also including the other columns for the subquery to make it a covering index:
mysql> ALTER TABLE minuteCharts ADD KEY (cur,type,id,val);
The table-scans turn into index scans (still not great but better), and the temp table goes away.
mysql> EXPLAIN ...
+----+-------------+--------------+-------+---------------+-------------+---------+------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+-------------+---------+------------------------+------+-------------+
| 1 | PRIMARY | m1 | index | PRIMARY,cur | cur | 88 | NULL | 1 | Using index |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 6 | test.m1.cur,test.m1.id | 2 | NULL |
| 2 | DERIVED | minuteCharts | index | cur | cur | 88 | NULL | 1 | Using index |
+----+-------------+--------------+-------+---------------+-------------+---------+------------------------+------+-------------+
Best results will be if the index fits in your buffer pool. If it's larger than the buffer pool, the query will have to push pages in and out repeatedly during the index scan, which will greatly degrade performance.
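A rough way to compare index size against the pool (a sketch; the schema name test is taken from the EXPLAIN output and may differ in your setup):
SELECT data_length  / 1024 / 1024 AS data_mb,
       index_length / 1024 / 1024 AS index_mb
FROM information_schema.TABLES
WHERE table_schema = 'test' AND table_name = 'minuteCharts';

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';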
Re your comment:
The answer to how long it'll take to add the index depends on the version of MySQL you have, the storage engine for this table, your server hardware, the number of rows in the table, the level of concurrent load on the database, etc. In other words, I have no way of telling.
I'd suggest using pt-online-schema-change, so you will have no downtime.
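A sketch of how that might be invoked (the database name mydb and the key name are assumptions; check the tool's documentation for connection options):
pt-online-schema-change \
    --alter "ADD KEY cur_type_id_val (cur, type, id, val)" \
    D=mydb,t=minuteCharts \
    --execute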
Another suggestion would be to try it on a staging server with a clone of your database, so you can get a rough estimate how long it'll take (although testing on an idle server is often a lot quicker than running the same change on a busy server).
I have a huge table like
CREATE TABLE IF NOT EXISTS `object_search` (
`keyword` varchar(40) COLLATE latin1_german1_ci NOT NULL,
`object_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`keyword`,`object_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;
with around 39 million rows (using over 1 GB of space) containing the indexed data for 1 million records in the object table (which object_id points at).
Now searching through this with a query like
SELECT object_id, COUNT(object_id) AS hits
FROM object_search
WHERE keyword = 'woman' OR keyword = 'house'
GROUP BY object_id
HAVING hits = 2
is already significantly faster than doing a LIKE search on the composed keywords field in the object table but still takes up to 1 minute.
Its EXPLAIN looks like:
+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | search | ref | PRIMARY | PRIMARY | 42 | const | 345180 | 100.00 | Using where; Using index |
+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+
The full EXPLAIN, with the object, object_color, and object_locale tables joined (the above query is run as a subquery to avoid overhead), looks like:
+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 182544 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | object_color | eq_ref | object_id | object_id | 4 | search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | locale | eq_ref | object_id | object_id | 4 | search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | object | eq_ref | PRIMARY | PRIMARY | 4 | search.object_id | 1 | 100.00 | |
| 2 | DERIVED | search | ref | PRIMARY | PRIMARY | 42 | | 345180 | 100.00 | Using where; Using index |
+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+
My top goal would be to be able to scan through this within 1 or 2 seconds.
So, are there further techniques to improve search speed for keywords?
Update 2013-08-06:
Applying most of Neville K's suggestion I now have the following setup:
CREATE TABLE `object_search_keyword` (
`keyword_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`keyword` varchar(64) COLLATE latin1_german1_ci NOT NULL,
PRIMARY KEY (`keyword_id`),
FULLTEXT KEY `keyword_ft` (`keyword`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;
CREATE TABLE `object_search` (
`keyword_id` int(10) unsigned NOT NULL,
`object_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`keyword_id`,`object_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The new query's explain looks like this:
+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 24381 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | object_color | eq_ref | object_id | object_id | 4 | object_search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | object | eq_ref | PRIMARY | PRIMARY | 4 | object_search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | locale | eq_ref | object_id | object_id | 4 | object_search.object_id | 1 | 100.00 | |
| 2 | DERIVED | <derived4> | system | NULL | NULL | NULL | NULL | 1 | 100.00 | |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 24381 | 100.00 | |
| 4 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 3 | DERIVED | object_keyword | fulltext | PRIMARY,keyword_ft | keyword_ft | 0 | | 1 | 100.00 | Using where; Using temporary; Using filesort |
| 3 | DERIVED | object_search | ref | PRIMARY | PRIMARY | 4 | object_keyword.keyword_id | 2190225 | 100.00 | Using index |
+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+
The many DERIVED rows come from the keyword-comparison subquery being nested inside another subquery, which does nothing but count the number of rows returned:
SELECT SQL_NO_CACHE object.object_id, ..., @rn AS numrows
FROM (
SELECT *, @rn := @rn + 1
FROM (
SELECT SQL_NO_CACHE search.object_id, COUNT(search.object_id) AS hits
FROM object_keyword AS kwd
INNER JOIN object_search AS search ON (kwd.keyword_id = search.keyword_id)
WHERE MATCH (kwd.keyword) AGAINST ('+(woman) +(house)')
GROUP BY search.object_id HAVING hits = 2
) AS numrowswrapper
CROSS JOIN (SELECT @rn := 0) CONST
) AS turbo
INNER JOIN object AS object ON (turbo.object_id = object.object_id)
LEFT JOIN object_color AS object_color ON (turbo.object_id = object_color.object_id)
LEFT JOIN object_locale AS locale ON (turbo.object_id = locale.object_id)
ORDER BY timestamp_upload DESC
The above query actually runs within ~6 seconds, since it searches for two keywords. The more keywords I search for, the faster the search gets.
Any way to further optimize this?
Update 2013-08-07
The blocker is almost certainly the appended ORDER BY clause. Without it, the query executes in less than a second.
So, is there any way to sort the result faster? Any suggestions welcome, even hackish ones that would require post processing somewhere else.
Update 2013-08-07 later that day
Alright, ladies and gentlemen: nesting the WHERE and ORDER BY clauses in another layer of subquery, so that they don't have to touch the tables they don't need, roughly doubled its performance again:
SELECT wowrapper.*, locale.title
FROM (
SELECT SQL_NO_CACHE object.object_id, ..., @rn AS numrows
FROM (
SELECT *, @rn := @rn + 1
FROM (
SELECT SQL_NO_CACHE search.object_id, COUNT(search.object_id) AS hits
FROM object_keyword AS kwd
INNER JOIN object_search AS search ON (kwd.keyword_id = search.keyword_id)
WHERE MATCH (kwd.keyword) AGAINST ('+(frau)')
GROUP BY search.object_id HAVING hits = 1
) AS numrowswrapper
CROSS JOIN (SELECT @rn := 0) CONST
) AS search
INNER JOIN object AS object ON (search.object_id = object.object_id)
LEFT JOIN object_color AS color ON (search.object_id = color.object_id)
WHERE 1
ORDER BY object.object_id DESC
) AS wowrapper
LEFT JOIN object_locale AS locale ON (wowrapper.object_id = locale.object_id)
LIMIT 0,48
Searches that took 12 seconds (single keyword, ~200K results) now take 6, and a search for two keywords that took 6 seconds (60K results) now takes around 3.5 secs.
Now this is already a massive improvement, but is there any chance to push this further?
Update 2013-08-08 early that day
Undid that last nested variation of the query, since it actually slowed down other variations of it...
I'm now trying some other things with different table layouts and FULLTEXT indexes using MyISAM for a dedicated search table with a combined keyword field (comma separated in a TEXT field).
Update 2013-08-08
Alright, a plain FULLTEXT index doesn't really help.
Back to the previous setup, the only thing blocking is the ORDER BY (which resorts to a temporary table and filesort). Without it a search completes in less than a second!
So basically what's left of all this is:
How do I optimize the ORDER BY statement to run faster, likely by eliminating the use of the temporary table?
Full text search will be much faster than using the standard SQL string comparison features.
Secondly, if you have a high degree of redundancy in the keywords, you could consider a "many to many" implementation:
Keywords
--------
keyword_id
keyword
keyword_object
-------------
keyword_id
object_id
objects
-------
object_id
......
If this reduces the string comparison from 39 million rows to 100K rows (roughly the size of the English dictionary), you may also see a distinct improvement, as the query would only have to perform 100K string comparisons, and joining on an integer keyword_id and object_id field should be much, much faster than doing 39M string comparisons.
The best solution for this will be a FULLTEXT search, but you will probably need a MyISAM table for that. You can setup a mirror table and update it with some events and triggers or if you have a slave replicating from your server you can change its table to MyISAM and use it for searches.
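A minimal sketch of that mirror-table idea, assuming the original object_search layout from the question (the table and trigger names are made up, and matching UPDATE/DELETE triggers would be needed as well):
CREATE TABLE object_search_ft (
  `keyword` varchar(40) COLLATE latin1_german1_ci NOT NULL,
  `object_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`keyword`,`object_id`),
  FULLTEXT KEY `keyword_ft` (`keyword`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;

CREATE TRIGGER object_search_ai AFTER INSERT ON object_search
FOR EACH ROW
  INSERT INTO object_search_ft (keyword, object_id)
  VALUES (NEW.keyword, NEW.object_id);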
For this query the only thing I can come up with is to rewrite it as:
SELECT s1.object_id
FROM object_search s1
JOIN object_search s2 ON s2.object_id = s1.object_id AND s2.keyword = 'word2'
JOIN object_search s3 ON s3.object_id = s1.object_id AND s3.keyword = 'word3'
....
WHERE s1.keyword = 'word1'
and I'm not sure it will be faster this way.
Also you will need to have an index on object_id (assuming your PK is (keyword, object_id)).
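That could look like the following (the index name is arbitrary):
ALTER TABLE object_search ADD INDEX object_id_idx (object_id);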
If you INSERT seldom and SELECT often, you could optimize your data for reads, i.e. precalculate the number of object_ids per keyword and store it directly in the database. The SELECTs would then be very fast; the INSERTs would take some seconds, though.
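A hedged sketch of such a precomputed table (the names are made up); the INSERT ... ON DUPLICATE KEY UPDATE can be rerun after bulk loads to refresh the counts:
CREATE TABLE keyword_counts (
  `keyword` varchar(40) COLLATE latin1_german1_ci NOT NULL,
  `object_count` int(10) unsigned NOT NULL,
  PRIMARY KEY (`keyword`)
) ENGINE=InnoDB;

INSERT INTO keyword_counts (keyword, object_count)
SELECT keyword, COUNT(*) FROM object_search GROUP BY keyword
ON DUPLICATE KEY UPDATE object_count = VALUES(object_count);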