I have the following table, with an index on id and a foreign key on activityID:
comment (id, activityID, text)
and the following query:
SELECT <cols> FROM `comment` WHERE `comment`.`activityID` = 1257 ORDER BY `id` DESC LIMIT 20;
I basically want to get only the first 20 comments for this activity (which has 1165 in total); however, this is the result of a DESCRIBE:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE comment ref activityID activityID 4 const 1165 NULL
Essentially, it is looking through all comments for this activity before deciding to limit it.
We tested this query under high load: when an activity has 200,000 comments the query takes 5+ seconds, whereas under the same load an activity with 30 comments takes a couple of ms.
PS: If I remove the WHERE clause, an EXPLAIN says it will only look up a single row (I don't know if that's really the case):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE comment index NULL PRIMARY 4 NULL 1 NULL
Is it possible to optimize this kind of query in any way?
Thank you.
The ordering is causing the slowness.
The query uses the activityID index to find all the rows with that ID. Then it has to read all 200,000 comments and sort them by id to find the last 20.
Add a composite index so it can use an index for the ordering:
ALTER TABLE comment ADD INDEX (activityID, id);
Note that you will no longer need the index on activityID by itself, since it's a prefix of this new index.
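With that composite index in place, MySQL can walk (activityID, id) backwards and stop after 20 rows instead of sorting the whole group. A quick way to check (a sketch; exact output will vary) is to re-run the EXPLAIN and confirm the new index is chosen:
EXPLAIN SELECT <cols> FROM `comment`
WHERE `comment`.`activityID` = 1257
ORDER BY `id` DESC LIMIT 20;
-- Expect key to show the new composite index, a rows estimate near 20,
-- and no "Using filesort" in Extra.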
Use offset
SELECT <cols> FROM `comment` WHERE `comment`.`activityID` = 1257 ORDER BY `id` DESC LIMIT 0,20;
In the LIMIT clause, add 0 as the offset to get only the first 20 comments.
Just add two separate indexes on activityID and id. That should help with the ORDER BY too. There are no hard and fast rules in optimization; you need to try various methods.
Do it this way:
ALTER TABLE comment ADD INDEX (id);
ALTER TABLE comment ADD INDEX (activityID);
I think this will help.
As mentioned in the title, I would like to know of any solution for this that does not use stored procedures, temporary tables, etc.
Comparing Query #1 and Query #3, Query #3 gets the worst performance result. Is there any workaround to put the values in a variable without impacting performance?
Schema (MySQL v5.7)
create table `order`(
id BIGINT(20) not null auto_increment,
transaction_no varchar(20) not null,
primary key (`id`),
unique key uk_transaction_no (`transaction_no`)
);
insert into `order` (`id`,`transaction_no`)
value (1,'10001'),(2,'10002'),(3,'10003'),(4,'10004');
Query #1
explain select * from `order` where transaction_no in ('10001','10004');
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE order NULL range uk_transaction_no uk_transaction_no 22 NULL 2 100 Using where; Using index
Query #2
set @transactionNos = "10001,10004";
Query #3
explain select * from `order` where find_in_set(transaction_no, @transactionNos);
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE order NULL index NULL uk_transaction_no 22 NULL 4 100 Using where; Using index
Short Answer: See Sargability
Long Answer:
MySQL makes no attempt to optimize an expression when an indexed column is hidden inside a function call such as FIND_IN_SET(), DATE(), etc. Your Query #3 will always be performed as a full table scan or a full index scan.
So the only way to optimize what you are doing is to construct
IN ('10001','10004')
That is often difficult to achieve when "binding" a list of values. IN(?) will not work in most APIs to MySQL.
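If you can build the statement dynamically, one possible workaround (a sketch only; it assumes the list arrives pre-quoted and trusted, so beware injection) is a session-level prepared statement that splices the values in, so the optimizer sees a plain, sargable IN list:
-- Sketch: @transactionNos holds an already-quoted list (hypothetical format)
SET @transactionNos = "'10001','10004'";
SET @sql = CONCAT('SELECT * FROM `order` WHERE transaction_no IN (', @transactionNos, ')');
PREPARE stmt FROM @sql;
EXECUTE stmt;              -- can now use uk_transaction_no as a range scan
DEALLOCATE PREPARE stmt;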
Here is the EXPLAIN of my query:
explain
select eil.sell_fmt, count(sell_fmt) as itemCount
from table_items eil
where eil.cl_Id=123 and eil.si_Id='0'
and start_date <= now() and end_date is not null and end_date < NOW()
group by eil.sell_fmt
Without the date filters (start_date, end_date):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE eil ref table_items_clid_siid_sellFmt 39 const,const 7393 Using where; Using index
With date filters:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE eil ref table_items_clid_siid_sellFmt 39 const,const 8400 Using where
possible_keys are:
table_items_clid_siid, table_items_clid_siid_itemId, table_items_clid_siid_startDate_endDate, table_items_clid_siid_sellFmt
The query without the date filters is very fast (0.4 sec), but with the date filters it takes about 30 seconds, and the table has only 14K records in total.
Table field types:
`cl_Id` int(11) NOT NULL,
`si_Id` varchar(11) NOT NULL,
`start_date` datetime DEFAULT NULL,
`end_date` datetime DEFAULT NULL,
`sell_fmt` varchar(20) DEFAULT NULL
The index names are concatenations of the field names, so you can tell which fields each index combines.
Can somebody guide me here? What's going on, what's the best course of action, and where am I going wrong?
I need one more suggestion: in another query on the same table, a user can filter on up to 10 fields, in no definite order (a random number of fields in random order). That kind of search would be too slow again. What's the best strategy then? One covering index with "all" possible searchable fields? If yes, does the order of fields in the index matter? (I.e., if that order differs from the order of the fields in the query, will the index still be used?)
First, without seeing your CREATE TABLE statement, I can offer the following: create a composite index (multiple fields) that best matches the common elements of your WHERE clause, starting with the columns of smallest cardinality. You are explicitly looking for a cl_Id and si_Id plus start and end dates. Since you also have a GROUP BY, I would add that column to the index for optimization purposes, making it a completely covering index, so the engine does not need to go back to the raw data to complete the query; it can resolve everything from the fields in the index directly.
I would have an index on
( cl_id, si_id, start_date, end_date, sell_fmt )
Finally, change your count from count(sell_fmt) to just count(*), meaning "I don't care about a specific field; as long as a record is found, count it".
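A hedged sketch of both suggestions (the index name is illustrative, following the question's naming convention):
ALTER TABLE table_items
  ADD INDEX table_items_clid_siid_startDate_endDate_sellFmt
    (cl_Id, si_Id, start_date, end_date, sell_fmt);
-- The query, with COUNT(*), can then be resolved from the index alone:
SELECT eil.sell_fmt, COUNT(*) AS itemCount
FROM table_items eil
WHERE eil.cl_Id = 123 AND eil.si_Id = '0'
  AND start_date <= NOW() AND end_date IS NOT NULL AND end_date < NOW()
GROUP BY eil.sell_fmt;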
I have a simple key-value table with two fields, created like so:
CREATE TABLE `mytable` (
`key` varchar(255) NOT NULL,
`value` double NOT NULL,
KEY `MYKEY` (`key`)
);
The keys are not unique. The table contains over one million records. I need a query that will sum up all the values for a given key, and return the top 10 keys. Here's my attempt:
SELECT t.key, SUM(t.value) value
FROM mytable t
GROUP BY t.key
ORDER BY value DESC
LIMIT 0, 10;
But this is very slow. Thing is, without the GROUP BY and SUM it's pretty fast, and without the ORDER BY it's very fast, but for some reason the combination of the two makes it very, very slow. Can anyone explain why this is so, and how it can be sped up?
There is no index on value. I tried creating one but it didn't help.
EXPLAIN EXTENDED produces the following in Workbench:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE t index NULL MYKEY 257 NULL 1340532 100.00 "Using temporary; Using filesort"
There are about 400K unique keys in the table.
The query takes over 3 minutes to run. I don't know how long exactly, because I stopped it after 3 minutes. However, if I remove the index on key, it runs in 30 seconds! Does anyone have any idea why?
The only way to really speed this up, as far as I can see, is to create a separate table with unique keys and maintain the running totals there. Then you will be able to index the totals to retrieve the top ten quickly, and the calculation will already have been done. As long as the table is not updated in too many places, this shouldn't be a major problem.
The major problem with this type of query is that the GROUP BY requires indexing in one order while the ORDER BY requires sorting into a different order.
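A minimal sketch of that summary-table approach, reusing the schema from the question (the totals table and trigger names are hypothetical):
-- Totals table: one row per key, with an index on the total for fast top-N
CREATE TABLE `mytable_totals` (
  `key` varchar(255) NOT NULL,
  `total` double NOT NULL,
  PRIMARY KEY (`key`),
  KEY `MYTOTAL` (`total`)
);
-- Seed it once from the existing rows
INSERT INTO `mytable_totals` (`key`, `total`)
SELECT `key`, SUM(`value`) FROM `mytable` GROUP BY `key`;
-- Keep it current as new rows arrive
CREATE TRIGGER `mytable_totals_ai` AFTER INSERT ON `mytable`
FOR EACH ROW
  INSERT INTO `mytable_totals` (`key`, `total`)
  VALUES (NEW.`key`, NEW.`value`)
  ON DUPLICATE KEY UPDATE `total` = `total` + NEW.`value`;
-- The top ten becomes a short, index-backed read
SELECT `key`, `total`
FROM `mytable_totals`
ORDER BY `total` DESC
LIMIT 10;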
I tried the SQL code:
explain SELECT * FROM myTable LIMIT 1
As a result I got:
id select_type table type possible_keys key key_len ref **rows**
1 SIMPLE myTable ALL NULL NULL NULL NULL **32117**
Do you know why the query would run through all rows instead of simply picking the first row?
What can I change in the query (or in my table) to reduce the number of rows examined while getting a similar result?
The rows count shown is only an estimate of the number of rows to examine. It is not always equal to the actual number of rows examined when you run the query.
In particular:
LIMIT is not taken into account while estimating the number of rows. Even if you have a LIMIT which restricts how many rows will be examined, MySQL will still print the full number.
Source
When the query actually runs only one row will be examined.
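If you want to verify what actually happens at run time rather than trust the estimate, one way (a sketch using MySQL's session status counters) is:
FLUSH STATUS;
SELECT * FROM myTable LIMIT 1;
-- The Handler_read_* counters reflect rows actually touched by the
-- statement above; expect a single read, not 32117.
SHOW SESSION STATUS LIKE 'Handler_read%';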
Edited for use of subselect:
Assuming the primary key is "my_id" , use WHERE. For instance:
select * from mytable
where my_id = (
select max(my_id) from mytable
)
While this seems less efficient at first, the EXPLAIN below shows that just one row is returned, with a read on the index to find the max. I do not suggest doing this against partitioned tables in MySQL:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY mytable const PRIMARY PRIMARY 4 const 1
2 SUBQUERY NULL NULL NULL NULL NULL NULL NULL Select tables optimized away
I have a relatively large table (5,208,387 rows, 400mb data/670mb index),
and all the columns I search on are indexed.
name and type are VARCHAR(255) with BTREE indexes,
and sdate is an INTEGER column containing timestamps.
I fail to understand some issues. First, this query is very slow (5 sec):
SELECT *
FROM `mytable`
WHERE `name` LIKE 'hello%my%big%text%thing%'
AND `type` LIKE '%'
ORDER BY `sdate` DESC LIMIT 3
EXPLAIN for the above:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable range name name 257 NULL 5191 Using where
while this one is very fast (5msec):
SELECT *
FROM `mytable`
WHERE `name` LIKE 'hello.my%big%text%thing%'
AND `type` LIKE '%'
ORDER BY `sdate` DESC LIMIT 3
EXPLAIN for the above:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable range name name 257 NULL 204 Using where
The difference in the number of rows scanned makes sense because of the indexes, but having 5k indexed rows take 5 seconds seems way too much.
Also, ordering by name instead of sdate makes the queries very fast, but I need to order by the timestamp.
The second thing I do not understand is that before adding the last column to the index, the DB had a 1.4 GB index; now, after running an OPTIMIZE/REPAIR, the size is just 670 MB.
The problem is that only the portion before the first % can take advantage of the index; the rest of the LIKE string has to be evaluated against every row matching hello% or hello.my%, without the index's help. Also, ordering by a column other than the one in the index being used probably requires a second pass, or at least a sort, rather than reading an already-sorted index. Options for better performance (each can be implemented independently of the other) are sketched below:
Using a full-text index on the name column and a MATCH() AGAINST() search rather than LIKE with %'s.
Adding sdate to a combined index (name, sdate), which could very well speed up sorting.
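A hedged sketch of both options (index names are illustrative; FULLTEXT support depends on your storage engine and MySQL version, and MATCH ... AGAINST matches words rather than the exact LIKE pattern):
-- Option 1: full-text search instead of multi-wildcard LIKE
ALTER TABLE `mytable` ADD FULLTEXT INDEX `ft_name` (`name`);
SELECT *
FROM `mytable`
WHERE MATCH(`name`) AGAINST('+hello +my +big +text +thing' IN BOOLEAN MODE)
ORDER BY `sdate` DESC LIMIT 3;
-- Option 2: a combined index covering the filter and the sort column
ALTER TABLE `mytable` ADD INDEX `name_sdate` (`name`, `sdate`);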