MySQL Slow query for 'COUNT' - mysql

The following query takes 0.7s on a 2.5GHz dual-core Windows Server 2008 R2 Enterprise machine when run against a 4.5GB MySQL database. sIndex10 is a varchar(1024) column:
SELECT COUNT(*) FROM e_entity
WHERE meta_oid=336799 AND sIndex10 = ''
An EXPLAIN shows the following information:
id: 1
select_type: SIMPLE
table: e_entity
type: ref
possible_keys: App_Parent,sIndex10
key: App_Parent
key_len: 4
ref: const
rows: 270066
extra: Using where
There are 230060 rows matching the first condition and 124216 rows matching both conditions combined with AND. meta_oid is indexed, and although sIndex10 is also indexed, I think MySQL is correct not to pick that index, since FORCE INDEX (sIndex10) makes the query take longer.
We have had a look at configuration parameters such as innodb_buffer_pool_size and they look correct as well.
Given that this table already has 642532 records, have we reached the limit of the performance MySQL can offer? Is investing in hardware the only way forward at this point?

WHERE meta_oid=336799 AND sIndex10 = ''
begs for a composite index
INDEX(meta_oid, sIndex10) -- in either order
That is not the same as having separate indexes on the columns.
That's all there is to it.
Index Cookbook
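A minimal sketch of adding such an index (the index name is an assumption, as is the prefix length: since sIndex10 is varchar(1024), a prefix may be needed to stay within index key-length limits, and a prefix still serves the equality test in the WHERE clause):

```sql
-- Hypothetical index name; the 255-char prefix is an assumption to keep
-- the key within InnoDB's index length limits for a varchar(1024) column.
ALTER TABLE e_entity
  ADD INDEX idx_meta_oid_sindex10 (meta_oid, sIndex10(255));
```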

One thing I always do is count(id), since id is (nearly) always indexed; counting just the id means the query only has to look at the index.
So try running the query below and see if it performs any better. You should also add SQL_NO_CACHE when testing, to get a better idea of how the query actually performs.
SELECT SQL_NO_CACHE COUNT(id) FROM e_entity
WHERE meta_oid=336799 AND sIndex10 = ''
Note: This is probably not the complete answer for your question, but it was too long for just a comment.

Related

Simple MySQL query with ORDER BY too slow

Edit: The time spent querying a normal word is actually 1.78 seconds. The 4.5 seconds mentioned in the original post below was for querying special words like '.vnet'. (I know REGEXP '\\b.vnet\\b' won't find the whole-word match for '.vnet'. I might use a more complex regex to fix this later, or drop support for '.vnet' if it's too time-consuming.) Also, I added solution 5 below.
I have the following MySQL query to achieve whole word matching.
SELECT source, target
FROM tm
WHERE source REGEXP '\\bword\\b'
AND customer = 'COMPANY X'
AND language = 'YYY'
ORDER BY CHAR_LENGTH(source)
LIMIT 5;
There are 2 customers and 2 languages currently.
My goal is to find the top 5 closest matches of a phrase among hundreds of thousands of English sentences. The reason the fetched records are ordered by CHAR_LENGTH is that the shorter the source, the higher the match ratio, since REGEXP '\\bword\\b' already guarantees that source contains the word.
The tm table:
CREATE TABLE tm(
id INT AUTO_INCREMENT PRIMARY KEY,
source TEXT(7000) NOT NULL,
target TEXT(6000) NOT NULL,
language CHAR(3),
customer VARCHAR(10),
INDEX src_cus_lang (source(755), customer, language)
);
The query above took about 4.5 seconds to finish, which is very slow for my PC (Intel Core i5-10400F, 16GB RAM, and an SSD).
The EXPLAIN command showed the below result:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tm
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1117154
filtered: 1.00
Extra: Using where; Using filesort
1 row in set, 1 warning (0.00 sec)
I tried deleting the src_cus_lang index and creating a new one on (customer, language, source(755)), but there was no improvement at all.
I can think of a few solutions:
1. Recreate the tm table, ordering by CHAR_LENGTH(source) in the process. This is not ideal for me, as I'd like to keep the original order of the table.
2. Create a new column named src_len, i.e. the length of source. However, ORDER BY src_len is still very slow.
3. Split the tm table into 4 separate ones by customer and language. Not ideal for me.
4. Index the source column. Still very slow.
5. Use INDEX(customer, language). Took 1.4 seconds longer for both normal words and special words like '.vnet'.
Is there a way to cut the execution time down to less than 0.5 seconds?
This is essentially useless:
INDEX src_cus_lang (source(755), customer, language)
The prefixing keeps the rest of the columns from being very useful. REGEXP requires checking all 1.1M rows.
This would be better:
INDEX(customer, language)
It will at least filter on those two columns, then apply the REGEXP fewer times.
Since MySQL usually wants to finish the WHERE before considering the ORDER BY, your attempts with src_len, etc., did not help.
If there are only 4 different combinations of customer and language, not much can be done.
However, you should consider a FULLTEXT(source) index. With such,
MATCH(source) AGAINST('+word' IN BOOLEAN MODE)
AND ...
will work much faster.
Also try IN NATURAL LANGUAGE MODE.
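A sketch of the FULLTEXT approach (the index name ft_source is an assumption; note that the default ft_min_word_len and stopword settings may exclude very short or very common words, and punctuation-laden tokens like '.vnet' still won't match as whole words):

```sql
-- Assumed index name; adding a FULLTEXT index on a large table can take a while.
ALTER TABLE tm ADD FULLTEXT INDEX ft_source (source);

SELECT source, target
FROM tm
WHERE MATCH(source) AGAINST('+word' IN BOOLEAN MODE)
  AND customer = 'COMPANY X'
  AND language = 'YYY'
ORDER BY CHAR_LENGTH(source)
LIMIT 5;
```

MATCH ... AGAINST uses the full-text index to find candidate rows first, so the ORDER BY and remaining filters run over far fewer rows than a full scan with REGEXP.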

How do I speed up this SQL query

I have the following query:
select min(a) from tbl where b > ?;
and it takes about 4 seconds on my mysql instance with index(b, a) (15M rows). Is there a way to speed it up?
Explain:
explain select min(parsed_id) from replays where game_date > '2016-10-01';
id: 1
select_type: SIMPLE
table: replays
partitions: NULL
type: range
possible_keys: replays_game_date_index,replays_game_date_parsed_id_index
key: replays_game_date_parsed_id_index
key_len: 6
ref: NULL
rows: 6854021
filtered: 100.00
Extra: Using where; Using index
Index statement:
create index replays_game_date_parsed_id_index on replays (game_date, parsed_id);
I think the index MySQL is using is the right one. The query should be instantaneous since a SINGLE read from the index should return the result you want. I guess for this query MySQL's SQL optimizer is doing a very poor job.
Maybe you could rephrase your query to trick the optimizer into using a different strategy. Maybe you can try:
select parsed_id
from replays
where game_date > '2016-10-01'
order by parsed_id
limit 1
Is this version any faster?
select @mina
from (select (@mina := least(@mina, a)) as mina
      from tbl cross join
           (select @mina := 999999) params
      where b > ?
     ) t
limit 1;
I suspect this won't make much difference, but I'm not sure what happens under the hood with such a large aggregation function running over an index.
This may or may not help: Change the query and add an index:
SELECT a FROM tbl WHERE b > ? ORDER BY a LIMIT 1;
INDEX(a, b)
Then, if a matching b occurs soon enough in the table, this will be faster than the other suggestions.
On the other hand, if the only matching b is near the end of the table, this will have to scan nearly all the index and be slower than the other options.
a needs to be first in the index. By having both columns in the index, it becomes a "covering" index, hence a bit faster.
It may be that using my SELECT, together with two indexes will give the Optimizer enough to pick the better approach:
INDEX(a,b)
INDEX(b,a)
Schema
Adding either (or both) composite indexes should help.
Shrinking the table size is likely to help...
INT takes 4 bytes. Consider whether a smaller datatype would suffice for any of those columns.
There are 3 dates (DATETIME, TIMESTAMP); do you need all of them?
Is fingerprint varchar(36) a UUID/GUID? If so, it could be packed into BINARY(16).
640MB is tight -- check the graphs to make sure there is no "swapping". (Swapping would be really bad for performance.)
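As an illustration of the UUID-packing suggestion above (column and names beyond fingerprint are assumptions; UUID_TO_BIN exists only in MySQL 8.0+, so this sketch uses UNHEX/REPLACE, which works on older versions too):

```sql
-- Sketch only: pack a 36-char UUID string into 16 bytes.
ALTER TABLE replays ADD COLUMN fingerprint_bin BINARY(16);
UPDATE replays SET fingerprint_bin = UNHEX(REPLACE(fingerprint, '-', ''));
-- After verifying the data, the varchar(36) column could be dropped.
```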

Is it possible to see why my queries are so slow just by seeing the execution plan

I am trying to learn how to optimize SQL statements, and I was wondering whether it's possible to estimate what might be making my queries slow just by looking at the execution plan.
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 382856
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: rf
type: ref
possible_keys: rec_id
key: rec_id
key_len: 4
ref: rs.id
rows: 7
Extra: Using index condition
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: f
type: range
possible_keys: facet_name_and_value,rec_id
key: facet_name_and_value
key_len: 309
ref: NULL
rows: 382856
Extra: Using index condition; Using where; Using temporary; Using filesort
*************************** 4. row ***************************
id: 2
select_type: DERIVED
table: r
type: ref
possible_keys: record_id
key: record_id
key_len: 9
ref: sqlse_test_crescentbconflate.f.rec_id
rows: 1
Extra: Using where; Using index
Just by looking at the execution plan I can see that I am using too many joins and that the data set is too big, since SQL is using filesort. But I might be wrong.
No, it's not really possible to diagnose the performance issue from just the EXPLAIN output.
But the output does reveal that there's a view query that's returning (an estimated) 384,000 rows. We can't tell whether that's a stored view or an inline view. But we can see that the results from that query are being materialized into a table (MySQL calls it a "derived table"), and then the outer query is running against that. The overhead for that can be considerable.
What we can't tell is whether it's possible to get the same result without the view, i.e. to flatten the query. And if that's not possible, whether there are any predicates on the outer query that could be pushed down into the view.
A "Using filesort" isn't necessarily a bad thing. But that operation can become expensive for really large sets. So we do want to avoid unnecessary sort operations. (What we can't tell from the EXPLAIN output is whether it would be possible to avoid those sort operations.)
And if the query uses a "covering index" then the query is satisfied from the index pages, without needing to lookup/visit pages in the underlying table, which means less work to do.
Also, make sure the predicates are in a form that enables effective use of an index. That means having conditions on bare columns, not wrapping the columns in functions. e.g.
We want to avoid writing a condition like this:
where DATE_FORMAT(t.dt,'%Y-%m') = '2016-01'
when the same thing can be expressed like this:
where t.dt >= '2016-01-01' and t.dt < '2016-02-01'
With the former, MySQL has to evaluate the DATE_FORMAT function for every row in the table and then compare the function's return value. With the latter form, MySQL can use a "range scan" operation on an index with dt as the leading column. A range scan has the potential to eliminate vast swaths of rows very efficiently, without actually needing to examine them.
To summarize, the biggest performance improvements would likely come from
avoiding creating a derived table (no view definitions)
pushing predicates into view definitions (where view definitions can't be avoided)
avoiding unnecessary sort operations
avoiding unnecessary joins
writing predicates in a form that can make use of suitable indexes
creating suitable indexes, covering indexes where appropriate
I'd look at the extra field in the execution plan, and then examine your query and your database schema to find ways to improve performance.
using temporary means a temporary table was used, which may slow down the query. Furthermore, temporary tables may end up being written to the disk (and not stored in RAM, which the server typically tries to do if it can) if they are too large.
According the MySQL 5.5 documentation, here are some reasons
temporary tables are created:
Evaluation of UNION statements.
Evaluation of some views, such as those that use the TEMPTABLE algorithm, UNION, or aggregation.
Evaluation of statements that contain an ORDER BY clause and a different GROUP BY clause, or for which the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue.
Evaluation of DISTINCT combined with ORDER BY may require a temporary table.
For queries that use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
Evaluation of multiple-table UPDATE statements.
Evaluation of GROUP_CONCAT() or COUNT(DISTINCT) expressions.
Then there's using filesort, which means that a sort was performed which could not be done with existing indexes. This could be no big deal, but you should check what fields are being sorted on and where your indexes are and make sure you're not giving MySQL too much work to do.
You may be able to use the execution plan to see why your queries run slowly because you know how your schema works (what columns and indexes you have). But, we here on Stack Overflow can't possibly use just the execution plan to help you.
There's nothing inherently wrong with filesort. It happens to have an unfortunate name; it simply means that satisfying the query requires sorting the results of a subquery. It doesn't necessarily mean the subquery's results have been placed in an actual file in a filesystem.
Try reading this fine tutorial. http://use-the-index-luke.com/
If you need help with a specific query, please ask another question. Include the following information:
The query.
The results of EXPLAIN
The definitions of the tables involved in the query, including indexes.
Pro tip: SELECT * is harmful to performance in big queries with lots of joins. In particular,
SELECT *
FROM gigantic_table
ORDER BY column
LIMIT 1
is an antipattern, because it slurps a huge amount of data, sorts it, and then discards all but one row of the sorted result. Lots of data gets sloshed around in your server for a small result. That's wasteful, even if it is correct. You can do this kind of thing more efficiently with:
SELECT *
FROM gigantic_table
WHERE column =
(SELECT MAX(column) FROM gigantic_table)
The best efficiency will come if column is indexed.
I mention this because the first row of your explain makes it look like you're romping through a great many rows to find something.

SQL query: Speed up for huge tables

We have a table with about 25,000,000 rows called 'events' having the following schema:
TABLE events
- campaign_id : int(10)
- city : varchar(60)
- country_code : varchar(2)
The following query takes VERY long (> 2000 seconds):
SELECT COUNT(*) AS counted_events, country_code
FROM events
WHERE campaign_id IN (597)
GROUP BY city, country_code
ORDER BY counted_events
We found out that it's because of the GROUP BY part.
There is already an index idx_campaign_id_city_country_code on (campaign_id, city, country_code) which is used.
Maybe someone can suggest a good solution to speed it up?
Update:
'Explain' shows that out of the many possible indexes MySQL uses this one: 'idx_campaign_id_city_country_code'; for rows it shows '471304' and for 'Extra' it shows 'Using where; Using temporary; Using filesort'.
Here is the whole result of EXPLAIN:
id: '1'
select_type: 'SIMPLE'
table: 'events'
type: 'ref'
possible_keys: 'index_campaign,idx_campaignid_paid,idx_city_country_code,idx_city_country_code_campaign_id,idx_cid,idx_campaign_id_city_country_code'
key: 'idx_campaign_id_city_country_code'
key_len: '4'
ref: 'const'
rows: '471304'
Extra: 'Using where; Using temporary; Using filesort'
UPDATE:
Ok, I think it has been solved:
Looking at the pasted query here again, I realized that I forgot to mention that there was one more column in the SELECT called 'country_name'. The query was very slow with country_name included, but I'll just leave it out, and now the performance of the query is absolutely fine.
Sorry for that mistake!
So thank you for all your helpful comments; I'll upvote all the good answers! There were some really helpful additions that I will probably also apply (like changing types, etc.).
Without seeing what EXPLAIN says, it's a long-distance shot; anyway:
make an index on (city,country_code)
see if there's a way to use partitioning, your table is getting rather huge
if country_code is always 2 chars, change it to CHAR(2)
change numeric columns to UNSIGNED INT where appropriate
post entire EXPLAIN output
don't use IN() - better use:
WHERE campaign_id = 597
OR campaign_id = 231
OR ....
As far as I know, IN() is very slow.
Update: as nik0lias commented, IN() is actually faster than concatenating OR conditions.
Some ideas:
Given the nature and size of the table, it would be a great candidate for partitioning by country. This way the events of each country would be stored in a different physical table, even though it behaves as one big virtual table.
Is country_code a string? Maybe you have a country_id that would be easier to sort on. (It may force you to create or change indexes.)
Are you really using the city in the GROUP BY?
partitioning - especially by country - will not help
column IN (const-list) is not slow, it is in fact a case with special optimization
The problem is, that MySQL doesn't use the index for sorting. I cannot say why, because it should. Could be a bug.
The best strategy to execute this query is to scan the sub-tree of the index where campaign_id=597. Since the index is then sorted by city, country_code, no extra sorting is needed and rows can be counted while scanning.
So the indexes are already optimal for this query. MySQL is just not using them correctly.
I'm getting more information offline. It seems this is not a database problem at all; rather:
the schema is not normalized. The table contains not only country_code but also country_name (this should be in a separate table).
the real query contains country_name in the select list. But since that column is not indexed, MySQL cannot use an index-only scan.
As soon as country_name is dropped from the select list, the query reverts to an index-only scan ("using index" in EXPLAIN output) and is blazingly fast.
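If country_name cannot be normalized away, an alternative sketch is to append it to the composite index so the query becomes index-only again (the index name is an assumption, and this widens the index considerably, which costs write performance and disk space):

```sql
-- Hypothetical index name; country_name is appended only to make the
-- index covering, not to filter or sort on it.
ALTER TABLE events
  ADD INDEX idx_campaign_city_cc_cname
    (campaign_id, city, country_code, country_name);
```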

MySQL query with alias not using an index

The following query is showing up in my log as not using an index:
SELECT ordinal,YEAR(ceremonydate) as yr
FROM awardinfo
ORDER BY YEAR(ceremonydate) DESC LIMIT 1;
Explain shows it's not using an index:
id: 1
select_type: SIMPLE
table: awardinfo
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 72
Extra: Using filesort
ordinal, ceremonydate both have an index. Are they not being used due to the yr alias? Is there a way to create an index on YEAR(ceremonydate) instead of just ceremonydate? Or is there a way to index an alias?
It is because of the alias. ORDER BY can use an index if it is ordering by something that is indexed. While ceremonydate may be indexed, YEAR(ceremonydate) transforms the value of ceremonydate into something completely different, so YEAR(ceremonydate) is not indexed.
And since you can't index an alias, this means that in order for an ORDER BY to use an index, it must be a simple column name, or list of column names.
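One caveat to "you can't index an alias": on MySQL 5.7 and later you can index a stored generated column holding YEAR(ceremonydate), which gives much the same effect (the column and index names below are assumptions):

```sql
-- Requires MySQL 5.7+; names are hypothetical.
ALTER TABLE awardinfo
  ADD COLUMN ceremonyyear SMALLINT AS (YEAR(ceremonydate)) STORED,
  ADD INDEX idx_ceremonyyear (ceremonyyear);

-- ORDER BY ceremonyyear DESC can then use the index.
```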
You should be able to do this and use the index:
SELECT ordinal,YEAR(ceremonydate) as yr
FROM awardinfo
ORDER BY ceremonydate DESC LIMIT 1;
Without knowing what your data looks like, that may work for you instead.
More info: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
You can test this by running a simple select query using the alias, SELECT ordinal, ceremonydate as yr FROM ..., and then adding complexity to your query step by step to see where the indexes stop being used. Most likely, because you are ordering by YEAR(ceremonydate), MySQL is creating a temporary table. Your best bet is to extract the year from ceremonydate in your application code. MySQL loses a lot of efficiency with inline processing and computation like YEAR() because it has to create those temporary tables.