Slow count(*) on InnoDB - MySQL

I have a table message_message with about 3000000 records.
When I run a count(*) query, it is very slow:
mysql> select count(*) from message_message;
+----------+
| count(*) |
+----------+
| 2819416 |
+----------+
1 row in set (2 min 35.35 sec)
Here is the EXPLAIN output:
mysql> explain select count(*) from message_message;
| id | select_type| table | type | possible_keys | key | key_len | ref | rows |Extra |
| 1 | SIMPLE | message_message | index | NULL | PRIMARY | 4 | NULL | 2939870 | Using index |
1 row in set (0.02 sec)
What is happening?

Have a look at this post. In InnoDB, an unrestricted COUNT(*) has to scan a whole index (here the PRIMARY key, as your EXPLAIN shows), whereas MyISAM stores the exact row count with the table and can return it without scanning.
If you use a WHERE clause, both engines have to use an index or scan rows to count the matches, so in general InnoDB will be slower than MyISAM on full unrestricted counts, whereas the performance matches up on restricted counts.
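If an approximate figure is acceptable, the table statistics can be read instead of scanning the index. This is a minimal sketch, assuming the table lives in the currently selected database. For InnoDB tables, TABLE_ROWS is only an estimate, so it will not match COUNT(*) exactly.

```sql
-- Approximate row count from table statistics (an estimate for InnoDB, not exact).
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'message_message';
```

This returns almost instantly because it reads metadata rather than rows. Use COUNT(*) whenever the exact number matters.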

If you want to count the number of records, you can also query the whole table and use the num_rows property of the result set, but that transfers every row to the client, so on a table this size it is unlikely to be faster than COUNT(*). COUNT(...) is usually used in aggregate queries (in combination with GROUP BY).

Related

Indexing the prefix of a varchar

I have a table with 20 million records that has this column:
category_value varchar(4000).
I also have an index on this column, but still get bad results:
mysql> select count(*) from daniel_table where category_value like 'giraffe%';
+----------+
| count(*) |
+----------+
| 107130 |
+----------+
1 row in set (2 min 4.33 sec)
So I created a new field:
short_value varchar(32)
which contains substr(category_value, 1, 32)
and of course has an index.
Now, when I search this field, it is better:
mysql> select count(*) from daniel_table where short_value like 'giraffe%';
+----------+
| count(*) |
+----------+
| 107130 |
+----------+
1 row in set (1.36 sec)
However, I don't want to create a new field, since it duplicates data.
I tried creating this index on the original varchar(4000):
key cat32(category_value(32))
And got this result:
mysql> select count(*) from daniel_table use index (cat32) where Category_Value like 'giraffe%' ;
+----------+
| count(*) |
+----------+
| 107130 |
+----------+
1 row in set (24.60 sec)
Which is still bad performance compared to the varchar(32) field.
After looking into this a little, I've figured out the problem.
As you can see in the EXPLAIN output, there is no "Using index" in the Extra field.
That means MySQL fetches the table row for each index entry it finds.
Why does it do that?
How can it be avoided?
The EXPLAIN output:
+----+-------------+--------------------------------+-------+---------------+-------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------------------+-------+---------------+-------+---------+------+--------+-------------+
| 1 | SIMPLE | daniel_table | range | cat32 | cat32 | 34 | NULL | 195824 | Using where |
+----+-------------+--------------------------------+-------+---------------+-------+---------+------+--------+-------------+
Thanks a lot!

Why does this MySQL query not use the index properly?

Sorry that this is such a specific and probably clichéd question, but it is really causing me major problems.
Every day I have to run several hundred thousand SELECT statements that look like these two (this is one example, but they're all pretty much the same, just with a different word1):
SELECT pibn,COUNT(*) AS aaa FROM research_storage1
USE INDEX (word21pibn)
WHERE word1=270299 AND word2=0
GROUP BY pibn
ORDER BY aaa DESC
LIMIT 1000;
SELECT pibn,page FROM research_storage1
USE INDEX (word12num)
WHERE word1=270299 AND word2=0
ORDER BY num DESC
LIMIT 1000;
The first statement is quick-as-a-flash and takes a fraction of a second. The second statement takes about 2 seconds, which is way too long considering I have hundreds of thousands to do.
The indexes are:
word21pibn: word2, word1, pibn
word12num: word1, word2, num
The results of EXPLAIN (both EXTENDED and PARTITIONS) are:
mysql> explain extended SELECT pibn,COUNT(*) AS aaa FROM research_storage1 USE INDEX (word21pibn) WHERE word1=270299 AND word2=0 GROUP BY pibn ORDER BY aaa DESC LIMIT 1000;
+----+-------------+-------------------+------+---------------+------------+---------+-------------+------+----------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------+------+---------------+------------+---------+-------------+------+----------+-----------------------------------------------------------+
| 1 | SIMPLE | research_storage1 | ref | word21pibn | word21pibn | 6 | const,const | 1549 | 100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------------+------+---------------+------------+---------+-------------+------+----------+-----------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain partitions SELECT pibn,COUNT(*) AS aaa FROM research_storage1 USE INDEX (word21pibn) WHERE word1=270299 AND word2=0 GROUP BY pibn ORDER BY aaa DESC LIMIT 1000;
+----+-------------+-------------------+------------+------+---------------+------------+---------+-------------+------+-----------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+------------+------+---------------+------------+---------+-------------+------+-----------------------------------------------------------+
| 1 | SIMPLE | research_storage1 | p99 | ref | word21pibn | word21pibn | 6 | const,const | 1549 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------------+------------+------+---------------+------------+---------+-------------+------+-----------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain extended SELECT pibn,page FROM research_storage1 USE INDEX (word12num) WHERE word1=270299 AND word2=0 ORDER BY num DESC LIMIT 1000;
+----+-------------+-------------------+------+---------------+-----------+---------+-------------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------+------+---------------+-----------+---------+-------------+------+----------+-------------+
| 1 | SIMPLE | research_storage1 | ref | word12num | word12num | 6 | const,const | 818 | 100.00 | Using where |
+----+-------------+-------------------+------+---------------+-----------+---------+-------------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain partitions SELECT pibn,page FROM research_storage1 USE INDEX (word12num) WHERE word1=270299 AND word2=0 ORDER BY num DESC LIMIT 1000;
+----+-------------+-------------------+------------+------+---------------+-----------+---------+-------------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+------------+------+---------------+-----------+---------+-------------+------+-------------+
| 1 | SIMPLE | research_storage1 | p99 | ref | word12num | word12num | 6 | const,const | 818 | Using where |
+----+-------------+-------------------+------------+------+---------------+-----------+---------+-------------+------+-------------+
1 row in set (0.00 sec)
The only difference I see is that the second statement does not have Using index in the Extra column of the EXPLAIN output. This does not make sense to me, because the index was designed for that statement, so I don't see why it would not be used.
Any ideas?
Try adding the pibn and page columns to the word12num compound index. Then all the information you need for your query will be in the index, as it is in your first query.
Edit: I missed the pibn column you're selecting; sorry about that.
If your compound index turns out to contain (word1, word2, num, pibn, page), then everything in your second query can come from the index.
If you look at the Extra column under your first query's EXPLAIN, one of the entries in there is Using index, as @sebas pointed out. This actually means "using the index only": the server can satisfy your query by just consulting the index, without having to consult the table. That's why it is so fast: the server doesn't have to bang the disk heads around random-accessing the table to get the extra columns. Using index is not present in your second query's EXPLAIN.
In that index, the columns mentioned in WHERE come first. Then come the columns in ORDER BY. Finally come the columns you're simply SELECTing. Why use this particular order for columns in the index? The server finds its way to the first index entry matching the WHERE conditions, then can read the index sequentially to satisfy the query.
It is indeed expensive to construct and maintain a compound index on a big table. You are looking at a basic tradeoff in DBMS design: do you want to spend time constructing the table or looking things up in it? Only you know whether it's better to incur the cost when building the table or when looking things up in it.
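The compound index described above could be created along these lines. This is only a sketch: the index name word12num_cover is made up for illustration, and building it on a large partitioned table may take considerable time and disk space.

```sql
-- Hypothetical covering index: WHERE columns first, then the ORDER BY column,
-- then the columns that are only selected.
ALTER TABLE research_storage1
  ADD INDEX word12num_cover (word1, word2, num, pibn, page);

-- Re-check the plan; the Extra column should now include "Using index".
EXPLAIN SELECT pibn, page
FROM research_storage1 USE INDEX (word12num_cover)
WHERE word1 = 270299 AND word2 = 0
ORDER BY num DESC
LIMIT 1000;
```

If the new index works as intended, the original word12num index becomes redundant, since its columns are a left prefix of the new one.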

Why is my query slower if a date column is in the SELECT part of my query?

I have a strange behavior with my mysql query below:
SELECT domain_id, domain_name, domain_lastupdate
FROM domains
WHERE domain_id > 300000 LIMIT 2000
takes ~15 seconds...
while
SELECT domain_id, domain_name
FROM domains
WHERE domain_id > 300000 LIMIT 2000
takes ~0.05 seconds...
I've tried different ids with different limits, running one before the other and the other way around to avoid cached results, but I always end up with dramatic time differences.
I have 1 index on the domain_id, 1 on the domain_name, but none with both columns...
I just don't get it...
The domain_lastupdate is a simple Date column.
Here's the EXPLAIN output of both queries:
explain SELECT domain_id, domain_name, domain_lastupdate FROM domains WHERE domain_id > 255000 LIMIT 500;
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+-------------+
| 1 | SIMPLE | domains | range | UN_domainid | UN_domainid | 4 | NULL | 12575357 | Using where |
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+-------------+
1 row in set (0.00 sec)
second one:
explain SELECT domain_id, domain_name FROM domains WHERE domain_id > 255000 LIMIT 500;
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+--------------------------+
| 1 | SIMPLE | domains | range | UN_domainid | UN_domainid | 4 | NULL | 12575369 | Using where; Using index |
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+--------------------------+
1 row in set (0.01 sec)
Any idea why the first one doesn't use the index?
When you pull out only the non-date columns that you have indexed, the server can read your data directly from the index and needn't go to the table at all. To get the date, it has to hit the table for every row. Add the date column to an index the query can use.
For example, you could create a multi-column index. Make sure you have domain_id as the first column in the index. See Creating Indexes.
What you want is called a covering index.
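A covering index for the slower query might look like the sketch below. The index name idx_domain_cover is hypothetical, and if domain_name is a long VARCHAR the index may run into key-length limits, so treat this as a starting point and compare the EXPLAIN output before and after.

```sql
-- Hypothetical covering index: domain_id first (it drives the range condition),
-- followed by the other selected columns.
ALTER TABLE domains
  ADD INDEX idx_domain_cover (domain_id, domain_name, domain_lastupdate);

-- With all three selected columns in one index, Extra should show "Using index".
EXPLAIN SELECT domain_id, domain_name, domain_lastupdate
FROM domains
WHERE domain_id > 300000
LIMIT 2000;
```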

Why is this MySQL JOIN statement returning more results?

I have two tables (characteristic_list and measure_list) that are related to each other by a column called 'm_id'. I want to retrieve records using filters (columns from characteristic_list) within a date range (columns from measure_list). When I run the following SQL using INNER JOIN, it takes a while to retrieve the records. What am I doing wrong?
mysql> explain select c.power_set_point, m.value, m.uut_id, m.m_id, m.measurement_status, m.step_name from measure_list as m INNER JOIN characteristic_list as c ON (m.m_id=c.m_id) WHERE (m.sequence_end_time BETWEEN '2010-06-18' AND '2010-06-20');
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | c | ALL | NULL | NULL | NULL | NULL | 82952 | |
| 1 | SIMPLE | m | ALL | NULL | NULL | NULL | NULL | 85321 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
2 rows in set (0.00 sec)
mysql> select count(*) from measure_list;
+----------+
| count(*) |
+----------+
| 83635 |
+----------+
1 row in set (0.18 sec)
mysql> select count(*) from characteristic_list;
+----------+
| count(*) |
+----------+
| 83635 |
+----------+
1 row in set (0.10 sec)
The reason this query takes a while to execute is that it has to scan both tables in full. You generally don't want to see "ALL" as the type in EXPLAIN output for tables of this size. To speed things up, you need to make smart decisions about what to index.
See the following documents at the MySQL site:
http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html
http://dev.mysql.com/doc/refman/5.1/en/using-explain.html
As an add-on to the previous answer by Dan, you should consider indexing the join columns and the WHERE columns. In this case, that means the m_id columns in both tables and the sequence_end_time column in the measure_list table. The tables are small enough that you could add an index, run EXPLAIN and time the query, then change the index and compare. It should be relatively quick to solve.
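The suggested indexes could be added roughly as follows. The index names are hypothetical, and this sketch assumes m_id is not already a primary or unique key in either table, which the NULL possible_keys in the EXPLAIN output suggests.

```sql
-- Index the join column on both tables (index names are hypothetical).
ALTER TABLE measure_list ADD INDEX idx_m_id (m_id);
ALTER TABLE characteristic_list ADD INDEX idx_c_m_id (m_id);

-- Index the column used in the date-range filter.
ALTER TABLE measure_list ADD INDEX idx_seq_end (sequence_end_time);

-- Re-run EXPLAIN and compare the type and rows columns with the original plan.
EXPLAIN SELECT c.power_set_point, m.value, m.uut_id, m.m_id,
       m.measurement_status, m.step_name
FROM measure_list AS m
INNER JOIN characteristic_list AS c ON m.m_id = c.m_id
WHERE m.sequence_end_time BETWEEN '2010-06-18' AND '2010-06-20';
```

Adding the indexes one at a time and timing the query after each makes it easier to see which one actually changes the plan.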

Getting a Column's Max Value

Is there any tangible difference (speed/efficiency) between these statements? Assume the column is indexed.
SELECT MAX(someIntColumn) AS someIntColumn
or
SELECT someIntColumn ORDER BY someIntColumn DESC LIMIT 1
This depends largely on the query optimizer in your SQL implementation. At best, they will have the same performance. In many cases, however, the first query can be much faster.
The first query essentially asks the DBMS to inspect every value in someIntColumn and pick the largest one.
The second query asks the DBMS to sort all the values in someIntColumn from largest to smallest and pick the first one. Depending on the number of rows in the table and the existence (or lack thereof) of an index on the column, this could be significantly slower.
If the query optimizer is sophisticated enough to realize that the second query is equivalent to the first one, you are in luck. But if you retarget your app to another DBMS, you might get unexpectedly poor performance.
EDIT based on explain plan:
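The explain plan shows that MAX(column) is more efficient. It says "Select tables optimized away", meaning the optimizer resolved the MAX() from the index during optimization and did not need to read any table rows during execution.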
EXPLAIN SELECT version from schema_migrations order by version desc limit 1;
+----+-------------+-------------------+-------+---------------+--------------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+-------+---------------+--------------------------+---------+------+------+-------------+
| 1 | SIMPLE | schema_migrations | index | NULL | unique_schema_migrations | 767 | NULL | 1 | Using index |
+----+-------------+-------------------+-------+---------------+--------------------------+---------+------+------+-------------+
1 row in set (0.00 sec)
EXPLAIN SELECT max(version) FROM schema_migrations ;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.00 sec)