I have a generated query:
select
mailsource2_.file as col_0_0_,
messagedet0_.messageId as col_1_0_,
messageent1_.mboxOffset as col_2_0_,
messageent1_.mboxOffsetEnd as col_3_0_,
messagedet0_.id as col_4_0_
from MessageDetails messagedet0_, MessageEntry messageent1_, MailSourceFile mailsource2_
where messagedet0_.id=messageent1_.messageDetails_id
and messageent1_.mailSourceFile_id=mailsource2_.id
order by mailsource2_.file, messageent1_.mboxOffset;
EXPLAIN says that there are no full scans and that indexes are used:
+----+-------------+--------------+--------+------------------------------------------------------+---------+---------+--------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys |key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+------------------------------------------------------+---------+---------+--------------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | mailsource2_ | index | PRIMARY |file | 384 | NULL | 1445 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | messageent1_ | ref | msf_idx,md_idx,FKBBB258CB60B94D38,FKBBB258CBF7C835B8 |msf_idx | 9 | skryb.mailsource2_.id | 2721 | Using where |
| 1 | SIMPLE | messagedet0_ | eq_ref | PRIMARY |PRIMARY | 8 | skryb.messageent1_.messageDetails_id | 1 | |
+----+-------------+--------------+--------+------------------------------------------------------+---------+---------+--------------------------------------+------+----------------------------------------------+
CREATE TABLE `mailsourcefile` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`file` varchar(127) COLLATE utf8_bin DEFAULT NULL,
`size` bigint(20) DEFAULT NULL,
`archive_id` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `file` (`file`),
KEY `File_idx` (`file`),
KEY `Archive_idx` (`archive_id`),
KEY `FK7C3F816ECDB9F63C` (`archive_id`),
CONSTRAINT `FK7C3F816ECDB9F63C` FOREIGN KEY (`archive_id`) REFERENCES `archive` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1370 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
I also have indexes on file and mboxOffset. SHOW FULL PROCESSLIST says that MySQL is sorting the result, and it takes more than a few minutes. The result set is 5M records. How can I optimize this?
I don't think there is much to optimize in the query itself. Explicit JOINs would make it more readable, but as far as I recall MySQL is nowadays perfectly able to detect this kind of construct and plan the joins itself.
What would probably help is increasing both tmp_table_size and max_heap_table_size, so that the result set of this query can stay in memory rather than being written to disk.
The maximum size for in-memory temporary tables is the minimum of the tmp_table_size and max_heap_table_size values
http://dev.mysql.com/doc/refman/5.5/en/internal-temporary-tables.html
The "Using temporary" in the EXPLAIN output means the query uses a temporary table (see the link above), which will probably be written to disk because of the large amount of data.
The file column alone is anywhere between 1 and 384 bytes, so let's take half of that for our estimate and ignore the rest of the columns. That gives 192 bytes per row in the result set.
1445 * 2721 = 3,931,845 rows
* 192 = 754,914,240 bytes
/ 1024 ~= 737,221 kb
/ 1024 ~= 720 mb
This is certainly more than the max_heap_table_size (16,777,216 bytes) and most likely more than the tmp_table_size.
Not having to write such a result to disk will most certainly increase speed.
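The back-of-the-envelope estimate above can be reproduced in a few lines of Python. This is only a sketch: the row counts come from the rows column of the EXPLAIN output, the 192-byte average is the guess made above, and the function name is made up for illustration.

```python
def estimate_result_bytes(outer_rows: int, rows_per_outer: int, avg_row_bytes: int) -> int:
    """Rough size of the joined result: rows read from the driving table,
    times matching rows per driving row, times an assumed average row width."""
    return outer_rows * rows_per_outer * avg_row_bytes


# Figures taken from the EXPLAIN output and the estimate above.
total_bytes = estimate_result_bytes(1445, 2721, 192)
total_mib = total_bytes / 1024 / 1024

# The max_heap_table_size value quoted above (16 MiB).
max_heap_table_size = 16_777_216

print(f"{total_bytes:,} bytes ~= {total_mib:.0f} MiB")
print("fits in an in-memory temporary table:", total_bytes <= max_heap_table_size)
```

Plugging in the real average row width (for example from a sample of the data) would give a tighter figure than the 192-byte guess.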
Good luck!
Optimization is always tricky. In order to make a dent in your execution time, I think you probably need to do some sort of pre-cooking.
If the file names are similar, (e.g. /path/to/file/1, /path/to/file/2), sorting them will mean a lot of byte comparisons, probably compounded by the unicode encoding. I would calculate a hash of the filename on insertion (e.g. MD5()) and then sort using that.
If the files are already well distributed (e.g. postfix spool names), you probably need to come up with some scheme on insertion whereby either:
simply reading records from some joined table will automatically generate them in correct order; this may not save a lot of time, but it will give you some data quickly so you can start processing, or
find a way to provide a "window" on the data so that not all of it needs to be processed at once.
As @raheel shan said above, you may want to try some JOINs:
select
mailsource2_.file as col_0_0_,
messagedet0_.messageId as col_1_0_,
messageent1_.mboxOffset as col_2_0_,
messageent1_.mboxOffsetEnd as col_3_0_,
messagedet0_.id as col_4_0_
from
MessageDetails messagedet0_
inner join
MessageEntry messageent1_
on
messagedet0_.id = messageent1_.messageDetails_id
inner join
MailSourceFile mailsource2_
on
messageent1_.mailSourceFile_id = mailsource2_.id
order by
mailsource2_.file,
messageent1_.mboxOffset
My apologies if the keys are off, but I think I've conveyed the point.
Write the query with explicit joins, like this:
select
m2.file as col_0_0_,
m0.messageId as col_1_0_,
m1.mboxOffset as col_2_0_,
m1.mboxOffsetEnd as col_3_0_,
m0.id as col_4_0_
from
MessageDetails m0
inner join
MessageEntry m1
on
m0.id = m1.messageDetails_id
inner join
MailSourceFile m2
on
m1.mailSourceFile_id = m2.id
order by
m2.file,
m1.mboxOffset;
Looking at your EXPLAIN output, I found 3 things which in my opinion are not good:
1 filesort in the Extra column
2 index in the type column
3 the key length, which is 384
If you reduce the key length you may get quicker retrieval. For that, consider the character set you use and partial (prefix) indexes.
You can also use FORCE INDEX FOR ORDER BY and USE INDEX FOR JOIN (create appropriate multi-column indexes and assign them). Remember that it is always good to order by columns that are all in the same table.
The index type means that the entire index is being scanned, which is not good.
I have a very simple query that is running extremely slowly despite being indexed.
My table is as follows:
mysql> show create table mytable
CREATE TABLE `mytable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`start_time` datetime DEFAULT NULL,
`status` varchar(64) DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ix_status_user_id_start_time` (`status`,`user_id`,`start_time`),
### other columns and indices, not relevant
) ENGINE=InnoDB AUTO_INCREMENT=115884841 DEFAULT CHARSET=utf8
Then the following query takes more than 10 seconds to run:
select id from mytable USE INDEX (ix_status_user_id_start_time) where status = 'running';
There are about 7 million rows in the table, and approximately 200 rows have status running.
I would expect this query to take less than a tenth of a second. It should find the first row in the index with status running. And then scan the next 200 rows until it finds the first non-running row. It should not need to look outside the index.
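The access pattern I have in mind can be sketched in Python. This is a toy model only: a sorted list stands in for the B-tree index, the rows are invented, and the helper name is hypothetical.

```python
import bisect

# Toy stand-in for ix_status_user_id_start_time: entries sorted by
# (status, user_id, start_time), each carrying the primary key `id`.
index = sorted([
    ("done", 1, "2021-01-01 10:00:00", 11),
    ("done", 2, "2021-01-01 11:00:00", 12),
    ("failed", 1, "2021-01-02 09:00:00", 13),
    ("running", 3, "2021-01-03 08:00:00", 14),
    ("running", 7, "2021-01-03 08:30:00", 15),
    ("waiting", 2, "2021-01-04 12:00:00", 16),
])


def ids_with_status(entries, status):
    """Seek to the first entry whose leading column equals `status`,
    then read forward until the leading column changes."""
    start = bisect.bisect_left(entries, (status,))
    ids = []
    for entry in entries[start:]:
        if entry[0] != status:
            break
        ids.append(entry[3])
    return ids


running_ids = ids_with_status(index, "running")
print(running_ids)
```

The work done is one seek plus one step per matching entry, which is why I would expect the cost to depend on the 200 running rows rather than on the 7 million rows in the table.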
When I explain the query I get a very strange result:
mysql> explain select id from mytable USE INDEX (ix_status_user_id_start_time) where status =
'running';
+----+-------------+---------+------------+------+------------------------------+------------------------------+---------+-------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+------------------------------+------------------------------+---------+-------+---------+----------+-------------+
| 1 | SIMPLE | mytable | NULL | ref | ix_status_user_id_start_time | ix_status_user_id_start_time | 195 | const | 2118793 | 100.00 | Using index |
+----+-------------+---------+------------+------+------------------------------+------------------------------+---------+-------+---------+----------+-------------+
It is estimating a scan of more than 2 million rows! Also, the cardinality of the status index does not seem correct. There are only about 5 or 6 different statuses, not 344.
Other info
There are somewhat frequent insertions and updates to this table. About 2 rows inserted per second, and 10 statuses updated per second. I don't know how much impact this has, but I would not expect it to be 30 seconds worth.
If I query by both status and user_id, sometimes it is fast (sub 0.1s) and sometimes it is slow (> 1s), depending on the user_id. This does not seem to depend on the size of the result set (some users with 20 rows are quick, others with 4 are slow)
Can anybody explain what is going on here and how it can be fixed?
I am using mysql version 5.7.33
As already mentioned in the comments, you are using many indexes on a big table, so the memory required for these indexes is very high.
You can give the indexes more memory by raising innodb_buffer_pool_size in my.cnf.
But it is probably more efficient to use fewer indexes and to avoid composite indexes unless they are absolutely needed.
My guess is that if you remove all the indexes and create a single one on status, this query will run in under 1 second.
We have a big table with the following table structure:
CREATE TABLE `location_data` (
`id` int(20) NOT NULL AUTO_INCREMENT,
`dt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`device_sn` char(30) NOT NULL,
`data` char(20) NOT NULL,
`gps_date` datetime NOT NULL,
`lat` double(30,10) DEFAULT NULL,
`lng` double(30,10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `dt` (`dt`),
KEY `data` (`data`),
KEY `device_sn` (`device_sn`,`data`,`dt`),
KEY `device_sn_2` (`device_sn`,`dt`)
) ENGINE=MyISAM AUTO_INCREMENT=721453698 DEFAULT CHARSET=latin1
We frequently run queries such as the following:
SELECT * FROM location_data WHERE device_sn = 'XXX' AND data = 'location' ORDER BY dt DESC LIMIT 1;
OR
SELECT * FROM location_data WHERE device_sn = 'XXX' AND data = 'location' AND dt >= '2014-01-01 00:00:00 ' AND dt <= '2014-01-01 23:00:00' ORDER BY dt DESC;
We have been optimizing this in a few ways:
By adding index and using FORCE INDEX on device_sn.
Separating the table into multiple tables based on the date (e.g. location_data_20140101) and pre-checking if there is a data based on certain date and we will pull that particular table alone. This table is created by cron once a day and the data in location_data for that particular date will be deleted.
The table location_data is HIGH WRITE and LOW READ.
However, the query is sometimes really slow. I wonder if there are other methods, or ways to restructure the data, that would let us read the data in date order for a given device_sn.
Any tips are more than welcomed.
EXPLAIN STATEMENT 1ST QUERY:
+----+-------------+--------------+------+----------------------------+-----------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------------------+-----------+---------+-------------+------+-------------+
| 1 | SIMPLE | location_dat | ref | data,device_sn,device_sn_2 | device_sn | 50 | const,const | 1 | Using where |
+----+-------------+--------------+------+----------------------------+-----------+---------+-------------+------+-------------+
EXPLAIN STATEMENT 2nd QUERY:
+----+-------------+--------------+-------+-------------------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+-------------------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | test_udp_new | range | dt,data,device_sn,device_sn_2 | dt | 4 | NULL | 1 | Using where |
+----+-------------+--------------+-------+-------------------------------+------+---------+------+------+-------------+
The index device_sn (device_sn,data,dt) is good. MySQL should use it without any FORCE INDEX. You can verify this by running "explain select ..."
However, your table is MyISAM, which only supports table-level locks. If the table is written to heavily, it may be slow. I would suggest converting it to InnoDB.
Ok, I'll provide info that I know and this might not answer your question but could provide some insight.
There exist certain differences between InnoDB and MyISAM. Forget about full text indexing or spatial indexes; the huge difference is in how they operate.
InnoDB has several great features compared to MyISAM.
First off, it can keep the data set it works with in RAM. This is why database servers come with a lot of RAM: so that I/O operations can be done quickly. For example, an index scan is faster if the indexes are in RAM rather than on an HDD, because finding data on an HDD is several orders of magnitude slower than doing it in RAM. The same applies to full table scans.
The variable that controls this when using InnoDB is called innodb_buffer_pool_size. By default it's 8 MB in old versions, if I am not mistaken (128 MB since MySQL 5.5). I personally set this value high, sometimes even up to 90% of available RAM. Usually, when this value is tuned, a lot of people experience incredible speed gains.
The other thing is that InnoDB is a transactional engine. That means it will tell you that a write to disk succeeded or failed and that will be 100% correct. MyISAM won't do that because it doesn't force OS to force HDD to commit data permanently. That's why sometimes records are lost when using MyISAM, it thinks data is written because OS said it was when in reality OS tried to optimize the write and HDD might lose buffer data, thus not writing it down. OS tries to optimize the write operation and uses HDD's buffers to store larger chunks of data and then it flushes it in a single I/O. What happens then is that you don't have control over how data is being written.
With InnoDB you can start a transaction, execute say 100 INSERT queries and then commit. That will effectively force the hard drive to flush all 100 queries at once, using 1 I/O. If each INSERT is 4 KB long, 100 of them is 400 KB. That means you'll use 400 KB of your disk's bandwidth with 1 I/O operation, and the remainder of the I/O will be available for other uses. This is how inserts are optimized.
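The batching pattern described above looks roughly like this. The sketch uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere. It is not InnoDB, and the table and values are made up. With a MySQL driver the shape would be the same: many INSERTs, one commit.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE location_data (device_sn TEXT, data TEXT)")

# 100 hypothetical rows to insert.
rows = [(f"SN{i:03d}", "location") for i in range(100)]

# One transaction around all 100 INSERTs: the connection used as a context
# manager commits once at the end, or rolls everything back on an exception.
with conn:
    conn.executemany(
        "INSERT INTO location_data (device_sn, data) VALUES (?, ?)", rows
    )

inserted = conn.execute("SELECT COUNT(*) FROM location_data").fetchone()[0]
print(inserted)
```

How much this helps on a real server depends on the flush settings and the hardware, so it is worth measuring rather than assuming.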
Next are indexes with low cardinality. Cardinality is the number of unique values in an indexed column. For a primary key it equals the number of rows, which is also the highest possible value (a ratio of unique values to rows of 1). Indexes with low cardinality are on columns where you have only a few distinct values, such as yes or no. If an index is too low in cardinality, MySQL will prefer a full table scan, which is MUCH quicker. Also, forcing an index that MySQL doesn't want to use could (and probably will) slow things down. This is because with an indexed search MySQL processes records one by one, whereas with a table scan it can read multiple records at once and skip the ones it doesn't need. If those records were written sequentially on a mechanical disk, further optimizations are possible.
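One way to put a number on "low cardinality" is the ratio of distinct values to total rows, often called selectivity. A minimal sketch with made-up row counts (the function name is just for illustration):

```python
def selectivity(distinct_values: int, total_rows: int) -> float:
    """Distinct values per row: 1.0 for a unique column, close to 0 for a
    column that holds only a handful of different values."""
    if total_rows == 0:
        return 0.0
    return distinct_values / total_rows


total_rows = 1_000_000  # hypothetical table size

primary_key = selectivity(total_rows, total_rows)  # every value is unique
yes_no_flag = selectivity(2, total_rows)           # only two distinct values

print(f"primary key: {primary_key}")
print(f"yes/no flag: {yes_no_flag}")
```

The closer the ratio is to zero, the more rows each index lookup returns and the less an index on that column alone is likely to help.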
TL;DR:
use InnoDB on a server where you can allocate sufficient RAM
set the value of innodb_buffer_pool_size large enough so you can allocate more resources for faster querying
use an SSD if possible
try to wrap multiple INSERTs into transactions so you can better utilize your hard drive's bandwidth and I/O
avoid indexing columns that have low unique value count compared to row count - they just waste space (though there are exceptions to this)
I have a photo gallery on my site with 1M photos in it. There are 2 search tables associated with it. Table #1 contains a list of words used in the photos. Table #2 contains a list of which words match up with which photos. Table #2 is 7M rows. I am testing partitioning on this 7M row table because I have another set of tables with 120,000,000 rows. Queries against the 120M row wordmatch table below, with or without a join against the wordlist table below, take multiple seconds to run.
I am trying to perform a join between these 2 tables and MySQL 5.6 EXPLAIN PARTITIONS shows it is using all the partitions. How can I redo this query to make this correctly use only a single partition?
The 2 tables:
CREATE TABLE wordlist (
word_text varchar(50) NOT NULL DEFAULT '',
word_id mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (word_text),
KEY word_id (word_id)
) ENGINE=InnoDB
CREATE TABLE wordmatch (
pic_id int(11) unsigned NOT NULL DEFAULT '0',
word_id mediumint(8) unsigned NOT NULL DEFAULT '0',
title_match tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (word_id,pic_id,title_match),
KEY pic_id (pic_id)
) ENGINE=InnoDB
/*!50100 PARTITION BY HASH (word_id)
PARTITIONS 11 */;
SQL query I am performing:
EXPLAIN PARTITIONS SELECT m.pic_id FROM wordlist w, wordmatch m WHERE w.word_text LIKE 'bacon' AND m.word_id = w.word_id
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+
| 1 | SIMPLE | w | NULL | range | PRIMARY,word_id | PRIMARY | 52 | NULL | 1 | Using where |
| 1 | SIMPLE | m | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10 | ref | PRIMARY | PRIMARY | 3 | w.word_id | 34 | Using index |
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+
The join produces a query that uses all partitions.
If I retrieve the word_id # first and go straight against the wordmatch table, everything is ok:
EXPLAIN PARTITIONS SELECT m.pic_id FROM wordmatch m WHERE m.word_id = 219657;
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+
| 1 | SIMPLE | m | p9 | ref | PRIMARY | PRIMARY | 3 | const | 18220 | Using index |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+
How do I get this to work correctly?
I prefer not to split this into multiple queries if possible.
You may have noticed I am using LIKE above. People will often search on bacon% to get plurals of words, etc.
Example:
SELECT m.pic_id FROM wordlist w, wordmatch m WHERE w.word_text LIKE 'bacon%' AND m.word_id = w.word_id
I realize this wildcard search may result in 2 or more partitions being selected. This is probably ok, although if there is a way to change the partitioning to prevent that, I welcome any tips.
Edit #1: Added details as my original question was confusing. I was testing my 7M row table first before doing my 120M row table.
Edit #2: Resolution to my overall issue: My performance issues seem to be resolved, as I partitioned my 120M row table into 101 partitions per this post: MySQL performance: partitions. I do not know if MySQL is going against all the partitions at runtime (Ollie Jones says in the comments below that it does not, and that EXPLAIN PARTITIONS is incorrect), but it is fast now so I am happy.
To get your query working with efficient indexing is probably a good idea before you dive into the partitioning project. Here's your query refactored to use JOIN:
SELECT m.pic_id
FROM wordlist w
JOIN wordmatch m ON w.word_id = m.word_id
WHERE w.word_text LIKE 'bacon%'
This query can use a compound index on wordlist (word_text, word_id). It will random-access the index for the first matching word_text, and then scan the index retrieving the word_id values until it gets to the last matching word_text.
It can also use your existing primary key on wordmatch (word_id, pic_id). That speeds up your query because the database engine can satisfy it directly from the index without having to bat the hard drive back and forth to the table itself.
So, give those indexes a try. Your large table, the wordmatch table, should work fairly well without partitioning. It's more common to partition tables that contain lots of content (like the text of articles) than it is to partition this kind of fixed-row-size join table.
Notice that your EXPLAIN announces it will look at all the partitions because EXPLAIN can't tell which partition (or partitions) your w.word_text LIKE 'bacon%' WHERE-clause will need to examine. EXPLAIN isn't as dumb as a box of hammers, but it is close. MySQL won't examine the partitions it doesn't need to, but it doesn't know which partitions are involved until runtime.
Have you considered using FULLTEXT search? It might simplify what you're doing.
Your first query doesn't have any filtering conditions on wordmatch table that could limit the partitions in use, thus it needs to access all partitions. There is no way to redo this query to use only necessary partitions without adding a filter on the field that is the basis for the partitioning (word_id).
The second query filters on a specific word_id value, so MySQL knows exactly which partition to read.
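For PARTITION BY HASH, MySQL places a row in partition MOD(expr, number of partitions), so the partition for a known word_id can be worked out by hand. A small Python sketch of that rule (the helper name and the extra word_id values are made up for illustration):

```python
def hash_partition(partition_key: int, num_partitions: int) -> str:
    """Name of the partition that PARTITION BY HASH assigns to a
    non-negative integer key: the key modulo the partition count."""
    return f"p{partition_key % num_partitions}"


# The word_id from the second query in the question, with 11 partitions.
single = hash_partition(219657, 11)
print(single)

# Hypothetical word_ids that a prefix search such as LIKE 'bacon%' might return.
matching_word_ids = [219657, 219658, 300001]
touched = {hash_partition(word_id, 11) for word_id in matching_word_ids}
print(sorted(touched))
```

With a constant word_id the target partition is known before the query runs. With a join, the word_id values only become known while the query is executing.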
I would also agree with the comment made by @OllieJones that I am not sure you should really worry about partitioning at only 7M rows. That is not really that big of a table in the grand scheme of things.
I have a table with approximately 120k rows, which contains a BLOB field (each entry is at most 1 MB in size, usually much less). My problem is that whenever I run a query on any columns of this table (not including the BLOB one), if the filesystem cache is empty, it takes approximately 40 seconds to complete. All subsequent queries on the same table take less than 1 second (testing from the command line client, on the server itself). The number of rows returned by the queries varies from an empty set to 60k+.
I have eliminated the query cache so it has nothing to do with it.
The table is myisam but I also tried to change it to innodb (and setting ROW_FORMAT=COMPACT), but without any luck.
If I remove the BLOB column, the query is always fast.
So I would assume that the server reads the blobs from the disk (or parts of them) and the filesystem caches them. The problem is that on a server with high traffic and limited memory, the filesystem cache is refreshed every once in a while, so this particular query keeps causing me trouble.
So my question is, is there a way to considerably speed things up, without removing the blob column from the table?
Here are 2 example queries, run one after the other, along with the EXPLAIN output, indexes and table definition:
mysql> SELECT ct.score FROM completed_tests ct where ct.status != 'deleted' and ct.status != 'failed' and score < 100;
Empty set (48.21 sec)
mysql> SELECT ct.score FROM completed_tests ct where ct.status != 'deleted' and ct.status != 'failed' and score < 99;
Empty set (1.16 sec)
mysql> explain SELECT ct.score FROM completed_tests ct where ct.status != 'deleted' and ct.status != 'failed' and score < 99;
+----+-------------+-------+-------+---------------+--------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------+---------+------+-------+-------------+
| 1 | SIMPLE | ct | range | status,score | status | 768 | NULL | 82096 | Using where |
+----+-------------+-------+-------+---------------+--------+---------+------+-------+-------------+
1 row in set (0.00 sec)
mysql> show indexes from completed_tests;
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| completed_tests | 0 | PRIMARY | 1 | id | A | 583938 | NULL | NULL | | BTREE | |
| completed_tests | 1 | users_login | 1 | users_LOGIN | A | 11449 | NULL | NULL | YES | BTREE | |
| completed_tests | 1 | tests_ID | 1 | tests_ID | A | 140 | NULL | NULL | | BTREE | |
| completed_tests | 1 | status | 1 | status | A | 3 | NULL | NULL | YES | BTREE | |
| completed_tests | 1 | timestamp | 1 | timestamp | A | 291969 | NULL | NULL | | BTREE | |
| completed_tests | 1 | archive | 1 | archive | A | 1 | NULL | NULL | | BTREE | |
| completed_tests | 1 | score | 1 | score | A | 783 | NULL | NULL | YES | BTREE | |
| completed_tests | 1 | pending | 1 | pending | A | 1 | NULL | NULL | | BTREE | |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> show create table completed_tests;
+-----------------+--------------------------------------
| Table | Create Table |
+-----------------+--------------------------------------
| completed_tests | CREATE TABLE `completed_tests` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`users_LOGIN` varchar(100) DEFAULT NULL,
`tests_ID` mediumint(8) unsigned NOT NULL DEFAULT '0',
`test` longblob,
`status` varchar(255) DEFAULT NULL,
`timestamp` int(10) unsigned NOT NULL DEFAULT '0',
`archive` tinyint(1) NOT NULL DEFAULT '0',
`time_start` int(10) unsigned DEFAULT NULL,
`time_end` int(10) unsigned DEFAULT NULL,
`time_spent` int(10) unsigned DEFAULT NULL,
`score` float DEFAULT NULL,
`pending` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `users_login` (`users_LOGIN`),
KEY `tests_ID` (`tests_ID`),
KEY `status` (`status`),
KEY `timestamp` (`timestamp`),
KEY `archive` (`archive`),
KEY `score` (`score`),
KEY `pending` (`pending`)
) ENGINE=InnoDB AUTO_INCREMENT=117996 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
1 row in set (0.00 sec)
I originally posted this on mysql query slow at first fast afterwards but I now have more information so I repost as a different question
I also posted this on the mysql forum, but I haven't heard back
Thanks in advance as always
The design of BLOB (=TEXT) storage in MySQL seems to be totally flawed and counter-intuitive. I ran a couple of times into the same problem and was unable to find any authoritative explanation. The most detailed analysis I've finally found is this post from 2010: http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/
The general belief and expectation is that BLOBs/TEXTs are stored outside the main row storage (e.g., see this answer). This is NOT TRUE, though. There are several issues here (I'm basing this on the article given above):
If the size of a BLOB item is several KB, it is included directly in row data. Consequently, even if you SELECT only non-BLOB columns, the engine still has to load all your BLOBs from disk. Say, you have 1M rows with 100 bytes of non-blob data each and 5000 bytes of blob data. You SELECT all non-blob columns and expect that MySQL would read from disk around 100-120 bytes per row, which is 100-120 MB in total (+20 for BLOB address). However, the reality is that MySQL stores all BLOBs in the same disk blocks as rows, so they all must be read together even if not used, and so the size of data read from disk is around 5100 MB = 5 GB - this is 50 times more than you would expect and means 50 times slower query execution.
Of course, this design has an advantage: when you need all the columns, including the blob one, SELECT query is faster when blobs are stored with the row than when stored externally: you avoid (sometimes) 1 additional page access per row. However, this is not a typical use case for BLOBs and DB engine should not be optimized towards this case. If your data is so small that it fits in a row and you're fine with loading it in every query no matter if needed or not - then you would use VARCHAR type instead of BLOB/TEXT.
Even if for some reason (long row or long blob) the BLOB value is stored externally, its 768-byte prefix is still kept in the row itself. Let's take the previous example: you have 100 bytes of non-blob data in each row, but now the blob column holds items of 1 MB each so they must be kept externally. SELECT of non-blob columns will have to read roughly 800 bytes per row (non-blobs + blob prefix), instead of 100-120 - this is again 7 times larger disk transfer than you'd expect, and 7x slower query execution.
External BLOB storage is inefficient in its use of disk space: it allocates space in blocks of 16 KB and a single block cannot hold multiple items, so if your blobs are small and take, for instance, 8 KB each, the actual space allocated is twice as large.
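The arithmetic behind these three points can be checked with a short Python sketch. The row count and byte sizes are the illustrative figures used above, not measurements, and the function names are made up.

```python
import math

PAGE_BYTES = 16 * 1024  # allocation unit for external BLOB storage, per the text above


def scan_bytes(rows: int, non_blob_bytes: int, inline_blob_bytes: int) -> int:
    """Bytes read when scanning only the non-BLOB columns, given how many
    BLOB bytes are stored inline with each row."""
    return rows * (non_blob_bytes + inline_blob_bytes)


def external_allocation(blob_bytes: int) -> int:
    """Space taken by one externally stored BLOB when blocks are not shared."""
    return math.ceil(blob_bytes / PAGE_BYTES) * PAGE_BYTES


rows = 1_000_000
expected = scan_bytes(rows, 100, 20)        # only a 20-byte BLOB address per row
small_inline = scan_bytes(rows, 100, 5000)  # 5000-byte BLOBs stored in the row
prefix_only = scan_bytes(rows, 100, 768)    # 768-byte prefix of an external BLOB

print(f"expected:     {expected / 1e6:.0f} MB")
print(f"inline BLOBs: {small_inline / 1e6:.0f} MB ({small_inline / expected:.1f}x)")
print(f"prefix only:  {prefix_only / 1e6:.0f} MB ({prefix_only / expected:.1f}x)")
print(f"8 KB BLOB occupies {external_allocation(8 * 1024)} bytes")
```

The exact multipliers depend on whether the baseline is taken as 100 or 120 bytes per row, but they land in the same range as the round figures quoted above.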
I hope this design will get fixed one day: MySQL will store ALL blobs - big and small - in external storage, without any prefixes kept in DB, with external storage allocation being efficient for items of all sizes. Before this happens, separating out BLOB/TEXT columns seems the only reasonable solution - separating out to another table or to the filesystem (each BLOB value kept as a file).
[UPDATE 2019-10-15]
The InnoDB documentation now provides a definitive answer to the issue discussed above:
https://dev.mysql.com/doc/refman/8.0/en/innodb-row-format.html
The case of storing 768-byte prefixes of BLOB/TEXT values inline holds indeed for COMPACT row format. According to the docs, "For each non-NULL variable-length field (...) The internal part is 768 bytes".
However, you can use DYNAMIC row format instead. With this format:
"InnoDB can store long variable-length column values (...) fully off-page, with the clustered index record containing only a 20-byte pointer to the overflow page. (...) TEXT and BLOB columns that are less than or equal to 40 bytes are stored in line."
Here, a BLOB value can occupy up to 40 bytes of inline storage, which is much better than 768 bytes as in the COMPACT mode, and looks like a lot more reasonable approach in the case you want to mix BLOB and non-BLOB types in a table and still be able to scan multiple rows pretty fast. Moreover, the extended (over 20 bytes) inline storage is used ONLY for values sized between 20-40 bytes; for larger values, only the 20-byte pointer is stored (no prefix), unlike in the COMPACT mode. Hence, the extended 40-byte storage is used rarely in practice and one can safely assume the average size of inline storage to be just 20 bytes (or less, if you tend to keep many small values of less than 20B in your BLOB). All in all, it seems DYNAMIC row format, rather than COMPACT, should be the default choice in most cases to achieve good predictable performance of BLOB columns in InnoDB.
An example of how to check the actual physical storage in InnoDB can be found here:
https://dba.stackexchange.com/a/210430/177276
As to MyISAM, it apparently does NOT provide off-page storage for BLOBs at all (just inline). Check here for more info:
https://dev.mysql.com/doc/refman/5.7/en/dynamic-format.html
https://forums.mysql.com/read.php?24,105964,267596#msg-267596
I researched this issue for a while. Many people recommend keeping the BLOB in a separate table that has only a primary key, and storing the BLOB's metadata in another table with a foreign key to the BLOB table. With this layout, performance is considerably better.
Adding a composite index on the two relevant columns should allow these queries to be executed without accessing the table data directly.
CREATE INDEX `IX_score_status` ON `completed_tests` (`score`, `status`);
If you are able to switch to MariaDB then you can make the most of the table elimination optimisations. This would allow you to split the BLOB field out into its own table and use a view to recreate your existing table structure using a LEFT JOIN. This way it will only access the BLOB data if it is explicitly required by the executing query.
Just add an index (or indexes) on the fields used in the WHERE clause of queries against a table with blobs.
E.g., you have 2 tables with these fields:
users : USERID, NAME, ...
userphotos : BLOBID, BLOB, USERNO, ...
select * from userphotos where USERNO=123456;
Normally this works fine. But when you have many large images (e.g. BLOB, MEDIUMBLOB or LONGBLOB values totalling more than 5GB), this will take a long time (several minutes) if BLOBID, the primary key, is the only index.
Without an index on the column used in the WHERE clause, MySQL has to scan the whole table, images included. As your data grows, this takes longer and longer. If you create an index on USERNO, the query speeds up and its running time no longer depends on the total size of the data.
Solution:
**Add Index to the USERNO at userphotos**
As an answer to your question: you should create an index on ct.status.
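For the `userphotos` example, the fix could look like this (the index name `idx_userphotos_userno` is just a suggestion):

```sql
-- Index the column used in the WHERE clause.
CREATE INDEX idx_userphotos_userno ON userphotos (USERNO);

-- The plan should now show a 'ref' lookup on the new index
-- instead of a full table scan (type 'ALL').
EXPLAIN SELECT * FROM userphotos WHERE USERNO = 123456;
```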
Database is MySQL with MyISAM engine.
Table definition:
CREATE TABLE IF NOT EXISTS matches (
id int(11) NOT NULL AUTO_INCREMENT,
game int(11) NOT NULL,
user int(11) NOT NULL,
opponent int(11) NOT NULL,
tournament int(11) NOT NULL,
score int(11) NOT NULL,
finish tinyint(4) NOT NULL,
PRIMARY KEY ( id ),
KEY game ( game ),
KEY user ( user ),
KEY i_gfu ( game , finish , user )
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3149047 ;
I have set an index on (game, finish, user) but this GROUP BY query still needs 0.4 - 0.6 seconds to run:
SELECT user AS player
, COUNT( id ) AS times
FROM matches
WHERE finish = 1
AND game = 19
GROUP BY user
ORDER BY times DESC
The EXPLAIN output:
| id | select_type | table | type | possible_keys | key | key_len |
| 1 | SIMPLE | matches | ref | game,i_gfu | i_gfu | 5 |
| ref | rows | Extra |
| const,const | 155855 | Using where; Using temporary; Using filesort |
Is there any way I can make it faster? The table has about 800K records.
EDIT: I changed COUNT(id) into COUNT(*) and the time dropped to 0.08 - 0.12 seconds. I think I had tried that before creating the index and forgot to change it back afterwards.
In the EXPLAIN output, "Using index" explains the speed-up:
| rows | Extra |
| 168029 | Using where; Using index; Using temporary; Using filesort |
(Side question: is a speed-up by a factor of 5 normal?)
There are about 2000 users, so the final sort, even though it uses filesort, doesn't hurt performance. I tried without ORDER BY and it still takes almost the same time.
Get rid of the 'game' key: it's redundant with 'i_gfu'. As 'id' is the primary key and therefore never NULL, COUNT(id) just returns the number of rows in each group, so you can replace it with COUNT(*). Try it that way and paste the output of EXPLAIN:
SELECT user AS player, COUNT(*) AS times
FROM matches
WHERE finish = 1
AND game = 19
GROUP BY user
ORDER BY times DESC
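Dropping the redundant key and re-checking the plan could look like this. It is a sketch against the `matches` table from the question; with COUNT(*) the query touches only columns that are in `i_gfu`, so `Extra` should include "Using index".

```sql
-- 'game' is a left prefix of i_gfu (game, finish, user), so it adds nothing.
ALTER TABLE matches DROP INDEX game;

EXPLAIN
SELECT user AS player, COUNT(*) AS times
FROM matches
WHERE finish = 1
  AND game = 19
GROUP BY user
ORDER BY times DESC;
```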
Eh, tough. Try reordering your index: put the user column first (so the index becomes (user, finish, game)), as that increases the chance that the GROUP BY can use the index. However, in general GROUP BY can only use an index for a loose index scan if the only aggregate functions used are MIN and MAX (see http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html and http://dev.mysql.com/doc/refman/5.5/en/loose-index-scan.html). Your ORDER BY isn't really helping either.
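To try the reordered index without giving up the existing one, you could add it alongside and compare plans. The index name `i_ufg` is made up here, and there is no guarantee the optimizer will prefer it over `i_gfu`.

```sql
-- Same three columns, with the GROUP BY column first.
ALTER TABLE matches ADD INDEX i_ufg (user, finish, game);

-- Compare the chosen key and the Extra column with the earlier plan.
EXPLAIN
SELECT user AS player, COUNT(*) AS times
FROM matches
WHERE finish = 1
  AND game = 19
GROUP BY user
ORDER BY times DESC;

-- Remove it again if it does not help.
-- ALTER TABLE matches DROP INDEX i_ufg;
```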
One of the shortcomings of this query is that you order by an aggregate. That means that you can't return any rows until the full result set has been generated; no index can exist (for MySQL MyISAM, anyway) to fix that.
You can denormalize your data fairly easily to overcome this, though; you could, for instance, add an insert/update trigger to stick a count value in a summary table, with an index, so that you can start returning rows immediately.
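A minimal sketch of such a summary table follows. The names `match_counts`, `i_game_times` and `matches_ai` are invented for illustration, and only the insert path is covered: an AFTER UPDATE trigger (for rows whose `finish` changes later) and an AFTER DELETE trigger would be needed as well. `DELIMITER` is a command of the mysql command-line client, not SQL proper.

```sql
-- One row per (game, user), with the count kept up to date.
CREATE TABLE match_counts (
  game  INT NOT NULL,
  user  INT NOT NULL,
  times INT NOT NULL DEFAULT 0,
  PRIMARY KEY (game, user),
  KEY i_game_times (game, times)
) ENGINE=MyISAM;

-- Backfill from the existing data.
INSERT INTO match_counts (game, user, times)
SELECT game, user, COUNT(*)
FROM matches
WHERE finish = 1
GROUP BY game, user;

DELIMITER //
CREATE TRIGGER matches_ai AFTER INSERT ON matches
FOR EACH ROW
BEGIN
  -- Count only finished matches, mirroring WHERE finish = 1.
  IF NEW.finish = 1 THEN
    INSERT INTO match_counts (game, user, times)
    VALUES (NEW.game, NEW.user, 1)
    ON DUPLICATE KEY UPDATE times = times + 1;
  END IF;
END//
DELIMITER ;

-- The report becomes a lookup on (game, times); no GROUP BY is needed.
SELECT user AS player, times
FROM match_counts
WHERE game = 19
ORDER BY times DESC;
```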
The EXPLAIN verifies the (game, finish, user) index was used in the query. That seems like the best possible index to me. Could it be a hardware issue? What is your system RAM and CPU?
I take it that the bulk of the time is spent on extracting and more importantly sorting (twice, including the one skipped by reading the index) 150k rows out of 800k. I doubt you can optimize it much more than it already is.
As others have noted, you may have reached the limit of your ability to tune the query itself. You should next check the settings of the max_heap_table_size and tmp_table_size variables on your server. The default is 16MB, which may be too small for your table.
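Checking and raising these limits for the current session could look like this. The 64MB figure is an arbitrary example, not a recommendation; an in-memory temporary table is limited by the smaller of the two values, so both are raised together.

```sql
-- Current values, in bytes.
SHOW VARIABLES LIKE 'max_heap_table_size';
SHOW VARIABLES LIKE 'tmp_table_size';

-- Raise both for this session only.
SET SESSION max_heap_table_size = 64 * 1024 * 1024;
SET SESSION tmp_table_size      = 64 * 1024 * 1024;

-- Run the GROUP BY query, then check whether temporary tables
-- still spill to disk (Created_tmp_disk_tables keeps growing).
SHOW SESSION STATUS LIKE 'Created_tmp%';
```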