MySQL many-to-many relation slow on big table - mysql

I have 2 tables which are connected with a relationship table.
More details about the tables:
stores (Currently 140,000 rows)
id (index)
store_name
city_id (index)
...
categories (Currently 400 rows)
id (index)
cat_name
store_cat_relation
store_id
cat_id
Every store belongs in one or more categories.
In the store_cat_relation table, I have indexes on (store_id, cat_id) and (cat_id, store_id).
I need to find the total number of, let's say, supermarkets (cat_id = 1) in Paris (city_id = 1). I have a working query, but it takes too long when the database contains lots of stores in Paris or lots of supermarkets.
This is my query:
SELECT COUNT(*) FROM stores s, store_cat_relation r WHERE s.city_id = '1' AND r.cat_id = '1' AND s.id = r.store_id
This query takes about 0.05 s. The database contains about 8,000 supermarkets (stores with category 1) and about 8,000 stores in Paris (city_id = 1). Combined, there are 550 supermarkets in Paris at the moment.
I want to reduce the query time to below 0.01 s because the database is only getting bigger.
The result of EXPLAIN is this:
id: 1
select_type: SIMPLE
table: store_cat_relation
type: ref
possible_keys: cat_id_store_id, store_id_cat_id
key: cat_id_store_id
key_len: 4
ref: const
rows: 8043
Extra: Using index
***************************************
id: 1
select_type: SIMPLE
table: stores
type: eq_ref
possible_keys: PRIMARY, city_id
key: PRIMARY
key_len: 4
ref: store_cat_relation.store_id
rows: 1
Extra: Using index condition; Using where
Does anyone have an idea why this query takes so long?
EDIT: I also created an SQL fiddle with 300 rows per table. With a low number of rows it's quite fast, but I need it to be fast with 100,000+ rows.
http://sqlfiddle.com/#!9/675a3/1

I have made some tests, and the best performance came from using the query cache. You can enable it and use it ON DEMAND, so you decide which queries are inserted into the cache. If you want to use it, you must make the changes in /etc/my.cnf so they are persistent. If you change the tables, you can also run some queries to warm up the cache again.
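For persistence, the relevant entries in /etc/my.cnf would look roughly like this (a sketch; the cache size is only an example, not a tuned value):

[mysqld]
query_cache_type = 2      # 2 = DEMAND: only statements containing SQL_CACHE are cached
query_cache_size = 64M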
Here is a sample:
Table size
MariaDB [yourSchema]> select count(*) from stores;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1 min 23.50 sec)
MariaDB [yourSchema]> select count(*) from store_cat_relation;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (2.45 sec)
MariaDB [yourSchema]>
Verify cache is on
MariaDB [yourSchema]> SHOW VARIABLES LIKE 'have_query_cache';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| have_query_cache | YES |
+------------------+-------+
1 row in set (0.01 sec)
Set the cache size and switch to DEMAND
MariaDB [yourSchema]> SET GLOBAL query_cache_size = 1000000;
Query OK, 0 rows affected, 1 warning (0.00 sec)
MariaDB [yourSchema]> SET GLOBAL query_cache_type=DEMAND;
Query OK, 0 rows affected (0.00 sec)
Enable Profiling
MariaDB [yourSchema]> set profiling=on;
First execute your query - takes 0.68 sec
MariaDB [yourSchema]> SELECT SQL_CACHE COUNT(*) FROM stores s, store_cat_relation r WHERE s.city_id = '1' AND r.cat_id = '1' AND s.id = r.store_id;
+----------+
| COUNT(*) |
+----------+
| 192 |
+----------+
1 row in set (0.68 sec)
now get it from cache
MariaDB [yourSchema]> SELECT SQL_CACHE COUNT(*) FROM stores s, store_cat_relation r WHERE s.city_id = '1' AND r.cat_id = '1' AND s.id = r.store_id;
+----------+
| COUNT(*) |
+----------+
| 192 |
+----------+
1 row in set (0.00 sec)
See the profile (durations are in seconds)
MariaDB [yourSchema]> show profile;
+--------------------------------+----------+
| Status | Duration |
+--------------------------------+----------+
| starting | 0.000039 |
| Waiting for query cache lock | 0.000008 |
| init | 0.000005 |
| checking query cache for query | 0.000056 |
| checking privileges on cached | 0.000026 |
| checking permissions | 0.000014 |
| checking permissions | 0.000025 |
| sending cached result to clien | 0.000027 |
| updating status | 0.000048 |
| cleaning up | 0.000025 |
+--------------------------------+----------+
10 rows in set (0.05 sec)
MariaDB [yourSchema]>

What you are looking at is index scenarios:
Using its optimizer, a DBMS tries to find the optimal path to the data. Depending on the data itself, this can lead to different access paths for the conditions supplied (WHERE, JOINs, GROUP BY, sometimes ORDER BY). The distribution of the data can be the difference between fast queries and very slow ones.
So at the moment you have two tables, stores and store_cat_relation. On stores you have two indexes:
id (primary)
city_id
You have a where on city_id, and a join on id. The internal execution in the DBMS engine is then as follows:
1) Read index city_id
2) Then read table (ok, primary key index) to find id
3) Join on ID
This can be optimized a bit more with a multi-column index:
CREATE INDEX idx_nn_1 ON store(city_id,id);
This should result in:
1) Read index idx_nn_1
2) Join using this index idx_nn_1
You do have fairly lopsided data in your current example, with every row having city_id = 1. That kind of distribution in the real data can give you problems, since WHERE city_id = ... then amounts to saying "just select everything from table stores". The histogram information on that column can result in a different plan in such cases; however, if your real data distribution is not that lopsided, it should work nicely.
On your second table store_cat_relation you might try an index like this:
CREATE INDEX idx_nn_2 ON store_cat_relation(store_id,cat_id);
That way you can see whether the DBMS decides it leads to a better data access path.
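A quick way to check is to run EXPLAIN against the original count query once both composite indexes are in place (a sketch; table and column names are taken from the question, and "Using index" on both rows of the output is the goal):

EXPLAIN
SELECT COUNT(*)
FROM stores s
JOIN store_cat_relation r ON r.store_id = s.id
WHERE s.city_id = 1
  AND r.cat_id = 1;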
With every join you see, study the join and see if a multi column index can reduce the number of reads.
Do not index all your columns: too many indexed columns will lead to slower inserts and updates.
Also, some scenarios might require you to create indexes with the columns in a different order, leading to many indexes on a table (one on columns (1,2,3), the next on (1,3,2), etc.). That is not a happy scenario either; in such cases a single-column index, or a limited set of indexed columns plus reading the table for columns 2 and 3, might be preferred.
Indexing requires testing your most common scenarios, which can be a lot of fun, since you will see how a slow query running for seconds can suddenly run within hundredths of a second or even faster.

Related

LOCK TABLES table WRITE blocks my readings

I am at the REPEATABLE-READ level.
Why does it make me wait?
I understand that all reads (SELECTs) at any level are non-blocking.
What am I missing?
Session 1:
mysql> lock tables users write;
Query OK, 0 rows affected (0.00 sec)
Session 2:
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from users where id = 1; // wait
Session 1:
mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)
Session 2:
mysql> select * from users where id = 1;
+----+-----------------+--------------------+------+---------------------+--------------------------------------------------------------+----------------+---------------------+---------------------+------------+
| id | name | email | rol | email_verified_at | password | remember_token | created_at | updated_at | deleted_at |
+----+-----------------+--------------------+------+---------------------+--------------------------------------------------------------+----------------+---------------------+---------------------+------------+
| 1 | Bella Lueilwitz | orlo19#example.com | NULL | 2022-08-01 17:22:29 | $2y$10$92IXUNpkjO0rOQ5byMi.Ye4oKoEa3Ro9llC/.og/at2.uheWG/igi | MvMlaX9TQj | 2022-08-01 17:22:29 | 2022-08-01 17:22:29 | NULL |
+----+-----------------+--------------------+------+---------------------+--------------------------------------------------------------+----------------+---------------------+---------------------+------------+
1 row in set (10.51 sec)
In this question the opposite is true
Why doesn't LOCK TABLES [table] WRITE prevent table reads?
You reference a question about MySQL 5.0 posted in 2013. The answer from that time suggests that the client was allowed to get a result that had been cached in the query cache. Since then, MySQL 5.6 and 5.7 disabled the query cache by default, and MySQL 8.0 removed the feature altogether. This is a good thing.
The documentation says:
WRITE lock:
Only the session that holds the lock can access the table. No other session can access it until the lock is released.
This was true in the MySQL 5.0 days too, but the query cache allowed some clients to get around it. I suspect it wasn't reliable even then, because if the client ran a query that happened not to be cached, it would presumably revert to the documented behavior. Anyway, it's moot, because all currently supported versions of MySQL should have the query cache disabled or removed.
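If you want to verify what a given server does, a quick check (on 5.6 and 5.7 the variable exists but the cache is off by default; on 8.0 and later the variable is gone entirely):

SHOW VARIABLES LIKE 'query_cache_type';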

MySQL LIMIT X, Y slows down as I increase X

I have a DB with around 600,000 listings. While browsing these on a page with pagination, I use this query to limit the records:
SELECT file_id, file_category FROM files ORDER BY file_edit_date DESC LIMIT 290580, 30
On the first pages (LIMIT 0, 30) it loads in a few ms, and the same goes for LIMIT 30,30, LIMIT 60,30, LIMIT 90,30, etc. But as I move toward the last pages, the query takes around 1 second to execute.
Indexes are probably not the issue; it also happens if I run this:
SELECT * FROM `files` LIMIT 400000,30
Not sure why.
Is there a way to improve this?
Unless there is a better solution, would it be bad practice to just load all records and loop over them in the PHP page to see if each record is inside the pagination range, and print it?
Server is an i7 with 16 GB RAM;
MySQL Community Server 5.7.28;
files table is around 200 MB.
Here is the my.cnf, if it matters:
query_cache_type = 1
query_cache_size = 1G
sort_buffer_size = 1G
thread_cache_size = 256
table_open_cache = 2500
query_cache_limit = 256M
innodb_buffer_pool_size = 2G
innodb_log_buffer_size = 8M
tmp_table_size=2G
max_heap_table_size=2G
You may find that adding the following index will help performance:
CREATE INDEX idx ON files (file_edit_date DESC, file_id, file_category);
If used, MySQL would only need a single index scan to retrieve the number of records at some offset. Note that we include the columns in the select clause so that the index may cover the entire query.
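For example, with that index in place, EXPLAIN on the paginated query should show "Using index" in the Extra column, meaning the rows are read from the index alone (the offset rows still have to be stepped over, though):

EXPLAIN
SELECT file_id, file_category
FROM files
ORDER BY file_edit_date DESC
LIMIT 290580, 30;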
LIMIT was invented to reduce the size of the result set; it can be used by the optimizer if you order the result set using an index.
When using LIMIT x,n the server needs to process x+n rows to deliver a result. The higher the value for x, the more rows have to be processed.
Here is the EXPLAIN output from a simple table with a unique index on column a:
MariaDB [test]> explain select a,b from t1 order by a limit 0, 2;
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------+
| 1 | SIMPLE | t1 | index | NULL | PRIMARY | 4 | NULL | 2 | |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------+
1 row in set (0.00 sec)
MariaDB [test]> explain select a,b from t1 order by a limit 400000, 2;
+------+-------------+-------+-------+---------------+---------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+---------+---------+------+--------+-------+
| 1 | SIMPLE | t1 | index | NULL | PRIMARY | 4 | NULL | 400002 | |
+------+-------------+-------+-------+---------------+---------+---------+------+--------+-------+
1 row in set (0.00 sec)
When running the statements above (without EXPLAIN), the execution time is 0.01 s for LIMIT 0 and 0.6 s for LIMIT 400000.
Since MariaDB doesn't support LIMIT in a subquery, you can split your SQL into two statements:
The first statement retrieves the ids (and needs to read the index only); the second statement uses the ids retrieved from the first:
MariaDB [test]> select a from t1 order by a limit 400000, 2;
+--------+
| a |
+--------+
| 595312 |
| 595313 |
+--------+
2 rows in set (0.08 sec)
MariaDB [test]> select a,b from t1 where a in (595312,595313);
+--------+------+
| a | b |
+--------+------+
| 595312 | foo |
| 595313 | foo |
+--------+------+
2 rows in set (0.00 sec)
Caution: I am about to use some strong language. Computers are big and fast, and they can handle bigger stuff than they could even a decade ago. But, as you are finding out, there are limits. I'm going to point out multiple limits that you have threatened; I will try to explain why the limits may be a problem.
Settings
query_cache_size = 1G
is terrible. Whenever a table is written to, the QC scans the 1GB looking for any references to that table in order to purge entries in the QC. Decrease that to 50M. This, alone, will speed up the entire system.
sort_buffer_size = 1G
tmp_table_size=2G
max_heap_table_size=2G
are bad for a different reason. If you have multiple connections performing complex queries, lots of RAM could be allocated for each, thereby chewing up RAM, leading to swapping, and possibly crashing. Don't set them higher than about 1% of RAM.
In general, do not blindly change values in my.cnf. The most important setting is innodb_buffer_pool_size, which should be bigger than your dataset, but no bigger than 70% of available RAM.
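Put together, a saner version of the settings above might look roughly like this for a 16 GB server (a sketch with illustrative values, not a tested recommendation):

query_cache_size        = 50M    # small; a huge cache is slow to prune on every write
sort_buffer_size        = 2M     # per connection; keep well under 1% of RAM
tmp_table_size          = 128M   # per connection; keep modest
max_heap_table_size     = 128M   # keep in step with tmp_table_size
innodb_buffer_pool_size = 2G     # larger than the dataset, below ~70% of RAM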
load all records
Ouch! The cost of shoveling all that data from MySQL to PHP is non-trivial. Once it gets to PHP, it will be stored in structures that are not designed for huge amounts of data -- 400030 (or 600000) rows might take 1 GB inside PHP; this would probably blow out its "memory_limit", leading to PHP crashing. (OK, just dying with an error message.) It is possible to raise that limit, but then PHP might push MySQL out of memory, leading to swapping, or maybe running out of swap space. What a mess!
OFFSET
As for the large OFFSET, why? Do you have a user paging through the data? And he is almost to page 10,000? Are there cobwebs covering him?
OFFSET must read and step over 290580 rows in your example. That is costly.
For a way to paginate without that overhead, see http://mysql.rjweb.org/doc.php/pagination .
If you have a program 'crawling' through all 600K rows, 30 at a time, then the tip about "remember where you left off" in that link will work very nicely for such use. It does not "slow down".
If you are doing something different, what is it?
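For reference, the "remember where you left off" approach from that link looks roughly like this (a sketch; it assumes file_edit_date plus file_id give a stable ordering, and the literal values are placeholders for the last row of the previous page):

-- first page
SELECT file_id, file_category, file_edit_date
FROM files
ORDER BY file_edit_date DESC, file_id DESC
LIMIT 30;

-- next page: no OFFSET, just continue after the last row seen
SELECT file_id, file_category, file_edit_date
FROM files
WHERE file_edit_date < '2020-01-01 00:00:00'
   OR (file_edit_date = '2020-01-01 00:00:00' AND file_id < 12345)
ORDER BY file_edit_date DESC, file_id DESC
LIMIT 30;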
Pagination and gaps
Not a problem. See also: http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks which is more aimed at walking through an entire table. It focuses on an efficient way to find the 30th row going forward. (This is not necessarily any better than remembering the last id.)
That link is aimed at DELETEing, but it can easily be revised to SELECT.
Some math for scanning a 600K-row table 30 rows at a time:
My links: 600K rows are touched. Or twice that, if you peek forward with LIMIT 30,1 as suggested in the second link.
OFFSET ..., 30 must touch (600K/30)*600K/2 rows -- about 6 billion rows.
(Corollary: changing 30 to 100 would speed up your query, though it would still be painfully slow. It would not speed up my approach, but it is already quite fast.)

MySQL performance for version 5.7 vs. 5.6

I have noticed a particular performance issue that I am unsure on how to deal with.
I am in the process of migrating a web application from one server to another with very similar specifications. To be clear, the new server typically outperforms the old one.
The old server is running MySQL 5.6.35
The new server is running MySQL 5.7.17
Both the new and old server have virtually identical MySQL configurations.
Both the new and old server are running the exact same database perfectly duplicated.
The web application in question is Magento 1.9.3.2.
In Magento, the following function
Mage_Catalog_Model_Category::getChildrenCategories()
is intended to list all the immediate children categories given a certain category.
In my case, this function bubbles down eventually to this query:
SELECT `main_table`.`entity_id`
, main_table.`name`
, main_table.`path`
, `main_table`.`is_active`
, `main_table`.`is_anchor`
, `url_rewrite`.`request_path`
FROM `catalog_category_flat_store_1` AS `main_table`
LEFT JOIN `core_url_rewrite` AS `url_rewrite`
ON url_rewrite.category_id=main_table.entity_id
AND url_rewrite.is_system=1
AND url_rewrite.store_id = 1
AND url_rewrite.id_path LIKE 'category/%'
WHERE (main_table.include_in_menu = '1')
AND (main_table.is_active = '1')
AND (main_table.path LIKE '1/494/%')
AND (`level` <= 2)
ORDER BY `main_table`.`position` ASC;
While the structure of this query is the same for any Magento installation, there will obviously be slight differences in the values from one Magento installation to another, and in which category the function is looking at.
My catalog_category_flat_store_1 table has 214 rows.
My url_rewrite table has 1,734,316 rows.
This query, when executed on its own directly into MySQL performs very differently between MySQL versions.
I am using SQLyog to profile this query.
In MySQL 5.6, the above query performs in 0.04 seconds. The profile for this query looks like this: https://codepen.io/Petce/full/JNKEpy/
In MySQL 5.7, the above query performs in 1.952 seconds. The profile for this query looks like this: https://codepen.io/Petce/full/gWMgKZ/
As you can see, the same query on almost the exact same setup is virtually 2 seconds slower, and I am unsure as to why.
For some reason, MySQL 5.7 does not want to use the table index to help produce the result set.
Anyone out there with more experience/knowledge can explain what is going on here and how to go about fixing it?
I believe the issue has something to do with the way the MySQL 5.7 optimizer works. For some reason, it appears to think that a full table scan is the way to go. I can drastically improve the query performance by setting max_seeks_for_key very low (like 100) or dropping range_optimizer_max_mem_size really low to force it to throw a warning.
Doing either of these increases the query speed by almost 10x, down to 0.2 sec; however, this is still orders of magnitude slower than MySQL 5.6, which executes it in 0.04 seconds, and I don't think either of these is a good idea, as I'm not sure whether there would be other implications.
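For reference, the two tweaks mentioned above can be tried at session scope like this (the values are the ones from the paragraph above, not a recommendation):

SET SESSION max_seeks_for_key = 100;              -- discourages the optimizer from choosing full scans
SET SESSION range_optimizer_max_mem_size = 1024;  -- forces the range optimizer to give up with a warning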
It is also very difficult to modify the query, as it is generated by the Magento framework and changing it would require customisation of the Magento codebase, which I'd like to avoid. I'm also not even sure if it is the only query that is affected.
I have included the minor versions for my MySQL installations. I am now attempting to update MySQL 5.7.17 to 5.7.18 (the latest build) to see if there is any update to the performance.
After upgrading to MySQL 5.7.18 I saw no improvement. In order to bring the system back to a stable high performing state, we decided to downgrade back to MySQL 5.6.30. After doing the downgrade we saw an instant improvement.
The above query executed in MySQL 5.6.30 on the NEW server executed in 0.036 seconds.
Wow! This is the first time I have seen something useful from Profiling. Dynamically creating an index is a new Optimization feature from Oracle. But it looks like that was not the best plan for this case.
First, I will recommend that you file a bug at http://bugs.mysql.com -- they don't like to have regressions, especially this egregious. If possible, provide EXPLAIN FORMAT=JSON SELECT... and "Optimizer trace". (I do not accept tweaking obscure tunables as an acceptable answer, but thanks for discovering them.)
Back to helping you...
If you don't need LEFT, don't use it. It returns NULLs when there are no matching rows in the 'right' table; will that happen in your case?
Please provide SHOW CREATE TABLE. Meanwhile, I will guess that you don't have INDEX(include_in_menu, is_active, path). The first two can be in either order; path needs to be last.
And INDEX(category_id, is_system, store_id, id_path) with id_path last.
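Spelled out, those two guesses would look something like this (index names are made up; table and column names come from the query above):

CREATE INDEX idx_flat_menu_active_path
    ON catalog_category_flat_store_1 (include_in_menu, is_active, path);

CREATE INDEX idx_rewrite_cat_sys_store_path
    ON core_url_rewrite (category_id, is_system, store_id, id_path);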
Your query seems to have a pattern that works well for turning into a subquery:
(Note: this even preserves the semantics of LEFT.)
SELECT `main_table`.`entity_id` , main_table.`name` , main_table.`path` ,
`main_table`.`is_active` , `main_table`.`is_anchor` ,
( SELECT `request_path`
FROM url_rewrite
WHERE url_rewrite.category_id=main_table.entity_id
AND url_rewrite.is_system = 1
AND url_rewrite.store_id = 1
AND url_rewrite.id_path LIKE 'category/%'
) as request_path
FROM `catalog_category_flat_store_1` AS `main_table`
WHERE (main_table.include_in_menu = '1')
AND (main_table.is_active = '1')
AND (main_table.path like '1/494/%')
AND (`level` <= 2)
ORDER BY `main_table`.`position` ASC
LIMIT 0, 1000
(The suggested indexes apply here, too.)
THIS is not an ANSWER, only a comment for #Nigel Ren.
Here you can see that LIKE also uses an index.
mysql> SELECT *
-> FROM testdb
-> WHERE
-> vals LIKE 'text%';
+----+---------------------------------------+
| id | vals |
+----+---------------------------------------+
| 3 | text for line number 3 |
| 1 | textline 1 we rqwe rq wer qwer q wer |
| 2 | textline 2 asdf asd fas f asf wer 3 |
+----+---------------------------------------+
3 rows in set (0,00 sec)
mysql> EXPLAIN
-> SELECT *
-> FROM testdb
-> WHERE
-> vals LIKE 'text%';
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | testdb | NULL | range | vals | vals | 515 | NULL | 3 | 100.00 | Using where; Using index |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0,01 sec)
mysql>
sample with LEFT()
mysql> SELECT *
-> FROM testdb
-> WHERE
-> LEFT(vals,4) = 'text';
+----+---------------------------------------+
| id | vals |
+----+---------------------------------------+
| 3 | text for line number 3 |
| 1 | textline 1 we rqwe rq wer qwer q wer |
| 2 | textline 2 asdf asd fas f asf wer 3 |
+----+---------------------------------------+
3 rows in set (0,01 sec)
mysql> EXPLAIN
-> SELECT *
-> FROM testdb
-> WHERE
-> LEFT(vals,4) = 'text';
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | testdb | NULL | index | NULL | vals | 515 | NULL | 5 | 100.00 | Using where; Using index |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0,01 sec)
mysql>

why is mysql select count(1) taking so long?

When I first started using MySQL, a select count(*) or select count(1) was almost instantaneous. But I'm now using version 5.6.25 hosted at Dreamhost, and it sometimes takes 20-30 seconds to do a select count(1). However, the second time it's fast---as if the index is cached---but not super fast, as if the count were coming from just the metadata.
Anybody understand what's going on, and why it has changed?
mysql> select count(1) from times;
+----------+
| count(1) |
+----------+
| 1511553 |
+----------+
1 row in set (22.04 sec)
mysql> select count(1) from times;
+----------+
| count(1) |
+----------+
| 1512007 |
+----------+
1 row in set (0.54 sec)
mysql> select version();
+------------+
| version() |
+------------+
| 5.6.25-log |
+------------+
1 row in set (0.00 sec)
mysql>
I guess when you first started, you used MyISAM, and now you are using InnoDB. InnoDB just doesn't store this information. See documentation: Limits on InnoDB Tables
InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. To process a SELECT COUNT(*) FROM t statement, InnoDB scans an index of the table, which takes some time if the index is not entirely in the buffer pool. To get a fast count, you have to use a counter table you create yourself and let your application update it according to the inserts and deletes it does. If an approximate row count is sufficient, SHOW TABLE STATUS can be used. See Section 9.5, “Optimizing for InnoDB Tables”.
So when your index is entirely in the buffer pool after the (slower) first query, the second query is fast again.
MyISAM doesn't need to care about problems that concurrent transactions might create, because it doesn't support transactions, and so select count(*) from t will just look up and return a stored value very fast.
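If you need an exact count that returns instantly, a minimal counter-table sketch along the lines the documentation describes could look like this (names are illustrative; the application has to update it with every insert and delete, ideally in the same transaction):

CREATE TABLE row_counts (
    table_name VARCHAR(64) NOT NULL PRIMARY KEY,
    row_count  BIGINT      NOT NULL
);

-- after inserting a row into `times`:
UPDATE row_counts SET row_count = row_count + 1 WHERE table_name = 'times';

-- fast, exact count:
SELECT row_count FROM row_counts WHERE table_name = 'times';

-- or, if an approximation is enough (InnoDB estimate):
SHOW TABLE STATUS LIKE 'times';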

Why does the query execute so much slower when all the columns involved are the same and only the where condition changes?

I have this query:
SELECT 1 AS InputIndex,
IF(TRIM(DeviceInput1Name) = '', 0, IF(INSTR(DeviceInput1Name, '|') > 0, 2, 1)) AS InputType,
(SELECT Value1_1 FROM devicevalues WHERE DeviceID = devices.DeviceID ORDER BY ValueTime DESC LIMIT 1) AS InputValueLeft,
(SELECT Value1_2 FROM devicevalues WHERE DeviceID = devices.DeviceID ORDER BY ValueTime DESC LIMIT 1) AS InputValueRight
FROM devices
WHERE DeviceIMEI = 'Some_Search_Value';
This completes fairly quickly (in up to 0.01 seconds). However, running the same query with a WHERE clause such as
WHERE DeviceIMEI = 'Some_Other_Search_Value';
makes it run for upwards of 14 seconds! Some search values finish very quickly, while others run way too long.
If I run EXPLAIN on either query, I get the following:
+----+--------------------+--------------+-------+---------------+------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------+-------+---------------+------------+---------+-------+------+-------------+
| 1 | PRIMARY | devices | ref | DeviceIMEI | DeviceIMEI | 28 | const | 1 | Using where |
| 3 | DEPENDENT SUBQUERY | devicevalues | index | DeviceID,More | ValueTime | 9 | NULL | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | devicevalues | index | DeviceID,More | ValueTime | 9 | NULL | 1 | Using where |
+----+--------------------+--------------+-------+---------------+------------+---------+-------+------+-------------+
Also, here's the actual number of records, just so it's clear:
mysql> select count(*) from devicevalues inner join devices using(DeviceID) where devices.DeviceIMEI = 'Some_Search_Value';
+----------+
| count(*) |
+----------+
| 1017946 |
+----------+
1 row in set (0.17 sec)
mysql> select count(*) from devicevalues inner join devices using(DeviceID) where devices.DeviceIMEI = 'Some_Other_Search_Value';
+----------+
| count(*) |
+----------+
| 306100 |
+----------+
1 row in set (0.04 sec)
Any ideas why changing a search value in the WHERE clause would cause the query to execute so slowly, even when the number of physical records to search through is lower?
Note there is no need for you to rewrite the query, just explain why the above happens.
UPDATE: I have tried running two separate queries instead of one with dependent subqueries to get the information I need (first I select DeviceID from devices by DeviceIMEI, then I select from devicevalues by the DeviceID I got from the previous query), and all queries return instantly. I suppose the only solution is to run these queries in a transaction, so I'll be making a stored procedure to do this. This, however, still doesn't answer the question that puzzles me.
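For completeness, the two-step approach described in the update looks roughly like this (a sketch; the DeviceID value in the second statement is whatever the first one returned):

SELECT DeviceID FROM devices WHERE DeviceIMEI = 'Some_Search_Value';

SELECT Value1_1, Value1_2
FROM devicevalues
WHERE DeviceID = 42      -- placeholder for the DeviceID returned above
ORDER BY ValueTime DESC
LIMIT 1;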
I don't think that 1017946 is equal to the number of rows returned by your very first query. Your first query returns all rows from devices with some correlated subqueries; your count query returns all matching rows between the two tables. If so, the problem might be a cardinality issue: 'Some_Other_Search_Value' constitutes a much larger proportion of the rows in your first query than 'Some_Search_Value', so MySQL chooses a table scan.
If I understand correctly, the query is the same, and only the searched value changes.
There are three real possibilities as far as I can see, the first much likelier than the others:
The fast query only appears to be fast, because its result is already in the MySQL query cache. Try disabling the cache, running the statement with SQL_NO_CACHE (see the sketch below), or running the slow query twice. If the second run takes 0.01 s instead of 14 s, you'll know this is the case.
One query has to look at way more records than the other. One IMEI may have lots of rows in devicevalues, another might have next to none. Apparently you are in such a situation, and what makes this explanation unlikely is (apart from the times involved) the fact that it is the slower IMEI which actually has fewer matches.
The slow query is indeed slow. This means that a particular subset of data is hard to locate or hard to retrieve. The first may be due to overdue reindexing or to filesystem fragmentation of very large indexes. The second can also be due to fragmentation of the tablespace, or to some other condition which splits up records (for example, the database is partitioned). A search in a small partition tends to be faster than a search in a large partition.
But the time differences involved aren't equal in the three cases, and a 1400x difference seems to me an unlikely consequence of (2) or (3). The first possibility seems way more appealing.
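A quick way to test the first possibility is to re-run the fast variant with the cache bypassed for that one statement; if it is still fast, the cache is not the explanation (SQL_NO_CACHE only has an effect on versions that still ship the query cache):

SELECT SQL_NO_CACHE 1 AS InputIndex,
       IF(TRIM(DeviceInput1Name) = '', 0, IF(INSTR(DeviceInput1Name, '|') > 0, 2, 1)) AS InputType,
       (SELECT Value1_1 FROM devicevalues WHERE DeviceID = devices.DeviceID ORDER BY ValueTime DESC LIMIT 1) AS InputValueLeft,
       (SELECT Value1_2 FROM devicevalues WHERE DeviceID = devices.DeviceID ORDER BY ValueTime DESC LIMIT 1) AS InputValueRight
FROM devices
WHERE DeviceIMEI = 'Some_Search_Value';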
Update: you seem not to be using indexes in your subqueries. Do you have an index such as
CREATE INDEX dv_ndx ON devicevalues(DeviceID, ValueTime);
If you can, try a covering index:
CREATE INDEX dv_ndx ON devicevalues(DeviceID, ValueTime, Value1_1, Value1_2);