Strange performance degradation when using a subquery

Strange performance degradation when using a subquery - mysql

I have a nano AWS server running MySQL 5.5 for testing purposes. So, keep in mind that the server has limited resources (RAM, CPU, ...).
I have a table called "gpslocations". There is a primary index on its primary key "GPSLocationID". There is another secondary index on one of its fields "userID". The table has 6583 records.
When I run this query:
select * from gpslocations where GPSLocationID in (select max(GPSLocationID) from gpslocations where userID in (1,9) group by userID);
I get two rows and it takes a lot of time:
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
| GPSLocationID | lastUpdate | latitude | longitude | phoneNumber | userID | sessionID | speed | direction | distance | gpsTime | locationMethod | accuracy | extraInfo | eventType |
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
| 4107 | 2018-09-25 16:38:44 | 58.7641435 | 7.4868510 | e5d6fdff-9afe-44bb-a53a-3b454b12c9c6 | 9 | 77385f89-6b72-4b9e-b937-d2927959e0bd | 0 | 0 | 2.9 | 2018-09-25 18:38:43 | fused | 455 | 0 | android |
| 9822 | 2018-10-22 10:29:43 | 58.7794353 | 7.1952995 | 5240853e-2c36-4563-9dc3-238039de411e | 1 | 1fcad5af-c6ef-4bda-8fb2-d6e5688cf08a | 0 | 0 | 185.6 | 2018-10-22 12:29:41 | fused | 129 | 0 | android |
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
2 rows in set (14.96 sec)
When I just execute the inner select:
select max(GPSLocationID) from gpslocations where userID in (1,9) group by userID;
I get two values very fast:
+--------------------+
| max(GPSLocationID) |
+--------------------+
| 9822 |
| 4107 |
+--------------------+
2 rows in set (0.00 sec)
When I take these two values and write them manually in the outer select:
select * from gpslocations where GPSLocationID in (9822,4107);
I get exactly the same result as the first query but in no time!
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
| GPSLocationID | lastUpdate | latitude | longitude | phoneNumber | userID | sessionID | speed | direction | distance | gpsTime | locationMethod | accuracy | extraInfo | eventType |
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
| 4107 | 2018-09-25 16:38:44 | 58.7641435 | 7.4868510 | e5d6fdff-9afe-44bb-a53a-3b454b12c9c6 | 9 | 77385f89-6b72-4b9e-b937-d2927959e0bd | 0 | 0 | 2.9 | 2018-09-25 18:38:43 | fused | 455 | 0 | android |
| 9822 | 2018-10-22 10:29:43 | 58.7794353 | 7.1952995 | 5240853e-2c36-4563-9dc3-238039de411e | 1 | 1fcad5af-c6ef-4bda-8fb2-d6e5688cf08a | 0 | 0 | 185.6 | 2018-10-22 12:29:41 | fused | 129 | 0 | android |
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
2 rows in set (0.00 sec)
Can anybody explain this huge performance degradation when the two simple and fast queries are combined in one?
EDIT
Here is the output of explain:
+----+--------------------+--------------+-------+----------------------+--------+---------+------+------+---------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------+-------+----------------------+--------+---------+------+------+---------------------------------------+
| 1 | PRIMARY | gpslocations | ALL | NULL | NULL | NULL | NULL | 6648 | Using where |
| 2 | DEPENDENT SUBQUERY | gpslocations | range | userNameIndex,userID | userID | 5 | NULL | 11 | Using where; Using index for group-by |
+----+--------------------+--------------+-------+----------------------+--------+---------+------+------+---------------------------------------+
2 rows in set (0.00 sec)

in can have really bad optimization characteristics. In your version of MySQL, the subquery is probably being run once for every row in gsplocations. I think this performance problem was fixed in later versions.
I recommend using a correlated subquery instead:
select l.*
from gpslocations l
where l.GPSLocationID = (select max(l2.GPSLocationID)
from gpslocations l2
where l2.userID = l.userId
) and
l.userID in (1, 9);
And for this, you want an index on gpslocations(userID, GPSLocationID).
Another alternative is the join approach:
select l.*
from gpslocations l join
(select l2.userID, max(l2.GPSLocationID)
from gpslocations l2
where l2.userID in (1, 9)
) l2
on l2.userID = l.userId
where l.userID in (1, 9);

Related

Query on same database runs in .4 seconds in mariadb and 83 seconds in mysql 5.6,5.7 and 8

I have the following query that runs really slow on mysql (83 seconds) but really fast on mariadb (.4 seconds).
I verified the data database has the same indexes and data. Maria Db server has less cpu (1VCPU), memory (2gb)
Mysql servers have 8 - 32GB ram and full quad core processors (tried 5.6,5.7, and 8.0 with similar results).
The phppos_inventory table has ~170000 rows and the phppos_items table has ~3000 rows
Here is the query and the tables and explains
SELECT /*+ SEMIJOIN(#subq MATERIALIZATION) */ SQL_CALC_FOUND_ROWS
1 AS _h,
`phppos_location_items`.`location_id` AS `location_id`,
`phppos_items`.`item_id`,
`phppos_items`.`name`,
`phppos_categories`.`id` AS `category_id`,
`phppos_categories`.`name` AS `category`,
`location`,
`company_name`,
`phppos_items`.`item_number`,
`size`,
`product_id`,
Coalesce(phppos_location_item_variations.cost_price,
phppos_item_variations.cost_price, phppos_location_items.cost_price,
phppos_items.cost_price, 0) AS cost_price,
Coalesce(phppos_location_item_variations.unit_price,
phppos_item_variations.unit_price, phppos_location_items.unit_price,
phppos_items.unit_price, 0) AS unit_price,
Sum(Coalesce(inv.trans_current_quantity, 0)) AS quantity,
Coalesce(phppos_location_item_variations.reorder_level,
phppos_item_variations.reorder_level, phppos_location_items.reorder_level,
phppos_items.reorder_level) AS reorder_level,
Coalesce(phppos_location_item_variations.replenish_level,
phppos_item_variations.replenish_level, phppos_location_items.replenish_level,
phppos_items.replenish_level) AS replenish_level,
description
FROM `phppos_inventory` `inv`
LEFT JOIN `phppos_items`
ON `phppos_items`.`item_id` = `inv`.`trans_items`
LEFT JOIN `phppos_location_items`
ON `phppos_location_items`.`item_id` = `phppos_items`.`item_id`
AND `phppos_location_items`.`location_id` = `inv`.`location_id`
LEFT JOIN `phppos_item_variations`
ON `phppos_items`.`item_id` = `phppos_item_variations`.`item_id`
AND `phppos_item_variations`.`id` = `inv`.`item_variation_id`
AND `phppos_item_variations`.`deleted` = 0
LEFT JOIN `phppos_location_item_variations`
ON `phppos_location_item_variations`.`item_variation_id` =
`phppos_item_variations`.`id`
AND `phppos_location_item_variations`.`location_id` =
`inv`.`location_id`
LEFT OUTER JOIN `phppos_suppliers`
ON `phppos_items`.`supplier_id` =
`phppos_suppliers`.`person_id`
LEFT OUTER JOIN `phppos_categories`
ON `phppos_items`.`category_id` = `phppos_categories`.`id`
WHERE inv.trans_id = (SELECT Max(inv1.trans_id)
FROM phppos_inventory inv1
WHERE inv1.trans_items = inv.trans_items
AND ( inv1.item_variation_id =
phppos_item_variations.id
OR phppos_item_variations.id IS NULL )
AND inv1.location_id = inv.location_id
AND inv1.trans_date < '2019-12-31 23:59:59')
AND inv.location_id IN( 1 )
AND `phppos_items`.`system_item` = 0
AND `phppos_items`.`deleted` = 0
AND `is_service` != 1
GROUP BY `phppos_items`.`item_id`
LIMIT 20
Explain mysql (slighly different than maria db but I tried use index to match the execution plan and still was slow)
+------------------------------------------+-------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+---------------------------------+------------+--------+------------------------------+-------+----------+------------------------------------+
| 1 | PRIMARY | phppos_items | NULL | ref | PRIMARY,item_number,product_id,phppos_items_ibfk_1,deleted,phppos_items_ibfk_3,phppos_items_ibfk_4,phppos_items_ibfk_5,description,size,reorder_level,cost_price,unit_price,promo_price,last_modified,name,phppos_items_ibfk_6,deleted_system_item,custom_field_1_value,custom_field_2_value,custom_field_3_value,custom_field_4_value,custom_field_5_value,custom_field_6_value,custom_field_7_value,custom_field_8_value,custom_field_9_value,custom_field_10_value,verify_age,phppos_items_ibfk_7,item_inactive_index,tags,full_search,name_search,item_number_search,product_id_search,description_search,size_search,custom_field_1_value_search,custom_field_2_value_search,custom_field_3_value_search,custom_field_4_value_search,custom_field_5_value_search,custom_field_6_value_search,custom_field_7_value_search,custom_field_8_value_search,custom_field_9_value_search,custom_field_10_value_search | deleted | 4 | const | 21188 | 9.00 | Using index condition; Using where |
| 1 | PRIMARY | inv | NULL | ref | phppos_inventory_ibfk_1,location_id,phppos_inventory_custom | phppos_inventory_custom | 8 | pos.phppos_items.item_id,const | 3 | 100.00 | NULL |
| 1 | PRIMARY | phppos_location_items | NULL | eq_ref | PRIMARY,phppos_location_items_ibfk_2 | PRIMARY | 8 | const,pos.phppos_items.item_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_item_variations | NULL | eq_ref | PRIMARY,phppos_item_variations_ibfk_1 | PRIMARY | 4 | pos.inv.item_variation_id | 1 | 100.00 | Using where |
| 1 | PRIMARY | phppos_location_item_variations | NULL | eq_ref | PRIMARY,phppos_item_attribute_location_values_ibfk_2 | PRIMARY | 8 | pos.phppos_item_variations.id,const | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_suppliers | NULL | ref | person_id | person_id | 4 | pos.phppos_items.supplier_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_categories | NULL | eq_ref | PRIMARY | PRIMARY | 4 | pos.phppos_items.category_id | 1 | 100.00 | NULL |
| 2 | DEPENDENT SUBQUERY | inv1 | NULL | ref | phppos_inventory_ibfk_1,location_id,trans_date,phppos_inventory_ibfk_4,phppos_inventory_custom | phppos_inventory_custom | 8 | pos.inv.trans_items,pos.inv.location_id | 3 | 50.00 | Using where; Using index |
+----+--------------------+---------------------------------+------------+--------+---------------------------------------------------------------------------------------------------------
Explain maria db:
+------+---------------------------------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+---------------------------------+--------+------------------------------+
| 1 | PRIMARY | phppos_items | ref | PRIMARY,deleted,deleted_system_item | deleted | 4 | const | 23955 | Using where |
| 1 | PRIMARY | inv | ref | phppos_inventory_ibfk_1,location_id,phppos_inventory_custom | phppos_inventory_ibfk_1 | 4 | freelance_pos5.phppos_items.item_id | 2 | Using where |
| 1 | PRIMARY | phppos_location_items | eq_ref | PRIMARY,phppos_location_items_ibfk_2 | PRIMARY | 8 | const,freelance_pos5.phppos_items.item_id | 1 | |
| 1 | PRIMARY | phppos_item_variations | eq_ref | PRIMARY,phppos_item_variations_ibfk_1 | PRIMARY | 4 | freelance_pos5.inv.item_variation_id | 1 | Using where |
| 1 | PRIMARY | phppos_location_item_variations | eq_ref | PRIMARY,phppos_item_attribute_location_values_ibfk_2 | PRIMARY | 8 | freelance_pos5.phppos_item_variations.id,const | 1 | Using where |
| 1 | PRIMARY | phppos_suppliers | ref | person_id | person_id | 4 | freelance_pos5.phppos_items.supplier_id | 1 | Using where |
| 1 | PRIMARY | phppos_categories | eq_ref | PRIMARY | PRIMARY | 4 | freelance_pos5.phppos_items.category_id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | inv1 | ref | phppos_inventory_ibfk_1,location_id,trans_date,phppos_inventory_ibfk_4,phppos_inventory_custom | phppos_inventory_custom | 8 | freelance_pos5.inv.trans_items,freelance_pos5.inv.location_id | 2 | Using where; Using index |
+------+--------------------+---------------------------------+--------+------------------------------------------------------------------------------------------------+-------------------------+---------+---------------------------------------------------------------+-------+--------------------------+
Tables described (Reached StackOverflow char limit)
https://pastebin.com/nhngSHb8
Create tables:
https://pastebin.com/aWMeriqt
MYSQL (DEV BOX)
mysql> SHOW GLOBAL STATUS LIKE '%thread%';
+------------------------------------------+-------+
| Variable_name | Value |
+------------------------------------------+-------+
| Delayed_insert_threads | 0 |
| Performance_schema_thread_classes_lost | 0 |
| Performance_schema_thread_instances_lost | 0 |
| Slow_launch_threads | 0 |
| Threads_cached | 4 |
| Threads_connected | 1 |
| Threads_created | 5 |
| Threads_running | 1 |
+------------------------------------------+-------+
8 rows in set (0.06 sec)
MARIA DB
MariaDB [freelance_pos5]> SHOW GLOBAL STATUS LIKE '%thread%';
+------------------------------------------+-------+
| Variable_name | Value |
+------------------------------------------+-------+
| Delayed_insert_threads | 0 |
| Performance_schema_thread_classes_lost | 0 |
| Performance_schema_thread_instances_lost | 0 |
| Slow_launch_threads | 0 |
| Threadpool_idle_threads | 0 |
| Threadpool_threads | 0 |
| Threads_cached | 3 |
| Threads_connected | 2 |
| Threads_created | 5 |
| Threads_running | 1 |
| wsrep_applier_thread_count | 0 |
| wsrep_rollbacker_thread_count | 0 |
| wsrep_thread_count | 0 |
+------------------------------------------+-------+
13 rows in set (0.00 sec)

Moving the
WHERE inv.trans_id = (SELECT Max(inv1.trans_id)
into the INNER JOIN is the game changer.
INNER JOIN (
SELECT inv1.trans_items, inv1.item_variation_id, inv1.location_id, MAX(inv1.trans_id) as trans_id
FROM phppos_inventory inv1
WHERE inv1.trans_date < '2019-12-31 23:59:59'
GROUP BY inv1.trans_items, inv1.item_variation_id, inv1.location_id
ORDER BY inv1.trans_items, inv1.item_variation_id, inv1.location_id
) inv1 on inv1.trans_id = inv.trans_id
AND inv1.trans_items = inv.trans_items
AND (inv1.item_variation_id = phppos_item_variations.id OR phppos_item_variations.id IS NULL)
AND inv1.location_id = inv.location_id
The execution is reduced from 80+s down to ~ <0.4s, on MySQL 8.0.

MariaDB's and MySQL's Optimizers started diverging significantly at 5.6. Certain queries will run queries faster in one than the other.
I think I see a way to speed up the query, perhaps on both versions.
Don't use LEFT JOIN when it is the same as JOIN, which seems to be the case for at least phppos_items, which has items in the WHERE that override LEFT.
Please provide SHOW CREATE TABLE; meanwhile, I will guess that what indexes you have/don't have, and that each table has PRIMARY KEY(id)
Use composite indexes where appropriate. (More below.)
Get the 20 rows before JOINing to the rest of the tables:
SELECT ...
FROM ( SELECT inv.id, pi.id
FROM `phppos_inventory` AS inv `inv`
JOIN `phppos_items` AS pi
ON pi.`item_id` = `inv`.`trans_items`
AND inv.location_id IN( 1 )
AND pi.`system_item` = 0
AND pi.`deleted` = 0
AND `is_service` != 1 -- Which table is this in???
GROUP BY pi.`item_id`
LIMIT 20 )
LEFT JOIN .... (( all the other tables ))
-- no GROUP BY or LIMIT needed (I think)
phppos_items: INDEX(item_id, deleted, system_item, is_service)
phppos_items: INDEX(deleted, system_item, is_service)
phppos_inventory: INDEX(trans_items, location_id, location_id, item_variation_id, trans_date, trans_id)
phppos_inventory: INDEX(location_id)

Aside with the fact that the query is misleading since the outer join is discarded, the main difference is that the second engine operation in MariabDB is an index range scan (ref) using the phppos_inventory_custom index. MySQL also chose an index range scan but over phppos_inventory_ibfk_1.
However, without the definition of these two indexes it's difficult to asses why the engines may have chosen a different path.
Please add to your question the definition of these indexes, and alse their selectivity (percent of estimated rows selected / total table rows) to elaborate more.

Improve self-JOIN SQL Query performance

I try to improve performance of a SQL query, using MariaDB 10.1.18 (Linux Debian Jessie).
The server has a large amount of RAM (192GB) and SSD disks.
The real table has hundreds of millions of rows but I can reproduce my performance issue on a subset of the data and a simplified layout.
Here is the (simplified) table definition:
CREATE TABLE `data` (
`uri` varchar(255) NOT NULL,
`category` tinyint(4) NOT NULL,
`value` varchar(255) NOT NULL,
PRIMARY KEY (`uri`,`category`),
KEY `cvu` (`category`,`value`,`uri`),
KEY `cu` (`category`,`uri`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
To reproduce the actual distribution of my content, I insert about 200'000 rows like this (bash script):
#!/bin/bash
for i in `seq 1 100000`;
do
mysql mydb -e "INSERT INTO data (uri, category, value) VALUES ('uri${i}', 1, 'foo');"
done
for i in `seq 99981 200000`;
do
mysql mydb -e "INSERT INTO data (uri, category, value) VALUES ('uri${i}', 2, '$(($i % 5))');"
done
So, we insert about:
100'000 rows in category 1 with a static string ("foo") as value
100'000 rows in category 2 with a number between 1 and 5 as the value
20 rows have a common "uri" between each dataset (category 1 / 2)
I always run an ANALYZE TABLE before querying.
Here is the explain output of the query I run:
MariaDB [mydb]> EXPLAIN EXTENDED
-> SELECT d2.uri, d2.value
-> FROM data as d1
-> INNER JOIN data as d2 ON d1.uri = d2.uri AND d2.category = 2
-> WHERE d1.category = 1 and d1.value = 'foo';
+------+-------------+-------+--------+----------------+---------+---------+-------------------+-------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+--------+----------------+---------+---------+-------------------+-------+----------+-------------+
| 1 | SIMPLE | d1 | ref | PRIMARY,cvu,cu | cu | 1 | const | 92964 | 100.00 | Using where |
| 1 | SIMPLE | d2 | eq_ref | PRIMARY,cvu,cu | PRIMARY | 768 | mydb.d1.uri,const | 1 | 100.00 | |
+------+-------------+-------+--------+----------------+---------+---------+-------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
MariaDB [mydb]> SHOW WARNINGS;
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | select `mydb`.`d2`.`uri` AS `uri`,`mydb`.`d2`.`value` AS `value` from `mydb`.`data` `d1` join `mydb`.`data` `d2` where ((`mydb`.`d1`.`category` = 1) and (`mydb`.`d2`.`uri` = `mydb`.`d1`.`uri`) and (`mydb`.`d2`.`category` = 2) and (`mydb`.`d1`.`value` = 'foo')) |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
MariaDB [mydb]> SELECT d2.uri, d2.value FROM data as d1 INNER JOIN data as d2 ON d1.uri = d2.uri AND d2.category = 2 WHERE d1.category = 1 and d1.value = 'foo';
+-----------+-------+
| uri | value |
+-----------+-------+
| uri100000 | 0 |
| uri99981 | 1 |
| uri99982 | 2 |
| uri99983 | 3 |
| uri99984 | 4 |
| uri99985 | 0 |
| uri99986 | 1 |
| uri99987 | 2 |
| uri99988 | 3 |
| uri99989 | 4 |
| uri99990 | 0 |
| uri99991 | 1 |
| uri99992 | 2 |
| uri99993 | 3 |
| uri99994 | 4 |
| uri99995 | 0 |
| uri99996 | 1 |
| uri99997 | 2 |
| uri99998 | 3 |
| uri99999 | 4 |
+-----------+-------+
20 rows in set (0.35 sec)
This query returns 20 rows in ~350ms.
It seems quite slow to me.
Is there a way to improve performance of such query? Any advice?

Can you try the following query?
SELECT dd.uri, max(case when dd.category=2 then dd.value end) v2
FROM data as dd
GROUP by 1
having max(case when dd.category=1 then dd.value end)='foo' and v2 is not null;
I cannot at the moment repeat your test, but my hope is that having to scan the table just once could compensate the usage of the aggregate functions.
Edited
Created a test environment and tested some hypothesis.
As of today, the best performance (for 1 million rows) has been:
1 - Adding an index on uri column
2 - Using the following query
select d2.uri, d2.value
FROM data as d2
where exists (select 1
from data d1
where d1.uri = d2.uri
AND d1.category = 1
and d1.value='foo')
and d2.category=2
and d2.uri in (select uri from data group by 1 having count(*) > 1);
The ironic thing is that in the first proposal I tried to minimize the access to the table and now I'm proposing three accesses.
Edited: 30/10
Ok, so I've done some other experiments and I would like to summarize the outcomes.
First, I'd like to expand a bit Aruna answer:
what I found interesting in the OP question, is that it is an exception to a classic "rule of thumb" in database optimization: if the # of desired results is very small compared to the dimension of the tables involved, it should be possible with the correct indexes to have a very good performance.
Why can't we simply add a "magic index" to have our 20 rows? Because we don't have any clear "attack vector".. I mean, there's no clearly selective criteria we can apply on a record to reduce significatevely the number of the target rows.
Think about it: the fact that the value must be "foo" is just removing 50% of the table form the equation. Also the category is not selective at all: the only interest thing is that, for 20 uri, they appear both in records with category 1 and 2.
But here lies the issue: the condition involves comparing two rows, and unfortunately, to my knowledge, there's no way an index (not even the Oracle Function Based Indexes) can reduce a condition that is dependant on info on multiple rows.
The conlclusion might be: if these kind of query is what you need, you should revise your data model. For example, if you have a finite and small number of categories (lets' say three=, your table might be written as:
uri, value_category1, value_category2, value_category3
The query would be:
select uri, value_category2
where value_category1='foo' and value_category2 is not null;
By the way, let's go back tp the original question.
I've created a slightly more efficient test data generator (http://pastebin.com/DP8Uaj2t).
I've used this table:
use mydb;
DROP TABLE IF EXISTS data2;
CREATE TABLE data2
(
uri varchar(255) NOT NULL,
category tinyint(4) NOT NULL,
value varchar(255) NOT NULL,
PRIMARY KEY (uri,category),
KEY cvu (category,value,uri),
KEY ucv (uri,category,value),
KEY u (uri),
KEY cu (category,uri)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The outcome is:
+--------------------------+----------+----------+----------+
| query_descr | num_rows | num | num_test |
+--------------------------+----------+----------+----------+
| exists_plus_perimeter | 10000 | 0.0000 | 5 |
| exists_plus_perimeter | 50000 | 0.0000 | 5 |
| exists_plus_perimeter | 100000 | 0.0000 | 5 |
| exists_plus_perimeter | 500000 | 2.0000 | 5 |
| exists_plus_perimeter | 1000000 | 4.8000 | 5 |
| exists_plus_perimeter | 5000000 | 26.7500 | 8 |
| max_based | 10000 | 0.0000 | 5 |
| max_based | 50000 | 0.0000 | 5 |
| max_based | 100000 | 0.0000 | 5 |
| max_based | 500000 | 3.2000 | 5 |
| max_based | 1000000 | 7.0000 | 5 |
| max_based | 5000000 | 49.5000 | 8 |
| max_based_with_ucv | 10000 | 0.0000 | 5 |
| max_based_with_ucv | 50000 | 0.0000 | 5 |
| max_based_with_ucv | 100000 | 0.0000 | 5 |
| max_based_with_ucv | 500000 | 2.6000 | 5 |
| max_based_with_ucv | 1000000 | 7.0000 | 5 |
| max_based_with_ucv | 5000000 | 36.3750 | 8 |
| standard_join | 10000 | 0.0000 | 5 |
| standard_join | 50000 | 0.4000 | 5 |
| standard_join | 100000 | 2.4000 | 5 |
| standard_join | 500000 | 13.4000 | 5 |
| standard_join | 1000000 | 33.2000 | 5 |
| standard_join | 5000000 | 205.2500 | 8 |
| standard_join_plus_perim | 5000000 | 155.0000 | 2 |
+--------------------------+----------+----------+----------+
The queries used are:
- query_max_based_with_ucv.sql
- query_exists_plus_perimeter.sql
- query_max_based.sql
- query_max_based_with_ucv.sql
- query_standard_join_plus_perim.sql query_standard_join.sql
The best query is still the "query_exists_plus_perimeter"that I've put after the first environment creation.

It is mainly due to the number of rows analysed. Even though you have tables indexed the main decision making condition "WHERE d1.category = 1 and d1.value = 'foo'" filters huge amount of rows
+------+-------------+-------+-.....-+-------+----------+-------------+
| id | select_type | table | | rows | filtered | Extra |
+------+-------------+-------+-.....-+-------+----------+-------------+
| 1 | SIMPLE | d1 | ..... | 92964 | 100.00 | Using where |
Each and every matching row it has to read the table again which is for category 2. Since it is reading on primary key it can get the matching row directly.
On your original table check the cardinality of the combination of category and value. If it is more towards unique, you can add an index on (category, value) and that should improve the performance. If it is same like the example given you may not get any performance improvement.

Improve Slow Mysql Query

i have a database table containing events.
mysql> describe events;
+-------------+------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------------------+----------------+
| device | varchar(32) | YES | MUL | NULL | |
| psu | varchar(32) | YES | MUL | NULL | |
| event | varchar(32) | YES | MUL | NULL | |
| down_time | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| up_time | timestamp | NO | MUL | 0000-00-00 00:00:00 | |
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
+-------------+------------------+------+-----+---------------------+----------------+
6 rows in set (0.01 sec)
i want to find events that overlap in time and use the following query:
SELECT *
FROM link_events a
JOIN link_events b
ON ( a.down_time <= b.up_time )
AND ( a.up_time >= b.down_time )
WHERE (a.device = 'd1' AND b.device = 'd2')
AND (a.psu = 'p1' AND b.psu = 'p2')
AND (a.event = 'e1' AND b.event = 'e2');
+-------------+-----------+------------+---------------------+---------------------+--------+-------------+-----------+------------+---------------------+---------------------+--------+
| device | psu | event | down_time | up_time | id | device | psu | event | down_time | up_time | id |
+-------------+-----------+------------+---------------------+---------------------+--------+-------------+-----------+------------+---------------------+---------------------+--------+
| d1 | p1 | e1 | 2013-01-14 16:42:10 | 2013-01-14 16:43:00 | 374529 | d2 | p2 | e2 | 2013-01-14 16:42:14 | 2013-01-14 16:42:18 | 211570 |
| d1 | p1 | e1 | 2013-05-29 18:49:26 | 2013-05-30 12:31:15 | 374569 | d2 | p2 | e2 | 2013-05-30 08:48:20 | 2013-05-30 08:48:27 | 211787 |
| d1 | p1 | e1 | 2013-05-29 18:49:26 | 2013-05-30 12:31:15 | 374569 | d2 | p2 | e2 | 2013-05-30 08:48:54 | 2013-05-30 08:48:58 | 211788 |
+-------------+-----------+------------+---------------------+---------------------+--------+-------------+-----------+------------+---------------------+---------------------+--------+
3 rows in set (35.88 sec)
The events table contains the following number of rows:
mysql> select count(*) from events;
+----------+
| count(*) |
+----------+
| 977759 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from events where device = 'd1' and psu = 'p1' and event = 'e1';
+----------+
| count(*) |
+----------+
| 11397 |
+----------+
1 row in set (0.12 sec)
mysql> select count(*) from events where device = 'd2' and psu = 'p2' and event = 'e2';
+----------+
| count(*) |
+----------+
| 243 |
+----------+
1 row in set (0.00 sec)
The database is installed on Windows 7 laptop and uses MyISAM engine.
Is there a way to better organise the database or change indexing to
improve query time which for first query is 35 secs. Repeating the
same query gives an immediate result however if i 'flush tables' and
repeat query a third time the time taken is again 35 secs.
Any help appreciated !
Here is output from EXPLAIN after ADD KEY:
mysql> EXPLAIN
-> SELECT *
->
-> FROM link_events a
-> JOIN link_events b
->
-> ON ( a.down_time <= b.up_time )
-> AND ( a.up_time >= b.down_time )
->
-> WHERE (a.device = 'd1' AND b.device = 'd2')
-> AND (a.psu = 'l1' AND b.psu = 'l2')
-> AND (a.event = 'e1' AND b.event = 'e2');
+----+-------------+-------+------+--------------------------------------------------------------------------------+---------------+---------+-------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+--------------------------------------------------------------------------------+---------------+---------+-------------------+------+-----------------------+
| 1 | SIMPLE | b | ref | device,psu,event,down_time,up_time,device_2,device_3 | device_2 | 297 | const,const,const | 180 | Using index condition |
| 1 | SIMPLE | a | ref | device,psu,event,down_time,up_time,device_2,device_3 | device_2 | 297 | const,const,const | 7744 | Using index condition |
+----+-------------+-------+------+--------------------------------------------------------------------------------+---------------+---------+-------------------+------+-----------------------+
2 rows in set (0.07 sec)
New column:
mysql> describe link_events;
+-------------+------------------+------+-----+---------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------------------+-----------------------------+
| device_name | varchar(32) | YES | MUL | NULL | |
| link_name | varchar(32) | YES | MUL | NULL | |
| event_type | varchar(32) | YES | MUL | NULL | |
| down_time | timestamp | NO | MUL | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| up_time | timestamp | NO | MUL | 0000-00-00 00:00:00 | |
| span | geometry | NO | MUL | NULL | |
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
+-------------+------------------+------+-----+---------------------+-----------------------------+
7 rows in set (0.03 sec)
EXPLAIN:
mysql> EXPLAIN
->
-> SELECT
->
-> CONCAT('Link1','-', 'Link2') overlaps,
-> GREATEST(a.down_time,b.down_time) AS downtime,
-> LEAST(a.up_time,b.up_time) AS uptime,
-> TIME_TO_SEC(TIMEDIFF( LEAST(a.up_time,b.up_time),
-> GREATEST(a.down_time,b.down_time))) AS duration
->
-> FROM link_events a
-> JOIN link_events b
->
-> ON Intersects (a.span, b.span)
->
-> WHERE (a.device_name = 'd1' AND b.device_name = 'd2')
-> AND (a.link_name = 'l1' AND b.link_name = 'l2')
-> AND (a.event_type = 'e1' AND b.event_type = 'e1');
+----+-------------+-------+------+-------------------------------------------------------------------+---------------+---------+-------------------+-------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-------------------------------------------------------------------+---------------+---------+-------------------+-------+------------------------------------+
| 1 | SIMPLE | a | ref | span,device_name,link_name,event_type,device_name_2,device_name_3 | device_name_2 | 297 | const,const,const | 383 | Using index condition |
| 1 | SIMPLE | b | ref | span,device_name,link_name,event_type,device_name_2,device_name_3 | device_name_2 | 297 | const,const,const | 14580 | Using index condition; Using where |
+----+-------------+-------+------+-------------------------------------------------------------------+---------------+---------+-------------------+-------+------------------------------------+
2 rows in set (0.09 sec)
Using Intersects takes 1min 12 secs?

For this query:
SELECT *
FROM link_events a JOIN
link_events b
ON (a.down_time <= b.up_time) AND (a.up_time >= b.down_time)
WHERE (a.device = 'd1' AND b.device = 'd2') AND
(a.psu = 'p1' AND b.psu = 'p2') AND
(a.event = 'e1' AND b.event = 'e2');
You want indexes on link_events(device, psu, event, up_time, down_time). For clarity, I would express the query more like this:
SELECT *
FROM link_events a JOIN
link_events b
ON (a.down_time <= b.up_time) AND (a.up_time >= b.down_time)
WHERE (a.device, a.psu, a.event) IN (('d1', 'p1', 'e1')) AND
(b.device, a.psu, a.event) IN (('d2', 'p2', 'e2'));

Try:
ALTER TABLE link_events ADD KEY(device,psu,event,up_time),
ADD KEY(device,psu,event,down_time)
Hopefully this will be selective enough. If this does not help, post the results of EXPLAIN so we can make sure the optimizer is doing the best it can, and we will go from there if needed.
Edit:
It is important to understand that not all indexes are of equal value for a particular query. A common mistake is to think of an index as some magic worker that will automatically speed up the query if you just reference the column in the index. This is not quite the case. The keys need to be designed and the queries needs to be written in such a way that allows the best possible access path to the records. Changing something that might appear insignificant such as the order of the columns in the index or writing SQRT(x) = 4.4 instead of x = 4.4 * 4.4 could make the index unusable and slow the query down by a factor of a thousand or even a million or more.
I highly recommend reading this:
http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html
Having a feel for how MySQL uses keys can save you a lot of trouble in the future.
EDIT 2 - another idea is to add a column span GEOMETRY NOT NULL, SPATIAL KEY (span) containing linestring(point(up_time,0),point(down_time,0)) - times would need to be numeric (you can convert using UNIX_TIMESTAMP() for example) - and use Intersects(a.span,b.span) in the query. With some fine tuning this has the potential of being much faster than even the improved query because span intersections are being detected using a geometry-based algorithm specially designed for such things.

What is the performance difference in MySQL relational division (IN AND instead of IN OR) implementations?

Because MySQL does not have a built in relational division operator, programmers must implement their own. There are two leading examples of implementations which can be found in this answer here.
For posterity I'll list them below:
Using GROUP BY/HAVING
SELECT t.documentid
FROM TABLE t
WHERE t.termid IN (1,2,3)
GROUP BY t.documentid
HAVING COUNT(DISINCT t.termid) = 3
The caveat is that you have to use HAVING COUNT(DISTINCT because
duplicates of termid being 2 for the same documentid would be a false
positive. And the COUNT has to equal the number of termid values in
the IN clause.
Using JOINs
SELECT t.documentid
FROM TABLE t
JOIN TABLE x ON x.termid = t.termid
AND x.termid = 1
JOIN TABLE y ON y.termid = t.termid
AND y.termid = 2
JOIN TABLE z ON z.termid = t.termid
AND z.termid = 3
But this one can be a pain for handling criteria that changes a lot.
Of these two implementation techniques, which one would offer the best performance?

I made some improvements in the JOIN version; see below.
I vote for the JOIN approach for speed. Here's how I determined it:
HAVING, version 1
mysql> FLUSH STATUS;
mysql> SELECT city
-> FROM us_vch200
-> WHERE state IN ('IL', 'MO', 'PA')
-> GROUP BY city
-> HAVING count(DISTINCT state) >= 3;
+-------------+
| city |
+-------------+
| Springfield |
| Washington |
+-------------+
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_external_lock | 2 |
| Handler_read_first | 1 |
| Handler_read_key | 2 |
| Handler_read_last | 1 |
| Handler_read_next | 4175 | -- full index scan
(etc)
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
| 1 | SIMPLE | us_vch200 | range | state_city,city_state | city_state | 769 | NULL | 4176 | Using where; Using index for group-by (scanning) |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
The 'Extra' points out that it decided to tackle the GROUP BY and use INDEX(city, state) even though INDEX(state, city) might make sense.
HAVING, version 2
Making it switch to INDEX(state, city) yields:
mysql> FLUSH STATUS;
mysql> SELECT city
-> FROM us_vch200 IGNORE INDEX(city_state)
-> WHERE state IN ('IL', 'MO', 'PA')
-> GROUP BY city
-> HAVING count(DISTINCT state) >= 3;
+-------------+
| city |
+-------------+
| Springfield |
| Washington |
+-------------+
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_external_lock | 2 |
| Handler_read_key | 401 |
| Handler_read_next | 398 |
| Handler_read_rnd | 398 |
(etc)
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
| 1 | SIMPLE | us_vch200 | range | state_city,city_state | state_city | 2 | NULL | 397 | Using where; Using index; Using filesort |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
JOIN
mysql> SELECT x.city
-> FROM us_vch200 x
-> JOIN us_vch200 y ON y.city= x.city AND y.state = 'MO'
-> JOIN us_vch200 z ON z.city= x.city AND z.state = 'PA'
-> WHERE x.state = 'IL';
+-------------+
| city |
+-------------+
| Springfield |
| Washington |
+-------------+
2 rows in set (0.00 sec)
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_external_lock | 6 |
| Handler_read_key | 86 |
| Handler_read_next | 87 |
(etc)
+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
| 1 | SIMPLE | y | ref | state_city,city_state | state_city | 2 | const | 81 | Using where; Using index |
| 1 | SIMPLE | z | ref | state_city,city_state | state_city | 769 | const,world.y.city | 1 | Using where; Using index |
| 1 | SIMPLE | x | ref | state_city,city_state | state_city | 769 | const,world.y.city | 1 | Using where; Using index |
+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
Only INDEX(state, city) is needed. The Handler numbers are the smallest for this formulation, so I deduce that it is the fastest.
Notice how the optimizer made up its own mind which table to start with, probably due to
+-------+----------+
| state | COUNT(*) |
+-------+----------+
| IL | 221 |
| MO | 81 | -- smallest
| PA | 96 |
+-------+----------+
Conclusions
JOIN (without the unnecessary t table) is probably the fastest. Plus this composite index is needed: INDEX(state, city).
To translate back to your use case:
city --> documentid
state --> termid
Caveat: YMMV because the distribution of values for documentid and termid could be quite different than the test case I used.

indexes and speeding up 'derived' queries

I've recently noticed that a query I have is running quite slowly, at almost 1 second per query.
The query looks like this
SELECT eventdate.id,
eventdate.eid,
eventdate.date,
eventdate.time,
eventdate.title,
eventdate.address,
eventdate.rank,
eventdate.city,
eventdate.state,
eventdate.name,
source.link,
type,
eventdate.img
FROM source
RIGHT OUTER JOIN
(
SELECT event.id,
event.date,
users.name,
users.rank,
users.eid,
event.address,
event.city,
event.state,
event.lat,
event.`long`,
GROUP_CONCAT(types.type SEPARATOR ' | ') AS type
FROM event FORCE INDEX (latlong_idx)
JOIN users ON event.uid = users.id
JOIN types ON users.tid=types.id
WHERE `long` BETWEEN -74.36829174058 AND -73.64365405942
AND lat BETWEEN 40.35195025942 AND 41.07658794058
AND event.date >= '2009-10-15'
GROUP BY event.id, event.date
ORDER BY event.date, users.rank DESC
LIMIT 0, 20
)eventdate
ON eventdate.uid = source.uid
AND eventdate.date = source.date;
and the explain is
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+-------+---------------------------------+
| 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 20 | |
| 1 | PRIMARY | source | ref | iddate_idx | iddate_idx | 7 | eventdate.id,eventdate.date | 156 | |
| 2 | DERIVED | event | ALL | latlong_idx | NULL | NULL | NULL | 19500 | Using temporary; Using filesort |
| 2 | DERIVED | types | ref | eid_idx | eid_idx | 4 | active.event.id | 10674 | Using index |
| 2 | DERIVED | users | eq_ref | id_idx | id_idx | 4 | active.types.id | 1 | Using where |
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+-------+---------------------------------+
I've tried using 'force index' on latlong, but that doesn't seem to speed things up at all.
Is it the derived table that is causing the slow responses? If so, is there a way to improve the performance of this?
--------EDIT-------------
I've attempted to improve the formatting to make it more readable, as well
I run the same query changing only the 'WHERE statement as
WHERE users.id = (
SELECT users.id
FROM users
WHERE uidname = 'frankt1'
ORDER BY users.approved DESC , users.rank DESC
LIMIT 1 )
AND date & gt ; = '2009-10-15'
GROUP BY date
ORDER BY date)
That query runs in 0.006 seconds
the explain looks like
+----+-------------+------------+-------+---------------+---------------+---------+------------------------------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+---------------+---------+------------------------------+------+----------------+
| 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 42 | |
| 1 | PRIMARY | source | ref | iddate_idx | iddate_idx | 7 | eventdate.id,eventdate.date | 156 | |
| 2 | DERIVED | users | const | id_idx | id_idx | 4 | | 1 | |
| 2 | DERIVED | event | range | eiddate_idx | eiddate_idx | 7 | NULL | 24 | Using where |
| 2 | DERIVED | types | ref | eid_idx | eid_idx | 4 | active.event.bid | 3 | Using index |
| 3 | SUBQUERY | users | ALL | idname_idx | idname_idx | 767 | | 5 | Using filesort |
+----+-------------+------------+-------+---------------+---------------+---------+------------------------------+------+----------------+

The only way to clean up that mammoth SQL statement is to go back to the drawing board and carefully work though your database design and requirements. As soon as you start joining 6 tables and using an inner select you should expect incredible execution times.
As a start, ensure that all your id fields are indexed, but better to ensure that your design is valid. I don't know where to START looking at your SQL - even after I reformatted it for you.
Note that 'using indexes' means you need to issue the correct instructions when you CREATE or ALTER the tables you are using. See for instance MySql 5.0 create indexes

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008