Improve Slow MySQL Query - mysql

I have a database table containing events.
mysql> describe events;
+-------------+------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------------------+----------------+
| device | varchar(32) | YES | MUL | NULL | |
| psu | varchar(32) | YES | MUL | NULL | |
| event | varchar(32) | YES | MUL | NULL | |
| down_time | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| up_time | timestamp | NO | MUL | 0000-00-00 00:00:00 | |
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
+-------------+------------------+------+-----+---------------------+----------------+
6 rows in set (0.01 sec)
I want to find events that overlap in time, and I use the following query:
SELECT *
FROM link_events a
JOIN link_events b
ON ( a.down_time <= b.up_time )
AND ( a.up_time >= b.down_time )
WHERE (a.device = 'd1' AND b.device = 'd2')
AND (a.psu = 'p1' AND b.psu = 'p2')
AND (a.event = 'e1' AND b.event = 'e2');
+-------------+-----------+------------+---------------------+---------------------+--------+-------------+-----------+------------+---------------------+---------------------+--------+
| device | psu | event | down_time | up_time | id | device | psu | event | down_time | up_time | id |
+-------------+-----------+------------+---------------------+---------------------+--------+-------------+-----------+------------+---------------------+---------------------+--------+
| d1 | p1 | e1 | 2013-01-14 16:42:10 | 2013-01-14 16:43:00 | 374529 | d2 | p2 | e2 | 2013-01-14 16:42:14 | 2013-01-14 16:42:18 | 211570 |
| d1 | p1 | e1 | 2013-05-29 18:49:26 | 2013-05-30 12:31:15 | 374569 | d2 | p2 | e2 | 2013-05-30 08:48:20 | 2013-05-30 08:48:27 | 211787 |
| d1 | p1 | e1 | 2013-05-29 18:49:26 | 2013-05-30 12:31:15 | 374569 | d2 | p2 | e2 | 2013-05-30 08:48:54 | 2013-05-30 08:48:58 | 211788 |
+-------------+-----------+------------+---------------------+---------------------+--------+-------------+-----------+------------+---------------------+---------------------+--------+
3 rows in set (35.88 sec)
The events table contains the following number of rows:
mysql> select count(*) from events;
+----------+
| count(*) |
+----------+
| 977759 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from events where device = 'd1' and psu = 'p1' and event = 'e1';
+----------+
| count(*) |
+----------+
| 11397 |
+----------+
1 row in set (0.12 sec)
mysql> select count(*) from events where device = 'd2' and psu = 'p2' and event = 'e2';
+----------+
| count(*) |
+----------+
| 243 |
+----------+
1 row in set (0.00 sec)
The database is installed on a Windows 7 laptop and uses the MyISAM engine.
Is there a way to better organise the database or change the indexing to
improve the query time, which for the first run is 35 seconds? Repeating the
same query gives an immediate result; however, if I run FLUSH TABLES and
repeat the query a third time, it again takes 35 seconds.
Any help appreciated!
Here is the output from EXPLAIN after ADD KEY:
mysql> EXPLAIN
-> SELECT *
->
-> FROM link_events a
-> JOIN link_events b
->
-> ON ( a.down_time <= b.up_time )
-> AND ( a.up_time >= b.down_time )
->
-> WHERE (a.device = 'd1' AND b.device = 'd2')
-> AND (a.psu = 'l1' AND b.psu = 'l2')
-> AND (a.event = 'e1' AND b.event = 'e2');
+----+-------------+-------+------+--------------------------------------------------------------------------------+---------------+---------+-------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+--------------------------------------------------------------------------------+---------------+---------+-------------------+------+-----------------------+
| 1 | SIMPLE | b | ref | device,psu,event,down_time,up_time,device_2,device_3 | device_2 | 297 | const,const,const | 180 | Using index condition |
| 1 | SIMPLE | a | ref | device,psu,event,down_time,up_time,device_2,device_3 | device_2 | 297 | const,const,const | 7744 | Using index condition |
+----+-------------+-------+------+--------------------------------------------------------------------------------+---------------+---------+-------------------+------+-----------------------+
2 rows in set (0.07 sec)
New column:
mysql> describe link_events;
+-------------+------------------+------+-----+---------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------------------+-----------------------------+
| device_name | varchar(32) | YES | MUL | NULL | |
| link_name | varchar(32) | YES | MUL | NULL | |
| event_type | varchar(32) | YES | MUL | NULL | |
| down_time | timestamp | NO | MUL | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| up_time | timestamp | NO | MUL | 0000-00-00 00:00:00 | |
| span | geometry | NO | MUL | NULL | |
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
+-------------+------------------+------+-----+---------------------+-----------------------------+
7 rows in set (0.03 sec)
EXPLAIN:
mysql> EXPLAIN
->
-> SELECT
->
-> CONCAT('Link1','-', 'Link2') overlaps,
-> GREATEST(a.down_time,b.down_time) AS downtime,
-> LEAST(a.up_time,b.up_time) AS uptime,
-> TIME_TO_SEC(TIMEDIFF( LEAST(a.up_time,b.up_time),
-> GREATEST(a.down_time,b.down_time))) AS duration
->
-> FROM link_events a
-> JOIN link_events b
->
-> ON Intersects (a.span, b.span)
->
-> WHERE (a.device_name = 'd1' AND b.device_name = 'd2')
-> AND (a.link_name = 'l1' AND b.link_name = 'l2')
-> AND (a.event_type = 'e1' AND b.event_type = 'e1');
+----+-------------+-------+------+-------------------------------------------------------------------+---------------+---------+-------------------+-------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-------------------------------------------------------------------+---------------+---------+-------------------+-------+------------------------------------+
| 1 | SIMPLE | a | ref | span,device_name,link_name,event_type,device_name_2,device_name_3 | device_name_2 | 297 | const,const,const | 383 | Using index condition |
| 1 | SIMPLE | b | ref | span,device_name,link_name,event_type,device_name_2,device_name_3 | device_name_2 | 297 | const,const,const | 14580 | Using index condition; Using where |
+----+-------------+-------+------+-------------------------------------------------------------------+---------------+---------+-------------------+-------+------------------------------------+
2 rows in set (0.09 sec)
Using Intersects, the query takes 1 min 12 sec. Why is it slower?

For this query:
SELECT *
FROM link_events a JOIN
link_events b
ON (a.down_time <= b.up_time) AND (a.up_time >= b.down_time)
WHERE (a.device = 'd1' AND b.device = 'd2') AND
(a.psu = 'p1' AND b.psu = 'p2') AND
(a.event = 'e1' AND b.event = 'e2');
You want an index on link_events(device, psu, event, up_time, down_time). For clarity, I would express the query more like this:
SELECT *
FROM link_events a JOIN
link_events b
ON (a.down_time <= b.up_time) AND (a.up_time >= b.down_time)
WHERE (a.device, a.psu, a.event) IN (('d1', 'p1', 'e1')) AND
(b.device, b.psu, b.event) IN (('d2', 'p2', 'e2'));
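A minimal sketch of the overlap self-join with the suggested composite index, using Python's built-in sqlite3 as a stand-in for MySQL (so the plan text and timings will not match MyISAM). Four rows come from the question's output; the fifth row and the index name idx_dpe_times are made up. Timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, which compares chronologically.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL link_events table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE link_events (
    id        INTEGER PRIMARY KEY,
    device    TEXT,
    psu       TEXT,
    event     TEXT,
    down_time TEXT NOT NULL,  -- 'YYYY-MM-DD HH:MM:SS' text sorts chronologically
    up_time   TEXT NOT NULL
);
-- Equality columns first, then the time columns (index name is hypothetical).
CREATE INDEX idx_dpe_times ON link_events (device, psu, event, up_time, down_time);
""")

conn.executemany("INSERT INTO link_events VALUES (?, ?, ?, ?, ?, ?)", [
    (374529, "d1", "p1", "e1", "2013-01-14 16:42:10", "2013-01-14 16:43:00"),
    (374569, "d1", "p1", "e1", "2013-05-29 18:49:26", "2013-05-30 12:31:15"),
    (211570, "d2", "p2", "e2", "2013-01-14 16:42:14", "2013-01-14 16:42:18"),
    (211787, "d2", "p2", "e2", "2013-05-30 08:48:20", "2013-05-30 08:48:27"),
    (900001, "d2", "p2", "e2", "2013-06-01 00:00:00", "2013-06-01 00:00:05"),  # made up: overlaps nothing
])

OVERLAP_SQL = """
SELECT a.id, b.id
FROM link_events a
JOIN link_events b
  ON a.down_time <= b.up_time     -- a goes down before b comes back up
 AND a.up_time   >= b.down_time   -- a comes back up after b goes down
WHERE a.device = ? AND a.psu = ? AND a.event = ?
  AND b.device = ? AND b.psu = ? AND b.event = ?
ORDER BY a.id, b.id
"""
params = ("d1", "p1", "e1", "d2", "p2", "e2")
pairs = conn.execute(OVERLAP_SQL, params).fetchall()

# The human-readable detail text is the last column of each EXPLAIN QUERY PLAN row.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + OVERLAP_SQL, params)]
print(pairs)  # [(374529, 211570), (374569, 211787)]
print(plan)
```

The equality columns lead the index so both sides of the join can be narrowed to one (device, psu, event) group before the time comparison is applied.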

Try:
ALTER TABLE link_events ADD KEY(device,psu,event,up_time),
ADD KEY(device,psu,event,down_time);
Hopefully this will be selective enough. If this does not help, post the results of EXPLAIN so we can make sure the optimizer is doing the best it can, and we will go from there if needed.
Edit:
It is important to understand that not all indexes are of equal value for a particular query. A common mistake is to think of an index as some magic worker that will automatically speed up the query if you just reference the column in the index. This is not quite the case. The keys need to be designed, and the queries need to be written, in a way that allows the best possible access path to the records. Changing something that might appear insignificant, such as the order of the columns in the index, or writing SQRT(x) = 4.4 instead of x = 4.4 * 4.4, could make the index unusable and slow the query down by a factor of a thousand, a million or more.
I highly recommend reading this:
http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html
Having a feel for how MySQL uses keys can save you a lot of trouble in the future.
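The point about SQRT(x) = 4.4 versus x = 4.4 * 4.4 can be checked with a small experiment. This sketch uses Python's sqlite3 as a stand-in for MySQL (plan wording differs between engines and SQLite versions) and registers a hypothetical my_sqrt() function from Python; it only compares the two query plans, it does not time them.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical helper standing in for MySQL's SQRT(); not every SQLite build
# ships sqrt(), so register one from Python.
conn.create_function("my_sqrt", 1, math.sqrt)
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, x REAL, payload TEXT);
CREATE INDEX idx_x ON t (x);
""")
conn.executemany("INSERT INTO t (x, payload) VALUES (?, ?)",
                 [(float(i), "row %d" % i) for i in range(1000)])

def plan(sql, params=()):
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN rows into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# The column is hidden inside a function call: SQLite cannot search idx_x.
wrapped = plan("SELECT * FROM t WHERE my_sqrt(x) = ?", (4.4,))
# The bare column is compared with a precomputed constant: idx_x is searched.
bare = plan("SELECT * FROM t WHERE x = ?", (4.4 * 4.4,))
print("function on column:", wrapped)
print("bare column:       ", bare)
```

The first plan is a full scan and the second is an index search, even though both ask for "the same" rows.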
EDIT 2 - another idea is to add a column span GEOMETRY NOT NULL, with SPATIAL KEY (span), containing LineString(Point(up_time, 0), Point(down_time, 0)). The times would need to be numeric (you can convert them using UNIX_TIMESTAMP(), for example). Then use Intersects(a.span, b.span) in the query. With some fine tuning, this has the potential to be much faster than even the improved query, because span intersections are detected using a geometry-based algorithm designed specifically for such things.
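Because every point in these spans has y = 0, Intersects() on them reduces to a one-dimensional test, which is the same condition as the original ON clause (touching endpoints count, matching <= and >=). Any speed-up therefore has to come from the SPATIAL index rather than from the test itself. A minimal Python sketch of the predicate alone, assuming a hypothetical to_epoch() as a UTC stand-in for UNIX_TIMESTAMP() and using event times from the question:

```python
from datetime import datetime, timezone
from typing import Tuple

Span = Tuple[float, float]

def to_epoch(ts: str) -> float:
    """Hypothetical stand-in for MySQL's UNIX_TIMESTAMP(); treats the value as UTC."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return dt.replace(tzinfo=timezone.utc).timestamp()

def span(down_time: str, up_time: str) -> Span:
    """x-range of the flat LineString between the two time points (either order)."""
    x1, x2 = to_epoch(down_time), to_epoch(up_time)
    return (min(x1, x2), max(x1, x2))

def intersects(s1: Span, s2: Span) -> bool:
    """Two segments on the same line meet iff neither ends before the other starts."""
    return s1[0] <= s2[1] and s2[0] <= s1[1]

a = span("2013-01-14 16:42:10", "2013-01-14 16:43:00")  # event 374529
b = span("2013-01-14 16:42:14", "2013-01-14 16:42:18")  # event 211570
c = span("2013-05-30 08:48:20", "2013-05-30 08:48:27")  # event 211787
print(intersects(a, b), intersects(a, c))  # True False
```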

Related

Optimizing join on derived table - EXPLAIN different on local and server

I have the following ugly query, which runs okay but not great on my local machine (1.4 sec, MySQL 5.7). On the server I'm using, which runs an older version of MySQL (5.5), the query just hangs. It seems to get stuck on "Copying to tmp table":
SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
p.street_number,
p.street_name,
p.site_address_city_state,
p.number_of_units,
p.number_of_stories,
p.bedrooms,
p.bathrooms,
p.lot_area_sqft,
p.cost_per_sq_ft,
p.year_built,
p.sales_date,
p.sales_price,
p.id
FROM (
SELECT APN, property_case_detail_id FROM property_inspection AS pi
GROUP BY APN, property_case_detail_id
HAVING
COUNT(IF(status='Resolved Date', 1, NULL)) = 0
) as open_cases
JOIN property AS p
ON p.parcel_number = open_cases.APN
LIMIT 0, 1000;
mysql> show processlist;
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| 21120 | headsupcity | localhost | lead_housing | Query | 21 | Copying to tmp table | SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
p.street_numbe |
| 21121 | headsupcity | localhost | lead_housing | Query | 0 | NULL | show processlist |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
The EXPLAIN output differs between my local machine and the server, and I'm assuming the only reason my query runs at all locally is the key that is automatically created on the derived table:
Explain (local):
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| 1 | PRIMARY | p | NULL | ALL | NULL | NULL | NULL | NULL | 40319 | 100.00 | Using temporary |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 8 | lead_housing.p.parcel_number | 40 | 100.00 | NULL |
| 2 | DERIVED | pi | NULL | ALL | NULL | NULL | NULL | NULL | 1623978 | 100.00 | Using temporary; Using filesort |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
Explain (server):
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| 1 | PRIMARY | p | ALL | NULL | NULL | NULL | NULL | 41369 | Using temporary |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 122948 | Using where; Distinct; Using join buffer |
| 2 | DERIVED | pi | ALL | NULL | NULL | NULL | NULL | 1718586 | Using temporary; Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
Schemas:
mysql> explain property_inspection;
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| lblCaseNo | int(11) | NO | MUL | NULL | |
| APN | bigint(10) | NO | MUL | NULL | |
| date | varchar(50) | NO | | NULL | |
| status | varchar(500) | NO | | NULL | |
| property_case_detail_id | int(11) | YES | MUL | NULL | |
| case_type_id | int(11) | YES | MUL | NULL | |
| date_modified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| update_status | tinyint(1) | YES | | 1 | |
| created_date | datetime | NO | | NULL | |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
10 rows in set (0.02 sec)
mysql> explain property; (not all columns, but you get the gist)
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| parcel_number | bigint(10) | NO | | 0 | |
| date_modified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| created_date | datetime | NO | | NULL | |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
Variables that might be relevant:
tmp_table_size: 16777216
innodb_buffer_pool_size: 8589934592
Any ideas on how to optimize this, and why the EXPLAIN plans are so different?
Since this derived table is where the two optimizers behave quite differently, let's try to optimize it:
(
SELECT APN, property_case_detail_id FROM property_inspection AS pi
GROUP BY APN, property_case_detail_id
HAVING
COUNT(IF(status='Resolved Date', 1, NULL)) = 0
) as open_cases
Give this a try:
SELECT ...
FROM property AS p
WHERE NOT EXISTS ( SELECT 1 FROM property_inspection
WHERE status = 'Resolved Date'
AND p.parcel_number = APN )
ORDER BY ??? -- without this, the `LIMIT` is unpredictable
LIMIT 0, 1000;
or...
SELECT ...
FROM property AS p
LEFT JOIN property_inspection AS pi ON p.parcel_number = pi.APN
AND pi.status = 'Resolved Date' -- must be in ON, not WHERE
WHERE pi.APN IS NULL
ORDER BY ??? -- without this, the `LIMIT` is unpredictable
LIMIT 0, 1000;
Index:
property_inspection: INDEX(status, APN) -- in either order
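A minimal SQLite sketch (standing in for MySQL) of the NOT EXISTS and LEFT JOIN ... IS NULL anti-joins on made-up data, checking that they agree. The status filter has to sit in the LEFT JOIN's ON clause: in WHERE it would discard the NULL-extended rows that pi.APN IS NULL is meant to keep, so the query would return nothing. Note that both forms drop a parcel as soon as any of its inspections is resolved, whereas the question's HAVING query works per (APN, property_case_detail_id) group; both also keep parcels with no inspections at all. The index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE property (id INTEGER PRIMARY KEY, parcel_number INTEGER NOT NULL);
CREATE TABLE property_inspection (
    id INTEGER PRIMARY KEY,
    APN INTEGER NOT NULL,
    property_case_detail_id INTEGER,
    status TEXT NOT NULL
);
-- Suggested index, using the table's real column name (APN).
CREATE INDEX idx_status_apn ON property_inspection (status, APN);
""")
conn.executemany("INSERT INTO property VALUES (?, ?)", [(1, 100), (2, 200), (3, 300)])
conn.executemany(
    "INSERT INTO property_inspection (APN, property_case_detail_id, status) VALUES (?, ?, ?)",
    [(100, 1, "Open"), (100, 1, "Resolved Date"),   # parcel 100: resolved
     (200, 2, "Open"),                              # parcel 200: still open
     (300, 3, "Open"), (300, 3, "Inspected")])      # parcel 300: still open

NOT_EXISTS_SQL = """
SELECT p.parcel_number
FROM property AS p
WHERE NOT EXISTS (SELECT 1 FROM property_inspection AS pi
                  WHERE pi.status = 'Resolved Date'
                    AND pi.APN = p.parcel_number)
ORDER BY p.parcel_number
"""

# Status test in ON: unmatched parcels survive as NULL-extended rows.
LEFT_JOIN_SQL = """
SELECT p.parcel_number
FROM property AS p
LEFT JOIN property_inspection AS pi
       ON pi.APN = p.parcel_number
      AND pi.status = 'Resolved Date'
WHERE pi.APN IS NULL
ORDER BY p.parcel_number
"""

not_exists = [r[0] for r in conn.execute(NOT_EXISTS_SQL)]
left_join = [r[0] for r in conn.execute(LEFT_JOIN_SQL)]
print(not_exists, left_join)  # [200, 300] [200, 300]
```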
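MySQL 5.5 and 5.7 are quite different, and the latter has a better optimizer, so it is no surprise that the EXPLAIN plans differ. In particular, 5.6 and later can add an index to a materialized derived table, which is the <auto_key0> in your local plan; 5.5 cannot, so it scans <derived2> in full with a join buffer.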
Please also provide the output of SHOW CREATE TABLE property; and SHOW CREATE TABLE property_inspection;, as it shows the indexes that exist on your tables.
Your subquery is the issue.
- The server tries to process 1.6M rows with no index, grouping everything.
- HAVING is quite an expensive operation, so you'd better avoid it, especially in subqueries.
- Grouping is a bad idea in this case. You do not need the aggregation/counting; you only need to check whether a 'Resolved Date' status exists.
Based on the information provided, I'd recommend:
- Alter table property_inspection to reduce the length of the status column.
- Add an index on that column. Use a covering index (APN, property_case_detail_id, status) if possible (in this column order).
- Change the query to something like this:
SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
...
p.id
FROM
property_inspection AS `pi1`
INNER JOIN property AS p ON (
p.parcel_number = `pi1`.APN
)
LEFT JOIN (
SELECT
`pi2`.property_case_detail_id
, `pi2`.APN
FROM
property_inspection AS `pi2`
WHERE
`status` = 'Resolved Date'
) AS exclude ON (
exclude.APN = `pi1`.APN
AND exclude.property_case_detail_id = `pi1`.property_case_detail_id
)
WHERE
exclude.APN IS NULL
LIMIT
0, 1000;
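A minimal SQLite sketch, on made-up data, checking that this per-case anti-join returns the same parcels as the question's GROUP BY ... HAVING derived table. It selects only parcel_number and omits LIMIT; CASE WHEN replaces MySQL's IF() so the SQL runs on SQLite, and the index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE property (id INTEGER PRIMARY KEY, parcel_number INTEGER NOT NULL);
CREATE TABLE property_inspection (
    id INTEGER PRIMARY KEY,
    APN INTEGER NOT NULL,
    property_case_detail_id INTEGER,
    status TEXT NOT NULL
);
-- Covering index in the column order the answer suggests.
CREATE INDEX idx_apn_case_status
    ON property_inspection (APN, property_case_detail_id, status);

INSERT INTO property VALUES (1, 100), (2, 200);
INSERT INTO property_inspection (APN, property_case_detail_id, status) VALUES
    (100, 1, 'Open'), (100, 1, 'Resolved Date'),  -- case 1 of parcel 100: closed
    (100, 2, 'Open'),                             -- case 2 of parcel 100: still open
    (200, 3, 'Open'), (200, 3, 'Resolved Date');  -- parcel 200's only case: closed
""")

# The question's derived table: (APN, case) groups without a 'Resolved Date' row.
HAVING_SQL = """
SELECT DISTINCT p.parcel_number
FROM (SELECT APN, property_case_detail_id
      FROM property_inspection
      GROUP BY APN, property_case_detail_id
      HAVING COUNT(CASE WHEN status = 'Resolved Date' THEN 1 END) = 0) AS open_cases
JOIN property AS p ON p.parcel_number = open_cases.APN
ORDER BY p.parcel_number
"""

# The answer's rewrite: drop inspection rows whose (APN, case) has a resolved row.
ANTI_JOIN_SQL = """
SELECT DISTINCT p.parcel_number
FROM property_inspection AS pi1
JOIN property AS p ON p.parcel_number = pi1.APN
LEFT JOIN (SELECT pi2.property_case_detail_id, pi2.APN
           FROM property_inspection AS pi2
           WHERE pi2.status = 'Resolved Date') AS exclude
       ON exclude.APN = pi1.APN
      AND exclude.property_case_detail_id = pi1.property_case_detail_id
WHERE exclude.APN IS NULL
ORDER BY p.parcel_number
"""

having = [r[0] for r in conn.execute(HAVING_SQL)]
anti_join = [r[0] for r in conn.execute(ANTI_JOIN_SQL)]
print(having, anti_join)  # [100] [100]
```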

Strange performance degradation when using a subquery

I have a nano AWS server running MySQL 5.5 for testing purposes. So, keep in mind that the server has limited resources (RAM, CPU, ...).
I have a table called "gpslocations". There is a primary index on its primary key "GPSLocationID". There is another secondary index on one of its fields "userID". The table has 6583 records.
When I run this query:
select * from gpslocations where GPSLocationID in (select max(GPSLocationID) from gpslocations where userID in (1,9) group by userID);
I get two rows and it takes a lot of time:
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
| GPSLocationID | lastUpdate | latitude | longitude | phoneNumber | userID | sessionID | speed | direction | distance | gpsTime | locationMethod | accuracy | extraInfo | eventType |
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
| 4107 | 2018-09-25 16:38:44 | 58.7641435 | 7.4868510 | e5d6fdff-9afe-44bb-a53a-3b454b12c9c6 | 9 | 77385f89-6b72-4b9e-b937-d2927959e0bd | 0 | 0 | 2.9 | 2018-09-25 18:38:43 | fused | 455 | 0 | android |
| 9822 | 2018-10-22 10:29:43 | 58.7794353 | 7.1952995 | 5240853e-2c36-4563-9dc3-238039de411e | 1 | 1fcad5af-c6ef-4bda-8fb2-d6e5688cf08a | 0 | 0 | 185.6 | 2018-10-22 12:29:41 | fused | 129 | 0 | android |
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
2 rows in set (14.96 sec)
When I just execute the inner select:
select max(GPSLocationID) from gpslocations where userID in (1,9) group by userID;
I get two values very fast:
+--------------------+
| max(GPSLocationID) |
+--------------------+
| 9822 |
| 4107 |
+--------------------+
2 rows in set (0.00 sec)
When I take these two values and write them manually in the outer select:
select * from gpslocations where GPSLocationID in (9822,4107);
I get exactly the same result as the first query but in no time!
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
| GPSLocationID | lastUpdate | latitude | longitude | phoneNumber | userID | sessionID | speed | direction | distance | gpsTime | locationMethod | accuracy | extraInfo | eventType |
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
| 4107 | 2018-09-25 16:38:44 | 58.7641435 | 7.4868510 | e5d6fdff-9afe-44bb-a53a-3b454b12c9c6 | 9 | 77385f89-6b72-4b9e-b937-d2927959e0bd | 0 | 0 | 2.9 | 2018-09-25 18:38:43 | fused | 455 | 0 | android |
| 9822 | 2018-10-22 10:29:43 | 58.7794353 | 7.1952995 | 5240853e-2c36-4563-9dc3-238039de411e | 1 | 1fcad5af-c6ef-4bda-8fb2-d6e5688cf08a | 0 | 0 | 185.6 | 2018-10-22 12:29:41 | fused | 129 | 0 | android |
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
2 rows in set (0.00 sec)
Can anybody explain this huge performance degradation when the two simple and fast queries are combined in one?
EDIT
Here is the output of explain:
+----+--------------------+--------------+-------+----------------------+--------+---------+------+------+---------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------+-------+----------------------+--------+---------+------+------+---------------------------------------+
| 1 | PRIMARY | gpslocations | ALL | NULL | NULL | NULL | NULL | 6648 | Using where |
| 2 | DEPENDENT SUBQUERY | gpslocations | range | userNameIndex,userID | userID | 5 | NULL | 11 | Using where; Using index for group-by |
+----+--------------------+--------------+-------+----------------------+--------+---------+------+------+---------------------------------------+
2 rows in set (0.00 sec)
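IN (subquery) can have really bad optimization characteristics. In your version of MySQL, the subquery is probably being run once for every row in gpslocations; the DEPENDENT SUBQUERY line in your EXPLAIN points that way. I think this performance problem was fixed in later versions (MySQL 5.6 added materialization and semi-join strategies for IN subqueries).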
I recommend using a correlated subquery instead:
select l.*
from gpslocations l
where l.GPSLocationID = (select max(l2.GPSLocationID)
from gpslocations l2
where l2.userID = l.userId
) and
l.userID in (1, 9);
And for this, you want an index on gpslocations(userID, GPSLocationID).
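A minimal SQLite sketch of the correlated form with the suggested (userID, GPSLocationID) index. IDs 4107 and 9822 and their coordinates come from the question; the other rows and the index name are made up. SQLite's planner is not MySQL 5.5's, so this only checks the result, not the timing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gpslocations (
    GPSLocationID INTEGER PRIMARY KEY,
    userID INTEGER,
    latitude REAL,
    longitude REAL
);
-- The (userID, GPSLocationID) index recommended above.
CREATE INDEX idx_user_loc ON gpslocations (userID, GPSLocationID);
""")
conn.executemany("INSERT INTO gpslocations VALUES (?, ?, ?, ?)", [
    (4100, 9, 58.7600000, 7.4800000),
    (4107, 9, 58.7641435, 7.4868510),
    (9800, 1, 58.7700000, 7.1900000),
    (9822, 1, 58.7794353, 7.1952995),
    (9900, 5, 59.0000000, 7.0000000),  # a user we do not ask about
])

LATEST_SQL = """
SELECT l.GPSLocationID, l.userID
FROM gpslocations AS l
WHERE l.userID IN (1, 9)
  AND l.GPSLocationID = (SELECT MAX(l2.GPSLocationID)   -- latest row of this user
                         FROM gpslocations AS l2
                         WHERE l2.userID = l.userID)
ORDER BY l.GPSLocationID
"""
latest = conn.execute(LATEST_SQL).fetchall()
print(latest)  # [(4107, 9), (9822, 1)]
```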
Another alternative is the join approach:
select l.*
from gpslocations l join
(select l2.userID, max(l2.GPSLocationID) as maxid
from gpslocations l2
where l2.userID in (1, 9)
group by l2.userID
) l2
on l2.userID = l.userID and l2.maxid = l.GPSLocationID;
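The derived table needs GROUP BY l2.userID (otherwise it collapses to a single row), a name for the MAX() column, and a join on that column as well as on userID; without the last part every location of each user would come back. A minimal SQLite sketch with made-up rows (4107 and 9822 from the question; the index name is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gpslocations (GPSLocationID INTEGER PRIMARY KEY, userID INTEGER);
CREATE INDEX idx_user_loc ON gpslocations (userID, GPSLocationID);
INSERT INTO gpslocations VALUES (4100, 9), (4107, 9), (9800, 1), (9822, 1), (9900, 5);
""")

JOIN_SQL = """
SELECT l.GPSLocationID, l.userID
FROM gpslocations AS l
JOIN (SELECT l2.userID, MAX(l2.GPSLocationID) AS maxid
      FROM gpslocations AS l2
      WHERE l2.userID IN (1, 9)
      GROUP BY l2.userID) AS m           -- one row per requested user
  ON m.userID = l.userID
 AND m.maxid = l.GPSLocationID           -- keep only that user's latest row
ORDER BY l.GPSLocationID
"""
latest = conn.execute(JOIN_SQL).fetchall()
print(latest)  # [(4107, 9), (9822, 1)]
```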

MySQL join with subquery

This is my schema:
mysql> describe stocks;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| symbol | varchar(32) | NO | | NULL | |
| date | datetime | NO | | NULL | |
| value | float(10,3) | NO | | NULL | |
| contracts | int(8) | NO | | NULL | |
| open | float(10,3) | NO | | NULL | |
| close | float(10,3) | NO | | NULL | |
| high | float(10,3) | NO | | NULL | |
| low | float(10,3) | NO | | NULL | |
+-----------+-------------+------+-----+---------+----------------+
9 rows in set (0.03 sec)
I added the columns open and close, and I want to fill them with data from the table itself.
The open/close values refer to each day (so the min/max id of each day should give me the correct value). So my first idea is to get the list of dates and then LEFT JOIN it with the table:
SELECT DISTINCT(DATE(date)) as date FROM stocks
But I'm stuck because I can't get the min/max ID or the first/last value. Thanks
You can get the per-day min and max ids with the query below:
SELECT DATE_FORMAT(date, "%d/%m/%Y"),min(id) as min_id,max(id) as max_id FROM stocks group by DATE_FORMAT(date, "%d/%m/%Y")
But the other requirement is not clear.
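A minimal SQLite sketch of the per-day grouping, on made-up rows. It groups on DATE(date), which both MySQL and SQLite provide and which the asker's final UPDATE also uses; unlike DATE_FORMAT(date, "%d/%m/%Y"), it also sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (id INTEGER PRIMARY KEY, date TEXT NOT NULL, value REAL NOT NULL)")
conn.executemany("INSERT INTO stocks VALUES (?, ?, ?)", [
    (1, "2013-01-14 09:00:00", 10.0),
    (2, "2013-01-14 12:00:00", 11.5),
    (3, "2013-01-14 17:00:00", 10.8),
    (4, "2013-01-15 09:00:00", 10.9),
    (5, "2013-01-15 17:00:00", 12.1),
])

# One group per calendar day; MIN/MAX(id) are the day's first and last rows
# only if ids grow with time, as the question assumes.
PER_DAY_SQL = """
SELECT DATE(date) AS day, MIN(id) AS min_id, MAX(id) AS max_id
FROM stocks
GROUP BY DATE(date)
ORDER BY day
"""
per_day = conn.execute(PER_DAY_SQL).fetchall()
print(per_day)  # [('2013-01-14', 1, 3), ('2013-01-15', 4, 5)]
```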
Solved!
mysql> UPDATE stocks s JOIN
-> (SELECT k.date, k.value as v1, y.value as v2 FROM (SELECT x.date, x.min_id, x.max_id, stocks.value FROM (SELECT DATE(date) as date,min(id) as min_id,max(id) as max_id FROM stocks group by DATE(date)) AS x LEFT JOIN stocks ON x.min_id = stocks.id) AS k LEFT JOIN stocks y ON k.max_id = y.id) sd
-> ON DATE(s.date) = sd.date
-> SET s.open = sd.v1, s.close = sd.v2;
Query OK, 995872 rows affected (1 min 50.38 sec)
Rows matched: 995872 Changed: 995872 Warnings: 0
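SQLite does not accept MySQL's multi-table UPDATE ... JOIN syntax, so this sketch of the same open/close fill uses correlated subqueries instead, on made-up rows. That form is SQLite-only: MySQL refuses a subquery that reads the table being updated (error 1093), which is presumably why the MySQL version goes through a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE stocks (
    id INTEGER PRIMARY KEY, date TEXT NOT NULL, value REAL NOT NULL,
    open REAL, close REAL)""")
conn.executemany("INSERT INTO stocks (id, date, value) VALUES (?, ?, ?)", [
    (1, "2013-01-14 09:00:00", 10.0),
    (2, "2013-01-14 12:00:00", 11.5),
    (3, "2013-01-14 17:00:00", 10.8),
    (4, "2013-01-15 09:00:00", 10.9),
    (5, "2013-01-15 17:00:00", 12.1),
])

# open = value of the day's lowest id, close = value of the day's highest id.
# Inside the subqueries, "stocks" refers to the row being updated.
conn.execute("""
UPDATE stocks
SET open  = (SELECT s2.value FROM stocks AS s2
             WHERE s2.id = (SELECT MIN(s3.id) FROM stocks AS s3
                            WHERE DATE(s3.date) = DATE(stocks.date))),
    close = (SELECT s2.value FROM stocks AS s2
             WHERE s2.id = (SELECT MAX(s3.id) FROM stocks AS s3
                            WHERE DATE(s3.date) = DATE(stocks.date)))
""")
rows = conn.execute("SELECT id, open, close FROM stocks ORDER BY id").fetchall()
print(rows)
```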

query doesn't use index when using IN subquery

You will notice that the primary query is NOT using the index on school_id, even though one exists. If I remove the subquery and use a hardcoded list, it does use the index.
mysql> explain SELECT year, race, CONCAT(percent,'%') as percent
-> FROM school_data_race_ethnicity as school_data_race_ethnicity_outer
-> WHERE school_id IN(
-> SELECT field_school_id_value
-> FROM field_data_field_school_id
-> WHERE entity_id IN (SELECT entity_id
-> FROM field_data_field_district
-> WHERE field_district_nid =
-> (SELECT entity_id FROM field_data_field_district_id
-> WHERE `field_district_id_value` = 26130106 LIMIT 1))
-> ) ORDER BY year DESC, race;
+----+--------------------+----------------------------------+----------------+------------------------------+-----------+---------+------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------------------------------+----------------+------------------------------+-----------+---------+------+-------+-----------------------------+
| 1 | PRIMARY | school_data_race_ethnicity_outer | ALL | NULL | NULL | NULL | NULL | 97116 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | field_data_field_school_id | ALL | NULL | NULL | NULL | NULL | 5325 | Using where |
| 3 | DEPENDENT SUBQUERY | field_data_field_district | index_subquery | entity_id,field_district_nid | entity_id | 4 | func | 1 | Using where |
| 4 | SUBQUERY | field_data_field_district_id | ALL | NULL | NULL | NULL | NULL | 685 | Using where |
+----+--------------------+----------------------------------+----------------+------------------------------+-----------+---------+------+-------+-----------------------------+
4 rows in set (0.00 sec)
mysql> describe school_data_race_ethnicity
-> ;
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| school_id | varchar(255) | NO | MUL | NULL | |
| year | int(11) | NO | MUL | NULL | |
| race | varchar(255) | NO | | NULL | |
| percent | decimal(5,2) | NO | | NULL | |
+-----------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql>
Use an INNER JOIN instead of the subquery with the IN clause:
explain SELECT year, race, CONCAT(percent,'%') as percent
FROM school_data_race_ethnicity a
INNER JOIN(
SELECT field_school_id_value
FROM field_data_field_school_id b
INNER JOIN (SELECT entity_id
FROM field_data_field_district
WHERE field_district_nid =
(SELECT entity_id FROM field_data_field_district_id
WHERE `field_district_id_value` = 26130106 LIMIT 1)) c
ON b.entity_id = c.entity_id
) d
ON a.school_id = d.field_school_id_value
ORDER BY year DESC, race;
The server is probably skipping the index seek because it cannot figure out how many items will actually be in the list. Without a good estimate of the number of items the subquery returns, it chooses the most conservative approach and goes with a scan (probably).
I'd suggest rewriting the query using JOINs instead:
SELECT DISTINCT o.year, o.race, CONCAT(o.percent,'%') as percent
FROM school_data_race_ethnicity as o
INNER JOIN field_data_field_school_id s
ON s.field_school_id_value = o.school_id
INNER JOIN field_data_field_district d
ON d.entity_id = s.entity_id
WHERE field_district_nid = (
SELECT entity_id
FROM field_data_field_district_id
WHERE `field_district_id_value` = 26130106 LIMIT 1)
ORDER BY o.year DESC, o.race;
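A minimal SQLite sketch, on made-up rows, comparing the original IN (...) query with the JOIN rewrite. Two entities point at school 'S1', so the plain join returns that school's row twice; DISTINCT removes the duplicate (it would also merge two schools' rows whose year, race and percent happen to coincide). percent is selected as-is to keep the SQL portable, and the index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE school_data_race_ethnicity (
    id INTEGER PRIMARY KEY, school_id TEXT NOT NULL,
    year INTEGER NOT NULL, race TEXT NOT NULL, percent REAL NOT NULL);
CREATE INDEX idx_school ON school_data_race_ethnicity (school_id);
CREATE TABLE field_data_field_school_id (entity_id INTEGER, field_school_id_value TEXT);
CREATE TABLE field_data_field_district (entity_id INTEGER, field_district_nid INTEGER);
CREATE TABLE field_data_field_district_id (entity_id INTEGER, field_district_id_value INTEGER);

INSERT INTO field_data_field_district_id VALUES (500, 26130106), (600, 11111111);
INSERT INTO field_data_field_district VALUES (10, 500), (11, 500), (12, 600), (13, 500);
-- Entities 10 and 13 both point at school 'S1'.
INSERT INTO field_data_field_school_id VALUES (10, 'S1'), (11, 'S2'), (12, 'S3'), (13, 'S1');
INSERT INTO school_data_race_ethnicity (school_id, year, race, percent) VALUES
    ('S1', 2012, 'Asian', 5.0), ('S2', 2012, 'White', 40.0), ('S3', 2012, 'Black', 12.5);
""")

IN_SQL = """
SELECT year, race, percent
FROM school_data_race_ethnicity
WHERE school_id IN (
    SELECT field_school_id_value FROM field_data_field_school_id
    WHERE entity_id IN (
        SELECT entity_id FROM field_data_field_district
        WHERE field_district_nid = (
            SELECT entity_id FROM field_data_field_district_id
            WHERE field_district_id_value = 26130106 LIMIT 1)))
ORDER BY year DESC, race
"""

JOIN_SQL = """
SELECT DISTINCT o.year, o.race, o.percent
FROM school_data_race_ethnicity AS o
JOIN field_data_field_school_id AS s ON s.field_school_id_value = o.school_id
JOIN field_data_field_district AS d ON d.entity_id = s.entity_id
WHERE d.field_district_nid = (
    SELECT entity_id FROM field_data_field_district_id
    WHERE field_district_id_value = 26130106 LIMIT 1)
ORDER BY o.year DESC, o.race
"""

in_rows = conn.execute(IN_SQL).fetchall()
join_rows = conn.execute(JOIN_SQL).fetchall()
plain_rows = conn.execute(JOIN_SQL.replace("DISTINCT ", "", 1)).fetchall()
print(in_rows, join_rows, len(plain_rows))
```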

Optimize SQL query (Facebook-like application)

My application is similar to Facebook, and I'm trying to optimize the query that gets a user's records. A user's records are the entries in which he is either the src or one of the dsts. The src is stored directly in usermuralentry; the dst list is in usermuralentry_user.
So an entry can have one src and many dsts.
I have these tables:
mysql> desc usermuralentry ;
+-----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_src_id | int(11) | NO | MUL | NULL | |
| private | tinyint(1) | NO | | NULL | |
| content | longtext | NO | | NULL | |
| date | datetime | NO | | NULL | |
| last_update | datetime | NO | | NULL | |
+-----------------+------------------+------+-----+---------+----------------+
10 rows in set (0.10 sec)
mysql> desc usermuralentry_user ;
+-------------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| usermuralentry_id | int(11) | NO | MUL | NULL | |
| userinfo_id | int(11) | NO | MUL | NULL | |
+-------------------+---------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
And the following query to retrieve the records of two users:
mysql> explain
SELECT *
FROM usermuralentry AS a
, usermuralentry_user AS b
WHERE a.user_src_id IN ( 1, 2 )
OR
(
a.id = b.usermuralentry_id
AND b.userinfo_id IN ( 1, 2 )
);
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | b | ALL | usermuralentry_id,usermuralentry_user_bcd7114e,usermuralentry_user_6b192ca7 | NULL | NULL | NULL | 147188 | |
| 1 | SIMPLE | a | ALL | PRIMARY | NULL | NULL | NULL | 1371289 | Range checked for each record (index map: 0x1) |
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------+---------+------+---------+------------------------------------------------+
2 rows in set (0.00 sec)
But it takes A LOT of time...
Any tips to optimize it? Could the table schema be better for my application?
Use this query
SELECT *
FROM usermuralentry AS a
left join usermuralentry_user AS b
on b.usermuralentry_id = a.id
WHERE a.user_src_id IN(1, 2)
OR (a.id = b.usermuralentry_id
AND b.userinfo_id IN(1, 2));
And here are some tips:
You are listing two tables in the FROM clause, and the OR lets rows match without the join condition, so you get a Cartesian product, which takes a lot of time and returns undesired rows. Always use explicit joins in this situation.
I think your join isn't properly formed, and you need to change the query to use UNION. The OR condition in the where clause is killing performance as well:
SELECT *
FROM usermuralentry AS a
JOIN usermuralentry_user AS b ON a.id = b.usermuralentry_id /* use explicit JOIN! */
WHERE a.user_src_id IN (1 , 2)
UNION
SELECT *
FROM usermuralentry AS a
JOIN usermuralentry_user AS b ON a.id = b.usermuralentry_id
WHERE b.userinfo_id IN (1, 2);
You also need an index: ALTER TABLE usermuralentry_user ADD INDEX (usermuralentry_id);
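A minimal SQLite sketch, on made-up rows, contrasting the question's comma join with a UNION of the two lookups. For brevity it returns only entry ids: the first branch finds entries where the user is the src, the second where the user is a dst, so neither branch needs a join. The index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usermuralentry (id INTEGER PRIMARY KEY, user_src_id INTEGER NOT NULL);
CREATE TABLE usermuralentry_user (
    id INTEGER PRIMARY KEY,
    usermuralentry_id INTEGER NOT NULL,
    userinfo_id INTEGER NOT NULL);
CREATE INDEX idx_src ON usermuralentry (user_src_id);
CREATE INDEX idx_dst ON usermuralentry_user (userinfo_id);

INSERT INTO usermuralentry VALUES (1, 1), (2, 3), (3, 2), (4, 4);
-- (entry, destination user): entry 2 is addressed to user 1.
INSERT INTO usermuralentry_user (usermuralentry_id, userinfo_id) VALUES (1, 7), (2, 1), (4, 5);
""")

# The question's comma join: rows matching the first OR branch pair with every b row.
cartesian_rows = conn.execute("""
SELECT COUNT(*) FROM usermuralentry AS a, usermuralentry_user AS b
WHERE a.user_src_id IN (1, 2)
   OR (a.id = b.usermuralentry_id AND b.userinfo_id IN (1, 2))
""").fetchone()[0]

# Each UNION branch is a simple lookup that can use its own index.
union_ids = [r[0] for r in conn.execute("""
SELECT a.id FROM usermuralentry AS a WHERE a.user_src_id IN (1, 2)
UNION
SELECT b.usermuralentry_id FROM usermuralentry_user AS b WHERE b.userinfo_id IN (1, 2)
ORDER BY 1
""")]

# Same set of entries via an explicit LEFT JOIN and the OR, for comparison.
or_ids = [r[0] for r in conn.execute("""
SELECT DISTINCT a.id
FROM usermuralentry AS a
LEFT JOIN usermuralentry_user AS b ON b.usermuralentry_id = a.id
WHERE a.user_src_id IN (1, 2) OR b.userinfo_id IN (1, 2)
ORDER BY a.id
""")]
print(cartesian_rows, union_ids, or_ids)  # 7 [1, 2, 3] [1, 2, 3]
```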