MySQL: differences between ordering an aggregate in a query vs. a subquery

I have two queries that order aggregated data:
Query 1:
SELECT * FROM (
SELECT idprovince, COUNT(*) total
FROM cities
JOIN persons USE INDEX (index_5) USING (idcity)
WHERE is_tutor = 'Y'
GROUP BY idprovince
) A
ORDER BY total DESC
Query 2:
SELECT idprovince, COUNT(*) total
FROM cities
JOIN persons USE INDEX (index_5) USING (idcity)
WHERE is_tutor = 'Y'
GROUP BY idprovince
ORDER BY total DESC
Query 1 returns data much faster than Query 2. My question is: what is the big difference between ordering in the query itself and ordering outside a subquery?
NOTE: my DB version is mysql-5.0.96-x64. There are about 400k rows in persons and 500 in cities.
UPDATE:
Output of the MySQL EXPLAIN command:
Query 1:
mysql> EXPLAIN
-> SELECT *
-> FROM (
-> SELECT idprovince, COUNT(*) total
-> FROM cities
-> JOIN persons USE INDEX (index_5) USING (idcity)
-> WHERE is_tutor = 'Y'
-> GROUP BY idprovince
-> ) A
-> ORDER BY total DESC
-> ;
+----+-------------+------------+--------+---------------+---------+---------+------------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+------------------------------------+--------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 34 | Using filesort |
| 2 | DERIVED | persons | ref | index_5 | index_5 | 2 | | 163316 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | cities | eq_ref | PRIMARY | PRIMARY | 4 | _myproject_lesaja_2.persons.idcity | 1 | |
+----+-------------+------------+--------+---------------+---------+---------+------------------------------------+--------+----------------------------------------------+
3 rows in set (1.22 sec)
Query 2:
mysql> EXPLAIN
-> SELECT idprovince, COUNT(*) total
-> FROM cities
-> JOIN persons USE INDEX (index_5) USING (idcity)
-> WHERE is_tutor = 'Y'
-> GROUP BY idprovince
-> ORDER BY total DESC;
+----+-------------+---------+-------+---------------+-------------+---------+-------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+-------------+---------+-------+--------+----------------------------------------------+
| 1 | SIMPLE | cities | index | PRIMARY | FK_cities_1 | 4 | NULL | 4 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | persons | ref | index_5 | index_5 | 2 | const | 163316 | Using where |
+----+-------------+---------+-------+---------------+-------------+---------+-------+--------+----------------------------------------------+
2 rows in set (0.00 sec)
Result Query 1:
mysql> SELECT *
-> FROM (
-> SELECT idprovince, COUNT(*) total
-> FROM cities
-> JOIN persons USE INDEX (index_5) USING (idcity)
-> WHERE is_tutor = 'Y'
-> GROUP BY idprovince
-> ) A
-> ORDER BY total DESC
-> ;
+------------+-------+
| idprovince | total |
+------------+-------+
| 35 | 15797 |
......................
......................
......................
| 76 | 2091 |
| 65 | 2018 |
+------------+-------+
34 rows in set (0.78 sec)
Result Query 2:
mysql> SELECT idprovince, COUNT(*) total
-> FROM cities
-> JOIN persons USE INDEX (index_5) USING (idcity)
-> WHERE is_tutor = 'Y'
-> GROUP BY idprovince
-> ORDER BY total DESC;
+------------+-------+
| idprovince | total |
+------------+-------+
| 35 | 15797 |
| 33 | 14413 |
| 12 | 13683 |
......................
......................
......................
| 34 | 2135 |
| 76 | 2091 |
| 65 | 2018 |
+------------+-------+
34 rows in set (8 min 25.80 sec)
SHOW PROFILE OUTPUT:
QUERY 1:
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000240 |
| Opening tables | 0.000043 |
| System lock | 0.000004 |
| Table lock | 0.000392 |
| optimizing | 0.000084 |
| statistics | 0.004455 |
| preparing | 0.000026 |
| Creating tmp table | 0.000221 |
| executing | 0.000002 |
| Copying to tmp table | 0.913722 |
| Sorting result | 0.000065 |
| Sending data | 0.000020 |
| removing tmp table | 0.000145 |
| Sending data | 0.000008 |
| init | 0.000017 |
| optimizing | 0.000002 |
| statistics | 0.000038 |
| preparing | 0.000007 |
| executing | 0.000001 |
| Sorting result | 0.000012 |
| Sending data | 0.000337 |
| end | 0.000002 |
| end | 0.000002 |
| query end | 0.000002 |
| freeing items | 0.000020 |
| closing tables | 0.000001 |
| removing tmp table | 0.000074 |
| closing tables | 0.000003 |
| logging slow query | 0.000001 |
| cleaning up | 0.000003 |
+----------------------+----------+
QUERY 2:
+----------------------+------------+
| Status | Duration |
+----------------------+------------+
| starting | 0.000195 |
| Opening tables | 0.000029 |
| System lock | 0.000004 |
| Table lock | 0.000011 |
| init | 0.000078 |
| optimizing | 0.000021 |
| statistics | 0.003399 |
| preparing | 0.000025 |
| Creating tmp table | 0.000259 |
| Sorting for group | 0.000007 |
| executing | 0.000001 |
| Copying to tmp table | 506.711308 |
| Sorting result | 0.000049 |
| Sending data | 0.000298 |
| end | 0.000004 |
| removing tmp table | 0.000150 |
| end | 0.000002 |
| end | 0.000002 |
| query end | 0.000002 |
| freeing items | 0.000013 |
| closing tables | 0.000003 |
| logging slow query | 0.000001 |
| logging slow query | 0.000042 |
| cleaning up | 0.000003 |
+----------------------+------------+
CREATE TABLE statements:
CREATE TABLE persons (
idperson INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
is_tutor ENUM('Y','N') NULL DEFAULT 'N',
name VARCHAR(64) NOT NULL,
...
idcity INT(10) UNSIGNED NOT NULL,
...
PRIMARY KEY (idperson),
UNIQUE INDEX index_3 (name) USING BTREE,
UNIQUE INDEX index_4 (email) USING BTREE,
INDEX index_5 (is_tutor),
...
CONSTRAINT FK_persons_1 FOREIGN KEY (idcity) REFERENCES cities (idcity)
)
ENGINE=InnoDB
AUTO_INCREMENT=414738;
CREATE TABLE cities (
idcity INT(10) UNSIGNED NOT NULL,
idprovince INT(10) UNSIGNED NOT NULL,
city VARCHAR(64) NOT NULL,
PRIMARY KEY (idcity),
UNIQUE INDEX index_3 (city),
INDEX FK_cities_1 (idprovince),
CONSTRAINT FK_cities_1 FOREIGN KEY (idprovince) REFERENCES provinces (idprovince)
)
ENGINE=InnoDB;

I am admittedly not an expert on this one, but looking at the MySQL documentation on ORDER BY Optimization, you have not just one but two unoptimized uses of ORDER BY in your Query No. 2:
SELECT idprovince, COUNT(*) total
FROM cities
JOIN persons USE INDEX (index_5) USING (idcity)
WHERE is_tutor = 'Y'
GROUP BY idprovince
ORDER BY total DESC
First:
The key used to fetch the rows
WHERE is_tutor = 'Y'
is not the same as the one used in the ORDER BY:
ORDER BY total DESC
Second:
The ORDER BY and GROUP BY expressions differ:
GROUP BY idprovince
ORDER BY total DESC
In both cases MySQL will not use an index to resolve the ORDER BY, although it can still use indexes to find the rows that match the WHERE clause.
Your Query No. 1, on the other hand, follows the optimized form of ORDER BY, with the ORDER BY applied outside the subquery.
This could be the reason Query No. 2 is far slower than Query No. 1.
Additionally, in both cases an index on idcity is virtually useless for resolving the ORDER BY, because the index is on idcity while the ORDER BY uses total, which is an aggregate result.
See discussion here also.
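A small sketch to check that the two forms are logically equivalent, so any difference lies in how the optimizer executes them. It uses Python's built-in sqlite3 module as a stand-in for MySQL 5.0 (an assumption: SQLite has no USE INDEX hint, so the hint is dropped) with hypothetical toy data. It cannot reproduce the timing gap. Note that the EXPLAIN output in the question also shows the two forms joining in a different order: the derived table reads persons first, while Query 2 reads cities first.

```python
import sqlite3

# SQLite stands in for MySQL here; the USE INDEX (index_5) hint is MySQL-only and is dropped.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (
    idcity     INTEGER PRIMARY KEY,
    idprovince INTEGER NOT NULL
);
CREATE TABLE persons (
    idperson INTEGER PRIMARY KEY,
    is_tutor TEXT NOT NULL DEFAULT 'N',
    idcity   INTEGER NOT NULL REFERENCES cities (idcity)
);
CREATE INDEX index_5 ON persons (is_tutor);
""")

# Hypothetical toy data: six cities spread over three provinces.
conn.executemany("INSERT INTO cities VALUES (?, ?)",
                 [(1, 35), (2, 35), (3, 33), (4, 33), (5, 12), (6, 12)])
persons = []
for idcity, tutors, others in [(1, 3, 0), (2, 2, 1), (3, 2, 0), (4, 1, 0), (5, 1, 2), (6, 0, 4)]:
    persons += [("Y", idcity)] * tutors + [("N", idcity)] * others
conn.executemany("INSERT INTO persons (is_tutor, idcity) VALUES (?, ?)", persons)

GROUPED = """
    SELECT idprovince, COUNT(*) AS total
    FROM cities
    JOIN persons USING (idcity)
    WHERE is_tutor = 'Y'
    GROUP BY idprovince
"""
query1 = f"SELECT * FROM ({GROUPED}) AS A ORDER BY total DESC"  # ORDER BY outside the derived table
query2 = GROUPED + " ORDER BY total DESC"                        # ORDER BY in the same query

rows1 = conn.execute(query1).fetchall()
rows2 = conn.execute(query2).fetchall()
print(rows1)  # [(35, 5), (33, 3), (12, 1)]
```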

Related

Counting unique rows with nested select - help me optimise

I have this weird query
SELECT t.something_id, t.platform, t.country, SUM(t.amnt) AS amountz
FROM ( SELECT something_id, platform, country, 1 AS amnt
FROM log_table
WHERE target_date = '2018-02-09'
GROUP BY (unique_key) ) t
GROUP BY t.something_id, t.country, t.platform
The log table has one row per unique player plus a counter that is updated if the player has multiple sessions. It relies on a unique index, so every day a separate row is inserted for each unique user and we can analyze the data. By now the table has grown quite a bit, and running this query to count yesterday's unique users is quite expensive.
Running EXPLAIN EXTENDED on the query gives me this result:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|----------|-------|
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 114441375 | 100.00 | Using temporary; Using filesort |
| 2 | DERIVED | log_table | index | NULL | idx_multi_column | 944 | NULL | 114441375 | 100.00 | Using where; Using index |
My table structure:
| Name | Type |
|------------- |-------------- |
| stat_id | int(8) |
| metric | tinyint(1) |
| platform | tinyint(1) |
| something_id | varchar(128) |
| target_date | date |
| country | varchar(2) |
| amount | int(100) |
| unique_key | varchar(180) |
| created | timestamp |
| modified | timestamp |
The index I'm using:
idx_multi_column = unique_key,target_date,country,platform,something_id
I'm aware that the outer SELECT that wraps the inner SELECT uses temporary storage, and because of the number of rows that slows things down a lot. Is there any way to improve this?
It looks like your query could be simplified using the aggregate function COUNT(DISTINCT ...):
SELECT
something_id,
platform,
country,
COUNT(DISTINCT unique_key) AS amountz
FROM log_table
WHERE target_date = '2018-02-09'
GROUP BY something_id, country, platform
If there are no duplicate unique_key values for a given something_id/platform/country, you can remove the DISTINCT keyword; this should improve performance.
I'm pretty sure this is the query you want (as GMB points out):
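To check that the rewrite returns the same counts as the original nested query, here is a sketch using Python's sqlite3 module as a stand-in for MySQL, with hypothetical rows. It assumes each unique_key maps to a single something_id/platform/country on a given day: the original inner GROUP BY unique_key picks an arbitrary row per key (MySQL allows this with ONLY_FULL_GROUP_BY disabled, and SQLite allows it too), so the two forms only agree under that assumption. The ORDER BY clauses are added only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE log_table (
    something_id TEXT,
    platform     INTEGER,
    country      TEXT,
    target_date  TEXT,
    unique_key   TEXT
)""")
# Hypothetical rows: 'u1' is logged twice on the same day with identical attributes.
conn.executemany("INSERT INTO log_table VALUES (?, ?, ?, ?, ?)", [
    ("game-a", 1, "US", "2018-02-09", "u1"),
    ("game-a", 1, "US", "2018-02-09", "u1"),
    ("game-a", 1, "US", "2018-02-09", "u2"),
    ("game-a", 2, "DE", "2018-02-09", "u3"),
    ("game-b", 1, "US", "2018-02-09", "u4"),
    ("game-a", 1, "US", "2018-02-08", "u5"),  # previous day, filtered out by WHERE
])

nested = """
SELECT t.something_id, t.platform, t.country, SUM(t.amnt) AS amountz
FROM (SELECT something_id, platform, country, 1 AS amnt
      FROM log_table
      WHERE target_date = '2018-02-09'
      GROUP BY unique_key) AS t
GROUP BY t.something_id, t.country, t.platform
ORDER BY 1, 2, 3
"""
count_distinct = """
SELECT something_id, platform, country, COUNT(DISTINCT unique_key) AS amountz
FROM log_table
WHERE target_date = '2018-02-09'
GROUP BY something_id, country, platform
ORDER BY 1, 2, 3
"""
old_rows = conn.execute(nested).fetchall()
new_rows = conn.execute(count_distinct).fetchall()
print(new_rows)  # [('game-a', 1, 'US', 2), ('game-a', 2, 'DE', 1), ('game-b', 1, 'US', 1)]
```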
SELECT something_id, platform, country,
COUNT(DISTINCT unique_key) AS amountz
FROM log_table
WHERE target_date = '2018-02-09'
GROUP BY something_id, country, platform
For performance, try an index on log_table(target_date, something_id, country, platform, unique_key).
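To see that the suggested index can serve the whole query, here is a sketch with Python's sqlite3 standing in for MySQL (an assumption; MySQL's EXPLAIN reports a covering index differently, as "Using index"). The index name idx_date_group is hypothetical. Because target_date is the leftmost column and is compared with =, the engine can seek straight to one day's rows, and since every referenced column is in the index, the table rows themselves need not be read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log_table (
    stat_id      INTEGER PRIMARY KEY,
    something_id TEXT,
    platform     INTEGER,
    country      TEXT,
    target_date  TEXT,
    unique_key   TEXT
);
-- Suggested index: the equality filter first, then the GROUP BY columns, then unique_key.
CREATE INDEX idx_date_group
    ON log_table (target_date, something_id, country, platform, unique_key);
""")

query = """
SELECT something_id, platform, country, COUNT(DISTINCT unique_key) AS amountz
FROM log_table
WHERE target_date = ?
GROUP BY something_id, country, platform
"""
# Each EXPLAIN QUERY PLAN row has four columns; the last one is the human-readable detail.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("2018-02-09",))]
for line in plan:
    print(line)  # e.g. a SEARCH of log_table using covering index idx_date_group
```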

SQL query performance varies even though the queries are the same

There are two tables, with the following structure:
mysql> desc product;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| brand | varchar(20) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
2 rows in set (0.02 sec)
mysql> desc sales;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| yearofsales | varchar(10) | YES | | NULL | |
| price | int(11) | YES | | NULL | |
+-------------+-------------+------+-----+---------+-------+
3 rows in set (0.01 sec)
Here id is the foreign key.
The queries are as follows:
1.
mysql> select brand,sum(price),yearofsales
from product p, sales s
where p.id=s.id
group by s.id,yearofsales;
+-------+------------+-------------+
| brand | sum(price) | yearofsales |
+-------+------------+-------------+
| Nike | 917504000 | 2012 |
| FF | 328990720 | 2010 |
| FF | 328990720 | 2011 |
| FF | 723517440 | 2012 |
+-------+------------+-------------+
4 rows in set (1.91 sec)
2.
mysql> select brand,tmp.yearofsales,tmp.sum
from product p
join (
select id,yearofsales,sum(price) as sum
from sales
group by yearofsales,id
) tmp on p.id=tmp.id ;
+-------+-------------+-----------+
| brand | yearofsales | sum |
+-------+-------------+-----------+
| Nike | 2012 | 917504000 |
| FF | 2011 | 328990720 |
| FF | 2012 | 723517440 |
| FF | 2010 | 328990720 |
+-------+-------------+-----------+
4 rows in set (1.59 sec)
My question: why does the second query take less time than the first one? I have also executed them multiple times, in different orders.
You can check the execution plans of the two queries and the indexes on the two tables to see why one query takes longer than the other. Also, you cannot run one simple test and trust the results: many factors affect query execution, such as the server being busy with something else while one query runs, which makes that run slower. You'll have to run both queries a large number of times and then compare the averages.
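A minimal timing harness along those lines, using Python's sqlite3 as a stand-in for MySQL; the synthetic sales rows and the run counts are hypothetical. It interleaves the two queries in rounds so caching or background load affects both similarly, and reports the median alongside the mean, since the median is less sensitive to an occasional slow run.

```python
import sqlite3
import statistics
import time

def time_query(conn, sql, runs):
    """Execute `sql` `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch all rows so the whole result is produced
        durations.append(time.perf_counter() - start)
    return durations

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE sales (id INTEGER, yearofsales TEXT, price INTEGER);
INSERT INTO product VALUES (1, 'Nike'), (2, 'FF');
""")
# Hypothetical synthetic sales rows for two products over three years.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(1 + i % 2, str(2010 + i % 3), i) for i in range(5000)])

q1 = """SELECT brand, SUM(price), yearofsales
        FROM product p, sales s
        WHERE p.id = s.id
        GROUP BY s.id, yearofsales"""
q2 = """SELECT brand, tmp.yearofsales, tmp.sum
        FROM product p
        JOIN (SELECT id, yearofsales, SUM(price) AS sum
              FROM sales GROUP BY yearofsales, id) tmp ON p.id = tmp.id"""

# Interleave the two queries in rounds so caches and background load affect both alike.
timings = {"q1": [], "q2": []}
for _ in range(5):
    timings["q1"] += time_query(conn, q1, runs=10)
    timings["q2"] += time_query(conn, q2, runs=10)

for name, durations in timings.items():
    print(f"{name}: mean {statistics.mean(durations) * 1000:.3f} ms, "
          f"median {statistics.median(durations) * 1000:.3f} ms, n={len(durations)}")
```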
However, it is highly recommended to use explicit joins instead of implicit joins:
SELECT brand, SUM(price), yearofsales
FROM product p
INNER JOIN sales s ON p.id = s.id
GROUP BY s.id, yearofsales;
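Both join styles describe the same join and return the same rows; the explicit form just keeps the join condition next to the join. A quick check with Python's sqlite3 (a stand-in for MySQL) and hypothetical rows; the ORDER BY is added only to make the comparison deterministic. The sales row with id 3 has no matching product, so the inner join drops it in both forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE sales (id INTEGER, yearofsales TEXT, price INTEGER);
INSERT INTO product VALUES (1, 'Nike'), (2, 'FF');
INSERT INTO sales VALUES (1, '2012', 100), (1, '2012', 50),
                         (2, '2010', 30), (2, '2011', 20),
                         (3, '2012', 99);  -- no matching product
""")

implicit_join = """
SELECT brand, SUM(price), yearofsales
FROM product p, sales s
WHERE p.id = s.id
GROUP BY s.id, yearofsales
ORDER BY s.id, yearofsales
"""
explicit_join = """
SELECT brand, SUM(price), yearofsales
FROM product p
INNER JOIN sales s ON p.id = s.id
GROUP BY s.id, yearofsales
ORDER BY s.id, yearofsales
"""
implicit_rows = conn.execute(implicit_join).fetchall()
explicit_rows = conn.execute(explicit_join).fetchall()
print(explicit_rows)  # [('Nike', 150, '2012'), ('FF', 30, '2010'), ('FF', 20, '2011')]
```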

Improve Slow Mysql Query

I have a database table containing events.
mysql> describe events;
+-------------+------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------------------+----------------+
| device | varchar(32) | YES | MUL | NULL | |
| psu | varchar(32) | YES | MUL | NULL | |
| event | varchar(32) | YES | MUL | NULL | |
| down_time | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| up_time | timestamp | NO | MUL | 0000-00-00 00:00:00 | |
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
+-------------+------------------+------+-----+---------------------+----------------+
6 rows in set (0.01 sec)
I want to find events that overlap in time, and I use the following query:
SELECT *
FROM link_events a
JOIN link_events b
ON ( a.down_time <= b.up_time )
AND ( a.up_time >= b.down_time )
WHERE (a.device = 'd1' AND b.device = 'd2')
AND (a.psu = 'p1' AND b.psu = 'p2')
AND (a.event = 'e1' AND b.event = 'e2');
+-------------+-----------+------------+---------------------+---------------------+--------+-------------+-----------+------------+---------------------+---------------------+--------+
| device | psu | event | down_time | up_time | id | device | psu | event | down_time | up_time | id |
+-------------+-----------+------------+---------------------+---------------------+--------+-------------+-----------+------------+---------------------+---------------------+--------+
| d1 | p1 | e1 | 2013-01-14 16:42:10 | 2013-01-14 16:43:00 | 374529 | d2 | p2 | e2 | 2013-01-14 16:42:14 | 2013-01-14 16:42:18 | 211570 |
| d1 | p1 | e1 | 2013-05-29 18:49:26 | 2013-05-30 12:31:15 | 374569 | d2 | p2 | e2 | 2013-05-30 08:48:20 | 2013-05-30 08:48:27 | 211787 |
| d1 | p1 | e1 | 2013-05-29 18:49:26 | 2013-05-30 12:31:15 | 374569 | d2 | p2 | e2 | 2013-05-30 08:48:54 | 2013-05-30 08:48:58 | 211788 |
+-------------+-----------+------------+---------------------+---------------------+--------+-------------+-----------+------------+---------------------+---------------------+--------+
3 rows in set (35.88 sec)
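The ON clause is the standard closed-interval overlap test: two intervals overlap exactly when each one starts no later than the other one ends. A minimal Python sketch of the same predicate, checked against the first result row above; the non-overlapping event is hypothetical.

```python
from datetime import datetime

def ts(text):
    """Parse a MySQL-style 'YYYY-MM-DD HH:MM:SS' timestamp."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

def overlaps(a_down, a_up, b_down, b_up):
    """Same test as the ON clause: a.down_time <= b.up_time AND a.up_time >= b.down_time."""
    return a_down <= b_up and a_up >= b_down

# First result row above: event 374529 on d1 and event 211570 on d2.
a = (ts("2013-01-14 16:42:10"), ts("2013-01-14 16:43:00"))
b = (ts("2013-01-14 16:42:14"), ts("2013-01-14 16:42:18"))
# Hypothetical event that starts one second after a ends.
c = (ts("2013-01-14 16:43:01"), ts("2013-01-14 16:43:05"))

print(overlaps(*a, *b))  # True
print(overlaps(*a, *c))  # False
```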
The events table contains the following number of rows:
mysql> select count(*) from events;
+----------+
| count(*) |
+----------+
| 977759 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from events where device = 'd1' and psu = 'p1' and event = 'e1';
+----------+
| count(*) |
+----------+
| 11397 |
+----------+
1 row in set (0.12 sec)
mysql> select count(*) from events where device = 'd2' and psu = 'p2' and event = 'e2';
+----------+
| count(*) |
+----------+
| 243 |
+----------+
1 row in set (0.00 sec)
The database is installed on a Windows 7 laptop and uses the MyISAM engine.
Is there a way to better organise the database or change the indexing to improve the query time, which is 35 secs for the first run? Repeating the same query gives an immediate result; however, if I run 'flush tables' and repeat the query a third time, it again takes 35 secs.
Any help appreciated!
Here is the output from EXPLAIN after ADD KEY:
mysql> EXPLAIN
-> SELECT *
->
-> FROM link_events a
-> JOIN link_events b
->
-> ON ( a.down_time <= b.up_time )
-> AND ( a.up_time >= b.down_time )
->
-> WHERE (a.device = 'd1' AND b.device = 'd2')
-> AND (a.psu = 'l1' AND b.psu = 'l2')
-> AND (a.event = 'e1' AND b.event = 'e2');
+----+-------------+-------+------+--------------------------------------------------------------------------------+---------------+---------+-------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+--------------------------------------------------------------------------------+---------------+---------+-------------------+------+-----------------------+
| 1 | SIMPLE | b | ref | device,psu,event,down_time,up_time,device_2,device_3 | device_2 | 297 | const,const,const | 180 | Using index condition |
| 1 | SIMPLE | a | ref | device,psu,event,down_time,up_time,device_2,device_3 | device_2 | 297 | const,const,const | 7744 | Using index condition |
+----+-------------+-------+------+--------------------------------------------------------------------------------+---------------+---------+-------------------+------+-----------------------+
2 rows in set (0.07 sec)
New column:
mysql> describe link_events;
+-------------+------------------+------+-----+---------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------------------+-----------------------------+
| device_name | varchar(32) | YES | MUL | NULL | |
| link_name | varchar(32) | YES | MUL | NULL | |
| event_type | varchar(32) | YES | MUL | NULL | |
| down_time | timestamp | NO | MUL | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| up_time | timestamp | NO | MUL | 0000-00-00 00:00:00 | |
| span | geometry | NO | MUL | NULL | |
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
+-------------+------------------+------+-----+---------------------+-----------------------------+
7 rows in set (0.03 sec)
EXPLAIN:
mysql> EXPLAIN
->
-> SELECT
->
-> CONCAT('Link1','-', 'Link2') overlaps,
-> GREATEST(a.down_time,b.down_time) AS downtime,
-> LEAST(a.up_time,b.up_time) AS uptime,
-> TIME_TO_SEC(TIMEDIFF( LEAST(a.up_time,b.up_time),
-> GREATEST(a.down_time,b.down_time))) AS duration
->
-> FROM link_events a
-> JOIN link_events b
->
-> ON Intersects (a.span, b.span)
->
-> WHERE (a.device_name = 'd1' AND b.device_name = 'd2')
-> AND (a.link_name = 'l1' AND b.link_name = 'l2')
-> AND (a.event_type = 'e1' AND b.event_type = 'e1');
+----+-------------+-------+------+-------------------------------------------------------------------+---------------+---------+-------------------+-------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-------------------------------------------------------------------+---------------+---------+-------------------+-------+------------------------------------+
| 1 | SIMPLE | a | ref | span,device_name,link_name,event_type,device_name_2,device_name_3 | device_name_2 | 297 | const,const,const | 383 | Using index condition |
| 1 | SIMPLE | b | ref | span,device_name,link_name,event_type,device_name_2,device_name_3 | device_name_2 | 297 | const,const,const | 14580 | Using index condition; Using where |
+----+-------------+-------+------+-------------------------------------------------------------------+---------------+---------+-------------------+-------+------------------------------------+
2 rows in set (0.09 sec)
The query using Intersects takes 1 min 12 secs?
For this query:
SELECT *
FROM link_events a JOIN
link_events b
ON (a.down_time <= b.up_time) AND (a.up_time >= b.down_time)
WHERE (a.device = 'd1' AND b.device = 'd2') AND
(a.psu = 'p1' AND b.psu = 'p2') AND
(a.event = 'e1' AND b.event = 'e2');
You want indexes on link_events(device, psu, event, up_time, down_time). For clarity, I would express the query more like this:
SELECT *
FROM link_events a JOIN
link_events b
ON (a.down_time <= b.up_time) AND (a.up_time >= b.down_time)
WHERE (a.device, a.psu, a.event) IN (('d1', 'p1', 'e1')) AND
(b.device, b.psu, b.event) IN (('d2', 'p2', 'e2'));
Try:
ALTER TABLE link_events ADD KEY(device,psu,event,up_time),
ADD KEY(device,psu,event,down_time)
Hopefully this will be selective enough. If this does not help, post the results of EXPLAIN so we can make sure the optimizer is doing the best it can, and we will go from there if needed.
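A sketch of how the suggested composite keys can serve both sides of the self-join, using Python's sqlite3 as a stand-in for MySQL (an assumption: plan wording differs from MySQL's EXPLAIN, CREATE INDEX replaces ALTER TABLE ... ADD KEY, and the index names are hypothetical). In SQLite's plan, the three equality columns narrow each side to one device/psu/event combination, and the trailing time column lets the inner side apply its range condition inside the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE link_events (
    id        INTEGER PRIMARY KEY,
    device    TEXT,
    psu       TEXT,
    event     TEXT,
    down_time TEXT NOT NULL,
    up_time   TEXT NOT NULL
);
-- SQLite counterparts of the two suggested ADD KEY clauses (index names are hypothetical).
CREATE INDEX idx_dpe_up   ON link_events (device, psu, event, up_time);
CREATE INDEX idx_dpe_down ON link_events (device, psu, event, down_time);
""")

query = """
SELECT *
FROM link_events a
JOIN link_events b
  ON a.down_time <= b.up_time AND a.up_time >= b.down_time
WHERE a.device = 'd1' AND b.device = 'd2'
  AND a.psu = 'p1' AND b.psu = 'p2'
  AND a.event = 'e1' AND b.event = 'e2'
"""
# The last column of each EXPLAIN QUERY PLAN row is the readable description of that step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for line in plan:
    print(line)  # one step per alias, each going through one of the idx_dpe_* indexes
```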
Edit:
It is important to understand that not all indexes are of equal value for a particular query. A common mistake is to think of an index as some magic worker that will automatically speed up the query if you just reference the column in the index. This is not quite the case. The keys need to be designed, and the queries need to be written, in a way that allows the best possible access path to the records. Changing something that might appear insignificant, such as the order of the columns in the index or writing SQRT(x) = 4.4 instead of x = 4.4 * 4.4, could make the index unusable and slow the query down by a factor of a thousand or even a million or more.
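The SQRT(x) = 4.4 point can be seen directly in a query plan. A sketch using Python's sqlite3 as a stand-in for MySQL; the table t, its columns and the index idx_x are hypothetical, and SQRT is registered from Python so the example does not depend on SQLite being built with its optional math functions. Wrapping the column in a function forces a full scan, while leaving the column bare lets the engine search the index.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, x REAL, payload TEXT);
CREATE INDEX idx_x ON t (x);
""")
# Register SQRT from Python so the sketch does not rely on SQLite's optional math functions.
conn.create_function("SQRT", 1, math.sqrt)

def plan(sql):
    """Return SQLite's query plan details joined into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

wrapped = plan("SELECT * FROM t WHERE SQRT(x) = 4.4")  # function applied to the indexed column
bare = plan("SELECT * FROM t WHERE x = 4.4 * 4.4")     # column left bare, arithmetic on the constant side

print(wrapped)  # a full SCAN of t (exact wording varies by SQLite version)
print(bare)     # a SEARCH of t using idx_x
```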
I highly recommend reading this:
http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html
Having a feel for how MySQL uses keys can save you a lot of trouble in the future.
EDIT 2: another idea is to add a column span GEOMETRY NOT NULL with SPATIAL KEY (span), containing linestring(point(up_time,0),point(down_time,0)). The times would need to be numeric (you can convert them using UNIX_TIMESTAMP(), for example). Then use Intersects(a.span,b.span) in the query. With some fine tuning this has the potential to be much faster than even the improved query, because span intersections are detected using a geometry-based algorithm specially designed for such things.

In a very large MySQL analytics table - should I index the timestamp?

I'm looking to improve the speed of queries on a very large MySQL analytics table that I have. This table tracks player counts on game servers, and the structure looks like this:
CREATE TABLE `server_tracker` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`ip` int(10) unsigned NOT NULL,
`port` smallint(5) unsigned NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`players` tinyint(3) unsigned NOT NULL,
`map` varchar(28) NOT NULL,
`portjoin` smallint(5) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_tracking_ip_port` (`ip`,`port`)
) ENGINE=InnoDB AUTO_INCREMENT=310729056 DEFAULT CHARSET=utf8 ROW_FORMAT=FIXED
This table is inserted into very frequently, with 10k+ servers being tracked 10+ times an hour. However, every hour the data is taken and averaged out, and put into an "averaged" table with basically the same structure.
Currently I have the IP/port pair set up as a key. However, it can sometimes be a tad slow during the hourly averaging, so I am curious whether it would be worth putting an index on the timestamp, which is frequently used to select data from a certain timeframe, like so:
SELECT `players`
FROM `server_tracker`
WHERE `ip` = x
AND `port` = x
AND `date` > NOW()
AND `date` < NOW() + INTERVAL 60 MINUTE
ORDER BY `id` DESC
This is the only type of query ran on this table. The table is only used for fetching the playercount from gameservers within a specific timeframe. The data is never updated or changed.
However, I am a bit new to all of this, and I am not sure whether putting an index on the timestamp would do much of anything. Just looking for some friendly advice.
Results of EXPLAIN SELECT players FROM server_tracker WHERE ip = x AND port = x AND date > NOW() AND date < NOW() + INTERVAL 60 MINUTE ORDER BY id DESC
+----+-------------+-----------------+------+----------------------+----------------------+---------+-------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+----------------------+----------------------+---------+-------------+-------+-------------+
| 1 | SIMPLE | server_tracker | ref | idx_tracking_ip_port | idx_tracking_ip_port | 6 | const,const | 15354 | Using where |
+----+-------------+-----------------+------+----------------------+----------------------+---------+-------------+-------+-------------+
One of the most important things to know about MySQL is that, with very few exceptions (such as index merge), only ONE INDEX per table can be used in a query.
So it does not help much to put a separate index on each column when all 4 fields are used in the WHERE clause.
Only a combined index over these fields helps.
The order of the fields is very important, and it also determines whether this index can be used for other queries.
An example:
An index on (field1, field2, field3) is used when the WHERE clause uses field1; field1 and field2; or field1, field2 and field3. The index is not used if the WHERE clause uses only field2, only field3, or field2 and field3. So always use the first field.
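A sketch of that leftmost-prefix rule applied to the question's own query, using Python's sqlite3 as a stand-in for MySQL (SQLite follows the same prefix rule for B-tree indexes in this setup, without ANALYZE statistics, but its plan wording differs from MySQL's EXPLAIN). The index name idx_ip_port_date and the parameter values are hypothetical. The equality columns (ip, port) come first and the range column (date) last, so the date range can still be applied inside the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE server_tracker (
    id      INTEGER PRIMARY KEY,
    ip      INTEGER NOT NULL,
    port    INTEGER NOT NULL,
    date    TEXT    NOT NULL,
    players INTEGER NOT NULL
);
-- Hypothetical combined index: equality columns first, the range column last.
CREATE INDEX idx_ip_port_date ON server_tracker (ip, port, date);
""")

def plan(sql, params=()):
    """Return SQLite's query plan details joined into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

# The question's query shape (hypothetical parameter values).
full = plan("SELECT players FROM server_tracker "
            "WHERE ip = ? AND port = ? AND date > ? AND date < ? ORDER BY id DESC",
            (1, 27015, "2024-01-01 00:00:00", "2024-01-01 01:00:00"))
leading = plan("SELECT players FROM server_tracker WHERE ip = ?", (1,))        # leftmost column only
skipped = plan("SELECT players FROM server_tracker WHERE port = ?", (27015,))  # leftmost column missing

print(full)     # index seek on ip, port and the date range
print(leading)  # still an index seek, on ip alone
print(skipped)  # full table scan: port alone is not a prefix of the index
```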
To find out easily how the query works, you can just run it with EXPLAIN, and you directly get the information whether an index is used, and which one. If there are several rows in the output, you can multiply the individual values in the rows column together as a rough indicator: the smaller this number, the better your query performs.
MariaDB [tmp]> EXPLAIN select * from content;
+------+-------------+---------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------+------+---------------+------+---------+------+------+-------+
| 1 | SIMPLE | content | ALL | NULL | NULL | NULL | NULL | 13 | |
+------+-------------+---------+------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)
MariaDB [tmp]>
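The rows-multiplication heuristic is simple arithmetic. As a sketch, here it is applied to the rows estimates from the two link_events EXPLAIN outputs earlier on this page; treat the products only as a rough indicator, since rows is itself an estimate.

```python
from math import prod  # Python 3.8+

# "rows" estimates copied from the two link_events EXPLAIN outputs shown earlier.
explain_rows = {
    "composite keys (device_2)": [180, 7744],
    "Intersects(a.span, b.span)": [383, 14580],
}
for label, rows in explain_rows.items():
    # The product approximates how many row combinations the nested-loop join examines.
    print(f"{label}: {prod(rows):,}")
# composite keys (device_2): 1,393,920
# Intersects(a.span, b.span): 5,584,140
```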
Alternatively, you can use the profiler to see how long the query spends in each stage, and then optimize the server accordingly.
An example:
MariaDB [(none)]> use tmp
Database changed
MariaDB [tmp]> SET PROFILING=ON;
Query OK, 0 rows affected (0.00 sec)
MariaDB [tmp]>
MariaDB [tmp]> SELECT * FROM content;
+----+------+---------------------+--------+------+--------------+------+------+------+------+
| id | Wert | Zeitstempel | WertID | aaa | d | e | wwww | n | ddd |
+----+------+---------------------+--------+------+--------------+------+------+------+------+
| 1 | 10 | 2001-01-01 00:00:00 | 1 | NULL | 1.5000 | NULL | NULL | 1 | NULL |
| 2 | 12.3 | 2001-01-01 00:01:00 | 2 | NULL | 2.5000 | NULL | NULL | 2 | NULL |
| 3 | 17.4 | 2001-01-01 00:02:00 | 3 | NULL | 123456.1250 | NULL | NULL | 3 | NULL |
| 4 | 10.9 | 2001-01-01 01:01:00 | 1 | NULL | 1000000.0000 | NULL | NULL | 4 | NULL |
| 5 | 15.4 | 2001-01-01 01:02:00 | 2 | NULL | NULL | NULL | NULL | 5 | NULL |
| 6 | 20.9 | 2001-01-01 01:03:00 | 3 | NULL | NULL | NULL | NULL | 6 | NULL |
| 7 | 22 | 2001-01-02 00:00:00 | 1 | NULL | NULL | NULL | NULL | 7 | NULL |
| 8 | 12.3 | 2001-01-02 00:01:00 | 2 | NULL | NULL | NULL | NULL | 8 | NULL |
| 9 | 17.4 | 2001-01-02 00:02:00 | 3 | NULL | NULL | NULL | NULL |
+----+------+---------------------+--------+------+--------------+------+------+------+------+
13 rows in set (0.00 sec)
MariaDB [tmp]>
MariaDB [tmp]> SHOW PROFILE;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000031 |
| checking permissions | 0.000005 |
| Opening tables | 0.000036 |
| After opening tables | 0.000004 |
| System lock | 0.000003 |
| Table lock | 0.000002 |
| After opening tables | 0.000005 |
| init | 0.000013 |
| optimizing | 0.000006 |
| statistics | 0.000013 |
| preparing | 0.000010 |
| executing | 0.000002 |
| Sending data | 0.000073 |
| end | 0.000003 |
| query end | 0.000003 |
| closing tables | 0.000006 |
| freeing items | 0.000003 |
| updating status | 0.000012 |
| cleaning up | 0.000003 |
+----------------------+----------+
19 rows in set (0.00 sec)
MariaDB [tmp]>

How can I speed up my query. subquery is too slow

The query I have is for an inventory table. The subquery join gets the total number of work orders for each inventory asset. If I run the base query with only the main joins (equipment type, vendor, location and room), it runs just fine, taking less than a second to return a result. With the subquery join, it takes 15 to 20 seconds to return a result.
Here is the full query:
SELECT `inventory`.inventory_id AS 'inventory_id',
`inventory`.media_tag AS 'media_tag',
`inventory`.asset_tag AS 'asset_tag',
`inventory`.idea_tag AS 'idea_tag',
`equipTypes`.equipment_type AS 'equipment_type',
`inventory`.equip_make AS 'equip_make',
`inventory`.equip_model AS 'equip_model',
`inventory`.equip_serial AS 'equip_serial',
`inventory`.sales_order AS 'sales_order',
`vendors`.vendor_name AS 'vendor_name',
`inventory`.purchase_order AS 'purchase_order',
`status`.status AS 'status',
`locations`.location_name AS 'location_name',
`rooms`.room_number AS 'room_number',
`inventory`.notes AS 'notes',
`inventory`.send_to AS 'send_to',
`inventory`.one_to_one AS 'one_to_one',
`enteredBy`.user_name AS 'user_name',
from_unixtime(`inventory`.enter_date, '%m/%d/%Y') AS 'enter_date',
from_unixtime(`inventory`.modified_date, '%m/%d/%Y') AS 'modified_date',
COALESCE(at.assets,0) AS assets
FROM mod_inventory_data AS `inventory`
LEFT JOIN mod_inventory_equip_types AS `equipTypes`
ON `equipTypes`.equip_type_id = `inventory`.equip_type_id
LEFT JOIN mod_vendors_main AS `vendors`
ON `vendors`.vendor_id = `inventory`.vendor_id
LEFT JOIN mod_inventory_status AS `status`
ON `status`.status_id = `inventory`.status_id
LEFT JOIN mod_locations_data AS `locations`
ON `locations`.location_id = `inventory`.location_id
LEFT JOIN mod_locations_rooms AS `rooms`
ON `rooms`.room_id = `inventory`.room_id
LEFT JOIN mod_users_data AS `enteredBy`
ON `enteredBy`.user_id = `inventory`.entered_by
LEFT JOIN
( SELECT asset_tag, count(*) AS assets
FROM mod_workorder_data
WHERE asset_tag IS NOT NULL
GROUP BY asset_tag ) AS at
ON at.asset_tag = inventory.asset_tag
ORDER BY inventory_id ASC LIMIT 0,20
The MySQL EXPLAIN output for this is:
+----+-------------+--------------------+--------+---------------+-----------+---------+-------------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+--------+---------------+-----------+---------+-------------------------------------+-------+---------------------------------+
| 1 | PRIMARY | inventory | ALL | NULL | NULL | NULL | NULL | 12612 | Using temporary; Using filesort |
| 1 | PRIMARY | equipTypes | eq_ref | PRIMARY | PRIMARY | 4 | spsd_woidbs.inventory.equip_type_id | 1 | |
| 1 | PRIMARY | vendors | eq_ref | PRIMARY | PRIMARY | 4 | spsd_woidbs.inventory.vendor_id | 1 | |
| 1 | PRIMARY | status | eq_ref | PRIMARY | PRIMARY | 4 | spsd_woidbs.inventory.status_id | 1 | |
| 1 | PRIMARY | locations | eq_ref | PRIMARY | PRIMARY | 4 | spsd_woidbs.inventory.location_id | 1 | |
| 1 | PRIMARY | rooms | eq_ref | PRIMARY | PRIMARY | 4 | spsd_woidbs.inventory.room_id | 1 | |
| 1 | PRIMARY | enteredBy | eq_ref | PRIMARY | PRIMARY | 4 | spsd_woidbs.inventory.entered_by | 1 | |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4480 | |
| 2 | DERIVED | mod_workorder_data | range | asset_tag | asset_tag | 13 | NULL | 15897 | Using where; Using index |
+----+-------------+--------------------+--------+---------------+-----------+---------+-------------------------------------+-------+---------------------------------+
Using MySql query profiling I get this:
+--------------------------------+------------+
| Status | Time |
+--------------------------------+------------+
| starting | 0.000020 |
| checking query cache for query | 0.000263 |
| Opening tables | 0.000034 |
| System lock | 0.000013 |
| Table lock | 0.000079 |
| optimizing | 0.000011 |
| statistics | 0.000138 |
| preparing | 0.000019 |
| executing | 0.000010 |
| Sorting result | 0.000004 |
| Sending data | 0.015103 |
| init | 0.000094 |
| optimizing | 0.000009 |
| statistics | 0.000049 |
| preparing | 0.000022 |
| Creating tmp table | 0.000104 |
| executing | 0.000009 |
| Copying to tmp table | 15.410168 |
| Sorting result | 0.009488 |
| Sending data | 0.000215 |
| end | 0.000006 |
| removing tmp table | 0.001997 |
| end | 0.000018 |
| query end | 0.000005 |
| freeing items | 0.000112 |
| storing result in query cache | 0.000011 |
| removing tmp table | 0.000022 |
| closing tables | 0.000036 |
| logging slow query | 0.000005 |
| logging slow query | 0.000005 |
| cleaning up | 0.000013 |
+--------------------------------+------------+
This shows me that the bottleneck is copying to the tmp table, but I am unsure how to speed this up. Are there server-side settings I can configure to make this faster? Are there changes to the existing query that would yield the same results faster?
It seems to me that the LEFT JOIN subquery would give the same resulting data matrix every time, so if it has to run that query for every row in the inventory list, I can see why it would be slow. Or does MySQL cache the subquery when it runs? I thought I read somewhere that MySQL does not cache subqueries. Is this true?
Any help is appreciated.
Here is what I did, which seems to be working well. I created a table called mod_workorder_counts with two fields: asset_tag, which is unique, and wo_count, which is an INT(3) field. I am populating that table with this query:
INSERT INTO mod_workorder_counts ( asset_tag, wo_count )
select s.asset_tag, ct
FROM
( SELECT t.asset_tag, count(*) as ct
FROM mod_workorder_data t
WHERE t.asset_tag IS NOT NULL
GROUP BY t.asset_tag
) as s
ON DUPLICATE KEY UPDATE mod_workorder_counts.wo_count = ct
This executed in 0.1580 seconds, which may be considered slightly slow, but not bad.
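The same refresh can be sketched in SQLite (used here as a stand-in for MySQL), whose INSERT ... ON CONFLICT ... DO UPDATE upsert (SQLite 3.24+) plays the role of ON DUPLICATE KEY UPDATE. The workorder_id column and the sample rows are hypothetical. Rerunning the refresh updates the existing counts instead of failing on the unique key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mod_workorder_data (
    workorder_id INTEGER PRIMARY KEY,  -- hypothetical key column
    asset_tag    TEXT
);
CREATE TABLE mod_workorder_counts (
    asset_tag TEXT PRIMARY KEY,        -- unique, so the upsert has a conflict target
    wo_count  INTEGER NOT NULL
);
""")
conn.executemany("INSERT INTO mod_workorder_data (asset_tag) VALUES (?)",
                 [("A1",), ("A1",), ("A2",), (None,)])

# SQLite upsert (3.24+) standing in for MySQL's ON DUPLICATE KEY UPDATE.
REFRESH_COUNTS = """
INSERT INTO mod_workorder_counts (asset_tag, wo_count)
SELECT asset_tag, COUNT(*)
FROM mod_workorder_data
WHERE asset_tag IS NOT NULL
GROUP BY asset_tag
ON CONFLICT (asset_tag) DO UPDATE SET wo_count = excluded.wo_count
"""
SHOW_COUNTS = "SELECT asset_tag, wo_count FROM mod_workorder_counts ORDER BY asset_tag"

conn.execute(REFRESH_COUNTS)
first = conn.execute(SHOW_COUNTS).fetchall()

conn.execute("INSERT INTO mod_workorder_data (asset_tag) VALUES ('A2')")
conn.execute(REFRESH_COUNTS)  # existing keys are updated instead of raising a duplicate-key error
second = conn.execute(SHOW_COUNTS).fetchall()

print(first)   # [('A1', 2), ('A2', 1)]
print(second)  # [('A1', 2), ('A2', 2)]
```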
Now when I run this modification of my original query:
SELECT `inventory`.inventory_id AS 'inventory_id',
`inventory`.media_tag AS 'media_tag',
`inventory`.asset_tag AS 'asset_tag',
`inventory`.idea_tag AS 'idea_tag',
`equipTypes`.equipment_type AS 'equipment_type',
`inventory`.equip_make AS 'equip_make',
`inventory`.equip_model AS 'equip_model',
`inventory`.equip_serial AS 'equip_serial',
`inventory`.sales_order AS 'sales_order',
`vendors`.vendor_name AS 'vendor_name',
`inventory`.purchase_order AS 'purchase_order',
`status`.status AS 'status',
`locations`.location_name AS 'location_name',
`rooms`.room_number AS 'room_number',
`inventory`.notes AS 'notes',
`inventory`.send_to AS 'send_to',
`inventory`.one_to_one AS 'one_to_one',
`enteredBy`.user_name AS 'user_name',
from_unixtime(`inventory`.enter_date, '%m/%d/%Y') AS 'enter_date',
from_unixtime(`inventory`.modified_date, '%m/%d/%Y') AS 'modified_date',
COALESCE(at.wo_count, 0) AS workorders
FROM mod_inventory_data AS `inventory`
LEFT JOIN mod_inventory_equip_types AS `equipTypes`
ON `equipTypes`.equip_type_id = `inventory`.equip_type_id
LEFT JOIN mod_vendors_main AS `vendors`
ON `vendors`.vendor_id = `inventory`.vendor_id
LEFT JOIN mod_inventory_status AS `status`
ON `status`.status_id = `inventory`.status_id
LEFT JOIN mod_locations_data AS `locations`
ON `locations`.location_id = `inventory`.location_id
LEFT JOIN mod_locations_rooms AS `rooms`
ON `rooms`.room_id = `inventory`.room_id
LEFT JOIN mod_users_data AS `enteredBy`
ON `enteredBy`.user_id = `inventory`.entered_by
LEFT JOIN mod_workorder_counts AS at
ON at.asset_tag = inventory.asset_tag
ORDER BY inventory_id ASC LIMIT 0,20
It executes in 0.0051 seconds. That puts the total for the two queries at 0.1631 seconds, under a fifth of a second, versus 15+ seconds with the original subquery.
If I include the field wo_count without COALESCE, I get NULL values for any asset tags that are not listed in the mod_workorder_counts table. COALESCE turns those NULLs into 0, which is what I want.
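A small sketch of the LEFT JOIN + COALESCE behaviour, using sqlite3 as a stand-in for MySQL with hypothetical rows and a reduced column list: asset tags with no row in the counts table come back as NULL from the join and as 0 after COALESCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mod_inventory_data (inventory_id INTEGER PRIMARY KEY, asset_tag TEXT);
CREATE TABLE mod_workorder_counts (asset_tag TEXT PRIMARY KEY, wo_count INTEGER NOT NULL);
INSERT INTO mod_inventory_data VALUES (1, 'A1'), (2, 'A2'), (3, 'A3'), (4, NULL);
INSERT INTO mod_workorder_counts VALUES ('A1', 2), ('A2', 1);
""")

rows = conn.execute("""
SELECT inventory.inventory_id,
       at.wo_count              AS raw_count,   -- NULL when no counts row matches
       COALESCE(at.wo_count, 0) AS workorders   -- NULL replaced by 0
FROM mod_inventory_data AS inventory
LEFT JOIN mod_workorder_counts AS at
       ON at.asset_tag = inventory.asset_tag
ORDER BY inventory.inventory_id
""").fetchall()
print(rows)  # [(1, 2, 2), (2, 1, 1), (3, None, 0), (4, None, 0)]
```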
Next, I will set it up so that whenever a work order is entered for an asset tag, the INSERT/UPDATE query updates the counts table at that time, so it doesn't run unnecessarily.
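One way to run that update automatically is an AFTER INSERT trigger. The sketch below uses SQLite as a stand-in for MySQL (trigger syntax differs slightly; a MySQL trigger body could use INSERT ... ON DUPLICATE KEY UPDATE instead). The trigger name and the workorder_id column are hypothetical. It only handles inserts: deleting a work order or changing its asset_tag would need matching triggers, or a periodic full refresh.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mod_workorder_data (workorder_id INTEGER PRIMARY KEY, asset_tag TEXT);
CREATE TABLE mod_workorder_counts (asset_tag TEXT PRIMARY KEY, wo_count INTEGER NOT NULL);

-- Hypothetical trigger: keeps the counts table in step with newly inserted work orders.
CREATE TRIGGER trg_workorder_count_ins
AFTER INSERT ON mod_workorder_data
WHEN NEW.asset_tag IS NOT NULL
BEGIN
    -- Create the counter row on first sight of this asset tag...
    INSERT OR IGNORE INTO mod_workorder_counts (asset_tag, wo_count)
    VALUES (NEW.asset_tag, 0);
    -- ...then increment it for this work order.
    UPDATE mod_workorder_counts
    SET wo_count = wo_count + 1
    WHERE asset_tag = NEW.asset_tag;
END;
""")
conn.executemany("INSERT INTO mod_workorder_data (asset_tag) VALUES (?)",
                 [("A1",), ("A2",), ("A1",), (None,)])

counts = conn.execute(
    "SELECT asset_tag, wo_count FROM mod_workorder_counts ORDER BY asset_tag").fetchall()
print(counts)  # [('A1', 2), ('A2', 1)]
```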