MySQL GROUP BY time interval optimization

I have a very large table (several hundred million rows) that stores test results along with a datetime and a foreign key to a related entity called 'link'. I need to group rows by time intervals of 10, 15, 20, 30, and 60 minutes, as well as filter by time and 'link_id'. I know this can be done with this query, as explained [here][1]:
SELECT time,AVG(RTT),MIN(RTT),MAX(RTT),COUNT(*) FROM trace
WHERE link_id=1 AND time>='2015-01-01' AND time <= '2015-01-30'
GROUP BY UNIX_TIMESTAMP(time) DIV 600;
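(The 600 here is the bucket width in seconds, so the other intervals are presumably the same query with a different divisor; for example, 15-minute buckets would be:)
SELECT time,AVG(RTT),MIN(RTT),MAX(RTT),COUNT(*) FROM trace
WHERE link_id=1 AND time>='2015-01-01' AND time <= '2015-01-30'
GROUP BY UNIX_TIMESTAMP(time) DIV 900;  -- 900 seconds = 15 minutes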
This solution worked, but it was extremely slow (about 10 seconds on average), so I tried adding a datetime column for each 'group by interval'. For example, the row:
id | time | rtt | link_id
1 | 2014-01-01 12:34:55.4034 | 154.3 | 2
became:
id | time | rtt | link_id | time_60 |time_30 ...
1 | 2014-01-01 12:34:55.4034 | 154.3 | 2 | 2014-01-01 12:00:00.00 | 2014-01-01 12:30:00.00 ...
and I get the intervals with the following query:
SELECT time_10,AVG(RTT),MIN(RTT),MAX(RTT),COUNT(*) FROM trace
WHERE link_id=1 AND time>='2015-01-01' AND time <= '2015-01-30'
GROUP BY time_10;
This query was at least 50% faster (about 5 seconds on average), but it is still pretty slow. How can I optimize it further?
EXPLAIN outputs this:
+----+-------------+------------+------+------------------------------------------------------------------------+----------------------------------------------------+---------+-------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+------------------------------------------------------------------------+----------------------------------------------------+---------+-------+---------+----------------------------------------------+
| 1 | SIMPLE | main_trace | ref | main_trace_link_id_c6febb11f84677f_fk_main_link_id,main_trace_e7549e3e | main_trace_link_id_c6febb11f84677f_fk_main_link_id | 4 | const | 1478359 | Using where; Using temporary; Using filesort |
+----+-------------+------------+------+------------------------------------------------------------------------+----------------------------------------------------+---------+-------+---------+----------------------------------------------+
and these are the table indexes:
+------------+------------+----------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+----------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| main_trace | 0 | PRIMARY | 1 | id | A | 2956718 | NULL | NULL | | BTREE | | |
| main_trace | 1 | main_trace_link_id_c6febb11f84677f_fk_main_link_id | 1 | link_id | A | 2 | NULL | NULL | | BTREE | | |
| main_trace | 1 | main_trace_07cc694b | 1 | time | A | 2956718 | NULL | NULL | | BTREE | | |
| main_trace | 1 | main_trace_e7549e3e | 1 | time_10 | A | 22230 | NULL | NULL | YES | BTREE | | |
| main_trace | 1 | main_trace_01af8333 | 1 | time_15 | A | 14783 | NULL | NULL | YES | BTREE | | |
| main_trace | 1 | main_trace_1681ff94 | 1 | time_20 | A | 10870 | NULL | NULL | YES | BTREE | | |
| main_trace | 1 | main_trace_f7c28c93 | 1 | time_30 | A | 6399 | NULL | NULL | YES | BTREE | | |
| main_trace | 1 | main_trace_0f29fcc5 | 1 | time_60 | A | 3390 | NULL | NULL | YES | BTREE | | |
+------------+------------+----------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

For this query:
SELECT time_10, AVG(RTT), MIN(RTT), MAX(RTT), COUNT(*)
FROM trace
WHERE link_id = 1 AND time >= '2015-01-01' AND time <= '2015-01-30'
GROUP BY time_10;
The best index is the covering index: trace(link_id, time, time_10, rtt).
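A minimal sketch of creating it (the index name is illustrative):
ALTER TABLE trace ADD INDEX link_time_time10_rtt (link_id, time, time_10, rtt);
Because every column the query touches is in the index, EXPLAIN should then report "Using index", and the rows never need to be fetched from the base table.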

A composite index on (id, time), followed perhaps by ANALYZE TABLE trace, would make it snappy.
That is just a suggestion, not a directive; ANALYZE TABLE can take hours to run on tables with millions of rows.
And suggesting index creation based on just one query is not a great idea. Presumably you have other queries, and every extra index is a drag on inserts and updates.

time <= '2015-01-30' stops at midnight on January 30, so it excludes nearly all of the 30th and all of the 31st; did you want that? This pattern works well and avoids many edge cases (e.g., leap years):
WHERE time >= '2015-01-01'
AND time < '2015-01-01' + INTERVAL 1 MONTH
If this is static data (such as a write-once Data Warehouse), you could make the query much faster by building and maintaining Summary Tables.
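A hedged sketch of what a summary table for the 10-minute interval might look like (names are illustrative; AVG is stored as SUM and COUNT so the rows stay mergeable):
CREATE TABLE trace_summary_10 (
    link_id INT NOT NULL,
    time_10 DATETIME NOT NULL,
    cnt INT UNSIGNED NOT NULL,
    sum_rtt DOUBLE NOT NULL,
    min_rtt DOUBLE NOT NULL,
    max_rtt DOUBLE NOT NULL,
    PRIMARY KEY (link_id, time_10)
);

-- populate once (or periodically, for newly arrived rows)
INSERT INTO trace_summary_10
SELECT link_id, time_10, COUNT(*), SUM(rtt), MIN(rtt), MAX(rtt)
FROM trace
GROUP BY link_id, time_10;

-- the report then scans a few thousand summary rows instead of millions
SELECT time_10, sum_rtt / cnt AS avg_rtt, min_rtt, max_rtt, cnt
FROM trace_summary_10
WHERE link_id = 1
  AND time_10 >= '2015-01-01'
  AND time_10 <  '2015-01-01' + INTERVAL 1 MONTH;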

Related

Index not used in query. How to improve performance?

I have this query:
SELECT *
FROM `av_cita`
JOIN `av_cita_cstm` ON `av_cita`.`id` = `av_cita_cstm`.`id_c`
WHERE av_cita.deleted = 0
This query takes over 120 seconds to finish, yet I have added all indexes.
When I ask for the execution plan:
explain SELECT * FROM `av_cita`
JOIN `av_cita_cstm` ON ( ( `av_cita`.`id` = `av_cita_cstm`.`id_c` ) )
WHERE av_cita.deleted = 0;
I get this:
+----+-------------+--------------+--------+----------------------+---------+---------+---------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+----------------------+---------+---------+---------------------------+--------+-------------+
| 1 | SIMPLE | av_cita | ALL | PRIMARY,delete_index | NULL | NULL | NULL | 192549 | Using where |
| 1 | SIMPLE | av_cita_cstm | eq_ref | PRIMARY | PRIMARY | 108 | rednacional_v2.av_cita.id | 1 | |
+----+-------------+--------------+--------+----------------------+---------+---------+---------------------------+--------+-------------+
delete_index is listed in the possible_keys column, but the key is null, and it doesn't use the index.
Table and index definitions:
+------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| id | char(36) | NO | PRI | NULL | |
| name | varchar(255) | YES | MUL | NULL | |
| date_entered | datetime | YES | MUL | NULL | |
| date_modified | datetime | YES | | NULL | |
| modified_user_id | char(36) | YES | | NULL | |
| created_by | char(36) | YES | MUL | NULL | |
| description | text | YES | | NULL | |
| deleted | tinyint(1) | YES | MUL | 0 | |
| assigned_user_id | char(36) | YES | MUL | NULL | |
+------------------+--------------+------+-----+---------+-------+
+---------+------------+--------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+--------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| av_cita | 0 | PRIMARY | 1 | id | A | 192786 | NULL | NULL | | BTREE | | |
| av_cita | 1 | delete_index | 1 | deleted | A | 2 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | name_index | 1 | name | A | 96393 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | date_entered_index | 1 | date_entered | A | 96393 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | created_by | 1 | created_by | A | 123 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | assigned_user_id | 1 | assigned_user_id | A | 1276 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | deleted_id | 1 | deleted | A | 2 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | deleted_id | 2 | id | A | 192786 | NULL | NULL | | BTREE | | |
| av_cita | 1 | id | 1 | id | A | 192786 | NULL | NULL | | BTREE | | |
+---------+------------+--------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
How can I improve the performance of this query?
The query is losing time on the join. I would strongly suggest creating an index on av_cita_cstm.id_c. The plan will then probably change to use that index for the av_cita_cstm table, which is much better than PRIMARY. As a consequence, PRIMARY will be used on av_cita.
I think that will bring a big improvement. You might still get more improvement if you make sure delete_index is defined with two fields, (deleted, id), and then move the WHERE condition of the SQL statement into the join condition. But I am not sure MySQL will see this as a possibility.
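A sketch of the index this answer proposes (the name is illustrative); note that the listing above already shows a composite deleted_id index on (deleted, id):
CREATE INDEX idx_av_cita_cstm_id_c ON av_cita_cstm (id_c);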
The index on deleted is not used probably because the optimizer has decided that a full table-scan is cheaper than using the index. MySQL tends to make this decision if the value you search for is found on about 20% or more of the rows in the table.
By analogy, think of the index at the back of a book. You can understand why common words like "the" aren't indexed. It would be easier to just read the book cover-to-cover than to flip back and forth to the index, which only tells you that "the" appears on a majority of pages.
If you think MySQL has made the wrong decision, you can make it pretend that a table-scan is more expensive than using a specific index:
SELECT *
FROM `av_cita` FORCE INDEX (delete_index)
JOIN `av_cita_cstm` ON `av_cita`.`id` = `av_cita_cstm`.`id_c`
WHERE av_cita.deleted = 0
Read http://dev.mysql.com/doc/refman/5.7/en/index-hints.html for more information about index hints. Don't overuse index hints; they're useful only in rare cases. Most of the time the optimizer makes the right decision.
Your EXPLAIN plan shows that your join to av_cita_cstm is already using a unique index (the clue is "type: eq_ref" and also the "rows: 1"). I don't think any new index is needed in that table.
I notice the EXPLAIN shows that the table-scan on av_cita examines an estimated 192,549 rows. I'm really surprised that this takes 120 seconds; on any reasonably powerful computer, it should run much faster.
That makes me wonder if you have something else that needs tuning or configuration on this server:
What other processes are running on the server? A lot of applications, perhaps? Are the other processes also running slowly on this server? Do you need to increase the power of the server, or move applications onto their own server?
If you're on MySQL 5.7, try querying the sys schema:
select * from sys.innodb_buffer_stats_by_table
where object_name like 'av_cita%';
Are there other costly SQL queries running concurrently?
Did you under-allocate MySQL's innodb_buffer_pool_size? If it's too small, it could be furiously recycling pages in RAM as it scans your table.
select @@innodb_buffer_pool_size;
Did you over-allocate innodb_buffer_pool_size? Once I helped tune a server that was running very slowly. It turned out they had a 4GB buffer pool, but only 1GB of physical RAM. The operating system was swapping like crazy, causing everything to run slowly.
Another thought: You have shown us the columns in av_cita, but not the table structure for av_cita_cstm. Why are you fetching SELECT *? Do you really need all the columns? Are there huge BLOB/TEXT columns in the latter table? If so, it could be reading a large amount of data from disk that you don't need.
When you ask SQL questions, it would help if you run
SHOW CREATE TABLE av_cita\G
SHOW TABLE STATUS LIKE 'av_cita'\G
And also run the same commands for the other table av_cita_cstm, and include the output in your question above.

What index should increase performance of select query?

This is the table structure:
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| visitor_hash | varchar(40) | YES | MUL | NULL | |
| uri | varchar(255) | YES | | NULL | |
| ip_address | char(15) | YES | MUL | NULL | |
| last_visit | datetime | YES | | NULL | |
| visits | int(11) | NO | | NULL | |
| object_app | varchar(255) | YES | MUL | NULL | |
| object_model | varchar(255) | YES | | NULL | |
| object_id | varchar(255) | YES | | NULL | |
| blocked | tinyint(1) | NO | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
This is the request:
SELECT `object_id`
FROM `visits_visit`
WHERE `object_model` = 'News'
GROUP BY `object_id`
ORDER BY COUNT( * ) DESC
LIMIT 0, 3
Time for response is ~77.63 ms.
CREATE INDEX resource_model ON visits_visit (object_model(100));
After adding this index, the response time increased to ~150 ms.
How to improve performance for this case? Thank you.
UPDATED:
Answering Michal Komorowski.
This is the explain before the index:
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | visits_visit | ALL | NULL | NULL | NULL | NULL | 142938 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
1 row in set (0.00 sec)
And this is after index:
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
| 1 | SIMPLE | visits_visit | ref | resource_model | resource_model | 303 | const | 64959 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
1 row in set (0.00 sec)
I don't know what this information tells me.
SELECT `object_id`
FROM `visits_visit`
WHERE `object_model` = 'News'
GROUP BY `object_id`
ORDER BY COUNT( * ) DESC
LIMIT 0, 3
78.85 ms before indexing and 365.59 ms after indexing.
I also have this index:
CREATE INDEX resource ON visits_visit (object_app(100), object_model(100), object_id(100));
But I need it, because other SELECT queries filter on these three columns in their WHERE clauses.
UPDATE:
I'm using the Django Debug Toolbar to test the performance of requests.
UPDATE:
Query:
ANALYZE TABLE visits_visit;
Output:
+-----------------------------+---------+----------+-----------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------------------+---------+----------+-----------------------------+
| **************.visits_visit | analyze | status | Table is already up to date |
+-----------------------------+---------+----------+-----------------------------+
1 row in set (0.00 sec)
UPDATE:
SHOW INDEXES FROM visits_visit;
Output:
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| visits_visit | 0 | PRIMARY | 1 | id | A | 142938 | NULL | NULL | | BTREE | | |
| visits_visit | 1 | visits_visit_0880babc | 1 | visitor_hash | A | 142938 | NULL | NULL | YES | BTREE | | |
| visits_visit | 1 | visits_visit_5325a746 | 1 | ip_address | A | 142938 | NULL | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 1 | object_app | A | 1 | 100 | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 2 | object_model | A | 3 | 100 | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 3 | object_id | A | 959 | 100 | NULL | YES | BTREE | | |
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
It seems to me that although you have an index, MySQL doesn't know how to use it properly. That happens when the information about data distribution (statistics) within a table is not up to date. To update it, run ANALYZE TABLE visits_visit and then check the results.
I was confused by my misunderstanding of SQL mechanisms, so I decided to create a Popular model and save instances in it every 24 hours. Thanks to everyone who tried to help.
As I said in your other question, prefix indexes are virtually useless; don't use them except in rare circumstances.
Shrink the fields to reasonable lengths and you won't be tempted to use prefix indexes.
The optimal index for that query is INDEX(object_model, object_id). Attempting to use INDEX(object_model(##), ...) will not get past object_model to anything after it.
If object_model is things like 'News', I suspect the other possible values are short, and perhaps there is a finite number of models. For "short" change to some smaller VARCHAR. For "finite", consider using ENUM('News', 'Weather', 'Sports', ...).
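A hedged sketch of both changes; the lengths and the ENUM value list are assumptions, so pick ones that fit your real data:
-- shrink the columns so full-column (non-prefix) indexes are practical
ALTER TABLE visits_visit
    MODIFY object_model VARCHAR(32),   -- or ENUM('News', ...) if the set of models is fixed
    MODIFY object_id VARCHAR(64);

-- the covering index for this query; note: no prefix lengths
CREATE INDEX model_id ON visits_visit (object_model, object_id);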
As for why it took longer after indexing...
Without the index, the Optimizer had no choice but to scan the entire table. This is a simple linear scan. It would read but not count any non-News rows.
With the index, the Optimizer has the additional choice of using the index. But, perhaps most rows are News? Well, it would scan the index (nice), but for each News item in the index, it would have to look up the row to get object_id (not so nice). It seems (from the timings) that the latter is less efficient.
By shrinking the declarations and using INDEX(object_model, object_id) (in this order), the query can be performed in the index. Think of the index as a mini-table with just those two columns in it. It is smaller. It is ordered by model, so it only needs to scan the 'News' part. The explain will show this "covering" by saying "Using index".
In all cases, the GROUP BY adds some overhead, either keeping a hash of object_id in RAM or saving intermediate results and sorting them. Then the ORDER BY requires a sort (or a priority queue) before the LIMIT can apply.

Wondering how I can speed up a MySQL call

I'm looking for a way to speed up my query on a table of 500,000 records.
I'm just setting BROKERAGE_STOCKS_COVERED to a COUNT of the number of times the same brokerage ESTIMID shows up within a date range, for each record, excluding the record being examined. The only other condition is that ANALYST is not blank.
I make a number of similar calls on the table, and they all come back in 10, maybe 15 seconds. The only difference between this call and the others is that this one returns a COUNT of up to 1000 for BROKERAGE_STOCKS_COVERED, whereas my other queries produce a COUNT of maybe 3 or 4. This one takes almost a whole hour:
UPDATE `working` SET `BROKERAGE_STOCKS_COVERED` =
(SELECT COUNT(`ID`)
FROM ( SELECT `ID`, `ESTIMID`, `ANNDATS_CONVERTED`,
`ANALYST`, `REVDATS_CONVERTED`
FROM `working`
) AS BB
WHERE
BB.`ANNDATS_CONVERTED` <= `working`.`ANNDATS_CONVERTED`
AND
BB.`REVDATS_CONVERTED` > `working`.`ANNDATS_CONVERTED`
AND
BB.`ID` != `working`.`ID`
AND
BB.`ESTIMID` = `working`.`ESTIMID`
AND
BB.`ANALYST` != ''
)
WHERE `working`.`ANALYST` != '';
-- On 500,000 rows: "457656 rows affected. (Query took 2782.4304 seconds.)" (46 min)
| ID | ANALYST | ESTIMID | ANNDATS_CONVERTED | REVDATS_CONVERTED | BROKERAGE_STOCKS_COVERED | NO_TOP_RATING |
--------------------------------------------------------------------------------------------------------------------
| 1 | DAVE | Brokerage000 | 1998-07-01 | 1998-07-04 | | 3 |
| 2 | DAVE | Brokerage000 | 1998-06-28 | 1998-07-10 | | 4 |
| 3 | DAVE | Brokerage000 | 1998-07-02 | 1998-07-08 | | 2 |
| 4 | DAVE | Brokerage000 | 1998-07-04 | 1998-12-04 | | 3 |
| 5 | SAM | Brokerage000 | 1998-06-14 | 1998-06-30 | | 4 |
| 6 | SAM | Brokerage000 | 1998-06-28 | 1999-08-08 | | 4 |
| 7 | | Brokerage000 | 1998-06-28 | 1999-08-08 | | 5 |
| 8 | DAVE | Brokerage111 | 1998-06-28 | 1999-08-08 | | 3 |
'EXPLAIN' results:
id| select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
----------------------------------------------------------------------------------------------------------------------------------------
1 | PRIMARY | working | index | ANALYST | PRIMARY | 4 | NULL | 467847 | Using where
2 | DEPENDENT SUBQUERY | <derived3> | ref | <auto_key0> | <auto_key0> | 92 | working.ESTIMID | 46785 | Using where
3 | DERIVED | working | ALL | NULL | NULL | NULL | NULL | 467847 | NULL
EXPLAIN
SELECT COUNT(`ID`) FROM (SELECT `ID`, `IRECCD`, `ANALYST`, `ESTIMID`, `ANNDATS_CONVERTED`, `REVDATS_CONVERTED` FROM `working`) AS BB
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
--------------------------------------------------------------------------------------------------
1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 462762 | NULL
2 | DERIVED | working | ALL | NULL | NULL | NULL | NULL | 462762 | NULL
EXPLAIN
SELECT COUNT(`ID`) FROM (SELECT `ID`, `IRECCD`, `ANALYST`, `ESTIMID`, `ANNDATS_CONVERTED`, `REVDATS_CONVERTED` FROM `working`) AS BB
WHERE
BB.`ANNDATS_CONVERTED` <= `ANNDATS_CONVERTED`
AND
BB.`REVDATS_CONVERTED` > `ANNDATS_CONVERTED`
AND
BB.`ID` != `ID`
AND
BB.`ESTIMID` = `ESTIMID`
AND
BB.`ANALYST` != ''
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
----------------------------------------------------------------------------------------------------
1 | PRIMARY |NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE
2 | DERIVED | working | ALL | NULL | NULL | NULL | NULL | 462762 | NULL
I think the "impossible WHERE" is just because it this part of the query is separated from the UPDATE for the purpose of displaying the "EXPLAIN
I am using InnoDB on a Windows 8 PHP/MySQL install.
My columns are indexed, I have MySQL's memory allocation maxed out on this Windows box, and it all works great.
- Just wondering if this is a normal wait time for such a query?
- And is there a way to speed this particular query up?
Generally, when attempting to optimize a slow-running query, one would ask the database system to explain its strategy for resolving the query. In this case you can use the SQL EXPLAIN command, on the sub-select independently and on the WHERE clause, to find the exact cause of the slowdown. This may indicate whether your WHERE clause should exist outside the sub-select, or whether the problem lies elsewhere.
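One hedged alternative, not from the original answer: materialize the counts once into an indexed temporary table, then update via a join. This also sidesteps the derived-table wrapper the UPDATE uses to work around MySQL's restriction on reading the update target in a subquery. The counts_tmp name and the final index suggestion are illustrative:
CREATE TEMPORARY TABLE counts_tmp AS
SELECT w.ID,
       COUNT(bb.ID) AS cnt            -- LEFT JOIN makes this 0 when nothing matches
FROM working w
LEFT JOIN working bb
       ON bb.ESTIMID = w.ESTIMID
      AND bb.ANNDATS_CONVERTED <= w.ANNDATS_CONVERTED
      AND bb.REVDATS_CONVERTED >  w.ANNDATS_CONVERTED
      AND bb.ID != w.ID
      AND bb.ANALYST != ''
WHERE w.ANALYST != ''
GROUP BY w.ID;

ALTER TABLE counts_tmp ADD PRIMARY KEY (ID);

UPDATE working
JOIN counts_tmp ON counts_tmp.ID = working.ID
SET working.BROKERAGE_STOCKS_COVERED = counts_tmp.cnt;

-- an index like this on the base table should also help the counting pass:
-- CREATE INDEX estimid_dates ON working (ESTIMID, ANNDATS_CONVERTED, REVDATS_CONVERTED);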

How can I optimize this MySQL query to find maximum simultaneous calls?

I'm trying to calculate maximum simultaneous calls. My query, which I believe to be accurate, takes way too long given ~250,000 rows. The cdrs table looks like this:
+---------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-----------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| CallType | varchar(32) | NO | | NULL | |
| StartTime | datetime | NO | MUL | NULL | |
| StopTime | datetime | NO | | NULL | |
| CallDuration | float(10,5) | NO | | NULL | |
| BillDuration | mediumint(8) unsigned | NO | | NULL | |
| CallMinimum | tinyint(3) unsigned | NO | | NULL | |
| CallIncrement | tinyint(3) unsigned | NO | | NULL | |
| BasePrice | float(12,9) | NO | | NULL | |
| CallPrice | float(12,9) | NO | | NULL | |
| TransactionId | varchar(20) | NO | | NULL | |
| CustomerIP | varchar(15) | NO | | NULL | |
| ANI | varchar(20) | NO | | NULL | |
| ANIState | varchar(10) | NO | | NULL | |
| DNIS | varchar(20) | NO | | NULL | |
| LRN | varchar(20) | NO | | NULL | |
| DNISState | varchar(10) | NO | | NULL | |
| DNISLATA | varchar(10) | NO | | NULL | |
| DNISOCN | varchar(10) | NO | | NULL | |
| OrigTier | varchar(10) | NO | | NULL | |
| TermRateDeck | varchar(20) | NO | | NULL | |
+---------------+-----------------------+------+-----+---------+----------------+
I have the following indexes:
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| cdrs | 0 | PRIMARY | 1 | id | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | id | 1 | id | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | call_time_index | 1 | StartTime | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | call_time_index | 2 | StopTime | A | 269622 | NULL | NULL | | BTREE | | |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The query I am running is this:
SELECT MAX(cnt) AS max_channels FROM
(SELECT cl1.StartTime, COUNT(*) AS cnt
FROM cdrs cl1
INNER JOIN cdrs cl2
ON cl1.StartTime
BETWEEN cl2.StartTime AND cl2.StopTime
GROUP BY cl1.id)
AS counts;
It seems like I might have to chunk this data for each day and store the results in a separate table like simultaneous_calls.
I'm sure you want to know not only the maximum simultaneous calls, but when that happened.
I would create a table containing the timestamp of every individual minute
CREATE TABLE times (ts DATETIME NOT NULL PRIMARY KEY);
INSERT INTO times (ts) VALUES ('2014-05-14 00:00:00');
. . . until 1440 rows, one for each minute . . .
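A hedged sketch of generating those rows rather than typing 1440 INSERTs; the inline digit tables are just a row-generator trick, and INSERT IGNORE skips the midnight row already inserted above:
INSERT IGNORE INTO times (ts)
SELECT '2014-05-14 00:00:00' + INTERVAL a.n + 10*b.n + 100*c.n + 1000*d.n MINUTE
FROM (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
CROSS JOIN (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
CROSS JOIN (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
CROSS JOIN (SELECT 0 AS n UNION ALL SELECT 1) d
WHERE a.n + 10*b.n + 100*c.n + 1000*d.n < 1440;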
Then join that to the calls.
SELECT ts, COUNT(*) AS count FROM times
JOIN cdrs ON times.ts BETWEEN cdrs.starttime AND cdrs.stoptime
GROUP BY ts ORDER BY count DESC LIMIT 1;
Here's the result in my test (MySQL 5.6.17 on a Linux VM running on a Macbook Pro):
+---------------------+----------+
| ts | count(*) |
+---------------------+----------+
| 2014-05-14 10:59:00 | 1001 |
+---------------------+----------+
1 row in set (1 min 3.90 sec)
This achieves several goals:
Reduces the number of rows examined by two orders of magnitude.
Reduces the execution time from 3 hours+ to about 1 minute.
Also returns the actual timestamp when the highest count was found.
Here's the EXPLAIN for my query:
explain select ts, count(*) from times join cdrs on times.ts between cdrs.starttime and cdrs.stoptime group by ts order by count(*) desc limit 1;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
| 1 | SIMPLE | times | index | PRIMARY | PRIMARY | 5 | NULL | 1440 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | cdrs | ALL | starttime | NULL | NULL | NULL | 260727 | Range checked for each record (index map: 0x4) |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
Notice the figures in the rows column, and compare to the EXPLAIN of your original query. You can estimate the total number of rows examined by multiplying these together (but that gets more complicated if your query is anything other than SIMPLE).
The inline view isn't strictly necessary. (You're right that it takes a lot of time to run the EXPLAIN on the query with the inline view: the EXPLAIN materializes the inline view, i.e. runs the inline view query and populates the derived table, and then explains the outer query.)
Note that this query will return an equivalent result:
SELECT COUNT(*) AS max_channels
FROM cdrs cl1
JOIN cdrs cl2
ON cl1.StartTime BETWEEN cl2.StartTime AND cl2.StopTime
GROUP BY cl1.id
ORDER BY max_channels DESC
LIMIT 1
Though it still has to do all the work, and probably doesn't perform any better, the EXPLAIN should run a lot faster. (We expect to see "Using temporary; Using filesort" in the Extra column.)
The number of rows in the resultset is going to be the number of rows in the table (~250,000), and those rows need to be sorted, so some time goes there. The bigger issue (my gut is telling me) is the join operation.
I'm wondering if the EXPLAIN (or performance) would be any different if you swapped the cl1 and cl2 in the predicate, i.e.
ON cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime
I mention that just because I'd be tempted to try a correlated subquery next. That's ~250,000 executions, though, so it's not likely to be any faster...
SELECT ( SELECT COUNT(*)
FROM cdrs cl2
WHERE cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime
) AS max_channels
, cl1.StartTime
FROM cdrs cl1
ORDER BY max_channels DESC
LIMIT 11
You could run an EXPLAIN on that, we're still going to see a "Using temporary; Using filesort", and it will also show the "dependent subquery"...
Obviously, adding a predicate on the cl1 table to cut down the number of rows returned (for example, checking only the past 15 days) should speed things up, but it doesn't get you the answer you want.
WHERE cl1.StartTime > NOW() - INTERVAL 15 DAY
(None of my musings here are sure-fire answers to your question, or solutions to the performance issue; they're just musings.)

indexes and speeding up 'derived' queries

I've recently noticed that a query I have is running quite slowly, at almost 1 second per query.
The query looks like this
SELECT eventdate.id,
eventdate.eid,
eventdate.date,
eventdate.time,
eventdate.title,
eventdate.address,
eventdate.rank,
eventdate.city,
eventdate.state,
eventdate.name,
source.link,
type,
eventdate.img
FROM source
RIGHT OUTER JOIN
(
SELECT event.id,
event.date,
users.name,
users.rank,
users.eid,
event.address,
event.city,
event.state,
event.lat,
event.`long`,
GROUP_CONCAT(types.type SEPARATOR ' | ') AS type
FROM event FORCE INDEX (latlong_idx)
JOIN users ON event.uid = users.id
JOIN types ON users.tid=types.id
WHERE `long` BETWEEN -74.36829174058 AND -73.64365405942
AND lat BETWEEN 40.35195025942 AND 41.07658794058
AND event.date >= '2009-10-15'
GROUP BY event.id, event.date
ORDER BY event.date, users.rank DESC
LIMIT 0, 20
)eventdate
ON eventdate.uid = source.uid
AND eventdate.date = source.date;
and the explain is
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+-------+---------------------------------+
| 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 20 | |
| 1 | PRIMARY | source | ref | iddate_idx | iddate_idx | 7 | eventdate.id,eventdate.date | 156 | |
| 2 | DERIVED | event | ALL | latlong_idx | NULL | NULL | NULL | 19500 | Using temporary; Using filesort |
| 2 | DERIVED | types | ref | eid_idx | eid_idx | 4 | active.event.id | 10674 | Using index |
| 2 | DERIVED | users | eq_ref | id_idx | id_idx | 4 | active.types.id | 1 | Using where |
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+-------+---------------------------------+
I've tried using 'force index' on latlong, but that doesn't seem to speed things up at all.
Is it the derived table that is causing the slow responses? If so, is there a way to improve the performance of this?
--------EDIT-------------
I've attempted to improve the formatting to make it more readable as well.
I ran the same query, changing only the WHERE clause to:
WHERE users.id = (
SELECT users.id
FROM users
WHERE uidname = 'frankt1'
ORDER BY users.approved DESC , users.rank DESC
LIMIT 1 )
AND date >= '2009-10-15'
GROUP BY date
ORDER BY date)
That query runs in 0.006 seconds
the explain looks like
+----+-------------+------------+-------+---------------+---------------+---------+------------------------------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+---------------+---------+------------------------------+------+----------------+
| 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 42 | |
| 1 | PRIMARY | source | ref | iddate_idx | iddate_idx | 7 | eventdate.id,eventdate.date | 156 | |
| 2 | DERIVED | users | const | id_idx | id_idx | 4 | | 1 | |
| 2 | DERIVED | event | range | eiddate_idx | eiddate_idx | 7 | NULL | 24 | Using where |
| 2 | DERIVED | types | ref | eid_idx | eid_idx | 4 | active.event.bid | 3 | Using index |
| 3 | SUBQUERY | users | ALL | idname_idx | idname_idx | 767 | | 5 | Using filesort |
+----+-------------+------------+-------+---------------+---------------+---------+------------------------------+------+----------------+
The only way to clean up that mammoth SQL statement is to go back to the drawing board and carefully work through your database design and requirements. As soon as you start joining six tables and using an inner select, you should expect incredible execution times.
As a start, ensure that all your id fields are indexed, but better, ensure that your design is valid. I don't know where to START looking at your SQL, even after I reformatted it for you.
Note that 'using indexes' means you need to issue the correct instructions when you CREATE or ALTER the tables you are using. See, for instance, the MySQL 5.0 CREATE INDEX documentation.
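For instance, a hedged sketch of indexing the join keys used in the query above (index names illustrative):
CREATE INDEX uid_idx ON event (uid);   -- joined to users.id
CREATE INDEX tid_idx ON users (tid);   -- joined to types.id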