I have a MySQL database with 7 million rows hosted on an AWS RDS micro instance.
I am running queries like this:
SELECT
`id`, `address`, `property_type`, `rooms`,
`published`, `size`, `net_rent`, `gross_rent`,
`purchase_price`, `original_id`
FROM (`properties`)
WHERE
`postcode` IN ('1000', '1001', '1002', '1003',
'1004', '1005', '1006', '1007',
'1010', '1011', '1012', '1014',
'1015', '1017', '1018', '1019')
AND `published` < '2013-01-09'
AND `property_type` IN ('Apartment', 'Apartment with terrace',
'Attic', 'Attic flat / penthouse',
'Loft', 'Maisonette', 'Studio')
AND `sale_or_rent` = 'rent'
LIMIT 50
So I created an index like this:
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
| properties | 1 | listing | 1 | postcode | A | 25091 | NULL | NULL | YES | BTREE | | |
| properties | 1 | listing | 2 | published | A | 2333545 | NULL | NULL | YES | BTREE | | |
| properties | 1 | listing | 3 | property_type | A | 3500318 | NULL | NULL | YES | BTREE | | |
| properties | 1 | listing | 4 | sale_or_rent | A | 3500318 | NULL | NULL | YES | BTREE | | |
Everything is fine so far. The problems arise when I add ORDER BY published DESC (30+ seconds for a single query).
Is there any way to make ORDER BY published DESC efficient, given that I also need published in the WHERE clause?
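I'd try putting published first in the index. MySQL could then read rows in published order and stop once it has found 50 matches, instead of sorting every match.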
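Below is a minimal sketch of that suggestion. It uses Python's built-in sqlite3 module as a stand-in for MySQL. SQLite's planner is not MySQL's, so treat the printed plan as an illustration only. The rows, the shortened IN lists, and the index name listing_by_date are all made up.

If the published-first index supplies the order, the plan should contain no "USE TEMP B-TREE FOR ORDER BY" step.

```python
import sqlite3

# SQLite stand-in for MySQL; column names follow the question, rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE properties (
           id INTEGER PRIMARY KEY, postcode TEXT, published TEXT,
           property_type TEXT, sale_or_rent TEXT)"""
)
rows = [
    (i, str(1000 + i % 20), "2012-%02d-%02d" % (1 + i % 12, 1 + i % 28),
     "Apartment" if i % 3 else "Loft", "rent" if i % 2 else "sale")
    for i in range(1, 501)
]
conn.executemany("INSERT INTO properties VALUES (?, ?, ?, ?, ?)", rows)

# Hypothetical index name; `published` leads, so rows can be read in
# published order rather than collected and sorted.
conn.execute(
    "CREATE INDEX listing_by_date ON properties "
    "(published, postcode, property_type, sale_or_rent)"
)

query = """
    SELECT id, published FROM properties
    WHERE postcode IN ('1000', '1001', '1002', '1003')
      AND published < '2013-01-09'
      AND property_type IN ('Apartment', 'Loft')
      AND sale_or_rent = 'rent'
    ORDER BY published DESC
    LIMIT 50
"""
# The last field of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
print("\n".join(plan))
result = conn.execute(query).fetchall()
```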
Related
I have the following ugly query, which runs okay but not great, on my local machine (1.4 secs, running v5.7). On the server I'm using, which is running an older version of MySQL (v5.5), the query just hangs. It seems to get caught on "Copying to tmp table":
SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
p.street_number,
p.street_name,
p.site_address_city_state,
p.number_of_units,
p.number_of_stories,
p.bedrooms,
p.bathrooms,
p.lot_area_sqft,
p.cost_per_sq_ft,
p.year_built,
p.sales_date,
p.sales_price,
p.id
FROM (
SELECT APN, property_case_detail_id FROM property_inspection AS pi
GROUP BY APN, property_case_detail_id
HAVING
COUNT(IF(status='Resolved Date', 1, NULL)) = 0
) as open_cases
JOIN property AS p
ON p.parcel_number = open_cases.APN
LIMIT 0, 1000;
mysql> show processlist;
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| 21120 | headsupcity | localhost | lead_housing | Query | 21 | Copying to tmp table | SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
p.street_numbe |
| 21121 | headsupcity | localhost | lead_housing | Query | 0 | NULL | show processlist |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
The EXPLAINs differ between my local machine and the server. I'm assuming the only reason my query runs at all locally is the key that is automatically created on the derived table:
Explain (local):
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| 1 | PRIMARY | p | NULL | ALL | NULL | NULL | NULL | NULL | 40319 | 100.00 | Using temporary |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 8 | lead_housing.p.parcel_number | 40 | 100.00 | NULL |
| 2 | DERIVED | pi | NULL | ALL | NULL | NULL | NULL | NULL | 1623978 | 100.00 | Using temporary; Using filesort |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
Explain (server):
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| 1 | PRIMARY | p | ALL | NULL | NULL | NULL | NULL | 41369 | Using temporary |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 122948 | Using where; Distinct; Using join buffer |
| 2 | DERIVED | pi | ALL | NULL | NULL | NULL | NULL | 1718586 | Using temporary; Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
Schemas:
mysql> explain property_inspection;
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| lblCaseNo | int(11) | NO | MUL | NULL | |
| APN | bigint(10) | NO | MUL | NULL | |
| date | varchar(50) | NO | | NULL | |
| status | varchar(500) | NO | | NULL | |
| property_case_detail_id | int(11) | YES | MUL | NULL | |
| case_type_id | int(11) | YES | MUL | NULL | |
| date_modified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| update_status | tinyint(1) | YES | | 1 | |
| created_date | datetime | NO | | NULL | |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
10 rows in set (0.02 sec)
mysql> explain property; (not all columns, but you get the gist)
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| parcel_number | bigint(10) | NO | | 0 | |
| date_modified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| created_date | datetime | NO | | NULL | |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
Variables that might be relevant:
tmp_table_size: 16777216
innodb_buffer_pool_size: 8589934592
Any ideas on how to optimize this, and any idea why the explains are so different?
Since this is where the two optimizers differ most, let's try to optimize the derived table:
( SELECT APN, property_case_detail_id FROM property_inspection AS pi
GROUP BY APN, property_case_detail_id
HAVING
COUNT(IF(status='Resolved Date', 1, NULL)) = 0
) as open_cases
Give this a try:
SELECT ...
FROM property AS p
WHERE NOT EXISTS ( SELECT 1 FROM property_inspection
WHERE status = 'Resolved Date'
AND p.parcel_number = APN )
ORDER BY ??? -- without this, the `LIMIT` is unpredictable
LIMIT 0, 1000;
or...
SELECT ...
FROM property AS p
LEFT JOIN property_inspection AS pi ON p.parcel_number = pi.APN
AND pi.status = 'Resolved Date' -- must be in ON; in WHERE it filters out every unmatched row
WHERE pi.APN IS NULL
ORDER BY ??? -- without this, the `LIMIT` is unpredictable
LIMIT 0, 1000;
Index:
property_inspection: INDEX(status, APN) -- in either order
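Below is a small check that the two anti-join forms agree once the status test sits in the LEFT JOIN's ON clause. It uses Python's sqlite3 as a stand-in for MySQL. The schema is trimmed to the columns the queries use, and the parcels and statuses are made up.

The third query keeps the status test in WHERE. It shows why that version can never return a row.

```python
import sqlite3

# SQLite stand-in for MySQL; only the columns these queries touch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE property (id INTEGER PRIMARY KEY, parcel_number INTEGER NOT NULL);
    CREATE TABLE property_inspection (
        id INTEGER PRIMARY KEY, APN INTEGER NOT NULL, status TEXT NOT NULL);
    CREATE INDEX status_apn ON property_inspection (status, APN);
    INSERT INTO property (parcel_number) VALUES (101), (102), (103);
    INSERT INTO property_inspection (APN, status) VALUES
        (101, 'Open'), (101, 'Resolved Date'),
        (102, 'Open'),
        (103, 'Resolved Date');
    """
)

not_exists = conn.execute(
    """SELECT p.parcel_number FROM property AS p
       WHERE NOT EXISTS (SELECT 1 FROM property_inspection
                         WHERE status = 'Resolved Date'
                           AND p.parcel_number = APN)
       ORDER BY p.parcel_number"""
).fetchall()

# Status test in the ON clause: only 'Resolved Date' rows can match.
left_join = conn.execute(
    """SELECT p.parcel_number FROM property AS p
       LEFT JOIN property_inspection AS pi
              ON p.parcel_number = pi.APN AND pi.status = 'Resolved Date'
       WHERE pi.APN IS NULL
       ORDER BY p.parcel_number"""
).fetchall()

# Status test in WHERE: unmatched rows have pi.status = NULL, so nothing passes.
buggy = conn.execute(
    """SELECT p.parcel_number FROM property AS p
       LEFT JOIN property_inspection AS pi ON p.parcel_number = pi.APN
       WHERE pi.status = 'Resolved Date' AND pi.APN IS NULL"""
).fetchall()
print(not_exists, left_join, buggy)
```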
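MySQL 5.5 and 5.7 are quite different, and the latter has a better optimizer, so it is no surprise that the EXPLAIN plans differ. In particular, since 5.6 MySQL can add an automatic index to a derived table (the <auto_key0> in your local plan), which 5.5 cannot do.
It would help if you provided the output of SHOW CREATE TABLE property and SHOW CREATE TABLE property_inspection, since it shows the indexes on your tables.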
Your subquery is the issue:
- The server processes 1.6M rows with no index and groups all of them.
- HAVING is quite an expensive operation, so you'd better avoid it, especially in subqueries.
- Grouping is a bad idea in this case. You do not need the aggregation/counting; you only need to check whether a 'Resolved Date' status exists.
Based on the information provided, I'd recommend:
- Alter table property_inspection to reduce the length of the status column.
- Add an index on that column. Use a covering index on (APN, property_case_detail_id, status) if possible, with the columns in that order.
- Change the query to something like this:
SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
...
p.id
FROM
property_inspection AS `pi1`
INNER JOIN property AS p ON (
p.parcel_number = `pi1`.APN
)
LEFT JOIN (
SELECT
`pi2`.property_case_detail_id
, `pi2`.APN
FROM
property_inspection AS `pi2`
WHERE
`status` = 'Resolved Date'
) AS exclude ON (
exclude.APN = `pi1`.APN
AND exclude.property_case_detail_id = `pi1`.property_case_detail_id
)
WHERE
exclude.APN IS NULL
LIMIT
0, 1000;
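Below is a sketch comparing this exclusion join with the original GROUP BY/HAVING derived table. It uses Python's sqlite3 as a stand-in for MySQL, so COUNT(CASE ...) replaces MySQL's COUNT(IF(...)). The derived table is aliased resolved here, the selected columns are reduced to parcel_number, and the inspection rows are made up.

```python
import sqlite3

# SQLite stand-in for MySQL; made-up cases per parcel.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE property (id INTEGER PRIMARY KEY, parcel_number INTEGER NOT NULL);
    CREATE TABLE property_inspection (
        id INTEGER PRIMARY KEY, APN INTEGER NOT NULL,
        property_case_detail_id INTEGER, status TEXT NOT NULL);
    CREATE INDEX apn_case_status
        ON property_inspection (APN, property_case_detail_id, status);
    INSERT INTO property (parcel_number) VALUES (101), (102), (103);
    INSERT INTO property_inspection (APN, property_case_detail_id, status) VALUES
        (101, 1, 'Open'), (101, 1, 'Resolved Date'),  -- case 1 closed
        (102, 2, 'Open'), (102, 3, 'Resolved Date'),  -- case 2 still open
        (103, 4, 'Open');                             -- never resolved
    """
)

# Original shape: group per (APN, case), keep groups with no resolution.
having = conn.execute(
    """SELECT DISTINCT p.parcel_number
       FROM (SELECT APN, property_case_detail_id FROM property_inspection
             GROUP BY APN, property_case_detail_id
             HAVING COUNT(CASE WHEN status = 'Resolved Date' THEN 1 END) = 0
            ) AS open_cases
       JOIN property AS p ON p.parcel_number = open_cases.APN
       ORDER BY p.parcel_number"""
).fetchall()

# Rewritten shape: drop inspection rows whose case has a resolved row.
anti_join = conn.execute(
    """SELECT DISTINCT p.parcel_number
       FROM property_inspection AS pi1
       JOIN property AS p ON p.parcel_number = pi1.APN
       LEFT JOIN (SELECT property_case_detail_id, APN FROM property_inspection
                  WHERE status = 'Resolved Date') AS resolved
              ON resolved.APN = pi1.APN
             AND resolved.property_case_detail_id = pi1.property_case_detail_id
       WHERE resolved.APN IS NULL
       ORDER BY p.parcel_number"""
).fetchall()
print(having, anti_join)
```

The two forms can differ when property_case_detail_id is NULL. GROUP BY puts all NULLs into one group, but `NULL = NULL` never matches in the join, so such rows are never excluded.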
I need to select and display information from a pair of MySQL tables but the syntax eludes me. Specifically, I need to JOIN the data from the cwd_user table with the data from the cwd_user_attribute table on the field cwd_user.id == cwd_user_attribute.user_id, but I also need to display values from several entries in the cwd_user_attribute table in a single line. It's the latter that eludes me. Here are the gory details:
Given two tables:
mysql (crowd#prod:crowddb)> desc cwd_user;
+---------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+-------+
| id | bigint(20) | NO | PRI | NULL | |
| user_name | varchar(255) | NO | | NULL | |
| active | char(1) | NO | MUL | NULL | |
| created_date | datetime | NO | | NULL | |
| updated_date | datetime | NO | | NULL | |
| display_name | varchar(255) | YES | | NULL | |
| directory_id | bigint(20) | NO | MUL | NULL | |
+---------------------+--------------+------+-----+---------+-------+
mysql (crowd#prod:crowddb)> desc cwd_user_attribute;
+-----------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+--------------+------+-----+---------+-------+
| id | bigint(20) | NO | PRI | NULL | |
| user_id | bigint(20) | NO | MUL | NULL | |
| directory_id | bigint(20) | NO | MUL | NULL | |
| attribute_name | varchar(255) | NO | | NULL | |
| attribute_value | varchar(255) | YES | | NULL | |
+-----------------------+--------------+------+-----+---------+-------+
Assume that there are up to seven possible values for cwd_user_attribute.attribute_name and I'm interested in four of them: lastAuthenticated, Team, Manager, and Notes. Example:
mysql (crowd#prod:crowddb)> select * from cwd_user_attribute where user_id = (select id from cwd_user where user_name = 'gspinrad');
+---------+---------+--------------+-------------------------+----------------------------------+
| id | user_id | directory_id | attribute_name | attribute_value |
+---------+---------+--------------+-------------------------+----------------------------------+
| 65788 | 32844 | 1 | invalidPasswordAttempts | 0 |
| 65787 | 32844 | 1 | lastAuthenticated | 1473360428804 |
| 65790 | 32844 | 1 | passwordLastChanged | 1374005378040 |
| 65789 | 32844 | 1 | requiresPasswordChange | false |
| 4292909 | 32844 | 1 | Team | Engineering - DevOps |
| 4292910 | 32844 | 1 | Manager | Matt Karaffa |
| 4292911 | 32844 | 1 | Notes | Desk 32:2:11 |
+---------+---------+--------------+-------------------------+----------------------------------+
5 rows in set (0.00 sec)
I can get a list of the users sorted by lastAuthenticated with this query:
SELECT cwd_user.user_name, cwd_user.id, cwd_user.display_name,
from_unixtime(cwd_user_attribute.attribute_value/1000) as last_login
FROM cwd_user
JOIN cwd_directory ON cwd_user.directory_id = cwd_directory.id
JOIN cwd_user_attribute ON cwd_user.id = cwd_user_attribute.user_id
AND cwd_user_attribute.attribute_name='lastAuthenticated'
WHERE DATEDIFF((NOW()), (from_unixtime(cwd_user_attribute.attribute_value/1000))) > 90
and cwd_user.active='T'
order by last_login limit 4;
Result:
+-----------------------+---------+-----------------------+---------------------+
| user_name | id | display_name | last_login |
+-----------------------+---------+-----------------------+---------------------+
| jenkins-administrator | 1605636 | Jenkins Administrator | 2011-10-27 17:28:05 |
| sonar-administrator | 1605635 | Sonar Administrator | 2012-02-06 15:59:59 |
| jfelix | 1605690 | Joey Felix | 2012-02-06 19:15:15 |
| kbitters | 3178497 | Kitty Bitters | 2013-09-03 10:09:59 |
+-----------------------+---------+-----------------------+---------------------+
What I need to add to the output is the value of cwd_user_attribute.attribute_value where cwd_user_attribute.attribute_name is Team, Manager, and/or Notes. The output would look something like this:
+-----------------------+---------+-----------------------+---------------------+---------------+--------------+--------------+
| user_name             | id      | display_name          | last_login          | Team          | Manager      | Notes        |
+-----------------------+---------+-----------------------+---------------------+---------------+--------------+--------------+
| jenkins-administrator | 1605636 | Jenkins Administrator | 2011-10-27 17:28:05 | Internal      | Internal     |              |
| sonar-administrator   | 1605635 | Sonar Administrator   | 2012-02-06 15:59:59 | Internal      | Internal     |              |
| jfelix                | 1605690 | Joey Felix            | 2012-02-06 19:15:15 | Hardware Eng. | Gary Spinrad | Desk 32:1:51 |
| kbitters              | 3178497 | Kitty Bitters         | 2013-09-03 10:09:59 | Software QA   | Matt Karaffa | Desk 32:2:01 |
+-----------------------+---------+-----------------------+---------------------+---------------+--------------+--------------+
You can achieve that result with an additional LEFT JOIN on the attribute table, then use GROUP BY and aggregated CASE expressions to pivot the result (rows to columns).
SELECT
cwd_user.user_name,
cwd_user.id,
cwd_user.display_name,
from_unixtime(cwd_user_attribute.attribute_value/1000) as last_login,
MIN(CASE WHEN attr2.attribute_name = 'Team' THEN attr2.attribute_value END) as Team,
MIN(CASE WHEN attr2.attribute_name = 'Manager' THEN attr2.attribute_value END) as Manager,
MIN(CASE WHEN attr2.attribute_name = 'Notes' THEN attr2.attribute_value END) as Notes
FROM
cwd_user
JOIN
cwd_user_attribute ON cwd_user.id = cwd_user_attribute.user_id
AND cwd_user_attribute.attribute_name='lastAuthenticated'
LEFT JOIN
cwd_user_attribute attr2 ON cwd_user.id = attr2.user_id
AND attr2.attribute_name IN ('Team', 'Manager', 'Notes')
WHERE
DATEDIFF((NOW()), (from_unixtime(cwd_user_attribute.attribute_value/1000))) > 90
AND cwd_user.active = 'T'
GROUP BY
cwd_user.id
ORDER BY
last_login
LIMIT 4
With the ONLY_FULL_GROUP_BY SQL mode enabled, you would need to list all non-aggregated columns in the GROUP BY clause:
GROUP BY
cwd_user.user_name,
cwd_user.id,
cwd_user.display_name,
cwd_user_attribute.attribute_value
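Below is a minimal sketch of the MIN(CASE ...) pivot. It uses Python's sqlite3 as a stand-in for MySQL, and the table aliases u and auth are made up for brevity. The 90-day filter and from_unixtime() are left out, and rows are ordered by the raw lastAuthenticated value instead. The lastAuthenticated numbers are made up.

A missing attribute comes out as NULL, which Python shows as None.

```python
import sqlite3

# SQLite stand-in for MySQL; trimmed schema, made-up attribute rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cwd_user (id INTEGER PRIMARY KEY, user_name TEXT, active TEXT);
    CREATE TABLE cwd_user_attribute (
        id INTEGER PRIMARY KEY, user_id INTEGER,
        attribute_name TEXT, attribute_value TEXT);
    INSERT INTO cwd_user VALUES (1, 'jfelix', 'T'), (2, 'kbitters', 'T');
    INSERT INTO cwd_user_attribute (user_id, attribute_name, attribute_value) VALUES
        (1, 'lastAuthenticated', '1328555715000'),
        (1, 'Team', 'Hardware Eng.'), (1, 'Manager', 'Gary Spinrad'),
        (1, 'Notes', 'Desk 32:1:51'),
        (2, 'lastAuthenticated', '1378203000000'),
        (2, 'Team', 'Software QA');
    """
)
rows = conn.execute(
    """SELECT u.user_name,
              -- each CASE keeps one attribute's value; MIN collapses the group
              MIN(CASE WHEN attr2.attribute_name = 'Team' THEN attr2.attribute_value END) AS Team,
              MIN(CASE WHEN attr2.attribute_name = 'Manager' THEN attr2.attribute_value END) AS Manager,
              MIN(CASE WHEN attr2.attribute_name = 'Notes' THEN attr2.attribute_value END) AS Notes
       FROM cwd_user AS u
       JOIN cwd_user_attribute AS auth
         ON u.id = auth.user_id AND auth.attribute_name = 'lastAuthenticated'
       LEFT JOIN cwd_user_attribute AS attr2
         ON u.id = attr2.user_id
        AND attr2.attribute_name IN ('Team', 'Manager', 'Notes')
       WHERE u.active = 'T'
       GROUP BY u.id, u.user_name, auth.attribute_value
       ORDER BY CAST(auth.attribute_value AS INTEGER)"""
).fetchall()
print(rows)
```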
Another way is just to use three LEFT JOINs (one join per attribute name):
SELECT
cwd_user.user_name,
cwd_user.id,
cwd_user.display_name,
from_unixtime(cwd_user_attribute.attribute_value/1000) as last_login,
attr_team.attribute_value as Team,
attr_manager.attribute_value as Manager,
attr_notes.attribute_value as Notes
FROM cwd_user
JOIN cwd_user_attribute
ON cwd_user.id = cwd_user_attribute.user_id
AND cwd_user_attribute.attribute_name='lastAuthenticated'
LEFT JOIN cwd_user_attribute attr_team
ON cwd_user.id = attr_team.user_id
AND attr_team.attribute_name = 'Team'
LEFT JOIN cwd_user_attribute attr_manager
ON cwd_user.id = attr_manager.user_id
AND attr_manager.attribute_name = 'Manager'
LEFT JOIN cwd_user_attribute attr_notes
ON cwd_user.id = attr_notes.user_id
AND attr_notes.attribute_name = 'Notes'
WHERE DATEDIFF((NOW()), (from_unixtime(cwd_user_attribute.attribute_value/1000))) > 90
and cwd_user.active='T'
order by last_login limit 4
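Below is a sketch of the one-LEFT-JOIN-per-attribute variant. It uses Python's sqlite3 as a stand-in for MySQL, with the same made-up data and simplifications: no 90-day filter, no from_unixtime(), and made-up aliases u and auth.

No GROUP BY is needed here. The trade-off is that a user with two rows for the same attribute name would appear twice.

```python
import sqlite3

# SQLite stand-in for MySQL; trimmed schema, made-up attribute rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cwd_user (id INTEGER PRIMARY KEY, user_name TEXT, active TEXT);
    CREATE TABLE cwd_user_attribute (
        id INTEGER PRIMARY KEY, user_id INTEGER,
        attribute_name TEXT, attribute_value TEXT);
    INSERT INTO cwd_user VALUES (1, 'jfelix', 'T'), (2, 'kbitters', 'T');
    INSERT INTO cwd_user_attribute (user_id, attribute_name, attribute_value) VALUES
        (1, 'lastAuthenticated', '1328555715000'),
        (1, 'Team', 'Hardware Eng.'), (1, 'Manager', 'Gary Spinrad'),
        (1, 'Notes', 'Desk 32:1:51'),
        (2, 'lastAuthenticated', '1378203000000'),
        (2, 'Team', 'Software QA');
    """
)
rows = conn.execute(
    """SELECT u.user_name,
              attr_team.attribute_value AS Team,
              attr_manager.attribute_value AS Manager,
              attr_notes.attribute_value AS Notes
       FROM cwd_user AS u
       JOIN cwd_user_attribute AS auth
         ON u.id = auth.user_id AND auth.attribute_name = 'lastAuthenticated'
       -- each alias carries its own ON condition
       LEFT JOIN cwd_user_attribute AS attr_team
         ON u.id = attr_team.user_id AND attr_team.attribute_name = 'Team'
       LEFT JOIN cwd_user_attribute AS attr_manager
         ON u.id = attr_manager.user_id AND attr_manager.attribute_name = 'Manager'
       LEFT JOIN cwd_user_attribute AS attr_notes
         ON u.id = attr_notes.user_id AND attr_notes.attribute_name = 'Notes'
       WHERE u.active = 'T'
       ORDER BY CAST(auth.attribute_value AS INTEGER)"""
).fetchall()
print(rows)
```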
Note: I removed the join with the cwd_directory table because you don't seem to use it. Add it back if you need it for filtering.
Note 2: Attributes that you often search on (like lastAuthenticated) should be converted to indexed columns in the cwd_user table to improve search performance.
This is the table structure:
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| visitor_hash | varchar(40) | YES | MUL | NULL | |
| uri | varchar(255) | YES | | NULL | |
| ip_address | char(15) | YES | MUL | NULL | |
| last_visit | datetime | YES | | NULL | |
| visits | int(11) | NO | | NULL | |
| object_app | varchar(255) | YES | MUL | NULL | |
| object_model | varchar(255) | YES | | NULL | |
| object_id | varchar(255) | YES | | NULL | |
| blocked | tinyint(1) | NO | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
This is the query:
SELECT `object_id`
FROM `visits_visit`
WHERE `object_model` = 'News'
GROUP BY `object_id`
ORDER BY COUNT( * ) DESC
LIMIT 0, 3
The response time is ~77.63 ms.
CREATE INDEX resource_model ON visits_visit (object_model(100));
After creating this index, the response time increased to ~150 ms.
How can I improve performance in this case? Thank you.
UPDATED:
In answer to Michal Komorowski:
This is the EXPLAIN output before adding the index:
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | visits_visit | ALL | NULL | NULL | NULL | NULL | 142938 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
1 row in set (0.00 sec)
And this is the output after adding the index:
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
| 1 | SIMPLE | visits_visit | ref | resource_model | resource_model | 303 | const | 64959 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
1 row in set (0.00 sec)
I'm not sure what this information tells me.
SELECT `object_id`
FROM `visits_visit`
WHERE `object_model` = 'News'
GROUP BY `object_id`
ORDER BY COUNT( * ) DESC
LIMIT 0, 3
78.85 ms before indexing and 365.59 ms after indexing.
I also have this index:
CREATE INDEX resource ON visits_visit (object_app(100), object_model(100), object_id(100));
But I need this one, because the WHERE clauses of my other SELECT queries use these three columns.
UPDATE:
I'm using Django Debug Toolbar to measure the performance of queries.
UPDATE:
Query:
ANALYZE TABLE visits_visit;
Output:
+-----------------------------+---------+----------+-----------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------------------+---------+----------+-----------------------------+
| **************.visits_visit | analyze | status | Table is already up to date |
+-----------------------------+---------+----------+-----------------------------+
1 row in set (0.00 sec)
UPDATE:
SHOW INDEXES FROM visits_visit;
Output:
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| visits_visit | 0 | PRIMARY | 1 | id | A | 142938 | NULL | NULL | | BTREE | | |
| visits_visit | 1 | visits_visit_0880babc | 1 | visitor_hash | A | 142938 | NULL | NULL | YES | BTREE | | |
| visits_visit | 1 | visits_visit_5325a746 | 1 | ip_address | A | 142938 | NULL | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 1 | object_app | A | 1 | 100 | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 2 | object_model | A | 3 | 100 | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 3 | object_id | A | 959 | 100 | NULL | YES | BTREE | | |
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
It seems to me that although you have an index, MySQL doesn't know how to use it properly. This happens when the statistics about data distribution within a table are out of date. To update them, run ANALYZE TABLE visits_visit and then check the results.
I was confused by a misunderstanding of SQL mechanisms, so I decided to create a Popular model and save instances to it every 24 hours. Thanks to everyone who tried to help.
As I said in your other question, prefix indexes are virtually useless; don't use them except in rare circumstances.
Shrink the fields to reasonable lengths and you won't be tempted to use prefix indexes.
The optimal index for that query is INDEX(object_model, object_id). Attempting to use INDEX(object_model(##), ...) will not get past object_model to anything after it.
If object_model is things like 'News', I suspect the other possible values are short, and perhaps there is a finite number of models. For "short" change to some smaller VARCHAR. For "finite", consider using ENUM('News', 'Weather', 'Sports', ...).
As for why it took longer after indexing...
Without the index, the Optimizer had no choice but to scan the entire table. This is a simple linear scan. It would read but not count any non-News rows.
With the index, the Optimizer has the additional choice of using the index. But, perhaps most rows are News? Well, it would scan the index (nice), but for each News item in the index, it would have to look up the row to get object_id (not so nice). It seems (from the timings) that the latter is less efficient.
By shrinking the declarations and using INDEX(object_model, object_id) (in this order), the query can be performed in the index. Think of the index as a mini-table with just those two columns in it. It is smaller. It is ordered by model, so it only needs to scan the 'News' part. The explain will show this "covering" by saying "Using index".
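Below is a sketch of the covering-index idea. It uses Python's sqlite3 as a stand-in for MySQL, where SQLite reports "USING COVERING INDEX" in roughly the situations where MySQL's EXPLAIN says "Using index". The index name model_object and the visit rows are made up.

The index holds whole columns, with no prefix, in the order (object_model, object_id).

```python
import sqlite3

# SQLite stand-in for MySQL; made-up visits.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits_visit (id INTEGER PRIMARY KEY, uri TEXT, "
    "object_model TEXT, object_id TEXT)"
)
visits = ([("News", "7")] * 5 + [("News", "9")] * 4 + [("News", "3")] * 3
          + [("News", "1")] + [("Weather", "7")] * 10)
conn.executemany(
    "INSERT INTO visits_visit (object_model, object_id) VALUES (?, ?)", visits
)
# Model first, then id: the index alone can answer the query,
# so the table rows never need to be read.
conn.execute("CREATE INDEX model_object ON visits_visit (object_model, object_id)")

query = """
    SELECT object_id FROM visits_visit
    WHERE object_model = 'News'
    GROUP BY object_id
    ORDER BY COUNT(*) DESC
    LIMIT 3
"""
plan = " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
top3 = [row[0] for row in conn.execute(query)]
print(plan)
print(top3)
```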
In all cases, the GROUP BY adds some overhead: either keeping a hash of object_id in RAM or saving intermediate results and sorting them. Then the ORDER BY requires a sort (or a priority queue) before the LIMIT can apply.
I'm trying to calculate the maximum number of simultaneous calls. My query, which I believe to be accurate, takes far too long on ~250,000 rows. The cdrs table looks like this:
+---------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-----------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| CallType | varchar(32) | NO | | NULL | |
| StartTime | datetime | NO | MUL | NULL | |
| StopTime | datetime | NO | | NULL | |
| CallDuration | float(10,5) | NO | | NULL | |
| BillDuration | mediumint(8) unsigned | NO | | NULL | |
| CallMinimum | tinyint(3) unsigned | NO | | NULL | |
| CallIncrement | tinyint(3) unsigned | NO | | NULL | |
| BasePrice | float(12,9) | NO | | NULL | |
| CallPrice | float(12,9) | NO | | NULL | |
| TransactionId | varchar(20) | NO | | NULL | |
| CustomerIP | varchar(15) | NO | | NULL | |
| ANI | varchar(20) | NO | | NULL | |
| ANIState | varchar(10) | NO | | NULL | |
| DNIS | varchar(20) | NO | | NULL | |
| LRN | varchar(20) | NO | | NULL | |
| DNISState | varchar(10) | NO | | NULL | |
| DNISLATA | varchar(10) | NO | | NULL | |
| DNISOCN | varchar(10) | NO | | NULL | |
| OrigTier | varchar(10) | NO | | NULL | |
| TermRateDeck | varchar(20) | NO | | NULL | |
+---------------+-----------------------+------+-----+---------+----------------+
I have the following indexes:
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| cdrs | 0 | PRIMARY | 1 | id | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | id | 1 | id | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | call_time_index | 1 | StartTime | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | call_time_index | 2 | StopTime | A | 269622 | NULL | NULL | | BTREE | | |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The query I am running is this:
SELECT MAX(cnt) AS max_channels FROM
(SELECT cl1.StartTime, COUNT(*) AS cnt
FROM cdrs cl1
INNER JOIN cdrs cl2
ON cl1.StartTime
BETWEEN cl2.StartTime AND cl2.StopTime
GROUP BY cl1.id)
AS counts;
It seems like I might have to chunk this data for each day and store the results in a separate table like simultaneous_calls.
I'm sure you want to know not only the maximum number of simultaneous calls, but also when it happened.
I would create a table containing the timestamp of every individual minute:
CREATE TABLE times (ts DATETIME PRIMARY KEY);
INSERT INTO times (ts) VALUES ('2014-05-14 00:00:00');
. . . until 1440 rows, one for each minute . . .
Then join that to the calls.
SELECT ts, COUNT(*) AS count FROM times
JOIN cdrs ON times.ts BETWEEN cdrs.starttime AND cdrs.stoptime
GROUP BY ts ORDER BY count DESC LIMIT 1;
Here's the result in my test (MySQL 5.6.17 on a Linux VM running on a Macbook Pro):
+---------------------+----------+
| ts | count(*) |
+---------------------+----------+
| 2014-05-14 10:59:00 | 1001 |
+---------------------+----------+
1 row in set (1 min 3.90 sec)
This achieves several goals:
- Reduces the number of rows examined by two orders of magnitude.
- Reduces the execution time from 3 hours+ to about 1 minute.
- Also returns the actual timestamp when the highest count was found.
Here's the EXPLAIN for my query:
explain select ts, count(*) from times join cdrs on times.ts between cdrs.starttime and cdrs.stoptime group by ts order by count(*) desc limit 1;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
| 1 | SIMPLE | times | index | PRIMARY | PRIMARY | 5 | NULL | 1440 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | cdrs | ALL | starttime | NULL | NULL | NULL | 260727 | Range checked for each record (index map: 0x4) |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
Notice the figures in the rows column, and compare to the EXPLAIN of your original query. You can estimate the total number of rows examined by multiplying these together (but that gets more complicated if your query is anything other than SIMPLE).
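Below is a minimal sketch of the minute-table approach. It uses Python's sqlite3 as a stand-in for MySQL, with timestamps stored as 'YYYY-MM-DD HH:MM:SS' text so that BETWEEN compares them correctly. The helper busiest_minute and the call data are made up. The 1440 minute rows are generated in Python rather than inserted one by one.

```python
import sqlite3
from datetime import datetime, timedelta


def busiest_minute(calls, day):
    """Return (minute, concurrent_calls) for the busiest minute of `day`.

    Hypothetical helper; `calls` is a list of (start, stop) text timestamps.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cdrs (id INTEGER PRIMARY KEY, StartTime TEXT, StopTime TEXT)"
    )
    conn.execute("CREATE INDEX call_time_index ON cdrs (StartTime, StopTime)")
    conn.executemany("INSERT INTO cdrs (StartTime, StopTime) VALUES (?, ?)", calls)

    # One row per minute of the day, like the `times` table above.
    conn.execute("CREATE TABLE times (ts TEXT PRIMARY KEY)")
    start = datetime.strptime(day, "%Y-%m-%d")
    minutes = [((start + timedelta(minutes=i)).strftime("%Y-%m-%d %H:%M:%S"),)
               for i in range(1440)]
    conn.executemany("INSERT INTO times (ts) VALUES (?)", minutes)

    # `ts` is a tie-breaker so the result is deterministic.
    row = conn.execute(
        """SELECT ts, COUNT(*) AS count FROM times
           JOIN cdrs ON times.ts BETWEEN cdrs.StartTime AND cdrs.StopTime
           GROUP BY ts ORDER BY count DESC, ts LIMIT 1"""
    ).fetchone()
    conn.close()
    return row


calls = [  # made-up calls
    ("2014-05-14 10:00:00", "2014-05-14 10:05:00"),
    ("2014-05-14 10:03:00", "2014-05-14 10:10:00"),
    ("2014-05-14 10:04:00", "2014-05-14 10:04:30"),
]
print(busiest_minute(calls, "2014-05-14"))
```

The answer is only as fine as the minute grid. A call that starts and ends between two minute marks is never counted.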
The inline view isn't strictly necessary. (You're right that the EXPLAIN takes a lot of time on the query with the inline view: the EXPLAIN will materialize the inline view, i.e. run the inline view query and populate the derived table, and then give an EXPLAIN of the outer query.)
Note that this query will return an equivalent result:
SELECT COUNT(*) AS max_channels
FROM cdrs cl1
JOIN cdrs cl2
ON cl1.StartTime BETWEEN cl2.StartTime AND cl2.StopTime
GROUP BY cl1.id
ORDER BY max_channels DESC
LIMIT 1
It still has to do all the work, and probably doesn't perform any better, but the EXPLAIN should run a lot faster. (We expect to see "Using temporary; Using filesort" in the Extra column.)
The number of rows in the resultset is going to be the number of rows in the table (~250,000 rows), and those are going to need to be sorted, so that's going to be some time there. The bigger issue (my gut is telling me) is that join operation.
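Below is a quick check, on made-up calls, that the MAX-over-inline-view query and the ORDER BY ... LIMIT 1 form return the same number. It uses Python's sqlite3 as a stand-in for MySQL.

```python
import sqlite3

calls = [  # made-up (StartTime, StopTime) pairs
    ("2014-05-14 10:00:00", "2014-05-14 10:05:00"),
    ("2014-05-14 10:03:00", "2014-05-14 10:10:00"),
    ("2014-05-14 10:04:00", "2014-05-14 10:04:30"),
    ("2014-05-14 11:00:00", "2014-05-14 11:01:00"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cdrs (id INTEGER PRIMARY KEY, StartTime TEXT, StopTime TEXT)")
conn.executemany("INSERT INTO cdrs (StartTime, StopTime) VALUES (?, ?)", calls)

# Original shape: count per call in a derived table, then MAX over it.
inline_view = conn.execute(
    """SELECT MAX(cnt) AS max_channels FROM
         (SELECT cl1.StartTime, COUNT(*) AS cnt
          FROM cdrs cl1
          INNER JOIN cdrs cl2
            ON cl1.StartTime BETWEEN cl2.StartTime AND cl2.StopTime
          GROUP BY cl1.id) AS counts"""
).fetchone()[0]

# Same counts, sorted, keeping only the largest.
order_limit = conn.execute(
    """SELECT COUNT(*) AS max_channels
       FROM cdrs cl1
       JOIN cdrs cl2
         ON cl1.StartTime BETWEEN cl2.StartTime AND cl2.StopTime
       GROUP BY cl1.id
       ORDER BY max_channels DESC
       LIMIT 1"""
).fetchone()[0]
print(inline_view, order_limit)
```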
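I'm wondering if the EXPLAIN (or performance) would be any different if you swapped cl1 and cl2 in the predicate, i.e.
ON cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime
(Note, though, that with GROUP BY cl1.id this counts the calls that start during each cl1 call, not the calls in progress when cl1 starts, so the maximum can differ.)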
I'm thinking of that because I'd be tempted to try a correlated subquery. That's ~250,000 executions, and it's not likely to be any faster...
SELECT ( SELECT COUNT(*)
FROM cdrs cl2
WHERE cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime
) AS max_channels
, cl1.StartTime
FROM cdrs cl1
ORDER BY max_channels DESC
LIMIT 1
You could run an EXPLAIN on that; we're still going to see "Using temporary; Using filesort", and it will also show the "dependent subquery"...
Obviously, adding a predicate on the cl1 table to cut down the number of rows returned (for example, checking only the past 15 days) should speed things up, but it doesn't get you the answer you want.
WHERE cl1.StartTime > NOW() - INTERVAL 15 DAY
(None of my musings here are sure-fire answers to your question, or solutions to the performance issue; they're just musings.)
I am trying to get the correct formatting of results back from a MySQL query. When I ignore the NULL values, the formatting is correct, but when I allow NULL values to be included, my results are messed up.
This is the query I am using:
select name,suite,webpagetest.id,MIN(priority) AS min_pri
FROM webpagetest,comparefileerrors
WHERE vco="aof" AND user="1" AND calibreversion="9"
AND webpagetest.id=comparefileerrors.id
AND comparefileerrors.priority IS NOT NULL
GROUP BY id
ORDER BY coalesce(priority,suite,name) ASC;
This returns the expected output:
+-----------------------------+-----------------------------+-------+---------+
| name | suite | id | min_pri |
+-----------------------------+-----------------------------+-------+---------+
| set_get_status | shortsRepairDB_2009.1_suite | 6193 | 0 |
| u2uDemo | shortsRepairDB_2009.1_suite | 6195 | 0 |
| change_sets | shortsRepairDB_2009.1_suite | 6194 | 0 |
| bz1508_SEGV_password | NULL | 6185 | 1 |
| assign_short_AND_user_info | shortsRepairDB_2009.1_suite | 6198 | 2 |
| bz1273_cmdline_execplussvdb | NULL | 6203 | 2 |
| bz1747_bad_lvsf | NULL | 36683 | 3 |
+-----------------------------+-----------------------------+-------+---------+
However, sometimes the priority values will not be set. In that case, I want the database to treat the priority as if it were extremely high, so that the rows with a NULL priority are at the very bottom. I cannot set the priority ahead of time (using a default value), but is it possible to do this for the purposes of the sort?
Currently, if I issue the following command:
select name,suite,webpagetest.id,MIN(priority) AS min_pri
FROM webpagetest,comparefileerrors
WHERE vco="aof" AND user="1" AND calibreversion="9"
AND webpagetest.id=comparefileerrors.id
GROUP BY id
ORDER BY coalesce(priority,suite,name) ASC;
I get output like the following:
+-----------------------------+-------+-------+---------+
| name | suite | id | min_pri |
+-----------------------------+-------+-------+---------+
| bz1747_bad_lvsf | NULL | 36683 | 1 |
| NEC_Dragon.query | NULL | 36684 | NULL |
| avago_hwk_elam0_asic | NULL | 6204 | NULL |
| bz1273_cmdline_execplussvdb | NULL | 6203 | 2 |
| bz1491_query_server_crash | NULL | 6188 | NULL |
| bz1493_export_built_in_prop | NULL | 6186 | NULL |
+-----------------------------+-------+-------+---------+
6 rows in set (0.68 sec)
Here I have lost the formatting I had before. I would like the formatting to be as follows:
+-----------------------------+-------+-------+---------+
| name | suite | id | min_pri |
+-----------------------------+-------+-------+---------+
| bz1747_bad_lvsf | NULL | 36683 | 0 |
| NEC_Dragon.query | NULL | 36684 | 0 |
| avago_hwk_elam0_asic | NULL | 6204 | 1 |
| bz1273_cmdline_execplussvdb | NULL | 6203 | 2 |
| bz1491_query_server_crash | NULL | 6188 | NULL |
| bz1493_export_built_in_prop | NULL | 6186 | NULL |
+-----------------------------+-------+-------+---------+
6 rows in set (0.68 sec)
Hopefully I've explained this well enough that someone can understand what I want here.
Thanks for looking!
If you don't want to use a sentinel value, i.e. ORDER BY COALESCE(priority, 99999), use:
select * from x
order by
case
when priority is not null then 1 /* non-nulls, first */
else 2 /* nulls last */
end,
priority
Or you can take advantage of the fact that a boolean expression in MySQL evaluates to either 1 or 0:
select * from x
order by priority is null, priority
Or, if you're using PostgreSQL:
select * from x order by priority nulls first
Alternatively:
select * from x order by priority nulls last
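Below is a sketch showing that the CASE key and the `priority IS NULL` key both push NULLs to the end. It uses Python's sqlite3 as a stand-in for MySQL. The table x and its rows are made up, and name is added as a tie-breaker so the order is fully determined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (name TEXT, priority INTEGER)")
conn.executemany("INSERT INTO x VALUES (?, ?)",
                 [("a", 2), ("b", None), ("c", 0), ("d", None), ("e", 1)])

# CASE version: non-NULLs get sort key 1, NULLs get 2.
case_order = conn.execute(
    """SELECT name, priority FROM x
       ORDER BY CASE WHEN priority IS NOT NULL THEN 1 ELSE 2 END, priority, name"""
).fetchall()

# Boolean version: `priority IS NULL` is 0 for non-NULLs and 1 for NULLs.
bool_order = conn.execute(
    "SELECT name, priority FROM x ORDER BY priority IS NULL, priority, name"
).fetchall()
print(case_order)
print(bool_order)
```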
Sounds like you want MIN(IFNULL(priority, 99999)). See the documentation for the IFNULL() function.
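Below is a sketch of the MIN(IFNULL(...)) idea applied to the grouped query. It uses Python's sqlite3 as a stand-in for MySQL, and the ids and priorities are made up.

This variation puts the sentinel only in the ORDER BY. The displayed min_pri therefore stays NULL for a group with no priority, while that group still sorts last. It assumes no real priority reaches 99999.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comparefileerrors (id INTEGER, priority INTEGER)")
conn.executemany(
    "INSERT INTO comparefileerrors VALUES (?, ?)",
    [(6193, 0), (6193, None), (6203, 2), (6188, None), (36683, 1)],
)
rows = conn.execute(
    """SELECT id, MIN(priority) AS min_pri      -- shown value keeps NULL
       FROM comparefileerrors
       GROUP BY id
       ORDER BY MIN(IFNULL(priority, 99999)), id  -- NULL-only groups sort last"""
).fetchall()
print(rows)
```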