SELECT IF(priority_date, priority_date, created_at) as created_at
FROM table
WHERE IF(priority_date , priority_date , created_at)
BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59';
What is the best way to execute this query, performance-wise?
I have a fairly large table with two datetime columns: created_at and priority_date.
priority_date doesn't always exist, but when it does it is the column that should be queried on; otherwise the query falls back to created_at. created_at is always set when the row is created. The query above causes a (nearly) full table scan.
The explain plan for initial query:
+------+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | table | ALL | NULL | NULL | NULL | NULL | 444877 | Using where |
+------+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+
I should also note that, on a single row, priority_date and created_at are not necessarily both within the time frame in question. So doing something like:
WHERE priority_date BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59'
OR created_at BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59'
could give bad results if, say, priority_date were 2017-10-04 23:10:43 and created_at were 2017-10-10 01:23:45.
Current row count for the table: 582739
Count of rows matching WHERE priority_date BETWEEN ...: 3908
Count of rows matching WHERE created_at BETWEEN ...: 3437
Example EXPLAIN when just one of the columns is queried in the WHERE ... BETWEEN:
+------+-------------+-----------------+-------+----------------------------------+----------------------------------+---------+------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-----------------+-------+----------------------------------+----------------------------------+---------+------+------+-----------------------+
| 1 | SIMPLE | table | range | table_created_at_index | table_created_at_index | 5 | NULL | 3436 | Using index condition |
+------+-------------+-----------------+-------+----------------------------------+----------------------------------+---------+------+------+-----------------------+
Clearly the IF() is not the most efficient approach. Both columns are indexed, and the row estimates of the individual EXPLAIN plans match the actual counts. How can I keep the priority/fallback behavior without the wild performance loss?
EDIT
The best I've been able to figure out (but wow, does it feel verbose and copy/paste-y):
SELECT IF(priority_date, priority_date, created_at) as created_at, priority_date
FROM table
WHERE priority_date BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59'
OR created_at BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59'
HAVING ((priority_date AND priority_date BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59')
OR created_at BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59');
And its explain plan:
+------+-------------+-----------------+-------------+-----------------------------------------------------------------------+-----------------------------------------------------------------------+---------+------+------+------------------------------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-----------------+-------------+-----------------------------------------------------------------------+-----------------------------------------------------------------------+---------+------+------+------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | table | index_merge | table_priority_date_index,table_created_at_index | table_priority_date_index,table_created_at_index | 6,5 | NULL | 7343 | Using sort_union(table_priority_date_index,table_created_at_index); Using where |
+------+-------------+-----------------+-------------+-----------------------------------------------------------------------+-----------------------------------------------------------------------+---------+------+------+------------------------------------------------------------------------------------------------------+
First you need a compound index on (priority_date, created_at), then you can use a query like this:
SELECT IF(priority_date, priority_date, created_at) as created_at, priority_date
FROM table
WHERE priority_date BETWEEN '2017-10-10' AND '2017-10-10 23:59:59'
OR (priority_date IS NULL AND created_at BETWEEN '2017-10-10' AND '2017-10-10 23:59:59');
Having priority_date first in the compound index makes a big difference. No union is required.
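For reference, a sketch of the DDL (the name priority_created_compound matches the key in the EXPLAIN below; backticks are needed because table is a reserved word):
ALTER TABLE `table`
  ADD INDEX priority_created_compound (priority_date, created_at);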
Explain results on 400k rows with 2000 results:
Extra: Using where; Using index
key: priority_created_compound
rows: 2000
SELECT priority_date as created_at
FROM table
WHERE priority_date BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59'
UNION ALL
SELECT created_at
FROM table
WHERE created_at BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59'
AND priority_date IS NULL;
You'll need an index starting with priority_date for the first half of this query, and an index on (created_at, priority_date) for the second half.
The first half will naturally not match any rows where the priority_date is NULL.
The second half will do the range-condition on created_at, and then among the subset of matching rows, further test that priority_date is NULL. This may be done by index condition pushdown.
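As DDL, those two indexes might look like this (index names are illustrative):
ALTER TABLE `table`
  ADD INDEX idx_priority_date (priority_date),
  ADD INDEX idx_created_priority (created_at, priority_date);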
( SELECT priority_date AS created_at
FROM table
WHERE priority_date >= '2017-10-10'
AND priority_date < '2017-10-10' + INTERVAL 1 DAY )
UNION DISTINCT
( SELECT created_at
FROM table
WHERE created_at >= '2017-10-10'
AND created_at < '2017-10-10' + INTERVAL 1 DAY
AND priority_date IS NULL )
With
INDEX(priority_date, created_at) -- in this order
Notes:
This way of expressing the range (>= plus < ... + INTERVAL) works the same for DATE, DATETIME, and DATETIME(6) columns, and avoids fiddling with end-of-day values, leap days, etc. (It makes no performance difference.)
For each subquery, the one index is "covering" and optimal. No ICP should be needed.
I chose DISTINCT on the UNION -- Though slower than ALL, it may be more to your app's liking. Switch to ALL if there can't be dups, or if dups are OK.
Related
I'm looking to achieve an efficient indexing scheme for my logs table, which looks like this:
MariaDB [Webapp]> explain logs;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | YES | MUL | NULL | |
| activity_name | varchar(20) | NO | | NULL | |
| activity_key | varchar(255) | NO | | NULL | |
| activity_value | varchar(255) | NO | | NULL | |
| activity_date | datetime | NO | MUL | NULL | |
+----------------+--------------+------+-----+---------+----------------+
I do searching like this:
SELECT *
FROM logs
WHERE user_id IN (1, 3)
AND activity_name IN ('login', 'logout')
AND activity_date >= '2020-02-01'
AND activity_date <= '2020-06-01'
Where columns user_id, activity_name and activity_date are involved
And sometimes like this:
SELECT *
FROM logs
WHERE user_id IN (1, 3)
AND activity_name IN ('login', 'logout')
Where both user_id and activity_name are involved but no date.
And like this too:
SELECT *
FROM logs
WHERE user_id IN (1, 3)
AND activity_date >= '2020-02-01'
AND activity_date <= '2020-06-01'
And:
SELECT *
FROM logs
WHERE activity_name IN ('login', 'logout')
AND activity_date >= '2020-02-01'
AND activity_date <= '2020-06-01'
I did read about compound indexes and that they would be good if my searches were always ordered the same way, but as you can see they're not, so I think compound indexes are not suitable.
I also read that a single-column index can only be used for one column at a time, so I think that won't be good for my case either.
Any ideas, please? I'm not very familiar with MySQL. How can I make my queries optimal?
Note: I don't normally use the wildcard (*) because I read that it slows things down; I only used it here to shorten the queries for easier understanding.
For each query, the basic idea is to have an index whose columns cover the WHERE clause. This cannot be achieved with a single index for all four queries; I think you need three indexes.
First, consider the following index:
logs(user_id, activity_name, activity_date)
It matches on the where clause of the first query:
WHERE
user_id IN (1, 3)
AND activity_name IN ('login', 'logout')
AND activity_date >= '2020-02-01'
AND activity_date <= '2020-06-01'
And also on the second query (the third index column is ignored here):
WHERE
user_id IN (1, 3)
AND activity_name IN ('login', 'logout')
For the two other queries, you need two separate indexes:
WHERE
user_id IN (1, 3)
AND activity_date >= '2020-02-01'
AND activity_date <= '2020-06-01'
Needs:
logs(user_id, activity_date)
And:
WHERE
activity_name IN ('login', 'logout')
AND activity_date >= '2020-02-01'
AND activity_date <= '2020-06-01'
Needs:
logs(activity_name, activity_date)
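Putting it together, a sketch of the DDL (index names are illustrative):
ALTER TABLE logs
  ADD INDEX idx_user_name_date (user_id, activity_name, activity_date),
  ADD INDEX idx_user_date (user_id, activity_date),
  ADD INDEX idx_name_date (activity_name, activity_date);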
Side note: in general, do not blindly SELECT *; instead, enumerate the columns you want in the result set, especially if you don't want them all. If you just need two or three columns, consider adding them at the end of the index, turning it into a covering index.
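For instance, if the last query really needed only activity_key besides the filter columns (a hypothetical), appending it would make the index covering:
ALTER TABLE logs
  ADD INDEX idx_name_date_key (activity_name, activity_date, activity_key);
SELECT activity_name, activity_date, activity_key
FROM logs
WHERE activity_name IN ('login', 'logout')
  AND activity_date >= '2020-02-01'
  AND activity_date <= '2020-06-01';
-- Every referenced column lives in the index, so no row lookups are needed.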
I have got two different tables with temperature values and a timestamp. I join those tables with this query:
SELECT UNIX_TIMESTAMP(l.TimeDate) time
, AVG(l.intemp)
, AVG(n.intemp)
, DATE_FORMAT(l.TimeDate, '%Y-%m-%d-%H') dates
FROM values.temps l
LEFT
JOIN values.net n
ON DATE_FORMAT(l.TimeDate, '%Y-%m-%d-%H') = DATE_FORMAT(n.TimeDate, '%Y-%m-%d-%H')
WHERE YEARWEEK('2017-01-17 00:00:00',1) = YEARWEEK(l.TimeDate,1)
GROUP
BY dates
ORDER
BY dates ASC
This query is a little bit slow, but it works and gives me the values for 1 week. So how can I optimize it?
I hesitated to respond because I'm actually struggling to think how to express your YEARWEEK condition in terms of a range query.
I thought something like this would work, but it refuses to use 'range'.
SELECT *
FROM my_table
WHERE dt BETWEEN CONCAT(STR_TO_DATE(CONCAT(YEARWEEK('2017-01-25'), ' Monday'), '%x%v %W'), ' 00:00:00')
AND CONCAT(STR_TO_DATE(CONCAT(YEARWEEK('2017-01-25'), ' Sunday'), '%x%v %W'), ' 23:59:59')
Perhaps others can spot my schoolboy error.
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | my_table | ALL | dt | NULL | NULL | NULL | 100 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
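For what it's worth, precomputing the week boundaries keeps the predicate sargable. A sketch, with the Monday of the target ISO week hard-coded (here for the week containing 2017-01-25; in practice you would compute it in application code):
SELECT *
FROM my_table
WHERE dt >= '2017-01-23'                    -- Monday of that ISO week
  AND dt < '2017-01-23' + INTERVAL 7 DAY;   -- exclusive upper bound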
I have written the query below and it takes nearly 5 minutes to run. The table has 6 million rows, and I found from the execution plan that somehow my query does not use indexes, even though all fields of the table are indexed.
Query
SELECT
event_date as date,
(CAST('2014-05-31' AS DATE)- INTERVAL 5 MONTH + INTERVAL 1 DAY) AS FROM_DATE,
COUNT(DISTINCT(IF( Column1 !=0 OR Column2!=0 OR Column3 !=0, account, NULL))) AS total_account1,
COUNT(DISTINCT(IF( Column4 !=0 OR Column5 !=0 OR Column6!=0, account, NULL))) AS total_account2,
COUNT(DISTINCT(IF( Column7 !=0 OR Column8 !=0 OR Column9!=0, account, NULL))) AS total_account3
FROM Table_name
WHERE cast(event_date as DATE) BETWEEN CAST('2014-05-31' AS DATE)- INTERVAL 5 MONTH and CAST('2014-05-31' AS DATE)
AND cast(event_date as DATE) < NOW() - INTERVAL 2 DAY
GROUP BY MONTH(event_date)
"Explain" above query output is -
+----+-------------+---------+------+---------------+------+---------+------+---------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+-----------------------------+
| 1 | SIMPLE | table_name | ALL | NULL | NULL | NULL | NULL | 5764552 | Using where; Using filesort |
+----+-------------+---------+------+---------------+------+---------+------+---------+-----------------------------+
Why is my query not using the indexes available to it?
You can explicitly force the engine to use an index.
See the index-hint documentation: http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
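A minimal sketch of the hint syntax, assuming an index named idx_event_date exists on event_date (note that a hint can only help if the column is compared directly, without the CAST() wrapper):
SELECT event_date
FROM Table_name FORCE INDEX (idx_event_date)
WHERE event_date BETWEEN CAST('2014-05-31' AS DATE) - INTERVAL 5 MONTH
                     AND CAST('2014-05-31' AS DATE);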
I have a table like this:
id | OpenDate | CloseDate
------------------------------------------------
1 | 2013-01-16 07:30:48 | 2013-01-16 10:49:48
2 | 2013-01-16 08:30:00 | NULL
I need to get a combined result like this:
id | date | type
---------------------------------
1 | 2013-01-16 07:30:48 | Open
1 | 2013-01-16 10:49:48 | Close
2 | 2013-01-16 08:30:00 | Open
I used UNION to get the above output (can we achieve this without UNION?):
SELECT id,date,type FROM(
SELECT id,`OpenDate` as date, 'Open' as 'type' FROM my_table
UNION ALL
SELECT id,`CloseDate` as date, 'Close' as 'type' FROM my_table
)AS `tab` LIMIT 0,15
I am getting the desired output, but now on the performance side: I have 4000 records in my table, and the UNION combines them into around 8000 rows, which makes the site very slow to load (more than 13 seconds). How can I optimize this query to speed up the output?
I also tried LIMIT in the sub-queries, but then the pagination offset does not work properly. Please help me resolve this.
Update
EXPLAIN result
+------+--------------+------------+-------+-----------+---------+------+------+-------------+
| id   | select_type  | table      | type  | key       | key_len | ref  | rows | Extra       |
+------+--------------+------------+-------+-----------+---------+------+------+-------------+
| 1    | PRIMARY      | <derived2> | ALL   | NULL      | NULL    | NULL | 8858 |             |
| 2    | DERIVED      | orders     | index | OpenDate  | 4       | NULL | 4588 | Using index |
| 3    | UNION        | orders     | index | CloseDate | 4       | NULL | 4588 | Using index |
| NULL | UNION RESULT | <union2,3> | ALL   | NULL      | NULL    | NULL | NULL |             |
+------+--------------+------------+-------+-----------+---------+------+------+-------------+
I would do something like the following:
SELECT
    t1.id,
    IF(act, t1.OpenDate, t1.CloseDate) AS `date`,
    IF(act, 'Open', 'Close') AS `type`
FROM my_table t1
-- two-row driver: act = 1 produces the Open row, act = 0 the Close row
JOIN (SELECT 1 AS act UNION ALL SELECT 0) AS acts
-- skip the Close row when CloseDate is NULL
WHERE IF(act, t1.OpenDate, t1.CloseDate) IS NOT NULL;
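If pagination matters, put an explicit ORDER BY before the LIMIT so that offsets stay stable across pages; a usage sketch:
SELECT
    t1.id,
    IF(act, t1.OpenDate, t1.CloseDate) AS `date`,
    IF(act, 'Open', 'Close') AS `type`
FROM my_table t1
JOIN (SELECT 1 AS act UNION ALL SELECT 0) AS acts
WHERE IF(act, t1.OpenDate, t1.CloseDate) IS NOT NULL
ORDER BY `date`, t1.id
LIMIT 0, 15;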
Could you please help me optimize this query? I've spent a lot of time on it and still cannot rephrase it to be fast enough (say, running in a matter of seconds, not minutes as it does now).
The query:
SELECT m.my_id, m.my_value, m.my_timestamp
FROM (
SELECT my_id, MAX(my_timestamp) AS most_recent_timestamp
FROM my_table
WHERE my_timestamp < '2011-03-01 08:00:00'
GROUP BY my_id
) as tmp
LEFT OUTER JOIN my_table m
ON tmp.my_id = m.my_id AND tmp.most_recent_timestamp = m.my_timestamp
ORDER BY m.my_timestamp;
my_table is defined as follows:
CREATE TABLE my_table (
my_id INTEGER NOT NULL,
my_value VARCHAR(4000),
my_timestamp TIMESTAMP default CURRENT_TIMESTAMP NOT NULL,
INDEX MY_ID_IDX (my_id),
INDEX MY_TIMESTAMP_IDX (my_timestamp),
INDEX MY_ID_MY_TIMESTAMP_IDX (my_id, my_timestamp)
);
The goal of this query is to select the most recent my_value for each my_id before some timestamp. my_table contains ~100 million entries, and the query takes ~8 minutes.
explain:
+----+-------------+------------+-------+----------------------------------------------------+------------------------+---------+---------------------------+-------+---------------------------------------+
| id | select_type | table      | type  | possible_keys                                      | key                    | key_len | ref                       | rows  | Extra                                 |
+----+-------------+------------+-------+----------------------------------------------------+------------------------+---------+---------------------------+-------+---------------------------------------+
| 1  | PRIMARY     | <derived2> | ALL   | NULL                                               | NULL                   | NULL    | NULL                      | 90721 | Using temporary; Using filesort       |
| 1  | PRIMARY     | m          | ref   | MY_ID_IDX,MY_TIMESTAMP_IDX,MY_ID_MY_TIMESTAMP_IDX  | MY_TIMESTAMP_IDX       | 4       | tmp.most_recent_timestamp | 1     | Using where                           |
| 2  | DERIVED     | my_table   | range | MY_TIMESTAMP_IDX                                   | MY_ID_MY_TIMESTAMP_IDX | 8       | NULL                      | 61337 | Using where; Using index for group-by |
+----+-------------+------------+-------+----------------------------------------------------+------------------------+---------+---------------------------+-------+---------------------------------------+
If I understand correctly, you should be able to drop the nested select completely, and move the where clause to the main query, order by my_timestamp descending and limit 1.
SELECT my_id, my_value, max(my_timestamp)
FROM my_table
WHERE my_timestamp < '2011-03-01 08:00:00'
GROUP BY my_id
*edit - added max and group by
A trick to get the most recent record is to use ORDER BY together with LIMIT 1 instead of MAX() aggregation plus a "self" join.
Something like this (not tested):
SELECT m.my_id, m.my_value, m.my_timestamp
FROM my_table m
WHERE my_timestamp < '2011-03-01 08:00:00'
ORDER BY m.my_timestamp DESC
LIMIT 1
;
Update: the above doesn't work, because grouping is required...
Here is another solution that uses a WHERE ... IN (subselect) instead of the JOIN you've used.
It could be faster; please test it with your data.
SELECT m.my_id, m.my_value, m.my_timestamp
FROM my_table m
WHERE ( m.my_id, m.my_timestamp ) IN (
SELECT i.my_id, MAX(i.my_timestamp)
FROM my_table i
WHERE i.my_timestamp < '2011-03-01 08:00:00'
GROUP BY i.my_id
)
ORDER BY m.my_timestamp;
I notice in the explain plan that the optimizer is using the MY_ID_MY_TIMESTAMP_IDX index for the sub-query, but not for the outer query.
You may be able to speed it up using an index hint. I also updated the ON clause to refer to tmp.most_recent_timestamp using its alias.
SELECT m.my_id, m.my_value, m.my_timestamp
FROM (
SELECT my_id, MAX(my_timestamp) AS most_recent_timestamp
FROM my_table
WHERE my_timestamp < '2011-03-01 08:00:00'
GROUP BY my_id
) as tmp
LEFT OUTER JOIN my_table m use index (MY_ID_MY_TIMESTAMP_IDX)
ON tmp.my_id = m.my_id AND tmp.most_recent_timestamp = m.my_timestamp
ORDER BY m.my_timestamp;