MySQL doesn't use indexes in a SELECT clause subquery

I have an "events" table
table events
id (pk, auto inc, unsigned int)
field1,
field2,
...
date DATETIME (indexed)
I am trying to analyse holes in the traffic (the moments in a day where there are 0 events).
I tried this kind of query:
SELECT
e1.date AS date1,
(
SELECT date
FROM events AS e2
WHERE e2.date > e1.date
LIMIT 1
) AS date2
FROM events AS e1
WHERE e1.date > NOW() -INTERVAL 10 DAY
It takes a very long time.
Here is the EXPLAIN output:
+----+--------------------+-------+-------+---------------------+---------------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------------+---------------------+---------+------+----------+-------------+
| 1 | PRIMARY | t1 | range | DATE | DATE | 6 | NULL | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | t2 | ALL | DATE | NULL | NULL | NULL | 58678524 | Using where |
+----+--------------------+-------+-------+---------------------+---------------------+---------+------+----------+-------------+
2 rows in set (0.00 sec)
Tested on MySQL 5.5
Why can't MySQL use the DATE index? Is it because of the subquery?

Your query suffers from the problem shown here, which also presents a quick solution using temporary tables. That is a MySQL forum page, which I unearthed through this Stack Overflow question.
You may find that creating and populating such a table on the fly yields bearable performance and is easy to implement for the range of datetimes from NOW() less 10 days.
If you need assistance in crafting anything, let me know. I will see if I can help.

You are looking for dates with no events?
First build a table Days with all possible dates (dy). This will give you the uneventful days:
SELECT dy
FROM Days
WHERE NOT EXISTS ( SELECT * FROM events
WHERE date >= Days.dy
AND date < Days.dy + INTERVAL 1 DAY )
AND dy > NOW() - INTERVAL 10 DAY
Please note that 5.6 has some optimizations in this general area.
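A minimal sketch of how such a Days table could be built for the last ten days, assuming the column name dy used above. The inline list of numbers is only an illustration; it avoids features that MySQL 5.5 lacks (such as recursive CTEs), and a permanent numbers or calendar table would serve equally well:

```sql
-- Calendar table with one row per day (name and column as in the answer above).
CREATE TABLE Days (
  dy DATE NOT NULL PRIMARY KEY
);

-- Fill it with today and the ten preceding days.
-- The derived table "nums" is a hypothetical helper that supplies 0..10.
INSERT INTO Days (dy)
SELECT CURDATE() - INTERVAL nums.n DAY
FROM (
  SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
  UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
  UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10
) AS nums;
```

Because the NOT EXISTS subquery compares the bare date column against a range, it should be able to use the index on events.date for each day.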

Related

Getting records by date is multiples of 30 days

I have the following query to get appointments that need a reminder once a month if they are not done yet. I want to get records that are 30, 60, 90, 120, etc. days in the past from the current date.
SELECT
a.*
FROM
appointments a
WHERE
DATEDIFF(CURDATE(), a.appointment_date) % 30 = 0
Is there another way to achieve this without using DATEDIFF? I want to improve the performance of this query.

Ok, let's all put the dates and date-diff aside for a moment. Looking at the question, the person is trying to look for all appointments in the past that don't necessarily have another in the future, such as a FOLLOW-UP appointment with a doctor: "Come back in a month to see where things change". This makes me think there is probably some patient ID in the table of appointments. So the question probably becomes one of looking at appointments from 30, 60 or 90 days ago to see whether a corresponding appointment is scheduled in the future. If one is already scheduled, the patient does not need a reminder call to come into the office.
That said, I would start a bit differently: get all patients that DID have an appointment within the last 90 days, and see whether or not they already have a follow-up appointment on the schedule. This way, the office person can contact those patients to get them on the calendar.
Start by getting the most recent appointment for each patient within the last 90 days. If someone had an appointment 90 days ago and a follow-up 59 days ago, you probably only care about the most recent appointment, to make sure THAT one has a follow-up.
select
a1.patient_id,
max( a1.appointment_date ) MostRecentApnt
from
appointments a1
WHERE
a1.appointment_date > date_sub( curdate(), interval 90 day )
group by
a1.patient_id
Now, from this fixed list and beginning date, all we care about is how many days have passed since their last appointment. Is it X number of days? Just use DATEDIFF and sort, and you can see the number of days at a glance. Rather than trying to break them into buckets of 30, 60 or 90 days, knowing how many days have passed since the last appointment is probably just as easy: sort in DESCENDING order so the oldest appointments get called first, ahead of those that just happened. You could even cut off the calling list at, say, 20 days, for patients who still have not made an appointment and are getting CLOSE to the expected 30 days in question.
SELECT
p.LastName,
p.FirstName,
p.Phone,
Last90.Patient_ID,
Last90.MostRecentApnt,
DATEDIFF(CURDATE(), Last90.MostRecentApnt) LastAppointmentDays
FROM
( select
a1.patient_id,
max( a1.appointment_date ) MostRecentApnt
from
appointments a1
WHERE
a1.appointment_date > date_sub( curdate(), interval 90 day )
group by
a1.patient_id ) Last90
-- Guessing you might want patient data to do phone calling
JOIN Patients p
on Last90.Patient_id = p.patient_id
order by
LastAppointmentDays DESC,
p.LastName,
p.FirstName
Sometimes, an answer to just the direct question doesn't meet the actual need. Hopefully I am more on target with the desired ultimate outcome. Again, the above implies joining to the patient table for follow-up calls to schedule an appointment.

You could use the following query, which compares the day of the month of the appointment to the day of the month of today.
We also test whether today is the last day of the month so as to get appointments due at the end of the month. For example, if today is the 28th of February (not a leap year), we will accept days of the month >= 28, i.e. 29, 30 and 31, which would otherwise be missed.
This method has the same problem as your current system: appointments falling during the weekend will be missed.
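select a.*
from appointments a,
(select
day(now()) today,
-- compare day numbers: last_day() returns a date, not a day of the month
case when day(now()) = day(last_day(now())) then day(now()) else 99 end lastDay
) days
where day(a.appointment_date) = today or day(a.appointment_date) >= lastDay;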

You just want the appointments for 30 days in the future? Are they stored as DATE? Or DATETIME? Well, this works in either case:
SELECT ...
WHERE appt_date >= CURDATE() + INTERVAL 30 DAY
AND appt_date < CURDATE() + INTERVAL 31 DAY
If you have INDEX(appt_date) (or any index starting with appt_date), the query will be efficient.
Things like DATE() are not "sargable", and prevent the use of an index.
If your goal is to nag customers, I see nothing in your query to prevent nagging everyone over and over. This might need a separate "nag" table, where customers who have satisfied the nag can be removed. Then performance won't be a problem, since the table will be small.
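Applying the same sargable idea to the original 30/60/90-day reminder, here is a rough sketch. It assumes appointment_date is a DATE column (not DATETIME) and that only a bounded number of multiples matters; the cut-off at 120 days is an arbitrary choice for illustration:

```sql
-- Compare the bare column against constants computed once per query,
-- instead of wrapping the column in DATEDIFF() for every row.
SELECT a.*
FROM appointments a
WHERE a.appointment_date IN (
  CURDATE() - INTERVAL 30 DAY,
  CURDATE() - INTERVAL 60 DAY,
  CURDATE() - INTERVAL 90 DAY,
  CURDATE() - INTERVAL 120 DAY
);
```

With an index on appointment_date, each listed date should be resolvable through the index. For a DATETIME column, each value would instead need a half-open range like the one shown above.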

If your primary concern is to speed up this query, we can add an int column that holds a day count and index it. We then add triggers that calculate the datediff between the appointment date and the start of the Unix epoch, 1970-01-01 (or any other date if you prefer), take it modulo 30, and store the result in this column.
This will take a small amount of storage space and slow down insert and update operations. This will not be noticeable when we add or modify one appointment at a time, which I suspect to be the general case.
When we query our table we calculate the day value of today, which takes very little time as it is only done once, and compare it with the days column, which is very quick because the column is indexed and no per-row calculations are involved.
Finally, we run your current query and look at it using EXPLAIN to see that, even though we have indexed the column date_, the index cannot be used for this query.
CREATE TABLE appointments (
id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
date_ date,
days int
);
CREATE INDEX ix_apps_days ON appointments (days);
CREATE PROCEDURE apps_day()
BEGIN
UPDATE appointments SET days = day(date_);
END
CREATE TRIGGER t_apps_insert BEFORE INSERT ON appointments
FOR EACH ROW
BEGIN
SET NEW.days = DATEDIFF(NEW.date_, '1970-01-01') % 30 ;
END;
CREATE TRIGGER t_apps_update BEFORE UPDATE ON appointments
FOR EACH ROW
BEGIN
SET NEW.days = DATEDIFF(NEW.date_, '1970-01-01') % 30 ;
END;
insert into appointments (date_) values ('2022-01-01'),('2022-01-01'),('2022-04-15'),(now());
update appointments set date_ = '2022-01-12' where id = 1;
select * from appointments
id | date_ | days
-: | :--------- | ---:
1 | 2022-01-12 | 14
2 | 2022-01-01 | 3
3 | 2022-04-15 | 17
4 | 2022-04-22 | 24
select
*
from appointments
where DATEDIFF(CURDATE() , '1970-01-01') % 30 = days;
id | date_ | days
-: | :--------- | ---:
4 | 2022-04-22 | 24
explain
select DATEDIFF(CURDATE() , '1970-01-01')
from appointments
where DATEDIFF(CURDATE() , '1970-01-01') = days;
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
-: | :---------- | :----------- | :--------- | :--- | :------------ | :----------- | :------ | :---- | ---: | -------: | :----------
1 | SIMPLE | appointments | null | ref | ix_apps_days | ix_apps_days | 5 | const | 1 | 100.00 | Using index
CREATE INDEX ix_apps_date_ ON appointments (date_);
SELECT
a.*
FROM
appointments a
WHERE
DATEDIFF(CURDATE(), a.date_) % 30 = 0
id | date_ | days
-: | :--------- | ---:
4 | 2022-04-22 | 24
explain
SELECT
a.*
FROM
appointments a
WHERE
DATEDIFF(CURDATE(), a.date_) % 30 = 0
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
-: | :---------- | :---- | :--------- | :--- | :------------ | :--- | :------ | :--- | ---: | -------: | :----------
1 | SIMPLE | a | null | ALL | null | null | null | null | 4 | 100.00 | Using where
db<>fiddle here

The best way to convert time zone efficiently in MYSQL query

My table 'my_logs' has about 20,000,000 records, and I want to find out how many logs I have on each date within a span of a few days.
I want a result like this:
+------------+---------+
| date | count |
+------------+---------+
| 2016-07-01 | 1623 |
| 2016-07-02 | 1280 |
| 2016-07-03 | 2032 |
+------------+---------+
The query below takes only milliseconds to finish, which is good:
SELECT DATE_FORMAT(created_at, '%Y-%m-%d') as date,
COUNT(*) as count
FROM my_logs
WHERE created_at BETWEEN '2016-07-01' AND '2016-07-04'
GROUP BY DATE_FORMAT(created_at, '%Y-%m-%d')
The Explain of query:
+------------+---------+-------+-----------------------------+
|select_type | table | type | possible_keys |
+------------+---------+-------+-----------------------------+
| SIMPLE | my_logs| index | index_my_logs_on_created_at |
+------------+---------+-------+-----------------------------+
+-----------------------------+---------+----------+
| key | key_len | rows |
+-----------------------------+---------+----------+
| index_my_logs_on_created_at | 10 | 23458462 |
+-----------------------------+---------+----------+
+-----------------------------------------------------------+
| Extra |
+-----------------------------------------------------------+
| Using where; Using index; Using temporary; Using filesort |
+-----------------------------------------------------------+
However, I need to convert the time zone of each record to match the time in my country, and I need to group by the resulting date, so I need to convert the column itself.
Both
SELECT COUNT(*)
FROM my_logs
WHERE DATE_ADD(created_at, INTERVAL 8 HOUR) BETWEEN '2016-07-01' AND '2016-07-04'
GROUP BY DATE_FORMAT(DATE_ADD(created_at, INTERVAL 8 HOUR), '%Y-%m-%d')
and
SELECT COUNT(*)
FROM my_logs
WHERE CONVERT_TZ(created_at, "+00:00", "+08:00") BETWEEN '2016-07-01' AND '2016-07-04'
GROUP BY DATE_FORMAT(CONVERT_TZ(created_at, "+00:00", "+08:00"),
'%Y-%m-%d')
take about 12 s to finish, which is unbearably slow!
(The EXPLAIN output is the same as for the query at the top.)
I think this is a common problem, but I can't find a good way to deal with it. Does anyone have a more efficient way to do it? Thanks!

Which datatype, TIMESTAMP vs. DATETIME, did you use? (But, I'll ignore that.)
Do not "hide" an indexed column (created_at) inside any function (CONVERT_TZ()). It makes it so that the WHERE clause cannot use the index and must scan the table instead. This fix is simple:
WHERE created_at >= '2016-07-01' - INTERVAL 8 HOUR
AND created_at < '2016-07-04' - INTERVAL 8 HOUR
(or use CONVERT_TZ). Note that I also fixed the bug wherein you included midnight from the 4th. Note: Even + INTERVAL... is effectively a function.
Expressions in the SELECT and the GROUP BY are far less critical to performance.
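Putting this together, a sketch of the full query might look as follows. It assumes created_at is stored in UTC and uses the fixed +8 hour offset from the question; the aliases local_date and cnt are made up for illustration:

```sql
-- The conversion stays in SELECT / GROUP BY, where it does not block index use.
-- The WHERE clause compares the bare column against shifted constants.
SELECT DATE(created_at + INTERVAL 8 HOUR) AS local_date,
       COUNT(*) AS cnt
FROM my_logs
WHERE created_at >= '2016-07-01' - INTERVAL 8 HOUR
  AND created_at <  '2016-07-04' - INTERVAL 8 HOUR
GROUP BY local_date;
```

The range covers local dates 2016-07-01 through 2016-07-03, matching the three-row result the question asks for.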

Where clause containing date in MySQL statement not working

Table Name: DemoTable.
Total Fields: 2
Fields:
id (int, auto increment, primary key)
month_and_year (varchar(10))
month_and_year contains date as '2015-03', '2015-01', '2014-12' and so on...
I am trying to get values from the table between '2014-10' and '2015-03'.
SELECT * FROM DemoTable where month_and_year>='2014-10' AND month_and_year<='2015-03' ORDER BY month_and_year DESC
Query does not give desired output as month_and_year field has varchar data type. Changing varchar to date data type isn't possible as date data type does not accept date in 'yyyy-mm' format.
How can the result be obtained?
PS:Is UNIX_TIMESTAMP() a safe bet in this case?

You should never store a date value as varchar; choose one of MySQL's native date data types such as DATE, DATETIME or TIMESTAMP.
However, in your case you need to do some date-related calculations before doing the select query. Consider the following table:
mysql> select * from test ;
+------+----------------+
| id | month_and_year |
+------+----------------+
| 1 | 2014-10 |
| 2 | 2014-10 |
| 3 | 2014-09 |
| 4 | 2014-11 |
| 5 | 2015-01 |
| 6 | 2014-08 |
+------+----------------+
The approach is as follows:
First, convert the varchar to a real date.
Then, for the lower limit, always start the comparison from the first day of the year-month value.
The upper limit runs until the end of the month.
So the query becomes:
select * from test
where
date_format(
str_to_date(
month_and_year,'%Y-%m'
),'%Y-%m-01'
)
>=
date_format(
str_to_date('2014-10','%Y-%m'
),'%Y-%m-01'
)
and
last_day(
date_format(
str_to_date(month_and_year,'%Y-%m'
),'%Y-%m-01'
)
)
<=
last_day(
date_format(
str_to_date('2015-03','%Y-%m'
),'%Y-%m-01'
)
);
The output will be:
+------+----------------+
| id | month_and_year |
+------+----------------+
| 1 | 2014-10 |
| 2 | 2014-10 |
| 4 | 2014-11 |
| 5 | 2015-01 |
+------+----------------+

Use the function STR_TO_DATE(string,format);
http://www.mysqltutorial.org/mysql-str_to_date/

You should either use the MySQL date and time functions, or use an int field and store a UNIX timestamp and compare as you are already doing. I think storing a UNIX timestamp is overkill, because you only need the month and year and would not gain much from its advantages.
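One way to follow the advice about native date types is sketched below: keep the existing varchar column and add a DATE column holding the first day of each month. The column name month_start and the index name are hypothetical:

```sql
-- Add a real DATE column next to the existing 'YYYY-MM' string.
ALTER TABLE DemoTable ADD COLUMN month_start DATE;

-- Append '-01' so the parsed value is a complete, valid date.
UPDATE DemoTable
SET month_start = STR_TO_DATE(CONCAT(month_and_year, '-01'), '%Y-%m-%d');

CREATE INDEX idx_demo_month_start ON DemoTable (month_start);

-- The range query then compares the bare column against date constants.
SELECT *
FROM DemoTable
WHERE month_start >= '2014-10-01'
  AND month_start <= '2015-03-01'
ORDER BY month_start DESC;
```

Since the column is compared directly, without a function wrapped around it, the index on month_start can be considered by the optimizer.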

MySQL query date with offset given

In MySQL I have a table node_weather:
mysql> desc node_weather;
+--------------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+-------------------+-----------------------------+
| W_id | mediumint(9) | NO | PRI | NULL | auto_increment |
| temperature | int(10) | YES | | NULL | |
| humidity | int(10) | YES | | NULL | |
| time | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
Now what I need to do is the following: for every two hours of the current day (00:00:00, 02:00:00, ..., 24:00:00) I want to get a temperature. Normally the query could be like that:
mysql> SELECT temperature
-> FROM node_weather
-> WHERE date(time) = DATE(NOW())
-> AND TIME(time) IN ('00:00:00','02:00:00','04:00:00','06:00:00','08:00:00','10:00:00','12:00:00','14:00:00','16:00:00','18:00:00','20:00:00','22:00:00','24:00:00');
In the ideal case, I should get a result as 12 rows selected and everything would be fine. But there are two problems with it:
The table does not include data for the whole day, so for example the temperature for the time '24:00:00' is missing. In this case, I would like to return NULL.
The table sometimes records the data with a timestamp like '10:00:02' or '09:59:58' rather than '10:00:00'. To handle this case, I would like to apply an offset to all the values in the IN expression (something like ('10:00:00' - offset, '10:00:00' + offset)) and always select just ONE value (no matter which one) from this range.
I know it is kind of awkward, but that is how my boss wants it. Thanks for your help!

Okay, a bit more precise than what I wrote in comments:
EDIT: Had a bug. Hopefully this doesn't.
SELECT
time,
deviation,
hour,
temperature
FROM (
SELECT
time,
ROUND(HOUR(time) / 2) * 2 AS hour,
IF(HOUR(time) % 2,
3600 - MINUTE(time) * 60 - SECOND(time),
MINUTE(time) * 60 + SECOND(time)
) AS deviation,
temperature
FROM node_weather
WHERE DATE(time) = DATE(NOW())
ORDER BY deviation ASC
) t
GROUP BY hour
ORDER BY
hour ASC
Basically, group on intervals like 09:00:00 - 10:59:59 (by rounding hour/2), then sort ascending by those intervals, and within the interval by the distance to the center of the interval (so we choose 10:00:00 over 09:00:00 or 10:59:59).

What can cause mysql performance degradation after move?

I recently started moving my application from one host to another. From my home computer, to a virtual machine in the cloud. When testing the performance on the new node I noticed severe degradation. Comparing the results of the same query, with the same data, with the same version of mysql.
On my home computer:
mysql> SELECT id FROM events WHERE id in (SELECT distinct event AS id FROM results WHERE status='Inactive') AND (DATEDIFF(NOW(), startdate) < 30) AND (DATEDIFF(NOW(), startdate) > -1) AND status <> 10 AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
+------+
| id |
+------+
| 8238 |
| 8369 |
+------+
2 rows in set (0,57 sec)
and on the new machine:
mysql> SELECT id FROM events WHERE id in (SELECT distinct event AS id FROM results WHERE status='Inactive') AND (DATEDIFF(NOW(), startdate) < 30) AND (DATEDIFF(NOW(), startdate) > -1) AND status <> 10 AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
+------+
| id |
+------+
| 8369 |
+------+
1 row in set (26.70 sec)
Which means 46 times slower. That is not okay. I tried to get an explanation to why it was so slow. For my home computer:
mysql> explain SELECT id FROM events WHERE id in (SELECT distinct event AS id FROM results WHERE status='Inactive') AND (DATEDIFF(NOW(), startdate) < 30) AND (DATEDIFF(NOW(), startdate) > -1) AND status <> 10 AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
+----+--------------+-------------+--------+---------------+------------+---------+-------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------------+--------+---------------+------------+---------+-------------------+---------+-------------+
| 1 | SIMPLE | events | ALL | PRIMARY | NULL | NULL | NULL | 5370 | Using where |
| 1 | SIMPLE | <subquery2> | eq_ref | <auto_key> | <auto_key> | 5 | eventor.events.id | 1 | NULL |
| 2 | MATERIALIZED | results | ALL | idx_event | NULL | NULL | NULL | 1319428 | Using where |
+----+--------------+-------------+--------+---------------+------------+---------+-------------------+---------+-------------+
3 rows in set (0,00 sec)
And for my virtual node:
mysql> explain SELECT id FROM events WHERE id in (SELECT distinct event AS id FROM results WHERE status='Inactive') AND (DATEDIFF(NOW(), startdate) < 30) AND (DATEDIFF(NOW(), startdate) > -1) AND status <> 10 AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
+----+--------------------+---------+----------------+---------------+-----------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+----------------+---------------+-----------+---------+------+------+-------------+
| 1 | PRIMARY | events | ALL | NULL | NULL | NULL | NULL | 7297 | Using where |
| 2 | DEPENDENT SUBQUERY | results | index_subquery | idx_event | idx_event | 5 | func | 199 | Using where |
+----+--------------------+---------+----------------+---------------+-----------+---------+------+------+-------------+
2 rows in set (0.00 sec)
As you can see the results differ. I have not been able to figure out what the difference is. From all other point of views, the two system setups look similar.

In this case, the most likely problem is the processing of the subquery. This changed between some recent versions of MySQL (older versions do a poor job of optimizing the subqueries, the newest version does a better job).
One simple solution is to replace the in with exists and a correlated subquery:
SELECT id
FROM events
WHERE exists (SELECT 1
FROM results
WHERE status='Inactive' and results.event = events.id
) AND
(DATEDIFF(NOW(), startdate) < 30) AND (DATEDIFF(NOW(), startdate) > -1) AND status <> 10 AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
This should work well in both versions, especially if you have an index on results(status, event).

The difference between 5.5 and 5.6 because of the new optimizations for handling subqueries explains (as discussed in comments) the difference in performance, but this conclusion also masks the fact that the original query is not written optimally to begin with. There does not seem to be a need for a subquery here at all.
The "events" table needs an index on (status,form,startdate) and the "results" table needs an index on (status) and another index on (event).
SELECT DISTINCT e.id
FROM events e
JOIN results r ON r.event = e.id AND r.status = 'Inactive'
WHERE (e.form = 'IndSingleDay' OR e.form = 'IndMultiDay')
AND e.status != 10
AND e.startdate > DATE_SUB(DATE(NOW()), INTERVAL 30 DAY)
AND e.startdate < DATE_SUB(DATE(NOW()), INTERVAL 2 DAY);
You might have to tweak the values "30" and "2" to get precisely the same logic, but the important principle here is that you never want to use a column as an argument to a function in the WHERE clause if it can be avoided by rewriting the expression another way, because the optimizer can't look "backwards" through the function to discover the actual range of raw values that you are wanting it to find. Instead, it has to evaluate the function against all of the possible data that it can't otherwise eliminate.
Using functions to derive constant values for comparison to the column, as shown above, allows the optimizer to realize that it's actually looking for a range of start_date values, and narrow down the possible rows accordingly, assuming an index exists on the values in question.
If I've decoded your query correctly, this version should be faster than any subquery if the indexes are in place.
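As a rough sketch, the indexes described above could be created as shown below (the index names are made up), followed by one reading of the date bounds that should match the question's two DATEDIFF conditions. DATEDIFF compares only the date parts, so "< 30" and "> -1" amount to startdate falling between 29 days ago and today, inclusive. This assumes startdate is a DATE or DATETIME column:

```sql
-- Indexes suggested in the answer; names are hypothetical.
CREATE INDEX idx_events_status_form_startdate ON events (status, form, startdate);
CREATE INDEX idx_results_status ON results (status);
-- results(event) already appears as idx_event in the EXPLAIN output above.

SELECT DISTINCT e.id
FROM events e
JOIN results r ON r.event = e.id AND r.status = 'Inactive'
WHERE e.form IN ('IndSingleDay', 'IndMultiDay')
  AND e.status <> 10
  -- DATEDIFF(NOW(), startdate) < 30  ->  startdate is at most 29 days old
  AND e.startdate >= CURDATE() - INTERVAL 29 DAY
  -- DATEDIFF(NOW(), startdate) > -1  ->  startdate is not after today
  AND e.startdate <  CURDATE() + INTERVAL 1 DAY;
```

Running EXPLAIN on both hosts would show whether the optimizer actually picks these indexes.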
