Optimize MySQL Queries with a Group Function and Two Different Timestamps in a Join

I have two different tables with temperature values and a timestamp. I join those tables with this query:
SELECT UNIX_TIMESTAMP(l.TimeDate) time
, AVG(l.intemp)
, AVG(n.intemp)
, DATE_FORMAT(l.TimeDate, '%Y-%m-%d-%H') dates
FROM values.temps l
LEFT
JOIN values.net n
ON DATE_FORMAT(l.TimeDate, '%Y-%m-%d-%H') = DATE_FORMAT(n.TimeDate, '%Y-%m-%d-%H')
WHERE YEARWEEK('2017-01-17 00:00:00',1) = YEARWEEK(l.TimeDate,1)
GROUP
BY dates
ORDER
BY dates ASC
This query is a little bit slow, but it works and gives me the values for 1 week. So how can I optimize it?

I haven't answered yet because I'm struggling to express your YEARWEEK condition as a range query.
I thought something like this would work, but EXPLAIN shows it refuses to use 'range'.
SELECT *
FROM my_table
WHERE dt BETWEEN CONCAT(STR_TO_DATE(CONCAT(YEARWEEK('2017-01-25',1), ' Monday'), '%x%v %W'), ' 00:00:00')
AND CONCAT(STR_TO_DATE(CONCAT(YEARWEEK('2017-01-25',1), ' Sunday'), '%x%v %W'), ' 23:59:59')
Perhaps others can spot my schoolboy error.
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | my_table | ALL | dt | NULL | NULL | NULL | 100 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
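A minimal sketch of the range idea, assuming the week boundaries can be computed in the application (Python here) and passed to the query as parameters. The rows that YEARWEEK(TimeDate, 1) puts in a given week form one Monday-to-Sunday week, so the condition can be rewritten as a half-open range on the bare column. The helper name week_bounds is hypothetical.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple


def week_bounds(day: date) -> Tuple[datetime, datetime]:
    """Half-open [start, end) window of the Monday-to-Sunday week containing `day`.

    Rows with start <= TimeDate < end fall in the same week as `day` under
    YEARWEEK(..., 1), but TimeDate is compared bare, so an index on it is usable.
    """
    monday = day - timedelta(days=day.weekday())  # weekday(): Monday == 0
    start = datetime.combine(monday, time.min)    # Monday 00:00:00
    end = start + timedelta(days=7)               # next Monday 00:00:00, excluded
    return start, end


# Usage: pass the bounds as parameters instead of calling YEARWEEK() on the column,
# e.g. ... WHERE l.TimeDate >= %s AND l.TimeDate < %s (placeholder style depends on the driver).
start, end = week_bounds(date(2017, 1, 17))
print(start, end)  # 2017-01-16 00:00:00 2017-01-23 00:00:00
```

Using `< next Monday` instead of `<= Sunday 23:59:59` also avoids missing rows with fractional seconds.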

Related

Indexes are not getting used in MySQL query

I have written the query below and it takes nearly 5 minutes to run. I have 6 million rows of data in the table, and the execution plan shows that somehow my query does not use indexes, even though all fields of the table have indexes.
Query
SELECT
event_date as date,
(CAST('2014-05-31' AS DATE)- INTERVAL 5 MONTH + INTERVAL 1 DAY) AS FROM_DATE,
COUNT(DISTINCT(IF( Column1 !=0 OR Column2!=0 OR Column3 !=0, account, NULL))) AS total_account1,
COUNT(DISTINCT(IF( Column4 !=0 OR Column5 !=0 OR Column6!=0, account, NULL))) AS total_account2,
COUNT(DISTINCT(IF( Column7 !=0 OR Column8 !=0 OR Column9!=0, account, NULL))) AS total_account3
FROM Table_name
WHERE cast(event_date as DATE) BETWEEN CAST('2014-05-31' AS DATE)- INTERVAL 5 MONTH and CAST('2014-05-31' AS DATE)
AND cast(event_date as DATE) < NOW() - INTERVAL 2 DAY
GROUP BY MONTH(event_date)
"Explain" above query output is -
+----+-------------+---------+------+---------------+------+---------+------+---------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+-----------------------------+
| 1 | SIMPLE | table_name | ALL | NULL | NULL | NULL | NULL | 5764552 | Using where; Using filesort |
+----+-------------+---------+------+---------------+------+---------+------+---------+-----------------------------+
Why is my query not using the indexes available to it?
You can explicitly force the engine to use an index.
See http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
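As far as I know, a hint alone cannot turn a predicate on CAST(event_date AS DATE) into an index range lookup, because the index stores event_date itself, not the cast value. A minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in for MySQL (SQLite's date() plays the role of CAST(... AS DATE); the table is a hypothetical cut-down version of the one in the question):

```python
import sqlite3

# Stand-in: SQLite instead of MySQL, with a hypothetical two-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (account INTEGER, event_date TEXT)")
conn.execute("CREATE INDEX idx_event_date ON table_name (event_date)")


def plan(sql: str, params=()) -> str:
    """Return the optimizer's plan details joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


# Column wrapped in a function: the index cannot drive a range lookup.
wrapped = plan(
    "SELECT account FROM table_name WHERE date(event_date) BETWEEN ? AND ?",
    ("2013-12-31", "2014-05-31"),
)
# Bare column compared with a half-open range: the index can be searched.
bare = plan(
    "SELECT account FROM table_name WHERE event_date >= ? AND event_date < ?",
    ("2013-12-31", "2014-06-01"),
)
print(wrapped)  # e.g. SCAN table_name
print(bare)     # e.g. SEARCH table_name USING INDEX idx_event_date (...)
```

The exact wording of the plan varies between SQLite versions. In MySQL terms, the BETWEEN condition from the question would become `event_date >= '2013-12-31' AND event_date < '2014-06-01'`.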

MySQL show used index in query

For example, I have created 3 indexes:
click_date - transaction table, daily_metric table
order_date - transaction table
I want to check whether my query uses these indexes, so I ran EXPLAIN and got this result:
+----+--------------+--------------+-------+---------------+------------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+--------------+-------+---------------+------------+---------+------+--------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 668 | Using temporary; Using filesort |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 645 | |
| 2 | DERIVED | <derived4> | ALL | NULL | NULL | NULL | NULL | 495 | |
| 4 | DERIVED | transaction | ALL | order_date | NULL | NULL | NULL | 291257 | Using where; Using temporary; Using filesort |
| 3 | DERIVED | daily_metric | range | click_date | click_date | 3 | NULL | 812188 | Using where; Using temporary; Using filesort |
| 5 | UNION | <derived7> | ALL | NULL | NULL | NULL | NULL | 495 | |
| 5 | UNION | <derived6> | ALL | NULL | NULL | NULL | NULL | 645 | Using where; Not exists |
| 7 | DERIVED | transaction | ALL | order_date | NULL | NULL | NULL | 291257 | Using where; Using temporary; Using filesort |
| 6 | DERIVED | daily_metric | range | click_date | click_date | 3 | NULL | 812188 | Using where; Using temporary; Using filesort |
| NULL | UNION RESULT | <union2,5> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+-------+---------------+------------+---------+------+--------+----------------------------------------------+
In the EXPLAIN results I see that the order_date index of the transaction table is not used. Do I understand correctly?
Was the click_date index of the daily_metric table used correctly?
Please tell me how to tell from the EXPLAIN result whether the index I created is used properly.
My query:
SELECT
partner_id,
the_date,
SUM(clicks) as clicks,
SUM(total_count) as total_count,
SUM(count) as count,
SUM(total_sum) as total_sum,
SUM(received_sum) as received_sum,
SUM(partner_fee) as partner_fee
FROM (
SELECT
clicks.partner_id,
clicks.click_date as the_date,
clicks,
orders.total_count,
orders.count,
orders.total_sum,
orders.received_sum,
orders.partner_fee
FROM
(SELECT
partner_id, click_date, sum(clicks) as clicks
FROM
daily_metric WHERE DATE(click_date) BETWEEN '2013-04-01' AND '2013-04-30'
GROUP BY partner_id , click_date) as clicks
LEFT JOIN
(SELECT
partner_id,
DATE(order_date) as order_dates,
SUM(order_sum) as total_sum,
SUM(customer_paid_sum) as received_sum,
SUM(partner_fee) as partner_fee,
count(*) as total_count,
count(CASE
WHEN status = 1 THEN 1
ELSE NULL
END) as count
FROM
transaction WHERE DATE(order_date) BETWEEN '2013-04-01' AND '2013-04-30'
GROUP BY DATE(order_date) , partner_id) as orders ON orders.partner_id = clicks.partner_id AND clicks.click_date = orders.order_dates
UNION ALL SELECT
orders.partner_id,
orders.order_dates as the_date,
clicks,
orders.total_count,
orders.count,
orders.total_sum,
orders.received_sum,
orders.partner_fee
FROM
(SELECT
partner_id, click_date, sum(clicks) as clicks
FROM
daily_metric WHERE DATE(click_date) BETWEEN '2013-04-01' AND '2013-04-30'
GROUP BY partner_id , click_date) as clicks
RIGHT JOIN
(SELECT
partner_id,
DATE(order_date) as order_dates,
SUM(order_sum) as total_sum,
SUM(customer_paid_sum) as received_sum,
SUM(partner_fee) as partner_fee,
count(*) as total_count,
count(CASE
WHEN status = 1 THEN 1
ELSE NULL
END) as count
FROM
transaction WHERE DATE(order_date) BETWEEN '2013-04-01' AND '2013-04-30'
GROUP BY DATE(order_date) , partner_id) as orders ON orders.partner_id = clicks.partner_id AND clicks.click_date = orders.order_dates
WHERE
clicks.partner_id is NULL
ORDER BY the_date DESC
) as t
GROUP BY the_date ORDER BY the_date DESC LIMIT 50 OFFSET 0
Although I can't explain what the EXPLAIN has dumped, I thought there must be an easier solution to what you have and came up with the following. I would suggest the following indexes to optimize your existing query for the WHERE date range and grouping by partner.
Additionally, when a query applies a FUNCTION to a column, such as your DATE(order_date) and DATE(click_date), it can't take advantage of the index on that column. To let the index be used, qualify the full date/time range, from 12:00 am (midnight) on the first day up to the end of the last day. I would typically do this via
x >= someDate (at 12:00 am) AND x < firstDayAfterRange.
In your example that would be (notice the less-than May 1st, which covers everything up to April 30th at 11:59:59 pm):
click_date >= '2013-04-01' AND click_date < '2013-05-01'
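The same rule, sketched in Python under the assumption that the bounds are computed before the query is sent (half_open_bounds and in_range are hypothetical helper names): an inclusive range of days becomes midnight of the first day up to, but excluding, midnight of the day after the last day.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple


def half_open_bounds(first_day: date, last_day: date) -> Tuple[datetime, datetime]:
    """Turn an inclusive day range into [first_day 00:00, (last_day + 1) 00:00)."""
    lower = datetime.combine(first_day, time.min)
    upper = datetime.combine(last_day + timedelta(days=1), time.min)
    return lower, upper


def in_range(ts: datetime, bounds: Tuple[datetime, datetime]) -> bool:
    """Mirror of: click_date >= lower AND click_date < upper."""
    lower, upper = bounds
    return lower <= ts < upper


april = half_open_bounds(date(2013, 4, 1), date(2013, 4, 30))
print(april[1])                                                    # 2013-05-01 00:00:00
print(in_range(datetime(2013, 4, 30, 23, 59, 59, 500000), april))  # True
print(in_range(datetime(2013, 5, 1), april))                       # False
```

Unlike an upper bound of 23:59:59, the exclusive upper bound also covers timestamps with fractional seconds.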
Table Index
transaction (order_date, partner_id)
daily_metric (click_date, partner_id)
Now, an adjustment. Since your clicks table may have entries the transactions table doesn't, and vice versa, I would adjust this query to do a pre-query of all possible date/partner combinations, then left-join to the respective aggregate queries, such as:
SELECT
AllPartners.Partner_ID,
AllPartners.the_Date,
coalesce( clicks.clicks, 0 ) Clicks,
coalesce( orders.total_count, 0 ) TotalCount,
coalesce( orders.count, 0 ) OrderCount,
coalesce( orders.total_sum, 0 ) OrderSum,
coalesce( orders.received_sum, 0 ) ReceivedSum,
coalesce( orders.partner_fee, 0 ) PartnerFee
from
( select distinct
dm.partner_id,
DATE( dm.click_date ) as the_Date
FROM
daily_metric dm
WHERE
dm.click_date >= '2013-04-01' AND dm.click_date < '2013-05-01'
UNION
select
t.partner_id,
DATE(t.order_date) as the_Date
FROM
transaction t
WHERE
t.order_date >= '2013-04-01' AND t.order_date < '2013-05-01' ) AllPartners
LEFT JOIN
( SELECT
dm.partner_id,
DATE( dm.click_date ) sumDate,
sum( dm.clicks) as clicks
FROM
daily_metric dm
WHERE
dm.click_date >= '2013-04-01' AND dm.click_date < '2013-05-01'
GROUP BY
dm.partner_id,
DATE( dm.click_date ) ) as clicks
ON AllPartners.partner_id = clicks.partner_id
AND AllPartners.the_date = clicks.sumDate
LEFT JOIN
( SELECT
t.partner_id,
DATE(t.order_date) as sumDate,
SUM(t.order_sum) as total_sum,
SUM(t.customer_paid_sum) as received_sum,
SUM(t.partner_fee) as partner_fee,
count(*) as total_count,
count(CASE WHEN t.status = 1 THEN 1 ELSE NULL END) as COUNT
FROM
transaction t
WHERE
t.order_date >= '2013-04-01' AND t.order_date < '2013-05-01'
GROUP BY
t.partner_id,
DATE(t.order_date) ) as orders
ON AllPartners.partner_id = orders.partner_id
AND AllPartners.the_date = orders.sumDate
order by
AllPartners.the_date DESC
limit 50 offset 0
This way, the first query can use the index to quickly get all possible partner/date combinations from EITHER table. Each LEFT JOIN then matches AT MOST one row per combination. If a match is found, its numbers are used; if not, COALESCE() turns the NULL into zero.
CLARIFICATION.
Like your pre-aggregate queries "clicks" and "orders", "AllPartners" is the ALIAS for the result of the SELECT DISTINCT of partners and dates within the date range you were interested in. The resulting columns of that query are "partner_id" and "the_date", matching your later queries, so this is the basis for joining to the "clicks" and "orders" aggregates. Since I have these two columns in the alias "AllPartners", I took them for the field list: they are LEFT JOINed to the other aliases, and matching rows may not exist in one or the other.
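A minimal, runnable sketch of this pre-query plus LEFT JOIN pattern, using Python's sqlite3 module as a stand-in for MySQL, with a cut-down hypothetical schema and made-up sample rows (SQLite needs the table name "transaction" quoted, and uses date() in place of DATE()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical versions of the two tables; "transaction" is quoted for SQLite.
conn.executescript("""
    CREATE TABLE daily_metric (partner_id INTEGER, click_date TEXT, clicks INTEGER);
    CREATE TABLE "transaction" (partner_id INTEGER, order_date TEXT, order_sum REAL, status INTEGER);
    CREATE INDEX dm_date_partner ON daily_metric (click_date, partner_id);
    CREATE INDEX t_date_partner ON "transaction" (order_date, partner_id);
""")
conn.executemany("INSERT INTO daily_metric VALUES (?, ?, ?)", [
    (1, "2013-04-01 10:00", 5), (1, "2013-04-01 15:00", 3),
    (2, "2013-04-02 09:00", 7), (1, "2013-03-31 23:00", 100),  # last row is outside April
])
conn.executemany('INSERT INTO "transaction" VALUES (?, ?, ?, ?)', [
    (1, "2013-04-01 11:00", 50.0, 1),
    (3, "2013-04-03 12:00", 20.0, 0), (3, "2013-04-03 13:00", 30.0, 1),
])

rows = conn.execute("""
    SELECT ap.partner_id, ap.the_date,
           COALESCE(c.clicks, 0), COALESCE(o.total_count, 0),
           COALESCE(o.paid_count, 0), COALESCE(o.total_sum, 0)
    FROM (  -- every partner/day seen in EITHER table (UNION removes duplicates)
        SELECT partner_id, date(click_date) AS the_date FROM daily_metric
        WHERE click_date >= :lo AND click_date < :hi
        UNION
        SELECT partner_id, date(order_date) FROM "transaction"
        WHERE order_date >= :lo AND order_date < :hi
    ) AS ap
    LEFT JOIN (  -- at most one row per partner/day
        SELECT partner_id, date(click_date) AS sum_date, SUM(clicks) AS clicks
        FROM daily_metric WHERE click_date >= :lo AND click_date < :hi
        GROUP BY partner_id, date(click_date)
    ) AS c ON c.partner_id = ap.partner_id AND c.sum_date = ap.the_date
    LEFT JOIN (
        SELECT partner_id, date(order_date) AS sum_date, COUNT(*) AS total_count,
               COUNT(CASE WHEN status = 1 THEN 1 END) AS paid_count, SUM(order_sum) AS total_sum
        FROM "transaction" WHERE order_date >= :lo AND order_date < :hi
        GROUP BY partner_id, date(order_date)
    ) AS o ON o.partner_id = ap.partner_id AND o.sum_date = ap.the_date
    ORDER BY ap.the_date DESC, ap.partner_id
    LIMIT 50
""", {"lo": "2013-04-01", "hi": "2013-05-01"}).fetchall()

for row in rows:
    print(row)
```

Partner 3 has orders but no clicks and partner 2 has clicks but no orders; both still appear, with COALESCE() supplying the zeros.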

Join Nearest Date

I have read through a ton of responses on here but nothing is working quite as well as I would like. I currently have a working query that includes 2 sub queries, the problem is that it takes about 10 seconds to execute. I was wondering if there is any way to make this go quicker, maybe with a join. I just can't seem to get my head out of the box it is in. Please let me know your thoughts.
Here is the working query:
Select concat(a.emp_firstname, ' ', a.emp_lastname) as names
, if(if (a.emp_gender = 1, 'Male', a.emp_gender)=2, 'Female',
if (a.emp_gender = 1, 'Male', a.emp_gender)) as emp_gender
, c.name
, a.emp_work_telephone
, a.emp_hm_telephone, a.emp_work_email
, a.custom7, a.employee_id
, a.city_code, a.provin_code, d.name as status,
(SELECT cast(concat(DATE_FORMAT(e.app_datetime, '%H:%i'), ' ', e.app_facility) as char(100))
FROM li_appointments.li_appointments as e where e.terp_id = a.employee_id
and e.app_datetime <= str_to_date('06/26/13 at 3:20 PM', '%m/%d/%Y at %h:%i %p')
and date(e.app_datetime) = date(str_to_date('06/26/13 at 3:20 PM', '%m/%d/%Y at %h:%i %p'))
order by e.app_datetime desc limit 1) as prevapp,
(SELECT cast(concat(DATE_FORMAT(e.app_datetime, '%H:%i'), ' ', e.app_facility) as char(100))
FROM li_appointments.li_appointments as e
where e.terp_id = a.employee_id
and e.app_datetime > str_to_date('06/26/13 at 3:20 PM', '%m/%d/%Y at %h:%i %p')
and date(e.app_datetime) = date(str_to_date('06/26/13 at 3:20 PM', '%m/%d/%Y at %h:%i %p'))
order by e.app_datetime asc limit 1) as nextapp
from hs_hr_employee as a
Join hs_hr_emp_skill as b on a.emp_number = b.emp_number
Join ohrm_skill as c on b.skill_id = c.id
Join orangehrm_li.ohrm_employment_status as d on a.emp_status = d.id
where c.name like '%Arabic%'
and d.name = 'Active' order by rand();
EXPLAIN results:
+----+--------------------+-------+--------+---------------------+------------+---------+---------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+---------------------+------------+---------+---------------------------+-------+----------------------------------------------+
| 1 | PRIMARY | d | ALL | PRIMARY | | | | 10 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | a | ref | PRIMARY,emp_status | emp_status | 5 | orangehrm_li.d.id | 48 | Using where |
| 1 | PRIMARY | b | ref | emp_number,skill_id | emp_number | 4 | orangehrm_li.a.emp_number | 1 | |
| 1 | PRIMARY | c | eq_ref | PRIMARY | PRIMARY | 4 | orangehrm_li.b.skill_id | 1 | Using where |
| 3 | DEPENDENT SUBQUERY | e | ALL | | | | | 28165 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | e | ALL | | | | | 28165 | Using where; Using filesort |
+----+--------------------+-------+--------+---------------------+------------+---------+---------------------------+-------+----------------------------------------------+
Your tables appear small enough to do it as I have done here. First, the innermost query starts with all employees and does TWO left joins directly to the appointment table. Taking the MAX() of the appointments at or before the date in question gives the "Previous Appointment", and taking the MIN() of the appointments AFTER it gives the "Next Appointment". So now, for a single person, I have their ID and their possible previous and next appointment times.
Now, I take that result and re-join first to the appointment tables (left joined again) but this time based on same person (Terp_ID) AND their respective Previous and Next appointments date/time. This would only be a problem if you had multiple entries with the exact same date/time for a single person, and that would just result in multiple records.
So now, I have each person with the specifics of previous and next appointments available.
The rest is simple joining to the other tables to get only employee status of "Active", and the skill set of "Arabic" criteria (which I have at their respective JOIN criteria), otherwise you could just move these to a WHERE clause.
As for the "Date/Time" basis of the query, I set an @variable once so it could be used in both left joins to appointments. Finally, I grabbed the respective fields you wanted. This SHOULD work, yet without your data, it might need some tweaking.
SELECT
EmpPrevNext.Employee_ID,
EmpPrevNext.PrevApnt,
EmpPrevNext.NextApnt,
concat(Emp2.emp_firstname, ' ', Emp2.emp_lastname) as names,
if ( Emp2.emp_gender = 1, 'Male', 'Female' ) as emp_gender,
Emp2.emp_work_telephone,
Emp2.emp_hm_telephone,
Emp2.emp_work_email,
Emp2.custom7,
Emp2.city_code,
Emp2.provin_code,
cast( concat( DATE_FORMAT(PriorApp2.app_datetime, '%H:%i'), ' ', PriorApp2.app_facility) as char(100))
as PriorAppointment,
cast( concat( DATE_FORMAT(NextApp2.app_datetime, '%H:%i'), ' ', NextApp2.app_facility) as char(100))
as NextAppointment,
EStat.`name` as EmployeeStatus,
Skill.`name` as SkillName
FROM
( SELECT
Emp.Employee_ID,
MAX( PriorApp.app_DateTime ) as PrevApnt,
MIN( NextApp.app_DateTime ) as NextApnt
from
( select @DateBasis := STR_TO_DATE('06/26/13 at 3:20 PM', '%m/%d/%y at %h:%i %p') ) sqlvars,
hs_hr_employee as Emp
LEFT JOIN li_appointments.li_appointments as PriorApp
ON Emp.Employee_ID = PriorApp.Terp_ID
AND PriorApp.app_DateTime <= @DateBasis
LEFT JOIN li_appointments.li_appointments as NextApp
ON Emp.Employee_ID = NextApp.Terp_ID
AND NextApp.app_DateTime > @DateBasis
group by
Emp.Employee_ID ) EmpPrevNext
LEFT JOIN li_appointments.li_appointments as PriorApp2
ON EmpPrevNext.Employee_ID = PriorApp2.Terp_ID
AND EmpPrevNext.PrevApnt = PriorApp2.app_DateTime
LEFT JOIN li_appointments.li_appointments as NextApp2
ON EmpPrevNext.Employee_ID = NextApp2.Terp_ID
AND EmpPrevNext.NextApnt = NextApp2.app_DateTime
JOIN hs_hr_employee as Emp2
ON EmpPrevNext.Employee_ID = Emp2.Employee_ID
JOIN orangehrm_li.ohrm_employment_status as EStat
ON Emp2.Emp_Status = EStat.ID
AND EStat.`name` = 'Active'
JOIN hs_hr_emp_skill as EmpSkill
ON Emp2.emp_number = EmpSkill.emp_number
JOIN ohrm_skill as Skill
on EmpSkill.skill_id = Skill.id
AND Skill.`name` like '%Arabic%'
order by
rand();
Make sure your appointment table has an index on (Terp_ID, app_datetime )
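A small sketch of the MAX()/MIN()-then-join-back idea, again using Python's sqlite3 module as a stand-in for MySQL, with hypothetical table names and made-up data. One deliberate change from the query above: instead of two LEFT JOINs (which multiply each employee's prior appointments by their next ones before grouping), it uses a single LEFT JOIN with conditional MAX()/MIN(). The join back on (terp_id, app_datetime) is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down tables; the index follows the (Terp_ID, app_datetime) advice above.
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY);
    CREATE TABLE appointments (terp_id INTEGER, app_datetime TEXT, app_facility TEXT);
    CREATE INDEX appt_terp_time ON appointments (terp_id, app_datetime);
""")
conn.executemany("INSERT INTO employee VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO appointments VALUES (?, ?, ?)", [
    (1, "2013-06-26 09:00", "Clinic A"), (1, "2013-06-26 14:00", "Clinic B"),
    (1, "2013-06-26 16:30", "Clinic C"), (1, "2013-06-27 08:00", "Clinic D"),
    (2, "2013-06-26 17:00", "Clinic E"),  # employee 3 has no appointments at all
])

rows = conn.execute("""
    SELECT pn.employee_id, pn.prev_apnt, p.app_facility, pn.next_apnt, n.app_facility
    FROM (
        -- Step 1: latest time at/before the basis and earliest time after it, per employee.
        SELECT e.employee_id,
               MAX(CASE WHEN a.app_datetime <= :basis THEN a.app_datetime END) AS prev_apnt,
               MIN(CASE WHEN a.app_datetime >  :basis THEN a.app_datetime END) AS next_apnt
        FROM employee e
        LEFT JOIN appointments a ON a.terp_id = e.employee_id
        GROUP BY e.employee_id
    ) AS pn
    -- Step 2: join back on (terp_id, app_datetime) to fetch the rest of each appointment row.
    LEFT JOIN appointments p ON p.terp_id = pn.employee_id AND p.app_datetime = pn.prev_apnt
    LEFT JOIN appointments n ON n.terp_id = pn.employee_id AND n.app_datetime = pn.next_apnt
    ORDER BY pn.employee_id
""", {"basis": "2013-06-26 15:20"}).fetchall()

for row in rows:
    print(row)
```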

Sorting some rows by average with SQL

All right, so here's a challenge for all you SQL pros:
I have a table with two columns of interest, group and birthdate. Only some rows have a group assigned to them.
I now want to print all rows sorted by birthdate, but I also want all rows with the same group to end up next to each other. The only semi-sensible way of doing this would be to use the groups' average birthdates for all the rows in the group when sorting. The question is, can this be done with pure SQL (MySQL in this instance), or will some scripting logic be required?
To illustrate, with the given table:
id | group | birthdate
---+-------+-----------
1 | 1 | 1989-12-07
2 | NULL | 1990-03-14
3 | 1 | 1987-05-25
4 | NULL | 1985-09-29
5 | NULL | 1988-11-11
and let's say that the "average" of 1987-05-25 and 1989-12-07 is 1988-08-30 (this can be found by averaging the UNIX timestamp equivalents of the dates and then converting back to a date. This average doesn't have to be completely correct!).
The output should then be:
id | group | birthdate | [sort_by_birthdate]
---+-------+------------+--------------------
4 | NULL | 1985-09-29 | 1985-09-29
3 | 1 | 1987-05-25 | 1988-08-30
1 | 1 | 1989-12-07 | 1988-08-30
5 | NULL | 1988-11-11 | 1988-11-11
2 | NULL | 1990-03-14 | 1990-03-14
Any ideas?
Cheers,
Jon
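The averaging described in the question can be sketched in Python as follows (average_date and to_unix are hypothetical helpers). It counts seconds from the UNIX epoch with timedelta arithmetic, so dates before 1970 also work. Note that MySQL's UNIX_TIMESTAMP() returns 0 for dates before 1970, which may be a reason to average a day count (for example via DATEDIFF()) in SQL instead.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Iterable

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix(d: date) -> float:
    """Seconds between the UNIX epoch and midnight UTC of `d` (negative before 1970)."""
    return (datetime(d.year, d.month, d.day, tzinfo=timezone.utc) - EPOCH).total_seconds()


def average_date(days: Iterable[date]) -> date:
    """Average the timestamps, convert back, and drop the time of day."""
    stamps = [to_unix(d) for d in days]
    mean = sum(stamps) / len(stamps)
    return (EPOCH + timedelta(seconds=mean)).date()


print(average_date([date(1987, 5, 25), date(1989, 12, 7)]))  # 1988-08-30
```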
I normally program in T-SQL, so please forgive me if I don't translate the date functions perfectly to MySQL:
SELECT
T.id,
T.`group`,
T.birthdate
FROM
Some_Table T
LEFT OUTER JOIN (
SELECT
`group`,
'1970-01-01' +
INTERVAL AVG(DATEDIFF(birthdate, '1970-01-01')) DAY AS avg_birthdate
FROM
Some_Table T2
GROUP BY
`group`
) SQ ON SQ.`group` = T.`group`
ORDER BY
COALESCE(SQ.avg_birthdate, T.birthdate),
T.`group`,
T.birthdate
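To check the ordering logic, here is a sketch that runs the same LEFT JOIN against a per-group average, using Python's sqlite3 module as a stand-in for MySQL. SQLite has no DATEDIFF(), so this version averages julianday() values and converts back with date(); the column is quoted as "group" (MySQL would use backticks).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE some_table (id INTEGER PRIMARY KEY, "group" INTEGER, birthdate TEXT)')
conn.executemany("INSERT INTO some_table VALUES (?, ?, ?)", [
    (1, 1, "1989-12-07"), (2, None, "1990-03-14"), (3, 1, "1987-05-25"),
    (4, None, "1985-09-29"), (5, None, "1988-11-11"),
])

rows = conn.execute('''
    SELECT t.id, t."group", t.birthdate,
           date(COALESCE(sq.avg_jd, julianday(t.birthdate))) AS sort_by_birthdate
    FROM some_table t
    LEFT JOIN (
        -- Average birthdate per group, as a Julian day number.
        SELECT "group", AVG(julianday(birthdate)) AS avg_jd
        FROM some_table
        GROUP BY "group"
    ) AS sq ON sq."group" = t."group"   -- NULL groups never match, so they keep their own date
    ORDER BY COALESCE(sq.avg_jd, julianday(t.birthdate)), t."group", t.birthdate
''').fetchall()

for row in rows:
    print(row)
```

The final `t.birthdate` key keeps members of the same group in birthdate order, matching the expected output in the question.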

MySQL query optimization

Could you please help me optimize this query? I've spent a lot of time on it and still cannot rephrase it to be fast enough (say, running in a matter of seconds rather than minutes, as it does now).
The query:
SELECT m.my_id, m.my_value, m.my_timestamp
FROM (
SELECT my_id, MAX(my_timestamp) AS most_recent_timestamp
FROM my_table
WHERE my_timestamp < '2011-03-01 08:00:00'
GROUP BY my_id
) as tmp
LEFT OUTER JOIN my_table m
ON tmp.my_id = m.my_id AND tmp.most_recent_timestamp = m.my_timestamp
ORDER BY m.my_timestamp;
my_table is defined as follows:
CREATE TABLE my_table (
my_id INTEGER NOT NULL,
my_value VARCHAR(4000),
my_timestamp TIMESTAMP default CURRENT_TIMESTAMP NOT NULL,
INDEX MY_ID_IDX (my_id),
INDEX MY_TIMESTAMP_IDX (my_timestamp),
INDEX MY_ID_MY_TIMESTAMP_IDX (my_id, my_timestamp)
);
The goal of this query is to select the most recent my_value for each my_id before some timestamp. my_table contains ~100 million entries, and the query takes ~8 minutes.
explain:
+----+-------------+-------------+-------+------------------------------------------------+-------------------------+---------+---------------------------+-------+---------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+------------------------------------------------+-------------------------+---------+---------------------------+-------+---------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 90721 | Using temporary; Using filesort |
| 1 | PRIMARY | m | ref | MY_ID_IDX,MY_TIMESTAMP_IDX,MY_ID_TIMESTAMP_IDX | MY_TIMESTAMP_IDX | 4 | tmp.most_recent_timestamp | 1 | Using where |
| 2 | DERIVED | my_table | range | MY_TIMESTAMP_IDX | MY_ID_MY_TIMESTAMP_IDX | 8 | NULL | 61337 | Using where; Using index for group-by |
+----+-------------+-------------+-------+------------------------------------------------+-------------------------+---------+---------------------------+-------+---------------------------------------+
If I understand correctly, you should be able to drop the nested select completely, and move the where clause to the main query, order by my_timestamp descending and limit 1.
SELECT my_id, my_value, max(my_timestamp)
FROM my_table
WHERE my_timestamp < '2011-03-01 08:00:00'
GROUP BY my_id
*edit - added max and group by
A trick to get the most recent record can be to use ORDER BY together with LIMIT 1 instead of MAX aggregation together with a "self" join,
something like this (not tested):
SELECT m.my_id, m.my_value, m.my_timestamp
FROM my_table m
WHERE my_timestamp < '2011-03-01 08:00:00'
ORDER BY m.my_timestamp DESC
LIMIT 1
;
Update: the above doesn't work, because grouping by my_id is required...
Another solution uses a WHERE ... IN subselect instead of the JOIN you've used.
It could be faster; please test with your data.
SELECT m.my_id, m.my_value, m.my_timestamp
FROM my_table m
WHERE ( m.my_id, m.my_timestamp ) IN (
SELECT i.my_id, MAX(i.my_timestamp)
FROM my_table i
WHERE i.my_timestamp < '2011-03-01 08:00:00'
GROUP BY i.my_id
)
ORDER BY m.my_timestamp;
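A quick way to sanity-check the IN-subselect version on made-up data, using Python's sqlite3 module as a stand-in for MySQL (row values such as (a, b) IN (SELECT ...) need SQLite 3.15 or later). It shows that each my_id gets the my_value from its latest row before the cutoff, and that an id with no earlier row is left out; it says nothing about speed on 100 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (my_id INTEGER NOT NULL, my_value TEXT, my_timestamp TEXT NOT NULL)"
)
conn.execute("CREATE INDEX my_id_my_timestamp_idx ON my_table (my_id, my_timestamp)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, "a-old", "2011-02-01 00:00:00"),
    (1, "a-new", "2011-02-28 12:00:00"),
    (1, "a-future", "2011-03-05 00:00:00"),  # after the cutoff: must be ignored
    (2, "b-only", "2011-01-15 00:00:00"),
    (3, "c-future", "2011-04-01 00:00:00"),  # nothing before the cutoff: id 3 drops out
])

rows = conn.execute("""
    SELECT m.my_id, m.my_value, m.my_timestamp
    FROM my_table m
    WHERE (m.my_id, m.my_timestamp) IN (
        SELECT i.my_id, MAX(i.my_timestamp)   -- latest timestamp per id before the cutoff
        FROM my_table i
        WHERE i.my_timestamp < ?
        GROUP BY i.my_id
    )
    ORDER BY m.my_timestamp
""", ("2011-03-01 08:00:00",)).fetchall()

for row in rows:
    print(row)
```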
I notice in the explain plan that the optimizer is using the MY_ID_MY_TIMESTAMP_IDX index for the sub-query, but not the outer query.
You may be able to speed it up using an index hint. I also updated the ON clause to refer to tmp.most_recent_timestamp using its alias.
SELECT m.my_id, m.my_value, m.my_timestamp
FROM (
SELECT my_id, MAX(my_timestamp) AS most_recent_timestamp
FROM my_table
WHERE my_timestamp < '2011-03-01 08:00:00'
GROUP BY my_id
) as tmp
LEFT OUTER JOIN my_table m use index (MY_ID_MY_TIMESTAMP_IDX)
ON tmp.my_id = m.my_id AND tmp.most_recent_timestamp = m.my_timestamp
ORDER BY m.my_timestamp;