Improve SQL Query Runtime - mysql

I'm new to SQL. I am trying to get week-over-week trends to compare various metrics. This is what I have so far. The runtime is bad because the query scans the entire table 4 times, and that only covers the past 4 weeks. Is there a way to improve this and also get metrics for all past weeks (not just the last 4)?
Edit: This is MySQL.
Sample Data:
Timestamp, Metric Hits, Metric Total, Metric Value
2022-09-20 06:50:01.332000, 4, 4, 1
2022-08-31 08:49:59.086000, 2, 3, 0.6666
2022-08-09 04:50:12.430000, 1, 2, 0.5
SELECT
sum(metric_hits) as metric_hits_sum,
sum(metric_total) as metrics_total_sum,
avg(metric_value) as metric_value_avg
from metric_events
where timestamp >= DATEADD(DAY, -7-DATEPART(WEEKDAY, GETDATE()::date), GETDATE()::date) and timestamp < DATEADD(DAY, -DATEPART(WEEKDAY, GETDATE()::date), GETDATE()::date)
UNION
SELECT
sum(metric_hits) as metric_hits_sum,
sum(metric_total) as metrics_total_sum,
avg(metric_value) as metric_value_avg
from metric_events
where timestamp >= DATEADD(DAY, -14-DATEPART(WEEKDAY, GETDATE()::date), GETDATE()::date) and timestamp < DATEADD(DAY, -7-DATEPART(WEEKDAY, GETDATE()::date), GETDATE()::date)
UNION
SELECT
sum(metric_hits) as metric_hits_sum,
sum(metric_total) as metrics_total_sum,
avg(metric_value) as metric_value_avg
from metric_events
where timestamp >= DATEADD(DAY, -21-DATEPART(WEEKDAY, GETDATE()::date), GETDATE()::date) and timestamp < DATEADD(DAY, -14-DATEPART(WEEKDAY, GETDATE()::date), GETDATE()::date)
UNION
SELECT
sum(metric_hits) as metric_hits_sum,
sum(metric_total) as metrics_total_sum,
avg(metric_value) as metric_value_avg
from metric_events
where timestamp >= DATEADD(DAY, -28-DATEPART(WEEKDAY, GETDATE()::date), GETDATE()::date) and timestamp < DATEADD(DAY, -21-DATEPART(WEEKDAY, GETDATE()::date), GETDATE()::date)

This should get you closer. I'll leave it to you to add more features (like week begin and end dates).
select floor((daysold - DATEPART(WEEKDAY, GETDATE()::date)) / 7) as weeksold
, sum(metric_hits) as metric_hits_sum
, sum(metric_total) as metrics_total_sum
, avg(metric_value) as metric_value_avg
from (
SELECT datediff(day, [timestamp], GETDATE()::date) as daysold
, metric_hits
, metric_total
, metric_value
from metric_events
where [timestamp] < DATEADD(DAY, -DATEPART(WEEKDAY, GETDATE()::date), GETDATE()::date)
) q
group by floor((daysold - DATEPART(WEEKDAY, GETDATE()::date)) / 7)
order by 1
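The idea behind this answer (a single pass that assigns each row a whole-week age, then aggregates per bucket) can be illustrated outside SQL. The Python below is only a minimal sketch of that logic, not a translation of the query: the function names, the Sunday week start, and the example "today" are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

def week_start(d):
    # Assumption: weeks start on Sunday. isoweekday() is 1 (Mon) .. 7 (Sun).
    return d - timedelta(days=d.isoweekday() % 7)

def weekly_metrics(rows, today):
    """Aggregate (timestamp, hits, total, value) rows per completed week."""
    current = week_start(today)
    # Per bucket: hits sum, total sum, value sum, row count
    buckets = defaultdict(lambda: [0, 0, 0.0, 0])
    for ts, hits, total, value in rows:
        start = week_start(ts.date())
        if start >= current:
            continue  # skip the current, still incomplete week
        weeks_old = (current - start).days // 7
        b = buckets[weeks_old]
        b[0] += hits
        b[1] += total
        b[2] += value
        b[3] += 1
    return {w: (b[0], b[1], b[2] / b[3]) for w, b in sorted(buckets.items())}

# The three sample rows from the question
rows = [
    (datetime(2022, 9, 20, 6, 50, 1), 4, 4, 1.0),
    (datetime(2022, 8, 31, 8, 49, 59), 2, 3, 0.6666),
    (datetime(2022, 8, 9, 4, 50, 12), 1, 2, 0.5),
]
result = weekly_metrics(rows, today=date(2022, 9, 28))
print(result)
```

Every row is visited once, and the number of weeks covered is determined by the data rather than by the number of UNION branches.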

Related

MySQL: Select with first and last day of previous month

Using phpMyAdmin, I have this statement working:
SELECT id_order as Ref FROM t_orders WHERE DATE(invoice_date) = CURDATE()
Now I want to replace the current date (CURDATE) with the first day of the previous month, and select everything from that day onward.
The answer of Ankit Bajpai solved my problem (thank you):
SELECT id_order as Ref FROM t_orders WHERE DATE(invoice_date) >= concat(date_format(LAST_DAY(now() - interval 1 month),'%Y-%m-'),'01');
In MySQL you can try the following:
First day of Previous Month
select last_day(curdate() - interval 2 month) + interval 1 day
Last day of Previous Month
select last_day(curdate() - interval 1 month)
First day of Current Month
select last_day(curdate() - interval 1 month) + interval 1 day
Last day of Current Month
select last_day(curdate())
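To check what these four expressions should return for a given date, the same boundaries can be computed in a few lines of Python. This is a sketch for verification only; the helper names are made up here and are not part of MySQL.

```python
from datetime import date, timedelta

def first_day_of_month(d):
    return d.replace(day=1)

def last_day_of_previous_month(d):
    # The day before the first of this month
    return first_day_of_month(d) - timedelta(days=1)

def first_day_of_previous_month(d):
    return last_day_of_previous_month(d).replace(day=1)

def last_day_of_month(d):
    # Day 28 plus 4 days always lands in the next month; step back from its 1st
    next_month = d.replace(day=28) + timedelta(days=4)
    return next_month.replace(day=1) - timedelta(days=1)

today = date(2024, 3, 15)  # fixed example date
print(first_day_of_previous_month(today))  # 2024-02-01
print(last_day_of_previous_month(today))   # 2024-02-29
print(first_day_of_month(today))           # 2024-03-01
print(last_day_of_month(today))            # 2024-03-31
```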
Try the following query:
SELECT id_order as Ref
FROM t_orders
WHERE DATE(invoice_date) >= concat(date_format(LAST_DAY(now() - interval 1 month),'%Y-%m-'),'01');
For MS SQL Server:
DECLARE @firstDayOfLastMonth DATETIME = DATEADD(MONTH, -1, DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0))
DECLARE @lastDayOfLastMonth DATETIME = DATEADD(DAY, -1, DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0))
SELECT @firstDayOfLastMonth;
SELECT @lastDayOfLastMonth;
After reading more closely: you want the entire previous month. You can do this:
SELECT id_order as Ref FROM t_orders
WHERE
CAST(invoice_date AS date) >= DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE())-1, 0)
AND
CAST(invoice_date AS date) <= DATEADD(month, DATEDIFF(MONTH, 0, GETDATE()), -1)
OR
SELECT id_order as Ref FROM t_orders
WHERE
MONTH(invoice_date) = MONTH(DATEADD(MONTH,-1,GETDATE()))

Year-Over-Year Data for last week

I am currently writing a query that will give me last week's data (let's assume "SALES") and last year's data for the same week. This is what I have to get last week's data, and it works fine:
Set DATEFIRST 1
Select DATEPArt(dd, DateAdded) AS 'Day of the Month',
count(*)AS 'Number of Users'
from TABLE1
Where DateAdded >= dateadd(day, -(datepart(dw, getdate()) + 6), CONVERT(date,getdate()))
AND DateAdded < dateadd(day, 1-datepart(dw, getdate()), CONVERT(date,getdate()))
Group by DATEPArt(dd, DateAdded)
Order by 'Day of the Month'
Now I want to add another column that will give me last year's data from the same week. This is what I was thinking:
Set DATEFIRST 1
Select DATEPArt(dd, DateAdded) AS 'Day of the Month',
count(*)AS 'Number of Users'
from TABLE1
Where DateAdded >= DATEADD(yy,DATEDIFF(yy,0,GETDATE())-1,0)
AND DateAdded < DATEADD(yy,DATEDIFF(yy,0,GETDATE())+1,0)
AND DateAdded >= dateadd(day, -(datepart(dw, getdate()) + 6), CONVERT(date,getdate()))
AND DateAdded < dateadd(day, 1-datepart(dw, getdate()), CONVERT(date,getdate()))
Group by DATEPArt(dd, DateAdded), DateAdded
Order by 'Day of the Month'
The problem is that I am still getting last week's numbers (from this year; I need them to be from last year). This leads me to believe the error has to be here somewhere:
DateAdded >= DATEADD(yy,DATEDIFF(yy,0,GETDATE())-1,0)
AND DateAdded < DATEADD(yy,DATEDIFF(yy,0,GETDATE())+1,0)
I appreciate everyone's help!!
You are looking for an OR condition:
WHERE (DateAdded >= DATEADD(yy,DATEDIFF(yy,0,GETDATE())-1,0)
AND DateAdded < DATEADD(yy,DATEDIFF(yy,0,GETDATE())+1,0))
OR (DateAdded >= dateadd(day, -(datepart(dw, getdate()) + 6), CONVERT(date,getdate()))
AND DateAdded < dateadd(day, 1-datepart(dw, getdate()), CONVERT(date,getdate())))
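To see which dates the "last week" expressions in the question select, here is a small Python sketch of the same arithmetic under DATEFIRST 1 (Monday = 1). The `same_window_last_year` helper is hypothetical: it assumes "the same week last year" means shifting the window back by 52 whole weeks, which the thread does not define.

```python
from datetime import date, timedelta

def last_week_window(today):
    dw = today.isoweekday()  # Monday = 1 .. Sunday = 7, like DATEFIRST 1
    start = today - timedelta(days=dw + 6)  # Monday of last week (inclusive)
    end = today + timedelta(days=1 - dw)    # Monday of this week (exclusive)
    return start, end

def same_window_last_year(today):
    # Assumption: shift by 52 weeks so the window still runs Monday to Monday
    start, end = last_week_window(today)
    shift = timedelta(weeks=52)
    return start - shift, end - shift

today = date(2024, 3, 14)  # a Thursday, fixed for the example
print(last_week_window(today))
print(same_window_last_year(today))
```

The two windows never overlap, which is why combining both range checks with AND in one WHERE clause cannot return last year's rows.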

Retrieving data from database on a quarterly basis, but based on current-year data only

I tried the code below. It works fine, but there is an issue: if the current month is February and I run this query, it looks back 3 months from now and therefore starts in the previous year (for example November or December 2012). I want only current-year data: if it is February now and I run this query, it should show only January and February records.
SELECT CROEmailId,
(
SELECT COUNT(LeadId)
FROM LeadStatus
WHERE DATE(`LeadTime`)> DATE_SUB(now(),
INTERVAL 3 MONTH
)
AND Generated=1 and AssignedTo=a.CROEmailId)
AS 'NEW LEAD',(
SELECT COUNT(LeadId)
FROM LeadHistory
WHERE DATE(UpdatedAt)> DATE_SUB(now(),
INTERVAL 3 MONTH
) AND AssignedTo=a.CROEmailId)
AS 'Lead Updated',
(
SELECT SUM(TotalEmails)
FROM MailJobs
WHERE DATE(CompletedAt)> DATE_SUB(now(),
INTERVAL 3 MONTH
)
AND MailFrom=a.CROEmailId)
AS 'Email Uploaded',
(
SELECT SUM(TotalSent)
FROM MailJobs
WHERE DATE(CompletedAt)> DATE_SUB(now(),
INTERVAL 3 MONTH)
AND MailFrom=a.CROEmailId
)
AS 'Email Sent',
(
SELECT SUM(NetTotal)
FROM Invoice
WHERE Status='PAID'
AND DATE(CreatedAt)> DATE_SUB(now(), INTERVAL 3 MONTH)
AND CROEmailId=a.CROEmailId)
AS 'Payment Today' FROM CustomersManager a;
Try changing
DATE_SUB(now(), INTERVAL 3 MONTH)
to
IF(MONTH(CURDATE()) < 4, DATE_FORMAT(CURDATE(), '%Y-01-01'), CURDATE() - INTERVAL 3 MONTH)
in all subqueries.
SELECT CROEmailId,
(SELECT COUNT(LeadId)
FROM LeadStatus
WHERE DATE(`LeadTime`)> IF(MONTH(CURDATE()) < 4, DATE_FORMAT(CURDATE(), '%Y-01-01'), CURDATE() - INTERVAL 3 MONTH)
AND Generated=1
AND AssignedTo=a.CROEmailId) AS 'NEW LEAD',
(SELECT COUNT(LeadId)
FROM LeadHistory
WHERE DATE(UpdatedAt)> IF(MONTH(CURDATE()) < 4, DATE_FORMAT(CURDATE(), '%Y-01-01'), CURDATE() - INTERVAL 3 MONTH)
AND AssignedTo=a.CROEmailId) AS 'Lead Updated',
(SELECT SUM(TotalEmails)
from MailJobs
WHERE DATE(CompletedAt)> IF(MONTH(CURDATE()) < 4, DATE_FORMAT(CURDATE(), '%Y-01-01'), CURDATE() - INTERVAL 3 MONTH)
AND MailFrom=a.CROEmailId) AS 'Email Uploaded',
(SELECT SUM(TotalSent)
FROM MailJobs
WHERE DATE(CompletedAt)> IF(MONTH(CURDATE()) < 4, DATE_FORMAT(CURDATE(), '%Y-01-01'), CURDATE() - INTERVAL 3 MONTH)
AND MailFrom=a.CROEmailId) AS 'Email Sent',
(SELECT SUM(NetTotal)
FROM Invoice
WHERE Status='PAID'
AND DATE(CreatedAt)> IF(MONTH(CURDATE()) < 4, DATE_FORMAT(CURDATE(), '%Y-01-01'), CURDATE() - INTERVAL 3 MONTH)
AND CROEmailId=a.CROEmailId) AS 'Payment Today'
FROM CustomersManager a;
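The cutoff rule used in these subqueries can be checked with a short Python sketch. The helper names are made up for illustration, and `minus_months` is my own approximation of `- INTERVAL 3 MONTH` that clamps the day to the length of the target month.

```python
import calendar
from datetime import date

def minus_months(d, months):
    # Step back whole months, clamping the day to the target month's length
    idx = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(idx, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def cutoff(today):
    # Before April, three months back would cross into the previous year,
    # so fall back to January 1 of the current year instead
    if today.month < 4:
        return date(today.year, 1, 1)
    return minus_months(today, 3)

print(cutoff(date(2013, 2, 15)))  # 2013-01-01
print(cutoff(date(2013, 5, 31)))  # 2013-02-28
```

Note that the queries compare with `>`, so rows dated exactly on the cutoff are excluded; use `>=` if January 1 itself should count.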
Use this in your query to filter records by year:
YEAR('2013-12-12')
For example:
SELECT * FROM TABLE WHERE YEAR(DATE_FIELD) = 2013

Pagination in SQL SELECT with ORDER BY based on CASE generated columns - TSQL

I am using MS SQL SERVER 2008 and I have created the following query (simplified);
SELECT DATE_A, DATE_B,
CASE
WHEN DATEDIFF(day, DATE_A, GETDATE()) >= 1 THEN 1
WHEN DATEDIFF(day, DATE_B, GETDATE()) >= 1 AND DATEDIFF(day, DATE_A, GETDATE()) <1 THEN 2
ELSE 3
END AS sortPriority,
CASE
--deadline past
WHEN DATEDIFF(day, DATE_A, GETDATE()) >= 1 THEN DATE_A
--review only past
WHEN DATEDIFF(day, DATE_B, GETDATE()) >= 1 AND DATEDIFF(day, DATE_A, GETDATE()) <1 THEN DATE_B
--anything else
ELSE DATE_B
END AS sortDate
FROM myTable
WHERE (DATEDIFF(day, DATE_A, GETDATE()) >=1 OR DATEDIFF(day, DATE_B, GETDATE()) >=1)
ORDER BY sortPriority, sortDate;
The query returns rows where DATE_A or DATE_B is older than today's date. The rows are sorted by sortPriority and then by sortDate.
I now need to add pagination to this query. However, when I use the sortPriority or sortDate columns in the ORDER BY clause of the ROW_NUMBER() function, the query fails:
WITH sortedTable AS
(
SELECT DATE_A, DATE_B,
CASE
WHEN DATEDIFF(day, DATE_A, GETDATE()) >= 1 THEN 1
WHEN DATEDIFF(day, DATE_B, GETDATE()) >= 1 AND DATEDIFF(day, DATE_A, GETDATE()) <1 THEN 2
ELSE 3
END AS sortPriority,
CASE
--deadline past
WHEN DATEDIFF(day, DATE_A, GETDATE()) >= 1 THEN DATE_A
--review only past
WHEN DATEDIFF(day, DATE_B, GETDATE()) >= 1 AND DATEDIFF(day, DATE_A, GETDATE()) <1 THEN DATE_B
--anything else
ELSE DATE_B
END AS sortDate,
ROW_NUMBER() OVER (ORDER BY sortPriority, sortDate) AS 'RowNumber'
FROM myTable
WHERE (DATEDIFF(day, DATE_A, GETDATE()) >=1 OR DATEDIFF(day, DATE_B, GETDATE()) >=1)
)
SELECT *
FROM sortedTable
WHERE RowNumber BETWEEN 10 AND 20;
I get the following error messages:
Msg 207, Level 16, State 1, Line 24
Invalid column name 'sortPriority'.
Msg 207, Level 16, State 1, Line 24
Invalid column name 'sortDate'.
The line number refers to this line of my sample code:
ROW_NUMBER() OVER (ORDER BY sortPriority, sortDate) AS 'RowNumber'
How can I approach this and get the desired result (pagination with the original sorting intact)?
UNTESTED:
WITH sortedTable AS
(
SELECT DATE_A, DATE_B, sortPriority, sortDate,
ROW_NUMBER() OVER (ORDER BY sortPriority, sortDate) AS 'RowNumber'
FROM
(
SELECT DATE_A, DATE_B,
CASE WHEN DATEDIFF(day, DATE_A, GETDATE()) >= 1 THEN 1
WHEN DATEDIFF(day, DATE_B, GETDATE()) >= 1 AND DATEDIFF(day, DATE_A, GETDATE()) <1 THEN 2
ELSE 3
END AS sortPriority,
CASE
--deadline past
WHEN DATEDIFF(day, DATE_A, GETDATE()) >= 1 THEN DATE_A
--review only past
WHEN DATEDIFF(day, DATE_B, GETDATE()) >= 1 AND DATEDIFF(day, DATE_A, GETDATE()) <1 THEN DATE_B
--anything else
ELSE DATE_B
END AS sortDate
FROM myTable
WHERE (DATEDIFF(day, DATE_A, GETDATE()) >=1 OR DATEDIFF(day, DATE_B, GETDATE()) >=1)
) T
)
SELECT *
FROM sortedTable
WHERE RowNumber BETWEEN 10 AND 20;
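The order of operations in this answer (compute the derived sort columns first, number the rows second, slice last) can be illustrated with a small Python sketch. It is not T-SQL and makes no claim about how SQL Server executes the query; the function name and sample rows are invented for the example.

```python
from datetime import date

def paginate(rows, today, first, last):
    """rows: (date_a, date_b) tuples. Returns (row_number, date_a, date_b)."""
    keyed = []
    for date_a, date_b in rows:
        a_past = (today - date_a).days >= 1
        b_past = (today - date_b).days >= 1
        if not (a_past or b_past):
            continue  # same filter as the WHERE clause
        # Step 1: derive the sort columns (the inner query)
        if a_past:
            keyed.append((1, date_a, date_a, date_b))
        else:
            keyed.append((2, date_b, date_a, date_b))
    # Step 2: order by (sortPriority, sortDate) and number the rows
    keyed.sort(key=lambda k: (k[0], k[1]))
    numbered = [(n, k[2], k[3]) for n, k in enumerate(keyed, start=1)]
    # Step 3: keep only the requested page
    return [r for r in numbered if first <= r[0] <= last]

rows = [
    (date(2024, 1, 10), date(2024, 1, 5)),
    (date(2024, 2, 1), date(2024, 1, 3)),
    (date(2024, 1, 2), date(2024, 1, 1)),
    (date(2024, 3, 1), date(2024, 3, 2)),
]
page = paginate(rows, today=date(2024, 1, 15), first=2, last=3)
print(page)
```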

Query for duration between two times within 1 day

Suppose I have a table that contain information on streaming media connections. In this table, I have a start time and end time for when the connection was initiated and then later closed.
Table: logs
id (INT, PK, AUTO_INCREMENT)
StartTime (DATETIME)
EndTime (DATETIME)
I want to be able to run a query that will add up the total time connections were established for a day. This is obvious for connections within a day:
SELECT
SUM(
TIME_TO_SEC(
TIMEDIFF(`EndTime`, `StartTime`)
)
)
FROM `logs`
WHERE (`StartTime` BETWEEN '2010-01-01' AND '2010-01-02');
However, suppose a StartTime begins one day, say around 11:00PM, and EndTime is some time the next day, maybe 3:00AM. In these situations, I want to allocate only the amount of time that occurred during the day, to that day. So, 1 hour would go towards the first day, and 3 hours would go to the next.
SUM(
TIME_TO_SEC(
TIMEDIFF(
IF(`EndTime`>DATE_ADD('2010-01-01', INTERVAL 1 DAY), DATE_ADD('2010-01-01', INTERVAL 1 DAY), `EndTime`),
IF(`StartTime`<'2010-01-01', '2010-01-01', `StartTime`)
)
)/60/60
)
The thinking with this is that if the EndTime is more than the end of the day, then we'll just use the end of the day instead. If the StartTime is less than the beginning of the day, then we'll just use the beginning of the day instead.
So, I then need to wrap this all up into something that will generate a table that looks like this:
date, total
2010-01-01, 0
2010-01-02, 1.53
2010-01-03, 5.33
I thought this query would work:
SELECT
`date`,
SUM(
TIME_TO_SEC(
TIMEDIFF(
IF(`EndTime`>DATE_ADD(`date`, INTERVAL 1 DAY), DATE_ADD(`date`, INTERVAL 1 DAY), `EndTime`),
IF(`StartTime`<`date`, `date`, `StartTime`)
)
)/60/60
) AS `total_hours`
FROM
(SELECT * FROM `logs` WHERE `StartTime` BETWEEN '2010-08-01' AND '2010-08-31') AS logs_small,
(SELECT DATE_ADD("2010-08-01", INTERVAL `number` DAY) AS `date` FROM `numbers` WHERE `number` BETWEEN 0 AND 30) AS `dates`
GROUP BY `date`;
Note the numbers table referenced is a table with just one column, number, with a series of integers, 0, 1, 2, 3, etc. I am using it here to generate a series of dates, which works fine.
The problem with this query is that I get inaccurate data. Specifically, rows in the logs table that have an EndTime that goes into the next day don't get any time counted in that next day. For example, if I had a row that started 2010-08-01 23:00:00 and ended 2010-08-02 01:00:00, then the resulting row for 2010-08-02 would add up to 0.
Is there a better way to do this? Ideally, I'd like to get 0 instead of null on days that don't have any records that match up to them as well.
Edit: To clarify, I want to turn this:
id, StartTime, EndTime
0, 2000-01-01 01:00:00, 2000-01-01 04:00:00
1, 2000-01-01 23:00:00, 2000-01-02 05:00:00
2, 2000-01-02 00:00:00, 2000-01-04 01:00:00
... into this:
date, total_hours
2000-01-01, 4
2000-01-02, 29
2000-01-03, 24
2000-01-04, 1
2000-01-05, 0
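As a reference for the expected numbers, the clamping idea described earlier (cut each connection to the boundaries of the day, then sum what is left) can be sketched in Python. This is a minimal illustration using the three sample rows above; the function name and the in-memory list are assumptions, not part of the schema.

```python
from datetime import datetime, timedelta

def hours_per_day(logs, first_day, num_days):
    """Sum, for each day, the part of every (start, end) interval inside it."""
    totals = {}
    for n in range(num_days):
        day_start = first_day + timedelta(days=n)
        day_end = day_start + timedelta(days=1)
        seconds = 0.0
        for start, end in logs:
            # Clamp the interval to [day_start, day_end)
            overlap = min(end, day_end) - max(start, day_start)
            if overlap > timedelta(0):
                seconds += overlap.total_seconds()
        totals[day_start.date()] = seconds / 3600
    return totals

logs = [
    (datetime(2000, 1, 1, 1), datetime(2000, 1, 1, 4)),
    (datetime(2000, 1, 1, 23), datetime(2000, 1, 2, 5)),
    (datetime(2000, 1, 2, 0), datetime(2000, 1, 4, 1)),
]
totals = hours_per_day(logs, datetime(2000, 1, 1), 5)
for day, hours in totals.items():
    print(day, hours)
```

Days without any overlapping connection come out as 0 rather than null, because every day starts from a zero total.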
Solution
Thanks to jim31415 for coming up with the solution! I translated his answer over to the functions usable in MySQL and came up with this:
SELECT `d`.`Date`,
SUM(COALESCE(
(CASE WHEN t.StartTime >= d.Date AND t.EndTime < DATE_ADD(d.Date, INTERVAL 1 DAY) THEN TIME_TO_SEC(TIMEDIFF(t.EndTime, t.StartTime))
WHEN t.StartTime < d.Date AND t.EndTime <= DATE_ADD(d.Date, INTERVAL 1 DAY) THEN TIME_TO_SEC(TIMEDIFF(t.EndTime,d.Date))
WHEN t.StartTime >= d.Date AND t.EndTime > DATE_ADD(d.Date, INTERVAL 1 DAY) THEN TIME_TO_SEC(TIMEDIFF(DATE_ADD(d.Date, INTERVAL 1 DAY),t.StartTime))
WHEN t.StartTime < d.Date AND t.EndTime > DATE_ADD(d.Date, INTERVAL 1 DAY) THEN 24*60*60
END), 0)
)/60/60 ConnectionTime
FROM (SELECT DATE_ADD('2011-03-01', INTERVAL `number` DAY) AS `Date` FROM `numbers` WHERE `number` BETWEEN 0 AND 30) AS d
LEFT JOIN `logs` t ON (t.StartTime >= d.Date AND t.StartTime < DATE_ADD(d.Date, INTERVAL 1 DAY))
OR (t.EndTime >= d.Date AND t.EndTime < DATE_ADD(d.Date, INTERVAL 1 DAY))
OR (t.StartTime < d.Date AND t.EndTime > DATE_ADD(d.Date, INTERVAL 1 DAY))
GROUP BY d.Date
ORDER BY d.Date;
I should also note that the null values for EndTime weren't applicable in my situation, as I am reading from old log files in my application. If you need them though, Jim's post has them outlined quite well.
This is in MS SQL, but I think the logic applies and can be translated into MySQL.
I wasn't sure how you wanted to handle EndTime that are null, so I commented that out.
select d.Date,
sum(coalesce(
(case when t.StartTime >= d.Date and t.EndTime < dateadd(day,1,d.Date) then datediff(minute,t.StartTime,t.EndTime)
when t.StartTime < d.Date and t.EndTime <= dateadd(day,1,d.Date) then datediff(minute,d.Date,t.EndTime)
when t.StartTime >= d.Date and t.EndTime > dateadd(day,1,d.Date) then datediff(minute,t.StartTime,dateadd(day,1,d.Date))
when t.StartTime < d.Date and t.EndTime > dateadd(day,1,d.Date) then 24*60
--when t.StartTime >= d.Date and t.EndTime is null then datediff(minute,t.StartTime,getdate())
--when t.StartTime < d.Date and t.EndTime is null then datediff(minute,d.Date,getdate())
end), 0)
) ConnectionTime
from (select Date=dateadd(day, num, '2011-03-01') from #NUMBERS where num between 0 and 30) d
left join Logs t on (t.StartTime >= d.Date and t.StartTime < dateadd(day,1,d.Date))
or (t.EndTime >= d.Date and t.EndTime < dateadd(day,1,d.Date))
or (t.StartTime < d.Date and t.EndTime > dateadd(day,1,d.Date))
group by d.Date
order by d.Date
Use a union to make it easier for yourself
SELECT
`date`,
SUM(
TIME_TO_SEC(TIMEDIFF(`EndTime`,`StartTime`))/60/60
) AS `total_hours`
FROM
(SELECT id, starttime, if (endtime > date then date else endtime) FROM `logs` WHERE `StartTime` >= date AND `StartTime` < date
union all
SELECT id, date, endtime FROM `logs` WHERE `enddate` >= date AND `enddate` < date and !(`StartTime` >= date AND `StartTime` < date)
union all
SELECT id, date, date_add(date, 1) FROM `logs` WHERE `enddate` > date AND `startdate` < date
) as datedetails inner join
(SELECT DATE_ADD("2010-08-01", INTERVAL `number` DAY) AS `date` FROM `numbers` WHERE `number` BETWEEN 0 AND 30) AS `dates`
GROUP BY `date`;
I hope I understood your question correctly.
Edit: I forgot the case where a multi-day connection starts before the requested day and ends after it.
Use this:
select startTime,duration as duration,time,TIME_TO_SEC(TIMEDIFF(time,startTime)) as diff from <idling> limit 25;
select startTime,duration DIV 60 as duration,time,TIMESTAMPDIFF(MINUTE,startTime,time) as diff from <idling> limit 25;