I have a table containing daily sales. Not every day has sales (so there are 'missing' rows in the table).
I'm using MySQL 5.7, so there are no window functions available.
The table has two columns: date (a timestamp) and sales volume.
date        sales volume  +/- prev DAY
2022-10-18  76            0
2022-10-17  131           55
2022-10-16  110           -21
2022-10-15  102           -8
2022-10-14  201           99
2022-10-10  100           -101
As an example, sales on 14-10 were 201, which is 99 more than the sales in the previous row (15-10, 102).
I want to derive the value in the third column by comparing each day's sales with those of the previous row (which isn't always the previous day), but I can't get anything working.
Thanks.
SELECT `sales volume` - @previous `+/- prev DAY`,
       `date`,
       @previous := `sales volume` `sales volume`
FROM test
CROSS JOIN (SELECT @previous := NULL) init_variable
ORDER BY `date` DESC
+/- prev DAY  date        sales volume
null          2022-10-18  76
55            2022-10-17  131
-21           2022-10-16  110
-8            2022-10-15  102
99            2022-10-14  201
-101          2022-10-10  100
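The order of the expressions in the output list and the ORDER BY expression are both critical: the difference has to be computed before the variable is reassigned, and the rows have to be processed in descending date order. If you want to reorder the output columns and/or the output rows, use this query as a subquery and set the required column order and/or row order in the outer query.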
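For comparison, here is a minimal sketch of a variant that needs neither user variables nor window functions: a correlated subquery looks up the sales of the nearest later date. It runs against an in-memory SQLite database through Python's `sqlite3` module purely so it is self-contained; the column name `sales_volume` (instead of `sales volume`) is an assumption made for the example, and the same SELECT should also be accepted by MySQL 5.7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sales_volume stands in for the `sales volume` column (assumed name).
conn.execute("CREATE TABLE test (date TEXT, sales_volume INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [
        ("2022-10-18", 76),
        ("2022-10-17", 131),
        ("2022-10-16", 110),
        ("2022-10-15", 102),
        ("2022-10-14", 201),
        ("2022-10-10", 100),
    ],
)

# For each row, subtract the sales of the closest later date, i.e. the
# row printed directly above it when the result is sorted by date DESC.
rows = conn.execute(
    """
    SELECT t.date,
           t.sales_volume,
           t.sales_volume - (SELECT t2.sales_volume
                             FROM test t2
                             WHERE t2.date > t.date
                             ORDER BY t2.date ASC
                             LIMIT 1) AS diff
    FROM test t
    ORDER BY t.date DESC
    """
).fetchall()

for row in rows:
    print(row)
```

The newest row has no later date to compare against, so its difference comes back as NULL (`None` in Python), just like in the variable-based query.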
This seems like an easy question in SQL, but I can't get it done in MySQL.
I have a table with prices for different products (ProdID). Each price is valid from a certain date and depends on the quantity. For the current price list I need the prices with the latest valid_from date in the table. The timestamp cannot be used, because prices are sometimes inserted with a future valid_from date. The same goes for the ID, which does not reflect which prices are current.
ID, ProdID, qty, price, valid_from, timestamp
100 51 25 3.360 2021-02-15 2021-05-11 19:20:28
101 51 2000 3.150 2021-02-15 2021-05-11 19:20:29
102 51 6000 2.930 2021-02-15 2021-05-11 19:20:30
103 51 15000 2.870 2021-02-15 2021-05-11 19:20:31
131 51 1000 3.250 2021-02-15 2021-05-11 19:20:59
....
140 51 25 3.970 2021-10-06 2021-10-06 16:51:48
141 51 1000 3.790 2021-10-06 2021-10-06 16:51:50
142 51 2000 3.650 2021-10-06 2021-10-06 17:45:49
143 51 6000 3.500 2021-10-06 2021-10-06 16:51:54
144 51 15000 3.400 2021-10-06 2021-10-06 16:51:56
For example, these are the rows for ProdID 51.
I need to get the prices which are currently valid. In this case those are IDs 140 to 144, but that is only coincidental. Prices may also be reduced, so I can't just take the highest price per ProdID. The only criterion is the latest valid_from date which is <= date(). As mentioned, there may also be prices already inserted for the future (> date()).
I tried this, but it returns all of the rows above, both those valid from 2021-02-15 and those valid from 2021-10-06:
SELECT `p`.`qty` AS `quantity`,
`p`.`price` AS `price`,
`p`.`ProdID` AS `ProdID`,
row_number() OVER (PARTITION BY `p`.`ProdID`
ORDER BY `p`.`valid_from` DESC,`p`.`qty`) AS `rk`
FROM `tblprices` `p`
You can use window functions. Use RANK to find greatest row(s) per group:
WITH cte AS (
SELECT *, RANK() OVER (PARTITION BY ProdID ORDER BY valid_from DESC) AS rnk
FROM t
WHERE valid_from <= CURRENT_DATE
)
SELECT *
FROM cte
WHERE rnk = 1
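If window functions are not available (MySQL before 8.0), the same "latest valid_from per product" idea can be sketched with a join against a grouped subquery. The example below is only an illustration: it runs on an in-memory SQLite database via Python's `sqlite3` module, uses some of the ProdID 51 rows from the question, adds one hypothetical future-dated row (ID 150), and passes a hypothetical as-of date as a parameter instead of CURRENT_DATE so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblprices "
    "(ID INTEGER, ProdID INTEGER, qty INTEGER, price REAL, valid_from TEXT)"
)
conn.executemany(
    "INSERT INTO tblprices VALUES (?, ?, ?, ?, ?)",
    [
        (100, 51, 25, 3.360, "2021-02-15"),
        (101, 51, 2000, 3.150, "2021-02-15"),
        (102, 51, 6000, 2.930, "2021-02-15"),
        (103, 51, 15000, 2.870, "2021-02-15"),
        (131, 51, 1000, 3.250, "2021-02-15"),
        (140, 51, 25, 3.970, "2021-10-06"),
        (141, 51, 1000, 3.790, "2021-10-06"),
        (142, 51, 2000, 3.650, "2021-10-06"),
        (143, 51, 6000, 3.500, "2021-10-06"),
        (144, 51, 15000, 3.400, "2021-10-06"),
        (150, 51, 25, 4.100, "2099-01-01"),  # hypothetical future price
    ],
)

as_of = "2021-11-01"  # hypothetical "today"; stands in for CURRENT_DATE

# Inner query: latest valid_from per product that is not in the future.
# Outer join: keep only the price rows carrying exactly that date.
current_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.ID
        FROM tblprices p
        JOIN (SELECT ProdID, MAX(valid_from) AS valid_from
              FROM tblprices
              WHERE valid_from <= ?
              GROUP BY ProdID) latest
          ON latest.ProdID = p.ProdID
         AND latest.valid_from = p.valid_from
        ORDER BY p.ProdID, p.qty
        """,
        (as_of,),
    )
]
print(current_ids)
```

Like RANK, this keeps every row that shares the latest date, so all quantity tiers of the current price list come back.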
I have a dataset like this:
team date score
A 2011-05-01 50
A 2011-05-02 54
A 2011-05-03 51
A 2011-05-04 49
A 2011-05-05 59
B 2011-05-03 30
B 2011-05-04 35
B 2011-05-05 39
B 2011-05-06 47
B 2011-05-07 50
I want to add another column called MA3 containing the moving average of the scores for the last 3 days. The tricky part is calculating the moving average separately for each team. The end result should look like this:
team date score MA3
A 2011-05-01 50 null
A 2011-05-02 54 null
A 2011-05-03 51 null
A 2011-05-04 49 51.66
A 2011-05-05 59 51.33
B 2011-05-03 30 null
B 2011-05-04 35 null
B 2011-05-05 39 null
B 2011-05-06 47 34.66
B 2011-05-07 50 40.33
If it were a single team, I would do:
SELECT team,
       date,
       AVG(score) OVER (ORDER BY date ASC ROWS 3 PRECEDING) AS MA3
FROM table
You're missing the PARTITION BY clause:
SELECT team,
date,
AVG(score) OVER (
PARTITION BY team
ORDER BY date ASC ROWS 3 PRECEDING
) AS MA3
FROM table
Note that an average is always calculated, even when fewer rows than the full frame are available. ROWS 3 PRECEDING frames the current row plus up to three preceding rows, so if you want the average to be null until three preceding rows exist, you could do it like this:
SELECT team,
date,
CASE
WHEN count(*) OVER w <= 3 THEN null
ELSE AVG(score) OVER w
END AS MA3
FROM table
WINDOW w AS (PARTITION BY team ORDER BY date ASC ROWS 3 PRECEDING)
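To make the frame contents concrete, here is a small plain-Python emulation (not SQL, and not part of the original answer) of the calculation for team A. It compares a frame that includes the current row, as `ROWS 3 PRECEDING` does, with one that stops at the previous row, which is what a frame such as `ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING` would give. The function name and parameters are made up for the illustration.

```python
def moving_average(scores, preceding, include_current):
    """Mean over the frame for each row, or None until `preceding`
    earlier rows exist (mirrors the CASE ... THEN null branch)."""
    result = []
    for i, score in enumerate(scores):
        if i < preceding:
            result.append(None)
            continue
        frame = scores[i - preceding:i]      # the preceding rows
        if include_current:
            frame = frame + [score]          # plus the current row
        result.append(sum(frame) / len(frame))
    return result


team_a = [50, 54, 51, 49, 59]  # team A scores in date order

with_current = moving_average(team_a, 3, include_current=True)
without_current = moving_average(team_a, 3, include_current=False)

print(with_current)
print(without_current)
```

The second variant reproduces the 51.66 and 51.33 from the expected output in the question (up to rounding), while the first averages four rows and gives different numbers, so which frame to use depends on whether "the last 3 days" should include the current day.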
Side note
Your next question might be about logical windowing, because often you don't actually want to calculate the average over 3 rows but over some interval, e.g. 3 days. Luckily, MySQL implements this (as of version 8.0, along with the other window functions). You could then write:
WINDOW w AS (PARTITION BY team ORDER BY date ASC RANGE INTERVAL 3 DAY PRECEDING)
I have the query
SELECT DATE_FORMAT(sys_date, '%Y-%c') as month, COUNT(DATE_FORMAT(sys_date, '%Y-%c'))
FROM sale
GROUP BY month
ORDER BY month ASC
That returns the following result,
month COUNT(DATE_FORMAT(sys_date, '%Y-%c'))
2017-10 204
2017-11 178
2017-12 88
2017-7 3
2017-8 1
2017-9 153
2018-1 91
2018-2 86
2018-3 67
2018-4 109
2018-5 131
2018-6 47
2018-7 50
2018-8 36
2018-9 39
How do I get the output in the correct ascending order, like this?
month COUNT(DATE_FORMAT(sys_date, '%Y-%c'))
2017-7 3
2017-8 1
2017-9 153
2017-10 204
2017-11 178
2017-12 88
2018-1 91
2018-2 86
2018-3 67
2018-4 109
2018-5 131
2018-6 47
2018-7 50
2018-8 36
2018-9 39
I've tried using MONTH(month), YEAR(month) ASC and many other options listed on the site. But nothing seems to work.
That's because the calculated month is a string, so it is sorted alphabetically. You can keep ORDER BY month and simply switch to a format with a leading 0 for months < 10.
That way the string values all have the same length and the alphabetical sort is correct, because when comparing strings '10' < '9' but '09' < '10'.
To do that, change %c to %m, which zero-pads the month number.
Also, the COUNT can be simplified.
SELECT DATE_FORMAT(sys_date, '%Y-%m') as month, COUNT(*) as Total
FROM sale
GROUP BY month
ORDER BY month
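The string-ordering effect is not specific to MySQL. As a quick illustration (plain Python rather than SQL), sorting unpadded and zero-padded year-month strings shows the same behaviour:

```python
from datetime import date

# A few months in chronological order.
months = [date(2017, m, 1) for m in (7, 8, 9, 10, 11, 12)] + [date(2018, 1, 1)]

unpadded = [f"{d.year}-{d.month}" for d in months]  # like '%Y-%c'
padded = [d.strftime("%Y-%m") for d in months]      # like '%Y-%m'

print(sorted(unpadded))  # '2017-10' sorts before '2017-7'
print(sorted(padded))    # stays in chronological order
```

With the padded format, alphabetical order and chronological order coincide, which is why ORDER BY month then works as-is.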
If you do wish to keep the '%Y-%c' format, you could include the year and the length of month in the ORDER BY.
SELECT DATE_FORMAT(sys_date, '%Y-%c') as month, COUNT(*) as Total
FROM sale
GROUP BY YEAR(sys_date), month
ORDER BY YEAR(sys_date), LENGTH(month), month
Alternatively, sort the data on the YEAR and MONTH values obtained directly from sys_date. Select those values so that the ORDER BY clause can use them for sorting. If only_full_group_by mode is enabled, every non-aggregated expression in the SELECT list must also appear in the GROUP BY, so group by the year and month as well as the formatted string:
SELECT YEAR(sys_date) as ysysdate,
       MONTH(sys_date) as msysdate,
       DATE_FORMAT(sys_date, '%Y-%c') as ymonth,
       COUNT(*)
FROM sale
GROUP BY ysysdate, msysdate, ymonth
ORDER BY ysysdate ASC, msysdate ASC
I want a query that returns, for each date, that date together with the sum of impressions over the last 7 days (a 7-day look-back).
For example, I have a table tblFactImps with the following data:
dateFact impressions id
2015-07-01 4022 30
2015-07-02 4021 33
2015-07-03 4011 34
2015-07-04 4029 35
2015-07-05 1023 39
2015-07-06 3023 92
2015-07-07 8027 66
2015-07-08 2024 89
I need output with 2 columns:
dateFact impressions_last_7
The query I have so far:
select dateFact, sum(if(datediff(curdate(), dateFact)<=7, impressions,0)) impressions_last_7 from tblFactImps group by dateFact;
Thanks!
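If your fact table is not too big, then a correlated subquery is a simple way to do what you want. The subquery needs both bounds of the window; with only the lower bound it would also add up every later date:
select i.dateFact,
       (select sum(i2.impressions)
        from tblFactImps i2
        where i2.dateFact >= i.dateFact - interval 6 day
          and i2.dateFact <= i.dateFact
       ) as impressions_last_7
from tblFactImps i;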
You can achieve this by LEFT OUTER JOINing the table with itself on a date range, and summing the impressions grouped by date, as follows:
SELECT
t1.dateFact,
SUM(t2.impressions) AS impressions_last_7
FROM
tblFactImps t1
LEFT OUTER JOIN
tblFactImps t2
ON
t2.dateFact BETWEEN
DATE_SUB(t1.dateFact, INTERVAL 6 DAY)
AND t1.dateFact
GROUP BY
t1.dateFact;
This should give you a sliding 7-day sum for each date in your table.
Assuming your dateFact column is indexed, this query should also be relatively fast.
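As a rough check of the sliding-window logic, the sketch below runs the self-join on the sample data from the question. It uses an in-memory SQLite database via Python's `sqlite3` module only to be self-contained; SQLite's `date(x, '-6 days')` stands in for MySQL's `DATE_SUB(x, INTERVAL 6 DAY)`, so the date arithmetic is an adaptation, not the MySQL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblFactImps (dateFact TEXT, impressions INTEGER, id INTEGER)"
)
conn.executemany(
    "INSERT INTO tblFactImps VALUES (?, ?, ?)",
    [
        ("2015-07-01", 4022, 30),
        ("2015-07-02", 4021, 33),
        ("2015-07-03", 4011, 34),
        ("2015-07-04", 4029, 35),
        ("2015-07-05", 1023, 39),
        ("2015-07-06", 3023, 92),
        ("2015-07-07", 8027, 66),
        ("2015-07-08", 2024, 89),
    ],
)

# Each t1 row is joined to every t2 row dated within the 7-day window
# that ends on t1.dateFact (6 days back, plus the day itself).
rolling = conn.execute(
    """
    SELECT t1.dateFact, SUM(t2.impressions) AS impressions_last_7
    FROM tblFactImps t1
    LEFT JOIN tblFactImps t2
      ON t2.dateFact BETWEEN date(t1.dateFact, '-6 days') AND t1.dateFact
    GROUP BY t1.dateFact
    ORDER BY t1.dateFact
    """
).fetchall()

for row in rolling:
    print(row)
```

The first seven dates simply accumulate, because the table has no earlier rows; 2015-07-08 is the first date whose window drops a row (2015-07-01).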
MySql 5.5.
I have a table that represents a work assignment:
empId jobNo workDate hours
4 441 10/1/2012 10
4 441 9/1/2012 22
4 441 8/1/2012 6
And one that represents salary:
empId effDate rate
4 10/1/2012 6.50
4 9/1/2012 5.85
4 6/1/2012 4.00
The salary applies to all work performed on or after the effective date. So the rate in Jun, Jul, and Aug is 4.00; in Sep it is 5.85; and in Oct it is 6.50.
If I naively query for October's work:
SELECT Work.empId, Work.jobNo, Work.workDate, Work.hours, Salary.effDate, Salary.rate
FROM Work
JOIN Salary ON Work.empId = Salary.empId
WHERE Work.workDate <= '2012-10-01'
AND Salary.effDate <= Work.workDate
ORDER BY Work.jobNo ASC, Work.workDate DESC;
I do not get what I want. I get something like
4 441 10/1/2012 10 10/1/2012 6.50
4 441 10/1/2012 10 9/1/2012 5.85
4 441 10/1/2012 10 6/1/2012 4.00
4 441 9/1/2012 22 9/1/2012 5.85
4 441 9/1/2012 22 6/1/2012 4.00
4 441 8/1/2012 6 6/1/2012 4.00
When I want
4 441 10/1/2012 10 10/1/2012 6.50
4 441 9/1/2012 22 9/1/2012 5.85
4 441 8/1/2012 6 6/1/2012 4.00
I can't quite wrap my head around how to create the query I want.
The real situation has multiple employees, multiple jobs, obviously.
Thanks for your help.
Here is your actual issue: for each record in Work, you want to find the corresponding effective rate, based on the work date and the salary effective date. When you simply test Salary.effDate <= Work.workDate, you get ALL rates that took effect on or before the work date, but you only want the most recent one.
This is a slightly complicated variant of the greatest-n-per-group problem. There are many ways of doing this, here is one:
SELECT sel.*, Salary.rate
FROM
(
  SELECT Work.empId, Work.jobNo, Work.workDate,
         Work.hours, MAX(Salary.effDate) AS effDate
  FROM Work
  JOIN Salary ON Work.empId = Salary.empId
  WHERE Work.workDate <= '2012-10-01'
    AND Salary.effDate <= Work.workDate
  GROUP BY Work.empId, Work.jobNo, Work.workDate, Work.hours
) sel
INNER JOIN Salary ON sel.empId = Salary.empId
                 AND sel.effDate = Salary.effDate
ORDER BY sel.jobNo ASC, sel.workDate DESC
First, the inner query finds the most recent salary effective date for each work record. Then we join that result with Salary again to get the rate.
See the working SQLFiddle.
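For readers without access to a fiddle, here is a self-contained sketch of the same two-step idea on the sample data. It runs on an in-memory SQLite database via Python's `sqlite3` module rather than MySQL, and the dates are stored as ISO strings (an assumption; the question shows them as m/d/yyyy).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Work (empId INTEGER, jobNo INTEGER, workDate TEXT, hours INTEGER);
    CREATE TABLE Salary (empId INTEGER, effDate TEXT, rate REAL);
    INSERT INTO Work VALUES
        (4, 441, '2012-10-01', 10),
        (4, 441, '2012-09-01', 22),
        (4, 441, '2012-08-01', 6);
    INSERT INTO Salary VALUES
        (4, '2012-10-01', 6.50),
        (4, '2012-09-01', 5.85),
        (4, '2012-06-01', 4.00);
    """
)

# Step 1 (subquery): latest effDate on or before each work date.
# Step 2 (outer join): fetch the rate that belongs to that effDate.
result = conn.execute(
    """
    SELECT sel.workDate, sel.hours, sel.effDate, Salary.rate
    FROM (SELECT w.empId, w.jobNo, w.workDate, w.hours,
                 MAX(s.effDate) AS effDate
          FROM Work w
          JOIN Salary s ON w.empId = s.empId
          WHERE w.workDate <= '2012-10-01'
            AND s.effDate <= w.workDate
          GROUP BY w.empId, w.jobNo, w.workDate, w.hours) sel
    JOIN Salary ON sel.empId = Salary.empId
               AND sel.effDate = Salary.effDate
    ORDER BY sel.jobNo ASC, sel.workDate DESC
    """
).fetchall()

for row in result:
    print(row)
```

Each work record now appears once, paired with the single rate in effect on its work date.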
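Note that a plain JOIN here is an INNER JOIN, not a NATURAL JOIN, and changing the word "JOIN" to "LEFT JOIN" will not give the desired results: a LEFT JOIN only keeps Work rows that have no matching Salary row, it does not reduce the multiple matching salary rows to one. You still need to pick the most recent effective date for each work record.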
Assuming the salary table has a primary or alternate key (unique index) consisting of the columns empId and effDate, I'd do something like this:
select w.empID as EMPLOYEE_ID ,
w.jobNo as JOB_NUMBER ,
w.workDate as DATE_WORKED ,
w.hours as HOURS_WORKED ,
rate.HourlyWage as HOURLY_WAGE ,
w.hours * rate.HourlyWage as WAGES_CHARGED ,
rate.effDateFrom as HOURLY_WAGE_EFFECTIVE_DATE
from work w
join ( select sfrom.EmpId as EmpID ,
sfrom.rate as HourlyWage ,
sfrom.EffDate as effDateFrom ,
( select min(Effdate)
from salary t
where t.empId = sfrom.EmpId
and t.effDate > sfrom.EffDate
) as effDateThru
from salary sfrom
) rate on rate.empID = w.empID
and rate.EffDateFrom <= w.workDate
and ( rate.effDateThru is null -- no end date: this is the current period
or rate.effDateThru > w.workDate -- 'date-thru' is the start date of the next period, so this upper bound is EXCLUSIVE
)
We join the work table against a virtual rate table that gives each employee's wage and the date range for which it is effective. The 'current' row for each employee has the thru/expiry date set to null. And since the thru/expiry date is actually the effective date of the next salary entry, the upper bound is exclusive rather than inclusive. Consequently, the range test must check for null and can't use BETWEEN.
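A condensed sketch of this half-open range test, again on an in-memory SQLite database via Python's `sqlite3` module and with ISO date strings (both assumptions for the sake of a runnable example). One hypothetical work row dated 2012-09-15 is added to show a date that falls in the middle of a rate period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Work (empId INTEGER, jobNo INTEGER, workDate TEXT, hours INTEGER);
    CREATE TABLE Salary (empId INTEGER, effDate TEXT, rate REAL);
    INSERT INTO Work VALUES
        (4, 441, '2012-10-01', 10),
        (4, 441, '2012-09-15', 8),   -- hypothetical mid-period row
        (4, 441, '2012-09-01', 22),
        (4, 441, '2012-08-01', 6);
    INSERT INTO Salary VALUES
        (4, '2012-10-01', 6.50),
        (4, '2012-09-01', 5.85),
        (4, '2012-06-01', 4.00);
    """
)

# The derived table gives each rate a [effDateFrom, effDateThru) range,
# where effDateThru is the next effective date (NULL for the latest rate).
rates = conn.execute(
    """
    SELECT w.workDate, r.rate
    FROM Work w
    JOIN (SELECT s.empId,
                 s.rate,
                 s.effDate AS effDateFrom,
                 (SELECT MIN(t.effDate)
                  FROM Salary t
                  WHERE t.empId = s.empId
                    AND t.effDate > s.effDate) AS effDateThru
          FROM Salary s) r
      ON r.empId = w.empId
     AND r.effDateFrom <= w.workDate
     AND (r.effDateThru IS NULL OR r.effDateThru > w.workDate)
    ORDER BY w.workDate DESC
    """
).fetchall()

for row in rates:
    print(row)
```

Work dated exactly 2012-10-01 gets only the 6.50 rate: the 5.85 range ends on that date exclusively, which is the case an inclusive BETWEEN would match twice.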