SQL query by date - mysql

MySQL 5.5.
I have a table that represents a work assignment:
empId jobNo workDate hours
4 441 10/1/2012 10
4 441 9/1/2012 22
4 441 8/1/2012 6
And one that represents salary:
empId effDate rate
4 10/1/2012 6.50
4 9/1/2012 5.85
4 6/1/2012 4.00
The salary applies to all work performed on or after the effective date. So the rate in June, July, and August is 4.00; in September it is 5.85; and in October it is 6.50.
If I naively query for October's work:
SELECT Work.empId, Work.jobNo, Work.workDate, Work.hours, Salary.effDate, Salary.rate
FROM Work
JOIN Salary ON Work.empId = Salary.empId
WHERE Work.workDate <= '2012-10-01'
AND Salary.effDate <= Work.workDate
ORDER BY Work.jobNo ASC, Work.workDate DESC;
I do not get what I want. I get something like
4 441 10/1/2012 10 10/1/2012 6.50
4 441 10/1/2012 10 9/1/2012 5.85
4 441 10/1/2012 10 6/1/2012 4.00
4 441 9/1/2012 22 9/1/2012 5.85
4 441 9/1/2012 22 6/1/2012 4.00
4 441 8/1/2012 6 6/1/2012 4.00
When I want
4 441 10/1/2012 10 10/1/2012 6.50
4 441 9/1/2012 22 9/1/2012 5.85
4 441 8/1/2012 6 6/1/2012 4.00
I can't quite wrap my head around how to create the query I want.
The real situation has multiple employees, multiple jobs, obviously.
Thanks for your help.

Here is your actual issue: for each record in Work, you want to find the rate that was in effect on the work date. The condition Salary.effDate <= Work.workDate on its own returns ALL rates that took effect on or before the work date, but you only want the most recent one.
This is a slightly complicated variant of the greatest-n-per-group problem. There are many ways of doing this, here is one:
SELECT sel.*, Salary.rate
FROM
(
SELECT Work.empId, Work.jobNo, Work.workDate,
Work.hours, MAX(Salary.effDate) AS effDate
FROM Work
JOIN Salary ON Work.empId = Salary.empId
WHERE Work.workDate <= '2012-10-01'
AND Salary.effDate <= Work.workDate
GROUP BY Work.empId, Work.jobNo, Work.workDate, Work.hours
) sel
INNER JOIN Salary ON sel.empId = Salary.empId
AND sel.effDate = Salary.effDate
ORDER BY sel.jobNo ASC, sel.workDate DESC
First, the inner query finds the most recent salary effective date for each work record. Then we join that result with Salary again to get the rate.
See the working SQLFiddle.
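As a quick way to check the "most recent effective date" idea against the sample rows, here is a minimal sketch that runs in SQLite through Python's standard sqlite3 module rather than in MySQL. It uses a correlated subquery instead of the derived table above, which is a different variant of the same greatest-n-per-group technique. The function name rates_for_work and the ISO date strings are assumptions made for the illustration.

```python
import sqlite3

def rates_for_work(cutoff="2012-10-01"):
    """Return each work row paired with the rate in effect on its work date."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Work (empId INTEGER, jobNo INTEGER, workDate TEXT, hours INTEGER);
        CREATE TABLE Salary (empId INTEGER, effDate TEXT, rate REAL);
        INSERT INTO Work VALUES
            (4, 441, '2012-10-01', 10),
            (4, 441, '2012-09-01', 22),
            (4, 441, '2012-08-01', 6);
        INSERT INTO Salary VALUES
            (4, '2012-10-01', 6.50),
            (4, '2012-09-01', 5.85),
            (4, '2012-06-01', 4.00);
    """)
    rows = conn.execute("""
        SELECT w.empId, w.jobNo, w.workDate, w.hours, s.effDate, s.rate
        FROM Work w
        JOIN Salary s ON s.empId = w.empId
        WHERE w.workDate <= ?
          -- keep only the latest salary row that started on or before the work date
          AND s.effDate = (SELECT MAX(s2.effDate)
                           FROM Salary s2
                           WHERE s2.empId = w.empId
                             AND s2.effDate <= w.workDate)
        ORDER BY w.jobNo ASC, w.workDate DESC
    """, (cutoff,)).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in rates_for_work():
        print(row)
```

With the sample data this yields one row per work record, which is the shape asked for in the question.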

You're using what's called a NATURAL JOIN. Try changing the word "JOIN" to "LEFT JOIN" which should group the results on the left, giving you the desired results.

Assuming the salary table has a primary or alternate key (unique index) consisting of the columns empId and effDate, I'd do something like this:
select w.empID as EMPLOYEE_ID ,
w.jobNo as JOB_NUMBER ,
w.workDate as DATE_WORKED ,
w.hours as HOURS_WORKED ,
rate.HourlyWage as HOURLY_WAGE ,
w.hours * rate.HourlyWage as WAGES_CHARGED ,
rate.effDateFrom as HOURLY_WAGE_EFFECTIVE_DATE
from work w
join ( select sfrom.EmpId as EmpID ,
sfrom.rate as HourlyWage ,
sfrom.EffDate as effDateFrom ,
( select min(Effdate)
from salary t
where t.empId = sfrom.EmpId
and t.effDate > sfrom.EffDate
) as effDateThru
from salary sfrom
) rate on rate.empID = w.empID
and rate.EffDateFrom <= w.workDate
and ( rate.effDateThru is null -- if the rate has no end date, it is the current period
or rate.effDateThru > w.workDate -- 'date-thru' is the start date of the next period, so this upper bound is EXCLUSIVE
)
We join the work table against a virtual rate table that gives us each employee's wage and the date range for which it is effective. The 'current' row for each employee will have the thru/expiry date set to null. Since the thru/expiry date is actually the effective date of the next salary entry, the upper bound is exclusive rather than inclusive. Consequently, the range test must check for null, and one can't use between.
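The half-open range logic can be sketched outside SQL as well. The following is only an illustration in plain Python, with hypothetical helper names build_rate_ranges and rate_on: each salary row becomes a (from, thru, rate) tuple where thru is the next effective date, or None for the current rate, and the lookup treats thru as exclusive.

```python
from datetime import date

def build_rate_ranges(salary_rows):
    """Turn (eff_date, rate) rows for ONE employee into (from, thru, rate) ranges.

    thru is the next row's effective date, or None for the current rate.
    """
    ordered = sorted(salary_rows)
    ranges = []
    for i, (eff_from, rate) in enumerate(ordered):
        eff_thru = ordered[i + 1][0] if i + 1 < len(ordered) else None
        ranges.append((eff_from, eff_thru, rate))
    return ranges

def rate_on(ranges, work_date):
    """Return the rate whose range contains work_date, or None if none applies."""
    for eff_from, eff_thru, rate in ranges:
        # lower bound inclusive, upper bound exclusive (or open-ended when None)
        if eff_from <= work_date and (eff_thru is None or work_date < eff_thru):
            return rate
    return None

if __name__ == "__main__":
    ranges = build_rate_ranges([
        (date(2012, 10, 1), 6.50),
        (date(2012, 9, 1), 5.85),
        (date(2012, 6, 1), 4.00),
    ])
    print(rate_on(ranges, date(2012, 8, 1)))
```

Because the upper bound is exclusive, a work date of 2012-09-01 falls in the 5.85 range and not in the 4.00 range, which is why an inclusive BETWEEN would match two rows on boundary dates.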


How to subtract one column from another column finding the previous occurrence of the same id?

I am working with the Sakila video rental database that comes preloaded with MySQL.
I am trying to find the average number of days each video sits on the shelf before it is rented again.
In the rental table you have the rental_id for each rental transaction, the inventory_id corresponding to the item that was rented, as well as the rental_date and return_date.
For each rental transaction I would like to look at the rental_date and find the difference from the return_date of the previous occurrence of the same inventory_id.
I know LAG() and LEAD() might be useful here, but I have no idea how to make it only consider other rows with the same inventory_id.
Sample data:
rental_id inventory_id rental_date return_date
-------------------------------------------------------
1 115 01-01-2005 01-05-2005
2 209 01-01-2005 01-04-2005
3 115 01-06-2005 01-10-2005
4 209 01-09-2005 01-14-2005
5 209 01-15-2005 01-20-2005
6 115 01-16-2005 01-20-2005
Desired output:
rental_id inventory_id rental_date return_date days_on_shelf
------------------------------------------------------------------------
1 115 01-01-2005 01-05-2005 NULL
2 209 01-01-2005 01-04-2005 NULL
3 115 01-06-2005 01-10-2005 1
4 209 01-09-2005 01-14-2005 5
5 209 01-15-2005 01-20-2005 1
6 115 01-16-2005 01-20-2005 6
Thank you to June7. The correct code should look like this:
SELECT
rental.rental_id,
rental.inventory_id,
inventory.film_id,
rental.rental_date,
rental.return_date,
IF(@lastid = rental.inventory_id,
DATEDIFF(rental.rental_date, @lastreturn),
NULL) AS days_on_shelf,
@lastid:=rental.inventory_id,
@lastreturn:=rental.return_date
FROM
rental
JOIN
inventory ON rental.inventory_id = inventory.inventory_id
ORDER BY rental.inventory_id , rental.rental_date
You seem to just want lag(), which is available as a window function in MySQL 8.0 and later:
select t.*,
datediff(rental_date,
lag(return_date) over (partition by inventory_id order by rental_date)
) as days_on_shelf
from t
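To see PARTITION BY restricting LAG() to rows with the same inventory_id, here is a small sketch that runs the same idea in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer). SQLite has no DATEDIFF, so the sketch substitutes a julianday() difference. The function name days_on_shelf and the ISO date strings are assumptions for the example.

```python
import sqlite3

def days_on_shelf():
    """Days between each rental and the previous return of the same inventory item."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE rental (rental_id INTEGER, inventory_id INTEGER,
                             rental_date TEXT, return_date TEXT);
        INSERT INTO rental VALUES
            (1, 115, '2005-01-01', '2005-01-05'),
            (2, 209, '2005-01-01', '2005-01-04'),
            (3, 115, '2005-01-06', '2005-01-10'),
            (4, 209, '2005-01-09', '2005-01-14'),
            (5, 209, '2005-01-15', '2005-01-20'),
            (6, 115, '2005-01-16', '2005-01-20');
    """)
    rows = conn.execute("""
        SELECT rental_id, inventory_id,
               -- LAG only looks at earlier rows within the same inventory_id partition
               CAST(julianday(rental_date)
                    - julianday(LAG(return_date) OVER (PARTITION BY inventory_id
                                                       ORDER BY rental_date))
                    AS INTEGER) AS days_on_shelf
        FROM rental
        ORDER BY rental_id
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in days_on_shelf():
        print(row)
```

The first rental of each item has no previous row in its partition, so LAG() returns NULL and days_on_shelf stays NULL, matching the desired output.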

Calculating moving average for different values in a column MySQL

I have a dataset like this:
team date score
A 2011-05-01 50
A 2011-05-02 54
A 2011-05-03 51
A 2011-05-04 49
A 2011-05-05 59
B 2011-05-03 30
B 2011-05-04 35
B 2011-05-05 39
B 2011-05-06 47
B 2011-05-07 50
I want to add another column called MA3 where I can calculate the moving average of scores for the last 3 days. The point that made it tricky is to calculate the MA for each team. The end result should be like this:
team date score MA3
A 2011-05-01 50 null
A 2011-05-02 54 null
A 2011-05-03 51 null
A 2011-05-04 49 51.66
A 2011-05-05 59 51.33
B 2011-05-03 30 null
B 2011-05-04 35 null
B 2011-05-05 39 null
B 2011-05-06 47 34.66
B 2011-05-07 50 40.33
If there were only a single team, I would do:
SELECT team,
date,
AVG(score) OVER (ORDER BY date ASC ROWS 3 PRECEDING) AS MA3
FROM table
You're missing the PARTITION BY clause:
SELECT team,
date,
AVG(score) OVER (
PARTITION BY team
ORDER BY date ASC ROWS 3 PRECEDING
) AS MA3
FROM table
Note that there will always be an average calculation, regardless of the window size. If you want the average to be null if your window size is smaller than 3, you could do it like this:
SELECT team,
date,
CASE
WHEN count(*) OVER w <= 3 THEN null
ELSE AVG(score) OVER w
END AS MA3
FROM table
WINDOW w AS (PARTITION BY team ORDER BY date ASC ROWS 3 PRECEDING)
dbfiddle
Side note
Your next question might be about logical windowing, because often you don't actually want to calculate the average over 3 rows but over some interval, e.g. 3 days. Luckily, MySQL implements this. You could then write:
WINDOW w AS (PARTITION BY team ORDER BY date ASC RANGE INTERVAL 3 DAY PRECEDING)
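One detail worth checking against the desired output in the question: ROWS 3 PRECEDING is shorthand for a frame that ends at the current row, so it spans up to four rows, while the expected MA3 values (for example 51.66 for team A on 2011-05-04) are the average of the three previous rows only. The sketch below, run in SQLite through Python's sqlite3 module, uses ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING to reproduce those numbers. The table name scores and the function name moving_average are assumptions for the illustration.

```python
import sqlite3

def moving_average():
    """Per-team average of the three previous scores, NULL until three exist."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE scores (team TEXT, date TEXT, score INTEGER);
        INSERT INTO scores VALUES
            ('A', '2011-05-01', 50), ('A', '2011-05-02', 54), ('A', '2011-05-03', 51),
            ('A', '2011-05-04', 49), ('A', '2011-05-05', 59),
            ('B', '2011-05-03', 30), ('B', '2011-05-04', 35), ('B', '2011-05-05', 39),
            ('B', '2011-05-06', 47), ('B', '2011-05-07', 50);
    """)
    rows = conn.execute("""
        SELECT team, date, score,
               CASE
                 -- fewer than three earlier rows in this team: no average yet
                 WHEN COUNT(*) OVER (PARTITION BY team ORDER BY date
                                     ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) < 3
                   THEN NULL
                 ELSE AVG(score) OVER (PARTITION BY team ORDER BY date
                                       ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
               END AS MA3
        FROM scores
        ORDER BY team, date
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in moving_average():
        print(row)
```

The frame clause uses standard window syntax, so the same frame should carry over to MySQL 8.0, but that has not been verified here.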

Finding the largest price increase/decrease in a MySQL table

I am trying to find a way to get the largest price difference (in a time frame, e.g. 24 hours) in a MySQL table using a source and productId as reference.
Here is a sample product, productId 22.
id price createdAt updatedAt sourceId productId
21 799.00 2017-07-26 19:46:46 2017-07-26 19:46:45 1 22
853 920.00 2017-07-26 06:46:46 2017-07-26 06:46:46 1 22
855 799.00 2017-07-22 16:17:11 2017-07-22 16:17:11 2 22
851 770.00 2017-07-21 16:17:11 2017-07-21 16:17:11 1 22
856 799.00 2017-07-20 16:17:11 2017-07-20 16:17:11 2 22
852 599.00 2017-07-19 16:17:11 2017-07-19 16:17:11 1 22
857 810.00 2017-07-18 16:17:11 2017-07-18 16:17:11 2 22
858 799.00 2017-07-17 16:17:11 2017-07-17 16:17:11 2 22
In the example above for productId 22 I am sorting by createdAt, so in this scenario I'd take the price of id 21 and subtract the price of id 853 from it. This gives -121, meaning the product went down 121 dollars.
In the full data it's a mash-up of prices, sourceIds and productIds. The goal here is to make a result look like this:
id createdAt sourceId productId adjustment
21 2017-07-26 19:46:46 1 22 -121
22 2017-07-26 16:46:46 2 22 201
23 2017-07-26 15:46:46 6 24 -20
Above is kind of how I am trying to get the data to look, so I'll know of the price difference of each product of each source. Then I can control the data, such as ordering by adjustment and seeing which source + product had the largest decrease or increase in a time frame.
I've tried a ton of sub-queries, and I've probably put in a hundred examples that I've modified from Google. I can piece together parts of this, such as getting only the products that have received a change of any kind in the past 24 hours. I've tried to merge the last two rows of each productId, then do the math, and list all the products. It's been 2 days of trying to build this query. Is it best for me to not use queries for everything and do it on my backend instead?
I've even gone to a support site like hackhands and they couldn't figure it out. I've exhausted all of my ideas.
This query breaks down the problem:
1) Gets the records corresponding to the start time of the window for each product, in order to get the baseline price.
2) Gets the records for the max price for each product in the time frame.
3) Gets the records for the min price for each product in the time frame.
4) Combines 1, 2 and 3 to form a single record per product, showing the info and the difference between the baseline price and the highest and lowest prices in the time frame.
If you only need the bigger of the two, you can add an extra layer of select wrapping this query and use GREATEST(a, b) to keep one diff or the other.
select BOWPRICE.product_id, BOWPRICE.created_at, BOWPRICE.price,
MAXPRICE.max_price_upd_time, MAXPRICE.max_price, ABS((BOWPRICE.price - MAXPRICE.max_price)) max_price_diff,
MINPRICE.min_price_upd_time, MINPRICE.min_price, ABS((BOWPRICE.price - MINPRICE.min_price)) min_price_diff
from
(
select mainA.product_id, mainA.created_at, mainA.price from SOTEST mainA
where id in (
select min(N.id)
from SOTEST N
where N.created_at = (
select min(N1.created_at)
from SOTEST N1
where N1.created_at >= '2017-07-26 00:00:00'
and N1.product_id = N.product_id
)
group by N.product_id
)
) BOWPRICE,
(
select mainB.product_id, mainB.updated_at max_price_upd_time, mainB.price max_price from SOTEST mainB
where id in(
select id from SOTEST M
where M.price = (
select max(M1.price)
from SOTEST M1
where M1.created_at >= '2017-07-26 00:00:00'
and M1.created_at < '2017-07-27 00:00:00'
and M1.product_id = M.product_id
group by product_id LIMIT 1
)
)
) MAXPRICE,
(
select mainC.product_id, mainC.updated_at min_price_upd_time, mainC.price min_price from SOTEST mainC
where id in(
select id from SOTEST Q
where Q.price = (
select min(Q1.price)
from SOTEST Q1
where Q1.created_at >= '2017-07-26 00:00:00'
and Q1.created_at < '2017-07-27 00:00:00'
and Q1.product_id = Q.product_id
group by product_id LIMIT 1
)
)
) MINPRICE
where BOWPRICE.product_id = MAXPRICE.product_id
and BOWPRICE.product_id = MINPRICE.product_id
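For comparison, the "latest price minus previous price" adjustment described in the question can also be sketched with window functions. This is a different technique from the query above, not a rewrite of it, and it ignores the 24-hour filter for brevity. It runs in SQLite through Python's sqlite3 module; the table name prices and the function name latest_adjustments are assumptions, and the rows are the sample rows from the question.

```python
import sqlite3

def latest_adjustments():
    """Latest row per (sourceId, productId) with its change from the previous price."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE prices (id INTEGER, price REAL, createdAt TEXT,
                             sourceId INTEGER, productId INTEGER);
        INSERT INTO prices VALUES
            (21,  799.00, '2017-07-26 19:46:46', 1, 22),
            (853, 920.00, '2017-07-26 06:46:46', 1, 22),
            (855, 799.00, '2017-07-22 16:17:11', 2, 22),
            (851, 770.00, '2017-07-21 16:17:11', 1, 22),
            (856, 799.00, '2017-07-20 16:17:11', 2, 22),
            (852, 599.00, '2017-07-19 16:17:11', 1, 22),
            (857, 810.00, '2017-07-18 16:17:11', 2, 22),
            (858, 799.00, '2017-07-17 16:17:11', 2, 22);
    """)
    rows = conn.execute("""
        SELECT id, createdAt, sourceId, productId, adjustment
        FROM (
            SELECT id, createdAt, sourceId, productId,
                   -- change relative to the previous price of the same source + product
                   price - LAG(price) OVER (PARTITION BY sourceId, productId
                                            ORDER BY createdAt) AS adjustment,
                   -- rn = 1 marks the most recent row of each source + product
                   ROW_NUMBER() OVER (PARTITION BY sourceId, productId
                                      ORDER BY createdAt DESC) AS rn
            FROM prices
        ) ranked
        WHERE rn = 1
        ORDER BY adjustment
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in latest_adjustments():
        print(row)
```

Ordering by adjustment puts the largest decrease first; reversing the order would surface the largest increase. LAG() and ROW_NUMBER() also exist in MySQL 8.0, so the same shape should be usable there, though that is untested in this sketch.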

mysql group by day and count then filter only the highest value for each day

I'm stuck on this query. I need to do a group by date, card_id and only show the highest hits. I have this data:
date card_name card_id hits
29/02/2016 Paul Stanley 1345 12
29/02/2016 Phil Anselmo 1347 16
25/02/2016 Dave Mustaine 1349 10
25/02/2016 Ozzy 1351 17
23/02/2016 Jhonny Cash 1353 13
23/02/2016 Elvis 1355 15
20/02/2016 James Hethfield 1357 9
20/02/2016 Max Cavalera 1359 12
My query at the moment
SELECT DATE(card.create_date) `day`, `name`,card_model_id, count(1) hits
FROM card
Join card_model ON card.card_model_id = card_model.id
WHERE DATE(card.create_date) >= DATE(DATE_SUB(NOW(), INTERVAL 1 MONTH)) AND card_model.preview = 0
GROUP BY `day`, card_model_id
;
I want to group by date and card_id, then keep only the row with the highest hits, showing one row per date. It's as if I should run max(hits) with a group by, but that doesn't work.
Like:
date card_name card_id hits
29/02/2016 Phil Anselmo 1347 16
25/02/2016 Ozzy 1351 17
23/02/2016 Elvis 1355 15
20/02/2016 Max Cavalera 1359 12
Any light on that will be appreciated. Thanks for reading.
Here is one way to do this. Based on your sample data (not the query):
select s.*
from sample s
where s.hits = (select max(s2.hits)
from sample s2
where date(s2.date) = date(s.date)
);
Your attempted query seems to have no relationship to the sample data, so it is unclear how to incorporate those tables (the attempted query has different columns and two tables).
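The correlated-subquery filter above can be tried against the sample rows with a short sketch, run here in SQLite through Python's sqlite3 module. The table name sample follows the answer; the function name top_card_per_day and the ISO date strings are assumptions for the example.

```python
import sqlite3

def top_card_per_day():
    """Keep, for each date, the row(s) whose hits equal that date's maximum."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sample (date TEXT, card_name TEXT, card_id INTEGER, hits INTEGER);
        INSERT INTO sample VALUES
            ('2016-02-29', 'Paul Stanley',    1345, 12),
            ('2016-02-29', 'Phil Anselmo',    1347, 16),
            ('2016-02-25', 'Dave Mustaine',   1349, 10),
            ('2016-02-25', 'Ozzy',            1351, 17),
            ('2016-02-23', 'Jhonny Cash',     1353, 13),
            ('2016-02-23', 'Elvis',           1355, 15),
            ('2016-02-20', 'James Hethfield', 1357, 9),
            ('2016-02-20', 'Max Cavalera',    1359, 12);
    """)
    rows = conn.execute("""
        SELECT s.date, s.card_name, s.card_id, s.hits
        FROM sample s
        -- compare each row with the best hits value recorded on the same date
        WHERE s.hits = (SELECT MAX(s2.hits)
                        FROM sample s2
                        WHERE s2.date = s.date)
        ORDER BY s.date DESC
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in top_card_per_day():
        print(row)
```

Note that if two cards tie for the highest hits on a date, this filter returns both rows for that date.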

MySQL Days Between One Order and Next Order Having > 1 Order

My goal is to find how many customers order more than once between the criteria below. Or to put into other terms, how long it takes for customers to place their next order with these criteria:
0 - 12 mos
13 - 24 mos
25 - 36 mos
37+ mos
Table is set up by line item on each order. like this:
Customer | Item | OrderNumber | OrderDate
1500 item1 5555 2015-02-01
1500 item2 5555 2015-02-01
1500 item34 5255 2014-05-25
1500 item44 4100 2012-12-30
2200 item55 5100 2014-02-15
2200 item1 5100 2014-02-15
3255 item12 5300 2015-03-05
3255 item34 5399 2014-05-01
3255 item22 5399 2014-05-01
So if it takes less than 12 mos for a customer to order more than once then it should be counted towards the "0-12 mos". If a customer took 18 mos to place their next order they would be counted towards the "13-24 mos" and so on and so forth.
I don't really know where to begin on this one. I probably will have to at least have: HAVING COUNT(DISTINCT OrderNumber) > 1. I have never used LAG, is this something that I should utilize a MySQL variant of to find the next OrderDate in the sequence?
Any help would be appreciated to at least start to identify the components of the query needed.
If you just wanted the average time between orders, you can do something like this:
select floor(days_between / 365) as numyears, count(*)
from (select customer, datediff(max(orderdate), min(orderdate)) as days_between,
count(distinct ordernumber) as numorders
from orders
group by customer
having count(distinct ordernumber) >= 2
) c
group by numyears;
If you really want to understand the timing between orders, learn about survival analysis, particularly recurrent event analysis. Your question, although well-enough formed, is rather naive because it does not consider customers with only one order nor the time since a customer's last order.
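To make the bucketing in the question concrete, here is a rough sketch in plain Python, not SQL. It collapses line items to one date per order, measures the gap between each pair of consecutive orders for a customer, and counts gaps per bucket. The day thresholds (365, 730, 1095) are an assumed approximation of 12, 24 and 36 months, and the names bucket_for and gaps_by_bucket are hypothetical. It counts gaps rather than customers, so a customer with three orders contributes two gaps.

```python
from collections import Counter
from datetime import date

# Assumed day limits approximating 12, 24 and 36 months.
BUCKETS = [(365, "0-12 mos"), (730, "13-24 mos"), (1095, "25-36 mos")]

def bucket_for(days):
    """Map a gap in days to one of the four labels from the question."""
    for limit, label in BUCKETS:
        if days <= limit:
            return label
    return "37+ mos"

def gaps_by_bucket(line_items):
    """line_items: (customer, item, order_number, order_date) tuples."""
    # Collapse line items so each order counts once.
    orders = {(cust, order_no): order_date
              for cust, _item, order_no, order_date in line_items}
    by_customer = {}
    for (cust, _order_no), order_date in orders.items():
        by_customer.setdefault(cust, []).append(order_date)
    counts = Counter()
    for dates in by_customer.values():
        dates.sort()
        # Pair each order with the next one; single-order customers add nothing.
        for prev, nxt in zip(dates, dates[1:]):
            counts[bucket_for((nxt - prev).days)] += 1
    return counts

SAMPLE = [
    (1500, "item1", 5555, date(2015, 2, 1)),
    (1500, "item2", 5555, date(2015, 2, 1)),
    (1500, "item34", 5255, date(2014, 5, 25)),
    (1500, "item44", 4100, date(2012, 12, 30)),
    (2200, "item55", 5100, date(2014, 2, 15)),
    (2200, "item1", 5100, date(2014, 2, 15)),
    (3255, "item12", 5300, date(2015, 3, 5)),
    (3255, "item34", 5399, date(2014, 5, 1)),
    (3255, "item22", 5399, date(2014, 5, 1)),
]

if __name__ == "__main__":
    print(gaps_by_bucket(SAMPLE))
```

Customer 2200 has two line items but only one order, so it adds no gap, which is the case that counting line items instead of distinct orders would get wrong.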