MySQL group by where day <= x

I need some help figuring out the correct SQL statement.
I've got a table with the following structure:
id, product_id, units, timestamp
I want a list containing the overall units per day. A product has at most one record per day.
So my first try was:
SELECT
DATE(timestamp) as day, SUM(units) as overall_units
FROM
tbl
GROUP BY
DATE(timestamp);
Normally this would do it. But there are sometimes days with no record for a product. The units are nevertheless still in the warehouse, so they should be included in the calculation.
For example:
We have 3 products. Cars, pens and wheels.
Records from 2012-10-20:
Cars => 5
Pens => 20
Wheels => 4
Records from 2012-10-21
Cars => 5
Wheels => 6
My query would give the following results:
2012-10-20 => 29
2012-10-21 => 11
But if there is no record for a product on a given day, I want the query to use that product's most recent earlier record.
So it should be:
2012-10-21 => 31
I hope you understand my needs.

SELECT
MAX(DATE(timestamp)) as day, SUM(units) as overall_units
FROM tbl
Update:
SELECT max(day), sum(ou) from
( select DATE(timestamp) as day, SUM(units) as ou
FROM tbl
GROUP BY DATE(timestamp)
) AS per_day
The inner query will return
2012-10-20 , 29
2012-10-21 , 11
and the final query will return
2012-10-21 , 40

SELECT
dd.ddate AS day
, SUM(t.units) as overall_units
FROM
( SELECT DISTINCT
product_id
FROM
tbl
) AS dp
CROSS JOIN
( SELECT DISTINCT
DATE(timestamp) AS ddate
FROM
tbl
) AS dd
JOIN
tbl AS t
ON t.id =
( SELECT
tt.id
FROM
tbl AS tt
WHERE
tt.product_id = dp.product_id
AND
tt.timestamp < dd.ddate + INTERVAL 1 DAY
ORDER BY tt.timestamp DESC
LIMIT 1
)
GROUP BY
dd.ddate ;
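A minimal sketch of the carry-forward idea in the query above, run against the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the date arithmetic is written as date(ddate, '+1 day') instead of ddate + INTERVAL 1 DAY. The product ids (1 = cars, 2 = pens, 3 = wheels) and the times of day are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, product_id INTEGER, "
    "units INTEGER, timestamp TEXT)"
)
# Assumed ids: 1 = cars, 2 = pens, 3 = wheels. Pens have no record on 2012-10-21.
conn.executemany(
    "INSERT INTO tbl (product_id, units, timestamp) VALUES (?, ?, ?)",
    [
        (1, 5, "2012-10-20 09:00:00"),
        (2, 20, "2012-10-20 09:05:00"),
        (3, 4, "2012-10-20 09:10:00"),
        (1, 5, "2012-10-21 09:00:00"),
        (3, 6, "2012-10-21 09:10:00"),
    ],
)

# For every (product, day) pair, pick the product's latest record
# on or before that day, then sum the picked records per day.
query = """
SELECT dd.ddate AS day, SUM(t.units) AS overall_units
FROM (SELECT DISTINCT product_id FROM tbl) AS dp
CROSS JOIN (SELECT DISTINCT date(timestamp) AS ddate FROM tbl) AS dd
JOIN tbl AS t
  ON t.id = (SELECT tt.id
             FROM tbl AS tt
             WHERE tt.product_id = dp.product_id
               AND tt.timestamp < date(dd.ddate, '+1 day')
             ORDER BY tt.timestamp DESC
             LIMIT 1)
GROUP BY dd.ddate
ORDER BY dd.ddate
"""
totals = dict(conn.execute(query).fetchall())
print(totals)
```

With this data the pens record from 2012-10-20 is reused on 2012-10-21, which gives the 31 units asked for in the question.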

I think you should look into the DISTINCT keyword.
The query could be something like: SELECT DISTINCT product_id, tbl.* FROM tbl ORDER BY timestamp DESC; Then use PHP to loop through the results and accumulate the units.

Related

Getting total average between dates

I have a table named sales with the following format.
sale_id user_id sale_date sale_cost
j847bv-6ggd bd48ta36-cn5x 2017-01-10 15:43:12 30
vf87x2-15gr bd48ta36-cn5x 2017-01-05 13:41:16 60
3gfd7f-2cdd 8g4f5ccf-1fet 2017-01-15 14:10:12 100
4bgfd5-12vn 8g4f5ccf-1fet 2017-01-20 19:47:14 20
b58e32-bf87 8g4f5ccf-1fet 2017-01-20 17:35:13 15
bg87db-127g gr4gg1f4-3gbb 2017-01-20 12:26:15 80
How could I get the average amount that a user (user_id) spends within the first X days after their first purchase? I don't want an average for every user, but one overall average across all users.
I know that I can get the average using select avg(sale_cost), but I'm unsure how to restrict it to a date period.
You can find the average of each user's total within a 10-day range from their initial sale date like this:
select avg(sale_cost)
from (
select sum(t.sale_cost) sale_cost
from your_table t
join (
select user_id, min(sale_date) start_date, date_add(min(sale_date), interval 10 day) end_date
from your_table
group by user_id
) t2 on t.user_id = t2.user_id
and t.sale_date between t2.start_date and t2.end_date
group by t.user_id
) t;
It finds the first sale_date for each user and the date 10 days after it. It then joins that back to the table to get each user's total within that range, and finally averages those totals.
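As a rough check of the per-user query, here is a sketch that runs the same logic over the question's six sample rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so date_add(..., interval 10 day) becomes datetime(..., '+10 days'). The table name sales comes from the question; the DAYS constant is an assumed name for the question's "X".

```python
import sqlite3

DAYS = 10  # assumed name for the "X days" window from the question

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id TEXT, user_id TEXT, sale_date TEXT, sale_cost INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("j847bv-6ggd", "bd48ta36-cn5x", "2017-01-10 15:43:12", 30),
        ("vf87x2-15gr", "bd48ta36-cn5x", "2017-01-05 13:41:16", 60),
        ("3gfd7f-2cdd", "8g4f5ccf-1fet", "2017-01-15 14:10:12", 100),
        ("4bgfd5-12vn", "8g4f5ccf-1fet", "2017-01-20 19:47:14", 20),
        ("b58e32-bf87", "8g4f5ccf-1fet", "2017-01-20 17:35:13", 15),
        ("bg87db-127g", "gr4gg1f4-3gbb", "2017-01-20 12:26:15", 80),
    ],
)

# Inner join: each user's first sale and the end of their window.
# Middle level: total spend per user inside that window.
# Outer level: average of those per-user totals.
query = """
SELECT AVG(user_total)
FROM (
    SELECT SUM(s.sale_cost) AS user_total
    FROM sales AS s
    JOIN (
        SELECT user_id,
               MIN(sale_date) AS start_date,
               datetime(MIN(sale_date), ?) AS end_date
        FROM sales
        GROUP BY user_id
    ) AS f
      ON f.user_id = s.user_id
     AND s.sale_date BETWEEN f.start_date AND f.end_date
    GROUP BY s.user_id
) AS per_user
"""
(average,) = conn.execute(query, (f"+{DAYS} days",)).fetchone()
print(average)
```

For the sample rows the per-user totals are 90, 135 and 80, so the overall average is 305 / 3.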
If you want to find the average between overall first sale_date (not individual) and 10 days from it, use:
select avg(sale_cost)
from (
select sum(t.sale_cost) sale_cost
from your_table t
join (
select min(sale_date) start_date, date_add(min(sale_date), interval 10 day) end_date
from your_table
) t2 on t.sale_date between t2.start_date and t2.end_date
group by t.user_id
) t;
The BETWEEN operator comes in handy for checking ranges:
SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2;
In this case value1 and value2 are replaced by your dates, using either:
'2011-01-01 00:00:00' AND '2011-01-31 23:59:59'
or
sale_date AND DATE_ADD(sale_date, INTERVAL 10 DAY)
The first form compares against constants, so it is generally faster. Note that BETWEEN bounds are inclusive on both ends.

MySQL loop and multiple LEFT joins

I got the following code:
SELECT
COALESCE(rv.views, 0) as views
FROM
( select 0 as n
union all select 1
union all select 2
union all select 3 ) n
LEFT JOIN restaurant_views rv
on rv.date = date_add("2015-02-24", interval - n.n day)
and restaurant_id = 192
This code is giving me the amount of views a restaurant had the last 4 days.
I am looking for a similar query to get the amount of likes a restaurant had the last 4 days.
This is what I got so far:
SELECT
( COUNT( DISTINCT a.restaurant_id)
+ COUNT( DISTINCT d.restaurant_id)) as num_likes
FROM
( select 0 as n
union all select 1
union all select 2
union all select 3 ) n
LEFT JOIN apple_likes a
on a.vote_date = date_add("2015-02-24", interval - n.n day)
and a.restaurant_id = 192
LEFT JOIN android_likes d
on d.vote_date = date_add("2015-02-24", interval - n.n day)
and d.restaurant_id = 192
And here is the output, which is as you can see not what I'm looking for:
What do I have to change to get the number of likes in the last query?
(I have checked that the restaurant has likes on all days, so I am positive it's something wrong with the query)
Try this one:
SELECT
( COALESCE(a.likes, 0)
+ COALESCE(d.likes, 0)) as num_likes
FROM
( select 0 as n
union all select 1
union all select 2
union all select 3 ) n
LEFT JOIN (
SELECT vote_date,COUNT(*) as likes
FROM apple_likes
WHERE restaurant_id = 192
GROUP BY restaurant_id, vote_date
) as a
on a.vote_date = date_add("2015-02-24", interval - n.n day)
LEFT JOIN (
SELECT vote_date, COUNT(*) as likes
FROM android_likes
WHERE restaurant_id = 192
GROUP BY restaurant_id, vote_date
) as d
on d.vote_date = date_add("2015-02-24", interval - n.n day)
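The approach of counting per day in a subquery first and only then joining to the day series can be sketched as follows. This uses Python's built-in sqlite3 module as a stand-in for MySQL, so date_add("2015-02-24", interval - n.n day) is written as date('2015-02-24', '-' || n.n || ' days'). The like rows are invented for illustration, and COALESCE is added so that a day with likes on only one platform does not turn into NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE apple_likes (restaurant_id INTEGER, vote_date TEXT);
    CREATE TABLE android_likes (restaurant_id INTEGER, vote_date TEXT);
    """
)
# Invented sample data: restaurant 192 plus one unrelated restaurant (7).
conn.executemany(
    "INSERT INTO apple_likes VALUES (?, ?)",
    [(192, "2015-02-24"), (192, "2015-02-24"), (192, "2015-02-23"), (7, "2015-02-24")],
)
conn.executemany(
    "INSERT INTO android_likes VALUES (?, ?)",
    [(192, "2015-02-24"), (192, "2015-02-22"), (192, "2015-02-22"), (192, "2015-02-22")],
)

# Each platform is aggregated to one row per day BEFORE the join,
# so joining both platforms cannot multiply rows against each other.
query = """
SELECT date('2015-02-24', '-' || n.n || ' days') AS day,
       COALESCE(a.likes, 0) + COALESCE(d.likes, 0) AS num_likes
FROM (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n
LEFT JOIN (SELECT vote_date, COUNT(*) AS likes
           FROM apple_likes
           WHERE restaurant_id = 192
           GROUP BY vote_date) AS a
       ON a.vote_date = date('2015-02-24', '-' || n.n || ' days')
LEFT JOIN (SELECT vote_date, COUNT(*) AS likes
           FROM android_likes
           WHERE restaurant_id = 192
           GROUP BY vote_date) AS d
       ON d.vote_date = date('2015-02-24', '-' || n.n || ' days')
ORDER BY n.n
"""
likes_per_day = conn.execute(query).fetchall()
print(likes_per_day)
```

The result has exactly one row per day in the series, including a zero for 2015-02-21, where neither table has a like.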
I can think of a couple of items that might be what you are encountering:
Views versus votes. Just because somebody views a restaurant, does that mean they actually voted? And if they voted, are Apple and Android the only two devices? What if they are viewing from a browser on a Windows machine?
Date equality. In the restaurant views table, is the date field always at time 00:00:00 (i.e. midnight, the start of the day)? If the timestamps of the votes carry any other time, a comparison of date = date + time will probably fail. What you may need is to compare date( vote_date ) = date( date_add( ... )) so that both sides ignore the time component. That said, a function applied to a date column cannot use an index on that column, even if the restaurant ID is numeric and part of the index key, so the query would be only partially optimized. You may want to add a plain range condition such as AND vote_date >= '2015-02-20' so the index can narrow by restaurant and date, and then apply DATE( vote_date ) for the actual qualifying of records.

Calculating a Moving Average MySQL?

Good Day,
I am using the following code to calculate the 9 Day Moving average.
SELECT SUM(close)
FROM tbl
WHERE date <= '2002-07-05'
AND name_id = 2
ORDER BY date DESC
LIMIT 9
But it does not work because it first calculates all of the returned fields before the limit is called. In other words it will calculate all the closes before or equal to that date, and not just the last 9.
So I need to calculate the SUM from the returned select, rather than calculate it straight.
IE. Select the SUM from the SELECT...
Now how would I go about doing this and is it very costly or is there a better way?
If you want the moving average for each date, then try this:
SELECT date, SUM(close),
(select avg(close) from tbl t2 where t2.name_id = t.name_id and t2.date <= t.date and datediff(t.date, t2.date) < 9
) as mvgAvg
FROM tbl t
WHERE date <= '2002-07-05' and
name_id = 2
GROUP BY date
ORDER BY date DESC
It uses a correlated subquery to average the closes from the 9 calendar days ending at each row's date. That equals 9 values only when there is exactly one row per day.
Starting from MySQL 8, you should use window functions for this. Using the window RANGE clause, you can create a logical window over an interval, which is very powerful. Something like this:
SELECT
date,
close,
AVG (close) OVER (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
FROM tbl
WHERE date <= DATE '2002-07-05'
AND name_id = 2
ORDER BY date DESC
For example:
WITH t (date, `close`) AS (
SELECT DATE '2020-01-01', 50 UNION ALL
SELECT DATE '2020-01-03', 54 UNION ALL
SELECT DATE '2020-01-05', 51 UNION ALL
SELECT DATE '2020-01-12', 49 UNION ALL
SELECT DATE '2020-01-13', 59 UNION ALL
SELECT DATE '2020-01-15', 30 UNION ALL
SELECT DATE '2020-01-17', 35 UNION ALL
SELECT DATE '2020-01-18', 39 UNION ALL
SELECT DATE '2020-01-19', 47 UNION ALL
SELECT DATE '2020-01-26', 50
)
SELECT
date,
`close`,
COUNT(*) OVER w AS c,
SUM(`close`) OVER w AS s,
AVG(`close`) OVER w AS a
FROM t
WINDOW w AS (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
ORDER BY date DESC
Leading to:
date |close|c|s |a |
----------|-----|-|---|-------|
2020-01-26| 50|1| 50|50.0000|
2020-01-19| 47|2| 97|48.5000|
2020-01-18| 39|3|136|45.3333|
2020-01-17| 35|4|171|42.7500|
2020-01-15| 30|4|151|37.7500|
2020-01-13| 59|5|210|42.0000|
2020-01-12| 49|6|259|43.1667|
2020-01-05| 51|3|159|53.0000|
2020-01-03| 54|3|154|51.3333|
2020-01-01| 50|3|155|51.6667|
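To make the frame in that result easier to follow, here is a plain-Python sketch that recomputes the c, s and a columns from the same ten rows. It is not how MySQL evaluates the window, only an illustration of which rows fall into each frame. Because the window is ordered by date DESC, "9 DAY PRECEDING" reaches towards later dates, so each frame covers the row's own date up to 9 days after it. The function name range_window is hypothetical.

```python
from datetime import date, timedelta

rows = [
    (date(2020, 1, 1), 50),
    (date(2020, 1, 3), 54),
    (date(2020, 1, 5), 51),
    (date(2020, 1, 12), 49),
    (date(2020, 1, 13), 59),
    (date(2020, 1, 15), 30),
    (date(2020, 1, 17), 35),
    (date(2020, 1, 18), 39),
    (date(2020, 1, 19), 47),
    (date(2020, 1, 26), 50),
]


def range_window(rows, days):
    """Return (date, close, count, sum, avg) per row, newest first.

    The frame of a row holds every row dated from the row's own date
    up to `days` days later, mirroring
    ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING.
    """
    result = []
    for current, close in sorted(rows, reverse=True):
        upper = current + timedelta(days=days)
        frame = [c for d, c in rows if current <= d <= upper]
        result.append((current, close, len(frame), sum(frame), sum(frame) / len(frame)))
    return result


window_rows = range_window(rows, 9)
for row in window_rows:
    print(row)
```

Note that the number of rows per frame varies between 1 and 6 here, because the frame is defined by a date interval and not by a row count.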
Use something like
SELECT
sum(close) as sum,
avg(close) as average
FROM (
SELECT
(close)
FROM
tbl
WHERE
date <= '2002-07-05'
AND name_id = 2
ORDER BY
date DESC
LIMIT 9 ) temp
The inner query returns only the 9 most recent rows that match the filter, and the outer query then computes the sum and average over just those rows.
Your original query does not work because SUM collapses all matching rows into a single result row first, and LIMIT is applied only afterwards to that one row, so you get the sum of every matching row.
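The difference between the two forms can be seen in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the 20 sample rows (one per day in June 2002, with close equal to the day of the month) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (name_id INTEGER, date TEXT, close REAL)")
# Invented data: 20 daily rows, close = day of month (1.0 .. 20.0).
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(2, f"2002-06-{day:02d}", float(day)) for day in range(1, 21)],
)

# LIMIT applied to the aggregate: SUM sees all 20 rows,
# and LIMIT 9 only trims the single result row.
naive_sum = conn.execute(
    """
    SELECT SUM(close)
    FROM tbl
    WHERE date <= '2002-07-05' AND name_id = 2
    ORDER BY date DESC
    LIMIT 9
    """
).fetchone()[0]

# LIMIT applied inside a derived table: only the 9 newest rows
# reach SUM and AVG.
last9_sum, last9_avg = conn.execute(
    """
    SELECT SUM(close), AVG(close)
    FROM (
        SELECT close
        FROM tbl
        WHERE date <= '2002-07-05' AND name_id = 2
        ORDER BY date DESC
        LIMIT 9
    ) AS temp
    """
).fetchone()

print(naive_sum, last9_sum, last9_avg)
```

Here the first query sums all 20 closes, while the second sums only the closes for 2002-06-12 through 2002-06-20.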
Another technique is to create a helper table of numbers:
CREATE TABLE `tinyint_asc` (
`value` tinyint(3) unsigned NOT NULL default '0',
PRIMARY KEY (value)
) ;
INSERT INTO `tinyint_asc` VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250),(251),(252),(253),(254),(255);
After that you can use it like this:
select
date_add(tbl.date, interval tinyint_asc.value day) as mydate,
count(*),
sum(myvalue) -- myvalue stands for the column being averaged
from tbl
inner join tinyint_asc on tinyint_asc.value < 30 -- for a 30 day moving average
where date( date_add(tbl.date, interval tinyint_asc.value day ) ) between '2016-01-01' and current_date()
group by mydate
This query is fast:
select date, name_id,
case @i when name_id then @i:=name_id else (@i:=name_id)
and (@n:=0)
and (@a0:=0) and (@a1:=0) and (@a2:=0) and (@a3:=0) and (@a4:=0) and (@a5:=0) and (@a6:=0) and (@a7:=0) and (@a8:=0)
end as a,
case @n when 9 then @n:=9 else @n:=@n+1 end as n,
@a0:=@a1,@a1:=@a2,@a2:=@a3,@a3:=@a4,@a4:=@a5,@a5:=@a6,@a6:=@a7,@a7:=@a8,@a8:=close,
(@a0+@a1+@a2+@a3+@a4+@a5+@a6+@a7+@a8)/@n as av
from tbl,
(select @i:=0, @n:=0,
@a0:=0, @a1:=0, @a2:=0, @a3:=0, @a4:=0, @a5:=0, @a6:=0, @a7:=0, @a8:=0) a
where name_id=2
order by name_id, date
If you need an average over 50 or 100 values, it's tedious to write, but
worth the effort. The speed is close to the ordered select.

specific status on consecutive days

I have a MySQL table ATT with the columns EMP_ID, ATT_DATE and ATT_STATUS, where ATT_STATUS takes the values 1 (Present), 2 (Absent) and 3 (Weekly-off). I want to find the EMP_IDs that have status 2 on 10 consecutive days within a given date range.
Please help
Please have a try with this:
SELECT EMP_ID FROM (
SELECT
IF((@prevDate!=(q.ATT_DATE - INTERVAL 1 DAY)) OR (@prevEmp!=q.EMP_ID) OR (q.ATT_STATUS != 2), @rownum:=@rownum+1, @rownum:=@rownum) AS rownumber, @prevDate:=q.ATT_DATE, @prevEmp:=q.EMP_ID, q.*
FROM (
SELECT
EMP_ID
, ATT_DATE
, ATT_STATUS
FROM
org_tb_dailyattendance, (SELECT @rownum:=0, @prevDate:='', @prevEmp:=0) vars
WHERE ATT_DATE BETWEEN '2013-01-01' AND '2013-02-15'
ORDER BY EMP_ID, ATT_DATE, ATT_STATUS
) q
) sq
GROUP BY EMP_ID, rownumber
HAVING COUNT(*) >= 10
The logic is to first sort the table by employee ID and date, and then introduce a row number that increases only if
the days are not consecutive or
the employee id is not the previous one or
the status is not 2
Then I grouped by this row number and kept the groups with at least 10 rows. Those should be the employees who were absent for 10 days or more.
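The three conditions above amount to counting runs of consecutive absent days per employee. A minimal sketch of that run counting in plain Python, not a translation of the SQL: a counter is reset whenever the status is not 2, the employee changes, or a day is skipped. The function name employees_with_absence_run and the sample records are hypothetical.

```python
from datetime import date, timedelta

ABSENT = 2  # ATT_STATUS value for "Absent", as described in the question


def employees_with_absence_run(records, min_days=10):
    """Return the EMP_IDs with at least `min_days` consecutive absent days.

    `records` is an iterable of (emp_id, att_date, att_status) tuples.
    """
    found = set()
    prev_emp = prev_date = None
    run = 0
    for emp, day, status in sorted(records):
        if status != ABSENT:
            run = 0  # any other status breaks the run
        elif run > 0 and emp == prev_emp and day == prev_date + timedelta(days=1):
            run += 1  # same employee, next calendar day, still absent
        else:
            run = 1  # first absent day of a new run
        if run >= min_days:
            found.add(emp)
        prev_emp, prev_date = emp, day
    return found


start = date(2013, 1, 1)
records = []
# Employee 1: absent on 10 consecutive days.
records += [(1, start + timedelta(days=i), 2) for i in range(10)]
# Employee 2: 9 absent days, 1 present day, 9 absent days.
records += [(2, start + timedelta(days=i), 2) for i in range(9)]
records += [(2, start + timedelta(days=9), 1)]
records += [(2, start + timedelta(days=10 + i), 2) for i in range(9)]
# Employee 3: 5 absent days, one day with no record, 5 absent days.
records += [(3, start + timedelta(days=i), 2) for i in range(5)]
records += [(3, start + timedelta(days=6 + i), 2) for i in range(5)]

absentees = employees_with_absence_run(records)
print(absentees)
```

Only employee 1 qualifies: employee 2's run is broken by a present day, and employee 3's by a missing date.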
Have you tried something like this?
SELECT EMP_ID, count(*) as consecutive_count, min(ATT_DATE) as first_absence
FROM (SELECT * FROM ATT ORDER BY EMP_ID) AS a
WHERE ATT_STATUS = 2
GROUP BY EMP_ID
HAVING consecutive_count >= 10
-- counts all absences per employee; it does not check that the days are consecutive

least value in count

I have a table employee(id, dept_id, salary, hire_date, job_id) and have to write the following query:
Show all the employees who were hired on the day of the week on which the fewest employees were hired.
I have written a query, but I am not able to get the least. Please check whether I am on the right track.
select id, WEEKDAY(hire_date)+1 as days,count(WEEKDAY(hire_date)+1) as count
from test.employee group by days
This should get you the weekday on which the least number of employees were hired:
SELECT
count(id) as `Total`,
WEEKDAY(hire_date) as `DoW`
FROM
test.employee
GROUP BY `DoW`
ORDER BY `Total` ASC LIMIT 1;
select id from test.employee
where WEEKDAY(hire_date) =
( select WEEKDAY(hire_date)
from test.employee
group by WEEKDAY(hire_date)
order by count(id) asc
limit 1)
This should work.
You may try this, as it will not limit to one record if you have multiple week days where the same least number of employees were hired. In reality it makes sense. The following is based on sample data.
Query:
-- find minimum id count
SELECT MIN(e.counts) INTO @min
FROM (SELECT COUNT(*) as counts,
WEEKDAY(hire_date)+1 as day
FROM employee
GROUP BY WEEKDAY(hire_date)+1) e
;
-- show weekdays with minimum id counts
SELECT e2.counts as mincount,
WEEKDAY(e1.hire_date)+1 as weekday
FROM employee e1
JOIN (SELECT COUNT(id) as counts,
WEEKDAY(hire_date)+1 as day
FROM employee
GROUP BY day
HAVING COUNT(*) = @min) e2
ON WEEKDAY(e1.hire_date)+1 = e2.day;
Results:
MINCOUNT WEEKDAY
1 6
1 3
1 4
1 2
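The tie handling described above can be sketched in a single statement. This uses Python's built-in sqlite3 module as a stand-in for MySQL, so the weekday comes from strftime('%w', hire_date) (0 = Sunday) instead of WEEKDAY(). The seven employees and their hire dates are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, hire_date TEXT)")
# Invented data: three hires share one weekday, two share another,
# and two weekdays have a single hire each (a tie for the minimum).
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [
        (1, "2024-01-01"),
        (2, "2024-01-08"),
        (3, "2024-01-15"),
        (4, "2024-01-02"),
        (5, "2024-01-09"),
        (6, "2024-01-03"),
        (7, "2024-01-05"),
    ],
)

# Keep every weekday whose hire count equals the smallest count,
# then list the employees hired on any of those weekdays.
query = """
SELECT id
FROM employee
WHERE strftime('%w', hire_date) IN (
    SELECT strftime('%w', hire_date)
    FROM employee
    GROUP BY strftime('%w', hire_date)
    HAVING COUNT(*) = (
        SELECT MIN(cnt)
        FROM (
            SELECT COUNT(*) AS cnt
            FROM employee
            GROUP BY strftime('%w', hire_date)
        ) AS per_day
    )
)
ORDER BY id
"""
least_hired_ids = [row[0] for row in conn.execute(query)]
print(least_hired_ids)
```

Weekdays on which nobody was hired do not appear in the grouped counts at all, so the minimum is taken only over weekdays with at least one hire.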
select min(id), WEEKDAY(hire_date)+1 as days,count(WEEKDAY(hire_date)+1) as count
from test.employee group by days
SELECT
*
FROM
employee
WHERE
DAYOFWEEK(hire_date)
IN
(
SELECT
weekday
FROM
(
SELECT
count(*) as bcount,
DAYOFWEEK(hire_date) as weekday
FROM
employee as a
GROUP BY
weekday
HAVING
bcount = (
SELECT
MIN(tcount)
FROM
(
SELECT
count(*) as tcount,
DAYOFWEEK(hire_date) as weekday
FROM
employee
GROUP BY
weekday
) as t
)
) as q
)