MySQL Join tables on weekday+hour of timestamp, count per table - mysql

I have a couple of tables with exactly the same structure. They contain data from forms filled in at the entrance of a yearly event. One of the fields in these tables is a DATETIME with the date and time of the entry.
For statistical purposes, I am trying to compare the number of entries per hour for each year. Of course, I could run separate queries for each table and put them together in PHP. But I think it should also be possible in a single query. However, I cannot figure out how to build a correct query.
This is what I've got so far:
SELECT
WEEKDAY(P3.AddedOn) Day,
HOUR(P3.AddedOn) Hour,
COUNT(P3.AddedOn) Entries2013,
COUNT(P2.AddedOn) Entries2012
FROM Event2013 P3
LEFT JOIN Event2012 P2
ON WEEKDAY(P3.AddedOn) = WEEKDAY(P2.AddedOn)
AND HOUR(P3.AddedOn) = HOUR(P2.AddedOn)
GROUP BY WEEKDAY(P3.AddedOn), WEEKDAY(P2.AddedOn), HOUR(P3.AddedOn), HOUR(P2.AddedOn)
But this query yields strange results: the numbers are too large, and the Entries2012 and Entries2013 columns contain the same values. It seems to add up some data, but I cannot figure out exactly which.
What am I doing wrong?
Thank you for your help!
--
Solved with subqueries:
SELECT
P2.Day,
P2.Hour,
P2.Entries Entries2012,
P3.Entries Entries2013
FROM
(
SELECT
WEEKDAY(AddedOn) Day,
HOUR(AddedOn) Hour,
COUNT(1) AS Entries
FROM Event2012
GROUP BY Day, Hour
) P2
LEFT JOIN
(
SELECT
WEEKDAY(AddedOn) Day,
HOUR(AddedOn) Hour,
COUNT(1) AS Entries
FROM Event2013
GROUP BY Day, Hour
) P3
ON P2.Day = P3.Day
AND P2.Hour = P3.Hour
--
Now also with some kind of MySQL FULL OUTER JOIN:
SELECT
Day,
Hour,
Entries2012,
Entries2013
FROM (
SELECT
(CASE P2.Day
WHEN 4 THEN "Fr"
WHEN 5 THEN "Sat"
WHEN 6 THEN "Sun"
ELSE P2.Day
END) AS Day,
P2.Hour,
COALESCE(P2.Entries, 0) Entries2012,
COALESCE(P3.Entries, 0) Entries2013
FROM (
SELECT
WEEKDAY(AddedOn) Day,
HOUR(AddedOn) Hour,
COUNT(1) AS Entries
FROM Event2012
GROUP BY Day, Hour) P2
LEFT JOIN (
SELECT
WEEKDAY(AddedOn) Day,
HOUR(AddedOn) Hour,
COUNT(1) AS Entries
FROM Event2013
GROUP BY Day, Hour) P3
ON P2.Day = P3.Day
AND P2.Hour = P3.Hour
UNION SELECT
(CASE P3.Day
WHEN 4 THEN "Fr"
WHEN 5 THEN "Sat"
WHEN 6 THEN "Sun"
ELSE P3.Day
END) AS Day,
P3.Hour,
COALESCE(P2.Entries, 0) Entries2012,
COALESCE(P3.Entries, 0) Entries2013
FROM (
SELECT
WEEKDAY(AddedOn) Day,
HOUR(AddedOn) Hour,
COUNT(1) AS Entries
FROM Event2012
GROUP BY Day, Hour) P2
RIGHT JOIN (
SELECT
WEEKDAY(AddedOn) Day,
HOUR(AddedOn) Hour,
COUNT(1) AS Entries
FROM Event2013
GROUP BY Day, Hour) P3
ON P2.Day = P3.Day
AND P2.Hour = P3.Hour
WHERE P2.Hour IS NULL) AS tmp
ORDER BY Day, Hour
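For anyone who wants to try the outer join emulation above without a MySQL server, here is a minimal sketch in Python using the standard sqlite3 module. The tables counts2012 and counts2013 and their rows are made up for illustration and stand in for the two aggregated subqueries. The second branch is written as a LEFT JOIN with the tables swapped, which selects the same rows as the RIGHT JOIN ... WHERE P2.Hour IS NULL branch and also runs on SQLite versions without RIGHT JOIN. Because the two branches cannot overlap, UNION ALL is enough here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical pre-aggregated counts per (day, hour) slot
    CREATE TABLE counts2012 (day INTEGER, hour INTEGER, entries INTEGER);
    CREATE TABLE counts2013 (day INTEGER, hour INTEGER, entries INTEGER);
    INSERT INTO counts2012 VALUES (4, 10, 2), (4, 11, 5);
    INSERT INTO counts2013 VALUES (4, 11, 3), (5, 9, 7);
""")

query = """
    -- Branch 1: every 2012 slot, with the 2013 count if there is one
    SELECT p2.day, p2.hour, p2.entries AS entries2012,
           COALESCE(p3.entries, 0) AS entries2013
    FROM counts2012 p2
    LEFT JOIN counts2013 p3 ON p2.day = p3.day AND p2.hour = p3.hour
    UNION ALL
    -- Branch 2: 2013 slots that have no 2012 counterpart
    SELECT p3.day, p3.hour, 0, p3.entries
    FROM counts2013 p3
    LEFT JOIN counts2012 p2 ON p2.day = p3.day AND p2.hour = p3.hour
    WHERE p2.day IS NULL
    ORDER BY 1, 2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Slots that exist in only one year show up with 0 for the other year, which is what a FULL OUTER JOIN with COALESCE would give.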

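LEFT JOIN says that you intend to take all entries from P3 and expect that some of them won't match in P2. The numbers get too large because the join runs on raw rows: every 2013 entry is paired with every 2012 entry on the same weekday and hour, so wherever both years have entries, both COUNTs return the product of the two real counts. If P3 is your base table (and you look for possible matches in another table), count each table in its own subquery and join the results on P3's Day and Hour.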
SELECT P3.Day, P3.Hour, P3.Entries2013, P2.Entries2012
FROM (
SELECT
WEEKDAY(AddedOn) Day,
HOUR(AddedOn) Hour,
COUNT(AddedOn) Entries2013
FROM Event2013
GROUP BY Day, Hour) P3
LEFT JOIN (
SELECT
WEEKDAY(AddedOn) Day,
HOUR(AddedOn) Hour,
COUNT(AddedOn) Entries2012
FROM Event2012
GROUP BY Day, Hour) P2
ON P3.Day = P2.Day
AND P3.Hour = P2.Hour;
Also, using WEEKDAY means you sum all Mondays, Tuesdays, etc. in the given year and then get hourly summaries from that. If you want daily summaries (where the day is the day of the year), use DAYOFYEAR instead of WEEKDAY.
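To see where the inflated, identical numbers in the question come from, here is a small sketch in Python with the standard sqlite3 module. The rows are made up, and because SQLite has no WEEKDAY() or HOUR(), strftime('%w') and strftime('%H') stand in for them; that substitution is an assumption of this example, not part of the original queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Event2012 (AddedOn TEXT);
    CREATE TABLE Event2013 (AddedOn TEXT);
    -- Made-up data: 2 entries in 2012 and 3 in 2013, all on a Friday between 10:00 and 11:00
    INSERT INTO Event2012 VALUES ('2012-06-01 10:05:00'), ('2012-06-01 10:40:00');
    INSERT INTO Event2013 VALUES ('2013-06-07 10:10:00'), ('2013-06-07 10:20:00'),
                                 ('2013-06-07 10:30:00');
""")

# Joining the raw rows: each 2013 row is paired with each matching 2012 row.
raw_join = conn.execute("""
    SELECT COUNT(P3.AddedOn), COUNT(P2.AddedOn)
    FROM Event2013 P3
    LEFT JOIN Event2012 P2
      ON strftime('%w', P3.AddedOn) = strftime('%w', P2.AddedOn)
     AND strftime('%H', P3.AddedOn) = strftime('%H', P2.AddedOn)
    GROUP BY strftime('%w', P3.AddedOn), strftime('%H', P3.AddedOn)
""").fetchall()

# Counting each table first, then joining one row per slot.
pre_aggregated = conn.execute("""
    SELECT P3.Entries, P2.Entries
    FROM (SELECT strftime('%w', AddedOn) AS Day, strftime('%H', AddedOn) AS Hour,
                 COUNT(*) AS Entries
          FROM Event2013 GROUP BY Day, Hour) P3
    LEFT JOIN (SELECT strftime('%w', AddedOn) AS Day, strftime('%H', AddedOn) AS Hour,
                      COUNT(*) AS Entries
               FROM Event2012 GROUP BY Day, Hour) P2
      ON P3.Day = P2.Day AND P3.Hour = P2.Hour
""").fetchall()

print(raw_join)        # both counts are 3 * 2 = 6
print(pre_aggregated)  # the real counts: 3 and 2
```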

Look at the UNION statement: http://dev.mysql.com/doc/refman/5.0/en/union.html
This combines queries from two tables.

Related

MySQL Get Orders From Last 12 Weeks Monday to Sunday

I have a table that stores each order made by a user, recording the date it was made , the amount and the user id. I am trying to create a query that returns the weekly transactions from Monday to Sunday for the last 12 weeks for a particular user. I am using the following query:
SELECT COUNT(*) AS Orders,
SUM(amount) AS Total,
DATE_FORMAT(transaction_date,'%m/%Y') AS Week
FROM shop_orders
WHERE user_id = 123
AND transaction_date >= now()-interval 3 month
GROUP BY YEAR(transaction_date), WEEKOFYEAR(transaction_date)
ORDER BY DATE_FORMAT(transaction_date,'%m/%Y') ASC
This produces the following result:
This however does not return the weeks where the user has made 0 orders, does not sum the orders from Monday to Sunday and does not return the weeks ordered from 1 to 12. Is there a way to achieve these things?
One way to accomplish this is with a self outer join (in this case, I use a right outer join, but of course a left outer join would work as well).
To start your weeks on Monday, subtract the result of WEEKDAY from your column transaction_date with DATE_SUB, as proposed in the most upvoted answer here.
SELECT
COALESCE(t1.Orders, 0) AS `Orders`,
COALESCE(t1.Total, 0) AS `Total`,
t2.Week AS `Week`
FROM
(
SELECT
COUNT(*) AS `Orders`,
SUM(amount) AS `Total`,
DATE(DATE_SUB(transaction_date, INTERVAL(WEEKDAY(transaction_date)) DAY)) AS `Week`
FROM
shop_orders
WHERE 1=1
AND user_id = 123
AND transaction_date >= NOW() - INTERVAL 12 WEEK
GROUP BY
3
) t1 RIGHT JOIN (
SELECT
DATE(DATE_SUB(transaction_date, INTERVAL(WEEKDAY(transaction_date)) DAY)) AS `Week`
FROM
shop_orders
WHERE
transaction_date >= NOW() - INTERVAL 12 WEEK
GROUP BY
1
ORDER BY
1
) t2 USING (Week)
To return the weeks with no orders, you have to create a table containing all the weeks and outer join the orders to it.
To get the weeks in order, ORDER BY the same expressions you use in the GROUP BY.
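If adding a calendar table is not an option, the list of weeks can also be built in application code and the order rows merged into it. The following is only a rough sketch in Python: the function names, the (transaction_date, amount) row shape and the sample data are assumptions made for illustration. Note that it covers 12 calendar weeks including the current one, which is slightly different from NOW() - INTERVAL 12 WEEK.

```python
from datetime import date, timedelta

def monday_of(d):
    """Monday of the week containing d (mirrors DATE_SUB(d, INTERVAL WEEKDAY(d) DAY))."""
    return d - timedelta(days=d.weekday())

def weekly_totals(orders, today, weeks=12):
    """Return (week_start, order_count, total) for the last `weeks` weeks, oldest first.

    `orders` is an iterable of (transaction_date, amount) pairs, e.g. rows already
    fetched for one user. Weeks without orders are kept with zeros.
    """
    current = monday_of(today)
    starts = [current - timedelta(weeks=i) for i in range(weeks - 1, -1, -1)]
    stats = {start: [0, 0] for start in starts}
    for when, amount in orders:
        start = monday_of(when)
        if start in stats:  # ignore orders outside the window
            stats[start][0] += 1
            stats[start][1] += amount
    return [(start, stats[start][0], stats[start][1]) for start in starts]

# Hypothetical sample data: two orders in the current week, one two weeks earlier.
orders = [(date(2024, 3, 12), 10), (date(2024, 3, 17), 5), (date(2024, 2, 26), 7)]
rows = weekly_totals(orders, today=date(2024, 3, 13))
for week_start, order_count, total in rows:
    print(week_start, order_count, total)
```

The result always has exactly 12 rows in chronological order, so numbering the weeks 1 to 12 is just a matter of enumerating the list.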

SQL Sum two values within same query

Good Day,
I have a table that contains 3 columns. Date, Store, Straight_Sales. Each day a new record is created for each store with their previous day's sales.
What I am trying to do is generate a result set that has both current month to date sales for each location as well as the past year same MTD sales.
I can accomplish this by using two totally separate queries and result sets however I am trying to include these in the same query for reporting purposes.
Here are my two current queries that work just fine:
Last Year Month to Date:
SELECT SUM(summ_sales_daily.straight_sales), store_master.name
FROM
store_master
INNER JOIN summ_sales_daily ON store_master.unit = summ_sales_daily.store
WHERE YEAR(date)=YEAR(DATE_SUB(NOW(), INTERVAL 1 YEAR)) AND MONTH(date)=MONTH(NOW())
GROUP BY summ_sales_daily.store ORDER BY summ_sales_daily.store
Current Year Month to Date:
SELECT SUM(summ_sales_daily.straight_sales), store_master.name
FROM
store_master
INNER JOIN summ_sales_daily ON store_master.unit = summ_sales_daily.store
WHERE YEAR(date)=YEAR(NOW()) AND MONTH(date)=MONTH(NOW())
GROUP BY summ_sales_daily.store ORDER BY summ_sales_daily.store
I'd like these to return the current and previous years MTD in the same result along with the store name (hence the join)
Any help would be awesome!
Using MariaDB
You can either use conditional aggregation and move the different conditions into a case expression within the sum function:
SELECT
store_master.name
, SUM(CASE WHEN YEAR(date)=YEAR(DATE_SUB(NOW(), INTERVAL 1 YEAR)) THEN summ_sales_daily.straight_sales ELSE 0 END) last_year_sales
, SUM(CASE WHEN YEAR(date)=YEAR(NOW()) THEN summ_sales_daily.straight_sales ELSE 0 END) current_year_sales
FROM store_master
INNER JOIN summ_sales_daily ON store_master.unit = summ_sales_daily.store
WHERE MONTH(date)=MONTH(NOW())
GROUP BY summ_sales_daily.store
ORDER BY summ_sales_daily.store;
Or you can calculate the two different values in a couple of derived tables that you join with:
SELECT
store_master.name,
last_year.sales as previous_mtd,
current_year.sales as current_mtd
FROM store_master
LEFT JOIN (
SELECT store, SUM(straight_sales) sales
FROM summ_sales_daily
WHERE YEAR(date)=YEAR(DATE_SUB(NOW(), INTERVAL 1 YEAR)) AND MONTH(date)=MONTH(NOW())
GROUP BY store
) last_year ON store_master.unit = last_year.store
LEFT JOIN (
SELECT store, SUM(summ_sales_daily.straight_sales) sales
FROM summ_sales_daily
WHERE YEAR(date)=YEAR(NOW()) AND MONTH(date)=MONTH(NOW())
GROUP BY store
) current_year ON store_master.unit = current_year.store ;
Sample SQL Fiddle
The first solution would probably perform better.
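Here is a small way to check the conditional aggregation idea on sample data, sketched in Python with the standard sqlite3 module. The store names and sales figures are invented, strftime replaces YEAR() and MONTH() because SQLite lacks them, and the years and month are passed in as parameters instead of being derived from NOW() so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_master (unit INTEGER, name TEXT);
    CREATE TABLE summ_sales_daily (date TEXT, store INTEGER, straight_sales REAL);
    INSERT INTO store_master VALUES (1, 'North'), (2, 'South');
    INSERT INTO summ_sales_daily VALUES
        ('2023-05-02', 1, 100), ('2023-05-03', 1, 50),
        ('2024-05-02', 1, 120), ('2024-05-02', 2, 80),
        ('2024-04-30', 1, 999);  -- different month, filtered out by the WHERE clause
""")

query = """
    SELECT m.name,
           SUM(CASE WHEN strftime('%Y', d.date) = :last_year
                    THEN d.straight_sales ELSE 0 END) AS last_year_sales,
           SUM(CASE WHEN strftime('%Y', d.date) = :this_year
                    THEN d.straight_sales ELSE 0 END) AS current_year_sales
    FROM store_master m
    JOIN summ_sales_daily d ON m.unit = d.store
    WHERE strftime('%m', d.date) = :month
    GROUP BY m.unit, m.name
    ORDER BY m.unit
"""
params = {"last_year": "2023", "this_year": "2024", "month": "05"}
rows = conn.execute(query, params).fetchall()
for row in rows:
    print(row)
```

Each row of the month is read once and routed into one of the two sums, which is why a single pass over the table is enough.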

MySQL picking the greatest column after aggregation

I have a table of sold items, with specified name, day, hour and amount of items sold.
What I need to do is for every day find the hour in which the greatest number of items (of any type) was sold and return the two-columned table with day and amount of items.
What I managed to do is to compute the sum of items per hour, but how to pick the hour with maximum amount of items sold and show it together with the day?
here is my lousy sqlfiddle attempt: http://sqlfiddle.com/#!9/93b51/17/0
select day, hour, sum(amount) as suma
from sold_items
group by day, hour
You need to join your query with juergen d's query that gets the maximum hourly amount each day.
SELECT a.day, a.hour, a.suma
FROM (
select day, hour, sum(amount) as suma
from sold_items
group by day, hour) AS a
JOIN (
select day, max(suma) AS maxsuma
from (
select day, hour, sum(amount) as suma
from sold_items
group by day, hour) AS tmp
group by day) AS b
ON a.day = b.day AND a.suma = b.maxsuma
DEMO
This follows the same pattern as SQL Select only rows with Max Value on a Column except that in this case, you're doing it with a subquery that calculates an aggregate, not the data coming directly from the table.
select day, max(suma)
from
(
select day, hour, sum(amount) as suma
from sold_items
group by day, hour
) tmp
group by day
SQLFiddle
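To try the join against the per-day maximum locally, here is a sketch in Python with the standard sqlite3 module. The item names and amounts are made up; the query itself is the same shape as the joined version above. If two hours tie for the daily maximum, both rows are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sold_items (name TEXT, day TEXT, hour INTEGER, amount INTEGER);
    INSERT INTO sold_items VALUES
        ('apple', '2016-01-01',  9, 3), ('pear', '2016-01-01',  9, 4),
        ('apple', '2016-01-01', 10, 5),
        ('apple', '2016-01-02', 14, 2), ('pear', '2016-01-02', 15, 6);
""")

query = """
    SELECT a.day, a.hour, a.suma
    FROM (SELECT day, hour, SUM(amount) AS suma
          FROM sold_items GROUP BY day, hour) AS a
    JOIN (SELECT day, MAX(suma) AS maxsuma
          FROM (SELECT day, hour, SUM(amount) AS suma
                FROM sold_items GROUP BY day, hour) AS tmp
          GROUP BY day) AS b
      ON a.day = b.day AND a.suma = b.maxsuma
    ORDER BY a.day, a.hour
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```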

Return active users in the last 30 days for each day

I have a table, activity that looks like the following:
date | user_id |
Thousands of users and multiple dates and activity for all of them. I want to pull a query that will, for every day in the result, give me the total active users in the last 30 days. The query I have now looks like the following:
select date, count(distinct user_id) from activity where date > date_sub(date, interval 30 day) group by date
This gives me total unique users on only that day; I can't get it to give me the last 30 for each date. Help is appreciated.
To do this you need a list of the dates and join that against the activities.
As such this should do it. A sub query to get the list of dates and then a count of user_id (or you could use COUNT(*) as I presume user_id cannot be null):-
SELECT date_ranges.date, COUNT(user_id)
FROM
(
SELECT DISTINCT date, DATE_ADD(date, INTERVAL -30 DAY) AS date_minus_30
FROM activity
) date_ranges
INNER JOIN activity
ON activity.date BETWEEN date_ranges.date_minus_30 AND date_ranges.date
GROUP BY date_ranges.date
However if there can be multiple records for a user_id on any particular date but you only want the count of unique user_ids on a date you need to count DISTINCT user_id (although note that if a user id occurs on 2 different dates within the 30 day date range they will only be counted once):-
SELECT date_ranges.date, COUNT(DISTINCT user_id)
FROM
(
SELECT DISTINCT date, DATE_ADD(date, INTERVAL -30 DAY) AS date_minus_30
FROM activity
) date_ranges
INNER JOIN activity
ON activity.date BETWEEN date_ranges.date_minus_30 AND date_ranges.date
GROUP BY date_ranges.date
A bit cruder would be to just join the activity table against itself based on the date range and use COUNT(DISTINCT ...) to just eliminate the duplicates:-
SELECT b.date, COUNT(DISTINCT a.user_id)
FROM activity a
INNER JOIN activity b
ON a.date BETWEEN DATE_ADD(b.date, INTERVAL -30 DAY) AND b.date
GROUP BY b.date
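The rolling window can be checked on a few rows with the following sketch in Python and the standard sqlite3 module. The sample rows are invented and SQLite's date(x, '-30 days') stands in for DATE_ADD(x, INTERVAL -30 DAY). The important detail is that the grouping is done on the reference date, while the users are counted from the rows that fall inside that date's window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (date TEXT, user_id INTEGER);
    INSERT INTO activity VALUES
        ('2024-01-01', 1), ('2024-01-01', 2),
        ('2024-01-15', 1), ('2024-01-15', 3),
        ('2024-02-20', 3);
""")

query = """
    SELECT d.date, COUNT(DISTINCT a.user_id) AS active_30d
    FROM (SELECT DISTINCT date FROM activity) AS d   -- one row per reference date
    JOIN activity AS a
      ON a.date BETWEEN date(d.date, '-30 days') AND d.date
    GROUP BY d.date
    ORDER BY d.date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only dates that have at least one activity row appear in the result; days with no activity at all would need a separate list of dates to join against.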

SQL selecting average score over range of dates

I have 3 tables:
doctors (id, name) -> has_many:
patients (id, doctor_id, name) -> has_many:
health_conditions (id, patient_id, note, created_at)
Every day, each patient gets a new health condition with a note from 1 to 10, where 10 means good health (full recovery, if you will).
What I want to extract is the following 3 statistics for the last 30 days (month):
- how many patients got better
- how many patients got worse
- how many patients remained the same
These statistics are global so I don't care right now of statistics per doctor which I could extract given the right query.
The trick is that the query needs to extract the current health_condition note and compare with the average of past days (this month without today) so one needs to extract today's note and an average of the other days excluding this one.
I don't think the query needs to define who went up/down/same since I can loop and decide that. Just today vs. rest of the month will be sufficient I guess.
Here's what I have so far which obv. doesn't work because it only returns one result due to the limit applied:
SELECT
p.id,
p.name,
hc.latest,
hcc.average
FROM
pacients p
INNER JOIN (
SELECT
id,
pacient_id,
note as LATEST
FROM
health_conditions
GROUP BY pacient_id, id
ORDER BY created_at DESC
LIMIT 1
) hc ON(hc.pacient_id=p.id)
INNER JOIN (
SELECT
id,
pacient_id,
avg(note) AS average
FROM
health_conditions
GROUP BY pacient_id, id
) hcc ON(hcc.pacient_id=p.id AND hcc.id!=hc.id)
WHERE
date_part('epoch',date_trunc('day', hcc.created_at))
BETWEEN
(date_part('epoch',date_trunc('day', hc.created_at)) - (30 * 86400))
AND
date_part('epoch',date_trunc('day', hc.created_at))
The query has all the logic it needs to distinguish between what is latest and average but that limit kills everything. I need that limit to extract the latest result which is used to compare with past results.
Something like this assuming created_at is of type date
select p.name,
hc.note as current_note,
av.avg_note
from patients p
join health_conditions hc on hc.patient_id = p.id
join (
select patient_id,
avg(note) as avg_note
from health_conditions hc2
where created_at between current_date - 30 and current_date - 1
group by patient_id
) av on av.patient_id = hc.patient_id
where hc.created_at = current_date;
This is PostgreSQL syntax. I'm not sure if MySQL supports date arithmetic the same way.
Edit:
This should get you the most recent note for each patient, plus the average for the last 30 days:
select p.name,
hc.created_at as last_note_date,
hc.note as current_note,
t.avg_note
from patients p
join health_conditions hc
on hc.patient_id = p.id
and hc.created_at = (select max(created_at)
from health_conditions hc2
where hc2.patient_id = hc.patient_id)
join (
select patient_id,
avg(note) as avg_note
from health_conditions hc3
where created_at between current_date - 30 and current_date - 1
group by patient_id
) t on t.patient_id = hc.patient_id
SELECT SUM(delta < 0) AS worsened,
SUM(delta = 0) AS no_change,
SUM(delta > 0) AS improved
FROM (
SELECT patient_id,
SUM(IF(DATE(created_at) = CURDATE(),note,NULL))
- AVG(IF(DATE(created_at) < CURDATE(),note,NULL)) AS delta
FROM health_conditions
WHERE DATE(created_at) BETWEEN CURDATE() - INTERVAL 1 MONTH AND CURDATE()
GROUP BY patient_id
) t
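The last query relies on comparisons evaluating to 0 or 1 so that SUM can count them. Here is a sketch of the same idea in Python with the standard sqlite3 module, where comparisons behave the same way. The patients and notes are invented, CASE replaces IF(), a fixed :today parameter replaces CURDATE(), and the window is 30 days rather than one calendar month; all of these are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE health_conditions (patient_id INTEGER, note INTEGER, created_at TEXT);
    INSERT INTO health_conditions VALUES
        (1, 4, '2024-03-08'), (1, 6, '2024-03-09'), (1, 8, '2024-03-10'),  -- improved
        (2, 7, '2024-03-08'), (2, 7, '2024-03-09'), (2, 7, '2024-03-10'),  -- unchanged
        (3, 9, '2024-03-09'), (3, 5, '2024-03-10'),                        -- worsened
        (4, 2, '2024-03-09'), (4, 3, '2024-03-10');                        -- improved
""")

query = """
    SELECT SUM(delta < 0) AS worsened,
           SUM(delta = 0) AS no_change,
           SUM(delta > 0) AS improved
    FROM (
        SELECT patient_id,
               -- today's note (assumes one note per patient per day)
               SUM(CASE WHEN created_at = :today THEN note END)
               -- minus the average of the earlier days in the window
             - AVG(CASE WHEN created_at < :today THEN note END) AS delta
        FROM health_conditions
        WHERE created_at BETWEEN date(:today, '-30 days') AND :today
        GROUP BY patient_id
    ) AS t
"""
result = conn.execute(query, {"today": "2024-03-10"}).fetchone()
print(result)
```

A patient with no note today, or with no earlier notes, gets a NULL delta and is left out of all three counts.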