Compare all rooms to all other rooms (Cartesian product) - mysql

I have attendance data that is stored like this:
Building | Room | Date | Morning | Evening
------------------------------------------
BuildA A1 1 10 15
BuildA A1 2 20 35
BuildA A1 3 30 15
BuildA A2 1 60 30
BuildA A2 2 30 10
BuildA A2 3 40 20
BuildB B1 1 20 25
BuildB B1 2 10 35
BuildB B1 3 30 10
BuildB B2 1 15 25
BuildB B2 2 25 35
BuildB B2 3 25 15
I then need to see the difference in attendance for each time of day from the previous day for each room. The result would look like this:
Building | Room | Date | Morning | Evening | MorningDiff | EveningDiff
-----------------------------------------------------------------------
BuildA A1 1 10 15 0 0
BuildA A1 2 20 35 10 20
BuildA A1 3 30 15 10 -20
BuildA A2 1 60 30 0 0
BuildA A2 2 30 10 -30 -20
BuildA A2 3 40 20 10 10
BuildB B1 1 20 25 0 0
BuildB B1 2 10 35 -10 10
BuildB B1 3 30 10 20 -25
BuildB B2 1 15 25 0 0
BuildB B2 2 25 35 10 10
BuildB B2 3 25 15 0 -20
I was able to produce the result above with this query:
select t.*,
       COALESCE((`morning` -
                 (select `morning`
                  from data t2
                  where t2.date < t.date
                    and t2.room = t.room
                  order by t2.date desc
                  limit 1)), 0) as MorningDiff,
       COALESCE((`evening` -
                 (select `evening`
                  from data t2
                  where t2.date < t.date
                    and t2.room = t.room
                  order by t2.date desc
                  limit 1)), 0) as EveningDiff
from data t
order by room, date asc;
So now I have the difference in attendance. This is where it gets more complicated. Seeing the final result I am after may make it clearer:
Building1 | Room1 | TimeOfDay1 | Direction1 | Building2 | Room2 | TimeOfDay2 | Direction2 | OccuranceCount | Room1DirectionCount | Room2DirectionCount
-----------------------------------------------------------------------------------------------------------------------------------------------------
BuildA A1 Morning Up BuildA A2 Morning Up 1 2 1
BuildA A1 Morning Up BuildA A2 Morning Down 1 2 1
BuildA A1 Morning Up BuildA A2 Evening Up 1 2 1
.
.
.
The reason for getting the difference between dates is to see whether attendance increased or decreased from the previous day. We are not concerned with the size of the difference, only with whether attendance went up or down.
OccuranceCount field - If one room's attendance went up/down on one day, we want to see whether another room's attendance went up/down the next day. This field counts how many times room2 went up/down on one day and room1 went up/down the next day. Taking the first row as an example, it shows that during the 3-day period room A1's morning attendance went up 1 time on the day after room A2's morning attendance went up.
Room1DirectionCount/Room2DirectionCount fields - These fields simply show how many times each direction occurred for each room. So if, over a period of 100 days, room A1's attendance increased 60 times, the count would be 60.
Since I am comparing all the rooms to each other, I have tried a cross join to form a Cartesian product, but I have been unable to figure out how to write the join so that it references the other room's previous day.
I am not sure why this question was marked as a duplicate of a question regarding pivot tables? I don't believe this question is answered by that.

I'm not 100% certain I understand your question, and there isn't really enough sample data/expected output to be sure, but I think this query will give you the results you want. It uses a couple of CTEs: one to get the differences for each building/room/date/time-of-day combination, and a second to count how often each direction occurs (for the RoomDirectionCount columns). The main query then counts grouped rows to get the OccurrenceCount column.
WITH atdiff AS (
    SELECT building, room, date, 'Morning' AS time_of_day,
           morning - LAG(morning) OVER (PARTITION BY building, room ORDER BY date) AS diff
    FROM attendance
    UNION
    SELECT building, room, date, 'Evening',
           evening - LAG(evening) OVER (PARTITION BY building, room ORDER BY date) diff
    FROM attendance
),
dircounts AS (
    SELECT building, room, time_of_day, SIGN(diff) AS direction, COUNT(*) AS DirectionCount
    FROM atdiff
    GROUP BY building, room, time_of_day, direction
)
SELECT a1.building AS Building1,
       a1.room AS Room1,
       a1.time_of_day AS TimeOfDay1,
       (CASE SIGN(a1.diff) WHEN 1 THEN 'Up' WHEN -1 THEN 'Down' ELSE 'Unchanged' END) AS Direction1,
       a2.building AS Building2,
       a2.room AS Room2,
       a2.time_of_day AS TimeOfDay2,
       (CASE SIGN(a2.diff) WHEN 1 THEN 'Up' WHEN -1 THEN 'Down' ELSE 'Unchanged' END) AS Direction2,
       COUNT(*) AS OccurrenceCount,
       MIN(d1.DirectionCount) AS Room1DirectionCount,
       MIN(d2.DirectionCount) AS Room2DirectionCount
FROM atdiff a1
JOIN atdiff a2
  ON a2.date = a1.date + 1
 AND (a2.building != a1.building OR a2.room != a1.room)
JOIN dircounts d1
  ON d1.building = a1.building AND d1.room = a1.room
 AND d1.time_of_day = a1.time_of_day AND d1.direction = SIGN(a1.diff)
JOIN dircounts d2
  ON d2.building = a2.building AND d2.room = a2.room
 AND d2.time_of_day = a2.time_of_day AND d2.direction = SIGN(a2.diff)
WHERE a1.diff IS NOT NULL
GROUP BY Building1, Room1, TimeOfDay1, Direction1, Building2, Room2, TimeOfDay2, Direction2
ORDER BY Building1, Room1, TimeOfDay1 DESC, Direction1 DESC, Building2, Room2, TimeOfDay2 DESC, Direction2 DESC
The output is too long to include here but I've created a demo on dbfiddle. Alternate demo on dbfiddle.uk
Note that I've used a WHERE a1.diff IS NOT NULL clause to exclude results from the first day. You could instead put a COALESCE around the computation of diff in the atdiff CTE and drop that clause.
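The per-room direction counts (what the dircounts CTE feeds into Room1DirectionCount and Room2DirectionCount) can be checked by hand on the sample data. The following is a minimal Python sketch, not the query itself: the function names are made up for illustration, and unlike the CTE it simply skips each room's first day instead of producing a NULL direction group.

```python
from collections import Counter

# Sample rows from the question: (building, room, date, morning, evening).
rows = [
    ("BuildA", "A1", 1, 10, 15), ("BuildA", "A1", 2, 20, 35), ("BuildA", "A1", 3, 30, 15),
    ("BuildA", "A2", 1, 60, 30), ("BuildA", "A2", 2, 30, 10), ("BuildA", "A2", 3, 40, 20),
    ("BuildB", "B1", 1, 20, 25), ("BuildB", "B1", 2, 10, 35), ("BuildB", "B1", 3, 30, 10),
    ("BuildB", "B2", 1, 15, 25), ("BuildB", "B2", 2, 25, 35), ("BuildB", "B2", 3, 25, 15),
]


def direction(diff):
    """Label a day-over-day difference, like SIGN() plus CASE in the query."""
    if diff > 0:
        return "Up"
    if diff < 0:
        return "Down"
    return "Unchanged"


def direction_counts(rows):
    """Count directions per (building, room, time_of_day, direction).

    Assumption: the first day of each room has no previous day and is skipped.
    """
    counts = Counter()
    previous = {}  # (building, room) -> (morning, evening) on the previous date
    for building, room, _date, morning, evening in sorted(rows, key=lambda r: r[:3]):
        key = (building, room)
        if key in previous:
            prev_morning, prev_evening = previous[key]
            counts[(building, room, "Morning", direction(morning - prev_morning))] += 1
            counts[(building, room, "Evening", direction(evening - prev_evening))] += 1
        previous[key] = (morning, evening)
    return counts


counts = direction_counts(rows)
print(counts[("BuildA", "A1", "Morning", "Up")])    # 2
print(counts[("BuildA", "A2", "Morning", "Down")])  # 1
```

The value 2 for A1/Morning/Up agrees with the Room1DirectionCount column in the question's expected output.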

I am having a hard time figuring out the meaning of some of the columns in your second expected output. However, for what it's worth, here are some examples and demonstrations that might help you.
If you are using MySQL 8.0, you can use window functions to access rows that are related to the current row. The following query returns your first expected output (although where there is no previous date, NULL is returned instead of 0, to distinguish this from the case where attendance is the same as on the previous day):
select
a.*,
morning - lag(a.morning) over (partition by a.building, a.room order by a.date) morning_diff,
evening - lag(a.evening) over (partition by a.building, a.room order by a.date) evening_diff
from attendance a
order by a.building, a.room, a.date
See the db fiddle.
With older versions of MySQL, you could use a self-LEFT JOIN to access the previous row:
select
a.*,
a.morning - a1.morning morning_diff,
a.evening - a1.evening evening_diff
from
attendance a
left join attendance a1
on a1.building = a.building and a1.room = a.room and a1.date = a.date - 1
order by a.building, a.room, a.date
See this MySQL 5.7 db fiddle.
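One thing to keep in mind with the self-join is that a1.date = a.date - 1 only finds a previous row when the dates are consecutive. The sketch below illustrates this with Python's built-in sqlite3 module rather than MySQL, on a hypothetical variation of the sample data in which room A1 has rows for days 1, 2 and 4 (day 3 is deliberately missing).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendance "
    "(building TEXT, room TEXT, date INTEGER, morning INTEGER, evening INTEGER)"
)
# Hypothetical data: day 3 is missing on purpose to show the effect of a gap.
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, ?, ?, ?)",
    [("BuildA", "A1", 1, 10, 15), ("BuildA", "A1", 2, 20, 35), ("BuildA", "A1", 4, 30, 15)],
)

diffs = conn.execute(
    """
    SELECT a.date,
           a.morning - a1.morning AS morning_diff,
           a.evening - a1.evening AS evening_diff
    FROM attendance a
    LEFT JOIN attendance a1
           ON a1.building = a.building AND a1.room = a.room AND a1.date = a.date - 1
    ORDER BY a.building, a.room, a.date
    """
).fetchall()

for row in diffs:
    print(row)
# (1, None, None)   no previous day
# (2, 10, 20)
# (4, None, None)   day 3 is missing, so the join finds nothing
```

If gaps are possible, the correlated subquery from the question (latest earlier date) or LAG() would pick up the most recent earlier row instead.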
Once you have a query that returns the attendance differences, you can easily see if it went up or down in an outer query. Consider, for example :
select t.*,
case
when morning_diff is null then 'Unknown'
when morning_diff = 0 then 'Even'
when morning_diff > 0 then 'Up'
when morning_diff < 0 then 'Down'
end morning_direction,
case
when evening_diff is null then 'Unknown'
when evening_diff = 0 then 'Even'
when evening_diff > 0 then 'Up'
when evening_diff < 0 then 'Down'
end evening_direction
from (
select
a.*,
morning - lag(a.morning) over (partition by a.building, a.room order by a.date) morning_diff,
evening - lag(a.evening) over (partition by a.building, a.room order by a.date) evening_diff
from attendance a
) t
order by t.building, t.room, t.date;
See this db fiddle.

Related

SQL subquery in SELECT clause

I'm trying to find admin activity within the last 30 days.
The accounts table stores the user data (username, password, etc.).
At the end of each day, if a user has logged in, a new entry with their updated data is created in the player_history table. This is so we can track progress over time.
accounts table:
id | username | admin
---------------------
1 Michael 4
2 Steve 3
3 Louise 3
4 Joe 0
5 Amy 1
player_history table:
id | user_id | created_at | playtime
------------------------------------
0 1 2021-04-03 10
1 2 2021-04-04 10
2 3 2021-04-05 15
3 4 2021-04-10 20
4 5 2021-04-11 20
5 1 2021-05-12 40
6 2 2021-05-13 55
7 3 2021-05-17 65
8 4 2021-05-19 75
9 5 2021-05-23 30
10 1 2021-06-01 60
11 2 2021-06-02 65
12 3 2021-06-02 67
13 4 2021-06-03 90
The following query
SELECT a.`username`, SEC_TO_TIME((MAX(h.`playtime`) - MIN(h.`playtime`))*60) as 'time' FROM `player_history` h, `accounts` a WHERE h.`created_at` > '2021-05-06' AND h.`user_id` = a.`id` AND a.`admin` > 0 GROUP BY h.`user_id`
Outputs this table:
Note that this is just admin activity, so Joe is not included in this data.
From 2021-05-06 to present (yyyy-mm-dd):
username | time
---------------
Michael 00:20:00
Steve 00:10:00
Louise 00:02:00
Amy 00:00:00
As you can see from this data, Amy's time is shown as 0 although she has played for 10 minutes in the last month. This is because she has only one entry from 2021-05-06 onward, so there is nothing to compare it to: MAX and MIN are both 30, and 30 - 30 = 0.
Another flaw is that the query doesn't include all activity in the last month; it only subtracts the lowest value in the period from the highest.
So I tried fixing this by comparing the highest value after 2021-05-06 with the most recent entry before that date. I modified the query a bit:
SELECT a.`Username`, SEC_TO_TIME((MAX(h.`playtime`) - (SELECT MAX(`playtime`) FROM `player_history` WHERE a.`id` = `user_id` AND `created_at` < '2021-05-06'))*60) as 'Time' FROM `player_history` h, `accounts` a WHERE h.`created_at` >= '2021-05-06' AND h.`user_id` = a.`id` AND a.`admin` > 0 GROUP BY h.`user_id`
So now it will output:
username | time
---------------
Michael 00:50:00
Steve 00:55:00
Louise 00:52:00
Amy 00:10:00
But I feel like this whole query is quite inefficient. Is there a better way to do this?
I think you want lag():
SELECT a.username,
       -- playtime is stored in minutes, so convert to seconds for SEC_TO_TIME
       SEC_TO_TIME(SUM(h.playtime - COALESCE(h.prev_playtime, 0)) * 60) as time
FROM accounts a JOIN
     (SELECT h.*,
             LAG(playtime) OVER (PARTITION BY h.user_id ORDER BY h.created_at) as prev_playtime
      FROM player_history h
     ) h
     ON h.user_id = a.id
WHERE h.created_at > '2021-05-06' AND
      a.admin > 0
GROUP BY a.username;
In addition to the LAG() logic, note the other changes to the query:
The use of proper, explicit, standard, readable JOIN syntax.
The use of consistent columns for the SELECT and GROUP BY.
The removal of single quotes around the column alias.
The removal of backticks; they just clutter the query, making it harder to write and to read.
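To see why summing the LAG() differences works, note that the differences inside the period telescope: their sum equals the last playtime in the period minus the last playtime before it. The following sketch checks this on the question's sample data with Python's built-in sqlite3 module (SQLite 3.25+ is assumed for window functions). SQLite has no SEC_TO_TIME, so the result is left in minutes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (id INTEGER, username TEXT, admin INTEGER);
    CREATE TABLE player_history (id INTEGER, user_id INTEGER, created_at TEXT, playtime INTEGER);
    """
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "Michael", 4), (2, "Steve", 3), (3, "Louise", 3), (4, "Joe", 0), (5, "Amy", 1)],
)
conn.executemany(
    "INSERT INTO player_history VALUES (?, ?, ?, ?)",
    [
        (0, 1, "2021-04-03", 10), (1, 2, "2021-04-04", 10), (2, 3, "2021-04-05", 15),
        (3, 4, "2021-04-10", 20), (4, 5, "2021-04-11", 20), (5, 1, "2021-05-12", 40),
        (6, 2, "2021-05-13", 55), (7, 3, "2021-05-17", 65), (8, 4, "2021-05-19", 75),
        (9, 5, "2021-05-23", 30), (10, 1, "2021-06-01", 60), (11, 2, "2021-06-02", 65),
        (12, 3, "2021-06-02", 67), (13, 4, "2021-06-03", 90),
    ],
)

query = """
SELECT a.username,
       SUM(h.playtime - COALESCE(h.prev_playtime, 0)) AS minutes
FROM accounts a
JOIN (SELECT ph.*,
             LAG(ph.playtime) OVER (PARTITION BY ph.user_id ORDER BY ph.created_at) AS prev_playtime
      FROM player_history ph) h
  ON h.user_id = a.id
WHERE h.created_at > '2021-05-06' AND a.admin > 0
GROUP BY a.username
ORDER BY a.username
"""
minutes_by_user = dict(conn.execute(query).fetchall())
print(minutes_by_user)
# {'Amy': 10, 'Louise': 52, 'Michael': 50, 'Steve': 55}
```

The LAG() is computed in the subquery over the full history, before the date filter is applied, which is what lets the first row inside the period see the last row before it.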

Calculating moving average for different values in a column MySQL

I have a dataset like this:
team date score
A 2011-05-01 50
A 2011-05-02 54
A 2011-05-03 51
A 2011-05-04 49
A 2011-05-05 59
B 2011-05-03 30
B 2011-05-04 35
B 2011-05-05 39
B 2011-05-06 47
B 2011-05-07 50
I want to add another column called MA3 containing the moving average of the scores for the last 3 days. The tricky part is calculating the moving average separately for each team. The end result should look like this:
team date score MA3
A 2011-05-01 50 null
A 2011-05-02 54 null
A 2011-05-03 51 null
A 2011-05-04 49 51.66
A 2011-05-05 59 51.33
B 2011-05-03 30 null
B 2011-05-04 35 null
B 2011-05-05 39 null
B 2011-05-06 47 34.66
B 2011-05-07 50 40.33
If there were only a single team, I would do:
SELECT team,
       date,
       AVG(score) OVER (ORDER BY date ASC ROWS 3 PRECEDING) AS MA3
FROM table
You're missing the PARTITION BY clause:
SELECT team,
date,
AVG(score) OVER (
PARTITION BY team
ORDER BY date ASC ROWS 3 PRECEDING
) AS MA3
FROM table
Note that an average is always calculated, however few rows the window contains. With ROWS 3 PRECEDING the frame holds the current row plus the three rows before it, so up to 4 rows. If you want the average to be null until the frame is full, you could do it like this:
SELECT team,
       date,
       CASE
           WHEN count(*) OVER w <= 3 THEN null
           ELSE AVG(score) OVER w
       END AS MA3
FROM table
WINDOW w AS (PARTITION BY team ORDER BY date ASC ROWS 3 PRECEDING)
Side note
Your next question might be about logical windowing, because often you don't actually want to calculate the average over 3 rows but over some interval, e.g. 3 days. Luckily, MySQL implements this. You could then write:
WINDOW w AS (PARTITION BY team ORDER BY date ASC RANGE INTERVAL 3 DAY PRECEDING)
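The expected output in the question averages the three rows before the current one (for team A on 2011-05-04 that is 50, 54 and 51) and excludes the current row. One way to express that, sketched here as an assumption about what the asker wants, is the frame ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING. The example runs on Python's built-in sqlite3 module (SQLite 3.25+ assumed) with a hypothetical table name scores. Values are rounded, so 51.67 appears where the question shows the truncated 51.66.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (team TEXT, date TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        ("A", "2011-05-01", 50), ("A", "2011-05-02", 54), ("A", "2011-05-03", 51),
        ("A", "2011-05-04", 49), ("A", "2011-05-05", 59),
        ("B", "2011-05-03", 30), ("B", "2011-05-04", 35), ("B", "2011-05-05", 39),
        ("B", "2011-05-06", 47), ("B", "2011-05-07", 50),
    ],
)

query = """
SELECT team,
       date,
       ROUND(CASE
                 WHEN COUNT(*) OVER w < 3 THEN NULL   -- fewer than 3 earlier rows
                 ELSE AVG(score) OVER w
             END, 2) AS MA3
FROM scores
WINDOW w AS (PARTITION BY team ORDER BY date
             ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
ORDER BY team, date
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
# ('A', '2011-05-01', None)
# ('A', '2011-05-02', None)
# ('A', '2011-05-03', None)
# ('A', '2011-05-04', 51.67)
# ('A', '2011-05-05', 51.33)
# ('B', '2011-05-03', None)
# ('B', '2011-05-04', None)
# ('B', '2011-05-05', None)
# ('B', '2011-05-06', 34.67)
# ('B', '2011-05-07', 40.33)
```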

MySQL - Calculate accumulation since reset event in a Table

This question follows on from my other question.
A Python solution was written based on an extract from the MySQL DB (5.6.34) where the original data is stored.
My question is: is it possible to do this calculation directly in MySQL?
As a reminder:
There is a 'runners' table with accumulated distance per runner and reset tags:
runner startdate cum_distance reset_event
0 1 2017-04-01 100 1
1 1 2018-04-20 125 0
2 1 2018-05-25 130 1
3 2 2015-04-05 10 1
4 2 2015-10-20 20 1
5 2 2016-11-29 50 0
I would like to calculate an accumulated distance per runner since the reset point (my comments in brackets ()):
runner startdate cum_distance reset_event runner_dist_since_reset
0 1 2017-04-01 100 1 100 <-(no reset since begin)
1 1 2018-04-20 125 0 25 <-(125-100)
2 1 2018-05-25 130 1 30 <-(130-100)
3 2 2015-04-05 10 1 10 <-(no reset since begin)
4 2 2015-10-20 20 1 10 <-(20-10)
5 2 2016-11-29 50 0 30 <-(50-20)
So far I was able to calculate only differences between reset events:
SET @DistSinceReset=0;
SELECT
    runner,
    startdate,
    reset_event,
    IF(cum_distance - @DistSinceReset < 0, cum_distance, cum_distance - @DistSinceReset) AS 'runner_dist_since_reset',
    @DistSinceReset := cum_distance AS 'cum_distance'
FROM
    runners
WHERE
    reset_event = 1
GROUP BY runner, startdate
This answer is for MySQL 8.
The information you want is, for each row, the cum_distance of the same runner's most recent preceding row with reset_event = 1. MySQL 8 supports window functions, which can express this directly (they are not available in 5.6).
Here is one method:
select r.*,
       (cum_distance - coalesce(preceding_reset_cum_distance, 0)) as runner_dist_since_reset
from (select r.*,
             -- cum_distance at the most recent earlier reset; taking the max
             -- assumes cum_distance never decreases for a runner
             max(case when reset_event = 1 then cum_distance end) over
                 (partition by runner
                  order by startdate
                  rows between unbounded preceding and 1 preceding
                 ) as preceding_reset_cum_distance
      from runners r
     ) r;
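The idea of looking only at rows strictly before the current one (rows between unbounded preceding and 1 preceding) can be checked against the question's sample data. The sketch below uses Python's built-in sqlite3 module (SQLite 3.25+ assumed) rather than MySQL 8. It picks the cum_distance of the latest earlier reset by taking a maximum, which relies on the assumption that cum_distance never decreases for a runner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runners "
    "(runner INTEGER, startdate TEXT, cum_distance INTEGER, reset_event INTEGER)"
)
conn.executemany(
    "INSERT INTO runners VALUES (?, ?, ?, ?)",
    [
        (1, "2017-04-01", 100, 1), (1, "2018-04-20", 125, 0), (1, "2018-05-25", 130, 1),
        (2, "2015-04-05", 10, 1), (2, "2015-10-20", 20, 1), (2, "2016-11-29", 50, 0),
    ],
)

query = """
SELECT runner,
       startdate,
       cum_distance - COALESCE(
           -- cum_distance at the most recent reset strictly before this row
           MAX(CASE WHEN reset_event = 1 THEN cum_distance END) OVER (
               PARTITION BY runner
               ORDER BY startdate
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ), 0) AS runner_dist_since_reset
FROM runners
ORDER BY runner, startdate
"""
results = conn.execute(query).fetchall()
print([row[2] for row in results])
# [100, 25, 30, 10, 10, 30]
```

These values match the runner_dist_since_reset column in the question's expected output.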

Missing records from one table in SQL Server 2008R2

Table 1:
Date PlacementID CampaignID Impressions
04/01/2014 100 10 1000
04/01/2014 101 10 1500
04/01/2014 100 11 500
Table 2:
Date PlacementID CampaignID Cost
04/01/2014 100 10 5000
04/01/2014 101 10 6000
04/01/2014 100 11 7000
04/01/2014 103 10 8000
When I join these tables using FULL JOIN or LEFT JOIN, I cannot get the unmatched record, which is the last row in Table 2 (PlacementID 103, CampaignID 10, Cost 8000). I have checked the raw data and files, and this record exists in only one of the two sources. However, I want to include it in the final table. How can I do that? The two tables come from two different sources, and my results contain only the records common to both.
Moreover, the missing value is exactly what is needed to make the final figure correct, so I want to include everything. My SQL script is below:
SELECT A.placementid,
       A.campaignid,
       A.date,
       Sum(A.impressions) AS Impressions,
       Sum(CASE
             WHEN C.placement_count > 1 THEN ( B.cost / C.placement_count )
             ELSE B.cost
           END)           AS Cost
FROM   table1 A
       FULL JOIN table2 B
              ON A.placementid = B.placementid
                 AND A.campaignid = B.campaignid
                 AND A.date = B.date
       LEFT JOIN (SELECT Count(placementid) AS Placement_Count,
                         placementid,
                         campaignid,
                         date
                  FROM   table1
                  GROUP  BY placementid,
                            campaignid,
                            date) c
              ON A.placementid = C.placementid
                 AND A.campaignid = C.campaignid
                 AND A.date = C.date
GROUP  BY A.placementid,
          A.campaignid,
          A.date
I am dividing Cost by the placement count because in the source the cost was allocated only once, to a single placement, whereas in the actual table the same PlacementID repeats more than once on the same date.
As you didn't provide any expected output, I'm guessing here, but if the result you want is this:
PlacementID CampaignID Date Impressions Cost
----------- ----------- ----------------------- ----------- -----------
100 10 2014-04-01 02:00:00.000 1000 5000
100 11 2014-04-01 02:00:00.000 500 7000
101 10 2014-04-01 02:00:00.000 1500 6000
103 10 2014-04-01 02:00:00.000 NULL 8000
Then the following query should do it:
SELECT COALESCE(A.PlacementID,b.placementid) AS PlacementID,
COALESCE(A.campaignid, b.campaignid) AS CampaignID,
COALESCE(A.date, b.date) AS [Date],
SUM(A.impressions) AS Impressions,
SUM(CASE
WHEN C.placement_count > 1 THEN ( B.cost / C.placement_count )
ELSE B.cost
END ) AS Cost
FROM table1 A
FULL JOIN table2 B
ON A.[PlacementID] = B.placementid
AND A.campaignid = B.campaignid
AND A.date = B.date
LEFT JOIN (SELECT COUNT(PlacementID) AS Placement_Count,
placementid, campaignid,
date
FROM table1
GROUP BY placementid,
campaignid,
date) c
ON A.[PlacementID] = C.placementid
AND A.campaignid = C.campaignid
AND A.date = C.date
GROUP BY COALESCE(A.PlacementID, B.PlacementID),
COALESCE(A.campaignid, b.campaignid),
COALESCE(A.date, b.date)
Sample SQL Fiddle
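The reason the COALESCE calls matter is that a row present only in table2 has NULL in every A column, so grouping by A's columns alone cannot label it. A rough Python sketch of the same idea follows, with the tables modelled as dictionaries keyed by (date, placement, campaign). This is only an illustration of full-outer-join semantics, not SQL Server behaviour, and it ignores the placement_count division because every key is unique in the sample.

```python
# Keys are (date, placement_id, campaign_id); values are the measure.
table1 = {  # impressions
    ("2014-04-01", 100, 10): 1000,
    ("2014-04-01", 101, 10): 1500,
    ("2014-04-01", 100, 11): 500,
}
table2 = {  # cost
    ("2014-04-01", 100, 10): 5000,
    ("2014-04-01", 101, 10): 6000,
    ("2014-04-01", 100, 11): 7000,
    ("2014-04-01", 103, 10): 8000,
}


def full_outer_join(left, right):
    """Return (date, placement, campaign, impressions, cost) for every key on either side.

    Taking the key from whichever side has it plays the role of
    COALESCE(A.col, B.col); the missing measure comes back as None (NULL).
    """
    joined = []
    for key in sorted(set(left) | set(right)):
        joined.append((*key, left.get(key), right.get(key)))
    return joined


joined = full_outer_join(table1, table2)
for row in joined:
    print(row)
# ('2014-04-01', 100, 10, 1000, 5000)
# ('2014-04-01', 100, 11, 500, 7000)
# ('2014-04-01', 101, 10, 1500, 6000)
# ('2014-04-01', 103, 10, None, 8000)
```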

Compute outstanding amounts in MySQL

I am having an issue with a SELECT command in MySQL. I have a database of securities exchanged daily, with maturities from 1 to 1000 days (more than 1 million rows). I would like to get the outstanding amount per day (and possibly per category). To give an example, suppose this is my initial dataset:
DATE VALUE MATURITY
1 10 3
1 15 2
2 10 1
3 5 1
I would like to get the following output
DATE OUTSTANDING_AMOUNT
1 25
2 35
3 15
The outstanding amount is calculated as the total of the securities exchanged that are still 'alive'. That means that on day 2 there is a new exchange for 10 and two old exchanges (10 and 15) still outstanding, as their maturity is longer than one day, for a total outstanding amount of 35 on day 2. On day 3 there is a new exchange for 5 and an old exchange of 10 from day 1, giving an outstanding amount of 15.
Here's a more visual explanation:
Monday   Tuesday   Wednesday
10       10        10          (Day 1, Value 10, matures in 3 days)
15       15                    (Day 1, 15, 2 days)
         10                    (Day 2, 10, 1 day)
                   5           (Day 3, 5, 3 days with remainder not shown)
-------------------------------------
25       35        15          (Outstanding amount on each day)
Is there a simple way to get this result?
First, the main subquery finds the SUM of all values for the current date. A second, correlated subquery then adds the values from previous dates that are still outstanding according to their MATURITY.
SQLFiddle demo
select T1.Date,
       T1.SumValue +
       IFNULL((select SUM(VALUE)
               from T
               where T1.Date between T.Date + 1 and T.Date + Maturity - 1), 0) as OUTSTANDING_AMOUNT
FROM (
      select Date,
             sum(Value) as SumValue
      from T
      group by Date
     ) T1
order by DATE
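The two-step logic above (same-day total plus earlier rows whose lifetime still covers the date) can be verified on the question's four rows. This sketch runs the same query shape through Python's built-in sqlite3 module instead of MySQL, with the table name T taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Date INTEGER, Value INTEGER, Maturity INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, 10, 3), (1, 15, 2), (2, 10, 1), (3, 5, 1)],
)

query = """
SELECT T1.Date,
       T1.SumValue +
       IFNULL((SELECT SUM(Value)
               FROM T
               -- earlier rows that are still alive on T1.Date
               WHERE T1.Date BETWEEN T.Date + 1 AND T.Date + Maturity - 1), 0)
           AS outstanding_amount
FROM (SELECT Date, SUM(Value) AS SumValue
      FROM T
      GROUP BY Date) T1
ORDER BY T1.Date
"""
outstanding = conn.execute(query).fetchall()
print(outstanding)
# [(1, 25), (2, 35), (3, 15)]
```

A row dated d with maturity m counts on days d through d + m - 1. The inner subquery covers days d + 1 onward, and day d itself is already included in SumValue.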
I'm not sure if this is what you are looking for; perhaps you could give more detail.
select
DATE
,sum(VALUE) as OUTSTANDING_AMOUNT
from
NameOfYourTable
group by
DATE
Order by
DATE
I hope this helps
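For each date, every row is considered for inclusion in the sum of VALUE; a row counts if the date falls within its lifetime:
SELECT d.DATE, SUM(m.VALUE) AS OUTSTANDING_AMOUNT
FROM (SELECT DISTINCT DATE FROM yourTable) AS d
JOIN yourTable AS m ON d.DATE BETWEEN m.DATE AND m.DATE + m.MATURITY - 1
GROUP BY d.DATE
ORDER BY d.DATE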
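A possible solution with a tally (numbers) table. The inline tally below only goes up to 5; a real numbers table would have to cover the longest maturity: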
SELECT date, SUM(value) outstanding_amount
FROM
(
SELECT date + maturity - n.n date, value, maturity
FROM table1 t JOIN
(
SELECT 1 n UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
) n ON n.n <= maturity
) q
GROUP BY date
Output:
| DATE | OUTSTANDING_AMOUNT |
-----------------------------
| 1 | 25 |
| 2 | 35 |
| 3 | 15 |
Here is SQLFiddle demo