Getting the total sum of data in a particular timeframe - mysql

So I have 2 tables called listings and logs with the following headers:
Logs Table:
The logs table makes a new record every status change made and passes the current timestamp as the logtime, and the listings table changes/updates the status and updates its update_date. Now to get the total listings as of today I'm using the following statement:
SELECT SUM(status = 'D') AS draft, SUM(status = 'N') AS action, SUM(status = 'Y') AS publish
FROM `crm_listings` where updated_date between '2021-05-29' and '2021-05-29'
Basically I want to return all the records that happened between a particular timeframe.
Note: Refno isn't unique in my logs table since a product with the same refno can be marked as publish 1 day and unpublish another, but it is unique in my listings table.

I assume every listing appears in log_table, even one that was created and has had no status change since then.
For each refno, you then only need the log record with the latest logtime up to the query date; the status_to of that record is the listing's standing status on that date.
To achieve this, sort log_table by refno and by logtime in descending order, and take the first record for each refno.
SELECT SUM(status_to = 'Action'), SUM(status_to = 'Publish') FROM (
SELECT refno, status_to, ROW_NUMBER() OVER(PARTITION BY refno ORDER BY logtime DESC) AS RN
FROM log_table
WHERE logtime <= '2021-05-29 23:59:59'
) r
WHERE r.RN = 1
Or, for older MySQL servers without window functions (before 8.0), use user variables:
SELECT SUM(status_to = 'Action'), SUM(status_to = 'Publish') FROM (
SELECT refno, status_to, CASE WHEN @RF != refno THEN @RN := 1 ELSE @RN := @RN + 1 END AS RN, @RF := refno
FROM log_table
JOIN ( SELECT @RN := 0 , @RF := '' ) f
WHERE logtime <= '2021-05-29 23:59:59'
ORDER BY refno ASC, logtime DESC
) r
WHERE r.RN = 1
If log_table is huge and this query is important, consider creating another table to store these daily statistics.
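The "latest log row per refno as of a date" lookup above can be tried out on a tiny data set. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (it needs a bundled SQLite of 3.25 or later for window functions), the sample rows are made up, and it groups by status_to instead of writing one SUM per status.

```python
import sqlite3

def status_counts_as_of(conn, cutoff):
    """Count refnos by their latest status_to at or before `cutoff`."""
    rows = conn.execute(
        """
        SELECT status_to, COUNT(*)
        FROM (
            SELECT refno, status_to,
                   ROW_NUMBER() OVER (PARTITION BY refno ORDER BY logtime DESC) AS rn
            FROM log_table
            WHERE logtime <= ?
        ) r
        WHERE rn = 1
        GROUP BY status_to
        """,
        (cutoff,),
    ).fetchall()
    return dict(rows)

def make_log_db():
    """Build an in-memory log_table with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log_table (refno INTEGER, status_to TEXT, logtime TEXT)")
    conn.executemany(
        "INSERT INTO log_table VALUES (?, ?, ?)",
        [
            (1, "Action", "2021-05-27 09:00:00"),
            (1, "Publish", "2021-05-29 10:00:00"),
            (2, "Action", "2021-05-28 12:00:00"),
            (2, "Publish", "2021-05-30 08:00:00"),  # logged after 2021-05-29
        ],
    )
    return conn
```

Calling status_counts_as_of(make_log_db(), "2021-05-29 23:59:59") counts refno 2 under Action, because its Publish row was logged after the cutoff.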

Related

Subtracting or Adding data based on logtime of another table

So currently I have 2 tables, a listings table and a logs table. The listings table holds a product's reference number and its current status. Suppose its status is currently Publish and it gets sold: the status updates to Sold. The refno in this table is unique, since a product's status is updated in place.
Now I have another table called the logs table, which records all the status changes that have happened for a particular product (referenced by refno) in a particular timeframe. Suppose the product with refno 5 was set to Publish on 1st October and Sold on 2nd October; the logs table will show:
Refno  status_from  status_to  logtime
5      Stock        Publish    2021-10-01
5      Publish      Sold       2021-10-02
This is what my tables currently look like:
Listings table:('D'=>'Draft','N'=>'Action','Y'=>'Publish')
Logs Table which I'm getting using the following statement:
SELECT refno, logtime, status_from, status_to FROM (
SELECT refno, logtime, status_from, status_to, ROW_NUMBER() OVER(PARTITION BY refno ORDER BY logtime DESC)
AS RN FROM crm_logs WHERE logtime < '2021-10-12 00:00:00' ) r
WHERE r.RN = 1 UNION SELECT refno, logtime, status_from, status_to
FROM crm_logs WHERE logtime <= '2021-10-12 00:00:00' AND logtime >= '2015-10-02 00:00:00'
ORDER BY `refno` ASC
The logs table makes a new record every status change made and passes the current timestamp as the logtime, and the listings table changes/updates the status and updates its update_date. Now to get the total listings as of today I'm using the following statement:
SELECT SUM(status_to = 'D') AS draft, SUM(status_to = 'N') AS action, SUM(status_to = 'Y') AS publish FROM `crm_listings`
And this returns all the count data for status as of the current day.
Now this is where it gets confusing for me. Suppose the count under Action was 10 yesterday and is 15 today, and I want to retrieve yesterday's total (10). To do this I would have to take today's total (15) and subtract all the log entries where a product was changed to Action between yesterday and today (total count today in the listings table - count(*) where status_to='Action' in the logs table). Or vice versa: if it was 10 under Action yesterday and is 5 today, it should add back the values from the status_from column in the logs table.
Note: Refno isn't unique in my logs table since a product with the same refno can be marked as publish 1 day and unpublish another, but it is unique in my listings table.
Link to dbfiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=01cb3ccfda09f6ddbbbaf02ec92ca894
I am sure this can be simplified or improved, but here is my query and logic:
I found the status changes per refno and calculated the total changes from the desired day to the present:
select status_logs, sum(cnt_status) to_add from (
SELECT
status_to as status_logs, -1*count(*) cnt_status
FROM logs lm
where
id = (select max(id) from logs l where l.refno = lm.refno) and
logtime >= '2021-10-01 00:00:00'
group by status_to
union all
SELECT
status_from, count(*) cnt_status_from
FROM logs lm
where
id = (select max(id) from logs l where l.refno = lm.refno) and
logtime >= '2021-10-01 00:00:00'
group by status_from ) total_changes
group by status_logs
I matched the keys between the listings table and the logs table by converting the listings table keys:
select
case status
when 'D' THEN 'Draft'
when 'A' THEN 'Action'
when 'Y' THEN 'Publish'
when 'S' THEN 'Sold'
when 'N' THEN 'Let'
END status_l ,COUNT(*) c
from listings
group by status
I joined them and added the calculated changes to the current totals.
I needed a full outer join, which MySQL lacks, so I used one left join and one right join over the same subqueries.
Lastly I used DISTINCT, since each joined query generates the same rows, and IFNULL to fall back to the logs status when the listings status is missing.
select distinct IFNULL(status_l, status_logs) status, counts_at_2021_10_01
from (select l.*,
logs.*,
l.c + ifnull(logs.to_add, 0) counts_at_2021_10_01
from (select case status
when 'D' THEN
'Draft'
when 'A' THEN
'Action'
when 'Y' THEN
'Publish'
when 'S' THEN
'Sold'
when 'N' THEN
'Let'
END status_l,
COUNT(*) c
from listings
group by status) l
left join (
select status_logs, sum(cnt_status) to_add
from (SELECT status_to as status_logs,
-1 * count(*) cnt_status
FROM logs lm
where id = (select max(id)
from logs l
where l.refno = lm.refno)
and logtime >= '2021-10-01 00:00:00'
group by status_to
union all
SELECT status_from, count(*) cnt_status_from
FROM logs lm
where id = (select max(id)
from logs l
where l.refno = lm.refno)
and logtime >= '2021-10-01 00:00:00'
group by status_from) total_changes
group by status_logs) logs
on logs.status_logs = l.status_l
union all
select l.*,
logs.*,
l.c + ifnull(logs.to_add, 0) counts_at_2021_10_01
from (select case status
when 'D' THEN
'Draft'
when 'A' THEN
'Action'
when 'Y' THEN
'Publish'
when 'S' THEN
'Sold'
when 'N' THEN
'Let'
END status_l,
COUNT(*) c
from listings
group by status) l
right join (
select status_logs, sum(cnt_status) to_add
from (SELECT status_to as status_logs,
-1 * count(*) cnt_status
FROM logs lm
where id = (select max(id)
from logs l
where l.refno = lm.refno)
and logtime >= '2021-10-01 00:00:00'
group by status_to
union all
SELECT status_from, count(*) cnt_status_from
FROM logs lm
where id = (select max(id)
from logs l
where l.refno = lm.refno)
and logtime >= '2021-10-01 00:00:00'
group by status_from) total_changes
group by status_logs) logs
on logs.status_logs = l.status_l) l
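The roll-back idea (take the current count, subtract one per status_to and add one per status_from) can also be checked outside SQL. The sketch below is an assumption-laden variation, not the query above: it applies every log row since the target date instead of only the latest row per refno, so intermediate hops of the same product cancel out, and it assumes the log status names already match the listing labels. The function name and sample numbers are made up.

```python
from collections import Counter

def counts_at_start(current_counts, changes_since):
    """Roll current per-status counts back by undoing logged transitions.

    changes_since: iterable of (status_from, status_to) pairs logged after the
    target date. Undoing a change removes one from status_to and adds one to
    status_from; chained changes for one refno cancel in the middle.
    """
    counts = Counter(current_counts)
    for status_from, status_to in changes_since:
        counts[status_to] -= 1
        counts[status_from] += 1
    # Drop statuses whose count rolled back to zero.
    return {status: n for status, n in counts.items() if n != 0}
```

For example, with 15 listings under Action today and five Draft to Action changes logged since yesterday, counts_at_start({"Action": 15, "Draft": 3}, [("Draft", "Action")] * 5) gives 10 under Action and 8 under Draft.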

How to display only the second purchase made per account

I have a transactions table which shows various transactions made by several accounts. Some accounts make only one, others more than that. At the moment the SQL I have prints out the first purchase of each account, but I need it to print out the second purchase made by each account.
SELECT account_id
, purchase_date as second_purchase
, amount as second_purchase_amount
FROM Transactions t
WHERE purchase_date NOT IN (SELECT MIN(purchase_date)
FROM Transactions m
)
GROUP BY account_id
HAVING purchase_date = MIN(purchase_date);
What needs to change so that the second purchase date and amount are chosen? I tried adding in a count for the account_id but it was giving me the wrong value.
You can use variables to assign row numbers and get the 2nd purchase.
SELECT account_id,purchase_Date,amount
FROM (
SELECT account_id
,purchase_date
,amount
-- , @rn:=IF(account_id=@a_id and @pdate <> purchase_date,@rn+1,1) as rnum
,case when account_id=@a_id and @pdate <> purchase_date then @rn:=@rn+1
when account_id=@a_id and @pdate=purchase_date then @rn:=@rn
else @rn:=1 end as rnum
, @pdate:=purchase_date
, @a_id:=account_id
FROM Transactions t
CROSS JOIN (SELECT @rn:=0,@a_id:=-1,@pdate:='') r
ORDER BY account_id, purchase_date
) x
WHERE rnum=2
Explanation of how it works:
@rn:=0,@a_id:=-1,@pdate:='' - Declare 3 variables and initialize them: @rn for assigning the row numbers, @a_id to hold the account_id and @pdate to hold the purchase_date.
For the first row (ordered by account_id and purchase_date), account_id is compared with @a_id and purchase_date with @pdate. As they are not equal, the when conditions fail and the else part assigns @rn=1. The variable assignments happen after this: @a_id and @pdate are updated to the current row's values. For the second row, if it is the same account on a different date, the first when condition is executed and @rn is incremented by 1. If there are ties, the second when condition is executed and @rn remains the same. You can run the inner query to check how the variables are assigned.
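The numbering described here (restart at 1 for each account, increase on a new date, stay the same on a tie) behaves like DENSE_RANK() partitioned by account_id and ordered by purchase_date. As a sketch only, here it is run through Python's sqlite3 module as a stand-in for a MySQL 8.0 server; the table contents are invented, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

def second_purchases(conn):
    """Return the rows that fall on each account's second distinct purchase date."""
    return conn.execute(
        """
        SELECT account_id, purchase_date, amount
        FROM (
            SELECT account_id, purchase_date, amount,
                   DENSE_RANK() OVER (PARTITION BY account_id
                                      ORDER BY purchase_date) AS rnum
            FROM Transactions
        ) x
        WHERE rnum = 2
        ORDER BY account_id
        """
    ).fetchall()

def make_transactions_db():
    """Build an in-memory Transactions table with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Transactions (account_id INTEGER, purchase_date TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO Transactions VALUES (?, ?, ?)",
        [
            (1, "2021-01-01", 10.0),
            (1, "2021-01-05", 20.0),
            (1, "2021-01-09", 30.0),
            (2, "2021-02-01", 5.0),   # only one purchase, so no second row
            (3, "2021-03-01", 7.0),
            (3, "2021-03-04", 9.0),
        ],
    )
    return conn
```

Account 2 has a single purchase and therefore does not appear in the result.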
Number the rows and choose RowNumber = 2
select *
from (
select
@rn := case when @account_id = account_id then @rn + 1 else 1 end as RowNumber,
@account_id := account_id as account_id,
purchase_date
from
(select @rn := 1, @account_id := null) x,
(select account_id, purchase_date
from Transactions
order by account_id, purchase_date) y
) z
where RowNumber = 2;

figure out total seconds based on timestamp

I have a table with three columns (user, timestamp, activity). The activity is basically check in or check out. How do I generate a query to view total seconds a user has clocked in by day?
There is an issue with the system that sometimes has two check ins but only one check out...in which case I want to only take the lowest check in timestamp.
I came up with something like this (activity: 1 = check in, 0 = checkout):
select a.user_id, a.d, time_to_sec(TIMEDIFF(b.created_at, a.created_at)) total_secs,
a.created_at check_in, b.created_at check_out
from (select user_id, created_at, date(created_at) d, @rownum := @rownum + 1 AS num from table, (SELECT @rownum := 0) r where activity = 1 order by user_id, created_at) a
join (select user_id, created_at, date(created_at) d, @rownum2 := @rownum2 + 1 AS num from table, (SELECT @rownum2 := 0) r where activity = 0 order by user_id, created_at) b
on a.user_id = b.user_id and a.d = b.d and a.num = b.num
However, I think relying on rownum is not accurate.
If I understand your requirement correctly, you want the number of seconds between a checkout and the earliest check-in before that, but after the previous checkout.
So what you would need is a query that joins a checkout with the previous checkout and then with the earliest check-in between these two activities.
select
curr_co.user_id,
curr_co.created_at,
max(prev_co.created_at)
from
table as curr_co
left outer join
table as prev_co
on (curr_co.user_id = prev_co.user_id and curr_co.created_at > prev_co.created_at and prev_co.activity = _checkout_)
where curr_co.activity = _checkout_
group by curr_co.user_id, curr_co.created_at
This query should give you a list of checkouts with previous checkouts per user.
Now select all check-ins in between and from those the minimal and calculate the diff.
select ... min(ci.created_at), time_to_sec(TIMEDIFF(curr_co.created_at, min(ci.created_at)))
... join
table as ci
on (ci.user_id = curr_co.user_id and ci.created_at < curr_co.created_at and ci.created_at > prev_co.created_at and ci.activity = _check-in_)
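The pairing rule (each checkout is matched with the earliest check-in since the previous checkout) may be easier to see in procedural form. This is a rough sketch in plain Python rather than SQL; the function name and the tuple layout are assumptions, with activity 1 meaning check in and 0 meaning check out as in the question.

```python
from datetime import datetime

def clocked_sessions(events):
    """Pair each checkout with the earliest check-in since the previous checkout.

    events: iterable of (user_id, timestamp, activity) tuples.
    Returns a list of (user_id, check_in, check_out, seconds).
    """
    sessions = []
    open_check_in = {}  # user_id -> earliest unmatched check-in
    for user_id, ts, activity in sorted(events, key=lambda e: (e[0], e[1])):
        if activity == 1:
            # A repeated check-in does not replace the earlier one.
            open_check_in.setdefault(user_id, ts)
        elif user_id in open_check_in:
            check_in = open_check_in.pop(user_id)
            seconds = int((ts - check_in).total_seconds())
            sessions.append((user_id, check_in, ts, seconds))
    return sessions
```

Summing the seconds per user and per date of the check-in would then give the daily totals the question asks for. A checkout with no open check-in is skipped.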

MySql counting instances of event

So I have an event log that logs every 5 minutes so my logs look something like this:
OK
Event1
Event1
Event1
OK
Event1
OK
Event1
Event1
Event1
OK
In this case I'd have 3 instances of "Event1", since it had an "OK" period in between the periods when that status was returned.
Is there some decent way to handle this via MySQL? (Note, there are statuses other than Event1 / OK that come up quite regularly.)
The actual Sql structure looks something like this:
-Historical
--CID //Unique Identifier, INT, AI
--ID //Unique Identifier for LOCATION, INT
--LOCATION //Unique Identifier for Location, this is the site name, VarChar
--STATUS //Pulled from Software event logger, VarChar
--TIME //Pulled from Software event logger, DateTime
Another answer using a totally different way of doing it:-
SELECT MAX(@Counter) AS EventCount -- Get the max counter
FROM (SELECT @Counter:=@Counter + IF(status = 'OK' AND @PrevStatus = 1, 1, 0), -- If it is an OK record and the prev status was not an OK then add 1 to the counter
@PrevStatus:=CASE
WHEN status = 'OK' THEN @PrevStatus := 2 -- An OK status so save as a prev status of 2
WHEN status != 'OK' AND @PrevStatus != 0 THEN @PrevStatus := 1 -- A non OK status but when there has been a previous OK status
ELSE @PrevStatus:=0 -- Set the prev status to 0, ie, for a record where there is no previous OK status
END
FROM (SELECT * FROM historical ORDER BY TimeStamp) a
CROSS JOIN (SELECT @Counter:=0, @PrevStatus := 0) b -- Initialise counter and store of prev status.
)c
This is using user variables. It has a subselect to get the records back in the right order, then uses a user variable to store a code for the previous status. Starts at 0 and when it finds a status of OK it sets the previous status to a 2. If it finds a status other than OK then it sets the prev status to 1, but ONLY if the prev status is not 0 (ie, it has already found a status of OK). Before storing the prev status code, if the current status is OK and the prev status code is a 1 then it adds 1 to the counter, otherwise it adds 0 (ie, adds nothing)
Then it just has a select around the outside to select the max value of the counter.
Seems to work but hardly readable!
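Since the SQL is hard to read, the same state machine can be written out in a few lines of Python to see what it counts. This is only an illustration of the logic described above (0 = no OK seen yet, 1 = non-OK after an OK, 2 = OK); the function name is made up and the input is assumed to be already in time order.

```python
def count_event_periods(statuses):
    """Count runs of non-OK statuses that are closed by an OK and preceded by one."""
    counter = 0
    prev = 0  # 0 = no OK seen yet, 1 = non-OK after an OK, 2 = OK
    for status in statuses:
        if status == "OK":
            if prev == 1:
                counter += 1  # an OK ends a non-OK period
            prev = 2
        elif prev != 0:
            prev = 1
        # a non-OK status before the first OK leaves prev at 0
    return counter
```

Note that a period still open at the end of the log, or one that starts before the first OK, is not counted by this logic.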
EDIT - To cope with multiple ids
SELECT id, MAX(aCounter) AS EventCount -- Get the max counter for each id
FROM (SELECT id,
@PrevStatus:= IF(@Previd = id, @PrevStatus, 0), -- If the id has changed then set the store of previous status to 0
status,
@Counter:=IF(@Previd = id, @Counter + IF(status = 'OK' AND @PrevStatus = 1, 1, 0), 0) AS aCounter, -- If it is an OK record and the prev status was not an OK and was for the same id then add 1 to the counter
@PrevStatus:=CASE
WHEN status = 'OK' THEN @PrevStatus := 2 -- An OK status so save as a prev status of 2
WHEN status != 'OK' AND @PrevStatus != 0 THEN @PrevStatus := 1 -- A non OK status but when there has been a previous OK status
ELSE @PrevStatus:=0 -- Set the prev status to 0, ie, for a record where there is no previous OK status
END,
@Previd := id
FROM (SELECT * FROM historical ORDER BY id, TimeStamp) a
CROSS JOIN (SELECT @Counter:=0, @PrevStatus := 0, @Previd := 0) b
)c
GROUP BY id -- Group by clause to allow the selection of the max counter per id
Which is even less readable!
Another option, again using user variables to generate a sequence number:-
SELECT Sub1.id, COUNT(DISTINCT Sub1.aCounter) -- Count the number of distinct Sub1 records found for an id (without the distinct counter it would count all the records between OK status records)
FROM (
SELECT id,
`TimeStamp`,
@Counter1:=IF(@Previd1 = id, @Counter1 + 1, 0) AS aCounter, -- Counter for this status within id
@Previd1 := id -- Store the id, used to determine if the id has changed and so whether to start the counters at 0 again
FROM (SELECT * FROM historical WHERE status = 'OK' ORDER BY id, `TimeStamp`) a -- Just get the OK status records, in id / timestamp order
CROSS JOIN (SELECT @Counter1:=0, @Previd1 := 0) b -- Initialise the user variables.
) Sub1
INNER JOIN (SELECT id,
`TimeStamp`,
@Counter2:=IF(@Previd2 = id, @Counter2 + 1, 0) AS aCounter,-- Counter for this status within id
@Previd2 := id-- Store the id, used to determine if the id has changed and so whether to start the counters at 0 again
FROM (SELECT * FROM historical WHERE status = 'OK' ORDER BY id, `TimeStamp`) a -- Just get the OK status records, in id / timestamp order
CROSS JOIN (SELECT @Counter2:=0, @Previd2 := 0) b -- Initialise the user variables.
) Sub2
ON Sub1.id = Sub2.id -- Join the 2 subselects based on the id
AND Sub1.aCounter + 1 = Sub2.aCounter -- and also the counter. So Sub1 is an OK status, while Sub2 is the next OK status for that id
INNER JOIN historical Sub3 -- Join back against historical
ON Sub1.id = Sub3.id -- on the matching id
AND Sub1.`TimeStamp` < Sub3.`TimeStamp` -- and where the timestamp is greater than the timestamp in the Sub1 OK record
AND Sub2.`TimeStamp` > Sub3.`TimeStamp` -- and where the timestamp is less than the timestamp in the Sub2 OK record
GROUP BY Sub1.id -- Group by the Sub1 id
This is grabbing the table twice for just the status OK records, adding a sequence number each time and matching where the id matches and the sequence number on the 2nd copy is 1 greater than the first one (ie, it is finding each OK and the OK immediately following it). Then joins that against the table where the id matches and the timestamp is between the 2 OK records. Then counts the distinct occurrences of the first counter for each id.
This should be a bit more readable.
Quick try, and I have a feeling I am missing a far better way to do this but think this will work.
SELECT COUNT(*)
FROM
(
SELECT DISTINCT a.time, b.time
FROM Historical a
INNER JOIN Historical b
ON a.time < b.time
AND a.status = 'OK'
AND b.status = 'OK'
INNER JOIN Historical c
ON a.time < c.time
AND c.time < b.time
AND c.status = 'Event1'
LEFT OUTER JOIN Historical d
ON a.time < d.time
AND d.time < b.time
AND d.status = 'OK'
WHERE d.cid IS NULL
) Sub1
Joins the table against itself repeatedly. Alias a and b should be for OK events, with c being for any Event1 event between those dates. Alias d is looking for an OK event between a and b, and if any are found then the record is dropped in the WHERE clause.
Then use DISTINCT to get rid of the duplicates. Then count the result.
Possibly it could be simplified to something like the following (although it is probably best to cast the dates to chars in the select if doing this):
SELECT COUNT(DISTINCT CONCAT(a.time, b.time))
FROM Historical a
INNER JOIN Historical b
ON a.time < b.time
AND a.status = 'OK'
AND b.status = 'OK'
INNER JOIN Historical c
ON a.time < c.time
AND c.time < b.time
AND c.status = 'Event1'
LEFT OUTER JOIN Historical d
ON a.time < d.time
AND d.time < b.time
AND d.status = 'OK'
WHERE d.cid IS NULL
What you want to count, it seems, are instances of an event when the previous record is OK. You identify these with a correlated subquery, and then summarize to get the numbers:
select status, count(*)
from (select h.*,
(select h2.status
from historical h2
where h2.time < h.time
order by h2.time desc
limit 1
) as prevStatus
from historical h
) h
where status <> 'OK' and (prevStatus = 'OK' or prevStatus is NULL)
group by status;
It is not clear which column contains the values OK and Event1. I'm guessing it is status. I also don't know what role location plays, but this should at least get you started.
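On a server with window functions, the "previous record is OK" test could presumably be written with LAG() instead of a correlated subquery. The following is a sketch under that assumption, run through Python's sqlite3 module as a stand-in for MySQL 8.0 (SQLite 3.25 or later needed); the table layout is reduced to the columns used, the timestamps are invented, and like the query above it ignores ID and location.

```python
import sqlite3

def make_historical_db(statuses):
    """Build an in-memory table with one row per status, five minutes apart."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE historical (cid INTEGER PRIMARY KEY, status TEXT, time TEXT)"
    )
    conn.executemany(
        "INSERT INTO historical (status, time) VALUES (?, ?)",
        # Made-up timestamps; fine for short samples of up to 12 rows.
        [(s, "2021-01-01 00:%02d:00" % (5 * i)) for i, s in enumerate(statuses)],
    )
    return conn

def count_event_starts(conn):
    """Count non-OK rows whose previous row (by time) is OK or missing."""
    rows = conn.execute(
        """
        SELECT status, COUNT(*)
        FROM (
            SELECT status, LAG(status) OVER (ORDER BY time) AS prev_status
            FROM historical
        ) h
        WHERE status <> 'OK' AND (prev_status = 'OK' OR prev_status IS NULL)
        GROUP BY status
        """
    ).fetchall()
    return dict(rows)
```

With the sample log from the question this reports Event1 three times.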

Checking for maximum length of consecutive days which satisfy specific condition

I have a MySQL table with the structure:
beverages_log(id, users_id, beverages_id, timestamp)
I'm trying to compute the maximum streak of consecutive days during which a user (with id 1) logs a beverage (with id 1) at least 5 times each day. I'm pretty sure that this can be done using views as follows:
CREATE or REPLACE VIEW daycounts AS
SELECT count(*) AS n, DATE(timestamp) AS d FROM beverages_log
WHERE users_id = '1' AND beverages_id = 1 GROUP BY d;
CREATE or REPLACE VIEW t AS SELECT * FROM daycounts WHERE n >= 5;
SELECT MAX(streak) AS current FROM ( SELECT DATEDIFF(MIN(c.d), a.d)+1 AS streak
FROM t AS a LEFT JOIN t AS b ON a.d = ADDDATE(b.d,1)
LEFT JOIN t AS c ON a.d <= c.d
LEFT JOIN t AS d ON c.d = ADDDATE(d.d,-1)
WHERE b.d IS NULL AND c.d IS NOT NULL AND d.d IS NULL GROUP BY a.d) allstreaks;
However, repeatedly creating views for different users every time I run this check seems pretty inefficient. Is there a way in MySQL to perform this computation in a single query, without creating views or repeatedly calling the same subqueries a bunch of times?
This solution seems to perform quite well as long as there is a composite index on users_id and beverages_id -
SELECT *
FROM (
SELECT t.*, IF(@prev + INTERVAL 1 DAY = t.d, @c := @c + 1, @c := 1) AS streak, @prev := t.d
FROM (
SELECT DATE(timestamp) AS d, COUNT(*) AS n
FROM beverages_log
WHERE users_id = 1
AND beverages_id = 1
GROUP BY DATE(timestamp)
HAVING COUNT(*) >= 5
) AS t
INNER JOIN (SELECT @prev := NULL, @c := 1) AS vars
) AS t
ORDER BY streak DESC LIMIT 1;
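The running-counter idea in this query (add 1 when the day follows the previous qualifying day, otherwise restart at 1) can be sketched in Python for comparison. The function name is hypothetical, and the input is assumed to be the days that already passed the "at least 5 logs" filter, i.e. the output of the inner grouped query.

```python
from datetime import date, timedelta

def longest_streak(qualifying_days):
    """Length of the longest run of consecutive calendar days in the input."""
    best = current = 0
    prev = None
    for day in sorted(set(qualifying_days)):
        if prev is not None and day == prev + timedelta(days=1):
            current += 1  # the streak continues
        else:
            current = 1   # first day, or a gap: start a new streak
        best = max(best, current)
        prev = day
    return best
```

Unlike the SQL, the sketch sorts its input explicitly, so it does not depend on the order in which rows arrive.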
Why not include users_id in the daycounts view and group by users_id and date?
Also include users_id in view t.
Then, when you are querying against t, add the users_id to the WHERE clause.
That way you don't have to recreate your views for every single user; you just need to remember to include it in your WHERE clause.
That's a little tricky. I'd start with a view to summarize events by day:
CREATE VIEW BView AS
SELECT UserID, BevID, CAST(EventDateTime AS DATE) AS EventDate, COUNT(*) AS NumEvents
FROM beverages_log
GROUP BY UserID, BevID, CAST(EventDateTime AS DATE)
I'd then use a Dates table (just a table with one row per day; very handy to have) to examine all possible date ranges and throw out any with a gap. This will probably be slow as hell, but it's a start:
SELECT
UserID, BevID, MAX(StreakLength) AS StreakLength
FROM
(
SELECT
B1.UserID, B1.BevID, B1.EventDate AS StreakStart, DATEDIFF(EndDate.Date, StartDate.Date) AS StreakLength
FROM
BView AS B1
INNER JOIN Dates AS StartDate ON B1.EventDate = StartDate.Date
INNER JOIN Dates AS EndDate ON EndDate.Date > StartDate.Date
WHERE
B1.NumEvents >= 5
-- Exclude this potential streak if there's a day with no activity
AND NOT EXISTS (SELECT * FROM Dates AS MissedDay WHERE MissedDay.Date > StartDate.Date AND MissedDay.Date <= EndDate.Date AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND MissedDay.Date = B2.EventDate))
-- Exclude this potential streak if there's a day with less than five events
AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND B2.EventDate > StartDate.Date AND B2.EventDate <= EndDate.Date AND B2.NumEvents < 5)
) AS X
GROUP BY
UserID, BevID