Count consecutive row occurrences - mysql

I have a MySQL table with three columns: takenOn (datetime - primary key), sleepDay (date), and type (int). This table contains my sleep data from when I go to bed to when I get up (at a minute interval).
As an example, if I go to bed on Oct 29th at 11:00pm and get up on Oct 30th at 6:00am, I will have 420 records (7 hours * 60 minutes). takenOn will range from 2016-10-29 23:00:00 to 2016-10-30 06:00:00. sleepDay will be 2016-10-30 for all 420 records. type is the "quality" of my sleep (1=asleep, 2=restless, 3=awake). I'm trying to get how many times I was restless/awake, which can be calculated by counting how many times I see type=2 (or type=3) consecutively.
So far, I have the following query, which works for one day only. Is this the correct/"efficient" way of doing this (as this method requires that I have the data without any "gaps" in takenOn)? Also, how can I expand it to calculate for all possible sleepDays?
SELECT
sleepDay,
SUM(CASE WHEN type = 2 THEN 1 ELSE 0 END) AS TimesRestless,
SUM(CASE WHEN type = 3 THEN 1 ELSE 0 END) AS TimesAwake
FROM
(SELECT s1.sleepDay, s1.type
FROM sleep s1
LEFT JOIN sleep s2
ON s2.takenOn = ADDTIME(s1.takenOn, '00:01:00')
WHERE
(s2.type <> s1.type OR s2.takenOn IS NULL)
AND s1.sleepDay = '2016-10-30'
ORDER BY s1.takenOn) a
I have created an SQL Fiddle - http://sqlfiddle.com/#!9/b33b4/3
Thank you!

Your own solution is quite alright, given the assumptions you are aware of.
I present here an alternative solution, that will deal well with gaps in the series, and can be used for more than one day at a time.
The downside is that it relies more heavily on non-standard MySQL features (inline use of user variables):
select sleepDay,
sum(type = 2) TimesRestless,
sum(type = 3) TimesAwake
from (
select @lagDay as lagDay,
@lagType as lagType,
@lagDay := sleepDay as sleepDay,
@lagType := type as type
from (select * from sleep order by takenOn) s1,
(select @lagDay := '',
@lagType := '') init
) s2
where lagDay <> sleepDay
or lagType <> type
group by sleepDay
To see how it works it can help to select the second select statement on its own. The inner-most select must have the order by clause to make sure the middle query will process the records in that order, which is important for the variable assignments that happen there.
See your updated SQL fiddle.
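On MySQL 8.0 or later, the same count can be sketched without user variables by using the LAG() window function. This is an alternative technique, not the query from the answer above; it assumes the sleep table and the takenOn, sleepDay and type columns exactly as described in the question. It counts the first row of each run instead of the last one, which gives the same totals, and it tolerates gaps in takenOn.

```sql
-- Sketch for MySQL 8.0+; table and column names are taken from the question.
SELECT sleepDay,
       SUM(type = 2) AS TimesRestless,
       SUM(type = 3) AS TimesAwake
FROM (
    SELECT sleepDay,
           type,
           -- type of the previous record within the same sleepDay
           -- (NULL for the first record of each sleepDay)
           LAG(type) OVER (PARTITION BY sleepDay ORDER BY takenOn) AS prevType
    FROM sleep
) s
-- keep only the records that start a new run of identical types
WHERE prevType IS NULL OR prevType <> type
GROUP BY sleepDay;
```

Because the window is partitioned by sleepDay, no extra comparison of the day is needed, and no filter on a single date is required.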

Related

SQl query to calculate number of active users at the end of everyday

I have three columns User_ID, New_Status and DATETIME.
New_Status contains 0(inactive) and 1(active) for users.
Every user starts from active status - ie. 1.
Subsequently table stores their status and datetime at which they got activated/inactivated.
How to calculate number of active users at the end of each date, including dates when no records were generated into the table.
Sample data:
| ID | New_Status | DATETIME |
+----+------------+---------------------+
| 1 | 1 | 2019-01-01 21:00:00 |
| 1 | 0 | 2019-02-05 17:00:00 |
| 1 | 1 | 2019-03-06 18:00:00 |
| 2 | 1 | 2019-01-02 01:00:00 |
| 2 | 0 | 2019-02-03 13:00:00 |
Format the date time value to a date only string and group by it
SELECT DATE_FORMAT(DATETIME, '%Y-%m-%d') as day, COUNT(*) as active
FROM test
WHERE New_Status = 1
GROUP BY day
ORDER BY day
In MySQL 8 you can use the row_number() window function to get the last status of each user per day. Then filter for the rows that indicate the user was active, GROUP BY the day and count them.
SELECT date(x.datetime),
count(*)
FROM (SELECT date(t.datetime) datetime,
t.new_status,
row_number() OVER (PARTITION BY t.user_id, date(t.datetime)
ORDER BY t.datetime DESC) rn
FROM elbat t) x
WHERE x.rn = 1
AND x.new_status = 1
GROUP BY x.datetime;
If not all days are in the table you need to create a (possibly derived) table with all days and cross join it.
Find out the last activity status of users whose activity was changed for each day
select User_ID, New_Status, DATE_FORMAT(DATETIME, '%Y-%m-%d')
from activity_table
where not exists
(
select 1
from activity_table at
where at.User_ID = activity_table.User_ID and
DATE_FORMAT(at.DATETIME, '%Y-%m-%d') = DATE_FORMAT(activity_table.DATETIME, '%Y-%m-%d') and
at.DATETIME > activity_table.DATETIME
)
order by DATE_FORMAT(activity_table.DATETIME, '%Y-%m-%d');
This is not the solution yet, but it is very useful information to have before building one. Note that not all dates are covered yet, and the rows are individual records (more precisely, each user's last value on each day), ordered by date.
Let's get aggregate numbers
Using the query above as a subselect and aliasing it into a table, you can group by DATETIME and do a select sum(new_Status) as activity, count(*) total, DATETIME so you will know that activity - (total - activity) is the difference in comparison to the previous day.
Knowing the delta for each day present in the result
In the previous section we saw how the delta can be calculated. If the whole query from the previous section is aliased, you can self join it using a left join on pairs of (previous date, current date). The gaps between dates are still there, but we do not worry about them just yet. For the first date, its activity is the delta. For subsequent records, adding the running total of the previous days to their own delta yields the result you need. To achieve this you can use a recursive query, supported by MySQL 8, or, alternatively, a subquery which sums the deltas of the previous days (with special attention to the first date, as described earlier); adding the current date's delta to that sum yields the result we need.
Fill the gaps
The previous section would already work perfectly (assuming there are no integrity problems) if there were activity changes on every day, but we will not continue with that assumption. We know that the figures are correct for each date where a figure is present, so we just need to add the missing dates to the result. If the results are properly ordered, as they should be, then one can use a cursor and loop over the results. At each record after the first one, we can determine which dates are missing. There might be zero or more such dates between two consecutive dates. What we do know about the gaps is that their values are exactly the same as those of the previous record that does have data: if there were no activity changes on a given date, then the number of active users is exactly the same as on the previous day. Using some structure, like a table, you can generate the results with the knowledge described here.
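As an alternative to the cursor loop, the idea above (each user's last known status, carried over to days without records) can be sketched as a single query on MySQL 8.0+. This is not the method described in this answer but a different technique: a recursive CTE generates every day in the range, and for each day only each user's latest record up to the end of that day is kept. The names activity_table, User_ID, New_Status and DATETIME are the ones used earlier; the CTE name days and the aliases are made up for illustration. It assumes every user's first record has status 1, as the question states.

```sql
-- Sketch for MySQL 8.0+; assumes activity_table(User_ID, New_Status, DATETIME).
WITH RECURSIVE days AS (
    -- anchor: first day with data, plus the last day as a stop marker
    SELECT DATE(MIN(`DATETIME`)) AS d,
           DATE(MAX(`DATETIME`)) AS last_d
    FROM activity_table
    UNION ALL
    -- recursive step: one more day until the last day is reached
    SELECT d + INTERVAL 1 DAY, last_d
    FROM days
    WHERE d < last_d
)
SELECT days.d AS report_date,
       SUM(a.New_Status) AS active_users   -- New_Status is 0 or 1
FROM days
JOIN activity_table a
  ON a.`DATETIME` < days.d + INTERVAL 1 DAY
-- keep only each user's latest record up to the end of that day
WHERE NOT EXISTS (
    SELECT 1
    FROM activity_table b
    WHERE b.User_ID = a.User_ID
      AND b.`DATETIME` > a.`DATETIME`
      AND b.`DATETIME` < days.d + INTERVAL 1 DAY
)
GROUP BY days.d
ORDER BY days.d;
```

The join against all earlier records makes this quadratic in the worst case, so it is meant to show the logic rather than to be the fastest option on a large table.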
Solving possible integrity problems
There are several possibilities for such problems:
First, a data item might have existed before records started being written to this table.
Second, bugs or any other causes might have made a pause in creating records for this activity table.
Third, the addition of a user does not (or did not) necessarily generate an activity change, since the user popping into existence renders its previous state of activity undefined and subject to human standards, which might change over time.
Fourth, the removal of a user does not (or did not) necessarily generate an activity change, since the user popping out of existence renders its current state of activity undefined and subject to human standards, which might change over time.
Fifth, there is an infinity of other issues which might cause data integrity issues.
To cope with these you will need to comprehensively analyze whatever you can from the source-code and the history of the project, including database records, logs and humanly available information to detect such anomalies, the time they were effective and figure out what their solution is if they exist.
EDIT
In the meantime I was thinking about the possibility of a user, who was active at the start of the day being deactivated and then activated again by the end of the day. Similarly, an inactive user during a day might be activated and then finally deactivated by the end of the day. For users that have more than an activation at the start of the day, we need to compare their activity status at the start and the end of the day to find out what the difference was.
SELECT
DATE(DATETIME),
COUNT(*)
FROM your_table
WHERE New_Status = 1
GROUP BY User_ID,
DATE(DATETIME)
For MySQL 8+
WITH RECURSIVE
cte AS (
SELECT MIN(DATE(DT)) dt
FROM src
UNION ALL
SELECT dt + INTERVAL 1 DAY
FROM cte
WHERE dt < ( SELECT MAX(DATE(DT)) dt
FROM src )
),
cte2 AS
(
SELECT users.id,
cte.dt,
SUM( CASE src.New_Status WHEN 1 THEN 1
WHEN 0 THEN -1
ELSE 0
END ) OVER ( PARTITION BY users.id
ORDER BY cte.dt ) status
FROM cte
CROSS JOIN ( SELECT DISTINCT id
FROM src ) users
LEFT JOIN src ON src.id = users.id
AND DATE(src.dt) = cte.dt
)
SELECT dt, SUM(status)
FROM cte2
GROUP BY dt;
fiddle
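Do not forget to adjust the maximum recursion depth (the cte_max_recursion_depth system variable, 1000 by default) if the date range spans more days than that.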
Here is what I believe is a good solution for this problem of yours:
SELECT SUM(New_Status) "Number of active users"
, DATE_FORMAT(DATEC, '%Y-%m-%d') "Date"
FROM TEST T1
WHERE DATE_FORMAT(DATEC,'%H:%i:%s') =
(SELECT MAX(DATE_FORMAT(T2.DATEC,'%H:%i:%s'))
FROM TEST T2
WHERE T2.ID = T1.ID
AND DATE_FORMAT(T1.DATEC, '%Y-%m-%d') = DATE_FORMAT(T2.DATEC, '%Y-%m-%d')
GROUP BY ID
, DATE_FORMAT(DATEC, '%Y-%m-%d'))
GROUP BY DATE_FORMAT(DATEC, '%Y-%m-%d');
Here is the DEMO

Is there any way in SQL or function in MYSQL that sums up all the increments in a column?

I want to find a way to sum up all the increments in the value of a column.
We provide delivery services to our customers. A customer can pay as he goes, but if he pays an upfront fee, he gets a better deal. There is a table that holds the customer's balance over time. So I want to sum all the increments to the balance. I can't change the way the payment is recorded.
I have already coded a stored procedure that works, but it is rather slow, so I'm looking for alternatives. I think that a single SQL statement doing this task might outperform my stored procedure, which uses loops.
My stored procedure selects the customer's rows in a given date range and inserts the result into a temp table X. After that, it starts to pop rows from table X, comparing the balance value in each row against the previous row to detect whether there is an increment. If there is no increment, it pops another row and repeats the routine; if there is an increment, it calculates the difference between that row and the previous one, and the result is inserted into another temp table Y.
When there are no rows left, the stored procedure performs a SUM on temp table Y, and thus you know how much the customer has "refilled" his balance.
This is an example of the table X, and the expected result:
DATE BALANCE
---- -------
2019-02-01 200
2019-02-02 195 //from 200 to 195 there is a decrement, so it doesn't matter
2019-02-03 180
2019-02-04 150
2019-02-05 175 //there is an increment from 150 to 175, it's 25 that must be inserted in the temp table
2019-02-06 140
2019-02-07 180 //there is another increment, from 140 to 180, it's 40
So the resulting temp table Y must be something like this:
REFILL
------
25
40
The expected result is 65. My stored procedure returns this value, but as I said, it is rather slow (it takes about 22 seconds to process 3900 rows, equivalent to approximately 3 days), I think because of the loops. I would like to explore other alternatives. Because of some details that I don't mention here, for a single customer I can have 1300 rows per day (the example is given in days, but I have rows by the minute). My tables are indexed, I think properly. I can't post my stored procedure, but it works as described (I know that "the devil is in the detail"). So any suggestion will be appreciated.
Use a user-defined variable to hold the balance from the previous row, and then subtract it from the current row's balance.
SELECT SUM(refill) AS total_refill
FROM (
SELECT GREATEST(0, balance - @prev_balance) AS refill, @prev_balance := balance
FROM (
SELECT balance
FROM tableX
ORDER BY date) AS t
CROSS JOIN (SELECT @prev_balance := NULL) AS ars
) AS t
There is a quite well-known mechanism to deal with these: Use a variable inside a field.
SELECT @result:=0;
SELECT @lastbalance:=9999999999; -- whatever value is sure to be higher than any real balance
SELECT SUM(increments) AS total FROM (
SELECT
IF(balance>@lastbalance, balance-@lastbalance, 0) AS increments,
@lastbalance:=balance AS ignored -- IGNORE is a reserved word, so it cannot be used as a bare alias
FROM X -- insert real table name here
WHERE
-- insert selector here
ORDER BY
-- insert real chronological sorter here
) AS baseview;
Use lag() in MySQL 8+:
select sum(balance - prev_balance) as refills
from (select t.*, lag(balance) over (order by date) prev_balance
from t
) t
where balance > prev_balance;
In older versions of MySQL this is tricky. If the values are continuous dates, then a simple JOIN works:
select sum(t.balance - tprev.balance) as refills
from t join
t tprev
on tprev.date = t.date - interval 1 day
where t.balance > tprev.balance;
This may not be the case. Then the next best method is variables. But you have to be very careful. MySQL does not declare the order of evaluation of expressions in a SELECT. As the documentation explains:
The order of evaluation for expressions involving user variables is undefined. For example, there is no guarantee that SELECT @a, @a:=@a+1 evaluates @a first and then performs the assignment.
The variables need to be assigned and used in the same expression:
select sum(balance - prev_balance) as refills
from (select t.*,
(case when (@temp_prevb := @prevb) = NULL -- intentionally false
then -1
when (@prevb := balance)
then @temp_prevb
end) as prev_balance
from (select t.* from t order by date) t cross join
(select @prevb := NULL) params
) t
where balance > prev_balance;
And the final method is a correlated subquery:
select sum(balance - prev_balance) as refills
from (select t.*,
(select t2.balance
from t t2
where t2.date < t.date
order by t2.date desc
limit 1
) as prev_balance
from t
) t
where balance > prev_balance;

MySQL - get users who placed 25th order during period

I have users and orders tables with this structure (simplified for question):
USERS
userid
registered(date)
ORDERS
id
date (order placed date)
user_id
I need to get array of users (array of userid) who placed their 25th order during specified period (for example in May 2019), date of 25th order for each user, number of days to place 25th order (difference between registration date for user and date of 25th order placed).
For example if user registered in April 2018, then placed 20 orders in 2018, and then placed 21-30th orders in Jan-May 2019 - this user should be in this array, if he placed 25th (overall for his account) order in May 2019.
How can I do this with a MySQL query?
Sample data and structure: http://www.sqlfiddle.com/#!9/998358 (for testing you can get 3rd order as ex., not 25th, to not add a lot of sample data records).
A single query is not required. If this can't be done in one query, a few queries are acceptable.
You can use a correlated subquery to get the count of orders placed before the current one by a user. If that's 24 the current order is the 25th. Then check if the date is in the desired range.
SELECT o1.user_id,
o1.date,
datediff(o1.date, u1.registered)
FROM orders o1
INNER JOIN users u1
ON u1.userid = o1.user_id
WHERE (SELECT count(*)
FROM orders o2
WHERE o2.user_id = o1.user_id
AND (o2.date < o1.date
OR o2.date = o1.date
AND o2.id < o1.id)) = 24
AND o1.date >= '2019-01-01'
AND o1.date < '2019-06-01';
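If MySQL 8.0 or later is available, the same result can be sketched with the ROW_NUMBER() window function instead of a correlated count. This is an alternative technique, assuming the users and orders tables and column names given in the question; the aliases and the output column names order_date and days_to_25th are made up for illustration, and the date range shown is May 2019 as in the question's example.

```sql
-- Sketch for MySQL 8.0+; tables and columns as described in the question.
SELECT x.user_id,
       x.date AS order_date,
       DATEDIFF(x.date, u.registered) AS days_to_25th
FROM (
    SELECT o.user_id,
           o.date,
           -- position of each order within its user's order history;
           -- id breaks ties between orders placed on the same date
           ROW_NUMBER() OVER (PARTITION BY o.user_id
                              ORDER BY o.date, o.id) AS rn
    FROM orders o
) x
JOIN users u
  ON u.userid = x.user_id
WHERE x.rn = 25                 -- use 3 to test against the small sample data
  AND x.date >= '2019-05-01'
  AND x.date <  '2019-06-01';
```

The numbering has to be computed over all of a user's orders before the date filter is applied, which is why the filter sits in the outer query.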
The basic inefficient way of doing this would be to get the user_id for every row in ORDERS where the date is in your target range AND the count of rows in ORDERS with the same user_id and a lower date is exactly 24.
This can get very ugly, very quickly, though.
If you're calling this from code you control, can't you do it from the code?
If not, there should be a way to assign to each row an index describing its rank among orders for its specific user_id, and select from this all user_id from rows with an index of 25 and a correct date. This will give you a select from select from select, but it should be much faster. The difficulty here is to control the order of the rows, so here are the selects I envision:
Select all rows, order by user_id asc, date asc, union-ed to nothing from a table made of two vars you'll initialize at 0.
from this, select all while updating a var to know if a row's user_id is the same as the last, and adding a field that will report so (so for each user_id the first line in order will have a specific value like 0 while the other rows for the same user_id will have a 1)
from this, select all plus a field that equals itself plus one in case the first added field is 1, else 0
from this, select the user_id from the rows where the second added field is 25 and the date is in range.
The union thingy is only necessary if you need to do it all in one request (you have to initialize them in a lower select than the one they're used in).
Edit: Well if you need the date too you can just select it along with the user_id, but calculating the number of days in sql will be a pain. Just join the result table to the users table and get both the date of 25th order and their date of registration, you'll surely be able to do the difference in code.
I'll try building an actual request, however if you want to truly understand what you need to make this you gotta read up on mysql variables, unions, and conditional statements.
"Looks too complicated. I am sure that this can be done with current DB structure and 1-2 requests." Well, yeah. Use the COUNT request, it will be easy, and slow as hell.
For the complex answer, see http://www.sqlfiddle.com/#!9/998358/21
Since you can use multiple requests, you can just initialize the vars first.
It isn't actually THAT complicated, you just have to understand how to concretely express what you mean by "a user's 25th order" to a SQL engine.
See http://www.sqlfiddle.com/#!9/998358/24 for the difference in days, turns out there's a method for that.
Edit 5: seems you're going with the COUNT method. I'll pray your DB is small.
Edit 6: For posterity:
The count method will take years on very large databases. Since OP didn't come back, I'm assuming his is small enough to overlook query speed. If that's not your case and let's say it's 10 years from now and the sqlfiddle links are dead; here's the two-queries solution:
SET @PREV_USR:=0;
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
SELECT orders.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
orders
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
Just change RANK = ? and the conditions to fit your needs. If you want to fully understand it, start by the innermost SELECT then work your way high; this version fuses the points 1 & 2 of my explanation.
Now sometimes you will have to use an API or something that won't let you keep variable values in memory between statements (unless you commit, or because of some other restriction), and you'll need to do it in one query. To do that, you put the initialization one step lower and make it so it does not affect the higher statements. IMO the best way to do this is in a UNION with a fake table where the only row is excluded. You'll avoid the hassle of a JOIN and it's just better overall.
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
SELECT DERIVED_4.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
(SELECT * FROM orders
UNION
SELECT * FROM (
SELECT (@PREV_USR:=0) AS INIT_PREV_USR, 0 AS COL_2, 0 AS COL_3
) AS DERIVED_3
WHERE INIT_PREV_USR <> 0
) AS DERIVED_4
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
With that method, the thing to watch for is the amount and the type of columns in your basic table. Here orders' first field is an int, so I put INIT_PREV_USR in first then there are two more fields so I just add two zeroes with names and call it a day. Most types work, since the union doesn't actually do anything, but I wouldn't try this when your first field is a blob (worst comes to worst you can use a JOIN).
You'll note this is derived from a method of pagination in MySQL. If you want to apply this to other engines, just check out their best pagination calls and you should be able to work things out.

SQL Query - Find out how many times a row changes from 0 to another value

I am using MySQL 8 and need to create a stored procedure
I have a single table that has a DATE field and a value field which can be 0 or any other number. This value field represents the daily amount of rain for that day.
The table stores data from today up to 10 years from now.
I need to find out how many periods of rain there will be in the next 10 years.
So, for example, if my table contains the following data:
Date - Value
2018-06-09 - 0
2018-06-10 - 50
2018-06-11 - 0
2018-06-12 - 15
2018-06-13 - 17
2018-06-14 - 0
2018-06-15 - 0
2018-06-16 - 12
2018-06-17 - 123
2018-06-18 - 17
Then the SP should return 3, because there were 3 periods of rain.
Any help in getting me closer to the answer will be appreciated!
You don't need to have a stored procedure for this.
Here is a solution with MySQL 8.0's LEAD function, which also supports dates with gaps.
The complete table needs to be scanned, but I don't think that is a huge problem with ~3650 records.
Query
SELECT
SUM(filter_match = 1) AS number
FROM (
SELECT
((t.value = 0) AND (LEAD(t.value) OVER (ORDER BY t.date ASC) != 0)) AS filter_match
FROM
t
) t
see demo https://www.db-fiddle.com/f/sev4NqgLsFPgtNgwzruwy/2
By the way, would you mind expanding your answer to understand how
LEAD and SUM work together?
LEAD(t.value) OVER (ORDER BY t.date ASC) simply means: get the value from the next record, ordered by date.
This demo shows it nicely https://www.db-fiddle.com/f/sev4NqgLsFPgtNgwzruwy/6
SUM(filter_match = 1) is a conditional sum. In this case the alias filter_match needs to be true.
See what filter_match is in this demo https://www.db-fiddle.com/f/sev4NqgLsFPgtNgwzruwy/8
In MySQL, aggregate functions can take a boolean SQL expression, something like 1 = 1 (which is always true, or 1) or 1 = 0 (which is always false, or 0).
The conditional sum only counts the rows where the condition is true.
See demo https://www.db-fiddle.com/f/sev4NqgLsFPgtNgwzruwy/7
Use MySQL join:
SELECT COUNT(*) Number_of_Periods
FROM yourTable A JOIN yourTable B
ON DATE(A.`DATE`)=DATE(B.`DATE` - INTERVAL 1 DAY)
AND A.`VALUE`=0 AND B.`VALUE`>0;
See Demo on DB Fiddle.
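Both queries above count transitions from 0 to a non-zero value, so a rain period that begins on the very first row of the table (with no preceding dry day) would not be counted. A possible variant for MySQL 8.0+ uses LAG() and counts the first rainy row of each period instead. It assumes a table named t with date and value columns, as in the first answer; the alias prev_value and the output name number_of_periods are made up for illustration.

```sql
-- Sketch for MySQL 8.0+; assumes table t(date, value) as in the first answer.
SELECT COUNT(*) AS number_of_periods
FROM (
    SELECT `value`,
           -- value of the previous record by date (NULL for the first record)
           LAG(`value`) OVER (ORDER BY `date`) AS prev_value
    FROM t
) x
-- a period starts on a rainy row whose predecessor is dry or missing
WHERE `value` <> 0
  AND (prev_value IS NULL OR prev_value = 0);
```

With the sample data from the question, the rows for 2018-06-10, 2018-06-12 and 2018-06-16 qualify, so the count is 3.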

Find the column with unusual difference between succeeding or preceding column in mysql

I have following table
id vehicle_id timestamp distance_meters
1 1 12:00:01 1000
2 1 12:00:04 1000.75
3 1 15:00:06 1345.0(unusual as time and distance jumped)
4 1 15:00:09 1347
The table above is the log of a vehicle. Normally, a vehicle sends data at 3-second intervals, but sometimes vehicles go offline and send their data only once they are back online. The only way to detect that is to find an unusual jump in distance. We can assume some value (say 500 meters) as the limit of a normal jump.
What is the best way to do that?
If you cannot ensure that the ids increment with no gaps, then you need another method. One method uses variables and one uses correlated subqueries.
The variables method is messy, but probably the fastest:
select t.*,
(case when (@tmp_prev_ts := @prev_ts) and false then NULL -- never happens
when (@prev_ts := timestamp) and false then NULL -- never happens
else @tmp_prev_ts
end) as prev_timestamp,
(case when (@tmp_prev_d := @prev_d) and false then NULL -- never happens
when (@prev_d := distance_meters) and false then NULL -- never happens
else @tmp_prev_d
end) as prev_distance_meters
from t cross join
(select @prev_ts := '', @prev_d := 0) params
order by timestamp; -- assume this is the ordering
You can then use a subquery or other logic to get the large jumps.
Usually you can use window functions for such a task; LEAD and LAG are perfect for this. However, since MySQL has no window functions before version 8.0, you would have to emulate them there.
You need to get your data set with a row number and then join it to itself by row number, offset by 1.
It would look something like this:
SELECT
*
FROM (SELECT
rownr,
vehicle_id,
timestamp,
distance_meters
FROM t) tcurrent
LEFT JOIN (SELECT
rownr,
vehicle_id,
timestamp,
distance_meters
FROM t) tprev
ON tcurrent.vehicle_id = tprev.vehicle_id
AND tprev.rownr = tcurrent.rownr - 1
If you can assume id is sequential (without gaps) per vehicle_id, then you could use it instead of rownr. Otherwise you would have to make your own rank/row number.
So you would have to combine ranking solution from this question:
MySQL - Get row number on select
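On MySQL 8.0 or later, where window functions are available, the emulation is not needed. A minimal sketch, assuming the table is named t as in the answers above and using the 500-meter limit mentioned in the question; the named window w and the prev_ column names are made up for illustration.

```sql
-- Sketch for MySQL 8.0+; assumes table t(id, vehicle_id, timestamp, distance_meters).
SELECT *
FROM (
    SELECT t.*,
           -- previous reading of the same vehicle, ordered by time
           LAG(`timestamp`)     OVER w AS prev_timestamp,
           LAG(distance_meters) OVER w AS prev_distance_meters
    FROM t
    WINDOW w AS (PARTITION BY vehicle_id ORDER BY `timestamp`)
) x
-- keep only the rows where the distance jumped by more than 500 meters
WHERE distance_meters - prev_distance_meters > 500;
```

The first reading of each vehicle has no predecessor, so its prev_ columns are NULL and it is never reported. The time difference between `timestamp` and prev_timestamp can be added to the filter in the same way if the interval should be checked too.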