Find avg, min, and max of the time between rows in each group - MySQL

I have created the following schema:
CREATE TABLE test (
  id INT,
  stat_id INT,
  time DATETIME
);
INSERT INTO test (id, stat_id, time) VALUES (1, 1, '2020-09-21 00:02:31');
INSERT INTO test (id, stat_id, time) VALUES (5, 1, '2020-09-21 00:06:31');
INSERT INTO test (id, stat_id, time) VALUES (2, 2, '2020-09-19 00:08:31');
INSERT INTO test (id, stat_id, time) VALUES (3, 2, '2020-09-21 00:03:31');
INSERT INTO test (id, stat_id, time) VALUES (6, 2, '2020-09-23 00:02:31');
INSERT INTO test (id, stat_id, time) VALUES (4, 2, '2020-09-27 00:04:31');
INSERT INTO test (id, stat_id, time) VALUES (7, 3, '2020-09-20 00:04:31');
INSERT INTO test (id, stat_id, time) VALUES (8, 3, '2020-09-23 00:05:31');
https://www.db-fiddle.com/f/6CRv6XqYMAfkBHEBhz1zGe/1
I have 3 different stat_id groups, and the rows in each group are ordered by id (smallest to largest).
I need to find the average duration between one event in each group and the next.
For example, for stat_id = 2, I need the difference between 2020-09-21 and 2020-09-19, then between 2020-09-23 and 2020-09-21, and then between 2020-09-27 and 2020-09-23.
Then I need the average of those durations, the maximum (which would be the time between 2020-09-27 and 2020-09-23), and the minimum.
I need to do this for all 3 stat_id groups.
Essentially, I want to know how long, on average, it took each stat_id group to create a new row.
I tried something like:
select
  stat_id,
  AVG(time) as avg,
  timestampdiff(hour, min(time), max(time)) as diff_in_hours
from test
group by stat_id;
but obviously this is wrong. It gives the wrong average and only the difference between the largest and the smallest time in each group, which is not what I am looking for. I am not sure how to compute the difference between one row and the previous row.

One option uses the window function lag() (available in MySQL 8.0 and later):
select stat_id, avg(diff) avg_diff
from (
  select t.*,
    timestampdiff(hour, lag(time) over(partition by stat_id order by id), time) diff
  from test t
) t
group by stat_id;
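The lag() query returns only the average, while the question also asks for the minimum and maximum gap. A sketch that returns all three, assuming MySQL 8.0+ and the test table from the question. Note that the question's worked example for stat_id = 2 walks through the rows in time order, whereas id order differs for that group (id 4 is dated 2020-09-27 but id 6 is dated 2020-09-23), so this version orders by time; switch back to order by id if id order is really what you want. It also measures in minutes, because the single gap for stat_id = 1 is only 4 minutes and would truncate to 0 in hours.

```sql
-- MySQL 8.0+ sketch: gap between each row and the previous row of the same
-- stat_id (ordered by time), then avg/min/max of those gaps per group.
SELECT stat_id,
       AVG(diff_minutes) AS avg_minutes,
       MIN(diff_minutes) AS min_minutes,
       MAX(diff_minutes) AS max_minutes
FROM (
    SELECT stat_id,
           -- LAG() is NULL for the first row of each group; the aggregate
           -- functions in the outer query ignore those NULLs.
           TIMESTAMPDIFF(MINUTE,
                         LAG(time) OVER (PARTITION BY stat_id ORDER BY time),
                         time) AS diff_minutes
    FROM test
) d
GROUP BY stat_id
ORDER BY stat_id;
```

For stat_id = 2 the three gaps are 2875, 2879 and 5762 minutes, so the query should report a minimum of 2875 and a maximum of 5762 for that group.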

Related

How to find cumulative sum between two dates in MySQL?

How can I compute a cumulative sum between two dates while taking the previous state into account?
Adding the WHERE condition
WHERE date BETWEEN '2021-02-19 12:00:00' AND '2021-02-21 12:00:00';
doesn't do the job, because the sum then starts from the first date in the condition rather than from the first record. I would like to select only part of the result (between two dates), but calculate the cumulative sum from the first (initial) state.
I prepared a fiddle:
CREATE TABLE `table1` (
  `id` int(11) NOT NULL,
  `date` datetime NOT NULL DEFAULT current_timestamp(),
  `payment` double NOT NULL
);
INSERT INTO `table1` (`id`, `date`, `payment`) VALUES
  (1, '2021-02-16 12:00:00', 100),
  (2, '2021-02-17 12:00:00', 200),
  (3, '2021-02-18 12:00:00', 300),
  (4, '2021-02-19 12:00:00', 400),
  (5, '2021-02-20 12:00:00', 500),
  (6, '2021-02-21 12:00:00', 600),
  (7, '2021-02-22 12:00:00', 700);
SELECT DATE_FORMAT(date, "%Y-%m-%d") AS date,
       payment,
       SUM(payment) OVER(ORDER BY id) AS balance
FROM table1
WHERE date BETWEEN '2021-02-19 12:00:00' AND '2021-02-21 12:00:00';
You must filter the rows after computing the cumulative sums, for example in an outer query:
SELECT *
FROM (
  SELECT DATE(date) AS date,
         payment,
         SUM(payment) OVER(ORDER BY id) AS balance
  FROM table1
) t
WHERE date BETWEEN '2021-02-19' AND '2021-02-21';
or, to skip the rows after the end date before summing:
SELECT *
FROM (
  SELECT DATE(date) AS date,
         payment,
         SUM(payment) OVER(ORDER BY id) AS balance
  FROM table1
  WHERE DATE(date) <= '2021-02-21'
) t
WHERE date >= '2021-02-19';
Results:
date        payment  balance
2021-02-19  400      1000
2021-02-20  500      1500
2021-02-21  600      2100
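SUM() OVER requires window-function support (MySQL 8.0+ or MariaDB 10.2+). On an older server, a correlated subquery can produce the same running balance. A sketch against table1 from the question, assuming (as the window version does) that id order is the order in which payments accumulate:

```sql
-- Running total without window functions: the subquery sums every payment
-- up to and including the current id, so the balance starts from the first
-- row even though the outer WHERE keeps only the rows between the dates.
SELECT DATE(t.date) AS date,
       t.payment,
       (SELECT SUM(t2.payment)
        FROM table1 t2
        WHERE t2.id <= t.id) AS balance
FROM table1 t
WHERE t.date BETWEEN '2021-02-19 12:00:00' AND '2021-02-21 12:00:00'
ORDER BY t.id;
```

Each output row re-sums all earlier payments, so this does more work than the window version as the table grows; it is mainly useful where window functions are unavailable.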

GROUP CONCAT with ORDER in MEMSQL

Here is a toy example:
CREATE TABLE TEST
(
  ID INT,
  AGG NVARCHAR(20),
  GRP NVARCHAR(20)
);
INSERT INTO TEST VALUES
  (1, 'AB', 'X'), (2, 'BC', 'X'), (3, 'AC', 'X'),
  (4, 'EF', 'Y'), (5, 'FG', 'Y'), (6, 'DC', 'Y'),
  (7, 'JI', 'Z'), (8, 'IJ', 'Z'), (9, 'JK', 'Z');
Now, I would like to do this (this is valid code in MySQL, but not in MemSQL):
SELECT
  COUNT(*),
  SUM(ID),
  GROUP_CONCAT(AGG ORDER BY AGG),
  GRP
FROM TEST
GROUP BY GRP
So that the output looks like this (Required Output):
3 6 AB,AC,BC X
3 15 DC,EF,FG Y
3 24 IJ,JI,JK Z
Note that the values in the third column are sorted for each row. My output looks like this (Current Wrong Output):
3 6 BC,AB,AC X
3 15 DC,EF,FG Y
3 24 IJ,JI,JK Z
Compare the third column row by row: in the required output every list is sorted, but in my output the first list is not.
However, since the above query is not valid in MemSQL, I have to remove the ORDER BY AGG part from GROUP_CONCAT, which leaves the third column unsorted.
According to the documentation of GROUP_CONCAT, the expression can also be a function; however, there is no built-in function to sort. I have tried many combinations of SELECT ... ORDER BY statements inside GROUP_CONCAT without success. Is this impossible to do, or am I missing something?
I think this works for my case.
SELECT
  COUNT(*),
  SUM(T.ID),
  GROUP_CONCAT(T.AGG),
  T.GRP
FROM (
  SELECT
    *,
    RANK() OVER(PARTITION BY GRP ORDER BY AGG) AS R
  FROM TEST
) T
GROUP BY T.GRP
ORDER BY T.R
It is rather convoluted, so I hope someone can suggest an improvement.
Try this:
SELECT
  COUNT(*),
  SUM(ID),
  GROUP_CONCAT(AGG),
  GRP
FROM TEST
GROUP BY GRP
ORDER BY GROUP_CONCAT(AGG)

Finding within a SQL dataset the most recent period in which a hotel had at least one person staying

I have data which includes an ID for a hotel, the check-in date and the check-out date. I am trying to find the most recent period of time for each hotel where there was at least one person staying in the hotel for each night in the period.
Shown below is an example dataset:
CREATE TABLE Occupancy
(
  HotelID INT,
  PersonID INT,
  CheckIn DATE,
  CheckOut DATE
)
INSERT INTO Occupancy (HotelID, PersonID, CheckIn, CheckOut) VALUES (1, 1, '20/Jan/2015','22/Jan/2015')
INSERT INTO Occupancy (HotelID, PersonID, CheckIn, CheckOut) VALUES (1, 2, '13/Jan/2015','20/Jan/2015')
INSERT INTO Occupancy (HotelID, PersonID, CheckIn, CheckOut) VALUES (1, 3, '20/Jan/2015','22/Jan/2015')
INSERT INTO Occupancy (HotelID, PersonID, CheckIn, CheckOut) VALUES (1, 4, '12/Jan/2015','13/Jan/2015')
INSERT INTO Occupancy (HotelID, PersonID, CheckIn, CheckOut) VALUES (1, 5, '01/Jan/2015','10/Jan/2015')
INSERT INTO Occupancy (HotelID, PersonID, CheckIn, CheckOut) VALUES (1, 6, '01/Jan/2015','04/Jan/2015')
INSERT INTO Occupancy (HotelID, PersonID, CheckIn, CheckOut) VALUES (2, 7, '10/Jan/2015','20/Jan/2015')
INSERT INTO Occupancy (HotelID, PersonID, CheckIn, CheckOut) VALUES (2, 8, '11/Jan/2015','12/Jan/2015')
INSERT INTO Occupancy (HotelID, PersonID, CheckIn, CheckOut) VALUES (2, 9, '12/Jan/2015','13/Jan/2015')
INSERT INTO Occupancy (HotelID, PersonID, CheckIn, CheckOut) VALUES (2, 10, '12/Jan/2015','13/Jan/2015')
INSERT INTO Occupancy (HotelID, PersonID, CheckIn, CheckOut) VALUES (2, 11, '01/Jan/2015','02/Jan/2015')
INSERT INTO Occupancy (HotelID, PersonID, CheckIn, CheckOut) VALUES (3, 12, '02/Jan/2015','03/Jan/2015')
INSERT INTO Occupancy (HotelID, PersonID, CheckIn, CheckOut) VALUES (3, 13, '04/Jan/2015','05/Jan/2015')
INSERT INTO Occupancy (HotelID, PersonID, CheckIn, CheckOut) VALUES (3, 14, '05/Jan/2015','06/Jan/2015')
I am trying to create a view and the results I am expecting from this are as follows:
HotelID From To
1 12/Jan/2015 22/Jan/2015
2 10/Jan/2015 20/Jan/2015
3 04/Jan/2015 06/Jan/2015
I suspect I need a recursive query, but I am not sure how to write one. I should add that I am using Microsoft SQL Server 2008 R2. Thank you in advance for any help.
I think I found the answer, although it feels quite convoluted. I suspect there may be a better way. Here it is:
WITH OverlappingOccupancy
AS (
    SELECT *
    FROM (
        SELECT Row_number() OVER (PARTITION BY HotelID ORDER BY CheckOut DESC, CheckIn ASC) AS RowNumber,
               HotelID,
               CheckIn,
               CheckOut
        FROM Occupancy
    ) AS RankedOccupancy
    WHERE RowNumber = 1
    UNION ALL
    SELECT *
    FROM (
        SELECT Row_number() OVER (PARTITION BY Occupancy.HotelID ORDER BY Occupancy.CheckIn ASC) AS RowNumber,
               Occupancy.HotelID,
               Occupancy.CheckIn,
               Occupancy.CheckOut
        FROM Occupancy
        INNER JOIN OverlappingOccupancy ON OverlappingOccupancy.HotelID = Occupancy.HotelID
        WHERE Occupancy.CheckOut >= OverlappingOccupancy.CheckIn
          AND Occupancy.CheckIn < OverlappingOccupancy.CheckIn
    ) AS RankedOccupancy
    WHERE RowNumber = 1
)
SELECT HotelID,
       Min(CheckIn) AS FromCheckIn,
       Max(CheckOut) AS ToCheckOut
FROM OverlappingOccupancy
GROUP BY HotelID
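A commonly used alternative to walking the overlapping stays is the gaps-and-islands technique: expand each stay into the nights it covers, then group consecutive nights. A sketch for SQL Server 2008 R2 against the Occupancy table from the question, assuming a night counts as occupied when CheckIn <= night < CheckOut (which reproduces the expected output above). The CTE names (Nights, DistinctNights, Islands, Periods) are illustrative.

```sql
-- Sketch for SQL Server 2008 R2 (recursive CTEs, ROW_NUMBER, DATE type).
-- Assumption: a night is occupied when CheckIn <= night < CheckOut.
WITH Nights AS (
    -- Anchor member: the first night of every stay
    SELECT HotelID, CheckIn AS Night, CheckOut
    FROM Occupancy
    WHERE CheckIn < CheckOut
    UNION ALL
    -- Recursive member: the next night, while the guest is still checked in
    SELECT HotelID, DATEADD(day, 1, Night), CheckOut
    FROM Nights
    WHERE DATEADD(day, 1, Night) < CheckOut
),
DistinctNights AS (
    -- A night counts once per hotel, however many guests stayed
    SELECT DISTINCT HotelID, Night
    FROM Nights
),
Islands AS (
    -- Consecutive nights share the same Grp value (night minus its row number)
    SELECT HotelID,
           Night,
           DATEADD(day,
                   -CAST(ROW_NUMBER() OVER (PARTITION BY HotelID ORDER BY Night) AS int),
                   Night) AS Grp
    FROM DistinctNights
),
Periods AS (
    -- One row per unbroken run of occupied nights; rn = 1 marks the latest run
    SELECT HotelID,
           MIN(Night) AS FromDate,
           DATEADD(day, 1, MAX(Night)) AS ToDate,
           ROW_NUMBER() OVER (PARTITION BY HotelID ORDER BY MAX(Night) DESC) AS rn
    FROM Islands
    GROUP BY HotelID, Grp
)
SELECT HotelID, FromDate AS [From], ToDate AS [To]
FROM Periods
WHERE rn = 1
OPTION (MAXRECURSION 0);
```

The recursion adds one row per night of each stay, so stays longer than about 100 nights would hit SQL Server's default recursion limit; OPTION (MAXRECURSION 0) lifts it. With a permanent calendar or numbers table, the Nights CTE could become a plain join instead.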

In MySQL how to query 2 columns from 1 row?

In MySQL, the table cardToCard has one row for each transfer of a credit card balance from one card to another.
create table cardToCard (
  id int,
  dt date,
  card_from int,
  card_to int,
  amount decimal(6,2),
  primary key (id)
);
insert into cardToCard values (1, '2014-01-01', 100, 101, 200.00);
insert into cardToCard values (2, '2014-01-01', 101, 102, 200.00);
insert into cardToCard values (3, '2014-01-01', 102, 103, 200.00);
insert into cardToCard values (4, '2014-01-01', 103, 104, 200.00);
insert into cardToCard values (5, '2014-01-01', 104, 100, 200.00);
insert into cardToCard values (6, '2014-01-01', 99, 104, 200.00);
Query to find which cards have been used 3 or more times:
select card, count(*) as count
from
(
  select card_from as card, dt
  from cardToCard
  union all
  select card_to as card, dt
  from cardToCard
) d
group by card
having count >= 3;
The results are correct. Would it be more efficient to write this as a self join?
http://sqlfiddle.com/#!2/420e72/1
Possibly the most efficient way to write this query would be to start with a list of cards and then do:
select c.card,
       ((select count(*) from cardToCard ctc where ctc.card_from = c.card) +
        (select count(*) from cardToCard ctc where ctc.card_to = c.card)
       ) as cnt
from cards c
having cnt >= 3;
Then, you need two indexes: cardToCard(card_from) and cardToCard(card_to).
This should use the index for the aggregation, which is typically faster than a file sort.
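A sketch of the two indexes and of the query run without a separate cards table, in MySQL. The index names are made up, and because the question's schema has no cards table, the derived table c below is a hypothetical stand-in built from cardToCard itself:

```sql
-- The two indexes the answer relies on (names are illustrative).
CREATE INDEX idx_cardToCard_card_from ON cardToCard (card_from);
CREATE INDEX idx_cardToCard_card_to ON cardToCard (card_to);

-- Hypothetical stand-in for `cards c`: every distinct card that appears in
-- either column. Each count then comes from one index lookup per card.
SELECT card, cnt
FROM (
    SELECT c.card,
           (SELECT COUNT(*) FROM cardToCard f WHERE f.card_from = c.card) +
           (SELECT COUNT(*) FROM cardToCard t WHERE t.card_to = c.card) AS cnt
    FROM (SELECT card_from AS card FROM cardToCard
          UNION
          SELECT card_to FROM cardToCard) c
) counts
WHERE cnt >= 3;
```

With the sample data only card 104 qualifies (two incoming transfers and one outgoing). Building the card list this way scans cardToCard again, so a real cards table keyed by card number suits the answer's approach better.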
EDIT:
With the structure you are using, it can be faster to aggregate in the subqueries as well as in the outer query:
select card, sum(cnt) as cnt
from ((select card_from as card, count(*) as cnt
       from cardToCard
       group by card_from
      ) union all
      (select card_to as card, count(*) as cnt
       from cardToCard
       group by card_to
      )
     ) d
group by card
having sum(cnt) >= 3;
This can be faster because each subquery returns only one row per card, so much less data is combined than when every transfer row is unioned together.

In a SQL select statement, how could one calculate a position of a row?

The MySQL database table:
create table t (
  visitor_id int(11),
  activity_type varchar(10),
  date date
);
The rows:
insert into t (visitor_id, activity_type, date) values (1, 'hit', '2012-1-1');
insert into t (visitor_id, activity_type, date) values (1, 'event', '2012-1-2');
insert into t (visitor_id, activity_type, date) values (2, 'hit', '2012-1-2');
insert into t (visitor_id, activity_type, date) values (2, 'event', '2012-3-5');
insert into t (visitor_id, activity_type, date) values (2, 'hit', '2012-3-2');
insert into t (visitor_id, activity_type, date) values (1, 'hit', '2012-3-5');
insert into t (visitor_id, activity_type, date) values (1, 'hit', '2012-2-1');
I want to write a query to retrieve a data dump of users, activity type, and the order in which the activity occurred, that looks like the following:
visitor_id, activity_type, Position
1, hit, 1
1, event, 2
1, hit, 3
1, hit, 4
2, hit, 1
2, hit, 2
2, event, 3
So far I have written the following solution:
select visitor_id, activity_type, 'Position'
from t
order by visitor_id, date
;
The hard part is the column Position. This should represent the position in the order of the rows for that visitor ID. Is there any way to determine Position?
This may be relevant to your question:
MySQL get row position in ORDER BY
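On MySQL 8.0 or later, the window function ROW_NUMBER() computes this position directly. A sketch against table t from the question:

```sql
-- MySQL 8.0+: number each visitor's rows by date, restarting at 1 for
-- every visitor_id.
SELECT visitor_id,
       activity_type,
       ROW_NUMBER() OVER (PARTITION BY visitor_id ORDER BY date) AS Position
FROM t
ORDER BY visitor_id, Position;
```

For the sample rows this should produce the listing shown in the question. If one visitor has two rows on the same date, their relative order is arbitrary unless a tie-breaker is added to the ORDER BY inside OVER().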