[Aim]
We would like to find out how often an event "A" occurred before time "X". More concretely, given the dataset below, we want to find the count of the prior purchases.
[Context]
DBMS: MySQL 5.6
We have the following dataset:
user | date
1 | 2015-06-01 17:00:00
2 | 2015-06-02 18:00:00
1 | 2015-06-03 19:00:00
[Desired output]
user | date | purchase count
1 | 2015-06-01 17:00:00 | 1
2 | 2015-06-02 18:00:00 | 1
1 | 2015-06-03 19:00:00 | 2
[Already tried]
We managed to get the count on a specific day using an inner join on the table itself.
[Problem(s)]
- How to do this in a single query?
This can be done using user-defined variables, which is faster, as already mentioned in the previous answer.
It requires an incrementing variable for each group, based on some ordering. For the given dataset, that ordering is user and date.
Here is how you can achieve it:
select
  user,
  date,
  purchase_count
from (
  select *,
    @rn := if(@prev_user = user, @rn + 1, 1) as purchase_count,
    @prev_user := user
  from test, (select @rn := 0, @prev_user := null) x
  order by user, date
) x
order by date;
Change the table name test to your actual table name
http://sqlfiddle.com/#!9/32232/12
Probably the most efficient way is to use variables:
select t.*,
       (@rn := if(@u = user, @rn + 1,
                  if(@u := user, 1, 1)
                 )
       ) as purchase_count
from table t cross join
     (select @rn := 0, @u := '') params
order by user, date;
You can also do this with correlated subqueries, but this is probably faster.
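To make the logic of the variable-based queries easier to follow, here is a minimal Python sketch of the same idea: walk the rows in date order and keep one counter per user. The function name `running_purchase_count` and the tuple layout are assumptions made for illustration; they are not part of either answer.

```python
from collections import defaultdict

def running_purchase_count(rows):
    """rows: iterable of (user, date_string) tuples with ISO-formatted dates.

    Returns (user, date_string, purchase_count) tuples ordered by date,
    where purchase_count is that user's running count of purchases.
    """
    seen = defaultdict(int)  # per-user counter, plays the role of @rn
    out = []
    # ISO 'YYYY-MM-DD HH:MM:SS' strings sort chronologically as plain text.
    for user, date in sorted(rows, key=lambda r: r[1]):
        seen[user] += 1
        out.append((user, date, seen[user]))
    return out

purchases = [
    (1, "2015-06-01 17:00:00"),
    (2, "2015-06-02 18:00:00"),
    (1, "2015-06-03 19:00:00"),
]
for row in running_purchase_count(purchases):
    print(row)
```

On the sample data from the question, the counts come out as 1, 1, 2, which matches the desired output.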
Related
I need to write a MySQL SELECT query to do the following, possibly using INTERVAL, but I am not sure how to achieve it.
I need to retrieve groups of results where the date-time values of neighbouring rows are within 5 minutes of each other. In the simplified example below, I need to retrieve Bob, Ted and Fred in one group and Andy, Mike and Bert in another.
I don't know how many groups there will be. I need to be able to establish when each group starts, when it ends and collect the data in between.
Once I have these groups, I need to be able to access each value from every row within a group in order to perform further calculations.
**DB Table**
DateTime Value
2018-10-17 12:50 Bob
2018-10-17 12:55 Ted
2018-10-17 13:00 Fred
2018-10-17 15:00 Andy
2018-10-17 15:05 Mike
2018-10-17 15:10 Bert
In MySQL 5.x you can calculate a rank by using variables.
Example:
SELECT `DateTime`, `Value`,
  CASE
    WHEN @prev_dt >= `DateTime` - INTERVAL 5 MINUTE AND @prev_dt := `DateTime` THEN @rank
    WHEN @prev_dt := `DateTime` THEN @rank := @rank + 1 -- this WHEN criteria is always true
  END AS `Rank`
FROM YourTable t
CROSS JOIN (SELECT @prev_dt := NULL, @rank := 0) vars
ORDER BY `DateTime`;
You'll find a test on SQL Fiddle here
Result:
DateTime Value Rank
-------------------- --- ----
2018-10-17T12:50:00Z Bob 1
2018-10-17T12:55:00Z Ted 1
2018-10-17T13:00:00Z Fred 1
2018-10-17T15:00:00Z Andy 2
2018-10-17T15:05:00Z Mike 2
2018-10-17T15:10:00Z Bert 2
In MySQL 8.x you could also use the window function LAG to get the previous DateTime.
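The grouping rule itself (start a new group whenever the gap to the previous row exceeds 5 minutes) can be sketched in Python as follows. This is only an illustration of the logic, not the answer's SQL; `assign_groups` and `max_gap` are hypothetical names.

```python
from datetime import datetime, timedelta

def assign_groups(rows, max_gap=timedelta(minutes=5)):
    """rows: iterable of (datetime, value) tuples.

    Returns (datetime, value, rank) tuples ordered by time. The rank is
    incremented each time the gap to the previous row exceeds max_gap.
    """
    result = []
    rank = 0
    prev = None  # plays the role of @prev_dt
    for dt, value in sorted(rows, key=lambda r: r[0]):
        if prev is None or dt - prev > max_gap:
            rank += 1  # gap too large: a new group starts here
        result.append((dt, value, rank))
        prev = dt
    return result

rows = [
    (datetime(2018, 10, 17, 12, 50), "Bob"),
    (datetime(2018, 10, 17, 12, 55), "Ted"),
    (datetime(2018, 10, 17, 13, 0), "Fred"),
    (datetime(2018, 10, 17, 15, 0), "Andy"),
    (datetime(2018, 10, 17, 15, 5), "Mike"),
    (datetime(2018, 10, 17, 15, 10), "Bert"),
]
for dt, value, rank in assign_groups(rows):
    print(dt, value, rank)
```

With the sample table, Bob, Ted and Fred get rank 1 and Andy, Mike and Bert get rank 2, the same as the result shown above. Once every row carries its rank, the start and end of a group are simply the minimum and maximum timestamps within that rank.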
I've searched for this topic, but all I found were questions about grouping results by month. I need to retrieve rows grouped by month, with the cost summed from the start date up to and including that month.
Here is an example table:
Date       | Val
---------- | ---
2017-01-20 | 10
2017-02-15 | 5
2017-02-24 | 15
2017-03-14 | 20
I need to get the following output (the date format does not matter):
2017-01-20 | 10
2017-02-24 | 30
2017-03-14 | 50
When I run
SELECT SUM(`val`) as `sum`, DATE(`date`) as `date` FROM table
WHERE `date` BETWEEN :startDate
AND :endDate GROUP BY year(`date`), month(`date`)
I get the sum per month, of course.
I can't think of a way to put this neatly into one query to achieve the desired effect. I will probably need some nested queries, but maybe you know a better solution.
Something like this should work (untested). You could also solve this with subqueries, but I guess that would be more costly. If you want to sort the result by the total value, the subquery variant might be faster.
SET @total := 0;
SELECT
  (@total := @total + q.sum) AS total, q.date
FROM
  (SELECT SUM(`val`) as `sum`, DATE(`date`) as `date` FROM table
   WHERE `date` BETWEEN :startDate
   AND :endDate GROUP BY year(`date`), month(`date`)) AS q
You can use the DATE_FORMAT function both to format the date in the result and to group by it.
DATE_FORMAT(date,format)
Formats the date value according to the format string.
SELECT Date, @total := @total + val as total
FROM
  (select @total := 0) x,
  (select Sum(Val) as Val, DATE_FORMAT(Date, '%m-%Y') as Date
   FROM st where Date >= '2017-01-01' and Date <= '2017-12-31'
   GROUP BY DATE_FORMAT(Date, '%m-%Y')) y
;
+---------+-------+
| Date | total |
+---------+-------+
| 01-2017 | 10 |
+---------+-------+
| 02-2017 | 30 |
+---------+-------+
| 03-2017 | 50 |
+---------+-------+
Can check it here: http://rextester.com/FOQO81166
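The two steps these queries combine (sum per month, then accumulate across months) can be sketched in Python as below. This is an illustration under assumptions: `monthly_running_total` is a hypothetical helper, and dates are taken as ISO strings so that the first seven characters give the 'YYYY-MM' month key.

```python
from itertools import accumulate

def monthly_running_total(rows):
    """rows: iterable of (iso_date_string, value) tuples.

    Returns (month, running_total) tuples, one per month, in
    chronological order.
    """
    per_month = {}
    for date, val in rows:
        month = date[:7]  # 'YYYY-MM' prefix of an ISO date
        per_month[month] = per_month.get(month, 0) + val
    months = sorted(per_month)  # 'YYYY-MM' keys sort chronologically
    totals = accumulate(per_month[m] for m in months)
    return list(zip(months, totals))

data = [
    ("2017-01-20", 10),
    ("2017-02-15", 5),
    ("2017-02-24", 15),
    ("2017-03-14", 20),
]
for month, total in monthly_running_total(data):
    print(month, total)
```

For the sample table this gives 10, 30 and 50 for January, February and March. The sketch uses a year-first key because it sorts chronologically as text; a month-first key such as '%m-%Y' only sorts correctly while the range stays within a single year.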
Try this.
I use yearmonth as an integer (the year of the date multiplied by 100 plus the month of the date) . If you want to re-format, your call, but integers are always a bit faster.
It's the complete scenario, including input data.
CREATE TABLE tab (
dt DATE
, qty INT
);
INSERT INTO tab(dt,qty) VALUES( '2017-01-20',10);
INSERT INTO tab(dt,qty) VALUES( '2017-02-15', 5);
INSERT INTO tab(dt,qty) VALUES( '2017-02-24',15);
INSERT INTO tab(dt,qty) VALUES( '2017-03-14',20);
SELECT
yearmonths.yearmonth
, SUM(by_month.month_qty) AS running_qty
FROM (
SELECT DISTINCT
YEAR(dt) * 100 + MONTH(dt) AS yearmonth
FROM tab
) yearmonths
INNER JOIN (
SELECT
YEAR(dt) * 100 + MONTH(dt) AS yearmonth
, SUM(qty) AS month_qty
FROM tab
GROUP BY YEAR(dt) * 100 + MONTH(dt)
) by_month
ON yearmonths.yearmonth >= by_month.yearmonth
GROUP BY yearmonths.yearmonth
ORDER BY 1;
yearmonth|running_qty
201,701| 10.0
201,702| 30.0
201,703| 50.0
select succeeded; 3 rows fetched
Need explanations?
My solution has the advantage over the others that it will be re-usable without change when you move it to a more modern database - and you can convert it to using analytic functions when you have time.
Marco the Sane
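For readers who want the join condition spelled out: each month is paired with every month up to and including itself, and the paired monthly sums are added. A rough Python sketch of that idea follows; `running_by_yearmonth` is a hypothetical name and the rows simply mirror the INSERT statements above.

```python
from datetime import date

def running_by_yearmonth(rows):
    """rows: iterable of (date, qty) tuples.

    Returns (yearmonth, running_qty) tuples, where yearmonth is the
    integer year * 100 + month, as in the query above.
    """
    by_month = {}
    for dt, qty in rows:
        key = dt.year * 100 + dt.month
        by_month[key] = by_month.get(key, 0) + qty
    # Mirrors the join: pair each month with all months <= it, then sum.
    return [
        (ym, sum(q for m, q in by_month.items() if m <= ym))
        for ym in sorted(by_month)
    ]

tab = [
    (date(2017, 1, 20), 10),
    (date(2017, 2, 15), 5),
    (date(2017, 2, 24), 15),
    (date(2017, 3, 14), 20),
]
print(running_by_yearmonth(tab))
```

Like the self-join, this compares every month with every earlier month, so the work grows quadratically with the number of months. That is usually acceptable for monthly data but worth keeping in mind for finer granularity.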
I came across the following problem:
I would like to sum the hours for each name, giving the total time covered between the START and END of their activities.
It would be simple if I could just subtract the start from the end of each record, but activities can overlap. For example, Mary started one activity at 13:00 that ran until 15:00, and started another at 14:00 that ran until 16:00. I would like her result to be 3 (she used 3 hours of her time to perform both activities).
e.g.:
Name | START | END |
-----------------------------------------------------------
KATE | 2014-01-01 13:00:00 | 2014-01-01 14:00:00 |
MARY | 2014-01-01 13:00:00 | 2014-01-01 15:00:00 |
TOM | 2014-01-01 13:00:00 | 2014-01-01 16:00:00 |
KATE | 2014-01-01 12:00:00 | 2014-01-02 04:00:00 |
MARY | 2014-01-01 14:00:00 | 2014-01-01 16:00:00 |
TOM | 2014-01-01 12:00:00 | 2014-01-01 18:00:00 |
TOM | 2014-01-01 22:00:00 | 2014-01-02 02:00:00 |
result:
KATE 15 hours
MARY 3 hours
TOM 9 hours
Have you tried a group by and then an aggregate function?
SELECT Name, SUM(UNIX_TIMESTAMP(End) - UNIX_TIMESTAMP(Start)) FROM myTable
GROUP BY Name
This returns the cumulative total of seconds across the intervals you have. You can then convert the seconds to hours for display.
Also I would highly recommend grouping by a primary key or something instead of a string name, but I understand that this may have been just to simplify the question.
I found this problem interesting, so spent a little more time to develop a solution. What I came up with involves sorting the rows by name and start time, then using MySQL variables to account for overlapping ranges. I begin by sorting the table and supplementing it with columns that carry the name and times from one row to the next
SELECT [expounded below]
FROM (SELECT * FROM tbl ORDER BY Name, START, END) AS u,
     (SELECT @x := 0, @gap := 0, @same_name := '',
             @beg := (SELECT MIN(START) FROM tbl),
             @end := (SELECT MAX(END) FROM tbl)) AS t
This adds the name and the outer bounds of the time range to each row of the table, and sorts the table so that rows with the same name are together, in order by starting time. For each row, we will now have @same_name, @beg, and @end carrying values forward from one line to the next, and @x and @gap will accumulate the hours.
Now we have to do some reasoning about the possible overlaps that can occur. For any two intervals, they are either disjoint or have an intersection:
Non-overlapping: beg--------end START-------END
Overlapping: beg-----------end beg---------end
START--------------END START-----------END
Subset: beg---------------------------------end
START-----END
Once the rows are adjacent, we can decide if two ranges overlap by comparing their start and end points. They overlap if the start of one is before the end of the other and vice versa:
IF( @end >= START && @beg <= END,
If they do overlap, then the total interval is the difference between the outer edges of the two intervals:
TIMESTAMPDIFF(HOUR, LEAST(@beg, START), GREATEST(@end, END))
If they don't overlap, then we can just add the new interval to the previous one.
We will also need to know the gap between intervals, which is the difference from the end of the first to the beginning of the second. This will be necessary to calculate the hours for a case of more than two intervals, where only some overlap.
1-----------2 3----------4
3--------------------5
Putting this together gets us a calculation per row, where each row calculates the union of the hours with the one above it. For each variable, we have to reset it if the name changes:
SELECT Name, START, END,
  @x := IF(@same_name = Name,
           IF(@end >= START && @beg <= END, -- does it overlap?
              TIMESTAMPDIFF(HOUR, LEAST(@beg, START), GREATEST(@end, END)),
              @x + TIMESTAMPDIFF(HOUR, START, END)),
           TIMESTAMPDIFF(HOUR, START, END)) AS hr,
  @gap := IF(@same_name = Name,
             IF(@end >= START && @beg <= END, -- does it overlap?
                @gap,
                @gap + TIMESTAMPDIFF(HOUR, @end, START)),
             0) AS gap,
  @beg := IF(@same_name = Name,
             CAST(LEAST(@beg, START) AS DATETIME), -- expand interval
             START) AS beg, -- reset interval
  @end := IF(@same_name = Name,
             CAST(GREATEST(@end, END) AS DATETIME),
             END) AS finish,
  @same_name := Name AS sameName
FROM
  (SELECT * FROM xt ORDER BY Name, START, END) AS u,
  (SELECT @x := 0, @gap := 0, @same_name := '', @beg := (SELECT MIN(START) FROM xt), @end := (SELECT MAX(END) FROM xt)) AS t
That still gives us as many rows as there were in the original table. The hours and gaps will accumulate for each name, so we have to select the highest values and group by Name:
SELECT Name, MAX(hr) - MAX(gap) AS HOURS
FROM ( [insert above query here] ) AS intermediateCalculcation
GROUP BY Name;
Edit
And of course a moment after hitting enter, it occurs to me that (a) there is a bug for names that have no overlapping intervals at all; and (b) all @x is really doing is building up the interval from MIN(START) to MAX(END) for each name, which could be done with a simpler query and join. Um, exercise for the reader? :-)
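For comparison, the overlap reasoning can be written as a plain interval merge: sort each person's intervals by start time, extend the current interval while the next one overlaps it, and close it out when a gap appears. The Python sketch below is one possible way to do that and is not a translation of the query above; `busy_hours` and `FMT` are names chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def busy_hours(rows):
    """rows: iterable of (name, start_string, end_string) tuples.

    Returns {name: hours}, counting overlapping time only once.
    """
    by_name = defaultdict(list)
    for name, start, end in rows:
        by_name[name].append(
            (datetime.strptime(start, FMT), datetime.strptime(end, FMT))
        )
    totals = {}
    for name, spans in by_name.items():
        spans.sort()  # by start time, then end time
        seconds = 0.0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlaps (or touches) the current interval: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current interval and start a new one.
                seconds += (cur_end - cur_start).total_seconds()
                cur_start, cur_end = start, end
        seconds += (cur_end - cur_start).total_seconds()
        totals[name] = seconds / 3600
    return totals

mary = [
    ("MARY", "2014-01-01 13:00:00", "2014-01-01 15:00:00"),
    ("MARY", "2014-01-01 14:00:00", "2014-01-01 16:00:00"),
]
print(busy_hours(mary))
```

For Mary's two rows this gives 3 hours, the figure used in the question. Because each merged interval is closed out separately, names with no overlapping intervals at all need no special case.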
I have a table that keeps track of the scores of people playing my game
userID | game_level | date_of_attempt | score
1 1 2014-02-07 19:29:00 2
1 2 2014-02-08 19:00:00 0
2 1 2014-03-03 11:11:04 4
... ... ... ...
I am trying to write a query that, for a given user, will tell me their cumulative score for each game_level as well as their average of the last 20 scores they have obtained on a particular game_level (by sorting on date_of_attempt).
For example:
userID | game_level | sum of scores on game level | average of last 20 level scores
1 1 26 4.5
1 2 152 13
Is it possible to do such a thing in a single query? I often need to perform the query for multiple game_levels, and I use a long subquery to work out which levels are needed which makes me think a single query would be better
MySQL does not support analytic functions, so obtaining the average is trickier than it would be in some other RDBMS. Here I use user-defined variables to obtain the groupwise rank and then test on the result to average only over the 20 most recent records:
SELECT userID, game_level, SUM(score), x.avg
FROM my_table JOIN (
  SELECT AVG(CASE WHEN (@rank := (CASE
           WHEN t.userID = @userID
            AND t.game_level = @game_level
           THEN @rank + 1
           ELSE 0
         END)) < 20 THEN score END) AS avg,
         @userID := userID AS userID,
         @game_level := game_level AS game_level
  FROM my_table t,
       (SELECT @rank := @userID := @game_level := NULL) init
  ORDER BY userID, game_level, date_of_attempt DESC
) x USING (userID, game_level)
GROUP BY userID, game_level
See How to select the first/least/max row per group in SQL for further information.
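As a reference for what the query is meant to compute, here is a small Python sketch of the two figures per (userID, game_level): the sum of all scores and the average of the most recent N scores. It is an illustration only; `level_stats` and `last_n` are hypothetical names, and the timestamps are assumed to be ISO strings so they sort chronologically.

```python
from collections import defaultdict

def level_stats(attempts, last_n=20):
    """attempts: iterable of (user_id, game_level, date_of_attempt, score).

    Returns {(user_id, game_level): (total_score, avg_of_last_n)}.
    """
    grouped = defaultdict(list)
    for user_id, level, attempted_at, score in attempts:
        grouped[(user_id, level)].append((attempted_at, score))
    stats = {}
    for key, items in grouped.items():
        items.sort()  # oldest attempt first
        scores = [score for _, score in items]
        recent = scores[-last_n:]  # the last_n most recent scores
        stats[key] = (sum(scores), sum(recent) / len(recent))
    return stats

attempts = [
    (1, 1, "2014-02-07 19:29:00", 2),
    (1, 2, "2014-02-08 19:00:00", 0),
    (2, 1, "2014-03-03 11:11:04", 4),
]
print(level_stats(attempts))
```

A sketch like this can serve as a cross-check on a small extract when validating the SQL, since ranking with user-defined variables is easy to get subtly wrong.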
I'm trying to write a query with a partial balance, where the last column is a running sum, row by row.
Here is data set A (the date format is DD-MM-YYYY):
Amount | Date
20 | 16-01-2013
-1 | 22-01-2013
-2 | 22-01-2013
-3 | 23-01-2013
-9 | 24-01-2013
Here is data set B:
Amount | Date
-5 | 23-01-2013
-4 | 23-01-2013
9 | 23-01-2013
3 | 24-01-2013
-3 | 24-01-2013
I’d like to have a result like this, let's say for the set of data A:
Amount | Date | Balance
-9 | 24-01-2013 | 14
-3 | 23-01-2013 | 17
-2 | 22-01-2013 | 19
-1 | 22-01-2013 | 20
20 | 16-01-2013 | 0
I'm using this query for both data sets:
SELECT
  PreAgg.tData,
  PreAgg.amount,
  @PrevBal := @PrevBal - PreAgg.amount AS Total
FROM
  (
    SELECT
      pr.tData,
      pr.amount
    FROM
      tableTest pr
    ORDER BY
      pr.tData desc
  ) AS PreAgg,
  (SELECT @PrevBal := 0.00) AS SqlVars
For data set B it works perfectly, but for data set A it doesn't, and I cannot understand why!
Thanks a lot.
SOLUTION
Hi, in the end I managed to do it.
I use a fake counter. Here is the query:
SELECT
  reference, amount, balance
FROM
  (
    SELECT
      @id := @id + 1 AS id, t.date as reference, t.amount,
      @balance := (@balance + t.amount) AS balance
    FROM
      tmpTable t, (SELECT @id := 0, @balance := 0, @grouping := 0) AS vars
    ORDER BY
      t.tData
  ) AS x
ORDER BY
  x.id DESC
Your result only makes sense if I interpret the dates as DD-MM-YYYY. Otherwise, the dates are garbled.
Assuming the dates are stored as dates (and not strings), you want "asc" instead of "desc":
SELECT (@rownumIn := amount + @rownumIn) AS Balance, date, amount
FROM table_data cross join
     (SELECT @rownumIn := 0
     ) AS counter
WHERE some constraints
ORDER BY date asc;
I suspect you have committed the sin of storing dates as strings, though. If this is the case, it is easily fixed using str_to_date():
SELECT (@rownumIn := amount + @rownumIn) AS Balance, date, amount
FROM table_data cross join
     (SELECT @rownumIn := 0
     ) AS counter
WHERE some constraints
ORDER BY str_to_date(date, '%d-%m-%Y') asc;
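The effect of sorting on the parsed date rather than the raw string can be illustrated with a short Python sketch. It assumes the dates are DD-MM-YYYY strings, as in the question; `running_balance` is a hypothetical helper name.

```python
from datetime import datetime
from itertools import accumulate

def running_balance(rows):
    """rows: iterable of (amount, 'DD-MM-YYYY' date string) tuples.

    Returns (amount, date, balance) tuples in ascending date order,
    where balance is the running sum of the amounts so far.
    """
    # Parse the string before sorting, as str_to_date() does in the query.
    ordered = sorted(rows, key=lambda r: datetime.strptime(r[1], "%d-%m-%Y"))
    balances = accumulate(amount for amount, _ in ordered)
    return [(amount, date, bal) for (amount, date), bal in zip(ordered, balances)]

data_a = [
    (20, "16-01-2013"),
    (-1, "22-01-2013"),
    (-2, "22-01-2013"),
    (-3, "23-01-2013"),
    (-9, "24-01-2013"),
]
for row in running_balance(data_a):
    print(row)
```

For data set A the balances come out as 20, 19, 17, 14 and 5 in ascending date order. Rows that share a date keep their input order here because Python's sort is stable; in SQL a tie-breaking column in the ORDER BY is needed to get a deterministic order for such rows.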