I have a report I'm trying to figure out, but I would like to do it all within a SQL statement instead of iterating over a bunch of data in a script.
I have a table that is structured like:
CREATE TABLE `batch_item` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`record_id` int(11) DEFAULT NULL,
`created` DATE NOT NULL,
PRIMARY KEY (`id`),
KEY `record_id` (`record_id`)
);
The Date field is always YEAR-MONTH-01. Data looks something like:
+------+-----------+------------+
| id | record_id | created |
+------+-----------+------------+
| 1 | 1 | 2019-01-01 |
| 2 | 2 | 2019-01-01 |
| 3 | 3 | 2019-01-01 |
| 4 | 1 | 2019-02-01 |
| 5 | 2 | 2019-02-01 |
| 6 | 1 | 2019-03-01 |
| 7 | 3 | 2019-03-01 |
| 8 | 1 | 2019-04-01 |
| 9 | 2 | 2019-04-01 |
+------+-----------+------------+
What I'm trying to do, without having to write a looping script, is find the average number of consecutive months for each record. With the data above, the results would be:
Record_id 1 would have an average of 4 months.
Record_id 2 would be 1.5.
Record_id 3 would be 1.
I can write a script to iterate through all the records. I just would rather avoid that.
This is a gaps-and-islands problem. You simply need an enumeration of the rows for this to work. In MySQL 8+, you would use row_number(), but here you can use a global enumeration with a user variable:
select record_id, min(created) as min_created, max(created) as max_created, count(*) as num_months
from (select bi.*, (created - interval n month) as grp
from (select bi.*, (@rn := @rn + 1) as n -- generate some numbers
from batch_item bi cross join
(select @rn := 0) params
order by bi.record_id, bi.created
) bi
) bi
group by record_id, grp;
Note that when using row_number(), you would normally partition by record_id. However, that is not necessary here, as long as the numbers are generated in the correct order (by record_id, then created).
The above query gets the islands. For your final results, you need one more level of aggregation:
select record_id, avg(num_months) as avg_months
from (select record_id, min(created) as min_created, max(created) as max_created, count(*) as num_months
from (select bi.*, (created - interval n month) as grp
from (select bi.*, (@rn := @rn + 1) as n -- generate some numbers
from batch_item bi cross join
(select @rn := 0) params
order by bi.record_id, bi.created
) bi
) bi
group by record_id, grp
) bi
group by record_id;
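To sanity-check the islands logic against the sample data, here is a minimal sketch that runs the same "date minus row number" idea in an in-memory SQLite database through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (for ROW_NUMBER()), uses SQLite's date(created, '-N months') modifier in place of MySQL's created - interval n month, and numbers rows per record_id with ROW_NUMBER() instead of a user variable. The function name is only illustrative.

```python
import sqlite3

# Sample rows from the question: (id, record_id, created)
ROWS = [
    (1, 1, "2019-01-01"), (2, 2, "2019-01-01"), (3, 3, "2019-01-01"),
    (4, 1, "2019-02-01"), (5, 2, "2019-02-01"), (6, 1, "2019-03-01"),
    (7, 3, "2019-03-01"), (8, 1, "2019-04-01"), (9, 2, "2019-04-01"),
]

QUERY = """
WITH numbered AS (
    -- enumerate each record's months in date order
    SELECT record_id, created,
           ROW_NUMBER() OVER (PARTITION BY record_id ORDER BY created) AS rn
    FROM batch_item
),
anchored AS (
    -- consecutive months minus their row number land on the same anchor month
    SELECT record_id, date(created, '-' || rn || ' months') AS grp
    FROM numbered
),
islands AS (
    SELECT record_id, grp, COUNT(*) AS num_months
    FROM anchored
    GROUP BY record_id, grp
)
SELECT record_id, AVG(num_months) AS avg_months
FROM islands
GROUP BY record_id
ORDER BY record_id
"""


def average_island_lengths(rows):
    """Return [(record_id, avg_months), ...] for the given batch_item rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE batch_item ("
            "id INTEGER PRIMARY KEY, record_id INTEGER, created TEXT NOT NULL)"
        )
        con.executemany("INSERT INTO batch_item VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(average_island_lengths(ROWS))  # [(1, 4.0), (2, 1.5), (3, 1.0)]
```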
This is not a tested solution, but it should work in MySQL 8.x. Because window functions cannot be used in a WHERE clause, the lead() value is computed in a separate CTE first:
with
l as ( -- each row, with the next month recorded for the same record_id
select b.*, lead(created) over (partition by record_id order by created) as next_created
from batch_item b
),
a as ( -- the last row of each island
select *
from l
where next_created is null
or next_created > created + interval 1 month
),
e as ( -- each row, now with the last row of its island
select b.id, b.record_id, min(a.created) as end_created
from batch_item b
join a on b.record_id = a.record_id and b.created <= a.created
group by b.id, b.record_id
),
m as ( -- each island with the number of months it has
select
record_id, end_created, count(*) as months
from e
group by record_id, end_created
)
select -- the average length of islands for each record_id
record_id, avg(months) as avg_months
from m
group by record_id;
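The island-end approach above can be checked against the sample data with a small sketch. It assumes SQLite 3.25+ through Python's sqlite3 module; SQLite has no INTERVAL syntax, so date(created, '+1 month') stands in for created + interval 1 month, and ISO date strings compare correctly as text. LEAD() is computed in its own CTE because window functions are not allowed in WHERE. The function name is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, record_id, created)
ROWS = [
    (1, 1, "2019-01-01"), (2, 2, "2019-01-01"), (3, 3, "2019-01-01"),
    (4, 1, "2019-02-01"), (5, 2, "2019-02-01"), (6, 1, "2019-03-01"),
    (7, 3, "2019-03-01"), (8, 1, "2019-04-01"), (9, 2, "2019-04-01"),
]

QUERY = """
WITH l AS (
    -- each row plus the next month recorded for the same record_id
    SELECT id, record_id, created,
           LEAD(created) OVER (PARTITION BY record_id ORDER BY created) AS next_created
    FROM batch_item
),
a AS (
    -- the last row of each island: no next month, or a gap right after it
    SELECT record_id, created
    FROM l
    WHERE next_created IS NULL OR next_created > date(created, '+1 month')
),
e AS (
    -- each row, tagged with the end of the island it belongs to
    SELECT b.id, b.record_id, MIN(a.created) AS end_created
    FROM batch_item b
    JOIN a ON b.record_id = a.record_id AND b.created <= a.created
    GROUP BY b.id, b.record_id
),
m AS (
    -- one row per island with its length in months
    SELECT record_id, end_created, COUNT(*) AS months
    FROM e
    GROUP BY record_id, end_created
)
SELECT record_id, AVG(months) AS avg_months
FROM m
GROUP BY record_id
ORDER BY record_id
"""


def average_island_lengths_lead(rows):
    """Return [(record_id, avg_months), ...] using LEAD() to find island ends."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE batch_item ("
            "id INTEGER PRIMARY KEY, record_id INTEGER, created TEXT NOT NULL)"
        )
        con.executemany("INSERT INTO batch_item VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(average_island_lengths_lead(ROWS))  # [(1, 4.0), (2, 1.5), (3, 1.0)]
```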
I have the following tables:
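Apps (all columns are foreign keys to the respective tables):
+---------+----------+-----------+------------+
| TYPE_ID | BUILD_ID | CONFIG_ID | VERSION_ID |
+---------+----------+-----------+------------+
| 1       | 1        | 1         | 1          |
| 1       | 1        | 1         | 2          |
| 2       | 2        | 3         | 3          |
| 2       | 2        | 3         | 4          |
+---------+----------+-----------+------------+
Versions:
+----+-------+-------+-------+
| ID | major | minor | patch |
+----+-------+-------+-------+
| 1  | 1     | 0     | 1     |
| 2  | 2     | 0     | 0     |
| 3  | 3     | 0     | 3     |
| 4  | 4     | 0     | 0     |
+----+-------+-------+-------+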
I need to select highest version rows from Apps table for each unique combinations of TYPE_ID, BUILD_ID and CONFIG_ID.
The version number should be calculated by MAX(major * 1000000 + minor * 1000 + patch) in the versions table.
So from the given example of the Apps table the result would be:
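+---------+----------+-----------+------------+
| TYPE_ID | BUILD_ID | CONFIG_ID | VERSION_ID |
+---------+----------+-----------+------------+
| 1       | 1        | 1         | 2          |
| 2       | 2        | 3         | 4          |
+---------+----------+-----------+------------+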
I have tried something like this:
SELECT p1.* FROM Apps p1
INNER JOIN (
SELECT max(VERSION_ID) MaxVersion, CONFIG_ID
FROM Apps
GROUP BY CONFIG_ID
) p2
ON p1.CONFIG_ID = p2.CONFIG_ID
AND p1.VERSION_ID = p2.MaxVersion
GROUP BY `TYPE_ID`, `BUILD_ID`, `CONFIG_ID`
But MAX is applied on the VERSION_ID and I need MAX to be applied on major, minor and patch combinations.
MySQL Version 15.1 distribution 5.5.56-MariaDB
Any help would be appreciated.
Cheers!
You can compute the maximum version per type_id, build_id and config_id using the formula described in your question, then use the same formula again to locate the matching version:
SELECT sq.type_id, sq.build_id, sq.config_id, versions.id AS version_id_max
FROM (
SELECT type_id, build_id, config_id, MAX(major * 1000000 + minor * 1000 + patch) AS max_version
FROM apps
INNER JOIN versions ON apps.version_id = versions.id
GROUP BY type_id, build_id, config_id
) sq
INNER JOIN versions ON max_version = major * 1000000 + minor * 1000 + patch
+---------+----------+-----------+----------------+
| type_id | build_id | config_id | version_id_max |
+---------+----------+-----------+----------------+
| 1 | 1 | 1 | 2 |
| 2 | 2 | 3 | 4 |
+---------+----------+-----------+----------------+
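As a quick check of this compute-then-join approach, the sketch below loads the sample Apps and Versions rows into an in-memory SQLite database via Python's sqlite3 module and runs the same query (the lowercase table names and the function name are assumptions for the demo). Like the question's formula, it assumes minor and patch stay below 1000; if two Versions rows encoded to the same number, the final join would return both.

```python
import sqlite3

# Sample data from the question
APPS = [(1, 1, 1, 1), (1, 1, 1, 2), (2, 2, 3, 3), (2, 2, 3, 4)]  # type, build, config, version
VERSIONS = [(1, 1, 0, 1), (2, 2, 0, 0), (3, 3, 0, 3), (4, 4, 0, 0)]  # id, major, minor, patch

QUERY = """
SELECT sq.type_id, sq.build_id, sq.config_id, versions.id AS version_id_max
FROM (
    -- highest encoded version number per (type_id, build_id, config_id)
    SELECT type_id, build_id, config_id,
           MAX(major * 1000000 + minor * 1000 + patch) AS max_version
    FROM apps
    INNER JOIN versions ON apps.version_id = versions.id
    GROUP BY type_id, build_id, config_id
) sq
-- map the encoded number back to the versions row it came from
INNER JOIN versions ON sq.max_version = major * 1000000 + minor * 1000 + patch
ORDER BY sq.type_id, sq.build_id, sq.config_id
"""


def latest_versions_join(apps, versions):
    """Return [(type_id, build_id, config_id, version_id), ...]."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE apps (type_id INT, build_id INT, config_id INT, version_id INT)")
        con.execute("CREATE TABLE versions (id INT PRIMARY KEY, major INT, minor INT, patch INT)")
        con.executemany("INSERT INTO apps VALUES (?, ?, ?, ?)", apps)
        con.executemany("INSERT INTO versions VALUES (?, ?, ?, ?)", versions)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(latest_versions_join(APPS, VERSIONS))  # [(1, 1, 1, 2), (2, 2, 3, 4)]
```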
Try this:
select type_id, build_id, config_id,
max(1000000*v.major+1000*v.minor+v.patch) as version
from apps a left join versions v on a.version_id=v.id
group by type_id, build_id, config_id
This uses nested derived tables and a somewhat hacky way of identifying the VERSION_ID that corresponds to the maximum VERSION_NO.
We first build a derived table that computes the VERSION_NO for each row in the Apps table.
Then, using that derived table as the source of the SELECT, we group by TYPE_ID, BUILD_ID and CONFIG_ID, and use GROUP_CONCAT with a string-manipulation trick to pick the VERSION_ID corresponding to the maximum VERSION_NO in each group.
Try the following:
SELECT nest.TYPE_ID,
nest.BUILD_ID,
nest.CONFIG_ID,
SUBSTRING_INDEX(GROUP_CONCAT(DISTINCT nest.VERSION_ID
ORDER BY nest.VERSION_NO DESC
SEPARATOR ','), ',', 1) AS VERSION_ID
FROM (
SELECT A.TYPE_ID,
A.BUILD_ID,
A.CONFIG_ID,
A.VERSION_ID,
(V.major*1000000 + V.minor*1000 + V.patch) AS VERSION_NO
FROM Apps AS A
INNER JOIN Versions AS V ON V.ID = A.VERSION_ID
) AS nest
GROUP BY nest.TYPE_ID, nest.BUILD_ID, nest.CONFIG_ID
Try this:
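SELECT a1.type_id, a1.build_id, a1.config_id, a1.version_id
FROM apps a1
INNER JOIN versions v1 ON v1.id = a1.version_id
WHERE NOT EXISTS(
SELECT 'NEXT'
FROM apps a2
INNER JOIN versions v2 ON v2.id = a2.version_id
WHERE a2.type_id = a1.type_id
AND a2.build_id = a1.build_id
AND a2.config_id = a1.config_id
-- compare the computed version numbers, not the VERSION_ID values
AND v2.major * 1000000 + v2.minor * 1000 + v2.patch
> v1.major * 1000000 + v1.minor * 1000 + v1.patch)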
Try this query. Here I imitate the well-known function ROW_NUMBER() OVER (PARTITION BY Type_id, Build_id, Config_id ORDER BY major desc, minor desc, patch desc) using user variables:
select @type_id_lag := 0, @build_id_lag := 0, @config_id_lag := 0, @rn := 0;
select type_id, build_id, config_id, major, minor, patch from (
select case when @type_id_lag = type_id and
@build_id_lag = build_id and
@config_id_lag = config_id then @rn := @rn + 1 else @rn := 1 end as rn,
@type_id_lag := type_id type_id,
@build_id_lag := build_id build_id,
@config_id_lag := config_id config_id,
v.major, v.minor, v.patch
from Apps a
left join Versions v on a.version_id = v.id
order by a.type_id, a.build_id, a.config_id,
v.major desc, v.minor desc, v.patch desc
) a where rn = 1;
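On an engine with real window functions, the imitated ROW_NUMBER() can be written directly. This is a sketch against SQLite 3.25+ via Python's sqlite3 module, not MariaDB 5.5 (which lacks window functions); ordering by major, minor and patch separately avoids relying on the numeric encoding. The function and table names are illustrative.

```python
import sqlite3

# Sample data from the question
APPS = [(1, 1, 1, 1), (1, 1, 1, 2), (2, 2, 3, 3), (2, 2, 3, 4)]  # type, build, config, version
VERSIONS = [(1, 1, 0, 1), (2, 2, 0, 0), (3, 3, 0, 3), (4, 4, 0, 0)]  # id, major, minor, patch

QUERY = """
SELECT type_id, build_id, config_id, version_id, major, minor, patch
FROM (
    SELECT a.type_id, a.build_id, a.config_id, a.version_id,
           v.major, v.minor, v.patch,
           -- 1 for the highest (major, minor, patch) in each group
           ROW_NUMBER() OVER (
               PARTITION BY a.type_id, a.build_id, a.config_id
               ORDER BY v.major DESC, v.minor DESC, v.patch DESC
           ) AS rn
    FROM apps a
    JOIN versions v ON a.version_id = v.id
)
WHERE rn = 1
ORDER BY type_id, build_id, config_id
"""


def latest_versions_row_number(apps, versions):
    """Return the top-ranked row per (type_id, build_id, config_id)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE apps (type_id INT, build_id INT, config_id INT, version_id INT)")
        con.execute("CREATE TABLE versions (id INT PRIMARY KEY, major INT, minor INT, patch INT)")
        con.executemany("INSERT INTO apps VALUES (?, ?, ?, ?)", apps)
        con.executemany("INSERT INTO versions VALUES (?, ?, ?, ?)", versions)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(latest_versions_row_number(APPS, VERSIONS))
```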
I have a DATETIME column that stores when the values were introduced, as this example shows:
CREATE TABLE IF NOT EXISTS salary (
change_id INT(11) NOT NULL AUTO_INCREMENT,
emp_salary FLOAT(8,2),
change_date DATETIME,
PRIMARY KEY (change_id)
);
I filled the example like this:
+-----------+------------+---------------------+
| change_id | emp_salary | change_date |
+-----------+------------+---------------------+
| 1 | 200.00 | 2018-06-18 13:17:17 |
| 2 | 700.00 | 2018-06-25 15:20:30 |
| 3 | 300.00 | 2018-07-02 12:17:17 |
+-----------+------------+---------------------+
I want to get the last inserted value of each month for every year.
So for the example I made, this should be the output of the Select:
+-----------+------------+---------------------+
| change_id | emp_salary | change_date |
+-----------+------------+---------------------+
| 2 | 700.00 | 2018-06-25 15:20:30 |
| 3 | 300.00 | 2018-07-02 12:17:17 |
+-----------+------------+---------------------+
Row 1 won't appear because it is an outdated version of row 2.
You could use a self join to pick the group-wise maximum row. In the inner query, select the max of change_date, grouping your data by month and year:
select t.*
from your_table t
join (
select max(change_date) max_change_date
from your_table
group by date_format(change_date, '%Y-%m')
) t1
on t.change_date = t1.max_change_date
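A minimal sketch of the same group-wise maximum self join, run against SQLite through Python's sqlite3 module. SQLite has no DATE_FORMAT, so strftime('%Y-%m', change_date) is assumed as the month key, and the sketch relies on change_date being stored as ISO-8601 text so that MAX() orders chronologically. The function name is hypothetical.

```python
import sqlite3

# Sample rows from the question: (change_id, emp_salary, change_date)
SALARY = [
    (1, 200.00, "2018-06-18 13:17:17"),
    (2, 700.00, "2018-06-25 15:20:30"),
    (3, 300.00, "2018-07-02 12:17:17"),
]

QUERY = """
SELECT t.change_id, t.emp_salary, t.change_date
FROM salary t
JOIN (
    -- latest change_date within each year-month
    SELECT MAX(change_date) AS max_change_date
    FROM salary
    GROUP BY strftime('%Y-%m', change_date)
) t1 ON t.change_date = t1.max_change_date
ORDER BY t.change_date
"""


def last_change_per_month(rows):
    """Return the last salary row recorded in each month."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE salary ("
            "change_id INTEGER PRIMARY KEY, emp_salary REAL, change_date TEXT)"
        )
        con.executemany("INSERT INTO salary VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in last_change_per_month(SALARY):
        print(row)
```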
If you can use MySQL 8, which supports window functions, you could use a common table expression and the rank() function to pick the row with the highest change_date for each year and month:
with cte as(
select *,
rank() over (partition by date_format(change_date, '%Y-%m') order by change_date desc ) rnk
from your_table
)
select * from cte where rnk = 1;
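The rank() approach can also be tried on the sample rows. This is a SQLite (3.25+) sketch via Python's sqlite3 module, with strftime('%Y-%m', ...) assumed as a stand-in for MySQL's date_format(). Note that RANK() keeps every row tied on the latest change_date within a month. The function name is hypothetical.

```python
import sqlite3

# Sample rows from the question: (change_id, emp_salary, change_date)
SALARY = [
    (1, 200.00, "2018-06-18 13:17:17"),
    (2, 700.00, "2018-06-25 15:20:30"),
    (3, 300.00, "2018-07-02 12:17:17"),
]

QUERY = """
WITH cte AS (
    -- rank rows within each year-month, newest first
    SELECT change_id, emp_salary, change_date,
           RANK() OVER (
               PARTITION BY strftime('%Y-%m', change_date)
               ORDER BY change_date DESC
           ) AS rnk
    FROM salary
)
SELECT change_id, emp_salary, change_date
FROM cte
WHERE rnk = 1
ORDER BY change_date
"""


def last_change_per_month_rank(rows):
    """Return the top-ranked (latest) salary row per month."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE salary ("
            "change_id INTEGER PRIMARY KEY, emp_salary REAL, change_date TEXT)"
        )
        con.executemany("INSERT INTO salary VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in last_change_per_month_rank(SALARY):
        print(row)
```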
The below query should work for you.
It groups by month and year to find the latest record for each month and year.
SELECT s1.*
FROM salary s1
INNER JOIN (
SELECT MAX(change_date) maxDate
FROM salary
GROUP BY MONTH(change_date), YEAR(change_date)
) s2 ON s2.maxDate = s1.change_date;
Fiddle link : http://sqlfiddle.com/#!9/1bc20b/15
I have a MySQL table with a timestamp column t. I need to create another integer column (groupId) that will have the same value for records whose timestamps differ by less than 3 seconds. My version of MySQL has no window function support. This is the expected output in the second column:
+---------------------+--------+
| t | groupId|
+---------------------+--------+
| 2017-06-17 18:15:13 | 1 |
| 2017-06-17 18:15:14 | 1 |
| 2017-06-17 20:30:06 | 2 |
| 2017-06-17 20:30:07 | 2 |
| 2017-06-17 22:44:58 | 3 |
| 2017-06-17 22:44:59 | 3 |
| 2017-06-17 23:59:50 | 4 |
| 2017-06-17 23:59:51 | 4 |
+---------------------+--------+
I tried to use self-join and TIMESTAMPDIFF(SECOND,t1,t2) <3
but I do not know how to generate the unique groupId.
P.S.
It is guaranteed by the nature of the data that there is no continuous range spanning more than 3 seconds.
You can do this using variables.
select tm
,@diff:=timestampdiff(second,@prev,tm)
,@prev:=tm
,@grp:=case when @diff<3 or @diff is null then @grp else @grp+1 end as groupID
from t
cross join (select @prev:=null,@diff:=0,@grp:=1) r
order by tm
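For comparison, on an engine that does have window functions the same grouping is usually written as gaps and islands: flag each row whose gap to the previous timestamp is 3 seconds or more, then take a running sum of the flags. The sketch below assumes SQLite 3.25+ via Python's sqlite3 module and a hypothetical table named events; strftime('%s', ...) converts the timestamps to epoch seconds.

```python
import sqlite3

# Timestamps from the question's expected output
TIMESTAMPS = [
    "2017-06-17 18:15:13", "2017-06-17 18:15:14",
    "2017-06-17 20:30:06", "2017-06-17 20:30:07",
    "2017-06-17 22:44:58", "2017-06-17 22:44:59",
    "2017-06-17 23:59:50", "2017-06-17 23:59:51",
]

QUERY = """
WITH d AS (
    SELECT t, LAG(t) OVER (ORDER BY t) AS prev_t FROM events
)
SELECT t,
       -- every gap of 3+ seconds starts a new group; the first row is group 1
       1 + SUM(CASE WHEN prev_t IS NOT NULL
                     AND CAST(strftime('%s', t) AS INTEGER)
                         - CAST(strftime('%s', prev_t) AS INTEGER) >= 3
                    THEN 1 ELSE 0 END) OVER (ORDER BY t) AS groupId
FROM d
ORDER BY t
"""


def group_ids(timestamps):
    """Return [(t, groupId), ...] ordered by t."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE events (t TEXT NOT NULL)")
        con.executemany("INSERT INTO events VALUES (?)", [(ts,) for ts in timestamps])
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for t, gid in group_ids(TIMESTAMPS):
        print(t, gid)
```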
For this, I believe you need to create a stored procedure that first sorts your table by the column t (timestamp) and then goes through it, grouping rows and assigning the groupId accordingly. In this case you can use your own counter as the groupId.
What is important here is how you split the time into frames of 2 seconds; you could end up with different results depending on your point of reference.
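The row-by-row logic such a procedure would follow can be sketched in plain Python: sort the timestamps, then increment a counter whenever the gap to the previous row reaches 3 seconds. The function name and the >= 3 threshold (chosen to match "less than 3 seconds" in the question) are assumptions, not part of the original answer.

```python
from datetime import datetime

# Timestamps from the question, deliberately shuffled to show the sort step
RAW = [
    "2017-06-17 20:30:07", "2017-06-17 18:15:13", "2017-06-17 23:59:50",
    "2017-06-17 18:15:14", "2017-06-17 22:44:58", "2017-06-17 20:30:06",
    "2017-06-17 23:59:51", "2017-06-17 22:44:59",
]
SAMPLE = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in RAW]


def assign_group_ids(timestamps, max_gap_seconds=3):
    """Return [(timestamp, group_id), ...] in time order.

    Rows less than max_gap_seconds apart share a group; a gap of
    max_gap_seconds or more starts a new group (numbered from 1).
    """
    group_id = 0
    previous = None
    result = []
    for ts in sorted(timestamps):  # plays the role of ORDER BY t
        if previous is None or (ts - previous).total_seconds() >= max_gap_seconds:
            group_id += 1
        result.append((ts, group_id))
        previous = ts
    return result


if __name__ == "__main__":
    for ts, gid in assign_group_ids(SAMPLE):
        print(ts, gid)
```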
This query puts a record in the same group as the previous one when the previous record is at most 3 seconds earlier:
UPDATE t
JOIN (
SELECT
t.*
, @gid := IF(TIMESTAMPDIFF(SECOND, @prev, t) > 3, @gid + 1, @gid) AS gid
, @prev := t
FROM t
, (SELECT @prev := NULL, @gid := 1) v
ORDER BY t
) sq ON t.t = sq.t
SET t.groupId = sq.gid;
This query will work in Oracle SQL:
select *
from (
select e.*,
rank() over (partition by trunc(hiredate,'mi') order by trunc(hiredate,'mi') desc) MINu
from emp e
)
The table structure is: user_id, Date (I'm used to working with timestamps).
For example:
user id | Date (TS)
A | '2014-08-10 14:02:53'
A | '2014-08-12 14:03:25'
A | '2014-08-13 14:04:47'
B | '2014-08-13 04:04:47'
...
And for the next week I have:
user id | Date (TS)
A | '2014-08-17 09:02:53'
B | '2014-08-17 10:04:47'
B | '2014-08-18 10:04:47'
A | '2014-08-19 10:04:22'
C | '2014-08-19 11:04:47'
...
And for today I have:
user id | Date (TS)
A | '2015-05-27 09:02:53'
B | '2015-05-27 10:04:47'
C | '2015-05-27 10:04:22'
D | '2015-05-27 17:04:47'
I need a single query that finds, for each day, the number of new users and the number of "returned" users, counting from the very beginning of each user's activity (a returned user is one who was already active on an earlier day).
Expected results :
date | New user | returned User
2014-08-10 | 1 | 0
2014-08-11 | 0 | 0
2014-08-12 | 0 | 1 (A was active on 08/10)
2014-08-13 | 1 | 1 (A was active on 08/12 & 08/10)
...
2014-08-17 | 0 | 2 (A & B were already active )
2014-08-18 | 0 | 1
2014-08-19 | 1 | 1
...
2015-05-27 | 1 | 3 (D is a new user)
After a long search on Stack Overflow I found some material provided by https://meta.stackoverflow.com/users/107744/spencer7593 here: Weekly Active Users for each day from log, but I didn't manage to adapt his query to output my expected results.
Thanks for your help
Assuming you have a date table somewhere (and using T-SQL syntax because I know it better), the key is to calculate the minimum date for each user separately, calculate the total number of users on each day, and then declare a returning user to be a user who wasn't new:
SELECT DateTable.Date, COALESCE(NewUsers, 0) AS NewUsers, COALESCE(NumUsers, 0) - COALESCE(NewUsers, 0) AS ReturningUsers
FROM
DateTable
LEFT JOIN
(
SELECT MinDate, COUNT(user_id) AS NewUsers
FROM (
SELECT user_id, min(CAST(date AS Date)) as MinDate
FROM [Table]
GROUP BY user_id
) A
GROUP BY MinDate
) B ON DateTable.Date = B.MinDate
LEFT JOIN
(
SELECT CAST(date AS Date) AS Date, COUNT(DISTINCT user_id) AS NumUsers
FROM [Table]
GROUP BY CAST(date AS Date)
) C ON DateTable.Date = C.Date
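The same idea can be sketched end to end in SQLite through Python's sqlite3 module. Since the answer assumes an existing date table, this sketch generates one with a recursive CTE; the table name activity, its columns user_id and ts, and the function name are hypothetical. It uses the first two weeks of sample data from the question, and COALESCE makes days without activity show 0 rather than NULL.

```python
import sqlite3

# First two weeks of sample activity from the question: (user_id, ts)
ACTIVITY = [
    ("A", "2014-08-10 14:02:53"), ("A", "2014-08-12 14:03:25"),
    ("A", "2014-08-13 14:04:47"), ("B", "2014-08-13 04:04:47"),
    ("A", "2014-08-17 09:02:53"), ("B", "2014-08-17 10:04:47"),
    ("B", "2014-08-18 10:04:47"), ("A", "2014-08-19 10:04:22"),
    ("C", "2014-08-19 11:04:47"),
]

QUERY = """
WITH RECURSIVE calendar(day) AS (
    -- stand-in for a date table: one row per day from start to end
    SELECT ?
    UNION ALL
    SELECT date(day, '+1 day') FROM calendar WHERE day < ?
),
first_seen AS (
    SELECT user_id, MIN(date(ts)) AS first_day FROM activity GROUP BY user_id
),
new_per_day AS (
    SELECT first_day AS day, COUNT(*) AS new_users FROM first_seen GROUP BY first_day
),
active_per_day AS (
    SELECT date(ts) AS day, COUNT(DISTINCT user_id) AS active_users
    FROM activity
    GROUP BY date(ts)
)
SELECT c.day,
       COALESCE(n.new_users, 0) AS new_users,
       COALESCE(a.active_users, 0) - COALESCE(n.new_users, 0) AS returning_users
FROM calendar c
LEFT JOIN new_per_day n ON n.day = c.day
LEFT JOIN active_per_day a ON a.day = c.day
ORDER BY c.day
"""


def daily_new_and_returning(activity, start, end):
    """Return [(day, new_users, returning_users), ...] for start..end inclusive."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE activity (user_id TEXT NOT NULL, ts TEXT NOT NULL)")
        con.executemany("INSERT INTO activity VALUES (?, ?)", activity)
        return con.execute(QUERY, (start, end)).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in daily_new_and_returning(ACTIVITY, "2014-08-10", "2014-08-19"):
        print(row)
```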
Thanks to Stephen, I made a small fix to his query. It works well, even if it's a bit time-consuming on a large database:
SELECT
DATE(Stats.Created),
NewUsers,
NumUsers - NewUsers AS ReturningUsers
FROM
Stats
LEFT JOIN
(
SELECT
MinDate,
COUNT(user_id) AS NewUsers
FROM (
SELECT
user_id,
MIN(DATE(Created)) as MinDate
FROM Stats
GROUP BY user_id
) A
GROUP BY MinDate
) B
ON DATE(Stats.Created) = B.MinDate
LEFT JOIN
(
SELECT
DATE(Created) AS Date,
COUNT(DISTINCT user_id) AS NumUsers
FROM Stats
GROUP BY DATE(Created)
) C
ON DATE(Stats.Created) = C.Date
GROUP BY DATE(Stats.Created)
I have a table where one column is the date:
+----------+---------------------+
| id | date |
+----------+---------------------+
| 5 | 2012-12-10 10:12:37 |
+----------+---------------------+
| 4 | 2012-12-10 09:09:55 |
+----------+---------------------+
| 3 | 2012-12-09 21:12:35 |
+----------+---------------------+
| 2 | 2012-12-09 20:15:07 |
+----------+---------------------+
| 1 | 2012-12-09 20:01:42 |
+----------+---------------------+
What I need is to count the rows that are, for example, within 3 hours of each other. In this example I want to group the top row with the 2nd row, and the 3rd row with the 4th and 5th rows. So my output should look like this:
+----------+---------------------+---------+
| id | date | count |
+----------+---------------------+---------+
| 5 | 2012-12-10 10:12:37 | 2 |
+----------+---------------------+---------+
| 3 | 2012-12-09 21:12:35 | 3 |
+----------+---------------------+---------+
How could I do this?
I think you need a self-join for this:
select t.id, t.date, COUNT(t2.id)
from t left outer join
t t2
on t.date between t2.date - interval 3 hour and t2.date + interval 3 hour
group by t.id, t.date
(This is untested code so it might have a syntax error.)
If you are trying to divide everything into 3-hour intervals, you can do something like:
select max(t.date), t.id, count(*)
from (select t.*,
-- INTERVAL is a reserved word in MySQL, so use a different alias
(date(date)*100 + floor(hour(date)/3)*3) as hour_bucket
from t
) t
group by hour_bucket
I am not sure how to do this in MySQL, but I was able to build a set of queries in SQL Server 2005 that provides the intended results. Here is the working sample. It is very complex, and may be overly complex, but that's how I was able to get the desired result:
WITH BaseData AS
(
SELECT 5 AS ID, '2012-12-10 10:12:37' AS Date
UNION ALL
SELECT 4 AS ID, '2012-12-10 09:09:55' AS Date
UNION ALL
SELECT 3 AS ID, '2012-12-09 21:12:35' AS Date
UNION ALL
SELECT 2 AS ID, '2012-12-09 20:15:07' AS Date
UNION ALL
SELECT 1 AS ID, '2012-12-09 20:01:42' AS Date
),
BaseDataWithRowNum AS
(
SELECT ID,DATE, ROW_NUMBER() OVER (ORDER BY Date DESC) AS RowNum
FROM BaseData
),
InterRelatedDates AS
(
SELECT B1.RowNum AS RowNum1,B2.RowNum AS RowNum2
FROM BaseDataWithRowNum B1
INNER JOIN BaseDataWithRowNum B2
ON B1.Date BETWEEN B2.Date AND DATEADD(hh,3,B2.Date)
AND B1.RowNum < B2.RowNum
AND B1.ID != B2.ID
),
InterRelatedDatesWithinMultipleGroups AS
(
SELECT G1.RowNum1,G2.RowNum2
FROM InterRelatedDates G1
LEFT JOIN InterRelatedDates G2
ON G1.RowNum2 = G2.RowNum2
AND G1.RowNum1 != G2.RowNum1
)
SELECT BN.ID,
BN.Date,
CountExcludingOriginalGrouppingRecord +1 AS C
FROM
(
SELECT RowNum1 AS RowNum,COUNT(1) AS CountExcludingOriginalGrouppingRecord
FROM
(
-- If a row was used in only one group then it is ok. use as it is
SELECT D1.RowNum1
FROM InterRelatedDatesWithinMultipleGroups AS D1
WHERE D1.RowNum2 IS NULL
UNION ALL
-- In case a row was selected in two groups, choose the one with higher date
SELECT Min(D1.RowNum1)
FROM InterRelatedDatesWithinMultipleGroups AS D1
WHERE D1.RowNum2 IS NOT NULL
GROUP BY D1.RowNum2
) T
GROUP BY RowNum1
) T2
INNER JOIN BaseDataWithRowNum BN
ON BN.RowNum = T2.RowNum
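On engines with window functions, the grouping asked for in the question can also be treated as gaps and islands: start a new group whenever a row is more than 3 hours after the previous one, then report the latest row and the row count of each group. The following is a sketch against SQLite 3.25+ via Python's sqlite3 module, with a hypothetical table name t. It is not a translation of the SQL Server query above, and it chains rows (each within 3 hours of the previous one), which matches the expected output for this sample.

```python
import sqlite3

# Sample rows from the question: (id, date)
ROWS = [
    (5, "2012-12-10 10:12:37"), (4, "2012-12-10 09:09:55"),
    (3, "2012-12-09 21:12:35"), (2, "2012-12-09 20:15:07"),
    (1, "2012-12-09 20:01:42"),
]

QUERY = """
WITH d AS (
    SELECT id, date, LAG(date) OVER (ORDER BY date) AS prev_date
    FROM t
),
g AS (
    -- a running count of "group starts" gives every row its group number
    SELECT id, date,
           SUM(CASE WHEN prev_date IS NULL
                      OR CAST(strftime('%s', date) AS INTEGER)
                         - CAST(strftime('%s', prev_date) AS INTEGER) > 3 * 3600
                    THEN 1 ELSE 0 END) OVER (ORDER BY date) AS grp
    FROM d
),
ranked AS (
    SELECT id, date,
           COUNT(*) OVER (PARTITION BY grp) AS cnt,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY date DESC) AS rn
    FROM g
)
SELECT id, date, cnt FROM ranked WHERE rn = 1 ORDER BY date DESC
"""


def groups_within_3_hours(rows):
    """Return [(id, date, count), ...]: the latest row of each group plus its size."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, date TEXT NOT NULL)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in groups_within_3_hours(ROWS):
        print(row)
```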