Finding time difference based on distinct ids in mySQL

I would like to find the day difference between the latest and the 2nd latest distinct order_id for each user.
The intended output would be:
user_id | order_diff
1 | 1
3 | 7
8 | 1
order_diff represents the difference in days between 2 distinct order_id. In the event that there are no two distinct order_id (as in the case for user id 9), the result is not returned.
In this case, the order_diff for user_id 1 is 1, since the day difference between his two latest distinct order_ids is 1. However, there is no order_diff for user_id 9, since he does not have two distinct order_ids.
This is the dataset:
user_id order_id order_time
1 208965785 2016-12-15 17:14:13
1 201765785 2016-12-14 17:19:05
1 203932785 2016-12-13 20:41:30
1 209612785 2016-12-14 20:14:32
1 208112785 2016-12-14 20:27:08
1 205525785 2016-12-14 17:01:26
1 208812785 2016-12-14 20:18:23
1 206432785 2016-12-11 20:32:20
1 206698785 2016-12-14 10:50:15
2 209524795 2016-11-26 18:06:21
3 206529925 2016-10-01 10:43:57
3 203729925 2016-10-08 10:43:11
4 204876145 2016-09-24 10:23:49
5 203363157 2016-07-13 23:56:43
6 207784875 2017-01-04 12:21:21
7 206437177 2016-06-25 02:40:33
8 202819645 2016-09-09 11:47:27
8 202819645 2016-09-09 11:47:27
8 202819646 2016-09-08 11:47:27
9 205127187 2016-06-05 22:21:18
9 205127187 2016-06-05 22:21:18
11 207874877 2016-06-17 16:49:44
12 204927595 2016-11-28 23:05:40
This is the code that I am currently using:
SELECT e1.user_id,datediff(e1.order_time,e2.time), e1.order_id FROM
sales e1
JOIN
sales e2
ON
e1.user_id=e2.user_id
AND
e1.order_id = (SELECT distinct order_id FROM sales temp1 WHERE temp1.order_id =e1.order_id ORDER BY order_time DESC LIMIT 1)
AND
e2.order_id = (SELECT distinct order_id FROM sales temp2 WHERE temp2.order_id=e2.order_id ORDER BY order_time DESC LIMIT 1 OFFSET 1)
My output does not produce the desired output and it also ignores the cases where order_ids are the same.
Edit: I would also like the query to be extended to larger datasets where the 2nd most recent order_time may not be the min(order_time)

Based on your fiddle:
select user_id,
datediff(max(order_time),
( -- Scalar Subquery to get the 2nd largest order_time
select max(order_time)
from orders as o2
where o2.user_id = o.user_id -- same user
and o2.order_time < max(o.order_time) -- but not the max time
)
) as diff
from orders as o
group by user_id
having diff is not null -- if there's no 2nd largest time diff will be NULL

Following would work:
Schema (MySQL v5.7)
CREATE TABLE orders
(`user_id` int, `order_id` int, `order_time` datetime)
;
INSERT INTO orders
(`user_id`, `order_id`, `order_time`)
VALUES
(1,208965785,'2016-12-15 17:14:13'),
(1,201765785,'2016-12-14 17:19:05'),
(1,203932785,'2016-12-13 20:41:30'),
(1,209612785,'2016-12-14 20:14:32'),
(1,208112785,'2016-12-14 20:27:08'),
(1,205525785,'2016-12-14 17:01:26'),
(1,208812785,'2016-12-14 20:18:23'),
(1,206432785,'2016-12-11 20:32:20'),
(1,206698785,'2016-12-14 10:50:15'),
(2,209524795,'2016-11-26 18:06:21'),
(3,206529925,'2016-10-01 10:43:57'),
(3,203729925,'2016-10-08 10:43:11'),
(4,204876145,'2016-09-24 10:23:49'),
(5,203363157,'2016-07-13 23:56:43'),
(6,207784875,'2017-01-04 12:21:21'),
(7,206437177,'2016-06-25 02:40:33'),
(8,202819645,'2016-09-09 11:47:27'),
(8,202819645,'2016-09-09 11:47:27'),
(8,202819646,'2016-09-08 11:47:27'),
(9,205127187,'2016-06-05 22:21:18'),
(9,205127187,'2016-06-05 22:21:18'),
(11,207874877,'2016-06-17 16:49:44'),
(12,204927595,'2016-11-28 23:05:40');
Query #1
SELECT dt2.user_id,
MIN(datediff(dt2.latest_order_time,
dt2.second_latest_order_time)) AS order_diff
FROM (
SELECT o.user_id,
o.order_time AS latest_order_time,
(SELECT o2.order_time
FROM orders AS o2
WHERE o2.user_id = o.user_id AND
o2.order_id <> o.order_id
ORDER BY o2.order_time DESC LIMIT 1) AS second_latest_order_time
FROM orders AS o
JOIN (SELECT user_id, MAX(order_time) AS latest_order_time
FROM orders
GROUP BY user_id) AS dt
ON dt.user_id = o.user_id AND
dt.latest_order_time = o.order_time
) AS dt2
WHERE dt2.second_latest_order_time IS NOT NULL
GROUP BY dt2.user_id;
| user_id | order_diff |
| ------- | ---------- |
| 1 | 1 |
| 3 | 7 |
| 8 | 1 |
View on DB Fiddle
Details:
We determine maximum order_time for a user_id in a sub-select query (Derived Table). We can alias it as latest_order_time.
We Join this result-set to the orders table. This will help us in considering only the row(s) with maximum value of order_time for a user_id.
Now, we use a Correlated Subquery to determine the maximum order_time value for the same user, out of the rest of order_id value(s). We can alias it as second_latest_order_time.
Finally, use this as a Derived Table again, and remove all the cases where second_latest_order_time is null, and calculate datediff() for the rest.
A final Group By is needed, as your data has multiple entries for a

Here is the solution:
SELECT user_id,
DATEDIFF(MAX(order_time), MIN(order_time)) as order_diff
FROM orders
GROUP BY user_id
HAVING order_diff > 0;
Here is a link to test it.
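Note that the MAX/MIN query above only matches the intended output when a user has exactly two distinct order times; on the sample data it would report 4 for user_id 1 (2016-12-15 minus 2016-12-11) rather than 1. A minimal sketch of an alternative, assuming MySQL 8.0 or later (for CTEs and window functions) and the orders table from the schema above, ranks each user's distinct orders and compares the top two:

```sql
-- Sketch only: assumes MySQL 8.0+ and the orders(user_id, order_id, order_time) table above.
WITH distinct_orders AS (
    -- Collapse duplicate rows so each order_id appears once per user.
    SELECT user_id, order_id, MAX(order_time) AS order_time
    FROM orders
    GROUP BY user_id, order_id
),
ranked AS (
    -- Number each user's distinct orders from newest (1) to oldest.
    SELECT user_id,
           order_time,
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY order_time DESC) AS rn
    FROM distinct_orders
)
SELECT user_id,
       DATEDIFF(MAX(CASE WHEN rn = 1 THEN order_time END),
                MAX(CASE WHEN rn = 2 THEN order_time END)) AS order_diff
FROM ranked
WHERE rn <= 2
GROUP BY user_id
HAVING COUNT(*) = 2      -- drop users without two distinct orders
ORDER BY user_id;
```

Users with a single distinct order_id (such as user_id 9) have only one ranked row and are filtered out by the HAVING clause.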

Related

MySQL - Count unique users each day considering all previous days

I would like to count how many new unique users the database gets each day for all days recorded.
There will not be any duplicate ids per day, but there will be duplicates over multiple days.
If my table looks like this :
ID | DATE
---------
1 | 2022-05-21
1 | 2022-05-22
2 | 2022-05-22
1 | 2022-05-23
2 | 2022-05-23
1 | 2022-05-24
2 | 2022-05-24
3 | 2022-05-24
I would like the results to look like this :
DATE | NEW UNIQUE IDs
---------------------------
2022-05-21 | 1
2022-05-22 | 1
2022-05-23 | 0
2022-05-24 | 1
A query such as :
SELECT `date` , COUNT( DISTINCT id)
FROM tbl
GROUP BY DATE( `date` )
Will return the count per day and will not take into account previous days.
Any assistance would be appreciated.
Edit : Using MySQL 8

The user is new when the date is the least date for this user.
So you need something like
SELECT date, COUNT(new_users.id)
FROM calendar
LEFT JOIN ( SELECT id, MIN(date) date
FROM test
GROUP BY id ) new_users USING (date)
GROUP BY date
calendar is either static or dynamically generated table with needed dates list. It can be even SELECT DISTINCT date FROM test subquery.

Start with a subquery showing the earliest date where each id appears.
SELECT MIN(`date`) `firstdate`, id
FROM tbl
GROUP BY id
Then do your count on that subquery.
SELECT firstdate, COUNT(*)
FROM (
SELECT MIN(`date`) `firstdate`, id
FROM tbl
GROUP BY id
) m
GROUP BY firstdate
That gives you what you want.
But it doesn't have rows for the dates where no new user ids first appeared.
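One way to fill in those zero rows, sketched under the assumption that every date of interest already appears at least once in tbl, is to LEFT JOIN the first-appearance subquery onto the list of distinct dates:

```sql
-- Sketch only: assumes the tbl(id, `date`) table from the question.
SELECT d.`date`,
       COUNT(f.id) AS new_unique_ids   -- counts only matched rows, so unmatched dates give 0
FROM (SELECT DISTINCT `date` FROM tbl) AS d
LEFT JOIN (
    -- Earliest date on which each id appears.
    SELECT id, MIN(`date`) AS firstdate
    FROM tbl
    GROUP BY id
) AS f ON f.firstdate = d.`date`
GROUP BY d.`date`
ORDER BY d.`date`;
```

If some calendar days have no rows in tbl at all, the list of distinct dates would need to be replaced by a separate calendar table.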

Only count (and sum) the rows where the left join fails:
SELECT
m1.`DATE` ,
sum(CASE WHEN m2.id is null THEN 1 ELSE 0 END) as C
FROM mytable m1
LEFT JOIN mytable m2 ON m2.`DATE`<m1.`DATE` AND m2.ID=m1.ID
GROUP BY m1.`DATE`
see: DBFIDDLE

How to fetch rows from which sum of a single integer/float column sums upto a certain value

I have a table. It has the following structure
goods_receiving_items
id
item_id
quantity
created_at
I am trying to fetch rows against which have the following conditions
Has one item_id
When the sum of the quantity column equals a certain value
So for example I have the following data
+----+---------+----------+------------+
| id | item_id | quantity | created_at |
+----+---------+----------+------------+
| 1 | 2 | 11 | 2019-10-10 |
| 2 | 3 | 110 | 2019-10-11 |
| 3 | 2 | 20 | 2019-11-09 |
| 4 | 2 | 5 | 2019-11-10 |
| 5 | 2 | 1 | 2019-11-11 |
+----+---------+----------+------------+
I have tried the following query:
SET @sum:= 0;
SELECT item_id, created_at, (@sum:= @sum + quantity) AS SUM, quantity
FROM goods_receiving_items
WHERE item_id = 2 AND @sum<= 6
ORDER BY created_at DESC
If I don't use ORDER BY, then the query will give me ID '1'. But if I use ORDER BY it will return all the rows with item_id = 2.
What should be returned are IDs '5' and '4' exclusively in this order
I can't seem to resolve this and ORDER BY is essential to my task.
Any help would be appreciated

You should apply the ORDER BY to the resulting set.
You could do this using a subquery:
SET @sum:= 0;
select t.*
from (
SELECT item_id
, created_at
, (@sum:= @sum + quantity) as sum
, quantity
FROM goods_receiving_items
WHERE item_id = 2 AND @sum<= 6
) t
ORDER BY created_at DESC

You should try an INNER JOIN with SELECT min(created_at) or SELECT max(created_at)
From MYSQL docs:
...the selection of values from each group cannot be influenced by
adding an ORDER BY clause. Sorting of the result set occurs after
values have been chosen, and ORDER BY does not affect which values the
server chooses.
The answers on the following might help in more detail: MYSQL GROUP BY and ORDER BY not working together as expected

After searching around, I have made up the following query
SELECT
t.id, t.quantity, t.created_at, t.sum
FROM
( SELECT
*,
@bal := @bal + quantity AS sum,
IF(@bal >= $search_number, @doneHere := @doneHere + 1 , @doneHere) AS whereToStop
FROM goods_receiving_items
CROSS JOIN (SELECT @bal := 0.0 , @doneHere := 0) var
WHERE item_id = $item_id
ORDER BY created_at DESC) AS t
WHERE t.whereToStop <= 1
ORDER BY t.created_at ASC
In the above query, $search_number is a variable that holds the value that has to be reached. $item_id is the item we are searching against.
This will return all rows for which the sum of the column quantity makes up the required sum. The sum will be made with rows in descending order by created_at and then will be rearranged in ascending order.
I was using this query to calculate the cost when a certain amount of items are being used in an inventory management system; so this might help someone else do the same. I took most of the query from another question here on StackOverflow
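On MySQL 8.0 or later, the same running total can be sketched with a window function instead of user variables assigned inside the SELECT. The session variables @item_id and @search_number below are hypothetical stand-ins for $item_id and $search_number, and the id column is used only as a tie-breaker for rows with the same created_at:

```sql
-- Sketch only: assumes MySQL 8.0+ and the goods_receiving_items table from the question.
SET @item_id = 2;          -- hypothetical parameter: item to search
SET @search_number = 6;    -- hypothetical parameter: quantity to reach

SELECT id, quantity, created_at, running_total
FROM (
    SELECT id, quantity, created_at,
           -- Cumulative quantity, newest rows first.
           SUM(quantity) OVER (ORDER BY created_at DESC, id DESC) AS running_total
    FROM goods_receiving_items
    WHERE item_id = @item_id
) AS t
-- Keep a row while the total *before* it is still below the target,
-- so the row that reaches the target is included.
WHERE running_total - quantity < @search_number
ORDER BY created_at DESC, id DESC;
```

With the sample data and a target of 6, this keeps ids 5 and 4 (quantities 1 and 5), in that order.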

MySQL - How to get the next row

So I have a student_profiles table and ranks table, I want to get the next rank based on the student rank. For example, I have rank 5 then the next rank will be rank 6. So this is my rank structure.
RANKS TABLE:
SELECT * FROM RANKS WHERE style_id = 1
id style_id level name type primary_colour secondary_colour
1 1 1 Newbie double #4e90b2 #3aad04
22 1 2 Normal solid #fba729 NULL
31 1 3 Expert solid #4e805b NULL
and this is STUDENT_PROFILES TABLE
id | student_id | rank_id
------------------------------------
1 | 1 | 36
2 | 4 | 22
3 | 7 | 10
so all I have a variable is student_id, rank_id & style_id
so for example, I have this value student_id = 4, rank_id = 22 & style_id = 1
It should return
id style_id level name type primary_colour secondary_colour
31 | 1 | 3 | Expert | Solid | #4e805b | NULL

If you just want to get the second row:
Do it like this:
select * from
(select * from table order by id asc limit 2) as a order by id desc limit 1
This pattern works with any query when you need the second row.

Try with that:
SELECT * FROM `ranks` WHERE `level` > (SELECT `level` FROM `ranks` WHERE `id` = rank_id) ORDER BY `level` LIMIT 1
But I don't think it is a very efficient solution.

One option for getting the next highest level in the RANKS table is to self-join this table on the level column, order ascending, and retain the very first record only.
SELECT r2.*
FROM RANKS r1
INNER JOIN
STUDENT_PROFILES s1
ON r1.id = s1.rank_id
INNER JOIN
RANKS r2
ON r2.level > r1.level
ORDER BY r2.level
LIMIT 1
Demo here:
SQLFiddle
Note: If RANKS has duplicate levels, and you want the next level with regard to cardinality (i.e. you don't want a duplicate equal level returned), then my query could be slightly modified to filter out such duplicates.
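To restrict the lookup to one student and to ranks of the same style, the self-join can be narrowed as in the following sketch. It assumes lowercase table names (student_profiles, ranks) as written in the question's prose, and @student_id is a hypothetical session variable holding the student being looked up:

```sql
-- Sketch only: assumes the ranks and student_profiles tables from the question.
SET @student_id = 4;   -- hypothetical parameter, taken from the question's example

SELECT r2.*
FROM student_profiles AS s
JOIN ranks AS r1 ON r1.id = s.rank_id          -- the student's current rank
JOIN ranks AS r2 ON r2.style_id = r1.style_id  -- candidate ranks in the same style
                AND r2.level > r1.level        -- with a strictly higher level
WHERE s.student_id = @student_id
ORDER BY r2.level                              -- lowest of the higher levels first
LIMIT 1;
```

For student_id 4 (rank_id 22, level 2, style_id 1) and the sample rows, this selects the rank with id 31 (Expert, level 3). A student already at the highest level gets no row back.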

How to return results that relate across 3 tables efficiently MySQL

Howdie do,
I have the following 3 tables: order, manifest and tracking_updates. Now, each order has foreign key called manifest_id that references the manifest table. Several orders can be in a manifest. The tracking_updates table has a foreign key called order_id that references the order table.
Now, the manifest table contains a column named upload_date. That column, upload_date is the column I need to use in order to determine if an order was uploaded in the last 30 days.
The tracking_update table can contain many updates for each order and so, I must return the most recent tracking update status for each order that matches the criteria below:
1. orders < 30 days, any delivery status
2. orders > 30 days, not delivered
Please see tables below
**Order**
ID | manifest_id
1 | 123
2 | 123
3 | 456
**Manifest**:
ID | upload_date
123 | 2015-12-15 09:31:12
456 | 2015-10-13 09:31:12
**Tracking Update**:
order_id | status_type | last_updated
1 | M | 2015-12-15 00:00:00
1 | I | 2015-12-16 07:20:00
1 | D | 2015-12-17 15:20:00
2 | M | 2015-12-15 00:00:00
2 | D | 2015-12-16 15:20:00
3 | M | 2015-10-13 00:00:00
3 | I | 2015-10-14 12:00:00
3 | E | 2015-10-15 13:50:00
This is what the result set would look like for the orders above
**Result Set**
order_id | manifest_id | latest_tracking_update_status
1 | 123 | D
2 | 123 | D
3 | 456 | E
As you can see, order 1, 2 are assigned to manifest 123 and the manifest was uploaded within the last 30 days and their latest tracking update shows a 'D' for delivered. So those two orders should be included in the result set.
Order 3 is older than 30 days, but hasn't been delivered based on the latest tracking_update status_type, so it should show up in the result set.
Now, the tracking_update table has well over 1 million updates across all orders. So I'm really going for efficiency here
Currently, I have the following queries.
Query #1 returns orders that have been uploaded within the last 30 days and their corresponding latest tracking update
SELECT
fgw247.order.id as order_id,
(SELECT
status_type
FROM
tracking_update as tu
WHERE
tu.order_id = order_id
ORDER BY
tu.ship_update_date DESC
LIMIT
1
) as latestTrackingUpdate
FROM
fgw247.order, manifest
WHERE
fgw247.order.manifest_id = manifest.id
AND
upload_date >= '2015-12-12 00:00:00'
Query #2 returns the order_id and latest tracking update for every order in the tracking_update table:
SELECT tracking_update.order_id,
substring_index(group_concat(tracking_update.status_type order by tracking_update.last_updated), ',', -1)
FROM
tracking_update
WHERE
tracking_update.order_id is not NULL
GROUP BY tracking_update.order_id
I'm just not sure how to combine these queries to get my orders that match the criteria:
orders < 30 days, any delivery status
orders > 30 days, not delivered
Any ideas would be GREATLY appreciated.
* UPDATE *
This is the current query thanks to answer selected:
select
o.id, t.maxudate, tu.status_type, m.upload_date
from
(select order_id, max(last_updated) as maxudate from tracking_update group by order_id) t
inner join
tracking_update tu on t.order_id=tu.order_id and t.maxudate=tu.last_updated
right join
fgw247.order o on t.order_id=o.id
left join
manifest m on o.manifest_id=m.id
where
(tu.status_type != 'D' and tu.status_type != 'XD' and m.upload_date <='2015-12-12 00:00:00') or m.upload_date >= '2015-12-12 00:00:00'
LIMIT 10
UPDATE
This is the current query that joins the three tables rather efficiently
SELECT
o.*, tu.*
FROM
fgw247.`order` o
JOIN
manifest m
ON
o.`manifest_id` = m.`id`
JOIN
`tracking_update` tu
ON
tu.`order_id` = o.`id` and tu.`ship_update_date` = (select max(last_updated) as last_updated from tracking_update where order_id = o.`id` group by order_id)
WHERE
m.`upload_date` >= '2015-12-14 11:50:12'
OR
(o.`delivery_date` IS NULL AND m.`upload_date` < '2015-12-14 11:50:12')
LIMIT 100

Have a subquery that returns the latest update date from the tracking table for each order. Join this subquery on the tracking, orders, and manifests tables to get the details and filter based on the upload date in the where clause:
select o.order_id, t.maxudate, tu.status_type, m.upload_date
from (select order_id, max(update_date) as maxudate from tracking_update group by order_id) t
inner join tracking_update tu on t.order_id=tu.order_id and t.maxudate=tu.update_date
right join orders o on t.order_id=o.order_id
left join manifests m on o.manifest_id=m.manifest_id
where (tu.status_type<>'D' and datediff(curdate(), m.upload_date)>30) or datediff(curdate(), m.upload_date)<=30
It may be more efficient to use a union query instead of the or criteria in the where clause.
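As an alternative sketch, assuming MySQL 8.0 or later and the table and column names from the question (`order`, manifest, tracking_update.last_updated), the latest update per order can be picked with ROW_NUMBER(). The two criteria, "recent" or "old and not delivered", reduce logically to "recent or not delivered":

```sql
-- Sketch only: assumes MySQL 8.0+ and the question's schema; 'D' means delivered.
SELECT o.id            AS order_id,
       o.manifest_id,
       tu.status_type  AS latest_tracking_update_status
FROM `order` AS o
JOIN manifest AS m ON m.id = o.manifest_id
JOIN (
    -- Number each order's updates from newest (1) to oldest.
    SELECT order_id, status_type,
           ROW_NUMBER() OVER (PARTITION BY order_id
                              ORDER BY last_updated DESC) AS rn
    FROM tracking_update
) AS tu ON tu.order_id = o.id AND tu.rn = 1
WHERE m.upload_date >= NOW() - INTERVAL 30 DAY   -- uploaded in the last 30 days
   OR tu.status_type <> 'D';                     -- or older and not yet delivered
```

Because of the inner join, orders without any tracking update are left out. An index on tracking_update (order_id, last_updated) may help with a table of this size, though that would need to be measured.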

You can perform a JOIN with the 2nd query result like
SELECT
fgw247.order.id as order_id,
xx.some_column,
(SELECT
status_type
FROM
tracking_update as tu
WHERE tu.order_id = fgw247.order.id
ORDER BY
tu.ship_update_date DESC
LIMIT
1
) as latestTrackingUpdate
FROM
fgw247.order JOIN manifest
ON fgw247.order.manifest_id = manifest.id
JOIN (
SELECT tracking_update.order_id,
substring_index(group_concat(tracking_update.status_type order by tracking_update.last_updated), ',', -1) AS some_column
FROM
tracking_update
WHERE
tracking_update.order_id is not NULL
GROUP BY tracking_update.order_id ) xx ON xx.order_id = fgw247.order.id
WHERE upload_date >= '2015-12-12 00:00:00'

Using ORDER BY and GROUP BY together

My table looks like this (and I'm using MySQL):
m_id | v_id | timestamp
------------------------
6 | 1 | 1333635317
34 | 1 | 1333635323
34 | 1 | 1333635336
6 | 1 | 1333635343
6 | 1 | 1333635349
My target is to take each m_id one time, and order by the highest timestamp.
The result should be:
m_id | v_id | timestamp
------------------------
6 | 1 | 1333635349
34 | 1 | 1333635336
And I wrote this query:
SELECT * FROM table GROUP BY m_id ORDER BY timestamp DESC
But, the results are:
m_id | v_id | timestamp
------------------------
34 | 1 | 1333635323
6 | 1 | 1333635317
I think this happens because it first does the GROUP BY and then orders the results.
Any ideas? Thank you.

One way to do this that correctly uses group by:
select l.*
from table l
inner join (
select
m_id, max(timestamp) as latest
from table
group by m_id
) r
on l.timestamp = r.latest and l.m_id = r.m_id
order by timestamp desc
How this works:
selects the latest timestamp for each distinct m_id in the subquery
only selects rows from table that match a row from the subquery (this operation -- where a join is performed, but no columns are selected from the second table, it's just used as a filter -- is known as a "semijoin" in case you were curious)
orders the rows

If you really don't care about which timestamp you'll get and your v_id is always the same for a given m_id, you can do the following:
select m_id, v_id, max(timestamp) from table
group by m_id, v_id
order by max(timestamp) desc
Now, if the v_id changes for a given m_id then you should do the following
select t1.* from table t1
left join table t2 on t1.m_id = t2.m_id and t1.timestamp < t2.timestamp
where t2.timestamp is null
order by t1.timestamp desc

Here is the simplest solution
select m_id,v_id,max(timestamp) from table group by m_id;
Group by m_id but get max of timestamp for each m_id.

You can try this
SELECT tbl.* FROM (SELECT * FROM table ORDER BY timestamp DESC) as tbl
GROUP BY tbl.m_id

SQL>
SELECT interview.qtrcode QTR, interview.companyname "Company Name", interview.division Division
FROM interview
JOIN jobsdev.employer
ON (interview.companyname = employer.companyname AND employer.zipcode like '100%')
GROUP BY interview.qtrcode, interview.companyname, interview.division
ORDER BY interview.qtrcode;

I felt confused when I tried to understand the question and answers at first. I spent some time reading and I would like to make a summary.
The OP's example is a little bit misleading.
At first I didn't understand why the accepted answer is the accepted answer. I thought that the OP's request could simply be fulfilled with
select m_id, v_id, max(timestamp) as max_time from table
group by m_id, v_id
order by max_time desc
Then I took a second look at the accepted answer. And I found that actually the OP wants to express that, for a sample table like:
m_id | v_id | timestamp
------------------------
6 | 1 | 11
34 | 2 | 12
34 | 3 | 13
6 | 4 | 14
6 | 5 | 15
he wants to select all columns based only on (group by)m_id and (order by)timestamp.
Then the above SQL won't work. If you still don't get it, imagine you have more columns than m_id | v_id | timestamp, e.g. m_id | v_id | timestamp | columnA | columnB | columnC | .... With GROUP BY, you can only select the "group by" columns and aggregate functions in the result.
By now, you should understand the accepted answer.
What's more, check row_number function introduced in MySQL 8.0:
https://www.mysqltutorial.org/mysql-window-functions/mysql-row_number-function/
Finding top N rows of every group
It does a similar thing to the accepted answer.
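A minimal sketch of the ROW_NUMBER() approach, assuming MySQL 8.0 or later; the table name messages is hypothetical, standing in for the question's table:

```sql
-- Sketch only: assumes MySQL 8.0+; `messages` is a hypothetical table name.
SELECT m_id, v_id, `timestamp`
FROM (
    SELECT m_id, v_id, `timestamp`,
           -- Number each m_id's rows from newest (1) to oldest.
           ROW_NUMBER() OVER (PARTITION BY m_id
                              ORDER BY `timestamp` DESC) AS rn
    FROM messages
) AS ranked
WHERE rn = 1                 -- keep only the newest row per m_id
ORDER BY `timestamp` DESC;
```

Because whole rows are ranked rather than aggregated, every selected column comes from the same row, which is what the plain GROUP BY versions cannot guarantee.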
Some answers are wrong. My MySQL gives me error.
select m_id,v_id,max(timestamp) from table group by m_id;
@abinash sahoo
SELECT m_id,v_id,MAX(TIMESTAMP) AS TIME
FROM table_name
GROUP BY m_id
@Vikas Garhwal
Error message:
[42000][1055] Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'testdb.test_table.v_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
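If v_id really does not matter, one way to keep the short query valid under ONLY_FULL_GROUP_BY is MySQL's ANY_VALUE() function (available since MySQL 5.7). The sketch below assumes the test_table named in the error message:

```sql
-- Sketch only: assumes MySQL 5.7+ and the test_table from the error message.
SELECT m_id,
       ANY_VALUE(v_id) AS v_id,       -- an arbitrary v_id from the group
       MAX(`timestamp`) AS latest
FROM test_table
GROUP BY m_id
ORDER BY latest DESC;
```

ANY_VALUE() only suppresses the error: the v_id it returns is not guaranteed to come from the row holding the maximum timestamp, so this is unsuitable when the columns must belong to the same row.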

Why make it so complicated? This worked.
SELECT m_id,v_id,MAX(TIMESTAMP) AS TIME
FROM table_name
GROUP BY m_id

You just need to replace desc with asc. Write the query like below. It will return the values in ascending order.
SELECT * FROM table GROUP BY m_id ORDER BY m_id asc;
