MySQL - Count unique users each day considering all previous days

I would like to count how many new unique users the database gets each day for all days recorded.
There will not be any duplicate ids per day, but there will be duplicates over multiple days.
If my table looks like this :
ID | DATE
---------
1 | 2022-05-21
1 | 2022-05-22
2 | 2022-05-22
1 | 2022-05-23
2 | 2022-05-23
1 | 2022-05-24
2 | 2022-05-24
3 | 2022-05-24
I would like the results to look like this :
DATE | NEW UNIQUE IDs
---------------------------
2022-05-21 | 1
2022-05-22 | 1
2022-05-23 | 0
2022-05-24 | 1
A query such as :
SELECT `date` , COUNT( DISTINCT id)
FROM tbl
GROUP BY DATE( `date` )
Will return the count per day and will not take into account previous days.
Any assistance would be appreciated.
Edit : Using MySQL 8

A user is new on the earliest date on which that user appears.
So you need something like
SELECT date, COUNT(new_users.id)
FROM calendar
LEFT JOIN ( SELECT id, MIN(date) date
FROM test
GROUP BY id ) new_users USING (date)
GROUP BY date
calendar is either a static or a dynamically generated table with the list of needed dates. It can even be a SELECT DISTINCT date FROM test subquery.
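A minimal sketch of the calendar approach, run against SQLite through Python's sqlite3 module as a stand-in for MySQL (that substitution is an assumption made so the example is runnable anywhere). The function name new_users_per_day is made up for illustration; the calendar is built with the SELECT DISTINCT subquery, and the data is the sample table from the question.

```python
import sqlite3

def new_users_per_day(rows):
    """Count, for every recorded date, the ids whose first appearance is that date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (id INTEGER, date TEXT)")
    con.executemany("INSERT INTO test VALUES (?, ?)", rows)
    query = """
        SELECT calendar.date, COUNT(new_users.id)
        FROM (SELECT DISTINCT date FROM test) AS calendar
        LEFT JOIN (SELECT id, MIN(date) AS date
                   FROM test
                   GROUP BY id) AS new_users
               ON new_users.date = calendar.date
        GROUP BY calendar.date
        ORDER BY calendar.date
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Sample data from the question.
sample = [
    (1, "2022-05-21"),
    (1, "2022-05-22"), (2, "2022-05-22"),
    (1, "2022-05-23"), (2, "2022-05-23"),
    (1, "2022-05-24"), (2, "2022-05-24"), (3, "2022-05-24"),
]
per_day = new_users_per_day(sample)
for day, new_ids in per_day:
    print(day, new_ids)
```

COUNT(new_users.id) only counts non-NULL values, so a date with no first appearances (2022-05-23 here) still gets a row with 0.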

Start with a subquery showing the earliest date where each id appears.
SELECT MIN(`date`) `firstdate`, id
FROM tbl
GROUP BY id
Then do your count on that subquery.
SELECT firstdate, COUNT(*)
FROM (
SELECT MIN(`date`) `firstdate`, id
FROM tbl
GROUP BY id
) m
GROUP BY firstdate
That gives you the counts you want, but it has no rows for the dates on which no new user ids first appeared.

Only count (and sum) the rows where the left join fails:
SELECT
m1.`DATE` ,
sum(CASE WHEN m2.id is null THEN 1 ELSE 0 END) as C
FROM mytable m1
LEFT JOIN mytable m2 ON m2.`DATE`<m1.`DATE` AND m2.ID=m1.ID
GROUP BY m1.`DATE`
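To see why summing the failed left joins works, here is a small sketch using SQLite via Python's sqlite3 in place of MySQL (an assumption for the sake of a runnable example). The function name count_first_appearances is hypothetical. It leans on the question's guarantee that an id appears at most once per day.

```python
import sqlite3

def count_first_appearances(rows):
    """Per date, count rows that have no earlier row with the same id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (id INTEGER, date TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    query = """
        SELECT m1.date,
               SUM(CASE WHEN m2.id IS NULL THEN 1 ELSE 0 END) AS new_ids
        FROM mytable AS m1
        LEFT JOIN mytable AS m2
               ON m2.id = m1.id AND m2.date < m1.date
        GROUP BY m1.date
        ORDER BY m1.date
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Sample data from the question.
table_rows = [
    (1, "2022-05-21"),
    (1, "2022-05-22"), (2, "2022-05-22"),
    (1, "2022-05-23"), (2, "2022-05-23"),
    (1, "2022-05-24"), (2, "2022-05-24"), (3, "2022-05-24"),
]
first_counts = count_first_appearances(table_rows)
print(first_counts)
```

A row with earlier matches is repeated once per match, but none of those copies has a NULL m2.id, so they add nothing to the sum. Only rows with no earlier match contribute 1.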

Related

Finding time difference based on distinct ids in mySQL

I would like to find the day difference between the latest and the 2nd latest distinct order_id for each user.
The intended output would be:
user_id | order_diff
1 | 1
3 | 7
8 | 1
order_diff represents the difference in days between 2 distinct order_id. In the event that there are no two distinct order_id (as in the case for user id 9), the result is not returned.
In this case, the order_diff for user_id 1 is 1 since the day difference between his 2 latest distinct order_id is 1. However, there is no order_diff for user_id 9 since he does not have 2 distinct order_id.
This is the dataset:
user_id order_id order_time
1 208965785 2016-12-15 17:14:13
1 201765785 2016-12-14 17:19:05
1 203932785 2016-12-13 20:41:30
1 209612785 2016-12-14 20:14:32
1 208112785 2016-12-14 20:27:08
1 205525785 2016-12-14 17:01:26
1 208812785 2016-12-14 20:18:23
1 206432785 2016-12-11 20:32:20
1 206698785 2016-12-14 10:50:15
2 209524795 2016-11-26 18:06:21
3 206529925 2016-10-01 10:43:57
3 203729925 2016-10-08 10:43:11
4 204876145 2016-09-24 10:23:49
5 203363157 2016-07-13 23:56:43
6 207784875 2017-01-04 12:21:21
7 206437177 2016-06-25 02:40:33
8 202819645 2016-09-09 11:47:27
8 202819645 2016-09-09 11:47:27
8 202819646 2016-09-08 11:47:27
9 205127187 2016-06-05 22:21:18
9 205127187 2016-06-05 22:21:18
11 207874877 2016-06-17 16:49:44
12 204927595 2016-11-28 23:05:40
This is the code that I am currently using:
SELECT e1.user_id,datediff(e1.order_time,e2.time), e1.order_id FROM
sales e1
JOIN
sales e2
ON
e1.user_id=e2.user_id
AND
e1.order_id = (SELECT distinct order_id FROM sales temp1 WHERE temp1.order_id =e1.order_id ORDER BY order_time DESC LIMIT 1)
AND
e2.order_id = (SELECT distinct order_id FROM sales temp2 WHERE temp2.order_id=e2.order_id ORDER BY order_time DESC LIMIT 1 OFFSET 1)
My query does not produce the desired output and it also ignores the cases where order_ids are the same.
Edit: I would also like the query to be extended to larger datasets where the 2nd most recent order_time may not be the min(order_time)
Based on your fiddle:
select user_id,
datediff(max(order_time),
( -- Scalar Subquery to get the 2nd largest order_time
select max(order_time)
from orders as o2
where o2.user_id = o.user_id -- same user
and o2.order_time < max(o.order_time) -- but not the max time
)
) as diff
from orders as o
group by user_id
having diff is not null -- if there's no 2nd largest time diff will be NULL
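The same idea (take the latest order_time per user, then the largest order_time strictly below it) can be sketched as follows. This runs on SQLite via Python's sqlite3 rather than MySQL, and does the day arithmetic in Python instead of DATEDIFF; both are assumptions made to keep the example portable. The function name order_diffs is hypothetical, and only part of the question's dataset is loaded.

```python
import sqlite3
from datetime import date

def order_diffs(rows):
    """Days between each user's latest and second-latest distinct order_time."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders (user_id INTEGER, order_id INTEGER, order_time TEXT)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    query = """
        SELECT m.user_id,
               m.latest,
               (SELECT MAX(o2.order_time)
                FROM orders AS o2
                WHERE o2.user_id = m.user_id        -- same user
                  AND o2.order_time < m.latest      -- but not the latest time
               ) AS second_latest
        FROM (SELECT user_id, MAX(order_time) AS latest
              FROM orders
              GROUP BY user_id) AS m
        ORDER BY m.user_id
    """
    diffs = []
    for user_id, latest, second_latest in con.execute(query):
        if second_latest is None:      # no second distinct time: skip the user
            continue
        # Compare calendar dates only, like MySQL's DATEDIFF does.
        days = (date.fromisoformat(latest[:10])
                - date.fromisoformat(second_latest[:10])).days
        diffs.append((user_id, days))
    con.close()
    return diffs

# A subset of the dataset from the question.
order_rows = [
    (1, 208965785, "2016-12-15 17:14:13"),
    (1, 208112785, "2016-12-14 20:27:08"),
    (1, 206432785, "2016-12-11 20:32:20"),
    (2, 209524795, "2016-11-26 18:06:21"),
    (3, 206529925, "2016-10-01 10:43:57"),
    (3, 203729925, "2016-10-08 10:43:11"),
    (8, 202819645, "2016-09-09 11:47:27"),
    (8, 202819645, "2016-09-09 11:47:27"),
    (8, 202819646, "2016-09-08 11:47:27"),
    (9, 205127187, "2016-06-05 22:21:18"),
    (9, 205127187, "2016-06-05 22:21:18"),
]
diffs = order_diffs(order_rows)
print(diffs)
```

Note that this distinguishes orders by order_time, not by order_id, so two different order_id values with the exact same timestamp would be treated as one.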
Following would work:
Schema (MySQL v5.7)
CREATE TABLE orders
(`user_id` int, `order_id` int, `order_time` datetime)
;
INSERT INTO orders
(`user_id`, `order_id`, `order_time`)
VALUES
(1,208965785,'2016-12-15 17:14:13'),
(1,201765785,'2016-12-14 17:19:05'),
(1,203932785,'2016-12-13 20:41:30'),
(1,209612785,'2016-12-14 20:14:32'),
(1,208112785,'2016-12-14 20:27:08'),
(1,205525785,'2016-12-14 17:01:26'),
(1,208812785,'2016-12-14 20:18:23'),
(1,206432785,'2016-12-11 20:32:20'),
(1,206698785,'2016-12-14 10:50:15'),
(2,209524795,'2016-11-26 18:06:21'),
(3,206529925,'2016-10-01 10:43:57'),
(3,203729925,'2016-10-08 10:43:11'),
(4,204876145,'2016-09-24 10:23:49'),
(5,203363157,'2016-07-13 23:56:43'),
(6,207784875,'2017-01-04 12:21:21'),
(7,206437177,'2016-06-25 02:40:33'),
(8,202819645,'2016-09-09 11:47:27'),
(8,202819645,'2016-09-09 11:47:27'),
(8,202819646,'2016-09-08 11:47:27'),
(9,205127187,'2016-06-05 22:21:18'),
(9,205127187,'2016-06-05 22:21:18'),
(11,207874877,'2016-06-17 16:49:44'),
(12,204927595,'2016-11-28 23:05:40');
Query #1
SELECT dt2.user_id,
MIN(datediff(dt2.latest_order_time,
dt2.second_latest_order_time)) AS order_diff
FROM (
SELECT o.user_id,
o.order_time AS latest_order_time,
(SELECT o2.order_time
FROM orders AS o2
WHERE o2.user_id = o.user_id AND
o2.order_id <> o.order_id
ORDER BY o2.order_time DESC LIMIT 1) AS second_latest_order_time
FROM orders AS o
JOIN (SELECT user_id, MAX(order_time) AS latest_order_time
FROM orders
GROUP BY user_id) AS dt
ON dt.user_id = o.user_id AND
dt.latest_order_time = o.order_time
) AS dt2
WHERE dt2.second_latest_order_time IS NOT NULL
GROUP BY dt2.user_id;
| user_id | order_diff |
| ------- | ---------- |
| 1 | 1 |
| 3 | 7 |
| 8 | 1 |
Details:
We determine maximum order_time for a user_id in a sub-select query (Derived Table). We can alias it as latest_order_time.
We Join this result-set to the orders table. This will help us in considering only the row(s) with maximum value of order_time for a user_id.
Now, we use a Correlated Subquery to determine the maximum order_time value for the same user, out of the rest of order_id value(s). We can alias it as second_latest_order_time.
Finally, use this as a Derived Table again, and remove all the cases where second_latest_order_time is null, and calculate datediff() for the rest.
A final Group By is needed, as your data has multiple entries for a
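Here is a simpler solution. Note that it compares the latest and the earliest order_time, so it only matches the intended output when a user has at most two distinct order times (for user_id 1 in the sample data it returns 4 rather than 1):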
SELECT user_id,
DATEDIFF(MAX(order_time), MIN(order_time)) as order_diff
FROM orders
GROUP BY user_id
HAVING order_diff > 0;

MYSQL selecting top 4 sums based on criteria

Hoping you will be able to help me with this MYSQL statement. I have a table like so:
|id |duration |start |
|1110460 |8.2 |20171211 |
|2221104 |8.9 |20171112 |
|1110460 |3.2 |20171113 |
|1110460 |4.4 |20171214 |
|3331938 |3.2 |20180115 |
|3331722 |5.4 |20171216 |
|1948212 |9.2 |20171217 |
|9219302 |3.2 |20171218 |
What I want to do is list the top 4 IDs by total duration for a given start month in descending order.
For example, for the top 4 IDs for 201712:
|id |duration |
|1110460 |12.6 |
|1948212 |9.2 |
|3331722 |5.4 |
|9219302 |3.2 |
Any help would be appreciated. This is what I have so far, but it has been returning the incorrect results:
SELECT id, sum(duration) FROM table WHERE
start LIKE '201712%' ORDER BY sum(duration) DESC LIMIT 4
You should use group by
SELECT id, sum(duration)
FROM table
WHERE start LIKE '201712%'
GROUP BY id
ORDER BY sum(duration) DESC LIMIT 4
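For a quick check of the GROUP BY version against the sample table, here is a sketch on SQLite via Python's sqlite3 (used here instead of MySQL as an assumption, so it can be run without a server). The table name demo, the function name top_ids_by_duration, and the ROUND call (added only to keep floating point sums tidy) are illustrative choices, and start is stored as text so that LIKE applies directly.

```python
import sqlite3

def top_ids_by_duration(rows, month_prefix, limit=4):
    """Return the ids with the largest total duration for one start month."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE demo (id INTEGER, duration REAL, start TEXT)")
    con.executemany("INSERT INTO demo VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, ROUND(SUM(duration), 1) AS total
        FROM demo
        WHERE start LIKE ?
        GROUP BY id              -- one sum per id instead of one sum overall
        ORDER BY total DESC
        LIMIT ?
    """
    result = con.execute(query, (month_prefix + "%", limit)).fetchall()
    con.close()
    return result

# Sample data from the question.
duration_rows = [
    (1110460, 8.2, "20171211"),
    (2221104, 8.9, "20171112"),
    (1110460, 3.2, "20171113"),
    (1110460, 4.4, "20171214"),
    (3331938, 3.2, "20180115"),
    (3331722, 5.4, "20171216"),
    (1948212, 9.2, "20171217"),
    (9219302, 3.2, "20171218"),
]
top4 = top_ids_by_duration(duration_rows, "201712")
for row_id, total in top4:
    print(row_id, total)
```

Without the GROUP BY, the aggregate collapses every matching row into a single sum, which is why the original query returned the wrong result.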
To get the 4 distinct maximum sums and their ids you could use the following
SELECT a.id, a.sum_duration
FROM (SELECT id, sum(duration) sum_duration
FROM demo
WHERE `start` LIKE '201712%'
GROUP BY id
) a
JOIN (SELECT distinct sum(duration) max_durations
FROM demo
WHERE `start` LIKE '201712%'
GROUP BY id
ORDER BY sum(duration) DESC
LIMIT 4
) b on a.sum_duration = b.max_durations
The above version will return more than 4 rows if 2 ids have the same sum.

MYSQL: Select items that do not contain a certain date

Say I have a table
CustID | OrderDate
1 | 2017-05-30 05:15:18
2 | 2017-04-18 05:15:18
2 | 2017-04-15 05:15:18
3 | 2017-02-17 05:15:18
4 | 2017-05-29 05:15:18
4 | 2017-03-24 05:15:18
And I only want to return the CustIDs that do not have an order date within the last 30 days (today being 2017-05-30). So the above example would only return 2 and 3.
I have:
SELECT DISTINCT CustID
FROM TABLE
WHERE NOT EXISTS (SELECT CustID FROM TABLE WHERE OrderDate > DATE_ADD(NOW(),INTERVAL-30DAY));
But I only get syntax errors.
Thanks again, I am quite new with SQL.
You can try this:
select CustID from YourTableName group by CustID having max(OrderDate) < now() - interval 30 day;
PS: In your query you're using FROM TABLE, which isn't right. You must use FROM {YourTableName}, where {YourTableName} is the real name of your table in the database (customers, clients, etc.).
Your NOT EXISTS subquery is missing a vital component in its where clause, the correlation with the outer query: and t1.id = t2.id
select distinct CustID from table t1 where not exists (select 1 from table t2 where [whatever_your_conditions_are] and t1.id = t2.id);
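To make the correlated NOT EXISTS concrete, here is a sketch against the question's sample rows, using SQLite through Python's sqlite3 instead of MySQL (an assumption, to keep it runnable without a server). The table name orders and the function name inactive_customers are made up, and "today" is passed in as a parameter rather than taken from NOW() so the result is reproducible.

```python
import sqlite3
from datetime import datetime, timedelta

def inactive_customers(rows, today, days=30):
    """Customers with no order newer than `days` days before `today`."""
    cutoff = (today - timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S")
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (CustID INTEGER, OrderDate TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    query = """
        SELECT DISTINCT t1.CustID
        FROM orders AS t1
        WHERE NOT EXISTS (SELECT 1
                          FROM orders AS t2
                          WHERE t2.CustID = t1.CustID   -- the correlation
                            AND t2.OrderDate > ?)       -- a recent order
        ORDER BY t1.CustID
    """
    result = [row[0] for row in con.execute(query, (cutoff,))]
    con.close()
    return result

# Sample data from the question.
cust_rows = [
    (1, "2017-05-30 05:15:18"),
    (2, "2017-04-18 05:15:18"),
    (2, "2017-04-15 05:15:18"),
    (3, "2017-02-17 05:15:18"),
    (4, "2017-05-29 05:15:18"),
    (4, "2017-03-24 05:15:18"),
]
inactive = inactive_customers(cust_rows, datetime(2017, 5, 30))
print(inactive)
```

Customer 4 has one old order and one recent order. The correlation is what excludes it: without it, the subquery would find a recent order for some customer and NOT EXISTS would be false for every row.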
A table similar to yours... It just has date instead of timestamp (and a relaxed date condition in where clause, just for the sake of simplicity):
selecting the users whose date > current_date (which is order_date in your case):
As far as I understand your question, your query will be:
SELECT DISTINCT CustID FROM TABLE WHERE OrderDate < NOW() - INTERVAL 30 DAY
Let me know if you require changes like < or > OrderDate in the where clause.

MySQL Unique Exemption Case with Group By

I have a set of data that lists when a User changes a Product, we look at who changed it, when, the old cost and new cost, and the percentage price difference.
I want to use a group by, a where clause, or a case expression to filter out products whose changes occurred on the same day and left the original price unchanged.
So the situation I want to exclude would look like this:
| product | Changed By | Old Price | New Price | % diff | Day Changed |
|----------|------------|-----------|-----------|--------|-------------|
| blue hat | me | 94.00 | 95.00 | 1.05 | 2016-11-28 |
| blue hat | me | 95.00 | 94.00 | 1.05 | 2016-11-28 |
Any ideas how to do this with MySql?
Here is a working version for anyone who wants to see this done using subqueries, where's, and group by's.
This query looks at the changes to an Item's cost by a User over the span of 1 day, pulling in all the results from "yesterday". It lists that day's changes twice, once in ascending and once in descending order, and compares the oldest old value with the newest new value. If they are the same, the item is exempted.
SELECT
us.Name,
it.Name,
pal.CS_PA_Line_Supplier__c as supplier_name,
newest.NewValue as new_value,
oldest.OldValue as old_value,
((newest.NewValue - oldest.OldValue) / oldest.OldValue) * 100 as Percentage
FROM
(
SELECT Id, Name, KNDY4__Item__c, CS_PA_Line_Supplier__c
FROM KNDY4__Contract_Line__c
) pal
LEFT JOIN
(
SELECT *
FROM
(
SELECT
ParentId,
CreatedById,
CreatedDate,
Field,
NewValue
FROM KNDY4__Contract_Line__History
WHERE CreatedDate >= CURDATE() - INTERVAL 1 DAY
AND CreatedDate < CURDATE()
AND Field='KNDY4__Negotiated_Price__c'
ORDER BY CreatedDate DESC
) cd
GROUP BY ParentId
) newest
ON newest.ParentId=pal.Id
LEFT JOIN
(
SELECT * FROM (
SELECT
ParentId,
OldValue
FROM KNDY4__Contract_Line__History
WHERE CreatedDate >= CURDATE() - INTERVAL 1 DAY
AND CreatedDate < CURDATE()
AND Field='KNDY4__Negotiated_Price__c'
ORDER BY CreatedDate ASC
) cd
GROUP BY ParentId
) oldest
ON oldest.ParentId=pal.Id
LEFT JOIN
(
SELECT Id, Name
FROM User
) us
ON us.Id = newest.CreatedById
LEFT JOIN
(
SELECT Id,Name
FROM KNDY4__Item__c
) it
ON it.Id=pal.KNDY4__Item__c
WHERE newest.ParentId IS NOT NULL
AND oldest.OldValue IS NOT NULL
AND newest.NewValue != oldest.OldValue
GROUP BY pal.KNDY4__Item__c, pal.CS_PA_Line_Supplier__c
ORDER BY it.Name ASC
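The query above relies on GROUP BY keeping the first row of an ordered subquery, which, as far as I know, MySQL does not guarantee. A more explicit sketch of the same idea is below: find the first and last change per product and day, then keep only the products whose first old price differs from the last new price. It runs on SQLite via Python's sqlite3 as a stand-in for MySQL; the price_change table, its columns, the timestamps, the red scarf rows, and the function name net_price_changes are all hypothetical, and it assumes no two changes to a product share a timestamp.

```python
import sqlite3

def net_price_changes(rows):
    """Per product and day, return first old price and last new price if they differ."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE price_change "
        "(product TEXT, old_price REAL, new_price REAL, changed_at TEXT)"
    )
    con.executemany("INSERT INTO price_change VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT b.product, b.change_day, f.old_price, l.new_price
        FROM (SELECT product,
                     date(changed_at) AS change_day,
                     MIN(changed_at)  AS first_at,
                     MAX(changed_at)  AS last_at
              FROM price_change
              GROUP BY product, date(changed_at)) AS b
        JOIN price_change AS f
          ON f.product = b.product AND f.changed_at = b.first_at
        JOIN price_change AS l
          ON l.product = b.product AND l.changed_at = b.last_at
        WHERE f.old_price <> l.new_price      -- drop net-zero days
        ORDER BY b.product
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

change_rows = [
    # The blue hat rows mirror the question; the times are invented.
    ("blue hat", 94.00, 95.00, "2016-11-28 09:00:00"),
    ("blue hat", 95.00, 94.00, "2016-11-28 15:00:00"),
    # Hypothetical product whose price really did change that day.
    ("red scarf", 10.00, 12.00, "2016-11-28 10:00:00"),
    ("red scarf", 12.00, 11.00, "2016-11-28 16:00:00"),
]
net_changes = net_price_changes(change_rows)
print(net_changes)
```

The blue hat went 94 to 95 and back to 94 on the same day, so it is filtered out, while the red scarf is reported with its first old price and last new price.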

How to return results that relate across 3 tables efficiently MySQL

Howdie do,
I have the following 3 tables: order, manifest and tracking_updates. Each order has a foreign key called manifest_id that references the manifest table. Several orders can be in a manifest. The tracking_updates table has a foreign key called order_id that references the order table.
Now, the manifest table contains a column named upload_date. That column, upload_date is the column I need to use in order to determine if an order was uploaded in the last 30 days.
The tracking_update table can contain many updates for each order and so, I must return the most recent tracking update status for each order that matches the criteria below:
1. orders < 30 days, any delivery status
2. orders > 30 days, not delivered
Please see tables below
**Order**
ID | manifest_id
1 | 123
2 | 123
3 | 456
**Manifest**:
ID | upload_date
123 | 2015-12-15 09:31:12
456 | 2015-10-13 09:31:12
**Tracking Update**:
order_id | status_type | last_updated
1 | M | 2015-12-15 00:00:00
1 | I | 2015-12-16 07:20:00
1 | D | 2015-12-17 15:20:00
2 | M | 2015-12-15 00:00:00
2 | D | 2015-12-16 15:20:00
3 | M | 2015-10-13 00:00:00
3 | I | 2015-10-14 12:00:00
3 | E | 2015-10-15 13:50:00
This is what the result set would look like for the orders above
**Result Set**
order_id | manifest_id | latest_tracking_update_status
1 | 123 | D
2 | 123 | D
3 | 456 | E
As you can see, order 1, 2 are assigned to manifest 123 and the manifest was uploaded within the last 30 days and their latest tracking update shows a 'D' for delivered. So those two orders should be included in the result set.
Order 3 is older than 30 days, but hasn't been delivered based on the latest tracking_update status_type, so it should show up in the result set.
Now, the tracking_update table has well over 1 million updates across all orders, so I'm really going for efficiency here.
Currently, I have the following queries.
Query #1 returns orders that have been uploaded within the last 30 days and their corresponding latest tracking update
SELECT
fgw247.order.id as order_id,
(SELECT
status_type
FROM
tracking_update as tu
WHERE
tu.order_id = order_id
ORDER BY
tu.ship_update_date DESC
LIMIT
1
) as latestTrackingUpdate
FROM
fgw247.order, manifest
WHERE
fgw247.order.manifest_id = manifest.id
AND
upload_date >= '2015-12-12 00:00:00'
Query #2 returns the order_id and latest tracking update for every order in the tracking_update table:
SELECT tracking_update.order_id,
substring_index(group_concat(tracking_update.status_type order by tracking_update.last_updated), ',', -1)
FROM
tracking_update
WHERE
tracking_update.order_id is not NULL
GROUP BY tracking_update.order_id
I'm just not sure how to combine these queries to get my orders that match the criteria:
orders < 30 days, any delivery status
orders > 30 days, not delivered
Any ideas would be GREATLY appreciated.
* UPDATE *
This is the current query thanks to answer selected:
select
o.id, t.maxudate, tu.status_type, m.upload_date
from
(select order_id, max(last_updated) as maxudate from tracking_update group by order_id) t
inner join
tracking_update tu on t.order_id=tu.order_id and t.maxudate=tu.last_updated
right join
fgw247.order o on t.order_id=o.id
left join
manifest m on o.manifest_id=m.id
where
(tu.status_type != 'D' and tu.status_type != 'XD' and m.upload_date <='2015-12-12 00:00:00') or m.upload_date >= '2015-12-12 00:00:00'
LIMIT 10
UPDATE
This is the current query that joins the three tables rather efficiently
SELECT
o.*, tu.*
FROM
fgw247.`order` o
JOIN
manifest m
ON
o.`manifest_id` = m.`id`
JOIN
`tracking_update` tu
ON
tu.`order_id` = o.`id` and tu.`ship_update_date` = (select max(last_updated) as last_updated from tracking_update where order_id = o.`id` group by order_id)
WHERE
m.`upload_date` >= '2015-12-14 11:50:12'
OR
(o.`delivery_date` IS NULL AND m.`upload_date` < '2015-12-14 11:50:12')
LIMIT 100
Have a subquery that returns the latest update date from the tracking table for each order. Join this subquery on the tracking, orders, and manifests tables to get the details and filter based on the upload date in the where clause:
select o.order_id, t.maxudate, tu.status_type, m.upload_date
from (select order_id, max(update_date) as maxudate from tracking_update group by order_id) t
inner join tracking_update tu on t.order_id=tu.order_id and t.maxudate=tu.update_date
right join orders o on t.order_id=o.order_id
left join manifests m on o.manifest_id=m.manifest_id
where (tu.status_type<>'D' and datediff(curdate(), m.upload_date)>30) or datediff(curdate(), m.upload_date)<=30
It may be more efficient to use a union query instead of the or criteria in the where clause.
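A rough sketch of that approach on the question's sample data, using SQLite via Python's sqlite3 in place of MySQL (an assumption so it can be run without a server). The function name latest_status is hypothetical, "today" is passed in explicitly instead of using curdate(), and order 4 is an invented row added only to show an old, delivered order being filtered out.

```python
import sqlite3
from datetime import datetime, timedelta

def latest_status(orders, manifests, updates, today, days=30):
    """Latest tracking status per order: recent uploads, or old but undelivered."""
    cutoff = (today - timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S")
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER, manifest_id INTEGER)")
    con.execute("CREATE TABLE manifest (id INTEGER, upload_date TEXT)")
    con.execute(
        "CREATE TABLE tracking_update "
        "(order_id INTEGER, status_type TEXT, last_updated TEXT)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    con.executemany("INSERT INTO manifest VALUES (?, ?)", manifests)
    con.executemany("INSERT INTO tracking_update VALUES (?, ?, ?)", updates)
    query = """
        SELECT o.id, o.manifest_id, tu.status_type
        FROM orders AS o
        JOIN manifest AS m ON m.id = o.manifest_id
        JOIN (SELECT order_id, MAX(last_updated) AS max_updated
              FROM tracking_update
              GROUP BY order_id) AS t ON t.order_id = o.id
        JOIN tracking_update AS tu
          ON tu.order_id = t.order_id AND tu.last_updated = t.max_updated
        WHERE m.upload_date >= ?          -- uploaded within the window
           OR tu.status_type <> 'D'       -- or older and not delivered
        ORDER BY o.id
    """
    result = con.execute(query, (cutoff,)).fetchall()
    con.close()
    return result

orders = [(1, 123), (2, 123), (3, 456), (4, 456)]   # order 4 is hypothetical
manifests = [(123, "2015-12-15 09:31:12"), (456, "2015-10-13 09:31:12")]
updates = [
    (1, "M", "2015-12-15 00:00:00"), (1, "I", "2015-12-16 07:20:00"),
    (1, "D", "2015-12-17 15:20:00"),
    (2, "M", "2015-12-15 00:00:00"), (2, "D", "2015-12-16 15:20:00"),
    (3, "M", "2015-10-13 00:00:00"), (3, "I", "2015-10-14 12:00:00"),
    (3, "E", "2015-10-15 13:50:00"),
    (4, "M", "2015-10-13 00:00:00"), (4, "D", "2015-10-15 09:00:00"),
]
statuses = latest_status(orders, manifests, updates, datetime(2015, 12, 20))
print(statuses)
```

Because "old and not delivered, or recent" is equivalent to "recent, or not delivered", the where clause only needs the two conditions shown. On a table with over a million updates, an index on (order_id, last_updated) would probably help both the MAX subquery and the join back, though that is worth measuring rather than assuming.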
You can perform a JOIN with the 2nd query result like
SELECT
fgw247.order.id as order_id,
xx.some_column,
(SELECT
status_type
FROM
tracking_update as tu
WHERE tu.order_id = fgw247.order.id
ORDER BY
tu.ship_update_date DESC
LIMIT
1
) as latestTrackingUpdate
FROM
fgw247.order JOIN manifest
ON fgw247.order.manifest_id = manifest.id
JOIN (
SELECT tracking_update.order_id,
substring_index(group_concat(tracking_update.status_type order by tracking_update.last_updated), ',', -1) AS some_column
FROM
tracking_update
WHERE
tracking_update.order_id is not NULL
GROUP BY tracking_update.order_id ) xx ON xx.order_id = fgw247.order.id
WHERE upload_date >= '2015-12-12 00:00:00'