Writing SQL with timestamps - mysql

The data
CREATE TABLE IF NOT EXISTS `transactions` (
`transactions_ts` timestamp ,
`user_id` int(6) unsigned NOT NULL,
`transaction_id` bigint,
`item` varchar(200), PRIMARY KEY(`transaction_id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `transactions` (`transactions_ts`, `user_id`, `transaction_id`,`item` ) VALUES
('2016-06-18 13:46:51.0', 13811335,1322361417, 'glove'),
('2016-06-18 17:29:25.0', 13811335,3729362318, 'hat'),
('2016-06-18 23:07:12.0', 13811335,1322363995,'vase' ),
('2016-06-19 07:14:56.0',13811335,7482365143, 'cup'),
('2016-06-19 21:59:40.0',13811335,1322369619,'mirror' ),
('2016-06-17 12:39:46.0',3378024101,9322351612, 'dress'),
('2016-06-17 20:22:17.0',3378024101,9322353031,'vase' ),
('2016-06-20 11:29:02.0',3378024101,6928364072,'tie'),
('2016-06-20 18:59:48.0',13811335,1322375547, 'mirror');
The question: for each user, show the first item that they ordered (first by time). I treat time as the whole timestamp (not date and time separately).
My attempt
select
min(transactions_ts) as first_trans,
user_id, item
from transactions
group by user_id
order by first_trans;
I am sorry if this is a simple question, but someone told me that my query is entirely wrong, and I have no other means of testing his claim.

This is a little bit more complicated than you thought.
To start with: "for each user" would translate to GROUP BY user_id, not to GROUP BY user_id, item.
But with GROUP BY user_id, you'd need an aggregation function saying "the item for the minimum transactions_ts". MySQL doesn't feature such an aggregation function.
The obvious solution is to do this in two steps:
1. Find the first transaction per user
2. Show the items for these transactions
The query:
select *
from transactions
where (user_id, transactions_ts) in
(
select user_id, min(transactions_ts)
from transactions
group by user_id
);
Another way to word the task is: "Give me the transactions for which no older transaction for the same user exists".
The query:
select *
from transactions t
where not exists
(
select *
from transactions t2
where t2.user_id = t.user_id
and t2.transactions_ts < t.transactions_ts
);
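As a quick sanity check, the two queries above can be run against the sample data from the question. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL (an assumption on my part, chosen because both queries use syntax the two engines share), and the timestamps are stored as plain text.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for MySQL. Timestamps are stored
# as 'YYYY-MM-DD HH:MM:SS' text, which sorts correctly as a string.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    " transactions_ts TEXT, user_id INTEGER,"
    " transaction_id INTEGER PRIMARY KEY, item TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("2016-06-18 13:46:51", 13811335, 1322361417, "glove"),
        ("2016-06-18 17:29:25", 13811335, 3729362318, "hat"),
        ("2016-06-18 23:07:12", 13811335, 1322363995, "vase"),
        ("2016-06-19 07:14:56", 13811335, 7482365143, "cup"),
        ("2016-06-19 21:59:40", 13811335, 1322369619, "mirror"),
        ("2016-06-17 12:39:46", 3378024101, 9322351612, "dress"),
        ("2016-06-17 20:22:17", 3378024101, 9322353031, "vase"),
        ("2016-06-20 11:29:02", 3378024101, 6928364072, "tie"),
        ("2016-06-20 18:59:48", 13811335, 1322375547, "mirror"),
    ],
)

# Step 1 (inner query): earliest timestamp per user.
# Step 2 (outer query): the rows that match those (user, timestamp) pairs.
first_by_in = conn.execute("""
    SELECT user_id, item
    FROM transactions
    WHERE (user_id, transactions_ts) IN (
        SELECT user_id, MIN(transactions_ts)
        FROM transactions
        GROUP BY user_id
    )
    ORDER BY user_id
""").fetchall()

# Same task, worded as "no older transaction exists for the same user".
first_by_not_exists = conn.execute("""
    SELECT t.user_id, t.item
    FROM transactions t
    WHERE NOT EXISTS (
        SELECT *
        FROM transactions t2
        WHERE t2.user_id = t.user_id
          AND t2.transactions_ts < t.transactions_ts
    )
    ORDER BY t.user_id
""").fetchall()

print(first_by_in)          # [(13811335, 'glove'), (3378024101, 'dress')]
print(first_by_not_exists)  # same rows
```

Both forms return one row per user here because no user has two transactions with the same earliest timestamp.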

If you are using MySQL 8.0, the window function ROW_NUMBER() can be used to address your use case, as follows:
SELECT transactions_ts, user_id, item
FROM (
SELECT
transactions_ts,
user_id,
item,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY transactions_ts) rn
FROM transactions
) x WHERE rn = 1
The inner query ranks each record by ascending timestamp, within groups of records having the same user_id. The outer query filters in the first transaction of each customer.
Demo on DB Fiddle:
transactions_ts | user_id | item
:------------------ | ---------: | :----
2016-06-18 13:46:51 | 13811335 | glove
2016-06-17 12:39:46 | 3378024101 | dress

You can do it using a subquery to get the first transaction_ts for each user:
select user_id, item, transactions_ts
from transactions a
where transactions_ts=(select min(transactions_ts)
from transactions b
where b.user_id=a.user_id)
So what happens is:
1. The inner query gets the first transaction time for each user
2. The outer query gets the row that has the time found in step 1
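One thing that may be worth checking is what happens when a user has two transactions with exactly the same earliest timestamp. The sketch below uses made-up rows (user 1 and its items are hypothetical, not the question's data) and SQLite via Python's sqlite3 as a stand-in for MySQL. The MIN()-based subquery returns every tied row, while a ROW_NUMBER() query keeps exactly one row per user. The `item` tiebreaker in the ORDER BY is my own assumption, added to make that choice deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (transactions_ts TEXT, user_id INTEGER, item TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("2016-06-18 13:46:51", 1, "glove"),
        ("2016-06-18 13:46:51", 1, "hat"),  # same timestamp as the glove
        ("2016-06-19 07:14:56", 1, "cup"),
    ],
)

# Correlated subquery: every row whose timestamp equals the user's minimum.
by_min = conn.execute("""
    SELECT item
    FROM transactions a
    WHERE transactions_ts = (SELECT MIN(transactions_ts)
                             FROM transactions b
                             WHERE b.user_id = a.user_id)
    ORDER BY item
""").fetchall()

# ROW_NUMBER(): exactly one row per user; ties are broken by item name.
by_row_number = conn.execute("""
    SELECT item
    FROM (
        SELECT item,
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY transactions_ts, item) AS rn
        FROM transactions
    ) x
    WHERE rn = 1
""").fetchall()

print(by_min)         # [('glove',), ('hat',)]
print(by_row_number)  # [('glove',)]
```

Which behavior is right depends on whether "first item" should mean one row per user or all items of the first timestamp.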

Related

SQL query for listing values based on a column

I have a table with the columns member_id, status and created_at (timestamp), and I want to extract the latest status for each member_id based on the timestamp value.
member_id | status | created_at
--------: | :----- | ---------:
1 | ON | 1641862225
1 | OFF | 1641862272
2 | OFF | 1641862397
3 | OFF | 1641862401
3 | ON | 1641862402
So, my ideal query result would be like this:
member_id | status | created_at
--------: | :----- | ---------:
1 | OFF | 1641862272
2 | OFF | 1641862397
3 | ON | 1641862402
My go-to approach for this kind of problem is to assign a row number to each row and then keep row number 1, depending on the partition and the sort order.
In MySQL, this is available only from version 8.0 onward.
SELECT ROW_NUMBER() OVER(PARTITION BY member_id ORDER BY created_at DESC) as row_num,
member_id, status, created_at FROM MemberStatus
This will generate something like this.
row_num | member_id | status | created_at
------: | --------: | :----- | ---------:
1 | 1 | OFF | 1641862272
2 | 1 | ON | 1641862225
1 | 2 | OFF | 1641862397
1 | 3 | ON | 1641862402
2 | 3 | OFF | 1641862401
Then you use that as a sub query and get the rows where row_num = 1
SELECT member_id, status, created_at FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY member_id ORDER BY created_at DESC) as row_num,
member_id, status, created_at FROM MemberStatus
) a WHERE row_num = 1
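To see the row-number approach end to end, here is a small sketch that loads the five sample rows from the question and runs the query. It assumes the table is called MemberStatus (the name used in another answer) and uses SQLite through Python's sqlite3 in place of MySQL 8, so treat it as an illustration rather than a MySQL test.

```python
import sqlite3

# Assumption: SQLite (3.25+ for window functions) stands in for MySQL 8.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MemberStatus (member_id INTEGER, status TEXT, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO MemberStatus VALUES (?, ?, ?)",
    [
        (1, "ON", 1641862225),
        (1, "OFF", 1641862272),
        (2, "OFF", 1641862397),
        (3, "OFF", 1641862401),
        (3, "ON", 1641862402),
    ],
)

# Number the rows of each member from newest to oldest, then keep number 1.
latest = conn.execute("""
    SELECT member_id, status, created_at
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY member_id
                                  ORDER BY created_at DESC) AS row_num,
               member_id, status, created_at
        FROM MemberStatus
    ) a
    WHERE row_num = 1
    ORDER BY member_id
""").fetchall()

for row in latest:
    print(row)
# (1, 'OFF', 1641862272)
# (2, 'OFF', 1641862397)
# (3, 'ON', 1641862402)
```

The output matches the expected result given in the question.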
MySQL has supported window functions since v8.0. The solution from crimson589 is preferred for v8+; this solution applies to earlier versions of MySQL, or if you need an alternative to window functions.
After grouping by member_id, we can either join back to the original set to get the status value that corresponds to MAX(created_at):
SELECT ByMember.member_id
, status.status
, ByMember.created_at
FROM (
SELECT member_id, max(created_at) as created_at
FROM MemberStatus
GROUP BY member_id
) ByMember
JOIN MemberStatus status ON ByMember.member_id = status.member_id AND ByMember.created_at = status.created_at;
Or you could use a sub query instead of the join:
SELECT ByMember.member_id
, (SELECT status.status FROM MemberStatus status WHERE ByMember.member_id = status.member_id AND ByMember.created_at = status.created_at) as status
, ByMember.created_at
FROM (
SELECT member_id, max(created_at) as created_at
FROM MemberStatus
GROUP BY member_id
) ByMember
The JOIN based solution allows you to query additional columns from the original set instead of having multiple sub-queries. I would almost always advocate for the JOIN solution, but sometimes the sub-query is simpler to maintain.
I've setup a fiddle to compare these options: http://sqlfiddle.com/#!9/0edb931/11
You can group by member_id and max of created_at, then a self join with member_id and created_at will give you the latest status.
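For versions without window functions, the group-then-join idea can be tried on the question's sample rows as sketched below. SQLite (through Python's sqlite3) is used here as a stand-in for an older MySQL, and the table alias `s` is my own choice. The sketch also runs the scalar-subquery form to confirm that both return the same rows.

```python
import sqlite3

# Assumption: SQLite stands in for a pre-8.0 MySQL; no window functions used.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MemberStatus (member_id INTEGER, status TEXT, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO MemberStatus VALUES (?, ?, ?)",
    [
        (1, "ON", 1641862225),
        (1, "OFF", 1641862272),
        (2, "OFF", 1641862397),
        (3, "OFF", 1641862401),
        (3, "ON", 1641862402),
    ],
)

# Latest timestamp per member, joined back to fetch the matching status.
via_join = conn.execute("""
    SELECT ByMember.member_id, s.status, ByMember.created_at
    FROM (
        SELECT member_id, MAX(created_at) AS created_at
        FROM MemberStatus
        GROUP BY member_id
    ) ByMember
    JOIN MemberStatus s
      ON s.member_id = ByMember.member_id
     AND s.created_at = ByMember.created_at
    ORDER BY ByMember.member_id
""").fetchall()

# Same result with a scalar subquery in the select list instead of a join.
via_subquery = conn.execute("""
    SELECT ByMember.member_id,
           (SELECT s.status
            FROM MemberStatus s
            WHERE s.member_id = ByMember.member_id
              AND s.created_at = ByMember.created_at) AS status,
           ByMember.created_at
    FROM (
        SELECT member_id, MAX(created_at) AS created_at
        FROM MemberStatus
        GROUP BY member_id
    ) ByMember
    ORDER BY ByMember.member_id
""").fetchall()

print(via_join)
print(via_join == via_subquery)  # True
```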

Selecting Data from Normalized Tables

I'm stuck trying to write this query; I think my brain is just a little fried tonight. I have a table that records whenever a person executes an action (Clocking In, Clocking Out, Going on Lunch, Returning from Lunch), and I need to return a list of all the primary IDs for the people whose last action is not clock_out. The problem is that it needs to be a somewhat fast query.
Table Structure:
ID | person_id | status | datetime | shift_type
ID = Primary Key for this table
person_id = The ID I want to return if their status does not equal clock_out
status = clock_in, lunch_start, lunch_end, break_start, break_end, clock_out
datetime = The time the record was added
shift_type = Not Important
The way I was executing this query before was finding people who are still clocked in during a specific time period, however I need this query to locate at any point. The queries I am trying are taking the thousands and thousands of records and making it way too slow.
I need to return a list of all the primary IDs for the people whose last action is not clock_out.
One option uses window functions, available in MySQL 8.0:
select id
from (
select t.*, row_number() over(partition by person_id order by datetime desc) rn
from mytable t
) t
where rn = 1 and status <> 'clock_out'
In earlier versions, one option uses a correlated subquery:
select t.id
from mytable t
where
t.datetime = (select max(t1.datetime) from mytable t1 where t1.person_id = t.person_id)
and t.status <> 'clock_out'
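Here is a minimal sketch of the correlated-subquery version on a handful of invented rows (the person IDs, times, and the index name `idx_person_time` are all hypothetical). It runs on SQLite through Python's sqlite3 rather than MySQL. The composite index on (person_id, datetime) is my assumption about what would help the per-person MAX() lookup; it is not something the question confirms.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all rows below are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    " id INTEGER PRIMARY KEY, person_id INTEGER, status TEXT, datetime TEXT)"
)
# Hypothetical index so the latest row per person can be found quickly.
conn.execute("CREATE INDEX idx_person_time ON mytable (person_id, datetime)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, 10, "clock_in", "2021-01-04 08:00:00"),
        (2, 10, "clock_out", "2021-01-04 17:00:00"),    # person 10 has left
        (3, 11, "clock_in", "2021-01-04 09:00:00"),
        (4, 11, "lunch_start", "2021-01-04 12:00:00"),  # person 11 is at lunch
        (5, 12, "clock_in", "2021-01-04 09:30:00"),     # person 12 is still in
    ],
)

# Keep each person's most recent row, then drop those that are clock_out.
still_in = conn.execute("""
    SELECT t.id, t.person_id
    FROM mytable t
    WHERE t.datetime = (SELECT MAX(t1.datetime)
                        FROM mytable t1
                        WHERE t1.person_id = t.person_id)
      AND t.status <> 'clock_out'
    ORDER BY t.id
""").fetchall()

print(still_in)  # [(4, 11), (5, 12)]
```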
After looking through it further, this was my solution -
SELECT * FROM (
SELECT `status`,`person_id` FROM `timeclock` ORDER BY `datetime` DESC
) AS tmp_table GROUP BY `person_id`
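This appeared to work on my server: the subquery orders the rows by datetime, and the GROUP BY then keeps one row per person ID. Be aware that MySQL does not guarantee which row a GROUP BY returns for non-aggregated columns, and the query is rejected when ONLY_FULL_GROUP_BY is enabled, so the result is not reliably the most recent row.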

Accelerate large mysql join

I am writing a SQL query to list each day's active users together with each user's first appearance date in the log table. The MySQL version is 5.7.
Like:
date active_users reg_date
2020-03-1 user1 2019-02-01
2020-03-1 user2 2019-03-04
2020-03-2 user3 2019-01-18
2020-03-2 user1 2019-02-01
I have a query that achieves this, but as shown, it aggregates the same table twice and then joins the two results together. The login log table game_user_log comprises 2 million rows of data and I have added index on column data_date and data_date, but my query takes about 1 minute.
Is there any way to optimize and accelerate the query? Any help is appreciated.
This is my query:
SELECT a.data_date, a.user_id, b.reg_date
-- List every day and de-duplicated users
from ( SELECT distinct data_date, user_id
from `game_user_log`) a
-- Get the first login date as reg_date
left outer join ( SELECT user_id, min(data_date) reg_date
FROM `game_user_log`
GROUP BY user_id) b
on a.user_id=b.user_id
SELECT data_date,
user_id,
MIN(data_date) OVER (PARTITION BY user_id) reg_date
FROM game_user_log
GROUP BY data_date, user_id
?
PS. An index on (user_id, data_date) is needed to speed this up.
I would write your query as:
select du.data_date, du.user_id, u.reg_date
from (select distinct data_date, user_id
from game_user_log
) du join
(select user_id, min(data_date) as reg_date
from game_user_log
group by user_id
) u
on du.user_id = u.user_id;
For this query, you can try an index on game_user_log(user_id, data_date).
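The sketch below sets up a tiny version of game_user_log with the suggested composite index and runs the query. The log rows are invented to match the sample output in the question, the index name `idx_user_date` is hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL 5.7. It shows the shape of the result only and says nothing about timing on 2 million rows.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL 5.7; rows are made up to mirror
# the sample output in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game_user_log (data_date TEXT, user_id TEXT)")
# Composite index so both the DISTINCT and the MIN() can be served by it.
conn.execute("CREATE INDEX idx_user_date ON game_user_log (user_id, data_date)")
conn.executemany(
    "INSERT INTO game_user_log VALUES (?, ?)",
    [
        ("2019-02-01", "user1"),
        ("2019-03-04", "user2"),
        ("2019-01-18", "user3"),
        ("2020-03-01", "user1"),
        ("2020-03-01", "user1"),  # second login on the same day
        ("2020-03-01", "user2"),
        ("2020-03-02", "user3"),
        ("2020-03-02", "user1"),
    ],
)

rows = conn.execute("""
    SELECT du.data_date, du.user_id, u.reg_date
    FROM (SELECT DISTINCT data_date, user_id
          FROM game_user_log) du
    JOIN (SELECT user_id, MIN(data_date) AS reg_date
          FROM game_user_log
          GROUP BY user_id) u
      ON du.user_id = u.user_id
    ORDER BY du.data_date, du.user_id
""").fetchall()

for row in rows:
    print(row)
```

On MySQL itself, running EXPLAIN on the query is the way to confirm whether the index is actually used.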

Compute an average number of transactions per user in a readable manner

I have always struggled with these types of queries, so I'd like someone to check my approach to handling them.
I am asked to find how many transactions, on average, each user executes during a 12-hour time span starting from their first transaction.
This is the data:
CREATE TABLE IF NOT EXISTS `transactions` (
`transactions_ts` timestamp ,
`user_id` int(6) unsigned NOT NULL,
`transaction_id` bigint NOT NULL,
`item` varchar(200), PRIMARY KEY(`transaction_id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `transactions` (`transactions_ts`, `user_id`, `transaction_id`,`item` ) VALUES
('2016-06-18 13:46:51.0', 13811335,1322361417, 'glove'),
('2016-06-18 17:29:25.0', 13811335,3729362318, 'hat'),
('2016-06-18 23:07:12.0', 13811335,1322363995,'vase' ),
('2016-06-19 07:14:56.0',13811335,7482365143, 'cup'),
('2016-06-19 21:59:40.0',13811335,1322369619,'mirror' ),
('2016-06-17 12:39:46.0',3378024101,9322351612, 'dress'),
('2016-06-17 20:22:17.0',3378024101,9322353031,'vase' ),
('2016-06-20 11:29:02.0',3378024101,6928364072,'tie'),
('2016-06-20 18:59:48.0',13811335,1322375547, 'mirror');
My approach is the following (with the steps and the query itself below):
1) For each distinct user_id, find the timestamp of their first transaction and the timestamp 12 hours after it. This is accomplished by the inner query aliased as t1
2) Then, by inner join to the second inner query (t2), basically, I augment each row of the transactions table with two variables "first_trans" and "right_trans" of the 1st step.
3) Now, by where-condition, I select only those transaction timestamps that fall in the interval specified by first_trans and right_trans timestamps
4) Filtered table from the step 3 is now aggregated as count distinct transaction ids per user
5) The result of the 4 steps above is a table where each user has a count of transactions falling into the interval of 12 hrs from the first timestamp. I wrap it in another select that sums users' transaction counts and divides by the number of users, giving an average count per user.
I am quite certain that the end result is correct overall, but I keep thinking I might go without the 4th select. Or, perhaps, the whole code is somewhat clumsy, while my aim was to make this query as readable as possible, and not necessarily computationally optimal.
select
sum(dist_ts)/count(*) as avg_ts_per_user
from (
select
count(distinct transaction_id) as dist_ts,
us_id
from
(select
user_id as us_id,
min(transactions_ts) as first_trans,
min(transactions_ts) + interval 12 hour as right_trans
from transactions
group by us_id )
as t1
inner join
(select * from transactions )
as t2
on t1.us_id=t2.user_id
where transactions_ts >= first_trans
and transactions_ts < right_trans
group by us_id
) as t3
I don't think there is a mistake per se. The code can be slightly simplified and neatened up a bit, as follows:
select sum(dist_ts)/count(*) as avg_ts_per_user
from (
select count(distinct transaction_id) as dist_ts, us_id
from (
select user_id as us_id, min(transactions_ts) as first_trans, min(transactions_ts) + interval 12 hour as right_trans
from transactions
group by us_id
) as t1
inner join transactions as t2
on t1.us_id=t2.user_id and transactions_ts >= first_trans and transactions_ts < right_trans
group by us_id
) as t3
The (select * from transactions ) as t2 was simplified above, and I somewhat arbitrarily moved the where clause conditions to the on clause of the inner join.
Here is a second way that does not use inner joins:
select sum(cnt)/count(*) as avg_ts_per_user from (
select count(*) as cnt, t.user_id
from transactions t
where t.transactions_ts >= (select min(transactions_ts) from transactions where user_id = t.user_id)
and t.transactions_ts < (select min(transactions_ts) + interval 12 hour from transactions where user_id = t.user_id)
group by t.user_id
) sq
You should probably run EXPLAIN against the two queries to see which one runs better on your server. Also note that min(transactions_ts) is specified twice for each user. Is MySQL able to avoid the redundant calculation? I don't know. One possibility would be to create a temporary table consisting of user_id and min_transaction_ts so that the value is computed once. This would only make sense if your table had lots of rows, and maybe not even then.
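The temporary-table idea could look roughly like the sketch below. It is run on SQLite through Python's sqlite3 as a stand-in for MySQL, so two things differ from the queries above and are assumptions on my part: the table name `first_ts` is made up, and SQLite's `datetime(x, '+12 hours')` replaces MySQL's `x + interval 12 hour`.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; timestamps are stored as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    " transactions_ts TEXT, user_id INTEGER, transaction_id INTEGER PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("2016-06-18 13:46:51", 13811335, 1322361417),
        ("2016-06-18 17:29:25", 13811335, 3729362318),
        ("2016-06-18 23:07:12", 13811335, 1322363995),
        ("2016-06-19 07:14:56", 13811335, 7482365143),
        ("2016-06-19 21:59:40", 13811335, 1322369619),
        ("2016-06-17 12:39:46", 3378024101, 9322351612),
        ("2016-06-17 20:22:17", 3378024101, 9322353031),
        ("2016-06-20 11:29:02", 3378024101, 6928364072),
        ("2016-06-20 18:59:48", 13811335, 1322375547),
    ],
)

# Compute each user's first timestamp once and keep it in a temporary table.
conn.execute("""
    CREATE TEMPORARY TABLE first_ts AS
    SELECT user_id, MIN(transactions_ts) AS min_transaction_ts
    FROM transactions
    GROUP BY user_id
""")

# Transactions per user inside [first, first + 12 hours).
per_user_sql = """
    SELECT t.user_id, COUNT(*) AS cnt
    FROM transactions t
    JOIN first_ts f ON f.user_id = t.user_id
    WHERE t.transactions_ts >= f.min_transaction_ts
      AND t.transactions_ts < datetime(f.min_transaction_ts, '+12 hours')
    GROUP BY t.user_id
    ORDER BY t.user_id
"""
per_user = conn.execute(per_user_sql).fetchall()
avg_per_user = conn.execute(
    f"SELECT AVG(cnt) FROM ({per_user_sql})"
).fetchone()[0]

print(per_user)      # [(13811335, 3), (3378024101, 2)]
print(avg_per_user)  # 2.5
```

For the sample data, user 13811335 has three transactions in the first 12 hours and user 3378024101 has two, which gives the average of 2.5.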

Mysql get latest row for status

I have a log table with several statuses. It logs the position of physical objects in an external system. I want to get the latest rows for a status for each distinct physical object.
I need a list of typeids and their quantity for each status, minus the quantity of typeids that have an entry for another status that is later than the row with the status we are looking for.
That is, each status move is recorded, but nothing else.
Here's the problem: I don't have a distinct ID for each physical object. I can only calculate how many there are from the state of the log table.
I've tried
SELECT dl.id, dl.status
FROM `log` AS dl
INNER JOIN (
SELECT MAX( `date` ) , id
FROM `log`
GROUP BY id ORDER BY `date` DESC
) AS dl2
WHERE dl.id = dl2.id
but this would require a distinct type id to work.
My table has a primary key id, a datetime, a status, and a product type_id. There are four different statuses.
A product must pass through all statuses.
Example Data.
date typeid status id
2014-01-13 PF0180 shopfloor 71941
2014-01-13 ND0355 shopfloor 71940
2014-01-10 ND0355 machine 71938
2014-01-10 ND0355 machine 71937
2014-01-10 ND0282 machine 7193
When selecting results for the status shopfloor, I would want
quantity typeid
1 ND0355
1 PF0180
When selecting for the status machine, I would want
quantity typeid
1 ND0282
1 ND0355
The order of the statuses shouldn't matter; it only matters whether there is a later entry for the product.
If I understood you correctly, this will give you the desired output:
select
l1.typeid,
l1.status,
count(1) - (
select count(1)
from log l2
where l2.typeid = l1.typeid and
l2.date > l1.date
) as total
from log l1
group by l1.typeid, l1.status;
Check this SQL Fiddle
TYPEID STATUS TOTAL
-----------------------------
ND0282 machine 1
ND0355 machine 1
ND0355 shopfloor 1
PF0180 shopfloor 1
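The counting idea can be checked against the five sample rows with the sketch below. Note that it is a variant, not the exact query above: it aggregates first in a derived table (aliased `g`, with a `last_date` column, both names of my choosing) so that no non-grouped column is referenced, and it uses the latest date of each group as the cut-off, which is an assumption. SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (date TEXT, typeid TEXT, status TEXT, id INTEGER PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?, ?)",
    [
        ("2014-01-13", "PF0180", "shopfloor", 71941),
        ("2014-01-13", "ND0355", "shopfloor", 71940),
        ("2014-01-10", "ND0355", "machine", 71938),
        ("2014-01-10", "ND0355", "machine", 71937),
        ("2014-01-10", "ND0282", "machine", 7193),
    ],
)

# Per (typeid, status): number of rows, minus the rows of the same typeid
# that were logged after the group's latest date.
totals = conn.execute("""
    SELECT g.typeid, g.status,
           g.cnt - (SELECT COUNT(*)
                    FROM log l2
                    WHERE l2.typeid = g.typeid
                      AND l2.date > g.last_date) AS total
    FROM (
        SELECT typeid, status, COUNT(*) AS cnt, MAX(date) AS last_date
        FROM log
        GROUP BY typeid, status
    ) g
    ORDER BY g.typeid, g.status
""").fetchall()

for row in totals:
    print(row)
# ('ND0282', 'machine', 1)
# ('ND0355', 'machine', 1)
# ('ND0355', 'shopfloor', 1)
# ('PF0180', 'shopfloor', 1)
```

Two ND0355 items were at machine and one of them later moved to shopfloor, which is why the machine count for ND0355 drops from 2 to 1.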
You need to get the greatest date per status, not per id. Then join to the log table where the status and date are the same.
SELECT dl.id, dl.status
FROM `log` AS dl
INNER JOIN (
SELECT status, MAX( `date` ) AS date
FROM `log`
GROUP BY status ORDER BY NULL
) AS dl2 USING (status, date);
It would be helpful to have an index on (status, date) on this table, which would allow the subquery to run as an index-only query.
Everton Agner originally posted this solution, but the reply seems to have disappeared, so I'm adding it here (with slight modifications):
select
l1.typeid,
l1.status,
count(1) - (
select count(1)
from log l2
where l2.typeid = l1.typeid and
l2.`date` > l1.`date`
AND l2.status != 'dieshop'
) as quant
from log l1
WHERE l1.status = 'dieshop'
group by l1.typeid;