I have a log table with several statuses. It logs the position of physical objects in an external system. I want to get the latest rows for a status for each distinct physical object.
I need a list of typeids and their quantity for each status, minus the quantity of typeids that have an entry for another status that is later than the row with the status we are looking for.
i.e. each status move is recorded, but nothing else.
Here's the problem, I don't have a distinct ID for each physical object. I can only calculate how many there are from the state of the log table.
I've tried
SELECT dl.id, dl.status
FROM `log` AS dl
INNER JOIN (
SELECT MAX( `date` ) , id
FROM `log`
GROUP BY id ORDER BY `date` DESC
) AS dl2
WHERE dl.id = dl2.id
but this would require a distinct type id to work.
My table has the columns id (primary key), date (a datetime), status and typeid (the product type). There are four different statuses.
A product must pass through all statuses.
Example Data.
date typeid status id
2014-01-13 PF0180 shopfloor 71941
2014-01-13 ND0355 shopfloor 71940
2014-01-10 ND0355 machine 71938
2014-01-10 ND0355 machine 71937
2014-01-10 ND0282 machine 7193
When selecting results for the status shopfloor, I would want:
quantity typeid
1 ND0355
1 PF0180
When selecting for the status machine, I would want:
quantity typeid
1 ND0282
1 ND0355
The order of the statuses shouldn't matter; it only matters whether there is a later entry for the product.
If I understood you correctly, this will give you the desired output:
select
l1.typeid,
l1.status,
count(1) - (
select count(1)
from log l2
where l2.typeid = l1.typeid and
l2.date > l1.date
)
from log l1
group by l1.typeid, l1.status;
Check this SQL Fiddle
TYPEID STATUS TOTAL
-----------------------------
ND0282 machine 1
ND0355 machine 1
ND0355 shopfloor 1
PF0180 shopfloor 1
You need to get the greatest date per status, not per id. Then join to the log table where the status and date are the same.
SELECT dl.id, dl.status
FROM `log` AS dl
INNER JOIN (
SELECT status, MAX( `date` ) AS date
FROM `log`
GROUP BY status ORDER BY NULL
) AS dl2 USING (status, date);
It would be helpful to have an index on (status, date) on this table, which would allow the subquery to run as an index-only query.
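As a rough check of what this query returns, here is a small sketch that runs it on the sample rows from the question. It uses Python's built-in sqlite3 with an in-memory SQLite database as a stand-in for MySQL (an assumption; the real table lives in MySQL), and the index name idx_status_date is made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (date TEXT, typeid TEXT, status TEXT, id INTEGER PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO log (date, typeid, status, id) VALUES (?, ?, ?, ?)",
    [
        ("2014-01-13", "PF0180", "shopfloor", 71941),
        ("2014-01-13", "ND0355", "shopfloor", 71940),
        ("2014-01-10", "ND0355", "machine", 71938),
        ("2014-01-10", "ND0355", "machine", 71937),
        ("2014-01-10", "ND0282", "machine", 7193),
    ],
)
# The (status, date) index suggested above; the name is hypothetical.
conn.execute("CREATE INDEX idx_status_date ON log (status, date)")

# Greatest date per status, joined back to the rows that carry that date.
latest_per_status = conn.execute(
    """
    SELECT dl.id, dl.status
    FROM log AS dl
    INNER JOIN (
        SELECT status, MAX(date) AS date
        FROM log
        GROUP BY status
    ) AS dl2 USING (status, date)
    ORDER BY dl.id
    """
).fetchall()
print(latest_per_status)
```

On this data every row of a status shares that status's latest date, so all five rows come back. The join returns every row that ties for the maximum date, not one row per physical object.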
Everton Agner originally posted this solution, but the reply seems to have disappeared so I'm adding it (with slight modifications)
select
l1.typeid,
l1.status,
count(1) - (
select count(1)
from log l2
where l2.typeid = l1.typeid and
l2.`date` > l1.`date`
AND l2.status != 'dieshop'
) as quant
from log l1
WHERE l1.status = 'dieshop'
group by l1.typeid;
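To try the idea against the sample data from the question, here is a minimal sketch using Python's sqlite3 module with an in-memory SQLite database standing in for MySQL. It is a variation, not the query above verbatim: because the date column is not part of the GROUP BY, I made the reference date explicit as the earliest date of the chosen status per typeid. That choice, and the helper name quantity_by_type, are my assumptions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (date TEXT, typeid TEXT, status TEXT, id INTEGER PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO log (date, typeid, status, id) VALUES (?, ?, ?, ?)",
    [
        ("2014-01-13", "PF0180", "shopfloor", 71941),
        ("2014-01-13", "ND0355", "shopfloor", 71940),
        ("2014-01-10", "ND0355", "machine", 71938),
        ("2014-01-10", "ND0355", "machine", 71937),
        ("2014-01-10", "ND0282", "machine", 7193),
    ],
)


def quantity_by_type(status):
    """Count rows with `status` per typeid, minus later rows in any other status."""
    return conn.execute(
        """
        SELECT s.typeid,
               s.n - (SELECT COUNT(*)
                      FROM log AS l2
                      WHERE l2.typeid = s.typeid
                        AND l2.status <> :status
                        AND l2.date > s.first_date) AS quantity
        FROM (SELECT typeid, COUNT(*) AS n, MIN(date) AS first_date
              FROM log
              WHERE status = :status
              GROUP BY typeid) AS s
        ORDER BY s.typeid
        """,
        {"status": status},
    ).fetchall()


machine = quantity_by_type("machine")
shopfloor = quantity_by_type("shopfloor")
print(machine, shopfloor)
```

Both results match the quantities the question asks for: one ND0282 and one ND0355 at machine, one ND0355 and one PF0180 at shopfloor.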
I have a table for payments. It has a column named user_id, & payment_type. For every payment, a user can have multiple payment types.
I want to find the users that have used only one payment_type in their entire lifetime.
Let me make it clear through an example:
Let's say I have the following data:
user_id payment_type
1 UPI
1 NB
2 UPI
2 UPI
For the above, I only want user_id 2 as the output since for both the payments, it has used only 1 payment_type.
Can someone help?
A simple HAVING with COUNT should do the trick:
select user_id
from my_table
group by user_id
having count(distinct payment_type)=1;
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=65f673a7df3ac0ee18c13105a2ec17ad
If you want to include payment_type in the result set, use:
select my.user_id,my.payment_type
from my_table my
inner join ( select user_id
from my_table
group by user_id
having count(distinct payment_type)=1
) as t1 on t1.user_id=my.user_id
group by my.user_id,my.payment_type ;
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=cc4704c9e51d01e4e8fc087702edbe6e
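If the fiddle links are unavailable, both queries can be checked locally. This is a small sketch with Python's sqlite3 (in-memory SQLite standing in for MySQL, which is an assumption) and the four sample rows from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows are the example from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (user_id INTEGER, payment_type TEXT)")
conn.executemany(
    "INSERT INTO my_table (user_id, payment_type) VALUES (?, ?)",
    [(1, "UPI"), (1, "NB"), (2, "UPI"), (2, "UPI")],
)

# Users whose payments all share a single payment_type.
single_type_users = conn.execute(
    """
    SELECT user_id
    FROM my_table
    GROUP BY user_id
    HAVING COUNT(DISTINCT payment_type) = 1
    """
).fetchall()

# Same filter, joined back so the payment_type itself is returned too.
with_type = conn.execute(
    """
    SELECT my.user_id, my.payment_type
    FROM my_table AS my
    INNER JOIN (
        SELECT user_id
        FROM my_table
        GROUP BY user_id
        HAVING COUNT(DISTINCT payment_type) = 1
    ) AS t1 ON t1.user_id = my.user_id
    GROUP BY my.user_id, my.payment_type
    """
).fetchall()
print(single_type_users, with_type)
```

Only user_id 2 survives the HAVING filter, since user_id 1 has two distinct payment types.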
I have a table with the columns member_id, status and created_at (a timestamp), and I want to extract the latest status for each member_id based on the timestamp value.
member_id status created_at
1 ON 1641862225
1 OFF 1641862272
2 OFF 1641862397
3 OFF 1641862401
3 ON 1641862402
So, my ideal query result would be like this:
member_id status created_at
1 OFF 1641862272
2 OFF 1641862397
3 ON 1641862402
My go-to approach for this kind of problem is to assign a row number to each row, partitioned by member_id and sorted by created_at descending, and then keep only the rows numbered 1.
In MySQL, window functions such as ROW_NUMBER() are only available from version 8.0 onward.
SELECT ROW_NUMBER() OVER(PARTITION BY member_id ORDER BY created_at DESC) as row_num,
member_id, status, created_at FROM table
This will generate something like this.
row_num member_id status created_at
1 1 OFF 1641862272
2 1 ON 1641862225
1 2 OFF 1641862397
1 3 ON 1641862402
2 3 OFF 1641862401
Then you use that as a sub query and get the rows where row_num = 1
SELECT member_id, status, created_at FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY member_id ORDER BY created_at DESC) as row_num,
member_id, status, created_at FROM table
) a WHERE row_num = 1
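Here is a quick way to try the two steps on the sample rows from the question. It is only a sketch: it uses Python's sqlite3 with in-memory SQLite as a stand-in for MySQL 8 (the bundled SQLite must be 3.25 or newer for window functions), and the table name member_status is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8; member_status is a made-up table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member_status (member_id INTEGER, status TEXT, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO member_status (member_id, status, created_at) VALUES (?, ?, ?)",
    [
        (1, "ON", 1641862225),
        (1, "OFF", 1641862272),
        (2, "OFF", 1641862397),
        (3, "OFF", 1641862401),
        (3, "ON", 1641862402),
    ],
)

# Number the rows of each member from newest to oldest, then keep number 1.
latest = conn.execute(
    """
    SELECT member_id, status, created_at
    FROM (
        SELECT ROW_NUMBER() OVER (
                   PARTITION BY member_id ORDER BY created_at DESC
               ) AS row_num,
               member_id, status, created_at
        FROM member_status
    ) AS a
    WHERE row_num = 1
    ORDER BY member_id
    """
).fetchall()
print(latest)
```

The result is the expected output from the question: OFF for member 1, OFF for member 2 and ON for member 3.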
MySQL has supported window functions since v8.0. The solution from crimson589 is preferred for v8+; this solution applies to earlier versions of MySQL, or if you need an alternative to window functions.
After grouping by member_id, we can either join back to the original table to get the status value that corresponds to MAX(created_at):
SELECT ByMember.member_id
, status.status
, ByMember.created_at
FROM (
SELECT member_id, max(created_at) as created_at
FROM MemberStatus
GROUP BY member_id
) ByMember
JOIN MemberStatus status ON ByMember.member_id = status.member_id AND ByMember.created_at = status.created_at;
Or you could use a sub query instead of the join:
SELECT ByMember.member_id
, (SELECT status.status FROM MemberStatus status WHERE ByMember.member_id = status.member_id AND ByMember.created_at = status.created_at) as status
, ByMember.created_at
FROM (
SELECT member_id, max(created_at) as created_at
FROM MemberStatus
GROUP BY member_id
) ByMember
The JOIN based solution allows you to query additional columns from the original set instead of having multiple sub-queries. I would almost always advocate for the JOIN solution, but sometimes the sub-query is simpler to maintain.
I've set up a fiddle to compare these options: http://sqlfiddle.com/#!9/0edb931/11
You can group by member_id and max of created_at, then a self join with member_id and created_at will give you the latest status.
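For anyone who wants to compare the JOIN form and the subquery form without the fiddle, here is a small sketch using Python's sqlite3 (in-memory SQLite as a stand-in for an older MySQL, which is an assumption) and the sample rows from the question. The aliases bm and ms are mine.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MemberStatus (member_id INTEGER, status TEXT, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO MemberStatus (member_id, status, created_at) VALUES (?, ?, ?)",
    [
        (1, "ON", 1641862225),
        (1, "OFF", 1641862272),
        (2, "OFF", 1641862397),
        (3, "OFF", 1641862401),
        (3, "ON", 1641862402),
    ],
)

# Variant 1: group by member, then join back to pick up the status.
join_rows = conn.execute(
    """
    SELECT bm.member_id, ms.status, bm.created_at
    FROM (
        SELECT member_id, MAX(created_at) AS created_at
        FROM MemberStatus
        GROUP BY member_id
    ) AS bm
    JOIN MemberStatus AS ms
      ON bm.member_id = ms.member_id AND bm.created_at = ms.created_at
    ORDER BY bm.member_id
    """
).fetchall()

# Variant 2: same grouping, status fetched by a correlated subquery.
subquery_rows = conn.execute(
    """
    SELECT bm.member_id,
           (SELECT ms.status
            FROM MemberStatus AS ms
            WHERE ms.member_id = bm.member_id
              AND ms.created_at = bm.created_at) AS status,
           bm.created_at
    FROM (
        SELECT member_id, MAX(created_at) AS created_at
        FROM MemberStatus
        GROUP BY member_id
    ) AS bm
    ORDER BY bm.member_id
    """
).fetchall()
print(join_rows)
```

Both variants return the same three rows on this data.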
The data
CREATE TABLE IF NOT EXISTS `transactions` (
`transactions_ts` timestamp ,
`user_id` int(6) unsigned NOT NULL,
`transaction_id` bigint,
`item` varchar(200), PRIMARY KEY(`transaction_id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `transactions` (`transactions_ts`, `user_id`, `transaction_id`,`item` ) VALUES
('2016-06-18 13:46:51.0', 13811335,1322361417, 'glove'),
('2016-06-18 17:29:25.0', 13811335,3729362318, 'hat'),
('2016-06-18 23:07:12.0', 13811335,1322363995,'vase' ),
('2016-06-19 07:14:56.0',13811335,7482365143, 'cup'),
('2016-06-19 21:59:40.0',13811335,1322369619,'mirror' ),
('2016-06-17 12:39:46.0',3378024101,9322351612, 'dress'),
('2016-06-17 20:22:17.0',3378024101,9322353031,'vase' ),
('2016-06-20 11:29:02.0',3378024101,6928364072,'tie'),
('2016-06-20 18:59:48.0',13811335,1322375547, 'mirror');
The question: for each user, show the first item that they ordered (first by time). I assume time as a whole timestamp (not time and date separately).
My attempt
select
min(transactions_ts) as first_trans,
user_id, item
from transactions
group by user_id
order by first_trans;
I am sorry if this is a simple question, but one person tells me that my query is entirely wrong, and I have no other means of testing his claim.
This is a little bit more complicated than you thought.
To start with: "for each user" would translate to GROUP BY user_id, not to GROUP BY user_id, item.
But with GROUP BY user_id, you'd need an aggregation function saying "the item for the minimum transactions_ts". MySQL doesn't feature such an aggregation function.
The obvious solution is to make this two steps:
Find the first transaction per user
Show the items for these transactions
The query:
select *
from transactions
where (user_id, transactions_ts) in
(
select user_id, min(transactions_ts)
from transactions
group by user_id
);
Another way to word the task is: "Give me the transactions for which no older transaction for the same user exists".
The query:
select *
from transactions t
where not exists
(
select *
from transactions t2
where t2.user_id = t.user_id
and t2.transactions_ts < t.transactions_ts
);
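Both wordings can be checked against the data from the question. Below is a minimal sketch with Python's sqlite3, where in-memory SQLite stands in for MySQL (an assumption; row-value IN needs SQLite 3.15 or newer). Timestamps are stored as ISO text, so string comparison orders them correctly.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows are the INSERT data from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transactions (
        transactions_ts TEXT,
        user_id INTEGER NOT NULL,
        transaction_id INTEGER PRIMARY KEY,
        item TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("2016-06-18 13:46:51", 13811335, 1322361417, "glove"),
        ("2016-06-18 17:29:25", 13811335, 3729362318, "hat"),
        ("2016-06-18 23:07:12", 13811335, 1322363995, "vase"),
        ("2016-06-19 07:14:56", 13811335, 7482365143, "cup"),
        ("2016-06-19 21:59:40", 13811335, 1322369619, "mirror"),
        ("2016-06-17 12:39:46", 3378024101, 9322351612, "dress"),
        ("2016-06-17 20:22:17", 3378024101, 9322353031, "vase"),
        ("2016-06-20 11:29:02", 3378024101, 6928364072, "tie"),
        ("2016-06-20 18:59:48", 13811335, 1322375547, "mirror"),
    ],
)

# Wording 1: rows whose (user, timestamp) pair is that user's minimum.
first_by_in = conn.execute(
    """
    SELECT user_id, item, transactions_ts
    FROM transactions
    WHERE (user_id, transactions_ts) IN (
        SELECT user_id, MIN(transactions_ts)
        FROM transactions
        GROUP BY user_id
    )
    ORDER BY user_id
    """
).fetchall()

# Wording 2: rows for which no older row of the same user exists.
first_by_not_exists = conn.execute(
    """
    SELECT user_id, item, transactions_ts
    FROM transactions AS t
    WHERE NOT EXISTS (
        SELECT 1
        FROM transactions AS t2
        WHERE t2.user_id = t.user_id
          AND t2.transactions_ts < t.transactions_ts
    )
    ORDER BY user_id
    """
).fetchall()
print(first_by_in)
```

Both queries pick glove for user 13811335 and dress for user 3378024101.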
If you are using MySQL 8.0, the window function ROW_NUMBER() can be used to address your use case, as follows:
SELECT transactions_ts, user_id, item
FROM (
SELECT
transactions_ts,
user_id,
item,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY transactions_ts) rn
FROM transactions
) x WHERE rn = 1
The inner query ranks each record by ascending timestamp, within groups of records having the same user_id. The outer query filters in the first transaction of each customer.
Demo on DB Fiddle:
transactions_ts | user_id | item
:------------------ | ---------: | :----
2016-06-18 13:46:51 | 13811335 | glove
2016-06-17 12:39:46 | 3378024101 | dress
You can do it using a subquery to get the first transaction_ts for each user:
select user_id, item, transactions_ts
from transactions a
where transactions_ts=(select min(transactions_ts)
from transactions b
where b.user_id=a.user_id)
So you get:
1. In the inner query, you get the first transaction time for each user.
2. In the outer query, you get the row that has the time found in point 1.
I am trying to query a table. There are 3 important fields: attendant_id, client_id, and date.
Each time an attendant works with a client, they add an entry which includes their id, the client's id, and the date. Occasionally, an attendant will work with more than one client on the same day. I would like to capture when this happens. Here is what I have so far:
SELECT *
FROM timesheet_lines tsl1
WHERE EXISTS
(
SELECT *
FROM timesheet_lines tsl2
WHERE tsl1.date = tsl2.date
AND tsl1.attendant_id = tsl2.attendant_id
AND tsl1.client_id <> tsl2.client_id
AND tsl1.date between '2014-04-01' AND '2014-06-30'
LIMIT 2,5
)
I only want to display results where an attendant worked with at least 2 different clients. I don't expect it to be possible to have more than 5 on a single day. This is why I am using LIMIT 2,5.
I am also only interested in April through June of this year.
I think I may have the right syntax, but the query seems to be taking forever to run. Is there a faster query? There should be only about 42000+ entries all together for this particular date range. I am not expecting to get more than about 500-600 results that meet the criteria.
I ended up using the following:
create TEMPORARY table tempTSL1
(date1 date, start1 time, end1 time, attend1 varchar(50), client1 varchar(50), type1 tinyint);
insert into tempTSL1(date1, start1, end1, attend1, client1, type1)
select date, start_time, end_time, attendant_id, client_id, type
from timesheet_lines
WHERE
timesheet_lines.date BETWEEN '2014-04-01' AND '2014-06-30'
and timesheet_lines.type IN (1,2,5,6);
create TEMPORARY table tempTSL2
(date2 date, start2 time, end2 time, attend2 varchar(50), client2 varchar(50), type2 tinyint);
insert into tempTSL2(date2, start2, end2, attend2, client2, type2)
select date, start_time, end_time, attendant_id, client_id, type
from timesheet_lines
WHERE
timesheet_lines.date BETWEEN '2014-04-01' AND '2014-06-30'
and timesheet_lines.type IN (1,2,5,6);
SELECT *
FROM tempTSL1
WHERE (attend1,date1) IN (
SELECT attend2
,date2
FROM tempTSL2 tsl2
GROUP BY attend2
,date2
HAVING COUNT(date2) > 1
)
GROUP BY attend1
,client1
,date1
HAVING COUNT(client1) = 1
ORDER BY date1,attend1,start1
You are likely making it much more complex than it needs to be. Try something like this:
SELECT attendant_id
,client_id
,date
FROM timesheet_lines
WHERE (attendant_id,date) IN (
SELECT attendant_id
,date
FROM timesheet_lines tsl1
GROUP BY attendant_id
,date
HAVING COUNT(date) > 1
)
GROUP BY attendant_id
,client_id
,date
HAVING COUNT(client_id) = 1
The subquery returns results only of attendants performing multiple activities on the same date. The top query will pull from the same table, matching the attendant and dates of activity, and filter the result set to items where there is only 1 client in the grouping. Example:
attendant_id client_id date
1 A 2014-01-01
1 B 2014-01-01
2 C 2014-01-01
2 D 2014-01-02
Will return:
attendant_id client_id date
1 A 2014-01-01
1 B 2014-01-01
Untested, but I think it should be in line with what you are looking for, assuming the following two statements are true:
You are not trying to capture two different attendants working the same client on the same day
An attendant can only perform one activity per client per day
If the second point is not true, then you will need to incorporate additional fields into the subquery (such as an activity_id or something).
Hope this helps.
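If the second assumption does not hold, one possible variation (my suggestion, not part of the query above) is to count distinct clients in the subquery instead of counting rows. Here is a sketch using Python's sqlite3, with in-memory SQLite standing in for MySQL. It reuses the example rows above plus one hypothetical repeat visit by attendant 1 to client A.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (row-value IN needs SQLite 3.15+).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timesheet_lines (attendant_id INTEGER, client_id TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO timesheet_lines (attendant_id, client_id, date) VALUES (?, ?, ?)",
    [
        (1, "A", "2014-01-01"),
        (1, "B", "2014-01-01"),
        (2, "C", "2014-01-01"),
        (2, "D", "2014-01-02"),
        (1, "A", "2014-01-01"),  # hypothetical second activity with the same client
    ],
)

# Keep (attendant, date) pairs with more than one distinct client,
# then list each client once for those pairs.
multi_client = conn.execute(
    """
    SELECT DISTINCT attendant_id, client_id, date
    FROM timesheet_lines
    WHERE (attendant_id, date) IN (
        SELECT attendant_id, date
        FROM timesheet_lines
        GROUP BY attendant_id, date
        HAVING COUNT(DISTINCT client_id) > 1
    )
    ORDER BY attendant_id, client_id
    """
).fetchall()
print(multi_client)
```

Attendant 1 is still reported with clients A and B despite the repeated row, and attendant 2 is excluded because the two clients were seen on different days.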
This query works and provides me with the information I need, but it is very slow: it takes 18 seconds to aggregate a database of only 4,000 records.
I'm bringing it here to see if anyone has any advice on how to improve it.
SELECT COUNT( status ) AS quantity, status
FROM log_table
WHERE time_stamp
IN (SELECT MAX( time_stamp ) FROM log_table GROUP BY userid )
GROUP BY status
Here's what it does/what it needs to do in plain text:
I have a table full of logs, each log contains a "userid", "status" (integer between 1-12) and "time_stamp" (a time stamp of when the log was created). There may be many entries for a particular userid, but with a different time stamp and status. I'm trying to get the most recent status (based on time_stamp) for each userid, then count the occurrences of each most-recent status among all the users.
My initial idea was to use a subquery with GROUP BY userid. That ran fast, but it always returned the first entry for each userid, not the most recent. If I could do GROUP BY userid and use time_stamp DESC to identify which row should represent the group, that would be great. But of course ORDER BY inside a group does not work.
Any suggestions?
The first thing to try is to make this an explicit join:
SELECT COUNT(status) AS quantity, status
FROM log_table lg join
(select userid, MAX( time_stamp ) as maxts
from log_table
GROUP BY userid
) lgu
on lgu.userid = lg.userid and lgu.maxts = lg.time_stamp
GROUP BY status;
Another approach is to use a different where clause. This will work best if you have an index on log_table(userid, time_stamp). This approach is doing the filtering by saying "there is no timestamp bigger than this one for a given user":
SELECT COUNT(status) AS quantity, status
FROM log_table lg
WHERE not exists (select 1
from log_table lg2
where lg2.userid = lg.userid and lg2.time_stamp > lg.time_stamp
)
GROUP BY status;
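To sanity-check that both forms agree, here is a small sketch using Python's sqlite3 with in-memory SQLite standing in for MySQL. The five log rows and the index name idx_user_ts are invented for illustration; the real table will differ.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows below are made-up sample logs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log_table (userid INTEGER, status INTEGER, time_stamp INTEGER)"
)
conn.executemany(
    "INSERT INTO log_table (userid, status, time_stamp) VALUES (?, ?, ?)",
    [(1, 3, 10), (1, 5, 20), (2, 5, 15), (3, 3, 5), (3, 7, 30)],
)
# The index suggested above; the name is hypothetical.
conn.execute("CREATE INDEX idx_user_ts ON log_table (userid, time_stamp)")

# Form 1: join each row to its user's maximum time_stamp.
by_join = conn.execute(
    """
    SELECT COUNT(status) AS quantity, status
    FROM log_table AS lg
    JOIN (
        SELECT userid, MAX(time_stamp) AS maxts
        FROM log_table
        GROUP BY userid
    ) AS lgu ON lgu.userid = lg.userid AND lgu.maxts = lg.time_stamp
    GROUP BY status
    ORDER BY status
    """
).fetchall()

# Form 2: keep rows for which the same user has no later time_stamp.
by_not_exists = conn.execute(
    """
    SELECT COUNT(status) AS quantity, status
    FROM log_table AS lg
    WHERE NOT EXISTS (
        SELECT 1
        FROM log_table AS lg2
        WHERE lg2.userid = lg.userid AND lg2.time_stamp > lg.time_stamp
    )
    GROUP BY status
    ORDER BY status
    """
).fetchall()
print(by_join)
```

In this toy data the latest statuses are 5, 5 and 7, so both forms report two users at status 5 and one at status 7.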