MySQL: get oldest record from most recent group - mysql

Sorry for the confusing title, but it's the best way to explain it. This is not a usual "most recent from group" problem and I haven't been able to find anything similar on the web.
I have a status table that tracks what people are doing at various work sites. It contains records that link people, status and location.
ID, start_date, person_ID, location_ID, status
1, 2014-10-12, 1, 1, job a
2, 2014-10-13, 2, 2, job b
3, 2014-10-15, 1, 3, job c
4, 2014-10-21, 1, 3, job d
5, 2014-10-22, 2, 4, job a
6, 2014-10-26, 2, 2, job d
I need to be able to determine how long each person has been at the current site - I'm hoping to get results like this:
person_ID, location_ID, since
1, 3, 2014-10-15
2, 2, 2014-10-26
Getting when they started the current job is relatively easy by joining the max(start_date), but I need the min(start_date) from the jobs done at the most recent location.
I have been trying to join the min(start_date) within the records that match the current location (from the most recent record), and that works great until I have a person (like person 2) who has multiple visits to the current location... you can see in my desired results that I want the 10-26 date, not the 10-13 which is the first time they were at the site.
I need some method for matching the job records for a given person, and then iterating back until the location doesn't match. I'm figuring there has to be some way to do this with some sub-queries and some clever joins, but I haven't been able to find it yet, so I would appreciate some help.

If I understand what you're asking correctly, you could use EXISTS to eliminate all but the most recent locations per person, and get the min date from the resulting rows.
SELECT person_id, location_id, MIN(start_date) since
FROM status s
WHERE NOT EXISTS (
SELECT 1 FROM status
WHERE s.person_id = person_id
AND s.location_id <> location_id
AND s.start_date < start_date)
GROUP BY person_id
An SQLfiddle to test with.
Basically, it eliminates all locations and times where the same person has visited another location more recently. For example;
1, 2014-10-12, 1, 1, job a
...is eliminated since person 1 has visited location 3 more recently, while;
3, 2014-10-15, 1, 3, job c
...is kept since the same person has only visited the same location more recently.
It then just picks the least recent time per person. Since only the rows from the last location are kept, it will be the least recent time from the most recent location.
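To check the elimination logic against the sample rows from the question, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (that substitution is an assumption, as is the alias `later`; table and column names follow the question). It groups by both person_id and location_id, which gives the same rows here because every surviving row for a person shares one location.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status (id INTEGER PRIMARY KEY, start_date TEXT, "
    "person_id INTEGER, location_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO status VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2014-10-12", 1, 1, "job a"),
        (2, "2014-10-13", 2, 2, "job b"),
        (3, "2014-10-15", 1, 3, "job c"),
        (4, "2014-10-21", 1, 3, "job d"),
        (5, "2014-10-22", 2, 4, "job a"),
        (6, "2014-10-26", 2, 2, "job d"),
    ],
)

# Keep only rows with no later row for the same person at a different
# location, then take the earliest date among the rows that remain.
query = """
SELECT s.person_id, s.location_id, MIN(s.start_date) AS since
FROM status s
WHERE NOT EXISTS (
    SELECT 1 FROM status later
    WHERE later.person_id = s.person_id
      AND later.location_id <> s.location_id
      AND later.start_date > s.start_date)
GROUP BY s.person_id, s.location_id
ORDER BY s.person_id
"""
current_sites = conn.execute(query).fetchall()
print(current_sites)
```

On the sample data this should give person 1 at location 3 since 2014-10-15 and person 2 at location 2 since 2014-10-26, matching the desired results in the question.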

I think the easiest way is with variables to keep track of the information you need:
select person_id, location_id, min(start_date) as since
from (select s.*,
(@rn := if(@p <> person_id,
-- new person: remember the person and their latest location, restart at 1
if(@p := person_id, if(@l := location_id, 1, 1), if(@l := location_id, 1, 1)),
if(@l = location_id, @rn,
if(@l := location_id, @rn + 1, @rn + 1)
)
)
) as location_counter
from status s cross join
(select @p := 0, @l := 0, @rn := 0) vars
order by person_id, start_date desc
) s
where location_counter = 1
group by person_id, location_id;
The weird logic with the variables is (trying to) enumerate the locations for each person. It should be incrementing @rn only when the location changes, and resetting the value to 1 (and remembering the latest location) for a new person.
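The enumeration the variables are meant to perform can also be spelled out procedurally. The following is a plain Python sketch of the same idea, not MySQL: for each person, walk the rows from newest to oldest and stop at the first row whose location differs from the current one. The function name `current_stints` is hypothetical.

```python
from itertools import groupby

# Sample rows from the question: (id, start_date, person_id, location_id, status)
rows = [
    (1, "2014-10-12", 1, 1, "job a"),
    (2, "2014-10-13", 2, 2, "job b"),
    (3, "2014-10-15", 1, 3, "job c"),
    (4, "2014-10-21", 1, 3, "job d"),
    (5, "2014-10-22", 2, 4, "job a"),
    (6, "2014-10-26", 2, 2, "job d"),
]


def current_stints(rows):
    """Return (person_id, location_id, since) for each person's current site."""
    result = []
    by_person = sorted(rows, key=lambda r: r[2])
    for person_id, group in groupby(by_person, key=lambda r: r[2]):
        # ISO dates sort correctly as strings; newest row first.
        newest_first = sorted(group, key=lambda r: r[1], reverse=True)
        current_location = newest_first[0][3]
        since = newest_first[0][1]
        for _, start_date, _, location_id, _ in newest_first[1:]:
            if location_id != current_location:
                break  # an older visit elsewhere ends the current stint
            since = start_date
        result.append((person_id, current_location, since))
    return result


stints = current_stints(rows)
print(stints)
```

For person 2 the walk stops as soon as it reaches the visit to location 4, so the earlier visit to location 2 on 2014-10-13 is never considered.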

Quite simple actually.
SELECT g.person_ID,
(SELECT l.location_ID
FROM status l
WHERE l.person_ID = g.person_ID
AND l.start_date = MAX(g.start_date)) AS location,
MAX(g.start_date) AS since
FROM status g
GROUP BY g.person_ID
This uses a grouping on person_ID, and uses a SELECT for the location column expression.
The sole question is whether you meant MIN instead of MAX, since your example returns the most recent date, not the oldest.

Related

SQL record with latest time stamp, but with a join enumerating the user, where NOT a particular status

Really struggling matching up other people examples on this one, so wonder if someone would be good enough to point me in the right direction....
What I have are 2 tables in MySQL.
Tags
tagid, status, lot, lat, long, createuser, timestamp
Users
userid, first, surname
My process just adds rows to the Tags table, for the tagid scanned so there could be many rows with the same tagid but each row will have different info depending on the user, with each row having the timestamp of when it happened.
The ask is that I would like to list the latest record for each tagid, but I would like to exclude anything with a Tags.status of 'store' and enumerate the Tags.createuser to the name of the Users.userid
I just cant figure out how to get the last timestamp, as well as do the NOT statement, given there could be a situation like below.
tagid, status, lot, lat, long, createuser, timestamp
1000001, live, 1, xxxx, yyyy, 1, 2020-10-20 12:00
1000001, store, 1, xxxx, yyyy, 1, 2020-10-20 12:10
1000002, live, 1, xxxx, yyyy, 2, 2020-10-20 11:00
User 2 = Joe Bloggs
So the only thing I want returned is below because the last record for 1000001 was 'store'
1000002, live, 1, xxxx, yyyy, Joe Bloggs, 2020-10-20 11:00
You want the latest record per tag, along with the associated user name - if and only if the status of that tag is "live".
You can use row_number() and filtering:
select t.*, u.surname
from users u
inner join (
select t.*, row_number() over(partition by tagid order by timestamp desc) rn
from tags t
) t on t.createuser = u.userid
where t.rn = 1 and t.status = 'live'
This requires MySQL 8.0. In earlier versions, one option uses a correlated subquery for filtering:
select t.*, u.surname
from users u
inner join tags t on t.createuser = u.userid
where t.status = 'live' and t.timestamp = (
select max(t1.timestamp) from tags t1 where t1.tagid = t.tagid
)
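As a quick check of the correlated-subquery version against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL (an assumption). Only the columns that matter are created, and the name given to user 1 is made up, since the question only names user 2.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE users (userid INTEGER, "first" TEXT, surname TEXT)')
conn.execute(
    "CREATE TABLE tags (tagid INTEGER, status TEXT, createuser INTEGER, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Jane", "Doe"), (2, "Joe", "Bloggs")],  # user 1's name is invented
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?, ?)",
    [
        (1000001, "live", 1, "2020-10-20 12:00"),
        (1000001, "store", 1, "2020-10-20 12:10"),
        (1000002, "live", 2, "2020-10-20 11:00"),
    ],
)

# A row qualifies only if it is the newest row for its tagid AND it is 'live'.
query = """
SELECT t.tagid, t.status, u."first" || ' ' || u.surname AS createuser, t.timestamp
FROM tags t
JOIN users u ON u.userid = t.createuser
WHERE t.status = 'live'
  AND t.timestamp = (SELECT MAX(t1.timestamp) FROM tags t1 WHERE t1.tagid = t.tagid)
"""
latest_live = conn.execute(query).fetchall()
print(latest_live)
```

Tag 1000001 drops out because its newest row is 'store', so its older 'live' row no longer matches the maximum timestamp.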

include only the first and last groups in query results

Given the schema
The following query
SELECT a.user_id,
a.id,
a.date_created,
avg(ai.level) level
FROM assessment a
JOIN assessment_item ai ON a.id = ai.assessment_id
GROUP BY a.user_id, a.id;
Returns these results
user_id, a.id, a.date_created, level
1, 99, "2015-07-13 18:26:00", 4.0000
1, 98, "2015-07-13 19:04:58", 6.0000
13, 9, "2015-07-13 18:26:00", 2.0000
13, 11, "2015-07-13 19:04:58", 3.0000
I would like to change the query such that only the earliest results is returned for each user. In other words, the following should be returned instead
user_id, a.id, a.date_created, level
1, 99, "2015-07-13 18:26:00", 4.0000
13, 9, "2015-07-13 18:26:00", 2.0000
I think I need to add a HAVING clause, but I'm struggling to figure out the exact syntax.
I have done something like this, with one small difference: I wanted the first 5 per group. The use case was reporting, so the time taken to run the query or create a temp table was not a constraint.
The solution I had:
1. Create a new table (tbl1) with a single id column that references the original table; id can be unique or the primary key.
2. INSERT IGNORE INTO tbl1 (id) select min(id) from original_tbl where id not in (select id from tbl1) group by user_id
3. Repeat step 2 as many times as required (5 times in my case). The new table will then hold only the ids you want to show.
4. Join tbl1 to the original table to get the required result.
Note: This might not be the best solution, but this worked for me when I had to share the report in 2-3hours in a weekend. And the data size I had was around 1M records
Disclaimer: I am in a bit of a hurry, and have not tested this fully
-- Create a CTE that holds the first and last date for each user_id.
with first_and_last as (
-- Get the first date (min) for each user_id
select a.[user_id], min(a.date_created) as date_created
from assessment as a
group by a.[user_id]
-- Combine the first and last, so each user_id should have two entries, even if they are the same one.
union all
-- Get the last date (max) for each user_id
select a.[user_id], max(a.date_created)
from assessment as a
group by a.[user_id]
)
select a.[user_id],
a.id,
a.date_created,
avg(ai.[level]) as [level]
from assessment as a
inner join assessment_item as ai on a.id = ai.assessment_id
-- Join with the CTE to only keep records that have either the min or max date_created for each user_id.
inner join first_and_last as fnl on a.[user_id] = fnl.[user_id] and a.date_created = fnl.date_created
group by a.[user_id], a.id, a.date_created;

MySQL Subquery / Query Issue

I'm having a mental block with this query. I'm trying to return the max date and the maximum time, ordered by identity. It would be greatly appreciated if someone could lend a pair of eyes to this query:
Data Set
Identity, Date, Time, Website
10, 5/10/15, 1, google.com
10, 5/10/15, 3, google.com
10, 5/10/15, 10, google.com
25, 5/11/15, 1, yahoo.com
25, 5/11/15, 15, yahoo.com
Expected Result
10, 5/10/15, 10, google.com
25, 5/11/15, 15, yahoo.com
Current Query
SELECT DISTINCT *, MAX(datetime) as maxdate, MAX(time), identity
FROM identity_track
GROUP BY identity
ORDER BY maxdate DESC
Something like this?
select identity, max(date), max(time), website
from identity_track
group by website;
Demo here: http://sqlfiddle.com/#!9/5cadf/1
You can order by any of the fields you want.
Also, the expected output you posted doesn't line up with what it seems like you're attempting to do.
edit
Updated query based on additional information.
select t.identity, t.date, max(t.time), t.website
from t
inner join
(select identity, website, max(date) d
from t
group by identity, website) q
on t.identity = q.identity
and t.website = q.website
and q.d = t.date
group by t.identity, t.website, t.date
This one should give you each user's identity, the pages he visited, the last date he visited each page, and the longest time he spent in any visit on that last date.
Don't assume that all records for an identity are on the same day e.g. if the entity has times of 1/1/15 5pm and 1/2/15 2pm you'd get 1/2/15 5pm which is wrong.
I'd always merge the time and date but if you can't try this:
select t.identity, t.website, MAX(t.time)
FROM t
INNER JOIN
(
select identity, max(date) as max_date
from t
group by identity
) x
ON t.identity = x.identity
AND t.date = x.max_date
group by t.identity, t.website
First we get the maximum date for each identity. Then, for that day, we get the maximum time.
Hope this helps.
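To see why the latest date has to be pinned down before taking the maximum time, here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL (an assumption). The columns are renamed visit_date and visit_time, dates are written in ISO form so they compare correctly as text, and the 2015-05-09 row is an extra row I added to show the pitfall; none of these appear in the original data.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (identity INTEGER, visit_date TEXT, visit_time INTEGER, website TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (10, "2015-05-09", 99, "google.com"),  # added: older day, larger time
        (10, "2015-05-10", 1, "google.com"),
        (10, "2015-05-10", 3, "google.com"),
        (10, "2015-05-10", 10, "google.com"),
        (25, "2015-05-11", 1, "yahoo.com"),
        (25, "2015-05-11", 15, "yahoo.com"),
    ],
)

# Step 1 (subquery x): latest date per identity.
# Step 2 (outer query): maximum time among the rows on that date only.
query = """
SELECT t.identity, t.visit_date, MAX(t.visit_time) AS max_time, t.website
FROM t
JOIN (SELECT identity, MAX(visit_date) AS max_date FROM t GROUP BY identity) x
  ON t.identity = x.identity AND t.visit_date = x.max_date
GROUP BY t.identity, t.visit_date, t.website
ORDER BY t.identity
"""
latest_visits = conn.execute(query).fetchall()
print(latest_visits)
```

Taking MAX(date) and MAX(time) independently would pair 2015-05-10 with the time 99 from the day before; the join keeps the two values from the same day.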

mysql query to efficiently remove duplicates

Hi folks and thanks for reading
I have a quiz feature on my site which stores a score, username and ip address as the most important columns. I currently have a horrible series of views bringing back the high scores based on the criteria I need which are...
Lowest score first but...only the lowest score for each Quiz user.
The complexity lies if the user has changed ip, i.e. keeps the same username but has a different ip OR if the user keeps the same IP address but changes user name.
It's easier to explain with an example.
First visitor has 4 entries but from 3 different IP Addresses
Second user from 2 IP Addresses
Third user using one IP Address but using 3 Usernames
Table with VALUES(UserID, IPA, Score)
User 1, IP1, 13
User 1, IP1, 20
User 1, IP2, 30
User 1, IP3, 10
User 2, IP4, 20
User 2, IP5, 22
User 2, IP5, 15
User 3, IP6, 12
User 3, IP6, 20
User 4, IP6, 15
User 5, IP6, 11
The highscore query would present you with
User 1, IP3, 10
User 5, IP6, 11
User 2, IP5, 15
The score value is highly unlikely to be duplicated but I guess it is possible. The figures above are simplified to explain my conundrum!
Can anyone suggest an efficient way of removing these duplicates as my table is now over 15,000 records and the views are creaking!
Many thanks.
To identify occurrences of duplicate (UserID,IPA) tuples is pretty straightforward:
SELECT s.UserID
, s.IPA
FROM mytable s
GROUP
BY s.UserID
, s.IPA
HAVING COUNT(1) > 1
To get the lowest score, you could add MIN(s.Score) to the select list.
Deleting duplicates is a little more difficult, in that you don't seem to have any guarantee of uniqueness. Some will recommend that you copy the rows you want to keep out to a separate table, and then either swap the tables with renames, or truncate the original table and reload from the new table. (That usually turns out to be the most efficient approach.)
CREATE TABLE newtable LIKE mytable ;
INSERT INTO newtable (UserID,IPA,Score)
SELECT s.UserID
, s.IPA
, MIN(Score) AS Score
FROM mytable s
GROUP
BY s.UserID
, s.IPA ;
If you want to identify duplicates by just UserID, the same approach can work. If it isn't important that the IPA value comes from the row with the lowest score, it's a little easier. I can put together the query that gets the row that has the lowest score for the user.
If you want to delete rows from the existing table, without adding a unique identifier (like an AUTO_INCREMENT id column) on each row, that can be done too.
This will get you partway, deleting all rows for a given (UserID,IPA) that have a score higher than the lowest score:
DELETE t.*
FROM mytable t
JOIN ( SELECT s.UserID
, s.IPA
, MIN(s.Score) AS Score
FROM mytable s
GROUP
BY s.Userid
, s.IPA
) k
ON k.UserID = t.UserID
AND k.IPA = t.IPA
AND k.Score < t.Score
But that will still leave duplicate occurrences of identical (UserID,IPA,Score) tuples. Without some other column on the table that makes the row unique, it's a little more difficult to remove duplicates. (Again, a common technique is to copy the rows you want to keep to another table, and either swap tables or reload the original table from the saved rows.)
FOLLOWUP
Note that views (both stored and inline) can be expensive performancewise, with MySQL, since the views get materialized as temporary MyISAM tables (MySQL calls them "derived tables").
But correlated subqueries can be even more problematic on large sets.
So, choose your poison.
If the table has an index ON (UserID, Score, IPA), here's how I would get the resultset:
SELECT IF(@prev_user=t.UserID,@i:=@i+1,@i:=1) AS seq
, @prev_user := t.UserID AS UserID
, t.IPA
, t.Score
FROM mytable t
JOIN (SELECT @i := NULL, @prev_user := NULL) i
GROUP
BY t.UserID ASC
, t.Score ASC
, t.IPA ASC
HAVING seq = 1
This is taking advantage of some MySQL-specific features: user-defined variables and the guarantee that the GROUP BY will return a sorted resultset. (The EXPLAIN output will show "Using index", which means we avoid a sort operation, but the query will still create a derived table.) We use the user-defined variables to identify the "first" row for each UserID, and the HAVING clause eliminates all but that first row.
test case:
create table mytable (UserID VARCHAR(6), IPA varchar(3), Score INT);
create index mytable_IX ON mytable (UserID, Score, IPA);
insert into mytable values ('User 1','IP1',13)
,('User 1','IP1',20)
,('User 1','IP2',30)
,('User 1','IP3',10)
,('User 2','IP4',20)
,('User 2','IP5',22)
,('User 2','IP5',15)
,('User 3','IP6',12)
,('User 3','IP6',20)
,('User 4','IP6',15)
,('User 5','IP6',11);
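For comparison, the lowest score per UserID can also be found without user variables, by joining each row to its user's minimum score. This is a different technique from the query above, sketched with Python's built-in sqlite3 module in place of MySQL (an assumption). It deduplicates by UserID only, so it does not merge different usernames that share an IP address.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (UserID TEXT, IPA TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("User 1", "IP1", 13), ("User 1", "IP1", 20), ("User 1", "IP2", 30),
        ("User 1", "IP3", 10), ("User 2", "IP4", 20), ("User 2", "IP5", 22),
        ("User 2", "IP5", 15), ("User 3", "IP6", 12), ("User 3", "IP6", 20),
        ("User 4", "IP6", 15), ("User 5", "IP6", 11),
    ],
)

# Subquery m holds each user's minimum score; the join keeps only the
# row(s) that carry that score, so the IPA comes from the winning row.
query = """
SELECT t.UserID, t.IPA, t.Score
FROM mytable t
JOIN (SELECT UserID, MIN(Score) AS Score FROM mytable GROUP BY UserID) m
  ON m.UserID = t.UserID AND m.Score = t.Score
ORDER BY t.Score, t.UserID
"""
best_per_user = conn.execute(query).fetchall()
print(best_per_user)
```

If a user has the same lowest score on two rows, both rows come back, so a tie-breaker would be needed where scores can repeat.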
Another followup
To eliminate 'User 4' and 'User 5' from the resultset (it's not at all clear why you would want or need to do that): if it's because those users have only one row in the table, then you could add a JOIN to a subquery (inline view) that gets a list of UserID values where there is more than one row, like this:
SELECT IF(@prev_user=t.UserID,@i:=@i+1,@i:=1) AS seq
, @prev_user := t.UserID AS UserID
, t.IPA
, t.Score
FROM mytable t
JOIN ( SELECT d.UserID
FROM mytable d
GROUP
BY d.UserID
HAVING COUNT(1) > 1
) m
ON m.UserID = t.UserID
CROSS
JOIN (SELECT @i := NULL, @prev_user := NULL) i
GROUP
BY t.UserID ASC
, t.Score ASC
, t.IPA ASC
HAVING seq = 1

Is there a way to LIMIT results per group of result rows in MySQL?

I have the following query:
SELECT title, karma, DATE(date_uploaded) as d
FROM image
ORDER BY d DESC, karma DESC
This will give me a list of image records, first sorted by newest day, and then by most karma.
There is just one thing missing: I want to only get the x images with the highest karma per day. So for example, per day I only want the 10 most karma images. I could of course run multiple queries, one per day, and then combine the results.
I was wondering if there is a smarter way that still performs well. I guess what I am looking for is a way to use LIMIT x,y per group of results?
You can do it by emulating ROW_NUMBER using variables.
SELECT d, title, karma
FROM (
SELECT
title,
karma,
DATE(date_uploaded) AS d,
@rn := CASE WHEN @prev = UNIX_TIMESTAMP(DATE(date_uploaded))
THEN @rn + 1
ELSE 1
END AS rn,
@prev := UNIX_TIMESTAMP(DATE(date_uploaded))
FROM image, (SELECT @prev := 0, @rn := 0) AS vars
ORDER BY date_uploaded, karma DESC
) T1
WHERE rn <= 3
ORDER BY d, karma DESC
Result:
'2010-04-26', 'Title9', 9
'2010-04-27', 'Title5', 8
'2010-04-27', 'Title6', 7
'2010-04-27', 'Title7', 6
'2010-04-28', 'Title4', 4
'2010-04-28', 'Title3', 3
'2010-04-28', 'Title2', 2
Quassnoi has a good article about this which explains the technique in more details: Emulating ROW_NUMBER() in MySQL - Row sampling.
Test data:
CREATE TABLE image (title NVARCHAR(100) NOT NULL, karma INT NOT NULL, date_uploaded DATE NOT NULL);
INSERT INTO image (title, karma, date_uploaded) VALUES
('Title1', 1, '2010-04-28'),
('Title2', 2, '2010-04-28'),
('Title3', 3, '2010-04-28'),
('Title4', 4, '2010-04-28'),
('Title5', 8, '2010-04-27'),
('Title6', 7, '2010-04-27'),
('Title7', 6, '2010-04-27'),
('Title8', 5, '2010-04-27'),
('Title9', 9, '2010-04-26');
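The same top-3-per-day result can be reproduced without user variables by counting, for each image, how many images from the same day have more karma. This is a different technique from the ROW_NUMBER emulation above, sketched with Python's built-in sqlite3 module in place of MySQL (an assumption); the alias `better` is mine.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image (title TEXT NOT NULL, karma INTEGER NOT NULL, "
    "date_uploaded TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO image (title, karma, date_uploaded) VALUES (?, ?, ?)",
    [
        ("Title1", 1, "2010-04-28"), ("Title2", 2, "2010-04-28"),
        ("Title3", 3, "2010-04-28"), ("Title4", 4, "2010-04-28"),
        ("Title5", 8, "2010-04-27"), ("Title6", 7, "2010-04-27"),
        ("Title7", 6, "2010-04-27"), ("Title8", 5, "2010-04-27"),
        ("Title9", 9, "2010-04-26"),
    ],
)

# An image is in its day's top 3 if fewer than 3 images from that day beat it.
query = """
SELECT i.date_uploaded AS d, i.title, i.karma
FROM image i
WHERE (SELECT COUNT(*) FROM image better
       WHERE better.date_uploaded = i.date_uploaded
         AND better.karma > i.karma) < 3
ORDER BY d, i.karma DESC
"""
top_per_day = conn.execute(query).fetchall()
print(top_per_day)
```

Ties in karma can return more than 3 rows for a day, and the correlated count gets slow on large tables, so this is mainly useful for checking results on small data.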
Maybe this will work:
SELECT title, karma, DATE(date_uploaded) as d
FROM image img
WHERE id IN (
SELECT id
FROM image
WHERE DATE(date_uploaded)=DATE(img.date_uploaded)
ORDER BY karma DESC
LIMIT 10
)
ORDER BY d DESC, karma DESC
But this is not very efficient, as you don't have an index on DATE(date_uploaded) (I don't know if that would be possible, but I guess it isn't). As the table grows this can get very CPU expensive. It might be simpler to just have a loop in your code :-).