Retrieve the record with the newest change per group - mysql

I've got a table locations:
user | timestamp | state | other_data
-------------------------------------
1    | 100       | 1     | some_data
1    | 200       | 1     | some_data
1    | 300       | 0     | some_data
1    | 400       | 0     | some_data
2    | 100       | 0     | some_data
2    | 200       | 0     | some_data
This is for a location tracking app. A location has two states (1 for "user is within range" and 0 for "user is out of range").
Now I want to retrieve the last time a user's location state has changed.
Example for user = 1
user | timestamp | state | other_data
-------------------------------------
1    | 300       | 0     | some_data
Because this was the first location update that has the same state value as the "current" (timestamp 400) record.
Higher-level description: I want to show the user something like "You have been in / out of range since [timestamp]".
The faster the solution, the better, of course.

I would use ranks to order the rows and then pick the min timestamp of the first ranked rows.
select user,min(timestamp) as timestamp,min(state) as state
from
(select l.*,@rn:=case when @user=user and @state=state then @rn
when @user<>user then 1
else @rn+1 end as rnk
,@user:=user,@state:=state
from locations l
cross join (select @rn:=0,@user:='',@state:='') r
order by user,timestamp desc
) t
where rnk=1
group by user
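The same result can also be sketched without user variables. This is only an illustration: it runs the SQL through Python's sqlite3 as a stand-in for MySQL, it assumes timestamps are never negative (the -1 sentinel is my own choice), and the function name is made up.

```python
import sqlite3

ROWS = [
    (1, 100, 1), (1, 200, 1), (1, 300, 0), (1, 400, 0),
    (2, 100, 0), (2, 200, 0),
]

QUERY = """
SELECT l.user, MIN(l.timestamp) AS since, l.state
FROM locations AS l
WHERE l.timestamp > COALESCE(
        -- newest row of this user whose state differs from the current state
        (SELECT MAX(d.timestamp)
         FROM locations AS d
         WHERE d.user = l.user
           AND d.state <> (SELECT c.state
                           FROM locations AS c
                           WHERE c.user = l.user
                           ORDER BY c.timestamp DESC
                           LIMIT 1)),
        -1)  -- assumption: no negative timestamps, so -1 means "state never changed"
GROUP BY l.user, l.state
ORDER BY l.user
"""


def last_state_change(rows):
    """Return (user, since, state) for each user's current run of equal states."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE locations (user INTEGER, timestamp INTEGER, state INTEGER)"
    )
    conn.executemany("INSERT INTO locations VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(last_state_change(ROWS))  # [(1, 300, 0), (2, 100, 0)]
```

User 2 never changed state, so the earliest row is reported for that user.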

You can do this with a correlated subquery:
select l.*
from locations l
where l.timestamp = (select max(l2.timestamp)
from locations l2
where l2.user = l.user
);
For this to work well, you want an index on locations(user, timestamp).
This can be faster than the join and group by approach. In particular, the correlated subquery can make use of an index, but a group by on the whole table often does not (in MySQL).
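A small runnable sketch of the correlated subquery together with the suggested index, using Python's sqlite3 as a stand-in for MySQL. The index name is made up, and note that this returns the newest row per user, not the row where the state last changed.

```python
import sqlite3

ROWS = [
    (1, 100, 1), (1, 200, 1), (1, 300, 0), (1, 400, 0),
    (2, 100, 0), (2, 200, 0),
]


def newest_row_per_user(rows):
    """Return the row with the highest timestamp for every user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE locations (user INTEGER, timestamp INTEGER, state INTEGER)"
    )
    # Composite index so the MAX() lookup per user can be served from the index.
    conn.execute(
        "CREATE INDEX idx_locations_user_ts ON locations (user, timestamp)"
    )
    conn.executemany("INSERT INTO locations VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT l.user, l.timestamp, l.state
        FROM locations AS l
        WHERE l.timestamp = (SELECT MAX(l2.timestamp)
                             FROM locations AS l2
                             WHERE l2.user = l.user)
        ORDER BY l.user
        """
    ).fetchall()
    conn.close()
    return result


print(newest_row_per_user(ROWS))  # [(1, 400, 0), (2, 200, 0)]
```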

As far as I am aware, the only way to achieve this is a self join. Something a bit like:
Select table.* From table
Inner Join
(Select id, max(timestamp) as tm from table group by id) as m
On m.tm = table.timestamp and m.id = table.id
The syntax is for MS SQL Server; it should transfer to MySQL, though. You might have to specify column names instead of table.*

Related

MySQL : Group By Clause Not Using Index when used with Case

I'm using MySQL.
I can't change the DB structure, so that's not an option, sadly.
THE ISSUE:
When I use GROUP BY with CASE (as needed in my situation), MySQL uses filesort and the delay is huge (approx. 2-3 minutes):
http://sqlfiddle.com/#!9/f97d8/11/0
But when I don't use CASE, just GROUP BY group_id, MySQL easily uses the index and the result is fast:
http://sqlfiddle.com/#!9/f97d8/12/0
Scenario: DETAILED
Table msgs, containing records of sent messages, with fields:
id,
user_id, (the guy who sent the message)
type, (0 means it's a group msg. All msgs sent to a group are marked with its group_id, so if 5 msgs were sent to group_id = 5, the table will have 5 records with group_id = 5 and type = 0. For type > 0, group_id will be NULL, because all other types are individual msgs sent to a single recipient)
group_id (if type=0, will contain group_id, else NULL)
Table contains approx 10 million records for user id 50001 and with different types (i.e group as well as individual msgs)
Now the QUERY:
SELECT
msgs.*
FROM
msgs
INNER JOIN accounts
ON (
msgs.user_id = accounts.id
)
WHERE 1
AND msgs.user_id IN (50111)
AND msgs.type IN (0, 1, 5, 7)
GROUP BY CASE `msgs`.`type` WHEN 0 THEN `msgs`.`group_id` ELSE `msgs`.`id` END
ORDER BY `msgs`.`group_id` DESC
LIMIT 100
I HAVE to get the summary in a single QUERY,
so msgs sent to a group, let's say group 5 (which has 5 records in this table), will be shown as 1 record in the summary (I may show a COUNT later, but that's not an issue).
The individual msgs have NULL as group_id, so I can't just put 'GROUP BY group_id', because that would group all individual msgs into a single record, which is not acceptable.
Sample output can be something like:
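id | owner_id | type | group_id | COUNT
--------------------------------------
1  | 50001    | 0    | 2        | 5
1  | 50001    | 1    | NULL     | 1
1  | 50001    | 4    | NULL     | 1
1  | 50001    | 0    | 7        | 5
1  | 50001    | 5    | NULL     | 1
1  | 50001    | 5    | NULL     | 1
1  | 50001    | 5    | NULL     | 1
1  | 50001    | 0    | 10       | 5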
Now the problem is that the GROUP BY with CASE (which I currently think I have to use, because I only need to group by group_id if type = 0) is causing a lot of delay, because it's not using the indexes that it does use without CASE (i.e. just GROUP BY group_id). Please view the SQLFiddles above to see the EXPLAIN results.
Can anyone please give advice on how to get it optimized?
UPDATE
I tried a workaround that does somewhat work (it drops the INITIAL queries to about 1 sec). It uses a UNION to keep the result set small, since a huge result set forces MySQL to write to disk for the filesort: it limits the result set of group msgs and of individual msgs separately (see the query below).
-- The first part of the union retrieves group msgs (those with type 0, which need to be grouped by group_id). It applies a limit to cap the out-of-control result set
-- The second query retrieves individual msgs (those with type != 0, grouped by msgs.id; not necessary, but just to be safe from duplicate entries due to joins). It applies a limit to cap the out-of-control result set
-- The outer query combines the two to retrieve the desired result set
Here's the query:
SELECT
*
FROM
(
(
SELECT
msgs.id as reference_id, user_id, type, group_id
FROM
msgs
INNER JOIN accounts
ON (msgs.user_id = accounts.id)
WHERE 1
AND accounts.id IN (50111 ) AND type = 0
GROUP BY msgs.group_id
ORDER BY msgs.id DESC
LIMIT 40
)
UNION
ALL
(
SELECT
msgs.id as reference_id, user_id, type, group_id
FROM
msgs
INNER JOIN accounts
ON (
msgs.user_id = accounts.id
)
WHERE 1
AND msgs.type != 0
AND accounts.id IN (50111)
GROUP BY msgs.id
ORDER BY msgs.id
LIMIT 40
)
) AS temp
ORDER BY reference_id
LIMIT 20,20
But it has a lot of caveats:
- I need to handle the limit in the inner queries as well. Let's say 20 recs per page, and I'm on page 4. For the inner queries I need to apply LIMIT 0,80, since I can't know how many records each of the two parts contributed to the previous 3 pages. So as the records per page and the number of pages grow, my query gets heavier. With, say, 1k recs per page on page 100 or 1,000, the load gets much heavier and the time increases sharply.
- I need to handle ordering in the inner queries and then apply it again on the result set prepared by the union; conditions need to be applied to both inner queries separately (but that's not much of an issue).
- I can't use SQL_CALC_FOUND_ROWS, so I will need to get the count using separate queries.
The main issue is the first one. The higher I go with the pagination, the heavier it gets.
Would this run faster?
SELECT id, user_id, type, group_id
FROM
( SELECT id, user_id, type, group_id, IFNULL(group_id, id) AS foo
FROM msgs
WHERE user_id IN (50111)
AND type IN (0, 1, 5, 7)
) AS x -- MySQL requires an alias on every derived table
GROUP BY foo
ORDER BY `group_id` DESC
LIMIT 100
It needs INDEX(user_id, type).
Does this give the 'correct' answer?
SELECT DISTINCT *
FROM msgs
WHERE user_id IN (50111)
AND type IN (0, 1, 5, 7)
GROUP BY IFNULL(group_id, id)
ORDER BY `group_id` DESC
LIMIT 100
(It needs the same index)
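One thing worth checking with IFNULL(group_id, id) as the grouping key: a group_id can have the same value as the id of an individual msg, and then both land in one group. The sketch below shows that with made-up sample rows, using Python's sqlite3 as a stand-in for MySQL; adding "group_id IS NULL" as a second key is my own suggestion, not something from the answers above.

```python
import sqlite3

MSGS = [
    # (id, user_id, type, group_id)
    (1, 50111, 0, 2),
    (2, 50111, 0, 2),
    (3, 50111, 1, None),
    (4, 50111, 0, 7),
    (5, 50111, 5, None),
    (7, 50111, 1, None),  # individual msg whose id equals group_id 7
]

# Key from the answer: group msgs by group_id, individual msgs by id.
NAIVE = """
SELECT IFNULL(group_id, id) AS k, COUNT(*) AS n
FROM msgs
WHERE user_id = 50111 AND type IN (0, 1, 5, 7)
GROUP BY IFNULL(group_id, id)
ORDER BY k
"""

# Same key plus a flag that keeps group msgs and individual msgs apart.
SAFE = """
SELECT group_id IS NULL AS is_single, IFNULL(group_id, id) AS k, COUNT(*) AS n
FROM msgs
WHERE user_id = 50111 AND type IN (0, 1, 5, 7)
GROUP BY group_id IS NULL, IFNULL(group_id, id)
ORDER BY is_single, k
"""


def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE msgs (id INTEGER PRIMARY KEY, user_id INTEGER,"
        " type INTEGER, group_id INTEGER)"
    )
    conn.executemany("INSERT INTO msgs VALUES (?, ?, ?, ?)", MSGS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


print(run(NAIVE))  # group 7 and individual msg 7 are merged into one row
print(run(SAFE))   # they stay separate
```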

Number duplicate records on the MySQL table

Have a table with similar schema
id control code amount
1 200 12 300
2 400 12 300
3 200 12 300
4 100 10 400
5 100 10 400
6 500 13 500
Trying to list the duplicates of records on a UI.
Using following query I can retrieve the duplicate records and show it on UI.
select * from mwt group by control,code,amount having count(id) > 1;
id control code amount
1 200 12 300
4 100 10 400
Here the records with id 1 and 4 are duplicates of 3 and 5 respectively.
On the UI, the user will click a check-box adjacent to a record, and the corresponding duplicate records should then be populated in the UI. To make things easier, I'm trying to populate another column named dup_id. Using this dup_id it is possible to filter the results in the UI, which are in JSON format.
How to create a result set similar to the one shown below?
id | control | code | amount | dup_id
-------------------------------------
1  | 200     | 12   | 300    | 1
2  | 400     | 12   | 300    |
3  | 200     | 12   | 300    | 1
4  | 100     | 10   | 400    | 4
5  | 100     | 10   | 400    | 4
6  | 500     | 13   | 500    |
This seems like a simpler solution than that suggested by @kickstarter - but maybe I've misunderstood the requirement...
SELECT x.*
, y.dup_id
FROM my_table x
LEFT
JOIN
( SELECT MIN(id) dup_id
, control
, code
, amount
FROM my_table
GROUP
BY control
, code
, amount
HAVING COUNT(*) > 1
) y
ON y.control = x.control
AND y.code = x.code
AND y.amount = x.amount;
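To check the LEFT JOIN approach against the sample data from the question, here is a minimal sketch that runs it through Python's sqlite3 (a stand-in for MySQL; the table is named mwt as in the question, and the function name is made up).

```python
import sqlite3

ROWS = [
    (1, 200, 12, 300),
    (2, 400, 12, 300),
    (3, 200, 12, 300),
    (4, 100, 10, 400),
    (5, 100, 10, 400),
    (6, 500, 13, 500),
]


def rows_with_dup_id(rows):
    """Attach the smallest id of each duplicate set as dup_id (None if unique)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mwt (id INTEGER PRIMARY KEY, control INTEGER,"
        " code INTEGER, amount INTEGER)"
    )
    conn.executemany("INSERT INTO mwt VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT x.id, x.control, x.code, x.amount, y.dup_id
        FROM mwt AS x
        LEFT JOIN (SELECT MIN(id) AS dup_id, control, code, amount
                   FROM mwt
                   GROUP BY control, code, amount
                   HAVING COUNT(*) > 1) AS y
          ON y.control = x.control
         AND y.code = x.code
         AND y.amount = x.amount
        ORDER BY x.id
        """
    ).fetchall()
    conn.close()
    return result


for row in rows_with_dup_id(ROWS):
    print(row)
```

Rows 2 and 6 come back with dup_id None, which matches the blank cells in the desired result.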
Depending on how accurate the order has to be, you could do something like this.
This is getting all the unique control / code / amount with a count, to get a flag to know if that is a duplicate row, and ordered by control / code / amount so that they are in order. It does a cross join to initialise a few user variables.
Then it calculates a counter, only incrementing it if any of control / code / amount have changed AND it is a duplicate row. Then sets user variables to store the previous values of control / code / amount.
The outer query then orders the results back in to id order.
SELECT sub3.id,
sub3.control,
sub3.code,
sub3.amount,
sub3.dup_id
FROM
(
SELECT sub2.id,
sub2.control,
sub2.code,
sub2.amount,
@cnt:=IF(@control=control AND @code=code AND @amount=amount AND sub2.id_count IS NOT NULL, @cnt, IF(sub2.id_count IS NULL, @cnt, @cnt + 1)),
@control:=control,
@code:=code,
@amount:=amount,
IF(sub2.id_count IS NULL, NULL, @cnt) AS dup_id
FROM
(
SELECT mwt.id, mwt.control, mwt.code, mwt.amount, sub1.id_count
FROM mwt
LEFT OUTER JOIN
(
SELECT control, code, amount, COUNT(id) AS id_count
FROM mwt
GROUP BY control,code,amount
HAVING id_count > 1
) sub1
ON mwt.control = sub1.control
AND mwt.code = sub1.code
AND mwt.amount = sub1.amount
ORDER BY mwt.control, mwt.code, mwt.amount
) sub2
CROSS JOIN
(
SELECT @cnt:=0, @control:=0, @code:=0, @amount:=0
) sub0
) sub3
ORDER BY id
Note that this is ordering by control, code and amount, so not an exact match for your required output (which would require getting the first duplicates ordered by id first).
EDIT - Simpler and better way to do it. This gets all the duplicate rows with the min id for those duplicates (ordered by the min id), and uses a user variable to add a sequence number for those. Then LEFT OUTER JOINs that back against the main table to put that sequence number in all the matching rows.
SELECT mwt.id, mwt.control, mwt.code, mwt.amount, sub2.dup_id
FROM mwt
LEFT OUTER JOIN
(
SELECT sub1.id, sub1.control, sub1.code, sub1.amount, @cnt:=@cnt+1 AS dup_id
FROM
(
SELECT MIN(id) AS id, control, code, amount
FROM mwt
GROUP BY control,code,amount
HAVING COUNT(id) > 1
ORDER BY id
) sub1
CROSS JOIN
(
SELECT @cnt:=0
) sub0
) sub2
ON mwt.control = sub2.control
AND mwt.code = sub2.code
AND mwt.amount = sub2.amount
ORDER BY mwt.id
Do you need a dup_id column at all? I think this can be achieved with a simple query like the one below, where each ? is bound to the corresponding value of the selected record:
select id
, control
, code
, amount
from mwt
where control = ? -- control of the selected record
and code = ? -- code of the selected record
and amount = ? -- amount of the selected record
and id <> ? -- id of the selected record
You can omit the last condition if the requirement is to list the duplicates including the selected record.
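A sketch of how that lookup could be called from application code, assuming the UI passes only the id of the selected record. It uses Python's sqlite3 as a stand-in for MySQL; the self join on mwt and the include_selected switch are my own illustration.

```python
import sqlite3

ROWS = [
    (1, 200, 12, 300),
    (2, 400, 12, 300),
    (3, 200, 12, 300),
    (4, 100, 10, 400),
    (5, 100, 10, 400),
    (6, 500, 13, 500),
]

QUERY = """
SELECT d.id, d.control, d.code, d.amount
FROM mwt AS s
JOIN mwt AS d
  ON d.control = s.control
 AND d.code = s.code
 AND d.amount = s.amount
WHERE s.id = ?
  AND (d.id <> s.id OR ?)  -- second parameter set to 1 keeps the selected row
ORDER BY d.id
"""


def duplicates_of(selected_id, include_selected=False):
    """Return the rows sharing control, code and amount with the selected row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mwt (id INTEGER PRIMARY KEY, control INTEGER,"
        " code INTEGER, amount INTEGER)"
    )
    conn.executemany("INSERT INTO mwt VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, (selected_id, int(include_selected))).fetchall()
    conn.close()
    return rows


print(duplicates_of(1))        # [(3, 200, 12, 300)]
print(duplicates_of(1, True))  # [(1, 200, 12, 300), (3, 200, 12, 300)]
print(duplicates_of(2))        # []
```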

Get latest entry from multiple entries for the same user from a table

I have 2 tables. Table A and Table B.
Table A contains the details of individual users.
Table B contains 3 columns, namely "is_completed", "user_id" and "work_id".
Table B tracks the details of work done by users and whether the work is completed or not. If completed, then that user can be assigned another work.
Problem Statement :
I assigned a work to user 1, and his is_completed is 0 (work not finished). Now assume that after some days his work is finished, so I set is_completed to 1, but at the same time I assigned another work to the same user 1, whose is_completed is 0. So I have two rows for the same user in Table B, one with is_completed as 1 and another with is_completed as 0.
How can I fetch the latest is_completed i.e. user 1 as working or say busy?
SELECT COUNT(DISTINCT t.work_id) AS working FROM
(
SELECT *
FROM TableB
WHERE user_id = 1
ORDER BY work_id DESC LIMIT 2
) AS t
Result:
+---------+
| working |
+---------+
| 1 | // not working
| 2 | // working
+---------+
This query will return 2 if user 1 is currently in the middle of a task, indicating that there is only one record for the most recent work_id. It will return 1 if the user has finished his previous task and has not yet received a new task, indicating two records (start and stop) for the most recent work_id.
I assume that the work_id which gets assigned is always increasing.
Like this:
select u.user_id, if(w.is_completed = 0, 'Busy', 'Available') as Status
from users u
left join work w
on u.user_id = w.user_id
left join work w2
on w.user_id = w2.user_id and w.work_id < w2.work_id
where w2.work_id is null
Fiddle: http://sqlfiddle.com/#!9/03aa7/9
Query will return ALL users and their current availability as either 'Busy' or 'Available', depending on the status of their most recent work entry. Note this depends on the notion that work_id is an ascending, never repeating value, and that a work_id greater than another work_id, is guaranteed to be the more recent of the two.
If you want it to show the status for a specific user_id, just append AND u.user_id = ? to the above query (user_id has to be qualified, since it exists in both tables).
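Here is a small runnable sketch of the "no newer row exists" join, using Python's sqlite3 as a stand-in for MySQL. The sample rows are made up, and it follows the question's convention that is_completed = 0 means the latest work is still unfinished (so the user is busy).

```python
import sqlite3

USERS = [(1,), (2,), (3,)]
WORK = [
    # (work_id, user_id, is_completed)
    (1, 1, 1),  # user 1 finished the first work
    (2, 1, 0),  # and is still on the second one
    (3, 2, 1),  # user 2 finished the only work
]               # user 3 never had any work


def current_status():
    """Return (user_id, status) based on each user's newest work row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE work (work_id INTEGER PRIMARY KEY, user_id INTEGER,"
        " is_completed INTEGER)"
    )
    conn.executemany("INSERT INTO users VALUES (?)", USERS)
    conn.executemany("INSERT INTO work VALUES (?, ?, ?)", WORK)
    result = conn.execute(
        """
        SELECT u.user_id,
               CASE WHEN w.is_completed = 0 THEN 'Busy' ELSE 'Available' END
        FROM users AS u
        LEFT JOIN work AS w
               ON w.user_id = u.user_id
        LEFT JOIN work AS w2
               ON w2.user_id = w.user_id AND w2.work_id > w.work_id
        WHERE w2.work_id IS NULL  -- keep only rows that have no newer work row
        ORDER BY u.user_id
        """
    ).fetchall()
    conn.close()
    return result


print(current_status())  # [(1, 'Busy'), (2, 'Available'), (3, 'Available')]
```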
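select t.user_id, t.is_completed
from (select * from TableB order by work_id desc) as t
group by t.user_id
This will give the latest work status of a user. Note that it relies on MySQL returning the first row of each group for the non-aggregated column, which is not guaranteed.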

Mysql: Order by max N values from subquery

I'm about to throw in the towel with this.
Preface: I want to make this work with any N, but for the sake of simplicity, I'll set N to be 3.
I've got a query (MySQL, specifically) that needs to pull in data from a table and sort based on top 3 values from that table and after that fallback to other sort criteria.
So basically I've got something like this:
SELECT maintable.id
FROM
tbl1 AS maintable
LEFT JOIN
tbl2 AS othertable
ON
maintable.id = othertable.id
ORDER BY
othertable.timestamp DESC,
maintable.timestamp DESC
Which is all basic textbook stuff. But the issue is I need the first ORDER BY clause to only get the three biggest values in othertable.timestamp and then fallback on maintable.timestamp.
Also, doing a LIMIT 3 subquery to othertable and join it is a no go as this needs to work with an arbitrary number of WHERE conditions applied to maintable.
I was almost able to make it work with a user variable based approach like this, but it fails since it doesn't take into account ordering, so it'll take the FIRST three othertable values it finds:
ORDER BY
(
IF(othertable.timestamp IS NULL, 0,
IF(
(@rank:=@rank+1) > 3, null, othertable.timestamp
)
)
) DESC
(with a @rank:=0 preceding the statement)
So... any tips on this? I'm losing my mind with the problem. Another parameter I have for this is that since I'm only altering an existing (vastly complicated) query, I can't do a wrapping outer query. Also, as noted, I'm on MySQL so any solutions using the ROW_NUMBER function are unfortunately out of reach.
Thanks to all in advance.
EDIT. Here's some sample data with timestamps dumbed down to simpler integers to illustrate what I need:
maintable
id timestamp
1 100
2 200
3 300
4 400
5 500
6 600
othertable
id timestamp
4 250
5 350
3 550
1 700
=>
1
3
5
6
4
2
And if for whatever reason we add WHERE NOT maintable.id = 5 to the query, here's what we should get:
1
3
4
6
2
...because now 4 is among the top 3 values in othertable referring to this set.
So as you see, the row with id 4 from othertable is not included in the ordering as it's the fourth in descending order of timestamp values, thus it falls back into getting ordered by the basic timestamp.
The real world need for this is this: I've got content in "maintable" and "othertable" is basically a marker for featured content with a timestamp of "featured date". I've got a view where I'm supposed to float the last 3 featured items to the top and the rest of the list just be a reverse chronologic list.
Maybe something like this.
SELECT
id
FROM
(SELECT
maintable.id,
CASE WHEN othertable.timestamp IS NULL THEN
0
ELSE
@i := @i + 1
END AS num,
othertable.timestamp as othertimestamp,
maintable.timestamp as maintimestamp
FROM
tbl1 AS maintable
CROSS JOIN (select @i := 0) i
LEFT JOIN tbl2 AS othertable
ON maintable.id = othertable.id
ORDER BY
othertable.timestamp DESC) t
ORDER BY
CASE WHEN num > 0 AND num <= 3 THEN
othertimestamp
ELSE
maintimestamp
END DESC
Modified answer:
select ilv.* from
(select sq.*, @i:=@i+1 rn from
(select @i := 0) i
CROSS JOIN
(select m.*, o.id o_id, o.timestamp o_t
from maintable m
left join othertable o
on m.id = o.id
where 1=1
order by o.timestamp desc) sq
) ilv
order by case when o_t is not null and rn <=3 then rn else 4 end,
timestamp desc
SQLFiddle here.
Amend where 1=1 condition inside subquery sq to match required complex selection conditions, and add appropriate limit criteria after the final order by for paging requirements.
Can you use a union query as below?
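(SELECT id,timestamp,1 AS isFeatured FROM tbl2 ORDER BY timestamp DESC LIMIT 3)
UNION ALL
-- the extra derived table is needed because MySQL rejects LIMIT directly inside an IN subquery
(SELECT id,timestamp,2 AS isFeatured FROM tbl1 WHERE id NOT IN (SELECT id FROM (SELECT id FROM tbl2 ORDER BY timestamp DESC LIMIT 3) AS top3))
ORDER BY isFeatured,timestamp DESC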
This might be somewhat redundant, but it is semantically closer to the question you are asking. This would also allow you to parameterize the number of featured results you want to return.
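Another option that needs neither user variables nor a wrapping outer query is a correlated count inside the ORDER BY. The sketch below is only an illustration: it uses Python's sqlite3 as a stand-in for MySQL, the :skip and :n parameters are made up to mimic the "WHERE NOT maintable.id = 5" example, and it assumes the featured timestamps are distinct. The same WHERE conditions have to be repeated inside the count.

```python
import sqlite3

MAIN = [(1, 100), (2, 200), (3, 300), (4, 400), (5, 500), (6, 600)]
OTHER = [(4, 250), (5, 350), (3, 550), (1, 700)]

QUERY = """
SELECT m.id
FROM maintable AS m
LEFT JOIN othertable AS o ON o.id = m.id
WHERE m.id <> :skip
ORDER BY
    CASE
        -- featured only if fewer than :n surviving rows have a newer timestamp
        WHEN o.timestamp IS NOT NULL
         AND (SELECT COUNT(*)
              FROM othertable AS o2
              JOIN maintable AS m2 ON m2.id = o2.id
              WHERE m2.id <> :skip
                AND o2.timestamp > o.timestamp) < :n
        THEN o.timestamp
    END DESC,             -- NULL (not featured) sorts last with DESC
    m.timestamp DESC
"""


def ordered_ids(skip=-1, n=3):
    """Return ids with the n newest featured rows first, then the rest."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE maintable (id INTEGER PRIMARY KEY, timestamp INTEGER)")
    conn.execute("CREATE TABLE othertable (id INTEGER PRIMARY KEY, timestamp INTEGER)")
    conn.executemany("INSERT INTO maintable VALUES (?, ?)", MAIN)
    conn.executemany("INSERT INTO othertable VALUES (?, ?)", OTHER)
    ids = [row[0] for row in conn.execute(QUERY, {"skip": skip, "n": n})]
    conn.close()
    return ids


print(ordered_ids())        # [1, 3, 5, 6, 4, 2]
print(ordered_ids(skip=5))  # [1, 3, 4, 6, 2]
```

Both outputs match the orderings given in the question's sample data.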

Access select latest entry of unique identifier

I have an Access table with multiple date entries for each unique identifier
Year ID TotalSpent
2003-2004 001 1000
2002-2003 001 900
2001-2002 001 100
2009-2010 002 8000
2008-2009 002 4000
2000-2001 003 100
1999-2000 003 0
I want to keep the latest (top) entry for each unique ID to produce
Year ID TotalSpent
2003-2004 001 1000
2009-2010 002 8000
2000-2001 003 100
I have looked at the top() function but cannot get it to produce more than 1 result (as opposed to 1 result for each unique ID). Any help would be appreciated.
Remou makes a valid point that a unique ID would be beneficial, as it would let you refer to the top row in the future, but this could be a constraint outside of your control.
The data source is a bit awkward, with the hyphenated years preventing a simple grouping query. The second issue is that you cannot simply group by the max of the TotalSpent field, as it may not belong to the latest row (a large refund, for instance, may affect a year's total).
My solution involves finding the latest Year for each ID (Query A) and then reforms the year-tag to join onto table B. I didn't want to perform a join on a calculated field so I have wrapped it in another subquery (Query B). This is then joined onto the original table/query to extract the key rows and values.
SELECT YourTable.[YourYearField],
YourTable.ID,
YourTable.TotalSpent
FROM (SELECT A.ID,
[StartYear] & "-" & [EndYear] AS Grouping
FROM (SELECT YourTable.ID,
Max(Val(Right$([YourYearField], 4))) AS EndYear,
Max(Val(Right$([YourYearField], 4)) - 1) AS StartYear
FROM YourTable
GROUP BY YourTable.ID) AS A
GROUP BY A.ID,
[StartYear] & "-" & [EndYear]) AS B
INNER JOIN YourTable
ON ( B.Grouping = YourTable.[YourYearField] )
AND ( B.ID = YourTable.ID )
GROUP BY YourTable.[YourYearField],
YourTable.ID,
YourTable.TotalSpent;
You can get the Year and ID values you want with this query:
SELECT ID, Max([Year]) AS MaxOfYear
FROM YourTable
GROUP BY ID;
Then to get the corresponding TotalSpent values, use that SQL for a subquery which you join to YourTable.
SELECT y.Year, y.ID, y.TotalSpent
FROM
YourTable AS y
INNER JOIN
(
SELECT ID, Max([Year]) AS MaxOfYear
FROM YourTable
GROUP BY ID
) AS sub
ON
(y.Year = sub.MaxOfYear)
AND (y.ID = sub.ID);
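To try the join-on-Max approach against the sample data, here is a minimal sketch that runs the same SQL through Python's sqlite3 (a stand-in for Access, so the square brackets are dropped). It relies on the Year labels all having the fixed-width YYYY-YYYY form, so that the text maximum is also the latest year.

```python
import sqlite3

ROWS = [
    ("2003-2004", "001", 1000),
    ("2002-2003", "001", 900),
    ("2001-2002", "001", 100),
    ("2009-2010", "002", 8000),
    ("2008-2009", "002", 4000),
    ("2000-2001", "003", 100),
    ("1999-2000", "003", 0),
]


def latest_per_id(rows):
    """Return the row with the latest Year label for every ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (Year TEXT, ID TEXT, TotalSpent INTEGER)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT y.Year, y.ID, y.TotalSpent
        FROM YourTable AS y
        INNER JOIN (SELECT ID, MAX(Year) AS MaxOfYear
                    FROM YourTable
                    GROUP BY ID) AS sub
                ON y.Year = sub.MaxOfYear
               AND y.ID = sub.ID
        ORDER BY y.ID
        """
    ).fetchall()
    conn.close()
    return result


for row in latest_per_id(ROWS):
    print(row)
```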