I have a table ("lms_attendance") of users' check-in and out times that looks like this:
id user time io (enum)
1 9 1370931202 out
2 9 1370931664 out
3 6 1370932128 out
4 12 1370932128 out
5 12 1370933037 in
I'm trying to create a view of this table that would output only the most recent record per user id, while giving me the "in" or "out" value, so something like:
id user time io
2 9 1370931664 out
3 6 1370932128 out
5 12 1370933037 in
I'm pretty close so far, but I realized that views won't accept subquerys, which is making it a lot harder. The closest query I got was :
select
`lms_attendance`.`id` AS `id`,
`lms_attendance`.`user` AS `user`,
max(`lms_attendance`.`time`) AS `time`,
`lms_attendance`.`io` AS `io`
from `lms_attendance`
group by
`lms_attendance`.`user`,
`lms_attendance`.`io`
But what I get is :
id user time io
3 6 1370932128 out
1 9 1370931664 out
5 12 1370933037 in
4 12 1370932128 out
Which is close, but not perfect. I know that last group by shouldn't be there, but without it, it returns the most recent time, but not with it's relative IO value.
Any ideas?
Thanks!
Query:
SQLFIDDLEExample
SELECT t1.*
FROM lms_attendance t1
WHERE t1.time = (SELECT MAX(t2.time)
FROM lms_attendance t2
WHERE t2.user = t1.user)
Result:
| ID | USER | TIME | IO |
--------------------------------
| 2 | 9 | 1370931664 | out |
| 3 | 6 | 1370932128 | out |
| 5 | 12 | 1370933037 | in |
Note that if a user has multiple records with the same "maximum" time, the query above will return more than one record. If you only want 1 record per user, use the query below:
SQLFIDDLEExample
SELECT t1.*
FROM lms_attendance t1
WHERE t1.id = (SELECT t2.id
FROM lms_attendance t2
WHERE t2.user = t1.user
ORDER BY t2.id DESC
LIMIT 1)
No need to trying reinvent the wheel, as this is common greatest-n-per-group problem. Very nice solution is presented.
I prefer the most simplistic solution (see SQLFiddle, updated Justin's) without subqueries (thus easy to use in views):
SELECT t1.*
FROM lms_attendance AS t1
LEFT OUTER JOIN lms_attendance AS t2
ON t1.user = t2.user
AND (t1.time < t2.time
OR (t1.time = t2.time AND t1.Id < t2.Id))
WHERE t2.user IS NULL
This also works in a case where there are two different records with the same greatest value within the same group - thanks to the trick with (t1.time = t2.time AND t1.Id < t2.Id). All I am doing here is to assure that in case when two records of the same user have same time only one is chosen. Doesn't actually matter if the criteria is Id or something else - basically any criteria that is guaranteed to be unique would make the job here.
Based in #TMS answer, I like it because there's no need for subqueries but I think ommiting the 'OR' part will be sufficient and much simpler to understand and read.
SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
ON t1.user = t2.user
AND t1.time < t2.time
WHERE t2.user IS NULL
if you are not interested in rows with null times you can filter them in the WHERE clause:
SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
ON t1.user = t2.user
AND t1.time < t2.time
WHERE t2.user IS NULL and t1.time IS NOT NULL
Already solved, but just for the record, another approach would be to create two views...
CREATE TABLE lms_attendance
(id int, user int, time int, io varchar(3));
CREATE VIEW latest_all AS
SELECT la.user, max(la.time) time
FROM lms_attendance la
GROUP BY la.user;
CREATE VIEW latest_io AS
SELECT la.*
FROM lms_attendance la
JOIN latest_all lall
ON lall.user = la.user
AND lall.time = la.time;
INSERT INTO lms_attendance
VALUES
(1, 9, 1370931202, 'out'),
(2, 9, 1370931664, 'out'),
(3, 6, 1370932128, 'out'),
(4, 12, 1370932128, 'out'),
(5, 12, 1370933037, 'in');
SELECT * FROM latest_io;
Click here to see it in action at SQL Fiddle
If your on MySQL 8.0 or higher you can use Window functions:
Query:
DBFiddleExample
SELECT DISTINCT
FIRST_VALUE(ID) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS ID,
FIRST_VALUE(USER) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS USER,
FIRST_VALUE(TIME) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS TIME,
FIRST_VALUE(IO) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS IO
FROM lms_attendance;
Result:
| ID | USER | TIME | IO |
--------------------------------
| 2 | 9 | 1370931664 | out |
| 3 | 6 | 1370932128 | out |
| 5 | 12 | 1370933037 | in |
The advantage I see over using the solution proposed by Justin is that it enables you to select the row with the most recent data per user (or per id, or per whatever) even from subqueries without the need for an intermediate view or table.
And in case your running a HANA it is also ~7 times faster :D
Ok, this might be either a hack or error-prone, but somehow this is working as well-
SELECT id, MAX(user) as user, MAX(time) as time, MAX(io) as io FROM lms_attendance GROUP BY id;
select b.* from
(select
`lms_attendance`.`user` AS `user`,
max(`lms_attendance`.`time`) AS `time`
from `lms_attendance`
group by
`lms_attendance`.`user`) a
join
(select *
from `lms_attendance` ) b
on a.user = b.user
and a.time = b.time
I have tried one solution which works for me
SELECT user, MAX(TIME) as time
FROM lms_attendance
GROUP by user
HAVING MAX(time)
I have a very large table and all of the other suggestions here were taking a very long time to execute. I came up with this hacky method that was much faster. The downside is, if the max(date) row has a duplicate date for that user, it will return both of them.
SELECT * FROM mb_web.devices_log WHERE CONCAT(dtime, '-', user_id) in (
SELECT concat(max(dtime), '-', user_id) FROM mb_web.devices_log GROUP BY user_id
)
select result from (
select vorsteuerid as result, count(*) as anzahl from kreditorenrechnung where kundeid = 7148
group by vorsteuerid
) a order by anzahl desc limit 0,1
I have done same thing like below
SELECT t1.*
FROM lms_attendance t1
WHERE t1.id in (SELECT max(t2.id) as id
FROM lms_attendance t2
group BY t2.user)
This will also reduce memory utilization.
Thanks.
Possibly you can do group by user and then order by time desc. Something like as below
SELECT * FROM lms_attendance group by user order by time desc;
Try this query:
select id,user, max(time), io
FROM lms_attendance group by user;
This worked for me:
SELECT user, time FROM
(
SELECT user, time FROM lms_attendance --where clause
) AS T
WHERE (SELECT COUNT(0) FROM table WHERE user = T.user AND time > T.time) = 0
ORDER BY user ASC, time DESC
Related
I'm trying to figure out best way to take required rows from database.
Database table:
id user cat time
1 5 1 123
2 5 1 150
3 5 2 160
4 5 3 100
I want to take DISTINCT cat ... WHERE user=5 with MAX time value. How should I do that in efficient way?
You will want to use an aggregate function with a GROUP BY:
select user, cat, max(time) as Time
from yourtable
group by user, cat
See SQL Fiddle with Demo
If you want to include the id column, then you can use a subquery:
select t1.id,
t1.user,
t1.cat,
t1.time
from yourtable t1
inner join
(
select max(time) Time, user, cat
from yourtable
group by user, cat
) t2
on t1.time = t2.time
and t1.user = t2.user
and t1.cat = t2.cat
See SQL Fiddle with Demo. I used a subquery to be sure that the id value that is returned with each max(time) row is the correct id.
I am trying to clean up records stored in a MySQL table. If a row contains %X%, I need to delete that row and the row immediately below it, regardless of content. E.g. (sorry if the table is insulting anyone's intelligence):
| 1 | leave alone
| 2 | Contains %X% - Delete
| 3 | This row should also be deleted
| 4 | leave alone
| 5 | Contains %X% - Delete
| 6 | This row should also be deleted
| 7 | leave alone
Is there a way to do this using only a couple of queries? Or am I going to have to execute a SELECT query first (using the %x% search parameter) then loop through those results and execute a DELETE...WHERE for each index returned + 1
This should work although its a bit clunky (might want to check the LIKE argument as it uses pattern matching (see comments)
DELETE FROM table.db
WHERE idcol IN
( SELECT idcol FROM db.table WHERE col LIKE '%X%')
OR idcolIN
( SELECTidcol+1 FROMdb.tableWHEREcol` LIKE '%X%')
Let's assume the table was named test and contained to columns named id and data.
We start with a SELECT that gives us the id of all rows that have a preceding row (highest id of all ids lower than id of our current row):
SELECT t1.id FROM test t1
JOIN test t2 ON
( t2.id, true )
=
( SELECT t3.id, t3.data LIKE '%X%' FROM test t3
WHERE t3.id < t1.id ORDER BY id DESC LIMIT 1 )
That gives us the ids 3 and 6. Their preceding rows 2 and 5 contain %X%, so that's good.
Now lets get the ids of the rows that contain %X% and combine them with the previous ones, via UNION:
(SELECT t1.id FROM test t1
JOIN test t2 ON
( t2.id, true )
=
( SELECT t3.id, t3.data LIKE '%X%' FROM test t3
WHERE t3.id < t1.id ORDER BY id DESC LIMIT 1 )
)
UNION
(
SELECT id FROM test WHERE data LIKE '%X%'
)
That gives us 3, 6, 2, 5 - nice!
Now, we can't delete from a table and select from the same table in MySQL - so lets use a temporary table, store our ids that are to be deleted in there, and then read from that temporary table to delete from our original table:
CREATE TEMPORARY TABLE deleteids (id INT);
INSERT INTO deleteids
(SELECT t1.id FROM test t1
JOIN test t2 ON
( t2.id, true )
=
( SELECT t3.id, t3.data LIKE '%X%' FROM test t3
WHERE t3.id < t1.id ORDER BY id DESC LIMIT 1 )
)
UNION
(
SELECT id FROM test WHERE data LIKE '%X%'
);
DELETE FROM test WHERE id in (SELECT * FROM deleteids);
... and we are left with the ids 1, 4 and 7 in our test table!
(And since the previous rows are selected using <, ORDER BY and LIMIT, this also works if the ids are not continuous.)
You can do it all in a single DELETE statement:
Assuming the "row immediately after" is based on the order of your INT-based ID column, you can use MySQL variables to assign row numbers which accounts for gaps in your IDs:
DELETE a FROM tbl a
JOIN (
SELECT a.id, b.id AS nextid
FROM (
SELECT a.id, a.text, #rn:=#rn+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT #rn:=1) rn_init
ORDER BY a.id
) a
LEFT JOIN (
SELECT a.id, #rn2:=#rn2+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT #rn2:=0) rn_init
ORDER BY a.id
) b ON a.rownum = b.rownum
WHERE a.text LIKE '%X%'
) b ON a.id IN (b.id, b.nextid)
SQL Fiddle Demo (added additional data for example)
What this does is it first takes your data and ranks it based on your ID column, then we do an offset LEFT JOIN on an almost identical result set except that the rank column is behind by 1. This gets the rows and their immediate "next" rows side by side so that we can pull both of their id's at the same time in the parent DELETE statement:
SELECT a.id, a.text, b.id AS nextid, b.text AS nexttext
FROM (
SELECT a.id, a.text, #rn:=#rn+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT #rn:=1) rn_init
ORDER BY a.id
) a
LEFT JOIN (
SELECT a.id, a.text, #rn2:=#rn2+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT #rn2:=0) rn_init
ORDER BY a.id
) b ON a.rownum = b.rownum
WHERE a.text LIKE '%X%'
Yields:
ID | TEXT | NEXTID | NEXTTEXT
2 | Contains %X% - Delete | 3 | This row should also be deleted
5 | Contains %X% - Delete | 6 | This row should also be deleted
257 | Contains %X% - Delete | 3434 | This row should also be deleted
4000 | Contains %X% - Delete | 4005 | Contains %X% - Delete
4005 | Contains %X% - Delete | 6000 | Contains %X% - Delete
6000 | Contains %X% - Delete | 6534 | This row should also be deleted
We then JOIN-DELETE that entire statement on the condition that it deletes rows whose IDs are either the "subselected" ID or NEXTID.
There is no reasonable way of doing this in a single query. (It may be possible, but the query you end up having to use will be unreasonably complex, and will almost certainly not be portable to other SQL engines.)
Use the SELECT-then-DELETE approach you described in your question.
I have a table tbl_patient and I want to fetch last 2 visit of each patient in order to compare whether patient condition is improving or degrading.
tbl_patient
id | patient_ID | visit_ID | patient_result
1 | 1 | 1 | 5
2 | 2 | 1 | 6
3 | 2 | 3 | 7
4 | 1 | 2 | 3
5 | 2 | 3 | 2
6 | 1 | 3 | 9
I tried the query below to fetch the last visit of each patient as,
SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`
Now i want to fetch the 2nd last visit of each patient with query but it give me error
(#1242 - Subquery returns more than 1 row)
SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`
Where I'm wrong
select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
from tbl_patient
group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id
id11 is the ID of the last visit, id2 is the ID of the 2nd to last visit.
Your first query doesn't get the last visits, since it gives results 5 and 6 instead of 2 and 9.
You can try this query:
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
where id not in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
GROUP BY patient_ID)
order by 1,2
SELECT id, patient_result FROM `tbl_patient` t1
JOIN (SELECT MAX(id) as max, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2
ON t1.patient_ID = t2.patient_ID
WHERE id <max GROUP BY t1.`patient_ID`
There are a couple of approaches to getting the specified resultset returned in a single SQL statement.
Unfortunately, most of those approaches yield rather unwieldy statements.
The more elegant looking statements tend to come with poor (or unbearable) performance when dealing with large sets. And the statements that tend to have better performance are more un-elegant looking.
Three of the most common approaches make use of:
correlated subquery
inequality join (nearly a Cartesian product)
two passes over the data
Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic RANK() OVER(PARTITION ...) function available in other DBMS:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM (
SELECT p.id
, p.patient_id
, p.visit_id
, p.patient_result
, #rn := if(#prev_patient_id = patient_id, #rn + 1, 1) AS rn
, #prev_patient_id := patient_id AS prev_patient_id
FROM tbl_patients p
JOIN (SELECT #rn := 0, #prev_patient_id := NULL) i
ORDER BY p.patient_id DESC, p.id DESC
) t
WHERE t.rn <= 2
Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived tabled". Then, the outer query will run against the derived table. So, this is essentially two passes over the data.
This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.
This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.
Note also that this query will return a row for a patient_id if there is only one id value exists for that patient; it does not restrict the return to just those patients that have at least two rows.
It's also possible to get an equivalent resultset with a correlated subquery:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patients t
WHERE ( SELECT COUNT(1) AS cnt
FROM tbl_patients p
WHERE p.patient_id = t.patient_id
AND p.id >= t.id
) <= 2
ORDER BY t.patient_id ASC, t.id ASC
Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.
As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patients t
LEFT
JOIN tbl_patients p
ON p.patient_id = t.patient_id
AND t.id < p.id
GROUP
BY t.id
, t.patient_id
, t.visit_id
, t.patient_result
HAVING COUNT(1) <= 2
Note that this will create a nearly Cartesian product for each patient. For a limited number of id values for each patient, this won't be too bad. But if a patient has hundreds of id values, the intermediate result can be huge, on the order of (O)n**2.
Try this..
SELECT id, patient_result FROM tbl_patient AS tp WHERE id < ((SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID) - 1) GROUP BY patient_ID
Why not use simply...
GROUP BY `patient_ID` DESC LIMIT 2
... and do the rest in the next step?
I have the following query
select t1.file_name,
max(create_date),
t1.id
from dbo.translation_style_sheet AS t1
GROUP BY t1.file_name
I'd like to add the id to the select, but everytime I do this, it returns all results rather than my grouped results
my goal is to return
pdf | 10/1/2012 | 3
doc | 10/2/2012 | 5
however my query is returning
pdf | 9/30/2012 | 1
pdf | 9/31/2012 | 2
pdf | 10/1/2012 | 3
doc | 10/1/2012 | 4
doc | 10/2/2012 | 5
Anybody know what I'm doing wrong?
If you are guaranteed that the row with the max(create_date) also has the largest id value (your sample data does but I don't know if that is an artifact), you can simply select the max(id)
select t1.file_name, max(create_date), max(t1.id)
from dbo.translation_style_sheet AS t1
GROUP BY t1.file_name
If you are not guaranteed that, and you want the id value that is from the row that has the max(create_date) to be returned, you can use analytic functions. The following will work in Oracle though it is unlikely that less sophisticated databases like Derby would support it.
select t1.file_name, t1.create_date, t1.id
from (select t2.file_name,
t2.create_date,
t2.id,
rank() over (partition by t2.file_name
order by t2.create_date desc) rnk
from dbo.translation_style_sheet t2) t1
where rnk = 1
You could also use a subquery. This is likely to be more portable but less efficient than the analytic function approach
select t1.file_name, t1.create_date, t1.id
from dbo.translation_style_sheet t1
where (t1.file_name, t1.create_date) in (select t2.file_name, max(t2.create_date)
from dbo.translation_style_sheet t2
group by t2.file_name);
or
select t1.file_name, t1.create_date, t1.id
from dbo.translation_style_sheet t1
where t1.create_date = (select max(t2.create_date)
from dbo.translation_style_sheet t2
where t1.file_name = t2.file_name);
When you are using GROUP BY, you have to include all fields in the group that aren't being used by the aggregate function.
select t1.file_name, max(create_date), t1.id from
dbo.translation_style_sheet AS t1 GROUP BY t1.file_name, t1.id
I have a table in which users store scores and other information about said score (for example notes on score, or time taken etc). I want a mysql query that finds each users personal best score and it's associated notes and time etc.
What I have tried to use is something like this:
SELECT *, MAX(score) FROM table GROUP BY (user)
The problem with this is that whilst you can extra the users personal best from that query [MAX(score)], the returned notes and times etc are not associated with the maximum score, but a different score (specifically the one contained in *). Is there a way I can write a query that selects what I want? Or will I have to do it manually in PhP?
I'm assuming that you only want one result per player, even if they have scored the same maximum score more than once. I am also assuming that you want each player's first time that they got their personal best in the case that there are repeats.
There's a few ways of doing this. Here's a way that is MySQL specific:
SELECT user, scoredate, score, notes FROM (
SELECT *, #prev <> user AS is_best, #prev := user
FROM table1, (SELECT #prev := -1) AS vars
ORDER BY user, score DESC, scoredate
) AS T1
WHERE is_best
Here's a more general way that uses ordinary SQL:
SELECT T3.* FROM table1 AS T3
JOIN (
SELECT T1.user, T1.score, MIN(scoredate) AS scoredate
FROM table1 AS T1
JOIN (SELECT user, MAX(score) AS score FROM table1 GROUP BY user) AS T2
ON T1.user = T2.user AND T1.score = T2.score
GROUP BY T1.user
) AS T4
ON T3.user = T4.user AND T3.score = T4.score AND T3.scoredate = T4.scoredate
Result:
1, '2010-01-01 17:00:00', 50, 'Much better'
2, '2010-01-01 14:00:00', 100, 'Perfect score'
Test data I used to test this:
CREATE TABLE table1 (user INT NOT NULL, scoredate DATETIME NOT NULL, score INT NOT NULL, notes NVARCHAR(100) NOT NULL);
INSERT INTO table1 (user, scoredate, score, notes) VALUES
(1, '2010-01-01 12:00:00', 10, 'First attempt'),
(1, '2010-01-01 17:00:00', 50, 'Much better'),
(1, '2010-01-01 22:00:00', 30, 'Time for bed'),
(2, '2010-01-01 14:00:00', 100, 'Perfect score'),
(2, '2010-01-01 16:00:00', 100, 'This is too easy');
You can join with a sub query, as in the following example:
SELECT t.*,
sub_t.max_score
FROM table t
JOIN (SELECT MAX(score) as max_score,
user
FROM table
GROUP BY user) sub_t ON (sub_t.user = t.user AND
sub_t.max_score = t.score);
The above query can be explained as follows. It starts with:
SELECT t.* FROM table t;
... This by itself will obviously list all the contents of the table. The goal is to keep only the rows that represent a maximum score of a particular user. Therefore if we had the data below:
+------------------------+
| user | score | notes |
+------+-------+---------+
| 1 | 10 | note a |
| 1 | 15 | note b |
| 1 | 20 | note c |
| 2 | 8 | note d |
| 2 | 12 | note e |
| 2 | 5 | note f |
+------+-------+---------+
...We would have wanted to keep just the "note c" and "note e" rows.
To find the rows that we want to keep, we can simply use:
SELECT MAX(score), user FROM table GROUP BY user;
Note that we cannot get the notes attribute from the above query, because as you had already noticed, you would not get the expected results for fields not aggregated with an aggregate function, like MAX() or not part of the GROUP BY clause. For further reading on this topic, you may want to check:
Debunking GROUP BY Myths
How does MySQL decide which id to return in group by clause?
Why does MySql allow “group by” queries WITHOUT aggregate functions?
Now we only need to keep the rows from the first query that match the second query. We can do this with an INNER JOIN:
...
JOIN (SELECT MAX(score) as max_score,
user
FROM table
GROUP BY user) sub_t ON (sub_t.user = t.user AND
sub_t.max_score = t.score);
The sub query is given the name sub_t. It is the set of all the users with the personal best score. The ON clause of the JOIN applies the restriction to the relevant fields. Remember that we only want to keep rows that are part of this subquery.
SELECT *
FROM table t
ORDER BY t.score DESC
GROUP BY t.user
LIMIT 1
Side note: It is better to specify the fields than use SELECT *