I have a table in which users store scores and other information about each score (for example, notes on the score or the time taken). I want a MySQL query that finds each user's personal best score and its associated notes, time, etc.
What I have tried to use is something like this:
SELECT *, MAX(score) FROM table GROUP BY (user)
The problem with this is that while you can extract each user's personal best from that query [MAX(score)], the returned notes, times, etc. are not associated with the maximum score but with a different score (specifically the one contained in *). Is there a way to write a query that selects what I want, or will I have to do it manually in PHP?
I'm assuming that you only want one result per player, even if they have scored the same maximum score more than once. I am also assuming that, when there are repeats, you want the first time each player reached their personal best.
There are a few ways of doing this. Here's a way that is MySQL-specific:
SELECT user, scoredate, score, notes FROM (
SELECT *, @prev <> user AS is_best, @prev := user
FROM table1, (SELECT @prev := -1) AS vars
ORDER BY user, score DESC, scoredate
) AS T1
WHERE is_best
Here's a more general way that uses ordinary SQL:
SELECT T3.* FROM table1 AS T3
JOIN (
SELECT T1.user, T1.score, MIN(scoredate) AS scoredate
FROM table1 AS T1
JOIN (SELECT user, MAX(score) AS score FROM table1 GROUP BY user) AS T2
ON T1.user = T2.user AND T1.score = T2.score
GROUP BY T1.user, T1.score
) AS T4
ON T3.user = T4.user AND T3.score = T4.score AND T3.scoredate = T4.scoredate
Result:
1, '2010-01-01 17:00:00', 50, 'Much better'
2, '2010-01-01 14:00:00', 100, 'Perfect score'
Test data I used to test this:
CREATE TABLE table1 (user INT NOT NULL, scoredate DATETIME NOT NULL, score INT NOT NULL, notes NVARCHAR(100) NOT NULL);
INSERT INTO table1 (user, scoredate, score, notes) VALUES
(1, '2010-01-01 12:00:00', 10, 'First attempt'),
(1, '2010-01-01 17:00:00', 50, 'Much better'),
(1, '2010-01-01 22:00:00', 30, 'Time for bed'),
(2, '2010-01-01 14:00:00', 100, 'Perfect score'),
(2, '2010-01-01 16:00:00', 100, 'This is too easy');
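On MySQL 8.0 or later (an assumption; both queries above also run on older versions), the same rule, one row per user and the earliest of any tied best scores, can be sketched with ROW_NUMBER() instead of user variables, which avoids relying on the order in which MySQL evaluates variable assignments. The table name scores_demo is hypothetical and simply holds the test data above.

```sql
-- Assumes MySQL 8.0+ (WITH and window functions).
-- scores_demo is a hypothetical copy of the table1 test data.
DROP TABLE IF EXISTS scores_demo;
CREATE TABLE scores_demo (user INT NOT NULL, scoredate DATETIME NOT NULL,
                          score INT NOT NULL, notes VARCHAR(100) NOT NULL);
INSERT INTO scores_demo (user, scoredate, score, notes) VALUES
  (1, '2010-01-01 12:00:00', 10, 'First attempt'),
  (1, '2010-01-01 17:00:00', 50, 'Much better'),
  (1, '2010-01-01 22:00:00', 30, 'Time for bed'),
  (2, '2010-01-01 14:00:00', 100, 'Perfect score'),
  (2, '2010-01-01 16:00:00', 100, 'This is too easy');

-- Number each user's rows from best to worst score; ties on score are
-- broken by the earlier scoredate, so rn = 1 is the first personal best.
WITH ranked AS (
  SELECT user, scoredate, score, notes,
         ROW_NUMBER() OVER (PARTITION BY user
                            ORDER BY score DESC, scoredate ASC) AS rn
  FROM scores_demo
)
SELECT user, scoredate, score, notes
FROM ranked
WHERE rn = 1
ORDER BY user;
```

Because ties are broken by scoredate ASC, this should return the same two rows as the Result above; ordering by scoredate DESC instead would pick each user's latest tied best.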
You can join with a subquery, as in the following example:
SELECT t.*,
sub_t.max_score
FROM table t
JOIN (SELECT MAX(score) as max_score,
user
FROM table
GROUP BY user) sub_t ON (sub_t.user = t.user AND
sub_t.max_score = t.score);
The above query can be explained as follows. It starts with:
SELECT t.* FROM table t;
This by itself will list all the contents of the table. The goal is to keep only the rows that hold the maximum score of a particular user. Therefore, if we had the data below:
+------+-------+---------+
| user | score | notes |
+------+-------+---------+
| 1 | 10 | note a |
| 1 | 15 | note b |
| 1 | 20 | note c |
| 2 | 8 | note d |
| 2 | 12 | note e |
| 2 | 5 | note f |
+------+-------+---------+
we would want to keep just the "note c" and "note e" rows.
To find each user's maximum score (which identifies the rows we want to keep), we can use:
SELECT MAX(score), user FROM table GROUP BY user;
Note that we cannot get the notes column from the above query because, as you already noticed, MySQL returns indeterminate values for columns that are neither wrapped in an aggregate function such as MAX() nor part of the GROUP BY clause. For further reading on this topic, you may want to check:
Debunking GROUP BY Myths
How does MySQL decide which id to return in group by clause?
Why does MySql allow “group by” queries WITHOUT aggregate functions?
Now we only need to keep the rows from the first query that match the second query. We can do this with an INNER JOIN:
...
JOIN (SELECT MAX(score) as max_score,
user
FROM table
GROUP BY user) sub_t ON (sub_t.user = t.user AND
sub_t.max_score = t.score);
The subquery is given the alias sub_t. It holds one row per user, containing that user's personal best score. The ON clause of the JOIN then keeps only the rows of t whose user and score both match a row in sub_t, which are exactly the rows we want to keep.
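To try the explained join end to end, here is a sketch that loads the sample rows above into a hypothetical table named notes_demo (the word table itself is reserved in MySQL and cannot be used unquoted as a table name). The extra row (2, 12, 'note g') is an assumption added only to show what happens on a tie: the join keeps every row whose score equals the user's maximum.

```sql
-- notes_demo is hypothetical: the sample rows above plus one extra row
-- (note g) that ties user 2's best score.
DROP TABLE IF EXISTS notes_demo;
CREATE TABLE notes_demo (user INT NOT NULL, score INT NOT NULL,
                         notes VARCHAR(20) NOT NULL);
INSERT INTO notes_demo (user, score, notes) VALUES
  (1, 10, 'note a'), (1, 15, 'note b'), (1, 20, 'note c'),
  (2, 8, 'note d'), (2, 12, 'note e'), (2, 5, 'note f'),
  (2, 12, 'note g');

-- sub_t: one row per user with that user's best score.
-- The join keeps the rows of t matching sub_t on both user and score.
SELECT t.user, t.score, t.notes
FROM notes_demo t
JOIN (SELECT user, MAX(score) AS max_score
      FROM notes_demo
      GROUP BY user) sub_t
  ON sub_t.user = t.user
 AND sub_t.max_score = t.score
ORDER BY t.user, t.notes;
```

This should return the note c, note e and note g rows; if only one row per user is wanted on ties, a tie-breaker such as the date column in the previous answer is needed.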
SELECT *
FROM table t
GROUP BY t.user
ORDER BY t.score DESC
LIMIT 1
Side note: it is better to specify the fields than to use SELECT *.
I have a table:
ID ACCOUNT BALANCE TIME
1 Bill 10 1478885000
2 Bill 10 1478885001
3 James 5 1478885002
4 Ann 20 1478885003
5 Ann 15 1478885004
I want to get the latest (by TIME) balance of several accounts, i.e.:
ACCOUNT BALANCE
Bill 10
Ann 15
I tried this SQL:
SELECT ACCOUNT, BALANCE, max(TIME)
FROM T1
WHERE ACCOUNT IN ( 'Bill', 'Ann')
GROUP BY ACCOUNT
I get this error:
1055 - Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'BALANCE' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
I understand the error and have tried different queries, but I still don't see how to retrieve the data I need without running multiple queries.
P.S. I use MySQL 5.7.
SELECT T1.ACCOUNT, T1.BALANCE, T1.TIME
FROM T1
JOIN (SELECT ACCOUNT, max(TIME) as m_time
FROM T1
WHERE T1.ACCOUNT IN ( 'Bill', 'Ann')
GROUP BY ACCOUNT ) T2
ON T1.ACCOUNT = T2.ACCOUNT
AND T1.TIME = T2.m_time
WHERE T1.ACCOUNT IN ( 'Bill', 'Ann')
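EDIT: if an account can have several rows with the same maximum TIME, it is better to use variables, so that only one row per account is returned.
For the demo, I changed Ann's two rows to have the same TIME: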
SELECT ACCOUNT, BALANCE, TIME
FROM (
SELECT ACCOUNT, BALANCE, TIME,
@rn := if(ACCOUNT = @acc,
@rn + 1,
if(@acc := ACCOUNT, 1, 1)) as rn
FROM T1, (SELECT @rn := 0, @acc := '') P
WHERE ACCOUNT IN ( 'Bill', 'Ann')
ORDER BY ACCOUNT, TIME desc, BALANCE desc
) T
WHERE T.rn = 1
OUTPUT
| ACCOUNT | BALANCE | TIME |
|---------|---------|------------|
| Bill | 10 | 1478885001 |
| Ann | 20 | 1478885003 |
The error is quite clear. If you want the latest balance for each account, here is one way:
select t1.*
from t1
where t1.time = (select max(tt1.time) from t1 tt1 where t1.account = tt1.account);
You can add a WHERE condition to the outer query to filter for particular accounts.
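On MySQL 8.0 or later (the question targets 5.7, so this only applies after an upgrade), ROW_NUMBER() handles both the "latest row" case and the "same TIME twice" case without user variables. A sketch using a hypothetical table accounts_demo holding the question's rows; using ID as the tie-breaker is an assumption about which duplicate you want.

```sql
-- Assumes MySQL 8.0+. accounts_demo is a hypothetical copy of T1.
DROP TABLE IF EXISTS accounts_demo;
CREATE TABLE accounts_demo (ID INT PRIMARY KEY, ACCOUNT VARCHAR(20) NOT NULL,
                            BALANCE INT NOT NULL, TIME INT NOT NULL);
INSERT INTO accounts_demo (ID, ACCOUNT, BALANCE, TIME) VALUES
  (1, 'Bill', 10, 1478885000), (2, 'Bill', 10, 1478885001),
  (3, 'James', 5, 1478885002), (4, 'Ann', 20, 1478885003),
  (5, 'Ann', 15, 1478885004);

-- rn = 1 marks the newest row per account; ID DESC breaks ties on TIME,
-- so exactly one row per account is returned.
SELECT ACCOUNT, BALANCE, TIME
FROM (SELECT ACCOUNT, BALANCE, TIME,
             ROW_NUMBER() OVER (PARTITION BY ACCOUNT
                                ORDER BY TIME DESC, ID DESC) AS rn
      FROM accounts_demo
      WHERE ACCOUNT IN ('Bill', 'Ann')) latest
WHERE rn = 1;
```

With the question's data this should return Bill 10 and Ann 15.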
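You have columns in your SELECT list that are neither in the GROUP BY clause nor inside an aggregate function.
Either add all of those columns to the GROUP BY (note that this returns one row per ACCOUNT and BALANCE combination, so Ann appears twice):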
SELECT ACCOUNT, BALANCE, max(TIME)
FROM T1
WHERE ACCOUNT IN ( 'Bill', 'Ann')
GROUP BY ACCOUNT, BALANCE
or clear the sql_mode, which disables ONLY_FULL_GROUP_BY (and every other mode) for the session:
SET sql_mode = ''
or
SELECT ACCOUNT, BALANCE, TIME
FROM T1
where id In (
select id from T1 where (account, time ) in (
select account, max(time)
from t1
WHERE ACCOUNT IN ( 'Bill', 'Ann') group by account))
The MODE is the value that occurs the most times in the data; there can be ONE MODE or MANY MODES.
Here are some values in two tables:
create table t100(id int auto_increment primary key, value int);
create table t200(id int auto_increment primary key, value int);
insert into t100(value) values (1),
(2),(2),(2),
(3),(3),
(4);
insert into t200(value) values (1),
(2),(2),(2),
(3),(3),
(4),(4),(4);
Right now, to get the MODE(S) returned as a comma-separated list, I run the query below for table t100:
SELECT GROUP_CONCAT(value) as modes,occurs
FROM
(SELECT value,occurs FROM
(SELECT value,count(*) as occurs
FROM
T100
GROUP BY value)T1,
(SELECT max(occurs) as maxoccurs FROM
(SELECT value,count(*) as occurs
FROM
T100
GROUP BY value)T2
)T3
WHERE T1.occurs = T3.maxoccurs)T4
GROUP BY occurs;
and the query below for table t200 (the same query with just the table name changed). I use two tables in this example to show that it works both when there is one MODE and when there are multiple MODES.
SELECT GROUP_CONCAT(value) as modes,occurs
FROM
(SELECT value,occurs FROM
(SELECT value,count(*) as occurs
FROM
T200
GROUP BY value)T1,
(SELECT max(occurs) as maxoccurs FROM
(SELECT value,count(*) as occurs
FROM
T200
GROUP BY value)T2
)T3
WHERE T1.occurs = T3.maxoccurs)T4
GROUP BY occurs;
My question is "Is there a simpler way?"
I was thinking of using something like HAVING count(*) = max(count(*)) to get rid of the extra join, but I couldn't get HAVING to return the result I wanted.
UPDATED:
As suggested by @zneak, I can simplify T3 as follows:
SELECT GROUP_CONCAT(value) as modes,occurs
FROM
(SELECT value,occurs FROM
(SELECT value,count(*) as occurs
FROM
T200
GROUP BY value)T1,
(SELECT count(*) as maxoccurs
FROM
T200
GROUP BY value
ORDER BY count(*) DESC
LIMIT 1
)T3
WHERE T1.occurs = T3.maxoccurs)T4
GROUP BY occurs;
Now, is there a way to get rid of T3 altogether?
I tried this, but for some reason it returns no rows:
SELECT value,occurs FROM
(SELECT value,count(*) as occurs
FROM t200
GROUP BY `value`)T1
HAVING occurs=max(occurs)
Basically, I am wondering whether there's a way to do it so that I only need to specify t100 or t200 once.
UPDATED: I found a way to specify t100 or t200 only once by using a variable to track my own maxoccurs, like below:
SELECT GROUP_CONCAT(CASE WHEN occurs=@maxoccurs THEN value ELSE NULL END) as modes
FROM
(SELECT value,occurs,@maxoccurs:=GREATEST(@maxoccurs,occurs) as maxoccurs
FROM (SELECT value,count(*) as occurs
FROM t200
GROUP BY `value`)T1,(SELECT @maxoccurs:=0)mo
)T2
You are very close with the last query. The following finds one mode:
SELECT value, occurs
FROM (SELECT value,count(*) as occurs
FROM t200
GROUP BY `value`
ORDER BY occurs DESC
LIMIT 1
) T1
I think your question was about multiple modes, though:
SELECT value, occurs
FROM (SELECT value, count(*) as occurs
FROM t200
GROUP BY `value`
) T1
WHERE occurs = (select max(occurs)
from (select `value`, count(*) as occurs
from t200
group by `value`
) t
);
EDIT:
This is much easier in almost any other database. MySQL versions before 8.0 support neither WITH nor window/analytic functions.
Your query (shown below) does not do what you think it is doing:
SELECT value, occurs
FROM (SELECT value, count(*) as occurs
FROM t200
GROUP BY `value`
) T1
HAVING occurs = max(occurs) ;
The final HAVING clause refers to the column occurs but also uses max(occurs). Because of max(occurs), this is an aggregation query that returns one row, summarizing all rows from the subquery.
The column occurs is not used for grouping. So, what value does MySQL use? It uses an arbitrary value from one of the rows in the subquery. This arbitrary value might match, or it might not. But the value comes from only one row; there is no iteration over the rows.
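MySQL 8.0 added both WITH and window functions, so on 8.0 or later (an assumption about your server) the multiple-modes query can name the table only once: RANK() gives tied counts the same rank, so rank 1 holds every mode. A sketch using a hypothetical table modes_demo holding t200's values:

```sql
-- Assumes MySQL 8.0+. modes_demo is a hypothetical copy of t200's values.
DROP TABLE IF EXISTS modes_demo;
CREATE TABLE modes_demo (value INT);
INSERT INTO modes_demo (value) VALUES
  (1), (2), (2), (2), (3), (3), (4), (4), (4);

WITH counts AS (
  -- How often each value occurs; the table is named only here.
  SELECT value, COUNT(*) AS occurs
  FROM modes_demo
  GROUP BY value
)
SELECT value, occurs
FROM (SELECT value, occurs,
             -- Tied counts share a rank, so rank 1 holds every mode.
             RANK() OVER (ORDER BY occurs DESC) AS rnk
      FROM counts) ranked
WHERE rnk = 1
ORDER BY value;
```

With t200's data this returns the values 2 and 4, each with occurs = 3; wrapping the outer query in GROUP_CONCAT(value) ... GROUP BY occurs gives the comma-separated form from the question.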
I realize this is a very old question, but while looking for the best way to find the MODE in a MySQL table, I came up with this:
SELECT [column name], count(*) as [ccount] FROM [table] WHERE [field] = [item] GROUP BY [column name] ORDER BY [ccount] DESC LIMIT 1 ;
In my actual situation, I had a log of recorded events. I wanted to know during which period (1, 2 or 3, as recorded in my log) a specific event occurred the most times (i.e., the MODE of the "period" column for that specific event).
My table looked like this (abridged):
EVENT_TYPE | PERIOD
-------------------------
1 | 3
1 | 3
1 | 3
1 | 2
2 | 1
2 | 1
2 | 1
2 | 3
Using the query:
SELECT event_type, period, count(*) as pcount FROM proto_log WHERE event_type = 1 GROUP BY period ORDER BY pcount DESC LIMIT 1 ;
I get the result:
EVENT_TYPE | PERIOD | PCOUNT
--------------------------------------
1 | 3 | 3
In this result, the period column ($result['period'], for example) contains the MODE for that query, and pcount contains the actual count.
If you wanted to get multiple modes, I suppose you could keep adding other criteria to your WHERE clause using ORs (grouping by event_type as well as period, so that each event is counted separately):
SELECT event_type, period, count(*) as pcount FROM proto_log WHERE event_type = 1 OR event_type = 2 GROUP BY event_type, period ORDER BY pcount DESC LIMIT 2 ;
The ORs bring in the additional events, and increasing the LIMIT adds their MODES to the results (otherwise only the top result is shown).
Results:
EVENT_TYPE | PERIOD | PCOUNT
--------------------------------------
1 | 3 | 3
2 | 1 | 3
I am not 100% sure this is doing exactly what I think it is doing, or if it will work in all situations, so please let me know if I am on or off track here.
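The LIMIT trick only works when each event's mode happens to be among the top rows overall. A more general sketch, assuming MySQL 8.0+, ranks the periods separately within each event_type with RANK(), so every event keeps its own mode(s), ties included. proto_log_demo is a hypothetical table holding the abridged log above.

```sql
-- Assumes MySQL 8.0+. proto_log_demo is hypothetical (the abridged log).
DROP TABLE IF EXISTS proto_log_demo;
CREATE TABLE proto_log_demo (event_type INT NOT NULL, period INT NOT NULL);
INSERT INTO proto_log_demo (event_type, period) VALUES
  (1, 3), (1, 3), (1, 3), (1, 2),
  (2, 1), (2, 1), (2, 1), (2, 3);

-- Count each (event_type, period) pair, then rank the periods within
-- each event by count; rnk = 1 keeps every mode of every event.
SELECT event_type, period, pcount
FROM (SELECT event_type, period, pcount,
             RANK() OVER (PARTITION BY event_type
                          ORDER BY pcount DESC) AS rnk
      FROM (SELECT event_type, period, COUNT(*) AS pcount
            FROM proto_log_demo
            GROUP BY event_type, period) counts) ranked
WHERE rnk = 1
ORDER BY event_type;
```

With that data it returns (1, 3, 3) and (2, 1, 3), the same rows as the Results above, without depending on LIMIT.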
I have a table ("lms_attendance") of users' check-in and out times that looks like this:
id user time io (enum)
1 9 1370931202 out
2 9 1370931664 out
3 6 1370932128 out
4 12 1370932128 out
5 12 1370933037 in
I'm trying to create a view of this table that would output only the most recent record per user id, while giving me the "in" or "out" value, so something like:
id user time io
2 9 1370931664 out
3 6 1370932128 out
5 12 1370933037 in
I'm pretty close so far, but I realized that views won't accept subqueries, which makes it a lot harder. The closest query I got was:
select
`lms_attendance`.`id` AS `id`,
`lms_attendance`.`user` AS `user`,
max(`lms_attendance`.`time`) AS `time`,
`lms_attendance`.`io` AS `io`
from `lms_attendance`
group by
`lms_attendance`.`user`,
`lms_attendance`.`io`
But what I get is:
id user time io
3 6 1370932128 out
1 9 1370931664 out
5 12 1370933037 in
4 12 1370932128 out
Which is close, but not perfect. I know that last GROUP BY column shouldn't be there, but without it, the query returns the most recent time but not its corresponding io value.
Any ideas?
Thanks!
Query:
SELECT t1.*
FROM lms_attendance t1
WHERE t1.time = (SELECT MAX(t2.time)
FROM lms_attendance t2
WHERE t2.user = t1.user)
Result:
| ID | USER | TIME | IO |
--------------------------------
| 2 | 9 | 1370931664 | out |
| 3 | 6 | 1370932128 | out |
| 5 | 12 | 1370933037 | in |
Note that if a user has multiple records with the same "maximum" time, the query above will return more than one record. If you only want 1 record per user, use the query below:
SELECT t1.*
FROM lms_attendance t1
WHERE t1.id = (SELECT t2.id
FROM lms_attendance t2
WHERE t2.user = t1.user
ORDER BY t2.time DESC, t2.id DESC
LIMIT 1)
There is no need to reinvent the wheel, as this is the common greatest-n-per-group problem, for which a very nice solution has been presented.
I prefer the simplest solution (an updated version of Justin's SQL Fiddle), which uses no subqueries and is thus easy to use in views:
SELECT t1.*
FROM lms_attendance AS t1
LEFT OUTER JOIN lms_attendance AS t2
ON t1.user = t2.user
AND (t1.time < t2.time
OR (t1.time = t2.time AND t1.Id < t2.Id))
WHERE t2.user IS NULL
This also works when there are two different records with the same greatest value within the same group, thanks to the trick with (t1.time = t2.time AND t1.Id < t2.Id). All it does is ensure that, when two records of the same user have the same time, only one is chosen. It doesn't actually matter whether the tie-breaker is Id or something else; any column that is guaranteed to be unique will do the job here.
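The tie-breaking can be checked with a small sketch. The table attendance_demo is hypothetical: it holds the question's rows plus an assumed sixth row that gives user 12 two records with the same time. Because of the t1.id < t2.id condition, only the record with the higher id (6) survives.

```sql
-- attendance_demo is hypothetical: the question's rows plus one tie row.
DROP TABLE IF EXISTS attendance_demo;
CREATE TABLE attendance_demo (id INT PRIMARY KEY, user INT, time INT,
                              io VARCHAR(3));
INSERT INTO attendance_demo (id, user, time, io) VALUES
  (1, 9, 1370931202, 'out'),
  (2, 9, 1370931664, 'out'),
  (3, 6, 1370932128, 'out'),
  (4, 12, 1370932128, 'out'),
  (5, 12, 1370933037, 'in'),
  (6, 12, 1370933037, 'out');  -- assumed tie with id 5

-- A row survives only if no other row of the same user is later,
-- or equally late with a higher id.
SELECT t1.*
FROM attendance_demo AS t1
LEFT OUTER JOIN attendance_demo AS t2
  ON t1.user = t2.user
 AND (t1.time < t2.time
      OR (t1.time = t2.time AND t1.id < t2.id))
WHERE t2.user IS NULL
ORDER BY t1.user;
```

Expected: ids 3, 2 and 6 for users 6, 9 and 12. Dropping the OR branch would return both id 5 and id 6 for user 12.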
Based on @TMS's answer: I like it because there's no need for subqueries, but I think omitting the 'OR' part is sufficient and much simpler to understand and read (the trade-off is that a user with two records at the same latest time gets both rows back).
SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
ON t1.user = t2.user
AND t1.time < t2.time
WHERE t2.user IS NULL
If you are not interested in rows with NULL times, you can filter them out in the WHERE clause:
SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
ON t1.user = t2.user
AND t1.time < t2.time
WHERE t2.user IS NULL and t1.time IS NOT NULL
Already solved, but just for the record, another approach would be to create two views...
CREATE TABLE lms_attendance
(id int, user int, time int, io varchar(3));
CREATE VIEW latest_all AS
SELECT la.user, max(la.time) time
FROM lms_attendance la
GROUP BY la.user;
CREATE VIEW latest_io AS
SELECT la.*
FROM lms_attendance la
JOIN latest_all lall
ON lall.user = la.user
AND lall.time = la.time;
INSERT INTO lms_attendance
VALUES
(1, 9, 1370931202, 'out'),
(2, 9, 1370931664, 'out'),
(3, 6, 1370932128, 'out'),
(4, 12, 1370932128, 'out'),
(5, 12, 1370933037, 'in');
SELECT * FROM latest_io;
If you're on MySQL 8.0 or higher, you can use window functions:
Query:
SELECT DISTINCT
FIRST_VALUE(ID) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS ID,
FIRST_VALUE(USER) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS USER,
FIRST_VALUE(TIME) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS TIME,
FIRST_VALUE(IO) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS IO
FROM lms_attendance;
Result:
| ID | USER | TIME | IO |
--------------------------------
| 2 | 9 | 1370931664 | out |
| 3 | 6 | 1370932128 | out |
| 5 | 12 | 1370933037 | in |
The advantage I see over using the solution proposed by Justin is that it enables you to select the row with the most recent data per user (or per id, or per whatever) even from subqueries without the need for an intermediate view or table.
And in case you're running HANA, it is also ~7 times faster :D
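On MySQL 8.0+ the four FIRST_VALUE() calls and the DISTINCT can also be replaced by a single ROW_NUMBER(), which picks exactly one row per user even when two rows share the latest time. A sketch against a hypothetical copy of the question's data (attendance_demo); the id DESC tie-breaker is an assumption.

```sql
-- Assumes MySQL 8.0+. attendance_demo is a hypothetical copy of
-- lms_attendance with the question's five rows.
DROP TABLE IF EXISTS attendance_demo;
CREATE TABLE attendance_demo (id INT PRIMARY KEY, user INT, time INT,
                              io VARCHAR(3));
INSERT INTO attendance_demo (id, user, time, io) VALUES
  (1, 9, 1370931202, 'out'),
  (2, 9, 1370931664, 'out'),
  (3, 6, 1370932128, 'out'),
  (4, 12, 1370932128, 'out'),
  (5, 12, 1370933037, 'in');

-- One window instead of four: rn = 1 is each user's latest row.
SELECT id, user, time, io
FROM (SELECT id, user, time, io,
             ROW_NUMBER() OVER (PARTITION BY user
                                ORDER BY time DESC, id DESC) AS rn
      FROM attendance_demo) latest
WHERE rn = 1
ORDER BY user;
```

This should return ids 3, 2 and 5. Since MySQL 8.0 allows a derived table in a view's FROM clause, the query can also be wrapped in CREATE VIEW.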
OK, this might be either a hack or error-prone, but somehow this works as well:
SELECT id, MAX(user) as user, MAX(time) as time, MAX(io) as io FROM lms_attendance GROUP BY id;
select b.* from
(select
`lms_attendance`.`user` AS `user`,
max(`lms_attendance`.`time`) AS `time`
from `lms_attendance`
group by
`lms_attendance`.`user`) a
join
(select *
from `lms_attendance` ) b
on a.user = b.user
and a.time = b.time
I tried one solution that works for me:
SELECT user, MAX(TIME) as time
FROM lms_attendance
GROUP by user
HAVING MAX(time)
I have a very large table, and all of the other suggestions here were taking a very long time to execute. I came up with this hacky method, which was much faster. The downside is that if a user has more than one row with the maximum dtime, it returns all of them.
SELECT * FROM mb_web.devices_log WHERE CONCAT(dtime, '-', user_id) in (
SELECT concat(max(dtime), '-', user_id) FROM mb_web.devices_log GROUP BY user_id
)
select result from (
select vorsteuerid as result, count(*) as anzahl from kreditorenrechnung where kundeid = 7148
group by vorsteuerid
) a order by anzahl desc limit 0,1
I have done the same thing like this:
SELECT t1.*
FROM lms_attendance t1
WHERE t1.id in (SELECT max(t2.id) as id
FROM lms_attendance t2
group BY t2.user)
This will also reduce memory utilization.
Thanks.
Possibly you can GROUP BY user and then ORDER BY time DESC, something like this:
SELECT * FROM lms_attendance group by user order by time desc;
Try this query:
select id,user, max(time), io
FROM lms_attendance group by user;
This worked for me:
SELECT user, time FROM
(
SELECT user, time FROM lms_attendance -- optional WHERE clause here
) AS T
WHERE (SELECT COUNT(0) FROM lms_attendance WHERE user = T.user AND time > T.time) = 0
ORDER BY user ASC, time DESC
I have a table tbl_patient, and I want to fetch the last 2 visits of each patient in order to compare whether the patient's condition is improving or degrading.
tbl_patient
id | patient_ID | visit_ID | patient_result
1 | 1 | 1 | 5
2 | 2 | 1 | 6
3 | 2 | 3 | 7
4 | 1 | 2 | 3
5 | 2 | 3 | 2
6 | 1 | 3 | 9
I tried the query below to fetch the last visit of each patient:
SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`
Now I want to fetch the second-to-last visit of each patient, but this query gives me an error
(#1242 - Subquery returns more than 1 row)
SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`
Where am I going wrong?
select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
from tbl_patient
group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id
id1 is the ID of the last visit, and id2 is the ID of the second-to-last visit.
Your first query doesn't get the last visits, since it gives results 5 and 6 instead of 2 and 9.
You can try this query:
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
where id not in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
GROUP BY patient_ID)
order by 1,2
SELECT id, patient_result FROM `tbl_patient` t1
JOIN (SELECT MAX(id) as max, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2
ON t1.patient_ID = t2.patient_ID
WHERE id <max GROUP BY t1.`patient_ID`
There are a couple of approaches to getting the specified resultset returned in a single SQL statement.
Unfortunately, most of those approaches yield rather unwieldy statements.
The more elegant-looking statements tend to come with poor (or unbearable) performance when dealing with large sets, and the statements with better performance tend to look less elegant.
Three of the most common approaches make use of:
correlated subquery
inequality join (nearly a Cartesian product)
two passes over the data
Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic ROW_NUMBER() OVER(PARTITION ...) function available in other DBMS (and in MySQL 8.0 and later):
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM (
SELECT p.id
, p.patient_id
, p.visit_id
, p.patient_result
, @rn := if(@prev_patient_id = patient_id, @rn + 1, 1) AS rn
, @prev_patient_id := patient_id AS prev_patient_id
FROM tbl_patient p
JOIN (SELECT @rn := 0, @prev_patient_id := NULL) i
ORDER BY p.patient_id DESC, p.id DESC
) t
WHERE t.rn <= 2
Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived table". Then, the outer query will run against the derived table. So, this is essentially two passes over the data.
This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.
This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.
Note also that this query will return a row for a patient_id even if only one id value exists for that patient; it does not restrict the return to just those patients that have at least two rows.
It's also possible to get an equivalent resultset with a correlated subquery:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
WHERE ( SELECT COUNT(1) AS cnt
FROM tbl_patient p
WHERE p.patient_id = t.patient_id
AND p.id >= t.id
) <= 2
ORDER BY t.patient_id ASC, t.id ASC
Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.
As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
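FROM tbl_patient t
LEFT
JOIN tbl_patient p
ON p.patient_id = t.patient_id
AND t.id < p.id
GROUP
BY t.id
, t.patient_id
, t.visit_id
, t.patient_result
-- keep rows with fewer than 2 later visits; COUNT(p.id) ignores the NULL row the LEFT JOIN produces for the latest visit
HAVING COUNT(p.id) < 2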
Note that this will create a nearly Cartesian product for each patient. For a limited number of id values per patient, this won't be too bad. But if a patient has hundreds of id values, the intermediate result can be huge, on the order of O(n^2).
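On MySQL 8.0 and later, the user-variable emulation in the first query of this answer can be replaced by ROW_NUMBER() itself, with no dependence on evaluation order. A sketch, assuming MySQL 8.0+ and a hypothetical table patient_demo holding the question's sample rows; like that query, it numbers each patient's rows by descending id and keeps numbers 1 and 2.

```sql
-- Assumes MySQL 8.0+. patient_demo is a hypothetical copy of tbl_patient.
DROP TABLE IF EXISTS patient_demo;
CREATE TABLE patient_demo (id INT PRIMARY KEY, patient_ID INT NOT NULL,
                           visit_ID INT NOT NULL, patient_result INT NOT NULL);
INSERT INTO patient_demo (id, patient_ID, visit_ID, patient_result) VALUES
  (1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7),
  (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9);

-- rn = 1 is each patient's latest row (highest id), rn = 2 the one before.
SELECT id, patient_ID, visit_ID, patient_result, rn
FROM (SELECT id, patient_ID, visit_ID, patient_result,
             ROW_NUMBER() OVER (PARTITION BY patient_ID
                                ORDER BY id DESC) AS rn
      FROM patient_demo) t
WHERE rn <= 2
ORDER BY patient_ID, rn;
```

With the sample data this returns ids 6 and 4 for patient 1 and ids 5 and 3 for patient 2. LEAD(patient_result) over the same window could instead put the last and second-to-last results on one row for comparison.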
Try this:
SELECT id, patient_result FROM tbl_patient AS tp WHERE id < ((SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID) - 1) GROUP BY patient_ID
Why not simply use...
GROUP BY `patient_ID` DESC LIMIT 2
... and do the rest in the next step?
I'm working on a simple time tracking app.
I've created a table that logs the IN and OUT times of employees.
Here is an example of how my data currently looks:
E_ID | In_Out | Date_Time
------------------------------------
3 | I | 2012-08-19 15:41:52
3 | O | 2012-08-19 17:30:22
1 | I | 2012-08-19 18:51:11
3 | I | 2012-08-19 18:55:52
1 | O | 2012-08-19 20:41:52
3 | O | 2012-08-19 21:50:30
I'm trying to create a query that will pair the IN and OUT times of an employee into one row like this:
E_ID | In_Time | Out_Time
------------------------------------------------
3 | 2012-08-19 15:41:52 | 2012-08-19 17:30:22
3 | 2012-08-19 18:55:52 | 2012-08-19 21:50:30
1 | 2012-08-19 18:51:11 | 2012-08-19 20:41:52
I hope I'm being clear in what I'm trying to achieve here.
Basically, I want to generate a report that has both the in and out times merged into one row.
Any help with this would be greatly appreciated.
Thanks in advance.
There are three basic approaches I can think of.
One approach makes use of MySQL user variables, one approach uses a theta JOIN, another uses a subquery in the SELECT list.
theta-JOIN
One approach is to use a theta-JOIN. This approach is a generic SQL approach (no MySQL specific syntax), which can work with multiple RDBMS.
N.B. With a large number of rows, this approach can create a significantly large intermediate result set, which can lead to problematic performance.
SELECT o.e_id, MAX(i.date_time) AS in_time, o.date_time AS out_time
FROM e `o`
LEFT
JOIN e `i` ON i.e_id = o.e_id AND i.date_time < o.date_time AND i.in_out = 'I'
WHERE o.in_out = 'O'
GROUP BY o.e_id, o.date_time
ORDER BY o.date_time
What this does is match every 'O' row for an employee with every 'I' row that is earlier, and then we use the MAX aggregate to pick out the 'I' record with the closest date time.
This works for perfectly paired data, but could produce odd results for imperfect pairs (e.g. two consecutive 'O' records with no intermediate 'I' row will both get matched to the same 'I' row).
correlated subquery in SELECT list
Another approach is to use a correlated subquery in the SELECT list. This can have sub-optimal performance, but is sometimes workable (and is occasionally the fastest way to return the specified result set... this approach works best when we have a limited number of rows returned in the outer query.)
SELECT o.e_id
, (SELECT MAX(i.date_time)
FROM e `i`
WHERE i.in_out = 'I'
AND i.e_id = o.e_id
AND i.date_time < o.date_time
) AS in_time
, o.date_time AS out_time
FROM e `o`
WHERE o.in_out = 'O'
ORDER BY o.date_time
User variables
Another approach is to make use of MySQL user variables. (This is a MySQL-specific approach, and is a workaround to the "missing" analytic functions.)
What this query does is order all of the rows by e_id, then by date_time, so we can process them in order. Whenever we encounter an 'O' (out) row, we use the date_time value from the immediately preceding 'I' row as the in_time.
N.B.: This usage of MySQL user variables is dependent on MySQL performing operations in a specific order, a predictable plan. The use of the inline views (or "derived tables", in MySQL parlance) gets us a predictable execution plan. But this behavior is subject to change in future releases of MySQL.
SELECT c.e_id
, CAST(c.in_time AS DATETIME) AS in_time
, c.out_time
FROM (
SELECT IF(@prev_e_id = d.e_id,@in_time,@in_time:=NULL) AS reset_in_time
, @in_time := IF(d.in_out = 'I',d.date_time,@in_time) AS in_time
, IF(d.in_out = 'O',d.date_time,NULL) AS out_time
, @prev_e_id := d.e_id AS e_id
FROM (
SELECT e_id, date_time, in_out
FROM e
JOIN (SELECT @prev_e_id := NULL, @in_time := NULL) f
ORDER BY e_id, date_time, in_out
) d
) c
WHERE c.out_time IS NOT NULL
ORDER BY c.out_time
This works for the set of data you have, but it needs more thorough testing and tweaking to ensure you get the result set you want with quirky data, when the rows are not perfectly paired (e.g. two 'O' rows with no 'I' row between them, an 'I' row with no subsequent 'O' row, etc.)
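On MySQL 8.0+ the "immediately preceding row" logic of the user-variable query can be written with LAG(). A sketch, assuming a hypothetical table e_demo with the question's six rows: each 'O' row is paired with the previous row of the same employee, and kept only when that previous row is an 'I'.

```sql
-- Assumes MySQL 8.0+ (LAG is a window function). e_demo is hypothetical.
DROP TABLE IF EXISTS e_demo;
CREATE TABLE e_demo (e_id INT NOT NULL, in_out CHAR(1) NOT NULL,
                     date_time DATETIME NOT NULL);
INSERT INTO e_demo (e_id, in_out, date_time) VALUES
  (3, 'I', '2012-08-19 15:41:52'),
  (3, 'O', '2012-08-19 17:30:22'),
  (1, 'I', '2012-08-19 18:51:11'),
  (3, 'I', '2012-08-19 18:55:52'),
  (1, 'O', '2012-08-19 20:41:52'),
  (3, 'O', '2012-08-19 21:50:30');

-- For every row, look at the previous row of the same employee.
SELECT e_id, prev_time AS in_time, date_time AS out_time
FROM (SELECT e_id, in_out, date_time,
             LAG(in_out)    OVER (PARTITION BY e_id ORDER BY date_time) AS prev_in_out,
             LAG(date_time) OVER (PARTITION BY e_id ORDER BY date_time) AS prev_time
      FROM e_demo) x
-- Keep only 'O' rows whose previous row is an 'I'.
WHERE in_out = 'O' AND prev_in_out = 'I'
ORDER BY out_time;
```

An 'O' with no preceding 'I' (for example the second of two consecutive 'O' rows) is dropped rather than paired again, which differs from the theta-JOIN above.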
Unfortunately, MySQL (before 8.0) doesn't have a ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) function like SQL Server, or this would be incredibly easy.
But, there is a way to do this in MySQL:
set @num := 0, @in_out := '';
select emp_in.id,
emp_in.in_time,
emp_out.out_time
from
(
select id, in_out, date_time in_time,
@num := if(@in_out = in_out, @num + 1, 1) as row_number,
@in_out := in_out as dummy
from mytable
where in_out = 'I'
order by date_time, id
) emp_in
join
(
select id, in_out, date_time out_time,
@num := if(@in_out = in_out, @num + 1, 1) as row_number,
@in_out := in_out as dummy
from mytable
where in_out = 'O'
order by date_time, id
) emp_out
on emp_in.id = emp_out.id
and emp_in.row_number = emp_out.row_number
order by emp_in.id, emp_in.in_time
Basically, this creates two subqueries, each of which generates a row_number for every record: one subquery is for in_time and the other is for out_time.
Then you JOIN the two subqueries together on the employee id and the row_number.
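With MySQL 8.0+ window functions, the row numbers can be generated per employee and per direction with ROW_NUMBER() OVER (PARTITION BY e_id, in_out ...), so the n-th 'I' of an employee is joined to that same employee's n-th 'O'. A sketch under the same assumption of perfectly alternating data, using a hypothetical table e_demo with the question's rows.

```sql
-- Assumes MySQL 8.0+ (WITH and ROW_NUMBER). e_demo is hypothetical and,
-- like the answer above, assumes every 'I' is followed by exactly one 'O'.
DROP TABLE IF EXISTS e_demo;
CREATE TABLE e_demo (e_id INT NOT NULL, in_out CHAR(1) NOT NULL,
                     date_time DATETIME NOT NULL);
INSERT INTO e_demo (e_id, in_out, date_time) VALUES
  (3, 'I', '2012-08-19 15:41:52'),
  (3, 'O', '2012-08-19 17:30:22'),
  (1, 'I', '2012-08-19 18:51:11'),
  (3, 'I', '2012-08-19 18:55:52'),
  (1, 'O', '2012-08-19 20:41:52'),
  (3, 'O', '2012-08-19 21:50:30');

WITH numbered AS (
  -- The n-th 'I' and the n-th 'O' of each employee get the same rn.
  SELECT e_id, in_out, date_time,
         ROW_NUMBER() OVER (PARTITION BY e_id, in_out
                            ORDER BY date_time) AS rn
  FROM e_demo
)
SELECT i.e_id, i.date_time AS in_time, o.date_time AS out_time
FROM numbered i
JOIN numbered o
  ON o.e_id = i.e_id AND o.rn = i.rn AND o.in_out = 'O'
WHERE i.in_out = 'I'
ORDER BY i.e_id, i.date_time;
```

This should return one row for employee 1 and two rows for employee 3, matching the desired output in the question.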