Grouping query while providing extra selects - mysql

I have the following query
select t1.file_name,
max(create_date),
t1.id
from dbo.translation_style_sheet AS t1
GROUP BY t1.file_name
I'd like to add the id to the select, but everytime I do this, it returns all results rather than my grouped results
my goal is to return
pdf | 10/1/2012 | 3
doc | 10/2/2012 | 5
however my query is returning
pdf | 9/30/2012 | 1
pdf | 9/31/2012 | 2
pdf | 10/1/2012 | 3
doc | 10/1/2012 | 4
doc | 10/2/2012 | 5
Anybody know what I'm doing wrong?

If you are guaranteed that the row with the max(create_date) also has the largest id value (your sample data does but I don't know if that is an artifact), you can simply select the max(id)
select t1.file_name, max(create_date), max(t1.id)
from dbo.translation_style_sheet AS t1
GROUP BY t1.file_name
If you are not guaranteed that, and you want the id value that is from the row that has the max(create_date) to be returned, you can use analytic functions. The following will work in Oracle though it is unlikely that less sophisticated databases like Derby would support it.
select t1.file_name, t1.create_date, t1.id
from (select t2.file_name,
t2.create_date,
t2.id,
rank() over (partition by t2.file_name
order by t2.create_date desc) rnk
from dbo.translation_style_sheet t2) t1
where rnk = 1
You could also use a subquery. This is likely to be more portable but less efficient than the analytic function approach
select t1.file_name, t1.create_date, t1.id
from dbo.translation_style_sheet t1
where (t1.file_name, t1.create_date) in (select t2.file_name, max(t2.create_date)
from dbo.translation_style_sheet t2
group by t2.file_name);
or
select t1.file_name, t1.create_date, t1.id
from dbo.translation_style_sheet t1
where t1.create_date = (select max(t2.create_date)
from dbo.translation_style_sheet t2
where t1.file_name = t2.file_name);

When you are using GROUP BY, you have to include all fields in the group that aren't being used by the aggregate function.
select t1.file_name, max(create_date), t1.id from
dbo.translation_style_sheet AS t1 GROUP BY t1.file_name, t1.id

Related

Sort records based on string

Please consider the table below
Id F1 F2
---------------------------
1 Nima a
2 Eli a
3 Arian a
4 Ava b
5 Arsha b
6 Rozhan c
7 Zhina c
I want to display records by sorting COLUMN F2 to display one record from each string category (a,b,c) in order
Id F1 F2
---------------------------
1 Nima a
5 Arsha b
6 Rozhan c
2 Eli a
4 Ava b
7 Zhina c
3 Arian a
NOTE: a,b,c could be anything... it should take one record from one entry and then 2nd from 2nd entry.
I have used join, or group by records but no success.
MySQL version 5.7 – Syed Saqlain
SELECT id, f1, f2
FROM ( SELECT t1.id, t1.f1, t1.f2, COUNT(*) cnt
FROM test t1
JOIN test t2 ON t1.f2 = t2.f2 AND t1.id >= t2.id
GROUP BY t1.id, t1.f1, t1.f2 ) t3
ORDER BY cnt, f2;
https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=8138bd9ab5be36ba534a258d20b2e555
ROW_NUMBER() alternative for lower version of MYSQL. This query will work for version 5.5, 5.6 & 5.7.
-- MySQL (v5.7)
SELECT t.id, t.f1, t.f2
FROM (SELECT #row_no:=CASE WHEN #db_names=d.f2 THEN #row_no+1 ELSE 1 END AS row_number
, #db_names:= d.f2
, d.f2
, d.f1
, d.id
FROM test d,
(SELECT #row_no := 0,#db_names:='') x
ORDER BY d.f2) t
ORDER BY t.row_number, t.f2
Please check from url https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=02dbb0086a6dd7c926d55a690bffbd06
You can use window functions in the order by:
select t.*
from t
order by row_number() over (partition by f2 order by id),
f2;
The row_number() function (as used above) assigns a sequential number starting with 1 to each value of f2.
In older versions of MySQL, you can use a correlated subquery instead:
order by (select count(*) from t t2 where t2.f2 = t.f2 and t2.id <= t.id),
f2;
For performance, you want an index on (f2, id).

Select row with most recent date per user

I have a table ("lms_attendance") of users' check-in and out times that looks like this:
id user time io (enum)
1 9 1370931202 out
2 9 1370931664 out
3 6 1370932128 out
4 12 1370932128 out
5 12 1370933037 in
I'm trying to create a view of this table that would output only the most recent record per user id, while giving me the "in" or "out" value, so something like:
id user time io
2 9 1370931664 out
3 6 1370932128 out
5 12 1370933037 in
I'm pretty close so far, but I realized that views won't accept subquerys, which is making it a lot harder. The closest query I got was :
select
`lms_attendance`.`id` AS `id`,
`lms_attendance`.`user` AS `user`,
max(`lms_attendance`.`time`) AS `time`,
`lms_attendance`.`io` AS `io`
from `lms_attendance`
group by
`lms_attendance`.`user`,
`lms_attendance`.`io`
But what I get is :
id user time io
3 6 1370932128 out
1 9 1370931664 out
5 12 1370933037 in
4 12 1370932128 out
Which is close, but not perfect. I know that last group by shouldn't be there, but without it, it returns the most recent time, but not with it's relative IO value.
Any ideas?
Thanks!
Query:
SQLFIDDLEExample
SELECT t1.*
FROM lms_attendance t1
WHERE t1.time = (SELECT MAX(t2.time)
FROM lms_attendance t2
WHERE t2.user = t1.user)
Result:
| ID | USER | TIME | IO |
--------------------------------
| 2 | 9 | 1370931664 | out |
| 3 | 6 | 1370932128 | out |
| 5 | 12 | 1370933037 | in |
Note that if a user has multiple records with the same "maximum" time, the query above will return more than one record. If you only want 1 record per user, use the query below:
SQLFIDDLEExample
SELECT t1.*
FROM lms_attendance t1
WHERE t1.id = (SELECT t2.id
FROM lms_attendance t2
WHERE t2.user = t1.user
ORDER BY t2.id DESC
LIMIT 1)
No need to trying reinvent the wheel, as this is common greatest-n-per-group problem. Very nice solution is presented.
I prefer the most simplistic solution (see SQLFiddle, updated Justin's) without subqueries (thus easy to use in views):
SELECT t1.*
FROM lms_attendance AS t1
LEFT OUTER JOIN lms_attendance AS t2
ON t1.user = t2.user
AND (t1.time < t2.time
OR (t1.time = t2.time AND t1.Id < t2.Id))
WHERE t2.user IS NULL
This also works in a case where there are two different records with the same greatest value within the same group - thanks to the trick with (t1.time = t2.time AND t1.Id < t2.Id). All I am doing here is to assure that in case when two records of the same user have same time only one is chosen. Doesn't actually matter if the criteria is Id or something else - basically any criteria that is guaranteed to be unique would make the job here.
Based in #TMS answer, I like it because there's no need for subqueries but I think ommiting the 'OR' part will be sufficient and much simpler to understand and read.
SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
ON t1.user = t2.user
AND t1.time < t2.time
WHERE t2.user IS NULL
if you are not interested in rows with null times you can filter them in the WHERE clause:
SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
ON t1.user = t2.user
AND t1.time < t2.time
WHERE t2.user IS NULL and t1.time IS NOT NULL
Already solved, but just for the record, another approach would be to create two views...
CREATE TABLE lms_attendance
(id int, user int, time int, io varchar(3));
CREATE VIEW latest_all AS
SELECT la.user, max(la.time) time
FROM lms_attendance la
GROUP BY la.user;
CREATE VIEW latest_io AS
SELECT la.*
FROM lms_attendance la
JOIN latest_all lall
ON lall.user = la.user
AND lall.time = la.time;
INSERT INTO lms_attendance
VALUES
(1, 9, 1370931202, 'out'),
(2, 9, 1370931664, 'out'),
(3, 6, 1370932128, 'out'),
(4, 12, 1370932128, 'out'),
(5, 12, 1370933037, 'in');
SELECT * FROM latest_io;
Click here to see it in action at SQL Fiddle
If your on MySQL 8.0 or higher you can use Window functions:
Query:
DBFiddleExample
SELECT DISTINCT
FIRST_VALUE(ID) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS ID,
FIRST_VALUE(USER) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS USER,
FIRST_VALUE(TIME) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS TIME,
FIRST_VALUE(IO) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS IO
FROM lms_attendance;
Result:
| ID | USER | TIME | IO |
--------------------------------
| 2 | 9 | 1370931664 | out |
| 3 | 6 | 1370932128 | out |
| 5 | 12 | 1370933037 | in |
The advantage I see over using the solution proposed by Justin is that it enables you to select the row with the most recent data per user (or per id, or per whatever) even from subqueries without the need for an intermediate view or table.
And in case your running a HANA it is also ~7 times faster :D
Ok, this might be either a hack or error-prone, but somehow this is working as well-
SELECT id, MAX(user) as user, MAX(time) as time, MAX(io) as io FROM lms_attendance GROUP BY id;
select b.* from
(select
`lms_attendance`.`user` AS `user`,
max(`lms_attendance`.`time`) AS `time`
from `lms_attendance`
group by
`lms_attendance`.`user`) a
join
(select *
from `lms_attendance` ) b
on a.user = b.user
and a.time = b.time
I have tried one solution which works for me
SELECT user, MAX(TIME) as time
FROM lms_attendance
GROUP by user
HAVING MAX(time)
I have a very large table and all of the other suggestions here were taking a very long time to execute. I came up with this hacky method that was much faster. The downside is, if the max(date) row has a duplicate date for that user, it will return both of them.
SELECT * FROM mb_web.devices_log WHERE CONCAT(dtime, '-', user_id) in (
SELECT concat(max(dtime), '-', user_id) FROM mb_web.devices_log GROUP BY user_id
)
select result from (
select vorsteuerid as result, count(*) as anzahl from kreditorenrechnung where kundeid = 7148
group by vorsteuerid
) a order by anzahl desc limit 0,1
I have done same thing like below
SELECT t1.*
FROM lms_attendance t1
WHERE t1.id in (SELECT max(t2.id) as id
FROM lms_attendance t2
group BY t2.user)
This will also reduce memory utilization.
Thanks.
Possibly you can do group by user and then order by time desc. Something like as below
SELECT * FROM lms_attendance group by user order by time desc;
Try this query:
select id,user, max(time), io
FROM lms_attendance group by user;
This worked for me:
SELECT user, time FROM
(
SELECT user, time FROM lms_attendance --where clause
) AS T
WHERE (SELECT COUNT(0) FROM table WHERE user = T.user AND time > T.time) = 0
ORDER BY user ASC, time DESC

mysql extract rows with max values

I'm trying to figure out best way to take required rows from database.
Database table:
id user cat time
1 5 1 123
2 5 1 150
3 5 2 160
4 5 3 100
I want to take DISTINCT cat ... WHERE user=5 with MAX time value. How should I do that in efficient way?
You will want to use an aggregate function with a GROUP BY:
select user, cat, max(time) as Time
from yourtable
group by user, cat
See SQL Fiddle with Demo
If you want to include the id column, then you can use a subquery:
select t1.id,
t1.user,
t1.cat,
t1.time
from yourtable t1
inner join
(
select max(time) Time, user, cat
from yourtable
group by user, cat
) t2
on t1.time = t2.time
and t1.user = t2.user
and t1.cat = t2.cat
See SQL Fiddle with Demo. I used a subquery to be sure that the id value that is returned with each max(time) row is the correct id.

DELETE a record in relational position in MySQL?

I am trying to clean up records stored in a MySQL table. If a row contains %X%, I need to delete that row and the row immediately below it, regardless of content. E.g. (sorry if the table is insulting anyone's intelligence):
| 1 | leave alone
| 2 | Contains %X% - Delete
| 3 | This row should also be deleted
| 4 | leave alone
| 5 | Contains %X% - Delete
| 6 | This row should also be deleted
| 7 | leave alone
Is there a way to do this using only a couple of queries? Or am I going to have to execute a SELECT query first (using the %x% search parameter) then loop through those results and execute a DELETE...WHERE for each index returned + 1
This should work although its a bit clunky (might want to check the LIKE argument as it uses pattern matching (see comments)
DELETE FROM table.db
WHERE idcol IN
( SELECT idcol FROM db.table WHERE col LIKE '%X%')
OR idcolIN
( SELECTidcol+1 FROMdb.tableWHEREcol` LIKE '%X%')
Let's assume the table was named test and contained to columns named id and data.
We start with a SELECT that gives us the id of all rows that have a preceding row (highest id of all ids lower than id of our current row):
SELECT t1.id FROM test t1
JOIN test t2 ON
( t2.id, true )
=
( SELECT t3.id, t3.data LIKE '%X%' FROM test t3
WHERE t3.id < t1.id ORDER BY id DESC LIMIT 1 )
That gives us the ids 3 and 6. Their preceding rows 2 and 5 contain %X%, so that's good.
Now lets get the ids of the rows that contain %X% and combine them with the previous ones, via UNION:
(SELECT t1.id FROM test t1
JOIN test t2 ON
( t2.id, true )
=
( SELECT t3.id, t3.data LIKE '%X%' FROM test t3
WHERE t3.id < t1.id ORDER BY id DESC LIMIT 1 )
)
UNION
(
SELECT id FROM test WHERE data LIKE '%X%'
)
That gives us 3, 6, 2, 5 - nice!
Now, we can't delete from a table and select from the same table in MySQL - so lets use a temporary table, store our ids that are to be deleted in there, and then read from that temporary table to delete from our original table:
CREATE TEMPORARY TABLE deleteids (id INT);
INSERT INTO deleteids
(SELECT t1.id FROM test t1
JOIN test t2 ON
( t2.id, true )
=
( SELECT t3.id, t3.data LIKE '%X%' FROM test t3
WHERE t3.id < t1.id ORDER BY id DESC LIMIT 1 )
)
UNION
(
SELECT id FROM test WHERE data LIKE '%X%'
);
DELETE FROM test WHERE id in (SELECT * FROM deleteids);
... and we are left with the ids 1, 4 and 7 in our test table!
(And since the previous rows are selected using <, ORDER BY and LIMIT, this also works if the ids are not continuous.)
You can do it all in a single DELETE statement:
Assuming the "row immediately after" is based on the order of your INT-based ID column, you can use MySQL variables to assign row numbers which accounts for gaps in your IDs:
DELETE a FROM tbl a
JOIN (
SELECT a.id, b.id AS nextid
FROM (
SELECT a.id, a.text, #rn:=#rn+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT #rn:=1) rn_init
ORDER BY a.id
) a
LEFT JOIN (
SELECT a.id, #rn2:=#rn2+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT #rn2:=0) rn_init
ORDER BY a.id
) b ON a.rownum = b.rownum
WHERE a.text LIKE '%X%'
) b ON a.id IN (b.id, b.nextid)
SQL Fiddle Demo (added additional data for example)
What this does is it first takes your data and ranks it based on your ID column, then we do an offset LEFT JOIN on an almost identical result set except that the rank column is behind by 1. This gets the rows and their immediate "next" rows side by side so that we can pull both of their id's at the same time in the parent DELETE statement:
SELECT a.id, a.text, b.id AS nextid, b.text AS nexttext
FROM (
SELECT a.id, a.text, #rn:=#rn+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT #rn:=1) rn_init
ORDER BY a.id
) a
LEFT JOIN (
SELECT a.id, a.text, #rn2:=#rn2+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT #rn2:=0) rn_init
ORDER BY a.id
) b ON a.rownum = b.rownum
WHERE a.text LIKE '%X%'
Yields:
ID | TEXT | NEXTID | NEXTTEXT
2 | Contains %X% - Delete | 3 | This row should also be deleted
5 | Contains %X% - Delete | 6 | This row should also be deleted
257 | Contains %X% - Delete | 3434 | This row should also be deleted
4000 | Contains %X% - Delete | 4005 | Contains %X% - Delete
4005 | Contains %X% - Delete | 6000 | Contains %X% - Delete
6000 | Contains %X% - Delete | 6534 | This row should also be deleted
We then JOIN-DELETE that entire statement on the condition that it deletes rows whose IDs are either the "subselected" ID or NEXTID.
There is no reasonable way of doing this in a single query. (It may be possible, but the query you end up having to use will be unreasonably complex, and will almost certainly not be portable to other SQL engines.)
Use the SELECT-then-DELETE approach you described in your question.

MySQL - count values and merge results from multiple tables

I have two tables with one column each, containing names.
Names can have duplicates. One name can be found on every table or only in one.
I want to make an query that count duplicates, for each name in every table an list these values like this:
| name | table1 | table2 |
| john | 12 | 23 |
| mark | 2 | 5 |
| mary | | 10 |
| luke | 4 | |
I tried different strategies using UNION but no luck.
Thanks in advance!!!
SELECT DISTINCT t1.name, t1.cnt1, t2.cnt2
FROM
(SELECT name,count(name) as cnt1 FROM table1 GROUP BY name) t1
LEFT JOIN
(SELECT name,count(name) as cnt2 FROM table2 GROUP BY name) t2
ON t1.name = t2.name
UNION
SELECT DISTINCT t2.name, t1.cnt1, t2.cnt2
FROM
(SELECT name,count(name) as cnt1 FROM table1 GROUP BY name) t1
RIGHT JOIN
(SELECT name,count(name) as cnt2 FROM table2 GROUP BY name) t2
ON t1.name = t2.name
Here's a simpler solution:
You can UNION the names from the two tables together, manually differentiating their origin tables with a tbl column.
Then it's just a simple GROUP BY with conditional aggregation using the differentiating column:
SELECT a.name,
NULLIF(COUNT(CASE a.tbl WHEN 1 THEN 1 END), 0) AS table1,
NULLIF(COUNT(CASE a.tbl WHEN 2 THEN 1 END), 0) AS table2
FROM
(
SELECT name, 1 AS tbl FROM table1 UNION ALL
SELECT name, 2 FROM table2
) a
GROUP BY a.name
In accordance with your desired result-set, we NULL the count value if it turns out to be 0.
SQLFiddle Demo
SELECT SUM(res.cn), name
FROM
(
SELECT name, count(name) as cn from table1 GROUP BY name HAVING count(name) > 1
UNION ALL
SELECT name, count(name) as cn from table2 GROUP BY name HAVING count(name)>1
) as res
GROUP BY nam
e
Try the above :) I made a fiddle for you to test it:
http://sqlfiddle.com/#!3/796b2/3
It has a few double names in each table and will show you which names have doubles and then print them. The names that only appear once are not shown (acheived by the HAVING clause)
After some reading i don't think that it's posibil what i want to do. This situation ca be solved with pivot table in excel or libreoffice.
In fact this is method that i used, combined with some sql stataments to count occurence of names and export as CSV.
UNION definitetly not work. Some chance are with join, but not shure.
I found a post that discusses the same problem as mine.
MySQL - Rows to Columns