Avoiding duplicates in mysql query that uses UNION and ORDER BY - mysql

I have two tables, let's say table1 and table2, with common columns id and update_date. I want to get the ids and update_dates ordered by the latest update_date, descending. I used UNION and ORDER BY together, which returns the rows in descending order of update_date, but there are duplicate ids that I am not sure how to get rid of.
My query is like,
(select id,update_date from table1 where [condition])
UNION
(select id,update_date from table2 where [condition])
order by update_date desc;
I can get rid of the duplicate ids by wrapping it as select distinct id from (above query) as temp; but the problem is that I need the update_date too.
Can anyone suggest how to get rid of the duplicates and still get both the id and the update_date?

Assuming you want the latest update out of duplicates this one should work:
SELECT id, max(update_date) AS last_update
FROM
( (select id,update_date from table1 where [conditions])
UNION
(select id,update_date from table2 where [conditions]) ) both_tables
GROUP BY id
ORDER by last_update DESC
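To see why the plain UNION still returns duplicate ids while the GROUP BY version does not, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the sample rows are made up for illustration (the [conditions] from the question are left out).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, update_date TEXT);
    CREATE TABLE table2 (id INTEGER, update_date TEXT);
    INSERT INTO table1 VALUES (1, '2020-01-01'), (2, '2020-03-01');
    INSERT INTO table2 VALUES (1, '2020-02-01'), (3, '2020-01-15');
""")

# UNION only removes rows that are equal in every column,
# so id 1 shows up twice (once per distinct update_date).
plain_union = conn.execute("""
    SELECT id, update_date FROM table1
    UNION
    SELECT id, update_date FROM table2
    ORDER BY update_date DESC
""").fetchall()

# Grouping the combined rows by id keeps one row per id with its latest date.
latest_per_id = conn.execute("""
    SELECT id, MAX(update_date) AS last_update
    FROM (
        SELECT id, update_date FROM table1
        UNION
        SELECT id, update_date FROM table2
    ) AS both_tables
    GROUP BY id
    ORDER BY last_update DESC
""").fetchall()

print(plain_union)
print(latest_per_id)
```

With this data the first query returns four rows (id 1 twice) and the second returns three, one per id.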

Wrap the query in a DISTINCT block:
SELECT DISTINCT * FROM (
select id,update_date from table1 where [condition]
UNION
select id,update_date from table2 where [condition]
) AS combined
order by update_date desc;

Limit the second query's results:
select id, update_date
from table1
where [conditions]
union
select id, update_date
from table2
where [conditions]
and id not in (select id from table1 where [conditions])
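One thing to be aware of with this approach: when an id exists in both tables, the row from table1 wins even if table2 has the newer update_date. A small sketch of that behaviour, again using Python's sqlite3 in place of MySQL with made-up sample rows and no [conditions]:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, update_date TEXT);
    CREATE TABLE table2 (id INTEGER, update_date TEXT);
    INSERT INTO table1 VALUES (1, '2020-01-01'), (2, '2020-03-01');
    INSERT INTO table2 VALUES (1, '2020-02-01'), (3, '2020-01-15');
""")

# Rows from table2 are only taken for ids that table1 does not have.
rows = conn.execute("""
    SELECT id, update_date FROM table1
    UNION
    SELECT id, update_date FROM table2
    WHERE id NOT IN (SELECT id FROM table1)
    ORDER BY id
""").fetchall()

# id 1 keeps table1's older date ('2020-01-01'), not table2's '2020-02-01'.
print(rows)
```

So this variant removes the duplicates, but it is only equivalent to the MAX(update_date) version if table1 always holds the most recent row.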

Related

Can't remove duplicates from MariaDB 10

I'm having hard time removing duplicates from database. It's MariaDB (protocol version: 10, 10.3.34-MariaDB Server). I need to remove rows where three columns are equal. I was trying to use WITH clause but database throws error that it can't recognize 'WITH', so I focused on traditional way.
I need to remove rows where foreignId, column1 and column2 are equal.
I'm checking if there are duplicates like
SELECT foreignId, column1, column2, COUNT(*)
FROM table1
GROUP BY foreignId, column1, column2
HAVING COUNT(*) > 1
Trying to remove duplicates...
DELETE table1
FROM table1
INNER JOIN (
SELECT
p.id,
p.foreignId,
p.column1,
p.column2,
ROW_NUMBER() OVER (
PARTITION BY
p.column1,
p.column2,
p.foreignId
ORDER BY
p.foreignId,
p.column2,
p.column1
) AS row_number
FROM table1 p
GROUP BY p.foreignId, p.column1, p.column2
) dup
ON table1.column1 = dup.column1
WHERE dup.row_number > 1;
I have modified this code a lot but still can't make it work as intended... What am I doing wrong?
You have a few issues with your query:
You need to remove the GROUP BY in the subquery
You should change the ORDER BY in the OVER clause to ORDER BY p.ttimestamp DESC (where ttimestamp is the name of your timestamp column), so that the newest row in each group gets row number 1
You need to JOIN on the unique id column, i.e. ON table1.id = dup.id; otherwise you will delete every row whose column1 value matches a duplicate anywhere in the table
That will give you:
DELETE table1
FROM table1
INNER JOIN (
SELECT
p.id,
ROW_NUMBER() OVER (
PARTITION BY
p.column1,
p.column2,
p.foreignId
ORDER BY
p.ttimestamp DESC
) AS rn
FROM table1 p
) dup
ON table1.id = dup.id
WHERE dup.rn > 1
Note I would not use row_number as a column alias as it is a reserved word, so I've changed it to rn above.
Demo (thanks to @JonasMetzler) on dbfiddle
Note that if it's possible for duplicate rows to also have the same timestamp value, this query will delete a random selection of those rows. If you want a deterministic result, change the ORDER BY clause to
ORDER BY
p.ttimestamp DESC,
p.id DESC
which will keep the row with the highest (or lowest if you remove the DESC after p.id) id value.
Demo on dbfiddle
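For anyone who wants to try the deterministic version locally, here is a rough sketch using Python's built-in sqlite3 (which needs SQLite 3.25+ for window functions). SQLite has no DELETE ... JOIN, so the join is rewritten as WHERE id IN (...); that rewrite, the column types and the sample rows are my assumptions, while the ROW_NUMBER() part follows the query above.

```python
import sqlite3

# In-memory SQLite database standing in for MariaDB; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (
        id INTEGER PRIMARY KEY,
        foreignId INTEGER, column1 TEXT, column2 TEXT, ttimestamp TEXT
    );
    INSERT INTO table1 VALUES
        (1, 10, 'a', 'b', '2022-01-01'),
        (2, 10, 'a', 'b', '2022-01-02'),
        (3, 10, 'a', 'b', '2022-01-02'),
        (4, 20, 'a', 'b', '2022-01-01');
""")

# Number the rows inside each duplicate group, newest first, ties broken by
# highest id, then delete everything except row number 1 of each group.
conn.execute("""
    DELETE FROM table1
    WHERE id IN (
        SELECT id FROM (
            SELECT p.id,
                   ROW_NUMBER() OVER (
                       PARTITION BY p.column1, p.column2, p.foreignId
                       ORDER BY p.ttimestamp DESC, p.id DESC
                   ) AS rn
            FROM table1 p
        ) AS dup
        WHERE dup.rn > 1
    )
""")

remaining = conn.execute("SELECT id FROM table1 ORDER BY id").fetchall()
print(remaining)
```

Rows 2 and 3 tie on ttimestamp, and the p.id DESC tie-breaker is what makes row 3 the one that survives; row 4 is untouched because its foreignId differs.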
Assuming you have a unique column like id, you can do the following:
DELETE FROM table1 WHERE ID NOT IN
(SELECT x.id FROM
(SELECT MAX(id) id, MAX(foreignId) foreignId,
MAX(column1) column1, MAX(column2) column2
FROM table1
WHERE ttimestamp IN (SELECT MAX(ttimestamp) FROM table1
GROUP BY foreignID, column1, column2)
GROUP BY foreignId, column1, column2)x);
Please see the working example here: db<>fiddle

SQL - Order By Sum of Multiple Tables Sharing Common Field

I have 2 different tables sharing the same column names. The tables list the same products which are identified by 'id'. The products have different revenues throughout both tables and are listed multiple times in each table.
I would like to sum the revenue of the same products across the 2 tables and ORDER BY the sum, so that the highest-revenue products come first.
I've tried JOIN and UNION but can't seem to figure out the right solution.
UNION query I tried...
SELECT id, SUM(rev) as total
FROM (
SELECT id, rev FROM table1 UNION ALL
SELECT id, rev FROM table2 UNION ALL
)
ORDER BY total DESC
JOIN query I tried...
SELECT table1.id,
table1.rev,
table2.id,
table2.rev,
(table1.rev + table2.rev) as revenue
FROM table1
INNER JOIN table2 ON table1.id = table2.id
ORDER BY revenue DESC
You were close. You needed:
one UNION ALL, not two.
a GROUP BY, that gives the break field.
An alias for the subquery (I used AllRevenue - you can use any valid name.)
SELECT id, SUM(rev) as total
FROM (
SELECT id, rev FROM table1 UNION ALL
SELECT id, rev FROM table2
) AS AllRevenue
GROUP BY id
ORDER BY total DESC
The join approach would have worked if you used a FULL OUTER JOIN, because some ids may be present in one table but not the other, but that is usually less performant.
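Here is a small sketch of the difference, using Python's built-in sqlite3 as a stand-in for your database. The sample rows are made up; id 2 exists only in table1 and id 3 only in table2.

```python
import sqlite3

# In-memory SQLite database; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, rev INTEGER);
    CREATE TABLE table2 (id INTEGER, rev INTEGER);
    INSERT INTO table1 VALUES (1, 10), (1, 5), (2, 7);
    INSERT INTO table2 VALUES (1, 20), (3, 50);
""")

# UNION ALL stacks every row from both tables, then GROUP BY sums per id.
totals = conn.execute("""
    SELECT id, SUM(rev) AS total
    FROM (
        SELECT id, rev FROM table1
        UNION ALL
        SELECT id, rev FROM table2
    ) AS AllRevenue
    GROUP BY id
    ORDER BY total DESC
""").fetchall()

# An INNER JOIN only sees ids that are present in both tables.
joined_ids = conn.execute("""
    SELECT DISTINCT table1.id
    FROM table1
    INNER JOIN table2 ON table1.id = table2.id
""").fetchall()

print(totals)
print(joined_ids)
```

The UNION ALL query reports all three products, while the INNER JOIN only ever sees id 1.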
Looks like you just need to group on the ID then... unless I'm missing something.
select d.* from (
SELECT
table1.id,
Sum(table1.rev) + Sum(table2.rev) as revenue
FROM table1
INNER JOIN table2 ON table1.id = table2.id
GROUP BY
table1.id) d
order by d.revenue
SELECT id, SUM(rev) as total
FROM (
SELECT id, rev FROM table1 UNION ALL
SELECT id, rev FROM table2
)
GROUP BY id
ORDER BY total DESC

Selecting row with max date from multiple almost identical tables

If I can retrieve the most recent name for each id from a table in a MySQL database like so:
SELECT n.id, n.name, n.date
FROM $table AS n
INNER JOIN
(SELECT id, MAX(date) AS date
FROM $table GROUP BY id)
AS max
USING (id, date);
How could I retrieve the most recent name from three almost identical tables (call them $table, $table2, $table3)? They all share the same column structure and the id found from one table may or may not be present in the other two. Think of it as one large table split into three (but with two of them containing two extra columns that are irrelevant in this instance). Would UNION be the best solution? If so, is there a way to do it without a mile-long query?
Constraint:
id is not an auto-incrementing unique integer unfortunately
You can use union all. One slight simplification is the group_concat()/substring_index() trick:
select id, max(date) as date,
substring_index(group_concat(name order by date desc), ',', 1) as MostRecentName
from (select id, name, date from $table union all
select id, name, date from $table2 union all
select id, name, date from $table3
) t
group by id;
This does make certain assumptions. The name cannot contain a comma (although it is easy enough to change the separator). In addition, the intermediate result for the group_concat() cannot exceed a certain length, which is determined by the user-settable system variable group_concat_max_len.
You could try:
SELECT n.id, n.name, n.date
FROM table1 n where n.id in (select max(id) from table1)
union
SELECT n.id, n.name, n.date
FROM table2 n where n.id in (select max(id) from table2)
union
SELECT n.id, n.name, n.date
FROM table3 n where n.id in (select max(id) from table3)
Every inner query selects the highest id from the table and then searches for the corresponding fields in the outer query.
This ended up being the only solution I could think of:
SELECT n.id, n.name, n.date FROM (
SELECT id, name, date FROM $table
UNION ALL
SELECT id, name, date FROM $table2
UNION ALL
SELECT id, name, date FROM $table3
) AS n INNER JOIN (
SELECT id, MAX(date) AS date FROM (
SELECT id, date FROM $table
UNION ALL
SELECT id, date FROM $table2
UNION ALL
SELECT id, date FROM $table3
) AS t
GROUP BY id
) AS max USING (id, date)
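If your server supports common table expressions (an assumption on my part; MySQL 8.0+ does), the three-way UNION ALL only needs to be written once. A rough sketch using Python's built-in sqlite3, with hypothetical table names names_a, names_b and names_c standing in for $table, $table2 and $table3, and made-up rows:

```python
import sqlite3

# In-memory SQLite database; table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names_a (id TEXT, name TEXT, date TEXT);
    CREATE TABLE names_b (id TEXT, name TEXT, date TEXT);
    CREATE TABLE names_c (id TEXT, name TEXT, date TEXT);
    INSERT INTO names_a VALUES ('x', 'old x', '2020-01-01'),
                               ('y', 'only y', '2020-05-01');
    INSERT INTO names_b VALUES ('x', 'new x', '2021-01-01');
    INSERT INTO names_c VALUES ('z', 'only z', '2019-01-01'),
                               ('x', 'mid x', '2020-06-01');
""")

# The CTE holds the combined rows; it is then joined to its own
# per-id maximum date to pick the most recent name for each id.
rows = conn.execute("""
    WITH all_names AS (
        SELECT id, name, date FROM names_a
        UNION ALL
        SELECT id, name, date FROM names_b
        UNION ALL
        SELECT id, name, date FROM names_c
    )
    SELECT n.id, n.name, n.date
    FROM all_names AS n
    INNER JOIN (
        SELECT id, MAX(date) AS date FROM all_names GROUP BY id
    ) AS latest
        ON n.id = latest.id AND n.date = latest.date
    ORDER BY n.id
""").fetchall()

print(rows)
```

Like the original query, this returns more than one row for an id if its latest date appears more than once.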

The target table ... of DELETE is not updateable

I have such query
SET @n=0;
DELETE t3 FROM (
SELECT id, project_id, task_id, user_id,grouper
FROM (
SELECT id, project_id, task_id, user_id,
@n:=if(status=55,@n+1,@n),
if(status=55,@n-1,@n) as grouper FROM timelog
WHERE user_id='5' ORDER BY id ASC
) as t
where grouper>-1
group by grouper) as t3 WHERE grouper=1
for which I receive "The target table t3 of the DELETE is not updatable".
Is there any solution for this error?
Basically, what I'm trying to do is delete a group of table rows marked with grouper, using a select inside the delete. I'm also happy to hear other solutions or ideas different from this one.
sql fiddle: http://sqlfiddle.com/#!2/33820/2/0
EDIT: Thanks for the answers. Here is the working code (in case anyone needs something similar):
SET @n=0;
delete from timelog where id in ((SELECT id
FROM (
SELECT id, project_id, task_id, user_id,
@n:=if(status=55,@n+1,@n),
if(status=55,@n-1,@n) as grouper FROM timelog
WHERE user_id='5' ORDER BY id ASC
) as t
where grouper>-1 and grouper=1
group by grouper))
Wish I had more time... but here is some quick pseudocode:
delete from timelog where id in ((SELECT id
FROM (
SELECT id, project_id, task_id, user_id,
@n:=if(status=55,@n+1,@n),
if(status=55,@n-1,@n) as grouper FROM timelog
WHERE user_id='5' ORDER BY id ASC
) as t
where grouper>-1
group by grouper) as t3 WHERE grouper=1)
all I'm doing is changing the subselect statement into a where clause that simply returns the ID's listed in your original subquery.
edit - brackets are a bit off, I think I have it now. To be honest, this can really be cleaned up to one select statement, not the nested version here.
delete from dept_new where rowid in(select rowid from(select rowid,row_number() over(partition by deptno,dname,loc order by deptno) rownu from dept_new) where rownu>1);

MySQL use result of subquery

I have the following query:
SELECT *
FROM(
SELECT
ID,
SUM(points) AS SUMMARY
FROM my_table
GROUP BY ID
ORDER BY SUMMARY DESC
) t1
WHERE SUMMARY>=(SELECT
SUMMARY
FROM (
SELECT
ID,
SUM(points) AS SUMMARY
FROM my_table
GROUP BY ID
ORDER BY SUMMARY DESC
) t2
WHERE ID=1234)
How can I remove the duplicated subquery or reuse its results?
Maybe my request is completely incorrect?
I'm pretty sure your query is identical to:
SELECT ID, SUM(points) AS SUMMARY
FROM my_table
GROUP BY ID
HAVING SUMMARY >= (SELECT SUM(points) FROM my_table WHERE ID=1234)
ORDER BY SUMMARY DESC
SQL Fiddle demonstration
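If you want to check it locally, here is a minimal sketch using Python's built-in sqlite3 in place of MySQL. The sample rows are made up, and I repeat SUM(points) inside HAVING instead of using the alias so that it does not depend on alias support in HAVING.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (ID INTEGER, points INTEGER);
    INSERT INTO my_table VALUES
        (1234, 5), (1234, 10), (1, 30), (2, 15), (3, 4);
""")

# Keep every ID whose total is at least the total of ID 1234.
# The scalar subquery computes the reference total once, with no derived tables.
rows = conn.execute("""
    SELECT ID, SUM(points) AS SUMMARY
    FROM my_table
    GROUP BY ID
    HAVING SUM(points) >= (SELECT SUM(points) FROM my_table WHERE ID = 1234)
    ORDER BY SUMMARY DESC, ID
""").fetchall()

print(rows)
```

ID 1234 totals 15 here, so IDs 1, 2 and 1234 are returned and ID 3 (total 4) is filtered out. The extra ID in the ORDER BY is only there to make the order of ties predictable.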