MySQL Delete duplicates in consecutive rows - mysql

Suppose this table:
ID ColA ColB
1 7 8
2 7 9
3 7 9
4 5 8
5 6 9
6 6 9
7 5 4
The PK is the ID column.
Now, I want to delete all duplicates of ColA and ColB in consecutive rows.
In this example, rows 2,3 and 5,6 contain duplicates.
These should be removed so that the row with the higher ID remains.
The output should be:
ID ColA ColB
1 7 8
3 7 9
4 5 8
6 6 9
7 5 4
How can this be done with MySQL?
Thanks,
Juergen

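SELECT
    ID
FROM
    MyTable m1
WHERE
    0 < (SELECT
            COUNT(*)
        FROM
            MyTable m2
        WHERE
            m2.ID = m1.ID + 1 AND -- the next row is a duplicate, so m1 is the lower ID to delete
            m2.ColA = m1.ColA AND
            m2.ColB = m1.ColB)
and then you can use a
delete from MyTable where ID in ...
query. This should work in any version, as long as the IDs are consecutive without gaps. MySQL does not let a DELETE select from its own target table directly in a subquery, so wrap the SELECT in a derived table or store the IDs in a temporary table first.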
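Here is a minimal sketch of the select-then-delete idea, keeping the higher ID of each consecutive run as the question asks. It runs on SQLite through Python's sqlite3 module, which is only a stand-in for MySQL (an assumption), and the function name is made up. The derived table named doomed is there because MySQL usually needs that wrapper when a DELETE reads from its own target table.

```python
import sqlite3

def delete_consecutive_duplicates(rows):
    """Delete every row whose immediate successor (ID + 1) has the same ColA/ColB."""
    con = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
    con.execute("CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, ColA INT, ColB INT)")
    con.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", rows)
    con.execute("""
        DELETE FROM MyTable
        WHERE ID IN (
            SELECT ID FROM (
                -- m1 is the lower ID of a consecutive duplicate pair
                SELECT m1.ID
                FROM MyTable m1
                JOIN MyTable m2
                  ON m2.ID = m1.ID + 1
                 AND m2.ColA = m1.ColA
                 AND m2.ColB = m1.ColB
            ) AS doomed
        )
    """)
    remaining = con.execute("SELECT ID, ColA, ColB FROM MyTable ORDER BY ID").fetchall()
    con.close()
    return remaining

sample = [(1, 7, 8), (2, 7, 9), (3, 7, 9), (4, 5, 8), (5, 6, 9), (6, 6, 9), (7, 5, 4)]
print(delete_consecutive_duplicates(sample))
```

With the sample data from the question, rows 2 and 5 are removed. A run of three identical rows collapses to the last one, because every row in the run except the last has a matching successor.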

-- MyTable stands in for your table name (TABLE itself is a reserved word)
CREATE TEMPORARY TABLE duplicates (id int primary key);
INSERT INTO duplicates (id)
SELECT t1.id
FROM MyTable t1
JOIN MyTable t2 ON t2.id = t1.id + 1
WHERE t1.ColA = t2.ColA
AND t1.ColB = t2.ColB;
-- SELECT * FROM duplicates; --> are you happy with that? => delete
DELETE MyTable
FROM MyTable
JOIN duplicates ON MyTable.id = duplicates.id;

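Depending on how many records you have, this might not be the most efficient:
SELECT (SELECT id FROM MyTable WHERE colA = m.colA AND colB = m.colB ORDER BY id DESC LIMIT 1) AS id, m.*
FROM (SELECT DISTINCT colA, colB
FROM MyTable) m;
I usually use MSSQL, where this would be written with TOP 1 instead of LIMIT 1, so check the syntax, but the idea should be similar. Note that this keeps the highest ID per ColA/ColB pair across the whole table, not only among consecutive rows.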

I've called the first table 'test'.
First, create a table that will hold all the distinct combinations of ColA and ColB:
create temporary table tmpTable (ColA int, ColB int);
insert into tmpTable select ColA,ColB from test group by ColA, ColB;
Now, select the maximum ID in the original table for each combination of ColA and ColB. Put this into a new table (called idsToKeep because these are the rows we do not want to delete):
create temporary table idsToKeep (ID int);
insert into idsToKeep select (select max(ID) from test where test.ColA=tmpTable.ColA and test.ColB=tmpTable.ColB) from tmpTable;
Finally, delete all the entries from the original table that are not in the idsToKeep table:
delete from test where ID <> all (select ID from idsToKeep);
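The keep-the-maximum-ID idea can be sketched in a single statement. This is a rough illustration on SQLite via Python's sqlite3 as a stand-in for MySQL (an assumption); the function name and the extra row 8 are hypothetical. It also shows that this approach removes duplicates anywhere in the table, not only in consecutive rows.

```python
import sqlite3

def keep_max_id_per_pair(rows):
    """Keep only the highest ID for each (ColA, ColB) combination."""
    con = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
    con.execute("CREATE TABLE test (ID INTEGER PRIMARY KEY, ColA INT, ColB INT)")
    con.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    con.execute("""
        DELETE FROM test
        WHERE ID NOT IN (
            -- derived table: the wrapper MySQL typically needs when reading the target table
            SELECT keep_id FROM (
                SELECT MAX(ID) AS keep_id FROM test GROUP BY ColA, ColB
            ) AS ids_to_keep
        )
    """)
    remaining = [r[0] for r in con.execute("SELECT ID FROM test ORDER BY ID")]
    con.close()
    return remaining

question_rows = [(1, 7, 8), (2, 7, 9), (3, 7, 9), (4, 5, 8), (5, 6, 9), (6, 6, 9), (7, 5, 4)]
print(keep_max_id_per_pair(question_rows))
# Hypothetical extra row 8 repeats row 1 without being adjacent to it
print(keep_max_id_per_pair(question_rows + [(8, 7, 8)]))
```

On the question's data the result matches the desired output. With the extra row, row 1 is deleted as well, which may or may not be what you want.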

Related

how to remove duplicate entries from many to many relation using sql query

My tables look like this. op and country have a many-to-many relationship with each other.
OP
id, name,.....
op_country
id, op_id, country_id
country
id, name, ...
My op_country table is filled as below:
id op_id country_id
1 1 1
2 1 2
3 2 2
4 2 3
5 3 3
6 3 3
7 1 1
I want to remove my duplicate entries from op_country. Here I want to remove rows 6 and 7 since we already have rows with such values.
How can I do that?
DELETE t1
FROM op_country t1
JOIN op_country t2 USING (op_id, country_id)
WHERE t1.id > t2.id
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=247ebc5870a6ab10b64076ffb375797f
You want to delete entries for which exists a sibling with a lower ID:
delete from op_country
where exists
(
select null
from (select * from op_country) op2
where op2.op_id = op_country.op_id
and op2.country_id = op_country.country_id
and op2.id < op_country.id
);
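The from (select * from op_country) is necessary instead of a mere from op_country because MySQL does not allow a DELETE or UPDATE to select from its own target table in a subquery (error 1093); wrapping the table in a derived table works around that restriction.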
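To check the "sibling with a lower ID" logic against the sample data, here is a small sketch. It uses SQLite through Python's sqlite3 as a stand-in for MySQL (an assumption), and the function name is made up.

```python
import sqlite3

def dedupe_op_country(rows):
    """Delete every row that has a sibling with the same pair and a lower id."""
    con = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
    con.execute("CREATE TABLE op_country (id INTEGER PRIMARY KEY, op_id INT, country_id INT)")
    con.executemany("INSERT INTO op_country VALUES (?, ?, ?)", rows)
    con.execute("""
        DELETE FROM op_country
        WHERE EXISTS (
            SELECT NULL
            FROM (SELECT * FROM op_country) op2
            WHERE op2.op_id = op_country.op_id
              AND op2.country_id = op_country.country_id
              AND op2.id < op_country.id
        )
    """)
    remaining = con.execute("SELECT id, op_id, country_id FROM op_country ORDER BY id").fetchall()
    con.close()
    return remaining

rows = [(1, 1, 1), (2, 1, 2), (3, 2, 2), (4, 2, 3), (5, 3, 3), (6, 3, 3), (7, 1, 1)]
print(dedupe_op_country(rows))
```

Rows 6 and 7 are removed and the lowest id of each (op_id, country_id) pair survives.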

MySQL - 1 Master List Table Not IN Table 2 but IN Table 3

Can you kindly help me with my query?
In my database, I have 3 tables,
Table 1 - Student Master List (9 Records)
Table 2 - AM (4 Records)
Table 3 - PM (3 Records)
Table 2 and Table 3 have the same structure, but Table 2 has higher priority than Table 3.
I want to see the records from Table 1 which are NOT IN Table 2 BUT have a record IN Table 3. Table 2 (4) + Table 3 (3) = 7 Records.
But how can I show the 2 records from the master list?
sample database
My query is something like this:
select * from table1 t1
where (id, lname, fname, mname) NOT IN
(select id, lname, fname, mname from table2) and
(id, lname, fname, mname) IN
(select id, lname, fname, mname from table3)
But when I did this, it just shows some records from table 2 and table 3.
If you have a common key among all tables, which is (id, lname, fname, mname), the query below will work. If your common key is somewhat different, adjust the WHERE clauses in both subqueries to only include the common key column(s).
Use EXISTS to include records present in table 3 and NOT EXISTS to exclude records present in table 2:
select *
from table1 t1
where
not exists (
select 1
from table2 t2
where t1.id = t2.id and t1.lname = t2.lname and t1.fname = t2.fname and t1.mname = t2.mname)
and exists (
select 1
from table3 t3
where t1.id = t3.id and t1.lname = t3.lname and t1.fname = t3.fname and t1.mname = t3.mname)
I suspect you just need to union tables 2 and 3, left join the result to table 1, and test for null values:
drop table if exists t1,t2,t3;
create table t1 (id int);
create table t2 (id int);
create table t3 (id int);
insert into t1 values (1),(2),(3),(4),(5),(6),(7),(8),(9);
insert into t2 values (2),(3),(4),(5);
insert into t3 values (1),(6),(7);
select t1.*
from t1
left join
(select id from t2
union all
select id from t3)s on s.id = t1.id
where s.id is null;
+------+
| id |
+------+
| 8 |
| 9 |
+------+
2 rows in set (0.00 sec)
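The two answers return different sets, so it helps to run both on the same data. The sketch below reuses the single-column t1/t2/t3 tables from the answer above, on SQLite via Python's sqlite3 as a stand-in for MySQL (an assumption); the variable names are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
con.executescript("""
    CREATE TABLE t1 (id INT);
    CREATE TABLE t2 (id INT);
    CREATE TABLE t3 (id INT);
    INSERT INTO t1 VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9);
    INSERT INTO t2 VALUES (2),(3),(4),(5);
    INSERT INTO t3 VALUES (1),(6),(7);
""")

# Master rows that are absent from t2 but present in t3
in_t3_not_t2 = [r[0] for r in con.execute("""
    SELECT t1.id FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id)
      AND EXISTS (SELECT 1 FROM t3 WHERE t3.id = t1.id)
    ORDER BY t1.id
""")]

# Master rows that appear in neither t2 nor t3
in_neither = [r[0] for r in con.execute("""
    SELECT t1.id FROM t1
    LEFT JOIN (SELECT id FROM t2 UNION ALL SELECT id FROM t3) s ON s.id = t1.id
    WHERE s.id IS NULL
    ORDER BY t1.id
""")]

print(in_t3_not_t2)
print(in_neither)
```

The first list holds the three PM-only students and the second holds the two students missing from both tables, so which query to use depends on which of those the question is really after.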

Select matching to two columns of subquery

Background:
Given table t1 with fields A, B (and others):
DROP TEMPORARY TABLE IF EXISTS t1;
CREATE TEMPORARY TABLE t1 (ID INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, A varchar(255), B int, C varchar(40));
INSERT INTO t1 (A, B, C)
SELECT 'AA', 11, 100
UNION ALL
SELECT 'BB', 12, 200
UNION ALL
SELECT 'BB', 12, 201
UNION ALL
SELECT 'AA', 12, 300
UNION ALL
SELECT 'AA', 11, 101;
-- ID A B C
-- 1 AA 11 100
-- 2 BB 12 200
-- 3 BB 12 201
-- 4 AA 12 300
-- 5 AA 11 101
GOAL: For a given combination of A and B, examine how many rows there are in t1, and then list all those rows (to understand what is same and what is different between those rows).
(Finally, but beyond the scope of this question, I will be writing queries to process some of the older rows that are determined to be "obsolete" (replaced by the most recent row with given A and B). It's not safe to do so for ALL combinations of A and B at this time. A definitive answer on "what combinations of A and B are safe to delete old versions of" is not available to me - this is a legacy table which has associated with it many GBs of external files, most of which are no longer relevant to anyone. All those files have been backed up; I need to make a conservative proposal as to which files to remove, and how I determined those files.)
I've made temp table t2 with all distinct combinations of A and B (plus an ID, and a count of how many rows of each combo):
DROP TEMPORARY TABLE IF EXISTS t2;
CREATE TEMPORARY TABLE t2 (ID INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, A varchar(255), B int)
SELECT COUNT(1) As Cnt, A, B FROM t1
GROUP BY A, B
ORDER BY Cnt DESC;
SELECT * FROM t2;
-- ID Cnt A B
-- 1 2 AA 11
-- 2 2 BB 12
-- 3 1 AA 12
The query that I am having trouble writing:
In the actual data, there are hundreds of rows for some combinations. I am most interested in the combinations that have a high count, so I attempt to dump the rows of t1, based on the first row of t2:
SELECT * FROM t1
WHERE A=
(SELECT A from t2 LIMIT 1 OFFSET 0) AND
B=
(SELECT B from t2 LIMIT 1 OFFSET 0);
This gives error:
Error Code: 1137. Can't reopen table: 't2'
I presume that I should refer to the row I want from t2:
(SELECT A, B from t2 LIMIT 1 OFFSET 0)
And then make a nested query that uses this row twice, in the two places where columns A and B are used. I am stuck on how to write this query. The basic idea in my head is:
SELECT * FROM t1
WHERE A=t3.A AND B=t3.B IN
(SELECT A, B from t2 LIMIT 1 OFFSET 0) AS t3;
(which is not valid SQL)
NOTE: "OFFSET 0" is there because then I will change to other offset values, to examine other A-B combos.
The goal is to see response:
-- ID A B C
-- 1 AA 11 100
-- 5 AA 11 101
Or maybe this can be done with a JOIN, but I'm not sure how to do a JOIN using just one row of t2.
You can do something like this:
SELECT t1.*
FROM t1
JOIN ( SELECT t2.A, t2.B
FROM t2
ORDER BY t2.A, t2.B
LIMIT 1 OFFSET 0
) t3
WHERE t3.A = t1.A
AND t3.B = t1.B
Without an ORDER BY clause, MySQL is free to return any row. We need to add the ORDER BY to make the result deterministic.
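Since the question is mostly interested in the combinations with the highest count, one option is to order the derived table by Cnt first and use A, B as tie-breakers. A minimal sketch, on SQLite via Python's sqlite3 as a stand-in for MySQL (an assumption); t2 is a regular table here and rows_for_combo is a hypothetical helper.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
con.executescript("""
    CREATE TABLE t1 (ID INTEGER PRIMARY KEY AUTOINCREMENT, A TEXT, B INT, C TEXT);
    INSERT INTO t1 (A, B, C) VALUES
        ('AA', 11, '100'), ('BB', 12, '200'), ('BB', 12, '201'),
        ('AA', 12, '300'), ('AA', 11, '101');
    CREATE TABLE t2 AS
        SELECT COUNT(*) AS Cnt, A, B FROM t1 GROUP BY A, B;
""")

def rows_for_combo(offset):
    """Return the t1 rows for the combination at the given offset, highest count first."""
    return con.execute("""
        SELECT t1.ID, t1.A, t1.B, t1.C
        FROM t1
        JOIN (SELECT A, B
              FROM t2
              ORDER BY Cnt DESC, A, B   -- tie-breakers keep the order deterministic
              LIMIT 1 OFFSET ?) AS t3
          ON t3.A = t1.A AND t3.B = t1.B
        ORDER BY t1.ID
    """, (offset,)).fetchall()

print(rows_for_combo(0))
print(rows_for_combo(1))
```

Offset 0 gives the two AA/11 rows (IDs 1 and 5), and raising the offset walks through the other combinations. t2 is read only once per query, which is what avoids the "Can't reopen table" error in the temporary-table setup.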

Update a column of a table with the count of other column in the same table in a MySQL server

I have a table like this:
recordid customerid productid count
1 2 12 3
2 4 10 1
3 2 3 3
4 3 12 2
5 3 10 2
6 2 7 3
7 5 3 1
8 ....
9 ....
I want an update query that will count the number of occurrences of each customer ID and update the count column, which will initially be empty.
The end result should be like the table above.
The column names are dummy; my actual table is different.
It has millions of rows, so the query should be fast.
I tried this query, but it gets stuck...
update tablename, (select count(recordid) as count,customerid from tablename group by customerid) as temp set count=temp.count where customerid=temp.customerid
You can use JOIN in UPDATE.
Try this:
UPDATE TableName A
JOIN
(SELECT customerid,Count(customerid) as cnt
FROM TableName
GROUP BY customerid) as B ON A.customerid= B.customerid
SET A.count = B.cnt
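As a quick check of the aggregate-once-then-update idea, here is a sketch on the sample rows. It runs on SQLite via Python's sqlite3 (an assumption, not MySQL), so the per-customer counts go into a temporary table and are applied with a correlated subquery instead of MySQL's UPDATE ... JOIN syntax. The table name purchases and the column name cnt are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
con.executescript("""
    CREATE TABLE purchases (recordid INTEGER PRIMARY KEY, customerid INT, productid INT, cnt INT);
    INSERT INTO purchases (recordid, customerid, productid) VALUES
        (1, 2, 12), (2, 4, 10), (3, 2, 3), (4, 3, 12), (5, 3, 10), (6, 2, 7), (7, 5, 3);

    -- Aggregate once, like the derived table B in the JOIN version
    CREATE TEMP TABLE customer_counts AS
        SELECT customerid, COUNT(*) AS cnt FROM purchases GROUP BY customerid;

    -- Copy each customer's count back onto its rows
    UPDATE purchases
    SET cnt = (SELECT c.cnt FROM customer_counts c
               WHERE c.customerid = purchases.customerid);
""")

counts = con.execute(
    "SELECT recordid, customerid, cnt FROM purchases ORDER BY recordid"
).fetchall()
print(counts)
```

The resulting cnt values match the count column in the question's table. On a table with millions of rows, an index on customerid is the usual first thing to check for either variant.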
This doesn't seem right:
update tablename, (select count(recordid) as count,customerid from tablename group by customerid) as temp set count=temp.count where customerid=temp.customerid
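The comma after update tablename is MySQL's multiple-table UPDATE syntax, so it acts as an implicit join. The unqualified count and customerid, however, are ambiguous when both tablename and temp have columns with those names.
Reformatted for readability, with an explicit JOIN and qualified column names:
UPDATE tablename
JOIN (
    SELECT count(recordid) AS cnt, customerid
    FROM tablename GROUP BY customerid
) AS temp ON tablename.customerid = temp.customerid
SET tablename.count = temp.cnt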

How to fetch changed rows by comparing a table with its older version?

I have one log table and one view.
I would like to fetch the changed rows from the view by comparing it to the log table given an ID_NO.
The ID_NO is fixed between the two tables, whereas other columns can change.
In short, I would like to fetch the rows from Table1 which have one or more changed columns in comparison to Table2.
For example:
TABLE 1:
ID COL1 COL2 COL3
1 A B C
2 34 56 D
3 F XY 24
TABLE 2:
ID COL1 COL2 COL3
1 A B C
2 34 56 F
3 1 XY 24
The query should return the following from TABLE2:
ID COL1 COL2 COL3
2 34 56 F
3 1 XY 24
Please advise.
Many Thanks!
SELECT *
FROM one_view vw
WHERE EXISTS
(
SELECT 1
FROM log_table t
WHERE vw.id_no = t.id_no
)
;
A note after the question was updated:
SELECT *
FROM table_2 t1
WHERE EXISTS
(
SELECT 1
FROM table_1 t2
WHERE t1.id_no = t2.id_no
AND
(
t1.col1 <> t2.col1
OR t1.col2 <> t2.col2
OR t1.col3 <> t2.col3
)
)
;
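One thing to watch with the <> comparison is NULL: if a column is NULL on one side, the comparison yields NULL and the row is not reported as changed. A sketch of the difference, on SQLite via Python's sqlite3 as a stand-in (an assumption). SQLite's NULL-safe inequality is IS NOT; in MySQL the counterpart would be NOT (a <=> b). Row 4 is a hypothetical addition to the question's data, and changed_ids is a made-up helper.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
con.executescript("""
    CREATE TABLE table_1 (id_no INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
    CREATE TABLE table_2 (id_no INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO table_1 VALUES (1,'A','B','C'), (2,'34','56','D'), (3,'F','XY','24'), (4,'Q',NULL,'R');
    INSERT INTO table_2 VALUES (1,'A','B','C'), (2,'34','56','F'), (3,'1','XY','24'), (4,'Q','S','R');
""")

def changed_ids(null_safe):
    """Return the id_no values in table_2 that differ from table_1 in any column."""
    op = "IS NOT" if null_safe else "<>"  # only these two fixed operators are substituted
    sql = f"""
        SELECT t2.id_no
        FROM table_2 t2
        WHERE EXISTS (
            SELECT 1 FROM table_1 t1
            WHERE t1.id_no = t2.id_no
              AND (t2.col1 {op} t1.col1 OR t2.col2 {op} t1.col2 OR t2.col3 {op} t1.col3)
        )
        ORDER BY t2.id_no
    """
    return [r[0] for r in con.execute(sql)]

print(changed_ids(null_safe=False))
print(changed_ids(null_safe=True))
```

The plain <> version reports IDs 2 and 3 only. The NULL-safe version also reports row 4, where col2 went from NULL to 'S'.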
You could add a trigger to the changing table that inserts the ID into a second table, which is then used to identify the changed rows. Just comparing the values between tables might work, but it requires a lot of work. Getting the IDs of the changed rows might be easier.
Just in case you also want to have the old values, add the changed columns and values to the logging table.
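A rough sketch of the trigger idea follows. It uses SQLite trigger syntax through Python's sqlite3 (an assumption; MySQL's CREATE TRIGGER differs in details such as FOR EACH ROW and the lack of a WHEN clause). The table, column and trigger names are all hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
con.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, col1 TEXT);
    CREATE TABLE change_log (id INT, old_col1 TEXT, new_col1 TEXT);

    -- Log the ID plus old and new value whenever col1 really changes
    CREATE TRIGGER items_after_update AFTER UPDATE ON items
    WHEN OLD.col1 IS NOT NEW.col1
    BEGIN
        INSERT INTO change_log (id, old_col1, new_col1)
        VALUES (OLD.id, OLD.col1, NEW.col1);
    END;

    INSERT INTO items VALUES (1, 'A'), (2, '34'), (3, 'F');
    UPDATE items SET col1 = '1' WHERE id = 3;
    UPDATE items SET col1 = 'A' WHERE id = 1;  -- same value, so nothing is logged
""")

log = con.execute("SELECT id, old_col1, new_col1 FROM change_log").fetchall()
print(log)
```

Only the update to row 3 ends up in change_log, so the changed rows can be fetched by joining on the logged IDs without comparing whole tables.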