MySQL Removing duplicates based on condition and multiple columns combinations - mysql

I have a table in MySQL as below:
ID, COL1, COL2, VALUE
'1', 'OBJ1', 'OBJ2', '5'
'2', 'OBJ1', 'OBJ2', '1'
'3', 'OBJ2', 'OBJ1', '3'
'4', 'OBJ3', 'OBJ1', '4'
'5', 'OBJ3', 'OBJ4', '6'
The relation between COL1 and COL2 is independent of position, i.e. OBJ1 in COL1 with OBJ2 in COL2 is the same as OBJ1 in COL2 with OBJ2 in COL1. Either way, OBJ1 and OBJ2 share a relationship.
So the pair OBJ1/OBJ2 currently has the values 5, 1 and 3.
I want each pair to occur only once in the table: if (OBJ1, OBJ2) is present, (OBJ2, OBJ1) must not be.
Importantly, I want to retain only the row with the HIGHEST value.
The result I want is thus:
ID, COL1, COL2, VALUE
'1', 'OBJ1', 'OBJ2', '5'
'4', 'OBJ3', 'OBJ1', '4'
'5', 'OBJ3', 'OBJ4', '6'
What is the best and most efficient way of doing this? I have over 10 million rows.
I have searched many forums and Google but cannot find the exact answer I am looking for.

Try this:
SELECT t1.ID, t1.COL1, t1.COL2, t1.VALUE
FROM mytable AS t1
JOIN (
SELECT LEAST(COL1, COL2) AS C1,
GREATEST(COL1, COL2) AS C2,
MAX(VALUE) AS max_Value
FROM mytable
GROUP BY LEAST(COL1, COL2),
GREATEST(COL1, COL2)
) AS t2 ON LEAST(t1.COL1, t1.COL2) = t2.C1
AND GREATEST(t1.COL1, t1.COL2) = t2.C2
AND t1.VALUE = t2.max_Value
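On MySQL 8.0 or later, the same result can also be sketched with a window function. This is an alternative to the join above, not a replacement for it. It assumes the table is called mytable as in the answer; the column types are guesses, and breaking ties on VALUE by the lower ID is my own choice:

```sql
-- Sample data from the question; column types are assumptions.
CREATE TABLE mytable (
  ID      INT PRIMARY KEY,
  COL1    VARCHAR(10) NOT NULL,
  COL2    VARCHAR(10) NOT NULL,
  `VALUE` INT NOT NULL
);

INSERT INTO mytable (ID, COL1, COL2, `VALUE`) VALUES
  (1, 'OBJ1', 'OBJ2', 5),
  (2, 'OBJ1', 'OBJ2', 1),
  (3, 'OBJ2', 'OBJ1', 3),
  (4, 'OBJ3', 'OBJ1', 4),
  (5, 'OBJ3', 'OBJ4', 6);

-- Rank the rows inside each unordered pair, highest VALUE first,
-- then keep only the top-ranked row of every pair.
SELECT ID, COL1, COL2, `VALUE`
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (
           PARTITION BY LEAST(COL1, COL2), GREATEST(COL1, COL2)
           ORDER BY `VALUE` DESC, ID
         ) AS rn
  FROM mytable AS t
) AS ranked
WHERE rn = 1
ORDER BY ID;
-- Keeps the rows with ID 1, 4 and 5, with COL1/COL2 in their original order.
```

Unlike a join on MAX(VALUE), this returns exactly one row per pair even when two rows share the highest value.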

You could use an IN clause with a grouped subselect.
To also handle the pair appearing in either order,
first normalize the data so that the smaller value is always in COL1:
select
id
, case when col1 <= col2 then col1 else col2 end COL1
, case when col1 > col2 then col1 else col2 end COL2
, value
from start_table
Then the query becomes:
SELECT t1.ID, t1.COL1, t1.COL2, t1.VALUE
FROM (
select
id
, case when col1 <= col2 then col1 else col2 end COL1
, case when col1 > col2 then col1 else col2 end COL2
, value
from start_table
) t1
where (col1, col2, value) in (
select col1, col2, max(value)
FROM (
select
id
, case when col1 <= col2 then col1 else col2 end COL1
, case when col1 > col2 then col1 else col2 end COL2
, value
from start_table
) mytable
group by col1, col2
)
or using an inner join
SELECT t1.ID, t1.COL1, t1.COL2, t1.VALUE
FROM (
select
id
, case when col1 <= col2 then col1 else col2 end COL1
, case when col1 > col2 then col1 else col2 end COL2
, value
from start_table
) t1
inner join
(
select col1, col2, max(value) as value
FROM (
select
id
, case when col1 <= col2 then col1 else col2 end COL1
, case when col1 > col2 then col1 else col2 end COL2
, value
from start_table
) mytable
group by col1, col2
) t2 on t1.col1 = t2.col1 and t1.col2 = t2.col2 and t1.value = t2.value

Rebuild the table so that no dups are allowed; in the process, get rid of the dups. (And get rid of the apparently useless id.)
CREATE TABLE new (
col1 ...,
col2 ...,
`value` ...,
PRIMARY KEY(col1, col2),
INDEX(col2, col1, `value`)
) ENGINE=InnoDB;
INSERT INTO new (col1, col2, `value`)
SELECT LEAST(col1, col2),
GREATEST(col1, col2),
`value`
FROM `real` -- the original table; REAL is a reserved word, hence the backticks
ON DUPLICATE KEY UPDATE
`value` := GREATEST(new.`value`, VALUES(`value`));
RENAME TABLE `real` TO old,
new TO `real`;
DROP TABLE old;
In the future, you will need this for INSERTing/UPDATEing new rows (the table is named `real` after the rename, and the pair must be passed with the smaller value first):
INSERT INTO `real` (col1, col2, `value`)
VALUES (?, ?, ?)
ON DUPLICATE KEY UPDATE
`value` := GREATEST(`value`, VALUES(`value`));
(This assumes you want to keep the larger value whenever the pair is already in the table.)
These save space and improve speed (important for 10M rows): getting rid of id; having optimal indexes; using InnoDB; etc.
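As a worked sketch of the rebuild on the sample data from the question: the table names pairs_old and pairs_new and the column types are hypothetical, chosen only for illustration. For the initial load this variant aggregates with GROUP BY instead of relying on ON DUPLICATE KEY UPDATE, which is a swapped-in technique rather than the method described above; the upsert is then shown for a single later row.

```sql
-- Hypothetical stand-in for the original table, filled with the question's rows.
CREATE TABLE pairs_old (
  id      INT PRIMARY KEY,
  col1    VARCHAR(10) NOT NULL,
  col2    VARCHAR(10) NOT NULL,
  `value` INT NOT NULL
) ENGINE=InnoDB;

INSERT INTO pairs_old (id, col1, col2, `value`) VALUES
  (1, 'OBJ1', 'OBJ2', 5),
  (2, 'OBJ1', 'OBJ2', 1),
  (3, 'OBJ2', 'OBJ1', 3),
  (4, 'OBJ3', 'OBJ1', 4),
  (5, 'OBJ3', 'OBJ4', 6);

-- New table: the primary key forbids a second row for the same ordered pair.
CREATE TABLE pairs_new (
  col1    VARCHAR(10) NOT NULL,
  col2    VARCHAR(10) NOT NULL,
  `value` INT NOT NULL,
  PRIMARY KEY (col1, col2)
) ENGINE=InnoDB;

-- Initial load: normalize the pair order and keep the highest value per pair.
INSERT INTO pairs_new (col1, col2, `value`)
SELECT LEAST(col1, col2),
       GREATEST(col1, col2),
       MAX(`value`)
FROM pairs_old
GROUP BY LEAST(col1, col2),
         GREATEST(col1, col2);

-- Later upsert: the pair arrives as (OBJ2, OBJ1) and is normalized on the way in.
-- VALUES() still works here but is deprecated in recent MySQL 8.0 releases.
INSERT INTO pairs_new (col1, col2, `value`)
VALUES (LEAST('OBJ2', 'OBJ1'), GREATEST('OBJ2', 'OBJ1'), 9)
ON DUPLICATE KEY UPDATE
  `value` = GREATEST(`value`, VALUES(`value`));

SELECT col1, col2, `value` FROM pairs_new ORDER BY col1, col2;
-- (OBJ1, OBJ2) now holds 9; (OBJ1, OBJ3) holds 4; (OBJ3, OBJ4) holds 6.
```

Note that the normalized table stores the pair from row 4 as (OBJ1, OBJ3), not in its original order.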

Related

Select query in IF statement MYSQL

In my MySQL database, I want to create a SELECT query with an extra output column:
the column should be 1 if a column value is present in the list returned by another SELECT query, and 0 otherwise. Something like this:
Select col1,col2,
,IF col3 IN
((select col from tabl2 ),1,0)AS col5
from tbl1.
Thanks in Advance
SELECT col1, col2,
IF(col3 IN (SELECT col FROM tabl2), 1, 0) AS col5
FROM tbl1
Using IF and a subquery in MySQL:
SELECT table1.column1, table1.column2,
IF(
table1.column3 IN (SELECT `column` FROM table2), 1, 0
) AS column_output
FROM table1
In general:
SELECT col1,
col2,
CASE WHEN col3 IN (select col from tabl2)
THEN 1
ELSE 0
END AS col5
FROM tbl1
Specific for MySQL:
SELECT col1,
col2,
col3 IN (select col from tabl2) AS col5
FROM tbl1
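One detail worth checking with the IN forms above is NULL handling: when col3 is NULL, `col3 IN (...)` evaluates to NULL rather than 0 (and so does CASE's ELSE branch only if you rely on it, as in the general form). A minimal sketch with EXISTS, which always yields 0 or 1, follows. The table and column names are the ones from the question; the column types and sample rows are assumptions.

```sql
CREATE TABLE tabl2 (col INT);
CREATE TABLE tbl1 (col1 INT, col2 INT, col3 INT);

INSERT INTO tabl2 (col) VALUES (10), (20);
INSERT INTO tbl1 (col1, col2, col3) VALUES
  (1, 1, 10),    -- col3 is in tabl2
  (2, 2, 30),    -- col3 is not in tabl2
  (3, 3, NULL);  -- col3 is unknown

-- EXISTS returns 1 or 0, never NULL, so col5 is 1, 0, 0 for the three rows.
SELECT col1, col2,
       EXISTS (SELECT 1 FROM tabl2 WHERE tabl2.col = tbl1.col3) AS col5
FROM tbl1
ORDER BY col1;
```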

Combine multiple OR and LIKE mySql

I understand that LIKE cannot be used inside an IN clause.
My table has col1, col2, col3 columns.
How can I find multiple values in multiple columns using the LIKE operator?
Currently my query is:
SELECT col1, col2, col3
FROM table_name
WHERE
(
Col1 = '38'
OR Col1 LIKE '%,38'
OR Col1 LIKE '%,38,%'
OR Col1 LIKE '38,%'
)
OR
(
col2 = '38'
OR col2 LIKE '%,38'
OR col2 LIKE '%,38,%'
OR col2 LIKE '38,%'
)
OR
(
col3 = '38'
OR col3 LIKE '%,38'
OR col3 LIKE '%,38,%'
OR col3 LIKE '38,%'
)
Is there a way to make it smarter/shorter/faster?
Thanks.!
You can use FIND_IN_SET(), a dedicated MySQL function that returns the 1-based position of a value within a string of comma-separated values, or 0 if the value is not found.
Example
DROP TABLE IF EXISTS table_name;
CREATE TABLE table_name
(col1 VARCHAR(100));
INSERT INTO table_name VALUES ('38');
INSERT INTO table_name VALUES ('37,38,39');
INSERT INTO table_name VALUES ('37,38');
INSERT INTO table_name VALUES ('38,39');
INSERT INTO table_name VALUES ('48');
SELECT col1
FROM table_name
WHERE FIND_IN_SET(38, col1) > 0;
Your solution would be
SELECT col1, col2, col3
FROM table_name
WHERE FIND_IN_SET(38, col1) > 0
OR FIND_IN_SET(38, col2) > 0
OR FIND_IN_SET(38, col3) > 0 ;
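To see why this replaces all four LIKE/equality patterns, here is a small sketch of what FIND_IN_SET() returns for a few literal lists (the column aliases are mine):

```sql
SELECT FIND_IN_SET('38', '37,38,39') AS pos_found,    -- 2: second element
       FIND_IN_SET('38', '138,380')  AS pos_partial,  -- 0: no whole element equals 38
       FIND_IN_SET('38', '38')       AS pos_single;   -- 1: single-element list
```

Because the function is applied to the column, MySQL generally cannot use an ordinary index on col1, col2 or col3 for this predicate, so the query is shorter but not necessarily faster on a large table.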

Getting duplicate rows by several columns in MySQL

I'm trying to find duplicate rows by several columns in a large table (about 18,000 rows). The problem is that the queries take a lot of time. I tried this:
SELECT * FROM table_name a, table_name b
WHERE a.col1 = b.col1
AND a.col2 = b.col2
AND a.col3 = b.col3
AND a.col4 = b.col4
AND a.id <> b.id
and this:
SELECT *
FROM table_name
WHERE col1 IN (
SELECT col1
FROM table_name
GROUP BY col1
HAVING count(col1) > 1
)
AND col2 IN (
SELECT col2
FROM table_name
GROUP BY col2
HAVING count(col2) > 1
)
AND col3 IN (
SELECT col3
FROM table_name
GROUP BY col3
HAVING count(col3) > 1
)
AND col4 IN (
SELECT col4
FROM table_name
GROUP BY col4
HAVING count(col4) > 1
)
They both work, but are too slow. Any ideas?
You can try a single combined GROUP BY, like:
SELECT col1, col2, col3, col4, COUNT(*) AS cnt
FROM table_name
GROUP BY col1, col2, col3, col4
HAVING count(*) > 1
At the very least, it will look cleaner.
EDIT
To return full rows, narrowing the result one column at a time so that each level filters the output of the previous one:
SELECT *
FROM table_name
WHERE col4 IN (
SELECT col4
FROM table_name
WHERE col3 IN (
SELECT col3
FROM table_name
WHERE col2 IN (
SELECT col2
FROM table_name
WHERE col1 IN (
SELECT col1
FROM table_name
GROUP BY col1
HAVING count(col1) > 1
)
)
)
)
This should, in principle, return results faster. Note that it filters one column at a time rather than matching all four columns on the same row.
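If the goal is to list every full row whose (col1, col2, col3, col4) combination occurs more than once, one possible sketch is to join the table to its grouped duplicates. The column types, sample rows and the index name idx_dup_cols are hypothetical; whether the index helps depends on the real data.

```sql
CREATE TABLE table_name (
  id   INT PRIMARY KEY,
  col1 INT NOT NULL,
  col2 INT NOT NULL,
  col3 INT NOT NULL,
  col4 INT NOT NULL
);

INSERT INTO table_name (id, col1, col2, col3, col4) VALUES
  (1, 1, 1, 1, 1),
  (2, 1, 1, 1, 1),   -- duplicate of row 1 on all four columns
  (3, 1, 1, 1, 2);   -- differs in col4, so not a duplicate

-- A composite index lets the grouping and the join read the same key.
CREATE INDEX idx_dup_cols ON table_name (col1, col2, col3, col4);

-- d holds each combination that occurs more than once;
-- the join brings back every row belonging to such a combination.
SELECT t.*
FROM table_name AS t
JOIN (
  SELECT col1, col2, col3, col4
  FROM table_name
  GROUP BY col1, col2, col3, col4
  HAVING COUNT(*) > 1
) AS d
  ON  t.col1 = d.col1
  AND t.col2 = d.col2
  AND t.col3 = d.col3
  AND t.col4 = d.col4
ORDER BY t.col1, t.col2, t.col3, t.col4, t.id;
-- Returns the rows with id 1 and 2.
```

If the columns are nullable, `=` will not match NULL to NULL; the NULL-safe operator `<=>` can be used in the join conditions instead.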

Select minimum column not equal to zero

We have 2 columns in a table and want to SELECT the lesser of the two (unless it is equal to 0, in which case we want the nonzero column).
Is there a way to do this while keeping it a simple SELECT statement?
What we want is this:
SELECT LEAST(col1, col2) FROM myTable
Unless col2 is 0, in which case we want col1.
SELECT LEAST(
IF(col1, col1, col2),
IF(col2, col2, col1)
)
FROM myTable
SELECT
CASE
WHEN col1 = 0 THEN col2
WHEN col2 = 0 THEN col1
ELSE LEAST(col1, col2)
END AS MinCol
FROM myTable
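A quick way to check that the two queries above agree is to run both expressions side by side on a few rows. The column types, the sample rows and the aliases min_if and min_case are assumptions for illustration:

```sql
CREATE TABLE myTable (
  col1 INT NOT NULL,
  col2 INT NOT NULL
);

INSERT INTO myTable (col1, col2) VALUES
  (3, 5),   -- both nonzero
  (0, 5),   -- col1 is zero
  (4, 0),   -- col2 is zero
  (0, 0);   -- both zero

SELECT col1, col2,
       LEAST(IF(col1, col1, col2), IF(col2, col2, col1)) AS min_if,
       CASE
         WHEN col1 = 0 THEN col2
         WHEN col2 = 0 THEN col1
         ELSE LEAST(col1, col2)
       END AS min_case
FROM myTable;
-- Both columns give 3, 5, 4 and 0 for the four rows respectively.
```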

How to do MySQL order by x where (x=col3 if col3!=null, else x=col2)?

Suppose a SELECT returns 3 columns: col1, col2 and col3. col2 and col3 are of type TIME. I want to order the result by a timestamp that, for each record, equals col3 if col3 is not NULL and col2 otherwise. How can I do this?
Use the built-in function ifnull():
select
...
order by ifnull(col3, col2)
ifnull() returns the first parameter if it is not null, otherwise the second parameter.
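A small self-contained sketch of the same idea, using the standard-SQL COALESCE(), which behaves like ifnull() for two arguments. The table name event_log, the id column and the sample rows are hypothetical:

```sql
CREATE TABLE event_log (
  id   INT PRIMARY KEY,
  col2 TIME NOT NULL,
  col3 TIME NULL
);

INSERT INTO event_log (id, col2, col3) VALUES
  (1, '10:00:00', NULL),        -- sorts by col2 = 10:00
  (2, '09:00:00', '12:00:00'),  -- sorts by col3 = 12:00
  (3, '11:00:00', NULL);        -- sorts by col2 = 11:00

-- sort_time is col3 when present, otherwise col2.
SELECT id, col2, col3,
       COALESCE(col3, col2) AS sort_time
FROM event_log
ORDER BY sort_time;
-- Row order: id 1, then id 3, then id 2.
```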
You can do it using subqueries. In the internal query you check if col3 is NULL and you "select" col3 or col2 AS "my_time". And then, in the external query, you order by "my_time".
Sub-query example:
SELECT <whatever>, IF (col3 IS NOT NULL, col3, col2) AS my_time FROM <table>
And then, the external query:
SELECT * FROM (
SELECT <whatever>, IF (col3 IS NOT NULL, col3, col2) AS my_time FROM <table>
) AS <temp_table> ORDER BY <temp_table>.my_time