Optimizing mysql query to find all duplicate entries - mysql

I am running a query like this:
SELECT DISTINCT `tableA`.`field1`,
`tableA`.`field2` AS field2Alias,
`tableA`.`field3`,
`tableB`.`field4` AS field4Alias,
`tableA`.`field6` AS field6Alias
FROM (`tableC`)
RIGHT JOIN `tableA` ON `tableC`.`idfield` = `tableA`.`idfield`
JOIN `tableB` ON `tableB`.`idfield` = `tableA`.`idfield`
AND tableA.field2 IN
(SELECT field2
FROM tableA
GROUP BY tableA.field2 HAVING count(*) > 1)
ORDER BY tableA.field2
This is meant to find all the duplicate entries, but it now takes a long time to execute. Any suggestions for optimization?

It looks like you are trying to find all duplicates on field2 in tableA. The first step would be to move the IN subquery into the FROM clause:
SELECT DISTINCT a.`field1`, a.`field2` AS field2Alias,
a.`field3`, b.`field4` AS field4Alias, a.`field6` AS field6Alias
FROM tableA a left join
tableC c
on c.`idfield` = a.`idfield` join
`tableB` b
ON b.`idfield` = a.`idfield` join
(SELECT field2
FROM tableA
group by field2
having count(*) > 1
) asum
on asum.field2 = a.field2
ORDER BY a.field2
There may be additional optimizations, but it is very hard to tell. Your question ("find duplicates") and your query ("join a bunch of tables together and filter them") don't quite match. It would also help to know which indexes and unique/primary keys each table has.
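To see the join-to-a-grouped-subquery idea in isolation, here is a minimal sketch. It runs on SQLite through Python's standard sqlite3 module purely so that it is self-contained; the table contents and the index name idx_tableA_field2 are made up for illustration, and an index on field2 is only an assumption about what might help in MySQL (check EXPLAIN on the real data).

```python
import sqlite3

# Hypothetical miniature of tableA; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (idfield INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT);
    -- Assumed index: lets the GROUP BY and the join look up field2 directly.
    CREATE INDEX idx_tableA_field2 ON tableA (field2);
    INSERT INTO tableA VALUES
        (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'x'), (4, 'd', 'z'), (5, 'e', 'y');
""")


def duplicate_rows(conn):
    """Return every tableA row whose field2 value occurs more than once."""
    return conn.execute("""
        SELECT a.idfield, a.field1, a.field2
        FROM tableA a
        JOIN (SELECT field2
              FROM tableA
              GROUP BY field2
              HAVING COUNT(*) > 1) asum
          ON asum.field2 = a.field2
        ORDER BY a.field2, a.idfield
    """).fetchall()


dups = duplicate_rows(conn)
print(dups)
```

Only the rows with field2 = 'x' or 'y' come back; the row with the unique value 'z' is filtered out by the join.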

Related

MySQL - Selecting rows where fields not equal

I have a little problem with an SQL query: I have 'TableA' with a field 'TableA.b' that contains an ID for 'TableB'. I want to select all rows from 'TableB' whose ID does not equal any value of 'TableA.b'. In other words, I need every row from TableB that is not referred to by any row from TableA in field 'TableA.b'.
I tried a Query like this :
SELECT DISTINCT TableB.* FROM TableA, TableB Where TableA.b != TableB.ID
But the result contains a row that is also returned by the negation, i.e. where both fields have the same value.
Any ideas?
What you need is a LEFT (or RIGHT) JOIN.
SELECT TableB.* FROM TableB
LEFT JOIN TableA ON TableA.b = TableB.ID
WHERE TableA.b IS NULL
While it's possible to do the same with a subquery, as in some of the other answers, a join will often be faster.
A LEFT [OUTER] JOIN can be faster than an equivalent subquery because
the server might be able to optimize it better—a fact that is not
specific to MySQL Server alone. Prior to SQL-92, outer joins did not
exist, so subqueries were the only way to do certain things. Today,
MySQL Server and many other modern database systems offer a wide range
of outer join types.
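A small runnable sketch of the anti-join pattern (LEFT JOIN starting from TableB, then IS NULL on the TableA side), next to the != attempt from the question. It uses SQLite via Python's sqlite3 module as a stand-in for MySQL; the sample rows and the extra name column are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableB (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, b INTEGER);
    INSERT INTO TableB VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO TableA VALUES (10, 1), (11, 1), (12, 3);  -- nothing refers to ID 2
""")

# Anti-join: keep the TableB rows for which the outer join found no partner.
unreferenced = conn.execute("""
    SELECT TableB.*
    FROM TableB
    LEFT JOIN TableA ON TableA.b = TableB.ID
    WHERE TableA.b IS NULL
    ORDER BY TableB.ID
""").fetchall()

# The attempt from the question: a TableB row survives as soon as ANY
# TableA row has a different b, so referenced rows are returned too.
cross_join_attempt = conn.execute("""
    SELECT DISTINCT TableB.*
    FROM TableA, TableB
    WHERE TableA.b != TableB.ID
    ORDER BY TableB.ID
""").fetchall()

print(unreferenced)
print(cross_join_attempt)
```

With this data the anti-join returns only the row with ID 2, while the != version returns all three TableB rows.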
First, select all ids from TableA:
SELECT DISTINCT b FROM TableA
Then use that result to select all rows in TableB that have an id that does not exist in this set by using the above query as a subquery:
SELECT * FROM TableB WHERE ID NOT IN (SELECT DISTINCT b FROM TableA)
Hope this helps.
You can try this
SELECT TableB.* FROM TableB
WHERE ID NOT IN
(SELECT b from TableA);
Use NOT IN in SELECT Query.
SELECT * FROM TableB t1 WHERE t1.ID NOT IN (SELECT t2.b FROM TableA t2);
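One caveat with NOT IN that is worth checking on your own data: if TableA.b can be NULL, the NOT IN comparison evaluates to NULL for every row and the query returns nothing. The sketch below shows this on SQLite (via Python's sqlite3, used here only to keep the example runnable; MySQL follows the same three-valued logic). The sample data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableB (ID INTEGER PRIMARY KEY);
    CREATE TABLE TableA (b INTEGER);  -- nullable reference to TableB.ID
    INSERT INTO TableB VALUES (1), (2), (3);
    INSERT INTO TableA VALUES (1), (NULL);
""")

# A NULL in the subquery result makes "ID NOT IN (...)" unknown for every row.
not_in = conn.execute("""
    SELECT ID FROM TableB
    WHERE ID NOT IN (SELECT b FROM TableA)
    ORDER BY ID
""").fetchall()

# Filtering the NULLs out restores the intended result.
not_in_filtered = conn.execute("""
    SELECT ID FROM TableB
    WHERE ID NOT IN (SELECT b FROM TableA WHERE b IS NOT NULL)
    ORDER BY ID
""").fetchall()

# NOT EXISTS is not affected by NULLs in TableA.b.
not_exists = conn.execute("""
    SELECT ID FROM TableB
    WHERE NOT EXISTS (SELECT 1 FROM TableA WHERE TableA.b = TableB.ID)
    ORDER BY ID
""").fetchall()

print(not_in, not_in_filtered, not_exists)
```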
You can use right join also.
Try this:
SELECT DISTINCT TableB.* FROM TableA RIGHT JOIN TableB ON TableA.b = TableB.ID WHERE TableA.b IS NULL

Cardinality violation when using a subquery that returns two values

I have created an SQL query that looks roughly like this:
select *
from A
where A.id in (select B.id1, B.id2 from B);
The main select should return the rows for which A.id matches either B.id1 or B.id2.
Clearly this solution doesn't work as the cardinality doesn't match in the where clause. How can I overcome this problem?
One solution would be to make two sub-queries, one for B.id1 and one for B.id2, but as my sub-query is much longer than in this example I was looking for a more elegant solution.
I'm using MySQL.
EDIT 1
As long as the syntax is simpler than using two sub-queries I have no issues using joins
EDIT 2
Thanks #NullSoulException. I tried the first solution and it works as expected!
Something like the below should do the trick.
select *
From table1 a , (select id1 , id2 from table2 ) b
where (a.id = b.id1) or (a.id = b.id2)
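or you can JOIN the same table twice by giving each copy its own alias. Use LEFT JOINs and keep the rows where either side matched; two INNER JOINs would only keep ids that match both id1 and id2.
select distinct a.* from table1 a
LEFT JOIN table2 b1 on a.id = b1.id1
LEFT JOIN table2 b2 on a.id = b2.id2
where b1.id1 is not null or b2.id2 is not null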
Please test the above against your datasets/tables..
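For comparison, here is a hedged sketch of three ways to express "A.id matches B.id1 or B.id2" and how their results differ. It runs on SQLite through Python's sqlite3 only so it can be executed as-is; the EXISTS form is an alternative not mentioned in the answer above, and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY);
    CREATE TABLE B (id1 INTEGER, id2 INTEGER);
    INSERT INTO A VALUES (1), (2), (3), (4);
    INSERT INTO B VALUES (1, 2), (1, 9), (9, 1);
""")

# Semi-join: each A row appears at most once, however many B rows match.
semi = conn.execute("""
    SELECT A.id FROM A
    WHERE EXISTS (SELECT 1 FROM B WHERE A.id IN (B.id1, B.id2))
    ORDER BY A.id
""").fetchall()

# Comma join with OR: correct ids, but one output row per matching B row.
comma = conn.execute("""
    SELECT A.id FROM A, B
    WHERE A.id = B.id1 OR A.id = B.id2
    ORDER BY A.id
""").fetchall()

# Two INNER JOINs: an id must occur in BOTH columns, so id 2 is lost.
both = conn.execute("""
    SELECT A.id FROM A
    JOIN B b1 ON A.id = b1.id1
    JOIN B b2 ON A.id = b2.id2
    ORDER BY A.id
""").fetchall()

print(semi, comma, both)
```

If each A row should appear once, the EXISTS form (or DISTINCT on top of the join) is the one to test first.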

DISTINCT in mysql query removing the records from resultset

I have three tables
TBL1      TBL2        TBL3
----      ------      --------
tbl1_id   tbl2_id     tbl3_id
cid       fkcid       fkcid
          fktbl1_id   fktbl2_id
I have a query to get records from TBL3:
select distinct tbl3.* from TBL3 tbl3
inner join TBL2 tbl2 on tbl2.tbl2_id = tbl3.fktbl2_id and tbl2.fkcid = tbl3.fkcid
inner join TBL1 tbl1 on tbl1.tbl1_id = tbl2.fktbl1_id and tbl2.fkcid = tbl1.cid;
This query gives me around 1000 records.
But when I remove DISTINCT from the query, it gives me around 1100 records.
There are no duplicate records in the table. I also confirmed that these extra 100 are not duplicates. Please note that these extra 100 records do not appear in the result of the query with the DISTINCT keyword.
Why is this query behaving unexpectedly? Please help me understand this more clearly, and correct me if I am making a mistake.
Thank you
You have multiple records in tbl1 or tbl2 that map to the same tbl3, and since you're only selecting tbl3.* in your output, DISTINCT removes the duplication. To instead find what the duplicates are, remove the DISTINCT, add a COUNT(*) to the SELECT clause, and add at the end a GROUP BY and HAVING, such as:
select tbl3.*, count(*)
from TBL3 tbl3
inner join TBL2 tbl2 on tbl2.tbl2_id = tbl3.fktbl2_id and tbl2.fkcid = tbl3.fkcid
inner join TBL1 tbl1 on tbl1.tbl1_id = tbl2.fktbl1_id and tbl2.fkcid = tbl1.cid
group by tbl3.tbl3_id, tbl3.fkcid, tbl3.fktbl2_id having count(*) > 1;
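The effect can be reproduced on a tiny data set. The sketch below assumes, purely for illustration, that tbl2_id is not unique in TBL2 (two TBL2 rows share tbl2_id 10); whether that is what happens in the real tables is something only the GROUP BY query can tell. It runs on SQLite via Python's sqlite3 so that it is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TBL1 (tbl1_id INTEGER, cid INTEGER);
    CREATE TABLE TBL2 (tbl2_id INTEGER, fkcid INTEGER, fktbl1_id INTEGER);
    CREATE TABLE TBL3 (tbl3_id INTEGER, fkcid INTEGER, fktbl2_id INTEGER);
    INSERT INTO TBL1 VALUES (1, 100), (2, 100);
    -- Assumption: tbl2_id 10 occurs twice, pointing at different TBL1 rows.
    INSERT INTO TBL2 VALUES (10, 100, 1), (10, 100, 2), (11, 100, 1);
    INSERT INTO TBL3 VALUES (1000, 100, 10), (1001, 100, 11);
""")

JOINS = """
    FROM TBL3 tbl3
    INNER JOIN TBL2 tbl2 ON tbl2.tbl2_id = tbl3.fktbl2_id AND tbl2.fkcid = tbl3.fkcid
    INNER JOIN TBL1 tbl1 ON tbl1.tbl1_id = tbl2.fktbl1_id AND tbl2.fkcid = tbl1.cid
"""

plain = conn.execute("SELECT tbl3.*" + JOINS).fetchall()
distinct = conn.execute("SELECT DISTINCT tbl3.*" + JOINS).fetchall()

# Which TBL3 rows were multiplied by the joins, and how many times?
fan_out = conn.execute(
    "SELECT tbl3.*, COUNT(*)" + JOINS +
    " GROUP BY tbl3.tbl3_id, tbl3.fkcid, tbl3.fktbl2_id HAVING COUNT(*) > 1"
).fetchall()

print(len(plain), len(distinct), fan_out)
```

TBL3 itself has no duplicates here, yet the plain query returns three rows and the DISTINCT query two, because row 1000 is matched twice.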

SQL - left join with OR operator (MySQL)

I have a left join on a MySQL database like this...
select *
from tableA
left join tableB on (tableA.id=tableB.id1 and tableB.col2='Red')
...which performs OK with >500K rows on tableA and tableB
However, changing it to this (and assuming indexes are OK)...
select *
from tableA
left join tableB on
((tableA.id=tableB.id1 and tableB.col2='Red') OR
(tableA.id=tableB.id2 and tableB.col2='Blue') )
...kills it, in terms of performance.
So why the performance hit? Can I do it another way?
EDIT
I'm not really sure what you need.
Can you show some expected results?
Can you tell us what you mean by "kills it, in terms of performance" (does it go to 20 seconds of execution time)?
I don't believe it's more efficient, but try it:
select
*
from
tableA as a
left join tableB as b1
on a.id=b1.id1
and b1.col2='Red'
left join tableB as b2
on a.id=b2.id2
and b2.col2='Blue'
where
(b1.id1 is not null or b2.id2 is not null)
or (b1.id1 is null and b2.id2 is null)
You have to manage the result in the SELECT with CASE WHEN...
You can compare the performance and put indexes on the appropriate columns (this depends on what you have in the full table and query, but here it should be id, id1 and col2).
Oh, I didn't notice that. How about this? I can't run it here, so you'll be able to judge better.
select *
from tableB, tableA
where (tableA.id=tableB.id1 and tableB.col2='Red') OR
(tableA.id=tableB.id2 and tableB.col2='Blue')
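Another rewrite that is commonly tried for an OR in a join condition is to split it into one query per branch and glue the pieces together with UNION ALL, adding the unmatched tableA rows separately to keep the LEFT JOIN semantics. This is a sketch under assumptions, not a guaranteed speed-up: the two composite indexes are hypothetical, and whether MySQL actually uses them has to be checked with EXPLAIN. The example runs on SQLite via Python's sqlite3 and only verifies that both forms return the same rows.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY);
    CREATE TABLE tableB (id1 INTEGER, id2 INTEGER, col2 TEXT);
    -- Hypothetical indexes, one per branch of the OR.
    CREATE INDEX idx_b_id1_col2 ON tableB (id1, col2);
    CREATE INDEX idx_b_id2_col2 ON tableB (id2, col2);
    INSERT INTO tableA VALUES (1), (2), (3), (4);
    INSERT INTO tableB VALUES
        (1, 0, 'Red'), (0, 2, 'Blue'), (1, 1, 'Blue'), (3, 0, 'Green'), (2, 2, 'Red');
""")

# Original form: a single LEFT JOIN with OR in the ON clause.
or_rows = conn.execute("""
    SELECT a.id, b.id1, b.id2, b.col2
    FROM tableA a
    LEFT JOIN tableB b
      ON (a.id = b.id1 AND b.col2 = 'Red')
      OR (a.id = b.id2 AND b.col2 = 'Blue')
""").fetchall()

# Rewrite: one inner join per branch, plus the tableA rows with no match.
# The branches cannot overlap because col2 is never both 'Red' and 'Blue'.
union_rows = conn.execute("""
    SELECT a.id, b.id1, b.id2, b.col2
    FROM tableA a JOIN tableB b ON a.id = b.id1 AND b.col2 = 'Red'
    UNION ALL
    SELECT a.id, b.id1, b.id2, b.col2
    FROM tableA a JOIN tableB b ON a.id = b.id2 AND b.col2 = 'Blue'
    UNION ALL
    SELECT a.id, NULL, NULL, NULL
    FROM tableA a
    WHERE NOT EXISTS (SELECT 1 FROM tableB b WHERE b.id1 = a.id AND b.col2 = 'Red')
      AND NOT EXISTS (SELECT 1 FROM tableB b WHERE b.id2 = a.id AND b.col2 = 'Blue')
""").fetchall()

# Compare as multisets, since neither query specifies an order.
same_result = Counter(or_rows) == Counter(union_rows)
print(len(or_rows), same_result)
```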

MySQL trying to reuse results of subquery in an efficient way

I have a query like this:
SELECT q,COUNT(x),y,
(SELECT i FROM (SELECT q,w FROM tableA WHERE conds)
JOIN tableC ON (cond)
WHERE id = t.q)
FROM (SELECT q,w FROM tableA WHERE conds) t
JOIN tableB
GROUP BY q
The subquery (SELECT q,w FROM tableA WHERE conds) returns several hundred rows. After the GROUP BY q there is around 20 rows left.
The subquery (SELECT i FROM (SELECT q,w FROM tableA WHERE conds) join tableC WHERE id = t.q) uses exactly the same subquery as the one above inside it, but then also selects a fraction of the results based on which q value is currently being grouped.
My problem seems to be this: the performance is too slow because I can't seem to put the WHERE id = t.q inside the (SELECT q,w FROM tableA WHERE conds) subquery. I can only guess that the query is run once for every unique value of q, produces hundreds of rows, and then has to apply the WHERE clause to an un-indexed temporary table. I think I need to perform the WHERE before the full join.
Any ideas please?
This query could produce the same results, but so much information is missing from the question, who can be sure?
Select
q,
count(x),
y,
i
From
tableA a
inner join
tableC c
on cond and c.id = a.q
cross join -- is this an inner join?
tableB b
Where
conds
Group By
q,
y,
i