Compare 2 tables and find non duplicate entries - mysql

I have 2 tables. I want to check if columns of table 1 don't have duplicates in columns of table2.
Here is how the search should work!
If no duplicates are found, I want to get the row name from table1.

If I got you right, this is what you want.
SELECT
t1.name
FROM
Table1 t1
WHERE
t1.name
NOT IN
(
SELECT t2.name
FROM Table2 t2
JOIN t1
ON t2.name = t1.name
)

You need to specify a column (or columns) that you will use to "match" the rows, to determine whether they are "duplicates".
I'm going to assume (absent any schema information), that the column name is id.
An "anti-join" pattern is usually the best performing option:
SELECT a.id
FROM table1 a
LEFT
JOIN table2 b
ON a.id = b.id
WHERE b.id IS NULL
(Performance is dependent on a whole bunch of factors.)
Your other options are to use a NOT EXISTS predicate:
SELECT a.id
FROM table1 a
WHERE NOT EXISTS
( SELECT 1
FROM table2 b
WHERE b.id = a.id
)
Or, use a NOT IN predicate:
SELECT a.id
FROM table1 a
WHERE a.id NOT IN
( SELECT b.id
FROM table2 b
WHERE b.id IS NOT NULL
)
The generated execution plan and performance of each of these statements will likely differ. With large sets, the "anti-join" pattern (the first query) usually performs best.

Related

Sum the same value with different condition

Ok, the title is cryptic but I don't know how to sintetize it better.
I have a series of expensive similar SELECT SUM queries that must be executed in sequence.
Example:
SELECT SUM(t2.Field)
FROM Table1 AS t1
INNER JOIN (
SELECT Field FROM Table2
WHERE [list of where]
) AS t2 ON ti.ExtKey = t2.Key
WHERE t1.TheValue = 'Orange'
SELECT SUM(t2.Field)
FROM Table1 AS t1
INNER JOIN (
SELECT Field FROM Table2
WHERE [list of where]
) AS t2 ON ti.ExtKey = t2.Key
WHERE t1.TheValue = 'Apple'
And so on.
I've used the nested inner join because after some test it resulted faster than a plain Join.
The rows selected for Table2 are always the same, or at least the same for session.
There's a way to group all the queries in one to speed up the execution?
I was thinking about using a material view, but this would complicate very much the design and maintenance.
I am no sure about your goal. I have a guess for you:
http://sqlfiddle.com/#!9/af66e/2
http://sqlfiddle.com/#!9/af66e/1
SELECT
SUM(IF(t1.TheValue = 'Orange',t2.Field,0)) as oranges,
SUM(IF(t1.TheValue = 'Apple',t2.Field,0)) as apples
FROM Table1 AS t1
INNER JOIN (
SELECT Field, `key` FROM Table2
) AS t2 ON t1.ExtKey = t2.`key`
# GROUP BY t1.extkey uncomment if you need it
If you can provide raw data sample and expected result that would help a lot.
I think you want a group by:
SELECT t1.TheValue, SUM(t2.Field)
FROM Table1 t1 INNER JOIN
(SELECT Field
FROM Table2
WHERE [list of where]
) t2
ON t1.ExtKey = t2.Key
GROUP BY t1.theValue;
Note that your query doesn't quite make sense, because t2 doesn't have a column called key. I assume this is an oversight in the question.
If you want to limit it to particular values, then use a WHERE clause before the GROUP BY:
WHERE t1.TheValue IN ('Apple', 'Orange', 'Pear')

3 tables and 2 left joins

Query 1:
SELECT sum(total_revenue_usd)
FROM table1 c
WHERE c.irt1_search_campaign_id IN (
SELECT assign_id
FROM table2 ga
LEFT JOIN table3 d
ON d.campaign_id = ga.assign_id
)
Query 2:
SELECT sum(total_revenue_usd)
FROM table1 c
LEFT JOIN table2 ga
ON c.irt1_search_campaign_id = ga.assign_id
LEFT JOIN table3 d
ON d.campaign_id = ga.assign_id
Query 1 gives me the correct result where as I need it in the second style without using 'in'. However Query 2 doesn't give the same result.
How can I change the first query without using 'in' ?
The reason being is that the small query is part of a much larger query, there are other conditions that won't work with 'in'
You could try something along the lines of
SELECT sum(total_revenue_usd)
FROM table1 c
JOIN
(
SELECT DISTINCT ga.assign_id
FROM table2 ga
JOIN table3 d
ON d.campaign_id = ga.assign_id
) x
ON c.irt1_search_campaign_id = x.assign_id
The queries do very different things:
The first query sums the total_revenue_usd from table1 where irt1_search_campaign_id exists in table2 as assign_id. (The outer join to table3 is absolutely unnecessary, by the way, because it doesn't change wether a table2.assign_id exists or not.) As you look for existence in table2, you can of course replace IN with EXISTS.
The second query gets you combinations of table1, table2 and table3. So, in case there are two records in table2 for an entry in table1 and three records in table3 for each of the two table2 records, you will get six records for the one table1 record. Thus you sum its total_revenue_usd sixfold. This is not what you want. Don't join table1 with the other tables.
EDIT: Here is the query using an exists clause. As mentioned, outer joining table3 doesn't alter the results.
Select sum(total_revenue_usd)
from table1 c
where exists
(
select *
from table2 ga
-- left join table3 d on d.campaign_id = ga.assign_id
where ga.assign_id = c.irt1_search_campaign_id
);

Select from table1 where similar rows do NOT appear in table2?

I've been struggling with this for a while, and haven't been able to find any examples to point me in the right direction.
I have 2 MySQL tables that are virtually identical in structure. I'm trying to perform a query that returns results from Table 1 where the same data isn't present in table 2. For example, imagine both tables have 3 fields - fieldA, fieldB and fieldC. I need to exclude results where the data is identical in all 3 fields.
Is it even possible?
There are several ways to do it (assuming the fields don't allow NULLs):
SELECT a, b, c FROM Table1 T1 WHERE NOT EXISTS
(SELECT * FROM Table2 T2 WHERE T2.a = T1.a AND T2.b = T1.b AND T2.c = T1.c)
or
SELECT T1.a, T1.b, T1.c FROM Table1 T1
LEFT OUTER JOIN Table2 T2 ON T2.a = T1.a AND T2.b = T1.b AND T2.c = T1.c
WHERE T2.a IS NULL
select
t1.*
from
table1 t1
left join table2 t2 on
t1.fieldA = t2.fieldA and
t1.fieldB = t2.fieldB and
t1.fieldC = t2.fieldC
where
t2.fieldA is null
Note that this will not work if any of the fields is NULL in both tables. The expression NULL = NULL returns false, so these records are excluded as well.
This is a perfect use of EXCEPT (the key word/phase is "set difference"). However, MySQL lacks it. But no fear, a work-around is here:
Intersection and Set-Difference in MySQL (A workaround for EXCEPT)
Please not that approaches using NOT EXISTS in MySQL (as per above link) are actually less than ideal although they are semantically correct. For an explanation of the performance differences with the above (and alternative) approaches as handled my MySQL, complete with examples, see NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL:
That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
Happy coding.
The 'left join' is very slow in MYSQL. The gifford algorithm shown below speeds it many orders of magnitude.
select * from t1
inner join
(select fieldA from
(select distinct fieldA, 1 as flag from t1
union all
select distinct fieldA, 2 as flag from t2) a
group by fieldA
having sum(flag) = 1) b on b.fieldA = t1.fieldA;

Inserting distinct entries into the database

I have two tables with exactly the same fields. Table A contains 7160 records and table B 7130 records.Now I want to insert distinct records from table A into table B such that B should not have any duplicate entry in it. How should I go about doing this?
This basically selects records that are in A that are not in B. It would work, but you might have to tweak the field you use to uniquely identify a record. In this example I used field 'ID' but you might have to change that to A.field1 = B.field1 AND A.field2 = B.field2 etc.
INSERT INTO TABLEB
(
SELECT A.*
FROM TABLEA A
LEFT JOIN TABLEB B ON A.ID = B.ID
WHERE B.ID IS NULL
)
You can use a "union" query to combine the results from multiple tables into a single result set. "union" will only return distinct rows from all tables.
See this page for more info:
http://www.tutorialspoint.com/mysql/mysql-union-keyword.htm
insert into tableB (id)
select t1.id from tableA t1
where t1.id not in (select t2.id from tableB t2)

How to retrieve non-matching results in mysql

I'm sure this is straight-forward, but how do I write a query in mysql that joins two tables and then returns only those records from the first table that don't match. I want it to be something like:
Select tid from table1 inner join table2 on table2.tid = table1.tid where table1.tid != table2.tid;
but this doesn't seem to make alot of sense!
You can use a left outer join to accomplish this:
select
t1.tid
from
table1 t1
left outer join table2 t2 on
t1.tid = t2.tid
where
t2.tid is null
What this does is it takes your first table (table1), joins it with your second table (table2), and fills in null for the table2 columns in any row in table1 that doesn't match a row in table2. Then, it filters that out by selecting only the table1 rows where no match could be found.
Alternatively, you can also use not exists:
select
t1.tid
from
table1 t1
where
not exists (select 1 from table2 t2 where t2.tid = t1.tid)
This performs a left semi join, and will essentially do the same thing that the left outer join does. Depending on your indexes, one may be faster than the other, but both are viable options. MySQL has some good documentation on optimizing the joins, so you should check that out..