Select from table1 where similar rows do NOT appear in table2? - mysql

I've been struggling with this for a while, and haven't been able to find any examples to point me in the right direction.
I have 2 MySQL tables that are virtually identical in structure. I'm trying to perform a query that returns results from Table 1 where the same data isn't present in table 2. For example, imagine both tables have 3 fields - fieldA, fieldB and fieldC. I need to exclude results where the data is identical in all 3 fields.
Is it even possible?

There are several ways to do it (assuming the fields don't allow NULLs):
SELECT a, b, c FROM Table1 T1 WHERE NOT EXISTS
(SELECT * FROM Table2 T2 WHERE T2.a = T1.a AND T2.b = T1.b AND T2.c = T1.c)
or
SELECT T1.a, T1.b, T1.c FROM Table1 T1
LEFT OUTER JOIN Table2 T2 ON T2.a = T1.a AND T2.b = T1.b AND T2.c = T1.c
WHERE T2.a IS NULL
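To see that both forms return the same rows, here is a minimal sketch that runs them against an in-memory SQLite database. SQLite stands in for MySQL only so the example is self-contained, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite is used here instead of MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (a INTEGER NOT NULL, b INTEGER NOT NULL, c INTEGER NOT NULL);
    CREATE TABLE Table2 (a INTEGER NOT NULL, b INTEGER NOT NULL, c INTEGER NOT NULL);
    INSERT INTO Table1 VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3);
    INSERT INTO Table2 VALUES (2, 2, 2), (3, 3, 9);
""")

# Variant 1: NOT EXISTS with a correlated subquery.
not_exists_rows = conn.execute("""
    SELECT a, b, c FROM Table1 T1
    WHERE NOT EXISTS (SELECT * FROM Table2 T2
                      WHERE T2.a = T1.a AND T2.b = T1.b AND T2.c = T1.c)
    ORDER BY a
""").fetchall()

# Variant 2: LEFT OUTER JOIN, keeping only rows that found no partner.
left_join_rows = conn.execute("""
    SELECT T1.a, T1.b, T1.c FROM Table1 T1
    LEFT OUTER JOIN Table2 T2 ON T2.a = T1.a AND T2.b = T1.b AND T2.c = T1.c
    WHERE T2.a IS NULL
    ORDER BY T1.a
""").fetchall()

print(not_exists_rows)
print(left_join_rows)
```

Only (2, 2, 2) is identical in all three fields, so it is the only row dropped; (3, 3, 3) stays because the third field differs.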

select
t1.*
from
table1 t1
left join table2 t2 on
t1.fieldA = t2.fieldA and
t1.fieldB = t2.fieldB and
t1.fieldC = t2.fieldC
where
t2.fieldA is null
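Note that this will not work as intended if any of the fields is NULL in both tables. The expression NULL = NULL evaluates to NULL rather than true, so such rows never match in the join and are returned even though an identical row exists in table2. MySQL's NULL-safe equality operator <=> treats two NULLs as equal and can be used in the join conditions instead, as long as the column tested in the WHERE clause is never NULL in table2.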
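The NULL behaviour can be reproduced with a small sketch. It runs on an in-memory SQLite database, where the IS operator plays the role of a NULL-safe comparison (MySQL spells this <=>). The helper name missing_rows and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (fieldA INTEGER, fieldB INTEGER, fieldC INTEGER);
    CREATE TABLE table2 (fieldA INTEGER, fieldB INTEGER, fieldC INTEGER);
    INSERT INTO table1 VALUES (1, NULL, 3), (4, 5, 6);
    INSERT INTO table2 VALUES (1, NULL, 3);
""")

def missing_rows(op):
    """Run the left join anti-join using `op` as the comparison operator."""
    sql = f"""
        select t1.*
        from table1 t1
        left join table2 t2 on
            t1.fieldA {op} t2.fieldA and
            t1.fieldB {op} t2.fieldB and
            t1.fieldC {op} t2.fieldC
        where t2.fieldA is null
        order by t1.fieldA
    """
    return conn.execute(sql).fetchall()

# Plain equality: NULL = NULL is not true, so (1, NULL, 3) finds no partner.
with_equals = missing_rows("=")

# NULL-safe comparison (IS in SQLite): the two NULLs compare as equal.
with_null_safe = missing_rows("IS")

print(with_equals)
print(with_null_safe)
```

With plain equality the row containing NULL is reported as missing even though table2 holds an identical row; with the NULL-safe operator only (4, 5, 6) is reported. The sketch relies on fieldA never being NULL in table2, since that column is what the WHERE clause tests.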

This is a perfect use of EXCEPT (the key word/phrase is "set difference"). However, MySQL lacks it (EXCEPT was only added in MySQL 8.0.31). But no fear, a work-around is here:
Intersection and Set-Difference in MySQL (A workaround for EXCEPT)
Please note that approaches using NOT EXISTS in MySQL (as per the above link) are actually less than ideal, although they are semantically correct. For an explanation of the performance differences between the above (and alternative) approaches as handled by MySQL, complete with examples, see NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL:
That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
Happy coding.

The 'left join' can be very slow in MySQL. The gifford algorithm shown below can speed it up by many orders of magnitude. Note that, as written, it matches on fieldA only.
select * from t1
inner join
(select fieldA from
(select distinct fieldA, 1 as flag from t1
union all
select distinct fieldA, 2 as flag from t2) a
group by fieldA
having sum(flag) = 1) b on b.fieldA = t1.fieldA;
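As a rough check of what this query returns, the sketch below runs it on an in-memory SQLite database with made-up rows. The extra column fieldB, the t1.* projection and the ORDER BY are additions for the illustration; the flag logic is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (fieldA INTEGER, fieldB TEXT);
    CREATE TABLE t2 (fieldA INTEGER, fieldB TEXT);
    INSERT INTO t1 VALUES (1, 'x'), (2, 'y'), (2, 'z'), (3, 'w');
    INSERT INTO t2 VALUES (2, 'y'), (4, 'q');
""")

# Each distinct fieldA contributes flag 1 if it occurs in t1 and flag 2 if it
# occurs in t2. A sum of exactly 1 therefore means "present in t1 only".
only_in_t1 = conn.execute("""
    select t1.* from t1
    inner join
        (select fieldA from
            (select distinct fieldA, 1 as flag from t1
             union all
             select distinct fieldA, 2 as flag from t2) a
         group by fieldA
         having sum(flag) = 1) b on b.fieldA = t1.fieldA
    order by t1.fieldA
""").fetchall()

print(only_in_t1)
```

The value 2 occurs in both tables (sum 3) and 4 occurs only in t2 (sum 2), so neither survives the HAVING filter. No timing is measured here; the sketch only shows which rows come back.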

Related

Sum the same value with different condition

Ok, the title is cryptic but I don't know how to summarize it better.
I have a series of expensive similar SELECT SUM queries that must be executed in sequence.
Example:
SELECT SUM(t2.Field)
FROM Table1 AS t1
INNER JOIN (
SELECT Field FROM Table2
WHERE [list of where]
) AS t2 ON t1.ExtKey = t2.Key
WHERE t1.TheValue = 'Orange'
SELECT SUM(t2.Field)
FROM Table1 AS t1
INNER JOIN (
SELECT Field FROM Table2
WHERE [list of where]
) AS t2 ON t1.ExtKey = t2.Key
WHERE t1.TheValue = 'Apple'
And so on.
I've used the nested inner join because, after some testing, it turned out to be faster than a plain join.
The rows selected from Table2 are always the same, or at least the same within a session.
Is there a way to combine all the queries into one to speed up execution?
I was thinking about using a materialized view, but this would greatly complicate the design and maintenance.
I am not sure about your goal. Here is my guess:
http://sqlfiddle.com/#!9/af66e/2
http://sqlfiddle.com/#!9/af66e/1
SELECT
SUM(IF(t1.TheValue = 'Orange',t2.Field,0)) as oranges,
SUM(IF(t1.TheValue = 'Apple',t2.Field,0)) as apples
FROM Table1 AS t1
INNER JOIN (
SELECT Field, `key` FROM Table2
) AS t2 ON t1.ExtKey = t2.`key`
# GROUP BY t1.extkey uncomment if you need it
If you can provide raw data sample and expected result that would help a lot.
I think you want a group by:
SELECT t1.TheValue, SUM(t2.Field)
FROM Table1 t1 INNER JOIN
(SELECT Field
FROM Table2
WHERE [list of where]
) t2
ON t1.ExtKey = t2.Key
GROUP BY t1.theValue;
Note that your query doesn't quite make sense, because t2 doesn't have a column called key. I assume this is an oversight in the question.
If you want to limit it to particular values, then use a WHERE clause before the GROUP BY:
WHERE t1.TheValue IN ('Apple', 'Orange', 'Pear')
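Both suggestions (one conditional SUM per value, or a single GROUP BY) can be tried side by side. The sketch below uses an in-memory SQLite database with invented rows; CASE replaces MySQL's IF() because that is the portable spelling, and the Key column is quoted since it is a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ExtKey INTEGER, TheValue TEXT);
    CREATE TABLE Table2 ("Key" INTEGER, Field INTEGER);
    INSERT INTO Table2 VALUES (1, 10), (2, 20), (3, 5);
    INSERT INTO Table1 VALUES (1, 'Orange'), (2, 'Apple'), (3, 'Orange'), (3, 'Pear');
""")

# One pass over the join, one output column per value.
pivot = conn.execute("""
    SELECT
        SUM(CASE WHEN t1.TheValue = 'Orange' THEN t2.Field ELSE 0 END) AS oranges,
        SUM(CASE WHEN t1.TheValue = 'Apple' THEN t2.Field ELSE 0 END) AS apples
    FROM Table1 AS t1
    INNER JOIN Table2 AS t2 ON t1.ExtKey = t2."Key"
""").fetchone()

# One pass over the join, one output row per value.
grouped = dict(conn.execute("""
    SELECT t1.TheValue, SUM(t2.Field)
    FROM Table1 t1
    INNER JOIN Table2 t2 ON t1.ExtKey = t2."Key"
    WHERE t1.TheValue IN ('Apple', 'Orange')
    GROUP BY t1.TheValue
""").fetchall())

print(pivot)
print(grouped)
```

Either way the join is evaluated once instead of once per value. The GROUP BY form needs no query change when a new value appears, while the conditional form returns everything in a single row.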

Compare 2 tables and find non duplicate entries

I have 2 tables. I want to check if columns of table 1 don't have duplicates in columns of table2.
Here is how the search should work!
If no duplicates are found, I want to get the row name from table1.
If I got you right, this is what you want.
SELECT
t1.name
FROM
Table1 t1
WHERE
t1.name
NOT IN
(
SELECT t2.name
FROM Table2 t2
WHERE t2.name IS NOT NULL
)
You need to specify a column (or columns) that you will use to "match" the rows, to determine whether they are "duplicates".
I'm going to assume (absent any schema information), that the column name is id.
An "anti-join" pattern is usually the best performing option:
SELECT a.id
FROM table1 a
LEFT
JOIN table2 b
ON a.id = b.id
WHERE b.id IS NULL
(Performance is dependent on a whole bunch of factors.)
Your other options are to use a NOT EXISTS predicate:
SELECT a.id
FROM table1 a
WHERE NOT EXISTS
( SELECT 1
FROM table2 b
WHERE b.id = a.id
)
Or, use a NOT IN predicate:
SELECT a.id
FROM table1 a
WHERE a.id NOT IN
( SELECT b.id
FROM table2 b
WHERE b.id IS NOT NULL
)
The generated execution plan and performance of each of these statements will likely differ. With large sets, the "anti-join" pattern (the first query) usually performs best.
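The three patterns can be compared on a small in-memory SQLite database (a stand-in for MySQL; the rows are invented). The sketch also shows why the NOT IN variant filters out NULLs in the subquery: without that filter a single NULL in table2 makes the predicate unknown for every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (NULL);
""")

def ids(sql):
    """Return the first column of each result row as a list."""
    return [row[0] for row in conn.execute(sql)]

anti_join = ids("""
    SELECT a.id FROM table1 a
    LEFT JOIN table2 b ON a.id = b.id
    WHERE b.id IS NULL
    ORDER BY a.id
""")

not_exists = ids("""
    SELECT a.id FROM table1 a
    WHERE NOT EXISTS (SELECT 1 FROM table2 b WHERE b.id = a.id)
    ORDER BY a.id
""")

not_in_filtered = ids("""
    SELECT a.id FROM table1 a
    WHERE a.id NOT IN (SELECT b.id FROM table2 b WHERE b.id IS NOT NULL)
    ORDER BY a.id
""")

# Same as above but without the IS NOT NULL filter in the subquery.
not_in_unfiltered = ids("""
    SELECT a.id FROM table1 a
    WHERE a.id NOT IN (SELECT b.id FROM table2 b)
    ORDER BY a.id
""")

print(anti_join, not_exists, not_in_filtered, not_in_unfiltered)
```

The first three queries agree on this data, and the unfiltered NOT IN returns no rows at all. Relative performance is not shown here and will depend on the engine, indexes and data volume.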

SQL Server 2008 column in only one table

First a quick explanation: I am actually dealing with four tables and mining data from different places but my problem comes down to this seemingly simple concept and yes I am very new to this...
I have two tables (one and two) that both have ID columns in them. I want to query only the ID columns that are in table two only, not in both. As in..
Select ID
From dbo.one, dbo.two
Where dbo.two != dbo.one
I actually thought this would work but I'm getting odd results. Can anyone help?
SELECT t2.ID
FROM dbo.two t2
WHERE NOT EXISTS(SELECT NULL
FROM dbo.one t1
WHERE t2.ID = t1.ID)
This could also be done with a LEFT JOIN:
SELECT t2.ID
FROM dbo.two t2
LEFT JOIN dbo.one t1
ON t2.ID = t1.ID
WHERE t1.ID IS NULL
Completing the other 2 options after Joe's answer...
SELECT id
FROM dbo.two
EXCEPT
SELECT id
FROM dbo.one
SELECT t2.ID
FROM dbo.two t2
WHERE t2.ID NOT IN (SELECT t1.ID FROM dbo.one t1)
Note: LEFT JOIN will be slower than the other three, which should all give the same plan.
That's because LEFT JOIN is a join followed by a filter, whereas the other three are (anti) semi-joins.
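One difference worth knowing when choosing between these: EXCEPT is a set operation and removes duplicate rows, while NOT EXISTS keeps them. A minimal sketch on an in-memory SQLite database (used here in place of SQL Server, so the dbo schema prefix is dropped; the rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE one (ID INTEGER NOT NULL);
    CREATE TABLE two (ID INTEGER NOT NULL);
    INSERT INTO one VALUES (2);
    INSERT INTO two VALUES (1), (2), (3), (3);
""")

# Set difference: each surviving ID appears once.
except_ids = [row[0] for row in conn.execute("""
    SELECT ID FROM two
    EXCEPT
    SELECT ID FROM one
    ORDER BY ID
""")]

# Row-by-row test: duplicates in table two are preserved.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT t2.ID
    FROM two t2
    WHERE NOT EXISTS (SELECT NULL FROM one t1 WHERE t2.ID = t1.ID)
    ORDER BY t2.ID
""")]

print(except_ids)
print(not_exists_ids)
```

If ID is unique in table two, both queries return the same rows.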

which method is better to join mysql tables?

What is the difference between these two methods of selecting data from multiple tables? The first one does not use JOIN while the second does. Which one is the preferred method?
Method 1:
SELECT t1.a, t1.b, t2.c, t2.d, t3.e, t3.f
FROM table1 t1, table2 t2, table3 t3
WHERE t1.id = t2.id
AND t2.id = t3.id
AND t3.id = x
Method 2:
SELECT t1.a, t1.b, t2.c, t2.d, t3.e, t3.f
FROM `table1` t1
JOIN `table2` t2 ON t1.id = t2.id
JOIN `table3` t3 ON t1.id = t3.id
WHERE t1.id = x
For your simple case, they're equivalent. Even though the 'JOIN' keyword is not present in Method #1, it's still doing joins.
However, method #2 offers the flexibility of allowing extra conditions in the JOIN condition that can't be accomplished via WHERE clauses. Such as when you're doing aliased multi-joins against the same table.
select a.id, b.id, c.id
from sometable A
left join othertable as b on a.id=b.a_id and some_condition_in_othertable
left join othertable as c on a.id=c.a_id and other_condition_in_othertable
Putting the two extra conditions in the WHERE clause instead would discard every row for which either join found no match (in the worst case returning nothing), because WHERE filters after the joins, whereas conditions in the ON clause only restrict which rows are joined.
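Here is a small sketch of that difference on an in-memory SQLite database. The kind column and its values 'x' and 'y' are hypothetical stand-ins for some_condition_in_othertable and other_condition_in_othertable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sometable (id INTEGER);
    CREATE TABLE othertable (a_id INTEGER, kind TEXT);
    INSERT INTO sometable VALUES (1), (2);
    INSERT INTO othertable VALUES (1, 'x'), (2, 'y');
""")

# Extra conditions inside ON: every sometable row is kept, and a side that
# finds no partner is simply filled with NULL.
on_rows = conn.execute("""
    select a.id, b.kind, c.kind
    from sometable a
    left join othertable as b on a.id = b.a_id and b.kind = 'x'
    left join othertable as c on a.id = c.a_id and c.kind = 'y'
    order by a.id
""").fetchall()

# Same conditions moved to WHERE: they are applied after the joins, so a row
# survives only if both aliases satisfy their condition at once.
where_rows = conn.execute("""
    select a.id, b.kind, c.kind
    from sometable a
    left join othertable as b on a.id = b.a_id
    left join othertable as c on a.id = c.a_id
    where b.kind = 'x' and c.kind = 'y'
""").fetchall()

print(on_rows)
print(where_rows)
```

On this sample data the ON version returns both rows of sometable, while the WHERE version returns nothing.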
The methods are apparently identical in performance, it's just new vs old syntax.
I don't think there is much of a difference. You could use the EXPLAIN statement to check if MySQL does anything differently. For this trivial example I doubt it matters.

mysql SELECT NOT IN () -- disjoint set?

I'm having a problem getting a query to work, which I think should work. It's in the form
SELECT DISTINCT a, b, c FROM t1 WHERE NOT IN ( SELECT DISTINCT a,b,c FROM t2 ) AS alias
But mysql chokes where "IN (" starts. Does mysql support this syntax? If not, how can I go about getting these results? I want to find distinct tuples of (a,b,c) in table 1 that don't exist in table 2.
You should use not exists:
SELECT DISTINCT a, b, c FROM t1 WHERE NOT EXISTS (SELECT NULL FROM t2 WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c)
Using NOT IN is not the best method to do this, even if you check only one key. The reason is that with NOT EXISTS the DBMS only has to check indices (if indices exist for the needed columns), whereas for NOT IN it has to read the actual data and create a full result set that subsequently needs to be checked.
Using a LEFT JOIN and then checking for NULL is also a bad idea. It will be painfully slow when the tables are big, since the query needs to perform the whole join, reading both tables fully and subsequently throwing away a lot of the result. Also, if the columns allow NULL values, checking for NULL will report false positives.
I had trouble figuring out the right way to execute this query, even with the answers provided; then I found the MySQL documentation reference I needed:
SELECT DISTINCT store_type
FROM stores
WHERE NOT EXISTS (SELECT * FROM cities_stores WHERE cities_stores.store_type = stores.store_type);
The trick I had to wrap my brain around was using the reference to the 'stores' table from the first query inside the subquery. Hope this helps (or helps others, since this is an old thread.)
From http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html
SELECT DISTINCT t1.* FROM t1 LEFT JOIN t2 ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c) WHERE t2.a IS NULL
As far as I know, NOT IN can only be used for 1 field at a time. And the field has to be specified in between "WHERE" and "NOT IN".
(Edit:)
Try using a NOT EXISTS:
SELECT a, b, c
FROM t1
WHERE NOT EXISTS
(SELECT *
FROM t2
WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c)
In addition, an inner join on a, b, and c being equal should give you all non-DISTINCT tuples, while a LEFT JOIN with a WHERE IS NULL clause should give you the DISTINCT ones, as Charles mentioned below.
Well, I'm going to answer my own question, in spite of all the great advice others gave.
Here's the proper syntax for what I was trying to do.
SELECT DISTINCT a, b, c FROM t1 WHERE (a,b,c) NOT IN ( SELECT DISTINCT a,b,c FROM t2 )
Can't vouch for the efficiency of it, but the broader question I was implicitly asking was "How do I express this thought in SQL", not "How do I get a particular result set". I know that's unfair to everyone who took a stab, sorry!
You need to add a column list after the WHERE keyword and remove the alias.
I tested this with a similar table and it is working.
SELECT DISTINCT a, b, c
FROM t1 WHERE (a,b,c)
NOT IN (SELECT DISTINCT a,b,c FROM t2)
Using the mysql world db:
-- don't include cities 1 and 2
SELECT DISTINCT id, name FROM city
WHERE (id, name)
NOT IN (SELECT id, name FROM city WHERE ID IN (1,2))
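The row-value form of NOT IN can be checked against the NOT EXISTS form with a short sketch. It uses an in-memory SQLite database instead of MySQL (row values need SQLite 3.15 or later), and the sample tuples are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER NOT NULL, b INTEGER NOT NULL, c INTEGER NOT NULL);
    CREATE TABLE t2 (a INTEGER NOT NULL, b INTEGER NOT NULL, c INTEGER NOT NULL);
    INSERT INTO t1 VALUES (1, 2, 3), (1, 2, 3), (4, 5, 6);
    INSERT INTO t2 VALUES (4, 5, 6), (7, 8, 9);
""")

# Tuple comparison: the column list goes between WHERE and NOT IN.
row_value_rows = conn.execute("""
    SELECT DISTINCT a, b, c
    FROM t1 WHERE (a, b, c)
    NOT IN (SELECT DISTINCT a, b, c FROM t2)
""").fetchall()

# Equivalent correlated subquery for comparison.
not_exists_rows = conn.execute("""
    SELECT DISTINCT a, b, c FROM t1
    WHERE NOT EXISTS (SELECT NULL FROM t2
                      WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c)
""").fetchall()

print(row_value_rows)
print(not_exists_rows)
```

The columns are declared NOT NULL on purpose: if any compared column can hold NULL, NOT IN and NOT EXISTS may stop agreeing.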