mysql SELECT NOT IN () -- disjoint set? - mysql

I'm having a problem getting a query to work, which I think should work. It's in the form
SELECT DISTINCT a, b, c FROM t1 WHERE NOT IN ( SELECT DISTINCT a,b,c FROM t2 ) AS alias
But mysql chokes where "IN (" starts. Does mysql support this syntax? If not, how can I go about getting these results? I want to find distinct tuples of (a,b,c) in table 1 that don't exist in table 2.

You should use not exists:
SELECT DISTINCT a, b, c FROM t1 WHERE NOT EXISTS (SELECT NULL FROM t2 WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c)
Using NOT IN is not the best method to do this, even if you check only one key. The reason is that if you use NOT EXISTS the DBMS will only have to check indices if indices exist for the needed columns, where as for NOT IN it will have to read the actual data and create a full result set that subsequently needs to be checked.
Using a LEFT JOIN and then checking for NULL is also a bad idea, it will be painfully slow when the tables are big since the query needs to make the whole join, reading both tables fully and subsequently throw away a lot of it. Also, if the columns allow for NULL values checking for NULL will report false positives.

I had trouble figuring out the right way to execute this query, even with the answers provided; then I found the MySQL documentation reference I needed:
SELECT DISTINCT store_type
FROM stores
WHERE NOT EXISTS (SELECT * FROM cities_stores WHERE cities_stores.store_type = stores.store_type);
The trick I had to wrap my brain around was using the reference to the 'stores' table from the first query inside the subquery. Hope this helps (or helps others, since this is an old thread.)
From http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html

SELECT DISTINCT t1.* FROM t1 LEFT JOIN t2 ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c) WHERE t2.a IS NULL

As far as I know, NOT IN can only be used for 1 field at a time. And the field has to be specified in between "WHERE" and "NOT IN".
(Edit:)
Try using a NOT EXISTS:
SELECT a, b, c
FROM t1
WHERE NOT EXISTS
(SELECT *
FROM t2
WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c)
In addition, an inner join on a, b, and c being equal should give you all non-DISTINCT tuples, while a LEFT JOIN with a WHERE IS NULL clause should give you the DISTINCT ones, as Charles mentioned below.

Well, I'm going to answer my own question, in spite of all the great advice others gave.
Here's the proper syntax for what I was trying to do.
SELECT DISTINCT a, b, c FROM t1 WHERE (a,b,c) NOT IN ( SELECT DISTINCT a,b,c FROM t2 )
Can't vouch for the efficiency of it, but the broader questions I was implicitly putting was "How do I express this thought in SQL", not "How do I get a particular result set". I know that's unfair to everyone who took a stab, sorry!

Need to add a column list after the WHERE clause and REMOVE the alias.
I tested this with a similar table and it is working.
SELECT DISTINCT a, b, c
FROM t1 WHERE (a,b,c)
NOT IN (SELECT DISTINCT a,b,c FROM t2)
Using the mysql world db:
-- dont include city 1, 2
SELECT DISTINCT id, name FROM city
WHERE (id, name)
NOT IN (SELECT id, name FROM city WHERE ID IN (1,2))

Related

Cardinality violation when using a subquery that returns two values

I have create a sql query that the sketch is like this
select *
from A
where A.id in (select B.id1, B.id2 from B);
where the main select returns those values for which A.id coincides with either B.id1 or B.id2.
Clearly this solution doesn't work as the cardinality doesn't match in the where clause. How can I overcome this problem?
One solution would be to make two sub-queries, one for B.id1 and one for B.id2, but as my sub-query is much longer than in this example I was looking for a more elegant solution.
I'm using Mysql
EDIT 1
As long as the syntax is simpler than using two sub-queries I have no issues using joins
EDIT 2
Thanks #NullSoulException. I tried the first solution and works as expected!!
Something like the below should do the trick.
select *
From table1 a , (select id1 , id2 from table2 ) b
where (a.id = b.id1) or (a.id = b.id2)
or you can JOIN with the same table twice by giving the joined tables an alias.
select * from table1 a
INNER JOIN table2 b1 on a.id = b1.id1
INNER JOIN table2 b2 on a.id = b2.id2
Please test the above against your datasets/tables..

Inner Join SQL Syntax

I've never done an inner join SQL statement before, so I don't even know if this is the right thing to use, but here's my situation.
Table 1 Columns: id, course_id, unit, lesson
Table 2 Columns: id, course_id
Ultimately, I want to count the number of id's in each unit in Table 1 that are also in Table 2.
So, even though it doesn't work, maybe something like....
$sql = "SELECT table1.unit, COUNT( id ) as count, table2.id, FROM table1, table2, WHERE course_id=$im_course_id GROUP BY unit";
I'm sure the syntax of what I'm wanting to do is a complete fail. Any ideas on fixing it?
SELECT unit, COUNT( t1.id ) as count
FROM table1 as t1 inner JOIN table2 as t2
ON t1.id = t2.id
GROUP BY unit
hope this helps.
If I understand what you want (maybe you could post an example input and output?):
SELECT unit, COUNT( id ) as count
FROM table1 as t1 JOIN table2 as t2
ON t1.id = t2.id
GROUP BY unit
Okay, so there are a few things going on here. First off, commas as joins are deprecated so they may not even be supported (depending on what you are using). You should probably switch to explicitly writing inner join
Now, whenever you have any sort of join, you also need on. You need to tell sql how it should match these two tables up. The on should come right after the join, like this:
Select *
From table1 inner join table2
on table1.id = table2.id
and table1.name = table2.name
You can join on as many things as you need by using and. This means that if the primary key of one table is several columns, you can easily create a one-to-one match between tables.
Lastly, you may be having issues because of other general syntax errors in your query. A comma is used to separate different pieces of information. So in your query,
SELECT table1.unit, COUNT( id ) as count, table2.id, FROM ...
The comma at the end of the select shouldn't be there. Instead this should read
SELECT table1.unit, COUNT( id ) as count, table2.id FROM ...
This is subtle, but the sql query cannot run with the extra comma.
Another issue is with the COUNT( id ) that you have. Sql doesn't know which id to count since table1 and table2 both have ids. So, you should use either count(table1.id) or count(table2.id)

Create a VIEW where a record in t1 is not present in t2 ? Confirmation on Union/Left Join/Inner Join?

I am trying to make a view of records in t1 where the source id from t1 is not in t2.
Like... "what records are not present in the other table?"
Do I need to include t2 in the FROM clause? Thanks
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1
WHERE t1.fee_source_id NOT IN (
SELECT t1.fee_source_id
FROM t1 INNER JOIN t2 ON t1.fee_source_id = t2.fee_source
)
ORDER BY t1.aif_id DESC
You're looking to effect an anti-join, for which there are three possibilities in MySQL:
Using IN:
SELECT fee_source_id, company_name, document
FROM t1
WHERE fee_source_id NOT IN (SELECT fee_source FROM t2)
ORDER BY aif_id DESC
Using EXISTS:
SELECT fee_source_id, company_name, document
FROM t1
WHERE NOT EXISTS (
SELECT * FROM t2 WHERE t2.fee_source = t1.fee_source_id LIMIT 1
)
ORDER BY aif_id DESC
Using JOIN:
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1 LEFT JOIN t2 ON t2.fee_source = t1.fee_source_id
WHERE t2.fee_source IS NULL
ORDER BY t1.aif_id DESC
According to #Quassnoi's analysis:
Summary
MySQL can optimize all three methods to do a sort of NESTED LOOPS ANTI JOIN.
It will take each value from t_left and look it up in the index on t_right.value. In case of an index hit or an index miss, the corresponding predicate will immediately return FALSE or TRUE, respectively, and the decision to return the row from t_left or not will be made immediately without examining other rows in t_right.
However, these three methods generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient than those that execute index_subquery and LEFT JOIN optimized to use Not exists method.
That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
However, I'm not entirely sure how this analysis reconciles with the MySQL manual section on Optimizing Subqueries with EXISTS Strategy which (to my reading) suggests that the second approach above should be more efficient than the first.
Another option below (similar to anti-join)... Great answer above though. Thanks!
SELECT D1.deptno, D1.dname
FROM dept D1
MINUS
SELECT D2.deptno, D2.dname
FROM dept D2, emp E2
WHERE D2.deptno = E2.deptno
ORDER BY 1;

Select from table1 where similar rows do NOT appear in table2?

I've been struggling with this for a while, and haven't been able to find any examples to point me in the right direction.
I have 2 MySQL tables that are virtually identical in structure. I'm trying to perform a query that returns results from Table 1 where the same data isn't present in table 2. For example, imagine both tables have 3 fields - fieldA, fieldB and fieldC. I need to exclude results where the data is identical in all 3 fields.
Is it even possible?
There are several ways to do it (assuming the fields don't allow NULLs):
SELECT a, b, c FROM Table1 T1 WHERE NOT EXISTS
(SELECT * FROM Table2 T2 WHERE T2.a = T1.a AND T2.b = T1.b AND T2.c = T1.c)
or
SELECT T1.a, T1.b, T1.c FROM Table1 T1
LEFT OUTER JOIN Table2 T2 ON T2.a = T1.a AND T2.b = T1.b AND T2.c = T1.c
WHERE T2.a IS NULL
select
t1.*
from
table1 t1
left join table2 t2 on
t1.fieldA = t2.fieldA and
t1.fieldB = t2.fieldB and
t1.fieldC = t2.fieldC
where
t2.fieldA is null
Note that this will not work if any of the fields is NULL in both tables. The expression NULL = NULL returns false, so these records are excluded as well.
This is a perfect use of EXCEPT (the key word/phase is "set difference"). However, MySQL lacks it. But no fear, a work-around is here:
Intersection and Set-Difference in MySQL (A workaround for EXCEPT)
Please not that approaches using NOT EXISTS in MySQL (as per above link) are actually less than ideal although they are semantically correct. For an explanation of the performance differences with the above (and alternative) approaches as handled my MySQL, complete with examples, see NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL:
That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
Happy coding.
The 'left join' is very slow in MYSQL. The gifford algorithm shown below speeds it many orders of magnitude.
select * from t1
inner join
(select fieldA from
(select distinct fieldA, 1 as flag from t1
union all
select distinct fieldA, 2 as flag from t2) a
group by fieldA
having sum(flag) = 1) b on b.fieldA = t1.fieldA;

Update with nested subquery (sum) to get restricton on update clause

Got this bit of SQL as an update script, I've tried to add a work round to not being able to include the table to be updated as a clause in the statement so using sub queries, but struggling to get this to work.
Essientially I need update a vailue in table 1 with the summation of a field in table 2, but only where the two other fields match across a couple of tables and based on field6 the restriction is applied to the update clause.
UPDATE table1 W SET Field1=(SELECT field2 FROM
(SELECT A.id, B.field3, SUM(A.field2) AS field2
FROM table2 A, table3 B, table4 P
WHERE A.id=B.id AND P.field6=B.field6) B ) WHERE W.field6=B.field6
In the real world example, select the sum of points conceded in a rugby game when a rugby player has participated in the match. table 2 has the results (including the score) table 3 has the team sheets and table 1 and 4 are the same player table to be updated.
Hopefully this is clear enough and someone can point me in the right direction.
Tried the following:
UPDATE $WSLKEEP W, $WSLFIX A, $WSLFIXPLAY B
SET W.F_CONCEDED=SUM(A.F_AGAINST)
WHERE A.F_ID=B.F_GAMEID
AND B.F_NAME=W.F_NAME"
but now stuck with:
Invalid use of group function
Kind regards
It seems like your subquery should be grouping on field6 and exposing that column for inner join with table1. Here's how you do that in MySQL:
UPDATE table1 W
INNER JOIN (
SELECT B.field6, SUM(A.field2) AS field2
FROM table2 A, table3 B, table4 P
WHERE A.id=B.id AND P.field6=B.field6
GROUP BY B.field6
) B ON W.field6=B.field6
SET W.Field1 = B.Field2
And while we are at it, I would also recommend you to refrain from (ab)using comma joins in favour of explicit joins. The latter, however unusual at first after being long accustomed to a different syntax, can very soon become habitual and much more intuitive than the former. A great deal has been said on the topic, and some people may be holding quite strong opinions about comma joins. I say, comma joins can still have their share of use. However, when you are joining on a condition, the current ANSI syntax should be your choice.
Here's the above statement with the subquery transformed so as to use explicit joins:
UPDATE table1 W
INNER JOIN (
SELECT B.field6, SUM(A.field2) AS field2
FROM table2 A
INNER JOIN table3 B ON A.id = B.id
INNER JOIN table4 P ON P.field6 = B.field6
GROUP BY B.field6
) B ON W.field6 = B.field6
SET W.Field1 = B.Field2
For an update query like you have above, you are allowed to include multiple tables in the UPDATE clause, even if you aren't updating all of them. This will make sub-queries unnecessary and speed the execution quite a bit. For example, you can do something like this.
UPDATE table1 W, table2 A, table3 B, table4 P
SET W.Field1 = SUM(A.field2) ...
I'm unclear on the specifics of what you are trying to update exactly, but I just wanted to put out that you can often avoid sub-queries by using this kind of syntax.