I have two tables: TABLE1, which looks like:
id name address
1 mm 123
2 nn 143
and TABLE2, which looks like:
name age
mm 6
oo 9
I want to get the names that exist in TABLE1 but not in TABLE2.
So basically, I need the 2nd row, which has the name nn that doesn't exist in TABLE2. The output should look like this:
id name address
2 nn 143
I've tried this but it doesn't work:
SELECT W.* FROM TABLE1 W INNER JOIN TABLE2 V
ON W.NAME <> V.NAME
and it's still getting the existing records.
An INNER JOIN doesn't help here.
One way to solve this is by using a LEFT JOIN:
SELECT w.*
FROM TABLE1 W
LEFT JOIN TABLE2 V ON W.name = V.name
WHERE V.name IS NULL;
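To see why the <> join keeps the existing records while the LEFT JOIN does not, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. The column types and the use of SQLite are assumptions made only for illustration; the question does not say which product is in use.

```python
import sqlite3

# In-memory SQLite database holding the two tables from the question.
# Column types are assumed; the question only shows the values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (id INTEGER, name TEXT, address TEXT);
    CREATE TABLE TABLE2 (name TEXT, age INTEGER);
    INSERT INTO TABLE1 VALUES (1, 'mm', '123'), (2, 'nn', '143');
    INSERT INTO TABLE2 VALUES ('mm', 6), ('oo', 9);
""")

# The attempt from the question: each TABLE1 row is paired with every
# TABLE2 row that has a different name, so 'mm' still appears (paired
# with 'oo') and 'nn' appears twice.
inner_rows = conn.execute("""
    SELECT W.* FROM TABLE1 W
    INNER JOIN TABLE2 V ON W.name <> V.name
    ORDER BY W.id
""").fetchall()

# The anti-join: keep only TABLE1 rows for which the LEFT JOIN found no match.
anti_rows = conn.execute("""
    SELECT W.* FROM TABLE1 W
    LEFT JOIN TABLE2 V ON W.name = V.name
    WHERE V.name IS NULL
""").fetchall()

print(inner_rows)
print(anti_rows)
```

Only the second query returns the single nn row asked for in the question.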
The relational operator you require is semi difference a.k.a. antijoin.
Most SQL products lack an explicit semi difference operator or keyword. Standard SQL-92 doesn't have one (it has a MATCH (subquery) semijoin predicate but, although tempting to think otherwise, the semantics for NOT MATCH (subquery) are not the same as for semi difference; FWIW the truly relational language Tutorial D successfully uses the NOT MATCHING semi difference).
Semi difference can of course be written using other SQL predicates. The most commonly seen are: outer join with a test for nulls in the WHERE clause, closely followed by EXISTS or IN (subquery). Using EXCEPT (equivalent to MINUS in Oracle) is another possible approach if your SQL product supports it and again depending on the data (specifically, when the headings of the two tables are the same).
Personally, I prefer to use EXISTS in SQL for semi difference because the join clauses are closer together in the written code and it doesn't result in projection over the joined table, e.g.
SELECT *
FROM TABLE1 W
WHERE NOT EXISTS (
SELECT *
FROM TABLE2 V
WHERE W.NAME = V.NAME
);
As with NOT IN (subquery) (same for the outer join approach), you need to take extra care if the WHERE clause within the subquery involves nulls (hint: if the WHERE clause in the subquery evaluates to UNKNOWN due to the presence of nulls, EXISTS treats it as FALSE, which may yield unexpected results).
UPDATE (3 years on): I've since flipped to preferring NOT IN (subquery) because it is more readable. If you are worried about unexpected results with nulls (and you should be), then stop using them entirely; I did so many years ago.
One way in which it is more readable is there is no requirement for the range variables W and V e.g.
SELECT * FROM TABLE1 WHERE name NOT IN ( SELECT name FROM TABLE2 );
Related
It seems to me that you can do the same thing in a SQL query using either NOT EXISTS, NOT IN, or LEFT JOIN WHERE IS NULL. For example:
SELECT a FROM table1 WHERE a NOT IN (SELECT a FROM table2)
SELECT a FROM table1 WHERE NOT EXISTS (SELECT * FROM table2 WHERE table1.a = table2.a)
SELECT table1.a FROM table1 LEFT JOIN table2 ON table1.a = table2.a WHERE table2.a IS NULL
I'm not sure if I got all the syntax correct, but these are the general techniques I've seen. Why would I choose to use one over the other? Does performance differ...? Which one of these is the fastest / most efficient? (If it depends on implementation, when would I use each one?)
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
In a nutshell:
NOT IN is a little bit different: it never matches if there is but a single NULL in the list.
In MySQL, NOT EXISTS is a little bit less efficient
In SQL Server, LEFT JOIN / IS NULL is less efficient
In PostgreSQL, NOT IN is less efficient
In Oracle, all three methods are the same.
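The NULL behaviour of NOT IN mentioned in the first point can be reproduced with a small sketch. It uses SQLite through Python's sqlite3 module and made-up sample rows, both of which are assumptions for illustration; the three-valued logic it shows is standard SQL behaviour rather than anything SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER);
    CREATE TABLE table2 (a INTEGER);
    INSERT INTO table1 VALUES (1), (2);
    INSERT INTO table2 VALUES (1), (NULL);   -- note the single NULL
""")

# 2 NOT IN (1, NULL) evaluates to UNKNOWN, so the row for 2 is filtered out.
not_in = conn.execute(
    "SELECT a FROM table1 WHERE a NOT IN (SELECT a FROM table2)"
).fetchall()

# NOT EXISTS only asks whether the subquery returned a row, so 2 survives.
not_exists = conn.execute(
    "SELECT a FROM table1 t1 WHERE NOT EXISTS "
    "(SELECT * FROM table2 t2 WHERE t1.a = t2.a)"
).fetchall()

# LEFT JOIN / IS NULL behaves like NOT EXISTS here.
left_join = conn.execute(
    "SELECT t1.a FROM table1 t1 LEFT JOIN table2 t2 ON t1.a = t2.a "
    "WHERE t2.a IS NULL"
).fetchall()

print(not_in)      # []
print(not_exists)  # [(2,)]
print(left_join)   # [(2,)]
```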
If the database is good at optimising the query, the two first will be transformed to something close to the third.
For simple situations like the ones in your question, there should be little or no difference, as they will all be executed as joins. In more complex queries, the database might not be able to make a join out of the NOT IN and NOT EXISTS queries. In that case the queries will get a lot slower. On the other hand, a join may also perform badly if there is no index that can be used, so just because you use a join doesn't mean that you are safe. You would have to examine the execution plan of the query to tell whether there may be any performance problems.
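How you look at an execution plan depends on the product. As one hedged illustration, SQLite exposes it through EXPLAIN QUERY PLAN; the sketch below prints the plan for each of the three forms. The index name idx_table2_a is hypothetical, and the plan text differs between SQLite versions, so no particular output is claimed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER);
    CREATE TABLE table2 (a INTEGER);
    CREATE INDEX idx_table2_a ON table2 (a);  -- hypothetical index name
""")

queries = {
    "NOT IN": "SELECT a FROM table1 WHERE a NOT IN (SELECT a FROM table2)",
    "NOT EXISTS": "SELECT a FROM table1 t1 WHERE NOT EXISTS "
                  "(SELECT * FROM table2 t2 WHERE t1.a = t2.a)",
    "LEFT JOIN": "SELECT t1.a FROM table1 t1 LEFT JOIN table2 t2 "
                 "ON t1.a = t2.a WHERE t2.a IS NULL",
}

plans = {}
for label, sql in queries.items():
    # Each plan row ends with a human-readable description of one step.
    plans[label] = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(label)
    for step in plans[label]:
        print("   ", step[-1])
```

Other products have their own equivalents (for example EXPLAIN in MySQL and PostgreSQL), and the plans they choose will not match SQLite's.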
Assuming you are avoiding nulls, they are all ways of writing an anti-join using Standard SQL.
An obvious omission is the equivalent using EXCEPT:
SELECT a FROM table1
EXCEPT
SELECT a FROM table2
Note in Oracle you need to use the MINUS operator (arguably a better name):
SELECT a FROM table1
MINUS
SELECT a FROM table2
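One thing worth checking before swapping EXCEPT in for the other forms: EXCEPT is a set operator and removes duplicate rows, whereas NOT EXISTS keeps them. A minimal sketch, assuming SQLite via Python's sqlite3 module and made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER);
    CREATE TABLE table2 (a INTEGER);
    INSERT INTO table1 VALUES (1), (2), (2), (3);  -- 2 appears twice
    INSERT INTO table2 VALUES (1);
""")

# EXCEPT returns each remaining value once.
except_rows = conn.execute("""
    SELECT a FROM table1
    EXCEPT
    SELECT a FROM table2
    ORDER BY a
""").fetchall()

# NOT EXISTS filters rows of table1 without de-duplicating them.
not_exists_rows = conn.execute("""
    SELECT a FROM table1 t1
    WHERE NOT EXISTS (SELECT * FROM table2 t2 WHERE t2.a = t1.a)
    ORDER BY a
""").fetchall()

print(except_rows)      # [(2,), (3,)]
print(not_exists_rows)  # [(2,), (2,), (3,)]
```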
Speaking of proprietary syntax, there may also be non-Standard equivalents worth investigating depending on the product you are using e.g. OUTER APPLY in SQL Server (something like):
SELECT t1.a
FROM table1 t1
OUTER APPLY
(
SELECT t2.a
FROM table2 t2
WHERE t2.a = t1.a
) AS dt1
WHERE dt1.a IS NULL;
When you need to insert data into a table with a multi-field primary key, consider that it can be much faster (I tried this in Access, but I think it holds in any database) not to check first that no records with those key values exist in the table, but rather to just insert into the table: the excess records (by the key) will not be inserted twice.
From a performance perspective, avoid negated predicates such as NOT IN and NOT EXISTS where you can,
because to evaluate the negation the DBMS may have to run through all the available rows and discard the ones that match.
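How the key rejects duplicates varies by product. As a sketch of the idea in SQLite (not Access, which the answer tested), a plain INSERT of a duplicate key raises an error, while SQLite's INSERT OR IGNORE conflict clause silently skips the offending rows. The table name pairs and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a two-column primary key.
conn.execute("CREATE TABLE pairs (x INTEGER, y INTEGER, PRIMARY KEY (x, y))")
conn.execute("INSERT INTO pairs VALUES (1, 1)")

# A plain INSERT of an existing key is rejected with an error.
try:
    conn.execute("INSERT INTO pairs VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# INSERT OR IGNORE (SQLite-specific) lets the key do the filtering:
# rows whose key already exists are skipped, the rest are inserted.
incoming = [(1, 1), (1, 2), (2, 1), (1, 2)]
conn.executemany("INSERT OR IGNORE INTO pairs VALUES (?, ?)", incoming)

stored = conn.execute("SELECT x, y FROM pairs ORDER BY x, y").fetchall()
print(duplicate_rejected)  # True
print(stored)              # [(1, 1), (1, 2), (2, 1)]
```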
I have two tables, my_table1 and my_table2.
my_table1 contains numbers from 1 to 10 and my_table2 contains letters a, b, c and d.
I want to do a query which returns the following:
1 a
1 b
1 c
1 d
2 a
2 b
2 c
2 d
All the way until the end.
Is there any possible way to do this in SQL?
Thanks in advance.
That is a cross join. You can write it in the simple (old) form by just writing select * from table1, table2, but this is outdated syntax, and your queries will become very hard to read if you mix it with the more modern explicit joins that were introduced in 1992. So, I'd choose to write the explicit cross join.
Also, it looks like you want the results sorted. If you're lucky this happens automatically, but you cannot be sure that it always will, so it is best to specify the order if you need it. If you don't, omit the order by clause, because sorting does make the query slower.
select
n.nr,
l.letter
from
my_table1 n
cross join my_table2 l
order by
n.nr,
l.letter
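A quick way to try this out is a small in-memory SQLite database driven from Python. This is only a sketch: the column names nr and letter follow the query above (the question does not give them), and the tables are shortened to three numbers and two letters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table1 (nr INTEGER);
    CREATE TABLE my_table2 (letter TEXT);
    INSERT INTO my_table1 VALUES (1), (2), (3);
    INSERT INTO my_table2 VALUES ('a'), ('b');
""")

# Every number is paired with every letter: 3 * 2 = 6 rows.
rows = conn.execute("""
    SELECT n.nr, l.letter
    FROM my_table1 n
    CROSS JOIN my_table2 l
    ORDER BY n.nr, l.letter
""").fetchall()

for nr, letter in rows:
    print(nr, letter)
```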
That is a CROSS JOIN, in MySQL equivalent to an INNER JOIN or a regular comma:
SELECT * FROM my_table1, my_table2;
cf. https://dev.mysql.com/doc/refman/5.7/en/join.html
I have this query:
SELECT (@a:=@a+1) AS priority
FROM (SELECT t1.name FROM t1 LIMIT 100) x, (SELECT @a:=0) r
a few questions:
1 - What is the comma doing between the SELECTs? I have never seen a comma between commands, and I don't know what it means.
2 - Why is the second SELECT given a name?
3 - Why is the second SELECT inside brackets?
4 - Performance-wise: does it select the first 100 rows from t1, and then assign them a number? What is going on here?
It is performing a CROSS JOIN (a Cartesian product of the rows) but without the explicit syntax. The following 2 queries produce identical results:
SELECT *
FROM TableA, TableB
SELECT *
FROM TableA
CROSS JOIN TableB
The query in the question uses 2 "derived tables" instead. I would encourage you to use the explicit join syntax CROSS JOIN and never use just commas. The biggest issue with using just commas is you have no idea if the Cartesian product is deliberate or accidental.
Both "derived tables" have been given an alias - and that is a good thing. How else would you reference some item of the first or second "derived table"? e.g. Imagine they were both queries that had the column ID in them, you would then be able to reference x.ID or r.ID
Regarding what the overall query is doing: first note that the second query returns just a single row. So even though the syntax produces a CROSS JOIN, it does not expand the total number of rows, because 100 * 1 = 100. In effect the subquery "r" is adding a "placeholder" @a (initially at value zero) to every row. Once @a is available on each row, you can increment its value by 1 for each row, and as a result that column produces a row number.
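The row-count argument can be checked with a small sketch. SQLite has no MySQL-style user variables, so this only reproduces the comma join against a one-row derived table and then numbers the rows on the Python side as a stand-in for the incrementing variable. The sample names, the LIMIT of 3, the column alias start, and the ORDER BY added for a repeatable result are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (name TEXT);
    INSERT INTO t1 VALUES ('a'), ('b'), ('c'), ('d'), ('e');
""")

# x has 3 rows and r has exactly 1 row, so the comma (cross) join
# yields 3 * 1 = 3 rows, each carrying the initial value from r.
rows = conn.execute("""
    SELECT x.name, r.start
    FROM (SELECT name FROM t1 ORDER BY name LIMIT 3) x,
         (SELECT 0 AS start) r
    ORDER BY x.name
""").fetchall()

# Stand-in for the per-row increment of the user variable in MySQL.
numbered = [(i, name) for i, (name, _start) in enumerate(rows, start=1)]

print(rows)      # [('a', 0), ('b', 0), ('c', 0)]
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```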
x and r are effectively anonymous views produced by the SELECT statements. If you imagine that instead of using SELECTs in brackets, you defined a view using the select statement and then referred to the view, the syntax would be clear.
The selects are given names so that you can refer to these names in WHERE conditions, joins or in the list of fields to select.
That is the syntax. You have to have brackets.
Yes, it selects the first 100 rows. I am not sure what you mean by "gives them a number".
I have a table Person with a column id that references a column id in table Worker.
What is the difference between these two queries? They yield the same results.
SELECT *
FROM Person
JOIN Worker
ON Person.id = Worker.id;
and
SELECT *
FROM Person,
Worker
WHERE Person.id = Worker.id;
There is no difference at all.
The first form makes the query more readable and makes it very clear which join corresponds to which condition.
The queries are logically equivalent. The comma operator is equivalent to an [INNER] JOIN operator.
The comma is the older style join operator. The JOIN keyword was added later, and is favored because it also allows for OUTER join operations.
It also allows for the join predicates (conditions) to be separated from the WHERE clause into an ON clause. That improves (human) readability.
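The equivalence of the two queries from the question can be checked directly. The sketch below assumes SQLite via Python's sqlite3 module; the name and job columns and the sample rows are hypothetical, since the question only mentions the id columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Worker (id INTEGER PRIMARY KEY, job TEXT);
    CREATE TABLE Person (id INTEGER, name TEXT);
    INSERT INTO Worker VALUES (1, 'welder'), (2, 'baker'), (3, 'pilot');
    INSERT INTO Person VALUES (1, 'Ann'), (2, 'Bob');
""")

# Explicit JOIN with the join predicate in the ON clause.
join_rows = conn.execute("""
    SELECT * FROM Person
    JOIN Worker ON Person.id = Worker.id
    ORDER BY Person.id
""").fetchall()

# Comma join with the same predicate in the WHERE clause.
comma_rows = conn.execute("""
    SELECT * FROM Person, Worker
    WHERE Person.id = Worker.id
    ORDER BY Person.id
""").fetchall()

print(join_rows == comma_rows)  # True
print(join_rows)
```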
FOLLOWUP
This answer says that the two queries in the question are equivalent. We shouldn't mix old-school comma syntax for join operation with the newer JOIN keyword syntax in the same query. If we do mix them, we need to be aware of a difference in the order of precedence.
excerpt from MySQL Reference Manual
https://dev.mysql.com/doc/refman/5.6/en/join.html
INNER JOIN and , (comma) are semantically equivalent in the absence of a join condition: both produce a Cartesian product between the specified tables (that is, each and every row in the first table is joined to each and every row in the second table).
However, the precedence of the comma operator is less than that of INNER JOIN, CROSS JOIN, LEFT JOIN, and so on. If you mix comma joins with the other join types when there is a join condition, an error of the form Unknown column 'col_name' in 'on clause' may occur. Information about dealing with this problem is given later in this section.
Besides better readability, there is one more case where explicitly joined tables are better than comma-separated tables.
Let's see an example:
Create Table table1
(
ID int NOT NULL Identity(1, 1) PRIMARY KEY ,
Name varchar(50)
)
Create Table table2
(
ID int NOT NULL Identity(1, 1) PRIMARY KEY ,
ID_Table1 INT NOT NULL
)
The following query will give me all columns and rows from both tables:
SELECT
*
FROM table1, table2
The following query will give me only the columns from the first table, with 'table2' used as its table alias:
SELECT
*
FROM table1 table2
If you mistakenly forget the comma in a comma-separated join, the second table name is silently treated as an alias for the first table. This won't happen in all cases, but there is a chance of something like this.
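The missing-comma effect is easy to reproduce. The sketch below uses SQLite through Python's sqlite3 module rather than SQL Server, so the Identity columns are replaced by INTEGER PRIMARY KEY and the sample rows are made up; the aliasing behaviour itself is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE table2 (ID INTEGER PRIMARY KEY, ID_Table1 INTEGER NOT NULL);
    INSERT INTO table1 (Name) VALUES ('first'), ('second');
    INSERT INTO table2 (ID_Table1) VALUES (1), (2), (2);
""")

# With the comma: a cross join of both tables (2 * 3 rows, 4 columns).
cur = conn.execute("SELECT * FROM table1, table2")
with_comma_cols = [d[0] for d in cur.description]
with_comma_count = len(cur.fetchall())

# Without the comma: 'table2' is just an alias for table1, so only
# table1's columns and rows come back, and no error is raised.
cur = conn.execute("SELECT * FROM table1 table2")
without_comma_cols = [d[0] for d in cur.description]
without_comma_count = len(cur.fetchall())

print(len(with_comma_cols), with_comma_count)
print(without_comma_cols, without_comma_count)
```

With explicit JOIN syntax the same typo (dropping the JOIN keyword's ON clause or a table name) tends to produce a syntax error instead of a silently different query.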
Using JOINS makes the code easier to read, since it's self-explanatory.
In speed there is no difference (I tested it) and the execution plan is the same
If the query optimizer is doing its job right, there should be no difference between those queries. They are just two ways to specify the same desired result.
The SELECT * FROM table1, table2, etc. form is fine for a couple of tables, but it becomes much harder to read as the number of tables increases.
The JOIN syntax makes it explicit which criteria affect which tables (by giving each join its own condition). Also, the comma form (the second query) is the older standard.
To the database, though, they end up being the same.
I'm very much still learning MySQL (I'm still really only comfortable with basic queries, count, order by etc.). It is very likely that this question has been asked before, but either I don't know what to search for, or I'm too much of a novice to understand the answers:
I have two tables:
tb1 (a,b,path)
tb2 (a,b,value)
I would like to make a query that returns "path" for each row in tb1 whose a,b matches a different query on tb2. In bad mysql, it would be something like:
select
path
from tb1
where
a=(select a from tb2 where value < v1)
and
b=(select b from tb2 where value < v1);
However, this doesn't work, as the subqueries return multiple values. Note that replacing = with IN is not good enough, as that would also be true for combinations of a,b values that are not returned by select a,b from tb2 where value < v1
Basically, I have identified an interesting area in (a,b)-space based on tb2, and would like to study the behavior of tb1 within that area (if that makes it any clearer).
thank you :)
This is a job for an INNER JOIN on both a and b:
SELECT
path
FROM
tb1
INNER JOIN tb2 ON tb1.a = tb2.a AND tb1.b = tb2.b
/* add your condition to the WHERE clause */
WHERE tb2.value < v1
The use cases for subqueries in the SELECT list or WHERE clause can very often be handled instead using some type of JOIN. The join will frequently be faster than the subquery, owing to the fact that when using a SELECT or WHERE subquery, the subquery may need to be performed for each row returned, rather than only once.
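To make the difference concrete, here is a minimal sketch comparing the join on both columns with the "two independent IN checks" idea that the question rules out. It assumes SQLite via Python's sqlite3 module, made-up sample rows, and a threshold v1 of 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb1 (a INTEGER, b INTEGER, path TEXT);
    CREATE TABLE tb2 (a INTEGER, b INTEGER, value INTEGER);
    INSERT INTO tb1 VALUES (1, 1, 'p11'), (1, 2, 'p12'),
                           (2, 1, 'p21'), (2, 2, 'p22');
    INSERT INTO tb2 VALUES (1, 2, 5), (2, 1, 7), (1, 1, 50);
""")
v1 = 10  # assumed threshold; only (1, 2) and (2, 1) qualify in tb2

# Join on both columns: a and b must come from the same tb2 row.
join_paths = [row[0] for row in conn.execute("""
    SELECT path
    FROM tb1
    INNER JOIN tb2 ON tb1.a = tb2.a AND tb1.b = tb2.b
    WHERE tb2.value < ?
    ORDER BY path
""", (v1,))]

# Independent IN checks: a and b may come from different tb2 rows,
# which lets the combinations (1, 1) and (2, 2) slip through.
in_paths = [row[0] for row in conn.execute("""
    SELECT path
    FROM tb1
    WHERE a IN (SELECT a FROM tb2 WHERE value < ?)
      AND b IN (SELECT b FROM tb2 WHERE value < ?)
    ORDER BY path
""", (v1, v1))]

print(join_paths)  # ['p12', 'p21']
print(in_paths)    # ['p11', 'p12', 'p21', 'p22']
```

If tb2 can hold several qualifying rows for the same (a, b) pair, the join returns the matching path once per such row, so SELECT DISTINCT may be needed.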
Beyond the MySQL documentation on JOINs linked above, I would also recommend Jeff Atwood's Visual Explanation of SQL JOINs
INNER JOIN will do the trick.
You just need two ON criteria in order to match both the a and b values, like so:
SELECT path
FROM tb1
INNER JOIN tb2 ON tb1.a = tb2.a AND tb1.b = tb2.b
WHERE tb2.value < v1
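You can avoid the multiple-row error by limiting each subquery to one row, although this compares against only a single value of a and of b (and, without an ORDER BY, not a well-defined one), so it will not return every matching path: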
select
path
from tb1
where
a=(select a from tb2 where value < v1 LIMIT 1)
and
b=(select b from tb2 where value < v1 LIMIT 1);