MySQL: JOIN where ON may compare with NULL

I have:
simple_table
|- first_id
|- second_id
SELECT * FROM table t1 JOIN table t2
ON [many many conditions]
AND t1.id IN (SELECT first_id FROM simple_table)
AND t2.id = (
SELECT second_id FROM simple_table WHERE t1.id = first_id -- 4th row, can return NULL
)
Questions:
How do I handle the situation where the 4th row returns NULL?
Can I use the t1 and t2 aliases inside subqueries?
Update [extra explanation]
I have a very big table. I need to iterate through the table and check some conditions. Actually, simple_table provides the ids of the table entities whose conditions I should check. I mean:
simple_table
first_id second_id
11 128
table
id <other_fields>
................
11 <other_data>
...............
128 <other_data>
So, I should check whether those two entities in table satisfy the right conditions relative to one another.

The question is unclear, but given the update the query should work better if there is an index on the ID of the big table (probably it's there already as the PK).
Since the conditions seem to be on the same table, the simplest query would be
SELECT ...
FROM bigtable t1
INNER JOIN simple_table st ON t1.ID IN (st.first_id, st.second_id)
or
SELECT ...
FROM bigtable t1
INNER JOIN simple_table st ON t1.ID = st.first_id
INNER JOIN bigtable t2 ON st.second_id = t2.ID
to get the two rows from bigtable on the same row of the result.
The second query will make the checks easier to write; the first will be faster but will most probably need a GROUP BY to return the wanted results.
Some performance tests on the OP's machine are needed to find the fastest one.
If one of the IDs in simple_table is NULL, only the other will be considered, so the code will have to check for that.
You can use the table aliases in the subqueries, and you'll need to, since you'll probably have the same table in the subqueries.
The relative conditions to check are still undisclosed by the OP, so that's all I can help with.
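A minimal sketch of the second query, assuming SQLite as a stand-in for MySQL and a hypothetical `val` column in place of `<other_fields>`. The answer's version uses INNER JOINs, so a simple_table row whose second_id is NULL disappears from the result; switching only the t2 join to a LEFT JOIN keeps that row with NULLs on the t2 side, which the checking code can then test for.

```python
import sqlite3


def paired_rows(conn):
    """Return (t1.id, t1.val, t2.id, t2.val) for every simple_table pair.

    The LEFT JOIN to the second copy of bigtable keeps pairs whose
    second_id is NULL; those come back with None on the t2 side.
    """
    return conn.execute(
        """
        SELECT t1.id, t1.val, t2.id, t2.val
        FROM simple_table st
        JOIN bigtable t1 ON t1.id = st.first_id
        LEFT JOIN bigtable t2 ON t2.id = st.second_id
        ORDER BY t1.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bigtable (id INTEGER PRIMARY KEY, val TEXT);  -- val is hypothetical
    CREATE TABLE simple_table (first_id INTEGER, second_id INTEGER);
    INSERT INTO bigtable VALUES (11, 'a'), (12, 'b'), (128, 'c');
    INSERT INTO simple_table VALUES (11, 128), (12, NULL);
    """
)

for row in paired_rows(conn):
    print(row)
```

Both rows of simple_table survive: the (11, 128) pair with both sides filled in, and the (12, NULL) pair with None in place of the t2 columns.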

Related

MySQL. Subtract data from table2 in table1

Simple question, but I don't get how to accomplish it.
Table 1.
ID Quantity
1 4
2 5
3 2
Table 2
ID Quantity
2 1
3 2
I want the query to obtain the following result:
Table result
ID Quantity
1 4
2 4
I have been looking at something related to the MINUS operator or NOT IN, but the thing is I want to subtract the quantity in the same query.
EDIT: Table 1 is always bigger than Table 2. Table 2 can't contain ids that are not present in Table 1.
I hope the example clarifies the question.
Regards!!
Sounds like a classic use-case of a join:
SELECT table1.ID, table1.Quantity - COALESCE(table2.Quantity, 0) AS Quantity
FROM table1
LEFT OUTER JOIN table2
ON table1.ID = table2.ID
WHERE table1.Quantity - COALESCE(table2.Quantity, 0) != 0
-- insert order by clauses/etc if needed
This computes table1's quantity minus table2's. You can get a good overview of different joins here. This uses a left join, which includes every row of table1 whether or not a matching id exists in table2, and then uses COALESCE to turn the NULL from a non-matching table2 row into a 0.
The WHERE clause's purpose is to remove results which equal 0, so this would not include the (3, 0) result. It filters on the computed difference: comparing table1.Quantity != table2.Quantity directly would also drop ID 1, because any comparison with NULL is not true.
You can also use this join to create a view of the output so the query can be reused by name. Note that MySQL views are not materialized, so a view by itself does not cache the results.
SELECT table1.ID, table1.Quantity - IFNULL(table2.Quantity, 0) AS Quantity
FROM table1
LEFT JOIN table2
ON table1.ID = table2.ID
WHERE table1.Quantity - IFNULL(table2.Quantity, 0) > 0
To walk you through the above query: you use a LEFT JOIN here to combine your two tables. LEFT JOIN is used specifically because not all table 1 IDs are guaranteed to appear in table 2, but you still want to output those rows. You use the ID in your ON condition since that is how you are matching the tables. You need IFNULL because table 1 IDs with no matching table 2 ID produce NULL table 2 values in the joined row. You then subtract the two values to obtain your result. The WHERE clause removes rows whose result would be zero or less; it tests the computed difference rather than table2.Quantity, because a comparison against NULL would also drop the unmatched rows.
Use this SELECT statement:
SELECT T1.ID, T1.Quantity - COALESCE(T2.Quantity, 0) AS Quantity
FROM T1 LEFT JOIN T2 ON T1.ID = T2.ID
WHERE T1.Quantity - COALESCE(T2.Quantity, 0) <> 0
ORDER BY T1.ID;
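A small sketch that runs the subtraction against the sample data from the question, assuming SQLite as a stand-in for MySQL (COALESCE behaves the same way in both). It also runs the tempting variant that filters on T2.Quantity directly, to show why that filter loses ID 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE T1 (ID INTEGER PRIMARY KEY, Quantity INTEGER);
    CREATE TABLE T2 (ID INTEGER PRIMARY KEY, Quantity INTEGER);
    INSERT INTO T1 VALUES (1, 4), (2, 5), (3, 2);
    INSERT INTO T2 VALUES (2, 1), (3, 2);
    """
)

# LEFT JOIN keeps every T1 row; COALESCE turns the missing T2 quantity into 0.
# Filtering on the computed difference keeps ID 1 and drops ID 3 (difference 0).
result = conn.execute(
    """
    SELECT T1.ID, T1.Quantity - COALESCE(T2.Quantity, 0) AS Quantity
    FROM T1 LEFT JOIN T2 ON T1.ID = T2.ID
    WHERE T1.Quantity - COALESCE(T2.Quantity, 0) <> 0
    ORDER BY T1.ID
    """
).fetchall()

# Filtering on T2.Quantity instead: for ID 1 the comparison is 4 != NULL,
# which is NULL rather than true, so the unmatched row is silently dropped.
naive = conn.execute(
    """
    SELECT T1.ID, T1.Quantity - COALESCE(T2.Quantity, 0)
    FROM T1 LEFT JOIN T2 ON T1.ID = T2.ID
    WHERE T1.Quantity != T2.Quantity
    ORDER BY T1.ID
    """
).fetchall()

print(result)  # [(1, 4), (2, 4)]
print(naive)   # [(2, 4)]
```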

Advanced Mysql Query to get master record if two conditions matches on different rows of child records

I was writing a MySQL filter query with a primary table and another table that holds multiple records for each record of the primary table (I will call this table child).
I am trying to write a query which fetches records of the primary table based on their values in the child table. If there were only one condition on the child table, I could do it simply by joining, but I have 2 conditions which fall on the same field.
For example:
table 1:
id name url
1 XXX http://www.yahoo.com
2 YYY http://www.google.com
3 ZZZ http://www.bing.com
table 2:
id masterid optionvalue
1 1 2
2 1 7
3 2 7
4 2 2
5 3 2
6 3 6
My query has to return unique master records when the optionvalue matches only both 2 different conditions match on second table.
I wrote a query with IN...
select * from table1
left join table2 on table1.id=table2.masterid
where table2.optionvalue IN(2,7) group by table1.id;
This gets me all 3 records because IN is basically checking 'OR', but in my case I should not get the 3rd master record because it has values 2,6 (there is no 7). If I write the query with 'AND', then I am not getting any records...
select * from table1
left join table2 on table1.id=table2.masterid
where table2.optionvalue = 2 and table2.optionvalue = 7;
This will not return records, because the AND can never be true when I am checking two different values on the same column. I want to write a query which fetches master records that have child records whose optionvalue holds both 2 and 7 on different records.
Any help would be much appreciated.
Indeed, as AsConfused hinted, you need two joins to TABLE2 using aliases
-- both of these are tested:
-- find t1 where it has 2 and 7 in t2
select t1.*
from table1 t1
join table2 ov2 on t1.id=ov2.masterid and ov2.optionValue=2
join table2 ov7 on t1.id=ov7.masterid and ov7.optionValue=7
-- find t1 where it has 2 and 7 in t2, and no others in t2
select t1.*, ovx.id
from table1 t1
join table2 ov2 on t1.id=ov2.masterid and ov2.optionValue=2
join table2 ov7 on t1.id=ov7.masterid and ov7.optionValue=7
LEFT OUTER JOIN table2 ovx on t1.id=ovx.masterid and ovx.optionValue not in (2,7)
WHERE ovx.id is null
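A runnable sketch of both aliased-join queries, assuming SQLite as a stand-in for MySQL. It uses the question's sample rows plus one hypothetical extra row (master 2 also has option 5) so that the "exact match" variant returns something different from the "has both" variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, url TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, masterid INTEGER, optionvalue INTEGER);
    INSERT INTO table1 VALUES
      (1, 'XXX', 'http://www.yahoo.com'),
      (2, 'YYY', 'http://www.google.com'),
      (3, 'ZZZ', 'http://www.bing.com');
    INSERT INTO table2 VALUES
      (1, 1, 2), (2, 1, 7), (3, 2, 7), (4, 2, 2), (5, 3, 2), (6, 3, 6),
      (7, 2, 5);  -- hypothetical extra row: master 2 also has option 5
    """
)

# One aliased join per required value: a master survives only if both match.
both = [r[0] for r in conn.execute(
    """
    SELECT t1.id
    FROM table1 t1
    JOIN table2 ov2 ON t1.id = ov2.masterid AND ov2.optionvalue = 2
    JOIN table2 ov7 ON t1.id = ov7.masterid AND ov7.optionvalue = 7
    ORDER BY t1.id
    """
)]

# A LEFT JOIN on "any other value", keeping only the NULL (no-match) rows,
# turns it into an exact match: 2 and 7 and nothing else.
exact = [r[0] for r in conn.execute(
    """
    SELECT t1.id
    FROM table1 t1
    JOIN table2 ov2 ON t1.id = ov2.masterid AND ov2.optionvalue = 2
    JOIN table2 ov7 ON t1.id = ov7.masterid AND ov7.optionvalue = 7
    LEFT JOIN table2 ovx ON t1.id = ovx.masterid AND ovx.optionvalue NOT IN (2, 7)
    WHERE ovx.id IS NULL
    ORDER BY t1.id
    """
)]

print(both, exact)  # [1, 2] [1]
```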
You can try something like this (no performance guarantees, and assumes you only want exact matches):
select table1.* from table1 join
(select masterid, group_concat(optionvalue order by optionvalue) as opt from table2
group by masterid) table2_group on table1.id=table2_group.masterid
where table2_group.opt='2,7';
http://sqlfiddle.com/#!9/673094/9
select * from t1 where id in
(select masterid from t2 where
(t2.masterid in (select masterid from t2 where optionvalue=2))
and (t2.masterid in (select masterid from t2 where optionvalue=7)))
Old school :-) Query took 0.0009 sec.
This can also be done without the joins using correlated exists subqueries. That may be more efficient.
select *
from table1
WHERE EXISTS (SELECT 1 FROM table2 WHERE table1.id=table2.masterid and optionvalue = 2)
AND EXISTS (SELECT 1 FROM table2 WHERE table1.id=table2.masterid and optionvalue = 7)
If this is to be an exclusive match, as suggested by "when the optionvalue matches only both 2 different conditions match on second table", then you could add yet a third EXISTS condition. Performance-wise this may start to break down.
AND NOT EXISTS (SELECT 1 FROM table2 WHERE table1.id=table2.masterid AND optionvalue NOT IN (2,7))
Edit: see the note on correlated subqueries in "Which one is faster: correlated subqueries or join?".
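A sketch of the EXISTS version, assuming SQLite as a stand-in for MySQL and two hypothetical extra rows: master 2 also has option 5, and master 1 has option 2 twice. The duplicate shows one practical difference from the aliased-join approach: joins multiply rows when a child value repeats, while EXISTS only asks whether a match exists, so each master comes back once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, url TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, masterid INTEGER, optionvalue INTEGER);
    INSERT INTO table1 VALUES
      (1, 'XXX', 'http://www.yahoo.com'),
      (2, 'YYY', 'http://www.google.com'),
      (3, 'ZZZ', 'http://www.bing.com');
    INSERT INTO table2 VALUES
      (1, 1, 2), (2, 1, 7), (3, 2, 7), (4, 2, 2), (5, 3, 2), (6, 3, 6),
      (7, 2, 5),  -- hypothetical: master 2 also has option 5
      (8, 1, 2);  -- hypothetical: master 1 has option 2 twice
    """
)

HAS_2_AND_7 = """
SELECT id FROM table1
WHERE EXISTS (SELECT 1 FROM table2 WHERE table1.id = table2.masterid AND optionvalue = 2)
  AND EXISTS (SELECT 1 FROM table2 WHERE table1.id = table2.masterid AND optionvalue = 7)
"""
# Third, negative condition for the exclusive match: no child row with another value.
ONLY_2_AND_7 = HAS_2_AND_7 + """
  AND NOT EXISTS (SELECT 1 FROM table2
                  WHERE table1.id = table2.masterid AND optionvalue NOT IN (2, 7))
"""

both = [r[0] for r in conn.execute(HAS_2_AND_7 + " ORDER BY id")]
exact = [r[0] for r in conn.execute(ONLY_2_AND_7 + " ORDER BY id")]

# For comparison: the aliased-join form returns master 1 twice here,
# because it has two rows with optionvalue = 2.
joined = [r[0] for r in conn.execute(
    """
    SELECT t1.id FROM table1 t1
    JOIN table2 ov2 ON t1.id = ov2.masterid AND ov2.optionvalue = 2
    JOIN table2 ov7 ON t1.id = ov7.masterid AND ov7.optionvalue = 7
    ORDER BY t1.id
    """
)]

print(joined, both, exact)  # [1, 1, 2] [1, 2] [1]
```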

Create a VIEW where a record in t1 is not present in t2? Confirmation on Union/Left Join/Inner Join?

I am trying to make a view of records in t1 where the source id from t1 is not in t2.
Like... "what records are not present in the other table?"
Do I need to include t2 in the FROM clause? Thanks
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1
WHERE t1.fee_source_id NOT IN (
SELECT t1.fee_source_id
FROM t1 INNER JOIN t2 ON t1.fee_source_id = t2.fee_source
)
ORDER BY t1.aif_id DESC
You're looking to effect an anti-join, for which there are three possibilities in MySQL:
Using IN:
SELECT fee_source_id, company_name, document
FROM t1
WHERE fee_source_id NOT IN (SELECT fee_source FROM t2)
ORDER BY aif_id DESC
Using EXISTS:
SELECT fee_source_id, company_name, document
FROM t1
WHERE NOT EXISTS (
SELECT * FROM t2 WHERE t2.fee_source = t1.fee_source_id LIMIT 1
)
ORDER BY aif_id DESC
Using JOIN:
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1 LEFT JOIN t2 ON t2.fee_source = t1.fee_source_id
WHERE t2.fee_source IS NULL
ORDER BY t1.aif_id DESC
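A sketch that runs the three anti-join forms side by side, assuming SQLite as a stand-in for MySQL and hypothetical sample rows. On data without NULLs they agree. Once t2.fee_source contains a NULL, NOT IN returns no rows at all, because `x NOT IN (..., NULL)` evaluates to NULL rather than true for every x that isn't matched; NOT EXISTS and LEFT JOIN / IS NULL are unaffected. That is worth keeping in mind alongside the performance comparison below if fee_source is nullable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (aif_id INTEGER PRIMARY KEY, fee_source_id INTEGER,
                     company_name TEXT, document TEXT);
    CREATE TABLE t2 (fee_source INTEGER);
    INSERT INTO t1 VALUES (1, 10, 'Acme', 'a.pdf'), (2, 20, 'Beta', 'b.pdf'),
                          (3, 30, 'Gamma', 'c.pdf'), (4, 40, 'Delta', 'd.pdf');
    INSERT INTO t2 VALUES (10), (30);
    """
)

QUERIES = {
    "not_in": """SELECT fee_source_id FROM t1
                 WHERE fee_source_id NOT IN (SELECT fee_source FROM t2)
                 ORDER BY aif_id DESC""",
    "not_exists": """SELECT fee_source_id FROM t1
                     WHERE NOT EXISTS (SELECT 1 FROM t2
                                       WHERE t2.fee_source = t1.fee_source_id)
                     ORDER BY aif_id DESC""",
    "left_join": """SELECT t1.fee_source_id FROM t1
                    LEFT JOIN t2 ON t2.fee_source = t1.fee_source_id
                    WHERE t2.fee_source IS NULL
                    ORDER BY t1.aif_id DESC""",
}


def run_all():
    """Run every anti-join variant and return {name: [fee_source_id, ...]}."""
    return {name: [r[0] for r in conn.execute(sql)] for name, sql in QUERIES.items()}


before = run_all()                           # all three: [40, 20]
conn.execute("INSERT INTO t2 VALUES (NULL)")
after = run_all()                            # NOT IN: [], the other two: [40, 20]
print(before)
print(after)
```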
According to @Quassnoi's analysis:
Summary
MySQL can optimize all three methods to do a sort of NESTED LOOPS ANTI JOIN.
It will take each value from t_left and look it up in the index on t_right.value. In case of an index hit or an index miss, the corresponding predicate will immediately return FALSE or TRUE, respectively, and the decision to return the row from t_left or not will be made immediately without examining other rows in t_right.
However, these three methods generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient than those that execute index_subquery and LEFT JOIN optimized to use Not exists method.
That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
However, I'm not entirely sure how this analysis reconciles with the MySQL manual section on Optimizing Subqueries with EXISTS Strategy which (to my reading) suggests that the second approach above should be more efficient than the first.
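Another option below (similar to an anti-join) is a set difference with MINUS. Note that MINUS is Oracle syntax and MySQL does not support it; MySQL 8.0.31 and later offer the equivalent EXCEPT operator. Great answer above though. Thanks!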
SELECT D1.deptno, D1.dname
FROM dept D1
MINUS
SELECT D2.deptno, D2.dname
FROM dept D2, emp E2
WHERE D2.deptno = E2.deptno
ORDER BY 1;
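The same set difference can be sketched with EXCEPT, assuming SQLite as the engine (it supports EXCEPT, as MySQL does from 8.0.31). The dept and emp rows are hypothetical; the query returns the departments that have no employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH'), (40, 'OPERATIONS');
    INSERT INTO emp VALUES (1, 10), (2, 20), (3, 20);
    """
)

# EXCEPT plays the role of Oracle's MINUS: every department, minus those
# that join to at least one employee. Duplicates from the join are removed.
rows = conn.execute(
    """
    SELECT D1.deptno, D1.dname FROM dept D1
    EXCEPT
    SELECT D2.deptno, D2.dname FROM dept D2 JOIN emp E2 ON D2.deptno = E2.deptno
    ORDER BY 1
    """
).fetchall()

print(rows)  # [(40, 'OPERATIONS')]
```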

Inner join and Split on large volume of data

We are working with a large volume of data (row counts given below):
Table 1 : 708408568 rows -- 708 million
Table 2 : 1416817136 rows -- 1.4 billion
Table 1 Schema:
----------------
ID - Int PK
column2 - Int
Table 2 Schema
----------------
Table1ID - Int FK
SomeColumn - Int
SomeColumn - Int
Table1's PK serves as the FK for Table 2.
Index details :
Table1 :
PK Clustered Index on Id
Non Clustered (Non Unique) on column2
Table 2 :
Table1ID (FK) Clustered Index
Below is the query which needs to be executed :
SELECT t1.[id]
,t1.[column2]
FROM Table1 t1
inner join Table2 t2
on t1.id = t2.Table1ID
WHERE t1.[column2] in (select [id] from ConvertCsvToTable('1,2,3,4,5.......10000')) -- 10,000 comma-separated Ids
To summarize: the inner join on ID should be handled by the clustered indexes on the same IDs on both the PK and the FK,
and for the "huge" WHERE condition on column2 we have a nonclustered index.
However, the query takes 4 minutes for a small subset of 100 IDs, and we need to pass 10,000 IDs.
Is there a better way, design-wise, to do this, or would table partitioning help?
I just want some ideas on how to handle a huge-volume SELECT with an INNER JOIN and WHERE IN.
Note : ConvertCsvToTable is a Split function which has already been determined to perform optimally.
Thanks !
This is what I would try:
Create a temp table with the structure of the return from the function. Make sure to set the column ID as primary key so that the optimizer takes it into consideration...
CREATE TABLE #temp
(id int not null
...
,PRIMARY KEY (id) )
then call the function
insert into #temp (id) select [id] from ConvertCsvToTable('1,2,3,4,5.......10000')
then use the temp table directly joined in the query
SELECT t1.[id], t1.[column2]
FROM Table1 t1, Table2 t2, #temp
where t1.id = t2.Table1ID
and t1.[column2] = #temp.id
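A scaled-down sketch of the temp-table idea, assuming SQLite as a stand-in for SQL Server (so it says nothing about performance at 700 million rows). The CSV is split in Python in place of ConvertCsvToTable, and the temp table name `wanted` and the index names are hypothetical. The point it shows is the shape: the ID list lives in a keyed temp table and is joined like any other table, instead of sitting in a WHERE ... IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, column2 INTEGER);
    CREATE INDEX ix_table1_column2 ON Table1 (column2);       -- hypothetical name
    CREATE TABLE Table2 (Table1ID INTEGER REFERENCES Table1 (ID), SomeColumn INTEGER);
    CREATE INDEX ix_table2_table1id ON Table2 (Table1ID);     -- hypothetical name
    INSERT INTO Table1 VALUES (1, 5), (2, 6), (3, 7);
    INSERT INTO Table2 VALUES (1, 100), (3, 300), (3, 301);
    """
)


def lookup(conn, csv_ids):
    """Load the comma-separated ids into a keyed temp table, then join on it."""
    # Stand-in for ConvertCsvToTable: split in Python and de-duplicate so the
    # PRIMARY KEY on the temp table is never violated.
    ids = {int(x) for x in csv_ids.split(",") if x.strip()}
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM wanted")
    conn.executemany("INSERT INTO wanted (id) VALUES (?)", [(i,) for i in ids])
    # Start from the small table, then Table1 via column2, then Table2 via the FK.
    return conn.execute(
        """
        SELECT t1.ID, t1.column2
        FROM wanted w
        JOIN Table1 t1 ON t1.column2 = w.id
        JOIN Table2 t2 ON t2.Table1ID = t1.ID
        ORDER BY t1.ID
        """
    ).fetchall()


result = lookup(conn, "5,7,9")
print(result)  # [(1, 5), (3, 7), (3, 7)]
```

Table1 row 3 appears twice because it has two child rows in Table2, exactly as the original inner join would return it.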
Bring the condition into the join.
It gives the optimizer a chance to filter by t1.[column2] first.
Try different hash hints.
SELECT t1.[id], t1.[column2]
FROM Table1 t1 with (nolock)
inner join Table2 t2 with (nolock)
on t1.id = t2.Table1ID
and t1.[column2] in (select [id] from ConvertCsvToTable('1,2,3,4,5.......10000'))
You may need to tell it to use that index on Column2.
But give it a chance to do the right thing.
In the where you were not giving it a chance to do the right thing.
If you go with #temp, then try this
(and declare a PK on the temp table, as Rodolfo stated +1).
This will pretty much force it to start with the small table.
It could still get stupid and do the join on T2 first, but I doubt it.
SELECT t1.[id], t1.[column2]
FROM #temp
JOIN Table1 t1 with (nolock)
on t1.[column2] = #temp.ID
join Table2 t2 with (nolock)
on t2.Table1ID = t1.ID

Insert missing records from one table to another using mysql

I don't know why I am confused with this query.
I have two tables: Table A with 900 records and Table B with 800 records. Both tables need to contain the same data, but there is some mismatch.
I need to write a MySQL query to insert the missing 100 records from Table A into Table B.
In the end, both Table A and Table B should be identical.
I do not want to truncate all the entries first and then do an insert from the other table, so any help is appreciated.
Thank you.
It is also possible to use a LEFT OUTER JOIN for that. This avoids the subquery overhead of John Woo's answer (where the system might execute the subquery once for each record of the outer query), and it avoids the unnecessary work of overwriting the 800 existing records, as in user2340435's answer:
INSERT INTO b
SELECT a.* FROM a
LEFT OUTER JOIN b ON b.id = a.id
WHERE b.id IS NULL;
This first joins all rows of table A to table B, which yields the columns of both tables; for rows which exist in A but not in B, all of the B columns will be NULL.
Then it keeps only those latter rows (WHERE b.id IS NULL),
and finally it inserts these rows (the a.* columns) into table B.
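A small sketch of the LEFT JOIN / IS NULL insert, assuming SQLite as a stand-in for MySQL and hypothetical two-column tables `a` and `b` with identical column order (the `SELECT a.*` form relies on that).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO b VALUES (1, 'x');
    """
)

# Anti-join: only rows of a with no partner in b reach the INSERT,
# so existing rows in b are neither duplicated nor overwritten.
conn.execute(
    """
    INSERT INTO b
    SELECT a.* FROM a
    LEFT OUTER JOIN b ON b.id = a.id
    WHERE b.id IS NULL
    """
)
conn.commit()

rows_a = conn.execute("SELECT * FROM a ORDER BY id").fetchall()
rows_b = conn.execute("SELECT * FROM b ORDER BY id").fetchall()
print(rows_b)  # [(1, 'x'), (2, 'y'), (3, 'z')]
```

Running the INSERT a second time inserts nothing, because every id in a now has a partner in b.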
I think you can use IN for this (this is a simplification of your query):
INSERT INTO table2 (id, name)
SELECT id, name
FROM table1
WHERE (id,name) NOT IN
(SELECT id, name
FROM table2);
SQLFiddle Demo
As you can see in the demo, table2 has only 1 record, but after executing the query, 2 records were inserted into table2.
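A sketch reproducing the demo's shape, assuming SQLite (3.15 or later, which supports row values such as `(id, name) NOT IN (SELECT ...)`) as a stand-in for MySQL. The rows are hypothetical: table2 starts with 1 record, and the query adds the 2 that are missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (1, 'a');
    """
)

# Row-value comparison: a (id, name) pair is copied only if that exact pair
# is not already in table2.
conn.execute(
    """
    INSERT INTO table2 (id, name)
    SELECT id, name FROM table1
    WHERE (id, name) NOT IN (SELECT id, name FROM table2)
    """
)
conn.commit()

rows = conn.execute("SELECT * FROM table2 ORDER BY id").fetchall()
print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

As with any NOT IN, a NULL in table2's id or name can make the predicate NULL and quietly skip rows, so this form is safest when those columns are NOT NULL.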
If it's MySQL and the tables have identical structure, then this should work:
REPLACE INTO table1 SELECT * FROM table2;
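A sketch of what REPLACE does to rows that already exist, assuming SQLite as a stand-in (SQLite accepts `REPLACE INTO` as an alias for `INSERT OR REPLACE`) and hypothetical rows. REPLACE acts on PRIMARY KEY or UNIQUE conflicts: an existing row with the same key is deleted and re-inserted with the source's values, so rows already present are overwritten rather than skipped. Without such a key it behaves like a plain INSERT and would create duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table1 VALUES (1, 'old');
    INSERT INTO table2 VALUES (1, 'new'), (2, 'b');
    """
)

# id 1 conflicts on the primary key, so its row is replaced; id 2 is inserted.
conn.execute("REPLACE INTO table1 SELECT * FROM table2")
conn.commit()

rows = conn.execute("SELECT * FROM table1 ORDER BY id").fetchall()
print(rows)  # [(1, 'new'), (2, 'b')]
```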
This will insert the missing records into Table2:
INSERT INTO Table2
(Col1, Col2....)
(
SELECT Col1, Col2,... FROM Table1
EXCEPT
SELECT Col1, Col2,... FROM Table2
)
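A sketch of the EXCEPT insert, assuming SQLite as the engine and a hypothetical two-column version of the elided `Col1, Col2....` list. It is written without the parentheses around the compound SELECT. EXCEPT compares whole rows, so a row whose non-key columns differ between the tables is also treated as missing and inserted, which is why the follow-up UPDATE needs care.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (Col1 INTEGER, Col2 TEXT);
    CREATE TABLE Table2 (Col1 INTEGER, Col2 TEXT);
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Table2 VALUES (1, 'a');
    """
)

# Rows in Table1 that do not appear (as whole rows) in Table2 get inserted.
conn.execute(
    """
    INSERT INTO Table2 (Col1, Col2)
    SELECT Col1, Col2 FROM Table1
    EXCEPT
    SELECT Col1, Col2 FROM Table2
    """
)
conn.commit()

rows = conn.execute("SELECT * FROM Table2 ORDER BY Col1").fetchall()
print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```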
You can then run an update query to match the records that differ.
UPDATE T2
SET
Col1 = T1.Col1,
Col2 = T1.Col2
FROM
Table1 T1
INNER JOIN
Table2 T2
ON
T1.Col1 = T2.Col1
This also works when GROUP BY and HAVING clauses are used. Tested on SQL Server 2012 (11.0.5058). Tab1 is the source with new records; Tab2 is the destination to be updated. Tab2 also has an identity column. (Yes folks, the real world is not as neat and clean as lab assignments.)
INSERT INTO Tab2
SELECT a.T1,a.T2,a.T3,a.T4,a.Val1,a.Val2,a.Val3,a.Val4,-9,-9,-9,-9,MIN(hits) MinHit,MAX(hits) MaxHit,SUM(count) SumCnt, count(distinct(week)) WkCnt
FROM Tab1 a
LEFT OUTER JOIN Tab2 b ON b.t1 = a.t1 and b.t2 = a.t2 and b.t3 = a.t3 and b.t4 = a.t4 and b.val1 = a.val1 and b.val2 = a.val2 and b.val3 = a.val3 and b.val4 = a.val4
WHERE b.t1 IS NULL or b.Val1 is NULL
group by a.T1,a.T2,a.T3,a.T4,a.Val1,a.Val2,a.Val3,a.Val4 having MAX(returns)<4 and COUNT(distinct(week))>2 ;