I have three data tables that have different columns (<500 each) but share a common "id" column.
They look like:
table A
id A1 A2 ...
1 xxx xxx ...
2 xxx xxx ...
... ... ... ...
table B
id1 B1 B2 ...
1 xxx xxx ...
2 xxx xxx ...
... ... ... ...
table C
id2 C1 C2 ...
1 xxx xxx ...
2 xxx xxx ...
... ... ... ...
My goal is to join them into something like:
id A1 A2 ... B1 B2 ... C1 C2 ...
1 xxx xxx ... xxx xxx ... xxx xxx ...
2 xxx xxx ... xxx xxx ... xxx xxx ...
... ... ... ... ... ... ... ... ... ...
I was trying to join them together using
CREATE TABLE my_table
SELECT *
FROM table_A
LEFT OUTER JOIN table_B
ON table_A.id = table_B.id1
LEFT OUTER JOIN table_C
ON table_A.id = table_C.id2;
and it's been taking hours. But joining two of them takes less than 5 minutes using:
CREATE TABLE my_table
SELECT *
FROM table_A
LEFT OUTER JOIN table_B
ON table_A.id = table_B.id1
I tried using EXPLAIN, and here's what I get:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE table_A ALL (Null) (Null) (Null) (Null) 59670 100
1 SIMPLE table_B ALL (Null) (Null) (Null) (Null) 39776 100 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE table_C ALL (Null) (Null) (Null) (Null) 50208 100 Using where; Using join buffer (Block Nested Loop)
I searched around and found posts saying that "Using join buffer (Block Nested Loop)" is an inefficient strategy and suggesting disabling it with SET optimizer_switch='block_nested_loop=off';. However, when I tried this, even joining two tables takes more than 10 minutes, which is a huge drop in performance.
It seems that BNL is used only when there is no index to join on, which is not true given that all three tables have the "id" column?
I really wonder if there is some way to make the joining of these tables faster.
Maybe I should adjust the way of joining in my code?
Maybe I should turn some option on/off?
Any advice?
If the smaller join runs much faster, try doing the work in those smaller steps.
Start with something like
CREATE temporary TABLE my_table_AB
SELECT *
FROM table_A
LEFT OUTER JOIN table_B
ON table_A.id = table_B.id
then
CREATE TABLE my_table
SELECT *
FROM my_table_AB
LEFT OUTER JOIN table_C
ON my_table_AB.id = table_C.id
Another thing is - do you need to have LEFT JOIN here?
Since this was marked as solved and we found the solution during the discussion, I will put it here for reference: the issue was missing primary keys. After adding them, the join worked as expected.
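To illustrate why the missing keys mattered, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for MySQL here, and the miniature tables, the index name idx_b_id1 and the helper join_plan are assumptions made for the demo. It asks the engine for its query plan with and without an index on the join column; in MySQL the rough counterpart would be ALTER TABLE table_B ADD PRIMARY KEY (id1) followed by EXPLAIN.

```python
import sqlite3


def join_plan(with_index):
    """Return SQLite's query-plan lines for table_A LEFT JOIN table_B."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical miniature versions of table_A and table_B from the question.
    conn.execute("CREATE TABLE table_A (id INTEGER, A1 TEXT)")
    conn.execute("CREATE TABLE table_B (id1 INTEGER, B1 TEXT)")
    rows = [(i, "xxx") for i in range(1000)]
    conn.executemany("INSERT INTO table_A VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO table_B VALUES (?, ?)", rows)
    if with_index:
        # Rough MySQL counterpart: ALTER TABLE table_B ADD PRIMARY KEY (id1);
        conn.execute("CREATE UNIQUE INDEX idx_b_id1 ON table_B (id1)")
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM table_A "
        "LEFT OUTER JOIN table_B ON table_A.id = table_B.id1"
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable description.
    return [row[-1] for row in plan]


if __name__ == "__main__":
    for flag in (False, True):
        print("with index:" if flag else "without index:")
        for line in join_plan(flag):
            print("   ", line)
```

With the index in place the plan should mention idx_b_id1 for table_B, meaning each row of table_A is matched by an index lookup instead of a scan of the whole of table_B.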
It may be choking because you are creating the table from the SELECT. A new table cannot contain two columns with the same name, so you may need to list the columns explicitly, something like:
CREATE TABLE my_table
SELECT
a.id,
a.A1,
a.A2,
a.[rest of columns],
b.B1,
b.B2,
b.[rest of columns],
c.C1,
c.C2,
c.[rest of columns]
FROM
table_A a
LEFT JOIN table_B b
ON a.id = b.id
LEFT JOIN table_C c
ON a.id = c.id
With 500 rows it should be almost instantaneous
Since we need all the columns from all three tables, joined on the id column of each table, can't we use INNER JOIN here instead of LEFT JOIN?
You are probably generating a Cartesian product between the three tables. You can calculate the total number of rows in the result set using:
select sum(a.cnt * coalesce(b.cnt, 1) * coalesce(c.cnt, 1))
from (select id, count(*) as cnt from a group by id) a left join
(select id, count(*) as cnt from b group by id) b
on a.id = b.id left join
(select id, count(*) as cnt from c group by id) c
on a.id = c.id;
My guess is that the number is much, much larger than you expect. This is because b and c both have multiple rows for some ids. You haven't explained what results you want in such cases, so it is hard to provide an actual solution to your question. But this should explain the performance issue.
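As a quick sanity check of that counting query, here is a small sketch using Python's built-in sqlite3 module (SQLite standing in for MySQL). The three tiny tables and their contents are made up for the demo, and predicted_and_actual_rows is a hypothetical helper. It compares the predicted row count with the number of rows the three-way LEFT JOIN really produces.

```python
import sqlite3

# The counting query from the answer above, unchanged apart from layout.
COUNT_SQL = """
SELECT SUM(a.cnt * COALESCE(b.cnt, 1) * COALESCE(c.cnt, 1))
FROM (SELECT id, COUNT(*) AS cnt FROM a GROUP BY id) a
LEFT JOIN (SELECT id, COUNT(*) AS cnt FROM b GROUP BY id) b ON a.id = b.id
LEFT JOIN (SELECT id, COUNT(*) AS cnt FROM c GROUP BY id) c ON a.id = c.id
"""


def predicted_and_actual_rows():
    """Return (predicted, actual) row counts for a LEFT JOIN b LEFT JOIN c."""
    conn = sqlite3.connect(":memory:")
    for name in ("a", "b", "c"):
        conn.execute("CREATE TABLE %s (id INTEGER)" % name)
    # Made-up data: b and c contain repeated ids, a does not.
    conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO b VALUES (?)", [(1,), (1,), (2,), (2,), (2,)])
    conn.executemany("INSERT INTO c VALUES (?)", [(1,), (1,), (3,)])
    predicted = conn.execute(COUNT_SQL).fetchone()[0]
    actual = conn.execute(
        "SELECT COUNT(*) FROM a "
        "LEFT JOIN b ON a.id = b.id "
        "LEFT JOIN c ON a.id = c.id"
    ).fetchone()[0]
    conn.close()
    return predicted, actual


if __name__ == "__main__":
    print(predicted_and_actual_rows())
```

Here table a has only three rows, yet the join yields eight, because id 1 appears twice in both b and c and id 2 appears three times in b. The same multiplication on tens of thousands of rows is what makes the result explode.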
I have three data tables that have the same length (~50000), different columns (<500 each), and share a common "id" column.
They look like:
table A
id A1 A2 ...
1 xxx xxx ...
2 xxx xxx ...
... ... ... ...
n xxx xxx ...
table B
id B1 B2 ...
1 xxx xxx ...
2 xxx xxx ...
... ... ... ...
n xxx xxx ...
table C
id C1 C2 ...
1 xxx xxx ...
2 xxx xxx ...
... ... ... ...
n xxx xxx ...
I was trying to join them together using
CREATE TABLE my_table
SELECT *
FROM table_A
LEFT OUTER JOIN table_B
ON table_A.id = table_B.id
LEFT OUTER JOIN table_C
ON table_A.id = table_C.id;
and it's been taking hours.
However, when I do it by two separate steps like
CREATE TABLE my_table_0
SELECT *
FROM table_A
LEFT OUTER JOIN table_B
ON table_A.id = table_B.id;
CREATE TABLE my_table_1
SELECT *
FROM my_table_0
LEFT OUTER JOIN table_C
ON my_table_0.id = table_C.id;
Each step takes less than 5 minutes.
Does anyone know whether this is normal and what's causing it? I wonder if there is a faster way I can join three tables altogether without creating intermediary tables.
Sometimes (My)SQL can be strange.
What might already help in your case is using an inner join. If I understand correctly, all tables share the id column, so this should already be a bit faster.
To get a better understanding of what happens when your query runs, you can use the EXPLAIN keyword. There are several articles on using it and reading its output.
For example this is a good read: https://www.exoscale.com/syslog/explaining-mysql-queries/
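To keep every row from all the tables (the union of their ids), the join to use is FULL OUTER JOIN. Note that MySQL does not support FULL OUTER JOIN, so this only applies on a database that does. Could you try the code below and let me know if it works: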
CREATE TABLE my_table
SELECT *
FROM table_A
FULL OUTER JOIN table_B
ON table_A.id = table_B.id
FULL OUTER JOIN table_C
ON table_A.id = table_C.id;
And if you would like to join the 3 tables while maintaining the length of the first table (same number of rows), you should use LEFT JOIN:
CREATE TABLE my_table
SELECT *
FROM table_A
LEFT JOIN table_B
ON table_A.id = table_B.id
LEFT JOIN table_C
ON table_A.id = table_C.id;
LEFT OUTER JOIN and LEFT JOIN are synonyms in standard SQL, and MySQL accepts both.
LEFT JOIN is simply the shorter spelling.
TableA
clientId clientPassword
1 1234
2 1234
3 1234
TableB
clientId clientCode
1 TRN
2 ABC
3 CDE
3 TRN
What would be the query to select TableA.clientPassword for only those clientIds that do not have 'TRN' in TableB.clientCode?
Part of a complex query but simplified to get my question answered.
NOT EXISTS works perfectly and is the logically straightforward way to write this, but it isn't always the most performant option. Using a NOT IN relies on the DB system to figure out that it can flatten the query to avoid running the subquery row by row. With a query this simple the DB system will likely figure it out, but you can also write it in a flattened way yourself.
SELECT a.*
FROM TableA a LEFT JOIN TableB b ON a.clientId = b.clientId
AND b.clientCode = 'TRN'
WHERE b.clientId IS NULL
To explain this a bit: the LEFT JOIN matches TableB to TableA where the ids match and the clientCode is 'TRN', but it keeps every row of TableA and fills the TableB columns with NULLs when no 'TRN' record exists. The IS NULL check is therefore equivalent to the NOT EXISTS in the other query, but it avoids the row-by-row checking of a correlated subquery and should be much faster when the optimizer does not flatten that subquery itself.
This is a basic NOT EXISTS:
select a.*
from tableA a
where not exists (select 1
from tableB b
where a.clientId = b.clientId and b.clientcode = 'TRN'
);
More here - https://technet.microsoft.com/en-us/library/ms184297(v=sql.105).aspx
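To check that the two formulations agree, here is a small sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module and runs both queries. SQLite stands in for whatever engine you use, the column types are guessed from the sample data, and clients_without_trn is a hypothetical helper name.

```python
import sqlite3

# Flattened form: LEFT JOIN plus IS NULL (an anti-join).
LEFT_JOIN_SQL = """
SELECT a.clientId, a.clientPassword
FROM TableA a
LEFT JOIN TableB b ON a.clientId = b.clientId AND b.clientCode = 'TRN'
WHERE b.clientId IS NULL
"""

# Correlated subquery form.
NOT_EXISTS_SQL = """
SELECT a.clientId, a.clientPassword
FROM TableA a
WHERE NOT EXISTS (
    SELECT 1 FROM TableB b
    WHERE b.clientId = a.clientId AND b.clientCode = 'TRN'
)
"""


def clients_without_trn(sql):
    """Run one of the queries above against the sample data from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableA (clientId INTEGER, clientPassword TEXT)")
    conn.execute("CREATE TABLE TableB (clientId INTEGER, clientCode TEXT)")
    conn.executemany(
        "INSERT INTO TableA VALUES (?, ?)",
        [(1, "1234"), (2, "1234"), (3, "1234")],
    )
    conn.executemany(
        "INSERT INTO TableB VALUES (?, ?)",
        [(1, "TRN"), (2, "ABC"), (3, "CDE"), (3, "TRN")],
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(clients_without_trn(LEFT_JOIN_SQL))
    print(clients_without_trn(NOT_EXISTS_SQL))
```

Both queries return only client 2. Client 3 is excluded even though it also has a 'CDE' row, because one of its rows carries 'TRN'.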
I have two tables I'm trying to join to produce a unique set of data for a third table, but having trouble doing this properly.
The left table has an id field, as well as a common join field (a).
The right table has the common join field (a), and another distinct field (b)
I'm trying to extract a result-set of id and b, where neither id nor b are duplicated.
I have an SQL fiddle set up: http://www.sqlfiddle.com/#!9/208de/3/0
The ideal results should be:
id | b
---+---
1 | 1
2 | 2
3 | 3
Each id and b value appears only once (it's only coincidence they match here, that can't be assumed always).
Thanks
What about a CTE along with a DISTINCT? Would that work?
WITH
cte1 (ID)
AS
(
SELECT DISTINCT Table1.ID
FROM Table1
WHERE Table1.ID IS NOT NULL
GROUP BY Table1.ID
)
SELECT DISTINCT
sp.b
FROM Table2 AS sp
INNER JOIN cte1 AS ts
ON sp.b <> ts.ID
ORDER BY ts.ID DESC
I have two tables:
table1
#id_table1 | code1
#---------------------
# 1 | abc
# 2 | abcd
# 3 | abcde
table2
#id_table2|code2
#--------------------
# 1 | aaa
# 2 | bbb
# 3 | abcde
If I want to join these two tables and get the records that are in both tables:
SELECT table1.code1, table2.code2 FROM table1, table2
WHERE table1.code1=table2.code2
Result: abcde
It's easy, but now I need to do the opposite: I want the records from table1.code1 which aren't in table2.code2
Result I need: abc, abcd
And the records from table2.code2 which aren't in table1.code1
Result I need: aaa, bbb
I would appreciate any help - thanks in advance!
Actually, I just noticed this is tagged specifically for MySQL, which doesn't support FULL OUTER JOIN (if you are on another SQL system that supports it, you can skip down to the preferred approach).
So, in MySQL you need to UNION together both a left and right join like this:
SELECT
table1.code1,
table2.code2
FROM table1
LEFT JOIN table2 ON table1.code1=table2.code2
WHERE table2.code2 IS NULL
UNION
SELECT
table1.code1,
table2.code2
FROM table1
RIGHT JOIN table2 ON table1.code1=table2.code2
WHERE table1.code1 IS NULL
If you have FULL OUTER JOIN compatibility, you would perform the FULL OUTER JOIN and look for cases where the join results in null records on the field you are trying to join on.
SELECT
table1.code1,
table2.code2
FROM table1
FULL OUTER JOIN table2 ON table1.code1=table2.code2
WHERE table1.code1 IS NULL OR table2.code2 IS NULL
Here is a well-known article explaining how to perform different types of joins: http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
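The UNION approach can be tried out on the sample data with Python's built-in sqlite3 module, as in the sketch below. Two things are assumptions of the demo: SQLite stands in for MySQL, and the RIGHT JOIN is rewritten as a LEFT JOIN with the tables swapped, which is equivalent and also runs on older SQLite versions that lack RIGHT JOIN. The helper name unmatched_codes is hypothetical.

```python
import sqlite3

# Rows of table1 with no match in table2, plus rows of table2 with no match
# in table1. The second half is the RIGHT JOIN written as a swapped LEFT JOIN.
UNMATCHED_SQL = """
SELECT table1.code1, table2.code2
FROM table1 LEFT JOIN table2 ON table1.code1 = table2.code2
WHERE table2.code2 IS NULL
UNION
SELECT table1.code1, table2.code2
FROM table2 LEFT JOIN table1 ON table1.code1 = table2.code2
WHERE table1.code1 IS NULL
"""


def unmatched_codes():
    """Return the set of (code1, code2) pairs that have no partner."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id_table1 INTEGER, code1 TEXT)")
    conn.execute("CREATE TABLE table2 (id_table2 INTEGER, code2 TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [(1, "abc"), (2, "abcd"), (3, "abcde")],
    )
    conn.executemany(
        "INSERT INTO table2 VALUES (?, ?)",
        [(1, "aaa"), (2, "bbb"), (3, "abcde")],
    )
    rows = set(conn.execute(UNMATCHED_SQL).fetchall())
    conn.close()
    return rows


if __name__ == "__main__":
    for code1, code2 in unmatched_codes():
        print(code1, code2)
```

Each returned pair has a value on one side and NULL (None in Python) on the other, so abc and abcd come back in the code1 position and aaa and bbb in the code2 position, while abcde is absent.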
Quite simple actually:
SELECT code1
FROM table1 LEFT JOIN table2
ON table1.code1 = table2.code2
WHERE code2 IS NULL
And the same with the opposite table
I am looking for ways to merge row values into one row where the column to merge is the same
Transform:
FK | F1
========
3 | ABC
3 | DEF
to
FK | F1 | F2
=================
3 | ABC | DEF
Update:
I don't know the values of F1 in advance. They might be anything, but I know they are unique for a given FK and they are varchars.
Update 2:
With your help I came to this query that will also add the FK for which there is only one value. I suppose it could be improved.
SELECT IFNULL(jointable.FK,table.FK) AS FK, IFNULL(jointable.F1,table.F1), jointable.F2
FROM table
LEFT JOIN
(SELECT T1.FK, T1.F1, T2.F1 AS F2
FROM table T1
LEFT JOIN table T2 ON T1.FK = T2.FK
WHERE T1.F1 <> T2.F1
GROUP BY T1.FK
) as jointable
ON table.FK=jointable.FK
GROUP BY FK;
Try this
SELECT T1.FK
, T1.F1
, T2.F1 AS F2
FROM table T1
LEFT JOIN table T2 ON T1.FK = T2.FK AND T1.F1 <> T2.F1 -- Criteria moved here
The LEFT JOIN is used since you mentioned that you have 1 or more values, which means the INNER JOIN could end up excluding rows.
The second criterion is to make sure you don't end up with rows like:
FK | F1 | F2
=================
3 | ABC | ABC
Please be aware that in case of an OUTER JOIN (either LEFT or RIGHT) the join criteria is not the same as the filter criteria, and therefore I moved it above.
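The difference between a join condition and a filter condition can be seen in a small experiment. The sketch below uses Python's built-in sqlite3 module with SQLite standing in for your database. The table name t, the extra row with FK 4 and the helper run are assumptions added for the demo; FK 4 has a single value, so it has no partner row.

```python
import sqlite3

# Condition inside ON: unmatched rows of T1 survive with NULL in F2.
ON_SQL = """
SELECT T1.FK, T1.F1, T2.F1 AS F2
FROM t T1
LEFT JOIN t T2 ON T1.FK = T2.FK AND T1.F1 <> T2.F1
"""

# Same condition in WHERE: it filters after the join and drops those rows.
WHERE_SQL = """
SELECT T1.FK, T1.F1, T2.F1 AS F2
FROM t T1
LEFT JOIN t T2 ON T1.FK = T2.FK
WHERE T1.F1 <> T2.F1
"""


def run(sql):
    """Run a query against a tiny hypothetical table t(FK, F1)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (FK INTEGER, F1 TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [(3, "ABC"), (3, "DEF"), (4, "XYZ")],
    )
    rows = set(conn.execute(sql).fetchall())
    conn.close()
    return rows


if __name__ == "__main__":
    print("condition in ON:   ", run(ON_SQL))
    print("condition in WHERE:", run(WHERE_SQL))
```

With the condition in ON, FK 4 is kept and its F2 is NULL. With the condition in WHERE, FK 4 only joins to itself, fails the inequality test and is removed, which is exactly the behaviour of an INNER JOIN.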
In SQL Server, you can use ROW_NUMBER() over FK, maybe with an ORDER BY.
In MySQL you might be able to use it with a GROUP BY as you mentioned in comments, I am not sure it will work (at least not in SQL Server without an aggregate function or a CTE).
Here is a live test: http://ideone.com/Bu5aae
A suggestion:
SELECT T1.FK, CONCAT(T1.F1,'',T2.F1) AS Result
FROM table T1, table T2
WHERE T1.FK = T2.FK