Performance/time joining multiple tables - mysql

I have three data tables that have the same length (~50000), different columns (<500 each), and share a common "id" column.
They look like:
table A
id A1 A2 ...
1 xxx xxx ...
2 xxx xxx ...
... ... ... ...
n xxx xxx ...
table B
id B1 B2 ...
1 xxx xxx ...
2 xxx xxx ...
... ... ... ...
n xxx xxx ...
table C
id C1 C2 ...
1 xxx xxx ...
2 xxx xxx ...
... ... ... ...
n xxx xxx ...
I was trying to join them together using
CREATE TABLE my_table
SELECT *
FROM table_A
LEFT OUTER JOIN table_B
ON table_A.id = table_B.id
LEFT OUTER JOIN table_C
ON table_A.id = table_C.id;
and it's been taking hours.
However, when I do it by two separate steps like
CREATE TABLE my_table_0
SELECT *
FROM table_A
LEFT OUTER JOIN table_B
ON table_A.id = table_B.id;
CREATE TABLE my_table_1
SELECT *
FROM my_table_0
LEFT OUTER JOIN table_C
ON my_table_0.id = table_C.id;
Each step takes less than 5 minutes.
Does anyone know whether this is normal and what's causing it? I wonder if there is a faster way I can join three tables altogether without creating intermediary tables.

Sometimes (My)SQL can be strange.
An inner join might already help in your case: if I understand correctly, all tables share the id column, so an inner join should be a bit faster.
To better understand what happens when the query runs, use the EXPLAIN keyword; there are several articles on how to use it and how to read its output.
For example this is a good read: https://www.exoscale.com/syslog/explaining-mysql-queries/
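As a minimal sketch of the EXPLAIN suggestion, the keyword can be prefixed to the SELECT part of the statement from the question, without the CREATE TABLE wrapper. The table and column names are the ones from the question; which output columns matter most is only a rough guide.

```sql
-- Ask MySQL how it plans to execute the three-way join.
-- Only the SELECT is explained; no table is created.
EXPLAIN
SELECT *
FROM table_A
LEFT OUTER JOIN table_B
  ON table_A.id = table_B.id
LEFT OUTER JOIN table_C
  ON table_A.id = table_C.id;
```

In the output, type = ALL together with key = NULL for table_B or table_C means that table is scanned in full and no index is used for the join column.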

If you want the union of the rows of the tables, the standard SQL construct is FULL OUTER JOIN. Note that MySQL does not support FULL OUTER JOIN, so the following statement only runs on database systems that do:
CREATE TABLE my_table
SELECT *
FROM table_A
FULL OUTER JOIN table_B
ON table_A.id = table_B.id
FULL OUTER JOIN table_C
ON table_A.id = table_C.id;
And if you would like to join the 3 tables while maintaining the length of the first table (same number of rows), you should use LEFT JOIN:
CREATE TABLE my_table
SELECT *
FROM table_A
LEFT JOIN table_B
ON table_A.id = table_B.id
LEFT JOIN table_C
ON table_A.id = table_C.id;

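In MySQL, LEFT OUTER JOIN and LEFT JOIN are synonyms (the OUTER keyword is optional), so switching between them changes neither the result nor the speed.
A bare OUTER JOIN without LEFT or RIGHT is not valid in MySQL.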

Related

SQL - Joining multiple tables with shared index

I have three data tables that have different columns (<500 each) but share a common "id" column.
They look like:
table A
id A1 A2 ...
1 xxx xxx ...
2 xxx xxx ...
... ... ... ...
table B
id1 B1 B2 ...
1 xxx xxx ...
2 xxx xxx ...
... ... ... ...
table C
id2 C1 C2 ...
1 xxx xxx ...
2 xxx xxx ...
... ... ... ...
My goal is to join them into something like:
id A1 A2 ... B1 B2 ... C1 C2 ...
1 xxx xxx ... xxx xxx ... xxx xxx ...
2 xxx xxx ... xxx xxx ... xxx xxx ...
... ... ... ... ... ... ... ... ... ...
I was trying to join them together using
CREATE TABLE my_table
SELECT *
FROM table_A
LEFT OUTER JOIN table_B
ON table_A.id = table_B.id1
LEFT OUTER JOIN table_C
ON table_A.id = table_C.id2;
and it's been taking hours. But joining two of them takes less than 5 minutes using:
CREATE TABLE my_table
SELECT *
FROM table_A
LEFT OUTER JOIN table_B
ON table_A.id = table_B.id1
I tried using EXPLAIN, and here's what I get:
id  select_type  table    type  possible_keys  key     key_len  ref     rows   filtered  Extra
1   SIMPLE       table_A  ALL   (Null)         (Null)  (Null)   (Null)  59670  100
1   SIMPLE       table_B  ALL   (Null)         (Null)  (Null)   (Null)  39776  100       Using where; Using join buffer (Block Nested Loop)
1   SIMPLE       table_C  ALL   (Null)         (Null)  (Null)   (Null)  50208  100       Using where; Using join buffer (Block Nested Loop)
I searched around and found posts saying that "Using join buffer (Block Nested Loop)" is inefficient and suggesting disabling it with SET optimizer_switch='block_nested_loop=off';. However, when I tried this, even joining two tables takes more than 10 minutes, which is a huge drop in performance.
It seems that BNL is used only when there is no index to join on, which should not be the case given that all three tables have the "id" column?
I really wonder if there is some way to make the joining of these tables faster.
Maybe I should adjust the way of joining in my code?
Maybe I should turn some option on/off?
Any advice?
If the smaller join works much faster, try doing it in smaller steps.
Start with something like
CREATE temporary TABLE my_table_AB
SELECT *
FROM table_A
LEFT OUTER JOIN table_B
ON table_A.id = table_B.id1
then
CREATE TABLE my_table
SELECT *
FROM my_table_AB
LEFT OUTER JOIN table_C
ON my_table_AB.id = table_C.id2
Another thing is - do you need to have LEFT JOIN here?
As this was marked as solved and we found the solution during the discussion, I will put it here for reference: the issue was missing primary keys. After adding them, the join worked as expected.
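A sketch of that fix, assuming the join columns are unique and contain no NULLs (both are required for a primary key). The column names id, id1, and id2 are taken from the question; the index name idx_b_id1 in the commented alternative is hypothetical.

```sql
-- Give each table a primary key on its join column so the optimizer
-- can look rows up by key instead of scanning the whole table.
ALTER TABLE table_A ADD PRIMARY KEY (id);
ALTER TABLE table_B ADD PRIMARY KEY (id1);
ALTER TABLE table_C ADD PRIMARY KEY (id2);

-- Alternative if a join column contains duplicates, where a primary
-- key is not possible (hypothetical index name):
-- ALTER TABLE table_B ADD INDEX idx_b_id1 (id1);

-- Re-check the execution plan afterwards.
EXPLAIN
SELECT *
FROM table_A
LEFT OUTER JOIN table_B ON table_A.id = table_B.id1
LEFT OUTER JOIN table_C ON table_A.id = table_C.id2;
```

With the keys in place, the EXPLAIN rows for table_B and table_C would typically show a key lookup (for example type = eq_ref, key = PRIMARY) instead of type = ALL with a join buffer.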
It may be choking because you are trying to create a table from the SELECT. The issue is that a table cannot contain two columns with the same name, which is what SELECT * produces when the joined tables share a column name. You may need to be explicit, something like
CREATE TABLE my_table
SELECT
a.id,
a.A1,
a.A2,
a.[rest of columns],
b.B1,
b.B2,
b.[rest of columns],
c.C1,
c.C2,
c.[rest of columns]
FROM
table_A a
LEFT JOIN table_B b
ON a.id = b.id
LEFT JOIN table_C c
ON a.id = c.id
With 500 rows it should be almost instantaneous
Since we need all the columns from all 3 tables, joined on the id column, can't we use INNER JOIN here instead of LEFT JOIN?
You are probably generating a Cartesian product between the three tables. You can calculate the total number of rows in the result set using:
select sum(a.cnt * coalesce(b.cnt, 1) * coalesce(c.cnt, 1))
from (select id, count(*) as cnt from a group by id) a left join
(select id, count(*) as cnt from b group by id) b
on a.id = b.id left join
(select id, count(*) as cnt from c group by id) c
on a.id = c.id;
My guess is that the number is much, much larger than you expect. This is because b and c both have multiple rows for some ids. You haven't explained what results you want in such cases, so it is hard to provide an actual solution to your question. But this should explain the performance issue.
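To see which ids cause the blow-up, a follow-up check along these lines lists the duplicated ids in one table. It uses the same shorthand table name (b) as the query above; substitute the real table and column names, and repeat it for c.

```sql
-- List ids that occur more than once in b. Each such id multiplies
-- the matching rows from a (and from c) in the join result.
SELECT id, COUNT(*) AS cnt
FROM b
GROUP BY id
HAVING COUNT(*) > 1
ORDER BY cnt DESC
LIMIT 20;
```

If this returns no rows for either b or c, the join does not multiply rows and the slowdown has another cause.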

Is This Join Possible?

I have a couple of tables that look like this.
table_a | table_b
-------------------------
prim_key | prim_key
zero_or_one | value1
valueA | value2
valueB | value3
valueZ |
What I'm hoping to do is retrieve all of the values (prim_key, value1, value2, value3) from TABLE B if the primary keys of each table match and the value of zero_or_one in TABLE A is 0.
I'm completely new to joins, and I'm not exactly sure which join I should be using for this, but it seems like a FULL OUTER JOIN is most appropriate.
SELECT table_b.*
FROM table_a
FULL OUTER JOIN table_b
ON table_a.prim_key = table_b.prim_key
Is this even possible?
Am I using the right join for the job?
Is my "select all" syntax correct?
Since you want entries from table_b only when a matching primary key exists in table_a, a simple inner join suffices in this case:
SELECT table_b.*
FROM table_b
INNER JOIN table_a
ON table_a.prim_key = table_b.prim_key AND
table_a.zero_or_one = 0
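A small worked example of this inner join, using made-up rows. The sample data and the reduced column lists are assumptions for illustration only.

```sql
-- Hypothetical sample data, only to illustrate the filter.
CREATE TABLE table_a (prim_key INT PRIMARY KEY, zero_or_one TINYINT);
CREATE TABLE table_b (prim_key INT PRIMARY KEY, value1 VARCHAR(10));

INSERT INTO table_a VALUES (1, 0), (2, 1), (3, 0);
INSERT INTO table_b VALUES (1, 'x'), (2, 'y'), (4, 'z');

SELECT table_b.*
FROM table_b
INNER JOIN table_a
  ON table_a.prim_key = table_b.prim_key
 AND table_a.zero_or_one = 0;
-- Returns only the row (1, 'x'):
--   key 2 matches but has zero_or_one = 1,
--   key 3 has no row in table_b,
--   key 4 has no row in table_a.
```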
This is not meant as a real answer to the question; it only shows how to simulate FULL OUTER JOIN in MySQL.
FULL OUTER JOIN is not supported in MySQL, but you can simulate it with a LEFT JOIN, UNION ALL, and a RIGHT JOIN:
SELECT * FROM table_a LEFT JOIN table_b ON table_a.prim_key = table_b.prim_key
UNION ALL
SELECT * FROM table_a RIGHT JOIN table_b ON table_a.prim_key = table_b.prim_key
WHERE table_a.prim_key IS NULL

How to merge records from two tables into third using MYSQL

I have 3 tables A, B, C
The schemas of the 3 tables are as follows:
1st A table:
cpid ,name, place
2nd B table:
connectorid,dob
3rd C table:
ccpid cconnectorid
Both tables A and B have many records.
Some of the records in A and B have the same id.
I want to merge the records from A and B into table C.
The merge logic is as follows:
1) If a record has cpid = connectorid, insert it into table C.
2) In table C, ccpid is the foreign key to cpid in table A, and cconnectorid is the foreign key to connectorid in table B.
3) It should be done using a SELECT query.
You can use INSERT ... SELECT with an inner join:
insert into table_c
select a.cpid, b.connectorid, a.place
from table_b as b
inner join table_a as a on a.cpid = b.connectorid
You can try this solution for your query:
INSERT INTO `C`(`ccpid`, `cconnectorid`, `ccity`)
SELECT ta.`cpid`, tb.`connectorid`, ta.`place`
FROM `A` AS ta
INNER JOIN `B` AS tb ON ta.`cpid` = tb.`connectorid`
Do you just need to join the data from both tables? That is a simple JOIN.
SELECT *
FROM Table_A
INNER JOIN Table_B
ON Table_A.cpid =Table_B.connectorid;
You can insert the result of this SELECT into your Table_C.
This uses INNER JOIN, but I think you should take a look at the other JOIN types; here is a summary:
INNER JOIN: Returns all rows when there is at least one match in BOTH tables
LEFT JOIN: Returns all rows from the left table, and the matched rows from the right table
RIGHT JOIN: Returns all rows from the right table, and the matched rows from the left table
FULL JOIN: Returns all rows when there is a match in ONE of the tables
Use the following query, replacing the table names with your own:
INSERT INTO CTABLE(ccpid,cconnectorid,ccity)
(SELECT A.cpid ,B.connectorid, A.place FROM
TABLEA A INNER JOIN TABLEB B ON A.cpid = B.connectorid)
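If table C really has only the two columns listed in the question (ccpid and cconnectorid), a sketch of the insert without the extra ccity column used in the answers above could look like this. The table names A, B, and C are taken from the question.

```sql
-- Insert one row into C for every pair of rows where
-- A.cpid equals B.connectorid.
INSERT INTO C (ccpid, cconnectorid)
SELECT a.cpid, b.connectorid
FROM A AS a
INNER JOIN B AS b
  ON a.cpid = b.connectorid;
```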

MySQL - Join two tables, different columns in table 1 on the same column in table 2

I'm trying to join a different column (part_type_n (where n ranges from 1 to 54)) on Table1 with the same column (id, primary, autoinc) on Table2.
Schema:
Table1
==============
part_type_1
.
.
.
part_type_54
Table2
=============
id
I tried the obvious query (php generated, looping through n from 1 to 54), omitted repetitive stuff in ...:
SELECT * FROM Table1 JOIN Table2 on (Table1.part_type_1=Table2.id), ..., (Table1.part_type_54=Table2.id)
I receive this error:
1066 - Not unique table/alias: 'Table2'
How do I join these two tables?
You will have to join the table multiple times, giving each copy its own alias:
SELECT * FROM table1 t1
INNER JOIN table2 t2 on t2.Id=t1.part_type_1
INNER JOIN table2 t3 on t3.id = t1.part_type_54;
Hope this helps!
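Error 1066 is raised when the same table is referenced more than once in a query without distinct aliases. A sketch of the aliasing pattern for the first two columns only: the aliases p1 and p2 and the output column names are hypothetical, LEFT JOIN is an assumption so that rows with an unmatched part_type_n are kept, and the remaining columns would follow the same pattern.

```sql
-- Each join to Table2 gets its own alias, so MySQL can tell the
-- copies apart; the matched ids are given distinct output names.
SELECT t1.*,
       p1.id AS part_1_id,
       p2.id AS part_2_id
FROM Table1 AS t1
LEFT JOIN Table2 AS p1 ON p1.id = t1.part_type_1
LEFT JOIN Table2 AS p2 ON p2.id = t1.part_type_2;
```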
As an alternative to writing a query with 54 table aliases, you could consider joining to the table once - like so:
select ...
from Table1 t1
join Table2 t2
on t2.id in (t1.part_type_1, t1.part_type_2, ... t1.part_type_54)
This approach worked for me to get the required result as one row that matches several categories, all of which are stored in one table column.
Query
SELECT cm3.*, xp.post_title,GROUP_CONCAT(DISTINCT sc.name) AS cate_list
FROM `xld_posts` xp
JOIN course_map cm0 ON cm0.course_id = xp.ID
JOIN course_map cm1 ON cm1.course_id = cm0.course_id AND cm0.id = 3
JOIN course_map cm2 ON cm2.course_id = cm1.course_id AND cm1.id = 6
JOIN course_map cm3 ON cm3.course_id = cm2.course_id AND cm2.id = 11
JOIN subject_category sc ON cm3.id = sc.id
GROUP by post_title ORDER BY post_title
Note: the category values 3, 6, and 11 come from a form submission. If your form has more or fewer than three values, the query should be created dynamically, joining each copy of the table to the previous one.
Happy if anyone finds this useful. :)

How to use INNER/OUTER JOIN in MYSQL

I have 3 tables which contain different types of data related to each other. The tables are populated from an Excel spreadsheet. I have:
table1         table2         table3
item_number    item_number    item_number
desc           desc           qty_sold
qty_instock    vdf_cost       upc
cost           status
What I'm trying to do is use a join to show all the data as they relate to each other. The problem is that when I run
SELECT *
FROM table1 a
INNER JOIN table2 b
ON a.someColumn = b.otherColumn
INNER JOIN table3 c
ON b.anotherColumn = c.nextColumn
it just puts the tables side by side. If I run
SELECT *
FROM table1 a
INNER JOIN table2 b
USING(item_number)
it works, but it only joins on the item number (I have no idea how to use multiple fields, such as the description, which repeats). For some reason I can only use two tables; when I try to add a third table (most likely done completely wrong)
SELECT *
FROM table1 a
INNER JOIN table2 b
INNER JOIN table3 c
USING(item_number)
I just get a syntax error.
Thanks for all the help in advance
UPDATE:
I got it working
SELECT *
FROM master_list a
INNER JOIN bby_report ab USING (item_number, description)
INNER JOIN sales_report b USING (item_number)
Is there a way I can exclude the description from one of the tables and keep it from another? It turns out the descriptions are not an exact match from one table to another, so the query keeps returning zero results.
To clarify: keep description from table1 and leave out description from table2, while still being able to JOIN on item_number.
SELECT *
FROM master_list a
INNER JOIN bby_report ab USING (item_number, description)
INNER JOIN sales_report b USING (item_number)
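One possible way to do this, sketched under the assumption that the column is named description in both master_list and bby_report: join on item_number only, and list the wanted columns explicitly instead of using SELECT *. The other columns shown (vdf_cost, status, qty_sold, upc) are taken from the table layout at the top of the question and may differ in the real tables.

```sql
-- Join on item_number alone. description is no longer part of the
-- join condition, so mismatched descriptions do not drop rows.
SELECT a.item_number,
       a.description,   -- kept from master_list only
       ab.vdf_cost,
       ab.status,
       b.qty_sold,
       b.upc
FROM master_list AS a
INNER JOIN bby_report AS ab USING (item_number)
INNER JOIN sales_report AS b USING (item_number);
```

Leaving ab.description out of the select list is what excludes the second description; it can be added back under an alias if both versions are needed for comparison.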