Optimize UPDATE JOIN between two large tables - MySQL

I have an update that does this:
Update Table1 as T1
Inner Join Table2 as T2
On T1.X=T2.Y
Set T1.A=T2.B;
Table1 is around 10,000,000 records
Table2 is around 40,000 records
I have an index on both T1.X and T2.Y
Naturally this takes forever. Is there a way to reduce the time?
For instance, my understanding is that the join scales not with the sum of the table sizes but with their product. Is there a way (if this is true) to step through the join 1,000 Table1 records at a time?

Apparently the join works on both full tables, so I modified my query to this:
Update Table1 as T1
Inner Join (Select T2.Y,T2.B from Table2) as T2
On T1.X=T2.Y
Set T1.A=T2.B;
I haven't run the entire query again, but when I limited it as a test with 'where T1.ID<100', it dropped from 18+ seconds to 1 second once I added the Select subquery.
UPDATE
I also tested pushing in a limited number of records at a time. Interestingly enough, 10 records take 1 second, 1,000 records take 6 seconds, and 10,000 take 250 seconds. So I am going to test wrapping this in a while loop to step through 1,000 records at a time.
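A minimal sketch of that batching idea (assuming Table1 has a numeric auto-increment ID column, as the WHERE T1.ID<100 test above suggests; the procedure name and the batch size are made up):
DELIMITER //
CREATE PROCEDURE batch_update_table1()
BEGIN
    -- Walk the ID range of Table1 in 1,000-row slices so that each
    -- UPDATE ... JOIN only touches a small batch at a time.
    DECLARE batch_start BIGINT DEFAULT 0;
    DECLARE max_id BIGINT;
    SELECT MAX(ID) INTO max_id FROM Table1;
    WHILE batch_start <= max_id DO
        UPDATE Table1 AS T1
        INNER JOIN Table2 AS T2 ON T1.X = T2.Y
        SET T1.A = T2.B
        WHERE T1.ID >= batch_start AND T1.ID < batch_start + 1000;
        SET batch_start = batch_start + 1000;
    END WHILE;
END //
DELIMITER ;

CALL batch_update_table1();
Each statement stays small, which also keeps the per-batch transaction and lock footprint low.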

It does not answer your question, but here is a way to use your logic to make it faster:
UPDATE table_A a
INNER JOIN (SELECT TRUNCATE(lat,3) AS lat, TRUNCATE(lon,3) AS lon, address FROM table_B) b
SET a.address = b.address
WHERE TRUNCATE(a.lat,3) = b.lat AND TRUNCATE(a.lon,3) = b.lon AND a.address IS NULL;
When you need to transform your data before using it, do the transformation in the subquery's SELECT as you pull the rows out of the table, instead of transforming them in the WHERE clause.

Related

Difference between "INNER JOIN table" and "INNER JOIN (SELECT table)"?

I am working on a MySQL query that takes 30 seconds to execute. The format is like this:
SELECT id
FROM table1 t1
INNER JOIN table2 t2
ON t1.id = t2.idt2
The INNER JOIN takes 25 of the 30 seconds. When I write it like this:
SELECT id
FROM table1 t1
INNER JOIN (
SELECT idt2,col1,col2,col3
FROM table2
) t2
ON t1.id = t2.idt2
It takes only 8 seconds! Why does this work? I'm afraid of losing data.
(Obviously, my real query is more complex than this one; it's just an example.)
Well, you haven't shown us the EXPLAIN output:
EXPLAIN SELECT id
FROM table1 t1
INNER JOIN table2 t2
ON t1.id = t2.idt2
This would definitely give us some insight into your query and table structures.
Based on your scenario, the first query suggests you have an indexing issue.
What happens in your second query is that the optimizer creates a temporary result set from your subquery and then filters your data against it. I don't recommend doing that in MOST cases.
The purpose of a subquery is to express complex logic, not to be an instant solution for everything.
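As a rough illustration of the indexing point (the index name is hypothetical, and this assumes table2.idt2 is not already indexed):
CREATE INDEX idx_table2_idt2 ON table2 (idt2);

EXPLAIN SELECT id
FROM table1 t1
INNER JOIN table2 t2
ON t1.id = t2.idt2;
With such an index in place, the plain join should no longer need to scan table2 for every row of table1, which is typically what makes the first form slow.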

Multiple MySQL joins use too much memory

I have about twenty rather small tables (the largest has about 2k rows, normally about 100 rows, and each has from 4 up to 20 columns) that I try to join with:
select ... from table1
left join table2 on table1.name = table2.t2name
left join table3 on table1.name = table3.othername
left join table4 on table2.t2name = table4.something
and so on
In theory it should return about 2k rows with maybe 80 columns, so I guess the amount of data itself is not the problem.
But it runs out of memory. From reading several posts here I figured out that MySQL internally builds a big "all x all" table first and reduces it later. How can I force it to evaluate each join one after another, so that it uses much less memory?
Just to make things clear, in your case the expected amount of data is not the problem.
What appears to be the problem is that you are asking the system to compare A x B x C x D... rows (calculate what that means and you will get the picture).
The general idea described in one of my previous comments is to make your query look as follows:
SELECT * FROM (select ... from table1
               where .....
              ) A
LEFT JOIN (select ... from table2
           where .....
          ) B
  ON A.name = B.t2name
LEFT JOIN (select ... from table3
           where .....
          ) C
  ON A.name = C.othername
LEFT JOIN (select ... from table4
           where .....
          ) D
  ON B.t2name = D.something
In this way, and assuming this is applicable (i.e. you do have conditions to put in the where ..... clause of the inner selects), you will reduce the number of records from each table that need to be compared during the join.

MySQL: JOIN where ON may compare with null

I have:
simple_table
|- first_id
|- second_id
SELECT * FROM table t1 JOIN table t2
ON [many many conditions]
AND t1.id IN (SELECT first_id FROM simple_table)
AND t2.id = (
SELECT second_id FROM simple_table WHERE t1.id = first_id -- can return NULL
)
Questions:
How do I handle the situation where the marked subquery returns NULL?
Can I use the t1 and t2 aliases inside the subqueries?
Update [extra explanation]
I have a very big table. I need to iterate through it and check some conditions. simple_table provides the IDs of the table rows whose conditions I should check. I mean:
simple_table
first_id second_id
11 128
table
id <other_fields>
................
11 <other_data>
...............
128 <other_data>
So I should check whether those two rows in table satisfy the right conditions relative to one another.
The question is unclear, but given the update, the query should work better if there is an index on the ID of the big table (it's probably there already as the PK).
As the condition seems to be on the same table, the easiest query would be
SELECT ...
FROM bigtable t1
INNER JOIN simple_table st ON t1.ID IN (st.first_id, st.second_id)
or
SELECT ...
FROM bigtable t1
INNER JOIN simple_table st ON t1.ID = st.first_id
INNER JOIN bigtable t2 ON st.second_id = t2.ID
to get the two rows from bigtable on the same row of the result.
The second query will make the checks easier to write; the first will be faster but will most probably need a GROUP BY to return the wanted results.
Some performance tests on the OP's machine are needed to find the fastest one.
If one of the IDs in simple_table is NULL, only the other will be considered; the code will have to check for that.
You can use the table aliases in the subqueries, and you will probably need to, as the subqueries will reference the same table.
The relative conditions to check are still undisclosed by the OP, so that's all I can help with.
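A minimal sketch of how the NULL case could be tolerated, based on the second query above; a LEFT JOIN keeps the row from the first lookup even when second_id is NULL (the column list is a placeholder):
SELECT t1.*, t2.*
FROM simple_table st
INNER JOIN bigtable t1 ON t1.ID = st.first_id
LEFT JOIN bigtable t2 ON t2.ID = st.second_id;
-- t2's columns come back as NULL whenever st.second_id is NULL,
-- so the checking code can detect that case directly.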

Which Query is faster if we put the "Where" inside the Join Table or put it at the end?

Ok, I am using Mysql DB. I have 2 simple tables.
Table1
ID-Text
12-txt1
13-txt2
42-txt3
.....
Table2
ID-Type-Text
13- 1 - MuTxt1
42- 1 - MuTxt2
12- 2 - Xnnn
Now I want to join these 2 tables to get all data for Type=1 in table 2
SQL1:
Select * from
Table1 t1
Join
(select * from Table2 where Type=1) t2
on t1.ID=t2.ID
SQL2:
Select * from
Table1 t1
Join
Table2 t2
on t1.ID=t2.ID
where t2.Type=1
These 2 queries give the same result, but which one is faster?
I don't know how MySQL does the join (or how the join works in MySQL), and that's why I'm wondering.
Extra info: now if I don't want Type=1 but instead want t2.text='MuTxt1', SQL2 becomes
Select * from
Table1 t1
Join
Table2 t2
on t1.ID=t2.ID
where t2.text='MuTxt1'
I feel like this query will be slower. Is that right?
Sometimes the MySQL query optimizer does a pretty decent job and sometimes it doesn't. Having said that, there are exceptions to my answer where the optimizer plans things differently and better.
Subqueries are generally expensive, as MySQL needs to execute them and store their results separately. Normally, if you can use either a subquery or a join, the join is faster, especially when the subquery is used as part of your WHERE clause without a LIMIT.
Select *
from Table1 t1
Join Table2 t2 on t1.ID=t2.ID
where t2.Type=1
and
Select *
from Table1 t1
Join Table2 t2
where t1.ID =t2.ID AND t2.Type=1
should perform equally well, while
Select *
from Table1 t1
Join (select *
from Table2
where Type=1) t2
on t1.ID=t2.ID
is most likely a lot slower, as MySQL stores the result of select * from Table2 where Type=1 in a temporary table.
Generally, joins work by building a table comprising all combinations of rows from both tables and afterwards removing rows which do not match the conditions. MySQL will of course try to use indexes on the columns compared in the ON clause and specified in the WHERE clause.
If you are interested in which indexes are used, write EXPLAIN in front of your query and execute it.
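For instance, applied to the second query from the question:
EXPLAIN
SELECT *
FROM Table1 t1
JOIN Table2 t2 ON t1.ID = t2.ID
WHERE t2.Type = 1;
The key and rows columns of the output show which index each table uses and roughly how many rows MySQL expects to examine.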
In my view, the second query is better than the first in terms of code readability and performance. You can also include the filter condition in the JOIN clause, like this:
Select * from
Table1 t1
Join
Table2 t2 on t1.ID=t2.ID and t2.Type=1
You can compare the execution times of all the queries in SQL Fiddle here: Query 1, Query 2, My Query.
I think this question is hard to answer, since we don't know the exact internals of the database's query planner. Usually these kinds of constructions are evaluated by the database in a similar way (it can see that the first and second queries are equivalent and plan them the same way, or not).
I would write the second one, since it makes it clearer what is happening.

MySQL join function to join around 25 tables

I have around 15 tables in my database.
Each table has an employee ID column. I want to join all the tables on the employee ID column and, at the same time, select two columns from each table.
That means the final result will have 31 columns (1 ID column and 30 from the 15 tables).
What would be the easiest way to do this?
If I understand your problem, one possible solution could be:
SELECT t1.id, t1.f1, t1.f2, t2.f3, t2.f4, ....., t15.f30, t15.f31
FROM table1 t1 INNER JOIN table2 t2 ON t1.id = t2.id
INNER JOIN table3 t3 ON t1.id = t3.id....
....
INNER JOIN table15 t15 ON t1.id = t15.id
WHERE ....
Naturally, the IDs must have indexes (for faster querying); I'm assuming a one-to-one relation, but this standard query can easily be changed to fulfil your needs...
First of all, I would recommend using a LEFT JOIN (whenever possible) to achieve that. From that link you can get an idea of how to optimise the process, because, of course, you will need it a lot. Anyway, depending on the amount of data you have, please consider redesigning the database. Hope that helps.
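A minimal sketch of that LEFT JOIN variant, reusing the placeholder column names from the answer above; it keeps employees that are missing from some of the tables:
SELECT t1.id, t1.f1, t1.f2, t2.f3, t2.f4, t3.f5, t3.f6
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id
LEFT JOIN table3 t3 ON t1.id = t3.id;
-- ...continue the pattern with LEFT JOINs for table4 through table15,
-- adding the two wanted columns from each table to the SELECT list.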