How to optimize this DB operation? - mysql

I'm quite sloppy with databases, can't get this working with joins, and I'm not even sure that would be faster...
DELETE FROM atable
WHERE btable_id IN (SELECT id
FROM btable
WHERE param > 2)
AND ctable_id IN (SELECT id
FROM ctable
WHERE ( someblob LIKE '%_ID1_%'
OR someblob LIKE '%_ID2_%' ))
Atable contains ~19M rows, this would delete ~3M of that. At the moment, I can only run the query with LIMIT 100000, and I don't want to sit here with phpmyadmin all day, because each deletion (of 100.000 rows) runs for about 1.5 mins.
Any ways to speed this up / automate it?
MySQL 5.5
(do you think it's already bad DB design if any table contains 20M rows?)

Use EXISTS or JOIN instead of IN to improve perfromance
Using EXISTS:
DELETE FROM Atable A
WHERE EXISTS (SELECT 1 FROM Btable B WHERE A.Btable_id = B.id AND B.param > 2) AND
EXISTS (SELECT 1 FROM Ctable C WHERE A.Ctable_id = C.id AND (C.someblob LIKE '%_ID1_%' OR C.someblob LIKE '%_ID2_%'))
Using JOIN:
DELETE A
FROM Atable A
INNER JOIN Btable B ON A.Btable_id = B.id AND B.param > 2
INNER JOIN Ctable C WHERE A.Ctable_id = C.id AND (C.someblob LIKE '%_ID1_%' OR C.someblob LIKE '%_ID2_%')

first you should try with exist instead of in. it's faster in many many case.
Then you could try to do inner join instead of in and exists.
Example :
delete a
from a
inner join b on b.id = a.tablebid
And finally if it could be possible (i don't know if you have id3, ids) to change the or by something else. Sometimes strange and complicated change helps the optimizer. case when, subquery...

I don't see where a simple index would help much. I'd do:
delete from atable where id in (
select
id
from
atable a
join btable b on a.btable_id = b.id
join ctable c on a.ctable_id = c.id
where
b.param > 2
and (
c.someblob LIKE '%_ID1_%'
OR c.someblob LIKE '%_ID2_%'
)
)
Correction: I'm assuming you've got indexes on btable and ctable's id's (probably, if they're primary keys...) and on b.param (if it's numeric).

Beside optimizing the query you could also take a look at a good use of indexes, since they might prevent a full table scan.
For BTable for example create an index on id and param.
To explain why this helps:
If the database has to look up the id and param values in the table in a unsorted manner, the database has to read ALL rows. If the database reads the index, SORTED, it can look up the id and param with reduced costs.

Related

MySQL Join performance is really slow

I am trying to compare values against same table which has more than 1,000,000 rows. Below is my query and it takes around 25 secs to get results.
EXPLAIN SELECT DISTINCT a.studyid,a.number,a.load_number,b.studyid,b.number,b.load_number FROM
(SELECT t1.*, buildnumber,platformid FROM t t1
INNER JOIN testlog t2 ON t1.`testid` = t2.`testid`
WHERE (buildnumber =1031719 AND platformid IN (SELECT platformid FROM platform WHERE platform.`Description` = "Windows 7 SP1"))
)AS a
JOIN
(SELECT t1.*,buildnumber,platformid FROM t t1
INNER JOIN testlog t2 ON t1.`testid` = t2.`testid`
WHERE (buildnumber =1030716 AND platformid IN (SELECT platformid FROM platform WHERE platform.`Description` = "Windows 7 SP1"))
)AS b
ON a.studyid=b.studyid AND a.load_number = b.load_number AND a.number = b.number
Could you anyone help me to improve this query to get fast enough results?
The problem here is even I have number and load_number index, the query doesn't use that. I dont know why it is always ignored..
Thanks.
First, you have a silly query. You are retrieving six columns, but there are only three values. Look at the on clause.
I think your best bet is to rewrite the query using conditional aggregation. I think the following is equivalent:
SELECT t1.studyid, t1.load_number, t1.number
FROM t t1 INNER JOIN
testlog t2
ON t1.testid = t2.testid
WHERE t2.buildnumber IN (1031719, 1030716) AND
platformid IN (SELECT platformid FROM platform p WHERE p.Description = 'Windows 7 SP1'))
GROUP BY studyid, load_number, number
HAVING MIN(buildnumber) <> MAX(buildnumber)
For this query, you want indexes on platform(Description, platformid) and testlog(buildnumber, platformid) and t(testid).
Problem #1:
IN ( SELECT ... ) optimizes very poorly. The subquery is rerun again and again. It looks like you are expecting exactly one id from that query; if so, change it to = ( SELECT ... ). That way it will be run exactly once.
Problem #2:
FROM ( SELECT ... )
JOIN ( SELECT ... ) ON ...
optimizes poorly because neither subquery. Can you merge the two subqueries into one, as Gordon was trying? If not, then put one of them into a TEMPORARY TABLE and add an appropriate index to that table so that the ON will be able to use it. Probably PRIMARY KEY(studyid, load_number, number).
Footnote: The latest versions of MySQL have made improvements on these problems by dynamically generating indexes. What version are you using?

Why mysql cannot join 2 big / large table?

Here's my problem all :
I have 2 big table call it A n B.
If I join that's 2 table with a very simple query like this example :
SELECT COUNT(*) FROM lib_judul, lib_buku
Then mysql process is not over yet, I don't know why. Table A have 158,670 records (33,6 MB) and Table B have 130,028 records (34,6 MB). I think myquery is right, cause I've try before to join table A with table C (the very smaller table one) and it's run well.
What should I do to do this?
You have implicit CROSS JOIN in your code which creates full Cartesian Product of the two tables. It creates a new table with 158,670 times 130,028 rows. This is more than 20 billion (20,631,542,760) records.
It's because there is no common field for both of the tables. Try using Explicit Join just like below:
SELECT
COUNT(*)
FROM lib_judul A
JOIN lib_buku B ON A.id=B.id
The cost of your query maybe is too large. Your query have cost = 158,670 x 130,028 = 20,631,542,760 I/O.
The query execution plan will execute join first, then select the column.
Know your need. May be you can add some "where condition" before you join it. Example:
this query: SELECT
COUNT(*)
FROM lib_judul A, lib_buku B
WHERE B.id = 1 AND B.id = A.id
can be optimized like this:
SELECT * FROM
(SELECT * FROM lib_judul) A
JOIN
(SELECT * FROM lib_buku WHERE lib_buku.id = 1) B
ON B.id = A.id

Need mysql query to pull data from two tables

So after helpful feedback from my original question, I now have this query:
SELECT sessions.id, sessions.title, sessions.abstract, sessions.presenters, sessions.proposal_id, proposals.outcomes, proposals.CategorySelection, proposals.research3, proposals.research4, proposals.research5, proposals.research6, proposals.innovation3, proposals.innovation4, proposals.innovation5,proposals.innovation6, proposals.application3, proposals.application4, proposals.application5, proposals.application6, proposals.integration3, proposals.integration4, proposals.integration5, proposals.integration6, proposals.references, proposals.organization
FROM sessions, proposals
INNER JOIN proposals ON proposals.id = sessions.proposal_id
WHERE sessions.id = '$id
LIMIT 1;)
that is getting me nowhere fast. What am I doing wrong?
Original question:
I need to pull several fields from one table and several more from a second table. The criteria is that a field called proposal_id match the id field of the second table. I am fairly new so this is what I have so far. It is not working, but not sure how to make it work.
(SELECT `title`,`abstract`,`presenters`,`proposal_id` FROM `sessions` WHERE `id`='$id')
UNION
(SELECT `outcomes`,`CategorySelection`,`research3`,`research4`,`research5`,`research6`,`innovation3`,`innovation4`,`innovation5`,
`innovation6`,`application3`,`application4`,`application5`,`application6`,`integration3`,`integration4`,`integration5`,`integration6`,`references`,`organization` FROM `proposals` WHERE `id`= `sessions`.`proposal_id`)
LIMIT 1;
You need to use JOIN not UNION
select
s.*,p.*
from `sessions` s
inner join `proposals` p on p.id = s.proposal_id
where s.id = '$id'
This is how you can join both the tables using the common key between.
You can select the specific fields instead of .* by specifying the column names as
s.col1,s.col2,p.col1,p.col2
etc
Try to use JOINS, where you can match the related fields from both the tables , this is the most convenient way to fetch records from multiple tables
UNION is used when you want to combine two queries
select a.id,b.some_field from table1 as a
INNER JOIN table2 as b ON b.prospal_id = a.id

Searching on multi (1-n) relation tables

I learned the hard way that i shouldn't store serialized data in a table when i need to make it searchable .
So i made 3 tables the base & two 1-n relation tables .
So here is the query i get if i want to select a specific activity .
SELECT
jdc_organizations_activities.id
FROM
jdc_activity_sector ,
jdc_activity_type
INNER JOIN jdc_organizations_activities ON jdc_activity_type.activityId = jdc_organizations_activities.id
AND
jdc_activity_sector.activityId = jdc_organizations_activities.id
WHERE
jdc_activity_sector.activitySector = 5 AND
jdc_activity_type.activityType = 3
Questions :
1- What kind of indexes can i add on a 1-n relation table , i already have a unique combination of (activityId - activitySector) & (activityId - activityType)
2- Is there a better way to write the query to have a better performance ?
Thank you !
I would re-organise the query to avoid the cross product caused by using , notation.
Also, you are effectively only using the sector and type tables as filters. So put activity table first, and then join on your other tables.
Some may suggest that; the first join should ideally be the join which is most likely to restrict your results the most, leaving the minimal amount of work to do in the second join. In reality, the sql engine can actually re-arrange your query when generateing a plan, but it does help to think this way to help you think about the efforts the sql engine are having to go to.
Finally, there are the indexes on each table. I would actually suggest reversing the Indexes...
- ActivitySector THEN ActivityId
- ActivityType THEN ActivityId
This is specifically because the sql engine is manipulating your query. It can take the WHERE clause and say "only include records from the Sector table where ActivitySector = 5", and similarly for the Type table. By having the Sector and Type identifies FIRST in the index, this filtering of the tables can be done much faster, and then the joins will have much less work to do.
SELECT
[activity].id
FROM
jdc_organizations_activities AS [activity]
INNER JOIN
jdc_activity_sector AS [sector]
ON [activity].id = [sector].activityId
INNER JOIN
jdc_activity_type AS [type]
ON [activity].id = [type].activityId
WHERE
[sector].activitySector = 5
AND [type].activityType = 3
Or, because you don't actually use the content of the Activity table...
SELECT
[sector].activityId
FROM
jdc_activity_sector AS [sector]
INNER JOIN
jdc_activity_type AS [type]
ON [sector].activityId = [type].activityId
WHERE
[sector].activitySector = 5
AND [type].activityType = 3
Or...
SELECT
[activity].id
FROM
jdc_organizations_activities AS [activity]
WHERE
EXISTS (SELECT * FROM jdc_activity_sector WHERE activityId = [activity].id AND activitySector = 5)
AND EXISTS (SELECT * FROM jdc_activity_type WHERE activityId = [activity].id AND activityType = 3)
I would advise against mixing old style from table1, table2 and new style from table1 inner join table2 ... in a single query. And you can alias tables using table1 as t1, shortening long table names to an easy to remember mnenomic:
select a.id
from jdc_organizations_activities a
join jdc_activity_sector as
on as.activityId = a.Id
join jdc_activity_type as at
on at.activityId = a.Id
where as.activitySector = 5
and at.activityType = 3
Or even more readable using IN:
select a.id
from jdc_organizations_activities a
where a.id in
(
select activityId
from jdc_activity_sector
where activitySector = 5
)
and a.id in
(
select activityId
from jdc_activity_type
where activityType = 3
)

Is there a way to force MySQL execution order?

I know I can change the way MySQL executes a query by using the FORCE INDEX (abc) keyword. But is there a way to change the execution order?
My query looks like this:
SELECT c.*
FROM table1 a
INNER JOIN table2 b ON a.id = b.table1_id
INNER JOIN table3 c ON b.itemid = c.itemid
WHERE a.itemtype = 1
AND a.busy = 1
AND b.something = 0
AND b.acolumn = 2
AND c.itemid = 123456
I have a key for every relation/constraint that I use. If I run explain on this statement I see that mysql starts querying c first.
id select_type table type
1 SIMPLE c ref
2 SIMPLE b ref
3 SIMPLE a eq_ref
However, I know that querying in the order a -> b -> c would be faster (I have proven that)
Is there a way to tell mysql to use a specific order?
Update: That's how I know that a -> b -> c is faster.
The above query takes 1.9 seconds to complete and returns 7 rows. If I change the query to
SELECT c.*
FROM table1 a
INNER JOIN table2 b ON a.id = b.table1_id
INNER JOIN table3 c ON b.itemid = c.itemid
WHERE a.itemtype = 1
AND a.busy = 1
AND b.something = 0
AND b.acolumn = 2
HAVING c.itemid = 123456
the query completes in 0.01 seconds (Without using having I get 10.000 rows).
However that is not a elegant solution because this query is a simplified example. In the real world I have joins from c to other tables. Since HAVING is a filter that is executed on the entire result it would mean that I would pull some magnitues more records from the db than nescessary.
Edit2: Just some information:
The variable part in this query is c.itemid. Everything else are fixed values that don't change.
Indexes are setup fine and mysql chooses the right ones for me
between a and b there is a 1:n relation (index PRIMARY is used)
between b and c there is a many to many relation (index IDX_ITEMID is used)
the point is that mysql should start querying table a and work it's way down to c and not the other way round. Any change to achive that.
Solution: Not exactly what I wanted but this seems to work:
SELECT c.*
FROM table1 a
INNER JOIN table2 b ON a.id = b.table1_id
INNER JOIN table3 c ON b.itemid = c.itemid
WHERE a.itemtype = 1
AND a.busy = 1
AND b.something = 0
AND b.acolumn = 2
AND c.itemid = 123456
AND f.id IN (
SELECT DISTINCT table2.id FROM table1
INNER JOIN table2 ON table1.id = table2.table1_id
WHERE table1.itemtype = 1 AND table1.busy = 1)
Perhaps you need to use STRAIGHT_JOIN.
http://dev.mysql.com/doc/refman/5.0/en/join.html
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order.
You can use FORCE INDEX to force the execution order, and I've done that before.
If you think about it, there's usually only one order you could query tables in for any index you pick.
In this case, if you want MySQL to start querying a first, make sure the index you force on b is one that contains b.table1_id. MySQL will only be able to use that index if it's already queried a first.
You can try rewriting in two ways
bring some of the WHERE condition into JOIN
introduce subqueries even though they are not necessary
Both things might impact the planner.
First thing to check, though, would be if your stats are up to date.