I know I can change the way MySQL executes a query by using the FORCE INDEX (abc) keyword. But is there a way to change the execution order?
My query looks like this:
SELECT c.*
FROM table1 a
INNER JOIN table2 b ON a.id = b.table1_id
INNER JOIN table3 c ON b.itemid = c.itemid
WHERE a.itemtype = 1
AND a.busy = 1
AND b.something = 0
AND b.acolumn = 2
AND c.itemid = 123456
I have a key for every relation/constraint that I use. If I run EXPLAIN on this statement, I see that MySQL starts querying c first:
id select_type table type
1 SIMPLE c ref
2 SIMPLE b ref
3 SIMPLE a eq_ref
However, I know that querying in the order a -> b -> c would be faster (I have proven that).
Is there a way to tell MySQL to use a specific order?
Update: Here's how I know that a -> b -> c is faster.
The above query takes 1.9 seconds to complete and returns 7 rows. If I change the query to
SELECT c.*
FROM table1 a
INNER JOIN table2 b ON a.id = b.table1_id
INNER JOIN table3 c ON b.itemid = c.itemid
WHERE a.itemtype = 1
AND a.busy = 1
AND b.something = 0
AND b.acolumn = 2
HAVING c.itemid = 123456
the query completes in 0.01 seconds (without the HAVING I get 10,000 rows).
However, that is not an elegant solution, because this query is a simplified example. In the real world I have joins from c to other tables. Since HAVING is a filter that is applied to the entire result, I would pull orders of magnitude more records from the database than necessary.
Edit 2: Just some information:
The variable part in this query is c.itemid. Everything else is a fixed value that doesn't change.
Indexes are set up fine and MySQL chooses the right ones for me.
Between a and b there is a 1:n relation (index PRIMARY is used).
Between b and c there is a many-to-many relation (index IDX_ITEMID is used).
The point is that MySQL should start querying table a and work its way down to c, not the other way round. Any way to achieve that?
Solution: Not exactly what I wanted, but this seems to work:
SELECT c.*
FROM table1 a
INNER JOIN table2 b ON a.id = b.table1_id
INNER JOIN table3 c ON b.itemid = c.itemid
WHERE a.itemtype = 1
AND a.busy = 1
AND b.something = 0
AND b.acolumn = 2
AND c.itemid = 123456
AND b.id IN (
SELECT DISTINCT table2.id FROM table1
INNER JOIN table2 ON table1.id = table2.table1_id
WHERE table1.itemtype = 1 AND table1.busy = 1)
Perhaps you need to use STRAIGHT_JOIN.
http://dev.mysql.com/doc/refman/5.0/en/join.html
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order.
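Applied to the query from the question, a minimal sketch (STRAIGHT_JOIN can also be used as a SELECT modifier, which forces all tables to be joined in the order they are listed, i.e. a -> b -> c):
SELECT STRAIGHT_JOIN c.*
FROM table1 a
INNER JOIN table2 b ON a.id = b.table1_id
INNER JOIN table3 c ON b.itemid = c.itemid
WHERE a.itemtype = 1
AND a.busy = 1
AND b.something = 0
AND b.acolumn = 2
AND c.itemid = 123456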
You can use FORCE INDEX to force the execution order, and I've done that before.
If you think about it, there's usually only one order you could query tables in for any index you pick.
In this case, if you want MySQL to start querying a first, make sure the index you force on b is one that contains b.table1_id. MySQL will only be able to use that index if it's already queried a first.
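For example, assuming b has an index covering table1_id (the index name IDX_TABLE1_ID below is hypothetical), a sketch would be:
SELECT c.*
FROM table1 a
INNER JOIN table2 b FORCE INDEX (IDX_TABLE1_ID) ON a.id = b.table1_id
INNER JOIN table3 c ON b.itemid = c.itemid
WHERE a.itemtype = 1
AND a.busy = 1
AND b.something = 0
AND b.acolumn = 2
AND c.itemid = 123456
MySQL can only use that index for b after it has read a, so this nudges the plan toward the order a -> b -> c.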
You can try rewriting in two ways:
bring some of the WHERE conditions into the JOIN
introduce subqueries even though they are not necessary
Both things might impact the planner.
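For example, two sketches using the tables from the question (neither is guaranteed to help, but each gives the planner a different starting point):
-- 1) some WHERE conditions moved into the JOINs
SELECT c.*
FROM table1 a
INNER JOIN table2 b ON a.id = b.table1_id AND b.something = 0 AND b.acolumn = 2
INNER JOIN table3 c ON b.itemid = c.itemid AND c.itemid = 123456
WHERE a.itemtype = 1
AND a.busy = 1
-- 2) an otherwise unnecessary subquery that pre-filters a
SELECT c.*
FROM (SELECT id FROM table1 WHERE itemtype = 1 AND busy = 1) a
INNER JOIN table2 b ON a.id = b.table1_id AND b.something = 0 AND b.acolumn = 2
INNER JOIN table3 c ON b.itemid = c.itemid
WHERE c.itemid = 123456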
First thing to check, though, would be if your stats are up to date.
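In MySQL, refreshing the statistics is a one-liner:
ANALYZE TABLE table1, table2, table3;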
I'm always amused and confused (at the same time) whenever I'm asked to prepare and run a JOIN query on a SQL console.
Most of the confusion comes from whether or not the ordering of the join predicate holds any importance for the join results.
Example:
SELECT "zones"."name", "ip_addresses".*
FROM "ip_addresses"
INNER JOIN "zones" ON "zones"."id" = "ip_addresses"."zone_id"
WHERE "ip_addresses"."resporg_accnt_id" = 1
AND "zones"."name" = 'us-central1'
LIMIT 1;
Given the SQL query, the join predicate looks like this:
... INNER JOIN "zones" ON "zones"."id" = "ip_addresses"."zone_id" WHERE "ip_addresses"."resporg_accnt_id"
Now, would it make any difference, in terms of both the performance of the join and the correctness of the obtained result, if I changed the predicate to look like this?
... INNER JOIN "zones" ON "ip_addresses"."zone_id" = "zones"."id" WHERE "ip_addresses"."resporg_accnt_id"
The predicate order won't make a performance difference in your case (a simple equality condition), but personally I like to place the columns from the table I'm JOINing to on the LHS of each ON condition:
SELECT ...
FROM ip_addresses ia
JOIN zones z
ON z.id = ia.zone_id
WHERE ...
The optimiser can use any index available on these columns during the JOIN, and I find it easier to visualise this way.
Any additional conditions also tend to be on columns of the table being JOINed to, and again I find this reads better when that table is consistently on the LHS.
Not quite the same, but I did see a case where performance was affected by the choice of which column to isolate.
I think the JOIN looked something like:
SELECT ...
FROM table_a a
JOIN table_b b
ON a.id = b.id - 1
Changing this to
SELECT ...
FROM table_a a
JOIN table_b b
ON b.id = a.id + 1
allowed the optimiser to use an index on b.id, but presumably at the cost of an index on a.id.
I suspect this kind of query might need analysing on a case-by-case basis.
Furthermore, I would probably switch your table order round too and write your original query:
SELECT z.name,
ia.*
FROM zones z
JOIN ip_addresses ia
ON ia.zone_id = z.id
AND ia.resporg_accnt_id = 1
WHERE z.name = 'us-central1'
LIMIT 1
Conceptually, you are saying "Start with the 'us-central1' zone and fetch me all the ip_addresses associated with a resporg_accnt_id of 1"
Check the EXPLAIN plans if you want to verify that there is no difference in your case.
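For instance, prefix either version with EXPLAIN and compare the output row by row:
EXPLAIN
SELECT z.name, ia.*
FROM zones z
JOIN ip_addresses ia
ON ia.zone_id = z.id
AND ia.resporg_accnt_id = 1
WHERE z.name = 'us-central1'
LIMIT 1;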
Say I need to pull data from several tables like so:
item 1 - from table 1
item 2 - from table 1
item 3 - from table 1 - but select only the max value of item 3 from table 1
item 4 - from table 2 - but select only the max value of item 4 from table 2
My query is pretty simple:
select
a.item 1,
a.item 2,
b.item 3,
c.item 4
from table 1 a
left join (select key_item, max(item 3) from table 1 group by key_item) b on a.key_item = b.key_item
left join (select key_item, max(item 4) from table 2 group by key_item) c on c.key_item = a.key_item
I am not sure if my methodology of pulling just a single max item from a table is the most efficient. Assume both tables are over a million rows; my actual SQL runs forever with this setup.
EDIT: I changed the GROUP BY clause to reflect the comments made. I hope it makes a bit more sense now.
Your best bet is to add an index on table1 and table2, as follows:
ALTER TABLE table1
ADD INDEX `GoodIndexName1` (`key_item`,`item3`);
ALTER TABLE table2
ADD INDEX `GoodIndexName2` (`key_item`,`item4`);
This will allow you to use queries as described in the MySQL documentation for finding the rows holding the group-wise maximum, which appears to be what you are looking for.
Your original (edited) query should work:
select
a.item1,
a.item2,
b.item3,
c.item4
from table1 a
LEFT OUTER JOIN (
SELECT
key_item,
MAX(item3) AS item3
FROM table1
GROUP BY key_item
) b
ON a.key_item = b.key_item
LEFT OUTER JOIN (
SELECT
key_item,
MAX(item4) AS item4
FROM table2
GROUP BY key_item
) c
ON c.key_item = a.key_item
and if that performs slowly after adding the indexes, try the following too:
SELECT
a.item1,
a.item2,
b.item3,
c.item4
FROM table1 a
LEFT OUTER JOIN table1 b
ON b.key_item = a.key_item
LEFT OUTER JOIN table1 larger_b
ON larger_b.key_item = b.key_item
AND larger_b.item3 > b.item3
LEFT OUTER JOIN table2 c
ON c.key_item = a.key_item
LEFT OUTER JOIN table2 larger_c
ON larger_c.key_item = c.key_item
AND larger_c.item4 > c.item4
WHERE larger_b.key_item IS NULL
AND larger_c.key_item IS NULL
(I have modified the table and column names only slightly, so that they conform to correct MySQL syntax.)
I work with queries that use the above structure all the time, and they perform very efficiently with indexes like the one I provided.
That said, usually I am using INNER JOINs on the b and c tables, but I don't see why your query should have any issues.
If you do still experience performance problems, report the data types of the key_item columns for each table; if you try to join on different data types, you will generally get poor performance.
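A quick way to compare those data types (a sketch; adjust to your real table names):
SHOW COLUMNS FROM table1 LIKE 'key_item';
SHOW COLUMNS FROM table2 LIKE 'key_item';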
Here's my problem, all:
I have two big tables, call them A and B.
If I join these two tables with a very simple query like this example:
SELECT COUNT(*) FROM lib_judul, lib_buku
then the MySQL process never finishes, and I don't know why. Table A has 158,670 records (33.6 MB) and table B has 130,028 records (34.6 MB). I think my query is right, because I've tried joining table A with table C (a much smaller table) before and it ran fine.
What should I do?
You have an implicit CROSS JOIN in your code, which creates the full Cartesian product of the two tables: a result with 158,670 times 130,028 rows, which is more than 20 billion (20,631,542,760) records.
That happens because there is no join condition relating the two tables. Try an explicit join, like this:
SELECT
COUNT(*)
FROM lib_judul A
JOIN lib_buku B ON A.id=B.id
The cost of your query is probably too large: 158,670 x 130,028 = 20,631,542,760 row combinations to read.
The execution plan will perform the join first, and only then select the columns.
Know what you need. Maybe you can add some WHERE condition before you join. Example:
this query:
SELECT
COUNT(*)
FROM lib_judul A, lib_buku B
WHERE B.id = 1 AND B.id = A.id
can be optimized like this:
SELECT * FROM
(SELECT * FROM lib_judul) A
JOIN
(SELECT * FROM lib_buku WHERE lib_buku.id = 1) B
ON B.id = A.id
I'm quite sloppy with databases; I can't get this working with joins, and I'm not even sure that would be faster...
DELETE FROM atable
WHERE btable_id IN (SELECT id
FROM btable
WHERE param > 2)
AND ctable_id IN (SELECT id
FROM ctable
WHERE ( someblob LIKE '%_ID1_%'
OR someblob LIKE '%_ID2_%' ))
Atable contains ~19M rows; this would delete ~3M of them. At the moment, I can only run the query with LIMIT 100000, and I don't want to sit here with phpMyAdmin all day, because each deletion (of 100,000 rows) runs for about 1.5 minutes.
Any ways to speed this up / automate it?
MySQL 5.5
(do you think it's already bad DB design if any table contains 20M rows?)
Use EXISTS or JOIN instead of IN to improve performance.
Using EXISTS:
-- single-table DELETE cannot take a table alias in MySQL 5.5, so the table name is used directly
DELETE FROM Atable
WHERE EXISTS (SELECT 1 FROM Btable B WHERE Atable.Btable_id = B.id AND B.param > 2) AND
EXISTS (SELECT 1 FROM Ctable C WHERE Atable.Ctable_id = C.id AND (C.someblob LIKE '%_ID1_%' OR C.someblob LIKE '%_ID2_%'))
Using JOIN:
DELETE A
FROM Atable A
INNER JOIN Btable B ON A.Btable_id = B.id AND B.param > 2
INNER JOIN Ctable C ON A.Ctable_id = C.id AND (C.someblob LIKE '%_ID1_%' OR C.someblob LIKE '%_ID2_%')
First you should try EXISTS instead of IN; it's faster in many, many cases.
Then you could try an INNER JOIN instead of IN and EXISTS.
Example:
delete a
from a
inner join b on b.id = a.tablebid
And finally, if possible (I don't know whether you have ID3, IDs...), try to replace the OR with something else. Sometimes a strange and complicated change helps the optimizer: CASE WHEN, a subquery...
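For example, one possible rewrite (only a sketch) splits the OR over someblob into a UNION, so each branch can be evaluated separately:
DELETE A
FROM Atable A
INNER JOIN Btable B ON A.Btable_id = B.id AND B.param > 2
INNER JOIN (
SELECT id FROM Ctable WHERE someblob LIKE '%_ID1_%'
UNION
SELECT id FROM Ctable WHERE someblob LIKE '%_ID2_%'
) C ON A.Ctable_id = C.id;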
I don't see where a simple index would help much. I'd do:
delete from atable where id in (
select id from (
select
a.id
from
atable a
join btable b on a.btable_id = b.id
join ctable c on a.ctable_id = c.id
where
b.param > 2
and (
c.someblob LIKE '%_ID1_%'
OR c.someblob LIKE '%_ID2_%'
)
) t -- extra derived table, so MySQL accepts atable as both delete target and subquery source
)
Correction: I'm assuming you've got indexes on btable's and ctable's ids (probably, if they're primary keys...) and on b.param (if it's numeric).
Besides optimizing the query, you could also take a look at good use of indexes, since they might prevent a full table scan.
For Btable, for example, create an index on id and param.
To explain why this helps:
If the database has to look up the id and param values in the table in an unsorted manner, it has to read ALL rows. If the database reads the index, which is sorted, it can look up id and param at a much lower cost.
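A sketch of such an index (the name is just an example):
ALTER TABLE Btable ADD INDEX idx_btable_id_param (id, param);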
I am using the following MySQL SELECT query with several joins. I am wondering if this is how a somewhat good SELECT statement should look:
SELECT *
FROM table_news AS a
INNER JOIN table_cat AS b ON a.cat_id = b.id
INNER JOIN table_countries AS c ON a.country_id = c.id
INNER JOIN table_addresses AS d ON a.id = d.news_id
WHERE a.deleted = 0
AND a.hidden = 0
AND a.cat_id = ".$search_cat."
AND a.country_id = ".$search_country."
AND a.title LIKE '%".$search_string."%'
OR a.deleted = 0
AND a.hidden = 0
AND a.cat_id = ".$search_cat."
AND a.country_id = ".$search_country."
AND a.subtitle LIKE '%".$search_string."%'
It seems like a lot of joins. Even though table b and table c contain only 3 or 4 fields, I wonder if the number of joins would noticeably slow down the search on the start page.
Would it be better to move the fields from table d (street, city and so on) back into the main table, since they are needed most of the time this query is executed?
Thanks in advance,
Jayden
I don't think there is necessarily anything wrong with having three joins. There are a couple of things you can do to make sure the query is optimised.
Firstly, you should never do SELECT * - instead, explicitly state which fields you want to return from the database.
Also, I would create indexes on all the fields in your WHERE clause and all of the fields you are joining on. This can be a bit of a trade-off: for example, if you are doing a lot of write operations, there is a hit because you need to write to the index every time.
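For example, a sketch only (the selected columns and index names below are assumptions based on the query in the question):
-- explicit column list instead of SELECT *
SELECT a.id, a.title, a.subtitle, d.street, d.city
FROM table_news AS a
INNER JOIN table_cat AS b ON a.cat_id = b.id
INNER JOIN table_countries AS c ON a.country_id = c.id
INNER JOIN table_addresses AS d ON a.id = d.news_id
WHERE a.deleted = 0
AND a.hidden = 0
-- indexes covering the WHERE and JOIN columns
ALTER TABLE table_news ADD INDEX idx_news_filter (deleted, hidden, cat_id, country_id);
ALTER TABLE table_addresses ADD INDEX idx_addresses_news (news_id);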