How can I optimize a MySQL query with multiple joins?

Any input on how I can optimize the joins in this MySQL query? For example, consider the following query:
SELECT E.name, A.id1, B.id2, C.id3, D.id4, E.string_comment
FROM E
JOIN A ON E.name = A.name AND E.string_comment = A.string_comment
JOIN B ON E.name = B.name AND E.string_comment = B.string_comment
JOIN C ON E.name = C.name AND E.string_comment = C.string_comment
JOIN D ON E.name = D.name AND E.string_comment = D.string_comment
Tables A, B, C and D are temporary tables containing 1096 rows each, and table E (also a temporary table) contains 426 rows. Without any index, MySQL EXPLAIN showed every row of every table being searched. I then created a FULLTEXT index on name (name_idx) and on string_comment (string_idx) in all the tables A, B, C, D and E. The EXPLAIN command still gives the same result, shown below.
Also, please note that name and string_comment are of type VARCHAR and the idX columns are of type int(15).
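id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | A | ALL | name_idx,string_idx | NULL | NULL | NULL | 1096 |
1 | SIMPLE | B | ALL | name_idx,string_idx | NULL | NULL | NULL | 1096 | Using where
1 | SIMPLE | C | ALL | name_idx,string_idx | NULL | NULL | NULL | 1096 | Using where
1 | SIMPLE | D | ALL | name_idx,string_idx | NULL | NULL | NULL | 1096 | Using where
1 | SIMPLE | E | ALL | name_idx,string_idx | NULL | NULL | NULL | 426 | Using where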
Any comments on how I can tune this query?
Thanks.

For each table you should create a composite index on both columns. The syntax varies a bit, but it is something like:
CREATE INDEX comp_E_idx ON E (name, string_comment);
And repeat for all tables.
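Separate single-column indexes won't help much, because MySQL can't combine them usefully here: it finds the name in the name index very quickly, but then has to iterate over those rows to find the matching comment. Also note that a FULLTEXT index is only used for MATCH() ... AGAINST() searches, never for equality comparisons like these, so the composite index must be an ordinary (B-tree) index.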
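A minimal sketch of the "repeat for all tables" step, assuming the five tables from the question with stripped-down schemas and the hypothetical index names comp_A_idx ... comp_E_idx. Python's built-in sqlite3 module stands in for MySQL here, only to check that the statements are valid and that a planner uses the composite indexes for the join; MySQL's EXPLAIN output looks different, so confirm on the real server.

```python
import sqlite3

TABLES = ["A", "B", "C", "D", "E"]

QUERY = """
SELECT E.name, A.id1, B.id2, C.id3, D.id4, E.string_comment
FROM E
JOIN A ON E.name = A.name AND E.string_comment = A.string_comment
JOIN B ON E.name = B.name AND E.string_comment = B.string_comment
JOIN C ON E.name = C.name AND E.string_comment = C.string_comment
JOIN D ON E.name = D.name AND E.string_comment = D.string_comment
"""


def composite_index_ddl(table: str) -> str:
    # One composite index per table, covering both join columns together.
    return f"CREATE INDEX comp_{table}_idx ON {table} (name, string_comment)"


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schemas: A..D carry an idN column, E only the join keys.
    for i, t in enumerate(TABLES[:4], start=1):
        conn.execute(f"CREATE TABLE {t} (name TEXT, string_comment TEXT, id{i} INTEGER)")
    conn.execute("CREATE TABLE E (name TEXT, string_comment TEXT)")
    for t in TABLES:
        conn.execute(composite_index_ddl(t))
    return conn


def plan_details(conn: sqlite3.Connection) -> list[str]:
    # SQLite's EXPLAIN QUERY PLAN; the last column of each row is the description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


if __name__ == "__main__":
    for line in plan_details(build_demo()):
        print(line)
```

In the printed plan, one table is scanned and the other four are searched through their comp_*_idx index, which is the access pattern the answer is after.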

You're asking whether the query can be tuned. I would first ask whether the data itself can be tuned, i.e. consider putting the data in A, B, C, D and E into a single table with NULL-able columns. This would reduce the complexity of the search from O(N^5) to O(N).
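A minimal sketch of the single-table idea, assuming each (name, string_comment) pair appears at most once per table and using a hypothetical table name, merged, plus made-up rows. Python's sqlite3 stands in for MySQL. Each E row becomes one merged row, id1..id4 are NULL where A..D had no match, and the five-way inner join turns into a single-table query with IS NOT NULL filters. It checks only that both forms return the same rows, not the complexity claim.

```python
import sqlite3

JOIN_QUERY = """
SELECT E.name, A.id1, B.id2, C.id3, D.id4, E.string_comment
FROM E
JOIN A ON E.name = A.name AND E.string_comment = A.string_comment
JOIN B ON E.name = B.name AND E.string_comment = B.string_comment
JOIN C ON E.name = C.name AND E.string_comment = C.string_comment
JOIN D ON E.name = D.name AND E.string_comment = D.string_comment
"""

# With everything in one table, the inner joins become NOT NULL filters.
MERGED_QUERY = """
SELECT name, id1, id2, id3, id4, string_comment
FROM merged
WHERE id1 IS NOT NULL AND id2 IS NOT NULL AND id3 IS NOT NULL AND id4 IS NOT NULL
"""


def load_sample(conn: sqlite3.Connection) -> None:
    """Tiny made-up data: the key ('z', 'c3') has no row in D."""
    conn.execute("CREATE TABLE E (name TEXT, string_comment TEXT)")
    conn.executemany("INSERT INTO E VALUES (?, ?)", [("x", "c1"), ("y", "c2"), ("z", "c3")])
    for i, t in enumerate("ABCD", start=1):
        conn.execute(f"CREATE TABLE {t} (name TEXT, string_comment TEXT, id{i} INTEGER)")
        rows = [("x", "c1", 10 * i), ("y", "c2", 10 * i + 1)]
        if t != "D":
            rows.append(("z", "c3", 10 * i + 2))
        conn.executemany(f"INSERT INTO {t} VALUES (?, ?, ?)", rows)


def build_merged(conn: sqlite3.Connection) -> None:
    """One row per E row; idN is NULL when table N has no matching row.

    Assumes each (name, string_comment) pair occurs at most once per table;
    otherwise SQLite's scalar subqueries below would silently pick one match.
    """
    id_cols = ", ".join(
        f"(SELECT {t}.id{i} FROM {t} WHERE {t}.name = E.name "
        f"AND {t}.string_comment = E.string_comment) AS id{i}"
        for i, t in enumerate("ABCD", start=1)
    )
    conn.execute(
        f"CREATE TABLE merged AS SELECT E.name AS name, "
        f"E.string_comment AS string_comment, {id_cols} FROM E"
    )


def compare() -> tuple[list, list]:
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    build_merged(conn)
    joined = sorted(conn.execute(JOIN_QUERY).fetchall())
    merged = sorted(conn.execute(MERGED_QUERY).fetchall())
    return joined, merged
```

The IS NOT NULL filters reproduce the inner-join semantics only if the idN values themselves are never NULL in A..D.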

Related

Why is this query so slow with MySQL?

Yesterday I found a slow query running on the server (it takes more than 1 minute). It looks like this:
select a.* from a
left join b on a.hotel_id=b.hotel_id and a.hotel_type=b.hotel_type
where b.hotel_id is null
There are 40000+ rows in table a and 10000+ rows in table b. A unique key had already been created on columns hotel_id and hotel_type in table b, as UNIQUE KEY idx_hotel_id (hotel_id, hotel_type). So I used the EXPLAIN keyword to check the query plan for this SQL and got the following result:
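id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 36804 |
1 | SIMPLE | b | index | NULL | idx_hotel_id | 185 | NULL | 8353 | Using where; Using index; Not exists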
According to the MySQL reference manual, when all parts of an index are used by the join and the index is a PRIMARY KEY or UNIQUE NOT NULL index, the join type will be "eq_ref". But in the second row of the query plan, the value of the type column is "index", even though I really do have a unique index on hotel_id and hotel_type and both columns are used by the join. The join type "eq_ref" is more efficient than "ref", and "ref" is more efficient than "range"; apart from "ALL", "index" is the last join type we want to have. This is what confuses me: why is the join type here "index"? I hope I have described my question clearly, and I look forward to your answers. Thanks!
WHERE ... IS NULL checks can be slow, so maybe that is the problem. Try NOT EXISTS instead:
select * from a
where not exists ( select 1 from b where a.hotel_id=b.hotel_id and a.hotel_type=b.hotel_type )
Also: how many records are you returning? If you are returning all 36804 records this could slow things down as well.
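A small check, using Python's sqlite3 as a stand-in for MySQL and made-up rows (the name column is hypothetical), that the NOT EXISTS form returns exactly the same rows as the original LEFT JOIN ... IS NULL query. It shows that the two are interchangeable in results; it says nothing about MySQL's timing for either form.

```python
import sqlite3

LEFT_JOIN_IS_NULL = """
SELECT a.* FROM a
LEFT JOIN b ON a.hotel_id = b.hotel_id AND a.hotel_type = b.hotel_type
WHERE b.hotel_id IS NULL
"""

NOT_EXISTS = """
SELECT * FROM a
WHERE NOT EXISTS (SELECT 1 FROM b
                  WHERE a.hotel_id = b.hotel_id AND a.hotel_type = b.hotel_type)
"""


def run_both() -> tuple[list, list]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (hotel_id INTEGER, hotel_type TEXT, name TEXT)")
    # Same unique key as in the question: UNIQUE (hotel_id, hotel_type).
    conn.execute("CREATE TABLE b (hotel_id INTEGER, hotel_type TEXT, UNIQUE (hotel_id, hotel_type))")
    conn.executemany(
        "INSERT INTO a VALUES (?, ?, ?)",
        [(1, "x", "h1"), (2, "x", "h2"), (2, "y", "h3"), (3, "x", "h4")],
    )
    conn.executemany("INSERT INTO b VALUES (?, ?)", [(1, "x"), (2, "y")])
    # Both queries are anti-joins: rows of a with no matching (hotel_id, hotel_type) in b.
    left_join = sorted(conn.execute(LEFT_JOIN_IS_NULL).fetchall())
    not_exists = sorted(conn.execute(NOT_EXISTS).fetchall())
    return left_join, not_exists
```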
Thanks to everyone above! I found the solution to my problem myself. The columns hotel_id and hotel_type didn't have the same character set. After I made them both "utf8", my query returned its result in less than about 10 milliseconds. There is a good article about LEFT JOIN and indexes in MySQL that I strongly recommend: http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/

Why isn't the index used?

SELECT `Nen Straatnaam` as street, `Nen Woonplaats` as city, Gemeente,
Postcode, acn_distinct.zipcodes, acn_distinct.lat, acn_distinct.lng
FROM `acn_distinct` INNER JOIN crimes as c
ON `Nen Woonplaats` = c.place AND c.street_check = 0
ORDER BY street ASC
EXPLAIN gives me this information:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | c | ref | idx_place,idx_street_check,fulltext_place | idx_street_check | 1 | const | 67556 | Using temporary; Using filesort
1 | SIMPLE | acn_distinct | ref | ID_nen_woonplaats | ID_nen_woonplaats | 768 | crimes.c.place | 42 | Using index condition
So why is it not using the suggested indexes?
It is using an index, idx_street_check. However, the performance will still be terrible, since the query is also creating a temporary table and using a filesort, both notorious causes of slow queries.
The question is why it's not using an index on crimes.place. I would try creating a multi-column index on (place, street_check).
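For each acn_distinct row, the join asks crimes for rows with place = (that row's town) AND street_check = 0. A minimal sketch of that single lookup, using Python's sqlite3 as a stand-in for MySQL, a hypothetical index name idx_place_street_check and a made-up town value, shows a planner preferring the multi-column index over the single-column ones because it satisfies both equality conditions at once. SQLite's planner is not MySQL's, so treat this as an illustration and confirm with EXPLAIN on the real tables.

```python
import sqlite3


def inner_lookup_plan() -> str:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crimes (id INTEGER PRIMARY KEY, place TEXT, "
        "street_check INTEGER, street TEXT)"
    )
    # Single-column indexes similar to the ones in the question.
    conn.execute("CREATE INDEX idx_place ON crimes (place)")
    conn.execute("CREATE INDEX idx_street_check ON crimes (street_check)")
    # The suggested multi-column index (hypothetical name).
    conn.execute("CREATE INDEX idx_place_street_check ON crimes (place, street_check)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id, street FROM crimes "
        "WHERE place = ? AND street_check = 0",
        ("Utrecht",),
    ).fetchall()
    # The last column of each plan row is the human-readable description.
    return " | ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print(inner_lookup_plan())
```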
But more importantly, I think your table schema is very, very bad. Your JOIN is poorly formed:
FROM `acn_distinct` INNER JOIN crimes as c
ON `Nen Woonplaats` = c.place AND c.street_check = 0
When you JOIN tables A and B, you should join like this: A.x = B.y. But in this case you're not even referring to table A. You're multiplying the number of rows from A by a particular subset of B.

MySQL - How do I optimize appending field from table b to a query of table a

I know this has to be a fairly common issue, and I am sure the answer is readily available, but I am not sure how to phrase my search, so I have mostly been forced to troubleshoot this on my own.
Table A
id | content_id | score
1 | 2 | 16
2 | 2 | 4
3 | 3 | 8
4 | 3 | 12
Table B
id | content
1 | "Content Goes Here"
2 | "Content Goes Here"
3 | "Content Goes Here"
Objective: SUM all scores from table A, group by the unique content_id and show the content associated with the id, ordered by the sum score.
Current Working Query:
SELECT a.content_id, b.content, SUM(a.score) AS sum
FROM table_a a
LEFT JOIN table_b b ON a.content_id = b.id
GROUP BY a.content_id
ORDER BY sum ASC;
Problem: As far as I can tell, with the way I have structured my query, the content is fetched from table_b by looping through each record in table_a, checking for a record in table_b with an identical id, and grabbing the content field. The problem is that table_a has nearly 500k records and table_b has 112 records, which means that potentially 500,000 x 112 cross-table lookups/matches are performed just to attach 112 unique content fields to the 112 rows in the final result set.
HELP!: How do I more efficiently append the 112 content fields from table_b to the 112 results produced by the query? I am guessing it has something to do with the query execution order, like somehow only looking up and appending the content field to each result row AFTER the sums are produced and the result is narrowed down to 112 records. I have studied the MySQL API and benchmarked various subqueries and several joins, and even tried playing with UNION. It is probably something abundantly obvious to you guys, but my brain just can't get around it.
FYI: Like mentioned earlier, the query does work. The results are produced in about 8 to 10 seconds, and of course each subsequent query after that is immediate because of query caching. But for me, with how simple this is, I know that 8 seconds can at LEAST be cut in half. I just feel it deep down in my guts. Right deep down in my gutssss.
I hope this is concise enough, if I need to clarify or explain something better please let me know! Thanks in advance.
The MySQL query optimiser only supports "nested loop joins"**. These are the internal algorithms by which a join is evaluated. Other RDBMSs support other join algorithms, such as hash joins and merge joins, which can be more efficient.
However, in your case you can try this. Hopefully the optimiser will then do the aggregation before the JOIN:
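To make the "nested loop" idea concrete, here is a generic sketch in Python (not MySQL's actual implementation) using the sample rows from the question. A plain nested loop compares every table_a row with every table_b row, which is the 500,000 x 112 worry in the question; an index nested loop does roughly one lookup per table_a row in an index on table_b.id, simulated here by a dict.

```python
from typing import Any

Row = dict[str, Any]

# Sample rows from the question.
TABLE_A: list[Row] = [
    {"id": 1, "content_id": 2, "score": 16},
    {"id": 2, "content_id": 2, "score": 4},
    {"id": 3, "content_id": 3, "score": 8},
    {"id": 4, "content_id": 3, "score": 12},
]
TABLE_B: list[Row] = [
    {"id": 1, "content": "Content Goes Here"},
    {"id": 2, "content": "Content Goes Here"},
    {"id": 3, "content": "Content Goes Here"},
]


def nested_loop_join(outer: list[Row], inner: list[Row], outer_key: str, inner_key: str):
    """Plain nested loop: every outer row is compared with every inner row."""
    result: list[tuple[Row, Row]] = []
    comparisons = 0
    for o in outer:
        for i in inner:
            comparisons += 1
            if o[outer_key] == i[inner_key]:
                result.append((o, i))
    return result, comparisons


def index_nested_loop_join(outer: list[Row], inner: list[Row], outer_key: str, inner_key: str):
    """Nested loop that probes an index on the inner key (a dict stands in for the index)."""
    index: dict[Any, list[Row]] = {}
    for i in inner:
        index.setdefault(i[inner_key], []).append(i)
    result: list[tuple[Row, Row]] = []
    probes = 0
    for o in outer:
        probes += 1  # one index lookup per outer row
        for i in index.get(o[outer_key], []):
            result.append((o, i))
    return result, probes
```

Both functions return the same pairs; the plain version does len(TABLE_A) x len(TABLE_B) comparisons, while the indexed version does one probe per TABLE_A row.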
SELECT
a.content_id, b.content, a.sum
FROM
(
SELECT content_id, SUM(score) AS sum
FROM table_a
GROUP BY content_id
) a
JOIN table_b b ON a.content_id = b.id
ORDER BY
sum ASC;
In addition, if you don't need the results ordered, you can use ORDER BY NULL, which usually removes a filesort from the EXPLAIN output. And of course, I assume that there are indexes on the two join columns (table_b.id as the primary key, table_a.content_id as a foreign key index).
Finally, I would also assume that an INNER JOIN will be enough: every a.content_id exists in table_b. If not, you are missing a foreign key and an index on a.content_id.
** It's getting better but you need MariaDB or MySQL 5.6
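A quick correctness check, using Python's sqlite3 in place of MySQL and the sample rows from the question, that aggregating in a derived table first and joining afterwards returns the same rows as the original join-then-group query. The rewrite uses an INNER JOIN where the original used a LEFT JOIN; the two agree here because every content_id in the sample exists in table_b. It does not measure speed; use EXPLAIN and timings on the real MySQL data for that.

```python
import sqlite3

# Sample rows from the question: table_a(id, content_id, score), table_b(id, content).
ROWS_A = [(1, 2, 16), (2, 2, 4), (3, 3, 8), (4, 3, 12)]
ROWS_B = [(1, "Content Goes Here"), (2, "Content Goes Here"), (3, "Content Goes Here")]

JOIN_THEN_GROUP = """
SELECT a.content_id, b.content, SUM(a.score) AS sum
FROM table_a a
LEFT JOIN table_b b ON a.content_id = b.id
GROUP BY a.content_id
ORDER BY sum ASC
"""

# Aggregate first (one row per content_id), then join the small result to table_b.
GROUP_THEN_JOIN = """
SELECT a.content_id, b.content, a.sum
FROM (
    SELECT content_id, SUM(score) AS sum
    FROM table_a
    GROUP BY content_id
) a
JOIN table_b b ON a.content_id = b.id
ORDER BY sum ASC
"""


def run(query: str) -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, content_id INTEGER, score INTEGER)")
    conn.execute("CREATE TABLE table_b (id INTEGER PRIMARY KEY, content TEXT)")
    conn.execute("CREATE INDEX table_a_content_id_idx ON table_a (content_id)")
    conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)", ROWS_A)
    conn.executemany("INSERT INTO table_b VALUES (?, ?)", ROWS_B)
    return conn.execute(query).fetchall()
```

Both sums are 20 in the sample, so the rows are compared after sorting rather than in ORDER BY order.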
This should be a little faster:
SELECT
tmp.content_id,
b.content,
tmp.asum
FROM (
SELECT
a.content_id,
SUM(a.score) AS asum
FROM
table_a a
GROUP BY
a.content_id
ORDER BY
NULL
) as tmp
LEFT JOIN table_b b
ON tmp.content_id = b.id
ORDER BY
tmp.asum ASC
You can use EXPLAIN to check the query execution plan of both queries when you want to benchmark them.

Optimize the query with index

MySQL query taking 1.6 seconds for 40000 records in table
SELECT aggsm.topicdm_id AS topid,citydm.city_name
FROM AGG_MENTION AS aggsm
JOIN LOCATIONDM AS locdm ON aggsm.locationdm_id = locdm.locationdm_id
JOIN CITY AS citydm ON locdm.city_id = citydm.city_id
JOIN STATE AS statedm ON citydm.state_id = statedm.state_id
JOIN COUNTRY AS cntrydm ON statedm.country_id = cntrydm.country_id
WHERE cntrydm.country_id IN (1,2,3,4)
GROUP BY aggsm.topicdm_id,aggsm.locationdm_id
LIMIT 0,200000
I have 40000 to 50000 records in the AGG_MENTION, LOCATIONDM and CITYDM tables, 500 records in STATEDM and 4 records in the COUNTRY table.
When I run the above query it takes 1.6 seconds. Is there a way to optimize the query, or an index on certain columns that would improve the performance?
Following is the EXPLAIN output:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | aggsm | index | agg_sm_locdm_fk_idx | agg_sm_datedm_fk_idx | 4 | NULL | 36313 | Using index; Using temporary; Using filesort
1 | SIMPLE | locdm | eq_ref | PRIMARY,city_id_UNIQUE,locationdm_id_UNIQUE,loc_city_fk_idx | PRIMARY | 8 | opinionleaders.aggsm.locationdm_id | 1 |
1 | SIMPLE | citydm | eq_ref | PRIMARY,city_id_UNIQUE,city_state_fk_idx | PRIMARY | 8 | opinionleaders.locdm.city_id | 1 |
1 | SIMPLE | statedm | eq_ref | PRIMARY,state_id_UNIQUE,state_country_fk_idx | PRIMARY | 8 | opinionleaders.citydm.state_id | 1 | Using where
1 | SIMPLE | cntrydm | eq_ref | PRIMARY,country_id_UNIQUE | PRIMARY | 8 | opinionleaders.statedm.country_id | 1 | Using index
I would reverse the query and start with STATE, since that is what your criteria are based on. You are not actually doing anything with the country table except using the country ID, and that column also exists in the State table, so you can use State.Country_ID and remove the country table from the join.
Additionally, I would have the following indexes:
Table | Index
State | (Country_ID), as that will be the basis of your WHERE criteria
City | (State_ID, City_Name)
Location | (City_ID)
Agg_Mention | (LocationDM_ID, TopicDM_id)
Because City_Name is part of the index, the query doesn't have to go to the actual data pages for it; it can read the value directly from the index.
Including the STRAIGHT_JOIN keyword often helps, because it makes the optimizer join the tables in the order they are listed instead of choosing one of the other tables as its starting point. If that doesn't perform well, try the query again without it.
SELECT STRAIGHT_JOIN
aggsm.topicdm_id AS topid,
citydm.city_name
FROM
STATE AS statedm
JOIN CITY AS citydm
ON statedm.state_id = citydm.state_id
JOIN LOCATIONDM AS locdm
ON citydm.city_id = locdm.city_id
join AGG_MENTION AS aggsm
ON locdm.locationdm_id = aggsm.locationdm_id
WHERE
statedm.country_id IN (1,2,3,4)
GROUP BY
aggsm.topicdm_id,
aggsm.locationdm_id
LIMIT 0,200000
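A correctness check of the rewrite, using Python's sqlite3 as a stand-in for MySQL, made-up rows and hypothetical index names for the suggested indexes. SQLite has no STRAIGHT_JOIN and the LIMIT is omitted, so this only shows that starting from STATE and dropping COUNTRY returns the same rows, which holds as long as every STATE.country_id refers to an existing COUNTRY row.

```python
import sqlite3

SETUP = """
CREATE TABLE COUNTRY (country_id INTEGER PRIMARY KEY);
CREATE TABLE STATE (state_id INTEGER PRIMARY KEY, country_id INTEGER);
CREATE TABLE CITY (city_id INTEGER PRIMARY KEY, state_id INTEGER, city_name TEXT);
CREATE TABLE LOCATIONDM (locationdm_id INTEGER PRIMARY KEY, city_id INTEGER);
CREATE TABLE AGG_MENTION (id INTEGER PRIMARY KEY, topicdm_id INTEGER, locationdm_id INTEGER);
-- Indexes along the lines the answer suggests (hypothetical names).
CREATE INDEX state_country_idx ON STATE (country_id);
CREATE INDEX city_state_name_idx ON CITY (state_id, city_name);
CREATE INDEX loc_city_idx ON LOCATIONDM (city_id);
CREATE INDEX agg_loc_topic_idx ON AGG_MENTION (locationdm_id, topicdm_id);
-- Made-up rows: only country 1 is in the IN (...) list.
INSERT INTO COUNTRY VALUES (1), (5);
INSERT INTO STATE VALUES (10, 1), (20, 5);
INSERT INTO CITY VALUES (100, 10, 'Alpha'), (200, 20, 'Beta');
INSERT INTO LOCATIONDM VALUES (1000, 100), (2000, 200);
INSERT INTO AGG_MENTION VALUES (1, 7, 1000), (2, 7, 1000), (3, 8, 1000), (4, 7, 2000);
"""

ORIGINAL = """
SELECT aggsm.topicdm_id AS topid, citydm.city_name
FROM AGG_MENTION AS aggsm
JOIN LOCATIONDM AS locdm ON aggsm.locationdm_id = locdm.locationdm_id
JOIN CITY AS citydm ON locdm.city_id = citydm.city_id
JOIN STATE AS statedm ON citydm.state_id = statedm.state_id
JOIN COUNTRY AS cntrydm ON statedm.country_id = cntrydm.country_id
WHERE cntrydm.country_id IN (1,2,3,4)
GROUP BY aggsm.topicdm_id, aggsm.locationdm_id
"""

# Starts from STATE, filters on STATE.country_id and no longer touches COUNTRY.
REORDERED = """
SELECT aggsm.topicdm_id AS topid, citydm.city_name
FROM STATE AS statedm
JOIN CITY AS citydm ON statedm.state_id = citydm.state_id
JOIN LOCATIONDM AS locdm ON citydm.city_id = locdm.city_id
JOIN AGG_MENTION AS aggsm ON locdm.locationdm_id = aggsm.locationdm_id
WHERE statedm.country_id IN (1,2,3,4)
GROUP BY aggsm.topicdm_id, aggsm.locationdm_id
"""


def run_both() -> tuple[list, list]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return (
        sorted(conn.execute(ORIGINAL).fetchall()),
        sorted(conn.execute(REORDERED).fetchall()),
    )
```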

How to optimize MySQL Views

I have some queries using views, and these run a lot slower than I would expect, given that all relevant tables are indexed (and not that large anyway).
I hope I can explain this:
My main query looks like this (grossly simplified):
select [stuff] from orders as ord
left join calc_order_status as ors on (ors.order_id = ord.id)
calc_order_status is a view, defined thusly:
create view calc_order_status as
select ord.id AS order_id,
(sum(itm.items * itm.item_price) + ord.delivery_cost) AS total_total
from orders ord
left join order_items itm on itm.order_id = ord.id
group by ord.id
Orders (ord) contain orders, order_items contain the individual items associated with each order and their prices.
All tables are properly indexed, BUT the thing runs slowly, and when I run EXPLAIN I get:
# | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | 1 | PRIMARY | ord | ALL | customer_id | NULL | NULL | NULL | 1002 | Using temporary; Using filesort
2 | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1002 |
3 | 1 | PRIMARY | cus | eq_ref | PRIMARY | PRIMARY | 4 | db135147_2.ord.customer_id | 1 | Using where
4 | 2 | DERIVED | ord | ALL | NULL | NULL | NULL | NULL | 1002 | Using temporary; Using filesort
5 | 2 | DERIVED | itm | ref | order_id | order_id | 4 | db135147_2.ord.id | 2 |
My guess is that "derived2" refers to the view. The individual items (itm) seem to work fine, indexed by order_id. The problem seems to be line #4, which indicates that the system doesn't use a key for the orders table (ord). But in the MAIN query, the order id is already defined:
left join calc_order_status as ors on (ors.order_id = ord.id)
and ord.id (both in the main query and within the view) refer to the primary key.
I have read somewhere that MySQL simply does not optimize views that well and might not use keys under some conditions, even when they are available. This seems to be one of those cases.
I would appreciate any suggestions. Is there a way to force MySQL to realize "it's all simpler than you think, just use the primary key and you'll be fine"? Or are views the wrong way to go about this at all?
If at all possible, remove those joins. Replacing them with subqueries will speed it up a lot.
You could also try running something like this, with the view's query inlined as a derived table, to see whether it makes any speed difference at all:
select [stuff] from orders as ord
left join (
select ord.id AS order_id,
(sum(itm.items * itm.item_price) + ord.delivery_cost) AS total_total
from orders ord
left join order_items itm on itm.order_id = ord.id
group by ord.id
) as ors on (ors.order_id = ord.id)
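A sketch of the subquery alternative, assuming the [stuff] columns reduce to ord.id and the order total, and using made-up rows and a hypothetical index name. A correlated subquery computes the total only for the orders the outer query actually returns, instead of first materialising the aggregated view for every order (the <derived2> row in the plan above). Python's sqlite3 stands in for MySQL; the test checks only that both forms return the same totals, including NULL for an order without items, which comes from the view's LEFT JOIN plus SUM.

```python
import sqlite3

SETUP = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, delivery_cost REAL);
CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, items INTEGER, item_price REAL);
CREATE INDEX order_items_order_id_idx ON order_items (order_id);
CREATE VIEW calc_order_status AS
SELECT ord.id AS order_id,
       (SUM(itm.items * itm.item_price) + ord.delivery_cost) AS total_total
FROM orders ord
LEFT JOIN order_items itm ON itm.order_id = ord.id
GROUP BY ord.id;
INSERT INTO orders VALUES (1, 5.0), (2, 0.0), (3, 2.5);
INSERT INTO order_items (order_id, items, item_price)
VALUES (1, 2, 10.0), (1, 1, 3.0), (2, 4, 1.5);
"""

VIEW_QUERY = """
SELECT ord.id, ors.total_total
FROM orders AS ord
LEFT JOIN calc_order_status AS ors ON (ors.order_id = ord.id)
ORDER BY ord.id
"""

# Correlated subquery: the sum is computed per returned order via the order_id index.
SUBQUERY = """
SELECT ord.id,
       (SELECT SUM(itm.items * itm.item_price)
        FROM order_items itm
        WHERE itm.order_id = ord.id) + ord.delivery_cost AS total_total
FROM orders AS ord
ORDER BY ord.id
"""


def run(query: str) -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn.execute(query).fetchall()
```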
An index is useful for finding a few rows in a big table, but when you query every row, an index just slows things down. Here MySQL probably expects to read the whole orders table, so it decides not to use an index.
You can test whether it would be faster by forcing MySQL to use an index:
from orders as ord force index for join (yourindex)