I have a MySQL query that JOINs 12 tables. When I run EXPLAIN on the query, it shows 394699 rows for one table and 185368 rows for another. All the other tables show 1-3 rows. The query returns only 472 rows in total, yet it takes more than 1 minute.
Is there any way to check how many rows were examined to produce that result, so that I can find which table costs the most time?
I am giving the query structure below. As the table structures are too large, I am not able to provide them here.
SELECT h.nid,h.attached_nid,h.created, s.field_species_value as species, g.field_gender_value as gender, u.field_unique_id_value as unqid, n.title, dob.field_adult_healthy_weight_value as birth_date, dcolor.field_dog_primary_color_value as dogcolor, ccolor.field_primary_color_value as catcolor, sdcolor.field_dog_secondary_color_value as sdogcolor, sccolor.field_secondary_color_value as scatcolor, dpattern.field_dog_pattern_value as dogpattern, cpattern.field_cat_pattern_value as catpattern
FROM table1 h
JOIN table2 n ON n.nid = h.nid
JOIN table3 s ON n.nid = s.entity_id
JOIN table4 u ON n.nid = u.entity_id
LEFT JOIN table5 g ON n.nid = g.entity_id
LEFT JOIN table6 dob ON n.nid = dob.entity_id
LEFT JOIN table7 AS dcolor ON n.nid = dcolor.entity_id
LEFT JOIN table8 AS ccolor ON n.nid = ccolor.entity_id
LEFT JOIN table9 AS sdcolor ON n.nid = sdcolor.entity_id
LEFT JOIN table10 AS sccolor ON n.nid = sccolor.entity_id
LEFT JOIN table11 AS dpattern ON n.nid = dpattern.entity_id
LEFT JOIN table12 AS cpattern ON n.nid = cpattern.entity_id
WHERE h.title = '4208'
AND ((h.created BETWEEN 1483257600 AND 1485935999))
AND h.uid!=1
AND h.uid IN(
SELECT etid
FROM `table`
WHERE gid=464
AND entity_type='user')
AND h.attached_nid>0
ORDER BY CAST(h.created as UNSIGNED) DESC;
Below is the EXPLAIN result which I get
+------+--------------+---------------+--------+----------------------+---------------------+---------+----------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+---------------+--------+----------------------+---------------------+---------+----------------------+--------+----------------------------------------------+
| 1 | PRIMARY | s | index | entity_id | field_species_value | 772 | NULL | 394699 | Using index; Using temporary; Using filesort |
| 1 | PRIMARY | u | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | n | eq_ref | PRIMARY | PRIMARY | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | g | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | dob | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | dcolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | ccolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | sdcolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | sccolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | dpattern | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | cpattern | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | h | ref | attached_nid,nid,uid | nid | 5 | pantheon.s.entity_id | 3 | Using index condition; Using where |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 4 | func | 1 | Using where |
| 2 | MATERIALIZED | og_membership | ref | entity,gid | gid | 4 | const | 185368 | Using where |
+------+--------------+---------------+--------+----------------------+---------------------+---------+----------------------+--------+----------------------------------------------+
You can find the ROWS_EXAMINED by using the Performance Schema.
Here is a link to the performance schema quick start guide.
https://dev.mysql.com/doc/refman/5.5/en/performance-schema-quick-start.html
This is the query I run in PHP applications, to find out what queries I need to optimize. You should be able to adapt it pretty easily.
The query finds the stats on the query that was run before this one. So in my apps, I run this query after every query, store the results, and at the end of the PHP script output the stats for every query I ran during the request.
SELECT `EVENT_ID`, TRUNCATE(`TIMER_WAIT`/1000000000000,6) as Duration,
`SQL_TEXT`, `DIGEST_TEXT`, `NO_INDEX_USED`, `NO_GOOD_INDEX_USED`, `ROWS_AFFECTED`, `ROWS_SENT`, `ROWS_EXAMINED`
FROM `performance_schema`.`events_statements_history`
WHERE
`CURRENT_SCHEMA` = '{$database}' AND `EVENT_NAME` LIKE 'statement/sql/%'
AND `THREAD_ID` = (SELECT `THREAD_ID` FROM `performance_schema`.`threads` WHERE `performance_schema`.`threads`.`PROCESSLIST_ID` = CONNECTION_ID())
ORDER BY `EVENT_ID` DESC LIMIT 1;
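If the Performance Schema query above returns nothing, the statement-history consumer may simply be switched off. The sketch below first checks that, then shows a rougher alternative based on the session handler counters. It assumes a server version where `performance_schema.setup_consumers` exists and an account allowed to run `FLUSH STATUS`; the handler counters count row reads by the storage engine, so they approximate rather than equal `ROWS_EXAMINED`.

```sql
-- Check whether statement history is being collected at all.
SELECT NAME, ENABLED
FROM performance_schema.setup_consumers
WHERE NAME LIKE 'events_statements%';

-- Rough alternative: per-session handler counters.
FLUSH STATUS;                              -- reset this session's counters

-- ... run the slow SELECT here ...

SHOW SESSION STATUS LIKE 'Handler_read%';  -- row reads caused by that SELECT
```

Large values of `Handler_read_rnd_next` usually point to table scans, while large `Handler_read_next` values point to long index range or index scans.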
To decrease the number of rows accessed from og_membership, try adding an index containing the gid, entity_type, and etid fields. Including gid and entity_type should make the lookup more performant and including etid will make the index a covering index.
After adding the index, run EXPLAIN again to look at the results. Based on the new explain plan, either keep the index, remove the index, and/or add an additional index. Keep doing this until you get results you are satisfied with.
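As a minimal sketch of that step, the index could be added and checked like this. The index name `idx_gid_type_etid` is made up for illustration, and the EXPLAIN runs only the subquery so the effect on og_membership is easy to see.

```sql
-- Hypothetical index name; columns as suggested above.
ALTER TABLE og_membership
  ADD INDEX idx_gid_type_etid (gid, entity_type, etid);

-- Re-check the subquery on its own.
EXPLAIN
SELECT etid
FROM og_membership
WHERE gid = 464
  AND entity_type = 'user';
```

If the optimizer picks the new index, the Extra column should show Using index for og_membership, which is the sign of a covering index.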
For sure, you will want to try to eliminate any mentions of Using temporary or Using filesort. Using temporary means a temporary table is being created to run this query, probably because of the sheer size of your intermediate result. Using filesort means the ordering isn't being satisfied by an index and is instead done by sorting the matching rows.
A detailed explanation of EXPLAIN can be found at https://dev.mysql.com/doc/refman/5.7/en/explain-output.html.
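One thing that may be worth testing for the filesort: the query sorts on `CAST(h.created AS UNSIGNED)`, and an expression in ORDER BY cannot be resolved from a plain index on the column. The sketch below assumes `created` is already an integer column (so the CAST is redundant) and uses a made-up index name; it only exercises table1, so whether the sort disappears in the full 12-table query still depends on the join order the optimizer chooses.

```sql
-- Hypothetical index name; assumes created is stored as an integer.
ALTER TABLE table1
  ADD INDEX idx_title_created (title, created);

EXPLAIN
SELECT h.nid, h.attached_nid, h.created
FROM table1 h
WHERE h.title = '4208'
  AND h.created BETWEEN 1483257600 AND 1485935999
ORDER BY h.created DESC;   -- no CAST, so the index order can be used
```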
Key-Value (EAV) schema sucks.
Indexes:
table1: INDEX(title, created)
table1: INDEX(uid, title, created)
table: INDEX(gid, entity_type, etid)
table* -- Is `entity_id` already an index? Can it be the PRIMARY KEY?
Does nid need to be NULL instead of NOT NULL?
If those don't do enough, try:
And turn the IN ( SELECT ... ) into a JOIN ( SELECT ... ) USING(hid)
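A hedged sketch of that IN to JOIN rewrite, shown on table1 only to keep it short. Since the outer column is `uid` and the subquery column is `etid`, this version joins with ON rather than USING; the DISTINCT keeps one row per `etid` so the join cannot multiply rows from table1. The alias `m` is made up for illustration.

```sql
SELECT h.nid, h.attached_nid, h.created
FROM table1 h
JOIN (
    SELECT DISTINCT etid
    FROM `table`
    WHERE gid = 464
      AND entity_type = 'user'
) AS m ON m.etid = h.uid
WHERE h.title = '4208'
  AND h.created BETWEEN 1483257600 AND 1485935999
  AND h.uid != 1
  AND h.attached_nid > 0;
```

The remaining JOINs and the ORDER BY from the original query would be appended unchanged.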
If you still need help, please provide SHOW CREATE TABLE and EXPLAIN SELECT ...
Related
I have a database which consists of three tables, with the following structure:
restaurant table: restaurant_id, location_id, rating. Example: 1325, 77, 4.5
restaurant_name table: restaurant_id, language, name. Example: 1325, 'en', 'Pizza Express'
location_name table: location_id, language, name. Example: 77, 'en', 'New York'
I would like to get the restaurant info in English, sorted by location name and restaurant name, and use the LIMIT clause to paginate the result. So my SQL is:
SELECT ln.name, rn.name
FROM restaurant r
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
LIMIT 0, 50
This is terribly slow, so I refined my SQL with a deferred JOIN, which makes things a lot faster (from over 10 seconds to 2 seconds):
SELECT ln.name, rn.name
FROM restaurant r
INNER JOIN (
SELECT r.restaurant_id
FROM restaurant r
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
LIMIT 0, 50
) r1
ON r.restaurant_id = r1.restaurant_id
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
2 seconds is unfortunately still not acceptable to the user, so I checked the EXPLAIN output of my query, and it appears that the slow part is the ORDER BY clause, for which I see "Using temporary; Using filesort". I checked the official reference manual on ORDER BY optimization and came across this statement:
In some cases, MySQL cannot use indexes to resolve the ORDER BY,
although it may still use indexes to find the rows that match the
WHERE clause. Examples:
The query joins many tables, and the columns in the ORDER BY are not
all from the first nonconstant table that is used to retrieve rows.
(This is the first table in the EXPLAIN output that does not have a
const join type.)
So for my case, given that the two columns I'm ordering by are from the nonconstant joined tables, index cannot be used. My question is, is there any other approach I can take to speed things up, or what I've done so far is already the best I can achieve?
Thanks in advance for your help!
EDIT 1
Below is the EXPLAIN output with the ORDER BY clause:
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 50 | |
| 1 | PRIMARY | rn | ref | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | r1.restaurant_id,const,const | 1 | Using where |
| 1 | PRIMARY | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | r1.restaurant_id | 1 | |
| 1 | PRIMARY | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id,const,const | 1 | Using where |
| 2 | DERIVED | rn | ALL | idx_restaurant_name_1 | NULL | NULL | NULL | 8484 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | test.rn.restaurant_id | 1 | |
| 2 | DERIVED | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id | 1 | Using where |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
Below is the EXPLAIN output without the ORDER BY clause:
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 50 | |
| 1 | PRIMARY | rn | ref | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | r1.restaurant_id,const,const | 1 | Using where |
| 1 | PRIMARY | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | r1.restaurant_id | 1 | |
| 1 | PRIMARY | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id,const,const | 1 | Using where |
| 2 | DERIVED | rn | index | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | NULL | 8484 | Using where; Using index |
| 2 | DERIVED | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | test.rn.restaurant_id | 1 | |
| 2 | DERIVED | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id | 1 | Using where; Using index |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
EDIT 2
Below is the DDL of the tables. I built them only to illustrate this problem; the real tables have many more columns.
CREATE TABLE restaurant (
restaurant_id INT NOT NULL AUTO_INCREMENT,
location_id INT NOT NULL,
rating INT NOT NULL,
PRIMARY KEY (restaurant_id),
INDEX idx_restaurant_1 (location_id)
);
CREATE TABLE restaurant_name (
restaurant_id INT NOT NULL,
language VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
INDEX idx_restaurant_name_1 (restaurant_id, language),
INDEX idx_restaurant_name_2 (name)
);
CREATE TABLE location_name (
location_id INT NOT NULL,
language VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
INDEX idx_location_name_1 (location_id, language),
INDEX idx_location_name_2 (name)
);
Based on the EXPLAIN numbers, there could be about 170 "pages" of restaurants (8484/50)? I suggest that that is impractical for paging through. I strongly recommend you rethink the UI. In doing so, the performance problem you state will probably vanish.
For example, the UI could be 2 steps instead of 170 to get to the restaurants in Zimbabwe. Step 1, pick a country. (OK, that might be page 5 of the countries.) Step 2, view the list of restaurants in that country; it would be only a few pages to flip through. Much better for the user; much better for the database.
Addenda
In order to optimize the pagination, get the paginated list of rows from a single table (so that you can 'remember where you left off'). Then join the language table(s) to look up the translations. Note that this only looks up one page's worth of translations, not thousands.
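A minimal sketch of the 'remember where you left off' idea, using the tables from the question. It pages through `restaurant` by its primary key and only then joins the name tables, so it orders by `restaurant_id` rather than by the English names; ordering by name would need the sort key to live in that single table. `@last_id` is a hypothetical user variable holding the last `restaurant_id` of the previous page.

```sql
SET @last_id = 0;   -- 0 for the first page

SELECT page.restaurant_id,
       ln.name AS location_name,
       rn.name AS restaurant_name
FROM (
    -- One page of ids from a single table, resolved via the PRIMARY KEY.
    SELECT restaurant_id, location_id
    FROM restaurant
    WHERE restaurant_id > @last_id
    ORDER BY restaurant_id
    LIMIT 50
) AS page
JOIN location_name ln
  ON ln.location_id = page.location_id
 AND ln.language = 'en'
JOIN restaurant_name rn
  ON rn.restaurant_id = page.restaurant_id
 AND rn.language = 'en'
ORDER BY page.restaurant_id;
```

For the next page, set `@last_id` to the largest `restaurant_id` returned and run the same statement again.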
I have a query which is very slow in INNER JOIN condition, but is faster when used in WHERE IN clause:
Slower Inner join:
SELECT *
FROM cases
left join
(
select tst.team_set_id
from team_sets_teams tst
INNER JOIN team_memberships team_memberships
ON tst.team_id = team_memberships.team_id
AND team_memberships.user_id = '1'
AND team_memberships.deleted=0 group by tst.team_set_id
) cases_tf
ON cases_tf.team_set_id = cases.team_set_id
LEFT JOIN contacts_cases
ON contacts_cases.case_id = cases.id
AND contacts_cases.deleted = 0
where cases.deleted=0
ORDER BY cases.name LIMIT 0,20;
Faster where in:
SELECT *
FROM cases
LEFT JOIN contacts_cases
ON contacts_cases.case_id = cases.id
AND contacts_cases.deleted = 0
where cases.deleted=0
and cases.team_set_id in (
select tst.team_set_id
from team_sets_teams tst
INNER JOIN team_memberships team_memberships
ON tst.team_id = team_memberships.team_id
AND team_memberships.user_id = '1'
AND team_memberships.deleted=0
group by tst.team_set_id
)
ORDER BY cases.name LIMIT 0,20;
The explain plan for INNER JOIN and WHERE IN clause are below:
Inner Join:
+----+-------------+------------------+------+--------------------------------------------+---------------------+---------+-----------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+--------------------------------------------+---------------------+---------+-----------------------------------+--------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort |
| 1 | PRIMARY | cases | ref | idx_cases_tmst_id | idx_cases_tmst_id | 109 | cases_tf.team_set_id | 446976 | Using where |
| 1 | PRIMARY | contacts_cases | ref | idx_con_case_case | idx_con_case_case | 111 | sugarcrm.cases.id | 1 | |
| 2 | DERIVED | team_memberships | ref | idx_team_membership,idx_teammemb_team_user | idx_team_membership | 109 | | 2 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | tst | ref | idx_ud_team_id | idx_ud_team_id | 109 | sugarcrm.team_memberships.team_id | 1 | Using where |
+----+-------------+------------------+------+--------------------------------------------+---------------------+---------+-----------------------------------+--------+----------------------------------------------+
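WHERE IN condition:
+----+--------------------+------------------+-------+--------------------------------------------+---------------------+---------+-----------------------------------+------+----------------------------------------------+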
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------------+-------+--------------------------------------------+---------------------+---------+-----------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | cases | index | NULL | idx_case_name | 768 | NULL | 20 | Using where |
| 1 | PRIMARY | contacts_cases | ref | idx_con_case_case | idx_con_case_case | 111 | sugarcrm.cases.id | 1 | |
| 2 | DEPENDENT SUBQUERY | team_memberships | ref | idx_team_membership,idx_teammemb_team_user | idx_team_membership | 109 | const | 2 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | tst | ref | idx_ud_team_id | idx_ud_team_id | 109 | sugarcrm.team_memberships.team_id | 1 | Using where |
+----+--------------------+------------------+-------+--------------------------------------------+---------------------+---------+-----------------------------------+------+----------------------------------------------+
Though there are indexes, I couldn't figure out what the problem is. Please help me out. Thanks.
(This is a query in sugarcrm)
It is impossible to convert an IN condition into an INNER JOIN.
A query with an IN condition is called a semi-join.
A semi-join returns rows from one table that would join with another table, but without performing a complete join.
Here is a simple example of a semi-join query using the IN operator:
SELECT *
FROM table1
WHERE some-column IN (
SELECT some-other-column
FROM table2
WHERE some-conditions
)
The above semi-join can be converted to a semantically equivalent query
(equivalent means giving exactly the same results)
using the EXISTS operator and a dependent subquery:
SELECT *
FROM table1
WHERE EXISTS(
SELECT 1
FROM table2
WHERE some-conditions
AND table1.some-column = table2.some-other-column
)
Most leading databases use the same plan for both of the above queries, so their speed is the same;
unfortunately, this is not always true for MySQL.
Joins and semi-joins are totally different queries, with completely different execution plans, so comparing their speed is like comparing apples to onions.
You can try to convert the first query with IN into a query with EXISTS, but not into a join.
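To illustrate why a plain join is not a drop-in replacement for IN, here is a small sketch with two made-up tables, `t1` and `t2` (not from the question). The value 10 appears twice in `t2`, which the semi-join ignores and the join does not.

```sql
-- Hypothetical tables for illustration only.
CREATE TABLE t1 (id INT PRIMARY KEY, grp INT);
CREATE TABLE t2 (grp INT);

INSERT INTO t1 VALUES (1, 10), (2, 20);
INSERT INTO t2 VALUES (10), (10);

-- Semi-join: the row with id = 1 is returned once.
SELECT t1.id
FROM t1
WHERE t1.grp IN (SELECT t2.grp FROM t2);

-- Plain join: the row with id = 1 is returned twice,
-- once per matching row in t2.
SELECT t1.id
FROM t1
JOIN t2 ON t2.grp = t1.grp;
```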
How can I avoid a full table scan when performing inner joins in MySQL using IN in WHERE clause? For example:
explain SELECT
-> COUNT(DISTINCT(n.nid))
-> FROM node n
-> INNER JOIN term_node tn ON n.nid = tn.nid
-> INNER JOIN content_type_article ca ON n.nid = ca.nid
-> WHERE tn.tid IN (67,100)
-> ;
+----+-------------+-------+--------+----------------------------------+---------+---------+----------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------+---------+---------+----------------------+-------+--------------------------+
| 1 | SIMPLE | tn | ALL | PRIMARY,nid | NULL | NULL | NULL | 42180 | Using where |
| 1 | SIMPLE | ca | ref | nid,field_article_date_nid_index | nid | 4 | drupal_mm_qas.tn.nid | 1 | Using index |
| 1 | SIMPLE | n | eq_ref | PRIMARY | PRIMARY | 4 | drupal_mm_qas.ca.nid | 1 | Using where; Using index |
+----+-------------+-------+--------+----------------------------------+---------+---------+----------------------+-------+--------------------------+
3 rows in set (0.00 sec)
It seems you're filtering by a column that MySQL judged to be not selective enough. When a filter's cardinality is too low (i.e., the number of distinct values in that column is low), MySQL thinks, most of the time accurately, that a full table scan would be faster.
To confirm, please show the result of SELECT COUNT(DISTINCT tn.tid) FROM term_node tn and SELECT COUNT(*) FROM term_node tn
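Both counts, plus the number of rows the filter actually matches, can be collected in one pass. This is only a sketch; the column aliases are made up, and `SUM(tid IN (67, 100))` relies on MySQL treating a boolean expression as 0 or 1.

```sql
SELECT COUNT(*)               AS total_rows,
       COUNT(DISTINCT tid)    AS distinct_tids,
       SUM(tid IN (67, 100))  AS matching_rows
FROM term_node;
```

If `matching_rows` is a large share of `total_rows`, the full scan chosen by the optimizer is probably reasonable.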
How can I make this SELECT statement faster?
The first LEFT JOIN with the subselect is making it slower...
mysql> SELECT COUNT(DISTINCT w1.id) AS AMOUNT FROM tblWerbemittel w1
JOIN tblVorgang v1 ON w1.object_group = v1.werbemittel_id
INNER JOIN ( SELECT wmax.object_group, MAX( wmax.object_revision ) wmaxobjrev FROM tblWerbemittel wmax GROUP BY wmax.object_group ) AS wmaxselect ON w1.object_group = wmaxselect.object_group AND w1.object_revision = wmaxselect.wmaxobjrev
LEFT JOIN ( SELECT vmax.object_group, MAX( vmax.object_revision ) vmaxobjrev FROM tblVorgang vmax GROUP BY vmax.object_group ) AS vmaxselect ON v1.object_group = vmaxselect.object_group AND v1.object_revision = vmaxselect.vmaxobjrev
LEFT JOIN tblWerbemittel_has_tblAngebot wha ON wha.werbemittel_id = w1.object_group
LEFT JOIN tblAngebot ta ON ta.id = wha.angebot_id
LEFT JOIN tblLieferanten tl ON tl.id = ta.lieferant_id AND wha.zuschlag = (SELECT MAX(zuschlag) FROM tblWerbemittel_has_tblAngebot WHERE werbemittel_id = w1.object_group)
WHERE w1.flags =0 AND v1.flags=0;
+--------+
| AMOUNT |
+--------+
| 1982 |
+--------+
1 row in set (1.30 sec)
Some indexes have already been set, and as EXPLAIN shows, they are used.
+----+--------------------+-------------------------------+--------+----------------------------------------+----------------------+---------+-----------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------------------------+--------+----------------------------------------+----------------------+---------+-----------------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2072 | |
| 1 | PRIMARY | v1 | ref | werbemittel_group,werbemittel_id_index | werbemittel_group | 4 | wmaxselect.object_group | 2 | Using where |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 3376 | |
| 1 | PRIMARY | w1 | eq_ref | object_revision,or_og_index | object_revision | 8 | wmaxselect.wmaxobjrev,wmaxselect.object_group | 1 | Using where |
| 1 | PRIMARY | wha | ref | PRIMARY,werbemittel_id_index | werbemittel_id_index | 4 | dpd.w1.object_group | 1 | |
| 1 | PRIMARY | ta | eq_ref | PRIMARY | PRIMARY | 4 | dpd.wha.angebot_id | 1 | |
| 1 | PRIMARY | tl | eq_ref | PRIMARY | PRIMARY | 4 | dpd.ta.lieferant_id | 1 | Using index |
| 4 | DEPENDENT SUBQUERY | tblWerbemittel_has_tblAngebot | ref | PRIMARY,werbemittel_id_index | werbemittel_id_index | 4 | dpd.w1.object_group | 1 | |
| 3 | DERIVED | vmax | index | NULL | object_revision_uq | 8 | NULL | 4668 | Using index; Using temporary; Using filesort |
| 2 | DERIVED | wmax | range | NULL | or_og_index | 4 | NULL | 2168 | Using index for group-by |
+----+--------------------+-------------------------------+--------+----------------------------------------+----------------------+---------+-----------------------------------------------+------+----------------------------------------------+
10 rows in set (0.01 sec)
The statement above takes about 2 seconds, and the main problem seems to be the subselect, where no index can be used.
How can I write the statement so that it runs even faster?
Thanks for help. MT
Do you have the following indexes?
for tblWerbemittel - object_group, object_revision
for tblVorgang - object_group, object_revision
for tblWerbemittel_has_tblAngebot - werbemittel_id, zuschlag
Let me know if that helps, there are a few more that I can see might help but try those first.
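In case it helps, here is a sketch of how those three indexes could be checked and created. The index names are made up, and the EXPLAIN output in the question suggests similar indexes (for example `or_og_index`) may already exist, so it is worth looking at `SHOW INDEX` first.

```sql
-- See what is already there before adding anything.
SHOW INDEX FROM tblWerbemittel;
SHOW INDEX FROM tblVorgang;
SHOW INDEX FROM tblWerbemittel_has_tblAngebot;

-- Hypothetical index names; columns as listed above.
ALTER TABLE tblWerbemittel
  ADD INDEX idx_w_group_rev (object_group, object_revision);
ALTER TABLE tblVorgang
  ADD INDEX idx_v_group_rev (object_group, object_revision);
ALTER TABLE tblWerbemittel_has_tblAngebot
  ADD INDEX idx_wha_wm_zuschlag (werbemittel_id, zuschlag);
```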
EDIT
Can you try these two queries and see if they run fast?
SELECT w1.id AS AMOUNT
FROM tblWerbemittel w1 INNER JOIN
(SELECT wmax.object_group,
MAX( wmax.object_revision ) AS wmaxobjrev
FROM tblWerbemittel AS wmax
GROUP BY wmax.object_group ) AS wmaxselect ON w1.object_group = wmaxselect.object_group AND
w1.object_revision = wmaxselect.wmaxobjrev
WHERE w1.flags = 0;
SELECT v1.werbemittel_id
FROM tblVorgang v1 LEFT JOIN
(SELECT vmax.object_group,
MAX( vmax.object_revision ) AS vmaxobjrev
FROM tblVorgang AS vmax
GROUP BY vmax.object_group ) AS vmaxselect ON v1.object_group = vmaxselect.object_group AND
v1.object_revision = vmaxselect.vmaxobjrev
WHERE v1.flags = 0;
I don't think I have sufficient data to provide a 100% correct answer, but I can throw in a handful of tips.
First of all, MySQL is stupid. Bear that in mind and always rearrange your queries so that the most data is excluded at the beginning. For instance, if the last join reduces the number of results from 10k to 2k while the others don't, try swapping their positions so that each subsequent join operates on the smallest subset of data possible.
Same applies to the WHERE clause.
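One caveat when trying this out: for inner joins the MySQL optimizer normally picks the join order itself, so reordering the text of the query may not change the plan. A way to test a specific order is the STRAIGHT_JOIN modifier, which makes MySQL join the tables in the order they are written. The sketch below uses only the two base tables from the question and is meant for experimenting with EXPLAIN, not as a finished rewrite.

```sql
EXPLAIN
SELECT STRAIGHT_JOIN COUNT(DISTINCT w1.id) AS AMOUNT
FROM tblWerbemittel w1
JOIN tblVorgang v1
  ON w1.object_group = v1.werbemittel_id
WHERE w1.flags = 0
  AND v1.flags = 0;
```

Swapping the two tables in the FROM clause and comparing the two plans shows whether the written order actually matters here.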
Also, joins tend to be slower than subqueries. I don't know if that's a rule or just something that I'm observing in my case, but you can always try to substitute a join or two with a subquery.
While I suppose this doesn't really answer your question, I hope it at least gives you an idea about where to start looking for optimisations.
I am currently debating whether my table mapping_uGroups_uProducts, which is a view defined as follows, should be replaced by a real table:
CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost`
SQL SECURITY DEFINER VIEW `db`.`mapping_uGroups_uProducts`
AS select distinct `X`.`upID` AS `upID`,`Z`.`ugID` AS `ugID` from
((`db`.`mapping_uProducts_Products` `X` join `db`.`productsInfo` `Y`
on((`X`.`pID` = `Y`.`pID`))) join `db`.`mapping_uGroups_Groups` `Z`
on((`Y`.`gID` = `Z`.`gID`)));
My current query is:
SELECT upID FROM uProductsInfo \
JOIN fs_uProducts USING (upID) \
JOIN mapping_uGroups_uProducts USING (upID) -- could be faster if we use hard table and index \
JOIN mapping_fs_key USING (fsKeyID) \
WHERE fsName="OVERALL" \
AND ugID=1 \
ORDER BY score DESC \
LIMIT 0,30;
which is pretty slow (for 30 results, it takes about 10 seconds). I think the reason my query is so slow is that it relies on a VIEW, which has no index to speed things up.
+----+-------------+----------------+--------+----------------+---------+---------+---------------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+--------+----------------+---------+---------+---------------------------------------+-------+---------------------------------+
| 1 | PRIMARY | mapping_fs_key | const | PRIMARY,fsName | fsName | 386 | const | 1 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 19706 | Using where |
| 1 | PRIMARY | uProductsInfo | eq_ref | PRIMARY | PRIMARY | 4 | mapping_uGroups_uProducts.upID | 1 | Using index |
| 1 | PRIMARY | fs_uProducts | ref | upID | upID | 4 | db.uProductsInfo.upID | 221 | Using where |
| 2 | DERIVED | X | ALL | PRIMARY | NULL | NULL | NULL | 40772 | Using temporary |
| 2 | DERIVED | Y | eq_ref | PRIMARY | PRIMARY | 4 | db.X.pID | 1 | Distinct |
| 2 | DERIVED | Z | ref | PRIMARY | PRIMARY | 4 | db.Y.gID | 2 | Using index; Distinct |
+----+-------------+----------------+--------+----------------+---------+---------+---------------------------------------+-------+---------------------------------+
7 rows in set (0.48 sec)
The explain here looks pretty cryptic, and I don't know whether I should drop view and write a script to just insert everything in the view to a hard table. ( obviously, it will lose the flexibility of the view since the mapping changes quite frequently).
Does anyone have any idea to how I can optimize my schema better?
Your current plan uses the view as a driven table: it is scanned for each record in mapping_fs_key with fsName = 'OVERALL'.
You could replace the view with this subquery:
SELECT upID FROM uProductsInfo
JOIN fs_uProducts USING (upID)
JOIN mapping_fs_key USING (fsKeyID)
WHERE fsName='OVERALL'
AND upID IN
(
SELECT upID
FROM mapping_uGroups_Groups Z
JOIN productsInfo Y
ON Y.gID = Z.gID
JOIN mapping_uProducts_Products X
ON X.pID = Y.pID
WHERE Z.ugID = 1
)
ORDER BY
score DESC
LIMIT 0,30
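If the subquery version is still too slow, the 'hard table' option raised in the question could be sketched as below. The table name `mapping_uGroups_uProducts_cache` is made up, and the column types are an assumption (both ids are taken to be non-NULL INTs); the SELECT is the view's own definition, and the table has to be refilled whenever the mappings change.

```sql
-- Hypothetical cache table; column types are assumed, not taken from the schema.
CREATE TABLE mapping_uGroups_uProducts_cache (
    ugID INT NOT NULL,
    upID INT NOT NULL,
    PRIMARY KEY (ugID, upID)   -- serves lookups by ugID
);

-- Fill it from the same joins the view uses.
INSERT INTO mapping_uGroups_uProducts_cache (ugID, upID)
SELECT DISTINCT Z.ugID, X.upID
FROM mapping_uProducts_Products X
JOIN productsInfo Y ON X.pID = Y.pID
JOIN mapping_uGroups_Groups Z ON Y.gID = Z.gID;
```

The trade-off is the one already noted in the question: lookups by `ugID` become an index read, but the flexibility of the view is lost and the refresh has to be scheduled.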