slow mysql count because of subselect - mysql

how to make this select statement more faster?
the first left join with the subselect is making it slower...
mysql> SELECT COUNT(DISTINCT w1.id) AS AMOUNT FROM tblWerbemittel w1
JOIN tblVorgang v1 ON w1.object_group = v1.werbemittel_id
INNER JOIN ( SELECT wmax.object_group, MAX( wmax.object_revision ) wmaxobjrev FROM tblWerbemittel wmax GROUP BY wmax.object_group ) AS wmaxselect ON w1.object_group = wmaxselect.object_group AND w1.object_revision = wmaxselect.wmaxobjrev
LEFT JOIN ( SELECT vmax.object_group, MAX( vmax.object_revision ) vmaxobjrev FROM tblVorgang vmax GROUP BY vmax.object_group ) AS vmaxselect ON v1.object_group = vmaxselect.object_group AND v1.object_revision = vmaxselect.vmaxobjrev
LEFT JOIN tblWerbemittel_has_tblAngebot wha ON wha.werbemittel_id = w1.object_group
LEFT JOIN tblAngebot ta ON ta.id = wha.angebot_id
LEFT JOIN tblLieferanten tl ON tl.id = ta.lieferant_id AND wha.zuschlag = (SELECT MAX(zuschlag) FROM tblWerbemittel_has_tblAngebot WHERE werbemittel_id = w1.object_group)
WHERE w1.flags =0 AND v1.flags=0;
+--------+
| AMOUNT |
+--------+
| 1982 |
+--------+
1 row in set (1.30 sec)
Some indexes has been already set and as EXPLAIN shows they were used.
+----+--------------------+-------------------------------+--------+----------------------------------------+----------------------+---------+-----------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------------------------+--------+----------------------------------------+----------------------+---------+-----------------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2072 | |
| 1 | PRIMARY | v1 | ref | werbemittel_group,werbemittel_id_index | werbemittel_group | 4 | wmaxselect.object_group | 2 | Using where |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 3376 | |
| 1 | PRIMARY | w1 | eq_ref | object_revision,or_og_index | object_revision | 8 | wmaxselect.wmaxobjrev,wmaxselect.object_group | 1 | Using where |
| 1 | PRIMARY | wha | ref | PRIMARY,werbemittel_id_index | werbemittel_id_index | 4 | dpd.w1.object_group | 1 | |
| 1 | PRIMARY | ta | eq_ref | PRIMARY | PRIMARY | 4 | dpd.wha.angebot_id | 1 | |
| 1 | PRIMARY | tl | eq_ref | PRIMARY | PRIMARY | 4 | dpd.ta.lieferant_id | 1 | Using index |
| 4 | DEPENDENT SUBQUERY | tblWerbemittel_has_tblAngebot | ref | PRIMARY,werbemittel_id_index | werbemittel_id_index | 4 | dpd.w1.object_group | 1 | |
| 3 | DERIVED | vmax | index | NULL | object_revision_uq | 8 | NULL | 4668 | Using index; Using temporary; Using filesort |
| 2 | DERIVED | wmax | range | NULL | or_og_index | 4 | NULL | 2168 | Using index for group-by |
+----+--------------------+-------------------------------+--------+----------------------------------------+----------------------+---------+-----------------------------------------------+------+----------------------------------------------+
10 rows in set (0.01 sec)
The main problem while the statement above takes about 2 seconds seems to be the subselect where no index can be used.
How to write the statement even more faster?
Thanks for help. MT

Do you have the following indexes?
for tblWerbemittel - object_group, object_revision
for tblVorgang - object_group, object_revision
for tblWerbemittel_has_tblAngebot - werbemittel_id, zuschlag
Let me know if that helps, there are a few more that I can see might help but try those first.
EDIT
Can you try these two queries and see if they run fast?
SELECT w1.id AS AMOUNT
FROM tblWerbemittel w1 INNER JOIN
(SELECT wmax.object_group,
MAX( wmax.object_revision ) AS wmaxobjrev
FROM tblWerbemittel AS wmax
GROUP BY wmax.object_group ) AS wmaxselect ON w1.object_group = wmaxselect.object_group AND
w1.object_revision = wmaxselect.wmaxobjrev
WHERE w1.flags = 0
SELECT v1.werbemittel_id
FROM tblVorgang v1 LEFT JOIN
(SELECT vmax.object_group,
MAX( vmax.object_revision ) AS vmaxobjrev
FROM tblVorgang AS vmax
GROUP BY vmax.object_group ) AS vmaxselect ON v1.object_group = vmaxselect.object_group AND
v1.object_revision = vmaxselect.vmaxobjrev LEFT JOIN
WHERE v1.flags = 0

While I consider I don't have sufficient data to provide a 100% correct answer, but I can throw in a handful of tips.
Forst of all, MYSQL is stupid. Bear that in mind and always rearrange your queries so that the most data is excluded at the beginning. For instance, if the last join reduced the number of results from 10k to 2k while the others don't, try swapping their positions so that each subsequent join operates on the smallest subset of data possible.
Same applies to the WHERE clause.
Also, joins tend to be slower than subqueries. I don't know if that's a rule or just something that I'm observing in my case, but you can always try to substitute a join or two with a subquery.
While I suppose this doesn't really answer your question, I hope it at least gives you an idea about where to start looking for optimisations.

Related

MYSQL Explain 'Using:where; Using temporary; Using filesort'

I am trying to optimize a stored procedure. I have identified some issues with it, but I don't know enough to actually correct the problems. One of the subqueries looks like this
select d.districtCode,
b.year Year,
if(bli.releaseAdjustment = 3 and bli.id not in (select billLineItem_id from AppliedDiscount),
bli.amount ,0) *-1 Refund
from Bill b
join BillLineItem bli on bli.bill_id = b.id
left join Bill_District d on d.bill_id = b.id
and bli.type = d.type
left join AppliedDiscount a on a.billLineItem_id = bli.id
where bli.releaseAdjustment in (1,2,3)
and bli.type in (4,6)
group by d.districtCode, b.year
The EXPLAIN outputs this
+----+------------------------+---------------+--------------+--------------------------------+-----------------+---------+---------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+------------------------+---------------+--------------+--------------------------------+-----------------+---------+---------------------+---------+----------------------------------------------+
| 1 | PRIMARY | bli | ALL | FKF4236A,type,releaseAdjustment| NULL | NULL | NULL | 2787322 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | b | eq_ref | PRIMARY | PRIMARY | 8 | tax.bli.bill_id | 1 | |
| 1 | PRIMARY | d | ref | bill_id,type | bill_id | 8 | tax.bli.bill_id | 1 | |
| 1 | PRIMARY | a | ref | billLineItem_idx | billLineItem_idx| 8 | tax.bli.id | 1 | Using index |
| 1 | DEPENDENT SUBQUERY |AppliedDiscount|index_subquery| billLineItem_idx | billLineItem_idx| 8 | func | 1 | Using index |
+----+------------------------+---------------+--------------+--------------------------------+-----------------+---------+---------------------+---------+----------------------------------------------+
How would you suggest I fix this? This problem, or one very similar, is found throughout this stored procedure numerous times. AppliedDiscount only consists of 3 columns, all of which are indexed already.
Edit: Removing the group by changes the first row of the explain to
| 1 | PRIMARY | bli | ALL | FKF4236A,type,releaseAdjustment,bill_id| NULL | NULL | NULL | 2613847 | Using where |
That's better and technically answers my question, but that just means that I was asking the wrong question.
The 'type' is still ALL. What can I do to improve that?

MYSQL slow ORDER BY

I have an issue when adding an ORDER BY on my query.
without ORDER BY query takes around 26ms, as soon as I add a ORDER BY it's taking around 20s.
I have tried a few different ways but can seem to reduce the time.
Tried FORCE INDEX (PRIMARY)
Tried wrapping SELECT * FROM (...)
Does anyone have any other ideas?
QUERY:
SELECT
orders.id AS order_number,
orders.created AS order_created,
CONCAT_WS(' ', clients.name, office.name) AS client_office,
order_item.code AS product_code,
order_item.name AS product_name,
order_item.description AS product_description,
order_item.sale_price,
jobs.rating AS job_rating,
job_role.start_booking AS book_start,
job_role.end_booking AS book_end,
CONCAT_WS(' ', admin_staffs.first_name, admin_staffs.last_name) AS soul,
admin_roles.name AS role,
services.name AS service,
COUNT(job_item.id) AS total_assets,
COALESCE(orders.project_name, CONCAT_WS(' ', orders.location_unit_no, orders.location_street_number, orders.location_street_number, orders.location_street_name
)
) AS order_title
FROM jobs
LEFT JOIN job_item ON jobs.id = job_item.job_id
LEFT JOIN job_role ON jobs.id = job_role.job_id
LEFT JOIN admin_roles ON admin_roles.id = job_role.role_id
LEFT JOIN services ON services.id = jobs.service_id
LEFT JOIN admin_staffs ON admin_staffs.id = job_role.staff_id
LEFT JOIN order_item ON jobs.item_id = order_item.id
LEFT JOIN orders ON orders.id = order_item.order_id
LEFT JOIN clients ON orders.client_id = clients.id
LEFT JOIN office ON orders.office_id = office.id
LEFT JOIN client_users ON orders.user_id = client_users.id
GROUP BY jobs.id
ORDER BY jobs.order_id DESC
LIMIT 50
EXPLAIN:
+----+-------------+--------------+--------+----------------+-----------+---------+--------------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+----------------+-----------+---------+--------------------------------------+-------+---------------------------------+
| 1 | SIMPLE | jobs | ALL | PRIMARY | NULL | NULL | NULL | 49555 | Using temporary; Using filesort |
| 1 | SIMPLE | job_item | ref | job_id | job_id | 5 | Wx3392vf_UCO_app.jobs.id | 8 | Using where; Using index |
| 1 | SIMPLE | job_role | ref | job_id | job_id | 4 | Wx3392vf_UCO_app.jobs.id | 1 | Using where |
| 1 | SIMPLE | admin_roles | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.job_role.role_id | 1 | Using where |
| 1 | SIMPLE | services | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.jobs.service_id | 1 | Using where |
| 1 | SIMPLE | admin_staffs | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.job_role.staff_id | 1 | NULL |
| 1 | SIMPLE | order_item | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.jobs.item_id | 1 | Using where |
| 1 | SIMPLE | orders | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.order_item.order_id | 1 | Using where |
| 1 | SIMPLE | clients | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.orders.client_id | 1 | Using where |
| 1 | SIMPLE | office | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.orders.office_id | 1 | Using where |
| 1 | SIMPLE | client_users | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.orders.user_id | 1 | Using index |
+----+-------------+--------------+--------+----------------+-----------+---------+--------------------------------------+-------+---------------------------------+
Try to start from this query:
SELECT * FROM jobs ORDER BY jobs.order_id DESC LIMIT 50
And then increase its complexity by adding more joins:
SELECT * FROM jobs
LEFT JOIN job_item ON jobs.id = job_item.job_id
ORDER BY jobs.order_id DESC LIMIT 50
etc.
Make profiling at each step and you will be able to find a bottleneck. Probably, it is some TEXT field in one of joined tables that slows down the query or some other reason.

Find the number total rows analyzed in a MySQL JOIN query

I have a MySQL query which has a JOIN of 12 tables. When I explain the query, It showing 394699 rows for one table and 185368 rows for another table. All other tables has 1-3 rows. The total result which I am getting from the query id 472 rows only. But for that, it is taking more than 1 minute.
Is there any way to check how many rows has been analyzed to produce such a result? So that, I can find which is the table costs the higher time.
I am giving the query structure below. As the table structure is too high, I am not able to provide it here.
SELECT h.nid,h.attached_nid,h.created, s.field_species_value as species, g.field_gender_value as gender, u.field_unique_id_value as unqid, n.title, dob.field_adult_healthy_weight_value as birth_date, dcolor.field_dog_primary_color_value as dogcolor, ccolor.field_primary_color_value as catcolor, sdcolor.field_dog_secondary_color_value as sdogcolor, sccolor.field_secondary_color_value as scatcolor, dpattern.field_dog_pattern_value as dogpattern, cpattern.field_cat_pattern_value as catpattern
FROM table1 h
JOIN table2 n ON n.nid = h.nid
JOIN table3 s ON n.nid = s.entity_id
JOIN table4 u ON n.nid = u.entity_id
LEFT JOIN table5 g ON n.nid = g.entity_id
LEFT JOIN table6 dob ON n.nid = dob.entity_id
LEFT JOIN table7 AS dcolor ON n.nid = dcolor.entity_id
LEFT JOIN table8 AS ccolor ON n.nid = ccolor.entity_id
LEFT JOIN table9 AS sdcolor ON n.nid = sdcolor.entity_id
LEFT JOIN table10 AS sccolor ON n.nid = sccolor.entity_id
LEFT JOIN table11 AS dpattern ON n.nid = dpattern.entity_id
LEFT JOIN table12 AS cpattern ON n.nid = cpattern.entity_id
WHERE h.title = '4208'
AND ((h.created BETWEEN 1483257600 AND 1485935999))
AND h.uid!=1
AND h.uid IN(
SELECT etid
FROM `table`
WHERE gid=464
AND entity_type='user')
AND h.attached_nid>0
ORDER BY CAST(h.created as UNSIGNED) DESC;
Below is the EXPLAIN result which I get
+------+--------------+---------------+--------+----------------------+---------------------+---------+----------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+---------------+--------+----------------------+---------------------+---------+----------------------+--------+----------------------------------------------+
| 1 | PRIMARY | s | index | entity_id | field_species_value | 772 | NULL | 394699 | Using index; Using temporary; Using filesort |
| 1 | PRIMARY | u | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | n | eq_ref | PRIMARY | PRIMARY | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | g | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | dob | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | dcolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | ccolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | sdcolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | sccolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | dpattern | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | cpattern | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | h | ref | attached_nid,nid,uid | nid | 5 | pantheon.s.entity_id | 3 | Using index condition; Using where |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 4 | func | 1 | Using where |
| 2 | MATERIALIZED | og_membership | ref | entity,gid | gid | 4 | const | 185368 | Using where |
+------+--------------+---------------+--------+----------------------+---------------------+---------+----------------------+--------+----------------------------------------------+
You can find the ROWS_EXAMINED by using the Performance Schema.
Here is a link to the performance schema quick start guide.
https://dev.mysql.com/doc/refman/5.5/en/performance-schema-quick-start.html
This is the query I run in PHP applications, to find out what queries I need to optimize. You should be able to adapt it pretty easily.
The query finds the stats on the query that was run before this one. So in my apps, I run query after every query I run, store the results, then at the end of the PHP script I output the stats for every query I ran during the request.
SELECT `EVENT_ID`, TRUNCATE(`TIMER_WAIT`/1000000000000,6) as Duration,
`SQL_TEXT`, `DIGEST_TEXT`, `NO_INDEX_USED`, `NO_GOOD_INDEX_USED`, `ROWS_AFFECTED`, `ROWS_SENT`, `ROWS_EXAMINED`
FROM `performance_schema`.`events_statements_history`
WHERE
`CURRENT_SCHEMA` = '{$database}' AND `EVENT_NAME` LIKE 'statement/sql/%'
AND `THREAD_ID` = (SELECT `THREAD_ID` FROM `performance_schema`.`threads` WHERE `performance_schema`.`threads`.`PROCESSLIST_ID` = CONNECTION_ID())
ORDER BY `EVENT_ID` DESC LIMIT 1;
To decrease the number of rows accessed from og_membership, try adding an index containing the gid, entity_type, and etid fields. Including gid and entity_type should make the lookup more performant and including etid will make the index a covering index.
After adding the index, run EXPLAIN again to look at the results. Based on the new explain plan, either keep the index, remove the index, and/or add an additional index. Keep doing this until you get results you are satisfied with.
For sure, you will want to try and eliminate any mentions of Using temporary or Using filesort. Using temporary implies a temporary table is being used to make this query probably for the sheer size of your intermittent. Using filesort implies ordering isn't being satisfied with an index and is being done by examining the matching rows.
An detail explanation about EXPLAIN can be found at https://dev.mysql.com/doc/refman/5.7/en/explain-output.html.
Key-Value (EAV) schema sucks.
Indexes:
table1: INDEX(title, created)
table1: INDEX(uid, title, created)
table: INDEX(gid, entity_type, etid)
table* -- Is `entity_id` already an index? Can it be the PRIMARY KEY?
Does nid need to be NULL instead of NOT NULL?
If those don't do enough, try:
And turn the IN ( SELECT ... ) into a JOIN ( SELECT ... ) USING(hid)
If you still need help, please provide SHOW CREATE TABLE and EXPLAIN SELECT ...

Sql query optimization and debug

I am trying to optimize sql query in mysql db. Tried different ways of rewriting it , adding/removing indexes, but nothing seems to decrease the load. Maybe I am missing something.
Query:
select co.country_name as state, ci.city_name as city, ci.city_id, ci.country_id,
count(l.id) as num
FROM cities ci
INNER JOIN countries co ON (ci.country_id = co.country_id)
INNER JOIN dancers l ON (l.city_id = ci.city_id AND l.closed = 0 AND l.approved = 1 )
WHERE 1 AND ci.main=1
GROUP BY ci.city_id
ORDER BY city
Duration : 2.01sec - 2.20sec
Optimized query:
select co.country_name as state, ci.city_name as city, ci.city_id, ci.country_id, count(l.id) as num from
(select ci1.city_name, ci1.city_id, ci1.country_id from cities ci1
where ci1.main=1) as ci
INNER JOIN countries co ON (ci.country_id = co.country_id)
INNER JOIN dancers l ON (l.city_id = ci.city_id AND l.closed = 0 AND l.approved = 1 ) GROUP BY ci.city_id ORDER BY city
Duration : 0.82sec - 0.90sec
But i feel that this query can be optimized even more but not getting the ideea how to optimized it. There are 3 tables
Table 1 : countries ( country_id, country_name)
Table 2 : cities ( city_id, city_name, main, country_id)
Table 3 : dancers ( id, country_id, city_id, closed, approved)
I am trying to get all the cities which have main=1 and for each to count all the profiles that are into those cities joining with countries to get the country_name.
Any ideas are welcomed, thank you.
Later edit : - first query explain
+----+-------------+-------+-------------+---------------------------------------------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+---------------------------------------------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
| 1 | SIMPLE | l | index_merge | city_id,closed,approved,city_id_2 | closed,approved | 1,2 | NULL | 75340 | Using intersect(closed,approved); Using where; Using temporary; Using filesort |
| 1 | SIMPLE | ci | eq_ref | PRIMARY,state_id_2,state_id,city_name,lat,city_name_shorter,city_id | PRIMARY | 4 | db.l.city_id | 1 | Using where |
| 1 | SIMPLE | co | eq_ref | PRIMARY | PRIMARY | 4 | db.ci.country_id | 1 | Using where |
+----+-------------+-------+-------------+---------------------------------------------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
Second query explain :
+----+-------------+------------+------+-----------------------------------+-------------+---------+------------------+-------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+-----------------------------------+-------------+---------+------------------+-------+------------------------------------+
| 1 | PRIMARY | co | ALL | PRIMARY | NULL | NULL | NULL | 51 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | ref | <auto_key1> | <auto_key1> | 4 | db.co.country_id | 176 | Using where |
| 1 | PRIMARY | l | ref | city_id,closed,approved,city_id_2 | city_id_2 | 4 | ci.city_id | 44 | Using index condition; Using where |
| 2 | DERIVED | ci1 | ALL | NULL | NULL | NULL | NULL | 11765 | Using where |
+----+-------------+------------+------+-----------------------------------+-------------+---------+------------------+-------+------------------------------------+
#used_by_already query explain :
+----+-------------+------------+-------------+-----------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------------+-----------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
| 1 | PRIMARY | co | ALL | PRIMARY | NULL | NULL | NULL | 51 | NULL |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 4 | db.co.country_id | 565 | Using where |
| 2 | DERIVED | l | index_merge | city_id,closed,approved,city_id_2 | closed,approved | 1,2 | NULL | 75341 | Using intersect(closed,approved); Using where; Using temporary; Using filesort |
| 2 | DERIVED | ci1 | eq_ref | PRIMARY,state_id_2,city_id | PRIMARY | 4 | db.l.city_id | 1 | Using where |
+----+-------------+------------+-------------+-----------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
I suggest you try this:
SELECT
co.country_name AS state
, ci.city_name AS city
, ci.city_id
, ci.country_id
, ci.num
FROM (
SELECT
ci1.city_id
, ci1.city_name
, ci1.country_id
, COUNT(l.id) AS num
FROM cities ci1
INNER JOIN dancers l ON l.city_id = ci1.city_id
AND l.closed = 0
AND l.approved = 1
WHERE ci1.main = 1
GROUP BY
ci1.city_id
, ci1.city_name
, ci1.country_id
) AS ci
INNER JOIN countries co ON ci.country_id = co.country_id
;
And that you provide the explain plan output for further analysis if needed. When optimizing knowing what indexes exist, and having he explain plan, are essentials.
Not also, that MySQL does permit non-standard GROUP BY syntax (where only one or some of the columns in the select list are included under the group by clause).
In recent versions of MySQL the default behaviour for GROUP BY has changed to SQL standard syntax (where all "non-aggregating" columns in the select list must be included under the group by clause). While your existing queries use the non-standard group by syntax, the query supplied here dis compliant.

How to convert IN condition to INNER JOIN condition - join is slower

I have a query which is very slow in INNER JOIN condition, but is faster when used in WHERE IN clause:
Slower Inner join:
SELECT *
FROM cases
left join
(
select tst.team_set_id
from team_sets_teams tst
INNER JOIN team_memberships team_memberships
ON tst.team_id = team_memberships.team_id
AND team_memberships.user_id = '1'
AND team_memberships.deleted=0 group by tst.team_set_id
) cases_tf
ON cases_tf.team_set_id = cases.team_set_id
LEFT JOIN contacts_cases
ON contacts_cases.case_id = cases.id
AND contacts_cases.deleted = 0
where cases.deleted=0
ORDER BY cases.name LIMIT 0,20;
Faster where in:
SELECT *
FROM cases
LEFT JOIN contacts_cases
ON contacts_cases.case_id = cases.id
AND contacts_cases.deleted = 0
where cases.deleted=0
and cases.team_set_id in (
select tst.team_set_id
from team_sets_teams tst
INNER JOIN team_memberships team_memberships
ON tst.team_id = team_memberships.team_id
AND team_memberships.user_id = '1'
AND team_memberships.deleted=0
group by tst.team_set_id
)
ORDER BY cases.name LIMIT 0,20;
The explain plan for INNER JOIN and WHERE IN clause are below:
Inner Join:
+----+-------------+------------------+------+--------------------------------------------+---------------------+---------+-----------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+--------------------------------------------+---------------------+---------+-----------------------------------+--------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort |
| 1 | PRIMARY | cases | ref | idx_cases_tmst_id | idx_cases_tmst_id | 109 | cases_tf.team_set_id | 446976 | Using where |
| 1 | PRIMARY | contacts_cases | ref | idx_con_case_case | idx_con_case_case | 111 | sugarcrm.cases.id | 1 | |
| 2 | DERIVED | team_memberships | ref | idx_team_membership,idx_teammemb_team_user | idx_team_membership | 109 | | 2 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | tst | ref | idx_ud_team_id | idx_ud_team_id | 109 | sugarcrm.team_memberships.team_id | 1 | Using where |
+----+-------------+------------------+------+--------------------------------------------+---------------------+---------+-----------------------------------+--------+----------------------------------------------+
While in condition:
------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------------+-------+--------------------------------------------+---------------------+---------+-----------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | cases | index | NULL | idx_case_name | 768 | NULL | 20 | Using where |
| 1 | PRIMARY | contacts_cases | ref | idx_con_case_case | idx_con_case_case | 111 | sugarcrm.cases.id | 1 | |
| 2 | DEPENDENT SUBQUERY | team_memberships | ref | idx_team_membership,idx_teammemb_team_user | idx_team_membership | 109 | const | 2 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | tst | ref | idx_ud_team_id | idx_ud_team_id | 109 | sugarcrm.team_memberships.team_id | 1 | Using where |
+----+--------------------+------------------+-------+--------------------------------------------+---------------------+---------+-----------------------------------+------+----------------------------------------------+
Though there are indexes, I couldn't figure out what the problem. Please help me out. Thanks.
(This is a query in sugarcrm)
It is impossible to convert IN condition to INNER JOIN.
A query with IN condition is called semi-join.
A semi-join returns rows from one table that would join with another table, but without performing a complete join.
Here is a simple example of a semi-join query using IN operator:
SELECT *
FROM table1
WHERE some-column IN (
SELECT some-other-column
FROM table2
WHERE some-conditions
)
the above semi-join can be converted to semantically equivalent query
(eqivalent - means giving exactly same results)
using EXISTS operator and dependend subquery:
SELECT *
FROM table1
WHERE EXISTS(
SELECT 1
FROM table2
WHERE some-conditions
AND table1.some-column = table2.some-other-column
)
Most leading databases use the same plan for both of the above queries, and their speed is the same,
unfortunately this is not always true for MySql.
Joins and semi joins are totally different queries, with completely different execution plans, so comparing their speed is like comparing apples to onions.
You can try to convert the first query with IN into a query with EXIST, but not into the join.