I have MySQL query currently selecting and joining 13 tables and finally grouping ~60k rows. The query without grouping takes ~0ms but with grouping the query time increases to ~1.7sec. The field, which is used for grouping is primary field and is indexed. Where could be the issue?
I know group by without aggregate is considered invalid query and bad practise but I need distinct base table rows and can not use DISTINCT syntax.
The query itself looks like this:
SELECT `table_a`.*
FROM `table_a`
LEFT JOIN `table_b`
ON `table_b`.`invoice` = `table_a`.`id`
LEFT JOIN `table_c` AS `r1`
ON `r1`.`invoice_1` = `table_a`.`id`
LEFT JOIN `table_c` AS `r2`
ON `r2`.`invoice_2` = `table_a`.`id`
LEFT JOIN `table_a` AS `i1`
ON `i1`.`id` = `r1`.`invoice_2`
LEFT JOIN `table_a` AS `i2`
ON `i2`.`id` = `r2`.`invoice_1`
JOIN `table_d` AS `_u0`
ON `_u0`.`id` = 1
LEFT JOIN `table_e` AS `_ug0`
ON `_ug0`.`user` = `_u0`.`id`
JOIN `table_f` AS `_p0`
ON ( `_p0`.`enabled` = 1
AND ( ( `_p0`.`role` < 2
AND `_p0`.`who` IS NULL )
OR ( `_p0`.`role` = 2
AND ( `_p0`.`who` = '0'
OR `_p0`.`who` = `_u0`.`id` ) )
OR ( `_p0`.`role` = 3
AND ( `_p0`.`who` = '0'
OR `_p0`.`who` = `_ug0`.`group` ) ) ) )
AND ( `_p0`.`action` = '*'
OR `_p0`.`action` = 'read' )
AND ( `_p0`.`related_table` = '*'
OR `_p0`.`related_table` = 'table_name' )
JOIN `table_a` AS `_e0`
ON ( ( `_p0`.`related_id` = 0
OR `_p0`.`related_id` = `_e0`.`id`
OR `_p0`.`related_user` = `_e0`.`user`
OR `_p0`.`related_group` = `_e0`.`group` )
OR ( `_p0`.`role` = 0
AND `_e0`.`user` = `_u0`.`id` )
OR ( `_p0`.`role` = 1
AND `_e0`.`group` = `_ug0`.`group` ) )
AND `_e0`.`id` = `table_a`.`id`
JOIN `table_d` AS `_u1`
ON `_u1`.`id` = 1
LEFT JOIN `table_e` AS `_ug1`
ON `_ug1`.`user` = `_u1`.`id`
JOIN `table_f` AS `_p1`
ON ( `_p1`.`enabled` = 1
AND ( ( `_p1`.`role` < 2
AND `_p1`.`who` IS NULL )
OR ( `_p1`.`role` = 2
AND ( `_p1`.`who` = '0'
OR `_p1`.`who` = `_u1`.`id` ) )
OR ( `_p1`.`role` = 3
AND ( `_p1`.`who` = '0'
OR `_p1`.`who` = `_ug1`.`group` ) ) ) )
AND ( `_p1`.`action` = '*'
OR `_p1`.`action` = 'read' )
AND ( `_p1`.`related_table` = '*'
OR `_p1`.`related_table` = 'table_name' )
JOIN `table_g` AS `_e1`
ON ( ( `_p1`.`related_id` = 0
OR `_p1`.`related_id` = `_e1`.`id`
OR `_p1`.`related_user` = `_e1`.`user`
OR `_p1`.`related_group` = `_e1`.`group` )
OR ( `_p1`.`role` = 0
AND `_e1`.`user` = `_u1`.`id` )
OR ( `_p1`.`role` = 1
AND `_e1`.`group` = `_ug1`.`group` ) )
AND `_e1`.`id` = `table_a`.`company`
WHERE `table_a`.`date_deleted` IS NULL
AND `table_a`.`company` = 4
AND `table_a`.`type` = 1
AND `table_a`.`date_composed` >= '2016-05-04 14:43:55'
GROUP BY `table_a`.`id`
The ORs kill performance.
This composite index may help: INDEX(company, type, date_deleted, date_composed).
LEFT JOIN table_b ON table_b.invoice = table_a.id seems to do absolutely nothing other than slow down the processing. No fields of table_b are used or SELECTed. Since it is a LEFT join, it does not limit the output. Etc. Get rid if it, or justify it.
Ditto for other joins.
What happens with JOIN and GROUP BY: First, all the joins are performed; this explodes the number of rows in the intermediate 'table'. Then the GROUP BY implodes the set of rows.
One technique for avoiding this explode-implode sluggishness is to do
SELECT ...,
( SELECT ... ) AS ...,
...
instead of a JOIN or LEFT JOIN. However, that works only if there is zero or one row in the subquery. Usually this is beneficial when an aggregate (such as SUM) can be moved into the subquery.
For further discussion, please include SHOW CREATE TABLE.
Related
I can't find solution to correct this big query, I always receive an error from database.
I have tre tables and I need to make changes on some attribute on some condition:
UPDATE o36t_orders as temp1,
mytable as temp2
SET
temp1.bonifico = 1,
temp2.ultimo = 1
WHERE
temp1.id_order IN (
SELECT
id_order
FROM o36t_orders
LEFT JOIN o36t_address ON (o36t_address.id_address = o36t_orders.id_address_delivery)
LEFT JOIN mytable ON (
mytable.Causale = CONCAT(
o36t_address.lastname,
' ',
o36t_address.firstname
)
)
WHERE
o36t_orders.bonifico <> 1
)
AND temp2.id IN (
SELECT
id
FROM o36t_orders
LEFT JOIN o36t_address ON (o36t_address.id_address = o36t_orders.id_address_delivery)
LEFT JOIN mytable ON (
mytable.Causale = CONCAT(
o36t_address.lastname,
' ',
o36t_address.firstname
)
)
WHERE
o36t_orders.bonifico <> 1
)
Since the subqueries of the 2 IN clauses are identical (except the retuned column), I think that you can do what you want by a straight inner join of the 2 tables and that subquery (returning both columns):
UPDATE o36t_orders temp1
INNER JOIN (
SELECT
id, id_order
FROM o36t_orders
LEFT JOIN o36t_address ON (o36t_address.id_address = o36t_orders.id_address_delivery)
LEFT JOIN mytable ON (
mytable.Causale = CONCAT(
o36t_address.lastname,
' ',
o36t_address.firstname
)
)
WHERE
o36t_orders.bonifico <> 1
) t ON t.id_order = temp1.id_order
INNER JOIN mytable temp2 ON temp2.id = t.id
SET
temp1.bonifico = 1,
temp2.ultimo = 1
I have sql query:
SELECT leads.lead_id,
attributes.code AS attributeCode,
leads_notes.content AS noteContent,
leads_notes.task_type_id,
task_types.type_name
FROM leads,
leads_notes
INNER JOIN task_types
ON task_types.task_type_id = leads_notes.task_type_id
INNER JOIN leads_attributes
ON leads_attributes.lead_id = leads.lead_id
INNER JOIN attributes
ON attributes.attribute_id = leads_attributes.attribute_id
INNER JOIN leads_notes
ON leads_notes.lead_id = leads.lead_id
WHERE ( leads.ambassador = 1
OR leads.rents_bike = 1 )
AND ( leads.city <> '' )
AND ( leads.address <> ''
OR leads.address2 <> '' )
AND ( leads.country_id <> '' )
AND ( leads_attributes.attribute_id IN ( $attributes_id ) )
AND ( leads.lead_id = $lead_id )
But I get error:
Syntax error or access violation: 1066 Not unique table/alias:
'leads_notes''
How can I solve it? Thanks.
Never use commas in the FROM clause. Always use proper, explicit JOIN syntax.
In your case you also had lead_notes twice in the FROM clause. It is easier to write and read the query if you use table aliases:
SELECT l.lead_id, a.code as attributeCode, ln.content as noteContent, ln.task_type_id, tt.type_name
FROM leads l JOIN
leads_notes ln
ON ln.lead_id = l.lead_id JOIN
task_types tt
ON tt.task_type_id = ln.task_type_id JOIN
leads_attributes la
ON la.lead_id = l.lead_id JOIN
attributes a
ON a.attribute_id = la.attribute_id
WHERE (l.ambassador=1 OR l.rents_bike=1) AND
(l.city <> '') AND
(l.address <>'' OR l.address2 <>'') AND
(l.country_id <> '') AND
(la.attribute_id IN ($attributes_id)) AND
(l.lead_id = $lead_id);
You have add leads_notes table double: one in after leads table and another in inner join, so remove one solve your problem like following
SELECT leads.lead_id,
attributes.code AS attributeCode,
leads_notes.content AS noteContent,
leads_notes.task_type_id,
task_types.type_name
FROM leads
INNER JOIN task_types
ON task_types.task_type_id = leads_notes.task_type_id
INNER JOIN leads_attributes
ON leads_attributes.lead_id = leads.lead_id
INNER JOIN attributes
ON attributes.attribute_id = leads_attributes.attribute_id
INNER JOIN leads_notes
ON leads_notes.lead_id = leads.lead_id
WHERE ( leads.ambassador = 1
OR leads.rents_bike = 1 )
AND ( leads.city <> '' )
AND ( leads.address <> ''
OR leads.address2 <> '' )
AND ( leads.country_id <> '' )
AND ( leads_attributes.attribute_id IN ( $attributes_id ) )
AND ( leads.lead_id = $lead_id )
I have a principal requet with 2 requets in. I have a problem in my second nested query, I have a condition on id and if I made my request id = 10 takes a long time to execute, so if I replace it by id LIKE 10 my request execute in one second.
Here the request:
SELECT SQL_NO_CACHE contact_groupe.id_contact_groupe
FROM toto.contact_groupe
LEFT JOIN toto.`contact` AS `contact`
ON ((toto.contact_groupe.id_contact_groupe = toto.contact.id_contact_groupe))
LEFT JOIN toto.`project` AS `project`
ON ((toto.contact_groupe.id_contact_groupe = toto.project.id_contact_groupe)
AND ( toto.project.id_project
IN (
SELECT MAX(toto.project.id_project)
FROM toto.project
WHERE ( toto.contact_groupe.id_contact_groupe = toto.project.id_contact_groupe )
) ))
LEFT JOIN toto.`phase` AS `phase`
ON ((project.id_phase = toto.phase.id_phase))
LEFT JOIN sql_base.`user` AS `user_suivi`
ON ((toto.contact_groupe.id_user_suivi = user_suivi.id_user))
WHERE ( en_attente = '0' AND contact_groupe.id_contact_groupe
IN (
SELECT DISTINCT(contact_groupe.id_contact_groupe)
FROM toto.contact_groupe
LEFT JOIN toto.`contact` AS `contact`
ON ((toto.contact_groupe.id_contact_groupe = toto.contact.id_contact_groupe)
LEFT JOIN toto.`source_contact_groupe` AS `source_contact_groupe`
ON ((toto.contact_groupe.id_contact_groupe = toto.source_contact_groupe.id_contact_groupe))
LEFT JOIN toto.`project` AS `project`
ON ((toto.contact_groupe.id_contact_groupe = toto.project.id_contact_groupe))
LEFT JOIN toto.`remarque` AS `remarque`
ON ((toto.contact_groupe.id_contact_groupe = toto.remarque.id_contact_groupe))
LEFT JOIN toto.`project_type_construction_options` AS `project_type_construction_options`
ON ((project.id_project = toto.project_type_construction_options.id_project))
LEFT JOIN toto.`project_concurrent` AS `project_concurrent`
ON ((project.id_project = toto.project_concurrent.id_project))
LEFT JOIN toto.`telephone` AS `telephone`
ON ((contact.id_contact = toto.telephone.id_contact))
WHERE ( en_attente = '0' AND ( toto.project.id_project = '10' ) AND toto.contact_groupe.id_entreprise = '2' )
)
AND toto.contact_groupe.id_entreprise = '2' )
ORDER BY toto.contact_groupe.id_contact_groupe ASC
the line is the following problem toto.project.id_project = '10' and I don't understand why the time to execute request is so different between = and LIKE
Let's start with your subquery. Those 17 lines that you've written are functionally identical to this, so why not use this instead?
SELECT DISTINCT g.id_contact_groupe
FROM contact_groupe g
JOIN project p
ON p.id_contact_groupe = g.id_contact_groupe
WHERE g.en_attente = 0
AND p.id_project = 10
AND g.id_entreprise = 2
I have a subquery in select clause, like this
SELECT
`s`.`id` AS `id`,
`s`.`unit_id` AS `unit_id`,
`m`.`description` AS `description`,
`s`.`lot_no` AS `lot_no`,
(
(
(
SELECT COALESCE(SUM(`payment_amt`),0) AS `jmlh`
FROM `sb_ar_bill_sch`
WHERE `unit_id` = `s`.`unit_id` AND `lot_no` = `s`.`lot_no` AND `project_no` = `s`.`project_no` AND `debtor_acct` = `s`.`debtor_acct`
AND `bill_type` = 'S'
)
/
`s`.`sell_price`
)
*
100
) AS `persen_paid`
FROM `p_sis_rl_sales` `s`
JOIN `p_sis_pl_project` `p` ON `s`.`unit_id` = `p`.`unit_id` AND `s`.`project_no` = `p`.`project_no`
JOIN `p_sis_pm_lot` `l` ON `s`.`unit_id` = `l`.`unit_id` AND `s`.`project_no` = `l`.`project_no` AND `s`.`lot_no` = `l`.`lot_no`
JOIN `p_common_businessunit` `m` ON `s`.`unit_id` = `m`.`unit_id`
JOIN `v_la_lastowner` `o` ON `s`.`unit_id` = `o`.`unit_id` AND `s`.`project_no` = `o`.`project_no` AND `s`.`lot_no` = `o`.`lot_no`
WHERE
(
`s`.`contract_no` IS NOT NULL
AND
(
(
(
SELECT COALESCE(SUM(`payment_amt`),0) AS `jmlh`
FROM `sb_ar_bill_sch`
WHERE `unit_id` = `s`.`unit_id` AND `lot_no` = `s`.`lot_no` AND `project_no` = `s`.`project_no` AND `debtor_acct` = `s`.`debtor_acct`
AND `bill_type` = 'S'
)
/ `s`.`sell_price`
)
*
100
)
>=
100
)
ORDER BY `s`.`sales_date` DESC
I already indexed the table that I used.
Do I need to process hundreds of thousands of rows in one table?
The Query above takes a long time. How can the query above be optimized?
I am encountering a strange issue where this particular MySQL query that we have would run almost 50 times slower after we upgraded our database from MySQL 5.1.73 to 5.6.23.
This is the SQL query:
SELECT `companies`.*
FROM `companies`
LEFT OUTER JOIN `company_texts`
ON `company_texts`.`company_id` = `companies`.`id`
AND `company_texts`.`language` = 'en'
AND `company_texts`.`region` = 'US'
INNER JOIN show_texts
ON show_texts.company_id = companies.id
AND `show_texts`.`is_deleted` = 0
AND `show_texts`.`language` = 'en'
AND `show_texts`.`region` = 'US'
INNER JOIN show_region_counts
ON show_region_counts.show_id = show_texts.show_id
AND show_region_counts.region = 'US'
WHERE ( ( `companies`.`id` NOT IN ( '77', '26' ) )
AND ( `company_texts`.is_deleted = 0 )
AND `companies`.id IN (
SELECT DISTINCT show_texts.company_id AS
id
FROM shows
INNER JOIN `show_rollups`
ON
`show_rollups`.`show_id` = `shows`.`id`
AND ( `show_rollups`.`device_id` = 3 )
AND ( `show_rollups`.`package_group_id` = 2 )
AND ( `show_rollups`.`videos_count` > 0 )
LEFT OUTER JOIN `show_texts` ON
`show_texts`.`show_id` = `shows`.`id`
AND
`show_texts`.`is_deleted` = 0
AND
`show_texts`.`language` = 'en'
AND
`show_texts`.`region` = 'US'
AND
shows.is_browseable = 1
AND
show_texts.show_id IS NOT NULL
AND (
`show_rollups`.`episodes_count` > 0
OR `show_rollups`.`clips_count` > 0
OR `show_rollups`.`games_count` > 0
)
) )
GROUP BY companies.id
ORDER BY Sum(show_region_counts.view_count) DESC
LIMIT 30 offset 30;
Now the problem is when I run this query in MySQL 5.1.73 before the upgrade, the query would only take around 1.5 seconds, but after the upgrade to 5.6.23, it now can take upward to 1 minute.
So I did an EXPLAIN of this query in 5.1.73, and I saw this:
Enlarged version : http://i.stack.imgur.com/c4ko0.jpg
And when I did EXPLAIN in 5.6.23 , I saw this :
Enlarged version : http://i.stack.imgur.com/CgBtA.jpg
I can see that in both cases, there is a full scan (type ALL) of the shows table, but is there something else I am not seeing that is causing the massive slowdown in 5.6?
Thanks
IS
Please provide SHOW CREATE TABLE. Without that, I will guess that you are missing this desirable index:
show_rollups: INDEX(device_id, package_group_id, videos_count)
You have LEFT OUTER JOIN show_texts ... ON ... show_texts.show_id IS NOT NULL. This is probably wrong for two reasons: (a) Don't use LEFT if you are not looking for NULL, and (b) the NULL test should be in the missing WHERE clause, not in the ON clause.
Getting rid of the IN ( SELECT ... ) may help it on both machines:
SELECT c.*
FROM
( SELECT DISTINCT st.company_id AS id
FROM shows
INNER JOIN `show_rollups` AS sr ON sr.`show_id` = `shows`.`id`
AND ( sr.`device_id` = 3 )
AND ( sr.`package_group_id` = 2 )
AND ( sr.`videos_count` > 0 )
LEFT OUTER JOIN `show_texts` AS st ON st.`show_id` = `shows`.`id`
AND st.`is_deleted` = 0
AND st.`language` = 'en'
AND st.`region` = 'US'
AND shows.is_browseable = 1
AND st.show_id IS NOT NULL
AND ( sr.`episodes_count` > 0
OR sr.`clips_count` > 0
OR sr.`games_count` > 0 )
) AS x
JOIN `companies` AS c ON x.id = c.id
LEFT OUTER JOIN `company_texts` AS ct ON ct.`company_id` = c.`id`
AND ct.`language` = 'en'
AND ct.`region` = 'US'
INNER JOIN show_texts AS st ON st.company_id = c.id
AND st.`is_deleted` = 0
AND st.`language` = 'en'
AND st.`region` = 'US'
INNER JOIN show_region_counts AS src ON src.show_id = st.show_id
AND src.region = 'US'
WHERE ( c.`id` NOT IN ( '77', '26' ) )
AND ( ct.is_deleted = 0 )
GROUP BY c.id
ORDER BY Sum(src.view_count) DESC
LIMIT 30 offset 30;
There is some chance that the JOINs will mess with the computation in Sum(src.view_count).
I'm sorry, but I can't make out the EXPLAIN screen-grabs on my display.
Without that, I'd wonder if the schemas for the tables changed at all, or of the database engines changed. In particular, companies.id seems to be used as a primary key.