Improve SQL query performance - MySQL

I have a complex query which takes 700ms to run on my machine. I found that the bottleneck is the ORDER BY at_firstname.value clause, but how can I use indexes to improve this?
SELECT
`e`.*
, `at_default_billing`.`value` AS `default_billing`
, `at_billing_postcode`.`value` AS `billing_postcode`
, `at_billing_city`.`value` AS `billing_city`
, `at_billing_region`.`value` AS `billing_region`
, `at_billing_country_id`.`value` AS `billing_country_id`
, `at_company`.`value` AS `company`
, `at_firstname`.`value` AS `firstname`
, `at_lastname`.`value` AS `lastname`
, CONCAT(at_firstname.value
, " "
, at_lastname.value) AS `full_name`
, `at_phone`.`value` AS `phone`
, IFNULL(at_phone.value,"N/A") AS `telephone`
, `e`.`entity_id` AS `id`
FROM
`customer_entity` AS `e`
LEFT JOIN
`customer_entity_int` AS `at_default_billing`
ON (`at_default_billing`.`entity_id` = `e`.`entity_id`)
AND (`at_default_billing`.`attribute_id` = '13')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_postcode`
ON (`at_billing_postcode`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_postcode`.`attribute_id` = '30')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_city`
ON (`at_billing_city`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_city`.`attribute_id` = '26')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_region`
ON (`at_billing_region`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_region`.`attribute_id` = '28')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_country_id`
ON (`at_billing_country_id`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_country_id`.`attribute_id` = '27')
LEFT JOIN
`customer_address_entity_varchar` AS `at_company`
ON (`at_company`.`entity_id` = `at_default_billing`.`value`)
AND (`at_company`.`attribute_id` = '24')
LEFT JOIN
`customer_entity_varchar` AS `at_firstname`
ON (`at_firstname`.`entity_id` = `e`.`entity_id`)
AND (`at_firstname`.`attribute_id` = '5')
LEFT JOIN
`customer_entity_varchar` AS `at_lastname`
ON (`at_lastname`.`entity_id` = `e`.`entity_id`)
AND (`at_lastname`.`attribute_id` = '7')
LEFT JOIN
`customer_entity_varchar` AS `at_phone`
ON (`at_phone`.`entity_id` = `e`.`entity_id`)
AND (`at_phone`.`attribute_id` = '136')
ORDER BY
`at_firstname`.`value` ASC LIMIT 20
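This is the execution plan (EXPLAIN output; columns: id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra):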
'1','SIMPLE','e',NULL,'ALL',NULL,NULL,NULL,NULL,'19951','100.00','Using temporary; Using filesort'
'1','SIMPLE','at_default_billing',NULL,'eq_ref','UNQ_CUSTOMER_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_INT_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_INT_ENTITY_ID,IDX_CUSTOMER_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.e.entity_id,const','1','100.00',NULL
'1','SIMPLE','at_billing_postcode',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_billing_city',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_billing_region',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_billing_country_id',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_company',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_firstname',NULL,'eq_ref','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.e.entity_id,const','1','100.00',NULL
'1','SIMPLE','at_lastname',NULL,'eq_ref','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.e.entity_id,const','1','100.00',NULL
'1','SIMPLE','at_phone',NULL,'eq_ref','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.e.entity_id,const','1','100.00',NULL
Table Structure:
CREATE TABLE `customer_entity_varchar` (
`value_id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Value Id',
`entity_type_id` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'Entity Type Id',
`attribute_id` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'Attribute Id',
`entity_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Entity Id',
`value` varchar(255) DEFAULT NULL COMMENT 'Value',
PRIMARY KEY (`value_id`),
UNIQUE KEY `UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID` (`entity_id`,`attribute_id`),
KEY `IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_TYPE_ID` (`entity_type_id`),
KEY `IDX_CUSTOMER_ENTITY_VARCHAR_ATTRIBUTE_ID` (`attribute_id`),
KEY `IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID` (`entity_id`),
KEY `IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE` (`entity_id`,`attribute_id`,`value`),
CONSTRAINT `FK_CSTR_ENTT_VCHR_ATTR_ID_EAV_ATTR_ATTR_ID` FOREIGN KEY (`attribute_id`) REFERENCES `eav_attribute` (`attribute_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `FK_CSTR_ENTT_VCHR_ENTT_TYPE_ID_EAV_ENTT_TYPE_ENTT_TYPE_ID` FOREIGN KEY (`entity_type_id`) REFERENCES `eav_entity_type` (`entity_type_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `FK_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_CUSTOMER_ENTITY_ENTITY_ID` FOREIGN KEY (`entity_id`) REFERENCES `customer_entity` (`entity_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=131094 DEFAULT CHARSET=utf8 COMMENT='Customer Entity Varchar';

As it stands, your query is:
Performing ALL left outer joins first.
Then ORDERing the rows.
Then LIMITing the rows.
I would perform only the strictly needed outer joins first, then order and limit (to reduce the result to 20 rows), and finally do all the remaining outer joins. In short, I would do:
Performing the minimal left outer join first, that is, two tables only.
Then ORDERing the rows.
Then LIMITing the rows. This produces at most 20 rows.
Performing all the remaining outer joins. At this point there are only 20 rows, not thousands.
This change should massively reduce the number of "Unique Key Lookup" executions. The modified query looks like this:
select
e.*
, `at_default_billing`.`value` AS `default_billing`
, `at_billing_postcode`.`value` AS `billing_postcode`
, `at_billing_city`.`value` AS `billing_city`
, `at_billing_region`.`value` AS `billing_region`
, `at_billing_country_id`.`value` AS `billing_country_id`
, `at_company`.`value` AS `company`
, `at_lastname`.`value` AS `lastname`
, CONCAT(firstname
, " "
, at_lastname.value) AS `full_name`
, `at_phone`.`value` AS `phone`
, IFNULL(at_phone.value,"N/A") AS `telephone`
from ( -- Step #1: joining customer_entity with customer_entity_varchar
SELECT
`e`.*
, `at_firstname`.`value` AS `firstname`
, `e`.`entity_id` AS `id`
FROM
`customer_entity` AS `e`
LEFT JOIN
`customer_entity_varchar` AS `at_firstname`
ON (`at_firstname`.`entity_id` = `e`.`entity_id`)
AND (`at_firstname`.`attribute_id` = '5')
ORDER BY -- Step #2: Sorting (the bare minimum)
`at_firstname`.`value` ASC
LIMIT 20 -- Step #3: Limiting (to 20 rows)
) e
LEFT JOIN -- Step #4: Performing all the rest of outer joins (only few rows now)
`customer_entity_int` AS `at_default_billing`
ON (`at_default_billing`.`entity_id` = `e`.`entity_id`)
AND (`at_default_billing`.`attribute_id` = '13')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_postcode`
ON (`at_billing_postcode`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_postcode`.`attribute_id` = '30')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_city`
ON (`at_billing_city`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_city`.`attribute_id` = '26')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_region`
ON (`at_billing_region`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_region`.`attribute_id` = '28')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_country_id`
ON (`at_billing_country_id`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_country_id`.`attribute_id` = '27')
LEFT JOIN
`customer_address_entity_varchar` AS `at_company`
ON (`at_company`.`entity_id` = `at_default_billing`.`value`)
AND (`at_company`.`attribute_id` = '24')
LEFT JOIN
`customer_entity_varchar` AS `at_lastname`
ON (`at_lastname`.`entity_id` = `e`.`entity_id`)
AND (`at_lastname`.`attribute_id` = '7')
LEFT JOIN
`customer_entity_varchar` AS `at_phone`
ON (`at_phone`.`entity_id` = `e`.`entity_id`)
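AND (`at_phone`.`attribute_id` = '136')
ORDER BY -- Step #5: re-apply the sort; the derived table's order is not guaranteed to survive the outer joins
`e`.`firstname` ASC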
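Here is a minimal sketch of this deferred join. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The optimizer and EXPLAIN output differ, so the sketch only checks that the rewrite returns the same rows, not that it is faster. The four customers and their names are made-up sample data. attribute_id 5 and 7 are the firstname and lastname ids from the question.

```python
import sqlite3

# Hypothetical miniature of the EAV tables, in SQLite rather than MySQL.
# attribute_id 5 = firstname and 7 = lastname, as in the question.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer_entity (entity_id INTEGER PRIMARY KEY);
    CREATE TABLE customer_entity_varchar (
        entity_id INTEGER, attribute_id INTEGER, value TEXT,
        UNIQUE (entity_id, attribute_id));
    INSERT INTO customer_entity VALUES (1), (2), (3), (4);
    INSERT INTO customer_entity_varchar VALUES
        (1, 5, 'Zoe'), (2, 5, 'Amy'), (3, 5, 'Max'), (4, 5, 'Bob'),
        (1, 7, 'Adams'), (2, 7, 'Brown'), (3, 7, 'Clark'), (4, 7, 'Doe');
""")

# Original shape: join every attribute for every row, then sort and limit.
full = con.execute("""
    SELECT e.entity_id, at_firstname.value, at_lastname.value
    FROM customer_entity e
    LEFT JOIN customer_entity_varchar at_firstname
           ON at_firstname.entity_id = e.entity_id AND at_firstname.attribute_id = 5
    LEFT JOIN customer_entity_varchar at_lastname
           ON at_lastname.entity_id = e.entity_id AND at_lastname.attribute_id = 7
    ORDER BY at_firstname.value LIMIT 2""").fetchall()

# Deferred join: sort and limit with only the sort column joined,
# then join the remaining attributes for the surviving rows only.
deferred = con.execute("""
    SELECT e.entity_id, e.firstname, at_lastname.value
    FROM (SELECT ce.entity_id, at_firstname.value AS firstname
          FROM customer_entity ce
          LEFT JOIN customer_entity_varchar at_firstname
                 ON at_firstname.entity_id = ce.entity_id AND at_firstname.attribute_id = 5
          ORDER BY at_firstname.value LIMIT 2) AS e
    LEFT JOIN customer_entity_varchar at_lastname
           ON at_lastname.entity_id = e.entity_id AND at_lastname.attribute_id = 7
    ORDER BY e.firstname""").fetchall()
```

The outer ORDER BY is repeated on purpose. A derived table is an unordered set as far as the outer query is concerned.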

Unfortunately, SELECT whole_mess_of_rows FROM many_tables ORDER BY one_col LIMIT small_number is a notorious performance antipattern. Why? Because it sorts a big result set, just to discard most of it.
The trick is to cheaply find out which rows are within that LIMIT small_number, then retrieve only those rows from the larger query.
Which rows do you want? It looks to me like this query will retrieve their customer_entity.entity_id values. But it's hard to be sure, so you should test this subquery.
SELECT e.entity_id
FROM customer_entity AS e
LEFT JOIN customer_entity_varchar AS at_firstname
ON (at_firstname.entity_id = e.entity_id)
AND (at_firstname.attribute_id = '5')
ORDER BY at_firstname.value ASC
LIMIT 20
This should give the twenty relevant entity_id values. Test it. Look at its execution plan. Add an appropriate index if need be. Since the sort column lives in customer_entity_varchar, that index would go on customer_entity_varchar and might be (attribute_id, entity_id, value). But I am guessing.
Then you can put this at the end of your main query, right before ORDER BY.
WHERE e.entity_id IN (
SELECT entity_id FROM ( -- extra derived table: MySQL rejects LIMIT directly inside IN (...)
SELECT ce.entity_id
FROM customer_entity AS ce
LEFT JOIN customer_entity_varchar AS at_firstname
ON (at_firstname.entity_id = ce.entity_id)
AND (at_firstname.attribute_id = '5')
ORDER BY at_firstname.value ASC
LIMIT 20
) AS top_ids
)
and things should be a bit faster.
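Here is a sketch of the two-step idea, again with Python's sqlite3 as a stand-in for MySQL and with made-up names. SQLite accepts LIMIT directly inside IN (...). The query still keeps the extra derived table, because MySQL rejects LIMIT directly inside an IN (...) subquery and this is the shape MySQL needs.

```python
import sqlite3

# Hypothetical sample data; attribute_id 5 = firstname, 7 = lastname.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer_entity (entity_id INTEGER PRIMARY KEY);
    CREATE TABLE customer_entity_varchar (entity_id INTEGER, attribute_id INTEGER, value TEXT);
    INSERT INTO customer_entity VALUES (1), (2), (3), (4);
    INSERT INTO customer_entity_varchar VALUES
        (1, 5, 'Zoe'), (2, 5, 'Amy'), (3, 5, 'Max'), (4, 5, 'Bob'),
        (1, 7, 'Adams'), (2, 7, 'Brown'), (3, 7, 'Clark'), (4, 7, 'Doe');
""")

# Step 1: cheaply find the ids of the first N rows (only the sort column is joined).
TOP_IDS = """
    SELECT ce.entity_id FROM customer_entity ce
    LEFT JOIN customer_entity_varchar fn
           ON fn.entity_id = ce.entity_id AND fn.attribute_id = 5
    ORDER BY fn.value LIMIT 2"""

# Step 2: run the wide query only for those ids, wrapped in a derived table.
rows = con.execute(f"""
    SELECT e.entity_id, at_firstname.value, at_lastname.value
    FROM customer_entity e
    LEFT JOIN customer_entity_varchar at_firstname
           ON at_firstname.entity_id = e.entity_id AND at_firstname.attribute_id = 5
    LEFT JOIN customer_entity_varchar at_lastname
           ON at_lastname.entity_id = e.entity_id AND at_lastname.attribute_id = 7
    WHERE e.entity_id IN (SELECT entity_id FROM ({TOP_IDS}) AS top_ids)
    ORDER BY at_firstname.value""").fetchall()
```

The inner alias (ce) differs from the outer one (e). This way the subquery cannot accidentally become correlated with the main query.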

I agree with the previous answers, but want to emphasize one more antipattern: over-normalization.
Your schema is a curious (and inefficient) variant on the already-bad EAV schema pattern.
There is little advantage, and some disadvantage, in splitting customer_address_entity_varchar across 5 tables. Similarly for customer_entity_varchar.
An address should (usually) be stored as a few columns in a single table; no JOINs to other tables.
Likewise for firstname+lastname.
Phone could be another issue, since a person/company/entity could have multiple phone numbers (cell, home, work, fax, etc). But that is a different story.
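One way to act on this advice is to pivot the EAV rows into a single address table, with one column per attribute. The sketch below does this with conditional aggregation (MAX(CASE ...)) and uses Python's sqlite3 as a stand-in for MySQL. The attribute ids (30 postcode, 26 city, 28 region, 27 country_id, 24 company) come from the question. The table name address and the sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer_address_entity_varchar (
        entity_id INTEGER, attribute_id INTEGER, value TEXT);
    INSERT INTO customer_address_entity_varchar VALUES
        (1, 30, '10115'), (1, 26, 'Berlin'), (1, 27, 'DE'),
        (2, 26, 'Paris'), (2, 27, 'FR'), (2, 24, 'ACME');
    -- One row per address, one column per attribute; missing attributes become NULL.
    CREATE TABLE address AS
    SELECT entity_id,
           MAX(CASE WHEN attribute_id = 30 THEN value END) AS postcode,
           MAX(CASE WHEN attribute_id = 26 THEN value END) AS city,
           MAX(CASE WHEN attribute_id = 28 THEN value END) AS region,
           MAX(CASE WHEN attribute_id = 27 THEN value END) AS country_id,
           MAX(CASE WHEN attribute_id = 24 THEN value END) AS company
    FROM customer_address_entity_varchar
    GROUP BY entity_id;
""")
rows = con.execute("SELECT * FROM address ORDER BY entity_id").fetchall()
```

Once the data is in this shape, the billing address needs a single join instead of five.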

Related

Super-slow MySQL query with WHERE and Index

I have a super slow query, with no clue why it is slow, and some strange behaviour too.
SELECT <fields list>
FROM scadenze s
JOIN cartelle_cliniche cc ON (cc.id = s.id_cartella)
JOIN anagrafica a ON (a.id = cc.id_anagrafica)
JOIN scadenze_sedute ss ON (ss.id_scadenza = s.id)
JOIN pmultiple_prenotazioni pp ON (pp.id = ss.id_fisico OR pp.id = ss.id_virtuale)
LEFT JOIN medici_privati mp ON (mp.id = a.id_medico)
LEFT JOIN operatorif o ON (o.id = cc.id_operatore_emittente)
WHERE pp.confermato = '1' AND pp.annullato = '0'
AND pp.id_esito != 4 AND s.id_stato = '0'
AND (DATE(s.data_scadenza) BETWEEN '2021-06-01' AND '2021-06-16')
GROUP BY s.id, TIME(ss.data_seduta)
It takes 5 minutes!
Now, as you can see, there are a lot of JOINs and only 5 WHERE conditions.
If I remove these 3 conditions from the WHERE clause: pp.confermato = '1' AND pp.annullato = '0' AND pp.id_esito != 4, the query takes only 0.15 seconds.
I have single-column indexes on these 3 columns, and I also tried adding a composite index and forcing it with MySQL's FORCE INDEX hint. Nothing helped.
If I instead put those conditions in a HAVING clause, like:
SELECT <fields list>
FROM scadenze s
JOIN cartelle_cliniche cc ON (cc.id = s.id_cartella)
JOIN anagrafica a ON (a.id = cc.id_anagrafica)
JOIN scadenze_sedute ss ON (ss.id_scadenza = s.id)
JOIN pmultiple_prenotazioni pp ON (pp.id = ss.id_fisico OR pp.id = ss.id_virtuale)
LEFT JOIN medici_privati mp ON (mp.id = a.id_medico)
LEFT JOIN operatorif o ON (o.id = cc.id_operatore_emittente)
WHERE s.id_stato = '0' AND (DATE(s.data_scadenza) BETWEEN '2021-06-01' AND '2021-06-16')
GROUP BY s.id, TIME(ss.data_seduta)
HAVING pp.confermato = '1' AND pp.annullato = '0' AND pp.id_esito != 4
the query takes 0.15 seconds.
So what is the problem with putting these conditions in the WHERE clause?
(With the HAVING clause I get 308 records instead of 309.)
What can cause this slow query if I have indexes on all of those fields?
Thank you in advance.
EDIT:
Explain query here:
CREATE TABLE for pmultiple_prenotazioni (pp):
CREATE TABLE `pmultiple_prenotazioni` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`id_utente` INT(11) NOT NULL,
`id_anagrafica` INT(11) NOT NULL DEFAULT '0',
`id_cartella` INT(11) NOT NULL DEFAULT '0',
`id_kit` INT(11) NOT NULL DEFAULT '0',
`id_fase` INT(11) NOT NULL DEFAULT '0',
`data_seduta` DATETIME NOT NULL,
`giorno` VARCHAR(50) NOT NULL COLLATE 'latin1_swedish_ci',
`id_reparto` INT(11) NOT NULL,
`importo_prestazione` DECIMAL(6,2) NOT NULL DEFAULT '0.00',
`peso_prestazione` DECIMAL(5,2) NOT NULL DEFAULT '0.00',
`durata_prestazione` SMALLINT(6) NOT NULL DEFAULT '0',
`id_operatore` INT(11) NOT NULL DEFAULT '0',
`solo` TINYINT(1) NOT NULL DEFAULT '0',
`confermato` TINYINT(1) NOT NULL DEFAULT '0',
`annullato` TINYINT(1) NOT NULL DEFAULT '0',
`id_esito` TINYINT(4) NOT NULL DEFAULT '0',
`creazione` DATETIME NOT NULL,
`univoco` VARCHAR(255) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
PRIMARY KEY (`id`) USING BTREE,
INDEX `id_cartella` (`id_cartella`) USING BTREE,
INDEX `id_kit` (`id_kit`) USING BTREE,
INDEX `data_seduta` (`data_seduta`) USING BTREE,
INDEX `confermato` (`confermato`) USING BTREE,
INDEX `id_utente` (`id_utente`) USING BTREE,
INDEX `id_reparto` (`id_reparto`) USING BTREE,
INDEX `id_anagrafica` (`id_anagrafica`) USING BTREE,
INDEX `id_fase` (`id_fase`) USING BTREE,
INDEX `peso_prestazione` (`peso_prestazione`) USING BTREE,
INDEX `durata_prestazione` (`durata_prestazione`) USING BTREE,
INDEX `solo` (`solo`) USING BTREE,
INDEX `id_operatore` (`id_operatore`) USING BTREE,
INDEX `univoco` (`univoco`) USING BTREE,
INDEX `annullato` (`annullato`) USING BTREE,
CONSTRAINT `prenkit` FOREIGN KEY (`id_kit`) REFERENCES `starbene_1`.`pmultiple_kit` (`id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=717
;
The problem is caused here:
JOIN pmultiple_prenotazioni pp ON (pp.id = ss.id_fisico OR pp.id = ss.id_virtuale)
There is no way an index can help this. If you just had one term or the other in this join condition, it could use the primary key index for pp.id. But with the OR expression, it cannot.
You can see the effect in the EXPLAIN, where it must do a table scan (type: ALL) for pp. So the total number of join comparisons is the number of rows examined in ss (10193) multiplied by the number of rows examined in pp (23565). Total: 240 million!
You could separate it into two joins. A row should qualify if either the fisico or the virtuale booking passes the filters, so the two groups of conditions must be OR'ed:
LEFT JOIN pmultiple_prenotazioni pp1 ON (pp1.id = ss.id_fisico)
LEFT JOIN pmultiple_prenotazioni pp2 ON (pp2.id = ss.id_virtuale)
...
WHERE
(
(pp1.confermato = '1' AND pp1.annullato = '0' AND pp1.id_esito != 4)
OR (pp2.confermato = '1' AND pp2.annullato = '0' AND pp2.id_esito != 4)
)
AND ...
Or you could use a UNION:
SELECT * FROM (
SELECT s.id, ss.data_seduta
FROM scadenze s
JOIN cartelle_cliniche cc ON (cc.id = s.id_cartella)
JOIN anagrafica a ON (a.id = cc.id_anagrafica)
JOIN scadenze_sedute ss ON (ss.id_scadenza = s.id)
JOIN pmultiple_prenotazioni pp ON (pp.id = ss.id_fisico) -- ONLY fisico
LEFT JOIN medici_privati mp ON (mp.id = a.id_medico)
LEFT JOIN operatorif o ON (o.id = cc.id_operatore_emittente)
WHERE
pp.confermato = '1' AND pp.annullato = '0' AND pp.id_esito != 4
AND s.id_stato = '0' AND (DATE(s.data_scadenza) BETWEEN '2021-06-01' AND '2021-06-16')
UNION
SELECT s.id, ss.data_seduta
FROM scadenze s
JOIN cartelle_cliniche cc ON (cc.id = s.id_cartella)
JOIN anagrafica a ON (a.id = cc.id_anagrafica)
JOIN scadenze_sedute ss ON (ss.id_scadenza = s.id)
JOIN pmultiple_prenotazioni pp ON (pp.id = ss.id_virtuale) -- ONLY virtuale
LEFT JOIN medici_privati mp ON (mp.id = a.id_medico)
LEFT JOIN operatorif o ON (o.id = cc.id_operatore_emittente)
WHERE
pp.confermato = '1' AND pp.annullato = '0' AND pp.id_esito != 4
AND s.id_stato = '0' AND (DATE(s.data_scadenza) BETWEEN '2021-06-01' AND '2021-06-16')
) AS t
GROUP BY id, TIME(data_seduta);
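The sketch below checks that the UNION rewrite keeps the OR semantics of the original join condition. It uses Python's sqlite3 in place of MySQL, with hypothetical, trimmed-down versions of scadenze_sedute and pmultiple_prenotazioni. It compares result sets only, because SQLite's query plans are not MySQL's.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE scadenze_sedute (id_scadenza INTEGER, id_fisico INTEGER, id_virtuale INTEGER);
    CREATE TABLE pmultiple_prenotazioni (id INTEGER PRIMARY KEY, confermato INTEGER,
                                         annullato INTEGER, id_esito INTEGER);
    INSERT INTO pmultiple_prenotazioni VALUES
        (1, 1, 0, 0), (2, 1, 0, 4), (3, 0, 0, 0), (4, 1, 0, 1);
    INSERT INTO scadenze_sedute VALUES
        (10, 1, 3),  -- only the fisico booking qualifies
        (11, 2, 4),  -- only the virtuale booking qualifies
        (12, 3, 2);  -- neither qualifies
""")

FILTER = "pp.confermato = 1 AND pp.annullato = 0 AND pp.id_esito != 4"

# Original shape: one join with OR, which prevents a primary-key lookup on pp.id.
or_join = con.execute(f"""
    SELECT DISTINCT ss.id_scadenza FROM scadenze_sedute ss
    JOIN pmultiple_prenotazioni pp ON (pp.id = ss.id_fisico OR pp.id = ss.id_virtuale)
    WHERE {FILTER} ORDER BY 1""").fetchall()

# Rewrite: two single-column equi-joins, each able to use the primary key.
union = con.execute(f"""
    SELECT id_scadenza FROM (
        SELECT ss.id_scadenza FROM scadenze_sedute ss
        JOIN pmultiple_prenotazioni pp ON pp.id = ss.id_fisico WHERE {FILTER}
        UNION
        SELECT ss.id_scadenza FROM scadenze_sedute ss
        JOIN pmultiple_prenotazioni pp ON pp.id = ss.id_virtuale WHERE {FILTER}
    ) AS t ORDER BY 1""").fetchall()
```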
In MySQL Workbench you can see the execution plan of your query just by clicking on "Execution plan" (bottom right of the page after the execution of the query). Then it should be easy to know which step requires more time.
Looks like it might be a case of "inflate-deflate". First it JOINs tables (inflate), then it GROUPs (deflate). In the middle there may be a lot of rows with a lot of work to do all the joining.
So, try to do the GROUP BY first.
SELECT ...
FROM ( SELECT ... FROM s JOIN ss ...
GROUP BY s.id, TIME(ss.data_seduta) ) AS aaa
JOIN ...
You should not need to do a GROUP BY at the end.
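Here is a minimal sketch of "deflate first", with Python's sqlite3 standing in for MySQL. The tables are cut-down, hypothetical versions of s, ss and cc, and the note column is invented. The sketch only shows that grouping in a derived table before the remaining joins returns the same rows. Any speed difference would only show up with realistic row counts.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE s  (id INTEGER PRIMARY KEY, id_cartella INTEGER);
    CREATE TABLE ss (id_scadenza INTEGER, data_seduta TEXT);
    CREATE TABLE cc (id INTEGER PRIMARY KEY, note TEXT);
    INSERT INTO cc VALUES (100, 'a'), (200, 'b');
    INSERT INTO s  VALUES (1, 100), (2, 200);
    INSERT INTO ss VALUES (1, '2021-06-01 09:00'), (1, '2021-06-02 09:00'),
                          (1, '2021-06-03 10:00'), (2, '2021-06-01 11:00');
""")

# Inflate, then deflate: join everything, then GROUP BY.
late = con.execute("""
    SELECT s.id, time(ss.data_seduta) AS t, cc.note
    FROM s JOIN ss ON ss.id_scadenza = s.id JOIN cc ON cc.id = s.id_cartella
    GROUP BY s.id, time(ss.data_seduta) ORDER BY 1, 2""").fetchall()

# Deflate first: GROUP BY in a derived table, then join the remaining tables.
early = con.execute("""
    SELECT g.id, g.t, cc.note
    FROM (SELECT s.id, s.id_cartella, time(ss.data_seduta) AS t
          FROM s JOIN ss ON ss.id_scadenza = s.id
          GROUP BY s.id, time(ss.data_seduta)) AS g
    JOIN cc ON cc.id = g.id_cartella ORDER BY 1, 2""").fetchall()
```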

Index probably not used correctly on simple SQL query

Size:
Campaigns: 3k rows (200 with campaigns.is_active = 1)
Links: 20k rows (4k with links.status = 1; 500 with links.status = 1 AND campaigns.is_active = 1)
Clicks: 10 million rows (50k with created > '2020-10-25 00:00:00')
This query takes 2 seconds:
SELECT links.id, COUNT(clicks.id)
FROM links
INNER JOIN campaigns ON campaigns.id = links.campaign_id
AND campaigns.is_active = 1
LEFT JOIN clicks ON clicks.link_id = links.id
WHERE links.status = 1
AND clicks.created > '2020-10-25 00:00:00'
GROUP BY links.id
When I remove the following line, it takes just 0.13 seconds (15 times faster):
AND campaigns.is_active = 1
There is an INDEX on campaigns.is_active.
I also tried an index on 2 columns (campaigns.id + campaigns.is_active), but it didn't help.
"campaigns.is_active" contains only 0 or 1. The campaigns table is small, and the campaigns.is_active condition actually reduces the number of rows, so it should speed up the query instead.
Why does this condition make it take so much longer, and how can I fix it?
If I removed the JOIN to campaigns, added links.campaign_id to the SELECT list, and then queried each returned campaign_id separately with something like "SELECT is_active FROM campaigns WHERE id = ?", it would still be faster, because each such query takes 0.000x seconds. In my experience, when something is faster as 2 queries, it usually means the first query isn't fully optimized.
Explain-Select
Structure
CREATE TABLE `campaigns` (
`id` int(11) UNSIGNED NOT NULL,
`is_active` tinyint(4) NOT NULL DEFAULT 0
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `clicks` (
`id` int(11) UNSIGNED NOT NULL,
`link_id` int(11) UNSIGNED NOT NULL,
`created` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `links` (
`id` int(11) UNSIGNED NOT NULL,
`campaign_id` int(8) UNSIGNED NOT NULL,
`status` tinyint(4) NOT NULL DEFAULT 0
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `campaigns`
ADD PRIMARY KEY (`id`),
ADD UNIQUE KEY `id_isactive` (`id`,`is_active`),
ADD KEY `is_active` (`is_active`);
ALTER TABLE `clicks`
ADD PRIMARY KEY (`id`),
ADD KEY `link_id` (`link_id`),
ADD KEY `created` (`created`);
ALTER TABLE `links`
ADD PRIMARY KEY (`id`),
ADD KEY `campaign_id` (`campaign_id`);
How long does this take?
SELECT l.id,
(SELECT COUNT(*)
FROM clicks cl
WHERE cl.link_id = l.id AND
cl.created > '2020-10-25'
)
FROM links l JOIN
campaigns ca
ON ca.id = l.campaign_id
WHERE l.status = 1 AND ca.is_active = 1;
EDIT:
Hmmm, with an order by, you can try:
SELECT l.id,
(SELECT COUNT(*)
FROM clicks cl
WHERE cl.link_id = l.id AND
cl.created > '2020-10-25'
)
FROM links l
WHERE EXISTS (SELECT 1
FROM campaigns ca
WHERE ca.id = l.campaign_id AND ca.is_active = 1
)
AND l.status = 1
ORDER BY l.id;
For this, you want an index on links(status, id) and campaigns(id, is_active).
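The sketch below compares the correlated-subquery version with a plain JOIN ... GROUP BY, using Python's sqlite3 instead of MySQL and made-up rows. One behavioural difference is worth knowing: the correlated version also reports active links with zero recent clicks, while the JOIN version drops them. The composite index clicks(link_id, created) is my assumption, not part of the question. It is there so that each per-link count can be answered from the index.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE campaigns (id INTEGER PRIMARY KEY, is_active INTEGER NOT NULL);
    CREATE TABLE links (id INTEGER PRIMARY KEY, campaign_id INTEGER NOT NULL, status INTEGER NOT NULL);
    CREATE TABLE clicks (id INTEGER PRIMARY KEY, link_id INTEGER NOT NULL, created TEXT NOT NULL);
    CREATE INDEX clicks_link_created ON clicks (link_id, created);  -- assumed index
    INSERT INTO campaigns VALUES (1, 1), (2, 0);
    INSERT INTO links VALUES (10, 1, 1), (11, 1, 1), (12, 2, 1);
    INSERT INTO clicks VALUES
        (1, 10, '2020-10-26 08:00:00'), (2, 10, '2020-10-27 08:00:00'),
        (3, 10, '2020-10-01 08:00:00'), (4, 12, '2020-10-26 08:00:00');
""")

# Correlated COUNT per link; the campaign check is a semi-join (EXISTS).
correlated = con.execute("""
    SELECT l.id,
           (SELECT COUNT(*) FROM clicks cl
             WHERE cl.link_id = l.id AND cl.created > '2020-10-25') AS n
    FROM links l
    WHERE l.status = 1
      AND EXISTS (SELECT 1 FROM campaigns ca WHERE ca.id = l.campaign_id AND ca.is_active = 1)
    ORDER BY l.id""").fetchall()

# JOIN + GROUP BY: links without recent clicks disappear from the result.
joined = con.execute("""
    SELECT l.id, COUNT(*) AS n
    FROM links l
    JOIN campaigns ca ON ca.id = l.campaign_id AND ca.is_active = 1
    JOIN clicks cl ON cl.link_id = l.id
    WHERE l.status = 1 AND cl.created > '2020-10-25'
    GROUP BY l.id ORDER BY l.id""").fetchall()
```

SQLite compares these ISO-formatted datetime strings as text, which gives the same order as the DATETIME comparison MySQL does.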
Question... If a campaign is not currently active, you don't want any output for it, correct? Furthermore, there won't be any clicks for inactive campaigns, correct? Then why bother checking is_active?
Even if my analysis is wrong, it may be faster to ignore is_active until after the counts have been tallied.
Please don't use LEFT when it is not functional: the WHERE condition on clicks.created discards the NULL rows a LEFT JOIN would add, so you effectively have a simple JOIN.
Use COUNT(*); COUNT(x) tests x for being not null.
SELECT links.id, COUNT(*)
FROM links
JOIN clicks ON clicks.link_id = links.id
WHERE links.status = 1
AND clicks.created > '2020-10-25 00:00:00'
GROUP BY links.id
This is redundant:
ADD UNIQUE KEY `id_isactive` (`id`,`is_active`),
since PRIMARY KEY(id) declares id to be an index and unique.
I prefer not to fight the database engine optimizer.
SELECT links.id, campaigns.is_active, COUNT(clicks.id)
FROM links
INNER JOIN campaigns ON campaigns.id = links.campaign_id
LEFT JOIN clicks ON clicks.link_id = links.id
WHERE links.status = 1
AND clicks.created > '2020-10-25 00:00:00'
GROUP BY links.id, campaigns.is_active
HAVING campaigns.is_active = 1;
Second variant!
-- Second Variant
EXPLAIN
SELECT links.id AS LinksId
, COUNT(clicks.id) AS ClickCount
FROM links
LEFT JOIN clicks
ON links.id = clicks.link_id
WHERE links.status = 1
AND clicks.created > '2020-10-25 00:00:00'
AND links.campaign_id IN (SELECT id
FROM campaigns
WHERE is_active = 1)
GROUP BY links.id;
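Note that campaigns has no campaign_id column; its key is id. When a subquery names a column the inner table lacks, SQL resolves it against the outer query instead of raising an error. The filter then silently becomes an always-true test. The small sketch below shows this with Python's sqlite3 standing in for MySQL, which resolves names the same way. The rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE campaigns (id INTEGER PRIMARY KEY, is_active INTEGER NOT NULL);
    CREATE TABLE links (id INTEGER PRIMARY KEY, campaign_id INTEGER NOT NULL);
    INSERT INTO campaigns VALUES (1, 1), (2, 0);
    INSERT INTO links VALUES (10, 1), (11, 2);
""")

# Inner campaign_id does not exist in campaigns, so it refers to the OUTER
# links.campaign_id: true for every link as long as one active campaign exists.
wrong = con.execute("""
    SELECT id FROM links
    WHERE campaign_id IN (SELECT campaign_id FROM campaigns WHERE is_active = 1)
    ORDER BY id""").fetchall()

# Selecting campaigns.id gives the intended filter.
right = con.execute("""
    SELECT id FROM links
    WHERE campaign_id IN (SELECT id FROM campaigns WHERE is_active = 1)
    ORDER BY id""").fetchall()
```

Qualifying columns inside subqueries (campaigns.id) turns this kind of mistake into an error instead of a wrong answer.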
Third time's the charm! Using a CTE, given the published cardinalities (WITH requires MySQL 8.0 or later).
-- Third time is the charm
WITH ActiveCampaigns
AS
(SELECT *
FROM campaigns
WHERE is_active = 1)
SELECT links.id, COUNT(clicks.id)
FROM links
INNER JOIN ActiveCampaigns
ON ActiveCampaigns.id = links.campaign_id
LEFT JOIN clicks
ON clicks.link_id = links.id
WHERE links.status = 1
AND clicks.created > '2020-10-25 00:00:00'
GROUP BY links.id;

GROUP_CONCAT in subquery returns one row only

I am having trouble getting my MySQL query to return what I need. I am new to such long queries in MySQL.
SELECT
lang_rel_a_id,
lang_rel_b_id,
lang_rel_id,
tla.text_lang_t AS atext,
lald.lang_data_lang_id AS laid,
lald.lang_data_position AS lapp,
lald.lang_data_font_weight AS lafw,
lald.lang_data_font_size AS lafs,
lald.lang_data_font_color AS lafc,
lald.lang_data_bg_color AS labg,
lasdf.funca AS lafunc,
lang_ship,
lbld.lang_data_lang_id AS lbid,
lbld.lang_data_position AS lbpp,
lbld.lang_data_font_weight AS lbfw,
lbld.lang_data_font_size AS lbfs,
lbld.lang_data_font_color AS lbfc,
lbld.lang_data_bg_color AS lbbg,
tlb.text_lang_t AS btext,
lbsdf.funcb AS lbfunc
FROM lang_relation
LEFT JOIN
(SELECT *, GROUP_CONCAT(text_func_t SEPARATOR ', ') AS funca
FROM synt_data_func
LEFT JOIN text_func ON text_func_id = synt_df_func
GROUP BY synt_df_lang_data
)
lasdf ON lang_rel_a_id = lasdf.synt_df_lang_data
LEFT JOIN lang_data lald ON lald.lang_data_id = lang_rel_a_id
LEFT JOIN text_lang tla ON lald.lang_data_lang_id = tla.text_lang_id
LEFT JOIN
(SELECT *, GROUP_CONCAT(text_func_t SEPARATOR ', ') AS funcb
FROM synt_data_func
LEFT JOIN text_func ON text_func_id = synt_df_func
GROUP BY synt_df_lang_data
)
lbsdf ON lang_rel_b_id = lbsdf.synt_df_lang_data
LEFT JOIN lang_data lbld ON lbld.lang_data_id = lang_rel_b_id
LEFT JOIN text_lang tlb ON lbld.lang_data_lang_id = tlb.text_lang_id
WHERE lang_rel_a_id < lang_rel_b_id
GROUP BY lang_rel_id
I have a relation of two languages in my lang_relation table. For each of them I need to query 2 subtables, but one of them is a relation table that maps lang_data_id (= lang_rel_a_id or lang_rel_b_id, = synt_df_lang_data) to the texts of the different language functions, where multiple values are possible.
I do not understand why the GROUP_CONCAT in this subquery returns only one row. If I run the subquery on its own, I get all the results. But inside the larger query, everything is fine except this.
My lang_relation table:
CREATE TABLE `lang_relation`
(
`lang_rel_id` int(11) NOT NULL,
`lang_rel_a_id` int(11) NOT NULL,
`lang_rel_b_id` int(11) NOT NULL,
`lang_ship` tinyint(1) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The joined lang_data
CREATE TABLE `lang_data` (
`lang_data_id` int(11) NOT NULL,
`lang_data_pic_key` int(11) NOT NULL,
`lang_data_position` tinyint(1) NOT NULL,
`lang_data_lang_id` int(11) NOT NULL,
`lang_data_font_weight` tinyint(2) NOT NULL,
`lang_data_font_size` tinyint(2) NOT NULL,
`lang_data_font_color` tinyint(2) NOT NULL,
`lang_data_bg_color` tinyint(2) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And the synt_data_func. The text_table is a simple 2-column-table with id + text.
CREATE TABLE `synt_data_func` (
`synt_df_id` int(11) NOT NULL,
`synt_df_lang_data` int(11) NOT NULL,
`synt_df_func` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
I tried different approaches. This seems to be the one nearest to what I need. I don't know how many times I changed the GROUP BY clauses; I even tried doing the GROUP_CONCAT in the parent SELECT.
I even wonder whether this is possible at all, because the subqueries join on 2 different IDs. Is this the problem?
Thanks for any hint in advance.
I finally got it. Maybe it will help somebody with a similar question. I changed my approach for this query.
SELECT
lrel.lang_rel_pic_key,
lrel.lang_rel_id,
langdata_a.lascore,
lasdf.func_a,
langdata_a.latext,
lasf.score_astyle,
SUM(lasf.score_astyle) + (langdata_a.lascore) AS atotal,
lang_ship,
langdata_b.lbtext,
langdata_b.lbscore,
lbsdf.func_b,
lbsf.bformat,
lbsf.score_bstyle,
SUM(lbsf.score_bstyle) + (langdata_b.lbscore) AS btotal
FROM lang_relation lrel
INNER JOIN
(
SELECT DISTINCT
lald.lang_data_id,
lafw.field_value AS lafweight,
lafs.field_value AS lafsize,
lafc.field_value AS laffc,
lafbg.field_value AS lafbg,
lapos.field_value AS laposa,
tla.text_lang_t AS latext,
SUM(lafw.field_value) + (lafs.field_value) + (lafc.field_value) + (lafbg.field_value) + (lapos.field_value) AS lascore
FROM lang_data lald
LEFT JOIN text_lang tla ON lald.lang_data_lang_id = tla.text_lang_id
LEFT JOIN `fields` lafw ON lald.lang_data_font_weight = lafw.field_id
LEFT JOIN `fields` lafs ON lald.lang_data_font_size = lafs.field_id
LEFT JOIN `fields` lafc ON lald.lang_data_font_color = lafc.field_id
LEFT JOIN `fields` lafbg ON lald.lang_data_bg_color = lafbg.field_id
LEFT JOIN `fields` lapos ON lald.lang_data_position = lapos.field_id
GROUP BY lald.lang_data_id
)
langdata_a ON langdata_a.lang_data_id = lrel.lang_rel_a_id
LEFT JOIN
(SELECT sdf.synt_df_lang_data, GROUP_CONCAT(latf.text_func_t) AS func_a
FROM synt_data_func sdf
INNER JOIN text_func latf ON latf.text_func_id = sdf.synt_df_func
GROUP BY sdf.synt_df_lang_data
)
lasdf ON lasdf.synt_df_lang_data = lrel.lang_rel_a_id
LEFT JOIN
(
SELECT sfb.synt_format_lang_data,
sfb.synt_format_fields_id,
GROUP_CONCAT(sfbf.field_text SEPARATOR ', ') AS aformat,
SUM(sfbf.field_value) AS score_astyle
FROM synt_format sfb
INNER JOIN `fields` sfbf ON sfbf.field_id = sfb.synt_format_fields_id
GROUP BY sfb.synt_format_lang_data
)
lasf ON lasf.synt_format_lang_data = lrel.lang_rel_a_id
INNER JOIN
(
SELECT DISTINCT
lbld.lang_data_id,
lbfw.field_value AS lbfweight,
lbfs.field_value AS lbfsize,
lbfc.field_value AS lbffc,
lbfbg.field_value AS lbfbg,
lbpos.field_value AS lbposa,
tlb.text_lang_t AS lbtext,
SUM(lbfw.field_value) + (lbfs.field_value) + (lbfc.field_value) + (lbfbg.field_value) + (lbpos.field_value) AS lbscore
FROM lang_data lbld
LEFT JOIN text_lang tlb ON lbld.lang_data_lang_id = tlb.text_lang_id
LEFT JOIN `fields` lbfw ON lbld.lang_data_font_weight = lbfw.field_id
LEFT JOIN `fields` lbfs ON lbld.lang_data_font_size = lbfs.field_id
LEFT JOIN `fields` lbfc ON lbld.lang_data_font_color = lbfc.field_id
LEFT JOIN `fields` lbfbg ON lbld.lang_data_bg_color = lbfbg.field_id
LEFT JOIN `fields` lbpos ON lbld.lang_data_position = lbpos.field_id
GROUP BY lbld.lang_data_id
)
langdata_b ON langdata_b.lang_data_id = lrel.lang_rel_b_id
LEFT JOIN
(SELECT sdfb.synt_df_lang_data, GROUP_CONCAT(lbtf.text_func_t) AS func_b
FROM synt_data_func sdfb
INNER JOIN text_func lbtf ON lbtf.text_func_id = sdfb.synt_df_func
GROUP BY sdfb.synt_df_lang_data
)
lbsdf ON lbsdf.synt_df_lang_data = lrel.lang_rel_b_id
LEFT JOIN
(
SELECT sfb.synt_format_lang_data,
sfb.synt_format_fields_id,
GROUP_CONCAT(sfbf.field_text SEPARATOR ', ') AS bformat,
SUM(sfbf.field_value) AS score_bstyle
FROM synt_format sfb
INNER JOIN `fields` sfbf ON sfbf.field_id = sfb.synt_format_fields_id
GROUP BY sfb.synt_format_lang_data
)
lbsf ON lbsf.synt_format_lang_data = lrel.lang_rel_b_id
GROUP BY lrel.lang_rel_id
Maybe a bit long, but output is exactly what was needed :-)
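The working query puts each GROUP_CONCAT in a derived table that selects only the grouping key and the aggregate, then LEFT JOINs that table once per side (a and b). The original used SELECT * next to the aggregate. Below is a reduced sketch of that pattern, using Python's sqlite3 in place of MySQL. SQLite spells the separator as a second argument, group_concat(x, ', '), rather than SEPARATOR. The rows are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE lang_relation (lang_rel_id INTEGER, lang_rel_a_id INTEGER, lang_rel_b_id INTEGER);
    CREATE TABLE synt_data_func (synt_df_lang_data INTEGER, synt_df_func INTEGER);
    CREATE TABLE text_func (text_func_id INTEGER PRIMARY KEY, text_func_t TEXT);
    INSERT INTO text_func VALUES (1, 'subject'), (2, 'object'), (3, 'verb');
    INSERT INTO lang_relation VALUES (100, 1, 2), (101, 3, 4);
    INSERT INTO synt_data_func VALUES (1, 1), (2, 2), (2, 3), (3, 3);
""")

# One row per lang_data id: only the grouping key plus the aggregate.
FUNCS = """
    SELECT sdf.synt_df_lang_data AS lang_data,
           group_concat(tf.text_func_t, ', ') AS funcs
    FROM synt_data_func sdf
    JOIN text_func tf ON tf.text_func_id = sdf.synt_df_func
    GROUP BY sdf.synt_df_lang_data
"""

# Join the aggregated table once for the a side and once for the b side.
rows = con.execute(f"""
    SELECT r.lang_rel_id, fa.funcs AS func_a, fb.funcs AS func_b
    FROM lang_relation r
    LEFT JOIN ({FUNCS}) AS fa ON fa.lang_data = r.lang_rel_a_id
    LEFT JOIN ({FUNCS}) AS fb ON fb.lang_data = r.lang_rel_b_id
    ORDER BY r.lang_rel_id""").fetchall()
```

The order of values inside a concatenated string is not guaranteed unless you ask for it. MySQL supports GROUP_CONCAT(... ORDER BY ...) for that.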

Left join with multiple status

Let's say I have two stores: Store A (22) and Store B (21). This is the query to fetch the rows that match those store IDs:
SELECT c . * , s.s_name, s.logo, s.s_slug, cm.c_code, cm.c_shorturl, cm.c_shorturl_id
FROM ci_cptbl c
LEFT JOIN ci_stores s ON s.store_id = c.store_id
LEFT JOIN ci_cptbl_mapper cm ON cm.c_id = c._id
WHERE c.coupon_id <> ''
AND c.store_id in ('22', '21')
AND s.s_status = '1'
AND c.c_status = '1'
AND DATE( c.c_end_date ) >= '2014-10-04'
ORDER BY c.c_id DESC
ci_cptbl holds the collection of products, including store_id, and ci_stores holds the store name etc., including s_status (0 or 1).
CREATE TABLE `ci_stores` (
`store_id` int(10) NOT NULL AUTO_INCREMENT,
`cat_id` int(10) NOT NULL,
`s_name` varchar(255) NOT NULL,
`s_slug` varchar(255) NOT NULL,
`logo` varchar(255) NOT NULL,
`display_name` varchar(255) NOT NULL,
`s_description` text NOT NULL,
`network_id` int(10) NOT NULL,
`s_status` tinyint(1) NOT NULL DEFAULT '0',
`merged_stores` text NOT NULL,
`stat` bigint(20) NOT NULL,
PRIMARY KEY (`store_id`),
KEY `network_id` (`network_id`),
KEY `cat_id` (`cat_id`),
FULLTEXT KEY `display_name` (`display_name`)
) ENGINE=MyISAM AUTO_INCREMENT=127 DEFAULT CHARSET=utf8
Now, only the first store ID (22) is enabled in my stores table and the rest (21, ...) are disabled. My AND s.s_status = '1' condition works only if all the stores are enabled, but I want all stores, including disabled ones.
BUT FIRST STORE ID MUST BE ENABLED
You could try changing your condition to test on store_id and enabled:
(s.s_status = '1' or c.store_id <> '22')
so:
SELECT c . * , s.s_name, s.logo, s.s_slug, cm.c_code, cm.c_shorturl, cm.c_shorturl_id
FROM ci_cptbl c
LEFT JOIN ci_stores s ON s.store_id = c.store_id
LEFT JOIN ci_cptbl_mapper cm ON cm.c_id = c._id
WHERE c.coupon_id <> ''
AND c.store_id in ('22', '21')
AND c.c_status = '1'
AND (s.s_status = '1' or c.store_id <> '22')
AND DATE( c.c_end_date ) >= '2014-10-04'
ORDER BY c.c_id DESC
You need to move your "AND s.s_status = 1" clause to the LEFT JOIN part of your query. In the WHERE clause, it turns the LEFT JOIN into an implied INNER JOIN and thus leaves those rows out of your result set.
LEFT JOIN ci_stores s
ON c.store_id = s.store_id
AND s.s_status = '1'
FROM ci_cptbl c
LEFT JOIN ci_stores s ON s.store_id = c.store_id
The purpose of that LEFT JOIN is to let records that have no match in the "joined to" table still be listed in the results,
e.g. if there is a store 657 in ci_cptbl but no such store in ci_stores, 657 would still be listed.
However, using the WHERE clause:
if we then ALSO insist EVERY row has s.s_status = '1',
then store 657 would NOT be listed (how can it? It does not exist in ci_stores, so ci_stores.s_status has to be NULL and can never ever be equal to 1).
So; when using any outer join such as a LEFT OUTER JOIN you have to be very careful how you reference those tables in the WHERE clause. In many cases, such as here, it is better to move the additional condition(s) to the JOIN like this:
FROM ci_cptbl c
LEFT JOIN ci_stores s ON s.store_id = c.store_id
AND s.s_status = '1'
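The difference between filtering in WHERE and in ON can be seen in a tiny sketch. Python's sqlite3 stands in for MySQL. Stores 22 and 21 come from the question; the coupon rows and store 657 are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ci_stores (store_id INTEGER PRIMARY KEY, s_status INTEGER NOT NULL);
    CREATE TABLE ci_cptbl (c_id INTEGER PRIMARY KEY, store_id INTEGER NOT NULL);
    INSERT INTO ci_stores VALUES (22, 1), (21, 0);
    INSERT INTO ci_cptbl VALUES (1, 22), (2, 21), (3, 657);  -- store 657 does not exist
""")

# Filter in WHERE: unmatched or disabled stores give NULL/0 and the row is dropped.
in_where = con.execute("""
    SELECT c.c_id, s.store_id FROM ci_cptbl c
    LEFT JOIN ci_stores s ON s.store_id = c.store_id
    WHERE s.s_status = 1 ORDER BY c.c_id""").fetchall()

# Filter in ON: every ci_cptbl row survives; the store columns are NULL where no enabled store matched.
in_on = con.execute("""
    SELECT c.c_id, s.store_id FROM ci_cptbl c
    LEFT JOIN ci_stores s ON s.store_id = c.store_id AND s.s_status = 1
    ORDER BY c.c_id""").fetchall()
```

With the condition in ON, rows for disabled or missing stores are kept, but their ci_stores columns (s_name, logo, ...) come back as NULL.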

Left Join with Where clause returns empty result

I have a post table, and its schema is like this:
CREATE TABLE IF NOT EXISTS `post` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) DEFAULT NULL,
`site_id` bigint(20) DEFAULT NULL,
`parent_id` bigint(20) DEFAULT NULL,
`title` longtext COLLATE utf8_turkish_ci NOT NULL,
`status` varchar(20) COLLATE utf8_turkish_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_5A8A6C8DA76ED395` (`user_id`),
KEY `IDX_5A8A6C8DF6BD1646` (`site_id`),
KEY `IDX_5A8A6C8D727ACA70` (`parent_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_turkish_ci AUTO_INCREMENT=16620 ;
I'm using this DQL to fetch a post and its children:
$post = $this->_diContainer->wordy_app_doctrine->fetch(
"SELECT p,c FROM Wordy\Entity\Post p LEFT JOIN p.children c WHERE p.site = :site AND p.id = :id AND p.language = :language AND p.status != 'trashed' AND c.status != 'trashed' ORDER BY c.title",
array(
'params' => array(
'id' => $id,
'site' => $this->_currentSite['id'],
'language' => $this->_currentLanguage->code,
)
)
);
What I'm trying to do is fetch a post and all of its children. The criterion is: don't include trashed posts or trashed children.
But when I run this query for a post that doesn't have any children, the result set is empty.
When I remove the c.status != 'trashed' part from the query, everything works fine, but I get trashed posts too.
Thanks, in advance.
Edit: here is the SQL generated from the given DQL:
SELECT p0_.id AS id0, p0_.title AS title5, p0_.status AS status8, p0_.parent_id AS parent_id9, p1_.id AS id15, p1_.title AS title20, p1_.status AS status23, p1_.parent_id AS parent_id24 FROM post p0_ LEFT JOIN post p1_ ON p0_.id = p1_.parent_id WHERE p0_.site_id = ? AND p0_.id = ? AND p0_.language = ? AND p0_.status <> 'trashed' ORDER BY p1_.title ASC
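The empty result can be reproduced in plain SQL. A condition in WHERE on the LEFT JOINed alias c is evaluated after the join. For a post without children, c.status is NULL, so c.status <> 'trashed' is not true and the parent row is dropped. Putting the condition in the join's ON clause keeps the parent; a DQL WITH condition on the join does this. The sketch uses Python's sqlite3 rather than MySQL, and the post rows are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, parent_id INTEGER,
                       title TEXT NOT NULL, status TEXT NOT NULL);
    INSERT INTO post VALUES
        (1, NULL, 'Lonely parent', 'publish'),
        (2, NULL, 'Parent', 'publish'),
        (3, 2, 'B child', 'publish'),
        (4, 2, 'A child', 'trashed');
""")

# Child filter in WHERE: a parent without children is filtered out entirely.
WHERE_VERSION = """
    SELECT p.id, c.id FROM post p
    LEFT JOIN post c ON c.parent_id = p.id
    WHERE p.id = ? AND p.status <> 'trashed' AND c.status <> 'trashed'
    ORDER BY c.title"""

# Child filter in ON: the parent is always returned, children are filtered.
ON_VERSION = """
    SELECT p.id, c.id FROM post p
    LEFT JOIN post c ON c.parent_id = p.id AND c.status <> 'trashed'
    WHERE p.id = ? AND p.status <> 'trashed'
    ORDER BY c.title"""

def run(sql, post_id):
    return con.execute(sql, (post_id,)).fetchall()
```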
You forgot the ON clause of the left join syntax.
Your long conditional WHERE is being AND'ed together after the join and nothing is passing all the checks. Move the join stuff into the ON clause and leave the c.trashed check in the WHERE clause. Try something like this:
SELECT p,c FROM Wordy\Entity\Post p LEFT JOIN children c ON p.site = :site AND p.id = :id AND p.language = :language AND p.status != 'trashed' AND p.id = c.parent_id WHERE c.status != 'trashed' ORDER BY c.title
I think I solved my own problem.
Just put the condition in a WITH clause on the join instead of in the WHERE clause, like this:
"SELECT p,c FROM Wordy\Entity\Post p LEFT JOIN p.children c WITH c.status != 'trashed' WHERE p.site = :site........."