I have a very slow query, without any clue about why it is slow, and some strange behaviour around it too.
SELECT <fields list>
FROM scadenze s
JOIN cartelle_cliniche cc ON (cc.id = s.id_cartella)
JOIN anagrafica a ON (a.id = cc.id_anagrafica)
JOIN scadenze_sedute ss ON (ss.id_scadenza = s.id)
JOIN pmultiple_prenotazioni pp ON (pp.id = ss.id_fisico OR pp.id = ss.id_virtuale)
LEFT JOIN medici_privati mp ON (mp.id = a.id_medico)
LEFT JOIN operatorif o ON (o.id = cc.id_operatore_emittente)
WHERE pp.confermato = '1' AND pp.annullato = '0'
AND pp.id_esito != 4 AND s.id_stato = '0'
AND (DATE(s.data_scadenza) BETWEEN '2021-06-01' AND '2021-06-16')
GROUP BY s.id, TIME(ss.data_seduta)
It takes 5 minutes!
Now, as you can see, there are a lot of JOINs and only 5 WHERE conditions.
If I remove these 3 conditions from the WHERE clause: pp.confermato = '1' AND pp.annullato = '0' AND pp.id_esito != 4, the query takes only 0.15 seconds.
I have single-column indexes on these 3 fields, and I also tried adding a composite index and forcing it with MySQL's FORCE INDEX hint. But nothing changed.
If instead I put those conditions in a HAVING clause, like:
SELECT <fields list>
FROM scadenze s
JOIN cartelle_cliniche cc ON (cc.id = s.id_cartella)
JOIN anagrafica a ON (a.id = cc.id_anagrafica)
JOIN scadenze_sedute ss ON (ss.id_scadenza = s.id)
JOIN pmultiple_prenotazioni pp ON (pp.id = ss.id_fisico OR pp.id = ss.id_virtuale)
LEFT JOIN medici_privati mp ON (mp.id = a.id_medico)
LEFT JOIN operatorif o ON (o.id = cc.id_operatore_emittente)
WHERE s.id_stato = '0' AND (DATE(s.data_scadenza) BETWEEN '2021-06-01' AND '2021-06-16')
GROUP BY s.id, TIME(ss.data_seduta)
HAVING pp.confermato = '1' AND pp.annullato = '0' AND pp.id_esito != 4
the query takes 0.15 seconds.
But what's the problem with using these conditions in the WHERE clause?
(With the HAVING clause I get 308 records instead of 309.)
What can cause this slow query if I have indexes on every one of those fields?
Thank you in advance.
EDIT:
EXPLAIN output here:
CREATE TABLE for pmultiple_prenotazioni (pp) here:
CREATE TABLE `pmultiple_prenotazioni` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`id_utente` INT(11) NOT NULL,
`id_anagrafica` INT(11) NOT NULL DEFAULT '0',
`id_cartella` INT(11) NOT NULL DEFAULT '0',
`id_kit` INT(11) NOT NULL DEFAULT '0',
`id_fase` INT(11) NOT NULL DEFAULT '0',
`data_seduta` DATETIME NOT NULL,
`giorno` VARCHAR(50) NOT NULL COLLATE 'latin1_swedish_ci',
`id_reparto` INT(11) NOT NULL,
`importo_prestazione` DECIMAL(6,2) NOT NULL DEFAULT '0.00',
`peso_prestazione` DECIMAL(5,2) NOT NULL DEFAULT '0.00',
`durata_prestazione` SMALLINT(6) NOT NULL DEFAULT '0',
`id_operatore` INT(11) NOT NULL DEFAULT '0',
`solo` TINYINT(1) NOT NULL DEFAULT '0',
`confermato` TINYINT(1) NOT NULL DEFAULT '0',
`annullato` TINYINT(1) NOT NULL DEFAULT '0',
`id_esito` TINYINT(4) NOT NULL DEFAULT '0',
`creazione` DATETIME NOT NULL,
`univoco` VARCHAR(255) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
PRIMARY KEY (`id`) USING BTREE,
INDEX `id_cartella` (`id_cartella`) USING BTREE,
INDEX `id_kit` (`id_kit`) USING BTREE,
INDEX `data_seduta` (`data_seduta`) USING BTREE,
INDEX `confermato` (`confermato`) USING BTREE,
INDEX `id_utente` (`id_utente`) USING BTREE,
INDEX `id_reparto` (`id_reparto`) USING BTREE,
INDEX `id_anagrafica` (`id_anagrafica`) USING BTREE,
INDEX `id_fase` (`id_fase`) USING BTREE,
INDEX `peso_prestazione` (`peso_prestazione`) USING BTREE,
INDEX `durata_prestazione` (`durata_prestazione`) USING BTREE,
INDEX `solo` (`solo`) USING BTREE,
INDEX `id_operatore` (`id_operatore`) USING BTREE,
INDEX `univoco` (`univoco`) USING BTREE,
INDEX `annullato` (`annullato`) USING BTREE,
CONSTRAINT `prenkit` FOREIGN KEY (`id_kit`) REFERENCES `starbene_1`.`pmultiple_kit` (`id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=717
;
The problem is caused here:
JOIN pmultiple_prenotazioni pp ON (pp.id = ss.id_fisico OR pp.id = ss.id_virtuale)
There is no way an index can help this. If you had just one term or the other in this join condition, it could use the primary key index for pp.id. But with the OR expression, it cannot.
You can see the effect in the EXPLAIN, where it must do a table scan (type: ALL) for pp. So the total number of join comparisons is the number of rows examined in ss (10193) multiplied by the number of rows examined in pp (23565). Total: about 240 million!
You could separate it into two left joins. Note that to preserve the original OR semantics, the conditions must be OR'd across the two aliases, not AND'd (AND would require a match on both ids):
LEFT JOIN pmultiple_prenotazioni pp1 ON (pp1.id = ss.id_fisico)
LEFT JOIN pmultiple_prenotazioni pp2 ON (pp2.id = ss.id_virtuale)
...
WHERE
    ((pp1.confermato = '1' AND pp1.annullato = '0' AND pp1.id_esito != 4)
  OR (pp2.confermato = '1' AND pp2.annullato = '0' AND pp2.id_esito != 4))
AND ...
Or you could use a UNION:
SELECT * FROM (
SELECT s.id, ss.data_seduta
FROM scadenze s
JOIN cartelle_cliniche cc ON (cc.id = s.id_cartella)
JOIN anagrafica a ON (a.id = cc.id_anagrafica)
JOIN scadenze_sedute ss ON (ss.id_scadenza = s.id)
JOIN pmultiple_prenotazioni pp ON (pp.id = ss.id_fisico) -- ONLY fisico
LEFT JOIN medici_privati mp ON (mp.id = a.id_medico)
LEFT JOIN operatorif o ON (o.id = cc.id_operatore_emittente)
WHERE
pp.confermato = '1' AND pp.annullato = '0' AND pp.id_esito != 4
AND s.id_stato = '0' AND (DATE(s.data_scadenza) BETWEEN '2021-06-01' AND '2021-06-16')
UNION
SELECT s.id, ss.data_seduta
FROM scadenze s
JOIN cartelle_cliniche cc ON (cc.id = s.id_cartella)
JOIN anagrafica a ON (a.id = cc.id_anagrafica)
JOIN scadenze_sedute ss ON (ss.id_scadenza = s.id)
JOIN pmultiple_prenotazioni pp ON (pp.id = ss.id_virtuale) -- ONLY virtuale
LEFT JOIN medici_privati mp ON (mp.id = a.id_medico)
LEFT JOIN operatorif o ON (o.id = cc.id_operatore_emittente)
WHERE
pp.confermato = '1' AND pp.annullato = '0' AND pp.id_esito != 4
AND s.id_stato = '0' AND (DATE(s.data_scadenza) BETWEEN '2021-06-01' AND '2021-06-16')
) AS t
GROUP BY id, TIME(data_seduta);
In MySQL Workbench you can see the execution plan of your query just by clicking on "Execution plan" (bottom right of the page after the query runs). Then it should be easy to see which step takes the most time.
Looks like it might be a case of "inflate-deflate": first it JOINs tables (inflate), then it GROUPs (deflate). In the middle there may be a lot of rows, with a lot of work to do all the joining.
So, try to do the GROUP BY first.
SELECT ...
FROM ( SELECT ... FROM s JOIN ss ...
GROUP BY s.id, TIME(ss.data_seduta) ) AS aaa
JOIN ...
You should not need to do a GROUP BY at the end.
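A sketch of that grouped-first shape, using the tables from the question (the pp join and its filters would also need to move inside the derived table to keep the same results, and the DATE() wrapper is replaced with a sargable range so an index on data_scadenza can be used; treat this as a starting point, not a drop-in replacement):

```sql
SELECT ...                                   -- detail columns from cc, a, mp, o
FROM (
    -- Collapse to one row per (scadenza, seduta time) BEFORE the wide joins
    SELECT s.id,
           TIME(ss.data_seduta) AS ora_seduta,
           MIN(s.id_cartella)   AS id_cartella   -- constant per s.id; MIN() keeps ONLY_FULL_GROUP_BY happy
    FROM scadenze s
    JOIN scadenze_sedute ss ON ss.id_scadenza = s.id
    WHERE s.id_stato = '0'
      AND s.data_scadenza >= '2021-06-01'
      AND s.data_scadenza <  '2021-06-17'        -- covers the whole day of 2021-06-16
    GROUP BY s.id, TIME(ss.data_seduta)
) AS aaa
JOIN cartelle_cliniche cc   ON cc.id = aaa.id_cartella
JOIN anagrafica a           ON a.id  = cc.id_anagrafica
LEFT JOIN medici_privati mp ON mp.id = a.id_medico
LEFT JOIN operatorif o      ON o.id  = cc.id_operatore_emittente;
```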
I have an ORDER BY RAND() query which runs very slowly, like almost every RAND() query. I searched all of Stack Overflow but cannot find any good solution for my query:
SELECT u.id
, u.is_instagram_connected
, u.tokens
, u.username
, u.name
, u.photo
, u.bio
, u.voice
, u.mobile_update
, 1584450999 - l.time idleTime
FROM mobile_login_list l
JOIN users u
ON l.username = u.username
JOIN mobile_token_list t
ON t.username = l.username
WHERE l.time > 1584393399
AND l.username NOT IN ('enesdoo')
AND u.username NOT IN (
SELECT blocked_username
FROM hided_mobile_users_from_shuffle
WHERE username = 'enesdoo'
)
AND u.ban_status = 0
AND u.perma_ban = 0
AND u.mobile_online_status = 1
AND u.lock_status = 0
GROUP
BY l.username
ORDER
BY RAND( )
LIMIT 27
If I remove the ORDER BY RAND() line, this runs very quickly, something like 100 times faster.
How can I speed up this query?
mobile_login_list has > 50k rows
users has > 1m rows
Edit:
Explain:
My table:
CREATE TABLE IF NOT EXISTS `mobile_login_list` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(30) COLLATE utf8_bin NOT NULL,
`key` varchar(32) COLLATE utf8_bin NOT NULL,
`time` int(11) NOT NULL,
`ip` int(11) NOT NULL,
`version` smallint(4) NOT NULL,
`messaged` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `kontrol` (`username`,`key`),
KEY `username` (`username`),
KEY `time` (`time`),
KEY `username_2` (`username`,`time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=3351637 ;
In the lingo of random retrieval, this is called a deal operation (deal 27 different cards from a shuffled deck of 4k or so; the other random operation is called roll: it allows duplicates).
You're using SELECT mess-of-columns FROM mess-of-joins WHERE mess-of-criteria ORDER BY RAND() LIMIT small-number to do a shuffle-and-deal operation. That is a notorious performance antipattern: it causes extra work for the server because it must order a fairly large result set and then discard almost all of it (via the LIMIT).
A way to save some of the trouble is to defer the joins to the details: shuffle only the ids, then take the small number of results and fetch the details you need. Something like this.
SELECT u.id /* just the id values */
FROM mobile_login_list l
JOIN users u
ON l.username = u.username
JOIN mobile_token_list t
ON t.username = l.username
WHERE l.time > 1584393399
AND l.username NOT IN ('enesdoo')
AND u.username NOT IN (
SELECT blocked_username
FROM hided_mobile_users_from_shuffle
WHERE username = 'enesdoo'
)
AND u.ban_status = 0
AND u.perma_ban = 0
AND u.mobile_online_status = 1
AND u.lock_status = 0
ORDER
BY RAND( )
LIMIT 27
You can debug, run EXPLAIN on, and optimize this subquery by changing indexes and maybe tightening up your selection criteria; it's the part doing all the hard work of shuffling and dealing.
Then join that result set to your detail tables to choose the data you need. This outer query only needs to process your 27 rows. Be sure to shuffle again.
SELECT u.id
, u.is_instagram_connected
, u.tokens
, u.username
, u.name
, u.photo
, u.bio
, u.voice
, u.mobile_update
, 1584450999 - l.time idleTime
FROM mobile_login_list l
JOIN users u
ON l.username = u.username
JOIN (
/* the subquery from above */
) selected ON u.id = selected.id
ORDER BY RAND()
Putting it all together, you get this big repetitive mess of a query. But it should be a little faster.
SELECT u.id
, u.is_instagram_connected
, u.tokens
, u.username
, u.name
, u.photo
, u.bio
, u.voice
, u.mobile_update
, 1584450999 - l.time idleTime
FROM mobile_login_list l
JOIN users u
ON l.username = u.username
JOIN (
SELECT u.id
FROM mobile_login_list l
JOIN users u
ON l.username = u.username
JOIN mobile_token_list t
ON t.username = l.username
WHERE l.time > 1584393399
AND l.username NOT IN ('enesdoo')
AND u.username NOT IN (
SELECT blocked_username
FROM hided_mobile_users_from_shuffle
WHERE username = 'enesdoo'
)
AND u.ban_status = 0
AND u.perma_ban = 0
AND u.mobile_online_status = 1
AND u.lock_status = 0
ORDER
BY RAND( )
LIMIT 27
) selected ON u.id = selected.id
ORDER BY RAND()
If you deal often, a more performant way to deal records is this.
Add a FLOAT column to the table you're dealing from, let's call it deal, and put an index on it.
Every few hours, or overnight, or even once a week, shuffle the table by running UPDATE users SET deal = RAND();. It will take a while, because it needs to change the deal value in every row.
When you need to deal, do ...WHERE deal >= RAND() * 0.9 ... ORDER BY deal LIMIT n. The multiplication by 0.9 helps ensure you don't run off the end of the table by choosing a random number too close to 1.
This is equivalent, in cardshark terms, to shuffling the deck every few hours and then just cutting it for every deal. It's the way Wikipedia implements their "show a random article" feature.
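Sketching that scheme in MySQL (the deal column and index names are assumptions; adapt them to your schema):

```sql
-- One-time setup: add the shuffle column and index it
ALTER TABLE users
    ADD COLUMN deal FLOAT,
    ADD INDEX idx_deal (deal);

-- Periodic shuffle (slow: touches every row; run off-peak)
UPDATE users SET deal = RAND();

-- Cheap deal of 27 rows: cut the shuffled deck at a random point.
-- The index on deal makes this a short range scan, not a full sort.
SELECT id, username
FROM users
WHERE deal >= RAND() * 0.9
ORDER BY deal
LIMIT 27;
```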
Can we see the EXPLAIN for this instead...?
SELECT DISTINCT u.id
, u.is_instagram_connected
, u.tokens
, u.username
, u.name
, u.photo
, u.bio
, u.voice
, u.mobile_update
, 1584450999 - l.time idleTime
FROM mobile_login_list l
JOIN users u
ON l.username = u.username
JOIN mobile_token_list t
ON t.username = l.username
LEFT
JOIN hided_mobile_users_from_shuffle x
ON x.blocked_username = u.username
AND x.username = 'enesdoo'
WHERE l.time > 1584393399
AND l.username NOT IN ('enesdoo')
AND x.blocked_username IS NULL
AND u.ban_status = 0
AND u.perma_ban = 0
AND u.mobile_online_status = 1
AND u.lock_status = 0
ORDER
BY RAND( )
LIMIT 27
Given my limited knowledge of query optimisation, I would simply define the table as follows, but maybe someone else can suggest further improvements:
CREATE TABLE IF NOT EXISTS mobile_login_list
(id SERIAL PRIMARY KEY
,username varchar(30) COLLATE utf8_bin NOT NULL
,`key` varchar(32) COLLATE utf8_bin NOT NULL
,time int NOT NULL
,ip int NOT NULL
,version smallint NOT NULL
,messaged int NOT NULL DEFAULT 0
,KEY username_2 (username,time) -- or (time,username)
);
Note that key is a reserved word (and time is a keyword), making them poor choices for table/column identifiers.
I have a complex query which takes 700ms to run on my machine. I found that the bottleneck is the ORDER BY at_firstname.value clause, but how can I use indexes to improve this?
SELECT
`e`.*
, `at_default_billing`.`value` AS `default_billing`
, `at_billing_postcode`.`value` AS `billing_postcode`
, `at_billing_city`.`value` AS `billing_city`
, `at_billing_region`.`value` AS `billing_region`
, `at_billing_country_id`.`value` AS `billing_country_id`
, `at_company`.`value` AS `company`
, `at_firstname`.`value` AS `firstname`
, `at_lastname`.`value` AS `lastname`
, CONCAT(at_firstname.value
, " "
, at_lastname.value) AS `full_name`
, `at_phone`.`value` AS `phone`
, IFNULL(at_phone.value,"N/A") AS `telephone`
, `e`.`entity_id` AS `id`
FROM
`customer_entity` AS `e`
LEFT JOIN
`customer_entity_int` AS `at_default_billing`
ON (`at_default_billing`.`entity_id` = `e`.`entity_id`)
AND (`at_default_billing`.`attribute_id` = '13')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_postcode`
ON (`at_billing_postcode`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_postcode`.`attribute_id` = '30')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_city`
ON (`at_billing_city`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_city`.`attribute_id` = '26')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_region`
ON (`at_billing_region`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_region`.`attribute_id` = '28')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_country_id`
ON (`at_billing_country_id`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_country_id`.`attribute_id` = '27')
LEFT JOIN
`customer_address_entity_varchar` AS `at_company`
ON (`at_company`.`entity_id` = `at_default_billing`.`value`)
AND (`at_company`.`attribute_id` = '24')
LEFT JOIN
`customer_entity_varchar` AS `at_firstname`
ON (`at_firstname`.`entity_id` = `e`.`entity_id`)
AND (`at_firstname`.`attribute_id` = '5')
LEFT JOIN
`customer_entity_varchar` AS `at_lastname`
ON (`at_lastname`.`entity_id` = `e`.`entity_id`)
AND (`at_lastname`.`attribute_id` = '7')
LEFT JOIN
`customer_entity_varchar` AS `at_phone`
ON (`at_phone`.`entity_id` = `e`.`entity_id`)
AND (`at_phone`.`attribute_id` = '136')
ORDER BY
`at_firstname`.`value` ASC LIMIT 20
This is the execution plan (EXPLAIN of the query):
'1','SIMPLE','e',NULL,'ALL',NULL,NULL,NULL,NULL,'19951','100.00','Using temporary; Using filesort'
'1','SIMPLE','at_default_billing',NULL,'eq_ref','UNQ_CUSTOMER_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_INT_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_INT_ENTITY_ID,IDX_CUSTOMER_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.e.entity_id,const','1','100.00',NULL
'1','SIMPLE','at_billing_postcode',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_billing_city',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_billing_region',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_billing_country_id',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_company',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_firstname',NULL,'eq_ref','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.e.entity_id,const','1','100.00',NULL
'1','SIMPLE','at_lastname',NULL,'eq_ref','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.e.entity_id,const','1','100.00',NULL
'1','SIMPLE','at_phone',NULL,'eq_ref','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.e.entity_id,const','1','100.00',NULL
Table Structure:
CREATE TABLE `customer_entity_varchar` (
`value_id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Value Id',
`entity_type_id` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'Entity Type Id',
`attribute_id` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'Attribute Id',
`entity_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Entity Id',
`value` varchar(255) DEFAULT NULL COMMENT 'Value',
PRIMARY KEY (`value_id`),
UNIQUE KEY `UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID` (`entity_id`,`attribute_id`),
KEY `IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_TYPE_ID` (`entity_type_id`),
KEY `IDX_CUSTOMER_ENTITY_VARCHAR_ATTRIBUTE_ID` (`attribute_id`),
KEY `IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID` (`entity_id`),
KEY `IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE` (`entity_id`,`attribute_id`,`value`),
CONSTRAINT `FK_CSTR_ENTT_VCHR_ATTR_ID_EAV_ATTR_ATTR_ID` FOREIGN KEY (`attribute_id`) REFERENCES `eav_attribute` (`attribute_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `FK_CSTR_ENTT_VCHR_ENTT_TYPE_ID_EAV_ENTT_TYPE_ENTT_TYPE_ID` FOREIGN KEY (`entity_type_id`) REFERENCES `eav_entity_type` (`entity_type_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `FK_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_CUSTOMER_ENTITY_ENTITY_ID` FOREIGN KEY (`entity_id`) REFERENCES `customer_entity` (`entity_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=131094 DEFAULT CHARSET=utf8 COMMENT='Customer Entity Varchar';
As of now your query is:
Performing ALL left outer joins first.
Then ORDERing the rows.
Then LIMITing the rows.
I would perform only the strictly needed outer join first, then the ordering and limiting (to reduce to 20 rows), and finally all the rest of the outer joins. In short I would do:
Perform the minimal left outer join first. That is, two tables only.
Then ORDER the rows.
Then LIMIT the rows. This produces at most 20 rows.
Perform all the rest of the outer joins. At this point there are no longer thousands of rows, but only 20.
This change should massively reduce the "Unique Key Lookup" executions. The modified query will look like:
select
e.*
, `at_default_billing`.`value` AS `default_billing`
, `at_billing_postcode`.`value` AS `billing_postcode`
, `at_billing_city`.`value` AS `billing_city`
, `at_billing_region`.`value` AS `billing_region`
, `at_billing_country_id`.`value` AS `billing_country_id`
, `at_company`.`value` AS `company`
, `at_lastname`.`value` AS `lastname`
, CONCAT(firstname
, " "
, at_lastname.value) AS `full_name`
, `at_phone`.`value` AS `phone`
, IFNULL(at_phone.value,"N/A") AS `telephone`
from ( -- Step #1: joining customer_entity with customer_entity_varchar
SELECT
`e`.*
, `at_firstname`.`value` AS `firstname`
, `e`.`entity_id` AS `id`
FROM
`customer_entity` AS `e`
LEFT JOIN
`customer_entity_varchar` AS `at_firstname`
ON (`at_firstname`.`entity_id` = `e`.`entity_id`)
AND (`at_firstname`.`attribute_id` = '5')
ORDER BY -- Step #2: Sorting (the bare minimum)
`at_firstname`.`value` ASC
LIMIT 20 -- Step #3: Limiting (to 20 rows)
) e
LEFT JOIN -- Step #4: Performing all the rest of outer joins (only few rows now)
`customer_entity_int` AS `at_default_billing`
ON (`at_default_billing`.`entity_id` = `e`.`entity_id`)
AND (`at_default_billing`.`attribute_id` = '13')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_postcode`
ON (`at_billing_postcode`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_postcode`.`attribute_id` = '30')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_city`
ON (`at_billing_city`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_city`.`attribute_id` = '26')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_region`
ON (`at_billing_region`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_region`.`attribute_id` = '28')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_country_id`
ON (`at_billing_country_id`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_country_id`.`attribute_id` = '27')
LEFT JOIN
`customer_address_entity_varchar` AS `at_company`
ON (`at_company`.`entity_id` = `at_default_billing`.`value`)
AND (`at_company`.`attribute_id` = '24')
LEFT JOIN
`customer_entity_varchar` AS `at_lastname`
ON (`at_lastname`.`entity_id` = `e`.`entity_id`)
AND (`at_lastname`.`attribute_id` = '7')
LEFT JOIN
`customer_entity_varchar` AS `at_phone`
ON (`at_phone`.`entity_id` = `e`.`entity_id`)
AND (`at_phone`.`attribute_id` = '136')
Unfortunately, SELECT whole_mess_of_rows FROM many_tables ORDER BY one_col LIMIT small_number is a notorious performance antipattern. Why? Because it sorts a big result set, just to discard most of it.
The trick is to cheaply find out which rows are within that LIMIT small_number, then retrieve only those rows from the larger query.
Which rows do you want? It looks to me like this query will retrieve their customer_entity.entity_id values. But it's hard to be sure, so you should test this subquery.
SELECT e.entity_id
FROM customer_entity AS e
LEFT JOIN customer_entity_varchar AS at_firstname
       ON (at_firstname.entity_id = e.entity_id)
      AND (at_firstname.attribute_id = '5')
ORDER BY at_firstname.value ASC
LIMIT 20
This should give the twenty relevant entity_id values. Test it. Look at its execution plan. Add an appropriate index if need be; on customer_entity_varchar, that index might be (attribute_id, value, entity_id). But I am guessing.
Then you can put this at the end of your main query, right before ORDER BY. (MySQL does not allow LIMIT directly inside an IN subquery, so wrap it in a derived table.)
WHERE e.entity_id IN (
    SELECT entity_id FROM (
        SELECT ce.entity_id
        FROM customer_entity AS ce
        LEFT JOIN customer_entity_varchar AS at_firstname
               ON (at_firstname.entity_id = ce.entity_id)
              AND (at_firstname.attribute_id = '5')
        ORDER BY at_firstname.value ASC
        LIMIT 20
    ) AS first20
)
and things should be a bit faster.
I agree with the previous answers, but want to emphasize one more antipattern: over-normalization.
Your schema is a curious (and inefficient) variant on the already-bad EAV schema pattern.
There is little advantage, and some disadvantage, in splitting customer_address_entity_varchar across 5 tables. Similarly for customer_entity_varchar.
An address should (usually) be stored as a few columns in a single table; no JOINs to other tables.
Likewise for firstname+lastname.
Phone could be another issue, since a person/company/entity could have multiple phone numbers (cell, home, work, fax, etc). But that is a different story.
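As a purely illustrative sketch of the single-table shape being suggested (all column names and sizes here are hypothetical, not taken from the original schema):

```sql
CREATE TABLE customer (
    id                 INT UNSIGNED NOT NULL AUTO_INCREMENT,
    firstname          VARCHAR(100) NOT NULL,
    lastname           VARCHAR(100) NOT NULL,
    phone              VARCHAR(30)  NULL,   -- or a child table if multiple numbers are needed
    -- billing address inline: displaying it needs no JOINs at all
    company            VARCHAR(100) NULL,
    billing_postcode   VARCHAR(20)  NULL,
    billing_city       VARCHAR(100) NULL,
    billing_region     VARCHAR(100) NULL,
    billing_country_id CHAR(2)      NULL,
    PRIMARY KEY (id),
    INDEX idx_firstname (firstname)        -- makes ORDER BY firstname ... LIMIT 20 a cheap index scan
) ENGINE=InnoDB;
```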
I have a MySQL database table that's quite large, with 2.5 million rows and growing. To speed up queries I've added an index to one of the columns. When I set the index manually, through phpMyAdmin for example, its cardinality is around 1500, which seems about right, and my queries run without issue.
The problem comes after a few queries have been run (especially, but not only, after an INSERT): the cardinality of that index drops to 17 or 18, and queries run extremely slowly. Sometimes it seems to work its way back to about 1500, or I have to reset it through phpMyAdmin again.
Is there any way to stop this cardinality drop from happening?
CREATE TABLE IF NOT EXISTS `probe_results` (
`probe_result_id` int(11) NOT NULL AUTO_INCREMENT,
`date` date NOT NULL,
`month` int(11) NOT NULL,
`year` int(11) NOT NULL,
`time` time NOT NULL,
`type` varchar(11) NOT NULL,
`probe_id` varchar(50) NOT NULL,
`status` varchar(11) NOT NULL,
`temp_1` decimal(11,0) NOT NULL,
`temp_2` decimal(11,0) NOT NULL,
`crc` varchar(11) NOT NULL,
`raw_data` text NOT NULL,
`txt_file` text NOT NULL,
PRIMARY KEY (`probe_result_id`),
KEY `probe_id` (`probe_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=2527300 ;
The probe_result_id column is the primary key; probe_id is the column with the index in question.
Example query:
SELECT IF(b.reactive_total IS NULL, 0, b.reactive_total) AS reactive_total, a.* FROM (SELECT COUNT(CASE WHEN asset_testing_results.asset_testing_year = '2016' AND asset_testing_results.asset_testing_month = '7' AND asset_testing_results.asset_stopped = '0' AND asset_testing_results.asset_testing_completed = '0' THEN 1 END) AS due_total, (COUNT(CASE WHEN asset_testing_results.asset_testing_year = '2016' AND asset_testing_results.asset_stopped = '0' AND asset_testing_results.asset_testing_completed = '1' AND asset_testing_results.asset_testing_satisfactory = '1' AND asset_testing_results.asset_testing_actioned = '0' THEN 1 END)+(IF(probes_passed_total IS NULL, 0, probes_passed_total))) AS passed_total, (COUNT(CASE WHEN asset_testing_results.asset_testing_year = '2016' AND asset_testing_results.asset_stopped = '0' AND asset_testing_results.asset_testing_completed = '1' AND asset_testing_results.asset_testing_satisfactory = '0' AND asset_testing_results.asset_testing_actioned = '0' THEN 1 END)+(IF(probes_failed_total IS NULL, 0, probes_failed_total))) AS failed_total, COUNT(CASE WHEN asset_testing_results.asset_testing_year = '2016' AND asset_testing_results.asset_stopped = '0' AND asset_testing_results.asset_testing_completed = '1' AND asset_testing_results.asset_testing_actioned = '1' THEN 1 END) AS actioned_total, COUNT(CASE WHEN asset_testing_results.asset_testing_year = '2016' AND asset_testing_results.asset_testing_month < '7' AND asset_testing_results.asset_testing_completed = '0' AND asset_testing_results.asset_testing_satisfactory = '0' AND asset_testing_results.asset_stopped = '0' THEN 1 END) AS missed_total, site.site_key, site.site_name FROM site LEFT JOIN location ON location.site_key = site.site_key LEFT JOIN sub_location ON sub_location.location_key = location.location_key LEFT JOIN asset ON asset.sub_location_key = sub_location.sub_location_key AND asset.stopped = '0' LEFT JOIN asset_testing ON asset_testing.asset_type_key = asset.asset_type_key AND 
asset_testing.probe_assessed = '0' LEFT JOIN asset_testing_results ON asset_testing_results.asset_testing_key = asset_testing.asset_testing_key AND asset_testing_results.asset_key = asset.asset_key LEFT JOIN (SELECT site.site_key, COUNT(CASE WHEN p.probe_id IS NOT NULL AND p.asset_testing_key IS NOT NULL THEN 1 END) AS probes_passed_total, COUNT(CASE WHEN p.probe_id IS NOT NULL AND p.asset_testing_key IS NULL AND p.temp_1 IS NOT NULL THEN 1 END) AS probes_failed_total FROM assetsvs_probes LEFT JOIN (SELECT q.probe_id, q.month, q.year, IF(r.temp_1 IS NULL, q.temp_1, r.temp_1) as temp_1, r.asset_testing_key FROM (SELECT DISTINCT probe_results.probe_id, probe_results.month, probe_results.year, probe_results.temp_1 FROM probe_results LEFT JOIN assetsvs_probes ON assetsvs_probes.probe_id = probe_results.probe_id LEFT JOIN asset ON asset.asset_key = assetsvs_probes.asset_key LEFT JOIN sub_location ON sub_location.sub_location_key = asset.sub_location_key LEFT JOIN location ON location.location_key = sub_location.location_key LEFT JOIN site ON site.site_key = location.site_key WHERE site.client_key = '25')q LEFT JOIN (SELECT probe_results.month, probe_results.year, probe_results.probe_id, temp_1, asset_testing.asset_testing_key FROM probe_results LEFT JOIN assetsvs_probes ON assetsvs_probes.probe_id = probe_results.probe_id LEFT JOIN asset_testing ON asset_testing.asset_testing_key = assetsvs_probes.asset_testing_key LEFT JOIN asset ON asset.asset_key = assetsvs_probes.asset_key LEFT JOIN sub_location ON sub_location.sub_location_key = asset.sub_location_key LEFT JOIN location ON location.location_key = sub_location.location_key LEFT JOIN site ON site.site_key = location.site_key WHERE temp_1 != 'invalid' AND ((temp_1 >= test_min AND test_max = '') OR (temp_1 <= test_max AND test_min = '') OR (temp_1 >= test_min AND temp_1 <= test_max)) AND year = '2016' AND site.client_key = '25' GROUP BY probe_results.month, probe_results.year, probe_results.probe_id)r ON r.probe_id = 
q.probe_id AND r.month = q.month AND r.year = q.year WHERE q.year = '2016' GROUP BY probe_id, month, year) p ON p.probe_id = assetsvs_probes.probe_id LEFT JOIN asset_testing ON asset_testing.asset_testing_key = assetsvs_probes.asset_testing_key LEFT JOIN asset ON asset.asset_key = assetsvs_probes.asset_key LEFT JOIN sub_location ON sub_location.sub_location_key = asset.sub_location_key LEFT JOIN location ON location.location_key = sub_location.location_key LEFT JOIN site ON site.site_key = location.site_key GROUP BY site.site_key) probe_results ON probe_results.site_key = site.site_key WHERE site.client_key = '25' GROUP BY site.site_key)a LEFT JOIN (SELECT COUNT(CASE WHEN jobs.status = '3' THEN 1 END) AS reactive_total, site.site_key FROM jobs LEFT JOIN jobs_meta ON jobs_meta.job_id = jobs.job_id AND jobs_meta.meta_key = 'start_date' LEFT JOIN site ON site.site_key = jobs.site_key WHERE site.client_key = '25' AND jobs_meta.meta_value LIKE '%/2016 %' GROUP BY site.site_key)b ON b.site_key = a.site_key
Thanks
Cardinality (along with other statistics) is calculated and updated by MySQL automatically, so you do not have direct means to prevent it from dropping.
However, you can take a few steps to make this less likely to happen or correct the behaviour.
First of all, MySQL updates index statistics for all supported table engines when you run the ANALYZE TABLE command.
For the InnoDB engine, MySQL provides a set of configuration settings that can influence the sampling behaviour. The settings and their effects are described in the MySQL documentation:
non-persistent statistics
persistent statistics
The main setting is innodb_stats_transient_sample_pages:
• Small values like 1 or 2 can result in inaccurate estimates of cardinality.
• Increasing the innodb_stats_transient_sample_pages value might require more disk reads. Values much larger than 8 (say, 100) can cause a significant slowdown in the time it takes to open a table or execute SHOW TABLE STATUS.
• The optimizer might choose very different query plans based on different estimates of index selectivity.
For MyISAM, MySQL does not provide such a variety of settings. The myisam_stats_method setting is described in the general index statistics documentation.
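A minimal sketch of tuning these settings, assuming MySQL 5.6+ (the sample-page values are illustrative, not recommendations, and require sufficient privileges):

```sql
-- Transient (non-persistent) statistics: sample more pages per index
-- so the cardinality estimate is less likely to swing wildly.
SET GLOBAL innodb_stats_transient_sample_pages = 20;

-- Persistent statistics: recalculated by ANALYZE TABLE and kept across restarts.
SET GLOBAL innodb_stats_persistent = ON;
SET GLOBAL innodb_stats_persistent_sample_pages = 40;

-- Persistent stats can also be enabled per table:
ALTER TABLE scadenze STATS_PERSISTENT = 1, STATS_SAMPLE_PAGES = 40;

-- For MyISAM, only the NULL-handling method is configurable:
SET GLOBAL myisam_stats_method = 'nulls_unequal';
```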
I am creating a small message board and I am stuck.
I can select the subject, the original author, and the number of replies, but what I can't do is get the username, topic, or date of the last post.
There are 3 tables: boards, topics and messages.
I want to get the author, date, and topic of the last message in the messages table. The author and date fields already exist on the messages table, but I would need to join the messages and topics tables on the topicid field.
This is my query that selects the subject, author, and number of replies:
SELECT t.topicname, t.author, count( message ) AS message
FROM topics t
INNER JOIN messages m
ON m.topicid = t.topicid
INNER JOIN boards b
ON b.boardid = t.boardid
WHERE b.boardid = 1
GROUP BY t.topicname
Can anyone please help me get this finished?
This is what my tables look like
CREATE TABLE `boards` (
`boardid` int(2) NOT NULL auto_increment,
`boardname` varchar(255) NOT NULL default '',
PRIMARY KEY (`boardid`)
);
CREATE TABLE `messages` (
`messageid` int(6) NOT NULL auto_increment,
`topicid` int(4) NOT NULL default '0',
`message` text NOT NULL,
`author` varchar(255) NOT NULL default '',
`date` timestamp(14) NOT NULL,
PRIMARY KEY (`messageid`)
);
CREATE TABLE `topics` (
`topicid` int(4) NOT NULL auto_increment,
`boardid` int(2) NOT NULL default '0',
`topicname` varchar(255) NOT NULL default '',
`author` varchar(255) NOT NULL default '',
PRIMARY KEY (`topicid`)
);
If your SQL supports the LIMIT clause:
SELECT m.author, m.date, t.topicname FROM messages m
JOIN topics t ON m.topicid = t.topicid
ORDER BY date desc LIMIT 1
Otherwise:
SELECT m.author, m.date, t.topicname FROM messages m
JOIN topics t ON m.topicid = t.topicid
WHERE m.date = (SELECT max(m2.date) from messages m2)
EDIT: if you want to combine this with the original query, it has to be rewritten using subqueries to extract the message count and the date of the last message per topic:
SELECT t.topicname, t.author,
(select count(message) from messages m where m.topicid = t.topicid) AS messagecount,
lm.author, lm.date
FROM topics t
INNER JOIN messages lm
ON lm.topicid = t.topicid AND lm.date = (SELECT max(m2.date) FROM messages m2 WHERE m2.topicid = t.topicid)
INNER JOIN boards b
ON b.boardid = t.boardid
WHERE b.boardid = 1
GROUP BY t.topicname
Also notice that if you don't pick any field from the boards table, you don't need the last join:
SELECT t.topicname, t.author,
(select count(message) from messages m where m.topicid = t.topicid) AS messagecount,
lm.author, lm.date
FROM topics t
INNER JOIN messages lm
ON lm.topicid = t.topicid AND lm.date = (SELECT max(m2.date) FROM messages m2 WHERE m2.topicid = t.topicid)
WHERE t.boardid = 1
GROUP BY t.topicname
EDIT: if MySQL doesn't support subqueries in the field list, you can try this:
SELECT t.topicname, t.author, mc.messagecount, lm.author, lm.date
FROM topics t
JOIN (select m.topicid, count(*) as messagecount from messages m group by m.topicid) as mc
ON mc.topicid = t.topicid
JOIN messages lm
ON lm.topicid = t.topicid AND lm.date = (SELECT max(m2.date) FROM messages m2 WHERE m2.topicid = t.topicid)
WHERE t.boardid = 1
GROUP BY t.topicname
If you want to get the latest entry in a table, you should have a DateTime field that shows when the entry was created (or updated). You can then sort on this column and select the latest one.
If your id field is a numeric auto-increment, you could instead pick the highest id. But I would recommend against this, because it relies on assumptions about how ids are assigned and ties you to numerical ids in the future.
You can use a subselect. Eg.:
select * from messages where id = (select max(id) from messages)
EDIT: And if you identify the newest record by a timestamp, you'd use:
select * from messages where id = (
select id
from messages
order by post_time desc
limit 1)
With MySQL this should work:
SELECT author, date, topicname AS topic FROM messages LEFT JOIN topics ON messages.topicid = topics.topicid ORDER BY date DESC LIMIT 0, 1;