MySQL query optimization for extremely slow query

I'm a web developer and I'm posting for the first time on SO.
Today I'm asking for your help because I have already tried every possibility with no luck.
I created a SaaS web application that is used by salespeople in the field; it includes an offline mode so users don't need to be connected to use it.
As the database grows, the queries take more and more time to execute.
Today I'm facing a big issue: a query leads to a timeout when the user tries to display its results.
So, here's the dump:
SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
SET AUTOCOMMIT = 0;
START TRANSACTION;
SET time_zone = "+00:00";
-- 86 rows
CREATE TABLE t2 (
id_t2 int(11) NOT NULL,
quantite_t2 int(11) NOT NULL,
ca_t2 decimal(10,2) NOT NULL,
date_t2 date NOT NULL,
import_t2 datetime NOT NULL,
id_enseigne int(11) NOT NULL,
id_t3 int(11) NOT NULL,
annee_t2 int(11) NOT NULL,
mois_t2 int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- 2012065 rows
CREATE TABLE t1 (
id_t2 int(11) NOT NULL,
id_t0 bigint(20) NOT NULL,
id_t4 bigint(20) NOT NULL,
quantite_t1 int(11) NOT NULL,
ca_t1 decimal(10,2) NOT NULL,
pvc_moyen_t1 float NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- 388 rows
CREATE TABLE t4 (
id_t4 int(11) NOT NULL,
lib_t4 text NOT NULL,
libcourt_t4 varchar(255) NOT NULL,
ean_t4 varchar(255) NOT NULL,
pcb_t4 int(11) NOT NULL,
pcb2_t4 int(11) NOT NULL,
fam_t4 varchar(255) NOT NULL,
gam_t4 varchar(255) NOT NULL DEFAULT '0',
stat_t4 int(11) NOT NULL DEFAULT 1,
vmh_t4 decimal(10,2) NOT NULL,
detail_t4 text NOT NULL,
ingr_t4 text NOT NULL,
weight_t4 float NOT NULL,
lifetime_t4 varchar(255) NOT NULL,
pmc1_t4 float NOT NULL,
pmc2_t4 float NOT NULL,
dim_t4 decimal(10,2) NOT NULL,
ordre_t4 int(11) NOT NULL,
created_t4 datetime NOT NULL,
updated_t4 datetime NOT NULL,
updated_img_t4 datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- 1 row
CREATE TABLE t3 (
id_t3 int(11) NOT NULL,
nom_t3 text NOT NULL,
stat_t3 int(11) NOT NULL,
created_t3 datetime NOT NULL,
deleted_t3 datetime NOT NULL,
updated_t3 datetime NOT NULL,
ip_create_t3 text NOT NULL,
ip_delete_t3 text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
ALTER TABLE t2
ADD PRIMARY KEY (id_t2),
ADD KEY annee_t2 (annee_t2,mois_t2,date_t2);
ALTER TABLE t1
ADD PRIMARY KEY (id_t2,id_t0,id_t4);
ALTER TABLE t4
ADD PRIMARY KEY (id_t4,ean_t4),
ADD KEY ean_t4 (ean_t4),
ADD KEY id_t4 (id_t4);
ALTER TABLE t3
ADD PRIMARY KEY (id_t3);
ALTER TABLE t2
MODIFY id_t2 int(11) NOT NULL AUTO_INCREMENT;
ALTER TABLE t4
MODIFY id_t4 int(11) NOT NULL AUTO_INCREMENT;
ALTER TABLE t3
MODIFY id_t3 int(11) NOT NULL AUTO_INCREMENT;
COMMIT;
The query below takes about 4 minutes to execute:
-- execution time 228 seconds
SELECT SUM(t1.ca_t1) AS ca_t4, SUM(t1.quantite_t1) AS qte_t4,
t4.fam_t4, t4.gam_t4, t4.lib_t4, t4.ean_t4, t4.id_t4,
t2.annee_t2, t2.mois_t2, COUNT(t1.id_t0) AS count_mag,
t3.id_t3, t3.nom_t3
FROM t1 t1
INNER JOIN t2 t2 ON t2.id_t2 = t1.id_t2
LEFT JOIN t3 t3 ON t2.id_t3 = t3.id_t3
INNER JOIN t4 t4 ON t1.id_t4 = t4.ean_t4
WHERE t2.date_t2 BETWEEN "2017-05-01" AND "2019-05-01"
GROUP BY t2.annee_t2, t2.mois_t2, t4.id_t4
ORDER BY ca_t4 DESC;
I tried every optimization I know to reduce the execution time, but with no success...
The EXPLAIN shows this :
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t2 ALL PRIMARY NULL NULL NULL 86 Using where; Using temporary; Using filesort
1 SIMPLE t3 eq_ref PRIMARY PRIMARY 4 db.t2.id_t3 1
1 SIMPLE t1 ref PRIMARY,id_t2 PRIMARY 4 db.t2.id_t2 11266
1 SIMPLE t4 ALL ean_t4 NULL NULL NULL 388 Using where; Using join buffer (flat, BNL join)
Thank you for your help guys.

This looks like a fairly complex query, with multiple places that will slow it down. However, the first thing I notice is that a double lookup would have to occur in order to use the annee_t2 index, which is probably why the optimizer is not using it.
Try adding id_t3 to the end of that index on table t2:
(annee_t2,mois_t2,date_t2,id_t3)
This should permit the optimizer to use that index.
Run the query again (twice, to populate the buffer cache; report only the second result), and if it doesn't improve sufficiently, post the new EXPLAIN plan.
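For reference, a minimal sketch of that change (assuming the existing annee_t2 key can simply be dropped and recreated in place):
-- sketch only: same leading columns as before, with id_t3 appended so the optimizer can read it from the index
ALTER TABLE t2
DROP KEY annee_t2,
ADD KEY annee_t2 (annee_t2, mois_t2, date_t2, id_t3);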

The GROUP BY is probably improper since it does not include the t3 columns that are not aggregated.
Do you really want 2 years plus 1 day? Perhaps use this:
t2.date_t2 >= "2017-05-01"
AND t2.date_t2 < "2017-05-01" + INTERVAL 2 YEAR
Do not mix datatypes when JOINing -- ON t1.id_t4 = t4.ean_t4:
ean_t4 varchar(255) NOT NULL,
id_t4 bigint(20) NOT NULL,
(There may be other issues, but these should help.)
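Putting those suggestions together, a rough sketch of the reworked query might look like this (untested; it keeps the original join condition, since resolving the datatype mismatch between id_t4 and ean_t4 is really a schema decision, and it adds the t3 columns to the GROUP BY):
SELECT SUM(t1.ca_t1) AS ca_t4, SUM(t1.quantite_t1) AS qte_t4,
t4.fam_t4, t4.gam_t4, t4.lib_t4, t4.ean_t4, t4.id_t4,
t2.annee_t2, t2.mois_t2, COUNT(t1.id_t0) AS count_mag,
t3.id_t3, t3.nom_t3
FROM t1
INNER JOIN t2 ON t2.id_t2 = t1.id_t2
LEFT JOIN t3 ON t2.id_t3 = t3.id_t3
INNER JOIN t4 ON t1.id_t4 = t4.ean_t4
WHERE t2.date_t2 >= "2017-05-01"
AND t2.date_t2 < "2017-05-01" + INTERVAL 2 YEAR
GROUP BY t2.annee_t2, t2.mois_t2, t4.id_t4, t3.id_t3, t3.nom_t3
ORDER BY ca_t4 DESC;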

Related

Improving the performance of a MySQL query with a one-to-many relationship

I have a query in my DB that is taking 25 seconds to return results, which is way too long. It seems like it should be pretty simple. There are two tables: the main table (document) is a standard table with some data columns, and the join table is a mapping table with only two columns (parent_id, division_id). Previously there wasn't an index on the mapping table, so I added one; that changed the EXPLAIN to include the index, but it doesn't seem to have had any impact on performance.
The query looks like this:
explain SELECT DISTINCT doc.*
FROM document doc
LEFT JOIN multi_division_mapper divisions ON doc.id = divisions.parent_id
WHERE doc.clientId = 'SOME_GUID'
AND (divisions.division_id IS NULL OR divisions.division_id IN ('SOME_GUID'));
and the results of explain are:
Total number of rows in document: 6720
Total number of rows in mapper: 6173
From what I've been able to gather I need to improve either the "type" or the "extra" to make the query faster. What can I do here?
Create table statements:
CREATE TABLE `document` (
`id` varchar(36) NOT NULL,
`addedBy` varchar(255) DEFAULT NULL,
`addedDate` datetime NOT NULL,
`editedBy` varchar(255) DEFAULT NULL,
`editedDate` datetime NOT NULL,
`deleted` bit(1) DEFAULT NULL,
`clientId` varchar(36) NOT NULL,
`departmentId` varchar(36) DEFAULT NULL,
`documentParentId` varchar(36) DEFAULT NULL,
`documentParent` varchar(50) DEFAULT NULL,
`fileId` varchar(255) DEFAULT NULL,
`fileUrl` varchar(600) DEFAULT NULL,
`documentName` varchar(500) NOT NULL,
`displayName` varchar(255) NOT NULL,
`documentId` varchar(45) DEFAULT NULL,
`notes` varchar(1000) DEFAULT NULL,
`visibility` varchar(45) NOT NULL DEFAULT 'PRIVATE',
`documentType` varchar(45) NOT NULL,
`restrictDelete` bit(1) NOT NULL,
`customData` text,
`releaseDate` datetime NOT NULL,
`expirationDate` datetime NOT NULL,
`isApproved` bit(1) NOT NULL DEFAULT b'0',
`userSupplier` varchar(36) DEFAULT NULL,
`complianceCertificateId` varchar(36) DEFAULT NULL,
`Status` varchar(50) DEFAULT 'NEUTRAL',
PRIMARY KEY (`id`),
KEY `idx_client` (`clientId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `multi_division_mapper` (
`parent_id` varchar(36) NOT NULL,
`division_id` varchar(36) NOT NULL,
PRIMARY KEY (`parent_id`,`division_id`),
KEY `idx_parent` (`parent_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I was able to get a more favorable EXPLAIN report in a test by creating the following index:
ALTER TABLE multi_division_mapper
DROP INDEX idx_parent,
ADD INDEX (division_id, parent_id);
I also dropped idx_parent because it's redundant; it's a prefix of the primary key.
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE doc NULL ref idx_client idx_client 110 const 1 100.00 Using temporary
1 SIMPLE divisions NULL ref PRIMARY,division_id division_id 38 const 1 100.00 Using where; Using index; Distinct
The type: ref is better than type: index.
The query I tested is slightly different, but I believe it returns the same result:
SELECT DISTINCT doc.*
FROM document doc
LEFT JOIN multi_division_mapper divisions
ON doc.id = divisions.parent_id AND divisions.division_id in ('SOME_GUID')
WHERE doc.clientId = 'SOME_GUID'

Strange index behavior in MySQL

I usually pride myself on being a database pro, but I can't really wrap my head around this behavior. I hope someone can explain how this is working.
I have two MySQL tables, orders:
CREATE TABLE `orders` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`status` tinyint(4) NOT NULL,
`total` decimal(7,2) NOT NULL,
`date_created` datetime NOT NULL,
`date_updated` datetime NOT NULL,
`voucher_code` varchar(127) DEFAULT NULL,
`voucher_id` int(11) unsigned DEFAULT NULL,
`user_id` int(11) unsigned DEFAULT NULL,
`billing_address_id` int(11) unsigned NOT NULL,
`shipping_address_id` int(11) unsigned NOT NULL,
`reference_id` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `reference_id` (`reference_id`),
KEY `address_id` (`billing_address_id`)
) ENGINE=InnoDB AUTO_INCREMENT=168067 DEFAULT CHARSET=latin1;
and addresses:
CREATE TABLE `addresses` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`title` tinyint(4) DEFAULT NULL,
`first_name` varchar(255) NOT NULL,
`last_name` varchar(255) NOT NULL,
`street` varchar(255) NOT NULL,
`street2` varchar(255) DEFAULT NULL,
`company_name` varchar(255) DEFAULT NULL,
`city` varchar(45) NOT NULL,
`postcode` varchar(45) DEFAULT NULL,
`region` varchar(45) DEFAULT NULL,
`country` varchar(45) NOT NULL,
`phone` varchar(45) DEFAULT NULL,
`user_id` int(11) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_addresses_users1_idx` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=95277 DEFAULT CHARSET=latin1;
Now, as you can see, I have created an index in the orders table for billing_address_id, called address_id, which should match the address id.
This is the query I am trying to run:
SELECT
o.id, a.first_name, a.last_name, o.total, o.date_created
FROM
orders o USE INDEX FOR JOIN (PRIMARY) JOIN
addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
If I run the query without any index specification it will pick up and use the address_id index, which I would expect to be the fastest way to match the two tables.
Strangely enough, with the 'address_id' index the query runs in 2 seconds.
If I use the normal 'PRIMARY' index, which works on the order id, it takes 0.000 seconds.
This is bugging me. I thought I was supposed to create indexes to expedite the joining process between tables.
If I run EXPLAIN on the two queries I get:
EXPLAIN EXTENDED
SELECT o.id, a.first_name, a.last_name, o.total, o.date_created
FROM orders o
JOIN addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
1 SIMPLE a ALL PRIMARY 95234 100.00 Using temporary; Using filesort
1 SIMPLE o ref address_id address_id 4 my_basket.a.id 1 100.00
With the index:
EXPLAIN EXTENDED
SELECT o.id, a.first_name, a.last_name, o.total, o.date_created
FROM orders o USE INDEX FOR
JOIN (PRIMARY)
JOIN addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
1 SIMPLE o index PRIMARY 4 50 332632.00
1 SIMPLE a eq_ref PRIMARY PRIMARY 4 my_basket.o.billing_address_id 1 100.00
Thank you for finding the time to answer this question.
For ORDER BY ... LIMIT queries it will often be beneficial to use a query execution plan that avoids sorting. This is not necessarily because the sorting is expensive, but because it makes it possible to stop the query execution once the number of requested rows (here 50) has been found.
In your case, if one starts with table a, the full join result will have to be generated before selecting the "top" 50 rows. If you start with scanning table o using the PRIMARY index, the join result will be sorted on o.id, and the join execution can stop once 50 rows have been found.
The cost model used to select between the two approaches has been improved since MySQL 5.6. I suggest you try out MySQL 5.7 to see if the MySQL optimizer is now able to select the most optimal plan.
I'm surprised that the two queries even compile -- ORDER BY id is ambiguous since each table has a different id.
When doing a JOIN, always qualify all columns.
Meanwhile, remove the USE INDEX.
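With those two changes applied, the query would look something like this (a sketch only; whether the optimizer then picks the early-terminating PRIMARY-scan plan still depends on the MySQL version and statistics):
SELECT o.id, a.first_name, a.last_name, o.total, o.date_created
FROM orders o
JOIN addresses a ON a.id = o.billing_address_id
ORDER BY o.id DESC
LIMIT 0, 50;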

Why does my MySQL UPDATE query run slowly?

I have a SQL table "STG_S_CUST" which contains a lot of rows (up to 1.5 million) and another table "S_CUST" which also contains a lot of rows.
When I execute the following UPDATE query, it's very slow; it takes too much time.
UPDATE STG_S_CUST AS STG
INNER JOIN S_CUST AS ST ON STG.SRC_NM=ST.SRC_NM
AND STG.SRC_KEY = ST.SRC_KEY
SET UPDATE_IND = 1,
STG.S_ID = ST.S_ID,
STG.M_ID = ST.M_ID
WHERE STG.PROCESSED_IND = 0
The problem is that I get a timeout exception and the SQL cannot be executed.
EXPLAIN UPDATE STG_S_CUST AS STG
INNER JOIN S_CUST AS ST ON STG.SRC_NM=ST.SRC_NM
AND STG.SRC_KEY = ST.SRC_KEY
SET UPDATE_IND = 1,
STG.S_ID = ST.S_ID,
STG.M_ID = ST.M_ID
WHERE STG.PROCESSED_IND = 0
Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ST ALL NULL NULL NULL NULL 10479 NULL
1 SIMPLE STG ALL NULL NULL NULL NULL 159334 Using where; Using join buffer (Block Nested Loop)
Here's an abbreviated version of the CREATE TABLE statements.
STG_S_CUST :
CREATE TABLE `STG_S_CUST` (
`STG_ID` int(14) NOT NULL AUTO_INCREMENT,
`STG_DATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`SRC_KEY` varchar(100) DEFAULT NULL,
`SRC_NM` varchar(20) DEFAULT NULL,
`M_ID` int(14) DEFAULT NULL,
`S_ID` int(14) DEFAULT NULL,
`PROCESSED_IND` int(1) NOT NULL DEFAULT '0',
`THREAD_ID` int(3) DEFAULT NULL,
`UPDATE_IND` int(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`STG_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=171998 DEFAULT CHARSET=latin1
S_CUST :
CREATE TABLE `S_CUST` (
`S_ID` int(14) NOT NULL AUTO_INCREMENT,
`SRC_KEY` varchar(100) DEFAULT NULL,
`SRC_NM` varchar(20) DEFAULT NULL,
`M_ID` int(14) DEFAULT NULL,
PRIMARY KEY (`S_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=10803 DEFAULT CHARSET=latin1
Does anyone have any idea why this is so slow and how to speed it up?
Could anyone help me with the optimization?
You need some indexes to make the select part of the join update faster. Start by adding the following:
alter table STG_S_CUST add index PROCESSED_IND_idx(PROCESSED_IND);
alter table STG_S_CUST add index SRC_idx(SRC_NM,SRC_KEY);
alter table S_CUST add index SRC_NM_idx(SRC_NM);
Take a backup of the tables first before applying the indexes
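If the optimizer still scans S_CUST after that, one further option (going beyond the indexes listed above, so treat it as an assumption) is an index on S_CUST covering both join columns, since the join matches on SRC_NM and SRC_KEY together:
-- covers both columns used in the ON clause
alter table S_CUST add index SRC_NM_KEY_idx(SRC_NM, SRC_KEY);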

select count, group by and having optimization

I have this query
SELECT
t2.counter_id,
t2.hash_counter,
count(1) AS cnt
FROM
table1 t1
RIGHT JOIN
table2 t2 USING(counter_id)
WHERE
t2.hash_id = 973
GROUP BY
t1.counter_id
HAVING
cnt < 8000
Here are the tables.
CREATE TABLE `table1` (
`id` varchar(255) NOT NULL,
`platform` varchar(32) DEFAULT NULL,
`version` varchar(10) DEFAULT NULL,
`edition` varchar(2) NOT NULL DEFAULT 'us',
`counter_id` int(11) NOT NULL,
`created_on` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `counter_id` (`counter_id`)
) ENGINE=InnoDB
CREATE TABLE `table2` (
`counter_id` int(11) NOT NULL AUTO_INCREMENT,
`hash_id` int(11) DEFAULT NULL,
`hash_counter` int(11) DEFAULT NULL,
PRIMARY KEY (`counter_id`),
UNIQUE KEY `counter_key` (`hash_id`,`hash_counter`)
) ENGINE=InnoDB
The "EXPLAIN" shows "Using index; Using temporary; Using filesort" for table t2. Is there any way to get rid off temporary/filesort ? or any other ideas about optimizing this guy.
Your comment above gives more insight into what you want. It is always better to explain more about what you are trying to achieve - just looking at the non-working SQL leads people down the wrong path.
So, you want to know which table2 rows have < 8000 table1 rows?
Why not this:
select *
from table2 as t2
where hash_id = 973
and (select count(*) from table1 as t1 where t1.counter_id = t2.counter_id) < 8000
;
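For reference, if you prefer to keep the aggregate form, grouping on the t2 side and counting only matched t1 rows expresses the same thing (a sketch; the subquery version above is still the one that sidesteps the GROUP BY machinery):
SELECT
t2.counter_id,
t2.hash_counter,
COUNT(t1.counter_id) AS cnt
FROM
table2 t2
LEFT JOIN
table1 t1 USING(counter_id)
WHERE
t2.hash_id = 973
GROUP BY
t2.counter_id, t2.hash_counter
HAVING
cnt < 8000;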

Why does MySQL stop using an index for a join when I select non-indexed fields in the field list

I have the following two tables:
CREATE TABLE `temporal_expressions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`dated_obj_type` varchar(255) DEFAULT NULL,
`dated_obj_id` int(11) DEFAULT NULL,
`start_date` datetime DEFAULT NULL,
`end_date` datetime DEFAULT NULL,
`start_time` int(11) DEFAULT NULL,
`end_time` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`lock_version` int(11) NOT NULL DEFAULT '0',
`wday` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `te_search` (`dated_obj_type`,`dated_obj_id`,`start_date`,`end_date`),
KEY `te_calendar` (`dated_obj_type`,`dated_obj_id`,`start_date`,`end_date`,`start_time`,`end_time`),
KEY `te_search_wday` (`dated_obj_type`,`dated_obj_id`,`start_date`,`end_date`,`wday`),
KEY `te_calendar_wday` (`dated_obj_type`,`dated_obj_id`,`start_date`,`end_date`,`start_time`,`end_time`,`wday`),
KEY `te_index` (`wday`,`dated_obj_type`,`start_date`,`end_date`,`start_time`,`end_time`,`dated_obj_id`)
) ENGINE=InnoDB AUTO_INCREMENT=8162445 DEFAULT CHARSET=latin1
CREATE TABLE `asset_blocks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`block_type` int(11) DEFAULT '0',
`spaces_left` int(11) DEFAULT NULL,
`provider_note` varchar(255) DEFAULT NULL,
`extra_data` text,
`lock_version` int(11) DEFAULT '0',
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`type` varchar(255) DEFAULT NULL,
`service_provider_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `type` (`type`,`id`),
KEY `service_provider_id` (`service_provider_id`,`type`,`id`)
) ENGINE=InnoDB AUTO_INCREMENT=516867 DEFAULT CHARSET=latin1
If I run explain on this query (note that I am only selecting fields in the te_calendar_wday index from temporal_expressions) it uses the index for the join as expected
EXPLAIN SELECT asset_blocks.*, temporal_expressions.id,
temporal_expressions.dated_obj_type, temporal_expressions.dated_obj_id,
temporal_expressions.start_date, temporal_expressions.end_date,
temporal_expressions.start_time
FROM `asset_blocks`
LEFT OUTER JOIN `temporal_expressions`
ON `temporal_expressions`.dated_obj_id = `asset_blocks`.id
AND `temporal_expressions`.dated_obj_type = 'AssetBlock'
WHERE ( temporal_expressions.start_date <= '2010-11-25'
AND temporal_expressions.end_date >= '2010-11-01'
AND temporal_expressions.start_time < 1000 AND temporal_expressions.end_time > 1200
AND temporal_expressions.wday IN (1,2,3,4,5,6)
AND asset_blocks.id IN (1,2,3,4,5,6,7,8,9) )
1 SIMPLE temporal_expressions range te_search,te_calendar,te_search_wday,te_calendar_wday,te_index te_calendar_wday 272 NULL 9 Using where; Using index
1 SIMPLE asset_blocks eq_ref PRIMARY PRIMARY 4 lb_production.temporal_expressions.dated_obj_id 1
However, if I run this query (note that I have added a non-indexed field to the field list) it no longer uses the index (it uses a join buffer). Is this intentional or am I missing something?
EXPLAIN SELECT asset_blocks.*, temporal_expressions.id,
temporal_expressions.dated_obj_type, temporal_expressions.dated_obj_id,
temporal_expressions.start_date, temporal_expressions.end_date,
temporal_expressions.start_time, temporal_expressions.created_at
FROM `asset_blocks`
LEFT OUTER JOIN `temporal_expressions`
ON `temporal_expressions`.dated_obj_id = `asset_blocks`.id
AND `temporal_expressions`.dated_obj_type = 'AssetBlock'
WHERE ( temporal_expressions.start_date <= '2010-11-25'
AND temporal_expressions.end_date >= '2010-11-01'
AND temporal_expressions.start_time < 1000 AND temporal_expressions.end_time > 1200
AND temporal_expressions.wday IN (1,2,3,4,5,6)
AND asset_blocks.id IN (1,2,3,4,5,6,7,8,9) )
1 SIMPLE asset_blocks range PRIMARY PRIMARY 4 NULL 9 Using where
1 SIMPLE temporal_expressions range te_search,te_calendar,te_search_wday,te_calendar_wday,new_te_index te_search 272 NULL 9 Using where; Using join buffer
I cannot be sure if this is the case here, but:
If you select only indexed fields, MySQL can answer the whole query out of the index and does not even load the table data file.
If you select a field that is not indexed, it has to load the table data.
When making its execution plan, in certain cases (see comment) MySQL decides to do a full table scan although an index is present. This is because it's much quicker to read all data blindly than to look up every entry in the index and then read the data.
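If keeping the query covered matters here, one option (a sketch, assuming you are willing to add yet another index to this already index-heavy table) is to extend the covering index with the extra column so the non-indexed field no longer forces a lookup into the table data:
-- same columns as te_calendar_wday plus created_at, so the second query is covered again
ALTER TABLE temporal_expressions
ADD KEY te_calendar_wday_created (dated_obj_type, dated_obj_id, start_date, end_date,
start_time, end_time, wday, created_at);
The existing te_calendar_wday index would then be redundant (it is a prefix of the new one) and could be dropped.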