MySQL query with ORDER BY: strange execution plan

I have 3 tables:
Product (about 700000 rows)
ProductId int(11) AI PK
ManufacturerId int(11) FK
Name varchar(256)
Description text
SKU varchar(64)
Code varchar(64)
ArtId int(11)
StockStateId int(2) FK
Quantity int(11)
QuantityText varchar(61)
Price decimal(12,2)
CurrencyId int(2)
AutoImport bit(1)
ImpactOnBalance bit(1)
HasPhoto bit(1)
HasParams bit(1)
StockState (3 rows)
StockStateId int(2) AI PK
Name varchar(64)
Manufacturer (about 200 rows)
ManufacturerId int(11) AI PK
Name varchar(64)
Description text
SortOrder int(11)
This is my query
select
p.ProductId
,p.Name
,p.Quantity
,p.QuantityText
,m.ManufacturerId
,m.Name as ManufacturerName
,ss.StockStateId
,ss.Name as StockStateName
from Product p
inner join Manufacturer m on m.ManufacturerId = p.ManufacturerId
inner join StockState ss on ss.StockStateId = p.StockStateId
order by p.ProductId asc
limit 1000, 25
I cannot understand why MySQL doesn't use the right indexes (it takes ~10s to get the result). The execution plan looks like this:
(image: execution plan of the first query)
I can force MySQL to use the primary index:
from Product p force index (primary)
That brings the execution time down to 0.015s, but I'm going to use this query in a stored procedure where the order depends on an input parameter. So I've added a dummy CASE condition:
set @order = '';
select
p.ProductId
,p.Name
,p.Quantity
,p.QuantityText
,m.ManufacturerId
,m.Name as ManufacturerName
,ss.StockStateId
,ss.Name as StockStateName
from Product p force index (primary)
inner join Manufacturer m on m.ManufacturerId = p.ManufacturerId
inner join StockState ss on ss.StockStateId = p.StockStateId
order by case when @order = '' then p.ProductId end asc
limit 1000, 25
This query should have the same execution plan as the previous one (it orders by the same column, which is the PK), but it doesn't: I get a filesort.
(image: execution plan of the third query)
Why is this? Could someone help me fix this (improve the query performance)?

I think you should try this; it follows the same idea as a left join, with the Product columns written first in the join conditions:
set @order = '';
select
p.ProductId
,p.Name
,p.Quantity
,p.QuantityText
,m.ManufacturerId
,m.Name as ManufacturerName
,ss.StockStateId
,ss.Name as StockStateName
from Product p force index (primary)
-- this is the line I mean
inner join Manufacturer m on p.ManufacturerId = m.ManufacturerId
-- and this line too
inner join StockState ss on p.StockStateId = ss.StockStateId
order by case when @order = '' then p.ProductId end asc
limit 1000, 25
Because you must check the row in Product first, then join to Manufacturer.
If you check Manufacturer first, it will be a problem when a row in Manufacturer does not exist in Product (wasted time).

Related

Why does an index not work as expected in MySQL?

I really want to know why my index is not working.
I have two tables, post and post_log.
create table post
(
id int auto_increment
primary key,
comment int null,
is_used tinyint(1) default 1 not null,
is_deleted tinyint(1) default 0 not null
);
create table post_log
(
id int auto_increment
primary key,
post_id int not null,
created_at datetime not null,
user int null,
constraint post_log_post_id_fk
foreign key (post_id) references post (id)
);
create index post_log_created_at_index
on post_log (created_at);
When I run the query below, the created_at index is used as expected.
explain
SELECT *
FROM post p
INNER JOIN post_log pl ON p.id = pl.post_id
WHERE pl.created_at > DATE('2022-06-01')
AND pl.created_at < DATE('2022-06-08')
AND p.is_used is TRUE
AND p.is_deleted is FALSE;
When I run the query below, the index is not used and the post table gets a full scan.
explain
SELECT *
FROM post p
INNER JOIN post_log pl ON p.id = pl.post_id
WHERE pl.created_at > DATE('2022-06-01')
AND pl.created_at < DATE('2022-06-08')
AND p.is_used = 1
AND p.is_deleted = 0;
And the query below does not use it either.
explain
SELECT *
FROM post p
INNER JOIN post_log pl ON p.id = pl.post_id
WHERE pl.created_at > DATE('2022-06-01')
AND pl.created_at < DATE('2022-06-08')
AND p.comment = 111;
What is the difference between 'tinyint = 1' and 'tinyint IS TRUE'?
And why does the first query use the index while the others do not?
When making the query plan, MySQL has to decide whether to first filter the post_log table using the index, or first filter the post table using the is_used and is_deleted columns.
= 1 tests for the specific value 1, while IS TRUE is true for any non-zero value. I guess it decides that when you're searching for specific values, it will be more efficient to filter the post table first because there will likely be fewer matches (since these columns aren't indexed, it doesn't know that 0 and 1 are the only values).
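The difference between the two predicates can be checked on a small table. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (assuming SQLite 3.23 or newer, which accepts IS TRUE), and the three sample rows are made up. It shows the semantics only, not the optimizer's choice of plan.

```python
import sqlite3

# SQLite is a stand-in here (an assumption, not the asker's setup).
# Like MySQL, it treats any non-zero number as true for IS TRUE,
# while "= 1" matches only the value 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, is_used INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO post (id, is_used) VALUES (?, ?)",
    [(1, 0), (2, 1), (3, 2)],  # hypothetical rows; a tinyint can hold 2 as well
)

ids_is_true = [row[0] for row in conn.execute(
    "SELECT id FROM post WHERE is_used IS TRUE ORDER BY id")]
ids_equals_one = [row[0] for row in conn.execute(
    "SELECT id FROM post WHERE is_used = 1 ORDER BY id")]

print("IS TRUE matches ids:", ids_is_true)
print("= 1 matches ids:", ids_equals_one)
```

The row with is_used = 2 is matched by IS TRUE but not by = 1, which is why the two conditions are not interchangeable from the optimizer's point of view.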

MySQL COUNT gives me very bad performance, am I doing it wrong?

When I want to get the count for a left join query, it takes a very long time;
I cancelled the query after 1 minute without getting a result.
I have two tables.
One is customer, it looks like:
----------------customer---------------
`ID` int(11) NOT NULL AUTO_INCREMENT,
`drpc` int(10) DEFAULT NULL,
`VIN` varchar(60) COLLATE utf8_bin DEFAULT NULL,
`cph` varchar(30) COLLATE utf8_bin DEFAULT NULL,
//... another 60+ columns here
`invalid` int(1) DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `index_drpc_cph` (`drpc`,`cph`),
KEY `index_drpc_vin` (`drpc`,`VIN`),
KEY `index_drpc_invalid` (`drpc`,`invalid`),
KEY `index_cph` (`cph`)
The other is repair, and it looks like:
-------------repair----------------
`ID` int(11) NOT NULL AUTO_INCREMENT,
`drpc` int(10) NOT NULL,
`cph` varchar(10) DEFAULT NULL,
`czbh` varchar(15) DEFAULT NULL,
`gdh` varchar(12) DEFAULT NULL,
`kdrq` date DEFAULT NULL,
// ... another 20+ columns here
`invalid` int(1) DEFAULT '0',
PRIMARY KEY (`ID`),
KEY `gmrepair_cph` (`cph`),
KEY `gmrepair_czbh` (`czbh`),
KEY `gmrepair_gdh` (`gdh`),
KEY `gmrepair_drpc_kdrq` (`drpc`,`kdrq`),
KEY `index_drpc_invalid` (`drpc`,`invalid`),
KEY `index_drpc_cph` (`drpc`,`cph`)
Both tables have a field: 'cph'.
The original requirement is: for given drpc values, get the rows whose cph exists in customer but does not exist in repair.
My sql statement looks like this:
SELECT * FROM customer c LEFT JOIN
( SELECT cph FROM repair b WHERE b.drpc=77) r ON c.cph = r.cph
WHERE c.drpc = 76 AND r.cph IS NULL
Here is the explain result:
BTW,
for drpc = 77 in repair table, there are about 20k records;
for drpc = 76 in customer table, there are about 60k records.
Both tables use the InnoDB storage engine.
It takes about 3 seconds to execute the sql above.
But when I want to get the count of the rows that the SQL above returns, it takes a very long time. It does not finish even in 60 seconds.
I am not sure what the issue is.
Could you please give me some pointers, thanks a million!
Try left outer join instead of left join.
SELECT c.*
FROM customer c
LEFT OUTER JOIN (SELECT cph
FROM repair WHERE drpc = 77) r ON c.cph = r.cph
WHERE c.drpc = 76 AND r.cph IS NULL
My understanding is that the query you provide:
SELECT * FROM customer c LEFT JOIN
( SELECT cph FROM repair b WHERE b.drpc=77) r ON c.cph = r.cph
WHERE c.drpc = 76 AND r.cph IS NULL
should be the same as a simple NOT IN subquery (this is the count version):
select count(*) from customer c
where c.drpc = 76 and c.cph not in (
select cph from repair where drpc = 77
)
Does this second query take too long too?
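One caveat about treating NOT IN as equivalent to the left join: cph is nullable in both tables, and a single NULL in the subquery result makes NOT IN match nothing. The sketch below illustrates this with made-up rows, using Python's sqlite3 module as a stand-in for MySQL (an assumption; both follow the standard NULL comparison rules here).

```python
import sqlite3

# Hypothetical sample rows; SQLite stands in for MySQL because both use
# three-valued logic for comparisons with NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, drpc INTEGER, cph TEXT);
CREATE TABLE repair (id INTEGER PRIMARY KEY, drpc INTEGER NOT NULL, cph TEXT);
INSERT INTO customer (drpc, cph) VALUES (76, 'A1'), (76, 'B2'), (76, 'C3');
INSERT INTO repair (drpc, cph) VALUES (77, 'A1'), (77, NULL);
""")


def count(sql):
    """Run a COUNT(*) query and return the single number it yields."""
    return conn.execute(sql).fetchone()[0]


# Anti-join written as LEFT JOIN ... IS NULL.
left_join_count = count("""
    SELECT COUNT(*) FROM customer c
    LEFT JOIN repair r ON r.cph = c.cph AND r.drpc = 77
    WHERE c.drpc = 76 AND r.cph IS NULL""")

# The same anti-join written with NOT EXISTS.
not_exists_count = count("""
    SELECT COUNT(*) FROM customer c
    WHERE c.drpc = 76
      AND NOT EXISTS (SELECT 1 FROM repair r
                      WHERE r.drpc = 77 AND r.cph = c.cph)""")

# NOT IN: the NULL cph in repair makes every comparison unknown.
not_in_count = count("""
    SELECT COUNT(*) FROM customer c
    WHERE c.drpc = 76
      AND c.cph NOT IN (SELECT cph FROM repair WHERE drpc = 77)""")

print(left_join_count, not_exists_count, not_in_count)
```

If the real repair rows for drpc = 77 never have a NULL cph, the three forms agree; otherwise adding "AND cph IS NOT NULL" inside the NOT IN subquery restores the intended result.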
It always helps to look at the explain for the plans. It looks like the index on drpc, cph should be used for the query.
However, if your base query works, perhaps this will give you better performance.
select count(*)
from (SELECT *
FROM customer c LEFT JOIN
(SELECT distinct cph
FROM repair b
WHERE b.drpc=77
) r
ON c.cph = r.cph
WHERE c.drpc = 76 AND r.cph IS NULL
) t;
EDIT:
You may be able to force the execution plan by phrasing the query like this:
select count(*)
from customer c
where c.drpc = 76 and
not exists (select 1 from repair r where r.drpc = 77 and r.cph = c.cph);
I don't understand why nobody else mentioned it, but the subquery in your query prevents indexes from being used efficiently: you are actually left joining to an unindexed derived table with 20k rows.
For the query you need 2 indexes:
(drpc, cph) on customer and (cph, drpc) on repair (mind the column order; you do not have this one yet).
Then you need to rewrite the query:
SELECT COUNT(*)
FROM customer c
LEFT JOIN repair r ON c.cph = r.cph AND r.drpc = 77
WHERE c.drpc = 76 AND r.cph IS NULL;
I think I have found the real trick.
The left join field cph, which is a varchar(10), is what made the left join so VERY slow.
I created a new column, hash_cph numeric(30,0), on both tables and then converted cph to a number derived from its MD5 hash in this way:
UPDATE customer SET hash_cph = CONV(RIGHT(MD5(cph),16),16,10);
Now I can apply the left join on the newly created column hash_cph, and it is much faster.
The final SQL looks like:
SELECT COUNT(*)
FROM customer c
LEFT JOIN repair r ON c.hash_cph = r.hash_cph AND r.drpc = 32
WHERE c.drpc = 1 AND r.hash_cph IS NULL;
btw, I also added index on drpc and hash_cph for both tables.
Thanks for everyone's help!!
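For reference, the CONV(RIGHT(MD5(cph),16),16,10) expression above keeps the last 16 hex digits of the MD5 digest, that is its low 64 bits, and writes them as a decimal number. A minimal Python sketch of the same computation follows; the function name hash_cph and the sample value are hypothetical, and UTF-8 encoding of the column value is an assumption.

```python
import hashlib


def hash_cph(cph: str) -> int:
    """Mirror CONV(RIGHT(MD5(cph), 16), 16, 10).

    Takes the last 16 hex digits (64 bits) of the MD5 digest and reads
    them as an unsigned integer. Assumes the value is UTF-8 encoded.
    """
    hex_digest = hashlib.md5(cph.encode("utf-8")).hexdigest()
    return int(hex_digest[-16:], 16)


sample = "ABC123"  # hypothetical cph value, not taken from the tables
value = hash_cph(sample)
print(sample, "->", value)
```

Because only 64 bits of the digest are kept, two different cph values could in principle map to the same number, so a join on hash_cph alone is not strictly the same as a join on cph.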

MySQL query taking a lot of time to fetch data

I am fetching data from multiple tables that hold 600 thousand records, but it takes a very long time to fetch.
Please let me know how I can shorten the fetch time.
I have also tried adding a LIMIT clause, but there was still no improvement.
My query is:
SELECT DISTINCT
tf_history.thefind_id,
tf_product.product_id,
tf_product.`name`,
tf_product.product_url,
tf_product.image_tpm,
tf_product.image_thefind,
tf_product.image_accuracy,
(SELECT MIN(tf_h.price)
FROM tf_history AS tf_h
WHERE tf_h.thefind_id = tf_history.thefind_id) as price,
oc_product.price AS priceTPM
FROM tf_product
LEFT JOIN tf_history ON tf_product.product_id = tf_history.product_id
AND tf_product.thefind_id = tf_history.thefind_id
LEFT JOIN oc_product ON tf_product.product_id = oc_product.product_id
WHERE tf_product.product_id = @product_id
My tables:
tf_history
history_id int(11) NO PRI auto_increment
thefind_id int(11) NO
product_id int(11) NO
price decimal(15,4) NO
date datetime NO
and
tf_product
thefind_id int(11) NO PRI auto_increment
product_id int(11) NO
name varchar(255) NO
store_id int(11) NO
product_url varchar(255) NO
image_tpm varchar(255) NO
image_thefind varchar(255) NO
image_accuracy int(3) NO
date datetime NO
But when I use this query:
SELECT * from tf_history
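I get the result in 0.641s, so what can be the issue?
When there were fewer records, the first query ran smoothly.
Finally got the result by using indexes.
Using an index will solve your problem: in the schema shown, tf_history has no index on thefind_id or product_id, the columns that the join and the subquery look up.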
Move the subselect column into the join
SELECT DISTINCT
th.thefind_id,
tp.product_id,
tp.`name`,
tp.product_url,
tp.image_tpm,
tp.image_thefind,
tp.image_accuracy,
th.price,
op.price AS priceTPM
FROM tf_product tp
LEFT JOIN (SELECT thefind_id, product_id, MIN(price) as price
FROM tf_history
group by thefind_id, product_id) th ON tp.product_id = th.product_id
AND tp.thefind_id = th.thefind_id
LEFT JOIN oc_product op ON tp.product_id = op.product_id
WHERE tp.product_id = @product_id

MySQL query -- retrieve data from two different years

I cannot seem to get this MySQL query right. My table contains yearly inventory data for retail stores. Here's the table schema:
CREATE TABLE IF NOT EXISTS `inventory_data` (
inventory_id int unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
store_id smallint unsigned NOT NULL,
inventory_year smallint unsigned NOT NULL,
shortage_dollars decimal(10,2) unsigned NOT NULL
)
engine=INNODB;
Every store is assigned to a district, which is recorded in this table (some non-relevant fields removed):
CREATE TABLE IF NOT EXISTS `stores` (
store_id smallint unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
district_id smallint unsigned not null
)
engine=INNODB;
I want to be able to retrieve the shortage dollar amounts for two given years for all the stores within a given district. Inventory data for each store is only added to the inventory_data table when the inventory is completed, so not every store within a district will be represented at all times.
This query works to return inventory data for all stores within a given district for a given year (ex: stores in district 1 for 2012):
SELECT stores.store_id, inventory_data.shortage_dollars
FROM stores
LEFT JOIN inventory_data ON (stores.store_id = inventory_data.store_id)
AND inventory_data.inventory_year = 2012
WHERE stores.district_id = 1
But, I need to be able to get data for stores within a district for two years, such that the data looks something close to this:
store_id | yr2011 | yr2012
For the specific result format that you need, you may try the following query:
SELECT `s`.`store_id`, `i`.`shortage_dollars` AS `yr2011`, `i1`.`shortage_dollars` AS `yr2012`
FROM `stores` `s`
LEFT JOIN `inventory_data` `i` ON `s`.`store_id` = `i`.`store_id`
AND `i`.`inventory_year` = 2011
LEFT JOIN `inventory_data` `i1` ON `s`.`store_id` = `i1`.`store_id`
AND `i1`.`inventory_year` = 2012
WHERE `s`.`district_id` = 1
Alternatively, you may as well try the next simpler query.
SELECT `s`.`store_id`, `i`.`inventory_year`, `i`.`shortage_dollars`
FROM `stores` `s`
LEFT JOIN `inventory_data` `i` ON `s`.`store_id` = `i`.`store_id`
WHERE `s`.`district_id` = 1
AND `i`.`inventory_year` IN (2011, 2012)
ORDER BY `s`.`store_id`, `i`.`inventory_year`
Hope it helps!
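A note on the second, simpler query: because the year filter sits in the WHERE clause, stores without any 2011 or 2012 inventory row are dropped, whereas putting the filter in the ON clause keeps them with NULLs. The sketch below shows the difference on made-up rows, using Python's sqlite3 module as a stand-in for MySQL (an assumption; the join semantics are the same).

```python
import sqlite3

# Hypothetical data: store 3 has no inventory rows at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (store_id INTEGER PRIMARY KEY, district_id INTEGER NOT NULL);
CREATE TABLE inventory_data (
    inventory_id INTEGER PRIMARY KEY,
    store_id INTEGER NOT NULL,
    inventory_year INTEGER NOT NULL,
    shortage_dollars REAL NOT NULL
);
INSERT INTO stores VALUES (1, 1), (2, 1), (3, 1);
INSERT INTO inventory_data (store_id, inventory_year, shortage_dollars) VALUES
    (1, 2011, 100.5), (1, 2012, 80.0), (2, 2012, 40.25);
""")

# Year filter in WHERE: rows where i.* is NULL fail the test and disappear.
filter_in_where = conn.execute("""
    SELECT s.store_id, i.inventory_year, i.shortage_dollars
    FROM stores s
    LEFT JOIN inventory_data i ON s.store_id = i.store_id
    WHERE s.district_id = 1 AND i.inventory_year IN (2011, 2012)
    ORDER BY s.store_id, i.inventory_year""").fetchall()

# Year filter in ON: every store in the district is kept.
filter_in_on = conn.execute("""
    SELECT s.store_id, i.inventory_year, i.shortage_dollars
    FROM stores s
    LEFT JOIN inventory_data i ON s.store_id = i.store_id
        AND i.inventory_year IN (2011, 2012)
    WHERE s.district_id = 1
    ORDER BY s.store_id, i.inventory_year""").fetchall()

stores_in_where = sorted({row[0] for row in filter_in_where})
stores_in_on = sorted({row[0] for row in filter_in_on})
print(stores_in_where)
print(stores_in_on)
```

Which form is right depends on whether stores with no completed inventory should appear in the result.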
SELECT
stores.store_id,
inventory_data.inventory_year,
inventory_data.shortage_dollars
FROM
(SELECT * FROM stores WHERE district_id = 1) stores
LEFT JOIN
(SELECT * FROM inventory_data
WHERE inventory_year IN (2011,2012)) inventory_data
USING (store_id)
;
or
SELECT
stores.store_id,
GROUP_CONCAT(inventory_data.shortage_dollars) dollars_per_year
FROM
(SELECT * FROM stores WHERE district_id = 1) stores
LEFT JOIN
(SELECT * FROM inventory_data
WHERE inventory_year IN (2011,2012)) inventory_data
USING (store_id)
GROUP BY stores.store_id;

Mysql query to check if all sub_items of a combo_item are active

I am trying to write a query that looks through all combo_items and only returns the ones where all sub_items that it references have Active=1.
I think I should be able to count how many sub_items there are in a combo_item total and then compare it to how many are Active, but I am failing pretty hard at figuring out how to do that...
My table definitions:
CREATE TABLE `combo_items` (
`c_id` int(11) NOT NULL,
`Label` varchar(20) NOT NULL,
PRIMARY KEY (`c_id`)
);
CREATE TABLE `sub_items` (
`s_id` int(11) NOT NULL,
`Label` varchar(20) NOT NULL,
`Active` int(1) NOT NULL,
PRIMARY KEY (`s_id`)
);
CREATE TABLE `combo_refs` (
`r_id` int(11) NOT NULL,
`c_id` int(11) NOT NULL,
`s_id` int(11) NOT NULL,
PRIMARY KEY (`r_id`)
);
So for each combo_item, there are at least 2 rows in the combo_refs table linking to its sub_items. My brain is about to make bigbadaboom :(
I would usually just join the three tables and then, per combo-item, sum up the total number of sub-items and the number of active sub-items:
SELECT ci.c_id, ci.Label, SUM(1) AS total_sub_items, SUM(si.Active) AS active_sub_items
FROM combo_items AS ci
INNER JOIN combo_refs AS cr ON cr.c_id = ci.c_id
INNER JOIN sub_items AS si ON si.s_id = cr.s_id
GROUP BY ci.c_id
Of course, instead of using SUM(1) you could just say COUNT(ci.c_id), but I wanted an analog of SUM(si.Active).
The approach proposed assumes Active to be 1 (active) or 0 (not active).
To get only those combo-items all of whose sub-items are active, compare the two sums in a HAVING clause. (Adding WHERE si.Active = 1 instead would only drop the inactive rows before grouping, so a combo-item with at least one active sub-item would still be returned.) You can then drop the SUM columns from the select list if you do not need them:
SELECT ci.c_id, ci.Label
FROM combo_items AS ci
INNER JOIN combo_refs AS cr ON cr.c_id = ci.c_id
INNER JOIN sub_items AS si ON si.s_id = cr.s_id
GROUP BY ci.c_id
HAVING SUM(si.Active) = COUNT(*)
By the way, INNER JOIN ensures that only combo-items with at least one sub-item are returned.
(I have not tested it.)
See this answer:
MySQL: Selecting foreign keys with fields matching all the same fields of another table
Select ...
From combo_items As C
Where Exists (
Select 1
From sub_items As S1
Join combo_refs As CR1
On CR1.s_id = S1.s_id
Where CR1.c_id = C.c_id
)
And Not Exists (
Select 1
From sub_items As S2
Join combo_refs As CR2
On CR2.s_id = S2.s_id
Where CR2.c_id = C.c_id
And S2.Active = 0
)
The first subquery ensures that at least one sub_item exists. The second ensures that none of the sub_items are inactive.
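The two approaches in this thread (comparing aggregate counts, and EXISTS / NOT EXISTS) can be checked against each other on a small data set. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with made-up rows; the HAVING form is one way to write the count comparison and assumes Active is only ever 0 or 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE combo_items (c_id INTEGER PRIMARY KEY, Label TEXT NOT NULL);
CREATE TABLE sub_items (s_id INTEGER PRIMARY KEY, Label TEXT NOT NULL,
                        Active INTEGER NOT NULL);
CREATE TABLE combo_refs (r_id INTEGER PRIMARY KEY, c_id INTEGER NOT NULL,
                         s_id INTEGER NOT NULL);
-- Hypothetical data: combo 1 is fully active, combo 2 has one inactive
-- sub-item, combo 3 has no sub-items at all.
INSERT INTO combo_items VALUES (1, 'all on'), (2, 'mixed'), (3, 'empty');
INSERT INTO sub_items VALUES (1, 'a', 1), (2, 'b', 1), (3, 'c', 0);
INSERT INTO combo_refs VALUES (1, 1, 1), (2, 1, 2), (3, 2, 2), (4, 2, 3);
""")


def ids(sql):
    """Return the c_id values produced by a query, in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))


# Count comparison: active sub-items must equal total sub-items.
having_ids = ids("""
    SELECT ci.c_id FROM combo_items ci
    JOIN combo_refs cr ON cr.c_id = ci.c_id
    JOIN sub_items si ON si.s_id = cr.s_id
    GROUP BY ci.c_id
    HAVING SUM(si.Active) = COUNT(*)""")

# EXISTS / NOT EXISTS form from the answer above.
exists_ids = ids("""
    SELECT C.c_id FROM combo_items C
    WHERE EXISTS (SELECT 1 FROM sub_items S1
                  JOIN combo_refs CR1 ON CR1.s_id = S1.s_id
                  WHERE CR1.c_id = C.c_id)
      AND NOT EXISTS (SELECT 1 FROM sub_items S2
                      JOIN combo_refs CR2 ON CR2.s_id = S2.s_id
                      WHERE CR2.c_id = C.c_id AND S2.Active = 0)""")

# For contrast: a plain WHERE filter keeps combos with ANY active sub-item.
where_ids = ids("""
    SELECT ci.c_id FROM combo_items ci
    JOIN combo_refs cr ON cr.c_id = ci.c_id
    JOIN sub_items si ON si.s_id = cr.s_id
    WHERE si.Active = 1
    GROUP BY ci.c_id""")

print(having_ids, exists_ids, where_ids)
```

On this data the HAVING and EXISTS forms agree, while the plain WHERE filter also returns the combo-item that has an inactive sub-item.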