I have a problem with this query:
SELECT a.*
FROM smartressort AS s
JOIN smartressort_to_ressort AS str
ON s.id = str.smartressort_id
JOIN article_to_ressort AS atr
ON str.ressort_id = atr.ressort_id
JOIN article AS a FORCE INDEX (source_created)
ON atr.article_id = a.id
WHERE
s.id = 1
ORDER BY
a.created_at DESC
LIMIT 25;
This one is really slow; it sometimes takes 14 seconds.
EXPLAIN shows this (columns: id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra):
1 SIMPLE s const PRIMARY PRIMARY 4 const 1 Using index; Using temporary; Using filesort
1 SIMPLE str ref PRIMARY,ressort_id PRIMARY 4 const 1 Using index
1 SIMPLE atr ref PRIMARY,article_id PRIMARY 4 com.nps.lvz-prod.str.ressort_id 1262 Using index
1 SIMPLE a ALL NULL NULL NULL NULL 146677 Using where; Using join buffer (flat, BNL join)
So the last row's "ALL" type (a full table scan) is really bad.
But I already tried forcing the index, with no luck.
The article table looks like this:
CREATE TABLE `article` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`node_id` varchar(255) NOT NULL DEFAULT '',
`object_id` varchar(255) DEFAULT NULL,
`headline_1` varchar(255) NOT NULL DEFAULT '',
`created_at` datetime(3) NOT NULL,
`updated_at` datetime(3) NOT NULL,
`teaser_text` longtext NOT NULL,
`content_text` longtext NOT NULL,
PRIMARY KEY (`id`),
KEY `article_nodeid` (`node_id`),
KEY `article_objectid` (`object_id`),
KEY `source_created` (`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=161116 DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC;
When I remove the FORCE INDEX, the EXPLAIN output gets better, but the query is still slow.
EXPLAIN without FORCE INDEX:
1 SIMPLE s const PRIMARY PRIMARY 4 const 1 Using index; Using temporary; Using filesort
1 SIMPLE str ref PRIMARY,ressort_id PRIMARY 4 const 1 Using index
1 SIMPLE atr ref PRIMARY,article_id PRIMARY 4 com.nps.lvz-prod.str.ressort_id 1262 Using index
1 SIMPLE a eq_ref PRIMARY PRIMARY 4 com.nps.lvz-prod.atr.article_id 1
And for another smartressort id (3) it looks like this:
1 SIMPLE s const PRIMARY PRIMARY 4 const 1 Using index; Using temporary; Using filesort
1 SIMPLE str ref PRIMARY,ressort_id PRIMARY 4 const 13 Using index
1 SIMPLE atr ref PRIMARY,article_id PRIMARY 4 com.nps.lvz-prod.str.ressort_id 1262 Using index
1 SIMPLE a eq_ref PRIMARY PRIMARY 4 com.nps.lvz-prod.atr.article_id 1
Here we have 13 ressorts for one smartressort.
Rows: 1 x 1 x 13 x 1262 x 1 = 16,406
1) What can I do to make this query faster?
2) What's wrong with the source_created index?
The SELECT * you have in your query is ugly, and this can often be an index killer. It can preclude the use of an index, because most indices you would define would not cover every column demanded by the SELECT *. The approach of this answer is to index all other tables in your query, which would therefore incentivize MySQL to just do a single scan over the article table.
CREATE INDEX idx1 ON article_to_ressort (article_id, ressort_id);
CREATE INDEX idx2 ON smartressort_to_ressort (ressort_id, smartressort_id);
These two indices should speed up the joining process. Note that I did not define an index for the smartressort table, assuming that its id column is already a primary key. I would probably write your query starting with the article table, and joining outwards, but it should not really matter.
Also, forcing an index is mostly either a bad idea or not necessary. The optimizer can usually figure out when it is best to use an index.
SELECT many columns FROM tables ORDER BY something LIMIT few is a notorious performance antipattern; it has to retrieve and order a whole mess of rows and columns, just to discard all but a few rows of the result set.
The trick is to figure out which values of article.id you need in your result set, then retrieve just those values. It's called a deferred join.
This should get you that set of id values. There's probably no need to join the smartressort table because smartressort_to_ressort contains the id values you need.
SELECT a.id
FROM article a
JOIN article_to_ressort atr ON a.id = atr.article_id
JOIN smartressort_to_ressort str ON atr.ressort_id = str.ressort_id
WHERE str.smartressort_id = 1
ORDER BY a.created_at DESC
LIMIT 25
Then you can use this as a subquery to get the rows you need. MySQL (and MariaDB) do not accept LIMIT directly inside an IN (...) subquery, so the LIMITed query is wrapped in a derived table:
SELECT a.*
FROM article a
WHERE a.id IN (
SELECT id FROM (
SELECT a.id
FROM article a
JOIN article_to_ressort atr ON a.id = atr.article_id
JOIN smartressort_to_ressort str ON atr.ressort_id = str.ressort_id
WHERE str.smartressort_id = 1
ORDER BY a.created_at DESC
LIMIT 25
) AS top25
)
ORDER BY a.created_at DESC
The second ORDER BY makes sure the rows from article are in a predictable order. Your index optimization work, then, need only apply to the subquery.
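The deferred join can be tried without a MySQL server. The sketch below is a stand-in under stated assumptions: it uses Python's built-in sqlite3 module (SQLite, not MySQL), a trimmed-down version of the question's tables, and made-up rows (60 articles spread over ressorts 10, 11 and 12, with smartressort 1 covering ressorts 10 and 11). It checks that fetching the newest ids first and then the full rows gives the same result as the plain join with ORDER BY ... LIMIT. The id query is wrapped in a derived table because MySQL rejects LIMIT directly inside IN (...); SQLite would accept either form.

```python
import sqlite3

# Stand-in schema in SQLite (not MySQL); columns trimmed to what the queries need.
SCHEMA = """
CREATE TABLE article (id INTEGER PRIMARY KEY, headline_1 TEXT, created_at TEXT NOT NULL);
CREATE TABLE article_to_ressort (article_id INTEGER, ressort_id INTEGER,
                                 PRIMARY KEY (ressort_id, article_id));
CREATE TABLE smartressort_to_ressort (smartressort_id INTEGER, ressort_id INTEGER,
                                      PRIMARY KEY (smartressort_id, ressort_id));
CREATE INDEX source_created ON article (created_at);
"""

# Plain join: ORDER BY ... LIMIT applied over the full article rows.
DIRECT = """
SELECT a.id, a.headline_1, a.created_at
FROM article a
JOIN article_to_ressort atr ON a.id = atr.article_id
JOIN smartressort_to_ressort str ON atr.ressort_id = str.ressort_id
WHERE str.smartressort_id = ?
ORDER BY a.created_at DESC
LIMIT ?
"""

# Deferred join: find the winning ids first, then fetch full rows only for those ids.
DEFERRED = """
SELECT a.id, a.headline_1, a.created_at
FROM article a
WHERE a.id IN (
    SELECT id FROM (
        SELECT a2.id
        FROM article a2
        JOIN article_to_ressort atr ON a2.id = atr.article_id
        JOIN smartressort_to_ressort str ON atr.ressort_id = str.ressort_id
        WHERE str.smartressort_id = ?
        ORDER BY a2.created_at DESC
        LIMIT ?
    ) AS top_ids
)
ORDER BY a.created_at DESC
"""


def build_db():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # Hypothetical data: article i is newer than article i - 1 and is filed under ressort 10 + i % 3.
    for i in range(1, 61):
        con.execute("INSERT INTO article VALUES (?, ?, ?)",
                    (i, f"headline {i}", f"2020-01-01 10:{i - 1:02d}:00"))
        con.execute("INSERT INTO article_to_ressort VALUES (?, ?)", (i, 10 + i % 3))
    # Smartressort 1 groups ressorts 10 and 11 (ressort 12 is left out).
    con.executemany("INSERT INTO smartressort_to_ressort VALUES (?, ?)", [(1, 10), (1, 11)])
    return con


def newest(con, sql, smartressort_id=1, n=5):
    # Both queries bind the smartressort id first and the row limit second.
    return con.execute(sql, (smartressort_id, n)).fetchall()


con = build_db()
print([row[0] for row in newest(con, DEFERRED)])
```

The point of the id-only inner query is that it can be answered from narrow indexes, so the wide article rows (with their longtext columns) are read only for the few winners. Whether that pays off on the real data is something EXPLAIN and timing on MySQL have to confirm.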
In addition to @TimBiegelsen's great answer, I would recommend modifying your source_created index:
...
KEY `source_created` (`id`, `created_at`)
The gain would be that MySQL could use it for sorting and wouldn't need to fetch all 16,406 rows. It may or may not help, but it's worth a try (perhaps with an explicit index hint to use it).
To start with, you can remove the smartressort table from your query, as it doesn't add anything to it.
The following is your query rewritten. We want all ressorts for smart ressort #1 and then all articles for these ressorts. Of these we show the newest 25.
SELECT *
FROM article
WHERE id IN
(
SELECT article_id
FROM article_to_ressort
WHERE ressort_id IN
(
SELECT ressort_id
FROM smartressort_to_ressort
WHERE smartressort_id = 1
)
)
ORDER BY created_at DESC
LIMIT 25;
Now, which indexes would help the DBMS with this? Start with the innermost table (smartressort_to_ressort). We access all records with a given smartressort_id and want the associated ressort_id, so the index should contain these two columns in this order. The same goes for article_to_ressort with its ressort_id and article_id. Finally, we want to select the articles by the found article IDs and order them by created_at.
CREATE INDEX idx1 ON smartressort_to_ressort (smartressort_id, ressort_id);
CREATE INDEX idx2 ON article_to_ressort (ressort_id, article_id);
CREATE INDEX idx3 ON article (id, created_at);
Anyway, these indexes are just an offer to the DBMS. It may decide against them. This is especially true for the index on the article table. How many rows does the DBMS expect to access for one smartressort_id, i.e. how many rows may be in the IN clause? If the DBMS thinks that this might well be about 10% of all article IDs, it may decide to read the table sequentially rather than muddle its way through the index for so many rows.
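A small sketch of the nested-IN rewrite with the three suggested indexes, again using Python's sqlite3 module as a stand-in for MySQL and made-up rows. It also shows a side effect of the rewrite: IN behaves as a semi-join, so an article filed under two ressorts of the same smartressort comes back once, while the plain join returns it once per ressort. The query plan is printed rather than checked, since, as noted above, the planner may decide against the indexes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL);
CREATE TABLE article_to_ressort (article_id INTEGER, ressort_id INTEGER);
CREATE TABLE smartressort_to_ressort (smartressort_id INTEGER, ressort_id INTEGER);
CREATE INDEX idx1 ON smartressort_to_ressort (smartressort_id, ressort_id);
CREATE INDEX idx2 ON article_to_ressort (ressort_id, article_id);
CREATE INDEX idx3 ON article (id, created_at);
""")
# Hypothetical rows: article i was created on 2020-12-0i. Article 6 is in two ressorts;
# article 5 is only in ressort 12, which smartressort 1 does not include.
con.executemany("INSERT INTO article VALUES (?, ?)",
                [(i, f"2020-12-{i:02d}") for i in range(1, 7)])
con.executemany("INSERT INTO article_to_ressort VALUES (?, ?)",
                [(1, 10), (2, 11), (3, 10), (4, 11), (5, 12), (6, 10), (6, 11)])
con.executemany("INSERT INTO smartressort_to_ressort VALUES (?, ?)", [(1, 10), (1, 11)])

IN_QUERY = """
SELECT id FROM article
WHERE id IN (SELECT article_id FROM article_to_ressort
             WHERE ressort_id IN (SELECT ressort_id FROM smartressort_to_ressort
                                  WHERE smartressort_id = ?))
ORDER BY created_at DESC
LIMIT ?
"""

JOIN_QUERY = """
SELECT a.id FROM article a
JOIN article_to_ressort atr ON atr.article_id = a.id
JOIN smartressort_to_ressort str ON str.ressort_id = atr.ressort_id
WHERE str.smartressort_id = ?
ORDER BY a.created_at DESC
LIMIT ?
"""

in_ids = [row[0] for row in con.execute(IN_QUERY, (1, 3))]
join_ids = [row[0] for row in con.execute(JOIN_QUERY, (1, 3))]
print(in_ids, join_ids)

# Show which indexes SQLite's planner picked (the last column is the readable detail).
for row in con.execute("EXPLAIN QUERY PLAN " + IN_QUERY, (1, 3)):
    print(row[-1])
```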
So for me the solution was this:
SELECT a.*
FROM article as a USE INDEX (source_created)
where a.id in (
SELECT atr.article_id
from smartressort_to_ressort str
JOIN article_to_ressort atr ON atr.ressort_id = str.ressort_id
WHERE str.smartressort_id = 1
)
ORDER BY a.created_at DESC
LIMIT 25;
This only needs ~35ms.
Explain looks like this:
1 PRIMARY a index NULL source_created 7 NULL 1
1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 4 func 1
2 MATERIALIZED str ref PRIMARY,ressort_id,idx1 PRIMARY 4 const 1 Using index
2 MATERIALIZED atr ref PRIMARY,article_id,idx2 PRIMARY 4 com.nps.lvz-prod.str.ressort_id 1262 Using index
Even so, the EXPLAIN for this query looks better to me, but I don't know exactly why:
explain SELECT a.*, NOW()
FROM article as a USE INDEX (source_created)
where a.id in (SELECT atr.article_id
FROM smartressort AS s
JOIN smartressort_to_ressort AS str
ON s.id = str.smartressort_id
JOIN article_to_ressort AS atr
ON str.ressort_id = atr.ressort_id
WHERE s.id = 1
)
ORDER BY a.created_at DESC
LIMIT 25;
Output:
1 PRIMARY s const PRIMARY PRIMARY 4 const 1 Using index
1 PRIMARY a index NULL source_created 7 NULL 25
1 PRIMARY str ref PRIMARY,ressort_id,idx1 PRIMARY 4 const 1 Using index
1 PRIMARY atr eq_ref PRIMARY,article_id,idx2 PRIMARY 8 com.nps.lvz-prod.str.ressort_id,com.nps.lvz-prod.a.id 1 Using index; FirstMatch(a)
I want to retrieve the latest status for an item from a history table. History table will have a record of all status changes for an item. The query must be quick to run.
Below is the query that I use to get the latest status per item
SELECT item_history.*
FROM item_history
INNER JOIN (
SELECT MAX(created_at) as created_at, item_id
FROM item_history
GROUP BY item_id
) as latest_status
on latest_status.item_id = item_history.item_id
and latest_status.created_at = item_history.created_at
WHERE item_history.status_id = 1
and item_history.created_at BETWEEN "2020-12-16" AND "2020-12-23"
I've tried putting the query above into another inner join to link the data with an item:
SELECT *
FROM `items`
INNER JOIN ( [query from above] )
WHERE items.category_id = 3
Notes about the item_history table: I have indexes on the following columns: status_id, created_at and listing_id. I have also turned those 3 into a compound primary key.
My issue is that MySQL keeps scanning the full table to get MAX(created_at), which is a very slow operation, even though I only have 3 million records in the history table.
Query plan as follows:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY items NULL ref PRIMARY,district district 18 const 694 100.00 NULL
1 PRIMARY item_history NULL ref PRIMARY,status_id,created_at,item_history_item_id_index PRIMARY 9 main.items.id,const 1 100.00 Using where
1 PRIMARY <derived2> NULL ref <auto_key0> <auto_key0> 14 func,main.items.id 10 100.00 Using where; Using index
2 DERIVED item_history NULL range PRIMARY,status_id,created_at,item_history_item_id_index item_history_item_id_index 8 NULL 2751323 100.00 Using index
I want to retrieve the latest status for an item from a history table.
If you want the results for just one item, then use order by and limit:
select *
from item_history
where item_id = ? and created_at between '2020-12-16' and '2020-12-23'
order by created_at desc limit 1
This query would benefit from an index on (item_id, created_at).
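A quick way to try the single-item version, sketched with Python's sqlite3 module as a stand-in for MySQL. The table is cut down to three columns, the rows are made up, and the index name item_history_item_created is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE item_history (item_id INTEGER, status_id INTEGER, created_at TEXT);
CREATE INDEX item_history_item_created ON item_history (item_id, created_at);
""")
con.executemany("INSERT INTO item_history VALUES (?, ?, ?)", [
    (7, 1, "2020-12-15 09:00:00"),  # before the window
    (7, 2, "2020-12-17 09:00:00"),
    (7, 1, "2020-12-20 09:00:00"),  # newest row for item 7 inside the window
    (8, 3, "2020-12-18 09:00:00"),
])


def latest_status(item_id):
    """Newest (status_id, created_at) for one item inside the window, or None."""
    return con.execute("""
        SELECT status_id, created_at FROM item_history
        WHERE item_id = ? AND created_at BETWEEN '2020-12-16' AND '2020-12-23'
        ORDER BY created_at DESC LIMIT 1""", (item_id,)).fetchone()


print(latest_status(7))
```

One thing to watch in both engines: the upper bound '2020-12-23' effectively excludes rows later on the 23rd (SQLite compares the strings, MySQL turns the literal into midnight), so created_at < '2020-12-24' may be what is actually meant.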
If you want the latest status per item, I would recommend a correlated subquery:
select *
from item_history h
where created_at = (
select max(h1.created_at)
from item_history h1
where h1.item_id = h.item_id
and h1.created_at between '2020-12-16' and '2020-12-23'
)
The same index should be beneficial.
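The correlated subquery can be checked on a few made-up rows with Python's sqlite3 module (a stand-in for MySQL; the ORDER BY h.item_id is added only to make the output deterministic). Item 9 has no row inside the window, so its MAX(...) is NULL, the comparison is never true, and the item simply drops out.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE item_history (item_id INTEGER, status_id INTEGER, created_at TEXT);
CREATE INDEX item_history_item_created ON item_history (item_id, created_at);
""")
con.executemany("INSERT INTO item_history VALUES (?, ?, ?)", [
    (7, 1, "2020-12-15 09:00:00"),
    (7, 2, "2020-12-17 09:00:00"),
    (7, 1, "2020-12-20 09:00:00"),
    (8, 3, "2020-12-18 09:00:00"),
    (9, 1, "2020-12-10 09:00:00"),  # no row inside the window at all
])

LATEST_PER_ITEM = """
SELECT h.item_id, h.status_id, h.created_at
FROM item_history h
WHERE h.created_at = (
    SELECT MAX(h1.created_at)          -- newest timestamp of this item in the window
    FROM item_history h1
    WHERE h1.item_id = h.item_id
      AND h1.created_at BETWEEN '2020-12-16' AND '2020-12-23')
ORDER BY h.item_id
"""

rows = con.execute(LATEST_PER_ITEM).fetchall()
print(rows)
```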
Using a window function (MySQL 8.0.14+):
WITH cte AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY item_id ORDER BY created_at DESC) r
FROM item_history
WHERE item_history.status_id = 1
and item_history.created_at BETWEEN '2020-12-16' AND '2020-12-23'
)
SELECT *
FROM cte WHERE r = 1;
An index on (item_id, created_at) will also help.
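Where the status filter sits matters. Filtering status_id = 1 inside the CTE ranks only status-1 rows, so it returns each item's latest status-1 row even if the item later moved to another status. The question's own derived-table query instead takes each item's newest row overall and then checks status_id = 1; ranking first and filtering afterwards reproduces that. The sketch uses Python's sqlite3 module as a stand-in for MySQL (window functions need SQLite 3.25 or newer) and hypothetical rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or newer
con.execute("CREATE TABLE item_history (item_id INTEGER, status_id INTEGER, created_at TEXT)")
con.executemany("INSERT INTO item_history VALUES (?, ?, ?)", [
    (7, 1, "2020-12-17 09:00:00"),
    (7, 2, "2020-12-20 09:00:00"),  # item 7's latest status is 2
    (8, 3, "2020-12-17 09:00:00"),
    (8, 1, "2020-12-19 09:00:00"),  # item 8's latest status is 1
])

# The answer's CTE as written: keep status-1 rows first, then rank what is left.
FILTER_THEN_RANK = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY created_at DESC) AS r
    FROM item_history
    WHERE status_id = 1
      AND created_at BETWEEN '2020-12-16' AND '2020-12-23'
)
SELECT item_id, status_id, created_at FROM cte WHERE r = 1 ORDER BY item_id
"""

# Variant: rank every row first, then keep items whose newest row has status 1.
RANK_THEN_FILTER = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY created_at DESC) AS r
    FROM item_history
)
SELECT item_id, status_id, created_at FROM cte
WHERE r = 1 AND status_id = 1
  AND created_at BETWEEN '2020-12-16' AND '2020-12-23'
ORDER BY item_id
"""

print(con.execute(FILTER_THEN_RANK).fetchall())
print(con.execute(RANK_THEN_FILTER).fetchall())
```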
Both of these MySQL queries produce exactly the same result, but query A is a simple union with the postType WHERE clause embedded inside each individual query, whereas query B applies the same WHERE clause in the outer SELECT over a virtual table that is the union of the individual query results. I am concerned that the virtual table sigma in query B might get too large for no good reason if there are a lot of rows. But then I am a bit confused: how would the ORDER BY work for query A? Wouldn't it also have to build a virtual table or something similar to sort the results? Maybe it all depends on how ORDER BY works for a union. If ORDER BY on a union also creates a temp table, would query A then be roughly equivalent to query B in resources? (Query B would be much easier for us to implement in our system than query A.) Please guide/advise in any way possible, thanks.
Query A
SELECT `t1`.*, `t2`.*
FROM `t1` INNER JOIN `t2` ON
`t1`.websiteID= `t2`.ownerID
AND `t1`.authorID= `t2`.authorID
AND `t1`.authorID=1559 AND `t1`.postType="simplePost"
UNION
SELECT `t1`.*
FROM `t1` where websiteID=1559 AND postType="simplePost"
ORDER BY postID limit 0,50
Query B
Select * from (
SELECT `t1`.*,`t2`.*
FROM `t1` INNER JOIN `t2` ON
`t1`.websiteID= `t2`.ownerID
AND `t1`.authorID= `t2`.authorID
AND `t1`.authorID=1559
UNION
SELECT `t1`.*
FROM `t1` where websiteID=1559
)
As sigma where postType="simplePost" ORDER BY postID limit 0,50
EXPLAIN FOR QUERY A
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t2 ref userID userID 4 const 1
1 PRIMARY t1 ref authorID authorID 4 const 2 Using where
2 UNION t1 ref websiteID websiteID 4 const 9 Using where
NULL UNION RESULT <union1,2> ALL NULL NULL NULL NULL NULL Using filesort
EXPLAIN FOR QUERY B
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 10 Using where; Using filesort
2 DERIVED t2 ref userID userID 4 1
2 DERIVED t1 ref authorID authorID 4 2 Using where
3 UNION t1 ref websiteID websiteID 4 9
NULL UNION RESULT <union2,3> ALL NULL NULL NULL NULL NULL
There is no doubt that Query A (separate WHERE clauses on each side of the union) will be faster. Let's look at why Query B (a WHERE clause over the union result) is worse:
data volume: there are always going to be at least as many rows in the union result, because there are fewer conditions on which rows are returned. This means more disk I/O (depending on indexes) and more temporary storage to hold the rowset, which means more processing time.
repeated scan: the entire union result must be scanned again to apply the condition, when it could have been handled during the initial scan. This means handling the rowset twice; even if that probably happens in memory, it's still extra work.
indexes: indexes aren't used for WHERE clauses on a union result. If you have an index over the foreign key fields and postType, it would not be used.
If you want maximum performance, use UNION ALL, which passes the rows straight through to the result with no overhead, instead of UNION, which removes duplicates (usually by sorting), can be expensive and, based on your comments, is unnecessary.
Define these indexes and use Query A for maximum performance:
create index t1_authorID_postType on t1(authorID, postType);
create index t1_websiteID_postType on t1(websiteID, postType);
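A sketch under stated assumptions (Python's sqlite3 module instead of MySQL, only t1, since the question's first branch also joins t2, which is left out here, and hypothetical rows). It creates the two suggested indexes, runs the Query A shape with UNION and with UNION ALL, and runs the Query B shape. It also shows the one thing to check before switching to UNION ALL: a post that satisfies both branches comes back twice.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t1 (postID INTEGER PRIMARY KEY, authorID INTEGER,
                 websiteID INTEGER, postType TEXT);
CREATE INDEX t1_authorID_postType ON t1 (authorID, postType);
CREATE INDEX t1_websiteID_postType ON t1 (websiteID, postType);
""")
con.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", [
    (1, 1559, 1559, "simplePost"),  # matches both branches
    (2, 1559, 42, "simplePost"),    # author branch only
    (3, 7, 1559, "simplePost"),     # website branch only
    (4, 1559, 1559, "otherPost"),   # removed by the postType filter
])

# Query A shape: the postType filter sits inside each branch; {op} is UNION or UNION ALL.
QUERY_A = """
SELECT postID FROM t1 WHERE authorID = 1559 AND postType = 'simplePost'
{op}
SELECT postID FROM t1 WHERE websiteID = 1559 AND postType = 'simplePost'
ORDER BY postID LIMIT 50
"""

# Query B shape: the filter is applied to the derived table sigma.
QUERY_B = """
SELECT postID FROM (
    SELECT postID, postType FROM t1 WHERE authorID = 1559
    UNION
    SELECT postID, postType FROM t1 WHERE websiteID = 1559
) AS sigma
WHERE postType = 'simplePost'
ORDER BY postID LIMIT 50
"""


def post_ids(sql):
    return [row[0] for row in con.execute(sql)]


print(post_ids(QUERY_A.format(op="UNION")))      # duplicates removed
print(post_ids(QUERY_A.format(op="UNION ALL")))  # post 1 appears twice
print(post_ids(QUERY_B))
```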
Perhaps this would work instead:
SELECT
`t1`.*
,`t2`.*
FROM `t1`
LEFT JOIN `t2` ON `t1`.websiteID = `t2`.ownerID
AND `t1`.authorID = `t2`.authorID
AND `t1`.authorID = 1559
WHERE ( `t1`.authorID = 1559 OR `t1`.websiteID = 1559 )
AND `t1`.postType = 'simplePost'
ORDER BY postID limit 0 ,50
I have the following query:
SELECT `p_products`.`id`, `p_products`.`name`, `p_products`.`date`,
`p_products`.`img`, `p_products`.`safe_name`, `p_products`.`sku`,
`p_products`.`productstatusid`, `op`.`quantity`
FROM `p_products`
INNER JOIN `p_product_p_category`
ON `p_products`.`id` = `p_product_p_category`.`p_product_id`
LEFT JOIN (SELECT `p_product_id`,`order_date`,SUM(`product_quantity`) as quantity
FROM `p_orderedproducts`
WHERE `order_date`>='2013-03-01 16:51:17'
GROUP BY `p_product_id`) AS op
ON `p_products`.`id` = `op`.`p_product_id`
WHERE `p_product_p_category`.`p_category_id` IN ('15','23','32')
AND `p_products`.`active` = '1'
GROUP BY `p_products`.`id`
ORDER BY `date` DESC
Explain says:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY p_product_p_category ref p_product_id,p_category_id,p_product_id_2 p_category_id 4 const 8239 Using temporary; Using filesort
1 PRIMARY p_products eq_ref PRIMARY PRIMARY 4 pdev.p_product_p_category.p_product_id 1 Using where
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 78
2 DERIVED p_orderedproducts index order_date p_product_id 4 NULL 201 Using where
And I have indexes on a number of columns including p_products.date.
The problem is the speed when there are more than 5000 products in a number of categories: 60000 products take >1 second. Is there any way to speed things up?
This also holds true if I remove the LEFT JOIN, in which case the result is:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p_product_p_category index p_product_id,p_category_id,p_product_id_2 p_product_id_2 8 NULL 91167 Using where; Using index; Using temporary; Using filesort
1 SIMPLE p_products eq_ref PRIMARY PRIMARY 4 pdev.p_product_p_category.p_product_id 1 Using where
The intermediate table p_product_p_category has indexes on both p_product_id and p_category_id, as well as a combined index on both.
I tried Ochi's suggestion and ended up with:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 62087 Using temporary; Using filesort
1 PRIMARY nr1media_products eq_ref PRIMARY PRIMARY 4 cats.nr1media_product_id 1 Using where
2 DERIVED nr1media_product_nr1media_category range nr1media_category_id nr1media_category_id 4 NULL 62066 Using where
I think I can simplify the question to: how can I join my products to the category intermediate table to fetch all unique products for the selected categories, sorted by date?
EDIT:
This gives me all unique products in the categories without using a temp table for ordering or grouping:
SELECT
`p_products`.`id`,
`p_products`.`name`,
`p_products`.`img`,
`p_products`.`safe_name`,
`p_products`.`sku`,
`p_products`.`productstatusid`
FROM
p_products
WHERE
EXISTS (
SELECT
1
FROM
p_product_p_category
WHERE
p_product_p_category.p_product_id = p_products.id
AND p_category_id IN ('15', '23', '32')
)
AND p_products.active = 1
ORDER BY
`date` DESC
The above query is very fast, much faster than the join using GROUP BY and ORDER BY (0.04 vs. 0.7 sec), although I don't understand why it can run this query without temp tables.
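One likely part of the answer to "why no temp tables": EXISTS is a semi-join, so each product comes out at most once and there is nothing to de-duplicate, whereas the join produces one row per matching category, which is why the join version needed GROUP BY (and a temporary table) in the first place. The sketch below shows that difference on hypothetical rows, using Python's sqlite3 module as a stand-in for MySQL and integer category ids; the index name p_products_date is made up. Whether MySQL also reads the date index to satisfy the ORDER BY on the real data is for EXPLAIN to confirm.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE p_products (id INTEGER PRIMARY KEY, name TEXT, date TEXT, active INTEGER);
CREATE TABLE p_product_p_category (p_product_id INTEGER, p_category_id INTEGER,
                                   PRIMARY KEY (p_product_id, p_category_id));
CREATE INDEX p_products_date ON p_products (date);
""")
con.executemany("INSERT INTO p_products VALUES (?, ?, ?, ?)", [
    (1, "lamp", "2013-03-05", 1),
    (2, "desk", "2013-03-04", 1),
    (3, "sofa", "2013-03-03", 0),  # inactive, filtered out
])
con.executemany("INSERT INTO p_product_p_category VALUES (?, ?)", [
    (1, 15), (1, 23),  # product 1 is in two of the selected categories
    (2, 32), (3, 15),
])

# Plain join: one output row per (product, matching category) pair.
JOIN_QUERY = """
SELECT p.id FROM p_products p
JOIN p_product_p_category c ON c.p_product_id = p.id
WHERE c.p_category_id IN (15, 23, 32) AND p.active = 1
ORDER BY p.date DESC
"""

# Semi-join: each product is returned at most once, so no GROUP BY is needed.
EXISTS_QUERY = """
SELECT p.id FROM p_products p
WHERE EXISTS (SELECT 1 FROM p_product_p_category c
              WHERE c.p_product_id = p.id AND c.p_category_id IN (15, 23, 32))
  AND p.active = 1
ORDER BY p.date DESC
"""

join_ids = [row[0] for row in con.execute(JOIN_QUERY)]
exists_ids = [row[0] for row in con.execute(EXISTS_QUERY)]
print(join_ids, exists_ids)
```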
I think I need to find another solution for the orderedproducts join; it still slows the query down to >1 sec. I might set up a cron job that updates the sales ranking of the products once every night and saves it to the p_products table.
Unless someone has a definitive solution...
You are joining every category to products, and only then does the result get filtered by category id.
Try to restrict your query as early as possible, e.g. instead of
INNER JOIN `p_product_p_category`
do
INNER JOIN ( SELECT * FROM `p_product_p_category` WHERE `p_category_id` IN ('15','23','32') ) AS cats
ON `p_products`.`id` = `cats`.`p_product_id`
so that you are working on a smaller subset of products right from the beginning. (MySQL requires every derived table to have an alias, here cats, and the ON clause then has to refer to that alias.)
One possible solution would be to remove the derived table and just do a single Group By:
Select P.id, P.name, P.date
, P.img, P.safe_name, P.sku
, P.productstatusid
, Sum( OP.product_quantity ) As quantity
From p_products As P
Join p_product_p_category As CAT
On P.id = CAT.p_product_id
Left Join p_orderedproducts As OP
On OP.p_product_id = P.id
And OP.order_date >= '2013-03-01 16:51:17'
Where CAT.p_category_id In ('15','23','32')
And P.active = '1'
Group By P.id, P.name, P.date
, P.img, P.safe_name, P.sku
, P.productstatusid
Order By P.date Desc
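One caveat with the single GROUP BY: the join to p_product_p_category happens before SUM, so a product listed in two of the selected categories has each of its order rows counted twice. The sketch below reproduces that on hypothetical rows (Python's sqlite3 module as a stand-in for MySQL, integer category ids instead of the quoted strings) and shows that moving the category test into an EXISTS keeps the sum correct.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE p_products (id INTEGER PRIMARY KEY, name TEXT, date TEXT, active INTEGER);
CREATE TABLE p_product_p_category (p_product_id INTEGER, p_category_id INTEGER);
CREATE TABLE p_orderedproducts (p_product_id INTEGER, order_date TEXT,
                                product_quantity INTEGER);
""")
con.execute("INSERT INTO p_products VALUES (1, 'lamp', '2013-03-05', 1)")
# Product 1 sits in two of the selected categories.
con.executemany("INSERT INTO p_product_p_category VALUES (?, ?)", [(1, 15), (1, 23)])
# A single order of 3 units after the cut-off date.
con.execute("INSERT INTO p_orderedproducts VALUES (1, '2013-03-02 10:00:00', 3)")

# Join first, then aggregate: the order row is repeated once per matching category.
SINGLE_GROUP_BY = """
SELECT p.id, SUM(op.product_quantity) AS quantity
FROM p_products p
JOIN p_product_p_category cat ON p.id = cat.p_product_id
LEFT JOIN p_orderedproducts op
       ON op.p_product_id = p.id AND op.order_date >= '2013-03-01 16:51:17'
WHERE cat.p_category_id IN (15, 23, 32) AND p.active = 1
GROUP BY p.id
"""

# Category test as a semi-join: each order row is counted once.
SEMI_JOIN = """
SELECT p.id, SUM(op.product_quantity) AS quantity
FROM p_products p
LEFT JOIN p_orderedproducts op
       ON op.p_product_id = p.id AND op.order_date >= '2013-03-01 16:51:17'
WHERE EXISTS (SELECT 1 FROM p_product_p_category cat
              WHERE cat.p_product_id = p.id AND cat.p_category_id IN (15, 23, 32))
  AND p.active = 1
GROUP BY p.id
"""

print(con.execute(SINGLE_GROUP_BY).fetchall())
print(con.execute(SEMI_JOIN).fetchall())
```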
EXPLAIN EXTENDED SELECT id, name
FROM member
INNER JOIN group_assoc ON ( member.id = group_assoc.member_id
AND group_assoc.group_id =2 )
ORDER BY registered DESC
LIMIT 0 , 1
Outputs:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE group_assoc ref member_id,group_id group_id 4 const 3 100.00 Using temporary; Using filesort
1 SIMPLE member eq_ref PRIMARY PRIMARY 4 source_member.group_assoc.member_id 1 100.00
explain extended SELECT
id, name
FROM member WHERE
id
NOT IN (
SELECT
member_id
FROM group_assoc WHERE group_id = 2
)
ORDER BY registered DESC LIMIT 0,1
Outputs:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY member ALL NULL NULL NULL NULL 2635 100.00 Using where; Using filesort
2 DEPENDENT SUBQUERY group_assoc index_subquery member_id,group_id member_id 8 func,const 1 100.00 Using index; Using where
I'm not so sure about the first query: it uses a temporary table, which seems like a worse idea. But I also see that it examines fewer rows than the second query...
These queries return completely different resultsets: the first one returns members of group 2, the second one returns everybody who is not a member of group 2.
If you meant this:
SELECT id, name
FROM member
LEFT JOIN
group_assoc
ON member.id = group_assoc.member_id
AND group_assoc.group_id = 2
WHERE group_assoc.member_id IS NULL
ORDER BY
registered DESC
LIMIT 0, 1
, then the plans should be identical.
You may find this article interesting:
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
Create an index on member.registered to get rid of both filesort and temporary.
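The difference between the two original queries, and why LEFT JOIN / IS NULL is the form to compare against NOT IN, can be checked on a few hypothetical rows with Python's sqlite3 module (a stand-in for MySQL; the LIMIT 0, 1 is dropped so whole result sets are visible, and the index name member_registered is made up). The second half assumes group_assoc.member_id is nullable and adds a NULL: NOT IN then returns no rows at all, while LEFT JOIN / IS NULL is unaffected, a well-known difference between the two forms.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT, registered TEXT);
CREATE TABLE group_assoc (member_id INTEGER, group_id INTEGER);
CREATE INDEX member_registered ON member (registered);
""")
con.executemany("INSERT INTO member VALUES (?, ?, ?)", [
    (1, "ann", "2011-01-01"), (2, "bob", "2011-02-01"), (3, "cy", "2011-03-01"),
])
con.executemany("INSERT INTO group_assoc VALUES (?, ?)", [(1, 2), (3, 2), (2, 5)])

QUERIES = {
    # Members of group 2 (the first query in the question).
    "inner_join": """
        SELECT id, name FROM member
        JOIN group_assoc ON member.id = group_assoc.member_id AND group_assoc.group_id = 2
        ORDER BY registered DESC""",
    # Members NOT in group 2 (the second query in the question).
    "not_in": """
        SELECT id, name FROM member
        WHERE id NOT IN (SELECT member_id FROM group_assoc WHERE group_id = 2)
        ORDER BY registered DESC""",
    # The same anti-join written as LEFT JOIN / IS NULL.
    "left_join_is_null": """
        SELECT id, name FROM member
        LEFT JOIN group_assoc ON member.id = group_assoc.member_id AND group_assoc.group_id = 2
        WHERE group_assoc.member_id IS NULL
        ORDER BY registered DESC""",
}


def run_all():
    return {name: con.execute(sql).fetchall() for name, sql in QUERIES.items()}


before = run_all()
# A NULL member_id makes "id NOT IN (...)" false or unknown for every member.
con.execute("INSERT INTO group_assoc VALUES (NULL, 2)")
after = run_all()
print(before)
print(after)
```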
I would say the first is better. A temporary table might not be a good idea, but a subquery isn't much better. And you give MySQL more options to optimize the query plan with an inner join than with a subquery.
The subquery solution is fast as long as there are just a few rows that will be returned.
But... the first and second queries don't seem to be the same; is that intended?