I'm trying to analyze why the following query is slower with LIMIT 0,1 than LIMIT 0,100
I've added SQL_NO_CACHE for testing purposes.
Query:
SELECT
SQL_NO_CACHE SQL_CALC_FOUND_ROWS wp_posts.*,
low_stock_amount_meta.meta_value AS low_stock_amount
FROM
wp_posts
LEFT JOIN wp_wc_product_meta_lookup wc_product_meta_lookup ON wp_posts.ID = wc_product_meta_lookup.product_id
LEFT JOIN wp_postmeta AS low_stock_amount_meta ON wp_posts.ID = low_stock_amount_meta.post_id
AND low_stock_amount_meta.meta_key = '_low_stock_amount'
WHERE
1 = 1
AND wp_posts.post_type IN ('product', 'product_variation')
AND (
(wp_posts.post_status = 'publish')
)
AND wc_product_meta_lookup.stock_quantity IS NOT NULL
AND wc_product_meta_lookup.stock_status IN('instock', 'outofstock')
AND (
(
low_stock_amount_meta.meta_value > ''
AND wc_product_meta_lookup.stock_quantity <= CAST(
low_stock_amount_meta.meta_value AS SIGNED
)
)
OR (
(
low_stock_amount_meta.meta_value IS NULL
OR low_stock_amount_meta.meta_value <= ''
)
AND wc_product_meta_lookup.stock_quantity <= 2
)
)
ORDER BY
wp_posts.ID DESC
LIMIT
0, 1
Explains shows the exact same output
1 SIMPLE wp_posts index PRIMARY,type_status_date PRIMARY 8 NULL 27071 Using where
1 SIMPLE low_stock_amount_meta ref post_id,meta_key meta_key 767 const 1 Using where
1 SIMPLE wc_product_meta_lookup eq_ref PRIMARY,stock_status,stock_quantity,product_id PRIMARY 8 woocommerce-admin.wp_posts.ID 1 Using where
The average query time is 350ms with LIMIT 0,1
The average query time is 7ms with LIMIT 0,100
The query performance gets faster starting with LIMIT 0,17
I've added another column to the order by clause as suggested in this question, but that triggers Using filesort in the explain output
Order by wp_posts.post_date, wp_posts.ID desc
1 SIMPLE wp_posts ALL PRIMARY,type_status_date NULL NULL NULL 27071 Using where; Using filesort
1 SIMPLE low_stock_amount_meta ref post_id,meta_key meta_key 767 const 1 Using where
1 SIMPLE wc_product_meta_lookup eq_ref PRIMARY,stock_status,stock_quantity,product_id PRIMARY 8 woocommerce-admin.wp_posts.ID 1 Using where
Is there a way to work around it without altering indices and why is this happening?
It's also interesting that the query time improves starting with LIMIT 0,17. I'm not sure why 17 is a magic number here.
Update 1: I just tried adding FORCE INDEX(PRIMARY) and now LIMIT 0,100 has same performance as LIMIT 0,1 smh
wp_postmeta has sloppy indexes; this slows down most queries involving it.
O. Jones and I have made a WordPress plugin to improve the indexing of postmeta. We detect all sorts of stuff like the presence of the Barracuda version of the InnoDB storage engine, and other MySQL arcana, and do the right thing.
The may speed up all three averages. It is likely to change the EXPLAINs.
Analyzing this query. I confess I don't understand the performance change from LIMIT 1 to LIMIT 17. Still, the problem for your store's customers (or managers) is the slowness on LIMIT 1. So let's address that.
The question you linked was for postgreSQL, not MySQL. postgreSQL has a more sophisticated way of handling ORDER BY ... LIMIT 1 than MySQL does. And, the resolution to that problem was the adding of an appropriate compound index for the required lookup.
It looks to me like the purpose of your query is to find the low-stock or out-of-stock WooCommerce product with the largest wp_posts.ID
The LEFT JOIN to the wp_wc_product_meta_lookup table should be, and is, straightforward: the ON-condition column mentioned is its primary key. This table is, basically, WooCommerce's materialized view of numeric values like stock_quantity stored in wp_postmeta. Numeric values in wp_postmeta can't be indexed because that table stores them as text strings. Yeah. I know.
The LEFT JOIN between wp_posts and wp_postmeta follows the very common ON-condition pattern ON posts.ID = meta.post_id AND meta.meta_key = 'constant'. That ON condition is notorious for poor support by WordPress's standard indexes. More or less the entire purpose of Rick and my Index WP MySQL For Speed plugin is to provide good compound indexes in wp_postmeta to work around that problem.
How so? This is the DDL it runs to add the indexes. The most important lines for this purpose: ((There's more to it, read the linked article.)
ALTER TABLE wp_postmeta ADD PRIMARY KEY (post_id, meta_key, meta_id);
ALTER TABLE wp_postmeta ADD KEY meta_key (meta_key, post_id);
These two indexes support the ON-condition pattern in the query. I am pretty sure that adding theses keys to postmeta will make your query more predictable and faster in performance.
If the ORDER BY post.ID DESC is a very common use case, an index could be added for that.
You could try refactoring the query (if you have control over its source) to defer the retrieval of details from the wp_posts table. Like this.
SELECT wp_posts.*, postid.low_stock_amount
FROM (
wp_posts.ID, low_stock_amount_meta.meta_value AS low_stock_amount
FROM
wp_posts
LEFT JOIN wp_wc_product_meta_lookup wc_product_meta_lookup ON wp_posts.ID = wc_product_meta_lookup.product_id
LEFT JOIN wp_postmeta AS low_stock_amount_meta ON wp_posts.ID = low_stock_amount_meta.post_id
AND low_stock_amount_meta.meta_key = '_low_stock_amount'
WHERE
1 = 1
AND wp_posts.post_type IN ('product', 'product_variation')
AND (
(wp_posts.post_status = 'publish')
)
AND wc_product_meta_lookup.stock_quantity IS NOT NULL
AND wc_product_meta_lookup.stock_status IN('instock', 'outofstock')
AND (
(
low_stock_amount_meta.meta_value > ''
AND wc_product_meta_lookup.stock_quantity <= CAST(
low_stock_amount_meta.meta_value AS SIGNED
)
)
OR (
(
low_stock_amount_meta.meta_value IS NULL
OR low_stock_amount_meta.meta_value <= ''
)
AND wc_product_meta_lookup.stock_quantity <= 2
)
)
ORDER BY
wp_posts.ID DESC
LIMIT
0, 1
) postid
LEFT JOIN wp_posts ON wp_posts.ID = postid.ID
This refactoring makes your complex query sort only the wp_posts.ID value and then retrieves the posts data once it has the appropriate value in hand. Lots of WordPress core code does something similar: retrieves a list of post ID values in one query, then retrieves the post data in a second query.
And, by the way, MySQL 8 ignores SQL_NO_CACHE.
Related
I am perplexed, because every now and then I am getting a random order just by executing this MySQL command directly on Navicat. In my understanding unless you explicitly use ORDER BY RAND(), the order should be the same all the time, but in this case it's not the case at all.
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts
INNER JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id )
INNER JOIN wp_product dg ON dg.post_id = wp_posts.ID
AND ( dg.location IN ('XV', 'QV', 'DH') )
AND (0 OR (dg.srp = 1) OR (dg.SoldDate > (now() - interval 30 DAY)
AND dg.isPremium = 0 AND dg.SoldDate != '0000-00-00 00:00:00'
AND dg.SoldDate IS NOT NULL AND price > 0 AND NoImage = 0))
AND ( dg.deleted IS NULL OR dg.deleted <> 1 )
WHERE 1=1 AND ( wp_postmeta.meta_key = '_product_info_new' )
AND wp_posts.post_type = 'product'
AND (wp_posts.post_status = 'publish')
GROUP BY wp_posts.ID
ORDER BY dg.SoldDate IS NULL, dg.SoldDate ASC, dg.isPremium DESC LIMIT 0, 30
wp_product has the same number of rows after executing COUNT(*) for 10 minutes straight. But executing the same query above gives me a different result every 5 seconds or so, which is really strange. Is there a way to prevent the same query from returning a different result set?
If you don't specify an ORDER BY, the rows are returned in an undefined order by the SQL standard.
In practice, if you use InnoDB tables, the rows are returned in the order of the index used to access them.
The query optimizer might choose a different index from time to time, depending on the values you search for. It chooses index using a cost-based algorithm, trying to estimate which index will result in examining the fewest rows, or otherwise eliminating overhead.
The estimate is based on statistics about frequency of values in each index, and these are rough estimates, not always precise.
In some cases, the optimizer is choosing between alternatives that it estimates have very close cost according to its algorithm, and very slight changes in the statistics can cause the choice to flip-flop between these alternatives.
The bottom line is that if you need the rows returned in a specific order, then use ORDER BY. This will override the default order-by-index-reads behavior, and cause the result to be sorted if necessary.
I have the following query, which currently takes about 0.3s to load, causing a heavy load on my Wordpress site.
SELECT SQL_CALC_FOUND_ROWS wp11_posts.ID
FROM wp11_posts
WHERE 1=1
AND ( wp11_posts.ID NOT IN (
SELECT object_id
FROM wp11_term_relationships
WHERE term_taxonomy_id IN (137,141) )
AND (
SELECT COUNT(1)
FROM wp11_term_relationships
WHERE term_taxonomy_id IN (53)
AND object_id = wp11_posts.ID ) = 1 )
AND wp11_posts.post_type = 'post'
AND ((wp11_posts.post_status = 'publish'))
GROUP BY wp11_posts.ID
ORDER BY wp11_posts.post_date DESC
LIMIT 0, 5
Where should I start to make it execute faster? Is there an apparent mistake standing out, that should definitely had been done differently?
You have a so-called dependent subquery (a/k/a correlated subquery) in your example. It's a performance killer.
WHERE (
SELECT COUNT(1)
FROM wp_term_relationships
WHERE term_taxonomy_id IN (53)
AND object_id = wp_posts.ID
) = 1
Refactoring it to an independent subquery looks like this:
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM wp_posts
JOIN (
SELECT object_id
FROM wp_term_relationships
WHERE term_taxonomy_id IN (53)
GROUP BY object_id
HAVING COUNT(*) = 1
) justone ON wp_posts.ID = justone.object_id
... WHERE ...
See how this works? It needs to scan term_relationships just one time looking for object_ids meeting your criterion (just one). Then the ordinary inner JOIN excludes posts rows that don't meet that criterion. (The dependent subquery loops to scan the table multiple times, while we wait.)
The SQL_FOUND_ROWS thing: WordPress puts it there to help with "pagination" -- it lets WordPress figure out how many pages (in your case of five items) there are to display. It provides data to the familiar
987 Items << < 2 of [ 20 ] > >>
page-selection interface you see in many parts of WordPress: it counts all the items matched by your query (987 in this example), not just one pageload of them.
If you don't need that pagination you can turn it off by giving a 'nopagination' => true element to WP_Query(). But if your query only yields a small number of items without the LIMIT clause, this probably doesn't matter much. If you wrote the query yourself, just leave it out along with the ORDER BY and LIMIT clauses.
So, leaving in the pagination stuff, a better query is
ANALYZE FORMAT=JSON SELECT wp_posts.ID
FROM wp_posts
JOIN (
SELECT object_id
FROM wp_term_relationships
WHERE term_taxonomy_id IN (53)
GROUP BY object_id
HAVING COUNT(*) = 1
) justone ON wp_posts.ID = justone.object_id
WHERE 1 = 1
AND (
wp_posts.ID NOT IN (
SELECT object_id
FROM wp_term_relationships
WHERE term_taxonomy_id IN (137,141)
)
AND wp_posts.post_type = 'post'
AND wp_posts.post_status = 'publish')
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_date DESC LIMIT 0, 5
You also have an unnecessary GROUP BY near the end of your query. It doesn't hurt performance: MySQL can tell it's not needed in this case and doesn't do anything with it. But it is extra stuff. If you wrote the query yourself leave it out.
SQL_CALC_FOUND_ROWS requires doing nearly as much work as the same query without the LIMIT. [However, removing it without doing most of the following things probably won't help much.]
Do you already have this plugin installed? https://wordpress.org/plugins/index-wp-mysql-for-speed/ If not, that may be a good starting point.
WP is not designed to handle millions of posts/attributes/terms; you may have move on beyond WP.
Using JOIN or LEFT JOIN or [NOT] EXISTS ( SELECT 1 ... ) may be more efficient than IN ( SELECT ... ), especially in older versions of MySQL.
Is your SELECT COUNT(1) attempting to demand exactly 1? That is, 2 would be disallowed? If you really wanted to know if any exist, then use
AND EXISTS( SELECT 1 FROM wp11_term_relationships
WHERE term_taxonomy_id IN (53)
AND object_id = wp11_posts.ID )`
A better index for wp11_posts [I don't know whether your WP or the Plugin has this already]:
INDEX(post_status, post_type, -- first, either order is OK
post_date, ID) -- last, in this order
Having the GROUP BY and ORDER BY the 'same' may eliminate a sort. The following change will probably give you the same results, but faster.
GROUP BY wp11_posts.ID
ORDER BY wp11_posts.post_date DESC
-->
GROUP BY wp11_posts.post_date, wp11_posts.ID
ORDER BY wp11_posts.post_date DESC, wp11_posts.ID DESC
I have a meta key which is set by a select drop down so a user can select an option between 1 and 14 and then save their post. I want the posts to display on the page from 1 to 14 ordered by date but if the user creates a new set of posts the next day I also want this to happen so you have posts 1 to 14 each day displaying in that order.. the SQL i have so far is as follows
SELECT SQL_CALC_FOUND_ROWS
wp_postmeta.meta_key,
wp_postmeta.meta_value,
wp_posts.*
FROM wp_posts
INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id)
WHERE 1=1
AND wp_posts.post_type = 'projectgallery'
AND ( wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'private')
AND (wp_postmeta.meta_key = 'gallery_area' )
GROUP BY wp_posts.post_date asc
ORDER BY CAST(wp_postmeta.meta_value AS UNSIGNED) DESC,
DATE(wp_posts.post_date) desc;
Which gives me the following output noticte thatthe posts entered at different dates with either 1 or 3 show up in sequence, ideally i want the latest ones to display directly after 14 so it starts over again. the number 14 should not be static either as if someone adds another option to the select then it will increase and decrease if an option is removed.
GROUP BY is confusingly named. It only makes sense when there's a SUM() or COUNT() or some such function in the SELECT clause. It's not useful here.
The canonical way of getting a post_meta.value into a result set of post items is this. You're close but this makes it easier to read.
SELECT SQL_CALC_FOUND_ROWS
ga.meta_value gallery_area,
p.*
FROM wp_posts p
LEFT JOIN wp_postmeta ga ON p.ID = ga.post_id AND ga.meta_key = 'gallery_area'
WHERE 1=1
AND p.post_status IN ('publish', 'private')
AND p.post_type = 'projectgallery'
Notice the two parts of the ON clause in the JOIN. That way of doing the SQL gets you just the meta_key value you want cleanly.
So, that's your result set. You'll get a row for every post. If the metadata is missing, you'll get a NULL value for gallery_area.
Then you have to order the result set the way you want. First order by date, then order by gallery_area, like so:
ORDER BY DATE(p.post_date) DESC,
0+gallery_area ASC
The 0+value trick is sql shorthand for casting the value as an integer.
Edit. Things can get fouled up if the meta_value items contain extraneous characters like leading spaces. Try diagnosing with these changes. Put
DATE(p.post_date) pdate,
0+ga.meta_value numga,
ga.meta_value gallery_area
in your SELECT clause. If some of the numga items come up zero, this is your problem.
Also try
ORDER BY DATE(p.post_date) DESC,
0+TRIM(gallery_area) ASC
in an attempt to get rid of the spaces. But they might not be spaces.
I have a meta table that looks like this
ID post_id meta_key meta_value
1 1 key_01 val
2 1 key_02 val
3 1 key_03 val
and I want to get all 3 keys and values. Is it any slower to use JOIN then SELECT? JOIN would make it so much easier, for example:
SELECT M1.meta_value AS key_01, M2.meta_value AS key_02, M3.meta_value AS key_04
FROM `meta` AS M1
JOIN `meta` M2 ON M2.post_id = 1 AND M2.meta_key = 'key_02'
JOIN `meta M3 ON M3.post_id = 1 AND M3.meta_key = 'key_03'
WHERE M1.meta_key = 'key_01'
I didn't test that any, but you should get the idea.
I beleive JOINS are more efficient than using nested SELECTS. Also indexing the foreign-key columns (joined columns) will improve performance as well. So pop an index on post_id and meta_key. Indexes may hurt INSERT performance so take this into consideration.
i think that the mysql engine has a better chance to create a good query plan when using joins than it has when doing sub-selects and/or unions.
I've got a table of keywords that I regularly refresh against a remote search API, and I have another table that gets a row each each time I refresh one of the keywords. I use this table to block multiple processes from stepping on each other and refreshing the same keyword, as well as stat collection. So when I spin up my program, it queries for all the keywords that don't have a request currently in process, and don't have a successful one within the last 15 mins, or whatever the interval is. All was working fine for awhile, but now the keywords_requests table has almost 2 million rows in it and things are bogging down badly. I've got indexes on almost every column in the keywords_requests table, but to no avail.
I'm logging slow queries and this one is taking forever, as you can see. What can I do?
# Query_time: 20 Lock_time: 0 Rows_sent: 568 Rows_examined: 1826718
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT JOIN `keywords_requests` as KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
AND (KeywordsRequest.status = 'success' OR KeywordsRequest.status = 'active')
AND KeywordsRequest.source_id = '29'
AND KeywordsRequest.created > FROM_UNIXTIME(1234551323)
)
WHERE KeywordsRequest.id IS NULL
GROUP BY Keyword.id
ORDER BY KeywordsRequest.created ASC;
It seems your most selective index on Keywords is one on KeywordRequest.created.
Try to rewrite query this way:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN (
SELECT *
FROM `keywords_requests` as kr
WHERE created > FROM_UNIXTIME(1234567890) /* Happy unix_time! */
) AS KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
AND (KeywordsRequest.status = 'success' OR KeywordsRequest.status = 'active')
AND KeywordsRequest.source_id = '29'
)
WHERE keyword_id IS NULL;
It will (hopefully) hash join two not so large sources.
And Bill Karwin is right, you don't need the GROUP BY or ORDER BY
There is no fine control over the plans in MySQL, but you can try (try) to improve your query in the following ways:
Create a composite index on (keyword_id, status, source_id, created) and make it so:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN `keywords_requests` kr
ON (
keyword_id = id
AND status = 'success'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
WHERE keyword_id IS NULL
UNION
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN `keywords_requests` kr
ON (
keyword_id = id
AND status = 'active'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
WHERE keyword_id IS NULL
This ideally should use NESTED LOOPS on your index.
Create a composite index on (status, source_id, created) and make it so:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN (
SELECT *
FROM `keywords_requests` kr
WHERE
status = 'success'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
UNION ALL
SELECT *
FROM `keywords_requests` kr
WHERE
status = 'active'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
ON keyword_id = id
WHERE keyword_id IS NULL
This will hopefully use HASH JOIN on even more restricted hash table.
When diagnosing MySQL query performance, one of the first things you need to analyze is the report from EXPLAIN.
If you learn to read the information EXPLAIN gives you, then you can see where queries are failing to make use of indexes, or where they are causing expensive filesorts, or other performance red flags.
I notice in your query, the GROUP BY is irrelevant, since there will be only one NULL row returned from KeywordRequests. Also the ORDER BY is irrelevant, since you're ordering by a column that will always be NULL due to your WHERE clause. If you remove these clauses, you'll probably eliminate a filesort.
Also consider rewriting the query into other forms, and measure the performance of each. For example:
SELECT k.id, k.keyword
FROM `keywords` AS k
WHERE NOT EXISTS (
SELECT * FROM `keywords_requests` AS kr
WHERE kr.keyword_id = k.id
AND kr.status IN ('success', 'active')
AND kr.source_id = '29'
AND kr.created > FROM_UNIXTIME(1234551323)
);
Other tips:
Is kr.source_id an integer? If so, compare to the integer 29 instead of the string '29'.
Are there appropriate indexes on keyword_id, status, source_id, created? Perhaps even a compound index over all four columns would be best, since MySQL will use only one index per table in a given query.
You did a screenshot of your EXPLAIN output and posted a link in the comments. I see that the query is not using an index from Keywords, which makes sense since you're scanning every row in that table anyway. The phrase "Not exists" indicates that MySQL has optimized the LEFT OUTER JOIN a bit.
I think this should be improved over your original query. The GROUP BY/ORDER BY was probably causing it to save an intermediate data set as a temporary table, and sorting it on disk (which is very slow!). What you'd look for is "Using temporary; using filesort" in the Extra column of EXPLAIN information.
So you may have improved it enough already to mitigate the bottleneck for now.
I do notice that the possible keys probably indicate that you have individual indexes on four columns. You may be able to improve that by creating a compound index:
CREATE INDEX kr_cover ON keywords_requests
(keyword_id, created, source_id, status);
You can give MySQL a hint to use a specific index:
... FROM `keywords_requests` AS kr USE INDEX (kr_cover) WHERE ...
Dunno about MySQL but in MSSQL the lines of attack I would take are:
1) Create a covering index on KeywordsRequest status, source_id and created
2) UNION the results tog et around the OR on KeywordsRequest.status
3) Use NOT EXISTS instead o the Outer Join (and try with UNION instead of OR too)
Try this
SELECT Keyword.id, Keyword.keyword
FROM keywords as Keyword
LEFT JOIN (select * from keywords_requests where source_id = '29' and (status = 'success' OR status = 'active')
AND source_id = '29'
AND created > FROM_UNIXTIME(1234551323)
AND id IS NULL
) as KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
)
GROUP BY Keyword.id
ORDER BY KeywordsRequest.created ASC;