Preventing random ordering in MySQL when not using ORDER BY RAND() - mysql

I am perplexed, because every now and then I get a different row order just by executing this MySQL command directly in Navicat. My understanding is that unless you explicitly use ORDER BY RAND(), the order should be the same every time, but that is not what is happening here.
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts
INNER JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id )
INNER JOIN wp_product dg ON dg.post_id = wp_posts.ID
AND ( dg.location IN ('XV', 'QV', 'DH') )
AND (0 OR (dg.srp = 1) OR (dg.SoldDate > (now() - interval 30 DAY)
AND dg.isPremium = 0 AND dg.SoldDate != '0000-00-00 00:00:00'
AND dg.SoldDate IS NOT NULL AND price > 0 AND NoImage = 0))
AND ( dg.deleted IS NULL OR dg.deleted <> 1 )
WHERE 1=1 AND ( wp_postmeta.meta_key = '_product_info_new' )
AND wp_posts.post_type = 'product'
AND (wp_posts.post_status = 'publish')
GROUP BY wp_posts.ID
ORDER BY dg.SoldDate IS NULL, dg.SoldDate ASC, dg.isPremium DESC LIMIT 0, 30
wp_product has returned the same row count from COUNT(*) for 10 minutes straight, but executing the query above gives me a different result every 5 seconds or so, which is really strange. Is there a way to prevent the same query from returning a different result set?

If you don't specify an ORDER BY, the rows are returned in an undefined order by the SQL standard.
In practice, if you use InnoDB tables, the rows are returned in the order of the index used to access them.
The query optimizer might choose a different index from time to time, depending on the values you search for. It chooses an index using a cost-based algorithm, trying to estimate which index will result in examining the fewest rows, or otherwise minimize overhead.
The estimate is based on statistics about frequency of values in each index, and these are rough estimates, not always precise.
In some cases, the optimizer is choosing between alternatives that it estimates have very close cost according to its algorithm, and very slight changes in the statistics can cause the choice to flip-flop between these alternatives.
The bottom line is that if you need the rows returned in a specific order, then use ORDER BY. This will override the default order-by-index-reads behavior, and cause the result to be sorted if necessary.
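Note that the query in the question does have an ORDER BY, but its sort keys (SoldDate, isPremium) can tie across many rows, and rows that compare equal may come back in any order the engine happens to read them. Appending a unique column as the final sort key makes the order fully deterministic; a sketch against the original query's ORDER BY clause:

```sql
ORDER BY dg.SoldDate IS NULL,
         dg.SoldDate ASC,
         dg.isPremium DESC,
         wp_posts.ID ASC   -- unique tiebreaker: tied rows now come back in the same order every run
LIMIT 0, 30
```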

Related

Two same mysql queries with different execution plans

I am struggling with a MySQL problem.
I have two identical queries, differing only in the item_id at the end, but they produce different execution plans when I run them with ANALYZE/EXPLAIN.
This results in a huge difference of time needed to return a result.
The query is something like
explain select `orders`.*,
(select count(*) from `items`
inner join `orders_items` on `items`.`id` = `orders_items`.`item_id`
where `orders`.`id` = `orders_items`.`order_id`
and `items`.`deleted_at` is null) as `items_count`,
(select count(*) from `returns`
where `orders`.`id` = `returns`.`order_id`
and `returns`.`deleted_at` is null) as `returns_count`,
(select count(*) from `shippings`
where `orders`.`id` = `shippings`.`order_id`
and `shippings`.`deleted_at` is null) as `shippings_count`,
(select count(*) from `orders` as `laravel_reserved_2`
where `orders`.`id` = `laravel_reserved_2`.`recurred_from_id`
and `laravel_reserved_2`.`deleted_at` is null) as `recurred_orders_count`,
(select COALESCE(SUM(orders_items.amount), 0) from `items`
inner join `orders_items` on `items`.`id` = `orders_items`.`item_id`
where `orders`.`id` = `orders_items`.`order_id`
and `items`.`deleted_at` is null) as `items_sum_orders_itemsamount`,
`orders`.*,
`orders_items`.`item_id` as `pivot_item_id`,
`orders_items`.`order_id` as `pivot_order_id`,
`orders_items`.`amount` as `pivot_amount`,
`orders_items`.`id` as `pivot_id`
from `orders`
inner join `orders_items` on `orders`.`id` = `orders_items`.`order_id`
where `orders_items`.`item_id` = 497
and `import_finished` = 1
and `orders`.`deleted_at` is null
order by `id` desc limit 50 offset 0;
As you can see, it is a Laravel/Eloquent query.
This is the execution plan for the query above:
But when I change the item_id at the end, it returns the following execution plan:
It seems absolutely random: about 30% of the item_ids return the faster plan and 70% return the slower one, and I have no idea why. The related data is almost the same for every item in our database.
I also flushed the query cache to see if it was causing the different execution plans, but no success.
I've been googling for 4 hours but can't find anything about this exact problem.
Can someone tell me why this happens?
Thanks in advance!!
Edit 01/21/2023 - 19:04:
It seems MySQL doesn't like ordering by columns that are not part of the relevant WHERE clause, in this case the pivot table orders_items.
I just replaced the
order by id
with
order by orders_items.order_id
This results in a 10 times faster query.
A query using different execution plans just because of a different parameter value can have several causes. The simplest explanation would be the position of the given item_id in the relevant index. The position in the index may affect the estimated cost of using that index, which in turn may affect whether it is used at all. (This is just one example.)
It is important to note that EXPLAIN gives you the planned execution plan, but not necessarily the one actually used.
EXPLAIN ANALYZE is the command that outputs the actually executed plan, with measured timings. It may still yield different results for different parameters.
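A minimal sketch of running it against a simplified form of the query above (EXPLAIN ANALYZE requires MySQL 8.0.18+; this assumes import_finished lives on orders, which the question leaves ambiguous):

```sql
EXPLAIN ANALYZE
SELECT `orders`.*
FROM `orders`
INNER JOIN `orders_items` ON `orders`.`id` = `orders_items`.`order_id`
WHERE `orders_items`.`item_id` = 497
  AND `orders`.`import_finished` = 1
  AND `orders`.`deleted_at` IS NULL
ORDER BY `orders`.`id` DESC
LIMIT 50 OFFSET 0;
```

Comparing the actual rows and loop timings it reports for a "fast" item_id versus a "slow" one is usually the quickest way to see which access path flipped.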
ON `orders`.`id` = `orders_items`.`order_id`
where `orders_items`.`item_id` = 497
and ???.`import_finished` = 1
and `orders`.`deleted_at` is null
order by ???.`id` desc
limit 50 offset 0;
One order maps to many order items, correct?
So, ORDER BY orders.id is different than ORDER BY orders_items.item_id.
Does item_id = 497 show up in many different orders?
So, think about which ORDER BY you really want.
Meanwhile, these may help with performance:
orders_items: INDEX(order_id, item_id)
returns: INDEX(order_id, deleted_at)
shippings: INDEX(order_id, deleted_at)
laravel_reserved_2: INDEX(recurred_from_id, deleted_at)
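Expressed as DDL, those suggestions would look something like this (table and column names taken from the query in the question; note that laravel_reserved_2 is just an alias for orders, so that last index goes on orders):

```sql
ALTER TABLE orders_items ADD INDEX (order_id, item_id);
ALTER TABLE returns      ADD INDEX (order_id, deleted_at);
ALTER TABLE shippings    ADD INDEX (order_id, deleted_at);
ALTER TABLE orders       ADD INDEX (recurred_from_id, deleted_at);
```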

Where to start to optimize slow WP SQL query?

I have the following query, which currently takes about 0.3 s to run, causing heavy load on my WordPress site.
SELECT SQL_CALC_FOUND_ROWS wp11_posts.ID
FROM wp11_posts
WHERE 1=1
AND ( wp11_posts.ID NOT IN (
SELECT object_id
FROM wp11_term_relationships
WHERE term_taxonomy_id IN (137,141) )
AND (
SELECT COUNT(1)
FROM wp11_term_relationships
WHERE term_taxonomy_id IN (53)
AND object_id = wp11_posts.ID ) = 1 )
AND wp11_posts.post_type = 'post'
AND ((wp11_posts.post_status = 'publish'))
GROUP BY wp11_posts.ID
ORDER BY wp11_posts.post_date DESC
LIMIT 0, 5
Where should I start to make it execute faster? Is there an apparent mistake standing out that should definitely have been done differently?
You have a so-called dependent subquery (a/k/a correlated subquery) in your example. It's a performance killer.
WHERE (
SELECT COUNT(1)
FROM wp_term_relationships
WHERE term_taxonomy_id IN (53)
AND object_id = wp_posts.ID
) = 1
Refactoring it to an independent subquery looks like this:
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM wp_posts
JOIN (
SELECT object_id
FROM wp_term_relationships
WHERE term_taxonomy_id IN (53)
GROUP BY object_id
HAVING COUNT(*) = 1
) justone ON wp_posts.ID = justone.object_id
... WHERE ...
See how this works? It needs to scan term_relationships just one time looking for object_ids meeting your criterion (just one). Then the ordinary inner JOIN excludes posts rows that don't meet that criterion. (The dependent subquery loops to scan the table multiple times, while we wait.)
The SQL_CALC_FOUND_ROWS thing: WordPress puts it there to help with "pagination" -- it lets WordPress figure out how many pages (in your case of five items) there are to display. It provides data to the familiar
987 Items << < 2 of [ 20 ] > >>
page-selection interface you see in many parts of WordPress: it counts all the items matched by your query (987 in this example), not just one pageload of them.
If you don't need that pagination you can turn it off by passing a 'no_found_rows' => true element to WP_Query(). But if your query only yields a small number of items without the LIMIT clause, this probably doesn't matter much. If you wrote the query yourself, just leave SQL_CALC_FOUND_ROWS out along with the ORDER BY and LIMIT clauses.
So, leaving in the pagination stuff, a better query is
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM wp_posts
JOIN (
SELECT object_id
FROM wp_term_relationships
WHERE term_taxonomy_id IN (53)
GROUP BY object_id
HAVING COUNT(*) = 1
) justone ON wp_posts.ID = justone.object_id
WHERE 1 = 1
AND (
wp_posts.ID NOT IN (
SELECT object_id
FROM wp_term_relationships
WHERE term_taxonomy_id IN (137,141)
)
AND wp_posts.post_type = 'post'
AND wp_posts.post_status = 'publish')
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_date DESC LIMIT 0, 5
You also have an unnecessary GROUP BY near the end of your query. It doesn't hurt performance: MySQL can tell it's not needed in this case and doesn't do anything with it. But it is extra stuff. If you wrote the query yourself leave it out.
SQL_CALC_FOUND_ROWS requires doing nearly as much work as the same query without the LIMIT. [However, removing it without doing most of the following things probably won't help much.]
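SQL_CALC_FOUND_ROWS is also deprecated as of MySQL 8.0.17; the documented replacement is to run a separate COUNT(*) query with the same WHERE clause. A sketch of that two-query pattern (simplified here to just the post_type/post_status filters, omitting the term-relationship conditions):

```sql
-- page of results
SELECT wp11_posts.ID
FROM wp11_posts
WHERE wp11_posts.post_type = 'post'
  AND wp11_posts.post_status = 'publish'
ORDER BY wp11_posts.post_date DESC
LIMIT 0, 5;

-- total for the pagination widget: same WHERE, no ORDER BY / LIMIT
SELECT COUNT(*)
FROM wp11_posts
WHERE wp11_posts.post_type = 'post'
  AND wp11_posts.post_status = 'publish';
```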
Do you already have this plugin installed? https://wordpress.org/plugins/index-wp-mysql-for-speed/ If not, that may be a good starting point.
WP is not designed to handle millions of posts/attributes/terms; you may need to move on beyond WP.
Using JOIN or LEFT JOIN or [NOT] EXISTS ( SELECT 1 ... ) may be more efficient than IN ( SELECT ... ), especially in older versions of MySQL.
Is your SELECT COUNT(1) attempting to demand exactly 1? That is, 2 would be disallowed? If you really wanted to know if any exist, then use
AND EXISTS( SELECT 1 FROM wp11_term_relationships
WHERE term_taxonomy_id IN (53)
AND object_id = wp11_posts.ID )
A better index for wp11_posts [I don't know whether your WP or the Plugin has this already]:
INDEX(post_status, post_type, -- first, either order is OK
post_date, ID) -- last, in this order
Having the GROUP BY and ORDER BY the 'same' may eliminate a sort. The following change will probably give you the same results, but faster.
GROUP BY wp11_posts.ID
ORDER BY wp11_posts.post_date DESC
-->
GROUP BY wp11_posts.post_date, wp11_posts.ID
ORDER BY wp11_posts.post_date DESC, wp11_posts.ID DESC

Why is limit 0,1 slower than limit 0, 17

I'm trying to analyze why the following query is slower with LIMIT 0,1 than LIMIT 0,100
I've added SQL_NO_CACHE for testing purposes.
Query:
SELECT
SQL_NO_CACHE SQL_CALC_FOUND_ROWS wp_posts.*,
low_stock_amount_meta.meta_value AS low_stock_amount
FROM
wp_posts
LEFT JOIN wp_wc_product_meta_lookup wc_product_meta_lookup ON wp_posts.ID = wc_product_meta_lookup.product_id
LEFT JOIN wp_postmeta AS low_stock_amount_meta ON wp_posts.ID = low_stock_amount_meta.post_id
AND low_stock_amount_meta.meta_key = '_low_stock_amount'
WHERE
1 = 1
AND wp_posts.post_type IN ('product', 'product_variation')
AND (
(wp_posts.post_status = 'publish')
)
AND wc_product_meta_lookup.stock_quantity IS NOT NULL
AND wc_product_meta_lookup.stock_status IN('instock', 'outofstock')
AND (
(
low_stock_amount_meta.meta_value > ''
AND wc_product_meta_lookup.stock_quantity <= CAST(
low_stock_amount_meta.meta_value AS SIGNED
)
)
OR (
(
low_stock_amount_meta.meta_value IS NULL
OR low_stock_amount_meta.meta_value <= ''
)
AND wc_product_meta_lookup.stock_quantity <= 2
)
)
ORDER BY
wp_posts.ID DESC
LIMIT
0, 1
EXPLAIN shows the exact same output:
1 SIMPLE wp_posts index PRIMARY,type_status_date PRIMARY 8 NULL 27071 Using where
1 SIMPLE low_stock_amount_meta ref post_id,meta_key meta_key 767 const 1 Using where
1 SIMPLE wc_product_meta_lookup eq_ref PRIMARY,stock_status,stock_quantity,product_id PRIMARY 8 woocommerce-admin.wp_posts.ID 1 Using where
The average query time is 350ms with LIMIT 0,1
The average query time is 7ms with LIMIT 0,100
The query performance gets faster starting with LIMIT 0,17
I've added another column to the ORDER BY clause as suggested in this question, but that triggers Using filesort in the EXPLAIN output:
Order by wp_posts.post_date, wp_posts.ID desc
1 SIMPLE wp_posts ALL PRIMARY,type_status_date NULL NULL NULL 27071 Using where; Using filesort
1 SIMPLE low_stock_amount_meta ref post_id,meta_key meta_key 767 const 1 Using where
1 SIMPLE wc_product_meta_lookup eq_ref PRIMARY,stock_status,stock_quantity,product_id PRIMARY 8 woocommerce-admin.wp_posts.ID 1 Using where
Is there a way to work around it without altering indices and why is this happening?
It's also interesting that the query time improves starting with LIMIT 0,17. I'm not sure why 17 is a magic number here.
Update 1: I just tried adding FORCE INDEX(PRIMARY) and now LIMIT 0,100 has the same performance as LIMIT 0,1, smh
wp_postmeta has sloppy indexes; this slows down most queries involving it.
O. Jones and I have made a WordPress plugin to improve the indexing of postmeta. We detect all sorts of stuff like the presence of the Barracuda version of the InnoDB storage engine, and other MySQL arcana, and do the right thing.
That may speed up all three averages. It is likely to change the EXPLAIN outputs.
Analyzing this query. I confess I don't understand the performance change from LIMIT 1 to LIMIT 17. Still, the problem for your store's customers (or managers) is the slowness on LIMIT 1. So let's address that.
The question you linked was about PostgreSQL, not MySQL. PostgreSQL has a more sophisticated way of handling ORDER BY ... LIMIT 1 than MySQL does, and the resolution to that problem was adding an appropriate compound index for the required lookup.
It looks to me like the purpose of your query is to find the low-stock or out-of-stock WooCommerce product with the largest wp_posts.ID
The LEFT JOIN to the wp_wc_product_meta_lookup table should be, and is, straightforward: the ON-condition column mentioned is its primary key. This table is, basically, WooCommerce's materialized view of numeric values like stock_quantity stored in wp_postmeta. Numeric values in wp_postmeta can't be indexed because that table stores them as text strings. Yeah. I know.
The LEFT JOIN between wp_posts and wp_postmeta follows the very common ON-condition pattern ON posts.ID = meta.post_id AND meta.meta_key = 'constant'. That ON condition is notorious for poor support by WordPress's standard indexes. More or less the entire purpose of Rick and my Index WP MySQL For Speed plugin is to provide good compound indexes in wp_postmeta to work around that problem.
How so? This is the DDL it runs to add the indexes. These are the most important lines for this purpose (there's more to it; read the linked article):
ALTER TABLE wp_postmeta ADD PRIMARY KEY (post_id, meta_key, meta_id);
ALTER TABLE wp_postmeta ADD KEY meta_key (meta_key, post_id);
These two indexes support the ON-condition pattern in the query. I am pretty sure that adding these keys to postmeta will make your query faster and more predictable.
If the ORDER BY post.ID DESC is a very common use case, an index could be added for that.
You could try refactoring the query (if you have control over its source) to defer the retrieval of details from the wp_posts table. Like this.
SELECT wp_posts.*, postid.low_stock_amount
FROM (
SELECT wp_posts.ID, low_stock_amount_meta.meta_value AS low_stock_amount
FROM
wp_posts
LEFT JOIN wp_wc_product_meta_lookup wc_product_meta_lookup ON wp_posts.ID = wc_product_meta_lookup.product_id
LEFT JOIN wp_postmeta AS low_stock_amount_meta ON wp_posts.ID = low_stock_amount_meta.post_id
AND low_stock_amount_meta.meta_key = '_low_stock_amount'
WHERE
1 = 1
AND wp_posts.post_type IN ('product', 'product_variation')
AND (
(wp_posts.post_status = 'publish')
)
AND wc_product_meta_lookup.stock_quantity IS NOT NULL
AND wc_product_meta_lookup.stock_status IN('instock', 'outofstock')
AND (
(
low_stock_amount_meta.meta_value > ''
AND wc_product_meta_lookup.stock_quantity <= CAST(
low_stock_amount_meta.meta_value AS SIGNED
)
)
OR (
(
low_stock_amount_meta.meta_value IS NULL
OR low_stock_amount_meta.meta_value <= ''
)
AND wc_product_meta_lookup.stock_quantity <= 2
)
)
ORDER BY
wp_posts.ID DESC
LIMIT
0, 1
) postid
LEFT JOIN wp_posts ON wp_posts.ID = postid.ID
This refactoring makes your complex query sort only the wp_posts.ID value and then retrieves the posts data once it has the appropriate value in hand. Lots of WordPress core code does something similar: retrieves a list of post ID values in one query, then retrieves the post data in a second query.
And, by the way, MySQL 8 ignores SQL_NO_CACHE.

MySQL running very slow due to `ORDER BY` & `LIMIT`

I have a performance issue with the query below on MySQL. The query involves 5 tables. When I apply the ORDER BY and LIMIT, the results are retrieved in 0.3 secs, but without them I get the results in 0.01 secs. I have tried changing the query but that did not work. Could someone please help me with this query so I can get the results in the desired time (<0.3 secs)?
Below are the details.
m_todos = 286579 (records)
m_pat = 214858 (records)
users = 119 (records)
m_programs = 26 (records)
role = 4 (records)
SELECT *
FROM (
SELECT t.*,
mp.name as A_name,
u.first_name, u.last_name,
p.first, p.last, p.zone, p.language,p.handling,
r.name,
u2.first_name AS created_first_name,
u2.last_name AS created_last_name
FROM m_todos t
INNER JOIN role r ON t.role_id=r.id
INNER JOIN m_pat p ON t.patient_id = p.id
LEFT JOIN users u2 ON t.created_id=u2.id
LEFT JOIN m_programs mp ON t.prog_id=mp.id
LEFT JOIN users u ON t.user_id=u.id
WHERE t.role_id !='9'
AND t.completed = '0000-00-00 00:00:00'
) C
ORDER BY priority DESC, due ASC
LIMIT 0,10
Get rid of the outer SELECT; move the ORDER BY and LIMIT in.
Indexes:
t: (completed)
t: (priority, due)
I assume priority and due are in t?? Please be explicit in the query. It could make a huge difference.
If the following works, it should speed things up a lot: Start by finding the t.id without all the JOINs:
SELECT id
FROM m_todos
WHERE role_id !='9'
AND completed = '0000-00-00 00:00:00'
ORDER BY priority DESC, due ASC
LIMIT 10
That will benefit from this covering composite index:
INDEX(completed, role_id, priority, due, id)
Debug that. Then use it in the rest:
SELECT t.*, the-other-stuff
FROM ( that-query ) AS t1
JOIN m_todos AS t USING(id)
then-the-rest-of-the-JOINs
ORDER BY priority DESC, due ASC -- yes, again
If you don't need all of t.*, it may be beneficial to spell out the actual columns needed.
The reason for this to run much faster is that the 10 rows are found efficiently by looking only at the one table. The original code was shoveling around a lot more rows than 10 and they included all the columns of t, plus columns from the other tables.
My version does only 10 lookups for all the extra stuff.
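Putting those pieces together, the full refactored query might look like this (a sketch; it assumes priority and due live on m_todos, as suspected above, and reuses the join list from the question):

```sql
SELECT t.*,
       mp.name AS A_name,
       u.first_name, u.last_name,
       p.first, p.last, p.zone, p.language, p.handling,
       r.name,
       u2.first_name AS created_first_name,
       u2.last_name  AS created_last_name
FROM (
    -- find just the 10 ids, using only the one table
    SELECT id
    FROM m_todos
    WHERE role_id != '9'
      AND completed = '0000-00-00 00:00:00'
    ORDER BY priority DESC, due ASC
    LIMIT 10
) AS t1
JOIN m_todos t USING (id)          -- then fetch the wide rows for only those 10
INNER JOIN role r       ON t.role_id = r.id
INNER JOIN m_pat p      ON t.patient_id = p.id
LEFT JOIN users u2      ON t.created_id = u2.id
LEFT JOIN m_programs mp ON t.prog_id = mp.id
LEFT JOIN users u       ON t.user_id = u.id
ORDER BY t.priority DESC, t.due ASC;   -- yes, again: the outer joins don't preserve order
```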

MySQL query ignoring NOT IN clause

This query is processing and running but it is completely ignoring the NOT IN section
SELECT * FROM `offers` as `o` WHERE `o`.country_iso = '$country_iso' AND `o`.`id`
not in (select distinct(offer_id) from aff_disabled_offers
where offer_id = 'o.id' and user_id = '1') ORDER by rand() LIMIT 7
Maybe your "not in" query returns nothing.
Shouldn't the
where offer_id='o.id'
Be
where offer_id=o.id
?
guido has the answer: it looks like you meant to write a correlated subquery, but 'o.id' in quotes is being treated as a string literal, not a column reference.
SOME CAUTIONS:
You usually want some sort of guarantee that the subquery in the NOT IN predicate does NOT return a NULL value: if it returns even one NULL, the NOT IN predicate matches no rows at all. If you don't have that guarantee enforced by the database, adding a WHERE/HAVING return_expr IS NOT NULL in the subquery is sufficient to give you that guarantee.
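A sketch of that guard applied to the query from the question (with the quoted 'o.id' correlation removed, since the intent appears to be simply "offers this user has disabled"; table and column names as given):

```sql
SELECT o.*
FROM offers AS o
WHERE o.country_iso = '$country_iso'
  AND o.id NOT IN (
        SELECT offer_id
        FROM aff_disabled_offers
        WHERE user_id = '1'
          AND offer_id IS NOT NULL  -- a single NULL here would make NOT IN match nothing
      )
ORDER BY RAND()
LIMIT 7;
```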
That correlated subquery is going to eat your lunch, performance wise, on large sets. As will that ORDER BY rand().
Generally, an anti-join pattern turns out to be much more efficient on large sets:
SELECT o.*
FROM offers o
LEFT
JOIN aff_disabled_offers d
ON d.user_id = '1'
AND d.offer_id = o.id
WHERE d.offer_id IS NULL
AND o.country_iso = '$country_iso'
ORDER BY rand()
LIMIT 7