mysql slow queries and timeout in wordpress - mysql

I am not an expert in sql.
My wordpress started to return timeouts and respond really slow.
when I started digging, I noticed that the slow_query log has a lot to tell me.
unfortunately I have a lot of slow queries.
for example:
# Time: 140425 17:03:29
# User#Host: geektime[geektime] # localhost []
# Query_time: 7.024031 Lock_time: 0.000432 Rows_sent: 0 Rows_examined: 0
SET timestamp=1398434609;
SELECT wp_posts.*
FROM wp_posts
INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id)
INNER JOIN wp_postmeta AS mt1 ON (wp_posts.ID = mt1.post_id)
LEFT JOIN wp_postmeta AS order1 ON order1.post_id = wp_posts.ID
AND order1.meta_key = '_event_start_date'
LEFT JOIN wp_postmeta AS order2 ON order2.post_id = wp_posts.ID
AND order2.meta_key = '_event_start_time'
WHERE 1=1
AND wp_posts.post_type = 'event'
AND (wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'future'
OR wp_posts.post_status = 'draft'
OR wp_posts.post_status = 'pending')
AND ((wp_postmeta.meta_key = '_event_start_date'
AND CAST(wp_postmeta.meta_value AS CHAR) BETWEEN '2014-04-11' AND '2014-04-17')
OR (mt1.meta_key = '_event_end_date'
AND CAST(mt1.meta_value AS CHAR) BETWEEN '2014-04-11' AND '2014-04-17'))
GROUP BY wp_posts.ID
ORDER BY order1.meta_value,
order2.meta_value ASC;
The columns post_id, meta_id and meta_key are indexed in wp_postmeta table.
The columns ID, post_name, post_type, post_status, post_date,post_parent, post_author and guid are indexed in wp_posts table.
however, the columns ID and GUID are indexed twice, is it bad?
and there are 4 indexs with the same key_name: type_status_date, is it bad?
How could it be that I have 60K rows in wp_posts and 3M rows in wp_postmeta?
I know its a lot to ask but I really tried to understand from researching online.
thanks in advance.

however, the columns ID and GUID are indexed twice, is it bad?
There are two different columns, so no, unless you're meaning that both have two indexes on them — in which case yes, it's bad and likely a bug in one of your theme or plugins (or a prior bug in WP itself).
and there are 4 indexs with the same key_name: type_status_date, is it bad?
Same as above: if you mean four identical indexes, it's either a theme or plugin or WP bug and you can safely drop the duplications.
How could it be that I have 60K rows in wp_posts and 3M rows in wp_postmeta?
Because the WP meta API sucks and enforces a database anti-pattern called the Entity Attribute Value (also known as EAV):
http://en.wikipedia.org/wiki/Entity-attribute-value_model
Cursory googling SO will yield plenty of threads that explain why it is a bad idea to store data in an EAV or equivalent (json, hstore, xml, whatever) if the stuff ever needs to appear in e.g. a where, join or order by clause.
You can see the inefficiencies first-hand in form of the slow query you highlighted. The query is joining the meta table four times, does so twice with a cast operator to boot — and it casts the value to char instead of date at that. Adding insult to injury, it then proceeds to order rows using values stored within it. It is a recipe for poor performance.
There is, sadly, little means of escaping the repulsive stench of this sewage, short of writing your own plugins that create proper tables to store, index and query the data you need in lieu of using the WP meta API, its wretched quoting madness, and the putrid SQL that results from using it.
One thing that you can do as temporary duct tape and WD-40 measure while you rewrite the plugins you're using from the ground up, is to toss callbacks on one or more of the filters you'll find in the giant mess of a class method that is WP_Query#get_posts(). For instance the posts_request filter, which holds the full and final SQL query, allows you to rewrite anything to your liking using regex-foo. It's no magic bullet: doing so will allow you to fix bugs such as integer values getting sorted lexicographically and such, as well as toss in very occasional query optimizations; little more.
Edit: Upon re-reading your query, methinks you're mostly in luck with respect to that last point. Your particular query features the following abomination:
INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id)
INNER JOIN wp_postmeta AS mt1 ON (wp_posts.ID = mt1.post_id)
LEFT JOIN wp_postmeta AS order1 ON order1.post_id = wp_posts.ID
AND order1.meta_key = '_event_start_date'
LEFT JOIN wp_postmeta AS order2 ON order2.post_id = wp_posts.ID
AND order2.meta_key = '_event_start_time'
Two of those have _event_start_date in common, so you can factor it out:
SELECT wp_posts.*
FROM wp_posts
INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id)
AND wp_postmeta.meta_key = '_event_start_date'
INNER JOIN wp_postmeta AS mt1 ON (wp_posts.ID = mt1.post_id)
AND mt1.meta_key = '_event_end_date'
INNER JOIN wp_postmeta AS order2 ON order2.post_id = wp_posts.ID
AND order2.meta_key = '_event_start_time'
WHERE 1=1
AND wp_posts.post_type = 'event'
AND (wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'future'
OR wp_posts.post_status = 'draft'
OR wp_posts.post_status = 'pending')
AND (CAST(wp_postmeta.meta_value AS CHAR) BETWEEN '2014-04-11' AND '2014-04-17'
OR CAST(mt1.meta_value AS CHAR) BETWEEN '2014-04-11' AND '2014-04-17')
GROUP BY wp_posts.ID
ORDER BY wp_postmeta.meta_value,
order2.meta_value ASC;

Among other things, slow performance is caused by the use of functions like this:
AND CAST(wp_postmeta.meta_value AS CHAR) BETWEEN '2014-04-11' AND '2014-04-17')
Assuming that field is a date field, you will get better performance with something like this:
and wp_postmeta.meta_value >= AStartDateVariable
and wp_postmeta.meta_value < TheDayAfterAnEndDateVariable
That will be even more true if meta_value is indexed. I assume you will be sending these variables as query parmameters.

Holy cow! 3 megarows in postmeta? 60k posts? Something is seriously wrong with your installation.
Is it possible that your events table is open to spammers entering rubbish?
Do you have tons of old expired events that could somehow be purged from your system?
You may be able to get your system back on the air by increasing your timeout value. If you know how to handle php.ini, go find the timeout value and increase it, or ask your hosting company for help.
Are you on one of those $5 per month hosting companies? With sixty thousand events to handle, you may need to upgrade.
The proximate cause of the timeout is obvious. This sequence of code is full-scanning that monster post_meta table TWICE!
Why? It has an OR in it. And it is applying functions to the value of a column.
AND ((wp_postmeta.meta_key = '_event_start_date'
AND CAST(wp_postmeta.meta_value AS CHAR) BETWEEN '2014-04-11' AND '2014-04-17')
OR (mt1.meta_key = '_event_end_date'
AND CAST(mt1.meta_value AS CHAR) BETWEEN '2014-04-11' AND '2014-04-17'))
One of the disadvantages of the WordPress schema when you scale up a site is the generic nature of the postmeta table. This query does date range searches, but it's hard to index a key-value repository like postmeta to optimize those.
Do you know your way around the code of the Events Manager plugin you're using? If so, you may want to investigate optimizing this yourself.
If not, seek support from the Events Manager plugin developer.

Related

very slow query with ORDER BY

I have read through many tutorials online and here on stackoverflow but I still can't figure out how to solve the problem I'm facing right now.
I would like to tell you guys that I'm a mysql newbie so please forgive my noobness.
Alright, the query is this and it grabs the information that I need from wordpress database
SELECT
product.ID productId,
product.guid productLink,
product.post_title productTitle,
post.ID postId,
post.post_title postTitle,
post.post_content postContent,
post.post_date postDate,
tm.slug typeSlug, tm.name typeName,
tm2.slug langSlug, tm2.name langName,
tm3.slug pubSlug, tm3.name pubName,
IFNULL(wl.id,0) wishlist
FROM wp_posts product
JOIN wp_postmeta meta ON meta.meta_key = 'p2m' AND meta.meta_value=product.ID
JOIN wp_posts post ON post.ID = meta.post_id
JOIN wp_term_relationships tr ON tr.object_id = product.ID
JOIN wp_term_taxonomy tt ON tt.term_taxonomy_id = tr.term_taxonomy_id AND tt.taxonomy = 'mtype'
JOIN wp_terms tm ON tm.term_id = tt.term_id
JOIN wp_term_relationships tr2 ON tr2.object_id = product.ID
JOIN wp_term_taxonomy tt2 ON tt2.term_taxonomy_id = tr2.term_taxonomy_id AND tt2.taxonomy = 'language'
JOIN wp_terms tm2 ON tm2.term_id = tt2.term_id
JOIN wp_term_relationships tr3 ON tr3.object_id = product.ID
JOIN wp_term_taxonomy tt3 ON tt3.term_taxonomy_id = tr3.term_taxonomy_id AND tt3.taxonomy = 'publisher'
JOIN wp_terms tm3 ON tm3.term_id = tt3.term_id
LEFT JOIN wp_yith_wcwl wl ON wl.user_id = 1 AND wl.prod_id = product.ID AND wl.post_id = post.ID
WHERE product.post_type = 'product'
ORDER BY post.post_date DESC LIMIT 0,35
When I remove "ORDER BY post.post_date DESC" the speed of the query gets down to .03 seconds which is freaking amazing.. But with the addition of the "ORDER BY post.post_date DESC" the speed of the query goes to amazing 10+ seconds which is way too long..
I've used EXPLAIN and it seems that there is usage of filesort when the ORDER BY by date gets into the query.
I need to have my query reply back the results according to the post_date so I can't figure out what I could do at this point...
Additionally, I would like to point it out that in Database Description of wordpress there is an INDEX referred as "type_status_date" which could be used in my case. However, I'm totally clueless where to use it and how to do it. If there is anyone who can point out the flaw in the logic of my query or help me out with the optimization of the query (or index) please do so. Thanks for you kind attention!
P.S: I don't know how to create an index too :)
Initial Result of EXPLAIN with ORDER BY
JOIN wp_postmeta meta
ON meta.meta_key = 'p2m' -- filters
AND meta.meta_value=product.ID -- shows relation
is confusing. JOIN...ON is used to say how two tables are related. Filters belong in WHERE:
WHERE ...
AND meta.meta_key = 'p2m'
...
wp_postmeta is not well indexed. More discussion here .
Adding INDEX(post_date) may or may not help performance -- It depends on how quickly 35 good rows are found.
From the EXPLAIN, we see that the worst part is getting into meta -- something like 30K rows to look through. This _estimates that there are 30 rows with meta_key = 'p2m'. How many rows are there?
Unfortunately wp_postmeta is not designed to efficiently start with the meta_key+meta_value. This is a general problem with key-value stores (such as Posts in WP), especially when the 'value' is LONGTEXT.
The index on wp_post for type_status_date, has the date field as the third field of the index,
type_status_date INDEX
post_type
post_status
post_date
ID
So you have a predicate for post type, but post status is not included in your query in any predicate, at best it can therefore do a partial index scan of the index (sometimes called a skip scan or range scan depending on the db, my sql is not going to play ball easily on that) but it will be slower.
That is a best case scenario though, with all those joins and additional fields not covered by the index the cost of the index scan and row lookups could be far too high to consider even touching the index vs a straight scan.
It would help if you post the explain plan, it would help confirm what the optimizer was doing. The above is a more generic DB engine commentary.
type_status_date is a combined index so it's used only if you order by all it's components. It cannot be used by MySQL to order by only post_date. So the best solution is to add an index by post_date.

MySQL performance issues with big tables and joins

I have three tables, wp_posts(60000 records), wp_postmeta(130000 records) and news_news_obj(70000 records).
I want to find all the posts from news_news_obj table that are missing from the table wp_posts.
The comparison is made with news_news_obj.id with a custom field that every posts has in wp_postmeta table (oldpostid).
I tried with the 2 queries below first with a limit 30 and the one with NOT IN is faster from the one with the Joins.
The problem is that when I remove the LIMIT the query takes reaaaly too long.. I tried leaving it for a couple of hours and it didn't returned any results.
What can I do for this kind of problem and so big data?
Any help appreciated!
The first query with the joins:
SELECT meta2.id, meta2.title, meta2.main_text
FROM wp_posts
INNER JOIN wp_postmeta meta1 ON meta1.post_id = wp_posts.ID
AND meta1.meta_key = 'oldpostid'
AND wp_posts.post_type = 'post'
RIGHT JOIN news_news_obj meta2 ON meta1.meta_value = meta2.id
WHERE meta1.meta_value IS NULL
The second query I tried with NOT IN:
SELECT news_news_obj.id, news_news_obj.title, news_news_obj.main_text
FROM news_news_obj
WHERE news_news_obj.id NOT IN (
SELECT wp_postmeta.meta_value
FROM wp_posts, wp_postmeta
WHERE wp_posts.ID = wp_postmeta.post_id
AND wp_postmeta.meta_key = 'oldpostid'
AND wp_postmeta.meta_value = news_news_obj.id
AND wp_posts.post_status = 'publish'
AND wp_posts.post_type = 'post'
)
(See my Comments, plus...)
Indexes needed:
posts: INDEX(post_status, post_type, ID)
posts: INDEX(post_type, ID)
postmeta: PRIMARY KEY(post_id, meta_key)
The two queries will potentially get different results because only one has
AND wp_posts.post_status = 'publish'

Modifying a WP_Query Request join condition

I'm having an issue and have tried a lot of methods to solve it -
What I'm trying to do is modify a WordPress WP_Query before it runs: to query based on a post's parent ID rather than it's own ID. To be specific, I have a query that looks something like this:
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM wp_posts
INNER JOIN wp_term_relationships
ON (wp_posts.ID = wp_term_relationships.object_id)
INNER JOIN wp_postmeta
ON ( wp_posts.ID = wp_postmeta.post_id )
WHERE 1=1 AND wp_posts.ID IN (1,2)
AND ( wp_term_relationships.term_taxonomy_id IN (35)
AND wp_posts.post_type = 'product_variation'
AND ((wp_posts.post_status = 'publish'))
GROUP BY wp_posts.ID
ORDER BY wp_postmeta.meta_value DESC LIMIT 0, 12;
The posts passed in are woocommerce product variations. The issue is that I want to return all product variations within the specific taxonomy, but the term_taxonomy_id of 35 is referenced only by the parent. So in the first join condition above, I believe I need:
ON(wp_posts.post_parent = wp_term_relationship.object_id)
Should be easy enough, but I can't figure out a way to modify this query suitably before it runs. Here are some things I have tried:
The tax_query has a primary_id_column that seems like it would be the right value to modify. I tried modifying the args before creating the query like this:
$args['tax_query']['primary_id_column'] = 'post_parent';
$wp_query = new WP_Query( $args );
In this case, the query vars are not modified whatsoever. I also tried several ways of modifying primary_id_column after the object is created, like these:
$wp_query->tax_query->primary_id_column = 'post_parent';
$wp_query->tax_query->get_sql('wp_posts', 'post_parent');
$wp_query->set('primary_id_column','post_parent');
These do in fact modify the query vars, but no matter what - the $wp_query->request string always contains the join condition on wp_posts.ID rather than wp_posts.post_parent. I wondered if at the point I make these changes the query has already been generated and isn't changed before I get the posts. I tried running the above lines in a hook for this reason, using:
add_action( 'pre_get_posts', 'custom_use_parent' );
But no luck. If anybody has a suggestion for how I could modify the join condition in this query, it would be greatly appreciated! Thanks a ton in advance.

Troubleshooting Wordpress/Woocommerce custom SQL query for reporting

Hopefully this is the right forum, my question seems to overlap the stack exchange community so this seemed best.
I have some custom reports for my WooCommerce orders on my wordpress site. I have one query that is just freezing locally, meaning in my localhost my CPU goes to 100% and it never finishes and I don't understand why. To the point here is the query:
SELECT SUM(postmeta.meta_value)
FROM pca_postmeta AS postmeta
LEFT JOIN pca_woocommerce_order_items AS orders ON orders.order_id = postmeta.post_id
WHERE postmeta.meta_key = '_order_total'
AND orders.order_item_id IN (
SELECT item_meta.order_item_id
FROM pca_woocommerce_order_itemmeta AS item_meta
LEFT JOIN pca_woocommerce_order_items AS orders ON item_meta.order_item_id = orders.order_item_id
LEFT JOIN pca_posts AS posts ON posts.ID = orders.order_id
WHERE item_meta.meta_value = '23563'
AND posts.post_status IN ('wc-processing','wc-completed')
GROUP BY orders.order_id
)
As you can hopefully see the goal here is to get the summation of all orders from this specific campaign (23563). The nested query works exactly as expected on its own, returning just a list of IDs like so:
NOTE: little curious if 2.6289 secs is long when it only returned 65 total, although there are 148220 total
The problem is this query doesn't seem to like the nested part. Any suggestions? Completely different approach in mind?
P.S. I use that nested query at other times as well to represent all orders by campaign id in my php reporting class. But for my question PHP has nothing to do with it.
UPDATE/FOLLOW UP:
Is it possible to convert this into a join as described here: Using a SELECT statement within a WHERE clause ? I'm a little light on my SQL so not sure how I would do that but it seems promising
GROUP BY orders.order_id
does not make sense because you are selecting only order_item_id.
pca_woocommerce_order_itemmeta would benefit from
INDEX(meta_value, order_item_id)
An this might be an equivalent query, but avoiding the IN(SELECT...):
SELECT SUM(pm.meta_value)
FROM
( SELECT im.order_item_id
FROM pca_woocommerce_order_itemmeta AS im
LEFT JOIN pca_woocommerce_order_items AS o
ON im.order_item_id = o.order_item_id
LEFT JOIN pca_posts AS posts ON posts.ID = o.order_id
WHERE im.meta_value = '23563'
AND posts.post_status IN ('wc-processing','wc-completed')
GROUP BY o.order_id
) AS w
JOIN pca_woocommerce_order_items AS o ON w.order_item_id = o.order_item_id
JOIN pca_postmeta AS pm ON o.order_id = pm.post_id
WHERE pm.meta_key = '_order_total'
Edit
Some principles behind what I did. Here I am guessing at what the optimizer will do with various possible formulations of the query.
I got rid of LEFT -- This may have changed the output. But I needed to avoid LEFT JOIN ( SELECT ... ) which would not be optimizable.
By having one subquery in the list of "tables" being JOINed, the optimizer will (almost certainly) start with the subquery and do "Nested Loop Joins" to the other tables. NLJ is the common way to perform a query.
A subselect like that has no index, so it needs to be first in the order, else it will be very inefficient.
Without subqueries, the optimizer generally likes to start with whichever table has something in the WHERE clause.
The requirement to start with the subquery "table" is stronger than the desire to pick the table based on WHERE pm.meta_key = '_order_total'.
Inside the subquery, the only "=" test (WHERE im.meta_value = '23563) provides the likely starting point for that set of JOINs. This is further enhanced by it not being 'right' of a LEFT JOIN. Hence, I suggested that index.

Wordpress MySQL - custom meta key order by key and date

I have a meta key which is set by a select drop down so a user can select an option between 1 and 14 and then save their post. I want the posts to display on the page from 1 to 14 ordered by date but if the user creates a new set of posts the next day I also want this to happen so you have posts 1 to 14 each day displaying in that order.. the SQL i have so far is as follows
SELECT SQL_CALC_FOUND_ROWS
wp_postmeta.meta_key,
wp_postmeta.meta_value,
wp_posts.*
FROM wp_posts
INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id)
WHERE 1=1
AND wp_posts.post_type = 'projectgallery'
AND ( wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'private')
AND (wp_postmeta.meta_key = 'gallery_area' )
GROUP BY wp_posts.post_date asc
ORDER BY CAST(wp_postmeta.meta_value AS UNSIGNED) DESC,
DATE(wp_posts.post_date) desc;
Which gives me the following output noticte thatthe posts entered at different dates with either 1 or 3 show up in sequence, ideally i want the latest ones to display directly after 14 so it starts over again. the number 14 should not be static either as if someone adds another option to the select then it will increase and decrease if an option is removed.
GROUP BY is confusingly named. It only makes sense when there's a SUM() or COUNT() or some such function in the SELECT clause. It's not useful here.
The canonical way of getting a post_meta.value into a result set of post items is this. You're close but this makes it easier to read.
SELECT SQL_CALC_FOUND_ROWS
ga.meta_value gallery_area,
p.*
FROM wp_posts p
LEFT JOIN wp_postmeta ga ON p.ID = ga.post_id AND ga.meta_key = 'gallery_area'
WHERE 1=1
AND p.post_status IN ('publish', 'private')
AND p.post_type = 'projectgallery'
Notice the two parts of the ON clause in the JOIN. That way of doing the SQL gets you just the meta_key value you want cleanly.
So, that's your result set. You'll get a row for every post. If the metadata is missing, you'll get a NULL value for gallery_area.
Then you have to order the result set the way you want. First order by date, then order by gallery_area, like so:
ORDER BY DATE(p.post_date) DESC,
0+gallery_area ASC
The 0+value trick is sql shorthand for casting the value as an integer.
Edit. Things can get fouled up if the meta_value items contain extraneous characters like leading spaces. Try diagnosing with these changes. Put
DATE(p.post_date) pdate,
0+ga.meta_value numga,
ga.meta_value gallery_area
in your SELECT clause. If some of the numga items come up zero, this is your problem.
Also try
ORDER BY DATE(p.post_date) DESC,
0+TRIM(gallery_area) ASC
in an attempt to get rid of the spaces. But they might not be spaces.