Optimising a MySQL query with a SUM in the sub-query

I'm trying to do a very specific thing in WordPress: expire posts over 30 days old that have no "likes" (or negative "likes") based on someone else's plugin. That plugin stores individual likes/dislikes for each user/post in a separate table (+1/-1), which means that my selection criteria are complex, based on a SUM.
Doing the SELECT is easy, as it is a simple JOIN on post ID with a "HAVING" clause to detect a total likes value of less than one. It looks like this (with all the table names simplified for readability):
SELECT posts.id, SUM( wti_like_post.value )
FROM posts
JOIN wti_like_post
ON posts.ID = wti_like_post.post_id
WHERE posts.post_date < DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY posts.ID
HAVING SUM( wti_like_post.value ) < 1
But I'm stuck on optimising the UPDATE query. The unoptimised version takes 2 minutes to run, which is unacceptable.
UPDATE posts
SET posts.post_status = 'trash'
WHERE posts.post_status = 'publish'
AND posts.post_type = 'post'
AND posts.post_date < DATE_SUB(NOW(), INTERVAL 30 DAY)
AND ID IN
(SELECT post_id FROM wti_like_post
GROUP BY post_id
HAVING SUM( wti_like_post.value ) < 1 )
This is obviously because of my inability to create an UPDATE query with a join based on a SUM result - I simply don't know how to do that (believe me, I've tried!).
If anyone could optimise that UPDATE for me, I'd be terribly grateful. It'd also teach me how to do it properly, which would be neat!
Thanks in advance.

Well, it also depends on the number of posts. The subquery also sums the likes for posts that are already trashed, so the filter should go in the subquery rather than in the outer UPDATE. Try this one:
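UPDATE posts
SET posts.post_status = 'trash'
WHERE ID IN
(
-- MySQL rejects a subquery that reads the table being updated (error 1093),
-- so the SELECT is wrapped in a derived table that is materialised first.
SELECT id FROM
(
SELECT posts.id
FROM posts
INNER JOIN wti_like_post
ON (posts.ID = wti_like_post.post_id AND posts.post_status = 'publish'
AND posts.post_type = 'post')
WHERE posts.post_date < DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY posts.ID
HAVING SUM( wti_like_post.value ) < 1
) AS expired
)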

Well, maybe this sounds stupid, but you could create a table out of the SELECT, place an index on it, and then simply use a standard JOIN for the UPDATE against that new table.
I guess that even if you always do that on the fly, it should be faster than the non-indexed version.
EDIT:
Here is the code. Sorry, it's off the top of my head and I haven't checked that it runs, but it should at least give you an idea of what I mean.
CREATE TABLE joinHelper(
id INT NOT NULL,
PRIMARY KEY ( id )
);
-- Aggregate the likes once; one row per post whose total is below 1.
INSERT INTO joinHelper(id)
SELECT post_id FROM wti_like_post
GROUP BY post_id
HAVING SUM( wti_like_post.value ) < 1;
-- Join the posts against the indexed helper table.
UPDATE posts JOIN joinHelper ON (posts.ID = joinHelper.id)
SET posts.post_status = 'trash'
WHERE posts.post_status = 'publish'
AND posts.post_type = 'post'
AND posts.post_date < DATE_SUB(NOW(), INTERVAL 30 DAY);
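The helper-table idea can be tried out end to end with a small sketch. This is not the plugin's real schema: it uses Python's built-in sqlite3 module as a stand-in for MySQL, two simplified tables, a hypothetical function name trash_unliked_posts, and a cutoff date passed in as a parameter instead of DATE_SUB(NOW(), INTERVAL 30 DAY). Because SQLite has no UPDATE ... JOIN, the update step uses IN against the indexed helper table instead.

```python
import sqlite3

def trash_unliked_posts(conn, cutoff):
    """Trash published posts older than `cutoff` whose like total is below 1."""
    cur = conn.cursor()
    # Step 1: aggregate the likes once into a helper table keyed by post id.
    cur.execute("CREATE TEMP TABLE join_helper (id INTEGER PRIMARY KEY)")
    cur.execute(
        "INSERT INTO join_helper (id) "
        "SELECT post_id FROM wti_like_post "
        "GROUP BY post_id HAVING SUM(value) < 1"
    )
    # Step 2: update the posts that appear in the helper table.
    cur.execute(
        "UPDATE posts SET post_status = 'trash' "
        "WHERE post_status = 'publish' AND post_type = 'post' "
        "AND post_date < ? "
        "AND id IN (SELECT id FROM join_helper)",
        (cutoff,),
    )
    changed = cur.rowcount
    cur.execute("DROP TABLE join_helper")
    conn.commit()
    return changed

def demo_trash():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, post_status TEXT,
                            post_type TEXT, post_date TEXT);
        CREATE TABLE wti_like_post (post_id INTEGER, value INTEGER);
        INSERT INTO posts VALUES
            (1, 'publish', 'post', '2020-01-01 00:00:00'),  -- old, total -1
            (2, 'publish', 'post', '2020-01-01 00:00:00'),  -- old, total +2
            (3, 'publish', 'post', '2020-03-01 00:00:00'),  -- recent, total -1
            (4, 'publish', 'page', '2020-01-01 00:00:00');  -- old page, total 0
        INSERT INTO wti_like_post VALUES
            (1, 1), (1, -1), (1, -1), (2, 1), (2, 1), (3, -1), (4, 1), (4, -1);
    """)
    changed = trash_unliked_posts(conn, "2020-02-15 00:00:00")
    statuses = dict(conn.execute("SELECT id, post_status FROM posts"))
    conn.close()
    return changed, statuses
```

Note that, like the queries above, this only considers posts that have at least one row in the likes table; a post nobody has voted on never reaches the helper table.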

Related

Order records added in last 3 days first (exclude certain product type), then order by other columns

I have following query:
SELECT
prod.*,
details.specification,
details.warrantyinfo
FROM
Products prod
INNER JOIN ProductDetails details
ON details.product_id = prod.id
WHERE
prod.is_approved = '1'
AND prod.is_active = '1'
AND prod.is_deleted = '0'
ORDER BY
prod.created_at > '2017-08-14' DESC,
IF(prod.created_at > '2017-08-14', prod.created_at, FIELD(prod.listing_type,2,3,4,5,6)) DESC,
prod.is_featured DESC, prod.updated_at DESC
LIMIT 0, 20
What this query actually does is:
sort products added in the last three days first,
then sort products according to listing_type,
then by is_featured, etc.
Problem
This works perfectly well, but in the first sort condition I want to exclude a product if its listing_type is 2.
Can anybody tell me how to achieve this?
SQLFIDDLE
http://sqlfiddle.com/#!9/9bb584/1
Thanks

For those who stumble into a similar issue in future:
I modified the query to this. Thanks @Solarflare
SELECT
prod.*,
details.specification,
details.warrantyinfo
FROM
Products prod
INNER JOIN ProductDetails details
ON details.product_id = prod.id
WHERE
prod.is_approved = '1'
AND prod.is_active = '1'
AND prod.is_deleted = '0'
ORDER BY
prod.created_at > '2017-08-14' AND prod.`listing_type`>2 DESC,
IF(prod.created_at > '2017-08-14' AND prod.`listing_type`>2, prod.created_at, FIELD(prod.listing_type,2,3,4,5,6)) DESC,
prod.is_featured DESC
And this worked flawlessly.
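The trick of sorting on a boolean expression can be checked with a small sketch. It is a restructured variant, not the query above: it runs on Python's built-in sqlite3 module rather than MySQL, replaces IF(...) and FIELD(...) with a CASE expression and a plain listing_type sort (SQLite has no FIELD function), and uses a hypothetical function name and made-up rows.

```python
import sqlite3

def order_products(conn, cutoff):
    """Recent products with listing_type > 2 first, then the rest by listing_type."""
    sql = """
        SELECT id
        FROM Products
        ORDER BY
            -- 1 for recent rows that are not listing_type 2, otherwise 0
            (created_at > ? AND listing_type > 2) DESC,
            -- newest first inside the recent group; NULL (a tie) elsewhere
            CASE WHEN created_at > ? AND listing_type > 2
                 THEN created_at END DESC,
            listing_type DESC,
            is_featured DESC,
            id
    """
    return [row[0] for row in conn.execute(sql, (cutoff, cutoff))]

def demo_order():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Products (id INTEGER PRIMARY KEY, created_at TEXT, "
        "listing_type INTEGER, is_featured INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Products VALUES (?, ?, ?, ?)",
        [
            (1, "2017-08-15", 3, 0),  # recent, counts as recent
            (2, "2017-08-16", 2, 0),  # recent but listing_type 2, so excluded
            (3, "2017-08-01", 5, 0),  # old
            (4, "2017-08-16", 4, 0),  # recent, counts as recent
            (5, "2017-08-02", 2, 1),  # old, featured
        ],
    )
    result = order_products(conn, "2017-08-14")
    conn.close()
    return result
```

With these rows the two recent products with listing_type above 2 come first (newest first), and product 2, although recent, is sorted with the older products.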

Include zeros in SQL count query?

I want to be able to return 0 when I am doing a count. I'd prefer not to use joins, as my query doesn't use them.
This is my query.
SELECT count( user_id ) as agencyLogins,
DATE_FORMAT(login_date, '%Y-%m-%d') as date
FROM logins, users
WHERE login_date >= '2015-02-10%' AND login_date < '2016-02-11%'
AND logins.user_id = users.id
GROUP BY DATE_FORMAT(login_date,'%Y-%m-%d')
What it does is count the number of times a user has logged into the website.
It doesn't return zeros though, whereas I want to know when there have been no logins.

Please try using explicit joins in the future; they are more readable and will help you avoid this kind of error. What you need is a left join:
SELECT t.id,count(s.user_id) as agencyLogins, DATE_FORMAT(s.login_date, '%Y-%m-%d') as date
FROM users t
LEFT OUTER JOIN logins s
ON(t.id = s.user_id)
WHERE (s.login_date >= '2015-02-10' AND s.login_date < '2016-02-11') or (s.user_id is null)
GROUP BY t.id,DATE_FORMAT(s.login_date,'%Y-%m-%d')

This might help you out:
SELECT SUM(agencyLogins), date FROM (
SELECT count( user_id ) as agencyLogins,
DATE_FORMAT(login_date, '%Y-%m-%d') as date
FROM logins, users
WHERE login_date >= '2015-02-10%' AND login_date < '2016-02-11%'
AND logins.user_id = users.id
GROUP BY DATE_FORMAT(login_date,'%Y-%m-%d')
UNION ALL
SELECT 0,''
) AS A
GROUP BY DATE

I think the SQL below will be useful to you. Please remove the % symbol from the date strings such as '2015-02-10%'.
SELECT IF(COUNT(user_id) IS NULL,'0',COUNT(user_id)) as agencyLogins, DATE_FORMAT(login_date, '%Y-%m-%d') as date FROM users left join logins on logins.user_id = users.id
WHERE date(login_date) >= date('2015-02-10') AND date(login_date) <= date('2016-02-11')
GROUP BY DATE_FORMAT(login_date,'%Y-%m-%d')
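A day with no logins has no row in logins at all, so a zero can only appear if the list of days comes from somewhere else and the logins are left-joined onto it. The sketch below illustrates that idea under several assumptions: it runs on Python's built-in sqlite3 module rather than MySQL, builds the list of days with a recursive CTE (in MySQL this needs version 8.0 or later, or a pre-filled calendar table), leaves out the users table, and uses a hypothetical function name.

```python
import sqlite3

def logins_per_day(conn, first_day, last_day):
    """Count logins for every day in [first_day, last_day], including empty days."""
    sql = """
        WITH RECURSIVE calendar(day) AS (
            SELECT ?
            UNION ALL
            SELECT date(day, '+1 day') FROM calendar WHERE day < ?
        )
        SELECT calendar.day, COUNT(logins.user_id) AS agencyLogins
        FROM calendar
        LEFT JOIN logins ON date(logins.login_date) = calendar.day
        GROUP BY calendar.day
        ORDER BY calendar.day
    """
    return conn.execute(sql, (first_day, last_day)).fetchall()

def demo_logins():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logins (user_id INTEGER, login_date TEXT)")
    conn.executemany(
        "INSERT INTO logins VALUES (?, ?)",
        [
            (1, "2015-02-10 09:00:00"),
            (2, "2015-02-10 17:30:00"),
            (1, "2015-02-12 08:15:00"),
        ],
    )
    rows = logins_per_day(conn, "2015-02-10", "2015-02-13")
    conn.close()
    return rows
```

COUNT(logins.user_id) ignores the NULLs produced by the left join, which is what turns an empty day into 0 instead of 1.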

Getting rows between 2 given dates (included rows on those dates)

I know this should be simple, but it's proving to be quite complicated. I have two tables:
USERS id | name | url
COMMENTS id | id_user | text | lang | date(datetime)
And I want to retrieve all records (comments) between two given dates (both dates included). I have tried two different ways, but they don't work as expected, returning no results where they should:
OPTION A
The following statement returns nothing, although two comments should appear, as their dates are '2014-01-09 16:34:58' and '2014-01-13 10:09:24'
SELECT
comments.text,
users.url,
users.name
FROM comments
JOIN users
ON comments.id_user = users.id
WHERE comments.date BETWEEN '2014-01-13'
AND '2014-01-09'
AND comments.lang = 'es'
ORDER BY comments.date DESC
OPTION B
The following statement returns comments written on '2014-01-09' but not the ones written on '2014-01-13'
SELECT
comments.text,
users.url,
users.name
FROM comments
JOIN users
ON comments.id_user = users.id
WHERE comments.date <= '2014-01-13'
AND comments.date > '2014-01-09'
AND comments.lang = 'es'
ORDER BY comments.date DESC
What am I doing wrong?

Change this:
WHERE comments.date BETWEEN '2014-01-13' AND '2014-01-09'
To this:
WHERE comments.date BETWEEN '2014-01-09' AND '2014-01-13 23:59:59'
The MySQL BETWEEN clause expects the min and max parameters to be ordered properly, so the smaller value must be specified first. Secondly, to include records for 2014-01-13 you need to add the time portion, because '2014-01-13' on its own is treated as midnight at the start of that day. Checking for 23:59:59 is sufficient as long as the column does not store fractional seconds. Alternatively you can write:
WHERE comments.date >= '2014-01-09'
AND comments.date < '2014-01-13' + INTERVAL 1 DAY
-- ^---------------------------^ evaluates to 2014-01-14
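The difference between the three forms is easy to see on a few sample rows. The sketch below is an illustration only: it uses Python's built-in sqlite3 module, where the dates are stored as text and compared as strings, which for this fixed 'YYYY-MM-DD HH:MM:SS' format gives the same ordering as MySQL's DATETIME comparison. The function names and the third sample row are made up.

```python
import sqlite3
from datetime import date, timedelta

def dates_matching(conn, where_clause, params=()):
    """Return the comment timestamps selected by the given WHERE clause."""
    sql = "SELECT date FROM comments WHERE " + where_clause + " ORDER BY date"
    return [row[0] for row in conn.execute(sql, params)]

def demo_ranges():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, date TEXT)")
    conn.executemany(
        "INSERT INTO comments (date) VALUES (?)",
        [
            ("2014-01-09 16:34:58",),
            ("2014-01-13 10:09:24",),
            ("2014-01-14 00:00:00",),  # just outside the wanted range
        ],
    )
    # Bounds in the wrong order: BETWEEN can never match.
    reversed_bounds = dates_matching(
        conn, "date BETWEEN ? AND ?", ("2014-01-13", "2014-01-09"))
    # Correct order, but the upper bound means midnight at the start of the 13th.
    date_only = dates_matching(
        conn, "date BETWEEN ? AND ?", ("2014-01-09", "2014-01-13"))
    # Half-open range: from the first day up to, but excluding, the day after the last.
    upper = (date(2014, 1, 13) + timedelta(days=1)).isoformat()
    half_open = dates_matching(
        conn, "date >= ? AND date < ?", ("2014-01-09", upper))
    conn.close()
    return reversed_bounds, date_only, half_open
```

The half-open form needs no assumption about how precisely the time part is stored, which is why it is often preferred over appending 23:59:59.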

Instead of BETWEEN use this:
SELECT
comments.text,
users.url,
users.name
FROM comments
JOIN users
ON comments.id_user = users.id
WHERE DATE(comments.date) <= '2014-01-13'
AND DATE(comments.date) >= '2014-01-09'
AND comments.lang = 'es'
ORDER BY comments.date DESC
OR
SELECT
comments.text,
users.url,
users.name
FROM comments
JOIN users
ON comments.id_user = users.id
WHERE comments.date <= '2014-01-13 23:59:59'
AND comments.date >= '2014-01-09 00:00:00'
AND comments.lang = 'es'
ORDER BY comments.date DESC

Count tweets between dates (mysql)

I have an assignment to create a Twitter-like database, and in this assignment I have to filter out the trending topics. My idea was to count the tweets with a specific tag between the date the tweet was made and 7 days later, and order them by the count.
I have the following 2 tables I am using for this query:
Table Tweet : id , message, users_id, date
Table Tweet_tags : id, tag, tweet_id
Since MySQL isn't my strong point at all, I'm having trouble getting any results from the query.
The query I tried is:
Select
Count(twitter.tweet_tags.id) As NumberofTweets,
twitter.tweet_tags.tag
From twitter.tweet
Inner Join twitter.tweet_tags On twitter.tweet_tags.tweet_id = twitter.tweet.id
WHERE twitter.tweet_tags.tag between twitter.tweet.date and ADDDATE(twitter.tweet.date, INTERVAL 7 day)
ORDER BY NumberofTweets
The query runs, but gives no results. I just can't get it to work. Could you please help me out with this, or if you have a better way to get the trending topics, please let me know!
Thanks a lot!

This is equivalent to your query, with table aliases to make it easier to read, with BETWEEN replaced by two inequality predicates, and the ADDDATE function replaced with equivalent operation...
SELECT COUNT(s.id) As NumberofTweets
, s.tag
FROM twitter.tweet t
JOIN twitter.tweet_tags s
ON s.tweet_id = t.id
WHERE s.tag >= t.date
AND s.tag <= t.date + INTERVAL 7 DAY
ORDER
BY NumberofTweets
Two things pop out at me here...
First, there is no GROUP BY. To get a count by "tag", you want a GROUP BY tag.
Second, you are comparing "tag" to "date". I don't know your tables, but that just doesn't look right. (I expect "date" is a DATETIME or TIMESTAMP, and "tag" is a character string, maybe what my daughter calls a "hash tag". Or is that tumblr she's talking about?)
If I understand your requirement:
For each tweet, and for each tag associated with that tweet, you want to get a count of the number of other tweets, that have a matching tag, that are made within 7 days after the datetime of the tweet.
One way to get this result would be to use a correlated subquery. (This is probably the easiest approach to understand, but is probably not the best approach from a performance standpoint).
SELECT t.id
, s.tag
, ( SELECT COUNT(1)
FROM twitter.tweet_tags r
JOIN twitter.tweet q
ON q.id = r.tweet_id
WHERE r.tag = s.tag
AND q.date >= t.date
AND q.date <= t.date + INTERVAL 7 DAY
) AS cnt
FROM twitter.tweet t
JOIN twitter.tweet_tags s
ON s.tweet_id = t.id
ORDER
BY cnt DESC
Another approach would be to use a join operation:
SELECT t.id
, s.tag
, COUNT(q.id) AS cnt
FROM twitter.tweet t
JOIN twitter.tweet_tags s
ON s.tweet_id = t.id
LEFT
JOIN twitter.tweet_tags r
ON r.tag = s.tag
LEFT
JOIN twitter.tweet q
ON q.id = r.tweet_id
AND q.date >= t.date
AND q.date <= t.date + INTERVAL 7 DAY
GROUP
BY t.id
, s.tag
ORDER
BY cnt DESC
The counts from both of these queries assume that tweet_tags (tweet_id, tag) is unique. If there are any "duplicates", then including the DISTINCT keyword, i.e. COUNT(DISTINCT q.id) (in place of COUNT(1) and COUNT(q.id) respectively) would get you the count of "related" tweets.
NOTE: the counts returned will include the original tweet itself.
NOTE: removing the LEFT keywords from the query above should return an equivalent result, since the tweet/tag (from t/s) is guaranteed to match itself (from r/q), as long as the tag is not null and the tweet date is not null.
Those queries are going to have problematic performance on large sets. Appropriate covering indexes are going to be needed for acceptable performance:
... ON twitter.tweet_tags (tag, tweet_id)
... ON twitter.tweet (date)
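For comparison, here is a sketch of a simpler reading of "trending": count the tweets per tag inside one fixed 7-day window, using the GROUP BY tag mentioned above. It is not either of the queries in this answer. It assumes Python's built-in sqlite3 module in place of MySQL, window bounds passed in as parameters, a hypothetical function name, and made-up rows.

```python
import sqlite3

def trending_tags(conn, window_start, window_end):
    """Count tweets per tag with window_start <= tweet date < window_end."""
    sql = """
        SELECT s.tag, COUNT(DISTINCT t.id) AS NumberofTweets
        FROM tweet t
        JOIN tweet_tags s ON s.tweet_id = t.id
        WHERE t.date >= ? AND t.date < ?
        GROUP BY s.tag
        ORDER BY NumberofTweets DESC, s.tag
    """
    return conn.execute(sql, (window_start, window_end)).fetchall()

def demo_trending():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tweet (id INTEGER PRIMARY KEY, message TEXT,
                            users_id INTEGER, date TEXT);
        CREATE TABLE tweet_tags (id INTEGER PRIMARY KEY, tag TEXT,
                                 tweet_id INTEGER);
        INSERT INTO tweet VALUES
            (1, 'a', 1, '2024-01-01 10:00:00'),
            (2, 'b', 2, '2024-01-03 12:00:00'),
            (3, 'c', 1, '2024-01-05 09:00:00'),
            (4, 'd', 3, '2023-12-01 08:00:00');  -- outside the window
        INSERT INTO tweet_tags VALUES
            (1, 'sql', 1), (2, 'sql', 2), (3, 'mysql', 2),
            (4, 'sql', 4), (5, 'mysql', 3), (6, 'php', 3);
    """)
    rows = trending_tags(conn, "2024-01-01", "2024-01-08")
    conn.close()
    return rows
```

COUNT(DISTINCT t.id) keeps the count right even if the same tag is attached to a tweet twice, which is the same concern raised above about duplicates in tweet_tags.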

Inefficient SQL

I'm no MySQL expert, but I've managed until now to hack together something that works. Unfortunately, my latest bodged attempt results in the server dying, so obviously I'm doing something that is massively inefficient. Can anyone give me a hint as to where the problem is and how I might get the same results without bringing the whole site down every time?
$sqlbest = "SELECT
wp_postmeta.meta_value
, wp_posts.post_title
, wp_posts.ID
, (TO_DAYS(CURDATE())- TO_DAYS(wp_posts.post_date))+1 AS days
FROM `wp_postmeta` , `wp_posts`
WHERE `wp_postmeta`.`post_id` = `wp_posts`.`ID`
AND `wp_posts`.`post_date` >= DATE_SUB( CURDATE( ) , INTERVAL 1 WEEK)
AND `wp_postmeta`.`meta_key` = 'views'
AND `wp_posts`.`post_status` = 'publish'
AND wp_posts.ID != '".$currentPostID."'
GROUP BY `wp_postmeta`.`post_id`
ORDER BY (CAST( `wp_postmeta`.`meta_value` AS UNSIGNED ) / days) DESC
LIMIT 0 , 4";
$results = $wpdb->get_results($sqlbest);
It uses a post views count to calculate views/day for posts published in the last week, then orders them by that number, and grabs the top 4.
I think I see that it's inefficient in that it has to calculate that views/day every time for a few thousand posts, but I don't know how to do it any better.
Thanks in advance.

You could eliminate the need to call those date functions every time by either passing them statically into the query from your PHP server (which may not be synced with your database) or you can instead write a stored procedure and save the results of those date functions to variables that will then be used in the query.

SELECT
wp_postmeta.meta_value
, wp_posts.post_title
, wp_posts.ID
, DATEDIFF(CURDATE(),wp_posts.post_date)+1 AS days -- 1: DATEDIFF
FROM wp_postmeta
INNER JOIN wp_posts ON (wp_postmeta.post_id = wp_posts.ID) -- 2: explicit join
WHERE wp_posts.post_date >= DATE_SUB( CURDATE( ) , INTERVAL 1 WEEK)
AND wp_postmeta.meta_key = 'views'
AND wp_posts.post_status = 'publish'
AND wp_posts.ID != '".$currentPostID."'
AND wp_postmeta.meta_value > 1 -- 3: extra filter
/* GROUP BY wp_postmeta.post_id */ -- 4: GROUP BY not needed
ORDER BY (CAST( wp_postmeta.meta_value AS UNSIGNED ) / days) DESC
LIMIT 0 , 4;
I've tried to make a few changes:
1. Replaced the two calls to TO_DAYS with one call to DATEDIFF.
2. Replaced the ugly implicit WHERE join with an explicit INNER JOIN. This does not change the result, it just makes things clearer. One thing it shows: if wp_postmeta.post_id is unique, then you do not need the GROUP BY, because the inner join will only give one row per wp_postmeta.post_id.
3. Added an extra filter to remove the posts with a low view count; this limits the number of rows MySQL has to sort.
4. Eliminated the GROUP BY. This is only right if wp_postmeta.post_id is unique!
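The overall shape of the views-per-day ranking can be sketched as follows. This is an illustration under assumptions, not the WordPress query itself: it uses Python's built-in sqlite3 module, computes "today" once in the application and binds it as a parameter (together with the current post ID, instead of concatenating it into the SQL string), and uses julianday() because SQLite has no DATEDIFF. The function name and sample rows are made up.

```python
import sqlite3
from datetime import date, timedelta

def top_posts_by_daily_views(conn, today, current_post_id, limit=4):
    """IDs of last week's published posts, ordered by views per day."""
    week_ago = (date.fromisoformat(today) - timedelta(days=7)).isoformat()
    sql = """
        SELECT ID
        FROM (
            SELECT p.ID,
                   CAST(m.meta_value AS INTEGER) AS views,
                   CAST(julianday(?) - julianday(date(p.post_date)) AS INTEGER) + 1 AS days
            FROM wp_postmeta m
            INNER JOIN wp_posts p ON m.post_id = p.ID
            WHERE p.post_date >= ?
              AND m.meta_key = 'views'
              AND p.post_status = 'publish'
              AND p.ID != ?
        )
        ORDER BY views * 1.0 / days DESC
        LIMIT ?
    """
    rows = conn.execute(sql, (today, week_ago, current_post_id, limit))
    return [row[0] for row in rows]

def demo_top_posts():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT,
                               post_status TEXT, post_date TEXT);
        CREATE TABLE wp_postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT);
        INSERT INTO wp_posts VALUES
            (1, 'A', 'publish', '2024-01-09 08:00:00'),  -- 100 views / 2 days
            (2, 'B', 'publish', '2024-01-04 08:00:00'),  -- 700 views / 7 days
            (3, 'C', 'publish', '2023-12-01 08:00:00'),  -- older than a week
            (4, 'D', 'publish', '2024-01-10 08:00:00'),  -- the current post
            (5, 'E', 'draft',   '2024-01-09 08:00:00');  -- not published
        INSERT INTO wp_postmeta VALUES
            (1, 'views', '100'), (2, 'views', '700'), (3, 'views', '9000'),
            (4, 'views', '30'), (5, 'views', '999');
    """)
    result = top_posts_by_daily_views(conn, "2024-01-10", current_post_id=4)
    conn.close()
    return result
```

As in the rewritten query, this assumes there is only one 'views' row per post; otherwise the join would return a post more than once.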
