How to speed up a NOT IN query? - mysql

I have the following query:
SELECT images.id from images WHERE images.id NOT IN
(
SELECT
temp.target_id
FROM (
SELECT
images_user_groups.images_id AS target_id,
images_user_groups.user_groups_id AS source_id
FROM
images_user_groups
) AS temp
INNER JOIN
user_groups ON user_groups.id = temp.source_id
INNER JOIN
images ON images.id = temp.target_id
WHERE
user_groups.description LIKE "%Freigabe ins Presseportal%"
GROUP BY
temp.target_id
)
It runs, but it goes very slowly -- 5 minutes and counting as I write this. (The subquery that I'm running a NOT IN on returns 36,000 rows, and I assume that the parent query is taking time to check every one of the 38,000 entries in the images table against those 36,000 rows.)
Is there a way I can speed this query up?

It should be a lot faster to write this as a LEFT JOIN, looking for NULL values in user_groups to indicate that the image was not in that group:
SELECT i.id
FROM images i
JOIN images_user_groups iu ON iu.images_id = i.id
LEFT JOIN user_groups u ON u.id = iu.user_groups_id AND u.description LIKE "%Freigabe ins Presseportal%"
GROUP BY i.id
HAVING COUNT(u.id) = 0
The problem with NOT IN and NOT EXISTS is that the WHERE clause has to be evaluated for every row in images.

You can try in below way
SELECT images.id from images WHERE images.id NOT IN
(
SELECT images_user_groups.images_id
FROM images_user_groups INNER JOIN user_groups ON user_groups.id = images_user_groups.user_groups_id
inner join images ON images.id = images_user_groups.images_id
WHERE user_groups.description LIKE "%Freigabe ins Presseportal%"
)A

Related

Improve MySql query left outer joins with subquery

We are maintaining a history of Content. We want to get the updated entry of each content, with create Time and update Time should be of the first entry of the Content. The query contains multiple selects and where clauses with so many left joins. The dataset is very huge, thereby query is taking more than 60 seconds to execute. Kindly help in improving the same. Query:
select * from (select * from (
SELECT c.*, initCMS.initcreatetime, initCMS.initupdatetime, user.name as partnerName, r.name as rightsName, r1.name as copyRightsName, a.name as agelimitName, ct.type as contenttypename, cat.name as categoryname, lang.name as languagename FROM ContentCMS c
left join ContentCategoryType ct on ct.id = c.contentType
left join User user on c.contentPartnerId = user.id
left join Category cat on cat.id = c.categoryId
left join Language lang on lang.id = c.languageCode
left join CopyRights r on c.rights = r.id
left join CopyRights r1 on c.copyrights = r1.id
left join Age a on c.ageLimit = a.id
left outer join (
SELECT contentId, createTime as initcreatetime, updateTime as initupdatetime from ContentCMS cms where cms.deleted='0'
) as initCMS on initCMS.contentId = c.contentId WHERE c.deleted='0' order by c.id DESC
) as temp group by contentId) as c where c.editedBy='0'
Any help would be highly appreciated. Thank you.
Just a partial eval and suggestion because your query seems non properly formed
This left join seems unuseful
FROM ContentCMS c
......
left join (
SELECT contentId
, createTime as initcreatetime
, updateTime as initupdatetime
from ContentCMS cms
where cms.deleted='0'
) as initCMS on initCMS.contentId = c.contentId
same table
the order by (without limit) in a subquery in join is unuseful because join ordered values or unordered value produce the same result
the group by contentId is strange beacuse there aren't aggregation function and the sue of group by without aggregation function is deprecated is sql
and in the most recente version for mysql is not allowed (by deafult) if you need distinct value or just a rows for each contentId you should use distinct or retrive the value in a not casual manner (the use of group by without aggregation function retrive casual value for not aggregated column .
for a partial eval your query should be refactored as
SELECT c.*
, c.initcreatetime
, c.initupdatetime
, user.name as partnerName
, r.name as rightsName
, r1.name as copyRightsName
, a.name as agelimitName
, ct.type as contenttypename
, cat.name as categoryname
, lang.name as languagename
FROM ContentCMS c
left join ContentCategoryType ct on ct.id = c.contentType
left join User user on c.contentPartnerId = user.id
left join Category cat on cat.id = c.categoryId
left join Language lang on lang.id = c.languageCode
left join CopyRights r on c.rights = r.id
left join CopyRights r1 on c.copyrights = r1.id
WHERE c.deleted='0'
) as temp
for the rest you should expiclitally select the column you effectively need add proper aggregation function for the others
Also the nested subquery just for improperly reduce the rows don't help performance ... you should also re-eval you data modelling and design.

MySQL - Ordering IN within INNER JOIN

I'm creating a system that allows a user to search a database of photo albums images for a keyword, it's working great, the only issue is that I'm ordering relevancy by the amount of times that keyword appears in an album. I'm doing this using:
SELECT collections_ids.collection_id
FROM `keywords`
INNER JOIN collections_ids ON keywords.id = collections_ids.photo_id
WHERE keywords.`keyword` = 'trees'
GROUP BY collection_id
ORDER BY COUNT(*) DESC
As said, this works great.
The only issue is, when this is included in an "WHERE IN" query, it loses it's order and is returned randomly. For clarity, here is the query:
SELECT collections.id,
collections.title
images.img_small
FROM `collections`
INNER JOIN images ON images.id = collections.cover_photo
WHERE collections.`id` IN
(SELECT collections_ids.collection_id
FROM `keywords`
INNER JOIN collections_ids ON keywords.id = collections_ids.photo_id
WHERE keywords.`keyword` = 'trees'
GROUP BY collection_id
ORDER BY COUNT(*) DESC)
I've tried researching, and people have suggested using the FIELD function, but I don't see that working in this context.
Any suggestions?
you can use sub query as join and take it count(*) as order by like below
SELECT collections.id,
collections.title
images.img_small
FROM `collections`
INNER JOIN images ON images.id = collections.cover_photo
INNER JOIN
(SELECT distinct collections_ids.collection_id As collection_id,COUNT(*) as total
FROM `keywords`
INNER JOIN collections_ids ON keywords.id = collections_ids.photo_id
WHERE keywords.`keyword` = 'trees'
GROUP BY collection_id
) as A
ON A.collection_id =collections.collection_id
order by A.total

Best way to write this query? Several JOINS

I have this query (below) while it does work I am wondering if it is the best as it will be going against thousands of records. I will try to explain the best I can.
SELECT items.*,
p.file AS item_pic,
i_f.id AS favorite_id,
COALESCE(f.favorite_count, 0) AS favorite_count,
COALESCE(b.num_buys, 0) AS num_buys,
COALESCE(c.comment_count, 0) AS comment_count
FROM items i
INNER JOIN (SELECT file,
item_id
FROM item_pics
ORDER BY item_pics.id ASC) AS p
ON p.item_id = i.id
LEFT JOIN (SELECT COUNT(*) AS favorite_count,
item_id
FROM item_favorites
GROUP BY item_id) AS f
ON f.item_id = i.id
LEFT JOIN (SELECT COUNT(*) AS num_buys,
item_id
FROM purchases
GROUP BY item_id) AS b
ON b.item_id = i.id
LEFT JOIN (SELECT COUNT(*) AS comment_count,
item_id
FROM comments
GROUP BY item_id) AS c
ON c.item_id = i.id
LEFT JOIN item_favorites AS i_f
ON i.id = i_f.item_id
AND i_f.userid = '14'
GROUP BY i.id
LIMIT 0, 20
So we are selecting the items in the database. The first join is for a picture (Items have multiple pictures but I only want one).
The next join is for favorite count. Each time a user favorites something it adds it to the table favorites with some info, so I am just trying to get the total number of favorites for that item.
Next up is the number of purchases for this item. Pretty much the same as favorites.
After that it is for comments. Again this is just like the purchases and favorites count.
The last join is to see if the logged in user (id 14) has favorited this item if not I use COALESCE to return 0.
Like I said this all works correctly but it does take a few seconds to load on a table of about 6700 items and about 180K rows in the purchases table for only loading 20 at a time (I do a scrolling/load similar to Facebook/Twitter). Indexes have been properly setup on all tables. Once this is complete/correct I would like to know how to limit results for purchases in the last seven days and order by number of purchases (num_buys).
EDIT: Results from EXPLAIN
I suppose you want the first picture (lowest id), and pictures are required, where as everything else is optional.
I guess you're doing subqueries because you think joining on uncorrelated subqueries (hitting the joined tables just once) will be faster than correlated subqueries or a plain JOIN. However, you end up having to lookup the records twice, and the second lookup (for the actual join) doesn't get to use an index because derived (temporary tables) don't have indexes.
Try normal JOINs:
SELECT items.*,
p.file AS item_pic,
COALESCE(i_f.id, 0) AS favorite_id,
COUNT(f.item_id) AS favorite_count,
COUNT(b.item_id) AS num_buys,
COUNT(c.item_id) AS comment_count
FROM items i
STRAIGHT_JOIN item_pics p
ON p.item_id = i.id
LEFT JOIN item_pics p2
ON p2.item_id = i.id
AND p2.id < p1.id
LEFT JOIN item_favorites f
ON f.item_id = i.id
LEFT JOIN purchases b
ON b.item_id = i.id
LEFT JOIN comments c
ON c.item_id = i.id
LEFT JOIN item_favorites AS i_f
ON i_f.item_id = i.id
AND i_f.userid = '14'
WHERE p2.id IS NULL
GROUP BY i.id
LIMIT 20
The double join on pictures is an anti-join WHERE p2.id IS NULL, to retrieve the picture with the lowest id.

How can I get the sum of a column ?

I have 3 tables: activites, taks and requirements. I want to return all of the duration of all the tasks for a specific requirement. This is my query:
SELECT r.id as req_id,
r.project_id,
r.name as req_name,
r.cost,r.estimated,
p.name as project_name,
v.name AS `status` ,
t.taskid,
(SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(duration)))
FROM activities a
WHERE a.taskid = t.taskid) AS worked
FROM requirements r
INNER JOIN projects p
ON p.projectid = r.project_id
INNER JOIN `values` v
ON v.id = r.r_status_id
LEFT JOIN tasks t
on t.id_requirement = r.id
WHERE 1 = 1
ORDER BY req_id desc
And this is the result :
As you can see there are 2 same req_id (48) . I want to appear one time and get the sum of the last two rows in worked. How can I manage that ?
this is the activities structure :
this is tasks structure :
and this is the requirement structure :
Include your activities table in the JOIN, GROUP by all requirement columns you need and add a sum. Since you are aggregating tasks, you cannot have taskid in the SELECT clause.
SELECT r.id as req_id,
r.project_id,
r.name as req_name,
r.cost,r.estimated,
p.name as project_name,
v.name AS `status` ,
SEC_TO_TIME(SUM(TIME_TO_SEC(a.duration)))
FROM requirements r
INNER JOIN projects p ON p.projectid = r.project_id
INNER JOIN `values` v ON v.id = r.r_status_id
LEFT JOIN tasks t ON t.id_requirement = r.id
LEFT JOIN activities a ON a.taskid=t.taskid
WHERE 1 = 1
GROUP BY r.id, r.project_id, r.name,r.cost,r.estimated,p.name, v.name
ORDER BY req_id desc
The joins in your query appear to be creating extra rows. I'm sure there is a way to fix the logic directly, possibly by pre-aggregating some results in the from clause.
Your duplicates appear to be complete duplicates (every column is exactly the same). The easy way to fix the problem is to use select distinct. So, just start your query with:
SELECT DISTINCT r.id as req_id, r.project_id, r.name as req_name,
. . .
I suspect that one of your underlying tables has duplicated rows that you are not expecting, but that is another issue.

SQL - Ambiguous Column

I can't see what is wrong with this query. I get an error saying:
"column article_id in from clause is ambiguous"
I understand that it may have something to do with table name aliases but not sure of how to fix. If the query was smaller I may be able to work something out but it's pretty confusing to me and every time I change something to try and fix it, something else stops - so I thought I'd ask first.
SELECT bt.topic_title, f.article_id, p.photo_id, ba.title, ba.slug,
IFNULL(c.cnt,0) comments, IFNULL(ph.cnt,0) photos, IFNULL(v.cnt,0) videos
FROM blog_article_followers AS f
LEFT OUTER JOIN (
SELECT article_id, COUNT(comment_id) as cnt
FROM blog_comments
GROUP BY article_id) c
ON f.article_id = c.article_id
LEFT OUTER JOIN (" _
SELECT article_id, COUNT(photo_id) as cnt
FROM photos
GROUP BY article_id) ph
ON f.article_id = ph.article_id
LEFT OUTER JOIN (
SELECT article_id, COUNT(video_id) as cnt
FROM videos
GROUP BY article_id) v
ON f.article_id = v.article_id
LEFT JOIN blog_topics bt ON f.topic_id = bt.topic_id
LEFT JOIN blog_articles AS ba USING (article_id)
LEFT JOIN photos AS p USING (article_id)
WHERE f.member_id = 100 AND p.cover = 1
ORDER BY f.follow_date DESC;
Try replacing this:
LEFT JOIN blog_articles AS ba USING (article_id)
LEFT JOIN photos AS p USING (article_id)
With this
LEFT JOIN blog_articles AS ba ON f.article_id = ba.article_id
LEFT JOIN photos AS p ON f.article_id = photos.article_id
you have to rename the column
LEFT JOIN photos AS p USING (p.article_id)
or to whichever table article_id belongs to