Select items that don't have a specific value in another table - mysql

I am trying to select items that don't have a specific value in another table. I was able to achieve the result I wanted by using a subquery, but it's very slow, so I am wondering if I could do it differently...
SELECT
content.*,
(SELECT views
FROM content_views
WHERE content_views.content = content.record_num
) as views
FROM content
RIGHT JOIN watch_log ON content.record_num = watch_log.content
WHERE content.enabled = 1
AND 24 NOT IN
(SELECT niche
FROM content_niches
WHERE content_niches.content = content.record_num
)
ORDER BY content.encoded_date
DESC LIMIT 0,6
I tried using a LEFT OUTER JOIN, but couldn't get the same result...
SELECT
content.*,
(SELECT content_views.views
FROM content_views
WHERE content_views.content = content.record_num
) as views
FROM content
RIGHT JOIN watch_log ON content.record_num = watch_log.content
LEFT OUTER JOIN content_niches ON content.record_num = content_niches.content AND content_niches.niche = 24
WHERE content.enabled = 1
ORDER BY content.encoded_date
DESC LIMIT 0,6

Mixing left and right outer joins is just confusing. In fact, right join isn't really needed. It can usually be replaced by left join. In your case, it can be replaced by inner join, because the where clause turns it into an inner join. So, how about:
SELECT c.*,
(SELECT views
FROM content_views cv
WHERE cv.content = c.record_num
) as views
FROM content c JOIN
watch_log wl
ON c.record_num = wl.content
WHERE c.enabled = 1 AND
NOT EXISTS (SELECT 1
FROM content_niches cn
WHERE cn.content = c.record_num AND
cn.niche = 24
)
ORDER BY c.encoded_date DESC
LIMIT 0, 6;
For performance you want indexes: content(enabled, encoded_date, record_num), content_views(content, views), and content_niches(content, niche).
Notes:
Don't mix different types of outer joins, unless you really, really understand what they are doing.
Use table aliases that are abbreviations of the table names. This makes queries easier to write and to read.
Whatever your preference for formatting, don't start a line in a query with DESC (or ASC); this is a modifier on ORDER BY.
NOT EXISTS is better than NOT IN. The former handles NULL values the way you would expect. The latter returns no rows at all if the subquery produces any NULL value.
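To make the NULL point concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL (the three-valued logic for NOT IN is standard SQL, so the behaviour should carry over). The table and column names follow the question; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (record_num INTEGER PRIMARY KEY);
    CREATE TABLE content_niches (content INTEGER, niche INTEGER);
    INSERT INTO content VALUES (1), (2), (3);
    -- content 1 is in niche 24; content 2 has a row whose niche is NULL
    INSERT INTO content_niches VALUES (1, 24), (2, NULL), (3, 7);
""")

# 24 NOT IN (NULL) evaluates to NULL, so content 2 is silently dropped.
not_in = conn.execute("""
    SELECT c.record_num FROM content c
    WHERE 24 NOT IN (SELECT cn.niche FROM content_niches cn
                     WHERE cn.content = c.record_num)
    ORDER BY c.record_num
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs do no harm.
not_exists = conn.execute("""
    SELECT c.record_num FROM content c
    WHERE NOT EXISTS (SELECT 1 FROM content_niches cn
                      WHERE cn.content = c.record_num AND cn.niche = 24)
    ORDER BY c.record_num
""").fetchall()

print(not_in)      # [(3,)]
print(not_exists)  # [(2,), (3,)]
```

Content 2 is not in niche 24, yet only the NOT EXISTS version returns it.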

Related

Understanding the difference between two queries from a performance point of view

I have these two versions of the same query. Both produce the same results (164 rows), but the second one takes .5 sec while the first one takes 17 sec. Can someone explain what's going on here?
TABLE organizations : 11988 ROWS
TABLE transaction_metas : 58232 ROWS
TABLE contracts_history : 219469 ROWS
# TAKES 17 SEC
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
LEFT JOIN `transaction_metas` as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token` and `tm`.`field` = '1'
WHERE `contracts_history`.`seller_id` = '850'
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
# TAKES .6 SEC
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
left join (select * from `transaction_metas` where contract_token in (select token from `contracts_history` where seller_id = 850)) as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token` and `tm`.`field` = '1'
WHERE `contracts_history`.`seller_id` = '850'
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
Explain Results:
First Query: https://prnt.sc/hjtiw6
Second Query: https://prnt.sc/hjtjjg
From my debugging of the first query, it was clear that the left join to the transaction_metas table was making it slow, so I tried to limit its rows instead of joining to the full table. It seems to work, but I don't understand why.
A join is a set of combinations of rows from your tables. With that in mind, in the first query the engine builds all the combinations and filters them afterwards. In the second query it applies the filter before it tries to make the combinations.
The best option would be to put the filter in the JOIN clause, without a subquery,
much like this:
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
AND `contracts_history`.`seller_id` = '850'
LEFT JOIN `transaction_metas` as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token`
AND `tm`.`field` = 1
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
Note: When you reduce the size of the joined tables by filtering with subqueries, the rows may fit into the buffer. It's a nice trick for working around a small buffer limit.
A better explanation:
https://dev.mysql.com/doc/refman/5.5/en/explain-output.html
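One caveat when moving filters between WHERE and ON: for an INNER JOIN the two placements return the same rows, but for a LEFT JOIN a filter on the right-hand table behaves differently depending on where it sits. A minimal sketch, assuming Python's sqlite3 as a stand-in for MySQL and a cut-down version of the tables from the question (the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contracts_history (token TEXT, seller_id INTEGER, buyer_id INTEGER);
    CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transaction_metas (contract_token TEXT, field TEXT, value TEXT);
    INSERT INTO organizations VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO contracts_history VALUES ('t1', 850, 1), ('t2', 850, 2), ('t3', 999, 1);
    INSERT INTO transaction_metas VALUES ('t1', '1', 'BA-East'), ('t3', '1', 'BA-West');
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# INNER JOIN: the seller filter gives the same rows in WHERE or in ON.
inner_where = rows("""SELECT ch.token FROM contracts_history ch
    JOIN organizations org ON org.id = ch.buyer_id
    WHERE ch.seller_id = 850 ORDER BY ch.token""")
inner_on = rows("""SELECT ch.token FROM contracts_history ch
    JOIN organizations org ON org.id = ch.buyer_id AND ch.seller_id = 850
    ORDER BY ch.token""")

# LEFT JOIN: a filter on tm in ON keeps unmatched contracts (value is NULL) ...
left_on = rows("""SELECT ch.token, tm.value FROM contracts_history ch
    LEFT JOIN transaction_metas tm
      ON tm.contract_token = ch.token AND tm.field = '1'
    WHERE ch.seller_id = 850 ORDER BY ch.token""")
# ... while the same filter in WHERE discards them.
left_where = rows("""SELECT ch.token, tm.value FROM contracts_history ch
    LEFT JOIN transaction_metas tm ON tm.contract_token = ch.token
    WHERE ch.seller_id = 850 AND tm.field = '1' ORDER BY ch.token""")

print(inner_where == inner_on)  # True
print(left_on)                  # [('t1', 'BA-East'), ('t2', None)]
print(left_where)               # [('t1', 'BA-East')]
```

This only shows which rows come back; it says nothing about timing, which depends on the MySQL optimizer and the available indexes.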

SQL: LEFT JOIN and alias not working together

$query = "SELECT a.comment_user_id as main_id, a.comment_date as timestamp, a.comment_content as content, a.comment_link_id as link_unique, a.comment_id as status, NULL as url, b.user_login as ulogin, b.user_avatar_source as uavatar, c.link_title as ltitle, NULL as desc FROM kliqqi_comments as a WHERE comment_user_id IN ('$following2')
LEFT JOIN kliqqi_users as b ON a.comment_user_id = b.user_id
LEFT JOIN kliqqi_links as c ON a.comment_user_id = c.link_author
ORDER BY timestamp DESC LIMIT 10";
$result = mysqli_query($db_conx, $query);
$row = $result->fetch_array(MYSQLI_ASSOC);
Can anybody tell me what's wrong with the code? It is always returning this error:
Fatal error: Call to a member function fetch_assoc() on boolean
Boolean means this query is not getting executed due to some error in $query variable which I am unable to figure out.
$following is an array. kliqqi_comments alias a, kliqqi_users alias b, kliqqi_links alias c. I am storing all the other fields as alias too. There is no typo or any other silly mistake. I've checked it thoroughly.
UPDATE:
I'm updating this thread because my query actually has many parts and many users may find it helpful.
$query = "SELECT a.comment_user_id as main_id, a.comment_date as timestamp2, a.comment_content as content, a.comment_link_id as link_unique, a.comment_id as status, b.user_login as ulogin, b.user_avatar_source as uavatar, c.link_title as ltitle FROM kliqqi_comments a
LEFT JOIN kliqqi_users b ON a.comment_user_id = b.user_id
LEFT JOIN kliqqi_links c ON a.comment_link_id = c.link_id
WHERE comment_user_id IN ('$following')
UNION ALL
SELECT d.link_author as main_id, d.link_date as timestamp2, d.link_status as content, d.link_id as link_unique, NULL as status, e.user_login as ulogin, e.user_avatar_source as uavatar, d.link_title as ltitle FROM kliqqi_links d
LEFT JOIN kliqqi_users e ON d.link_author = e.user_id
WHERE link_author IN ('$following') AND link_status IN ('new','published')
UNION ALL
SELECT f.vote_user_id as main_id, f.vote_date as timestamp2, f.vote_value as content, f.vote_link_id as link_unique, NULL as status, g.user_login as ulogin, g.user_avatar_source as uavatar, h.link_title as ltitle FROM kliqqi_votes f
LEFT JOIN kliqqi_users g ON f.vote_user_id = g.user_id
LEFT JOIN kliqqi_links h ON f.vote_link_id = h.link_id
WHERE vote_user_id IN ('$following')
ORDER BY timestamp2 DESC LIMIT 30";
What does it do?
I've 3 tables: kliqqi_comments, kliqqi_links, kliqqi_votes
UNION ALL
All of them have a timestamp field.
I wanted to fetch content from these 3 tables combined, in decreasing order of timestamp. To do so, I used UNION ALL (UNION can also be used here, but UNION has to run duplicate checks, so it's better to avoid it if you can). But UNION ALL works only when all of the SELECT statements return the same number of columns. So, I added NULL columns to even out the numbers.
It is to be noted that there is no restriction of datatype for uniting respective fields. But since I had to use timestamp for sequence, I kept them together.
Alias
Since all the respective fields have different names in different tables, I used alias to avoid confusion. Without alias, results are stored in fields mentioned in first SELECT statement which would be a mess.
Multiple LEFT JOIN
Now, I wanted to grab some data from other tables for each SELECT query.
e.g. for kliqqi_comments (first SELECT statement), I wanted to grab user data for the person who made the comment from kliqqi_users plus I wanted to fetch the link where this comment was made from kliqqi_links table. So, I used left join with kliqqi_comments query where comment_user_id from kliqqi_comments equals to user_id from kliqqi_users and comment_link_id from kliqqi_comments equals link_id from kliqqi_links.
Notice that I managed to equate fields in all 3 statements for UNION ALL.
WHERE IN
$following is a comma-separated list of user IDs, so that only results from people the user is following are returned.
ORDER BY DESC, LIMIT
To order by timestamp and LIMIT output result.
That's it.
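The UNION ALL mechanics described above can be tried on a toy dataset. This is only a sketch: it uses Python's sqlite3 rather than MySQL, and the two tables (comments, votes) and their rows are simplified stand-ins for the kliqqi_* tables, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (user_id INTEGER, created INTEGER, body TEXT, comment_id INTEGER);
    CREATE TABLE votes (user_id INTEGER, created INTEGER, value INTEGER);
    INSERT INTO comments VALUES (1, 100, 'first!', 10), (2, 300, 'nice', 11);
    INSERT INTO votes VALUES (1, 200, 1);
""")

cur = conn.execute("""
    SELECT user_id AS main_id, created AS timestamp2, body AS content, comment_id AS status
    FROM comments
    UNION ALL
    SELECT user_id, created, value, NULL   -- NULL pads the missing 4th column
    FROM votes
    ORDER BY timestamp2 DESC               -- applies to the combined result
    LIMIT 30
""")
column_names = [d[0] for d in cur.description]
feed = cur.fetchall()

print(column_names)  # ['main_id', 'timestamp2', 'content', 'status']
print(feed)          # [(2, 300, 'nice', 11), (1, 200, 1, None), (1, 100, 'first!', 10)]
```

The result columns take their names from the first SELECT, which is why the aliases only need to be written there.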
The WHERE clause should come after the JOIN clauses, not before them. Additionally, desc is a reserved word in MySQL, and timestamp is a keyword (MySQL happens to accept it unquoted, but quoting it is safer). If you absolutely must use them as column aliases, you need to escape them:
SELECT a.comment_user_id as main_id,
a.comment_date as `timestamp`, -- Notice the escaping
a.comment_content as content,
a.comment_link_id as link_unique,
a.comment_id as status,
NULL as url,
b.user_login as ulogin,
b.user_avatar_source as uavatar,
c.link_title as ltitle,
NULL as `desc` -- Notice the escaping
FROM kliqqi_comments as a
LEFT JOIN kliqqi_users as b ON a.comment_user_id = b.user_id
LEFT JOIN kliqqi_links as c ON a.comment_user_id = c.link_author
WHERE comment_user_id IN ('$following2') -- Where clause after the joins
ORDER BY `timestamp` DESC LIMIT 10

Duplicate column name SQL - need change alias?

I have written an SQL query with an INNER JOIN and a subquery:
SELECT c.*,
ar.ArticleName,
ar.idArticle,
du.DetailToUsersName,
du.DetailToUsersPhoto,
COUNT(c.idCommentToArticle) AS CNT,
CASE WHEN d.Count IS NULL THEN 0 ELSE d.Count END AS CountLikes
from (select *
from commenttoarticle g
inner join (select distinct(s.idCommentToArticle)
from commenttoarticle s
order by s.CommentToArticlePID limit 3) as gh) as c
LEFT JOIN article ar ON c.CommentToArticleIdArticle = ar.idArticle
LEFT JOIN detailtousers du ON du.idDetailToUsers = c.CommentToArticleIdUser
LEFT JOIN `likes` d ON (d.IdNote = c.idCommentToArticle AND d.LikeType = 6)
WHERE c.CommentToArticleIdArticle = 11
GROUP BY c.idCommentToArticle
ORDER BY c.idCommentToArticle DESC
I get this error:
Duplicate column name 'idCommentToArticle'
I cannot find where the duplication is.
In the derived table aliased as c, qualify the star so that only the columns of g are selected:
select g.* from commenttoarticle g
instead of
select * from commenttoarticle g
You should also specify a join condition to limit the rows to 3, as you intended; without the ON clause the join behaves like a cross join.
select g.* from commenttoarticle g
inner join (select distinct(s.idCommentToArticle) from commenttoarticle s order by s.CommentToArticlePID limit 3) as gh
on g.idcommenttoarticle = gh.idcommenttoarticle
As @RADAR has suggested, your inner query joins don't seem to be complete. And I see from the comments that once you put the JOIN condition in, you lose all data. I think this is because neither of the subqueries was doing what it was supposed to do.
Here is my attempt at a total solution (note, without dataset and table definition I can't show it working). OK, so you have asked the question again over here and provided a SQL-Fiddle, I have updated with a working version, but minus the additional JOIN tables, since they are not defined.
SELECT c.*,
ar.ArticleName,
ar.idArticle,
du.DetailToUsersName,
du.DetailToUsersPhoto,
COUNT(c.idCommentToArticle) AS CNT,
CASE WHEN d.Count IS NULL THEN 0 ELSE d.Count END AS CountLikes
FROM commenttoarticle c -- one layer of subquery not required.
INNER JOIN (select s.idCommentToArticle, s.CommentToArticlePID -- added both the id and the parent id
FROM commenttoarticle s
WHERE s.CommentToArticleIdArticle = 11 -- moved to inner query, instead of outer query
ORDER BY s.idCommentToArticle DESC limit 3) as gh
ON c.idcommenttoarticle = gh.idcommenttoarticle -- add join condition
OR c.idcommenttoarticle = gh.CommentToArticlePID -- which matches id and parent id
LEFT JOIN article ar ON c.CommentToArticleIdArticle = ar.idArticle
LEFT JOIN detailtousers du ON du.idDetailToUsers = c.CommentToArticleIdUser
LEFT JOIN `likes` d ON (d.IdNote = c.idCommentToArticle AND d.LikeType = 6)
GROUP BY c.idCommentToArticle
ORDER BY c.idCommentToArticle DESC
But let me explain a little further. The following code from your original query was selecting 3 idCommentToArticle values (ordered by CommentToArticlePID),
(select *
from commenttoarticle g
inner join (select distinct(s.idCommentToArticle)
from commenttoarticle s
order by s.CommentToArticlePID limit 3) as gh)
but then because there was no ON specified the 3 records were then joined to every single record from the g reference. This resulted in the full dataset being returned.
Then, when you specified WHERE c.CommentToArticleIdArticle = 11, this filtered the result set back down again to something that looked correct.
When you then added the ON (as per @RADAR's suggestion), the inner query did not contain any values that matched the WHERE c.CommentToArticleIdArticle = 11 filter, and thus you lost all your results. If you move this filter into the inner query as shown above, then these will work together and not conflict.
Within the JOIN condition, you indicate that you want both the matching articles and their parents, so I added both to the return of the inner query, and checked for either in the join condition.
Also I think the whole g table reference is redundant and can be removed. You should be able to access this table directly as c.
I also have some concerns about the GROUP BY and COUNT(c.idCommentToArticle); they seem a little strange, but I have no supporting context (i.e. data examples), so they may be correct. If you still have issues, I would comment the GROUP BY and COUNT statements out, and test to see what data you are getting, before adding these back in.
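The effect of the missing ON clause is easy to reproduce. The sketch below uses Python's sqlite3 instead of MySQL (both accept an INNER JOIN without ON and treat it as a cross join) with a five-row, two-column version of commenttoarticle; the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE commenttoarticle (idCommentToArticle INTEGER PRIMARY KEY,
                                   CommentToArticleIdArticle INTEGER);
    INSERT INTO commenttoarticle VALUES (1, 11), (2, 11), (3, 12), (4, 12), (5, 11);
""")

# The "top 3" derived table, picked without regard to the article id.
top3 = ("(SELECT s.idCommentToArticle FROM commenttoarticle s "
        "ORDER BY s.idCommentToArticle DESC LIMIT 3)")

# No ON clause: every row of g is paired with each of the 3 rows of gh.
without_on = conn.execute(
    f"SELECT COUNT(*) FROM commenttoarticle g INNER JOIN {top3} AS gh"
).fetchone()[0]

# With ON: only the 3 selected comments survive the join.
with_on = conn.execute(
    f"SELECT g.idCommentToArticle FROM commenttoarticle g INNER JOIN {top3} AS gh "
    "ON g.idCommentToArticle = gh.idCommentToArticle ORDER BY g.idCommentToArticle"
).fetchall()

# With ON plus the outer article filter: most of the 3 are for another article.
with_on_filtered = conn.execute(
    f"SELECT g.idCommentToArticle FROM commenttoarticle g INNER JOIN {top3} AS gh "
    "ON g.idCommentToArticle = gh.idCommentToArticle "
    "WHERE g.CommentToArticleIdArticle = 11"
).fetchall()

print(without_on)        # 15
print(with_on)           # [(3,), (4,), (5,)]
print(with_on_filtered)  # [(5,)]
```

The last result shows why the article filter belongs inside the derived table: otherwise the LIMIT 3 is spent on comments the outer WHERE then throws away.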

How to optimize this complicated query?

While running the following query on MySQL, it gets locked:
SELECT event_list.*
FROM event_list
INNER JOIN members
ON members.profilenam=event_list.even_loc
WHERE (even_own IN (SELECT frd_id
FROM network
WHERE mem_id='911'
GROUP BY frd_id)
OR even_own = '911' )
AND event_list.even_active = 'y'
GROUP BY event_list.even_id
ORDER BY event_list.even_stat ASC
The inner query inside the IN constraint returns many frd_id values, and because of that the query above is very slow. Please help.
Thanks.
Try this:
SELECT el.*
FROM event_list el
INNER JOIN members m ON m.profilenam = el.even_loc
WHERE el.even_active = 'y' AND
(el.even_own = 911 OR EXISTS (SELECT 1 FROM network n WHERE n.mem_id=911 AND n.frd_id = el.even_own))
GROUP BY el.even_id
ORDER BY el.even_stat ASC
You don't need the GROUP BY on the inner query; it makes the database engine do a lot of unneeded work.
If you put even_own = '911' before the subquery on network, the engine may be able to skip the subquery for rows where even_own is 911.
Also why do you have a group by on the subquery?
Also run EXPLAIN to find out what is taking the time.
This might work better:
( SELECT e.*
FROM event_list AS e
INNER JOIN members AS m ON m.profilenam = e.even_loc
JOIN network AS n ON e.even_own = n.frd_id
WHERE n.mem_id = '911'
AND e.even_active = 'y' )
UNION DISTINCT
( SELECT e.*
FROM event_list AS e
INNER JOIN members AS m ON m.profilenam = e.even_loc
WHERE e.even_own = '911'
AND e.even_active = 'y' )
ORDER BY even_stat ASC -- unqualified: a UNION's ORDER BY cannot use a table alias
Since I don't know whether the JOINs are one-to-many (or what), I threw in DISTINCT to avoid dups. There may be a better way, or it may be unnecessary (that is, UNION ALL would do).
Notice how I avoid two things that are performance killers:
OR -- turned into UNION
IN (SELECT...) -- turned into JOIN.
I made aliases to cut down on the clutter. I moved the ORDER BY outside the UNION (and added parens to make it work right).
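As a sanity check that the OR form and the UNION form return the same events, here is a sketch on toy data. It runs on Python's sqlite3 rather than MySQL (so plain UNION is used in place of UNION DISTINCT), leaves out the members join for brevity, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_list (even_id INTEGER PRIMARY KEY, even_own INTEGER,
                             even_active TEXT, even_stat INTEGER);
    CREATE TABLE network (mem_id INTEGER, frd_id INTEGER);
    INSERT INTO network VALUES (911, 5), (911, 6), (911, 5);  -- duplicate friend row
    INSERT INTO event_list VALUES
        (1, 911, 'y', 3), (2, 5, 'y', 1), (3, 6, 'n', 2), (4, 7, 'y', 4);
""")

with_or = conn.execute("""
    SELECT e.even_id FROM event_list e
    WHERE e.even_active = 'y'
      AND (e.even_own = 911
           OR EXISTS (SELECT 1 FROM network n
                      WHERE n.mem_id = 911 AND n.frd_id = e.even_own))
    ORDER BY e.even_stat
""").fetchall()

# Each branch handles one side of the OR; UNION removes the duplicate
# produced by the repeated (911, 5) row in network.
with_union = conn.execute("""
    SELECT e.even_id, e.even_stat FROM event_list e
    JOIN network n ON n.frd_id = e.even_own
    WHERE n.mem_id = 911 AND e.even_active = 'y'
    UNION
    SELECT e.even_id, e.even_stat FROM event_list e
    WHERE e.even_own = 911 AND e.even_active = 'y'
    ORDER BY even_stat
""").fetchall()

print(with_or)     # [(2,), (1,)]
print(with_union)  # [(2, 1), (1, 3)]
```

Whether the UNION version is actually faster on the real tables is something only EXPLAIN and timing on MySQL can tell.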

Need help with an SQL query involving multiple tables - Join not an option

SELECT i.*, i.id IN (
SELECT id
FROM w
WHERE w.status='active') AS wish
FROM i
INNER JOIN r ON i.id=r.id
WHERE r.member_id=1 && r.status='active'
ORDER BY wish DESC
LIMIT 0,50
That's a query that I'm trying to run. It doesn't scale well, and I'm wondering if someone here can tell me where I could improve things. I don't join w to r and i because I need to show rows from i that are unrepresented in w. I tried a left join, but it didn't perform too well. This is better, but not ideal yet. All three tables are very large. All three are indexed on the fields I'm joining and selecting on.
Any comments, pointers, or constructive criticisms would be greatly appreciated.
EDIT Addition:
I should have put this in my original question. It's the EXPLAIN output as returned by SQLyog.
id|select_type |table|type |possible_keys|key |key_len|ref |rows|Extra|
1 |PRIMARY |r |ref |member_id,id |member_id|3 |const|3120|Using where; Using temporary; Using filesort
1 |PRIMARY |i |eq_ref |id |id |8 |r.id |1 |
2 |DEPENDENT SUBQUERY|w |index_subquery|id,status |id |8 |func |8 |Using where
EDIT le dorfier - more comments ...
I should mention that the key for w is (member_id, id). So each id can exist multiple times in w, and I only want to know if it exists.
WHERE x IN () is identical to an INNER JOIN to a SELECT DISTINCT subquery, and in general, a join to a subquery will typically perform better if the optimizer doesn't turn the IN into a JOIN - which it should:
SELECT i.*
FROM i
INNER JOIN (
SELECT DISTINCT id
FROM w
WHERE w.status = 'active'
) AS wish
ON i.id = wish.id
INNER JOIN r
ON i.id = r.id
WHERE r.member_id = 1 && r.status = 'active'
ORDER BY wish.id DESC
LIMIT 0,50
Which, would probably be equivalent to this if you don't need the DISTINCT:
SELECT i.*
FROM i
INNER JOIN w
ON w.status = 'active'
AND i.id = w.id
INNER JOIN r
ON i.id = r.id
AND r.member_id = 1 && r.status = 'active'
ORDER BY i.id DESC
LIMIT 0,50
Please post your schema.
If you are using wish as an existence flag, try:
SELECT i.*, CASE WHEN w.id IS NOT NULL THEN 1 ELSE 0 END AS wish
FROM i
INNER JOIN r
ON i.id = r.id
AND r.member_id = 1 && r.status = 'active'
LEFT JOIN w
ON w.status = 'active'
AND i.id = w.id
ORDER BY wish DESC
LIMIT 0,50
You can use the same technique with a LEFT JOIN to a SELECT DISTINCT subquery. I assume you aren't specifying the w.member_id because you want to know if any members have this? In this case, definitely use the SELECT DISTINCT. You should have an index with id as the first column on w as well in order for that to perform:
SELECT i.*, CASE WHEN w.id IS NOT NULL THEN 1 ELSE 0 END AS wish
FROM i
INNER JOIN r
ON i.id = r.id
AND r.member_id = 1 && r.status = 'active'
LEFT JOIN (
SELECT DISTINCT w.id
FROM w
WHERE w.status = 'active'
) AS w
ON i.id = w.id
ORDER BY wish DESC
LIMIT 0,50
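The difference between the plain LEFT JOIN and the LEFT JOIN to a SELECT DISTINCT subquery shows up as soon as an id appears more than once in w. A small sketch, assuming Python's sqlite3 in place of MySQL and made-up rows (r is left out to keep it short):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE i (id INTEGER PRIMARY KEY);
    CREATE TABLE w (member_id INTEGER, id INTEGER, status TEXT);
    INSERT INTO i VALUES (1), (2), (3);
    -- item 1 is wished by two members, item 2 only has an inactive wish
    INSERT INTO w VALUES (10, 1, 'active'), (11, 1, 'active'), (10, 2, 'inactive');
""")

# Plain LEFT JOIN: item 1 comes back once per matching row in w.
plain_join = conn.execute("""
    SELECT i.id, CASE WHEN w.id IS NOT NULL THEN 1 ELSE 0 END AS wish
    FROM i LEFT JOIN w ON w.status = 'active' AND i.id = w.id
    ORDER BY wish DESC, i.id
""").fetchall()

# LEFT JOIN to a DISTINCT subquery: at most one match per item.
distinct_join = conn.execute("""
    SELECT i.id, CASE WHEN w.id IS NOT NULL THEN 1 ELSE 0 END AS wish
    FROM i LEFT JOIN (SELECT DISTINCT id FROM w WHERE status = 'active') AS w
           ON i.id = w.id
    ORDER BY wish DESC, i.id
""").fetchall()

print(plain_join)     # [(1, 1), (1, 1), (2, 0), (3, 0)]
print(distinct_join)  # [(1, 1), (2, 0), (3, 0)]
```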
Please post the EXPLAIN listing. And explain what the tables and columns mean.
wish appears to be a boolean - and you're ORDERing by it?
EDIT: Well, it looks like it's doing what it's being instructed to do. Cade seems to be thinking expansively on what this all could possibly mean (he probably deserves a vote just for effort.) But I'd really rather you tell us.
Wild guessing just confuses everyone (including you, I'm sure.)
OK, based on new info, here's my (slightly less wild) guess.
SELECT i.*,
CASE WHEN EXISTS (SELECT 1 FROM w WHERE w.id = i.id AND w.status = 'active') THEN 1 ELSE 0 END AS wish
FROM i
INNER JOIN r ON i.id = r.id AND r.status = 'active'
WHERE r.member_id = 1
Do you want a row for each match in w? Or just to know, for i.id, whether there is an active w record? I assumed the second answer, so you don't need to ORDER BY - it's for only one ID anyway. And since you're only returning columns from i, if there are multiple rows in r, you'll just get duplicate rows.
How about posting what you expect to get for a proper answer?
...
ORDER BY wish DESC
LIMIT 0,50
This appears to be the big expense. You're sorting by a computed column "wish", which cannot benefit from an index. This forces MySQL to use a temporary table and a filesort (as indicated by the EXPLAIN output), which means it has to build and sort the whole result set before it can apply the LIMIT. If the result does not fit in the sort buffer, the sort spills to disk, which is very slow.
When you post questions like this, you should not expect people to guess how you have defined your tables and indexes. It's very simple to get the full definitions:
mysql> SHOW CREATE TABLE w;
mysql> SHOW CREATE TABLE i;
mysql> SHOW CREATE TABLE r;
Then paste the output into your question.
It's not clear what your purpose is for the "wish" column. The "IN" predicate is a boolean expression, so it always results in 0 or 1. But I'm guessing you're trying to use "IN" in hopes of accomplishing a join without doing a join. It would help if you describe what you're trying to accomplish.
Try this:
SELECT i.*
FROM i
INNER JOIN r ON i.id=r.id
LEFT OUTER JOIN w ON i.id=w.id AND w.status='active'
WHERE r.member_id=1 AND r.status='active'
AND w.id IS NULL
LIMIT 0,50;
It uses an additional outer join, but it doesn't incur a filesort according to my test with EXPLAIN.
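Note that the LEFT OUTER JOIN ... IS NULL pattern returns only the rows of i that have no active row in w, so it answers a slightly different question than the original wish flag. A sketch of what it selects, again on Python's sqlite3 with invented rows, compared against the equivalent NOT EXISTS form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE i (id INTEGER PRIMARY KEY);
    CREATE TABLE r (id INTEGER, member_id INTEGER, status TEXT);
    CREATE TABLE w (member_id INTEGER, id INTEGER, status TEXT);
    INSERT INTO i VALUES (1), (2), (3);
    INSERT INTO r VALUES (1, 1, 'active'), (2, 1, 'active'), (3, 1, 'inactive');
    INSERT INTO w VALUES (1, 1, 'active'), (1, 2, 'inactive');
""")

# Anti-join: keep rows of i where the outer join to w found nothing.
anti_join = conn.execute("""
    SELECT i.id FROM i
    INNER JOIN r ON i.id = r.id
    LEFT OUTER JOIN w ON i.id = w.id AND w.status = 'active'
    WHERE r.member_id = 1 AND r.status = 'active'
      AND w.id IS NULL
    ORDER BY i.id
""").fetchall()

# Same rows expressed with NOT EXISTS.
not_exists = conn.execute("""
    SELECT i.id FROM i
    INNER JOIN r ON i.id = r.id
    WHERE r.member_id = 1 AND r.status = 'active'
      AND NOT EXISTS (SELECT 1 FROM w WHERE w.id = i.id AND w.status = 'active')
    ORDER BY i.id
""").fetchall()

print(anti_join)   # [(2,)]
print(not_exists)  # [(2,)]
```

The w.status = 'active' test has to stay in the ON clause; moving it to WHERE would cancel out the IS NULL check.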
Have you tried this?
SELECT i.*, w.id as wish FROM i
LEFT OUTER JOIN w ON i.id = w.id
AND w.status = 'active'
WHERE i.id in (SELECT id FROM r WHERE r.member_id = 1 AND r.status = 'active')
ORDER BY wish DESC
LIMIT 0,50