SQL: many to many relationships select where multiple criteria - mysql

given these tables :
id_article | title
1 | super article
2 | another article
id_tag | title
1 | great
2 | awesome
id_relation | id_article | id_tag
1 | 1 | 1
2 | 1 | 2
3 | 2 | 1
I'd like to be able to select all articles that are "great" AND "awesome" (eventually, I'll probably have to implement OR too)
And basically, if I do a select on articles the relation table joining on id_article: of course, I cant join two different values of id_tag. Only lead I had with concatenating IDs to test as a string, but that seems so lame, there has to be a prettier solution.
Oh and if it matters, I use a MySQL server.
EDIT: for ByWaleed, the typical sql select that would surely fail that I cited in my original question:
SELECT
a.id_article,
a.title
FROM articles a, relations r
WHERE
r.id_article = a.id_article and r.id_tag = 1 and r.id_tag = 2
wouldnt work because r.id_tag cant obviously be 1 and 2 on the same line. I doubt w3schools has an article on that. My search on google didnt yield any result, probably because I searched with the wrong keyword.

If you do all the joins as normal, then aggregate the rows to one group by article, then you can assert that they must have at least two different tags.
(Having already filtered to great and/or awesome, that means they have both.)
SELECT
a.id_article,
a.title
FROM
articles a
INNER JOIN
relations r
ON r.id_article = a.id_article
INNER JOIN
tags t
ON t.id_tag = r.id_tag
WHERE
t.title IN ('great', 'awesome')
GROUP BY
a.id_article,
a.title
HAVING
COUNT(DISTINCT t.id_tag) = 2
(The DISTINCT is to avoid the possibility of one article having 'great' twice, for example.)
To do OR, you just remove the HAVING clause.

One approach is to aggregate by article, and then assert that the article both the "great" and "awesome" tags:
SELECT
a.id_article,
a.title
FROM articles a
INNER JOIN relations r
ON a.id_article = r.id_article
INNER JOIN tags t
ON r.id_tag = t.id_tag
WHERE
t.title IN ('great', 'awesome')
GROUP BY
a.id_article,
a.title
HAVING
MIN(t.title) <> MAX(t.title);
Demo
The logic here is that we first limit records, for each article, to only those of the two targets tags. Then we assert, in the HAVING clause, that both tags appear. I use a MIN/MAX trick here, because if the min and max differ, then it implies that there are two distinct tags.

Step 1: Use a temp table to get all articles with titles.
Step 2: If an article occurs multiple times in your temp table, that means it has great and awesome as titles.
Try:
CREATE TEMPORARY TABLE MyTempTable (
select t1.id_article, t2.title
from table1 t1
inner join table3 t3 on t3.id_article = t1.id_article
inner join table2 t2 on t2.id_tag = t3.id_tag
)
select m.id_article
from MyTempTable m
group by m.id_article
having count(*)>1
Edit: This solution assumes there are two possible tags, great and awesome. If more, please add a "where" clause to the select query for creating the temp table like where t2.title in ('great','awesome')

Related

Understanding use of multiple SUMs with LEFT JOINS in mysql

Using the GROUP BY command, it is possible to LEFT JOIN multiple tables and still get the desired number of rows from the first table.
For example,
SELECT b.title
FROM books `b`
LEFT JOIN orders `o`
ON o.bookid = b.id
LEFT JOIN authors `a`
ON b.authorid = a.id
GROUP BY b.id
However, since behind the scenes MYSQL is doing a cartesian product on the tables, if you include more than one SUM command you get incorrect values based on all the hidden rows. (The problem is explained fairly well here.)
SELECT b.title,SUM(o.id) as sales,SUM(a.id) as authors
FROM books `b`
LEFT JOIN orders `o`
ON o.bookid = b.id
LEFT JOIN authors `a`
ON b.authorid = a.id
GROUP BY b.id
There are a number of answers on SO about this, most using sub-queries in the JOINS but I am having trouble applying them to this fairly simple case.
How can you adjust the above so that you get the correct SUMs?
Edit
Example
books
id|title|authorid
1|Huck Finn|1
2|Tom Sawyer|1
3|Python Cookbook|2
orders
id|bookid
1|1
2|1
3|2
4|2
5|3
6|3
authors
id|author
1|Twain
2|Beazley
2|Jones
The "correct answer" for total # of authors of the Python Cookbook is 2. However, because there are two joins and the overall dataset is expanded by the join on number of orders, SUM(a.id) will be 4.
You are correct that by joining multiple tables you would not get the expected results.
But in this case you should use COUNT() instead of SUM() and count the distinct orders or authors.
Also by your design you should count the names of the authors and not the ids of the table authors:
SELECT b.title,
COUNT(DISTINCT o.id) as sales,
COUNT(DISTINCT a.author) as authors
FROM books `b`
LEFT JOIN orders `o` ON o.bookid = b.id
LEFT JOIN authors `a` ON b.authorid = a.id
GROUP BY b.id, b.title
See the demo.
Results:
| title | sales | authors |
| --------------- | ----- | ------- |
| Huck Finn | 2 | 1 |
| Tom Sawyer | 2 | 1 |
| Python Cookbook | 2 | 2 |
When dealing with separate aggregates, it is good style to aggregate before joining.
Your data model is horribly confusing, making it look like a book is written by one author only (referenced by books.authorid), while this "ID" is not an author's ID at all.
Your main problem is: You don't count! We count with COUNT. But you are mistakenly adding up ID values with SUM.
Here is a proper query, where I am aggregating before joining and using alias names to fight confusion and thus enhance the query's readability and maintainability.
SELECT
b.title,
COALESCE(o.order_count, 0) AS sales,
COALESCE(a.author_count, 0) AS authors
FROM (SELECT title, id AS book_id, authorid AS author_group_id FROM books) b
LEFT JOIN
(
SELECT id as author_group_id, COUNT(*) as author_count
FROM authors
GROUP BY id
) a ON a.author_group_id = b.author_group_id
LEFT JOIN
(
SELECT bookid AS book_id, COUNT(*) as order_count
FROM orders
GROUP BY bookid
) o ON o.book_id = b.book_id
ORDER BY b.title;
i don't think that your query would work like you eexspected.
Assume one book could have 3 authors.
For Authors:
So you would have three rows for that book in your books table,each one for every Author.
So a
SUM(b.authorid)
gives you the correct answer in your case.
For Orders:
you must use a subselect like
LEFT JOIN (SELECT SUM(id) o_sum,bookid FROM orders GROUP BY bookid) `o`
ON o.bookid = b.id
You should really reconsider your approach with books and authors.

Cleaning up SQL query with nested query and inner join

Been trying to reintroduce myself to SQL through some practice questions I've developed for myself, but struggling to find a better way of approaching the following problem:
playlists
id title
1 Title1
2 Title2
playlist_clips
id playlist_id clip_id
1 Title1 3
2 Title2 1
playlist_tags
playlist_id tag_id
1 1
1 2
2 2
Clips and Tags are two entirely separate tables, and I am using the playlist_tags and playlist_clips to connect them to the playlists table, to represent the two-way one-to-many relationships.
I wanted to select all the playlists that have a given title, and have ALL of the tags provided in the query (in this example [1, 2]), not just "at least one of them".
This is what I've come up with:
select p_clips.* from
(
select p.id, p.title, count(pc.id) as number_of_clips
from playlists p
left join playlist_clips pc on p.id = pc.playlist_id
where p.title like "Test1"
group by id
) as p_clips
inner join
(
select *
from playlists p
left join playlist_tags pt on p.id = pt.playlist_id
where pt.tag_id in (1, 2)
group by id
having count(*) = 2
) as p_tags
on p_clips.id = p_tags.id
Whilst, from my testing I've found this to work, it doesn't look particularly elegant, and I also assume it's not terribly efficient performance-wise. (I've removed irrelevant parameters from the code for this example, such as select parameters.)
What would be a cleaner way of approaching this, or at the least, a more optimized approach?
Expected Result:
id title
260 Title1
EDIT: I apologize for my initial confusing post, I've tried to clean up my tables and the information they contain.
I wanted to select all the playlists that have a given title, and have ALL of the tags provided in the query (in this example [1, 2]), not just "at least one of them".
You don't need the clips table at all. You don't need left joins or the playlists table in the subquery.
That suggests:
select p.*
from playlists p join
(select pt.playlist_id
from playlist_tags pt
where pt.tag_id in (1, 2)
group by id
having count(*) = 2
) pt
on p.id = pt.playlist_id
where p.title like 'Test1';
You could phrase this without a subquery as well:
select p.*
from playlists p join
playlist_tags pt
on p.id = pt.id
where p.title like 'Test1' and
pt.tag_id in (1, 2)
group by p.id
having count(*) = 2

Select rows with Left Outer Join and condition - MySQL

PEOPLE PEOPLE_FAVS
id user_id fav_id
------ ------- ----------
1 1 1
2 1 2
3 1 5
4 2 1
5 2 2
6
I have two tables PEOPLE and PEOPLE_FAVS, I am trying to get all PEOPLE which have not favorited number '5' so it should return
PEOPLE
id
------
2
3
4
5
6
I'm trying with this query:
SELECT `people`.`id`
FROM `people`
LEFT OUTER JOIN `people_favs` ON (`people_favs`.`user_id` = `people`.`id`)
WHERE (`people_favs`.`fav_id` != 5)
GROUP BY `people`.`id`
Here is a SQL Fiddle: http://sqlfiddle.com/#!2/4102b8/3
SELECT p.*
FROM people p
LEFT
JOIN people_favs pf
ON pf.user_id = p.id
AND pf.fav_id = 5
WHERE pf.fav_id IS NULL
http://sqlfiddle.com/#!2/665b6/1
You don't actually need to use an outer join. Outer joins are often used when you want to see ALL rows from one table, regardless of their condition with another. While it would work in this case (as seen by Strawberry's example), you can use the NOT EXISTS operator to check for ids that do not have 5 as a favorite.
As far as I am aware, there is little to no performance difference, but this query is a little shorter. I also feel it is a little more logical, because you aren't really joining information. That's just a personal opinion/thought though.
Try this:
SELECT id
FROM people
WHERE NOT EXISTS(SELECT id FROM people_favs WHERE fav_id = 5 AND user_id = id);
SQLFiddle example using your data.
Did you try to simply do this:
SELECT DISTINCT `people`.`id`
FROM `people`
JOIN `people_favs` ON (`people_favs`.`user_id` = `people`.`id`)
WHERE (`people_favs`.`fav_id` <> 5)
GROUP BY `people`.`id`

MySQL join a different table based on field value

I have a table containing the last comments posted on the website, and I'd like to join a different table depending on the comment type.
Comments Table is similar to this structure:
id | type | ressource_id |
---+------+--------------+
1 | 1 | 10 |
2 | 3 | 7 |
What I'd like to do is to join the "News" table if type type = 1 (on news.id = comments.ressource_id), "tutorial" table if type type = 3, etc.
How can I do this please? I've tried different queries using CASE and UNION, but never got the expected results.
Thanks.
Try using left outer join and a on clause matching on the type:
select coalesce(n.id, t.id) id
, c.type
, c.resource_id
from comments c
left
outer
join news n
on n.id = comments.resource_id
and c.type = 1
left
outer
join tutorial t
on t.id = comments.resource_id
and c.type = 3
You can do this with unioned queries assuming that you can coerce partial queries into producing a similar schema, along the lines of:
select n.id as id, c.something as something
from news n, comments c
where n.type = 1
and n.id = c.resource_id
union all
select n.id as id, t.something as something
from news n, tutorial t
where n.type = 3
and n.id = t.resource_id
In other words, the first query simply joins news and comments for rows where news.type indicates a comment, and the second query joins news and tutorials for rows where news.type indicates a tutorial.
Then the union combined the two into a single record set.
I'd steer clear of case in this situation (and many others) since it almost always invariably requires per-row modification of the data, which rarely scales wee. Running two queries and combining the results is usually more efficient.

MySql In an inner join does it matter which table comes first?

I'm selecting data to output posts a user has book marked.
The main table which holds the ids of the posts a user has bookMarked, is called bookMarks. This is the table based on which posts will be selected from the posts table, to display to the user.
bookMarks
id | postId | userId
--------------------------
1 | US01 | 1
2 | US02 | 1
3 | US01 | 2
4 | US02 | 2
posts
id | postId | postTitle
--------------------------
1 | US01 | Title 1
2 | US02 | Title 2
3 | US03 | Title 3
4 | US04 | Title 4
My sql is currently like this:
select a.postsTitle
from posts a
inner join bookmarks b
on b.userId = a.userId
and b.userId = :userId
Notice, I have the table posts put first before the table bookmarks. But, since I'm selecting based on whats there in bookmarks, is it necessary I declare the table bookmarks first instead of post in the sql statement? Will doing it the way I'm doing it cause and problems in data selection or efficiency?
Or should I do it like:
select b.postsTitle
from bookmarks a
inner join posts b
on a.userId = b.userId
and a.userId = :userId
Notice, I have table bookmarks put first here.
Instead of the following:
select a.postsTitle
from posts a
inner join bookmarks b
on b.userId = a.userId
and b.userId = :userId
You should consider formatting your JOIN in this format, using the WHERE clause, and proper capitalization:
SELECT p.postsTitle
FROM bookmarks b
INNER JOIN posts p
ON p.userId = b.userId
WHERE b.userId = :userId
While it makes no difference (performance wise) to MySQL which order you put the tables in with INNER JOIN (MySQL treats them as equal and will optimize them the same way), it's convention to put the table that you are applying the WHERE clause to first. In fact, assuming proper indexes, MySQL will most likely start with the table that has the WHERE clause because it narrows down the result set, and MySQL likes to start with the set that has the fewest rows.
It's also convention to put the joined table's column first in the ON clause. It just reads more logically. While you're at it, use logical table aliases.
The only caveat is if you don't name your columns and instead use SELECT * like the following:
SELECT *
FROM bookmarks b
INNER JOIN posts p
ON p.userId = b.userId
WHERE b.userId = :userId
You'll get the columns in the order they're listed in the query. In this case, you'll get the columns for bookmarks, followed by the columns for posts.
Most would say never use SELECT * in a production query, but if you really must return all columns, and you needed the columns from posts first, you could simply do the following:
SELECT p.*, b.*
FROM bookmarks b
INNER JOIN posts p
ON p.userId = b.userId
WHERE b.userId = :userId
It's always good to be explicit about the returned result set.
There is no effect on query performance or final resultset with respect to placement of table on either side of JOIN clause if INNER JOIN is used .
The only difference observed is in order of columns returned and that too only if SELECT * is used . Suppose you have tableA(aid,col1,col2) and tableB(bid,col3)
SELECT *
FROM tableA
INNER JOIN tableB
ON tableA.aid=tableB.bid
returns column in order
aid|col1|col2|bid|col3
On otherhand
SELECT *
FROM tableB
INNER JOIN tableA
ON tableA.aid=tableB.bid
returns column in order
bid|col3|aid|col1|col2|
But it matters in case of LEFT JOIN or RIGHT JOIN.