MySQL: SELECT and GROUP BY

Sorry for the abysmal title - if someone wants to change it for something more self-explanatory, great - I'm not sure how to express the problem. Which is:
I have a table like so:
POST_ID (INT) TAG_NAME (VARCHAR)
1 'tag1'
1 'tag2'
1 'tag3'
2 'tag2'
2 'tag4'
....
What I want to do is count the number of POSTs which have both tag1 AND tag2.
I've messed about with GROUP BY and DISTINCT and COUNT but I can't construct a query which does the trick.
Any suggestions?
Edit: In pseudo sql, the query I want is:
SELECT DISTINCT(POST_ID) WHICH HAS TAG_NAME = 'tag1' AND TAG_NAME = 'tag2';
Thanks

Edit: because 'TABLE' was a poor choice for a missing tablename, I'll suppose your table is called Posts.
Join the table against itself:
SELECT * FROM Posts P1
JOIN Posts P2
ON P1.POST_ID = P2.POST_ID
WHERE P1.TAG_NAME = 'tag1'
AND P2.TAG_NAME = 'tag2'
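To get the count the question asks for, rather than the joined rows, the self-join can be wrapped in COUNT(DISTINCT ...). Here is a minimal sketch of that idea. It runs through Python's sqlite3 module as a stand-in for MySQL, and the sample rows, the `conn` variable and the `both_tags_count` name are assumptions for illustration. The SQL itself is plain enough that MySQL should accept it unchanged.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (POST_ID INTEGER, TAG_NAME TEXT)")
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?)",
    [(1, "tag1"), (1, "tag2"), (1, "tag3"), (2, "tag2"), (2, "tag4")],
)

# Self-join: P1 supplies the 'tag1' row, P2 the 'tag2' row of the same post.
# COUNT(DISTINCT ...) keeps the count correct even if a tag row is duplicated.
both_tags_count = conn.execute(
    """
    SELECT COUNT(DISTINCT P1.POST_ID)
    FROM Posts P1
    JOIN Posts P2 ON P1.POST_ID = P2.POST_ID
    WHERE P1.TAG_NAME = 'tag1'
      AND P2.TAG_NAME = 'tag2'
    """
).fetchone()[0]

print(both_tags_count)
```

With the sample rows from the question, only post 1 carries both tags, so post 2 does not contribute to the count.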

I'm just leaving this (untested) dependent subquery solution here for reference, even though it'll probably be horribly slow once you get to large data sets. Any solution that does the same thing using joins should be chosen over this.
Assuming you have a posts table with an id field, as well:
SELECT COUNT(*)
FROM posts
WHERE EXISTS (SELECT NULL FROM posts_tags WHERE tag = 'tag1' AND post_id = posts.id)
AND EXISTS (SELECT NULL FROM posts_tags WHERE tag = 'tag2' AND post_id = posts.id)

Try the following query:
SELECT COUNT(*) nb_posts
FROM (
SELECT post_id, COUNT(*) nb_tags
FROM table
WHERE tag_name in ('tag1','tag2')
GROUP BY post_id
HAVING COUNT(*) = 2
) t
Edit: based on Konerak's answer, here is a query that handles the case where a post has duplicated tag names:
SELECT DISTINCT t1.post_id
FROM table t1
JOIN table t2
ON t1.post_id = t2.post_id
AND t2.tag_name = 'tag2'
WHERE t1.tag_name = 'tag1'

Related

How to make a query with AND operator between rows [duplicate]

This question already has answers here:
Select values that meet different conditions on different rows?
I have the following table, which is the pivot table relating posts to tags:
post_tags table
I need to get the posts that contain exactly a given set of tags.
For example:
If I need posts with exclusively tags 1 and 2, it should return post_id 1 and 4.
If I need posts with only tag 2, it should return only post_id 3.
If I need posts with tag 23, it shouldn't return anything.
I've tried with:
SELECT * FROM `post_tags` WHERE tag_id = 1 OR tag_id = 2;
but obviously it returns every post_id that has either of these tag_ids.
And with:
SELECT * FROM `post_tags` WHERE tag_id = 1 AND tag_id = 2;
It doesn't return anything, because a single row can never have tag_id equal to both 1 and 2.
Any solution?
You need to group by post_id and check the conditions in the having clause:
SELECT post_id
FROM post_tags
GROUP BY post_id
HAVING
SUM(tag_id NOT IN (1, 2)) = 0
AND
COUNT(DISTINCT tag_id) = 2
This will return only posts with tags 1 and 2 and no other tag.
For posts with only tag 2:
SELECT post_id
FROM post_tags
GROUP BY post_id
HAVING
SUM(tag_id <> 2) = 0
AND
COUNT(DISTINCT tag_id) = 1
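The two queries above follow one pattern, so they can be generated for any tag set. Below is a rough sketch using Python's sqlite3 as a stand-in for MySQL. The sample rows are invented to match the examples in the question (the real table was only shown as an image), and `posts_with_exactly` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post_tags (post_id INTEGER, tag_id INTEGER)")
# Assumed sample data: posts 1 and 4 have exactly tags 1 and 2, post 3 only tag 2.
conn.executemany("INSERT INTO post_tags VALUES (?, ?)", [
    (1, 1), (1, 2),
    (2, 1), (2, 2), (2, 3),
    (3, 2),
    (4, 1), (4, 2),
])

def posts_with_exactly(tags):
    """Return post_ids whose tag set equals `tags` (hypothetical helper)."""
    wanted = sorted(set(tags))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT post_id
        FROM post_tags
        GROUP BY post_id
        HAVING SUM(tag_id NOT IN ({placeholders})) = 0   -- no tag outside the set
           AND COUNT(DISTINCT tag_id) = ?                -- every tag in the set present
        ORDER BY post_id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

print(posts_with_exactly([1, 2]))
print(posts_with_exactly([2]))
print(posts_with_exactly([23]))
```

Post 2 is rejected for tags 1 and 2 because its extra tag 3 makes the SUM non-zero, and post 3 is rejected because it has only one distinct tag.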
If each post_id, tag_id pair is unique, then you can do this:
SELECT post_id
FROM post_tags
GROUP BY post_id
HAVING SUM(tag_id IN (1, 2)) = 2 AND COUNT(tag_id) = 2
You could use a correlated subquery:
SELECT *
FROM post_tags pt
WHERE (
SELECT count(*)
FROM post_tags
WHERE pt.post_id = post_id
AND tag_id IN (1, 2)
) = 2
Or you could use an in list (using similar logic):
SELECT *
FROM post_tags pt
WHERE post_id IN (
SELECT post_id
FROM post_tags
WHERE tag_id IN (1, 2)
GROUP BY post_id
HAVING count(*) = 2
)

Get latest user comments from posts sql

I'm new to SQL. I have two related tables, posts and comments, and I'm using MySQL.
Post :
id, UserName, Phone , product
Comments:
id , CommentText, post_id
I'm using a join query to combine them:
SELECT t1.UserName, t1.Phone, t2.CommentText
FROM Post AS t1
LEFT JOIN Comments AS t2 ON (t1.id = t2.post_id)
Now I need to display all unique users and their last comments, so the result looks like this:
1. User1 LastComment1
2. User2 LastComment2
3. User3 LastComment3
...
I will be very grateful for the help.
I assume that the id column in both tables is an AUTO_INCREMENT PRIMARY KEY.
This correlated subquery will give you the correct answer.
Query
SELECT
Post.Username
, Post.Phone
, (SELECT
Comments.CommentText
FROM
Comments
WHERE
Comments.post_id = Post.id
ORDER BY
Comments.id DESC
LIMIT 1
)
AS LastComment
FROM
Post
You might want to add an index on Comments.post_id to improve the performance of this query.
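For anyone who wants to try the correlated subquery together with the suggested index, here is a small sketch. It uses Python's sqlite3 as a stand-in for MySQL, and the sample rows and the index name `idx_comments_post_id` are assumptions. It relies on the same assumption as the answer: a higher Comments.id means a later comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Post (id INTEGER PRIMARY KEY, UserName TEXT, Phone TEXT, product TEXT);
    CREATE TABLE Comments (id INTEGER PRIMARY KEY, CommentText TEXT, post_id INTEGER);
    -- Index suggested in the answer, so each per-post lookup can avoid a full scan.
    CREATE INDEX idx_comments_post_id ON Comments (post_id);

    INSERT INTO Post VALUES (1, 'User1', '111', 'A'), (2, 'User2', '222', 'B'), (3, 'User3', '333', 'C');
    INSERT INTO Comments VALUES
        (1, 'first from post 1', 1),
        (2, 'only from post 2', 2),
        (3, 'latest from post 1', 1);
""")

# For every post, the scalar subquery picks the comment with the highest id.
rows = conn.execute("""
    SELECT Post.UserName,
           (SELECT Comments.CommentText
            FROM Comments
            WHERE Comments.post_id = Post.id
            ORDER BY Comments.id DESC
            LIMIT 1) AS LastComment
    FROM Post
    ORDER BY Post.id
""").fetchall()

for user_name, last_comment in rows:
    print(user_name, last_comment)
```

A post without comments still appears, with NULL (None in Python) as its LastComment.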

MySQL select join where AND where

I have two tables in my database:
Products
id (int, primary key)
name (varchar)
ProductTags
product_id (int)
tag_id (int)
I would like to select products having all given tags. I tried:
SELECT
*
FROM
Products
JOIN ProductTags ON Products.id = ProductTags.product_id
WHERE
ProductTags.tag_id IN (1, 2, 3)
GROUP BY
Products.id
But it gives me products having any of given tags, instead of having all given tags. Writing WHERE tag_id = 1 AND tag_id = 2 is pointless, because no rows will be returned.
This type of problem is known as relational division
SELECT Products.*
FROM Products
JOIN ProductTags ON Products.id = ProductTags.product_id
WHERE ProductTags.tag_id IN (1,2,3)
GROUP BY Products.id /*<--This is OK in MySQL other RDBMSs
would want the whole SELECT list*/
HAVING COUNT(DISTINCT ProductTags.tag_id) = 3 /*Assuming that there is a unique
constraint on product_id,tag_id you
don't need the DISTINCT*/
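The hard-coded 3 in the HAVING clause has to match the number of tags in the IN list, which is easy to get wrong when the list is built dynamically. Here is a sketch that derives both from one list. It uses Python's sqlite3 as a stand-in for MySQL, and the product names, sample rows and the `products_with_all_tags` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ProductTags (product_id INTEGER, tag_id INTEGER);
    INSERT INTO Products VALUES (1, 'lamp'), (2, 'desk'), (3, 'chair');
    INSERT INTO ProductTags VALUES
        (1, 1), (1, 2), (1, 3), (1, 4),
        (2, 1), (2, 2),
        (3, 1), (3, 2), (3, 3);
""")

def products_with_all_tags(tag_ids):
    """Products carrying every tag in tag_ids; extra tags are allowed (hypothetical helper)."""
    wanted = sorted(set(tag_ids))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT p.id, p.name
        FROM Products p
        JOIN ProductTags pt ON p.id = pt.product_id
        WHERE pt.tag_id IN ({placeholders})
        GROUP BY p.id, p.name
        HAVING COUNT(DISTINCT pt.tag_id) = ?
        ORDER BY p.id
    """
    # The last parameter is the number of distinct wanted tags.
    return conn.execute(sql, (*wanted, len(wanted))).fetchall()

print(products_with_all_tags([1, 2, 3]))
```

Grouping by both selected columns keeps the query valid on databases that are stricter about GROUP BY than MySQL.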
You need a GROUP BY with a count to ensure all tags are accounted for:
select Products.*
from Products
join ( SELECT product_id
FROM ProductTags
where ProductTags.tag_id IN (1,2,3)
GROUP BY product_id
having count( distinct tag_id ) = 3 ) PreQuery
on Products.id = PreQuery.product_id
The MySQL WHERE fieldname IN (1,2,3) is essentially shorthand for WHERE fieldname = 1 OR fieldname = 2 OR fieldname = 3. Because the two forms are equivalent, switching to ORs will not change the result: both match products having any of the given tags. So WHERE ... IN on its own is not the construct you need here.

CASE + IF MySQL query

The problem is as follows. I have a product that can be in one of three categories (defined by category_id). Each category table has a category_id field related to category_id in the product table. So I have 3 cases. I'm checking if my product.category_id is in table one. If yes, I take some values. If not, I check the remaining tables. What can I write in the ELSE section? Can anyone correct my query?
CASE
WHEN IF EXISTS(SELECT * FROM table1 WHERE category_id='category_id') THEN SELECT type_id FROM table1 WHERE category_id='category_id';
WHEN IF EXISTS(SELECT * FROM table2 WHERE category_id='category_id') THEN SELECT value_id FROM table2 WHERE category_id='category_id';
WHEN IF EXISTS(SELECT * FROM table3 WHERE category_id='category_id') THEN SELECT group_id FROM table3 WHERE category_id='category_id';
ELSE "dont know what here";
END;
In the else you would put whatever you want as default value, for example null.
I think that it would be much more efficient to make three left joins instead of several subqueries for each product in the result, and use coalesce to get the first existing value. Example:
select coalesce(t1.type_id, t2.value_id, t3.group_id)
from product p
left join table1 t1 on t1.category_id = p.category_id
left join table2 t2 on t2.category_id = p.category_id
left join table3 t3 on t3.category_id = p.category_id
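A quick way to see what the COALESCE version returns, including the "found nowhere" case that the ELSE branch was meant to cover. This is only a sketch: it uses Python's sqlite3 as a stand-in for MySQL, and the column types, the sample rows and the `resolved_id` alias are assumptions, since the question does not show the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE table1 (category_id INTEGER, type_id INTEGER);
    CREATE TABLE table2 (category_id INTEGER, value_id INTEGER);
    CREATE TABLE table3 (category_id INTEGER, group_id INTEGER);
    INSERT INTO product VALUES (1, 10), (2, 20), (3, 30), (4, 40);
    INSERT INTO table1 VALUES (10, 100);
    INSERT INTO table2 VALUES (20, 200);
    INSERT INTO table3 VALUES (30, 300);
""")

# COALESCE returns the first non-NULL argument, so table1 wins over table2,
# which wins over table3. A product matched by no table yields NULL.
rows = conn.execute("""
    SELECT p.id, COALESCE(t1.type_id, t2.value_id, t3.group_id) AS resolved_id
    FROM product p
    LEFT JOIN table1 t1 ON t1.category_id = p.category_id
    LEFT JOIN table2 t2 ON t2.category_id = p.category_id
    LEFT JOIN table3 t3 ON t3.category_id = p.category_id
    ORDER BY p.id
""").fetchall()

print(rows)
```

Product 4, whose category is in none of the three tables, comes back with NULL. A different default can be supplied by adding a final literal argument to COALESCE.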
example
SELECT CompanyName,
Fax,
CASE WHEN IF(Fax='', 'No Fax', 'Got Fax')='No Fax' THEN NULL
ELSE IF(Fax='', 'No Fax', 'Got Fax')
END AS Note
FROM Customers;
You can possibly include this...
SELECT "Unknown type" FROM table1;
You do not need to use ELSE if there is nothing left to do.
or something like this
CASE
WHEN IF EXISTS(SELECT * FROM table1 WHERE category_id='category_id') THEN SELECT type_id FROM table1 WHERE category_id='category_id';
WHEN IF EXISTS(SELECT * FROM table2 WHERE category_id='category_id') THEN SELECT value_id FROM table2 WHERE category_id='category_id';
ELSE SELECT group_id FROM table3 WHERE category_id='category_id';
In addition to Guffa's answer, here is another approach. Assuming @category_id is set with
SET @category_id = 'some_category_id_value'
then
SELECT type_id FROM table1
WHERE category_id = @category_id
UNION ALL
SELECT value_id FROM table2
WHERE category_id = @category_id
UNION ALL
SELECT group_id FROM table3
WHERE category_id = @category_id
should return what you ask for (and performance is not bad either).
If a given category_id appears in more than one table, you will get multiple records (you can avoid that by limiting the number of results to 1; you might need to make the whole union a subquery and order it, but I'm not sure, so consult the docs).
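One possible way to do the "subquery and order it" part is to tag each branch of the union with a priority and keep only the first row. This is a sketch, not a tested MySQL recipe: it uses Python's sqlite3 as a stand-in, and the `priority` and `found_id` columns, the `candidates` alias, the `lookup` helper and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (category_id INTEGER, type_id INTEGER);
    CREATE TABLE table2 (category_id INTEGER, value_id INTEGER);
    CREATE TABLE table3 (category_id INTEGER, group_id INTEGER);
    INSERT INTO table1 VALUES (10, 100);
    INSERT INTO table2 VALUES (10, 200), (20, 201);  -- category 10 is in two tables
    INSERT INTO table3 VALUES (30, 300);
""")

def lookup(category_id):
    """First id found, checking table1, then table2, then table3 (hypothetical helper)."""
    row = conn.execute("""
        SELECT found_id
        FROM (
            SELECT 1 AS priority, type_id AS found_id FROM table1 WHERE category_id = :cid
            UNION ALL
            SELECT 2, value_id FROM table2 WHERE category_id = :cid
            UNION ALL
            SELECT 3, group_id FROM table3 WHERE category_id = :cid
        ) AS candidates
        ORDER BY priority
        LIMIT 1
    """, {"cid": category_id}).fetchone()
    return row[0] if row else None

print(lookup(10), lookup(20), lookup(30), lookup(99))
```

Category 10 exists in both table1 and table2, and the explicit priority makes table1 win instead of leaving the choice to row order.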
However, your question looks like you have a problem with a design of your tables
why do you keep three category tables and not one?
what is the relationship between type_id, value_id and group_id and why does it make sense to select them as if they were the same thing (what is the meaning/semantics of each table/column)?
how do you guarantee that you don't have entries in multiple tables that correspond to one product (and implement other business rules that you might have)?
These questions could have valid answers, but you should know them :)

Optimizing Multilevel MySQL subqueries (folksonomy and taxonomy)

I was reading the great tagging article by Nitin Borwankar, and it got me thinking about ways to implement different levels of searches using two tables.
tags {
id,
tag
}
post_tags {
id
user_id
post_id
tag_id
}
I started with the simple example of T(U(i)) which means all tags of all users that have an item i. I was able to do it with the following SQL:
/* get all tags from the users found */
SELECT t.*, vt.* FROM verse_tags as vt
LEFT JOIN tags as t ON t.id = vt.tag_id
WHERE user_id in
(
/* Get all user_ids that have tagged this item */
SELECT user_id FROM verse_tags WHERE verse_id = 26046 GROUP BY user_id
)
GROUP BY t.id
Then I started with a slightly harder +1 level deep query. T(U(T(u))) which is tags of users using tags like user #.
/* Then get the tags of the user with tags like the user 3 */
SELECT t.id FROM post_tags as pt
LEFT JOIN tags as t ON t.id = pt.tag_id
WHERE user_id in
(
/* Then get users with these tags */
SELECT pt.user_id FROM post_tags as pt
LEFT JOIN tags as t on t.id = pt.tag_id
WHERE tag_id in
(
/* get tags of user */
SELECT t.id FROM post_tags as pt
LEFT JOIN tags as t ON t.id = pt.tag_id
WHERE pt.user_id = 3
GROUP BY t.id
)
GROUP BY user_id
)
GROUP BY t.id
However, since I normally use JOINs in my queries, I am not sure how something like this could be optimized or what design flaws should be avoided when using subqueries. I have even read that JOINs should be used instead, but I have no idea how that would be accomplished with the above queries.
How could these queries be optimized?
UPDATE
1) Replaced GROUP BY with SELECT DISTINCT. (.74 sec)
2) Replaced WHERE in with WHERE exists. (.40 sec)
3) Added indexes (oops!) (0.09 sec)
4) Back to WHERE in (0.08 sec)
EXPLAIN SELECT DISTINCT tag_id FROM post_tags WHERE user_id in
(
SELECT DISTINCT user_id FROM post_tags WHERE tag_id in
(
SELECT DISTINCT tag_id FROM post_tags WHERE user_id = 3
)
)
Running EXPLAIN gives me these results:
id  select_type         table      type            possible_keys   key      key_len  ref   rows  Extra
1   PRIMARY             post_tags  index           NULL            tag_id   4        NULL  14    Using where
2   DEPENDENT SUBQUERY  post_tags  index_subquery  user_id         user_id  4        func  1     Using where
3   DEPENDENT SUBQUERY  post_tags  index_subquery  user_id,tag_id  tag_id   4        func  1     Using where
I think this is the solution:
SELECT DISTINCT(`t`.`id`) FROM `post_tags` as `pt`
left join `tags` as t on `t`.`id` = `pt`.`tag_id`
where `pt`.`user_id` in(
SELECT distinct(`pt`.`user_id`) FROM `post_tags` as `pt`
LEFT JOIN `tags` as `t` on `t`.`id` = `pt`.`tag_id`
WHERE `pt`.`tag_id` in(
SELECT distinct(`tag_id`) FROM `post_tags`
WHERE `user_id` = 3
)
)
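Since the question asks how the nested IN version would look with JOINs, here is one possible rewrite of T(U(T(u))) as two self-joins on post_tags, checked against the nested form on a few made-up rows. It is a sketch using Python's sqlite3 as a stand-in for MySQL, and the aliases `mine`, `shared` and `theirs` and the sample data are assumptions. Whether the join form is actually faster depends on the MySQL version and the indexes, so compare both with EXPLAIN on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_tags (id INTEGER PRIMARY KEY, user_id INTEGER, post_id INTEGER, tag_id INTEGER)"
)
conn.executemany("INSERT INTO post_tags VALUES (?, ?, ?, ?)", [
    (1, 3, 10, 1), (2, 3, 11, 2),   # user 3 uses tags 1 and 2
    (3, 4, 12, 2), (4, 4, 12, 5),   # user 4 shares tag 2
    (5, 5, 13, 6), (6, 5, 13, 7),   # user 5 shares nothing with user 3
    (7, 6, 14, 1), (8, 6, 15, 8),   # user 6 shares tag 1
])

# Nested form, as in the question's UPDATE.
nested = [r[0] for r in conn.execute("""
    SELECT DISTINCT tag_id FROM post_tags WHERE user_id IN (
        SELECT DISTINCT user_id FROM post_tags WHERE tag_id IN (
            SELECT DISTINCT tag_id FROM post_tags WHERE user_id = 3
        )
    )
    ORDER BY tag_id
""")]

# Join form: mine = rows of user 3, shared = rows with the same tag,
# theirs = all rows of the users found through shared.
joined = [r[0] for r in conn.execute("""
    SELECT DISTINCT theirs.tag_id
    FROM post_tags AS mine
    JOIN post_tags AS shared ON shared.tag_id = mine.tag_id
    JOIN post_tags AS theirs ON theirs.user_id = shared.user_id
    WHERE mine.user_id = 3
    ORDER BY theirs.tag_id
""")]

print(nested, joined)
```

Both forms return the same tag ids here. The join version produces duplicate intermediate rows, which is why the DISTINCT is needed.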