Select all the data matching multiple values at the same time - mysql

I have the following 2 tables in my DB: Statements and Tags.
When I need to select all the statements matching some tag ($tag), I use this SQL query:
$select = "SELECT Statements.id, Statements.name, Statements.link FROM Statements JOIN Tags ON Statements.id = Tags.probID AND Tags.tag = '$tag'";
Now I need to select all the statements matching several tags at the same time. How can I do that? Should I change something in my DB?

You can match the tags using IN and then use GROUP BY to ensure that you have all of them:
SELECT s.id, s.name, s.link
FROM Statements s JOIN
Tags t
ON s.id = t.probID
WHERE t.tag IN ('$tag1', '$tag2', '$tag3')
GROUP BY s.id, s.name, s.link
HAVING COUNT(DISTINCT t.tag) = 3;
Note: the above example is for three tags. The number in the HAVING clause needs to match the number of tags you are looking for.
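As a rough sketch of how the number in the HAVING clause can be kept in step with the tag list, the query can be built with one placeholder per tag and the tag count passed as a parameter. The example below uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL; the function name statements_with_all_tags and the sample rows are hypothetical and only serve the illustration.

```python
import sqlite3

def statements_with_all_tags(conn, tags):
    """Return (id, name, link) rows for statements that carry every tag in `tags`."""
    wanted = sorted(set(tags))  # duplicate tags in the input would break the count check
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)  # one "?" per tag
    sql = f"""
        SELECT s.id, s.name, s.link
        FROM Statements s
        JOIN Tags t ON s.id = t.probID
        WHERE t.tag IN ({placeholders})
        GROUP BY s.id, s.name, s.link
        HAVING COUNT(DISTINCT t.tag) = ?
        ORDER BY s.id
    """
    # The last parameter is the number of tags, so it always matches the IN list.
    return conn.execute(sql, (*wanted, len(wanted))).fetchall()

# Hypothetical sample data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Statements (id INTEGER PRIMARY KEY, name TEXT, link TEXT);
    CREATE TABLE Tags (probID INTEGER, tag TEXT);
    INSERT INTO Statements VALUES (1, 'A', 'a.html'), (2, 'B', 'b.html'), (3, 'C', 'c.html');
    INSERT INTO Tags VALUES (1, 'math'), (1, 'graphs'), (2, 'math'),
                            (3, 'graphs'), (3, 'math'), (3, 'dp');
""")

both = statements_with_all_tags(conn, ["math", "graphs"])
```

Passing the tags as bound parameters also avoids interpolating them into the SQL string by hand.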

Gordon's answer is good. Nonetheless, I want to give you some advice here: don't make it a habit to join everything first and only then think about how to get to your data.
Your original problem was simply to find statements for which a certain tag exists, or in other words, statements that have a matching row in the other table. Hence use EXISTS or IN.
select id, name, link
from Statements
where id in (select probID from Tags where tag = @tag);
You probably see that this is more straightforward than your query, because it is written the way one might word the task: select data from Statements where the ID is in the set of IDs tagged with @tag.
As to the additional requirement, Gordon's way is perfect: look at the IDs in Tags (i.e. group by the ID) and find the ones having all tags. That changes the above statement to:
select id, name, link
from Statements
where id in
(
select probID
from Tags
where tag in (@tag1, @tag2, @tag3)
group by probID
having count(distinct tag) = 3
);

Related

How to Include/Exclude array of IDs from a relationship/pivot table and avoid duplicates?

Let's say you have
records table with id and name
tags table with id and name
records_tags with record_id and tag_id (relationship table)
Now you want to run a query that includes records that have certain tags and excludes records that have certain other tags.
You could do INNER JOIN, but the challenge here is, when there are many tags to a record, it creates duplicates within the results.
Example:
inner join `records_tags` on `records_tags`.`record_id` = `records`.`id`
and `records_tags`.`tag_id` in (?) and `records_tags`.`tag_id` not in (?)
As for the Laravel side, I've used:
$records->join('records_tags', function ($join) use($include, $exclude) {
$join->on('records_tags.record_id','=','records.id');
if ($include) $join->whereIn('records_tags.tag_id',$include);
if ($exclude) $join->whereNotIn('records_tags.tag_id',$exclude);
});
Is there a better way to handle this, or a way to make the query return unique (distinct) rows? The only purpose of the join is to include or exclude the records themselves from the results.
Edit:
The only other thing I can think of is something like the following. I still have to run tests to check its accuracy, but as a crude solution:
$records->join(\DB::raw('(SELECT tag_id, record_id FROM records_tags WHERE records_tags.tag_id IN ('.implode(',',$include).')) AS records_tags'),'records_tags.record_id','=','records.id');
Edit 2: This doesn't appear to work with NOT IN, as it creates duplicates.
The conditions in the ON clause:
... and `records_tags`.`tag_id` in (?) and `records_tags`.`tag_id` not in (?)
do not exclude from the results the ids of the records that you want to exclude.
Any id that is linked to any of the wanted tags will be returned even if it is also linked to an unwanted tag, because the joins return 1 row for each of the linked tags.
What you can use is aggregation and the conditions in the HAVING clause:
SELECT r.id, r.name
FROM records r INNER JOIN records_tags rt
ON rt.record_id = r.id
GROUP BY r.id -- I assume that id is the primary key of records
HAVING SUM(rt.tag_id IN (?)) > 0 -- first ?: the tags to include
AND SUM(rt.tag_id IN (?)) = 0; -- second ?: the tags to exclude
or, if you want the ids that are linked to all the wanted tags, use GROUP_CONCAT():
SELECT r.id, r.name
FROM records r INNER JOIN records_tags rt
ON rt.record_id = r.id
GROUP BY r.id
HAVING GROUP_CONCAT(rt.tag_id ORDER BY rt.tag_id) = ?;
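In this case you will have to provide, for the ? placeholder, a sorted comma-separated list of the wanted tag ids. As written, the comparison is made against the record's complete tag list, so it matches records that are linked to exactly those tags and no others.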
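The conditional-aggregation query with the two SUM conditions can be tried out as follows. This is only a sketch: it uses Python's sqlite3 module with an in-memory database in place of MySQL (both treat the result of IN as 0 or 1), and the helper name filter_records and the sample rows are made up for the illustration.

```python
import sqlite3

def filter_records(conn, include, exclude):
    """Records linked to at least one tag in `include` and to none in `exclude`."""
    conditions, params = [], []
    if include:
        marks = ", ".join("?" for _ in include)
        conditions.append(f"SUM(rt.tag_id IN ({marks})) > 0")
        params.extend(include)
    if exclude:
        marks = ", ".join("?" for _ in exclude)
        conditions.append(f"SUM(rt.tag_id IN ({marks})) = 0")
        params.extend(exclude)
    having = "HAVING " + " AND ".join(conditions) if conditions else ""
    sql = f"""
        SELECT r.id, r.name
        FROM records r
        INNER JOIN records_tags rt ON rt.record_id = r.id
        GROUP BY r.id, r.name
        {having}
        ORDER BY r.id
    """
    return conn.execute(sql, params).fetchall()

# Hypothetical sample data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE records_tags (record_id INTEGER, tag_id INTEGER);
    INSERT INTO records VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO records_tags VALUES (1, 10), (1, 20), (2, 10), (2, 30),
                                    (3, 30), (4, 10), (4, 20), (4, 30);
""")

wanted = filter_records(conn, include=[10, 20], exclude=[30])
```

Because the rows are grouped per record, record 1 appears once even though it matches two of the wanted tags. Note that the INNER JOIN drops records that have no tags at all.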

How to select every field from foreign table using intermediate table in SQL?

I have a question about pure SQL. I have a many-to-many relation with 3 tables: users, tags and users_tags. What I'm trying to do is select every field from the tags table for each entry in users_tags that matches a given user id.
The query I have right now looks like this
SELECT * FROM tags JOIN users_tags ON (users_tags.user_id = 1);
This retrieves the correct information (twice for some odd reason) but also appends unnecessary data from the pivot table (because of the SELECT *, but I need to keep it that way).
How can I get only the relevant data from the tags table?
Thanks for your attention
You are missing the JOIN condition that connects the two tables. You only have a filtering condition. Something like this:
SELECT *
FROM tags t JOIN
users_tags ut
ON t.tag_id = ut.tag_id
WHERE ut.user_id = 1;
You haven't explained what the columns are, so of course the column names might be different.
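To see why the missing join condition produces repeated rows, the two variants can be compared on a small data set. The sketch below uses Python's sqlite3 module with an in-memory database instead of MySQL; the column names tag_id and name and the sample rows are assumptions, and the second query selects t.* so that only the columns of tags are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_tags (user_id INTEGER, tag_id INTEGER);
    INSERT INTO tags VALUES (1, 'sql'), (2, 'php'), (3, 'css');
    INSERT INTO users_tags VALUES (1, 1), (1, 2), (2, 3);
""")

# Filter only: every tag is paired with every users_tags row of user 1.
filter_only = conn.execute("""
    SELECT tags.name
    FROM tags JOIN users_tags ON (users_tags.user_id = 1)
    ORDER BY tags.tag_id
""").fetchall()

# Join condition plus filter: one row per tag that user 1 actually has.
with_join = conn.execute("""
    SELECT t.*
    FROM tags t JOIN users_tags ut ON t.tag_id = ut.tag_id
    WHERE ut.user_id = 1
    ORDER BY t.tag_id
""").fetchall()
```

With three tags and two users_tags rows for user 1, the first query returns six rows (each tag twice), while the second returns only the two tags that belong to the user.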
The answer above works. Alternatively, I'm assuming that if there's some column you want from the users table, such as a name, you could use this or a variation of it:
SELECT users.name, tags.*
FROM tags
INNER JOIN users_tags ON tags.tag_id = users_tags.tag_id
INNER JOIN users ON users_tags.user_id = users.user_id
WHERE users_tags.user_id = 1;

CONCAT() result containing CONCAT()

Database setup:
http://sqlfiddle.com/#!2/4d1c2/1
The following query selects all tags that belong to a ProductID, together with their places, comma separated:
SELECT CONCAT_WS(',', GROUP_CONCAT(Tags.Name))
FROM `ProductTags`
LEFT JOIN Tags ON ProductTags.TagID = Tags.TagID
WHERE `ProductID` = 46356
GROUP BY DisplayOrder
It can contain 1-3 rows.
A more complex query shows a category full of products (around 50-100).
I want all tags to be available at once, pass them to jQuery, and then display them.
The question is: how can I concat() this query into one field so that I have only one big query, or should I handle it in PHP and run something like 100 queries per page?
I don't know if I get you right, but this could be one solution:
SELECT CONCAT_WS(',', GROUP_CONCAT( DISTINCT Tags.Name))
FROM `ProductTags`
LEFT JOIN Tags ON ProductTags.TagID = Tags.TagID
This will show you all tags across all ProductTags rows in a single field (DISTINCT makes the names unique).
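If the goal is one query that returns the tags of every product in a category, one possible variation is to group by ProductID so that each product gets its own comma-separated list. The sketch below is an assumption about what the asker needs, not part of the answer above; it runs against an in-memory SQLite database through Python's sqlite3 module rather than MySQL, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tags (TagID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ProductTags (ProductID INTEGER, TagID INTEGER, DisplayOrder INTEGER);
    INSERT INTO Tags VALUES (1, 'red'), (2, 'large'), (3, 'sale');
    INSERT INTO ProductTags VALUES (46356, 1, 1), (46356, 2, 1),
                                   (46356, 3, 2), (46357, 3, 1);
""")

# One row per product, each carrying its own comma-separated tag list.
rows = conn.execute("""
    SELECT pt.ProductID, GROUP_CONCAT(DISTINCT t.Name) AS tag_names
    FROM ProductTags pt
    LEFT JOIN Tags t ON pt.TagID = t.TagID
    WHERE pt.ProductID IN (?, ?)
    GROUP BY pt.ProductID
""", (46356, 46357)).fetchall()

# Split the lists on the application side; a NULL list means no tags.
tags_by_product = {
    pid: sorted(names.split(",")) if names else []
    for pid, names in rows
}
```

The resulting dictionary can then be serialized once and handed to the page, instead of running one query per product.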

Select a post that does not have a particular tag

I have a post/tag database, with the usual post, tag, and tag_post tables. The tag_post table contains tagid and postid fields.
I need to query posts. When I want to fetch posts that have a certain tag, I have to use a join:
... INNER JOIN tag_post ON post.id = tag_post.postid
WHERE tag_post.tagid = {required_tagid}
When I want to fetch posts that have tagIdA and tagIdB, I have to use two joins (which I kind of came to terms with eventually).
Now, I need to query posts that do not have a certain tag. Without much thought, I just changed the = to !=:
... INNER JOIN tag_post ON post.id = tag_post.postid
WHERE tag_post.tagid != {certain_tagid}
Boom! Wrong logic!
I did come up with this - just writing the logic here:
... INNER JOIN tag_post ON post.id = tag_post.postid
WHERE tag_post.postid NOT IN
(SELECT postid from tag_post where tagid = {certain_tagid})
I know this will work, but due to the way I've been brought up, I feel guilty (justified or not) whenever I write a query with a subquery.
Suggest a better way to do this?
You can think of it as "find all rows in posts that do not have a match in tags (for a specific tag)"
This is the textbook use case for a LEFT JOIN.
LEFT JOIN tag_post ON post.id = tag_post.postid AND tag_post.tagid = {required_tagid}
WHERE tag_post.tagid IS NULL
Note that you have to have the tag id in the ON clause of the join.
For a reference on join types, see here: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
In addition to Gavin Towey's good answer, you can use a not exists subquery:
where not exists
(
select *
from tag_post
where post.id = tag_post.postid
and tag_post.tagid = {required_tagid}
)
The database typically executes both variants in the same way. I personally find the not exists approach easier to read.
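The difference between the != attempt and the two anti-join variants can be checked on a few rows. This is a sketch using Python's sqlite3 module with an in-memory database as a stand-in for MySQL; the title column and the sample data are hypothetical, and tag 7 plays the role of the tag to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag_post (tagid INTEGER, postid INTEGER);
    INSERT INTO post VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO tag_post VALUES (7, 1), (8, 1), (8, 2);  -- post 3 has no tags
""")

def ids(sql, tagid):
    """Run a one-parameter query and return the post ids as a list."""
    return [row[0] for row in conn.execute(sql, (tagid,))]

# Wrong: keeps post 1 through its other tag and loses the untagged post 3.
naive = ids("""
    SELECT DISTINCT post.id FROM post
    INNER JOIN tag_post ON post.id = tag_post.postid
    WHERE tag_post.tagid != ? ORDER BY post.id
""", 7)

# Anti-join via LEFT JOIN: the tag test sits in the ON clause.
left_join = ids("""
    SELECT post.id FROM post
    LEFT JOIN tag_post ON post.id = tag_post.postid AND tag_post.tagid = ?
    WHERE tag_post.tagid IS NULL ORDER BY post.id
""", 7)

# Anti-join via NOT EXISTS.
not_exists = ids("""
    SELECT post.id FROM post
    WHERE NOT EXISTS (
        SELECT * FROM tag_post
        WHERE post.id = tag_post.postid AND tag_post.tagid = ?
    ) ORDER BY post.id
""", 7)
```

Both anti-join forms return posts 2 and 3, including the post that has no tags at all, whereas the != version returns posts 1 and 2.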
"When I want to fetch posts that have tagIdA and tagIdB, I have to use two joins (which I kind of came to terms with eventually)."
There are other ways.
One can obtain the ids of all posts that are tagged with both tagid 123 and 456 by filtering tag_post for only those tags, grouping by post, and then dropping any groups that contain fewer tags than expected; one can then use the result to filter the post table:
SELECT * FROM post WHERE id IN (
SELECT postid
FROM tag_post
WHERE tagid IN (123,456)
GROUP BY postid
HAVING COUNT(*) = 2
)
If a post can be tagged with the same tagid multiple times, you will need to replace COUNT(*) with the less performant COUNT(DISTINCT tagid).
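The effect of duplicate rows on the two counting styles can be seen in a small experiment. The sketch below uses Python's sqlite3 module and an in-memory database instead of MySQL; the helper posts_with_both and the sample rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag_post (tagid INTEGER, postid INTEGER);
    INSERT INTO tag_post VALUES
        (123, 1), (456, 1),              -- post 1: both tags, once each
        (123, 2), (123, 2),              -- post 2: tag 123 stored twice
        (123, 3), (456, 3), (456, 3);    -- post 3: both tags, one duplicated
""")

def posts_with_both(count_expr):
    """Post ids whose group passes `count_expr = 2` for tags 123 and 456."""
    sql = f"""
        SELECT postid FROM tag_post
        WHERE tagid IN (123, 456)
        GROUP BY postid
        HAVING {count_expr} = 2
        ORDER BY postid
    """
    return [row[0] for row in conn.execute(sql)]

by_rows = posts_with_both("COUNT(*)")
by_distinct = posts_with_both("COUNT(DISTINCT tagid)")
```

With these rows, COUNT(*) wrongly accepts post 2 and rejects post 3, while COUNT(DISTINCT tagid) returns posts 1 and 3. A unique constraint on (postid, tagid) would make the cheaper COUNT(*) safe.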
"Now, I need to query posts that do not have a certain tag."
This is known as an anti-join. The easiest way is to replace IN from the query above with NOT IN, as you proposed. I wouldn't feel too guilty about it. The alternative is to use an outer join, as proposed in @GavinTowey's answer.

self join with a self-referring condition

What I want to do is to get all records that have almost exact duplicates except that duplicates don't have an extra char at the beginning of 'name'
this is my sql query:
select * from tags as spaced inner join tags as not_spaced on not_spaced.name = substring(spaced.name, 2);
also I tried:
select * from tags as spaced where (select count(*) from tags as not_spaced where not_spaced.name = substring(spaced.name, 2)) > 0;
What I'm getting is... the SQL connection stops responding.
Thanks!
p.s. Sorry I haven't mentioned that the only field I need is name. All other fields are insignificant (if present).
Try something like this:
select [all potentially duplicated fields except name], name
from (
select [all potentially duplicated fields except name], name
from tags
union all
select [all potentially duplicated fields except name], substring(name, 2) as name
from tags
) as t
group by [all potentially duplicated fields including name]
having count(*) > 1
If the tables are very large, make an index on name and substring(name,2) to make it faster:
select t1.* from tags t1
inner join tags t2 on t1.name = substring(t2.name, 2)
Even with an index, your query will require every record in spaced to be checked against every record in not_spaced.
If each side has 1,000 records, that's 1,000,000 combinations.
You may be better off creating a temporary table with just two fields, spaced.id and substring(spaced.name, 2) as shortname, and then indexing the shortname field. Joining on that indexed temporary table will be much faster.
Without knowing the DB, how the tables are indexed, etc, it's just trying different things until one gets better optimized...
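The temporary-table idea could look roughly like this. The sketch runs on an in-memory SQLite database through Python's sqlite3 module rather than MySQL, so substr stands in for substring; the table name short_names, the index name and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO tags (name) VALUES ('sql'), (' sql'), ('php'), ('xphp'), ('css');

    -- Precompute the name without its first character, once per row.
    CREATE TEMP TABLE short_names AS
        SELECT id, substr(name, 2) AS shortname FROM tags;
    CREATE INDEX idx_short_names ON short_names (shortname);
""")

# Pair each row with the row whose name equals its shortened name.
pairs = conn.execute("""
    SELECT spaced.name, not_spaced.name
    FROM short_names sn
    JOIN tags spaced ON spaced.id = sn.id
    JOIN tags not_spaced ON not_spaced.name = sn.shortname
    ORDER BY spaced.id
""").fetchall()
```

The join now compares two plain columns, so the database can use the index on shortname instead of evaluating the substring expression for every pair of rows.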
Here is another query you can try:
SELECT name, count(*) c FROM (
SELECT name FROM tags
UNION ALL
SELECT substring(name, 2) AS name FROM tags
) AS t
GROUP BY name