Select in a many-to-many relationship in MySQL

I have two tables in a MySQL database, Locations and Tags, and a third table, LocationsTagsAssoc, which links the two tables in a many-to-many relationship.
Table structure is as follows:
Locations
---------
ID int (Primary Key)
Name varchar(128)
LocationsTagsAssoc
------------------
ID int (Primary Key)
LocationID int (Foreign Key)
TagID int (Foreign Key)
Tags
----
ID int (Primary Key)
Name varchar(128)
So each location can be tagged with multiple tags, and each tag can be applied to multiple locations.
What I want to do is select only the Locations that are tagged with all of the supplied tag names. For example:
I want all locations that are tagged with both "trees" and "swings". Location "Park" should be selected, but location "Forest" should not.
Any insight would be appreciated. Thanks!

There are two ways to do this. I prefer the first way, which is to self-join for each tag:
SELECT l.*
FROM Locations l
JOIN LocationsTagsAssoc a1 ON a1.LocationID = l.ID
JOIN Tags t1 ON a1.TagID = t1.ID AND t1.Name = ?
JOIN LocationsTagsAssoc a2 ON a2.LocationID = l.ID
JOIN Tags t2 ON a2.TagID = t2.ID AND t2.Name = ?
JOIN LocationsTagsAssoc a3 ON a3.LocationID = l.ID
JOIN Tags t3 ON a3.TagID = t3.ID AND t3.Name = ?;
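A minimal runnable sketch of the self-join approach, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; these joins are written the same way in both). The sample rows are hypothetical: "Park" is tagged "trees" and "swings", "Forest" only "trees". Two tags are used to match the question, so the third join pair is dropped.

```python
import sqlite3

# Hypothetical sample data: Park has both tags, Forest only "trees".
SETUP = """
CREATE TABLE Locations (ID INTEGER PRIMARY KEY, Name VARCHAR(128));
CREATE TABLE Tags (ID INTEGER PRIMARY KEY, Name VARCHAR(128));
CREATE TABLE LocationsTagsAssoc (
    ID INTEGER PRIMARY KEY,
    LocationID INTEGER REFERENCES Locations(ID),
    TagID INTEGER REFERENCES Tags(ID)
);
INSERT INTO Locations VALUES (1, 'Park'), (2, 'Forest');
INSERT INTO Tags VALUES (1, 'trees'), (2, 'swings');
INSERT INTO LocationsTagsAssoc VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
"""

# One assoc/tag join pair per required tag: a location survives only
# if every pair finds a matching row.
QUERY = """
SELECT l.Name
FROM Locations l
JOIN LocationsTagsAssoc a1 ON a1.LocationID = l.ID
JOIN Tags t1 ON a1.TagID = t1.ID AND t1.Name = ?
JOIN LocationsTagsAssoc a2 ON a2.LocationID = l.ID
JOIN Tags t2 ON a2.TagID = t2.ID AND t2.Name = ?
"""


def locations_with_both_tags(tag1, tag2):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    names = [name for (name,) in conn.execute(QUERY, (tag1, tag2))]
    conn.close()
    return names


print(locations_with_both_tags("trees", "swings"))  # ['Park']
```

Each extra tag costs one more pair of joins, which is why the query text has to be built for the number of tags supplied.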
The other way also works, but GROUP BY in MySQL tends to create a temporary table, which can make the query slow:
SELECT l.*
FROM Locations l
JOIN LocationsTagsAssoc a ON a.LocationID = l.ID
JOIN Tags t ON a.TagID = t.ID
WHERE t.Name IN (?, ?, ?)
GROUP BY l.ID
HAVING COUNT(*) = 3;
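The GROUP BY variant can be tried the same way. This sketch again uses SQLite in place of MySQL and hypothetical rows, adding a "Playground" that carries an extra "slide" tag; it still matches, because the WHERE clause discards the extra tag before counting. HAVING COUNT(*) assumes each (location, tag) pair is stored only once; if duplicate association rows are possible, COUNT(DISTINCT t.ID) is the safer test.

```python
import sqlite3

# Hypothetical data: Playground has both tags plus an extra "slide" tag.
SETUP = """
CREATE TABLE Locations (ID INTEGER PRIMARY KEY, Name VARCHAR(128));
CREATE TABLE Tags (ID INTEGER PRIMARY KEY, Name VARCHAR(128));
CREATE TABLE LocationsTagsAssoc (
    ID INTEGER PRIMARY KEY, LocationID INTEGER, TagID INTEGER
);
INSERT INTO Locations VALUES (1, 'Park'), (2, 'Forest'), (3, 'Playground');
INSERT INTO Tags VALUES (1, 'trees'), (2, 'swings'), (3, 'slide');
INSERT INTO LocationsTagsAssoc VALUES
    (1, 1, 1), (2, 1, 2),
    (3, 2, 1),
    (4, 3, 1), (5, 3, 2), (6, 3, 3);
"""

# WHERE keeps only the wanted tags; HAVING requires all of them per location.
QUERY = """
SELECT l.Name
FROM Locations l
JOIN LocationsTagsAssoc a ON a.LocationID = l.ID
JOIN Tags t ON a.TagID = t.ID
WHERE t.Name IN (?, ?)
GROUP BY l.ID
HAVING COUNT(*) = 2
ORDER BY l.ID
"""


def locations_with_all_tags(tag1, tag2):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    names = [name for (name,) in conn.execute(QUERY, (tag1, tag2))]
    conn.close()
    return names


print(locations_with_all_tags("trees", "swings"))  # ['Park', 'Playground']
```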
Re comment from @Erikoenig:
If you want to make sure there are no extra tags, you can do it this way:
SELECT l.*
FROM Locations l
JOIN LocationsTagsAssoc a ON a.LocationID = l.ID
JOIN Tags t ON a.TagID = t.ID
GROUP BY l.ID
HAVING COUNT(*) = 3 AND SUM(t.Name IN (?, ?, ?)) = 3;
Removing the WHERE clause means the location's other tags, if it has any, are counted too, so COUNT() may be greater than 3.
And if the count is exactly three but some of those three are not the tags you want, the SUM() condition in the HAVING clause makes sure that all three requested tags are present in the group.
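A sketch of the exact-match query with hypothetical data (SQLite standing in for MySQL), where "Playground" carries an extra "slide" tag. "Playground" drops out because its third tag pushes COUNT(*) above 2. The SUM() works because a comparison such as t.Name IN (...) evaluates to 1 or 0 in both MySQL and SQLite.

```python
import sqlite3

# Hypothetical data: Park has exactly the two tags, Playground has a third.
SETUP = """
CREATE TABLE Locations (ID INTEGER PRIMARY KEY, Name VARCHAR(128));
CREATE TABLE Tags (ID INTEGER PRIMARY KEY, Name VARCHAR(128));
CREATE TABLE LocationsTagsAssoc (
    ID INTEGER PRIMARY KEY, LocationID INTEGER, TagID INTEGER
);
INSERT INTO Locations VALUES (1, 'Park'), (2, 'Forest'), (3, 'Playground');
INSERT INTO Tags VALUES (1, 'trees'), (2, 'swings'), (3, 'slide');
INSERT INTO LocationsTagsAssoc VALUES
    (1, 1, 1), (2, 1, 2),
    (3, 2, 1),
    (4, 3, 1), (5, 3, 2), (6, 3, 3);
"""

# COUNT(*) counts every tag on the location; SUM(...) counts only the
# wanted ones. Both must equal the number of wanted tags.
QUERY = """
SELECT l.Name
FROM Locations l
JOIN LocationsTagsAssoc a ON a.LocationID = l.ID
JOIN Tags t ON a.TagID = t.ID
GROUP BY l.ID
HAVING COUNT(*) = 2 AND SUM(t.Name IN (?, ?)) = 2
ORDER BY l.ID
"""


def locations_with_exactly_tags(tag1, tag2):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    names = [name for (name,) in conn.execute(QUERY, (tag1, tag2))]
    conn.close()
    return names


print(locations_with_exactly_tags("trees", "swings"))  # ['Park']
```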

You need the locations for which there is no given tag that is missing from the LocationsTagsAssoc table for that location.
You can specify the given tags with IN (), as in the following, or by joining onto another table that contains them. For example:
SELECT l.*
FROM Locations AS l
WHERE NOT EXISTS (
    SELECT NULL FROM Tags AS t
    WHERE NOT EXISTS (
        SELECT NULL FROM LocationsTagsAssoc AS lt
        WHERE lt.LocationId = l.ID
          AND lt.TagID = t.ID
    )
    AND t.ID IN (1, 2, 3,...)
)
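The double NOT EXISTS can be checked with a short sketch, run in SQLite as a stand-in for MySQL on hypothetical data in which tag IDs 1, 2 and 3 stand for "trees", "swings" and "slide". The IN list is built from one ? placeholder per ID instead of a hardcoded list.

```python
import sqlite3

# Hypothetical data: tag IDs 1 = trees, 2 = swings, 3 = slide.
SETUP = """
CREATE TABLE Locations (ID INTEGER PRIMARY KEY, Name VARCHAR(128));
CREATE TABLE Tags (ID INTEGER PRIMARY KEY, Name VARCHAR(128));
CREATE TABLE LocationsTagsAssoc (
    ID INTEGER PRIMARY KEY, LocationID INTEGER, TagID INTEGER
);
INSERT INTO Locations VALUES (1, 'Park'), (2, 'Forest'), (3, 'Playground');
INSERT INTO Tags VALUES (1, 'trees'), (2, 'swings'), (3, 'slide');
INSERT INTO LocationsTagsAssoc VALUES
    (1, 1, 1), (2, 1, 2),
    (3, 2, 1),
    (4, 3, 1), (5, 3, 2), (6, 3, 3);
"""


def locations_with_all_tag_ids(tag_ids):
    # One "?" placeholder per tag ID for the IN list.
    placeholders = ", ".join("?" for _ in tag_ids)
    query = f"""
    SELECT l.Name
    FROM Locations AS l
    WHERE NOT EXISTS (          -- there is no wanted tag ...
        SELECT NULL FROM Tags AS t
        WHERE NOT EXISTS (      -- ... that this location lacks
            SELECT NULL FROM LocationsTagsAssoc AS lt
            WHERE lt.LocationID = l.ID
              AND lt.TagID = t.ID
        )
        AND t.ID IN ({placeholders})
    )
    ORDER BY l.ID
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    names = [name for (name,) in conn.execute(query, list(tag_ids))]
    conn.close()
    return names


print(locations_with_all_tag_ids([1, 2]))  # ['Park', 'Playground']
```

Like the self-join and GROUP BY forms, this returns locations that have at least the given tags; extra tags do not exclude a location.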

Related

joining tables with not exists query [duplicate]

Possible Duplicate:
Mysql: Perform of NOT EXISTS. Is it possible to improve permofance?
Is there a better or more optimal way to do it? Should I use EXISTS instead of a JOIN, or two separate queries? And what about temporary tables? I was reading about those but am uncertain.
I'm getting members' email addresses from a group, checking that they have not received an item yet.
SELECT m.email,g.id
FROM `group` g
LEFT JOIN members m
ON g.mid = m.id
AND g.gid='1'
WHERE NOT EXISTS
( SELECT id
FROM items AS i
WHERE i.mid=m.id
AND i.item_id='5'
)
Here's the same thing written as a JOIN:
SELECT m.email, g.id
FROM members m
JOIN `group` g ON g.mid = m.id AND g.gid = '1'
LEFT JOIN items i ON i.mid = m.id AND i.item_id = '5'
WHERE i.id IS NULL
Use the following compound indexes:
group (mid, gid)
items (mid, item_id)
I reversed the LEFT JOIN on members and group because it seems like you're returning members, not groups, and I changed the LEFT JOIN into an INNER JOIN since you only want members from that group.
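A quick way to confirm that the NOT EXISTS and LEFT JOIN ... IS NULL forms agree is to run both on a small dataset. This sketch uses SQLite instead of MySQL and hypothetical rows; it writes the NOT EXISTS version with an inner join to group, following the point above that only members of that group are wanted. The table name group is quoted with backticks because GROUP is a reserved word; SQLite accepts MySQL-style backtick quoting.

```python
import sqlite3

# Hypothetical data: members 1 and 2 are in group 1, member 3 in group 2.
# Member 1 already has item 5; member 2 only has item 7.
SETUP = """
CREATE TABLE members (id INTEGER PRIMARY KEY, email VARCHAR(40));
CREATE TABLE `group` (id INTEGER PRIMARY KEY, gid INTEGER, mid INTEGER);
CREATE TABLE items (id INTEGER PRIMARY KEY, item_id INTEGER, mid INTEGER);
INSERT INTO members VALUES
    (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
INSERT INTO `group` VALUES (1, 1, 1), (2, 1, 2), (3, 2, 3);
INSERT INTO items VALUES (1, 5, 1), (2, 7, 2);
"""

# Members of group 1 with no item 5, written with NOT EXISTS ...
NOT_EXISTS = """
SELECT m.email, g.id
FROM members m
JOIN `group` g ON g.mid = m.id AND g.gid = 1
WHERE NOT EXISTS (
    SELECT 1 FROM items i WHERE i.mid = m.id AND i.item_id = 5
)
ORDER BY g.id
"""

# ... and as an anti-join: keep only rows where the LEFT JOIN found no item.
ANTI_JOIN = """
SELECT m.email, g.id
FROM members m
JOIN `group` g ON g.mid = m.id AND g.gid = 1
LEFT JOIN items i ON i.mid = m.id AND i.item_id = 5
WHERE i.id IS NULL
ORDER BY g.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
not_exists_rows = conn.execute(NOT_EXISTS).fetchall()
anti_join_rows = conn.execute(ANTI_JOIN).fetchall()
print(not_exists_rows)  # [('b@example.com', 2)]
print(anti_join_rows)   # [('b@example.com', 2)]
```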
I think this one might read better:
SELECT m.email, g.id
FROM members m
JOIN `group` g ON g.mid = m.id
LEFT JOIN items i ON i.mid = m.id AND i.item_id = 5
WHERE g.gid = 1
AND i.id IS NULL
You might be wondering whether we can move the i.item_id = 5 condition to the WHERE clause as well. You can't: when the LEFT JOIN finds no item, i.item_id is NULL, so there are no rows where i.id IS NULL and i.item_id = 5. You must apply the item filter in the join first and then keep only the unmatched (NULL) rows in the WHERE clause.
I don't believe a temporary table is necessary. We'd really only go that route if we can't get acceptable performance.
From your query, we gather your schema looks like this:
group (id INT PK, gid INT, mid INT)
items (id INT PK, item_id INT, mid INT)
members (id INT PK, email VARCHAR)
It looks like your group table is really a "membership" table, which resolves a many-to-many relationship between groups and persons. (That is, a person can be a member of zero, one or more groups; a group can have zero, one or more persons as members.)
You are using a LEFT JOIN between group and members. This will return a row for group (returning group.id) when there are no matching members, with a NULL for members.email (which may be what you want). But if you only want to return email addresses, then this can be changed to an INNER JOIN.
The NOT EXISTS predicate can be replaced with an OUTER JOIN and a test for a NULL value returned from the joined table. If the group.gid and/or items.item_id columns are a numeric datatype, you can remove the quotes from around the integer literals in the predicates.
Here is an alternative which will return an equivalent resultset, and may perform better:
SELECT m.email
, g.id
FROM members m
JOIN `group` g ON g.mid = m.id AND g.gid = 1
LEFT
JOIN items i ON i.mid = m.id AND i.item_id = 5
WHERE i.id IS NULL
ADDENDUM:
TEST CASE (provided in a comment on the accepted answer): it demonstrates the difference in result sets between queries with the predicate items.item_id = 5 in the ON clause and in the WHERE clause. (Moving this predicate to the WHERE clause breaks the anti-join.)
CREATE TABLE `group` (`id` INT PRIMARY KEY, `gid` INT, `mid` INT);
CREATE TABLE `items` (`id` INT PRIMARY KEY, `item_id` INT, `mid` INT);
CREATE TABLE `members` (`id` INT PRIMARY KEY, `email` VARCHAR(40));
INSERT INTO `group` VALUES (1,1,1), (2,1,2);
INSERT INTO `items` VALUES (1,5,1);
INSERT INTO `members` VALUES (1,'one@m.com'),(2,'two@m.com');
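Running the test case shows the difference. This sketch loads the same tables and rows into SQLite (standing in for MySQL) and compares the predicate in the ON clause with the predicate moved to the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE `group` (`id` INT PRIMARY KEY, `gid` INT, `mid` INT);
CREATE TABLE `items` (`id` INT PRIMARY KEY, `item_id` INT, `mid` INT);
CREATE TABLE `members` (`id` INT PRIMARY KEY, `email` VARCHAR(40));
INSERT INTO `group` VALUES (1,1,1), (2,1,2);
INSERT INTO `items` VALUES (1,5,1);
INSERT INTO `members` VALUES (1,'one@m.com'),(2,'two@m.com');
""")

# Predicate in the ON clause: member 1 matches item 5, member 2 matches
# nothing, so only member 2 survives the IS NULL test.
ON_CLAUSE = """
SELECT m.email, g.id
FROM members m
JOIN `group` g ON g.mid = m.id AND g.gid = 1
LEFT JOIN items i ON i.mid = m.id AND i.item_id = 5
WHERE i.id IS NULL
"""

# Predicate in the WHERE clause: member 1's matched row fails IS NULL, and
# member 2's unmatched row has i.item_id = NULL, which fails "= 5".
WHERE_CLAUSE = """
SELECT m.email, g.id
FROM members m
JOIN `group` g ON g.mid = m.id AND g.gid = 1
LEFT JOIN items i ON i.mid = m.id
WHERE i.item_id = 5 AND i.id IS NULL
"""

on_rows = conn.execute(ON_CLAUSE).fetchall()
where_rows = conn.execute(WHERE_CLAUSE).fetchall()
print(on_rows)     # [('two@m.com', 2)]
print(where_rows)  # []
```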

How to subselect based on multiple column key

I'm working with a legacy database that uses a three-column key for products. I want to select all products that have a status of 'A' or that have a matching record in a second table. If it were a single-column primary key (like 'id'), I would do it this way:
SELECT * FROM `product`
WHERE `status` = 'A'
OR `id` IN (SELECT `foreign_key` FROM `table2`)
I can't figure out how to do the IN-clause subselect with three keys though. I suppose I can concatenate the keys together and compare the strings, but that seems horribly inefficient. Is there a way to do this without concatenation?
You can LEFT JOIN product and table2 on the composite key, then filter with status = 'A' OR table2.id IS NOT NULL.
A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better.
SELECT * FROM product p1
WHERE status = 'A'
OR EXISTS (SELECT *
FROM table2 t2
WHERE t2.id = p1.foreign_key
AND t2.other_key = p1.secret_key
...
);
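A sketch of the EXISTS approach on a hypothetical three-column key named key1, key2 and key3 (the answer's own column list is partly elided), run in SQLite as a stand-in for MySQL. Each key column gets its own equality test inside the correlated subquery, so no concatenation is needed.

```python
import sqlite3

# Hypothetical composite key (key1, key2, key3) shared by both tables.
SETUP = """
CREATE TABLE product (
    key1 TEXT, key2 INTEGER, key3 TEXT, status TEXT,
    PRIMARY KEY (key1, key2, key3)
);
CREATE TABLE table2 (key1 TEXT, key2 INTEGER, key3 TEXT);
INSERT INTO product VALUES
    ('a', 1, 'x', 'A'), ('b', 2, 'y', 'B'), ('c', 3, 'z', 'B');
INSERT INTO table2 VALUES ('b', 2, 'y'), ('c', 3, 'q');
"""

# Every key column is compared separately; 'c' fails because key3 differs.
QUERY = """
SELECT p1.key1
FROM product p1
WHERE p1.status = 'A'
   OR EXISTS (SELECT *
              FROM table2 t2
              WHERE t2.key1 = p1.key1
                AND t2.key2 = p1.key2
                AND t2.key3 = p1.key3)
ORDER BY p1.key1
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
matches = [key for (key,) in conn.execute(QUERY)]
print(matches)  # ['a', 'b']
```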
Do a left join :)
SELECT p.* FROM product p
LEFT JOIN table2 t2 on p.key1 = t2.key1 and p.key2 = t2.key2 and p.key3 = t2.key3
WHERE p.status = 'A' OR t2.key1 IS NOT NULL
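One caveat with the LEFT JOIN: a product is returned once per matching table2 row, while EXISTS returns it at most once. This SQLite sketch (a stand-in for MySQL, with hypothetical rows in which table2 holds two rows for product 'b') shows the repeat and how SELECT DISTINCT removes it.

```python
import sqlite3

# Hypothetical composite key (key1, key2, key3); table2 deliberately
# holds two rows for product 'b'.
SETUP = """
CREATE TABLE product (
    key1 TEXT, key2 INTEGER, key3 TEXT, status TEXT,
    PRIMARY KEY (key1, key2, key3)
);
CREATE TABLE table2 (key1 TEXT, key2 INTEGER, key3 TEXT);
INSERT INTO product VALUES
    ('a', 1, 'x', 'A'), ('b', 2, 'y', 'B'), ('c', 3, 'z', 'B');
INSERT INTO table2 VALUES ('b', 2, 'y'), ('b', 2, 'y'), ('c', 3, 'q');
"""

# {distinct} is filled in with "DISTINCT" or left empty.
LEFT_JOIN = """
SELECT {distinct} p.key1
FROM product p
LEFT JOIN table2 t2
  ON p.key1 = t2.key1 AND p.key2 = t2.key2 AND p.key3 = t2.key3
WHERE p.status = 'A' OR t2.key1 IS NOT NULL
ORDER BY p.key1
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)


def matching_keys(use_distinct):
    sql = LEFT_JOIN.format(distinct="DISTINCT" if use_distinct else "")
    return [key for (key,) in conn.execute(sql)]


plain = matching_keys(False)
deduped = matching_keys(True)
print(plain)    # ['a', 'b', 'b']
print(deduped)  # ['a', 'b']
```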
You could use a UNION:
SELECT product.*
FROM `product`
WHERE `status` = 'A'
UNION
SELECT product.*
FROM `product`
JOIN `table2`
ON (product.id = table2.foreign_key
AND ...)
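A runnable sketch of the UNION form on hypothetical data with a three-column key (key1, key2, key3), in SQLite rather than MySQL. Product 'a' both has status 'A' and a match in table2; UNION, unlike UNION ALL, removes the duplicate row, so it appears once. Both branches select product.* so that their column lists line up.

```python
import sqlite3

# Hypothetical composite key (key1, key2, key3).
SETUP = """
CREATE TABLE product (
    key1 TEXT, key2 INTEGER, key3 TEXT, status TEXT,
    PRIMARY KEY (key1, key2, key3)
);
CREATE TABLE table2 (key1 TEXT, key2 INTEGER, key3 TEXT);
INSERT INTO product VALUES
    ('a', 1, 'x', 'A'), ('b', 2, 'y', 'B'), ('c', 3, 'z', 'B');
INSERT INTO table2 VALUES ('a', 1, 'x'), ('b', 2, 'y');
"""

# Second branch returns only product columns, matching the first branch.
QUERY = """
SELECT product.*
FROM product
WHERE product.status = 'A'
UNION
SELECT product.*
FROM product
JOIN table2
  ON product.key1 = table2.key1
 AND product.key2 = table2.key2
 AND product.key3 = table2.key3
ORDER BY 1
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('a', 1, 'x', 'A'), ('b', 2, 'y', 'B')]
```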

MYSQL: evaluating a missing row into a result

I am trying to fetch records from two tables mapped by an id, where the second table may be missing a row.
I have a column called name in the second table which contains a string value. The value I need to extract is 'subscriptions', but it does not always exist in the table. This column can also hold other values, which I do not want to extract.
Is it possible to check whether the value exists and, if it doesn't, output NULL for all the fields?
So far I have this, which returns all the records:
select COUNT(*)
from PUser a, PAttribute b
where exists (select null
from PAttribute c
where c.name = 'subscriptions' or c.name is null)
and a.id = b.userid;
Hope that explains it.
EDIT
PUser table
id
other columns
PAttribute table
userid mapped to PUser.id
name
A userid can have multiple rows, each with a different value in name, e.g. 'subscriptions', 'source', etc.
I want to fetch all users, whether or not they have a row with the value 'subscriptions' in the name column, since some may not have one.
If they don't have this row the output should be null.
EDIT 2:
Worked this out and I needed
select COUNT(*),(select b.stringValue from PAttribute b where b.userid = a.id and b.name = 'subscriptions') from PUser a order by a.id desc;
Your example is using implicit joins, which are inner joins. This means that a result will only be returned if a row exists in both tables. Instead, you need to use a left join. Change your query to this:
select COUNT(*)
from PUser a LEFT JOIN PAttribute b ON a.id = b.userID
where exists (select null
from PAttribute c
where c.name = 'subscriptions' or c.name is null);
Or (not exactly sure what your desired behavior is), this might work for you:
SELECT count(*)
FROM PUser a LEFT JOIN PAttribute b ON a.id = b.userID
WHERE b.name = 'subscriptions' OR b.name IS NULL;
If you want to exclude attribute rows that do not contain 'subscriptions' while still keeping the rows from PUser that have no matching 'subscriptions' row in PAttribute (and thus getting NULL fields for them), put the name test in the JOIN ... ON condition and use an OUTER JOIN:
select COUNT(*)
from PUser a LEFT OUTER JOIN PAttribute b ON ( a.id = b.userid AND b.name = 'subscriptions' )
;
This is a little different from your query: the EXISTS version may be less performant and, moreover, the SELECT in the EXISTS searches for a row in PAttribute whose name IS NULL, which is quite different from handling missing rows.
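The difference between filtering in the ON clause and in the WHERE clause is easiest to see with a user who has attribute rows, but none named 'subscriptions'. This sketch runs both forms in SQLite (standing in for MySQL) on hypothetical rows; the stringValue column is taken from the asker's second edit.

```python
import sqlite3

# Hypothetical data: user 1 has a subscription, user 2 only a 'source'
# row, user 3 has no attributes at all.
SETUP = """
CREATE TABLE PUser (id INTEGER PRIMARY KEY);
CREATE TABLE PAttribute (userid INTEGER, name TEXT, stringValue TEXT);
INSERT INTO PUser VALUES (1), (2), (3);
INSERT INTO PAttribute VALUES
    (1, 'subscriptions', 'weekly'),
    (1, 'source', 'web'),
    (2, 'source', 'email');
"""

# Filter in ON: every user is kept, with NULL when no 'subscriptions' row exists.
ON_FILTER = """
SELECT a.id, b.stringValue
FROM PUser a
LEFT OUTER JOIN PAttribute b
  ON (a.id = b.userid AND b.name = 'subscriptions')
ORDER BY a.id
"""

# Filter in WHERE: user 2's only row is 'source', so it is discarded
# and user 2 disappears from the result.
WHERE_FILTER = """
SELECT a.id, b.stringValue
FROM PUser a
LEFT JOIN PAttribute b ON a.id = b.userid
WHERE b.name = 'subscriptions' OR b.name IS NULL
ORDER BY a.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
on_rows = conn.execute(ON_FILTER).fetchall()
where_rows = conn.execute(WHERE_FILTER).fetchall()
print(on_rows)     # [(1, 'weekly'), (2, None), (3, None)]
print(where_rows)  # [(1, 'weekly'), (3, None)]
```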

How to join three different tables in mysql with no common field amongst them all

If I have table 1 with fields GroupID and BranchID among others, table 2 with fields GroupID and GroupName among others, and table 3 with fields BranchID and BranchName among others, how do I join these tables?
A NATURAL JOIN does not work.
SELECT foo
FROM Table1
JOIN Table2 ON Table2.GroupID = Table1.GroupID
JOIN Table3 ON Table3.BranchID = Table1.BranchID
This is typically what the query should look like. Is it different from what you have?
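A small sketch of the first query, run in SQLite as a stand-in for MySQL with hypothetical rows. Table1 is the only table that holds both keys, so it sits in the middle and each of the other tables joins to it on the column they share.

```python
import sqlite3

# Hypothetical tables named as in the question; Table1 links groups to branches.
SETUP = """
CREATE TABLE Table1 (GroupID INTEGER, BranchID INTEGER);
CREATE TABLE Table2 (GroupID INTEGER PRIMARY KEY, GroupName TEXT);
CREATE TABLE Table3 (BranchID INTEGER PRIMARY KEY, BranchName TEXT);
INSERT INTO Table1 VALUES (1, 10), (2, 20), (1, 20);
INSERT INTO Table2 VALUES (1, 'North'), (2, 'South');
INSERT INTO Table3 VALUES (10, 'Main St'), (20, 'High St');
"""

# Each join uses the column that Table1 shares with the other table.
QUERY = """
SELECT Table2.GroupName, Table3.BranchName
FROM Table1
JOIN Table2 ON Table2.GroupID = Table1.GroupID
JOIN Table3 ON Table3.BranchID = Table1.BranchID
ORDER BY Table1.GroupID, Table1.BranchID
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
print(rows)
# [('North', 'Main St'), ('North', 'High St'), ('South', 'High St')]
```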
(SQL Server syntax)
SELECT Column1, Column2...
FROM GroupBranch_Rel
INNER JOIN Groups
ON Groups.GroupID = GroupBranch_Rel.GroupID
INNER JOIN Branches
ON Branches.BranchID = GroupBranch_Rel.BranchID

Filter out rows by hardcoded list in MySQL performance

I have a hardcoded list of values like: 1,5,7,8 and so on.
And I must filter out rows from table that have ID in list above, so I do something like this:
Select
*
from myTable m
left join othertable t
on t.REF_ID = m.ID
where m.ID not in (1,5,7,8...)
But when I have more values (like 1000) and more rows (100) in othertable and myTable, this query starts to be slow. I have an index on REF_ID and ID. It seems that the part "where m.ID not in (1,5,7,8...)" is the problem.
Is there faster way to filter out rows by hardcoded list of values?
Try putting your list in a temporary table, in a column temptable.ID, and doing:
SELECT *
FROM myTable m
LEFT JOIN othertable t ON t.REF_ID = m.ID
LEFT JOIN temptable ON m.ID = temptable.ID
WHERE temptable.ID IS NULL
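A sketch of the temporary-table anti-join, run in SQLite as a stand-in for MySQL on hypothetical rows, with a check that it returns the same rows as the NOT IN version. CREATE TEMPORARY TABLE is accepted by both databases; the PRIMARY KEY on temptable.ID gives the join an index to use, and the index name on othertable is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable (ID INTEGER PRIMARY KEY);
CREATE TABLE othertable (ID INTEGER PRIMARY KEY, REF_ID INTEGER);
CREATE INDEX idx_othertable_ref_id ON othertable (REF_ID);
INSERT INTO myTable (ID) VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
INSERT INTO othertable VALUES (1, 2), (2, 5), (3, 9);
CREATE TEMPORARY TABLE temptable (ID INTEGER PRIMARY KEY);
""")

excluded = [1, 5, 7, 8]  # the hardcoded list
conn.executemany("INSERT INTO temptable (ID) VALUES (?)", [(i,) for i in excluded])

# Anti-join: a myTable row survives only if it has no match in temptable.
ANTI_JOIN = """
SELECT m.ID, t.ID
FROM myTable m
LEFT JOIN othertable t ON t.REF_ID = m.ID
LEFT JOIN temptable ON m.ID = temptable.ID
WHERE temptable.ID IS NULL
ORDER BY m.ID
"""

# The original NOT IN form, for comparison.
NOT_IN = """
SELECT m.ID, t.ID
FROM myTable m
LEFT JOIN othertable t ON t.REF_ID = m.ID
WHERE m.ID NOT IN (1, 5, 7, 8)
ORDER BY m.ID
"""

anti_join_rows = conn.execute(ANTI_JOIN).fetchall()
not_in_rows = conn.execute(NOT_IN).fetchall()
print(anti_join_rows)
# [(2, 1), (3, None), (4, None), (6, None), (9, 3), (10, None)]
print(anti_join_rows == not_in_rows)  # True
```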