SQL query: What groups is a given member NOT a member of? - mysql

I have three tables in MySQL,
groups (key: group_id)
members (key: member_id)
group_member_relations key: group_id, member_id
The last table has combinations of members and groups (members that have joined that group).
I've been struggling to find a single query that gives me a list of the member/group combinations that are NOT IN the group_member_relations table. (Basically, I eventually want to ask the question "What groups is a given member not a member of?") I can do this the hard way in code, but was wondering if a single query is possible.
I'm not a SQL wiz at all, but I have used it a lot over the last 20 years, mostly for basic stuff. This is obviously over my head. I've made many attempts over the last few days but, embarrassingly, don't seem to get close.
Any pointers from the SQL wizards out there?

Groups that a member is not in:
select *
from `groups` -- backticks: GROUPS is a reserved word in MySQL 8.0+
where group_id not in (
select group_id
from group_member_relations
where member_id = ?)
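To try the NOT IN pattern above without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the `name` columns are invented for illustration, and the helper `groups_not_joined` is a hypothetical name; only the table and key names come from the question.

```python
import sqlite3

# Hypothetical sample data; table and key names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "groups" (group_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE members (member_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE group_member_relations (
        group_id INTEGER, member_id INTEGER,
        PRIMARY KEY (group_id, member_id));
    INSERT INTO "groups" VALUES (1, 'chess'), (2, 'hiking'), (3, 'cooking');
    INSERT INTO members VALUES (10, 'ann'), (11, 'bob');
    INSERT INTO group_member_relations VALUES (1, 10), (2, 10), (2, 11);
""")


def groups_not_joined(conn, member_id):
    """Return (group_id, name) for every group the member has not joined."""
    return conn.execute("""
        SELECT g.group_id, g.name
        FROM "groups" AS g
        WHERE g.group_id NOT IN (
            SELECT r.group_id
            FROM group_member_relations AS r
            WHERE r.member_id = ?)
        ORDER BY g.group_id
    """, (member_id,)).fetchall()


print(groups_not_joined(conn, 10))
print(groups_not_joined(conn, 11))
```

With this sample data, member 10 is missing only from the cooking group, while member 11 is missing from chess and cooking.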

The following query pairs every member with every group and keeps only the pairs that have no row in group_member_relations, so for each group it lists the members that are not in it. It returns all columns of both tables.
SELECT a.*, b.*
FROM members a
CROSS JOIN `groups` b
LEFT JOIN group_member_relations c
ON a.member_id = c.member_id AND
b.group_id = c.group_id
WHERE c.member_id IS NULL OR -- actually this condition is already enough
c.group_id IS NULL
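The cross join plus anti-join can be checked on a few rows. This is only a sketch with Python's sqlite3 module and made-up data; it uses the column names from the question (member_id, group_id), and `missing_memberships` is a hypothetical helper name.

```python
import sqlite3

# Hypothetical sample data using the question's table and key names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "groups" (group_id INTEGER PRIMARY KEY);
    CREATE TABLE members (member_id INTEGER PRIMARY KEY);
    CREATE TABLE group_member_relations (
        group_id INTEGER, member_id INTEGER,
        PRIMARY KEY (group_id, member_id));
    INSERT INTO "groups" VALUES (1), (2), (3);
    INSERT INTO members VALUES (10), (11);
    INSERT INTO group_member_relations VALUES (1, 10), (2, 10), (2, 11);
""")


def missing_memberships(conn):
    """Return every (member_id, group_id) pair with no membership row."""
    return conn.execute("""
        SELECT m.member_id, g.group_id
        FROM members AS m
        CROSS JOIN "groups" AS g          -- all possible pairs
        LEFT JOIN group_member_relations AS r
               ON r.member_id = m.member_id
              AND r.group_id = g.group_id
        WHERE r.member_id IS NULL         -- keep pairs without a match
        ORDER BY m.member_id, g.group_id
    """).fetchall()


print(missing_memberships(conn))
```

Two members and three groups give six possible pairs; three of them exist in the relation table, so the other three are returned.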

Related

SQL select users that belong to two groups

I have a list of persons in a table. I then have another table where I correlate each person to one or more groups. Some persons have only one entry in the groups table but some have multiple.
I am now trying to SELECT list of persons that are in two specific groups. Person must be in BOTH groups in order to qualify.
My table with the basic information on the persons is base and the table with the group correlation is groups_registration. In fact I also have a third table where the groups names and further information are stored but it is not required for this query.
The groups I am trying to gather in this example are 4 and 11.
What I tried initially was:
SELECT base.*, groups_registration.person_id, groups_registration.group_id
FROM base
INNER JOIN groups_registration
ON base.id = groups_registration.person_id
WHERE (groups_registration.group_id = '4' AND groups_registration.group_id = '11')
ORDER BY base.name
This did not return any rows, I assume because no single row contains both group_id = 4 and group_id = 11.
I have been searching through Stack Overflow with no joy. Do you guys have any ideas?
Obviously, no row has both values. Use group by:
SELECT gr.person_id
FROM groups_registration gr
WHERE gr.group_id IN (4, 11)
GROUP BY gr.person_id
HAVING COUNT(DISTINCT gr.group_id) = 2;
I'll let you figure out how to join in the additional information from base.
Notes:
Use table aliases to make it easier to write and read queries.
Presumably, the ids are numbers. Compare numbers to numbers. Only use single quotes for date and string constants.
IN is better than long chains of OR/=.
You can use joins as shown below:
SELECT A.*, B.person_id
FROM base A
INNER JOIN
(SELECT gr.person_id
FROM groups_registration gr
WHERE gr.group_id IN (4, 11)
GROUP BY gr.person_id
HAVING COUNT(DISTINCT gr.group_id) = 2) B
ON A.id = B.person_id;
This returns every column of base for the persons who are in both groups.
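The GROUP BY / HAVING COUNT(DISTINCT ...) technique generalizes to any list of required groups. Below is a minimal sketch with Python's sqlite3 module; the sample rows and the helper name `people_in_all_groups` are assumptions made for illustration, while the table names base and groups_registration come from the question.

```python
import sqlite3

# Hypothetical sample data; table names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE groups_registration (person_id INTEGER, group_id INTEGER);
    INSERT INTO base VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO groups_registration VALUES
        (1, 4), (1, 11), (1, 7), (2, 4), (3, 11), (3, 7);
""")


def people_in_all_groups(conn, group_ids):
    """Return (id, name) of persons registered in every listed group."""
    wanted = sorted(set(group_ids))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT b.id, b.name
        FROM base AS b
        JOIN groups_registration AS gr ON gr.person_id = b.id
        WHERE gr.group_id IN ({placeholders})
        GROUP BY b.id, b.name
        HAVING COUNT(DISTINCT gr.group_id) = ?
        ORDER BY b.name
    """
    # The last parameter is the number of distinct groups required.
    return conn.execute(sql, (*wanted, len(wanted))).fetchall()


print(people_in_all_groups(conn, [4, 11]))
```

Only Ada is registered in both group 4 and group 11 in this sample; Ben and Cy each match just one of the two, so the HAVING clause filters them out.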

mySQL - How to do this query?

I'm trying to write a query for the following task:
Select the first name and last name of the clients which rent films (that have DVD's) from all the categories, ordering by first name and last name.
The database consists of:
Inventory -> DVD's
Rental -> Rentals made by customers
Category table:
| category_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(25) | YES | | NULL | |
My question is how to require that a customer's rentals cover all the ids returned by another query (the categories).
I understand that we can natural join inventory with rental and film, and that if we find a single category with no rental for a customer, we know that customer doesn't cover all of them. But I can't complete this.
I have this solution (But I can't understand it very well):
SELECT first_name, last_name
FROM customer AS C WHERE NOT EXISTS
(SELECT * FROM category AS K WHERE NOT EXISTS
(SELECT * FROM (film NATURAL JOIN inventory) NATURAL JOIN rental
WHERE C.customer_id = customer_id AND K.category_id = category_id));
Are there any other solutions?
On our projects, we NEVER use NATURAL JOIN. That doesn't work for us, because the PRIMARY KEY is always a surrogate column named id, and the foreign key columns are always tablename_id.
A natural join would match id in one table to id in the other table, and that's not what we want. We also frequently have "housekeeping" columns in the tables that are named the same, such as version column used for optimistic locking pattern.
And even if our naming conventions were different, and the join columns were named the same, there would be a potential for a join in an existing query to change if we added a column to a table that was named the same as a column in another table.
And, reading a SQL statement that includes a NATURAL JOIN, we can't see what columns are actually being matched without running through the table definitions, looking for columns that are named the same. That seems to put an unnecessary burden on the reader of the statement. (A SQL statement is going to be "read" many more times than it's written... the author of the statement saving keystrokes isn't a beneficial tradeoff for ambiguity leading to extra work by future readers.)
(I know others have different opinions on this topic. I'm sure that successful software can be written using the NATURAL JOIN pattern. I'm just not smart enough or good enough to work with that. I'll give significant weight to the opinions of DBAs that have years of experience with database modeling, implementing schemas, writing and tuning SQL, supporting operational systems, and dealing with evolving requirements and ongoing maintenance.)
Where was I... oh yes... back to regularly scheduled programming...
The image of the schema is way too small for me to decipher, and I can't seem to copy any text from it. Output from a SHOW CREATE TABLE is much easier to work with.
Did you have a SQL Fiddle setup?
I don't think the query in the question will actually work. I thought there was a limitation on how far "up" a correlated subquery could reference an outer query.
To me, it looks like this predicate
WHERE C.customer_id = customer_id
^^^^^^^^^^^^^
is too deep. The subquery that it's in isn't allowed to reference columns from C; that table is too high up. (Maybe I'm totally wrong about that; maybe it's Oracle or SQL Server or Teradata that has that restriction. Or maybe MySQL used to have that restriction, but a later version has lifted it.)
OTHER APPROACHES
As another approach, we could get each customer and a distinct list of every category that he's rented from.
Then, we could compare that list of "customer rented category" with a complete list of (distinct) category. One fairly easy way to do that would be to collapse each list into a "count" of distinct category, and then compare the counts. If a count for a customer is less than the total count, then we know he's not rented from every category. (There is a caveat: we need to ensure that the customer "rented from category" list contains only categories in the total category list.)
Another approach would be to take a list of (distinct) customer, and perform a cross join (cartesian product) with every possible category. (WARNING: this could be fairly large set.)
With that set of "customer cross product category", we could then eliminate rows where the customer has rented from that category (probably using an anti-join pattern.)
That would leave us with a set of customers and the categories they haven't rented from.
OP hasn't set up a SQL Fiddle with tables and exemplar data; so, I'm not going to bother doing it either.
I would offer some example SQL statements, but the table definitions from the image are unusable; to demonstrate those statements actually working, I'd need some exemplar data in the tables.
(Again, I don't believe the statement in the question actually works. There's no demonstration that it does work.)
I'd be more inclined to test it myself, if it weren't for the NATURAL JOIN syntax. I'm not smart enough to figure that out, without usable table definitions.
If I worked on that, the first thing I would do would be to re-write it to remove the NATURAL keyword, and add actual predicates in an actual ON clause, and qualify all of the column references.
And the query would end up looking something like this:
SELECT c.first_name
, c.last_name
FROM customer c
WHERE NOT EXISTS
( SELECT 1
FROM category k
WHERE NOT EXISTS
( SELECT 1
FROM film f
JOIN inventory i
ON i.film_id = f.film_id
JOIN rental r
ON r.inventory_id = i.inventory_id
WHERE f.category_id = k.category_id
AND r.customer_id = c.customer_id
)
)
(I think that reference to c.customer_id is too deep to be valid.)
EDIT
I stand corrected on my conjecture that the reference to C.customer_id was too many levels "deep". That query doesn't throw an error for me.
But it also doesn't seem to return the resultset that we're expecting, I may have screwed it up somehow. Oh well.
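For what it's worth, both formulations can be compared on a handful of rows. The sketch below uses Python's sqlite3 module rather than MySQL, and it assumes a schema the question never shows in usable form: in particular it assumes film carries a category_id column, as the queries above imply. All sample rows are invented.

```python
import sqlite3

# Assumed schema (not confirmed by the question) and invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY,
                           first_name TEXT, last_name TEXT);
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY,
                         inventory_id INTEGER, customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'),
                                (3, 'Cy', 'Orr');
    INSERT INTO category VALUES (1, 'Action'), (2, 'Drama');
    INSERT INTO film VALUES (1, 1), (2, 2);
    INSERT INTO inventory VALUES (1, 1), (2, 2);
    -- Ann rents from both categories, Bob from one, Cy from none.
    INSERT INTO rental VALUES (1, 1, 1), (2, 2, 1), (3, 1, 2);
""")

# "There is no category for which this customer has no rental."
DOUBLE_NOT_EXISTS = """
    SELECT c.first_name, c.last_name
    FROM customer AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM category AS k
        WHERE NOT EXISTS (
            SELECT 1
            FROM film AS f
            JOIN inventory AS i ON i.film_id = f.film_id
            JOIN rental AS r ON r.inventory_id = i.inventory_id
            WHERE f.category_id = k.category_id
              AND r.customer_id = c.customer_id))
    ORDER BY c.first_name, c.last_name
"""

# "The customer's distinct rented categories equal the category count."
COUNT_COMPARE = """
    SELECT c.first_name, c.last_name
    FROM customer AS c
    JOIN rental AS r ON r.customer_id = c.customer_id
    JOIN inventory AS i ON i.inventory_id = r.inventory_id
    JOIN film AS f ON f.film_id = i.film_id
    GROUP BY c.customer_id, c.first_name, c.last_name
    HAVING COUNT(DISTINCT f.category_id) = (SELECT COUNT(*) FROM category)
    ORDER BY c.first_name, c.last_name
"""

by_not_exists = conn.execute(DOUBLE_NOT_EXISTS).fetchall()
by_count = conn.execute(COUNT_COMPARE).fetchall()
print(by_not_exists, by_count)
```

On this tiny data set both queries return only Ann Lee. That says nothing about the real schema, but it suggests the double NOT EXISTS form is a workable way to express "rented from every category" when the join columns are as assumed.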
Here's an example of getting the "count of distinct rental category" for each customer (GROUP BY c.customer_id, just in case we have two customers with the same first and last names) and comparing to the count of category.
SELECT c.last_name
, c.first_name
FROM customer c
JOIN rental r
ON r.customer_id = c.customer_id
JOIN inventory i
ON i.inventory_id = r.inventory_id
JOIN film f
ON f.film_id = i.film_id
GROUP
BY c.last_name
, c.first_name
, c.customer_id
HAVING COUNT(DISTINCT f.category_id)
= (SELECT COUNT(DISTINCT a.category_id) FROM category a)
ORDER
BY c.last_name
, c.first_name
, c.customer_id
EDIT
And here's a demonstration of the other approach, generating a cartesian product of all customers and all categories (WARNING: do NOT do this on LARGE sets!), and find out if any of those rows don't have a match.
-- customers who have rented from EVERY category
-- h = cartesian (cross) product of all customers with all categories
-- g = all categories rented by each customer
-- perform outer join, return all rows from h and matching rows from g
-- if a row from h does not have a "matching" row found in g
-- columns from g will be null, test if any rows have null values from g
SELECT h.last_name
, h.first_name
FROM ( SELECT hi.customer_id
, hi.last_name
, hi.first_name
, hj.category_id
FROM customer hi
CROSS
JOIN category hj
) h
LEFT
JOIN ( SELECT c.customer_id
, f.category_id
FROM customer c
JOIN rental r
ON r.customer_id = c.customer_id
JOIN inventory i
ON i.inventory_id = r.inventory_id
JOIN film f
ON f.film_id = i.film_id
GROUP
BY c.customer_id
, f.category_id
) g
ON g.customer_id = h.customer_id
AND g.category_id = h.category_id
GROUP
BY h.last_name
, h.first_name
, h.customer_id
HAVING MIN(g.category_id IS NOT NULL)
ORDER
BY h.last_name
, h.first_name
, h.customer_id
I will take a stab at this, only because I am curious why the answer proposed seems so complex. First, a couple of questions.
So your question is: "Select the first name and last name of the clients which rent films (that have DVD's) from all the categories, ordering by first name and last name."
So, just go through the rental database, joining customer. I am not sure what the category part has anything to do with this, as you are not selecting or displaying any category, so that does not need to be part of the search, it is implied as when they rent a DVD, that DVD has a category.
SELECT C.first_name, C.last_name
FROM customer as C JOIN rental as R
ON (C.customer_id = R.customer_id)
WHERE R.return_date IS NULL; -- a NULL return_date means not yet returned
So, you are looking for movies that are currently rented, and displaying the first and last names of customers with active rentals.
You can also add DISTINCT to remove duplicate customers from the list.
Does this help?!

SQL Comment Grouping

I have two tables in MySQL
Table 1: List of ID's
--Just a single column list of ID's
Table 2: Groups
--Group Titles
--Members **
Now the member field is basically a comments field where all the ID's that are part of that group are listed. So for instance one whole field of members looks like this:
"ID003|ID004|ID005|ID006|ID007|ID008|... Etc."
There can be 500 or more IDs listed in the field.
What I would like to do is run a query and find out which IDs appear in three or fewer groups.
I've been taking cracks at it, but honestly I'm totally lost. Any ideas?
Edit: I misunderstood the question the first time, so I'm changing my answer.
SELECT l.id
FROM List_of_ids AS l
JOIN Groups AS g ON CONCAT('|', g.members, '|') LIKE CONCAT('%|', l.id, '|%')
GROUP BY l.id
HAVING COUNT(*) <= 3
This is bound to perform very poorly, because it forces a table-scan of both tables. If you have 500 id's and 500 groups, it must run 250000 comparisons.
You should really consider if storing a symbol-separated list is the right way to do this. See my answer to Is storing a delimited list in a database column really that bad?
The proper way to design such a relationship is to create a third table that maps id's to groups:
CREATE TABLE GroupsIds (
memberid INT,
groupid INT,
PRIMARY KEY (memberid, groupid)
);
With this table, the query becomes much more efficient because the join can use an index:
SELECT l.id
FROM List_of_ids AS l
JOIN GroupsIds AS gi ON gi.memberid = l.id
GROUP BY l.id
HAVING COUNT(*) <= 3
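Moving from the delimited members field to the mapping table is mostly a matter of splitting each field once. The following is a rough sketch in Python with sqlite3, not a tested MySQL migration: the sample rows, the groupid column on Groups, and the helper `split_members` are assumptions, and memberid is stored as text here because the sample IDs look like 'ID003'. The final query uses a LEFT JOIN so that IDs appearing in no group at all are counted as well, which an inner join would leave out.

```python
import sqlite3


def split_members(members_field):
    """Split 'ID003|ID004|...' into a list of ids, ignoring empty pieces."""
    return [token for token in members_field.split("|") if token]


# Hypothetical sample data in the original, delimited layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE List_of_ids (id TEXT PRIMARY KEY);
    CREATE TABLE "Groups" (groupid INTEGER PRIMARY KEY,
                           title TEXT, members TEXT);
    CREATE TABLE GroupsIds (memberid TEXT, groupid INTEGER,
                            PRIMARY KEY (memberid, groupid));
    INSERT INTO List_of_ids VALUES ('ID001'), ('ID002'), ('ID003'), ('ID004');
    INSERT INTO "Groups" VALUES
        (1, 'a', 'ID001|ID002'), (2, 'b', 'ID001|ID002|'),
        (3, 'c', 'ID001'), (4, 'd', 'ID001|ID003');
""")

# One-off normalization: one row per (member, group) pair.
for groupid, members in conn.execute(
        'SELECT groupid, members FROM "Groups"').fetchall():
    conn.executemany(
        "INSERT OR IGNORE INTO GroupsIds VALUES (?, ?)",
        [(memberid, groupid) for memberid in split_members(members)])

rows = conn.execute("""
    SELECT l.id
    FROM List_of_ids AS l
    LEFT JOIN GroupsIds AS gi ON gi.memberid = l.id
    GROUP BY l.id
    HAVING COUNT(gi.groupid) <= 3
    ORDER BY l.id
""").fetchall()
ids_in_few_groups = [row[0] for row in rows]
print(ids_in_few_groups)
```

In the sample, ID001 belongs to four groups and is excluded; ID002 (two groups), ID003 (one group), and ID004 (no groups) are returned.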
select * from
(
select ID,
(
select count(*)
From Groups
where LOCATE(concat('ID', a.id, '|'), concat(Members, '|'))>0
) as groupcount
from ListIDTable as a
) as q
where groupcount <= 3

MySQL - 3 tables, is this complex join even possible?

I have three tables: users, groups and relation.
Table users with fields: usrID, usrName, usrPass, usrPts
Table groups with fields: grpID, grpName, grpMinPts
Table relation with fields: uID, gID
User can be placed in group in two ways:
if he has collected more than the group's minimum number of points (users.usrPts > groups.grpMinPts ORDER BY groups.grpMinPts DESC LIMIT 1)
if his relation to the group is manually added in the relation table (user ID provided as uID, and group ID provided as gID, in the table named relation)
Can I create one single query, to determine for every user (or one specific), which group he belongs, but, manual relation (using relation table) should have higher priority than usrPts compared to grpMinPts? Also, I do not want to have one user shown twice (to show his real group by points, but related group also)...
Thanks in advance! :) I tried:
SELECT * FROM users LEFT JOIN (relation LEFT JOIN groups ON relation.gID = groups.grpID) ON users.usrID = relation.uID
Using this I managed to extract specified relations (from relation table), but, I have no idea how to include user points, respecting above mentioned priority (specified first). I know how to do this in a few separated queries in php, that is simple, but I am curious, can it be done using one single query?
EDIT TO ADD:
Thanks to the really educational technique using COALESCE that @GordonLinoff provided, I managed to make this query work as I expected. So, here it goes:
SELECT o.usrID, o.usrName, o.usrPass, o.usrPts, t.grpID, t.grpName
FROM (
SELECT u.*, COALESCE(relationgroupid,groupid) AS thegroupid
FROM (
SELECT u.*, (
SELECT grpID
FROM groups g
WHERE u.usrPts > g.grpMinPts
ORDER BY g.grpMinPts DESC
LIMIT 1
) AS groupid, (
SELECT gID
FROM relation r
WHERE r.uID = u.usrID
) AS relationgroupid
FROM users u
)u
)o
JOIN groups t ON t.grpID = o.thegroupid
Also, if you are wondering, as I did, whether this approach is faster or slower than running three queries and processing the results in PHP: it is slightly faster. Executing this query and showing the results on a web page took 14 ms on average. Three simple queries, processed in PHP and shown on a web page, took 21 ms. The averages are based on 10 runs each, and the execution time was nearly constant from run to run.
Here is an approach that uses correlated subqueries to get each of the values. It then chooses the appropriate one using the precedence rule that if the relations exist use that one, otherwise use the one from the groups table:
select u.*,
coalesce(relationgroupid, groupid) as thegroupid
from (select u.*,
(select grpid from groups g where u.usrPts > g.grpMinPts order by g.grpMinPts desc limit 1
) as groupid,
(select gID from relation r where r.uID = u.usrID
) as relationgroupid
from users u
) u
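The precedence rule (manual relation first, points-based group otherwise) can be exercised on a few rows. This is a sketch in Python with sqlite3 using the table and column names from the question; the sample rows and the helper name `effective_groups` are made up, and the COALESCE is folded directly into the join condition rather than into a derived table.

```python
import sqlite3

# Hypothetical sample data; names follow the question's three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (usrID INTEGER PRIMARY KEY, usrName TEXT,
                        usrPts INTEGER);
    CREATE TABLE "groups" (grpID INTEGER PRIMARY KEY, grpName TEXT,
                           grpMinPts INTEGER);
    CREATE TABLE relation (uID INTEGER, gID INTEGER);
    INSERT INTO "groups" VALUES (1, 'bronze', 0), (2, 'silver', 100),
                                (3, 'gold', 500);
    INSERT INTO users VALUES (1, 'ann', 150), (2, 'bob', 50), (3, 'cy', 700);
    INSERT INTO relation VALUES (2, 3);  -- bob is placed in gold manually
""")


def effective_groups(conn):
    """Return (usrName, grpName); a manual relation beats the points rule."""
    return conn.execute("""
        SELECT u.usrName, g.grpName
        FROM users AS u
        JOIN "groups" AS g ON g.grpID = COALESCE(
            -- 1st choice: the manually assigned group, if any
            (SELECT r.gID FROM relation AS r
             WHERE r.uID = u.usrID LIMIT 1),
            -- 2nd choice: the highest group whose minimum is exceeded
            (SELECT p.grpID FROM "groups" AS p
             WHERE u.usrPts > p.grpMinPts
             ORDER BY p.grpMinPts DESC LIMIT 1))
        ORDER BY u.usrName
    """).fetchall()


print(effective_groups(conn))
```

Here ann lands in silver by points, cy in gold by points, and bob in gold because of the manual relation even though his points would only qualify him for bronze. Each user appears once. A user with no relation and no qualifying group would be dropped by the inner join; a LEFT JOIN would keep such users with a NULL group.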
Try something like this
select u.usrName, g.grpName
from `groups` g
join relation r on r.gID = g.grpID
join users u on u.usrID = r.uID
union
select u.usrName, g1.grpName
from `groups` g1
join `groups` g2 on g2.grpMinPts > g1.grpMinPts
join users u on u.usrPts between g1.grpMinPts and g2.grpMinPts

join with where condition

I have read many join questions here, but I am unable to understand them well enough to write my own query that returns the result I want.
I have three tables for now: status, members, and friends. The friends table has two columns, friend_id and member_id.
All three tables share member_id, which is the primary key of the members table.
Now I want to get all the statuses created by a member and by that member's friends.
Say I have three members with ids 1, 2, and 3.
The friends table has a row with ids 1 and 2, so these two are friends of each other.
Member 2 has 5 status updates, member 1 has 2, and member 3 has 1 in the status table.
If I query for member 2, it should return 7 records (5 for member 2 and 2 for member 1) and should not return the record of member 3.
If I query for member 1, it should return the same records as in the previous case.
Do I need to change my table structure? Please help me get the records the way I want.
How about a pre-query to the friends table for any qualifying member PLUS the member itself, then back-join to the rest of the tables...
select STRAIGHT_JOIN
PeopleList.Member_id,
members.last_name,
members.first_name, -- plus any other fields from members
ms.status_id,
ms.description -- plus any other fields from member_status
from
( Select DISTINCT m.member_id
from Members m
where m.member_id = MemberDesiredVariable
union select f.friend_id AS member_id
from Friends f
where f.member_id = MemberDesiredVariable
union select f2.member_id
from Friends f2
where f2.friend_id = MemberDesiredVariable ) PeopleList
join members
on PeopleList.member_id = members.member_id
join member_status ms
on PeopleList.member_id = ms.member_id
This should get the primary person in question regardless of the person having ANY records in the "friends" table, such as a new person with no entries yet... they would at least qualify themselves and join to the members and member_status tables.
Then, in your scenario where member 1 is the criteria, it will query against the friends for any "Friend_IDs", and thus DISTINCT will have the 1 (direct from members) and the 2 where the member_id = 1, finds the Friend_id = 2. So now, this pre-query has two IDs and proceeds to get whatever the rest of your details you want.
The THIRD scenario is you want member 2... So, direct query to the members table guarantees their ID in the list to process, yet since their ID is NOT as a "MEMBER_ID" in the friends table, it has to look for itself as a "FRIEND_ID" from someone else and grab THAT Member's ID. So now, member 2 will also find member 1 and proceed to get details out.
As for member 3, if you queried against the Friends table, you'd get NO records at all, even IF the member 3 had some status records... It must be qualified against itself to be inclusive of the rest for processing... Yet will not find itself as a "member_id" nor "friend_id" in the friends table.
I couldn't actually test this at my current location, but logically it should work without problems.
Finally, if you want the friends names REGARDLESS of having any "status" changes, change the last join to member_status to a LEFT JOIN.
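The pre-query idea can be checked against the numbers in the question (member 2 has 5 statuses, member 1 has 2, member 3 has 1, and members 1 and 2 are friends). The sketch below uses Python's sqlite3 module instead of MySQL, so STRAIGHT_JOIN is left out; the column names user_name and description and the helper `visible_statuses` are assumptions for illustration.

```python
import sqlite3

# Sample data mirroring the scenario described in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (member_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE friends (member_id INTEGER, friend_id INTEGER);
    CREATE TABLE member_status (status_id INTEGER PRIMARY KEY,
                                member_id INTEGER, description TEXT);
    INSERT INTO members VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO friends VALUES (1, 2);
""")
conn.executemany(
    "INSERT INTO member_status (member_id, description) VALUES (?, ?)",
    [(member, f"status {n + 1} of member {member}")
     for member, count in ((2, 5), (1, 2), (3, 1))
     for n in range(count)])


def visible_statuses(conn, member_id):
    """Statuses of the member plus friends, in either friendship direction."""
    return conn.execute("""
        SELECT ms.member_id, ms.description
        FROM (
            SELECT m.member_id FROM members AS m WHERE m.member_id = ?
            UNION
            SELECT f.friend_id FROM friends AS f WHERE f.member_id = ?
            UNION
            SELECT f.member_id FROM friends AS f WHERE f.friend_id = ?
        ) AS people
        JOIN member_status AS ms ON ms.member_id = people.member_id
        ORDER BY ms.status_id
    """, (member_id, member_id, member_id)).fetchall()


for member in (1, 2, 3):
    print(member, len(visible_statuses(conn, member)))
```

Members 1 and 2 each see seven statuses (their own plus the friend's), and member 3 sees only their own single status, which matches the counts the question asks for.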
--- Comment feedback
I can't suggest any books specifically, it just comes from years of experience...
1. UNDERSTAND THE RELATIONSHIP OF YOUR DATA...
2. Find out the inner-most "what do I want to get".
3. Throw all other elements out until you get the CRITERIA, not the CONTENT.
4. Keep your primary "get the criteria" up front... THEN Join in your other tables.
5. Then tack on all the other fields you want in the output result set
Trying to solve a complex query can very often be cluttered by all the OTHER elements of data a person is trying to get. Like so many other programming tasks... I like to make it work, then make it pretty. So too goes with querying. If your baseline query doesn't get the WHAT you want, it doesn't matter how many other tables you are joining together (left, outer, or normal join), your output will be wrong.
I've also added the "STRAIGHT_JOIN" clause at the top of the SQL. It tells MySQL to join the tables in the order I've listed them, instead of letting the optimizer try to think for me. This one clause has helped me frequently when joining a main table (with, say, millions of records) to secondary "lookup" tables: the query engine had wrongly picked the lookup table as the primary table for the query, which killed the performance...
Try to do some timed tests between the versions that work. If they are equally comparable, I would typically go with the one that I could understand in case I had to modify / change something in the future.
-- own records
SELECT member_id, friend_id, user_name, description
FROM
(SELECT M.member_id,
M.member_id friend_id,
M.user_name,
MS.description
FROM members M
LEFT JOIN member_status MS on MS.member_id = M.member_id
UNION ALL
-- friends records
SELECT M.member_id,
F.friend_id,
MF.user_name,
MS.description
FROM members M
JOIN ( SELECT friend_id member_id, member_id friend_id from friends
UNION SELECT member_id, friend_id from friends) F
ON F.member_id = M.member_id
LEFT JOIN member_status MS on MS.member_id = F.friend_id
LEFT JOIN members MF on MF.member_id = F.friend_id) R
WHERE R.member_id = 1
Here is a solution using UNION clauses. If the result of each SELECT is short (let's say fewer than 1000 rows), it is faster than a LEFT JOIN combined with an OR.
If by "friends of each other" you mean that you want:
(a) the statuses of the members the given member has marked as friends
+
(b) the statuses of the members who have marked the given member as a friend
then you should use the three SELECTs combined with UNION ALL below.
If you want only (a), then delete the last UNION ALL and the SELECT that follows it.
-- @id holds the id of the member being queried
SELECT s.status_id
FROM member_status AS s
WHERE (s.member_id=@id)
UNION ALL
SELECT s.status_id
FROM member_status AS s
INNER JOIN friends AS f ON (s.member_id=f.friend_id)
WHERE (f.member_id=@id)
UNION ALL
SELECT s.status_id
FROM member_status AS s
INNER JOIN friends AS f ON (s.member_id=f.member_id)
WHERE (f.friend_id=@id)