Joining 3 tables with n:m relationship, want to see nonmatching rows

Joining 3 tables with n:m relationship, want to see nonmatching rows - mysql

For this problem, consider the following 3 tables:
Event
id (pk)
title
Event_Category
event_id (pk, fk)
category_id (pk, fk)
Category
id (pk)
description
Pretty trivial I guess... :) Each event can fall into zero or more categories, in total there are 4 categories.
In my application, I want to view and edit the categories for a specific event. Graphically, the event will be shown together with ALL categories and a checkbox indicating whether the event falls into the category. Changing and saving the choice will result in modifocation of the intermediate table Event_Category.
But first: how to select this for a specific event? The query I need will in fact always return 4 rows, the number of categories present.
Following returns only the entries for the categories the event with id=11 falls into. Experimenting with outer joins did not give more rows in the result.
SELECT e.id, c.omschrijving
FROM Event e
INNER JOIN Event_Categorie ec ON e.id = ec.event_id
INNER JOIN Categorie c ON c.id = ec.categorie_id
WHERE e.id = 11
Or should I start with the Category table in the query? Hope for some hints :)
TIA, Klaas
UPDATE:
Yes I did but still have not found the answer. But I have simplified the issue by omitting the Event table from the query because this table is only used to view the Event descriptions.
SELECT * from Categorie c LEFT JOIN Event_Categorie ec ON c.id = ec.categorie_id WHERE ec.event_id = 11;
The simplified 2-table query only uses the lookup table and the link table but still returns only 2 rows instead of the total of 4 rows in the Categorie table.
My guess would be that the WHERE clause is applied after the joining, so the rows not joined to the link table are excluded. In my application I solved the issues by using a subquery but I still would like to know what is the best solution.

What you want is the list of all categories, plus information about whether that category is in the list of categories of your event.
So, you can do:
SELECT
*
FROM
Category
LEFT JOIN Event_Category ON category_id = id
WHERE
event_id = 11
and event_id column will be NULL on the categories that are not part of your event.
You can also create a column (named has_category below) that you will use to see if the event has this category instead of comparing with NULL:
SELECT
*,
event_id IS NOT NULL AS has_category
FROM
Category
LEFT JOIN Event_Category ON category_id = id
WHERE
event_id = 11
EDIT: This seems exactly what you say you are doing on your edit. I tested it and it seems correct. Are you sure you are running this query, and that rows with NULL are not somehow ignored?

The query
SELECT * FROM Categorie;
returns 4 rows:
+----+--------------+-------------------------------------+--------------------------------------+
| id | omschrijving | afbeelding | afbeelding_klein |
+----+--------------+-------------------------------------+--------------------------------------+
| 1 | Creatief | images/categorieen/creatief420k.jpg | images/categorieen/creatief190k.jpg |
| 2 | Sportief | images/categorieen/sportief420k.jpg | images/categorieen/sportief190kr.jpg |
| 4 | Culinair | images/categorieen/culinair420k.jpg | images/categorieen/culinair190k.jpg |
| 5 | Spirit | images/categorieen/spirit420k.jpg | images/categorieen/spirit190k.jpg |
+----+--------------+-------------------------------------+--------------------------------------+
4 rows in set (0.00 sec)
BUT:
The query
SELECT *
FROM Categorie
LEFT JOIN Event_Categorie ON categorie_id = id
WHERE event_id = 11;
returns 2 rows:
+----+--------------+-------------------------------------+-------------------------------------+----------+--------------+
| id | omschrijving | afbeelding | afbeelding_klein | event_id | categorie_id |
+----+--------------+-------------------------------------+-------------------------------------+----------+--------------+
| 1 | Creatief | images/categorieen/creatief420k.jpg | images/categorieen/creatief190k.jpg | 11 | 1 |
| 4 | Culinair | images/categorieen/culinair420k.jpg | images/categorieen/culinair190k.jpg | 11 | 4 |
+----+--------------+-------------------------------------+-------------------------------------+----------+--------------+
2 rows in set (0.00 sec)
So I still need the subquery... and the LEFT JOIN is not effective in showing all rows of the CAtegorie table, regardless whether there is a match with the link table.
This query, however, does what I want it to do:
SELECT *
FROM Categorie c
LEFT JOIN (SELECT * FROM Event_Categorie ec WHERE ec.event_id = 11 ) AS subselect ON subselect.categorie_id = c.id;
Result:
+----+--------------+-------------------------------------+--------------------------------------+----------+--------------+
| id | omschrijving | afbeelding | afbeelding_klein | event_id | categorie_id |
+----+--------------+-------------------------------------+--------------------------------------+----------+--------------+
| 1 | Creatief | images/categorieen/creatief420k.jpg | images/categorieen/creatief190k.jpg | 11 | 1 |
| 2 | Sportief | images/categorieen/sportief420k.jpg | images/categorieen/sportief190kr.jpg | NULL | NULL |
| 4 | Culinair | images/categorieen/culinair420k.jpg | images/categorieen/culinair190k.jpg | 11 | 4 |
| 5 | Spirit | images/categorieen/spirit420k.jpg | images/categorieen/spirit190k.jpg | NULL | NULL |
+----+--------------+-------------------------------------+--------------------------------------+----------+--------------+
4 rows in set (0.00 sec)

The issue is that you have filtered the results by the eventid. As you can see in your results, two of the categories (Sportief and Spirit) do not have events. So the correct SQL statement (using SQL Server syntax; some translation may be required) is:
SELECT *
FROM Categorie
LEFT JOIN Event_Categorie ON categorie_id = id
WHERE (event_id IS NULL) OR (event_id = 11);

Finally I found the right query, no subselect is necessary. But the WHERE clause works after the joining and therefore is no part of the join anymore. THe solution is extending the ON clause with an extra condition. Now all 4 rows are returned with NULL for the non-matching Categories!
SELECT *
FROM Categorie
LEFT JOIN Event_Categorie ON categorie_id = id AND event_id = 11;
So the bottom line is that putting an extra condition in the ON clause has different effect than filtering out rows by the same condition in the WHERE clause!

Related

Sql left outer join with three tables

I am developing basically an e-commerce application. Application has two pages (all product and my-basket) authenticated user can add product to own basket. and I have three tables, the tables contains following data. I want to if the user adds product to own basket, these products don't exist on this user's all product page.
How should be the SQL query? I am looking query for all product page. so query's return type must be Product.
If user added any products to own basket on all product page these products
shouldn't see on the all product page for this user.
PRODUCT TABLE
+-------+--------+
| id | name |
+-------+--------+
| 1 | p1 |
| 2 | p2 |
+-------+--------+
USER TABLE
+-------+--------+
| id | name |
+-------+--------+
| 3 | U1 |
| 4 | U2 |
+-------+--------+
BASKET TABLE
+-------+---------+-------------+
| id | fk_user | fk_product |
+-------+---------+-------------+
| 5 | 3 | 1 |
| 6 | 4 | 2 |
+-------+---------+-------------+
So if authenticated user's id is 3. The user should see p2 product on own all product page.

try this:
SELECT product.name
FROM product
LEFT JOIN basket ON basket.fk_product = product.id
WHERE (basket.fk_user != 3 OR basket.fk_user IS NULL)
Check my demo query
If you want you can also join the user table but with the data you gave me is not necessary.
A left join keeps all rows in the first (product) table plus all rows in the second (basket) table, when the on clause evaluates to true.
When the on clause evaluates to false or NULL, the left join still keeps all rows in the first table with NULL values for the second table.

or, more commonly...
SELECT p.name
FROM product p
LEFT JOIN basket b
on b.fk_product = p.id
AND b.fk_user = 3
WHERE b.fk_user is null

What you are describing sounds like NOT EXISTS:
SELECT p.name
FROM product p
WHERE NOT EXISTS (SELECT 1
FROM basket b
WHERE b.fk_product = f.id AND
b.fk_user = 3
);
This seems like the most direct interpretation of your question.

Can I be selective on what rows I join on in MySQL

Suppose I have two tables, people and emails. emails has a person_id, an address, and an is_primary:
people:
id
emails:
person_id
address
is_primary
To get all email addresses per person, I can do a simple join:
select * from people join emails on people.id = emails.person_id
What if I only want (at most) one row from the right table for each row in the left table? And, if a particular person has multiple emails and one is marked as is_primary, is there a way to prefer which row to use when joining?
So, if I have
people: emails:
------ -----------------------------------------
| id | | id | person_id | address | is_primary |
------ -----------------------------------------
| 1 | | 1 | 1 | a#b.c | true |
| 2 | | 2 | 1 | b#b.c | false |
| 3 | | 3 | 2 | c#b.c | true |
| 4 | | 4 | 4 | d#b.c | false |
------ -----------------------------------------
is there a way to get this result:
------------------------------------------------
| people.id | emails.id | address | is_primary |
------------------------------------------------
| 1 | 1 | a#b.c | true |
| 2 | 3 | c#b.c | true | // chosen over b#b.c because it's primary
| 3 | null | null | null | // no email for person 3
| 4 | 4 | d#b.c | false | // no primary email for person 4
------------------------------------------------

You got it a bit wrong, how left/right joins work.
This join
select * from people join emails on people.id = emails.person_id
will get you every column from both tables for all records that match your ON condition.
The left join
select * from people left join emails on people.id = emails.person_id
will give you every record from people, regardless if there's a corresponding record in emails or not. When there's not, the columns from the emails table will just be NULL.
If a person has multiple emails, multiple records will be in the result for this person. Beginners often wonder then, why the data has duplicated.
If you want to restrict the data to the rows where is_primary has the value 1, you can do so in the WHERE clause when you're doing an inner join (your first query, although you ommitted the inner keyword).
When you have a left/right join query, you have to put this filter in the ON clause. If you would put it in the WHERE clause, you would turn the left/right join into an inner join implicitly, because the WHERE clause would filter the NULL rows that I mentioned above. Or you could write the query like this:
select * from people left join emails on people.id = emails.person_id
where (emails.is_primary = 1 or emails.is_primary is null)
EDIT after clarification:
Paul Spiegel's answer is good, therefore my upvote, but I'm not sure if it performs well, since it has a dependent subquery. So I created this query. It may depend on your data though. Try both answers.
select
p.*,
coalesce(e1.address, e2.address) AS address
from people p
left join emails e1 on p.id = e1.person_id and e1.is_primary = 1
left join (
select person_id, address
from emails e
where id = (select min(id) from emails where emails.is_primary = 0 and emails.person_id = e.person_id)
) e2 on p.id = e2.person_id

Use a correlated subquery with LIMIT 1 in the ON clause of the LEFT JOIN:
select *
from people p
left join emails e
on e.person_id = p.id
and e.id = (
select e1.id
from emails e1
where e1.person_id = e.person_id
order by e1.is_primary desc, -- true first
e1.id -- If e1.is_primary is ambiguous
limit 1
)
order by p.id
sqlfiddle

Optimization SQL for getting data from two joined tables (usernames for user-from-id and user-to-id msgs from two tables)

I have table "msgs" with messages between users (their ids):
+--------+-------------+------------+---------+---------+
| msg_id |user_from_id | user_to_id | message | room_id |
+--------+-------------+------------+---------+---------+
| 1 | 1 | 4 |Hello! | 2 |
| 2 | 1 | 5 |Hi there | 1 |
| 3 | 2 | 1 |CU soon | 2 |
| 4 | 3 | 7 |nice... | 1 |
+--------+-------------+------------+---------+---------+
I also have two tables with users names.
Table: user1
+--------+----------+
|user_id |user_name |
+--------+----------+
| 5 | Ann |
| 6 | Sam |
| 7 | Michael |
+--------+----------+
Table: user2
+--------+----------+
|user_id |user_name |
+--------+----------+
| 1 | John |
| 2 | Alice |
| 3 | Tom |
| 4 | Jane |
+--------+----------+
I need to get usernames for two users IDs in every row. Every user-id can be in first or second table with usernames.
I wrote this SQL query:
SELECT DISTINCT
m.msg_id,
m.user_from_id,
CASE WHEN c1.user_name IS NULL THEN c3.user_name ELSE c1.user_name END AS from_name,
m.user_to_id,
CASE WHEN c2.user_name IS NULL THEN c4.user_name ELSE c2.user_name END AS to_name,
m.message
FROM msgs m
LEFT JOIN users1 c1 ON c1.user_id=m.user_from_id
LEFT JOIN users1 c2 ON c2.user_id=m.user_to_id
LEFT JOIN users2 c3 ON c3.user_id=m.user_from_id
LEFT JOIN users2 c4 ON c4.user_id=m.user_to_id
WHERE m.room_id=1
LIMIT 0, 8
It works.
Execute query to get raw data without usernames (without any join) tooks about ~0.1 sec. But it's enough to join only one usernames table (user1 or user2 only) to get this data in about ~6.2 sec. (with join one table). I have quite a lot rows in this tables: 35K rows in msgs, 0.5K in user1, 25K in user2.
Executing query with join two tables (with all this data) is impossible.
How to optimize this query? I just need usernames for user_ids in first "msgs" table.

There are potentially many differences between the queries with and without the joins. I am going to assume that the ids have the appropriate indexes -- primary keys automatically do. If not, then check that.
The obvious solution is to use the original query as a subquery:
SELECT m.msg_id, m.user_from_id,
(CASE WHEN c1.user_name IS NULL THEN c3.user_name ELSE c1.user_name
END) AS from_name,
m.user_to_id,
(CASE WHEN c2.user_name IS NULL THEN c4.user_name ELSE c2.user_name
END) AS to_name,
m.message
FROM (SELECT m.*
FROM msgs m
WHERE m.room_id = 1
LIMIT 0, 8
) m LEFT JOIN
users1 c1
ON c1.user_id = m.user_from_id LEFT JOIN
users1 c2
ON c2.user_id = m.user_to_id LEFT JOIN
users2 c3
ON c3.user_id = m.user_from_id LEFT JOIN
users2 c4
ON c4.user_id = m.user_to_id;
For most data structures, the distinct is also unnecessary.
This also makes (the reasonable assumption) that user_id is unique in the users tables.
Also, use of LIMIT without ORDER BY is highly discouraged. The particular rows you get are indeterminate and might change from one execution to the next.

Join two tables using multiple rows in the join

I have two tables
Table: color_document
+----------+---------------------+
| color_id | document_id |
+----------+---------------------+
| 180907 | 4270851 |
| 180954 | 4270851 |
+----------+---------------------+
Table: color_group
+----------------+-----------+
| color_group_id | color_id |
+----------------+-----------+
| 3 | 180954 |
| 4 | 180907 |
| 11 | 180907 |
| 11 | 180984 |
| 12 | 180907 |
| 12 | 180954 |
+----------------+-----------+
Is it possible for a query to get a result that looks something like this using multiple color id's to join the two tables?
Result
+----------------+--------------+
| color_group_id | document_id |
+----------------+--------------+
| 12 | 4270851 |
+----------------+--------------+
Since Color Group 12 is the only group that has the exact same set of Colors that Document 4270851 has.
I've got some bad data that i'm being forced to work with so I've had to manufacture the color groups by finding each unique set of color_id's associated with document_id's. I'm trying to then create a new relationship directly between my manufactured color groups and documents.
I know I could probably do something with a GROUP_CONCAT to make a pseudo key of concatenated color ids, but I'm trying to find a solution that would also work in, say, Oracle. Am I barking up the completely wrong tree with this logic?
My ultimate goal is to be able to have a single row in a table that would represent any number of Colors that are associated with a Document to be exported to a completely different system than the one I'm working with.
Any thoughts/comments/suggestions are greatly appreciated.
Thank you in advance for looking at my question.

Do a normal join of the two tables, and count the number of rows in each pairing. Then test whether this is the same as the number of times each of the items appears in the original tables. If all are the same, then all color IDs must match.
SELECT a.color_group_id, a.document_id
FROM (
SELECT color_group_id, document_id, COUNT(*) ct
FROM color_document d
JOIN color_group g ON d.color_id = g.color_id
GROUP BY color_group_id, document_id) a
JOIN (
SELECT color_group_id, COUNT(*) ct
FROM color_group
GROUP BY color_group_id) b
ON a.color_group_id = b.color_group_id and a.ct = b.ct
JOIN (
SELECT document_id, COUNT(*) ct
FROM color_document
GROUP BY document_id) c
ON a.document_id = c.document_id and a.ct = c.ct
SQLFIDDLE

If i understand your question correct you just have to join the two tables and then group the results by color_group_id an document_id.
SQL Fiddle
select color_group_id, document_id
from
color_document cd join
color_group cg
on cd.color_id = cg.color_id
group by color_group_id, document_id
That query will give you this result set:
COLOR_GROUP_ID DOCUMENT_ID
3 4270851
4 4270851
11 4270851
12 4270851
Is that what you want?

Distinct Help

Alright so I have a table, in this table are two columns with ID's. I want to make one of the columns distinct, and once it is distinct to select all of those from the second column of a certain ID.
Originally I tried:
select distinct inp_kll_id from kb3_inv_plt where inp_plt_id = 581;
However this does the where clause first, and then returns distinct values.
Alternatively:
select * from (select distinct(inp_kll_id) from kb3_inv_plt) as inp_kll_id where inp_plt_id = 581;
However this cannot find the column inp_plt_id because distinct only returns the column, not the whole table.
Any suggestions?
Edit:
Each kll_id may have one or more plt_id. I would like unique kll_id's for a certain kb3_inv_plt id.
| inp_kll_id | inp_plt_id |
| 1941 | 41383 |
| 1942 | 41276 |
| 1942 | 38005 |
| 1942 | 39052 |
| 1942 | 40611 |
| 1943 | 5868 |
| 1943 | 4914 |
| 1943 | 39511 |
| 1944 | 39511 |
| 1944 | 41276 |
| 1944 | 40593 |
| 1944 | 26555 |

If you do mean, by "make distinct", "pick only inp_kll_ids that happen just once" (not the SQL semantics for Distinct), this should work:
select inp_kll_id
from kb3_inv_plt
group by inp_kll_id
having count(*)=1 and inp_plt_id = 581;

Get all the distinct first (alias 'a' in my following example) and then join it back to the table with the specified criteria (alias 'b' in my following example).
SELECT *
FROM (
SELECT
DISTINCT inp_kll_id
FROM kb3_inv_plt
) a
LEFT JOIN kb3_inv_plt b
ON a.inp_kll_id = b.inp_kll_id
WHERE b.inp_plt_id = 581

in this table are two columns with
ID's. I want to make one of the
columns distinct, and once it is
distinct to select all of those from
the second column of a certain ID.
SELECT distinct tableX.ID2
FROM tableX
WHERE tableX.ID1 = 581
I think your understanding of distinct may be different from how it works. This will indeed apply the where clause first, and then get a distinct list of unique entries of tableX.ID2, which is exactly what you ask for in the first part of your question.
By making a row distinct, you're ensuring no other rows are exactly the same. You aren't making a column distinct. Let's say your table has this data:
ID1 ID2
10 4
10 3
10 7
4 6
When you select distinct ID1,ID2 - you get the same as select * because the rows are already distinct.
Can you add information to clear up what you are trying to do?

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008