LEFT JOIN and WHERE not working as expected in query - MySQL

I have 4 tables: residual_types, containers, collections and collection_container. Each container has a residual_type, and there is a many-to-many relationship between containers and collections (through collection_container).
I need a query that, for a given day, tells me how much mass has been collected for each residual_type, even if there are no records associated with that residual_type. For example, if on a given day the "ORGANIC" residual_type has 850 kg collected, it should show "ORGANIC | 850", but if it had 0 kg collected, it should show "ORGANIC | 0".
This is the query I am using, but it seems to ignore the condition on collections.creation_time and returns all the records:
SELECT residual_types.name AS name, IFNULL(SUM(collection_container.mass),0) AS mass
FROM residual_types
INNER JOIN containers ON containers.residual_type_id = residual_types.id
INNER JOIN collection_container ON collection_container.container_id = containers.id
LEFT JOIN collections ON collection_container.collection_id = collections.id AND collections.creation_time BETWEEN 1557637200 AND 1557723599
GROUP BY residual_types.id
ORDER BY mass DESC
+---------+------+
| name    | mass |
+---------+------+
| organic | 7580 |
| paper   | 1243 |
| plastic |  123 |
+---------+------+
I've also tried this query, but it returns no records.
SELECT residual_types.name AS name, IFNULL(SUM(collection_container.mass),0) AS mass
FROM residual_types
INNER JOIN containers ON containers.residual_type_id = residual_types.id
INNER JOIN collection_container ON collection_container.container_id = containers.id
INNER JOIN collections ON collection_container.collection_id = collections.id
WHERE collections.creation_time BETWEEN 1557637200 AND 1557723599
GROUP BY residual_types.id
ORDER BY mass DESC
If there are no collections associated with a residual_type, the result set should look like this:
+---------+------+
| name    | mass |
+---------+------+
| organic |    0 |
| paper   |    0 |
| plastic |    0 |
+---------+------+

Your problem is that the value you are summing will always be a number, regardless of whether there was a matching collection or not. You need to make the sum conditional on whether there was a collection, which you can do by changing the expression from
IFNULL(SUM(collection_container.mass), 0)
to
SUM(CASE WHEN collections.id IS NOT NULL THEN collection_container.mass ELSE 0 END)
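A minimal sketch to check the effect of this change. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL, and uses a small hypothetical dataset shaped like the tables in the question (the same rows as the example in the next answer): one collection falls outside the day and two fall inside it.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: IFNULL,
# BETWEEN, CASE and LEFT JOIN behave the same way here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE residual_types(id INTEGER, name TEXT);
CREATE TABLE containers(id INTEGER, residual_type_id INTEGER);
CREATE TABLE collection_container(container_id INTEGER, collection_id INTEGER, mass INTEGER);
CREATE TABLE collections(id INTEGER, creation_time INTEGER);
INSERT INTO residual_types VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');
INSERT INTO containers VALUES (1, 1), (2, 1), (3, 1), (4, 1);
INSERT INTO collection_container VALUES (1, 10, 100), (1, 20, 100), (1, 30, 100);
-- collection 10 is outside the day, 20 and 30 are inside it
INSERT INTO collections VALUES (10, 1557637100), (20, 1557637200), (30, 1557723599);
""")

QUERY = """
SELECT residual_types.name AS name, {mass_expr} AS mass
FROM residual_types
INNER JOIN containers ON containers.residual_type_id = residual_types.id
INNER JOIN collection_container ON collection_container.container_id = containers.id
LEFT JOIN collections ON collection_container.collection_id = collections.id
    AND collections.creation_time BETWEEN 1557637200 AND 1557723599
GROUP BY residual_types.id
ORDER BY mass DESC
"""

# Original: sums every collection_container row, whether or not the
# date-filtered LEFT JOIN found a matching collection.
original = conn.execute(QUERY.format(
    mass_expr="IFNULL(SUM(collection_container.mass), 0)")).fetchall()

# Fixed: only adds the mass when a collection inside the date range matched.
fixed = conn.execute(QUERY.format(
    mass_expr="SUM(CASE WHEN collections.id IS NOT NULL "
              "THEN collection_container.mass ELSE 0 END)")).fetchall()

print(original)  # [('aaa', 300)]
print(fixed)     # [('aaa', 200)]
```

On this data the original expression reports 300 for aaa and the conditional one reports 200. Note that bbb and ccc still do not appear: the INNER JOINs drop residual types that have no containers or no collection_container rows.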

I think the problem is in the design: the mass is held in collection_container, whereas I would have expected it in collections. The effect is that the mass is already found before the LEFT JOIN to collections, so a collection_container row keeps its mass even when that join fails on the date.
For example:
drop table if exists residual_types,containers,collection_container,collections;
create table residual_types(id int,name varchar(3));
create table containers(id int,residual_type_id int);
create table collection_container(container_id int,collection_id int,mass int);
create table collections(id int,creation_time bigint);
insert into residual_types values(1,'aaa'),(2,'bbb'),(3,'ccc');
insert into containers values(1,1),(2,1),(3,1),(4,1);
insert into collection_container values(1,10,100),(1,20,100),(1,30,100);
insert into collections values(10,1557637100),(20,1557637200),(30,1557723599);
SELECT residual_types.name AS name, collections.creation_time,
IFNULL(SUM(collection_container.mass),0) AS mass
FROM residual_types
INNER JOIN containers ON containers.residual_type_id = residual_types.id
INNER JOIN collection_container ON collection_container.container_id = containers.id
LEFT JOIN collections ON collection_container.collection_id = collections.id AND collections.creation_time BETWEEN 1557637200 AND 1557723599
GROUP BY residual_types.id,collections.creation_time
ORDER BY mass DESC;
+------+---------------+------+
| name | creation_time | mass |
+------+---------------+------+
| aaa  |          NULL |  100 |
| aaa  |    1557637200 |  100 |
| aaa  |    1557723599 |  100 |
+------+---------------+------+
3 rows in set (0.00 sec)
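Putting the observations from both answers together, one possible version of the full query (a sketch, not tested against MySQL itself) uses LEFT JOINs all the way down, so residual types without containers survive, and only sums mass whose collection passed the date filter. Table names match the question, the rows are the ones from the example above, and SQLite via Python's sqlite3 module stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL
conn.executescript("""
CREATE TABLE residual_types(id INTEGER, name TEXT);
CREATE TABLE containers(id INTEGER, residual_type_id INTEGER);
CREATE TABLE collection_container(container_id INTEGER, collection_id INTEGER, mass INTEGER);
CREATE TABLE collections(id INTEGER, creation_time INTEGER);
INSERT INTO residual_types VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');
INSERT INTO containers VALUES (1, 1), (2, 1), (3, 1), (4, 1);
INSERT INTO collection_container VALUES (1, 10, 100), (1, 20, 100), (1, 30, 100);
INSERT INTO collections VALUES (10, 1557637100), (20, 1557637200), (30, 1557723599);
""")

DAY_START, DAY_END = 1557637200, 1557723599  # the day used in the question

rows = conn.execute("""
SELECT rt.name AS name,
       -- mass only counts when its collection passed the date filter,
       -- COALESCE turns a type with no matching rows at all into 0
       COALESCE(SUM(CASE WHEN c.id IS NOT NULL THEN cc.mass END), 0) AS mass
FROM residual_types AS rt
LEFT JOIN containers AS ct ON ct.residual_type_id = rt.id
LEFT JOIN collection_container AS cc ON cc.container_id = ct.id
LEFT JOIN collections AS c
       ON c.id = cc.collection_id
      AND c.creation_time BETWEEN ? AND ?
GROUP BY rt.id, rt.name
ORDER BY mass DESC, rt.name
""", (DAY_START, DAY_END)).fetchall()

print(rows)  # [('aaa', 200), ('bbb', 0), ('ccc', 0)]
```

Here aaa gets 200 (collections 20 and 30 only), while bbb and ccc come back as 0 instead of disappearing, which is the shape of result the question asks for.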

Related

Merge based on "group by" groups

I have a table called Activities with the columns user_id and activity.
There is one row for each (user, activity) combination.
Here is what it might look like (empty rows added to make it easier to read; please ignore them):
| user_id | activity |
|---------|-----------|
| 1 | swimming | -- We want to match this
| 1 | running | -- person's activities
| | |
| 2 | swimming |
| 2 | running |
| 2 | rowing |
| | |
| 3 | swimming |
| | |
| 4 | skydiving |
| 4 | running |
| 4 | swimming |
I would like to find all other users who have at least the same activities as a given input user, so that I can recommend users with similar activities.
So in the table above, if I want to find recommended users for user_id=1, the query should return user_id=2 and user_id=4 because they engage in both swimming and running (and more), but not user_id=3 because they only engage in swimming.
So a result with a single column:
| user_id |
|---------|
| 2 |
| 4 |
is what I would ideally be looking for
As for what I've tried, I am stuck on how to get the set of user_id=1's activities to match against. Basically, I'm looking for something along the lines of:
SELECT user_id from Activities
GROUP BY user_id
HAVING input_user_activities in user_x_activities
where input_user_activities is just the set of the input user's activities. I can create that set with a WITH input_user_activities AS (...) at the beginning; what I'm stuck on is the user_x_activities part.
Any thoughts?
To get users with the same activities, you can use a self join. Assuming the rows are unique:
select a.user_id
from activities a1 join
activities a
on a1.activity = a.activity and
a1.user_id = @user_id
where a.user_id <> @user_id  -- exclude the input user
group by a.user_id
having count(*) = (select count(*) from activities a1 where a1.user_id = @user_id);
The HAVING clause answers your question: it keeps only the users who have every activity of the given user.
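To sanity-check the self join, here is a small sketch that loads the sample rows from the question into an in-memory SQLite database (standing in for MySQL) and binds the input user as a query parameter instead of a session variable. The helper name users_with_all_activities_of is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL
conn.execute("CREATE TABLE activities(user_id INTEGER, activity TEXT)")
conn.executemany("INSERT INTO activities VALUES (?, ?)", [
    (1, "swimming"), (1, "running"),
    (2, "swimming"), (2, "running"), (2, "rowing"),
    (3, "swimming"),
    (4, "skydiving"), (4, "running"), (4, "swimming"),
])


def users_with_all_activities_of(user_id):
    """Other users whose activities include every activity of user_id."""
    sql = """
    SELECT a.user_id
    FROM activities AS a1
    JOIN activities AS a
      ON a1.activity = a.activity
     AND a1.user_id = :uid
    WHERE a.user_id <> :uid
    GROUP BY a.user_id
    HAVING COUNT(*) = (SELECT COUNT(*) FROM activities WHERE user_id = :uid)
    ORDER BY a.user_id
    """
    return [row[0] for row in conn.execute(sql, {"uid": user_id})]


print(users_with_all_activities_of(1))  # [2, 4]
```

Each candidate user gets one joined row per shared activity, so the count equals the input user's activity count only when every one of those activities is shared (given unique rows, as the answer assumes).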
You can easily get all users ordered by similarity using a JOIN (that finds all common rows) and a GROUP BY (to summarize the similarity per user_id) and finally an ORDER BY to return the most similar users first.
SELECT b.user_id, COUNT(*) similarity
FROM activities a
JOIN activities b
ON a.activity = b.activity
WHERE a.user_id = 1 AND b.user_id != 1
GROUP BY b.user_id
ORDER BY COUNT(*) DESC
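A sketch of the similarity ranking on the sample rows from the question, using SQLite through Python's sqlite3 module as a stand-in for MySQL. It adds b.user_id as a tie-breaker (an assumption, not part of the answer) so that users with equal similarity come back in a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL
conn.execute("CREATE TABLE activities(user_id INTEGER, activity TEXT)")
conn.executemany("INSERT INTO activities VALUES (?, ?)", [
    (1, "swimming"), (1, "running"),
    (2, "swimming"), (2, "running"), (2, "rowing"),
    (3, "swimming"),
    (4, "skydiving"), (4, "running"), (4, "swimming"),
])

target = 1  # the user we want recommendations for
rows = conn.execute("""
SELECT b.user_id, COUNT(*) AS similarity
FROM activities AS a
JOIN activities AS b ON a.activity = b.activity
WHERE a.user_id = ? AND b.user_id != ?
GROUP BY b.user_id
ORDER BY COUNT(*) DESC, b.user_id
""", (target, target)).fetchall()

print(rows)  # [(2, 2), (4, 2), (3, 1)]
```

Unlike the HAVING version, this keeps partial matches such as user 3 and simply ranks them lower.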

Modify MySQL Query or Run in PHP

I'm trying to run a query to find which inventory I should promote and which campaign I should run so I can move that inventory.
I have three tables:
campaigns lists different campaigns that I can run, each campaign has a unique id. Some campaigns promote only one item and some promote multiple items.
inventory has all the items I have in stock and the quantity of those items.
campaign_to_inventory matches the unique campaign id to the inventory item.
campaigns:
name | id
-------------|---
blue-widgets | 1
gluten-free | 2
gadget | 3
inventory:
item | qty
-------|----
thing1 | 0
thing2 | 325
thing3 | 452
thing5 | 123
thing7 | 5
campaign_to_inventory:
id | item
---|-------
1 | thing1
1 | thing2
1 | thing5
2 | thing1
2 | thing3
3 | thing7
I'd like to run a query to find all the campaigns I could run where I have the needed inventory in stock. I'm currently running this query:
SELECT * FROM `campaigns` LEFT JOIN `campaign_to_inventory` ON `campaigns`.`id` = `campaign_to_inventory`.`id` LEFT JOIN `inventory` ON `campaign_to_inventory`.`item` = `inventory`.`item`
Which returns:
name | id | item | qty
-------------|----|--------|----
blue-widgets | 1 | thing1 | 0
blue-widgets | 1 | thing2 | 325
blue-widgets | 1 | thing5 | 123
gluten-free | 2 | thing1 | 0
gluten-free | 2 | thing3 | 452
gadget | 3 | thing7 | 5
Should I use PHP to process this data and keep only the campaigns where all item quantities are greater than a minimum threshold, or is there a way to modify the query to limit the rows? Is there a rule of thumb for when I can/should do it in one and not the other?
There's no need to process the data in PHP.
One way to do this is to first select the campaign_to_inventory.id values of campaigns that contain an item whose quantity is at or below your threshold, like this:
SET @min_qty = 1;
SELECT `c_to_i`.`id` FROM `campaign_to_inventory` AS `c_to_i`
INNER JOIN `inventory` ON `inventory`.`item` = `c_to_i`.`item`
WHERE `inventory`.`qty` <= @min_qty;
...and then LEFT JOIN campaigns to that result and keep only the campaigns with no match, like this:
SET @min_qty = 1;
SELECT `campaigns`.`id`, `campaigns`.`name` FROM `campaigns`
LEFT JOIN (
/* Table of campaigns which contain items with not enough qty */
SELECT `c_to_i`.`id` FROM `campaign_to_inventory` AS `c_to_i`
INNER JOIN `inventory` ON `inventory`.`item` = `c_to_i`.`item`
WHERE `inventory`.`qty` <= @min_qty
) AS `campaigns_with_not_enough_items`
ON `campaigns`.`id` = `campaigns_with_not_enough_items`.`id`
WHERE `campaigns_with_not_enough_items`.`id` IS NULL;
The result should be a table of campaigns which have the needed inventory in stock.
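A sketch of this anti-join on the sample data from the question, assuming SQLite via Python's sqlite3 module as a stand-in for MySQL and passing the threshold as a query parameter rather than a user variable. The helper name runnable_campaigns is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL
conn.executescript("""
CREATE TABLE campaigns(name TEXT, id INTEGER);
CREATE TABLE inventory(item TEXT, qty INTEGER);
CREATE TABLE campaign_to_inventory(id INTEGER, item TEXT);
INSERT INTO campaigns VALUES ('blue-widgets', 1), ('gluten-free', 2), ('gadget', 3);
INSERT INTO inventory VALUES
  ('thing1', 0), ('thing2', 325), ('thing3', 452), ('thing5', 123), ('thing7', 5);
INSERT INTO campaign_to_inventory VALUES
  (1, 'thing1'), (1, 'thing2'), (1, 'thing5'),
  (2, 'thing1'), (2, 'thing3'), (3, 'thing7');
""")


def runnable_campaigns(min_qty):
    """Campaigns none of whose items has qty <= min_qty (an anti-join)."""
    sql = """
    SELECT campaigns.id, campaigns.name
    FROM campaigns
    LEFT JOIN (
        -- campaigns that contain at least one item with not enough qty
        SELECT c_to_i.id
        FROM campaign_to_inventory AS c_to_i
        INNER JOIN inventory ON inventory.item = c_to_i.item
        WHERE inventory.qty <= ?
    ) AS campaigns_with_not_enough_items
      ON campaigns.id = campaigns_with_not_enough_items.id
    WHERE campaigns_with_not_enough_items.id IS NULL
    ORDER BY campaigns.id
    """
    return conn.execute(sql, (min_qty,)).fetchall()


print(runnable_campaigns(1))  # [(3, 'gadget')]
```

With a threshold of 1, only gadget qualifies, because thing1 (qty 0) rules out both blue-widgets and gluten-free. One caveat of this approach: an item that has no row in inventory at all is not caught by the INNER JOIN, so a campaign needing it would still be listed.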
As an aside, you should rename your campaign_to_inventory.id column to campaign since the name id implies that the column is the primary key for the table.

MySQL: query in which all possible values are missing at least once

I have a list of TV shows. Each TV show may be blacked out in 0 or more timezones. To say that a show is "blacked out" in a timezone means that the network does not have rights to air the show in that timezone. This data looks like this:
|----|---------------------|
| ID | Show |
|----|---------------------|
| 1 | Nightly News |
| 2 | Primetime Sitcom |
| 3 | Daytime Talkshow |
| 4 | Nightly News II |
| 5 | Daytime Talkshow II |
| 6 | Nightly News III |
|----|---------------------|
|
|-----join
|
v
|----|----------------------|
| ID | Timezone Restriction |
|----|----------------------|
| 1 | EST |
| 1 | CST |
| 1 | PST |
| 2 | EST |
| 2 | CST |
| 3 | PST |
| 5 | CST |
| 5 | PST |
| 6 | HST |
|----|----------------------|
Not all shows are timezone restricted (most are not). Given this data, I need to fetch a list containing as many results as necessary to supply 2 shows in each timezone that are not blacked out. The results should be ordered by ID, with each timezone seeing the lowest possible unrestricted IDs.
For instance, in the above dataset, this hypothetical query would return rows 1-4, e.g.:
|----|------------------|--------------|
| ID | Show | Restrictions |
|----|------------------|--------------|
| 1 | Nightly News | EST,CST,PST |
| 2 | Primetime Sitcom | EST,CST |
| 3 | Daytime Talkshow | PST |
| 4 | Nightly News II | None |
|----|------------------|--------------|
As you can see, in the above result set, all timezones have at least 2 shows which are unrestricted. A viewer in EST or CST could watch programs 3 and 4. A viewer in PST could view programs 2 and 4. A viewer in MST or HST could view programs 1 and 2.
I can't for the life of me figure out the SQL for this problem (side note: I don't actually need the "restrictions" column in my result; it's just here for explanatory purposes).
Create a table that lists all the timezones. You can then CROSS JOIN this with the show list, to get all potential zones where a show could be shown. Then use a LEFT JOIN with the restrictions table to filter out the rows that match any restrictions, as described in Return row only if value doesn't exist.
SELECT s.show, z.zone
FROM shows AS s
CROSS JOIN timezones AS z
LEFT JOIN restrictions AS r ON r.id = s.id AND r.`Timezone Restriction` = z.zone
WHERE r.id IS NULL
ORDER BY z.zone, s.id
This lists all the shows that can be shown in each timezone, not just the first 2. See Using LIMIT within GROUP BY to get N results per group? for how to restrict the number of results per group.
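One way to cut this down to the first two shows per timezone in a single query is to keep a row only if fewer than two allowed shows in the same zone have a lower ID (a correlated COUNT, swapped in here for the LIMIT-per-group techniques in the linked question). This is a sketch under assumptions: SQLite via Python's sqlite3 module stands in for MySQL, the restriction column is renamed to zone, and the timezones table holds the five zones mentioned in the question. In MySQL the WITH clause needs version 8.0 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL
conn.executescript("""
CREATE TABLE shows(id INTEGER, name TEXT);
CREATE TABLE restrictions(id INTEGER, zone TEXT);
CREATE TABLE timezones(zone TEXT);
INSERT INTO shows VALUES
  (1, 'Nightly News'), (2, 'Primetime Sitcom'), (3, 'Daytime Talkshow'),
  (4, 'Nightly News II'), (5, 'Daytime Talkshow II'), (6, 'Nightly News III');
INSERT INTO restrictions VALUES
  (1, 'EST'), (1, 'CST'), (1, 'PST'), (2, 'EST'), (2, 'CST'),
  (3, 'PST'), (5, 'CST'), (5, 'PST'), (6, 'HST');
INSERT INTO timezones VALUES ('EST'), ('CST'), ('MST'), ('PST'), ('HST');
""")

N = 2  # shows needed per timezone

rows = conn.execute("""
WITH allowed AS (
    -- every (zone, show) pair that is not blacked out
    SELECT z.zone, s.id
    FROM shows AS s
    CROSS JOIN timezones AS z
    LEFT JOIN restrictions AS r ON r.id = s.id AND r.zone = z.zone
    WHERE r.id IS NULL
)
SELECT a.zone, a.id
FROM allowed AS a
-- keep a row only if fewer than N allowed shows in its zone have a lower id
WHERE (SELECT COUNT(*) FROM allowed AS b
       WHERE b.zone = a.zone AND b.id < a.id) < ?
ORDER BY a.zone, a.id
""", (N,)).fetchall()

per_zone = {}
for zone, show_id in rows:
    per_zone.setdefault(zone, []).append(show_id)
needed_ids = sorted({show_id for _, show_id in rows})

print(per_zone)
print(needed_ids)  # [1, 2, 3, 4]
```

The distinct IDs across all zones are 1 to 4, matching the expected result in the question, and per_zone shows which two shows each timezone would get.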
So, having thought about this a bit more, I'm pretty sure what I want to do is 1) look up a list of unrestricted shows for each timezone and 2) UNION them all together. Now that I think of it, this seems like pretty much exactly the use case UNION was created for.
So I can get a single timezone's unrestricted shows like so:
SELECT `shows`.`ID`
FROM shows
-- filter in WHERE: inside a LEFT JOIN's ON clause it would not remove any shows
WHERE `shows`.`ID` NOT IN (
SELECT `restrictions`.`ID`
FROM restrictions
WHERE `Timezone Restriction`='EST'
)
ORDER BY `shows`.`ID`
LIMIT 2
And then just chain them together like so:
(SELECT `shows`.`ID` FROM shows WHERE `shows`.`ID` NOT IN (SELECT `restrictions`.`ID` FROM restrictions WHERE `Timezone Restriction`='EST') ORDER BY `shows`.`ID` LIMIT 2)
UNION
(SELECT `shows`.`ID` FROM shows WHERE `shows`.`ID` NOT IN (SELECT `restrictions`.`ID` FROM restrictions WHERE `Timezone Restriction`='CST') ORDER BY `shows`.`ID` LIMIT 2)
UNION
(SELECT `shows`.`ID` FROM shows WHERE `shows`.`ID` NOT IN (SELECT `restrictions`.`ID` FROM restrictions WHERE `Timezone Restriction`='MST') ORDER BY `shows`.`ID` LIMIT 2)
UNION
(SELECT `shows`.`ID` FROM shows WHERE `shows`.`ID` NOT IN (SELECT `restrictions`.`ID` FROM restrictions WHERE `Timezone Restriction`='PST') ORDER BY `shows`.`ID` LIMIT 2)
UNION
(SELECT `shows`.`ID` FROM shows WHERE `shows`.`ID` NOT IN (SELECT `restrictions`.`ID` FROM restrictions WHERE `Timezone Restriction`='HST') ORDER BY `shows`.`ID` LIMIT 2)
ORDER BY ID;
Building on top of the SQLfiddle @Barmar supplied: http://www.sqlfiddle.com/#!9/25773/1/0
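Rather than writing one branch per timezone by hand, the UNION can be generated from a list of zones. This is a sketch assuming SQLite via Python's sqlite3 module as a stand-in for MySQL, with the restriction column renamed to zone as a hypothetical simplification. Each branch keeps the restriction test in WHERE and takes the two lowest IDs. SQLite does not allow ORDER BY or LIMIT on an individual UNION member, so each branch is wrapped in a derived table; the derived tables get an alias because MySQL requires one.

```python
import sqlite3

ZONES = ["EST", "CST", "MST", "PST", "HST"]

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL
conn.executescript("""
CREATE TABLE shows(id INTEGER, name TEXT);
CREATE TABLE restrictions(id INTEGER, zone TEXT);
INSERT INTO shows VALUES
  (1, 'Nightly News'), (2, 'Primetime Sitcom'), (3, 'Daytime Talkshow'),
  (4, 'Nightly News II'), (5, 'Daytime Talkshow II'), (6, 'Nightly News III');
INSERT INTO restrictions VALUES
  (1, 'EST'), (1, 'CST'), (1, 'PST'), (2, 'EST'), (2, 'CST'),
  (3, 'PST'), (5, 'CST'), (5, 'PST'), (6, 'HST');
""")

# One branch per zone: the two lowest IDs not blacked out in that zone.
BRANCH = """
SELECT id FROM (
    SELECT id FROM shows
    WHERE id NOT IN (SELECT id FROM restrictions WHERE zone = ?)
    ORDER BY id
    LIMIT 2
) AS t
"""

# UNION removes duplicates, the trailing ORDER BY sorts the combined result.
sql = " UNION ".join([BRANCH] * len(ZONES)) + " ORDER BY id"
needed_ids = [row[0] for row in conn.execute(sql, ZONES)]

print(needed_ids)  # [1, 2, 3, 4]
```

Generating the branches keeps the zone list in one place, so adding a timezone does not mean copying another line of SQL.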

MySQL Join two tables with condition

Based on these two tables:
products
| ID | Active | Name | No
--------------------------------------------------
| 1 | 1 | Shirt | 100
| 2 | 0 | Pullover | 200
variants
| MasterID | Active | Name | No
--------------------------------------------------
| 1 | 1 | Red | 101
| 1 | 0 | Yellow | 102
I want to get every product that is active, together with its active variants, in one SQL query.
The relation between the tables is variants.MasterID -> products.ID.
Needed result:
ID (master) | Name | No
--------------------------------------------------
1 | Shirt | 100
1 | Red | 101
I tried using UNION, but then I am not able to get the corresponding MasterIDs.
It looks like you just need a simple join:
select *
from products
left join variants
on products.ID = variants.MasterID
where products.Active = 1
and variants.Active = 1
Update after requirements were made clearer:
select ID, Name, No, 'products' as RowType
from products
where Active = 1
union
select variants.MasterID as ID, variants.Name, variants.No, 'variants' as RowType
from products
join variants
on products.ID = variants.MasterID
where products.Active = 1
and variants.Active = 1
order by ID, RowType, No
I've assumed you want the results ordered by ID, with products followed by variants. The No column may order it this way implicitly (it's impossible to know without real data), in which case the RowType column can be removed. The order by clause might need to be altered to match your specific RDBMS.
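A quick check of the UNION query against the sample rows from the question, assuming SQLite via Python's sqlite3 module as a stand-in for MySQL. The No column is double-quoted as a precaution so SQLite reads it as an identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL
conn.executescript("""
CREATE TABLE products(ID INTEGER, Active INTEGER, Name TEXT, "No" INTEGER);
CREATE TABLE variants(MasterID INTEGER, Active INTEGER, Name TEXT, "No" INTEGER);
INSERT INTO products VALUES (1, 1, 'Shirt', 100), (2, 0, 'Pullover', 200);
INSERT INTO variants VALUES (1, 1, 'Red', 101), (1, 0, 'Yellow', 102);
""")

rows = conn.execute("""
-- active products
SELECT ID, Name, "No", 'products' AS RowType
FROM products
WHERE Active = 1
UNION
-- active variants of active products, labelled with their master ID
SELECT variants.MasterID AS ID, variants.Name, variants."No", 'variants' AS RowType
FROM products
JOIN variants ON products.ID = variants.MasterID
WHERE products.Active = 1
  AND variants.Active = 1
ORDER BY ID, RowType, "No"
""").fetchall()

for row in rows:
    print(row)
# (1, 'Shirt', 100, 'products')
# (1, 'Red', 101, 'variants')
```

The inactive Pullover and the inactive Yellow variant are both excluded, and RowType puts each product ahead of its variants.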
This should give you the expected result:
select * from products left join variants on products.id = variants.masterId
where products.active=1 and variants.active=1
If not please add the expected result to your question.

Joining 3 tables with n:m relationship, want to see nonmatching rows

For this problem, consider the following 3 tables:
Event
id (pk)
title
Event_Category
event_id (pk, fk)
category_id (pk, fk)
Category
id (pk)
description
Pretty trivial, I guess... :) Each event can fall into zero or more categories; in total there are 4 categories.
In my application, I want to view and edit the categories for a specific event. Graphically, the event will be shown together with ALL categories and a checkbox indicating whether the event falls into each category. Changing and saving the choice will modify the intermediate table Event_Category.
But first: how to select this for a specific event? The query I need will in fact always return 4 rows, the number of categories present.
The following returns only the entries for the categories that the event with id=11 falls into. Experimenting with outer joins did not give more rows in the result.
SELECT e.id, c.omschrijving
FROM Event e
INNER JOIN Event_Categorie ec ON e.id = ec.event_id
INNER JOIN Categorie c ON c.id = ec.categorie_id
WHERE e.id = 11
Or should I start with the Category table in the query? Hoping for some hints :)
TIA, Klaas
UPDATE:
Yes, I did, but I still have not found the answer. I have simplified the issue by omitting the Event table from the query, because that table is only used to show the event descriptions.
SELECT * from Categorie c LEFT JOIN Event_Categorie ec ON c.id = ec.categorie_id WHERE ec.event_id = 11;
The simplified 2-table query only uses the lookup table and the link table but still returns only 2 rows instead of the total of 4 rows in the Categorie table.
My guess is that the WHERE clause is applied after the join, so the rows not joined to the link table are excluded. In my application I solved the issue by using a subquery, but I would still like to know what the best solution is.
What you want is the list of all categories, plus information about whether that category is in the list of categories of your event.
So, you can do:
SELECT
*
FROM
Category
LEFT JOIN Event_Category ON category_id = id
WHERE
event_id = 11
and the event_id column will be NULL for the categories that are not part of your event.
You can also create a column (named has_category below) that you will use to see if the event has this category instead of comparing with NULL:
SELECT
*,
event_id IS NOT NULL AS has_category
FROM
Category
LEFT JOIN Event_Category ON category_id = id
WHERE
event_id = 11
EDIT: This seems to be exactly what you say you are doing in your edit. I tested it and it seems correct. Are you sure you are running this query, and that rows with NULL are not somehow ignored?
The query
SELECT * FROM Categorie;
returns 4 rows:
+----+--------------+-------------------------------------+--------------------------------------+
| id | omschrijving | afbeelding | afbeelding_klein |
+----+--------------+-------------------------------------+--------------------------------------+
| 1 | Creatief | images/categorieen/creatief420k.jpg | images/categorieen/creatief190k.jpg |
| 2 | Sportief | images/categorieen/sportief420k.jpg | images/categorieen/sportief190kr.jpg |
| 4 | Culinair | images/categorieen/culinair420k.jpg | images/categorieen/culinair190k.jpg |
| 5 | Spirit | images/categorieen/spirit420k.jpg | images/categorieen/spirit190k.jpg |
+----+--------------+-------------------------------------+--------------------------------------+
4 rows in set (0.00 sec)
BUT:
The query
SELECT *
FROM Categorie
LEFT JOIN Event_Categorie ON categorie_id = id
WHERE event_id = 11;
returns 2 rows:
+----+--------------+-------------------------------------+-------------------------------------+----------+--------------+
| id | omschrijving | afbeelding | afbeelding_klein | event_id | categorie_id |
+----+--------------+-------------------------------------+-------------------------------------+----------+--------------+
| 1 | Creatief | images/categorieen/creatief420k.jpg | images/categorieen/creatief190k.jpg | 11 | 1 |
| 4 | Culinair | images/categorieen/culinair420k.jpg | images/categorieen/culinair190k.jpg | 11 | 4 |
+----+--------------+-------------------------------------+-------------------------------------+----------+--------------+
2 rows in set (0.00 sec)
So I still need the subquery... The LEFT JOIN does not show all rows of the Categorie table regardless of whether there is a match in the link table.
This query, however, does what I want it to do:
SELECT *
FROM Categorie c
LEFT JOIN (SELECT * FROM Event_Categorie ec WHERE ec.event_id = 11 ) AS subselect ON subselect.categorie_id = c.id;
Result:
+----+--------------+-------------------------------------+--------------------------------------+----------+--------------+
| id | omschrijving | afbeelding | afbeelding_klein | event_id | categorie_id |
+----+--------------+-------------------------------------+--------------------------------------+----------+--------------+
| 1 | Creatief | images/categorieen/creatief420k.jpg | images/categorieen/creatief190k.jpg | 11 | 1 |
| 2 | Sportief | images/categorieen/sportief420k.jpg | images/categorieen/sportief190kr.jpg | NULL | NULL |
| 4 | Culinair | images/categorieen/culinair420k.jpg | images/categorieen/culinair190k.jpg | 11 | 4 |
| 5 | Spirit | images/categorieen/spirit420k.jpg | images/categorieen/spirit190k.jpg | NULL | NULL |
+----+--------------+-------------------------------------+--------------------------------------+----------+--------------+
4 rows in set (0.00 sec)
The issue is that you have filtered the results by the event_id. As you can see in your results, two of the categories (Sportief and Spirit) do not have events. So the correct SQL statement (using SQL Server syntax; some translation may be required) is:
SELECT *
FROM Categorie
LEFT JOIN Event_Categorie ON categorie_id = id
WHERE (event_id IS NULL) OR (event_id = 11);
Finally I found the right query; no subselect is necessary. The WHERE clause is applied after the join and is therefore no longer part of the join. The solution is to extend the ON clause with an extra condition. Now all 4 rows are returned, with NULL for the non-matching categories!
SELECT *
FROM Categorie
LEFT JOIN Event_Categorie ON categorie_id = id AND event_id = 11;
So the bottom line is that, for a LEFT JOIN, putting an extra condition in the ON clause has a different effect than filtering out rows with the same condition in the WHERE clause!
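The difference can be reproduced with a small sketch: SQLite via Python's sqlite3 module stands in for MySQL, only the id and omschrijving columns are kept, and a hypothetical event 12 linked to Sportief is added. It compares the filter in WHERE, the filter in ON, and the IS NULL OR event_id = 11 variant from the earlier answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL
conn.executescript("""
CREATE TABLE Categorie(id INTEGER, omschrijving TEXT);
CREATE TABLE Event_Categorie(event_id INTEGER, categorie_id INTEGER);
INSERT INTO Categorie VALUES (1, 'Creatief'), (2, 'Sportief'), (4, 'Culinair'), (5, 'Spirit');
-- event 11 is in categories 1 and 4, hypothetical event 12 is in category 2
INSERT INTO Event_Categorie VALUES (11, 1), (11, 4), (12, 2);
""")

BASE = """
SELECT c.id, ec.event_id
FROM Categorie AS c
LEFT JOIN Event_Categorie AS ec ON ec.categorie_id = c.id {extra_on}
{where}
ORDER BY c.id
"""


def run(extra_on="", where=""):
    return conn.execute(BASE.format(extra_on=extra_on, where=where)).fetchall()


# Condition in WHERE: evaluated after the join, so rows whose event_id
# is NULL (no link) or belongs to another event are thrown away.
in_where = run(where="WHERE ec.event_id = 11")

# Condition in ON: part of the join itself, so every category survives
# and the non-matching ones get NULLs.
in_on = run(extra_on="AND ec.event_id = 11")

# "IS NULL OR = 11" in WHERE: keeps unlinked categories, but drops a
# category whose only link is to a different event.
null_or_11 = run(where="WHERE ec.event_id IS NULL OR ec.event_id = 11")

print(in_where)    # [(1, 11), (4, 11)]
print(in_on)       # [(1, 11), (2, None), (4, 11), (5, None)]
print(null_or_11)  # [(1, 11), (4, 11), (5, None)]
```

With the extra event, the IS NULL OR variant loses Sportief, since its only joined row belongs to event 12 and is neither NULL nor 11; the ON version keeps all four categories.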