I have two tables. The first is full of books each with a book_id. The second table is a book_id to keyword_id relationship table.
SELECT b.* FROM books_table b
INNER JOIN keywords_table k
ON b.book_id = k.book_id AND k.keyword_id NOT IN(1,2,3)
WHERE b.is_hardcover = 1
GROUP BY b.book_id
Desired Outcome
No books with the keyword_id 1, 2, or 3 attached to any of the books.
Actual Outcome
Books can have the keywords 1, 2, or 3 so long as they have additional keyword_ids attached to them that are not in the exclusion list.
What I've tried
The above query is the closest I have come to achieving it, but it fails in this one regard.
How can I achieve the desired outcome and in the most optimized way?
You can do so with conditional sums in a HAVING clause:
SELECT b.*
FROM books_table b
INNER JOIN keywords_table k
ON b.book_id = k.book_id
WHERE b.is_hardcover = 1
GROUP BY b.book_id
HAVING SUM(k.keyword_id = 1) =0
AND SUM(k.keyword_id = 2) =0
AND SUM(k.keyword_id = 3) =0
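One caveat with this query: because of the INNER JOIN, a book with no rows at all in keywords_table never reaches the HAVING clause. Below is a minimal sketch of that behaviour, run through Python's sqlite3 module rather than MySQL, with made-up sample rows. The LEFT JOIN plus COALESCE variant is an illustration only and is not part of the answer above.

```python
import sqlite3

# Hypothetical sample data: book 1 has keywords 1 and 7, book 2 has keyword 8,
# book 3 has no keywords at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books_table (book_id INTEGER PRIMARY KEY, is_hardcover INTEGER);
    CREATE TABLE keywords_table (book_id INTEGER, keyword_id INTEGER);
    INSERT INTO books_table VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO keywords_table VALUES (1, 1), (1, 7), (2, 8);
""")

# The query from the answer: the inner join drops book 3 before grouping.
inner_join_sql = """
    SELECT b.book_id
    FROM books_table b
    INNER JOIN keywords_table k ON b.book_id = k.book_id
    WHERE b.is_hardcover = 1
    GROUP BY b.book_id
    HAVING SUM(k.keyword_id = 1) = 0
       AND SUM(k.keyword_id = 2) = 0
       AND SUM(k.keyword_id = 3) = 0
    ORDER BY b.book_id
"""

# Illustrative variant: keep keyword-less books with a LEFT JOIN. Their SUM is
# NULL, so COALESCE turns it into 0 before the comparison.
left_join_sql = """
    SELECT b.book_id
    FROM books_table b
    LEFT JOIN keywords_table k ON b.book_id = k.book_id
    WHERE b.is_hardcover = 1
    GROUP BY b.book_id
    HAVING COALESCE(SUM(k.keyword_id IN (1, 2, 3)), 0) = 0
    ORDER BY b.book_id
"""

inner_join_ids = [row[0] for row in conn.execute(inner_join_sql)]
left_join_ids = [row[0] for row in conn.execute(left_join_sql)]
print(inner_join_ids)  # [2]
print(left_join_ids)   # [2, 3]
```

Whether keyword-less books should appear is a requirements question; the sketch only shows that the two joins answer it differently.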
As you noted, this query will produce any book that has at least one keyword that isn't 1, 2 or 3, which isn't what you want. Instead, you'd want to explicitly exclude books with these keywords. A join isn't really the right tool for the job here. Instead, you could use the exists operator:
SELECT b.*
FROM books_table b
WHERE b.is_hardcover = 1 AND
NOT EXISTS (SELECT *
FROM keywords_table k
WHERE b.book_id = k.book_id AND
k.keyword_id IN (1,2,3))
What you are asking for is a flavor of "anti join". There are several ways to accomplish it; here's one:
SELECT b.* FROM books_table b
LEFT JOIN keywords_table k
ON b.book_id = k.book_id AND k.keyword_id IN (1,2,3)
WHERE k.book_id IS NULL AND b.is_hardcover = 1
The left join matches up each row from the left table (books_table) with those rows of the right table that satisfy the condition b.book_id = k.book_id AND k.keyword_id IN (1,2,3), and includes a single result row for each row of the left table that doesn't match any row of the right table. The filter condition k.book_id IS NULL conflicts with the join condition, so it can be satisfied only by those rows arising from a left row not matching any right row.
Note that the assignment of conditions to the join predicate and the filter predicate is critical with an outer join such as this one. Note also that there is no need for a GROUP BY clause in this case unless books_table may contain duplicate book_ids.
This approach is likely to perform better in practice than one based on a correlated subquery in the WHERE clause. If performance is important, however, then you would be well advised to test the alternatives you are considering.
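To check that the NOT EXISTS form and the LEFT JOIN ... IS NULL form return the same books, here is a small sketch using Python's sqlite3 module. The sample rows are invented for illustration, and SQLite stands in for MySQL, so it says nothing about relative performance on your server.

```python
import sqlite3

# Hypothetical sample data: book 1 carries an excluded keyword (1) plus an
# allowed one (7), book 2 has only keyword 8, book 3 has no keywords,
# book 4 is not a hardcover.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books_table (book_id INTEGER PRIMARY KEY, is_hardcover INTEGER);
    CREATE TABLE keywords_table (book_id INTEGER, keyword_id INTEGER);
    INSERT INTO books_table VALUES (1, 1), (2, 1), (3, 1), (4, 0);
    INSERT INTO keywords_table VALUES (1, 1), (1, 7), (2, 8), (4, 9);
""")

not_exists_sql = """
    SELECT b.book_id
    FROM books_table b
    WHERE b.is_hardcover = 1
      AND NOT EXISTS (SELECT 1
                      FROM keywords_table k
                      WHERE k.book_id = b.book_id
                        AND k.keyword_id IN (1, 2, 3))
    ORDER BY b.book_id
"""

left_join_sql = """
    SELECT b.book_id
    FROM books_table b
    LEFT JOIN keywords_table k
           ON k.book_id = b.book_id AND k.keyword_id IN (1, 2, 3)
    WHERE k.book_id IS NULL AND b.is_hardcover = 1
    ORDER BY b.book_id
"""

not_exists_ids = [row[0] for row in conn.execute(not_exists_sql)]
left_join_ids = [row[0] for row in conn.execute(left_join_sql)]
print(not_exists_ids)  # [2, 3]
print(left_join_ids)   # [2, 3]
```

Both forms drop book 1 even though it also has keyword 7, and both keep book 3, which has no keywords at all.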
You can use the following query:
SELECT *
FROM books_table
WHERE is_hardcover = 1 AND
book_id NOT IN (SELECT book_id
FROM keywords_table
GROUP BY book_id
HAVING COUNT(CASE WHEN keyword_id IN (1,2,3) THEN 1 END) <> 0)
I have a long Hive query with 10 joins and lots of conditions. Below are 3 of the conditions:
1) If id is not equal to XFG or GHT, use field sid
join ABC_Tables on sid
join CDE_Tables on sid
2) If id is equal to XFG or GHT, Tested is null, use field pid
join ABC_Tables on kid
join CDE_Tables on kid
3) If id is equal to XFG or GHT, Tested is not null, use field pid
join ABC_Tables on kid
join CDE_Tables on kid
What I am doing:
select 1 conditions
union all
select 2 conditions
union all
select 3 conditions
Am I doing this right? Is there an alternative approach to this problem?
Your conditions can be part of the ON join condition. Hive allows equal/not-equal comparisons against constants there: (ID!='XFG') and (ID!='GHT') and (a.PID=b.PID) is a valid join condition, and a.ID not in ('XFG', 'GHT') and a.sid=b.sid should also work:
select *
from a
left join b b1 on a.ID not in ('XFG', 'GHT') and a.sid=b1.sid
left join b b2 on a.ID in ('XFG', 'GHT') and Tested is null and a.pid=b2.pid
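As a rough check of the idea, here is a sketch that joins the same table twice under mutually exclusive ON conditions and picks whichever side matched. It runs on SQLite through Python's sqlite3 module, not on Hive, so it only demonstrates the join logic. The aliases b1 and b2, the name column, the sample rows, and the assumption that Tested is a column of table a are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id TEXT, sid INTEGER, pid INTEGER, tested TEXT);
    CREATE TABLE b (sid INTEGER, pid INTEGER, name TEXT);
    -- All three rows share sid = 1 and pid = 10; only id and tested differ.
    INSERT INTO a VALUES ('ABC', 1, 10, NULL),
                         ('XFG', 1, 10, NULL),
                         ('GHT', 1, 10, 'y');
    -- 'first' is reachable through sid = 1, 'second' through pid = 10.
    INSERT INTO b VALUES (1, 20, 'first'), (2, 10, 'second');
""")

conditional_join_sql = """
    SELECT a.id, COALESCE(b1.name, b2.name) AS matched_name
    FROM a
    LEFT JOIN b AS b1
           ON a.id NOT IN ('XFG', 'GHT') AND a.sid = b1.sid
    LEFT JOIN b AS b2
           ON a.id IN ('XFG', 'GHT') AND a.tested IS NULL AND a.pid = b2.pid
    ORDER BY a.id
"""

rows = conn.execute(conditional_join_sql).fetchall()
print(rows)  # [('ABC', 'first'), ('GHT', None), ('XFG', 'second')]
```

The 'GHT' row comes back unmatched because these two joins cover only conditions 1 and 2; condition 3 (Tested is not null) would need its own join.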
I have multiple tables, related by multiple foreign keys as in the following example:
Recipes(id_recipe,name,calories,category) - id_recipe as PK.
Ingredients(id_ingredient,name,type) - id_ingredient as PK.
Contains(id_ingredient,id_recipe,quantity,unit) - (id_ingredient,id_recipe) as PK, and as Foreign Keys for Recipes(id_recipe) and Ingredients(id_ingredient).
You can see these relations represented in this image.
So basically Contains is a bridge between Recipes and Ingredients.
The query I am trying to write is supposed to return the names of the recipes that have an ingredient of type "bovine" but none of type "lactic".
My attempt:
SELECT DISTINCT Recipes.name
FROM Ingredients JOIN Contains USING(id_ingredient) JOIN Recipes USING (id_recipe)
WHERE Ingredients.type = "bovin"
AND Ingredients.type <> "lactic";
The problem is it still shows me recipes that have at least one lactic ingredient.
I would appreciate any help!
This is the general form of the kind of query you need:
SELECT *
FROM tableA
WHERE tableA.ID NOT IN (
SELECT table_ID
FROM ...
)
;
-- EXAMPLE BELOW --
The subquery gives the id values of all recipes that the "lactic" ingredient is used in, the outer query says "give me all the recipes not in that list".
SELECT DISTINCT Recipes.name
FROM Recipes
WHERE id_recipe NOT IN (
SELECT DISTINCT id_recipe
FROM `Ingredients` AS `i`
INNER JOIN `Contains` AS `c` USING (id_ingredient)
WHERE `i`.`type` = "lactic"
)
;
Alternatively, using your original query:
You could have changed the second join to a LEFT JOIN, changed its USING to an ON and included AND type = "lactic" there instead, and ended the query with HAVING Ingredients.type IS NULL (or WHERE, I just prefer HAVING for "final result" filtering). This would tell you which items could not be joined to the "lactic" ingredient.
A common solution of this type of question (checking conditions over a set of rows) utilizes aggregate + CASE.
SELECT R.Name
FROM Recipes R
INNER JOIN Contains C
on R.ID_Recipe = C.ID_Recipe
INNER JOIN Ingredients I
on C.ID_Ingredient = I.ID_Ingredient
GROUP BY R.name
having -- no 'lactic' ingredient
sum(case when type = 'lactic' then 1 else 0 end) = 0
and -- at least one 'bovin' ingredient
sum(case when type = 'bovin' then 1 else 0 end) > 0
It's easy to extend to any number of ingredients and any kind of question.
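Here is a small runnable sketch of the aggregate + CASE approach, using Python's sqlite3 module with invented recipes (SQLite stands in for MySQL). It groups by the recipe id as well as the name, which is an adjustment made here so that two recipes sharing a name would not be merged.

```python
import sqlite3

# Hypothetical data: burger has a bovin ingredient only, cookies have both
# bovin and lactic ingredients, salad has neither.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Recipes (id_recipe INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Ingredients (id_ingredient INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE "Contains" (id_ingredient INTEGER, id_recipe INTEGER);
    INSERT INTO Recipes VALUES (1, 'burger'), (2, 'cookies'), (3, 'salad');
    INSERT INTO Ingredients VALUES (1, 'beef', 'bovin'),
                                   (2, 'milk', 'lactic'),
                                   (3, 'lettuce', 'vegetable');
    INSERT INTO "Contains" VALUES (1, 1), (1, 2), (2, 2), (3, 3);
""")

conditional_sum_sql = """
    SELECT R.name
    FROM Recipes R
    INNER JOIN "Contains" C ON R.id_recipe = C.id_recipe
    INNER JOIN Ingredients I ON C.id_ingredient = I.id_ingredient
    GROUP BY R.id_recipe, R.name
    HAVING SUM(CASE WHEN I.type = 'lactic' THEN 1 ELSE 0 END) = 0  -- no lactic
       AND SUM(CASE WHEN I.type = 'bovin' THEN 1 ELSE 0 END) > 0   -- some bovin
    ORDER BY R.name
"""

recipe_names = [row[0] for row in conn.execute(conditional_sum_sql)]
print(recipe_names)  # ['burger']
```

Adding another requirement is a matter of adding one more SUM(CASE ...) line to the HAVING clause.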
Hijacked the fiddle of xQbert
SELECT R.NAME
FROM CONTAINS C
INNER JOIN INGREDIENTS I
ON I.ID_INGREDIENTS = C.ID_INGREDIENTS AND I.TYPE = 'bovine' AND I.TYPE <> "lactic"
INNER JOIN RECIPES R
ON R.ID_RECIPE = C.ID_RECIPE
GROUP BY R.NAME
That should work, maybe you need to escape 'contains'. It could be recognized as a SQL function.
In my example burgers and pasta have 'Bovin' and thus show up. So do cookies but cookies also have 'lactic' which is why they get excluded.
SELECT R.Name
FROM Recipes R
INNER JOIN Contains C
on R.ID_Recipe = C.ID_Recipe
INNER JOIN Ingredients I
on C.ID_Ingredient = I.ID_Ingredient
LEFT JOIN (SELECT R2.ID_Recipe
FROM Ingredients I2
INNER JOIN Contains C2
on C2.ID_Ingredient = I2.ID_Ingredient
INNER JOIN Recipes R2
on R2.ID_Recipe = C2.ID_Recipe
WHERE Type = 'lactic'
GROUP BY R2.ID_Recipe) T3
on T3.ID_Recipe = R.ID_Recipe
WHERE T3.ID_Recipe is null
and I.Type = 'Bovin'
GROUP BY R.name
There is likely a more elegant way of doing this. I really wanted to write this as a CTE and join it to itself, but MySQL had no CTE support at the time (it arrived in MySQL 8.0). There is likely a way to do this using EXISTS too. I'm not a big fan of IN clauses, as in my experience the performance generally suffers: EXISTS tends to be fastest, joins second fastest, and IN slowest (generally speaking).
The inline view (sub query) returns the ID_recipe of those you don't want to include.
The outer query returns the Name of the recipes with ingredients you want.
By joining these two together with an outer join, we return all the 'Bovin' recipes, with T3.ID_Recipe filled in only for those that also contain the undesired ingredient. We then limit the results to the rows where T3.ID_Recipe is null (undesired ingredient not found), so you get only the recipes that have the desired ingredient and none of the undesired one.
You can use NOT EXISTS for this.
Try this:
SELECT DISTINCT Recipes.`name`
FROM Recipes JOIN Contains AS C1 USING (id_recipe) JOIN Ingredients USING(id_ingredient)
WHERE Ingredients.type = "bovin"
AND NOT EXISTS (
SELECT 1
FROM Contains AS C2 JOIN Ingredients USING(id_ingredient)
WHERE C1.id_recipe = C2.id_recipe
AND Ingredients.type = "lactic"
)
While running this query:
SELECT
a.id,
pub.name AS publisher_name,
pc.name AS placement_name,
b.name AS banner_name,
a.lead_id,
a.partner_id,
a.type,
l.status,
s.correctness,
a.landing_page,
t.name AS tracker_name,
a.date_view,
a.date_action
FROM actions AS a
LEFT JOIN publishers AS pub ON a.publisher_id = pub.id
LEFT JOIN placements AS pc ON pc.publisher_id = pub.id
LEFT JOIN banners AS b ON b.campaign_id = a.campaign_id
LEFT JOIN leads l ON
l.lead_id = a.lead_id
AND l.created = (
SELECT MAX(created) from leads l2 where l2.lead_id = l.lead_id
)
LEFT JOIN statuses AS s ON l.status = s.status
LEFT JOIN trackers AS t ON t.id = a.tracker_id
LIMIT 10
I am able to sort by every column from the actions table. However, when I try to, for example, ORDER BY b.name (from the banners table, joined on actions.banner_id) or ORDER BY l.lead_id (joined from leads on a more complex condition, as seen above), MySQL runs the query for a loooong time (most tables have tens of thousands of records). Is it possible, performance-wise, to sort by joined columns?
You should rewrite the query with an inner join on the table that holds the column you want to sort on.
For example, if you sort on actions.banner_id:
SELECT ...
FROM actions AS a
JOIN banners AS b ON b.campaign_id = a.campaign_id
LEFT JOIN *rest of the query*
You will get the same results unless there are not enough banners that can be joined to actions to produce a total of 10 rows.
I'm guessing it's not the case otherwise you wouldn't be sorting on banner_id.
You could first filter (order by, where, etc.) your records in a subquery and then join the result with the rest of the tables.
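One way to read this suggestion is sketched below: sort and limit inside a derived table, then join the lookup tables to the few rows that survive. The sketch uses Python's sqlite3 module with a cut-down, hypothetical version of the actions and trackers tables. It only helps directly when the sort column lives in the table being pre-filtered, and whether it is faster on MySQL depends on indexes and the optimizer, so treat it as something to test rather than a guaranteed fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (id INTEGER PRIMARY KEY, tracker_id INTEGER, date_view TEXT);
    CREATE TABLE trackers (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO actions VALUES (1, 10, '2020-01-01'),
                               (2, 11, '2020-01-03'),
                               (3, 10, '2020-01-02'),
                               (4, NULL, '2020-01-04');
    INSERT INTO trackers VALUES (10, 'alpha'), (11, 'beta');
""")

prefiltered_sql = """
    SELECT a.id, t.name AS tracker_name
    FROM (SELECT id, tracker_id, date_view
          FROM actions
          ORDER BY date_view DESC
          LIMIT 2) AS a               -- only two rows reach the join
    LEFT JOIN trackers AS t ON t.id = a.tracker_id
    ORDER BY a.date_view DESC         -- re-state the order for the final result
"""

rows = conn.execute(prefiltered_sql).fetchall()
print(rows)  # [(4, None), (2, 'beta')]
```

The outer ORDER BY is repeated on purpose, since the order of a derived table is not guaranteed to survive the join.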
I tried to combine two tables' data.
I got an error like this. Can you see why?
Every derived table must have its own alias
SELECT a.title, number
FROM store a
JOIN
( SELECT count(b.code) as number
FROM redeem_codes b
WHERE product = a.title
AND available = "Available")
It's a little hard to tell without knowing more about your table structures. I'll give it a try anyway:
SELECT a.title, count(b.code) AS number FROM store a
LEFT JOIN redeem_codes b ON b.product = a.title
AND b.available = "Available"
GROUP BY a.title;
You need to have an ALIAS on your subquery.
SELECT a.title, number
FROM store a
JOIN (subquery) b -- b is the `ALIAS`
-- and this query will not give you the result you want
but here's a more efficient query without using subquery,
SELECT a.title, count(b.code) number
FROM store a
INNER JOIN redeem_codes b -- or use LEFT JOIN, with the availability test
-- moved from WHERE into ON, to show 0 for titles with no available codes
ON b.product = a.title
WHERE b.available = 'Available'
GROUP BY a.title
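Where the availability test goes matters once a LEFT JOIN is involved. The sketch below (Python's sqlite3 module, invented rows, SQLite rather than MySQL) compares the test in WHERE with the same test in ON. A WHERE filter on the right-hand table discards the NULL-extended rows, so titles without available codes vanish instead of showing 0.

```python
import sqlite3

# Hypothetical data: title A has one available and one used code,
# title B has only a used code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (title TEXT);
    CREATE TABLE redeem_codes (code TEXT, product TEXT, available TEXT);
    INSERT INTO store VALUES ('A'), ('B');
    INSERT INTO redeem_codes VALUES ('c1', 'A', 'Available'),
                                    ('c2', 'A', 'Used'),
                                    ('c3', 'B', 'Used');
""")

filter_in_where_sql = """
    SELECT a.title, COUNT(b.code) AS number
    FROM store a
    LEFT JOIN redeem_codes b ON b.product = a.title
    WHERE b.available = 'Available'
    GROUP BY a.title
    ORDER BY a.title
"""

filter_in_on_sql = """
    SELECT a.title, COUNT(b.code) AS number
    FROM store a
    LEFT JOIN redeem_codes b
           ON b.product = a.title AND b.available = 'Available'
    GROUP BY a.title
    ORDER BY a.title
"""

where_rows = conn.execute(filter_in_where_sql).fetchall()
on_rows = conn.execute(filter_in_on_sql).fetchall()
print(where_rows)  # [('A', 1)]
print(on_rows)     # [('A', 1), ('B', 0)]
```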
I'm trying to retrieve books from one table and left join the chapters table. What I need from the second table is just the COUNT() of chapters available for those books and add that value as an extra column called chapters (or something else).
My current try looks like this:
SELECT b.*, count(c.chapter_nr) as chapters FROM books as b left join chapters as c on c.book_id = b.id
This only gets one row from the books table and adds the count() result to that row, but I'd like to get ALL the rows from the books table, hence the LEFT JOIN.
SELECT b.*, count(c.chapter_nr) as chapters
FROM books as b
LEFT JOIN chapters as c on (c.book_id = b.id)
GROUP BY b.id
EXPLANATION
You need to group by the book in order to determine the actual chapter counts. If you leave out the GROUP BY clause, COUNT() aggregates over the entire joined result, so you get a single row carrying the total number of chapters instead of one row per book. Grouping by b.id gives you each book once, with its own chapter count.
You are missing the GROUP BY clause:
SELECT b.*, count(c.chapter_nr) as chapters
FROM books AS b
LEFT JOIN chapters AS c ON c.book_id = b.id
GROUP BY b.id
Try:
SELECT b.*,
(select count(*) from chapters c where c.book_id = b.id) as chapters
FROM books b
This will return 0 if there are no chapters for a book.
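To compare the variants side by side, here is a short sketch with Python's sqlite3 module and two made-up books, one of which has no chapters. SQLite stands in for MySQL. Like MySQL without ONLY_FULL_GROUP_BY, it accepts the ungrouped query and collapses it to one row, which reproduces the symptom from the question.

```python
import sqlite3

# Hypothetical data: book 1 has two chapters, book 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE chapters (book_id INTEGER, chapter_nr INTEGER);
    INSERT INTO books VALUES (1, 'A'), (2, 'B');
    INSERT INTO chapters VALUES (1, 1), (1, 2);
""")

# No GROUP BY: the aggregate collapses the whole join into a single row.
ungrouped_rows = conn.execute("""
    SELECT b.id, COUNT(c.chapter_nr) AS chapters
    FROM books b
    LEFT JOIN chapters c ON c.book_id = b.id
""").fetchall()

# GROUP BY b.id: one row per book, counting only matched chapter rows.
grouped_rows = conn.execute("""
    SELECT b.id, COUNT(c.chapter_nr) AS chapters
    FROM books b
    LEFT JOIN chapters c ON c.book_id = b.id
    GROUP BY b.id
    ORDER BY b.id
""").fetchall()

# Correlated subquery: no join and no GROUP BY needed.
subquery_rows = conn.execute("""
    SELECT b.id,
           (SELECT COUNT(*) FROM chapters c WHERE c.book_id = b.id) AS chapters
    FROM books b
    ORDER BY b.id
""").fetchall()

print(len(ungrouped_rows))  # 1
print(grouped_rows)         # [(1, 2), (2, 0)]
print(subquery_rows)        # [(1, 2), (2, 0)]
```

Note that the grouped version counts c.chapter_nr rather than *, so the NULL row produced by the LEFT JOIN for a chapterless book counts as 0.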
Untested, but you need a "group by" clause to do what you want:
Select b.*, count(c.book_id) as chapters
from books b left outer join chapters c
on c.book_id = b.id
group by b.id