Find rows without many-to-many children meeting a certain condition - mysql

Here's a generic version of what I'm trying to do:
The table recipes has fields id and name. The table ingredients has fields id, name, and sweetness, describing how sweet that ingredient is on a scale of 1-10. Recipes have many ingredients and ingredients are in many recipes, so the two are related in a recipes_ingredients table, with fields ingredient_id and recipe_id.
It's easy to find recipes that contain an ingredient with sweetness of 10.
SELECT DISTINCT recipes.* FROM recipes
INNER JOIN recipes_ingredients ri ON ri.recipe_id = recipes.id
INNER JOIN ingredients ON ingredients.id = ri.ingredient_id
WHERE ingredients.sweetness = 10
However, I'm having trouble with negating that query to find recipes with no ingredients with sweetness 10. My first thought was this:
SELECT DISTINCT recipes.* FROM recipes
INNER JOIN recipes_ingredients ri ON ri.recipe_id = recipes.id
INNER JOIN ingredients ON ingredients.id = ri.ingredient_id
WHERE ingredients.sweetness != 10
However, that finds recipes that contain any non-sweetness-10 ingredients.
My next attempt was the following, which seems to work:
SELECT * FROM recipes WHERE
(
SELECT count(*) FROM ingredients INNER JOIN recipes_ingredients ri ON
ri.ingredient_id = ingredients.id WHERE ingredients.sweetness = 10 AND
ri.recipe_id = recipes.id
) = 0
However, my general experience is that dependent subqueries run slowly compared to equivalent, well-crafted JOINs. I played around with joining, grouping, etc. but couldn't quite wrap my head around it, especially since, though it seems like LEFT JOIN and IS NULL were the proper tools, having two joins already made things nasty. Great SQL wizards, what query can I run to get the best results? Thanks!

Try this:
SELECT DISTINCT r.*
FROM recipes r LEFT JOIN
(SELECT ri.recipe_id
FROM recipes_ingredients ri
INNER JOIN ingredients ON ingredients.id = ri.ingredient_id
WHERE ingredients.sweetness = 10) i ON i.recipe_id = r.id
WHERE i.recipe_id IS NULL

Try:
select
r.*
from
recipes r
where
not exists (
select
1
from
recipes_ingredients ri
join ingredients i on i.id = ri.ingredient_id
where
ri.recipe_id = r.id
and i.sweetness = 10
)
It's still a correlated subquery, but exists and not exists have some optimizations that should make them perform better than your original query.
For a direct join solution, this should work:
select distinct
r.*
from
recipes r
join recipes_ingredients ri on ri.recipe_id = r.id
left join ingredients i on i.id = ri.ingredient_id and i.sweetness = 10
where
i.id is null
Depending on indexing, the not exists solution could be faster as not exists returns immediately upon figuring out if any rows satisfy the given conditions without looking at any more of the table than necessary. For example, if it finds a single row of sweetness 10, it stops looking at the table and returns false.
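To make the difference between the naive `!=` filter and NOT EXISTS concrete, here is a minimal sketch on toy data. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample recipes and ingredients are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recipes (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ingredients (id INTEGER PRIMARY KEY, name TEXT, sweetness INTEGER);
CREATE TABLE recipes_ingredients (recipe_id INTEGER, ingredient_id INTEGER);
INSERT INTO recipes VALUES (1, 'fudge'), (2, 'salad'), (3, 'plain water');
INSERT INTO ingredients VALUES (1, 'sugar', 10), (2, 'butter', 2), (3, 'lettuce', 1);
-- fudge: sugar + butter; salad: lettuce + butter; plain water: no ingredients
INSERT INTO recipes_ingredients VALUES (1, 1), (1, 2), (2, 3), (2, 2);
""")

# Naive negation: keeps any recipe with at least one non-sweetness-10 ingredient.
naive = conn.execute("""
SELECT DISTINCT r.name FROM recipes r
JOIN recipes_ingredients ri ON ri.recipe_id = r.id
JOIN ingredients i ON i.id = ri.ingredient_id
WHERE i.sweetness != 10
ORDER BY r.name
""").fetchall()

# NOT EXISTS: keeps recipes for which no sweetness-10 ingredient row exists.
not_exists = conn.execute("""
SELECT r.name FROM recipes r
WHERE NOT EXISTS (
    SELECT 1 FROM recipes_ingredients ri
    JOIN ingredients i ON i.id = ri.ingredient_id
    WHERE ri.recipe_id = r.id AND i.sweetness = 10
)
ORDER BY r.name
""").fetchall()

naive_names = [row[0] for row in naive]
not_exists_names = [row[0] for row in not_exists]
print(naive_names)       # fudge is wrongly included because of its butter
print(not_exists_names)  # fudge is excluded; the ingredient-less recipe is kept
```

Note that NOT EXISTS also returns recipes with no ingredients at all, which an inner join from recipes to recipes_ingredients can never do.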

I played around with the answers given me here (which I've since upvoted), and, from their inspiration, have come up with a query that seems to do the job with surprisingly outstanding performance:
SELECT r.* FROM recipes r
LEFT JOIN recipes_ingredients ri ON ri.recipe_id = r.id
LEFT JOIN ingredients i ON i.id = ri.ingredient_id AND i.sweetness = 10
GROUP BY r.id HAVING MAX(i.id) IS NULL
The joins with the condition inside (inspired by @Donnie) bring out recipe-ingredient combinations, with NULL ingredient columns if the ingredient is not of sweetness 10. We then group by recipe ID, and select the "max" ingredient ID. (The MAX function returns NULL if and only if there are no actual IDs to select, i.e., there are no sweetness-10 items associated with this recipe.) If that "max" ingredient ID is null, then there were no sweetness-10 items for the MAX function to select, and, therefore, rows HAVING a null MAX(i.id) are selected.
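The grouping step is easier to follow when the intermediate join rows are visible. The sketch below, again using Python's sqlite3 as a stand-in for MySQL with invented sample rows, first prints the joined rows and then applies the GROUP BY / HAVING MAX(...) IS NULL filter. It groups by both id and name, which is a small deviation from the query above made for portability.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recipes (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ingredients (id INTEGER PRIMARY KEY, name TEXT, sweetness INTEGER);
CREATE TABLE recipes_ingredients (recipe_id INTEGER, ingredient_id INTEGER);
INSERT INTO recipes VALUES (1, 'fudge'), (2, 'salad'), (3, 'plain water');
INSERT INTO ingredients VALUES (1, 'sugar', 10), (2, 'butter', 2), (3, 'lettuce', 1);
INSERT INTO recipes_ingredients VALUES (1, 1), (1, 2), (2, 3), (2, 2);
""")

# Step 1: the two LEFT JOINs. i.id is non-NULL only for sweetness-10 ingredients.
joined = conn.execute("""
SELECT r.name, i.id
FROM recipes r
LEFT JOIN recipes_ingredients ri ON ri.recipe_id = r.id
LEFT JOIN ingredients i ON i.id = ri.ingredient_id AND i.sweetness = 10
ORDER BY r.id, ri.ingredient_id
""").fetchall()
for row in joined:
    print(row)

# Step 2: MAX(i.id) ignores NULLs, so it is NULL only when a recipe's group
# contains no sweetness-10 ingredient at all.
result = [row[0] for row in conn.execute("""
SELECT r.name
FROM recipes r
LEFT JOIN recipes_ingredients ri ON ri.recipe_id = r.id
LEFT JOIN ingredients i ON i.id = ri.ingredient_id AND i.sweetness = 10
GROUP BY r.id, r.name
HAVING MAX(i.id) IS NULL
ORDER BY r.name
""")]
print(result)
```

Only the fudge group contains a non-NULL ingredient ID, so it is the only recipe filtered out.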
I ran both the NOT EXISTS version of the query and the above version of the query a number of times with the query cache disabled. Against about 400 recipes, the NOT EXISTS query would consistently take about 1.0 seconds to complete, whereas this query's runtime was usually around 0.1 seconds. Against about 5000 recipes, the NOT EXISTS query took about 30 seconds, whereas the above query usually still took 0.1 seconds, and was almost always under 1.0.
It's worth noting that, checking EXPLAINs on each, the query listed here is able to run almost entirely on the indices I've given these tables, which probably explains why it is able to do all sorts of joining and grouping without batting an eye. The NOT EXISTS query, on the other hand, has to do dependent subqueries. The two might perform more equally if these indices weren't in place, but that query optimizer is pretty darn powerful when given the chance to use raw joins, it would seem.
Moral of the story: well-formed JOINs are super-duper powerful :) Thanks, all!

Related

MySQL slow left join on sort

I have millions of customers, and when I use a LEFT JOIN and then sort by a column, the query takes 4-5 seconds. Here is my query:
SELECT c.id AS id, o.description AS office_description, ... , d.type AS document_type, d.number AS document_number
FROM customers c INNER JOIN offices o ON (c.id_office = o.id)
INNER JOIN company cp ON (o.id_company = cp.id)
LEFT JOIN documents d ON (c.id = d.id_customer)
WHERE c.archive = 0
ORDER BY office_description
LIMIT 10
When I remove the documents columns from my SELECT, the query is very fast.
Here is the query's EXPLAIN output:
I have 1 million customers; the other tables (company, offices, documents) have only 1 row each.
I set indexes on c.archive and o.description, plus the primary keys and foreign keys, of course. Here are the structures of these tables: http://sqlfiddle.com/#!9/a222f9
So I tried to build my query like this:
SELECT A.*, d.*
FROM (
SELECT c.id AS id, o.description AS office_description, ...
FROM customers c INNER JOIN offices o ON (c.id_office = o.id)
INNER JOIN company cp ON (o.id_company = cp.id)
WHERE c.archive = 0
ORDER BY o.description
LIMIT 10
) A LEFT JOIN documents d ON (A.id = d.id_customer)
And now, wow, it's very fast.
But I don't know whether this is the best way to reduce the lag, or whether I'm doing something wrong. I'd like to know if there is a better way to do this.
I hope there is an easier way, because it will be complicated to use this query in my Phalcon project.
An explanation...
Your faster query can find the 10 rows before looking in documents. So, it needs only 10 probes into that table.
In the original query, the Optimizer was not too smart. It planned to execute the query as if there were no LIMIT. Instead, it decided to optimize the join to documents by fetching the entire table into the "join buffer" in RAM and building a hash index on it. While this would help some queries like yours, it was a big waste for the mere 10 rows that you needed.
So, your reformulation convinced the Optimizer to do it a better way.
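The "limit first, then join" shape can be sketched on a small data set. This example uses Python's sqlite3 as a stand-in for MySQL, so it illustrates the query shape rather than MySQL's actual execution plan. The column names follow the question; the sample rows, the LIMIT of 3, and the extra c.id tie-breaker in the ORDER BY are assumptions added to keep the output deterministic.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE offices (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE customers (id INTEGER PRIMARY KEY, id_office INTEGER, archive INTEGER);
CREATE TABLE documents (id INTEGER PRIMARY KEY, id_customer INTEGER, type TEXT, number TEXT);
INSERT INTO offices VALUES (1, 'Zurich'), (2, 'Amsterdam');
""")
# 1000 customers: odd ids belong to Amsterdam, even ids to Zurich.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(i, 1 + i % 2, 0) for i in range(1, 1001)],
)
conn.executemany(
    "INSERT INTO documents (id_customer, type, number) VALUES (?, ?, ?)",
    [(1, "passport", "P-1"), (3, "passport", "P-3")],
)

# The derived table picks the few rows we need before documents is touched,
# so the LEFT JOIN only has to probe documents for those rows.
rows = conn.execute("""
SELECT a.id, a.office_description, d.type, d.number
FROM (
    SELECT c.id AS id, o.description AS office_description
    FROM customers c
    JOIN offices o ON c.id_office = o.id
    WHERE c.archive = 0
    ORDER BY o.description, c.id
    LIMIT 3
) a
LEFT JOIN documents d ON d.id_customer = a.id
ORDER BY a.office_description, a.id
""").fetchall()
for row in rows:
    print(row)
```

One behavioral difference to keep in mind: if a customer can have several documents, this form limits the number of customers, while the original single-level query limits the number of joined rows.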
If you had needed only one column from d, there is another way:
SELECT ...,
( SELECT col FROM d WHERE ... ) AS col,
... ((without the LEFT JOIN at all))
As for an "easier" way, especially one that can be reverse-engineered into some third-party package, I doubt it. (Packages tend to be crutches for getting started in databases. As you are finding out, you eventually need to learn more than they can teach you.)
A separate inefficiency:
WHERE c.archive = 0
ORDER BY o.description
LIMIT ...
If the archived rows had been removed from c, then the optimal execution would be to find the first 10 rows of o. Instead it must do a lengthy JOIN before sorting and limiting. (This is a common problem with "soft deletes". Neither MySQL nor the 3rd party package can optimize it.)

MySQL: Optimizing Sub-queries

I have this query I need to optimize further since it requires too much cpu time and I can't seem to find any other way to write it more efficiently. Is there another way to write this without altering the tables?
SELECT category, b.fruit_name, u.name
, r.count_vote, r.text_c
FROM Fruits b, Customers u
, Categories c
, (SELECT * FROM
(SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r
WHERE b.fruit_id = r.fruit_id
AND u.customer_id = r.customer_id
AND category = "Fruits";
This is your query re-written with explicit joins:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN
(
SELECT * FROM
(
SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r on r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
CROSS JOIN Categories c
WHERE c.category = 'Fruits';
(I am guessing here that the category column belongs to the categories table.)
There are some parts that look suspicious:
Why do you cross join the Categories table, when you don't even display a column of the table?
What is ORDER BY fruit_id, count_vote DESC, r_id supposed to do? Sub query results are considered unordered sets, so an ORDER BY is superfluous and can be ignored by the DBMS. What do you want to achieve here?
SELECT * FROM [ reviews ] GROUP BY fruit_id is invalid. If you group by fruit_id, what count_vote and what r.text_c do you expect to get for the ID? You don't tell the DBMS (which would be something like MAX(count_vote) and MIN(r.text_c), for instance). MySQL should throw an error, but silently replaces count_vote, r.text_c by ANY_VALUE(count_vote), ANY_VALUE(r.text_c) instead. This means you get arbitrarily picked values for a fruit.
The answer hence to your question is: Don't try to speed it up, but fix it instead. (Maybe you want to place a new request showing the query and explaining what it is supposed to do, so people can help you with that.)
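If the intent behind the ORDER BY plus GROUP BY trick was "one review per fruit, the one with the most votes" (an assumption, since the question never says so), a deterministic formulation could look like the sketch below. It uses Python's sqlite3 as a stand-in for MySQL; the sample rows and the tie-break rule (lowest r_id wins) are also assumptions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reviews (
    r_id INTEGER PRIMARY KEY,
    fruit_id INTEGER,
    customer_id INTEGER,
    count_vote INTEGER,
    text_c TEXT
);
INSERT INTO Reviews VALUES
    (1, 10, 100, 5, 'ok'),
    (2, 10, 101, 9, 'great'),
    (3, 10, 102, 9, 'also great'),
    (4, 20, 100, 1, 'meh');
""")

# Keep a review only if no other review of the same fruit beats it:
# more votes wins, and on equal votes the lower r_id wins.
top_reviews = conn.execute("""
SELECT r.fruit_id, r.r_id, r.count_vote, r.text_c
FROM Reviews r
WHERE NOT EXISTS (
    SELECT 1
    FROM Reviews better
    WHERE better.fruit_id = r.fruit_id
      AND (better.count_vote > r.count_vote
           OR (better.count_vote = r.count_vote AND better.r_id < r.r_id))
)
ORDER BY r.fruit_id
""").fetchall()
for row in top_reviews:
    print(row)
```

Unlike the GROUP BY version, every selected column here comes from the same, well-defined row.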
Your Categories table does not seem to be joined or related to the others, which produces a Cartesian product between all the rows.
If you want distinct results, don't use GROUP BY but DISTINCT, so you can avoid an unnecessary subquery,
and you don't need an ORDER BY in a subquery.
SELECT category
, b.fruit_name
, u.name
, r.count_vote
, r.text_c
FROM Fruits b
INNER JOIN (
SELECT distinct fruit_id, count_vote, text_c, customer_id
FROM Reviews
) r ON b.fruit_id = r.fruit_id
INNER JOIN Customers u ON u.customer_id = r.customer_id
INNER JOIN Categories c ON ?????? /* Your Categories table seems not joined/related to the others */
WHERE category = "Fruits";
For better readability, you should use explicit join syntax and avoid the old join syntax based on comma-separated table names and WHERE conditions.
The next time you want help optimizing a query, please include the table/index structure, an indication of the cardinality of the indexes and the EXPLAIN plan for the query.
There appears to be absolutely no reason for a single sub-query here, let alone 2. Using sub-queries mostly prevents the DBMS optimizer from doing its job. So your biggest win will come from eliminating these sub-queries.
The CROSS JOIN creates a deliberate Cartesian join. It's also unclear whether any attributes from this table are actually required for the result, whether it is there to produce multiples of the same row in the output, or whether it is just an error.
The attribute category in the last line of your query is not attributed to any of the tables (but I suspect it comes from the categories table).
Further, your code uses a GROUP BY clause with no aggregation function. This will produce non-deterministic results and is a bug. Assuming that you are not exploiting a side-effect of that, the query can be re-written as:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN Reviews r
ON r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
ORDER BY r.fruit_id, count_vote DESC, r_id;
Since there are no predicates other than joins in your query, there is no scope for further optimization beyond ensuring there are indexes on the join predicates.
As all too frequently, the biggest benefit may come from simply asking the question of why you need to retrieve every single row in the tables in a single query.

MySQL SELECT query with a many to many relationship

I'm having trouble making a SELECT/WHERE query using a many to many relationship type. A user inputs ingredients, and I want to find which recipes use all the ingredients provided among the other ingredients (if any). (Think: use up the last ingredients I have in my fridge)
My DB is currently designed like this:
recipes_ingredients looks like this
For example, if I give id_ingredient IN (22, 23), I want recipe #16497 only, not #16631 (since it only has 22 and not 23).
I've come up with something that does the opposite of what I described
SELECT DISTINCT recipes.*
FROM recipes_ingredients
JOIN recipes ON recipes_ingredients.id_recipe = recipes.id
WHERE id_ingredient IN ( 96, 13196 )
If you want recipes that have both of these ingredients (not just one of them), you can use aggregation with a filter:
SELECT r.*
FROM recipes_ingredients i
JOIN recipes r ON i.id_recipe = r.id
WHERE i.id_ingredient IN ( 96, 13196 )
GROUP BY r.id
HAVING COUNT(DISTINCT i.id_ingredient ) = 2
OR
SELECT r.*
FROM recipes_ingredients i
JOIN recipes r ON i.id_recipe = r.id
GROUP BY r.id
HAVING SUM(i.id_ingredient = 96)
AND SUM(i.id_ingredient = 13196)
Assuming that you need recipes that contain all of the ingredients given as input, you may use JOIN:
SELECT recipes.* FROM recipes
JOIN recipes_ingredients r1 ON recipes.id = r1.id_recipe AND r1.id_ingredient = 96
JOIN recipes_ingredients r2 ON recipes.id = r2.id_recipe AND r2.id_ingredient = 13196
Unfortunately there is no intersect operator in mysql which would be more simple.
SELECT count(*) matches, id_recipe FROM `recipes_ingredients`
WHERE `id_ingredient` in ('23',...)
Group By `id_recipe`
HAVING matches = (
SELECT count(*) FROM ingredients where id in ('23',...)
);
This provides the count of matching ingredients per recipe, then compares counts to the exact number of parameters passed in. Or, since you are using phpmyadmin (and as such: PHP), you can pass in a count of the parameters (using PHP's count() if they start in an array, for example), and skip the subquery.
You can then join this list outwards to get any further information.
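The count-matching idea, with the parameter count supplied by the application rather than a subquery, can be sketched as follows. Python and its sqlite3 module stand in here for PHP and MySQL. The recipe and ingredient IDs 16497, 16631, 22 and 23 come from the question; ingredient 40 and the variable names are invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; ingredient 40 is made-up filler data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recipes_ingredients (id_recipe INTEGER, id_ingredient INTEGER);
INSERT INTO recipes_ingredients VALUES
    (16497, 22), (16497, 23), (16497, 40),
    (16631, 22), (16631, 40);
""")

wanted = [22, 23]  # ingredients the user has on hand

# One placeholder per ingredient, plus one for the required match count.
placeholders = ", ".join("?" for _ in wanted)
query = f"""
SELECT id_recipe
FROM recipes_ingredients
WHERE id_ingredient IN ({placeholders})
GROUP BY id_recipe
HAVING COUNT(DISTINCT id_ingredient) = ?
ORDER BY id_recipe
"""
# A recipe qualifies only if it matched every distinct wanted ingredient.
matches = [row[0] for row in conn.execute(query, (*wanted, len(set(wanted))))]
print(matches)
```

Using COUNT(DISTINCT ...) and len(set(...)) keeps the comparison correct even if the input list or the link table contains duplicates.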

Incomprehensible query behaviour

I have multiple tables, related by multiple foreign keys as in the following example:
Recipes(id_recipe,name,calories,category) - id_recipe as PK.
Ingredients(id_ingredient,name,type) - id_ingredient as PK.
Contains(id_ingredient,id_recipe,quantity,unit) - (id_ingredient,id_recipe) as PK, and as Foreign Keys for Recipes(id_recipe) and Ingredients(id_ingredient).
You can see this relations represented in this image.
So basically Contains is a bridge between Recipes and Ingredients.
The query I am trying to write is supposed to return the names of the recipes that have an ingredient of type "bovine" but none of type "lactic".
My attempt:
SELECT DISTINCT Recipes.name
FROM Ingredients JOIN Contains USING(id_ingredient) JOIN Recipes USING (id_recipe)
WHERE Ingredients.type = "bovin"
AND Ingredients.type <> "lactic";
The problem is it still shows me recipes that have at least one lactic ingredient.
I would appreciate any help!
This is the general form of the kind of query you need:
SELECT *
FROM tableA
WHERE tableA.ID NOT IN (
SELECT table_ID
FROM ...
)
;
-- EXAMPLE BELOW --
The subquery gives the id values of all recipes that the "lactic" ingredient is used in, the outer query says "give me all the recipes not in that list".
SELECT DISTINCT Recipes.name
FROM Recipes
WHERE id_recipe NOT IN (
SELECT DISTINCT id_recipe
FROM `Ingredients` AS `i`
INNER JOIN `Contains` AS `c` USING (id_ingredient)
WHERE `i`.`type` = "lactic"
)
;
Alternatively, using your original query:
You could've changed the second join to a LEFT JOIN, changed its USING to an ON and included AND type = "lactic" there instead, and ended the query with HAVING Ingredients.type IS NULL (or WHERE, I just prefer HAVING for "final result" filtering). This would tell you which items could not be joined to the "lactic" ingredient.
A common solution of this type of question (checking conditions over a set of rows) utilizes aggregate + CASE.
SELECT R.Name
FROM Recipes R
INNER JOIN Contains C
on R.ID_Recipe = C.ID_Recipe
INNER JOIN Ingredients I
on C.ID_Ingredient = I.ID_Ingredient
GROUP BY R.name
having -- no 'lactic' ingredient
sum(case when type = 'lactic' then 1 else 0 end) = 0
and -- at least one 'bovin' ingredient
sum(case when type = 'bovin' then 1 else 0 end) > 0
It's easy to extend to any number of ingredients and any kind of question.
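Here is a small runnable sketch of the aggregate + CASE pattern, using Python's sqlite3 as a stand-in for MySQL. The table and column names follow the question; the sample recipes and ingredients are invented, and the table name Contains is quoted defensively in case a given database treats it as a keyword.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Recipes (id_recipe INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Ingredients (id_ingredient INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE "Contains" (id_ingredient INTEGER, id_recipe INTEGER);
INSERT INTO Recipes VALUES (1, 'burgers'), (2, 'pasta'), (3, 'cookies'), (4, 'salad');
INSERT INTO Ingredients VALUES
    (1, 'beef', 'bovin'), (2, 'milk', 'lactic'),
    (3, 'flour', 'cereal'), (4, 'lettuce', 'vegetable');
-- burgers: beef, flour; pasta: beef, flour; cookies: beef, milk, flour; salad: lettuce
INSERT INTO "Contains" VALUES
    (1, 1), (3, 1), (1, 2), (3, 2), (1, 3), (2, 3), (3, 3), (4, 4);
""")

# Each SUM(CASE ...) counts, per recipe, the ingredients of one type.
names = [row[0] for row in conn.execute("""
SELECT r.name
FROM Recipes r
JOIN "Contains" c ON c.id_recipe = r.id_recipe
JOIN Ingredients i ON i.id_ingredient = c.id_ingredient
GROUP BY r.id_recipe, r.name
HAVING SUM(CASE WHEN i.type = 'lactic' THEN 1 ELSE 0 END) = 0  -- no lactic
   AND SUM(CASE WHEN i.type = 'bovin' THEN 1 ELSE 0 END) > 0   -- some bovin
ORDER BY r.name
""")]
print(names)
```

Adding another requirement (for example, "no cereal") is one more SUM(CASE ...) condition in the HAVING clause.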
Hijacked the fiddle of xQbert
SELECT R.NAME
FROM CONTAINS C
INNER JOIN INGREDIENTS I
ON I.ID_INGREDIENTS = C.ID_INGREDIENTS AND I.TYPE = 'bovine' AND I.TYPE <> "lactic"
INNER JOIN RECIPES R
ON R.ID_RECIPE = C.ID_RECIPE
GROUP BY R.NAME
That should work, maybe you need to escape 'contains'. It could be recognized as a SQL function.
SQL Fiddle
In my example burgers and pasta have 'Bovin' and thus show up. So do cookies but cookies also have 'lactic' which is why they get excluded.
SELECT R.Name
FROM Recipes R
INNER JOIN Contains C
on R.ID_Recipe = C.ID_Recipe
INNER JOIN Ingredients I
on C.ID_Ingredient = I.ID_Ingredient
LEFT JOIN (SELECT R2.ID_Recipe
FROM Ingredients I2
INNER JOIN Contains C2
on C2.ID_Ingredient = I2.ID_Ingredient
INNER JOIN Recipes R2
on R2.ID_Recipe = C2.ID_Recipe
WHERE Type = 'lactic'
GROUP BY R2.ID_Recipe) T3
on T3.ID_Recipe = R.ID_Recipe
WHERE T3.ID_Recipe is null
and I.Type = 'Bovin'
GROUP BY R.name
There is likely a more elegant way of doing this. I really wanted to write this as a CTE and join it to itself, but there is no CTE in MySQL. There is likely a way to do this using EXISTS too. I'm not a big fan of IN clauses, as performance generally suffers: EXISTS is fastest, joins are second fastest, and IN is slowest (generally speaking).
The inline view (sub query) returns the ID_recipe of those you don't want to include.
The outer query returns the Name of the recipes with ingredients you want.
By joining these two together with an outer join, we return every recipe, with the inline view's ID_Recipe filled in only for recipes that contain the undesired ingredient. We then limit the results to rows where that ID is NULL (undesired ingredient not found), so you get only the recipes that have the desired ingredient and none of the undesired one.
You can use NOT EXISTS for this.
Try this:
SELECT DISTINCT Recipes.`name`
FROM Recipes JOIN Contains AS C1 USING (id_recipe) JOIN Ingredients USING(id_ingredient)
WHERE Ingredients.type = "bovin"
AND NOT EXISTS (
SELECT 1
FROM Contains AS C2 JOIN Ingredients USING(id_ingredient)
WHERE C1.id_recipe = C2.id_recipe
AND Ingredients.type = "lactic"
)

mysql query where not with Hierarchical tables

So basically I'm joining 3 tables together. The main table is recipe, then it goes to ingredients list then ingredient.
So I need to have a query which has only recipes which contain NO chicken. The problem I am having is that because recipes have many ingredients when I use where != that just removes the ingredients with that meat but leaves the others.....how can i account for the multiple ingredients.
select Recipe.name as "No chicken" from Recipe inner join IngredientList on Recipe.recipeId=IngredientList.recipeId inner join Ingredients on IngredientList.IngredientId=Ingredients.ingredientId where type!="chcicken" group by Recipe.name;
Your original statement has a GROUP BY with no aggregate function. That doesn't make sense. It should be an ORDER BY if you're trying to sort.
Try something like this:
SELECT `Recipe`.`name` AS "No chicken"
FROM `Recipe`
WHERE `Recipe`.`RecipeId` NOT IN (
SELECT DISTINCT `IngredientList`.`RecipeId` AS `RecipeID`
FROM `IngredientList`
INNER JOIN `Ingredients` ON `IngredientList`.`IngredientId` = `Ingredients`.`IngredientId`
WHERE `Ingredients`.`Type` = 'chicken'
)
ORDER BY `Recipe`.`name`
Depending on your schema, you may need to use SELECT DISTINCT in the main select statement if you're getting duplicate recipe names.
The above have some typos, but Amirshk has a logically correct answer.
However, I recommend one avoid the IN() and NOT IN() clauses in MySQL as they are very, very slow on a set of tables as big as a large recipe database would get. IN and NOT IN can be re-written as joins to cut the runtime to 1/100th the time in MySQL 5.0. Even with MySQL 5.5's great improvements, the equivalent JOIN query benchmarks 1/5th the time on large tables.
Here is the revised query:
SELECT
Recipe.name AS "No Chicken"
FROM Recipe LEFT JOIN
(
SELECT IngredientList.recipeId, Ingredients.ingredientId
FROM IngredientList JOIN Ingredients USING (IngredientId)
WHERE Ingredients.type = 'chicken'
) WithChicken
ON Recipe.recipeId = WithChicken.recipeId
WHERE WithChicken.recipeId IS NULL;
This is pretty obtuse, so here is simplified SQL that provides the key concept of the NOT IN(...) equivalent exclusion join:
SELECT whatever FROM x
WHERE x.id NOT IN (
SELECT id FROM y
);
becomes
SELECT whatever FROM x
LEFT JOIN y ON x.id = y.id
WHERE y.id IS NULL;
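A quick way to convince yourself that the two forms select the same rows is to run both on a tiny data set. The sketch below uses Python's sqlite3 as a stand-in for MySQL; the tables x and y mirror the placeholder names above, and their contents are invented.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; x and y hold made-up ids.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE x (id INTEGER PRIMARY KEY);
CREATE TABLE y (id INTEGER PRIMARY KEY);
INSERT INTO x VALUES (1), (2), (3), (4);
INSERT INTO y VALUES (2), (4);
""")

# Form 1: NOT IN with a subquery.
not_in_ids = [row[0] for row in conn.execute("""
SELECT x.id FROM x
WHERE x.id NOT IN (SELECT id FROM y)
ORDER BY x.id
""")]

# Form 2: exclusion join. Rows of x with no partner in y get y.id = NULL.
anti_join_ids = [row[0] for row in conn.execute("""
SELECT x.id FROM x
LEFT JOIN y ON x.id = y.id
WHERE y.id IS NULL
ORDER BY x.id
""")]

print(not_in_ids)
print(anti_join_ids)
```

Both lists contain the ids present in x but absent from y.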
Use an inner query to filter recipes with chicken, then select all the recipes without them.
As so:
select
Recipe.name as "No chicken"
from Recipe
inner join IngredientList on Recipe.recipeId=IngredientList.recipeId
inner join Ingredients on IngredientList.IngredientId=Ingredients.ingredientId
where Recipe.recipeId NOT IN (
select
Recipe.recipeId
from Recipe
inner join IngredientList on Recipe.recipeId=IngredientList.recipeId
inner join Ingredients on IngredientList.IngredientId=Ingredients.ingredientId
where Ingredients.type = "chicken" group by Recipe.recipeId)