MySQL: Exclude subsets if not all row values are true

I have three MySQL tables: ingredients, ingredient_in_recipe and recipes which can be INNER JOINed to get ingredients in recipes. Also, the ingredients table has a column vegetarian. I want to get all recipes that are considered vegetarian, meaning that all ingredients for a given recipe must have set vegetarian to 1 (it is a BOOL/tinyint(1)).
I have looked at queries using ALL, HAVING NOT MAX and other various stuff, but I cannot find a working solution. What is the best way to do this? Are there some solutions that are more efficient than others?
Extra (only relevant) table information:
mysql> DESCRIBE ingredients;
+-----------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(100) | NO | | NULL | |
| vegetarian | tinyint(1) | NO | | 0 | |
+-----------------+---------------+------+-----+---------+----------------+
mysql> DESCRIBE ingredient_in_recipe;
+---------------+------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------------+------+-----+---------+-------+
| recipe_id | int(11) | NO | | NULL | |
| ingredient_id | int(11) | NO | | NULL | |
+---------------+------------------------+------+-----+---------+-------+
mysql> DESCRIBE recipes;
+------------------------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+----------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | text | NO | | NULL | |
+------------------------+----------------------+------+-----+---------+----------------+
The start of my query is currently:
SELECT recipe.name, ingredient.name
FROM ingredients AS ingredient
INNER JOIN ingredient_in_recipe AS ir
ON ir.ingredient_id = ingredient.id
INNER JOIN recipes AS recipe
ON ir.recipe_id = recipe.id;
So I am missing a WHERE, ALL, IN or something statement at the end.

You can try the following:
SELECT r.name FROM recipes r WHERE r.id NOT IN (
SELECT ir.recipe_id FROM ingredient_in_recipe ir
INNER JOIN ingredients i ON ir.ingredient_id = i.id
WHERE i.vegetarian = 0
)

Think of it this way.
Select out the set of recipes that have any non-vegetarian ingredients.
Subtract this set from the set of all recipes.
So here's the set of all recipes with a non-vegetarian ingredient.
select
recipes.id
from
recipes,
ingredient_in_recipe,
ingredients
where
ingredient_in_recipe.recipe_id = recipes.id
and
ingredient_in_recipe.ingredient_id = ingredients.id
and
ingredients.vegetarian <> 1
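Note: in MySQL, BOOL and BOOLEAN are synonyms for TINYINT(1), so a column declared as BOOLEAN shows up as tinyint(1) in DESCRIBE. The vegetarian column is already as boolean as MySQL gets.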
Also your DB model is pretty good. Your naming is consistent and appropriate.
Now that we have the "non-vegetarian" recipes, we just subtract from a "set" perspective [as in set theory].
select
*
from
recipes
where
id NOT IN (
-- this subquery returns a set of IDs corresponding to non-vegetarian recipes.
select
recipes.id
from
recipes,
ingredient_in_recipe,
ingredients
where
ingredient_in_recipe.recipe_id = recipes.id
and
ingredient_in_recipe.ingredient_id = ingredients.id
and
ingredients.vegetarian <> 1
);
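As a minimal sketch of the set-subtraction idea on made-up sample rows, the snippet below runs the NOT IN form next to a GROUP BY ... HAVING MIN(vegetarian) = 1 form. The HAVING variant is a swapped-in alternative, not something taken from the answers above. Python's sqlite3 module is assumed here only as a stand-in for MySQL so the example can be run as-is; the recipe and ingredient rows are invented.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ingredients (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, vegetarian INTEGER NOT NULL DEFAULT 0);
CREATE TABLE recipes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE ingredient_in_recipe (
    recipe_id INTEGER NOT NULL, ingredient_id INTEGER NOT NULL);
-- Invented sample rows
INSERT INTO ingredients (id, name, vegetarian)
    VALUES (1, 'tomato', 1), (2, 'pasta', 1), (3, 'bacon', 0);
INSERT INTO recipes (id, name) VALUES (1, 'pasta pomodoro'), (2, 'carbonara');
INSERT INTO ingredient_in_recipe VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

# All recipes minus the recipes that contain a non-vegetarian ingredient.
not_in_sql = """
SELECT r.name
FROM recipes r
WHERE r.id NOT IN (
    SELECT ir.recipe_id
    FROM ingredient_in_recipe ir
    INNER JOIN ingredients i ON ir.ingredient_id = i.id
    WHERE i.vegetarian = 0
)
ORDER BY r.name
"""

# Alternative: a recipe qualifies when the smallest flag among its ingredients is 1.
having_sql = """
SELECT r.name
FROM recipes r
INNER JOIN ingredient_in_recipe ir ON ir.recipe_id = r.id
INNER JOIN ingredients i ON i.id = ir.ingredient_id
GROUP BY r.id, r.name
HAVING MIN(i.vegetarian) = 1
ORDER BY r.name
"""

by_subtraction = [row[0] for row in conn.execute(not_in_sql)]
by_having = [row[0] for row in conn.execute(having_sql)]
print(by_subtraction, by_having)
```

The two forms differ for a recipe that has no ingredient rows at all: NOT IN returns it, while the inner joins in the HAVING form drop it.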

Related

What is wrong in this Update query which tried to update table using concat() fun

I want to update a field by appending data to it, but the query gives an error. Please correct me (query and table description are below).
I tried to run an UPDATE command with the CONCAT() function in SQL.
update products a
set a.des = (select concat((select b.des from products b limit 1) ,' one okay') from a)
where a.p_id = 1;
I have used MySQL,
Table Description:
mysql> desc products;
+---------+-------------+------+-----+--------------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+--------------+-------+
| p_id | int(3) | NO | PRI | 0 | |
| p_name | varchar(10) | YES | | NULL | |
| p_price | int(10) | YES | | NULL | |
| cat_id | int(3) | YES | MUL | NULL | |
| des | varchar(30) | YES | | Good | |
+---------+-------------+------+-----+--------------+-------+
Expected Output :
mysql> select * from products;
+------+--------+---------+--------+---------------+
| p_id | p_name | p_price | cat_id | des |
+------+--------+---------+--------+---------------+
| 1 | Mouse | 150 | 3 | Good one okay |
| 2 | LAN | 50 | 4 | Good |
+------+--------+---------+--------+---------------+
2 rows in set (0.00 sec)
Output Came :
Error -
update products a set a.des =
(select concat((select b.des from products b limit 1) ,' one okay')
from a) where a.p_id = 1 Error Code: 1146. Table 'test.a' doesn't exist 0.437 sec

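As a general rule, MySQL does not allow you to select from the table being updated in a subquery of the same UPDATE statement. (The error 1146 itself comes from "from a": an alias cannot be used as a table name in the subquery's FROM clause.)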
The normal work-around is to phrase this as a JOIN:
update products p cross join
(select * from products limit 1) arbitrary
set p.des = concat(arbitrary.des, ' one okay')
where p.p_id = 1;
Note the use of the alias arbitrary. You are using limit with no order by so you are getting an arbitrary description.
If you just want to append a string to the existing description, then you want the simpler:
update products p
set p.des = concat(p.des, ' one okay')
where p.p_id = 1;
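A small runnable sketch of the simpler append, under the assumption that SQLite (through Python's sqlite3) stands in for MySQL. SQLite writes concatenation as ||, whereas the MySQL statement above uses CONCAT; the trimmed-down products table and its two rows are taken from the question's expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (p_id INTEGER PRIMARY KEY, p_name TEXT, des TEXT DEFAULT 'Good');
INSERT INTO products (p_id, p_name) VALUES (1, 'Mouse'), (2, 'LAN');
""")

# SQLite spelling of: UPDATE products SET des = CONCAT(des, ' one okay') WHERE p_id = 1
conn.execute("UPDATE products SET des = des || ' one okay' WHERE p_id = 1")

rows = conn.execute("SELECT p_id, des FROM products ORDER BY p_id").fetchall()
print(rows)
```

Only the row matched by the WHERE clause changes. In MySQL, CONCAT returns NULL when any argument is NULL, so a NULL description would stay NULL after this update.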

Always display the left of the join and display the right only if any and if it matches a clause [duplicate]

This question already has answers here:
Filter Table Before Applying Left Join
(4 answers)
Closed 5 years ago.
I have 3 tables, a client table, a user table and a user_has_client table.
The user_has_client table is there to join the 2 others, but it also has a roles column.
MariaDB [extrapack]> desc user;
+----------------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+----------------------+------+-----+---------+----------------+
| user_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| email | varchar(255) | NO | UNI | NULL | |
MariaDB [extrapack]> desc client;
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| client_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
MariaDB [extrapack]> desc user_has_client;
+-----------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+-------+
| user_id | int(10) unsigned | NO | PRI | NULL | |
| client_id | int(10) unsigned | NO | PRI | NULL | |
| roles | tinytext | YES | | NULL | |
+-----------+------------------+------+-----+---------+-------+
There may be multiple different roles for a client and a user, and the roles column is an array.
MariaDB [extrapack]> select * from user_has_client where roles != "" limit 3;
+---------+-----------+---------+
| user_id | client_id | roles |
+---------+-----------+---------+
| 181 | 395 | cpa, ce |
| 181 | 473 | cpa |
| 181 | 498 | cpa |
+---------+-----------+---------+
But a client can grant a given role to only one user. For example, two different users cannot both have the cpa role on the same client.
I would like to list one client, and for the client, list only the one user that has the role cpa if there is such a user.
Here is my statement:
SELECT c.client_id AS client_id0, ou.user_id AS user_id3, ou.email AS email5
FROM client c
LEFT JOIN user_has_client ouhc ON c.client_id = ouhc.client_id
LEFT JOIN user ou ON ouhc.user_id = ou.user_id AND ouhc.roles LIKE '%cpa%'
WHERE c.client_id = 265
ORDER BY ou.email DESC;
There may or may not be a joining record for a client and a user, but even so, I still want to display a list line for the client, so I cannot do an inner join and have to do a left join.
But doing a left join, I still want only one list line per client.
As of now, the above statement gives me n lines for the client, because n users have a join record on this client. But only one of these n users has the cpa role in its join record. So I want to display only one line, with that user.
So, to sum it up, I always want one line per client, and only one line, and with its user of the given role, say the cpa role, if any for that client.

I would probably change your plan up a little bit. Make user_has_client.roles only carry one role per row (the primary key would then need to include the role column as well). Then, if you want to only allow one holder of a role for each client, set a UNIQUE constraint on client_id and role.
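A hedged sketch of what that could look like, assuming the uniqueness rule is meant per client and role. The table name user_has_client_role, the role column, and the sample rows are hypothetical, and SQLite via Python's sqlite3 stands in for MariaDB. The role filter sits in the ON clause of the first LEFT JOIN, so a client without a cpa user still yields exactly one line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (client_id INTEGER PRIMARY KEY);
CREATE TABLE user (user_id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
-- Hypothetical normalized variant: one role per row, one holder per (client, role).
CREATE TABLE user_has_client_role (
    user_id INTEGER NOT NULL,
    client_id INTEGER NOT NULL,
    role TEXT NOT NULL,
    PRIMARY KEY (user_id, client_id, role),
    UNIQUE (client_id, role)
);
-- Invented sample rows
INSERT INTO client VALUES (265), (266);
INSERT INTO user VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO user_has_client_role VALUES (1, 265, 'ce'), (2, 265, 'cpa');
""")

# Filter the link table inside the join condition, before the outer join is applied.
lines = conn.execute("""
SELECT c.client_id, u.user_id, u.email
FROM client c
LEFT JOIN user_has_client_role r
       ON r.client_id = c.client_id AND r.role = 'cpa'
LEFT JOIN user u ON u.user_id = r.user_id
ORDER BY c.client_id
""").fetchall()
print(lines)

# The UNIQUE constraint rejects a second cpa holder on the same client.
try:
    conn.execute("INSERT INTO user_has_client_role VALUES (1, 265, 'cpa')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Because at most one row per client can match the cpa filter, the outer joins cannot multiply the client line.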

Is this possible in one fast mysql query?

I have three tables and I need different data from all of them. Sadly, I also need to be able to extract the latest row.
Here are my tables:
messages: I am just storing the content of the messages inside a table because one text could be sent to multiple users
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| message_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| content | varchar(255) | NO | | 0 | |
+------------+------------------+------+-----+---------+----------------+
conversations: This table just reflects a single conversation between two users.
+-----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+----------------+
| partner_id | int(10) unsigned | NO | MUL | NULL | |
| conversation_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| expedition_id | int(11) | NO | | NULL | |
| active | tinyint(4) | NO | | 1 | |
+-----------------+------------------+------+-----+---------+----------------+
conversation_messages: This table stores the information about the actual messages exchanged.
+-----------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+-------+
| message_id | int(11) unsigned | NO | PRI | NULL | |
| receiver_id | int(11) unsigned | NO | PRI | NULL | |
| conversation_id | int(11) unsigned | NO | MUL | NULL | |
| status | tinyint(4) | NO | | NULL | |
| timestamp | datetime | YES | | NULL | |
+-----------------+------------------+------+-----+---------+-------+
What I want to do now is to select the latest message inside each conversation and get the content of this message as well. (It sounds simple, but I did not find a simple solution.) What I tried is the following:
SELECT max(c_m.message_id), m.content, c_m.`status`
FROM expedition_conversations e_c, conversation_messages c_m
INNER JOIN messages m ON m.message_id = c_m.message_id
WHERE e_c.expedition_id = 1 AND (c_m.conversation_id = e_c.conversation_id)
GROUP BY c_m.conversation_id;
Sadly, since GROUP BY internally seems to select the first inserted row most of the time, the content I select from the messages table is wrong, while the message_id selected from conversation_messages is correct.
Any idea how to perform this in one query? If you have any suggestions to alter the table structure, I would also appreciate those.
Thanks in advance.

You may want to try this version:
SELECT c_m.message_id, m.content, c_m.`status`
FROM expedition_conversations e_c join
conversation_messages c_m
ON c_m.conversation_id = e_c.conversation_id INNER JOIN
messages m
ON m.message_id = c_m.message_id
WHERE e_c.expedition_id = 1 AND
NOT EXISTS (SELECT 1
FROM conversation_messages cm2
WHERE cm2.conversation_id = c_m.conversation_id AND
cm2.timestamp > c_m.timestamp
)
For performance, you want an index on conversation_messages(conversation_id, timestamp).
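To make the NOT EXISTS pattern concrete, here is a small sketch on invented rows, with SQLite (through Python's sqlite3) assumed as a stand-in for MySQL. Table and column names follow the query above; the index name cm_conv_ts and all data values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (message_id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE expedition_conversations (
    conversation_id INTEGER PRIMARY KEY, expedition_id INTEGER NOT NULL);
CREATE TABLE conversation_messages (
    message_id INTEGER NOT NULL, receiver_id INTEGER NOT NULL,
    conversation_id INTEGER NOT NULL, status INTEGER NOT NULL, timestamp TEXT,
    PRIMARY KEY (message_id, receiver_id));
-- Index suggested in the answer (the name is an assumption).
CREATE INDEX cm_conv_ts ON conversation_messages (conversation_id, timestamp);
-- Invented sample rows
INSERT INTO messages VALUES (1, 'hi'), (2, 'how are you?'), (3, 'hello'), (4, 'bye');
INSERT INTO expedition_conversations VALUES (10, 1), (11, 1);
INSERT INTO conversation_messages VALUES
    (1, 7, 10, 0, '2014-01-01 10:00:00'),
    (2, 8, 10, 0, '2014-01-01 11:00:00'),
    (3, 7, 11, 0, '2014-01-02 09:00:00'),
    (4, 8, 11, 1, '2014-01-02 09:30:00');
""")

# Keep a row only if no other row in the same conversation is newer.
latest_rows = conn.execute("""
SELECT c_m.message_id, m.content, c_m.status
FROM expedition_conversations e_c
JOIN conversation_messages c_m ON c_m.conversation_id = e_c.conversation_id
JOIN messages m ON m.message_id = c_m.message_id
WHERE e_c.expedition_id = 1
  AND NOT EXISTS (SELECT 1
                  FROM conversation_messages cm2
                  WHERE cm2.conversation_id = c_m.conversation_id
                    AND cm2.timestamp > c_m.timestamp)
ORDER BY c_m.conversation_id
""").fetchall()
print(latest_rows)
```

If two messages in a conversation share the latest timestamp, both are returned, so a tie-breaker on message_id may be needed.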

This is possible because your usage of AUTO_INCREMENT means that the highest id belongs to the latest message:
SELECT
messages.*
FROM
conversations
INNER JOIN (
SELECT conversation_id, MAX(message_id) AS maxmsgid
FROM conversation_messages
GROUP BY conversation_id
) AS latest ON latest.conversation_id=conversations.conversation_id
INNER JOIN messages
ON messages.message_id=latest.maxmsgid
WHERE
1=1 -- whatever you want or need!
Since this query is bound to be quite slow, you might want to consider a few options:
Throw hardware at it: use enough RAM and configure MySQL to go to disk for the interim table as late as possible
Use denormalization: have an AFTER INSERT trigger on conversation_messages update a column on conversations that holds the latest message ID
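The MAX(message_id) join can be sketched on invented rows as follows, with SQLite via Python's sqlite3 assumed as a stand-in for MySQL and with conversations joined on its conversation_id column as shown in the question's table description. It only gives the latest message if a new messages row is created every time something is sent, which is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    message_id INTEGER PRIMARY KEY AUTOINCREMENT, content TEXT NOT NULL);
CREATE TABLE conversations (
    conversation_id INTEGER PRIMARY KEY, expedition_id INTEGER NOT NULL);
CREATE TABLE conversation_messages (
    message_id INTEGER NOT NULL, receiver_id INTEGER NOT NULL,
    conversation_id INTEGER NOT NULL,
    PRIMARY KEY (message_id, receiver_id));
-- Invented sample rows; the messages get ids 1, 2, 3 in insertion order.
INSERT INTO messages (content) VALUES ('first'), ('second'), ('third');
INSERT INTO conversations VALUES (10, 1), (11, 1);
INSERT INTO conversation_messages VALUES (1, 7, 10), (2, 8, 10), (3, 7, 11);
""")

# The derived table finds the highest message id per conversation once,
# then the outer joins pick up the matching message row.
latest_by_id = conn.execute("""
SELECT c.conversation_id, m.message_id, m.content
FROM conversations c
INNER JOIN (
    SELECT conversation_id, MAX(message_id) AS maxmsgid
    FROM conversation_messages
    GROUP BY conversation_id
) AS latest ON latest.conversation_id = c.conversation_id
INNER JOIN messages m ON m.message_id = latest.maxmsgid
WHERE c.expedition_id = 1
ORDER BY c.conversation_id
""").fetchall()
print(latest_by_id)
```

Since the question says one text can be sent to several users, an old message that is re-sent later would break the "highest id is latest" assumption; the timestamp column is the safer ordering in that case.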

Try this little trick:
SELECT c_m.message_id, m.content, c_m.status
FROM expedition_conversations e_c
JOIN (select * from (
select * from conversation_messages
order by message_id desc) x
group by conversation_id) c_m ON c_m.conversation_id = e_c.conversation_id
INNER JOIN messages m ON m.message_id = c_m.message_id
WHERE e_c.expedition_id = 1
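This is reported to work on your version of MySQL (5.6.19) and may out-perform other approaches, but it relies on MySQL's non-standard GROUP BY handling: the server is free to pick any row from each group, so the result is not guaranteed, and the query is rejected when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5).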

Query taking very long (Explain included)

Goal of query:
Display race by district.
Query:
SELECT school_data_schools_outer.district_id,
school_data_race_ethnicity_raw_outer.year,
school_data_race_ethnicity_raw_outer.race,
ROUND(
SUM( school_data_race_ethnicity_raw_outer.count) /
(SELECT SUM(count)
FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_inner
INNER JOIN school_data_schools as school_data_schools_inner
USING (school_id)
WHERE school_data_schools_outer.district_id = school_data_schools_inner.district_id
AND school_data_race_ethnicity_raw_outer.year = school_data_race_ethnicity_raw_inner.year) * 100, 2)
FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_outer
INNER JOIN school_data_schools as school_data_schools_outer USING (school_id)
GROUP BY school_data_schools_outer.district_id,
school_data_race_ethnicity_raw_outer.year,
school_data_race_ethnicity_raw_outer.race
mysql> explain SELECT school_data_schools_outer.district_id, school_data_race_ethnicity_raw_outer.year, school_data_race_ethnicity_raw_outer.race,ROUND(SUM(school_data_race_ethnicity_raw_outer.count)/( SELECT SUM(count) FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_inner INNER JOIN school_data_schools as school_data_schools_inner USING (school_id) WHERE school_data_schools_outer.district_id = school_data_schools_inner.district_id and school_data_race_ethnicity_raw_outer.year = school_data_race_ethnicity_raw_inner.year ) * 100,2) FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_outer INNER JOIN school_data_schools as school_data_schools_outer USING (school_id) GROUP BY school_data_schools_outer.district_id, school_data_race_ethnicity_raw_outer.year, school_data_race_ethnicity_raw_outer.race;
+----+--------------------+--------------------------------------+--------+----------------------------+---------+---------+----------------------------------------------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------------------------+--------+----------------------------+---------+---------+----------------------------------------------------------------------+-------+---------------------------------+
| 1 | PRIMARY | school_data_race_ethnicity_raw_outer | ALL | school_id,school_id_2 | NULL | NULL | NULL | 84012 | Using temporary; Using filesort |
| 1 | PRIMARY | school_data_schools_outer | eq_ref | PRIMARY | PRIMARY | 257 | rocdocs_main_drupal_7.school_data_race_ethnicity_raw_outer.school_id | 1 | |
| 2 | DEPENDENT SUBQUERY | school_data_race_ethnicity_raw_inner | ref | school_id,year,school_id_2 | year | 4 | func | 8402 | |
| 2 | DEPENDENT SUBQUERY | school_data_schools_inner | eq_ref | PRIMARY | PRIMARY | 257 | rocdocs_main_drupal_7.school_data_race_ethnicity_raw_inner.school_id | 1 | Using where |
+----+--------------------+--------------------------------------+--------+----------------------------+---------+---------+----------------------------------------------------------------------+-------+---------------------------------+
4 rows in set (0.00 sec)
mysql>
mysql> describe school_data_race_ethnicity_raw;
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| school_id | varchar(255) | NO | MUL | NULL | |
| year | int(11) | NO | MUL | NULL | |
| race | varchar(255) | NO | | NULL | |
| count | int(11) | NO | | NULL | |
+-----------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> describe school_data_schools;
+-------------+----------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------------+------+-----+---------+-------+
| school_id | varchar(255) | NO | PRI | NULL | |
| grade_level | varchar(255) | NO | | NULL | |
| district_id | varchar(255) | NO | | NULL | |
| school_name | varchar(255) | NO | | NULL | |
| address | varchar(255) | NO | | NULL | |
| city | varchar(255) | NO | | NULL | |
| lat | decimal(20,10) | NO | | NULL | |
| lon | decimal(20,10) | NO | | NULL | |
+-------------+----------------+------+-----+---------+-------+
8 rows in set (0.00 sec)
NOTE: I also have tried:
select sds.school_id,
detail.year,
detail.race,
ROUND((detail.count / summary.total) * 100 ,2) as percent
FROM school_data_race_ethnicity_raw as detail
inner join school_data_schools as sds USING (school_id)
inner join (
select sds2.district_id, year, sum(count) as total
from school_data_race_ethnicity_raw
inner join school_data_schools as sds2 USING (school_id)
group by sds2.district_id, year
) as summary on summary.district_id = sds.district_id
and summary.year = detail.year

This is slow because:
You have no index in use on school_data_race_ethnicity_raw_outer, so it's scanning each of the ~84,000 rows
You are using a correlated subquery which means that your complex calculation has to be run once per row i.e. 84,000 times.
The best approach is not to use a correlated subquery, but if not, then to make it go fast, you need to use covering indexes so that the whole of that inner query (and the other parts via their own indexes) can be run lightning fast using just the index. For a great tutorial on the subject of indexes, check this out. It taught me a lot! Right now, your inner query just uses the year index on school_data_race_ethnicity_raw, so it has to look up the rest of the stuff it needs by reading 8000 rows for every one of the 84000 calculations. Indexes will make this far faster e.g. create a composite index on school_data_race_ethnicity_raw and you will find it helps:
CREATE INDEX inner_composite ON school_data_race_ethnicity_raw (year, school_id, count);
This will allow all the fields used in the WHERE to be gotten from the index, then the join field, then the field you want for the select. You should see it show up in the 'key' column of your explain result. Also, if you get it right, you'll see 'using index' in the right-most column, showing that no table access is happening, which is orders of magnitude faster.
You can experiment quick-and-dirty style by adding loads of indexes for the columns that the query mentions and see what gets picked up in the key column. If something appears, read your query to see what other columns from that table are in use, then add a new index with those columns added in too on the right hand side and see if that works better. Remember to delete the unused indexes once you find out what works.
MySQL doesn't allow you to directly index the SUM of a column, which would be the fastest way, so unless you want to move to another DB (good idea if you can), this will always be a little slow.
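One way to avoid the correlated subquery is to aggregate twice and join the two results, so the district totals are computed once per district and year instead of once per row. The sketch below is a hedged illustration on invented rows, with SQLite via Python's sqlite3 assumed as a stand-in for MySQL and only the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE school_data_schools (
    school_id TEXT PRIMARY KEY, district_id TEXT NOT NULL);
CREATE TABLE school_data_race_ethnicity_raw (
    id INTEGER PRIMARY KEY, school_id TEXT NOT NULL, year INTEGER NOT NULL,
    race TEXT NOT NULL, count INTEGER NOT NULL);
-- Invented sample rows: district d1 has 100 students in 2012.
INSERT INTO school_data_schools VALUES ('s1', 'd1'), ('s2', 'd1');
INSERT INTO school_data_race_ethnicity_raw (school_id, year, race, count) VALUES
    ('s1', 2012, 'A', 30), ('s1', 2012, 'B', 10),
    ('s2', 2012, 'A', 30), ('s2', 2012, 'B', 30);
""")

# detail: students per district, year and race.
# summary: students per district and year.
# 100.0 forces floating-point division in SQLite.
percentages = conn.execute("""
SELECT detail.district_id, detail.year, detail.race,
       ROUND(detail.race_total * 100.0 / summary.total, 2) AS percent
FROM (
    SELECT s.district_id, r.year, r.race, SUM(r.count) AS race_total
    FROM school_data_race_ethnicity_raw r
    INNER JOIN school_data_schools s USING (school_id)
    GROUP BY s.district_id, r.year, r.race
) AS detail
INNER JOIN (
    SELECT s.district_id, r.year, SUM(r.count) AS total
    FROM school_data_race_ethnicity_raw r
    INNER JOIN school_data_schools s USING (school_id)
    GROUP BY s.district_id, r.year
) AS summary
   ON summary.district_id = detail.district_id
  AND summary.year = detail.year
ORDER BY detail.district_id, detail.year, detail.race
""").fetchall()
print(percentages)
```

Each derived table scans the raw table once, so the work no longer grows with the number of outer rows times the size of the inner query.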

This should be all you need to aggregate your data to get a count of race by district, not sure why you are doing so much math in your original, as it is unnecessary to achieve your goal, and is forcing some crazy sub queries.
SELECT SUM(students.count) as studentCount, schools.district_id, students.race
FROM school_data_schools schools,
school_data_race_ethnicity_raw students
WHERE schools.school_id = students.school_id
GROUP BY district_id, race
You probably also want an index on school_data_race_ethnicity_raw.school_id (alone, not as part of a multiple column key)
EDIT was not aware OP was looking for a percentage breakdown, and not just totals
SELECT ((studentCount / districtTotal) * 100) as percentage, district_id, race
FROM(
SELECT SUM(students.count) as studentCount, schools.district_id, students.race,
(SELECT SUM(inStudents.count)
FROM school_data_schools inSchools,
school_data_race_ethnicity_raw inStudents
WHERE inSchools.school_id = inStudents.school_id
AND inSchools.district_id = schools.district_id
GROUP BY inSchools.district_id) as districtTotal
FROM school_data_schools schools,
school_data_race_ethnicity_raw students
WHERE schools.school_id = students.school_id
GROUP BY district_id, race
) table1
This should run pretty quickly, but you still need to make sure there is an index on school_data_race_ethnicity_raw.school_id that is not part of a multiple-column index. I tried it on a rather small test case, and it does seem to check out.

MySQL: return field for which no related entries exist in another table

First, sorry for the title, as I'm no native english-speaker, this is pretty hard to phrase. In other words, what I'm trying to achieve is this:
I'm trying to fetch all domain names from the table virtual_domains where there is no corresponding entry in the virtual_aliases table starting like "postmaster#%".
So if I have two domains:
foo.org
example.org
And they have aliases like:
info#foo.org => admin#foo.org
postmaster#foo.org => user1#foo.org
info#example.org => admin#example.org
I want the query to return only the domain "example.org", as it is missing the postmaster alias.
This is the table layout:
mysql> show columns from virtual_aliases;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| domain_id | int(11) | NO | MUL | NULL | |
| source | varchar(100) | NO | | NULL | |
| destination | varchar(100) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
mysql> show columns from virtual_domains;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(50) | NO | | NULL | |
+-------+-------------+------+-----+---------+----------------+
I tried for many hours with IF, CASE, LIKE queries with no success. I don't need a final solution, maybe just a hint with some explanation. Thanks!

SELECT * FROM virtual_domains AS domains
LEFT JOIN virtual_aliases AS aliases
ON domains.id = aliases.domain_id
WHERE aliases.domain_id IS NULL
LEFT JOIN returns all records from the "left" table, even if they have no corresponding records in the "right" table. Those records will have the right table's fields set to NULL. Use WHERE to strip all the others.
I guess I didn't understand you correctly the first time. You have several entries in aliases for a single domain, and you want to display only those domains that don't have an entry in the aliases table that starts with "postmaster"?
In this case you should use NOT IN like this:
SELECT * FROM virtual_domains AS domains
WHERE domains.id NOT IN (
SELECT domain_id
FROM virtual_aliases
WHERE source LIKE "postmaster#%"
)
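A minimal runnable sketch of that NOT IN query, following the stated goal (domains with no postmaster alias) and using the sample domains and aliases from the question. It assumes the alias address lives in the source column and uses SQLite via Python's sqlite3 as a stand-in for MySQL. A LEFT JOIN anti-join with the filter in the ON clause is shown next to it for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE virtual_domains (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE virtual_aliases (
    id INTEGER PRIMARY KEY, domain_id INTEGER NOT NULL,
    source TEXT NOT NULL, destination TEXT NOT NULL);
INSERT INTO virtual_domains VALUES (1, 'foo.org'), (2, 'example.org');
INSERT INTO virtual_aliases (domain_id, source, destination) VALUES
    (1, 'info#foo.org', 'admin#foo.org'),
    (1, 'postmaster#foo.org', 'user1#foo.org'),
    (2, 'info#example.org', 'admin#example.org');
""")

# Domains whose id never appears among the postmaster aliases.
not_in_names = [row[0] for row in conn.execute("""
SELECT d.name
FROM virtual_domains d
WHERE d.id NOT IN (
    SELECT a.domain_id
    FROM virtual_aliases a
    WHERE a.source LIKE 'postmaster#%'
)
ORDER BY d.name
""")]

# Same result as an anti-join: join only postmaster aliases, keep unmatched domains.
anti_join_names = [row[0] for row in conn.execute("""
SELECT d.name
FROM virtual_domains d
LEFT JOIN virtual_aliases a
       ON a.domain_id = d.id AND a.source LIKE 'postmaster#%'
WHERE a.id IS NULL
ORDER BY d.name
""")]
print(not_in_names, anti_join_names)
```

The filter has to sit in the ON clause of the LEFT JOIN; moving it to WHERE would discard the unmatched domains that the query is looking for.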

select id, name from virtual_domains
where id not in (select domain_id from virtual_aliases)

SELECT vd.* FROM virtual_domains vd
LEFT JOIN virtual_aliases va ON vd.id = va.domain_id
AND va.source LIKE 'postmaster#%'
WHERE va.id IS NULL;
