SQL query to find missing entries in MySQL

I have a database in which I need to find some missing entries and fill them in.
I have a table called "menu". Each restaurant has multiple dishes, and each dish has 4 different language entries (actually 8 in the main database, but for simplicity let's go with 4). I need to find out which dishes for a particular restaurant are missing any language entries.
select * from menu where restaurantid = 1
I get stuck there. I need something along the lines of "where language 1 or 2 or 3 or 4 doesn't exist", which is the complicated bit: I need to see the languages that exist in order to work out which one is missing, because I can't display something that isn't there. I hope that makes sense?
In the example table below, restaurant 2, dishid 2 is missing language 3; that's what I need to find.
+--------------+--------+----------+----------+
| RestaurantID | DishID | DishName | Language |
+--------------+--------+----------+----------+
|            1 |      1 | Soup     |        1 |
|            1 |      1 | Soúp     |        2 |
|            1 |      1 | Soupe    |        3 |
|            1 |      1 | Soupa    |        4 |
|            1 |      2 | Bread    |        1 |
|            1 |      2 | Bréad    |        2 |
|            1 |      2 | Breade   |        3 |
|            1 |      2 | Breada   |        4 |
|            2 |      1 | Dish1    |        1 |
|            2 |      1 | Dísh1    |        2 |
|            2 |      1 | Disha1   |        3 |
|            2 |      1 | Dishe1   |        4 |
|            2 |      2 | Dish2    |        1 |
|            2 |      2 | Dísh2    |        2 |
|            2 |      2 | Dishe2   |        4 |
+--------------+--------+----------+----------+

An anti-join pattern usually performs well for this kind of problem.
Your particular case is a little more tricky, in that you need to "generate" the rows that are missing. If every (RestaurantID, DishID) should have 4 rows, with Language values of 1, 2, 3 and 4, we can generate that set of all rows with a CROSS JOIN operation.
The next step is to apply an anti-join: a LEFT OUTER JOIN to the rows that exist in the menu table, so we get all the rows from the CROSS JOIN set, along with any matching rows.
The "trick" is to use a predicate in the WHERE clause that filters out rows where we found a match, so we are left with rows that didn't have a match.
(It seems a bit strange at first, but once you get your brain wrapped around the anti-join pattern, it becomes familiar.)
So a query of this form should return the specified result set.
SELECT d.RestaurantID
     , d.DishID
     , lang.id AS missing_language
  FROM ( SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
       ) lang
 CROSS
  JOIN ( SELECT e.RestaurantID, e.DishID
           FROM menu e
          GROUP BY e.RestaurantID, e.DishID
       ) d
  LEFT
  JOIN menu m
    ON m.RestaurantID = d.RestaurantID
   AND m.DishID = d.DishID
   AND m.Language = lang.id
 WHERE m.RestaurantID IS NULL
 ORDER BY 1,2,3
Let's unpack that a bit.
First we get a set containing the numbers 1 through 4.
Next we get a set containing the distinct (RestaurantID, DishID) tuples. (For each distinct Restaurant, a distinct list of DishID values, as long as there is at least one row for any Language for that combination.)
We do a CROSS JOIN, matching every row from set one (lang) with every row from set two (d), to generate a "complete" set of every (RestaurantID, DishID, Language) we want to have.
The next part is the anti-join: the left outer join to menu finds which rows from the "complete" set have a matching row in menu, and the WHERE clause filters out all the rows that had a match.
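To see the CROSS JOIN plus anti-join on the sample data, here is a minimal sketch that loads the table into an in-memory SQLite database through Python's sqlite3 module and runs the same query shape. SQLite stands in for MySQL only so the example is self-contained. The helper name find_missing_languages is hypothetical, and the data assumes the 'Breada' row belongs to dish 2 of restaurant 1.

```python
import sqlite3

# Sample rows from the question: (RestaurantID, DishID, DishName, Language).
# Assumption: the 'Breada' row belongs to dish 2 of restaurant 1.
MENU_ROWS = [
    (1, 1, "Soup", 1), (1, 1, "Soúp", 2), (1, 1, "Soupe", 3), (1, 1, "Soupa", 4),
    (1, 2, "Bread", 1), (1, 2, "Bréad", 2), (1, 2, "Breade", 3), (1, 2, "Breada", 4),
    (2, 1, "Dish1", 1), (2, 1, "Dísh1", 2), (2, 1, "Disha1", 3), (2, 1, "Dishe1", 4),
    (2, 2, "Dish2", 1), (2, 2, "Dísh2", 2), (2, 2, "Dishe2", 4),
]

ANTI_JOIN_SQL = """
SELECT d.RestaurantID, d.DishID, lang.id AS missing_language
  FROM (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) lang
 CROSS JOIN (SELECT e.RestaurantID, e.DishID
               FROM menu e
              GROUP BY e.RestaurantID, e.DishID) d
  LEFT JOIN menu m
    ON m.RestaurantID = d.RestaurantID
   AND m.DishID = d.DishID
   AND m.Language = lang.id
 WHERE m.RestaurantID IS NULL   -- keep only the generated rows with no match
 ORDER BY 1, 2, 3
"""


def find_missing_languages(menu_rows):
    """Return (RestaurantID, DishID, Language) tuples that menu lacks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE menu (RestaurantID INTEGER, DishID INTEGER, "
        "DishName TEXT, Language INTEGER)"
    )
    conn.executemany("INSERT INTO menu VALUES (?, ?, ?, ?)", menu_rows)
    rows = conn.execute(ANTI_JOIN_SQL).fetchall()
    conn.close()
    return rows


missing = find_missing_languages(MENU_ROWS)
print(missing)
```

The 4 x 4 generated combinations are compared against the 15 stored rows, so only the combination that has no stored row survives the IS NULL filter.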
That may be a little confusing. If we think of that CROSS JOIN operation producing a temporary table that looks like the menu table, but containing all possible rows... we can think of it in terms of pseudocode:
create temporary table all_menu_rows (RestaurantID, DishID, Language) ;
insert into all_menu_rows ... all possible rows, combinations ;
Then the anti-join pattern is a little easier to see:
SELECT r.RestaurantID
     , r.DishID
     , r.Language
  FROM all_menu_rows r
  LEFT
  JOIN menu m
    ON m.RestaurantID = r.RestaurantID
   AND m.DishID = r.DishID
   AND m.Language = r.Language
 WHERE m.RestaurantID IS NULL
 ORDER BY 1,2,3
(But we don't have to incur the extra overhead of creating and populating the temporary table, we can do that right in the query.)
Of course, this isn't the only approach. We could use a NOT EXISTS predicate instead of an anti-join, though this is not usually as efficient. The first part of the query is the same, to generate the "complete" set of rows we expect to have; what differs is how we identify whether or not there is a matching row in the menu table:
SELECT d.RestaurantID
     , d.DishID
     , lang.id AS missing_language
  FROM ( SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
       ) lang
 CROSS
  JOIN ( SELECT e.RestaurantID, e.DishID
           FROM menu e
          GROUP BY e.RestaurantID, e.DishID
       ) d
 WHERE NOT EXISTS ( SELECT 1
                      FROM menu m
                     WHERE m.RestaurantID = d.RestaurantID
                       AND m.DishID = d.DishID
                       AND m.Language = lang.id
                  )
 ORDER BY 1,2,3
For each row in the "complete" set (generated by the CROSS JOIN operation), we're going to run a correlated subquery that checks whether a matching row is found. The NOT EXISTS predicate returns TRUE if no matching row is found. (This is a little easier to understand, but it usually doesn't perform as well as the anti-join pattern.)
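As a quick check of the NOT EXISTS form, the sketch below runs it against the restaurant 2 rows from the question. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption made purely to keep the example runnable on its own), and the variable names are illustrative.

```python
import sqlite3

# Restaurant 2 rows from the question; dish 2 has no Language 3 entry.
MENU_ROWS = [
    (2, 1, "Dish1", 1), (2, 1, "Dísh1", 2), (2, 1, "Disha1", 3), (2, 1, "Dishe1", 4),
    (2, 2, "Dish2", 1), (2, 2, "Dísh2", 2), (2, 2, "Dishe2", 4),
]

NOT_EXISTS_SQL = """
SELECT d.RestaurantID, d.DishID, lang.id AS missing_language
  FROM (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) lang
 CROSS JOIN (SELECT e.RestaurantID, e.DishID
               FROM menu e
              GROUP BY e.RestaurantID, e.DishID) d
 WHERE NOT EXISTS (SELECT 1            -- correlated subquery, one probe per generated row
                     FROM menu m
                    WHERE m.RestaurantID = d.RestaurantID
                      AND m.DishID = d.DishID
                      AND m.Language = lang.id)
 ORDER BY 1, 2, 3
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE menu (RestaurantID INTEGER, DishID INTEGER, "
    "DishName TEXT, Language INTEGER)"
)
conn.executemany("INSERT INTO menu VALUES (?, ?, ?, ?)", MENU_ROWS)
not_exists_rows = conn.execute(NOT_EXISTS_SQL).fetchall()
conn.close()
print(not_exists_rows)
```

Which form is faster depends on the server version and the available indexes, so it is worth comparing the EXPLAIN output of both on the real table.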

You can use the following statement if each menu item should have a record in each language (8 in real life, 4 in the example). It lists the dishes that have fewer than 4 entries, though not which languages are missing. Change the number 4 to 8 to see all menu items per restaurant that don't have all 8 entries.
SELECT RestaurantID,DishID, COUNT( * )
FROM Menu
GROUP BY RestaurantID,DishID
HAVING COUNT( * ) <4
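The count-based check can be tried on the restaurant 2 rows with the sketch below. It runs in an in-memory SQLite database via Python's sqlite3 module as a stand-in for MySQL; the function name incomplete_dishes and its expected_languages parameter are hypothetical. It assumes there are no duplicate (RestaurantID, DishID, Language) rows, since a duplicate would inflate COUNT(*) and hide a gap.

```python
import sqlite3

# Restaurant 2 rows from the question; dish 2 has only three language entries.
MENU_ROWS = [
    (2, 1, "Dish1", 1), (2, 1, "Dísh1", 2), (2, 1, "Disha1", 3), (2, 1, "Dishe1", 4),
    (2, 2, "Dish2", 1), (2, 2, "Dísh2", 2), (2, 2, "Dishe2", 4),
]

COUNT_SQL = """
SELECT RestaurantID, DishID, COUNT(*) AS entries
  FROM menu
 GROUP BY RestaurantID, DishID
HAVING COUNT(*) < ?
 ORDER BY RestaurantID, DishID
"""


def incomplete_dishes(menu_rows, expected_languages):
    """Return (RestaurantID, DishID, entry count) for dishes short of entries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE menu (RestaurantID INTEGER, DishID INTEGER, "
        "DishName TEXT, Language INTEGER)"
    )
    conn.executemany("INSERT INTO menu VALUES (?, ?, ?, ?)", menu_rows)
    rows = conn.execute(COUNT_SQL, (expected_languages,)).fetchall()
    conn.close()
    return rows


short_dishes = incomplete_dishes(MENU_ROWS, 4)
print(short_dishes)
```

Here the third value in each tuple is the number of entries found, not a language id.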

Related

empty MySQL concatenated join causes query failure [duplicate]

I am having trouble with a MySQL query (version 5.6.37). I think it merely needs the query components reorganized, but I cannot make it work.
Problem: when my JOIN returns no rows, the entire query returns no rows, even though the data matches the query.
Here is my current query (where '#' is the input):
SELECT pets.id,pet,collar,GROUP_CONCAT(petData.fleas) AS f_id
FROM pets
JOIN petWear ON pets.id = petWear.id
JOIN petData ON petWear.id = petData.id
WHERE pets.id = '#'
GROUP BY pets.id,pet,collar
Assuming a "pets" table like this:
id | pet
1 | cat
2 | dog
3 | fish
4 | snake
5 | rabbit
And a JOINed "petData" table like this:
id | fleas
1 | 1
1 | 2
1 | 3
1 | 4
2 | 5
Successful query:
If # = 1, then the query returns a single result:
id | pet | collar | f_id
1 | cat | gold | 1,2,3,4
Unsuccessful query:
If # = 5, then the query returns no result.
What I would like to have returned (for # = 5) is this single result (i.e. no result, or NULL, for "f_id"):
id | pet | collar | f_id
5 | rabbit | red |
Note that I have included the petWear table with the "collar" listing, just to state that a "normal" join needs to also be part of the picture.
Try the following code. The LEFT JOIN to petData keeps pets that have no flea rows, and if GROUP_CONCAT then returns NULL, IFNULL prints no_fleas instead:
SELECT pets.id, pet, collar, IFNULL(GROUP_CONCAT(petData.fleas), 'no_fleas') AS f_id
FROM pets
JOIN petWear ON pets.id = petWear.id
LEFT JOIN petData ON petWear.id = petData.id
WHERE pets.id = '#'
GROUP BY pets.id, pet, collar
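The effect of an outer join on petData can be checked with a small sketch using Python's sqlite3 module and an in-memory SQLite database as a stand-in for MySQL. The petWear layout (id, collar) and the function name pet_with_fleas are assumptions; only the two collars mentioned in the question are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pets    (id INTEGER PRIMARY KEY, pet TEXT);
CREATE TABLE petWear (id INTEGER, collar TEXT);   -- assumed layout
CREATE TABLE petData (id INTEGER, fleas INTEGER);
INSERT INTO pets VALUES (1,'cat'),(2,'dog'),(3,'fish'),(4,'snake'),(5,'rabbit');
INSERT INTO petWear VALUES (1,'gold'),(5,'red');
INSERT INTO petData VALUES (1,1),(1,2),(1,3),(1,4),(2,5);
""")

QUERY = """
SELECT pets.id, pet, collar,
       IFNULL(GROUP_CONCAT(petData.fleas), 'no_fleas') AS f_id
  FROM pets
  JOIN petWear ON pets.id = petWear.id            -- a collar is required
  LEFT JOIN petData ON petWear.id = petData.id    -- flea rows are optional
 WHERE pets.id = ?
 GROUP BY pets.id, pet, collar
"""


def pet_with_fleas(pet_id):
    """Return the pet row with its concatenated flea ids, or 'no_fleas'."""
    return conn.execute(QUERY, (pet_id,)).fetchall()


cat_rows = pet_with_fleas(1)
rabbit_rows = pet_with_fleas(5)
print(cat_rows)
print(rabbit_rows)
```

With an inner join to petData the rabbit row would be dropped before grouping, so IFNULL would never get a chance to run.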

MySQL. Group matches in columns by a common id

I am coding a search page with multiple filters and I am wondering if this is the best approach to get the results.
Each result of the search has several attributes, here I am using two attributes to simplify the example.
The main 'items' table:
id_items
1
2
The 'languages' table:
id_languages | language_code
1 | es
2 | en
The 'attributes_one' table:
id_attributes_one
1
2
The 'attributes_one_translations' table:
id_attributes_one_translations | id_attributes_one | id_language_code | translation
1 | 1 | 1 | Oro
2 | 1 | 2 | Gold
3 | 2 | 1 | Plata
4 | 2 | 2 | Silver
The 'attributes_one_match' table:
id_attributes_one_match | id_attributes_one | id_items
1 | 1 | 1
2 | 2 | 1
3 | 1 | 2
The 'attributes_two' table:
id_attributes_two
1
The 'attributes_two_translations' table:
id_attributes_two_translations | id_attributes_two | id_language_code | translation
1 | 1 | 2 | 99% gold
The 'attributes_two_match' table:
id_attributes_two_match | id_attributes_two | id_items
1 | 1 | 1
The concept is that one item can have 0 or more matches in each attribute table, and each match can have 0 or more translations.
Here is the query I am using when the user selects the filters to get all the items that have the attribute_one 'Gold' or 'Silver', ordered by this attribute ascending:
SELECT
    i.id_items AS id,
    GROUP_CONCAT(DISTINCT aot.translation ORDER BY aot.translation DESC SEPARATOR '!¡') AS attribute_one,
    GROUP_CONCAT(DISTINCT att.translation ORDER BY att.translation DESC SEPARATOR '!¡') AS attribute_two
FROM
    items i
        LEFT JOIN
    languages AS l ON l.language_code = 'en'
        LEFT JOIN
    attributes_one_match AS aom ON aom.id_items = i.id_items
        LEFT JOIN
    attributes_one_translations AS aot ON aot.id_attributes_one = aom.id_attributes_one
        AND l.id_languages = aot.id_language_code
        AND (MATCH (aot.translation) AGAINST ('"Gold"' IN BOOLEAN MODE)
        OR MATCH (aot.translation) AGAINST ('"Silver"' IN BOOLEAN MODE))
        LEFT JOIN
    attributes_one AS ao ON ao.id_attributes_one = aom.id_attributes_one
        LEFT JOIN
    attributes_two_match AS atm ON atm.id_items = i.id_items
        LEFT JOIN
    attributes_two_translations AS att ON att.id_attributes_two = atm.id_attributes_two
        AND l.id_languages = att.id_language_code
        LEFT JOIN
    attributes_two AS at ON at.id_attributes_two = atm.id_attributes_two
GROUP BY id
ORDER BY 2 ASC
The result I get is:
id | attribute_one | attribute_two
2 | Gold | null
1 | Silver!¡Gold | 99% gold
That result is what I was expecting. Now:
* The items table will have around 300k entries once the database is filled.
* There are 28 attribute tables to match with the items. Each attribute table will have around 20k entries, and each translation table will have twice as many entries as the table it translates.
* Each item will have from 0 to 20 matches in each attribute table, so I don't think I will have problems using the GROUP_CONCAT function.
I am concerned about performance because the search filter page updates itself via AJAX each time the user changes one of the filters (it updates both the filters and the results). The maximum number of results per page will be 1000 items; I didn't put the LIMIT in the example query.
I am not an SQL expert, so I don't really know if what I am doing is the best approach. I would appreciate some feedback.
Thanks a lot!

Join two tables using multiple rows in the join

I have two tables
Table: color_document
+----------+---------------------+
| color_id | document_id |
+----------+---------------------+
| 180907 | 4270851 |
| 180954 | 4270851 |
+----------+---------------------+
Table: color_group
+----------------+-----------+
| color_group_id | color_id |
+----------------+-----------+
| 3 | 180954 |
| 4 | 180907 |
| 11 | 180907 |
| 11 | 180984 |
| 12 | 180907 |
| 12 | 180954 |
+----------------+-----------+
Is it possible for a query to get a result that looks something like this using multiple color id's to join the two tables?
Result
+----------------+--------------+
| color_group_id | document_id |
+----------------+--------------+
| 12 | 4270851 |
+----------------+--------------+
Color Group 12 is the only group that has exactly the same set of colors as Document 4270851.
I've got some bad data that I'm being forced to work with, so I've had to manufacture the color groups by finding each unique set of color_ids associated with document_ids. I'm now trying to create a new relationship directly between my manufactured color groups and documents.
I know I could probably do something with a GROUP_CONCAT to make a pseudo key of concatenated color ids, but I'm trying to find a solution that would also work in, say, Oracle. Am I barking up the completely wrong tree with this logic?
My ultimate goal is to be able to have a single row in a table that would represent any number of Colors that are associated with a Document to be exported to a completely different system than the one I'm working with.
Any thoughts/comments/suggestions are greatly appreciated.
Thank you in advance for looking at my question.
Do a normal join of the two tables, and count the number of rows in each pairing. Then test whether this is the same as the number of times each of the items appears in the original tables. If all are the same, then all color IDs must match.
SELECT a.color_group_id, a.document_id
FROM (
    SELECT color_group_id, document_id, COUNT(*) ct
    FROM color_document d
    JOIN color_group g ON d.color_id = g.color_id
    GROUP BY color_group_id, document_id) a
JOIN (
    SELECT color_group_id, COUNT(*) ct
    FROM color_group
    GROUP BY color_group_id) b
ON a.color_group_id = b.color_group_id and a.ct = b.ct
JOIN (
    SELECT document_id, COUNT(*) ct
    FROM color_document
    GROUP BY document_id) c
ON a.document_id = c.document_id and a.ct = c.ct
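The counting argument can be verified on the sample data with the sketch below, which loads both tables into an in-memory SQLite database through Python's sqlite3 module (a stand-in for MySQL; the query itself is plain SQL). It assumes neither table contains duplicate rows, because duplicates would distort the counts being compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE color_document (color_id INTEGER, document_id INTEGER);
CREATE TABLE color_group    (color_group_id INTEGER, color_id INTEGER);
INSERT INTO color_document VALUES (180907, 4270851), (180954, 4270851);
INSERT INTO color_group VALUES
    (3, 180954), (4, 180907),
    (11, 180907), (11, 180984),
    (12, 180907), (12, 180954);
""")

EXACT_MATCH_SQL = """
SELECT a.color_group_id, a.document_id
  FROM (SELECT color_group_id, document_id, COUNT(*) AS ct   -- shared colors
          FROM color_document d
          JOIN color_group g ON d.color_id = g.color_id
         GROUP BY color_group_id, document_id) a
  JOIN (SELECT color_group_id, COUNT(*) AS ct                -- colors per group
          FROM color_group
         GROUP BY color_group_id) b
    ON a.color_group_id = b.color_group_id AND a.ct = b.ct
  JOIN (SELECT document_id, COUNT(*) AS ct                   -- colors per document
          FROM color_document
         GROUP BY document_id) c
    ON a.document_id = c.document_id AND a.ct = c.ct
"""

exact_matches = conn.execute(EXACT_MATCH_SQL).fetchall()
conn.close()
print(exact_matches)
```

Groups 3 and 4 share one color with the document but the document has two, and group 11 has two colors but shares only one, so all three fail one of the two count comparisons.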
If I understand your question correctly, you just have to join the two tables and then group the results by color_group_id and document_id.
select color_group_id, document_id
from
color_document cd join
color_group cg
on cd.color_id = cg.color_id
group by color_group_id, document_id
That query will give you this result set:
COLOR_GROUP_ID DOCUMENT_ID
3 4270851
4 4270851
11 4270851
12 4270851
Is that what you want?

Finding co-occurring values in a MySQL weak relation table

I have a weak relation table called header. It is basically just three IDs: id is an auto-increment primary key, did points to the id of table D, and hid points to the id of table H. D and H are irrelevant here.
For any value of hid, I want to find the other values of hid that share a did with the original hid. An example:
id | did | hid
===============
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
4 | 2 | 1
5 | 2 | 4
6 | 2 | 5
7 | 3 | 2
8 | 3 | 6
For hid = 1 I would thus like to find id = {2,3,5,6}, as those are the rows that have a did in common with hid = 1.
I can do this by creating some arrays in PHP and running through all possible values of hid and their respective did values, but this is quite a slow process for large tables. I was wondering if there is a clever kind of JOIN or similar statement that could be used to find the co-occurring values of hid.
If I have understood you correctly:-
SELECT a.hid, GROUP_CONCAT(b.id)
FROM header a
INNER JOIN header b
ON a.did = b.did
AND b.hid != 1
WHERE a.hid = 1
GROUP BY a.hid
SQL fiddle:-
http://www.sqlfiddle.com/#!2/9aa26/1
Maybe this:
SELECT d.id
FROM (
SELECT *
FROM header
WHERE header.hid =1
) AS h
JOIN header AS d ON d.did = h.did
WHERE d.hid !=1
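A self-join of this kind can be checked against the example table with the sketch below. It uses Python's sqlite3 module and an in-memory SQLite database as a stand-in for MySQL; the function name cooccurring_ids is hypothetical.

```python
import sqlite3

# (id, did, hid) rows from the example table in the question.
HEADER_ROWS = [
    (1, 1, 1), (2, 1, 2), (3, 1, 3),
    (4, 2, 1), (5, 2, 4), (6, 2, 5),
    (7, 3, 2), (8, 3, 6),
]

SELF_JOIN_SQL = """
SELECT d.id
  FROM header h
  JOIN header d ON d.did = h.did   -- rows that share a did with the target hid
 WHERE h.hid = :hid
   AND d.hid != :hid               -- exclude the target's own rows
 ORDER BY d.id
"""


def cooccurring_ids(hid):
    """Return ids of rows whose did also appears with the given hid."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE header (id INTEGER PRIMARY KEY, did INTEGER, hid INTEGER)")
    conn.executemany("INSERT INTO header VALUES (?, ?, ?)", HEADER_ROWS)
    ids = [row[0] for row in conn.execute(SELF_JOIN_SQL, {"hid": hid})]
    conn.close()
    return ids


ids_for_hid_1 = cooccurring_ids(1)
print(ids_for_hid_1)
```

On a large table, an index on (hid, did) and another on did would likely help both sides of the join, though that is worth confirming with EXPLAIN.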

MySQL selective GROUP BY, using the maximal value

I have the following (simplified) three tables:
user_reservations:
id | user_id |
1 | 3 |
1 | 3 |
user_kar:
id | user_id | szak_id |
1 | 3 | 1 |
2 | 3 | 2 |
szak:
id | name |
1 | A |
2 | B |
Now I would like to count the reservations of the user by the 'szak' name, but I want to have every user counted for only one szak. In this case, user_id 3 has 2 'szak', and if I write a query something like:
SELECT sz.name, COUNT(*) FROM user_reservations r
LEFT JOIN user_kar k ON k.user_id = r.user_id
LEFT JOIN szak sz ON sz.id = k.szak_id
GROUP BY sz.name
It will return two rows:
A | 2 |
B | 2 |
However, I want every reservation to be counted for only one szak (let's say the one with the highest id). I tried MAX(k.id) with HAVING, but it seems ineffective.
I would like to know if there is a supported method for that in MySQL. Or should I first pick all the user IDs on the backend side, check their maximum kar.user_id, count only with those, remove them from the ID list once the given szak is counted, and then build the data back together on the backend side?
Thanks for the help - I was googling around for like 2 hours, but so far, I found no solution, so maybe you could help me.
Something like this?
SELECT sz.name,
Count(*)
FROM (SELECT r.user_id,
Ifnull(Max(k.szak_id), -1) AS max_szak_id
FROM user_reservations r
LEFT OUTER JOIN user_kar k
ON k.user_id = r.user_id
GROUP BY r.user_id) t
LEFT OUTER JOIN szak sz
ON sz.id = t.max_szak_id
GROUP BY sz.name;
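One thing to check with this shape is what COUNT(*) is counting: the derived table has one row per user, so the outer count is a count of users per szak. If reservations are what should be counted, a variant that joins each reservation to its user's highest szak_id may be closer. The sketch below compares the two on the sample data using Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. The reservation ids 1 and 2 are an assumption (the question lists id 1 twice), and the variant uses inner joins, so reservations of users without any szak are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_reservations (id INTEGER, user_id INTEGER);
CREATE TABLE user_kar (id INTEGER, user_id INTEGER, szak_id INTEGER);
CREATE TABLE szak (id INTEGER, name TEXT);
INSERT INTO user_reservations VALUES (1, 3), (2, 3);  -- assumed distinct ids
INSERT INTO user_kar VALUES (1, 3, 1), (2, 3, 2);
INSERT INTO szak VALUES (1, 'A'), (2, 'B');
""")

# Shape used in the answer: one derived row per user, then count per szak.
USERS_SQL = """
SELECT sz.name, COUNT(*)
  FROM (SELECT r.user_id, IFNULL(MAX(k.szak_id), -1) AS max_szak_id
          FROM user_reservations r
          LEFT OUTER JOIN user_kar k ON k.user_id = r.user_id
         GROUP BY r.user_id) t
  LEFT OUTER JOIN szak sz ON sz.id = t.max_szak_id
 GROUP BY sz.name
"""

# Variant: keep one row per reservation and attach the user's highest szak.
RESERVATIONS_SQL = """
SELECT sz.name, COUNT(*)
  FROM user_reservations r
  JOIN (SELECT user_id, MAX(szak_id) AS max_szak_id
          FROM user_kar
         GROUP BY user_id) t ON t.user_id = r.user_id
  JOIN szak sz ON sz.id = t.max_szak_id
 GROUP BY sz.name
"""

users_per_szak = conn.execute(USERS_SQL).fetchall()
reservations_per_szak = conn.execute(RESERVATIONS_SQL).fetchall()
conn.close()
print(users_per_szak)
print(reservations_per_szak)
```

In both cases szak A gets no row, because user 3 is assigned only to the szak with the highest id.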