Finding co-occurring values in MySQL weak relation table - mysql

I have a weak relation table called header. It is basically just three IDs: id is an auto-increment primary key, did points to the id of table D, and hid points to the id of table H. D and H are irrelevant here.
I want to find, for any value of hid, the other values of hid that share a did with the original hid. An example:
id | did | hid
===============
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
4 | 2 | 1
5 | 2 | 4
6 | 2 | 5
7 | 3 | 2
8 | 3 | 6
For hid = 1 I would thus like to find id = {2,3,5,6}, as those are the rows that have a did in common with hid = 1.
I can do this by building some arrays in PHP and looping over all possible values of hid and their respective did values, but this is quite slow for large tables. Is there a clever kind of JOIN or similar statement that could be used to find the co-occurring values of hid?

If I have understood you correctly:-
SELECT a.hid, GROUP_CONCAT(b.id)
FROM header a
INNER JOIN header b
ON a.did = b.did
AND b.hid != 1
WHERE a.hid = 1
GROUP BY a.hid
SQL fiddle:-
http://www.sqlfiddle.com/#!2/9aa26/1

Maybe this:
SELECT d.id
FROM (
SELECT *
FROM header
WHERE header.hid =1
) AS h
JOIN header AS d ON d.did = h.did
WHERE d.hid !=1
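
For anyone who wants to check the self-join idea against the sample data, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the helper name cooccurring_ids is made up for illustration. It generalises the hard-coded 1 to a parameter by comparing b.hid with a.hid.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE header (id INTEGER PRIMARY KEY, did INTEGER, hid INTEGER)")
conn.executemany(
    "INSERT INTO header (id, did, hid) VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 1),
     (5, 2, 4), (6, 2, 5), (7, 3, 2), (8, 3, 6)],
)


def cooccurring_ids(conn, hid):
    """Return ids of rows that share a did with `hid` but carry a different hid."""
    rows = conn.execute(
        """
        SELECT DISTINCT b.id
        FROM header a
        JOIN header b
          ON a.did = b.did      -- same did ...
         AND b.hid != a.hid     -- ... but a different hid
        WHERE a.hid = ?
        ORDER BY b.id
        """,
        (hid,),
    ).fetchall()
    return [r[0] for r in rows]


print(cooccurring_ids(conn, 1))
```

An index on (did, hid) would be the obvious thing to try if the real table is large, but that is untested here.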

Related

I need to select a result based on 2 registers from a relation table. How do I do that?

I have 3 tables:
Question (id, questionText)
QuestionCategory (id, categoryName)
Question_QuestionCategory (questionId, categoryId)
Sample Data:
Table Question:
id | questionText
1 | 2 + 2 = ?
2 | 10 x 5 / 3 + 5 = ?
3 | USA is located in which continent?
Table QuestionCategory:
id | categoryName
1 | Easy
2 | Hard
3 | Math
4 | Geography
Table Question_QuestionCategory:
questionId | categoryId
1 | 1
1 | 3
2 | 2
2 | 3
3 | 1
3 | 4
The Question_QuestionCategory table is a relation table that stores the foreign keys from the question and questionCategory tables.
My problem is: I need a select that returns to me a question that has the Hard and Math categories at the same time (the question with id 2 in this case). How can I do that?
You can do that by using aggregation and checking whether the distinct count of matched categories equals the number of categories you asked for. To get only one row as a result you can use LIMIT.
SELECT q.id,
q.questionText
FROM question q
INNER JOIN question_questioncategory qc
ON qc.questionId = q.id
INNER JOIN questioncategory c
ON c.id = qc.categoryId
WHERE c.categoryname IN ('Hard',
'Math')
GROUP BY q.id,
q.questionText
HAVING count(DISTINCT c.categoryname) = 2
LIMIT 1;
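
A quick way to try the "count the matched categories" idea on the sample data is sketched below. It assumes Python's sqlite3 module in place of MySQL, and the helper questions_in_all is a hypothetical name. It takes any list of category names and compares the distinct count against the list length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
CREATE TABLE Question (id INTEGER PRIMARY KEY, questionText TEXT);
CREATE TABLE QuestionCategory (id INTEGER PRIMARY KEY, categoryName TEXT);
CREATE TABLE Question_QuestionCategory (questionId INTEGER, categoryId INTEGER);
INSERT INTO Question VALUES
  (1, '2 + 2 = ?'), (2, '10 x 5 / 3 + 5 = ?'),
  (3, 'USA is located in which continent?');
INSERT INTO QuestionCategory VALUES (1, 'Easy'), (2, 'Hard'), (3, 'Math'), (4, 'Geography');
INSERT INTO Question_QuestionCategory VALUES (1, 1), (1, 3), (2, 2), (2, 3), (3, 1), (3, 4);
""")


def questions_in_all(conn, categories):
    """Return ids of questions that belong to every category in `categories`."""
    wanted = sorted(set(categories))          # de-duplicate so the count is meaningful
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT q.id
        FROM Question q
        JOIN Question_QuestionCategory qc ON qc.questionId = q.id
        JOIN QuestionCategory c ON c.id = qc.categoryId
        WHERE c.categoryName IN ({placeholders})
        GROUP BY q.id
        HAVING COUNT(DISTINCT c.categoryName) = ?
        ORDER BY q.id
    """
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]


print(questions_in_all(conn, ["Hard", "Math"]))
```

The IN filter keeps only rows for the requested categories, so a question passes the HAVING test only when every requested category matched at least once.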

SQL select only items that happen after but not also before a certain date

I have a table in SQL that contains People's IDs, codes and entry dates for each code.
Table X:
PERSON_ID CODE ENTRY_DATE
1 A 2017-12-03
1 C 2016-01-13
1 C 2009-05-11
2 B 2007-03-25
2 F 2018-01-18
3 G 2003-04-09
And another table that contains the person_id and reference dates for each person.
Table Y:
PERSON_ID REF_DATE
1 2015-07-18
2 2017-06-17
3 2002-10-06
What I want to do is, for each person, select the rows from table X whose codes occurred after that person's REF_DATE in table Y, but only if the same CODE did not also occur before REF_DATE. For example, in the case of person 1, the codes that occurred after 2015-07-18 are A (2017-12-03) and the first C (2016-01-13). But since C also occurred before REF_DATE (2015-07-18), on 2009-05-11, C is not to be selected.
This is just an example, the actual tables have millions of rows and thousands of different codes so I can't manually type codes etc.
the expected result of the query in this example should be:
PERSON_ID CODE ENTRY_DATE
1 A 2017-12-03
2 F 2018-01-18
3 G 2003-04-09
Any idea how to code that in SQL ?
Thanks !
First, join both tables so you have REF_DATE available, then filter the rows to keep only the ones after REF_DATE, while also making sure that no row with the same code exists before that date.
SQL DEMO
SELECT X.`PERSON_ID`, X.`CODE`, X.`ENTRY_DATE`, Y.`REF_DATE`
FROM TableX X
JOIN TableY Y
ON X.`PERSON_ID` = Y.`PERSON_ID`
WHERE X.`ENTRY_DATE` > Y.`REF_DATE`
AND NOT EXISTS (SELECT 1
FROM TableX
WHERE TableX.`PERSON_ID` = X.`PERSON_ID`
AND TableX.`CODE`= X.`CODE`
AND TableX.`ENTRY_DATE` < Y.`REF_DATE`
)
OUTPUT
| PERSON_ID | CODE | ENTRY_DATE | REF_DATE |
|-----------|------|----------------------|----------------------|
| 1 | A | 2017-12-03T00:00:00Z | 2015-07-18T00:00:00Z |
| 2 | F | 2018-01-18T00:00:00Z | 2017-06-17T00:00:00Z |
| 3 | G | 2003-04-09T00:00:00Z | 2002-10-06T00:00:00Z |
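
To reproduce that output locally, here is a small sketch. It assumes Python's sqlite3 module instead of MySQL and stores the dates as ISO-8601 text, which compares correctly as plain strings. The inner alias earlier is just an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
CREATE TABLE TableX (PERSON_ID INTEGER, CODE TEXT, ENTRY_DATE TEXT);
CREATE TABLE TableY (PERSON_ID INTEGER, REF_DATE TEXT);
INSERT INTO TableX VALUES
  (1, 'A', '2017-12-03'), (1, 'C', '2016-01-13'), (1, 'C', '2009-05-11'),
  (2, 'B', '2007-03-25'), (2, 'F', '2018-01-18'), (3, 'G', '2003-04-09');
INSERT INTO TableY VALUES (1, '2015-07-18'), (2, '2017-06-17'), (3, '2002-10-06');
""")

new_codes = conn.execute("""
    SELECT X.PERSON_ID, X.CODE, X.ENTRY_DATE
    FROM TableX X
    JOIN TableY Y ON X.PERSON_ID = Y.PERSON_ID
    WHERE X.ENTRY_DATE > Y.REF_DATE            -- entry happened after the reference date
      AND NOT EXISTS (                         -- and the same code never appeared before it
            SELECT 1
            FROM TableX earlier
            WHERE earlier.PERSON_ID = X.PERSON_ID
              AND earlier.CODE = X.CODE
              AND earlier.ENTRY_DATE < Y.REF_DATE)
    ORDER BY X.PERSON_ID, X.CODE
""").fetchall()

print(new_codes)
```

With millions of rows, an index covering (PERSON_ID, CODE, ENTRY_DATE) would probably help the NOT EXISTS probe, though that has not been measured here.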

SQL query to find missing entries in

I have a database in which I need to find some missing entries and fill them in.
I have a table called "menu". Each restaurant has multiple dishes and each dish has 4 different language entries (actually 8 in the main database, but for simplicity let's go with 4). I need to find out which dishes for a particular restaurant are missing any language entries.
select * from menu where restaurantid = 1
I get stuck there. I need something along the lines of "where language 1 or 2 or 3 or 4 doesn't exist", which is the complicated bit: I need to see the languages that exist in order to see which language is missing, because I can't display something that isn't there. I hope that makes sense?
In the example table below, restaurant 2, dishid 2 is missing language 3; that's what I need to find.
+--------------+--------+----------+-----------+
| RestaurantID | DishID | DishName | Language |
+--------------+--------+----------+-----------+
| 1 | 1 | Soup | 1 |
| 1 | 1 | Soúp | 2 |
| 1 | 1 | Soupe | 3 |
| 1 | 1 | Soupa | 4 |
| 1 | 2 | Bread | 1 |
| 1 | 2 | Bréad | 2 |
| 1 | 2 | Breade | 3 |
| 1 | 2 | Breada | 4 |
| 2 | 1 | Dish1 | 1 |
| 2 | 1 | Dísh1 | 2 |
| 2 | 1 | Disha1 | 3 |
| 2 | 1 | Dishe1 | 4 |
| 2 | 2 | Dish2 | 1 |
| 2 | 2 | Dísh2 | 2 |
| 2 | 2 | Dishe2 | 4 |
+--------------+--------+----------+-----------+
An anti-join pattern is usually the most efficient, in terms of performance.
Your particular case is a little more tricky, in that you need to "generate" rows that are missing. If every (RestaurantID,DishID) should have 4 rows, with Language values of 1,2,3 and 4, we can generate that set of all rows with a CROSS JOIN operation.
The next step is to apply an anti-join... a LEFT OUTER JOIN to the rows that exist in the menu table, so we get all the rows from the CROSS JOIN set, along with matching rows.
The "trick" is to use a predicate in the WHERE clause that filters out rows where we found a match, so we are left with the rows that didn't have a match.
(It seems a bit strange at first, but once you get your brain wrapped around the anti-join pattern, it becomes familiar.)
So a query of this form should return the specified result set.
SELECT d.RestaurantID
, d.DishID
, lang.id AS missing_language
FROM (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
) lang
CROSS
JOIN (SELECT e.RestaurantID, e.DishID
FROM menu e
GROUP BY e.RestaurantID, e.DishID
) d
LEFT
JOIN menu m
ON m.RestaurantID = d.RestaurantID
AND m.DishID = d.DishID
AND m.Language = lang.id
WHERE m.RestaurantID IS NULL
ORDER BY 1,2,3
Let's unpack that bit.
First we get a set containing the numbers 1 thru 4.
Next we get a set containing the (RestaurantID, DishID) distinct tuples. (For each distinct Restaurant, a distinct list of DishID, as long as there is at least one row for any Language for that combination.)
We do a CROSS JOIN, matching every row from set one (lang) with every row from set two (d), to generate a "complete" set of every (RestaurantID, DishID, Language) we want to have.
The next part is the anti-join... the left outer join to menu to find which of the rows from the "complete" set has a matching row in menu, and filtering out all the rows that had a match.
That may be a little confusing. If we think of that CROSS JOIN operation producing a temporary table that looks like the menu table, but containing all possible rows... we can think of it in terms of pseudocode:
create temporary table all_menu_rows (RestaurantID, DishID, Language) ;
insert into all_menu_rows ... all possible rows, combinations ;
Then the anti-join pattern is a little easier to see:
SELECT r.RestaurantID
, r.DishID
, r.Language
FROM all_menu_rows r
LEFT
JOIN menu m
ON m.RestaurantID = r.RestaurantID
AND m.DishID = r.DishID
AND m.Language = r.Language
WHERE m.RestaurantID IS NULL
ORDER BY 1,2,3
(But we don't have to incur the extra overhead of creating and populating the temporary table, we can do that right in the query.)
Of course, this isn't the only approach. We could use a NOT EXISTS predicate instead of an anti-join, though this is not usually as efficient. The first part of the query is the same, to generate the "complete" set of rows we expect to have; what differs is how we identify whether or not there is a matching row in the menu table:
SELECT d.RestaurantID
, d.DishID
, lang.id AS missing_language
FROM (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
) lang
CROSS
JOIN (SELECT e.RestaurantID, e.DishID
FROM menu e
GROUP BY e.RestaurantID, e.DishID
) d
WHERE NOT EXISTS ( SELECT 1
FROM menu m
WHERE m.RestaurantID = d.RestaurantID
AND m.DishID = d.DishID
AND m.Language = lang.id
)
ORDER BY 1,2,3
For each row in the "complete" set (generated by the CROSS JOIN operation), we're going to run a correlated subquery that checks whether a matching row is found. The NOT EXISTS predicate returns TRUE if no matching row is found. (This is a little easier to understand, but it usually doesn't perform as well as the anti-join pattern.)
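
Both forms can be run side by side on the sample data with the sketch below. The assumptions: Python's sqlite3 module stands in for MySQL, the DishName column is left out because only the key columns matter, and the "Breada" row is taken to belong to DishID 2 (so that restaurant 2, dish 2, language 3 is the only gap, as the question states). SELECT DISTINCT is used in place of the GROUP BY to build the list of dishes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE menu (RestaurantID INTEGER, DishID INTEGER, Language INTEGER)")
# Every (restaurant, dish, language) combination except the one known gap.
conn.executemany(
    "INSERT INTO menu VALUES (?, ?, ?)",
    [(r, d, l) for r in (1, 2) for d in (1, 2) for l in (1, 2, 3, 4)
     if (r, d, l) != (2, 2, 3)],
)

# The "complete" set: every language crossed with every existing dish.
COMPLETE_SET = """
    FROM (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) lang
    CROSS JOIN (SELECT DISTINCT RestaurantID, DishID FROM menu) d
"""

# Anti-join: outer join to menu, then keep only the rows that found no match.
anti_join = conn.execute(f"""
    SELECT d.RestaurantID, d.DishID, lang.id AS missing_language
    {COMPLETE_SET}
    LEFT JOIN menu m
      ON m.RestaurantID = d.RestaurantID
     AND m.DishID = d.DishID
     AND m.Language = lang.id
    WHERE m.RestaurantID IS NULL
    ORDER BY 1, 2, 3
""").fetchall()

# NOT EXISTS: correlated subquery checked once per row of the complete set.
not_exists = conn.execute(f"""
    SELECT d.RestaurantID, d.DishID, lang.id AS missing_language
    {COMPLETE_SET}
    WHERE NOT EXISTS (SELECT 1 FROM menu m
                      WHERE m.RestaurantID = d.RestaurantID
                        AND m.DishID = d.DishID
                        AND m.Language = lang.id)
    ORDER BY 1, 2, 3
""").fetchall()

print(anti_join, not_exists)
```

Which of the two is faster depends on the engine and version, so it is worth timing both against the real table.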
You can use the following statement if each menu item should have a record in each language (8 in real life, 4 in the example). Change the number 4 to 8 if you want to see all menu items per restaurant that don't have all 8 entries.
SELECT RestaurantID,DishID, COUNT( * )
FROM Menu
GROUP BY RestaurantID,DishID
HAVING COUNT( * ) <4

Mysql - Group by multiple table and columns

I have the two following tables:
content:
========
cid | iid | qty
---------------
1 | 7 | 42
2 | 7 | 1
3 | 8 | 21
ret:
====
rid | cid | qty
--------------
1 | 1 | 2
2 | 1 | 10
3 | 2 | 1
I would like to retrieve, for each iid, the sum of content.qty and ret.qty
For example, for the given tables, the result would be:
iid=7, SUM(content.qty) = 43, SUM(ret.qty)=13
iid=8, SUM(content.qty) = 21, SUM(ret.qty)=0
Is there any way to do it in one query?
In advance, thank you!
This is a bit complicated, because you don't want duplicates in your sums. To fix that problem, do the aggregations separately as subqueries. The first is directly on content; the second joins back to content from ret to get the iid column.
The following query follows this approach, and assumes that cid is a unique key on content:
select c.iid, c.cqty, coalesce(r.rqty, 0) as rqty
from (select c.iid, SUM(qty) as cqty
from content c
group by c.iid
) c left outer join
(select c.iid, SUM(r.qty) as rqty
from ret r join
content c
on r.cid = c.cid
group by c.iid
) r
on c.iid = r.iid;
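
To see why the separate aggregations matter, the sketch below runs a naive single join next to the two-subquery form on the sample data. It assumes Python's sqlite3 module as a stand-in for MySQL. The variable names naive and separate and the inner aliases c2 and r2 are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
CREATE TABLE content (cid INTEGER PRIMARY KEY, iid INTEGER, qty INTEGER);
CREATE TABLE ret (rid INTEGER PRIMARY KEY, cid INTEGER, qty INTEGER);
INSERT INTO content VALUES (1, 7, 42), (2, 7, 1), (3, 8, 21);
INSERT INTO ret VALUES (1, 1, 2), (2, 1, 10), (3, 2, 1);
""")

# Naive version: content row cid=1 matches two ret rows, so its qty is summed twice.
naive = conn.execute("""
    SELECT c.iid, SUM(c.qty), COALESCE(SUM(r.qty), 0)
    FROM content c
    LEFT JOIN ret r ON r.cid = c.cid
    GROUP BY c.iid
    ORDER BY c.iid
""").fetchall()

# Aggregate each table on its own first, then join the two per-iid totals.
separate = conn.execute("""
    SELECT c.iid, c.cqty, COALESCE(r.rqty, 0)
    FROM (SELECT iid, SUM(qty) AS cqty
          FROM content
          GROUP BY iid) c
    LEFT JOIN (SELECT c2.iid, SUM(r2.qty) AS rqty
               FROM ret r2
               JOIN content c2 ON r2.cid = c2.cid
               GROUP BY c2.iid) r
      ON c.iid = r.iid
    ORDER BY c.iid
""").fetchall()

print(naive)
print(separate)
```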

MySQL: select one row in subquery similar to MAX, only with custom order priority

Say I have two tables a and b; the column id is unique in table a but has multiple entries in table b, with different values for column x as well as other columns in that table. The primary key of table b is (id,x).
I need to be able to join a single row from b to the SELECT query as I could do with MAX like so:
SELECT * FROM a
INNER JOIN b USING(id)
WHERE b.x = (SELECT MAX(x) FROM b WHERE b.id = a.id)
With MAX, this works no problem. But I don't need MAX. I actually need something like this:
SELECT * FROM a
INNER JOIN b USING(id)
WHERE b.x = (SELECT x FROM b
WHERE x IN (2,3,7) OR x IS NULL
ORDER BY FIELD(x,3,2,7) DESC
LIMIT 1)
This query fails because my MySQL version does not support LIMIT in this subquery. Is there another way to make sure to join at least and exactly one row in the order I provided?
Schematic overview of what I'm trying to do would be:
Select * from table a for each id
Join table b where x = 7
If no entry in b exists where a.id=b.id and x = 7, join b where x = 2
If no entry in b exists where a.id=b.id and x IN (2,7), join b where x = 3
If no entry in b exists where a.id=b.id and x IN (2,3,7), join b where x IS NULL
I have this need because:
I can't use more than one query in this particular bit of code I need; it has to be one magic all-in-one query
I can't use INNER JOIN (SELECT *) statements
I can't have a query where any id is present more than once (so DISTINCT is not option because the values in table b might differ)
I know for sure that table b has an entry for a.id where x is either 2, 3, 7 or NULL, but I don't know which and I also don't know how many entries there are for a.id in table b.
Upgrading MySQL is not the best option since this code would have to work on generic servers of any of my customers
So, in short, my question is: is there a function similar to MAX that can take a specific order into account instead of the MAX value?
Thanks in advance!
You can do this by specifying manually the ordering weight of each X.
mysql> select * from aa;
+----+------+
| id | name |
+----+------+
| 1 | John |
| 2 | Ted |
| 3 | Jill |
| 4 | Jack |
+----+------+
4 rows in set (0.00 sec)
mysql> select * from bb;
+------+------+------------+
| id | x | class |
+------+------+------------+
| 1 | 7 | HighPriori |
| 1 | 2 | MediumPrio |
| 1 | 3 | LowPriorit |
| 2 | 2 | Medium |
| 2 | 3 | Low |
| 3 | 3 | Low only |
+------+------+------------+
6 rows in set (0.00 sec)
select version();
+-------------+
| version() |
+-------------+
| 5.5.25a-log |
+-------------+
SELECT aa.name, bb.x, bb.class
FROM aa LEFT JOIN bb ON (aa.id = bb.id AND bb.x IN (2,3,7)
AND bb.x = ( SELECT x FROM bb WHERE bb.id = aa.id AND x IN (2,3,7)
ORDER BY CASE
WHEN x = 7 THEN 100
WHEN x = 2 THEN 200
WHEN x = 3 THEN 300
ELSE 400
END LIMIT 1 )
);
The innermost SELECT will choose the most suitable value of X based on priority: 7 if available, else 2, else 3. This works also if, as in this case, the "priorities" are not in order, i.e., 7 is higher than 3, but 2 is also higher than 3.
Then the LEFT JOIN will match that one record, if it exists, or NULL, if it does not.
John has 2,3 and 7 and gets the 7-record, while Ted has 2 and 3, and gets the 2:
+------+------+------------+
| name | x | class |
+------+------+------------+
| John | 7 | HighPriori |
| Ted | 2 | Medium |
| Jill | 3 | Low only |
| Jack | NULL | NULL |
+------+------+------------+
(Strictly speaking, the IN (2,3,7) in the inner SELECT and the ELSE in the CASE are redundant; either will do).
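
If you want to replay this without a MySQL server, here is a minimal sketch that assumes Python's sqlite3 module as a stand-in. The inner alias b2 is added only to keep the two references to bb apart. SQLite, like the MySQL version shown above, accepts LIMIT in a scalar subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
CREATE TABLE aa (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bb (id INTEGER, x INTEGER, class TEXT);
INSERT INTO aa VALUES (1, 'John'), (2, 'Ted'), (3, 'Jill'), (4, 'Jack');
INSERT INTO bb VALUES
  (1, 7, 'HighPriori'), (1, 2, 'MediumPrio'), (1, 3, 'LowPriorit'),
  (2, 2, 'Medium'), (2, 3, 'Low'), (3, 3, 'Low only');
""")

picked = conn.execute("""
    SELECT aa.name, bb.x, bb.class
    FROM aa
    LEFT JOIN bb
      ON aa.id = bb.id
     AND bb.x = (SELECT b2.x
                 FROM bb AS b2
                 WHERE b2.id = aa.id AND b2.x IN (2, 3, 7)
                 ORDER BY CASE b2.x        -- lower weight = higher priority
                            WHEN 7 THEN 100
                            WHEN 2 THEN 200
                            WHEN 3 THEN 300
                            ELSE 400
                          END
                 LIMIT 1)
    ORDER BY aa.id
""").fetchall()

for row in picked:
    print(row)
```

Jack has no rows in bb, so the subquery yields NULL, the join condition never matches, and the LEFT JOIN fills in NULLs.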
In CASE of need...
If the X field already establishes an order, e.g., you want the values 3,2,7 in either 2 - 3 - 7 or 7 - 3 - 2 numeric order, you can do without the CASE: instead of
ORDER BY CASE
WHEN x = 7 THEN 100
WHEN x = 2 THEN 200
WHEN x = 3 THEN 300
ELSE 400
END
you can just specify
ORDER BY x
or
ORDER BY x DESC
...the performance improvement is slight, even if there is an index on x, but if you have a great many values of X, then specifying them all in the CASE can be awkward, and studying exceptions may be lengthy.
But let's imagine you wanted the objects to be in this order
First Last
20,21,22,23,40,41,42,9,1,2,3,4,5
This typically arises when X is declared unsigned with values 1-5 ("we will never need more classes and 1 is always going to be first!"), then some weeks later someone adds a "this is a new exciting product, must go first!" exception and assigns 9 to it, and finally it happens again with "let's add a whole new XY class starting with 2X and 4X for the forthcoming product series...". You could do this as
WHEN x <= 5 THEN 900+x
WHEN x = 9 THEN 809
ELSE 700 + x
which translates the above jumbled set in the ordered sequence
720,721,722,723,740,741,742,809,901,902,903,904,905
and with three WHENs you do all the work.
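
The mapping can be checked with a few lines. This is a sketch that assumes Python's sqlite3 module and a throwaway one-column table t (a made-up name) holding the thirteen values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(v,) for v in (1, 2, 3, 4, 5, 9, 20, 21, 22, 23, 40, 41, 42)],
)

ordered = [row[0] for row in conn.execute("""
    SELECT x
    FROM t
    ORDER BY CASE
               WHEN x <= 5 THEN 900 + x   -- original classes 1-5 go last
               WHEN x = 9  THEN 809       -- the single exception sits in the middle
               ELSE 700 + x               -- the newer 2X and 4X classes go first
             END
""")]

print(ordered)
```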
SELECT * FROM a
INNER JOIN b USING(id)
WHERE b.x = (SELECT x FROM b
WHERE x IN (2,3,7) OR x IS NULL
ORDER BY FIELD(x,3,2,7) DESC
LIMIT 1)
Switch the subquery to a join:
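SELECT * FROM a
-- select id as well, so the join condition has something to match on
join ( SELECT id, x FROM b
WHERE x IN (2,3,7) OR x IS NULL
ORDER BY FIELD(x,3,2,7) DESC
-- caveat: this LIMIT applies to the whole derived table, not once per id
LIMIT 1) as X on X.id = a.id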