MySQL. Group matches in columns by a common id


I am coding a search page with multiple filters, and I am wondering whether this is the best approach to get the results.
Each result of the search has several attributes, here I am using two attributes to simplify the example.
The main 'items' table:
id_items
1
2
The 'languages' table:
id_languages | language_code
1 es
2 en
The 'attributes_one' table:
id_attributes_one
1
2
The 'attributes_one_translations' table:
id_attributes_one_translations | id_attributes_one | id_language_code | translation
1 | 1 | 1 | Oro
2 | 1 | 2 | Gold
3 | 2 | 1 | Plata
4 | 2 | 2 | Silver
The 'attributes_one_match' table:
id_attributes_one_match | id_attributes_one | id_items
1 | 1 | 1
2 | 2 | 1
3 | 1 | 2
The 'attributes_two' table:
id_attributes_two
1
The 'attributes_two_translations' table:
id_attributes_two_translations | id_attributes_two | id_language_code | translation
1 | 1 | 2 | 99% gold
The 'attributes_two_match' table:
id_attributes_two_match | id_attributes_two | id_items
1 | 1 | 1
The concept is that one item can have 0 or more matches in each attribute table, and each match can have 0 or more translations.
Here is the query I use when the user selects the filters, to get all the items that have the attribute_one 'Gold' or 'Silver', ordered by this attribute ascending:
SELECT
i.id_items AS id,
GROUP_CONCAT(DISTINCT aot.translation ORDER BY aot.translation DESC SEPARATOR '!¡') AS attribute_one,
GROUP_CONCAT(DISTINCT att.translation ORDER BY att.translation DESC SEPARATOR '!¡') AS attribute_two
FROM
items i
LEFT JOIN
languages AS l ON l.language_code = 'en'
LEFT JOIN
attributes_one_match AS aom ON aom.id_items = i.id_items
LEFT JOIN
attributes_one_translations AS aot ON aot.id_attributes_one = aom.id_attributes_one
AND l.id_languages = aot.id_language_code
AND (MATCH (aot.translation) AGAINST ('"Gold"' IN BOOLEAN MODE)
OR MATCH (aot.translation) AGAINST ('"Silver"' IN BOOLEAN MODE))
LEFT JOIN
attributes_one AS ao ON ao.id_attributes_one = aom.id_attributes_one
LEFT JOIN
attributes_two_match AS atm ON atm.id_items = i.id_items
LEFT JOIN
attributes_two_translations AS att ON att.id_attributes_two = atm.id_attributes_two
AND l.id_languages = att.id_language_code
LEFT JOIN
attributes_two AS at ON at.id_attributes_two = atm.id_attributes_two
GROUP BY id
ORDER BY 2 ASC
The result I get is:
id | attribute_one | attribute_two
2 | Gold | null
1 | Silver!¡Gold | 99% gold
That result is what I was expecting. Now:
* The items table will have around 300k entries once the database is filled.
* There are 28 attribute tables to match against the items.
Each attribute table will have around 20k entries, and each translation table will have twice as many entries as the table it translates.
* Each item will have from 0 to 20 matches in each attribute table, so I think I won't have problems using the GROUP_CONCAT function.
I am concerned about performance because the search filter page updates itself via Ajax each time the user changes one of the filters (it updates both the filters and the results). The maximum number of results per page will be 1000 items; I didn't put the LIMIT in the example query.
I am not an SQL expert, so I don't really know whether what I am doing is the best approach. I would appreciate some feedback.
Thanks a lot!

Related

MySQL add avg of count by id to existing select with id

I'm not even sure what the title of this question should be, but let's start with my data.
I have a table of users who have taken a few lessons while belonging to a particular training center.
lesson table
id | lesson_id | user_id | has_completed
----------------------------------------
1 | asdf3314 | 2 | 1
2 | d13saf12 | 2 | 1
3 | a33adff5 | 2 | 0
4 | a33adff5 | 1 | 1
5 | d13saf12 | 1 | 0
user table
id | center_id | ...
----------------------------------------
1 | 20 | ...
2 | 30 | ...
training center table
id | center_name | ...
----------------------------------------
20 | learn.co | ...
30 | teach.co | ...
I've written a small chunk but am now stuck, as I don't know how to proceed. This statement counts the completed lessons per user and then computes the average of those counts for one center id. If two users belong to a center and have completed 3 lessons and 2 lessons, it takes the average of 3 and 2 and returns that.
SELECT
FLOOR(AVG(a.total)) AS avg_completion
FROM
(SELECT
user_id,
user.center_id,
count(user_id) AS total
FROM lesson
LEFT JOIN user ON user.id = user_id
WHERE has_completed = 1 AND center_id = 2
GROUP BY user_id) AS a;
The question I have is: how do I loop through the training center table and append the average from a select statement like the one above to each center that is queried? I can't seem to pass the center id down to the subquery, so there must be a fundamentally different way to achieve the same query while also looping through the training centers.
An example of desired result:
center.id | avg_completion | ...training center table
-----------------------------------------------------
20 | 2 | ...

Your main query needs to select a.center_id and then use GROUP BY center_id. You can then join it with the training_center table. The inner query must not filter on a single center, and it has to group by both user and center:
SELECT c.*, x.avg_completion
FROM training_center AS c
JOIN (
SELECT
a.center_id,
FLOOR(AVG(a.total)) AS avg_completion
FROM (
SELECT
user_id,
user.center_id,
count(*) AS total
FROM lesson
JOIN user ON user.id = user_id
WHERE has_completed = 1
GROUP BY user_id, user.center_id) AS a
GROUP BY a.center_id) AS x
ON x.center_id = c.id
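As a rough check of the per-center grouping described above, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption), the function name center_averages is hypothetical, and CAST(... AS INTEGER) replaces FLOOR because FLOOR is not available in every SQLite build; the two agree for non-negative averages.

```python
import sqlite3


def center_averages():
    """Return (center_id, center_name, avg_completion) for each center."""
    con = sqlite3.connect(":memory:")
    # Sample rows copied from the question's three tables.
    con.executescript("""
        CREATE TABLE training_center (id INTEGER PRIMARY KEY, center_name TEXT);
        CREATE TABLE user (id INTEGER PRIMARY KEY, center_id INTEGER);
        CREATE TABLE lesson (id INTEGER PRIMARY KEY, lesson_id TEXT,
                             user_id INTEGER, has_completed INTEGER);
        INSERT INTO training_center VALUES (20, 'learn.co'), (30, 'teach.co');
        INSERT INTO user VALUES (1, 20), (2, 30);
        INSERT INTO lesson VALUES
            (1, 'asdf3314', 2, 1), (2, 'd13saf12', 2, 1), (3, 'a33adff5', 2, 0),
            (4, 'a33adff5', 1, 1), (5, 'd13saf12', 1, 0);
    """)
    rows = con.execute("""
        SELECT c.id, c.center_name, x.avg_completion
        FROM training_center AS c
        JOIN (
            -- Average the per-user totals within each center.
            -- CAST truncates toward zero (stand-in for MySQL's FLOOR).
            SELECT a.center_id,
                   CAST(AVG(a.total) AS INTEGER) AS avg_completion
            FROM (
                -- One row per user: number of completed lessons.
                SELECT l.user_id, u.center_id, COUNT(*) AS total
                FROM lesson AS l
                JOIN user AS u ON u.id = l.user_id
                WHERE l.has_completed = 1
                GROUP BY l.user_id, u.center_id
            ) AS a
            GROUP BY a.center_id
        ) AS x ON x.center_id = c.id
        ORDER BY c.id
    """).fetchall()
    con.close()
    return rows
```

With the sample data, center_averages() returns [(20, 'learn.co', 1), (30, 'teach.co', 2)]. Users with no completed lessons do not appear in the inner result at all, so they do not pull the average down.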

If I understand correctly:
select u.center_id, count(*) as num_users,
sum(l.has_completed) as num_completed,
avg(l.has_completed) as completed_ratio
from lesson l join
user u
on l.user_id = u.id
group by u.center_id

Select all items and count in related table by criteria

I have tables Match and Reaction as following:
REACTION
+----------+----------+----------+----------+
| user_id | game_id | item_id | reaction |
+----------+----------+----------+----------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 2 | 1 |
| 2 | 1 | 1 | 1 |
| 2 | 1 | 2 | 0 |
+----------+----------+----------+----------+
MATCH:
+----------+----------+
| game_id | item_id |
+----------+----------+
| 1 | 1 |
| 1 | 2 |
+----------+----------+
Now I want (if possible without subqueries) to select ALL item_ids from the MATCH table together with the count of rows in the REACTION table where reaction equals 1 for the user with id = 2. For example, for the tables above I want the following result:
+----------+----------+
| item_id | count |
+----------+----------+
| 1 | 1 |
| 2 | 0 |
+----------+----------+
I've tried something like
SELECT match.item_id, COUNT(reaction.user_id) as c
FROM match
LEFT JOIN reaction ON reaction.item_id = match.item_id
WHERE reaction.reaction = 1 AND match.game_id = 2
GROUP BY match.item_id
HAVING c > 0
but it didn't work as expected. I cannot get count for particular user.

I think you are close. I think you just need to move conditions on the second table to the ON clause:
SELECT m.item_id, COUNT(r.user_id) as c
FROM match m LEFT JOIN
reaction r
ON r.item_id = m.item_id AND
r.reaction = 1 AND
r.user_id = 2
WHERE m.game_id = 2
GROUP BY m.item_id;
I'm not sure what the HAVING clause is for, because you seem to want counts of 0.
Note that this also introduces table aliases so the query is easier to write and to read.
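To illustrate the difference between filtering in ON and filtering in WHERE, here is a small sketch using Python's sqlite3 module with the question's sample rows. Several things are assumptions: SQLite stands in for the asker's database, the match table is renamed to the hypothetical game_match to avoid clashing with the MATCH keyword, and game_id = 1 is used because the sample data has no game 2.

```python
import sqlite3


def reaction_counts(user_id, game_id):
    """Return (rows with filters in ON, rows with filters in WHERE)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE game_match (game_id INTEGER, item_id INTEGER);
        CREATE TABLE reaction (user_id INTEGER, game_id INTEGER,
                               item_id INTEGER, reaction INTEGER);
        INSERT INTO game_match VALUES (1, 1), (1, 2);
        INSERT INTO reaction VALUES
            (1, 1, 1, 1), (1, 1, 2, 1), (2, 1, 1, 1), (2, 1, 2, 0);
    """)
    # Conditions on the right-hand table live in ON, so items without a
    # matching reaction survive the LEFT JOIN and get a count of 0.
    on_rows = con.execute("""
        SELECT m.item_id, COUNT(r.user_id) AS c
        FROM game_match AS m
        LEFT JOIN reaction AS r
               ON r.item_id = m.item_id
              AND r.reaction = 1
              AND r.user_id = ?
        WHERE m.game_id = ?
        GROUP BY m.item_id
        ORDER BY m.item_id
    """, (user_id, game_id)).fetchall()
    # The same conditions in WHERE discard the NULL-extended rows, so the
    # LEFT JOIN behaves like an inner join and zero-count items vanish.
    where_rows = con.execute("""
        SELECT m.item_id, COUNT(r.user_id) AS c
        FROM game_match AS m
        LEFT JOIN reaction AS r ON r.item_id = m.item_id
        WHERE r.reaction = 1 AND r.user_id = ? AND m.game_id = ?
        GROUP BY m.item_id
        ORDER BY m.item_id
    """, (user_id, game_id)).fetchall()
    con.close()
    return on_rows, where_rows
```

For reaction_counts(2, 1), the first list is [(1, 1), (2, 0)], which matches the desired result in the question, while the second is only [(1, 1)].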

SELECT match.item_id, COUNT(reaction.user_id) as c
FROM match JOIN reaction ON (reaction.item_id = match.item_id and reaction.reaction = 1 AND match.game_id = 2)
GROUP BY match.item_id
HAVING COUNT(reaction.user_id)
I think you need to filter 'before' the join, so use the ON clause.
Filters in the WHERE clause are applied after the join is made, while filters in the ON clause are applied as the join is made.

You have no rows with game_id = 2, so this should return nothing.
Also, you should not use columns of the left-joined table in the WHERE condition; otherwise the join works as an inner join. In these cases you should move the related condition into the ON clause:
SELECT match.item_id, COUNT(reaction.user_id) as c
FROM match
LEFT JOIN reaction ON reaction.item_id = match.item_id
AND reaction.reaction = 1
WHERE match.game_id = 2
GROUP BY match.item_id
HAVING c > 0
but try also
SELECT match.item_id, COUNT(reaction.user_id) as c
FROM match
LEFT JOIN reaction ON reaction.item_id = match.item_id
AND reaction.reaction = 1
GROUP BY match.item_id

MySQL Join two tables with condition

Based on these two tables:
products
| ID | Active | Name | No
--------------------------------------------------
| 1 | 1 | Shirt | 100
| 2 | 0 | Pullover | 200
variants
| MasterID | Active | Name | No
--------------------------------------------------
| 1 | 1 | Red | 101
| 1 | 0 | Yellow | 102
I want to get every active product and also its active variants in one SQL query.
The relation between the tables is variants.MasterID -> products.ID.
Needed result:
ID (master) | Name | No
--------------------------------------------------
1 | Shirt | 100
1 | Red | 101
I tried it with using union, but then I am not able to get the belonging MasterIDs.

It looks like you just need a simple join:
select *
from products
left join variants
on products.ID = variants.MasterID
where products.Active = 1
and variants.Active = 1
Update after requirements were made clearer:
select ID, Name, No, 'products' as RowType
from products
where Active = 1
union
select variants.MasterID as ID, variants.Name, variants.No, 'variants' as RowType
from products
join variants
on products.ID = variants.MasterID
where products.Active = 1
and variants.Active = 1
order by ID, RowType, No
I've assumed you want the results ordered by ID, with products followed by variants. The No column may order it this way implicitly (it's impossible to know without real data), in which case the RowType column can be removed. The order by clause might need to be altered to match your specific RDBMS.
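The UNION query above can be tried out with a short sketch like the following, which uses Python's sqlite3 module and the two sample tables from the question. SQLite is an assumed stand-in for the asker's database, the function name is hypothetical, the No column is double-quoted as a precaution because NO is an SQL keyword, and the ORDER BY uses column positions (1 = ID, 4 = RowType, 3 = No) instead of names.

```python
import sqlite3


def active_products_with_variants():
    """Return active products followed by their active variants."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE products (ID INTEGER, Active INTEGER, Name TEXT, "No" INTEGER);
        CREATE TABLE variants (MasterID INTEGER, Active INTEGER, Name TEXT, "No" INTEGER);
        INSERT INTO products VALUES (1, 1, 'Shirt', 100), (2, 0, 'Pullover', 200);
        INSERT INTO variants VALUES (1, 1, 'Red', 101), (1, 0, 'Yellow', 102);
    """)
    rows = con.execute("""
        -- Active products themselves.
        SELECT ID, Name, "No", 'products' AS RowType
        FROM products
        WHERE Active = 1
        UNION
        -- Active variants of active products, labelled with the master ID.
        SELECT v.MasterID AS ID, v.Name, v."No", 'variants' AS RowType
        FROM products AS p
        JOIN variants AS v ON p.ID = v.MasterID
        WHERE p.Active = 1 AND v.Active = 1
        ORDER BY 1, 4, 3
    """).fetchall()
    con.close()
    return rows
```

On the sample data this returns [(1, 'Shirt', 100, 'products'), (1, 'Red', 101, 'variants')]: the inactive Pullover and the inactive Yellow variant are both filtered out.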

This should give you the expected result:
select * from products left join variants on products.id = variants.masterId
where products.active=1 and variants.active=1
If not please add the expected result to your question.

SQL query to find missing entries in

I have a database in which I need to find some missing entries and fill them in.
I have a table called "menu". Each restaurant has multiple dishes, and each dish has 4 different language entries (actually 8 in the main database, but for simplicity let's go with 4). I need to find out which dishes for a particular restaurant are missing any language entries.
select * from menu where restaurantid = 1
I get stuck there. I need something along the lines of "where language 1 or 2 or 3 or 4 doesn't exist", which is the complicated bit: I need to see the languages that exist in order to work out the language that's missing, because I can't display something that isn't there. I hope that makes sense.
In the example table below, restaurant 2, dishid 2 is missing language 3; that's what I need to find.
+--------------+--------+----------+-----------+
| RestaurantID | DishID | DishName | Language |
+--------------+--------+----------+-----------+
| 1 | 1 | Soup | 1 |
| 1 | 1 | Soúp | 2 |
| 1 | 1 | Soupe | 3 |
| 1 | 1 | Soupa | 4 |
| 1 | 2 | Bread | 1 |
| 1 | 2 | Bréad | 2 |
| 1 | 2 | Breade | 3 |
| 1 | 1 | Breada | 4 |
| 2 | 1 | Dish1 | 1 |
| 2 | 1 | Dísh1 | 2 |
| 2 | 1 | Disha1 | 3 |
| 2 | 1 | Dishe1 | 4 |
| 2 | 2 | Dish2 | 1 |
| 2 | 2 | Dísh2 | 2 |
| 2 | 2 | Dishe2 | 4 |
+--------------+--------+----------+-----------+

An anti-join pattern is usually the most efficient, in terms of performance.
Your particular case is a little more tricky, in that you need to "generate" rows that are missing. If every (RestaurantID,DishID) should have 4 rows, with Language values of 1,2,3 and 4, we can generate that set of all rows with a CROSS JOIN operation.
The next step is to apply an anti-join... a LEFT OUTER JOIN to the rows that exist in the menu table, so we get all the rows from the CROSS JOIN set, along with matching rows.
The "trick" is to use a predicate in the WHERE clause that filters out rows where we found a match, so we are left with the rows that didn't have a match.
(It seems a bit strange at first, but once you get your brain wrapped around the anti-join pattern, it becomes familiar.)
So a query of this form should return the specified result set.
SELECT d.RestaurantID
, d.DishID
, lang.id AS missing_language
FROM (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
) lang
CROSS
JOIN (SELECT e.RestaurantID, e.DishID
FROM menu e
GROUP BY e.RestaurantID, e.DishID
) d
LEFT
JOIN menu m
ON m.RestaurantID = d.RestaurantID
AND m.DishID = d.DishID
AND m.Language = lang.id
WHERE m.RestaurantID IS NULL
ORDER BY 1,2,3
Let's unpack that bit.
First we get a set containing the numbers 1 thru 4.
Next we get a set containing the (RestaurantID, DishID) distinct tuples. (For each distinct Restaurant, a distinct list of DishID, as long as there is at least one row for any Language for that combination.)
We do a CROSS JOIN, matching every row from set one (lang) with every row from set two (d), to generate a "complete" set of every (RestaurantID, DishID, Language) we want to have.
The next part is the anti-join... the left outer join to menu to find which of the rows from the "complete" set has a matching row in menu, and filtering out all the rows that had a match.
That may be a little confusing. If we think of that CROSS JOIN operation producing a temporary table that looks like the menu table, but containing all possible rows... we can think of it in terms of pseudocode:
create temporary table all_menu_rows (RestaurantID, DishID, Language) ;
insert into all_menu_rows ... all possible rows, combinations ;
Then the anti-join pattern is a little easier to see:
SELECT r.RestaurantID
, r.DishID
, r.Language
FROM all_menu_rows r
LEFT
JOIN menu m
ON m.RestaurantID = r.RestaurantID
AND m.DishID = r.DishID
AND m.Language = r.Language
WHERE m.RestaurantID IS NULL
ORDER BY 1,2,3
(But we don't have to incur the extra overhead of creating and populating the temporary table, we can do that right in the query.)
Of course, this isn't the only approach. We could use a NOT EXISTS predicate instead of an anti-join, though this is not usually as efficient. The first part of the query is the same, to generate the "complete" set of rows we expect to have; what differs is how we identify whether or not there is a matching row in the menu table:
SELECT d.RestaurantID
, d.DishID
, lang.id AS missing_language
FROM (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
) lang
CROSS
JOIN (SELECT e.RestaurantID, e.DishID
FROM menu e
GROUP BY e.RestaurantID, e.DishID
) d
WHERE NOT EXISTS ( SELECT 1
FROM menu m
WHERE m.RestaurantID = d.RestaurantID
AND m.DishID = d.DishID
AND m.Language = lang.id
)
ORDER BY 1,2,3
For each row in the "complete" set (generated by the CROSS JOIN operation), we're going to run a correlated subquery that checks whether a matching row is found. The NOT EXISTS predicate returns TRUE if no matching row is found. (This is a little easier to understand, but it usually doesn't perform as well as the anti-join pattern.)
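Both forms can be compared side by side with a small sketch using Python's sqlite3 module. The assumptions: SQLite stands in for the asker's database, only the restaurant 2 rows from the question's table are loaded, and the names make_menu_db, COMPLETE_SET, missing_by_anti_join and missing_by_not_exists are hypothetical helpers introduced for this illustration.

```python
import sqlite3

# Every (RestaurantID, DishID) that exists, crossed with languages 1..4.
COMPLETE_SET = """
    FROM (SELECT 1 AS id UNION ALL SELECT 2
          UNION ALL SELECT 3 UNION ALL SELECT 4) AS lang
    CROSS JOIN (SELECT e.RestaurantID, e.DishID
                FROM menu AS e
                GROUP BY e.RestaurantID, e.DishID) AS d
"""


def make_menu_db():
    """Build an in-memory menu table holding the restaurant 2 sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE menu (RestaurantID INTEGER, DishID INTEGER,
                           DishName TEXT, Language INTEGER);
        INSERT INTO menu VALUES
            (2, 1, 'Dish1', 1), (2, 1, 'Dísh1', 2),
            (2, 1, 'Disha1', 3), (2, 1, 'Dishe1', 4),
            (2, 2, 'Dish2', 1), (2, 2, 'Dísh2', 2), (2, 2, 'Dishe2', 4);
    """)
    return con


def missing_by_anti_join(con):
    # LEFT JOIN to menu, then keep only rows where no match was found.
    return con.execute(
        "SELECT d.RestaurantID, d.DishID, lang.id AS missing_language"
        + COMPLETE_SET
        + """
        LEFT JOIN menu AS m
               ON m.RestaurantID = d.RestaurantID
              AND m.DishID = d.DishID
              AND m.Language = lang.id
        WHERE m.RestaurantID IS NULL
        ORDER BY 1, 2, 3
        """
    ).fetchall()


def missing_by_not_exists(con):
    # Correlated subquery: keep rows for which no menu row exists.
    return con.execute(
        "SELECT d.RestaurantID, d.DishID, lang.id AS missing_language"
        + COMPLETE_SET
        + """
        WHERE NOT EXISTS (SELECT 1
                          FROM menu AS m
                          WHERE m.RestaurantID = d.RestaurantID
                            AND m.DishID = d.DishID
                            AND m.Language = lang.id)
        ORDER BY 1, 2, 3
        """
    ).fetchall()
```

Calling either function on make_menu_db() yields [(2, 2, 3)], that is, restaurant 2, dish 2 is missing language 3. Which form runs faster depends on the database engine and version, so it is worth checking the execution plan on real data.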

You can use the following statement if each menu item should have a record in each language (8 in real life, 4 in the example). You can change the number 4 to 8 if you want to see all menu items per restaurant that don't have all 8 entries.
SELECT RestaurantID,DishID, COUNT( * )
FROM Menu
GROUP BY RestaurantID,DishID
HAVING COUNT( * ) <4
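A quick way to see what this counting query reports is the following sketch, again using Python's sqlite3 module as an assumed stand-in database and only the restaurant 2 rows from the question. The function name incomplete_dishes and its expected_languages parameter are hypothetical.

```python
import sqlite3


def incomplete_dishes(expected_languages=4):
    """Return (RestaurantID, DishID, entry_count) for dishes with too few rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE menu (RestaurantID INTEGER, DishID INTEGER,
                           DishName TEXT, Language INTEGER);
        INSERT INTO menu VALUES
            (2, 1, 'Dish1', 1), (2, 1, 'Dísh1', 2),
            (2, 1, 'Disha1', 3), (2, 1, 'Dishe1', 4),
            (2, 2, 'Dish2', 1), (2, 2, 'Dísh2', 2), (2, 2, 'Dishe2', 4);
    """)
    rows = con.execute("""
        SELECT RestaurantID, DishID, COUNT(*) AS entry_count
        FROM menu
        GROUP BY RestaurantID, DishID
        HAVING COUNT(*) < ?
        ORDER BY RestaurantID, DishID
    """, (expected_languages,)).fetchall()
    con.close()
    return rows
```

Here incomplete_dishes(4) returns [(2, 2, 3)], meaning restaurant 2, dish 2 has only 3 entries. The query identifies the dish but not which language is missing, and if duplicate language rows are possible, COUNT(DISTINCT Language) would be the more defensive count.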

Score algorithm in multiple join

I have a list of publications stored in publications table. Each publication has a many-to-many relation with categories and also a many-to-many relation with keywords.
Given a publication I'd like to find related ones based on a score value computed with the following algorithm:
each shared category with other publications counts as one point
each shared keyword with other publications counts as one point
the score value is the sum of the points computed with previous steps
I want to retrieve with a single query the list of related publications ordered by this score.
Now I have these two queries which compute the score for both categories and keyword
SELECT c.publication_id, (COUNT(c.category_id)) AS cscore
FROM cat_pub c
WHERE c.category_id IN <list of category ids obtained from the current publication>
GROUP BY c.publication_id
ORDER BY cscore DESC
and for the keyword score
SELECT k.publication_id, (COUNT(k.keyword_id)) AS kscore
FROM key_pub k
WHERE k.keyword_id IN <list of keyword ids obtained from the current publication>
GROUP BY k.publication_id
ORDER BY kscore DESC
Finally I need to JOIN the resulting query with a SELECT query which should retrieve publications data (title, intro, etc,) ordering them by score and with a limit clause to get the most relevant publications related to the selected one.
Currently I tried to use these two queries as subtables in a join:
SELECT p.*, (q1.cscore + q2.kscore) AS score
FROM publications p
INNER JOIN (<cscore query>) q1 ON p.id = q1.publication_id
INNER JOIN (<kscore query>) q2 ON p.id = q2.publication_id
ORDER BY score DESC
LIMIT 5
EXPLAIN shows me that a couple of temporary tables will be used. Could that be a performance problem? Is there a better way to implement this?
Update
In reply to Johan's comment:
Your solution is wrong. Using a LIMIT clause in the subqueries can lead to inconsistent results for any value of the limit. What if I have the following results for the subqueries? (I show more records than the limit, but your query will fetch only the first ten.)
+-------+--------+ +-------+--------+
| p.id | cscore | | p.id | kscore |
+-------+--------+ +-------+--------+
| 27854 | 100 | | 27865 | 100 |
| 27853 | 100 | | 27864 | 100 |
| 27852 | 100 | | 27863 | 100 |
| 27851 | 100 | | 27862 | 100 |
| 27850 | 100 | | 27861 | 100 |
| 27849 | 100 | | 27860 | 100 |
| 27848 | 100 | | 27859 | 100 |
| 27847 | 100 | | 27858 | 100 |
| 27846 | 100 | | 27857 | 100 |
| 27845 | 100 | | 27856 | 100 |
| 27844 | 100 | | 27855 | 100 |
| 1000 | 99 | | 1000 | 99 |
+-------+--------+ +-------+--------+
If I have ten records with 100 as cscore and ten different records with 100 as kscore, the join will produce an empty set. So I get no result at all, while the publication with id 1000 should be the answer and is left out of the result set.
Furthermore, I could consider your solution with a LEFT JOIN. In that case only records from the left table will be fetched, and each record will get a total score of 100 (because of the NULL given by the empty kscore field in the second table). Again, the result is wrong, because the highest-scored record should be p1000 with a total score of 198 (= 99 + 99).
Your solution cannot produce reliable results.

You only want 5 results each from the subqueries.
I think it is best to select only a few rows from them and use that in the query.
Rewrite q1 as:
SELECT c.publication_id, COUNT(*) AS cscore
FROM cat_pub c
WHERE c.publication_id = p.id
AND c.category_id IN <list of category ids obtained from the current publication>
GROUP BY c.publication_id
ORDER BY cscore DESC
LIMIT 10
Rewrite q2 as:
SELECT k.publication_id, COUNT(*) AS kscore
FROM key_pub k
WHERE p.id = k.publication_id
AND k.keyword_id IN <list of keyword ids obtained from the current publication>
GROUP BY k.publication_id
ORDER BY kscore DESC
LIMIT 10
Leave the join as is:
SELECT p.*, (q1.cscore + q2.kscore) AS score
FROM publications p
INNER JOIN (<cscore query>) q1 ON p.id = q1.publication_id
INNER JOIN (<kscore query>) q2 ON p.id = q2.publication_id
ORDER BY score DESC
LIMIT 5
Note that count(*) is usually a faster choice, because it does not test for NULL. If you can have NULL values and don't want to include them in the count, then name the field explicitly with count(field).
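One alternative that sidesteps both the inner-join requirement and the LIMIT problem raised in the question's update is to stack the category points and keyword points with UNION ALL and count them per publication. This is a different technique from either query above, sketched here with Python's sqlite3 module as an assumed stand-in database. The sample rows, the keyword_id column, the title column and the function name related_publications are all made up for illustration.

```python
import sqlite3


def related_publications(current_id, limit=5):
    """Return (id, title, score) of publications related to current_id."""
    con = sqlite3.connect(":memory:")
    # Hypothetical sample data: publication 1 is the current one.
    con.executescript("""
        CREATE TABLE publications (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE cat_pub (publication_id INTEGER, category_id INTEGER);
        CREATE TABLE key_pub (publication_id INTEGER, keyword_id INTEGER);
        INSERT INTO publications VALUES (1, 'current'), (2, 'A'), (3, 'B'), (4, 'C');
        INSERT INTO cat_pub VALUES (1, 1), (1, 2), (2, 1), (2, 2), (3, 1);
        INSERT INTO key_pub VALUES (1, 10), (1, 11), (2, 10), (4, 11);
    """)
    rows = con.execute("""
        SELECT p.id, p.title, COUNT(*) AS score
        FROM (
            -- One row (one point) per shared category.
            SELECT c.publication_id
            FROM cat_pub AS c
            WHERE c.category_id IN (SELECT category_id FROM cat_pub
                                    WHERE publication_id = :cur)
            UNION ALL
            -- One row (one point) per shared keyword.
            SELECT k.publication_id
            FROM key_pub AS k
            WHERE k.keyword_id IN (SELECT keyword_id FROM key_pub
                                   WHERE publication_id = :cur)
        ) AS pts
        JOIN publications AS p ON p.id = pts.publication_id
        WHERE p.id <> :cur
        GROUP BY p.id, p.title
        ORDER BY score DESC, p.id
        LIMIT :lim
    """, {"cur": current_id, "lim": limit}).fetchall()
    con.close()
    return rows
```

With the sample rows, related_publications(1) returns [(2, 'A', 3), (3, 'B', 1), (4, 'C', 1)]. Publications that share only categories or only keywords still get a score, and the single LIMIT is applied after the totals are computed, so no candidate is cut off early.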
