MySQL - LEFT JOIN a double relationship?

SELECT stores.ID, store_info.display_name, store_info.address, store_info.phone,
IFNULL(
GROUP_CONCAT(DISTINCT storeBrands.display_name ORDER BY storeBrands.name),
GROUP_CONCAT(DISTINCT chainBrands.display_name ORDER BY chainBrands.name)
) AS brands,
IFNULL(
GROUP_CONCAT(DISTINCT storeFilters.name ORDER BY storeFilters.name),
GROUP_CONCAT(DISTINCT chainFilters.name ORDER BY chainFilters.name)
) AS filters
FROM stores
LEFT JOIN store_info ON stores.ID = store_info.storeID
LEFT JOIN store_brands ON stores.ID = store_brands.store
LEFT JOIN chain_brands ON stores.chainID = chain_brands.chain
LEFT JOIN brands AS storeBrands ON store_brands.brand = storeBrands.ID
LEFT JOIN brands AS chainBrands ON chain_brands.brand = chainBrands.ID
LEFT JOIN store_filters ON stores.ID = store_filters.store
LEFT JOIN chain_filters ON stores.chainID = chain_filters.chain
LEFT JOIN filters AS storeFilters ON store_filters.filter = storeFilters.ID
LEFT JOIN filters AS chainFilters ON chain_filters.filter = chainFilters.ID
WHERE stores.city = 1
GROUP BY stores.ID
I have updated this question because I have solved the initial problem myself, but there's still one more question:
How can I improve on this?
I feel like I've made a lot of progress already. I have gone from a UNION with subqueries, to a single query with subqueries, to improving my joins to the point where I no longer need a subquery for each row.
However, it still feels like it could be better. I'm very unsure about my joins.
Does anyone have tips for improvement?
The goal:
I want this query to get results from a hierarchy. We have 'parents' (chains) that share the same brands and filters (and other things) with their own children (stores). The idea is for the 'child' to inherit the parent's settings as a fallback, but to ignore them completely once it sets its own data.
So, basically, with one query, I want "either this data or that data", never both. One or the other. (Another reason why UNION won't really fit.)

If you want the chain's data only when the store has none, formulate a UNION. One operand of the UNION joins the master data to the store data, while the other operand (which produces rows only when the store has none) joins the master data to the chain data. That's what one uses a UNION for: "I sometimes want these, and I sometimes want those."
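The UNION described above could look roughly like this minimal sketch, which uses Python's built-in sqlite3 module as a stand-in for MySQL. Only brands are shown, the tables are a reduced version of the question's, and the rows are invented. Using NOT EXISTS to make the chain operand produce rows only for stores without brands of their own is an assumption about how to express "only when the store has none".

```python
import sqlite3

# SQLite stand-in for MySQL; reduced schema modeled on the question, invented data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (ID INTEGER PRIMARY KEY, chainID INTEGER);
CREATE TABLE brands (ID INTEGER PRIMARY KEY, display_name TEXT);
CREATE TABLE store_brands (store INTEGER, brand INTEGER);
CREATE TABLE chain_brands (chain INTEGER, brand INTEGER);

INSERT INTO stores VALUES (1, 10), (2, 10);
INSERT INTO brands VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO chain_brands VALUES (10, 1), (10, 2);  -- chain 10 sells Acme and Globex
INSERT INTO store_brands VALUES (2, 3);            -- store 2 overrides with Initech only
""")

# Operand 1: each store's own brands.
# Operand 2: the chain's brands, but only for stores with no brands of their own.
query = """
SELECT s.ID AS store, b.display_name
FROM stores s
JOIN store_brands sb ON sb.store = s.ID
JOIN brands b ON b.ID = sb.brand
UNION ALL
SELECT s.ID, b.display_name
FROM stores s
JOIN chain_brands cb ON cb.chain = s.chainID
JOIN brands b ON b.ID = cb.brand
WHERE NOT EXISTS (SELECT 1 FROM store_brands sb2 WHERE sb2.store = s.ID)
ORDER BY 1, 2
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Acme'), (1, 'Globex'), (2, 'Initech')]
```

Store 1 falls back to the chain's brands, while store 2 only sees its own; the two operands never both contribute rows for the same store.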

I managed to fix this issue myself, and made it much more efficient by removing most of the subqueries:
SELECT stores.ID, store_info.display_name, store_info.address, store_info.phone,
IFNULL(
GROUP_CONCAT(DISTINCT storeBrands.display_name ORDER BY storeBrands.name),
GROUP_CONCAT(DISTINCT chainBrands.display_name ORDER BY chainBrands.name)
) AS brands,
IFNULL(
GROUP_CONCAT(DISTINCT storeFilters.name ORDER BY storeFilters.name),
GROUP_CONCAT(DISTINCT chainFilters.name ORDER BY chainFilters.name)
) AS filters
FROM stores
LEFT JOIN store_info ON stores.ID = store_info.storeID
LEFT JOIN store_brands ON stores.ID = store_brands.store
LEFT JOIN chain_brands ON stores.chainID = chain_brands.chain
LEFT JOIN brands AS storeBrands ON store_brands.brand = storeBrands.ID
LEFT JOIN brands AS chainBrands ON chain_brands.brand = chainBrands.ID
LEFT JOIN store_filters ON stores.ID = store_filters.store
LEFT JOIN chain_filters ON stores.chainID = chain_filters.chain
LEFT JOIN filters AS storeFilters ON store_filters.filter = storeFilters.ID
LEFT JOIN filters AS chainFilters ON chain_filters.filter = chainFilters.ID
WHERE stores.city = $cityID
GROUP BY stores.ID
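A quick way to check the IFNULL() fallback, and to see why the chain-side GROUP_CONCAT() also needs DISTINCT, is this minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The tables are a reduced, hypothetical version of the ones above with invented rows; the ORDER BY inside GROUP_CONCAT() is left out because older SQLite versions do not support it. The chain_filters join stands in for the filter joins, which multiply each store's rows.

```python
import sqlite3

# SQLite stand-in for MySQL; reduced schema modeled on the query above, invented data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (ID INTEGER PRIMARY KEY, chainID INTEGER);
CREATE TABLE brands (ID INTEGER PRIMARY KEY, display_name TEXT);
CREATE TABLE store_brands (store INTEGER, brand INTEGER);
CREATE TABLE chain_brands (chain INTEGER, brand INTEGER);
CREATE TABLE chain_filters (chain INTEGER, filter INTEGER);

INSERT INTO stores VALUES (1, 10), (2, 10);
INSERT INTO brands VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO chain_brands VALUES (10, 1), (10, 2);
INSERT INTO chain_filters VALUES (10, 100), (10, 200);  -- two filters double every row
INSERT INTO store_brands VALUES (2, 3);                  -- store 2 has its own brand
""")

def brands_per_store(distinct_chain):
    """Run the IFNULL(store list, chain list) fallback; return sorted brand lists."""
    d = "DISTINCT " if distinct_chain else ""
    sql = f"""
    SELECT stores.ID,
           IFNULL(GROUP_CONCAT(DISTINCT storeBrands.display_name),
                  GROUP_CONCAT({d}chainBrands.display_name)) AS brands
    FROM stores
    LEFT JOIN store_brands ON stores.ID = store_brands.store
    LEFT JOIN chain_brands ON stores.chainID = chain_brands.chain
    LEFT JOIN brands AS storeBrands ON store_brands.brand = storeBrands.ID
    LEFT JOIN brands AS chainBrands ON chain_brands.brand = chainBrands.ID
    LEFT JOIN chain_filters ON stores.chainID = chain_filters.chain
    GROUP BY stores.ID
    """
    # GROUP_CONCAT order is unspecified here, so sort each list before comparing.
    return {sid: sorted(b.split(",")) for sid, b in conn.execute(sql)}

without_distinct = brands_per_store(False)
with_distinct = brands_per_store(True)
print(without_distinct)  # {1: ['Acme', 'Acme', 'Globex', 'Globex'], 2: ['Initech']}
print(with_distinct)     # {1: ['Acme', 'Globex'], 2: ['Initech']}
```

GROUP_CONCAT() returns NULL when every value in the group is NULL, which is what lets IFNULL() switch to the chain list only for stores without brands of their own.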

Related

FULL table scan in LEFT JOIN using OR

How can this query be optimized to avoid the full table scan described below?
I've got a slow query that's taking approximately 15 seconds to return.
Let's get this part out of the way: I've confirmed all the indexes are there.
When I run EXPLAIN, it shows a full table scan on the crosswalk table (the index on fromQuestionCategoryJoinID is not used, even if I try to force it). If I remove either of the two conditions and the OR, the index is used and the query completes in milliseconds.
SELECT c.id, c.name, GROUP_CONCAT(DISTINCT tags.externalDisplayID SEPARATOR ', ') AS tags
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN questioncategoryjoin qcatjsub on qcatjsub.parentQuestionID = qcatj.questionID
LEFT JOIN crosswalk cw on (cw.fromQuestionCategoryJoinID = qcatj.id OR cw.fromQuestionCategoryJoinID = qcatjsub.id)
-- index used if I remove OR, eg.: LEFT JOIN crosswalk cw on (cw.fromQuestionCategoryJoinID = qcatj.id)
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
GROUP BY c.id
ORDER BY c.name, tags.externalDisplayID;
Split the query into two queries, one for each side of the OR, then combine them with UNION ALL:
SELECT id, name, GROUP_CONCAT(DISTINCT externalDisplayID SEPARATOR ', ') AS tags
FROM (
SELECT c.id, c.name, tags.externalDisplayID
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN crosswalk cw on cw.fromQuestionCategoryJoinID = qcatj.id
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
UNION ALL
SELECT c.id, c.name, tags.externalDisplayID
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN questioncategoryjoin qcatjsub on qcatjsub.parentQuestionID = qcatj.questionID
LEFT JOIN crosswalk cw on cw.fromQuestionCategoryJoinID = qcatjsub.id
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
) AS x
GROUP BY x.id
ORDER BY x.name
Also, it doesn't make sense to include externalDisplayID in ORDER BY, because that will order by its value from an arbitrary row in the group. You could put ORDER BY externalDisplayID inside the GROUP_CONCAT() arguments if that's what you want.
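To sanity-check that the OR join and the UNION ALL rewrite return the same tags, here is a minimal sketch that runs both on tiny invented data, using Python's sqlite3 module as a stand-in for MySQL. SQLite's GROUP_CONCAT() has no SEPARATOR clause and rejects DISTINCT with a second argument, so the default comma separator is used; name and ORDER BY are dropped because they don't affect the comparison.

```python
import sqlite3

# SQLite stand-in for MySQL; table names follow the question, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE checklist (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE question (id INTEGER PRIMARY KEY, externalDisplayID TEXT);
CREATE TABLE questionchecklistjoin (checklistID INTEGER, questionID INTEGER);
CREATE TABLE questioncategoryjoin (id INTEGER PRIMARY KEY, questionID INTEGER, parentQuestionID INTEGER);
CREATE TABLE crosswalk (fromQuestionCategoryJoinID INTEGER, toQuestionCategoryJoinID INTEGER);

INSERT INTO checklist VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO question VALUES (1, 'Q1'), (2, 'Q2'), (3, 'T-A'), (4, 'T-B'), (5, 'Q5');
INSERT INTO questionchecklistjoin VALUES (1, 1), (2, 5);
INSERT INTO questioncategoryjoin VALUES
  (10, 1, NULL),   -- question 1, on checklist 1
  (11, 2, 1),      -- question 2 is a sub-question of question 1
  (12, 3, NULL), (13, 4, NULL), (14, 5, NULL);
INSERT INTO crosswalk VALUES (10, 12), (11, 13);  -- parent maps to T-A, child to T-B
""")

OR_QUERY = """
SELECT c.id, GROUP_CONCAT(DISTINCT tags.externalDisplayID)
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj ON qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN questioncategoryjoin qcatjsub ON qcatjsub.parentQuestionID = qcatj.questionID
LEFT JOIN crosswalk cw ON (cw.fromQuestionCategoryJoinID = qcatj.id
                        OR cw.fromQuestionCategoryJoinID = qcatjsub.id)
LEFT JOIN questioncategoryjoin qcj1 ON qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags ON tags.id = qcj1.questionID
GROUP BY c.id
"""

UNION_QUERY = """
SELECT id, GROUP_CONCAT(DISTINCT externalDisplayID)
FROM (
  SELECT c.id AS id, tags.externalDisplayID AS externalDisplayID
  FROM checklist c
  LEFT JOIN questionchecklistjoin qcheckj ON qcheckj.checklistID = c.id
  LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
  LEFT JOIN crosswalk cw ON cw.fromQuestionCategoryJoinID = qcatj.id
  LEFT JOIN questioncategoryjoin qcj1 ON qcj1.id = cw.toQuestionCategoryJoinID
  LEFT JOIN question tags ON tags.id = qcj1.questionID
  UNION ALL
  SELECT c.id, tags.externalDisplayID
  FROM checklist c
  LEFT JOIN questionchecklistjoin qcheckj ON qcheckj.checklistID = c.id
  LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
  LEFT JOIN questioncategoryjoin qcatjsub ON qcatjsub.parentQuestionID = qcatj.questionID
  LEFT JOIN crosswalk cw ON cw.fromQuestionCategoryJoinID = qcatjsub.id
  LEFT JOIN questioncategoryjoin qcj1 ON qcj1.id = cw.toQuestionCategoryJoinID
  LEFT JOIN question tags ON tags.id = qcj1.questionID
) AS x
GROUP BY x.id
"""

def tags_by_checklist(sql):
    # Sort each list because GROUP_CONCAT order is unspecified; NULL becomes [].
    return {cid: sorted(t.split(",")) if t else [] for cid, t in conn.execute(sql)}

or_result = tags_by_checklist(OR_QUERY)
union_result = tags_by_checklist(UNION_QUERY)
print(or_result)  # {1: ['T-A', 'T-B'], 2: []}
```

Each UNION ALL branch has a single equality condition on crosswalk, which is what allows an index on fromQuestionCategoryJoinID to be used in each branch.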
There is a second inefficiency going on here. I call it "explode-implode": first a bunch of JOINs (potentially) expand the number of rows in an intermediate table, then GROUP BY c.id collapses the number of rows back to what you started with (one row of output per row of checklist).
Before trying to help with that, please answer:
Is LEFT really needed?
How many rows in each table? (Especially in cw)
Can you get rid of DISTINCT?
Barmar's answer can possibly be improved upon by delaying the JOINs to qcj1 and tags until after the UNION:
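SELECT u.id, u.name, GROUP_CONCAT(DISTINCT tags.externalDisplayID SEPARATOR ', ') AS tags
FROM ( SELECT c.id, c.name, cw.toQuestionCategoryJoinID
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj ON qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN crosswalk cw ON cw.fromQuestionCategoryJoinID = qcatj.id
UNION ALL
SELECT c.id, c.name, cw.toQuestionCategoryJoinID
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj ON qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN questioncategoryjoin qcatjsub ON qcatjsub.parentQuestionID = qcatj.questionID
LEFT JOIN crosswalk cw ON cw.fromQuestionCategoryJoinID = qcatjsub.id
) AS u
-- [LEFT]: a plain JOIN may be enough for the next two, see the questions above
LEFT JOIN questioncategoryjoin qcj1 ON qcj1.id = u.toQuestionCategoryJoinID
LEFT JOIN question tags ON tags.id = qcj1.questionID
GROUP BY u.id
ORDER BY u.name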
Another optimization (again building on Barmar's answer): change
GROUP BY x.id
ORDER BY x.name
to
GROUP BY x.name, x.id
ORDER BY x.name, x.id
When the items in GROUP BY and ORDER BY are the "same", they can be done in a single action, thereby saving (at least) a sort.
Ordering by x.name, x.id is deterministic, whereas x.name alone might put two rows with the same name in a different order, depending (perhaps) on the phase of the moon.
These indexes may help:
qcheckj: INDEX(checklistID, questionID)
qcatj: INDEX(questionID, id)
qcatjsub: INDEX(parentQuestionID, id)
cw: INDEX(fromQuestionCategoryJoinID, toQuestionCategoryJoinID)

Not getting the right results using GROUP_CONCAT in query

I have 7 tables to work with inside a query:
tb_posts, tb_spots, users, tb_sports, tb_spot_types, tb_users_sports, tb_post_media
This is the query I am using:
SELECT po.id_post AS id_post,
po.description_post as description_post,
sp.id_spot as id_spot,
po.date_post as date_post,
u.id AS userid,
u.user_type As tipousuario,
u.username AS username,
spo.id_sport AS sportid,
spo.sport_icon as sporticon,
st.logo_spot_type as spottypelogo,
sp.city_spot AS city_spot,
sp.country_spot AS country_spot,
sp.latitud_spot as latitudspot,
sp.longitud_spot as longitudspot,
sp.short_name AS spotshortname,
sp.verified_spot AS spotverificado,
u.profile_image AS profile_image,
sp.verified_spot_by as spotverificadopor,
uv.id AS spotverificador,
uv.user_type AS spotverificadornivel,
pm.media_type AS mediatype,
pm.media_file AS mediafile,
GROUP_CONCAT(tus.user_sport_sport) sportsdelusuario,
GROUP_CONCAT(logosp.sport_icon) sportsdelusuariologos,
GROUP_CONCAT(pm.media_file) mediapost,
GROUP_CONCAT(pm.media_type) mediaposttype
FROM tb_posts po
LEFT JOIN tb_spots sp ON po.spot_post = sp.id_spot
LEFT JOIN users u ON po.uploaded_by_post = u.id
LEFT JOIN tb_sports spo ON sp.sport_spot = spo.id_sport
LEFT JOIN tb_spot_types st ON sp.type_spot = st.id_spot_type
LEFT JOIN users uv ON sp.verified_spot_by = uv.id
LEFT JOIN tb_users_sports tus ON tus.user_sport_user = u.id
LEFT JOIN tb_sports logosp ON logosp.id_sport = tus.user_sport_sport
LEFT JOIN tb_post_media pm ON pm.media_post = po.id_post
WHERE po.status = 1
GROUP BY po.id_post,uv.id
I am having problems with some of the GROUP_CONCAT groups:
GROUP_CONCAT(tus.user_sport_sport) sportsdelusuario is giving me the right items but repeated, all items twice
GROUP_CONCAT(logosp.sport_icon) sportsdelusuariologos is giving me the right items but repeated, all items twice
GROUP_CONCAT(pm.media_file) mediapost is giving me the right items but repeated four times
GROUP_CONCAT(pm.media_type) mediaposttype is giving me the right items but repeated four times
I can post all the table structures here if you need them.
Multiple one-to-many relations JOINed in a query have a multiplicative effect on aggregation results; the standard solution is to aggregate each of them in a subquery before joining:
You can change
GROUP_CONCAT(pm.media_type) mediaposttype
...
LEFT JOIN tb_post_media pm ON pm.media_post = po.id_post
to
pm.mediaposttype
...
LEFT JOIN (
SELECT media_post, GROUP_CONCAT(media_type) AS mediaposttype
FROM tb_post_media
GROUP BY media_post
) AS pm ON pm.media_post = po.id_post
If tb_post_media is very big, and the po.status = 1 condition in the outer query would significantly reduce the results of the subquery, it can be worth replicating the original join within the subquery to filter down its results.
Similarly, the correlated version I mentioned in the comments can perform better if the outer query returns relatively few rows. (Computing GROUP_CONCAT() separately for each outer row can cost less than computing it once for every post if you would actually use only a few of those results.)
Or just add DISTINCT to every GROUP_CONCAT(), e.g. GROUP_CONCAT(DISTINCT pm.media_type).
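To see the multiplicative effect, and where the two fixes differ, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. Only the columns involved are kept and the rows are invented. It shows that DISTINCT also collapses values that legitimately repeat, such as two media items of type image, while the pre-aggregated subqueries keep them.

```python
import sqlite3

# SQLite stand-in for MySQL; reduced tables from the question, invented data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb_posts (id_post INTEGER PRIMARY KEY, uploaded_by_post INTEGER, status INTEGER);
CREATE TABLE tb_users_sports (user_sport_user INTEGER, user_sport_sport INTEGER);
CREATE TABLE tb_post_media (media_post INTEGER, media_file TEXT, media_type TEXT);

INSERT INTO tb_posts VALUES (1, 7, 1);
INSERT INTO tb_users_sports VALUES (7, 1), (7, 2);       -- user 7 has 2 sports
INSERT INTO tb_post_media VALUES (1, 'a.jpg', 'image'),  -- post 1 has 3 media,
                                 (1, 'b.jpg', 'image'),  -- two of them images
                                 (1, 'c.mp4', 'video');
""")

def run(sql):
    """Return (sports, media types) for the single post, sorted (order is unspecified)."""
    sports, types = conn.execute(sql).fetchone()
    return sorted(sports.split(",")), sorted(types.split(","))

# Both one-to-many relations joined directly: 2 sports x 3 media = 6 rows per post.
naive = run("""
SELECT GROUP_CONCAT(tus.user_sport_sport), GROUP_CONCAT(pm.media_type)
FROM tb_posts po
LEFT JOIN tb_users_sports tus ON tus.user_sport_user = po.uploaded_by_post
LEFT JOIN tb_post_media pm ON pm.media_post = po.id_post
WHERE po.status = 1
GROUP BY po.id_post
""")

# Same joins with DISTINCT: the repeats vanish, but so does the second 'image'.
distinct = run("""
SELECT GROUP_CONCAT(DISTINCT tus.user_sport_sport), GROUP_CONCAT(DISTINCT pm.media_type)
FROM tb_posts po
LEFT JOIN tb_users_sports tus ON tus.user_sport_user = po.uploaded_by_post
LEFT JOIN tb_post_media pm ON pm.media_post = po.id_post
WHERE po.status = 1
GROUP BY po.id_post
""")

# Pre-aggregate each relation so each post joins at most one row per subquery.
preagg = run("""
SELECT tus.sports, pm.types
FROM tb_posts po
LEFT JOIN (SELECT user_sport_user, GROUP_CONCAT(user_sport_sport) AS sports
           FROM tb_users_sports GROUP BY user_sport_user) AS tus
       ON tus.user_sport_user = po.uploaded_by_post
LEFT JOIN (SELECT media_post, GROUP_CONCAT(media_type) AS types
           FROM tb_post_media GROUP BY media_post) AS pm
       ON pm.media_post = po.id_post
WHERE po.status = 1
""")

print(naive, distinct, preagg, sep="\n")
```

With this data the naive query repeats each sport three times and each media type twice, DISTINCT reports ['image', 'video'], and only the pre-aggregated version returns ['image', 'image', 'video'].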

INNER JOIN with GROUP_CONCAT returns repeating values

If I try to query my database with the following query, I get the intended result:
SELECT anime.anime, GROUP_CONCAT(themes.theme) AS themes
FROM anime
INNER JOIN anime_themes ON anime.id = anime_themes.anime_id
INNER JOIN themes ON themes.id = anime_themes.theme_id
GROUP BY anime.anime
ORDER BY anime.id;
Example Output:
"amnesia,bodyswitch,naturaldisaster,tragedy"
However, when I attempt to query more data at once using the following query, I get repeating data:
SELECT anime.*, directors.director, studios.studio,
GROUP_CONCAT(genres.genre) AS genres, GROUP_CONCAT(themes.theme) AS themes
FROM anime
INNER JOIN anime_directors ON anime.id = anime_directors.anime_id
INNER JOIN directors ON directors.id = anime_directors.director_id
INNER JOIN anime_studios ON anime.id = anime_studios.anime_id
INNER JOIN studios ON studios.id = anime_studios.studio_id
INNER JOIN anime_genres ON anime.id = anime_genres.anime_id
INNER JOIN genres ON genres.id = anime_genres.genre_id
INNER JOIN anime_themes ON anime.id = anime_themes.anime_id
INNER JOIN themes ON themes.id = anime_themes.theme_id
GROUP BY anime.anime
ORDER BY anime.id;
Example Output:
"bodyswitch,naturaldisaster,tragedy,amnesia,bodyswitch,naturaldisaster,naturaldisaster,tragedy,amnesia,amnesia,bodyswitch,naturaldisaster,tragedy,tragedy,amnesia,bodyswitch"
The data doesn't even seem to be repeating in some coherent order with the second query. Why is the behavior different between the two queries, and how do I go about fixing this? Thanks.
Edit: Only 'themes' and 'genres' have repeating values while 'directors' and 'studios' don't, so I assume it has something to do with the GROUP_CONCAT function.
Just add DISTINCT.
Example:
GROUP_CONCAT(DISTINCT genres.genre) AS genres, GROUP_CONCAT(DISTINCT themes.theme) AS themes
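You are joining more tables in the second query, which can make a record appear repeatedly. Genres and themes are both one-to-many relations of an anime, so each anime gets one row per combination of its genres and themes (and directors and studios); every theme is therefore repeated once for each genre, and vice versa. Directors and studios are not passed through GROUP_CONCAT, so their repetition is simply not visible.
Adding DISTINCT might remove the repeated values, but that is not the best way to optimize this query.
You should also consider using subqueries or LEFT JOINs.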

SQL AND without requirement

I have a table displaying on my website with a list of projects. The SQL statement below pulls in each project and converts the ###_id columns to the ###_name in another table. So far so good.
The problem is that this requires every ID column in a projects row to be filled in. If, for example, the project row has no value for 'proj_industry_id', the project won't display here at all.
I've tried removing the 'AND' for each match-up in the WHERE statement and separating them with commas, but it errors out.
I've also checked SQL docs and can't seem to find my way to an answer over there.
Any ideas on how I can get my statement to still match up the id with the name when I have one, but still show the record when I don't?
Thanks!
$sql = "SELECT
projects.*,
engagement_types.eng_type_name AS eng_type,
users.user_full_name AS username,
industries.industry_name AS industry_name,
categories.category_name AS category_name,
geographies.geo_name AS geo_name,
status.status_name AS status_name
FROM
projects,
engagement_types,
users,
industries,
categories,
geographies,
status
WHERE
projects.proj_eng_type_id = engagement_types.id
AND projects.proj_lead_id = users.id
AND projects.proj_industry_id = industries.id
AND projects.proj_category_id = categories.id
AND projects.proj_geo_id = geographies.id
AND projects.proj_status_id = status.id
AND projects.proj_geo_id = '$selected_geo_id'";
*****EDIT******
Here is the final correct code from the solution below using multiple left joins!
SELECT
projects.*,
engagement_types.eng_type_name AS eng_type,
users.user_full_name AS username,
industries.industry_name AS industry_name,
categories.category_name AS category_name,
geographies.geo_name AS geo_name,
status.status_name AS status_name
FROM
projects
LEFT JOIN engagement_types ON projects.proj_eng_type_id = engagement_types.id
LEFT JOIN users ON projects.proj_lead_id = users.id
LEFT JOIN industries ON projects.proj_industry_id = industries.id
LEFT JOIN categories ON projects.proj_category_id = categories.id
LEFT JOIN geographies ON projects.proj_geo_id = geographies.id
LEFT JOIN status ON projects.proj_status_id = status.id
ORDER BY
proj_start_date
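Sounds like you have to look at LEFT JOIN: https://www.w3schools.com/sql/sql_join_left.asp
Otherwise you miss the left part of the green circle (in the diagram on that page): comma-separated tables with equality conditions in WHERE behave as inner joins, so a project with no match in a lookup table (for example a NULL proj_industry_id) is dropped entirely, while a LEFT JOIN keeps it and returns NULL for the missing name.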
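The difference can be reproduced with a minimal two-table sketch (a hypothetical reduction of the projects schema, with invented rows), using Python's sqlite3 module as a stand-in for MySQL:

```python
import sqlite3

# Hypothetical two-table reduction of the projects schema (SQLite stand-in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE industries (id INTEGER PRIMARY KEY, industry_name TEXT);
CREATE TABLE projects (id INTEGER PRIMARY KEY, proj_industry_id INTEGER);
INSERT INTO industries VALUES (1, 'Energy');
INSERT INTO projects VALUES (1, 1), (2, NULL);  -- project 2 has no industry yet
""")

# Comma join + WHERE equality is an inner join: NULL = anything is never true.
inner = conn.execute("""
SELECT projects.id, industries.industry_name
FROM projects, industries
WHERE projects.proj_industry_id = industries.id
ORDER BY projects.id
""").fetchall()

# LEFT JOIN keeps every project and fills the missing name with NULL (None in Python).
left = conn.execute("""
SELECT projects.id, industries.industry_name
FROM projects
LEFT JOIN industries ON projects.proj_industry_id = industries.id
ORDER BY projects.id
""").fetchall()

print(inner)  # [(1, 'Energy')]
print(left)   # [(1, 'Energy'), (2, None)]
```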

I'm getting more than one result per left row from a LEFT JOIN with a subquery

With this query I'm supposed to get the latest message from the chats table for every agreement I get, as well as all the information including business name, etc.
I kind of solved it using GROUP BY in the subquery, but that's not how I want to fix this, because I don't understand why it acts like a RIGHT JOIN, or WHY it doesn't keep the order I specified in the subquery:
SELECT agreements.id, agreements.`date`, agreements.state, business.name, chat.message
FROM ((agreements JOIN
business_admin
ON agreements.business = business_admin.business AND business_admin.user = 1
) LEFT JOIN
business
ON business.id = agreements.business
) LEFT JOIN
(SELECT agreements_chat.agreement, agreements_chat.message
FROM agreements_chat
WHERE origin = 0
ORDER BY agreements_chat.`date` DESC
) AS chat
ON agreements.id = chat.agreement
I really appreciate your help, thank you so much!
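It's not working because the subquery in your LEFT JOIN returns more than one row per agreement, so each agreement is repeated once per matching chat message (which can look like a RIGHT JOIN, but isn't). The ORDER BY inside the derived table does not limit the join to the first row; MySQL may even ignore it. Join only the message whose date equals the latest date for that agreement instead: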
SELECT agreements.id,
agreements.`date`,
agreements.state,
business.name,
chat.message
FROM agreements
JOIN business_admin
ON agreements.business = business_admin.business AND
business_admin.user = 1
LEFT JOIN
business
ON business.id = agreements.business
LEFT JOIN
(
-- latest origin-0 message date per agreement
SELECT agreement, MAX(`date`) AS last_date
FROM agreements_chat
WHERE origin = 0
GROUP BY agreement
) last_chat
ON last_chat.agreement = agreements.id
LEFT JOIN
agreements_chat chat
-- only the message with that date is joined, so older messages are not repeated
ON chat.origin = 0 AND
chat.agreement = last_chat.agreement AND
chat.`date` = last_chat.last_date
Note that (as per @GordonLinoff's comment) you don't need parentheses around your joins.
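A minimal sketch of the latest-message-per-agreement pattern, using Python's sqlite3 module as a stand-in for MySQL, with invented rows and only the columns involved. It contrasts joining every message with joining only the message whose date equals the per-agreement maximum; the filtering works because chat is joined on last_chat's date. If two origin-0 messages share the latest date, both would still be returned.

```python
import sqlite3

# SQLite stand-in for MySQL (it accepts MySQL-style backtick quoting); invented data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agreements (id INTEGER PRIMARY KEY);
CREATE TABLE agreements_chat (agreement INTEGER, origin INTEGER, `date` TEXT, message TEXT);
INSERT INTO agreements VALUES (1), (2);     -- agreement 2 has no chat yet
INSERT INTO agreements_chat VALUES
  (1, 0, '2024-01-01', 'first'),
  (1, 0, '2024-01-03', 'latest'),
  (1, 1, '2024-01-05', 'other origin');     -- ignored: origin <> 0
""")

# Joining every origin-0 message repeats agreement 1 once per message.
all_messages = conn.execute("""
SELECT a.id, chat.message
FROM agreements a
LEFT JOIN agreements_chat chat ON chat.origin = 0 AND chat.agreement = a.id
ORDER BY a.id, chat.`date`
""").fetchall()

# Groupwise max: find each agreement's latest origin-0 date, then join only that message.
latest = conn.execute("""
SELECT a.id, chat.message
FROM agreements a
LEFT JOIN (SELECT agreement, MAX(`date`) AS last_date
           FROM agreements_chat
           WHERE origin = 0
           GROUP BY agreement) last_chat ON last_chat.agreement = a.id
LEFT JOIN agreements_chat chat
       ON chat.origin = 0
      AND chat.agreement = last_chat.agreement
      AND chat.`date` = last_chat.last_date
ORDER BY a.id
""").fetchall()

print(all_messages)  # [(1, 'first'), (1, 'latest'), (2, None)]
print(latest)        # [(1, 'latest'), (2, None)]
```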