Not getting the right results using GROUP_CONCAT in query - mysql

I have 7 tables to work with inside a query:
tb_posts, tb_spots, users, tb_sports, tb_spot_types, tb_users_sports, tb_post_media
This is the query I am using:
SELECT po.id_post AS id_post,
po.description_post as description_post,
sp.id_spot as id_spot,
po.date_post as date_post,
u.id AS userid,
u.user_type As tipousuario,
u.username AS username,
spo.id_sport AS sportid,
spo.sport_icon as sporticon,
st.logo_spot_type as spottypelogo,
sp.city_spot AS city_spot,
sp.country_spot AS country_spot,
sp.latitud_spot as latitudspot,
sp.longitud_spot as longitudspot,
sp.short_name AS spotshortname,
sp.verified_spot AS spotverificado,
u.profile_image AS profile_image,
sp.verified_spot_by as spotverificadopor,
uv.id AS spotverificador,
uv.user_type AS spotverificadornivel,
pm.media_type AS mediatype,
pm.media_file AS mediafile,
GROUP_CONCAT(tus.user_sport_sport) sportsdelusuario,
GROUP_CONCAT(logosp.sport_icon) sportsdelusuariologos,
GROUP_CONCAT(pm.media_file) mediapost,
GROUP_CONCAT(pm.media_type) mediaposttype
FROM tb_posts po
LEFT JOIN tb_spots sp ON po.spot_post = sp.id_spot
LEFT JOIN users u ON po.uploaded_by_post = u.id
LEFT JOIN tb_sports spo ON sp.sport_spot = spo.id_sport
LEFT JOIN tb_spot_types st ON sp.type_spot = st.id_spot_type
LEFT JOIN users uv ON sp.verified_spot_by = uv.id
LEFT JOIN tb_users_sports tus ON tus.user_sport_user = u.id
LEFT JOIN tb_sports logosp ON logosp.id_sport = tus.user_sport_sport
LEFT JOIN tb_post_media pm ON pm.media_post = po.id_post
WHERE po.status = 1
GROUP BY po.id_post,uv.id
I am having problems with some of the GROUP_CONCAT groups:
GROUP_CONCAT(tus.user_sport_sport) sportsdelusuario is giving me the right items but repeated, all items twice
GROUP_CONCAT(logosp.sport_icon) sportsdelusuariologos is giving me the right items but repeated, all items twice
GROUP_CONCAT(pm.media_file) mediapost is giving me the right items but repeated four times
GROUP_CONCAT(pm.media_type) mediaposttype is giving me the right items but repeated four times
I can put here all tables structures if you need them.

Multiple one-to-many relations JOINed in a query have a multiplicative effect on aggregation results; the standard solution is subqueries:
You can change
GROUP_CONCAT(pm.media_type) mediaposttype
...
LEFT JOIN tb_post_media pm ON pm.media_post = po.id_post
to
pm.mediaposttype
...
LEFT JOIN (
SELECT media_post, GROUP_CONCAT(media_type) AS mediaposttype
FROM tb_post_media
GROUP BY media_post
) AS pm ON pm.media_post = po.id_post
If tb_post_media is very big, and the po.status = 1 condition in the outer query would significantly reduce the results of the subquery, it can be worth replicating the original join within the subquery to filter down its results.
Similarly, the correlated version I mentioned in the comments can also be more performant if the outer query has relatively few results. (Calculating the GROUP_CONCAT() for each row individually can cost less than calculating it for all rows at once if you would only actually use very few of the results of the latter.)
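A minimal sketch of the multiplication and of the subquery fix. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, and the tables posts, user_sports and post_media are hypothetical miniatures of the asker's schema, so treat it as an illustration rather than the exact query:

```python
import sqlite3

# Hypothetical miniature schema: one post, two user sports, two media rows.
# SQLite stands in for MySQL here (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id_post INTEGER PRIMARY KEY, uploaded_by INTEGER);
    CREATE TABLE user_sports (user_id INTEGER, sport TEXT);
    CREATE TABLE post_media (media_post INTEGER, media_type TEXT);
    INSERT INTO posts VALUES (1, 10);
    INSERT INTO user_sports VALUES (10, 'surf'), (10, 'skate');
    INSERT INTO post_media VALUES (1, 'image'), (1, 'video');
""")

# Both one-to-many joins in one query: 2 sports x 2 media = 4 joined rows,
# so every media type is concatenated twice.
fanned_out = conn.execute("""
    SELECT COUNT(*), GROUP_CONCAT(pm.media_type)
    FROM posts po
    LEFT JOIN user_sports us ON us.user_id = po.uploaded_by
    LEFT JOIN post_media pm ON pm.media_post = po.id_post
    GROUP BY po.id_post
""").fetchone()

# Each child table is aggregated in its own subquery before joining,
# so the outer query keeps one row per post and nothing is multiplied.
pre_aggregated = conn.execute("""
    SELECT us.sports, pm.media_types
    FROM posts po
    LEFT JOIN (SELECT user_id, GROUP_CONCAT(sport) AS sports
               FROM user_sports GROUP BY user_id) AS us
           ON us.user_id = po.uploaded_by
    LEFT JOIN (SELECT media_post, GROUP_CONCAT(media_type) AS media_types
               FROM post_media GROUP BY media_post) AS pm
           ON pm.media_post = po.id_post
""").fetchone()

print(fanned_out)
print(pre_aggregated)
```

The first query sees four joined rows for the single post; the second returns each list exactly once.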

Or just add DISTINCT to each GROUP_CONCAT, e.g., GROUP_CONCAT(DISTINCT pm.media_type)
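One caveat with DISTINCT, sketched on SQLite through Python's sqlite3 module as a stand-in for MySQL (the tables posts, post_media and post_tags are hypothetical): DISTINCT also collapses values that are legitimately repeated, so a list of media types can end up shorter than the matching list of media files.

```python
import sqlite3

# Hypothetical data: one post with two image files and two tags.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id_post INTEGER PRIMARY KEY);
    CREATE TABLE post_media (media_post INTEGER, media_file TEXT, media_type TEXT);
    CREATE TABLE post_tags (tag_post INTEGER, tag TEXT);
    INSERT INTO posts VALUES (1);
    INSERT INTO post_media VALUES (1, 'a.jpg', 'image'), (1, 'b.jpg', 'image');
    INSERT INTO post_tags VALUES (1, 'x'), (1, 'y');
""")

# DISTINCT hides the join multiplication, but it also merges the two
# 'image' entries into one, so files and types no longer line up.
files, types = conn.execute("""
    SELECT GROUP_CONCAT(DISTINCT pm.media_file),
           GROUP_CONCAT(DISTINCT pm.media_type)
    FROM posts po
    LEFT JOIN post_media pm ON pm.media_post = po.id_post
    LEFT JOIN post_tags pt ON pt.tag_post = po.id_post
    GROUP BY po.id_post
""").fetchone()

print(files.split(","), types.split(","))
```

If the two lists are meant to be read position by position, the subquery approach is the safer choice.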

Related

SQL query optimization for speed

I have been working on optimizing the following query. I have already optimized it as far as I can from my side. Can it be optimized further?
select distinct name ad_type
from dim_ad_type x where exists ( select 1
from sum_adserver_dimensions sum
left join dim_ad_tag_map on dim_ad_tag_map.id=sum.ad_tag_map_id and dim_ad_tag_map.client_id=sum.client_id
left join dim_site on dim_site.id = dim_ad_tag_map.site_id
left join dim_geo on dim_geo.id = sum.geo_id
left join dim_region on dim_region.id=dim_geo.region_id
left join dim_device_category on dim_device_category.id=sum.device_category_id
left join dim_ad_unit on dim_ad_unit.id=dim_ad_tag_map.ad_unit_id
left join dim_monetization_channel on dim_monetization_channel.id=dim_ad_tag_map.monetization_channel_id
left join dim_os on dim_os.id = sum.os_id
left join dim_ad_type on dim_ad_type.id = dim_ad_tag_map.ad_type_id
left join dim_integration_type on dim_integration_type.id = dim_ad_tag_map.integration_type_id
where sum.client_id = 50
and dim_ad_type.id=x.id
)
order by 1
Your query, although it joins correctly, is bloated overall. You use the dim_ad_type table on the outside just to make sure it exists on the inside as well. All those left joins have NO bearing on the final outcome, so why are they even there? I would simplify by reversing the logic. Tracing your INNER query back to the same dim_ad_type table, I find the direct line is sum -> dim_ad_tag_map -> dim_ad_type. Just run that.
select distinct
dat.name Ad_Type
from
sum_adserver_dimensions sum
join dim_ad_tag_map tm
on sum.ad_tag_map_id = tm.id
and sum.client_id = tm.client_id
join dim_ad_type dat
on tm.ad_type_id = dat.id
where
sum.client_id = 50
order by
1
Your query was running ALL dim_ad_types, then finding all the sums just to find those that matched. Run it direct starting with the one client, then direct with JOINs.
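A small check that the two forms agree, run on SQLite through Python's sqlite3 module as a stand-in (an assumption; the sample rows are made up, and only the three tables on the direct line are created). It says nothing about speed, only that the result set is the same on this data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_ad_type (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_ad_tag_map (id INTEGER, client_id INTEGER, ad_type_id INTEGER);
    CREATE TABLE sum_adserver_dimensions (ad_tag_map_id INTEGER, client_id INTEGER);
    INSERT INTO dim_ad_type VALUES (1, 'banner'), (2, 'video'), (3, 'native');
    INSERT INTO dim_ad_tag_map VALUES (1, 50, 1), (2, 50, 2), (3, 51, 3);
    INSERT INTO sum_adserver_dimensions VALUES (1, 50), (1, 50), (2, 50), (3, 51);
""")

# Original shape: outer dim_ad_type probed with a correlated EXISTS.
exists_form = conn.execute("""
    SELECT DISTINCT name AS ad_type
    FROM dim_ad_type x
    WHERE EXISTS (
        SELECT 1
        FROM sum_adserver_dimensions s
        LEFT JOIN dim_ad_tag_map tm
               ON tm.id = s.ad_tag_map_id AND tm.client_id = s.client_id
        LEFT JOIN dim_ad_type dat ON dat.id = tm.ad_type_id
        WHERE s.client_id = 50 AND dat.id = x.id)
    ORDER BY 1
""").fetchall()

# Reversed shape: start from the one client and follow the direct line.
join_form = conn.execute("""
    SELECT DISTINCT dat.name AS ad_type
    FROM sum_adserver_dimensions s
    JOIN dim_ad_tag_map tm
      ON s.ad_tag_map_id = tm.id AND s.client_id = tm.client_id
    JOIN dim_ad_type dat ON tm.ad_type_id = dat.id
    WHERE s.client_id = 50
    ORDER BY 1
""").fetchall()

print(exists_form, join_form)
```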

FULL table scan in LEFT JOIN using OR

How can this query be optimized to avoid the full table scan described below?
I've got a slow query that's taking approximately 15 seconds to return.
Let's get this part out of the way - I've confirmed all indexes are there.
When I run EXPLAIN, it shows that a FULL TABLE scan is run on the crosswalk table (the index for fromQuestionCategoryJoinID is not used, even if I attempt to force it). If I remove either of the fields and the OR, the index is used and the query completes in milliseconds.
SELECT c.id, c.name, GROUP_CONCAT(DISTINCT tags.externalDisplayID SEPARATOR ', ') AS tags
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN questioncategoryjoin qcatjsub on qcatjsub.parentQuestionID = qcatj.questionID
LEFT JOIN crosswalk cw on (cw.fromQuestionCategoryJoinID = qcatj.id OR cw.fromQuestionCategoryJoinID = qcatjsub.id)
-- index used if I remove OR, eg.: LEFT JOIN crosswalk cw on (cw.fromQuestionCategoryJoinID = qcatj.id)
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
GROUP BY c.id
ORDER BY c.name, tags.externalDisplayID;
Split the query into two queries for each part of the OR. Then combine them with UNION.
SELECT id, name, GROUP_CONCAT(DISTINCT externalDisplayID SEPARATOR ', ') AS tags
FROM (
SELECT c.id, c.name, tags.externalDisplayID
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN crosswalk cw on cw.fromQuestionCategoryJoinID = qcatj.id
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
UNION ALL
SELECT c.id, c.name, tags.externalDisplayID
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN questioncategoryjoin qcatjsub on qcatjsub.parentQuestionID = qcatj.questionID
LEFT JOIN crosswalk cw on cw.fromQuestionCategoryJoinID = qcatjsub.id
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
) AS x
GROUP BY x.id
ORDER BY x.name
Also, it doesn't make sense to include externalDisplayID in ORDER BY, because that will order by its value from a random row in the group. You could put ORDER BY externalDisplayID in the GROUP_CONCAT() arguments if that's what you want.
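A reduced sketch of the OR-to-UNION rewrite, run on SQLite through Python's sqlite3 module as a stand-in for MySQL. The three tables checklist, question and crosswalk and their columns are hypothetical simplifications of the schema in the question; the point is only that both shapes yield the same set of tags per checklist once duplicates and NULLs are discarded:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE checklist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE question (id INTEGER PRIMARY KEY, checklist_id INTEGER, parent_id INTEGER);
    CREATE TABLE crosswalk (from_question_id INTEGER, tag TEXT);
    INSERT INTO checklist VALUES (1, 'A'), (2, 'B');
    INSERT INTO question VALUES (1, 1, NULL), (2, NULL, 1), (3, 2, NULL);
    INSERT INTO crosswalk VALUES (1, 't1'), (2, 't2'), (3, 't3');
""")

def tags_by_checklist(sql):
    """Collect the distinct non-NULL tags per checklist id."""
    result = defaultdict(set)
    for checklist_id, tag in conn.execute(sql):
        if tag is not None:
            result[checklist_id].add(tag)
    return dict(result)

# Single join with OR: matches the question itself or its sub-question.
or_join = tags_by_checklist("""
    SELECT c.id, cw.tag
    FROM checklist c
    LEFT JOIN question q ON q.checklist_id = c.id
    LEFT JOIN question sub ON sub.parent_id = q.id
    LEFT JOIN crosswalk cw
           ON (cw.from_question_id = q.id OR cw.from_question_id = sub.id)
""")

# One branch per side of the OR, combined with UNION ALL.
union_join = tags_by_checklist("""
    SELECT c.id, cw.tag
    FROM checklist c
    LEFT JOIN question q ON q.checklist_id = c.id
    LEFT JOIN crosswalk cw ON cw.from_question_id = q.id
    UNION ALL
    SELECT c.id, cw.tag
    FROM checklist c
    LEFT JOIN question q ON q.checklist_id = c.id
    LEFT JOIN question sub ON sub.parent_id = q.id
    LEFT JOIN crosswalk cw ON cw.from_question_id = sub.id
""")

print(or_join, union_join)
```

Each branch of the UNION has a plain equality in its join condition, which is the form an index on the crosswalk column can serve.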
There is a second inefficiency going on here. I call it "explode-implode". First a bunch of JOINs (potentially) expand the number of rows in an intermediate table, then GROUP BY c.id collapses the number of rows back to what you started with (one row of output per row of checklist).
Before trying to help with that, please answer:
Is LEFT really needed?
How many rows in each table? (Especially in cw)
Can you get rid of DISTINCT?
Barmar's answer can possibly be improved upon by delaying the JOINs to qcj1 and tags until after the UNION:
SELECT ...
FROM ( SELECT ...
FROM first few tables
UNION ALL
SELECT ...
FROM first few tables
) AS u
[LEFT] JOIN qcj1
[LEFT] JOIN tags
GROUP BY ...
ORDER BY ...
Another optimization (again building on Barmar's)
GROUP BY x.id
ORDER BY x.name
-->
GROUP BY x.name, x.id
ORDER BY x.name, x.id
When the items in GROUP BY and ORDER BY are the "same", they can be done in a single action, thereby saving (at least) a sort.
x.name, x.id is deterministic, whereas x.name alone might put two rows with the same name in a different order, depending (perhaps) on the phase of the moon.
These indexes may help:
qcheckj: INDEX(checklistID, questionID)
qcatj: INDEX(questionID, id)
qcatjsub: INDEX(parentQuestionID, id)
cw: INDEX(fromQuestionCategoryJoinID, toQuestionCategoryJoinID)

MYSQL - Left join a double relationship?

SELECT stores.ID, store_info.display_name, store_info.address,store_info.phone,
IFNULL(
GROUP_CONCAT(DISTINCT storeBrands.display_name ORDER BY storeBrands.name),
GROUP_CONCAT(chainBrands.display_name ORDER BY chainBrands.name)
) AS brands,
IFNULL(
GROUP_CONCAT(DISTINCT storeFilters.name ORDER BY storeFilters.name),
GROUP_CONCAT(DISTINCT chainFilters.name ORDER BY chainFilters.name)
) AS filters
FROM stores
LEFT JOIN store_info ON stores.ID = store_info.storeID
LEFT JOIN store_brands ON stores.ID = store_brands.store
LEFT JOIN chain_brands ON stores.chainID = chain_brands.chain
LEFT JOIN brands AS storeBrands ON store_brands.brand = storeBrands.ID
LEFT JOIN brands AS chainBrands ON chain_brands.brand = chainBrands.ID
LEFT JOIN store_filters ON stores.ID = store_filters.store
LEFT JOIN chain_filters ON stores.chainID = chain_filters.chain
LEFT JOIN filters AS storeFilters ON store_filters.filter = storeFilters.ID
LEFT JOIN filters AS chainFilters ON chain_filters.filter = chainFilters.ID
WHERE stores.city = 1
GROUP BY stores.ID
I have updated this question because I have solved the initial problem myself, but there's still one more question:
How can I improve on this?
I feel like I've made a lot of progress already. I have gone from doing a union with subqueries, to doing a single query with subqueries, to improving my joins up to the point where I don't need to do a subquery for each row anymore.
However, it still feels like it could be better. I'm very insecure about my joins.
Does anyone have tips of improvement here?
The goal:
I want this query to get results from a hierarchy. We have 'parents' (chains) that share the same brands and filters (and other things) with their own children (stores). The idea is for the 'child' to inherit the parent's settings as a fallback, but to ignore them completely once it starts setting its own data.
So, basically, with one query, you want "either this data or that data", never both. One or the other. (Another reason why UNION won't really fit)
If you want the chain's only when the store has none, formulate a UNION. One operand of the UNION joins the master data to the store data, while the other operand--instantiated only when the store has none--joins the master data to the chain data. That's what one uses a UNION for: "I sometimes want these, and I sometimes want those."
I managed to fix this issue myself, also making it way more efficient, removing most subqueries!
SELECT stores.ID, store_info.display_name, store_info.address,store_info.phone,
IFNULL(
GROUP_CONCAT(DISTINCT storeBrands.display_name ORDER BY storeBrands.name),
GROUP_CONCAT(chainBrands.display_name ORDER BY chainBrands.name)
) AS brands,
IFNULL(
GROUP_CONCAT(DISTINCT storeFilters.name ORDER BY storeFilters.name),
GROUP_CONCAT(DISTINCT chainFilters.name ORDER BY chainFilters.name)
) AS filters
FROM stores
LEFT JOIN store_info ON stores.ID = store_info.storeID
LEFT JOIN store_brands ON stores.ID = store_brands.store
LEFT JOIN chain_brands ON stores.chainID = chain_brands.chain
LEFT JOIN brands AS storeBrands ON store_brands.brand = storeBrands.ID
LEFT JOIN brands AS chainBrands ON chain_brands.brand = chainBrands.ID
LEFT JOIN store_filters ON stores.ID = store_filters.store
LEFT JOIN chain_filters ON stores.chainID = chain_filters.chain
LEFT JOIN filters AS storeFilters ON store_filters.filter = storeFilters.ID
LEFT JOIN filters AS chainFilters ON chain_filters.filter = chainFilters.ID
WHERE stores.city = $cityID
GROUP BY stores.ID

Query with joins

I am running a query:
select course.course,iars.id,
students.rollno,
students.name as name,
teachers.name as tname,
students.studentid,
attndata.studentid, sum(attndata.obt) as obt,
sum(attndata.benefits) as ben , (sum(attndata.max)) as abc
from groups, students
left join iars
on iars.id
left join str
on str.studentid=students.studentid
left join course
on course.c_id=students.course
left join teachers
on teachers.id=iars.teacherid
join sgm
on sgm.studentid=students.studentid
left join attndata
on attndata.studentid=students.studentid and iars.id=attndata.iarsid
left join sps
on sps.studentid=students.studentid and iars.paperid=sps.paperid
left join semdef
on semdef.semesterid=str.semesterid
where students.course='1'
and students.status='regular'
and sps.paperid='5'
and iars.courseid=students.course
and iars.semester=str.semesterid
and semdef.month=9
and iars.paperid='5'
and str.semesterid='1'
and str.sessionid='12'
and groups.id=sgm.groupid
group by sps.studentid,
teachers.id,
semdef.month
order by
students.name
In this query, whenever I add a left join on semdef.id=attndata.mon, I get zero results when semdef.id is null. I want all the results regardless of semdef, but I still want to use it; that is, rows should be returned even when the semdef values are null. Can you please help?
It's probably because your where clause is saying
and semdef.month=9
and you probably want
and (semdef.month=9 OR semdef.id IS NULL)
or something similar.
It's because your WHERE clause has conditions on the semdef table. Putting them in the WHERE clause effectively turns the LEFT JOIN into an inner join, so move them into the join's ON clause instead.
Eg:
Left join semdef on xxx and semdef.id = attndata.mon
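A minimal sketch of the difference between the suggestions above, run on SQLite through Python's sqlite3 module as a stand-in for MySQL. The two-table schema (students, and a semdef keyed by student_id) is a hypothetical simplification and not the asker's real tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE semdef (student_id INTEGER, month INTEGER);
    INSERT INTO students VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Ann has a month-9 row, Bob has no row at all, Cy has a month-8 row.
    INSERT INTO semdef VALUES (1, 9), (3, 8);
""")

# Condition in WHERE: unmatched rows carry NULL month and are filtered out.
where_filtered = conn.execute("""
    SELECT s.name FROM students s
    LEFT JOIN semdef sd ON sd.student_id = s.id
    WHERE sd.month = 9
    ORDER BY s.id
""").fetchall()

# WHERE ... OR IS NULL: keeps students without any semdef row,
# but still drops students whose only semdef row has another month.
where_or_null = conn.execute("""
    SELECT s.name FROM students s
    LEFT JOIN semdef sd ON sd.student_id = s.id
    WHERE sd.month = 9 OR sd.student_id IS NULL
    ORDER BY s.id
""").fetchall()

# Condition in ON: every student survives; month is NULL when nothing matches.
on_filtered = conn.execute("""
    SELECT s.name, sd.month FROM students s
    LEFT JOIN semdef sd ON sd.student_id = s.id AND sd.month = 9
    ORDER BY s.id
""").fetchall()

print(where_filtered, where_or_null, on_filtered)
```

Which of the last two behaviours is wanted depends on whether a row with a different month should hide the student or not.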

fetching records with long sql query with multiple joins

I will try to explain things as much as I can.
I have following query to fetch records from different tables.
SELECT
p.p_name,
p.id,
cat.cat_name,
p.property_type,
p.p_type,
p.address,
c.client_name,
p.price,
GROUP_CONCAT(pr.price) AS c_price,
pd.land_area,
pd.land_area_rp,
p.tagline,
p.map_location,
r.id,
p.status,
co.country_name,
p.`show`,
u.name,
p.created_date,
p.updated_dt,
o.type_id,
p.furnished,
p.expiry_date
FROM
property p
LEFT OUTER JOIN region AS r
ON p.district_id = r.id
LEFT OUTER JOIN country AS co
ON p.country_id = co.country_id
LEFT OUTER JOIN property_category AS cat
ON p.cat_id = cat.id
LEFT OUTER JOIN property_area_details AS pd
ON p.id = pd.property_id
LEFT OUTER JOIN sc_clients AS c
ON p.client_id = c.client_id
LEFT OUTER JOIN admin AS u
ON p.adminid = u.id
LEFT OUTER JOIN sc_property_orientation_type AS o
ON p.orientation_type = o.type_id
LEFT OUTER JOIN property_amenities_details AS pad
ON p.id = pad.property_id
LEFT OUTER JOIN sc_commercial_property_price AS pr
ON p.id = pr.property_id
WHERE p.id > 0
AND (
p.created_date > DATE_SUB(NOW(), INTERVAL 1 YEAR)
OR p.updated_dt > DATE_SUB(NOW(), INTERVAL 1 YEAR)
)
AND p.p_type = 'sale'
Everything works fine if I exclude GROUP_CONCAT(pr.price) AS c_price, from the above query. But when I include it, the query returns just one result. My intention in using GROUP_CONCAT above is to fetch the comma-separated prices from table sc_commercial_property_price that match the property id, in this case p.id. If records for the property exist in sc_commercial_property_price, fetch them in comma-separated form along with the other records. If not, it should return blank. What am I doing wrong here?
I will try to explain again if my problem is not clear. Thanks in advance
GROUP_CONCAT is an aggregation function. When you include it, you are telling SQL that there is an aggregation. Without a GROUP BY, only one row is returned, as in:
select count(*)
from table
The query that you have is accepted by MySQL when the ONLY_FULL_GROUP_BY SQL mode is disabled, but most other databases reject it. The query does not automatically group by the columns with no functions. Instead, it returns an arbitrary value for each of them. You could imagine a function ANY, so your query is:
select any(p.p_name) as p_num, any(p.tagline) as tagline, . . .
To fix this, put all your current variables in a group by clause:
GROUP BY
p.p_name,
p.id,
cat.cat_name,
p.property_type,
p.p_type,
p.address,
c.client_name,
p.price,
pd.land_area,
pd.land_area_rp,
p.tagline,
p.map_location,
r.id,
p.status,
co.country_name,
p.`show`,
u.name,
p.created_date,
p.updated_dt,
o.type_id,
p.furnished,
p.expiry_date
Most people who write SQL think it is good form to include all the group by variables in the group by clause, even though MySQL does not necessarily require this.
Add GROUP BY clause enumerating whatever you intend to have separate rows for. What happens now is that it picks some value for each result column and group_concats every pr.price.
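A minimal sketch of the one-row behaviour and of the GROUP BY fix. It runs on SQLite through Python's sqlite3 module, which also tolerates bare columns next to an aggregate, as a stand-in for MySQL; the tables property and price are hypothetical reductions of the ones in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property (id INTEGER PRIMARY KEY, p_name TEXT);
    CREATE TABLE price (property_id INTEGER, price INTEGER);
    INSERT INTO property VALUES (1, 'Office'), (2, 'Shop');
    -- Only the first property has prices.
    INSERT INTO price VALUES (1, 100), (1, 200);
""")

# Aggregate without GROUP BY: the whole result collapses into a single row.
no_group_by = conn.execute("""
    SELECT p.id, GROUP_CONCAT(pr.price)
    FROM property p
    LEFT JOIN price pr ON pr.property_id = p.id
""").fetchall()

# With GROUP BY: one row per property, NULL where there are no prices.
grouped = conn.execute("""
    SELECT p.id, GROUP_CONCAT(pr.price)
    FROM property p
    LEFT JOIN price pr ON pr.property_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

print(no_group_by)
print(grouped)
```

A property without prices comes back with NULL in the concatenated column, which the application can display as blank.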