LEFT JOIN limited number of rows with GROUP_CONCAT too slow

LEFT JOIN limited number of rows with GROUP_CONCAT too slow - mysql

I boiled the question down in the db fiddle below.
I have an object table and an images table. X images can be assigned to an object.
Now I have a list where a configurable amount of images needs to be displayed per object.
Please consider this needs to work in MariaDB 10 or MYSQL 5.7. The original construct is much bigger, where indexes are set and more is happening.
My idea was to create a string with concatenated images titles and filenames
This works, but the subquery is super slow:
SELECT o.id, o.title, im.images FROM objects o
LEFT JOIN (
SELECT i.oid, GROUP_CONCAT(i.title, "::", i.name ORDER BY i.ordering ASC SEPARATOR "|") AS images
FROM images i
WHERE i.oid IN (1, 2, 3)
GROUP BY i.oid
ORDER BY NULL) im
ON im.oid=o.id
AND o.id IN (1, 2, 3)
GROUP BY o.id -- this has to stay. original query is bigger...a lot
ORDER BY o.id DESC
DB Fiddle
https://www.db-fiddle.com/f/xi1PJ3P61miBu1fCooWUNR/5
I know this is not limiting the amount of images, but i don't think 5.7 can do that withing GROUP_CONCAT.
It would also be fine if a limited number of images gets queried as single column, so there could be 3 or 4 new columns and not one concatenated string, but i have no idea how to do that.

Use SUBSTRING_INDEX() to return the first N items in the GROUP_CONCAT() result.
SUBSTRING_INDEX(im.images, '|', 3) AS images
will return the first 3 images.

If you prefer to have, say 3 separate columns and not a concatenated string, you can filter the images of each object with a condition like i.ordering <= 3 inside the subquery and then use conditional aggregation:
SELECT o.id, o.title,
MAX(CASE WHEN im.ordering = 1 THEN CONCAT(im.title, "::", im.name) END) image1,
MAX(CASE WHEN im.ordering = 2 THEN CONCAT(im.title, "::", im.name) END) image2,
MAX(CASE WHEN im.ordering = 3 THEN CONCAT(im.title, "::", im.name) END) image3
FROM objects o
LEFT JOIN (
SELECT *
FROM images i
WHERE i.oid IN (1, 2, 3) AND i.ordering <= 3
) im ON im.oid = o.id AND o.id IN (1, 2, 3)
GROUP BY o.id
ORDER BY o.id DESC;
Also, if you want results only for objects with ids 1, 2 and 3 then you should place the condition o.id IN (1, 2, 3) in a WHERE clause and not in the ON clause.
See the demo.

The first thing I notice in your dbfiddle is that you have no index on images(oid), so your condition in your subquery is doing a table-scan of the images table. That could be a part of the performance problem.
I came up with this solution to limit the result to 3 images per oid. This is the kind of awful solution we had to use in pre-MySQL 8.0 days before windowing functions. In fact, using expressions with side-effects like this is now deprecated, because it relies on the expressions in the select-list being evaluated left to right, and there is no guarantee that will be true.
SELECT t.id, t.title, GROUP_CONCAT(t.image SEPARATOR '|') AS images
FROM (
SELECT o.id, o.title, CONCAT(i.title, '::', i.name) AS image,
#row:=IF(#oid=o.id,#row+1,1) as rownum,
#oid:=o.id
FROM objects o
CROSS JOIN (SELECT #row:=0, #oid:=0) AS _init
LEFT JOIN images i ON i.oid = o.id
WHERE o.id IN (1, 2, 3)
ORDER BY i.ordering
) AS t
WHERE t.rownum <= 3
GROUP BY t.id
ORDER BY t.id DESC;
It produces the correct result, but I am not testing it with a large dataset so I can't guarantee it has any better performance.
Do create the index on images(oid) regardless. Adding the ordering column to the index won't help in this query. It must do a filesort anyway, both because of the range query and because the sort is on a table that isn't the first table accessed.
P.S.: I recommend you upgrade. MySQL 5.6 has been past its end of life for over a year by now, and MySQL 8.0 has been GA since 2018.

Related

MySQL select in select query

can somebody help me solving and explaining how i can get the following done? :
In my MySQL query i want to select all entries, where the forwarded_to_cordroom value is 0, and in the next row i want to have all where the value is 1, basically i could create 2 identical queries, where the only difference would be the WHERE clause (where forwarded_to_cordroom = 1 , where forwarded_to_cordroom = 0) , and i thought about doing this in one query, but getting the following error with what ive tried:
SELECT
COUNT(DISTINCT o.order_id) as count,
(SELECT o.forwarded_to_cordroom WHERE o.forwarded_to_cordroom = 1)
FROM
`orders_articles` o
LEFT JOIN orders oo ON
o.order_id = oo.order_id
WHERE
(
oo.finished_order_date IS NULL OR oo.finished_order_date >= '2021-09-27'
) AND oo.order_date <= '2021-09-27'
Results in :
#1140 - In aggregated query without GROUP BY, expression #2 of SELECT list contains nonaggregated column 'o.forwarded_to_cordroom'; this is
incompatible with sql_mode=only_full_group_by
I have also tried changing the subselect in various ways (with and without joins etc.) but without success, always the same error.
I'd prefer not turning this mode off, I think that would not be the purpose and that I can fix my query with some help.
Best Regards

Use conditional aggregation:
SELECT COUNT(DISTINCT o.order_id) AS count,
COUNT(CASE WHEN o.forwarded_to_cordroom = 1 THEN 1 END) AS count_1,
COUNT(CASE WHEN o.forwarded_to_cordroom = 0 THEN 1 END) AS count_0
FROM orders_articles AS o
LEFT JOIN orders AS oo ON o.order_id = oo.order_id
WHERE ...

MySQL Inner join naming error?

http://sqlfiddle.com/#!9/e6effb/1
I'm trying to get a top 10 by revenue per brand for France on december.
There are 2 tables (first table has date, second table has brand and I'm trying to join them)
I get this error "FUNCTION db_9_d870e5.SUM does not exist. Check the 'Function Name Parsing and Resolution' section in the Reference Manual"
Is my use of Inner join there correct?

It's because you had an extra space after SUM. Please change it from
SUM (o1.total_net_revenue)to SUM(o1.total_net_revenue).
See more about it here.
Also after correcting it, your query still had more error as you were not selecting order_id on your intermediate table i2 so edited here as :
SELECT o1.order_id, o1.country, i2.brand,
SUM(o1.total_net_revenue)
FROM orders o1
INNER JOIN (
SELECT i1.brand, SUM(i1.net_revenue) AS total_net_revenue,order_id
FROM ordered_items i1
WHERE i1.country = 'France'
GROUP BY i1.brand
) i2
ON o1.order_id = i2.order_id AND o1.total_net_revenue = i2.total_net_revenue
AND o1.total_net_revenue = i2.total_net_revenue
WHERE o1.country = 'France' AND o1.created_at BETWEEN '2016-12-01' AND '2016-12-31'
GROUP BY 1,2,3
ORDER BY 4
LIMIT 10`

--EDIT stack Fan is correct that the o2.total_net_revenue exists. My confusion was because the data structure duplicated three columns between the tables, including one that was being looked for.
There were a couple errors with your SQL statement:
1. You were referencing an invalid column in your outer-select-SUM function. I believe you're actually after i2.total_net_revenue.
The table structure is terrible, the "important" columns (country, revenue, order_id) are duplicated between the two tables. I would also expect the revenue columns to share the same name, if they always have the same values in them. In the example, there's no difference between i1.net_revenue and o1.total_net_revenue.
In your inner join, you didn't reference i1.order_id, which meant that your "on" clause couldn't execute correctly.
PROTIP:
When you run into an issue like this, take all the complicated bits out of your query and get the base query working correctly first. THEN add your functions.
PROTIP:
In your GROUP BY clause, reference the actual columns, NOT the column numbers. It makes your query more robust.
This is the query I ended up with:
SELECT o1.order_id, o1.country, i2.brand,
SUM(i2.total_net_revenue) AS total_rev
FROM orders o1
INNER JOIN (
SELECT i1.order_id, i1.brand, SUM(i1.net_revenue) AS total_net_revenue
FROM ordered_items i1
WHERE i1.country = 'France'
GROUP BY i1.brand
) i2
ON o1.order_id = i2.order_id AND o1.total_net_revenue = i2.total_net_revenue
AND o1.total_net_revenue = i2.total_net_revenue
WHERE o1.country = 'France' AND o1.created_at BETWEEN '2016-12-01' AND '2016-12-31'
GROUP BY o1.order_id, o1.country, i2.brand
ORDER BY total_rev
LIMIT 10

How to fix SQL query with Left Join and subquery?

I have SQL query with LEFT JOIN:
SELECT COUNT(stn.stocksId) AS count_stocks
FROM MedicalFacilities AS a
LEFT JOIN stocks stn ON
(stn.stocksIdMF = ( SELECT b.MedicalFacilitiesIdUser
FROM medicalfacilities AS b
WHERE b.MedicalFacilitiesIdUser = a.MedicalFacilitiesIdUser
ORDER BY stn.stocksId DESC LIMIT 1)
AND stn.stocksEndDate >= UNIX_TIMESTAMP() AND stn.stocksStartDate <= UNIX_TIMESTAMP())
These query I want to select one row from table stocks by conditions and with field equal value a.MedicalFacilitiesIdUser.
I get always count_stocks = 0 in result. But I need to get 1

The count(...) aggregate doesn't count null, so its argument matters:
COUNT(stn.stocksId)
Since stn is your right hand table, this will not count anything if the left join misses. You could use:
COUNT(*)
which counts every row, even if all its columns are null. Or a column from the left hand table (a) that is never null:
COUNT(a.ID)

Your subquery in the on looks very strange to me:
on stn.stocksIdMF = ( SELECT b.MedicalFacilitiesIdUser
FROM medicalfacilities AS b
WHERE b.MedicalFacilitiesIdUser = a.MedicalFacilitiesIdUser
ORDER BY stn.stocksId DESC LIMIT 1)
This is comparing MedicalFacilitiesIdUser to stocksIdMF. Admittedly, you have no sample data or data layouts, but the naming of the columns suggests that these are not the same thing. Perhaps you intend:
on stn.stocksIdMF = ( SELECT b.stocksId
-----------------------------^
FROM medicalfacilities AS b
WHERE b.MedicalFacilitiesIdUser = a.MedicalFacilitiesIdUser
ORDER BY b.stocksId DESC
LIMIT 1)
Also, ordering by stn.stocksid wouldn't do anything useful, because that would be coming from outside the subquery.

Your subquery seems redundant and main query is hard to read as much of the join statements could be placed in where clause. Additionally, original query might have a performance issue.
Recall WHERE is an implicit join and JOIN is an explicit join. Query optimizers
make no distinction between the two if they use same expressions but readability and maintainability is another thing to acknowledge.
Consider the revised version (notice I added a GROUP BY):
SELECT COUNT(stn.stocksId) AS count_stocks
FROM MedicalFacilities AS a
LEFT JOIN stocks stn ON stn.stocksIdMF = a.MedicalFacilitiesIdUser
WHERE stn.stocksEndDate >= UNIX_TIMESTAMP()
AND stn.stocksStartDate <= UNIX_TIMESTAMP()
GROUP BY stn.stocksId
ORDER BY stn.stocksId DESC
LIMIT 1

Highscores on multiple columns, efficient query, right approach

Let's say we've got high scores table with columns app_id, best_score, best_time, most_drops, longest_something and couple more.
I'd like to collect top three results ON EACH CATEGORY grouped by app_id?
For now I'm using separate rank queries on each category in a loop:
SELECT app_id, best_something1,
FIND_IN_SET( best_something1,
(SELECT GROUP_CONCAT( best_something1
ORDER BY best_something1 DESC)
FROM highscores )) AS rank
FROM highscores
ORDER BY best_something1 DESC LIMIT 3;
Two things worth to add:
All columns for specific app are being updated at the same time (can consider creating a helper table).
the result of prospective "turbo query" might be requested quite often - as often as updating the values.
I'm quite basic with SQL and suspect that it has many more commands that combined together could do the magic?
What I'd expect from this post is that some wise owl would at least point the direction where to go or how to go.
The sample table:
http://sqlfiddle.com/#!2/eef053/1
Here is sample result too (already in json format, sry):
{"total_blocks":[["13","174","1"],["9","153","2"],["10","26","3"]],"total_games":[["13","15","1"],["9","12","2"],["10","2","3"]],"total_score":[["13","410","1"],["9","332","2"],["11","88","3"]],"aver_pps":[["11","4.34011","1"],["13","2.64521","2"],["12","2.60623","3"]],"aver_drop_per_game":[["11","20","1"],["10","13","2"],["9","12.75","3"]],"aver_drop_val":[["11","4.4","1"],["13","2.35632","2"],["9","2.16993","3"]],"aver_score":[["11","88","1"],["9","27.6667","2"],["13","27.3333","3"]],"best_pps":[["13","4.9527","1"],["11","4.34011","2"],["9","4.13076","3"]],"most_drops":[["11","20","1"],["9","16","2"],["13","16","2"]],"longest_drop":[["9","3","1"],["13","2","2"],["11","2","2"]],"best_drop":[["11","42","1"],["13","36","2"],["9","30","3"]],"best_score":[["11","88","1"],["13","78","2"],["9","58","3"]]}

When I encounter this scenario, I prefer to employ the UNION clause, and combine the queries tailored to each ORDERing and LIMIT.
http://dev.mysql.com/doc/refman/5.1/en/union.html
UNION combines the result rows vertically (top 3 rows for 5 sort categories yields 15 rows).
For your specific purpose, you might then pivot them as sub-SELECTs, rolling them up with GROUP_CONCAT GROUPed on user so that each has the delimited list.

I'd test something like this query, to see if the performance is any better or not. I think this comes pretty close to satisfying the specification:
( SELECT 99 AS seq_
, a.category
, CONVERT(a.val,DOUBLE) AS val
, FIND_IN_SET(a.val,r.highest_vals) AS rank
, a.user_id
FROM ( SELECT 'total_blocks' AS category
, b.`total_blocks` AS val
, b.user_id
FROM app b
ORDER BY b.`total_blocks` DESC
LIMIT 3
) a
CROSS
JOIN ( SELECT GROUP_CONCAT(s.val ORDER BY s.val DESC) AS highest_vals
FROM ( SELECT t.`total_blocks` AS val
FROM app t
ORDER BY t.`total_blocks` DESC
LIMIT 3
) s
) r
ORDER BY a.val DESC
)
UNION ALL
( SELECT 97 AS seq_
, a.category
, CONVERT(a.val,DOUBLE) AS val
, FIND_IN_SET(a.val,r.highest_vals) AS rank
, a.user_id
FROM ( SELECT 'XXX' AS category
, b.`XXX` AS val
, b.user_id
FROM app b
ORDER BY b.`XXX` DESC
LIMIT 3
) a
CROSS
JOIN ( SELECT GROUP_CONCAT(s.val ORDER BY s.val DESC) AS highest_vals
FROM ( SELECT t.`XXX` AS val
FROM app t
ORDER BY t.`XXX` DESC
LIMIT 3
) s
) r
ORDER BY a.val DESC
)
ORDER BY seq_ DESC, val DESC
To unpack this a little bit... this is essentially separate queries that are combined with UNION ALL set operator.
Each of the queries returns a literal value to allow for ordering. (In this case, I've given the column a rather anonymous name seq_ (sequence)... if the specific order isn't important, then this could be removed.
Each query is also returning a literal value that tells which "category" the row is for.
Because some of the values returned are INTEGER, and others are FLOAT, I'd cast all of those values to floating point, so the datatypes of each query line up.
For the FLOAT (floating point) type values, there can be a problem with comparison. So I'd go with casting those to decimal and stringing them together into a list using GROUP_CONCAT (as the original query does).
Since we are returning only three rows from each query, we only need to concatenate together the three largest values. (If there's a two way "tie" for first place, we'll return rank values of 1, 1, 3.)
Suitable indexes for each query will improve performance for large sets.
... ON app (total_blocks, user_id)
... ON app (best_pps,user_id)
... ON app (XXX,user_id)

MySql query runs very slow(actually never gives output) without where clause

I have a mysql query and it works fine when i use where clause, but when i donot use
where clause it gone and never gives the output and finally timeout.
Actually i have used Explain command to check the performance of the query and in both cases the Explain gives the same number of rows used in joining.
I have attached the image of output got with Explain command.
Below is the query.
I couldn't figure whats the problem here.
Any help is highly appreciated.
Thanks.
SELECT
MCI.CLIENT_ID AS CLIENT_ID, MCI.NAME AS CLIENT_NAME, MCI.PRIMARY_CONTACT AS CLIENT_PRIMARY_CONTACT,
MCI.ADDED_BY AS SP_ID, CONCAT(MUD_SP.FIRST_NAME, ' ', MUD_SP.LAST_NAME) AS SP_NAME,
MCI.FK_PROSPECT_ID AS PROSPECT_ID, MCI.DATE_ADDED AS ADDED_ON,
(SELECT GROUP_CONCAT(LT.TAG_TEXT SEPARATOR ', ')
FROM LK_TAG LT
INNER JOIN M_OBJECT_TAG_MAPPING MOTM
ON LT.PK_ID = MOTM.FK_TAG_ID
WHERE MOTM.FK_OBJECT_ID = MCI.FK_PROSPECT_ID
AND MOTM.OBJECT_TYPE = 1
AND MOTM.IS_ACTIVE = 1
) AS TAGS,
IFNULL(SUM(GET_DIGITS(MMR.RCP_AMOUNT)), 0) AS REVENUE_SO_FAR,
IFNULL(SUM(GET_DIGITS(MMR.RCP_RUPEES)), 0) AS REVENUE_INR,
COUNT(DISTINCT PMI_MONTHLY.PROJECT_ID) AS MONTHLY,
COUNT(DISTINCT PMI_FIXED.PROJECT_ID) AS FIXED,
COUNT(DISTINCT PMI_HOURLY.PROJECT_ID) AS HOURLY,
COUNT(DISTINCT PMI_ANNUAL.PROJECT_ID) AS ANNUAL,
COUNT(DISTINCT PMI_CURRENTLY_RUNNING.PROJECT_ID) AS CURRENTLY_RUNNING_PROJECTS,
COUNT(DISTINCT PMI_YET_TO_START.PROJECT_ID) AS YET_TO_START_PROJECTS,
COUNT(DISTINCT PMI_TECH_SALES_CLOSED.PROJECT_ID) AS TECH_SALES_CLOSED_PROJECTS
FROM
M_CLIENT_INFO MCI
INNER JOIN M_USER_DETAILS MUD_SP
ON MCI.ADDED_BY = MUD_SP.PK_ID
LEFT OUTER JOIN M_MONTH_RECEIPT MMR
ON MMR.CLIENT_ID = MCI.CLIENT_ID
LEFT OUTER JOIN M_PROJECT_INFO PMI_FIXED
ON PMI_FIXED.CLIENT_ID = MCI.CLIENT_ID AND PMI_FIXED.PROJECT_TYPE = 1
LEFT OUTER JOIN M_PROJECT_INFO PMI_MONTHLY
ON PMI_MONTHLY.CLIENT_ID = MCI.CLIENT_ID AND PMI_MONTHLY.PROJECT_TYPE = 2
LEFT OUTER JOIN M_PROJECT_INFO PMI_HOURLY
ON PMI_HOURLY.CLIENT_ID = MCI.CLIENT_ID AND PMI_HOURLY.PROJECT_TYPE = 3
LEFT OUTER JOIN M_PROJECT_INFO PMI_ANNUAL
ON PMI_ANNUAL.CLIENT_ID = MCI.CLIENT_ID AND PMI_ANNUAL.PROJECT_TYPE = 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_CURRENTLY_RUNNING
ON PMI_CURRENTLY_RUNNING.CLIENT_ID = MCI.CLIENT_ID AND PMI_CURRENTLY_RUNNING.STATUS = 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_YET_TO_START
ON PMI_YET_TO_START.CLIENT_ID = MCI.CLIENT_ID AND PMI_YET_TO_START.STATUS < 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_TECH_SALES_CLOSED
ON PMI_TECH_SALES_CLOSED.CLIENT_ID = MCI.CLIENT_ID AND PMI_TECH_SALES_CLOSED.STATUS > 4
WHERE YEAR(MCI.DATE_ADDED) = '2012'
GROUP BY MCI.CLIENT_ID ORDER BY CLIENT_NAME ASC

Yes, as many people have said, the key is that when you have the where clause, mysql engine filters the table M_CLIENT_INFO --probably drammatically--.
A similar result as removing the where clause is to to add this where clause:
where 1 = 1
You will see that the performance is degraded also because mysql will try to get all the data.

Remove the where clause and all columns from select and add a count to see how many records you get. If it is reasonable, say up to 10k, then do the following,
put back the select columns related to M_CLIENT_INFO
do not include the nested one "TAGS"
remove all your joins
run your query without where clause and gradually include the joins
this way you'll find out when the timeout is caused.

I would try the following. First, MySQL has a keyword "STRAIGHT_JOIN" which tells the optimizer to do the query in the table order you've specified. Since all you left-joins are child-related (like a lookup table), you don't want MySQL to try and interpret one of those as a primary basis of the query.
SELECT STRAIGHT_JOIN ... rest of query.
Next, your M_PROJECT_INFO table, I dont know how many columns of data are out there, but you appear to be concentrating on just a few columns on your DISTINCT aggregates. I would make sure you have a covering index on these elements to help the query via an index on
( Client_ID, Project_Type, Status, Project_ID )
This way the engine can apply the criteria and get the distinct all out of the index instead of having to go back to the raw data pages for the query.
Third, your M_CLIENT_INFO table. Ensure that has an index on both your criteria, group by AND your Order By, and change your order by from the aliased "CLIENT_NAME" to the actual column of the SQL table so it matches the index
( Date_Added, Client_ID, Name )
I have "name" in ticks as it is also a reserved word and helps clarify the column, not the keyword.
Next, the WHERE clause. Whenever you apply a function to an indexed column name, it doesn't work the greatest, especially on date/time fields... You might want to change your where clause to
WHERE MCI.Date_Added between '2012-01-01' and '2012-12-31 23:59:59'
so the BETWEEN range is showing the entire year and the index can better be utilized.
Finally, if the above do not help, I would consider splitting your query some. The GROUP_CONCACT inline select for the TAGS might be a bit of a killer for you. You might want to have all the distinct elements first for the grouping per client, THEN get those details.... Something like
select
PQ.*,
group_concat(...) tags
from
( the entire primary part of the query ) as PQ
Left join yourGroupConcatTableBasis on key columns

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

LEFT JOIN limited number of rows with GROUP_CONCAT too slow - mysql

Use SUBSTRING_INDEX() to return the first N items in the GROUP_CONCAT() result. SUBSTRING_INDEX(im.images, '|', 3) AS images will return the first 3 images.

Related

MySQL select in select query

MySQL Inner join naming error?

How to fix SQL query with Left Join and subquery?

Highscores on multiple columns, efficient query, right approach

MySql query runs very slow(actually never gives output) without where clause

Categories

Resources