Query SQL runs slow with group by in MySQL - mysql

I have the SQL below. It takes about 30 minutes to run, which is too long for me.
SELECT LPP.learning_project_pupilID, SL.serviceID, MAX(LPPO.start_date), SUM(LPPOT.license_mode_value) totalAssignedLicenses
FROM t_services_licenses SL
INNER JOIN t_pupils_offers_services POS ON POS.service_licenseID = SL.service_licenseID
INNER JOIN j_learning_projects_pupils_offers LPPO ON LPPO.learning_project_pupil_offerID = POS.learning_project_pupil_offerID
INNER JOIN j_learning_projects_pupils LPP ON LPPO.learning_project_pupilID = LPP.learning_project_pupilID
INNER JOIN j_learning_projects_pupils_offers_tracking LPPOT ON LPPOT.pupil_offer_serviceID = POS.pupil_offer_serviceID
INNER JOIN t_filters_items FI ON FI.itemID = LPP.learning_project_pupilID_for_filter_join
WHERE FI.filterID = '4dce2235-aafd-4ba2-b248-c137ad6ce8ca'
AND SL.serviceID IN ('OnlineConversationClasses', 'TwentyFourSeven')
GROUP BY LPP.learning_project_pupilID, SL.serviceID
The EXPLAIN result is below (tell me if you can't view the image):
http://images0.cnblogs.com/blog2015/47012/201508/140920298959608.png
I looked at the profile result, and the "copying temp data" stage took almost all of the time. I believe this is caused by the GROUP BY, so I made the following change to verify it:
I removed the MAX and SUM functions as well as the GROUP BY clause and ran the query again. It took only about 40 seconds, which is OK for us.
So I want to know: is there some other way to make the SQL above run much faster?
More info can be found here: http://www.cnblogs.com/scy251147/p/4728995.html
EDIT:
From the EXPLAIN output, I can see that about 50802 rows are reported for the t_filters_items table, and unfortunately this table shows "Using temporary" to store temp data, which is not a good choice for me. I really don't like GROUP BY in MySQL very much.

Do not use CHARACTER SET utf8 on UUID columns. Change to ascii. Further discussion of uuids and how to further shrink them: http://mysql.rjweb.org/doc.php/uuid
Are there really 50K rows with FI.filterID = '4dce2235-aafd-4ba2-b248-c137ad6ce8ca'?
The GROUP BY spans two tables (LPP and SL), making it impossible to optimize. Can that be changed?
The SUM(...) is likely to have a bigger value than you expect. This is because of the JOINs. Try to rewrite the computation of the SUM in a subquery.
Are you using InnoDB? Is innodb_buffer_pool_size set to about 70% of available RAM?
Approximately how many rows in each table?
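To illustrate the point about SUM(...) being inflated by the JOINs, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the tables pupil, offer and tracking and their values are invented for the example, not taken from the question. When two one-to-many tables are joined to the same parent, each tracking row is repeated once per offer row, so the SUM doubles; aggregating in subqueries first avoids that.

```python
import sqlite3

# SQLite stands in for MySQL; table names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pupil (pupil_id INTEGER PRIMARY KEY);
    CREATE TABLE offer (offer_id INTEGER PRIMARY KEY, pupil_id INTEGER, start_date TEXT);
    CREATE TABLE tracking (tracking_id INTEGER PRIMARY KEY, pupil_id INTEGER, licenses INTEGER);
    INSERT INTO pupil VALUES (1);
    INSERT INTO offer VALUES (10, 1, '2015-01-01'), (11, 1, '2015-06-01');
    INSERT INTO tracking VALUES (100, 1, 5), (101, 1, 7);
""")

# Naive version: 2 offers x 2 tracking rows = 4 joined rows, so SUM counts each license twice.
naive = conn.execute("""
    SELECT p.pupil_id, MAX(o.start_date), SUM(t.licenses)
    FROM pupil p
    JOIN offer o ON o.pupil_id = p.pupil_id
    JOIN tracking t ON t.pupil_id = p.pupil_id
    GROUP BY p.pupil_id
""").fetchall()

# Pre-aggregated version: each subquery yields one row per pupil before the join.
pre_aggregated = conn.execute("""
    SELECT p.pupil_id, o.last_start, t.total_licenses
    FROM pupil p
    JOIN (SELECT pupil_id, MAX(start_date) AS last_start
          FROM offer GROUP BY pupil_id) o ON o.pupil_id = p.pupil_id
    JOIN (SELECT pupil_id, SUM(licenses) AS total_licenses
          FROM tracking GROUP BY pupil_id) t ON t.pupil_id = p.pupil_id
""").fetchall()

print(naive)           # SUM is doubled by the join fan-out
print(pre_aggregated)  # SUM matches the 5 + 7 actually stored
```

Whether the original query suffers from this depends on how many rows each JOIN matches per pupil, which only the real data can show.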

Related

AWS RDS MySQL fetching not being completed

I am writing queries against tables in my AWS RDS MySQL server and I can't get the query to finish fetching. The duration of the query is 5.328 seconds, but then the fetching just doesn't end. I have LEFT JOINed a subquery. When I run the subquery separately it runs very quickly and has almost no fetch time. When I run the main query on its own it works great. The main query does have about 97,000 rows. I'm new to AWS RDS servers and wonder if there is a parameter adjustment I need to make. I feel as if the query is pretty simple.
We are in the middle of switching from BigQuery and BigQuery runs it just fine with the same data and same query.
Any ideas of what I can do to get it to fetch and speed up the fetching?
I've tried indexing and changing the buffer pool size, but still no luck.
FROM
project__c P
LEFT JOIN contact C ON C.id = P.salesperson__c
LEFT JOIN account A ON A.id = P.sales_team_account_at_sale__c
LEFT JOIN contact S ON S.id = P.rep_contact__c
LEFT JOIN (
SELECT
U.name,
C.rep_id__c EE_Code
FROM
user U
LEFT JOIN profile P ON P.id = U.profileid
LEFT JOIN contact C ON C.email = U.email
WHERE
(P.name LIKE "%Inside%" OR P.name LIKE "%rep%")
AND C.active__c = TRUE
AND C.rep_id__c IS NOT NULL
AND C.recordtypeid = "############"
) LC ON LC.name = P.is_rep_formula__c
You can analyze the query by adding EXPLAIN [your query] and running that to see what indexes are being used and how many rows are examined for each joined table. It might be joining a lot more rows than you expect.
You can try:
SELECT SQL_CALC_FOUND_ROWS [your query] limit 1;
If the problem is in sending too much data (i.e. more rows than you think it's trying to return), this will return quickly. It would prove that the problem does lie in sending the data, not in the query stage. If this is what you find, run the following next:
select FOUND_ROWS();
This will tell you how many total rows matched your query. If it's bigger than you expect, your joins are wrong in some way. The explain analyzer mentioned above should provide some insight. While the outer query has 97000 rows, each of those rows could join with more than one row from the subquery. This can happen if you expect the left side of the join to always have a value, but find out there are rows in which it is empty/null. This can lead to a full cross join where every row from the left joins with every row on the right.
If the LIMIT 1 query also hangs, the problem is in the query itself. Again, the EXPLAIN output should tell you where the problem lies. Most likely it's a missing index causing very slow table scans instead of fast lookups, in both the table joins and the WHERE clauses.
The inner query might be fine/fast on its own. When you join with another query, if not indexed/joined properly, it could lead to a result set and/or query time many times larger.
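A quick way to check for this kind of row multiplication is to compare row counts before and after the join and to look for duplicate values of the join key on the subquery side. The sketch below uses Python's sqlite3 module in place of MySQL, and the project and rep tables are simplified stand-ins invented for the example, not the schema from the question.

```python
import sqlite3

# SQLite stands in for MySQL; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY, rep_name TEXT);
    CREATE TABLE rep (name TEXT, ee_code TEXT);
    INSERT INTO project VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Ann');
    INSERT INTO rep VALUES ('Ann', 'A1'), ('Ann', 'A2'), ('Bob', 'B1');
""")

# Rows in the outer table alone.
base_rows = conn.execute("SELECT COUNT(*) FROM project").fetchone()[0]

# Rows after the LEFT JOIN: more than base_rows means the join fans out.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM project p
    LEFT JOIN rep r ON r.name = p.rep_name
""").fetchone()[0]

# Join-key values that appear more than once on the joined side.
duplicate_keys = conn.execute("""
    SELECT name, COUNT(*) AS n
    FROM rep
    GROUP BY name
    HAVING COUNT(*) > 1
""").fetchall()

print(base_rows, joined_rows, duplicate_keys)
```

If the joined count is far above the 97,000 rows of the main query, the duplicate-key query shows which values are responsible.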
If missing indices with derived tables is the problem, read up on how mysql can optimize via different settings by visiting https://dev.mysql.com/doc/refman/5.7/en/derived-table-optimization.html
As seen in a comment to your question, creating a temp table and joining directly gives you control instead of relying on/hoping mysql to optimize your query in a way that uses fast indices.
I'm not versed in BigQuery, but unless it's running the same core mysql engine under the hood, it's not really a good comparison.
Why all the left joins? Your subquery seems to be targeted at a small set of results, yet you still want to get 90k+ rows? If you are using this query to render any sort of list in a web application, apply reasonable limits and pagination.

MySQL query and LEFT JOIN for large table

I am trying to optimize the query below, which takes 80 minutes to execute. :/
I have a very large table prodaja with 21m rows, and actual_stock has 960k rows.
SELECT
p.NazivMat,
sum(p.Kolicina) AS ProdajaKol,
sum(p.Iznos) AS ProdajaIznos,
s.Kolicina AS TrenutnaZaliha,
s.Iznos AS TrenZalIznos
FROM
prodaja p
LEFT JOIN actual_stock s ON s.BrojSklad = p.BrojSklad
AND s.SifraMat = p.SifraMat
WHERE
p.Dobavljac = 1664
AND p.DatumOtprem BETWEEN '2020-12-10'
AND '2020-12-11'
I have set indexes on the fields BrojSklad and SifraMat, but it does not change much at all, since the dates and range vary, and the query can run forever if a 10-day range is selected.
Is there any other way to get the same result with a different query, or with two of them, e.g. "prefetch" and store in a temp table and then run another one?
A table with 20m rows is a pain in the butt. :/
UPDATE: 30. Dec
Thanks for all the responses below. For the sake of simplicity I had shortened the query; the long version is below. I did add GROUP BY at the end of it, so that's sorted.
EXPLAIN SELECT
cm_prodaja.NazivGrupe,
cm_prodaja.Grupa,
cm_prodaja.DatumOtprem,
cm_prodaja.SifraMat,
cm_prodaja.BarCode,
cm_prodaja.SifArtOdDob,
cm_prodaja.NazivMat,
sum(cm_prodaja.Kolicina) AS Kolicina,
sum(cm_prodaja.Iznos) AS Iznos,
IFNULL (zaliha_artikala_radnje.Kolicina, 0) AS TrenutnaZaliha,
IFNULL (zaliha_artikala_radnje.Iznos, 0) AS TrenZalIznos
FROM
cm_prodaja
LEFT JOIN zaliha_artikala_radnje ON zaliha_artikala_radnje.BrojSklad = cm_prodaja.BrojSklad
AND zaliha_artikala_radnje.SifraMat = cm_prodaja.SifraMat
WHERE
cm_prodaja.Dobavljac = 1664
AND cm_prodaja.DatumOtprem BETWEEN '2020-08-10'
AND '2020-08-11'
GROUP BY cm_prodaja.BrojSklad, cm_prodaja.NazivRadnje, cm_prodaja.SifraMat,
cm_prodaja.BarCode, cm_prodaja.SifArtOdDob, cm_prodaja.NazivMat,
cm_prodaja.Kolicina, cm_prodaja.Iznos, cm_prodaja.Dobavljac,
cm_prodaja.NazivDobavljaca, cm_prodaja.Proizvodjac,
cm_prodaja.NazivProizvodjaca, cm_prodaja.Grupa, cm_prodaja.NazivGrupe
I made it a bit faster by adding the missing index on zaliha_artikala_radnje.BrojSklad and zaliha_artikala_radnje.SifraMat.
Another thing I did was enable partitioning: I set the table to be split by year, with 4 partitions per year (by months), and that helped a lot.
I've added an image with the EXPLAIN result.
Add these composite indexes, with the columns in the order given:
p: (Dobavljac, DatumOtprem)
s: (SifraMat, BrojSklad, Iznos, Kolicina)
If you need further assistance, please provide SHOW CREATE TABLE and fix the syntax error: ... LEFT The JOIN ...
What is the datatype of DatumOtprem? I am worried about the endpoints of the BETWEEN.
Another problem... The query has SUM(), but no GROUP BY; what is the intent?
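On the BETWEEN endpoints: if DatumOtprem turns out to be a DATETIME (an assumption, since the question does not say), MySQL reads the bare date '2020-12-11' as '2020-12-11 00:00:00', so rows later on the 11th are silently excluded. A half-open range avoids that. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with an invented sale table; the midnight endpoints are written out explicitly because SQLite compares the values as text.

```python
import sqlite3

# SQLite stands in for MySQL; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (id INTEGER PRIMARY KEY, shipped_at TEXT);
    INSERT INTO sale VALUES
        (1, '2020-12-10 08:30:00'),
        (2, '2020-12-11 00:00:00'),
        (3, '2020-12-11 15:45:00');
""")

# BETWEEN with midnight endpoints: what MySQL effectively evaluates for a
# DATETIME column compared against '2020-12-10' and '2020-12-11'.
between_ids = [row[0] for row in conn.execute("""
    SELECT id FROM sale
    WHERE shipped_at BETWEEN '2020-12-10 00:00:00' AND '2020-12-11 00:00:00'
    ORDER BY id
""")]

# Half-open range: from the start of the 10th up to, but not including, the 12th.
half_open_ids = [row[0] for row in conn.execute("""
    SELECT id FROM sale
    WHERE shipped_at >= '2020-12-10 00:00:00'
      AND shipped_at <  '2020-12-12 00:00:00'
    ORDER BY id
""")]

print(between_ids)    # the 15:45 row on the 11th is missing
print(half_open_ids)  # both days are fully covered
```

If the column is a plain DATE, BETWEEN already covers both days and nothing needs to change.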
Thanks for the suggestions. I ended up using the PHP way to JOIN data described here: https://www.koolreport.com/docs/processes/join/
My original query took 3-40 minutes to give results for 1-7 days selected in the filter.
Now it takes 2-5 seconds, where the final result has 1k - 8k rows.
I tried different methods, and anything that had a JOIN inside the query dropped performance drastically. As said, the KoolReport JOIN function solved my problem. I created two queries; both get their sets of data ordered by SifraMat and match on the same field, Dobavljac.

How to improve query performance with order by, group by and joins

I had a problem with ORDER BY when joining multiple tables that have millions of rows. I found a solution in the following question: using EXISTS instead of a join with DISTINCT improves performance.
How to improve order by performance with joins in mysql
SELECT
`tracked_twitter` . *,
COUNT( * ) AS twitterContentCount,
retweet_count + favourite_count + reply_count AS engagement
FROM
`tracked_twitter`
INNER JOIN
`twitter_content`
ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id`
INNER JOIN
`tracker_twitter_content`
ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id`
WHERE
`tracker_twitter_content`.`tracker_id` = '88'
GROUP BY
`tracked_twitter`.`id`
ORDER BY
twitterContentCount DESC LIMIT 20 OFFSET 0
But that method only helps if I need the result set from the parent table alone. What if I want to compute grouped counts and other math functions on tables other than the parent table? I wrote a query that meets my criteria, but it takes 20 seconds to execute. How can I optimize it?
Thanks in advance
Given the query is already fairly simple, the options I'd look into are ...
Execution plan (to find any missing indexes you could add)
caching (to ensure SQL already has all the data in RAM)
de-normalisation (to turn the query into a flat select)
cache the data in the application (so you could use something like PLINQ on it)
Use a ram based store (redis, elastic)
File group adjustments (physically move the db to faster discs)
Partition your tables (to spread the raw data over multiple physical discs)
The further you go down this list the more involved the solutions become.
I guess it depends how fast you need the query to be and how much you need your solution to scale.

Optimizing the SQL Query to reduce execution time

My SQL query with all the filters applied returns 10 lakhs (one million) records. Fetching all the records takes 76.28 seconds, which is not acceptable. How can I optimize my SQL query so that it takes less time?
The query I am using is:
SELECT cDistName , cTlkName, cGpName, cVlgName ,
cMmbName , dSrvyOn
FROM sspk.villages
LEFT JOIN gps ON nVlgGpID = nGpID
LEFT JOIN TALUKS ON nGpTlkID = nTlkID
left JOIN dists ON nTlkDistID = nDistID
LEFT JOIN HHINFO ON nHLstGpID = nGpID
LEFT JOIN MEMBERS ON nHLstID = nMmbHhiID
LEFT JOIN BNFTSTTS ON nMmbID = nBStsMmbID
LEFT JOIN STATUS ON nBStsSttsID = nSttsID
LEFT JOIN SCHEMES ON nBStsSchID = nSchID
WHERE (
(nMmbGndrID = 1 and nMmbAge between 18 and 60)
or (nMmbGndrID = 2 and nMmbAge between 18 and 55)
)
AND cSttsDesc like 'No, Eligible'
AND DATE_FORMAT(dSrvyOn , '%m-%Y') < DATE_FORMAT('2012-08-01' , '%m-%Y' )
GROUP BY cDistName , cTlkName, cGpName, cVlgName ,
DATE_FORMAT(dSrvyOn , '%m-%Y')
I have searched this forum and elsewhere and tried some of the tips given, but they hardly make any difference. The joins in the query above are all LEFT JOINs on primary key and foreign key columns. Can anyone suggest how I can modify this SQL to reduce the execution time?
You are, sir, a very demanding user of MySQL! A million records retrieved from a massively joined result set at the speed you mentioned is 76 microseconds per record. Many would consider this to be acceptable performance. Keep in mind that your client software may be a limiting factor with a result set of that size: it has to consume the enormous result set and do something with it.
That being said, I see a couple of problems.
First, rewrite your query so every column name is qualified by a table name. You'll do this for yourself and the next person who maintains it. You can see at a glance what your WHERE criteria need to do.
Second, consider this search criterion. It requires TWO searches, because of the OR.
WHERE (
(MEMBERS.nMmbGndrID = 1 and MEMBERS.nMmbAge between 18 and 60)
or (MEMBERS.nMmbGndrID = 2 and MEMBERS.nMmbAge between 18 and 55)
)
I'm guessing that these criteria match most of your population -- females 18-60 and males 18-55 (a guess). Can you put the MEMBERS table first in your list of LEFT JOINs? Or can you put a derived column (MEMBERS.working_age = 1 or some such) in your table?
Also try a compound index on (nMmbGndrID,nMmbAge) on MEMBERS to speed this up. It may or may not work.
Third, consider this criterion.
AND DATE_FORMAT(dSrvyOn , '%m-%Y') < DATE_FORMAT('2012-08-01' , '%m-%Y' )
You've applied a function to the dSrvyOn column. This defeats the use of an index for that search. Instead, try this.
AND dSrvyOn >= '2012-08-01'
AND dSrvyOn < '2012-08-01' + INTERVAL 1 MONTH
This will, if you have an index on dSrvyOn, do a range search on that index. My remark also applies to the function in your GROUP BY clause.
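The difference can be seen in a query plan. The sketch below is only an analogy: it uses Python's sqlite3 module, with SQLite's strftime() standing in for MySQL's DATE_FORMAT() and EXPLAIN QUERY PLAN standing in for EXPLAIN, and the survey table, its note column and the index name are invented for the example. The principle is the same in both engines: a function wrapped around the column hides it from an ordinary index, while a bare range comparison can use it.

```python
import sqlite3

# SQLite stands in for MySQL; table, extra column and index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey (id INTEGER PRIMARY KEY, dSrvyOn TEXT, note TEXT);
    CREATE INDEX idx_survey_date ON survey (dSrvyOn);
""")

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Function applied to the column: the index on dSrvyOn cannot be used.
wrapped_plan = plan(
    "SELECT note FROM survey WHERE strftime('%Y-%m', dSrvyOn) = '2012-08'"
)

# Bare column compared against a half-open range: the index can be used.
range_plan = plan(
    "SELECT note FROM survey "
    "WHERE dSrvyOn >= '2012-08-01' AND dSrvyOn < '2012-09-01'"
)

print(wrapped_plan)  # a full scan of survey
print(range_plan)    # a search using idx_survey_date
```

In MySQL, running EXPLAIN on both forms of the real query should show the same contrast in the type and key columns, provided the index on dSrvyOn exists.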
Finally, as somebody else mentioned, don't use LIKE to search where = will do. And NEVER use column LIKE '%something%' if you want acceptable performance.
You claim yourself you base your joins on good and unique indexes. So there is little to be optimized. Maybe a few hints:
try to optimize your table layout, maybe you can reduce the number of joins required. That probably brings more performance optimization than anything else.
check your hardware (available memory and things) and the server configuration.
use MySQL's EXPLAIN feature to find bottlenecks.
maybe you can create an auxiliary table especially for this query, which is filled by a background process. That way the query itself runs faster, since the work is done beforehand in the background. That usually works if the query retrieves data that need not necessarily be in sync with every single change in the database.
check if an RDBMS is really the right type of database. For many purposes graph databases are much more efficient and offer better performance.
Try adding an index to nMmbGndrID, nMmbAge, and cSttsDesc and see if that helps your queries out.
Additionally you can use the "Explain" command before your select statement to give you some hints on what you might do better. See the MySQL Reference for more details on explain.
If the tables used in the joins are rarely updated, you can probably change the engine type from InnoDB to MyISAM.
SELECT queries in MyISAM run 2x faster than in InnoDB, but UPDATE and INSERT queries are much slower in MyISAM.
You can create Views in order to avoid long queries and time.
Your like operator could be holding you up -- full-text search with like is not MySQL's strong point.
Consider setting a fulltext index on cSttsDesc (make sure it is a TEXT field first).
ALTER TABLE articles ADD FULLTEXT(cSttsDesc);
SELECT
*
FROM
table_name
WHERE MATCH(cSttsDesc) AGAINST('No, Eligible')
Alternatively, you can set a boolean flag instead of cSttsDesc like 'No, Eligible'.
Source: http://devzone.zend.com/26/using-mysql-full-text-searching/
This SQL has many things that are redundant that may not show up in an explain.
If you require a field, it shouldn't be in a table that's in a LEFT JOIN - left join is for when data might be in the joined table, not when it has to be.
If all the required fields are in the same table, it should be the first table in your FROM.
If your text search is predictable (not from user input) and relates to a single known ID, use the ID not the text search (props to Patricia for spotting the LIKE bottleneck).
Your query is hard to read because of the lack of table hinting, but there does seem to be a pattern to your field names.
You require nMmbGndrID and nMmbAge to have a value, but these are probably in MEMBERS, which is 5 left joins down. That's a redundancy.
Remember that you can do a simple join like this:
FROM sspk.villages, gps, TALUKS, dists, HHINFO, MEMBERS [...] WHERE [...] nVlgGpID = nGpID
AND nGpTlkID = nTlkID
AND nTlkDistID = nDistID
AND nHLstGpID = nGpID
AND nHLstID = nMmbHhiID
It looks like cSttsDesc comes from STATUS. But if the text 'No, Eligible' matches exactly one nBStsSttsID in BNFTSTTS then find out the value and use that! If it is 7, take out LEFT JOIN STATUS ON nBStsSttsID = nSttsID and replace AND cSttsDesc like 'No, Eligible' with AND nBStsSttsID = '7'. This would see a massive speed improvement.
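A minimal sketch of that last suggestion, using Python's sqlite3 module as a stand-in for MySQL. The column names follow the question, but the nBStsID key column, the status values and the rows are assumptions made up for the example: look up the ID for 'No, Eligible' once, then filter on the ID so the join to STATUS and the text comparison drop out of the main query.

```python
import sqlite3

# SQLite stands in for MySQL; nBStsID and all data values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (nSttsID INTEGER PRIMARY KEY, cSttsDesc TEXT);
    CREATE TABLE bnftstts (nBStsID INTEGER PRIMARY KEY, nBStsMmbID INTEGER, nBStsSttsID INTEGER);
    INSERT INTO status VALUES (6, 'Yes'), (7, 'No, Eligible');
    INSERT INTO bnftstts VALUES (1, 100, 7), (2, 101, 6), (3, 102, 7);
""")

# Original shape: join to STATUS and filter on the description text.
via_text = [row[0] for row in conn.execute("""
    SELECT b.nBStsMmbID
    FROM bnftstts b
    JOIN status s ON b.nBStsSttsID = s.nSttsID
    WHERE s.cSttsDesc LIKE 'No, Eligible'
    ORDER BY b.nBStsMmbID
""")]

# Suggested shape: resolve the ID once, then filter on the ID with no join.
status_id = conn.execute(
    "SELECT nSttsID FROM status WHERE cSttsDesc = 'No, Eligible'"
).fetchone()[0]
via_id = [row[0] for row in conn.execute(
    "SELECT nBStsMmbID FROM bnftstts WHERE nBStsSttsID = ? ORDER BY nBStsMmbID",
    (status_id,),
)]

print(status_id, via_text, via_id)
```

This only holds if exactly one status row carries that description, as the answer already points out.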

Slow query when using ORDER BY

Here's the query (the largest table has about 40,000 rows)
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
If I run this, it executes very quickly (.05 seconds roughly). It returns 13 rows.
When I add an ORDER BY clause at the end of the query (ordering by any column) the query takes about 10 seconds.
I'm using this database in production now, and everything is working fine. All my other queries are speedy.
Any ideas of what it could be? I ran the query in MySQL's Query Browser, and from the command line. Both places it was dead slow with the ORDER BY.
EDIT: Tolgahan ALBAYRAK solution works, but can anyone explain why it works?
maybe this helps:
SELECT * FROM (
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
) AS T ORDER BY CourseID
Is the column you're ordering by indexed?
Indexing drastically speeds up ordering and filtering.
You are selecting from "UserCourse" which I assume is a joining table between courses and users (Many to Many).
You should index the column that you need to order by, in the "UserCourse" table.
Suppose you want to "order by CourseID", then you need to index it on UserCourse table.
Ordering by any other column that is not present in the joining table (i.e. UserCourse) may require further denormalization and indexing on the joining table to be optimized for speed;
In other words, you need to have a copy of that column in the joining table and index it.
P.S.
The answer given by Tolgahan Albayrak, although correct for this question, would not produce the desired result, in cases where one is doing a "LIMIT x" query.
Have you updated the statistics on your database? I ran into something similar on mine: I had 2 identical queries where the only difference was a capital letter, and one returned in half a second while the other took nearly 5 minutes. Updating the statistics resolved the issue.
I realise this answer is late, but I have just had a similar problem: adding ORDER BY increased the query time from seconds to 5 minutes. Having tried most of the other suggestions for speeding it up, I noticed that the /tmp files were growing to 12G for this query. I changed the query so that a varchar(20000) field being returned was wrapped in trim(), and performance improved dramatically (back to seconds). So I guess it's worth checking whether you are returning large varchars as part of your query and, if so, processing them (maybe substring(x, 1, length(x))?) if you don't want to trim them.
Query was returning 500k rows and the /tmp file indicated that each row was using about 20k of data.
A similar question was asked before here.
It might help you as well. Basically it describes using composite indexes and how order by works.
Today I ran into the same kind of problem. As soon as I sorted the result set by a field from a joined table, the whole query became horribly slow and took more than a hundred seconds.
The server was running MySQL 5.0.51a, and by chance I noticed that the same query ran as fast as it should on a server with MySQL 5.1. When comparing the EXPLAIN output for that query, I saw that the usage and handling of indexes had changed a lot (at least from 5.0 -> 5.1).
So if you encounter such a problem, maybe your resolution is simply to upgrade your MySQL.