Query with multiple LEFT JOINs takes too long in MySQL

I am trying to fetch data with a query that has LEFT JOINs on multiple tables. The query returns 4132 rows and shows 4.55 sec duration, 31.30 sec fetch in MySQL Workbench. I even tried executing it from PHP, but that takes the same amount of time.
SELECT
aa.bams_id AS chassis_bams_id, aa.hostname AS chassis_hostname,
aa.rack_number AS chassis_rack, aa.serial_number AS chassis_serial, aa.site_id AS chassis_site_id,
cb.bay_number, cb.bsn AS serial_number_in_bay,
CASE
WHEN a_a.bams_id IS NULL THEN 'Unknown'
ELSE a_a.bams_id
END AS blade_bams_id,
a_a.hostname AS blade_hostname, a_s.description AS blade_status, a_a.manufacturer AS blade_manufacturer, a_a.model AS blade_model,
a_a.bookable_unit_id AS blade_bookable_unit_id, a_a.rack_number AS blade_rack_number, a_a.manufactured_date AS blade_manufactured_date,
a_a.support_expired_date AS blade_support_expired_date, a_a.site_id AS blade_site_id
FROM all_assets aa
LEFT JOIN manufacturer_model mm ON aa.manufacturer = mm.manufacturer AND aa.model = mm.model
LEFT JOIN chassis_bays cb ON aa.bams_id = cb.chassis_bams_id
LEFT JOIN all_assets a_a ON cb.bsn = a_a.serial_number
LEFT JOIN asset_status a_s ON a_a.status=a_s.status
WHERE mm.hardware_type = 'chassis';
These are the definitions of the tables being used:
Output of EXPLAIN:
The query fetches data of each blade in each chassis.
I executed the same query on other systems and there it takes only 5 seconds to fetch the result.
How do I optimize this query?
Update (resolved)
Added indexes as suggested by experts here.
Below is the execution plan after adding indexes.

Create indexes; non-indexed reads are slower than indexed reads.
To determine exactly what is causing your slow performance, the best tool is the query plan analyzer. Have a look here:
MySQL performance: EXPLAIN
Try creating indexes on the most obvious reads that will take place. Look at the fields that play a role when you join, and also at your WHERE clause. If non-indexed reads were taking place, having indexes on those fields should increase your performance.
If this is still a problem, it is best to see how MySQL is fetching the data; sometimes restructuring your data, or even altering the way you are querying, can give you better results.
e.g. create indexes for the manufacturer_model join: aa.manufacturer, aa.model, and mm.hardware_type
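As a rough sketch, that advice could translate into DDL like the following; the index names are made up, and the EXPLAIN output should confirm which of these are actually used (existing primary keys may already cover some):
ALTER TABLE manufacturer_model ADD INDEX idx_mm_mfr_model (manufacturer, model, hardware_type);
ALTER TABLE all_assets ADD INDEX idx_aa_mfr_model (manufacturer, model);
ALTER TABLE all_assets ADD INDEX idx_aa_serial (serial_number);
ALTER TABLE chassis_bays ADD INDEX idx_cb_chassis (chassis_bams_id);
ALTER TABLE asset_status ADD INDEX idx_as_status (status);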

Related

AWS RDS MySQL fetching not being completed

I am writing queries against tables in my AWS RDS MySQL server and I can't get the query to complete fetching. The duration of the query is 5.328 seconds, but the fetching just doesn't end. I have LEFT JOINed a subquery; when I run the subquery separately it runs very quickly and has almost no fetch time, and when I run the main query it works great. The main query does return about 97,000 rows. I'm new to AWS RDS servers and wonder if there is a parameter adjustment that needs to be made? I feel as if the query is pretty simple.
We are in the middle of switching from BigQuery, and BigQuery runs it just fine with the same data and the same query.
Any ideas of what I can do to get the fetch to complete and speed it up?
I've tried indexing and changing the buffer pool size, but still no luck.
FROM
project__c P
LEFT JOIN contact C ON C.id = P.salesperson__c
LEFT JOIN account A ON A.id = P.sales_team_account_at_sale__c
LEFT JOIN contact S ON S.id = P.rep_contact__c
LEFT JOIN (
SELECT
U.name,
C.rep_id__c EE_Code
FROM
user U
LEFT JOIN profile P ON P.id = U.profileid
LEFT JOIN contact C ON C.email = U.email
WHERE
(P.name LIKE "%Inside%" OR P.name LIKE "%rep%")
AND C.active__c = TRUE
AND C.rep_id__c IS NOT NULL
AND C.recordtypeid = "############"
) LC ON LC.name = P.is_rep_formula__c
You can analyze the query by adding EXPLAIN [your query] and running that to see what indexes are being used and how many rows are examined for each joined table. It might be joining a lot more rows than you expect.
You can try:
SELECT SQL_CALC_FOUND_ROWS [your query] limit 1;
If the problem is in sending too much data (i.e. more rows than you think it's trying to return), this will return quickly. It would prove that the problem does lie in sending the data, not in the query stage. If this is what you find, run the following next:
select FOUND_ROWS();
This will tell you how many total rows matched your query. If it's bigger than you expect, your joins are wrong in some way. The explain analyzer mentioned above should provide some insight. While the outer query has 97000 rows, each of those rows could join with more than one row from the subquery. This can happen if you expect the left side of the join to always have a value, but find out there are rows in which it is empty/null. This can lead to a full cross join where every row from the left joins with every row on the right.
If the limit 1 query also hangs, the problem is in the query itself. Again, the EXPLAIN analyzer should tell you where the problem lies. Most likely it's a missing index causing very slow table scans instead of fast index lookups, in both joins and WHERE clauses.
The inner query might be fine/fast on its own. When you join with another query, if not indexed/joined properly, it could lead to a result set and/or query time many times larger.
If missing indexes on derived tables is the problem, read up on how MySQL can optimize them via different settings: https://dev.mysql.com/doc/refman/5.7/en/derived-table-optimization.html
As seen in a comment on your question, creating a temp table and joining against it directly gives you control, instead of relying on (and hoping for) MySQL to optimize your query in a way that uses fast indexes.
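A minimal sketch of that approach, assuming a made-up temp table name lc_reps and that user.name comes through as a VARCHAR (a TEXT column would need an index prefix length):
CREATE TEMPORARY TABLE lc_reps (INDEX (name))
SELECT U.name, C.rep_id__c AS EE_Code
FROM user U
LEFT JOIN profile P ON P.id = U.profileid
LEFT JOIN contact C ON C.email = U.email
WHERE (P.name LIKE "%Inside%" OR P.name LIKE "%rep%")
AND C.active__c = TRUE
AND C.rep_id__c IS NOT NULL
AND C.recordtypeid = "############";
-- then in the main query, join the materialized table directly:
-- LEFT JOIN lc_reps LC ON LC.name = P.is_rep_formula__c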
I'm not versed in BigQuery, but unless it's running the same core mysql engine under the hood, it's not really a good comparison.
Why all the LEFT JOINs? Your subquery seems targeted at a small set of results, yet you still want to get 90k+ rows? If you are using this query to render any sort of list in a web application, apply reasonable limits and pagination.

Temp tables vs subqueries in inner join

Both SQL statements return the same results. In the first, the joins are on subqueries; in the second, the final query joins a temporary table that I create and populate beforehand.
SELECT COUNT(*) totalCollegiates, SUM(getFee(c.collegiate_id, dateS)) totalMoney
FROM collegiates c
LEFT JOIN (
SELECT collegiate_id FROM collegiateRemittances r
INNER JOIN remittances r1 USING(remittance_id)
WHERE r1.type_id = 1 AND r1.name = remesa
) hasRemittance ON hasRemittance.collegiate_id = c.collegiate_id
WHERE hasRemittance.collegiate_id IS NULL AND c.typePayment = 1 AND c.active = 1 AND c.exentFee = 0 AND c.approvedBoard = 1 AND IF(notCollegiate, c.collegiate_id NOT IN (notCollegiate), '1=1');
DROP TEMPORARY TABLE IF EXISTS hasRemittance;
CREATE TEMPORARY TABLE hasRemittance
SELECT collegiate_id FROM collegiateRemittances r
INNER JOIN remittances r1 USING(remittance_id)
WHERE r1.type_id = 1 AND r1.name = remesa;
SELECT COUNT(*) totalCollegiates, SUM(getFee(c.collegiate_id, dateS)) totalMoney
FROM collegiates c
LEFT JOIN hasRemittance ON hasRemittance.collegiate_id = c.collegiate_id
WHERE hasRemittance.collegiate_id IS NULL AND c.typePayment = 1 AND c.active = 1 AND c.exentFee = 0 AND c.approvedBoard = 1 AND IF(notCollegiate, c.collegiate_id NOT IN (notCollegiate), '1=1');
Which will have better performance for a few thousand records?
The two formulations are identical, except that your explicit temp table version is 3 SQL statements instead of just 1. That is, the overhead of the extra round trips to the server makes it slower. But...
Since the implicit temp table is in a LEFT JOIN, that subquery may be evaluated in one of two ways...
Older versions of MySQL were 'dumb' and re-evaluated it. Hence slow.
Newer versions automatically create an index. Hence fast.
Meanwhile, you could speed up the explicit temp table version by adding a suitable index. It would be PRIMARY KEY(collegiate_id). If there is a chance of that INNER JOIN producing dups, then say SELECT DISTINCT.
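A minimal sketch of that indexed variant, using your own statements plus the suggested PRIMARY KEY and DISTINCT:
DROP TEMPORARY TABLE IF EXISTS hasRemittance;
CREATE TEMPORARY TABLE hasRemittance (PRIMARY KEY (collegiate_id))
SELECT DISTINCT collegiate_id
FROM collegiateRemittances r
INNER JOIN remittances r1 USING(remittance_id)
WHERE r1.type_id = 1 AND r1.name = remesa;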
For "a few thousand" rows, you usually don't need to worry about performance.
Oracle has a zillion options for everything. MySQL has very few, with the default being (usually) the best. So ignore the answer that discussed various options that you could use in MySQL.
There are issues with
AND IF(notCollegiate,
c.collegiate_id NOT IN (notCollegiate),
'1=1')
I can't tell which table notCollegiate is in. notCollegiate cannot be a list, so why use IN? Instead simply use !=. Finally, '1=1' is a 3-character string; did you really want that?
For performance (of either version)
remittances needs INDEX(type_id, name, remittance_id) with remittance_id specifically last.
collegiateRemittances needs INDEX(remittance_id) (unless it is the PK).
collegiates needs INDEX(typePayment, active, exentFee , approvedBoard) in any order.
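As DDL, those recommendations sketch out to the following (index names are illustrative):
ALTER TABLE remittances ADD INDEX idx_type_name_rem (type_id, name, remittance_id);
ALTER TABLE collegiateRemittances ADD INDEX idx_rem (remittance_id);
ALTER TABLE collegiates ADD INDEX idx_pay_flags (typePayment, active, exentFee, approvedBoard);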
Bottom line: Worry more about indexes than how you formulate the query.
Ouch. Another wrinkle. What is getFee()? If it is a Stored Function, maybe we need to worry about optimizing it?? And what is dateS?
It actually depends; you'll have to test the performance of every option. On my website I had 2 tables, with articles and comments on them. It turned out to be faster to query the comment count 20 times, once per article, than to use a single union query. MySQL (like other DBs) caches queries, so small simple queries can run amazingly fast.
I did not see that you had tagged the question as mysql, so I initially answered for Oracle. Here is what I think about MySQL.
MySQL
There are two options when it comes to temporary tables: Memory or Disk. And for Disk you can have MyISAM (non-transactional) and InnoDB (transactional). Of course you can expect better performance from the non-transactional type of storage.
Additionally, you need to figure out how big a result set you are dealing with. For a small result set the memory option will be faster; for a large one the disk option will be faster.
Again, in the end, as in my original answer, you need to figure out what performance is good enough, then go for the most descriptive and easiest-to-read option.
Oracle
It depends on what kind of temporary table you are dealing with.
You can have session-based temporary tables (data is held until logout) or transaction-based ones (data is held until commit). On top of this, they may or may not support transaction logging. Depending on configuration, you can get better performance from a temporary table.
As with everything in the world, performance is a relative term. Most probably, for a few thousand records there will be no significant difference between the two queries, in which case I would go not for the most performant option but for the one that is easiest to read and understand.

Lengthy Query Time for MySQL

I have a MySQL DB with 4 tables that get updated independently. 2 of the tables (bhMacAcct and sv) will have roughly 25k to 35k rows at any given time. I have to join various bits from each table.
This is the query I ended up landing on:
SELECT bhMacAcct.acct,pkg,sv.mac,sv.recData
FROM bhMacAcct,bhPkgAcct,sv
WHERE (bhMacAcct.acct = bhPkgAcct.acct) AND (bhMacAcct.mac = sv.mac);
It takes between 3-5 minutes when using this query. Is there possibly an easier way? Can this be further optimized for speed?
First, always use proper, explicit JOIN syntax and qualify all column names:
SELECT ma.acct, pkg, sv.mac, sv.recData
FROM bhMacAcct ma
JOIN bhPkgAcct pa ON ma.acct = pa.acct
JOIN sv ON ma.mac = sv.mac;
For this query, start with indexes on all the columns used for joins. I would recommend: bhMacAcct(acct, mac), bhPkgAcct(acct), and sv(mac, recData). I would also put pkg in the appropriate index, so the indexes cover the query.
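A sketch of those recommendations as DDL; note that pkg is assumed here to live in bhPkgAcct, since the question doesn't qualify it:
CREATE INDEX idx_ma_acct_mac ON bhMacAcct (acct, mac);
CREATE INDEX idx_pa_acct_pkg ON bhPkgAcct (acct, pkg);
CREATE INDEX idx_sv_mac_rec ON sv (mac, recData);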

Query SQL runs slow with group by in MySQL

I have the SQL below; it runs for about 30 minutes, which is too long for me.
SELECT LPP.learning_project_pupilID, SL.serviceID, MAX(LPPO.start_date), SUM(LPPOT.license_mode_value) totalAssignedLicenses
FROM t_services_licenses SL
INNER JOIN t_pupils_offers_services POS ON POS.service_licenseID = SL.service_licenseID
INNER JOIN j_learning_projects_pupils_offers LPPO ON LPPO.learning_project_pupil_offerID = POS.learning_project_pupil_offerID
INNER JOIN j_learning_projects_pupils LPP ON LPPO.learning_project_pupilID = LPP.learning_project_pupilID
INNER JOIN j_learning_projects_pupils_offers_tracking LPPOT ON LPPOT.pupil_offer_serviceID = POS.pupil_offer_serviceID
INNER JOIN t_filters_items FI ON FI.itemID = LPP.learning_project_pupilID_for_filter_join
WHERE FI.filterID = '4dce2235-aafd-4ba2-b248-c137ad6ce8ca'
AND SL.serviceID IN ('OnlineConversationClasses', 'TwentyFourSeven')
GROUP BY LPP.learning_project_pupilID, SL.serviceID
The EXPLAIN result is below (tell me if you can't view the image):
http://images0.cnblogs.com/blog2015/47012/201508/140920298959608.png
I have viewed the profiling result; "copying temp data" took almost all of the time. I know this is caused by the GROUP BY functionality, so I made the changes below to verify it:
I removed the MAX and SUM functions as well as the GROUP BY and ran it; it then took only about 40 seconds, which is OK for us.
So here I want to know: are there other methods to make the SQL above execute much faster?
You can find more info here: http://www.cnblogs.com/scy251147/p/4728995.html
EDIT:
From the EXPLAIN view, I can see that about 50802 rows are filtered in the t_filters_items table, and unluckily this table is "Using temporary" to store intermediate data, which is not a good sign for me. I really don't like GROUP BY in MySQL very much.
Do not use CHARACTER SET utf8 on UUID columns; change them to ascii. Further discussion of UUIDs and how to shrink them further: http://mysql.rjweb.org/doc.php/uuid
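A hypothetical example of that change; CHAR(36) is an assumption, so match it to the actual column definition:
ALTER TABLE t_filters_items MODIFY filterID CHAR(36) CHARACTER SET ascii;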
Are there really 50K rows with FI.filterID = '4dce2235-aafd-4ba2-b248-c137ad6ce8ca'?
The GROUP BY spans two tables (LPP and SL), making it impossible to optimize. Can that be changed?
The SUM(...) is likely to have a bigger value than you expect. This is because of the JOINs. Try rewriting the computation of the SUM in a subquery; see the sketch after these points.
Are you using InnoDB? Is innodb_buffer_pool_size set to about 70% of available RAM?
Approximately how many rows in each table?
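To illustrate the SUM rewrite, here is a rough sketch that pre-aggregates the tracking table in a derived table. It assumes the inflation comes from multiple LPPOT rows per pupil_offer_serviceID; verify that the remaining joins are 1:1 per POS row before trusting the totals:
SELECT LPP.learning_project_pupilID, SL.serviceID, MAX(LPPO.start_date), SUM(T.sumLicenses) totalAssignedLicenses
FROM t_services_licenses SL
INNER JOIN t_pupils_offers_services POS ON POS.service_licenseID = SL.service_licenseID
INNER JOIN j_learning_projects_pupils_offers LPPO ON LPPO.learning_project_pupil_offerID = POS.learning_project_pupil_offerID
INNER JOIN j_learning_projects_pupils LPP ON LPPO.learning_project_pupilID = LPP.learning_project_pupilID
INNER JOIN (
    SELECT pupil_offer_serviceID, SUM(license_mode_value) sumLicenses
    FROM j_learning_projects_pupils_offers_tracking
    GROUP BY pupil_offer_serviceID
) T ON T.pupil_offer_serviceID = POS.pupil_offer_serviceID
INNER JOIN t_filters_items FI ON FI.itemID = LPP.learning_project_pupilID_for_filter_join
WHERE FI.filterID = '4dce2235-aafd-4ba2-b248-c137ad6ce8ca'
AND SL.serviceID IN ('OnlineConversationClasses', 'TwentyFourSeven')
GROUP BY LPP.learning_project_pupilID, SL.serviceID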

Speed of query using FIND_IN_SET on MySql

I have several problems with my query against a catalogue of products.
The query is as follows:
SELECT DISTINCT (cc_id) FROM cms_catalogo
JOIN cms_catalogo_lingua ON ccl_id_prod=cc_id
JOIN cms_catalogo_famiglia ON (FIND_IN_SET(ccf_id, cc_famiglia) != 0)
JOIN cms_catalogo_categoria ON (FIND_IN_SET(ccc_id, cc_categoria) != 0)
JOIN cms_catalogo_sottocat ON (FIND_IN_SET(ccs_id, cc_sottocat) != 0)
LEFT JOIN cms_catalogo_order ON cco_id_prod=cc_id AND cco_id_lingua=1 AND cco_id_sottocat=ccs_id
WHERE ccc_nome='Alpine Skiing' AND ccf_nome='Ski'
I noticed that the first run takes 4.5 seconds on average, and then the query becomes fast.
I use FIND_IN_SET because in my database, table cms_catalogo has the columns cc_famiglia, cc_categoria, and cc_sottocat with IDs separated by commas inside (I know it's stupid).
Example:
Table cms_catalogo
Column cc_famiglia: 1,2,3,4,5
Table cms_catalogo_famiglia
Column ccf_id: 3
Could the slowdown in the query arise from using FIND_IN_SET that way?
Would it be faster if, instead of IDs separated by commas, I had a link table with an indexed ID column?
I also cannot explain why the first execution of the query is very slow and it then speeds up.
It is better to use proper relational connections between tables; i.e., connect them by primary key.
If you just want a quick optimisation for this query:
Check EXPLAIN SELECT ... in MySQL to see the performance of your query;
Add indexes for the columns ccc_id, ccf_id, ccs_id;
Check EXPLAIN SELECT ... again after the indexes are added.
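As a sketch of the normalized alternative, a junction table replaces the comma-separated cc_famiglia column; the table name cms_catalogo_famiglia_link below is made up, and the same pattern applies to cc_categoria and cc_sottocat:
CREATE TABLE cms_catalogo_famiglia_link (
  cc_id INT NOT NULL,
  ccf_id INT NOT NULL,
  PRIMARY KEY (cc_id, ccf_id),
  INDEX (ccf_id)
);
-- the FIND_IN_SET join then becomes an indexable equality join:
-- JOIN cms_catalogo_famiglia_link l ON l.cc_id = cms_catalogo.cc_id
-- JOIN cms_catalogo_famiglia ON ccf_id = l.ccf_id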
The first MySQL query takes much more time because it is a raw (uncached) query; the following runs are cached. So you should rely on the first query's time.
If it is not a complicated report, execution time should be less than 50-100 ms; otherwise you may get overall performance problems, because I am quite sure this is not the only query in your application.