I am writing queries off tables in my AWS RDS MySQL server and I can't get the query to complete the fetching. The duration of the query is 5.328 seconds but then fetching just doesn't end. I have Left Joined a sub query. When I run the sub separately it runs very quick and almost has no fetch time. When I run the main query it works great. The main query does have about 97,000 rows. I'm new to AWS RDS Servers and wonder if there is a parameter adjustment I need to be made? I feel as if the query is pretty simple.
We are in the middle of switching from BigQuery and BigQuery runs it just fine with the same data and same query.
Any ideas of what I can do to get it to fetch and speed up the fetching?
I've tried indexing and changing buffer pool size but still no luck
FROM
project__c P
LEFT JOIN contact C ON C.id = P.salesperson__c
LEFT JOIN account A ON A.id = P.sales_team_account_at_sale__c
LEFT JOIN contact S ON S.id = P.rep_contact__c
LEFT JOIN (
SELECT
U.name,
C.rep_id__c EE_Code
FROM
user U
LEFT JOIN profile P ON P.id = U.profileid
LEFT JOIN contact C ON C.email = U.email
WHERE
(P.name LIKE "%Inside%"OR P.name LIKE "%rep%")
AND C.active__c = TRUE
AND C.rep_id__c IS NOT NULL
AND C.recordtypeid = "############"
) LC ON LC.name = P.is_rep_formula__c
You can analyze the query by adding EXPLAIN [your query] and running that to see what indexes are being used and how many rows are examined for each joined table. It might be joining a lot more rows than you expect.
You can try:
SELECT SQL_CALC_FOUND_ROWS [your query] limit 1;
If the problem is in sending too much data (i.e. more rows than you think it's trying to return), this will return quickly. It would prove that the problem does lie in sending the data, not in the query stage. If this is what you find, run the following next:
select FOUND_ROWS();
This will tell you how many total rows matched your query. If it's bigger than you expect, your joins are wrong in some way. The explain analyzer mentioned above should provide some insight. While the outer query has 97000 rows, each of those rows could join with more than one row from the subquery. This can happen if you expect the left side of the join to always have a value, but find out there are rows in which it is empty/null. This can lead to a full cross join where every row from the left joins with every row on the right.
If the limit 1 query also hangs, the problem is in the query itself. Again, the explain analyzer should tell you where the problem lies. Most likely it's a missing index causing very slow scans of tables instead of fast lookups in both tables joins and where clauses.
The inner query might be fine/fast on its own. When you join with another query, if not indexed/joined properly, it could lead to a result set and/or query time many times larger.
If missing indices with derived tables is the problem, read up on how mysql can optimize via different settings by visiting https://dev.mysql.com/doc/refman/5.7/en/derived-table-optimization.html
As seen in a comment to your question, creating a temp table and joining directly gives you control instead of relying on/hoping mysql to optimize your query in a way that uses fast indices.
I'm not versed in BigQuery, but unless it's running the same core mysql engine under the hood, it's not really a good comparison.
Why all the left joins. Your subquery seems to be targeted to a small set of results, yet you still want to get 90k+ rows? If you are using this query to render any sort of list in a web application, apply reasonable limits and pagination.
Related
I am trying to fetch data with a query which has left joins on multiple tables. The query returns 4132 rows and takes 4.55 sec duration, 31.30 sec fetch in mysql workbench. I even tried to execute it from php but that takes the same amount of time.
SELECT
aa.bams_id AS chassis_bams_id, aa.hostname AS chassis_hostname,
aa.rack_number AS chassis_rack, aa.serial_number AS chassis_serial, aa.site_id AS chassis_site_id,
cb.bay_number, cb.bsn AS serial_number_in_bay,
CASE
WHEN a_a.bams_id IS NULL THEN 'Unknown'
ELSE a_a.bams_id
END AS blade_bams_id,
a_a.hostname AS blade_hostname, a_s.description AS blade_status, a_a.manufacturer AS blade_manufacturer, a_a.model AS blade_model,
a_a.bookable_unit_id AS blade_bookable_unit_id, a_a.rack_number AS blade_rack_number, a_a.manufactured_date AS blade_manufactured_date,
a_a.support_expired_date AS blade_support_expired_date, a_a.site_id AS blade_site_id
FROM all_assets aa
LEFT JOIN manufacturer_model mm ON aa.manufacturer = mm.manufacturer AND aa.model = mm.model
LEFT JOIN chassis_bays cb ON aa.bams_id = cb.chassis_bams_id
LEFT JOIN all_assets a_a ON cb.bsn = a_a.serial_number
LEFT JOIN asset_status a_s ON a_a.status=a_s.status
WHERE mm.hardware_type = 'chassis';
These are the definition of tables being used:
Output of EXPLAIN:
The query fetches data of each blade in each chassis.
Executed the same query on other systems and it takes only 5 seconds to fetch the result.
How do I optimize this query ?
Update (resolved)
Added indexes as suggested by experts here.
Below is the execution plan after adding indexes.
Create indexes, non indexed reads are slower than index reads.
To determine exactly what is causing you slow performance, the best tool
is to use the "query plan analyzer":
Have a look here:
mySql performance explain
Try creating indexes on the most obvious reads that will take place.Look at the fields that play a role when you join and also your where clause. If you have indexes on those fields your performance should increase, if non index reads were taking place.
If this is still a problem it is best to see how mySQL is fetching the data,
sometimes restructuring your data or even maybe altering the way you are queuing
can give you better results.
eg. Create index's for: aa.manufacturer_model, aa.manufacturer, aa.model and mm.hardware_type
I have a sql below, it runs about 30 mins, which is too long for me.
SELECT LPP.learning_project_pupilID, SL.serviceID, MAX(LPPO.start_date), SUM(LPPOT.license_mode_value) totalAssignedLicenses
FROM t_services_licenses SL
INNER JOIN t_pupils_offers_services POS ON POS.service_licenseID = SL.service_licenseID
INNER JOIN j_learning_projects_pupils_offers LPPO ON LPPO.learning_project_pupil_offerID = POS.learning_project_pupil_offerID
INNER JOIN j_learning_projects_pupils LPP ON LPPO.learning_project_pupilID = LPP.learning_project_pupilID
INNER JOIN j_learning_projects_pupils_offers_tracking LPPOT ON LPPOT.pupil_offer_serviceID = POS.pupil_offer_serviceID
INNER JOIN t_filters_items FI ON FI.itemID = LPP.learning_project_pupilID_for_filter_join
WHERE FI.filterID = '4dce2235-aafd-4ba2-b248-c137ad6ce8ca'
AND SL.serviceID IN ('OnlineConversationClasses', 'TwentyFourSeven')
GROUP BY LPP.learning_project_pupilID, SL.serviceID
The explain result below(tell me if you can't view the image):
http://images0.cnblogs.com/blog2015/47012/201508/140920298959608.png
I have viewed the profile result, "copying temp data " wasted almost all the time. I know the reason is caused by "group by" functionality, So I did some changes below to verify it:
I removed the MAX, SUM functions as well as the Group By sql and ran it, the time is only cost about 40 seconds, which is ok for us.
So here , I want to know, if there are some other methods to make above sql execute much more faster?
more info, you can find here: http://www.cnblogs.com/scy251147/p/4728995.html
EDIT:
From the explain view, I can see that in t_filters_items table, there are about 50802 rows filtered, And this table is not luckily Using temporary to store temp data, which is not a good choice for me . I really don't like "Group By" in MySQL very much.
Do not use CHARACTER SET utf8 on UUID columns. Change to ascii. Further discussion of uuids and how to further shrink them: http://mysql.rjweb.org/doc.php/uuid
Are there really 50K rows with FI.filterID = '4dce2235-aafd-4ba2-b248-c137ad6ce8ca'?
The GROUP BY spans two table (LPP and SL) making it impossible to optimize. Can that be changed?
The SUM(...) is likely to have a bigger value than you expect. This is because of the JOINs. Try to rewrite the computation of the SUM in a subquery.
Are you using InnoDB? Is innodb_buffer_pool_size set to about 70% of available RAM?
Approximately how many rows in each table?
I have 2 tables:
Service_BD:
LOB:
I have a requirement now to drop the redundant columns in LOB table like industryId etc. and use Service_BD table to fetch the LOBs for industryId and then get the details of the particular LOB using LOB table.
I am trying to get a single SQL query using Inner Joins but the results are odd.
When I run a simple SQL query like this:
SELECT industryId, LobId
FROM Service_BD
WHERE industryId = 'I01'
GROUP BY lobId
The results are 9 rows:
Now, I would like to join rest of the LOB columns (minus the dropped ones of course) to get the LOB details out of it. So I use the below query:
SELECT *
FROM LOB
INNER JOIN Service_BD ON Service_BD.lobId = LOB.lobId
WHERE Service_BD.industryId = 'I01'
GROUP BY Service_BD.lobID
I am getting the desired results but I have a doubt if this is the most efficient way or not. I doubt because, both Service_BD and LOB tables have huge amount of data, but I have a feeling that if GROUP BY Service_BD.lobID is performed first that would reduce the time complexity of WHERE condition.
Just wanted to know if this is the right way to write the query or are there any better ways to do the same.
You haven't mentioned which DB engine you are using so I guess you are using MySQL. In most cases the GROUP BY will be done only on the rows meeting the WHERE condition. So the GROUP BY is performed only on the fetched result of both the INNER JOIN and the WHERE clause.
I don't think
SELECT *
FROM LOB INNER
JOIN Service_BD ON Service_BD.lobId = LOB.lobId
WHERE Service_BD.industryId = 'I01'
GROUP BY Service_BD.lobID
improves the performance of your query but it certainly eliminates duplicate lobID from your result. Also, I don't see any other better way to eliminate duplicates except introducing the HAVING clause but I don't think it's going to improve the performance of your query.
I have multiple joins including left joins in mysql. There are two ways to do that.
I can put "ON" conditions right after each join:
select * from A join B ON(A.bid=B.ID) join C ON(B.cid=C.ID) join D ON(c.did=D.ID)
I can put them all in one "ON" clause:
select * from A join B join C join D ON(A.bid=B.ID AND B.cid=C.ID AND c.did=D.ID)
Which way is better?
Is it different if I need Left join or Right join in my query?
For simple uses MySQL will almost inevitably execute them in the same manner, so it is a manner of preference and readability (which is a great subject of debate).
However with more complex queries, particularly aggregate queries with OUTER JOINs that have the potential to become disk and io bound - there may be performance and unseen implications in not using a WHERE clause with OUTER JOIN queries.
The difference between a query that runs for 8 minutes, or .8 seconds may ultimately depend on the WHERE clause, particularly as it relates to indexes (How MySQL uses Indexes): The WHERE clause is a core part of providing the query optimizer the information it needs to do it's job and tell the engine how to execute the query in the most efficient way.
From How MySQL Optimizes Queries using WHERE:
"This section discusses optimizations that can be made for processing
WHERE clauses...The best join combination for joining the tables is
found by trying all possibilities. If all columns in ORDER BY and
GROUP BY clauses come from the same table, that table is preferred
first when joining."
For each table in a join, a simpler WHERE is constructed to get a fast
WHERE evaluation for the table and also to skip rows as soon as
possible
Some examples:
Full table scans (type = ALL) with NO Using where in EXTRA
[SQL] SELECT cr.id,cr2.role FROM CReportsAL cr
LEFT JOIN CReportsCA cr2
ON cr.id = cr2.id AND cr.role = cr2.role AND cr.util = 1000
[Err] Out of memory
Uses where to optimize results, with index (Using where,Using index):
[SQL] SELECT cr.id,cr2.role FROM CReportsAL cr
LEFT JOIN CReportsCA cr2
ON cr.id = cr2.id
WHERE cr.role = cr2.role
AND cr.util = 1000
515661 rows in set (0.124s)
****Combination of ON/WHERE - Same result - Same plan in EXPLAIN*******
[SQL] SELECT cr.id,cr2.role FROM CReportsAL cr
LEFT JOIN CReportsCA cr2
ON cr.id = cr2.id
AND cr.role = cr2.role
WHERE cr.util = 1000
515661 rows in set (0.121s)
MySQL is typically smart enough to figure out simple queries like the above and will execute them similarly but in certain cases it will not.
Outer Join Query Performance:
As both LEFT JOIN and RIGHT JOIN are OUTER JOINS (Great in depth review here) the issue of the Cartesian product arises, the avoidance of Table Scans must be avoided, so that as many rows as possible not needed for the query are eliminated as fast as possible.
WHERE, Indexes and the query optimizer used together may completely eliminate the problems posed by cartesian products when used carefully with aggregate functions like AVERAGE, GROUP BY, SUM, DISTINCT etc. orders of magnitude of decrease in run time is achieved with proper indexing by the user and utilization of the WHERE clause.
Finally
Again, for the majority of queries, the query optimizer will execute these in the same manner - making it a manner of preference but when query optimization becomes important, WHERE is a very important tool. I have seen some performance increase in certain cases with INNER JOIN by specifying an indexed col as an additional ON..AND ON clause but I could not tell you why.
Put the ON clause with the JOIN it applies to.
The reasons are:
readability: others can easily see how the tables are joined
performance: if you leave the conditions later in the query, you'll get way more joins happening than need to - it's like putting the conditions in the where clause
convention: by following normal style, your code will be more portable and less likely to encounter problems that may occur with unusual syntax - do what works
I came across writing the query in differnt ways like shown below
Type-I
SELECT JS.JobseekerID
, JS.FirstName
, JS.LastName
, JS.Currency
, JS.AccountRegDate
, JS.LastUpdated
, JS.NoticePeriod
, JS.Availability
, C.CountryName
, S.SalaryAmount
, DD.DisciplineName
, DT.DegreeLevel
FROM Jobseekers JS
INNER
JOIN Countries C
ON JS.CountryID = C.CountryID
INNER
JOIN SalaryBracket S
ON JS.MinSalaryID = S.SalaryID
INNER
JOIN DegreeDisciplines DD
ON JS.DegreeDisciplineID = DD.DisciplineID
INNER
JOIN DegreeType DT
ON JS.DegreeTypeID = DT.DegreeTypeID
WHERE
JS.ShowCV = 'Yes'
Type-II
SELECT JS.JobseekerID
, JS.FirstName
, JS.LastName
, JS.Currency
, JS.AccountRegDate
, JS.LastUpdated
, JS.NoticePeriod
, JS.Availability
, C.CountryName
, S.SalaryAmount
, DD.DisciplineName
, DT.DegreeLevel
FROM Jobseekers JS, Countries C, SalaryBracket S, DegreeDisciplines DD
, DegreeType DT
WHERE
JS.CountryID = C.CountryID
AND JS.MinSalaryID = S.SalaryID
AND JS.DegreeDisciplineID = DD.DisciplineID
AND JS.DegreeTypeID = DT.DegreeTypeID
AND JS.ShowCV = 'Yes'
I am using Mysql database
Both works really well, But I am wondering
which is best practice to use all time for any situation?
Performance wise which is better one?(Say the database as a millions records)
Any advantages of one over the other?
Is there any tool where I can check which is better query?
Thanks in advance
1- It's a no brainer, use the Type I
2- The type II join are also called 'implicit join', whereas the type I are called 'explicit join'. With modern DBMS, you will not have any performance problem with normal query. But I think with some big complex multi join query, the DBMS could have issue with the implicit join. Using explicit join only could improve your explain plan, so faster result !
3- So performance could be an issue, but most important maybe, the readability is improve for further maintenance. Explicit join explain exactly what you want to join on what field, whereas implicit join doesn't show if you make a join or a filter. The Where clause is for filter, not for join !
And a big big point for explicit join : outer join are really annoying with implicit join. It is so hard to read when you want multiple join with outer join that explicit join are THE solution.
4- Execution plan are what you need (See the doc)
Some duplicates :
Explicit vs implicit SQL joins
SQL join: where clause vs. on clause
INNER JOIN ON vs WHERE clause
in the most code i've seen, those querys are done like your Type-II - but i think Type-I is better because of readability (and more logic - a join is a join, so you should write it as a join (althoug the second one is just another writing style for inner joins)).
in performance, there shouldn't be a difference (if there is one, i think the Type-I would be a bit faster).
Look at "Explain"-syntax
http://dev.mysql.com/doc/refman/5.1/en/explain.html
My suggestion.
Update all your tables with some amount of records. Access the MySQL console and run SQL both command one by one. You can see the time execution time in the console.
For the two queries you mentioned (each with only inner joins) any modern database's query optimizer should produce exactly the same query plan, and thus the same performance.
For MySQL, if you prefix the query with EXPLAIN, it will spit out information about the query plan (instead of running the query). If the information from both queries is the same, them the query plan is the same, and the performance will be identical. From the MySQL Reference Manual:
EXPLAIN returns a row of information
for each table used in the SELECT
statement. The tables are listed in
the output in the order that MySQL
would read them while processing the
query. MySQL resolves all joins using
a nested-loop join method. This means
that MySQL reads a row from the first
table, and then finds a matching row
in the second table, the third table,
and so on. When all tables are
processed, MySQL outputs the selected
columns and backtracks through the
table list until a table is found for
which there are more matching rows.
The next row is read from this table
and the process continues with the
next table.
When the EXTENDED keyword is used,
EXPLAIN produces extra information
that can be viewed by issuing a SHOW
WARNINGS statement following the
EXPLAIN statement. This information
displays how the optimizer qualifies
table and column names in the SELECT
statement, what the SELECT looks like
after the application of rewriting and
optimization rules, and possibly other
notes about the optimization process.
As to which syntax is better? That's up to you, but once you move beyond inner joins to outer joins, you'll need to use the newer syntax, since there's no standard for describing outer joins using the older implicit join syntax.