I have this MySQL query that is very slow, I presume because of all the JOINs (it seems complicated, but it's a matter of lot of tables):
SELECT DISTINCT doctors.doc_id,
doctors.doc_user,
doctors.doc_first,
doctors.doc_last,
doctors.doc_email,
doctors.doc_notes,
titles.tit_name,
specializations.spe_name,
activities.act_name,
users.use_first,
users.use_last,
(SELECT COUNT(*) FROM locations WHERE locations.loc_doctor = doctors.doc_id) AS loc_count,
(SELECT COUNT(*) FROM reception WHERE reception.rec_doctor = doctors.doc_id) AS rec_count,
(SELECT COUNT(*) FROM visits INNER JOIN reports ON visits.vis_report = reports.rep_id WHERE visits.vis_doctor = doctors.doc_id AND reports.rep_user LIKE '%s') AS vis_count
FROM
doctors
INNER JOIN titles ON titles.tit_id = doctors.doc_title
INNER JOIN specializations ON specializations.spe_id = doctors.doc_specialization
INNER JOIN activities ON activities.act_id = doctors.doc_activity
LEFT JOIN locations ON locations.loc_doctor = doctors.doc_id
INNER JOIN users ON doctors.doc_user = users.use_id
WHERE
((doctors.doc_last LIKE %s) OR (doctors.doc_first LIKE %s) OR (doctors.doc_email LIKE %s))
AND doctors.doc_user LIKE %s
AND locations.loc_province LIKE %s
AND doctors.doc_specialization LIKE %s
AND doctors.doc_activity LIKE %s
ORDER BY %s
All the %s are parameters in a sprintf() PHP function
The most important thing to notice is... that I have NO indexes on MySQL! I presume that I can speed up the process adding some indexes... but what and where? There are so many joins and search parameters that I am in confusion about what would be efficient :-)
Please can you help?
Thanks in advance!
You can start with adding indexes on those columns you are using in the where condition.
Further, you should index those fields which are used in join, i.e. primary keys and foreign keys column.
I would suggest that gradually experimenting with these indexes would yield a real performance boost.
Further, I have observed that you are fetching too much of data in a single query. If it is really not required, break it up in different reports and pages (If possible) as even if you do proper indexing, the solution will not be quite scalable and may not handle large amount of data.
Note: You might have to create full text index on fields which you query by '%' qualifier.(i.e. use LIKE operator)
Like operators are pretty slow. Here is a discussion of applying indexes and FULL TEXT
mysql like performance boost
Related
I have a query that goes something like this.
Select *
FROM FaultCode FC
JOIN (
SELECT INNER_E.* FROM Equipment INNER_E
) E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN'
As you can see, in the outside query, there is a where clause in the outside main query.
But also, on the inside, we have an Inner Join statement with the line SELECT INNER_E.* FROM Equipment INNER_E. This inner join makes us only retrieve the fault codes that are inside the equipment table (correct me if I'm wrong).
I am trying to optimize this query.
My question is, does it make any difference to do this
Select *
FROM FaultCode FC
JOIN (
SELECT INNER_E.* FROM Equipment INNER_E
WHERE INNER_E.id_organization = 100057 AND INNER_E.equipment_status = 'ACTIVE'
) E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN'
So repeating the where clause inside the inner sub query, to further limit it before it joins. Or does the optimizer know to do this automatically?
I tried implementing that line in code, and it seemed to only make my query slower strangely enough. Is there any way I can optimize that query above, or since it's pretty simple, is that the best it's going to get without indexes?
I tried running the Explain Select statement, but I have a hard time parsing what it's telling me. Are there any good resources I can look into to learn some tips or techniques to optimize my query?
I don't have any aggregate functions in my Select fields. So is the only real answer Indexes?
Why is the first subquery needed? Perhaps simply
Select *
FROM FaultCode FC
JOIN Equipment AS E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type
AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057
AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN';
Likely Indexes:
FC: INDEX(code_status, EquipmentID)
E: INDEX(id_organization, equipment_status, EquipmentID,)
Probably unwise to do SELECT * -- It will give you all the columns of all 4 tables. (Without further details, I cannot suggest any "covering" indexes, which seems likely for AT.)
With my version of the query, your question about repeating the WHERE vanishes. With your version, it is likely to help. I don't think the Optimizer is smart enough to catch on to what you are doing.
Show us the EXPLAINs. We can help some with what the cryptic stuff is saying. (And what it is not saying.)
"the best it's going to get without indexes" -- Are you saying you have no indexes??! Not even a PRIMARY KEY for each table? "So is the only real answer Indexes?" Every time you write a query against a non-tiny table, you should ask "do the table(s) have adequate indexes for this query?"
I am tying to execute this query but it is taking more than 5 hours, but the data base size is just 20mb. this is my code. Here I am joining 11 tables with reg_id. I need all columns with distinct values. Please guide me how to rearrange the query.
SELECT *
FROM degree
JOIN diploma
ON degree.reg_id = diploma.reg_id
JOIN further_studies
ON diploma.reg_id = further_studies.reg_id
JOIN iti
ON further_studies.reg_id = iti.reg_id
JOIN personal_info
ON iti.reg_id = personal_info.reg_id
JOIN postgraduation
ON personal_info.reg_id = postgraduation.reg_id
JOIN puc
ON postgraduation.reg_id = puc.reg_id
JOIN skills
ON puc.reg_id = skills.reg_id
JOIN sslc
ON skills.reg_id = sslc.reg_id
JOIN license
ON sslc.reg_id = license.reg_id
JOIN passport
ON license.reg_id = passport.reg_id
GROUP BY fullname
Please help me if I did any mistake
This is a bit long for a comment.
The first problem with your query is that you are using select * with group by fullname. You have zillions of columns in the select that are not in the group by. Unless you really, really, really know what you are doing (which I doubt), this is the wrong way to write a query.
Your performance problem is undoubtedly due to cartesian products and lack of indexes. You are joining across different dimensions -- such as skills and degrees. The result is a product of all the possibilities. For some people, the data size can grow and grow and grow.
And then, the question is: do you have indexes on the keys used in the joins? For performance, you generally want such indexes.
I thought the problem is in the query.First make sure group by fullname and try to give some column names instead of *.
I have spent hours trying to get my query to run faster so far it works on my database.
however it takes 43 seconds to return my result. basically two tables are joined and I need to only return the latest order_history_id for each order_id with an order_status of 12.
I have tried using table shortcuts ie T1 T2 etc but to keep it simple my sql query has the relevant tables names below any help greatly appreciated
SELECT oc_order.order_id, oc_order.firstname, oc_order.lastname
FROM oc_order
INNER JOIN oc_order_history ON oc_order.order_id = oc_order_history.order_id
AND oc_order_history.comment NOT LIKE ''
AND oc_order_history.order_status_id LIKE '12'
AND order_history_id = (SELECT max(order_history_id)
FROM oc_order_history i
WHERE i.order_id = oc_order.order_id)
INNER JOIN is little bit more resources eating as it adds another filtering condition, i.e. it is really a LEFT JOIN + WHERE {joining column} IS NOT NULL. Having indexes on columns taking part in joins and/or where clauses (especially string values that are searched with LIKE) will help to minimise the resources cost.
In Your query You are using LIKE in WHERE clause but completely wrong thus I can say You are misusing it. LIKE is used when You need to search for strings but You only know part of the string, e.g.
name LIKE 'Pet%'
will match all the rows with names like Pete, Peter, Petra, Petronela, Petriarca, Peter Pan, etc... If You want to search for exact value, compare with comparator signs =, <>, <=, >=, e.g.
name = 'Peter'
Now consider also this query:
SELECT o.order_id, o.firstname, o.lastname
FROM oc_order o
LEFT JOIN oc_order_history oh USING(order_id)
WHERE oh.order_status_id = 12
AND oh.order_history_id IN (
SELECT MAX(order_history_id)
FROM oc_order_history
GROUP BY order_history_id
)
Do not use LIKE if you do not want to search a text inside of a field.
Also the do not put the WHERE conditions into the JOIN conditions.
In addition, check that you have SQL Indexes in the fields that you are joining and filtering on.
Update: after taking a look at your tables, I realised that oc_order_history.comment is a Text column, so search on this column is going to be very slow. If you try to remove the condition (like I did) I am sure your query it will be much faster. If you have to query for the comment column, try to change it to a VARCHAR or put and index.
Also you should have indexes at least in oc_order_history(order_status_id) and in oc_order_history(order_id)
SELECT oc_order.order_id, oc_order.firstname, oc_order.lastname
FROM oc_order
INNER JOIN oc_order_history ON oc_order.order_id = oc_order_history.order_id
WHERE oc_order_history.order_status_id = 12
AND order_history_id = (SELECT MAX(order_history_id)
FROM oc_order_history i
WHERE i.order_id = oc_order.order_id)
Ok, here we go. There's this messy SELECT crossing other tables and ordering to get the one desired row. Basically I do the "math" inside the ORDER BY.
1 base table.
7 JOINS poiting to local tables.
WHERE with 2 clauses and a NOT IN crossing another table.
You'll see in the code the ORDER BY is pretty damn big/ugly, it sums the result of 5 different calculations. I need that result to order by those calculations in order to get the worst row-case.
The problem is once I execute the Stored Procedure it takes up to 8 seconds to run. That's kind of non-acceptable. So, I'm starting to check Indexes.
So, I'm looking for advices on how to make this query run faster.
I'm indexing the WHERE clauses and the field LINEA, Should I index something else? Like the rows Im crossing for the JOINs? or should I approach the query differently?
Query:
SET #LINEA = (
SELECT TOP 1
BOA.LIN
FROM
BAND_BA BOA
LEFT JOIN
TEL PAR
ON REPLACE(BOA.Lin,'-','') = SUBSTRING(PAR.Te,2,10)
LEFT JOIN
TELP CLP
ON REPLACE(BOA.Lin,'-','') = SUBSTRING(CLP.Numtel,2,10)
LEFT JOIN
CA C
ON REPLACE(BOA.Lin,'-','') = C.An
LEFT JOIN
RE R
ON REPLACE(BOA.Lin,'-','') = R.Lin
LEFT JOIN
PRODUCTOS2 P2
ON BOA.PRODUCTO = P2.codigo
LEFT JOIN
EN
ON REPLACE(BOA.Lin,'-','') = EN.G
LEFT JOIN
TIP ID
ON TIPID = ID.ID
WHERE
BOA.EST = 'C' AND
ID.SE = 'boA' AND
BOA.LIN NOT IN (
SELECT
LIN
FROM
BAN
)
ORDER BY (EN.VALUE + ANT.VALUE + REIT.VAL + C.VALUE + TEL.VALUE
) DESC,
I'll be frank, this is some pretty terrible SQL. Without seeing all your table structures, advice here will be incomplete. That being said, please don't post all your table structures because you are already very close to "hire a consultant" territory with this.
All the REPLACE logic should be done away with. If you need to JOIN on these fields, then add comparable fields to the tables so you don't need to manipulate the data. Every single JOIN that uses a REPLACE or SUBSTRING is a table or index scan - those are non-SARGable and a definite anti-pattern.
The ORDER BY is probably the most convoluted ORDER BY I have ever seen. Some major issues there:
Subqueries should all be eliminated and materialized either in the outer query or as variables
String manipulation should be eliminated (see item 1 above)
The entire query is basically a code smell. If you need to write code like this to meet business requirements then you either have a terribly inappropriate design or some other much larger issue in the organization or data.
One thing that can kill performance is using a lot of LEFT JOINs. To improve performance of LEFT JOIN, you might want to make sure that the column(s) to which you join have an index - that can have a huge impact on performance.
I've currently got a query that selects metrics data from two tables whilst getting the projects to query from two other tables (one is owned projects, the other is projects to which the user has access).
SELECT v.`projectID`,
(SELECT COUNT(m.`session`)
FROM `metricData` m
WHERE m.`projectID` = v.`projectID`) AS `sessions`,
(SELECT COUNT(pb.`interact`)
FROM `interactionData` pb WHERE pb.`projectID` = v.`projectID` GROUP BY pb.`projectID`) AS `interactions`
FROM `medias` v
LEFT JOIN `projectsExt` pa ON v.`projectsExtID` = pa.`projectsExtID`
WHERE (pa.`user` = '1' OR v.`ownerUser` = '1')
GROUP BY v.`projectID`
It takes too long, 1-2seconds. This is obviously the multi left-join scenario. But, I've got a couple of ideas to improve speed and wondered what the thoughts were in principle. Do I:-
Try and select the list in the query and then get the data, rather than doing the joins. Not sure how this would work.
Do a select in a separate query to get the projectIDs and then run queries on each projectID afterwards. This may lead to hundreds of potentially thousands of requests, but may be better for the processing?
Other ideas?
There's two questions here:
how can I get my result in less than 2 seconds
how can I avoid a left join.
To answer #1 properly there has to be more information. Technical information, such as the explain plan for this particular query is a good start. Even better if we'd have the SHOW CREATE TABLE of all tables that you access, as well as the number of rows they contain.
But I'd also appreciate more functional information: what exactly is the question you're trying to answer? Right now, it seems you're looking at two different sets of medias:
either there is no matching row in projectsExt, in which case medias.ownerUser must equal '1' (is that '1' supposed to be a string btw?)
or there is exactly one mathching row in projectsExt for which projectsExt.user must equal '1' (is that '1' supposed to be a string btw?)
By lack of enough information to answer #1, I can answer #2 - "how to avoid a left join". Answer is: write a UNION of the two sets, one where there is a match and one where there isn't a match.
SELECT v.`projectID`
, (
SELECT COUNT(m.`session`)
FROM `metricData` m
WHERE m.`projectID` = v.`projectID`
) AS `sessions`
, (
SELECT COUNT(pb.`interact`)
FROM `interactionData` pb
WHERE pb.`projectID` = v.`projectID`
GROUP BY pb.`projectID`
) AS `interactions`
FROM (
SELECT v.projectID
FROM medias
WHERE ownerUser = '1'
GROUP BY projectID
UNION ALL
SELECT v.projectID
FROM medias v
INNER JOIN projectsExt pa
ON v.projectsExtID = pa.projectsExtID
WHERE v.ownerUser != '1'
AND pa.user = '1'
GROUP BY v.`projectID
) v
Have you tried, instead, to refactor everything into left joins? Seeing as how you're always grouping on the same field, it shouldn't be a problem. Try that and post an EXPLAIN to see what the bottlenecks are.
Subselects are less performant than joins, because the engine can optimize the joins to a much higher degree. In fact, subselects will usually, where applicable, be rewritten into joins by the engine where possible.
As a rule of a thumb, there is no gain in splitting queries, all you gain is overhead and confusing the optimizer. There are, as always, exceptions to this rule, but they come into play after you've done what you can traditionally and know you keen such an approach.