need some help optimizing a stored proc - sql-server-2008

I have a stored procedure which builds a dynamic SQL query and then runs it via exec(@sql).
The stored proc joins about 12 tables. As it was, it ran relatively quickly. But then I needed to add an additional field. To do this, I created a scalar function, which looks like this:
SELECT @weight = @weight + COUNT(*) FROM dbo.UserPDMedication WHERE UserID = @userid
SELECT @weight = @weight + COUNT(*) FROM dbo.[User] WHERE UserID = @userid AND HoehnYarhID IS NOT NULL
SELECT @weight = @weight + COUNT(*) FROM dbo.[User] WHERE UserID = @userid AND DateOfBirth IS NOT NULL
SELECT @weight = @weight + COUNT(*) FROM dbo.[User] WHERE UserID = @userid AND GenderID IS NOT NULL
SELECT @weight = @weight + COUNT(*) FROM dbo.[User] WHERE UserID = @userid AND DateDiagnosed IS NOT NULL
It's basically just a function that will return an int based on how many questions a user has filled out. So for each user in the stored proc, this function gets called. The stored proc looks like this:
SELECT DISTINCT u.UserID, u.Healthy, u.DateOfBirth, u.City, st.StateCode AS State, u.GenderID, g.Gender, u.Latitude, u.Longitude, u.PDConditionID, u.Zip, u.Distance,
(SELECT TOP 1 EmailID FROM Messages m WHERE TrialID = ' + @trialID + ' AND ToUserID = u.userid AND LocationID = ' + @locationID + ') AS MessageID, dbo.UserWeightedValue(u.UserID) as wt
FROM [User] u
INNER JOIN aspnet_UsersInRoles uir ON u.AspnetUserID = uir.UserId
INNER JOIN aspnet_Roles r ON uir.RoleId = r.RoleId
FULL JOIN UserHealthCondition uhc ON u.UserID = uhc.UserID
FULL JOIN UserMotorSymptom ums ON u.UserID = ums.UserID
FULL JOIN UserNonMotorSymptom unms ON u.UserID = unms.UserID
FULL JOIN UserPDMedication updm ON u.UserID = updm.UserID
FULL JOIN UserPDTreatment updt ON u.UserID = updt.UserID
FULL JOIN UserSupplement us ON u.UserID = us.UserID
FULL JOIN UserPDGeneticMarker updgm ON u.UserID = updgm.UserID
FULL JOIN UserFamilyMember ufm ON u.UserID = ufm.UserID
FULL JOIN State st ON u.StateID = st.ID
FULL JOIN Gender g ON u.GenderID = g.ID
WHERE u.UserID IS NOT NULL
(I removed some chunks to try and keep this short.) This gets executed as a dynamic string in the stored proc. Any tips on how I can optimize this to speed things up?
Thanks
EDIT: I got this working using a combination of the suggestions here. I kept my function as is, although I combined the multiple select statements into 2 statements. I then took the original stored proc and changed the select to a select into ##temp, and ran my function against that temp table. Execution time dropped to 3-4 seconds. I think I will have to give credit to Grant for this question, since it was his pointing out the DISTINCT that put me on the right trail. But thank you to everyone.

The DISTINCT is absolutely going to cause a performance hit as it does aggregations. Do you really need it? Frequently when you see DISTINCT it's an indication of a data or structural issue that is getting papered over by the ability to eliminate duplicates that the structure should eliminate on its own.
After that, instead of a correlated query in the SELECT list, I'd look to move that as a JOIN. It's not a sure fire win, but frequently the optimizer is better able to work that into the plan.
Based on the complexity of what you're presenting, I'd also look at the execution plan. First thing to check: did you get a full optimization, or did it time out? If it timed out, then you're dealing with a best guess, not a fully calculated "good enough" plan. If that's so, you need to look at simplifying this query. If you have a good enough plan, see where the bottlenecks are within it.

If UserID is the primary key of the table User, then there is no need to run one SELECT per question filled in by the user; COUNT(column) only counts non-NULL values, so you can wrap it all in just one SELECT:
SELECT @weight = @weight + COUNT(HoehnYarhID) + COUNT(DateOfBirth) + COUNT(GenderID) + COUNT(DateDiagnosed)
FROM dbo.[User]
WHERE UserID = @userid
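As a rough check of why the single SELECT is equivalent, here is a minimal sketch that uses Python's sqlite3 module as a stand-in for SQL Server. The sample rows and the helper name answered_count are made up for illustration; only the column names come from the question.

```python
import sqlite3

# SQLite stands in for SQL Server here; the table is cut down to the
# four nullable profile columns named in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "User" ('
    " UserID INTEGER PRIMARY KEY,"
    " HoehnYarhID INTEGER, DateOfBirth TEXT,"
    " GenderID INTEGER, DateDiagnosed TEXT)"
)
conn.executemany(
    'INSERT INTO "User" VALUES (?, ?, ?, ?, ?)',
    [
        (1, 3, "1950-01-01", None, None),  # two questions answered
        (2, None, None, None, None),       # nothing answered
    ],
)


def answered_count(user_id):
    """Hypothetical helper: number of non-NULL profile columns for one user."""
    # COUNT(col) skips NULLs, so each term is 1 or 0 when UserID is unique.
    row = conn.execute(
        "SELECT COUNT(HoehnYarhID) + COUNT(DateOfBirth)"
        "     + COUNT(GenderID) + COUNT(DateDiagnosed)"
        ' FROM "User" WHERE UserID = ?',
        (user_id,),
    ).fetchone()
    return row[0]


print(answered_count(1), answered_count(2))
```

A user with no row at all also yields 0, because COUNT over an empty set is 0 rather than NULL.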

Convert the scalar valued function into an inline table valued function.
Scalar functions, inlining, and performance

Related

SQL IF ELSE WITH MULTIPLE SELECT STATEMENT

I want to optimize these SQL queries using if-else, but how should I use it?
if this query result contain 'ALL'
SELECT
bdsubcategory.subcategoryID as ID,
bdsubcategory.subcategoryName as Name
FROM
phonebook.newsms_subscription
INNER JOIN bdsubcategory ON bdsubcategory.subcategoryID = newsms_subscription.subcategoryID
INNER JOIN newsms_client ON newsms_subscription.clientID =newsms_client.clientID
INNER JOIN newsms_person ON newsms_subscription.personID = newsms_person.personID
WHERE
newsms_subscription.isActive = 1 AND
newsms_person.personID = '856'
Then i want to query this
SELECT
bdsubcategory.subcategoryID as ID,
bdsubcategory.subcategoryName as Name
FROM
phonebook.newsms_subscription
INNER JOIN bdsubcategory ON bdsubcategory.subcategoryID = newsms_subscription.subcategoryID
INNER JOIN newsms_person ON newsms_subscription.personID = newsms_person.personID
WHERE
newsms_subscription.isActive = 1
GROUP BY subcategoryName
ORDER BY subcategoryName
otherwise take query1 result .
The problem is that, unless you refactor your project, you always have to evaluate query1 to see whether it contains ALL. If it does contain ALL, then you need to evaluate query2 as well. This can hardly be optimized, but let's see a few approaches:
Quickening query1
Since ALL might not be the very last evaluated element, adding it to the filter and limiting the result is a good way to quicken query1:
SELECT
COUNT(*)
FROM
phonebook.newsms_subscription
INNER JOIN bdsubcategory ON bdsubcategory.subcategoryID = newsms_subscription.subcategoryID
INNER JOIN newsms_client ON newsms_subscription.clientID =newsms_client.clientID
INNER JOIN newsms_person ON newsms_subscription.personID = newsms_person.personID
WHERE
newsms_subscription.isActive = 1 AND
newsms_person.personID = '856' AND
bdsubcategory.subcategoryName = 'ALL'
LIMIT 0, 1
So, you could create a stored procedure which evaluates query1' (query1' is the quickened version of query1, as seen above). If the count is nonzero, the result of query1 contains ALL and we need to execute query2. Otherwise we need to execute query1. This way you still execute two queries, but the first query is optimized.
Refactoring
Note that the second query does not change. You could create a table where you cache its results, using a periodic job. Then, you could reduce the second query to
SELECT ID, Name
FROM MyNewTable;
without the many joins. You would also cache the results of the first query into a table where the items having ALL would be stored and query that table.
One option would be to use a CASE.
Change this:
newsms_person.personID = '856'
To this:
'Y' = CASE WHEN UPPER('856') = 'ALL' THEN 'Y'
WHEN newsms_person.personID = '856' THEN 'Y'
ELSE 'N' END
Alternatively, a stored procedure could be used to first validate whether the personID seems valid, then returns the appropriate data.

How to optimize a query with inner join

My MySQL query is too slow and I don't know how to optimize it. My webapp can't load this query because it takes too long to run and the webserver has a time limit for getting the result.
SELECT rc.trial_id,
rc.created,
rc.date_registration,
rc.agemin_value,
rc.agemin_unit,
rc.agemax_value,
rc.agemax_unit,
rc.exclusion_criteria,
rc.study_design,
rc.expanded_access_program,
rc.number_of_arms,
rc.enrollment_start_actual,
rc.target_sample_size,
(select name from repository_institution where id = rc.primary_sponsor_id) as
primary_sponsor,
(select label from vocabulary_studytype where id = rc.study_type_id) as study_type,
(select label from vocabulary_interventionassigment where id =
rc.intervention_assignment_id) as intervention_assignment,
(select label from vocabulary_studypurpose where id = rc.purpose_id) as study_purpose,
(select label from vocabulary_studymasking where id = rc.masking_id) as study_mask,
(select label from vocabulary_studyallocation where id = rc.allocation_id) as
study_allocation,
(select label from vocabulary_studyphase where id = rc.phase_id) as phase,
(select label from vocabulary_recruitmentstatus where id = rc.recruitment_status_id) as
recruitment_status,
GROUP_CONCAT(vi.label)
FROM
repository_clinicaltrial rc
inner JOIN repository_clinicaltrial_i_code rcic ON rcic.clinicaltrial_id = rc.id JOIN
vocabulary_interventioncode vi ON vi.id = rcic.interventioncode_id
GROUP BY rc.id;
Could using INNER JOIN instead of JOIN be a solution?
Changing to JOINs instead of running a separate select for every row will definitely improve things. Also, since you are using MySQL, the keyword "STRAIGHT_JOIN" tells MySQL to do the query in the order I provided. Since your "rc" table is the primary and all the others are lookups, this will make MySQL use it in that context rather than hoping some other lookup table becomes the basis of the rest of the joins.
SELECT STRAIGHT_JOIN
rc.trial_id,
rc.created,
rc.date_registration,
rc.agemin_value,
rc.agemin_unit,
rc.agemax_value,
rc.agemax_unit,
rc.exclusion_criteria,
rc.study_design,
rc.expanded_access_program,
rc.number_of_arms,
rc.enrollment_start_actual,
rc.target_sample_size,
ri.name primary_sponsor,
st.label study_type,
via.label intervention_assignment,
vsp.label study_purpose,
vsm.label study_mask,
vsa.label study_allocation,
vsph.label phase,
vrs.label recruitment_status,
GROUP_CONCAT(vi.label)
FROM
repository_clinicaltrial rc
JOIN repository_clinicaltrial_i_code rcic
ON rc.id = rcic.clinicaltrial_id
JOIN vocabulary_interventioncode vi
ON rcic.interventioncode_id = vi.id
JOIN repository_institution ri
on rc.primary_sponsor_id = ri.id
JOIN vocabulary_studytype st
on rc.study_type_id = st.id
JOIN vocabulary_interventionassigment via
on rc.intervention_assignment_id = via.id
JOIN vocabulary_studypurpose vsp
ON rc.purpose_id = vsp.id
JOIN vocabulary_studymasking vsm
ON rc.masking_id = vsm.id
JOIN vocabulary_studyallocation vsa
ON rc.allocation_id = vsa.id
JOIN vocabulary_studyphase vsph
ON rc.phase_id = vsph.id
JOIN vocabulary_recruitmentstatus vrs
ON rc.recruitment_status_id = vrs.id
GROUP BY
rc.id;
One final note. You are using a GROUP BY together with GROUP_CONCAT(), which is OK. However, a proper GROUP BY lists all non-aggregate columns, which in this case is every other column in the select list. You may know this, and know that the lookups will be the same for a given "rc" row, but relying on that is not good practice.
Your joins and subqueries are probably not the problem. Assuming you have correct indexes on the tables, then these are fast. "Correct indexes" means that the id column is the primary key -- a very reasonable assumption.
My guess is that the GROUP BY is the performance issue. So, I would suggest structuring the query with no GROUP BY:
select . . .
(select group_concat(vi.label)
from repository_clinicaltrial_i_code rcic
join vocabulary_interventioncode vi
on vi.id = rcic.interventioncode_id
where rcic.clinicaltrial_id = rc.id
)
from repository_clinicaltrial rc ;
For this, you want indexes on:
repository_clinicaltrial_i_code(clinicaltrial_id, interventioncode_id)
vocabulary_interventioncode(id, label)
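To make the difference between the two shapes concrete, here is a small sketch run against SQLite through Python's sqlite3 module (an assumption: SQLite's group_concat stands in for MySQL's GROUP_CONCAT, and the tables are trimmed to the columns the query touches). The sample rows are invented. It shows that the correlated-subquery form keeps trials that have no intervention codes, returning NULL for them, while the inner-join plus GROUP BY form drops them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repository_clinicaltrial (id INTEGER PRIMARY KEY, trial_id TEXT);
CREATE TABLE vocabulary_interventioncode (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE repository_clinicaltrial_i_code (
    clinicaltrial_id INTEGER, interventioncode_id INTEGER);
-- Hypothetical sample data: trial 2 has no intervention codes.
INSERT INTO repository_clinicaltrial VALUES (1, 'T-1'), (2, 'T-2');
INSERT INTO vocabulary_interventioncode VALUES (10, 'drug'), (11, 'device');
INSERT INTO repository_clinicaltrial_i_code VALUES (1, 10), (1, 11);
""")

# Correlated subquery: one output row per trial, no outer GROUP BY.
subquery_rows = conn.execute("""
    SELECT rc.trial_id,
           (SELECT group_concat(vi.label)
              FROM repository_clinicaltrial_i_code rcic
              JOIN vocabulary_interventioncode vi
                ON vi.id = rcic.interventioncode_id
             WHERE rcic.clinicaltrial_id = rc.id) AS labels
      FROM repository_clinicaltrial rc
     ORDER BY rc.id
""").fetchall()

# Inner joins plus GROUP BY: trials without codes disappear.
group_by_rows = conn.execute("""
    SELECT rc.trial_id, group_concat(vi.label) AS labels
      FROM repository_clinicaltrial rc
      JOIN repository_clinicaltrial_i_code rcic ON rcic.clinicaltrial_id = rc.id
      JOIN vocabulary_interventioncode vi ON vi.id = rcic.interventioncode_id
     GROUP BY rc.id
     ORDER BY rc.id
""").fetchall()

print(subquery_rows)
print(group_by_rows)
```

Whether the extra NULL rows are wanted depends on the application, so the two forms are only interchangeable when every trial has at least one code.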

Retrieving data from one table while comparing value from another table

Overview
I have two tables as can be seen below:
user_planes
--------------------------------------------
| id | user_id | plane_id | fuel | status |
--------------------------------------------
| 2  | 1       | 1        | 1    | Ready  |
--------------------------------------------
shop_planes
-----------------------------
| id | name | fuel_capacity |
-----------------------------
| 1  | bob  | 3             |
-----------------------------
Foreign Key <-> Primary Key
user_planes.plane_id <-> shop_planes.id
I want to be able to get every field (SELECT *) in user_planes and name and fuel_capacity based on the following criteria:
WHERE user_planes.user_id = ? - Parameter which will be added to the query through PHP.
WHERE user_planes.status = 'Ready'
WHERE user_planes.fuel < shop_planes.fuel_capacity
The Issue and My Attempts
I've tried JOIN however it retrieves data which doesn't fit that criteria, meaning it gets extra data which is from shop_planes and not user_planes.
SELECT * FROM `user_planes` WHERE fuel IN (SELECT shop_planes.fuel_capacity FROM shop_planes WHERE fuel < shop_planes.fuel_capacity) AND user_planes.user_id = 1 AND status = 'Ready'
and
SELECT * FROM `user_planes` INNER JOIN `shop_planes` ON user_planes.fuel < shop_planes.fuel_capacity AND user_planes.user_id = 1 AND user_planes.status = 'Ready'
I've searched Stack Overflow and looked through many questions but I've not been able to figure it out.
I've looked up many tutorials but still can't get the desired result.
The desired result is that the query should use the data stored in user_planes to retrieve data from shop_planes while at the same time not getting any excess data from shop_planes.
Disclaimer
I really struggle using JOIN queries, I could use multiple separate queries however I wish to optimise my queries hence I'm trying to bring it in to one query.
If the question isn't clear, please do say and I'll update it to the best of my ability.
Note - Is there an easy query builder option available either through phpmyadmin or an alternative software?
Thanks in advance.
Your last attempt was not a bad one, the only thing you missed there was the join criteria you described at the beginning of your post. I also moved the other filters to the where clause to better distinguish between join condition and the filters.
SELECT `user_planes`.*
FROM `user_planes`
INNER JOIN `shop_planes` ON user_planes.plane_id = shop_planes.id
WHERE user_planes.fuel < shop_planes.fuel_capacity AND user_planes.user_id = 1 AND user_planes.status = 'Ready'
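The effect of the missing join criterion can be reproduced with a minimal sketch in Python's sqlite3 module (SQLite standing in for MySQL). The second shop_planes row is hypothetical, added only so the difference shows up; the select list also pulls name and fuel_capacity, as the question asked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shop_planes (
    id INTEGER PRIMARY KEY, name TEXT, fuel_capacity INTEGER);
CREATE TABLE user_planes (
    id INTEGER PRIMARY KEY, user_id INTEGER, plane_id INTEGER,
    fuel INTEGER, status TEXT);
INSERT INTO shop_planes VALUES (1, 'bob', 3), (2, 'alice', 5);
INSERT INTO user_planes VALUES (2, 1, 1, 1, 'Ready');
""")

# Joining only on the fuel comparison pairs the user's plane with
# every shop plane that has a larger capacity.
without_key = conn.execute("""
    SELECT up.*, sp.name, sp.fuel_capacity
      FROM user_planes up
      JOIN shop_planes sp ON up.fuel < sp.fuel_capacity
     WHERE up.user_id = ? AND up.status = 'Ready'
     ORDER BY sp.id
""", (1,)).fetchall()

# Joining on the foreign key first restricts each user plane to its
# own shop plane; the fuel comparison then acts as a filter.
with_key = conn.execute("""
    SELECT up.*, sp.name, sp.fuel_capacity
      FROM user_planes up
      JOIN shop_planes sp ON up.plane_id = sp.id
     WHERE up.fuel < sp.fuel_capacity
       AND up.user_id = ? AND up.status = 'Ready'
""", (1,)).fetchall()

print(len(without_key), with_key)
```

The user_id is passed as a bound parameter, which mirrors the ? placeholder the question intends to fill in from PHP.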
First you need the base JOIN
SELECT up.* -- only user_plane fields
FROM shop_planes sp -- CREATE alias for table or field
JOIN user_planes up
ON sp.id = up.plane_id
Case 1: apply a filter in where condition with php parameter.
SELECT up.*
FROM shop_planes sp
JOIN user_planes up
ON sp.id = up.plane_id
WHERE up.user_id = ?
Case 2: apply a filter in where condition with string constant
SELECT up.*
FROM shop_planes sp
JOIN user_planes up
ON sp.id = up.plane_id
WHERE up.status = 'Ready'
Case 3: apply a filter comparing fields from both tables
SELECT up.*
FROM shop_planes sp
JOIN user_planes up
ON sp.id = up.plane_id
WHERE up.fuel < sp.fuel_capacity
Try something like:
SELECT
up.id AS User_Plane_ID
, up.[user_id]
, up.plane_id
, up.fuel
, up.[status]
, sp.name AS shop_Plane_Name
, sp.fuel_capacity AS shop_Plane_Fuel_Capacity
FROM User_Planes up
INNER JOIN Shop_Planes sp ON up.plane_id = sp.id
AND up.fuel < sp.Fuel_Capacity
WHERE up.[status] = 'Ready'
AND up.[user_id] = ?
Definitely find a tutorial for JOINs, and don't use SELECT *. With SELECT *, you may end up querying much more than you actually need and it can cause problems if the table changes. You'll enjoy your day much more if you explicitly name the columns you want in your query.
I've aliased some of the columns (with AS) since some of those column names may be reserved words. I've also moved the filter on fuel into the JOIN criteria.

SQL Inner Join with 3 tables

So I believe I need an inner join for this query, but am not 100% sure.
First of all see my database diagram:
Click here for database diagram
What I'm trying to achieve:
I'm trying to get all messages (so user_username, text, posted_at FROM Messages) where the user_username matches with the followed_username, and where the followed_username has a follower_username that = ?.
So essentially, everyone followed by ?, I want to get all their messages.
Where ? = an inputted username
What I've tried so far
I've tried a number of SQL statements and have thus far been unsuccessful. These are some I have tried.
$sql = "SELECT user_username, text, posted_at FROM Messages, Users, User_Follows WHERE user_username = (SELECT username FROM Users WHERE username = (SELECT followed_username FROM User_Follows WHERE follower_username = ?)) ORDER BY posted_at DESC;";
$sql = "SELECT user_username, text, posted_at FROM Messages, User_Follows WHERE follower_username = ? AND followed_username = user_username;";
$sql = "SELECT user_username, text, posted_at FROM Messages JOIN User_Follows ON user_username = followed_username WHERE follower_username = followed_username;";
I now think I need to use an inner join to achieve what I want, but am not too sure whether this is correct, or how to go about it.
Thanks in advance.
Try something like:
SELECT M.user_username, M.text, M.posted_at
FROM Messages M
INNER JOIN Users U ON M.user_username = U.username
INNER JOIN User_Follows UF ON UF.followed_username = U.username
WHERE UF.follower_username = ?
ORDER BY posted_at DESC;
Note for future designs that using a long varchar or nvarchar field is a poor choice for a field you will be joining on. Further, if the field is a character type field, it is often subject to being changed over time which is also a bad choice for a key field.
Integer joins are generally much faster. Joining on a character field might be OK if it saves you from having to do some joins, but in general it is a poor idea.
If someone changed his username (which does happen in every system I have ever worked in) you would then have to update all the child tables which could be quite a lot of records to update. If you used a surrogate key, you would only need to update the parent table.
Additionally the word text is a reserved word for many databases and it is best to avoid those in naming fields.
A simple subquery should do it:
select * from messages where user_username in
(
select followed_username from user_follows
where follower_username = 'your_input_username'
);
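Here is a minimal sketch of that subquery with the username supplied as a bound parameter instead of a literal, using Python's sqlite3 module as a stand-in database. The usernames and messages are invented, and the helper name feed_for is hypothetical; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Messages (user_username TEXT, "text" TEXT, posted_at TEXT);
CREATE TABLE User_Follows (follower_username TEXT, followed_username TEXT);
-- Hypothetical sample data.
INSERT INTO Messages VALUES
    ('alice', 'hi',  '2020-01-01'),
    ('bob',   'yo',  '2020-01-02'),
    ('carol', 'hey', '2020-01-03');
INSERT INTO User_Follows VALUES
    ('dave', 'alice'), ('dave', 'bob'), ('erin', 'carol');
""")


def feed_for(follower):
    """Messages written by everyone the given user follows, newest first."""
    return conn.execute("""
        SELECT user_username, "text", posted_at
          FROM Messages
         WHERE user_username IN (SELECT followed_username
                                   FROM User_Follows
                                  WHERE follower_username = ?)
         ORDER BY posted_at DESC
    """, (follower,)).fetchall()


print(feed_for("dave"))
```

Because IN only tests membership, each message appears once even if the follow table happens to contain the same pair twice.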

MySQL Replacing IN and EXISTS with joins in sub sub queries

So, this query is currently used in a webshop to retrieve technical data about articles.
It has served its purpose fine, except that the number of products shown has increased lately, resulting in unacceptably long loading times for some categories.
For one of the worst pages this query (and some others) gets requested about 80 times.
I only recently learned that MySQL does not optimize sub-queries that don't have a depending parameter to only run once.
So if someone could help me with one of the queries and explain how to replace the INs and EXISTSs with joins, I will probably be able to change the other ones myself.
select distinct criteria.cri_id, des_texts.tex_text, article_criteria.acr_value, article_criteria.acr_kv_des_id
from article_criteria, designations, des_texts, criteria, articles
where article_criteria.acr_cri_id = criteria.cri_id
and article_criteria.acr_art_id = articles.art_id
and articles.art_deliverystatus = 1
and criteria.cri_des_id = designations.des_id
and designations.des_lng_id = 9
and designations.des_tex_id = des_texts.tex_id
and criteria.cri_id = 328
and article_criteria.acr_art_id IN (Select distinct link_art.la_art_id
from link_art, link_la_typ
where link_art.la_id = link_la_typ.lat_la_id
and link_la_typ.lat_typ_id = 17484
and link_art.la_ga_id IN (Select distinct link_ga_str.lgs_ga_id
from link_ga_str, search_tree
where link_ga_str.lgs_str_id = search_tree.str_id
and search_tree.str_type = 1
and search_tree.str_id = 10132
and EXISTS (Select *
from link_la_typ
where link_la_typ.lat_typ_id = 17484
and link_ga_str.lgs_ga_id = link_la_typ.lat_ga_id)))
order by article_criteria.acr_value
I think this one is the main badguy with sub-sub-sub-queries
I just noticed i can remove the last exist and still get the same results but with no increase in speed, not part of the question though ;) i'll figure out myself whether i still need that part.
Any help or pointers are appreciated, if i left out some useful information tell me as well.
I think this is equivalent:
SELECT DISTINCT c.cri_id, dt.tex_text, ac.acr_value, ac.acr_kv_des_id
FROM article_criteria AS ac
JOIN criteria AS c ON ac.acr_cri_id = c.cri_id
JOIN articles AS a ON ac.acr_art_id = a.art_id
JOIN designations AS d ON c.cri_des_id = d.des_id
JOIN des_texts AS dt ON dt.tex_id = d.des_tex_id
JOIN (SELECT distinct la.la_art_id
FROM link_art AS la
JOIN link_la_typ AS llt ON la.la_id = llt.lat_la_id
JOIN (SELECT DISTINCT lgs.lgs_ga_id
FROM link_ga_str AS lgs
JOIN search_tree AS st ON lgs.lgs_str_id = st.str_id
JOIN link_la_typ AS llt ON lgs.lgs_ga_id = llt.lat_ga_id
WHERE st.str_type = 1
AND st.str_id = 10132
AND llt.lat_typ_id = 17484) AS lgs
ON la.la_ga_id = lgs.lgs_ga_id
WHERE llt.lat_typ_id = 17484) AS la
ON ac.acr_art_id = la.la_art_id
WHERE a.art_deliverystatus = 1
AND d.des_lng_id = 9
AND c.cri_id = 328
ORDER BY ac.acr_value
All the IN <subquery> clauses can be replaced with JOIN <subquery>, where you then JOIN on the column being tested equaling the column returned by the subquery. And the EXISTS test is converted to a join with the table, moving the comparison in the subquery's WHERE clause into the ON clause of the JOIN.
It's probably possible to flatten the whole thing, instead of joining with subqueries. But I suspect performance will be poor, because this won't reduce the temporary tables using DISTINCT. So you'll get combinatorial explosion in the resulting cross product, which will then have to be reduced at the end with the DISTINCT at the top.
I've converted all the implicit joins to ANSI JOIN clauses, to make the structure clearer, and added table aliases to make things more readable.
In general, you can convert a FROM tab1 WHERE ... val IN (SELECT blah) to a join like this.
FROM tab1
JOIN (
SELECT tab1_id
FROM tab2
JOIN tab3 ON whatever = whatever
WHERE whatever
) AS sub1 ON tab1.id = sub1.tab1_id
The JOIN (an inner join) will drop the rows that don't match the ON condition from your query.
If your tab1_id values can come up duplicate from your inner query, use SELECT DISTINCT. But don't use SELECT DISTINCT unless you need to; it is costly to evaluate.
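The duplicate issue is easy to see on toy data. The sketch below uses Python's sqlite3 module with two hypothetical tables, tab1 and tab2, named after the placeholders in the pattern above; the rows are invented so that one tab1_id appears twice in tab2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab1 (id INTEGER PRIMARY KEY);
CREATE TABLE tab2 (tab1_id INTEGER);
INSERT INTO tab1 VALUES (1), (2), (3);
INSERT INTO tab2 VALUES (1), (1), (3);  -- id 1 appears twice
""")


def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Original form: IN only tests membership, so no duplicates.
with_in = ids("""
    SELECT tab1.id FROM tab1
     WHERE tab1.id IN (SELECT tab1_id FROM tab2)
     ORDER BY tab1.id
""")

# Plain join on the subquery: the repeated tab1_id multiplies the row.
plain_join = ids("""
    SELECT tab1.id FROM tab1
      JOIN (SELECT tab1_id FROM tab2) AS sub1 ON tab1.id = sub1.tab1_id
     ORDER BY tab1.id
""")

# DISTINCT inside the derived table restores the IN semantics.
distinct_join = ids("""
    SELECT tab1.id FROM tab1
      JOIN (SELECT DISTINCT tab1_id FROM tab2) AS sub1
        ON tab1.id = sub1.tab1_id
     ORDER BY tab1.id
""")

print(with_in, plain_join, distinct_join)
```

If tab1_id is already unique in the inner query, the DISTINCT adds cost without changing the result, which is the case where it can be left out.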