I need help optimizing some of my SQL queries. I'm not good at SQL performance tuning. I have SQL Server 2008 R2 Express, and I can't use DTA (the Database Engine Tuning Advisor).
Maybe someone can help me optimize these two queries and create indexes for them manually:
SELECT tblBlogs.RecordID, tblBlogs.RecordText, tblBlogs.CDate, tblBlogs.UserID, tblBlogs.Comments, tblUsers.Username, tblUserpics.UserpicName
FROM (
SELECT tblBlogs_2.RecordID, tblBlogs_2.RecordText, tblBlogs_2.CDate, tblBlogs_2.UserID, COUNT(dbo.tblBlogComments.CommentID) AS Comments
FROM (
SELECT TOP (150) RecordID, RecordText, CDate, UserID
FROM dbo.tblBlogs AS tblBlogs_1
ORDER BY RecordID DESC
) AS tblBlogs_2
LEFT OUTER JOIN dbo.tblBlogComments ON tblBlogs_2.RecordID = tblBlogComments.RecordID
GROUP BY tblBlogs_2.RecordID, tblBlogs_2.RecordText, tblBlogs_2.CDate, tblBlogs_2.UserID
) AS tblBlogs
INNER JOIN dbo.tblUsers ON tblBlogs.UserID = tblUsers.UserID
LEFT OUTER JOIN dbo.tblUserpics ON tblBlogs.UserID = tblUserpics.UserID
ORDER BY tblBlogs.CDate DESC
This must select the top 150 rows from the Blogs table, with user details and the comment count for every single blog entry.
SELECT f.ForumID, f.ForumName, t.ThreadName, m.MsgID, m.MsgName, m.MsgBody, m.UserID, m.CDate, m.IP, u.Username, tblCities.CityName,
t.IsClosed, ISNULL(u.Msgs, 0) AS Posts, ISNULL(tblUserpics.UserpicName, '') AS UserpicName, t.IsPoll,
t.IsPollMultiple, ISNULL(u.Crashes, 0) AS Crashes, 0 AS LastMsgID, m.IsFlood, ISNULL(u.RepaGood, 0) AS RepaGood, ISNULL(u.RepaBad, 0)
AS RepaBad, ISNULL(dbo.vMsgsRepaGood.RepaGood, 0) AS MsgRepaGood, ISNULL(dbo.vMsgsRepaBad.RepaBad, 0) AS MsgRepaBad, t.ThreadID,
tblUserPrivateStatuses.StatusName AS PrivateStatus
FROM tblMsgs AS m
INNER JOIN tblThreads AS t ON m.ThreadID = t.ThreadID
INNER JOIN tblForums AS f ON t.ForumID = f.ForumID
INNER JOIN tblUsers AS u ON m.UserID = u.UserID
LEFT OUTER JOIN tblUserPrivateStatuses ON u.UserID = dbo.tblUserPrivateStatuses.UserID
LEFT OUTER JOIN tblCities ON u.CityID = dbo.tblCities.CityID
LEFT OUTER JOIN tblUserpics ON u.UserID = dbo.tblUserpics.UserID
LEFT OUTER JOIN vMsgsRepaGood ON m.MsgID = vMsgsRepaGood.MsgID
LEFT OUTER JOIN vMsgsRepaBad ON m.MsgID = vMsgsRepaBad.MsgID
WHERE m.ThreadID = "& ThreadID & " AND IsFlood = 0
GROUP BY f.ForumID, f.ForumName, t.ThreadName, m.MsgID, m.MsgName, m.MsgBody, m.UserID, m.CDate, m.IP, u.Username, tblCities.CityName, t.IsClosed, u.Msgs, dbo.tblUserpics.UserpicName, t.IsPoll, t.IsPollMultiple, u.Crashes, m.IsFlood, u.RepaGood, u.RepaBad, vMsgsRepaGood.RepaGood, vMsgsRepaBad.RepaBad, t.ThreadID, tblUserPrivateStatuses.StatusName
ORDER BY m.CDate
This query selects all non-flood messages from a specific thread in a specific forum, with user details (registration date, good/bad reputation counts, number of crashes, number of posts across the whole forum, city, userpic).
Or maybe somebody can tell me about free tools for optimizing queries and creating indexes?
There is a lot to talk about here, and without significantly more information, it's going to be impossible for anyone to help with your query fully.
Precaution: If you have a DBA for your system, check with them before indexing anything, especially on a live system. They can even help, if you're nice to them. If the system is used by many others, be careful before changing anything like indexes.
A basic tip on indexing, if you don't want to dive deep into the problem, is: index by the following, in this order:
Join predicates
Filter
Order by / Group By / etc.
Also:
Make as many columns as possible NOT NULL.
Use data types that make sense - store nothing as varchar if it's an integer or date. (Column width matters. Use the smallest data type you can, if possible.)
Make sure both sides of each join have the same data type - int to int, varchar to varchar, and so on.
If possible, use unique, non-null indexes on each join predicate in each table.
Do all of this, and you'll be well on your way. But if you need this stuff regularly, learn it! There is a lot out there, and it is a deep topic, but you can make queries MUCH better if you know what you are doing.
Edit: The syntax for building indexes is covered in "How do I index a database column". The how and why is covered in "How does database indexing work?"
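To make the ordering tip concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real server (the question is about SQL Server, whose plans and tooling differ). The cut-down tblMsgs table and the index name ix_msgs_thread_flood_cdate are assumptions for illustration only. It indexes the filter columns first and the ORDER BY column last, then compares the query plan before and after.

```python
import sqlite3

# SQLite is only a stand-in here; the table is a cut-down, assumed version
# of tblMsgs and the index name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblMsgs ("
    " MsgID INTEGER PRIMARY KEY, ThreadID INTEGER NOT NULL,"
    " IsFlood INTEGER NOT NULL, CDate TEXT NOT NULL)"
)

query = (
    "SELECT MsgID FROM tblMsgs "
    "WHERE ThreadID = ? AND IsFlood = 0 ORDER BY CDate"
)


def plan(sql):
    """Return the human-readable steps of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column


plan_before = plan(query)

# Filter columns first, then the ORDER BY column.
conn.execute(
    "CREATE INDEX ix_msgs_thread_flood_cdate "
    "ON tblMsgs (ThreadID, IsFlood, CDate)"
)
plan_after = plan(query)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, so the point is only that the second plan mentions the index while the first does not. On SQL Server you would compare the execution plans instead.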
I have a query that goes something like this.
Select *
FROM FaultCode FC
JOIN (
SELECT INNER_E.* FROM Equipment INNER_E
) E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN'
As you can see, there is a WHERE clause in the outer (main) query.
But also, on the inside, we have an Inner Join statement with the line SELECT INNER_E.* FROM Equipment INNER_E. This inner join makes us only retrieve the fault codes that are inside the equipment table (correct me if I'm wrong).
I am trying to optimize this query.
My question is, does it make any difference to do this instead:
Select *
FROM FaultCode FC
JOIN (
SELECT INNER_E.* FROM Equipment INNER_E
WHERE INNER_E.id_organization = 100057 AND INNER_E.equipment_status = 'ACTIVE'
) E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN'
That is, does repeating the WHERE clause inside the subquery, to limit it further before the join, help? Or does the optimizer know to do this automatically?
Strangely enough, when I tried this, it only seemed to make my query slower. Is there any way I can optimize the query above, or, since it's pretty simple, is that the best it's going to get without indexes?
I tried running the Explain Select statement, but I have a hard time parsing what it's telling me. Are there any good resources I can look into to learn some tips or techniques to optimize my query?
I don't have any aggregate functions in my Select fields. So is the only real answer Indexes?
Why is the first subquery needed? Perhaps simply
Select *
FROM FaultCode FC
JOIN Equipment AS E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type
AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057
AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN';
Likely Indexes:
FC: INDEX(code_status, EquipmentID)
E: INDEX(id_organization, equipment_status, EquipmentID)
Probably unwise to do SELECT * -- It will give you all the columns of all 4 tables. (Without further details, I cannot suggest any "covering" indexes, which seems likely for AT.)
With my version of the query, your question about repeating the WHERE vanishes. With your version, it is likely to help. I don't think the Optimizer is smart enough to catch on to what you are doing.
Show us the EXPLAINs. We can help some with what the cryptic stuff is saying. (And what it is not saying.)
"the best it's going to get without indexes" -- Are you saying you have no indexes??! Not even a PRIMARY KEY for each table? "So is the only real answer Indexes?" Every time you write a query against a non-tiny table, you should ask "do the table(s) have adequate indexes for this query?"
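As a rough check of the rewrite and the two suggested indexes, here is a small sketch using Python's sqlite3 module as a stand-in (the question appears to be about MySQL, so treat this as an illustration, not a benchmark). The FaultCodeID column, the index names, and the sample rows are assumptions, and the two LEFT JOINs are left out to keep it short. It shows that the derived-table form and the flat join return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Equipment (
    EquipmentID INTEGER PRIMARY KEY,
    id_organization INTEGER NOT NULL,
    equipment_status TEXT NOT NULL
);
CREATE TABLE FaultCode (
    FaultCodeID INTEGER PRIMARY KEY,   -- hypothetical key column
    EquipmentID INTEGER NOT NULL,
    code_status TEXT NOT NULL
);
-- The two suggested indexes (index names are made up).
CREATE INDEX ix_fc_status_equip ON FaultCode (code_status, EquipmentID);
CREATE INDEX ix_e_org_status_equip
    ON Equipment (id_organization, equipment_status, EquipmentID);

INSERT INTO Equipment VALUES
    (1, 100057, 'ACTIVE'), (2, 100057, 'RETIRED'), (3, 42, 'ACTIVE');
INSERT INTO FaultCode VALUES
    (10, 1, 'OPEN'), (11, 1, 'CLOSED'), (12, 2, 'OPEN'), (13, 3, 'OPEN');
""")

# Original shape: Equipment wrapped in a derived table.
rows_subquery = conn.execute("""
    SELECT FC.FaultCodeID
    FROM FaultCode FC
    JOIN (SELECT INNER_E.* FROM Equipment INNER_E) E USING (EquipmentID)
    WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
      AND FC.code_status = 'OPEN'
    ORDER BY FC.FaultCodeID
""").fetchall()

# Simplified shape: join Equipment directly.
rows_flat = conn.execute("""
    SELECT FC.FaultCodeID
    FROM FaultCode FC
    JOIN Equipment AS E USING (EquipmentID)
    WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
      AND FC.code_status = 'OPEN'
    ORDER BY FC.FaultCodeID
""").fetchall()

print(rows_subquery, rows_flat)
```

Only the open fault code on the active equipment of organization 100057 comes back in both cases, so the derived table adds nothing to the result.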
I am trying to execute this query, but it takes more than 5 hours even though the database is only 20 MB. This is my code. I am joining 11 tables on reg_id, and I need all columns with distinct values. Please guide me on how to rearrange the query.
SELECT *
FROM degree
JOIN diploma
ON degree.reg_id = diploma.reg_id
JOIN further_studies
ON diploma.reg_id = further_studies.reg_id
JOIN iti
ON further_studies.reg_id = iti.reg_id
JOIN personal_info
ON iti.reg_id = personal_info.reg_id
JOIN postgraduation
ON personal_info.reg_id = postgraduation.reg_id
JOIN puc
ON postgraduation.reg_id = puc.reg_id
JOIN skills
ON puc.reg_id = skills.reg_id
JOIN sslc
ON skills.reg_id = sslc.reg_id
JOIN license
ON sslc.reg_id = license.reg_id
JOIN passport
ON license.reg_id = passport.reg_id
GROUP BY fullname
Please tell me if I made any mistakes.
This is a bit long for a comment.
The first problem with your query is that you are using select * with group by fullname. You have zillions of columns in the select that are not in the group by. Unless you really, really, really know what you are doing (which I doubt), this is the wrong way to write a query.
Your performance problem is undoubtedly due to cartesian products and lack of indexes. You are joining across different dimensions -- such as skills and degrees. The result is a product of all the possibilities. For some people, the data size can grow and grow and grow.
And then, the question is: do you have indexes on the keys used in the joins? For performance, you generally want such indexes.
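The row multiplication described above is easy to reproduce. This is a minimal sketch with Python's sqlite3 module; the three cut-down tables, their non-key columns (skill, degree_name), and the sample rows are assumptions for illustration. One person with three skills and two degrees already yields six joined rows, and every further table joined on reg_id multiplies that again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personal_info (reg_id INTEGER PRIMARY KEY, fullname TEXT);
CREATE TABLE skills (reg_id INTEGER, skill TEXT);        -- assumed columns
CREATE TABLE degree (reg_id INTEGER, degree_name TEXT);  -- assumed columns

INSERT INTO personal_info VALUES (1, 'A. Example');
INSERT INTO skills VALUES (1, 'sql'), (1, 'python'), (1, 'welding');
INSERT INTO degree VALUES (1, 'BSc'), (1, 'MSc');
""")

# Skills and degrees are independent of each other, so joining both
# through reg_id pairs every skill with every degree.
row_count = conn.execute("""
    SELECT COUNT(*)
    FROM personal_info p
    JOIN skills s ON s.reg_id = p.reg_id
    JOIN degree d ON d.reg_id = p.reg_id
""").fetchone()[0]

print(row_count)  # 3 skills x 2 degrees = 6 rows for a single person
```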
I think the problem is in the query. First check whether you really need GROUP BY fullname, and try listing specific column names instead of *.
I have written a sql query for my requirement.
This is working fine for me. This is taking 0.0006 sec to execute.
I want to know from sql experts "will this work fine with large amount of data?".
I have written my query below.
SELECT HM_customers.id,
HM_customers.username,
HM_customers.firstname,
HM_customers.lastname,
HM_customers.company,
HM_customers_address_bank.field_data
FROM HM_orders
JOIN HM_order_items
ON HM_order_items.order_id = HM_orders.id
JOIN HM_bid
ON HM_order_items.bid_id = HM_bid.bid_id
JOIN HM_customers
ON HM_bid.user_id = HM_customers.id
JOIN HM_customers_address_bank
ON HM_customers_address_bank.id = HM_customers.default_billing_address
WHERE HM_orders.id = '4'
Can any expert advise me on how to improve this query? Please tell me if you see any issues with it.
NOTE: This is a simple query, but I want to know whether it will still run quickly with a large amount of data.
You don't need to include the orders table:
SELECT c.id,
c.username,
c.firstname,
c.lastname,
c.company,
cb.field_data
FROM HM_order_items oi
JOIN HM_bid b
ON oi.bid_id = b.bid_id
JOIN HM_customers c
ON b.user_id = c.id
JOIN HM_customers_address_bank cb
ON cb.id = c.default_billing_address
WHERE oi.order_id = '4';
Your query can also result in duplicate rows, if a customer bids on the same items multiple times. If you put in a select distinct, then you will incur overhead of duplicate elimination. If this becomes a problem, you will probably want to restructure the query as an exists.
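Here is a small sketch of the duplicate-row issue and the EXISTS rewrite, using Python's sqlite3 module as a stand-in for the real database. The tables are cut down to the columns needed, the address table is omitted, and the sample rows are invented. The same customer has two bids in order 4, so the plain join returns that customer twice while the EXISTS form returns one row per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HM_customers (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE HM_bid (bid_id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE HM_order_items (order_id INTEGER, bid_id INTEGER);

INSERT INTO HM_customers VALUES (1, 'alice'), (2, 'bob');
INSERT INTO HM_bid VALUES (100, 1), (101, 1), (102, 2);
INSERT INTO HM_order_items VALUES (4, 100), (4, 101), (5, 102);
""")

# Plain joins: one output row per matching order item.
join_rows = conn.execute("""
    SELECT c.id, c.username
    FROM HM_order_items oi
    JOIN HM_bid b ON oi.bid_id = b.bid_id
    JOIN HM_customers c ON b.user_id = c.id
    WHERE oi.order_id = 4
""").fetchall()

# EXISTS: one output row per customer with at least one matching item.
exists_rows = conn.execute("""
    SELECT c.id, c.username
    FROM HM_customers c
    WHERE EXISTS (
        SELECT 1
        FROM HM_order_items oi
        JOIN HM_bid b ON oi.bid_id = b.bid_id
        WHERE b.user_id = c.id AND oi.order_id = 4
    )
""").fetchall()

print(join_rows)
print(exists_rows)
```

Whether EXISTS is actually faster than SELECT DISTINCT depends on the engine and the indexes, so it is worth measuring both on real data.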
There are a few points worth noting:
1) The reference to an outer table column in the WHERE clause prevents the OUTER JOIN from returning any non-matched rows, which implicitly converts the query to an INNER JOIN. This is probably a bug in the query or a misunderstanding of how OUTER JOIN works.
2) Selecting all columns with the * wildcard will cause the query's meaning and behavior to change if the table's schema changes, and might cause the query to retrieve too much data. You should only choose columns you need.
Make 'HM_customers' your driving table, since all of your data comes from it, and change your joins like this. Hopefully this will help you :)
SELECT hmCust.id,
hmCust.username,
hmCust.firstname,
hmCust.lastname,
hmCust.company,
hmCustAdd.field_data
FROM HM_customers hmCust
INNER JOIN HM_bid hmBid
ON hmBid.user_id = hmCust.id
INNER JOIN HM_customers_address_bank hmCustAdd
ON hmCustAdd.id = hmCust.default_billing_address
INNER JOIN HM_order_items hmOrderItem
ON hmOrderItem.bid_id = hmBid.bid_id
INNER JOIN HM_orders hmOrder
ON hmOrder.id = hmOrderItem.order_id
WHERE hmOrder.id = '4'
I have an application which queries data from a 3rd party product. As such, I'm keen not to change the table structure.
Is there a way I can improve efficiency purely on the query side?
My query is:
CallsClosed.Query = @"SELECT COALESCE(ti.FIRST_NAME,'Not Assigned') AS 'Technician', COUNT(*) 'Calls_Closed'
FROM WorkOrder_Threaded wot
INNER JOIN WorkOrder wo ON wot.WORKORDERID=wo.WORKORDERID
LEFT JOIN SDUser sdu ON wo.REQUESTERID=sdu.USERID
LEFT JOIN AaaUser aau ON sdu.USERID=aau.USER_ID
LEFT JOIN WorkOrderStates wos ON wo.WORKORDERID=wos.WORKORDERID
LEFT JOIN SDUser td ON wos.OWNERID=td.USERID
LEFT JOIN AaaUser ti ON td.USERID=ti.USER_ID
WHERE (wo.COMPLETEDTIME != 0) AND (wo.COMPLETEDTIME != -1) AND (wo.COMPLETEDTIME IS NOT NULL)
AND wo.COMPLETEDTIME >= (UNIX_TIMESTAMP(TIMESTAMP('" + sdChartRange.From + @"')) * 1000)
AND wot.THD_WOID=wot.WORKORDERID
GROUP BY Technician ORDER BY 'Calls_Closed' DESC";
I've run JetProfiler on this, and it looks like the main offender is the size of the wot table. (c. 19k rows)
Any suggestions on where I should start to speed the query up? (Currently takes about 4s to run)
Minimise the number of joins you have to do.
Echoing the comment, add indices.
Look at the EXPLAIN output for that query.
If that doesn't solve it to your satisfaction, post sample data in a fiddle and I'll take a look.
Make sure you have indexes on the fields used for selection and joins.
Check for foreign keys that are no longer valid ('dangling').
I've got a users table and a votes table. The votes table stores votes toward other users. And for better or worse, a single row in the votes table stores the votes in both directions between the two users.
Now, the problem is when I want to list, for example, all the people someone has voted on.
I'm no MySQL expert, but from what I've figured out, thanks to the OR condition in the join statement, it needs to look through the whole users table (currently +44,000 rows), and it creates a temporary table to do so.
Currently, the query below takes about two minutes, yes, two minutes to complete. If I remove the OR condition, and everything after it in the join statement, it runs in less than half a second, as it only needs to look through about 17 of the 44,000 user rows (explain ftw!).
In the example below, the user ID is 9834, and I'm trying to fetch his/her own "no" votes and join the info of the user who was voted on to the result.
Is there a better, and faster way to do this query? Or should I restructure the tables? I seriously hope it can be fixed by modifying the query, cause there's already a lot of users (+44,000), and votes (+130,000) in the tables, which I'd have to migrate.
thanks :)
SELECT *, votes.id as vote_id
FROM `votes`
LEFT JOIN users ON (
(
votes.user_id_1 = 9834
AND
users.uid = votes.user_id_2
)
OR
(
votes.user_id_2 = 9834
AND
users.uid = votes.user_id_1
)
)
WHERE (
(
votes.user_id_1 = 9834
AND
votes.vote_1 = 0
)
OR
(
votes.user_id_2 = 9834
AND
votes.vote_2 = 0
)
)
ORDER BY votes.updated_at DESC
LIMIT 0, 10
Instead of the OR, you could do a UNION of 2 queries. I have known instances where this is an order of magnitude faster in at least one other DBMS, and I'm guessing MySQL's query optimizer may share the same "feature".
SELECT whatever
FROM votes v
INNER JOIN
users u
ON v.user_id_1 = u.uid
WHERE v.user_id_2 = 9834
AND v.vote_2 = 0
UNION
SELECT whatever
FROM votes v
INNER JOIN
users u
ON v.user_id_2 = u.uid
WHERE v.user_id_1 = 9834
AND v.vote_1 = 0
ORDER BY updated_at DESC
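To check that the UNION form returns the same rows as the OR join, here is a minimal sketch with Python's sqlite3 module standing in for MySQL. The name column and the sample rows are invented, and the column names follow the question (vote_1, vote_2). SQLite's optimizer is not MySQL's, so this only checks equivalence of results, not speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE votes (
    id INTEGER PRIMARY KEY,
    user_id_1 INTEGER, user_id_2 INTEGER,
    vote_1 INTEGER, vote_2 INTEGER,
    updated_at TEXT
);
INSERT INTO users VALUES (9834, 'me'), (1, 'ann'), (2, 'ben'), (3, 'cat');
INSERT INTO votes VALUES
    (1, 9834, 1, 0, 1, '2024-01-01'),  -- 9834 voted no on ann
    (2, 2, 9834, 1, 0, '2024-01-02'),  -- 9834 voted no on ben
    (3, 9834, 3, 1, 0, '2024-01-03');  -- 9834 voted yes on cat
""")

or_rows = conn.execute("""
    SELECT votes.id, users.name, votes.updated_at
    FROM votes
    LEFT JOIN users
      ON (votes.user_id_1 = 9834 AND users.uid = votes.user_id_2)
      OR (votes.user_id_2 = 9834 AND users.uid = votes.user_id_1)
    WHERE (votes.user_id_1 = 9834 AND votes.vote_1 = 0)
       OR (votes.user_id_2 = 9834 AND votes.vote_2 = 0)
    ORDER BY votes.updated_at DESC
""").fetchall()

union_rows = conn.execute("""
    SELECT v.id AS id, u.name AS name, v.updated_at AS updated_at
    FROM votes v
    INNER JOIN users u ON v.user_id_1 = u.uid
    WHERE v.user_id_2 = 9834 AND v.vote_2 = 0
    UNION
    SELECT v.id AS id, u.name AS name, v.updated_at AS updated_at
    FROM votes v
    INNER JOIN users u ON v.user_id_2 = u.uid
    WHERE v.user_id_1 = 9834 AND v.vote_1 = 0
    ORDER BY updated_at DESC
""").fetchall()

print(or_rows)
print(union_rows)
```

Note that the UNION version uses INNER JOIN, so it would drop votes whose other user no longer exists, whereas the original LEFT JOIN keeps them with NULL user columns. Each branch can also use its own index, for example one on user_id_1 and one on user_id_2.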
You've answered your own question: yes, you should redesign the table, as it's not working for you. It's too slow, and requires overly complicated queries. Fortunately, migrating the data is just a matter of doing essentially the query you're asking about here, but for all users instead of just one. (That is, a sum or count over the union the first answer suggested.)
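One possible shape for such a migration, sketched with Python's sqlite3 module as a stand-in for MySQL. The votes_flat table, its column names, and the sample rows are hypothetical, and it assumes that a NULL vote means no vote was cast in that direction, which the question does not confirm. Each old row becomes up to two new rows, one per direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE votes (
    id INTEGER PRIMARY KEY,
    user_id_1 INTEGER, user_id_2 INTEGER,
    vote_1 INTEGER, vote_2 INTEGER,
    updated_at TEXT
);
INSERT INTO votes VALUES
    (1, 9834, 1, 0, 1, '2024-01-01'),
    (2, 2, 9834, 1, NULL, '2024-01-02');

-- Hypothetical replacement table: one row per direction of a vote.
CREATE TABLE votes_flat (
    voter_id INTEGER NOT NULL,
    target_id INTEGER NOT NULL,
    vote INTEGER NOT NULL,
    updated_at TEXT,
    PRIMARY KEY (voter_id, target_id)
);

-- Assumption: NULL in vote_1 / vote_2 means that side has not voted.
INSERT INTO votes_flat (voter_id, target_id, vote, updated_at)
SELECT user_id_1, user_id_2, vote_1, updated_at FROM votes
WHERE vote_1 IS NOT NULL
UNION ALL
SELECT user_id_2, user_id_1, vote_2, updated_at FROM votes
WHERE vote_2 IS NOT NULL;
""")

flat_count = conn.execute("SELECT COUNT(*) FROM votes_flat").fetchone()[0]

# "Whom did user 9834 vote no on?" no longer needs an OR or a UNION.
my_no_votes = conn.execute(
    "SELECT target_id FROM votes_flat WHERE voter_id = ? AND vote = 0",
    (9834,),
).fetchall()

print(flat_count, my_no_votes)
```

With this layout, the lookup filters on a single column (voter_id), which the primary key already covers.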