I have a query with the explain plan below. It's pretty basic, every join is using an index (though not unique), and it's taking 5+ hours. The largest table has about 100k records. RAM and CPU are not pegged or anything, no other queries running, no table locks. The most "complicated" part I have is the coalesce in an outer join. Is that killing me?
For clarification, I'm joining to the same table twice because some of the records have a user ID, some just have a first/last name. I prefer to join by a unique user name obviously, and one of the selected items is coalesce(u1.job_title, u2.job_title)
from utilization_incident ui
left join users_utilization_v u1
on u1.cc_user_id = ui.assigned_to_user_id
and u1.source_system = ui.source
and u1.data_date = ui.data_date
left join users_utilization_v u2
on u2.first_name = ui.assigned_to_first_name
and u2.last_name = ui.assigned_to_last_name
and u2.source_system = ui.source
and u2.data_date = ui.data_date
left join lkp_job_title_service_area jtsa
on jtsa.job_title = coalesce(u1.job_title, u2.job_title)
The CPU may not be pegged, but what about I/O?
How much RAM? What is the value of innodb_buffer_pool_size?
Please provide SHOW CREATE TABLE.
Please provide the entire SELECT.
Please provide the text version of EXPLAIN SELECT ....
Pending further details, these indexes should help:
users_utilization_v: INDEX(cc_user_id, source_system, data_date)
users_utilization_v: INDEX(first_name, last_name, source_system, data_date)
lkp_job_title_service_area: INDEX(job_title)
Remove LEFT unless you need it.
Related
I am trying to execute this query, but it takes more than 5 hours even though the database is only 20 MB. This is my code. I am joining 11 tables on reg_id. I need all columns with distinct values. Please guide me on how to rearrange the query.
SELECT *
FROM degree
JOIN diploma
ON degree.reg_id = diploma.reg_id
JOIN further_studies
ON diploma.reg_id = further_studies.reg_id
JOIN iti
ON further_studies.reg_id = iti.reg_id
JOIN personal_info
ON iti.reg_id = personal_info.reg_id
JOIN postgraduation
ON personal_info.reg_id = postgraduation.reg_id
JOIN puc
ON postgraduation.reg_id = puc.reg_id
JOIN skills
ON puc.reg_id = skills.reg_id
JOIN sslc
ON skills.reg_id = sslc.reg_id
JOIN license
ON sslc.reg_id = license.reg_id
JOIN passport
ON license.reg_id = passport.reg_id
GROUP BY fullname
Please tell me if I have made a mistake.
This is a bit long for a comment.
The first problem with your query is that you are using select * with group by fullname. You have zillions of columns in the select that are not in the group by. Unless you really, really, really know what you are doing (which I doubt), this is the wrong way to write a query.
Your performance problem is undoubtedly due to cartesian products and lack of indexes. You are joining across different dimensions -- such as skills and degrees. The result is a product of all the possibilities. For some people, the data size can grow and grow and grow.
And then, the question is: do you have indexes on the keys used in the joins? For performance, you generally want such indexes.
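The row multiplication described above can be reproduced on a tiny scale. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, keeps just three of the eleven tables, and the column names skill and degree_name and all sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical miniature of the schema: one person with 3 skills and 2 degrees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personal_info (reg_id INTEGER, fullname TEXT);
    CREATE TABLE skills (reg_id INTEGER, skill TEXT);
    CREATE TABLE degree (reg_id INTEGER, degree_name TEXT);
    INSERT INTO personal_info VALUES (1, 'Asha Rao');
    INSERT INTO skills VALUES (1, 'SQL'), (1, 'Python'), (1, 'Excel');
    INSERT INTO degree VALUES (1, 'BSc'), (1, 'MSc');
""")

# Both child tables are joined on the same key, so every skill row is
# paired with every degree row of the same person.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM personal_info p
    JOIN skills s ON s.reg_id = p.reg_id
    JOIN degree d ON d.reg_id = p.reg_id
""").fetchone()[0]

print(joined_rows)  # 3 skills x 2 degrees = 6 rows for a single person
```

With eleven tables joined this way, the per-person row count is the product of the row counts in every table, which is how a 20 MB database can produce an enormous intermediate result.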
I think the problem is in the query. First, check whether GROUP BY fullname is what you really want, and try listing specific column names instead of *.
I always get confused when it comes to JOINing tables.
So, I have a table that stores the user details called tblUsers, with the following fields (for the sake of simplicity, I am including only the required fields here):
user_id
first_name
And I have another table which stores the messages called tblMessages:
msg_id
sender_id
recipient_id
msg_body
Now what I am trying to do is fetch all messages, with the user names too. What I have tried is this:
SELECT
`msg_id`,
(SELECT `first_name` FROM `tblUsers` WHERE `tblUsers`.`user_id` = `tblMessages`.`sender_id`) AS `sender_name`,
(SELECT `first_name` FROM `tblUsers` WHERE `tblUsers`.`user_id` = `tblMessages`.`recipient_id`) AS `recipient_name`,
`msg_body`
FROM `tblMessages`
It seems to be working at the moment. But is this the correct way to achieve my goal? Or would JOINing the tables be better? tblMessages will probably grow to a large number of rows. If we use a JOIN, would we do 2 LEFT JOINs: first on sender_id of tblMessages with user_id of tblUsers, and again on recipient_id of tblMessages with user_id of tblUsers? Is that correct?
Let me know your suggestions or corrections on my approach.
This is going to be your best query (it joins the tables on their indexes instead of running a separate subquery for each selected name):
SELECT m.`msg_id`, su.`first_name` AS `sender_name`, ru.`first_name` AS `recipient_name`, m.`msg_body`
FROM `tblMessages` m
LEFT JOIN `tblUsers` su ON m.`sender_id` = su.`user_id`
LEFT JOIN `tblUsers` ru ON m.`recipient_id` = ru.`user_id`;
When in doubt, put EXPLAIN right before your query to see which indexes it will use and how efficient it will be. Check out these sqlfiddles containing the EXPLAIN output for each query.
You can read a bit about the reasoning for choosing this query over yours here and straight from the docs here. EXPLAIN is also a helpful tool for understanding where your bottlenecks are and what is causing performance issues on your database. (This choice likely isn't going to affect performance very much, but you can always run some performance tests when your database reaches a healthy size.)
You should JOIN the same table twice, using two different aliases, for example s and r:
SELECT
m.msg_id,
m.sender_id,
s.first_name AS sender_name,
m.recipient_id,
r.first_name AS recipient_name,
m.msg_body
FROM
tblMessages AS m
LEFT JOIN tblUsers AS s ON m.sender_id=s.user_id
LEFT JOIN tblUsers AS r ON m.recipient_id=r.user_id
Your approach is not wrong, though: it works, and with proper indexes it shouldn't be much slower.
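To check that the two forms return the same rows, here is a minimal sketch. It runs both queries against an in-memory SQLite database through Python's sqlite3 module (an assumption made only to keep the example self-contained; the thread is about MySQL), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblUsers (user_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE tblMessages (
        msg_id INTEGER PRIMARY KEY,
        sender_id INTEGER,
        recipient_id INTEGER,
        msg_body TEXT
    );
    INSERT INTO tblUsers VALUES (1, 'Ann'), (2, 'Bob');
    -- Message 11 has a recipient_id (99) with no matching user row.
    INSERT INTO tblMessages VALUES (10, 1, 2, 'hi'), (11, 2, 99, 'lost');
""")

# Form 1: one correlated subquery per name.
subquery_rows = conn.execute("""
    SELECT msg_id,
           (SELECT first_name FROM tblUsers WHERE user_id = m.sender_id),
           (SELECT first_name FROM tblUsers WHERE user_id = m.recipient_id),
           msg_body
    FROM tblMessages m
    ORDER BY msg_id
""").fetchall()

# Form 2: the same table joined twice under two aliases.
join_rows = conn.execute("""
    SELECT m.msg_id, s.first_name, r.first_name, m.msg_body
    FROM tblMessages m
    LEFT JOIN tblUsers s ON m.sender_id = s.user_id
    LEFT JOIN tblUsers r ON m.recipient_id = r.user_id
    ORDER BY m.msg_id
""").fetchall()

print(subquery_rows == join_rows)
```

Both forms keep message 11 and return NULL (None in Python) for the missing recipient, which is why LEFT JOIN rather than an inner join is the matching rewrite.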
I did not write this query. I am working on someone else's old code. I am looking into changing what is needed for this query, but if I could simply speed up this query, that would solve my problem temporarily. I am looking at adding indexes. When I ran SHOW INDEXES, there were already many indexes on the orders table. Can that also slow down a query?
I am no database expert. I guess I will learn more from this effort. :)
SELECT
orders.ORD_ID,
orders.ORD_TotalAmt,
orders.PAYMETH_ID,
orders.SCHOOL_ID,
orders.ORD_AddedOn,
orders.AMAZON_PurchaseDate,
orders.ORDSTATUS_ID,
orders.ORD_InvoiceNumber,
orders.ORD_CustFirstName,
orders.ORD_CustLastName,
orders.AMAZON_ORD_ID,
orders.ORD_TrackingNumber,
orders.ORD_SHIPPINGCNTRY_ID,
orders.AMAZON_IsExpedited,
orders.ORD_ShippingStreet1,
orders.ORD_ShippingStreet2,
orders.ORD_ShippingCity,
orders.ORD_ShippingStateProv,
orders.ORD_ShippingZipPostalCode,
orders.CUST_ID,
orders.ORD_ShippingName,
orders.AMAZON_ShipOption,
orders.ORD_ShipLabelGenOn,
orders.ORD_SHIPLABELGEN,
orders.ORD_AddressVerified,
orders.ORD_IsResidential,
orderstatuses.ORDSTATUS_Name,
paymentmethods.PAYMETH_Name,
shippingoptions.SHIPOPT_Name,
SUM(orderitems.ORDITEM_Qty) AS ORD_ItemCnt,
SUM(orderitems.ORDITEM_Weight * orderitems.ORDITEM_Qty) AS ORD_ItemTotalWeight
FROM
orders
LEFT JOIN orderstatuses ON
orders.ORDSTATUS_ID = orderstatuses.ORDSTATUS_ID
LEFT JOIN orderitems ON
orders.ORD_ID = orderitems.ORD_ID
LEFT JOIN paymentmethods ON
orders.PAYMETH_ID = paymentmethods.PAYMETH_ID
LEFT JOIN shippingoptions ON
orders.SHIPOPT_ID = shippingoptions.SHIPOPT_ID
WHERE
(orders.AMAZON_ORD_ID IS NOT NULL AND (orders.ORD_SHIPLABELGEN IS NULL OR orders.ORD_SHIPLABELGEN = '') AND orderstatuses.ORDSTATUS_ID <> 101 AND orderstatuses.ORDSTATUS_ID <> 40)
GROUP BY
orders.ORD_ID,
orders.ORD_TotalAmt,
orders.PAYMETH_ID,
orders.SCHOOL_ID,
orders.ORD_AddedOn,
orders.ORDSTATUS_ID,
orders.ORD_InvoiceNumber,
orders.ORD_CustFirstName,
orders.ORD_CustLastName,
orderstatuses.ORDSTATUS_Name,
paymentmethods.PAYMETH_Name,
shippingoptions.SHIPOPT_Name
ORDER BY
orders.ORD_ID
One simple thing you should consider is whether you really need to use left joins or whether you would be satisfied using inner joins for some of the joins. The new query would not be the same as the original query, so you would need to think carefully about what you really want back. If your foreign key relationships are indexed correctly, this could help substantially, especially between ORDERS and ORDERITEMS, because I would imagine these are your largest tables. The following post has a good explanation: INNER JOIN vs LEFT JOIN performance in SQL Server. There are lots of other things that can be done, but you will need to post the query plan so people can dive deeper.
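One detail of the posted query is worth illustrating when weighing left against inner joins: its WHERE clause compares orderstatuses.ORDSTATUS_ID with <>, and a comparison against NULL is never true, so orders without a matching status row are discarded anyway. The sketch below shows this with Python's sqlite3 module standing in for MySQL; the tables are cut down to two columns each and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (ORD_ID INTEGER PRIMARY KEY, ORDSTATUS_ID INTEGER);
    CREATE TABLE orderstatuses (
        ORDSTATUS_ID INTEGER PRIMARY KEY,
        ORDSTATUS_Name TEXT
    );
    INSERT INTO orderstatuses VALUES (10, 'New'), (101, 'Cancelled');
    -- Order 3 points at a status id that does not exist in orderstatuses.
    INSERT INTO orders VALUES (1, 10), (2, 101), (3, 55);
""")

query = """
    SELECT orders.ORD_ID
    FROM orders
    {join} orderstatuses ON orders.ORDSTATUS_ID = orderstatuses.ORDSTATUS_ID
    WHERE orderstatuses.ORDSTATUS_ID <> 101
    ORDER BY orders.ORD_ID
"""

# Same query twice, differing only in the join keyword.
left_ids = [r[0] for r in conn.execute(query.format(join="LEFT JOIN"))]
inner_ids = [r[0] for r in conn.execute(query.format(join="INNER JOIN"))]

print(left_ids, inner_ids)
```

Order 3 disappears from both results, so for that particular join, switching to INNER JOIN would not change the rows returned.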
It looks like just adding the index was all that was needed.
create index orderitems_ORD_ID_index on orderitems(ORD_ID);
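A quick way to confirm that such an index is actually picked up is to compare the query plan before and after creating it. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in for MySQL's EXPLAIN; the ORDITEM_ID column and the lookup value 42 are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orderitems ("
    "ORDITEM_ID INTEGER PRIMARY KEY, ORD_ID INTEGER, ORDITEM_Qty INTEGER)"
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable plan step.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)

lookup = "SELECT ORDITEM_Qty FROM orderitems WHERE ORD_ID = 42"

plan_before = plan(lookup)
conn.execute("CREATE INDEX orderitems_ORD_ID_index ON orderitems(ORD_ID)")
plan_after = plan(lookup)

print(plan_before)  # a full scan of orderitems
print(plan_after)   # a search that names orderitems_ORD_ID_index
```

The exact wording of the plan text varies between SQLite versions, so only the presence of the index name is worth relying on.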
I have a relatively simple game. I need help; I think this query isn't optimized correctly.
I have a standard users table. There is an expansions table, which holds general information about the expansions in the game. Each time a user beats a level in an expansion, a row is added to playlog that says their final score (so at first, there are 0 rows in the playlog table for them for the expansion).
EXPLAIN SELECT users.username, expansions.title, expansions.description,
COUNT( playlog.id ) as levels_beaten
FROM users
INNER JOIN expansions
LEFT JOIN playlog ON users.id = playlog.user_id
AND expansions.id = playlog.expansions_id
WHERE users.id = 10
GROUP BY expansions.id
ORDER BY expansions.order_hint DESC
I have the following indexes:
users id - primary, username - unique
expansions id - primary, order_hint - index
playlog expansions_id - foreign, user_id - foreign
I took a database class a while back, and I remember that "Using temporary" and "Using filesort" were supposed to be bad, but I don't really remember how to rectify them or whether they're okay in this instance. (Also, if I don't select the username, it says "Using index" in the first row of EXPLAIN as well.)
Your query looked mostly accurate, but the trail of comments was taking a negative spin. I've rewritten the query to more explicitly show the relationship of the tables and join criteria. You had left vs inner joins. It appears from your description that the "Expansions" table is like a master list of expansions that ARE AVAILABLE in the game (like a lookup table). The ONLY way a record gets into the PLAYLOG is IF someone completes a given expansion. That said, start with the user to their playlog history. If no records, you are done anyhow. If there IS a playlog, then join to the expansions to get the descriptions. No need to get expansion descriptions if nobody completed any such levels.
SELECT
users.username,
expansions.title,
expansions.description,
COUNT( * ) as levels_beaten
FROM
users
JOIN playlog
ON users.id = playlog.user_id
JOIN expansions
ON playlog.expansions_id = expansions.id
WHERE
users.id = 10
GROUP BY
expansions.id
ORDER BY
expansions.order_hint DESC
If the query still appears to cause an issue, I would then suggest adding the keyword "STRAIGHT_JOIN" such as
SELECT STRAIGHT_JOIN ...rest of query.
STRAIGHT_JOIN tells the engine to query in the order I've said and not let it interpret a possibly less efficient query path.
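Note that the rewritten query and the original do not return the same rows: with inner joins, an expansion in which the user has beaten no levels drops out instead of appearing with a count of 0. A minimal sketch of the difference follows, using Python's sqlite3 module in place of MySQL, invented sample rows, and an explicit CROSS JOIN where the original wrote INNER JOIN without an ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE expansions (
        id INTEGER PRIMARY KEY, title TEXT, order_hint INTEGER
    );
    CREATE TABLE playlog (
        id INTEGER PRIMARY KEY, user_id INTEGER, expansions_id INTEGER
    );
    INSERT INTO users VALUES (10, 'player');
    INSERT INTO expansions VALUES (1, 'Base', 2), (2, 'Desert', 1);
    -- User 10 has beaten two levels of 'Base' and none of 'Desert'.
    INSERT INTO playlog VALUES (1, 10, 1), (2, 10, 1);
""")

# Original shape: every expansion is listed, unplayed ones with a count of 0.
original = conn.execute("""
    SELECT expansions.title, COUNT(playlog.id)
    FROM users
    CROSS JOIN expansions
    LEFT JOIN playlog ON users.id = playlog.user_id
                     AND expansions.id = playlog.expansions_id
    WHERE users.id = 10
    GROUP BY expansions.id
    ORDER BY expansions.order_hint DESC
""").fetchall()

# Rewritten shape: only expansions with at least one playlog row survive.
rewritten = conn.execute("""
    SELECT expansions.title, COUNT(*)
    FROM users
    JOIN playlog ON users.id = playlog.user_id
    JOIN expansions ON playlog.expansions_id = expansions.id
    WHERE users.id = 10
    GROUP BY expansions.id
    ORDER BY expansions.order_hint DESC
""").fetchall()

print(original)
print(rewritten)
```

If the game needs to show expansions the player has not started yet, the original LEFT JOIN form is the one that provides them.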
I have a table structure like the following:
user
    id
    name
profile_stat
    id
    name
profile_stat_value
    id
    name
user_profile
    user_id
    profile_stat_id
    profile_stat_value_id
My question is:
How do I evaluate a query where I want to find all users with profile_stat_id and profile_stat_value_id for many stats?
I've tried doing an inner self join, but that quickly gets crazy when searching for many stats. I've also tried doing a count on the actual user_profile table, and that's much better, but still slow.
Is there some magic I'm missing? I have about 10 million rows in the user_profile table and want the query to take no longer than a few seconds. Is that possible?
Typically, databases are able to handle 10 million records in a decent manner. I have mostly used Oracle in our professional environment with large amounts of data (about 30-40 million rows), and even join queries on those tables have never taken more than a second or two to run.
One IMPORTANT lesson I have learned is that whenever query performance is bad, you should check whether indexes are defined properly on the join fields. E.g., here profile_stat_id and profile_stat_value_id should have indexes defined (I am assuming user_id is the primary key). This should give you a good performance increase if you have not done that already.
After defining the indexes, run the query once or twice to give the DB a chance to settle on a query plan and warm its caches before verifying the gain.
Superficially, you seem to be asking for this, which includes no self-joins:
SELECT u.name, u.id, s.name, s.id, v.name, v.id
FROM User_Profile AS p
JOIN User AS u ON u.id = p.user_id
JOIN Profile_Stat AS s ON s.id = p.profile_stat_id
JOIN Profile_Stat_Value AS v ON v.id = p.profile_stat_value_id
Any of the joins listed can be changed to a LEFT OUTER JOIN if the corresponding table need not have a matching entry. All this does is join the central User_Profile table with each of the other three tables on the appropriate joining column.
Where do you think you need a self-join?
[I have not included anything to filter on 'the many stats'; it is not at all clear to me what that part of the question means.]
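If "the many stats" means finding users who match every one of several (profile_stat_id, profile_stat_value_id) pairs, one common pattern is the count-based approach the question mentions: filter user_profile to the wanted pairs, group by user, and keep the groups whose row count equals the number of pairs. The sketch below is an interpretation, not a confirmed reading of the question. It uses Python's sqlite3 module as a stand-in for the real database, invented sample rows, and it assumes (user_id, profile_stat_id) is unique so that COUNT(*) counts distinct stats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumption: one row per (user, stat), enforced by the primary key.
    CREATE TABLE user_profile (
        user_id INTEGER,
        profile_stat_id INTEGER,
        profile_stat_value_id INTEGER,
        PRIMARY KEY (user_id, profile_stat_id)
    );
    INSERT INTO user_profile VALUES
        (1, 1, 10), (1, 2, 20),
        (2, 1, 10), (2, 2, 21),
        (3, 1, 10);
""")

# Hypothetical search: stat 1 must have value 10 AND stat 2 must have value 20.
wanted = [(1, 10), (2, 20)]
pair_filter = " OR ".join(
    "(profile_stat_id = ? AND profile_stat_value_id = ?)" for _ in wanted
)
# Flatten the pairs into bind parameters, then append the required match count.
params = [value for pair in wanted for value in pair] + [len(wanted)]

matching_users = [row[0] for row in conn.execute(f"""
    SELECT user_id
    FROM user_profile
    WHERE {pair_filter}
    GROUP BY user_id
    HAVING COUNT(*) = ?
    ORDER BY user_id
""", params)]

print(matching_users)
```

This touches user_profile once regardless of how many pairs are searched, whereas the self-join approach adds one join per pair. Whether it meets a few-seconds target on 10 million rows would depend on the indexes available.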