I'm trying to run a query in MySQL that's timing out after a couple of minutes on a QA system with 8 million+ rows. It runs fine for me locally, but against far less data.
Here's the query:
SELECT system_name AS systemName,
       systemLabel,
       feature_vector AS featureVector,
       code AS norcaCode,
       COUNT(1) AS `sum`
FROM (
    SELECT a.id,
           a.object_id,
           a.system_name,
           d.label AS systemLabel,
           b.norca_type AS norcaType,
           b.feature_vector,
           a.seqnb,
           a.object_index,
           c.code
    FROM system_objectdata a
    JOIN sick_il_dacq.system_barcode_norca b
        ON a.id = b.system_objectdata_id
       AND a.partition_key = b.partition_key
    LEFT JOIN system_feature_vector c
        ON b.feature_vector = c.value
    JOIN sick_il_services.system_config d
        ON a.system_name = d.name
    WHERE LEFT(FROM_UNIXTIME(object_scan_time/1000), 10) >= SUBDATE(CURRENT_DATE, 100)
      AND norca_type = 'BARCODE'
      AND a.is_duplicate = 0
) detail
GROUP BY system_name, feature_vector, norcaCode;
Here's the explain plan:
It looks like the join to table d, system_config, has no possible keys.
However, there is an index for name on the table:
Any idea why it's not using the name index? And in general, any ideas on how to improve the query speed?
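One idea I've been considering (assuming object_scan_time is a millisecond epoch, as the FROM_UNIXTIME(object_scan_time/1000) suggests): the date filter wraps the column in functions, which prevents any index on object_scan_time from being used. A sargable equivalent compares the raw column against a precomputed cutoff:

WHERE object_scan_time >= UNIX_TIMESTAMP(SUBDATE(CURRENT_DATE, 100)) * 1000

With an index on object_scan_time, that predicate can use a range scan instead of evaluating the functions for every row.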
I've been trying my hand at this, and I just keep getting an error or a query that hangs. Basically, I have two database queries (one for each database) and I need to combine the results of the first into the second, but also use the ID of the second query in the first... confusing!
The first is a simple query: it gets the number of approved topics and sets it as "commentnumber". As you can see in the WHERE clause, it needs to use a.ID, which comes from the second query.
Database 1
(SELECT (t.topic_posts_approved - 1)
FROM forum.bb_topics t, forum.bb_xpost xp
WHERE xp.wp_id = a.ID
AND t.topic_id = xp.topic_id) as 'commentnumber'
This is a query I've created to get 3 WordPress posts and sort them by a "weight". If I remove "commentnumber" (from the first query), it'll obviously work.
Database 2
SELECT a.post_author, a.id, b.pageviews, a.post_title, a.guid, c.meta_value, (b.pageviews * (c.meta_value + (commentnumber * 1.25))) AS 'weight'
FROM wordpress.wp_posts a, wordpress.wp_poppodyd b, wordpress.wp_postmeta c
WHERE a.ID = b.postid and (a.ID = c.post_id)
AND c.meta_key = 'thumbs_up'
AND (b.day >= NOW() - INTERVAL 2 DAY)
GROUP BY a.post_author
ORDER BY weight DESC
LIMIT 3
I've tried inner joining them, but either I don't know what I'm doing or the query is just too much, because the few variations I've tried just hang until killed.
Any help would be massively appreciated!
I figured it out after sitting down with it for a couple more hours.
As people have said, using the database.table.column name is the key.
Here is my end result in one query:
SELECT a.post_author, a.id, b.pageviews, a.post_title, a.guid, c.meta_value, t.topic_posts_approved, (b.pageviews * (c.meta_value + (t.topic_posts_approved * 1.25))) AS 'weight'
FROM wordpressdb.wp_posts a, wordpressdb.wp_poppodyd b, wordpressdb.wp_postmeta c, forumdb.bb_topics t, forumdb.bb_xpost xp
WHERE a.ID = b.postid and (a.ID = c.post_id) and (a.ID = xp.wp_id) and (t.topic_id = xp.topic_id)
AND c.meta_key = 'thumbs_up'
AND (b.day >= NOW() - INTERVAL 2 DAY)
GROUP BY a.post_author
ORDER BY weight DESC
LIMIT 3
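As an aside, the same statement can also be written with explicit JOIN syntax; this is just a reorganization of the query above (same tables, same conditions), so it should behave identically:

SELECT a.post_author, a.id, b.pageviews, a.post_title, a.guid, c.meta_value,
       t.topic_posts_approved,
       (b.pageviews * (c.meta_value + (t.topic_posts_approved * 1.25))) AS weight
FROM wordpressdb.wp_posts a
JOIN wordpressdb.wp_poppodyd b ON b.postid = a.ID
JOIN wordpressdb.wp_postmeta c ON c.post_id = a.ID AND c.meta_key = 'thumbs_up'
JOIN forumdb.bb_xpost xp ON xp.wp_id = a.ID
JOIN forumdb.bb_topics t ON t.topic_id = xp.topic_id
WHERE b.day >= NOW() - INTERVAL 2 DAY
GROUP BY a.post_author
ORDER BY weight DESC
LIMIT 3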
I think you can create a new table that uses both databases: select the columns you need from each and insert them into the new table, and then you can easily read from that table.
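A minimal sketch of that idea, using the table names from the accepted answer (the combined table name wp_forum_combined is made up for illustration):

-- materialize the cross-database join once, then query the result
CREATE TABLE wordpressdb.wp_forum_combined AS
SELECT a.ID AS post_id,
       a.post_author,
       t.topic_posts_approved
FROM wordpressdb.wp_posts a
JOIN forumdb.bb_xpost xp ON xp.wp_id = a.ID
JOIN forumdb.bb_topics t ON t.topic_id = xp.topic_id;

The weight calculation can then run against this one table, at the cost of having to refresh it whenever the source data changes.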
For a reporting output, I used to DROP and recreate a table, mis.pr_approval_time, but now I just TRUNCATE it.
After populating the above table with data, I run an UPDATE statement, but I have written it as a SELECT below...
SELECT t.account_id
FROM mis.hj_approval_survey h
INNER JOIN mis.pr_approval_time t
    ON h.country = t.country
    AND t.scheduled_at = (
        SELECT MAX(scheduled_at)
        FROM mis.pr_approval_time
        WHERE country = h.country
          AND scheduled_at <= h.created_at
          AND TIME_TO_SEC(TIMEDIFF(h.created_at, scheduled_at)) < 91
    );
When I run the above statement or even just...
SELECT t.account_id
FROM mis.hj_approval_survey h
INNER JOIN mis.pr_approval_time t
    ON h.country = t.country
    AND t.scheduled_at = (
        SELECT MAX(scheduled_at)
        FROM mis.pr_approval_time
        WHERE country = h.country
    );
...it runs forever and does not seem to finish. There are only ~3,400 rows in the hj_approval_survey table and 29,000 in pr_approval_time. I run this on an Amazon AWS instance with 15+ GB of RAM.
Now, if I simply right-click on the pr_approval_time table, choose the ALTER TABLE option, and just close it without doing anything, then the above queries run within seconds.
I guess that when I trigger the ALTER TABLE option and Workbench populates the table fields, it somehow improves the execution plan, but I am not sure why. Has anyone faced anything similar to this? How can I trigger a better execution plan without right-clicking the table and choosing ALTER TABLE?
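One guess on my part (untested): if TRUNCATE followed by the bulk reload leaves the table's index statistics stale, explicitly refreshing them might have the same effect as opening the ALTER TABLE dialog:

-- refresh index statistics so the optimizer re-plans against current data
ANALYZE TABLE mis.pr_approval_time;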
EDIT
It may be worth mentioning that my organisation also uses DOMO. Originally, I had this set up as a MySQL Dataflow on DOMO, but the query would not complete on most occasions, though I have observed it finish at times.
This was the reason I moved this query back to our AWS MySQL RDS. So the problem has been observed not only on our own MySQL RDS, but probably also on DOMO.
I suspect this is slow because of the correlated subquery (the subquery depends on row values from the parent table, meaning it has to execute once per row). I'd try to rework the pr_approval_time table slightly so it's point-in-time; then you can use a JOIN to pick the correct rows without a correlated subquery. Something like:
SELECT
hj_approval_survey.country
, hj_approval_survey.created_at
, pr_approval_time.account_id
FROM
    mis.hj_approval_survey AS hj_approval_survey
JOIN (
SELECT
current_row.country
, current_row.scheduled_at AS scheduled_at_start
    , COALESCE( MIN( next_row.scheduled_at ), NOW() ) AS scheduled_at_end
FROM
    mis.pr_approval_time AS current_row
LEFT OUTER JOIN
    mis.pr_approval_time AS next_row ON (
next_row.country = current_row.country
AND next_row.scheduled_at > current_row.scheduled_at
)
GROUP BY
current_row.country
, current_row.scheduled_at
) AS pr_approval_pit ON (
pr_approval_pit.country = hj_approval_survey.country
AND ( hj_approval_survey.created_at >= pr_approval_pit.scheduled_at_start
AND hj_approval_survey.created_at < pr_approval_pit.scheduled_at_end
)
)
JOIN mis.pr_approval_time AS pr_approval_time ON (
pr_approval_time.country = pr_approval_pit.country
AND pr_approval_time.scheduled_at = pr_approval_pit.scheduled_at_start
)
WHERE
TIME_TO_SEC( TIMEDIFF( hj_approval_survey.created_at, pr_approval_time.scheduled_at ) ) < 91
Assuming you have proper indexes on the columns involved in the joins, you could try refactoring your query using a GROUP BY subquery joined on country:
SELECT t.account_id
FROM mis.hj_approval_survey h
INNER JOIN mis.pr_approval_time t ON h.country = t.country
INNER JOIN (
SELECT country, MAX(scheduled_at) max_sched
FROM mis.pr_approval_time
    GROUP BY country
) z ON z.country = t.country AND t.scheduled_at = z.max_sched
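Either way, a composite index that covers both the join and the MAX aggregation should help here (a suggestion only, since the existing indexes aren't shown):

-- supports the join on country and the per-country MAX(scheduled_at)
ALTER TABLE mis.pr_approval_time ADD INDEX idx_country_scheduled (country, scheduled_at);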
A while back I got some help with a specific query. Here's the link: SQL Group BY using strings in new columns
My query looks similar to this:
SELECT event_data, class_40_winner, class_30_winner
FROM events e
LEFT JOIN (SELECT result_event, name AS class_40_winner
FROM results
WHERE class = 40 AND position = 1) c40 ON e.id = c40.result_event
LEFT JOIN (SELECT result_event, name AS class_30_winner
FROM results
WHERE class = 30 AND position = 1) c30 ON e.id = c30.result_event
I have now entered enough data in my database (22,000 rows) that this query is taking over 6 seconds to complete. (My actual query is bigger than the above, in that it now has 4 joins in it.)
I used the "Explain" function on my query to take a look. Each of the subqueries on the "results" table is pulling in all 22,000 rows, so this seems to be the problem.
I have done some research, and it sounds like I should be able to add an INDEX on the relevant column of the "results" table to speed things up. But when I did that, it actually slowed my query down to about 10 seconds.
Any suggestions for what I can do to improve this query?
AFAIK, you are pivoting your data, and MAX(CASE ...) ... GROUP BY generally performs well for pivoting.
I suggest you use this query instead:
select event_date
, max(case when r.class = 40 then name end) `Class 40 Winner`
, max(case when r.class = 30 then name end) `Class 30 Winner`
from events e
left join results r on e.id = r.result_event and r.position = 1
group by event_date;
[SQL Fiddle Demo]
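If the pivot is still slow at this row count, an index supporting the join and the position filter should help (the exact column order is an assumption, since the schema isn't shown):

-- lets the join seek on result_event/position and read class/name from the index
ALTER TABLE results ADD INDEX idx_event_position (result_event, position, class, name);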
Try this query:
SELECT
e.event_date,
r1.name as class_40_winner,
r2.name as class_30_winner
FROM
events e,
results r1,
results r2
WHERE
r1.class = 40 AND
r2.class = 30 AND
r1.position = 1 AND
r2.position = 1 AND
r1.result_event = e.id AND
r2.result_event = e.id
SELECT e.event_data
, r.class
, r.name winner
FROM events e
JOIN results r
ON r.result_event = e.id
WHERE class IN (30,40)
AND position = 1
The rest of this problem is a simple display issue, best resolved in application code.
The query below is very slow (it takes around 1 second), even though it is only searching approx 2,500 records (plus inner-joined tables).
If I remove the ORDER BY, the query runs in much less time (0.05 s or less). Alternatively, if I remove the nested select below "# used to select where no ProfilePhoto specified", it also runs fast, but I need both of these included.
I have indexes (or primary keys) on: tPhoto_PhotoID, PhotoID, p.Enabled, CustomerID, tCustomer_CustomerID, ProfilePhoto (bool), u.UserName, e.PrivateEmail, m.tUser_UserID, Enabled, Active, m.tMemberStatuses_MemberStatusID, e.tCustomerMembership_MembershipID, and e.DateCreated.
(Do I have too many indexes? My understanding was to add them anywhere I use WHERE or ON.)
The Query :
SELECT e.CustomerID,
e.CustomerName,
e.Location,
SUBSTRING_INDEX(e.CustomerProfile,' ', 25) AS Description,
IFNULL(p.PhotoURL, PhotoTable.PhotoURL) AS PhotoURL
FROM tCustomer e
LEFT JOIN (tCustomerPhoto ep INNER JOIN tPhoto p ON (ep.tPhoto_PhotoID = p.PhotoID AND p.Enabled=1))
ON e.CustomerID = ep.tCustomer_CustomerID AND ep.ProfilePhoto = 1
# used to select where no ProfilePhoto specified
LEFT JOIN ((SELECT pp.PhotoURL, epp.tCustomer_CustomerID
FROM tPhoto pp
LEFT JOIN tCustomerPhoto epp ON epp.tPhoto_PhotoID = pp.PhotoID
GROUP BY epp.tCustomer_CustomerID) AS PhotoTable) ON e.CustomerID = PhotoTable.tCustomer_CustomerID
INNER JOIN tUser u ON u.UserName = e.PrivateEmail
INNER JOIN tmembers m ON m.tUser_UserID = u.UserID
WHERE e.Enabled=1
AND e.Active=1
AND m.tMemberStatuses_MemberStatusID = 2
AND e.tCustomerMembership_MembershipID != 6
ORDER BY e.DateCreated DESC
LIMIT 12
I have similar queries that run much faster, so any opinions would be appreciated.
Until we get more clarity on your question (for example, how this differs from the similar queries that run fast), try EXPLAIN {YourSelectQuery} in the MySQL client and use the output to see where the query can be improved.
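For an ORDER BY ... LIMIT query like this one, a composite index that matches both the filters and the sort column often avoids the filesort; something along these lines might be worth testing (a sketch only, without seeing the full schema):

-- matches WHERE e.Enabled = 1 AND e.Active = 1 ... ORDER BY e.DateCreated DESC
ALTER TABLE tCustomer ADD INDEX idx_enabled_active_created (Enabled, Active, DateCreated);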
In the following query, I show the latest status of the sale (by stage, in this case stage number 3). The query is based on a subquery over the sale's status history:
SELECT v.id_sale,
IFNULL((
SELECT (CASE WHEN IFNULL( vec.description, '' ) = ''
THEN ve.name
ELSE vec.description
END)
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
WHERE veh.id_sale = v.id_sale
AND vec.id_stage = 3
ORDER BY veh.id_record DESC
LIMIT 1
), 'x') sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
WHERE 1 =1
AND v.flag =1
AND v.id_quarters =4
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
The query takes 0.0057 s and returns 1,011 records.
Because I have to filter the sales by the name of the state, which would mean repeating the subquery in a WHERE clause, I decided to rewrite the same query using joins. In this case, I'm using the MAX function to obtain the latest status:
SELECT
v.id_sale,
IFNULL(veh3.State3,'x') AS sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
LEFT JOIN (
SELECT veh.id_sale,
(CASE WHEN IFNULL(vec.description,'') = ''
THEN ve.name
ELSE vec.description END) AS State3
FROM t_record veh
INNER JOIN (
SELECT id_sale, MAX(id_record) AS max_rating
FROM(
SELECT veh.id_sale, id_record
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign AND vec.id_stage = 3
) m
GROUP BY id_sale
) x ON x.max_rating = veh.id_record
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
AND v.id_quarters = 4
This query returns the same 1,011 results, but the problem is that it takes 0.0753 s.
Reviewing the possibilities, I found the factor that makes the difference in query speed:
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
If I remove this clause, both queries take the same time... Why does it work better? Is there any way to use this clause with the joins? I'd appreciate your help.
EDIT
I will show the results of EXPLAIN for each query respectively:
q1:
q2:
Interesting, so that little statement basically determines whether there is a match between t_record.id_sale and t_sale.id_sale.
Why is this making your query run faster? Because WHERE conditions are applied before the subselects in the SELECT list, so if there is no record to go with the sale, it doesn't bother processing the subselect, which is netting you some time. That's why it works better.
Is it going to work in your join syntax? I don't really know without having your tables to test against, but you can always just apply it to the end and find out. Add the keyword EXPLAIN to the beginning of your query and you will get an execution plan, which will help you optimize things. Probably the best way to get better results with your join syntax is to add some indexes to your tables.
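For example, taking the join version of your query and applying the clause to the end (untested against your tables, but syntactically this is all it should take):

SELECT
    v.id_sale,
    IFNULL(veh3.State3, 'x') AS sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
LEFT JOIN (
    SELECT veh.id_sale,
           (CASE WHEN IFNULL(vec.description, '') = ''
                 THEN ve.name
                 ELSE vec.description END) AS State3
    FROM t_record veh
    INNER JOIN (
        SELECT id_sale, MAX(id_record) AS max_rating
        FROM (
            SELECT veh.id_sale, id_record
            FROM t_record veh
            INNER JOIN t_state_campaign vec
                ON vec.id_state_campaign = veh.id_state_campaign
                AND vec.id_stage = 3
        ) m
        GROUP BY id_sale
    ) x ON x.max_rating = veh.id_record
    INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
    INNER JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
  AND v.id_quarters = 4
  -- the clause that made the first version fast, applied unchanged
  AND EXISTS (
      SELECT '1'
      FROM t_record
      WHERE id_sale = v.id_sale
      LIMIT 1
  )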
But I ask you: is this even necessary? You have a query returning in under eight hundredths of a second. Unless this query is being run thousands of times an hour, it is not really taxing your DB at all, and your time is probably better spent making improvements elsewhere in your application.