For a reporting output, I used to DROP and recreate a table 'mis.pr_approval_time', but now I just TRUNCATE it.
After populating the above table with data, I run an UPDATE statement, but I have written that as a SELECT below...
SELECT t.account_id FROM mis.hj_approval_survey h INNER JOIN mis.pr_approval_time t ON h.country = t.country AND t.scheduled_at =
(
SELECT MAX(scheduled_at) FROM mis.pr_approval_time
WHERE country = h.country
AND scheduled_at <= h.created_at
AND TIME_TO_SEC(TIMEDIFF(h.created_at, scheduled_at)) < 91
);
When I run the above statement or even just...
SELECT t.account_id FROM mis.hj_approval_survey h INNER JOIN mis.pr_approval_time t ON h.country = t.country AND t.scheduled_at =
(
SELECT MAX(scheduled_at) FROM mis.pr_approval_time
WHERE country = h.country
);
...it runs forever and does not seem to finish. There are only ~3,400 rows in hj_approval_survey table and 29,000 rows in pr_approval_time. I run this on an Amazon AWS instance with 15+ GB RAM.
Now, if I simply right click on pr_approval_time table and choose ALTER TABLE option, and just close without doing anything, then the above queries run within seconds.
I guess when I trigger the ALTER TABLE option and Workbench populates the table fields, it probably improves its execution plan somehow, but I am not sure why. Has anyone faced anything similar to this? How can I trigger a better execution plan check without right-clicking the table and choosing 'ALTER TABLE'?
EDIT
It may be worth mentioning that my organisation also uses DOMO. Originally, I had this set up as a MySQL Dataflow on DOMO, but the query would not complete on most occasions, although I have observed it finish at times.
This was the reason why I moved this query back to our AWS MySQL RDS. So the problem has been observed not only on our own MySQL RDS, but probably also on DOMO.
I suspect this is slow because of the correlated subquery (subquery depends on row values from parent table, meaning it has to execute for each row). I'd try and rework the pr_approval_time table slightly so it's point-in-time and then you can use the JOIN to pick the correct rows without doing a correlated subquery. Something like:
SELECT
    hj_approval_survey.country
  , hj_approval_survey.created_at
  , pr_approval_time.account_id
FROM
  mis.hj_approval_survey AS hj_approval_survey
JOIN (
  -- one row per (country, scheduled_at) with the start of the next interval as its end
  SELECT
      current_row.country
    , current_row.scheduled_at AS scheduled_at_start
    , COALESCE( MIN( next_row.scheduled_at ), NOW() ) AS scheduled_at_end
  FROM
    mis.pr_approval_time AS current_row
  LEFT OUTER JOIN
    mis.pr_approval_time AS next_row ON (
      next_row.country = current_row.country
      AND next_row.scheduled_at > current_row.scheduled_at
    )
  GROUP BY
      current_row.country
    , current_row.scheduled_at
) AS pr_approval_pit ON (
  pr_approval_pit.country = hj_approval_survey.country
  AND ( hj_approval_survey.created_at >= pr_approval_pit.scheduled_at_start
    AND hj_approval_survey.created_at < pr_approval_pit.scheduled_at_end
  )
)
JOIN mis.pr_approval_time AS pr_approval_time ON (
  pr_approval_time.country = pr_approval_pit.country
  AND pr_approval_time.scheduled_at = pr_approval_pit.scheduled_at_start
)
WHERE
  TIME_TO_SEC( TIMEDIFF( hj_approval_survey.created_at, pr_approval_time.scheduled_at ) ) < 91
Assuming you have proper indexes on the columns involved in the join, you could try refactoring your query using a grouped subquery joined on country:
SELECT t.account_id
FROM mis.hj_approval_survey h
INNER JOIN mis.pr_approval_time t ON h.country = t.country
INNER JOIN (
SELECT country, MAX(scheduled_at) max_sched
FROM mis.pr_approval_time
group by country
) z on z.country = t.country and t.scheduled_at = z.max_sched
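On the "proper index" assumption, and on the question of refreshing the plan without opening the ALTER TABLE dialog, here is a minimal sketch. It assumes the table is InnoDB and that its index statistics are stale after the TRUNCATE and reload. The index name idx_country_sched is made up for illustration. ANALYZE TABLE asks MySQL to recompute the statistics the optimizer relies on. Whether that is what the Workbench dialog triggers as a side effect is a guess, not something confirmed in this thread.

```sql
-- One-off: hypothetical composite index covering the join and the MAX() lookup
-- (the name idx_country_sched is an assumption, not from the question).
ALTER TABLE mis.pr_approval_time
  ADD INDEX idx_country_sched (country, scheduled_at);

-- Run after each TRUNCATE + reload so the optimizer sees fresh statistics.
ANALYZE TABLE mis.pr_approval_time;

-- Compare the chosen plan before and after the two statements above.
EXPLAIN
SELECT t.account_id
FROM mis.hj_approval_survey h
INNER JOIN mis.pr_approval_time t
        ON h.country = t.country
       AND t.scheduled_at = (
             SELECT MAX(p.scheduled_at)
             FROM mis.pr_approval_time p
             WHERE p.country = h.country
           );
```

If the EXPLAIN output changes after ANALYZE TABLE in the same way it does after opening the dialog, stale statistics are the likely cause, and the ANALYZE TABLE call can simply be added to the end of the load job.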
Related
I am trying to compare values against same table which has more than 1,000,000 rows. Below is my query and it takes around 25 secs to get results.
EXPLAIN SELECT DISTINCT a.studyid,a.number,a.load_number,b.studyid,b.number,b.load_number FROM
(SELECT t1.*, buildnumber,platformid FROM t t1
INNER JOIN testlog t2 ON t1.`testid` = t2.`testid`
WHERE (buildnumber =1031719 AND platformid IN (SELECT platformid FROM platform WHERE platform.`Description` = "Windows 7 SP1"))
)AS a
JOIN
(SELECT t1.*,buildnumber,platformid FROM t t1
INNER JOIN testlog t2 ON t1.`testid` = t2.`testid`
WHERE (buildnumber =1030716 AND platformid IN (SELECT platformid FROM platform WHERE platform.`Description` = "Windows 7 SP1"))
)AS b
ON a.studyid=b.studyid AND a.load_number = b.load_number AND a.number = b.number
Could anyone help me improve this query so that it returns results fast enough?
The problem here is that even though I have an index on number and load_number, the query doesn't use it. I don't know why it is always ignored.
Thanks.
First, you have a silly query. You are retrieving six columns, but there are only three values. Look at the on clause.
I think your best bet is to rewrite the query using conditional aggregation. I think the following is equivalent:
SELECT t1.studyid, t1.load_number, t1.number
FROM t t1 INNER JOIN
testlog t2
ON t1.testid = t2.testid
WHERE t2.buildnumber IN (1031719, 1030716) AND
platformid IN (SELECT platformid FROM platform p WHERE p.Description = 'Windows 7 SP1')
GROUP BY studyid, load_number, number
HAVING MIN(buildnumber) <> MAX(buildnumber)
For this query, you want indexes on platform(Description, platformid) and testlog(buildnumber, platformid) and t(testid).
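The indexes mentioned above could be created along these lines. This is only a sketch: the index names are invented, and it assumes Description is a VARCHAR (a TEXT column would need a prefix length) and that none of these indexes exist already.

```sql
-- Index names are hypothetical; the column lists follow the answer above.
CREATE INDEX idx_platform_desc ON platform (Description, platformid);
CREATE INDEX idx_testlog_build ON testlog (buildnumber, platformid);
CREATE INDEX idx_t_testid      ON t (testid);
```

Running EXPLAIN on the query before and after is the easiest way to see whether the optimizer actually picks them up.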
Problem #1:
IN ( SELECT ... ) optimizes very poorly. The subquery is rerun again and again. It looks like you are expecting exactly one id from that query; if so, change it to = ( SELECT ... ). That way it will be run exactly once.
Problem #2:
FROM ( SELECT ... )
JOIN ( SELECT ... ) ON ...
optimizes poorly because neither subquery has an index. Can you merge the two subqueries into one, as Gordon was trying? If not, then put one of them into a TEMPORARY TABLE and add an appropriate index to that table so that the ON will be able to use it. Probably PRIMARY KEY(studyid, load_number, number).
Footnote: The latest versions of MySQL have made improvements on these problems by dynamically generating indexes. What version are you using?
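A rough sketch of the TEMPORARY TABLE route, under a few assumptions that are not stated in the question: buildnumber and platformid are columns of testlog, exactly one platform row matches the description, and the table name tmp_build_b and index name idx_match are made up. A plain INDEX is used rather than a PRIMARY KEY because nothing in the question guarantees that (studyid, load_number, number) is unique.

```sql
-- Materialize one side of the comparison with an index the join can use.
CREATE TEMPORARY TABLE tmp_build_b (
  INDEX idx_match (studyid, load_number, number)
)
SELECT t1.studyid, t1.load_number, t1.number
FROM t t1
INNER JOIN testlog t2 ON t1.testid = t2.testid
WHERE t2.buildnumber = 1030716
  AND t2.platformid = (SELECT p.platformid FROM platform p
                       WHERE p.Description = 'Windows 7 SP1');

-- Join the other build against the indexed temporary table.
SELECT DISTINCT a.studyid, a.number, a.load_number
FROM t a
INNER JOIN testlog ta ON a.testid = ta.testid
INNER JOIN tmp_build_b b
        ON a.studyid = b.studyid
       AND a.load_number = b.load_number
       AND a.number = b.number
WHERE ta.buildnumber = 1031719
  AND ta.platformid = (SELECT p.platformid FROM platform p
                       WHERE p.Description = 'Windows 7 SP1');

DROP TEMPORARY TABLE tmp_build_b;
```

Only three columns are selected because, as noted earlier, the ON clause makes the a.* and b.* values identical.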
I have a MySQL query that maps Users to Zones according to their location, and the zone boundaries:
UPDATE User u SET u.zoneId = (
SELECT z.id FROM Zone z
WHERE ST_Contains(z.boundary, u.location)
ORDER BY z.level DESC
LIMIT 1
);
This works fine, but it is quite slow as it's performing a subquery for every single record.
Is it possible to rewrite it using a JOIN, even though it's using ORDER BY ... LIMIT 1 in the subquery?
This ORDER BY ... LIMIT 1 is necessary as several encapsulated zones can match a location, and only the smallest one (highest level) must be assigned.
In the absence of any test data or full DDL I can't really test this, but this might work. It uses joins and a subquery, but buries the subquery an extra level down, which might allow MySQL to ignore the use of the table being updated in the subquery.
It does rely on the User table having a unique key (I have just assumed it is called id).
UPDATE User u
INNER JOIN Zone z
ON ST_Contains(z.boundary, u.location)
INNER JOIN
(
SELECT id, MaxLevel
FROM
(
SELECT u.id, MAX(z.level) AS MaxLevel
FROM User u
INNER JOIN Zone z
ON ST_Contains(z.boundary, u.location)
GROUP BY u.id
) Sub1
) Sub2
ON u.id = Sub2.id AND z.level = Sub2.MaxLevel
SET u.zoneId = z.id
If you could set up some test data in SQL fiddle I can test this.
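For anyone on MySQL 8.0 or later, a window function is another way to keep the "ORDER BY ... LIMIT 1" logic inside a join. This is an untested sketch under the same assumption as above (User has a unique key called id); the aliases ranked, user_id, zone_id and rn are just illustrative names.

```sql
UPDATE User u
INNER JOIN (
  -- Rank every containing zone per user, deepest level first.
  SELECT u2.id AS user_id,
         z.id  AS zone_id,
         ROW_NUMBER() OVER (PARTITION BY u2.id
                            ORDER BY z.level DESC) AS rn
  FROM User u2
  INNER JOIN Zone z ON ST_Contains(z.boundary, u2.location)
) ranked
   ON ranked.user_id = u.id
  AND ranked.rn = 1          -- keep only the highest-level zone
SET u.zoneId = ranked.zone_id;
```

One behavioural difference from the original statement: users whose location falls inside no zone are left untouched here, whereas the correlated subquery version sets their zoneId to NULL.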
Here's an SQL statement (actually two statements) that works -- it's taking a series of matching rows and adding a delivery_number which increments for each row:
SELECT @i:=0;
UPDATE pipeline_deliveries AS d
SET d.delivery_number = @i:=@i+1
WHERE d.pipelineID = 11
ORDER BY d.setup_time;
But now, the client no longer wants them ordered by setup_time. They needed to be ordered according to departure time, which is a field in another table. I can't figure out how to do this.
The MySQL docs, as well as this answer, suggest that in version 4.0 and up (we're running MySQL 5.0) I should be able to do this:
SELECT @i:=0;
UPDATE pipeline_deliveries AS d RIGHT JOIN pipeline_routesXdeliveryID AS rXd
ON d.pipeline_deliveryID = rXd.pipeline_deliveryID
LEFT JOIN pipeline_routes AS r
ON rXd.pipeline_routeID = r.pipeline_routeID
SET d.delivery_number = @i:=@i+1
WHERE d.pipelineID = 11
ORDER BY r.departure_time,d.pipeline_deliveryID;
but I get the error #1221 - Incorrect usage of UPDATE and ORDER BY.
So what's the correct usage?
You can't mix UPDATE joining 2 (or more) tables and ORDER BY.
You can bypass the limitation, with something like this:
UPDATE
pipeline_deliveries AS upd
JOIN
( SELECT t.pipeline_deliveryID,
@i := @i+1 AS row_number
FROM
( SELECT @i:=0 ) AS dummy
CROSS JOIN
( SELECT d.pipeline_deliveryID
FROM
pipeline_deliveries AS d
JOIN
pipeline_routesXdeliveryID AS rXd
ON d.pipeline_deliveryID = rXd.pipeline_deliveryID
LEFT JOIN
pipeline_routes AS r
ON rXd.pipeline_routeID = r.pipeline_routeID
WHERE
d.pipelineID = 11
ORDER BY
r.departure_time, d.pipeline_deliveryID
) AS t
) AS tmp
ON tmp.pipeline_deliveryID = upd.pipeline_deliveryID
SET
upd.delivery_number = tmp.row_number ;
The above uses two features of MySQL: user-defined variables and ordering inside a derived table. Because the latter is not standard SQL, it may very well break in a future release of MySQL (when the optimizer is clever enough to figure out that ordering inside a derived table is useless unless there is a LIMIT clause). In fact, the query does exactly that in the latest versions of MariaDB (5.3 and 5.5): it runs as if the ORDER BY were not there and the results are not as expected. See a related question at the MariaDB site: GROUP BY trick has been optimized away.
The same may very well happen in any future release of mainstream MySQL (maybe in 5.6, anyone care to test this?) that improves the optimizer code.
So, it's better to write this in standard SQL. The best option would be window functions, which haven't been implemented yet. But you could also use a self-join, which is not too bad regarding efficiency, as long as you are dealing with a small subset of rows to be affected by the update.
UPDATE
pipeline_deliveries AS upd
JOIN
( SELECT t1.pipeline_deliveryID
, COUNT(*) AS row_number
FROM
( SELECT d.pipeline_deliveryID
, r.departure_time
FROM
pipeline_deliveries AS d
JOIN
pipeline_routesXdeliveryID AS rXd
ON d.pipeline_deliveryID = rXd.pipeline_deliveryID
LEFT JOIN
pipeline_routes AS r
ON rXd.pipeline_routeID = r.pipeline_routeID
WHERE
d.pipelineID = 11
) AS t1
JOIN
( SELECT d.pipeline_deliveryID
, r.departure_time
FROM
pipeline_deliveries AS d
JOIN
pipeline_routesXdeliveryID AS rXd
ON d.pipeline_deliveryID = rXd.pipeline_deliveryID
LEFT JOIN
pipeline_routes AS r
ON rXd.pipeline_routeID = r.pipeline_routeID
WHERE
d.pipelineID = 11
) AS t2
ON t2.departure_time < t1.departure_time
OR t2.departure_time = t1.departure_time
AND t2.pipeline_deliveryID <= t1.pipeline_deliveryID
OR t1.departure_time IS NULL
AND ( t2.departure_time IS NOT NULL
OR t2.departure_time IS NULL
AND t2.pipeline_deliveryID <= t1.pipeline_deliveryID
)
GROUP BY
t1.pipeline_deliveryID
) AS tmp
ON tmp.pipeline_deliveryID = upd.pipeline_deliveryID
SET
upd.delivery_number = tmp.row_number ;
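For readers on MySQL 8.0 or later, where window functions are available, the numbering can be sketched with ROW_NUMBER() instead of the self-join. This is an assumption about your version, not something that works on the 5.0 server from the question. The alias rn is used because ROW_NUMBER is a reserved word in 8.0.

```sql
UPDATE pipeline_deliveries AS upd
JOIN (
  -- Number the matching deliveries by departure time, then by delivery ID.
  SELECT d.pipeline_deliveryID,
         ROW_NUMBER() OVER (ORDER BY r.departure_time,
                                     d.pipeline_deliveryID) AS rn
  FROM pipeline_deliveries AS d
  JOIN pipeline_routesXdeliveryID AS rXd
    ON d.pipeline_deliveryID = rXd.pipeline_deliveryID
  LEFT JOIN pipeline_routes AS r
    ON rXd.pipeline_routeID = r.pipeline_routeID
  WHERE d.pipelineID = 11
) AS tmp
  ON tmp.pipeline_deliveryID = upd.pipeline_deliveryID
SET upd.delivery_number = tmp.rn;
```

Note that MySQL sorts NULL departure times first in ascending order, while the self-join above numbers them last, so rows without a route would get different numbers under the two approaches.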
Based on this documentation
For the multiple-table syntax, UPDATE updates rows in each table named
in table_references that satisfy the conditions. In this case, ORDER
BY and LIMIT cannot be used.
Without knowing too much about MySQL you could open up a cursor and process this row by row, or by passing it back to the client code (PHP,Java, etc) that you maintain to handle this processing.
After more digging:
To eliminate the badly optimized subquery, you need to rewrite the
subquery as a join, but how can you do that and retain the LIMIT and
ORDER BY? One way is to find the rows to be updated in a subquery in
the FROM clause, so the LIMIT and ORDER BY can be nested inside the
subquery. In this way work_to_do is joined against the ten
highest-priority unclaimed rows of itself. Normally you can’t
self-join the update target in a multi-table UPDATE, but since it’s
within a subquery in the FROM clause, it works in this case.
update work_to_do as target
inner join (
select w.client, work_unit
from work_to_do as w
inner join eligible_client as e on e.client = w.client
where processor = 0
order by priority desc
limit 10
) as source on source.client = target.client
and source.work_unit = target.work_unit
set processor = @process_id;
There is one downside: the rows are not locked in primary key order.
This may help explain the occasional deadlock we get on this table
The hard way:-
ALTER TABLE eav_attribute_option
ADD temp_value TEXT NOT NULL
AFTER sort_order;
UPDATE eav_attribute_option o
JOIN eav_attribute_option_value ov ON o.option_id=ov.option_id
SET o.temp_value = ov.value
WHERE o.attribute_id=90;
SET @x = 0;
UPDATE eav_attribute_option
SET sort_order = (@x:=@x+1)
WHERE attribute_id=90
ORDER BY temp_value ASC;
ALTER TABLE eav_attribute_option
DROP temp_value;
EDIT:
Sorry about the unreadable query, I was under a deadline. I managed to solve the problem by breaking this query into two smaller ones and doing some business logic in Java. I would still like to know why this query can, at random, return two different results.
It randomly returns all expected results one time and just half of them another time. I noticed that when I write it join by join, and execute it after adding each join, in the end it returns all expected results. So I am wondering if there is some kind of MySQL memory or other limitation that stops it from taking whole tables into the joins. I have also read about non-deterministic queries, but I am not sure whether that applies here.
Please help, ask if anything needs clarification, and thank you in advance.
RESET QUERY CACHE;
SET SQL_BIG_SELECTS=1;
set @displayvideoaction_id = 2302;
set @ticSessionId = 3851;
select richtext.id,richtextcross.name,richtextcross.updates_demo_field,richtext.content from
(
select listitemcross.id,name,updates_demo_field,listitem.text_id from
(
select id,name, updates_demo_field, items_id from
(
SELECT id, name, answertype_id, updates_demo_field,
@student:=CASE WHEN @class <> updates_demo_field THEN 0 ELSE @student+1 END AS rn,
@class:=updates_demo_field AS clset FROM
(SELECT @student:= -1) s,
(SELECT @class:= '-1') c,
(
select id, name, answertype_id, updates_demo_field from
(
select manytomany.questions_id from
(
select questiongroup_id from
(
select questiongroup_id from `ticnotes`.`scriptaction` where ticsession_id=@ticSessionId and questiongroup_id is not null
) scriptaction
inner join
(
select * from `ticnotes`.`questiongroup`
) questiongroup on scriptaction.questiongroup_id=questiongroup.id
) scriptgroup
inner join
(
select * from `ticnotes`.`questiongroup_question`
) manytomany on scriptgroup.questiongroup_id=manytomany.questiongroup_id
) questionrelation
inner join
(
select * from `ticnotes`.`question`
) questiontable on questionrelation.questions_id=questiontable.id
where updates_demo_field = 'DEMO1' or updates_demo_field = 'DEMO2'
order by updates_demo_field, id desc
) t
having rn=0
) firstrowofgroup
inner join
(
select * from `ticnotes`.`multipleoptionstype_listitem`
) selectlistanswers on firstrowofgroup.answertype_id=selectlistanswers.multipleoptionstype_id
) listitemcross
inner join
(
select * from `ticnotes`.`listitem`
) listitem on listitemcross.items_id=listitem.id
) richtextcross
inner join
(
select * from `ticnotes`.`richtext`
) richtext on richtextcross.text_id=richtext.id;
My first impression is: don't use shortcuts to name your tables. I am lost as to which td3 is where, then td6, tdx3... I guess you might be lost as well.
If you name your aliases more sensibly, there will be less chance of getting something wrong and mixing up 6 with 8 or whatever.
Just a suggestion :)
There is no such limitation in MySQL, so my bet would be on human error: somewhere the join logic fails.
I'm trying to generate a report using CodeIgniter and Datatables.net.
Now I'm trying to get the number of closed jobs (it's a human resources system). I used to query all jobs and then loop over them with a foreach in PHP to do the calculations.
Because I want to use all the features of Datatables (sorting specifically), I'm trying to do all the calculations in MySQL.
The problem is that the second subquery is very, very slow.
SELECT
jobs.jobs_id, clients.nome_fantasia, concat_ws(' ', user_profiles.first_name, user_profiles.last_name) as fullname,
jobs.titulo_vaga, jobs.qtd_vagas, company.name as nome_company, jobs_status.name as status_name, DATEDIFF(NOW(), jobs.data_abertura) as date_idade,
(select count(job_cv.jobs_id) from job_cv where job_cv.jobs_id = jobs.jobs_id) as qtd_int,
(select count(distinct job_cv.user_id) from job_cv_history join job_cv on job_cv.job_cv_id = job_cv_history.job_cv_id where job_cv_history.status = '11' and job_cv.jobs_id = jobs.jobs_id ) as fechadas
FROM (jobs)
JOIN clients ON clients.clients_id=jobs.clients_id
JOIN user_profiles ON jobs.consultor_id=user_profiles.user_id
JOIN jobs_status ON jobs.status=jobs_status.jobs_status_id
JOIN company ON jobs.company_id=company.company_id
LIMIT 50
Can someone help me? I can provide more information if it's needed.
UPDATE
The idea of using a JOIN instead of a SELECT works for the first subquery but not for the second one. Is there a way to pass a 'variable', like the current jobs_id, to use inside the subquery?
UPDATE AGAIN
This query works fine by itself, but used as the joined subquery it takes about a minute and returns wrong values.
SELECT job_cv.jobs_id,count(distinct job_cv.user_id) AS fechadas
FROM job_cv_history
JOIN job_cv
ON job_cv.job_cv_id = job_cv_history.job_cv_id
WHERE job_cv_history.status = '11'
GROUP BY job_cv.jobs_id
It is not the subquery itself that is slow. It's the fact that you're executing these subqueries once for each row returned from the outer query. Move them into joins instead, and you should see an increase in performance.
SELECT
jobs.jobs_id, clients.nome_fantasia, concat_ws(' ', user_profiles.first_name, user_profiles.last_name) as fullname,
jobs.titulo_vaga, jobs.qtd_vagas, company.name as nome_company, jobs_status.name as status_name, DATEDIFF(NOW(), jobs.data_abertura) as date_idade,
COALESCE(qtd.qtd_int, 0) AS qtd_int,
COALESCE(fechadas.fechadas, 0) AS fechadas
FROM (jobs)
JOIN clients ON clients.clients_id=jobs.clients_id
JOIN user_profiles ON jobs.consultor_id=user_profiles.user_id
JOIN jobs_status ON jobs.status=jobs_status.jobs_status_id
JOIN company ON jobs.company_id=company.company_id
-- LEFT JOIN keeps jobs with no rows in job_cv, as the original scalar subqueries did
LEFT JOIN (
SELECT jobs_id, count(jobs_id) AS qtd_int FROM job_cv GROUP BY jobs_id
) AS qtd ON qtd.jobs_id = jobs.jobs_id
LEFT JOIN (
-- group by jobs_id (not user_id) so there is one row per job to join on
SELECT job_cv.jobs_id, count(distinct job_cv.user_id) AS fechadas
FROM job_cv_history
JOIN job_cv
ON job_cv.job_cv_id = job_cv_history.job_cv_id
WHERE job_cv_history.status = '11'
GROUP BY job_cv.jobs_id
) AS fechadas ON fechadas.jobs_id = jobs.jobs_id
LIMIT 50
You may try to create these indexes:
ALTER TABLE `job_cv` ADD INDEX `job_cv_cindex` (`job_cv_id` ASC, `jobs_id` ASC, `user_id` ASC);
ALTER TABLE `job_cv_history` ADD INDEX `job_cv_history_cindex` (`job_cv_id` ASC, `status` ASC);
Use joins instead of correlated subqueries. In MySQL this often improves performance significantly.
Try a LEFT JOIN in your case and see whether performance improves.