How can I optimize this MySQL query? Should I use joins? - mysql

I would like to optimize the following MySQL query. It reads from two tables: authorchecklist holds the detailed records, while selectedrevlist is used for the reviewers' reviews.
I want only the records that have a matching row in selectedrevlist where scoreSubmit = 1.
SELECT *
FROM `authorchecklist` acl
WHERE acl.manuscriptStatus = 'Awaiting Reviewer Assignment'
AND acl.submitStatus = '1'
AND
( SELECT COUNT( 1 )
FROM selectedrevlist srl
WHERE srl.OrderNumber = acl.OrderNumber
AND srl.editorType = 'Editor'
AND srl.editorID = '10'
AND srl.scoreSubmit = '1'
) = 1
The above query works fine, but it takes approximately 20 seconds to load the records.

This is your query:
SELECT acl.*
FROM authorchecklist acl
WHERE acl.manuscriptStatus = 'Awaiting Reviewer Assignment' AND
acl.submitStatus = 1 AND
(SELECT COUNT(1)
FROM selectedrevlist srl
WHERE srl.OrderNumber = acl.OrderNumber AND
srl.editorType = 'Editor' AND
srl.editorID = 10 AND
srl.scoreSubmit = 1
) = 1 ;
For this query, you want indexes on authorchecklist(submitStatus, manuscriptStatus, OrderNumber) and selectedrevlist(OrderNumber, editorId, scoreSubmit).

I rearranged the query to make it easier to read:
SELECT *
FROM `authorchecklist` acl
WHERE acl.manuscriptStatus = 'Awaiting Reviewer Assignment'
AND acl.submitStatus = '1'
AND (
SELECT COUNT( 1 )
FROM selectedrevlist srl
WHERE srl.OrderNumber = acl.OrderNumber
AND srl.editorType = 'Editor'
AND srl.editorID = '10' AND srl.scoreSubmit = '1'
) = 1
I'm assuming there's only one selectedrevlist for each authorchecklist.
You didn't post the table definitions ("CREATE TABLE ..."), but probably at least one of these fields is not indexed:
authorchecklist.manuscriptStatus
selectedrevlist.OrderNumber
If they're not indexed, the SQL server will need to traverse all the records. It will traverse all the authorchecklist rows and, for each authorchecklist row, traverse all the selectedrevlist rows to find those where srl.OrderNumber = acl.OrderNumber. Indexes may make inserts a bit slower, but they speed up reads if they're used appropriately.
If you're using MySQL, add "LIMIT 1" when you're sure that only one record will ever be fetched. Also, use the InnoDB engine and add foreign keys: they validate the relations, and InnoDB backs each one with an index.
Have a look at these:
https://logicalread.com/optimize-mysql-indexes-mc12/#.WsDpLtP4_Eg
https://www.eversql.com/choosing-the-best-indexes-for-mysql-query-optimization/
https://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html
https://logicalread.com/mysql-foreign-keys-mc13/#.WsDrx9P4_Eg

Add these indexes:
ALTER TABLE
`authorchecklist`
ADD
INDEX `authorchecklist_idx_manuscriptstatu_submitstatus` (`manuscriptStatus`, `submitStatus`);
ALTER TABLE
`authorchecklist`
ADD
INDEX `authorchecklist_idx_ordernumber` (`OrderNumber`);
ALTER TABLE
`selectedrevlist`
ADD
INDEX `selectedrevlist_idx_editort_editori_scoresu_ordernu` (
`editorType`,
`editorID`,
`scoreSubmit`,
`OrderNumber`
);
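Use EXISTS instead of a subquery that counts records.
An EXISTS subquery exits as soon as a matching row is found, instead of counting all rows that match the filters. Note that EXISTS matches one or more rows, whereas COUNT(...) = 1 matches exactly one, so the results differ if an OrderNumber can have several matching selectedrevlist rows.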
SELECT
*
FROM
`authorchecklist` acl
WHERE
acl.manuscriptStatus = 'Awaiting Reviewer Assignment'
AND acl.submitStatus = '1'
AND EXISTS (SELECT *
FROM
selectedrevlist srl
WHERE
srl.OrderNumber = acl.OrderNumber
AND srl.editorType = 'Editor'
AND srl.editorID = '10'
AND srl.scoreSubmit = '1')
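The difference between the COUNT(...) = 1 filter and EXISTS can be checked on a handful of rows. The following is only a sketch: acl_demo and srl_demo are hypothetical, cut-down stand-ins for the real tables, and the sample data is made up.

```sql
-- Hypothetical, cut-down tables; only the columns needed for the comparison.
CREATE TABLE acl_demo (OrderNumber INT PRIMARY KEY);
CREATE TABLE srl_demo (
  id INT AUTO_INCREMENT PRIMARY KEY,
  OrderNumber INT NOT NULL,
  scoreSubmit TINYINT NOT NULL,
  INDEX idx_order_score (OrderNumber, scoreSubmit)
);

INSERT INTO acl_demo VALUES (1), (2), (3);
INSERT INTO srl_demo (OrderNumber, scoreSubmit) VALUES
  (1, 1),          -- order 1: exactly one submitted review
  (2, 1), (2, 1);  -- order 2: two submitted reviews (order 3 has none)

-- COUNT(*) = 1 keeps only orders with exactly one match: returns order 1.
SELECT a.OrderNumber
FROM acl_demo a
WHERE (SELECT COUNT(*)
       FROM srl_demo s
       WHERE s.OrderNumber = a.OrderNumber
         AND s.scoreSubmit = 1) = 1;

-- EXISTS keeps orders with at least one match: returns orders 1 and 2.
SELECT a.OrderNumber
FROM acl_demo a
WHERE EXISTS (SELECT 1
              FROM srl_demo s
              WHERE s.OrderNumber = a.OrderNumber
                AND s.scoreSubmit = 1);
```

If each order can have at most one matching review, both forms return the same rows and EXISTS is the cheaper one to evaluate.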

Related

Query takes more than 40 seconds to execute

This query takes more than 40 seconds to execute on a table that has 200k rows
SELECT
my_robots.*,
(
SELECT count(id)
FROM hpsi_trading
WHERE estado <= 1 and idRobot = my_robots.id
) as openorders,
apikeys.apikey,
apikeys.apisecret
FROM my_robots, apikeys
WHERE estado <= 1
and idRobot = '2'
and ready = '1'
and apikeys.id = my_robots.idApiKey
and (my_robots.id LIKE '%0'
OR my_robots.id LIKE '%1'
OR my_robots.id LIKE '%2')
I know it is because of the count inside the query, but how can I fix this efficiently?
Edit: Explain
Thanks.
Use GROUP BY instead
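SELECT my_robots.*,
-- id exists in all three tables, so it has to be qualified
COUNT(hpsi_trading.id) AS openorders,
apikeys.apikey,
apikeys.apisecret
FROM my_robots
JOIN apikeys ON apikeys.id = my_robots.idApiKey
LEFT JOIN hpsi_trading ON hpsi_trading.idRobot = my_robots.id AND hpsi_trading.estado <= 1
-- estado and idRobot below also exist in hpsi_trading, so prefix them
-- with the table they belong to (my_robots or apikeys)
WHERE estado <= 1 and
idRobot = '2' and
ready = '1' and
(
my_robots.id LIKE '%0' OR
my_robots.id LIKE '%1' OR
my_robots.id LIKE '%2'
)
GROUP BY my_robots.id, apikeys.apikey, apikeys.apisecret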
Use explicit JOIN syntax. Some indexes will be needed to run it fast, however, the database structure is not clear from your post (and from your query as well).
The explain plan shows that the largest pain is selecting the data from the table hpsi_trading.
The challenge from the database's point of view is that the query contains a correlated subquery in the SELECT clause, which needs to be executed once for each result of the outer query (after filtering).
Replacing this subquery with a JOIN + GROUP BY will require MySQL to join between all these records (inflate) and only then deflate the data using GROUP BY, which might take time.
Instead, I would extract the subquery to a temporary table, which is grouped during creation, index it and join to it. That way, the subquery will run once, using a quick covering index, it will already group the data and only then join it to the other table.
So far, these are all pros. The con is that extracting a subquery to a temporary table might require more effort on the development side.
Please try this version and let me know if it helped (if not, please provide a fresh EXPLAIN plan screenshot):
Creating the temp table:
CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS
SELECT idRobot, COUNT(id) as openorders
FROM hpsi_trading
WHERE estado <= 1
GROUP BY idRobot;
The modified query:
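SELECT
my_robots.*,
-- robots without open orders have no temp1 row, so default to 0
COALESCE(temp1.openorders, 0) AS openorders,
apikeys.apikey,
apikeys.apisecret
FROM
my_robots
-- explicit JOINs: in "my_robots, apikeys LEFT JOIN temp1 ...", JOIN binds
-- tighter than the comma, so my_robots is not visible in the ON clause
JOIN apikeys ON apikeys.id = my_robots.idApiKey
LEFT JOIN temp1 ON temp1.idRobot = my_robots.id
WHERE
-- estado, idRobot and ready are assumed to belong to apikeys
apikeys.estado <= 1 AND apikeys.idRobot = '2'
AND apikeys.ready = '1'
AND (my_robots.id LIKE '%0'
OR my_robots.id LIKE '%1'
OR my_robots.id LIKE '%2')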
The indexes to add for this solution (I assumed from logic that estado, idRobot and ready are from the apikeys table. If that's not the case, let me know and I'll adjust the indexes):
ALTER TABLE `temp1` ADD INDEX `temp1_index_1` (idRobot);
ALTER TABLE `hpsi_trading` ADD INDEX `hpsi_trading_index_1` (idRobot, estado, id);
ALTER TABLE `apikeys` ADD INDEX `apikeys_index_1` (`idRobot`, `ready`, `id`, `estado`);
ALTER TABLE `my_robots` ADD INDEX `my_robots_index_1` (`idApiKey`);
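The same pre-aggregation idea can also be written as a derived table inside a single statement, which avoids managing a temporary table from the application. This is a minimal sketch with hypothetical tables (robots_demo, trades_demo) and made-up data. Whether MySQL runs it as efficiently as the indexed temporary table depends on the version, so it is worth comparing the EXPLAIN output of both.

```sql
-- Hypothetical stand-ins for my_robots and hpsi_trading.
CREATE TABLE robots_demo (id INT PRIMARY KEY, name VARCHAR(20) NOT NULL);
CREATE TABLE trades_demo (
  id INT AUTO_INCREMENT PRIMARY KEY,
  idRobot INT NOT NULL,
  estado TINYINT NOT NULL,
  INDEX idx_robot_estado (idRobot, estado)
);

INSERT INTO robots_demo VALUES (1, 'alpha'), (2, 'beta');
INSERT INTO trades_demo (idRobot, estado) VALUES
  (1, 0), (1, 1), (1, 2),  -- robot 1: two open orders (estado <= 1)
  (2, 2);                  -- robot 2: no open orders

-- The derived table is grouped once; the outer query joins to one row per robot.
SELECT r.id, r.name, COALESCE(t.openorders, 0) AS openorders
FROM robots_demo r
LEFT JOIN (
  SELECT idRobot, COUNT(*) AS openorders
  FROM trades_demo
  WHERE estado <= 1
  GROUP BY idRobot
) t ON t.idRobot = r.id
ORDER BY r.id;
-- rows: (1, 'alpha', 2), (2, 'beta', 0)
```

The COALESCE keeps the count at 0 for robots without open orders, matching what the original correlated COUNT subquery returned.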

INNER JOIN query

I'm having trouble optimising a simple SQL query: it has a serious timing issue. I've written it three times and none of the versions work. Here is the original one I was hoping would work:
SELECT RSKADDR.*
FROM EDW_BASE.RCI_RISK_ADDRESS RSKADDR
INNER JOIN (
SELECT DISTINCT COVER_RISK_ID
FROM EDW_BASE.RCI_COVER_RISK_MASTER RSKMASTER
INNER JOIN
(SELECT DISTINCT CONTACT_ID, FOLLOW_UP_DATE
FROM EDW_STG.STG_CIM_SVOM03
WHERE OUTSTANDING = 1 AND QUEUE = 'CIM Update for Contact Address') ADDR_WF
ON RSKMASTER.CONTACT_CODE = ADDR_WF.CONTACT_ID
WHERE RSKMASTER.IS_STORNO != 1
AND RSKMASTER.PRODUCT_CODE = 'HOME'
AND ADDR_WF.FOLLOW_UP_DATE >= RSKMASTER.COVER_EFF_START_DATE
AND RSKMASTER.POLICY_STATUS_CODE = 'POLICY'
AND ADDR_WF.FOLLOW_UP_DATE <= RSKMASTER.COVER_EFF_END_DATE
) ACTVRSK
ON ACTVRSK.COVER_RISK_ID = RSKADDR.RISK_ID
The subquery in the first INNER JOIN (the second SELECT, run on its own) is fast all the way to the end. The problem arises when I embed that subquery in the INNER JOIN of the main query (SELECT RSKADDR.*).
Then it seems the execution is never ending!
I tried other ways and same result:
SELECT RSKADDR.*
FROM EDW_BASE.RCI_RISK_ADDRESS RSKADDR
INNER JOIN EDW_BASE.RCI_COVER_RISK_MASTER RSKMASTER
ON RSKMASTER.COVER_RISK_ID = RSKADDR.RISK_ID
AND RSKMASTER.IS_STORNO != 1
AND RSKMASTER.PRODUCT_CODE = 'HOME'
AND RSKMASTER.POLICY_STATUS_CODE = 'POLICY'
INNER JOIN EDW_STG.STG_CIM_SVOM03 ADDR_WF
ON OUTSTANDING = 1 AND QUEUE = 'CIM Update for Contact Address'
AND RSKMASTER.CONTACT_CODE = ADDR_WF.CONTACT_ID
AND ADDR_WF.FOLLOW_UP_DATE >= RSKMASTER.COVER_EFF_START_DATE
AND ADDR_WF.FOLLOW_UP_DATE <= RSKMASTER.COVER_EFF_END_DATE
It's such an easy query and can't get it to work. How can I do this?
DISTINCT is a costly operation and seldom needed. It often indicates a bad database design or a poorly written query. In your query you are even doing this repeatedly; that doesn't look good.
The second query looks much better. Since you say you get the same result, the DISTINCT in the first query was obviously superfluous.
I see you doing joins, but all you select is data from one table. So why join then? Select from the table you want data from and put your criteria in WHERE where it belongs.
The following query may be faster, because it plainly shows that we are simply checking whether we find matches in the other tables or not. Then again, MySQL was known for not performing too well with IN clauses, so that may depend on the version you are using.
select *
from edw_base.rci_risk_address
where risk_id in
(
select rm.cover_risk_id
from edw_base.rci_cover_risk_master rm
where rm.is_storno <> 1
and rm.product_code = 'HOME'
and rm.policy_status_code = 'POLICY'
and exists
(
select *
from edw_stg.stg_cim_svom03 adr
where adr.contact_id = rm.contact_code
and adr.follow_up_date >= rm.cover_eff_start_date
and adr.follow_up_date <= rm.cover_eff_end_date
and adr.outstanding = 1
and adr.queue = 'CIM Update for Contact Address'
)
);
Anyway, with your second query or with mine, I suppose the following indexes would help:
create index idx1 on rci_cover_risk_master
(
product_code,
policy_status_code,
is_storno,
contact_code,
cover_eff_start_date,
cover_eff_end_date,
cover_risk_id
);
create index idx2 on stg_cim_svom03
(
contact_id,
follow_up_date,
outstanding,
queue
);
create index idx3 on rci_risk_address(risk_id);
From the query, you only need RSKADDR data, so there is no need for an INNER JOIN. You can do the same with the EXISTS keyword. Try the query below:
SELECT RSKADDR.*
FROM EDW_BASE.RCI_RISK_ADDRESS RSKADDR
WHERE EXISTS (
SELECT 1
FROM EDW_BASE.RCI_COVER_RISK_MASTER RSKMASTER
WHERE RSKMASTER.COVER_RISK_ID = RSKADDR.RISK_ID
AND RSKMASTER.IS_STORNO != 1
AND RSKMASTER.PRODUCT_CODE = 'HOME'
AND RSKMASTER.POLICY_STATUS_CODE = 'POLICY'
AND EXISTS (
SELECT 1
FROM EDW_STG.STG_CIM_SVOM03 ADDR_WF
WHERE ADDR_WF.OUTSTANDING = 1
AND ADDR_WF.QUEUE = 'CIM Update for Contact Address'
AND ADDR_WF.CONTACT_ID = RSKMASTER.CONTACT_CODE
AND ADDR_WF.FOLLOW_UP_DATE >= RSKMASTER.COVER_EFF_START_DATE
AND ADDR_WF.FOLLOW_UP_DATE <= RSKMASTER.COVER_EFF_END_DATE
)
)
Note: I have not tested this query, as no schema is available.

UPDATE query with values from SELECT subquery, efficiently

I tried to come up with a query that updates records in a MySQL table using other records in the same table, but I had mixed results between local testing and production. I don't know much about subqueries, so I want to bring this question here. In local development with MySQL InnoDB 5.6.23, the query takes 25 to 30 seconds on a dataset of about 180k records. On a staging server with MySQL InnoDB 5.5.32 and a dataset of 254k records, the query seems to stall for hours until it's stopped, taking 100% of a CPU core.
This is the query I came up with:
UPDATE
`product_lang` AS `pl1`
SET
pl1.`name` = (
SELECT pl2.`name` FROM (SELECT `name`, `id_product`, `id_lang` FROM `product_lang`) AS `pl2`
WHERE pl1.`id_product` = pl2.`id_product`
AND pl2.`id_lang` = 1
)
WHERE
pl1.`id_lang` != 1
The objective is to replace the value of name in product records where id_lang is not 1 with the value of name from the record for the same product with id_lang 1 (the default language, for the sake of explanation).
I know that subqueries are inefficient, but I really don't know how to solve this problem, and it would be a great plus to leave this in SQL-land instead of using the app layer to do the heavy lifting.
If you write the query like this:
UPDATE product_lang pl1
SET pl1.name = (SELECT pl2.`name`
FROM (SELECT `name`, `id_product`, `id_lang`
FROM `product_lang`
) `pl2`
WHERE pl1.`id_product` = pl2.`id_product` AND pl2.`id_lang` = 1
)
WHERE pl1.`id_lang` <> 1
Then you have a problem. The only index that can help is on product_lang(id_lang).
I would recommend writing this as a join:
UPDATE product_lang pl1 join
(select id_product, name
from product_lang
where id_lang = 1
) pl2
on pl1.id_lang <> 1 and pl2.id_product = pl1.id_product
SET pl1.name = pl2.name
WHERE pl1.id_lang <> 1
The indexes that you want for this query are product_lang(id_lang, id_product) and product_lang(id_product). However, this seems like a strange update, because it will set all the names to the name from language 1.
UPDATE product_lang AS pl1
JOIN product_lang AS pl2
ON pl1.`id_product` =
pl2.`id_product`
SET pl1.name = pl2.name
WHERE pl2.`id_lang` = 1
AND pl1.`id_lang` != 1;
And have INDEX(id_lang, id_product).
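Before running the join form against a quarter of a million rows, it can be tried on a small copy. The sketch below uses a hypothetical product_lang_demo table with made-up rows. The primary key on (id_product, id_lang) is an assumption about the real schema, not something stated in the question.

```sql
-- Hypothetical miniature of product_lang.
CREATE TABLE product_lang_demo (
  id_product INT NOT NULL,
  id_lang INT NOT NULL,
  name VARCHAR(64),
  PRIMARY KEY (id_product, id_lang),           -- assumed key
  INDEX idx_lang_product (id_lang, id_product) -- the index suggested above
);

INSERT INTO product_lang_demo VALUES
  (1, 1, 'Chair'), (1, 2, 'Chaise'),
  (2, 1, 'Table'), (2, 3, 'Mesa');

-- Self-join: pl2 is the default-language row of the same product.
UPDATE product_lang_demo AS pl1
JOIN product_lang_demo AS pl2
  ON pl2.id_product = pl1.id_product
 AND pl2.id_lang = 1
SET pl1.name = pl2.name
WHERE pl1.id_lang <> 1;

SELECT id_product, id_lang, name
FROM product_lang_demo
ORDER BY id_product, id_lang;
-- rows: (1, 1, 'Chair'), (1, 2, 'Chair'), (2, 1, 'Table'), (2, 3, 'Table')
```

Running EXPLAIN on the same UPDATE (supported from MySQL 5.6 on) shows whether the join uses an index for pl2 rather than scanning the table once per row.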
Make sure that there is an index covering the columns id_product and id_lang.
update product_lang pl1, product_lang pl2
set pl1.name = pl2.name
where pl1.id_product = pl2.id_product AND pl2.id_lang = 1
and pl1.id_lang <> 1
The composite index that will be required will be on id_product and id_lang for

merging SQL statements and how can it affect processing time

Let's assume I have the following tables:
items table
item_id|view_count
item_views table
view_id|item_id|ip_address|last_view
What I would like to do is:
If the last view of the item with a given item_id from a given ip_address was more than an hour ago, I would like to increment the item's view_count in the items table and get the resulting view count back. This is how I would normally do it:
q = SELECT count(*) FROM item_views WHERE item_id='item_id' AND ip_address='some_ip' AND last_view < current_time-60*60
if(q==1) then q = UPDATE items SET view_count = view_count+1 WHERE item_id='item_id'
//and finally get view_count of item
q = SELECT view_count FROM items WHERE item_id='item_id'
Here I used 3 SQL queries. How can I merge it into one SQL query? And how can it affect the processing time? Will it be faster or slower than previous method?
I don't think your logic is correct for what you describe that you want. The query:
SELECT count(*)
FROM item_views
WHERE item_id='item_id' AND
ip_address='some_ip' AND
last_view < current_time-60*60
is counting the number of views longer ago than your time frame. I think you want:
last_view > current_time-60*60
and then have if q = 0 on the next line.
MySQL is pretty good with the performance of not exists, so the following should work well:
update items
set view_count = view_count+1
WHERE item_id='item_id' and
not exists (select 1
from item_views
where item_id='item_id' AND
ip_address='some_ip' AND
last_view > current_time-60*60
)
It will work much better with an index on item_views(item_id, ip_address, last_view) and an index on items(item_id).
In MySQL scripting, you could then write:
. . .
set view_count = (@q := view_count+1)
. . .
This would also give you the variable you are looking for.
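Put together, the conditional update and the user variable might look like the sketch below. The tables items_demo and item_views_demo, the sample IP address and the item id 7 are all hypothetical, and NOW() - INTERVAL 1 HOUR is used here in place of the question's current_time-60*60 pseudocode. Assigning user variables inside expressions is deprecated in MySQL 8.0, so on newer servers a separate SELECT of view_count after the UPDATE may be the safer choice.

```sql
-- Hypothetical miniature of the items and item_views tables.
CREATE TABLE items_demo (
  item_id INT PRIMARY KEY,
  view_count INT NOT NULL DEFAULT 0
);
CREATE TABLE item_views_demo (
  view_id INT AUTO_INCREMENT PRIMARY KEY,
  item_id INT NOT NULL,
  ip_address VARCHAR(45) NOT NULL,
  last_view DATETIME NOT NULL,
  INDEX idx_item_ip_view (item_id, ip_address, last_view)
);

INSERT INTO items_demo VALUES (7, 10);
-- The only recorded view from this address is two hours old.
INSERT INTO item_views_demo (item_id, ip_address, last_view)
VALUES (7, '203.0.113.5', NOW() - INTERVAL 2 HOUR);

SET @q := NULL;

-- Increment only if this address has no view within the last hour.
UPDATE items_demo
SET view_count = (@q := view_count + 1)
WHERE item_id = 7
  AND NOT EXISTS (SELECT 1
                  FROM item_views_demo v
                  WHERE v.item_id = 7
                    AND v.ip_address = '203.0.113.5'
                    AND v.last_view > NOW() - INTERVAL 1 HOUR);

-- 11 here; stays NULL when the update is skipped.
SELECT @q AS new_view_count;
```

When the update is skipped, @q stays NULL, so the caller still needs a plain SELECT to read the unchanged count.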
update items target
inner join (
select item_id
from item_views
where item_id = 'item_id'
and ip_address = 'some_ip'
and last_view < current_time - 60*60
) ref
on ref.item_id = target.item_id
set target.view_count = target.view_count + 1;
You can only combine the update statement with the condition using a join as in the above example; but you'll still need a separate select statement.
It may be slower on a very large data set and/or an unindexed table.

Update with SELECT and group without GROUP BY

I have a table like this (MySQL 5.0.x, MyISAM):
response{id, title, status, ...} (status: 1 new, 3 multi)
I would like to update the status from new (status=1) to multi (status=3) of all the responses if at least 20 have the same title.
I have this query, but it does not work:
UPDATE response SET status = 3 WHERE status = 1 AND title IN (
SELECT title FROM (
SELECT DISTINCT(r.title) FROM response r WHERE EXISTS (
SELECT 1 FROM response spam WHERE spam.title = r.title LIMIT 20, 1)
)
as u)
Please note:
I do the nested select to avoid the famous You can't specify target table 'response' for update in FROM clause
I cannot use GROUP BY for performance reasons. The query cost with a solution using LIMIT is way better (but it is less readable).
EDIT:
It is possible to do SELECT FROM an UPDATE target in MySQL. See solution here
The issue is with the selected data, which is totally wrong.
The only solution I found which works is with a GROUP BY:
UPDATE response SET status = 3
WHERE status = 1 AND title IN (SELECT title
FROM (SELECT title
FROM response
GROUP BY title
HAVING COUNT(1) >= 20)
as derived_response)
Thanks for your help! :)
MySQL doesn't like it when you try to UPDATE and SELECT from the same table in one query. It has to do with locking priorities, etc.
Here's how I would solve this problem:
SELECT CONCAT('UPDATE response SET status = 3 ',
'WHERE status = 1 AND title = ', QUOTE(title), ';') AS `sql`
FROM response
GROUP BY title
HAVING COUNT(*) >= 20;
This query produces a series of UPDATE statements, with the quoted titles that deserve to be updated embedded. Capture the result and run it as an SQL script.
I understand that GROUP BY in MySQL often incurs a temporary table, and this can be costly. But is that a deal-breaker? How frequently do you need to run this query? Besides, any other solutions are likely to require a temporary table too.
I can think of one way to solve this problem without using GROUP BY:
CREATE TEMPORARY TABLE titlecount (c INTEGER, title VARCHAR(100) PRIMARY KEY);
INSERT INTO titlecount (c, title)
SELECT 1, title FROM response
ON DUPLICATE KEY UPDATE c = c+1;
UPDATE response JOIN titlecount USING (title)
SET response.status = 3
WHERE response.status = 1 AND titlecount.c >= 20;
But this also uses a temporary table, which is why you try to avoid using GROUP BY in the first place.
I would write something straightforward like below
UPDATE `response`, (
SELECT title, count(title) as count from `response`
WHERE status = 1
GROUP BY title
) AS tmp
SET response.status = 3
WHERE status = 1 AND response.title = tmp.title AND count >= 20;
Is using GROUP BY really that slow? The solution you tried to implement looks like it queries the same table again and again, and it should be way slower than using GROUP BY if it worked.
This is a funny peculiarity with MySQL - I can't think of a way to do it in a single statement (GROUP BY or no GROUP BY).
You could select the appropriate response rows into a temporary table first then do the update by selecting from that temp table.
You'll have to use a temporary table:
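-- size the column to match response.title (255 is a placeholder)
create temporary table r_update (title varchar(255));
-- titles that occur fewer than 20 times
insert r_update
select title
from response
group
by title
having count(*) < 20;
update response r
left outer
join r_update ru
on ru.title = r.title
set r.status = case when ru.title is null then 3 else 1 end
-- only touch responses that are still "new"
where r.status = 1;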