I wrote a query thas was taking way too much time (32 minutes) so I tried other methods to find one faster.
I finally wrote another one taking under 5 seconds
The problem is that I don't understand my optimization.
Can someone explain how it happen to be that much faster.
hugeTable has 494 500 rows
smallTable1 has 983 rows
smallTable2 has 983 rows
cursor.execute('UPDATE hugeTable dst,
(
SELECT smallTable1.hugeTableId, smallTable2.valueForHugeTable
FROM smallTable2
INNER JOIN smallTable1 ON smallTable1.id = smallTable2.id
-- This select represent 983 rows
)src
SET dst.columnToUpdate = src.valueForHugeTable
WHERE dst.id2 = %s AND dst.id = src.hugeTableId;', inputId2)
-- Condition : dst.id2 = %s alone target 983 rows.
-- Combinasion of : dst.id2 = %s AND dst.id = src.hugeTableId target a single unique row.
-- This query takes 32 minutes
And here is a way to do the exact same request with more steps, but way faster:
-- First create a temporary table to hold (983) rows from hugeTable that has to be updated
cursor.execute('CREATE TEMPORARY TABLE tmpTable AS
SELECT * from hugeTable
WHERE id2 = %s;', inputid)
-- Update the rows into tmpTable instead of into hugeTable
cursor.execute('UPDATE tmpTable dst,
(
SELECT smallTable1.hugeTableId, smallTable2.valueForHugeTable
FROM smallTable2
INNER JOIN smallTable1 ON smallTable1.id = smallTable2.id
-- This select represent 983 rows
)src
SET dst.columnToUpdate = src.valueForHugeTable
WHERE dst.id = src.hugeTableId;')
-- Then delete the (983) rows we want to update
cursor.execute('DELETE FROM hugeTable WHERE id2 = %s;', inputId2)
-- And create new rows replacing the above deleled ones with rows from tmpTable
cursor.execute('INSERT INTO hugeTable SELECT * FROM tmpTable;')
-- This takes litle under 5 seconds.
I would like to know why the first method takes so much time.
Understanding this will help me getting a new MySql level up.
Add a composite index to dst: INDEX(id2, id) (in either order).
More
Case 1:
UPDATE hugeTable dst,
( SELECT smallTable1.hugeTableId, smallTable2.valueForHugeTable
FROM smallTable2
INNER JOIN smallTable1 ON smallTable1.id = smallTable2.id
)src SET dst.columnToUpdate = src.valueForHugeTable
WHERE dst.id2 = 1234
AND dst.id = src.hugeTableId;
Case 2:
CREATE TEMPORARY TABLE tmpTable AS
SELECT *
from hugeTable
WHERE id2 = 1234;
UPDATE tmpTable dst,
( SELECT smallTable1.hugeTableId, smallTable2.valueForHugeTable
FROM smallTable2
INNER JOIN smallTable1 ON smallTable1.id = smallTable2.id
)src SET dst.columnToUpdate = src.valueForHugeTable
WHERE dst.id = src.hugeTableId;
Without knowing the MySQL version and seeing the EXPLAINs, I can only guess at why they are so different...
The subquery ( SELECT ... JOIN ... ) may or may not be 'materialized' into an implicit temp table. (Newer versions are better at doing such.)
Such a materialized subquery may or may not have an index created for it. (Again, new versions are better.)
If there are no adequate indexes on either dst or src, then the amount of 'effort' is the product of the sizes of the two tables. Note that in Case 2, dst is much smaller. (This may be the answer you are looking for.)
If the tables are not fully cached in RAM, one could artificially involve more I/O than the other. An I/O-bound query is often 10 times as slow as a the same query when it is fully cached in RAM. (This is less likely to be the answer, but may be part of the answer.)
Having a 3-table UPDATE would probably eliminate some of the issues above. And it may (or may not) eliminate the timing difference.
For further discussion, please provide
MySQL version
SHOW CREATE TABLE -- for each table
How big is innodb_buffer_pool_size
SHOW TABLE STATUS -- for each table
EXPLAIN UPDATE ... -- for each UPDATE -- requires at least 5.6
What percentage of the table has ( id2 = inputId2 )?
Related
I have a query that selects ~8000 rows. When I execute the query it takes 0.1 sec.
When I copy the query into a view and execute the view it takes about 2 seconds. In the first row of explain it selects ~570K rows, i dont know why.
I dont understand the first Row and why it shows up only in the view explain
1 PRIMARY ALL NULL NULL NULL NULL
This is the query (yes i know im not a mysql pro and the query is not that efficent, but it works ans 0.1 sek would be ok for me. Does anyone know why it is so slow in a view?
MariaDB 10.5.9
select
`xxxxxxx`.`auftraege`.`Zustandigkeit` AS `Zustandigkeit`,
`xxxxxxx`.`auftraege`.`cms` AS `cms`,
`xxxxxxx`.`auftraege`.`auftrag_id` AS `auftrag_id`,
`xxxxxxx`.`angebot`.`angebot_id` AS `angebot_id`,
`xxxxxxx`.`kunden`.`kunde_id` AS `kid`,
`xxxxxxx`.`angebot`.`kunde_id` AS `kunde_id`,
`xxxxxxx`.`kunden`.`firma` AS `firma`,
`xxxxxxx`.`auftraege`.`gekuendigt` AS `gekuendigt`,
`xxxxxxx`.`kunden`.`ansprechpartnerVorname` AS `ansprechpartnerVorname`,
`xxxxxxx`.`kunden`.`ansprechpartner` AS `ansprechpartner`,
`xxxxxxx`.`auftraege`.`ampstatus` AS `ampstatus`,
`xxxxxxx`.`auftraege`.`autoMahnungen` AS `autoMahnungen`,
`xxxxxxx`.`kunden`.`mail` AS `mail`,
`xxxxxxx`.`kunden`.`ansprechpartnerAnrede` AS `ansprechpartnerAnrede`,
case
`xxxxxxx`.`kunden`.`ansprechpartnerAnrede`
when
'm'
then
concat('Herr ', ifnull(`xxxxxxx`.`kunden`.`ansprechpartnerVorname`, ''), ifnull(`xxxxxxx`.`kunden`.`ansprechpartner`, ''))
else
concat('Frau ', ifnull(`xxxxxxx`.`kunden`.`ansprechpartnerVorname`, ''), ifnull(`xxxxxxx`.`kunden`.`ansprechpartner`, ''))
end
AS `ansprechpartnerfullName`, `xxxxxxx`.`kunden`.`website` AS `website`, `xxxxxxx`.`personal`.`name_betrieb` AS `name_betrieb`, `xxxxxxx`.`kunden`.`prioritaet` AS `prioritaet`, `xxxxxxx`.`auftraege`.`infoemail` AS `infoemail`, `xxxxxxx`.`auftraege`.`keywords` AS `keywords`, `xxxxxxx`.`auftraege`.`ftp_h` AS `ftp_h`, `xxxxxxx`.`auftraege`.`ftp_u` AS `ftp_u`, `xxxxxxx`.`auftraege`.`ftp_pw` AS `ftp_pw`, `xxxxxxx`.`auftraege`.`lgi_h` AS `lgi_h`, `xxxxxxx`.`auftraege`.`lgi_u` AS `lgi_u`, `xxxxxxx`.`auftraege`.`lgi_pw` AS `lgi_pw`, `xxxxxxx`.`auftraege`.`autoRemind` AS `autoRemind`, `xxxxxxx`.`kunden`.`telefon` AS `telefon`, `xxxxxxx`.`kunden`.`mobilfunk` AS `mobilfunk`, `xxxxxxx`.`auftraege`.`kommentar` AS `kommentar`, `xxxxxxx`.`auftraege`.`phase` AS `phase`, `xxxxxxx`.`auftraege`.`datum` AS `datum`, `xxxxxxx`.`angebot`.`typ` AS `typ`,
case
`xxxxxxx`.`auftraege`.`gekuendigt`
when
'1'
then
'Ja'
else
'Nein'
end
AS `Gekuendigt ? `,
(
select
count(`xxxxxxx`.`status`.`aenderung`)
from
`xxxxxxx`.`status`
where
`xxxxxxx`.`status`.`auftrag_id` = `xxxxxxx`.`auftraege`.`auftrag_id`
)
AS `aenderungen`,
`xxxxxxx`.`auftraege`.`vertragStart` AS `vertragStart`,
`xxxxxxx`.`auftraege`.`vertragEnde` AS `vertragEnde`,
case
`xxxxxxx`.`auftraege`.`zahlungsart`
when
'U'
then
'Überweisung'
when
'L'
then
'Lastschrift'
else
'Unbekannt'
end
AS `Zahlungsart`, `xxxxxxx`.`kunden`.`yyyyy_piwik` AS `yyyyy_piwik`,
(
select
max(`xxxxxxx`.`status`.`datum`) AS `mxDTst`
from
`xxxxxxx`.`status`
where
`xxxxxxx`.`status`.`auftrag_id` = `xxxxxxx`.`auftraege`.`auftrag_id`
and `xxxxxxx`.`status`.`typ` = 'SEO'
)
AS `mxDTst`,
(
select
case
`xxxxxxx`.`rechnungen`.`beglichen`
when
'YES'
then
'isOk'
else
'isAffe'
end
AS `neuUwe`
from
(
`xxxxxxx`.`zahlungsplanneu`
join
`xxxxxxx`.`rechnungen`
on(`xxxxxxx`.`zahlungsplanneu`.`rechnungsnummer` = `xxxxxxx`.`rechnungen`.`rechnungsnummer`)
)
where
`xxxxxxx`.`zahlungsplanneu`.`auftrag_id` = `xxxxxxx`.`auftraege`.`auftrag_id`
and `xxxxxxx`.`rechnungen`.`beglichen` <> 'STO' limit 1
)
AS `neuer`,
(
select
group_concat(`xxxxxxx`.`kunden_keywords`.`keyword` separator ',')
from
`xxxxxxx`.`kunden_keywords`
where
`xxxxxxx`.`kunden_keywords`.`kunde_id` = `xxxxxxx`.`kunden`.`kunde_id`
)
AS `keyword`,
(
select
case
count(0)
when
0
then
'Cool'
else
'Uncool'
end
AS `AusfallVor`
from
`xxxxxxx`.`rechnungen`
where
`xxxxxxx`.`rechnungen`.`rechnung_tag` < current_timestamp() - interval 15 day
and `xxxxxxx`.`rechnungen`.`kunde_id` = `xxxxxxx`.`kunden`.`kunde_id`
and `xxxxxxx`.`rechnungen`.`beglichen` = 'NO' limit 1
)
AS `Liquidiert`
from
(
((((`xxxxxxx`.`auftraege`
join
`xxxxxxx`.`angebot`
on(`xxxxxxx`.`auftraege`.`angebot_id` = `xxxxxxx`.`angebot`.`angebot_id`))
join
`xxxxxxx`.`kunden`
on(`xxxxxxx`.`angebot`.`kunde_id` = `xxxxxxx`.`kunden`.`kunde_id`))
left join
`xxxxxxx`.`kunden_keywords`
on(`xxxxxxx`.`angebot`.`kunde_id` = `xxxxxxx`.`kunden_keywords`.`kunde_id`))
join
`xxxxxxx`.`personal`
on(`xxxxxxx`.`kunden`.`bearbeiter` = `xxxxxxx`.`personal`.`personal_id`))
left join
`xxxxxxx`.`status`
on(`xxxxxxx`.`auftraege`.`auftrag_id` = `xxxxxxx`.`status`.`auftrag_id`)
)
group by
`xxxxxxx`.`auftraege`.`auftrag_id`
order by
NULL
UPDATE 1
1. The View Itself (Duration 1.83 sec)
1.1 Create the View: This is the View i created, it only contains the query from above.
1.2 Executing the View: It takes 1.83 sek to execute the view
1.3 Analyze the View: This is the explain of the view
2. The view with added where clause (Duration 1.86 sec)
2.1 Analyze the View with added where clause #rick wanted me to add a where clause to the view, if i understood him correctly. This is the explain of the view, where i added a where clause, takes 1.86 sec.
3. The Query, that is the source of the view (Duration: 0.1 sec)
3.1 Execute the query directly This is the query, that is the source of the view, when i execute it directly to the server. It takes ~0.1 - 0.2 seconds.
3.2 Analyze the direct queryAnd this is the explain of the pure query.
Why the view is so much slower, by only cupsuling the query inside of the view?
Update 2
These are the indexes I have set
ALTER TABLE angebot ADD INDEX angebot_idx_angebot_id (angebot_id);
ALTER TABLE auftraege ADD INDEX auftraege_idx_auftrag_id (auftrag_id);
ALTER TABLE kunden ADD INDEX kunden_idx_kunde_id (kunde_id);
ALTER TABLE kunden_keywords ADD INDEX kunden_keywords_idx_kunde_id (kunde_id);
ALTER TABLE personal ADD INDEX personal_idx_personal_id (personal_id);
ALTER TABLE rechnungen ADD INDEX rechnungen_idx_rechnungsnummer_beglichen (rechnungsnummer,beglichen);
ALTER TABLE rechnungen ADD INDEX rechnungen_idx_beglichen_kunde_id_rechnung (beglichen,kunde_id,rechnung_tag);
ALTER TABLE status ADD INDEX status_idx_auftrag_id (auftrag_id);
ALTER TABLE status ADD INDEX status_idx_typ_auftrag_id_datum (typ,auftrag_id,datum);
ALTER TABLE zahlungsplanneu ADD INDEX zahlungsplanneu_idx_auftrag_id (auftrag_id);
Be consistent between tables. kunde_id, for example, seems to be declared differently between tables. This may be preventing some obvious optimizations. (There are 6 JOINs that say func in EXPLAIN`.)
Remove the extra parentheses in JOINs. They may be preventing what the Optimizer is happy to do -- rearrange the tables in a JOIN.
Turn the query inside out. By this, I mean to do the minimum amount of work to do the main JOIN. Collect mostly id(s). Then do the dependent subqueries in an outer select. Something like:
SELECT ... ( SELECT ... ), ...
FROM ( SELECT a1.id
FROM a AS a1
JOIN b ON ..
JOIN c ON .. )
JOIN a AS a2 ON a2.id = a1.id
JOIN d ON ...
The "inside-out" kludge may eliminate the need for the GROUP BY. (Your query is too complex for me to see for sure.) If so, then I call the problem "explode-implode" -- Your query first JOINs, producing a temp table with lots of rows ("explodes"). Then it does a GROUP BY ("implodes").
More
These indexes will probably help:
status: (auftrag_id, typ, datum, aenderung)
rechnungen: (beglichen, kunde_id, rechnung_tag)
rechnungen: (rechnungsnummer, beglichen)
zahlungsplanneu: (auftrag_id, rechnungsnummer)
kunden_keywords: (kunde_id, keyword) -- (unless `kunde_id` is the PK)
(I see from all 3 EXPLAINs that you probably have sufficient indexes on kunden_keywords and status. Show me what indexes you have, so I can see if the existing indexes are as good as my suggestions.) "Using index" == "covering index".
Near the end is this LEFT JOIN, but I did not spot any use for the table; perhaps it can be removed?
left join `kunden_keywords` on(`angebot`.`kunde_id` = `kunden_keywords`.`kunde_id`))
I'm having issues with MySQL query running extremely slow. It takes about 2 min for each UPDATE to process.
This is the query:
UPDATE msn
SET is_disable = 1
WHERE mid IN
(
SELECT mid from link
WHERE rid = ${param.rid}
);
So my question is, I would like to know how the performance of the UPDATE statement will be affected if the result of the subquery is 0 or NULL. Because I think that maybe the process is slow because the result of the subquery is 0 or NULL.
Thanks a lot in advance.
The issue here is that the subquery following IN has to execute, whether or not it returns any records. I would probably express your update using exists logic:
UPDATE msn m
SET is_disable = 1
WHERE EXISTS (SELECT 1 FROM link l WHERE m.mid = l.mid AND l.rid = ${param.rid});
Then, add the following index to the link table:
CREATE INDEX idx ON link (mid, rid);
You could also try and compare against this version of the index:
CREATE INDEX idx ON link (rid, mid);
I've this query:
"explain UPDATE requests R JOIN profile as P ON R.intern_id = P.intern_id OR R.intern_id_decoded = P.intern_id OR R.intern_id_full_decode = P.intern_id SET R.found_id=P.id WHERE R.id >= 28000001 AND R.id <= 28000001+2000000 AND R.found_id is NULL"
1 UPDATE R NULL range PRIMARY,intern_id_customer_id_batch_num,id_found_id PRIMARY 4 NULL 3616888 10.00 Using where
1 SIMPLE P NULL ALL intern_id_dt_snapshot,intern_id NULL NULL NULL 179586254 27.10 Range checked for each record (index map: 0x6)
That query takes about 40 seconds to execute, it's updating 5000-10000 rows from the set of 2 million rows.
I am currently updating in 2 million row "jobs" to make the join perform faster.
The whole table is 170 million records currently.
The EXPLAIN shows the second part without using an INDEX, I am not sure if that's right or not.
The intern_id fields are varchars, found_id and id are INT
Does the explain output look like it's working performantly ?
I noticed the second line does not use an index, not sure if that's normal.
I would do this logic using multiple joins:
UPDATE requests r LEFT JOIN
profile p1
ON r.intern_id = p1.intern_id LEFT JOIN
profile p2
ON r.intern_id_decoded = p2.intern_id AND p1.id IS NULL LEFT JOIN
profile p3
ON r.intern_id_full_decode = p3.intern_id AND p2.id IS NULL
SET r.found_id = COALESCE(p1.id, p2.id, p3.id)
WHERE R.id >= 28000001 AND R.id <= 28000001 + 2000000 AND
R.found_id is NULL;
Databases are very bad at optimizing OR in JOIN conditions. It might be better with explicit JOINs.
The ON conditions also ensure only the first match.
I would do 3 chunked-up UPDATEs -- one for each of the ON conditions.
10K rows to update is excessive; crank it down to perhaps 1K. That means cranking the chunking down to 200K. (The speed might even be faster.)
UPDATE ... ON P.intern_id = R.intern_id SET ... WHERE ...
UPDATE ... ON P.intern_id = R.intern_id_decoded SET ... WHERE ...
UPDATE ... ON P.intern_id = R.intern_id_full SET ... WHERE ...
(The range is the same fore each set of 3, thereby helping with caching of R.)
Possibly INDEX(found_id) would help, but this is not a given.
See here for more chunking suggestions, especially the tip on finding 1000 rows before starting the operation:
SELECT id WHERE id > ... AND found_id IS NULL LIMIT 1000,1;
Then using that as the limit instead of the 2-millionth. A goal here is to even out the number of rows updated.
This query (along with a few others I think have a related issue) did not take 30 seconds when MySQL was local on the same EC2 instance as the rest of the website. More like milliseconds.
Does anything look off?
SELECT *, chv_images.image_id FROM chv_images
LEFT JOIN chv_storages ON chv_images.image_storage_id =
chv_storages.storage_id
LEFT JOIN chv_users ON chv_images.image_user_id = chv_users.user_id
LEFT JOIN chv_albums ON chv_images.image_album_id = chv_albums.album_id
LEFT JOIN chv_categories ON chv_images.image_category_id =
chv_categories.category_id
LEFT JOIN chv_meta ON chv_images.image_id = chv_meta.image_id
LEFT JOIN chv_likes ON chv_likes.like_content_type = "image" AND
chv_likes.like_content_id = chv_images.image_id AND chv_likes.like_user_id = 1
LEFT JOIN chv_follows ON chv_follows.follow_followed_user_id =
chv_images.image_user_id
LEFT JOIN chv_follows_projects ON
chv_follows_projects.follows_project_project_id =
chv_images.image_project_id LEFT JOIN chv_projects ON
chv_projects.project_id = follows_project_project_id WHERE
chv_follows.follow_user_id='1' OR (follows_project_user_id = 1 AND
chv_projects.project_privacy = "public" AND
chv_projects.project_is_public_upload = 1) GROUP BY chv_images.image_id
ORDER BY chv_images.image_id DESC
LIMIT 0,15
And this is what EXPLAIN shows:
Thank you
Update: This query has the same issue. It does not have a GROUP BY.
SELECT *, chv_images.image_id FROM chv_images
LEFT JOIN chv_storages ON chv_images.image_storage_id =
chv_storages.storage_id
LEFT JOIN chv_users ON chv_images.image_user_id = chv_users.user_id
LEFT JOIN chv_albums ON chv_images.image_album_id = chv_albums.album_id
LEFT JOIN chv_categories ON chv_images.image_category_id =
chv_categories.category_id
LEFT JOIN chv_meta ON chv_images.image_id = chv_meta.image_id
LEFT JOIN chv_likes ON chv_likes.like_content_type = "image" AND
chv_likes.like_content_id = chv_images.image_id AND chv_likes.like_user_id = 1
ORDER BY chv_images.image_id DESC
LIMIT 0,15
That EXPLAIN shows several table-scans (type: ALL), so it's not surprising that it takes over 30 seconds.
Here's your EXPLAIN:
Notice the column rows shows an estimated 14420 rows read from the first table chv_images. It's doing a table-scan of all the rows.
In general, when you do a series of JOINs, you can multiple together all the values in the rows column of the EXPLAIN, and the final result is how many row-reads MySQL has to do. In this case it's 14420 * 2 * 1 * 1 * 2 * 1 * 916, or 52,834,880 row-reads. That should put into perspective the high cost of doing several table-scans in the same query.
You might help avoid those table-scans by creating some indexes on these tables:
ALTER TABLE chv_storages
ADD INDEX (storage_id);
ALTER TABLE chv_categories
ADD INDEX (category_id);
ALTER TABLE chv_likes
ADD INDEX (like_content_id, like_content_type, like_user_id);
Try creating those indexes and then run the EXPLAIN again.
The other tables are already doing lookups by primary key (type: eq_ref) or by secondary key (type: ref) so those are already optimized.
Your EXPLAIN shows your query uses a temporary table and filesort. You should reconsider whether you need the GROUP BY, because that's probably causing the extra work.
Another tip is to avoid using SELECT * because it might be forcing the query to read many extra columns that you don't need. Instead, explicitly name only the columns you need.
Is there any indexes in chv_images?
I propose:
CREATE INDEX idx_image_id ON chv_images (image_id);
(Bill's ideas are good. I'll take the discussion a different way...)
Explode-Implode -- If the LEFT JOINs match no more than 1 row, change, for example,
SELECT
...
LEFT JOIN chv_meta ON chv_images.image_id = chv_meta.image_id
into
SELECT ...,
( SELECT foo FROM chv_meta WHERE image_id = chv_images.image_id ) AS foo, ...
If that can be done for all the JOINs, you can get rid of GROUP BY. This will avoid the costly "explode-implode" where JOINs lead to more rows, then GROUP BY gets rid of the dups. (I suspect you can't move all the joins in.)
OR -> UNION -- OR is hard to optimize. Your query looks like a good candidate for turning into UNION, then making more indexes that will become useful.
WHERE chv_follows.follow_user_id='1'
OR (follows_project_user_id = 1
AND chv_projects.project_privacy = "public"
AND chv_projects.project_is_public_upload = 1
)
Assuming that follows_project_user_id is in `chv_images,
( SELECT ...
WHERE chv_follows.follow_user_id='1' )
UNION DISTINCT -- or ALL, if you are sure there won't be dups
( SELECT ...
WHERE follows_project_user_id = 1
AND chv_projects.project_privacy = "public"
AND chv_projects.project_is_public_upload = 1 )
Indexes needed:
chv_follows: (follow_user_id)
chv_projects: (project_privacy, project_is_public_upload) -- either order
But this has not yet handled the ORDER BY and LIMIT. The general pattern for such:
( SELECT ... ORDER BY ... LIMIT 15 )
UNION
( SELECT ... ORDER BY ... LIMIT 15 )
ORDER BY ... LIMIT 15
Yes, the ORDER BY and LIMIT are repeated.
That works for page 1. If you want the next 15 rows, see http://mysql.rjweb.org/doc.php/pagination#pagination_and_union
After building those two sub-selects, look at them; I think you will be able to optimize each one, and may need new indexes because the Optimizer will start with a different 'first' table.
In between the 2 query cases below, which one is faster:
update t set v = case when id = 1000000 then 100 when id = 10000000 then 500 else v end
Or
update t set v = 100 where id = 1000000;
update t set v = 500 where id = 10000000;
the table t has an unique index on id and the table can be pretty big (millions of entry).
My guess is that although the second case make multiple queries it is still faster because it can use the index to find the entries while in the first case it is doing a full scan of the table (but it is just a guess, I have actually no clue on how MySQL deal with CASE control flow).
Thank you in advance for any answers !
The second version you have would be better and cleaner, however, a cleaner single update is also possible to take advantage of the index on the "id" columns...
update t
set v = if( id = 1000000, 100, 500 )
where id in ( 1000000, 10000000 )