MySQL performance tuning for DELETE query - mysql

Can any one help me to re-write the query to speed up the execution time? It took 37 seconds to execute.
DELETE FROM storefront_categories
WHERE userid IN (SELECT userid
FROM MASTER
where expirydate<'2020-2-4'
)
At the same time, this query took only 4.69 seconds only to execute.
DELETE FROM storefront_categories
WHERE userid NOT IN (SELECT userid FROM MASTER)
The table storefront_categories have 97K records where as in MASTER have 40K records. We have created a index on MASTER.expirydate field.

When deleting 40K rows, expect it to take time. The main cost (assuming adequate indexing and a decent query) is the overhead of transactional semantics of an "atomic" delete. This involves making a copy of each row being deleted, just in case there is a crash. That way, InnoDB can bring the database back to what it had been before the crash.
When deleting 40% of a table, it is much faster to copy the rows to keep into another table then swap tables.
When deleting a large number of rows (regardless of the percentage), it is better to do it in chunks. And it is best to walk through the table based on the PRIMARY KEY.
I discuss both of those techniques, plus others, in http://mysql.rjweb.org/doc.php/deletebig
As for the query formulation:
It is version-dependent; old versions of MySQL did a poor job on some flavors.
NOT IN (SELECT ...) and NOT EXISTS tend to be the worst performers.
IN (SELECT ...) and/or EXISTS may be better.
"Multi-table DELETE is another option. It works like JOIN.
(Bottom line: You did not say what version you are running; I can't predict which formulation will be best.)
My blog avoids the formulation debate.

The query looks fine as it is.
I would suggest the following indexes for optimization:
master(expiry_date, userid)
storefront_categories(userid)
The first index is a covering index for the subquery on master: it means that the database should be able to execute the subquery by looking at the index only (whereas with just expiry_date in the index, it still needs to look at the table data to fetch the related userid).
The second index lets the database optimize the in operation.

I would try with exists :
DELETE
FROM storefront_categories
WHERE EXISTS (SELECT 1
FROM MASTER M
WHERE M.userid = storefront_categories.userid AND
M.expirydate <'2020-02-04'
);
Index would be metter here i would expect index on storefront_categories(userid) & MASTER(userid, expirydate).

I would advise you to use NOT EXISTS with the correct index:
DELETE sc
FROM storefront_categories sc
WHERE NOT EXISTS (SELECT 1
FROM master m
WHERE m.userid = sc.userid AND
m.expirydate < '2020-02-04'
);
The index you want is on master(userid, expirydate). The order of the columns is important. For this version, an index on storefront_categories does not help.
Note that I changed the date format. I recommend using YYYY-MM-DD to avoid ambiguity -- and to use the full 10 characters.

Related

Query execution taking long

I am currently using mysql
I have two tables called person and zim_list_id both tables has over 2 million rows
I want to update person table using zim_list_id table
the query I am using is
update person p JOIN zim_list_id z on p.person_id = z.person_id
set p.office_name = z.`Office Name`;
I have also created index on zim_list_id table and person table , the query I executed was
create index idx_person_office_name on person(`Office_name`);
create index idx_zim_list_id_office_name on zim_list_id(`Office name`);
the query execution is taking very long. is there any way to reduce the execution time?
The indexes on Office Name do nothing at all for this query. All you've done with those indexes is make inserts and updates slower, as now the database has to update the index any time that column changes.
What you really need, if you don't already have them, are indexes on the person_id field in those tables, to make the join more efficient.
You might also consider adding Office_Name as a second column on the zim_list_id table's index, as this will allow the database to fullfill that part of the query entirely from the index. But I wouldn't do that until I had checked the results after setting the plain person_id indexes first.
Finally, I'm curious how much memory is in that server (especially relative to the total size of the database), how much of it is available in your MySql buffer_pool_size setting, and what other work that server might be doing... there could always be an environmental factor as well.

Mysql delete and optimize very slow

I searched Internet and Stack Overflow for my trouble, but couldn't find a good solution.
I have a table (MySql MyISAM) containing 300,000 rows (one column is blob field).
I must use:
DELETE FROM tablename WHERE id IN (1,4,7,88,568,.......)
There are nearly 30,000 id's in the IN syntax.
It takes nearly 1 hour. Also It does not make the .MYD file smaller although I delete 10% of it, so I run OPTIMIZE TABLE... command. It also lasts long...(I should use it, because disk space matters for me).
What's a way to improve performance when deleting the data as above and recover space? (Increasing buffer size? which one? or else?)
With IN, MySQL will scan all the rows in the table and match the record against the IN clause. The list of IN predicates will be sorted, and all 300,000 rows in the database will get a binary search against 30,000 ids.
If you do this with JOIN on a temporary table (no indexes on a temp table), assuming id is indexed, the database will do 30,000 binary lookups on a 300,000 record index.
So, 300,000 binary searches against 30,000 records, or 30,000 binary searches against 300,000 records... which is faster? The second one is faster, by far.
Also, delaying the index rebuilding with DELETE QUICK will result in much faster deletes. All records will simply be marked deleted, both in the data file and in the index, and the index will not be rebuilt.
Then, to recover space and rebuild the indexes at a later time, run OPTIMIZE TABLE.
The size of the list in your IN() statement may be the cause. You could add the IDs to a temporary table and join to do the deletes. Also, as you are using MyISAM you can use the DELETE QUICK option to avoid the index hit whilst deleting:
For MyISAM tables, if you use the QUICK keyword, the storage engine
does not merge index leaves during delete, which may speed up some
kinds of delete operations.
I think the best approach to make it faster is to create a new table and insert into it the rows which you dont want to delete and then drop the original table and then you can copy the content from the table to the main table.
Something like this:
INSERT INTO NewTable SELECT * FROM My_Table WHERE ... ;
Then you can use RENAME TABLE to rename the copy to the original name
RENAME TABLE My_Table TO My_Table_old, NewTable TO My_Table ;
And then finally drop the original table
DROP TABLE My_Table_old;
try this
create a table name temptable with a single column id
insert into table 1,4,7,88,568,......
use delete join something like
DELETE ab, b FROM originaltable AS a INNER JOIN temptable AS b ON a.id= b.id where b.id is null;
its just an idea . the query is not tested . you can check the syntax on google.

SELECT vs UPDATE performance with index

If I SELECT IDs then UPDATE using those IDs, then the UPDATE query is faster than if I would UPDATE using the conditions in the SELECT.
To illustrate:
SELECT id FROM table WHERE a IS NULL LIMIT 10; -- 0.00 sec
UPDATE table SET field = value WHERE id IN (...); -- 0.01 sec
The above is about 100 times faster than an UPDATE with the same conditions:
UPDATE table SET field = value WHERE a IS NULL LIMIT 10; -- 0.91 sec
Why?
Note: the a column is indexed.
Most likely the second UPDATE statement locks much more rows, while the first one uses unique key and locks only the rows it's going to update.
The two queries are not identical. You only know that the IDs are unique in the table.
UPDATE ... LIMIT 10 will update at most 10 records.
UPDATE ... WHERE id IN (SELECT ... LIMIT 10) may update more than 10 records if there are duplicate ids.
I don't think there can be a one straight-forward answer to your "why?" without doing some sort of analysis and research.
The SELECT queries are normally cached, which means that if you run the same SELECT query multiple times, the execution time of the first query is normally greater than the following queries. Please note that this behavior can only be experienced where the SELECT is heavy and not in scenarios where even the first SELECT is much faster. So, in your example it might be that the SELECT took 0.00s because of the caching. The UPDATE queries are using different WHERE clauses and hence it is likely that their execution times are different.
Though the column a is indexed, but it is not necessary that MySQL must be using the index when doing the SELECT or the UPDATE. Please study the EXPLAIN outputs. Also, see the output of SHOW INDEX and check if the "Comment" column reads "disabled" for any indexes? You may read more here - http://dev.mysql.com/doc/refman/5.0/en/show-index.html and http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html.
Also, if we ignore the SELECT for a while and focus only on the UPDATE queries, it is obvious that they aren't both using the same WHERE condition - the first one runs on id column and the latter on a. Though both columns are indexed but it does not necessarily mean that all the table indexes perform alike. It is possible that some index is more efficient than the other depending on the size of the index or the datatype of the indexed column or if it is a single- or multiple-column index. There sure might be other reasons but I ain't an expert on it.
Also, I think that the second UPDATE is doing more work in the sense that it might be putting more row-level locks compared to the first UPDATE. It is true that both UPDATES are finally updating the same number of rows. But where in the first update, it is 10 rows that are locked, I think in the second UPDATE, all rows with a as NULL (which is more than 10) are locked before doing the UPDATE. Perhaps MySQL first applies the locking and then runs the LIMIT clause to update only limited records.
Hope the above explanation makes sense!
Do you have a composite index or separate indexes?
If it is a composite index of id and a columns,
In 2nd update statement the a column's index would not be used. The reason is that only the left most prefix indexes are used (unless if a is the PRIMARY KEY)
So if you want the a column's index to be used, you need in include id in your WHERE clause as well, with id first then a.
Also it depends on what storage engine you are using since MySQL does indexes at the engine level, not server.
You can try this:
UPDATE table SET field = value WHERE id IN (...) AND a IS NULL LIMIT 10;
By doing this id is in the left most index followed by a
Also from your comments, the lookups are much faster because if you are using InnoDB, updating columns would mean that the InnoDB storage engine would have to move indexes to a different page node, or have to split a page if the page is already full, since InnoDB stores indexes in sequential order. This process is VERY slow and expensive, and gets even slower if your indexes are fragmented, or if your table is very big
The comment by Michael J.V is the best description. This answer assumes a is a column that is not indexed and 'id' is.
The WHERE clause in the first UPDATE command is working off the primary key of the table, id
The WHERE clause in the second UPDATE command is working off a non-indexed column. This makes the finding of the columns to be updated significantly slower.
Never underestimate the power of indexes. A table will perform better if the indexes are used correctly than a table a tenth the size with no indexing.
Regarding "MySQL doesn't support updating the same table you're selecting from"
UPDATE table SET field = value
WHERE id IN (SELECT id FROM table WHERE a IS NULL LIMIT 10);
Just do this:
UPDATE table SET field = value
WHERE id IN (select id from (SELECT id FROM table WHERE a IS NULL LIMIT 10));
The accepted answer seems right but is incomplete, there are major differences.
As much as I understand, and I'm not a SQL expert:
The first query you SELECT N rows and UPDATE them using the primary key.
That's very fast as you have a direct access to all rows based on the fastest possible index.
The second query you UPDATE N rows using LIMIT
That will lock all rows and release again after the update is finished.
The big difference is that you have a RACE CONDITION in case 1) and an atomic UPDATE in case 2)
If you have two or more simultanous calls of the case 1) query you'll have the situation that you select the SAME id's from the table.
Both calls will update the same IDs simultanously, overwriting each other.
This is called "race condition".
The second case is avoiding that issue, mysql will lock all rows during the update.
If a second session is doing the same command it will have a wait time until the rows are unlocked.
So no race condition is possible at the expense of lost time.

MySQL ALTER TABLE ORDER BY f1 DESC - Does this block SELECT queries?

I have a MySQL MYISAM table (say tbl) consisting of 2 unsigned int fields, say, f1 and f2. There is an index on f2 and the table is very large (approximately 320,000,000+ rows). I update this table periodically (with approximately 100,000 new rows a week), and, in order to be able to search this table without doing an ORDER BY (which would be very time consuming in real-time queries), I physically ORDER the table according to the way in which I want to retrieve its rows.
So, I perform an ALTER TABLE tbl ORDER BY f1 DESC. (I know I have enough physical space on the server for a copy of the table.) I have read that during this operation, a temporary table is created and SELECT statements are not affected on the current rows.
However, I have experienced that this is not the case, and SELECT statements on the table that occur at the same time with the ALTER table are getting blocked and do not terminate. After the ALTER TABLE tbl completes (about 40 minutes on the production server), the SELECT statements on tbl start executing fine again.
Is there any reason why the "ALTER table tbl ORDER BY f1 DESC" seems to be blocking other clients from querying tbl?
Altering a table will always grab a lock on the table, preventing SELECTs from running.
I'll admin that I didn't even know you could do that with an ALTER TABLE.
What are you trying to get from the table? For example, all records in a given range? 320 million rows is not a trivial number. I'll give you my gut reactions:
Switch to InnoDB (allows #2, also gives transactions, but without #2 may hurt performance)
Partition the table (makes it act like a number of slightly smaller tables)
Consider a redesign, such as having a "working set" table and a "historical" table, basically manually partitioning. If you usually look for recently inserted data, this (along with partitioning) will help a lot. If your lookups are evenly distributed, this probably won't make a difference.
Consider adding a new column you could use in conjunction to narrow down selects (so instead of searching on date, search on date and customer ID)
Since I don't know what you're storing, some of these (such as #4) may not apply.
There are some other things you could try. OPTIMIZE TABLE may help you but take less time, but I doubt it. I think internally it's implemented as a dump/reload, at least on the InnoDB side.

MySQL, delete and index hint

I have to delete about 10K rows from a table that has more than 100 million rows based on some criteria. When I execute the query, it takes about 5 minutes. I ran an explain plan (the delete query converted to select * since MySQL does not support explain delete) and found that MySQL uses the wrong index.
My question is: is there any way to tell MySQL which index to use during delete? If not, what ca I do? Select to temp table then delete from temp table?
There is index hint syntax. //ETA: sadly, not for deletes
ETA:
Have you tried running ANALYZE TABLE $mytable?
If that doesn't pay off, I'm thinking you have 2 choices: Drop the offending index before the delete and recreate it after. Or JOIN your delete table to another table on the desired index which should ensure that the desired index is used.
I've never really come across a situation where MySQL chose the wrong index, but rather my understanding of how indexes worked was usually at fault.
You might want to check out this book: http://oreilly.com/catalog/9780596003067
It has a great section on how indexes work and other tuning options.
As stated in other answers, MySQL can't use indexes, but the PRIMARY KEY index.
So your best option, if you have a PRIMARY KEY on the table is to run a fast SELECT, then DELETE according lines. Preferably in a TRANSACTION, so that you don't delete wrong rows.
Hence:
DELETE FROM table WHERE column_with_index = 0
Will be rewritten:
SELECT primary_key FROM table WHERE column_with_index = 0 => returns many lines
DELETE FROM table WHERE primary_key IN(?, ?, ?) => ? will be replaced by the results of the SELECTed primary keys.
If you have not that much lines to delete, it would be more efficient this way.
For example, I've just hit an exemple, on the same table, with the same data:
7499067 rows analyzed by DELETE : 12 seconds
vs
6 rows analyzed by SELECT using a good index : 0.10 seconds
0 rows to be deleted in the end