I have a query like this that takes a really long time to run. The table has around 4 million rows.
DELETE FROM TABLE WHERE value_was IS NULL OR value_was <= value_now;
I'm hoping I could create an index for (value_was, value_now) so I could do something like
DELETE FROM TABLE WHERE
ID1 IN (SELECT ID1 from TABLE where value_was IS NULL)
OR ID2 IN (SELECT ID2 FROM TABLE WHERE value_was <= value_now);
This table doesn't have a primary key; it has two composite keys. And I guess I cannot use the same table in a subquery, so how do I improve the performance of the first query?
Thanks very much; any suggestion would be much appreciated.
Update:
The DB is InnoDB.
Due to the way data is stored, InnoDB tables are inherently slow at huge DELETE operations. Changing the storage engine to MyISAM should make the operation a lot faster - I've seen 100x improvements in similar situations.
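A minimal sketch of that engine switch, for what it's worth (`TABLE` is the question's placeholder name; note that MyISAM gives up transactions and row-level locking, so weigh the trade-off):

-- Sketch only: the conversion itself rebuilds the whole table and can be slow.
ALTER TABLE `TABLE` ENGINE = MyISAM;

DELETE FROM `TABLE`
WHERE value_was IS NULL
   OR value_was <= value_now;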
Related
Can anyone help me rewrite this query to speed up the execution time? It took 37 seconds to execute.
DELETE FROM storefront_categories
WHERE userid IN (SELECT userid
                 FROM MASTER
                 WHERE expirydate < '2020-2-4');
At the same time, this query took only 4.69 seconds to execute.
DELETE FROM storefront_categories
WHERE userid NOT IN (SELECT userid FROM MASTER)
The table storefront_categories has 97K records, whereas MASTER has 40K records. We have created an index on the MASTER.expirydate field.
When deleting 40K rows, expect it to take time. The main cost (assuming adequate indexing and a decent query) is the overhead of transactional semantics of an "atomic" delete. This involves making a copy of each row being deleted, just in case there is a crash. That way, InnoDB can bring the database back to what it had been before the crash.
When deleting 40% of a table, it is much faster to copy the rows to keep into another table and then swap the tables.
When deleting a large number of rows (regardless of the percentage), it is better to do it in chunks. And it is best to walk through the table based on the PRIMARY KEY.
I discuss both of those techniques, plus others, in http://mysql.rjweb.org/doc.php/deletebig
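To illustrate the copy-and-swap technique, here is a minimal sketch (the _new/_old table names are mine; it assumes MASTER.userid is never NULL, since NOT IN misbehaves with NULLs):

CREATE TABLE storefront_categories_new LIKE storefront_categories;

-- Copy only the rows to keep (those whose user has not expired).
INSERT INTO storefront_categories_new
SELECT *
FROM storefront_categories
WHERE userid NOT IN (SELECT userid FROM MASTER WHERE expirydate < '2020-02-04');

-- Atomically swap the tables, then drop the old one.
RENAME TABLE storefront_categories TO storefront_categories_old,
             storefront_categories_new TO storefront_categories;

DROP TABLE storefront_categories_old;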
As for the query formulation:
It is version-dependent; old versions of MySQL did a poor job on some of these formulations.
NOT IN (SELECT ...) and NOT EXISTS tend to be the worst performers.
IN (SELECT ...) and/or EXISTS may be better.
"Multi-table DELETE is another option. It works like JOIN.
(Bottom line: You did not say what version you are running; I can't predict which formulation will be best.)
My blog avoids the formulation debate.
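For reference, a sketch of that multi-table DELETE (aliases are mine; not tested against your schema):

DELETE sc
FROM storefront_categories AS sc
JOIN MASTER AS m ON m.userid = sc.userid
WHERE m.expirydate < '2020-02-04';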
The query looks fine as it is.
I would suggest the following indexes for optimization:
master(expirydate, userid)
storefront_categories(userid)
The first index is a covering index for the subquery on master: it means the database should be able to execute the subquery by looking at the index only (whereas with just expirydate in the index, it would still need to look at the table data to fetch the related userid).
The second index lets the database optimize the IN operation.
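As DDL, the two suggested indexes would be (index names are illustrative):

CREATE INDEX idx_master_expirydate_userid ON master (expirydate, userid);
CREATE INDEX idx_storefront_categories_userid ON storefront_categories (userid);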
I would try with EXISTS:
DELETE
FROM storefront_categories
WHERE EXISTS (SELECT 1
              FROM MASTER M
              WHERE M.userid = storefront_categories.userid AND
                    M.expirydate < '2020-02-04'
             );
Indexes would matter here; I would expect indexes on storefront_categories(userid) and MASTER(userid, expirydate).
I would advise you to use NOT EXISTS with the correct index:
DELETE sc
FROM storefront_categories sc
WHERE NOT EXISTS (SELECT 1
                  FROM master m
                  WHERE m.userid = sc.userid AND
                        m.expirydate < '2020-02-04'
                 );
The index you want is on master(userid, expirydate). The order of the columns is important. For this version, an index on storefront_categories does not help.
Note that I changed the date format. I recommend using YYYY-MM-DD to avoid ambiguity -- and to use the full 10 characters.
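In DDL form, that index would be (name is mine; note that userid comes first):

CREATE INDEX idx_master_userid_expirydate ON master (userid, expirydate);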
I'm trying to find the best way to delete millions of records in a MySQL DB.
I have a table with a PK on ID and an index on the 'date' column, and my delete queries look like:
DELETE FROM table WHERE date < '<today - 6 months>';
It's generating a lot of delay on the slave.
I have two options:
DELETE FROM table WHERE date < '<today - 6 months>' LIMIT 1000;
or
Add further indexing, or use the PK for deleting.
I would like to hear your opinions: whether using LIMIT will change the workload, or whether using the PK (in combination with LIMIT) is better.
The best way to delete lots of rows, especially in a replication setup, is to walk through the table via the PRIMARY KEY in chunks of 1000 rows.
See this for details.
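A minimal sketch of that chunked walk (the loop itself would live in a script or stored procedure; the date literal is kept as a placeholder):

-- Start from the lowest id; in a real script: SELECT MIN(id) INTO @lo FROM `table`;
SET @lo = 0;

-- One chunk: delete matching rows within a 1000-id window of the PRIMARY KEY.
DELETE FROM `table`
WHERE id >= @lo
  AND id < @lo + 1000
  AND `date` < '<today - 6 months>';

-- Advance the window and repeat until @lo passes MAX(id).
SET @lo = @lo + 1000;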
If this is a recurring task, then a 'time series' PARTITIONing is even better. (Though it won't help until you set up partitioning.) See this.
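A hypothetical sketch of such time-series partitioning (MySQL requires the partitioning column to be part of every unique key, so the PK would first have to become (ID, date); the month boundaries below are illustrative):

ALTER TABLE `table` DROP PRIMARY KEY, ADD PRIMARY KEY (ID, `date`);

ALTER TABLE `table`
PARTITION BY RANGE (TO_DAYS(`date`)) (
    PARTITION p2020_01 VALUES LESS THAN (TO_DAYS('2020-02-01')),
    PARTITION p2020_02 VALUES LESS THAN (TO_DAYS('2020-03-01')),
    PARTITION pMAX VALUES LESS THAN MAXVALUE
);

-- Purging a month is then a near-instant metadata operation:
ALTER TABLE `table` DROP PARTITION p2020_01;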
I've got a three-column table. It has a unique index, and another two indexes (on two different columns) for faster queries.
+-------------+-------------+----------+
| category_id | related_id  | position |
+-------------+-------------+----------+
Sometimes the query is
SELECT * FROM table WHERE category_id = foo
and sometimes it's
SELECT * FROM table WHERE related_id = foo
So I decided to make both category_id and related_id indexes for better performance. Is this bad practice? What are the downsides of this approach?
In the case where I already have 100,000 rows in that table and am inserting another 100,000, will it be overkill to have to refresh the indexes with every new insert? Would that operation then take too long? Thanks
There are no downsides if it's doing exactly what you want: you query on a specific column a lot, so you index that column; that's the whole point. Now, if you have a 60-column table and you're adding indexes to columns you never query on, then you are wasting resources, because those indexes need to be maintained on INSERT/UPDATE/DELETE operations.
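Concretely, that's just the two single-column indexes (index names are mine; `table` is the question's placeholder name):

CREATE INDEX idx_category_id ON `table` (category_id);
CREATE INDEX idx_related_id ON `table` (related_id);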
If you have created an index for each column you query on, then you will definitely get a benefit out of it.
Don't go for composite indexes (multiple-column indexes).
You can see the advantage of an index in your query yourself by using EXPLAIN (the statement provides information about how MySQL executes statements).
EXAMPLE:
EXPLAIN SELECT * FROM table WHERE category_id = foo;
Hope this will help.
~K
It's good to have indexes. Just understand that indexes take more disk space but make searches faster.
It is in your best interest to index those fields which have fewer repeated values. For example, indexing a field that contains a Boolean flag might not be a good idea.
Since in your case you are indexing IDs, I think you won't have any problem keeping the indexes that you have created.
Also, the inserts will be slower, but since you are indexing IDs there won't be much of a difference in the time required to insert. Go ahead and do the insert.
My personal advice:
When you are inserting a large number of rows into a single table in one go, don't insert them with a single query unless you have to. This prevents your table from getting locked and being inaccessible for a long time.
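For example, instead of one 100,000-row statement, load in chunks (the rows below are purely illustrative, reusing the table from the question):

INSERT INTO `table` (category_id, related_id, position) VALUES
    (1, 10, 0),
    (1, 11, 1),
    -- ... a few thousand rows per statement ...
    (2, 12, 0);
-- Repeat with the next chunk until all 100,000 rows are loaded.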
One table has 3 columns and 1,000,000 records. Another table has 20 columns and 5,000,000 records. Which of the two tables gives quicker output when querying for data, provided both tables have an auto-increment value as their primary key?
To put it more clearly:
Let's say table1 has 3 columns and 1 million records, with 1 field indexed. And table2 has 30 columns and 10 lakh (1 million) records, with 5 fields indexed. If I run a query to select data from table1 and then another query to fetch data from table2 (columns are indexed on both tables), which table gives output quicker than the other?
Based on the sizes you mentioned, the tables are so small that it won't matter.
Generally speaking, though, MyISAM will be a bit faster than InnoDB for pretty much any table, although it seems like the gap is closing all the time.
Keep in mind though that for a small performance penalty, InnoDB gives you a lot in terms of ACID compliance.
I have a MyISAM table in MySQL which consists of two fields (f1 integer unsigned, f2 integer unsigned) and contains 320 million rows. I have an index on f2. Every week I insert about 150,000 rows into this table. I would like to know how frequently I need to run ANALYZE and OPTIMIZE on this table (as they would probably take a long time and block it in the meantime). I do not run any DELETE or UPDATE statements; I just insert new rows every week. Also, I am not using this table in any joins, so, based on this information, are ANALYZE and OPTIMIZE really required?
Thanks in advance,
Tim
ANALYZE TABLE analyzes and stores the key distribution for a table; OPTIMIZE TABLE reorganizes the physical storage of table data and index data.
If you never... ever... delete or update the data in your table and only insert new rows, you won't need ANALYZE or OPTIMIZE.
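For reference, should you ever need them (the table name mytable is hypothetical):

ANALYZE TABLE mytable;   -- refreshes the index key-distribution statistics
OPTIMIZE TABLE mytable;  -- rebuilds the table, defragmenting data and index files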