I have a table with 8 million records in MySQL.
I want to keep the last week of data and delete the rest. I can take a dump and recreate the table in another schema.
I am struggling to get the queries right, so please share your views and the best approaches for this. What is the best way to delete so that it does not affect other tables in production?
Thanks.
MySQL offers a feature called partitioning. You can do a horizontal partition and split your table by rows. 8 million isn't that much; what is the insertion rate per week?
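CREATE TABLE MyVeryLargeTable (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  my_date DATE NOT NULL,
  -- your other columns
  PRIMARY KEY (id, my_date)  -- every unique key must include the partitioning column
) PARTITION BY HASH (YEARWEEK(my_date)) PARTITIONS 4;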
You can read more about it here: http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
Edit: This creates 4 partitions, so it only keeps 4 weeks apart (after that, HASH wraps around and a new week shares a partition with an older one). I therefore suggest partitioning by month or year instead. The partition limit is quite high, but this really depends on what the insertion rate per week/month/year looks like.
Edit 2
MySQL 5.0 comes with an Archive engine, which you should use for your archive table ( http://dev.mysql.com/tech-resources/articles/storage-engine.html ). Now, how do you get your data into the archive table? It seems you have to write a cron job that runs at the beginning of every week, moving all old records to the archive table and deleting them from the original one. You could write a stored procedure for this, but the cron job still needs to run from the shell. Keep in mind that this could affect your data integrity in some way. What about upgrading to MySQL 5.1?
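A minimal sketch of what such a weekly job might run, assuming the table from the example above; the archive table name and the 7-day cutoff are illustrative assumptions, not part of the original answer. The cutoff is computed once so that the INSERT and the DELETE work on the same set of rows.

```sql
-- Hypothetical archive table; ARCHIVE accepts INSERT and SELECT but not DELETE or UPDATE.
CREATE TABLE MyVeryLargeTable_archive (
  id BIGINT UNSIGNED NOT NULL,
  my_date DATE NOT NULL
  -- your other columns
) ENGINE = ARCHIVE;

-- Fix the cutoff once so both statements below see the same boundary.
SET @cutoff = CURDATE() - INTERVAL 7 DAY;

-- Copy everything older than the cutoff into the archive table...
INSERT INTO MyVeryLargeTable_archive (id, my_date)
SELECT id, my_date
FROM MyVeryLargeTable
WHERE my_date < @cutoff;

-- ...then remove it from the live table.
DELETE FROM MyVeryLargeTable
WHERE my_date < @cutoff;
```

The two statements are not atomic together, so a row with an old my_date inserted between them would be deleted without being archived. On a large backlog the DELETE is better run in small batches.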
Related
I have a MySQL table that is growing quite fast, and I was wondering what the best approach would be for archiving data that is no longer needed.
The table has data going back 2 years, but we only need the data from the last year onwards.
At the moment, the table has about 4 million rows and is 2.2GB in size.
DB specs:
Engine version: 5.7.mysql_aurora.2.07.2
Instance class: db.r4.xlarge
vCPU: 4
RAM: 30.5 GB
Would anyone have any input in that regard?
Thank you
If the table were already partitioned by, say, month, archiving would be relatively efficient.
In the absence of that prep work, I recommend:
PARTITION BY RANGE(..)
Create a new table that is partitioned; cf Partition
Copy the data since a year ago into that table.
Drop the current table and rename the new table into its place.
Work on creating a regular monthly process involving "transportable tablespaces". Or, if you don't need to keep the old data, then plan on just DROP PARTITION (and add a new partition). (See link above.)
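The steps above might be sketched as follows. The table name metrics, its columns, and the partition dates are hypothetical, and the RENAME swap is one possible way to put the new table in place rather than something the answer prescribes.

```sql
-- New table, partitioned by month on a hypothetical created_at column.
-- The partitioning column has to be part of every unique key, hence the composite PK.
CREATE TABLE metrics_new (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  created_at DATETIME NOT NULL,
  payload VARCHAR(255),
  PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p202401 VALUES LESS THAN (TO_DAYS('2024-02-01')),
  PARTITION p202402 VALUES LESS THAN (TO_DAYS('2024-03-01')),
  -- ... one partition per month that has to be kept
  PARTITION pfuture VALUES LESS THAN MAXVALUE
);

-- Copy only the rows from the last year.
INSERT INTO metrics_new (id, created_at, payload)
SELECT id, created_at, payload
FROM metrics
WHERE created_at >= NOW() - INTERVAL 1 YEAR;

-- Swap the tables in one atomic statement, then drop the old one.
RENAME TABLE metrics TO metrics_old, metrics_new TO metrics;
DROP TABLE metrics_old;

-- Later, the monthly purge is a cheap metadata operation instead of a big DELETE.
ALTER TABLE metrics DROP PARTITION p202401;
```

Rows written to the old table while the copy is running are not carried over by this sketch, so it assumes a maintenance window or a short pause in writes.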
Big DELETE
If, instead, you choose to do something that involves DELETEing millions of rows, I strongly suggest chunking the operation: http://mysql.rjweb.org/doc.php/deletebig
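As a rough illustration of chunking (not the exact method from the linked article), a stored procedure can delete a limited number of rows per statement until nothing is left. The table metrics, the column created_at, the batch size of 1000, and the pause are all assumptions.

```sql
DELIMITER //

CREATE PROCEDURE purge_old_rows()
BEGIN
  -- Compute the cutoff once so every batch uses the same boundary.
  DECLARE cutoff DATETIME DEFAULT NOW() - INTERVAL 1 YEAR;
  DECLARE deleted INT DEFAULT 1;

  WHILE deleted > 0 DO
    -- Each statement touches at most 1000 rows, so locks are held only briefly.
    DELETE FROM metrics
    WHERE created_at < cutoff
    LIMIT 1000;

    -- ROW_COUNT() reports how many rows the DELETE just removed.
    SET deleted = ROW_COUNT();

    -- Short pause to let other queries through.
    DO SLEEP(0.1);
  END WHILE;
END//

DELIMITER ;

CALL purge_old_rows();
```

DELIMITER is a command of the mysql command-line client, not SQL. With autocommit on, each batch commits on its own. This simple form relies on an index on created_at; without one, every batch rescans the table.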
The above does not say where you will send the data you have removed from this main table. What is your plan for that?
I am facing a performance issue in MySQL due to a large index on my table. The index has grown to 6GB, and my instance has 32GB of memory. The majority of rows are not needed in that table after a few hours and can be removed selectively. But removing them is time consuming and doesn't reduce the index size.
Please suggest a way to manage this index.
You can optimize the table to rebuild the index and reclaim the space that deletion alone does not give back:
OPTIMIZE TABLE table_name;
But because your table is large, it will be locked during OPTIMIZE TABLE, and you still have the problem of removing old data that you stop needing after a few hours. So you can proceed as follows:
Step 1: During the night, or whenever traffic on your DB is low, rename your main table and create a new table with the same name. Then insert the last few hours of data from the old table into the new table.
This removes the unwanted data, and the new table will also be optimized.
Step 2: To avoid this issue in future, create a stored procedure that runs once per day during night hours and either deletes the data up to the previous day (as per your requirement) or moves it to a historical table.
Step 3: Since your table will now only ever hold a single day of data, you can easily run OPTIMIZE TABLE on it to rebuild the index and reclaim space.
Note: a DELETE statement does not rebuild the index and does not free space on the server. For that you need to optimize the table, which can be done in various ways, for example with an ALTER statement or an OPTIMIZE statement.
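Step 1 could look roughly like this. The table name sessions, the column created_at, and the 6-hour window are assumptions. The sketch also reorders the step slightly: it creates the empty copy first and swaps both names in a single RENAME, so the original name never disappears.

```sql
-- Empty table with the same columns and indexes as the original.
CREATE TABLE sessions_new LIKE sessions;

-- Atomic swap: the application now writes to the fresh, compact table.
RENAME TABLE sessions TO sessions_old, sessions_new TO sessions;

-- Bring back only the rows that are still needed.
-- (If the table has unique keys that new writes could collide with, use INSERT IGNORE.)
INSERT INTO sessions
SELECT *
FROM sessions_old
WHERE created_at >= NOW() - INTERVAL 6 HOUR;

-- Dropping the old table releases its data and index space in one go.
DROP TABLE sessions_old;
```

Between the swap and the end of the INSERT, readers briefly see a table without the recent rows, which is why the answer suggests doing this during low traffic.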
If you can remove all the rows older than X hours, then PARTITIONing is the way to go. PARTITION BY RANGE on the hour and use DROP PARTITION to remove an old hour and REORGANIZE PARTITION to create a new hour. You should have X+2 partitions. More details.
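A sketch of that rotation, assuming a hypothetical events table with a created_at column; the partition names and timestamps are placeholders that a scheduled job would generate.

```sql
CREATE TABLE events (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  created_at DATETIME NOT NULL,
  payload VARCHAR(255),
  PRIMARY KEY (id, created_at)  -- partitioning column must be in every unique key
)
PARTITION BY RANGE (TO_SECONDS(created_at)) (
  PARTITION p2024010100 VALUES LESS THAN (TO_SECONDS('2024-01-01 01:00:00')),
  PARTITION p2024010101 VALUES LESS THAN (TO_SECONDS('2024-01-01 02:00:00')),
  PARTITION p2024010102 VALUES LESS THAN (TO_SECONDS('2024-01-01 03:00:00')),
  PARTITION pfuture     VALUES LESS THAN MAXVALUE
);

-- Every hour: drop the oldest hour...
ALTER TABLE events DROP PARTITION p2024010100;

-- ...and split the catch-all partition to make room for the next hour.
ALTER TABLE events REORGANIZE PARTITION pfuture INTO (
  PARTITION p2024010103 VALUES LESS THAN (TO_SECONDS('2024-01-01 04:00:00')),
  PARTITION pfuture     VALUES LESS THAN MAXVALUE
);
```

If the job runs on time, pfuture is still empty when it is reorganized, so the split is cheap.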
If the deletes are more complex, please provide more details; perhaps we can come up with another solution that deals with the question about index size. Please include SHOW CREATE TABLE.
Even if you cannot use partitions for purging, it may be useful to have partitions for OPTIMIZE. Do not use OPTIMIZE PARTITION; it optimizes the entire table. Instead, use REORGANIZE PARTITION if you see you need to shrink the index.
How big is the table?
How big is innodb_buffer_pool_size?
(6GB index does not seem that bad, especially since you have 32GB of RAM.)
I have a dating website. On this website I send 10 photo matches to each user daily and store them in the following structure:
SENDER RECEIVER
11 1
12 1
13 1
14 1
I maintain a two-month log.
Users can also check them by logging in to my website.
This means there are parallel inserts and selects, which is not an issue.
The problem is that when a user becomes inactive or deletes their ID, I need to remove all entries from the log where sender='inactive-id'.
The log holds approximately 60 million rows.
So whenever a delete query hits this huge table, all selects get locked and my site goes down.
Note that my table is a MyISAM MERGE table, as I need to store 2-3 months of records, and on the 1st of every month I change the definition.
Normally, a table is the most granular object that is locked by a DELETE statement. Therefore, by using a MERGE table you combine several objects that could be locked independently into a single big object that is locked whenever a DELETE hits ANY of its tables.
MERGE is a solution for tables which change rarely or never: MERGE Table Advantages and Disadvantages.
You have 2 options:
1) Minimise the impact of locks:
- Delete in small batches
- Run the delete job during low-load hours
- Consider not deleting at all, if it does not save you much space
- Instead of deleting rows, mark them as "deleted" or obsolete and exclude them from SELECT queries
2) Have smaller objects locked (rather than locking all your tables at once):
- Have several DELETE statements, one for each of the underlying tables
- Drop the MERGE definition, delete the data from each underlying table, then recreate the MERGE. However, I think you can do it without dropping the MERGE definition.
- Use partitioning.
Quote from MySQL Manual:
An alternative to a MERGE table is a partitioned table, which stores
partitions of a single table in separate files. Partitioning enables
some operations to be performed more efficiently and is not limited to
the MyISAM storage engine. For more information, see Chapter 18, Partitioning.
I would strongly advocate for partitioning, because:
- You can fully automate your logging / data retention process: a script can create new and remove empty partitions, move obsolete data to a different table and then truncate that table.
- Key uniqueness is enforced
- Only the partition that contains the data to be deleted is locked. Selects on other partitions run as normal.
- Searches run on all partitions at the same time (as with MERGE), but you can use HASH SubPartitioning to further speed up searches.
However, if you believe that the benefits of partitioning will be outweighed by the cost of development, then maybe you should not delete that data at all?
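For the first two options, a small sketch of batched deletes issued against the underlying tables instead of the MERGE table. The table names log_2024_01 and log_2024_02 and the user ID 42 are made up for illustration.

```sql
-- Hypothetical underlying MyISAM tables behind the MERGE table.
-- Each statement locks only one of them, and only for a short batch.
DELETE FROM log_2024_01 WHERE sender = 42 LIMIT 1000;
DELETE FROM log_2024_02 WHERE sender = 42 LIMIT 1000;

-- Repeat each statement until it reports 0 affected rows,
-- ideally from a job that runs during low-load hours.
```

An index on sender in each underlying table keeps every batch short.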
I think the best solution would be to partition the log by user ID. That way, when you run a delete, the DB will block only one partition.
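One way this could look, as a sketch only: the table name, the sent_on column, and the choice of 16 partitions are assumptions.

```sql
-- Log table split into 16 partitions by sender ID.
CREATE TABLE daily_matches (
  sender INT NOT NULL,
  receiver INT NOT NULL,
  sent_on DATE NOT NULL,
  KEY idx_sender (sender)
)
PARTITION BY HASH (sender) PARTITIONS 16;

-- A delete for a single sender maps to exactly one partition.
DELETE FROM daily_matches WHERE sender = 42;
```

Whether only that partition is locked depends on the MySQL version and storage engine, since it relies on partition pruning being applied to DELETE statements.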
If you Google on "delete on huge table" you'll get some informative results. Here are the first three hits:
http://www.dba-oracle.com/t_oracle_fastest_delete_from_large_table.htm
Improving DELETE and INSERT times on a large table that has an index structure
http://www.dbforums.com/microsoft-sql-server/1635228-fastest-way-delete-large-table.html
One method they all mention is deleting in small batches instead of all at once. You say that the table contains data for a 2-month period. Maybe you could run a separate delete statement for each day?
I hope this helps!
If you use InnoDB and create FOREIGN KEY relations, you can get the rows deleted automatically when the user themself is deleted:
CREATE TABLE `DailyChoices` (
  sender INT(11) NOT NULL,
  receiver INT(11) NOT NULL,
  -- deleting a user removes that user's rows here as well
  CONSTRAINT FOREIGN KEY (sender) REFERENCES users (userid) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = InnoDB;  -- TYPE = ... was removed in MySQL 5.5
I have a table with 1 million records. As my queries will mainly be based on one column (32 constants), I am trying to add 32 partitions using type LIST.
My application can't be stopped, and records will be inserted in the meantime. Can I add partitions to the table? Does it impact my application, for example by locking rows while partitioning?
I searched the internet but didn't find much material about adding partitions to an existing table.
Thank you.
A common way to do a table migration such as this without any impact to the application is to follow these steps:
1) Create a duplicate table that contains the revisions you require (in your case, the partitions).
2) Set up a trigger on the original table to insert into the duplicate table (this will act as a form of replication for a short period of time).
3) Start a migration from the original table to the new table, at a rate that will not hinder your application (say 1000 rows at a time).
4) When the migration completes, you'll have your tables in perfect sync. This is the time to modify your application to start reading and writing using the new table.
5) Once you're happy that your app is functional using the new table, drop the old table.
6) Migration complete, have a beer.
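A sketch of the first three steps, assuming a hypothetical table records with columns id, category and payload, where category is the column with the 32 constants. It also assumes rows are only inserted, never updated or deleted, during the migration; otherwise matching UPDATE and DELETE triggers would be needed too.

```sql
-- 1) Duplicate table with the desired LIST partitioning.
CREATE TABLE records_new (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  category TINYINT UNSIGNED NOT NULL,
  payload VARCHAR(255),
  PRIMARY KEY (id, category)  -- partitioning column must be in every unique key
)
PARTITION BY LIST (category) (
  PARTITION p0 VALUES IN (0),
  PARTITION p1 VALUES IN (1),
  PARTITION p2 VALUES IN (2)
  -- ... one partition per remaining constant
);

-- 2) Mirror new inserts into the duplicate table.
--    REPLACE keeps this safe if the backfill already copied the row.
CREATE TRIGGER records_mirror_insert
AFTER INSERT ON records
FOR EACH ROW
  REPLACE INTO records_new (id, category, payload)
  VALUES (NEW.id, NEW.category, NEW.payload);

-- 3) Backfill existing rows in slices of 1000 ids.
SET @last_id = 0;

-- Run the next two statements repeatedly (from a script)
-- until @last_id passes the highest id in records.
INSERT IGNORE INTO records_new (id, category, payload)
SELECT id, category, payload
FROM records
WHERE id > @last_id AND id <= @last_id + 1000;

SET @last_id = @last_id + 1000;
```

INSERT IGNORE skips rows the trigger has already mirrored, so the backfill and live traffic can overlap without duplicate-key errors. Dropping the old table at the end also drops its trigger.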
We use the MySQL InnoDB engine for insert and update operations. To improve query performance, we are considering using a MEMORY table to store the latest data, e.g. the last two months.
We can configure MySQL to import data into the MEMORY table when the server starts, but the actual business data is updated all the time. We have to synchronize the data from the InnoDB table to the MEMORY table frequently, but we cannot restart the MySQL server every time we want to synchronize.
Does anybody know how to synchronize the data without restarting MySQL?
You would typically do that with a trigger. My first idea would be to do it in two parts.
1) Create triggers for insert, update and delete (if that ever happens) on the InnoDB table that cause the same change in the MEMORY table. Make sure no logic relies on certain rows having been deleted from the MEMORY table; it will hold the last 2 months and then some.
2) Create a background job to clear old data out of the MEMORY table. If you have a high load against it, consider a frequent job that nibbles off the old rows a few at a time.
Another solution would be to partition the InnoDB table by time and then make sure you include something like where time > subdate(now(), interval 2 month) in your queries.
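The two parts might be sketched like this. The table readings, its columns, and the use of a MySQL event as the background job are assumptions; the event only runs if the event scheduler is enabled, and a cron job would do equally well.

```sql
-- In-memory copy of the recent data (hypothetical schema).
CREATE TABLE readings_recent (
  id BIGINT UNSIGNED NOT NULL,
  created_at DATETIME NOT NULL,
  value INT,
  PRIMARY KEY (id),
  KEY idx_created (created_at) USING BTREE  -- BTREE so range scans on time work
) ENGINE = MEMORY;

-- Initial load; also what has to be rerun after a server restart.
INSERT INTO readings_recent (id, created_at, value)
SELECT id, created_at, value
FROM readings
WHERE created_at >= NOW() - INTERVAL 2 MONTH;

-- 1) Triggers that repeat every change on the InnoDB table.
CREATE TRIGGER readings_ai AFTER INSERT ON readings
FOR EACH ROW
  REPLACE INTO readings_recent (id, created_at, value)
  VALUES (NEW.id, NEW.created_at, NEW.value);

CREATE TRIGGER readings_au AFTER UPDATE ON readings
FOR EACH ROW
  UPDATE readings_recent
  SET created_at = NEW.created_at, value = NEW.value
  WHERE id = OLD.id;

CREATE TRIGGER readings_ad AFTER DELETE ON readings
FOR EACH ROW
  DELETE FROM readings_recent WHERE id = OLD.id;

-- 2) Frequent job that nibbles off old rows a few at a time.
CREATE EVENT purge_readings_recent
ON SCHEDULE EVERY 1 MINUTE
DO
  DELETE FROM readings_recent
  WHERE created_at < NOW() - INTERVAL 2 MONTH
  LIMIT 1000;
```

Because the purge lags behind, queries against readings_recent should still filter on created_at rather than assume that old rows are gone.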