How can I give a row a lifetime, so that after a specific time (say, two weeks) the row is automatically erased? Any info would be great.
RDBMS don't generally allow rows to automatically self destruct. It's bad for business.
More seriously, some ideas, depending on your exact needs:
run a scheduled job to run a DELETE to remove rows based on some date/time column
(more complex idea) use a partitioned table with a sliding window to move older rows to another partition
use a view to only show rows less than 2 weeks old
Add a timestamp column to the table that defaults to CURRENT_TIMESTAMP, and install a cron job on the server that frequently runs and prunes old records.
DELETE FROM MyTable WHERE datediff(now(), myTimestamp) >= 14;
Or you can add a timestamp column and always select like this:
SELECT * FROM myTable WHERE timestampColumn >= date_sub(now(), interval 2 week);
This is better if you don't actually need to erase the data and only want to show data from the last 2 weeks.
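Both ideas (the cron-style prune and the filtered select) can be sketched end to end. This is a minimal, hypothetical example using Python's sqlite3 as a stand-in for MySQL; SQLite's datetime('now', '-14 days') plays the role of DATE_SUB(NOW(), INTERVAL 2 WEEK), and the table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, payload TEXT, "
             "created TIMESTAMP DEFAULT CURRENT_TIMESTAMP)")
# One fresh row and one row backdated 20 days.
conn.execute("INSERT INTO myTable (payload) VALUES ('fresh')")
conn.execute("INSERT INTO myTable (payload, created) "
             "VALUES ('stale', datetime('now', '-20 days'))")

# Option 1: the cron-style prune -- delete rows older than 14 days.
cur = conn.execute("DELETE FROM myTable "
                   "WHERE created < datetime('now', '-14 days')")
print(cur.rowcount)  # 1 -- only the stale row is removed

# Option 2: keep all the data, but only ever SELECT the recent rows.
rows = conn.execute("SELECT payload FROM myTable "
                    "WHERE created >= datetime('now', '-14 days')").fetchall()
print(rows)  # [('fresh',)]
```

In MySQL the prune would typically live in a cron job or a scheduled EVENT rather than application code.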
Related
I am using MySQL server v5.1.73 on a CentOS 6.4 64-bit operating system. I have a table with about 17M records; its size is about 10GB, and the MySQL engine for this table is InnoDB.
This table has 10 columns, one of which is 'date', whose type is DATETIME. I want to delete records of a specific date with MySQL's DELETE command:
delete
from table
where date(date) = '2015-06-01'
limit 1000
but when I run this command, I get the error 'The total number of locks exceeds the lock table size'. I had this problem before, and changing innodb_buffer_pool_size would fix it, but this time the problem persists even after increasing that value.
I tried many tricks, like changing the LIMIT value to 100 or even 1 record, but it doesn't work. I even increased innodb_buffer_pool_size to 20GB, but nothing changed.
I also read these links: "The total number of locks exceeds the lock table size" Deleting 267 Records and The total number of locks exceeds the lock table size
but they didn't solve my problem. My server has 64GB of RAM.
On the other hand, I can delete records when not using a filter on a specific date:
delete
from table
limit 1000
and I can also select records of the day without any problem. Can anyone help me with this?
I would appreciate any help to fix the problem.
Don't use date(date); it cannot use INDEX(date). Instead, compare date directly and have an index beginning with date.
More ways to do a chunking delete.
date(date) in the WHERE clause requires the database to calculate the value of that expression for all 17M rows at run time - the DB will compute date(date) row by row for 17M rows, and then table-scan (there is no usable index) those 17M rows to work out your result set. This is where you are running out of resources.
You need to remove the usage of the calculated column, which can be solved a couple of different ways.
Rather than doing date(date), change your comparison to be:
WHERE date >= '2015-06-01 00:00:00' AND date <= '2015-06-01 23:59:59'
This will now hit the index on the date column directly. (I'm assuming you have an index on this column; it just won't be used by your original query.)
The other solution would be to add a column to the table of type DATE, and permanently store the DATE of each DATETIME in that column (and obviously add an index for the new DATE column). That would allow you to run any query you like that just wants to examine the DATE portion only, without having to specify the time range. If you've got other queries currently using date(date), having a column with just the date specifically in it might be a preferred solution than adding the time range to the query (adding the time range is fine for a straight index comparison in a SELECT or DELETE like here, but might not be a usable solution for other queries involving JOIN, GROUP BY, etc).
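The index-use difference described above is easy to demonstrate with a query planner. Below is a sketch using Python's sqlite3 as a stand-in for MySQL (the table and index names are invented; MySQL's EXPLAIN would show the same scan-vs-range distinction):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, d TEXT)")  # d holds a datetime string
conn.execute("CREATE INDEX idx_d ON t (d)")

def plan(sql):
    # Concatenate the "detail" column of EXPLAIN QUERY PLAN output.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Wrapping the column in a function hides it from the index: full scan.
scan = plan("SELECT * FROM t WHERE date(d) = '2015-06-01'")
# A plain range comparison on the bare column can use the index: search.
search = plan("SELECT * FROM t WHERE d >= '2015-06-01 00:00:00' "
              "AND d <= '2015-06-01 23:59:59'")
print(scan)    # contains "SCAN"
print(search)  # contains "SEARCH ... idx_d"
```

The same experiment in MySQL (EXPLAIN SELECT ...) shows type=ALL for the function form and type=range for the bare-column form.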
I have a large table containing hourly statistical data broken down across a number of dimensions. It's now large enough that I need to start aggregating the data to make queries faster. The table looks something like:
customer INT
campaign INT
start_time TIMESTAMP
end_time TIMESTAMP
time_period ENUM('hour', 'day', 'week')
clicks INT
I was thinking that I could, for example, insert a row into the table where campaign is null, and the clicks value would be the sum of all clicks for that customer and time period. Similarly, I could set the time period to "day" and this would be the sum of all of the hours in that day.
I'm sure this is a fairly common thing to do, so I'm wondering what the best way to achieve this is in MySQL. I'm assuming an INSERT INTO combined with a SELECT statement (like with a materialized view); however, since new data is constantly being added to this table, how do I avoid re-calculating aggregate data that I've previously calculated?
I've done something similar, and here are the problems I had to deal with:
You can use floor(start_time/86400)*86400 in the GROUP BY part to get a summary of all entries from the same day (floor rather than round, so afternoon entries aren't pushed into the next day). For weeks it's almost the same.
The SQL will look like:
insert into the_table
select
    customer,
    NULL,
    floor(start_time/86400)*86400,
    floor(start_time/86400)*86400 + 86400,
    'day',
    sum(clicks)
from the_table
where time_period = 'hour' and start_time between <A> and <B>
group by customer, floor(start_time/86400)*86400;
delete from the_table
where time_period = 'hour' and start_time between <A> and <B>;
If you're going to insert a summary from a table into that same table, MySQL will use a temporary table (meaning part of the data is copied aside for the inner select and then dropped, for each transaction). So you must be very careful with the indexes and with the size of the data returned by the inner select.
When you are constantly inserting and deleting rows, you will get fragmentation issues sooner or later, and they will slow you down dramatically. The solution is to use partitioning and drop old partitions from time to time. Alternatively you can run an OPTIMIZE TABLE statement, but it will block your work for a relatively long time (possibly minutes).
To avoid a mess with duplicate data, you may want to use a separate table for each aggregation period (hour_table, day_table, ...).
If you're trying to make the table smaller, you'll be deleting the detailed rows after you make the summary row, right? Transactions are your friend. Start one, compute the rollup, insert the rollup, delete the detailed rows, end the transaction.
If you happen to add more rows for an older time period (who does that??), you can run the rollup again - it will combine your previous rollup entry with your extra data into a new, more powerful, rollup entry.
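The transaction-wrapped rollup described above (insert the rollup, delete the detailed rows, all or nothing) can be sketched like this, with Python's sqlite3 standing in for MySQL; epoch-second timestamps and the scaled-down table layout are assumptions based on the schema in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stats (customer INT, campaign INT, start_time INT, end_time INT,
                    time_period TEXT, clicks INT);
-- three hourly rows for customer 1, all within epoch day 0
INSERT INTO stats VALUES (1, 7, 0,    3600,  'hour', 10);
INSERT INTO stats VALUES (1, 7, 3600, 7200,  'hour', 20);
INSERT INTO stats VALUES (1, 8, 7200, 10800, 'hour', 5);
""")

# Roll the hourly rows up into one daily row, then delete them -- in one
# transaction so readers never see a half-finished state.
with conn:  # commits on success, rolls back on any exception
    conn.execute("""
        INSERT INTO stats
        SELECT customer, NULL,
               (start_time / 86400) * 86400,
               (start_time / 86400) * 86400 + 86400,
               'day', SUM(clicks)
        FROM stats
        WHERE time_period = 'hour' AND start_time BETWEEN 0 AND 86399
        GROUP BY customer, (start_time / 86400) * 86400
    """)
    conn.execute("DELETE FROM stats WHERE time_period = 'hour' "
                 "AND start_time BETWEEN 0 AND 86399")

print(conn.execute("SELECT time_period, clicks FROM stats").fetchall())
# [('day', 35)]
```

Re-running the same rollup over a window that already contains a 'day' row would combine it with any newly arrived hourly rows, as noted above.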
I find myself wanting to delete rows older than (x)-days on a rolling basis in a lot of applications. What is the best way to do this most efficiently on a high-traffic table?
For instance, if I have a table that stores notifications and I only want to keep these for 7 days. Or high scores that I only want to keep for 31 days.
Right now I keep a row storing the epoch time posted and run a cron job that runs once per hour and deletes them in increments like this:
DELETE FROM my_table WHERE time_stored < 1234567890 LIMIT 100
I do that until mysql_affected_rows returns 0.
I used to do it all at once but that caused everything in the application to hang for 30 seconds or so while INSERTS piled up. Adding the LIMIT worked to alleviate this but I'm wondering if there is a better way to do this.
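The loop described above (delete in LIMIT-sized chunks until no rows are affected) can be sketched as follows. Python's sqlite3 is used as a stand-in, and since plain SQLite builds lack DELETE ... LIMIT, the chunk is selected in a subquery; in MySQL the DELETE ... LIMIT 100 form from the question works directly. The numbers are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, time_stored INT)")
conn.executemany("INSERT INTO my_table (time_stored) VALUES (?)",
                 [(i,) for i in range(250)])  # 250 expired rows

CUTOFF = 1000  # everything stored before this epoch value is expired
CHUNK = 100    # delete at most this many rows per statement

deleted = 0
while True:
    cur = conn.execute(
        "DELETE FROM my_table WHERE id IN ("
        "  SELECT id FROM my_table WHERE time_stored < ? LIMIT ?)",
        (CUTOFF, CHUNK))
    conn.commit()          # short transactions keep lock contention low
    if cur.rowcount == 0:  # the equivalent of mysql_affected_rows() == 0
        break
    deleted += cur.rowcount

print(deleted)  # 250, removed in chunks of 100, 100, 50
```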
Try creating an EVENT that runs on the database automatically at the interval you want.
Here is an example:
Say you want to delete entries more than 30 days old from some table 'tableName' based on its 'datetime' column. Then the following event runs every day and performs the required clean-up:
CREATE EVENT AutoDeleteOldNotifications
ON SCHEDULE EVERY 1 DAY
ON COMPLETION PRESERVE
DO
DELETE LOW_PRIORITY FROM databaseName.tableName WHERE datetime < DATE_SUB(NOW(), INTERVAL 30 DAY);
We need to add ON COMPLETION PRESERVE to keep the event after each run, and the event scheduler must be enabled (SET GLOBAL event_scheduler = ON). You can find more info here: http://www.mysqltutorial.org/mysql-triggers/working-mysql-scheduled-event/
Check out MySQL Partitioning:
Data that loses its usefulness can often be easily removed from a partitioned table by dropping the partition (or partitions) containing only that data. Conversely, the process of adding new data can in some cases be greatly facilitated by adding one or more new partitions for storing specifically that data.
See e.g. this section to get some ideas on how to apply it:
MySQL Partition Pruning
And this one:
Partitioning by dates: the quick how-to
Instead of executing the delete against the table alone, try gathering the matching keys first and then doing a DELETE JOIN.
Given your sample query above:
DELETE FROM my_table WHERE time_stored < 1234567890 LIMIT 100 ;
You can leave the LIMIT out of it.
Let's say you want to delete data that is over 31 days old.
Let's compute 31 days in seconds (86400 x 31 = 2678400).
Start with key gathering
Next, index the keys
Then, perform DELETE JOIN
Finally, drop the gathered keys
Here is the algorithm
CREATE TABLE delete_keys SELECT id FROM my_table WHERE 1=2;
INSERT INTO delete_keys
SELECT id FROM
(
SELECT id FROM my_table
WHERE time_stored < (UNIX_TIMESTAMP() - 2678400)
ORDER BY time_stored
) A LIMIT 100;
ALTER TABLE delete_keys ADD PRIMARY KEY (id);
DELETE B.* FROM delete_keys
INNER JOIN my_table B USING (id);
DROP TABLE delete_keys;
If the key gathering is less than 5 minutes, then run this query every 5 minutes.
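Here is a runnable sketch of the four steps, with Python's sqlite3 standing in for MySQL. SQLite has no multi-table DELETE ... INNER JOIN, so step 3 uses IN (SELECT ...) instead, and the numbers are scaled down from the 31-day example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, time_stored INT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [(i, i) for i in range(300)])

NOW = 1000
MAX_AGE = 200  # stand-in for the 2678400-second (31-day) window

# 1. Gather the keys of expired rows into a scratch table.
conn.execute("CREATE TABLE delete_keys AS "
             "SELECT id FROM my_table WHERE 1 = 2")
conn.execute("INSERT INTO delete_keys "
             "SELECT id FROM my_table WHERE time_stored < ? "
             "ORDER BY time_stored LIMIT 100", (NOW - MAX_AGE,))
# 2. Index the keys.
conn.execute("CREATE UNIQUE INDEX dk_id ON delete_keys (id)")
# 3. Delete by joining against the key table (MySQL's multi-table
#    DELETE ... INNER JOIN is expressed as IN (...) here).
cur = conn.execute("DELETE FROM my_table "
                   "WHERE id IN (SELECT id FROM delete_keys)")
# 4. Drop the gathered keys.
conn.execute("DROP TABLE delete_keys")

print(cur.rowcount)  # 100 -- one LIMIT-sized batch removed
```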
Give it a Try !!!
UPDATE 2012-02-27 16:55 EDT
Here is something that should speed up key gathering a little more. Add the following index:
ALTER TABLE my_table ADD INDEX time_stored_id_ndx (time_stored,id);
This will better support the subquery that populates the delete_keys table, because it provides a covering index so that the fields are retrieved from the index only.
UPDATE 2012-02-27 16:59 EDT
Since you have to delete often, you may want to try this every two months
OPTIMIZE TABLE my_table;
This will defrag the table after all those annoying little deletes every 5 minutes for two months
At my company, we have a similar situation. We have a table that contains keys that have an expiration. We have a cron that runs to clean that out:
DELETE FROM t1 WHERE expiration < UNIX_TIMESTAMP(NOW());
This ran once an hour, but we were having issues similar to what you are experiencing. We increased it to once per minute, then to 6 times per minute, by setting up a cron with a bash script that basically runs the query, sleeps for a few seconds, and repeats until the minute is up.
The increased frequency significantly decreased the number of rows that we were deleting. Which relieved the contention. This is the route that I would go.
However, if you find that you still have too many rows to delete, use a LIMIT and sleep between the chunks. For example, if you have 50k rows to delete, do 10k chunks with a 2-second sleep between them. This keeps the queries from stacking up and allows the server to perform normal operations between the bulk deletes.
You may want to consider introducing a master/slave (replication) solution into your design. If you shift all the read traffic to the slave, you open up the master to handle 'on-the-fly' CRUD activities, which then replicate down to the slave (your read server).
And because you are deleting so many records you may want to consider running an optimize on the table(s) from where the rows are being deleted.
Ended up using this to keep only the last 100 rows in place, with no significant lag when executed frequently (every minute):
delete a from tbl a left join (
select ID
from tbl
order by id desc limit 100
) b on a.ID = b.ID
where b.ID is null;
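The same keep-the-newest-100 idea can be sketched with Python's sqlite3 (which lacks MySQL's multi-table DELETE ... LEFT JOIN, so NOT IN expresses the same anti-join); the table name and row counts are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO tbl (ID) VALUES (?)",
                 [(i,) for i in range(1, 251)])  # IDs 1..250

# Keep only the 100 newest rows (highest IDs); delete everything else.
conn.execute("DELETE FROM tbl WHERE ID NOT IN "
             "(SELECT ID FROM tbl ORDER BY ID DESC LIMIT 100)")
conn.commit()

remaining = conn.execute("SELECT COUNT(*), MIN(ID), MAX(ID) FROM tbl").fetchone()
print(remaining)  # (100, 151, 250)
```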
I have an application that inserts a datetime field. There was an issue, though: it would insert the value with seconds, and it wasn't supposed to!
I.e. 2011-08-07 15:24:06
They have corrected this, and inserts now only include up to the minute, so the above example would look like:
2011-08-07 15:24:00
There are 20 million rows "wrong" at the moment. What is the most efficient query to fix the old ones?
UPDATE
table
SET
timefield = DATE_FORMAT(timefield,'%Y-%m-%d %H:%i:00')
WHERE
SECOND(timefield) <> 0;
This will have to read each row of the table and extract the seconds part of the time (this is unavoidable), but it won't have to update rows which already are correct.
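A small sketch of this targeted update, using Python's sqlite3 as a stand-in (strftime plays the role of MySQL's DATE_FORMAT and SECOND); the sample rows mirror the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (timefield TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("2011-08-07 15:24:06",),   # wrong: has seconds
                  ("2011-08-07 15:24:00",)])  # already correct

# Zero out the seconds, touching only the rows that need it.
cur = conn.execute(
    "UPDATE t SET timefield = strftime('%Y-%m-%d %H:%M:00', timefield) "
    "WHERE strftime('%S', timefield) <> '00'")
print(cur.rowcount)  # 1 -- the already-correct row was skipped

print(conn.execute("SELECT timefield FROM t").fetchall())
# [('2011-08-07 15:24:00',), ('2011-08-07 15:24:00',)]
```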
Probably filter your update based on the 'wrong' ones then do a right trim of the last 3 characters.
I have a table with a timestamp column that records when the record is modified.
On a nightly basis, I would like to move all records that are older than 6 days. Should I use:
insert into archive_table select * from regular_table where datediff(now(), audit_updated_date) >= 6;
delete from regular_table where datediff(now(), audit_updated_date) >= 6;
Since there are 1 million rows in regular_table, is there any way to optimize the queries so they run faster? Also, will the delete lock regular_table?
My main concern is the read query to the db won't be slowed down by this archiving process.
Two suggestions:
Compute the value of the cutoff date once in a variable, and query compared to that, e.g.:
SET @archivalCutoff = DATE_SUB(NOW(), INTERVAL 6 DAY);
insert into archive_table select * from regular_table where audit_updated_date < @archivalCutoff;
delete from regular_table where audit_updated_date < @archivalCutoff;
In fact, what you have in your question runs into problems, especially with lots of records: because the cutoff moves between the two statements, you may end up with records in both the regular and archive tables, and with records that are deleted but never archived.
The second suggestion is to index the audit_updated_date field.
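Both suggestions together can be sketched as follows, with Python's sqlite3 standing in for MySQL (SQLite has no @user variables, so the client holds the cutoff once, which serves the same purpose); table and column names follow the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regular_table (id INTEGER PRIMARY KEY, audit_updated_date TEXT);
CREATE TABLE archive_table (id INTEGER PRIMARY KEY, audit_updated_date TEXT);
CREATE INDEX idx_audit ON regular_table (audit_updated_date);
INSERT INTO regular_table VALUES (1, datetime('now', '-10 days'));
INSERT INTO regular_table VALUES (2, datetime('now', '-1 days'));
""")

# Compute the cutoff ONCE, so the INSERT and the DELETE use the exact
# same boundary instead of each re-evaluating NOW().
cutoff = conn.execute("SELECT datetime('now', '-6 days')").fetchone()[0]

with conn:  # one transaction: a row is either archived+deleted or untouched
    conn.execute("INSERT INTO archive_table SELECT * FROM regular_table "
                 "WHERE audit_updated_date < ?", (cutoff,))
    conn.execute("DELETE FROM regular_table "
                 "WHERE audit_updated_date < ?", (cutoff,))

print(conn.execute("SELECT id FROM archive_table").fetchall())  # [(1,)]
print(conn.execute("SELECT id FROM regular_table").fetchall())  # [(2,)]
```

Note the bare-column comparison (audit_updated_date < cutoff) instead of datediff(now(), ...), so the new index can actually be used.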