delete rows from a table using MySQL Scheduler - mysql

I have a MySQL table called "regkeys" with two columns: "keyCat" and "keyNum". In my web app, a keygen module adds a key (e.g. 1134fb) and a category (mds, dts, etc.) to this table.
What I want to accomplish is this: after issuing a key (adding it to the db), I'd like a MySQL event to delete keys that have been in the table longer than 2 days (or so), thereby expiring them. Each key should be deleted 2 days after it was created, so some keys are removed sooner and others later, depending on when they were created. I looked at MySQL's API, but I need some help with the logic: I do not know how to tell the event to delete only the keys that have been stored for 2 days (or so). I was hoping somebody could give me a quick example or point me to a clear tutorial.
Thank you very much in advance.
Edited: I think I found this other question that helps a bit with my problem. I think I was going about it the wrong way (key based). Since every key and key category get inserted into a row, the scheduler should deal with rows instead of keys.

This is the solution:
Enabled the event scheduler in the db: SET GLOBAL event_scheduler = ON;
Added a timestamp column to my table, filled in by a MySQL default value.
Created an event: CREATE EVENT <name>
Set the event to run on a schedule: ON SCHEDULE EVERY 20 SECOND
Added the SQL query to the event: DO DELETE FROM <table_name> WHERE <time_stamp_column> < NOW() - INTERVAL 5 MINUTE
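Put together, the steps above might look like the following sketch for the regkeys table from the question. The column name created_at, the event name expire_regkeys, and the hourly schedule are assumptions made for illustration; the 2-day interval is the expiry the question asks for.

```sql
-- Step 1: turn the scheduler on (requires sufficient privileges).
SET GLOBAL event_scheduler = ON;

-- Step 2: add a timestamp column that the server fills in on INSERT.
-- "created_at" is a hypothetical name, not part of the original table.
ALTER TABLE regkeys
  ADD COLUMN created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP;

-- Steps 3-5: a recurring event that removes rows older than 2 days.
-- "expire_regkeys" and the 1 HOUR period are illustrative choices.
CREATE EVENT expire_regkeys
  ON SCHEDULE EVERY 1 HOUR
  DO
    DELETE FROM regkeys
    WHERE created_at < NOW() - INTERVAL 2 DAY;
```

With an hourly schedule a key can outlive its 2 days by up to an hour; a shorter period tightens that at the cost of running the DELETE more often.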

Related

Can I do Change Data Capture with MariaDb's Automatic Data Versioning

We're using MariaDb in production and we've added a MariaDb slave so that our data team can perform some ETL tasks from this slave to our datawarehouse. However, they lack a proper Change Data Capture feature (i.e. they want to know which rows from the production table changed since yesterday in order to query rows that actually changed).
I saw that MariaDb 10.3 has an interesting feature that lets you run a SELECT against an older version of a table. However, I haven't found any resources supporting the idea that it could be used for CDC. Any feedback on this feature?
If not, we'll probably resort to streaming the slave's binlogs to our datawarehouse, but that looks challenging.
Thanks for your help!
(As a supplement to Stefan's answer)
Yes, System-Versioning can be used for CDC, because the validity period in ROW_START (the object starts to be valid) and ROW_END (the object is now invalid) tells you when an INSERT, UPDATE, or DELETE query happened. But it's more cumbersome than alternative CDC variants.
INSERT:
Object was found for the first time
ROW_START is the insertion time
UPDATE:
Object wasn't found for the first time
ROW_START is the update time
DELETE:
ROW_END lies in the past
there is no new entry for this object in the next few lines
I'll add a picture to clarify this.
You can see that this versioning saves space, because the information about the INSERT and the DELETE of an object can be combined in one row, but checking for DELETEs is costly.
In the example above I used a table with a clear primary key, so checking for the same object is easy: just look at the id. If you want to capture changes in tables with a composite key, this can make the whole process more annoying.
Edit: another point is that the protocol data is kept in the same table as the "real" data. Maybe this is faster for an INSERT than known alternative solutions such as tracking per TRIGGER (like here), but if changes are made to the table quite frequently and you want to process/analyse the CDC data, this can cause performance problems.
MariaDB has supported System-Versioned Tables since version 10.3.4. System-versioned tables are specified in the SQL:2011 standard. They can be used to automatically capture previous versions of rows. Those versions can then be queried to retrieve the values as they were at a specific point in time.
The following text and code example is from the official MariaDB documentation
With system-versioned tables, MariaDB Server tracks the points in time
when rows change. When you update a row on these tables, it creates a
new row to display as current without removing the old data. This
tracking remains transparent to the application. When querying a
system-versioned table, you can retrieve either the most current
values for every row or the historic values available at a given point
in time.
You may find this feature useful in efficiently tracking the time of
changes to continuously-monitored values that do not change
frequently, such as changes in temperature over the course of a year.
System versioning is often useful for auditing.
When SYSTEM VERSIONING is added to a newly created or an already existing table (using ALTER), the table is extended with row_start and row_end timestamp columns, which make it possible to retrieve the record that was valid between the start and end timestamps.
CREATE TABLE accounts (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255),
amount INT
) WITH SYSTEM VERSIONING;
It is then possible to retrieve data as it was at a specific time (with SELECT * FROM accounts FOR SYSTEM_TIME AS OF '2019-06-18 11:00';), all versions within a specific time range
SELECT * FROM accounts
FOR SYSTEM_TIME
BETWEEN (NOW() - INTERVAL 1 YEAR)
AND NOW();
or all versions at once:
SELECT * FROM accounts
FOR SYSTEM_TIME ALL;
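For the CDC use case in the question ("which rows changed since yesterday"), one possible sketch is to filter the full history on the versioning columns. It assumes the accounts table above and relies on ROW_START and ROW_END being selectable by name; the one-day window and the is_closed alias are illustrative.

```sql
-- List every row version that began or ended within the last day.
SELECT
  id,
  name,
  amount,
  ROW_START,
  ROW_END,
  -- 1 when this version is no longer current (it was updated or deleted)
  (ROW_END <= NOW()) AS is_closed
FROM accounts FOR SYSTEM_TIME ALL
WHERE ROW_START >= NOW() - INTERVAL 1 DAY
   OR (ROW_END >= NOW() - INTERVAL 1 DAY AND ROW_END <= NOW())
ORDER BY id, ROW_START;
```

Telling an UPDATE from a DELETE still means comparing the versions of the same id, as described in the answer above: a closed version with no newer version for that id indicates a DELETE.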

Can MySQL give each row a specified lifetime so that it is deleted automatically?

I'd like to know if MySQL (or MariaDB) offers an expiration feature that removes a row from the DB automatically, without using an extra scheduler program or any SQL such as DELETE.
This should be defined when the table is created, so that the database starts managing a row's lifetime as soon as the INSERT occurs.
There are many related questions here:
MySQL how to make value expire?
Remove Mysql row after specified time
MySQL give rows a lifetime
However, I couldn't find the answer. I am not interested in solutions that use WHERE or DELETE.
Is it even possible?
Yes, you can create an event for this, like so:
CREATE EVENT lifetime ON SCHEDULE
EVERY 1 DAY STARTS CURRENT_TIMESTAMP
ON COMPLETION NOT PRESERVE
ENABLE
DO BEGIN
-- Put your DELETE query here, with a WHERE clause that calculates the expiry date.
-- In the mysql client, change the DELIMITER first once the body contains semicolons.
END
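As a minimal sketch of a per-row lifetime built on such an event: each row stores its own expiry time and a recurring event purges whatever has expired. The table sessions, its columns, and the event name are hypothetical, and the approach still uses DELETE behind the scenes, which the question hoped to avoid.

```sql
-- Hypothetical table: each row carries its own expiry time.
CREATE TABLE sessions (
  id         INT PRIMARY KEY AUTO_INCREMENT,
  payload    VARCHAR(255),
  expires_at DATETIME NOT NULL,
  KEY idx_expires_at (expires_at)  -- keeps the purge query cheap
);

-- The lifetime is chosen per row at insert time.
INSERT INTO sessions (payload, expires_at)
VALUES ('short-lived', NOW() + INTERVAL 10 MINUTE),
       ('long-lived',  NOW() + INTERVAL 7 DAY);

-- Needs the scheduler to be running: SET GLOBAL event_scheduler = ON;
CREATE EVENT purge_expired_sessions
  ON SCHEDULE EVERY 1 MINUTE
  DO
    DELETE FROM sessions WHERE expires_at <= NOW();
```

Once the event exists, the application only ever issues INSERTs; the cleanup is handled inside the database.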

Detecting database change

I have a database intensive application that needs to run every couple hours. Is there a way to detect whether a given table has changed since the last time this application ran?
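A straightforward way to detect changes is to compare the table checksum between runs (note that CHECKSUM TABLE reads the whole table, so it can be slow on large tables):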
CHECKSUM TABLE tableName
A couple of questions:
Which OS are you working on?
Which storage engine are you using?
The SHOW TABLE STATUS command (http://dev.mysql.com/doc/refman/5.5/en/show-table-status.html) can display some info, depending on the storage engine.
It also depends on how large is the interval between runs of your intensive operation.
The most precise way, I believe, is to use triggers (AFTER INSERT/UPDATE) as @Neuticle mentioned, and just store the CURRENT_TIMESTAMP next to the table name.
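-- One row per tracked table, holding the time of its last change.
CREATE TABLE table_versions(
table_name VARCHAR(50) NOT NULL PRIMARY KEY,
version TIMESTAMP NOT NULL
);
-- Single-statement trigger body, so no BEGIN ... END (or DELIMITER change) is needed.
-- Create matching AFTER UPDATE and AFTER DELETE triggers to catch those changes too.
CREATE TRIGGER table_1_version_insert AFTER INSERT
ON table_1
FOR EACH ROW
REPLACE INTO table_versions VALUES('table_1', CURRENT_TIMESTAMP);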
Could you set a trigger on the tables you want to track to add to a log table on insert? If that would work you only have to read the log tables on each run.
Use timestamp. Depending upon your needs you can set it to update on new rows, or just changes to existing rows. Go here to see a reference:
http://dev.mysql.com/doc/refman/5.0/en/timestamp-initialization.html
A common way to detect changes to a table between runs is with a query like this:
SELECT COUNT(*),MAX(t) FROM tableName;
But for this to work, a few assumptions must be true about your table:
The t column has a default value of NOW()
There is a trigger that runs on UPDATE and always sets the t column to NOW().
Any normal change made to the table will then cause the output of the above query to change.
There are a few race conditions that can make this sort of check not work in some instances.
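A sketch of a table that meets those assumptions is shown below. It swaps the UPDATE trigger for MySQL's ON UPDATE CURRENT_TIMESTAMP column attribute, which has a similar effect; the table name tracked_items and the payload column are hypothetical.

```sql
-- Hypothetical table; `t` is maintained by the server instead of by a trigger.
CREATE TABLE tracked_items (
  id      INT PRIMARY KEY AUTO_INCREMENT,
  payload VARCHAR(255),
  t       TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
                    ON UPDATE CURRENT_TIMESTAMP
);

-- Store this result after each run and compare it on the next run.
SELECT COUNT(*) AS row_count, MAX(t) AS last_change
FROM tracked_items;
```

An INSERT changes both values, an UPDATE that actually modifies a row moves last_change, and a DELETE lowers row_count. Changes that cancel out within the same second are one example of the race conditions mentioned above.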
Have used CHECKSUM TABLE tablename and that works just splendid.
Am calling it from an AJAX request to check for table updates. If changes are found a screen refresh is performed.
For database "myMVC" and table "detail" it returns one row with fields "table" and "Checksum" set to "mymvc.detail" and "521719307" respectively.

How to record insert time in a MySQL database

I want to remove a row from my table new_data once the row is 45 mins old and then insert it into another table called old_data.
The only way I can think of for this to work is to query the database, let's say every min, and remove any row where (current_time - time inserted) > 45 mins.
Is there any other way of doing this? If not, how could I set up a table to record inserted_time?
edit added
How could i write this statement to retrieve the correct data into the old_data table
SELECT * FROM new_spots WHERE (NOW()-created_at)>45mins
and then insert the above into the old_data table
You can specify the value of the time column upon insertion:
INSERT INTO x (created_at) VALUES (NOW());
Additionally, you can set up a VIEW that shows only recent entries.
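Such a view might look like the sketch below. It assumes the new_data table from the question has a created_at column populated as shown above; the view name recent_data is hypothetical.

```sql
-- Only rows inserted within the last 45 minutes are visible through the view.
-- NOW() is evaluated each time the view is queried, not when it is created.
CREATE VIEW recent_data AS
SELECT *
FROM new_data
WHERE created_at > NOW() - INTERVAL 45 MINUTE;

-- Usage: read the view as if it were a table.
SELECT * FROM recent_data;
```

The view only hides old rows; they stay in new_data until something deletes or moves them.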
You are asking for some kind of auto-expiration feature, which is not built into MySQL. Memcached provides this feature, so it might be cleaner to achieve your goal like this:
When you insert data into your system:
Insert the data into memcached with a 45-minute expiration time. After 45 minutes, the data automatically disappears from memcached.
Insert the data into the old_data table with a created_at column, in case you need to rebuild the cache after memcached restarts or hits some other issue.
Then every time you need the new data, you just read it from memcached. As a side effect, that is faster than getting the data from MySQL :).
@keymone showed you how to capture the insert time. Then, periodically (every minute seems excessive; every 5 mins? 15 mins?), go through and build a list of rows that meet the criteria, and for each entry, insert it into your second table and delete it from your first table.
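The periodic move could be sketched as a single transaction. This assumes new_data and old_data have identical column layouts, both use a transactional engine such as InnoDB, and new_data has the created_at column discussed above; the @cutoff variable is only there so both statements use the same boundary.

```sql
START TRANSACTION;

-- Fix the boundary once so the INSERT and the DELETE see the same set of rows.
SET @cutoff = NOW() - INTERVAL 45 MINUTE;

-- Copy the expired rows into the archive table...
INSERT INTO old_data
SELECT * FROM new_data WHERE created_at < @cutoff;

-- ...then remove exactly those rows from the live table.
DELETE FROM new_data WHERE created_at < @cutoff;

COMMIT;
```

This batch can be run from a cron job or placed in the body of a MySQL event; either way the rows move as a set instead of one entry at a time.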
I don't think there is an automatic way to do this. Here are some alternative ideas.
Use CRON
I have a similar scenario where we need to aggregate data from one table into another. A simple command line tool running via CRON suffices. We receive a few messages a second into our Web server and each results in a database insert. So volumes aren't huge but they are reasonably similar to your scenario
We use the NOW() function to record the insert time and after the records are 1hr old, we process them. It isn't exactly an hour but it is good enough. You can see the created_on field below.
CREATE TABLE glossaries (
id int(11) NOT NULL auto_increment,
# Our stuff ... (other columns, including owner_id, are elided here)
created_on datetime default NULL,
PRIMARY KEY (id),
KEY owner_id (owner_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Use CRON and a Trigger
Alternatively, you could use a database trigger to kick off the processing. You would still need something scheduled to cause the trigger to fire, but you would get maximum performance.
Chris

Does MySQL handle one SQL query at a time?

If you have 100,000 users, does MySQL execute one SQL query at a time?
Because in my PHP code I check if a certain row exists; if it doesn't it creates one. If it does, it just updates the row counter.
It crossed my mind that perhaps 100 users are checking if the row exists at the same time, and when it doesn't they all create one row each.
If MySQL is handling them sequentially I know that it won't be an issue, then one user will check if it exists, if not, create it. The other user will check if it exists, and since that's the case, it just updates the counter.
But if they all check if it exists at the same time and let's say it doesn't, then they all create one row and the whole table structure will fail.
Would be great if someone could shed some light on this topic.
Use a UNIQUE constraint or, if viable, make the primary key one of your data items and the SQL server will prevent duplicate rows from being created. You can even use the "ON DUPLICATE KEY UPDATE ..." syntax to specify the alternate operation if the row already exists.
From your comments, it sounds like you could use the user_id as your primary key, in which case, you'd be able to use something like this:
INSERT INTO usercounts (user_id,usercount)
VALUES (id-goes-here,1)
ON DUPLICATE KEY UPDATE usercount=usercount+1;
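For completeness, a minimal sketch of a table this statement could run against. The column types and the literal user id 42 are assumptions; what matters is that user_id is the primary key (or carries a UNIQUE constraint).

```sql
-- The primary key is what makes a second row for the same user impossible.
CREATE TABLE usercounts (
  user_id   INT NOT NULL PRIMARY KEY,
  usercount INT NOT NULL DEFAULT 0
);

-- Every request runs the same statement: the first execution for a user
-- inserts the row with usercount = 1, later executions increment it.
INSERT INTO usercounts (user_id, usercount)
VALUES (42, 1)
ON DUPLICATE KEY UPDATE usercount = usercount + 1;
```

Because the existence check and the write happen inside one statement, concurrent requests cannot both decide that the row is missing and create it twice.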
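If you put the check and the insert into a transaction, they are executed as one unit, which reduces the room for confusion. Note that a transaction by itself does not stop two sessions from both finding that the row is missing, so a UNIQUE constraint is still needed as a safeguard.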