Logging data changes in MySQL tables using ADO.NET

Is there any way to capture the latest changes in a MySQL database using ADO.NET? That is: which table and which column changed, the operation performed, and the old and new values, for both single-table and multi-table changes. I want to log the changes in my own new table.

There are several ways change tracking can be implemented for MySQL:
triggers: you can add a DB trigger for insert/update/delete that creates an entry in the audit log (see the sketch after this list).
application logic: track changes in your data layer. The implementation highly depends on that layer; if you use an ADO.NET DataAdapter, the RowUpdating event is suitable for this purpose.
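For illustration, a minimal sketch of the trigger approach. The audit_log table and the customer table with its name column are assumptions for the example, not from the question:

    -- one generic audit table for all tracked tables
    CREATE TABLE audit_log (
        id         INT AUTO_INCREMENT PRIMARY KEY,
        table_name VARCHAR(64) NOT NULL,
        operation  VARCHAR(10) NOT NULL,
        old_value  TEXT,
        new_value  TEXT,
        changed_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- hypothetical tracked table 'customer' with a 'name' column
    CREATE TRIGGER customer_au AFTER UPDATE ON customer
    FOR EACH ROW
        INSERT INTO audit_log (table_name, operation, old_value, new_value)
        VALUES ('customer', 'UPDATE', OLD.name, NEW.name);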
You also have the following alternatives for storing the audit log in a MySQL database:
one table for the whole audit log, with columns like: id, table, operation, new_value (string), old_value (string), as in the audit_log sketch above. This approach has several drawbacks: the table grows very fast (it holds the history of changes in all tables), it keeps values as strings, it stores excessive data duplicated between old/new pairs, and changeset calculation takes resources on every insert/update.
a 'mirror' table (say, with a '_log' suffix) for each table with change tracking enabled. On insert/update you execute an additional insert into the mirror table; as a result you have a record 'snapshot' on every save, and from these snapshots it is possible to calculate what changed and when. The performance overhead on insert/update is minimal, and you don't need to determine which values actually changed, but the mirror table holds a lot of redundant data, since a full row copy is saved even if only one column changed (see the sketch after this list).
a hybrid solution in which record 'snapshots' are saved temporarily and then processed in the background, to store the differences in an optimal way without affecting application performance.
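A sketch of the 'mirror' table option, again with an assumed customer table and assumed columns:

    -- structure mirrors the tracked table, plus its own key and a timestamp
    CREATE TABLE customer_log (
        log_id    INT AUTO_INCREMENT PRIMARY KEY,
        id        INT,                 -- source primary key, no longer unique here
        name      VARCHAR(100),
        email     VARCHAR(100),
        logged_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- save a full row snapshot on every update
    CREATE TRIGGER customer_snapshot AFTER UPDATE ON customer
    FOR EACH ROW
        INSERT INTO customer_log (id, name, email)
        VALUES (NEW.id, NEW.name, NEW.email);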
There is no single best solution for all cases; everything depends on the concrete application's requirements: how many inserts/updates are performed, how the audit log is used, and so on.

Related

MySQL trigger vs application insert for history

I have a main table in MySQL and need a history table to track changes in it.
I have two approaches:
trigger: create a trigger for the main table that inserts into the history table on any change in the main table
application: insert into the history table while inserting into or updating the main table from the application
I am trying to work out which approach is best for performance.
Assuming your trigger performs exactly the same operation as the separate logging query (e.g. both insert a row to your history table whenever you modify your table), there is no significant performance difference between your two options, as both do the same amount of work.
The decision is usually design driven - or the preference of whoever makes the guidelines you have to follow.
Some advantages of using a trigger for your history log:
You cannot forget to log, e.g. by coding mistakes in your app, and don't have to take care of it in every quick and dirty maintenance script. MySQL does it for you.
You have direct access to all column values in the trigger including their previous values, and specifically the primary key (new.id). This makes logging trivial.
If you e.g. do batch modifications, it might be complicated to write an individual logging query. delete from tablename where xyz? You will probably do an insert into historytable ... select ... where xyz first, and if xyz is a slow condition that ends up not deleting anything, you may just double your execution time this way. So much for performance. update tablename set a = rand() where a > 0.5? Good luck writing a proper separate logging query for this.
Some advantages of not using a trigger to log:
you have control over when and what you log, e.g. if you want to log only specific changes done by end users in your application, but not those by batch scripts or automatic processes, it might be easier (and faster) to just log explicitly what you want to log.
you may want to log additional information not available to the trigger (and that you don't want to store in the main table), e.g. the Windows login or the last button the user pressed to access the function that modified this data.
it might be more convenient to write a general logging function in a programming language, where you can use meta data to e.g. dynamically generate the logging query or compare old and new values in a loop over all columns, than to maintain 3 triggers for every table, where you usually have to list every column explicitly.
since you are especially interested in performance: although it's probably more a theoretical than a practical advantage, if you do a lot of batch modifications, it might be faster to write the log in batches too (e.g. inserting 1000 history rows at once will be faster than inserting 1000 rows individually using a trigger). But you will have to properly design your logging query, and the query itself cannot be slow.
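As a concrete sketch of that batch pattern, using the placeholder names from the delete example above (tablename, historytable, the condition xyz), logging a batch delete from the application side takes two statements that repeat the same condition:

    -- 1. copy the rows that are about to disappear into the history table
    INSERT INTO historytable (id, a, operation, changed_at)
    SELECT id, a, 'DEL', NOW()
    FROM   tablename
    WHERE  xyz;

    -- 2. perform the actual delete with the same condition
    DELETE FROM tablename WHERE xyz;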

MySQL: back up pieces of the database from a server

I'm writing the back-end for a web app in Spring and it uses a MySQL database on an AWS RDS instance to keep track of user data. Right now the SQL tables are separated by user groups (just a value in a column), so different groups have different access to data. Whenever a person using the app does a certain operation, we want to back up their part of the database, which can be viewed later, or replace their data in the current branch if they want.
The only way I can figure out how to do this is to create separate copies of every table for each backup, and keep another table to track what all the table names are. This feels very inelegant and labor intensive.
So far all operations I do on the database are SQL queries from the server, and I would like to stay consistent with that.
Is there a nice way to do what I need?
Why would you want a separate table for each backup? You could have a single table that mirrors the main table but has a few additional fields to record metadata about the change, for example the person making it, a timestamp, and the type of change (update or delete). Whenever a change is made, simply copy the old value over to this table, and you will then have a complete history of the state of the record over time. You can still enforce the group-based access by keeping that column.
As for doing all this with queries: you will need some for viewing or restoring these archived changes, but the simplest way to maintain the archived records is surely to create TRIGGERS on the main tables. If you add BEFORE UPDATE and BEFORE DELETE triggers, these can copy the old version of each record over to the archive (and also add the metadata at the same time) each time a record is updated or deleted.
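A minimal sketch of such triggers, assuming a main table user_data and an archive table user_data_archive that adds change_type, changed_by, and changed_at metadata columns (all names here are illustrative):

    CREATE TRIGGER user_data_bu BEFORE UPDATE ON user_data
    FOR EACH ROW
        INSERT INTO user_data_archive (id, group_id, payload, change_type, changed_by, changed_at)
        VALUES (OLD.id, OLD.group_id, OLD.payload, 'UPDATE', USER(), NOW());

    CREATE TRIGGER user_data_bd BEFORE DELETE ON user_data
    FOR EACH ROW
        INSERT INTO user_data_archive (id, group_id, payload, change_type, changed_by, changed_at)
        VALUES (OLD.id, OLD.group_id, OLD.payload, 'DELETE', USER(), NOW());

Note that USER() records the connected MySQL account; an application-level user id would have to be passed in separately, e.g. via a session variable.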

About staging tables and merge

I'm really new to the BI world, and some concepts are still unclear to me.
I'm reading some articles and books about this; they are full of diagrams and flows that do not say much about the process in practice.
About the staging tables and the extraction process:
I know that the data in the staging tables needs to be deleted after the flow has been executed.
Considering this, imagine a flow with an initial full extraction to the target database. Then, using a merge/CDC step, I need to identify what was updated in the source tables. My doubt is here: how can I know what was updated, since my tables are on the target and the data in staging has been deleted?
Do I need to bring the data of the target tables to the staging area and then do the merge?
Change Data Capture (CDC) is usually done on the source system, either with an explicit change-marker field (a simple boolean or a timestamp) or automatically by the underlying database management system.
If you have a timestamp field in your data, you first do your initial load to staging, record the maximum timestamp retrieved, and then on the next update you only retrieve records where the timestamp is greater than your recorded value. This is the preferred way to do it if there's no real CDC functionality on the source system.
Using a boolean field is trickier, as all inserts and updates on the source system must set it to true, and after your extraction you'll have to reset it to false.
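A sketch of the timestamp pattern, where source_table, last_modified, and the literal timestamp are all assumed names and placeholder values:

    -- after the initial full load, record the high-water mark
    SELECT MAX(last_modified) FROM source_table;

    -- on each subsequent run, extract only rows changed since the recorded value
    SELECT *
    FROM   source_table
    WHERE  last_modified > '2024-01-01 12:00:00';  -- the stored high-water mark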

Can I mirror changes in a database with homebrew code?

I have several MySQL databases and tables that need to be "listened to". I need to know what data changes and send the changes to remote servers that have local mirrors of the database.
How can I mirror changes in the MySQL databases? I was thinking of setting up MySQL triggers that write all changes to another table. This table would have the database name, table name, and all of the columns. I'd then write custom code to transfer the changes and apply them periodically on the remote mirrors. Will this accomplish what I need?
Your plan is 100% correct.
That extra table is called an "audit" or "history" table (there are subtle distinctions, but you shouldn't much care; you now have the "official" terms you can use for further research).
If the main table has columns A, B, C, then the audit table would have the same columns plus three more: A, B, C, Operation, Changed_By, Change_DateTime (names are subject to your tastes and coding standards).
The "Operation" column stores whether the change was an insert, a delete, the old value of an update, or the new value of an update (frequently it's 3 characters wide with the operations coded as "INS"/"DEL"/"U_D"/"U_I", but there are other approaches).
The data in the audit table is populated via a trigger on the main table.
Then make sure there's an index on Change_DateTime column.
And to find a list of changes, you keep track of when you last polled, and then simply do
SELECT * FROM Table_Audit WHERE Change_DateTime > 'LAST_POLL_TIME'
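Putting that together for a main table with columns A, B, C (the concrete column types here are assumptions):

    CREATE TABLE Table_Audit (
        A INT,
        B VARCHAR(100),
        C VARCHAR(100),
        Operation       CHAR(3) NOT NULL,   -- 'INS' / 'DEL' / 'U_D' / 'U_I'
        Changed_By      VARCHAR(64),
        Change_DateTime DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
        INDEX idx_change_datetime (Change_DateTime)
    );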
You can tell MySQL to create an incremental backup from a specific point in time. The data contains only the changes to the database since that time.
You have to turn on binary logging and then you can use the mysqlbinlog command to export the changes since a given timestamp. See the Point-in-Time (Incremental) Recovery section of the manual as well as the documentation for mysqlbinlog. Specifically, you will want the --start-datetime parameter.
Once you have the exported log in text format, you can execute it on another database instance.
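For example (the binlog path, timestamp, host, and user are placeholders; binary logging must already be enabled via the log-bin server option):

    mysqlbinlog --start-datetime="2024-01-01 00:00:00" \
        /var/lib/mysql/binlog.000001 > changes.sql

    mysql -h mirror-host -u someuser -p < changes.sql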
As soon as you step outside the mechanisms of the DBMS to accomplish an inherently DB oriented task like mirroring, you've violated most of the properties of a DB that distinguish it from an ordinary file.
In particular, the mechanism you propose violates the atomicity, consistency, isolation, and durability that MySQL is built to ensure. For example, incomplete playback of the log on the mirrors will leave your mirrors in a state inconsistent with the parent DB. What you propose can only approximate mirroring, so you should prefer DBMS-intrinsic mechanisms unless you don't care whether the mirrors accurately reflect the state of the parent.

How to avoid blowing up the transaction log?

I have a table that stores the results of a complex query. This table is truncated and repopulated once per hour. As you might assume, this is for performance reasons, so the application accesses this table instead of running the query.
Is truncate-and-insert the only cheap way to solve this task, or are there other possibilities with respect to the transaction log?
If I am assuming right, you are using this table as a temp table to store some records and want to remove all records from it every hour, right?
Truncate is always minimally logged. So yes, truncate and then insert will work. Another option is to create a new table with the same structure, drop the old table, and then rename the new table to the old table name.
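A sketch of that swap in MySQL syntax (report is an assumed table name); note that RENAME TABLE can swap both names in a single atomic statement, so readers never see a missing table:

    CREATE TABLE report_new LIKE report;

    INSERT INTO report_new
    SELECT ...;                      -- the complex query from the question goes here

    RENAME TABLE report     TO report_old,
                 report_new TO report;

    DROP TABLE report_old;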
If you want to avoid the above, you can explore the "simple" recovery model (this has implications for point-in-time recovery, so be very careful with it if you have other tables in the same database). Or you can create a new database that holds just this one table and set its recovery model to "simple". The simple recovery model will help you keep your t-log small.
Lastly, if you must have full recovery and also cannot use the "truncate" or "drop" options above, you should at the very least back up your t-log at very regular intervals (depending on how fast it grows and how much space you have).