Add every update, delete, insert query to a new record in MySQL

Is there a way that, if there's a change in records, the query that changed the data (update, delete, insert) can be added to a "history" table transparently?
For example, if MySQL detects a change in a record or set of records, is there a way for MySQL to add that query statement into a separate table so that we can track the changes? That would make "rollback" possible, since replaying every query (other than SELECT) would let you reconstruct the database from its first row. Right?
I use PHP to interact with MySQL.

You need to enable the MySQL binary log (binlog). This automatically logs all the alteration statements to a binary log, which can be replayed as needed.
The alternative is to implement auditing through triggers.
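A minimal sketch of the trigger variant, assuming a hypothetical "accounts" table and a matching "accounts_history" table you create yourself:

CREATE TABLE accounts_history (
  history_id INT AUTO_INCREMENT PRIMARY KEY,
  account_id INT NOT NULL,
  balance    DECIMAL(10,2),
  action     VARCHAR(10) NOT NULL,                 -- 'INSERT', 'UPDATE' or 'DELETE'
  changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

DELIMITER $$
CREATE TRIGGER accounts_audit_update
AFTER UPDATE ON accounts
FOR EACH ROW
BEGIN
  -- OLD holds the row as it looked before the UPDATE
  INSERT INTO accounts_history (account_id, balance, action)
  VALUES (OLD.id, OLD.balance, 'UPDATE');
END$$
DELIMITER ;

You would create similar triggers for INSERT and DELETE; the binlog, by contrast, needs no per-table setup.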

Read about transaction logging in MySQL. This is built in to MySQL.

MySQL has logging functionality that can be used to log all queries. I usually leave this turned off since these logs can grow very rapidly, but it is useful to turn on when debugging.
If you are looking to track changes to records so that you can "roll back" a sequence of queries if some error condition presents itself, then you may want to look into MySQL's native support of transactions.
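A minimal sketch of that rollback behaviour, assuming InnoDB tables (table and column names are made up):

START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- if an error condition shows up, undo both statements:
ROLLBACK;
-- otherwise make them permanent with:
-- COMMIT;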

Related

MySQL trigger vs application insert for history

I have a main table in MySQL and need a history table for tracking the changes in the table.
I have two approaches:
trigger --> create a trigger for the main table which inserts into the history table for any change in the main table
insert into the history table while inserting or updating the main table from the application
I am trying to determine which approach is better for performance.
Assuming your trigger performs exactly the same operation as the separate logging query (e.g. both insert a row to your history table whenever you modify your table), there is no significant performance difference between your two options, as both do the same amount of work.
The decision is usually design driven - or the preference of whoever makes the guidelines you have to follow.
Some advantages of using a trigger for your history log:
You cannot forget to log, e.g. by coding mistakes in your app, and don't have to take care of it in every quick and dirty maintenance script. MySQL does it for you.
You have direct access to all column values in the trigger including their previous values, and specifically the primary key (new.id). This makes logging trivial.
If you e.g. do batch modifications, it might be complicated to write an individual logging query. delete from tablename where xyz? You will probably do an insert into historytable ... select ... where xyz first, and if xyz is a slow condition that ends up not deleting anything, you may just double your execution time this way. So much for performance. update tablename set a = rand() where a > 0.5? Good luck writing a proper separate logging query for this.
Some advantages not using a trigger to log:
You have control over when and what you log, e.g. if you want to log only specific changes done by end users in your application, but not those by batch scripts or automatic processes, it might be easier (and faster) to just log explicitly what you want to log.
You may want to log additional information not available to the trigger (and that you don't want to store in the main table), e.g. the Windows login or the last button the user pressed to access the function that modified this data.
It might be more convenient to write a general logging function in a programming language, where you can use metadata to e.g. dynamically generate the logging query or compare old and new values in a loop over all columns, than to maintain 3 triggers for every table, where you usually have to list every column explicitly.
Since you are especially interested in performance: although it's probably more a theoretical than a practical advantage, if you do a lot of batch modifications, it might be faster to write the log in batches too (e.g. inserting 1000 history rows at once will be faster than inserting 1000 rows individually using a trigger). But you will have to properly design your logging query, and the query itself cannot be slow.
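As a sketch of that last point, an application-side log can copy all affected rows into the history table in one batch statement before the actual modification runs (table and column names are hypothetical):

-- log the rows about to be deleted, in a single batch
INSERT INTO accounts_history (account_id, balance, action)
SELECT id, balance, 'DELETE'
FROM accounts
WHERE last_login < '2015-01-01';

-- then perform the actual deletion with the same condition
DELETE FROM accounts
WHERE last_login < '2015-01-01';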

Explain the idea behind consistent nonlocking reads in MySQL

I've just read a mysql docs where I found such sentence: "A consistent read means that InnoDB uses multi-versioning to present to a query a snapshot of the database at a point in time."
I've read a lot of MySQL doc pages, but still can't clarify to myself what exactly "to a query" means here. Definitely it relates to a SELECT statement, but what about if my transaction starts with an UPDATE, INSERT, or DELETE statement?
Thanks!
I found the answer on my own, and I think it may be appropriate for others too. After days of searching within the Oracle docs, I finally found it:
InnoDB creates a consistent read view or a consistent snapshot either when the statement
mysql> START TRANSACTION WITH CONSISTENT SNAPSHOT;
is executed or when the first select query is executed in the transaction.
https://blogs.oracle.com/mysqlinnodb/entry/repeatable_read_isolation_level_in
When the query can change the data, the database also uses locks to synchronise queries.
So between queries that change data, locks are used to make sure that only one query at a time can change specific items. Between a query that reads data and a query that changes data, multi-versioning is used to present the data before the change to the query that reads it.
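A small two-session illustration of that snapshot behaviour, assuming the default REPEATABLE READ isolation level and a hypothetical "accounts" table:

-- session A
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SELECT balance FROM accounts WHERE id = 1;   -- returns, say, 100

-- session B, in a separate connection
UPDATE accounts SET balance = 200 WHERE id = 1;
COMMIT;

-- back in session A: the consistent read still shows the snapshot
SELECT balance FROM accounts WHERE id = 1;   -- still returns 100
COMMIT;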

How to subscribe to updates, deletes and inserts on a MySQL table?

I would like to get a notification when updates, inserts or deletes happen in certain MySQL (or MariaDB) InnoDB tables.
I need to track these changes from another process as soon as possible.
I was thinking maybe I could subscribe to the MySQL binary log?
Can somebody explain how this can be done?
Is there for example a log read API that mysql offers?
Does the game change when I use a Galera cluster?
You may use mysqlbinlog with the --stop-never option to get all insert, update, and delete statements (mysqlbinlog documentation).
You may use the C++ library MySQL Replication Listener, which is based on the binlog API.
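Before pointing mysqlbinlog or a replication listener at the server, it's worth checking that binary logging is actually enabled and, if you want the changed rows rather than the statements, that row-based logging is in use:

SHOW VARIABLES LIKE 'log_bin';        -- must be ON
SHOW VARIABLES LIKE 'binlog_format';  -- ROW logs the actual row changes
SHOW BINARY LOGS;                     -- lists the binlog files available to read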
I don't know if this will help you, but I like to use a separate table to track the changes. If I have a table called "site_visitors", I'll create another table called "site_visitors_log" that is immediately written to with the information I need (IP addresses, timestamp, etc.) right after data is inserted into "site_visitors". Very convenient.
TRIGGER is your friend here. From the MySQL docs:
A trigger is defined to activate when a statement inserts,
updates, or deletes rows in the associated table
See the MySQL docs here; there are some examples, too.
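A minimal sketch of that idea for a hypothetical "orders" table: the trigger writes a row into a small queue table, and the other process polls that table:

CREATE TABLE orders_changes (
  id         INT AUTO_INCREMENT PRIMARY KEY,
  order_id   INT NOT NULL,
  action     VARCHAR(10) NOT NULL,
  changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

DELIMITER $$
CREATE TRIGGER orders_notify_insert
AFTER INSERT ON orders
FOR EACH ROW
BEGIN
  INSERT INTO orders_changes (order_id, action) VALUES (NEW.id, 'INSERT');
END$$
DELIMITER ;

-- the watching process then polls, remembering the last id it has seen (42 here):
SELECT * FROM orders_changes WHERE id > 42 ORDER BY id;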

MySQL history or transaction log table updated by triggers

My question is about a database history (transaction log) table which is currently updated by a MySQL procedure. The procedure is called by a MySQL trigger every time we need to keep a history of a given table during insert, update or delete actions. Since we have lots of tables, we need to create a separate set of triggers for each of them, e.g. for the "accounts" table we need to create "accounts_insert", "accounts_update" and "accounts_delete" triggers.
The problem is that every time we alter the "accounts" table we have to modify the appropriate triggers as well.
Is there any way to avoid that manual work? Would it be better to implement it in application layer/code?
There are no 'global' triggers if that's what you're thinking about.
Application side logging is one possible solution. You'll want to do this within transactions whenever possible.
Other possible approaches:
Create a script that will update your triggers for you. This can be fairly easy if your triggers are generally similar to each other. Using the information_schema database can be helpful here (see the sketch after this list).
Parse the general query log (careful, enabling this log can have a large negative impact on the server's performance)
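For instance, a generator script can pull every column of a table from information_schema and build the trigger body from the result (a sketch of the lookup only, not a complete generator; schema and table names are placeholders):

SELECT COLUMN_NAME, DATA_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME   = 'accounts'
ORDER BY ORDINAL_POSITION;
-- the script then concatenates these names into something like
-- INSERT INTO accounts_history (col1, col2, ...) VALUES (OLD.col1, OLD.col2, ...);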

What is the best way to update (or replace) an entire database table on a live machine?

I'm being given a data source weekly that I'm going to parse and put into a database. The data will not change much from week to week, but I should be updating the database on a regular basis. Besides this weekly update, the data is static.
For now rebuilding the entire database isn't a problem, but eventually this database will be live and people could be querying the database while I'm rebuilding it. The amount of data isn't small (couple hundred megabytes), so it won't load that instantaneously, and personally I want a bit more of a foolproof system than "I hope no one queries while the database is in disarray."
I've thought of a few different ways of solving this problem, and was wondering what the best method would be. Here's my ideas so far:
Instead of replacing entire tables, query for the difference between my current database and what I want to place in the database. This seems like it could be an unnecessary amount of work, though.
Creating dummy data tables, then doing a table rename (or having the server code point towards the new data tables).
Just telling users that the site is going through maintenance and put the system offline for a few minutes. (This is not preferable for obvious reasons, but if it's far and away the best answer I'm willing to accept that.)
Thoughts?
I can't speak for MySQL, but PostgreSQL has transactional DDL. This is a wonderful feature, and means that your second option, loading new data into a dummy table and then executing a table rename, should work great. If you want to replace the table foo with foo_new, you only have to load the new data into foo_new and run a script to do the rename. This script should execute in its own transaction, so if something about the rename goes bad, both foo and foo_new will be left untouched when it rolls back.
The main problem with that approach is that it can get a little messy to handle foreign keys from other tables that key on foo. But at least you're guaranteed that your data will remain consistent.
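In MySQL, the equivalent swap can be done with RENAME TABLE, which renames several tables in a single atomic operation (table names are hypothetical):

-- load the new data into foo_new first, then swap:
RENAME TABLE foo TO foo_old, foo_new TO foo;
-- once you're satisfied with the result, drop the old copy:
DROP TABLE foo_old;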
A better approach in the long term, I think, is just to perform the updates on the data directly (your first option). Once again, you can stick all the updating in a single transaction, so you're guaranteed all-or-nothing semantics. Even better would be online updates, just updating the data directly as new information becomes available. This may not be an option for you if you need the results of someone else's batch job, but if you can do it, it's the best option.
BEGIN;
DELETE FROM my_table;                              -- my_table is a placeholder name
INSERT INTO my_table SELECT * FROM my_table_new;   -- load the replacement rows
COMMIT;
Users will see the changeover instantly when you hit commit. Any queries started before the commit will run on the old data, anything afterwards will run on the new data. The database will actually clear out the old rows once the last user is done with them. Because everything is "static" (you're the only one who ever changes it, and only once a week), you don't have to worry about any lock issues or timeouts. For MySQL, this depends on InnoDB. PostgreSQL does it too, and SQL Server calls it "snapshot" isolation; I can't remember the details off the top of my head since I rarely use it.
If you Google "transaction isolation" + the name of whatever database you're using, you'll find appropriate information.
We solved this problem by using PostgreSQL's table inheritance/constraints mechanism.
You create a trigger that auto-creates sub-tables partitioned based on a date field.
This article was the source I used.
Which database server are you using? SQL Server 2005 and above provides an isolation level called "snapshot". It allows you to open a transaction, do all of your updates, and then commit, all while users of the database continue to view the pre-transaction data. Normally, your transaction would lock your tables and block their queries, but snapshot isolation would be perfect in your case.
More info here: http://blogs.msdn.com/craigfr/archive/2007/05/16/serializable-vs-snapshot-isolation-level.aspx
But it requires SQL Server, so if you're using something else....
Several database systems (since you didn't specify yours, I'll keep this general) offer the SQL:2003 standard statement called MERGE, which will basically allow you to:
insert new rows into a target table from a source which don't exist there yet
update existing rows in the target table based on new values from the source
optionally even delete rows from the target that don't show up in the import table anymore
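A sketch of what that can look like (table and column names are made up, and the exact syntax varies a little between vendors):

MERGE INTO products AS target
USING weekly_import AS source
  ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET target.name = source.name, target.price = source.price
WHEN NOT MATCHED THEN
  INSERT (id, name, price) VALUES (source.id, source.name, source.price)
WHEN NOT MATCHED BY SOURCE THEN
  DELETE;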
SQL Server 2008 is the first Microsoft offering to have this statement - check out more here, here or here.
Other database systems probably have similar implementations - it's a SQL:2003 standard statement, after all.
Marc
Use different table names (mytable_[yyyy]_[wk]) and a view to provide a constant name (mytable). Once a new table is completely imported, update your view so that it uses that table.
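A minimal sketch with hypothetical names:

-- the week's data has been fully loaded into mytable_2013_02, so repoint the view:
CREATE OR REPLACE VIEW mytable AS
  SELECT * FROM mytable_2013_02;
-- readers keep querying "mytable" and switch to the new data the moment the view changes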