Can I mirror changes in a database with homebrew code? - mysql

I have several mysql databases and tables that need to be "listened to". I need to know what data changes and send the changes to remote servers that have local mirrors of the database.
How can I mirror changes in the MySQL databases? I was thinking of setting up MySQL triggers that write all changes to another table. This table would have the database name, table name, and all of the columns. I'd then write custom code to transfer the changes and apply them periodically on the remote mirrors. Will this accomplish what I need?

Your plan is 100% correct.
That extra table is called an "audit" or "history" table (there are subtle distinctions between the two, but you needn't care much; you now have the "official" terms you can use for further research).
If the main table has columns A, B, C, then the audit table would have those same columns plus three more: Operation, Changed_By, Change_DateTime (names are subject to your tastes and coding standards).
The "Operation" column stores whether the change was an insert, a delete, the old value of an update, or the new value of an update (frequently it's 3 characters wide, with the operations encoded as "INS"/"DEL"/"U_D"/"U_I", but there are other approaches).
The data in the audit table is populated via a trigger on the main table.
Then make sure there's an index on the Change_DateTime column.
And to find a list of changes, you keep track of when you last polled, and then simply do
SELECT * FROM Table_Audit WHERE Change_DateTime > 'LAST_POLL_TIME'
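A minimal sketch of the whole pattern in MySQL (main_table and its data columns are invented for illustration; the bookkeeping columns follow the names above):

CREATE TABLE main_table (
    id INT PRIMARY KEY,
    A  VARCHAR(50),
    B  VARCHAR(50)
);

-- Same columns as the main table, plus the three bookkeeping fields.
CREATE TABLE main_table_audit (
    id              INT,
    A               VARCHAR(50),
    B               VARCHAR(50),
    Operation       CHAR(3),
    Changed_By      VARCHAR(128),
    Change_DateTime DATETIME,
    INDEX idx_change_datetime (Change_DateTime)
);

-- One trigger per operation; the update trigger writes both an old-value and a new-value row.
CREATE TRIGGER main_table_ai AFTER INSERT ON main_table FOR EACH ROW
    INSERT INTO main_table_audit VALUES (NEW.id, NEW.A, NEW.B, 'INS', CURRENT_USER(), NOW());

CREATE TRIGGER main_table_ad AFTER DELETE ON main_table FOR EACH ROW
    INSERT INTO main_table_audit VALUES (OLD.id, OLD.A, OLD.B, 'DEL', CURRENT_USER(), NOW());

DELIMITER //
CREATE TRIGGER main_table_au AFTER UPDATE ON main_table FOR EACH ROW
BEGIN
    INSERT INTO main_table_audit VALUES (OLD.id, OLD.A, OLD.B, 'U_D', CURRENT_USER(), NOW());
    INSERT INTO main_table_audit VALUES (NEW.id, NEW.A, NEW.B, 'U_I', CURRENT_USER(), NOW());
END//
DELIMITER ;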

You can tell MySQL to create an incremental backup from a specific point in time. The data contains only the changes to the database since that time.
You have to turn on binary logging and then you can use the mysqlbinlog command to export the changes since a given timestamp. See the Point-in-Time (Incremental) Recovery section of the manual as well as the documentation for mysqlbinlog. Specifically, you will want the --start-datetime parameter.
Once you have the exported log in text format, you can execute it on another database instance.
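As a rough sketch, you'd first confirm on the source server that binary logging is enabled:

SHOW VARIABLES LIKE 'log_bin';  -- must be ON before the changes you want to capture occur
SHOW BINARY LOGS;               -- lists the log files that mysqlbinlog can read

Then, from a shell, something like mysqlbinlog --start-datetime="2023-02-01 00:00:00" binlog.000042 > incremental.sql (the timestamp and file name here are placeholders) extracts the changes as SQL text, and feeding incremental.sql to the mysql client on the mirror replays them.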

As soon as you step outside the mechanisms of the DBMS to accomplish an inherently DB-oriented task like mirroring, you've violated most of the properties of a DB that distinguish it from an ordinary file.
In particular, the mechanism you propose violates the atomicity, consistency, isolation, and durability that MySQL is built to ensure. For example, incomplete playback of the log on the mirrors will leave them in a state inconsistent with the parent DB. What you propose can only approximate mirroring, so you should prefer the DBMS's intrinsic mechanisms unless you don't care whether the mirrors accurately reflect the state of the parent.

Related

Logging of data change in mysql tables using ado.net

Is there any workaround to get the latest changes in a MySQL database using ADO.NET?
i.e., which table changed, which column, the performed operation, and the old and new values, for both single-table and multi-table changes. I want to log the changes in my own new table.
There are several ways change tracking can be implemented for MySQL:
triggers: you can add a DB trigger for insert/update/delete that creates an entry in the audit log.
add application logic to track changes. The implementation highly depends on your data layer; if you use the ADO.NET DataAdapter, the RowUpdating event is suitable for this purpose.
Also, you have the following alternatives for storing the audit log in a MySQL database:
use one table for the audit log, with columns like: id, table, operation, new_value (string), old_value (string). This approach has several drawbacks: the table grows very fast (as it holds the history for changes in all tables), it keeps values as strings, it stores excessive data duplicated between old-new pairs, and changeset calculation takes resources on every insert/update.
use a 'mirror' table (say, with a '_log' suffix) for each table with change tracking enabled. On insert/update you execute an additional insert into the mirror table; as a result you have record 'snapshots' for every save, and from these snapshots it is possible to calculate what changed and when (see the sketch after this list). The performance overhead on insert/update is minimal, and you don't need to determine which values actually changed, but the mirror table holds a lot of redundant data, since a full row copy is saved even if only one column changed.
a hybrid solution where record 'snapshots' are saved temporarily and then processed in the background to store the differences in an optimal way without affecting app performance.
There is no one best solution for all cases; everything depends on the concrete application requirements: how many inserts/updates are performed, how the audit log is used, etc.
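A minimal sketch of the mirror-table option in MySQL (all table and column names here are hypothetical):

CREATE TABLE orders (
    id     INT PRIMARY KEY,
    status VARCHAR(20),
    total  DECIMAL(10,2)
);

-- Mirror table: a full row copy plus when it was logged.
CREATE TABLE orders_log (
    log_id    BIGINT AUTO_INCREMENT PRIMARY KEY,
    id        INT,
    status    VARCHAR(20),
    total     DECIMAL(10,2),
    logged_at DATETIME NOT NULL
);

CREATE TRIGGER orders_log_ai AFTER INSERT ON orders FOR EACH ROW
    INSERT INTO orders_log (id, status, total, logged_at)
    VALUES (NEW.id, NEW.status, NEW.total, NOW());

CREATE TRIGGER orders_log_au AFTER UPDATE ON orders FOR EACH ROW
    INSERT INTO orders_log (id, status, total, logged_at)
    VALUES (NEW.id, NEW.status, NEW.total, NOW());

Comparing consecutive snapshots with the same id then tells you what changed and when.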

Is it possible to determine MySQL replication "position" with a normal query?

I have a MySQL (RDS) database that is replicated from one datacenter to another. There is also a message bus which spans these two locations, and it carries messages when certain writes to the database take place.
The messages and the MySQL replication race between the two locations. We need to make sure we don't process the message before the write that it refers to has definitely made it into the replica.
At the moment we use a custom "last updated at" field on the tables that are replicated. It seems like there should be a global variable we could use instead, though: something that monotonically increases whenever there's a write anywhere in the database, and is available on both the master and the slave.
Does such a variable exist? Do I need special privileges to read it?
If there is not such a thing, what would be the tradeoffs associated with implementing it ourselves?
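For what it's worth, MySQL's replication coordinates behave much like this (a sketch; the log file name and position below are placeholders, and reading the status generally requires the REPLICATION CLIENT privilege):

-- On the master, immediately after the write, capture the binlog coordinates:
SHOW MASTER STATUS;

-- On the replica, block until everything up to those coordinates has been applied:
SELECT MASTER_POS_WAIT('mysql-bin.000042', 120);

Carrying the coordinates inside the message would let the consumer wait until the write it refers to has definitely reached the replica.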

SQL JOB to Update all tables

I am using Microsoft SQL Server 2008 R2. I have copied database A (my production database) to database B (my reporting database) by creating an SSIS package. Both databases are on the same server. I want to run a job so that if any change (data modifications like inserting new rows or updating values of any row in any table) takes place in database A, it will also take place in database B, with the SQL job running and applying the changes automatically. I don't want tables in database B to be dropped and recreated (as that's not our business rule); instead, only the changes should be applied.
Can anyone help me please? Thanks in advance.
I would suggest that you investigate using replication. Specifically, transactional replication if you need constant updates. Here's a bit from MSDN:
Transactional replication typically starts with a snapshot of the publication database objects and data. As soon as the initial snapshot is taken, subsequent data changes and schema modifications made at the Publisher are usually delivered to the Subscriber as they occur (in near real time). The data changes are applied to the Subscriber in the same order and within the same transaction boundaries as they occurred at the Publisher; therefore, within a publication, transactional consistency is guaranteed.
If you don't need constant updating (that comes at a price in performance, of course), you can consider the alternatives of merge replication or snapshot replication. Here's a page to start examining those alternatives.

How to update DB structure when updating production system without doing a teardown / rebuild

If I'm working on a development server and have updates to the database structure for some of our releases, what is the best way to update the structure on the production server?
Currently we create a new production database containing the structure only, do a SQL dump of the data on the 'old' production database, then run a SQL query to insert the data into the new database.
There must be an easier way to do these updates, right?
Thanks in advance.
We don't run anything on prod without a script, and that script must be in source control. Additionally, we have to write a rollback script in case the initial script goes bad and we have to back it out. Before we move to prod, configuration management does a differential compare between prod and dev to see if we have missed anything in the production script (any differences have to be documented, or traceable to development work that is not yet ready to move to prod). A product like Red Gate's SQL Compare can do this. Our process is very formalized so that we can maintain a certification required by our larger clients.
If you have large tables, even ALTER TABLE can be slow, but it's still generally more efficient in total time than making a copy of the table with a new name and structure, copying the data to that table, renaming the old table, renaming the new table to the original name, and then deleting the old table.
However, there are times when that is the preferable process, because the total downtime apparent to the user is only the time it takes to rename two tables. This makes it a good fit for tables whose data is filled only from the backend, not the application (if the application can update the tables, it is a dangerous practice, as you may lose changes made while the tables were in transition). Which process to use depends largely on the nature of the change you are making. Some changes should be done in a maintenance window when users are not allowed to access the database: for instance, if you are adding a new field with a default value to a table with 100,000,000 records, you are liable to lock users out of the table while the update happens, so it is better to do this in single-user mode during off hours (and when users have been told in advance the database will not be available). Other changes take only milliseconds and can happen easily while users are logged in.
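A sketch of that copy-and-rename approach (MySQL syntax, where the multi-table RENAME is atomic; the table names and example column are made up):

CREATE TABLE orders_new LIKE orders;
ALTER TABLE orders_new ADD COLUMN priority INT DEFAULT 0;  -- the structural change
INSERT INTO orders_new SELECT o.*, 0 FROM orders o;        -- copy the data across

-- The only downtime users see is this atomic double rename:
RENAME TABLE orders TO orders_old, orders_new TO orders;
DROP TABLE orders_old;  -- once you're confident nothing needs rolling back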
Look at ALTER TABLE to change the schema.
It might not be easier than your method, but it means less copying of the database.
This is actually quite a deep question. If the only changes you've made are to add some columns, then ALTER TABLE is probably sufficient. But if you're renaming or deleting columns, then ALTER statements may break various foreign key constraints. In addition, sometimes you need to make changes both to the schema and to the data, which can be hard to script generically.
Most likely the best way to automate this would be to write a simple script for each deployment (along with a script to roll back!), which is basically what systems like Rails will do for you, I believe. Some scripts might simply be ALTER statements, some might temporarily disable foreign-key checking and triggers, some might run update statements as well, and some might dump the db and rebuild it. I don't think there's a one-size-fits-all solution here, sorry :)
Use the ALTER TABLE command: http://dev.mysql.com/doc/refman/5.0/en/alter-table.html
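For example (the table and columns are invented for illustration):

ALTER TABLE customers
    ADD COLUMN email VARCHAR(255) NULL,
    DROP COLUMN fax_number;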

What is the best way to update (or replace) an entire database table on a live machine?

I'm being given a data source weekly that I'm going to parse and put into a database. The data will not change much from week to week, but I should be updating the database on a regular basis. Besides this weekly update, the data is static.
For now rebuilding the entire database isn't a problem, but eventually this database will be live and people could be querying the database while I'm rebuilding it. The amount of data isn't small (couple hundred megabytes), so it won't load that instantaneously, and personally I want a bit more of a foolproof system than "I hope no one queries while the database is in disarray."
I've thought of a few different ways of solving this problem and was wondering what the best method would be. Here are my ideas so far:
Instead of replacing entire tables, query for the difference between my current database and what I want to place in the database. This seems like it could be an unnecessary amount of work, though.
Creating dummy data tables, then doing a table rename (or having the server code point towards the new data tables).
Just telling users that the site is going through maintenance and put the system offline for a few minutes. (This is not preferable for obvious reasons, but if it's far and away the best answer I'm willing to accept that.)
Thoughts?
I can't speak for MySQL, but PostgreSQL has transactional DDL. This is a wonderful feature, and means that your second option, loading new data into a dummy table and then executing a table rename, should work great. If you want to replace the table foo with foo_new, you only have to load the new data into foo_new and run a script to do the rename. This script should execute in its own transaction, so if something about the rename goes bad, both foo and foo_new will be left untouched when it rolls back.
The main problem with that approach is that it can get a little messy to handle foreign keys from other tables that key on foo. But at least you're guaranteed that your data will remain consistent.
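A sketch of that rename script for PostgreSQL, whose DDL is transactional (using the foo/foo_new names from above):

BEGIN;
ALTER TABLE foo RENAME TO foo_old;
ALTER TABLE foo_new RENAME TO foo;
COMMIT;
-- DROP TABLE foo_old; once you're satisfied the swap was clean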
A better approach in the long term, I think, is just to perform the updates on the data directly (your first option). Once again, you can stick all the updating in a single transaction, so you're guaranteed all-or-nothing semantics. Even better would be online updates, just updating the data directly as new information becomes available. This may not be an option for you if you need the results of someone else's batch job, but if you can do it, it's the best option.
BEGIN;
-- Table names are placeholders; swap the contents inside one transaction:
DELETE FROM my_table;
INSERT INTO my_table SELECT * FROM my_table_staging;
COMMIT;
Users will see the changeover instantly when you hit commit. Any queries started before the commit will run on the old data; anything afterwards will run on the new data. The database will actually clear out the old row versions once the last user is done with them. Because everything is "static" (you're the only one who ever changes it, and only once a week), you don't have to worry about lock issues or timeouts. For MySQL, this depends on InnoDB. PostgreSQL does it natively, and SQL Server calls it "snapshot isolation"; I can't remember the details off the top of my head since I rarely use the thing.
If you Google "transaction isolation" + the name of whatever database you're using, you'll find appropriate information.
We solved this problem by using PostgreSQL's table inheritance/constraints mechanism.
You create a trigger that auto-creates sub-tables partitioned based on a date field.
This article was the source I used.
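A simplified sketch of the pattern (PostgreSQL; the table, column, and partition names are invented, and unlike the article's trigger this one routes rows to a pre-created child table rather than auto-creating it):

CREATE TABLE measurements (
    logdate date NOT NULL,
    value   numeric
);

-- One child per month; the CHECK constraint lets the planner skip irrelevant children.
CREATE TABLE measurements_2023_01 (
    CHECK (logdate >= DATE '2023-01-01' AND logdate < DATE '2023-02-01')
) INHERITS (measurements);

CREATE OR REPLACE FUNCTION measurements_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.logdate >= DATE '2023-01-01' AND NEW.logdate < DATE '2023-02-01' THEN
        INSERT INTO measurements_2023_01 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.logdate;
    END IF;
    RETURN NULL;  -- the row has already been stored in the child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_insert_trg
    BEFORE INSERT ON measurements
    FOR EACH ROW EXECUTE PROCEDURE measurements_insert();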
Which database server are you using? SQL Server 2005 and above provide an isolation level called "snapshot". It allows you to open a transaction, do all of your updates, and then commit, all while users of the database continue to view the pre-transaction data. Normally, your transaction would lock your tables and block their queries, but snapshot isolation would be perfect in your case.
More info here: http://blogs.msdn.com/craigfr/archive/2007/05/16/serializable-vs-snapshot-isolation-level.aspx
But it requires SQL Server, so if you're using something else....
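For the record, turning it on looks roughly like this (the database name is a placeholder):

ALTER DATABASE my_db SET ALLOW_SNAPSHOT_ISOLATION ON;
-- Then, in the updating session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;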
Several database systems (since you didn't specify yours, I'll keep this general) offer the SQL:2003 standard statement called MERGE, which basically allows you to:
insert new rows from a source into a target table if they don't exist there yet
update existing rows in the target table based on new values from the source
optionally, even delete rows from the target that no longer show up in the import table
SQL Server 2008 is the first Microsoft offering to have this statement - check out more here, here or here.
Other database systems will probably have similar implementations - it's a SQL:2003 standard statement, after all.
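A sketch of what such a MERGE can look like (SQL Server 2008 syntax; the tables and columns are invented):

MERGE INTO customers AS target
USING customers_import AS source
    ON target.customer_id = source.customer_id
WHEN MATCHED THEN
    UPDATE SET target.name = source.name, target.email = source.email
WHEN NOT MATCHED BY TARGET THEN
    INSERT (customer_id, name, email)
    VALUES (source.customer_id, source.name, source.email)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;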
Marc
Use different table names (mytable_[yyyy]_[wk]) and a view to provide a constant name (mytable). Once a new table is completely imported, update your view so that it uses that table.
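A sketch in MySQL (the dated names below are placeholders following that convention):

-- Load this week's import into a fresh, dated table:
CREATE TABLE mytable_2023_05 LIKE mytable_2023_04;
-- ... bulk-load mytable_2023_05 here ...

-- Repoint the stable name once the import has finished:
CREATE OR REPLACE VIEW mytable AS SELECT * FROM mytable_2023_05;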