I'm researching the best way to log queries in a MySQL database. The log should be used for two things:
Documentation of system activity
Used to recreate the database (in case the database is hacked or otherwise corrupted)
It's possible to log all queries in a MySQL database (like this example)
Question: Is it possible to recreate a database from the log file, or should I use a different approach?
You can use replication logs for this - they store the complete set of operations. You should be able to create a new database from the original sources and apply all changes to it.
You can do complete dumps (e.g. once a week) and archive the replication logs on a daily basis.
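In practice that looks roughly like the sketch below; it assumes binary logging is enabled, and the file names and paths are placeholders:

    # Weekly full dump; --flush-logs rotates to a fresh binary log so the dump
    # marks a clean starting point for later replay.
    mysqldump --all-databases --single-transaction --flush-logs --master-data=2 > weekly_full.sql

    # Archive the binary logs daily (paths are examples).
    cp /var/lib/mysql/binlog.0* /backup/binlogs/

    # To recreate the database: restore the dump, then replay the archived logs.
    mysql < weekly_full.sql
    mysqlbinlog /backup/binlogs/binlog.000101 /backup/binlogs/binlog.000102 | mysql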
Related
I want to export updated data from MySQL/PostgreSQL to MongoDB every time a specified table changes or, if that's impossible, dump the whole table to NoSQL every X seconds/minutes. What can I do to achieve this? I've googled and found only paid, enterprise-level solutions, and those are out of reach for my amateur project.
SymmetricDS provides an open-source database replication option that supports replicating an RDBMS database (MySQL, Postgres) into MongoDB.
Here is the specific documentation for setting up the Mongo target node in SymmetricDS.
http://www.symmetricds.org/doc/3.11/html/user-guide.html#_mongodb
There is also a blog post about setting up Mongo in a bit more detail.
https://www.jumpmind.com/blog/mongodb-synchronization
To get online replication into a target database you can use:
Write the data stream to both databases at the same time
Use an enterprise solution that reads the transaction log and pushes the data to the target database
Periodically check for rows with change dates > X
Export the table periodically
Write changed records to a separate change table with a trigger, and poll that table to select the changes (see the sketch at the end of this answer)
Push the changed data with triggers into a data-stream service that feeds the target database
Many other approaches
Which solution fits your needs depends on how much time you want to invest and how much lag the data is allowed to have.
If the amount of data grows or the number of transactions increases, some solutions that work for an amateur project no longer fit.
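As an illustration of the trigger-and-poll option above, here is a minimal sketch; the database, table, and column names are made up for the example, and only the UPDATE case is shown:

    # Change table plus trigger on the source database ("appdb", "orders" and
    # "order_id" are placeholder names).
    mysql appdb -e "
    CREATE TABLE orders_changes (
      id BIGINT AUTO_INCREMENT PRIMARY KEY,
      order_id INT NOT NULL,
      changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TRIGGER orders_after_update AFTER UPDATE ON orders
      FOR EACH ROW INSERT INTO orders_changes (order_id) VALUES (NEW.id);"

    # The exporter polls the change table, remembering the last id it has
    # already processed, and pushes the affected rows to MongoDB.
    mysql appdb -e "SELECT order_id FROM orders_changes WHERE id > 42;"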
A current project I am working on has been using MySQL exclusively as our RDBMS. We are currently looking to segment the database into two different databases. One will be moving to Redshift (which is based on a modified PostgreSQL) while the other will continue using MySQL.
My concern does not stem from splitting the data, but rather from how applications will interact with the segmented data. Effectively, our current application will be reading static data from Redshift and writing to the MySQL database, and I am curious whether it is bad practice to intermingle these query languages.
Would it be better to migrate the MySQL DB to Postgres to limit complications arising from their differences?
We (Looker) work with many customers (100s) that have both MySQL and Redshift. The progression as their needs grow is usually:
MySQL
MySQL + MySQL slave
MySQL + MySQL Writable Slave
MySQL + MySQL Writable Slave + Redshift
So your best bet, if you haven't done so already, is to set up a MySQL replica (slave) database. The replica follows your master write database and is essentially an exact copy of the master.
You can also make your replica writable. This becomes really useful for building summary tables. Here are some instructions on how to make a writable replica in RDS, but you can do it in other systems too.
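For a self-managed MySQL server (outside RDS) the setup is roughly the sketch below; the host, user, password, and log coordinates are placeholders:

    # Point the replica at the master and start replication.
    mysql -e "
    CHANGE MASTER TO
      MASTER_HOST='master.example.com',
      MASTER_USER='repl',
      MASTER_PASSWORD='***',
      MASTER_LOG_FILE='binlog.000001',
      MASTER_LOG_POS=4;
    START SLAVE;"

    # Making the replica writable (e.g. for summary tables) is then just:
    mysql -e "SET GLOBAL read_only = 0;"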
http://www.looker.com/docs/setup-and-management/database-config/mysql-rds
If you have big event data that you want to integrate with your transactional data, the next step is to set up a process that migrates all your MySQL data into Redshift and pumps in data from other sources (like your event data, for example). Moving all the data gives you the ability to answer any question from Redshift.
Redshift will lag hours or more behind the MySQL database. If you need to answer real time questions, query MySQL. If you want general insights, query the Redshift database.
Assume a number of conventional LAMP-style applications which use MySQL as a back-end to record the 'current durable state' for the applications.
I am interested in establishing an 'audit' of transitions at the database level - and storing them as a log. The idea is that - assuming the MySQL database has been 'dumped' at the beginning of the day, it would be possible to 'replay' transactions against the back-up to recover any state during the working day.... A bit like time-machine for MySQL - I guess.
I have found some documentation about "Audit plugins" which looks relevant but leaves me with more questions than answers.
http://dev.mysql.com/doc/refman/5.6/en/writing-audit-plugins.html
Essentially, I'd like to establish if it would be feasible to write a MySQL plugin to achieve my goal - such that it would work 'seamlessly' with existing MySQL applications?
The principal detail I'm finding difficult to ascertain is this: when the audit plugin is notified of an event, what is the mechanism by which the new data can be obtained in order to log it? How are data types encoded? How hard would it be to write a tool to 'replay' this audit against a 'full-system-backup' made with mysqldump, for example?
Are there any existing examples of such plugins?
You just want MySQL's Point-in-Time (Incremental) Recovery Using the Binary Log:
Point-in-time recovery refers to recovery of data changes made since a given point in time. Typically, this type of recovery is performed after restoring a full backup that brings the server to its state as of the time the backup was made. (The full backup can be made in several ways, such as those listed in Section 7.2, “Database Backup Methods”.) Point-in-time recovery then brings the server up to date incrementally from the time of the full backup to a more recent time
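In practice that looks roughly like this sketch; the file names and timestamps are examples:

    # Restore the full backup taken at the start of the day, then replay the
    # binary log from that point up to just before the corruption.
    mysql < morning_backup.sql
    mysqlbinlog --start-datetime="2014-06-01 06:00:00" \
                --stop-datetime="2014-06-01 14:30:00" \
                binlog.000042 | mysql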
I'd like to populate the MySQL timezone tables with the database provided by MySQL. I am using a cloud DB and can't overwrite DB tables and restart the server.
Can someone help me understand how to load these files manually?
Rationale
I loaded the tz tables from the OS, but the OS has a ton of timezone names. I'd like a more concise set of names that I can query for forms. I think the set provided by MySQL might be a better fit. No other apps are running on the database, thus timezone conflicts aren't an issue.
The database provided by MySQL comes as a bunch of MyISAM container files; I don't think you're going to be able to safely drop them into the MySQL data directory without bouncing your mysqld.
Do you own this mysqld, or are you one of many tenants in a vendor-owned system?
If you own it, you can load a subset of the /usr/share/zoneinfo time zones. A useful subset might be /usr/share/zoneinfo/posix.
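If you do own it, loading that subset is roughly a one-liner, assuming you have a user that can write to the mysql.time_zone* tables:

    # Load only the posix subset of the OS zoneinfo into the time zone tables.
    mysql_tzinfo_to_sql /usr/share/zoneinfo/posix | mysql -u root -p mysql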
If you're using the mysql.time_zone_name.Name to populate a pick list (a good use for it) you could select an appropriate subset of the admittedly enormous list of names,
or create some aliases right in that table.
I ended up loading the tables into a SQL server on my local machine, then exporting INSERT statements and manually loading those onto the server I don't have direct control of. Not a glamorous solution, but it appears to be the only reasonable way to go about it.
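For reference, one way to produce those INSERT statements from the local server is sketched below; the exact table list may vary by MySQL version:

    # Dump only the row data of the populated time zone tables as INSERTs.
    mysqldump --no-create-info mysql \
        time_zone time_zone_name time_zone_transition \
        time_zone_transition_type time_zone_leap_second > tz_inserts.sql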
I am researching the possibility to log all the changes made to a MySQL database including DDL statements that may occur and use that information so it can be synchronized with a remote database.
The application itself is written in C#, so the best synchronization technology I have seen available so far is the Microsoft Sync Framework. This framework proposes a solution to track changes made to the DB by adding triggers and additional tables to store the deleted rows.
This does not seem like a great idea in my case since it involves changing the schema of a standard DB used by more than 4 products. This method also effectively doubles the number of tables (by adding a new table for the deleted rows of each table), which also does not feel too good.
On the other hand, MySQL has this great thing, the binlog, which tracks all the changes and can also use the so-called mixed mode to track statements in most cases (so they can be executed again on the remote DB to replicate data) and the raw data when a non-deterministic function is called (like NOW()), so the updated data is the same in both places.
Also, there seem to be two standard ways to retrieve this data:
1) The mysqlbinlog utility
2) Calling 'SHOW BINLOG EVENTS'
Option 2 seems better to me since it does not require calling another external application or running an application on the DB machine, BUT it does not include the actual data for the logged ROW-format statements (only stuff like table_id: 47 flags: STMT_END_F, which tells me nothing).
So finally my questions are:
Is there a better way to track the changes made to a MySQL DB without changing the whole structure and adding a ton of triggers and tables? I can change the product to log its changes too, but then we would have to change all the products using this DB to be sure we log everything... and I think it's almost impossible to convince everyone.
Can I get all the information about the changes made using SHOW BINLOG EVENTS? Including the ROW data.
P.S. I researched MySQL Proxy too, but the problem with logging statements in all cases is that the actual data produced by non-deterministic functions is not included.
Option 3 would be to parse the binlog yourself from within your app - that way you get total control of how often you check, etc., and you can see all the statements with the actual values used.
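If you do end up shelling out to mysqlbinlog after all, note that it can decode ROW-format events into readable pseudo-SQL, which gets around the "table_id: 47" problem mentioned above; the file name below is an example:

    # -v/--verbose reconstructs row events as commented pseudo-SQL;
    # DECODE-ROWS suppresses the raw base64 payload.
    mysqlbinlog --verbose --base64-output=DECODE-ROWS binlog.000123 > decoded.sql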