I am researching whether it is possible to log all the changes made to a MySQL database, including any DDL statements that may occur, and use that information to synchronize it with a remote database.
The application itself is written in C#, so the best synchronization technology I have found so far is Microsoft Sync Framework. The framework tracks changes made to the DB by adding triggers, plus additional tables that store the deleted rows.
This does not seem like a great idea in my case, since it involves changing the schema of a standard DB used by more than 4 products. This method also effectively doubles the number of tables (by adding a table of deleted rows for each table), which does not feel too good either.
On the other hand, MySQL has the binary log (binlog), which tracks all the changes. It can also use the so-called mixed mode, which logs statements in most cases (so they can be executed again on the remote DB to replicate the data) and logs the raw row data when a statement is unsafe to replay, for example one calling a non-deterministic function like UUID(), so the updated data is the same in both places.
There also seem to be two standard ways to retrieve this data:
1) The mysqlbinlog utility
2) Calling 'SHOW BINLOG EVENTS'
Option 2 seems better to me, since it does not require calling an external application or running one on the DB machine, BUT it does not include the actual data for events logged in ROW format (only things like "table_id: 47 flags: STMT_END_F", which tell me nothing).
So finally my questions are:
Is there a better way to track the changes made to a MySQL DB without changing its whole structure and adding a ton of triggers and tables? I could change the product to log its own changes too, but then we would have to change every product that uses this DB to be sure we log everything... and I think it's almost impossible to convince everyone.
Can I get all the information about the changes using SHOW BINLOG EVENTS, including the ROW data?
P.S. I researched MySQL Proxy too, but the problem with logging statements in all cases is that the actual values produced by non-deterministic functions are not included.
Option 3 would be to parse the binlog yourself from within your app. That way you get total control over how often you check, etc., and you can see all the statements with the actual values used.
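A minimal sketch of the incremental part of that idea, assuming the rows of SHOW BINLOG EVENTS (columns Log_name, Pos, Event_type, Server_id, End_log_pos, Info) have already been fetched into hashes by some MySQL client. The checkpoint is the (Log_name, Pos) of the last event handled, so each poll only processes newer events; row-format events are flagged because, as the question notes, their Info column carries no row data. The function names are hypothetical.

```ruby
# Sketch: resume processing of SHOW BINLOG EVENTS output from a saved checkpoint.
# Assumes each event is a Hash keyed by the SHOW BINLOG EVENTS column names.
Checkpoint = Struct.new(:log_name, :pos)

# Some server versions append a suffix such as "_v1", hence start_with? below.
ROW_EVENT_TYPES = %w[Write_rows Update_rows Delete_rows].freeze

# Returns [events_to_process, new_checkpoint].
def new_events(events, checkpoint)
  fresh = events.select do |e|
    next true if checkpoint.nil?
    # Binlog file names (e.g. binlog.000001) use zero-padded sequence numbers,
    # so comparing them as strings follows the order in which they were written.
    cmp = e["Log_name"] <=> checkpoint.log_name
    cmp.positive? || (cmp.zero? && e["Pos"] > checkpoint.pos)
  end
  last = fresh.last
  [fresh, last ? Checkpoint.new(last["Log_name"], last["Pos"]) : checkpoint]
end

# Statement events can be replayed; row events would need the real row images.
def classify(event)
  type = event["Event_type"]
  if type == "Query"
    [:replayable_statement, event["Info"]]
  elsif ROW_EVENT_TYPES.any? { |t| type.start_with?(t) }
    [:row_event_without_data, event["Info"]]
  else
    [:other, event["Info"]]
  end
end
```

Persisting the checkpoint between runs is what gives you the "how often you check" control: the poller can run at any interval and never reprocesses an event.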
I want to write a listener that detects DML changes on a table and performs some actions. This listener cannot be embedded in the application; it runs separately.
My idea was to let the application write to a BLACKHOLE table and detect the changes from the binary log file.
But in the docs I found that enabling binary logging slows down MySQL performance slightly. That's why I was wondering: is there a way to make the MySQL master log only the changes related to a specific table?
Thanks!
SQL is the best way to track DML changes and call a function based on them. But, as you want to explore other options, you may try:
writing a cron job that reads the General Query Log, which also includes SELECT / SHOW statements that you don't need;
mysqlbinlog: binary logging slows down performance just a little, but it is necessary for point-in-time data recovery and replication.
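A rough sketch of the cron-job option, which also narrows the output to the one table the question cares about. It assumes general query log lines of the form "timestamp, thread id, command, argument" separated by whitespace, with a timestamp on every entry; the exact layout differs between MySQL versions, and multi-line statements are ignored here. The regular expressions are illustrative rather than exhaustive (a table name inside a string literal would still match).

```ruby
# Sketch: pull DML statements for one table out of a MySQL general query log.
# Assumed line layout: <timestamp> <thread id> <command> <argument>.
DML = /\A\s*(INSERT|REPLACE|UPDATE|DELETE)\b/i

def dml_for_table(log_lines, table)
  # Match the table name as a whole word, optionally wrapped in backticks.
  table_re = /(?<![\w`])`?#{Regexp.escape(table)}`?(?![\w`])/i
  log_lines.filter_map do |line|
    _ts, _thread, command, statement = line.strip.split(/\s+/, 4)
    next unless command == "Query" && statement
    next unless statement.match?(DML) && statement.match?(table_re)
    statement
  end
end
```

A cron job would call this on the lines appended since its previous run and hand the matching statements to the listener's actions.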
Suggestions:
On a prod environment, the MySQL binary log must be enabled and the general query log must be disabled, as the general query log records almost everything, fills up very quickly, and might run out of disk space if not rotated properly.
On a dev/QA environment, the general query log can be enabled with a proper rotation policy.
I'm going to attempt to create a basic monitoring tool in VB.NET, and I'd like some advice on how to tackle the logging and reporting side of things. I'd appreciate responses from users who I'm sure have a better idea than me and can tell me far more efficient ways of doing things.
My plan is to have a client tool that reads values from a MySQL database and refreshes every x interval (I'm thinking 10 or 15 minutes at the moment). This side of the application is quite easy: I can get something to read a database every x amount of time and then change labels and display alerts based on the values. This is all well documented, and I am probably okay with that.
The second part is a client that sits in the system tray of the server and gathers the required information. I think the system tray part will probably be the trickiest bit, but that's not really part of my question.
I assume I can use the normal information-gathering commands, store the results (perhaps as strings), and then connect to the same database and write them to the relevant fields. For example, if I had a MySQL table called "server" with a column titled "Connection", I could check whether the server has an internet connection, store the result as 1 for yes and 0 for no, and then send a MySQL command to update the "Connection" value to 0 or 1.
Then, I assume, the monitoring tool can run a MySQL query to check the "Connection" column: if the value is 0, change a label or flag an error, and if it is 1, report that connectivity is okay?
My main questions about the above are listed below.
Is using a MySQL database the most efficient way of doing something like this?
Obviously if my database goes down there's no more reporting, I still think that's a con I'll have to live with though.
Is storing everything as values within the code the best way to store my data?
Is there any particular type or format I should use for the MySQL column? I was thinking maybe tinyint(9)?
Is the above method redundant and pointless?
I assume all these database connections could cause some unwanted server load; however, the 15-minute refresh time should combat that.
Is there a way to properly handle delays, such as a client not updating in time for the reporter so that it picks up stale data? Perhaps a fail-safe such as a column containing the last-updated time?
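A minimal sketch of the reporter-side check, including the last-updated fail-safe from the final question. It assumes each row read from the "server" table carries the 0/1 Connection value plus a hypothetical last_updated timestamp column that the tray client sets on every write; a row older than two refresh intervals is reported as stale instead of being trusted. The two-interval threshold is an arbitrary choice.

```ruby
# Sketch: turn a "server" row into a status for the monitoring UI.
# last_updated is a hypothetical column the tray client would set on every write.
REFRESH_SECONDS = 15 * 60
STALE_AFTER = 2 * REFRESH_SECONDS # tolerate one missed client update

def connection_status(row, now: Time.now)
  age = now - row[:last_updated] # seconds since the client last wrote the row
  return :stale if age > STALE_AFTER

  case row[:connection]
  when 1 then :ok
  when 0 then :down
  else :unknown # unexpected value in the column
  end
end
```

The UI can then map :ok, :down, :stale and :unknown to labels or alerts, so a client that has silently stopped reporting is never shown as "connected".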
You probably don't need the tool that gathers information per se. The web app (real-time monitor) can do that, since the clients are storing their information in the same database. The web app can access the database every 15 minutes and display the data, without the intermediate step of saving it again. This will give the web app the latest information instead of a potential 29-minute delay.
In other words, the clients are saving the connection information once. Don't duplicate it in the database.
MySQL should work just about as well as anything.
It's a bad idea to hard code "everything". You can use application settings or a MySQL table if you need to store IPs, etc.
In an application like this, the conversion will more than offset the data savings of a tinyint. I would use the most convenient data type.
I'm researching something that I'd like to call replication, but there is probably some other technical term for it, since as far as I know "replication" means a complete copy of the structure and its data to slaves. I only want the structure replicated. My terminology is probably wrong, which is why I can't seem to find answers on my own.
Is it possible to set up a MySQL environment that replicates a master structure to multiple local databases whenever a change, addition, or drop is made? I'm looking for a solution where each user gets their own database instance with their own unique data but the same table structure. When the master structure is updated, the same change should be applied to each user database.
E.g. a column is added to master.table1, and the change is replicated to user1.table1 and user2.table1.
My first idea was to write an update procedure in PHP, but it feels like this should be a fairly fundamental function built into the database, since my reasoning is that index lookups would be much faster with less data (~ total data divided by users) and probably more secure (no unfortunate leaks, if any).
I solved this problem with a simple set of SQL scripts, one for every change to the database, named year-month-day-description.sql, which I run in lexicographical order (that's why the names begin with the date).
Of course, you do not want to run them all every time. So to know which scripts need to be executed, each script ends with a simple insert that writes the script's filename into a table in the database. The updater PHP script simply makes a list of the scripts, removes those already recorded in the table, and runs the rest.
The good thing about this solution is that you can include data transformations too. It can also be fully automatic, and as long as the scripts are OK, nothing bad will happen.
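The bookkeeping behind that updater fits in a few lines; it is shown in Ruby here rather than the PHP the answer used. The sketch assumes the applied filenames have already been read from each database's tracking table. Because the names start with the date, a plain sort gives the execution order, and running the same logic per database covers the question's case of many user databases sharing one structure. The database names in the usage are illustrative.

```ruby
# Sketch: decide which dated migration scripts still need to run, per database.
# Each script inserts its own filename into a tracking table when it finishes.
def pending_scripts(all_scripts, applied)
  # year-month-day-description.sql names sort chronologically as plain strings.
  (all_scripts - applied).sort
end

# applied_by_db maps a database name to the filenames recorded in its table.
def plan(all_scripts, applied_by_db)
  applied_by_db.transform_values { |applied| pending_scripts(all_scripts, applied) }
end
```

For example, plan(scripts, { "user1" => [...], "user2" => [...] }) returns the ordered list to run against each user database; stopping a database's run at the first failing script keeps its tracking table in line with what actually ran.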
You will probably need to look into using database "migrations", something popularized by the Ruby on Rails framework. A Google search for PHP database migrations might be a good starting point for you.
The concept is that as you develop your application and make schema changes, you create SQL migration scripts to roll the schema changes forward or back. This makes it easy to "migrate" your database schema to work with a particular code version (for example, if you have branched code being worked on in multiple environments that each need a different version of the database).
That isn't going to automatically make updates like you suggest, but it is certainly a step in the right direction. There are also tools like Toad for MySQL and Navicat, which have some level of support for schema synchronization, but again these would be manual comparisons/syncs.
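The roll-forward/roll-back idea behind migrations can be illustrated without any framework. In this hypothetical sketch (Rails' ActiveRecord migrations are one real implementation of the concept), each migration has a version plus an up and a down statement; moving to a target version runs the ups above the current version, or the downs back down to the target, newest first. It only returns the SQL to execute, in order.

```ruby
# Sketch: compute the SQL needed to move a schema from one version to another.
Migration = Struct.new(:version, :up, :down)

def migration_steps(migrations, current, target)
  sorted = migrations.sort_by(&:version)
  if target >= current
    # Roll forward: apply every migration newer than current, up to target.
    sorted.select { |m| m.version > current && m.version <= target }.map(&:up)
  else
    # Roll back: undo migrations newer than target, newest first.
    sorted.select { |m| m.version > target && m.version <= current }.reverse.map(&:down)
  end
end
```

The current version would normally be stored in the database itself, so each branch or environment can be brought to the version its code expects.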
I have a script in a Controller that I launch from the Ruby on Rails console (IRB).
This script constantly creates, reads, and updates records (no deletions) in a MySQL database, using data taken from the Interwebs.
The problem is that it takes a very long time until all the required data is in the database. So I would like to know if it is a good idea to simply open several Rails consoles and launch the script several times in parallel.
-> Several Ruby instances would work on 1 database.
Is that a problem? Could this create any write conflicts (Create/Update) in the database? If so, is there anything I would have to do in order to avoid such conflicts?
If it's not a problem: How many Ruby instances could I "unleash" onto the database, in parallel?
You can definitely run multiple consoles simultaneously against a single database. The limit is the number of open connections the database allows (the max_connections setting). In MySQL 5.1 the default was 100 before 5.1.15 and 151 afterwards, and in 5.5 it's 151. You're unlikely to run out of connections before something else becomes the bottleneck.
It might just work to have multiple processes running simultaneously, but it might not; a complete analysis is fairly complicated. There are a couple of things you can do to make sure it works properly with multiple simultaneous clients. First, wrapping each change in a database transaction will take care of most of what you need:
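```ruby
# Outside a model (e.g. in a controller or console script), call transaction
# on ActiveRecord::Base or on a model class.
ActiveRecord::Base.transaction do
  # all your code to create / modify a single item goes here
end
```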
Make sure your tables use the InnoDB storage engine instead of MyISAM, which doesn't support transactions.
Also, as mu too short points out, put all the validation constraints you can directly into the database. So if you have uniqueness constraints or foreign key relations, add them to your schema by hand, since Rails doesn't do it by default. Complex validations that compare different model objects (aside from FK relations as in belongs_to) could require database trigger validations; hopefully you don't need that. But if you get all your validations into the database natively, then everything should work.
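To see why the uniqueness constraint has to live in the database, here is a toy, in-memory simulation (plain Ruby threads, no Rails or MySQL) of several workers that each "insert or update" the same record. The hypothetical FakeTable enforces its unique key atomically, the way a UNIQUE index does, and each worker tries the insert first and falls back to an update when the key already exists, instead of checking first and inserting afterwards. In real Rails code the duplicate-key error surfaces as ActiveRecord::RecordNotUnique, and Rails 6+ offers create_or_find_by, which relies on the same pattern.

```ruby
# Toy simulation: a unique key enforced atomically, like a UNIQUE index in MySQL.
class DuplicateKey < StandardError; end

class FakeTable
  def initialize
    @rows = {}
    @lock = Mutex.new
  end

  def insert(key, attrs)
    @lock.synchronize do
      raise DuplicateKey, key if @rows.key?(key)
      @rows[key] = attrs.dup
    end
  end

  def update(key, attrs)
    @lock.synchronize { @rows.fetch(key).merge!(attrs) }
  end

  def size
    @lock.synchronize { @rows.size }
  end

  def [](key)
    @lock.synchronize { @rows[key]&.dup }
  end
end

# Insert first; if another worker won the race, update the existing row instead.
def upsert(table, key, attrs)
  table.insert(key, attrs)
rescue DuplicateKey
  table.update(key, attrs)
end

# Several workers (standing in for parallel consoles) writing the same key.
def run_workers(table, key, count)
  count.times.map do |i|
    Thread.new { upsert(table, key, { "title" => "page", "seen_by" => i }) }
  end.each(&:join)
end
```

A check-then-insert version (look up the key, insert if absent) can create duplicates when two workers check at the same moment; letting the constraint reject the second insert avoids that race.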
I'm planning to create a VB.NET application that retrieves data from a database (MS Access) and stores it in a MySQL database on a web server. I'm really confused about this. I'm planning to use Task Scheduler so that the program runs automatically, every 5 minutes.
How can I avoid copying the same data more than once?
For example, I'm planning to get the sales every 5 minutes; after 5 minutes I will do it again. I think there will be redundant data in that case. I would like to ask your ideas about this scenario: how would you handle it?
If at all possible you should avoid using two databases in a situation like this.
Look for information on the Linked Table Manager: the data that Access uses doesn't have to be stored in Access.
http://www.mssqltips.com/sqlservertip/1480/configure-microsoft-access-linked-tables-with-a-sql-server-database/
If you have to do this, then look at using (or upgrading to) Access 2010 and use data macros (triggers) to put the new/changed data into temp tables that you clear out once you've copied the data over.
In a comment you said "i dont have any idea about how to replace the native tables with ODBC".
Is that the only obstacle which prevents you consolidating the data into one set in MySQL? If so, try this suggestion for setting ODBC links to MySQL tables.
Install an ODBC driver for MySQL, if you don't have one already. The latest version is available here: Download Connector/ODBC
Create a DSN (Data Source Name) for your MySQL database from the Windows ODBC Data Source Administrator.
Create a new Access database and use the DSN to create links with guidance from the web page link #jmoreno provided.
If the Access names of the linked tables are different than the names you originally used for the native Access tables, change them to match those original names.
Then you can import your forms, queries, reports, etc. from the old Access application. Ideally everything will just work, since Access will find the table names it needs and won't care that they are external instead of native tables. However, you may need to resolve data type incompatibilities between Access and MySQL.
You would need the MySQL ODBC driver on each machine where the Access application is used. Personally I would prefer to deal with that rather than the challenges of synchronizing between separate Access and MySQL data stores. (YMMV)
When you're ready to deploy, you can convert the ODBC links to DSN-less connections so the client machines wouldn't need to each have the DSN configured. See Using DSN-Less Connections by Doug Steele, Access MVP, for detailed instructions.
You will need to think very carefully about how you identify the data that has changed since the last synchronization cycle. If every row of data has a 'last updated' timestamp (that is indexed), then you could write a process that selects the recently updated rows from each table in turn. That's apt to be a bit heavy on the originating database (MS Access), plus you still have to identify the corresponding row to replace (where replacement is required) in the MySQL database. Of course, you can put different tables on different change schedules. For example, the table of US states probably doesn't change even once a year, but your customer orders tables (or SO questions and answers tables) may change a lot in five minutes.
Some DBMS have alternative mechanisms, especially for working between copies of themselves. Some DBMS also provide a mechanism that is sometimes called 'changed data capture' (CDC) that allows you to get the changed data. Sometimes, in DBMS where you have a 'transaction log' or 'logical log' (but not CDC or something similar), you can 'mine' the log files (or log backups) to find the changes. However, the logs are typically optimized for the DBMS internal recovery processes, not for your use.
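A minimal sketch of the 'last updated' timestamp approach, assuming every source row has a unique id and an updated_at value. Each cycle picks the rows changed since the saved high-water mark, upserts them into the target by id (so re-copying a row replaces it rather than duplicating it), and returns the new mark to persist for the next run. Plain hashes stand in for the Access and MySQL tables; in practice the selection would be a query with WHERE updated_at > ? and the upsert something like MySQL's INSERT ... ON DUPLICATE KEY UPDATE.

```ruby
# Sketch: incremental copy driven by a "last updated" high-water mark.
# source_rows stands in for the Access table, target (id => row) for MySQL.
def sync_since(source_rows, target, last_mark)
  changed = source_rows.select { |r| last_mark.nil? || r[:updated_at] > last_mark }
  changed.each do |row|
    target[row[:id]] = row.dup # upsert by primary key: replace, never duplicate
  end
  new_mark = changed.map { |r| r[:updated_at] }.max || last_mark
  [changed.size, new_mark]
end
```

With a strict greater-than, a row saved later but with exactly the same timestamp as the mark would be missed; overlapping the window slightly and relying on the idempotent upsert is one way around that.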
Well, obviously you will have to keep track of the data items that you have already processed (maybe in a separate metadata space/datastore) to avoid redundancy. The metadata should be used to filter out records from the source that have already been processed. The logic, and what needs to be in the metadata, would depend on the exact use case here.
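One possible shape for that metadata, assuming each record has a stable id: store a digest of every record as it was last processed, and on the next cycle forward only records whose id is new or whose digest has changed. This is just an illustration using Ruby's standard Digest library; the in-memory hash stands in for whatever separate metadata store you choose.

```ruby
require "digest"

# Sketch: filter out records that were already processed, using a metadata
# store of id => digest of the record as it was last processed.
def record_digest(record)
  # Sort keys so the digest does not depend on hash insertion order.
  Digest::SHA256.hexdigest(record.sort_by { |k, _| k.to_s }.inspect)
end

def unprocessed(records, metadata)
  records.reject { |r| metadata[r[:id]] == record_digest(r) }
end

def mark_processed(records, metadata)
  records.each { |r| metadata[r[:id]] = record_digest(r) }
end
```

Unlike a timestamp watermark, this does not need a reliable 'last updated' column, at the cost of reading and hashing every source record each cycle.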