I'd like to be able to replicate a bunch of MySQL tables to a custom service.
Right now, my best idea is creating an AFTER INSERT trigger on each table and having these triggers push to a 'cache' table that my custom service polls for updated rows.
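For concreteness, here's a sketch of that setup (all names are placeholders: 'orders' stands for one of the replicated tables, 'cache_updates' for the cache table):

```python
import pymysql

# Placeholder connection settings -- adjust to your environment.
conn = pymysql.connect(host="localhost", user="root", password="secret", database="mydb")

with conn.cursor() as cur:
    # The 'cache' table that the custom service polls for updated rows.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS cache_updates (
            id         INT AUTO_INCREMENT PRIMARY KEY,
            src_table  VARCHAR(64) NOT NULL,
            src_id     INT NOT NULL,
            changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
    """)
    # One AFTER INSERT trigger per replicated table. The body is a single
    # statement, so no DELIMITER juggling is needed.
    cur.execute("""
        CREATE TRIGGER orders_after_insert
        AFTER INSERT ON orders
        FOR EACH ROW
        INSERT INTO cache_updates (src_table, src_id) VALUES ('orders', NEW.id)
    """)
conn.commit()
```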
The problem with the above is that it means I have to poll at regular intervals. I'm wondering if there is a way to do it where MySQL pushes updates to my service. The best way I can think of would be if triggers could support actions other than updating other tables, like doing a POST (which, as far as I can tell, is not possible).
I'm pretty sure there's a way to have MySQL push binary logs to me somehow, but I don't know how to do that.
You can extend MySQL with a user-defined function (UDF) that runs system code, and call it from your triggers. Here's an overview.
Given this effort (setup and maintenance), a polling script doesn't look too bad.
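If you do go the polling route, here's a minimal sketch of the poller, assuming the cache_updates table sketched in the question and a hypothetical HTTP endpoint in your service:

```python
import time

import pymysql
import requests

POLL_INTERVAL = 5  # seconds between polls
ENDPOINT = "http://localhost:8000/changes"  # hypothetical receiver in your service

conn = pymysql.connect(host="localhost", user="root", password="secret",
                       database="mydb", autocommit=True)
last_seen = 0  # highest cache_updates.id already delivered

while True:
    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute(
            "SELECT id, src_table, src_id FROM cache_updates WHERE id > %s ORDER BY id",
            (last_seen,),
        )
        for row in cur.fetchall():
            requests.post(ENDPOINT, json=row)  # push each change to the service
            last_seen = row["id"]
    time.sleep(POLL_INTERVAL)
```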
I have a database shared by two completely separate server applications, and those applications cannot communicate with one another at all. Let's say those two applications are called A and B. Whenever A updates a table in the shared DB, B should quickly know that there was a change somehow (remember, A and B cannot communicate with each other). Also, I want to avoid a setInterval type of approach where I query every x seconds. Initially I thought there would be a way to 'watch' changes within MySQL itself, but it seems there isn't. What would be the best approach to achieve this? I'm using Node.js, MySQL Workbench, and PHP.
TL;DR:
I'm trying to find the best way to 'watch' a table for changes and trigger an action (e.g. an HTTP request) whenever a change is detected. I'm using MySQL Workbench and Node.js. I really want to avoid a setInterval type of approach. Any recommendations?
What you want is a Change Data Capture (CDC) feature. In MySQL, the feature is the binary log.
Some tools like Debezium are designed to watch and filter the binary log, and transform it into events on a message queue (e.g. Kafka).
Some comments above suggest using triggers, but this is a problematic idea, because triggers fire during a data change, when the transaction for that change is not yet committed. If you try to invoke an http request or any other application action when a trigger fires, then you risk having the action execute even if the data change is subsequently rolled back. This will really confuse people.
Also there isn't a good way to run application actions from triggers. They are for making subordinate data changes, not actions that are outside transaction scope.
Using the binary log as a record of changes is safer, because changes are not written to the binary log until they are committed. Also the binary log contains all changes to all tables.
With a trigger solution, by contrast, you would have to create three triggers (INSERT, UPDATE, and DELETE) for each table. Also, MySQL does not support triggers for DDL statements (CREATE, ALTER, DROP, TRUNCATE, etc.).
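If a full Debezium/Kafka pipeline is more than you need, tailing the binary log yourself is not much code either. Here is a minimal sketch using the third-party python-mysql-replication package; it assumes binlog_format=ROW on the server, a user with replication privileges, and a hypothetical HTTP endpoint to notify:

```python
import requests
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent,
    UpdateRowsEvent,
    WriteRowsEvent,
)

# The reader connects like a replica, so the user needs the
# REPLICATION SLAVE and REPLICATION CLIENT privileges.
stream = BinLogStreamReader(
    connection_settings={"host": "127.0.0.1", "port": 3306,
                         "user": "repl", "passwd": "secret"},
    server_id=100,          # must be unique among replicas
    blocking=True,          # wait for new events instead of exiting
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    only_schemas=["mydb"],  # filter to the schema you care about
)

for event in stream:
    # One event can carry several rows; event.rows holds the changed values.
    # Only committed changes reach the binary log, so there is no rollback risk.
    requests.post("http://localhost:8000/changes", json={
        "schema": event.schema,
        "table": event.table,
        "type": type(event).__name__,
        "row_count": len(event.rows),
    })
```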
I have 2 Couchbase clusters: 1 for real-time work and 1 for back-end data queries.
I wish to replicate only 10% of the data from the real-time bucket to the back-end one, because it's used for statistical analysis.
Note one: I know it's not possible via the UI; I'm looking for a way to write some kind of extension that could "sit" in the middle of the XDCR stream and filter it.
Note two: As I understand it, Elasticsearch uses the replication feature to get notified of changes on the cluster and build its own indexes. If I could "listen" for those notifications myself, I could take it from there, reading and sending the relevant data myself.
Any ideas on how I can make it work?
==NOTES==
I found the following link: http://blog.couchbase.com/xdcr-aspnet-and-nancy. It gives a basic example of a Sinatra-style project that XDCR can connect to, but there is no link to documentation on the REST API for someone who doesn't want to work with Sinatra.
As for @Cihan's question: replicating 10% of the data is the basic use case I'm after, and for that the key alone would be enough. But in general I would probably like to manipulate the data and also be able to merge it into existing data; that would be the case if I had 2 real-time clusters replicating to 1 back-end cluster.
We don't have anything built in today to do this. You could set up XDCR and delete the data you don't need on the destination cluster, but it may reappear as updates happen, so your cleanup will have to run continuously. Would a method like that work?
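A rough sketch of what that cleanup could look like, assuming the 10% you want to keep is identifiable by a key prefix (here the made-up prefix 'stats::') and using the N1QL query service on the destination cluster; all names and credentials are placeholders:

```python
import time

import requests

QUERY_URL = "http://backend-cluster:8093/query/service"  # N1QL service on the destination
AUTH = ("Administrator", "password")                     # placeholder credentials

# Delete anything XDCR copied over that we don't want to keep. Assumes the
# wanted 10% of documents have keys starting with 'stats::'.
CLEANUP = "DELETE FROM `backend` WHERE META().id NOT LIKE 'stats::%'"

while True:
    resp = requests.post(QUERY_URL, data={"statement": CLEANUP}, auth=AUTH)
    resp.raise_for_status()
    time.sleep(60)  # re-run continuously, since XDCR keeps re-creating documents
```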
By the way, we do plan to have this facility in the future. One comment that would be helpful for me: what type of filtering would suffice in your case? Could we filter on a key prefix only to achieve your goal, or would you need a more sophisticated filtering expression?
thanks
Cihan Biyikoglu
I'm researching something that I'd like to call replication, but there is probably some other technical term for it, since as far as I know "replication" means a complete copy of the structure and its data to slaves. I only want the structure replicated. My terminology is probably wrong, which is why I can't seem to find answers on my own.
Is it possible to set up a MySQL environment that replicates a master structure to multiple local databases whenever a change, addition, or drop has been made? I'm looking for a solution where each user gets their own database instance with their own unique data, but with the same table structure. When an update is made to the master structure, the same change should be replicated to each user database.
E.g. a column added to master.table1 would be replicated to user1.table1 and user2.table1.
My first idea was to write an update procedure in PHP, but this feels like a fairly fundamental function that ought to be built into the database. My reasoning is that index lookups would be much faster with less data (roughly the total data divided by the number of users), and the setup would probably be more secure (no unfortunate leaks, if any).
I solved this problem with a simple set of SQL scripts, one for every change to the database, named year-month-day-description.sql, which I run in lexicographical order (that's why the name begins with the date).
Of course you do not want to run them all every time. So to know which scripts need to be executed, each script has a simple INSERT at its end that records the script's filename in a table in the database. The updater PHP script then simply builds the list of script files, removes the ones already recorded in that table, and runs the rest.
What's good about this solution is that you can include data transformations too. It can also be fully automatic, and as long as the scripts are OK, nothing bad will happen. A sketch of such an updater is below.
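Here is that updater sketched in Python rather than PHP. It assumes the scripts live in ./migrations and record themselves in an applied_scripts(filename) table (both names are made up):

```python
import os

import pymysql
from pymysql.constants import CLIENT

MIGRATIONS_DIR = "migrations"

conn = pymysql.connect(
    host="localhost", user="root", password="secret", database="mydb",
    client_flag=CLIENT.MULTI_STATEMENTS,  # a script may contain several statements
)

with conn.cursor() as cur:
    cur.execute(
        "CREATE TABLE IF NOT EXISTS applied_scripts (filename VARCHAR(255) PRIMARY KEY)"
    )
    cur.execute("SELECT filename FROM applied_scripts")
    applied = {row[0] for row in cur.fetchall()}

    # Lexicographical order works because the names start with year-month-day.
    for name in sorted(os.listdir(MIGRATIONS_DIR)):
        if not name.endswith(".sql") or name in applied:
            continue
        with open(os.path.join(MIGRATIONS_DIR, name)) as f:
            # Each script ends with its own INSERT INTO applied_scripts.
            cur.execute(f.read())
        while cur.nextset():  # drain the results of the remaining statements
            pass
        conn.commit()
```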
You will probably need to look into incorporating the use of database "migrations", something popularized by the Ruby on Rails framework. This Google search for PHP database migrations might be a good starting point for you.
The concept is that as you develop your application and make schema changes, you create SQL migration scripts to roll the schema changes forward or back. This makes it easy to "migrate" your database schema to work with a particular code version (for example, if you have branched code being worked on in multiple environments that each need a different version of the database).
That isn't going to automatically make updates like you suggest, but it is certainly a step in the right direction. There are also tools like Toad for MySQL and Navicat which have some level of support for schema synchronization, but again these would be manual comparisons/syncs.
For a RoR app I'm helping develop, I need to save all search queries in a database so I can analyze them later.
My plan right now is to create a Result model and table, and just save each search query's text in that table, along with a user's ID, the time, etc.
However, the app has about 15,000 users, so I'm afraid the single-table approach won't be very efficient when it comes time to parse that data. (The database is set up via MySQL, if that factors in at all.)
Am I just being paranoid? Is there a Ruby gem that handles this sort of thing, or a better approach I could take?
Any input would be appreciated.
There are a couple of approaches you can try:
1. Enable MySQL query logging and then analyze those logs:
http://dev.mysql.com/doc/refman/5.1/en/query-log.html
2. Use a key=>value store (Redis comes to mind) to log the search queries in a similar way to the one you described.
If you decide to go with the 2nd approach, I would create an async observer on the model you want to track; see the sketch below.
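A minimal sketch of the logging side of the 2nd approach (the field names and list key are made up; the observer would call log_search):

```python
import json
import time

import redis

r = redis.Redis(host="localhost", port=6379)

def log_search(user_id, query):
    """Push one search event onto a Redis list; a separate job can
    drain and analyze it later."""
    event = {"user_id": user_id, "query": query, "ts": time.time()}
    r.rpush("search_queries", json.dumps(event))

# Example: what the async observer would invoke after each search.
log_search(42, "red shoes")
```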
The answer depends on what you want to do with the data.
If your users don't need access to this, and you're not doing real-time analytics, dump them out of your app and get them into another database to run analytics to your heart's content.
If you want something integrated into your app, try a single MySQL table.
Unless your server is tiny or your users are crazy-active searchers, it should work just peachy. At a certain point you'll probably want to clear out old records and save them elsewhere, though; see the sketch below.
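When that point comes, the cleanup can be as simple as this sketch (it assumes a results table with a created_at column and an identically structured results_archive table; both names are made up):

```python
import pymysql

conn = pymysql.connect(host="localhost", user="root", password="secret", database="myapp")

with conn.cursor() as cur:
    # Copy everything older than 90 days into the archive, then remove it.
    cur.execute("""
        INSERT INTO results_archive
        SELECT * FROM results WHERE created_at < NOW() - INTERVAL 90 DAY
    """)
    cur.execute("DELETE FROM results WHERE created_at < NOW() - INTERVAL 90 DAY")
conn.commit()
```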
I was looking for a better mechanism for notifying my desktop clients that a SQL Server table has changed. Excluding the option of an app server tier, I'm looking at solutions that fit into the existing thick client -> SQL Server model. I'm familiar with triggers and polling, but was hoping for something a bit smarter.
One option seems to be SqlDependency. I'm looking at that at the moment, but have seen a few mentions that it has "restrictions" and may be "unsuitable" for large numbers of changes. I've not found a lot of information on that, or many recent code examples.
What are you using for notification that a SQL Server table has been amended?
Unless you have a service that all changes to the tables go through, you are down to polling or SqlDependency. All the dependency mechanism does is hook into SQL Server's own table-change code and fire a change event. The underlying mechanism is very simple and can get swamped by a large number of changes; attempting to rationalise the changes in the event handler is problematic at best.
You might get somewhere with triggers into a "communications table", where you could add the rationalisation logic, then use SqlDependency from there.
So instead of detecting a simple change to column1 in table1, you trigger an insert of an event record into your comms table; something like the sketch below.
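A rough sketch of that wiring, with the T-SQL issued through Python/pyodbc here (table and column names are all made up):

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=mydb;UID=sa;PWD=secret"
)
cur = conn.cursor()

# The comms table: one row per "interesting" change, rationalised by the trigger.
cur.execute("""
    CREATE TABLE dbo.comms (
        id         INT IDENTITY PRIMARY KEY,
        src_table  SYSNAME NOT NULL,
        changed_at DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    )
""")

# Only changes that touch column1 produce an event record, instead of every write.
cur.execute("""
    CREATE TRIGGER dbo.table1_column1_changed ON dbo.table1
    AFTER UPDATE AS
    BEGIN
        IF UPDATE(column1)
            INSERT INTO dbo.comms (src_table) VALUES ('table1')
    END
""")
conn.commit()
```

The desktop clients can then point their SqlDependency query at dbo.comms instead of at the busy base tables.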
It's going to be a PITA because you've excluded an app server tier; you've also drastically constrained your options for doing something efficient and nice.