Notifying applications on DB INSERT - MySQL

Consider an application with two components, possibly running on separate machines:
Producer - inserts records into a database, but does little to no reading from it. Multiple instances may be running concurrently.
Consumer - must be notified when a record is inserted into the database by a Producer instance. May also have multiple instances.
What is the best way to perform the notifications, assuming that producers will be inserting 10-100 records into the database per second at peak times? The database technology is currently MySQL, but this is not necessarily set in stone. I can see a few different ways:
Use something like a MySQL message queue to "push" INSERT notifications to subscribers (consumers). Producers would have no knowledge that this was occurring.
Have producers interact with an intermediate layer that performs the INSERT, and pushes notifications to a message queue that consumers are subscribed to.
Have consumers poll the database frequently to check for new additions (seems like a bad idea)
etc.
As far as coupling is concerned: is it a good idea to have two relatively separate application components perform direct queries on a shared database, or should one component "own" the database while the other interacts with it indirectly via calls to the owning component?

I like the second proposed solution (the intermediate layer), as it separates the notification from the database work, and could possibly be part of a two-phase commit XA transaction. If the consumers need the database content in addition to the notification, that can be accomplished via MySQL replication. This could also address the coupling question, as the consumer components could have read-only access to their replicated instances.
Using a messaging solution would also address any potential bottlenecks in the database-only solution, as it would separate the notification and storage into separate processes.
Depending on the language, you have a number of choices for the message distribution. If you're using Java, I'd actually recommend JGroups rather than JMS, as it's somewhat easier to configure.
If Java isn't your language of choice, Apache's ActiveMQ supports a number of languages for interfacing. Apache's Qpid is an AMQP implementation that also supports a number of languages (Java, C++, Python, Ruby, etc.)
Other messaging options could include XMPP, STOMP, or RestMS implementations.
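To make the intermediate-layer idea concrete, here is a minimal sketch of the producer side in Python. It assumes a RabbitMQ broker with the pika client and mysql-connector for the INSERT; the table, exchange, and connection details are illustrative only, and any of the brokers above would follow the same shape.

```python
import json

import mysql.connector  # pip install mysql-connector-python
import pika             # pip install pika

def insert_and_notify(payload):
    # 1) Perform the INSERT against MySQL (connection details are examples).
    db = mysql.connector.connect(host="localhost", database="app",
                                 user="producer", password="secret")
    try:
        cur = db.cursor()
        cur.execute("INSERT INTO records (payload) VALUES (%s)", (payload,))
        db.commit()          # commit first; notify only after a successful write
        row_id = cur.lastrowid
    finally:
        db.close()

    # 2) Publish a notification to the exchange consumers are subscribed to.
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.exchange_declare(exchange="inserts", exchange_type="fanout")
    channel.basic_publish(exchange="inserts", routing_key="",
                          body=json.dumps({"id": row_id}))
    conn.close()
```

Committing before publishing means consumers never hear about rows that were rolled back; a crash between the commit and the publish can still drop a notification, which is exactly the gap an XA/two-phase-commit setup would close.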

Related

What's the most efficient architecture for this system? (push or pull)

All s/w is Windows based, coded in Delphi.
Some guys submit some data, which I send by TCP to a database server running MySQL.
Some other guys add a pass/fail to their data and update the database.
And a third group are just looking at reports.
Now, the first group can see a history of what they submitted. When the second group adds a pass/fail, I would like to update that history. My options seem to be:
1. blindly refresh the history regularly (in Delphi, I display on a DB grid so I would close then open the query), but this seems inefficient.
2. ask the database server regularly if anything changed in the last X minutes.
3. never poll the database server, instead letting it inform the user's app when something changes.
Option 1 seems inefficient; option 2 seems better; option 3 reduces TCP traffic, but not by much anyway - just a few bytes for each option-2 query. However, it has the disadvantage that both sides are now both TCP client and server.
Similarly, if a member of the third group is viewing a report and a member of either of the first two groups updates data, I wish to reflect this in the report. What is the best way to do this?
I guess there are two things to consider. Most importantly, reducing network traffic; less importantly, keeping my code simple.
I am sure this is a very common pattern, but I am new to this kind of thing, so would welcome advice. Thanks in advance.
[Update] Close voters, I have googled and can't find an answer. I am hoping for the benefit of your experience. Can you help me reword this to be acceptable? Or maybe give a URL which will help me? Thanks
Short answer: use notifications (option 3).
Long answer: this is a use case for some middle layer which propagates changes using a message-oriented middleware. This decouples the messaging logic from database metadata (triggers / stored procedures), can use peer-to-peer and publish/subscribe communication patterns, and more.
I have blogged a two-part article about this at
Firebird Database Events and Message-oriented Middleware (part 1)
Firebird Database Events and Message-oriented Middleware (part 2)
The article is about Firebird but the suggested solutions can be applied to any application / database.
In your scenario, clients can also use the middleware message broker to send messages to the system even if the database or the Delphi part is down. The messages will be queued in the broker until the other parts of the system are back online. This is an advantage if there are many clients and update installations or maintenance windows are required.
"Similarly, if a member of the third group is viewing a report and a member of either of the first two groups updates data, I wish to reflect this in the report. What is the best way to do this?"
Reports are usually an immutable 'snapshot' of data, but perhaps you mean a view which needs to be updated while being watched, similar to a stock ticker. If this is a real requirement, it is easy to implement: a client just needs to 'subscribe' to an information channel which announces relevant data changes. This can be solved very flexibly and resource-efficiently with existing message broker features like message selectors and destination wildcards. (Note that I am the author of some Delphi and Free Pascal client libraries for open source message brokers.)
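For illustration, the consumer side of such a subscription could look like the following - a minimal sketch in Python, assuming a RabbitMQ broker and the pika client (the Delphi and Free Pascal client libraries mentioned above expose an equivalent subscribe-and-callback model); the exchange name is an assumption.

```python
import pika

# Bind a private, broker-named queue to the notification exchange.
conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.exchange_declare(exchange="inserts", exchange_type="fanout")
queue = channel.queue_declare(queue="", exclusive=True).method.queue
channel.queue_bind(exchange="inserts", queue=queue)

def on_message(ch, method, properties, body):
    # Refresh the grid or report with the announced change here.
    print("data changed:", body)

channel.basic_consume(queue=queue, on_message_callback=on_message, auto_ack=True)
channel.start_consuming()   # blocks; run it in a background thread in a GUI app
```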
Related questions:
Client-Server database application: how to notify clients that data was changed?
How to communicate within this system?
Each of your proposed solutions is viable in certain situations.
I've been writing software for a long time and comments below relate to personal experience which dates way back to 1981. I have no doubt others will have alternative opinions which will also answer your questions.
Please allow me to walk through the positives and negatives of each approach, and the context behind each comment.
"blindly refresh the history regularly (in Delphi, I display on a DB grid so I would close then open the query), but this seems inefficient."
Yes, this is inefficient.
Is often the quickest and simplest thing to do.
Seems like the best short-term temporary solution which gives maximum value for minimal effort.
Good for "exploratory coding" helping derive a better software design.
Should be a good basis to refine / explore alternatives.
It's very important for programmers to strive to document, and/or share with team members who could be affected by the change, whenever a tech-debt-inducing fix has been checked in.
If not intended as production quality code, this is acceptable.
If usability is poor, then consider more efficient solutions, like what you've described below.
"ask the database server regularly if anything changed in the last X minutes."
You are talking about a "pull" or "polling" model. Consider the following API options for this model:
"What's changed since the last time I called you?" (the client provides the time, to avoid the service having to store and retrieve session state)
If nothing has changed, the server can provide a time when the client should poll again. A system under excessive load is then able to back off clients: if a server application is aware of such conditions, it can control the polling rate of compliant clients by instructing them to wait longer before retrying.
After considering that, ask "Is the API as simple as it can possibly be?"
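As a sketch of what a compliant polling client might look like, assuming a hypothetical HTTP endpoint /changes that returns the changes since a given timestamp plus an optional retry_after back-off hint (all names here are illustrative, not a real API):

```python
import time

import requests  # pip install requests

def handle(change):
    print("change:", change)   # application-specific processing goes here

def poll_forever(base_url, last_seen=0):
    while True:
        resp = requests.get(f"{base_url}/changes", params={"since": last_seen})
        data = resp.json()
        for change in data.get("changes", []):
            handle(change)
            last_seen = change["timestamp"]
        # Honour the server's back-off hint; otherwise poll every 30 seconds.
        time.sleep(data.get("retry_after", 30))
```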
"never poll the database server, instead letting it inform the user's app when something changes."
This is the "push" model you're talking about - publishing changes, ready for subscribers to act upon.
Consider what impact this has on clients waiting for a push: timeout scenarios, the number of clients, system resource consumption, and so on.
Consider that the "pusher" has to become aware of all consuming applications. If you use an industry-standard message queueing system (RabbitMQ, MSMQ, MQ Series, etc., all of which naturally support publish/subscribe via JMS topics or an equivalent), then this problem is abstracted away, though it also adds some complexity to your application.
Consider the scenarios where clients suddenly become unavailable; hypothesize failure modes and test the robustness of your system so you have confidence that it is able to recover properly from failure and consistently remain stable.
So, what do you think the right approach is now?

Database and tools to synchronize data between embedded computers in "real time"

I have 3 BeagleBoards that need to share data between each other as fast as possible. They are running Debian (with a real-time kernel) and are connected to each other via WLAN.
All BeagleBoards have different sensors attached. Each board needs the sensor data of the others in real time (or at least as fast as possible - the data is used in control algorithms for actuators).
The system is intended to demonstrate a concept and does not need to be 100% fault-proof, but it should be as close as possible.
What is the best way to design such a system?
Ideas:
Design the program around UDP broadcast, with some SQL server or just an object/class on the receiver end.
Embedded MySQL/High Performance MySQL with replication or cluster.
SQLite - need some addons?
Any other solutions might be better, I have never designed such a system before. Any help is much appreciated.
If "as fast as possible" is your requirement you need to do the sharing of data yourself and use the database just to store shared data.
You can implement a publisher / subscriber mechanism. One of your nodes becomes master and each of other nodes subbscribes to this node at startup. Master node multiplies and routes messages from subscribers.
Another (faster) option is implementing the publisher / subscriber mechanism without a master node. Each node registers itself to other nodes, it is similar to broadcasting you mentioned.
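As a rough sketch of the masterless variant (essentially the UDP broadcast idea from the question), using only the Python standard library; the port and JSON message format are assumptions:

```python
import json
import socket

PORT = 5005  # example port; pick one free on your network

def make_sender():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return s

def publish(sock, sensor_id, value):
    # Broadcast one sensor reading to every node on the subnet.
    msg = json.dumps({"sensor": sensor_id, "value": value}).encode()
    sock.sendto(msg, ("255.255.255.255", PORT))

def listen(handle_reading):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", PORT))   # receive broadcasts on all interfaces
    while True:
        data, _addr = s.recvfrom(4096)
        handle_reading(json.loads(data))   # feed into the control algorithm
```

UDP gives you speed but no delivery guarantee, which matches the "as close as possible, but not 100% fault-proof" requirement; anything that must never be lost should go over TCP or a broker instead.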

Database synchronization - MS Access

I have an issue at the moment where multiple (same-schema) Access 2003 databases are used on laptops.
I need to find an automated way to synchronize the data into a central access database.
Data on the laptops is only appended to, so update/delete operations won't be an issue.
Which tools will allow me to do this easily?
What factors will affect the decision on the best tool or solution?
It is possible to use the Jet replication built into Access, but I will warn you, it is quite flaky. It will also mess up your PK on whatever tables you do it on because it picks random signed integers to try and avoid key collisions, so you might end up with -1243482392912 as your next PK on a given record. That's a PITA to type in if you're doing any kind of lookup on it (like a customer ID, order number, etc.). You can't automate Access synchronization (maybe you can fake something like it by using VBA, but still, that will only run when the database is opened).
The way I would recommend is to use SQL Server 2005/2008 for your "central" database and SQL Server Express Edition as the back end for your "remote" databases, then use linked tables in Access to connect to these SSEE databases, and replication to sync them. Set up either merge replication or snapshot replication with your "central" database as the publisher and your SSEE databases as subscribers. Unlike with Access Jet replication, you can control the PK numbering; but for you this won't be an issue, as your subscribers will not be pushing changes.
Besides the scalability that SQL server would bring, you can also automate this using the Windows Synchronization manager (if you have synchronized folders, that's the annoying little box that pops up and syncs them when you logon/logoff), and set it up so that it synchronizes at a given interval, on startup, shutdown, or at a time of day, and/or when computer is idle, or only synchronizes on demand. Even if Access isn't run for a month, its data set can be updated every time your users connect to the network. Very cool stuff.
Access Replication can be awkward, and as you only require append queries with some checking, it would probably be best to write something yourself. If the data collected by each laptop cannot overlap, this may not be too difficult.
You will need to consider the primary keys. It may be best to incorporate the user or laptop name in the key to ensure that records relate correctly.
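A roll-your-own append-only sync along those lines might look like the sketch below - Python with pyodbc and the Access ODBC driver as an assumption; the table and column names are illustrative, and the laptop name forms part of a compound key as suggested above.

```python
import pyodbc  # pip install pyodbc; requires the Access ODBC driver

def sync_laptop(laptop_mdb, central_mdb, laptop_name):
    src = pyodbc.connect(rf"DRIVER={{Microsoft Access Driver (*.mdb)}};DBQ={laptop_mdb}")
    dst = pyodbc.connect(rf"DRIVER={{Microsoft Access Driver (*.mdb)}};DBQ={central_mdb}")

    # Highest local id already copied for this laptop
    # (compound key in the central table: laptop name + local id).
    last = dst.execute(
        "SELECT MAX(local_id) FROM central_table WHERE laptop = ?", laptop_name
    ).fetchone()[0] or 0

    # Append-only source, so we only ever copy rows with a higher id.
    for row in src.execute(
        "SELECT id, payload FROM laptop_table WHERE id > ? ORDER BY id", last
    ):
        dst.execute(
            "INSERT INTO central_table (laptop, local_id, payload) VALUES (?, ?, ?)",
            laptop_name, row.id, row.payload,
        )
    dst.commit()
```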
The answers in this thread are filled with misinformation about Jet Replication from people who obviously haven't used it and are just repeating things they've heard, or are attributing problems to Jet Replication that actually reflect application design errors.
"It is possible to use the Jet replication built into Access, but I will warn you, it is quite flaky."
Jet Replication is not flaky. It is perfectly reliable when used properly, just like any other complex tool. It is true that certain things that cause no problems in a non-replicated database can lead to issues when replicated, but that stands to reason given the nature of what replication by any database engine entails.
"It will also mess up your PK on whatever tables you do it on because it picks random signed integers to try and avoid key collisions, so you might end up with -1243482392912 as your next PK on a given record. That's a PITA to type in if you're doing any kind of lookup on it (like a customer ID, order number, etc.)"
Surrogate Autonumber PKs should never be exposed to users in the first place. They are meaningless numbers used for joining records behind the scenes, and if you're exposing them to users IT'S AN ERROR IN YOUR APPLICATION DESIGN.
If you do need sequence numbers, you'll have to roll your own and deal with the issue of how to prevent collisions between your replicas. But that's an issue for replication in any database engine. SQL Server offers the capability of allocating blocks of sequence numbers for individual replicas at the database engine level and that's a really nice feature, but it comes at the cost of increased administrative overhead from maintaining multiple SQL Server instances (with all the security and performance issues that entails). In Jet Replication, you'd have to do this in code, but that's hardly a complicated issue.
Another alternative would be to use a compound PK, where one column indicates the source replica.
But this is not some flaw in the Replication implementation of Jet -- it's an issue for any replication scenario with a need for meaningful sequence numbers.
"You can't automate Access synchronization (maybe you can fake something like it by using VBA, but still, that will only run when the database is opened)."
This is patently untrue. If you install the Jet synchronizer, you can schedule synchs (direct, indirect, or Internet synchs). Even without it, you could schedule a VBScript to run periodically and do the synchronization. Those are just two methods of accomplishing automated Jet synchronization without needing to open your Access application.
A quote from MS documentation:
Use Jet and Replication Objects
JRO is really not the best way to manage Jet Replication. For one, it has only one function in it that DAO itself lacks, i.e., the ability to initiate an indirect synch in code. But if you're going to add a dependency to your app (JRO requires a reference, or can be used via late binding), you might as well add a dependency on a truly useful library for controlling Jet Replication, and that's the TSI Synchronizer, created by Michael Kaplan, once the world's foremost expert on Jet Replication (who has since moved on to internationalization as his area of concentration). It gives you full programmatic control of almost all the replication functionality that Jet exposes, including scheduling synchs, initiating all kinds of synchronization, and the much-needed MoveReplica command (the only legal way to move or rename a replica without breaking replication).
JRO is one of the ugly stepchildren of Microsoft's aborted ADO-Everywhere campaign. Its purpose is to provide Jet-specific functionality to supplement what is supported in ADO itself. If you're not using ADO (and you shouldn't be in an Access app with a Jet back end), then you don't really want to use JRO. As I said above, it adds only one function that isn't already available in DAO (i.e., initiating an indirect synch). I can't help but think that Microsoft was being spiteful by creating a standalone library for Jet-specific functionality and then purposefully leaving out all the incredibly useful functions that they could have supported had they chosen to.
Now that I've disposed of the erroneous assertions in the answers offered above, here's my recommendation:
Because you have an append-only infrastructure, do what @Remou has recommended and set up something to manually send the new records wherever they need to go. And he's right that you still have to deal with the PK issue, just as you would if you used Jet Replication. That is necessitated by the requirement to add new records in multiple locations, and is common to all replication/synchronization applications.
But one caveat: if the add-only scenario changes in the future, you'll be hosed and have to start from scratch or write a whole lot of hairy code to manage deletes and updates (this is not easy -- trust me, I've done it!). One advantage of just using Jet Replication (even though it's most valuable for two-way synchronizations, i.e., edits in multiple locations) is that it will handle the add-only scenario without any problems, and then easily handle full merge replication should it become a requirement in the future.
Last of all, a good place to start with Jet Replication is the Jet Replication Wiki. The Resources, Best Practices and Things Not to Believe pages are probably the best places to start.
You should read into Access Database Replication, as there is some information out there.
But I think that in order for it to work correctly with your application, you will have to roll out a custom made solution using the methods and properties available for that end.
Use Jet and Replication Objects (JRO) if you require programmatic control over the exchange of data and design information among members of the replica set in Microsoft Access databases (.mdb files only). For example, you can use JRO to write a procedure that automatically synchronizes a user's replica with the rest of the set when the user opens the database. To replicate a database programmatically, the database must be closed.
If your database was created with Microsoft Access 97 or earlier, you must use Data Access Objects (DAO) to programmatically replicate and synchronize it.
You can create and maintain a replicated database in previous versions of Microsoft Access by using DAO methods and properties. Use DAO if you require programmatic control over the exchange of data and design information among members of the replica set. For example, you can use DAO to write a procedure that automatically synchronizes a user's replica with the rest of the set when the user opens the database.
You can use the following methods and properties to create and maintain a replicated database:
MakeReplica method
Synchronize method
ConflictTable property
DesignMasterID property
KeepLocal property
Replicable property
ReplicaID property
ReplicationConflictFunction property
Microsoft Jet provides these additional methods and properties for creating and maintaining partial replicas (replicas that contain a subset of the records in a full replica):
ReplicaFilter property
PartialReplica property
PopulatePartial method
You should definitely read the Synchronizing Data part of the documentation.
I used replication in Access 2000 for years, until forced to upgrade to Access 2007 (when it went away). The most problematic issue we ran into, at the enterprise level, was managing the CONFLICTS. If they were not managed in a timely fashion, or there were too many, users got frustrated and the data became unreliable.
Replication did work well when our remote sites were not always connected to the internet. This allowed them to work with their data and synchronize when they could, at least twice daily.
We installed a separate database on the remote computers that managed the synchronization, so the user only had to click an icon on their desktop to invoke it.
The user had a separate button to push/pull feeds from a designated FTP file that would be updated from the legacy systems.
This process worked quite well, as we had 30 of these "nodes" working around the country, managing their data and updating to the FTP servers.
If you are seriously considering this path, let me know and I can send you my documentation.
You can write your own synchronization software that connects to the laptop, selects the diff from its DB, and inserts it into the master.
How easy this operation will be depends on your data schema (if you have many tables with FKs, you will need to do it cleverly).
I think it will be most efficient if you write it yourself.
Automating this kind of behaviour is called replication, and Access apparently supports that, but I've never seen it implemented.
Since, I would guess, the laptop is not connected to the main DB most of the time, replicating data is not a good idea anyway.
If you look for a 3rd-party tool to do it, look for something that can easily diff the tables before copying, and that can of course do it incrementally.
FWIW:
Autonumbers. I agree with David - they should never be exposed. To remove that temptation, I use a Random autonumber.
Replication. I used this extensively some years back, with scheduled syncs, and using GUIDs as the PK. I repeatedly found that any hiccups over the network corrupted the replicas, with the result that I had to salvage data, and re-issue replicas. Painful!

Can MS Enterprise Library Logging be used for multiple applications?

I'm wondering if it's (a) possible and (b) good practice to log multiple applications to a single log instance.
I have several ASP.NET apps and I would like to aggregate all exceptions to a centralized location that can be queried as part of an Enterprise Dashboard app. I'm using both the EL logging block and the EL exception block along with the Database Trace Listener. I would like to see exceptions across all apps logged to a single DB.
Any comments, best practice guidelines or answers would be extremely welcome.
Yes, it is definitely possible to store multiple application logs in a central location using EL.
An Enterprise Dashboard application that lets you view exceptions across applications and tiers, and provides reporting is a great reason to centralize your logging. So I'll say yes to question b as well.
Possible Issues/Negatives
I'm assuming that you are using the Database Trace Listener since you mention that in your question. If a large number of applications log a large number of entries, combined with users querying the (potentially large) log database, performance may degrade (since the logging is done synchronously), which could impact the applications themselves.
Another Approach
To mitigate against this possibility, I would investigate using the Distributor Service to log asynchronously. In that model, all of the applications would log to a message queue (using the MSMQ Trace Listener). A separate service then polls the queue and would forward the log entries to a trace listener (in your case a Database Trace Listener) which would persist the messages in your dashboard database. This setup is more complicated. But it does seem to align with what you are trying to achieve and has some other benefits such as asynchronous processing and the ability to log even if the dashboard database is down (e.g. for maintenance).
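The distributor pattern itself is not Enterprise Library-specific. As a language-agnostic illustration, here is a minimal Python sketch, with the standard library's queue standing in for MSMQ and sqlite3 for the dashboard database; all names are assumptions, not EL APIs.

```python
import queue
import sqlite3
import threading

log_queue = queue.Queue()   # stand-in for the MSMQ queue

def log(application, message, severity="Error"):
    # Applications enqueue and return immediately (asynchronous logging).
    log_queue.put({"app": application, "msg": message, "severity": severity})

def distributor(db_path):
    # A separate service drains the queue into the dashboard database.
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS log (app TEXT, msg TEXT, severity TEXT)")
    while True:
        entry = log_queue.get()
        db.execute("INSERT INTO log VALUES (?, ?, ?)",
                   (entry["app"], entry["msg"], entry["severity"]))
        db.commit()
        log_queue.task_done()

threading.Thread(target=distributor, args=("dashboard.db",), daemon=True).start()
log("OrderService", "NullReferenceException in Checkout()")
log_queue.join()   # wait until the distributor has persisted everything
```

The applications keep running even if the dashboard database is down, since entries simply accumulate in the queue - the same benefit noted above for the MSMQ-based setup.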
Other Considerations
You may also want to think about standardizing some LogEntry properties across applications. For example, LogEntry doesn't really have an "application" property so you could add an ExtendedProperty to represent the application name. Or you may standardize on a specific format for the Message property so that various information can be pulled out of the message and stored in separate database columns for easier searching and categorization.

Pattern for updating slave SQL Server 2008 databases from a master whilst minimising disruption

We have an ASP.NET web application hosted by a web farm of many instances using SQL Server 2008 in which we do aggregation and pre-processing of data from multiple sources into a format optimised for fast end user query performance (producing 5-10 million rows in some tables). The aggregation and optimisation is done by a service on a back end server which we then want to distribute to multiple read only front end copies used by the web application instances to facilitate maximum scalability.
My question is about the best way to get this data from a back end database out to the read only front end copies in such a way that does not kill their performance during the process. The front end web application instances will be under constant high load and need to have good responsiveness at all times.
The backend database is constantly being updated so I suspect that transactional replication will not be the best approach, as the constant stream of updates to the copies will hurt their performance.
Staleness of data is not a huge issue so snapshot replication might be the way to go, but this will result in poor performance during the periods of replication.
Doing a drop and bulk insert will result in periods with no data for user queries.
I don't really want to get into writing a complex cluster approach where we drop copies out of the cluster during updating - is there something along these lines that we can do without too much effort, or is there a better alternative?
There is actually a technology built into SQL Server 2005 (and 2008) that is designed to address this kind of issue: Service Broker (which I'll refer to as SSB). The problem is that it has a very steep learning curve.
I know MySpace has gone public about how it uses SSB to manage its park of SQL Servers: MySpace Uses SQL Server Service Broker to Protect Integrity of 1 Petabyte of Data. I know of several more (major) sites that use similar patterns, but unfortunately they have not gone public so I cannot name them. I was personally involved with some projects around this technology (I am a former member of the SQL Server team).
Now bear in mind that SSB is not a dedicated data-transfer technology like Replication. As such, you will not find anything similar to Replication's publishing wizards and simple deployment options (check a table and it gets transferred). SSB is a reliable messaging technology, and as such its primitives stop at the level of message exchange: you would have to write the code that captures data changes, packs them into messages, and unpacks the messages into relational tables at the destination.
Why some companies still prefer SSB over Replication for a task like the one you describe is that SSB has a far better story when it comes to reliability and scalability. I know of projects that exchange data between 1500+ sites, far beyond the capabilities of Replication. SSB is also abstracted from the physical topology: you can move databases, rename machines, and rebuild servers, all without changing the application. Because data flow occurs over logical routes, the application can adapt on the fly to new topologies. SSB is also resilient to long periods of disconnect and downtime, being capable of resuming the data flow after hours, days, and even months of disconnection. High throughput achieved by engine integration (SSB is part of the SQL engine itself; it is not a collection of satellite applications and processes like Replication) means that the backlog of changes can be processed in reasonable time (I know of sites that go through half a million transactions per minute). SSB applications typically rely on internal Activation to process the incoming data. SSB also has some unique features like built-in load balancing (via routes) with sticky-session semantics, support for deadlock-free application-specific correlated processing, priority data delivery, specific support for database mirroring, certificate-based authentication for cross-domain operations, built-in persisted timers, and more.
This is not a specific answer to 'how to move data from table T on server A to server B'. It is more a generic technology for 'exchanging data between server A and server B'.
I've never had to deal with this scenario before but did come up with a possible solution for this. Basically, it would require a change in your main database structure. Instead of storing the data, you would keep records of modifications of this data. Thus, if a record is added, you store "Table X, inserted new record with these values: ..." With modifications, just store the table, field and changed value. With deletions, just store which record is deleted. Every modification will be stored with a timestamp.
Your client systems would keep their local copies of the database and will regularly ask for all database modifications after a certain date/time. You then execute those modifications on the local database and it will be up-to-date again.
And the back-end? Well, it would just keep a list of modifications and perhaps a table with the base data. Keeping just the modifications also means you're keeping track of history, allowing you to ask the system what it looked like a year ago.
How well this would perform depends on the number of modifications on the back-end database. But if you request the changes every 15 minutes, it shouldn't be that much data every time.
But again, I never had the chance to work this out in a real application, so it's still a theoretical principle for me. It seems fast, but a lot of work will be required.
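To make the modification-journal idea concrete, here is a toy sketch (Python with sqlite3 as a stand-in for SQL Server; the journal layout and JSON payload format are assumptions, and only inserts are replayed for brevity):

```python
import json
import sqlite3
import time

def record_change(backend, table, op, payload):
    # The back end journals every modification instead of only storing state.
    backend.execute(
        "INSERT INTO modifications (ts, tbl, op, payload) VALUES (?, ?, ?, ?)",
        (time.time(), table, op, json.dumps(payload)))
    backend.commit()

def changes_since(backend, last_ts):
    # Clients regularly ask for everything after their last sync point.
    return backend.execute(
        "SELECT ts, tbl, op, payload FROM modifications WHERE ts > ? ORDER BY ts",
        (last_ts,)).fetchall()

def apply_changes(local, changes):
    # Replay the journal against the local copy to bring it up to date.
    last_ts = 0
    for ts, tbl, op, payload in changes:
        if op == "insert":
            # Table names come from our own trusted journal, not user input.
            local.execute("INSERT INTO %s (payload) VALUES (?)" % tbl, (payload,))
        last_ts = ts
    local.commit()
    return last_ts   # persist this and pass it to the next changes_since() call
```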
Option 1: Write an app to transfer the data using row level transactions. It might take longer but would result in no interruption of the site using the data because the rows are there before and after the read occurs, just with new data. This processing would happen on a separate server to minimize load.
In SQL Server 2008 you can set READ_COMMITTED_SNAPSHOT to ON to ensure that the rows being updated do not cause blocking.
But basically, all this app does is move the new data, as it becomes available, out of one database and into the other.
Option 2: Move the data (tables or entire database) from the aggregation server to the front-end server. Automate this if possible. Then switch your web application to point to the new database or tables for future requests. This works but requires control over the web app, which you may not have.
Option 3: If you were talking about a single table (though this could work with many), you can do a view swap. You write your code against a SQL view which points to table A, you do your work on table B, and when it's ready, you update the view to point to table B. You can even write a function that determines the active table and automates the whole swap (see the sketch after this list).
Option 4: You might be able to use something like byte-level replication of the server, which basically copies the server from point A to point B exactly, down to the very bytes. That sounds scary, though. It's mostly used in DR situations, and this sounds like it could be a kinda/sorta DR situation, but not really.
Option 5: Give up and learn how to sell insurance. :)
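For option 3, the swap itself can be a single DDL statement. A sketch assuming pyodbc against SQL Server, with illustrative view and table names:

```python
import pyodbc  # pip install pyodbc

def swap_active_table(conn_str, view_name, new_table):
    # Repoint the read view at the freshly loaded table in one statement,
    # so user queries switch over with no window of missing data.
    cn = pyodbc.connect(conn_str, autocommit=True)
    cn.execute(f"ALTER VIEW {view_name} AS SELECT * FROM {new_table}")
    cn.close()

# Usage, once the aggregation service has finished loading dbo.BigTable_B:
# swap_active_table("DSN=frontend;UID=loader;PWD=secret",
#                   "dbo.BigTable", "dbo.BigTable_B")
```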