database synchronization - MS Access

I have an issue at the moment where multiple (same-schema) Access 2003 databases are used on laptops.
I need to find an automated way to synchronize the data into a central Access database.
Data on the laptops is only appended, so update/delete operations won't be an issue.
Which tools will allow me to do this easily?
What factors will affect the decision on the best tool or solution?

It is possible to use the Jet replication built into Access, but I will warn you, it is quite flaky. It will also mess up your PK on whatever tables you do it on because it picks random signed integers to try and avoid key collisions, so you might end up with -1243482392912 as your next PK on a given record. That's a PITA to type in if you're doing any kind of lookup on it (like a customer ID, order number, etc.). You can't automate Access synchronization (maybe you can fake something like it by using VBA, but still, that will only be run when the database is opened).
The way I would recommend is to use SQL Server 2005/2008 on your "central" database and SQL Server Express Edition as the back end on your "remote" databases, then use linked tables in Access to connect to these SSEE databases and replication to sync them. Set up either merge replication or snapshot replication with your "central" database as the publisher and your SSEE databases as subscribers. Unlike Access Jet replication, you can control the PK numbering, but for you this won't be an issue, as your subscribers will not be pushing changes.
Besides the scalability that SQL Server would bring, you can also automate this using the Windows Synchronization Manager (if you have synchronized folders, that's the annoying little box that pops up and syncs them when you log on/off), and set it up so that it synchronizes at a given interval, on startup, shutdown, or at a time of day, and/or when the computer is idle, or only on demand. Even if Access isn't run for a month, its data set can be updated every time your users connect to the network. Very cool stuff.

Access Replication can be awkward, and as you only require append queries with some checking, it would probably be best to write something yourself. If the data collected by each laptop cannot overlap, this may not be too difficult.
You will need to consider the primary keys. It may be best to incorporate the user or laptop name in the key to ensure that records relate correctly.
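For the append-only case, a single append query per source table can do most of the work once the laptop's table is linked into the central database. A minimal sketch, assuming hypothetical names (a linked table tblReadings_Laptop01 and a compound key of LaptopName + LocalID; the comments are explanatory only, and Access's SQL view does not accept them):

```sql
-- Minimal sketch, hypothetical names: tblReadings_Laptop01 is the laptop's
-- table linked into the central database; LaptopName + LocalID uniquely
-- identify a record, so only rows not already present are appended.
INSERT INTO tblReadings (LaptopName, LocalID, ReadingDate, ReadingValue)
SELECT src.LaptopName, src.LocalID, src.ReadingDate, src.ReadingValue
FROM tblReadings_Laptop01 AS src
LEFT JOIN tblReadings AS dest
    ON (src.LaptopName = dest.LaptopName) AND (src.LocalID = dest.LocalID)
WHERE dest.LocalID IS NULL;
```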

The answers in this thread are filled with misinformation about Jet Replication from people who obviously haven't used it and are just repeating things they've heard, or are attributing problems to Jet Replication that actually reflect application design errors.
"It is possible to use the Jet replication built into Access, but I will warn you, it is quite flaky."
Jet Replication is not flaky. It is perfectly reliable when used properly, just like any other complex tool. It is true that certain things that cause no problems in a non-replicated database can lead to issues when replicated, but that stands to reason because of the nature of what replication by any database engine entails.
"It will also mess up your PK on whatever tables you do it on because it picks random signed integers to try and avoid key collisions, so you might end up with -1243482392912 as your next PK on a given record. That's a PITA to type in if you're doing any kind of lookup on it (like a customer ID, order number, etc.)."
Surrogate Autonumber PKs should never be exposed to users in the first place. They are meaningless numbers used for joining records behind the scenes, and if you're exposing them to users IT'S AN ERROR IN YOUR APPLICATION DESIGN.
If you do need sequence numbers, you'll have to roll your own and deal with the issue of how to prevent collisions between your replicas. But that's an issue for replication in any database engine. SQL Server offers the capability of allocating blocks of sequence numbers for individual replicas at the database engine level and that's a really nice feature, but it comes at the cost of increased administrative overhead from maintaining multiple SQL Server instances (with all the security and performance issues that entails). In Jet Replication, you'd have to do this in code, but that's hardly a complicated issue.
Another alternative would be to use a compound PK, where one column indicates the source replica.
But this is not some flaw in the Replication implementation of Jet -- it's an issue for any replication scenario with a need for meaningful sequence numbers.
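For illustration, a compound key of that sort might look like this (a sketch only; the table and column names are invented, and the comments would have to be stripped before pasting into an Access DDL query):

```sql
-- Sketch of a compound primary key: SourceReplica records which laptop or
-- replica created the row, LocalID is the sequence number assigned locally.
CREATE TABLE tblOrders (
    SourceReplica TEXT(20)  NOT NULL,
    LocalID       LONG      NOT NULL,
    OrderDate     DATETIME,
    CustomerName  TEXT(50),
    CONSTRAINT pkOrders PRIMARY KEY (SourceReplica, LocalID)
);
```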
"You can't automate Access synchronization (maybe you can fake something like it by using VBA, but still, that will only be run when the database is opened)."
This is patently untrue. If you install the Jet synchronizer you can schedule synchs (direct, indirect or Internet synchs). Even without it, you could schedule a VBScript to run periodically and do the synchronization. Those are just two methods of accomplishing automated Jet synchronization without needing to open your Access application.
A quote from MS documentation: "Use Jet and Replication Objects"
JRO is really not the best way to manage Jet Replication. For one, it has only one function that DAO itself lacks, i.e., the ability to initiate an indirect synch in code. But if you're going to add a dependency to your app (JRO requires a reference, or can be used via late binding), you might as well add a dependency on a truly useful library for controlling Jet Replication, and that's the TSI Synchronizer, created by Michael Kaplan, once the world's foremost expert on Jet Replication (who has since moved on to internationalization as his area of concentration). It gives you full programmatic control of almost all the replication functionality that Jet exposes, including scheduling synchs, initiating all kinds of synchronization, and the much-needed MoveReplica command (the only legal way to move or rename a replica without breaking replication).
JRO is one of the ugly stepchildren of Microsoft's aborted ADO-Everywhere campaign. Its purpose is to provide Jet-specific functionality to supplement what is supported in ADO itself. If you're not using ADO (and you shouldn't be in an Access app with a Jet back end), then you don't really want to use JRO. As I said above, it adds only one function that isn't already available in DAO (i.e., initiating an indirect synch). I can't help but think that Microsoft was being spiteful by creating a standalone library for Jet-specific functionality and then purposefully leaving out all the incredibly useful functions that they could have supported had they chosen to.
Now that I've disposed of the erroneous assertions in the answers offered above, here's my recommendation:
Because you have an append-only infrastructure, do what @Remou has recommended and set up something to manually send the new records wherever they need to go. And he's right that you still have to deal with the PK issue, just as you would if you used Jet Replication. That is necessitated by the requirement to add new records in multiple locations, and it is common to all replication/synchronization applications.
But one caveat: if the add-only scenario changes in the future, you'll be hosed and have to start from scratch or write a whole lot of hairy code to manage deletes and updates (this is not easy -- trust me, I've done it!). One advantage of just using Jet Replication (even though it's most valuable for two-way synchronizations, i.e., edits in multiple locations) is that it will handle the add-only scenario without any problems, and then easily handle full merge replication should it become a requirement in the future.
Last of all, a good place to start with Jet Replication is the Jet Replication Wiki. The Resources, Best Practices and Things Not to Believe pages are probably the best places to start.

You should read up on Access Database Replication, as there is some information out there.
But I think that in order for it to work correctly with your application, you will have to roll your own solution using the methods and properties available for that purpose.
Use Jet and Replication Objects (JRO) if you require programmatic control over the exchange of data and design information among members of the replica set in Microsoft Access databases (.mdb files only). For example, you can use JRO to write a procedure that automatically synchronizes a user's replica with the rest of the set when the user opens the database. To replicate a database programmatically, the database must be closed.
If your database was created with Microsoft Access 97 or earlier, you must use Data Access Objects (DAO) to programmatically replicate and synchronize it.
You can create and maintain a replicated database in previous versions of Microsoft Access by using DAO methods and properties. Use DAO if you require programmatic control over the exchange of data and design information among members of the replica set. For example, you can use DAO to write a procedure that automatically synchronizes a user's replica with the rest of the set when the user opens the database.
You can use the following methods and properties to create and maintain a replicated database:
MakeReplica method
Synchronize method
ConflictTable property
DesignMasterID property
KeepLocal property
Replicable property
ReplicaID property
ReplicationConflictFunction property
Microsoft Jet provides these additional methods and properties for creating and maintaining partial replicas (replicas that contain a subset of the records in a full replica):
ReplicaFilter property
PartialReplica property
PopulatePartial method
You should definitely read the Synchronizing Data part of the documentation.

I used replication in Access 2000 for years, until forced to upgrade to Access 2007 (when it went away). The most problematic issue we ran into, at the enterprise level, was managing the CONFLICTS. If they are not managed promptly, or there are too many, users get frustrated and the data becomes unreliable.
Replication did work well when our remote sites were not always connected to the internet. This allowed them to work with their data and synchronize when they could, at least twice daily.
We installed a separate database on the remote computers that managed the synchronization, so the user only had to click an icon on their desktop to invoke the synchronization.
The user had a separate button to push/pull feeds via a designated FTP file that would update from the legacy systems.
This process worked quite well, as we had 30 of these "nodes" working around the country, managing their data and updating to the FTP servers.
If you are seriously considering this path, let me know and I can send you my documentation.

You can write your own synchronization software that connects to the laptop, selects the diff from its DB, and inserts it into the master.
How easy this operation will be depends on your data schema
(if you have many tables with FKs, you will need to do it carefully).
I think it will be the most efficient if you write it yourself.
Automating this kind of behavior is called replication, and Access apparently supports that, but I've never seen it implemented.
Since, I'm guessing, the laptop is not connected to the main DB most of the time, replication is probably not a good idea anyway.
If you look for a 3rd-party tool to do it, look for something that can easily do the diff between the tables before copying, and can do it incrementally, of course.

FWIW:
Autonumbers. I agree with David - they should never be exposed. To remove that temptation, I use a Random autonumber.
Replication. I used this extensively some years back, with scheduled syncs, and using GUIDs as the PK. I repeatedly found that any hiccups over the network corrupted the replicas, with the result that I had to salvage data, and re-issue replicas. Painful!

Related

How to keep two databases with different schemas up-to-date

Our company has a really old legacy system with a very bad database design (no foreign keys, columns with serialized PHP arrays, etc. :(). We decided to rewrite the system from scratch with a new database schema.
We want to rewrite the system in parts, so we will split the old monolithic application into many smaller ones.
The problem is: we want to have live data in two databases, the old and the new schema.
I'd like to ask if any of you know best practices for how to do this.
What we are thinking of:
asynchronous data synchronization with a message queue
make a REST API in the new system and make the legacy application use it instead of DB calls
some kind of table replication
Thank you very much
I had to deal with a similar problem in the past. There was a system which no longer had support, but there were people using it because it had some features (security holes) which allowed them certain functionality. However, they also needed new functionality.
I selected the tables involved in the new system and created triggers to cross-update the tables, so when a record was created in the old system the trigger created a copy in the new system, and vice versa. If you design this properly you can have both systems working at the same time, in real time.
The drawback is that while both systems are running, everything becomes slower, since you have to maintain the integrity of two databases on every operation.
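As an illustration only (MySQL syntax, with made-up table and column names), the old-to-new direction of such a trigger could look like the sketch below; the reverse trigger must guard against re-copying rows it receives from the other side:

```sql
-- Hypothetical sketch: copy inserts on the legacy table into the new schema.
-- A mirror trigger on new_db.customers would handle the opposite direction.
DELIMITER //
CREATE TRIGGER trg_legacy_customer_insert
AFTER INSERT ON legacy_db.customer
FOR EACH ROW
BEGIN
    INSERT INTO new_db.customers (legacy_id, full_name, email)
    VALUES (NEW.id, NEW.name, NEW.email);
END//
DELIMITER ;
```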
I would start by adding a database layer to accept API calls from the business layer, then write to both the old schema and the new. This adds complexity up front, but it lets you guarantee that the data stays in sync.
This would require changing the legacy system to call an API instead of issuing SQL statements. If they did not have the foresight to do that originally, you may not be able to take my approach. But, you should do it going forward.
Triggers may or may not work out. In older versions of MySQL, there can be only one trigger of a given type on a given table. This forces you to lump unrelated things into a single trigger.
Replication can solve some changes -- Engine, datatypes, etc. But it cannot help with splitting one table into two. Be careful of the replication of Triggers and where the Trigger has effect (between Master and Slave). In general, a stored routine should be performed on the Master, letting the effect be replicated to the slave. But it may be worth considering how to have the trigger run in the Slave instead. Or different triggers in the two servers.
Another thought is to do the transformation in stages. By careful planning of schema changes versus application of triggers versus code changes versus database layer, you can do partial transformations one at a time, sometimes without having a big outage to update everything simultaneously (with your fingers crossed). A simple example: (1) change code to dynamically handle either new or old schema, (2) change the schema, (3) clean up the code (remove handling of old schema).
Doing a database migration may be a tedious task considering the complexity of the data and the structure of the tables, which is of course without any constraints or a proper design. But given that your legacy application was doing its job, the amount of corrupt usable data should be minimal.
For this problem I would suggest a DB migration task which converts all the old legacy data into the new form, and then developing the new application. The advantages are:
1) There is no need to keep 2 different applications.
2) No need to change the code in the legacy application - which can become messy.
3) DB migration will give us a chance to correct any corrupt data (if needed).
DB migration may not be practical in all scenarios, but if you can do it with less effort than making the changes for database sync and new APIs for the legacy application, I would suggest going for it.
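The migration itself is mostly a set of one-off transform-and-copy statements. A hypothetical sketch (MySQL, invented names):

```sql
-- One-off migration: copy and reshape legacy rows into the new schema.
INSERT INTO new_db.customers (legacy_id, full_name, email, created_at)
SELECT c.id,
       CONCAT(c.first_name, ' ', c.last_name),
       NULLIF(TRIM(c.email), ''),          -- turn empty strings into NULLs
       COALESCE(c.created, NOW())          -- fill in missing timestamps
FROM legacy_db.customer AS c;
```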

Virtual Segregation of Data in Multi-tenant MySQL Database

This is more of a conceptual question so variations on the stack are welcome should they be capable of accomplishing the same concept. We're currently on MySQL and expanding some services out into MongoDB.
The idea is that we would like to be able to manage a single physical database schema/structure so that adjustments, expansions, etc. don't become overly cumbersome as the number of clients utilizing the structure grows into the thousands, tens of thousands, hundreds of thousands, etc. However, we would like to segregate their data at this level rather than simply at the application layer, to provide a more rigid separation. Is it possible to create virtual bins for each client using the same structure, but have their data structurally separated from one another?
The normal way would obviously be adding client keys to every row of data, either directly or via foreign relationships, but given that we can't foresee with 20/20 vision how hacks allowing "cross-client" data retrieval might occur on our system, I wanted to go a little further and embed the separation at a virtually structural level.
I've also read another post here: MySQL: how to do row-level security (like Oracle's Virtual Private Database)? which uses "views" as a method but this seems to become more work the larger the list of clients.
Thanks!
---- EDIT ----
Based on some of the literature suggested below, here's a little more info on our intent:
The closest situation of the three outlined in the MSDN article provided by @Stennie would be a single database, multiple schemas; the difference being that we're not interested in customizing client schemas after their creation, and we would actually prefer they remain locked to the parent/master schema.
Ideally the solution would keep each schema linked to the parent table-set structure rather than simply duplicating it with the hope that any change to the parent or master schema would be cascaded across all client/tenant schemas.
Taking it a step further, in a cluster we could have a single master with the master schema, and each slave replicating from it but with a sharded set of tenants. Changes to the master could then be filtered down through the cluster without interruption and would maintain consistency across all instances also allowing us to update the application layer faster knowing that all DB's are compatible with the updated schemas.
Hope that makes sense, I'm still a little fresh at this level.
There are a few common infrastructure approaches ranging from "share nothing" (aka multi-instance) to "share everything" (aka multi-tenant).
For example, a straightforward approach to your "virtual bins" would be to allocate a database per client using shared database servers. This is somewhere in between the two sharing extremes, as your customers would be sharing database server infrastructure but keeping their data and schema separate.
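For example, provisioning a tenant under that model might look something like this (a hypothetical MySQL sketch; names and privileges are placeholders):

```sql
-- Each tenant gets its own database and its own credentials on a shared server.
CREATE DATABASE client_acme;
CREATE USER 'acme_app'@'%' IDENTIFIED BY 'change-me';
GRANT SELECT, INSERT, UPDATE, DELETE ON client_acme.* TO 'acme_app'@'%';
-- The application connects with the tenant's credentials, so a compromised
-- tenant account cannot read another tenant's database.
```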
A database-per-client approach would allow you to:
manage authentication and access per client using the database's authentication & access controls
support different database software (you mention using both MySQL which supports views, and MongoDB which does not)
more easily back up and restore data per client
avoid potential cross-client leakage at a database level
avoid excessive table growth and related management issues for a single massive database
Some potential downsides would include:
having more databases to manage
in the case of a database where you want to enforce certain schema (i.e. MySQL) you will need to apply the schema changes across all your databases or support some form of versioning
in the case of a database which preallocates storage (i.e. MongoDB) you may use more storage per client (particularly if your actual data size is small)
you may run into limits on namespaces or open files
you still have to worry about application and data security :)
If you do some research on multi-tenancy you will find some other solutions ranging from this example (isolated DB per client on shared database server architecture) through to more complex partitioned data schemes.
This Microsoft article includes a useful overview of approaches and considerations: Multi-tenant SaaS database tenancy patterns.

Splitting a mysql database for security

I have used sql (mostly mysql) for years but not to a professional standard, so I'm looking for a shove in the right direction.
I am currently designing a web app that will collect users' names/addresses/emails etc. in one set of tables, as well as other personal information in another set of tables. These would most naturally reside in one database, but I've been considering splitting the user contact information into one database on a separate server and all the other information into another database/server, the theory being that a hacker would have to break both systems to get anything very useful.
I've done searches off and on for a few weeks and haven't found this type of design discussed much so far. Is this generally done? Is it overkill? Is there a design method to approach it, or will I have to roll it all on my own?
I did find Is splitting databases a legitimate security measure? which I guess is saying that this approach is likely overkill.
I tend to think this is overkill.
Please check my answer on this question: Sharing users between 2 databases
Keep in mind to address database design and data access security issues separately. Data access security should not lead you to illogical choices in database design.
IMHO that seems to be wrong. By splitting data across 2 DBs you will only increase complexity without a reasonable security benefit.
I think this is where data encryption can be used. Generate an encryption key based on user credentials and encrypt/decrypt sensitive data per user request. Since private data must be shown only to that user, everything should be OK.
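For instance, in MySQL this could be done with AES_ENCRYPT/AES_DECRYPT. The sketch below is hypothetical: the table, column and key-derivation details are placeholders, and in practice you would derive and hold the key in the application layer rather than in SQL.

```sql
-- Hypothetical column-level encryption sketch.
CREATE TABLE user_private (
    user_id INT PRIMARY KEY,
    ssn_enc VARBINARY(64)          -- AES_ENCRYPT output is binary
);

SET @user_key = SHA2(CONCAT('per-user-secret', 'app-pepper'), 256);

INSERT INTO user_private (user_id, ssn_enc)
VALUES (42, AES_ENCRYPT('123-45-6789', @user_key));

SELECT AES_DECRYPT(ssn_enc, @user_key) AS ssn
FROM user_private
WHERE user_id = 42;
```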
Here's an approach I used before:
Server1: DB
Server2: SC
DB is in a network domain that is accessible by the public, but cannot access SC
SC is in a network domain that is not accessible by the public, but can access DB
DB is where you store all pertinent information, including the 'really important stuff'.
At a specified interval (I used 5 seconds) SC checks DB for any new records in any table it may want to monitor (there is a job or scheduled task) and encrypts the important information.
Although I was utilizing SQL Server 2005 and was able to work in two domains (a private (internal) one and a public one for client access), and what I just shared was a stripped-down, simplified version (with as many of the MSSQL-exclusive parts removed as possible), with some effort I think it would be possible to recreate something similar in MySQL, especially if you can host your two databases on separate physical machines.
While many will also think this is overkill, this idea has been implemented. It costs more and requires more work when it's data-reporting time, but the clients were pleased.

Synchronizing MS Access database file

I am developing a database with about 10 tables in it. Basically it will be used in 2 or 3 distant geographical locations (let's call them A,B and C). The desired work flow will be as follows:
A, B and C should always have the same database. So when A makes any changes he should be able to send those changes over to B and C. Emailing the entire mdb file doesn't make sense since it's 15+ MB in size, so I would like to send only the new records and changes to B and C. The changes B and C make should also be reflected to the other respective parties. How can I do this?
I have a few ideas in mind but don't know how to implement them.
Solution 'A': export only the data tables into an xls file and email that. But importing the tables into the mdb file could be a bit complex, right? And the xls file will also become bigger and bigger with time.
Solution 'B': try to extract just the changes and email only the new parts (but how to extract just those?).
Solution 'C': find some way of syncing all users onto the same database (storage) location. I was thinking of a front-end/back-end split, storing the tables on a shared drive on the parent company's server (which is also overseas). But the network connection between locations is very slow, and I don't know how much bandwidth is needed for this.
Any recommendations would be most welcome!
In regard to sources for information on replication, start with my Jet Replication Wiki.
But I would never recommend Jet replication for your scenario. The only environment where I currently recommend it (and I've been doing replicated apps since 1997 and still have several in production use) is for supporting laptop users who have to work with live data in the field disconnected from any network, and return to the home office and synch direct with the mother ship.
The easiest solutions with an Access application would be hosting the app on Windows Terminal Server/Citrix and having the users run it over a Remote Desktop Connection, or using Sharepoint. The Terminal Server/Citrix solution has no accommodation for disconnected users, but Sharepoint can accommodate offline usage and synch changes when connected. Access 2010 and Sharepoint 2010 provide a host of new features, including better schema design, the equivalent of triggers and greatly improved performance for large Sharepoint lists, so it's a no-brainer to me that if you choose Sharepoint you'd want to use A2010 and Sharepoint 2010.
While it's possible to do what you want with Jet Replication, it requires a lot of setup on the server and client ends, and is relatively fragile (not in terms of data integrity if you're using indirect replication (as you should), but in terms of network reliability) -- there are too many moving parts and too many failure points.
Windows Terminal Server/Citrix is by far the simplest, with the fewest moving parts and completely centralized administration, and works very well for a relatively small investment.
Sharepoint is more complicated than WTS/Citrix, but is less complex and more centralized than a Jet Replication solution.
If it were me, I'd probably go with WTS/Citrix if there was no need for disconnected usage, but I'd be salivating over trying out A2010/Sharepoint 2010. If there was a need for disconnected usage, then I'd definitely go the Sharepoint route.
You want to use "Jet Replication". See
MSDN Search for jro at http://social.msdn.microsoft.com/Search/en-US?query=jro&ac=8
MSDN Search for access replication at http://social.msdn.microsoft.com/Search/en-US?query=access%20replication&ac=3
It's been some time since I did it, but the indirect method of replication worked well for me in a similar situation.
It takes something to set up. The documentation used to be appalling for it, but I found articles written by Michael Kaplan (aka Michka) that walked me through how to do it.
If your final environment is going to be fairly stable, then use Access the whole way. If not, then I'd urge you to take HansUp's advice and go with SQL Server or SharePoint.
Do note: if you're working in Access 2007 or later, replication is not directly supported, and you'll have to roll your own bits and pieces. If you're using an earlier version, you'll be fine, but allow time for some head-scratching.

Pattern for updating slave SQL Server 2008 databases from a master whilst minimising disruption

We have an ASP.NET web application hosted by a web farm of many instances using SQL Server 2008 in which we do aggregation and pre-processing of data from multiple sources into a format optimised for fast end user query performance (producing 5-10 million rows in some tables). The aggregation and optimisation is done by a service on a back end server which we then want to distribute to multiple read only front end copies used by the web application instances to facilitate maximum scalability.
My question is about the best way to get this data from a back end database out to the read only front end copies in such a way that does not kill their performance during the process. The front end web application instances will be under constant high load and need to have good responsiveness at all times.
The backend database is constantly being updated so I suspect that transactional replication will not be the best approach, as the constant stream of updates to the copies will hurt their performance.
Staleness of data is not a huge issue so snapshot replication might be the way to go, but this will result in poor performance during the periods of replication.
Doing a drop and bulk insert will result in periods with no data for user queries.
I don't really want to get into writing a complex cluster approach where we drop copies out of the cluster during updating - is there something along these lines that we can do without too much effort, or is there a better alternative?
There is actually a technology built into SQL Server 2005 (and 2008) that is designed to address this kind of issue: Service Broker (which I'll refer to as SSB). The problem is that it has a very steep learning curve.
I know MySpace went public about how it uses SSB to manage its farm of SQL Servers: MySpace Uses SQL Server Service Broker to Protect Integrity of 1 Petabyte of Data. I know of several more (major) sites that use similar patterns, but unfortunately they have not gone public, so I cannot name them. I was personally involved with some projects around this technology (I am a former member of the SQL Server team).
Now bear in mind that SSB is not a dedicated data transfer technology like Replication. As such you will not find anything similar to the publishing wizards and simple deployment options of Replication (check a table and it gets transferred). SSB is a reliable messaging technology, and as such its primitives stop at the level of message exchange; you would have to write the code that captures the data changes, packs them into messages, and unpacks the messages into relational tables at the destination.
The reason some companies still prefer SSB over Replication for a task like the one you describe is that SSB has a far better story when it comes to reliability and scalability. I know of projects that exchange data between 1500+ sites, far beyond the capabilities of Replication. SSB is also abstracted from the physical topology: you can move databases, rename machines and rebuild servers, all without changing the application. Because data flow occurs over logical routes, the application can adapt on the fly to new topologies. SSB is also resilient to long periods of disconnect and downtime, being capable of resuming the data flow after hours, days and even months of disconnect. High throughput achieved by engine integration (SSB is part of the SQL engine itself, not a collection of satellite applications and processes like Replication) means that the backlog of changes can be processed in reasonable time (I know of sites that go through half a million transactions per minute). SSB applications typically rely on internal Activation to process the incoming data. SSB also has some unique features like built-in load balancing (via routes) with sticky-session semantics, support for deadlock-free application-specific correlated processing, priority data delivery, specific support for database mirroring, certificate-based authentication for cross-domain operations, built-in persisted timers and many more.
This is not a specific answer to 'how to move data from table T on server A to server B'; it is more a generic technology for exchanging data between server A and server B.
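To give a flavour of what writing that code involves, here is a minimal, hypothetical sketch of the SSB plumbing (names are invented; routes between servers, activation procedures and the actual change-capture/unpacking logic are all omitted):

```sql
-- Hypothetical Service Broker sketch (T-SQL); assumes the database has the
-- ENABLE_BROKER option set.  Real solutions add routes, activation procedures
-- and error handling.
CREATE MESSAGE TYPE [//demo/RowChange] VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT [//demo/RowChangeContract] ([//demo/RowChange] SENT BY INITIATOR);
CREATE QUEUE dbo.RowChangeQueue;
CREATE SERVICE [//demo/RowChangeService] ON QUEUE dbo.RowChangeQueue
    ([//demo/RowChangeContract]);
GO

-- Sender side: pack a captured change as XML and send it.
DECLARE @h UNIQUEIDENTIFIER;
BEGIN DIALOG CONVERSATION @h
    FROM SERVICE [//demo/RowChangeService]
    TO SERVICE   '//demo/RowChangeService'
    ON CONTRACT  [//demo/RowChangeContract]
    WITH ENCRYPTION = OFF;
SEND ON CONVERSATION @h
    MESSAGE TYPE [//demo/RowChange] (N'<row table="Orders" id="42" />');
GO

-- Receiver side: pull one message off the queue; applying it to the local
-- tables is the part you have to write yourself.
DECLARE @body VARBINARY(MAX);
WAITFOR (RECEIVE TOP (1) @body = message_body FROM dbo.RowChangeQueue), TIMEOUT 1000;
SELECT CAST(@body AS XML) AS change;
```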
I've never had to deal with this scenario before but did come up with a possible solution for this. Basically, it would require a change in your main database structure. Instead of storing the data, you would keep records of modifications of this data. Thus, if a record is added, you store "Table X, inserted new record with these values: ..." With modifications, just store the table, field and changed value. With deletions, just store which record is deleted. Every modification will be stored with a timestamp.
Your client systems would keep their local copies of the database and will regularly ask for all database modifications after a certain date/time. You then execute those modifications on the local database and it will be up-to-date again.
And the back-end? Well, it would just keep a list of modifications and perhaps a table with the base data. Keeping just the modifications also means you're keeping track of history, allowing you to ask the system what it looked like a year ago.
How well this would perform depends on the number of modifications on the back-end database. But if you request the changes every 15 minutes, it shouldn't be that much data every time.
But again, I never had the chance to work this out in a real application, so it's still a theoretical principle for me. It seems fast, but a lot of work will be required.
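A rough sketch of what such a modification log could look like, assuming invented names (note that SQL Server 2008 also ships built-in Change Tracking and Change Data Capture, which cover similar ground if rolling your own log is unappealing):

```sql
-- Hypothetical modification log; clients pull everything newer than their
-- last successful sync and replay it locally.
CREATE TABLE dbo.ChangeLog (
    ChangeId   BIGINT IDENTITY(1,1) PRIMARY KEY,
    ChangedAt  DATETIME2      NOT NULL DEFAULT SYSUTCDATETIME(),
    TableName  SYSNAME        NOT NULL,
    Operation  CHAR(1)        NOT NULL,   -- 'I'nsert, 'U'pdate, 'D'elete
    RowKey     NVARCHAR(100)  NOT NULL,   -- key of the affected row
    Payload    XML            NULL        -- new column values, if any
);

-- A client that last synced at @lastSync pulls newer changes and applies them.
DECLARE @lastSync DATETIME2 = '2024-01-01T00:00:00';
SELECT ChangeId, ChangedAt, TableName, Operation, RowKey, Payload
FROM dbo.ChangeLog
WHERE ChangedAt > @lastSync
ORDER BY ChangeId;
```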
Option 1: Write an app to transfer the data using row level transactions. It might take longer but would result in no interruption of the site using the data because the rows are there before and after the read occurs, just with new data. This processing would happen on a separate server to minimize load.
In SQL Server 2008 you can set READ_COMMITTED_SNAPSHOT to ON to ensure that the rows being updated do not cause blocking.
But basically all this app does is read the new data, as it becomes available, out of one database and into the other.
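For reference, that is a one-time database-level switch (the database name here is a placeholder, and flipping the option needs exclusive access to the database):

```sql
-- Enable row versioning so readers see the last committed row version
-- instead of blocking on writers.
ALTER DATABASE FrontEndDb SET READ_COMMITTED_SNAPSHOT ON;
```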
Option 2: Move the data (tables or entire database) from the aggregation server to the front-end server. Automate this if possible. Then switch your web application to point to the new database or tables for future requests. This works but requires control over the web app, which you may not have.
Option 3: If you were talking about a single table (or this could work with many), what you can do is a view swap. You write your code against a SQL view which points to Table A. You do your work on Table B and, when it's ready, you update the view to point to Table B. You can even write a function that determines the active table and automates the whole swap. A minimal sketch of the view swap (hypothetical object names) follows.
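```sql
-- Readers always query dbo.CurrentData; CREATE/ALTER VIEW must each be the
-- only statement in their batch, hence the GO separators.
CREATE VIEW dbo.CurrentData AS SELECT * FROM dbo.DataTableA;
GO
-- Load and index dbo.DataTableB in the background, then flip the view:
ALTER VIEW dbo.CurrentData AS SELECT * FROM dbo.DataTableB;
GO
```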
Option 4: You might be able to use something like byte-level replication of the server. That sounds scary, though. It is basically copying the server from point A to point B exactly, down to the very bytes. It's mostly used in DR situations, and this sounds like it could be a kinda/sorta DR situation, but not really.
Option 5: Give up and learn how to sell insurance. :)