Partial database synchronization between secure and unsecure site? - mysql

I've been working with a couple different contact management solutions, but my security paranoia prevents me from wanting to put any sensitive data in the cloud. While I'm not dealing with anything like credit transactions that have detailed security requirements, I am considering handling sensitive personal data that I wouldn't want unauthorized people to have access to.
I'm considering a model where there are two systems: one on a Firewalled LAN that contains the sensitive data, which can only be accessed by those on the lan. A second system would contain the non-sensitive (phone book) information subset of the secure system. I would like changes to the non-sensitive data to be synchronized by a connection initiated by the secure system, so as not to compromise the firewall.
I understand that master-slave replication could probably provide synchronization from the secure system to the public system, however this would seem to overwrite any changes made to the public database.
Now to my questions:
1) Does this security model make sense? Are there any downsides or inherent problems with this approach that I might be overlooking?
2) Is there an off-the-shelf or some example of how to implement this type of system?
3) If no, how would you go about implementing this? (language suggestions, synch algorithms, etc)
Assume a LAMPP configuration, with no persistent processes allowed (except for Apache & MySQL) on the public system.

1) This sounds like it'll end up being a lot of pain. Are you sure you want the public system being able to push changes to the private one?
2) Mysql replication with the public system as the master should do the trick. You'll want to use the --replicate-do-* options to restrict what statements the slave will execute. To avoid breaking replication, you'll need to ensure that any SQL affecting tables you care about doesn't reference tables the slave doesn't know about.

Related

Database replication: multiple geographical locations with local database, one main remote database

I've got a very specific use case and because I'm not too familiar with database replication, I am open to suggestions and ideas about how to accomplish the following in the best possible way:
A web application + database is running on a remote server. Let's call this set-up R for remote.
Now suppose there are 3 separate geographical locations which need read+write access to the database. I will call these locations L1, L2 and L3.
The main problem: the remote server might be unavailable or the internet connection of one of the locations might not always work, rendering the remote application unavailable; but we want the application to work as a high availability solution (on-site) even when the remote server is down or when there is an internet connection problem.
Partial solution: So I was thinking about giving each geographical location its own server with a local copy of the web application. The web application itself can get updated when needed from a version control system automatically (for example using git hooks).
So far so good... (at least I believe so?)
But what about our data? The really tricky part seems to be the database replication. Let's assume no DNS or IP failover and assume that the user first tries to access the remote server directly and if this does not work, the user can still use the local server on-site instead. This all happens inside a web browser (or similar client).
One possible (but unsatisfactory) solution would be to use master-slave replication from R (master) to L1, L2 and L3 (slaves). When doing this asynchronously this should be quite fast? I think this is a viable solution for temporary local read-only database access when the main server is broken or can't be accessed.
But... what about read-write support? I suppose we would need multi-master replication in this case, but I am afraid that synchronous replication using something like (for example) MySQL Cluster or Galera would slow things down, especially since L1, L2 and L3 are on lower bandwidth connections. And they are connected through WAN. (Also, L1, L2 or L3 might not always be online.)
The real question: How would you tackle this specific use case? At the moment I am leaning towards multi-master replication if it doesn't slow down things too much. The application itself will mainly be used by employees on-site but by some external people over WAN as well. Would multi-master replication work well? What if for example L1 is down for 24 hours and suddenly comes back on-line? What if R can't be accessed?
EXTRA: not my main question, but I also need the synchronized data to be sent securely over SSL, if possible, please take this into account for your answer.
Perhaps I am still forgetting some necessary details; if so, please respond with some feedback and I will try to update my question accordingly.
Please note that I haven't decided on a database yet and the database schema will be developed from scratch, so ideas using other databases or database engines are welcome as well. (At the moment I have most experience with MySQL and PostgreSQL)
As you are still undecided, I would strongly recommand you to have a look at MS-SQL merge replication. It is strong, highly reliable, replicates through LAN and HTTPS (so called web replication), and not that expensive.
Terminology differs from the mySql Master\Slave idea. We are here talking about one publisher, and multiple subscribers. All changes done at subscriber's level are collected and sent to the publisher, then redistributed to all subscribers (with, if needed, fancy options like 'filtered subscriptions').
Standard architecture will then be:
a publisher, somewhere on a server, which collects and redistributes changes between subscribers. Publisher might not be accessed by end users.
other database subscribers servers, either for local or web access, replicating with the publisher. Subscribers are accessed by end users.
We have been using this architecture for years, including:
one subscriber for internet access
one subscriber for intranet access
tens of subscribers for local access: some subscribers are on our constructions projects, somewhere in the desert ....
Such an architecture is not available "from the shelf" with MySQL. I guess it could be built, but it would then certainly be a lot more expensive than just buying the corresponding MS-SQL licenses. Do not forget that the free SQLEXPRESS version of MS-SQL can be a subscriber.
Be careful: If you are planning to go through such a configuration, I would (really) strongly advise you to have all primary keys set to uniqueIdentifier data type, and randomly generated. This will avoid the typical replication pitfall, where PK's are set to int with automatic increment, and where independant servers generate identical primary keys between two replications (MS-SQL proposes a tool to avoid such problems, where you can allocate PK ranges per server, but this solution is a real PITA ...).

How to protect mysql database from anyone

I have launched my project to a hosting company. But I am worried about how to protect my mysql database from the hosting company.
My question is how can I protect my database from the hosting company so they can't access my database / data.
Here's a relevant rule of IT security:
"If a bad guy has unrestricted physical access to your computer, it's not your computer anymore."
http://technet.microsoft.com/en-us/library/cc722487.aspx
If you don't trust your hosting company, it's time to get a new one. There's little you can do to prevent someone with physical access to a server from getting at what's on it.
I think that you will just have to trust them. There is no way to fully protect the database, because a hosting company has access to almost all levels of your application. They can event inject a code that would fetch all data in some layer of your application.
The hosting company is only one of the threats. You should think about XSS's, CSRF's, data sniffing at network level and so on...
As all have answered there really is no way to protect your data from the hosting company. They own the server therefore giving them access to all databases on it.
Depending on your data you could encrypt all of it but that's kind of overkill and not a practical solution unless your data is sensitive. In that case I would recommend getting a server of your own and building it to support your needs.
You could check out rackspace and setup one of their servers but again if it's not physically in your possession they could potentially get on it and see what's there. I think it's less likely as you would be setting up your own VM or server through them.
I guess there is nothing we can do to prevent it from the hosting company. therefore configuring your own server may be the only option.
I whole heartedly endorse the general rule that Jay puts forward.
However, in certain environments it might be a good idea to take extra steps to ensure that your data is somewhat more protected given the rule that it really is someoneelses computer.
Try to encrypt data that does not need to be acted up with public keys, and keep private keys off of the server. This is trivial to overcome if someone else can change the code and ensure that unencrypted copies are kept in parallel.
So try to use stuff like Tripwire to ensure that your code has not been changed. Again tripwire can be reconfigured with physical access, so this is not fool proof, but it can work
Good luck, you are interested in a tackling an intractable problem, which are, of course, the most fun.
-FT

Splitting a mysql database for security

I have used sql (mostly mysql) for years but not to a professional standard, so I'm looking for a shove in the right direction.
I am currently designing a web app that will collect user's names/addresses/emails etc in one set of tables, as well as other personal information in another set of tables. These would most naturally reside in one database, but I've been considering splitting the user contact information in one database on a separate server and all the other information into another database/server, the theory being that a hacker would have to break both systems to get anything very useful.
I've done searches off and on for a few weeks and haven't found this type of design discussed much so far. Is this generally done? Is it overkill? Is there a design method to approach it, or will I have to roll it all on my own?
I did find Is splitting databases a legitimate security measure? which I guess is saying that this approach is likely overkill.
I tend to think this is overkill.
Please check my answer on this question: Sharing users between 2 databases
Keep in mind to address separately database design and data access
security issues. Data access security should not lead you to illogical
choices in database design.
IMHO that seems to be wrong. By splitting data across 2 DB you will only increase complexity without reasonable security profits.
I think this is where data encryption can be used. Generate encryption key based on user credentials and encrypt/decrypt sensible data by user requests. Since private data must be shown only to that user, everything should be ok.
Here's an approach I used before:
Server1: DB
Server2: SC
DB is in a network domain that is accessible by the public, but cannot access SC
SC is in a network domain that is not accessible by the public, but can access SC
DB is where you stored all pertinent information, including the 'really important stuff'.
At a specified interval (I used 5 seconds) SC checks DB for any new records in any table it may want to monitor (there is a job or scheduled task) and encrypts the important information.
Although I was utilizing SQL Server 2005 and was able to work in two domains (a private(intern al) and public(for client access) and that what I just shared was a stripped down (removed as much MSSQL-exclusive parts), simplified version, with some effort I think it would be possible to recreate something similar in mysql, especially if you can host your two databases in separate, physical machines.
While many will also think this is overkill, this idea had been implemented. It costs more, and requires more work when it's data reporting time but the clients were pleased.

How do you handle passwords or credentials for standalone applications?

Let's say that you have a standalone application (a Java application in my case) and that this application has a configuration file (a XML file in my case) where you store the credentials (user and password) for a bunch of databases you need to connect.
Everything works great, but now you discover (or your are given a new requirement like me) that you have to put this application in a different server and that you can't have these credentials in the configuration files because of security and/or compliance considerations.
I'm considering to use data sources hosted in the application server (a WAS server), but I think this could have poor performance and maybe it's not the best approach since I'm connecting from a standalone application.
I was also considering to use some sort of encryption, but I would like to keep things as simple as possible.
How would you handle this case? Where would you put these credentials or protect them from being compromised? Or how would you connect to your databases in this scenario?
I was also considering to use some
sort of encryption, but I would like
to keep things as simple as possible.
Take a look at the Java Cryptography Architecture - Password Based Encryption. The concept is fairly straight forward, you encrypt/decrypt the XML stream with a key derived from a user password prior to (de)serializing the file.
I'm only guessing at what your security/compliance considerations require, but definitely some things to consider:
Require strong passwords.
Try to minimize the amount of time that you leave the sensitive material decrypted.
At runtime, handle sensitive material carefully - don't leave it exposed in a global object; instead, try to reduce the scope of sensitive material as much as possible. For example, encapsulate all decrypted data as private in a single class.
Think about how you should handle the case where the password to the configuration file is lost. Perhaps its simple in that you can just create a new config file?
Require both a strong password and a user keyfile to access the configuration file. That would leave it up to the user to store the keyfile safely; and if either piece of information is accidentally exposed, it's still useless without both.
While this is probably overkill, I highly recommend taking a look at Applied Cryptography by Bruce Schneier. It provides a great look into the realm of crypto.
if your standalone application runs in a large business or enterprise, it's likely that they're using the Lightweight Directory Access Protocol, or LDAP, for their passwords.
You might want to consider using an LDAP, or providing hooks in your application for a corporate LDAP.
I'm considering to use data sources hosted in the application server (a WAS server), but I think this could have poor performance and maybe it's not the best approach since I'm connecting from a standalone application.
In contrary, those datasources are usually connection pooled datasources and it should just enhance DB connecting performance since connecting is per saldo the most expensive task.
Have you tested/benchmarked it?

database synchronization - MS Access

I have an issue at the moment where multiple (same schema) access 2003 databases are used on laptops.
I need to find an automated way to synchronize the data into a central access database.
Data on the laptops is only appended to so update/delete operations wont be an issue.
Which tools will allow me to do this easily?
What factors will affect the decision on the best tool or solution?
It is possible to use the Jet replication built into Access, but I will warn you, it is quite flaky. It will also mess up your PK on whatever tables you do it on because it picks random signed integers to try and avoid key collisions, so you might end up with -1243482392912 as your next PK on a given record. That's a PITA to type in if you're doing any kind of lookup on it (like a customer ID, order number, etc.) You can't automate Access synchronization (maybe you can fake something like it by using VBA. but still, that will only be run when the database is opened).
The way I would recommend is to use SQL Server 2005/2008 on your "central" database and use SQL Server Express Editions as the back-end on your "remote" databases, then use linked tables in Access to connect to these SSEE databases and replication to sync them. Set up either merge replication or snapshot replication with your "central" database as the publisher and your SSEE databases as subscribers. Unlike Access Jet replication, you can control the PK numbering but for you, this won't be an issue as your subscribers will not be pushing changes.
Besides the scalability that SQL server would bring, you can also automate this using the Windows Synchronization manager (if you have synchronized folders, that's the annoying little box that pops up and syncs them when you logon/logoff), and set it up so that it synchronizes at a given interval, on startup, shutdown, or at a time of day, and/or when computer is idle, or only synchronizes on demand. Even if Access isn't run for a month, its data set can be updated every time your users connect to the network. Very cool stuff.
Access Replication can be awkward, and as you only require append queries with some checking, it would probably be best to write something yourself. If the data collected by each laptop cannot overlap, this may not be too difficult.
You will need to consider the primary keys. It may be best to incorporate the user or laptop name in the key to ensure that records relate correctly.
The answers in this thread are filled with misinformation about Jet Replication from people who obviously haven't used it and are just repeating things they've heard, or are attributing problems to Jet Replication that actually reflect application design errors.
It is possible to use the Jet
replication built into Access, but I
will warn you, it is quite flaky.
Jet Replication is not flakey. It is perfectly reliable when used properly, just like any other complex tool. It is true that certain things that cause no problems in a non-replicated database can lead to issues when replicated, but that stands to reason because of the nature of what replication by any database engine entails.
It will also mess up your PK on
whatever tables you do it on because
it picks random signed integers to try
and avoid key collisions, so you might
end up with -1243482392912 as your
next PK on a given record. That's a
PITA to type in if you're doing any
kind of lookup on it (like a customer
ID, order number, etc.)
Surrogate Autonumber PKs should never be exposed to users in the first place. They are meaningless numbers used for joining records behind the scenes, and if you're exposing them to users IT'S AN ERROR IN YOUR APPLICATION DESIGN.
If you do need sequence numbers, you'll have to roll your own and deal with the issue of how to prevent collisions between your replicas. But that's an issue for replication in any database engine. SQL Server offers the capability of allocating blocks of sequence numbers for individual replicas at the database engine level and that's a really nice feature, but it comes at the cost of increased administrative overhead from maintaining multiple SQL Server instances (with all the security and performance issues that entails). In Jet Replication, you'd have to do this in code, but that's hardly a complicated issue.
Another alternative would be to use a compound PK, where one column indicates the source replica.
But this is not some flaw in the Replication implementation of Jet -- it's an issue for any replication scenario with a need for meaningful sequence numbers.
You can't automate Access
synchronization (maybe you can fake
something like it by using VBA. but
still, that will only be run when the
database is opened).
This is patently untrue. If you install the Jet synchronizer you can schedule synchs (direct, indirect or Internet synchs). Even without it, you could schedule a VBScript to run periodically and do the synchronization. Those are just two methods of accomplishing automated Jet synchroniziation without needing to open your Access application.
A quote from MS documentation:
Use Jet and Replication Objects
JRO is really not the best way to manage Jet Replication. For one, it has only one function in it that DAO itself lacks, i.e., the ability to initiate an indirect synch in code. But if you're going to add a dependency to your app (JRO requires a reference, or can be used via late binding), you might as well add a dependency on a truly useful library for controlling Jet Replication, and that's the TSI Synchronizer, created by Michael Kaplan, once the world's foremost expert on Jet Replication (who has since moved onto internationalization as his area of concentration). It gives you full programmatic control of almost all the replication functionality that Jet exposes, including scheduling synchs, initiating all kinds of synchronization, and the much-needed MoveReplica command (the only legal way to move or rename a replica without breaking replication).
JRO is one of the ugly stepchildren of Microsoft's aborted ADO-Everywhere campaign. Its purpose is to provide Jet-specific functionality to supplement what is supported in ADO itself. If you're not using ADO (and you shouldn't be in an Access app with a Jet back end), then you don't really want to use JRO. As I said above, it adds only one function that isn't already available in DAO (i.e., initiating an indirect synch). I can't help but think that Microsoft was being spiteful by creating a standalone library for Jet-specific functionality and then purposefully leaving out all the incredibly useful functions that they could have supported had they chosen to.
Now that I've disposed of the erroneous assertions in the answers offered above, here's my recomendation:
Because you have an append-only infrastructure, do what #Remou has recommended and set up something to manually send the new records whereever they need to go. And he's right that you still have to deal with the PK issue, just as you would if you used Jet Replication. This is because that's necessitated by the requirement to add new records in multiple locations, and is common to all replication/synchronization applications.
But one caveat: if the add-only scenario changes in the future, you'll be hosed and have to start from scratch or write a whole lot of hairy code to manage deletes and updates (this is not easy -- trust me, I've done it!). One advantage of just using Jet Replication (even though it's most valuable for two-way synchronizations, i.e., edits in multiple locations) is that it will handle the add-only scenario without any problems, and then easily handle full merge replication should it become a requirement in the future.
Last of all, a good place to start with Jet Replication is the Jet Replication Wiki. The Resources, Best Practices and Things Not to Believe pages are probably the best places to start.
You should read into Access Database Replication, as there is some information out there.
But I think that in order for it to work correctly with your application, you will have to roll out a custom made solution using the methods and properties available for that end.
Use Jet and Replication Objects (JRO) if you require programmatic control over the exchange of data and design information among members of the replica set in Microsoft Access databases (.mdb files only). For example, you can use JRO to write a procedure that automatically synchronizes a user's replica with the rest of the set when the user opens the database. To replicate a database programmatically, the database must be closed.
If your database was created with Microsoft Access 97 or earlier, you must use Data Access Objects (DAO) to programmatically replicate and synchronize it.
You can create and maintain a replicated database in previous versions of Microsoft Access by using DAO methods and properties. Use DAO if you require programmatic control over the exchange of data and design information among members of the replica set. For example, you can use DAO to write a procedure that automatically synchronizes a user's replica with the rest of the set when the user opens the database.
You can use the following methods and properties to create and maintain a replicated database:
MakeReplica method
Synchronize method
ConflictTable property
DesignMasterID property
KeepLocal property
Replicable property
ReplicaID property
ReplicationConflictFunction property
Microsoft Jet provides these additional methods and properties for creating and maintaining partial replicas (replicas that contain a subset of the records in a full replica):
ReplicaFilter property
PartialReplica property
PopulatePartial method
You should definitely read the Synchronizing Data part of the documentation.
I used replication in a00 for years, until forced to upgrade to a07 (when it went away). The most problematic issue we ran into, at the enterprise level, was managing the CONFLICTS. If not managed timely, or there are too many, users get frustrated and the data becomes unreliable.
Replication did work well when our remote sites were not always connected to the internet. This allowed them to work with their data, and synchronize when they could. At least twice daily.
We install a separate database on the remote computers that managed the synchronization, so the user only had to click an icon on their desktop to evoke the synchronization.
The user had a separate button to push/pull in feeds off a designated FTP file that would update from the Legacy systems.
This process worked quite well, as we had 30 of these "nodes" working around the country, managing their data and updating to the FTP servers.
If you are seriously considering this path, let me know and I can send you my documentation.
You can write your own synchronization software that connects to the laptop selects the diff from it's db and inserts it to the master.
It is depends on your data scheme how easy this operation will be.
(if you have many tables with FKs... you will need to do it smartly).
I think it will be the most efficient if you write it yourself.
Automating this kind of behavior is called replication, and Accesss Supports that apparently, but I've never seen it implemented.
As I guess most of the time the laptop is not connected to the main DB it is not a good idea anyway (to replicate data).
if you will look for a 3rd party tool to do it - look for something that can easily do the diff between the tables before copying, and can do it incrementally of course.
FWIW:
Autonumbers. I agree with David - they should never be exposed. To remove that temptation, I use a Random autonumber.
Replication. I used this extensively some years back, with scheduled syncs, and using GUIDs as the PK. I repeatedly found that any hiccups over the network corrupted the replicas, with the result that I had to salvage data, and re-issue replicas. Painful!