How to obfuscate data in SQL Server for development purposes to hide sensitive data without encryption keys because that's crack-able.
OK I am not sure if you require the data to be encrypted for regulatory purposes or just because you don't trust your developers. Given I don't know the laws where your data resides I can't answer the regulatory side of things.
For the trust side the best solution is not to encrypt/decrypt the data (although that may be needed for other reasons), but to partition data sets and only allow defined people to access their required data. You do this by having separate development, staging and production environments:
The developers only work in the development environment which is loaded with enough dummy data for them to do their job. Developers have full access to the data and code here.
QA people test the code in a staging environment which mimics the real system, but again only has enough dummy data loaded for the testing. Developers may or may not have access to this system.
The production environment has the tested code and all the real data. Only trusted system admins have access to this system. Developers do not have any access to this system.
The sensitive data is protected by the system admins granting the correct permission to roles that people play in maintaining the overall system.
At some point you need to trust someone with your data, but by partitioning it you can reduce the number of people who have access to it.
Edit
From a comment it seems that you already have this architecture, and that you want to transfer the live data from the production server to the development server. In general that is a Bad Idea, and defeats the purpose of having the split environment.
Unless you have some sort of compelling reason to do so, there should be no need to have actual sensitive data in the development environment. If you want to do load testing and the like, then get some development people to code up data generation routines.
Related
Can someone intercept it like man in the middle or something like that?
What's the drill when you want to backup your database with all the user info onto your hard drive?
Should I encrypt it? My phpMyAdmin runs under SSL so I guess connection is encrypted.
I assume that you are interested in moving your production database into your local PC for development purposes, and that you are concerned about protecting your end users from prying eyes. You highlight a number of valid concerns about how the data might be intercepted, but there is something that you seem to not realize: Exporting your production database to your developer machine is in-and-of-itself a breach of your application's security, even if nobody else knows about it. Your users expect that their data remain hidden from everyone, including you as a (presumably well-intentioned) developer! Using their personal data for your own purposes--no matter how honorable you may think your purposes are--is a violation of those expectations. With the exception of creating backups solely for the purpose of recovering from catastrophic data-loss events (failing hard drives in production, botched patches, and whatnot), you should never be sending raw dumps of your production data anywhere. (And sure, when you do create those backups, you should probably encrypt them.)
Getting back to the assumption that you want to protect your end-users' data, your best bet against malicious entities is to implement data masking on sensitive data. This is where you export the relational contents of your schemata with data that won't compromise the identities or intent of your actual, real-world users. Essentially, you replace names and e-mail addresses with spoof identities, and anything else that could be classified as "confidential" (which you will need to determine on your own) is similarly redacted and/or replaced with fake data.
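As a rough illustration of the masking idea, here is a minimal Python sketch. The row layout, the `FAKE_NAMES` list, the `mask_user` helper and the salt value are all assumptions made up for this example; a real masking job would cover every confidential column in your schema.

```python
import hashlib

# Hypothetical list of stand-in first names; any list of fake names works.
FAKE_NAMES = ["Alex", "Blake", "Casey", "Devon", "Emery", "Finley"]


def _digest(value: str, salt: str) -> int:
    """Map a value to a stable integer so the same input always masks the same way."""
    return int(hashlib.sha256((salt + value).encode("utf-8")).hexdigest(), 16)


def mask_user(row: dict, salt: str) -> dict:
    """Return a copy of a user row with name and e-mail replaced by fake values."""
    token = _digest(row["email"], salt)
    masked = dict(row)
    masked["name"] = FAKE_NAMES[token % len(FAKE_NAMES)]
    # example.com is reserved for documentation, so no real mailbox is hit.
    masked["email"] = f"user{token % 10**8:08d}@example.com"
    return masked


original = {"id": 7, "name": "Jane Doe", "email": "jane.doe@realmail.test"}
masked = mask_user(original, salt="keep-this-secret")
print(masked)
```

Deriving the fake values from a salted hash keeps the masking repeatable, so the same real user maps to the same fake user in every export, while the primary key and therefore the relational structure stay untouched.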
The advantages of data masking should be immediately apparent. Even if (god forbid!) someone were to intercept your backup and try to deduce information about your users from it, all they would end up with is a set of fictional data, which cannot be used to infer anything about your actual users. Of course, if they did intercept your dump somehow, they could easily reverse-engineer your schema, and that would be trouble in and of itself, but at least they won't immediately have access to the private details of your userbase. That said, there are a number of reasonably secure ways to transmit data these days, such as FTP over SSL (aka FTPS). (Note that FTPS is not the same as SFTP!)
I am working on creating a spec for a startup to create a financial broker check website. It involves storing information about financial advisers and payment details of the users (so it obviously needs a lot of security). What kind of database is best suited for the application? Is MySQL or one of its open source variations enough, or is it better to go with Oracle Enterprise etc.? Also, any info about the usefulness of application servers over traditional web servers (cloud based or normal) in this scenario, and the preferred scripting language (PHP, Ruby, Python) for secure web applications, would be appreciated.
Your choice of language, database, etc. has a relatively small impact on the security of your application. The developer's understanding of how to write secure code and the developer's understanding of the features provided by their tools is far more important. It is entirely possible to write a secure application on an open source LAMP stack. It is entirely possible to write a secure application on a completely closed source stack. It is also very easy to write insecure applications on any stack.
An enterprise database like Oracle will (depending on the edition, the options that are licensed, and the add-ons that are purchased) provide a host of security functions that may be useful. You can transparently encrypt the data at rest, you can encrypt the data when it flows over the network to the app server, you can prevent the DBA from viewing sensitive data, you can audit the actions of the DBA and other users, etc. But these sorts of things really only come into play when you've written a reasonably secure application to begin with. It does you little good to encrypt all the data if your application is vulnerable to SQL injection attacks and can be easily hacked to present all the decrypted data to the attacker, for example.
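To make the SQL injection point concrete, here is a small Python sketch using the standard `sqlite3` module as a stand-in for whatever database you pick. The `users` table and its contents are invented for the example. It contrasts a query built by string concatenation with one that uses a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, card TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "4111-xxxx"), ("bob", "5500-xxxx")],
)

# Attacker-controlled input that tries to widen the WHERE clause.
user_input = "alice' OR '1'='1"

# Vulnerable: the input is pasted into the SQL text and changes its meaning.
unsafe_sql = "SELECT name, card FROM users WHERE name = '" + user_input + "'"
leaked = conn.execute(unsafe_sql).fetchall()

# Safer: the driver passes the input as a bound value, never as SQL.
safe = conn.execute(
    "SELECT name, card FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(leaked), len(safe))  # prints "2 0"
```

The concatenated query returns every row, while the parameterized one returns nothing, because no user is literally named `alice' OR '1'='1`. The same habit applies on any stack, which is the point of the answer above.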
Background info:
We are currently 3 web programmers (good, real-life friends, no distrust issues).
Each programmer SSHes into the single Linux server, where the code resides, under their own username with sudo powers.
We all work on different files at the same time. We sometimes ask the question "Are you in the file __?". We use Vim, so we know whether the file is open or not.
Our development code (no production yet) resides in /var/www/
Our remote repo is hosted on bitbucket.
I am *very* new to Git. I used Subversion before, but I was basically spoon-fed instructions and was told exactly what to type to sync up code and commit.
I read about half of Scott Chacon's Pro Git, and that's the extent of my Git knowledge.
In case it matters, we run Ubuntu 11.04, Apache 2.2.17, and Git 1.7.4.1.
So Jan Hudec gave me some advice in the previous question. He told me that it is good practice to do the following:
Each developer has their own repo on their local computer.
Let the /var/www/ be the repo on the server. Set the .git folder to permission 770.
That would mean that each developer's computer needs to have its own LAMP stack (or at least Apache, PHP, MySQL, and Python installed).
The code is mostly JavaScript and PHP files, so it's not a big deal to clone it over. However, how do we locally manage the database?
In this case, we only have two tables and it'll be simple to recreate the entire database locally (at least for testing). But in the future, when the database gets too big, should we just remotely log on to the MySQL database on the server, or should we just have "sample" data for developing and testing purposes?
What you're doing is transitioning from "everybody works together in one environment" to "everybody has their own development environment". The major benefit is everybody won't be stepping on each other's feet.
Another benefit is a heterogeneous development environment. If everyone develops on the same machine, the software will become dependent on that one setup, because developers are lazy. If everyone develops in different environments, even just with slightly different versions of the same stuff, they'll be forced to write more robust code to deal with that.
The main drawback, as you've noticed, is setting up the environment is harder. In particular, making sure the database works.
First, each developer should have their own database. This doesn't mean they all have to have their own database server (though it's good for heterogeneous purposes), but they should have their own database instance which they control.
Second, you should have a schema and not just whatever's in the database. It should be in a version controlled file.
Third, setting up a fresh database should be automatic. This lets developers set up a clean database with no hassle.
Fourth, you'll need to get interesting test data into that database. Here's where things get interesting...
You have several routes to do that.
First is to make a dump of an existing database which contains realistic data, sanitized of course. This is easy, and provides realistic data, but it is very brittle. Developers will have to hunt around to find interesting data to do their testing. That data may change in the next dump, breaking their tests. Or it just might not exist at all.
Second is to write "test fixtures". Basically each test populates the database with the test data it needs. This has the benefit of allowing the developer to get precisely the data they want, and know precisely the state the database is in. The drawbacks are that it can be very time consuming, and often the data is too clean. The data will not contain all the gritty real data that can cause real bugs.
Third is to not access the database at all and instead "mock" all the database calls. You trick all the methods which normally query a database into instead returning testing data. This is much like writing test fixtures, and has most of the same drawbacks and benefits, but it's FAR more invasive. It will be difficult to do unless your system has been designed to do it. It also never actually tests if your database calls work.
Finally, you can build up a set of libraries which generate semi-random data for you. I call this "The Sims Technique" after the video game where you create fake families, torture them and then throw them away. For example, let's say you have a User object that needs a name, an age, a Payment object and a Session object. To test a User you might want users with different names, ages, ability to pay and login status. To control all that you need to generate test data for names, ages, Payments and Sessions. So you write a function to generate names and one to generate ages. These can be as simple as picking randomly from a list. Then you write one to make you a Payment object and one a Session object. By default, all the attributes will be random, but valid... unless you specify otherwise. For example...
# Generate a random login session, but guarantee that it's logged in.
session = Session.sim( logged_in = true )
Then you can use this to put together an interesting User.
# A user who is logged in but has an invalid Visa card
# Their name and age will be random but valid
user = User.sim(
    session = Session.sim( logged_in = true ),
    payment = Payment.sim( invalid = true, type = "Visa" ),
);
This has all the advantages of test fixtures, but since some of the data is unpredictable it has some of the advantages of real data. Adding "interesting" data to your default sim and rand functions will have wide ranging repercussions. For example, adding a Unicode name to random_name will likely discover all sorts of interesting bugs! It unfortunately is expensive and time consuming to build up.
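For readers who want something runnable, here is one possible Python sketch of the technique. It is not the answerer's own library: the `sim` classmethods, the `NAMES` list and the age range are assumptions chosen for illustration, and the Payment object is left out to keep it short.

```python
import random
from dataclasses import dataclass

NAMES = ["Ana", "Bob", "Zoë", "李雷"]  # includes non-ASCII names on purpose


def random_name(rng: random.Random) -> str:
    return rng.choice(NAMES)


def random_age(rng: random.Random) -> int:
    return rng.randint(18, 99)


@dataclass
class Session:
    logged_in: bool

    @classmethod
    def sim(cls, rng: random.Random, **overrides) -> "Session":
        # Random but valid defaults; the caller may pin any attribute.
        values = {"logged_in": rng.choice([True, False])}
        values.update(overrides)
        return cls(**values)


@dataclass
class User:
    name: str
    age: int
    session: Session

    @classmethod
    def sim(cls, rng: random.Random, **overrides) -> "User":
        values = {
            "name": random_name(rng),
            "age": random_age(rng),
            "session": Session.sim(rng),
        }
        values.update(overrides)
        return cls(**values)


rng = random.Random(42)  # seeded so a failing test can be reproduced
user = User.sim(rng, session=Session.sim(rng, logged_in=True))
print(user)
```

Passing a seeded `random.Random` around is a design choice of this sketch: the data stays unpredictable across seeds, but any particular failure can be replayed by reusing the seed.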
There you have it. Unfortunately there's no easy answer to the database problem, but I implore you to not simply copy the production database as it's a losing proposition in the long run. You'll likely do a hybrid of all the choices: copying, fixtures, mocking, semi-random data.
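The second and third points above (a version-controlled schema and automatic setup of a fresh database) can be sketched in a few lines of Python. SQLite is used here only so the example is self-contained. The table definitions and the `reset_database` helper are hypothetical, and with MySQL you would run the same schema file through your MySQL client or driver instead.

```python
import sqlite3

# In a real project this text would live in a version-controlled file
# (for example a hypothetical schema.sql); it is inlined to stay self-contained.
SCHEMA_SQL = """
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS users;
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    total   REAL NOT NULL
);
"""


def reset_database(path: str) -> sqlite3.Connection:
    """Drop and recreate every table so the developer starts from a clean state."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA_SQL)
    conn.commit()
    return conn


conn = reset_database(":memory:")
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)  # prints "['orders', 'users']"
```

Because the script drops and recreates everything, running it again always yields the same empty schema, which is what makes a "clean database with no hassle" possible.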
A few options, in order of increasing complexity:
You all connect to the live master DB, read/write permissions. This is risky, but I guess you're already doing it. Make sure you have backups!
Use test fixtures to populate a local test DB and just use it. Not sure what tools there are for this in the PHP world.
Copy (mysqldump) the master database and import it into your local machines' MySQL instances, then set up your dev environments to connect to your local MySQL. Repeat the dump/import as necessary.
Set up one-way replication from the master to your local instances.
Optionally, set up a read-only user on the main DB, and configure your app to let you switch to a read-only connection to the real master DB in case you can't wait for that next copy of the master data.
Having your own repo does not mean having your own staging server (that setup is hard to maintain and scales extremely badly to 10, 20 or 100 developers).
It's always better to have a (semi-)automated build system as early as possible, which converts the source data stored in the repository into a live system (less manual work means fewer chances to make non-code errors), and (maybe) some type of Continuous Integration (test often, find bugs fast). For the DB part of the build system, you only have to prepare the initial data (table structures, data dumps) as (versioned) text files, which are
easy to merge
handled, processed and converted to the final usable object by code, not by hand: no human errors, no interference from operations
Let's say that you have a standalone application (a Java application in my case) and that this application has a configuration file (an XML file in my case) where you store the credentials (user and password) for a bunch of databases you need to connect to.
Everything works great, but now you discover (or you are given a new requirement, like me) that you have to put this application on a different server and that you can't have these credentials in the configuration files because of security and/or compliance considerations.
I'm considering using data sources hosted in the application server (a WAS server), but I think this could have poor performance and maybe it's not the best approach since I'm connecting from a standalone application.
I was also considering using some sort of encryption, but I would like to keep things as simple as possible.
How would you handle this case? Where would you put these credentials or protect them from being compromised? Or how would you connect to your databases in this scenario?
I was also considering using some sort of encryption, but I would like to keep things as simple as possible.
Take a look at the Java Cryptography Architecture - Password Based Encryption. The concept is fairly straightforward: you encrypt/decrypt the XML stream with a key derived from a user password prior to (de)serializing the file.
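The key-derivation half of that idea can be sketched with Python's standard library (used here instead of the JCA purely for illustration). The `derive_key` helper, the iteration count and the sample password are assumptions. The actual encryption of the XML stream is not shown, since it would need a cipher library on top of this.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte key from a password using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is random per file and stored next to the ciphertext; it is not secret.
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
print(len(key))  # prints "32"
```

The same password and salt always give the same key, so the application can re-derive it at start-up from what the user types, and nothing key-like has to sit in the configuration file itself.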
I'm only guessing at what your security/compliance considerations require, but definitely some things to consider:
Require strong passwords.
Try to minimize the amount of time that you leave the sensitive material decrypted.
At runtime, handle sensitive material carefully - don't leave it exposed in a global object; instead, try to reduce the scope of sensitive material as much as possible. For example, encapsulate all decrypted data as private in a single class.
Think about how you should handle the case where the password to the configuration file is lost. Perhaps it's simple, in that you can just create a new config file?
Require both a strong password and a user keyfile to access the configuration file. That would leave it up to the user to store the keyfile safely; and if either piece of information is accidentally exposed, it's still useless without both.
While this is probably overkill, I highly recommend taking a look at Applied Cryptography by Bruce Schneier. It provides a great look into the realm of crypto.
If your standalone application runs in a large business or enterprise, it's likely that they're using the Lightweight Directory Access Protocol, or LDAP, for their passwords.
You might want to consider using LDAP, or providing hooks in your application for a corporate LDAP directory.
I'm considering using data sources hosted in the application server (a WAS server), but I think this could have poor performance and maybe it's not the best approach since I'm connecting from a standalone application.
On the contrary, those datasources are usually connection-pooled datasources, and that should only improve DB connection performance, since connecting is, on balance, the most expensive task.
Have you tested/benchmarked it?
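To show why a pooled datasource tends to help rather than hurt, here is a deliberately tiny connection pool in Python. It is a sketch of the general idea, not of how WAS implements its datasources. The `ConnectionPool` class is hypothetical and SQLite stands in for the real database.

```python
import queue
import sqlite3


class ConnectionPool:
    """Hand out already-open connections instead of opening one per request."""

    def __init__(self, database: str, size: int) -> None:
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a pooled connection move between threads.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    def acquire(self, timeout: float = 5.0) -> sqlite3.Connection:
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn: sqlite3.Connection) -> None:
        self._idle.put(conn)


pool = ConnectionPool(":memory:", size=1)
first = pool.acquire()
pool.release(first)
second = pool.acquire()
print(first is second)  # prints "True": the open connection was reused
```

The expensive connect step happens once, when the pool is created, and every later request only pays for taking a connection off the queue.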
There's this interesting problem I cannot solve myself. I will be very glad if you can help me.
Here it is:
There are many client applications that send data records to one MySQL server.
A few data records on their own are not very important, but the whole database is. (You can imagine it is the Facebook DB :) )
Is there any way to ensure that
data from DB won't be used by anyone but true owner
DB will preserve essential features such as sorting etc.
assuming that attacker can mysteriously gain full access to server?
You can't simply encrypt the data client-side and store it encrypted, since the client application is widely distributed and an attacker can get the key from it.
Maybe adding some layers between the application and the DB, or combining client-side and server-side encryption methods (using MySQL's built-in methods), will help?
As long as the database needs to start up and run unattended, you can't hide the keys from a compromised root account (= 'mysterious full access'). Anywhere the database could possibly store the master key(s), root will also have access. No amount of business layers or combination of client-server encryption will ever circumvent this simple fact. You can obfuscate it as much as you like, but if the prize is worth it, root can get it.
One alternative is to require a manually assisted start-up process, i.e. a human enters the master key password (or hardware module PIN) during the server boot, but this is extremely hard to maintain in the real world: it requires a highly trusted employee to be on pager call to log in and start the database whenever there is downtime.
Solutions like TPM offer protection against physical loss of the server, but not against a compromised root.
Your root is as important as the database master key(s), so you must protect your root with the same care as the keys. This means setting up operating procedures, screening who has access to root, rotating the root password and so on and so forth. The moment someone gains 'mysteriously full access' the game is pretty much lost.
I pretty much agree with Remus Rusanu's answer.
Maintaining good security is hard, but you can always pay attention to what you do. Whenever you access sensitive information, carefully verify your query and make sure it cannot be spoofed or exploited to gain access to information which shouldn't be accessible to the given client.
If you can rule out physical access to the box by the attacker, then there are several things you can do to harden your security. First of all, I'd configure SSH to only allow connections from a specific IP or IP range (and of course no root access). You can also do that on your firewall. This would mean that the weakest link is your server (the application which receives data/requests from clients, which could be a web server and whatever scripts you use). Now you "just" have to make sure that no one can exploit your server. There are a lot more things you could do to harden your system, but I think it would be more appropriate to ask on ServerFault.
If you're worried about physical access to the PC, there isn't really much you can do, and most of it has already been mentioned in Remus's answer.
There's also another option. This is by far the least efficient method in terms of speed and ease of development, but it would partly protect you from any kind of attack on your server (including physical). It's actually quite simple in concept, but a bit hard to implement: only store the encrypted data in the database and handle all encryption/decryption client-side using JavaScript or Flash. Only the client will have the key, and the data will always be transferred over the wire and stored in encrypted form. The biggest drawback is that once the client forgets the key there's no way back; the data is inaccessible.
Of course it's all a matter of time, money and effort: with enough of these, anything can be broken.
I've no idea if such a thing exists in MySQL, but row-level security in Oracle enables you to define access rights at row level IN the database: that means that, regardless of what tool is being used to access the data, the user only ever sees the same selection, as determined by his/her credentials.
So if my username/role is only allowed to see data limited by some WHERE clause, that clause can be appended to each and every SELECT that runs against the database, regardless of whether it comes from a web app, a SQL querying tool, or whatever.
I would use a second layer with a firewall between each tier,
so you have: firewall -- web server -- firewall -- second-layer server -- firewall -- DB.
It would be wise to use different platforms for the different layers; it all depends on how important the data is.
Either way, the web server should have no direct access to the DB.
About preserving sorting: if you use a file encryption mechanism, it will only protect you from hard drive theft.
If you encrypt the data itself, and you do it smartly (storing the keys in a separate place), you will not lose sorting, as you will look up the encrypted entry and not the real one. But now you have another thing to protect...