I work at a small company and I am trying to figure out a solution for storing sensitive data of multiple clients in Microsoft SQL server. Actually, I feel like this is a general database question and it is not specific to MSSQL.
Until now we have been using a proprietary database where the client data is stored as db files (flat files) in the client’s root directories in the file system. So the operating system permissions guarantee that the application used by client X can never fetch data from client Y’s database. Please note that there is no database server/instance/engine here…
However, for my project I want to use a SQL database. But the security folks are expressing concerns over putting data of different clients in a single database.
One option is to create separate database instances for different clients. However, I am not sure if this idea is scalable.
So my questions are:
1) Is there any mechanism in MSSQL that enables you to store databases ‘separately’ in different files used by the SQL server?
2) Let’s say I have only one database instance where I have databases of client X and client Y. How can I make sure that client X’s requests can never (accidentally) get misdirected to client Y’s database? I do not want to rely on some parameter in my code to determine which database to fetch from! :)
So, is there any solid authentication scheme to guarantee that my queries could not be misdirected to fetch from an incorrect client table?
I think this is a very common problem and there has to be a good solution for this. What are other companies doing?
Please let me know if there are any good articles to read up on this.
Different databases are always stored in different files in SQL Server so you don't even have to do anything special for this. However, NTFS permissions will not help you in this case as the clients aren't ever accessing the files directly on disk.
One possible solution in SQL Server is to create separate sets of Windows user IDs and map those to separate SQL Logins for each customer. You could then only assign those logins access to the appropriate databases. For example, if you were hosting web sites for client X and client Y, you would set up the connection string(s) in the web.config for client X's web site to use the appropriate login(s) for client X's database. Vice versa for client Y. This guarantees that no matter what (barring a hard-coded login), the code from client X's site will never access client Y's database.
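As a rough illustration of that per-site configuration, here is a minimal Python sketch. The server name, database names, login names, passwords, and the ODBC driver name are all hypothetical, and it assumes SQL authentication rather than Windows accounts. The only point is that each site's settings carry one login, and that login has been granted access to one database.

```python
# Hypothetical per-site settings: each web site ships with ONLY its own entry,
# mirroring one web.config per client site.
CLIENT_X_SITE = {"database": "ClientX", "login": "client_x_app", "password": "<secret-x>"}
CLIENT_Y_SITE = {"database": "ClientY", "login": "client_y_app", "password": "<secret-y>"}


def build_connection_string(site, server="sqlhost.example.internal"):
    """Build an ODBC-style connection string from one site's own settings."""
    return (
        "Driver={ODBC Driver 17 for SQL Server};"
        f"Server={server};"
        f"Database={site['database']};"
        f"UID={site['login']};"
        f"PWD={site['password']};"
    )


conn_x = build_connection_string(CLIENT_X_SITE)
print(conn_x)
```

The isolation itself comes from the server-side grants, not from this code: even if client X's site asked for client Y's database by name, the login it holds would be refused.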
You can have up to 32,767 databases on a single instance of SQL Server, and having separate databases enables a number of improved serviceability scenarios (such as restoring a single customer's DB in case of a data problem without affecting all of your other customers).
http://technet.microsoft.com/en-us/library/ms143432.aspx
We have a client who is determined to keep their data in our cloud VM separate from other client data. That is, we have a centralized MySQL database where we store all of our client data and access the data depending on the id etc. The clients are now requesting that their data be separated from one another, meaning that if the database is hacked, the hacker can't jump from one user's data to see the others'. I have never heard of this type of functionality, especially for MySQL databases (you can create users and restrict them to tables, but not to specific data in a table, as far as I know). Possibly this is a feature of Azure databases or something.
Has anyone encountered something like this request/solution?
Thanks
I did work for a notification service. We stored each client's data in a separate schema, but on the same MySQL instance. The reason was to keep PII (Personally Identifiable Information) separate, so on any given application request, it was not possible that it could accidentally read data for another client.
The application first connected to a special schema that stored a table listing all the client schemas and the username & password for each client schema. The app reads this table to query for one specific client, then opens a new connection using that username & password.
It added a little bit of overhead to every session to do this two-step connection, but it wasn't too much.
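The two-step connection could be sketched roughly as follows. This is not the original application's code: `sqlite3` stands in for the special MySQL schema so the example runs on its own, and the table name, column names, client names, and the injected `connect` function are all hypothetical.

```python
import sqlite3


def open_directory():
    """Stand-in for the special schema that lists every client schema."""
    directory = sqlite3.connect(":memory:")
    directory.execute(
        "CREATE TABLE client_schemas ("
        "client_id TEXT PRIMARY KEY, schema_name TEXT, db_user TEXT, db_password TEXT)"
    )
    directory.executemany(
        "INSERT INTO client_schemas VALUES (?, ?, ?, ?)",
        [
            ("acme", "acme_data", "acme_user", "<secret-a>"),
            ("globex", "globex_data", "globex_user", "<secret-g>"),
        ],
    )
    return directory


def credentials_for(directory, client_id):
    """Step 1: look up exactly one client's schema and credentials."""
    row = directory.execute(
        "SELECT schema_name, db_user, db_password FROM client_schemas WHERE client_id = ?",
        (client_id,),
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown client: {client_id}")
    return {"schema": row[0], "user": row[1], "password": row[2]}


def connect_as_client(directory, client_id, connect):
    """Step 2: open a new connection using only that client's credentials."""
    return connect(**credentials_for(directory, client_id))


directory = open_directory()
# `connect` is injected; here a fake that just echoes what it was given.
session = connect_as_client(directory, "acme", connect=lambda **kw: kw)
print(session["schema"], session["user"])
```

Once the second connection is open, every query in that session runs as the client's own database user, so the database itself refuses access to any other client's schema.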
I'm not sure how this eliminates the possibility of being hacked. That's still a risk. If an attacker hacks the primary database, why couldn't they also hack the specific client's database?
I need to create a system with local webservers on Raspberry Pi 4 running laravel for API calls, websockets, etc. Each RPI will be installed in multiple customers places.
For this project I want the ability to save/sync the database to a remote server (when the local system is connected to the internet).
Multiple locale databases => one remote, customer-based database
The question is how to synchronize the databases, properly identify each customer's data, and render it in a shared remote dashboard.
My first thought was to set a customer_id or a team_id on each table, but it seems dirty.
The other way is to create multiple databases on the remote server for the synchronization, plus one extra database holding the customer ids and database connection information...
Has anyone already tried something like this? Is there a safe and clean way to do it?
You refer to locale but I am assuming you mean local.
From what you have said you have two options at the central site. The central database can either store information from the remote databases into a single table with an additional column that indicates which remote site it's from, or you can setup a separate table (or database) for each remote site.
How do you want to use the data?
If you only ever want to work with the data from one remote site at a time it doesn't really matter - in both scenarios you need to identify what data you want to work with and build your SQL statement to either filter by the appropriate column, or you need to direct it to the appropriate table(s).
If you want to work on data from multiple remote sites at the same time, then using different tables requires that you use UNION queries to extract the data, and this is unlikely to scale well. In that case you would be better off using a column to mark each record with the remote site it references.
I recommend that you consider using UUIDs as primary keys. It may be that key collision will not be an issue in your scenario, but if it becomes one, altering the design retrospectively is likely to be quite a bit of work.
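A small sketch of the single-table layout with a site column and UUID keys, using Python's built-in `sqlite3` as a stand-in for the central database. The table name, column names, and site names are made up for illustration.

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE readings ("
    "id TEXT PRIMARY KEY,"       # UUID generated on the local system
    "site_id TEXT NOT NULL,"     # which remote site the row came from
    "value REAL NOT NULL)"
)


def add_reading(site_id, value):
    """Insert a row keyed by a random UUID, so keys from different sites
    are extremely unlikely to collide."""
    row_id = str(uuid.uuid4())
    db.execute("INSERT INTO readings VALUES (?, ?, ?)", (row_id, site_id, value))
    return row_id


for site, value in [("site-a", 20.5), ("site-a", 21.0), ("site-b", 19.0)]:
    add_reading(site, value)

# One site at a time: filter on the column.
site_a = db.execute(
    "SELECT COUNT(*) FROM readings WHERE site_id = ?", ("site-a",)
).fetchone()[0]

# All sites at once: a plain GROUP BY, no UNION across per-site tables.
per_site = dict(db.execute("SELECT site_id, COUNT(*) FROM readings GROUP BY site_id"))
print(site_a, per_site)
```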
You also asked about how to synchronize the databases. That will depend on what type of connection you have between the sites and the capabilities of your software, but typically you would have the local system periodically talk to a webservice at the central site. Assuming you are collecting sensor data or some such the dialogue would be something like:
Client - Hello Server, my last sensor reading is timestamped xxxx
Server - Hello Client, [ send me sensor readings from yyyy | I don't need any data ]
You can include things like a signature check (for example an MD5 sum of the records within a time period) if you want to but that may be overkill.
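The dialogue above might be sketched like this. It is only an outline under assumptions: readings are `(timestamp, value)` pairs, timestamps are plain integers, and the function names are invented. A real system would put the two sides behind a web service.

```python
import hashlib


def server_reply(server_last_ts, client_last_ts):
    """Server side: ask for readings newer than what it already holds, or nothing."""
    if client_last_ts > server_last_ts:
        return {"send_from": server_last_ts}
    return {"send_from": None}


def readings_to_send(local_readings, reply):
    """Client side: select the readings the server asked for."""
    if reply["send_from"] is None:
        return []
    return [r for r in local_readings if r[0] > reply["send_from"]]


def period_checksum(readings, start, end):
    """Optional signature check: MD5 over the records in [start, end]."""
    digest = hashlib.md5()
    for ts, value in sorted(r for r in readings if start <= r[0] <= end):
        digest.update(f"{ts}:{value};".encode())
    return digest.hexdigest()


local = [(100, 20.5), (110, 21.0), (120, 19.5)]   # (timestamp, value) pairs
reply = server_reply(server_last_ts=100, client_last_ts=local[-1][0])
batch = readings_to_send(local, reply)
print(batch)
```

Both sides can compute `period_checksum` over the same time window and compare the results to confirm that nothing was lost in transfer.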
This is a similar question to "Storing MS SQL Server credentials in a MySQL Database"
So, in theory, imagine I have 1 MySQL server. I have a "master" database, and then X number of other generic databases. What I'm looking for is a way of using an app (for argument's sake, let's say a web app running on PHP) to first access the master database. This database then needs to tell the app which database to connect to, in the process giving it the username and all the other credentials.
What is the best way to go about this?
The ideas I have so far:
Store the credentials in the master database for all the other databases. These credentials would of course be encrypted in some way, AES probably. The app would get the encrypted credentials, decrypt, connect.
Store the credentials elsewhere - maybe a completely separate server. When the master database is accessed, it returns some sort of token, which can be used to access the credential storage. Again, encrypted via AES.
Using some sort of system that I am not aware of to do exactly this.
Not doing this at all, and come up with a completely different approach.
To give a little example: "master" would contain a list of clients. Each client would have its own separate database, with its own permissions etc.
I've had no reason to do this kind of thing myself, but your first two ideas sound good to me and, as long as you include the server address, they are not even necessarily separate ideas: you could have some clients on the server with master and some elsewhere, and the client logic won't need to care. The only issue I can see is keeping the data in the "master" schema synced with the server's security data. Also, I wouldn't bother keeping database permissions in the master schema, as I would think all clients have the same permissions, just specific to their own schema. If you have "permissions" (settings) that limit what specific clients can do (perhaps limited by contract/features paid for), I would think it would be much easier to keep those in that client's schema, but where their db user cannot change the data.
Edit: It is a decent idea to have separate database users in this kind of situation; it will let you worry less about queries from one user's client inadvertently (or perhaps maliciously) modifying another's (client account should only have permissions to access their own schema.) It would probably be a good idea to keep the code for the "master" coordination (and connection) somewhat segregated from the client code base to prevent accidental leaking of access to that database into the client code; even if encrypted you probably don't want them to even have any more access than necessary to your client connection info.
I did something like this not long ago. It sounds like you're trying to build some kind of one-database-per-tenant multi-tenant system.
Storing encrypted credentials in a directory database is fine, since there's really no fundamentally different way to do it. At some point, you need to worry about storing some secret (your encryption key) no matter what you do.
In my use case, I was able to get away with a setup where the directory just mapped tenants to db-hosts. The database name and credentials for each tenant were derived from the tenant's identifier (a string). So something like, given a TenantID T:
host = whatever the directory says.
dbname = "db_" + T
dbuser = T
dbpass = sha1("some secret string" + T)
From a security standpoint, this is no better (actually a bit worse) than storing AES encrypted credentials in the directory database, since if someone owns your app server, they can learn everything either way. But it's pretty good, and easy to implement.
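A runnable version of that derivation might look like the following. One deliberate change from the pseudocode above: it uses HMAC-SHA256 from the standard library instead of `sha1(secret + T)`, which is a substitution on my part, not what the original setup used. The host name and tenant name are hypothetical.

```python
import hashlib
import hmac


def tenant_credentials(tenant_id, host, secret):
    """Derive database name, user and password from the tenant identifier."""
    # Swapped-in technique: HMAC-SHA256 keyed with the app secret,
    # in place of sha1("some secret string" + T).
    password = hmac.new(secret, tenant_id.encode(), hashlib.sha256).hexdigest()
    return {
        "host": host,                   # whatever the directory says
        "dbname": "db_" + tenant_id,
        "dbuser": tenant_id,
        "dbpass": password,
    }


creds = tenant_credentials("acme", host="db1.example.internal", secret=b"some secret string")
print(creds["dbname"], creds["dbuser"])
```

As noted above, anyone who obtains the secret can recompute every tenant's password, so the secret needs the same protection an encryption key would.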
This is also nice because you can think about extending the idea a bit and get rid of the directory server entirely and write some function that maps your tenant-id to one of N database hosts. That works great until you add or remove db servers, and then you need to handle shuffling things around. See how memcache works, for example.
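To make the "shuffling" problem concrete, here is a minimal sketch of a directory-free mapping using a stable hash modulo the number of hosts. The host names and tenant names are invented, and this is the naive scheme, not a recommendation.

```python
import hashlib


def host_for(tenant_id, hosts):
    """Map a tenant to one of N hosts with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]


def moved_tenants(tenants, old_hosts, new_hosts):
    """List tenants whose host changes when the host list changes."""
    return [t for t in tenants if host_for(t, old_hosts) != host_for(t, new_hosts)]


tenants = [f"tenant{i}" for i in range(100)]
three = ["db1", "db2", "db3"]
four = three + ["db4"]
print(len(moved_tenants(tenants, three, four)), "of", len(tenants), "tenants would move")
```

With plain modulo hashing, adding a fourth host typically remaps a large share of tenants. Consistent hashing, as used by many memcache clients, is the usual way to keep that share small.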
You can use Vault to do this in a much more systematic way. In fact, this is a strong use case for it.
Percona has already written a great blog on it,
I have a strong background in MySQL, and I am now starting with Oracle. But I find it really difficult to understand what a DATABASE is in Oracle, given that it uses several similar concepts which I am struggling to differentiate. In MySQL, there is a simple concept of "database" instead of a mixture of
SCHEMA concept (User's workspace logically divided by TABLESPACES)
TNS and SID/SERVICES concept
CONNECTION concept (in both ODBC definition and SQLDeveloper)
I won't ask a pure definition of them as I am still reading, but just some guidance on how can I map a mysql database in the closest Oracle concept.
This is the information I can give you coming from the perspective of a developer. I don't know huge amounts about Oracle, but I have done some fairly significant work with deploying to it for some applications that are now in production.
Database
A database, in Oracle terms, is a group of files that live on disk and are managed as a cohesive unit. The database contains almost everything: logins, roles, tables, indexes, temporary space, transaction logs, and so on. Creating one is a nontrivial task in Oracle. It basically requires direct access (as in SSH or Windows Remote Desktop) to the machine. It's common for a DBA to create one during installation and for that to be the only one the server ever hosts. Unlike in MySQL, PostgreSQL, and SQL Server, you can't really use this level for basic grouping. E.g., giving each developer their own database is uncommon because of the overhead in recreating it.
Schema
Oracle schemas conflate two purposes: users and namespaces.
Each schema is a user, and it can be associated with credentials (a password, a user in Active Directory). Note that all accounts are database specific; there is no way to create a user that can log into multiple databases (aside from pointing both databases at the same LDAP server or otherwise involving some external service).
The schema also acts as a namespace that contains objects (e.g., tables, views, procedures, and indexes), and the schema name can be used explicitly to qualify exactly which object you're trying to refer to. For example, if I say MYOWNER.MYTABLE, Oracle will look for the MYTABLE table owned by MYOWNER. If you need multiple copies of all the same objects, this is the easiest level to group them in, which makes them the best level for having per developer copies of the database.
It is common to divide the two concepts manually: a schema can be locked out of logging in, and permissions can be granted to another user on its objects. This is something of a hassle, though, since there's no way to grant permissions across an entire schema; each object must be granted explicitly to some user or role. There's also no way to force users to create objects in a specific schema besides their own; permissions can only be granted to either create objects in the user's own schema or globally in any schema.
Complete aside: in PostgreSQL and SQL Server, schemas are only namespaces, not users.
Tablespace
Tablespaces are sets of files on disk that contain everything you need to store, including both data and the metadata (such as table definitions). A single database can use multiple tablespaces, and different objects within a schema can even be on different tablespaces. A tablespace can be one or many files, but they're managed as one logical unit. Each schema has a default tablespace for its objects if a tablespace isn't specified when creating the object. Sharing them between databases is somewhere between impossible and unheard of.
In practice, it's common to not even bother with tablespaces and just leave the default configuration alone. The default is one tablespace named USERS with one file, and it's the default tablespace for all schemas in the database. If you change these at all, you usually set a default for each schema and then never think about it again until disk space becomes an issue.
Instance
You didn't ask about these specifically, but you'll need to understand them before we can talk about connecting to the database.
An instance is the actual process running on the server that listens for connections. Like databases, these require direct access to the database server to set up. You can have multiple or a single one on the server. It's common to have one per database.
An instance can be identified two ways: an SID or a service name. The SID identifies a single instance, while the service name is an alias that can refer to several instances. The details of how that works are usually unimportant; just know that you need to know of them to connect.
Connecting
To connect from a client, you need a connect descriptor. This is a jumbled string containing the host, port, and either SID or service name. They look like this, for example: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=myoracleserver)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orclservice))). They can get more complicated, but that's the basic form. To use an SID instead of a service name, you would replace SERVICE_NAME=orclservice with SID=orclinstance. There's also a newer, more compact format called "EZ connect" that looks like this instead: myoracleserver:1521/orclservice; it only supports the basic parameters.
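For illustration, the two forms can be assembled from their parts like this. The helper names are made up, and the sketch only covers the basic host/port/service-or-SID form shown above.

```python
def connect_descriptor(host, port, service_name=None, sid=None):
    """Build a basic Oracle connect descriptor for a service name or an SID."""
    if (service_name is None) == (sid is None):
        raise ValueError("give exactly one of service_name or sid")
    target = f"SERVICE_NAME={service_name}" if service_name else f"SID={sid}"
    return (
        f"(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
        f"(CONNECT_DATA=(SERVER=DEDICATED)({target})))"
    )


def ez_connect(host, port, service_name):
    """Build the compact EZ connect form: host:port/service."""
    return f"{host}:{port}/{service_name}"


print(connect_descriptor("myoracleserver", 1521, service_name="orclservice"))
print(ez_connect("myoracleserver", 1521, "orclservice"))
```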
TNS is short for "Transparent Network Substrate," and it consists of the entire networking stack that is used to communicate with the database. You virtually never need to concern yourself with it as a whole.
What you encounter often is TNS names. TNS names are aliases to the connect descriptors. They're stored in a plain text file on the client machine, and they're typically global to the entire machine. Here's an example mapping that you might find in the file: mydatabase=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=myoracleserver)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SID=orcl))). In my experience, most of the time you can actually avoid bothering with TNS names entirely and just use the connect descriptor directly.
A connect identifier is anything that can stand in for a connect descriptor. It can be a full connect descriptor, an EZ connect descriptor, a TNS name, or several other things. But generally, they identify a server and the particular database on it that you want to connect to.
With all that in mind, connections become a little more straightforward. Conceptually, they're pretty much the same as in other databases. The thing that might be confusing about them is that you connect as a schema, as described earlier. The "username" is the schema name, and the schema can have a password or some other form of authentication associated with it. The connection string differs according to the client software, much like in any other database. For SQL*Plus (Oracle's command line client), connection strings look like this: [USERNAME]/[PASSWORD]@[connect identifier]. So if your user is MY_SCHEMA, the password is PASS, and the server is like above, it might look like
MY_SCHEMA/PASS@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=myoracleserver)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SID=orclinstance)))
For a .NET application, it might look like
Data Source=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=myoracleserver)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SID=orcl)));User Id=MY_SCHEMA;Password=PASS
which is pretty similar to any other database. Note that anywhere you see that nasty server information, you could replace it with any connect identifier (such as a TNS name).
As far as SQL Developer is concerned, a "connection" is really just a saved connection string. ODBC connects like any other database; you just need the right connection string and drivers.
Drivers
The drivers can be a pain point in Oracle, depending on language. I believe Java has some decent stand alone clients, but other languages generally depend on the binary version. The binary version does have an installer that puts the binaries on PATH, but the installer is pretty difficult to use and best avoided. When I can, I avoid installing the client and make use of what's called "instant client". Usually, if you can get the instant client binaries in a place where the app can find them, they just work. If not, then it's preferable to just prepend PATH in memory for your application than to modify it globally for your machine.
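Prepending PATH in memory can be as small as the sketch below, shown in Python. The `instantclient` directory is a hypothetical location, and whether a given driver actually resolves its binaries through PATH depends on the platform and the driver, so treat this as an illustration of the idea only.

```python
import os


def prepend_to_path(directory):
    """Put `directory` first on PATH for this process (and its children) only."""
    current = os.environ.get("PATH", "")
    os.environ["PATH"] = directory + os.pathsep + current if current else directory


# Hypothetical location of the unpacked instant client binaries.
prepend_to_path(os.path.join(os.getcwd(), "instantclient"))
print(os.environ["PATH"].split(os.pathsep)[0])
```

The change lives only in the running process's environment, so nothing on the machine is modified globally.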
If you happen to be developing using .NET, use the ODP.NET provider on NuGet from Oracle. It's written in full .NET, eliminating the need to deal with native binaries.
Summary
So in short:
A database is part of the server set up
A schema is both a user and how you divide your database
A tablespace is the physical files that hold the database
TNS names are just a naming convenience on the client side
SID/Service Name are just names used when connecting
I find this arrangement far too complex, personally.
I am working on a site that multiple projects will be using to enter confidential subject information for various research projects. Project data access will be limited to specific users and tools. But certain core data will be referenced in and joined to the project tables (username, project meta-data, etc). The current plan is that each project will have MySQL users with any combination of Select, Update, or Insert rights as needed, plus an overall project Administrator user that can alter the shape of the project's tables and that will only be used in phpMyAdmin. We are using a Database object with some backtrace logic to determine what object passed it connection credentials, and it will only allow that connection to be used by the originating object (not impossible for a dedicated programmer to get around, but doing so would throw up red flags in code review). And we are following the standard procedure of moving the config out of the web root and keeping all credentials in config files instead of code. Of course there is an overall administrator, but that account has many access rules and its password is ludicrously long (we have a static YubiKey + 10 char password).
What I want to know is whether to separate project data out into their own databases, or whether I should put them in tables that have access limited to certain accounts. Setting user permissions at the database or table level seems to be about equal in difficulty. There will be joins and other such operations between the core tables (meta-data usually) and the protected data. Joining across databases on the same server works fine, but I am uncertain about how the performance of intra-database joins compares to inter-database joins.
It doesn't matter if you put them in the same database or in different ones. You can implement a good (or a bad) security concept with both alternatives.
If you are using one database and you put data for different users in one table, you will have to implement a lot of the access control in your application.
If you have separated the data completely into different tables (or even databases), you can easily use the access control of MySQL. In this case I would go with separate databases, because it is more convenient when setting up a backup system or if you want to scale your application over more than one machine. But since you want to join across different databases, you are going to lose some of these advantages, so it doesn't really matter.
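As a rough sketch of leaning on MySQL's own access control with one database per project, the helper below builds the GRANT statements for a project user. The database names, user names, and host are hypothetical. Identifiers are assumed to come from trusted configuration, since the sketch does no escaping.

```python
ALLOWED = {"SELECT", "INSERT", "UPDATE"}


def grant_statement(database, user, privileges, host="localhost"):
    """Build a MySQL GRANT limited to one project database."""
    wanted = [p.upper() for p in privileges]
    unknown = set(wanted) - ALLOWED
    if unknown:
        raise ValueError(f"privileges not allowed here: {sorted(unknown)}")
    return f"GRANT {', '.join(wanted)} ON `{database}`.* TO '{user}'@'{host}';"


print(grant_statement("project_alpha", "alpha_reader", ["select"]))
print(grant_statement("project_alpha", "alpha_entry", ["select", "insert"]))
```

A user that also needs to join against the core tables would need a separate SELECT grant on that database.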