We have a MySQL database and would like to have row-level security implemented at the database level. I have been playing with the Veil plug-in for PostgreSQL and like what it does. Is there something similar for MySQL so we do not have to convert over to PostgreSQL?
Update
It isn't so much that we would be using Veil, or its MySQL equivalent, for authentication, but to determine which rows to display for an already authenticated user. User privileges are based on a relational context. Without concerning ourselves with a plug-in, how efficient would a view be where the user privilege is based on multiple joins on a table with 100k rows? The ultimate goal is to be able to display different data to two different users, using the same query, based on each user's privileges to the rows in a table of more than 100k rows.
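To make the view idea concrete, here is a minimal sketch of it, run against an in-memory SQLite database from Python so it is self-contained. The table names (`documents`, `privileges`, `current_session`) and the view `visible_documents` are hypothetical, and the `current_session` table is only a stand-in for "who is connected". In MySQL the view could presumably compare against the connected account (for example via `CURRENT_USER()`) instead; that is an assumption about the deployment, not something tested here.

```python
import sqlite3


def build_demo():
    """Create a small schema where a view filters rows by privilege."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE documents (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        -- One row per (user, document) pair the user may see.
        CREATE TABLE privileges (
            user_id INTEGER NOT NULL,
            document_id INTEGER NOT NULL REFERENCES documents(id),
            PRIMARY KEY (user_id, document_id)
        );
        -- Hypothetical stand-in for the identity of the connected user.
        CREATE TABLE current_session (user_id INTEGER NOT NULL);
        -- The same query against this view returns different rows per user.
        CREATE VIEW visible_documents AS
            SELECT d.id, d.title
            FROM documents AS d
            JOIN privileges AS p ON p.document_id = d.id
            JOIN current_session AS s ON s.user_id = p.user_id;
        INSERT INTO documents VALUES (1, 'Budget'), (2, 'Roadmap'), (3, 'Payroll');
        INSERT INTO privileges VALUES (1, 1), (1, 2), (2, 2), (2, 3);
        """
    )
    return conn


def titles_for(conn, user_id):
    """Set the session user, then run the one shared query."""
    conn.execute("DELETE FROM current_session")
    conn.execute("INSERT INTO current_session (user_id) VALUES (?)", (user_id,))
    rows = conn.execute("SELECT title FROM visible_documents ORDER BY id").fetchall()
    return [row[0] for row in rows]
```

With this data, `titles_for(conn, 1)` returns `['Budget', 'Roadmap']` and `titles_for(conn, 2)` returns `['Roadmap', 'Payroll']`. The composite primary key on `privileges` doubles as the index the join needs; whether that stays cheap at 100k+ rows with several joins is something that would have to be measured on the real schema.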
This isn't a common feature, mainly because in most cases the database is not the right place for this kind of security system. If you could provide more details about the exact attack you are trying to defend against, perhaps there is a more suitable security system to fill this requirement.
Usually you are looking to limit access to a specific user based on an Access Control List implementation that your application dictates. There are cases where you want two applications to share the same data and you want to limit the impact of a compromise of one of the applications. In this case you could split it up and give one application read access to a database while the other has write or read/write access. Using the database's native access control like this, it is possible to safely pass information between applications. The main threat being defended against is that, if one application were compromised through a vulnerability like SQL injection, both databases would succumb to the attack.
There is also sepgsql, which also does this for PostgreSQL. This security system could be used for better separation of applications with some dependent data, but this is a very unusual software requirement. In general this security system should be avoided in favor of other more common and proven systems.
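As a rough illustration of the read versus read/write split, the sketch below uses SQLite from Python, where a connection opened with `mode=ro` plays the part of the read-only application. In MySQL the equivalent split would come from the account's privileges (one account granted only `SELECT`, the other `SELECT, INSERT, UPDATE`). The file name, table name and helper names are made up for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def make_shared_db():
    """Create a database file that both 'applications' share."""
    path = Path(tempfile.mkdtemp()) / "shared.db"
    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
    writer.execute("INSERT INTO messages (body) VALUES ('hello')")
    writer.commit()
    writer.close()
    return path


def open_read_only(path):
    """Connection for the application that should only ever read."""
    return sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)


def try_write(conn):
    """Attempt an insert; report whether the database allowed it."""
    try:
        conn.execute("INSERT INTO messages (body) VALUES ('injected')")
        conn.commit()
        return True
    except sqlite3.OperationalError:
        # The read-only connection ends up here: the write is refused
        # by the database itself, not by application code.
        conn.rollback()
        return False
```

The point of the design is that even if the read-only application is tricked into issuing a write (through injected SQL, say), the database refuses it.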
Related
I am developing a site that has many subdomains in it.
It has blogging module, management system, and many more. I have shared this question in various sites but couldn't get a proper reply.
The question is: should I use one database for all the modules? This means my database would have nearly 100 tables. Is this approach appropriate, or should I create a separate database for every module?
Well, it does not really matter.
If you use InnoDB with a single data file (the innodb_file_per_table setting is not enabled), then all data will be stored in a single file anyway.
With InnoDB's separate file per table mode or with the MyISAM table engine, the only difference between one or multiple databases is really the directory where the database files are stored. Unless the directories (databases) are located on different storage devices with different speeds, their performance will be the same.
There can be 2 reasons to keep some tables in a different database:
Security: MySQL versions before 8.0 do not support role-based access control. On those versions, if there is a group of tables that should be accessible by a certain group of users only, access control is more manageable if those tables are in a different database.
If some of the modules you mentioned happen to use the same table name, then you will have to move them to a separate database or you need to modify the code and table names to avoid errors.
There is no right or wrong way to design a system. Just advantages and disadvantages to the various techniques. I normally work in Oracle and SQL Server so I had to look up some terms for MySQL. According to my research, in MySQL a database is synonymous with a schema which changes things. I'd consider these things when planning the physical design for any vendor:
Security - Do all subdomains need read/write to each other? How are the users secured? Choosing one or many schemas can impact how easy schema and user security is to manage and control.
Growth - Do some subdomains grow at a faster rate than others? If yes, I'd consider separating them to allow for the different growth rates.
Organization - Is it easier to identify the different subdomains in practice if they're separated? If you don't separate them, use a strong naming convention so you can easily identify objects within one subdomain.
Linking - How easy is it to access one schema/database from another?
Hope this helps.
Scenario:
Building a commercial app consisting of a RESTful backend in Symfony2 and a frontend in AngularJS
This app will never be used by many customers (if I get to sell 100 that would be fantastic; hopefully many more, but in any case it will not be massive)
I want to have a multi tenant structure for the database with one schema per customer (they store sensitive information for their customers)
I'm aware of the problems when updating schemas but I will have to live with them.
Today I have a MySQL demo database that I will clone each time a new customer purchases the app.
There is no relationship between my customers, so I don't need to communicate with multiple shards for any query
For one customer, they can be using the app from several devices at the same time, but there won't be massive write operations in the db
My question
Trying to set some functional tests for the backend API I read about having a dedicated sqlite database for loading testing data, which seems to be good idea.
However I wonder if it's also a good idea to switch from MySQL to SQLite3 database as my main database support for the application, and if it's a common practice to have one dedicated SQLite3 database PER CLIENT. I've never used SQLite and I have no idea if the process of updating a schema and replicate the changes in all the databases is done in the same way as for other RDBMS
Is this a correct scenario for SQLite?
Any suggestion (aka tutorial) on how to achieve this?
[I wonder] if it's a common practice to have one dedicated SQLite3 database PER CLIENT
Only if the database is deployed along with the application, like on a phone. Otherwise I've never heard of such a thing.
I've never used SQLite and I have no idea if the process of updating a schema and replicate the changes in all the databases is done in the same way as for other RDBMS
SQLite is a SQL database and responds to ALTER TABLE and the like. As for updating all the schemas, you'll have to re-run the update for all schemas.
Schema syncing is usually handled by an outside utility; usually your ORM will have something. Some are server-agnostic, some only support specific servers. There are also dedicated database change management tools such as Sqitch.
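To show what "re-run the update for all schemas" amounts to with one SQLite file per client, here is a minimal sketch. The `customers` table, the added `phone` column and the helper names are invented for the example. A real setup would more likely lean on the ORM's migration tooling or a tool like Sqitch than on a hand-written loop.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema change to roll out to every client database.
MIGRATION = "ALTER TABLE customers ADD COLUMN phone TEXT"


def create_tenant_db(folder, tenant):
    """Create one database file for a client, cloned from a fixed schema."""
    path = folder / f"{tenant}.db"
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
        conn.commit()
    finally:
        conn.close()
    return path


def migrate_all(paths, statement):
    """Run the same DDL statement against every client database in turn."""
    migrated = []
    for path in paths:
        conn = sqlite3.connect(path)
        try:
            conn.execute(statement)
            conn.commit()
            migrated.append(path.name)
        finally:
            conn.close()
    return migrated


def column_names(path):
    """List the columns of the customers table in one client database."""
    conn = sqlite3.connect(path)
    try:
        return [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
    finally:
        conn.close()
```

Note that the loop has no notion of which databases were already migrated, so running it twice would fail on the second `ALTER TABLE`. That bookkeeping is exactly what the change management tools add.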
However I wonder if it's also a good idea to switch from MySQL to SQLite3 database as my main database support for the application, and
SQLite's main advantage is not requiring you to install and run a server. That makes sense for quick projects or where you have to deploy the database, like a phone app. For a server-based application there's no problem having a database server, and SQLite's very restricted set of SQL features becomes a disadvantage. It will also likely run slower than a server database for anything but the simplest queries.
Trying to set some functional tests for the backend API I read about having a dedicated sqlite database for loading testing data, which seems to be good idea.
Under no circumstances should you test with a different database than the production database. Databases do not all implement SQL the same way; MySQL is particularly bad about this, and your tests will not reflect reality. Running a MySQL instance for testing is not much work.
This separate schema thing claims three advantages...
Extensibility (you can add fields whenever you like)
Security (a query cannot accidentally show data for the wrong tenant)
Parallel Scaling (you can potentially split each schema onto a different server)
What they're proposing is equivalent to having a separate, customized copy of the code for every tenant. You wouldn't do that; it's obviously a maintenance nightmare. Code at least has the advantage of version control systems with branching and merging. I know of only one database management tool that supports branching, Sqitch.
Let's imagine you've made a custom change to tenant 5's schema. Now you have a general schema change you'd like to apply to all of them. What if the change to 5 conflicts with this? What if the change to 5 requires special data migration different from everybody else? Now let's imagine you've made custom changes to ten schemas. A hundred. A thousand? Nightmare.
Different schemas will require different queries. The application will have to know which schema each tenant is using, there will have to be some sort of schema version map you'll need to maintain. And every different possible query for every different possible schema will have to be maintained in the application code. Nightmare.
Yes, putting each tenant in a separate schema is more secure, but that only protects against writing bad queries or including a query builder (which is a bad idea anyway). There are better ways to mitigate the problem, such as the view filter suggested in the docs. There are many other ways an attacker can access tenant data that this doesn't address: gain a database connection, gain access to the filesystem, sniff network traffic. I don't see the small security gain being worth the maintenance nightmare.
As for scaling, the article is ten years out of date. There are far, far better ways to achieve parallel scaling than to coarsely put schemas on different servers. There are entire databases dedicated to this idea. Fortunately, you don't need any of this! Scaling won't be a problem for you until you have tens of thousands to millions of tenants. The idea of front loading your design with a schema maintenance nightmare for a hypothetical big parallel scaling problem is putting the cart so far before the horse, it's already at the pub having a pint.
If you want to use a relational database I would recommend PostgreSQL. It has a very rich SQL implementation, it's fast and scales well, and it has something that renders this whole idea of separate schemas moot: a built-in JSON type. This can be used to implement the "extensibility" mentioned in the article. Each table can have a meta column using the JSON type that you can throw any extra data you like into. The application does not need special queries; the meta column is always there. PostgreSQL's JSON operators make working with the meta data very easy and efficient.
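Here is a rough sketch of the meta column idea. To keep it runnable without a server it uses SQLite with a plain TEXT column and Python's `json` module, so the JSON is decoded in the application. In PostgreSQL the column would be of type JSON or JSONB and the lookup could be pushed into SQL with operators such as `->>`. The `customers` table and the extra fields (`loyalty_tier`, `vat_number`) are invented for the example.

```python
import json
import sqlite3


def build_demo():
    """One shared table for all tenants, with a catch-all meta column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            tenant_id INTEGER NOT NULL,
            name TEXT NOT NULL,
            meta TEXT NOT NULL DEFAULT '{}'  -- JSON document with tenant-specific extras
        )
        """
    )
    return conn


def add_customer(conn, tenant_id, name, **extra):
    """Insert a row; any tenant-specific fields go into the meta column."""
    cur = conn.execute(
        "INSERT INTO customers (tenant_id, name, meta) VALUES (?, ?, ?)",
        (tenant_id, name, json.dumps(extra)),
    )
    return cur.lastrowid


def get_meta(conn, customer_id, key, default=None):
    """Read one extra field; absent keys fall back to the default."""
    row = conn.execute(
        "SELECT meta FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return json.loads(row[0]).get(key, default)
```

Every tenant shares the same table definition and the same queries; only the contents of `meta` differ, so no per-tenant schema has to be maintained.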
You could also look into a NoSQL database. There are plenty to choose from and many support custom schemas and parallel scaling. However, it's likely you will have to change your choice of framework to use one that supports NoSQL.
It seems that MySQL provides some sort of API interface to it. I have never used it, but I think it would be an interesting feature if I could:
Specify which tables a user has access to
Restrict read, update, and delete operations only to records the users created (so an ownership concept needs to be supported)
Will the API support that? If not, are there any other solutions that might allow me to do so?
As the author of mysql-crud-api I can say that I understand your question. However, the permission rules regarding tables and/or users are application-specific and would thus need to be configurable.
You may want to read about multitenancy. You may want to support multiple users, but the ownership and permissions may vary between applications. That is why I think the tool you are looking for does not exist.
In order to support multitenancy mysql-crud-api supports a multi-database mode. Using MySQL's built-in permission system you can use it to partition the database.
Not sure this helps you, as I do not know what you want to use it for.
I've used a MySQL API for Python and it lets you directly interact with your databases by name (an empty string in the database name parameter will get you to your root database where you can create databases, grant all permissions, etc). You are able to execute SQL queries directly on the database.
SQL is an API to a database; there are numerous other interfaces supported as well (either that compile into SQL or that use some other API).
Your question appears to be more about row-level permissioning than about a particular API. MySQL does not have a built-in permissioning system at the row level. A quick glance at the web (Google: "MySQL row level permissions") yields hits such as this.
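Since the database will not do it for you, the ownership rule from the question usually ends up in the application's data access layer. A minimal sketch, using SQLite from Python so it runs on its own: the `notes` table, the `owner_id` column and the function names are hypothetical, and the same `WHERE ... AND owner_id = ?` pattern would apply unchanged to MySQL.

```python
import sqlite3


def build_demo():
    """Create a table in which every row records the user who created it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE notes (
            id INTEGER PRIMARY KEY,
            owner_id INTEGER NOT NULL,
            body TEXT NOT NULL
        );
        INSERT INTO notes (id, owner_id, body)
        VALUES (1, 10, 'alice note'), (2, 20, 'bob note');
        """
    )
    return conn


def read_notes(conn, user_id):
    """Return only the rows the given user created."""
    return conn.execute(
        "SELECT id, body FROM notes WHERE owner_id = ? ORDER BY id", (user_id,)
    ).fetchall()


def update_note(conn, user_id, note_id, body):
    """Update succeeds only when the row belongs to the user."""
    cur = conn.execute(
        "UPDATE notes SET body = ? WHERE id = ? AND owner_id = ?",
        (body, note_id, user_id),
    )
    return cur.rowcount == 1


def delete_note(conn, user_id, note_id):
    """Delete succeeds only when the row belongs to the user."""
    cur = conn.execute(
        "DELETE FROM notes WHERE id = ? AND owner_id = ?", (note_id, user_id)
    )
    return cur.rowcount == 1
```

The weakness is the usual one for application-level checks: any query that forgets the `owner_id` condition bypasses the rule, which is why people go looking for a database-level feature in the first place.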
This is more of a conceptual question so variations on the stack are welcome should they be capable of accomplishing the same concept. We're currently on MySQL and expanding some services out into MongoDB.
The idea is that we would like to be able to manage a single physical database schema/structure so that adjustments, expansions etc. don't become overly cumbersome as the number of clients utilizing the structure grows into the thousands, tens of thousands, hundreds of thousands, etc. However, we would like to segregate their data at this level rather than simply at the application layer, to provide a more rigid separation. Is it possible to create virtual bins for each client using the same structure, but have their data structurally separated from one another?
The normal way would obviously be adding Client Keys to every row of data either directly or via foreign relationships, but given that we can't foresee with 20/20 how hacks on our system might occur allowing "cross client" data retrieval, I wanted to go a little further to embed the separation at a virtually structural level.
I've also read another post here: MySQL: how to do row-level security (like Oracle's Virtual Private Database)? which uses "views" as a method but this seems to become more work the larger the list of clients.
Thanks!
---- EDIT ----
Based on some of the literature suggested below, here's a little more info on our intent:
The closest of the three situations outlined in the MSDN article provided by #Stennie would be a single database with multiple schemas. The difference is that we're not interested in customizing client schemas after their creation; we would actually prefer they remain locked to the parent/master schema.
Ideally the solution would keep each schema linked to the parent table-set structure rather than simply duplicating it with the hope that any change to the parent or master schema would be cascaded across all client/tenant schemas.
Taking it a step further, in a cluster we could have a single master with the master schema, and each slave replicating from it but with a sharded set of tenants. Changes to the master could then be filtered down through the cluster without interruption and would maintain consistency across all instances, also allowing us to update the application layer faster knowing that all DBs are compatible with the updated schemas.
Hope that makes sense, I'm still a little fresh at this level.
There are a few common infrastructure approaches ranging from "share nothing" (aka multi-instance) to "share everything" (aka multi-tenant).
For example, a straightforward approach to your "virtual bins" would be to allocate a database per client using shared database servers. This is somewhere in between the two sharing extremes, as your customers would be sharing database server infrastructure but keeping their data and schema separate.
A database-per-client approach would allow you to:
manage authentication and access per client using the database's authentication & access controls
support different database software (you mention using both MySQL which supports views, and MongoDB which does not)
more easily backup and restore data per client
avoid potential cross-client leakage at a database level
avoid excessive table growth and related management issues for a single massive database
Some potential downsides would include:
having more databases to manage
in the case of a database where you want to enforce certain schema (e.g. MySQL) you will need to apply the schema changes across all your databases or support some form of versioning
in the case of a database which preallocates storage (e.g. MongoDB) you may use more storage per client (particularly if your actual data size is small)
you may run into limits on namespaces or open files
you still have to worry about application and data security :)
If you do some research on multi-tenancy you will find some other solutions ranging from this example (isolated DB per client on shared database server architecture) through to more complex partitioned data schemes.
This Microsoft article includes a useful overview of approaches and considerations: Multi-tenant SaaS database tenancy patterns.
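On the "apply the schema changes across all your databases or support some form of versioning" downside, one common shape for the versioning is to keep an ordered list of migrations and record in each client database how many have been applied. The sketch below does this with SQLite's `PRAGMA user_version` from Python so that it is runnable as-is. In MySQL there is no such pragma, so a small bookkeeping table would have to play that role (an assumption, not shown here). The migration statements themselves are made up.

```python
import sqlite3

# Ordered, append-only list of schema changes (hypothetical examples).
MIGRATIONS = [
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE customers ADD COLUMN email TEXT",
]


def schema_version(conn):
    """Number of migrations already applied to this client database."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def upgrade(conn):
    """Apply only the migrations this database has not seen yet."""
    current = schema_version(conn)
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        # PRAGMA does not accept bound parameters; version is always an int.
        conn.execute(f"PRAGMA user_version = {version:d}")
    conn.commit()
    return schema_version(conn)
```

Running `upgrade` on every client database brings old and new ones to the same version, and running it again is a no-op, which keeps all tenants locked to one master schema.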
I am working on a site that multiple projects will use to enter confidential subject information for various research projects. Project data access will be limited to specific users and tools, but certain core data will be referenced in and joined to the project tables (username, project meta-data, etc). The current plan is that each project will have MySQL users with any combination of SELECT, UPDATE, or INSERT rights as needed, plus an overall project administrator user that can alter the shape of the project's tables and will only be used in phpMyAdmin. We are using a Database object with some backtrace logic to determine what object passed it connection credentials, and it will only allow that connection to be used by the originating object (not impossible to get around for a dedicated programmer, but it would throw up red flags in code review). And we are following the standard procedure of moving the config out of the web root and keeping all credentials in config files instead of code. Of course there is an overall administrator, but that account has many access rules and its password is ludicrously long (we have a static YubiKey + 10 char password).
What I want to know is whether to separate project data out into their own databases, or put them in tables that have access limited to certain accounts. Setting user permissions at the database or table level seems to be about equal in difficulty. There will be joins and other such operations between the core tables (meta-data usually) and the protected data. Joining across databases on the same server works fine, but I am uncertain how the performance of intra-database joins compares to that of inter-database joins.
It doesn't matter if you put them in the same database or in different ones. You can implement a good (or a bad) security concept with both alternatives.
If you are using one database and you put data for different users in one table, you will have to implement a lot of the access control in your application.
If you have separated the data completely into different tables (or even databases), you can easily use the access control of MySQL. In this case I would go with separate databases, because it is more convenient when setting up a backup system or if you want to scale your application over more than one machine. But since you want to join across different databases, you are going to lose some of these advantages, so it doesn't really matter.
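For reference, this is roughly what a join between a core database and a per-project database looks like. The sketch uses SQLite from Python, where a second database has to be attached to the connection first. In MySQL no attach step is needed: you qualify each table with its database name, as long as the connected account has rights on both. The names `project_a`, `users` and `subjects` are hypothetical, and the example says nothing about MySQL's performance for such joins.

```python
import sqlite3


def build_demo():
    """Core data in the main database, project data in a second one."""
    conn = sqlite3.connect(":memory:")
    # Attach a separate database under the schema name "project_a".
    conn.execute("ATTACH DATABASE ':memory:' AS project_a")
    conn.execute("CREATE TABLE main.users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.execute(
        "CREATE TABLE project_a.subjects ("
        "id INTEGER PRIMARY KEY, entered_by INTEGER, code TEXT)"
    )
    conn.execute("INSERT INTO main.users VALUES (1, 'kim'), (2, 'lee')")
    conn.execute(
        "INSERT INTO project_a.subjects VALUES (100, 1, 'S-001'), (101, 2, 'S-002')"
    )
    conn.commit()
    return conn


def subjects_with_usernames(conn):
    """Join protected project rows to the shared core table."""
    return conn.execute(
        """
        SELECT s.code, u.username
        FROM project_a.subjects AS s
        JOIN main.users AS u ON u.id = s.entered_by
        ORDER BY s.id
        """
    ).fetchall()
```

The query itself reads the same whether the two tables sit in one database or two; only the qualifying prefix changes.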