I'd like to implement a database-driven access control system. I've been reading about ACL, roles, RBAC, etc., but it seems like the most common schemes have some major drawbacks. RBAC, for example, seems to be clunky when it comes to implementing fine-grained access control (for example, allowing a certain role to update only particular columns of a particular record).
What if I structured my access control list like this:
| role | table | action | columns | conditions |
| ----- | ----- | ------ | -------- | ----------------- |
| user | user | view | name, id | self.id = user.id |
| user | user | update | password | self.id = user.id |
| admin | user | update | * | |
| admin | user | create | * | |
| admin | user | delete | * | |
The idea is that a user's role(s) would be checked against this table when they try to access the database (so it's implemented at the model level). action can be any one of {create, view, update, delete, list}. The self scope would be a reserved keyword referencing the current user's properties. This would allow us, for example, to let users update only their own passwords (and not someone else's).
Is this robust? Obviously I would still need a separate list to control access to other types of resources like URIs, etc.
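To make the idea concrete, here is a rough sketch of how such a rule table could be evaluated at the model layer. All of the names (Rule, can, the boolean modelling of the self.id = user.id condition) are illustrative assumptions, not an established API:

```python
# Sketch: evaluating the proposed rule table at the model layer.
# Rule, can(), and the condition handling are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Rule:
    role: str
    table: str
    action: str
    columns: set           # {"*"} means all columns
    self_condition: bool   # True models "self.id = user.id"

RULES = [
    Rule("user",  "user", "view",   {"name", "id"}, True),
    Rule("user",  "user", "update", {"password"},   True),
    Rule("admin", "user", "update", {"*"},          False),
    Rule("admin", "user", "create", {"*"},          False),
    Rule("admin", "user", "delete", {"*"},          False),
]

def can(user_roles, user_id, table, action, columns, target_owner_id):
    """Return True if any rule grants `action` on `columns` of `table`."""
    for r in RULES:
        if r.role not in user_roles or r.table != table or r.action != action:
            continue
        if "*" not in r.columns and not set(columns) <= r.columns:
            continue
        # "self" scope: the row being touched must belong to the current user
        if r.self_condition and target_owner_id != user_id:
            continue
        return True
    return False

# A user may update their own password...
print(can({"user"}, 7, "user", "update", ["password"], 7))   # True
# ...but not someone else's, and not other columns.
print(can({"user"}, 7, "user", "update", ["password"], 9))   # False
print(can({"user"}, 7, "user", "update", ["email"], 7))      # False
```

In a real system the conditions column would need a small expression language rather than a boolean flag, which is where much of the complexity hides.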
Great question. You are hitting the limitations of ACLs and RBAC. There is a more flexible alternative called attribute-based access control (ABAC).
Access control has evolved over time to cater to more complex scenarios (more users, more data, more devices, more context).
More specifically, you are struggling with the fact that RBAC doesn't support relationships; ABAC, however, does. ABAC is based on attributes. An attribute is just a key-value pair, e.g. role == manager or location == Arizona.
ABAC uses policies with attributes to express authorization scenarios. For instance, in ABAC you can express scenarios such as:
A user with the role == doctor can do the action == view on a resource of type == medical record if the doctor location == the patient location.
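That condition can't be expressed as a plain role check, which is the point. A minimal sketch of the attribute comparison (plain Python, not XACML syntax; all names are illustrative):

```python
# Sketch of an ABAC check: attributes on user, action, and resource,
# plus a condition relating a user attribute to a resource attribute.
def abac_permit(user, action, resource):
    return (
        user.get("role") == "doctor"
        and action == "view"
        and resource.get("type") == "medical record"
        # the relationship RBAC can't express: doctor location == patient location
        and user.get("location") == resource.get("location")
    )

doctor = {"role": "doctor", "location": "Arizona"}
record = {"type": "medical record", "location": "Arizona"}
print(abac_permit(doctor, "view", record))   # True

record_elsewhere = {"type": "medical record", "location": "Ohio"}
print(abac_permit(doctor, "view", record_elsewhere))  # False
```

A real ABAC engine externalizes these policies instead of hard-coding them, but the attribute comparison is the core idea.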
There is a standard called XACML (eXtensible Access Control Markup Language) which you can use to implement ABAC. There are even products that offer XACML specifically for databases and data access control such as the Axiomatics Data Access Filter.
If you want to learn more about ABAC, I recommend two great resources:
NIST: Guide to Attribute Based Access Control (ABAC) Definition and Considerations (pdf)
Webinar on the NIST document.
I have a MySQL database with a user table. After a new requirement, I had to create a new table called social_media, and from now on every new user is created with a social_media_id column that holds a reference to their social media.
+===================+ +===================+
| user | | social_media |
+===================+ +===================+
|PK id | |PK id |
|FK social_media_id | | instagram |
| first_name | | facebook |
| last_name | | twitter |
| email | +===================+
+===================+
I want to update my database so that every user who didn't have a social media reference before gets one (even if the values inside are null), so they can update them later if they wish. Is there something I can do to create a new social_media record for every user that doesn't have one, and set the correct social_media_id foreign key for that user?
Ok @Jorche, this is too long for a comment, but I do want to help.
First off, this is probably what your data structure should look like.
Second, telling you how to enter these records is very difficult for me at this moment because I have absolutely ZERO requirements or other business logic that would help me pinpoint the best approach. Odds are you would have to work hand in hand with application developers or ETL developers (that might even be you) to figure out what that approach is. Maybe it's a stored procedure that gets called; maybe it's a trigger. Hard to say for sure without additional context, ya know?
All we know at this point is that users exist and sometimes they have relational data related to social media entities. Your job is literally to understand that process flow and make the appropriate decisions on how to log that data in a way that makes sense from both an operational perspective and a database design perspective.
Hate to say it hombre, but the questions you have now are all entirely dependent on details you haven't provided.
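That said, if the only thing you need right now is the mechanical one-off backfill the question asks about, a rough sketch is possible. This demo uses SQLite for self-containment; the same insert-then-link idea works in MySQL, though the exact syntax differs, and all values here are made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE social_media (
    id INTEGER PRIMARY KEY,
    instagram TEXT, facebook TEXT, twitter TEXT
);
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    social_media_id INTEGER REFERENCES social_media(id),
    first_name TEXT
);
INSERT INTO user (id, social_media_id, first_name) VALUES
    (1, NULL, 'Ann'), (2, NULL, 'Bob');
""")

# Create one empty social_media row per user that lacks one,
# then point the user at it.
for (uid,) in con.execute(
        "SELECT id FROM user WHERE social_media_id IS NULL").fetchall():
    cur = con.execute("INSERT INTO social_media DEFAULT VALUES")
    con.execute("UPDATE user SET social_media_id = ? WHERE id = ?",
                (cur.lastrowid, uid))

print(con.execute(
    "SELECT COUNT(*) FROM user WHERE social_media_id IS NULL").fetchone()[0])
# 0
```

On a large MySQL table you would want a set-based INSERT ... SELECT instead of a row-by-row loop, but the per-row version makes the linking step explicit.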
I am hoping someone would be willing to take a look at this many-to-many relationship. This example is for a Laravel project, but the specifics shouldn't matter too much.
action
+----+------+--------+-------------+------+--------+------------+
| id | name | script | description | icon | custom | project_id |
+----+------+--------+-------------+------+--------+------------+
pipeline (action_server, the pivot table)
+----+-----------+-----------+-------+
| id | action_id | server_id | order |
+----+-----------+-----------+-------+
server
+----+------+------------+------------+
| id | name | ip_address | project_id |
+----+------+------------+------------+
This many-to-many relationship is used for a deployment pipeline:
An action is part of a deployment's pipeline.
An action can be executed on multiple servers.
A user can add an action with a custom script.
All the actions for a deployment pipeline can be fetched through a project_id.
This concept works within Laravel, and I could simply fetch my actions based on a given project_id. In turn, I could fetch the servers needed to run the deployment by using action->servers().
I need a way to add default actions though. Instead of actions always having a user supplied script, I want the ability to provide actions with pre-defined scripts for a user to select from and add to a deployment pipeline.
These pre-defined actions can't be stuffed in the action table because the actions defined there are tied to a project_id. These need to be generic.
I can't simply create another table for these pre-defined actions in my current setup because the action_id in my pipeline is already set up with a foreign key.
So far it feels like I am mixing two concepts: pre-defined actions and user-defined actions that users have created themselves. They need to be in the same pipeline and eventually run in the right order, though.
Any thoughts on how this might be achieved? I am open to all suggestions.
Edit
After drawing this out, it seems a possible solution would be to add another pivot table, action_project, which allows me to decouple (remove) the project_id from the action table. I am wondering how to keep this clean in Laravel, though.
action_project
+----+-----------+------------+
| id | action_id | project_id |
+----+-----------+------------+
Summarizing your problem in a conceptual way:
applications ("projects") have associated custom actions
standard actions are not defined for a specific application
servers have/host applications
pipelines define which "actions" to perform on which server in which order
I think what you need is simply a generalization of custom actions and standard actions, corresponding to a superclass "action" that subsumes both cases. This leads to the following tables:
actions(id, type, name, description) with type being either custom or standard
custom_actions(id, script, icon, custom, project_id)
Alternatively, you could append the attributes of custom_actions to actions and have them all NULL for standard actions.
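Sketched as DDL (SQLite here for a runnable demo; the table and column names follow the answer above, everything else is an assumption):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Superclass: every action, standard or custom, gets a row here,
-- so pipeline.action_id has a single foreign-key target.
CREATE TABLE actions (
    id INTEGER PRIMARY KEY,
    type TEXT CHECK (type IN ('standard', 'custom')),
    name TEXT,
    description TEXT
);
-- Subclass: only custom actions carry a script and a project.
CREATE TABLE custom_actions (
    id INTEGER PRIMARY KEY REFERENCES actions(id),
    script TEXT,
    icon TEXT,
    project_id INTEGER
);
CREATE TABLE pipeline (
    id INTEGER PRIMARY KEY,
    action_id INTEGER REFERENCES actions(id),
    server_id INTEGER,
    "order" INTEGER
);
""")
con.execute("INSERT INTO actions VALUES (1, 'standard', 'restart', NULL)")
con.execute("INSERT INTO actions VALUES (2, 'custom', 'deploy', NULL)")
con.execute("INSERT INTO custom_actions VALUES (2, 'run.sh', NULL, 42)")
# A pipeline can now mix standard and custom actions freely,
# because both satisfy the single foreign key into actions.
con.execute("INSERT INTO pipeline VALUES (1, 1, 10, 1)")
con.execute("INSERT INTO pipeline VALUES (2, 2, 10, 2)")
print(con.execute("SELECT COUNT(*) FROM pipeline").fetchone()[0])  # 2
```

The key property is that the pipeline's existing action_id foreign key is untouched; only the project coupling moves into the subclass (or the action_project pivot from the edit).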
I just watched Turning the database inside-out and noticed a similarity between Samza and Redux: all state consists of a stream of immutable objects.
This made me realize that if you edited the stream after-the-fact, you could in theory regenerate all materialized views based on the new list of transactions and in effect "undo" a past change to the database.
As an example, suppose I had the following series of diffs:
1. Add user "tom"
2. Add user "bob"
3. Delete user "bob"
4. Change user "tom"s name to "joe"
5. Add user "fred"
After this series of changes, our database looks like:
+-------+
| users |
+-------+
| joe |
| fred |
+-------+
Now what if I wanted to undo number "3"? Our new set of diffs would be:
1. Add user "tom"
2. Add user "bob"
4. Change user "tom"s name to "joe"
5. Add user "fred"
And our database:
+-------+
| users |
+-------+
| joe |
| bob |
| fred |
+-------+
While this sounds good in theory, can this actually be done using Samza, Storm, or Spark? Can any transaction-stream database do this? I'm interested in such functionality for administrative purposes. I have some sites where clients have accidentally deleted an employee or modified records they didn't mean to. In the past I solved this by creating a separate table which recorded all changes to the database, then when an issue arose I could (manually) look at this table, figure out what they did wrong, and (manually) fix the data.
It would be SO much cooler if I could just look at a transaction stream, remove the bad one, and say "regenerate the database"
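The replay idea itself is easy to sketch independently of any particular stream processor; the materialized view is just a fold over the diff stream, and "undo" is re-folding an edited stream. All names below are illustrative:

```python
# Sketch: a materialized view as a pure fold over a diff stream.
# "Undo" = drop a diff from the log and regenerate the view.

def materialize(diffs):
    """Fold a diff stream into the current users view."""
    users = []
    for d in diffs:
        if d[0] == "add":
            users.append(d[1])
        elif d[0] == "delete":
            users.remove(d[1])
        elif d[0] == "rename":
            users[users.index(d[1])] = d[2]
    return users

log = [
    ("add", "tom"),
    ("add", "bob"),
    ("delete", "bob"),          # the diff we want to undo
    ("rename", "tom", "joe"),
    ("add", "fred"),
]
print(materialize(log))                # ['joe', 'fred']
# Undo diff #3 by regenerating from the edited log:
print(materialize(log[:2] + log[3:]))  # ['joe', 'bob', 'fred']
```

One caveat this sketch exposes: removing a diff can make later diffs invalid (e.g. dropping an "add" that a later "rename" depends on), so real systems have to validate or conflict-resolve the edited stream before replay.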
I am working on a project where an object (a product for example) could have potentially hundreds of attributes. Objects may have different attributes as well. Because of this, among other more obvious reasons, it doesn't make sense to design a single table with hundreds of columns. It's just not scalable. In my mind, a key/value storage mechanism seems like the correct approach (specifically an Entity-Attribute-Value Model).
The other challenge with this data is it needs to be overridable. To describe this requirement, imagine a company wide retail products database that has "recommended" product attributes. But in different regions, they want to override several different attributes with their own custom values, and then some franchises in each region wants to add an additional override to be specific to their store. In the legacy system, there are multiple tables (each with an excessive amount of columns) where we use a combination of COALESCE (in a View) and code to find the most specific value based on the information we know (product, region, location, etc).
My thoughts:
// An object could be a product, a car, a
// document, etc.
---------------------------------
| Table: object
---------------------------------
| - object_id
| - object_name
---------------------------------
// An attribute could be color, length, etc.
---------------------------------
| Table: attribute
---------------------------------
| - attribute_id
| - attribute_name
---------------------------------
// An owner could be a company, a region,
// a store, etc
---------------------------------
| Table: owner
---------------------------------
| - owner_id
| - parent_owner_id
| - owner_name
---------------------------------
// Object data would be a key/value specific
// to a specific object (entity), a specific
// attribute, and specific owner (override level)
---------------------------------
| Table: objectdata
---------------------------------
| - objectdata_id
| - object_id
| - attribute_id
| - owner_id
| - value
---------------------------------
Thinking about this, it satisfies the first requirement: dynamic attributes that can be added and scaled easily. But for the second, while it provides the data necessary to figure out the overrides, the query seems complex and may have performance issues. For example, if I am viewing a specific object from the level of an owner three levels deep, I need to get all of the attributes defined at the top-level owner (the one with no parent), then merge in the attributes from each level down until I reach the specific level.
As an added bonus issue, each attribute could be one of many different data types (string, int, float, timestamp, etc.). Do I store them all as strings and handle all validation at the application level? Hmmm.
TL;DR: My issue (and question) is: what is an effective data-modeling pattern that lets me dynamically add and remove attributes from an object, and that also supports some sort of parent/child relationship for determining the most specific attribute values based on a set of constraints?
NOTE: The retail example above is fictional, but describes the problem much better than the real situation.
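To make the override merge concrete, here is a sketch of the resolution logic in plain Python. The row shape follows the objectdata/owner tables above; the owner names and sample values are made up:

```python
# objectdata rows: (object_id, attribute_id, owner_id, value)
OBJECTDATA = [
    (1, "color",  "corp",    "red"),
    (1, "length", "corp",    "10cm"),
    (1, "color",  "west",    "blue"),   # region override
    (1, "color",  "store42", "green"),  # store override
]
# owner -> parent_owner (models owner.parent_owner_id)
PARENT = {"store42": "west", "west": "corp", "corp": None}

def resolve(object_id, owner_id):
    """Merge attributes from the root owner down to owner_id;
    deeper (more specific) levels win."""
    chain = []
    while owner_id is not None:
        chain.append(owner_id)
        owner_id = PARENT[owner_id]
    merged = {}
    for owner in reversed(chain):          # root first, leaf last
        for obj, attr, own, val in OBJECTDATA:
            if obj == object_id and own == owner:
                merged[attr] = val         # later levels overwrite earlier
    return merged

print(resolve(1, "store42"))  # {'color': 'green', 'length': '10cm'}
print(resolve(1, "west"))     # {'color': 'blue', 'length': '10cm'}
```

In SQL the owner chain would typically come from a recursive CTE over owner.parent_owner_id, with the merge done either by a window function picking the deepest row per attribute or in application code as above.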
I want to accomplish nearly the same thing as this question, which is to store authentication data from multiple sources (Facebook, Twitter, LinkedIn, openID, my own site, etc.) so that one person can log in to their account from any/all of the mentioned providers.
The only caveat being that all user data needs to be stored in a single table.
Any suggestions?
If there is no clean way to accomplish this, is it better to have tons of columns for all of the possible login methods, or to create additional tables to handle the login methods with foreign keys relating back to the user table (as described in this answer)?
Perhaps you want to create a table dedicated to the account types, along with a table for the actual users.
Say you have a users table with an auto_increment unique ID for each user. Then you create another table, e.g. user_accounts, with its own auto_increment ID, a column for the relational ID (back to the users table), and a third (and/or fourth) column for the account type / extra authentication data if needed.
You then insert a record for each account type for each user. Basically it may look like this:
user_accounts
| ID | User_ID | account_type | authentication
| 1 | 1 | facebook | iamthecoolestfacebookerever
| 2 | 1 | google | mygoogleaccount
That's its most simplistic form. You will probably be storing much different data than that, but hopefully you get the point.
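A sketch of what the login lookup against such a table might look like (SQLite here for a runnable demo; the schema and sample values follow the answer, and find_user is my own illustrative name):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_accounts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    account_type TEXT,
    authentication TEXT,
    UNIQUE (account_type, authentication)  -- one external identity, one user
);
INSERT INTO users VALUES (1, 'Sam');
INSERT INTO user_accounts VALUES
    (1, 1, 'facebook', 'iamthecoolestfacebookerever'),
    (2, 1, 'google', 'mygoogleaccount');
""")

def find_user(account_type, external_id):
    """Resolve a provider identity to the single underlying user row."""
    return con.execute(
        """SELECT u.id, u.name FROM users u
           JOIN user_accounts a ON a.user_id = u.id
           WHERE a.account_type = ? AND a.authentication = ?""",
        (account_type, external_id)).fetchone()

print(find_user("google", "mygoogleaccount"))  # (1, 'Sam')
```

Whichever provider the person logs in with, they resolve to the same users row, which is the point of splitting the accounts out instead of adding one column per provider.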