Database schema for ACL - mysql

I want to create a schema for an ACL; however, I'm torn between a couple of ways of implementing it.
I am pretty sure I don't want to deal with cascading permissions as that leads to a lot of confusion on the backend and for site administrators.
I think I can also live with users only being in one role at a time. A setup like this will allow roles and permissions to be added as needed as the site grows without affecting existing roles/rules.
At first I was going to normalize the data and have three tables to represent the relations.
ROLES { id, name }
RESOURCES { id, name }
PERMISSIONS { id, role_id, resource_id }
A query to figure out whether a user was allowed somewhere would look like this:
SELECT id FROM resources WHERE name = ?
SELECT * FROM permissions WHERE role_id = ? AND resource_id = ? ($user_role_id, $resource->id)
Then I realized that I will only have about 20 resources, each with up to 5 actions (create, update, view, etc..) and perhaps another 8 roles. This means that I can exercise blatant disregard for data normalization as I will never have more than a couple of hundred possible records.
So perhaps a schema like this would make more sense.
ROLES { id, name }
PERMISSIONS { id, role_id, resource_name }
which would allow me to look up records in a single query
SELECT * FROM permissions WHERE role_id = ? AND resource_name = ? ($user_role_id, 'post.update')
So which of these is more correct? Are there other schema layouts for ACL?

In my experience, the real question mostly breaks down to whether or not any amount of user-specific access-restriction is going to occur.
Suppose, for instance, that you're designing the schema of a community and that you allow users to toggle the visibility of their profile.
One option is to use a public/private profile flag and stick to broad, pre-emptive permission checks: 'users.view' (views public users) vs, say, 'users.view_all' (views all users, for moderators).
Another involves more refined permissions: you might want users to be able to configure things so they can make themselves (a) viewable by all, (b) viewable by their hand-picked buddies, (c) kept private entirely, and perhaps (d) viewable by all except their hand-picked bozos. In this case you need to store owner/access-related data for individual rows, and you'll need to heavily abstract some of these things in order to avoid materializing the transitive closure of a dense, oriented graph.
With either approach, I've found that the added complexity in role editing/assignment is offset by the resulting ease/flexibility in assigning permissions to individual pieces of data, and that the following worked best:
Users can have multiple roles
Roles and permissions merged in the same table with a flag to distinguish the two (useful when editing roles/perms)
Roles can assign other roles, and roles and perms can assign permissions (but permissions cannot assign roles), from within the same table.
The resulting oriented graph can then be pulled in two queries, built once and for all in a reasonable amount of time using whichever language you're using, and cached into Memcache or similar for subsequent use.
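A minimal sketch of what that layout and the two "pull" queries could look like in MySQL (table and column names are mine, purely illustrative, not prescribed by this answer):
-- roles and permissions merged into one table, distinguished by a flag
CREATE TABLE acl_entry (
  id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(64) NOT NULL UNIQUE,
  is_role TINYINT(1) NOT NULL DEFAULT 0  -- 1 = role, 0 = permission
);
-- edges of the oriented graph: roles may grant roles or permissions,
-- permissions may only grant permissions (enforced in application code)
CREATE TABLE acl_grant (
  parent_id INT UNSIGNED NOT NULL,
  child_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (parent_id, child_id),
  FOREIGN KEY (parent_id) REFERENCES acl_entry (id),
  FOREIGN KEY (child_id) REFERENCES acl_entry (id)
);
-- the two queries that pull the whole graph for caching
SELECT id, name, is_role FROM acl_entry;
SELECT parent_id, child_id FROM acl_grant;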
From there, pulling a user's permissions is a matter of checking which roles he has and processing them with the permission graph to get the final set of permissions. A permission check then boils down to verifying whether the user has the specified role/permission, and you run your query or issue an error based on that check.
You can extend the check for individual nodes (i.e. check_perms($user, 'users.edit', $node) for "can edit this node" vs check_perms($user, 'users.edit') for "may edit a node") if you need to, and you'll have something very flexible/easy to use for end users.
As the opening example should illustrate, be wary of steering too much towards row-level permissions. The performance bottleneck is less in checking an individual node's permissions than it is in pulling a list of valid nodes (i.e. only those that the user can view or edit). I'd advise against anything beyond flags and user_id fields within the rows themselves if you're not (very) well versed in query optimization.

This means that I can exercise blatant disregard for data normalization as I will never have more than a couple hundred possible records.
The number of rows you expect isn't a criterion for choosing which normal form to aim for.
Normalization is concerned with data integrity. It generally increases data integrity by reducing redundancy.
The real question to ask isn't "How many rows will I have?", but "How important is it for the database to always give me the right answers?" For a database that will be used to implement an ACL, I'd say "Pretty danged important."
If anything, a low number of rows suggests you don't need to be concerned with performance, so 5NF should be an easy choice to make. You'll want to hit 5NF before you add any id numbers.
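To make the "natural keys before id numbers" point concrete, a sketch of such a design might be (names are illustrative only):
CREATE TABLE role (role_name VARCHAR(64) PRIMARY KEY);
CREATE TABLE resource (resource_name VARCHAR(64) PRIMARY KEY);
CREATE TABLE role_permission (
  role_name VARCHAR(64) NOT NULL,
  resource_name VARCHAR(64) NOT NULL,
  PRIMARY KEY (role_name, resource_name),
  FOREIGN KEY (role_name) REFERENCES role (role_name),
  FOREIGN KEY (resource_name) REFERENCES resource (resource_name)
);
-- the permission check becomes a single-table lookup on the natural keys
SELECT 1 FROM role_permission WHERE role_name = ? AND resource_name = ?;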
A query to figure out if a user was allowed somewhere would look like this:
SELECT id FROM resources WHERE name = ?
SELECT * FROM permissions
WHERE role_id = ? AND resource_id = ? ($user_role_id, $resource->id)
That you wrote that as two queries instead of using an inner join suggests that you might be in over your head. (That's an observation, not a criticism.)
SELECT p.*
FROM permissions p
INNER JOIN resources r ON (r.id = p.resource_id AND r.name = ?)
WHERE p.role_id = ?

You can use a SET to assign the roles.
CREATE TABLE permission (
id INTEGER PRIMARY KEY AUTO_INCREMENT
,name VARCHAR(255)
,perm SET('create', 'edit', 'delete', 'view')
,resource_id INTEGER );
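A hedged usage example against that table (FIND_IN_SET treats the SET value as a comma-separated list of its members):
-- which permission rows for resource 42 include the 'edit' action?
SELECT id, name
FROM permission
WHERE resource_id = 42
  AND FIND_IN_SET('edit', perm) > 0;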

Related

Mapping strings to relations

I'm creating a system where tasks can be assigned to different users. The problem is that tasks are mapped through a string column called recipient, that in the end maps to a collection of users. The contents of this column could look like this:
has:tasks-update,tasks-access - Users that have the tasks-update and tasks-access permissions.
role:administrator - Users that have the administrator role.
Right now I'm resolving it programmatically. This is somewhat easy when I have to figure out who has access to a specific task, but cumbersome when a user needs to know what tasks are "assigned" to them.
Right now I'm resolving each recipient column to see if the user is included, which is unfortunately not very feasible as it comes with a huge performance cost.
I already have indices on the appropriate columns to speed the look-ups up.
A solution to this was to resolve the recipients whenever the recipient column changed and then place the relationships between users and tasks in an intermediate table. While this lets me quickly look up the tasks a user is assigned to, it also becomes problematic since now I need to keep track of (for example) each time a user has been given the administrator role and synchronize this to the intermediate table.
I was hoping I could get some insight into solving this issue without sacrificing performance like I am right now, but also not have to synchronize all the time.
Storing a list of anything as a string in a single column can lead to all sorts of problems down the line.
As you have already encountered, any relational look-up, insert, update or delete operation on the list first requires some form of parsing of the existing list.
It is also worth noting that any indexes on this column will likely NOT be usable by the engine for these tasks, as indexes on string-based columns (other than FULLTEXT) are only really useful when searching the start of the string.
For example,
SELECT *
FROM site_user
WHERE recipients LIKE '%tasks-update%'
Will not be able to use an index on the recipients column
A suggestion
I would split out your current lists into new tables, like
role - id, name, …
e.g. {3, 'administrator',… }
permission - id, name, …
e.g. {5, 'tasks-access',… }
e.g. {9, 'tasks-update',… }
site_user - id, name, role_id, …
e.g. {7, 'Jeff', 3,… }
site_user_permission - id, site_user_id, permission_id, …
e.g. {1, 7, 5,… }
e.g. {2, 7, 9,… }
Where from the example records, 'Jeff' is an 'administrator' and has been assigned the 'tasks-update', and 'tasks-access' permissions
Lookups should be easily achievable using JOINs, and stay consistent when data is added or removed. Data integrity can be maintained by adding appropriate foreign keys and unique indexes
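As an illustration using the tables above (a sketch only; how tasks reference these tables is not shown in the question), the two recipient examples from the question translate into plain joins:
-- users that have both the 'tasks-update' and 'tasks-access' permissions
SELECT su.id, su.name
FROM site_user su
JOIN site_user_permission sup ON sup.site_user_id = su.id
JOIN permission p ON p.id = sup.permission_id
WHERE p.name IN ('tasks-update', 'tasks-access')
GROUP BY su.id, su.name
HAVING COUNT(DISTINCT p.name) = 2;
-- users that have the 'administrator' role
SELECT su.id, su.name
FROM site_user su
JOIN role r ON r.id = su.role_id
WHERE r.name = 'administrator';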
N.B. Without specific examples of the operations that are causing you issues, or more details on how you intend to use user roles and permissions, it is difficult to do more than make general suggestions
The tried and true approach, complying with normal forms, would be to have task_type and role tables. You of course have a user table, and since a user can have many roles and privileges, you will need a user_role and a user_privilege table to handle the many-to-many relations. An easy way to handle the matching problem is to have numbers representing privileges and roles, like 1 for administrator and 2, 3, 5, 7, 11, 13, 17 and so on for other privileges. Having a similar number for a role as a primary key would ease the role matching problem. For example, consider the case when you have a privilege with code 7. If you search for roles whose id divides this code (7 mod id = 0), then you will get 7 (data_read, for example) and 1 (administrator).
You definitely need a relation table between users and tasks, and in this relation you also have to flag whether the user is an administrator or not. This is a better way to design the structure of your application than merging information into a single column, which causes performance/complexity issues. Go ahead with this approach; your work will benefit from it.

What's a suitable table design for objects which can be "trashed"

I'm designing a simple media "server" as part of a larger application. I've chosen to adopt similar terminology to the AWS S3 service, i.e. Objects and Buckets (i.e. files and directories).
I have two tables:
cdn_bucket
id, directory
and
cdn_object
id, bucket_id, filename, is_deleted
Other tables in the database can include objects using a foreign key on cdn_object.id. This has nice side-effects in that I can specify a constraint to set the field NULL in the event that the object is deleted (or indeed prevent deletion if needed), e.g.:
blog_post
id, title, body, featured_image
CONSTRAINT: featured_image = cdn_object.id ON DELETE SET NULL
I was told once that I shouldn't delete things, ever (that's an argument for another post, please don't comment on it here); hence the is_deleted flag. To clarify the question, this is what I mean by "trashed", i.e recoverable.
This works great; however, I can't leverage the cascading functionality of the constraints (i.e. I mark an object as deleted, but the referring table, e.g. blog_post.featured_image, still references the old ID).
I was wondering what the SO opinions might be on the following two approaches, or if there's another approach which might be better.
1. Join the cdn_object table
SELECT bp.*, cdno.id featured_image FROM blog_post bp JOIN cdn_object cdno ON cdno.id = bp.featured_image AND cdno.is_deleted = 0.
Pro: easy to implement.
Con: every query has to join the cdn_object table.
or
2. Use a trash table
Have another table, cdn_object_trash, and have the code 'move' the row out of cdn_object when it's deleted, triggering all the cascading constraints.
Pro: allows the relational rules to do what they were designed to do
Con: bad by design? Not sure.
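To make option 2 concrete, the 'move' could be a short transaction like this (a sketch, assuming cdn_object_trash mirrors cdn_object's columns):
START TRANSACTION;
INSERT INTO cdn_object_trash (id, bucket_id, filename)
  SELECT id, bucket_id, filename FROM cdn_object WHERE id = ?;
DELETE FROM cdn_object WHERE id = ?;  -- this is what fires the ON DELETE SET NULL constraints
COMMIT;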
My gut feeling tells me I should use the is_deleted flag and write code accordingly, but this is a generic class and so I'd prefer to not force the developer to write the join every time if I can configure that logic in the DB.
I hope my situation/question is clear, please ask me to clarify any points if needed.
Your third option is to set up a reasonable backup and retention schedule, and use cascading deletes. While I understand the desire to "never delete anything", abiding by that principle is forcing you to be redundant in your programming choices (option 1) or to figure out how to build a trash table to redundantly store information (option 2; do you build a single table with a string representation of the data, or do you make a trash copy of the schema?). Both of those choices seem like a lot of work to maintain (over the long haul).
I've worked with variants of both choices, and if those were the only options on the table, option 1 is a bit easier to maintain; however, you have to be EXTREMELY diligent in using it, and you have to make sure that future development efforts live up to that same standard.

SQL or NoSQL search?

Let us suppose I have a site with a certain number of users with the following three distinguishing characteristics:
1) The user is part of a network. (The site contains multiple networks.)
2) The user is a 'contact' of a certain number of other site members.
3) Individual documents uploaded by a user may be shared with certain contacts (excluding other contacts).
In this way, a user's document search is unique for each user based upon his or her network, contacts, and additional documents that have been shared with that user. What would be possible ways to address this -- would I need to append a long unique SQL query for each user for each of his or her searches? I am currently using MySQL as a database -- would using this be sufficient, or would I need to move towards a NoSQL option here to maintain the performance of a similar non-filtered search?
A few questions come to mind to help answer this question:
How many documents do you think the average user will have access to? Will many documents in the network be shared for all to see?
How will users be able to find documents and what do the documents look like? Will they only be able to search by the contact that shared it? By a simple title match? Will they be able to run a full text search against the document's contents?
Depending on the answer to those two questions, a relational system could work just fine, which I'm guessing is preferable since you are already using MySql. I think you could locate the documents for an individual user in a relational system with a few very reasonable queries.
Here is a potential bare bones schema
User
--all users in the system
UserId int
NetworkId int (Not sure if this is a 1 to many relationship)
Document
--all documents in the system
DocumentId int
UserId int -- the author
Name varchar
StatusId -- perhaps a flag to indicate whether it is public or not, e.g. shared with everyone in the same network or shared with all contacts
UserDocumentLink
--Linking between a document and the contacts a user has shared the document with
DocumentId
ContactId
UserContact
--A link between a user and all of their contacts
ContactId -- PK identity to represent a link between two users
UserId -- User who owns the contact
ContactUserId --The contact user
Here is a potential "search" query:
--documents owned by me
SELECT DocumentId
from Document where UserId = #userId
UNION
--documents shared with me explicitly
SELECT DocumentId
From UserContact uc
Inner Join UserDocumentLink ucl on uc.ContactId = ucl.ContactId
Where
uc.ContactUserId = #userId
UNION
--documents shared with me via some public status, using a keyword filter
Select DocumentId
From Document d
inner join User u on d.UserId = u.UserId
where
u.NetworkId = #userNetworkId
and d.StatusId in () -- fill in the public/shared status values
and d.Name like '%' + #keyword + '%'
I think what might be a more influential requirement for schema design is one that is not mentioned in your question - how will users be able to search through documents? And what kind of documents are we talking about here? MySql is not a good option for full text search.
It rather depends on what you mean by a "certain number" of users. If you mean a few tens of thousands, then almost any solution can be made to perform adequately. If you mean many millions, then a NoSQL solution may scale up more cheaply and easily.
I suspect that a more general SQL query can be used, rather than a unique one for each user, e.g. selecting documents that belong to users that know the current user, that are marked as being shared with the current user, and match the search string.
Denormalisation can probably be used (as is common in NoSQL approaches) to improve performance.
However, a graph database (as Peter Neubauer suggests) possibly in combination with a document store (CouchDB, MongoDB or Cassandra) would work very well for this type of problem and would scale well.
I would take a look at some of the NOSQL solutions, for this interconnected dataset possibly Neo4j, a Graph Database. It's even pretty straightforward to query it through Cypher so that you get tabular results back.
As others have pointed out the number of users and the frequency of requests (traffic volume) must be looked at. Also, how important is redundancy? How likely are people to work on same documents simultaneously? Are most documents created once and distributed for "readonly" purposes?
NoSQL can help you scale and get redundancy in a much easier way compared to rdbms for this particular scenario. I am assuming that at some point you will want tagging etc. to be enabled on the documents.
Now, I am wondering if there is any particular reason why you are not looking at off-the-shelf document management and CMS systems for this? I am sure there is a good reason, but it might be worth looking at all those options too.
I hope this helps. Good luck!
Denormalization will give you better read/search performance in this case.
Don't fully normalize: keep frequently joined attributes, such as the owner, together with the text in one table.
E.g. keep the owners' names on the text table alongside the owner FK, to decrease the number of joins; then you can use SQL freely.
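A minimal sketch of that denormalization (hypothetical column names; the owner's display name is copied onto the document row so a search result can be rendered without a join):
CREATE TABLE document (
  document_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  owner_id INT UNSIGNED NOT NULL,    -- FK kept for integrity
  owner_name VARCHAR(64) NOT NULL,   -- denormalized copy for reads
  name VARCHAR(255) NOT NULL,
  body TEXT
);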
I've managed this using long unique queries in MySQL as you suggest for a small-scale social networking project. Nowadays I would suggest using Solr and keeping permission information as a denormalized array of interchangeable keywords on each document. Say each network has a unique recognizable code (e.g. 100N-20000N), and similarly for users and special permission grants. You can store an array of permission keys, like "5515N 43243N 2342N 603U 203PG 44321PG", and treat those as keywords when searching.
I would address it with a simple business-process solution, which will lead to a simple data schema and a simple query, and therefore good performance and scalability:
Each User has a list of documents... Period.
This list is in fact a list of references to documents in a document table (with owner/security information...)
When sharing a document with another user, a reference to the document is added to that user's document list (tagged as shared if you want), and the user is added to the document's security list (with a permission level, for example).
The SQL query to get documents is then simply: select documentid from userdocument where userid=#userid
With a join on the document table, proper indexes and SQL tuning, it will return all the needed information and it will run fast.
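The joined version alluded to here might look something like this (a sketch; the userdocument and document column names, including the shared flag, are assumptions):
SELECT d.documentid, d.name, ud.shared
FROM userdocument ud
JOIN document d ON d.documentid = ud.documentid
WHERE ud.userid = #userid;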
I hope I understood what you are trying to do.
-< = one to many
>-< = many to many (will require a link table)
Network -< user -< documents >-< contact(user)
(user also links to contacts(user, user))
This is relational; I don't see a good reason to go NoSQL unless you have a billion users.
Network (unless you can belong to more than one) is an attribute of user
contacts will be maintained in the link table user_contact(user,user)
tables
documents(doc_id,user_id)
user(user_id)
contacts(user_id,c_user_id) with foreign keys on users
document_contact(doc_id,c_user_id) where a trigger constrains the c_user_id
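The trigger mentioned for document_contact could be sketched like this in MySQL (my reading of the intended rule is "the contact must be in the document owner's contact list"):
DELIMITER $$
CREATE TRIGGER document_contact_check
BEFORE INSERT ON document_contact
FOR EACH ROW
BEGIN
  IF NOT EXISTS (SELECT 1
                 FROM documents d
                 JOIN contacts c ON c.user_id = d.user_id
                 WHERE d.doc_id = NEW.doc_id
                   AND c.c_user_id = NEW.c_user_id) THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'c_user_id is not a contact of the document owner';
  END IF;
END$$
DELIMITER ;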
then you get a view of all doc owners and subscribers (contacts)
CREATE OR REPLACE VIEW user_docs AS
SELECT d.user_id, d.doc_id, 'owner' AS role
FROM documents d
JOIN users u ON d.user_id = u.user_id
UNION
SELECT c.user_id, d.doc_id, 'subscriber' AS role
FROM documents d
JOIN contacts c ON d.user_id = c.c_user_id;
you can then filter the view against the document contacts,
select * from user_docs ud
where
(ud.role = 'owner'
or
ud.doc_id in (select doc_id from document_contact dc where ud.doc_id = dc.doc_id)
) and ud.user_id = 'me'
I would trade off immediacy against performance when it comes to full-text searching.
I would build a hash table of the user/document combinations on a separate thread, usually triggered by an asynchronous call whenever user associations change.
I would then query the hash value plus the other search criteria. This eliminates the need for the long SQL query, which may otherwise cause a lock.

database modelling - MySQL

I am doing the design of a database that will eventually have thousands of users. Each user has their own profile and specific associated data.
In your opinion, is it better practice to have one table for id, username, activationLink and hash and another for address, age, photo and job, or is a single table for everything best?
thanks for your time
If:
All (or almost all) users have all data filled
Most of the time you query for all fields
then keep them in a single table, otherwise split them.
In your model, activationLink seems to be queried for only once per activation, so I'd move it into a separate table (which would allow deleting it after the account had been activated).
Address, age, photo and job are usually shown along with the username, so it would be better to merge them into a single table.
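A sketch of that split (column types and lengths are my assumptions):
CREATE TABLE user (
  id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  username VARCHAR(64) NOT NULL UNIQUE,
  hash CHAR(60) NOT NULL,
  address VARCHAR(255),
  age TINYINT UNSIGNED,
  photo VARCHAR(255),
  job VARCHAR(64)
);
-- activation data lives apart and can simply be deleted once used
CREATE TABLE user_activation (
  user_id INT UNSIGNED PRIMARY KEY,
  activation_link CHAR(40) NOT NULL,
  FOREIGN KEY (user_id) REFERENCES user (id)
);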
Don't allow your initial design to limit the ability (or just make it difficult) to expand your requirements in the future.
At the moment, a user may have one address so you might put it in the users table - what if you want them to be able to store "work" and "home" addresses in future, or a history of past addresses?
A user may only be allowed to have a single photo, but if you put it (or a URL for it) in users.photo, then you'd have to change your data structure to allow a user to have a history of profile photos
As Quassnoi mentions, there are performance implications for each of these decisions - more tables means more complexity, and more potential for slow queries. Don't create new tables for the sake of it, but consider your data model carefully as it quickly becomes very hard to change it.
Any values that are a strict 1-to-1 relationship with a user entity, and are unlikely to ever change and require a history for (date of birth is a good example) should go in the table with the core definition. Any potential 1-to-many relationships (even if they aren't right now) are good candidates for their own tables.

Permissions for web site users

I'm working on a web site where each user can have multiple roles/permissions, such as basic logging in, ordering products, administrating other users, and so on. On top of this, there are stores, and each store can have multiple users administrating it. Each store also has its own set of permissions.
I've confused myself and am not sure how best to represent this in a db. Right now I'm thinking:
users
roles
users_roles
stores
stores_users
But, should I also have stores_roles and stores_users_roles tables to keep track of separate permissions for the stores or should I keep the roles limited to a single 'roles' table?
I originally thought of having only a single roles table, but then what about users who have roles in multiple stores? I.e., if a user is given a role of let's say 'store product updating' there would need to be some method of determining which store this is referring to. A stores_users_roles table could fix this by having a store_id field, thus a user could have 'store product updating' and 'store product deletion' for store #42 and only 'store product updating' for store #84.
I hope I'm making sense here.
Edit
Thanks for the info everyone. Apparently I have some thinking to do. This is simply a fun project I'm working on, but RBAC has always been something that I wanted to understand better.
This is probably obvious to you by now, but role based access control is hard. My suggestion is, don't try to write your own unless you want that one part to take up all the time you were hoping to spend on the 'cool stuff'.
There are plenty of flexible, thoroughly-tested authorization libraries out there implementing RBAC (sometimes mislabeled as ACL), and my suggestion would be to find one that suits your needs and use it. Don't reinvent the wheel unless you are a wheel geek.
It seems likely to me that if I have permission to do certain roles in a set of stores, then I would probably have the same permissions in each store. So having a single roles table would probably be sufficient. So "joe" can do "store product updating" and "store product deletion", then have a user_stores table to list which stores he has access to. The assumption is for that entire list, he would have the same permissions in all stores.
If the business rules are such that he could update and delete in one store, but only update, no delete, in another store, well then you'll have to get more complex.
In my experience you'll usually be told that you need a lot of flexibility, then once implemented, no one uses it. And the GUI gets very complex and makes it hard to administer.
If the GUI does get complex, I suggest you look at it from the point of view of the store as well as the point of view of the user. In other words, instead of selecting a user, then selecting what permissions they have, and what stores they can access, it may be simpler to first select a store, then select which users have access to which roles in that store. Depends I guess on how many users and how many stores. In a past project I found it far easier to do it one way than the other.
Your model looks ok to me. The only modification I think you need concerns the granularity of the role: right now, your role is just an operation.
But first, you need a store_role table, a junction table resolving the many-to-many relationship between a role and a store, i.e. one store can have many roles and one role can be performed in many stores.
Eg: StoreA can CREATE, UPDATE and DELETE customers, and DELETE customer can be done in StoreA, StoreB and StoreC.
Next, you can freely associate users to store_role_id in the user_store_roles table.
Now, a user_store_role record will have a user_id and a store_role_id. The query
SELECT * FROM USER_STORE_ROLE WHERE user_id = #userID
returns all permitted operations of the user across all the stores.
For a collection of a user's roles in a particular store, do an inner join of the above to the store_role table, adding a WHERE clause like
where STORE_ROLE.store_id = #storeID
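A sketch of those tables and the store-scoped lookup (names follow the answer; the exact columns are my guess):
CREATE TABLE store_role (
  id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  store_id INT UNSIGNED NOT NULL,
  role_id INT UNSIGNED NOT NULL,
  UNIQUE KEY (store_id, role_id)
);
CREATE TABLE user_store_role (
  user_id INT UNSIGNED NOT NULL,
  store_role_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, store_role_id),
  FOREIGN KEY (store_role_id) REFERENCES store_role (id)
);
-- all of a user's roles in one particular store
SELECT r.id, r.name
FROM user_store_role usr
JOIN store_role sr ON sr.id = usr.store_role_id
JOIN roles r ON r.id = sr.role_id
WHERE usr.user_id = #userID
  AND sr.store_id = #storeID;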
Put a store_id in the user_roles table.
If this is Rails, the user model would have_many :stores, :through => :roles