How to filter DB results with permissions stored in another DB - mysql

I'm currently searching for a good approach to filter DB results based on permissions that are stored in another service's DB.
Let me first show the current state:
There's one Document-Service with 2 tables (permission, document) in its MySQL DB. When documents for a user are requested, a paginated result should be returned. For brevity let's ignore the pagination for now.
Permission table:

user_id | document_id
--------|------------
1       | A
2       | A
2       | B
2       | C

Document table:

document_id | more columns
------------|-------------
A           | ...
B           | ...
C           | ...
The request "GET /documents/{userId}" results in the following query against the DB:
SELECT d.* FROM document d JOIN permission p ON p.document_id = d.document_id WHERE p.user_id = '{userId}';
That's the current implementation, and now I am asked to move the permission table into its own service. I know, one might say that's not a good idea, but this question is just a boiled-down example; in the real scenario it's a more meaningful change than it looks. So let's take it as a "must-do".
Now my problem: after I move the table into another DB, I can no longer use it in the SQL query of the Document-Service to filter results.
I also cannot query everything and filter in code, because there will be too much data AND I must use pagination, which is currently implemented via LIMIT/OFFSET in the query (ignored in this example for brevity).
I am not allowed to access a DB from any application other than its own service.
My question is: is there any best practice or suggested approach for this kind of situation?
I already have 2 ideas which I would like to list here, even though I'm not really happy with either of them:
Query all document_ids of a user from the new Permission-Service and change the SQL to "SELECT * FROM document WHERE document_id IN {doc_id_array_from_permission_service}". The array could get pretty big and the statement slow; not happy about that.
Replicate the permission table into the Document-Service DB on startup and keep the query as it is. But then I need to implement logic/an endpoint to update the table in the Document-Service whenever it changes in the Permission-Service; otherwise it gets out of sync. This feels like duplicating a lot of logic in both services.

For the sake of this answer, I'm going to assume that it is logical for Permissions to exist completely independently of Documents. That is to say - if the ONLY place a Permission is relevant is with respect to a DocumentID, it probably does not make sense to split them up.
That being the case, either of the two options you laid out could work okay; both have their caveats.
Option 1: Request Documents with ID Array
This could work, and in your simplified example you could handle pagination prior to making the request to the Documents service. But, this requires a coordinating service (or an API gateway) that understands the logic of the intended actions here. It's doable, but it's not terribly portable and might be tough to make performant. It also leaves you the challenge of now maintaining a full, current list of DocumentIDs in your Permissions service which feels upside-down. Not to mention the fact that if you have Permissions related to other entities, those all have to get propagated as well. Suddenly your Permissions service is dabbling in lots of areas not specifically related to permissions.
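To make Option 1 concrete, here is a minimal sketch in Python with an in-memory SQLite DB standing in for the Documents service's MySQL DB. The `fetch_permitted_ids` function is a hypothetical stand-in for the HTTP call to the Permission-Service; its endpoint and payload are assumptions, not part of the question.

```python
import sqlite3

# Simulate the Documents service DB (in-memory SQLite stands in for MySQL).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE document (document_id TEXT PRIMARY KEY, title TEXT)")
db.executemany("INSERT INTO document VALUES (?, ?)",
               [("A", "doc A"), ("B", "doc B"), ("C", "doc C")])

def fetch_permitted_ids(user_id):
    # Stand-in for the call to the hypothetical Permission-Service,
    # e.g. GET /permissions/{userId}/document-ids
    permissions = {1: ["A"], 2: ["A", "B", "C"]}
    return permissions.get(user_id, [])

def get_documents(user_id, limit=10, offset=0):
    doc_ids = fetch_permitted_ids(user_id)
    if not doc_ids:
        return []
    placeholders = ",".join("?" * len(doc_ids))  # one ? per permitted ID
    sql = (f"SELECT document_id, title FROM document "
           f"WHERE document_id IN ({placeholders}) "
           f"ORDER BY document_id LIMIT ? OFFSET ?")
    return db.execute(sql, [*doc_ids, limit, offset]).fetchall()

print(get_documents(2, limit=2, offset=0))  # first page for user 2
```

Note that the pagination still happens in the Documents query, but the IN list grows with the user's permission count, which is exactly the scaling concern raised above.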
Option 2: Eventual Consistency
This is the approach I would take. Are you using a Messaging Plane in your Microservices architecture? If so, this is where it shines! If not, you should look into it.
So, the way this would work is any time you make a change to Permissions, your Permissions Service generates a permissionUpdatedForDocument event containing the relevant new/changed Permissions info. Your Documents service (and any other service that cares about permissions) subscribes to these events and stores its own local copy of relevant information. This lets you keep your join, pagination, and well-bounded functionality within the Documents service.
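As a sketch of the event-handler side, here is what the Documents service's subscription could look like. The event payload shape (`user_id`, `document_id`, `granted`) is an assumption for illustration; an in-memory SQLite table stands in for the local read-model.

```python
import sqlite3

# Local read-model of permissions inside the Documents service.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE permission (user_id INTEGER, document_id TEXT, "
           "PRIMARY KEY (user_id, document_id))")

def on_permission_updated_for_document(event):
    # Handler for the hypothetical permissionUpdatedForDocument event;
    # the payload shape is assumed, not prescribed by the question.
    if event["granted"]:
        db.execute("INSERT OR IGNORE INTO permission VALUES (?, ?)",
                   (event["user_id"], event["document_id"]))
    else:
        db.execute("DELETE FROM permission WHERE user_id = ? AND document_id = ?",
                   (event["user_id"], event["document_id"]))
    db.commit()

# Replaying the event stream keeps the local copy eventually consistent.
events = [
    {"user_id": 2, "document_id": "A", "granted": True},
    {"user_id": 2, "document_id": "B", "granted": True},
    {"user_id": 2, "document_id": "A", "granted": False},  # later revocation
]
for e in events:
    on_permission_updated_for_document(e)

rows = db.execute("SELECT user_id, document_id FROM permission").fetchall()
print(rows)  # [(2, 'B')]
```

The original join query (and its LIMIT/OFFSET pagination) then keeps working unchanged against this local copy.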
There are still some challenges. I'd try to keep your Permissions service away from holding a list of all the DocumentID values. That may or may not be possible. Are your permissions Role or Group-based? Or, are they document-specific? What Permissions does the Documents service understand?
If permissions are indeed tied explicitly to individual documents, and especially if there are different levels of permission (instead of just binary yes/no access), then you'll have to rethink the structure in your Permissions service a bit. Something like this:
Permission table:

user_id | entity_type | entity_id | permission_type
--------|-------------|-----------|----------------
1       | document    | A         | rwcd
2       | document    | A         | r
2       | document    | B         | rw
2       | document    | C         | rw
1       | other       | R         | rw
Then, you'll need to publish serviceXPermissionUpdate events from any Service that understands permissions for its entities whenever those permissions change. Your Permissions service will subscribe to those and update its own data. When it does, it will generate its own event and your Documents service will see confirmation that its change has been processed and accepted.
This sounds like a lot of complexity, but it's straightforward to implement, performant, and does a nice job of keeping each service well contained. The messaging plane is where the services interact with each other, and only via well-defined contracts (message names, formats, etc.).
Good luck!

Related

database structure for a mobile app, which is faster

The question may be a bit complex and complicated (that's why I need to give a brief introduction).
My team and I are developing a mobile app with Node.js. We are now designing the database structure. Our idea is to do it in Azure SQL, but we have a couple of questions regarding the structure of the database.
We offer 5 services (at the moment), and each user can be assigned several services (maybe none, or all). Based on the services a user has, they will be redirected to a screen showing all the services: only the assigned ones in color (clickable) and the others in gray (so they can't be clicked).
Which is better: one column per service, or all services in a single column, array-style?
for example
service 1 | service 2 | service 3 | service 4 | service 5
----------|-----------|-----------|-----------|----------
true      | false     | true      | true      | false

or

services
[service 1, service 2, service 3, service 4, service 5]
Because I think that if in the future we have x services, iterating through the entire array and checking which services a user has will slow the app down, whereas hitting a specific column might be faster.
I hope the question is understandable; sorry for any mistakes.
regards
This isn't really an Azure SQL question but more of a relational database question.
In general you should avoid both of these methods and try to normalize your database.
Your database knows how to join multiple tables without any performance hit; it's made for that.
The best option in my experience is to create a many-to-many table connection.
So: one table that holds the original data without any mention of a service, maybe called Entities:
Id, Data, Time, Active
Another table that holds the relations to the services, called EntitiesToServices
Id, ServiceId, OtherTableId
And a third table that holds data about the services called Services
Id, Name
In this way you can expand all your services freely and add more tables without any of them interfering with each other.
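The three-table layout above can be sketched as follows, using an in-memory SQLite DB for illustration (the table and column names are the ones from this answer; the sample data is made up). A single join answers "which services does this user have?" without parsing any array:

```python
import sqlite3

# Sketch of the suggested three-table, many-to-many layout.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Entities (Id INTEGER PRIMARY KEY, Data TEXT, Time TEXT, Active INTEGER);
CREATE TABLE Services (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE EntitiesToServices (Id INTEGER PRIMARY KEY, ServiceId INTEGER, OtherTableId INTEGER);
""")
db.execute("INSERT INTO Entities VALUES (1, 'user-1', '2024-01-01', 1)")
db.executemany("INSERT INTO Services VALUES (?, ?)",
               [(1, 'service 1'), (2, 'service 2'), (3, 'service 3')])
# Entity 1 is assigned services 1 and 3.
db.executemany("INSERT INTO EntitiesToServices (ServiceId, OtherTableId) VALUES (?, ?)",
               [(1, 1), (3, 1)])

# One join resolves an entity's services; adding a 6th service needs no schema change.
rows = db.execute("""
    SELECT s.Name
    FROM Services s
    JOIN EntitiesToServices es ON es.ServiceId = s.Id
    WHERE es.OtherTableId = ?
    ORDER BY s.Id
""", (1,)).fetchall()
print([name for (name,) in rows])  # ['service 1', 'service 3']
```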
If all you need is a set of up to 64 true/false values, consider a single column of type SET. Similarly, you could use an INT of any size (again with a limit of 64 flags) and turn each 'service' bit on/off. Today you have 5; tomorrow, as you say, there will be more.
The syntax for SET is a bit clumsy. So is using INT for this.
It is very compact; this may or may not be a bonus.
Normalizing (as mentioned in another answer) may be a better solution, especially if you need to store more than just on/off for each service for each user.
Please provide more details on what actions will happen with these flags; then we can get into more detail.
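For the INT-as-flags variant, the application-side bit manipulation looks like this. This is a sketch; the flag names are hypothetical, and the integer produced is what would be stored in the INT column:

```python
# Each service gets one bit; a single INT column stores the whole set.
SERVICE_1, SERVICE_2, SERVICE_3, SERVICE_4, SERVICE_5 = (1 << i for i in range(5))

def grant(flags, service):
    return flags | service           # turn a service bit on

def revoke(flags, service):
    return flags & ~service          # turn a service bit off

def has(flags, service):
    return bool(flags & service)     # membership test

flags = 0
flags = grant(flags, SERVICE_1)
flags = grant(flags, SERVICE_3)
print(flags)                  # 5 -> the value stored in the INT column
print(has(flags, SERVICE_3))  # True
flags = revoke(flags, SERVICE_3)
print(has(flags, SERVICE_3))  # False
```

This is the compactness mentioned above; the cost is that the database cannot index or constrain individual flags.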

Storing userID and other data and using it to query database

I am developing an app with PhoneGap and have been storing the user id and user level in local storage, for example:
window.localStorage["userid"] = "20";
This populates once the user has logged in to the app. It is then used in AJAX requests to pull in their information and things related to their account (some of it quite private). The app is also used in the web browser, as I am using the exact same code for the web. Is there a way this can be manipulated? For example, could a user change the value in order to get back info that isn't theirs?
If, for example, another app in their browser stores the same key "userid", it will overwrite the value and they will get someone else's data back in my app.
How can this be prevented?
Before going further into attack vectors: storing this kind of sensitive data on the client side is not a good idea. Use a token instead, because any data stored on the client side can be spoofed by attackers.
Your concerns are justified. A possible attack vector here is an Insecure Direct Object Reference. Let me show one example.
You are storing the userID client-side, which means you cannot trust that data anymore.
window.localStorage["userid"] = "20";
Attackers can change that value to anything they want. They will probably change it to something lower than 20, because in the most common setup 20 comes from an auto-increment column, which means there should be valid users with userid 19, 18, or less.
Let's assume your application has a module for getting products by userid. The backend query would then look similar to the following:
SELECT * FROM products WHERE owner_id = 20
When attackers change that value to something else, they will manage to get data that belongs to someone else. They could also get the chance to remove/update data that belongs to someone else.
The possible malicious attack vectors really depend on your application and features. As I said before, you need to figure this out and not expose sensitive data like the userID.
Using a token instead of the userID will prevent such attacks. All you need to do is create one more column named "token" and use it instead of the userid. (Don't forget to generate long and unpredictable token values.)
SELECT * FROM products WHERE owner_token = 'iZB87RVLeWhNYNv7RV213LeWxuwiX7RVLeW12'
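A minimal sketch of the token approach, assuming Python on the backend: the `secrets` module generates the long, unpredictable values recommended above, and an in-memory SQLite table (with a hypothetical `owner_token` column) stands in for the real DB.

```python
import secrets
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (owner_token TEXT, name TEXT)")

def new_token():
    # secrets is designed for security-sensitive randomness;
    # token_urlsafe(32) yields ~43 URL-safe characters.
    return secrets.token_urlsafe(32)

token = new_token()
db.execute("INSERT INTO products VALUES (?, ?)", (token, "widget"))

# The lookup uses the opaque token instead of a guessable numeric userID.
rows = db.execute("SELECT name FROM products WHERE owner_token = ?",
                  (token,)).fetchall()
print(rows)  # [('widget',)]
```

Unlike an auto-increment ID, a neighbor's token cannot be derived by decrementing a number.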

Creating ACL module using NoSQL Database VS. SQL DB

I need to write an ACL module for a system I'm working on.
I've written ACLs before, but I always have the feeling that they could be more efficient.
As I haven't worked with NoSQL databases much, and I'm seriously considering starting to use them where they have advantages, I would like to hear your opinion on whether an ACL module would be a good example of such a case.
The ACL should consist of AROs (Access request objects - such as site members), ACOs (Access Control Objects - such as actions on items in the site. [for example: view product no. 2])
And both should be grouped as a tree - so that I can have groups of members and groups of products.
My recent implementation used a set of defined permissions (each with its own number) that were combined into one number with bitwise OR operations to set the permission level of the ARO.
For example: if view = 1, modify = 2, delete = 4, and user #1 can view and delete object #3, we will have this row in the database (a MySQL DB in my case):
| user_id | object_id | level |
|---------|-----------|-------|
|       1 |         3 |     5 |
In order to get the full hierarchy of permissions for the user, I created a recursive join query (specifying the maximum depth I support) and then checked for a permission definition on all of the objects on the ARO's path to the root of the tree.
I've now seen an interesting way to eliminate the need for the joins in this article.
Back to my question -
In NoSQL, data is saved as objects which can form a tree data structure "out of the box", so it seems more reasonable to keep at least the group hierarchy in such a structure.
What do you think would be the better solution for the most efficient and generic ACL system?

Database schema for ACL

I want to create a schema for an ACL; however, I'm torn between a couple of ways of implementing it.
I am pretty sure I don't want to deal with cascading permissions as that leads to a lot of confusion on the backend and for site administrators.
I think I can also live with users only being in one role at a time. A setup like this will allow roles and permissions to be added as needed as the site grows without affecting existing roles/rules.
At first I was going to normalize the data and have three tables to represent the relations.
ROLES { id, name }
RESOURCES { id, name }
PERMISSIONS { id, role_id, resource_id }
A query to figure out whether a user was allowed somewhere would look like this:
SELECT id FROM resources WHERE name = ?
SELECT * FROM permissions WHERE role_id = ? AND resource_id = ? ($user_role_id, $resource->id)
Then I realized that I will only have about 20 resources, each with up to 5 actions (create, update, view, etc..) and perhaps another 8 roles. This means that I can exercise blatant disregard for data normalization as I will never have more than a couple of hundred possible records.
So perhaps a schema like this would make more sense.
ROLES { id, name }
PERMISSIONS { id, role_id, resource_name }
which would allow me to lookup records in a single query
SELECT * FROM permissions WHERE role_id = ? AND resource_name = ? ($user_role_id, 'post.update')
So which of these is more correct? Are there other schema layouts for ACL?
In my experience, the real question mostly breaks down to whether or not any amount of user-specific access-restriction is going to occur.
Suppose, for instance, that you're designing the schema of a community and that you allow users to toggle the visibility of their profile.
One option is to stick to a public/private profile flag and stick to broad, pre-emptive permission checks: 'users.view' (views public users) vs, say, 'users.view_all' (views all users, for moderators).
Another involves more refined permissions, you might want them to be able to configure things so they can make themselves (a) viewable by all, (b) viewable by their hand-picked buddies, (c) kept private entirely, and perhaps (d) viewable by all except their hand-picked bozos. In this case you need to store owner/access-related data for individual rows, and you'll need to heavily abstract some of these things in order to avoid materializing the transitive closure of a dense, oriented graph.
With either approach, I've found that the added complexity in role editing/assignment is offset by the resulting ease/flexibility in assigning permissions to individual pieces of data, and that the following worked best:
Users can have multiple roles
Roles and permissions merged in the same table with a flag to distinguish the two (useful when editing roles/perms)
Roles can assign other roles, and roles and perms can assign permissions (but permissions cannot assign roles), from within the same table.
The resulting oriented graph can then be pulled in two queries, built once and for all in a reasonable amount of time using whichever language you're using, and cached into Memcache or similar for subsequent use.
From there, pulling a user's permissions is a matter of checking which roles he has, and processing them using the permission graph to get the final permissions. Check permissions by verifying that a user has the specified role/permission or not. And then run your query/issue an error based on that permission check.
You can extend the check for individual nodes (i.e. check_perms($user, 'users.edit', $node) for "can edit this node" vs check_perms($user, 'users.edit') for "may edit a node") if you need to, and you'll have something very flexible/easy to use for end users.
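The role/permission graph resolution described above can be sketched like this. The graph data, node naming scheme, and function names are made up for illustration; in practice the `ASSIGNS` map would be built from the merged roles/permissions table and cached (e.g. in Memcache), as suggested.

```python
# Sketch: resolve a user's effective permissions from the merged
# roles/permissions graph. Roles may assign roles and perms; perms
# may assign perms; perms never assign roles.
ASSIGNS = {
    "role:admin":  ["role:editor", "perm:users.ban"],
    "role:editor": ["perm:users.view", "perm:users.edit"],
    "perm:users.edit": ["perm:users.view"],   # a perm assigning a perm
}

def effective_permissions(roles):
    seen, stack = set(), list(roles)
    while stack:                  # walk the oriented graph once
        node = stack.pop()
        if node in seen:
            continue              # guard against cycles / repeats
        seen.add(node)
        stack.extend(ASSIGNS.get(node, []))
    return {n for n in seen if n.startswith("perm:")}

def check_perms(user_roles, perm):
    return f"perm:{perm}" in effective_permissions(user_roles)

print(check_perms(["role:admin"], "users.edit"))   # True
print(check_perms(["role:editor"], "users.ban"))   # False
```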
As the opening example should illustrate, be wary of steering too much towards row-level permissions. The performance bottleneck is less in checking an individual node's permissions than it is in pulling a list of valid nodes (i.e. only those that the user can view or edit). I'd advise against anything beyond flags and user_id fields within the rows themselves if you're not (very) well versed in query optimization.
"This means that I can exercise blatant disregard for data normalization as I will never have more than a couple of hundred possible records."
The number of rows you expect isn't a criterion for choosing which normal form to aim for.
Normalization is concerned with data integrity. It generally increases data integrity by reducing redundancy.
The real question to ask isn't "How many rows will I have?", but "How important is it for the database to always give me the right answers?" For a database that will be used to implement an ACL, I'd say "Pretty danged important."
If anything, a low number of rows suggests you don't need to be concerned with performance, so 5NF should be an easy choice to make. You'll want to hit 5NF before you add any id numbers.
"A query to figure out if a user was allowed somewhere would look like this:"

SELECT id FROM resources WHERE name = ?
SELECT * FROM permissions WHERE role_id = ? AND resource_id = ? ($user_role_id, $resource->id)
That you wrote that as two queries instead of using an inner join suggests that you might be in over your head. (That's an observation, not a criticism.)
SELECT p.*
FROM permissions p
INNER JOIN resources r
    ON r.id = p.resource_id
   AND r.name = ?
You can use a SET column to store the permissions:

CREATE TABLE permission (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(64),
    perm SET('create', 'edit', 'delete', 'view'),
    resource_id INTEGER
);

System for tracking changes in whois records

What's the best storage mechanism (in terms of the database to be used and the layout for storing all the records) for a system built to track whois record changes? The program will be run once a day, and a record should be kept of what the previous value was and what the new value is.
Suggestions on the database, and thoughts on how to store the different records/fields so that data is not redundant/duplicated?
(Added) My thoughts on one mechanism to store data
Example case showing sale of one domain "sample.com" from personA to personB on 1/1/2010
Table_DomainNames

DomainId | DomainName
---------|------------
1        | example.com
2        | sample.com

Table_ChangeTrack

DomainId | DateTime | RegistrarId | RegistrantId | (others)
---------|----------|-------------|--------------|---------
2        | 1/1/2009 | 1           | 1            |
2        | 1/1/2010 | 2           | 2            |

Table_Registrars

RegistrarId | RegistrarName
------------|--------------
1           | GoDaddy
2           | 1&1

Table_Registrants

RegistrantId | RegistrantName
-------------|---------------
1            | PersonA
2            | PersonB
All tables are "append-only". Does this model make sense? Table_ChangeTrack should only be appended to when there is a change in ANY of the monitored fields.
Is there any way of making this more efficient / tighter from a size point of view?
The primary data is the existence or changes to the whois records. This suggests that your primary table be:
<id, domain, effective_date, detail_id>
where the detail_id points to actual whois data, likely normalized itself:
<detail_id, registrar_id, admin_id, tech_id, ...>
But do note that most registrars consider the information their property (whether it is or not) and have warnings like:
"TERMS OF USE: You are not authorized to access or query our Whois database through the use of electronic processes that are high-volume and automated except as reasonably necessary to register domain names or modify existing registrations..."
From which you can expect that they'll cut you off if you read their databases too much.
You could:
- store the checksum of a normalized form of the whois record data fields for comparison.
- store the original and current version of the data (possibly in compressed form), if required.
- store diffs of each detected change (possibly in compressed form), if required.
It is much like how incremental backup systems work. Maybe you can get further inspiration from there.
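The checksum-for-comparison idea above can be sketched as follows. The normalization step (sorted keys, canonical JSON) is an assumption about how to make the comparison stable; the field names are taken from the question's example tables.

```python
import hashlib
import json

def record_checksum(record):
    # Normalize the record (sorted keys, canonical separators) so that
    # mere field-order differences don't register as "changes".
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

old = {"registrar": "GoDaddy", "registrant": "PersonA"}
new = {"registrant": "PersonA", "registrar": "GoDaddy"}   # same data, other order
print(record_checksum(old) == record_checksum(new))        # True: nothing to record

changed = {"registrar": "1&1", "registrant": "PersonB"}
# A mismatch is the trigger to append a Table_ChangeTrack row.
print(record_checksum(old) == record_checksum(changed))    # False
```

Storing only the checksum plus diffs keeps the daily run cheap in both comparison time and storage.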
You can write VBScript in an Excel file to go out and query a web page (in this case, the particular whois URL for a specific site) and then store the results back to a worksheet in Excel.