I would like to implement row-based access control for Bigtable using roles, so that each row may allow one to many roles. I do not know how I could get this into the key easily.
What patterns are available for implementing something like this?
My current use case is making a prefix range lookup with my other indices as part of the key already.
Unfortunately, Bigtable does not currently support native row-level ACLs. The right approach really depends on your use case, but there are several ways to implement this yourself. Here are some possible suggestions:
Apache Accumulo, another Bigtable-style store built on Hadoop (not a wrapper around HBase), implements cell-level security labels natively; switching stores may be an option if row-level ACLs are a hard requirement.
A proxy layer in front of the table that checks permissions, stored either on the table itself or in a secondary permissions table, is viable but quite complex to implement yourself. From what I understand, that is the solution you may have gone with, but keep in mind the caveat: you must implement the check for complex queries as well, which can make it very inefficient (and convoluted to maintain as new features arrive).
You can also approximate row-level ACLs with read filters: mark each row with the roles permitted to see it (for example as column qualifiers in a dedicated column family) and, at read time, interleave one ColumnFilter per role the caller holds. A sketch, assuming rr is the prefix range you already build from your other indices:

roles := []bigtable.Filter{
	bigtable.ColumnFilter("public"),
	bigtable.ColumnFilter("admin"),
}
// Rows carrying any one of the caller's roles pass the filter.
acl := bigtable.InterleaveFilters(roles...)
table.ReadRows(context.Background(), rr, func(r bigtable.Row) bool {
	// do something with the row
	return true // keep scanning
}, bigtable.RowFilter(acl))
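Note that ColumnFilter interprets its argument as an RE2 regular expression over the column qualifier, so anchor the pattern if one role name can be a prefix of another. Also keep in mind that a filter is a convenience, not a security boundary: any caller with read access to the table can simply issue a scan without it.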
Assume I have a schema defined with the 4 following entities:
Users
-> Timeline (fk: userId)
-> Entries (fk: timelineId)
-> Tags (fk: entryId), where fk means foreign key.
Now, let's say I want to check in the web application whether a user has permission to delete a particular tag. Right now I use Basic Authentication, check if the user's email / password combination exists in the database, and if so, grab the userId.
Because the userId only exists on the Timeline entity, I feel like I'd need to do the following:
DELETE t.* FROM `tags` AS t
INNER JOIN `entries` AS e ON t.entryId = e.id
INNER JOIN `timelines` AS tl ON e.timelineId = tl.id
WHERE
tl.userId = ? AND
t.id = ?
This approach works, but I feel like it would be inefficient. Instead, I could add a userId FK to every single table such as the tags, but that also seems like a maintenance nightmare.
I can't think of any other approaches other than implementing some other type of permission system, such as using an ACL. Any suggestions?
I think you can choose from a few options:
leave it as is until it actually becomes a problem and fix it then (it probably won't be a problem, there's a lot of optimization in MySQL)
add foreign keys to tables as you proposed and take the overhead on changes (your models / data access layer should hide the problem from higher layers anyway); see the sketch below
implement some kind of custom caching
you can create something like a cache table, for example in a NoSQL store like Redis, which would be very fast (once a permission is retrieved it can live in the cache for a while, but be aware of the consequences: for example, permission changes will not take effect immediately)
you can use memcache
you can do custom in-memory caching in your app (be careful with using the session for this, session-related vulnerabilities might allow an attacker more access than you intended)
Basically it's a compute / storage tradeoff: you either compute permissions every time, or you store them pre-computed somewhere, which means you need to re-compute them sometimes (but probably not all the time).
The right solution depends on your exact scenario. My experience is that in most cases it's not worth fixing something that is not broken yet (unless, of course, you know it will not work that way in the scenario you want to use it in).
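To make the second option concrete: a sketch, assuming you denormalize a userId column onto tags and keep the copied column in sync on every write. The three-way join from the question then collapses to:

DELETE FROM tags
WHERE id = ? AND userId = ?;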
Check out MySQL's foreign keys. You can simply add relationships through MySQL to the other tables, and cascade deletes when the parents get removed.
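For instance, a sketch using the table names from the question (the constraint names are made up, and this assumes InnoDB tables, since MyISAM ignores foreign keys):

ALTER TABLE entries
  ADD CONSTRAINT fk_entries_timeline
  FOREIGN KEY (timelineId) REFERENCES timelines (id)
  ON DELETE CASCADE;

ALTER TABLE tags
  ADD CONSTRAINT fk_tags_entry
  FOREIGN KEY (entryId) REFERENCES entries (id)
  ON DELETE CASCADE;

With these in place, deleting a timeline removes its entries and their tags automatically.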
I currently have a tree structure containing data elements; on each of these, it is possible to perform basic CRUD operations. From here, I need to implement per-user permissions for each of these four actions. So a given user could be given Create and Read, but no Update or Delete permission. These permissions would then cascade down the tree to any children of the permitted object; therefore, that given user would have Create and Read for any child objects of the root object.
What's the best way of storing these permissions in an SQL database (MySQL with PHP, specifically)? Currently I'm thinking the ideal solution may be to create another table which tracks a user ID, an object ID, and a list of booleans for each possible permission. A check would then look up the user ID and object ID in the permission table, traveling up the tree until a permission row is found (or not found, as the case may be).
My main issue with this is twofold. Firstly, it makes it impossible to give permission to one object, but not its children. Secondly, it seems like it might cause a performance hit on particularly deep objects. So, what seems like a good way of going about this?
Recursive data structures are often hard to "map" to SQL queries. Some databases have special support for them (Oracle's CONNECT BY, for example), and MySQL 8.0 added recursive CTEs, but older MySQL versions have no built-in support (you can work around that, but it's clumsy).
We needed something similar in our application. Our solution was to store normalized data (i.e. "user X has permission Y on node Z" -> three columns with FK relations) in a simple table.
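In MySQL terms, that table could look something like this (a sketch; the users and nodes tables and all names are illustrative):

CREATE TABLE node_permissions (
    user_id    INT NOT NULL,
    node_id    INT NOT NULL,
    permission ENUM('create', 'read', 'update', 'delete') NOT NULL,
    PRIMARY KEY (user_id, node_id, permission),
    FOREIGN KEY (user_id) REFERENCES users (id),
    FOREIGN KEY (node_id) REFERENCES nodes (id)
);

A row means "this user has this permission on this node" (and, per your cascading rule, on its subtree).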
A DAO/manager object reads this table and builds a cache where it can quickly look up permissions as we need them.
To summarize: Keep the database simple and write special helper code in your application to transform the database into the structure which you need.
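That said, if you are on MySQL 8+ and ever want to resolve a permission in SQL rather than in the application cache, a recursive CTE can walk up the tree to the nearest node with an explicit row. A sketch against the illustrative node_permissions table above, assuming nodes has a parent_id column:

WITH RECURSIVE ancestors AS (
    SELECT id, parent_id, 0 AS depth FROM nodes WHERE id = ?
    UNION ALL
    SELECT n.id, n.parent_id, a.depth + 1
    FROM nodes AS n JOIN ancestors AS a ON n.id = a.parent_id
)
SELECT 1
FROM ancestors AS a
JOIN node_permissions AS p
  ON p.node_id = a.id AND p.user_id = ? AND p.permission = 'delete'
ORDER BY a.depth
LIMIT 1;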
I have an application that allows users to filter applicants based on a very large set of criteria. The criteria are each represented by boolean columns spanning multiple tables in the database. Instead of using Active Record models, I thought it best to use pure SQL and put the bulk of the work in the database. In order to do this I have to construct a rather complex SQL query based on the criteria the users selected and then run it through AR on the db. Is there a better way to do this? I want to maximize performance while keeping the code maintainable and non-brittle. Any help would be greatly appreciated.
As @hazzit said, it is difficult to answer without more details, but here are my two cents. Raw SQL is usually needed to perform complex operations like aggregates, calculations, etc. However, when it comes to search / filtering features, I often find raw SQL overkill and not very maintainable.
The key question here is: can you break your problem down into multiple independent filters?
If the answer is yes, then you should leverage the power of ActiveRecord and Arel. I often find myself implementing something like this in my model:
scope :a_scope, ->{ where something: true }
scope :another_scope, ->( option ){ where an_option: option }
scope :using_arel, ->{ joins(:assoc).where Assoc.arel_table[:some_field].not_eq "foo" }
# cue a bunch of scopes

def self.search( options = {} )
  relation = all # start from the model's base relation
  relation = relation.a_scope if options[:an_option]
  relation = relation.another_scope( options[:another_option] ) unless options[:flag]
  # add logic as you need it
  relation # return the composed relation
end
The beauty of this solution is that you declare a clean interface into which you can directly pour all the params from your checkboxes and fields, and which returns a relation. Breaking the query into multiple reusable scopes helps keep the whole thing readable and maintainable; using a search class method ties it all together and allows thorough documentation. And all in all, using Arel helps secure the app against injections.
As a side note, this does not prevent you from using raw SQL, as long as the query can be isolated inside a scope.
If this method is not suited to your needs, there's another option: use a full-fledged search / filtering solution like Sunspot. This uses another store, separate from your db, that indexes defined parts of your data for easy and performant search.
It is hard to answer this question fully without knowing more details, but I'll try anyway.
While databases are bad at quite a few things, they are very good at filtering data, especially at high volumes.
If you do the filtering in Ruby on Rails (or just about any other programming language), the system will have to retrieve all of the unfiltered data from the database, which will cause tons of disk I/O and network (or interprocess) traffic. It then has to go through all those unfiltered results in memory, which may be quite a burden on RAM and CPU.
If you do the filtering in the database, there is a pretty good chance that most of the records will never be retrieved from disk, handed over to RoR, or filtered in your code at all. Indexes exist for precisely this purpose: avoiding expensive operations in order to speed things up (yes, they also help maintain data integrity).
To make this work, however, you may need to help the database do its job efficiently. You will have to create indexes matching your filtering criteria, and you may have to look into performance issues with certain types of queries (how to avoid temporary tables and such). However, it is definitely worth it.
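For example, a sketch assuming an applicants table with two of your boolean criteria columns (names illustrative):

CREATE INDEX idx_applicants_flags ON applicants (is_licensed, is_remote);

A composite index like this lets MySQL satisfy WHERE is_licensed = 1 AND is_remote = 1 from the index instead of scanning the table, though low-cardinality boolean columns only pay off when the combination is actually selective.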
That said, there actually are a few types of queries that a given database is not good at. They are few and far between, but they do exist. In those cases, an implementation in RoR might be the better way to go. Even without knowing more about your scenario, I'd say it's a pretty safe bet that your queries are not among them.
In MySQL, is it possible to append default columns after creation, or to create them automatically? A brief overview:
All tables must have 5 fields that are standardized across our databases (created_on, created_by, row_status etc.). It's sometimes hard for developers to remember to do this, and it's not always done uniformly. Going forward we'd like to automate the task somehow. Does anyone know if it's possible to create some sort of internal MySQL script that will automatically append a set of columns to a table?
After reading through some responses, I'd rephrase the question a bit: rather than making it an automatic task (i.e. for EVERY table), make it a function that can be user-triggered to go through, check for said columns, and add them if missing. I'm pretty confident this is out of SQL's scope and would require a scripting language. Not a huge issue, but it would have been preferable to keep things encapsulated within SQL.
I'm not very aware of MySQL-specific data modeling tools, but there's no infrastructure to add columns to every table ever created in a database. Making this an automatic behavior would get messy too, when you think about situations where someone added the columns but with typos, or where you have tables that are allowed to go against the business practice (the columns you listed would typically be worthless on code tables)...
Development environments are difficult to control, but the best means of controlling this is delegating the responsibility and permissions to as few people as possible. I.e.: there may be 5 developers, but only one of them can apply scripts to TEST/PROD/etc., so it's their responsibility to review the table scripts for correctness.
I would say first: don't do that.
Make an audit table separately and link it with triggers.
Otherwise, you will need to feed your table construction through a procedure or other application that will create what you want.
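Something like this, as a sketch (the audit_log shape, the orders table, and its id column are illustrative stand-ins for your real schema):

CREATE TABLE audit_log (
    table_name VARCHAR(64)  NOT NULL,
    row_id     INT          NOT NULL,
    created_by VARCHAR(128) NOT NULL,
    created_on DATETIME     NOT NULL
);

CREATE TRIGGER orders_audit_insert
AFTER INSERT ON orders
FOR EACH ROW
  INSERT INTO audit_log (table_name, row_id, created_by, created_on)
  VALUES ('orders', NEW.id, CURRENT_USER(), NOW());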
I'd first defer to Randy's answer - this info is probably better extracted elsewhere.
That said, if you're set on adding the columns, ALTER TABLE is probably what you're looking for. You might also consider including some extra logic to determine which columns are missing for each table.
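For the "check and report" part, INFORMATION_SCHEMA makes this reasonably painless. A sketch for one of the five columns; run it once per required column, or extend the NOT EXISTS accordingly:

SELECT t.table_name
FROM information_schema.tables AS t
WHERE t.table_schema = DATABASE()
  AND t.table_type = 'BASE TABLE'
  AND NOT EXISTS (
      SELECT 1
      FROM information_schema.columns AS c
      WHERE c.table_schema = t.table_schema
        AND c.table_name   = t.table_name
        AND c.column_name  = 'created_on'
  );

Each table it reports can then be fixed up with a plain ALTER TABLE ... ADD COLUMN, either by hand or from a small script that generates the statements.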
I am currently working on a Wikipedia API, which means we have a database for each language we want to use. The structure of each database is identical; they only differ in their language. The only place where this information is stored is in the name of the database.
When starting with one language, the straightforward approach of using a mapping between the tables and the needed classes (e.g. Page) looked fine. We defined an engine and corresponding metadata. When we added a second database with its own setup for engine and metadata, we ran into the following error:
ArgumentError:
Class '<class 'wp.orm.types.pages.Page'>' already has a primary mapper defined.
Use non_primary=True to create a non primary Mapper.
clear_mappers() will remove *all* current mappers from all classes.
I found an email saying that there must be at least one primary mapper, so using this option for all databases doesn't seem feasible.
The next idea is to use sharding. For that we need a way to distinguish between the databases from the perspective of an instance, as noted in the docs:
"You need a function which can return a single shard id, given an instance to be saved; this is called "shard_chooser""
I am stuck here. Is there a way to get the database name given an object it was loaded from? Or a possibility to add a static attribute based on the engine? The alternative would be to add a language column to every table, which is just ugly.
Am I overlooking other possibilities? Any ideas how to define multiple mappers for the same class that map against tables in different databases?
I asked this question on a mailing list and got this answer from Michael Bayer:
if you'd like distinct classes to indicate that they "belong" in a different database, and you have very clear lines as to how this is performed, use the "entity_name" concept described at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/EntityName . this sounds very much like your use case.
The next idea is to use sharding. For that we need a way to distinguish between the databases from the perspective of an instance, as noted in the docs:
"You need a function which can return a single shard id, given an instance to be saved; this is called "shard_chooser""
horizontal sharding is a method of storing many homogeneous instances across multiple databases, with the implication that you're creating one big "virtual" database among partitions - the main concept is that an individual instance gets placed in different partitions based on some ruleset. This is a little like your use case as well, but since you have a very simple delineation i think the "entity name" approach is easier.
So the basic idea is to generate anonymous subclasses for each desired mapping, distinguished by the entity_name. The details can be found in Michael's link above.