Is it okay to put Object ids into the dataset attribute? - html

so I just wanted to know if putting data in a dataset of an element is considered a security flaw even though it is meant to be seen.
For example, if instagram put the id of each post from their database into the dataset attribute in each post element
Another example would be:
Putting the id of the post in a dataset

OWASP calls this Insecure Direct Object Reference (IDOR) -- when you expose a "direct reference" (database ID, etc.) to an internal object to the client -- and it absolutely can be a security issue.
To quote OWASP here, which is a much better expert on this than any one person:
IDOR do not bring a direct security issue because, by itself, it reveals only the format/pattern used for the object identifier. IDOR brings, depending on the format/pattern in place, a capacity for the attacker to mount an enumeration attack in order to try to probe access to the associated objects.
So in essence, if you surface database IDs and thus the pattern they change in, and you have some sort of access control issue (you're relying on security by obscurity, or you have some sort of bug in your access control), an attacker can find their way to any object on your system because they know the ID scheme all objects follow and thus can enumerate to any object in your database.
This is a major flaw, but it's by no means the only one. Check out the OWASP cheat sheet on this for more details!

Related

Access control: RBAC with additional group memberships instead of object properties

Given an application that shows objects (e.g. films) according to certain user permissions.
The general permission to show or create objects is implemented as RBAC with roles and permissions.
The specific permission to access an object with certain attributes (e.g. a film with the attribute “drama”) should be implemented with memberships. That means the object doesn’t have the property “drama”, it is a member of the group “drama”. If the user and the object are members in the same group, the user has the specific permission to access this object. There can be different groups for showing, creating or deleting an object, like a simple viewer group or some kind of editor group. Furthermore there is a table that specifies which group types are relevant for certain actions on certain objects. For example relevant groups for the action “show” on the object "film" could be “genre” and “age” (film's suitability for certain audiences).
The reason to implement it in the described way is to have great flexibility without touching the code. Changes to groups can be processed in the database.
General database design:
Example: The film "The Revenant" is a member of the groups "genre:drama" and "age:18". The user can access it, if he is a member of these groups too.
Does this sound like a good approach? Are there any existing solutions that are similar to this approach? Does it have major drawbacks (e.g. too many database queries - there may be several hundred users every day)?
Please share your thoughts on this issue with me - the choice of "drama" as category for the example is not a coincidence ;) I just dont know if this is a dead end or if I am heading to the right direction. I stuck at this point for quite a while.
At least you have a good sense of humor :-)
Your approach sounds fine. So long as you keep the number of parameters low, then you can get away with role-based access control (RBAC) and a few additional parameters e.g. group membership.
But in the long run, if you want to implement business-driven authorization (access control), you need a way to do this independently of your code: you do not want to rewrite your app code every time there is a requirements change.
To do so, there is an access control model called Attribute-Based Access Control (ABAC) that will let you define your authorization policies independently of your code.
In ABAC, you have the following concepts:
an architecture which defines a policy enforcement point (PEP) and a policy decision point (PDP). The PEP sits in front of (or within) your app. It intercepts the business requests (e.g. a request to view a film) and sends an authorization request to the PDP. The PDP is configured with policies. Based on the request the PDP will reach a decision: either yes, Permit or no, Deny.
a policy language: the policy language is attribute-based (hence the name ABAC). This means that you can use any number of attributes (e.g. user role, user id, user group memberships, but also user age, user location, user subscription as well as resource attributes such as movie rating, movie category, movie price...)
a request / response scheme: this is how you ask for authorization. It is essentially a yes/no flow. "Can a user do X?", "Yes they can."
There are several implementations of ABAC out there - some of which are framework-specific e.g. CanCanCan. XACML and ALFA are two approaches that are not tied to any particular framework. You can choose from open-source and commercial implementations of either language e.g.:
Open Source: SunXACML, ATT XACML
Commercial: Axiomatics Policy Server

How do I prevent my HTML from being exploitable while avoiding GUIDs?

I've recently inherited a ASP.NET MVC 4 code base. One problem I noted was the use of some database ids (ints) in the urls as well in html form submissions. The code in its present state is exploitable through both URL tinkering and creating custom HTML posts with different numbers.
Now while I can easily fix the URL problems by using session state or additional auth checks i'm less sure about the database ids that get embedded into the HTML that the site spits out (i.e. I give them a drop down to fill). When the ids come back in a post how can I be sure I put them there as valid options?
What is considered "best practice" in terms of addressing this problem?
While I appreciate I could just "GUID it up" I'm hesitant to do so because I find them a pain in the ass to work with when debugging databases.
Do I have a choice here? Must I GUID to prevent easy guessing of ids or is there some kind of DRY mechanism I can use to validate the usage of ids as they come back into the site?
UPDATE: A commenter asked about the exploits I'm expecting. Lets say I spit out a HTML form with a drop down list of all the locations one can import "treasure" from. The id of the locations that the user owns are 1,2 and 3, these are provided in the HTML. But the user examines the html, fiddles with it and decides to put together a POST with the id of 4 selected. 4 is not his location, its someone else's.
Validate the ID passed against the IDs the user can modify.
It may seem tedious, but this is really the only way to make sure the user has access to what they're trying to modify. Using GUIDs without validation is security by obscurity: sure guessing them is hard, but you can potentially guess them given enough resources.
You can do this at the top of the controller before you do anything else with the posted data. If there's a violation, just throw an exception and have your global exception handler deal with it; you don't need to handle it in a pretty way since you can safely assume that the user is tampering with data in an unsupported way.
The issue you describe is known as "insecure direct object references," and the OWASP group recommends two policies for dealing with this issue:
using session-based indirect object references, and
validating all accesses to object references.
An example of Suggestion #1 would be that instead of having dropdown options 1, 2, and 3, you assign each option a GUID that is associated with the original ID in a map in the user's session. When you get a POST from that user, you check to see what object the given ID was supposed to be tied to. OWASP's ESAPI has some libraries to help with this in various languages.
But in many cases Suggestion #1 is actually counterproductive. For example, in many cases you want to have URLs that can be copy/pasted from one user to another. Process #2 is generally seen as the most foolproof way to address this issue.
You are describing Broken Access Control with Insecure Ids. Once you've identified the threat and decided which Ids are owned by certain users, ensure checks are in place for this server side.

What's the proper place for input data validation?

(Note: these two questions are similar, but more specific to ASP.Net)
Consider a typical web app with a rich client (it's Flex in my case), where you have a form, an underlying client logic that maps the form's input to a data model, some way of remoting these objects to a server logic, which usually puts it in a database.
Where should I - generally speaking - put the validation logic, i. e. ensuring correct format of email adresses, numbers etc.?
As early as possible. Rich client frameworks like Flex provide built-in validator logic that lets you validate right upon form submission, even before it reaches your data model. This is nice and responsive, but if you develop something extensible and you want the validation to protect from programming mistakes of later contributors, this doesn't catch it.
At the data model on the client side. Since this is the 'official' representation of your data and you have data types and getters / setters already there, this validation captures user errors and programming errors from people extending your system.
Upon receiving the data on the server. This adds protection from broken or malicious clients that may join the system later. Also in a multi-client scenario, this gives you one authorative source of validation.
Just before you store the data in the backend. This includes protection from all mistakes made anywhere in the chain (except the storing logic itself), but may require bubbling up the error all the way back.
I'm sort of leaning towards using both 2 and 4, as I'm building an application that has various points of potential extension by third parties. Using 2 in addition to 4 might seem superfluous, but I think it makes the client app behave more user friendly because it doesn't require a roundtrip to the server to see if the data is OK. What's your approach?
Without getting too specific, I think there should validations for the following reasons:
Let the user know that the input is incorrect in some way.
Protect the system from attacks.
Letting the user know that some data is incorrect early would be friendly -- for example, an e-mail entry field may have a red background until the # sign and a domain name is entered. Only when an e-mail address follows the format in RFC 5321/5322, the e-mail field should turn green, and perhaps put a little nice check mark to let the user know that the e-mail address looks good.
Also, letting the user know that the information provided is probably incorrect in some way would be helpful as well. For example, ask the user whether or not he or she really means to have the same recipient twice for the same e-mail message.
Then, next should be checks on the server side -- and never assume that the data that is coming through is well-formed. Perform checks to be sure that the data is sound, and beware of any attacks.
Assuming that the client will thwart SQL injections, and blindly accepting data from connections to the server can be a serious vulnerability. As mentioned, a malicious client whose sole purpose is to attack the system could easily compromise the system if the server was too trusting.
And finally, perform whatever checks to see if the data is correct, and the logic can deal with the data correctly. If there are any problems, notify the user of any problems.
I guess that being friendly and defensive is what it comes down to, from my perspective.
There's only a rule which is using at least some kind of server validation always (number 3/4 in your list).
Client validation (Number 2/1) makes the user experience snappier and reduces load (because you don't post to the server stuff that doesn't pass client validation).
An important thing to point out is that if you go with client validation only you're at great risk (just imagine if your client validation relies on javascript and users disable javascript on their browser).
There shoudl definitely be validation on the server end. I am thinking taht the validation should be done as early as possible on the server end, so there's less chance of malicious (or incorrect) data entering the system.
Input validation on the client end is helpful, since it makes the interface snappier, but there's no guarantee that data coming in to the server has been through the client-side validation, so there MUST be validation on the server end.
Because of security an convenience: server side and as early as possible
But what is also important is to have some global model/business logic validation so when you have for example multiple forms with common data (for example name of the product) the validation rule should remain consistent unless the requirements says otherwise.

How to avoid Anemic Domain Models and maintain Separation of Concerns?

It seems that the decision to make your objects fully cognizant of their roles within the system, and still avoid having too many dependencies within the domain model on the database, and service layers?
For example: Say that I've got an entity with a revision history, and several "lookup tables" that the data references, your entity object should have methods to get the details from some of the lookup tables, whether by providing access to the lookup table rows, or by delegating methods down to them, but in order to do so it depends on the database layer to read the data from those rows. Also, when the entity is saved, It needs to know not only how to save itself, but also to save entries into the revision history. Is it necessary to pass references to dozens of different data layer objects and service objects to the model object? This seems like it makes the logic far more complex to understand than just passing back and forth thin models to service layer objects, but I've heard many "wise men" recommending this sort of structure.
Really really good question. I have spent quite a bit of time thinking about such topics.
You demonstrate great insight by noting the tension between an expressive domain model and separation of concerns. This is much like the tension in the question I asked about Tell Don't Ask and Single Responsibility Principle.
Here is my view on the topic.
A domain model is anemic because it contains no domain logic. Other objects get and set data using an anemic domain object. What you describe doesn't sound like domain logic to me. It might be, but generally, look-up tables and other technical language is most likely terms that mean something to us but not necessarily anything to the customers. If this is incorrect, please clarify.
Anyway, the construction and persistence of domain objects shouldn't be contained in the domain objects themselves because that isn't domain logic.
So to answer the question, no, you shouldn't inject a whole bunch of non-domain objects/concepts like lookup tables and other infrastructure details. This is a leak of one concern into another. The Factory and Repository patterns from Domain-Driven Design are best suited to keep these concerns apart from the domain model itself.
But note that if you don't have any domain logic, then you will end up with anemic domain objects, i.e. bags of brainless getters and setters, which is how some shops claim to do SOA / service layers.
So how do you get the best of both worlds? How do you focus your domain objects only domain logic, while keeping UI, construction, persistence, etc. out of the way? I recommend you use a technique like Double Dispatch, or some form of restricted method access.
Here's an example of Double Dispatch. Say you have this line of code:
entity.saveIn(repository);
In your question, saveIn() would have all sorts of knowledge about the data layer. Using Double Dispatch, saveIn() does this:
repository.saveEntity(this.foo, this.bar, this.baz);
And the saveEntity() method of the repository has all of the knowledge of how to save in the data layer, as it should.
In addition to this setup, you could have:
repository.save(entity);
which just calls
entity.saveIn(this);
I re-read this and I notice that the entity is still thin because it is simply dispatching its persistence to the repository. But in this case, the entity is supposed to be thin because you didn't describe any other domain logic. In this situation, you could say "screw Double Dispatch, give me accessors."
And yeah, you could, but IMO it exposes too much of how your entity is implemented, and those accessors are distractions from domain logic. I think the only class that should have gets and sets is a class whose name ends in "Accessor".
I'll wrap this up soon. Personally, I don't write my entities with saveIn() methods, because I think even just having a saveIn() method tends to litter the domain object with distractions. I use either the friend class pattern, package-private access, or possibly the Builder pattern.
OK, I'm done. As I said, I've obsessed on this topic quite a bit.
"thin models to service layer objects" is what you do when you really want to write the service layer.
ORM is what you do when you don't want to write the service layer.
When you work with an ORM, you are still aware of the fact that navigation may involve a query, but you don't dwell on it.
Lookup tables can be a relational crutch that gets used when there isn't a very complete object model. Instead of things referencing things, you have codes, which must be looked up. In many cases, the codes devolve to little more than a static pool of strings with database keys. And the relevant methods wind up in odd places in the software.
However, if there is a more complete object model, we have first-class things instead of these degenerate lookup values.
For example, I've got some business transactions which have one of n different "rate plans" -- a kind of pricing model. Right now, the legacy relational database has the rate plan as a lookup table with a code, some pricing numbers, and (sometimes) a description.
[Everyone knows the codes -- the codes are sacred. No one is sure what the proper descriptions should be. But they know the codes.]
But really, a "rate plan" is an object that is associated with a contract; the rate plan has the method that computes the final price. When an app asks the contract for a price, the contract delegates some of the pricing work to the associated rate plan object.
There may have been some database query going on to lookup the rate plan when producing a contract price, but that's incidental to the delegation of responsibility between the two classes.
I aggree with DeadBeef - therein lies the tension. I don't really see though how a domain model is 'anemic' simply because it doesn't save itself.
There has to be much more to it. ie. It's anemic because the service is doing all the business rules and not the domain entity.
Service(IRepository) injected
Save(){
DomainEntity.DoSomething();
Repository.Save(DomainEntity);
}
'Do Something' is the business logic of the domain entity.
**This would be anemic**:
Service(IRepository) injected
Save(){
if(DomainEntity.IsSomething)
DomainEntity.SetItProperty();
Repository.Save(DomainEntity);
}
See the inherit difference ? I do :)
Try the "repository pattern" and "Domain driven design". DDD suggests to define certain entities as Aggregate-roots of other objects. Each Aggregate is encapsulated. The entities are "persistence ignorant". All the persistence-related code is put in a repository object which manages Data-access for the entity. This way you don't have to mix persistence-related code with your business logic. If you are interested in DDD, check out eric evans book.

How do you handle exceptional cases

This is often situation, but here is latest example:
Companies have various contact data (addresses, phone numbers, e-mails...) when they make job ad, they have checkboxes where they choose how they want to be contacted. It is basically descriptive data. User when reading an ad sees something like "You can apply by mail, in person...", except if it's "through web portal" or "by e-mail" because then appropriate buttons should appear. These options are stored in database, and client (owner of the site, not company making an ad) can change them (e.g. they can add "by telepathy" or whatever), yet if they tamper with "e-mail" and "web-portal" options, they screw their web site.
So how should I handle data where everything behaves same way except "this thing" that behaves this way, and "that thing" that behaves some other way, and data itself is live should be editable by client.
You've tagged your question as "language-agnostic", and not all languages cleanly support polymorphism, but that's the way I would approach this.
Each option has some type, and different types require different properties to be set. However, every type supports some sort of "render" method that can display the contact method as needed. Since the properties (phone number, or web address, etc.) are type-specific, you can validate the administrator's input when creating these "objects", to make sure that the necessary data is provided and valid. Since you implement the render method, rather than spitting out HTML provided by a user, you can ensure that the rendered page is correct. It's less flexible, but safer and more user friendly.
In the database, you can have one sparsely populated table that holds data for all types of contacts, or a "parent" table with common properties and sub-tables with type-specific properties. It depends on how many types you have and how different they are. In either case, you would have some sort of type indicator, so that you know the type of object to which the data should be be bound.
First of all, think twice do you really need it. Reason is simple. You are supposed to serve specific need and input data is a mean to provide that service. If data does not fit with existing service then what is its value and who are consumer of that specific information?
There are two possible answers: You are expanding your client base or you need to change existing service because of change of demand. In both cases you need to star from development of business model. If you describe what service you need and what information it should provide you will avoid much of specific data and come with clear requirements easy to implement in software.
I'd recommend the resolution pattern for this, based on the mention of a database. The link above describes it, but it's actually a lot simpler than it sounds. You write a database query that returns all the possible options (for example, you read the standard options and the customized options together using perhaps a UNION or a JOIN depending on your schema) - the COALESCE SQL keyword is then useful to find the first 'resolution' of the option value that isn't NULL.
Well, if all it is is that you have two options that are special, and then anything else is dealt with in the same way, then store your options as strings, and if either of the two special ones appears in that list, then show the appropriate stuff for that special item.
Just check your list of items for the two special ones. Nothing fancy.
By writing a very simple Rules Engine. You can use an out-of-the box implementation, or you can roll your own. Since your case seems so simple, I tend to roll my own, because it means less dependencies (YMMV).