Preventing dangerous user input from django tinymce - html

Say we want to use TinyMCE to allow users to enter HTML-formatted input. The django-tinymce package is a handy solution.
But to render this later as output, we have to use {{ userinput|safe }} to display it. But do we know for a fact that the original user's input is ... safe?
What in particular are the kinds of malicious HTML we need to be wary of and sanitize? What would be a sound strategy that doesn't strip out the legitimate tags TinyMCE uses while still protecting future website users who will be shown this 'safe' user input?

You can never assume that data provided to you client side is "clean" or "safe". Nefarious people can bypass your front end and all of its validation if their goal is to cause harm to your system.
You should always configure your front end appropriately. Validate data, configure TinyMCE to only allow those types of tags you want created, etc.
However, regardless of the front end design, you always have to re-check submitted content on the server to be safe. There is simply no way around that need. What constitutes "safe" is likely a business decision based on what your application does and who uses it.
There are many libraries you can use server side for this sort of validation/cleansing, so depending on your specific server-side setup (here Django, so Python) you can find libraries that "sanitize/purify" the submitted HTML.
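To make the server-side allowlist idea concrete, here is a minimal sketch using only Python's standard library (html.parser). The tag and attribute allowlist is a hypothetical assumption meant to mirror a restrictive TinyMCE configuration. The sketch keeps only listed tags and attributes, drops <script>/<style> together with their contents, rejects href values with schemes such as javascript:, and re-escapes all text. It does not repair unbalanced markup, so in production a maintained sanitizer such as nh3 (Python bindings to the Rust ammonia library) is probably a safer choice than hand-rolled code.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical allowlist: adjust it to whatever your TinyMCE configuration can produce.
ALLOWED_TAGS = {"p", "br", "strong", "em", "u", "ul", "ol", "li", "a", "blockquote"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"http", "https", "mailto", ""}  # "" = relative URL
VOID_TAGS = {"br"}                  # never emit a closing tag for these
DROP_CONTENT = {"script", "style"}  # drop these elements *and* their text


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, onerror, style, ...
            if name == "href" and urlparse(value.strip()).scheme.lower() not in ALLOWED_SCHEMES:
                continue  # drops javascript:, data:, vbscript:, ...
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))  # text is re-escaped


def sanitize(html_text: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

The sanitized string is what you would store (or re-sanitize at render time) before marking it with |safe in the template.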

Related

prevent SQL injection using html only

I am trying to validate the input of a comment box so that it accepts only text, and to show an alert if the user enters a digit (0-9) or a symbol (# $ % ^ & * + _ =), in order to prevent SQL injection.
Is there a way to do that in HTML?
You can never trust what comes from the client. You must always have a server side check to block something such as an SQL injection.
You can of course add the client side validation you mentioned but it's only to help users not enter junk data. Still can't trust it once it's sent to the server.
On using JavaScript/HTML to improve security
Is there a way to do that in HTML?
No. As others have pointed out, you cannot increase security by doing anything in your HTML or JavaScript.
The reason is that the communication between your browser and your server is totally transparent to an attacker. Any developer is probably familiar with the "developer tools" in Firefox, Chrome, etc. Even those tools, which ship with most modern browsers, are enough to craft arbitrary HTTP requests (even over HTTPS).
So your server must never rely on the validity of any part of the request. Not the URL, not the GET/POST parameters, not the cookies etc.; you always have to verify it yourself, serverside.
On SQL injection
SQL injection is best avoided by making sure never to have code like this:
sql = "select xyz from abc where aaa='" + search_argument + "'" # UNSAFE
result = db.execute_statement(sql)
That is, you never want to just join strings together to form a SQL statement.
Instead, what you want to do is use bind variables, similar to this pseudo code:
request = db.prepare_statement("select xyz from abc where aaa=?")
result = request.execute_statement_with_bind(search_argument)  # the value is bound to the ?, not spliced into the SQL
This way, user input is never parsed as SQL itself, which makes SQL injection through that argument impossible.
Of course, it is still wise to check the arguments on the client-side to improve user experience (avoid the latency of a server roundtrip); and maybe also on server-side (to avoid cryptic error messages). But these checks should not be confused with security.
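A runnable version of the two snippets above, using Python's built-in sqlite3 module, which uses ? placeholders. The in-memory table and its rows are made up for illustration; the point is that the same hostile input changes the meaning of the concatenated query but is treated as a plain value by the bound one.

```python
import sqlite3

# Hypothetical table and data, in memory, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("create table abc (aaa text, xyz text)")
conn.executemany("insert into abc values (?, ?)",
                 [("alice", "secret-a"), ("bob", "secret-b")])


def search_unsafe(search_argument):
    # UNSAFE: the argument becomes part of the SQL text and is parsed as SQL.
    sql = "select xyz from abc where aaa='" + search_argument + "'"
    return conn.execute(sql).fetchall()


def search_safe(search_argument):
    # The ? placeholder is bound to the value, which is never parsed as SQL.
    return conn.execute("select xyz from abc where aaa=?", (search_argument,)).fetchall()


attack = "nobody' or '1'='1"
print(search_unsafe(attack))  # every row comes back
print(search_safe(attack))    # no row has that literal name
```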
Short answer: no, you can't do that in HTML. Even a form with a single check box and a submit button can be abused.
Longer answer...
While I strongly disagree that there's nothing you can do in HTML and JavaScript to enhance security, a full discussion of that goes way beyond the bounds of a post here.
But ultimately you cannot assume that any data coming from a computer system you do not control is in any way safe (indeed, in a lot of applications you should not assume that data from a machine you do control is safe).
Your primary defence against any attack is to convert the data to a known safe format for both the sending and receiving components before passing it between components of your system. Here, we are specifically talking about passing data from your server-side application logic to the database. Neither HTML nor JavaScript is involved in this exchange.
Moving out towards the client, you have a choice to make. You can validate and accept or reject content for further processing based on patterns in the data, or you can process all the data and trust the lower layers to handle the content correctly. Most people take the first option, but this gives rise to a new security problem: it becomes easy to map out the defences and find any gaps. In an ideal world that would not matter much, because the deeper defences would handle the problem; in the real world, however, developers are limited by time and ability. If it comes down to a choice of where to spend your skills and time, spend them on making the output safe rather than on validating input.
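As a small illustration of converting data to a known safe format for the receiving component, this Python sketch encodes one hostile string for two different receivers using standard-library functions. The value is made up; which encoding is appropriate depends on where the string ends up (SQL values are handled by bind parameters instead).

```python
import html
from urllib.parse import quote

# One hostile (made-up) value, encoded for two different receiving contexts.
value = '"><script>alert(1)</script>'

as_html = html.escape(value)         # for HTML text or a double-quoted attribute value
as_url_part = quote(value, safe="")  # for a single URL path segment or query value

print(as_html)
print(as_url_part)
```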
The question seems more straightforward now, but I'll keep my answer the same as before:
Never validate information on the client side. It makes no sense, because you need to validate the same information with the same (or even better) methods on the server! Validating on the client side only generates unnecessary overhead, as information from a client cannot be trusted. It's a waste of energy.
If you have problems with users sending lots of different symbols but no real messages, you should shut down your server immediately! This could mean that your users are trying to find a way to hack into the server and gain control!
Some strange-looking special character combinations could allow this if the server doesn't escape user input properly!
In short:
HTML is made for displaying content, CSS for designing that content, JavaScript for interactivity, and other languages like Perl, PHP or Python for processing, delivering and validating information. These latter languages normally run on a server. Even there you need to be very careful, as there are ways to render them useless too (for instance, if you use global variables the wrong way or don't escape user input properly).
I hope this helps to get the right direction.

How can I disable semantic notations in text areas in Semantic MediaWiki Forms?

I am working on a user-moderated database and settled on MediaWiki with Semantic MediaWiki as an engine. I installed Semantic Forms to force the end users to conform to a certain standard when creating or editing entries. The problem is that since a user can add a semantic notation to any form text input, it can throw off the proper structure of the system. For example, if it were an IMDB clone, a user could add [[Directed by::Forrest Gump]], which would then result in the movie "Forrest Gump" showing up in a list of directors.
I doubt that there's any setting that can simply turn this off or on, but I've had one or two ideas as to how to get it working.
One, perhaps there's a way to disable semantic notation on specific namespaces and put the forms on those namespaces. I have a feeling that this will cause the forms to merely break.
Another idea is to modify the code. This is clearly the less ideal approach. To get started, I believe I would need to create some sort of filter on SFTextAreaInput which would disable semantic notations for the user inserted text, but alas I'm unsure as to how to get started on that.
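The filter idea could work roughly like this. Semantic Forms itself is written in PHP, so this Python sketch only illustrates the logic, under the assumption that free-text input is cleaned before it is saved. It rewrites inline annotations such as [[Directed by::Forrest Gump]] into their plain display text and leaves ordinary links and category tags alone. It does not handle other ways of setting properties, such as the #set parser function.

```python
import re

# Matches inline SMW annotations such as [[Directed by::Robert Zemeckis]]
# or [[Directed by::Robert Zemeckis|Zemeckis]]; ordinary links like
# [[Forrest Gump]] or [[Category:Movie]] contain no "::" and are left alone.
ANNOTATION = re.compile(r"\[\[([^\[\]|]+?)::([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")


def strip_annotations(text: str) -> str:
    """Replace each annotation with the text it would display."""
    def visible(m):
        # group(3) is the optional "|label"; otherwise show the value.
        return m.group(3) if m.group(3) is not None else m.group(2)
    return ANNOTATION.sub(visible, text)
```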
Well, Semantic MediaWiki is still a Wiki. In your classical enterprise database, you restrict the users' input options as a means of ensuring data integrity. That isn't what wikis do; the thinking with a wiki is, yes, the user can enter incorrect information, but another user will amend it and let the first user know what was wrong.
I wouldn't try to coerce SMW into rigid data acquisition. I mean, you do have options such as removing the standard input fields in forms:
'''Free text:'''
{{{standard input|free text|rows=10}}}
If users are selecting a movie page when they should be selecting a director page, then you probably want to encourage correct selection by populating the form control from the Directors category, like:
{{{field|Director|input type=combobox|values from category=Directors}}}
Yes, they can still go very far out of their way to select "Forrest Gump", but if that happens then the fact that someone wilfully circumvented the preselected correct options is a more pressing concern than the fact that the system permits it.
Wikis work best when the system encourages rather than enforces valid knowledge.
My name is Wolfgang Fahl, and I am behind the smartMediaWiki approach. You might want to go the smartMediaWiki route; see:
http://semantic-mediawiki.org/wiki/SMWCon_Spring_2015/smartMediaWiki
For a start, don't go by the property values alone; also filter by category, e.g.:
{{#ask: [[Category:Movie]] [[Directed by::+]]
|?Directed by
}}
will only show pages that have both the property set and are in the correct category.
In the smartMediaWiki approach you'd create a topic "Movie" and the entry of movies would be done via Forms. This is an elaboration of the SemanticForms and semantic PageSchemas idea that recently evolved. You can find out more about this at SMWCon Barcelona 2015 this fall.

Is it possible to let the client choose the right translation of a page without scripting?

I have written a website for a local Go meeting in Berlin. It is translated into German, English and Chinese. Currently, I use the naming scheme index.<lang>.html for the three translations and a navigation bar on top to let the user choose.
Is it possible to use meta tags in index.html (which is currently just a symlink) to let the user agent automagically redirect to the page in the right language, if possible? I am interested in solutions that neither involve reconfiguring the server nor require JavaScript to be enabled, although the former might be possible.
You can use HTTP content negotiation to select a version that best matches the language preference information that the browser sends. So it is possible without scripting, but you need to set things up in the server for the negotiation.
However, this is not very practical, because the language preference information cannot be relied on. It is mostly based on browser defaults, since few users even know about the relevant browser settings, still less set them appropriately.
Is it possible to use meta tags on the index.html (which currently is just a symlink) to let the user agent automagically redirect to the site with the right language if possible?
No.
If you want automatic selection, then you need to pay attention to the Accept-Language header in the request. That needs server configuration or scripting.
Without it, the best you can have is links to the translations of the document which the user can select manually.
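For the server-side route, the selection logic itself is small. This Python sketch parses an Accept-Language header and picks one of the three translations from the question, falling back from a regional tag such as zh-CN to zh and finally to English. The default language and the simplified q-value handling are assumptions, and wiring it into a web server is left out.

```python
from typing import List, Optional, Tuple

AVAILABLE = ("de", "en", "zh")  # matches index.de.html, index.en.html, index.zh.html


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language-tag, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        tag, *params = [p.strip() for p in item.split(";")]
        if not tag:
            continue
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((tag.lower(), q))
    # sorted() is stable even with reverse=True, so ties keep header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_language(header: Optional[str], default: str = "en") -> str:
    for tag, q in parse_accept_language(header or ""):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if tag == "*":
            return default
        for candidate in (tag, tag.split("-")[0]):  # "zh-cn" falls back to "zh"
            if candidate in AVAILABLE:
                return candidate
    return default
```

A handler would then redirect to (or serve) index.<lang>.html for the chosen language. Because the response depends on a request header, it should also carry a Vary: Accept-Language header so that caches don't serve one visitor's language to another.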

Design: Exposing user interface behaviour to external systems

I'm working on a web application (Java EE backend) which contains a fairly complex input modal. This input modal allows the user to capture data, but it has a bunch of (JavaScript) restrictions, such as mandatory fields, fields only being available if a specific value is entered, etc.
I have to expose this functionality to external systems and allow them to submit this data to my server. These external systems can be both web or client based (but I can assume that the clients will have internet access). My first thought is to provide some kind of definition of the fields and stuff like mandatory to these systems through services, and have them render the input modal however they want. This has been met with resistance though, because the types of fields and restrictions will likely change quite a bit during the next few months of development. These external systems have different deployment timelines, and for this to work we'll have to firstly duplicate all the logic handling these restrictions across all systems, and secondly synchronize our deployments.
An alternative which has been proposed is to have the external systems call my modal through standard HTTP and render it either in an iframe or in an embedded renderer. This solves all of the previous complaints, but it leaves me feeling a little uneasy.
Are there any alternatives we are not thinking of? Maybe some kind of UI schema with existing render libraries for the different platforms? What are your thoughts on the second proposal, any major concerns or is this the "best" solution?
Edit: To clarify, I'll of course still perform backend validation regardless of the frontend decision, as I can't just trust the incoming data.
The constraints that you mention (mandatory fields etc.) really have nothing to do with the user interface. You are also right to feel uneasy about having your backend render web content for other systems.
Your first proposal sounds like a good idea, here's how I would solve the issues you mentioned:
Do all the validation on the backend and send a model object to the client, representing the current state of the UI (field name, type, enabled/disabled, error message etc.).
Keep the client as dumb as possible. It should only be responsible for rendering the model on a window / webpage. Whenever a field is changed and it requires validation, submit the model to the backend for validation and get back a new model to be displayed. (You could optimize this by only returning the fields that changed.)
Doing it this way will keep your validation logic in one place (the backend) and the clients rarely need to be modified.
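A minimal sketch of the "send a model, keep the client dumb" idea in Python. The fields (a contact method that enables either an e-mail or a phone field) are hypothetical. The backend evaluates every rule and returns the complete field state, so any client, web or desktop, only has to render what it receives and post the values back.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class FieldState:
    name: str
    value: str = ""
    required: bool = False
    enabled: bool = True
    error: Optional[str] = None


def evaluate(values: Dict[str, str]) -> List[dict]:
    """Validate submitted values and return the full UI state for the client to render."""
    method = values.get("contact_method", "")
    fields = [
        FieldState("contact_method", method, required=True),
        FieldState("email", values.get("email", ""),
                   required=method == "email", enabled=method == "email"),
        FieldState("phone", values.get("phone", ""),
                   required=method == "phone", enabled=method == "phone"),
    ]
    for f in fields:
        if not f.enabled:
            f.value = ""  # values of disabled fields are discarded
        elif f.required and not f.value.strip():
            f.error = "This field is required."
    if method and method not in ("email", "phone"):
        fields[0].error = "Choose email or phone."
    return [asdict(f) for f in fields]
```

Serializing the returned list (for example with json.dumps) gives the payload the external systems would render; when the rules change, only the backend changes.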
I have faced the same issues in several previous projects. Based on that experience, I can honestly say that server-side validation is the thing you will have to implement to keep rubbish from being committed by the client side, whether it comes from the GUI or from a third-party system via an API. You can choose one of the available validation frameworks (I used Apache Commons Validator and found it works well), or you can implement your own. Client-side pre-validation, auto-completion and data look-up, on the other hand, are what you need to keep human users happy. Don't worry about code duplication; just build your system the right way from the business point of view.

What's the proper place for input data validation?

(Note: there are two similar questions, but they are more specific to ASP.NET.)
Consider a typical web app with a rich client (it's Flex in my case), where you have a form, an underlying client logic that maps the form's input to a data model, some way of remoting these objects to a server logic, which usually puts it in a database.
Where, generally speaking, should I put the validation logic, i.e. the checks that ensure the correct format of e-mail addresses, numbers, etc.?
1. As early as possible. Rich client frameworks like Flex provide built-in validator logic that lets you validate right upon form submission, even before the data reaches your data model. This is nice and responsive, but if you develop something extensible and want the validation to protect against programming mistakes by later contributors, it doesn't catch those.
2. At the data model on the client side. Since this is the 'official' representation of your data and you already have data types and getters/setters there, this validation captures user errors and programming errors from people extending your system.
3. Upon receiving the data on the server. This adds protection from broken or malicious clients that may join the system later. In a multi-client scenario, it also gives you one authoritative source of validation.
4. Just before you store the data in the backend. This protects against mistakes made anywhere in the chain (except in the storing logic itself), but may require bubbling the error all the way back up.
I'm sort of leaning towards using both 2 and 4, as I'm building an application that has various points of potential extension by third parties. Using 2 in addition to 4 might seem superfluous, but I think it makes the client app behave more user friendly because it doesn't require a roundtrip to the server to see if the data is OK. What's your approach?
Without getting too specific, I think there should be validation for the following reasons:
Let the user know that the input is incorrect in some way.
Protect the system from attacks.
Letting the user know early that some data is incorrect is friendly. For example, an e-mail entry field could have a red background until an @ sign and a domain name are entered, and turn green only once the address follows the format in RFC 5321/5322, perhaps with a small check mark to show the user that the address looks good.
Also, letting the user know that the information provided is probably incorrect in some way would be helpful as well. For example, ask the user whether or not he or she really means to have the same recipient twice for the same e-mail message.
Then, next should be checks on the server side -- and never assume that the data that is coming through is well-formed. Perform checks to be sure that the data is sound, and beware of any attacks.
Assuming that the client will thwart SQL injections, and blindly accepting data from connections to the server can be a serious vulnerability. As mentioned, a malicious client whose sole purpose is to attack the system could easily compromise the system if the server was too trusting.
And finally, perform whatever checks to see if the data is correct, and the logic can deal with the data correctly. If there are any problems, notify the user of any problems.
I guess that being friendly and defensive is what it comes down to, from my perspective.
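The two kinds of feedback described above, a hard error for malformed input and a softer warning for a likely mistake such as a duplicate recipient, might look like this as a server-side Python check. The pattern is deliberately loose (an assumption, not an RFC 5322 parser) and only catches obvious typos; comparing addresses case-insensitively for the duplicate warning is a pragmatic simplification.

```python
import re

# Deliberately loose: one "@", no whitespace, at least one dot in the domain.
# This is NOT a full RFC 5322 parser; it only catches obvious typos.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def check_recipients(recipients):
    """Return (errors, warnings) for a list of recipient addresses."""
    errors, warnings = [], []
    seen = set()
    for addr in recipients:
        cleaned = addr.strip()
        if not EMAIL_RE.fullmatch(cleaned):
            errors.append(f"{addr!r} does not look like an e-mail address")
        elif cleaned.lower() in seen:
            warnings.append(f"{addr!r} appears more than once")
        seen.add(cleaned.lower())
    return errors, warnings
```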
There is only one rule: always use at least some kind of server-side validation (number 3 or 4 in your list).
Client-side validation (number 1 or 2) makes the user experience snappier and reduces load, because you don't post data to the server that fails client validation.
An important thing to point out is that if you rely on client validation alone, you're at great risk (just imagine your client validation relies on JavaScript and users disable JavaScript in their browser).
There should definitely be validation on the server end. I think the validation should be done as early as possible on the server end, so there's less chance of malicious (or incorrect) data entering the system.
Input validation on the client end is helpful, since it makes the interface snappier, but there's no guarantee that data coming in to the server has been through the client-side validation, so there MUST be validation on the server end.
Because of security and convenience: server side, and as early as possible.
But it is also important to have some global model/business-logic validation, so that when you have, for example, multiple forms with common data (such as a product name), the validation rule stays consistent unless the requirements say otherwise.
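One way to keep such rules consistent is to define each rule once and let every form refer to it. In this Python sketch the product-name rule, its 80-character limit and the form names are hypothetical; both forms reuse the same function, so changing the rule changes it everywhere.

```python
from typing import Callable, Dict, List

Rule = Callable[[str], List[str]]


# Hypothetical shared rule; the 80-character limit is an illustrative assumption.
def product_name_rule(value: str) -> List[str]:
    if not value.strip():
        return ["Product name is required."]
    if len(value) > 80:
        return ["Product name must be at most 80 characters."]
    return []


# Two different forms reuse the same rule object, so they cannot drift apart.
ORDER_FORM: Dict[str, Rule] = {"product_name": product_name_rule}
REVIEW_FORM: Dict[str, Rule] = {"product_name": product_name_rule}


def validate_form(rules: Dict[str, Rule], data: Dict[str, str]) -> Dict[str, List[str]]:
    """Run each field's rule and collect error messages by field name."""
    errors = {}
    for field_name, rule in rules.items():
        messages = rule(data.get(field_name, ""))
        if messages:
            errors[field_name] = messages
    return errors
```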