Prevent SQL injection using HTML only

I am trying to validate the input of a comment box so that it accepts only text, and to alert the user if they enter a number (0-9) or a symbol (# $ % ^ & * + _ =), in order to prevent SQL injection.
Is there a way to do that in HTML?

You can never trust what comes from the client. You must always have a server-side check to block something such as an SQL injection.
You can of course add the client-side validation you mentioned, but it's only there to help users avoid entering junk data. It still can't be trusted once it's sent to the server.

On using JavaScript/HTML to improve security
Is there a way to do that in HTML?
No. As others have pointed out, you cannot increase security by doing anything in your HTML or JavaScript.
The reason is that the communication between your browser and your server is completely transparent to an attacker. Any developer is probably familiar with the "developer tools" in Firefox, Chrome, etc. Even those tools, which are right there in most modern browsers, are enough to craft arbitrary HTTP requests (even over HTTPS).
So your server must never rely on the validity of any part of the request. Not the URL, not the GET/POST parameters, not the cookies; you always have to verify it yourself, server-side.
On SQL injection
SQL injection is best avoided by making sure never to have code like this:
sql = "select xyz from abc where aaa='" + search_argument + "'" # UNSAFE
result = db.execute_statement(sql)
That is, you never want to simply join strings together to form a SQL statement.
Instead, what you want to do is use bind variables, similar to this pseudo code:
request = db.prepare_statement("select xyz from abc where aaa=?")
result = request.execute_with_bind(search_argument)
This way, user input is never going to be parsed as SQL itself, rendering SQL injection impossible.
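For instance, here is what that looks like in real code, using Python's built-in sqlite3 module (a minimal sketch; the table and column names are illustrative):

import sqlite3

conn = sqlite3.connect("example.db")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS abc (xyz TEXT, aaa TEXT)")  # illustrative schema

search_argument = "O'Brien"  # would break, or worse, if concatenated into the SQL string

# The "?" placeholder ships the value separately from the statement text,
# so the database never parses user input as SQL.
cur.execute("SELECT xyz FROM abc WHERE aaa = ?", (search_argument,))
result = cur.fetchall()

The same pattern exists in every mainstream database API (PreparedStatement in JDBC, PDO prepared statements in PHP, and so on).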
Of course, it is still wise to check the arguments on the client-side to improve user experience (avoid the latency of a server roundtrip); and maybe also on server-side (to avoid cryptic error messages). But these checks should not be confused with security.

Short answer: no, you can't do that in HTML. Even a form with a single check box and a submit button can be abused.
Longer answer...
While I strongly disagree that there's nothing you can do in HTML and JavaScript to enhance security, a full discussion of that goes way beyond the bounds of a post here.
But ultimately you cannot assume that any data coming from a computer system you do not control is in any way safe (indeed, in a lot of applications you should not assume that data from a machine you do control is safe).
Your primary defence against any attack is to convert the data to a known safe format for both the sending and receiving components before passing it between components of your system. Here, we are specifically talking about passing data from your server-side application logic to the database. Neither HTML nor JavaScript are involved in this exchange.
Moving out towards the client, you have a choice to make. You can validate and accept/reject content for further processing based on patterns in the data, or you can process all the data and trust the lower layers to handle the content correctly. Commonly, people take the first option, but this gives rise to a new security problem: it becomes easy to map out the defences and find any gaps. In an ideal world that would not matter too much, since the deeper defences would handle the problem; in the real world, however, developers are limited by time and ability. If it comes down to a choice of where you spend your skills/time budget, the answer should always be making the output safer rather than validating the input.

The question now seems more straightforward, but I'll keep my answer the same as before:
Never validate information on the client side. It makes no sense, because you need to validate the same information with the same (or even better) methods on the server! Client-side validation only generates unnecessary overhead, as information from a client cannot be trusted. It's a waste of energy.
If you have problems with users sending many strange symbols but no real messages, you should shut down your server immediately! This could mean that your users are trying to find a way to hack into the server to gain control!
Some strange-looking special character combinations could allow this if the server doesn't escape user input properly!
In short:
HTML is made for displaying content, CSS for styling that content, JavaScript for interactivity, and languages like Perl, PHP or Python for processing, delivering and validating information. These last languages normally run on a server. Even on a server you need to be very careful, as there are ways to render them useless too (for instance, if you use global variables the wrong way, or don't escape user input properly).
I hope this helps to point you in the right direction.

Related

Best practice for email links that will set a DB flag?

Our business wants to email our customers a survey after they work with support. For internal reasons, we want to ask them the first question in the body of the email. We'd like to have a link for each answer. The link will go to a web service, which will store the answer, then present the rest of the survey.
So far so good.
The challenge I'm running into: making a server-side change based on an HTTP GET is bad practice, but you can't do a POST from a link. Options seem to be:
Use an HTTP GET instead, even though that's not correct and could cause problems (https://twitter.com/rombulow/status/990684453734203392)
Embed an HTML form in the email and style some buttons to look like links (likely not compatible with a number of email platforms)
Don't include the first question in the email (not possible for business reasons)
Use HTTP GET, but have some sort of mechanism which prevents a link from altering the server state more than once
Does anybody have any better recommendations? Googling hasn't turned up much about this specific situation.
One thing to keep in mind is that HTTP specifies semantics, not implementation. If you want to change the state of your server on receipt of a GET request, you can. See RFC 7231:
This definition of safe methods does not prevent an implementation from including behavior that is potentially harmful, that is not entirely read-only, or that causes side effects while invoking a safe method. What is important, however, is that the client did not request that additional behavior and cannot be held accountable for it. For example, most servers append request information to access log files at the completion of every response, regardless of the method, and that is considered safe even though the log storage might become full and crash the server. Likewise, a safe request initiated by selecting an advertisement on the Web will often have the side effect of charging an advertising account.
Domain-agnostic clients are going to assume that GET is safe, which means your survey results could get distorted by web spiders crawling the links, browsers pre-loading resources to reduce perceived latency, and so on.
Another possibility that works in some cases is to treat the path through the graph as the resource. Each answer link acts like a breadcrumb trail, encoding into itself the history of the client's answers. So a client that answered A and B to the first two questions is looking at /survey/questions/questionThree?AB, where the user that answered C to both is looking at /survey/questions/questionThree?CC. In other words, you aren't changing the state of the server, you are just guiding the client through a pre-generated survey graph.
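To make the idea concrete, here is a minimal sketch in Python of how such breadcrumb links could be generated (the URL scheme and question names follow the example above; everything else is illustrative):

QUESTIONS = ["questionOne", "questionTwo", "questionThree", "questionFour"]
CHOICES = ["A", "B", "C"]

def answer_links(history: str) -> dict[str, str]:
    # The page for `history` shows one link per choice; each link leads to
    # the next question with the chosen answer appended to the history.
    next_question = QUESTIONS[len(history) + 1]
    return {c: f"/survey/questions/{next_question}?{history}{c}" for c in CHOICES}

# A client who answered A then B is on questionThree?AB; its links are:
print(answer_links("AB"))
# {'A': '/survey/questions/questionFour?ABA', ...}

No per-click state is written; the server only reads the history back out of the URL when the survey is finally submitted.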

HTML - Required "bug" [duplicate]

Using a simple tool like Firebug, anyone can change JavaScript parameters on the client side. If anyone takes the time to study your application for a while, they can learn how to change JS parameters and end up hacking your site.
For example, a simple user could delete entities which they can see but are not allowed to change. I know a good developer must check everything on the server side, but this means more overhead: to validate a request you must first check it against data from the DB. This takes a lot of time; every action must be validated, and that can only be done by fetching the needed data from the DB.
What would you do to minimize hacking in that case?
A simpler way to validate would be to add another parameter to every JavaScript function; this parameter would be a signature computed over the previous parameters and a secret key.
How good does the solution above sound to you?
Our team uses teamworkpm.net to organize our work. I just discovered that I can edit someone else's tasks by changing a JavaScript function (which initially edits my own tasks).
Whenever a function calls the server, the server side needs to check, before performing the action, whether this user is allowed to perform it.
It is necessary to build a server-side permissions mechanism to prevent unwanted actions. You may want to define groups of users rather than setting permissions at the individual user level; it makes things easier.
Anything on the client side could be spoofed. If you use some type of secret key + parameter signature, your signature algorithm must be sufficiently random/secure that it cannot be reverse engineered.
The overhead created with adding client side complexity is better spent crafting proper server side validations.
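If parameters are signed at all, both the signing and the verification have to happen server-side, with a key the client never sees. A minimal sketch using Python's hmac module (the key and parameter encoding are illustrative):

import hashlib
import hmac

SECRET_KEY = b"server-only-key"  # must never be shipped to the client

def sign(params: str) -> str:
    # Computed on the server before the parameters are handed to the client.
    return hmac.new(SECRET_KEY, params.encode(), hashlib.sha256).hexdigest()

def verify(params: str, signature: str) -> bool:
    # Recomputed on the server when the request comes back in.
    return hmac.compare_digest(sign(params), signature)

This only proves the parameters weren't tampered with on the way back from the client; it does not replace the permission checks described above.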
What would you do to minimize hacking in that case?
You can't work around using validation methods on the server side.
A simpler way to validate would be to add another parameter to every JavaScript function; this parameter would be a signature computed over the previous parameters and a secret key.
How good does the solution above sound to you?
And how do you use the secret key without the client seeing it? As you yourself mentioned, the user can easily manipulate your JavaScript, and they can also read everything in it, including the secret key!
You can't hide anything in JavaScript, the only thing you can do is to obscure things in JavaScript, and hope nobody tries to find out what you try to hide.
This is why you must validate everything on the server. You can never guarantee that the user won't mess about with things on the client.
Everything, even your JavaScript source code, is visible to the client and can be changed by them; there's no way around this.
There's really no way to do this completely client-side. If the person has a valid auth cookie, they can craft any sort of request they want, regardless of the code on the page, and send it to your server. You can do things with other, encrypted cookies that must be sent back with the request and must match the inputs on the page, but you still need to check this server-side. Server-side security is essential in protecting your application from unauthorized access, and you must ensure, server-side, that every action being performed is one that the user is authorized to perform.
You certainly cannot hide anything client side, so there is little point in trying to do so.
If what you are saying is that you are sending something like a user ID and you want to ensure that the returned value has not been illicitly changed, then the simplest way of doing so is probably to generate and send a UUID alongside it, and check on return that the value of the UUID matches the one stored on the server for that user ID before doing any further processing. The space of UUIDs is so large that you can discount any false hits ever occurring.
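A minimal sketch of that UUID check in Python (the in-memory dict stands in for whatever session store a real application would use):

import uuid

issued: dict[str, str] = {}

def issue_token(user_id: str) -> str:
    # Generated server-side and embedded in the page alongside the user ID.
    token = str(uuid.uuid4())
    issued[user_id] = token
    return token

def is_untampered(user_id: str, token: str) -> bool:
    # A mismatch means the returned user ID is not the one we sent out.
    return issued.get(user_id) == token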
As for actual server-side processing vulnerabilities: you should always build in your security/permissions as close to the database as you can, and definitely not in the client. There's nothing different in the scenario you outline from any normal client-server design.
Peter from Teamworkpm.net here; I'm one of the main developers and was concerned to come across this report about a security problem. I checked into this and I am happy that it is not possible to delete a task that you shouldn't have access to.
You get a message saying "You do not have permission to delete this task".
I think it is just the confusion between being a Project Administrator and being an overall Administrator that is the problem here: you may not be a member of a project, but as an overall administrator you still have permission to delete any task within your Teamwork site. This is by design.
We take security very seriously and it's all implemented server-side because, as Jens F says, we can't rely on client-side security.
If you do come across any issues in TeamworkPM that you would like to discuss, we'd encourage any of you to just hit the feedback link and you'll typically get an answer within a few hours.

Design: Exposing user interface behaviour to external systems

I'm working on a web application (Java EE backend) which contains a fairly complex input modal. This input modal allows the user to capture data, but it has a bunch of (JavaScript) restrictions, such as mandatory fields, fields only being available if a specific value is entered, etc.
I have to expose this functionality to external systems and allow them to submit this data to my server. These external systems can be either web or client based (but I can assume that the clients will have internet access). My first thought is to provide these systems with some kind of definition of the fields and their constraints (such as which are mandatory) through services, and have them render the input modal however they want. This has been met with resistance, though, because the types of fields and restrictions will likely change quite a bit during the next few months of development. These external systems have different deployment timelines, and for this to work we'd firstly have to duplicate all the logic handling these restrictions across all systems, and secondly synchronize our deployments.
An alternative which has been proposed is to have the external systems call my modal through standard HTTP and render it either in an iframe or in an embedded renderer. This solves all of the previous complaints, but it leaves me feeling a little uneasy.
Are there any alternatives we are not thinking of? Maybe some kind of UI schema with existing render libraries for the different platforms? What are your thoughts on the second proposal, any major concerns or is this the "best" solution?
Edit: To clarify, I'll of course still perform backend validation regardless of the frontend decision, as I can't just trust the incoming data.
The constraints that you mention (mandatory fields etc.) really have nothing to do with the user interface. You are also right that it is not a good idea to have your backend render web content.
Your first proposal sounds like a good idea, here's how I would solve the issues you mentioned:
Do all the validation on the backend and send a model object to the client, representing the current state of the UI (field name, type, enabled/disabled, error message etc.).
Keep the client as dumb as possible. It should only be responsible for rendering the model on a window / webpage. Whenever a field is changed and it requires validation, submit the model to the backend for validation and get back a new model to be displayed. (You could optimize this by only returning the fields that changed.)
Doing it this way will keep your validation logic in one place (the backend) and the clients rarely need to be modified.
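As a minimal sketch of what such a model object might look like in Python (the field names and the single rule are illustrative):

from dataclasses import dataclass
from typing import Optional

@dataclass
class FieldState:
    name: str
    value: str = ""
    enabled: bool = True
    error: Optional[str] = None

def validate(model: list[FieldState]) -> list[FieldState]:
    # Backend-only validation: fill in the error messages, return the model.
    for f in model:
        if f.name == "email" and "@" not in f.value:
            f.error = "Invalid e-mail address"
        else:
            f.error = None
    return model

The client just serializes this model, renders it, and posts it back; it never carries validation rules of its own.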
I have been faced with the same issues in several previous projects. Based on this experience I can honestly say that server-side validation is the thing you will have to implement to avoid rubbish being committed from the client side, regardless of whether it comes from a GUI or from a third-party system via an API. You can choose one of the available validation frameworks (I used Apache Commons Validator and think it is good), or you can implement your own. On the other hand, client-side pre-validation, auto-completion and data look-up are the solutions you need to make human users happy. Do not worry about code duplication; just build your system the right way from the business point of view.

Screen scraping gotchas

When screen-scraping, what are the "gotcha"s to look out for?
The inspiration for this is: my spouse's co-worker asked me to scrape all the pages from a Blogger-hosted blog that her friend with cancer kept in her final months and this lady wanted to keep all of the posts in case the blog were ever deleted. I eventually found a free tool that was barely good enough.
One issue with scraping many Blogger pages is that there's often a navigation menu where you can click on the triangles to expand the post lists by year or month. These little buggers created insane amounts of duplicate content because you'd have the same page over and over again with different combinations of the menus being expanded/collapsed. In Blogger's case I'm not sure this is avoidable since the links are all formatted as real http links and not obvious JavaScript calls. Still, it got me thinking:
If you were to scrape a website, what kinds of potentially non-obvious things would you compensate for?
Do not use regex to scrape
While regular expressions can be good for a large variety of tasks, they usually fall short when parsing an HTML DOM. The problem with HTML is that the structure of a document is so variable that it is hard to accurately extract a tag (and by accurately I mean with a 100% success rate and no false positives).
What I recommend you do is use a DOM parser such as BeautifulSoup or equivalent (SimpleHTMLDom in PHP).
Some may think this is overkill, but in the end, it will be easier to maintain and also allows for more extensibility.
A regular expression could be devised to achieve the same goal, but it would be limited. For example, developing a regex to get the src and alt attributes would force the alt attribute to come either before or after the src, and overcoming this limitation would add more complexity to the regular expression.
Also, consider the following. To properly match an <img> tag using regular expressions and to get only the src attribute (captured in group 2), you need the following regular expression:
<\s*?img\s+?[^>]*?\s*?src\s*?=\s*?(["'])((\\?+.)*?)\1[^>]*?>
And then again, the above can fail if:
The attribute or tag name is in capitals and the i modifier is not used.
Quotes are not used around the src attribute.
An attribute other than src uses the > character somewhere in its value.
Some other reason I have not foreseen.
So again, simply don't use regular expressions to parse a DOM document.
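For comparison, the same extraction with a DOM parser is trivial. A minimal sketch using BeautifulSoup (the HTML snippet is illustrative):

from bs4 import BeautifulSoup  # pip install beautifulsoup4

html = '<p>intro</p><IMG ALT="logo" SRC="/logo.png"><img src=/pic.jpg>'
soup = BeautifulSoup(html, "html.parser")

# Attribute order, casing and quoting no longer matter.
for img in soup.find_all("img"):
    print(img.get("src"), img.get("alt"))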
I screen scrape a lot. Some advice:
Emulate a User-Agent string for some browser you want to use. Different websites frequently return very different results depending on what your user agent is. If they don't recognize the User-Agent they will often revert to lowest common denominator, so it's usually best to start with some recent browser. (For example the World of Warcraft Armory returns beautiful, easy to parse XML if it thinks you're a recent Firefox. If it doesn't know what you are it sends terrible HTML).
Be polite to the site you're scraping; don't hit it too hard. Your scraper will go faster if you multi-thread it, making many requests at once, but that will annoy the site owner.
Be smart about error handling. Do not write code like while (1) { makeRequest(); }. If your code or the server throws an error a loop like this will immediately fetch another request, generating another error. It can get ugly quickly. Handle errors well and consider putting in sleeps or exits if you see a lot of errors.
When developing your parsing code, test against a cached version rather than hitting the server every time. Will make your development go faster and is the basis of a simple test suite.
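A minimal sketch in Python that ties these points together, using the requests library (the User-Agent string and backoff policy are illustrative):

import time
import requests  # pip install requests

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

def fetch(url: str, retries: int = 3) -> str:
    for attempt in range(retries):
        try:
            resp = requests.get(url, headers=HEADERS, timeout=10)
            resp.raise_for_status()
            return resp.text
        except requests.RequestException:
            time.sleep(2 ** attempt)  # back off instead of hammering the site
    raise RuntimeError(f"giving up on {url}")

For development, save the responses to disk and run the parser against those files instead of making live requests every time.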
First, I'd check for an RSS feed. On blogger, you just have to add /rss to the root url, if I remember correctly.
Then I'd check if there isn't already some tool to scrape blogger.
Then if there's no RSS feed, and no existing tool, I'd give up and do it by hand with copy/paste. Unless we're talking 5000 pages, it's much faster and easier that way. Take it from someone who's tried.
If you have access to the actual account, blogger has an export function.
edit: Or of course, you could try mechanical turk.
As far as gotchas are concerned, it's usually a good idea to limit the number of requests made over a certain period of time. Smashing a site with a lot of requests in a short space of time is a good way to have your requests rejected.
Aside from the technical considerations, make sure you're not putting yourself at legal risk. Most large sites have specific legal language in their terms of use that disallows programmatic access to their services via an automated computer program, and there are also the obvious copyright concerns.
From a technical standpoint, definitely use a DOM parser library and you'll save loads of time. Many provide the ability to read HTML into an XML structure that can be queried using XPath to find exactly what you need.
If you know someone who has access to the account, they can use Blogger's "Export blog" feature.

What's the proper place for input data validation?

(Note: these two questions are similar, but more specific to ASP.Net)
Consider a typical web app with a rich client (it's Flex in my case), where you have a form, an underlying client logic that maps the form's input to a data model, some way of remoting these objects to a server logic, which usually puts it in a database.
Where should I, generally speaking, put the validation logic, i.e. ensuring the correct format of email addresses, numbers, etc.?
As early as possible. Rich client frameworks like Flex provide built-in validator logic that lets you validate right upon form submission, even before it reaches your data model. This is nice and responsive, but if you develop something extensible and you want the validation to protect against programming mistakes by later contributors, this doesn't catch them.
At the data model on the client side. Since this is the 'official' representation of your data and you have data types and getters / setters already there, this validation captures user errors and programming errors from people extending your system.
Upon receiving the data on the server. This adds protection from broken or malicious clients that may join the system later. Also, in a multi-client scenario, this gives you one authoritative source of validation.
Just before you store the data in the backend. This includes protection from all mistakes made anywhere in the chain (except the storing logic itself), but may require bubbling up the error all the way back.
I'm sort of leaning towards using both 2 and 4, as I'm building an application that has various points of potential extension by third parties. Using 2 in addition to 4 might seem superfluous, but I think it makes the client app behave more user friendly because it doesn't require a roundtrip to the server to see if the data is OK. What's your approach?
Without getting too specific, I think there should be validations for the following reasons:
Let the user know that the input is incorrect in some way.
Protect the system from attacks.
Letting the user know that some data is incorrect early would be friendly; for example, an e-mail entry field may have a red background until an @ sign and a domain name are entered. Only when the e-mail address follows the format in RFC 5321/5322 should the field turn green, perhaps with a nice little check mark to let the user know that the address looks good.
Also, letting the user know that the information provided is probably incorrect in some way would be helpful as well. For example, ask the user whether or not he or she really means to have the same recipient twice for the same e-mail message.
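Such a format pre-check can stay deliberately lenient. A minimal sketch in Python (full RFC 5321/5322 validation is far more involved than this; the pattern only catches obvious slips):

import re

# Lenient on purpose: flag obvious mistakes, accept anything plausible.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def looks_like_email(value: str) -> bool:
    return EMAIL_RE.match(value) is not None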
Then, next should be checks on the server side -- and never assume that the data that is coming through is well-formed. Perform checks to be sure that the data is sound, and beware of any attacks.
Blindly accepting data from connections to the server, on the assumption that the client has already thwarted SQL injections, can be a serious vulnerability. As mentioned, a malicious client whose sole purpose is to attack the system could easily compromise the system if the server were too trusting.
And finally, perform whatever checks to see if the data is correct, and the logic can deal with the data correctly. If there are any problems, notify the user of any problems.
I guess that being friendly and defensive is what it comes down to, from my perspective.
There's only one rule: always use at least some kind of server-side validation (numbers 3/4 in your list).
Client-side validation (numbers 1/2) makes the user experience snappier and reduces load (because you don't post to the server stuff that doesn't pass client validation).
An important thing to point out is that if you go with client validation only you're at great risk (just imagine if your client validation relies on javascript and users disable javascript on their browser).
There should definitely be validation on the server end. I am thinking that the validation should be done as early as possible on the server end, so there's less chance of malicious (or incorrect) data entering the system.
Input validation on the client end is helpful, since it makes the interface snappier, but there's no guarantee that data coming in to the server has been through the client-side validation, so there MUST be validation on the server end.
Because of security and convenience: server-side and as early as possible.
What is also important is to have some global model/business-logic validation, so that when you have multiple forms with common data (for example, the name of a product), the validation rule remains consistent unless the requirements say otherwise.
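A minimal sketch of such a shared rule in Python (the names and limits are illustrative): every form calls the same function, so the product-name rule cannot drift between forms.

def validate_product_name(name: str) -> list[str]:
    # Single source of truth for the rule, reused by every form
    # that captures a product name.
    errors = []
    if not name.strip():
        errors.append("Product name is required")
    if len(name) > 100:
        errors.append("Product name must be at most 100 characters")
    return errors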