How and when to use Html encode - html

I've recently learned that I shouldn't store HTML-encoded data in the database; instead, I should HTML-encode the data when it is shown on screen to the user.
No big deal, I have to fix my database records and make some code changes.
But my question is: when should I use HTML encoding, and when shouldn't I?
For example, within an HTML table, I'm writing directly from the database to the inner HTML of a column. Without encoding this would be dangerous, I get that.
What about when setting the value of a textbox? It seems to work without having to HTML-encode the value, but I'm not sure why. This is what the textbox looks like:
<input type="textbox" value="xxx"/>
But when setting the value to: "/><p style="font-size: 100px;">testing hack</p>
The html source will be:
<input type="textbox" value=""/><p style="font-size: 100px;">testing hack</p>
It will look fine though when viewed so the p-tag isn't working as intended by the "hack".
Is anyone getting what I'm trying to aim at :) ?
If I do try to HTML-encode something I set as a textbox value, the result will display "&lt" and so on, which is not what I intended.
So in short: Should I only HTML-encode data that is set as the innerHtml of HTML controls, and not when setting the value of, for example, textboxes?

The answer came out of thejh's and my discussion in the comment to the question. I was not sure what to mark as answer so I decided to answer my own question. I hope that's ok.
It seems like when setting a value of an attribute (like the textbox's "value") .NET automatically html encodes the value so there is no need to do this by yourself.
When setting an HTML control's inner HTML, though, it's important that you do HTML-encode the value.
Thanks Thejh, sorry I couldn't upvote anything you wrote.
edit: I can't mark this as the answer for another 2 days.

In the case of
<input type="textbox" value="xxx"/>
'xxx' is an attribute value, and you should use a different encoding. In ASP.NET, for example, that is HtmlAttributeEncode.
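
To make the attribute case concrete, here is a minimal sketch in JavaScript. It is not how ASP.NET implements HtmlAttributeEncode; escapeHtml is a hypothetical helper written only for illustration, and the payload is the one from the question.

```javascript
// Hypothetical helper, not a framework API: escapes the characters that are
// significant inside HTML text and inside quoted attribute values.
function escapeHtml(value) {
  const entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  return String(value).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// The payload from the question.
const payload = '"/><p style="font-size: 100px;">testing hack</p>';

// Building the tag by hand: once the quotes are encoded, the payload can no
// longer close the value attribute and start a new element.
const inputTag = '<input type="text" value="' + escapeHtml(payload) + '"/>';
console.log(inputTag);
```

The browser decodes the entities when it parses the attribute, so the textbox still shows the original characters. Seeing a literal "&lt" on screen usually means the value was encoded twice, for example once by hand and once by the framework.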

For HTML attributes, encode backslashes and double quotes.
Replace every \ by \\
Replace every " by \"
Oh, by the way: Sometimes PHP does this for you, see here.
This feature has been DEPRECATED as of PHP 5.3.0. Relying on this feature is highly discouraged.

Related

Why would anyone use ${someVar} over <c:out value="${someVar}" />

Using Spring MVC with JSP:
After some reading, I came to the conclusion, that if I print some value using
${someValue}
no html escaping is done. This is a problem since I want to print texts containing < > etc.
The solution I am going to use is to replace all occurrences of this kind with the <c:out> tag, like
<c:out value="${someValue}" />
My question is: Why would I want to use the short form in the first place?
The only valid usage I'd imagine would be, if I want to render the content of someValue as html (which in my opinion is rather the exceptional case).
EDIT: I've found another post which answers my question about when to use the short form, it can be found here
XSS prevention in JSP/Servlet web application
As stated in the link, it is important to wrap user-controlled input which is being re-displayed since this is the potential source for an attack.
So, if a value contains no special characters such as < or > and is not generated or controlled by the user, the shorthand form ${someValue} can be used.

Direct link to MediaWiki page section

In my Wikipedia page, I have a section called subtitleA. Before arriving at this point when reading, I have one sentence that has a link that jumps to the content of that section.
To be more clear, this is a simple illustration:
To do this, you will need `this` (link to subtitleA).
To do that, you will do another thing..
== SubtitleA ==
this is how you do it....
I found the following solution:
To do this, you will need [http://wikisite.com/pageName#SubtitleA this].
This has already been proven correct; however, one of my subtitles contains spaces, brackets and a directory-like path, like the following:
== SubtitleA (balabalaA\balabalaB\balabala....) ==
I can no longer use the solution I found because of those spaces... Can anyone provide an alternative solution? Thanks.
To do this, you will need [[pageName#SubtitleA|this]].
Use the exact same format as in the section title.
Anchor encoding is similar to percent encoding (with a . instead of a %) but not exactly the same (e.g. spaces are collapsed and encoded to _). If you really, really need to do it directly, you can use {{anchorencode|original title}}.
I found the solution:
A URL encoder is the key, but without the standard %xx replacements for special characters. Using .xx instead (e.g. .5C, .28) works in the MediaWiki framework.
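
As a rough sketch of the .xx scheme described above (not MediaWiki's own code), the following JavaScript function approximates the encoding. The name legacyAnchorEncode, the set of characters left unencoded, and the shortened section title are all assumptions made for illustration; newer MediaWiki versions may generate anchors differently, so {{anchorencode|...}} or a plain [[pageName#Section title|text]] link remains the safer route.

```javascript
// Hypothetical helper approximating the ".xx" anchor encoding described in
// the answers above. The set of characters kept as-is is an assumption.
function legacyAnchorEncode(title) {
  // Collapse runs of whitespace and turn them into underscores.
  const collapsed = title.trim().replace(/\s+/g, "_");
  // Work on UTF-8 bytes, as percent-encoding does.
  const bytes = new TextEncoder().encode(collapsed);
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (/[A-Za-z0-9_.:-]/.test(ch)) {
      out += ch; // kept unchanged
    } else {
      // Same hex digits as %xx, but with "." in place of "%".
      out += "." + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

// A shortened, made-up title in the style of the question.
console.log(legacyAnchorEncode("SubtitleA (balabalaA\\balabalaB)"));
// prints SubtitleA_.28balabalaA.5CbalabalaB.29
```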

Label text ignoring html tags

<label for="abc" id="xyz">http://abc.com/player.js</xref>?xyz="foo" </label>
is ignoring
</xref> tag
value in the browser. So, the displayed output is
http://abc.com/player.js?xyz="foo"
but I want the browser to display
http://abc.com/player.js</xref>?xyz="foo"
How can I achieve this?
It isn't being ignored. It is being treated as an end tag (for a non-HTML element that has no start tag). Use &lt; if you want a < character to appear as data instead of as "start of tag".
That said, this is a URL and raw <, > and " characters shouldn't appear in URIs anyway. So encode it as http://abc.com/player.js%3C/xref%3E?xyz=%22foo%22
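
The two encodings mentioned in this answer solve different problems, and a small JavaScript sketch may help keep them apart. escapeText is a hypothetical helper, not a built-in; encodeURI is the standard function.

```javascript
const raw = 'http://abc.com/player.js</xref>?xyz="foo"';

// Percent-encoding: appropriate when the string is going to be used as a URL.
const asUrl = encodeURI(raw);

// Entity encoding: appropriate when the string is shown as text in the page.
// escapeText is a hypothetical helper covering the characters that matter
// in element content.
function escapeText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const labelHtml = '<label for="abc" id="xyz">' + escapeText(raw) + "</label>";

console.log(asUrl);     // http://abc.com/player.js%3C/xref%3E?xyz=%22foo%22
console.log(labelHtml); // the browser renders </xref> as visible text
```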
You should do it like this
"http://abc.com/player.js%3C/xref%3E?xyz=foo"
The URL has to be encoded properly to be a valid URL.
Use encodeURI to produce a valid, encoded URL:
var ValidURL = encodeURI("http://abc.com/player.js</xref>?xyz=foo");
See this answer on encodeURI for better knowledge.
I misunderstood the question, I thought the URI was to be used elsewhere within JavaScript. But the question pretty clearly states that the URI is to just be rendered as text.
If the text being displayed is being passed in from a server, then your best bet is to encode it before printing it on the page (or if you're using a template engine, then you can most likely just encode it on the template). Pretty much any web framework/templating engine should have this functionality.
However, if it is just static HTML, just manually encode the characters. If you don't know the codes off the top of your head, you can use an online converter to help, such as:
HTML Encode/Decode:
http://htmlentities.net/
Old Answer:
Try encoding the URI using the JavaScript function encodeURI before using it:
encodeURI('http://abc.com/player.js</xref>?xyz="foo"');
You can also decode it using decodeURI if need be:
decodeURI(yourEncodedURI);
So ultimately I don't think you'll be able to get the browser to display the </xref> tag as is, but you will be able to preserve it (using encodeURI/decodeURI) and use it in your code, if this is what you need.
Fiddle:
http://jsfiddle.net/rk8nR/3/
More info:
When are you supposed to use escape instead of encodeURI / encodeURIComponent?

Trademark symbol is displayed as raw text

If you visit www.startwire.com you'll see in the center of the page (in the yellow box, under the video) the following:
StartWire&trade;
In our dev and stage environments this is not an issue, but it is in production. What could possibly be causing this?
If you look at the page source, you will see &amp;trade; - you are double encoding the entity.
This should be simply &trade;.
In the HTML you have:
<h2>Sign-up now. StartWire&amp;trade; is completely FREE.</h2>
whereas the correct markup would be:
<h2>Sign-up now. StartWire&trade; is completely FREE.</h2>
Notice the extraneous amp;. It looks like you are double encoding something on the server.
If you check your page source it says:
&amp;trade;
This probably means that something took &trade; and encoded it as HTML again, so the & became &amp;. This is probably due to the use of an htmlentities() function.
Make sure you do not do this conversion twice...
A possible cause of this is that you are taking the contents from a database and that you have encoded the entries before inserting them into the database and you encode them a second time when you retrieve them from this database.
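
The double encoding described here is easy to reproduce. The sketch below uses JavaScript rather than the site's actual server code, which isn't shown; encodeEntities is a hypothetical stand-in for a function like htmlentities().

```javascript
// Hypothetical stand-in for a server-side entity encoder such as
// htmlentities(); it only knows the few characters needed for this example.
function encodeEntities(text) {
  const entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\u2122": "&trade;", // the trademark sign
  };
  return text.replace(/[&<>\u2122]/g, (ch) => entities[ch]);
}

const original = "StartWire\u2122";

// Encoded once, e.g. before inserting into the database.
const once = encodeEntities(original);
// Encoded again when the stored value is read back and printed.
const twice = encodeEntities(once);

console.log(once);  // StartWire&trade;      (browser shows the symbol)
console.log(twice); // StartWire&amp;trade;  (browser shows "&trade;" as text)
```

Encoding in exactly one place, at output time, avoids the second pass.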
Is the content being "HTML encoded" (or whatever they call it) automatically, somewhere in the script? Because this is what appears in the HTML: &amp;trade;.
My suggestion would be to just use the symbol in your code (™). If that doesn't work, try escaping the & of &trade; using \ (so that it becomes \&trade;).
Not sure, but I checked your site and it looks like you have written
&amp;trade;
Simply write &trade;

Perl AJAX stripping html characters out of string?

I have a Perl program that is reading HTML tags from a text file. (I'm pretty sure this is working, because when I run the Perl program on the command line it prints out the HTML as it should.)
I then pass that "html" to the web page as the response to an AJAX request. I then use innerHTML to stick that string into a div.
Here's the problem:
All the text information is getting to where it needs to be, but the "<", ">" and "/" characters are getting stripped.
Does anyone know the answer to this?
The question is a bit unclear to me without some code and data examples, but if it is what it vaguely sounds like, you may need to HTML-encode your text (e.g. using HTML::Entities).
I'm kind of surprised that's an issue when inserting into innerHTML, but without a specific example, that's the first thing that comes to mind.
There could be a mod on the server that is removing special characters. Are you running Apache? (I doubt this is what's happening).
If something is being stripped on the client-side, it is most likely in the response handler portion of the AJAX call. Show your code where you stick the string in the div.