UTF-8 Dingbats In ColdFusion - html

I'm trying to display some hex and decimal encoded special characters (UTF-8 Dingbats) in a coldfusion (v11) page.
<cfloop>
<td id="..." align="center">✂</td>
<td id="..." align="center">✂</td>
</cfloop>
Based on the compiler error it most certainly seems like an issue with the pound (#) character, which of course is a special character in coldfusion.
So, is what I'm trying to do even possible, maybe by escaping #?

In ColdFusion the # is used to output variables within an <cfoutput> block. For example.
<cfoutput>The time is #now()#</cfoutput>
If you need to preserve the #, then you need to escape it, which you can do with a double #. For example:
<cfoutput>My dingbats: &##9986; &##x2702;</cfoutput>
If you're not inside a cfoutput block then you don't need to escape it. For example:
<cfoutput>My dingbat: &##9986;</cfoutput><br>
My dingbat: ✂

Related

What characters can come immediately after the < in a tag?

On a webpage I found a tag that begins with a Unicode letter 休
Is there a list somewhere of the letters and symbols may validly follow right after the less than sign?
They're not using a mark-up less than sign "<" on their site but rather they are using the HTML entity less than < to display the reserved character as text rather than HTML.
This can be treated just like ordinary text. So in essence, it's not a tag, its just ordinary text.
For instance the line:
<font style="color:#F00;"><休闲文化></font>
Actually is:
<font style="color:#F00;"><休闲文化></font>
Thus, <休闲文化> isn't a tag itself, but rather just text (which uses HTML reserved characters within it - perhaps marking you confuse it for a tag)
In which context is it used? XML, HTML,...?
In case of HTML there are tags already defined, you can't use a random one. In XML you can define you're own tags. In both cases you might use random tags, while not ending up with error you would notice, the tag might just get skipped.
I believe this Wiki page might help you:
https://en.m.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
All characters you want (but you should quote few).
As you notice, the character <休 come in <a href="http:(...)" target="_blank" title="<休闲文化>【成都大熊猫基. So this is inside a string (an attribute value). The < in this case will not indicate a start of a tag.

why need to escape string in html?

Well, I know that "correct" escaping will help to prevent SQL injection.
But I saw people escaping values in HTML
<input type="text" value =/"some/" /> <!-- some escaped, why? -->
Question is:
Why to escape in HTML?
<input type="text" value =/"some/" /> <!-- some escaped, why? -->
That is a syntax error. Don't do that.
Use character references to represent special characters (&, <, etc).
Why to escape in HTML?
(Assuming you use the correct syntax to do so): because some characters have special meaning in HTML. For example, you don't want a " (in the data) ending your attribute value prematurely since that can:
Lose data
Lose data but have it display in the page
Allow third parties to inject their JavaScript into your pages and steal data / redirect people to phishing sites / etc

Is it safe to display user input as input values without sanitization?

Say we have a form where the user types in various info. We validate the info, and find that something is wrong. A field is missing, invalid email, et cetera.
When displaying the form to the user again I of course don't want him to have to type in everything again so I want to populate the input fields. Is it safe to do this without sanitization? If not, what is the minimum sanitization that should be done first?
And to clearify: It would of course be sanitized before being for example added to a database or displayed elsewhere on the site.
No it isn't. The user might be directed to the form from a third party site, or simply enter data (innocently) that would break the HTML.
Convert any character with special meaning to its HTML entity.
i.e. & to &, < to <, > to > and " to " (assuming you delimit your attribute values using " and not '.
In Perl use HTML::Entities, in TT use the html filter, in PHP use htmlspecialchars. Otherwise look for something similar in the language you are using.
It is not safe, because, if someone can force the user to submit specific data to your form, you will output it and it will be "executed" by the browser. For instance, if the user is forced to submit '/><meta http-equiv="refresh" content="0;http://verybadsite.org" />, as a result an unwanted redirection will occur.
You cannot insert user-provided data into an HTML document without encoding it first. Your goal is to ensure that the structure of the document cannot be changed and that the data is always treated as data-values and never as HTML markup or Javascript code. Attacks against this mechanism are commonly known as "cross-site scripting", or simply "XSS".
If inserting into an HTML attribute value, then you must ensure that the string cannot cause the attribute value to end prematurely. You must also,of course, ensure that the tag itself cannot be ended. You can acheive this by HTML-encoding any chars that are not guaranteed to be safe.
If you write HTML so that the value of the tag's attribute appears inside a pair of double-quote or single-quote characters then you only need to ensure that you html-encode the quote character you chose to use. If you are not correctly quoting your attributes as described above, then you need to worry about many more characters including whitespace, symbols, punctuation and other ascii control chars. Although, to be honest, its arguably safest to encode these non-alphanumeric chars anyway.
Remember that an HTML attribute value may appear in 3 different syntactical contexts:
Double-quoted attribute value
<input type="text" value="**insert-here**" />
You only need to encode the double quote character to a suitable HTML-safe value such as "
Single-quoted attribute value
<input type='text' value='**insert-here**' />
You only need to encode the single quote character to a suitable HTML-safe value such as ‘
Unquoted attribute value
<input type='text' value=**insert-here** />
You shouldn't ever have an html tag attribute value without quotes, but sometimes this is out of your control. In this case, we really need to worry about whitespace, punctuation and other control characters, as these will break us out of the attribute value.
Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the &#xHH; format (or a named entity if available) to prevent switching out of the attribute. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and | (and more). [para lifted from OWASP]
Please remember that the above rules only apply to control injection when inserting into an HTML attribute value. Within other areas of the page, other rules apply.
Please see the XSS prevention cheat sheet at OWASP for more information
Yes, it's safe, provided of course that you encode the value properly.
A value that is placed inside an attribute in an HTML needs to be HTML encoded. The server side platform that you are using should have methods for this. In ASP.NET for example there is a Server.HtmlEncode method, and the TextBox control will automatically HTML encode the value that you put in the Text property.

When should I HTML-escape data and when should I URL-escape data?

When should I HTML-escape data in my code and when should I URL-escape? I am confused about which one when to use...
For example, given a element which asks for an URL:
<input type="text" value="DATA" name="URL">
Should I HTML-Escape DATA here or URL-escape it here?
And what about an element:
NAME
Should URL be URL-escaped or HTML-escaped? What about NAME?
Thanks, Boda Cydo.
URL encoding ensures that special characters such as ? and & don't cause the URL to be misinterpreted on the receiving end. In practice, this means you'll need to URL encode any dynamic query string values that have a chance of containing such characters.
HTML encoding ensures that special characters such as > and " don't cause the browser the misinterpret the markup. Therefore you need to HTML encode any values outputted into the markup that might contain such characters.
So in your example:
DATA needs to be HTML encoded.
Any dynamic segments of URL will need to be URL encoded, then the whole string will need to be HTML encoded.
Name needs to be HTML encoded.
HTML Escape when you're writing anything to a HTML document.
URL Escape when you're constructing a URL to call in-code, or for a browser to call (i.e. in the href tag).
In your examples you'll want to 'Attribute' escape the attributes. (I can't remember the exact function name, but it's in HttpUtility).
In the examples you show, it should be first URL-escaped, then HTML-escaped:
<a href="http://www.example.com?arg1=this%2C+that&arg2=blah">

Escape tags in html

What are escape tags in html?
Are they " < > to represent " < >?
And how do these work?
Is that hex, or what is it?
How is it made, and why aren't they just the characters themselves?
Here are some common entities. You do not need to use the full code - there are common aliases for frequently used entities. For example, you can use < and > to indicate less than and greater than symbols. & is ampersand, etc.
EDIT: That should be - < > and &
EDIT: Another common character is &nbsp which is often used to represent tabs in <code> segments
How do these work?
Anything &#num; is replaced with character from ASCII table, matching that num.
Is that hex, or what is it?
It's not hex, the number represents characters number in decimal in ASCII table. Check out ASCII table. Check Dec and HTML columns.
Why aren't they just the characters themselves?
Look at this example:
<div>Hey you can use </div> to end div tag!</div>
It would mess up the interpreter. It's a bad example, but you got the idea.
Why you can't use escape characters like in programming languages?
I don't have exact answer to that. But html same as xml is a markup language and not programming language and answer probably lies within history of how markup languages become what they are now.
No, it's not hex, it's decimal. Hex equivalent is < But one usually uses < (less-than) for < and > for > instead.
Here is the complete reference of html entities:
Complete HTML Entities
It is use for correct character formatting
HTML has a set of special characters which browsers recognize as part of the HTML language itself. For example, browsers know that when they encounter a < character in the HTML code, they are to interpret it as a beginning of a tag. > is one of an example of reserved character. Therefore use html character to avoid any problem and for correct practice also
Those escapes are decimal ascii escapes. You can see the values here. Take a look at the HTML column.