Characters and symbols in HTML - html

I am looking for a list of characters and symbols for use in HTML, in PDF or image format, something like a cheat-sheet. Basically I want a reference list for replacing, for example, '&' with '&amp;'. I have found the list at http://www.w3schools.com/tags/ref_entities.asp, but I would appreciate it if anyone could point me to a PDF or image version of the list.
Regards

There is a complete list in the specification but, with the exception of <, &, and " or ', you should be able to use any character directly in UTF-8 (which results in much more readable documents).

Cheat-sheet

Related

Proper Way to Escape the | Character Using HTML Entities

To escape the ampersand character in HTML, I use the &amp; HTML entity, for example:
Link
If I have the following code in my HTML, how would I escape the | character?
Link
HTML Tidy is complaining, claiming an illegal character was found in my HTML.
I tried using &brvbar; and several other HTML entities, but Tidy says "malformed URI reference."
You wouldn't.
The problem (as the message says) is that the character is illegal in URLs. It is perfectly fine in HTML.
You need to apply URL encoding (percent-encoding), which for | would be %7C.
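As a rough illustration of the percent-encoding mentioned above, here is a minimal Python sketch. The value "red|green" is a made-up example, not the asker's actual URL, and the standard library's urllib.parse is only one of many ways to do this.

```python
from urllib.parse import quote, unquote

# Hypothetical query value containing a pipe character.
value = "red|green"

# safe="" asks quote() to percent-encode everything outside the unreserved set.
encoded = quote(value, safe="")
print(encoded)           # red%7Cgreen

# Decoding restores the original text.
print(unquote(encoded))  # red|green
```

The encoded value is what would go inside the href; the HTML itself needs no entity for the pipe.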
I don't know why Tidy is complaining about it, but this character is not problematic in HTML nor in a URL. | is not a reserved character and can be used in a URL as is. You can percent-encode every character, but there is really no need for it.
What I presume Tidy might be complaining about is the =. You have two of them, the second being invalid.
There is no need to encode this character in HTML entities. It has no special meaning in HTML.

How do you escape escaped text in HTML? [duplicate]

I have some XML text that I wish to render in an HTML page. This text contains an ampersand, which I want to render in its entity representation: &amp;.
How do I escape this ampersand in the source XML? I tried &amp;, but this is decoded as the actual ampersand character (&), which is invalid in HTML.
So I want to escape it in such a way that it will be rendered as &amp; in the web page that uses the XML output.
When your XML contains &amp;amp;, this will result in the text &amp;.
When you use that in HTML, that will be rendered as &.
As per §2.4 of the XML 1.0 spec, you should be able to use &amp;.
I tried &amp; but this isn't allowed.
Are you sure it isn't a different issue? XML explicitly defines this as the way to escape ampersands.
The & character is itself an escape character in XML, so the solution is to concatenate it with the Unicode decimal equivalent for &, thus ensuring that there are no XML parsing errors. That is, replace the character & with &#38;.
Use CDATA tags:
<![CDATA[
This is some text with ampersands & other funny characters. >>
]]>
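To see what a parser does with a CDATA section like the one above, here is a small Python sketch using the standard library's ElementTree. The wrapping <note> element is an assumption added only so the snippet is a complete document.

```python
import xml.etree.ElementTree as ET

# The <note> wrapper is hypothetical; the CDATA body is the text from the answer.
doc = (
    "<note><![CDATA[This is some text with ampersands & "
    "other funny characters. >>]]></note>"
)

# The parser hands back the raw characters, with no escaping needed in the source.
text = ET.fromstring(doc).text
print(text)

# Serializing again writes ordinary escapes instead of CDATA.
serialized = ET.tostring(ET.fromstring(doc), encoding="unicode")
print(serialized)
```

Note that CDATA only protects the text inside the XML; if the result is later placed in HTML, it still has to be HTML-escaped at that point.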
&amp; should work just fine. Wikipedia has a list of predefined entities in XML.
In my case I had to change it to %26.
I needed to escape & in a URL. So &amp; did not work out for me.
The urlencode function changes & to %26. This way neither XML nor the browser URL mechanism complained about the URL.
I have tried &amp;, but it didn't work. Based on Wim ten Brink's answer I tried &amp;amp; and it worked.
One of my fellow developers suggested that I use &#x26;, and that worked regardless of how many times it may be rendered.
&amp; is the way to represent an ampersand in most sections of an XML document.
If you want to have XML displayed within HTML, you need to first create properly encoded XML (which involves changing & to &amp;) and then use that to create properly encoded HTML (which involves again changing & to &amp;). That results in:
&amp;amp;
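The two rounds of encoding described above can be sketched in Python. This uses html.escape from the standard library as a stand-in for whatever encoder the XML and HTML layers actually use, so treat it as an illustration rather than the asker's pipeline.

```python
from html import escape, unescape

raw = "&"

# First round: make the ampersand safe inside the XML document.
xml_level = escape(raw)
print(xml_level)   # &amp;

# Second round: make the XML source safe to display inside HTML.
html_level = escape(xml_level)
print(html_level)  # &amp;amp;

# Each decoding pass peels off exactly one layer.
print(unescape(html_level))            # &amp;
print(unescape(unescape(html_level)))  # &
```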
For a more thorough explanation of XML encoding, see:
What characters do I need to escape in XML documents?
<xsl:text disable-output-escaping="yes">&amp;</xsl:text> will do the trick.
Consider XML that looks like this:
<Employees Id="1" Name="ABC">
<Query>
SELECT * FROM EMP WHERE ID=1 AND RES<>'GCF'
</Query>
</Employees>
You cannot use <> directly, as it throws an error. In that case, you can use &lt;&gt; in its place.
<Employees Id="1" Name="ABC">
<Query>
SELECT * FROM EMP WHERE ID=1 AND RES &lt;&gt; 'GCF'
</Query>
</Employees>
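If the XML is generated by code rather than typed by hand, an XML library will usually apply this escaping for you. A minimal Python sketch with ElementTree, reusing the element names from the example above:

```python
import xml.etree.ElementTree as ET

sql = "SELECT * FROM EMP WHERE ID=1 AND RES<>'GCF'"

# Build the same structure as the example; the text is assigned unescaped.
employees = ET.Element("Employees", {"Id": "1", "Name": "ABC"})
query = ET.SubElement(employees, "Query")
query.text = sql

# Serialization escapes < and > in element text automatically.
serialized = ET.tostring(employees, encoding="unicode")
print(serialized)

# Parsing the result gives back the original SQL string.
round_trip = ET.fromstring(serialized).find("Query").text
print(round_trip == sql)
```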
14.1 How to use special characters in XML has all the codes.

{{=XML(some thing)}} can't be parsed in html with web2py

I am using magicsuggest as an auto-complete plugin in a web application built with web2py. I define a list variable dt=['张','李'] in model/db.py; the elements of the list are Chinese characters. However, when I embedded the variable in the HTML as {{=XML(dt)}}, following the magicsuggest manual, the Chinese characters were garbled. After several days of searching, I found that the list with Chinese characters was encoded into hex escapes in the HTML. I know something is wrong with the encoding/decoding. Could someone help me display the Chinese characters correctly in the HTML?
XML() is meant to take a string, not a list of strings. If you pass it something other than a string, it will first be converted to a string, so your code is equivalent to {{=XML(str(dt))}}, and you'll notice that in Python, str(['张','李']) yields "['\\xe5\\xbc\\xa0', '\\xe6\\x9d\\x8e']".
Instead, you can do {{=XML(dt[0])}}, and you will see the first character in the list displayed properly.
If you want to display a comma separated list surrounded by brackets, you can do:
{{=json.dumps(dt, encoding="UTF-8", ensure_ascii=False)}}
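For comparison, here is roughly how the same idea looks in plain Python 3, outside of web2py. This is only a sketch: the encoding argument used above belongs to Python 2's json.dumps and does not exist in Python 3, so it is left out here.

```python
import json

dt = ["张", "李"]

# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(dt)
print(escaped)   # ["\u5f20", "\u674e"]

# ensure_ascii=False keeps the characters readable in the output.
readable = json.dumps(dt, ensure_ascii=False)
print(readable)  # ["张", "李"]
```

Both forms are valid JSON and decode to the same list; the second is simply easier to read in the page source.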

Escape special (HTML tag) characters in XML attribute?

As part of an XML node attribute, I need to pass up HTML characters as part of an attribute value, such as hello" />. I can't use CDATA as part of the value of the node, as lots of other systems use this method and I cannot afford to break or rewrite that process, so I'm stuck with this.
I can't HTML encode the values, as they're used inside of an email and are subsequently outputted literally as HTML encoded values (&lt;br &gt;hello, for example).
Is there a way to escape HTML (specifically, the < character) and allow me to keep un-encoded HTML inline as an attribute? Thanks.
The XML characters <>&" must be escaped in the same way as the HTML entities: &lt; and so on. XML APIs will then return/store the original characters. Other character entities in HTML should be converted to UTF-8. Numeric entities, hex (&#xFC;) and decimal (&#2285;), are simple, but for named entities (&bull;) one needs a library (if one wants to achieve completeness).
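A short Python sketch of both points: escaping markup inside an attribute so that an XML API returns the original characters, and using a library to resolve HTML entities. The <item> element and its body attribute are invented for the example.

```python
from html import unescape
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET

value = '<br />hello "world" & more'

# quoteattr() escapes &, <, > and picks a quoting style that fits the value.
attr = quoteattr(value)
doc = f"<item body={attr} />"   # <item> and body are hypothetical names
print(doc)

# The XML parser gives the original, unescaped characters back.
parsed = ET.fromstring(doc).get("body")
print(parsed == value)

# html.unescape() knows the named HTML entities as well as numeric ones.
decoded = unescape("&bull; &#xFC;")
print(decoded)
```

The escaping lives only in the serialized XML; whatever consumes the attribute through a parser sees the raw HTML.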

HTML Character Encoding

When outputting HTML content from a database, some encoded characters are being properly interpreted by the browser while others are not.
For example, %20 properly becomes a space, but %AE does not become the registered trademark symbol.
Am I missing some sort of content encoding specifier?
(note: I cannot realistically change the content to, for example, &reg; as I do not have control over the input editor's generated markup)
%AE is outside the HTML-safe ASCII range.
You can view the table here: http://www.ascii.cl/htmlcodes.htm
It looks like you are dealing with a Windows/Word encoding (windows-1252 or something like that); it will not convert to HTML-safe text unless you do some sort of translation in the middle.
The byte AE is the ISO-8859-1 representation of the registered trademark sign. If you don't see anything, then apparently the URL decoder is using a different charset to URL-decode it. In UTF-8, for example, this byte on its own does not represent any valid character.
To fix this, you need to URL-decode it using ISO-8859-1, or to convert the existing data to be URL-encoded using UTF-8.
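The charset mismatch can be reproduced in a few lines of Python. This is a sketch with the standard library's urllib.parse; the asker's server-side language is unknown, so the function names are specific to this illustration.

```python
from urllib.parse import quote, unquote

# Decoding %AE as ISO-8859-1 yields the registered trademark sign.
latin1 = unquote("%AE", encoding="iso-8859-1")
print(latin1)        # ®

# The default is UTF-8 with errors="replace": a lone AE byte is invalid,
# so the result is the U+FFFD replacement character.
utf8 = unquote("%AE")
print(repr(utf8))    # '\ufffd'

# URL-encoding the same sign as UTF-8 produces two bytes instead.
reencoded = quote("®")
print(reencoded)     # %C2%AE
```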
That said, you should not confuse HTML (XML) encoding like &reg; with URL encoding like %AE.
The '%20' encoding is URL encoding. It's only useful for URLs, not for displaying HTML.
If you want to display the reg character in an HTML page, you have two options: Either use an HTML entity, or transmit your page as UTF-8.
If you do decide to use the entity code, it's fairly simple to convert them en masse, since you can use numeric entities; you don't have to use the named entities, i.e. use &#174; rather than &reg;.
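One possible way to do the bulk conversion to numeric entities, sketched in Python. The sample string is made up; the xmlcharrefreplace error handler is a standard part of Python's codec machinery.

```python
from html import unescape

text = "Acme® widgets"   # hypothetical sample content

# Encoding to ASCII with xmlcharrefreplace turns every non-ASCII
# character into a decimal numeric entity.
numeric = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(numeric)           # Acme&#174; widgets

# A browser (or html.unescape) turns it back into the original character.
print(unescape(numeric) == text)
```

This only touches non-ASCII characters; &, < and > would still need ordinary HTML escaping.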
If you need to know entity codes for every character, I find this cheat-sheet very helpful: http://www.evotech.net/blog/2007/04/named-html-entities-in-numeric-order/
What server side language are you using? Check for a URL Decode function.
If you are using php you can use urldecode() but you should be careful about + characters.
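The + caveat is not specific to PHP. As a rough analogue in Python (not PHP's own API), urllib.parse offers two decoders that differ only in how they treat the plus sign:

```python
from urllib.parse import unquote, unquote_plus

encoded = "a+b%20c"

# unquote() leaves + alone and only decodes percent escapes.
plain = unquote(encoded)
print(plain)       # a+b c

# unquote_plus() also turns + into a space, as form decoding does.
form_style = unquote_plus(encoded)
print(form_style)  # a b c
```

Which one is right depends on how the data was encoded in the first place; a literal plus sign in the content survives only the first variant.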