Moodle and email question: Emails displaying special characters as & amp ; and not & - html

I have a custom module installed in Moodle (the certificate module). You can have the certificate emailed to the student. Currently, I have an ampersand in my course name and it's showing up as & amp ;. Here's what I mean:
The ampersand is showing in the email subject line and not getting encoded.
It is happening on gmail, and Mac mail. Any ideas on how to fix this or is this something that is out of my control?

You didn't post your HTML code, so it's difficult to see what might be happening.
An ampersand & next to another character might be interpreted as the start of a character code by the email client displaying the email, since the ampersand is used to write extended characters like the copyright symbol ©, which can be written as &copy; or &#169;.
Extended characters are the characters that were not included in the standard ASCII character set. The original ASCII character set included all of the lowercase and uppercase letters on the keyboard (a, A), along with digits, punctuation and control codes, which are represented with ASCII codes 0 to 127. Anything above 127 is an extended character, for instance the Euro symbol "€" (&euro; or &#8364;) or the trade mark symbol "™" (&trade; or &#8482;).
https://ascii-code.com
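As a rough illustration of the point about ampersand codes, here is a minimal Python sketch (Python is only used for demonstration here; it is not what Moodle runs). It assumes nothing beyond the standard library's html.unescape, and shows that the named and numeric forms decode to the same character, and that &amp; is simply the escaped form of a literal &.

```python
import html

# Named and numeric references for the same character decode identically.
samples = ["&copy;", "&#169;", "&euro;", "&#8364;", "&trade;", "&#8482;", "&amp;"]
decoded = [html.unescape(s) for s in samples]

for reference, char in zip(samples, decoded):
    # ord() gives the code point; anything above 127 is outside ASCII.
    print(f"{reference:8} -> {char}  (code point {ord(char)})")
```

If a client shows the literal text &amp; instead of &, the string was HTML-escaped but is being displayed somewhere that does not decode HTML, which is typically the case for a plain-text subject line.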
One thought is that your email is not being encoded in UTF-8, which allows you to use extended characters without much difficulty. Do you have something like this in the email head: <meta charset="utf-8">?
Good luck.

Related

What kind of encoding is this html encoding?

I am doing a project that involves searching words in the Arabic script on Wiktionary, and when I do a GET request on certain word pages, I get something like this for example:
title="\xd8\xb1\xd8\xa3\xd8\xb3\xd9\x85\xd8\xa7\xd9\x84\xd9\x8a\xd8\xa9">\xd8\xb1\xd8\xa3\xd8\xb3\xd9\x85\xd8\xa7\xd9\x84\xd9\x8a\xd8\xa9</a></li>\n<li><a href="/wiki/%D8%B1%D8%A3%D8%B3%D9%8A"
This corresponds to the following URL: https://en.wiktionary.org/wiki/%D8%B1%D8%A3%D8%B3%D9%8A.
Does anyone know what the \xd8 or %D8 encodings are called? I want to say they are hex codes, but I have already looked up hex codes for the Arabic script and they certainly are not these.
The percent signs you see in the URL are used to substitute characters that aren't allowed to appear literally in URLs, such as reserved characters like "/", ":" and "&" (when they are meant as data rather than delimiters) and non-ASCII characters. This is called percent-encoding - https://en.m.wikipedia.org/wiki/Percent-encoding
The "\xd8"-style sequences are hexadecimal escapes, each standing for one byte. Arabic characters fall outside of ASCII, so in UTF-8 each one is encoded as two bytes, and those bytes are what you see when the response is printed as raw bytes instead of decoded text. That's assuming that the HTML you showed uses UTF-8 encoding.
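To make that concrete, here is a small Python sketch, assuming the page really is UTF-8. It takes the bytes from the href in the question, decodes them as UTF-8, and compares the result with percent-decoding the same URL fragment. The variable names are just for illustration.

```python
from urllib.parse import unquote

# The same eight bytes, once as Python byte escapes and once percent-encoded.
raw = b"\xd8\xb1\xd8\xa3\xd8\xb3\xd9\x8a"
from_bytes = raw.decode("utf-8")
from_url = unquote("%D8%B1%D8%A3%D8%B3%D9%8A")  # unquote assumes UTF-8 by default

# Each Arabic letter took two bytes; these are the code points you would
# find in a Unicode chart for the Arabic script.
code_points = [f"U+{ord(ch):04X}" for ch in from_bytes]
print(from_bytes, from_url, code_points)
```

So the values are hexadecimal, but they are the UTF-8 bytes of each letter, not the Unicode code points, which is why they don't match a code chart directly.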

HTML Issue, strange characters replacing HREF quotes

I am new to HTML coding. I'm taking an intro web design course this semester and I'm having a difficult time with my HREF segment. I have a table of contents page that references all of my projects over the semester.
This includes direct links to my projects where I should be able to embed my index.html file with the links to my new projects. However, whenever I try to update the HREF segments with quotes linking to my new project it spits out odd characters where the quotes would be.
Ã¢â‚¬Å“ example of what the error shows below.
**The requested URL /Ã¢â‚¬Å“http://userid.myweb.usf.edu/project1/index.htmlÃ¢â‚¬Å“ was not found on this server.**
<li>This link goes to <a href=“http://userid.myweb.usf.edu/project1/index.html“>Project1</a></li>
I see a lot of references to it being a UNICODE8 issue but I have no idea what that means. If anyone could help I would greatly appreciate it, as my professor is not the best at getting back to us.
Your <a> tag is using “ quote characters (Unicode codepoint U+201C LEFT DOUBLE QUOTATION MARK). HTML requires " quote characters instead (codepoint U+0022 QUOTATION MARK).
<li>This link goes to <a href="http://userid.myweb.usf.edu/project1/index.html">Project1</a></li>
Some editors, particularly word processors that were designed for editing documents and not HTML, will use “ instead of " when you type " on the keyboard or copy/paste text from other apps, so watch out for that. Use a text editor that is specifically designed for editing HTML, or at least a plain vanilla text editor, like Notepad/Notepad++, which doesn't reinterpret entered characters.
Here is a breakdown of what Ã¢â‚¬Å“ means:
The Unicode “ (U+201C) character, which you are entering in your HTML, is encoded in UTF-8 as bytes E2 80 9C.
When those same bytes are interpreted in the Windows-1252 charset (the default charset used by most Windows systems in Western countries), byte E2 is Unicode codepoint U+00E2 (â), byte 80 is codepoint U+20AC (€), and byte 9C is codepoint U+0153 (œ).
When encoded in UTF-8, codepoint U+00E2 is bytes C3 A2, codepoint U+20AC is bytes E2 82 AC, and codepoint U+0153 is bytes C5 93.
In Windows-1252, characters Ã¢â‚¬Å“ are bytes C3 A2 E2 82 AC C5 93.
Look familiar?
You have a charset mismatch between what you are saving your HTML file as, and what your web browser is interpreting the HTML as. Your HTML is being saved as UTF-8, but its bytes are being misinterpreted as Windows-1252 instead of UTF-8 when decoded to Unicode, then re-encoded as UTF-8, and then displayed as Windows-1252.
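The chain of misreadings can be replayed in a few lines of Python. This is only a sketch of the mechanism, assuming Python's cp1252 codec as a stand-in for Windows-1252; the variable names are made up for the example.

```python
original = "\u201c"                    # LEFT DOUBLE QUOTATION MARK
utf8_bytes = original.encode("utf-8")  # bytes E2 80 9C

# First misreading: UTF-8 bytes decoded as Windows-1252 (three characters).
once = utf8_bytes.decode("cp1252")

# Those three characters re-encoded as UTF-8, then misread again (seven characters).
twice = once.encode("utf-8").decode("cp1252")

# Undoing both steps in reverse order recovers the original quote mark.
repaired = twice.encode("cp1252").decode("utf-8").encode("cp1252").decode("utf-8")

print(utf8_bytes.hex(" "), once.encode("utf-8").hex(" "), len(twice))
```

The second hex string printed is the C3 A2 E2 82 AC C5 93 sequence from the breakdown above, and the seven characters in twice are what ends up in the "not found" message.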
If you are serving your HTML file over HTTP, make sure the HTTP server is reporting the correct charset=UTF-8 attribute in the Content-Type HTTP header.
You can (and should) also add a <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> tag (if using HTML4) or <meta charset="UTF-8"> tag (if using HTML5) to your HTML itself (when served over HTTP, web browsers are required to give the actual Content-Type HTTP header higher priority, though).
Make sure the reported charset in all cases matches the actual charset that you are saving your HTML file as.

Special characters representation issue in JSP

In JSP file, the source code is
|1&#128;3|<%="\u0031\u0080\u0033" %>|
The result on the page is:
|1€3|13|
Why is the Euro symbol represented differently?
The HTML numerical character references in the range 0x80–0x9F don't actually correspond to the characters U+0080–U+009F. Instead, they refer to the characters mapped into the bytes 0x80–0x9F from the windows-1252 encoding.
This is a weird historical artefact from the days before browsers did Unicode. HTML5 sort-of standardises it, in that although it's invalid, parsers are required to parse it this way. This does not happen in XML/XHTML.
So \u0080 gives you the actual character U+0080, which you can't see because it's an invisible control character, but &#128; gives you code page 1252 byte 0x80, which is U+20AC Euro Sign.
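Python's standard html.unescape follows the HTML5 parsing rules for numeric character references, so it can be used to see the same difference outside a browser. This is a sketch for illustration only; the JSP page itself is not involved.

```python
import html
import unicodedata

from_reference = html.unescape("&#128;")  # parsed the way an HTML5 parser would
from_escape = "\u0080"                    # the literal code point U+0080

# The reference is remapped through windows-1252; the escape is a C1 control.
print(unicodedata.name(from_reference))
print(unicodedata.category(from_escape))  # "Cc" means control character
```

The first line printed names the Euro sign, the second reports a control character, which matches the visible and invisible halves of the output in the question.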

How can I show special characters like "e" with accent acute over it in HTML page?

I need to put the name of some universities on my web page. I have typed them as they were but in some browsers or maybe on some computers they appear differently. For example, "Universite de Moncton" should have the 2nd "e" in Universite with an acute accent over it. Could you please help with this?
If you’re using a character set that contains that character, you can use an appropriate character encoding and use it literally:
Université de Moncton
Don’t forget to specify the character set/encoding properly.
If not, you can use an HTML character reference, either a numeric character reference that denotes the code point of the character in the Universal Character Set (UCS):
Universit&#233; de Moncton
Universit&#xE9; de Moncton
Or using an entity reference:
Universit&eacute; de Moncton
But this entity is just a named representation of the numeric character reference (see the list of entity references that are defined in HTML 4):
<!ENTITY eacute CDATA "&#233;" -- latin small letter e with acute,
U+00E9 ISOlat1 -->
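A quick way to check that the decimal, hexadecimal and named forms really are the same character is to decode all three. The sketch below uses Python's html.unescape purely as a convenient HTML-reference decoder.

```python
import html

forms = [
    "Universit&#233; de Moncton",    # decimal numeric reference
    "Universit&#xE9; de Moncton",    # hexadecimal numeric reference
    "Universit&eacute; de Moncton",  # named entity reference
]

# A set collapses identical results, so one element means all forms agree.
decoded = {html.unescape(form) for form in forms}
print(decoded)
```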
You can use UTF-8 HTML Entities:
&#232; è
&#233; é
&#234; ê
&#235; ë
Here's a handy search page for the UTF-8 Character Map
I think from the mention that 'in some computers or browsers they appear differently' that the problem you have is with the page or server encoding. You must:
encode the file correctly (how to do this depends on your text editor);
assign the correct encoding in your webpage, done with a meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
force the server encoding with, for example, PHP's header() function:
header('Content-Type: text/html; charset=ISO-8859-1');
Or, yes, as everyone has pointed out, use the html entities for those characters, which is always safe, but might make a mess when you try to find-replace in code.
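Why the declared encoding has to match the saved one can be seen by encoding the name both ways and then decoding with the wrong charset. This is a minimal Python sketch of the mismatch, not a description of any particular browser.

```python
name = "Universit\u00e9 de Moncton"

latin1_bytes = name.encode("iso-8859-1")  # the accented e is the single byte E9
utf8_bytes = name.encode("utf-8")         # the accented e is the two bytes C3 A9

# File saved as UTF-8 but declared as ISO-8859-1: two junk characters appear.
wrong = utf8_bytes.decode("iso-8859-1")

# File saved as ISO-8859-1 but declared as UTF-8: E9 is not valid on its own,
# so a lenient decoder substitutes the replacement character U+FFFD.
lossy = latin1_bytes.decode("utf-8", errors="replace")

print(wrong)
print(lossy)
```

Either mismatch produces a page that looks right on the author's machine only when the viewer happens to guess the same encoding, which fits the "appears differently on some computers" symptom.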
There are two methods. One is by using "HTML entities." You need to enter them as, for example, &eacute;. Here is a comprehensive reference of named entities; you can also reference the Unicode code point of a given character, using its decimal form as &#1234; or its hex form as &#x04D2;.
Perhaps more common now (ten years after this answer was originally entered) is simply using Unicode characters directly. Rất dễ dàng, phải không? This is more acceptable and universal because most pages now use UTF-8 as their character encoding.
运气!
By typing it into your HTML code. é <--You can copy and paste this one if you want.
Microsoft Windows has a tool for accessing characters not on your keyboard; it's called Character Map.
http://www.starr.net/is/type/htmlcodes.html
This site shows you the HTML markup for all of those characters that you will need :)

IE munging pound (£) symbol

I have an HTML form which goes off to do all sorts of strange back end things. This works fine in Firefox, and in most cases it works fine in IE.
However the (pound sterling) £ sign causes problems, and seems to get munged in the submit.
The form is something like this
<form action="*MyFormAction*" accept-charset="UTF-8" method="post">
I think I have seen this problem before but can't remember the solution.
edit, the euro symbol € works fine
edit 2,
In fact if I put the € symbol with a £ symbol it also works fine. Looking at the problem, if I use characters which are not in the extended part of ISO-8859-1 it works OK. If I use extended characters from ISO-8859-1 they get munged. So how do I make IE use the character set that the accept-charset says it should?
accept-charset="UTF-8"
Does not do what you think it does (or the standard says it does) in IE. Instead, IE uses the value (‘UTF-8’) as a list of fallback encodings for when a field can't be encoded using the usual default encoding (which is the same as the page's own encoding).
So if you add this attribute and your page isn't already in UTF-8, you can be getting characters submitted as either the page encoding or UTF-8, and there is no way for your form-submission-reading script to know!
For this reason you should never use accept-charset; instead you should always ensure that the page containing the form is correctly served as “Content-Type: text/html;charset=utf-8” (by HTTP header and/or <meta>).
In fact if I put the € symbol with a £ symbol it also works fine.
Yes, that's because ‘€’ cannot be encoded in the page's default encoding (presumably ISO-8859-1). So IE resorts to sending the field encoded as UTF-8, which is what you wanted all along.
I think bobince has the ideal answer, which is “serve the page in UTF-8”; however, as I can't do this, I am posting my workaround for posterity.
Adding a hidden field unmunge with a non ISO-8859-1 (what our pages are served in) extended character forces the submission into UTF8
so
<input type="hidden" name="unmunge" value="&#8364;" />
fixes the encoding (the entity is the euro symbol).
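The behaviour described in these answers can be modelled in a few lines. The Python sketch below is a toy model only: encode_form is a hypothetical helper, not IE's code, and the whole-form fallback is an assumption inferred from the fact that the hidden-field workaround changes how the other fields are sent.

```python
def encode_form(fields, page_encoding="iso-8859-1", fallback="utf-8"):
    """Toy model: use the page encoding if every value fits, else the fallback."""
    try:
        return {name: value.encode(page_encoding) for name, value in fields.items()}
    except UnicodeEncodeError:
        # Assumption: one unencodable value switches the whole submission.
        return {name: value.encode(fallback) for name, value in fields.items()}

# A pound sign alone fits in ISO-8859-1, so it goes out as the single byte A3.
plain = encode_form({"price": "\u00a3"})

# Adding a euro sign (not in ISO-8859-1) pushes everything to UTF-8.
with_hidden = encode_form({"price": "\u00a3", "unmunge": "\u20ac"})

print(plain["price"], with_hidden["price"])
```

Under this model a server that expects UTF-8 sees an invalid lone A3 byte in the first case and a well-formed C2 A3 in the second, which is consistent with the pound sign being munged only when it is submitted on its own.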
How is the £ submitted? If it's in an input box for a price don't submit it, only allow numbers to be submitted and add the £ when you display the price again. Or add the currency symbol in the backend script.
I am not sure if this will help (read the entire article at http://fyneworks.blogspot.com/2008/06/british-pound-sign-encoding-revisited.html)
Excerpt:
THE PROBLEM
If you look at the UTF-8/Latin-1 (AKA ISO-8859-1) Character Table you will find that the decimal code for the British pound sterling sign is 163 - and the hexadecimal code is A3.
£ = %A3
However, this is not the case in (all) encoding/decoding functions in Javascript...
encodeURI/encodeURIComponent
Encodes a Uniform Resource Identifier (URI) component by replacing each instance of certain characters by one, two, or three escape sequences representing the UTF-8 encoding of the character
Which means, in order to encode our beloved pound sign, Javascript uses 2 characters. This is where the annoying "Â" comes in...
£ = %C2%A3
Hope it helps.
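The two percent-encoded forms in that excerpt, and the stray "Â", can be reproduced with Python's urllib.parse. This is a sketch for illustration; it uses Python in place of the JavaScript functions the article discusses, on the assumption that both percent-encode UTF-8 bytes by default.

```python
from urllib.parse import quote, unquote

pound = "\u00a3"

utf8_form = quote(pound)                        # UTF-8 is the default encoding
latin1_form = quote(pound, encoding="latin-1")  # one byte in ISO-8859-1

# Decoding the UTF-8 form as if it were ISO-8859-1 yields the extra character.
misread = unquote(utf8_form, encoding="latin-1")

print(utf8_form, latin1_form, misread)
```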