CKEditor changes HTML codes of non-ASCII characters to actual characters

I just started using CK-editor as "Google HTML Editor" on HTML documents on my Google Drive.
I am noticing that all non-ASCII characters represented with HTML codes are being automatically shown as the actual character.
For example, I use the following codes:
ç as &#231;
Ç as &#199;
Ö as &#214;
etc.
CK editor changes these codes within the HTML source to Ç, ç and Ö immediately!
Is there a way to stop this?
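Once HTML is parsed, a numeric character reference and the literal character are two spellings of the same character, which is presumably why the editor feels free to swap one for the other. A short Python check (Python is used here only for illustration; it does not touch the editor) shows the equivalence for the three codes above:

```python
import html

# The references from the question: &#NNN; names a character by its
# decimal Unicode code point.
refs = {"&#231;": "ç", "&#199;": "Ç", "&#214;": "Ö"}

for ref, char in refs.items():
    # Parsing the reference yields exactly the character it names...
    assert html.unescape(ref) == char
    # ...and the number inside the reference is that character's code point.
    assert ref == f"&#{ord(char)};"
    print(f"{ref} -> {char} (U+{ord(char):04X})")
```

Because the conversion is lossless, the page renders the same either way; keeping the &#NNN; form in the file would require re-encoding the characters after saving.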

Related

What kind of encoding is this html encoding?

I am doing a project that involves searching for words in Arabic script on Wiktionary, and when I do a GET request on certain word pages, I get something like this, for example:
title="\xd8\xb1\xd8\xa3\xd8\xb3\xd9\x85\xd8\xa7\xd9\x84\xd9\x8a\xd8\xa9">\xd8\xb1\xd8\xa3\xd8\xb3\xd9\x85\xd8\xa7\xd9\x84\xd9\x8a\xd8\xa9</a></li>\n<li><a href="/wiki/%D8%B1%D8%A3%D8%B3%D9%8A"
This corresponds to the following URL: https://en.wiktionary.org/wiki/%D8%B1%D8%A3%D8%B3%D9%8A.
Does anyone know what the \xd8 or %D8 encodings are called? I want to say they are hex codes, but I have already looked up hex codes for the Arabic script and they certainly are not these.
The percent signs you see in the URL are used to substitute for characters that aren't allowed in URLs or have a special meaning there, such as "/", ":" and "&", and for non-ASCII characters. Each %XX is one byte written in hexadecimal. This is called percent-encoding - https://en.m.wikipedia.org/wiki/Percent-encoding
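The "\x.." sequences are hexadecimal escapes for individual bytes, not characters. Arabic characters fall outside ASCII, so UTF-8 encodes each of them as two bytes (for example, ر U+0631 is D8 B1), and whatever printed the response displayed those non-ASCII bytes as \x escapes. The %D8... sequences in the URL are the same UTF-8 bytes, percent-encoded. That is assuming the HTML you showed uses UTF-8 encoding.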
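To see both encodings side by side, here is a small Python sketch. Python is chosen because the \x escapes look like Python's bytes output, which is an assumption about how the response was printed. It decodes the raw bytes from the response and round-trips the percent-encoded URL path:

```python
from urllib.parse import quote, unquote

# Bytes copied from the response: each \xNN escape is ONE byte, not a character.
raw = b"\xd8\xb1\xd8\xa3\xd8\xb3\xd9\x85\xd8\xa7\xd9\x84\xd9\x8a\xd8\xa9"
title = raw.decode("utf-8")  # UTF-8 uses two bytes per Arabic letter here
print(len(raw), "bytes ->", len(title), "characters:", title)

# The URL path uses the same UTF-8 bytes, written as %XX instead of \xNN.
word = unquote("%D8%B1%D8%A3%D8%B3%D9%8A")
print(word, quote(word))

# Code point vs. UTF-8 bytes for each letter, e.g. U+0631 -> d8 b1
for ch in word:
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" "))
```

The hex values in Arabic character charts are code points (such as U+0631), while \xd8 and %D8 are bytes of the UTF-8 encoding of those code points, which is why the two do not match.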

HTML Issue, strange characters replacing HREF quotes

I am new to HTML coding. I'm taking an intro web design course this semester and I'm having a difficult time with my HREF segment. I have a table of contents page that references all of my projects over the semester.
This includes direct links to my projects where I should be able to embed my index.html file with the links to my new projects. However, whenever I try to update the HREF segments with quotes linking to my new project it spits out odd characters where the quotes would be.
â€œ (an example of what the error shows is below)
**The requested URL /“http://userid.myweb.usf.edu/project1/index.html“ was not found on this server.**
<li>This link goes to <a href=“http://userid.myweb.usf.edu/project1/index.html“>Project1</a></li>
I see a lot of references to it being a UTF-8 issue, but I have no idea what that means. If anyone could help, I would greatly appreciate it, as my professor is not the best at getting back to us.
Your <a> tag is using “ quote characters (Unicode codepoint U+201C LEFT DOUBLE QUOTATION MARK). HTML requires " quote characters instead (codepoint U+0022 QUOTATION MARK).
<li>This link goes to <a href="http://userid.myweb.usf.edu/project1/index.html">Project1</a></li>
Some editors, particularly word processors that were designed for editing documents and not HTML, will use “ instead of " when you type " on the keyboard or copy/paste text from other apps, so watch out for that. Use a text editor that is specifically designed for editing HTML, or at least a plain vanilla text editor like Notepad or Notepad++, which doesn't reinterpret entered characters.
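If a file already contains curly quotes in its markup, they can be straightened mechanically. The sketch below uses a hypothetical helper, straighten_tag_quotes (not from any library), that replaces U+201C and U+201D with a plain " only inside <...> tags, so curly quotes in visible text are left alone. It relies on a simple regex and assumes no literal > appears inside attribute values:

```python
import re

# Curly double quotes that word processors substitute for a plain ".
CURLY_DOUBLE_QUOTES = ("\u201c", "\u201d")


def straighten_tag_quotes(html_text: str) -> str:
    """Hypothetical helper: replace curly double quotes with " inside
    <...> tags only, leaving quotes in visible text untouched."""
    def fix_tag(match: re.Match) -> str:
        tag = match.group(0)
        for curly in CURLY_DOUBLE_QUOTES:
            tag = tag.replace(curly, '"')
        return tag

    # Simplistic tag matcher: assumes no literal ">" inside attribute values.
    return re.sub(r"<[^>]*>", fix_tag, html_text)


broken = ('<li>This link goes to <a href=\u201chttp://userid.myweb.usf.edu/'
          'project1/index.html\u201c>Project1</a></li>')
print(straighten_tag_quotes(broken))
```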
Here is a breakdown of what â€œ means:
The Unicode “ (U+201C) character, which you are entering in your HTML, is encoded in UTF-8 as bytes E2 80 9C.
When those same bytes are interpreted in the Windows-1252 charset (the default charset used by most Windows systems in Western countries), byte E2 is Unicode codepoint U+00E2 (â), byte 80 is codepoint U+20AC (€), and byte 9C is codepoint U+0153 (œ).
When encoded in UTF-8, codepoint U+00E2 is bytes C3 A2, codepoint U+20AC is bytes E2 82 AC, and codepoint U+0153 is bytes C5 93.
In Windows-1252, the characters Ã¢â‚¬Å“ are bytes C3 A2 E2 82 AC C5 93.
Look familiar?
You have a charset mismatch between the encoding you save your HTML file in and the encoding your web browser interprets it as. Your HTML is being saved as UTF-8, but it is being decoded to Unicode as Windows-1252 instead of UTF-8, re-encoded as UTF-8, and then displayed as Windows-1252.
If you are serving your HTML file over HTTP, make sure the HTTP server is reporting the correct charset=UTF-8 attribute in the Content-Type HTTP header.
You can (and should) also add a <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> tag (if using HTML4) or <meta charset="UTF-8"> tag (if using HTML5) to your HTML itself (when served over HTTP, web browsers are required to give the actual Content-Type HTTP header higher priority, though).
Make sure the reported charset in all cases matches the actual charset that you are saving your HTML file as.
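The byte-level chain described above can be reproduced in a few lines of Python using its built-in utf-8 and cp1252 (Windows-1252) codecs. This is only a demonstration of the mismatch, not a tool for fixing the server:

```python
LEFT_QUOTE = "\u201c"  # “ LEFT DOUBLE QUOTATION MARK

utf8 = LEFT_QUOTE.encode("utf-8")              # bytes e2 80 9c
once = utf8.decode("cp1252")                   # misread as Windows-1252
twice = once.encode("utf-8").decode("cp1252")  # re-encoded, misread again

print(utf8.hex(" "))
print(once)
print(once.encode("utf-8").hex(" "))
print(twice)

# Undoing one round of the mismatch recovers the original quote.
assert once.encode("cp1252").decode("utf-8") == LEFT_QUOTE
```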

Moodle and email question: Emails displaying special characters as &amp; and not &

I have a custom module installed in Moodle (the certificate module). You can have the certificate emailed to the student. Currently, I have an ampersand in my course name and it's showing up as &amp;.
The ampersand shows up as &amp; in the email subject line instead of being displayed as &.
It is happening in Gmail and Mac Mail. Any ideas on how to fix this, or is this something that is out of my control?
You didn't post your HTML code, so it's difficult to see what might be happening.
An ampersand & followed by other characters may be interpreted by the email client as a character reference, since the ampersand is used to write extended characters such as the copyright symbol ©, which can be written as &copy; or &#169;.
Extended characters are the characters that were not included in the standard ASCII character set. The original ASCII character set, codes 0 to 127, includes all of the lowercase and uppercase letters on the keyboard (a, A) along with the digits and common punctuation. Anything above 127 is an extended character, for instance the Euro symbol "€" (&euro; or &#8364;) or the trademark symbol "™" (&trade; or &#8482;).
https://ascii-code.com
One thought is that your email is not being encoded in UTF-8, which lets you use extended characters without much difficulty. Do you have something like <meta charset="utf-8"> in the email head?
Good luck.
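Separately from the charset, an email Subject header is plain text, not HTML, so mail clients generally display &amp; there literally instead of decoding it. A minimal Python sketch of the idea follows; Moodle itself is PHP, build_certificate_email is a hypothetical helper, and the course name is assumed to be stored HTML-escaped:

```python
import html
from email.message import EmailMessage


def build_certificate_email(course_name_html: str, body_html: str) -> EmailMessage:
    """Hypothetical helper: course_name_html is assumed to be stored
    HTML-escaped, e.g. "Art &amp; Design"."""
    msg = EmailMessage()
    # The Subject header is plain text, so decode &amp; and similar
    # references before using the course name there.
    msg["Subject"] = f"Certificate: {html.unescape(course_name_html)}"
    # The body is HTML, so the escaped form is correct there.
    msg.set_content(body_html, subtype="html", charset="utf-8")
    return msg


msg = build_certificate_email("Art &amp; Design", "<p>Art &amp; Design</p>")
print(msg["Subject"])
print(msg.get_content_type())
```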

In Rich Text Editor, special characters are getting stored as characters, not as their HTML codes

We are using the Rich Text Editor in CQ, with special characters.
Whenever we add special characters with our button in the RTE, the character is inserted, but it is also saved as the literal character in the source rather than as the encoded HTML entity.
We are calling:
doc.execCommand("InsertHTML", false, htmlToInsert);
In htmlToInsert, we are sending the HTML code value of the special character, like &#165; for yen, but it is saved as ¥ for yen, not &#165;.
We need to store HTML code values only. Please help me achieve this.
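When HTML is inserted into the DOM, the browser parses &#165; into the character ¥, and serializing the DOM back to HTML does not restore the reference, so the entity form is lost before saving. A minimal sketch of re-encoding on the way out, assuming the stored HTML passes through a step you control before it is saved (to_numeric_refs is a hypothetical name, not a CQ API):

```python
def to_numeric_refs(html_fragment: str) -> str:
    """Hypothetical pre-save step: replace every non-ASCII character with a
    decimal numeric character reference (&#NNN;). ASCII characters,
    including all markup, pass through unchanged."""
    return html_fragment.encode("ascii", "xmlcharrefreplace").decode("ascii")


stored = to_numeric_refs("<p>Price: ¥500</p>")
print(stored)  # <p>Price: &#165;500</p>
```

The xmlcharrefreplace error handler only touches characters the ASCII codec cannot encode, so existing references such as &#165; and all tags are left as they are.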

How to fix Â£ showing in HTML, possibly through .htaccess?

I've switched hosts, and now in all my HTML files that contain the pound sign, it is shown with an A in front: Â£. Is there a way to overcome this problem without adding
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>
on every HTML page?
You have various alternative ways to overcome the problem, including any one of these:
In .htaccess, as you asked: insert the line "AddDefaultCharset utf-8", or follow the further advice from W3C or askapache.com.
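Note that inserting the HTML5 doctype "<!DOCTYPE html>" at the beginning of each HTML page does not by itself make UTF-8 the default encoding: without a charset in the Content-Type header or a <meta charset> tag, browsers still fall back to a locale-dependent default, typically Windows-1252 (ISO-8859-1) in Western locales.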
Store your HTML using character encoding ISO-8859-1, so that the pound sign is stored as one byte. Currently your HTML would appear to be stored using character encoding UTF-8, so that the pound sign is stored as two bytes. Here is one way to store a copy of a UTF-8 file as ISO-8859-1: iconv --from-code=UTF-8 --to-code=ISO-8859-1 inputfile.html > outputfile.html
Store your HTML using 7-bit (ASCII) characters, with the pound sign encoded as an XML numeric character reference &#163; or (hexadecimal) &#xA3;, or as the HTML named character reference &pound;.
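The byte counts behind these options can be checked with a short Python sketch. It shows why a UTF-8 file read as Windows-1252 (the usual fallback in Western locales when no charset is declared) displays Â£, and what the ISO-8859-1 and ASCII-only storage options produce:

```python
pound = "\u00a3"  # £ POUND SIGN

utf8 = pound.encode("utf-8")      # two bytes: c2 a3
latin1 = pound.encode("latin-1")  # one byte: a3 (ISO-8859-1 storage)
ascii_only = pound.encode("ascii", "xmlcharrefreplace")  # numeric reference

# A UTF-8 file decoded as Windows-1252: byte c2 -> Â, byte a3 -> £
garbled = utf8.decode("cp1252")
print(utf8.hex(" "), latin1.hex(" "), ascii_only, garbled)
```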