I've switched hosts, and now in all my HTML files the pound sign is displayed with an Â in front of it: Â£. Is there a way to overcome this problem without adding
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>
on every HTML page?
You have various alternative ways to overcome the problem, including any one of these:
In .htaccess as you asked: insert the line "AddDefaultCharset utf-8" or follow further advice from W3C or further advice from askapache.com
Insert the HTML 5 doctype "<!DOCTYPE html>" at the beginning of each HTML page, thus causing the browser to interpret the default character encoding to be UTF-8 instead of ISO-8859-1.
Store your HTML using character encoding ISO-8859-1, so that the pound sign is stored as one byte. Currently your HTML would appear to be stored using character encoding UTF-8, so that the pound sign is stored as two bytes. Here is one way to store a copy of a UTF-8 file as ISO-8859-1: iconv --from-code=UTF-8 --to-code=ISO-8859-1 inputfile.html > outputfile.html
Store your HTML using 7-bit (ASCII) characters, with the pound sign encoded as an XML numeric character entity &#163; or (hexadecimal) &#xA3; or the HTML named character entity &pound;
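To see where the stray character comes from, here is a minimal Python sketch. Python is used purely for illustration and is not part of the original setup. It encodes the pound sign both ways, then decodes the UTF-8 bytes as ISO-8859-1, which is roughly what a browser does when it is told (or assumes) the wrong charset.

```python
# Illustrative sketch: the pound sign under two encodings.
pound = "\u00a3"  # the pound sign

utf8_bytes = pound.encode("utf-8")         # two bytes: C2 A3
latin1_bytes = pound.encode("iso-8859-1")  # one byte: A3

# Decoding the UTF-8 bytes as ISO-8859-1 yields an extra leading character.
misread = utf8_bytes.decode("iso-8859-1")

# ASCII-only alternative: a decimal numeric character reference.
as_entity = pound.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(utf8_bytes.hex(), latin1_bytes.hex(), misread, as_entity)
```

The first byte of the UTF-8 form, C2, is what shows up as the extra character when the page is read as ISO-8859-1.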
I am new to HTML coding. I'm taking an intro web design course this semester and I'm having a difficult time with my HREF segment. I have a table of contents page that references all of my projects over the semester.
This includes direct links to my projects where I should be able to embed my index.html file with the links to my new projects. However, whenever I try to update the HREF segments with quotes linking to my new project it spits out odd characters where the quotes would be.
Ã¢â‚¬Å“ example of what the error shows below.
**The requested URL /“http://userid.myweb.usf.edu/project1/index.html“ was not found on this server.**
<li>This link goes to <a href=“http://userid.myweb.usf.edu/project1/index.html“>Project1</a></li>
I see a lot of references to it being a UNICODE8 issue, but I have no idea what that means. If anyone could help I would greatly appreciate it, as my professor is not the best at getting back to us.
Your <a> tag is using “ quote characters (Unicode codepoint U+201C LEFT DOUBLE QUOTATION MARK). HTML requires " quote characters instead (codepoint U+0022 QUOTATION MARK).
<li>This link goes to <a href="http://userid.myweb.usf.edu/project1/index.html">Project1</a></li>
Some editors, particularly word processors that were designed for editing documents and not HTML, will use “ instead of " when you type " on the keyboard or copy/paste text from other apps, so watch out for that. Use a text editor that is specifically designed for editing HTML, or at least a plain text editor, like Notepad or Notepad++, which doesn't reinterpret entered characters.
Here is a breakdown of what Ã¢â‚¬Å“ means:
The Unicode “ (U+201C) character, which you are entering in your HTML, is encoded in UTF-8 as bytes E2 80 9C.
When those same bytes are interpreted in the Windows-1252 charset (the default charset used by most Windows systems in Western countries), byte E2 is Unicode codepoint U+00E2 (â), byte 80 is codepoint U+20AC (€), and byte 9C is codepoint U+0153 (œ).
When encoded in UTF-8, codepoint U+00E2 is bytes C3 A2, codepoint U+20AC is bytes E2 82 AC, and codepoint U+0153 is bytes C5 93.
In Windows-1252, bytes C3 A2 E2 82 AC C5 93 are the characters Ã¢â‚¬Å“.
Look familiar?
You have a charset mismatch between the encoding your HTML file is saved in and the encoding your web browser uses to interpret it. Your HTML is being saved as UTF-8, but it is then misinterpreted as Windows-1252 when decoded to Unicode, re-encoded as UTF-8, and finally displayed as Windows-1252 again.
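The byte-level breakdown above can be reproduced with a short Python sketch. It is an illustration only, and the variable names are made up for this example. It applies the wrong decoding twice and then reverses both steps.

```python
# Illustrative sketch of the double mis-decoding described above.
original = "\u201c"  # LEFT DOUBLE QUOTATION MARK

step1 = original.encode("utf-8")  # bytes E2 80 9C
step2 = step1.decode("cp1252")    # misread as Windows-1252: three characters
step3 = step2.encode("utf-8")     # bytes C3 A2 E2 82 AC C5 93
step4 = step3.decode("cp1252")    # misread again: seven characters

# Undoing both wrong decodings, in reverse order, recovers the original.
repaired = (
    step4.encode("cp1252").decode("utf-8")
         .encode("cp1252").decode("utf-8")
)

print(step1.hex(), step3.hex(), step4, repaired)
```

This kind of round trip only works while no byte has been lost along the way, so it is better to fix the charset declaration than to repair text afterwards.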
If you are serving your HTML file over HTTP, make sure the HTTP server is reporting the correct charset=UTF-8 attribute in the Content-Type HTTP header.
You can (and should) also add a <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> tag (if using HTML4) or <meta charset="UTF-8"> tag (if using HTML5) to your HTML itself (when served over HTTP, web browsers are required to give the actual Content-Type HTTP header higher priority, though).
Make sure the reported charset in all cases matches the actual charset that you are saving your HTML file as.
I have the following simple HTML page:
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>
<body>
<div>
méywe
</div>
</body>
</html>
When displaying it in Chrome or Firefox (I did not test other browsers), I see the following:
m�ywe
What did I miss? The html file is saved in UTF-8 encoding. The server is Apache. My machine is Windows 7 pro. The text editor is UltraEdit.
Thanks!
Update
Initially, I used UltraEdit for editing this html file and I got the problem. Based on cmbuckley's input and install of Notepad++ (from Heatmanofurioso's suggestion), I thought about the possibility of my file being corrupt somehow (even though it looks fine in both UltraEdit and Notepad). So I saved my file with Notepad in utf-8 encoding. Still saw the problem (maybe due to cache???). Then I used UltraEdit to save it again. See the page in the browser and the problem is gone.
Lesson Learned
Have two text editors if that is your tool, and try the other one if you see an inexplicable problem. No tool is perfect, even one you use every day. In my case, Notepad++ fixed the UTF-8 issue with my file that UltraEdit somehow failed to fix.
Thanks to folks for helping!!!
1 - Replace your
<meta charset="utf-8">
with
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
2 - Check if your HTML Editor's encoding is in UTF8. Usually this option is found on the tabs on the top of the program, like in Notepad++.
3 - Check if your browser is compatible with your font, if you're somehow importing a font. Or try and add a css to set your fonts to a default/generally accepted one like
body
{
font-family: "Times New Roman", Times, serif;
}
Hope it helps :)
The file was most likely saved with Windows-1252 encoding instead of UTF-8, which is why the non-ASCII character was displayed wrongly in the browsers. The cause was missing knowledge about how UltraEdit detects UTF-8, and perhaps also an inappropriate UTF-8 configuration.
How UltraEdit v22.10, the latest version at the time of writing, detects UTF-8 encoding is explained in detail in the user-to-user forum topic UTF-8 not recognized, largish file. That topic also contains recommendations on how best to configure UltraEdit for HTML writers who mainly use UTF-8 for all HTML files. UTF-8 detection was greatly improved in UltraEdit v24.00, which also detects UTF-8 encoded characters in very large files when scrolling to a block containing one.
Unfortunately, the regular expression search used by UltraEdit v22.10 and earlier versions to detect a UTF-8 HTML character set declaration does not work for the short HTML5 variant, as reported in the forum topic Short UTF-8 charset declaration in HTML5 header. The reason is the double quote character between charset= and utf-8. I reported this by email to IDM Computer Solutions, Inc. when the referenced topic was created, suggesting a small change to the regular expression so that it also detects the short HTML5 UTF-8 declaration. The developers of UltraEdit later updated the UTF-8 detection for UE v24.00 and UES v17.00, as a post in the referenced forum topic explains in detail.
However, when an HTML5 file is declared as UTF-8 encoded, but UltraEdit loaded it as ANSI file, the user can see the wrong loading in the status bar at bottom of main window. A small (less than 64 KB) UTF-8 encoded HTML file should result in getting
either U8- and line terminator type (DOS/UNIX/MAC) displayed for users of UE < v19.00 or when using basic status bar in later versions of UE
or UTF-8 selected in encoding selector in status bar for users of UE v19.00 or later versions not using basic status bar.
If this is not the case, the UltraEdit user can use
Save As from menu File and select UTF-8 - NO BOM for Encoding (Windows Vista or later) respectively Format (Windows 2000/XP) to convert the file from ANSI to UTF-8 without byte order mark, or
ASCII to UTF-8 (Unicode editing) from submenu Conversions in menu File to convert the file from ASCII/ANSI to UTF-8 without an immediate save, or
select Unicode - UTF-8 via encoding selector in status bar (UE v19.00 or later only) resulting also in an immediate conversion from ASCII/ANSI to UTF-8 and enabling Unicode editing.
For the last two options the UTF-8 BOM settings at Advanced - Settings or Configuration - File Handling - Save determine saving the file without or with byte order mark on next save.
Once the word méywe is saved into the file using UTF-8 encoding, the byte stream is 6D C3 A9 79 77 65 (hexadecimal). This would be displayed as mÃ©ywe if the UTF-8 encoded file were opened in ASCII/ANSI mode (an option in the File - Open dialog) using Windows-1252 as the code page. On the next opening, UltraEdit automatically detects the file as UTF-8 encoded, even though <meta charset="utf-8"> is not recognized, because there is now at least one UTF-8 encoded character in the first 64 KB of the file.
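The two failure modes can be checked with a small Python sketch. It is an illustration only and has nothing to do with UltraEdit's own detection logic. It encodes the word both ways and then decodes each byte stream with the wrong charset.

```python
# Illustrative sketch: one word, two encodings, two wrong readings.
word = "m\u00e9ywe"  # the word from the question, with e-acute

utf8_bytes = word.encode("utf-8")   # 6D C3 A9 79 77 65
ansi_bytes = word.encode("cp1252")  # 6D E9 79 77 65

# A UTF-8 file read as Windows-1252 shows two characters for the e-acute.
utf8_read_as_ansi = utf8_bytes.decode("cp1252")

# A Windows-1252 file declared as UTF-8: the lone byte E9 is not valid
# UTF-8, so a lenient decoder substitutes U+FFFD, the replacement character.
ansi_read_as_utf8 = ansi_bytes.decode("utf-8", errors="replace")

print(utf8_bytes.hex(), ansi_bytes.hex(), utf8_read_as_ansi, ansi_read_as_utf8)
```

The second case matches the symptom in the question: a file that is declared as UTF-8 but whose bytes were actually written using a single-byte code page.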
To answer the question:
What did I miss?
You did not save the file as a UTF-8 encoded file after having opened or created it as an ANSI file (or, more precisely, a text file encoded with a single byte per character using a code page) and having declared it as UTF-8 encoded. This is a common problem for many users who write into an HTML file
<meta charset="utf-8">
or
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
or
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
or into an XML file
<?xml version="1.0" encoding="UTF-8"?>
or
<?xml version="1.0" encoding='utf-8'?>
and other variations depending on usage of ' or " and writing either UTF-8 or utf-8 (and other spellings) without really knowing what this string means for the applications interpreting the bytes of the file.
What's the best default new file format? contains lots of useful information and links to web pages with useful information about text encoding, which one to use for which file types and how to configure UltraEdit accordingly.
Check and see if the server is sending a charset in the Content-type header. The encoding specified in that will take precedence over what you specify with the meta element.
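As a rough illustration of that check, the following Python sketch extracts the charset parameter from a Content-Type header value using the standard library's email.message.Message. The helper name charset_from_content_type is made up for this example.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=ISO-8859-1"))
print(charset_from_content_type("text/html"))
```

Against a live server, urllib.request.urlopen(url).headers.get_content_charset() should report the same information, since the response headers object is built on the same Message class.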
Changing font-family to Calibri (or any other generally accepted font) worked for me.
Example:
<span style="font-family:Calibri"># My_Text</span>
I am using an MS Access accdb database with PHP. It had a problem displaying the "±" character; it was displaying "�" instead.
I added the following line in PHP at the beginning to get it right. My problem is solved now.
header('Content-type: text/html; charset=ASCII');
Another method is to use mb_convert_encoding($row,'UTF-8','ASCII' );
The header declaration is not required.
In my case I converted the special characters to decimal NCRs (numeric character references) and it worked. I had to do this because using the meta tag did not work and I did not want to change my font.
There are many online Unicode to decimal or hex converters.
Χαίρετε -> &#935;&#945;&#943;&#961;&#949;&#964;&#949;
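If an online converter is not at hand, the same conversion can be sketched in a few lines of Python. This is only an illustration, and the helper name to_decimal_ncr is made up here.

```python
import html


def to_decimal_ncr(text):
    """Replace each non-ASCII character with a decimal character reference."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


greeting = "\u03a7\u03b1\u03af\u03c1\u03b5\u03c4\u03b5"  # the Greek word above
encoded = to_decimal_ncr(greeting)
decoded = html.unescape(encoded)  # what a browser would render

print(encoded)
```

ASCII characters pass through unchanged, so the result can be pasted into an HTML file saved in any ASCII-compatible encoding.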
Replace meta charset="utf-8" with meta http-equiv="Content-Type" content="text/html; charset=utf-8". Maybe it will help.
Otherwise, what is your font?
I want to use non-English language (ex: Bengali) in my website. I am using the following tag which is not working.
My project encoding is Windows-1252. I am using NetBeans 7.0 with the font Arial Unicode MS.
Is it mandatory to change my project encoding to UTF-8 or other ways are there?
Please help
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<table >
<tr >
<td >বাঙালি </td>
</tr>
</table>
Is it mandatory to change my project encoding to UTF-8 or other ways are there?
You can use any encoding that supports the characters you need (Windows-1252 won't do the job for you since it is for the Latin alphabet). Support for Unicode (which supports just about every language) has been excellent since around the turn of the century. You should have been using it for all new projects for the last decade and a half.
Other than changing to an encoding that supports the characters you need, you can use HTML character references. Typically this will be:
Decimal numeric character reference
The ampersand must be followed by a "#" (U+0023) character, followed by one or more ASCII digits,
representing a base-ten integer that corresponds to a Unicode code
point that is allowed according to the definition below. The digits
must then be followed by a ";" (U+003B) character.
Since character references are expressed in ASCII characters, you can express them in any encoding (as they build on top of ASCII).
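Both points can be checked with a short Python sketch. It is an illustration only, and the helper name fits_encoding is made up for this example. The Bengali word cannot be stored in Windows-1252 directly, but its character references are plain ASCII and so survive in that encoding.

```python
def fits_encoding(text, encoding):
    """Return True if every character of text can be stored in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


bengali = "\u09ac\u09be\u0999\u09be\u09b2\u09bf"  # the word from the question

# Character references use only ASCII bytes, so any ASCII-compatible
# encoding, including Windows-1252, can carry them unchanged.
references = bengali.encode("ascii", "xmlcharrefreplace")

print(fits_encoding(bengali, "cp1252"), fits_encoding(bengali, "utf-8"))
print(references.decode("cp1252"))
```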
I have some HTML which contains some foreign characters (€, ó, á). The HTML document is saved as UTF-8 without BOM. When I view the page in the browser, the foreign characters are replaced with strange character combinations (â‚¬, Ã³, Ã¡). Only when I save my HTML document as UTF-8 with BOM do the characters display properly.
I'd really rather not have to include a BOM in my files, but has anybody got any idea why it might do this? and a way to fix it? (other than including a BOM)
You are probably not specifying the correct character set in your HTML file. The BOM (thanks @Jukka) sends the browser into UTF-8 mode; in its absence, you need to use other means to declare the document as UTF-8.
If you have access to your server configuration, you may want to make sure the server isn't sending the wrong character set info. See e.g. How to change the default encoding to UTF-8 for Apache?
If you have access only to your HTML, adding this meta tag in your document's head should do the trick:
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
or as @Mathias points out, the new HTML 5
<meta charset="utf-8">
(valid only if you use a HTML 5 doctype, against which there is no good argument any more even if you don't use HTML 5 markup.)
Insert <meta charset="utf-8"> in <head>.
Or set the header Content-Type: text/html;charset=utf-8 on the server-side.
You can also add this to .htaccess: AddDefaultCharset UTF-8. More info here: http://www.askapache.com/htaccess/setting-charset-in-htaccess.html
validator.w3.org reports for www.besaltnlight.ca:
Character Encoding Override in effect!
The detected character encoding "utf-8" has been suppressed and "iso-8859-1" used instead.
The php code outputs iso-8859-1 and php sets that as the default characterset.
What is causing this problem? Am I using the wrong doctype?
Oh, and would any of this cause quirks mode in IE?
Thanks for your help.
Gerry
The document is encoded in UTF-8. It has a byte order mark, smart quotes, and an ellipsis, all properly encoded in UTF-8. It begins with two byte order marks, which is invalid. You must remove one, and the validator also says that the presence of a BOM in a UTF-8 document may be confusing, so you may remove them both.
Since you’re outputting UTF-8, you must change the HTTP header to:
Content-type: text/html; charset=utf-8
Since you are missing that header, you force the browser to guess. Additionally, the meta tag must be changed to
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
for the same reason.
Your output starts with a Unicode byte order mark, encoded in UTF-8.
These are likely the first few bytes of your PHP file, or of any PHP file included by your main file. Your editor may not even show them. Interpreted as ISO-8859-1, the start of the output looks like ï»¿ï»¿<!DOCTYPE html, which is in fact two byte order marks, one after the other.
As said by jleedev, either make sure your files are really encoded in Latin-1, or declare the encoding as UTF-8.
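To check a file for stray byte order marks, a minimal Python sketch along these lines may help. The helper names and the sample bytes are made up for this example; only codecs.BOM_UTF8 comes from the standard library.

```python
import codecs


def count_leading_boms(data):
    """Count consecutive UTF-8 byte order marks at the start of a byte string."""
    bom = codecs.BOM_UTF8  # the three bytes EF BB BF
    count = 0
    while data.startswith(bom, count * len(bom)):
        count += 1
    return count


def strip_leading_boms(data):
    """Return data without any leading UTF-8 byte order marks."""
    return data[count_leading_boms(data) * len(codecs.BOM_UTF8):]


# Hypothetical output of a PHP script whose main file and one include
# each start with a BOM.
sample = codecs.BOM_UTF8 * 2 + b"<!DOCTYPE html>"
as_latin1 = sample.decode("iso-8859-1")

print(count_leading_boms(sample), as_latin1, strip_leading_boms(sample))
```

Running the count against each PHP source file, read in binary mode, would show which file contributes a BOM to the output.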