I recently updated a page I'm working on to work in HTML 5. For some reason when I changed my headers the £ sign that is included on all of the prices is no longer recognised and is showing as a white '?' in a black diamond.
Can anyone explain how to fix this? I have a feeling it has something to do with the <meta charset="utf-8"> line in my head, but could be mistaken.
Any help would be much appreciated!
Thanks!
You need to actually encode your HTML document in UTF-8. <meta charset="utf-8"> tells the browser that the document is supposedly encoded in UTF-8 and that the browser should treat it as such. A UTF-8 replacement character � means an invalid UTF-8 byte sequence was found at that point, which means your document is not actually encoded in UTF-8.
If you tell the browser it's UTF-8, then it must be UTF-8 that you send. It sounds like you're not sending valid UTF-8 sequences. You can probably fix this by doing one of the following:
Make sure you're saving the script(s) as UTF-8 in your editor. (Recommended)
Save the script(s) as ISO-8859-1, and use utf8_encode() on any output.
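The failure described above can be reproduced outside the browser. The following is a minimal Python sketch (Python is used only for illustration, and the variable names are hypothetical): it saves a pound sign as ISO-8859-1 and then decodes those bytes as UTF-8, the way a browser would after being told charset=utf-8.

```python
# A pound sign as an editor using ISO-8859-1 would save it (one byte).
pound = "£"
latin1_bytes = pound.encode("iso-8859-1")

# The same character saved as UTF-8 (two bytes).
utf8_bytes = pound.encode("utf-8")

# Decoding the ISO-8859-1 byte as UTF-8 is invalid; errors="replace"
# substitutes U+FFFD, the replacement character that browsers draw
# as a question mark inside a diamond.
shown_by_browser = latin1_bytes.decode("utf-8", errors="replace")

print(latin1_bytes, utf8_bytes, repr(shown_by_browser))
```

Re-saving the file as UTF-8 makes the bytes match the declaration, so the decode step succeeds.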
I am using Jekyll which has some issues with UTF-8 files. I was able to work around this by saving the file as Unicode (UTF-16 LE).
However it is an HTML document, which until now I have been using the
<meta charset="utf-8">
line in the file. Is this charset still correct or should I be using another?
If you save the file as UTF-16 LE, you have to update the <meta> tag to match.
The document cited deals with “incorrect UTF-8 characters”, whatever that means. Just don’t do incorrect UTF-8 characters.
Saving an HTML file as UTF-16 is normally pointless, because UTF-16 just does not work on the web. Of course the meta tag should describe the real encoding, but that’s not the point, and charset declaration in HTTP headers will override any meta tags.
So keep using UTF-8, and fix the problem with your character data, instead of creating a new, serious problem.
I found some information from the World Wide Web Consortium.
HTML5 with UTF-16
Ensure that there is a byte-order mark at the beginning of the file.
The HTML Working Group is currently discussing whether you can use a meta element declaration in the head element when the encoding is UTF-16. For now, don't.
I printed some UTF-16 encoded characters and tried to display it in Firefox and it displayed it as �.
So I went to Tools->Encoding and changed the encoding from UTF-8 to UTF-16 (I also tried changing charset directly in the HTML) However, when I did that, my page was completely flooded with symbols:
ℼ佄呃偙⁅瑨汭ാ㰊瑨汭ാഊ㰊敨摡ാ †ഠ †㰠楴汴㹥楬畮⁸楆敲潦⁸楤灳慬獹朠牡慢敧挠慨慲瑣牥湩氠敩⁵景眠扥 瀠条畓数獕牥⼼楴汴㹥††氼湩敲㵬猢潨瑲畣⁴捩湯•牨晥∽瑨灴⼺振湤献瑳瑡捩渮瑥猯灵牥獵牥椯杭是癡捩湯椮潣㸢††氼湩敲㵬愢灰敬琭畯档椭潣≮栠敲㵦栢瑴㩰⼯摣獳慴楴敮............
How can web browsers display UTF-16 characters without wrecking the page?
The “flooded with symbols” excerpt looks like an HTML document that is UTF-8 encoded but treated as if it were UTF-16 encoded. Or it might contain mostly UTF-8 data with some UTF-16 encoded data thrown in, which won’t work.
If you save your data as properly UTF-16 encoded and declare the encoding in HTTP headers and/or meta tags, then some browsers will display it OK, some won’t. Search engines generally fail to process UTF-16, and UTF-16 is mostly not used and should not be used on the web, except by mutual agreement between consenting well-informed partners.
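The "flooded with symbols" effect can be reproduced with a short Python sketch. It assumes the page bytes are really UTF-8 and are then read as UTF-16 LE, which is one plausible reading of the excerpt; the function name is hypothetical.

```python
def misread_as_utf16le(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were UTF-16 LE."""
    raw = text.encode("utf-8")
    if len(raw) % 2:
        # UTF-16 code units are two bytes wide; drop a trailing odd byte.
        raw = raw[:-1]
    return raw.decode("utf-16-le", errors="replace")

# Each pair of ASCII bytes is fused into one 16-bit code unit, which
# usually lands in the CJK or symbol ranges.
print(misread_as_utf16le("<!DOCTYP"))
```

The first characters produced for a doctype match the start of the garbled excerpt in the question, which supports the diagnosis that the bytes were UTF-8 all along.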
Firefox could not figure out the correct charset of your document.
For web pages, a meta tag in the head should be used to indicate the content's charset.
It should be placed near the beginning of the HTML file, so the browser knows which charset to use for the rest of the file.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
So the browser is charset blind until it reads that line. But using utf-8 is no problem. Because every character up to that point is encoded in utf-8 the same way it would be in ASCII (same goes for latin-1 and others). That's not the case in utf-16.
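A small Python sketch illustrates why the browser can read the declaration before it knows the encoding. The markup prefix below is an assumed example; the point is only that ASCII characters have identical bytes in ASCII, UTF-8 and ISO-8859-1, but not in UTF-16.

```python
# Everything a browser reads before the charset declaration is plain ASCII.
prefix = '<html><head><meta charset="utf-8">'

ascii_bytes = prefix.encode("ascii")
utf8_bytes = prefix.encode("utf-8")
latin1_bytes = prefix.encode("iso-8859-1")
utf16_bytes = prefix.encode("utf-16-le")

# The three ASCII-compatible encodings produce byte-for-byte identical output.
same_in_ascii_family = ascii_bytes == utf8_bytes == latin1_bytes

# UTF-16 uses two bytes per character here, so the prefix looks different.
print(same_in_ascii_family, len(ascii_bytes), len(utf16_bytes))
```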
W3C says:
There are three different Unicode character encodings: UTF-8, UTF-16
and UTF-32. Of these three, only UTF-8 should be used for Web content.
So you should use utf-8. But if you still want to try something with utf-16, use the BOM at the beginning of your file. That gives your browser a better chance of figuring out the encoding and properly decoding the content.
This other answer is very succinct about utf-16 usage.
And Joel gives a full lesson on character encodings and on why HTML puts its encoding declaration inside the content rather than only in a header.
Sending UTF-16 data as a Web page to browsers is an XSS risk in older browsers. (See another answer.) Don’t do it. Instead, convert the data to UTF-8 on the server and send UTF-8 over HTTP.
The way to make this work is for the page to say what encoding it's in. In the case of UTF-16, it also helps to include a BOM. The "flooded with Chinese" effect is most likely because your page is UTF-16LE but the browser treated it as UTF-16BE or vice versa...
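As a rough illustration of the byte-order point, the Python sketch below (variable names are hypothetical) shows that a BOM lets a decoder pick the right byte order, while guessing the wrong order without a BOM turns ASCII text into CJK characters.

```python
import codecs

text = "hi"

# With a BOM in front, the generic "utf-16" decoder works for either order.
with_bom_le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
with_bom_be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
decoded_le = with_bom_le.decode("utf-16")
decoded_be = with_bom_be.decode("utf-16")

# Without a BOM, decoding little-endian bytes as big-endian swaps the two
# bytes of every code unit and yields unrelated characters.
wrong_order = text.encode("utf-16-le").decode("utf-16-be")

print(decoded_le, decoded_be, wrong_order)
```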
I found a website that contains the string "donâ€™t". The obvious intent was the word "don't". I looked at the source expecting to see some character references, but didn't (it just shows the literal string "donâ€™t"). A Google search yielded nothing (except lots of other sites that have the same problem!). Can anyone explain what's happening here?
Edit: Here's the meta tag that was used:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Would this not cause the page to be served up as Latin-1 in the HTTP header?
In your browser, switch the page encoding to "UTF-8". You're seeing a right single quote character (’), which is encoded by the octets 0xE2 0x80 0x99 in UTF-8. In your charset, windows-1252, those 3 octets render as "â€™". The page should be explicitly specifying UTF-8 as its charset either in the HTTP headers or in an HTML <meta> tag, but it probably isn't.
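The octet-level explanation can be checked with a short Python sketch. It assumes the browser falls back to windows-1252 (Python's "cp1252" codec); the variable names are hypothetical.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the curly apostrophe in "don’t".
curly = "\u2019"

# Its UTF-8 form is the three octets 0xE2 0x80 0x99.
octets = curly.encode("utf-8")

# Reading those octets as windows-1252 gives three separate characters.
garbled = octets.decode("cp1252")

# The damage is reversible as long as the bytes were not altered further.
repaired = garbled.encode("cp1252").decode("utf-8")

print(octets, garbled, repaired)
```

Reversing the wrong decode like this is only a diagnostic aid; the real fix is for the page to declare the encoding it actually uses.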
According to the Wikipedia article "Character encodings in HTML":
HTML (Hypertext Markup Language) has been in use since 1991, but HTML 4.0 (December 1997) was the first standardized version where international characters were given reasonably complete treatment. When an HTML document includes special characters outside the range of seven-bit ASCII two goals are worth considering: the information's integrity, and universal browser display.
I suppose the site you checked isn't implemented with this in mind.
This has all got to do with encoding. Take a look back at the source, is there a tag at the top specifying it (charset)? My guess is it'll be UTF8 - although it could be something completely different.
This thread explains it all: a combination of a typographic (curly) apostrophe encoded as UTF-8 (probably originating from a Word document) and a server that probably reports its encoding as something other than UTF-8, despite the page containing UTF-8 characters (and possibly even correctly declaring its own encoding).
I have a website in which I have to put some lines in Arabic. How do I do it?
Where do I get the Arabic text characters, and how do I make the page support Arabic?
I have to put one line per page and there are a lot of pages, so I can't go around making images and inserting them.
This is the complete answer; the other answers each cover only one of the steps.
Step 1 - Save the document as UTF-8. In older versions of Windows Notepad the "Unicode" option means UTF-16, which is not what you want here.
Advanced editors don't always make this simple, so go low level:
use Notepad to save the document as meName.html and change the encoding
type to UTF-8
Step 2 - Mention in your html page that you are going to use such characters by
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
Step 3 - When you put in Arabic characters, make sure the containing tags have the following two attributes set
dir='rtl'
lang='ar'
Step 4 - Get the characters from some specific tool\editor or online editor like i did with Arabic-Keyboard.org
example
<p dir="rtl" lang="ar" style="color:#e0e0e0;font-size:20px;">رَبٍّ زِدْنٍي عِلمًا</p>
NOTE: font type, font family and font face settings only affect these characters if the chosen font contains glyphs for them; otherwise the browser falls back to another font
The W3C has a good introduction.
In short:
HTML is a text markup language. Text means any characters, not just ones in ASCII.
Save your text using a character encoding that includes the characters you want (UTF-8 is a good bet). This will probably require configuring your editor in a way that is specific to the particular editor you are using. (Obviously it also requires that you have a way to input the characters you want)
Make sure your server sends the correct character encoding in the headers (how you do this depends on the server software you use)
If the document you serve over HTTP specifies its encoding internally, then make sure that is correct too
If anything happens to the document between you saving it and it being served up (e.g. being put in a database, being munged by a server side script, etc) then make sure that the encoding isn't mucked about with on the way.
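You can also represent any Unicode character using only ASCII, by writing it as a numeric character reference (for example, &#1576; for the Arabic letter ب)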
You not only have to add the meta tag saying that the document is UTF-8, you also have to actually save the document as UTF-8. You can do that with a good editor (like Notepad++) by converting the file to "unicode" or "UTF-8 without BOM". Then you can simply use Arabic characters
As this page is UTF-8, here are some examples (I hope I don't write anything rude here): شغف
If you use a server side scripting language make sure that it does not output the page in a different encoding. In PHP e.g. you can set it like this:
header('Content-Type: text/html; charset=utf-8');
If you don't even know where to get Arabic characters, but you want to display them, then you're doing something wrong.
Save files containing Arabic characters with encoding UTF-8. A good editor allows you to set the character encoding.
In the HTML page, place the following after <head>:
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
If you're using XHTML:
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
That's it.
An alternative way (without messing with the encoding of a file) is to use HTML escape sequences. This website does that job for you: http://www.htmlescape.net/
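What such an escaping tool does can be sketched in a few lines of Python. This is an illustration of numeric character references in general, not a description of how that website works; the function name to_char_refs is hypothetical.

```python
import html

def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)

arabic = "شغف"
escaped = to_char_refs(arabic)

# The escaped form is pure ASCII, so it survives any ASCII-compatible encoding.
print(escaped)

# html.unescape reverses the references, as a browser does when rendering.
restored = html.unescape(escaped)
```

The trade-off is that the source becomes hard to read and edit, which is why saving the file as UTF-8 is usually preferable.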
Won't you also need to ensure that the area where you display the Arabic is oriented right-to-left?
e.g.
<p dir="rtl">
I edited the HTML page with Notepad++, set the encoding to UTF-8, and it works.
As mentioned above, many text editors do not use UTF-8 as the default encoding for documents.
However, most editors let you change that in the settings, even for each specific document.
Check that you have <meta charset="utf-8"> inside the head block.
My pages contain German characters and I have typed the text in between the
HTML tag, but the browser views some characters differently. Do I need to include anything in HTML to properly display German characters?
<label> ausgefüllt </label>
It seems you need some basic explanations about something that unfortunately even most programmers don't understand properly.
Files like your HTML page are saved and transmitted over the Internet as a sequence of bytes, but you want them displayed as characters. In order to translate bytes into characters, you need a set of rules called a character encoding. Unfortunately, there are many different character encodings that have historically emerged to handle different languages. Most of them are based on the American ASCII encoding, but as soon as you have characters outside of ASCII such as German umlauts, you need to be very careful about which encoding you use.
The source of your problem is that in order to correctly decode an HTML file, the browser needs to know which encoding to use. You can tell it so in several ways:
The "Content-Type" HTTP header
The HTML META tag
The XML encoding attribute if you use XHTML
So you need to pick one encoding, save the HTML file using that encoding, and make sure that you declare that encoding in at least one of the ways listed above (and if you use more than one make damn sure they agree). As for what encoding to use, Germans often use ISO/IEC 8859-15, but UTF-8 is increasingly becoming the norm, and can handle any kind of non-ASCII characters at the same time.
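A rough way to check that two of these declarations agree is sketched below in Python. The helper names and the regular expression are assumptions made for illustration; a real browser follows a more involved sniffing algorithm.

```python
import re
from email.message import Message

def header_charset(content_type: str):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent

# Simplified pattern covering both <meta charset=...> and the http-equiv form.
META_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_-]+)', re.IGNORECASE
)

def meta_charset(body: bytes):
    """Find the charset declared in a meta tag near the start of the document."""
    match = META_RE.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None

def declarations_agree(content_type: str, body: bytes) -> bool:
    """True when header and meta tag name the same charset, or one is missing."""
    from_header, from_meta = header_charset(content_type), meta_charset(body)
    return from_header is None or from_meta is None or from_header == from_meta
```

For example, a header of text/html; charset=utf-8 combined with a meta tag declaring ISO-8859-1 would be flagged as a mismatch. This sketch compares names literally, so aliases of the same encoding are not recognised as equal.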
UTF-8 is your friend.
Try
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=utf-8">
and check which encoding your webserver sends in the header.
If you use PHP, you can send your own headers in this way (you have to put this before any other output):
<?php header('Content-Type: text/html; charset=utf-8'); ?>
Also doublecheck that you saved your document in UTF-8.
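One way to double-check the saved encoding is to try decoding the file's bytes as UTF-8. The Python sketch below assumes you already have the bytes in hand (for instance from reading the file in binary mode); the function name is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# The same label text saved two different ways.
saved_as_utf8 = "<label> ausgefüllt </label>".encode("utf-8")
saved_as_latin1 = "<label> ausgefüllt </label>".encode("iso-8859-1")

print(is_valid_utf8(saved_as_utf8), is_valid_utf8(saved_as_latin1))
```

A pure-ASCII file passes this check too, since ASCII is a subset of UTF-8, so the test is only informative once the file contains characters such as ü.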
Try the solution in blog post German characters encoding issue (2012-05-10):
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
Have you tried &uuml; (ü) and &Uuml; (Ü)?
You can find how to type other letters here.
Declare <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=utf-8">
and when saving the file, for example in notepad, choose the save as to be UTF-8 and not just .txt.
This should render the characters ok.
You may try the utf8_encode() or utf8_decode() functions. Check if either of these works.
For example <?php echo utf8_encode('ausgefüllt'); ?>
Hope it will work.
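A caveat on this approach: PHP's utf8_encode() converts from ISO-8859-1 to UTF-8, so it only helps when the input really is ISO-8859-1. The Python sketch below uses a hypothetical stand-in function with the same conversion to show what happens in both cases.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Stand-in for the conversion utf8_encode() performs: ISO-8859-1 to UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")

word = "ausgefüllt"

# Source saved as ISO-8859-1: the conversion yields correct UTF-8.
fixed = latin1_to_utf8(word.encode("iso-8859-1")).decode("utf-8")

# Source already saved as UTF-8: the conversion encodes it a second time,
# and a browser decoding the result as UTF-8 shows two characters for ü.
double_encoded = latin1_to_utf8(word.encode("utf-8")).decode("utf-8")

print(fixed, double_encoded)
```

So trying these functions at random can turn one encoding problem into another; it is safer to find out first which encoding the file is saved in.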
Sounds like a character encoding issue, in that the file is saved as a different character encoding to what the webserver is saying it is.
I don't like the use of HTML entities (like &uuml;); they are only needed when there is something wrong with your character set.
In short:
The RIGHT way is to fix your characterset.
The EASY way is to just use entities. You may not ever see any problems with this.
Tracking down a character set error can be very difficult. If you give us a URL where we can see the problem, we can probably give you a good hint about where to look.
Save your file as UTF-8, and use this meta tag, whose charset must match the encoding the file is saved in:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>