Characters not displaying correctly in different browsers - html

I used certain characters in my website, such as • — “ ” ‘ ’ º ©.
When testing what my website looked like in different browsers (BrowserLab),
I found that the aforementioned characters are replaced with �.
I then changed the charset in the webpage header from:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
to
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Suddenly all the pages have the above-mentioned characters replaced with a ?.
Even more puzzling, this is not always consistent across pages, or even within the same page: some sections display the characters • and © correctly.
In particular, I need to replace the character • with one that will display across browsers. Can anyone help me with the answer? Thanks.

You should save your HTML source as UTF-8.
Alternatively, you can use HTML entities.
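If you go the entity route, the characters from the question all have standard named and numeric references, which display correctly regardless of the file's encoding:
• is &bull; or &#8226;
— is &mdash; or &#8212;
“ is &ldquo; or &#8220;
” is &rdquo; or &#8221;
‘ is &lsquo; or &#8216;
’ is &rsquo; or &#8217;
º is &ordm; or &#186;
© is &copy; or &#169;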

The source code needs to be saved in the same encoding as the one you're instructing the browser to parse it in. If you're saving your files as UTF-8, instruct the browser to parse them as UTF-8 by setting an appropriate HTTP header or HTML meta tag (headers are preferable, and your web server may already be setting one without you knowing). Use a decent editor that clearly tells you which encoding you're saving the file in. If the page doesn't display correctly, there's a discrepancy between what you're telling the browser the file is encoded in and what it's really encoded in.
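A minimal sketch of the two places where the declaration can live, assuming the file really is saved as UTF-8 (the HTTP header, if sent, wins over the meta tag):
Content-Type: text/html; charset=utf-8   (HTTP response header)
<meta charset="utf-8">   (HTML fallback, placed within the first 1024 bytes of the document)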

Check whether Apache is set up to send the charset. Look for the directive "AddDefaultCharset" and set it to Off in .htaccess or your config file.
Most (if not all) browsers will take what is sent in the HTTP headers over what is in the document.
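For example, in .htaccess (a sketch only; whether you may override this depends on how your Apache instance is configured):
AddDefaultCharset Off
or, if you would rather have Apache send a charset that matches your files:
AddDefaultCharset UTF-8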

If you're using Notepad++, I suggest you use the EditPlus editor to copy the text (which has the special characters) and paste it into your file. This should work.

Yes, I had this problem too: in Notepad++, copying and pasting wasn't working with some symbols.
I think SLaks is right.
The HTML entity for the copyright symbol is &#169;.

Related

Chinese text encoding missing characters when viewed in web browser

I have an HTML file which contains Chinese text. When I open the file in any web browser, there are characters which appear to be missing.
Here's an example copied from the browser window:
本函旨在邀請您參�� 定於
I know for a fact that all other characters seen here are correct aside from the missing ones (confirmed by a native Chinese speaker).
In the HTML header, I have a tag which signifies the file contains UTF-8 encoded characters:
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
I've already tried some other charsets in this META tag, but so far it seems any encoding method I try aside from UTF-8 ends up looking worse.
I also considered the possibility that it is a font issue, so I installed 3 different traditional Chinese fonts on my system and forced Chrome to use them. None of them made any difference - missing characters were still present.
If I open the HTML file with Notepad++, here's what I can see:
http://i.imgur.com/GoS07WX.png
If I select and copy-paste this text into regular MS Notepad, I get this:
本函旨在邀請您參劦nbsp;定於
So you can see here that the "xE5 x8A" visible in Notepad++ seems to have been replaced by 劦.
Is there any reason why the browser would be showing �� instead of 劦 in this scenario?
Look again at the HTML file.
I see the first 2 bytes of a character encoded in UTF-8, followed by ... let's imagine there was originally a \xA0, and this was mutated to &nbsp; when the file was created by applying global substitutions to the UTF-8-encoded data.
However, \xE5\x8A\xA0 decoded as UTF-8 gives U+52A0, which is not the same as the alien character, U+52A6 ... so this is not close enough to be an answer.
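If the intended character really was 加 (U+52A0), which the byte analysis above suggests but cannot confirm, one way to make the page immune to this kind of byte-level substitution would be to write the character as a numeric reference rather than as raw UTF-8 bytes (a sketch only, given the uncertainty about the original character):
&#x52A0; (hex) or &#21152; (decimal) - both render as 加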

Special character not displaying as expected

I have the following simple HTML page:
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>
<body>
<div>
méywe
</div>
</body>
</html>
When displaying it in Chrome or Firefox (I did not test other browsers), I see the following:
m�ywe
What did I miss? The HTML file is saved in UTF-8 encoding. The server is Apache. My machine is Windows 7 Pro. The text editor is UltraEdit.
Thanks!
Update
Initially, I used UltraEdit for editing this HTML file and I got the problem. Based on cmbuckley's input and after installing Notepad++ (from Heatmanofurioso's suggestion), I thought about the possibility of my file being corrupt somehow (even though it looks fine in both UltraEdit and Notepad). So I saved my file with Notepad in UTF-8 encoding. I still saw the problem (maybe due to cache?). Then I used UltraEdit to save it again. Viewing the page in the browser, the problem is gone.
Lesson Learned
Have two text editors if a text editor is your tool, and try the other one if you see an unexplainable problem. No tool is perfect, even one you use every day. In my case, Notepad++ fixed the UTF-8 issue with my file where UltraEdit somehow failed.
Thanks to the folks for helping!!!
1 - Replace your
<meta charset="utf-8">
with
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
2 - Check if your HTML editor's encoding is set to UTF-8. This option is usually found in the menus at the top of the program, as in Notepad++.
3 - Check if your browser is compatible with your font, if you're somehow importing a font. Or try adding CSS to set your font to a default, generally accepted one, like
body
{
font-family: "Times New Roman", Times, serif;
}
Hope it helps :)
The reason the file was (most likely) saved with Windows-1252 encoding instead of UTF-8 encoding, resulting in the non-ASCII character being displayed wrong in the browsers, was missing knowledge about UTF-8 detection by UltraEdit and perhaps also about the appropriate UTF-8 configuration.
How the currently latest version 22.10 of UltraEdit detects UTF-8 encoding is explained in detail in the user-to-user forum topic UTF-8 not recognized, largish file. This forum topic also contains recommendations on how to best configure UltraEdit for HTML writers who use mainly UTF-8 encoding for all HTML files. The UTF-8 detection was greatly improved with UltraEdit v24.00, which detects UTF-8 encoded characters also in very large files when scrolling to a block containing a UTF-8 encoded character.
Unfortunately, the regular expression search used by the currently latest UltraEdit v22.10 and previous versions to detect a UTF-8 HTML character set declaration does not work for the short HTML5 variant, as reported in the forum topic Short UTF-8 charset declaration in HTML5 header. The reason is the double-quote character between charset= and utf-8. I reported this by email to IDM Computer Solutions, Inc. when the referenced topic was created, with the suggestion to make the small change in the regular expression so that the short HTML5 UTF-8 declaration is also detected. The UTF-8 detection was updated later by the developers of UltraEdit for UE v24.00 and UES v17.00, as a post in the referenced forum topic explains in detail.
However, when an HTML5 file is declared as UTF-8 encoded but UltraEdit loaded it as an ANSI file, the user can see the wrong loading in the status bar at the bottom of the main window. A small (less than 64 KB) UTF-8 encoded HTML file should result in getting
either U8- and the line terminator type (DOS/UNIX/MAC) displayed, for users of UE < v19.00 or when using the basic status bar in later versions of UE,
or UTF-8 selected in the encoding selector in the status bar, for users of UE v19.00 or later versions not using the basic status bar.
If this is not the case, the UltraEdit user can use
Save As from the File menu and select UTF-8 - NO BOM for Encoding (Windows Vista or later) or Format (Windows 2000/XP) to convert the file from ANSI to UTF-8 without a byte order mark, or
ASCII to UTF-8 (Unicode editing) from the Conversions submenu in the File menu to convert the file from ASCII/ANSI to UTF-8 without an immediate save, or
select Unicode - UTF-8 via the encoding selector in the status bar (UE v19.00 or later only), which also results in an immediate conversion from ASCII/ANSI to UTF-8 and enables Unicode editing.
For the last two options, the UTF-8 BOM settings at Advanced - Settings or Configuration - File Handling - Save determine whether the file is saved without or with a byte order mark on the next save.
Once the word méywe is saved into the file using UTF-8 encoding, the resulting byte stream is 6D C3 A9 79 77 65 (hexadecimal), which would be displayed as méywe if the UTF-8 encoded file were opened in ASCII/ANSI mode (an option in the File - Open dialog) using Windows-1252 as the code page. UltraEdit then detects this file automatically as a UTF-8 encoded file on the next opening, even though <meta charset="utf-8"> is not recognized, because there is now at least one UTF-8 encoded character in the first 64 KB of the file.
To answer the question:
What did I miss?
You missed saving the file as a UTF-8 encoded file after having opened or created it as an ANSI file (or, more precisely, a text file encoded with a single byte per character using a code page) and having declared it as UTF-8 encoded. This is a common problem for many users writing into an HTML file
<meta charset="utf-8">
or
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
or
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
or into an XML file
<?xml version="1.0" encoding="UTF-8"?>
or
<?xml version="1.0" encoding='utf-8'?>
and other variations, depending on the usage of ' or " and on writing either UTF-8 or utf-8 (and other spellings), without really knowing what this string means for the applications interpreting the bytes of the file.
The forum topic What's the best default new file format? contains lots of useful information and links to web pages with useful information about text encodings, which one to use for which file types, and how to configure UltraEdit accordingly.
Check whether the server is sending a charset in the Content-Type header. The encoding specified there will take precedence over what you specify with the meta element.
Changing font-family to Calibri (or any other generally accepted font) worked for me.
Example:
<span style="font-family:Calibri"># My_Text</span>
I am using an MS Access (.accdb) database with PHP. It had a problem displaying the "±" character; it was displaying "�".
I added the following line at the beginning of the PHP to get it right. My problem is solved now.
header('Content-type: text/html; charset=ASCII');
Another method is to use mb_convert_encoding($row, 'UTF-8', 'ASCII');
in which case the header declaration is not required.
In my case, I converted the special character to a decimal NCR and it worked. I had to do this because using the meta tag did not work and I did not want to change my font.
There are many online Unicode to decimal or hex converters.
Χαίρετε -> &#935;&#945;&#943;&#961;&#949;&#964;&#949;
Replace <meta charset="utf-8"> with <meta http-equiv="Content-Type" content="text/html; charset=utf-8">. Maybe it will help.
Otherwise, what is your font?

How do I display Unicode as text in HTML?

I can't manage to find a way to do this. For example, I want ∞ (the infinity symbol) to display as text in an HTML document.
You first have to check what Content-Type header your server returns. Is it Content-Type: text/html; charset=UTF-8? See Character_encodings_in_HTML. If the server returns a charset, either fix it or use it, because it overrides any encoding provided in the document (see HTML entities).
If your server does not provide a charset, then add one in the document, as early as possible (it should fall entirely within the first 1024 bytes). Again, see Character_encodings_in_HTML. The following header should do:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
or for HTML 5:
<meta charset="utf-8">
or for XHTML (the first line):
<?xml version="1.0" encoding="ISO-8859-1"?>
And if you do not or cannot use UTF-8 for your document, use HTML entities as C Travel suggests.
You write the character, e.g. “∞”, in your authoring program, save the file as UTF-8 with BOM, and make sure that the fonts you have declared for the page, or for the relevant piece of text, contain the character(s) you have included. For more information, see my Guide to using special characters in HTML. If problems remain, please post the code you have tried and specify how it fails (and on which browsers).
You can use a numeric &#nnnn; HTML entity (see the example below).
For the codes, see: http://unicode-table.com/en/
And you have to save the file with UTF-8 encoding, and you have to put the UTF-8 meta tag in the header too (if you don't already have it).
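For the infinity symbol specifically, the named and numeric references below work regardless of the file's encoding (∞ is U+221E):
<p>To &infin; and beyond - or, equivalently, &#8734; / &#x221E;.</p>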

Why does a diamond with a question mark in it (�) appear in my HTML?

I have an unordered list, and � often (but not always!) appears where I have two spaces between characters. What is causing this, and how do I prevent it?
This specific character, �, is usually the sign of an invalid (non-UTF-8) character showing up in output (like a page) that has been declared to be UTF-8. It often happens when
a database connection is not UTF-8 encoded (even if the tables are),
an HTML or script source file is stored in the wrong encoding (e.g. Windows-1252 instead of UTF-8) - make sure it's saved as a UTF-8 file; the setting is often in the "Save as..." dialog (see the sketch after this list), or
an online source (like a widget or an RSS feed) is fetched that isn't serving UTF-8.
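One concrete case worth checking, offered only as a guess based on the "two spaces" symptom: a non-breaking space typed into a Windows-1252 file is stored as the single byte 0xA0, which is not valid on its own in UTF-8 and therefore renders as �. Re-saving the file as UTF-8, or writing the entity instead, sidesteps it:
<li>first item&nbsp;&nbsp;written with entity-encoded non-breaking spaces</li>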
I had the same issue ....
You can fix it by adding the following line in your template:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
It's a character-set issue. Get a tool that inspects the response headers of the server (like the Firebug extension if you're using Mozilla Firefox) to see what character set the server response is sending with the content. If the server's character-set and the HTML character set of the actual content don't match up, you will see some strange looking characters like those little black diamond squares.
I had the same issue when getting an HTML output from an XSLT. Along with Pradip's solution I was also able to resolve the issue using UTF-32.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-32" />

Displaying unicode symbols in HTML

I want to simply display the tick (✔) and cross (✘) symbols in an HTML page, but they show up as either a box or goop (âœ”) - obviously something to do with the encoding.
I have set the meta tag to show utf-8 but obviously I'm missing something.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Edit/Solution: From comments made, using Firebug I found the headers being passed by my page were in fact "Content-Type: text/html" and not UTF-8. Looking at the file format using Notepad++ showed my file was formatted as "UTF-8 without BOM". Changing this to just UTF-8, the symbols now show correctly... but Firebug still seems to indicate the same content-type.
You should ensure the HTTP server headers are correct.
In particular, the header:
Content-Type: text/html; charset=utf-8
should be present.
The meta tag is ignored by browsers if the HTTP header is present.
Also ensure that your file is actually encoded as UTF-8 before serving it; check/try the following:
Ensure your editor saves it as UTF-8.
Ensure your FTP or any file transfer program does not mess with the file.
Try with HTML encoded entities, like &#uuu;.
To be really sure, hexdump the file and look at the character; for ✔, it should be E2 9C 94.
Note: If you use a Unicode character for which your system can't find a glyph (no font with that character), your browser should display a question mark or some block-like symbol. But if you see multiple Roman characters like you do, this denotes an encoding problem.
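To make the "multiple Roman characters" symptom concrete: if the UTF-8 bytes of ✔ (E2 9C 94) are decoded as Windows-1252, which is an assumption about which single-byte code page is being applied, each byte turns into its own character:
E2 -> â
9C -> œ
94 -> ” (right double quotation mark)
so ✔ comes out as âœ”, the kind of goop described in the question.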
I know an answer has already been accepted, but wanted to point a few things out.
Setting the content-type and charset is obviously good practice, and doing it on the server is much better, because it ensures consistency across your application.
However, I would use UTF-8 only when the language of my application uses a lot of characters that are available only in the UTF-8 charset. If you just want to show a Unicode character or symbol in a few places, you can do so without changing the charset of your page.
HTML renderers have always been able to display symbols which are not part of the encoding character set of the page, as long as you mention the symbol via its numeric character reference (NCR). Sounds weird, but it's true.
So, even if your HTML has a header that states it has an encoding of ANSI or any of the ISO charsets, you can display a check mark by using its HTML character reference, in decimal - &#10003; - or in hex - &#x2713; (the references for the exact symbols in the question are given after this answer).
So it's a little difficult to understand why you are facing this issue on your pages. Can you check whether the NCR value is correct? This is a good reference: http://www.fileformat.info/info/unicode/char/2713/index.htm
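For the exact symbols in the question (✔ is U+2714, ✘ is U+2718), the corresponding numeric references are:
✔ is &#10004; or &#x2714;
✘ is &#10008; or &#x2718;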
Make sure that you actually save the file as UTF-8; alternatively, use HTML entities (&#nnn;) for the special characters.
Contrary to what Nicolas proposed, the meta tag isn't actually ignored by browsers. However, the Content-Type HTTP header always has precedence over the presence of a meta tag in the document.
So make sure that you either send the correct encoding via the HTTP header, or don’t send this HTTP header at all (not recommended). The meta tag is mainly a fallback option for local documents which aren’t sent via HTTP traffic.
Using HTML entities should also be considered a workaround – that’s tiptoeing around the real problem. Configuring the web server properly prevents a lot of nuisance.
I think this is a file problem: you simply saved your file in a 1-byte encoding like Latin-1. Look up how to make your editor save files as UTF-8.
I wonder why there are editors that don't default to UTF-8.