Unicode characters within text document display on browser - html

I've created a very simple, functional web page. The file index.html has a local link that points to a .txt file inside the file manager of my website.
I understand what the problem is; I just don't know how to fix it.
My text document has Unicode characters such as ± and √ and ² and ³.
These characters display well in the Notepad file they were created in.
When the user clicks the link, the browser opens this text file, which is not formatted as HTML, so the Unicode characters don't display correctly in the browser.
I could retype the text document with HTML tags and use numeric codes for the special symbols, and it would most likely work, but that is something I don't want to go through.
Is there any other way around this problem? I need the browser to display these characters within the text document.
Is there a way to tell the browser to convert the text document to HTML on the fly?
Thanks

You can use non-ASCII characters just fine. If you save the file as UTF-8, every browser that’s been updated this century will be able to display it. It’s been decades since browsers choked on a byte-order mark, but most people recommend leaving it out.
You can also use non-ASCII characters in HTML saved as UTF-8, literally and without any escape codes. While browsers should be able to detect the character set, best practice is to add either the tag <meta charset="UTF-8"> or <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> as early as possible inside the <head> element (browsers only scan the first 1024 bytes for it). This removes all ambiguity.
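If re-saving the text file by hand is inconvenient, the conversion to UTF-8 can be scripted. The following is a minimal Python sketch, not a definitive tool. It assumes the original file was saved with Notepad's "Unicode" option (UTF-16 with a BOM), which is a guess, since √ cannot be stored in an ANSI file; the function name and file names are made up for illustration.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, source_encoding='utf-16'):
    """Re-save a text file as UTF-8 without a BOM.

    source_encoding is an assumption about how the file was saved;
    change it if the editor used something else.
    """
    # newline='' keeps the original line endings untouched.
    with open(src_path, 'r', encoding=source_encoding, newline='') as src:
        text = src.read()
    with open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)
    return text


# Small demonstration in a temporary folder (hypothetical file names).
sample = '± √ ² ³\r\n'
with tempfile.TemporaryDirectory() as folder:
    src = os.path.join(folder, 'notes-utf16.txt')
    dst = os.path.join(folder, 'notes-utf8.txt')
    with open(src, 'w', encoding='utf-16', newline='') as handle:
        handle.write(sample)
    convert_to_utf8(src, dst)
    with open(dst, 'rb') as handle:
        converted_bytes = handle.read()
```

If the characters still come out garbled after conversion, it is worth checking that the server sends the file with a Content-Type header of text/plain; charset=utf-8.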

Related

Chinese text encoding missing characters when viewed in web browser

I have an HTML file which contains Chinese text. When I open the file in any web browser, some characters appear to be missing.
Here's an example copied from the browser window:
本函旨在邀請您參�� 定於
I know for a fact that all other characters seen here are correct aside from the missing ones (confirmed by a native Chinese speaker).
In the HTML header, I have a tag which signifies the file contains UTF-8 encoded characters:
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
I've already tried some other charsets in this META tag, but so far it seems any encoding method I try aside from UTF-8 ends up looking worse.
I also considered the possibility that it is a font issue, so I installed 3 different traditional Chinese fonts on my system and forced Chrome to use them. None of them made any difference - missing characters were still present.
If I open the HTML file with Notepad++, here's what I can see:
http://i.imgur.com/GoS07WX.png
If I select and copy-paste this text into regular MS Notepad, I get this:
本函旨在邀請您參劦nbsp;定於
So you can see here that the "xE5 x8A" visible in Notepad++ seems to have been replaced by 劦.
Is there any reason why the browser would be showing �� instead of 劦 in this scenario?
Look again at the HTML file.
I see the first 2 bytes of a character encoded in UTF-8, followed by ... let's imagine there was originally a \xA0, and this was mutated to &nbsp; when the file was created by applying global substitutions to the UTF-8-encoded data.
However, \xE5\x8A\xA0 UTF-8 decodes to U+52A0 which is not the same as the alien character which is U+52A6 ... not close enough to an answer.
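The hypothesis in this answer can be checked with a few lines of Python. This is only a sketch of the imagined scenario (a byte-level replacement of \xA0 with the text &nbsp;), not a reconstruction of what actually happened to the file.

```python
# The three bytes the answer expects for the intact character.
intact = b'\xe5\x8a\xa0'
decoded = intact.decode('utf-8')  # the character U+52A0

# Imagined damage: a global substitution applied to raw bytes,
# turning every \xA0 byte into the six ASCII bytes "&nbsp;".
damaged = intact.replace(b'\xa0', b'&nbsp;')

# The leading bytes \xE5\x8A are now an incomplete UTF-8 sequence,
# so a decoder substitutes U+FFFD (the replacement character).
damaged_text = damaged.decode('utf-8', errors='replace')
```

A browser that renders damaged_text shows the replacement character followed by a non-breaking space, which is consistent with the output quoted in the question.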

Can't find the source of the gap at the top of my website. Double Checked all my tags

My html and body have margins and padding of 0. All my top-level elements have the same.
I don't want to hack it with position. Any tips? (PS: I hacked the position up 33px in the meantime, until I can find a solution that's less hacky.)
http://dextressband.com/landingpage.php
Right after your <title> tag, and in a few other places, you have this little problem:
This entity encodes U+FEFF, which is Unicode BOM. The entity is treated as text, and while it's completely invisible, it still reserves vertical space for itself before your div with main site content.
This usually means something is using the wrong encoding (UTF-8 with BOM where UTF-8 without BOM is expected). Since it's PHP, I'd start by locating the place where those sections of the site are rendered. If it's rendering the site from a file, check that the file is saved as UTF-8 without BOM in your favorite text editor.
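To find which files carry the stray BOM, the raw bytes can be inspected directly. Here is a rough Python sketch; the function names and the sample page are made up for illustration, and the same check would be run on the bytes of each template or include file.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def count_boms(data: bytes) -> int:
    """Count UTF-8 BOM sequences anywhere in the data."""
    return data.count(BOM)


def remove_boms(data: bytes) -> bytes:
    """Drop every UTF-8 BOM sequence, including ones in the middle."""
    return data.replace(BOM, b'')


# Hypothetical page assembled from two files that were each saved with a BOM.
page = BOM + b'<title>Demo</title>' + BOM + b'<div>content</div>'
cleaned = remove_boms(page)
```

Note that this also removes any U+FEFF placed deliberately as a zero-width no-break space, which is rarely intended in page templates.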
Your html is invalid. Be sure the html is not producing this kind of "view source" in Firefox or other browsers that color code errors like this.
UPDATE
Okay, this will sound crazy, but you have a non-printable character in your head that is causing output to start. Follow these directions carefully:
Put your cursor to the right of the "link" in your first stylesheet declaration (where i've put the asterisk...like this:)
Now use your backspace (or delete if you are on a Mac) and delete until you remove the HEAD tag so that the following tag is gone too
Now type and hit enter and type
Try loading your page again; the demonic non-printing character will be gone and your HTML will be valid again.
btw, for fun
here is a picture of your non-printable character in my ViM session
Your html is missing an html element but you have an html 4.01 strict doctype
You have iframes but an html 4.01 strict doctype.

HTML Character Encoding tag do not work for me

Hello to everyone and sorry for my novice question.
I have an HTML document in which I would like to put some Greek Characters.
I have used the following:
<head>
<meta charset="UTF-8">
</head>
but the result is a sequence of such characters "Ξ Ξ±Ξ½Ξ±Ξ³ΞΉΟΟΞ·Ο ΞΞ­ΟΞ²Ξ±Ο"
Could you please advise a solution?
Thanks in advance and cheers to everyone!!
Maybe your file encoding is ANSI. You can change the file encoding in Notepad++ from the Encoding menu.
The recipe is:
Make sure you are saving your HTML code in UTF-8 (or UTF-16), using whatever setting your HTML editor, Notepad, Vim, or other tool provides.
Set the value of the HTML meta tag to exactly the character encoding in which you saved your HTML file.
Optionally, but strongly recommended: make sure the web server reports the same encoding for your HTML, to avoid misinterpretation.
To sum up: the encoding in which you save your HTML file must be the same character encoding you specify in the HTML meta tag. Otherwise the browser will decode the characters according to the meta tag and will probably render them unreadable.
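The rule summed up above (saved encoding equals declared encoding) can be checked mechanically. This is a minimal Python sketch under some assumptions: the helper names are hypothetical, the declaration is found with a simple regular expression rather than a real HTML parser, and the check is only meaningful for strict encodings such as UTF-8, because single-byte encodings accept almost any byte sequence.

```python
import re


def declared_charset(html_bytes):
    """Return the charset named in a meta tag near the top of the file, or None."""
    match = re.search(rb'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)',
                      html_bytes[:1024], re.IGNORECASE)
    return match.group(1).decode('ascii').lower() if match else None


def matches_declaration(html_bytes):
    """True if the bytes decode cleanly using the declared charset."""
    charset = declared_charset(html_bytes)
    if charset is None:
        return False
    try:
        html_bytes.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return False
    return True


page = '<head><meta charset="UTF-8"></head><p>αβγ</p>'
saved_as_utf8 = page.encode('utf-8')     # declaration and bytes agree
saved_as_greek = page.encode('cp1253')   # declared UTF-8, saved as Windows Greek
```

Here matches_declaration(saved_as_utf8) is true and matches_declaration(saved_as_greek) is false, which is the mismatch the answers describe.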

Characters not displaying correctly in different browsers

I used certain characters in website such as • — “ ” ‘ ’ º ©.
I found that when testing to see what my website looked like under different browsers (BrowserLab)
the afore-mentioned characters are replaced with �.
I then changed the charset in the webpage header from:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
to
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Suddenly all the pages have the above mentioned characters replaced with a ?.
Even more puzzling is this is not always consistent across and even within the same page, as some sections display the character • and © correctly.
In particular, I need to replace the character • with one that will display across browsers, can anyone help me with the answer? Thanks.
You should save your HTML source as UTF-8.
Alternatively, you can use HTML entities instead.
The source code needs to be saved in the same encoding as you're instructing the browser to parse it in. If you're saving your files in UTF-8, instruct the browser to parse it as UTF-8 by setting an appropriate HTTP header or HTML meta tag (headers preferable, your web server may be setting one without you knowing). Use a decent editor that clearly tells you what encoding you're saving the file as. If it doesn't display correctly, there's a discrepancy between what you're telling your browser the file is encoded in and what it's really encoded in.
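Both symptoms from the question can be reproduced in a few lines of Python. This sketch uses the bullet character as the example and assumes the page was originally saved as windows-1252, as the original meta tag suggests.

```python
bullet = '\u2022'  # the bullet character from the question

as_cp1252 = bullet.encode('cp1252')  # one byte: 0x95
as_utf8 = bullet.encode('utf-8')     # three bytes: E2 80 A2

# File saved as windows-1252 but labelled UTF-8:
# the lone byte 0x95 is invalid UTF-8 and becomes U+FFFD.
labelled_utf8 = as_cp1252.decode('utf-8', errors='replace')

# File saved as UTF-8 but labelled windows-1252:
# each of the three bytes is shown as a separate character.
labelled_cp1252 = as_utf8.decode('cp1252')
```

Pages where only some sections break usually mix content saved in different encodings, for example includes or database text that do not match the main file.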
Check to see if Apache is set up to send the charset. Look for the directive "AddDefaultCharset" and set it to Off in .htaccess or your config file.
Most/all browsers will take what is sent in the HTTP headers over what is in the document.
If you're using Notepad++, I suggest using the EditPlus editor instead to copy the text (which has the special characters) and paste it into your file. This should work.
Yes, I had this problem too: in Notepad++, copying and pasting wasn't working with some symbols.
I think SLaks is right
The HTML entity for the copyright symbol is &#169; (or &copy;).
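For the entity route, the numeric references do not have to be looked up by hand. A small Python sketch, where the helper name is hypothetical: it turns every non-ASCII character into a numeric character reference, so the result displays the same way whatever encoding the page is served in.

```python
import html


def to_ascii_html(text):
    """Replace each non-ASCII character with a numeric character reference."""
    # Note: this does not escape &, < or >; it only handles non-ASCII.
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')


escaped = to_ascii_html('\u00a9 \u2022')   # copyright sign and bullet
restored = html.unescape(escaped)          # back to the literal characters
```

With this input, escaped is the string "&#169; &#8226;", and html.unescape reverses it.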

Why do symbols like apostrophes and hyphens get replaced with black diamonds on my website?

A website I've made has a few problems... On one of the pages, wherever there's an apostrophe (') or a dash (-), the symbol gets replaced with a weird black diamond with a question mark in the center of it
Here's what I mean
It seems this is happening all over the site wherever these symbols appear. I've never seen this before, can anyone explain it to me?
Suggestions on how to fix it would also be greatly appreciated.
See http://test.rfinvestments.co.za/index.php?c=team for a clear look at the problem.
It's an encoding problem. You have to set the correct encoding in the HTML head via meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
Replace "ISO-8859-1" with whatever your encoding is (e.g. 'UTF-8'). You must find out what encoding your HTML files use. If you're on a Unix system, just type file file.html and it should show you the encoding. If this is not possible, you should be able to find out somewhere what encoding your editor produces.
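Where the file command is not available, a rough guess can be made from the bytes themselves. The Python sketch below is a simplification and not a replacement for file: the function name and return labels are made up, and it cannot tell one 8-bit encoding (ISO-8859-1, windows-1252 and so on) from another.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Return a rough label for how the bytes appear to be encoded."""
    if data.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        # Not valid UTF-8: some legacy 8-bit encoding, exact one unknown.
        return 'unknown-8bit'
    # Valid UTF-8; pure ASCII is a subset worth reporting separately.
    return 'ascii' if all(byte < 0x80 for byte in data) else 'utf-8'


curly = 'it\u2019s'  # "it's" with a typographic apostrophe
```

For example, sniff_encoding(curly.encode('cp1252')) gives 'unknown-8bit', which is the case that produces black diamonds when the page is declared as UTF-8.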
You need to change your text to 'Plain text' before pasting into the HTML document. This looks like an error I've had before by pasting straight from MS word.
MS Word and other rich text editors often place hidden or invalid chars into your code. Try using &mdash; for your dashes, or &rsquo; for apostrophes (etc), to eliminate the need for relying on your char encoding.
I had the same issue in my ASP.NET web application.
I just replaced ' with &rsquo; as shown below, and my site now shows the apostrophe in the browser without the rectangle around it described in the question.
Original text in HTML page
Click the Edit button to change a field's label, width and type-ahead options
Replaced text in HTML page
Click the Edit button to change a field&rsquo;s label, width and type-ahead options
Look at your actual HTML code and check that the weird symbols are not originating there. This issue came up when I started coding in Notepad++ halfway through, after coding in Notepad. It seems to me that the older version of Notepad I was using may have used a different encoding from Notepad++'s UTF-8 encoding. After I transferred my code from Notepad to Notepad++, the apostrophes got replaced with weird symbols, so I simply had to remove the symbols from my Notepad++ code.
If you are editing HTML in Notepad you should use "Save As" and change the default "Encoding:" selection at the bottom of the dialog to UTF-8.
You should also include:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
This unambiguously sets the correct character set and informs the browser.
I experienced the same problem when I copied a text that has an apostrophe from a Word document to my HTML code.
To resolve the issue, all I did was delete the particular word in my HTML and type it directly, including the apostrophe. This replaced the originally pasted character, and the newly typed apostrophe displayed correctly.
What I really don't understand with this kind of problem is that the html page I ran as a local file displayed perfectly in Chromium browser, but as soon as I uploaded it to my website, it produced this error.
Even stranger, it displayed perfectly in the Vivaldi browser whether displayed from the local or remote file.
Is this something to do with the way Chromium reads the character set? But why only with a remote file?
I fixed the problem by retyping the text in a simple text editor and making sure the single quote mark was the one I used.