I am reading in HTML from a file and displaying it on a web page:
When I look at the source I see:
The Club’s summer junior programs
but it shows up as:
The Club�s summer junior program
What is happening here, and why is the � showing up?
Did you set the proper encoding of the html page?
Read here and here.
I'm guessing you (or someone close to you) are copy/pasting from Word and you are seeing the webby effects of Word's [not so] smart quotes. The workaround is to set the character encoding to UTF-8 or windows-1252, whichever matches the bytes actually stored in the file.
This is definitely a character encoding issue. It means the page says it has X encoding, but actually it has Y.
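A minimal Python sketch of that mismatch, assuming the file was saved as windows-1252 (common for text pasted from Word) while the page is declared as UTF-8. The sample string comes from the question; the variable names are only illustrative.

```python
# Assumption: the bytes on disk are windows-1252, but the page is declared as UTF-8.
text = "The Club\u2019s summer junior programs"  # \u2019 is the curly apostrophe

raw = text.encode("cp1252")  # the apostrophe becomes the single byte 0x92

# A lone 0x92 is not valid UTF-8, so a lenient UTF-8 decoder substitutes U+FFFD.
declared = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes really use restores the text.
actual = raw.decode("cp1252")

print(declared)
print(actual)
```

The first line printed contains the replacement character where the apostrophe should be, which is what a browser does when the declared encoding cannot decode a byte.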
A very interesting read by Joel on this topic: http://www.joelonsoftware.com/articles/Unicode.html. It is definitely a must-read if you haven't read it already.
It explains pretty well why these problems occur, how they came to be, and how to avoid them :).
Maybe you copied the text from a word processor, like MS Word, which changes straight quotes into opening and closing quote characters. When such text is pasted into a text file that is saved or served in a different encoding, it causes these problems.
A simple solution can be to type these quotes again in the text editor.
This character:

shows up on my site 3 times, and in all 3 cases it appears right after a closing div tag. I searched the web and Stack Overflow and found some solutions, but none of them worked for me, so I decided to post here. I am using .NET. I realize that this is not sufficient info, but I am new to programming, so I'm not sure what other info you might need. Please let me know. Thanks!
Looks like a byte order mark. Please check your source and output encoding.
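To illustrate, here is a small Python sketch of how a UTF-8 byte order mark turns into visible junk, and two ways to drop it. The sample bytes and the helper name `strip_utf8_bom` are made up for the example; they are not taken from the asker's site.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file content: a BOM (bytes EF BB BF) followed by markup.
with_bom = codecs.BOM_UTF8 + b"<div>hello</div>"

# Read as windows-1252, the three BOM bytes show up as three visible characters.
print(with_bom.decode("cp1252"))

# The "utf-8-sig" codec drops the BOM while decoding.
print(with_bom.decode("utf-8-sig"))

# Or strip it at the byte level before writing the file back out.
print(strip_utf8_bom(with_bom))
```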
Yes it is the Byte Order Mark (BOM). It was driving me crazy too. I researched and started reading about BOM and tried adding charset="UTF-8" to some script tags but no go.
I use Dreamweaver and found that when I saved (Save As) some recent HTML files, the option "Include Unicode Signature (BOM)" was checked. I unchecked it and saved, and that resolved the unwanted characters (I guess it saves the file without the BOM)!!
Updating the meta tags charset to UTF-8 will resolve this too and is recommended (which means dozens of pages for me) but I needed this quick fix.
Also, saving with Notepad++ seems to do the trick as well. Here's a related article about Notepad++ and its BOM settings: notepad++ converting ansi encoded file to utf-8
I hope this helps someone!
I use Dreamweaver and found that when I saved (save as) some recent html files, the option for "Include Unicode Signature (BOM)" was checked. I unchecked and saved and it resolved the unwanted characters (I guess it saves it without the BOM)!!
This is the perfect solution. It worked for me.
thx everyone
I am working on a site which has some Norwegian words. When I use "På" inside a <span>, it shows as "PÃ¥" in the browser. This happens only on one particular page; on the others it works fine. I have tried copying and pasting from the working pages, but that had no effect. It still shows "PÃ¥" instead of "På". Why is this happening?
You need to use &aring; instead of å.
See this link for HTML codes:
http://www.ascii.cl/htmlcodes.htm
Try converting your special characters to equivalent HTML entities using this converter
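If you would rather script the conversion than use an online tool, a rough Python equivalent could look like this. The helper name `to_numeric_entities` is invented for the example; it relies only on the standard `xmlcharrefreplace` error handler and `html.unescape`.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


encoded = to_numeric_entities("På")
print(encoded)

# Browsers treat the named and the numeric form the same way.
print(html.unescape("P&aring;"))
print(html.unescape(encoded))
```

Because the output is pure ASCII, it displays the same under UTF-8, windows-1252, or Latin-1.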
The character encoding of the page is wrong: the real encoding differs from the declared encoding. Using entity references for all non-ASCII characters would hide the symptoms (with the attendant risk that later on, when someone inserts an “å”, things go wrong again). But the real solution is to remove the conflict.
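The conflict can be reproduced in a few lines of Python. This sketch assumes the page's bytes are UTF-8 while the browser is told (or guesses) windows-1252, which is consistent with the "PÃ¥" in the question.

```python
original = "På"

utf8_bytes = original.encode("utf-8")  # å is stored as two bytes: 0xC3 0xA5

# Assumption: the page is declared or served as windows-1252 instead of UTF-8,
# so each of the two bytes is shown as its own character.
garbled = utf8_bytes.decode("cp1252")

# Reversing the wrong step recovers the text, which confirms the diagnosis.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled)
print(repaired)
```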
Check out the tutorial Declaring character encodings in HTML. If you need further help with this, posting the URL (not just copy of all code) is essential.
Recently I've done some careless copying and pasting into my html documents. Because the document type is set to Strict, the quotation marks and apostrophes show up as this crazy symbol: �.
Example: Brad says, �Don�t rock the boat baby.�
I considered changing the document type from Strict (which could turn out to be the easiest thing to do) but I'm not sure if changing from Strict would have any negative repercussions.
Naturally, I need to get rid of them. The problem is that I need to replace A LOT of them from a lot of different documents. I'd use the replace feature on Textpad, but it doesn't recognize � , so I can't change it. I've been reduced to going through all of the code and doing the tedious replacing.
Does anyone know of a good way to replace these things? It could be some other software, or anything else really.
I always use TextMate on the Mac to replace them because it does recognize those characters. Try Notepad++ and a bunch of different text editors and see what you come up with.
I remember dealing with this when I first started out and my secretary got reprimanded for such a crime.
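For a lot of documents, a script may be less tedious than an editor. The sketch below uses a different technique from find-and-replace: it assumes the pasted quotes were saved as windows-1252 bytes and rewrites each such file as UTF-8 (which then has to be the encoding the pages declare). The function names and the `*.html` pattern are assumptions; try it on a copy of the files first.

```python
from pathlib import Path


def needs_conversion(data: bytes) -> bool:
    """True if the bytes are not valid UTF-8 (pure ASCII counts as valid)."""
    try:
        data.decode("utf-8")
        return False
    except UnicodeDecodeError:
        return True


def reencode_file(path: Path, source: str = "cp1252", target: str = "utf-8") -> bool:
    """Rewrite one file from source to target encoding; return True if changed."""
    data = path.read_bytes()
    if not needs_conversion(data):
        return False  # already UTF-8 (or plain ASCII), leave it alone
    # Raises UnicodeDecodeError if a byte is undefined in the source encoding.
    path.write_bytes(data.decode(source).encode(target))
    return True


def reencode_tree(root: Path, pattern: str = "*.html") -> int:
    """Re-encode every matching file under root; return how many were rewritten."""
    return sum(reencode_file(p) for p in sorted(root.rglob(pattern)))
```

Calling `reencode_tree(Path("site"))` on a hypothetical `site` folder returns the number of files it rewrote. Files that are already valid UTF-8 are skipped, so a second run changes nothing.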
Ok, so I want to have the characters from below in my html page. Seems easy, except I can't find the HTML encoding for them.
Note: I would like to do this without having sized elements, plain ol' text would be fine ^_^.
Cheers.
You can see that the dialog shows the Unicode number of the selected character at the bottom of the picture ("U+266A: Eighth Note").
Simply use the last portion in a Unicode character entity: &#x266A; - ♪
If your page is already UTF-8, you can simply paste it in.
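As a quick check, here is a Python sketch that builds the reference from the U+ number and confirms it maps back to the same character. The helper name `hex_reference` is invented for the example.

```python
import html
import unicodedata


def hex_reference(code_point: int) -> str:
    """Build a hexadecimal character reference such as &#x266A; from a code point."""
    return f"&#x{code_point:04X};"


note = chr(0x266A)  # U+266A, the number shown in the character map dialog

print(unicodedata.name(note))            # the official Unicode name
print(hex_reference(ord(note)))          # the reference to put in the HTML
print(html.unescape(hex_reference(0x266A)) == note)
```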
Try encoding it as &#9608; (█) - that should do the trick!
In a UTF-8 encoded page, just copy and paste them as-is.
Otherwise, use the number that the dialog gives you for each character in a character reference, e.g. &#x266A; for ♪
However, when working with rather exotic characters, be very wary of font support. See e.g. this question for background: Unicode support in Web standard fonts
This page gives some information about support for the characters you want to use. They seem to be relatively well supported, but a test on Linux and Mac machines won't hurt.
Here is one comprehensive entity reference. If you want to convert symbols into their entity counterparts, I suggest using this converter.
My suggestion is to use a hexadecimal reference (it's easy, don't worry :) ).
For example, the first character you have highlighted in red has a character code of 175, which is AF in hex.
So, in short, you can encode it as %AF, and so on...
Is that clear, mate? Let me know if you need further explanation or help with this :)
Edit: my post is meant for URL encoding, not for HTML.
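For the URL case, Python's `urllib.parse.quote` shows where %AF comes from. One assumption to flag: %AF is what you get when code 175 is written as a single Latin-1 byte; in a UTF-8 URL the same character takes two bytes.

```python
from urllib.parse import quote, unquote

ch = chr(175)  # code 175 is U+00AF

print(format(175, "X"))  # decimal 175 written in hex

latin1_form = quote(ch, encoding="latin-1")  # one byte, 0xAF
utf8_form = quote(ch, encoding="utf-8")      # two bytes, 0xC2 0xAF

print(latin1_form)
print(utf8_form)

# Decoding has to use the same encoding that produced the escape.
print(unquote(latin1_form, encoding="latin-1") == ch)
```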
I've been asked to add a testimonial to this page...
http://www.orchardkitchens.com/Showroom/testimonials.html
As you will see there are funny characters showing up all over the place, and it has thrown the structure of the page out.
I've since reloaded the backup and the funny chars are still appearing. Any ideas what I need to do??
Please ask if you need more info from me about the problem in hand.
Many thanks,
ETFairfax.
Looks to me as though some of the text was encoded as UTF-8 yet loaded as if it were in an ANSI charset, and then an HTML encode was run over it, resulting in these extra characters. You will need to find the source text and rebuild the HTML, ensuring that whatever reads the source text understands that it is UTF-8 encoded.
Valid HTML might be a start; an HTML document shouldn't start with a meta tag directly. Also, it seems that the charset problem is not in your web page but rather in the backend code. Look at the source; there are numerous things such as
“
appearing which are HTML character entities for things that UTF-8 encoding yields when interpreted as Latin 1. So you should probably fix your code instead of the HTML (well, that too).
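The chain of events described here can be sketched in Python. This is an illustration under assumptions, not the site's actual backend: it uses windows-1252 as the "Latin 1" that browsers and many tools apply in practice, and `to_named_entities` is a hypothetical stand-in for whatever HTML-encode step the backend runs.

```python
import html
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Replace non-ASCII characters that have a named HTML entity."""
    return "".join(
        f"&{codepoint2name[ord(c)]};" if ord(c) > 127 and ord(c) in codepoint2name else c
        for c in text
    )


quote_mark = "\u201c"  # left double quotation mark, three bytes in UTF-8

# Step 1: UTF-8 bytes are misread as windows-1252, giving three junk characters.
misread = quote_mark.encode("utf-8").decode("cp1252")

# Step 2: an HTML-encode pass turns each junk character into an entity.
as_entities = to_named_entities(misread)
print(as_entities)

# Undoing both steps in reverse order gets the original character back.
recovered = html.unescape(as_entities).encode("cp1252").decode("utf-8")
print(recovered == quote_mark)
```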
Your HTML is syntactically invalid. The <!doctype> is missing, the <html> tag is missing, the <head> tag is missing, and the meta information cannot be parsed reliably by the web browser.
Fix your HTML first and then retry.
As to the character encoding story, just ensure that you're using one and the same character encoding everywhere: in the datastore, in the source files, in the response headers, etc. You may find the introductory text of this article useful to learn a bit more about character encodings. If you actually know/use Java, then you may find the proposed solutions useful as well.
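As a language-neutral illustration of "one encoding everywhere", here is a small Python sketch in which the file bytes, the meta declaration, and the response header value all come from a single constant. The file name, the sample testimonial text, and the helper names are all made up for the example.

```python
import tempfile
from pathlib import Path

ENCODING = "utf-8"  # assumption: one encoding chosen for the whole pipeline


def write_page(path: Path, body: str) -> None:
    """Write a minimal HTML page whose declared charset matches the bytes on disk."""
    page = (
        "<!doctype html>\n"
        f'<html><head><meta charset="{ENCODING}"><title>Testimonials</title></head>\n'
        f"<body>{body}</body></html>\n"
    )
    path.write_text(page, encoding=ENCODING)


def content_type_header() -> str:
    """Value for the HTTP Content-Type response header, using the same encoding."""
    return f"text/html; charset={ENCODING}"


# Hypothetical usage with invented sample text containing curly quotes.
path = Path(tempfile.mkdtemp()) / "testimonials.html"
write_page(path, "\u201cGreat kitchen\u201d")
print(path.read_text(encoding=ENCODING))
print(content_type_header())
```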