Validation error: "Byte-Order Mark found in UTF-8 File" - html

I'm working on a website and, while displaying it on Firefox is fine, on Internet Explorer I've got a lot of problems. I used the W3C validator and I got a lot of strange errors.
Here's the link to the website: http://misenplacecatering.it/
The first validation error, which I think is the most relevant, is this:
Byte-Order Mark found in UTF-8 File. The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported.
and
Line 1, Column 1: Non-space characters found without seeing a doctype first. Expected .
<!DOCTYPE HTML>
I've read other questions about this issue, so I tried to open the file with different editors (I always use Vim, anyway), but I don't see any space or anything else before the doctype definition. I even used Notepad++ and used an option to remove the BOM, but nothing.
How can I fix it?

If using Notepad++, use Convert to UTF-8 without BOM.
If you are using PHP, make sure that any included/required file is in either in ASCII or UTF without a BOM, as PHP doesn't handle non-ASCII file very well (this one gave me a headache once)
You could try converting your files to ASCII, if you don't need UTF characters.
In your <meta charset> attribute, try writing the value within quotes.

The free text editor PSPad has a hex editing mode which is very handy for seeing exactly what you really have in your text files.

Related

How do I get rid of this character ?

This character:

shows up on my site 3 times and for all 3 cases it's shown after a closed div tag. I searched the web and SOF and there are some solutions but none of them worked on mine so decided to post here. I am using .NET. I realize that this is not sufficient info but i am new to programming so not sure what other info you might need. Please let me know. Thanks!
Looks like an byte order mark. Please check your source and output encoding.
Yes it is the Byte Order Mark (BOM). It was driving me crazy too. I researched and started reading about BOM and tried adding charset="UTF-8" to some script tags but no go.
I use Dreamweaver and found that when I saved (save as) some recent html files, the option for "Include Unicode Signature (BOM) was checked. I unchecked and saved and it resolved the unwanted characters (I guess it saves it without the BOM)!!
Updating the meta tags charset to UTF-8 will resolve this too and is recommended (which means dozens of pages for me) but I needed this quick fix.
Also, saving with notepad++ looks to do the trick as well. Here's a related article wrt ++ and settings wrt BOM: notepad++ converting ansi encoded file to utf-8
I hope this help someone!
I use Dreamweaver and found that when I saved (save as) some recent html files, the option for "Include Unicode Signature (BOM) was checked. I unchecked and saved and it resolved the unwanted characters (I guess it saves it without the BOM)!!
This is the perfect solution. its worked for me.
thx everyone

Strange validation behaviour of the charset with two html of the same template

I have got two HTML documents based on the same template. I built both exactly the same and then changed the contents inside the divs. I'm using the DTD HTML 4.01 Transitional and the charset ISO-8859-15 (for Spanish language, you know accents and so on) in a meta tag inside the head.
And when it comes to validation, one parses and the other doesn't, and I can't figure out why.
It complains about some accents in one of the documents that are also present in the other document which gets no complaints.
I find it funny, but there must be a reason.
I think I found the problem, I just opened with the simple notepad the file that was giving me trouble and once opened there I could see that I had strange characters like “ or ‰ in my code. I just removed them and wrote the contents properly and, of course, it passed the parser. I could not see those characters with my file opened from notepad++, that's why the parser error I was getting was so strange to me.
I didn't set the encoding in my Notepad++ to ANSI and maybe that was the reason I couldn't see those odd characters.

How to make the website show signs like "č" and "ć"?

I'm making a website that is in Croatian, and I need to use signs like: "č", "ć", "ž", "đ" and "š". They are currently displayed as little boxes.
Info:
I use Notepad ++.
I set the encoding there to UTF-8.
I put the following line of HTML in: <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
However, it does not work. Even Notepad ++ can't display my characters using UTF-8, so that would suggest that I should probably use something else...
http://webdesign.maratz.com/lab/utf_table/
Use HTML entities, for example
č : č
ž : ž
This sounds more like a font issue than a character encoding issue. If it were a character encoding issue, the characters would most likely be displayed as 2+ ASCII characters. The boxes, however, typically mean the character encoding is correct, but that specific character is not available in the font being used (which is especially common with lesser-used fonts). This would explain why it's behaving incorrectly in both the website and Notepad++.
To fix the issue, simply use a different font in your editor and website.
Note: I recommend a widely used font for the best chance of it working. Specifying a generic name in the website (e.g. serif or sans-serif) will probably have even better results, as the OS/browser would decide on the best font to use.
In short, be consistent about your character encoding throughout.
Configure your editor to save in the encoding you want
If you use any server side programming, make sure it isn't transcoding your data
If you use a database, make sure it is configured to use the same encoding
Configure your server to emit a Content-Type header that specifies that encoding
Use the meta tag in your question
The W3C provides useful material on encodings that starts here.
A useful site for special characters and their ASCII-codes: CopyPaste Character
To 'type' them, use the alt codes.
However, to use them in your site, you'll better use the HTML codes like you can find on CPC
As a test, try this:
<span style="font-family:Arial Unicode MS">
č ć ž đ š
</span>
You should be able to see your characters correctly.
I've just copied and pasted a line from your question along with your meta tag, placed it into a plain text file in vi.
It works just fine - all characters are displayed fine: http://www.dusystems.com/tmp/1.html
If you can't do the same with your editor then the problem is with the editor and not character sets and encodings.
If you're on Windows you can use its built-in Notepad to edit UTF-8 files. Open Notepad, type all of your special characters, add the meta tag. When doing Save As select UTF-8 from the Encoding drop-down in the dialog. Save as something.html and open in IE. It will 100% work.

HTML Encoding Charset Problem I think?

I've been asked to add a testimonial to this page...
http://www.orchardkitchens.com/Showroom/testimonials.html
As you will see there are funny characters showing up all over the place, and it has thrown the structure of the page out.
I've since reloaded the backup and the funny chars are still appearing. Any ideas what I need to do??
Please ask if you need more info from me about the problem in hand.
Many thanks,
ETFairfax.
Looks to me as though some of the text was encoded as UTF-8 yet loaded as if it were an ANSI charset then an HTML encode run over it. Resulting in these extra characters. You will need to find the source text re-build the HTML ensuring whatever is reading the source text understands that its in UTF-8 encoding.
Valid HTML might be a start; a HTML document shouldn't start with a meta tag directly. Also it seems that the charset problem is not with your web page but rather in the backend code. Look at the source, there are numerous things such as
“
appearing which are HTML character entities for things that UTF-8 encoding yields when interpreted as Latin 1. So you should probably fix your code instead of the HTML (well, that too).
Your HTML is syntactically invalid. The <!doctype> is missing, the <html> tag is missing, the <head> tag is missing, the meta information cannot be parsed reasonably by the webbrowser.
Fix your HTML first and then retry.
As to the character encoding story, just ensure that you're using one and same character encoding everywhere. In the datastore, in the source files, in the response headers, etcetera. You may find the introductory text of this article useful to learn a bit more about character encodings. If you actually know/use Java, then you may find the proposed solutions useful as well.

apostrophes coming in as �

I am reading in HTML from a file and displaying it on a web page:
When I look at in the source I see:
The Club’s summer junior programs
but it shows up as:
The Club�s summer junior program
What is happening here and why the � is showing up?
Did you set the proper encoding of the html page?
Read here and here.
I'm guessing you (or someone close to you) is copy/pasting from Word and you are seeing the webby effects of word's [not so] smart quotes. The work around is to set the character encoding to utf-8 or windows-1252.
This is definitely a character encoding issue. It means the page says it has X encoding, but actually it has Y.
A very interesting read by Joel: http://www.joelonsoftware.com/articles/Unicode.html about this topic, definitively a must read if you didn't already read this.
It explains pretty well why these problems occur, how they came to be and how to avoid it :).
May be you have copied text from a work editor, like MS Word, which changes quotes to open quotes and closed quotes characters. When such a text is copied to a text file, it gives these problems.
A simple solution can be to type these quotes again in the text editor.