Why does the W3C validator fail? - html

When I use Chrome, Firefox, or Opera on a desktop computer I have no problem with my website, but in the default Android browser (and in the Google search preview) the right menu does not show up. I checked on the W3C validator website, but for the index page, it says the document cannot be checked:
http://volkangezer.scienceontheweb.net/index.php
For another page:
http://volkangezer.scienceontheweb.net/iletisim.php?dil=en
It shows some errors, but they are probably not the reason for this problem.
My first question is: why can't my index.php page be checked? Both pages have exactly the same encoding and include files.
My second question is: why doesn't the right menu show up?
Thank you.

The validator tells you why it can't check it:
Sorry, I am unable to validate this document because on line 350 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
The error was: utf8 "\xC4" does not map to Unicode
In other words, either your file is screwed up or it is encoded using a character encoding that doesn't match the one it claims to use.
See Character encodings for beginners and the documents it links to for more information on the subject.
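If you want to track down the offending byte yourself, here is a minimal sketch in PHP (assuming the mbstring extension is available; the file name is a placeholder) that reports every line that is not valid UTF-8:

<?php
// Report every line of a file that is not valid UTF-8.
$lines = file('index.php'); // placeholder: the file to check
foreach ($lines as $i => $line) {
    if (!mb_check_encoding($line, 'UTF-8')) {
        // Print the 1-based line number and a hex dump of the raw bytes.
        printf("Line %d is not valid UTF-8: %s\n", $i + 1, bin2hex($line));
    }
}

The hex dump makes it easy to spot stray bytes such as the \xC4 the validator complained about.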

Related

International characters in website file names

I need to create a website (in PHP) that has filenames that include international characters.
For example: transportører.php (notice the 'o' with the diagonal line through it).
So I happily create the file, save it, and upload it to the web server. Whenever I LINK to this file, however, it all goes wrong. I'll have the usual link syntax:
<a href="transportører.php">My Link Text</a>
Upon clicking such a link, the web browser attempts to navigate to a non-existent page:
The requested URL /transportÃ¸rer.php was not found on this server.
Notice how the filename has been mutated? The "ø" character in "transportører.php" has been changed into the bizarre "ø" sequence (that's not a comma after the "Ã", by the way, but an actual component of the symbol itself).
There's obviously some sort of translation going on here, but what, why, and how do I prevent it?
I can think of two possible reasons:
HTML encoding
Possibly the encoding of the HTML file is wrong, so the link actually points to the wrong path. Add
<meta charset="UTF-8">
in the head section of your file.
Server settings
If the server resolves the link wrongly (you can check this by typing the address of your norwegian-named.php into the browser and seeing whether it is replaced), you need to know which server you are using and investigate in that direction. For Apache, How to change the default encoding to UTF-8 for Apache? looks promising.
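As a concrete sketch (assuming your hosting lets you override server config in an .htaccess file), Apache's AddDefaultCharset directive makes the server advertise UTF-8 in its Content-Type headers:

# Advertise UTF-8 in the Content-Type header for text/html and text/plain responses.
AddDefaultCharset UTF-8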
As the URL isn’t percent-encoded in the hyperlink, browsers assume¹ UTF-8 for percent-encoding it, where ø becomes %C3%B8.
However, your server seems to expect/use ISO 8859-1 (instead of UTF-8), where ø becomes %F8.
A quick fix would be to link to the ISO 8859-1 percent-encoded URL:
<a href="transport%F8rer.php">transportører</a>
(A better fix would be to let your server use UTF-8 for everything, and then to use the UTF-8 percent-encoded URL in the hyperlink.)
¹ Either by default, or because the linking page seems to use UTF-8 (at least according to the HTTP header Content-Type: text/html; charset=UTF-8).
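To see the difference concretely, here is a small PHP sketch (assuming the mbstring extension, and that the source file itself is saved as UTF-8) that produces both percent-encoded forms of the file name:

<?php
$name = 'transportører.php';

// UTF-8 percent-encoding: ø becomes %C3%B8.
echo rawurlencode($name), "\n"; // transport%C3%B8rer.php

// ISO 8859-1 percent-encoding: ø becomes %F8.
$latin1 = mb_convert_encoding($name, 'ISO-8859-1', 'UTF-8');
echo rawurlencode($latin1), "\n"; // transport%F8rer.php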
Well, this is embarrassing. Everything was - in actual fact - working correctly. The 404 error made the filename LOOK "wrong" - e.g. transportÃ¸rer.php. However, this is actually correct: that is how the file is referenced "behind the scenes". So to the browser, "transportÃ¸rer.php" is synonymous with "transportører.php".
What was happening was that FileZilla (my FTP client) objects to international characters. It was changing the filename during upload, replacing the international characters with "something else". The filenames LOOKED correct on screen (when I viewed the website folder with Linux Mint's native FTP client), but the underlying character encoding was NOT correct. The web browsers could tell the difference, and hence didn't associate my links with the (mutated) file names, which triggered the 404 error.
The solution in a nutshell: I used Linux Mint's native FTP client to upload my files, overwriting the ones uploaded by FileZilla, and everything just sprang into life.
Thanks to everyone who offered advice... it was all good stuff, just not the solution in this particular case.

Chrome form POST shows "(unable to decode value)" and database stores it as a question mark

I have a test site and test DB, both set to windows-1252. When I type Alt+234 into Chrome it puts this symbol in the field: Ω. And when I submit the form, it posts and stores it as &#937;. I'm assuming this is the browser saying "hey, this isn't in the specified charset, but I do know of an HTML equivalent, so I'll post that instead". Fine. The symbol appears properly after saving; I can save, save, save, and it always appears fine. But if I try the same thing with Alt+230, the browser does not submit its HTML entity value of &#181;. Instead I see "(unable to decode value)" when viewing the POST in the Chrome DevTools window, and it ends up being stored in the database as a question mark.
Why does it treat Alt+234 (Ω) differently than Alt+230 (µ)?
I know I should switch to UTF-8, but I would still like to know why it is functioning this way. Thanks!
Using encodeURIComponent to wrap the value fixed the problem.
Broken:
`?value=${myValue}`
Working:
`?value=${encodeURIComponent(myValue)}`
U+03A9 Ω Greek capital letter omega is not part of Windows code page 1252.
U+00B5 µ Micro sign (which is not the exact same character as Greek mu) is part of 1252 (byte 181).
The Alt+keypad shortcut numbers don't align with code page 1252, or the current ANSI code page in general, so being able to type a character from that shortcut doesn't imply membership of those code pages. Instead they are from DOS code page 437.
And when I submit the form, it posts and stores it as &#937;. I'm assuming this is the browser saying "hey, this isn't in the specified charset, but I do know of an HTML equivalent, so I'll post that instead"
Yes, this is a long-standing weird unrecoverable mangling that HTML5 finally standardised, for when a character is not encodable in the encoding the page has requested.
Instead I see "(unable to decode value)" when viewing the POST in the Chrome DevTool window. And it ends up being stored in the database as a question mark.
The browser will be sending that character as code page 1252 byte 181. The devtools, and whatever your application is, aren't expecting to deal with code page 1252 bytes; they are probably expecting UTF-8. Because byte 181 on its own is not a valid UTF-8 sequence, they can't keep it.
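You can reproduce the difference with a small PHP sketch (assuming the mbstring extension; PHP's default substitute for an unmappable code point is a question mark, which matches what ends up in your database):

<?php
// µ (U+00B5) has a Windows-1252 byte; Ω (U+03A9) does not.
$mu    = mb_convert_encoding("\u{00B5}", 'Windows-1252', 'UTF-8');
$omega = mb_convert_encoding("\u{03A9}", 'Windows-1252', 'UTF-8');

echo bin2hex($mu), "\n";    // b5 -> byte 181, valid in code page 1252
echo bin2hex($omega), "\n"; // 3f -> "?", substituted because there is no mapping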

HTML5 placeholder showing numerical HTML encoding

I have an application on Apache. My Apache is configured with ISO-8859 as the default encoding, and I'm not able to change it because Apache supports other applications that need this.
So, in my application I'm using numerical HTML encoding for special characters, like this: Usu&#225;rio (this is Usuário).
It's working fine, but in placeholder and title (HTML5 attributes) the interface shows &#225; literally instead of á.
Any idea?
Thanks
You could rename your .html file to .php and add the following line at the very top:
<?php header('Content-Type:text/html; charset=UTF-8'); ?>
This will send a response header from the server saying that the content is encoded in UTF-8.
By adding the above code nothing will be broken, and you won't see any difference except for the correct encoding.
In case you need to move the site from one server to another, you can undo those steps and everything will still work as expected.
I tried to reproduce your issue with the given HTML entity, and the placeholder displays the character correctly.
Resolved. I used the Unicode character itself instead of the numerical HTML encoding. Take a look at the UTF-8 encoding table and Unicode characters here.
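For reference, a sketch of that conversion in PHP: html_entity_decode turns the numeric references into real UTF-8 characters before they end up in an attribute value (the input element here is just an illustration):

<?php
// Decode numeric HTML references into real UTF-8 characters,
// so attributes like placeholder and title show "á" instead of "&#225;".
$label = html_entity_decode('Usu&#225;rio', ENT_QUOTES, 'UTF-8');
echo '<input type="text" placeholder="'
   . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '">';
// Output: <input type="text" placeholder="Usuário">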

Internet Explorer Developer Center error: HTML1405: Invalid character: U+0000 NULL. Null characters should not be used

I'm testing a website with Internet Explorer 10 on Windows 8.
I get this error and I don't understand what it means:
HTML1405 "Invalid character: U+0000 NULL. Null characters should not
be used."
Here is the documentation on the Internet Explorer Developer Center for F12 developer tools console error messages, but there are no suggestions for how to fix the errors:
http://msdn.microsoft.com/en-us/library/ie/hh180764%28v=vs.85%29
This error points at the end of the source code of the website, after </html>, but there is nothing there.
The HTML document as sent to the browser contains, in addition to some newline characters, a NUL character (U+0000) after the end tag </html>. I checked this the clumsy way, using Rex Swain's HTTP Viewer to analyze your page with the Display Format option set to Hex. At the end of the result listing, there is the following line:
47FF0: 3C2F68746D6C3E0D 0A300D0A0D0A </html>• •0••••
So it seems that IE 10 is right here and the W3C validator is wrong. I’m not sure exactly how the W3C HTML5 CR defines the characters allowed in HTML source (it seems to say this rather indirectly, via the parsing algorithms), but by XML specs as well as previous HTML specs, NUL is simply disallowed.
In any case, the NUL does no good there and should be removed, but on the other hand it is difficult to see how it could do any actual harm either, especially when it appears after the end tag of the document.
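If you cannot find where the byte comes from, a quick PHP sketch (the file names are placeholders) can confirm whether a file contains stray NUL bytes and write a cleaned copy:

<?php
// Locate NUL bytes in a file and write a copy with them removed.
$data = file_get_contents('page.html');   // placeholder file name
$pos  = strpos($data, "\0");
if ($pos !== false) {
    printf("First NUL byte at offset %d\n", $pos);
    file_put_contents('page-clean.html', str_replace("\0", '', $data));
}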
Try converting your file to UTF-8 without a BOM; I had the same problem, and this worked for me!
My text editor Notepad++ has an option under "Encoding > Encode in UTF-8 without BOM".
See What's different between UTF-8 and UTF-8 without BOM? for more info.

File upload mojibake

How do you do a file upload in an HTML form without running into mojibake?
I have a form that has three fields:
a file field
a required text field
a text field which accepts Japanese characters
I've set up my HTML form with the attribute enctype='multipart/form-data'. But when the form submission fails due to the missing required field, I get redirected to the same page, and my second text field (the one that accepts the Japanese characters) is already mojibaked.
However, if I remove the enctype or change it to anything else, then when the form submission fails I see the Japanese characters as they are (no mojibake). The problem is that if the submission succeeds, I am unable to read the uploaded files.
Any ideas how to fix this?
Mojibake (mangled display of Japanese characters) can have two causes:
The data on the page is in the right character encoding, but the browser does not recognize it.
Some characters on the page use the wrong encoding (the server wrote them in an incorrect encoding).
If the other characters on the page (outside of your form) show correctly, you produced broken output on your server.
If everything is clobbered, and you can fix it by manually setting a different encoding from the browser's menu, then the page encoding is not properly specified.
What kind of Content-Type headers and HTML meta tags do you use?
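For a UTF-8 page, those would typically look something like the following (a generic sketch, not necessarily your setup):

<?php header('Content-Type: text/html; charset=UTF-8'); ?>
<!DOCTYPE html>
<html>
<head>
  <!-- The HTTP header above and this meta tag should agree on the encoding. -->
  <meta charset="UTF-8">
  <title>Form page</title>
</head>
<body>
  <form method="post" enctype="multipart/form-data" accept-charset="UTF-8">
    ...
  </form>
</body>
</html>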
I've figured it out (by reverse-engineering AppFuse (appfuse.org), which does not seem to be affected by mojibake with its file upload form).
I solved it by setting the character encoding to UTF-8 on the server side (with Spring's org.springframework.web.filter.CharacterEncodingFilter). So I guess multipart/form-data really does mess up the character encoding (or at least it does for Java).
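For reference, that filter is typically registered in web.xml along these lines (a sketch; the filter name and URL pattern are up to you):

<!-- Force request encoding to UTF-8 before anything reads the parameters. -->
<filter>
    <filter-name>encodingFilter</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceEncoding</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>encodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>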