HTML5 placeholder showing numerical HTML encoding

I have an application on Apache. My Apache is configured with the default encoding ISO-8859-1, and I'm not able to change it because Apache supports other applications that need this.
So in my application I'm using numerical HTML encoding for special characters, like this: Usu&#225;rio (this is Usuário).
It's working fine, but in placeholder and title (HTML5 attributes), the interface shows &#225; literally instead of á.
Any idea?
Thanks

You could rename your .html file to .php and add the following line at the very top:
<?php header('Content-Type:text/html; charset=UTF-8'); ?>
This will send a response header from the server stating that the content is encoded in UTF-8.
Adding the above code won't break anything, and you won't see any difference except for the correct encoding.
In case you need to move the site from one server to another, you can undo those steps and everything will still work as expected.
I tried to reproduce your issue with the given HTML entity, and the placeholder renders the character correctly.
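
For illustration, a minimal sketch of such a file (the filename and placeholder text are just examples, not from the original question):

<?php header('Content-Type: text/html; charset=UTF-8'); ?>
<!DOCTYPE html>
<html>
<head>
<title>Usu&#225;rio</title>
</head>
<body>
<!-- the numeric reference &#225; renders as "á" in the placeholder -->
<input type="text" placeholder="Usu&#225;rio">
</body>
</html>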

Resolved. I used the Unicode code point instead of numerical HTML encoding. Take a look at the UTF-8 encoding table and Unicode characters here.
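
For reference, a minimal sketch contrasting the two reference styles for á (U+00E1, decimal 225), assuming that is the distinction meant here:

<!-- decimal numeric character reference -->
<input placeholder="Usu&#225;rio">
<!-- hexadecimal (code point) character reference -->
<input placeholder="Usu&#xE1;rio">

Both forms denote the same character, so either should render as "Usuário".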

Related

International characters in website file names

I need to create a website (in PHP) that has filenames that include international characters.
For example: transportører.php (notice the 'o' with the diagonal line through it).
So I happily create the file, save it, and upload it to the web server. Whenever I LINK to this file, however, it all goes wrong. I'll have the usual link syntax:
<a href="transportører.php">My Link Text</a>
Upon clicking such a link, the web browser attempts to navigate to a non-existent page:
The requested URL /transportÃ¸rer.php was not found on this server.
Notice how the filename has been mutated? The "ø" character in "transportører.php" has been changed into the bizarre "Ã¸" symbol (that's not a comma after the "A", by the way, but an actual component of the symbol itself).
There's obviously some sort of translation going on here, but what, why, and how do I prevent it?
I think there are two possible reasons:
html encoding
Possibly the encoding of the HTML file is wrong, so the link is actually pointing to a wrong path. Add
<meta charset="UTF-8">
in the head section of your file.
server settings
If the server is resolving the link wrongly (you can check this by typing the address of your Norwegian-named .php file into the browser and seeing whether it is replaced), you need to know which server you are using and investigate in that direction. For Apache, "How to change the default encoding to UTF-8 for Apache?" looks promising; a minimal sketch follows below.
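
For Apache specifically, the change usually amounts to one directive; a sketch (in httpd.conf or an .htaccess file, assuming AllowOverride permits it):

# advertise UTF-8 in the Content-Type header of text responses
AddDefaultCharset UTF-8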
As the URL isn’t percent-encoded in the hyperlink, browsers assume¹ UTF-8 for percent-encoding it, where ø becomes %C3%B8.
However, your server seems to expect/use ISO 8859-1 (instead of UTF-8), where ø becomes %F8.
A quick fix would be to link to the ISO 8859-1 percent-encoded URL:
<a href="transport%F8rer.php">transportører</a>
(A better fix would be to let your server use UTF-8 for everything, and then to use the UTF-8 percent-encoded URL in the hyperlink.)
¹ Either by default, or because the linking page seems to use UTF-8 (at least according to the HTTP header Content-Type: text/html; charset=UTF-8).
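
To see the two percent-encodings side by side, a small sketch (PHP purely for illustration, not part of the original answer):

<?php
// In a UTF-8 source file, rawurlencode() emits the UTF-8 bytes of "ø"
echo rawurlencode('transportører.php'); // transport%C3%B8rer.php
// Converting to ISO 8859-1 first yields the single byte 0xF8 instead
echo rawurlencode(iconv('UTF-8', 'ISO-8859-1', 'transportører.php')); // transport%F8rer.php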
Well, this is embarrassing. Everything was - in actual fact - working correctly. The 404 error made the filename LOOK "wrong" - e.g. transportÃ¸rer.php. However, this is actually correct. That is how HTML seems to reference the file "behind the scenes". So to the browser, "transportÃ¸rer.php" is synonymous with "transportører.php".
What was happening was that FileZilla (my FTP client) objects to international characters. It was changing the filename during upload, replacing the international characters with "something else". The filenames LOOKED correct on screen (when I viewed the website folder with Linux Mint's native FTP client), but the underlying character encoding was NOT correct. The web browsers could tell the difference, and hence didn't associate my links with the (mutated) filenames, triggering a 404 error.
The solution in a nutshell: I used Linux Mint's native FTP to upload my files, overwriting the ones uploaded by FileZilla, and everything just sprang into life.
Thanks to everyone who offered advice... it was all good stuff, just not the solution in this particular case.

Why does the W3 validator fail?

When I use Chrome, Firefox or Opera on a desktop computer, I have no problem with my website; but when I use the default Android browser (also in the Google search preview), the right menu does not show up. I checked on the W3 validator website, but it says my index page cannot be checked:
http://volkangezer.scienceontheweb.net/index.php
For another page:
http://volkangezer.scienceontheweb.net/iletisim.php?dil=en
It shows some errors, but they are probably not the reason for this problem.
My first question is: why can't my index.php page be checked? Both pages have exactly the same encoding and include files.
My second question is: why does the right menu not show up?
Thank you.
The validator tells you why it can't check it:
Sorry, I am unable to validate this document because on line 350 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
The error was: utf8 "\xC4" does not map to Unicode
In other words, either your file is screwed up or it is encoded using a character encoding that doesn't match the one it claims to use.
See Character encodings for beginners and the documents it links to for more information on the subject.
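
If you want to hunt down the offending line yourself, a minimal sketch (PHP; the filename is a hypothetical local copy of the page):

<?php
// Report each line whose bytes are not valid UTF-8, which is
// essentially what the validator is complaining about.
$lineNo = 0;
foreach (file('index.php') as $line) { // hypothetical filename
    $lineNo++;
    if (!mb_check_encoding($line, 'UTF-8')) {
        echo "Line $lineNo contains bytes that are not valid UTF-8\n";
    }
}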

Displaying unicode symbols in HTML

I want to simply display the tick (✔) and cross (✘) symbols in an HTML page, but they show up as either a box or goop such as "âœ”" - obviously something to do with the encoding.
I have set the meta tag to declare UTF-8, but obviously I'm missing something.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Edit/Solution: From comments made, using Firebug I found that the headers being passed by my page were in fact "Content-Type: text/html" and not UTF-8. Looking at the file format using Notepad++ showed my file was saved as "UTF-8 without BOM". Changing this to just UTF-8, the symbols now show correctly... but Firebug still seems to indicate the same content-type.
You should ensure the HTTP server headers are correct.
In particular, the header:
Content-Type: text/html; charset=utf-8
should be present.
The meta tag is ignored by browsers if the HTTP header is present.
Also ensure that your file is actually encoded as UTF-8 before serving it, check/try the following:
Ensure your editor saves it as UTF-8.
Ensure your FTP or any file transfer program does not mess with the file.
Try with HTML-encoded entities, like &#nnn;.
To be really sure, hexdump the file and look at the bytes of the character: for ✔, they should be E2 9C 94 (see the sketch after the note below).
Note: If you use a Unicode character for which your system can't find a glyph (no font with that character), your browser should display a question mark or some block-like symbol. But if you see multiple roman characters like you do, this denotes an encoding problem.
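
A quick way to double-check those bytes is a hex dump; a PHP sketch (purely illustrative, not from the original answer):

<?php
// If this source file itself is saved as UTF-8, the check mark is three bytes
echo bin2hex('✔'); // prints "e29c94", i.e. E2 9C 94
// Decoding those same three bytes as Windows-1252 is what produces goop like "âœ”"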
I know an answer has already been accepted, but I wanted to point out a few things.
Setting the content-type and charset is obviously good practice, and doing it on the server is much better, because it ensures consistency across your application.
However, I would use UTF-8 only when the language of my application uses a lot of characters that are available only in the UTF-8 charset. If you just want to show a Unicode character or symbol in a few places, you can do so without changing the charset of your page.
HTML renderers have always been able to display symbols which are not part of the encoding character set of the page, as long as you write the symbol as its numeric character reference (NCR). Sounds weird, but it's true.
So, even if your HTML has a header that states an encoding of ANSI or any of the ISO charsets, you can display a check mark by using its HTML character reference: in decimal, &#10003;, or in hex, &#x2713;.
So it's a little difficult to understand why you are facing this issue on your pages. Can you check whether the NCR value is correct? This is a good reference: http://www.fileformat.info/info/unicode/char/2713/index.htm
Make sure that you actually save the file as UTF-8; alternatively, use HTML entities (&#nnn;) for the special characters.
Contrary to what Nicolas proposed, the meta tag isn't actually ignored by browsers. However, the Content-Type HTTP header always takes precedence over the presence of a meta tag in the document.
So make sure that you either send the correct encoding via the HTTP header, or don’t send this HTTP header at all (not recommended). The meta tag is mainly a fallback option for local documents which aren’t sent via HTTP traffic.
Using HTML entities should also be considered a workaround – that’s tiptoeing around the real problem. Configuring the web server properly prevents a lot of nuisance.
I think this is a file problem: you simply saved your file in a one-byte encoding like Latin-1. Google your editor and how to set files to UTF-8.
I wonder why there are editors that don't default to utf-8.

HTML encoding issues - "Â" character showing up instead of "&nbsp;"

I've got a legacy app that has just started misbehaving, for reasons I'm not sure of. It generates a bunch of HTML that gets turned into PDF reports by ActivePDF.
The process works like this:
Pull an HTML template from a DB with tokens in it to be replaced (e.g. "~CompanyName~", "~CustomerName~", etc.)
Replace the tokens with real data
Tidy the HTML with a simple regex function that properly formats HTML tag attribute values (ensures quotation marks, etc., since ActivePDF's rendering engine hates anything but single quotes around attribute values)
Send off the HTML to a web service that creates the PDF.
Somewhere in that mess, the non-breaking spaces from the HTML template (the &nbsp;s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character when viewing the document in a browser (Firefox). ActivePDF pukes on these non-UTF-8 characters.
My question: since I don't know where the problem stems from and don't have time to investigate it, is there an easy way to re-encode or find-and-replace the bad characters? I've tried sending it through this little function I threw together, but it doesn't change anything.
Private Shared Function ConvertToUTF8(ByVal html As String) As String
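    ' Note: this merely round-trips the same characters (string -> Latin-1
    ' bytes -> UTF-8 bytes -> string), so for any input representable in
    ' Latin-1 the returned string is identical to the input.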
    Dim isoEncoding As Encoding = Encoding.GetEncoding("iso-8859-1")
    Dim source As Byte() = isoEncoding.GetBytes(html)
    Return Encoding.UTF8.GetString(Encoding.Convert(isoEncoding, Encoding.UTF8, source))
End Function
Any ideas?
EDIT:
I'm getting by with this for now, though it hardly seems like a good solution:
Private Shared Function ReplaceNonASCIIChars(ByVal html As String) As String
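    ' Replaces every character outside the ASCII range with a plain space.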
    Return Regex.Replace(html, "[^\u0000-\u007F]", " ")
End Function
Somewhere in that mess, the non-breaking spaces from the HTML template (the &nbsp;s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character
That'd be encoding to UTF-8 then, not ISO-8859-1. The non-breaking space character is byte 0xA0 in ISO-8859-1; when encoded to UTF-8 it'd be 0xC2,0xA0, which, if you (incorrectly) view it as ISO-8859-1, comes out as "Â ". That includes a trailing nbsp which you might not be noticing; if that byte isn't there, then something else has mauled your document and we need to see further up to find out what.
What's the regexp, and how does the templating work? There would seem to be a proper HTML parser involved somewhere if your &nbsp; strings are (correctly) being turned into U+00A0 NON-BREAKING SPACE characters. If so, you could just process your template natively in the DOM and ask it to serialise using the ASCII encoding to keep non-ASCII characters as character references. That would also stop you having to do regex post-processing on the HTML itself, which is always a highly dodgy business.
Well anyway, for now you can add one of the following to your document's <head> and see if that makes it look right in the browser:
for HTML4: <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
for HTML5: <meta charset="utf-8">
If you've done that, then any remaining problem is ActivePDF's fault.
If anyone has the same problem as me and the charset was already correct, simply do this:
Copy all the code inside the .html file.
Open notepad (or any basic text editor) and paste the code.
Go "File -> Save As"
Enter your file name "example.html" (select "Save as type: All Files (*.*)")
Select Encoding as UTF-8
Hit Save; you can now delete your old .html file, and the encoding should be fixed
Problem:
I was facing a problem where we were sending '£' with some string in a POST request to the CRM system, but when we did the GET call from the CRM, it returned 'Â£' with the string content. So what we analysed is that '£' was getting converted to 'Â£'.
Analysis:
The glitch we found after doing some research is that in the POST call we had set the HttpWebRequest ContentType to "text/xml", while in the GET call it was "text/xml; charset=utf-8".
Solution:
So as part of the solution we included charset=utf-8 in the POST request's ContentType as well, and it works.
In my case this character (Â, an "a" with a caret) occurred in code I generated from Visual Studio using my own tool for generating code. It was easy to solve:
Select one of the suspect single spaces in the document. You should be able to see lots of single spaces that look different from the other single spaces; they are the ones responsible for the unwanted characters in the browser. Use Find and Replace to replace them with a regular single space. Done.
PS: It's easier to see all the similar characters when you place the cursor on one, or if you select one, in VS2017+; I hope other IDEs have similar features.
In my case I was getting a Latin cross sign instead of an nbsp, even though the page was correctly encoded in UTF-8. None of the above helped in resolving the issue, and I tried it all.
In the end, changing the font for IE (with browser-specific CSS) helped. I was using Helvetica Neue as the body font; changing to Arial resolved the issue.
I was having the same sort of problem. Apparently it's simply because PHP doesn't recognise utf-8.
I was tearing my hair out at first when a '£' sign kept showing up as 'Â£', despite it appearing OK in Dreamweaver. Eventually I remembered I had been having problems with links relative to the index file, when the pages, if viewed directly, would work with slideshows but not when used with an include (but that's beside the point). Anyway, I wondered if this might be a similar problem, so instead of putting the charset declaration into the page I was having problems with, I simply put it into the index.php file - problem fixed throughout.
The reason for this is PHP doesn't recognise utf-8.
Here you can look up all the special characters in HTML:
http://www.degraeve.com/reference/specialcharacters.php
Well, I ran into this issue on a few of my websites too, and all I needed to do was customize the content filter for HTML entities. Before that, the more of them I deleted, the more appeared; so I just changed the HTML filter or parsing function for the page, and it worked. It's mainly due to the HTML editors in most CMSs: the way they store and parse the data caused this issue (in my case). Maybe this will help in your case too.

File upload mojibake

How do you do a file upload in an HTML form without running into mojibake?
I have a form that has three fields:
a file field
a required text field
a text field which accepts Japanese characters
I've set up my HTML form with the attribute enctype='multipart/form-data'. But when the form submission fails due to the missing required field, I get redirected to the same page, and my second text field (the one that accepts Japanese characters) is already mojibaked.
However, if I remove the enctype or change it to anything else, then when the form submission fails, I see the Japanese characters as they are (no mojibake). The problem is that if the submission succeeds, I am unable to read the uploaded files.
Any ideas how to fix this?
Mojibake (mangled display of Japanese characters) can have two causes:
The data on the page is in the right character encoding, but the browser does not recognize it.
Some characters on the page use the wrong encoding (the server wrote them in an incorrect encoding).
If the other characters on the page (outside of your form) show correctly, you produced broken output on your server.
If everything is clobbered, and you can fix it by manually setting a different encoding from the browser's menu, then the page encoding is not properly specified.
What kind of content-type headers and HTML meta tags do you use?
I've figured it out (by reverse-engineering AppFuse (appfuse.org), which does not seem to be affected by mojibake in its file upload form).
I solved it by setting the charset encoding to UTF-8 on the server side (with Spring's org.springframework.web.filter.CharacterEncodingFilter). Thus, I guess multipart/form-data really does screw up the character encoding (or at least for Java).
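
For reference, a typical web.xml registration of that filter (a sketch assuming a standard servlet deployment descriptor; the filter name is arbitrary):

<filter>
    <filter-name>encodingFilter</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceEncoding</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>encodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>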