Does the character encoding of an HTML page affect the character encoding of characters submitted via an input field (i.e. what a user types and what gets submitted), or does it only affect how the page is displayed?
Yes. Browsers encode form data in the encoding of the page containing the form.
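For illustration, here is a minimal sketch (the form target and field name are borrowed from a question further down); because the page is declared and saved as UTF-8, the browser percent-encodes the submitted value as UTF-8 bytes:

    <!DOCTYPE html>
    <html>
    <head>
      <!-- Page saved and served as UTF-8 -->
      <meta charset="UTF-8">
    </head>
    <body>
      <!-- Typing "â" and submitting sends search=%C3%A2 (UTF-8 bytes);
           on a Latin-1 page the same input would be sent as search=%E2 -->
      <form action="formation.php" method="get">
        <input type="text" name="search">
        <input type="submit" value="Search">
      </form>
    </body>
    </html>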
Let's suppose that a character encoding called X exists (for example, UTF-8). If I insert the tag <meta charset="X"> in the HTML file and then save the file, obviously, with the same encoding, how can the browser read the file later?
I mean, how can the browser know the encoding of an HTML page if, to get the encoding, it must read the file? It seems like a sort of loop.
According to https://www.w3.org/TR/html4/charset.html#h-5.2.2, a browser gets the correct encoding from the Content-Type header field of the HTTP response. If this field is not present, the browser reads the HTML page until the META tag, assuming all bytes were ASCII characters. So this only works if ASCII is a subset of the actual encoding.
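As a concrete sketch (assuming a PHP page, but only the header value matters), the two places the encoding can be declared look like this:

    <?php
    // Preferred: declare the encoding in the HTTP response header.
    header('Content-Type: text/html; charset=UTF-8');
    ?>
    <!DOCTYPE html>
    <html>
    <head>
      <!-- Fallback: declare it in the markup; every byte before this tag
           must be readable as ASCII for the browser to find it -->
      <meta charset="UTF-8">
      <title>Example</title>
    </head>
    <body>...</body>
    </html>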
I have a normal HTML form whose action points to http://another-site.com.
My website (http://my-site.com) is encoded with UTF-8, but another-site is encoded with GBK.
The problem is that when I submit my form from my-site.com, the page forwards to another-site.com, which, as I mentioned, is encoded with GBK, and the page's characters are totally messy.
Is it my problem? How do I tell the browser to use GBK for another-site.com?
NOTE: Both another-site.com and my-site.com set the Content-Type header with their respective encodings.
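One approach, based on the accept-charset attribute discussed in a later answer below, would be a sketch like the following (the action path and field name are made up, and browser support for overriding the page encoding this way is not entirely consistent):

    <!-- Form on the UTF-8 page at my-site.com, asking the browser
         to submit the fields as GBK for the receiving site -->
    <form action="http://another-site.com/search" method="get"
          accept-charset="GBK">
      <input type="text" name="q">
      <input type="submit">
    </form>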
I have a form that doesn't work properly when I input something with an accent.
If I input "bâtiment", for instance, in the form, I'm sent to
formation.php?search=b%E2timent, instead of formation.php?search=bâtiment
What could cause that?
EDIT
I have another form that sends me correctly to something.php?search=bâtiment, with the accent in the URL...
%E2 is how you represent â in a URL when the bytes are percent-encoded as Latin-1.
It will be decoded automatically in $_GET['search'].
You can convert it back on the far end using $search = urldecode($_REQUEST['search']); (note the PHP function is urldecode, not url_decode). The URL specs say you can't use accented characters as valid URI characters, so they are URL-encoded on the fly for you.
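A minimal sketch of what that looks like on the PHP side (the field name search comes from the URL above; the Latin-1 assumption matches the %E2 byte):

    <?php
    // formation.php -- PHP has already percent-decoded the query string,
    // so $_GET['search'] holds the raw submitted bytes ("b\xE2timent" here).
    $search = $_GET['search'];

    // urldecode() does the same conversion manually if you ever receive
    // the still-encoded form of the string from somewhere else:
    $decoded = urldecode('b%E2timent');   // also "b\xE2timent"

    // To work with UTF-8 internally, convert the Latin-1 bytes explicitly:
    $utf8 = mb_convert_encoding($search, 'UTF-8', 'ISO-8859-1');
    ?>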
Check the page encoding.
Are you sure which encoding you are using?
%E2 is the Latin-1 (and Windows cp1252) encoding of â, which is what a lot of code actually uses.
If the page were UTF-8 instead, you would see %C3%A2 (the UTF-8 encoding of â) in the URL.
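A small PHP sketch, purely illustrative, of how the same character turns into different bytes in the URL depending on the encoding it is sent in:

    <?php
    // "â" in this source file is stored as UTF-8.
    echo rawurlencode('â'); // %C3%A2

    // Re-encode the same character as Latin-1/cp1252 before escaping:
    echo rawurlencode(mb_convert_encoding('â', 'ISO-8859-1', 'UTF-8')); // %E2
    ?>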
For example, Chinese text (GB2312) is pasted into a text box (or text area) of an HTML page and the form is posted. On the server side, is there any means by which this character set can be detected?
How would this detection behave if text belonging to different character sets is pasted into the same text box?
You need to tell the browser what encoding to use by adding an accept-charset="UTF-8" (or similar) attribute to the form. Apparently this defaults to the character set of the page, but I wouldn't count on that. The browser won't tell you what encoding it used when it submits the form, so you need to assume it used the one you told it to.
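A minimal sketch of that attribute (the action URL and field name are placeholders):

    <!-- Ask the browser to encode the submitted fields as UTF-8,
         regardless of the encoding the page itself was served in -->
    <form action="/submit" method="post" accept-charset="UTF-8">
      <textarea name="text"></textarea>
      <input type="submit">
    </form>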
The web browser should send a Content-Type, including the encoding, when it posts the data.
I find it helpful to think of text as "just text" (without any particular encoding) until an encoding is required. So the browser shouldn't care what encoding (if any) was used to originally produce the text (e.g. if it was copied and pasted from a file, the file's encoding is irrelevant). It decides what encoding to use when posting it to the server, obviously making sure that it's an encoding which covers all the characters it needs to send.
If you use PHP on the server, you can use mb_detect_encoding().
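A hedged sketch of what that could look like (the field name is a placeholder, and detection is heuristic, so it can guess wrong, especially on short strings):

    <?php
    // Try a few likely encodings in order; returns the first match or false.
    $raw = $_POST['text'] ?? '';
    $encoding = mb_detect_encoding($raw, ['UTF-8', 'GB2312', 'ISO-8859-1'], true);

    if ($encoding !== false && $encoding !== 'UTF-8') {
        // Normalize everything to UTF-8 for further processing.
        $raw = mb_convert_encoding($raw, 'UTF-8', $encoding);
    }
    ?>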
How do you do a file upload in an HTML form without running into mojibake?
I have a form that has three fields:
a file field
a required text field
a text field which accepts Japanese characters
I've set up my HTML form with the attribute enctype='multipart/form-data'. But when the form submission fails due to the missing required field, I get redirected to the same page, and my second text field (the one that accepts the Japanese characters) has already turned into mojibake.
However, if I remove the enctype or change it to anything else, then when the form submission fails I see the Japanese characters as they are (no mojibake). The problem is that if the submission succeeds, I am unable to read the uploaded files.
Any ideas how to fix this?
Mojibake (mangled display of Japanese characters) can have two causes:
The data on the page is in the right character encoding, but the browser does not recognize it.
Some characters on the page use the wrong encoding (the server wrote them in an incorrect encoding).
If the other characters on the page (outside of your form) show correctly, you produced broken output on your server.
If everything is clobbered, and you can fix it by manually setting a different encoding from the browser's menu, then the page encoding is not properly specified.
What kind of content-type headers and HTML meta tags do you use?
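For comparison, a sketch of what consistent declarations could look like for an upload form like the one in the question (field names are placeholders; the actual fix reported below is server-side):

    <!-- Served with the header: Content-Type: text/html; charset=UTF-8 -->
    <meta charset="UTF-8">

    <!-- Hypothetical upload form; accept-charset restates the page encoding -->
    <form action="/upload" method="post"
          enctype="multipart/form-data" accept-charset="UTF-8">
      <input type="file" name="attachment">
      <input type="text" name="title" required>
      <input type="text" name="description_ja">
      <input type="submit">
    </form>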
I've figured it out by reverse-engineering AppFuse (appfuse.org), which does not seem to be affected by mojibake in its file upload form.
It solves the problem by setting the character encoding to UTF-8 on the server side (with Spring's org.springframework.web.filter.CharacterEncodingFilter). So I guess multipart/form-data really does mess up the character encoding (or at least it does in Java).
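For reference, a typical way to register that filter is in web.xml; a sketch (the filter name and URL pattern are the usual conventions, not anything mandated):

    <filter>
      <filter-name>encodingFilter</filter-name>
      <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
      <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
      </init-param>
      <init-param>
        <!-- Also force the encoding onto the response, not just the request -->
        <param-name>forceEncoding</param-name>
        <param-value>true</param-value>
      </init-param>
    </filter>
    <filter-mapping>
      <!-- Map it before anything that reads request parameters -->
      <filter-name>encodingFilter</filter-name>
      <url-pattern>/*</url-pattern>
    </filter-mapping>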