I have a normal HTML form whose action points to http://another-site.com.
My website (http://my-site.com) is encoded in UTF-8, but another-site.com is encoded in GBK.
The problem is that when I submit the form from my-site.com and the browser moves on to another-site.com, which is encoded in GBK as I mentioned, the characters on the page are garbled.
Is the problem on my side? How do I tell the browser to use GBK for another-site.com?
NOTE: Both another-site.com and my-site.com set a Content-Type with their respective encodings.
Related
Let's suppose that a character encoding called X exists (for example UTF-8). If I put the tag <meta charset="X"> in the HTML file and then save the file, obviously with that same encoding, how can the browser read the file later?
I mean, how can the browser know the encoding of an HTML page if, to get the encoding, it must first read the file? It seems like a sort of loop.
According to https://www.w3.org/TR/html4/charset.html#h-5.2.2, a browser gets the correct encoding from the Content-Type header field of the HTTP response. If this field is not present, the browser reads the HTML page until the META tag, assuming all bytes were ASCII characters. So this only works if ASCII is a subset of the actual encoding.
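To make the "read the bytes as ASCII until the META tag" step concrete, here is a rough Python sketch. The helper name `sniff_meta_charset`, the regular expression, and the 1024-byte scan limit are all assumptions made for illustration; real browsers use a more elaborate prescan.

```python
import re

# Hypothetical pattern: finds charset=... inside a <meta ...> tag,
# matching on raw bytes as if they were ASCII.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)

def sniff_meta_charset(raw: bytes, limit: int = 1024):
    """Return the charset declared in a meta tag, or None if none is found.

    Only the first `limit` bytes are scanned (an assumed cut-off).
    """
    match = META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii").lower() if match else None

html = '<!DOCTYPE html><html><head><meta charset="UTF-8"><title>中文</title></head></html>'

# UTF-8 and GBK both keep ASCII characters as single ASCII bytes,
# so the declaration can be found before the real encoding is known.
print(sniff_meta_charset(html.encode("utf-8")))   # utf-8
print(sniff_meta_charset(html.encode("gbk")))     # utf-8 (the declared value, even though it is wrong here)

# UTF-16 is not ASCII-compatible: every character takes two bytes,
# so the ASCII scan never sees "<meta".
print(sniff_meta_charset(html.encode("utf-16")))  # None
```

The second call also shows why the declaration has to match how the file was actually saved: the scan only reports what the tag claims.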
You can send the charset in the HTTP response headers, and you can also declare a charset in the HTML file itself.
What happens if these two charsets differ? Which one does the browser use, and when does the charset declared in the HTML file actually matter?
The HTML 4.01 specification clearly says, in 5.2.2 Specifying the character encoding, that information in an HTTP header has precedence over a meta tag. HTML5 PR does not change this, but it adds, reflecting browser practice, in 8.2.2.2 Determining the character encoding that both of them are overridden by a Byte Order Mark (BOM) at the start of the HTML document (so if you have saved your .html file with “Save as UTF-8 with BOM”, it will be treated as UTF-8 no matter what).
A meta tag that specifies character encoding takes effect if the information is not provided in an HTTP header or with a BOM. A server might not include charset parameter in the Content-Type header, or the HTML document might be opened locally so that there are no HTTP headers at all. When a user saves an HTML document in his own device, the HTTP headers are not saved. This is the main reason for using a meta tag to specify character encoding; but it should then specify the correct encoding of course.
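The precedence described above (BOM first, then the HTTP header, then the meta tag) can be sketched in Python roughly as follows. The function `pick_encoding` is hypothetical, it only recognises the UTF-8 BOM, and it returns None instead of modelling a browser's fallback guess.

```python
import codecs
import re
from email.message import Message

# Hypothetical pattern for a charset declaration inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)

def pick_encoding(body: bytes, content_type=None):
    """Choose an encoding in the order: BOM, HTTP header, meta tag."""
    # 1. A UTF-8 byte order mark overrides everything else.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # 2. The charset parameter of the Content-Type header, if the server sent one.
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        charset = header.get_content_charset()
        if charset:
            return charset
    # 3. The meta tag, which is all there is for a locally saved file.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return None

page = b'<html><head><meta charset="gbk"></head></html>'

print(pick_encoding(page))                                    # gbk
print(pick_encoding(page, "text/html"))                       # gbk (header has no charset)
print(pick_encoding(page, "text/html; charset=ISO-8859-1"))   # iso-8859-1
print(pick_encoding(codecs.BOM_UTF8 + page,
                    "text/html; charset=ISO-8859-1"))         # utf-8
```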
I'm having some trouble getting a special character properly encoded.
® keeps coming through instead of the registered trademark symbol. I've tried changing the meta tag to UTF-8 and Windows-1252, but it still comes through in the encoded format? Can I add a meta tag to fix this?
Make sure to save your file with the proper encoding:
[Screenshot: the same file saved with Windows-1252 encoding (left) and with UTF-8 encoding (right).]
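What goes wrong when the saved encoding and the declared encoding disagree can be reproduced in a few lines of Python. The helper `misread` is a made-up name for this illustration.

```python
def misread(text: str, saved_as: str, read_as: str) -> str:
    """Encode text the way the editor saved it, then decode it the way the browser was told to."""
    return text.encode(saved_as).decode(read_as, errors="replace")

symbol = "\u00ae"  # the registered trademark sign

# Saved as UTF-8 (bytes C2 AE) but read as Windows-1252: two characters appear.
print(misread(symbol, "utf-8", "cp1252"))   # Â®

# Saved as Windows-1252 (byte AE) but read as UTF-8: the lone byte is invalid,
# so the decoder substitutes U+FFFD, the replacement character.
print(misread(symbol, "cp1252", "utf-8"))

# Saved and read with the same encoding: the symbol survives.
print(misread(symbol, "utf-8", "utf-8"))    # ®
```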
HTML options
For such characters, encoding with ISO-8859-1 might work too, but UTF-8 is strongly encouraged.
Make sure your DOCTYPE is clearly defined: <!DOCTYPE HTML>.
Make sure your meta tag is written properly: <meta charset="UTF-8">.
PHP options
If you use PHP within your page, add the following at the beginning of the page:
<?php header('Content-Type: text/html; charset=utf-8'); ?>
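If the content is output from a database and is stored as ISO-8859-1, you might want to use utf8_encode() to convert it to UTF-8. Note that it only handles ISO-8859-1 input and is deprecated as of PHP 8.2; mb_convert_encoding() can convert from other source encodings.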
utf8_encode()
Encodes an ISO-8859-1 string to UTF-8
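As a rough Python analogue of that conversion (not the PHP function itself; `latin1_to_utf8` is a name invented here), the sketch below also shows why it should only be applied to data that really is ISO-8859-1.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret ISO-8859-1 bytes as text and re-encode that text as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")

# A registered trademark sign stored as ISO-8859-1 is the single byte AE.
print(latin1_to_utf8(b"\xae"))            # b'\xc2\xae'

# If the data was already UTF-8, converting again double-encodes it:
# the result displays as "Â®" when read as UTF-8.
already_utf8 = "\u00ae".encode("utf-8")
print(latin1_to_utf8(already_utf8))       # b'\xc3\x82\xc2\xae'
```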
The information about encoding should correspond to the actual encoding. So instead of making guesses and trial and error, find out what the encoding really is. It seems to be UTF-8, and if declaring UTF-8 in a meta tag does not help, the probable culprit is an HTTP header that the server sends and that declares a different encoding, trumping the meta tag. Use e.g. an HTTP header viewer to check out the situation.
If the server announces iso-8859-1 or windows-1252 and if you cannot change this, then you just have to use that encoding instead of UTF-8. Then save the page in your authoring program as windows-1252 encoded.
Does the character encoding of an html page affect the character encoding of characters submitted via an input field (i.e. what a user is typing/what gets submitted), or is it the encoding of the display of the page only?
Yes. Browsers encode form data in the encoding of the page containing the form.
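A small Python sketch of what that means for the bytes on the wire, using the standard library's urlencode as a stand-in for the browser. The helper `encode_form` and the field name `q` are made up for the example.

```python
from urllib.parse import parse_qs, urlencode

def encode_form(fields: dict, page_encoding: str) -> str:
    """Percent-encode form fields using the encoding of the page that holds the form."""
    return urlencode(fields, encoding=page_encoding)

fields = {"q": "中文"}

# The same text produces different bytes depending on the page encoding.
print(encode_form(fields, "utf-8"))  # q=%E4%B8%AD%E6%96%87
print(encode_form(fields, "gbk"))    # q=%D6%D0%CE%C4

# A receiver that assumes GBK cannot recover text submitted from a UTF-8 page.
received = parse_qs(encode_form(fields, "utf-8"), encoding="gbk")["q"][0]
print(received == "中文")             # False
```

This is the situation in the first question: a UTF-8 page posting to a site that expects GBK.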
I have a form that doesn't work properly when I input something with an accent.
If I input "bâtiment", for instance, in the form, I'm sent to
formation.php?search=b%E2timent, instead of formation.php?search=bâtiment
What could cause that ?
EDIT
I have another form that sends me correctly to something.php?search=bâtiment, with the accent in the URL...
%E2 is how you represent â in a URL.
It will be decoded automatically in $_GET['search']
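If you are working with a raw query string, you can convert it back on the far end with urldecode() (the function name has no underscore). The values in $_GET and $_REQUEST are already decoded, so calling urldecode() on them again is unnecessary and can give unexpected results. The URL specs do not allow accented characters as literal URI characters, so they are URL-encoded on the fly for you.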
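For illustration, here is the same decoding step in Python rather than PHP. The helper `read_search` is hypothetical; the point is that the percent-decoded bytes only become the right text if the receiver assumes the encoding the form was submitted in.

```python
from urllib.parse import parse_qs

def read_search(query: str, encoding: str) -> str:
    """Decode the 'search' parameter of a query string using the given encoding."""
    return parse_qs(query, encoding=encoding)["search"][0]

# %E2 is a single byte; read as ISO-8859-1 it is the letter â.
print(read_search("search=b%E2timent", "iso-8859-1"))  # bâtiment

# Read as UTF-8 the lone E2 byte is invalid, so it becomes U+FFFD.
print(read_search("search=b%E2timent", "utf-8"))

# The UTF-8 form of the same word uses two bytes for â.
print(read_search("search=b%C3%A2timent", "utf-8"))    # bâtiment
```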
Check the page encoding.
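Are you sure which encoding you are using?
%E2 is the Latin-1 (ISO-8859-1) encoding of â; Windows-1252 (cp1252), which many systems use instead, has the same byte for this character.
In UTF-8, â is %C3%A2. (%C6%92 is the UTF-8 encoding of ƒ, which is %83 in cp1252, not of â.) Try the URL with %C3%A2 to see whether the receiving page expects UTF-8.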
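One way to check which percent-encoded form a character takes under each candidate encoding is a quick Python sketch like this one. `percent_forms` is an invented helper, and the three encodings listed are just the ones discussed in this thread.

```python
from urllib.parse import quote

def percent_forms(char: str, encodings=("iso-8859-1", "cp1252", "utf-8")) -> dict:
    """Map each encoding to the percent-encoded form of char, or None if it cannot be encoded."""
    forms = {}
    for enc in encodings:
        try:
            forms[enc] = quote(char, safe="", encoding=enc)
        except UnicodeEncodeError:
            forms[enc] = None  # the character does not exist in this encoding
    return forms

print(percent_forms("â"))
# {'iso-8859-1': '%E2', 'cp1252': '%E2', 'utf-8': '%C3%A2'}

print(percent_forms("ƒ"))
# {'iso-8859-1': None, 'cp1252': '%83', 'utf-8': '%C6%92'}
```

Comparing the URL the browser actually produced against this output shows which encoding the page containing the form was served in.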