Should I still be using <meta charset="UTF-8"> in 2022? - html

<meta charset="UTF-8">
UTF-8 is the default encoding in modern browsers, so all this code does is add support for browsers that don't do this automatically. I don't plan on supporting older browsers. Is there any other reason to add this line of code?
I have heard others say that leaving it out could lead to cross-site scripting attacks and other bad things, but they never gave me any clear examples.
Also, some old HTML validator throws an error when <meta charset="UTF-8"> is left out.
https://validator.w3.org/nu/#file
The character encoding was not declared
Then it does this
process with windows-1252.
This isn't great, because it could lead to errors if the site has characters that windows-1252 doesn't support.
I'm guessing this only happens in browsers that don't default to UTF-8, though. Should I be worried about this warning/error if I leave the tag out?
I have tried to research why UTF-8 is used, but I can't find a definite answer on why to use it or not.
Thanks in advance.

For anyone looking back on this post: I did some more research and found more reasons to include this line of code. I found this page from the Google devs, which states the following:
https://web.dev/charset/#resources
Lighthouse flags pages that do not specify their character encoding:
So, not including <meta charset="UTF-8"> would make your Google Lighthouse score lower.
Here is why it is considered a best practice
Servers and browsers communicate with each other by sending bytes of data over the internet. If the server doesn't specify which character encoding format it's using when it sends an HTML file, the browser won't know what character each byte represents. The character encoding declaration specification solves this problem.
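To make the quoted point concrete, here is a minimal Python sketch (not taken from the web.dev page) that reads the same bytes under two different encodings. The sample string is made up for illustration.

```python
# The same bytes produce different text depending on the encoding assumed.
text = "café"
raw = text.encode("utf-8")               # the bytes a server might send

as_utf8 = raw.decode("utf-8")            # correct guess
as_cp1252 = raw.decode("windows-1252")   # wrong guess: classic mojibake

print(as_utf8)    # café
print(as_cp1252)  # cafÃ©
```

Without a declaration, the receiving side has to guess, and a wrong guess produces the second result.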
Here is the lighthouse doc
From my understanding, this line of code isn't strictly necessary anymore, but it is considered a best practice. I hope this helps anyone reading this.

Related

How come the following characters are displayed in ISO-8859-1?

I have the following html:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
</head>
<body>
会意字 / 會意字 huìyìzì
</body>
When I run it in Firefox, it displays the Chinese characters just fine. How come it works with the ISO-8859-1 character set? I thought you needed UTF-8?
I can't reproduce your successful rendering:
… but HTML 5 defines a fairly complex character encoding detection method which doesn't pay any attention to <meta> until step 9.
In general, you should avoid encodings other than UTF-8 and definitely should not lie about the encoding of the document.
The most probable explanation is that the document is in fact UTF-8 encoded and the browser treats it that way, despite the meta tag. According to the HTML5 encoding sniffing algorithm, which largely reflects browser behavior, the meta tag is ignored if any of the following is true:
The user has instructed (via e.g. a View → Encoding command) the browser to use a specific encoding.
The page starts with bytes that represent the Byte Order Mark in UTF-8 or UTF-16. In practice, it starts that way if the file was saved in an editor with a command like “Save as UTF-8 (with BOM)”.
HTTP headers specify an encoding in a Content-Type header.
You can find out which of these is the cause by using e.g. Rex Swain’s HTTP viewer. It lets you see both the HTTP response headers and the actual data as bytes. Developer Tools in browsers have similar features.
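The precedence described in the list above can be sketched roughly as follows. This is a simplified illustration in Python, not the full HTML5 sniffing algorithm: the helper name pick_encoding is hypothetical, the user override is left out, and the meta detection is a crude regular expression rather than a real prescan.

```python
import re
from typing import Optional

# Hypothetical helper: simplified ordering only, not the real algorithm.
def pick_encoding(body: bytes, http_charset: Optional[str] = None) -> str:
    # 1. A byte order mark at the very start of the data wins.
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith(b"\xff\xfe"):
        return "utf-16le"
    if body.startswith(b"\xfe\xff"):
        return "utf-16be"
    # 2. Otherwise a charset from the HTTP Content-Type header is used.
    if http_charset:
        return http_charset.lower()
    # 3. Only then is a <meta> declaration near the start consulted.
    match = re.search(rb"charset\s*=\s*[\x22\x27]?([\w-]+)", body[:1024], re.I)
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Assumed fallback; real browsers pick a locale-dependent default.
    return "windows-1252"

page = b'<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
print(pick_encoding(page))                    # iso-8859-1
print(pick_encoding(b"\xef\xbb\xbf" + page))  # utf-8
print(pick_encoding(page, "UTF-8"))           # utf-8
```

In the last two calls the meta tag still says ISO-8859-1, but it is never consulted, which matches the behavior described in the answer.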

Wrong character entities displaying in IE11 -- · rendering as ∑ -- is this really a charset issue?

I designed several UI prototypes (testing initially in Chrome) using HTML5 and, while testing in other browsers, noticed that IE11 was substituting different characters for common character entities, like &middot; and &nbsp;, on one of the two UIs I was testing.
Both prototypes are hosted on the same server, in different folders, so I'm a bit baffled by the research I've done which points to IE10 & IE11 giving a higher precedence to HTTP over BOM in HTML5; but... if the server was sending out a header declaring ISO-8859-1 or windows-1251, overriding the UTF-8 charset, shouldn't I be seeing the same problem on both prototypes? Wouldn't I see problems with other characters?
The thing that is really bugging me is that, whatever the charset, the HTML character entities in the markup would be the same, right? How does IE manage to misinterpret that?
In any case, I've tried:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
as well as:
<meta charset="utf-8">
and:
<meta charset="ISO-8859-1">
and still get † instead of non-breaking spaces, or ∑ instead of ·
I don't have host access to change the .htaccess file, though I will pass that suggestion on to my supervisors. I'm just not sure I have a sound explanation for the way IE is behaving; is this character swapping truly caused by IE failing to recognize UTF-8?
How do I explain this issue appearing in subfolder A but not in subfolder B on the same host, if the problem does result from the HTTP vs BOM prioritization?
If I failed to present my question clearly, my apologies. To recap, character entities I've used in my markup for years were displaying incorrectly in IE11. My search for a cause led me to several posts on stack overflow and elsewhere that suggested the problem might be due to the way IE puts priority on the HTTP header over the BOM (unlike other browsers).
See: UTF-8 encoding does not work properly with Internet Explorer but works perfectly with Mozilla Firefox (which also recommends: IE uses the wrong character set when it renders an HTML page).
However, on checking to see exactly what encoding was being applied to the page, it kept saying UTF-8. So, I put up this question to find out if there was another known cause. I did not get an answer, but I did stumble across one on my own. In a way, the answer is implicit in the answer the author of the post I linked to gave to his own question. I simply did not see it at the time.
Put simply, my "defective" prototype did not have a Unicode font declared. I thought I started both prototypes with the same basic CSS, but... yeah, these things happen. It was the one thing I didn't check, because I was so confident I did it by rote. I did catch references to the need for Unicode fonts in HTML5 in various posts on this (and related) topics, but let me emphasize:
Be sure to use Unicode fonts with UTF-8 in HTML5
I'm assuming that browsers other than IE have unicode versions in the font stacks of their default font-families.
Change the order of the fonts from
body {
  font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
}
to this
body {
  font-family: Arial, "Helvetica Neue", Helvetica, sans-serif;
}
in your bootstrap.css

IE 10 does not load page in UTF-8

I've got simple HTML pages in Russian with a bit of js in it.
Every browser handles them well except IE10; even IE9 is fine. The following code is included:
<html lang="ru">
<meta http-equiv="Cоntent-Type" content="text/html"; charset="utf-8">
Also I've added .htacess with
AddDefaultCharset UTF-8
Still, IE10 loads the page in a Cyrillic encoding (cp-1251, I believe); the only way to display the characters correctly is to manually change the encoding to UTF-8 in the browser (or choose auto-detect mode).
I don't understand why IE10 forces 1251 instead of UTF-8.
The website to check is http://btlabs.ru
What really causes the problem is that the HTTP headers sent by the server include
Content-Type: text/html; charset=windows-1251
This overrides any meta tags. You should of course fix the errors with the meta tag as pointed out in other answers, and run a markup validator to check your code, but to fix the actual problem, you need to fix the .htaccess file. Without seeing the file and other server-side issues, it is impossible to tell how to fix that (e.g., server settings might prevent the effect of a per-directory .htaccess file and apply one global file set by the server admin). Note that the file name must have two c's, not one (.htaccess, not .htacess).
You can check what the headers are using, e.g., Rex Swain’s HTTP Viewer.
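Once you have the header value from such a tool, extracting the declared charset is straightforward. Here is a small Python sketch using the standard library's email.message.Message; the helper name charset_from_content_type is made up for illustration, and the header strings are examples rather than output captured from the site in question.

```python
from email.message import Message

# Hypothetical helper: pull the charset parameter out of a Content-Type value.
def charset_from_content_type(header_value: str):
    msg = Message()
    msg["Content-Type"] = header_value
    # Returns the lowercased charset, or None if the header declares none.
    return msg.get_content_charset()

print(charset_from_content_type("text/html; charset=windows-1251"))  # windows-1251
print(charset_from_content_type("text/html"))                        # None
```

If this reports anything other than the encoding your files are actually saved in, the server configuration is the thing to fix.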
The reason why things work on other browsers is that they apply the modern HTML5 principle “BOM wins them all”. That is, an HTTP header wins over a meta tag in specifying the character encoding, but if the actual data begins with the three bytes that constitute the UTF-8 encoded form of the Byte Order Mark (BOM), then, no matter what, the data will be interpreted as UTF-8 encoded. For some unknown reason, IE 10 does not do that (and neither does IE 11).
But this won’t be a problem if you just make the server send an HTTP header that declares UTF-8.
If the server has been set to declare windows-1251 and you cannot possibly change that, then you just need to live with it. Transcode your HTML files to windows-1251 then, and declare windows-1251 in a meta tag. This means that if you need any characters outside the limited repertoire representable in windows-1251, you need to represent them using character references.
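As a rough illustration of that transcoding step, Python's built-in xmlcharrefreplace error handler turns anything windows-1251 cannot represent into a numeric character reference. The sample markup is invented for the example.

```python
# Transcode to windows-1251, falling back to character references
# for characters outside its repertoire (here, the snowman U+2603).
html = "<p>Привет, мир ☃</p>"
encoded = html.encode("windows-1251", errors="xmlcharrefreplace")

print(encoded.decode("windows-1251"))  # <p>Привет, мир &#9731;</p>
```

The Cyrillic text is stored as single windows-1251 bytes, while the unsupported character survives as &#9731;, which browsers render correctly regardless of the page encoding.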
Perhaps it's because the 'o' in 'Content' is not an ASCII 'o'. Notice that it is not red on Stack Overflow? I copied it into a good text editor and saw that it is indeed not an 'o'. Because the 'o' is not really an ASCII 'o', that whole line probably gets ignored by every web browser, which then falls back on whatever default charset it uses. Microsoft and IE are notorious for picking bad defaults, which is my guess as to why it doesn't work in IE. ;)
But codingaround has good advice too. It's best to put quotes around your attribute values, although leaving them out should not break a web browser.
you should use a doctype at the start:
<!DOCTYPE html>
<html lang='ru'>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
But the real culprit is your content and charset problem. Notice my line: mine is very different. ;) That's the problem. Note that mine has two ASCII 'o's, one in "Content-Type" and another in 'content='.
As Shawn pointed out, copy and paste this:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
This is a really good example of how non-Ascii letters that look like English Ascii letters can really mess things up!
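One way to catch such lookalikes is to list every non-ASCII character in a suspicious line along with its Unicode name. This is a minimal Python sketch; the helper name non_ascii_chars is made up, and the sample string spells the Cyrillic letter as an escape so it is visible in the source.

```python
import unicodedata

# Hypothetical helper: report (index, character, Unicode name) for
# every character outside the ASCII range.
def non_ascii_chars(line: str):
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(line)
        if ord(char) > 127
    ]

# \u043e is the Cyrillic letter that looks like a Latin "o".
suspect = 'http-equiv="C\u043entent-Type"'
for index, char, name in non_ascii_chars(suspect):
    print(index, char, name)  # 13 о CYRILLIC SMALL LETTER O
```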
Maybe you forgot changing cоntent=text/html; to cоntent="text/html";
As Shawn has already pointed out, it could also be content="text/html; charset=utf-8".
But as you have tried both things out, can you confirm if the IE10 output looks like this?
I can't really help further with this, as the only thing I have here is an IE 10 online emulator.
So far the possible problems are:
Different o character
I see, that the <meta> tag is still outside of <head>, put it in place
Problems with IE handling the content and charset attributes

Html charset and support for special (national) characters

I have a website in HTML5. Most of the content there is in Czech, which has some special symbols like "ř, č, š" etc...
I searched the internet for recommended charsets and I got these answers: UTF-8, ISO 8859-2 and Windows-1250.
<meta http-equiv="Content-Type" content="text/html; charset=ISO 8859-2" />
I tried UTF-8, which didn't work at all, and then settled on ISO 8859-2. I tested my website on my computer in the latest versions of Chrome, Firefox, IE and Opera. Everything worked fine, but when I tested my website at http://browsershots.org/ , these characters were not displayed correctly (in the same browsers that I used for testing!).
How is that possible? How can I ensure that all characters are displayed correctly in all web browsers? Is it possible that using HTML5 causes these problems (since it's not fully supported by all browsers, although I am not using any advanced functions)?
Thanks for any hints and troubleshooting tips!
If you are using HTML5, try this short charset declaration:
<meta charset="UTF-8">
Additionally, check your HTML file's encoding. You can do it in Notepad++, menu Encoding -> Encode in UTF-8.
The important thing is that the actual encoding of the data coincides with the declared encoding. From the description, it seems that the actual encoding is ISO-8859-2, so you should declare it. Note that the name of the encoding has no space but hyphens. (I wonder whether you used it with a space – I would expect browsers to ignore the tag then.) The following is the simplest declaration:
<meta charset=ISO-8859-2>
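To see why the declared and actual encodings must agree, here is a small Python sketch that encodes the Czech letters from the question as ISO-8859-2 and then decodes the bytes under different assumptions. The use of errors="replace" is only an approximation of how a browser substitutes undecodable bytes.

```python
text = "ř, č, š"
raw = text.encode("iso-8859-2")   # how the file is actually saved

declared_right = raw.decode("iso-8859-2")               # declaration matches
declared_utf8 = raw.decode("utf-8", errors="replace")   # declaration says UTF-8
declared_latin1 = raw.decode("iso-8859-1")              # declaration says Latin-1

print(declared_right)    # ř, č, š
print(declared_utf8)     # three replacement characters instead of letters
print(declared_latin1)   # ø, è, ¹
```

This is consistent with the asker's experience that UTF-8 "didn't work at all": the file was most likely never saved as UTF-8 in the first place.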
I would not trust browsershots.org to get things like this right. Testing on actual browsers is more useful.
UTF-8 is the best-supported character set for international usage. If it does not display correctly, you should ensure that your file is saved in UTF-8 format. Even Notepad has a "UTF-8" option in its save dialog.

How do browsers handle <meta> tag that specifies the character-encoding?

Suppose a browser encounters a <meta> tag that specifies the character-encoding, like this:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
Does it start over from the beginning parsing the page again, since some of the preceding characters in the <head> section may have been interpreted incorrectly? Or are there some other constraints that prevent prior characters from being interpreted incorrectly?
As far as I know, browsers won't go back after finding a charset declaration in the <head>, and they assume an ASCII-compatible charset up to that point. Unfortunately, I can't find a reference to confirm this.
Conforming browsers will ignore a Content-Type meta element if the server already provides a Content-Type HTTP header, so you can't override a "wrong" server-side charset with a <meta> element.
The point of the <meta> charset declaration is to cover HTML documents that are not served by an HTTP server.
That means you shouldn't rely on a <meta> charset declaration in the HTML file, but configure your HTTP server to provide the correct charset. If for some reason you have to rely on a <meta> charset declaration, you should only have ASCII characters up to that point and position it as early in the <head> as possible, preferably as the first element.
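The placement advice can be checked mechanically. Here is a rough Python sketch with a hypothetical helper, check_meta_charset. The 1024-byte limit is not from this answer; it comes from the HTML specification, which requires the declaration to be serialized completely within the first 1024 bytes. The regular expression is a crude stand-in for a real parser.

```python
import re

# Hypothetical helper: sanity-check where the charset declaration sits.
def check_meta_charset(document: bytes) -> str:
    match = re.search(rb"<meta[^>]*charset", document, re.IGNORECASE)
    if match is None:
        return "no charset declaration found"
    tag_end = document.find(b">", match.start())
    if tag_end == -1 or tag_end >= 1024:
        return "declaration is not within the first 1024 bytes"
    # Any byte above 127 before the tag means non-ASCII content precedes it.
    if any(byte > 127 for byte in document[:match.start()]):
        return "non-ASCII bytes appear before the declaration"
    return "ok"

good = '<!DOCTYPE html><html><head><meta charset="utf-8"><title>Žluťoučký</title>'.encode("utf-8")
bad = '<!DOCTYPE html><html><head><title>Žluťoučký</title><meta charset="utf-8">'.encode("utf-8")
print(check_meta_charset(good))  # ok
print(check_meta_charset(bad))   # non-ASCII bytes appear before the declaration
```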
The parser can start over in some circumstances. The relevant spec is here: http://dev.w3.org/html5/spec/parsing.html#change-the-encoding
Note that browsers traditionally have probably not followed this algorithm exactly; chances are they've all done slightly different things. However, the link above describes what HTML5-compliant browsers should do. The algorithm described is likely an amalgam of various browsers' previous behaviour.
Since HTML5 is still a working draft, this should be considered subject to change.
It has no real effect on the node structure. Only the content of text nodes (and attribute nodes) has to be transcoded.
If your server sends the
Content-type: text/html;charset=utf-8
...header the browser will know the right charset from the start. You can achieve this with a .htaccess file containing:
AddDefaultCharset utf-8