Is the "charset" attribute required with HTML5?

The W3C "HTML5 differences from HTML4" working draft states:
For the HTML syntax, authors are required to declare the character encoding.
What does "required" mean?
Obviously, a browser will still render HTML5 without the charset meta attribute. If no encoding is specified, which encoding will a browser use?
Basically, I want to know if it is actually necessary to include <meta charset="">, or if 99% of the time browsers will use the correct encoding anyway.

It is not necessary to include <meta charset="blah">. As the specification says, the character set may also be specified by the server using the HTTP Content-Type header or by including a Unicode BOM at the beginning of the downloaded file.
Most web servers today will send back a character set in the Content-Type header for HTML text data if none is specified. If the web server doesn't send back a character set with the Content-Type header and the file does not include a BOM and the page does not include a <meta charset="blah"> declaration, the browser will have a default encoding that is usually based on the language settings of the host computer. If this does not match the actual character encoding of the file, then some characters will be displayed improperly.
Will browsers use the proper encoding 99% of the time? If your page is UTF-8, probably. If not, probably not.
The W3C provides a document outlining the precedence rules for the three methods that says the order is HTTP header, BOM, followed by in-document specification (meta tag).
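As a rough illustration of those precedence rules, here is a minimal Python sketch. The function name `pick_encoding` and the regular expressions are illustrative assumptions, not taken from any specification or library. Note that the sketch checks the BOM first, which is the order the current HTML Standard's sniffing algorithm uses; swap the first two steps to model the header-first order quoted above.

```python
import codecs
import re
from typing import Optional

# Hypothetical helper: returns the declared encoding, or None if nothing is declared.
def pick_encoding(body: bytes, content_type: Optional[str] = None) -> Optional[str]:
    # 1. Byte order mark at the very start of the file.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if body.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    # 2. charset parameter of the HTTP Content-Type header.
    if content_type:
        match = re.search(r'charset\s*=\s*"?([^\s;"]+)', content_type, re.IGNORECASE)
        if match:
            return match.group(1).lower()

    # 3. In-document <meta> declaration, searched only near the top of the file.
    head = body[:1024].decode("ascii", errors="replace")
    match = re.search(r'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', head, re.IGNORECASE)
    if match:
        return match.group(1).lower()

    # Nothing declared: a real browser would now fall back to its own default.
    return None
```

Returning `None` in the last step stands in for the browser's locale-dependent default described above.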

According to the Google PageSpeed browser extension, declaring a charset in a meta element "disables IE8's lookahead feature" which apparently forces it to download everything in serial.
My understanding was that <meta charset="utf-8"> was required for valid HTML5, but that is why I started browsing here.
That draft of the spec seems pretty clear to me and since I add the HTTP header via .htaccess, I am going to start leaving it out...even though I'm tempted not to, just to make IE8 users suffer a bit more.
Thanks.
@Jules Mazur do you have any references about those points? Most of what I do is SEO and accessibility is important to me and if that is the case I am more than receptive to leaving the meta declaration.

It’s important to specify the character set of the document as early as possible (either through the Content-Type header or the META tag); otherwise the browser is left to determine the encoding before parsing the document, and this may negatively impact the page load time.
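For the META tag, "early" has a concrete meaning: the HTML Standard asks that the declaration fall completely within the first 1024 bytes of the document. A minimal Python sketch of such a check follows; the function name `charset_in_prescan_window` and the simplified regular expression are assumptions for illustration, not a full implementation of the browser's prescan.

```python
import re

# Simplified pattern for <meta charset=...> and the http-equiv form.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)

def charset_in_prescan_window(html: bytes, window: int = 1024) -> bool:
    """Return True if a meta charset declaration sits inside the first `window` bytes."""
    return META_CHARSET.search(html[:window]) is not None
```

A page whose head begins with a long comment or inline script can push the declaration past that window, so placing the tag first inside the head is the usual habit.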

The short answer is NO, the charset tag is not required, but recommended.
Modern HTML5 browsers all handle UTF-8 well, and the HTML Standard makes UTF-8 the expected encoding for documents. UTF-8 is also byte-for-byte compatible with 7-bit ASCII, because both store code points 0 to 127 as the same single byte; UTF-8 was designed to address backwards compatibility issues like this. That compatibility does not extend to the upper half of Latin-1 (bytes 128 to 255), which UTF-8 encodes as two bytes. Many HTTP servers also deliver the charset for HTML5 pages in their headers, and it is usually UTF-8. If you leave the declaration off your pages, you should only see issues with non-ASCII characters, such as accented letters or exotic upper-plane Unicode characters, or with pages saved in a different encoding than the one the browser ends up choosing, in which case the browser maps the bytes to the wrong code points. Keep in mind that a browser with nothing declared may still fall back to a locale-dependent legacy encoding, so the UTF-8 default should not be relied on. Pages limited to ASCII, past and present, decode correctly either way.
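A small Python check of how far that byte-level compatibility goes: ASCII text produces identical bytes in ASCII, Latin-1, and UTF-8, while an accented character does not. The sample strings and the helper name `decodes_as_utf8` are made up for illustration.

```python
# Pure ASCII text produces identical bytes in all three encodings.
plain = "Hello, world"
assert plain.encode("ascii") == plain.encode("utf-8") == plain.encode("latin-1")

# A non-ASCII character does not: Latin-1 uses one byte, UTF-8 uses two.
accented = "caf\u00e9"                      # "café"
latin1_bytes = accented.encode("latin-1")   # b'caf\xe9'
utf8_bytes = accented.encode("utf-8")       # b'caf\xc3\xa9'

# Decoding UTF-8 bytes as Latin-1 succeeds but yields mojibake.
mojibake = utf8_bytes.decode("latin-1")     # 'cafÃ©'

def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Latin-1 bytes for "café" are not valid UTF-8 at all.
latin1_is_valid_utf8 = decodes_as_utf8(latin1_bytes)  # False
```

This is why a wrong guess by the browser only shows up once a page contains something beyond plain English text.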
MORE DETAILS BELOW...
Since 1998, when most of these W3C HTML and encoding specifications we use today came out, the standards bodies have pushed vendors (makers of servers and browsers and document applications) to follow encoding rules and use meta tags to help determine intent.
But due to greed, poor browser design, and other factors very few have followed the specifications consistently over the years. As a result, we have a fractured system. Some vendors, like Mozilla, have followed the standards since 2001 for meta tags while others, like Microsoft and Google, have not.
For that reason, if you want your web pages viewable in 99.9% of the user agents still around, use contingency design in how your pages are constructed: use meta tags and other standard markup to declare the character encoding the page was built with, despite inconsistent support for such tags. In other words, use both meta tag types. Why? The short "charset" version works well in modern HTML5 browsers, while the longer http-equiv version is needed in many browser versions prior to 2010 that defaulted to older standards, like Latin-1 and ASCII, but started to support UTF-8 encoding after 2000. Example:
<meta charset="utf-8">
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
...though in reality such markup above will rarely decide how modern web pages are decoded or interpreted by web browsers, past and present.
The encoding the browser uses to interpret the page will often depend on the software used to create the page (as someone above mentioned), which increasingly saves files as UTF-8, though often it is still a plain ASCII text editor. UTF-8 is just a standard encoding scheme of Unicode that is currently popular for creating HTML5 web sites. The user's browser will then likely skip over meta tags and check the page to guess the author's intended encoding.
You will also notice, in a typical HTML5 page, that when you provide <link> or <script> tags to external files, you can give encoding/decoding suggestions using the tag attributes. But those are again, like the meta tag, just "hints" to the browser. They do not fully control which encoding the browser decides the files are really in, or what the server headers tell the browser they are encoded in.
The main driver of the encoding used is the web server, whose HTTP response header will often tell the browser the encoding, which for HTML5 pages is typically UTF-8. Because old ASCII (the first 128 characters) used in older web pages decodes identically as UTF-8, users in the West whose pages use only English characters rarely have issues between new and old web page encodings. Because of all these fallback designs, meta tags are often not needed at all today and are often ignored in modern web page parsing, for the reasons outlined above.
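A minimal sketch of a server declaring the encoding in the response header, written as a bare WSGI callable in Python. The callable name `app` and the page content are made up for illustration; a real deployment would more likely set this in the web server or framework configuration.

```python
def app(environ, start_response):
    """Hypothetical WSGI application that declares UTF-8 in the HTTP header."""
    # Encode the body with the same encoding the header announces.
    body = "<!DOCTYPE html><title>Demo</title><p>caf\u00e9</p>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it on port 8000. The point of the sketch is that the header and the actual byte encoding of the body have to agree.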
JavaScript using UTF-16 is a different story...
ADDITIONAL OLD BROWSER HISTORY
Some more history of meta tags: in 2000 this whole meta tag debate was much worse than it is today. Use of HTML 4 with embedded Unicode characters often meant pages were neither encoded nor rendered correctly, despite server HTTP headers, character entities, and meta tags, simply because the browsers of the day did not follow the standards and didn't look at meta tags, page encoding, or encoded character entities. Even today, old web pages encoded in old Windows ANSI still cannot be decoded as UTF-8 or UTF-16. That is why, to battle all the complex combinations of support and systems left by failed standards adoption, it’s best to use all combinations of optional HTML tag technology to increase the likelihood of your web pages being rendered correctly.
We learned a valuable lesson back then: web standards would never be consistently followed by companies. When standards are not adopted consistently by private industry, it's best to use every form and version of tagging, all the time, to maximize the chance that your pages are viewed correctly across many different devices using various forms of those standards, even if today they don't matter (as browsers now parse pages and determine encoding themselves).
This is why I say, yes, you should use the charset meta tags, even if they are ignored by many browsers today. It can only help with cross-browser issues and maximize the chance that user agents created in the past 20 years can read your valuable web content.
That should be the strategy used for all web page design until we somehow enforce universal adoption of web standards which is increasingly unlikely now with mobile user-agents and HTML5 which have forced us to abandon yet again many of the XML standards that would have enforced better markup design.

Related

Browsers now ignore HTML encoded ampersands?

The conventional wisdom I keep seeing is that ampersands for user facing text and hrefs and such in HTML need to be encoded so that they don't mess up the parsing of the HTML. I also see conflicting advice that HTML 5 has now loosened the necessity of these conventions so that you can just pass something like
...
and it will render just fine.
I've been seeing that when I write the HTML with the encoded ampersands like
...
, modern browsers like Chrome, Safari, Firefox, treat the encoded string literally. When I click on that href, those browsers will take me to the URL somepage.html?x=1&amp;y=2 instead of the URL somepage.html?x=1&y=2.
This has been breaking the functionality of some external links. Let's say I embed an external link in my website to an audio asset like an MP3 for some on demand music that belongs to some 3rd party provider I don't control. When I put it unencoded like somepage.html?x=1&y=2, the browser takes me to the URL somepage.html?x=1&y=2 and the MP3 downloads just fine. When I put it encoded like somepage.html?x=1&amp;y=2, the browser takes me to the URL somepage.html?x=1&amp;y=2 and the MP3 does not download at all. The website that the link belongs to gives back a blank response.
Why are these browsers ignoring the encoded ampersands and treating the href string literally? Does this mean we really don't need to encode ampersands any more for links? Now it's safe to just put links in hrefs as-is? In that regard, is it possible that nowadays, HTML encoding ampersands can actually be detrimental to the functioning of a website like in the example of the MP3 from some 3rd party provider?
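For comparison, here is how a standards-following HTML parser treats the two spellings, sketched with Python's standard `html.parser` module. The class name `HrefCollector` and the helper `extract_hrefs` are made up for illustration. A single `&amp;` in an attribute is decoded to `&`; a literal `&amp;` only survives when the markup is escaped twice. That may be worth ruling out here, though it is an assumption, since the question does not show the actual markup.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collects the decoded href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    # The parser has already decoded character references here.
                    self.hrefs.append(value)

def extract_hrefs(markup: str) -> list:
    collector = HrefCollector()
    collector.feed(markup)
    collector.close()
    return collector.hrefs

encoded_once = extract_hrefs('<a href="somepage.html?x=1&amp;y=2">song</a>')
encoded_twice = extract_hrefs('<a href="somepage.html?x=1&amp;amp;y=2">song</a>')
```

With this parser, `encoded_once` holds the URL with a plain `&`, while `encoded_twice` holds the URL with a literal `&amp;` in it, which matches the broken behaviour described above.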

frames - I see that frames are not supported in HTML5 - does that mean not supported if the DOCTYPE specifies HTML5?

I have read that frames are not supported in HTML5 but I still want to use them, on occasion.
:-) before you launch into a lecture as to the "evils" of frames, let me say I've resolved the biggest ones - orphaned frames and useless bookmarks. My pages which use frames check to see if they are in the correct frameset, if not, they load the correct frameset. When you bookmark one of my pages which contain frames, the bookmark takes you to the "page" you bookmarked, not just the frameset with default frame sources.
As to HTML5 not supporting frames, is using them a matter of specifying the correct DOCTYPE so that the page is not considered HTML5?
I admit I know very little about DOCTYPE statements and would appreciate any knowledge you can share with me.
Bob
Browsers either support frames or they don't. "Out of the box", all the modern graphical browsers do, though it may be possible for users to disable them. The doctype makes no difference to this.
In HTML5, frames are obsolete. This means that authors, if they want their pages to be HTML5 compliant, must not use frames. It does not mean that user agents (e.g. browsers) should not support them.¹ Indeed, the HTML5 spec devotes two sections to describing how user agents should process frames and framesets.
¹ So to be absolutely clear, the statement "frames are not supported in HTML5" is simply inaccurate.

Why is the MIME Type for HTML text but the MIME Type for XHTML application?

What makes XHTML (and other XML languages) applications while other SGML-based languages are text? Aren't XML files text files?
XHTML is a subset of XML; XML's media type/mime is text/xml while XHTML's media type/mime is application/xhtml+xml.
Generally, HTML is treated like plain text that is interpreted very loosely. Because of HTML's reputation for junk markup, XHTML was created to force web designers and web developers to write clean code. When served malformed XHTML, Gecko (Firefox) and Presto (Opera) browsers correctly break the page and display an XML parse error, whereas WebKit (Chrome/Safari) and Trident (Internet Explorer 9.0+ only) fail at failing and merely stop rendering the page.
An XHTML application served as text/html is NOT XHTML, it's HTML with an XHTML doctype.
For XHTML code to be served as an XHTML application it must be served as application/xhtml+xml.
XHTML is also intended to be backwards compatible with HTML.
The following PHP code looks at the headers sent by the client's browser and serves the page as an XHTML application if the browser supports it. All versions of Chrome, Safari 3.0+ (maybe 1.0), Mozilla Suite 0.8+, all versions of Firefox, and Opera 7.0+ (possibly 6.0) support XHTML. Of the browsers with any market share, only Internet Explorer 8.0 and older do not. KHTML browsers (Konqueror) DO support XHTML; however, I think 4.4 does/did not send the correct header to the server.
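<?php
// Some clients send no Accept header at all, so fall back to an empty string.
$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
// Case-insensitive search for the XHTML MIME type in the Accept header.
$http_accept_xhtml = stristr($accept, 'application/xhtml+xml');
if ($http_accept_xhtml) {$mime = 'application/xhtml+xml';}
else {$mime = 'text/html';}
// The header must be sent before any output.
header('Content-Type: '.$mime);
echo '<?xml version="1.0" encoding="UTF-8"?>'."\n";
?>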
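The same negotiation can be sketched in Python. The function name `choose_mime` is hypothetical. Unlike a plain substring search, this version also honours an explicit `q=0`, which a client can use to say it does not accept a type; whether any browser actually sends that for XHTML is not something the PHP code above addresses.

```python
def choose_mime(accept_header: str) -> str:
    """Pick application/xhtml+xml only if the Accept header allows it."""
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        if fields[0].lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # Default quality when no q parameter is given.
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # Treat a malformed q value as "not accepted".
        if quality > 0:
            return "application/xhtml+xml"
    return "text/html"
```

A server would pass the request's Accept header to this function and use the result as the Content-Type of the response.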
XHTML is way better than HTML if you're intelligent enough to fix errors when you come across them. It's stricter but that's the point, much less subjectivity. The X in XHTML stands for extensible so it supported SVG and other languages before HTML did if you do a bit of reading.
There are several acceptable MIME types for many kinds of data. For instance, XML could be either text/xml or application/xml (http://tools.ietf.org/html/rfc3023).
HTTP is full of multiple correct ways to do stuff; it's a byproduct of being designed and used by so many people. It's also constantly evolving. Generally, even if there was only one way to design something, there can be many ways in which it was used, and these become de facto standards after enough people pick up on them.
If you don't find any problems with saying that your XHTML and SGML are both "applications" and everything still works and it makes you happier, go for it.

Does HTML5 change anything if I don't use video or audio?

I keep hearing all about HTML5 and how great it is, but if I don't really care about audio and video, is there anything it really changes for me? I've read up on the new tags it supports and they just don't seem to be all that revolutionary beyond its video and audio capabilities.
Shamelessly copy-pasted from Wikipedia:
The canvas element for immediate mode 2D drawing.
Timed media playback (possibly not interesting for you)
Offline storage database (offline web applications). See Web Storage
Document editing (via DOM API and user interface)
Drag-and-drop
Cross-document messaging
Browser history management
MIME type and protocol handler registration.
Microdata
Browser-based SQL databases
Oh, and WebSockets to replace AJAX and Comet.
http://en.wikipedia.org/wiki/HTML5#New_APIs
DOM storage
Canvas
Drag'n'drop
Semantic microformats support
One of the biggest deals about HTML5, and an underreported one, is that HTML4 never defined error handling, but this is well defined in HTML5. All browser vendors are building HTML5 parsers that conform to this spec. While this is not sexy, the end result will be that browsers will become more interoperable with each other (especially in cases where the author makes an error). In the long run this should mean you'll spend less time trying to get all browsers to work correctly, and users will benefit from fewer broken sites in their browser of choice.
HTML5 also allows you to make more application-quality sites, using many of the technologies mentioned in the other answers. Opera Dragonfly (the project I'm involved with) is a complex web app which doesn't use audio or video but takes advantage of a large number of HTML5 technologies. We use AppCache to make sure it still works when you are offline, Web Storage to save user preferences and history (we can store a lot more information than cookies allowed), and will likely use Web Workers to allow the app to use more than one process at once (which will speed up performance on multi-core machines).
If you are doing anything with graphics then the Canvas API gives you a lot of drawing options, while SVG (an open vector format) can be used within your HTML pages now. Previously pages had to be served as XML for SVG to be included inside them.

Use of Iframe or Object tag to embed web pages in another

A web-based system I maintain at work, which recently went live, uses an Object element to embed a second web page within the main web page. (Effectively the main web page contains the menu and header, and the main application pages are in the object.)
For example
<object id="contentarea" standby="loading data, please wait..."
title="loading data, please wait..." width="100%" height="53%"
type="text/html" data="MainPage.aspx"></object>
Older versions of this application use an IFRAME to do this, though. I have found that with the object tag the embedded web page behaves differently from when it was hosted in an IFRAME. In IE, for example, the tool tips don't seem to work (I will post a separate question about this!), and it looks like the embedded page cannot access the parent page in script, although it could when it was in an IFRAME.
I am told the reason for favouring the object tag over the IFRAME is that the IFRAME is being deprecated and so cannot be relied on for future versions of browsers. Is this true though? Is it preferable to use the Object tag over the Iframe to embed web pages? Or is it likely that the IFRAME will be well-supported into the future (long after I am old and grey, and past the useful life of the application I maintain)?
The IFRAME element is part of the upcoming HTML5 standard. Also, HTML5 is developed by the major browser vendors out there (Mozilla, Opera, Safari, IE), that basically makes a guarantee that we will have an IFRAME element in the foreseeable future. Some of them have support for some HTML5 elements already, like AUDIO and VIDEO and some new JavaScript APIs.
It's also true that the OBJECT element is in the draft, but that's because IFRAME and OBJECT will have different purposes. IFRAMES are mainly designed for sandboxing web applications.
So, my advice is to use IFRAME instead of OBJECT.
If you are embedding a HTML page, here is one noticeable difference between iframe and object:
with iframe updating src will change the browser history (adding a new entry)
with object updating data will not change the browser history
Also it seems like drag&drop does not work if the page is embedded in the object tag, but works in the iframe tag. I noticed it personally using react-draggable, and I can see someone had the same issue (https://stackoverflow.com/questions/31807848/replacing-iframe-with-object-tag-drag-and-drop-not-working)
IFRAMEs are not part of the XHTML 1.0 Strict DTD. They are totally valid in HTML 4 and XHTML 1.0 Transitional, I believe. For these reasons alone, IFRAME will continue to be supported for a long time.
A lot of bookmarklets and analytics code still use IFRAMEs.
Although the W3C specs may indicate that the IFRAME tag is being deprecated, (in XHTML at least anyway), browser developers do not necessarily follow exactly what those specs say (IE6 anyone?)
As use of IFRAMEs is so prevalent at the moment, and the W3C can't seem to decide if they are part of the future or not (HTML 4.01 vs XHTML), I am pretty sure they are the safer implementation to use for almost every browser.