Browsers now ignore HTML encoded ampersands? - html

The conventional wisdom I keep seeing is that ampersands for user facing text and hrefs and such in HTML need to be encoded so that they don't mess up the parsing of the HTML. I also see conflicting advice that HTML 5 has now loosened the necessity of these conventions so that you can just pass something like
...
and it will render just fine.
I've been seeing that when I write the HTML with the encoded ampersands like
...
, modern browsers like Chrome, Safari, and Firefox treat the encoded string literally. When I click on that href, those browsers take me to the URL somepage.html?x=1&amp;y=2 instead of the URL somepage.html?x=1&y=2.
This has been breaking the functionality of some external links. Let's say I embed an external link in my website to an audio asset, like an MP3 for some on-demand music that belongs to a 3rd party provider I don't control. When I put it unencoded like somepage.html?x=1&y=2, the browser takes me to the URL somepage.html?x=1&y=2 and the MP3 downloads just fine. When I put it encoded like somepage.html?x=1&amp;y=2, the browser takes me to the URL somepage.html?x=1&amp;y=2 and the MP3 does not download at all. The website that the link belongs to gives back a blank response.
Why are these browsers ignoring the encoded ampersands and treating the href string literally? Does this mean we really don't need to encode ampersands any more for links? Now it's safe to just put links in hrefs as-is? In that regard, is it possible that nowadays, HTML encoding ampersands can actually be detrimental to the functioning of a website like in the example of the MP3 from some 3rd party provider?
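One way to see what an HTML parser does with an encoded ampersand is to run the markup through Python's standard-library parser. This is only a sketch: Python's html.parser is not a browser, HrefCollector and hrefs_in are names made up for illustration, and double encoding is offered as one possible explanation for a literal &amp; ending up in the final URL, not a confirmed diagnosis of the case above.

```python
from html import escape
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the decoded href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives with character references already decoded.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


def hrefs_in(markup):
    """Return the href values as the parser sees them after decoding."""
    parser = HrefCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs


raw_url = "somepage.html?x=1&y=2"
encoded_once = escape(raw_url)        # & becomes &amp;
encoded_twice = escape(encoded_once)  # & becomes &amp;amp;

for label, value in [("raw", raw_url),
                     ("encoded once", encoded_once),
                     ("encoded twice", encoded_twice)]:
    markup = '<a href="{}">mp3</a>'.format(value)
    print(label, "->", hrefs_in(markup))
```

If the once-encoded form comes out as the plain URL while the twice-encoded form comes out with a literal &amp;, it may be worth checking whether a template engine or CMS is escaping the href a second time before it reaches the browser.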

Related

embed pdf in html: difference between gview and simple iframe

In this video they recommend using an iframe that calls GView (the Google viewer). Is it of any use? Can't you simply reference the PDF in the iframe? What's the benefit of adding GView?
https://www.youtube.com/watch?v=visxQbQIySg
Most browsers are expected to handle standard web formats such as HTML, CSS, and JS; however, they may optionally support non-web formats as well (PDF, SWF, etc.).
If you push a PDF directly to the browser and the browser does not support PDF, the file will be downloaded, and it makes no difference if you put it inside an iframe. When you use GView or another document viewer, it converts the target file to HTML or another format supported by all browsers (like canvas), so you can be sure the file is displayed on screen rather than downloaded. These viewers also have extra tools like zooming and paging that improve the user experience.

Is the "charset" attribute required with HTML5?

The W3C "HTML5 differences from HTML4" working draft states:
For the HTML syntax, authors are required to declare the character encoding.
What does "required" mean?
Obviously, a browser will still render HTML5 without the charset meta attribute. If no encoding is specified, which encoding will a browser use?
Basically, I want to know if it is actually necessary to include <meta charset="">, or if 99% of the time browsers will use the correct encoding anyway.
It is not necessary to include <meta charset="blah">. As the specification says, the character set may also be specified by the server using the HTTP Content-Type header or by including a Unicode BOM at the beginning of the downloaded file.
Most web servers today will send back a character set in the Content-Type header for HTML text data if none is specified. If the web server doesn't send back a character set with the Content-Type header and the file does not include a BOM and the page does not include a <meta charset="blah"> declaration, the browser will have a default encoding that is usually based on the language settings of the host computer. If this does not match the actual character encoding of the file, then some characters will be displayed improperly.
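The fallback chain described here can be sketched as a small function. This is a rough illustration, not a browser's actual algorithm: guess_encoding is a made-up name, the windows-1252 fallback stands in for whatever locale default a given browser uses, the regular expressions are far cruder than a real meta prescan, and the order in which the sources are consulted (BOM, then HTTP header, then meta) is an assumption modelled on the HTML Standard's sniffing steps.

```python
import codecs
import re

# Byte order marks this sketch recognises, paired with an encoding label.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Deliberately simple patterns; real parsers handle many more edge cases.
HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([A-Za-z0-9_\-]+)', re.IGNORECASE)
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def guess_encoding(body, content_type=None, fallback="windows-1252"):
    """Return (encoding, source) for an HTML response body given as bytes."""
    # 1. A BOM at the very start of the file.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name, "bom"
    # 2. A charset parameter in the HTTP Content-Type header.
    if content_type:
        match = HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower(), "http-header"
    # 3. A meta declaration near the top of the document.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower(), "meta"
    # 4. Nothing declared: fall back to an assumed locale default.
    return fallback, "fallback"


print(guess_encoding(b'<meta charset="utf-8"><p>hi</p>'))
print(guess_encoding(b"<p>hi</p>", "text/html; charset=ISO-8859-1"))
print(guess_encoding(b"<p>hi</p>"))
```

The last call is the case the answer warns about: with no header, BOM, or meta declaration, the result depends entirely on the fallback, which may not match how the file was actually saved.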
Will browsers use the proper encoding 99% of the time? If your page is UTF-8, probably. If not, probably not.
The W3C provides a document outlining the precedence rules for the three methods that says the order is HTTP header, BOM, followed by in-document specification (meta tag). Note that the current HTML Standard checks for a BOM first, ahead of the HTTP header.
According to the Google PageSpeed browser extension, declaring a charset in a meta element "disables IE8's lookahead feature" which apparently forces it to download everything in serial.
My understanding was that <meta charset="utf-8"> was required for valid HTML5, but that is why I started browsing here.
That draft of the spec seems pretty clear to me and since I add the HTTP header via .htaccess, I am going to start leaving it out...even though I'm tempted not to, just to make IE8 users suffer a bit more.
Thanks.
@Jules Mazur do you have any references about those points? Most of what I do is SEO, accessibility is important to me, and if that is the case I am more than receptive to leaving the meta declaration in.
It’s important to specify the character set of the document as early as possible (either through the Content-Type header or the META tag), otherwise the browser will be left to determine the encoding before parsing the document, and this may negatively impact the page load time.
The short answer is NO, the charset tag is not required, but recommended.
UTF-8 is the encoding the HTML5 specification tells authors to use, and UTF-8 decoding routines work unchanged on pure ASCII text, because both store the first 128 code points identically, in one byte each. This does not extend to Latin-1 or other legacy encodings above that range, where the same byte values mean different things. UTF-8 was designed for this kind of backwards compatibility with ASCII, and that is why HTML5 standardizes on it. Many HTTP servers also deliver the correct charset for HTML5 pages anyway, which is UTF-8. If you leave the declaration off your web pages, you should only see issues with non-ASCII content, such as upper-plane Unicode characters, or with pages that were saved in one encoding and decoded as another, so that the browser maps some bytes to the wrong code points. In practice, UTF-8 is what modern browsers usually end up using for HTML5 pages, and most delivered pages, past and present, are decoded correctly by the user agent.
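A quick way to check how far the byte-level compatibility goes is to encode the same strings several ways and compare. This is a minimal sketch with made-up sample strings and a hypothetical helper, decodes_as_utf8; it suggests the compatibility holds for ASCII text but not for Latin-1 bytes above 127.

```python
ascii_text = "Hello, world"
accented = "café"

# Pure ASCII text produces the same bytes in all three encodings.
same_bytes = (
    ascii_text.encode("ascii")
    == ascii_text.encode("latin-1")
    == ascii_text.encode("utf-8")
)

# The accented character is one byte in Latin-1 and two bytes in UTF-8.
latin1_bytes = accented.encode("latin-1")
utf8_bytes = accented.encode("utf-8")


def decodes_as_utf8(data):
    """Hypothetical helper: True if the bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Reading UTF-8 bytes as Latin-1 never raises, it just yields wrong characters.
mojibake = utf8_bytes.decode("latin-1")

print(same_bytes)
print(latin1_bytes, decodes_as_utf8(latin1_bytes))
print(utf8_bytes, decodes_as_utf8(utf8_bytes))
print(mojibake)
```

This is the kind of mismatch an explicit charset declaration is meant to prevent: the bytes on disk are fine, but the decoder is told (or guesses) the wrong encoding.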
MORE DETAILS BELOW...
Since 1998, when most of these W3C HTML and encoding specifications we use today came out, the standards bodies have pushed vendors (makers of servers and browsers and document applications) to follow encoding rules and use meta tags to help determine intent.
But due to greed, poor browser design, and other factors very few have followed the specifications consistently over the years. As a result, we have a fractured system. Some vendors, like Mozilla, have followed the standards since 2001 for meta tags while others, like Microsoft and Google, have not.
For that reason, if you want your web pages viewable in 99.9% of the user agents still around, use contingency design in how your pages are constructed: use meta tags and other standard markup to declare the character encoding the page was built with, despite inconsistent support for such tags. In other words, use both meta tag types. Why? The short "charset" version works well in modern HTML5 browsers, while the longer "http-equiv" version is needed in many browser versions prior to 2010 that defaulted to older standards, like Latin-1 and ASCII, but started to support UTF-8 after 2000. Example:
<meta charset="utf-8">
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
...though in reality such markup above will rarely decide how modern web pages are decoded or interpreted by web browsers, past and present.
What encoding is used by the browser when interpreting the page will often be based on the software used in creating the web page itself (as someone above mentioned), which increasingly saves UTF-8 but is often still an ASCII text editor. UTF-8 is just a standard encoding scheme of Unicode that's currently popular for creating HTML5 web sites. The user's browser will then likely skip over meta tags and check the page to guess the encoding intent of the author.
You will also notice, in a typical HTML5 page, that when you provide <link> or <script> tags to external files, you can suggest an encoding using tag attributes. But those are again, like the meta tag, just "hints" to the browser. They do not fully control which encoding the browser decides the files are really in, or what the server headers tell the browser they are encoded in.
The main driver of the encoding scheme used is the web server, whose HTTP response header will often tell the browser the encoding type used, which for HTML5 pages is typically UTF-8. Because old ASCII (the first 128 characters) used in older web pages decodes identically as UTF-8, users in the West working with English text rarely have issues between new and old web page encodings. Because of all these fallback designs, meta tags are often not needed at all today and are frequently ignored in modern web page parsing for the reasons outlined above.
JavaScript using UTF-16 is a different story...
ADDITIONAL OLD BROWSER HISTORY
Some more history of meta tags: in 2000 this whole meta tag debate was much worse than it is today. Use of HTML 4 with embedded Unicode characters often meant pages were neither encoded correctly nor rendered correctly, despite server HTTP headers, character entities, and meta tags, simply because the modern browsers of the day did not follow the standards and didn't look at meta tags, page encoding, or encoded character entities. Even today, old web pages encoded in old Windows ANSI still cannot be decoded as UTF-8 or UTF-16. That is why, to battle all the complex combinations of support and systems left by failed standards adoption, it’s best to use all combinations of optional HTML tag technology to increase the likelihood of your web pages being rendered correctly.
We learned a valuable lesson back then: web standards would never be consistently followed by companies. When standards are not adopted consistently by private industry, it's always best to use every form and version of tagging available, to maximize the chance that your pages are viewed correctly across many different devices using various forms of those standards, even if today they don't matter (as browsers now parse pages and determine encoding themselves).
This is why I say, yes, you should use the charset meta tags, even if they are ignored by many browsers today. It can only help with cross-browser issues and maximize the chance that user agents created in the past 20 years can read your valuable web content.
That should be the strategy used for all web page design until we somehow enforce universal adoption of web standards, which is increasingly unlikely now with mobile user-agents and HTML5, which have forced us to abandon yet again many of the XML standards that would have enforced better markup design.

Embedding base64 encoded Java-Applet in HTML

I tried to embed a base64 encoded Java-Applet in my HTML-file.
I thought I could use a data-url like in my example below:
<applet
name="AppletName"
id="AppletId"
code="Applet.class"
archive="data:application/x-jar;base64,UEsDBBQAAAAIAGY/eziUsj5wxAAAABwB...
</applet>
This did not work. In my tests Firefox and Chrome crashed.
I also tried different MIME types.
Any Ideas?
Thanks in advance!
There are a few issues:
The tag applet does not have the correct attribute values
The base64 encoded data URI must be able to be decoded by a browser, not a plugin
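Before blaming the browser or the plugin, it may help to confirm that the base64 payload itself is well-formed. The sketch below builds and re-reads a data: URI using only the Python standard library. The helper names make_data_uri and read_data_uri and the sample bytes are made up for illustration, and a successful round trip says nothing about whether a given browser or plugin will accept a data: URI in the archive attribute.

```python
import base64


def make_data_uri(payload, mime_type):
    """Hypothetical helper: wrap raw bytes in a base64 data: URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return "data:{};base64,{}".format(mime_type, encoded)


def read_data_uri(uri):
    """Hypothetical helper: return (mime_type, bytes) from a base64 data: URI.

    Raises ValueError if the URI is not in the expected form or the
    base64 payload is malformed (for example, truncated).
    """
    header, _, encoded = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data: URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(encoded, validate=True)


# Stand-in bytes: a JAR is a ZIP archive, which starts with b"PK\x03\x04".
sample = b"PK\x03\x04demo"
uri = make_data_uri(sample, "application/x-jar")
print(uri)
print(read_data_uri(uri))
```

If read_data_uri raises on the string pasted into the page, the payload was probably truncated or altered when it was copied, which is a separate problem from plugin support.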

embed .swf inline into html using base64 encoding

I'm trying to embed a .swf into HTML inline using base64 encoding.
I read a post somewhere saying this is no longer possible with Flash 10+.
Any hacks or a definitive answer?
'data:application/x-shockwave-flash;base64,Q1dTChQHAAB4(cut)9ktAW5/4BvdnQmw=='
does not work anymore
I think if you do that on an <object> tag it should work. I'm not sure why it would depend on the Player's version; I'd imagine that the SWF is loaded by the browser, not the player.

HTML5 Video tag as URL

I am learning the new HTML5 tags, and have a question about the video tag that I cannot seem to find a good example/answer for.
Can I provide a source as a URL, or does the source have to route from the web server? I am just trying to play with an example to see what it looks like, and use a YouTube video as the source. Is this possible?
The source can indeed be a valid URL, which of course in this case needs to be a URL to a valid video file.
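As a rough first check on whether a URL looks like it points at a video file rather than a web page, the file extension in the URL path can be compared against known media types. This is only a heuristic sketch: guess_video_type is a made-up helper, example.com is a placeholder host, and what actually matters to the browser is the content the server returns, not the extension.

```python
import mimetypes
from urllib.parse import urlparse


def guess_video_type(url):
    """Hypothetical helper: return a video MIME type guessed from the
    URL path's extension, or None if the path does not look like video."""
    path = urlparse(url).path  # drops the query string and host
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type and mime_type.startswith("video/"):
        return mime_type
    return None


# A direct link to a media file (placeholder host).
print(guess_video_type("https://example.com/media/clip.mp4"))

# A watch-page URL has no media extension, so the guess comes back empty.
print(guess_video_type("https://www.youtube.com/watch?v=visxQbQIySg"))
```

A watch-page URL like the second one refers to an HTML page, so by this heuristic it would not qualify as a URL to a video file.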