Why use data URI scheme?

Basically, the question is in the title.
Many people have asked on Stack Overflow how to create a data URI and have run into problems doing so.
My question is: why use a data URI at all?
What are the advantages to doing:
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
Over doing:
<img src="dot.png" alt="Red dot" />
I understand one has less overhead on the server side (maybe), but what are the real advantages/disadvantages of using data URIs?

According to Wikipedia:
Advantages:
- HTTP request and header traffic is not required for embedded data, so data URIs consume less bandwidth whenever the overhead of encoding the inline content as a data URI is smaller than the HTTP overhead. For example, the required base64 encoding for an image 600 bytes long would be 800 bytes, so if an HTTP request required more than 200 bytes of overhead, the data URI would be more efficient. (A quick calculation of this trade-off appears below.)
- For transferring many small files (less than a few kilobytes each), this can be faster. TCP transfers tend to start slowly. If each file requires a new TCP connection, the transfer speed is limited by the round-trip time rather than the available bandwidth. Using HTTP keep-alive improves the situation, but may not entirely alleviate the bottleneck.
- When browsing a secure HTTPS web site, web browsers commonly require that all elements of a web page be downloaded over secure connections, or the user will be notified of reduced security due to a mixture of secure and insecure elements. On badly configured servers, HTTPS requests have significant overhead over common HTTP requests, so embedding data in data URIs may improve speed in this case.
- Web browsers are usually configured to make only a certain number (often two) of concurrent HTTP connections to a domain, so inline data frees up a download connection for other content.
- Environments with limited or restricted access to external resources may embed content when it is disallowed or impractical to reference it externally. For example, an advanced HTML editing field could accept a pasted or inserted image and convert it to a data URI to hide the complexity of external resources from the user. Alternatively, a browser can convert (encode) image data from the clipboard to a data URI and paste it into an HTML editing field. Mozilla Firefox 4 supports this functionality.
- It is possible to manage a multimedia page as a single file. Email message templates can contain images (for backgrounds or signatures) without the image appearing to be an "attachment".
Disadvantages:
- Data URIs are not cached separately from their containing documents (e.g. CSS or HTML files), so data is downloaded every time the containing documents are redownloaded. Content must be re-encoded and re-embedded every time a change is made.
- Internet Explorer through version 7 (approximately 15% of the market as of January 2011) lacks support. However, this can be overcome by serving browser-specific content.
- Internet Explorer 8 limits data URIs to a maximum length of 32 KB.
- Data is included as a simple stream, and many processing environments (such as web browsers) may not support using containers (such as multipart/alternative or message/rfc822) to provide greater complexity such as metadata, data compression, or content negotiation.
- Base64-encoded data URIs are 1/3 larger than their binary equivalent. (However, this overhead is reduced to 2-3% if the HTTP server compresses the response using gzip.)
- Data URIs make it more difficult for security software to filter content.
According to other sources:
- Data URLs are significantly slower on mobile browsers.
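A quick sanity check of the base64 trade-off quoted above, as a minimal JavaScript sketch (the 200-byte figure is just the example value from the quote, not a measured constant):

// Base64 output length for n input bytes: every 3 bytes become
// 4 characters, rounded up to a whole group of 4.
function base64Length(n) {
    return Math.ceil(n / 3) * 4;
}

// A data URI wins when its encoding overhead is smaller than the
// per-request HTTP overhead (headers, connection setup, and so on).
function dataUriWins(imageBytes, httpOverheadBytes) {
    return (base64Length(imageBytes) - imageBytes) < httpOverheadBytes;
}

base64Length(600);       // 800, matching the Wikipedia example
dataUriWins(600, 200);   // false: overhead must exceed 200 bytes
dataUriWins(600, 201);   // true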

A good use of data URIs is allowing the download of content that has been generated client side, without resorting to a server-side 'proxy'. Here are some examples I can think of (a sketch of the first follows the list):
saving the output of a canvas element as an image.
offering download of a table as CSV
downloading output of any kind of online editor (text, drawing, CSS code ...etc)
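For instance, the canvas case can be done entirely in the browser. A minimal sketch (the element IDs are hypothetical):

// Serialize an already-drawn canvas as a base64 PNG data URI.
var canvas = document.getElementById('drawing');
var uri = canvas.toDataURL('image/png');

// Point a link at the data URI; clicking it downloads the image.
// (The download attribute is HTML5; older browsers will open the
// image in a new tab instead.)
var link = document.getElementById('save-link');
link.href = uri;
link.download = 'drawing.png';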

Mainly I find use of this if I can't (for some reason) use CSS sprites and I don't want to download every little single image that I will use for styling.
Or, for some reason, you don't want anyone to hotlink the image from an external page. That can be achieved by other methods, but embedding works as well.
Otherwise, I personally wouldn't encode large images such as photos. It's better to host your media on a different server, one that can do without all of the usual web-server software and simply delivers media. That's a much better use of resources.

I have used the data URI scheme in several (C++, Python) command line applications to generate reports which include data plots.
Having a single file is quite convenient for sending the reports by email (or moving them around in general). Compared to PDF, I did not need an additional library (other than a base64 encoding routine), and I don't need to take care of page breaks (I almost never need to print these reports). I usually don't put these reports on a web server; I just view them on the local filesystem with a browser.
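The idea is easy to reproduce. Here is a minimal Node.js sketch of it (the original tools were C++ and Python; the file names here are hypothetical):

// build-report.js: embed a plot into a single self-contained HTML file.
var fs = require('fs');

// Read the plot image and encode it as a base64 data URI.
var png = fs.readFileSync('plot.png');
var uri = 'data:image/png;base64,' + png.toString('base64');

// Because the image is inlined, report.html can be emailed or moved
// around without dragging any image files along with it.
var html = '<html><body><h1>Report</h1>' +
           '<img src="' + uri + '" alt="plot"></body></html>';
fs.writeFileSync('report.html', html);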

I agree with BiAiB that the real value of Data URIs is making client-side generated content available as file download without any need for server round-trips.
A working example of using Data URIs for "offering download of a table as CSV" is described on my blog.
IMHO, the embedding of image (or other binary resource) data into an HTML file for performance reasons is a red herring. The speed gain due to less HTTP connections is negligible and breaks the nice principle of separation between (textual) markup and binary resources (image files, videos, etc.).
I think HTTP 1.1 pipelining and some suggested improvements to HTTP are a cleaner and better way to handle HTTP network speed issues.


Using data URI for images in IE 6 and 7

I am currently developing a web app. It contains images which are generated dynamically on the server (and thus take some time to appear after being requested) and then dished out. So I thought I would use the HTML5 local-storage API to cache the images, so that on subsequent requests for the same image it can be served instantly. For that, I plan to use the base64 encoding of the image as the source instead of using a source URL.
Instead of requesting the image from the server, the JS will now first check whether that image data is currently available in the local storage (say an image with attribute 123 is stored in the local storage with 123 as key, and the base 64 encoding as the value). If yes, then just change the image's source with the value obtained from there. Else request the server to send the encoding, upon receiving which, it is stored in the cache.
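A minimal sketch of that flow (the key scheme and the server endpoint are hypothetical):

// Try the cache first; otherwise ask the server for the base64 encoding.
function loadImage(img, id) {
    var cached = localStorage.getItem('img-' + id);
    if (cached) {
        img.src = cached;  // already a complete data URI
        return;
    }
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/image-base64?id=' + id, true);  // hypothetical endpoint
    xhr.onload = function () {
        var uri = 'data:image/png;base64,' + xhr.responseText;
        localStorage.setItem('img-' + id, uri);  // cache for next time
        img.src = uri;
    };
    xhr.send();
}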
Problem is IE6 and IE7 don't support it. There is a workaround, as described here, but that involves a server side CSS file to contain the image data. Since images will be generated on the fly, that won't serve our purpose. How else can I achieve this in IE6 and IE7?
Alternatively, don't try to cache anything client side. Cache the generated images on the server side and host them like normal images; then you don't need localStorage at all.
In other words (a sketch follows this list):
generate image server side using your script
cache it somewhere like /httpdocs/cache/images/whatever-hash.jpg
serve the image in your document <img src="/cache/images/whatever-hash.jpg">
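A minimal sketch of that flow, shown in Node.js for illustration (the paths and the generator stub are hypothetical; the same pattern works in any server stack):

var fs = require('fs');

// Stand-in for the expensive generation step described above (hypothetical).
function generateImage(hash) {
    return new Buffer('...');  // pretend this is real JPEG data
}

// Generate once, then serve from the cache directory forever after.
function getCachedImageUrl(hash) {
    var file = '/httpdocs/cache/images/' + hash + '.jpg';
    if (!fs.existsSync(file)) {
        fs.writeFileSync(file, generateImage(hash));
    }
    return '/cache/images/' + hash + '.jpg';  // value for the <img> src
}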
If generating an image takes 5 seconds and you have 120 concurrent users requesting 100 unique pages and your server script can only handle processing 4 threads at any given time that comes out to
5 seconds x (120 /4) / 60 = 2.5 minutes of server processing time before the last user in the queue's image is served and the data stored in localstorage.
The same time will be spent if all users request the exact same page: there is no real benefit from caching per user, since every user still has to ask the server to generate their own image. Also, since localStorage will get invalidated often, the more the user does, the more they will feel a considerably slow user experience and, in my opinion, bail on your app.
Caching the file on the server will have many more benefits IMHO. Sure it takes up server storage space but these days it's rather cheap and you can get a cloud CDN (example www.maxcdn.com) to combat the load.
If you still decide you need to cache client side even though IE6/IE7 don't support localStorage or data URIs, check out the following:
You'll need a Web Storage shim for IE6/IE7 there's a list at https://github.com/Modernizr/Modernizr/wiki/HTML5-Cross-Browser-Polyfills
You'll need a way to store the generated image as blob temporarily and stick it in the storage. Example: http://robnyman.github.com/html5demos/localstorage/
You could also use a Canvas Shim and a toBlob shim: https://github.com/eligrey/canvas-toBlob.js/
Set the headers to inform the browser that the resource is cacheable:
header("Last-Modified: " . date("D, d M Y H:i:s", getlastmod()));
in PHP
or
Response.Cache.SetLastModified(DateTime.Now);
in .net
This way the browser will cache the resource.

Extracting audio bit and sampling rate information in JS before upload

I am building an application that allows authenticated users to use a Web browser to upload MP3 audio files (of speeches) to a server, for distributing the audio on a network. The audio files need to use a specific bit rate (32kbps or less) to ensure efficient use of bandwidth, and an approved sampling rate (22.050 or 44.100) to maximize compatibility. Rather than validate these requirements following the upload using a server-side script, I was hoping to use HTML5 FileReader to determine this information prior to the upload. If the browser detects an invalid bit rate and/or sampling rate, the user can be advised of this, and the upload attempt can be blocked, until necessary revisions are made to the audio file.
Is this possible using HTML5? Please note that the question is regarding HTML5, not about my application's approach. Can HTML5 detect the sampling rate and/or bit rate of an MP3 audio file?
FYI note: I am using an FTP java applet to perform the upload. The applet is set up to automatically forward the user to a URL of my choosing following a successful upload. This puts the heavy lifting on the client, rather than on the server. It's also necessary because the final destination of each uploaded file is different; they can be on different servers and different domains, possibly supporting different scripting languages on the server. Any one server would quickly exceed its storage space otherwise, or if the server-side script did an FTP transfer, the server's performance would quickly degrade as a single point of failure. So for my application, which stores uploaded audio files on multiple servers and multiple domains, validation of the bit rate and sampling rate must take place on the client side.
You can use the FileReader API and JavaScript-built audio codecs to extract this information from the audio files.
One library providing base code for pure JS codecs is Aurora.js; the actual codec code is built upon it:
https://github.com/audiocogs/aurora.js/wiki/Known-Uses
Naturally the browser must support FileReader API.
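As a rough illustration of what's possible without a full codec, the bitrate and sampling rate of an MPEG-1 Layer III file can be read straight from the first frame header. A minimal sketch (it ignores ID3v2 tags larger than the scanned window, VBR averaging, and other MPEG versions):

// Look-up tables from the MPEG-1 Layer III frame header specification.
var BITRATES = [0, 32, 40, 48, 56, 64, 80, 96, 112,
                128, 160, 192, 224, 256, 320];   // kbps, index 0 = "free"
var SAMPLE_RATES = [44100, 48000, 32000];        // Hz

function readMp3Info(file, callback) {
    var reader = new FileReader();
    reader.onload = function () {
        var b = new Uint8Array(reader.result);
        // Scan for the 11-bit frame sync: 0xFF, then top three bits set.
        for (var i = 0; i < b.length - 3; i++) {
            if (b[i] === 0xFF && (b[i + 1] & 0xE0) === 0xE0) {
                var version = (b[i + 1] >> 3) & 3;  // 3 means MPEG-1
                var layer = (b[i + 1] >> 1) & 3;    // 1 means Layer III
                if (version !== 3 || layer !== 1) continue;
                callback({
                    bitrateKbps: BITRATES[(b[i + 2] >> 4) & 0xF],
                    sampleRateHz: SAMPLE_RATES[(b[i + 2] >> 2) & 3]
                });
                return;
            }
        }
        callback(null);  // no frame header found in the scanned window
    };
    // The first 64 KB is usually enough to get past an ID3v2 tag.
    reader.readAsArrayBuffer(file.slice(0, 65536));
}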
I didn't understand from your use case why you need a Java applet or FTP. HTTP uploads work fine for multiple big files if done properly using an async backend (like Node.js or Python Twisted) and scalable storage (Amazon S3). A similar use case is resizing incoming images, which is a far more demanding application than extracting audio metadata from a file. The only benefit on the client side is to reduce the number of unnecessary uploads by not-so-technically-aware users.
Given that any user can change your script/markup to bypass this or even re-purpose it, I wouldn't even consider it.
If someone can change your validation script with a bit of HTML/JavaScript knowledge, don't rely on HTML/JavaScript. It's easier to make sure the file is validated, and validated correctly, by validating it on the server.

Loading external files for a non-hosted HTML5 application

I am currently developing an HTML5 game which loads in an external resource. Currently, I am using XMLHttpRequest to read in the file, but this does not work in Chrome, resulting in:
XMLHttpRequest cannot load file:///E:/game/data.txt
Cross origin requests are only supported for HTTP.
The file is in the same directory as the HTML5 file.
Questions:
Is there any way for an HTML5 application to use XMLHttpRequest (or another method) to load an external file without requiring it to be hosted on a webserver?
If I package the HTML5 code as an application on a tablet/phone which supports HTML5, would XMLHttpRequest be able to load external files?
(a) Yes and no. As a matter of security-policy, XHR has traditionally been both same-protocol (ie: http://, rather than file:///), and on top of that, has traditionally been same-domain, as well (as in same subdomain -- http://pages.site.com/index can't get a file from http://scripts.site.com/). Cross-domain requests are now available, but they require a webserver, regardless, and the server hosting the file has to accept the request specifically.
(b) So in a roundabout way, the answer is yes, some implementations might (incorrectly) allow you to grab a file through XHR, even if the page is speaking in file-system terms, rather than http requests (older versions of browsers)... ...but otherwise you're going to need a webserver of one kind or another. The good news is that they're dirt-simple to install. EasyPHP would be sufficient, and it's pretty much a 3-click solution. There are countless others as well. It's just the first that comes to mind in terms of brain-off installation, if all you want is a file-server in apache, and you aren't planning on using a server-side scripting language (or if you do plan on using PHP).
XMLHttpRequest would absolutely be able to get external files...
IF they're actually external (ie: not bundled in a phone-specific cache -- use the phone's built-in file-access API for that, and write a wrapper to handle each one with the same, custom interface), AND the phone currently has reception -- be prepared to handle failure-conditions (like having a default-settings object, or having error-handling or whatever the best-case is, for whatever is missing).
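A minimal sketch of that kind of defensive loading (the URL and the default-settings object are hypothetical):

var DEFAULT_SETTINGS = { difficulty: 'normal', sound: true };

function loadSettings(callback) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'http://example.com/game/data.txt', true);
    xhr.onload = function () {
        // Server reachable, but the resource may still be missing.
        callback(xhr.status === 200 ? xhr.responseText : null);
    };
    // No reception, no server, or a refused cross-origin request land here.
    xhr.onerror = function () { callback(null); };
    xhr.send();
}

loadSettings(function (text) {
    // Assumes the file contains JSON; fall back to defaults on failure.
    var settings = text ? JSON.parse(text) : DEFAULT_SETTINGS;
    // ...continue game startup with settings...
});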
Also, look into Application Cache Manifests. Again, this is an HTML5 solution which different versions of different phones handle differently (early days versus more standardized formats). DO NOT USE IT DURING DEVELOPMENT, AS IT MAKES TESTING CODE/CONTENT CHANGES MISERABLY SLOW AND PAINFUL. But it's useful when your product is pretty much finished and bug-free, seconds away from launch: you tell users' browsers to cache all of the content for eternity, so that they can play offline and save all kinds of bandwidth by not having to download everything the next time they play.
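For reference, a minimal sketch of a manifest (the file names are hypothetical):

CACHE MANIFEST
# v1 - bump this comment to force clients to re-download everything
CACHE:
index.html
game.js
data.txt

The page opts in by pointing at it from the root element: <html manifest="game.appcache">.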

Quick question regarding CSS sprites and memory usage

Well, it's more to do with images and memory in general. If I use the same image multiple times on a page, will each image be consolidated in memory? Or will each image use a separate amount of memory?
I'm concerned about this because I'm building a skinning system for a Windows Desktop Gadget, and I'm looking at spriting the images in the default skin so that I can keep the file system looking clean. At the same time I want to try and keep the memory footprint to a minimum. If I end up with a single file containing 100 images and re-use that image 100 times across the gadget I don't want to have performance issues.
Cheers.
What about testing it? Create a simple application with and without spriting, and monitor your windows memory to see which approach is better.
I suggest testing it because of this interesting post from Vladimir, endorsed by Mozilla's "use sprites wisely" entry:
(...) where this image is used as a sprite. Note that this is a 1299x15,000 PNG.
It compresses quite well — the actual download size is around 26K - but browsers
don't render compressed image data. When this image is downloaded and
decompressed, it will use almost 75MB in memory (1299 * 15000 * 4).
(At the end of Vladimir's post there are some other great references to check)
Since I don't know how Windows renders its gadgets (and whether it handles compressed image data), it's difficult, IMHO, to say exactly which approach is better without testing.
EDIT: The official Windows Desktop blog (not updated since 2007) says the HTML runtime used for Windows Gadgets is MSHTML, so I think a test is really needed to know how your application would handle the CSS sprites.
However, if you read some of the official Windows Desktop Gadgets and Windows Sidebar documentation, there's an interesting point about your decision not to use CSS sprites, in the "The GIMAGE Protocol" section:
This protocol is useful for adding images to the gadget DOM more efficiently than the standard HTML <img> tag. This efficiency results from improved thumbnail handling and image caching (it will attempt to use thumbnails from the Windows cache if the requested size is smaller than 256 pixels by 256 pixels) when compared with requesting an image using the file:// or http:// protocols. An added benefit of the gimage protocol is that any file other than a standard image file can be specified as a source, and the icon associated with that file's type is displayed.
I would try to use this protocol instead of CSS sprites and do some testing too.
If none of this information would help you, I would try to ask at Windows Desktop Gadgets official forums.
Good luck!
The image will show up one time in the cache (as long as the url is the same and there's no query string appended to the file name). Spriting is the way to go.
Web browsers identify cacheable resources by their ETag response header. If it is absent or differs among requests, then the image may be downloaded and stored in the cache multiple times. If you (actually, the webserver) supply a unique and consistent ETag header for each unique resource, then any decent web browser is smart enough to keep one copy in cache and reuse it for as long as its Expires header allows.
Any decent webserver will supply the ETag header automatically for static resources; it is often autogenerated from a combination of the local filename, the file length, and the last-modified timestamp. But servers often don't add the Expires header, so you need to add it yourself. Judging by your post history here at Stack Overflow, I can safely assume that you're familiar with Apache HTTPD as a web server, so I'd suggest having a look at the mod_expires documentation to learn how to configure it to an optimum.
In a nutshell, serve the sprite image along with an ETag and a far future Expires header and it'll be okay.
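For example, a minimal mod_expires snippet for httpd.conf or .htaccess (the one-year lifetime is just an illustrative choice):

ExpiresActive On
ExpiresByType image/png "access plus 1 year"
ExpiresByType image/jpeg "access plus 1 year"
ExpiresByType image/gif "access plus 1 year"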

Pros and Cons of a separate image server (e.g. images.mydomain.com)?

We have several images and PDF documents that are available via our website. These images and documents are stored in source control and are copied content on deployment. We are considering creating a separate image server to put our stock images and PDF docs on - thus significantly decreasing the bulk of our deployment package.
Does anyone have experience with this approach?
I am wondering about any "gotchas" - like XSS issues and/or browser issues delivering content from the alternate sub-domain?
Pro:
Many browsers will only allocate two sockets to downloading assets from a single host. So if index.html is downloaded from www.domain.com and it references 6 image files, 3 JavaScript files, and 3 CSS files (all on www.domain.com), the browser will download them 2 at a time, with the others blocking until a socket is free.
If you pull the 6 image files off onto a separate host, say images.domain.com, you get an extra two sockets dedicated to download your images. This parallelizes the asset download process so, in theory, your page could render twice as fast.
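In markup terms, the change is trivial (the host names below are the hypothetical ones from above):

<!-- before: competes with scripts and CSS for www.domain.com's two sockets -->
<img src="http://www.domain.com/img/logo.png" alt="logo" />

<!-- after: downloads on images.domain.com's own pair of sockets -->
<img src="http://images.domain.com/img/logo.png" alt="logo" />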
Con:
If you're using SSL, you would need to either get an additional single-host SSL certificate for images.domain.com or a wildcard SSL certificate for *.domain.com (matches any subdomain). Failure to do so will generate a warning in the browser saying the page contains mixed secure and insecure content.
With a different domain, you will also avoid sending cookie data with every request, which can improve performance.
Another thing not yet mentioned is that you can use different web servers to serve different sorts of content. For example, your static content could be served via lighttpd or nginx while still serving your dynamic content off Apache.
Pros:
- load balancing
- isolating different functionality
Cons:
- more work (when you create a page on the main site, you would have to maintain the resources on the separate server)
Things like XSS are a problem of code not sanitizing input (or output, for that matter). The only issue that could arise is if you have sub-domain-specific cookies that are used for authentication... but that's really a trivial fix.
If you're serving HTTPS and you serve an image from an HTTP domain, then browser security warnings will pop up when the page is used.
So if you do HTTPS, you'll need to buy HTTPS for your image domain as well if you don't want to annoy the hell out of your users :)
There are other ways around this, but it's not particularly in the scope of this answer - it was just a warning!