I am currently developing a web app. It contains images which are generated dynamically on the server (and thus take some time to appear after being requested) and then dished out. So I thought I would use the HTML5 local-storage API to cache the images, so that on subsequent requests for the same image it can be served instantly. For that, I plan to use the base64 encoding of the image as the source instead of a source URL.
Instead of requesting the image from the server, the JS will now first check whether that image's data is already available in local storage (say an image with attribute 123 is stored in local storage with 123 as the key and the base64 encoding as the value). If yes, it just sets the image's source to the value obtained from there. Otherwise it asks the server to send the encoding and, upon receiving it, stores it in the cache.
The problem is that IE6 and IE7 don't support it. There is a workaround, as described here, but it involves a server-side CSS file containing the image data. Since the images will be generated on the fly, that won't serve our purpose. How else can I achieve this in IE6 and IE7?
Alternatively, don't try to cache anything client-side. Cache the generated images on the server side and host those images like normal. You don't need localStorage or client-side caching for this (a rough sketch follows the list below).
In other words:
generate image server side using your script
cache it somewhere like /httpdocs/cache/images/whatever-hash.jpg
serve the image in your document <img src="/cache/images/whatever-hash.jpg">
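A minimal sketch of those three steps, assuming a Node.js/Express server and a hypothetical generateImage() function standing in for your own generator script:

const express = require("express");
const fs = require("fs");
const path = require("path");

const app = express();
const CACHE_DIR = path.join(__dirname, "httpdocs", "cache", "images");

app.get("/cache/images/:name", (req, res) => {
  const cached = path.join(CACHE_DIR, path.basename(req.params.name));
  if (fs.existsSync(cached)) {
    return res.sendFile(cached); // cache hit: serve the file generated earlier
  }
  // cache miss: run the slow generation step once, store the result, then serve it
  generateImage(req.params.name, (err, buffer) => {
    if (err) return res.sendStatus(500);
    fs.writeFile(cached, buffer, () => res.type("jpg").send(buffer));
  });
});

Every request after the first one is then a plain static-file hit, which the browser can additionally cache via normal HTTP headers.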
If generating an image takes 5 seconds, you have 120 concurrent users requesting 100 unique pages, and your server script can only process 4 threads at any given time, that comes out to
5 seconds x (120 / 4) / 60 = 2.5 minutes of server processing time before the last user in the queue gets their image and the data is stored in localStorage.
The same time would be spent even if all users requested the exact same page. There would be no real benefit from caching per user, since every user still has to ask the server to generate their own image. Also, since localStorage will get invalidated often, the more the user does the more they will feel a considerably slow experience and, in my opinion, bail on your app.
Caching the file on the server will have many more benefits IMHO. Sure, it takes up server storage space, but these days that's rather cheap, and you can put a cloud CDN in front (for example www.maxcdn.com) to combat the load.
If you still decide you need to cache client side (because IE6/IE7 don't support localStorage or data URIs), check out the following:
You'll need a Web Storage shim for IE6/IE7; there's a list at https://github.com/Modernizr/Modernizr/wiki/HTML5-Cross-Browser-Polyfills
You'll need a way to store the generated image as blob temporarily and stick it in the storage. Example: http://robnyman.github.com/html5demos/localstorage/
You could also use a Canvas Shim and a toBlob shim: https://github.com/eligrey/canvas-toBlob.js/
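Putting those pieces together, here is a rough sketch of the flow described in the question (the /image-data endpoint name and the key prefix are assumptions, and a Web Storage shim is assumed to be loaded for IE6/IE7):

function loadImage(imgElement, imageId) {
  var key = "img-" + imageId;
  var cached = null;
  try { cached = localStorage.getItem(key); } catch (e) { /* storage unavailable */ }
  if (cached) {
    imgElement.src = cached; // cache hit: reuse the stored data URI
    return;
  }
  var xhr = new XMLHttpRequest();
  xhr.open("GET", "/image-data?id=" + encodeURIComponent(imageId));
  xhr.onload = function () {
    var dataUri = "data:image/png;base64," + xhr.responseText;
    imgElement.src = dataUri;
    try { localStorage.setItem(key, dataUri); } catch (e) { /* quota exceeded: skip caching */ }
  };
  xhr.send();
}

Note that IE6 also needs the ActiveX fallback for XMLHttpRequest, so in practice you would route the request through whatever Ajax wrapper you already use.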
Set the headers to inform the browser that the resource is cached:
header("Last-Modified: " . date("D, d M Y H:i:s", getlastmod()));
in PHP
or
Response.Cache.SetLastModified(DateTime.Now);
in .NET
This way the browser will cache the resource.
I am hosting a website where a couple of PDF-files which are currently in Object-tags are updated weekly.
The name of these PDF-files stay the same, but the data changes.
Currently I'm using:
<object id="men" data="seasons/S2223/Men2023.pdf?" type="application/pdf" width="100%" height="750px">
<p>The file could not be read in the browser
Click here to download
</p>
</object>
When I update the PDF, I expect
data="seasons/S2223/Men2023.pdf?"
to load the latest PDF; however, it stays the same as before.
I added the ? at the end of the filename which should check for the latest version but it doesn't seem to work.
When I clear my browser's cache it's updated, but of course this isn't a suitable option for users.
All help is appreciated.
Caching, in this context, is where the browser has loaded the data from a URL in the past and still has a local copy of it. To speed things up and save bandwidth, it uses its local copy instead of asking the server for a fresh copy.
If you want the browser to fetch a fresh copy then you need to do something to make it think that the copy it has in the cache is no good.
Cache Busting Query String
You are trying to use this approach, but it isn't really suitable for your needs and your implementation is broken.
This technique is designed for resources that change infrequently and unpredictably, such as the stylesheet for a website. (Since your resources change weekly, this isn't a good option for you.)
It works by changing the URL to the resource whenever the resource changes. This means that the URL doesn't match the one the browser has cached data for. Since the browser doesn't know about the new URL it has to ask for it fresh.
Since the query string you appended is hardcoded and never changes between uploads, it defeats the purpose.
Common approaches are to set the value of the query to a time stamp or a checksum of the file. (This is usually done with the website's build tool as part of the deployment process.)
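For illustration, a hedged sketch of what that looks like for your <object> tag; the ?v= value here is a hypothetical deploy timestamp that your upload step would rewrite each week, not something the browser computes for you:

<!-- the v= value must change whenever the PDF changes -->
<object id="men" data="seasons/S2223/Men2023.pdf?v=20230501T0900"
        type="application/pdf" width="100%" height="750px">
</object>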
Cache Control Headers
HTTP provides mechanisms to tell the browser when it should get a new copy. There are a variety of headers and I encourage you to read this Caching Tutorial for Web Authors and Webmasters as it covers the topic well.
Since your documents expire weekly, I think the best approach for you would be to set an Expires header on the HTTP resource for the PDF's URL.
You could programmatically set it to (for example) one hour after the time a new version is expected to be uploaded.
How you go about this would depend on the HTTP server and/or server-side programming capabilities of the host where you deploy the PDF.
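For example, a minimal sketch assuming a Node.js/Express host (the route, file path, and upload schedule are assumptions):

const path = require("path");

// Serve the PDF with an Expires header pointing one hour past the next
// expected weekly upload (here assumed to be Mondays at 09:00 UTC).
app.get("/seasons/S2223/Men2023.pdf", (req, res) => {
  const expires = new Date();
  const daysUntilMonday = (8 - expires.getUTCDay()) % 7 || 7;
  expires.setUTCDate(expires.getUTCDate() + daysUntilMonday);
  expires.setUTCHours(10, 0, 0, 0); // 09:00 upload + one hour of slack
  res.set("Expires", expires.toUTCString());
  res.sendFile(path.join(__dirname, "seasons/S2223/Men2023.pdf"));
});

On Apache or IIS you would achieve the same with mod_expires directives or web.config clientCache settings, respectively.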
I have a website that is used as a kiosk app. When online, I preload data and images from a WordPress API and store the images in the Cache Storage.
A service worker intercepts HTTP GETs for these images and serves the cached data instead. This way, the app can run offline (API calls included).
But after a few hours running offline (generally around 6 hours), some images disappear from the Cache Storage. And it's always the same ones, but not all of them.
Any idea where I should check to see what's going wrong?
There can be two possible reasons for such clearing of the cache: either your storage gets full (Chrome local storage), so it gets cleared, or the data sent from the server has an expiry; check the response headers, which might give you insight into how long it takes to expire.
And to check whether the data is being evicted from storage, try testing your website in Safari or the Edge browser, where such eviction does not occur.
How did you configure Cache-Control? The entries that are constantly deleted might be configured with max-age.
It's still recommended that you configure the Cache-Control headers on your web server, even when you know that you're using the Cache Storage API. You can usually get away with setting Cache-Control: no-cache for unversioned URLs, and/or Cache-Control: max-age=31536000 for URLs that contain versioning information, like hashes.
as stated at https://web.dev/service-workers-cache-storage
Also, you should check how much Cache Storage is being used (https://developers.google.com/web/tools/chrome-devtools/storage/cache). Even though they claim that either all caches are deleted or none of them are, since this is currently an experimental technology it is normal to be suspicious about this point.
You are also responsible for periodically purging cache entries. Each browser has a hard limit on the amount of cache storage that a given origin can use. Cache quota usage estimates are available via the StorageEstimate API. The browser does its best to manage disk space, but it may delete the Cache storage for an origin. The browser will generally delete all of the data for an origin or none of the data for an origin. Make sure to version caches by name and use the caches only from the version of the script that they can safely operate on. See Deleting old caches for more information.
as stated at https://developer.mozilla.org/en-US/docs/Web/API/Cache
Service Workers use the Storage API to cache assets.
There are fixed quotas available, and they differ from browser to browser.
You can get some more hints from your app using the following call and analysing the results:
navigator.storageQuota.queryInfo("temporary")
  .then(function (info) {
    console.log(info.quota); // the quota in bytes
    console.log(info.usage); // the used data in bytes
  });
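Note that storageQuota.queryInfo comes from the older Quota Management API and may not be exposed in current browsers; a sketch of the equivalent check with the standard StorageEstimate API (mentioned in the MDN quote above) looks like this:

if (navigator.storage && navigator.storage.estimate) {
  navigator.storage.estimate().then(function (estimate) {
    console.log(estimate.quota); // total bytes the origin may use
    console.log(estimate.usage); // bytes currently used
  });
}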
UPDATE 1
Do you have anywhere a cleanup method or logic that removes old cache entries and that might somehow be triggered unexpectedly? Something like the code below.
caches.keys().then(function (keyList) {
  return Promise.all(keyList.map(function (key) {
    // cacheName holds the name of the cache version currently in use
    if (key !== cacheName) {
      return caches.delete(key);
    }
  }));
});
A couple of more questions:
Do you have any business logic in the SW dealing with caching the images?
Could you get any info by using the storageQuota methods to see the available quota and usage? This might be very useful to understand whether the images are evicted because the limit is being reached. (Even by marking a box as persistent, you are not 100% sure the assets are retained.)
From "Building PWAs" (o'Reilly - Tal Alter):
Each browser behaves differently when it comes to managing CacheStorage, allocating space in the cache for each site, and clearing out old cache entries.
In addition to a storage limit per site (also known as an origin), most browsers will also set a size limit for their entire cache. When the cache passes this limit, the browser will delete the caches of the site accessed the longest time ago (also known as the least recently used).
Browsers will never delete only part of your site’s cache. Either your site’s entire cache will be deleted or none of it will be. This ensures your site never has an unpredictable partial cache.
Considering the last words, "Browsers will never delete only part of your site's cache", makes me think that maybe it is not a problem of the cache limit being reached, as otherwise all images would be deleted from the cache.
If interested, I wrote an article about this topic here.
You could mark your data storage unit (aka "box") as "persistent", making the user agent retain it as long as possible (more details in the Storage API link above).
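A hedged sketch of requesting persistence with the standard Storage API; the browser may still decline, so treat the result as advisory:

if (navigator.storage && navigator.storage.persist) {
  navigator.storage.persist().then(function (granted) {
    // granted === true means the origin's storage should survive storage pressure
    console.log(granted ? "Storage marked as persistent" : "Storage remains best-effort");
  });
}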
I have the following problem: there is a clock on my webpage, and if a user goes to another page and then presses back, the clock is desynchronized (it should show the server's time). I have mechanisms to synchronise the clock, but I'm wondering how I should decide when to fire them (they should be used as rarely as possible, as they are expensive). So is there a widely used way to detect whether the user is viewing a cached version of the page? One way I thought of is looking at the user's local time: if there is a sudden jump between the time registered previously and the time registered now, I can fire the mechanism. But is there a simpler, generic way (for example, maybe the browser sends some message to the webpage to communicate this)?
It sounds like you're allowing the client or server to do a full-page cache, which you don't want because parts of the page rely on the current server date and time.
You certainly want to remove client caching from your response headers for the main response, especially if you have any sort of authentication that you need to check before rendering the page; you will want to make sure the client is still logged in, for example.
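As a minimal sketch, assuming a Node.js/Express backend (the route and view names are placeholders), the response headers that prevent the browser from reusing a stale copy look like this:

app.get("/clock-page", (req, res) => {
  res.set({
    "Cache-Control": "no-store, no-cache, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0"
  });
  res.render("clock-page"); // re-rendered on every request, so the clock markup is fresh
});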
You will also want to remove any full response caching from the server. You should investigate server partial or donut caching which allows you to cache certain parts of the response (so it doesn't need to work out static data that won't change; for example the navigation or footer) but will go off and get new results for the parts of your page that should change on each request.
The date time display and all other reliant parts of the response will be updated, but static parts will come from the server cache.
A good (and easier) example of this problem is a page that displays a random image on reload. If the page/response is cached on the server, this image will remain the same for each request. The server (code) would need to be set up to cache everything about the page apart from the image, which would be updated by the server. The server (code) will therefore only need to work out the new image that needs to be applied to the response, as all other parts of the page come from the cache on the server.
It's also worth noting that server cache is global. If you cache a particular response and another user then requests the same page, they will get the same response as the other user. You may need to vary the cache by certain parameters depending on the needs of your system.
I'm implementing XmlHttpRequest and FormData for a new application on our site, and there's a concern that some customers may try to upload tens, or possibly hundreds, of thousands of documents at a time. Before I go through the painstaking effort of testing these conditions, I was hoping that someone knew whether there is (1) a limit on the number of files that can be uploaded at once, and/or (2) a file size limit in any of the modern browsers that have implemented these features.
The limit on the file size depends on the data type used by the browser to store the content length (usually a signed or unsigned int, i.e. around 2 GB or 4 GB). You can also impose a maximum file size in your server-side code by checking the content length and rejecting the upload if it exceeds a certain threshold.
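A short sketch of that server-side check, assuming a Node.js/Express backend and an arbitrary 50 MB threshold:

const MAX_BYTES = 50 * 1024 * 1024; // example limit, tune to your needs

app.post("/upload", (req, res, next) => {
  const length = parseInt(req.headers["content-length"] || "0", 10);
  if (length > MAX_BYTES) {
    return res.status(413).send("Payload too large"); // reject before reading the body
  }
  next(); // hand off to the actual upload handler
});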
The number of files which can be uploaded at once depends on the maximum number of connections allowed per domain. See this question for more info. So, all files will be uploaded, n at a time.
UPDATE:
I did some testing as I wasn't sure whether the browser streams the files from the file system straight into the TCP connection or buffers them in memory.
If I try to upload a very big file from Chrome or Firefox through Fiddler, the memory usage of the browser process on my machine doesn't go up, while the Fiddler process grows to 2 GB before it crashes (it's a 32-bit application).
So, if on the server you stream the content of the files directly into your data store, I don't think you will run into memory issues on either the client or the server.
Bear in mind that most web servers need to be configured to accept very large requests.
Also, make sure you use one XmlHttpRequest instance per file. If you upload everything through a single connection, the Content-Length will be set to the total size of all files.
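For illustration, a rough sketch of the one-request-per-file approach (the /upload endpoint and field name are assumptions):

function uploadFiles(fileList) {
  Array.prototype.forEach.call(fileList, function (file) {
    var form = new FormData();
    form.append("document", file, file.name); // each request carries exactly one file
    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/upload");
    xhr.onload = function () {
      console.log(file.name + " finished with status " + xhr.status);
    };
    xhr.send(form); // the browser queues requests per domain, n at a time
  });
}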
Also, bear in mind that allowing big files to be uploaded will make your server vulnerable to DoS attacks (you probably know this already, but I thought it was worth mentioning).
Basically the question is in the title.
Many people have asked on Stack Overflow how to create a data URI and about the problems they ran into.
My question is why use data URI?
What are the advantages to doing:
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
Over doing:
<img src="dot.png" alt="Red dot" />
I understand one has less overhead on the server side (maybe), but what are the real advantages/disadvantages of using data URIs?
According to Wikipedia:
Advantages:
- HTTP request and header traffic is not required for embedded data, so data URIs consume less bandwidth whenever the overhead of encoding the inline content as a data URI is smaller than the HTTP overhead. For example, the required base64 encoding for an image 600 bytes long would be 800 bytes, so if an HTTP request required more than 200 bytes of overhead, the data URI would be more efficient.
- For transferring many small files (less than a few kilobytes each), this can be faster. TCP transfers tend to start slowly. If each file requires a new TCP connection, the transfer speed is limited by the round-trip time rather than the available bandwidth. Using HTTP keep-alive improves the situation, but may not entirely alleviate the bottleneck.
- When browsing a secure HTTPS web site, web browsers commonly require that all elements of a web page be downloaded over secure connections, or the user will be notified of reduced security due to a mixture of secure and insecure elements. On badly configured servers, HTTPS requests have significant overhead over common HTTP requests, so embedding data in data URIs may improve speed in this case.
- Web browsers are usually configured to make only a certain number (often two) of concurrent HTTP connections to a domain, so inline data frees up a download connection for other content.
- Environments with limited or restricted access to external resources may embed content when it is disallowed or impractical to reference it externally. For example, an advanced HTML editing field could accept a pasted or inserted image and convert it to a data URI to hide the complexity of external resources from the user. Alternatively, a browser can convert (encode) image-based data from the clipboard to a data URI and paste it in an HTML editing field. Mozilla Firefox 4 supports this functionality.
- It is possible to manage a multimedia page as a single file. Email message templates can contain images (for backgrounds or signatures) without the image appearing to be an "attachment".
Disadvantages:
- Data URIs are not separately cached from their containing documents (e.g. CSS or HTML files), so data is downloaded every time the containing documents are redownloaded. Content must be re-encoded and re-embedded every time a change is made.
- Internet Explorer through version 7 (approximately 15% of the market as of January 2011) lacks support. However, this can be overcome by serving browser-specific content.
- Internet Explorer 8 limits data URIs to a maximum length of 32 KB.
- Data is included as a simple stream, and many processing environments (such as web browsers) may not support using containers (such as multipart/alternative or message/rfc822) to provide greater complexity such as metadata, data compression, or content negotiation.
- Base64-encoded data URIs are 1/3 larger in size than their binary equivalent. (However, this overhead is reduced to 2-3% if the HTTP server compresses the response using gzip.)
- Data URIs make it more difficult for security software to filter content.
According to other sources:
- Data URLs are significantly slower on mobile browsers.
A good use of data URIs is allowing the download of content that has been generated client side, without resorting to a server-side 'proxy'. Here are some examples I can think of (a sketch of the CSV case follows the list):
saving the output of a canvas element as an image.
offering download of a table as CSV
downloading output of any kind of online editor (text, drawing, CSS code ...etc)
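A minimal sketch of the CSV case, building the data URI entirely in the browser (function and file names are placeholders):

function downloadCsv(rows, filename) {
  var csv = rows.map(function (row) { return row.join(","); }).join("\n");
  var link = document.createElement("a");
  link.href = "data:text/csv;charset=utf-8," + encodeURIComponent(csv);
  link.download = filename; // the download attribute is not supported by older IE
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
}

// usage: downloadCsv([["name", "score"], ["alice", "10"]], "table.csv");

No server round-trip is involved; the whole file lives in the link's href.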
Mainly I find this useful when I can't (for some reason) use CSS sprites and I don't want to download every single little image that I will use for styling.
Or when, for some reason, you don't want anyone to hotlink the image from an external page. This can be achieved by other methods as well, but embedding works too.
Otherwise, I personally wouldn't encode large images such as photos. It's better to host your media on a different server, one that can do without all the web-server-related software and simply delivers media. That's a much better use of resources.
I have used the data URI scheme in several (C++, Python) command-line applications to generate reports which include data plots.
Having a single file is quite convenient to send the reports by email (or move them around in general). Compared to PDF, I did not need an additional library (other than a base64 encoding routine) and I don't need to take care of page breaks (and I almost never need to print these reports). I usually don't put these reports on a web server, just view them on the local filesystem with a browser.
I agree with BiAiB that the real value of Data URIs is making client-side generated content available as file download without any need for server round-trips.
A working example of using Data URIs for "offering download of a table as CSV" is described on my blog.
IMHO, the embedding of image (or other binary resource) data into an HTML file for performance reasons is a red herring. The speed gain due to less HTTP connections is negligible and breaks the nice principle of separation between (textual) markup and binary resources (image files, videos, etc.).
I think HTTP 1.1 pipelining and some suggested improvements to HTTP are a cleaner and better way to handle HTTP network speed issues.