Chrome - Images disappearing from cache storage after few hours - google-chrome

I have a website that is used as a kiosk app. When online, I preload data and images from a wordpress API, and store the images in the cache storage.
A service worker intercepts http gets to these images and serves the cache data instead. Like this, the app can run offline (API calls included).
But after a few hours of running offline (generally around 6 hours), some images disappear from the cache storage. It's always the same ones, but not all of them.
Any idea where I should check to see what's going wrong?
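One way to narrow this down is to record periodically which of the preloaded image URLs are still present in the cache. The following is a minimal sketch, not the asker's code: the helper name `findMissingEntries`, the cache name, and the URL are assumptions for illustration. It relies only on `caches.open()` and `cache.match()` from the Cache Storage API.

```javascript
// Hypothetical helper: report which expected URLs are absent from a named cache.
// `cacheStorage` is expected to behave like the browser's global `caches` object.
async function findMissingEntries(cacheStorage, cacheName, expectedUrls) {
  // Note: open() creates the cache if it does not exist yet.
  const cache = await cacheStorage.open(cacheName);
  const missing = [];
  for (const url of expectedUrls) {
    // cache.match() resolves to undefined when no entry matches the request.
    const response = await cache.match(url);
    if (!response) {
      missing.push(url);
    }
  }
  return missing;
}

// Browser usage (cache name and URL are placeholders):
// findMissingEntries(caches, "kiosk-images", ["/wp-content/uploads/a.jpg"])
//   .then((missing) => console.log(new Date().toISOString(), missing));
```

Logging the result with a timestamp every few minutes would show whether the entries vanish all at once or one by one.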

There are two likely reasons for the cache being cleared: either the storage available to your origin in Chrome fills up and gets cleared, or the data sent from the server carries an expiry time. Check the response headers for hints about how long it takes the data to expire.
To check whether the data is being evicted by the browser, try the same test in another browser such as Safari or Edge and compare the behavior, since eviction policies differ between browsers.

How did you configure Cache-Control? The images that keep getting deleted might be served with a max-age directive.
It's still recommended that you configure the Cache-Control headers on
your web server, even when you know that you're using the Cache
Storage API. You can usually get away with setting Cache-Control:
no-cache for unversioned URLs, and/or Cache-Control: max-age=31536000
for URLs that contain versioning information, like hashes.
as stated at https://web.dev/service-workers-cache-storage
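To see which Cache-Control value each cached image was stored with, the cached responses can be inspected directly. This is a rough sketch under assumptions: `parseMaxAge` and `listMaxAges` are hypothetical helper names, and the cache name in the usage comment is a placeholder. Opaque (no-cors, cross-origin) responses do not expose their headers, so `maxAge` will be null for those.

```javascript
// Extract the max-age value (in seconds) from a Cache-Control header value.
// Returns null when the header is absent or has no max-age directive.
function parseMaxAge(cacheControl) {
  if (!cacheControl) {
    return null;
  }
  const match = /(?:^|,)\s*max-age\s*=\s*(\d+)/i.exec(cacheControl);
  return match ? Number(match[1]) : null;
}

// List every entry of a named cache with the max-age of its stored response.
// `cacheStorage` is expected to behave like the browser's global `caches` object.
async function listMaxAges(cacheStorage, cacheName) {
  const cache = await cacheStorage.open(cacheName);
  const requests = await cache.keys();
  const rows = [];
  for (const request of requests) {
    const response = await cache.match(request);
    rows.push({
      url: request.url,
      maxAge: parseMaxAge(response.headers.get("Cache-Control")),
    });
  }
  return rows;
}

// Browser usage (cache name is a placeholder):
// listMaxAges(caches, "kiosk-images").then((rows) => console.table(rows));
```

Comparing the rows for images that disappear with those that stay may show whether the headers differ at all.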
Also, you should check how much cache storage is in use (https://developers.google.com/web/tools/chrome-devtools/storage/cache). Even though the documentation claims that either all caches for an origin are deleted or none of them, this is still an experimental technology, so it is reasonable to be suspicious about this point.
You are also responsible for periodically purging cache entries. Each
browser has a hard limit on the amount of cache storage that a given
origin can use. Cache quota usage estimates are available via the
StorageEstimate API. The browser does its best to manage disk space,
but it may delete the Cache storage for an origin. The browser will
generally delete all of the data for an origin or none of the data for
an origin. Make sure to version caches by name and use the caches only
from the version of the script that they can safely operate on. See
Deleting old caches for more information.
as stated at https://developer.mozilla.org/en-US/docs/Web/API/Cache

Service Workers use the Storage API to cache assets.
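The quotas are limited and differ from browser to browser.
You can get some more hints from your app using the following call and analysing the results (navigator.storage.estimate() is the standardized way to query this; the navigator.storageQuota.queryInfo("temporary") call from the older Quota Management API draft is generally not available in current browsers):
navigator.storage.estimate()
  .then(function (estimate) {
    // Quota available to the origin, in bytes
    console.log(estimate.quota);
    // Storage currently used by the origin, in bytes
    console.log(estimate.usage);
  });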
UPDATE 1
Do you have anywhere a cleanup method or logic that removes old cache entries and eventually is triggered somehow unexpected? Something like the code below.
caches.keys().then(function (keyList) {
  return Promise.all(keyList.map(function (key) {
    // Delete every cache except the current one
    if (key !== cacheName) {
      return caches.delete(key);
    }
  }));
});
A couple more questions:
Do you have any business logic in the SW dealing with caching the images?
Could you get any info by using the storage quota methods to see the available quota and usage? This might be very useful to understand whether the images are evicted because the limit was reached. (Even when a box is marked as persistent, you cannot be 100% sure the assets are retained.)
From "Building Progressive Web Apps" (O'Reilly, Tal Ater):
Each browser behaves differently when it comes to managing CacheStorage, allocating space in the cache for each site, and clearing out old cache entries.
In addition to a storage limit per site (also known as an origin), most browsers will also set a size limit for their entire cache. When the cache passes this limit, the browser will delete the caches of the site accessed the longest time ago (also known as the least recently used).
Browsers will never delete only part of your site’s cache. Either your site’s entire cache will be deleted or none of it will be. This ensures your site never has an unpredictable partial cache.
The last statement, "Browsers will never delete only part of your site’s cache.", makes me think that this may not be a problem of the cache limit being reached, as otherwise all images would be deleted from the cache.
If interested, I wrote an article about this topic here.
You could mark your data storage unit (aka "box") as "persistent", making the user agent retain it as long as possible (more details in the Storage API link above).
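Requesting persistence could look roughly like the sketch below. `ensurePersistent` is a hypothetical helper name; `persisted()` and `persist()` are the StorageManager methods exposed on `navigator.storage`. The browser may refuse the request, so the result should be checked rather than assumed.

```javascript
// `storageManager` is expected to behave like `navigator.storage`.
async function ensurePersistent(storageManager) {
  // persisted() reports whether the origin's storage is already persistent.
  if (await storageManager.persisted()) {
    return true;
  }
  // persist() asks the browser; it resolves to false if the request is denied.
  return storageManager.persist();
}

// Only runs where the Storage API is available (for example, in a browser page).
if (typeof navigator !== "undefined" && navigator.storage && navigator.storage.persist) {
  ensurePersistent(navigator.storage).then((ok) => console.log("persistent:", ok));
}
```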

Related

What does Blink in-memory cache store?

Besides the browser cache, there are a few other ways browsers cache data. For Chrome, there is another cache in the rendering engine Blink that stores images, styles, scripts and fonts (maybe more) in memory.
This cache is used for consecutive navigations on a site. Resources delivered from the Blink cache are tagged with (from memory cache) in the network tab. Resources served from the browser cache are tagged with (from disk cache).
My question is now, which resources are stored in and delivered from this very fast cache? From my tests, it varies a lot:
It works extremely well for images and script tag which are directly in the HTML.
It works sometimes for style (link) tags which are directly in the HTML. Sometimes it does not work (in the same browser with the same session).
It works almost never for script tags that are inserted into the HTML programmatically. Sometimes it works though.
One huge difference between disk cache hits and memory cache hits becomes visible in combination with Service Workers. Requests that are served by the in-memory cache cannot be observed in the Service Worker (because the cache comes before the Service Worker). Requests that are served by the disk cache pass through the Service Worker (since the Browser Cache lies behind Service Worker).
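A simple way to see which requests actually reach the Service Worker is to record every fetch event. The sketch below is an assumption about how such a test could be wired, not the code of the test page; `createFetchLogger` is a hypothetical helper name.

```javascript
// Hypothetical helper: builds a fetch-event listener that records request URLs.
function createFetchLogger(seenUrls) {
  return function onFetch(event) {
    seenUrls.push(event.request.url);
    // No event.respondWith() call: the browser handles the request as usual.
  };
}

// Inside a service worker script:
// const seen = [];
// self.addEventListener("fetch", createFetchLogger(seen));
```

Resources that show up in the network tab as (from memory cache) should then be missing from the recorded list, while (from disk cache) ones should appear in it.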
To show the explained behavior, I built a test page with all resource types: https://dm-clone-optimized.app.baqend.com/
You can navigate through the site with the links at the top and observe how the requests behave in the network tab and console. Every page loads the same resources.
After a bit of navigating (Chrome 70.0.3538.67), I get this behavior most of the time:
HTML is fetched from network
Script tags scripts.js and scripts2.js are from in-memory cache
Image tag logo.png is from in-memory as well
Style link tag styles.css is from disk cache :(
Programmatically added script tag scripts2.js?id=1 is from disk cache as well :(
Sometimes though, I get really lucky and everything is served from in-memory cache:
I would love to understand how the Blink in-memory cache works and how I can tune my site to use it for all resources with appropriate cache control header.
---- edit ----
What concerns me the most is: Why are dynamically added scripts not cached at all? This has a noticeable impact on frameworks like require.js since they insert all dependencies as dynamically added script tags.
How the Blink in-memory cache works
Blink has four memory allocators:
PartitionAlloc, Oilpan, tcmalloc, and the system allocator
The team working on Blink has removed the tcmalloc and system allocators from Blink
Blink (PartitionAlloc+Oilpan) is the second largest consumer of renderer’s memory which consumes 10 - 20% in typical cases and retains some of the memory in Discardable, CC and V8
Inside Blink, the primary memory consumers are:
Large StringImpls (used by JavaScript source code)
shared buffers (used by Resources)
Vectors and HashTables
The recommendation is to: "identify caches that have an impact on Blink’s total memory and implement purgeMemory() only on them."
Reducing the size of DOM objects won't have an impact
Discarding caches won't have an effect in most cases
They are working on getting rid of "DiscardableMemory" items which will help to do things like forcibly detaching all layout objects which in turn will release memory retained by the layout tree.
I believe this is the result of an optimization in Chrome, which DevTools simply makes visible to you.
The files always go into the disk cache. They also go into memory, and are flushed from it very soon.
Chrome is smart enough to ask the running processes whether they still have a loaded copy in memory before seeking on disk.
This step has a high hit rate, as those images and scripts are actively being used for something.
You will not have any control over how Chrome manages their TTL or how much memory can be used to keep blobs hot. The Chrome dev team does quite a lot of dynamic tuning based on actual hardware capacity and system load.
P.S. If you are asking to keep YOUR APP in memory, you are falling into the Sun/Adobe way of evil: keeping their app DLLs hot in memory (via a tray icon or service) and slowing everyone else down.
P.P.S. If it is an app the end user might want to keep using, use Electron and follow WhatsApp, Slack, etc. to build an always-running app.

Can I include data with GET request that should be cached (in a Chrome App)

I am developing a Chrome App that needs to work off-line, i.e., I want it to use the App Cache, rather than sending a request to the server, as much as possible. However, should it be necessary to hit the server, then I would like to include a few bytes of data (so the server can collect and analyse some statistics about, for example, how many previous requests were served by the cache alone).
I can not send the data bytes in the URL query, because this defeats the cache (and is a well known technique for ensuring the cache is bypassed - exactly what I do NOT want to do).
I can't use a POST request, as that will also defeat the cache.
I have thought about including data in a header, but am unsure how, and unsure if this will do the trick as everything I have found about this idea recommends against it. I do not want to get in the trap of relying on something completely undocumented in case it stops working in the future (and I am not sure what will work today).
Can I include data in a GET request that will have absolutely no effect on the App Cache (in Chrome in particular), but is available to the server if and when the request makes it that far?
See comment above; this question isn't really about Chrome Apps. It's more about webapps that use App Cache.
It might be helpful to decouple normal web requests from the reporting feature. You are trying to piggyback the stats on the normal requests, but there doesn't seem to be a need to do so. Moreover, for the occasional client that caches perfectly, you'll never hear from that client because the reporting never occurs. So even in the best case the design would have been flaky.
Instead, you could increment a counter every time you make a request that might be cacheable, and keep it somewhere persistent like in localStorage. Then periodically report it to a noncached destination. You can then subtract the traffic you are receiving at the web server (which represents uncached requests) from that reported total (which represents all requests, cached or not). The difference is cached requests.
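The counting scheme described above can be sketched as follows. The key name `cacheableRequestCount` and both function names are assumptions for illustration; `storage` stands for anything with the `getItem`/`setItem` interface of `localStorage`.

```javascript
// Hypothetical key under which the running total is kept.
const COUNTER_KEY = "cacheableRequestCount";

// Call this whenever the app issues a request that might be served from cache.
// `storage` is expected to behave like window.localStorage.
function recordCacheableRequest(storage) {
  const current = Number(storage.getItem(COUNTER_KEY) || 0);
  storage.setItem(COUNTER_KEY, String(current + 1));
  return current + 1;
}

// totalReported: sum of the counters reported by clients.
// serverHits: requests the web server actually received (uncached ones).
function estimateCachedRequests(totalReported, serverHits) {
  return totalReported - serverHits;
}

// Browser usage:
// recordCacheableRequest(window.localStorage);
// The stored total would then be sent periodically to a non-cached endpoint.
```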

Using data URI for images in IE 6 and 7

I am currently developing a web-app. It contains images which are generated dynamically on the server (and thus take some time to appear after being requested) and then dished out. So I thought that I would use the HTML5 local-storage API to cache the images, so that on subsequent requests for the same image, it can be served instantly. For that, I plan to use the base64 encoding of the image as the source instead of a source URL.
Instead of requesting the image from the server, the JS will now first check whether that image data is currently available in the local storage (say an image with attribute 123 is stored in the local storage with 123 as key, and the base 64 encoding as the value). If yes, then just change the image's source with the value obtained from there. Else request the server to send the encoding, upon receiving which, it is stored in the cache.
Problem is IE6 and IE7 don't support it. There is a workaround, as described here, but that involves a server side CSS file to contain the image data. Since images will be generated on the fly, that won't serve our purpose. How else can I achieve this in IE6 and IE7?
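For browsers that do support localStorage, the lookup described in the question can be sketched as below. `getImageSource` and `fetchEncoding` are hypothetical names; `fetchEncoding` stands for whatever call asks the server for the base64 data URI of an image.

```javascript
// `storage` is expected to behave like window.localStorage.
// `fetchEncoding(imageId)` is assumed to resolve to the image's data URI.
async function getImageSource(storage, imageId, fetchEncoding) {
  const cached = storage.getItem(imageId);
  if (cached !== null) {
    return cached; // served instantly from local storage
  }
  const dataUri = await fetchEncoding(imageId);
  try {
    storage.setItem(imageId, dataUri);
  } catch (err) {
    // setItem throws when the quota is exceeded; the image is still usable.
  }
  return dataUri;
}

// Browser usage (element id and fetch function are placeholders):
// getImageSource(window.localStorage, "123", requestEncodingFromServer)
//   .then((src) => { document.getElementById("img-123").src = src; });
```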
Alternatively, don't try to cache anything client-side. Cache the generated images on the server side and host those images like normal. You don't need localStorage or client-side caching for this.
In other words:
generate image server side using your script
cache it somewhere like /httpdocs/cache/images/whatever-hash.jpg
serve the image in your document <img src="/cache/images/whatever-hash.jpg">
If generating an image takes 5 seconds and you have 120 concurrent users requesting 100 unique pages and your server script can only handle processing 4 threads at any given time that comes out to
5 seconds x (120 /4) / 60 = 2.5 minutes of server processing time before the last user in the queue's image is served and the data stored in localstorage.
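The same estimate as a small function, in case you want to plug in your own numbers. The function name is made up for illustration, and the model is the simplified one used above (every request needs a fresh image, threads work in equal batches).

```javascript
// Minutes until the last user in the queue gets their image.
function queueTimeMinutes(secondsPerImage, users, parallelThreads) {
  // Users are served in batches of `parallelThreads` at a time.
  const batches = Math.ceil(users / parallelThreads);
  return (secondsPerImage * batches) / 60;
}

console.log(queueTimeMinutes(5, 120, 4)); // 2.5
```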
The same time will be spent if all users request the exact same page. There would be no real benefit from caching per user, since every user would have to ask the server to generate their own image. Also, since localStorage will get invalidated often, the more the user does, the more they will experience a considerably slow app and, in my opinion, bail on it.
Caching the file on the server will have many more benefits IMHO. Sure it takes up server storage space but these days it's rather cheap and you can get a cloud CDN (example www.maxcdn.com) to combat the load.
If you still decide you need to cache client-side, note that IE6/IE7 support neither localStorage nor data URIs, so check out the following:
You'll need a Web Storage shim for IE6/IE7 there's a list at https://github.com/Modernizr/Modernizr/wiki/HTML5-Cross-Browser-Polyfills
You'll need a way to store the generated image as blob temporarily and stick it in the storage. Example: http://robnyman.github.com/html5demos/localstorage/
You could also use a Canvas Shim and a toBlob shim: https://github.com/eligrey/canvas-toBlob.js/
Set the headers to inform the browser that the resource can be cached:
header("Last-Modified: " . gmdate("D, d M Y H:i:s", getlastmod()) . " GMT");
in PHP
or
Response.Cache.SetLastModified(DateTime.Now);
in .NET
This way the browser will cache the resource.

When is localStorage cleared?

How long can I expect data to be kept in localStorage? How long will an average user's localStorage data persist? If the user doesn't clear it, will it last until a browser re-install?
Is this consistent across browsers?
localStorage is also known as Web Storage, HTML5 Storage, and DOM Storage (these all mean the same thing).
localStorage.setItem('bob', varMyData);
sessionStorage.setItem('bob', varMyData);
localStorage is similar to sessionStorage, except that data stored in localStorage has no expiration time, while data stored in sessionStorage gets cleared when the browsing session ends (i.e. when the browser / browser tab is closed). (See Limitations section below for up-to-date storage size limitations.)
Session storage is used much less often than localStorage, and exists only within the current browser tab - even two tabs loaded with the same website will have different sessionStorage data. sessionStorage data survives page refresh, but not closing/opening the tab. LocalStorage data, on the other hand, is shared between all tabs and windows from the same origin. LocalStorage data does not expire; it remains after the browser is restarted and even after OS reboot.
localStorage is available on all browsers, but persistence is not consistently implemented. In particular, localStorage can be cleared by user action and may be cleared inadvertently (who would think that clearing all cookies also clears localStorage?).
In Firefox, localStorage is cleared when these three conditions are met: (a) user clears recent history, (b) cookies are selected to be cleared, (c) time range is "Everything" -- or when LocalStorage is full - see "Limitations" section below.
In Chrome, localStorage is cleared when these conditions are met: (a) clear browsing data, (b) "cookies and other site data" is selected, (c) timeframe is "from beginning of time" -- or when LocalStorage is full (see "Limitations" section below). In Chrome, it is also now possible to delete localStorage for one specific site.
In IE, to clear localStorage: (a) Tools--Internet Options, (b) General tab, (c) delete browsing history on exit, (d) ensure "Cookies and website data" (or "temporary internet files and website files") is selected, (e) consider unchecking "Preserve Favorites website data" at the top
In Safari: (a) Click Safari (b) Preferences (c) Select the Privacy tab (d) Click Remove all website data (e) Click Remove Now
Opera: Despite excellent articles on localStorage from the Opera site, I haven't yet found clear (non-programmatic) instructions to users on how to clear localStorage. If anyone finds, please leave a comment below this answer with reference link.
Limitations:
TOTAL localStorage is limited to 50% of free disk space.
ALSO, the localStorage for any one "origin" (domain + any subdomains) is (theoretically) limited to 20% of total localStorage. In practice, though, the localStorage for one domain (as of Oct/2022) is:
minimum: 10 MB
maximum: 2 GB Source
actual: 5 MB (limit on my system with 6 GB free space, per a modified version of this script)
(Test System: i7 / 32 GB / 500 GB SSD with 6 GB free / Brave browser:
Version 1.45.133 Chromium: 107.0.5304.141 (Official Build) (64-bit))
When the TOTAL localStorage is full, then the browser will start clearing out data (called "origin eviction") based on an LRU policy — the Least Recently Used domain will be deleted first, then the next one, until the browser is no longer over the limit.
Note that this origin eviction process will delete an entire domain's worth of data until the storage amount goes under the limit again. Deletion of a domain's localStorage data is "all-or-nothing" -- there is no trimming effect put in place to delete parts of origins (domains) because partial data could be much worse than no data.
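The all-or-nothing LRU policy described above can be modelled in a few lines. This is only an illustrative model of the policy as described, not browser code; the function name and the shape of the `origins` records are assumptions.

```javascript
// origins: [{ origin, bytes, lastAccessed }], lastAccessed as a timestamp.
// Returns the origins that would be wiped, least recently used first.
function evictOrigins(origins, limitBytes) {
  const byOldest = [...origins].sort((a, b) => a.lastAccessed - b.lastAccessed);
  let total = origins.reduce((sum, o) => sum + o.bytes, 0);
  const evicted = [];
  for (const entry of byOldest) {
    if (total <= limitBytes) {
      break; // back under the limit, stop evicting
    }
    // The whole origin is removed; there is no partial trimming.
    total -= entry.bytes;
    evicted.push(entry.origin);
  }
  return evicted;
}
```

With origins of 50, 30 and 40 bytes and a limit of 60, the two least recently used origins are removed in full, even though removing part of one would have been enough.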
The Opera dev site has an excellent summary of localStorage:
The current way of storing data on the client-side — cookies — is a
problem:
Low size: Cookies generally have a maximum size of around 4 KB, which
is not much good for storing any kind of complex data
It’s difficult for cookies to keep track of two or more transactions on the same site, which might be happening in two or more different tabs
Cookies can be exploited using techniques such as cross site scripting, resulting in security breaches
Other (less popular) alternatives to cookies include techniques involving query strings, hidden form fields, flash based local shared objects, etc. Each with their own set of problems related to security, ease of use, size restrictions etc.
So up until now we have been using pretty bad ways of storing data on
the user’s end. We need a better way, which is where Web Storage comes
in.
Web Storage
The W3C Web Storage specification was designed as a better way of
storing data on the client-side. It has two different types of
storage: Session Storage and Local Storage.
Both Session and Local Storage will typically be able to store around
5 MB of data per domain, which is significantly more than cookies. NOTE THAT although MDN's numbers were updated (Oct 2022) and now say minimum 10 MB / maximum 2 GB, this author is unable to exceed 5 MB per domain/origin. M.D.N. / test script
Resources:
https://developer.mozilla.org/en-US/docs/Web/API/Window/sessionStorage
MDN - Browser_storage_limits_and_eviction_criteria
https://javascript.info/localstorage
https://dev.opera.com/articles/web-storage/
http://www.quirksmode.org/html5/storage.html
http://www.ghacks.net/2015/02/05/how-to-clear-web-storage-in-your-browser-of-choice/
https://nakedsecurity.sophos.com/2014/11/05/how-to-clear-out-cookies-flash-cookies-and-local-storage/
http://www.opera.com/dragonfly/documentation/storage/
DOMStorage article on MDN (written by John Resig)
http://ejohn.org/blog/dom-storage/
W3C draft says this
User agents should expire data from the local storage areas only for security reasons or when requested to do so by the user. User agents should always avoid deleting data while a script that could access that data is running.
So if browsers follow the spec, it should persist until the user removes it, on all browsers. I have not found any browser that has deleted it on any of my projects.
A good article to read is also http://ejohn.org/blog/dom-storage/
Duration
Unlimited. The data persists through browser & OS restarts.
Capacity
Each domain can store minimum of 5MB of data in LocalStorage.
For some browsers you can store up to 1GB of data.
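The actual capacity in a given browser can be probed by writing until the browser refuses. The sketch below is an assumption about how such a probe could look; the function name, probe key and default sizes are made up. It counts stored characters, not bytes, and removes its probe entry afterwards.

```javascript
// `storage` is expected to behave like window.localStorage.
// Returns the largest value length (in characters) that could be stored.
function measureCapacity(storage, chunkChars = 100 * 1024, maxChunks = 1000) {
  const key = "__capacity_probe__"; // hypothetical key name
  const chunk = "x".repeat(chunkChars);
  let value = "";
  let stored = 0;
  try {
    for (let i = 0; i < maxChunks; i++) {
      value += chunk;
      storage.setItem(key, value); // throws once the quota is exceeded
      stored = value.length;
    }
  } catch (err) {
    // Quota reached: `stored` holds the last length that fit.
  } finally {
    storage.removeItem(key);
  }
  return stored;
}

// Browser usage:
// console.log(measureCapacity(window.localStorage), "characters fit");
```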
In Chrome while performing 'clear browsing data' , if you choose 'Cookies and other site and plugin data' option then sessionStorage data will be erased.
The content in localStorage persists until the user chooses to clear the storage (entirely or a single value inside it).
About the consistency across browser, localstorage is currently available on every major browser, including IE8+ (see http://caniuse.com/#feat=namevalue-storage)

What is the difference between a primed cache and empty cache?

What is the difference between a primed cache and empty cache?
For example, the Statistics view of YSlow provides a graphical comparison of an empty cache vs. a primed cache. What is the difference between them?
Simply, a primed cache means the browser has it cached. It has been there before, or (though I don't think YSlow means it this way) it has been somewhere that uses some of the same resources (images, CSS, JavaScript)
This was asked 3 years ago, but I bumped into this question myself since I had it too. So I did a little research on the internet and found this:
Statistics is the third tab and provides a graphical representation of the number of HTTP requests made to the server and the total weight of the page in kilobytes for both Empty Cache and Primed Cache scenarios.
The Empty Cache scenario is when the browser makes the first request to the page and the Primed Cache scenario is when the browser has a cached version of the page. In a Primed Cache scenario, the components are already in the cache, so this will reduce the number of HTTP requests and thereby the weight of the page.
The keyword here is "scenarios". This does not mean that the graphs will change if you have already cached the page. I ran the test twice even though I had cached the page, and it always displays both graphs since it shows the "scenarios". So if I have cached the page, I am looking at the Primed Cache scenario, but my new visitors see the Empty Cache one.
So in the above example, when I request the page and it is cached, my browser will still make 3 requests with total weight of 86.6K
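The two scenarios can be expressed as a small calculation. This is a simplified model with made-up field names, not how YSlow is implemented: each component has a size and a flag saying whether a returning visitor's browser can serve it without a request.

```javascript
// components: [{ name, sizeKB, cached }]
function summarizeScenarios(components) {
  const total = (list) => ({
    requests: list.length,
    weightKB: list.reduce((sum, c) => sum + c.sizeKB, 0),
  });
  return {
    // First visit: everything has to be requested.
    emptyCache: total(components),
    // Return visit: only the components that are not cached are requested.
    primedCache: total(components.filter((c) => !c.cached)),
  };
}

// Example with hypothetical numbers:
// summarizeScenarios([
//   { name: "index.html", sizeKB: 10, cached: false },
//   { name: "styles.css", sizeKB: 20, cached: true },
// ]);
```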
This page explains what actually yslow displays. http://www.devcurry.com/2010/07/understanding-yslow-firebug-extension.html