How to set an expiration date for an HTML link

I'm not sure if I'm being a complete noob at this (it's been a looooong night :D), but is it possible to cache links with .htaccess? I know that you can set expirations by file extension, like jpg, png, css, js, etc.
And if you've ever hosted a website, I'm sure you've probably used one of those online "website optimizers", and I keep getting the message "The following cacheable resources have a short freshness lifetime. Specify an expiration at least one week in the future for the following resources:"
...followed by a list of outside links like Facebook and Google.
Any ideas?

You cannot alter the headers or the content of external resources such as those served by Google's CDN or Facebook. Assume that big companies like Google and Facebook know how to cache, which resources are worth caching, and for how long.
For resources on your own server, you can set the Cache-Control header with a custom time to tell the client how long the resource can be cached.
# Requires mod_headers. max-age is in seconds (600 = 10 minutes).
<FilesMatch "\.(css|js)$">
Header set Cache-Control "public, no-transform, max-age=600"
</FilesMatch>
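The max-age value is a number of seconds, so the "at least one week" that the optimizer asks for is much larger than the 600 used above. As a rough sketch (the helper name `cache_control` is made up for illustration), the conversion looks like this:

```python
from datetime import timedelta


def cache_control(lifetime, public=True):
    """Build a Cache-Control header value; max-age is in whole seconds."""
    scope = "public" if public else "private"
    return f"{scope}, max-age={int(lifetime.total_seconds())}"


# One week, the threshold the optimizer message mentions.
print(cache_control(timedelta(weeks=1)))     # public, max-age=604800
# Ten minutes, the lifetime used in the .htaccess example.
print(cache_control(timedelta(minutes=10)))  # public, max-age=600
```

So for a one-week lifetime the directive would carry max-age=604800 instead of max-age=600.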
You can check how long it takes to load certain resources on your page by going to your browser and opening the developer console. Under the network tab you can see all requests that are being made. Make sure to load the page both with cache and without cache.

Related

Can images from another website create cookies on my site?

I have a static website; it only contains HTML and CSS. No JavaScript, no PHP, no databases. On this site, I'm using images, which I get from image-hosting websites (like imgur).
I've noticed that when I visit my website (on Google Chrome at least) and click the information button next to the URL, it says there are cookies on this site. If I click on the cookies button, it says "The following cookies were set when you viewed this page" and shows a list of cookies, including some from the sites that I use for image hosting.
If I delete them, they come back after a while, but not immediately. I'm trying to avoid cookies as the site is very simple. Are they considered part of my site? If so, is there anything I can do, except hosting the images myself?
I always thought that if you link to an image directly (as in a link ending in .png for example) it would be the same as if you were hosting the image yourself, and there would be no JavaScript being run (to save cookies).
"Are they considered part of my site?"
That depends on your perspective.
The browser doesn't consider them to be part of your site. Cookies are stored on a per-domain basis, so a cookie received in response to a request for an image from http://example.com will belong to http://example.com and not to your site.
However, for the purposes of privacy laws (such as GDPR), they are considered part of your site and, if they are used by the third party to track personally identifiable information, you are required to jump through the usual GDPR hoops.
"If so, is there anything I can do, except hosting the images myself?"
Not really.
"I always thought that if you link to an image directly (as in a link ending in .png for example) it would be the same as if you were hosting the image yourself, and there would be no JavaScript being run (to save cookies)."
Cookies are generally set with HTTP response headers, not with JavaScript.
Whenever a browser requests a file from a server, it automatically sends any cookies it has stored for that server's domain along with the request. Image hosting services may use that for different purposes.
"I always thought that if you link to an image directly (as in a link ending in .png for example) it would be the same as if you were hosting the image yourself, and there would be no JavaScript being run (to save cookies)."
So the question is, how do they set these cookies?
Let's say you use a simple img tag to load an image from a host.
<img src="https://imageHoster.tld/123xyz.png">
The site imageHoster.tld can handle that request by redirecting all requests to e.g. requestHandler.php and that file can set the cookie before sending the image with a simple
<?php
// Cookies are sent as HTTP headers, so set them before any output.
setcookie("cookieName", "whateverValue", time() + 3600); // expires in one hour
header('Content-Type: image/png');
...
?>
What happens there is effectively the same as if you set the image source like this:
<img src="https://imageHoster.tld/requestHandler.php?img=123xyz">
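To make the point concrete that the cookie travels as a plain HTTP response header next to the image, here is a small sketch using Python's standard http.cookies module. It is not what any particular image host runs; the function name `image_response_headers` is invented for illustration, and the cookie values are the placeholders from the PHP example.

```python
from http.cookies import SimpleCookie


def image_response_headers(cookie_name, cookie_value, max_age):
    """Return the headers a (hypothetical) image host could send with a PNG."""
    cookie = SimpleCookie()
    cookie[cookie_name] = cookie_value
    cookie[cookie_name]["max-age"] = max_age  # lifetime in seconds
    return [
        ("Set-Cookie", cookie[cookie_name].OutputString()),
        ("Content-Type", "image/png"),
    ]


# The response carries both the cookie and the image content type;
# no JavaScript is involved at any point.
for name, value in image_response_headers("cookieName", "whateverValue", 3600):
    print(f"{name}: {value}")
```

The browser stores the cookie under the image host's domain as soon as it receives these headers.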
"Are they considered part of my site?"
Since these so-called third-party cookies are set when visiting your site, one could consider them part of your site. To be on the safe side, I would at least mention the use of third-party services in the data privacy statement.
"If so, is there anything I can do, except hosting the images myself?"
Third-party cookies can be disabled in the client's browser, but you can't disable them for the visitors of your site. So no: to keep third parties from setting cookies in the browsers of your visitors, your only option is to avoid using their services.

Omit current page from HTML5 offline appcache but use cached resources

For performance purposes, I want to have some of my web pages use resources that have been cached for offline use (images, CSS, etc.) but to not have the page itself cached as the content will be generated dynamically.
One way to do this would be to refactor my pages so that they load the dynamic content via AJAX or by looking things up in LocalStorage. Details may vary, but broadly speaking, something like that.
If it's possible, I'd prefer to find a way to simply instruct the browser to use cached resources (again, images, CSS, etc.) for the page but to not actually cache the (dynamically generated) HTML content itself.
Is there a way to do that with HTML5 offline appcache? I'm under the impression that the answer is "no" because:
Any page that includes the manifest will be cached so I can't specify the cached resources in the page itself.
There is no way to tell a previous page "use offline assets for this other page but don't actually cache the HTML on that page". You have to specify the page itself, which means the HTML will be cached.
Am I wrong about that? It seems like there is probably some tricky (or not-so-tricky) way around that. Now that I've typed it out, I wonder if including the page explicitly in the NETWORK section of the appcache manifest will do the trick.
My answer is "yes".
I have worked on a web-app where I listed all the necessary resources in the manifest, and set the NETWORK section to *.
The manifest is then included only on the main landing page. So all resources are cached the first time you visit the site, and it works a treat.
In short,
one of your pages must include the manifest and will therefore be cached.
maybe you can have the manifest loaded in an iframe and not have the whole page cached, just a thought.
list all your resources to be cached in the CACHE section
set the NETWORK section to *
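A manifest following that summary could be generated along these lines. This is only a sketch: the function name `build_manifest`, the version comment, and the resource paths are placeholders, and AppCache itself has since been deprecated in favor of service workers.

```python
def build_manifest(cached_resources, version):
    """Return the text of an AppCache manifest.

    Everything in cached_resources goes into the CACHE section; the
    NETWORK section is set to "*" so all other URLs go to the server.
    """
    lines = ["CACHE MANIFEST", f"# version {version}", "", "CACHE:"]
    lines.extend(cached_resources)
    lines += ["", "NETWORK:", "*"]
    return "\n".join(lines) + "\n"


# Hypothetical resource list; replace with the site's real assets.
print(build_manifest(["/css/site.css", "/img/logo.png"], version="1"))
```

Changing the version comment alters the manifest file, which is what prompts browsers to re-download the cached resources.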
I'm fairly certain that the answer to this is no.
If you use the Network tab in Chrome's developer tools, it shows which resources are loaded from the cache and which are loaded from the server. I have attempted to set up the appcache as described above, and the resources are always loaded from the server.
Would I be correct in assuming that if the current page is not in the appcache, then it won't bother to check the appcache for any of the resources?
What I've found that is working is to list those files that you don't want cached in appcache in the NETWORK: section of the manifest. For me, that meant adding *.asp* to the network section. Now, none of the classic asp files, or aspx files are cached.

CloudFront Cache HTML

Can Amazon CloudFront be used to cache HTML pages, and not just images, CSS files, etc.?
If not, is there a comparable service out there that does this? I.e., I overlay the service on a domain, and it only queries that site again when the cached page has expired.
I looked at CloudFlare as well and they don't yet do this.
Yes, you can serve HTML through Cloudfront.
The main disadvantage comes when you need to update the HTML, because you can't version HTML URLs, for SEO reasons.
So if you set a cache lifetime of 1 hour in CloudFront, for example, the page is kept in CloudFront for at most 1 hour, and only after that will CloudFront fetch the HTML from your origin again and update it.
You can use invalidations on CloudFront to speed up the process, but you need a full list of your HTML pages ready to copy and paste into AWS for invalidating.
Of course, all this works for fixed web pages that do not change per user.
You can apply it even to ASP / PHP, but only if the generated content is the same for all users.
So you have the PHP on your origin, and CloudFront stores the HTML it generates.
My English is not the best, so I hope I cleared something up...
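Instead of pasting the list of pages into the AWS console, the invalidation mentioned above can also be submitted through the API. The sketch below only builds the request structure; the helper name `build_invalidation_batch` and the example paths are made up, and it assumes you would pass the result to boto3's CloudFront `create_invalidation` call with your own distribution ID.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure for a list of URL paths."""
    # CloudFront expects each path to start with a slash.
    items = [p if p.startswith("/") else "/" + p for p in paths]
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        # The caller reference must be unique per invalidation request.
        "CallerReference": caller_reference or str(time.time()),
    }


batch = build_invalidation_batch(["index.html", "/offers/today.html"])
print(batch["Paths"])

# Assumed usage with boto3 (not executed here; the ID is a placeholder):
#   import boto3
#   boto3.client("cloudfront").create_invalidation(
#       DistributionId="YOUR_DISTRIBUTION_ID", InvalidationBatch=batch)
```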
Yes, you can serve HTML through CloudFront as long as you don't mind every user getting the same content until the cache expires.
I can't imagine a CDN that would not support this. They might not advertise it since many web sites are dynamic and can't be cached, but if your site is basically static, then any CDN should work.

How do I configure optimal cache policy for index.html page in my site?

I have a website with an index.html homepage that is updated from time to time. We sometimes add offers for our clients, special messages and so on, which have to be visible to everyone by the next day.
If index.html is cached by browsers, many users will not notice that anything has changed unless they explicitly refresh the page...
What is the best way to make sure that 100% of visitors get an up-to-date index.html page, without compromising cache performance?
My best answer would be to stop editing index.html by hand each time and generate the page with a server-side language like PHP. You can then set the headers for the page so that it is not cached, and you can also set up an admin page for changing the content. Or you could use a browser-side script with JavaScript and AJAX; then the page can fetch fresh content without waiting for the next load of the site.
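Whichever way the page is produced, the header side of this can be sketched as a per-path policy: HTML gets "no-cache" (the browser may keep a copy but has to revalidate it with the server before each use), while static assets keep a long lifetime. The function name `cache_policy`, the one-week lifetime, and the choice to treat directory requests as HTML are assumptions made for this example.

```python
import posixpath

HTML_EXTENSIONS = {".html", ".htm"}
ONE_WEEK = 7 * 24 * 60 * 60  # seconds


def cache_policy(path):
    """Pick a Cache-Control value for a request path."""
    ext = posixpath.splitext(path)[1].lower()
    if path.endswith("/") or ext in HTML_EXTENSIONS:
        # Always revalidate HTML so new offers show up right away.
        return "no-cache"
    # Images, CSS and scripts change rarely and can be cached longer.
    return f"public, max-age={ONE_WEEK}"


for p in ["/", "/index.html", "/css/site.css", "/img/offer.png"]:
    print(p, "->", cache_policy(p))
```

With this split, only the small HTML document is revalidated on each visit, so cache performance for the heavier assets is unaffected.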

How do I add an expiration date to an img tag?

I'm using the Page Speed Firebug extension to help improve page performance. I have an image-heavy page, and one of the suggestions it made is this:
Leverage browser caching
The following cacheable resources have a short freshness lifetime. Specify an expiration at least one week in the future for the following resources:
http://www.mysite.com/components/com_arrcard/assets/merchant-logos/aap25.jpg (expiration not specified)
I know you can set Expires or Cache-Control headers on an entire page, but how do I add an expiration to a specific element? Is it even possible, or am I misinterpreting what Page Speed is suggesting?
Presumably you'd set the Expires or Cache-Control headers on an image through some sort of setup in your web server (either configure a specific directory or use a script), so that the HTTP response sent for each image also carries these headers.
If you're using Apache, one option to do this for you is mod_expires.
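For reference, the Expires header that such a setup emits is just an HTTP date in GMT. A minimal sketch of computing one a week ahead, as Page Speed suggests, follows; the function name `expires_header` and the fixed start date are chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(now, lifetime):
    """Format an Expires header value `lifetime` after `now` (must be UTC)."""
    return format_datetime(now + lifetime, usegmt=True)


# Fixed timestamp so the result is reproducible.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print("Expires:", expires_header(now, timedelta(weeks=1)))
# Expires: Mon, 08 Jan 2024 12:00:00 GMT
```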
If you're using IIS, you can put the images into a separate folder and then set the cache header on that folder. If you update an image, change the file name so that it will be refreshed in the browser the next time the user loads the page.
Using Content Expiration
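One common way to "change the file name" on every update is to derive part of the name from the file's content, so a new version automatically gets a new URL. The helper below is a sketch under that assumption; the name `versioned_name` and the 8-character SHA-256 prefix are arbitrary choices, not something the answer prescribes.

```python
import hashlib
import posixpath


def versioned_name(filename, content):
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, ext = posixpath.splitext(filename)
    return f"{stem}.{digest}{ext}"


old = versioned_name("aap25.jpg", b"old image bytes")
new = versioned_name("aap25.jpg", b"new image bytes")
print(old)
print(new)
print("name changed:", old != new)
```

Because the name changes only when the bytes change, the image can safely be given a long expiration.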