I'm building a website on a host that supports only HTML and JavaScript. I've corrected some mistakes, but visitors probably won't see the fixes because their browsers keep showing the old, cached pages instead of the updated ones.
I remember the HTML "expires" meta tag, but it feels too late for that now, since many visitors have already seen the site.
I'm assuming it's your HTML files that aren't being updated, not CSS style sheets or JS files (for which the answer would be different).
This is mainly a server-side issue - you would have to check with your hosting provider what caching headers the server emits. Ideally, the server would honour If-Modified-Since requests from browsers, so it can serve updated content when there is any and let the browser use its cached copy when there is none.
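If you only have browser-side tools, a rough way to see what the server actually emits (run in the browser's dev-tools console on your own site; the path is just an example) is:

fetch('/index.html', { cache: 'no-store' }).then(res => {
  // Log the caching-related response headers the host sends back.
  console.log('Cache-Control:', res.headers.get('cache-control'));
  console.log('Expires:', res.headers.get('expires'));
  console.log('Last-Modified:', res.headers.get('last-modified'));
  console.log('ETag:', res.headers.get('etag'));
});

The Network tab of the developer tools shows the same information without any code.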
To remedy the problem at hand, you may need to rename your existing HTML files, or put them in a new directory. That will force the browser to actively re-fetch them. Of course, that still means that the entry point (usually index.html) will have to be re-fetched by the client - otherwise they will never notice the new structure, and hence not re-load anything.
A hosting configuration that indiscriminately dishes out caching instructions that prevent frequent updates from being made is not really useful. Talking to your hosting provider would probably be a good idea.
I am hosting a website with a couple of PDF files, embedded in object tags, that are updated weekly.
The names of these PDF files stay the same, but their contents change.
Currently I'm using:
<object id="men" data="seasons/S2223/Men2023.pdf?" type="application/pdf" width="100%" height="750px">
<p>The file could not be read in the browser
Click here to download
</p>
</object>
When I update the PDF, I expect
data="seasons/S2223/Men2023.pdf?"
to load the latest version, but it stays the same as before.
I added the ? at the end of the filename, thinking it would make the browser check for the latest version, but it doesn't seem to work.
When I clear my browser's cache the PDF updates, but of course that isn't a workable option for visitors.
All help is appreciated.
Caching, in this context, is where the browser has loaded the data from a URL in the past and still has a local copy of it. To speed things up and save bandwidth, it uses its local copy instead of asking the server for a fresh copy.
If you want the browser to fetch a fresh copy then you need to do something to make it think that the copy it has in the cache is no good.
Cache Busting Query String
You are trying to use this approach, but it isn't really suitable for your needs and your implementation is broken.
This technique is designed for resources that change infrequently and unpredictably, such as the stylesheet for a website. (Since your resources change weekly, this isn't a good option for you.)
It works by changing the URL to the resource whenever the resource changes. This means that the URL doesn't match the one the browser has cached data for. Since the browser doesn't know about the new URL it has to ask for it fresh.
Since your query string is hard-coded (a bare ? with nothing after it), it never changes, which defeats the purpose.
Common approaches are to set the value of the query to a time stamp or a checksum of the file. (This is usually done with the website's build tool as part of the deployment process.)
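For completeness, here is a minimal sketch of the technique in client-side JavaScript - not a recommendation for your weekly PDFs, just an illustration (the v parameter name is arbitrary). Using a timestamp busts the cache on every single page load, which is exactly why a build-time checksum or version number is normally preferred:

// Re-point the <object> at a cache-busted URL so the browser can't reuse its cached copy.
const pdfObject = document.getElementById('men');
pdfObject.data = 'seasons/S2223/Men2023.pdf?v=' + Date.now();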
Cache Control Headers
HTTP provides mechanisms to tell the browser when it should get a new copy. There are a variety of headers and I encourage you to read this Caching Tutorial for Web Authors and Webmasters as it covers the topic well.
Since your documents are replaced on a weekly schedule, I think the best approach for you would be to set an Expires header on the HTTP response for the PDF's URL.
You could programmatically set it to (for example) one hour after the time a new version is expected to be uploaded.
How you go about this would depend on the HTTP server and/or server-side programming capabilities of the host where you deploy the PDF.
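Purely as an illustration (this assumes the host can run server-side JavaScript with Node, which a plain HTML/JS host usually cannot; on shared hosting the same headers are normally set through the control panel or a server configuration file), serving the PDF with caching headers might look like:

// Minimal Node sketch: serve the PDF with an Expires header about one week out.
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  if (req.url.startsWith('/seasons/S2223/Men2023.pdf')) {
    const expires = new Date();
    expires.setDate(expires.getDate() + 7); // roughly when the next weekly upload is due
    res.writeHead(200, {
      'Content-Type': 'application/pdf',
      'Expires': expires.toUTCString(),
      'Cache-Control': 'public, max-age=' + 7 * 24 * 60 * 60,
    });
    fs.createReadStream('./seasons/S2223/Men2023.pdf').pipe(res);
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8080);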
I found an interesting article about "cruftless" links (removing "index.html" from links), but when I do that, no browser shows the local pages:
http://www.nimblehost.com/blog/2012/11/why-cruftless-links-are-better/
This is understandable, since it's a 'file' URL on a local machine, so what do people do to work on basic HTML sites offline? How do they preview them?
For example, no browser (understandably) will display this...
file:///JOBS/ABC/About/
... but this is fine...
file:///JOBS/ABC/About/index.html
... so what do people do to get around this?
The meaning of file: URLs is, by definition, system-dependent. Normally browsers map them to files in the file system in a relatively straightforward manner.
Thus, a link with href value like file:///JOBS/ABC/About/ may or may not work, depending on system. It may fail, or it may open a generated document containing a directory (folder) listing, or it might do something else.
There is normally no need to get around this, and it is pointless to worry about SEO when dealing with local files.
This could, however, matter during site development, when you work with a site locally (and perhaps test and demonstrate it locally). Then you might wish to have, say, a link like <a href="About/">About us</a> that works locally as well as on a server, resolving to About/index.html in both cases but without hard-wiring index.html into the markup.
I'm afraid the answer is "you can't". But as a workaround, you can install and use a local HTTP server, with settings similar to those you will have on the real server. This means a little extra work (mainly downloading, installing, and configuring software such as XAMPP), but it also gives you other important benefits, like being able to test your pages locally with server-based features (to the extent that the real server is similar to the local one).
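For example, a bare-bones preview server in Node (a sketch, assuming Node is installed; any local HTTP server will do) that maps directory URLs such as /About/ onto About/index.html the way a real server would:

// Run with: node preview.js, then open http://localhost:8080/
const http = require('http');
const fs = require('fs');
const path = require('path');

const ROOT = process.cwd(); // serve the current directory

http.createServer((req, res) => {
  // Drop any query string and map "/About/" to "/About/index.html".
  let urlPath = decodeURIComponent(req.url.split('?')[0]);
  if (urlPath.endsWith('/')) urlPath += 'index.html';

  fs.readFile(path.join(ROOT, urlPath), (err, data) => {
    if (err) {
      res.writeHead(404);
      res.end('Not found');
    } else {
      res.writeHead(200); // MIME-type detection and security checks omitted for brevity
      res.end(data);
    }
  });
}).listen(8080);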
I have a few questions regarding HTML5 offline storage, which I could not figure out.
Where exactly are these files stored in Windows? I could not find them here:
C:\Documents and Settings[User Name]\Application Data\Mozilla\Firefox\Profiles\
Is there an expiration time after which the browser deletes these files automatically, or do the files remain forever?
What if I change the contents of the page - is there any way to refresh the data that is stored offline?
Thanks.
I found them in %AppData%/Profiles/<currentprofilename>.default/OfflineCache. I'm using Windows 7.
This depends on the expires headers your web server sends for the files in question. It is recommended that you set the expires header to one week, but it is up to you; you can make the files never expire. Note that the manifest file itself should be set to never be cached.
In order to refresh the data you must actually change the manifest file. It is recommended that somewhere in the manifest file you put a comment with the version number, then update it every time you change any of your other files.
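On the page itself, a small script (a sketch using the application cache API this answer is about, which is now deprecated in favour of service workers) can detect that a changed manifest has been downloaded and swap the fresh files in:

if (window.applicationCache) {
  window.applicationCache.addEventListener('updateready', () => {
    if (window.applicationCache.status === window.applicationCache.UPDATEREADY) {
      window.applicationCache.swapCache(); // use the newly downloaded files
      window.location.reload();            // reload so the page actually shows them
    }
  });
}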
Edit: I had answered these questions thinking you meant offline application cache, not local storage.
Well, for the sake of accuracy, it should be mentioned that although localStorage was indeed part of the HTML5 specification, it was split into its own specification after becoming too complicated to include alongside the rest of HTML5.
It really depends on your browser, but it should be found in your AppData folder, under /Profiles/<profile name>/OfflineCache (on Windows 7).
There is generally no expiration date for localStorage; it can remain there forever unless specifically removed by the website.
JavaScript is what changes the localStorage data (assuming you don't touch the actual files), so the website you are using (or writing) needs to be smart enough to refresh localStorage along with the page's content.
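A common pattern (just a sketch - the key names and the element id are made up) is to store a version string alongside the data and discard the cached copy whenever the page ships a newer version:

// Bump DATA_VERSION whenever the page content changes.
const DATA_VERSION = '2023-07-01';

function loadCachedContent() {
  if (localStorage.getItem('contentVersion') !== DATA_VERSION) {
    // Stale or missing: refresh the stored copy from the page itself.
    localStorage.setItem('contentVersion', DATA_VERSION);
    localStorage.setItem('content', document.getElementById('main').innerHTML);
  }
  return localStorage.getItem('content');
}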
I'm curious what the effects/downsides are of not putting an index.html file in your directories (e.g. images). I know that when an index file is not present in a directory, the files inside it are no longer private and will be visible in the browser when someone points at it (e.g. yoursite.com/images/). Aside from that, what are the big effects to consider, and how do I properly secure those directories?
thanks!
This depends on your web server, but there are two disadvantages to not having one.
If not secured, some web servers will show the directory contents if there is no default page.
If someone types your_site/directory/, and there is no default page, they will receive a 404 error.
With current web servers, there are many ways to get around not having a default page for each directory. You can set a custom 404 page that redirects visitors to a page that does exist, and some servers can generate a default page automatically for you. As far as security goes, you can simply turn off directory browsing so that people cannot see the directory contents.
If you are configuring your own server, not having one becomes less of an issue, since you can control how the server responds to such situations. People normally just add it as a fail-safe, to make sure that the above two problems are taken care of.
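If you do control the server and it happens to run Node with Express (an assumption, purely for illustration), both concerns can be handled without dropping placeholder index files into every directory:

const express = require('express');
const path = require('path');
const app = express();

// express.static serves files but never generates directory listings,
// so /images/ will not expose the folder contents.
app.use(express.static(path.join(__dirname, 'public')));

// Anything that didn't match a real file gets a custom 404 page instead.
app.use((req, res) => {
  res.status(404).sendFile(path.join(__dirname, 'public', '404.html'));
});

app.listen(8080);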
We have several images and PDF documents that are available via our website. These images and documents are stored in source control and are copied content on deployment. We are considering creating a separate image server to put our stock images and PDF docs on - thus significantly decreasing the bulk of our deployment package.
Does anyone have experience with this approach?
I am wondering about any "gotchas" - like XSS issues and/or browser issues delivering content from the alternate sub-domain?
Pro:
Many browsers will only allocate two sockets to downloading assets from a single host. So if index.html is downloaded from www.domain.com and it references 6 image files, 3 javascript files, and 3 CSS files (all on www.domain.com), the browser will download them 2 at a time, with the other blocking until a socket is free.
If you pull the 6 image files off onto a separate host, say images.domain.com, you get an extra two sockets dedicated to download your images. This parallelizes the asset download process so, in theory, your page could render twice as fast.
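The split is normally baked into the asset URLs at build or deploy time; purely as an illustration (the host names are hypothetical), a helper that maps each asset to a fixed subdomain so its URL stays stable and cacheable could look like:

// Hypothetical asset hosts - substitute whatever subdomains you actually set up.
const ASSET_HOSTS = ['https://images1.domain.com', 'https://images2.domain.com'];

// Deterministically pick a host for a path so the same file always gets the same URL.
function assetUrl(assetPath) {
  let hash = 0;
  for (const ch of assetPath) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return ASSET_HOSTS[hash % ASSET_HOSTS.length] + '/' + assetPath.replace(/^\//, '');
}

document.querySelectorAll('img[data-asset]').forEach(img => {
  img.src = assetUrl(img.dataset.asset); // e.g. <img data-asset="photos/team.jpg">
});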
Con:
If you're using SSL, you would need to either get an additional single-host SSL certificate for images.domain.com or a wildcard SSL certificate for *.domain.com (matches any subdomain). Failure to do so will generate a warning in the browser saying the page contains mixed secure and insecure content.
With a different domain, the browser will also not send your site's cookies with every asset request (as long as the cookies aren't scoped to the parent domain), which can improve performance.
Another thing not yet mentioned is that you can use different web servers to serve different sorts of content. For example, your static content could be served via lighttpd or nginx while still serving your dynamic content off Apache.
Pros:
- load balancing
- isolating different functionality
Cons:
- more work (when you create a page on the main site, you also have to maintain the corresponding resources on the separate server)
Things like XSS are a problem of code not sanitizing input (or output, for that matter). The only issue that could arise is if you have subdomain-specific cookies that are used for authentication... but that's really a trivial fix.
If you're serving the page over HTTPS and you serve an image from an HTTP domain, you'll get mixed-content security warnings in the browser.
So if you use HTTPS, you'll need a certificate for your image domain as well if you don't want to annoy the hell out of your users :)
There are other ways around this, but it's not particularly in the scope of this answer - it was just a warning!