I have a website, www.somesite1.com, which gets all its image content from www.somesite2.com.
At the moment, each time an image is to be displayed, we simply use an absolute URL to get it, like this:
<img src="http://www.somesite2.com/images/myimage.jpg" />
So each time a user goes to www.somesite1.com for content, www.somesite2.com gets hammered.
I'm looking for clever ways of caching images, perhaps using HTTP headers or the like. Thanks
The simplest answer is to set an HTTP Expires header with a far-future value (say, 10 years). This way, in theory, the browser should never request that image again, taking a huge load off your server if you have a lot of repeat visits. Of course, you'll need to come up with a mechanism for changing the image URL when you want to update the image. Most people approach this by adding a build number into the path somewhere (or by appending a build number to a query string).
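For example, here is a minimal sketch of the query-string variant, assuming site 1's pages are generated by PHP; the BUILD_NUMBER constant and img_url() helper are illustrative names, not something your setup already has:

<?php
// Hypothetical helper: bump BUILD_NUMBER whenever the images change, and every
// image URL on the page changes with it, so browsers fetch the new file instead
// of reusing the copy they cached under the old URL.
define('BUILD_NUMBER', '42');

function img_url($filename) {
    return 'http://www.somesite2.com/images/' . $filename . '?v=' . BUILD_NUMBER;
}
?>
<img src="<?php echo img_url('myimage.jpg'); ?>" />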
There's some good information on performance (including use of the Expires header) here:
http://developer.yahoo.com/performance/rules.html
I personally wouldn't worry about caching these on the server side (I presume they aren't being generated dynamically). They are being served directly from disk, so the overhead should be small anyway. Make sure you have a separate machine for serving static content, and use a very lightweight web server such as Lighttpd or nginx, which should help increase throughput.
http://www.lighttpd.net/
http://nginx.net/
Unless your website is very popular you probably don't need a second server for static content.
If you are using Apache HTTP Server, then you could install mod_cache. You could also look at installing Squid Cache.
I agree with Andy Hume though, if you do require a second server then keep it light and use something like lighttpd and Squid.
Keep a separate web server just for the images, since the images are separate HTTP requests.
There is nothing wrong with your setup, it works perfectly. To understand what you need to do to make it work better, you must understand how the browser works.
First, it will read the HTML document from www.somesite1.com. Then it will parse it, looking for any URLs. When it finds any, it will start downloading them.
So from a browser point of view, there is no relation between site 1 and 2. If you want to improve caching for site 2, you can completely ignore site 1.
This means that you must tell the web server on site 2 to send the correct cache information to the browser. For example, have it respond to HEAD requests and/or set the lifetime of the image to some large value in the Cache-Control header field.
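As a rough sketch of that idea, and only if the images on site 2 happened to be routed through a PHP script (for plain static files you would set the same headers in the web server configuration instead), it could look something like this; the script name and query parameter are hypothetical:

<?php
// Hypothetical serve_image.php on www.somesite2.com
$file = __DIR__ . '/images/' . (isset($_GET['file']) ? basename($_GET['file']) : '');
if (!is_file($file)) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

$lastModified = filemtime($file);

// Let browsers and proxies reuse this response for up to a year.
header('Cache-Control: public, max-age=31536000');
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');

// Answer conditional requests with 304 so the image body is not resent.
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
    strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $lastModified) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}

// Assumes JPEG images, as in the question.
header('Content-Type: image/jpeg');
readfile($file);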
Related
I have a JPG in the header section of a responsive HTML page. I want to use the same site content for multiple domains that will all point to my single set of files at my hosted URL. Example: mysite.com will host all the files, but a second site, for example theirsite.com, will forward to my hosted files' location.
All the content will be the same, EXCEPT FOR one image file (logo.png, though it could be any name). I would like to see if I can substitute that one file (logo.png) so that, when visitors come to mysite.com, they see my file, and when visitors come to theirsite.com, they see the logo file for theirsite.com instead of the one for mysite.com. Sorry if I have not explained this professionally.
There are two real ways to do this. The best way is to handle it server-side.
You would need some sort of dynamic site generation, such as PHP. As the site isn't changing on each request, I'd recommend doing this generation ahead of time; then you can utilize static hosting on CDNs and such. The specifics of how you do this depend on your technology choice, and it matters little what you pick.
Doing the switch server-side is better, as crawlers will be able to see the right version of the site. Most crawlers don't run client-side code.
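A minimal sketch of the server-side switch, assuming the page is generated by PHP; the domain names come from the question, and the alternate file name (logo-theirsite.png) is just an illustration:

<?php
// Normalize the host so www.mysite.com and mysite.com match the same entry.
$host = preg_replace('/^www\./', '', strtolower($_SERVER['HTTP_HOST']));

// Map each domain to its own logo file; anything unknown gets the default.
$logos = array(
    'mysite.com'    => 'img/logo.png',
    'theirsite.com' => 'img/logo-theirsite.png',
);
$logo = isset($logos[$host]) ? $logos[$host] : 'img/logo.png';
?>
<img src="<?php echo htmlspecialchars($logo); ?>" alt="Logo">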
The second option is to handle it client-side. In this case, I'd recommend including a site definition file for each domain and writing some JavaScript that checks the hostname the site was loaded on and loads the right site definition file. That file could contain the elements and attributes to replace. Again, this is the less desirable option, but still possible.
I have a one-page static website. My website is displaying different images than those referenced in the HTML. For example:
<img src="img/About_Us_Graphic.png" alt="About us photo" id="aboutUsPic" style="margin: auto;">
Will sometimes be displayed as the image that's actually
<img src="img/Facebook_icon.png">
This happens pretty much randomly. Sometimes the pictures are correct; sometimes they're totally different pictures. And when it's the wrong picture, it isn't consistently the same wrong picture. What causes this? How can I fix it?
My site uses Foundation 5 (not sure if that's relevant). Thanks!
I've found situations similar to the one you described to be the symptom of one of a few causes:
Someone is tinkering with the content on the site without you being aware. Ask your team members if they know of anyone who might do this.
Your client-side cache is taking over. To remedy this specific problem, go to your browser and clear out the temporary files. Sometimes you have to also clear out cookies and other historical items.
Client-side proxies. Sometimes proxy servers cache what they serve to reduce the load of their requests. When they work in a round-robin fashion, different servers within the proxy pool might have mismatched content (see https://en.wikipedia.org/wiki/Load_balancing_(computing)).
Load-balanced web servers. I've seen some situations where servers that are load balancing content will hold onto data. In my specific scenario, a memcache was used and would seemingly hold onto content until its index was refreshed.
Without more information about your setup, there's not much anyone can do. As oxguy3 suggested, there could even be something in your code causing this.
Please try typing the URL of the image directly into your browser and see if it consistently comes up the same. Then try the same URL with "?someArbitraryText" appended, where "someArbitraryText" is just some random characters.
E.g., instead of "http://my.server.com/img/About_Us_Graphic.png", use "http://my.server.com/img/About_Us_Graphic.png?arbitrary". Most servers I've encountered will still serve the image, but if a load balancer, proxy, or memcache is involved, it will treat this as a different URL and load it from the source rather than from some cached file.
I've seen some cases (such as on salesforce clouds) where doing so will bring up different results.
Let us know what you discover. Any little clue could help someone identify and determine the root cause.
I've read quite a few posts about the AngularJS html5Mode, including this one, and I'm still puzzled:
What is it good for? I can see that I can use http://example.com/home instead of http://example.com/#/home, but the two chars saved are not worth mentioning, are they?
How is this related to HTML5?
The answer links to a page showing how to configure a server. It seems like the purpose of this rewriting is to make the server always return the same page, no matter what the URL looks like. But doesn't that lead to needlessly increased traffic?
Update after Peter Lyons's answer
I started to react in a comment, but it grew too long. His long and valuable answer raises some more questions of mine.
option of rendering the actual "/home"
Yes, but that means a lot of work.
crazy escaped fragment hacks
Yes, but this hack is easy to implement (I did it just a few hours ago). I actually don't know what I should do in the case of html5Mode (as I haven't finished reading this SEO article yet).
Here's a demo
It works neither in my Chromium 25 nor in my Firefox 20. Sure, they're both ancient, but everything else I needed works in both of them.
Nope, it's the opposite. The server ONLY gets a full page request when the user first clicks
But the same holds for the hashbang, too. Moreover, a user following an external link to http://example.com/#!/home and then another link to http://example.com/#!/foreign will always be served the same page via the same URL, while in html5Mode they'll be served the same page (unless the burdensome optimization you mentioned gets done) via a different URL (which means it has to be loaded again).
but the two chars saved are not worth mentioning, are they?
Many people consider the URL without the hash considerably more "pretty" or "user friendly". Also, a very big difference is that when you browse to a URL with a hash (a "fragment"), the browser does NOT include the fragment in its request to the server, which means the server has a lot less information available to deliver the right content immediately. Compare that to a regular URL without any fragment, where the full path "/home" IS included in the HTTP GET request to the server, so the server has the option of rendering the actual "/home" content directly instead of sending the generic "index.html" content and waiting for JavaScript in the browser to update it once it loads and sees that the fragment is "#home".
HTML5 mode is also better suited for search engine optimization without any crazy escaped fragment hacks. My guess is this is probably the largest contributing factor to the push for HTML5 mode.
How is this related to HTML5?
HTML5 introduced the necessary javascript APIs to change the browser's location bar URL without reloading the page and without just using the fragment portion of the URL. Here's a demo
It seems like the purpose of this rewriting is to make the server always return the same page, no matter what the URL looks like. But doesn't that lead to needlessly increased traffic?
Nope, it's the opposite. The server ONLY gets a full page request when the user first clicks a link onto the site OR does a manual browser reload. Otherwise, the user can navigate around the app clicking like mad and in some cases the server will see ZERO traffic from that. More commonly, each click will make at least one AJAX request to an API to get JSON data, but overall the approach serves to reduce browser<->server traffic. If you see an app responding instantly to clicks and the URL is changing, you have HTML5 to thank for that, as compared to a traditional app, where every click includes a certain minimum latency, a flicker as the full page reloads, input forms losing focus, etc.
It works neither in my Chromium 25 nor in my Firefox 20. Sure, they're both ancient, but everything else I needed works in both of them.
A good implementation will use HTML5 when available and fall back to fragments otherwise, but work fine in any browser. But in any case, the web is a moving target. At one point, everything was full page loads. Then there was AJAX and single page apps with fragments. Now HTML5 can do single page apps with fragmentless URLs. These are not overwhelmingly different approaches.
My feeling from this back and forth is that you want someone to declare for you that one of these is canonically more appropriate than the other, and it's just not like that. It depends on the app, the users, their devices, etc. Twitter was all about fragments for a good long while, and then they realized their mobile users were seeing too much latency and "time to first tweet" was too long, so they went back to server-side rendering of HTML with real data in it.
To your other point, about rendering on the server being "a lot of work": it's true, but some consider it the "holy grail" of web app development. Look at what airbnb has done with their rendr framework. See also Derby JS. My point being, if you decide you want rendering in both the browser and the server, you pick a framework that offers that. Not that you have a lot of options to choose from at the moment, granted, but I wouldn't advise hacking together your own.
If I display abc.jpg 20 times on a web page, does loading of the web page cause 20 http requests to the abc.jpg? Or it depends if I am using relative or absolute paths?
Thanks
It's down to the browser. A poorly written browser may request the same file multiple times, but any of the widely-used browsers will get this right. It shouldn't matter whether they are using relative or absolute paths (though mixing between relative and absolute paths on the same page might trip up some browsers, so you should probably avoid it).
It depends on the web browser, but any modern browser should only request it once.
It is up to the browser. A modern browser will try hard to cache the image. Use a consistent URL format in your requests where possible: consistent capitalization, don't use "www." one time and no "www." another time, etc.
Download Firebug and use the 'Net' tab to inspect all requests.
For this case, I agree with the other answers, any modern browser with proper settings should cache it.
It does depend on the browser settings but it also depends on what the web server tells the client to do with the image.
See this; it's quite complicated:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html
While I agree with the above statements, I suggest looking at your web server access log for the target image and comparing the referring page and browser fingerprint.
You will possibly see lots of HEAD requests rather than GETs, made to check that the cached file is up to date.
I've noticed an issue while developing my pages that's always bothered me: while Firefox (my general "dev" browser) always updates CSS and images when they change on the server, Internet Explorer does not always do this. Usually, I need to refresh the page in IE before it will ask the server for updated versions of things.
As I understand it, isn't the browser supposed to at least check the timestamps on all server-side objects for each request, and then update them client-side as necessary? Is there a way I can... not force, but... "encourage" the browser to do this for certain items?
The main issue I'm having here is that I have some JavaScript on my pages that relies on the CSS being initialized a certain way, and vice versa. When one updates and the other does not (very common in IE when both are in their own external files), it causes confusion, and occasional splatter results on the page. I can handle doing the "refresh the page" dance on my own for developing, but I don't want to have to encourage my users to "refresh the page or else" when I'm going on a scripting spree on the site.
Any advice would be greatly appreciated. The page itself updates without a hitch (it's PHP), so worst case scenario I could just spit the CSS and JavaScript out into the page itself, but that's really really ugly and of course I'm trying to avoid it at all costs.
It's a good practice to design your site so that the client only needs to fetch external JavaScript and CSS once.
Set up external resources to expire one year in the future. Append a version number to each file name, so instead of "style.css", use "style-1.2.4.css", and then when you want to update the client's CSS, increment the version number.
Incrementing the version number forces the client to download the updated file because it is seen as a totally separate resource.
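A minimal sketch of that filename-versioning idea, assuming PHP generates the page; the ASSET_VERSION constant is illustrative, and you would rename the file (or add a rewrite rule) each time you bump it:

<?php
// One central version constant, bumped on every release.
define('ASSET_VERSION', '1.2.4');
?>
<link rel="stylesheet" type="text/css"
      href="/css/style-<?php echo ASSET_VERSION; ?>.css"/>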
Add a little random query string at the end of all URLs. It's ugly but it works when developing. For example:
<link rel="stylesheet" type="text/css" href="/mycss.css?foo=<?php echo rand(); ?>"/>
You can do the same for scripts, background images, etc. (And don't forget to remove these things when your site goes live ;))
This is only an issue during development, when the files are changing frequently. Once you've published the site, visitors will get the current version, and the caching mechanisms will do more or less what you want.
Rather than modify how my sites work, I have simply developed the habit of clearing my cache before refreshes during development. In Firefox, as you said, this is not usually an issue, but if I need to be really sure, I use Shift+Ctrl+Del, Enter (and my Clear Private Data settings leave only "Cache" checked.) And on IE, there's the old Shift+F5.
Of course, as others have mentioned, a random query string on your volatile files can save you a few keystrokes. But understand that this is in fact just for your own convenience and not really necessary for the production site.
It's better if you put the last-change timestamp as GET data at the end of the URL - this way you'll be sure that there won't be any cache errors.
It's just dumb to remove the cache completely since it will result in slower pages.
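A minimal sketch of the timestamp approach, assuming the stylesheet lives under the document root so its modification time can be read with filemtime():

<link rel="stylesheet" type="text/css"
      href="/mycss.css?v=<?php echo filemtime($_SERVER['DOCUMENT_ROOT'] . '/mycss.css'); ?>"/>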
If you're using Apache, in your .htaccess for that CSS file:
<FilesMatch "mycssfile\.css$">
    Header set Cache-Control "private, pre-check=0, post-check=0, max-age=0"
    Header set Expires "0"
    Header set Pragma "no-cache"
</FilesMatch>
Browsers store pages in their internal cache, usually according to the instructions your server gives them. No, they are not required to reload files that appear to be unexpired based on the last check. Of course, when you hit reload they do explicitly reload the main page, but their behavior with respect to JS and styles may differ.
I develop with Safari which has Develop -> Disable Caches menu for exactly that reason. I'm sure Firefox has something similar.
I use a slight twist on DanHerbert's method. I tend to use the same names for the stylesheets, but I affix a version number in the query string, so that the browser sees the files as different whenever one chooses to increment the version. Things will still get cached, but you can invalidate them whenever you choose.
It's a bit cleaner than changing filenames and lends itself to being a centrally managed solution, especially in complex apps with lots of scripts and stylesheets coming from the four winds.