I've seen this so often. What does it mean?
What is it used for? Is it just a simple subdomain or not?
It seems that sites with lots of visitors use it!
Any help or suggestion would be greatly appreciated...
An example site is below:
http://static1.cloob.com
thanks,
alireza.
This is a performance optimization practice.
The browser must send the site's cookies with EVERY HTTP request it makes to that domain. That means if your page loads 500 images from the same domain, the cookies are sent 500 times. That's obviously wasteful and slow. That's why separate domains and sub-domains are used to serve static content (images, stylesheets, Flash movies etc.) from a cookieless domain.
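As a rough illustration of the cookie overhead, here is a minimal sketch that counts how many requests on a page would carry a cookie. It assumes a host-only cookie set by www.example.com and made-up image URLs; real browsers apply more detailed matching rules (Domain, Path, Secure), so treat this as a simplification.

```python
from urllib.parse import urlsplit

def requests_with_cookie(urls, cookie_host):
    """Count requests whose host equals the host that set the (host-only) cookie."""
    return sum(1 for u in urls if urlsplit(u).hostname == cookie_host)

# Hypothetical page: 500 images, served either from the main host or a static host.
same_host = [f"http://www.example.com/img/{i}.png" for i in range(500)]
static_host = [f"http://static1.example.com/img/{i}.png" for i in range(500)]

print(requests_with_cookie(same_host, "www.example.com"))    # 500
print(requests_with_cookie(static_host, "www.example.com"))  # 0
```

With the images on the static host, none of the 500 image requests carries the cookie.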
Read a little more here, for example - Serving Static Content from a Cookieless Domain
It is a simple subdomain.
Using multiple subdomains for static content allows you to avoid the browser's max-connections-per-server limit.
There's a list of hundreds of URLs that need to be checked to determine whether the sites are live (someone has put up their own content, even if just a landing page), unreachable, or parked.
Unreachable is self-explanatory, but distinguishing between actual user content and a parked domain is trickier. By parked I mean, for example, someone who registers a domain through GoDaddy and leaves the default landing page in place, as opposed to a hosted site with unique content as its landing page.
Using HTTP status codes (2xx, 3xx, 4xx, etc.) isn't reliable. Does anyone know of a solution? It doesn't need to classify every site, it just needs to be right whenever it does give an answer, in order to minimise manual checking.
The best solution I can come up with is looking up the registrar for each site and comparing its HTML against other sites registered there, treating a similarity above 0.9 or so as a match. This is clunky.
Are there any ready-made solutions for this problem? If not, is there a more efficient methodology?
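For what it's worth, the similarity idea from the question can be sketched in a few lines. This is only an illustration, not a ready-made solution: the function name, the sample pages, and the use of difflib from the Python standard library are all assumptions, and the 0.9 threshold is the one floated above.

```python
from difflib import SequenceMatcher

def looks_parked(page_html, known_parked_pages, threshold=0.9):
    """Return True if the page is at least `threshold` similar to any known parked page."""
    return any(
        SequenceMatcher(None, page_html, parked, autojunk=False).ratio() >= threshold
        for parked in known_parked_pages
    )

# Hypothetical sample pages, invented for illustration.
parked_template = "<html><body><h1>This domain is parked</h1><p>Buy it today!</p></body></html>"
candidate = "<html><body><h1>This domain is parked</h1><p>Buy it now!</p></body></html>"
real_site = "<html><body><h1>Jane's bakery</h1><p>Fresh bread baked every morning.</p></body></html>"

print(looks_parked(candidate, [parked_template]))
print(looks_parked(real_site, [parked_template]))
```

Comparing full pages pairwise gets slow for large documents, so in practice you would probably compare only a trimmed portion of each page.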
There is a company I'm working with that says we are slowing down their web hosting software by hosting images on a separate domain.
I've told them that what we are doing should only speed them up because there will be fewer file requests to their server.
They replied by saying that because they use HTML 4.0, their server is having to make image requests on the server side before they send content to the user.
This makes no sense to me and I am trying to disprove this claim.
Am I wrong and just crazy?
I've been looking for articles on this for hours and have had no luck.
Proof that their statement is false would be greatly appreciated, and an article on this topic would be even more helpful.
Your mindset is correct. There is nothing about HTML4 that validates their claim in the context you provided us.
When you make a GET request to the server, you pull an HTML page. The browser then parses the document and makes additional requests, as declared in the document. Images are no exception. When it reaches an image, it will make a GET request to retrieve it from the specified URI. If that URI is on a different domain, the request goes to that domain, not to the one that served the page. The server does not make the GET request for you.
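To make that concrete, here is a small sketch that parses a page the way a browser would and lists the hosts the image requests would go to. The page markup and host names are invented for illustration, and it uses only Python's standard html.parser and urllib.parse modules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class ImageCollector(HTMLParser):
    """Collect the absolute URL of every <img src=...> in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Relative URLs resolve against the page; absolute ones stay as they are.
                self.image_urls.append(urljoin(self.base_url, src))

def image_hosts(html, base_url):
    """Return the host each image request would be sent to, in document order."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return [urlsplit(u).hostname for u in collector.image_urls]

# Hypothetical page with one local image and one hosted elsewhere.
page = '<html><body><img src="/logo.png"><img src="http://images.example.net/photo.jpg"></body></html>'
print(image_hosts(page, "http://www.example.com/index.html"))
# ['www.example.com', 'images.example.net']
```

The second image never touches www.example.com: the request goes straight from the browser to images.example.net.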
Now, they could be doing something special that would cause it to operate more slowly, but nothing about the HTML4 spec would lead to it.
It simply has nothing to do with HTML 4, because you can point any image at another server with a tag like <img src="http://other-server.de/bla.png" />. If you point these tags at your own hosting, it doesn't slow down their software, unless the tags point to their servers and those servers fetch the images from the remote server. The browser always loads a resource from the URL you put in the tag.
The exception would be if they rewrite the HTML automatically on the fly so that the tags point to their servers.
EDIT: Maybe the page loads slowly because your image hosting server is responding slowly?
I maintain a local intranet site that among other things, displays movie poster images from IMDB.com. Until recently, I simply had a perl script download the images I needed and save them to the local server. But that became a HUGE space-hog, so I thought I could simply point my site directly to IMDB servers, since my traffic is very minimal.
The result was that some images would display, while others wouldn't. And images that were displayed would sometimes disappear after a few refreshes. The images existed on the IMDB servers; they just wouldn't display on my page.
It seems unlikely to me that IMDB would somehow block this kind of access, but is that possible? Is there something that needs to be configured on my end?
I'm out of ideas - it just doesn't make sense to me.
I'm serving my pages with mod_perl and HTML::Mason, if that's relevant.
Thanks,
Ryan
Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8l DAV/2 mod_perl/2.0.4 Perl/v5.10.0
Absolutely they would block that kind of access. You're using their bandwidth, which they have to pay for, for your web site. Sites will often look at the referrer, see that it's not coming from their site, and either block or throttle access. Likely you're seeing this as an intermittent problem because IMDB is allowing you some amount of use of their images.
To find out more, look at the HTTP traffic on your client, either by using a browser plugin or by scripting it. Look at the HTTP response codes and you'll probably see some 4xx or 5xx responses.
I would suggest either caching the images in a cache that expires unused images, which will balance accesses against space, or perhaps getting a paid IMDB account. You may be able to get an API key to use to fetch images, indicating you are a paying customer.
IMDB sure could be preventing your 'bandwidth theft' by checking the "referer". More info here: http://www.thesitewizard.com/archive/bandwidththeft.shtml
Why is it intermittent? Maybe they only implement this on some of the servers in their web farm.
Just to add to the existing answers, what you're doing is called "hotlinking", and people who run websites don't like it very much. Google for "hotlink blocking".
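The kind of Referer check these answers describe can be sketched roughly as below. This is not IMDB's actual logic, which isn't known here; the allowed host names and the function name are hypothetical, and real sites may throttle rather than block outright.

```python
from urllib.parse import urlsplit

# Hypothetical: the hosts the image owner considers its own pages.
ALLOWED_HOSTS = {"www.example.com", "example.com"}

def allow_image_request(referer):
    """Return True if the request should be served, based only on the Referer header."""
    if not referer:
        # Assumption: requests with no Referer (direct visits, privacy tools) are let through.
        return True
    return urlsplit(referer).hostname in ALLOWED_HOSTS

print(allow_image_request("http://www.example.com/title/1"))  # True
print(allow_image_request("http://intranet.local/movies"))    # False
print(allow_image_request(None))                              # True
```

If only some servers in a farm run a check like this, the same image would load on one request and fail on the next, which would look intermittent from the outside.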
I have an ASP.NET web site technology that I use for scores of clients. Each client gets their own web site (a copy of the core site that can then be customized). The web site includes a fair amount of content - articles on health and wellness - that is loaded from a central content server. I can load the html for these articles from a central content server by copying from the content server and then inserting the text into the page as it is produced.
Easy so far.
However, these articles have image references that point back to the central server. The problem that I have is due to the fact that these sites are always accessed (every page) via an SSL link. When a page with an external image reference is loaded, the visitor receives a message that the page "contains both secure and insecure elements" (or something similar) because the images come from the (unsecured) server. There is really no way around this.
So, in your judgment, is it better to:
A) just put a cert on the content server so I can get the images over SSL? Are there problems there due to the page content having two certs? Any other thoughts?
B) change the links to the article presentation page so they don't use SSL? They don't need SSL, but the left side of the page contains lots of links to pages that do need it, all of which are now relative links. Making them all absolute links is grody because each client's site has its own URL, so all links would need to be generated in code (blech).
C) Something else that I haven't thought of? This is where I am hoping that someone with experience in the area will offer something brilliant!
NOTE: I know that I can not get rid of the warning about insecure elements - it is there for a reason. I am just wondering if anyone else has experience in this area and has a reasonable compromise or some new insight.
Not sure how feasible this is, but it may be possible to use a rewrite or proxy module to mirror the (img directory) structure of the central server on each clone. With such a rule in place you could use relative img URLs instead and silently rewrite all requests for these images internally over to the central server,
e.g.:
https://cloneA/banner.jpg -> http://central/static/banner.jpg
https://cloneB/topic7/img/header.jpg -> http://central/static/topic7/header.jpg
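The mapping in those two examples can be written out as a small function. This is only a sketch of the path translation, not a rewrite or proxy module configuration; the function name is made up, and the rule of dropping any "img" directory segment is inferred from the two examples above.

```python
from urllib.parse import urlsplit

CENTRAL_BASE = "http://central/static"  # central server location, taken from the examples

def to_central_url(clone_url):
    """Translate an image URL on a clone into the matching URL on the central server."""
    path = urlsplit(clone_url).path
    # Assumption: "img" directory segments exist only on the clones, so drop them.
    segments = [s for s in path.split("/") if s and s != "img"]
    return CENTRAL_BASE + "/" + "/".join(segments)

print(to_central_url("https://cloneA/banner.jpg"))
# http://central/static/banner.jpg
print(to_central_url("https://cloneB/topic7/img/header.jpg"))
# http://central/static/topic7/header.jpg
```

Because the translation happens on the clone's server, the browser only ever sees the https://clone... URL and raises no mixed content warning.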
I'd go with B.
Sadly, I think you'll find this is a fact of life with SSL. Even if you were to put a cert on the other server, I think the browser may still get confused because of the different sites [can't confirm nor deny though], and regardless, you don't want to waste your media server's time encrypting images.
I figured out a completely different way to import the images late last night after asking this question. In IIS, at least, you can set up "Virtual Directories" that can point essentially anywhere (I'm now evaluating whether to use a dedicated directory on each web server or a URL). If I use a dedicated directory on each server I will have three directories to keep up to date, but at least I don't have 70+.
Since each site will pull the images using resource locations found on the local site, I don't have to worry about changing the SSL status of any page.
We have several images and PDF documents that are available via our website. These images and documents are stored in source control and are copied as content on deployment. We are considering creating a separate image server to put our stock images and PDF docs on, thus significantly decreasing the bulk of our deployment package.
Does anyone have experience with this approach?
I am wondering about any "gotchas" - like XSS issues and/or browser issues delivering content from the alternate sub-domain?
Pro:
Many older browsers will only allocate two sockets to downloading assets from a single host (newer browsers typically allow about six, but a per-host limit still applies). So if index.html is downloaded from www.domain.com and it references 6 image files, 3 javascript files, and 3 CSS files (all on www.domain.com), the browser will download them 2 at a time, with the others blocking until a socket is free.
If you pull the 6 image files off onto a separate host, say images.domain.com, you get an extra two sockets dedicated to downloading your images. This parallelizes the asset download process so, in theory, your page could render twice as fast.
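The "twice as fast" figure follows from simple arithmetic, sketched below. It assumes every asset takes the same time to download and that each host is limited to two sockets, which is a simplification; the function name is made up.

```python
from math import ceil

def download_rounds(assets_per_host, sockets_per_host=2):
    """Rounds needed when hosts download in parallel, each limited to N sockets."""
    return max(ceil(count / sockets_per_host) for count in assets_per_host.values())

# 6 images + 3 javascript + 3 CSS files, all on one host.
single_host = {"www.domain.com": 12}
# The 6 images moved to a separate host.
split_hosts = {"www.domain.com": 6, "images.domain.com": 6}

print(download_rounds(single_host))  # 6
print(download_rounds(split_hosts))  # 3
```

Six rounds drop to three, which is where the theoretical factor of two comes from.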
Con:
If you're using SSL, you would need to either get an additional single-host SSL certificate for images.domain.com or a wildcard SSL certificate for *.domain.com (matches any subdomain). Failure to do so will generate a warning in the browser saying the page contains mixed secure and insecure content.
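One detail worth knowing about wildcard certificates is that the * stands for a single leftmost label, so *.domain.com covers images.domain.com but not domain.com itself or a.b.domain.com. A minimal sketch of that matching rule follows; the function name is made up and this is not a substitute for a real TLS library's hostname check.

```python
def wildcard_matches(pattern, hostname):
    """Match a hostname against a certificate name; '*' covers exactly one leftmost label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # Only the first label is wild; the rest must match exactly.
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels

print(wildcard_matches("*.domain.com", "images.domain.com"))  # True
print(wildcard_matches("*.domain.com", "domain.com"))         # False
print(wildcard_matches("*.domain.com", "a.b.domain.com"))     # False
```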
With a different domain, the browser will also not send the main site's cookie data with every request. This can improve performance.
Another thing not yet mentioned is that you can use different web servers to serve different sorts of content. For example, your static content could be served via lighttpd or nginx while still serving your dynamic content off Apache.
Pros:
-load balancing
-isolating a different functionality
Cons:
-more work (when you create a page on the main site you would have to maintain the resources on the separate server)
Things like XSS are a problem of code not sanitizing input (or output, for that matter). The only issue that could arise is if you have sub-domain-specific cookies that are used for authentication, but that's really a trivial fix.
If you're serving HTTPS and you serve an image from an HTTP domain, then you'll get browser security warnings popping up when the page loads.
So if you do HTTPS, you'll need to buy a certificate for your image domain as well if you don't want to annoy the hell out of your users :)
There are other ways around this, but it's not particularly in the scope of this answer - it was just a warning!