I maintain a local intranet site that, among other things, displays movie poster images from IMDB.com. Until recently, I simply had a perl script download the images I needed and save them to the local server. But that became a HUGE space-hog, so I thought I could simply point my site directly at IMDB's servers, since my traffic is very minimal.
The result was that some images would display, while others wouldn't, and images that did display would sometimes disappear after a few refreshes. The images existed on the IMDB servers; they just wouldn't display on my page.
It seems unlikely to me that IMDB would somehow block this kind of access, but is that possible? Is there something that needs to be configured on my end?
I'm out of ideas - it just doesn't make sense to me.
I'm serving my pages with mod_perl and HTML::Mason, if that's relevant.
Thanks,
Ryan
Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8l DAV/2 mod_perl/2.0.4 Perl/v5.10.0
Absolutely they would block that kind of access. You're using their bandwidth, which they have to pay for, for your web site. Sites will often look at the referrer, see that it's not coming from their own site, and either block or throttle access. You're likely seeing this as an intermittent problem because IMDB is allowing you some amount of use of their images.
To find out more, look at the HTTP traffic on your client, either by using a browser plugin or by scripting it. Look at the HTTP response codes and you'll probably see some 4xx or 5xx responses.
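If it helps, here's one way to script that check. This is just a rough Node.js sketch (any HTTP client would do), and the image URLs are placeholders for whatever your pages actually embed; it requests each image once with no Referer and once with the Referer your pages would send, and prints the status codes:

    // Probe a few hotlinked image URLs and log the HTTP status codes,
    // once with no Referer and once with the Referer your pages would send.
    const https = require('https');

    const imageUrls = [
      'https://images.example-cdn.com/poster1.jpg',   // placeholder URL
      'https://images.example-cdn.com/poster2.jpg',   // placeholder URL
    ];

    function probe(url, referer) {
      const headers = referer ? { Referer: referer } : {};
      https.get(url, { headers }, (res) => {
        console.log(`${res.statusCode}  referer=${referer || 'none'}  ${url}`);
        res.resume();                                  // discard the body; only the status code matters
      }).on('error', (err) => console.error(`error  ${url}: ${err.message}`));
    }

    for (const url of imageUrls) {
      probe(url, null);                                // as a plain client
      probe(url, 'http://intranet.example/');          // as your page would send it
    }

If the bare requests succeed but the ones sending your site as the Referer come back 403 (or requests start failing after a burst), that's referrer checking or throttling at work.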
I would suggest either caching the images in a cache that expires unused entries (that will balance access against space) or perhaps getting a paid IMDB account. You may be able to get an API key for fetching images that identifies you as a paying customer.
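To illustrate the expiring cache, here is a minimal sketch of the eviction side (Node.js for brevity, though a Perl cron job would do the same job); the cache directory and the 30-day cutoff are assumptions, and your existing download script would keep populating the directory on a cache miss:

    // Delete cached poster images that have not been accessed for 30 days.
    const fs = require('fs');
    const path = require('path');

    const CACHE_DIR = '/var/cache/posters';            // hypothetical cache location
    const MAX_AGE_DAYS = 30;

    const cutoff = Date.now() - MAX_AGE_DAYS * 24 * 60 * 60 * 1000;
    for (const name of fs.readdirSync(CACHE_DIR)) {
      const file = path.join(CACHE_DIR, name);
      const { atimeMs } = fs.statSync(file);           // last access time
      // atime requires a filesystem that tracks access times (no noatime mount option)
      if (atimeMs < cutoff) fs.unlinkSync(file);       // unused for too long: evict
    }

Run that from cron and the cache stays roughly proportional to what's actually being viewed.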
IMDB could well be preventing your 'bandwidth theft' by checking the referer. More info here: http://www.thesitewizard.com/archive/bandwidththeft.shtml
Why is it intermittent? Maybe they only implement this on some of the servers in their web farm.
Just to add to the existing answers, what you're doing is called "hotlinking", and people who run websites don't like it very much. Google for "hotlink blocking".
A company I'm working with says we are slowing down their web hosting software by hosting images on a separate domain.
I've told them that what we are doing should only speed them up, because there will be fewer file requests to their server.
They replied by saying that because they use HTML 4.0, their server is having to make image requests on the server side before they send content to the user.
This makes no sense to me, and I am trying to disprove their claim.
Am I wrong and just crazy?
I've been looking for articles on this for hours and have had no luck.
Proof that their statement is false would be greatly appreciated, and an article on this topic would be even more helpful.
Your mindset is correct. There is nothing about HTML4 that validates their claim in the context you provided us.
When you make a GET request to the server, you pull down an HTML page. The browser then parses the document and makes additional requests, as declared in the document. Images are no exception: when the browser reaches an image, it makes a GET request to the specified URI to retrieve it. If that URI is on a different domain, the request goes to that domain, not back to the original server. The server does not make the GET request for you.
Now, they could be doing something special that would cause it to operate more slowly, but nothing about the HTML4 spec would lead to it.
This simply has nothing to do with HTML 4, because every image can point at any server via its tag, e.g. <img src="http://other-server.de/bla.png" />. So if you point these tags at your own hosting solution, it doesn't slow their software down, unless you point the tags at their servers and their servers then fetch the images from the remote server themselves. The browser always loads resources from the URL you put in the tag.
The exception would be if they rewrite the HTML automatically on the fly so that the tags end up pointing at their servers.
EDIT: Maybe the page loads slowly simply because your image-hosting server is responding slowly?
As far as I know, as of late 2011 the max-connections-per-server limit in Chrome remains 6. Please correct me if I am wrong. It's bad that we cannot change this easily, as we can in Firefox; as far as I know, this value is hardcoded.
One solution is to download the Chromium sources and rebuild them. Is there an easier solution?
Is there any trick to work around this without creating a dozen mirror domains?
Why I'm asking: my task is to create an HTML/JavaScript slideshow that will run in a fullscreen browser on a huge monitor hanging on the wall. The JavaScript is really complicated; it preloads photos and makes a lot of AJAX calls to my web services. If the Wi-Fi connection is slow and six photos are already loading, the AJAX calls fail and the application runs badly. I want a quick fix based on an HTTP, browser, or Ubuntu tweak, or something else, because rebuilding the JavaScript app will take days.
Off-topic: do you know of anything else that could be tweaked in my particular situation?
IE is even worse, with a limit of 2 connections per domain. But I wouldn't rely on fixing client browsers. Even if you have control over them, browsers like Chrome will auto-update, and a future release might behave differently than you expect. I'd focus on solving the problem within your system design.
Your choices are to:
Load the images in sequence so that only 1 or 2 XHR calls are active at a time (use the success event from the previous image to check whether there are more images to download and start the next request; a sketch follows this list).
Use sub-domains like serverA.myphotoserver.com and serverB.myphotoserver.com. Each sub-domain will have its own connection-limit pool, which means you could have 2 requests going to each of 5 different sub-domains if you wanted to. The downside is that the photos will be cached per sub-domain. BTW, these don't need to be "mirror" domains; you can just add DNS records pointing at the exact same website/server. That way you don't have the headache of administering many servers, just one server with many DNS records.
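To make option 1 concrete, here is a minimal browser-side sketch that loads the photos strictly one at a time, so the rest of the per-host connection pool stays free for the AJAX calls. The URL list is an assumption (and incidentally shows what option 2's sharded sub-domains would look like):

    // Load photos one at a time; start the next download only when the
    // previous one has finished (or failed).
    var photoUrls = [
      'http://serverA.myphotoserver.com/photo1.jpg',
      'http://serverB.myphotoserver.com/photo2.jpg'    // sharded across sub-domains
    ];

    function loadNext(index) {
      if (index >= photoUrls.length) return;           // all photos handled
      var img = new Image();
      img.onload = function () { loadNext(index + 1); };   // success: move on
      img.onerror = function () { loadNext(index + 1); };  // failure: skip it, keep going
      img.src = photoUrls[index];                      // kick off the download
    }

    loadNext(0);

The same pattern works for the AJAX calls themselves if you chain them off the success callbacks instead of firing them all at once.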
I don't know that you can do it in Chrome outside of Windows -- some Googling shows that Chrome (and therefore possibly Chromium) might respond well to a certain registry hack.
However, if you're just looking for a simple solution without modifying your code base, have you considered Firefox? In the about:config you can search for "network.http.max" and there are a few values in there that are definitely worth looking at.
Also, for a device that will not be moving (i.e. it is mounted in a fixed location) you should consider not using Wi-Fi (even a Home-Plug would be a step up as far as latency / stability / dropped connections go).
BTW, the HTTP/1.1 specification (RFC 2616) suggests no more than 2 connections per server:
Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy. A proxy SHOULD use up to 2*N connections to another server or proxy, where N is the number of simultaneously active users. These guidelines are intended to improve HTTP response times and avoid congestion.
There doesn't appear to be an external way to hack the behaviour of the executables.
You could modify the Chrome(ium) executables, as this limit is obviously compiled in. That approach brings a lot of problems with support and automatic upgrades, so you probably want to avoid it. You would also need to understand how to make the changes to the binaries, which is not something most people can pick up in a few days.
If you compile your own browser you are creating a support issue for yourself as you are stuck with a specific revision. If you want to get new features and bug fixes you will have to recompile. All of this involves tracking Chrome development for bugs and build breakages - not something that a web developer should have to do.
I'd follow #BenSwayne's advice for now, but it might be worth thinking about doing some of the work outside of the client (the web browser) and putting it in a background process running on the same or different machines. This process can handle many more connections and you are just responsible for getting the data back from it. Since it is local(ish) you'll get results back quickly even with minimal connections.
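As a sketch of that idea (assumptions: a Node.js helper, a photos.txt list of plain-HTTP URLs, and a prefetched/ directory that the slideshow's own web server already serves as static files), a background prefetcher might look like this:

    // Download the whole photo list with high concurrency into a local
    // directory, so the browser only ever loads images from the local host.
    const fs = require('fs');
    const http = require('http');
    const path = require('path');

    const urls = fs.readFileSync('photos.txt', 'utf8').split('\n').filter(Boolean);
    const OUT_DIR = 'prefetched';                      // served as static files locally
    const CONCURRENCY = 20;                            // far beyond the browser's 6

    fs.mkdirSync(OUT_DIR, { recursive: true });
    let next = 0;

    function fetchOne() {
      if (next >= urls.length) return;                 // nothing left for this worker
      const url = urls[next++];
      const file = fs.createWriteStream(path.join(OUT_DIR, path.basename(url)));
      http.get(url, (res) => {
        res.pipe(file);
        file.on('finish', fetchOne);                   // keep the worker busy
      }).on('error', fetchOne);                        // skip failures, keep going
    }

    for (let i = 0; i < CONCURRENCY; i++) fetchOne();

The slideshow then points its img tags at the local prefetched/ copies, so the browser's six connections per host only ever go to the machine sitting next to it.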
I have an ASP.NET web site technology that I use for scores of clients. Each client gets their own web site (a copy of the core site that can then be customized). The web site includes a fair amount of content - articles on health and wellness - that comes from a central content server: I copy the HTML from the content server and insert the text into the page as it is produced.
Easy so far.
However, these articles have image references that point back to the central server. The problem that I have is due to the fact that these sites are always accessed (every page) via an SSL link. When a page with an external image reference is loaded, the visitor receives a message that the page "contains both secure and insecure elements" (or something similar) because the images come from the (unsecured) server. There is really no way around this.
So, in your judgment, is it better to:
A) just put a cert on the content server so I can get the images over SSL? Are there problems there due to the page content having two certs? Any other thoughts?
B) change the links to the article presentation page so they don't use SSL? They don't need SSL, but the left side of the page contains lots of links to pages that do need it - all of which are now relative links. Making them all absolute links is grody because each client's site has its own URL, so all the links would need to be generated in code (blech).
C) Something else that I haven't thought of? This is where I am hoping that someone with experience in the area will offer something brilliant!
NOTE: I know that I can not get rid of the warning about insecure elements - it is there for a reason. I am just wondering if anyone else has experience in this area and has a reasonable compromise or some new insight.
Not sure how feasible this is, but it may be possible to use a rewrite or proxy module to mirror the image directory structure on each clone to that of the central server. With such a rule in place you could keep using relative image URLs and internally rewrite all requests for these images over to the central server, silently.
e.g.:
https://cloneA/banner.jpg -> http://central/static/banner.jpg
https://cloneB/topic7/img/header.jpg -> http://central/static/topic7/header.jpg
I'd go with B.
Sadly, I think you'll find this is a fact of life with SSL. Even if you were to put a cert on the other server, I think it may still get confused because of the different sites [can't confirm nor deny though], and regardless, you don't want to waste your media server's time encrypting images.
I figured out a completely different way to import the images late last night after asking this question. In IIS, at least, you can set up "Virtual Directories" that can point essentially anywhere (I'm now evaluating whether to use a dedicated directory on each web server or a URL). If I use a dedicated directory on each server I will have three directories to keep up to date, but at least I don't have 70+.
Since each site will pull the images using resource locations found on the local site, then I don't have to worry about changing the SSL status of any page.
I am thinking that, to save server load, I could load common JavaScript files (the jQuery source) and maybe certain images from sites like Google (which are almost never down, and always pretty fast, maybe faster than my server).
Will it save much load?
Thanks!
UPDATE: I am not so much worried about saving bandwidth as I am about reducing server load, because my server has difficulty when there are a lot of users online, and I think this is because too many images/files are being loaded from my single server.
You might consider putting up another server that does nothing but serve your static files, using an ultra-efficient web server such as lighttpd.
This is known as a content delivery network, and it will help, although you should probably make sure you need one before you go about setting it all up. I have heard okay things about Amazon S3 for this (which Twitter, among other sites, uses to host images and such). Also, you should consider Google's AJAX Libraries API if you are using any popular JavaScript libraries.
Well, there are a couple of things in principle:
Serving up static resources (.htm files, image files, etc.) rarely makes a server breathe hard, except under the most demanding of circumstances (thousands of requests in a very short period of time).
Google's network is most likely faster than yours, and most everyone else. ;)
So if you are truly not experiencing any bandwidth problems, I don't think offloading your images, etc. will do much for you. However, as you move stuff off to Google, it frees up your server's bandwidth for more concurrent requests and faster transfers on the existing ones. The only tradeoff is that clients will experience a slight (most likely unnoticeable) initial delay while DNS resolves the other servers and the browser opens connections to them.
It really depends on what your server load is like now. Are there lots of small web pages and lots of users? If so, then the 50K taken up by jQuery could mean a lot. If all of your pages are fairly large, and/or you have a small user base, caching jQuery with Google might not help much. Same with the pictures. That said, I have heard anecdotal reports (here on SO) that loading your scripts from Google does indeed provide a noticeable performance improvement. I have also heard that Google does not necessarily have 100% uptime (though it is close), and when it is down it is damned inconvenient.
If you're suffering from speed problems, putting your scripts at the bottom of the web page can help a lot.
I'm assuming you want to save costs by offloading commonly used resources to the web at large.
What you're suggesting is called hotlinking: directly linking to other people's content. While it can work in most cases, you do lose control of the content, which means your website may change without your input. Since images hosted on Google are scoured from other websites, they may be copyrighted, which is a (potential) concern, or the original hosts may have anti-hotlinking measures that block the images from your web page.
If you're just working on a hobby website, you can consider hosting your resources on a free web account to save bandwidth.
We have several images and PDF documents that are available via our website. These images and documents are stored in source control and are copied content on deployment. We are considering creating a separate image server to put our stock images and PDF docs on - thus significantly decreasing the bulk of our deployment package.
Does anyone have experience with this approach?
I am wondering about any "gotchas" - like XSS issues and/or browser issues delivering content from the alternate sub-domain?
Pro:
Many browsers will only allocate two sockets to downloading assets from a single host. So if index.html is downloaded from www.domain.com and it references 6 image files, 3 javascript files, and 3 CSS files (all on www.domain.com), the browser will download them 2 at a time, with the other blocking until a socket is free.
If you pull the 6 image files off onto a separate host, say images.domain.com, you get an extra two sockets dedicated to download your images. This parallelizes the asset download process so, in theory, your page could render twice as fast.
Con:
If you're using SSL, you would need to either get an additional single-host SSL certificate for images.domain.com or a wildcard SSL certificate for *.domain.com (matches any subdomain). Failure to do so will generate a warning in the browser saying the page contains mixed secure and insecure content.
With a different domain, you will also avoid sending cookie data with every request, which can improve performance.
Another thing not yet mentioned is that you can use different web servers to serve different sorts of content. For example, your static content could be served via lighttpd or nginx while still serving your dynamic content off Apache.
Pros:
- load balancing
- isolating different functionality
Cons:
- more work (when you create a page on the main site, you have to maintain the resources on the separate server)
Things like XSS are a problem of code not sanitizing input (or output, for that matter). The only issue that could arise is if you have sub-domain-specific cookies that are used for authentication, but that's really a trivial fix.
If you're serving HTTPS and you serve an image from an HTTP domain, you'll get browser security warnings popping up when the page is used.
So if you do HTTPS, you'll need to buy a certificate for your image domain as well if you don't want to annoy the hell out of your users :)
There are other ways around this, but it's not particularly in the scope of this answer - it was just a warning!