SVG stacking, anchor elements, and HTTP fetching

I have a series of overlapping questions, the intersection of which can best be asked as:
Under what circumstances does a # character (an anchor) in a URL trigger an HTTP fetch, in the context of either an <a href or an <img src?
Normally, should:
http://foo.com/bar.html#1
and
http://foo.com/bar.html#2
require two different HTTP fetches? I would think the answer should definitely be NO.
More specific details:
The situation that prompted this question was my first attempt to experiment with SVG stacking - a technique where multiple icons are embedded within a single SVG file, so that only a single HTTP request is necessary. Essentially, you place multiple SVG icons within one file, use CSS to hide all of them, and reveal only the one selected via a CSS :target selector.
You can then select an individual icon using the # character in the URL when you write the img element in the HTML:
<img
src="stacked-icons.svg#icon3"
width="80"
height="60"
alt="SVG Stacked Image"
/>
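For reference, the stacked SVG file itself might look something like this (a minimal sketch; the ids, shapes, and viewBox are illustrative, not taken from the actual file):
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 80 60">
  <style>
    /* Hide every icon by default... */
    .icon { display: none; }
    /* ...and show only the one whose id matches the URL fragment */
    .icon:target { display: inline; }
  </style>
  <g id="icon1" class="icon"><circle cx="40" cy="30" r="25"/></g>
  <g id="icon2" class="icon"><rect x="10" y="10" width="60" height="40"/></g>
  <g id="icon3" class="icon"><path d="M10 55 L40 5 L70 55 Z"/></g>
</svg>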
When I try this out in Chrome it works perfectly. A single HTTP request is made, and multiple icons can be displayed via the same SVG URL, using different anchors/targets.
However, when I try this with Firefox (28), I see via the Console that multiple HTTP requests are made - one for each SVG URL! So what I see is something like:
GET http://myhost.com/img/stacked-icons.svg#icon1
GET http://myhost.com/img/stacked-icons.svg#icon2
GET http://myhost.com/img/stacked-icons.svg#icon3
GET http://myhost.com/img/stacked-icons.svg#icon4
GET http://myhost.com/img/stacked-icons.svg#icon5
...which of course defeats the purpose of using SVG stacking in the first place.
Is there some reason Firefox is making a separate HTTP request for each URL instead of simply fetching img/stacked-icons.svg once like Chrome does?
This leads into the broader question of - what rules determine whether an # character in the URL should trigger an HTTP request?

Here's a demo in Plunker to help sort out some of these issues. It displays the same stacked SVG file twice, once per fragment:
stack.svg#circ
stack.svg#rect
A URI has a few basic components:
Protocol - determines how to send the request
Domain - where to send the request
Location - file path within the domain
Fragment - which part of the document to look in
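For example, in http://myhost.com/img/stacked-icons.svg#icon3 the protocol is http, the domain is myhost.com, the location is /img/stacked-icons.svg, and the fragment is icon3. Only the first three determine what goes over the wire; the fragment is normally resolved by the client.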
Media Fragment URI
A fragment simply identifies a portion of the entire file.
Depending on the implementation of the Media Fragment URI spec, it might actually be totally fair game for the browser to send along the fragment identifier. Think about a streaming video, some of which has been cached on the client. If the request is for /video.ogv#t=10,20, then the server can save bandwidth by only sending back the relevant portion/segment/fragment. Moreover, if the applicable portion of the media is already cached on the client, the browser can avoid a round trip entirely.
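In fact, the Media Fragments spec sketches how a user agent could translate a temporal fragment into an HTTP Range request using a time unit. Few servers actually implement this, and the exchange below is illustrative only:
GET /video.ogv HTTP/1.1
Host: myhost.com
Range: t:npt=10-20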
Think Round Trips, Not Requests
When a browser issues a GET request, it does not necessarily need to grab a fresh copy of the file all the way from the server. If it already has a cached version, the request could be answered immediately.
HTTP Status Codes
200 - OK - The resource was fetched and returned from the server.
304 - Not Modified - The cached copy is still valid; not enough has changed since the previous request to warrant a fresh copy.
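Such a 304 revalidation round trip might look like this (the ETag value is made up for illustration). The client asks the server to confirm its cached copy, and the server answers with headers only, no body:
GET /img/stacked-icons.svg HTTP/1.1
Host: myhost.com
If-None-Match: "abc123"

HTTP/1.1 304 Not Modified
ETag: "abc123"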
Caching Disabled
Various things can affect whether or not a client will perform caching: the way the request was issued (F5 - soft refresh; Ctrl + F5 - hard refresh), the browser's settings, any development tools that add attributes to requests, and the way the server handles those attributes. Often, when a browser's developer tools are open, it will automatically disable caching so you can easily test changes to files. If you're trying to observe caching behavior specifically, make sure you don't have any developer settings that are interfering.
When comparing requests across browsers, to help mitigate the differences between developer tool UIs, you should use a tool like Fiddler to inspect the actual HTTP requests and responses that go out over the wire. I'll show you both for the simple plunker example. When the page loads, it should request two different ids in the same stacked SVG file.
Here are side-by-side requests of the same test page in Chrome 39, FF 34, and IE 11.
Caching Enabled
But we want to test what would happen on a normal client where caching is enabled. To do this, re-enable caching in each browser's dev tools, or in Fiddler go to Rules > Performance and uncheck Disable Caching.
With caching enabled, all the files are returned from the local cache, regardless of fragment ID.
The developer tools for a particular browser might display the fragment id for your own benefit, but Fiddler should always display the address that is actually requested. Every browser I tested omitted the fragment part of the address from the request.
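So a request for stacked-icons.svg#icon3 goes out over the wire looking something like this; the fragment never leaves the client:
GET /img/stacked-icons.svg HTTP/1.1
Host: myhost.com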
Bundling Requests
Interestingly, Chrome seems to be smart enough to prevent a second request for the same resource, but FF and IE fail to do so when a fragment is part of the address. This is equally true for SVGs and PNGs. Consequently, the first time a page is requested, the browser will load one file for each time the SVG stack is actually used. Thereafter, it's happy to take the version from the cache, but this will hurt performance for first-time viewers.
Final Score
CON: First Trip - SVG stacks are not fully supported in all browsers. One request is made per instance.
PRO: Second Trip - SVG resources will be cached appropriately
Additional Resources
simurai.com is widely credited as the first application of SVG stacks, and the article explains some browser limitations for which fixes are in progress
Sven Hofmann has a great article about SVG Stacks that goes over some implementation models.
Here's the source code in Github if you'd like to fork it or play around with samples
Here's the official spec for Fragment Identifiers within SVG images
Quick look at Browser Support for SVG Fragments
Bugs
Typically, identical resources requested for a page are bundled into a single request that satisfies all requesting elements. Chrome does this even when a fragment is present, and I've opened bugs so that FF and IE may one day fix this as well.
Chrome - This very old bug, which highlighted the issue, appears to have been resolved for WebKit.
Firefox - Bug Added Here
Internet Explorer - Bug Added Here

Related

Why does View Source issue a new HTTP request?

I've noticed that both Firefox and Chrome issue a new HTTP request when you view the source for a web page that you've already loaded. It's particularly annoying when the page itself is slow to load or if it won't load at all.
Why is that? Wouldn't they have the existing source for the originally received page cached already? Is it based on Cache-Control headers?
This has been on my mind for a while (usually, comes up when looking at what's behind slow web apps).
In the context of Chrome, according to this link it is indeed based on Cache-Control headers.
...view-source grabs the html source from the http cache
and pretty-prints it, but for the page NOT in http cache, it's 'forced to'
make a new request.
To me, this makes sense. You wouldn't want to use what is currently rendered as the source of truth, as obviously the HTML can be manipulated dynamically. If you can't use that, then the HTTP cache is the next likely candidate for the source. If the source is unavailable from cache, a subsequent GET of the source seems to be the only alternative.
This does, however, introduce another interesting dilemma, raised here.
Requesting the URL again doesn't make sense as there is no guarantee that source received during the second request will match what was received during the first request.
I would imagine this was a conscious trade-off that was made to ensure that a view-source request is always satisfied in some form or another.
You need to use "Inspect Element" for the live web page. View Source reloads the page to show the source code without modification.

How to find the HTTP request from google chrome inspect element?

Forgive me if I don't use the proper terminology. I have a webpage that I'm trying to scrape information from. The problem is that when I view the page source, the data I want to scrape is not there. I've encountered this problem before, where the main HTTP request triggers other requests, so the information I'm looking for is actually somewhere else, which I find using Google Chrome's inspect - Network feature. I manually search the various documents and XHR files for the one that has the correct information. This is sometimes long and tedious. I can also use Chrome's inspect feature to inspect the element that contains the information I want, which brings up the correct source code, but I can't seem to figure out where or how I can use that to quickly find the corresponding HTTP headers.
Restated in a short - can I use the inspect element feature of google chrome and then ask it to show me the corresponding network event (HTTP request) that produced that code?
I'll add the case study I'm working on.
http://www.flashscore.com/tennis/atp-singles/acapulco/results/
shows the different matches that took place at a tennis tournament. I'm trying to scrape the match hrefs, but if you view the source of the page you'll see they're not there.
Thanks
Restated in a short - can I use the inspect element feature of google chrome and then ask it to show me the corresponding network event (HTTP request) that produced that code?
No. This isn't something that the browser keeps track of.
In most situations, the HTTP response will pass through a good deal of Javascript code before being eventually turned into elements on the page. Tracing which HTTP response was "responsible" for a given element would involve a great deal of data flow analysis, and is impractical for a browser to do.
One way:
Open Firefox, install LiveHttpHeaders, then run it, and you will see the expected headers.
There's a similar addon for Google Chrome, but I haven't tested it.

Managing browser history in Dart

I'm building a single-page Dart web app that will essentially consist of 1 Dart file (cross-compiled to JS) and 1 HTML file that has several "views" (screens, pages, etc.) in it. Depending on what "view" the user is currently at, I will hide/enable different DOM elements defined inside this HTML file. This way the user can navigate between views without triggering multiple page loads.
I would still like to use each browser's native history-tracking mechanism, so that the user can click the back and forward buttons in the browser, and I'll have a Dart Historian object figure out what view to load (again, just hiding/enabling DOM elements) depending on what URL the browser has in its history.
I've pretty much figured everything out, with one exception:
Say the user is currently "at" View #3, which has a URL of, say, http://myapp.example.com/#view3. Then they click a button that should take them to View #4 at, say, http://myapp.example.com/#view4. I need a way, in Dart, to tell the browser to:
Set http://myapp.example.com/#view4 in the browser URL bar
Add http://myapp.example.com/#view4 to the browser's history
If not already enabled, enable the browser's back button
I believe I can accomplish #1 above like so:
window.location.href = "http://myapp.example.com/#view4";
...but maybe not. Either way, how can I accomplish this (Dart code communicates with browser's history API)?
Check out the route library.
angular.dart also has its own routing mechanism, but it's part of a much larger framework, so unless you plan on using the rest of it, I would recommend the stand-alone route library.
If you want to build your own solution, you can take a look at route's client.dart for inspiration.
There are two methods of history navigation supported:
The page fragment method that you've used. Reassign the window location to the new page fragment: window.location.assign(newPathWithPageFragment). Doing this will automatically add a new item to the browser history (which will then enable the back button).
The newer History API, which allows for regular URLs without fragments (e.g. http://myapp.example.com/view3). You can use window.history to control the history; a minimal sketch follows below. The History API is only supported by newer browsers, so that may be a concern (although given that dart2js also only supports newer browsers, there are probably not many browsers that dart2js supports that don't support the History API).
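Here's a minimal sketch of the history-driven navigation in Dart, assuming a hypothetical showView function that stands in for your own hide/enable logic:
import 'dart:html';

// Hypothetical: hide/enable the DOM elements for the given view.
void showView(String view) { /* app-specific */ }

void navigateTo(String view) {
  // Updates the URL bar and adds an entry to the browser history,
  // which also enables the back button once a second entry exists.
  window.history.pushState(null, '', '/#$view');
  showView(view);
}

void main() {
  // Restore the correct view when the user clicks back/forward.
  window.onPopState.listen((_) {
    showView(window.location.hash.replaceFirst('#', ''));
  });
}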
One issue you will have to handle if you support History API is the initial page load. When a user navigates to http://myapp.example.com/view3, the browser expects to find a resource at that location. You will have to setup your server to respond to any page request by serving your Dart application and then navigate to the correct view on the client-side. This issue will apply whether you use route, angular.dart, or build your own solution, since this is a general server-side issue and the above are all client-side libraries.

When multiple instances of same images are embedded in an HTML, does that load the image once?

If I use the same image within a single page multiple times, will each load separately, taking up the bandwidth and traffic, or will only one be loaded and rest embed code will reuse the image?
For example, let's say I did this:
<img src="http://img.to/image.jpg"/>
<img src="http://img.to/image.jpg"/>
<img src="http://img.to/image.jpg"/>
<img src="http://img.to/image.jpg"/>
<img src="http://img.to/image.jpg"/>
<img src="http://img.to/image.jpg"/>
...
<img src="http://img.to/image.jpg"/>
<img src="http://img.to/image.jpg"/>
and image.jpg is 100kb. When the browser loads this page, will it waste up (100Kb * # of img tags) of traffic? or will it just load one image.jpg and reuse it for the rest of the tags?
Browsers that implement the fifth version of the HTML specification may reuse images regardless of cache-related HTTP headers, so probably only a single network request will occur.
The specification defines an image key:
7.2 Let key be a tuple consisting of the resulting absolute URL, the img element's crossorigin attribute's mode, and, if that mode is not No CORS, the Document object's origin.
When the browser downloads the first image source, it adds it to the list of available images using the key:
If the download was successful and the user agent was able to determine the image's width and height
Set the img element to the completely available state.
Add the image to the list of available images using the key key.
When the browser sees another image with the same key, it takes it from the list of available images:
7.3 If the list of available images contains an entry for key, then set the img element to the completely available state, update the presentation of the image appropriately, queue a task to fire a simple event named load at the img element, and abort these steps.
Nevertheless, the browser may remove an image from the list of available images at any time:
Each Document object must have a list of available images. Each image in this list is identified by a tuple consisting of an absolute URL, a CORS settings attribute mode, and, if the mode is not No CORS, an origin. User agents may copy entries from one Document object's list of available images to another at any time (e.g. when the Document is created, user agents can add to it all the images that are loaded in other Documents), but must not change the keys of entries copied in this way when doing so. User agents may also remove images from such lists at any time (e.g. to save memory).
For more information see How does a list of available images used when parsing document with multiple img nodes with same src? issue in HTML Standard repository on GitHub.
Try it out: when looking into caching issues, a tool like Firebug for Firefox or the Developer Tools within Chrome is very beneficial. If you open the 'Net' panel in either and reload a page, you will see which HTTP status code was sent for each item. 304 (Not Modified) means that the item was retrieved locally from the cache.
As dthorpe says above, cache headers are important here. As well as making sure that 'no-cache' hasn't been set, if you have access to your server configuration you should be pro-active — if you know a resource isn't going to change you should make sure to set either an 'Expires' header (which tells browsers a date after which the cached copy should be considered stale) or a 'Cache-Control: max-age' header (which gives a number of days/hours rather than a set date).
You can set different time-scales for different mime-types/folders too, which allows you to get clients data to refresh HTML content often, but images and stylesheets rarely.
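For example, a response that is safe to cache for 30 days might carry headers like these (the values are illustrative):
Cache-Control: public, max-age=2592000
Expires: Thu, 31 Dec 2026 23:59:59 GMT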
Here's a nice intro video/article by Google that's worth checking out.
It may depend on the specific browser implementation, but I would expect the first reference to the image to hit the server and subsequent references to the same image URL to be served from the browser cache. So, only one network request for the image.
That is, IF the HTTP cache headers set by the server on the image response allow the browser to cache the image at all. If the cache header is set to something like "no-cache", then the browser is required to refetch the image for every reference. You can check to see what the HTTP headers on the image response are using a network packet sniffer like Fiddler.
If the browser doesn't populate the image URL in the browser cache until after the image has completely downloaded, then you could see multiple requests for the same image, but that seems very unlikely.

HTML5 History API back button with partial page loads

To improve the performance/responsiveness of my website I have implemented a partial page load using AJAX, replaceState, pushState, and a popstate listener.
I essentially store the central part of my page (HTML) as my state object in the history. When a link is clicked I request just the central bit of the page from the server (identifying these requests with a different Accept header) and replace it with javascript. On popstate I grab the previous central part and push it back into the dom.
This mostly works fine, however I have found a particular issue which I am stuck on. It is a little complicated to explain, so my apologies if this isn't very clear.
There is a search form on most of our pages. The partial page loading via ajax is only on GET requests, and the form performs a POST which results in a full page load.
If I navigate the following set of pages, I end up on a malformed partial page consisting of ONLY the central content, without any of the surrounding dom:
Start on the Home Page (via full page load) - perform a Search (post-redirect-get)
Takes you to Search Results (via full page load) - then click Home
Returns you to the Home Page (via dynamic get) - click browser back
Search Results (from popstate listener) - click browser back
Malformed home page.
When the malformed home page appears, my popstate listener is not present at all.
What I think is happening, is that the second load (dynamic, partial) of the home page is being cached by the browser, and then when the full page back occurs, the browser is merely showing the cached partial response rather than the full page.
To try to remedy this I have added a Vary: Accept header to the response to let browsers know that content may change based on the Accept header. I have also added Cache-Control: max-age=0, Pragma: no-cache, and a past Expires date to the partially loaded content to try to force the browser not to cache it, but none of this solves the problem.
Unfortunately my company does not allow external traffic to our dev servers, so I cannot show you the problem. I have looked at various similar questions on here, but none of them seem quite the same, nor do the solutions suggested seem to work.
If I add a pointless parameter (blah=blah) to my dynamic GET requests, this solves the problem. However this is an ugly hack that I'd rather not do. This seems like it should be solvable with headers, since I think it is a caching problem. Can anyone explain what's going on?
That's a caching issue. With the response header Cache-Control set to no-cache or max-age=0, the problem doesn't happen in FF (as you said), but it persists in Chrome.
The header that worked for me is Cache-Control: no-store. That's not consistently implemented by all the browsers (you can find questions asking what is the difference between no-cache and no-store), but has the result you expect in Chrome as well.
I had a similar issue. I'm building a web-based wizard and using jquery ajax to load each step (ASP.NET MVC on the back end).
The initial route is something like /Questions/Show, which loads the whole page and displays the first question (question 0). When the user clicks the Next image, it does a jQuery .load() with the URL /Questions/Save/0. This saves the answer and returns a partial view with the next question. The next save does a jQuery .load() with /Questions/Save/1...
I implemented History.js so that the user can go back and forward (forth?). It stores the question number in the state data. When it detects a statechange (and the state's question number is different from what's on the page), it does a jquery .load() to load the correct question.
At first I was using the same route as when the initial page is loaded (/Questions/Show/X where X is the question number). On the back end I detected whether it was an ajax request, and if so returned a partial view instead of the full view. This is where the issue came in similar to yours: Say I was on question 3, went back, then forward, then went to www.google.com, and then hit the back button. It showed question 3 but it was the partial view - because the browser cached the partial for that route.
My solution was to create a separate route for the ajax call: /Questions/Load/X (it used common code on the back end). Now with two different routes (/Questions/Show for non-ajax and /Questions/Load for ajax), the browser shows the full page correctly in the above situation.
So maybe a similar solution would work for you... i.e. two different routes for your home page - one for the full page and one for a partial. Hope that helps.
When a link is clicked I request just the central bit of the page from the server (identifying these requests with a different Accept header) and replace it with javascript.
Awesome. That's the RESTful way to do it. But there's one thing left to do to make it work: add a Vary header to the response.
Vary: Accept
That tells the browser that a request with a different Accept header might get a different response. Because the two requests use different Accept headers, the browser (and any caching proxies) will cache the responses separately.
Unlike setting Cache-Control: no-store, this will still allow you to use caching.
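Concretely, the full-page and partial requests might differ like this (the partial Accept value here is hypothetical; use whatever your AJAX code actually sends). Because the response carries Vary: Accept, caches store the two variants separately:
GET /home HTTP/1.1
Accept: text/html

GET /home HTTP/1.1
Accept: application/x-page-fragment

HTTP/1.1 200 OK
Vary: Accept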