Why does View Source issue a new HTTP request? - google-chrome

I've noticed that both Firefox and Chrome issue a new HTTP request when you view the source for a web page that you've already loaded. It's particularly annoying when the page itself is slow to load or if it won't load at all.
Why is that? Wouldn't they have the existing source for the originally received page cached already? Is it based on Cache-Control headers?
This has been on my mind for a while (it usually comes up when looking at what's behind slow web apps).

In the context of Chrome, according to this link it is indeed based on Cache-Control headers.
...view-source grabs the html source from the http cache and pretty-prints it, but for the page NOT in http cache, it's 'forced to' make a new request.
To me, this makes sense. You wouldn't want to use what is currently rendered as the source of truth, since the HTML can obviously be manipulated dynamically. If you can't use that, then the http cache would be the next likely candidate for the source. If the source is unavailable from the cache, a subsequent GET of the source seems to be the only alternative.
This does, however, introduce another interesting dilemma raised here.
Requesting the URL again doesn't make sense as there is no guarantee that source received during the second request will match what was received during the first request.
I would imagine this was a conscious trade-off that was made to ensure that a view-source request is always satisfied in some form or another.

You need to do "Inspect Element" for the live web page. View Source reloads the page to show the source code without modification.

Related

Serve different resource depending on full URL of requesting page

Let's say that we have two pages:
https://www.example.com/first/firstpage.html
https://www.example.com/second/secondpage.html
that both load the resource https://www.example.com/resource.js
If I want the server that serves resource.js to be able to serve a different version of resource.js depending on which page the request is coming from, is there a reliable header upon which the full URL of the requesting page can be determined (or maybe there is some other way to determine this)?
I know that there is an Origin header, but from my understanding this just represents the domain (and any subdomains) without the full URL and query string. Is there any way for the server to know the full URL and query string that the request for the resource is coming from?
If this isn't possible, I know it would be easy to include that info in the JS script tag as follows:
<script src="/resource.js?origin=/first/firstpage.html"></script>
But I don't want to have to modify the script tag for each page. Is there some other way to have the page automatically include its own URL in the query string of the resource request (without having to dynamically load the resource using my own JS script - HTML only please!), or just any unique identifier so that the script tag doesn't have to be modified individually on each page?
There's the Referer header that you can use.
Make sure that your response uses Vary: Referer. Otherwise, browsers are going to cache this resource as if the referring page URL didn't matter.
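For illustration, here's a minimal sketch of that Referer-based serving, assuming a Node/Express server (the route, file names, and version check are invented for this example, not taken from the question):

// serve a different resource.js depending on the page that referenced it
const express = require('express');
const app = express();

app.get('/resource.js', (req, res) => {
  // Referer (when the browser sends it) carries the full URL of the referring page
  const referer = req.get('Referer') || '';

  // tell browsers and proxies that this response varies by Referer
  res.set('Vary', 'Referer');
  res.type('application/javascript');

  if (referer.includes('/first/firstpage.html')) {
    res.send('console.log("version for firstpage");');
  } else {
    res.send('console.log("default version");');
  }
});

app.listen(3000);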
I'd plead with you not to do this at all, though. You're going to create a rabbit hole of problems, as not all browsers or proxy servers are well behaved. Some are going to aggressively cache this anyway, no matter what you do with the Vary header.

SVG stacking, anchor elements, and HTTP fetching

I have a series of overlapping questions, the intersection of which can best be asked as:
Under what circumstances does a # character (a fragment identifier) in a URL trigger an HTTP fetch, in the context of either an <a href or an <img src ?
Normally, should:
http://foo.com/bar.html#1
and
http://foo.com/bar.html#2
require two different HTTP fetches? I would think the answer should definitely be NO.
More specific details:
The situation that prompted this question was my first attempt to experiment with SVG stacking - a technique where multiple icons can be embedded within a single svg file, so that only a single HTTP request is necessary. Essentially, the idea is that you place multiple SVG icons within a single file, and use CSS to hide all of them, except the one that is selected using a CSS :target selector.
You can then select an individual icon using the # character in the URL when you write the img element in the HTML:
<img
src="stacked-icons.svg#icon3"
width="80"
height="60"
alt="SVG Stacked Image"
/>
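For reference, the stacked file itself looks roughly like this (a simplified sketch - the actual icon ids and shapes don't matter, only the :target trick does):

<!-- stacked-icons.svg: every icon is hidden unless it is the :target of the URL fragment -->
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 80 60">
  <style>
    .icon { display: none; }
    .icon:target { display: inline; }
  </style>
  <g id="icon1" class="icon"><circle cx="40" cy="30" r="20"/></g>
  <g id="icon2" class="icon"><rect x="20" y="10" width="40" height="40"/></g>
  <g id="icon3" class="icon"><path d="M10 50 L40 10 L70 50 Z"/></g>
</svg>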
When I try this out on Chrome it works perfectly. A single HTTP request is made, and multiple icons can be displayed via the same svg url, using different anchors/targets.
However, when I try this with Firefox (28), I see via the Console that multiple HTTP requests are made - one for each svg URL! So what I see is something like:
GET http://myhost.com/img/stacked-icons.svg#icon1
GET http://myhost.com/img/stacked-icons.svg#icon2
GET http://myhost.com/img/stacked-icons.svg#icon3
GET http://myhost.com/img/stacked-icons.svg#icon4
GET http://myhost.com/img/stacked-icons.svg#icon5
...which of course defeats the purpose of using SVG stacking in the first place.
Is there some reason Firefox is making a separate HTTP request for each URL instead of simply fetching img/stacked-icons.svg once like Chrome does?
This leads into the broader question of - what rules determine whether an # character in the URL should trigger an HTTP request?
Here's a demo in Plunker to help sort out some of these issues; it displays stack.svg#circ and stack.svg#rect side by side.
A URI has a couple of basic components:
Protocol - determines how to send the request
Domain - where to send the request
Location - file path within the domain
Fragment - which part of the document to look in
Media Fragment URI
A fragment is simply identifying a portion of the entire file.
Depending on the implementation of the Media Fragment URI spec, it actually might be totally fair game for the browser to send along the Fragment Identifier. Think about a streaming video, some of which has been cached on the client. If the request is for /video.ogv#t=10,20 then the server can save space by only sending back the relevant portion/segment/fragment. Moreover, if the applicable portion of the media is already cached on the client, then the browser can prevent a round trip.
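For instance, a temporal media fragment can be used straight from markup (a minimal sketch; the file name is made up):

<!-- browsers that implement Media Fragments will seek to seconds 10-20 of the video -->
<video src="/video.ogv#t=10,20" controls></video>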
Think Round Trips, Not Requests
When a browser issues a GET Request, it does not necessarily mean that it needs to grab a fresh copy of the file all the way from the server. If it already has a cached version, the request could be answered immediately.
HTTP Status Codes
200 - OK - Got the resource and returned from the server
304 - Not Modified - The resource hasn't changed since the previous request, so the cached copy can be reused instead of transferring it again.
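To make the 304 case concrete, here's a rough sketch of a server doing that revalidation, assuming Node/Express (the route and ETag value are made up for illustration):

const express = require('express');
const app = express();

const etag = '"v1"'; // pretend this changes whenever bar.html changes

app.get('/bar.html', (req, res) => {
  res.set('ETag', etag);
  if (req.get('If-None-Match') === etag) {
    // nothing changed: 304 with no body, so the client reuses its cached copy
    return res.status(304).end();
  }
  // cache miss or first visit: send the full document with a 200
  res.type('html').send('<!doctype html><title>bar</title><p>hello</p>');
});

app.listen(3000);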
Caching Disabled
Various things can affect whether or not a client will perform caching: the way the request was issued (F5 for a soft refresh; Ctrl + F5 for a hard refresh), the settings on the browser, any development tools that add attributes to requests, and the way the server handles those attributes. Often, when a browser's developer tools are open, it will automatically disable caching so you can easily test changes to files. If you're trying to observe caching behavior specifically, make sure you don't have any developer settings that are interfering with this.
When comparing requests across browsers, to help mitigate the differences between developer tool UIs, you should use a tool like Fiddler to inspect the actual HTTP requests and responses that go out over the wire. I'll show you both for the simple Plunker example. When the page loads it should request two different ids in the same stacked svg file.
Here are side-by-side requests of the same test page in Chrome 39, FF 34, and IE 11.
Caching Enabled
But we want to test what would happen on a normal client where caching is enabled. To do this, update your dev tools for each browser, or in Fiddler go to Rules > Performance and uncheck Disable Caching.
Now our request should look like this:
Now all the files are returned from the local cache, regardless of fragment ID.
The developer tools for a particular browser might try to display the fragment ID for your own benefit, but Fiddler should always display the most accurate address that is actually requested. Every browser I tested omitted the fragment part of the address from the request.
Bundling Requests
Interestingly, Chrome seems to be smart enough to prevent a second request for the same resource, but FF and IE fail to do so when a fragment is part of the address. This is equally true for SVGs and PNGs. Consequently, the first time a page is ever requested, the browser will load one file for each time the SVG stack is actually used. Thereafter, it's happy to take the version from the cache, but this will hurt performance for new viewers.
Final Score
CON: First Trip - SVG Stacks are not fully supported in all browsers. One request made per instance.
PRO: Second Trip - SVG resources will be cached appropriately
Additional Resources
simurai.com is widely thought to be the first application of SVG stacks, and the article explains some browser limitations that are still being worked out.
Sven Hofmann has a great article about SVG Stacks that goes over some implementation models.
Here's the source code in Github if you'd like to fork it or play around with samples
Here's the official spec for Fragment Identifiers within SVG images
Quick look at Browser Support for SVG Fragments
Bugs
Typically, identical resources that are requested for a page will be bundled into a single request that satisfies all requesting elements. Chrome appears to handle this correctly already, but I've opened bugs for FF and IE so they can one day fix it.
Chrome - This very old bug, which highlighted this issue, appears to have been resolved for WebKit.
Firefox - Bug Added Here
Internet Explorer - Bug Added Here

HTML5 History API back button with partial page loads

To improve the performance/responsiveness of my website I have implemented a partial page load using AJAX, replaceState, pushState, and a popstate listener.
I essentially store the central part of my page (HTML) as my state object in the history. When a link is clicked I request just the central bit of the page from the server (identifying these requests with a different Accept header) and replace it with javascript. On popstate I grab the previous central part and push it back into the DOM.
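In code, the approach is roughly the following (a simplified sketch rather than my exact code - the container id and the partial Accept value are placeholders):

// remember the initial central content so the first history entry can be restored too
const central = document.querySelector('#central');
history.replaceState({ central: central.innerHTML }, '', location.href);

document.addEventListener('click', async (e) => {
  const link = e.target.closest('a');
  if (!link || link.origin !== location.origin) return;
  e.preventDefault();
  // ask the server for just the central fragment, flagged via the Accept header
  const res = await fetch(link.href, { headers: { Accept: 'text/x-partial' } });
  const html = await res.text();
  central.innerHTML = html;
  history.pushState({ central: html }, '', link.href);
});

window.addEventListener('popstate', (e) => {
  // put the stored central fragment back into the DOM on back/forward
  if (e.state && e.state.central) central.innerHTML = e.state.central;
});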
This mostly works fine, however I have found a particular issue which I am stuck on. It is a little complicated to explain, so my apologies if this isn't very clear.
There is a search form on most of our pages. The partial page loading via ajax is only on GET requests, and the form performs a POST which results in a full page load.
If I navigate the following set of pages, I end up on a malformed partial page consisting of ONLY the central content, without any of the surrounding dom:
Start on the Home Page (via full page load) - perform a Search (post-redirect-get)
Takes you to Search Results (via full page load) - then click Home
Returns you to the Home Page (via dynamic get) - click browser back
Search Results (from popstate listener) - click browser back
Malformed home page.
When the malformed home page appears, my popstate listener is not present at all.
What I think is happening, is that the second load (dynamic, partial) of the home page is being cached by the browser, and then when the full page back occurs, the browser is merely showing the cached partial response rather than the full page.
To try to remedy this I have added a Vary: Accept header to the response to let the browsers know that content may change based on the accept header. I have also added Cache-Control max-age=0, pragma no-cache, and a past expiry date to the partial loaded content to try to force the browser not to cache this, but none of this solves it.
Unfortunately my company does not allow external traffic to our dev servers, so I cannot show you the problem. I have looked at various similar questions on here, but none of them seem quite the same, nor do the solutions suggested seem to work.
If I add a pointless parameter (blah=blah) to my dynamic GET requests, this solves the problem. However this is an ugly hack that I'd rather not do. This seems like it should be solvable with headers, since I think it is a caching problem. Can anyone explain what's going on?
That's a caching issue. With the response header Cache-Control set to no-cache or max-age=0, the problem doesn't happen in FF (as you said), but it persists in Chrome.
The header that worked for me is Cache-Control: no-store. That's not consistently implemented by all the browsers (you can find questions asking what is the difference between no-cache and no-store), but has the result you expect in Chrome as well.
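On the server side, that could look something like this, assuming Express and a made-up Accept value for the partial requests (the route and markup are placeholders):

const express = require('express');
const app = express();

app.get('/home', (req, res) => {
  if (req.get('Accept') === 'text/x-partial') {
    // no-store keeps the partial response out of the browser's cache entirely
    res.set('Cache-Control', 'no-store');
    return res.send('<div id="central">partial home content</div>');
  }
  res.send('<!doctype html><div id="central">full home page</div>');
});

app.listen(3000);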
I had a similar issue. I'm building a web-based wizard and using jquery ajax to load each step (ASP.NET MVC on the back end).
The initial route is something like /Questions/Show - which loads the whole page and displays the first question (question 0). When the user clicks the Next image, it does a jquery .load() with the url /Questions/Save/0. This saves the answer and returns a partial view with the next question. The next save does a jquery .load() with /Questions/Save/1 ...
I implemented History.js so that the user can go back and forward (forth?). It stores the question number in the state data. When it detects a statechange (and the state's question number is different from what's on the page), it does a jquery .load() to load the correct question.
At first I was using the same route as when the initial page is loaded (/Questions/Show/X where X is the question number). On the back end I detected whether it was an ajax request, and if so returned a partial view instead of the full view. This is where the issue came in similar to yours: Say I was on question 3, went back, then forward, then went to www.google.com, and then hit the back button. It showed question 3 but it was the partial view - because the browser cached the partial for that route.
My solution was to create a separate route for the ajax call: /Questions/Load/X (it used common code on the back end). Now with two different routes (/Questions/Show for non-ajax and /Questions/Load for ajax), the browser shows the full page correctly in the above situation.
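A rough sketch of that wiring, assuming jQuery and History.js (the selectors and question numbers are made up, and the save step is omitted for brevity):

// load a question through the ajax-only route (/Questions/Load), never /Questions/Show
var currentQuestion = 0;

function loadQuestion(n) {
  $('#question-container').load('/Questions/Load/' + n);
}

$('#next').on('click', function () {
  // push the user-facing URL; the partial itself always comes from the other route
  History.pushState({ question: currentQuestion + 1 }, document.title,
                    '/Questions/Show/' + (currentQuestion + 1));
});

History.Adapter.bind(window, 'statechange', function () {
  var state = History.getState();
  if (state.data.question !== undefined && state.data.question !== currentQuestion) {
    currentQuestion = state.data.question;
    loadQuestion(currentQuestion);
  }
});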
So maybe a similar solution would work for you... i.e. two different routes for your home page - one for the full page and one for a partial. Hope that helps.
When a link is clicked I request just the central bit of the page from the server (identifying these requests with a different Accept header) and replace it with javascript.
Awesome. That's the RESTful way to do it. But there's one thing left to do to make it work: add a Vary header to the response.
Vary: Accept
That tells the browser that a request with a different Accept header might get a different response. Because the two requests use different Accept headers, the browser (and any caching proxies) will cache the responses separately.
Unlike setting Cache-Control: no-store, this will still allow you to use caching.
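For completeness, here's roughly what that looks like on the server, again assuming Express (the route, Accept value, and markup are placeholders):

const express = require('express');
const app = express();

app.get('/home', (req, res) => {
  // the response body depends on Accept, so caches must key on it
  res.set('Vary', 'Accept');
  if (req.get('Accept') === 'text/x-partial') {
    return res.send('<div id="central">partial home content</div>');
  }
  res.send('<!doctype html><div id="central">full home page</div>');
});

app.listen(3000);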

HTML5: Offline Cache is ignored if loading different page under the same URL

I have a web application that has a constant URL and internal state machine. The states are changed via posts. I know it is a bad design and I should use the rest approach. But given this I have a following problem.
I use the HTML5 offline cache (the manifest attribute on the html tag). For the first page it is parsed and cached as I would expect (the login page). But for the second page (the main menu) the manifest included there is not parsed. No events are shown inside the Chrome browser. If I change the URL a little by including a parameter, then the manifest is parsed, but not before.
Even if I include everything in the login page manifest, the second page downloads the same files again - even though they are specified in the manifest for the first page.
Why this behaviour?
To answer it myself: it was looking so odd simply because the manifest is only parsed on GET requests and ignored on POST requests, even if the POST loads another HTML page. To me this is a little bit silly, but it seems that is how it works.
Now it finally works as it is supposed to.

Problem with modifying a page with ajax, and the browser keeping the unmodified page in cache

I have a situation where my page loads some information from a database, which is then modified through AJAX.
I click a link to another page, then use the 'back' button to return to the original page.
The changes to the page through AJAX I made before don't appear, because the browser has the unchanged page stored in the cache.
Is there a way of fixing this without setting the page not to cache at all?
Thanks :)
Imagine that each request to the server for information, including the initial page load and each ajax request, is a distinct entity. Each one may or may not be cached anywhere between the server and the browser.
You are modifying the initial page that was served to you (and cached by the browser, in most cases) with arbitrary requests to the server and dynamic DOM manipulation. The browser has no capacity to track these changes.
You will have to maintain state, maybe using a cookie, in order to reconstruct the page. In fact, it seems to me that a dynamically generated document that you may wish to navigate to and from should definitely have a workflow defined that persists and retrieves its state.
Perhaps set a cookie for each manipulated element with the key that was sent to the server to get the data?
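Something along these lines, for instance (a rough sketch - the cookie naming scheme and the data endpoint are invented for illustration):

// remember which server key each manipulated element was last rendered from
function rememberChange(elementId, dataKey) {
  document.cookie = 'ajax_' + elementId + '=' + encodeURIComponent(dataKey) + '; path=/';
}

// re-apply the recorded changes, e.g. when the page is restored via the back button
function reapplyChanges() {
  document.cookie.split('; ').forEach(function (pair) {
    if (pair.indexOf('ajax_') !== 0) return;
    var elementId = pair.slice(5, pair.indexOf('='));
    var dataKey = decodeURIComponent(pair.slice(pair.indexOf('=') + 1));
    var el = document.getElementById(elementId);
    if (el) {
      fetch('/data/' + dataKey)
        .then(function (r) { return r.text(); })
        .then(function (html) { el.innerHTML = html; });
    }
  });
}

// pageshow also fires when the page comes back from the back/forward cache
window.addEventListener('pageshow', reapplyChanges);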