I tried to access some sites using an iframe. It seems to work for some sites but not for others. Any ideas why? How can I fix it? The src link in the iframe is valid; I can access it directly. It just doesn't show up in the iframe.
Is this related to X-Frame-Options? How do I get around it?
Thanks a lot.
========
Here is the HTML code:
<html>
<head>
<title>IFrame Test</title>
</head>
<body>
<iframe src="http://s.click.taobao.com/t?e=zGU34CA7K%2BPkqB07S4%2FK0CITy7klxn%2F7bvn0ay1PXvKwtZSEswjTW0qipvZgGECAqg4jFMvRrmqEsewCV2vrDIKlj1m7fQBXl8oXaLeyNdntQSWHx%2F4LzcWUbay3v1DbpVnqVPkubyxNIXPljUBoBNuLFC0ZLm4SV46zTasP3e6uSYcfiqrWYSyZV%2B7G&spm=2014.21191910.1.0" width="100%" height="600" >
</iframe>
</body>
</html>
Is this related to X-Frame-Options?
Yes, well, kinda. It's difficult, possibly impossible, to determine exactly what technique the server uses to prevent its content from being loaded in frames on other domains, but X-Frame-Options: SAMEORIGIN wasn't used. These are the response headers returned when attempting to load the web page in question in an iframe:
HTTP/1.1 302 Moved Temporarily
Server: Tengine
Date: Sun, 27 Jan 2013 06:24:22 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: close
Location: [too long, irrelevant, removed ;)]
Expires: Sun, 27 Jan 2013 06:24:22 GMT
Cache-Control: max-age=0
As you can see, the server responds with HTTP status code 302 (Moved Temporarily), which is often used instead of 403 Forbidden in a bid to keep such requests from overloading the web server (or to tell individual denied requests apart in log files). However, there isn't any X-Frame-Options: SAMEORIGIN header. Such a header might or might not be respected by various browsers anyway.
How do I get around it?
Short of doing something illegal and nasty on the web server involved, or knowing what conditions need to be satisfied for the web server to grant access to the requested location, that's impossible to say. You could kindly ask the admin of the web server to make an exception to this rule for your domain, though. Just a thought...
EDIT: Upon further investigation, the URL you provided actually redirects to
http://item.taobao.com/item.htm?id=13188785766&ali_trackid=2:mm_32761976_0_0:1359267664_4k8_1456591680&spm=2014.21191910.1.0
meaning there are some server-side rewrite rules involved. That doesn't help you much with your URL, but you could use the one it translates to, if all you want is to present the target page to the end user. Problem is, you lose any session data and/or tracking cookies that you might have wanted to force on the unsuspecting user to collect the spoils of some referral scheme. I'm not implying that's what you wanted to do, but if you did, putting the translated URL in the iframe source wouldn't work. ;)
Related
I'm using Chrome 40 (so something nice and modern).
Cache-Control: max-age=0, no-cache is set on all pages - so I expect the browser to only use something from its cache if it has first checked with the server and gotten a 304 Not Modified response.
However, on pressing the back button the browser merrily hits its own cache without checking with the server.
If I open the page I reached with the back button in a new tab instead, then the browser does check with the server (and gets a 303 See Other response, as things have changed).
See the screen captures below showing the output for the two different cases from the Network tab of the Chrome Developer Tools.
I thought I could use max-age=0, no-cache as a lighter weight alternative to no-store where I don't want users seeing stale data via the back button (but where the data is non-valuable and so can be cached).
My understanding of no-cache (see here and here on SO) is that the browser must always revalidate all responses. So why doesn't Chrome do this when using the back button?
Is no-store the only option?
200 response (from cache) on pressing back button:
303 response on requesting the same page in a new tab:
From http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1
no-cache
If the no-cache directive does not specify a field-name, then a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests.
If the no-cache directive does specify one or more field-names, then a cache MAY use the response to satisfy a subsequent request, subject to any other restrictions on caching. However, the specified field-name(s) MUST NOT be sent in the response to a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent the re-use of certain header fields in a response, while still allowing caching of the rest of the response.
Contrary to what the name implies, no-cache does not require that the response never be stored in a cache. It only specifies that the cached response must not be reused to serve a subsequent request without revalidation, so it's effectively shorthand for must-revalidate, max-age=0.
It is up to the browser to decide what qualifies as a subsequent request, and to my understanding using the back button does not. This behavior varies between browser engines.
no-store forbids the use of the cached response for all requests, not only for subsequent ones.
Note that even with no-store, the RFC actually permits the client to store the response for use in history buffers. That means a client may still use a cached response even when no-store has been specified.
The latter behavior covers cases where the page has been recorded with its original page title in the browser history. Another use case is the behavior of various mobile browsers, which will not discard the previous page until the following page has fully loaded, as the user might want to abort.
For clarification on the behavior of the back button: it is not subject to any cache header, according to http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.13
User agents often have history mechanisms, such as "Back" buttons and history lists, which can be used to redisplay an entity retrieved earlier in a session.
History mechanisms and caches are different. In particular history mechanisms SHOULD NOT try to show a semantically transparent view of the current state of a resource. Rather, a history mechanism is meant to show exactly what the user saw at the time when the resource was retrieved.
By default, an expiration time does not apply to history mechanisms. If the entity is still in storage, a history mechanism SHOULD display it even if the entity has expired, unless the user has specifically configured the agent to refresh expired history documents.
That means that disrespecting any cache control headers when using the back button is the recommended behavior. If your browser happens to honor a backdated expiration date or applies the no-store directive not only to the browser cache but also to the history, it's actually already departing from that recommendation.
As for how to solve it:
You can't, and you are not supposed to. If the user is returning to a previously visited page, most browsers will even try to restore the viewport. You may use a deferred mechanism such as AJAX to refresh the content if that was the original behavior before the user left the page, but otherwise you should not even modify the content.
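A minimal sketch of such a deferred refresh, assuming a hypothetical /latest-fragment endpoint and a content element (both placeholders); the pageshow event fires on back/forward navigation, and event.persisted is set when the page came out of the back/forward cache:
window.addEventListener('pageshow', function (event) {
    // event.persisted is true when the page was restored from the back/forward cache
    if (event.persisted) {
        refreshContent();
    }
});

function refreshContent() {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/latest-fragment', true); // hypothetical endpoint returning fresh markup
    xhr.onload = function () {
        if (xhr.status === 200) {
            document.getElementById('content').innerHTML = xhr.responseText;
        }
    };
    xhr.send();
}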
Have you tried using the full old set of no-cache headers?
<meta http-equiv="cache-control" content="max-age=0" />
<meta http-equiv="cache-control" content="no-cache" />
<meta http-equiv="expires" content="0" />
<meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT" />
<meta http-equiv="pragma" content="no-cache" />
This seems to work all the time, unless you are running a "pushState"-based web app.
Looks like this is a known 'quirk' in Chrome when using the back button. There is a good discussion of the issue in the bug report for it here:
https://code.google.com/p/chromium/issues/detail?id=28035
Sadly it looks like most people reverted to using no-store instead.
I expect, though, that most users are used to not getting a full page refresh when using the back button. Think of the many Angular or Backbone apps that manage the back action themselves, so that only the content is refreshed rather than the whole page. With this in mind, I suspect that having the customer see refreshed or updated content when they come back might not be that unexpected.
I have a web page that returns the following headers when I access material:
HTTP/1.1 200 OK
Date: Sat, 29 Jun 2013 15:57:25 GMT
Server: Apache
Content-Length: 2247515
Cache-Control: no-cache, no-store, must-revalidate, max-age=-1
Pragma: no-cache, no-store
Expires: -1
Connection: close
Using a chrome extension, I want to modify this response header so that the material is actually cached instead of wasting bandwidth.
I have the following sample code:
chrome.webRequest.onHeadersReceived.addListener(function (details)
{
    // removeHeader and updateHeader are helper functions defined elsewhere in the extension
    // Delete the unwanted headers
    removeHeader(details.responseHeaders, 'pragma');
    removeHeader(details.responseHeaders, 'expires');
    // Modify cache-control
    updateHeader(details.responseHeaders, 'cache-control', 'max-age=3600');
    console.log(details.url);
    console.log(details.responseHeaders);
    return {responseHeaders: details.responseHeaders};
},
{urls: ["<all_urls>"]}, ['blocking', 'responseHeaders']
);
Which correctly modifies the header to something like this (based on the console.log() output):
HTTP/1.1 200 OK
Date: Sat, 29 Jun 2013 15:57:25 GMT
Server: Apache
Content-Length: 2247515
Cache-Control: max-age=3600
Connection: close
But based on everything I have tried to check this, I cannot see any evidence whatsoever that this has actually happened:
The cache does not contain an entry for this file
The Network tab in the Developer Console shows no change at all to the HTTP response (I have tried even trivial modifications just to make sure it's not an error, but still no change).
The only real hints I can find are this question, which suggests that my approach still works, and this paragraph in the webRequest API documentation, which suggests that it won't (but doesn't explain why I can't see any changes whatsoever):
Note that the web request API presents an abstraction of the network
stack to the extension. Internally, one URL request can be split into
several HTTP requests (for example to fetch individual byte ranges
from a large file) or can be handled by the network stack without
communicating with the network. For this reason, the API does not
provide the final HTTP headers that are sent to the network. For
example, all headers that are related to caching are invisible to the
extension.
Nothing is working whatsoever (I can't modify the HTTP response headers at all), so I think that's my first concern.
Any suggestions at where I could be going wrong or how to go about finding what is going wrong here?
If it's not possible, are there any other ways to accomplish what I am trying to achieve?
I have recently spent some hours trying to get a file cached, and discovered that the chrome.webRequest and chrome.declarativeWebRequest APIs cannot force resources to be cached. In no way.
The Cache-Control (and other) response headers can be changed, but the change will only be visible to the getResponseHeader method, not in the caching behaviour.
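For example (a sketch with a placeholder URL), a page script will see the rewritten value even though the network stack still refuses to cache the resource:
var xhr = new XMLHttpRequest();
xhr.open('GET', 'http://example.com/material', true); // placeholder URL
xhr.onload = function () {
    // logs the rewritten value, e.g. "max-age=3600", yet the resource is still not cached
    console.log(xhr.getResponseHeader('Cache-Control'));
};
xhr.send();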
I'm facing an issue on downloading PDF files from Amazon S3 using Chrome.
When I click a link, my controller redirect the request to the file's URL on S3.
It works perfectly with Firefox, but nothing happens with Chrome.
Yet a right click -> Save location as will download the file just fine ...
And even copy-pasting the S3 URL into Chrome leads to a blank screen ...
Here is some information returned by curl:
Date: Wed, 01 Feb 2012 15:34:09 GMT
Last-Modified: Wed, 01 Feb 2012 04:45:24 GMT
Accept-Ranges: bytes
Content-Type: application/x-pdf
Content-Length: 50024
Server: AmazonS3
My guess is that this is related to an issue with the content type ... but nothing I tried has worked.
The canonical Internet media type for a PDF document is actually application/pdf as defined in The application/pdf Media Type (RFC 3778) - please note that application/x-pdf, while commonly encountered and listed as a media type in Portable Document Format as well, is notably absent from the official Application Media Types listed by the Internet Assigned Numbers Authority (IANA).
I'm not aware of why and when application/x-pdf came to life, but apparently the Chrome PDF Plugin does not open application/x-pdf documents as of today.
Consequently you should be able to trigger a different behavior in Chrome by changing the media type of the stored objects accordingly.
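One way to do that, sketched here with the AWS SDK for JavaScript and placeholder bucket/key names, is to copy each object onto itself while replacing its metadata:
var AWS = require('aws-sdk');
var s3 = new AWS.S3();

// Copy the object onto itself, replacing the stored Content-Type
s3.copyObject({
    Bucket: 'my-bucket',                     // placeholder
    Key: 'docs/manual.pdf',                  // placeholder
    CopySource: 'my-bucket/docs/manual.pdf', // placeholder
    ContentType: 'application/pdf',
    MetadataDirective: 'REPLACE'             // required so the new metadata replaces the old
}, function (err, data) {
    if (err) console.error(err);
    else console.log('Content-Type updated');
});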
Alternative (for authenticated requests)
Another approach would be to force the PDF to download instead of letting Chrome attempt to open it, which can be done by triggering a Content-Disposition: attachment header with your GET request - please see the S3 documentation for GET Object on how to achieve this via the response-content-disposition request parameter, specifically response-content-disposition=attachment, as demonstrated there in the section Sample Request with Parameters Altering Response Header Values.
This is only available for authenticated requests though, see section Request Parameters:
Note
You must sign the request, either using an Authorization header
or a Pre-signed URL, when using these parameters. They can not be used
with an unsigned (anonymous) request.
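A sketch of generating such a signed URL with the AWS SDK for JavaScript (bucket, key and expiry are placeholders):
var AWS = require('aws-sdk');
var s3 = new AWS.S3();

var url = s3.getSignedUrl('getObject', {
    Bucket: 'my-bucket',     // placeholder
    Key: 'docs/manual.pdf',  // placeholder
    Expires: 300,            // URL valid for five minutes
    ResponseContentDisposition: 'attachment; filename="manual.pdf"'
});
console.log(url); // hand this URL to the browser to force a download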
There is an HTML-based solution to this. Since Chrome is up to date with HTML5, we can use the shiny new download attribute!
Broken: a plain link to the S3 object, which Chrome tries (and fails) to render.
Works: the same link with the download attribute, which makes Chrome save the file instead.
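A rough sketch of both variants (the S3 URL is a placeholder; note that newer browsers may ignore the attribute on cross-origin URLs):
<!-- Broken in Chrome: it tries to render the application/x-pdf response and shows a blank page -->
<a href="https://s3.amazonaws.com/my-bucket/manual.pdf">View the PDF</a>

<!-- Works: the download attribute tells Chrome to save the file instead of rendering it -->
<a href="https://s3.amazonaws.com/my-bucket/manual.pdf" download>Download the PDF</a>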
After symcbean's answer I decided to change my question to:
What's the correct way to cache only images/CSS/JS, so that HTML is never cached by any web browser?
Go read some good books on the topic - or the specs. You're currently very badly informed.
A normal "trick" is to use:
Normal for whom? Setting Pragma: no-cache has nothing to do with what the browser caches. Setting Expires to -1 should prevent the current document from being cached - but it's an HTTP/1.0-only attribute - HTTP/1.1 has been in widespread use for the past 8 years.
However, this is a very expensive decision. The cost is retrieving all the images, CSS and JavaScript files on every request.
No - the example you've given is an HTML tag, which can only occur in an HTML file. By default (i.e. in the absence of any specific caching directives) browsers "MAY" use a cached file - in my experience, it's only some mobile devices which cache so aggressively - but none of them implement the requirement to warn the user about this (see RFC 2616 13.1.5).
Caching instructions (and indeed all meta-data) should be sent in the HTTP headers - the META tags provide a surrogate mechanism in some cases.
Have a google for Mark Nottingham's caching tutorial - it's a good starting point, but only a starting point.
Configure your server to send the Pragma: no-cache and Expires: ... headers with HTML content. It's trivial to do with Apache in a .htaccess file: just add a Files section with a pattern that matches any .html file and set the headers there using mod_headers, or better yet mod_expires.
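A rough sketch of such a section using mod_headers (the exact header values are up to you):
<FilesMatch "\.html$">
    Header set Cache-Control "no-cache"
    Header set Pragma "no-cache"
    Header set Expires "0"
</FilesMatch>
With mod_expires you could instead enable ExpiresActive On and use ExpiresByType text/html "access plus 0 seconds" to have Apache generate the Expires header for you.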
I've never seen this before, I've always known there was either GET or POST. And I can't find any good documentation.
GET sends variables via the URL.
POST sends them via the request body?
What does HEAD do?
It doesn't get used often, am I correct?
W3schools.com doesn't even mention it.
HTML’s method attribute only allows GET and POST.
The HEAD method is used to send a request and retrieve just the HTTP headers as the response. For example, a client application can issue a HEAD request to check the size of a file (from the HTTP headers) without downloading it. As Arjan points out, it's not even valid in HTML forms.
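To illustrate that file-size check from client-side JavaScript, a minimal sketch (the file URL is a placeholder):
var xhr = new XMLHttpRequest();
xhr.open('HEAD', '/files/report.pdf', true); // placeholder same-origin URL
xhr.onload = function () {
    // the headers arrive, but no body is downloaded
    console.log('Size in bytes:', xhr.getResponseHeader('Content-Length'));
};
xhr.send();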
HTTP method HEAD sends the response's headers but without a body; it's often useful, as the URL I've given explains, though hardly ever in a "form" HTML tag.
The only thing I can imagine is that the server may actually have been set up to validate the request method, to detect submissions by robots, which for HEAD might actually use a different method than a browser does (and thus reject those submissions).
A response to a HEAD request does not imply nothing is shown to the user: even a response to HEAD can very well redirect to another page. However, as Gumbo noted, it's not a valid value for the method of an HTML form, so this would require a lot of testing in each possible browser...
For a moment I wondered if HEAD in a form is somehow used to avoid accidental multiple submissions. But I assume the only useful response would be a 301 redirect, and that could also be used with GET or POST, so I don't see how HEAD would solve any issues.
A quick test in the current versions of both Safari and Firefox on a Mac shows that actually a GET is invoked. Of course, as this is undocumented behavior, one should not rely on that. Maybe for some time spam robots were in fact fooled into using HEAD (which would then be rejected on the server), or might have been fooled into skipping this form if they only support GET and POST. But even the dumbest robot programmer (aren't they all dumb for not understanding their work is evil?) would soon have learned that a browser converts this into GET.
(Do you have an example of a website that uses this? Are you sure there's no JavaScript that changes this, or does something else? Can anyone test what Internet Explorer sends?)
HEAD Method
The HEAD method is functionally like GET, except that the server replies with a response line and headers, but no entity-body. The following is a simple example which uses the HEAD method to fetch header information about hello.htm:
HEAD /hello.htm HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
The following will be the server's response to the above HEAD request:
HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
ETag: "34aa387-d-1568eb00"
Vary: Authorization,Accept
Accept-Ranges: bytes
Content-Length: 88
Content-Type: text/html
Connection: Closed
You can see that the server does not send any data after the headers.
-Obtained from tutorialspoint.com