Downloading resources from external domains - html

We have an ASP.NET website (https://website.example.com) which loads external libraries (CSS and JavaScript) from a different sub-domain (https://library.example.com).
The resources we load via the library project are only CSS files and JavaScript plugins, which themselves don't make any requests (via AJAX).
When testing the website in normal environments, everything works fine.
However, opening it in Internet Explorer 8 returns an error:
internet explorer has modified this page to prevent cross site scripting
Could the fact that we are referencing external resources cause the error?
If yes, what would be the solution to fix this problem?
I would guess 90% of websites load resources from external domains (CDN servers, for example).

Here's one way: configure the X-XSS-Protection header on your server. Setting it to 0 tells IE to disable its XSS filter for your site. For illustration, here is a response that sends the header (this capture from google.com uses 1; mode=block, which instead enables the filter in blocking mode):
GET / HTTP/1.1
HTTP/1.1 200 OK
Date: Wed, 01 Feb 2012 03:42:24 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=6ddbc0a0342e7e63:FF=0:TM=1328067744:LM=1328067744:S=4d4farvCGl5Ww0C3; expires=Fri, 31-Jan-2014 03:42:24 GMT; path=/; domain=.google.com
Set-Cookie: NID=56=PgRwCKa8EltKnHS5clbFuhwyWsd3cPXiV1-iXzgyKsiy5RKXEKbg89gWWpjzYZjLPWTKrCWhOUhdInOlYU56LOb2W7XpC7uBnKAjMbxQSBw1UIprzw2BFK5dnaY7PRji; expires=Thu, 02-Aug-2012 03:42:24 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Transfer-Encoding: chunked
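For an ASP.NET site like the one in the question, a minimal way to send the header from IIS is the customHeaders section of web.config. A sketch (standard IIS configuration; adjust to your pipeline):
<system.webServer>
  <httpProtocol>
    <customHeaders>
      <!-- 0 turns the IE XSS filter off; "1; mode=block" would enable blocking mode -->
      <add name="X-XSS-Protection" value="0" />
    </customHeaders>
  </httpProtocol>
</system.webServer>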

Related

Cannot simulate this browser request using curl

I'm trying to scrape products from a site (e.g. https://www.violetgrey.com/en-us/shopping/the-rich-cream-18105401). While it loads normally in the browser, when I copy the initial request for the site as curl, it gives me access denied. This is all done in a local environment. So far, before copying the curl request from the browser dev tools, I have:
Disabled JS for the site
Cleared all my cache, cookies
Tried different browsers
Still, it's the same result: blocked via curl, while the exact same request works in my browser. Could anyone please point me in the right direction?
If you look at the response header you can see it comes from Cloudflare.
Cloudflare is evil. IMHO.
The HTTP status is 403 (HTTP/2 403), which means Forbidden.
The body is the text:
error code: 1020
Error 1020 can be roughly translated to "take your curl and go elsewhere. You and your curl are not wanted here."
Cloudflare profiles and fingerprints browsers. For example, they monitor the SSL/TLS handshake, and if your curl does not do the handshake exactly like the browser named in your User-Agent, they give you a 403 Forbidden and error code 1020.
And your request does not reach violetgrey.com. They do not even know you tried.
Cloudflare is political and blocks whatever traffic they want to. If it is in their best interest not to allow you through, they block you. For example, Cloudflare blocked me from accessing the US Patent and Trademark Office site. Not only that, but they sent out 3 XHR beacon requests to YouTube and Google Play. My Firefox blocked those requests. Cloudflare and Google are closely related. I do not trust either one of them.
There is no shortage of articles about your problem and possible fixes. Just search "Cloudflare 403 forbidden 1020 error". And maybe don't use Google to do the search.
Here is my effort to scrape your URL. I tried a few things, such as various User-Agent strings; I also tried wget.
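For reference, the invocation was along these lines (a sketch; the flags are standard curl options, and the User-Agent below is just one of the strings tried):
curl -v --compressed \
  -A "Mozilla/5.0 (X11; NetBSD amd64; rv:16.0) Gecko/20121102 Firefox/16.0" \
  -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8" \
  -H "Accept-Language: en-US,en;q=0.5" \
  https://www.violetgrey.com/en-us/shopping/the-rich-cream-18105401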
Request header
GET /en-us/shopping/the-rich-cream-18105401 HTTP/2
Host: www.violetgrey.com
user-agent: Mozilla/5.0 (X11; NetBSD amd64; rv:16.0) Gecko/20121102 Firefox/16.0
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
accept-language: en-US,en;q=0.5
accept-encoding: gzip, deflate, br
dnt: 1
alt-used: www.violetgrey.com
connection: keep-alive
upgrade-insecure-requests: 1
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: cross-site
sec-fetch-user: ?1
te: trailers
Response header:
HTTP/2 403
date: Thu, 27 Oct 2022 23:56:19 GMT
content-type: text/plain; charset=UTF-8
content-length: 16
x-frame-options: SAMEORIGIN
referrer-policy: same-origin
cache-control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
expires: Thu, 01 Jan 1970 00:00:01 GMT
server-timing: cf-q-config;dur=4.9999998736894e-06
vary: Accept-Encoding
server: cloudflare
cf-ray: 760f5e1ced6e8dcc-MIA
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400
Response Body:
error code: 1020

CORS Issue with woff2 fonts behind CDN in Chrome

I have an S3 bucket fronted by a CloudFront CDN. In that bucket, I have some woff2 fonts that were automatically tagged with the content type octet-stream. When trying to load such a font from a CSS file on a live production website, I get the following error:
Access to Font at 'https://cdn.example.com/fonts/my-font.woff2' from origin
'https://www.live-website.com' has been blocked by CORS policy:
No 'Access-Control-Allow-Origin' header is present on the requested resource.
Origin 'https://www.live-website.com' is therefore not allowed access.
The thing is, a curl reveals that the Access-Control-Allow-Origin header is present:
HTTP/1.1 200 OK
Content-Type: binary/octet-stream
Content-Length: 98488
Connection: keep-alive
Date: Wed, 08 Aug 2018 19:43:01 GMT
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET
Access-Control-Max-Age: 3000
Last-Modified: Mon, 14 Aug 2017 14:57:06 GMT
ETag: "<redacted>"
Accept-Ranges: bytes
Server: AmazonS3
Age: 84847
X-Cache: Hit from cloudfront
Via: 1.1 <redacted>
X-Amz-Cf-Id: <redacted>
Everything is working fine in Firefox, so I guess that Chrome is doing an extra validation that blocks my font.
Turns out that the problem was actually with the Content-Type. As soon as I changed the content type to application/font-woff2 and invalidated the cache of these woff2 files, everything went through smoothly.
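If you manage the bucket with the AWS CLI, one way to apply that fix is to copy the fonts over themselves with a corrected Content-Type and then invalidate them in CloudFront. A sketch (bucket name, path, and distribution ID are placeholders; font/woff2 is the registered MIME type for WOFF 2.0, and application/font-woff2 also worked here):
# Rewrite the Content-Type on every .woff2 object under /fonts/
aws s3 cp s3://your-bucket/fonts/ s3://your-bucket/fonts/ \
  --recursive --exclude "*" --include "*.woff2" \
  --content-type "font/woff2" --metadata-directive REPLACE
# Then purge the cached copies at the edge
aws cloudfront create-invalidation --distribution-id YOUR_DISTRIBUTION_ID --paths "/fonts/*"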
My problem with CORS and multiple domains was that CloudFront was caching the first response, so I had to whitelist the Origin header in the distribution's cache behavior. And it works.

Chrome + CORS + cache - requesting same file from two different origins

I'm experiencing an issue with Chrome that I can't seem to fully understand, I'm curious if folks here have dealt with it before. This doesn't reproduce in Firefox. The steps are as follows:
Start incognito Chrome, navigate to https://foo.mysite.com and have the JS on the page make a GET ajax request to S3 for https://s3.amazonaws.com/mystuff/file.json. You get back a 200 response with:
HTTP/1.1 200 OK
x-amz-id-2: somestuffhere
x-amz-request-id: somestuffhere
Date: Tue, 14 Oct 2014 03:06:41 GMT
Access-Control-Allow-Origin: https://foo.mysite.com
Access-Control-Allow-Methods: GET
Access-Control-Max-Age: 3000
Access-Control-Allow-Credentials: true
Vary: Origin, Access-Control-Request-Headers, Access-Control-Request-Method
Cache-Control: max-age=86400
Content-Encoding: gzip
Last-Modified: Sun, 05 Oct 2014 00:29:53 GMT
ETag: "fe76607baa40a793eb3b3cbd373a3fb8"
Accept-Ranges: bytes
Content-Type: application/json
Content-Length: 5609
Server: AmazonS3
Open a second tab, navigate to https://bar.mysite.com and have its JS make a GET ajax request to S3 for the same file, https://s3.amazonaws.com/mystuff/file.json. You get back the following 304 response:
HTTP/1.1 304 Not Modified
x-amz-id-2: somestuffhere
x-amz-request-id: somestuffhere
Date: Tue, 14 Oct 2014 03:06:58 GMT
Access-Control-Allow-Origin: https://bar.mysite.com
Access-Control-Allow-Methods: GET
Access-Control-Max-Age: 3000
Access-Control-Allow-Credentials: true
Vary: Origin, Access-Control-Request-Headers, Access-Control-Request-Method
Cache-Control: max-age=86400
Last-Modified: Sun, 05 Oct 2014 00:29:53 GMT
ETag: "fe76607baa40a793eb3b3cbd373a3fb8"
Server: AmazonS3
Open a third tab, navigate to https://foo.mysite.com (the first site) and repeat the same steps as in 1. Chrome kills the response for CORS reasons and reports the following:
XMLHttpRequest cannot load https://s3.amazonaws.com/mystuff/file.json. The 'Access-Control-Allow-Origin' header has a value 'https://bar.mysite.com' that is not equal to the supplied origin. Origin 'https://foo.mysite.com' is therefore not allowed access.
What's the story here? This doesn't reproduce in Firefox. In Firefox I'm happily getting a 304 in both steps 2 and 3, which I would expect to see in Chrome as well.
A temporary workaround for this issue in Chrome is to set Cache-Control: no-cache on the file in S3, but then I'm forcing our clients to re-download that file for no good reason, so it's not a real solution.
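For what it's worth, that workaround can be applied from the AWS CLI by copying the object over itself with new metadata (a sketch; the bucket and key are the question's example, and --metadata-directive REPLACE is required when rewriting metadata in place):
# Stamp Cache-Control: no-cache onto the S3 object
aws s3 cp s3://mystuff/file.json s3://mystuff/file.json \
  --cache-control "no-cache" --content-type "application/json" \
  --metadata-directive REPLACE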
Is this intended and documented behavior? Is this a bug with Chrome? Any other thoughts?
Looks like this is caused by Chromium issue 260239
I found this blog post that helped: Add Vary headers to S3.
It helped by adding Vary headers to all XHR requests.
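For reference, the bucket-level CORS configuration behind responses like those above looks roughly like this (a sketch using the question's two origins; S3 echoes the matching origin back in Access-Control-Allow-Origin and adds the Vary headers shown in the captures):
<CORSConfiguration>
  <CORSRule>
    <!-- each origin that may fetch file.json -->
    <AllowedOrigin>https://foo.mysite.com</AllowedOrigin>
    <AllowedOrigin>https://bar.mysite.com</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
  </CORSRule>
</CORSConfiguration>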
I did run into a problem with HTML requests (i.e. an <img> tag), but I was able to overcome that by using workaround #2 described here: https://serverfault.com/a/856948
TL;DR of hack #2 is to use a "dummy" query string parameter that differs between HTML and XHR, or is absent from one or the other. Example:
<img src="https://s3.amazonaws.com/mystuff/image.png?x-request=html">
I just add a timestamp to the request URL to force loading the asset from S3 again rather than from cache, such as xxxx?timestamp=yyyy
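A minimal sketch of that cache-buster in the page's JavaScript (the URL is the question's file; Date.now() makes every request URL unique, so Chrome can never reuse a cached response stamped with the other origin's Access-Control-Allow-Origin):
var xhr = new XMLHttpRequest();
// Unique query string per request: defeats the cache at the cost of re-downloading
xhr.open('GET', 'https://s3.amazonaws.com/mystuff/file.json?timestamp=' + Date.now());
xhr.onload = function () {
  console.log(JSON.parse(xhr.responseText));
};
xhr.send();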

Chrome loads a text/html file but shows status "failed" and does not render on screen

Still facing a weird issue with Google Chrome. I have a text/html page generated from PHP source code. This page loads well and is displayed correctly in all popular browsers but Chrome. When the source code has just been saved, Chrome loads and renders the file the right way, even if I simply added or removed a space character. Next, if I try to refresh the page, Chrome displays a blank page and shows an error in the Developer Tools panel indicating a "failed" status. But if I check the HTTP response headers, everything seems to be fine, including the HTTP status: 200 OK.
HTTP/1.1 200 OK
Date: Mon, 17 Sep 2012 08:37:03 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.3.14
Expires: Mon, 17 Sep 2012 08:52:03 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
Below are the HTTP response headers just after saving the source and getting a correct rendering. No change (except time-related information):
HTTP/1.1 200 OK
Date: Mon, 17 Sep 2012 08:56:06 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.3.14
Expires: Mon, 17 Sep 2012 09:11:07 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
I also checked the HTTP request headers, they are the same in both cases:
Working case:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
Cache-Control:max-age=0
Connection:keep-alive
Cookie:PHPSESSID=qn01olb0lkgh3qh7tpmjbdbct1
Host:(hidden here, but correct, looks like subsubdomain.subdomain.domain.tld)
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1
Failing case:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
Cache-Control:max-age=0
Connection:keep-alive
Cookie:PHPSESSID=qn01olb0lkgh3qh7tpmjbdbct1
Host:(also hidden)
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1
I noticed that even when the page fails, some other resources (JavaScript, style sheets) are successfully loaded or retrieved from the local cache. I can also view the HTML source code successfully every time, whether the page gets rendered or not (the HTML code contains what it's expected to contain).
I also ran Wireshark in order to see if there would be something wrong while transferring data or something, but everything seems to be OK on this side too.
I read something on Google about Content-Length making Chrome fail if the value provided in the HTTP headers differs from the actual size of the delivered file. That does not seem to be the case here, as Content-Length is not provided.
Any help would be welcome! Thanks!
To those who are still encountering the issue: add
SetEnv downgrade-1.0
to your .htaccess; for me it works as a temporary fix. It disables chunked transfers.
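If you'd rather not downgrade every client, mod_setenvif's BrowserMatch directive can set the same environment variable only for the affected browser. A sketch (the "Chrome" regex is an assumption; match whichever user agent misbehaves for you):
# Force HTTP/1.0 semantics (no chunked transfers) only when the UA matches
BrowserMatch "Chrome" downgrade-1.0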
Another solution is to set the user agent to IE; apparently that has the same effect...
The actual issue is somehow related to the server configuration; I am also seeing segmentation fault errors.
The site where I am encountering this behaviour runs PHP 5.3.27 over HTTPS.
Edit: it doesn't matter whether the site is accessed over HTTPS or HTTP.
Edit 2: the cause of this error was one extremely long line; splitting it into multiple lines solved the problem.

Chrome serving resource from cache when it is not present in cache-manifest

I am playing around with HTML 5 cache manifests, and I am seeing a very strange issue in Chrome. Here's the page's header:
<html id="html" xmlns="http://www.w3.org/1999/xhtml" manifest="Portal/CacheManifestHandler.ashx">
Here are the manifest contents captured from fiddler:
HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Type: text/cache-manifest; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Fri, 20 Apr 2012 15:56:20 GMT
Content-Length: 56
CACHE MANIFEST
NETWORK:
*
#Timestamp: 634705337615835020
I have one particular script in the page's header, inside a <script> tag, that is generated dynamically on the server. Here are the contents returned for that script tag the first time the user accesses the page:
HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Type: text/javascript; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Fri, 20 Apr 2012 15:36:33 GMT
Content-Length: 74
document.location='/Portal/Login.aspx?ReturnUrl=%2fPortal%2fDefault.aspx';
You can see that the script is neither in the cache manifest, nor do its headers allow the browser (Chrome) to cache it.
Still, when I subsequently open the same page in the browser, Chrome loads the page from the manifest cache, which is okay.
However, surprisingly, it also loads the <script> from cache. I can verify this, as my server breakpoints are not hit, nor does Fiddler show a request for this <script>. The network is not down and the server is accessible (this should not have made a difference anyway, because Chrome was asked not to cache this <script>).
Is this the expected behavior? Shouldn't Chrome have requested the <script> again from the server, even when its containing page was loaded from the manifest cache?
Chrome's chrome://appcache-internals also shows only 2 URLs in the cache, which again is fine; why then does it load the <script> from cache and not from the server?
We had the same issue; our resolution was to stick a * in the NETWORK section of our app.manifest, so our network section looked like:
NETWORK:
*
I'm now digging to see if that's really "by design" for Google or just plain wrong.
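For context, a complete manifest using that pattern might look like the sketch below (file names are placeholders; anything not listed under CACHE: falls through to the network because of the * wildcard):
CACHE MANIFEST
# v1 2012-04-20
CACHE:
styles.css
app.js
NETWORK:
*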