I added a preload for my site like this <link rel=preload href=http://api.github.com/... as=fetch crossorigin=anonymous> which is fetch()ed later. It worked very well (the preload request sent to the remote server at the very beginning of loading, and answer came back, everything was fine).
Later I added an Authorization: Bearer ... to the fetch() call (because of other reasons), which is caused that the preload's HTTP headers do not match the later fetch's HTTP headers, so the entire preload result is not re-used anymore (Both Chrome and Firefox are correctly notify me about this).
I also tried to add the preload with Link HTTP header to the main page's response, but that did not help as well.
So the current sitation is this: I can't add the same Authorization header to preload request because it is simply not possible, so the two request are never will be the same, so the preload is useless anymore.
Please correct me and advise:
Is there a way to add that Authorization: Bearer ... to the preload request to?
OR is there a way to ask the browser to ignore that difference between to two request's headers?
OR any other idea?
Web standards do not allow for such a possibility.
The Fetch standard defines the conditions under which prefetched resources may be used as follows:
To fetch, given a request request, […] run the steps below. […]
If all of the following conditions are true:
[…]
request’s unsafe-request flag is not set or request’s header list is empty
then:
Let foundPreloadedResource be the result of invoking consume a preloaded resource for [request]
Having an Authorization header in the fetch request disqualifies it from reusing preloaded resources. Unless you happen to know of a non-standard extension that allows you to bypass this, this means there is no way to prefetch a resource with custom headers.
There are three ways you can resolve this: skip the Authorization header in the request proper, give up on preloading entirely, or reimplement prefetching yourself. That is, inject a script that fetches the resource early during page loading, preferably gated by network.connection && !network.connection.saveData, and stores it in your own cache, then simply look up the data there.
The order I listed those solutions is one of, in my opinion, decreasing appropriateness. Prefetching has been designed mostly for the sake of static resources that present the same to any user that may want to download them; as such, an Authorization header is not supposed to matter, so if you can get away with avoiding it, do. If authorization does matter, then maybe the resource isn’t such a great candidate for prefetching. If you insist though, you can do it manually.
Related
I have a page /data.txt, which is cached in the client's browser. Based on data which might be known only to the server, I now know that this page is out of date and should be refreshed. However, since it is cached, they will not re-request it for a long time (until the cache expires).
The client is now requesting a different page /foo.html. How can I make the client's browser re-request /data.txt and update its cache?
This should be done using HTTP or HTML (not all clients have JS).
(I specifically want to avoid the "cache-busting" pattern of appending version numbers to the /data.txt URL, like /data.txt?v=2. This fills the cache with useless entries rather than replacing expired ones.)
Edit for clarity: I specifically want to cache /data.txt for a long time, so telling the client not to cache it is unfortunately not what I'm looking for (for this question). I want /data.txt to be cached forever until the server chooses to invalidate it. But since the user never re-requests /data.txt, I need to invalidate it as a side effect of another request (for /foo.html).
To expand my comment:
You can use IF-Modified-Since and Etag, and to invalidate the resource that has been already downloaded you may take a look at the different approaches suggested in Clear the cache in JavaScript and fetch(), how do you make a non-cached request?, most of the suggestions there mentioned fetching the resource from JavaScript with no-cache header fetch(url, {cache: "no-store"}).
Or, if you can try sending a Clear-Site-Data header if your clients' browsers are supported.
Or maybe, give up this time only for the cache-busting solution. And if it's possible for you, rename the file to something else rather than adding a querystring as suggested in Revving Filenames: don’t use querystring.
Update after clarification:
If you are not maintaining a legacy implementation with users that already have /data.txt cached, the use of Etag And IF-Modified-Since headers should help.
And for the users with the cached versions, you may redirect to: /newFile.txt or /data.txt?v=1 from /foo.html. The new requests will have the newly added headers.
The first step is to fix your cache headers on the data.txt resource so it uses your desired cache policy (perhaps Cache-Control: no-cache in conjunction with an ETag for conditional validation). Otherwise you're just going to have this problem over and over again.
The next step is to get clients who have it in their cache already to re-request it. In general there's no automatic way to achieve this, but if you know they're accessing foo.html then it should be possible. On that page you can make an AJAX request to data.txt with the Cache-Control: no-cache request header. That should force the browser to bypass the cache and get a fresh version, and the cache should then be repopulated with the new version.
(At least, that's how it's supposed to work. I've never tried this, and I've seen reports here that browsers don't handle Cache-Control request headers properly.)
I'm trying to get data from this site: [1] https://www.eurobet.it/it/scommesse/#!/calcio/?temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI
I found this link where I can get the data in JSON format: [2] https://www.eurobet.it/detail-service/sport-schedule/services/discipline/calcio?prematch=1&live=0&temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI
But there is a problem:
The JSON link Doesn't work every time in fact sometimes I get a 404 error.
I noticed that if I open the first link [1] before opening the second [2] it works perfectly.
This error is also more frequent when I try to scrape other data on the same site: [3] https://www.eurobet.it/detail-service/sport-schedule/services/discipline/calcio/piu-giocate/u-o-goal?prematch=1&live=0&temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI
In this link [3] I try to get all "u-o-goal" odds but this link works only if (before starting my program to scrape data) in the main link [1] I press the "U/O GOAL" button -> https://i.stack.imgur.com/Nei5u.png
In my code, I'm using Java and htmlunit to scrape the data.
My question is: how this webpage works, why couldn't I open directly the links [2]/[3], I know that there is a sort of request and approval system behind but I can't see where.
You cannot directly open these URLs since the website (and many like it) will use cookies and bot-prevention techniques/session tracking so they can gather data about usage of their website. eg. they set a "Referer".
I'm not going to code a solution for you but I can at least help you understand what you need to do to get to where you want...
I've attempted to summarise how I'd typically unpick a request like this to recreate it, but in its essence, you need to understand the sequence of HTTP requests being made (this is how the web works - HTTP requests).
First you typically start with no session cookies and you access the site directly (no referer).
Once you access a website, typically the server responds with a session cookie for you to communicate back to the server a unique session ID so it has some sort of record of your browser having already been in contact.
Your browser may make more requests (asynchronously) and in doing so typically sends the cookies and the referring URL (usually the base Url will work... just don't use something that starts with something other than "https://www.eurobet.it"
anything else you're going to need to figure it out. Lots of headers are optional. Lots of query params have defaults.
https://stackoverflow.com/a/64671815/7619034 - here's an answer I've given before that answers this type of question which comes up often enough.
so to explain a bit further, for your specific scenario...
When you access https://www.eurobet.it/it/scommesse/#!/calcio/?temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI, the server responds with HTTP headers:
...
set-cookie: __cfduid=dd38d***********41125; ...
...
The rest doesn't look that relevant:
Going straight to the other request: https://www.eurobet.it/detail-service/sport-schedule/services/discipline/calcio?prematch=1&live=0&temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI
This HTTP request takes (as input):
cookie: __cfduid=dd38d***********41125; mbox=session#6661556c.....b6e8cc1fa6f03#1608242987; at_check=true; s_ecid=MCMID%***********2021453010; AMCVS_45F10C3A53DAEC9F0A490D4D%40AdobeOrg=1; AMCV_45F10C3A53DAEC9F0A490D4D%40AdobeOrg=1075005958%7CMCIDTS%7C18614%7CMCMID%7C91883906030825914429183258312021453010%7CMCAID%7CNONE%7CMCOPTOUT-1608248327s%7CNONE%7CvVersion%7C4.4.1; s_cc=true
...
referer: https://www.eurobet.it/it/scommesse/
...
x-eb-accept-language: it_IT
x-eb-marketid: 5
x-eb-platformid: 1
Cookies are set in an initial request (typically) using Set-Cookie header and then are passed back to the server in subsequent requests using the cookie header.
I'm not certain how many of these values are relevant but you'd need to figure out where each came from in the chain of HTTP requests between the initial one and this one and you'd need to replicate them (see url above of my previous answer - warning this can be time consuming).
The other headers can be set statically most likely since they probably aren't due to change.
If you have access to curl on the command line, you can attempt to reconstruct some of these requests by hand. Some will be time sensitive since cookies do expire after an amount of time (see set-cookie header details for exactly when). Once you've reconstructed a working request, you can then start coding it in your application.
If you can work all this out you should be able to re-construct the chain of HTTP GET requests to get the JSON data you want. Good luck!
I'm creating a TODO list app with nodejs and I need an option to delete a todo record. So I've created an HTML form that contains a delete button but I can only use POST as the method. actually my code is working fine but are there any problems with using POST to DELETE records?
No there isn't. Apart from GET and POST requests, most of the other HTTP method serve only semantic purposes. Except for some notable examples such as OPTIONS which is use to communicate supported methods. Using correct HTTP verbs will make your application / API easier to understand.
The same goes for HTTP status code. Functionality-wise, it doesn't really matter if you send a 200 (OK), 201(Created), or 202 (Accepted) for a successful request. However, sending the correct status code can avoid necessary confusion.
As far as I know, no. There're no hazards of using HTTP post method instead of DELETE. You can see how it's used in deletion here
One difference is that users might be tricked into making the POST request and inadvertently deleting an entry, if they are lured onto a malicious web page that contains an auto-submitting HTML form
<form method="POST" action="<your deletion endpoint>">
(Whether that is possible depends on the content type that your deletion endpoint expects.)
A DELETE request, however, cannot be forged in such a way. A malicious web page could create a DELETE request only with fetch or XMLHttpRequest, and because of the CORS protocol the browser would refuse to carry that out (unless your server explicitly allows it through a suitable CORS preflight response).
I'm studing HTML5's security problems. I saw all the presentations made by Shreeraj Shah. I tried to simulate a basic CSRF attack with my own servers using withCredentials tag sets to true (so in the response message the cookies should be replayed) and adding Content-Type sets to text/plain in the request (to bypass the preflight call).
When I tried to start the attack the browser told me that the XMLHttpRequest can not be accomplish because of the Access-Control-Allow-Origin header. So I put a * in the header of the victim's web page and the browser told me that I can't use the * character when I send a request with withCredentials sets to true.
I tried to make the same thing with the web apps stored in the same domain, and all was fine (I suppose it is because the browser doesn't check if the request comes from the same domain).
I'm asking, it's a new features that modern browsers set up recently to avoid this kind of problems?
Because in the Shreeraj's videos, the request was across different domains and it worked...
Thank you all and sorry for my english :-)
EDIT:
I think I found the reason why the CSRF attack doesn't work fine as in the Shreeraj's presentations.
I read the previous CORS document, published in 2010, and I found that there wasn't any recommendation about the with credential flag setted to true when Access-Control-Allow-Origin is set to *, but if we look at the last two publications about CORS (2012 and 2013), in the section 6.1, one of the notes is that we can't make a request using with credentials flag setted to true if the Access-Control-Allow-Origin is set to *.
Here are the links:
The previous one (2010): http://www.w3.org/TR/2010/WD-cors-20100727/
The last two (2012, 2013): http://www.w3.org/TR/2012/WD-cors-20120403/ --- http://www.w3.org/TR/cors/
Here is the section I'm talking about: http://www.w3.org/TR/cors/#supports-credentials
If we look at the previous document we can not find it, because there isn't.
I think this is the reason why the simple CSRF attack made in 2012 by Shreeraj Shah today doesn't work (of course in modern browsers that follow the w3c's recommendations). Could it be?
The request will still be made despite the browser error (if there's no pre-flight).
The Access-Control-Allow-Origin simply allows access to the response from a different domain, it does not affect the actual HTTP request.
e.g. it would still be possible for evil.com to make a POST request to example.com/transferMoney even though there are no CORS headers set by example.com using AJAX.
Quite simple really:
var req:URLRequest=new URLRequest();
req.url="http://somesite.com";
var header:URLRequestHeader=new URLRequestHeader("my-bespoke-header","1");
req.requestHeaders.push(header);
req.method=URLRequestMethod.GET;
stream.load(req);
Yet, if I inspect the traffic with WireShark, the my-bespoke-header is not being sent. If I change to URLRequestMethod.POST and append some data to req.data, then the header is sent, but the receiving application requires a GET not a POST.
The documentation mentions a blacklist of headers that will not get sent. my-bespoke-header is not one of these. It's possibly worth mentioning that the originating request is from a different port on the same domain. Nothing is reported in the policyfile log, so it seems unlikely, but is this something that can be remedied by force loading a crossdomain.xml with a allow-http-request-headers-from despite the fact that this is not a crossdomain issue? Or is it simply an undocumented feature of the Flash Player that it can only send custom headers with a POST request?
From what I can gather, it seems like your assumption about the lack of custom headers support for HTTP GET is indeed an undocumented feature (or a bug?) in the standard libraries.
In any case, you might want to see if as3httpclient would fit your purposes and let you work around this issue. Here's a relevant snippet from a post in the blog of the developer of this library:
"I was not able to set the header of a
HTTP/GET request. Macromedia Flash
Player allows you set the header only
for POST requests. I discussed this
issues with Ted Patrick and he told me
how I can us Socket to achieve the
desired and he was very kind to give a
me code-snippet, which got me
started."
If this limitation was undocumented at one time, that's no longer the case. See:
http://livedocs.adobe.com/flex/3/langref/flash/net/URLRequest.html#requestHeaders
"[...] Due to browser limitations, custom HTTP request headers are only supported for POST requests, not for GET requests. [...]"