Why does Chrome send a HEAD request? Example in logs:
2013-03-04 07:43:51 W3SVC7 NS1 GET /page.html 80 - *.*.*.* HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.22+(KHTML,+like+Gecko)+Chrome/25.0.1364.97+Safari/537.22
2013-03-04 07:43:51 W3SVC7 NS1 HEAD / - 80 - *.*.*.* HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.22+(KHTML,+like+Gecko)+Chrome/25.0.1364.97+Safari/537.22
I have a ban system, and these HEAD requests are really annoying; each one arrives in exactly the same second as the GET request.
What is the nature of it? Any help is appreciated.
p.s: I noticed that the head requests are all only to my homepage.
RFC 2616 states:
9.4 HEAD
The HEAD method is identical to GET except that the server MUST NOT
return a message-body in the response. The metainformation contained
in the HTTP headers in response to a HEAD request SHOULD be identical
to the information sent in response to a GET request. This method can
be used for obtaining metainformation about the entity implied by the
request without transferring the entity-body itself. This method is
often used for testing hypertext links for validity, accessibility,
and recent modification.
The response to a HEAD request MAY be cacheable in the sense that the
information contained in the response MAY be used to update a
previously cached entity from that resource. If the new field values
indicate that the cached entity differs from the current entity (as
would be indicated by a change in Content-Length, Content-MD5, ETag
or Last-Modified), then the cache MUST treat the cache entry as
stale.
Most likely it is trying to verify that the client's cookie/session is still valid with the server.
It appears that in a recent Chrome release (or at least recently when making calls to my API; I hadn't seen it until today), Chrome has started logging warnings about responses being blocked by CORB.
Cross-Origin Read Blocking (CORB) blocked cross-origin response [domain] with MIME type text/plain. See https://www.chromestatus.com/feature/5629709824032768 for more details.
I have determined that the requests to my API are succeeding, and that it's the pre-flight OPTIONS request that is triggering the warning in console.
The application calling the API is not explicitly making the OPTIONS request; as I have come to understand, the browser sends it automatically when making a cross-origin request.
I can confirm that the OPTIONS request response does not have a mime-type defined. However, I am a little confused as it is my understanding that an OPTIONS response, is only headers, and does not contain a body. I do not understand why such a request would require a mime-type to be defined.
Moreover, the console warning says the request was blocked; yet the various POST and GET requests, are succeeding. So it looks as though the OPTIONS request isn't actually being blocked?
This is a three-part question:
Why does an OPTIONS request require a mime-type to be defined, when there is no body response?
What should the mime-type be for an OPTIONS request, if text/plain is not appropriate? Would I assume application/json to be correct?
How do I configure my Apache2 server to include a mime-type for all pre-flight OPTIONS requests?
I have gotten to the bottom of these CORB warnings.
The issue is related, in part, to my use of the X-Content-Type-Options: nosniff header. I set this header to stop the browser from sniffing the content type itself, thereby removing mime-type trickery, namely with user-uploaded files, as an attack vector.
The other part is related to the Content-Type being returned: application/json;charset=utf-8. Google's documentation notes:
A response served with a "X-Content-Type-Options: nosniff" response header and an incorrect "Content-Type" response header, may be blocked.
Based on this, I set out to double-check IANA's site on acceptable media types. To my surprise, I discovered that no charset parameter was ever actually defined in any RFC for the application/json type. The registration further notes:
No "charset" parameter is defined for this registration. Adding one really has no effect on compliant recipients.
Based on this, I removed the charset from the content-type: application/json and can confirm the CORB warnings stopped in Chrome.
In conclusion, it would appear that, as of a recent Chrome release, Google has opted to treat the mime-type more strictly than it has in the past.
Lastly, as a side note, the reason all of our application requests still succeed is that the CORB block does not appear to affect them in practice. Chrome's documentation notes:
In most cases, the blocked response should not affect the web page's behavior and the CORB error message can be safely ignored.
Was having the same issue.
The problem in my case was that the API answered the preflight with 200 OK and an empty body, but without the Content-Length header set.
Either changing the preflight response status to 204 No Content or simply setting the Content-Length: 0 header solved the issue.
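The two variants described above can be sketched as follows. This is only an illustration in Python, not tied to any particular server: the function name preflight_response and the allowed methods and headers are assumptions made up for the example.

```python
def preflight_response(origin, use_204=True):
    """Build (status, headers) for a CORS preflight (OPTIONS) reply with no body.

    Hypothetical helper: the allowed methods and headers are placeholders.
    """
    headers = {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type",
    }
    if use_204:
        # 204 No Content: the status code itself says there is no body.
        return 204, headers
    # 200 OK: state explicitly that the body is empty.
    headers["Content-Length"] = "0"
    return 200, headers
```

Either way the response carries no Content-Type, because there is no body to describe.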
I am writing an HTTP handler for a server and I am looking directly at the HTTP requests as they come in from different clients. I can easily deal with normal HTTP requests. The problem occurs when I get a GET or POST request that carries data. I do not know how to access the data from the GET or the POST, so I cannot continue. Could someone please point me to somewhere that explains how to access the data? Thanks in advance.
Answer:
To do this:
In a GET request the data comes in the URL itself, so just parse the URL from the HTTP request and look for the question mark and the arguments that follow it.
For a POST request there are two different ways; the main one puts the arguments in the body of the request, like this:
q=hello&v=world
The length of the body is specified in the request as well; if you need it, it is in the Content-Length header.
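As a rough sketch of both cases, assuming the handler has already split the request into a request target, a dict of lower-cased header names, and the raw body bytes (the function names get_params and post_params are made up for this example):

```python
from urllib.parse import urlsplit, parse_qs


def get_params(request_target):
    """Return the arguments after the question mark of a request target."""
    return parse_qs(urlsplit(request_target).query)


def post_params(headers, body):
    """Return the arguments of a URL-encoded POST body.

    headers: dict with lower-cased header names (an assumption of this sketch).
    body: the bytes that follow the blank line after the headers.
    """
    length = int(headers.get("content-length", 0))
    return parse_qs(body[:length].decode("utf-8"))
```

For example, get_params("/search?q=hello&v=world") and post_params({"content-length": "15"}, b"q=hello&v=world") both give {"q": ["hello"], "v": ["world"]}. Values are lists because a name may appear more than once.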
Per https://www.rfc-editor.org/rfc/rfc7230#section-3.1.1 recipients of an invalid request-line SHOULD respond with a 400 - Bad Request. Thus as per the RFC, the request, GET /cat".html HTTP/1.1 should return 400.
I've written a server that will return just that upon detection of a ". Thus a request via telnet to my server returns just that.
However, when the identical request is sent via a browser, GET /cat".html HTTP/1.1 is converted by the browser and sent as GET /cat%22.html HTTP/1.1. Thus, 400 is not being returned but rather 404 - Not Found since the file cat%22.html is not in my public directory.
I'm confused as to what the RFC wants, since it would never be possible to send GET /cat".html HTTP/1.1 via a browser and have an error code of 400 returned. Since cat".html is a bad request, a server should return 400 when a browser sends it, but that is not possible unless the server is coded to treat %22 as a bad request. However, anything with %22 in the filename is valid, so it wouldn't be a 400 Bad Request, although it could be a 404 Not Found.
What am I missing here?
The HTTP specification says that the HTTP request shouldn't contain a ". This has nothing to do with browsers; the specification covers HTTP, the protocol, only. If you try to send a ", your browser URL-encodes it to %22 because " is invalid (it's helping you). So that's a good thing, right?
it would never be possible to send GET /cat".html HTTP/1.1
You're presuming that all HTTP is generated by browsers; it's not. Many, many technologies and software generate HTTP. Not all of them will kindly URL-encode your request for you.
BTW: You shouldn't really assume that all browsers will do this either, to assume makes an ass-out-of-u-and-me ;)
TL;DR
If your HTTP contains an actual " return a 400
If your HTTP request has url encoded the " to a %22 this is valid and should be processed accordingly (this may result in a 404)
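The two rules above can be sketched in a few lines of Python. The function name status_for and the public_files set are assumptions for illustration only; a real server would also have to guard against things like path traversal after decoding.

```python
from urllib.parse import unquote


def status_for(request_line, public_files):
    """Return the status code for a raw request-line.

    public_files: set of decoded paths the server can serve (hypothetical).
    """
    parts = request_line.split(" ")
    # A literal double quote in the target makes the request-line invalid.
    if len(parts) != 3 or '"' in parts[1]:
        return 400
    # %22 is valid; decode it and look the file up as usual.
    path = unquote(parts[1])
    return 200 if path in public_files else 404
```

With this sketch, the telnet request GET /cat".html HTTP/1.1 gets 400, while the browser's GET /cat%22.html HTTP/1.1 gets 404 unless a file whose name really contains a " exists.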
I'd like to know if the POST method on HTTP sends data as a QueryString, or if it use a special structure to pass the data to the server.
In fact, when I analyze the communication with POST method from client to server (with Fiddler for example), I don't see any QueryString, but a Form Body context with the name/value pairs.
The best way to visualize this is to use a packet analyzer like Wireshark and follow the TCP stream. HTTP simply uses TCP to send a stream of data starting with a few lines of HTTP headers. Often this data is easy to read because it consists of HTML, CSS, or XML, but it can be any type of data that gets transferred over the internet (executables, images, video, etc.).
For a GET request, your computer requests a specific URL and the web server usually responds with a 200 status code, and the content of the webpage is sent directly after the HTTP response headers. This content is the same content you would see if you viewed the source of the webpage in your browser. The query string you mentioned is just part of the URL and gets included in the HTTP GET request header that your computer sends to the web server. Below is an example of an HTTP GET request to http://accel91.citrix.com:8000/OA_HTML/OALogout.jsp?menu=Y, followed by a 302 redirect response from the server. Some of the HTTP headers are wrapped due to the size of the viewing window (these really only take one line each), and the 302 redirect includes a simple HTML webpage with a link to the redirected webpage (most browsers will automatically redirect any 302 response to the URL listed in the Location header instead of displaying the HTML response):
For a POST request, you may still have a query string, but this is uncommon and does not have anything to do with the data that you are POSTing. Instead, the data is included directly after the HTTP headers that your browser sends to the server, similar to the 200 response that the web server uses to respond to a GET request. In the case of POSTing a simple web form this data is encoded using the same URL encoding that a query string uses, but if you are using a SOAP web service it could also be encoded using a multi-part MIME format and XML data.
For example here is what an HTTP POST to an XML based SOAP web service located at http://192.168.24.23:8090/msh looks like in Wireshark Follow TCP Stream:
POST uses the message body to send the information back to the server, as opposed to GET, which uses the query string (everything after the question mark). It is possible to send both a query string and a POST message body in the same request, but that can get a bit confusing so is best avoided.
Generally, best practice dictates that you use GET when you want to retrieve data, and POST when you want to alter it. (These rules aren't set in stone; the specs don't forbid altering data with GET, but it's generally avoided on the grounds that you don't want people making changes just by clicking a link or typing a URL.)
Conversely, you can use POST to retrieve data without changing it, but using GET means you can bookmark the page or share the URL with other people, things you couldn't do if you'd used POST.
http://en.wikipedia.org/wiki/GET_%28HTTP%29
http://en.wikipedia.org/wiki/POST_%28HTTP%29
As for the actual format of the data sent in the message body, that's entirely up to the sender and is specified with the Content-Type header. If not specified, the default content type for HTML forms is application/x-www-form-urlencoded, which means the server will expect the POST body to be a string encoded in a similar manner to a GET query string. However, this can't be depended on in all cases. RFC 2616 says the following on the Content-Type header:
Any HTTP/1.1 message containing an entity-body SHOULD include a
Content-Type header field defining the media type of that body. If
and only if the media type is not given by a Content-Type field, the
recipient MAY attempt to guess the media type via inspection of its
content and/or the name extension(s) of the URI used to identify the
resource. If the media type remains unknown, the recipient SHOULD
treat it as type "application/octet-stream".
A POST request can include a query string, however normally it doesn't - a standard HTML form with a POST action will not normally include a query string for example.
GET will send the data as a querystring, but POST will not. Rather it will send it in the body of the request.
If your POST targets the following URL
mypage.php?id=1
you will have the POST data but also GET data.
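To make the difference visible on the wire, here is a small Python sketch that builds the raw text of a request. The helper build_request and the host example.com are hypothetical; the point is only where the query string and the form data end up.

```python
from urllib.parse import urlencode


def build_request(method, host, path, query=None, form=None):
    """Return the raw text of an HTTP/1.1 request.

    query goes into the URL; form goes into the body as
    application/x-www-form-urlencoded data.
    """
    target = path + ("?" + urlencode(query) if query else "")
    body = urlencode(form) if form else ""
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}"]
    if form:
        lines.append("Content-Type: application/x-www-form-urlencoded")
        # urlencode output is ASCII, so character count equals byte count.
        lines.append(f"Content-Length: {len(body)}")
    # A blank line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body
```

Calling build_request("POST", "example.com", "/mypage.php", query={"id": 1}, form={"name": "bob"}) puts id=1 in the request line and name=bob after the blank line, so the server sees both.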
I'm writing a resource handling method where I control access to various files, and I'd like to be able to make use of the browser's cache. My question is two-fold:
Which are the definitive HTTP headers that I need to check in order to know for sure whether I should send a 304 response, and what am I looking for when I do check them?
Additionally, are there any headers that I need to send when I initially send the file (like 'Last-Modified') as a 200 response?
Some pseudo-code would probably be the most useful answer.
What about the cache-control header? Can the various possible values of that affect what you send to the client (namely max-age) or should only if-modified-since be obeyed?
Here's how I implemented it. The code has been working for a bit more than a year and with multiple browsers, so I think it's pretty reliable. This is based on RFC 2616 and by observing what and when the various browsers were sending.
Here's the pseudocode:
server_etag = gen_etag_for_this_file(myfile)
etag_from_browser = get_header("Etag")
if etag_from_browser does not exist:
    etag_from_browser = get_header("If-None-Match")
if the browser has quoted the etag:
    strip the quotes (e.g. "foo" --> foo)
set server_etag into http header
if etag_from_browser matches server_etag:
    send 304 return code to browser
Here's a snippet of my server logic that handles this.
/* the client should set either Etag or If-None-Match */
/* some clients quote the parm, strip quotes if so */
mketag(etag, &sb);   /* builds the unquoted ETag for this file */
etagin = apr_table_get(r->headers_in, "Etag");
if (etagin == NULL)
    etagin = apr_table_get(r->headers_in, "If-None-Match");
if (etagin != NULL && etagin[0] == '"') {
    /* header values are read-only: strip the quotes on a pool copy */
    char *unq = apr_pstrdup(r->pool, etagin + 1);   /* skip opening quote */
    size_t sl = strlen(unq);
    if (sl > 0 && unq[sl - 1] == '"')
        unq[sl - 1] = '\0';                         /* drop closing quote */
    etagin = unq;
    logit(2, "etag=:%s:", etagin);
}
...
apr_table_add(r->headers_out, "ETag", etag);
...
if (etagin != NULL && strcmp(etagin, etag) == 0) {
    /* if the etag matches, we return a 304 */
    rc = HTTP_NOT_MODIFIED;
}
If you want some help with etag generation post another question and I'll dig out some code that does that as well. HTH!
A 304 Not Modified response can result from a GET or HEAD request with either an If-Modified-Since ("IMS") or an If-None-Match ("INM") header.
In order to decide what to do when you receive these headers, imagine that you are handling the GET request without these conditional headers. Determine what the values of your ETag and Last-Modified headers would be in that response and use them to make the decision. Hopefully you have built your system such that determining this is less costly than constructing the complete response.
If there is an INM and the value of that header is the same as the value you would place in the ETag, then respond with 304.
If there is an IMS and the date value in that header is the same as or later than the one you would place in the Last-Modified, then respond with 304.
Else, proceed as though the request did not contain those headers.
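The decision above might look like this in Python. It is a sketch under a few assumptions: header names arrive lower-cased in a dict, ETags are compared as exact quoted strings (weak W/ prefixes are not handled), and, following RFC 7232, If-Modified-Since is ignored whenever If-None-Match is present. The function name is_not_modified is made up for the example.

```python
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, etag, last_modified):
    """Decide whether a conditional GET/HEAD can be answered with 304.

    etag: the ETag the full response would carry, quotes included.
    last_modified: the Last-Modified value the full response would carry.
    """
    inm = request_headers.get("if-none-match")
    if inm is not None:
        # The header may list several tags, or "*" for "any version".
        candidates = [tag.strip() for tag in inm.split(",")]
        return "*" in candidates or etag in candidates
    ims = request_headers.get("if-modified-since")
    if ims is not None:
        try:
            return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            return False  # unparseable date: treat the header as absent
    return False
```

When it returns False, build the normal 200 response as though the conditional headers were not there.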
For a least-effort approach to part 2 of your question, figure out which of the (Expires, ETag, and Last-Modified) headers you can easily and correctly produce in your Web application.
For suggested reading material:
http://www.w3.org/Protocols/rfc2616/rfc2616.html
http://www.mnot.net/cache_docs/
You should send a 304 if the client has explicitly stated that it may already have the page in its cache. This is called a conditional GET, which should include the If-Modified-Since header in the request.
Basically, this request header contains a date from which the client claims to have a cached copy. You should check if content has changed after this date and send a 304 if it hasn't.
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.25 for the related section in the RFC.
We are also handling cached, but secured, resources. If you send / generate an ETag header (which RFC 2616 section 13.3 recommends you SHOULD), then the client MUST use it in a conditional request (typically in an If-None-Match - HTTP_IF_NONE_MATCH - header). If you send a Last-Modified header (again you SHOULD), then you should check the If-Modified-Since - HTTP_IF_MODIFIED_SINCE - header. If you send both, then the client SHOULD send both, but it MUST send the ETag. Also note that validation is just defined as checking the conditional headers for strict equality against the ones you would send out. Also, only a strong validator (such as an ETag) will be used for range requests (where only part of a resource is requested).
In practice, since the resources we are protecting are fairly static, and a one second lag time is acceptable, we are doing the following:
Check to see if the user is authorized to access the requested resource
If they are not, Redirect them or send a 4xx response as appropriate. We will generate 404 responses to requests that look like hack attempts or blatant tries to perform a security end run.
Compare the If-Modified-Since header to the Last-Modified header we would send (see below) for strict equality
If they match, send a 304 Not Modified response and exit page processing
Create a Last-Modified header using the modification time of the requested resource
Look up the HTTP Date format in RFC 2616
Send out the header and resource content along with an appropriate Content-Type
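The Last-Modified part of these steps can be sketched in Python as below. The authorization check and the Content-Type lookup are left out, the function names are hypothetical, and request headers are assumed to arrive lower-cased in a dict. email.utils.formatdate with usegmt=True produces the HTTP date format, which has one-second resolution.

```python
import os
from email.utils import formatdate


def last_modified_header(path):
    """Format a file's modification time as an HTTP date (GMT)."""
    return formatdate(os.path.getmtime(path), usegmt=True)


def respond(path, request_headers):
    """Return (status, headers, body) for an already-authorized request."""
    last_modified = last_modified_header(path)
    # Strict equality: the client must echo back exactly what we sent.
    if request_headers.get("if-modified-since") == last_modified:
        return 304, {"Last-Modified": last_modified}, b""
    with open(path, "rb") as f:
        body = f.read()
    headers = {"Last-Modified": last_modified, "Content-Length": str(len(body))}
    return 200, headers, body
```

A file whose modification time is the Unix epoch yields "Thu, 01 Jan 1970 00:00:00 GMT".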
We decided to eschew the ETag header since it is overkill for our purposes. I suppose we could also just use the date time stamp as an ETag. If we move to a true ETag system, we would probably store computed hashes for the resources and use those as ETags.
If your resources are dynamically generated, from say database content, then ETags may be better for your needs, since they are just text to be populated as you see fit.
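If we did move to hash-based ETags, a minimal sketch could look like this. SHA-256 is just one possible choice of hash, and the name make_etag is made up for the example; in practice the digest would be stored rather than recomputed on every request.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Return a strong ETag: the quoted hex digest of the content."""
    return '"' + hashlib.sha256(content).hexdigest() + '"'
```

The same bytes always give the same tag, and any change to the content gives a different one, which is what a strong validator needs.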
Regarding Cache-Control:
You shouldn't have to worry about Cache-Control when serving the resource, other than setting it to a reasonable value. It basically tells the browser and other downstream entities (such as a proxy) the maximum time a cached copy may be used before it has to be revalidated.