How does the browser determine the Accept header? - html

I have this web page on my local server:
<html>
<head>
<title>Sample "Hello, World" Application</title>
</head>
<body bgcolor=white>
<table border="0">
<tr>
<td>
<img src="images/tomcat.gif">
</td>
<td>
<h1>Sample "Hello, World" Application</h1>
<p>This is the home page for a sample application used to illustrate the
source directory organization of a web application utilizing the principles
outlined in the Application Developer's Guide.
</td>
</tr>
</table>
<p>To prove that they work, you can execute either of the following links:
<ul>
<li>To a JSP page.
<li>To a servlet.
</ul>
</body>
</html>
When I trace the HTTP requests and responses, I see:
GET /sample/ HTTP/1.1
Host: localhost:8080
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, lzma, sdch
Accept-Language: tr
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.65 Safari/537.36 OPR/26.0.1656.24
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 636
Content-Type: text/html
Date: Fri, 28 Nov 2014 19:48:47 GMT
ETag: W/"636-1185801988000"
Last-Modified: Mon, 30 Jul 2007 13:26:28 GMT
Server: Apache-Coyote/1.1
A second request is then made for the image:
GET /sample/images/tomcat.gif HTTP/1.1
Host: localhost:8080
Accept: image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, lzma, sdch
Accept-Language: tr
Referer: http://localhost:8080/sample/
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.65 Safari/537.36 OPR/26.0.1656.24
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 1441
Content-Type: image/gif
Date: Fri, 28 Nov 2014 19:54:55 GMT
ETag: W/"1441-1185801988000"
Last-Modified: Mon, 30 Jul 2007 13:26:28 GMT
Server: Apache-Coyote/1.1
Why is the browser sending Accept: image/webp,*/*;q=0.8 here?
When I click on the image itself and open it in a new tab, the request is sent as:
GET /sample/images/tomcat.gif HTTP/1.1
Host: localhost:8080
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, lzma, sdch
Accept-Language: tr
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.65 Safari/537.36 OPR/26.0.1656.24
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 1441
Content-Type: image/gif
Date: Fri, 28 Nov 2014 19:55:53 GMT
ETag: W/"1441-1185801988000"
Last-Modified: Mon, 30 Jul 2007 13:26:28 GMT
Server: Apache-Coyote/1.1
Why does this request have many more values in its Accept header than the request made for the embedded image?

It's essentially a historical record of formats that browser manufacturers wanted to make it easy to identify support for.
As Grice points out, these all include */* and so accept anything; the specified formats are just preferences.
In at least one case:
Accept: image/webp,*/*;q=0.8
is simply a special case to promote the uptake of WebP. See "Modify Accept header to explicitly denote the image formats Chrome supports":
As part of promoting the adoption of WebP, it would be useful if Chrome more explicitly indicated the image formats it supports in the Accept header. Currently it just returns */*, but this makes it difficult for servers to know whether it's safe to return WebP images in lieu of JPEG or not
As Chrome clearly accepts PNG, JPEG and GIF as well, making WebP a special case is simply an attempt to encourage specific support for a preferred format. Likewise, the special case for application/xhtml+xml made a lot of sense when trying to encourage wider use of XHTML.
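To make the "preferences, not restrictions" point concrete, here is a minimal Python sketch of how a server might read these headers. The function names `parse_accept` and `quality_for` are made up for illustration, and the matching is simplified: exact type, then `type/*`, then `*/*`, ignoring media-type parameters other than `q`.

```python
def parse_accept(header):
    """Return (media_range, q) pairs sorted by descending q (ties keep header order)."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0]
        q = 1.0  # q defaults to 1 when it is omitted
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        entries.append((media_range, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(entries, key=lambda entry: -entry[1])


def quality_for(content_type, header):
    """Return the q-value the client assigns to a concrete type (0.0 if not acceptable)."""
    main, _, _sub = content_type.partition("/")
    best = (-1, 0.0)  # (specificity, q); the most specific matching range wins
    for media_range, q in parse_accept(header):
        r_main, _, r_sub = media_range.partition("/")
        if media_range == content_type:
            specificity = 2
        elif r_main == main and r_sub == "*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best[0]:
            best = (specificity, q)
    return best[1]


IMAGE_ACCEPT = "image/webp,*/*;q=0.8"
NAVIGATION_ACCEPT = (
    "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
)

print(quality_for("image/webp", IMAGE_ACCEPT))      # 1.0
print(quality_for("image/gif", IMAGE_ACCEPT))       # 0.8
print(quality_for("image/gif", NAVIGATION_ACCEPT))  # 0.8
```

With either header, image/gif is still acceptable through `*/*` at q=0.8, so the GIF is served in both cases. Only the stated preference differs between the image request and the navigation request.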

Related

Website refuses headless chrome connections

I'm trying to implement a simple scraper, but I've run into a problem: the website refuses connections from headless Chrome. This is the first and only request, and no JavaScript is executed. Requests from normal Chrome work fine, so it's definitely not a banned IP. What could be wrong here? How could they possibly be detecting it?
I'm running ordinary headless Chrome and then replacing the user agent, that's all.
.\chrome.exe --headless --remote-debugging-port=9222
General:
Request URL: https://www.adidas.de/
Request Method: GET
Status Code: 403
Remote Address: 23.210.248.137:443
Referrer Policy: no-referrer-when-downgrade
Response Headers:
cache-control: max-age=0, no-cache, no-store
content-length: 1952
content-type: text/html
date: Thu, 26 Dec 2019 16:16:49 GMT
expires: Thu, 26 Dec 2019 16:16:49 GMT
pragma: no-cache
status: 403
Request Headers:
:authority: www.adidas.de
:method: GET
:path: /
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
cache-control: max-age=0
sec-fetch-mode: navigate
sec-fetch-site: none
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36
@RobertHarvey: Yes, you are correct: sending Accept-Language is a must for some websites. You can do it either with puppeteer via its API, or with chrome-remote-interface by intercepting requests and adding the header directly.
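As a plain-HTTP illustration of the same idea (not puppeteer or chrome-remote-interface), here is a small Python sketch using only the standard library. It builds a request carrying the headers from the trace above plus an Accept-Language header, and does not send it. The `build_request` helper and the `en-US,en;q=0.9` value are assumptions for illustration, and there is no guarantee that this header alone is what a given site checks.

```python
import urllib.request

# Headers copied from the failing request in the question, plus Accept-Language.
# The Accept-Language value is an assumption; use whatever your normal browser sends.
BROWSER_LIKE_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
    ),
    "Accept": (
        "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,"
        "image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, extra_headers=None):
    """Build (but do not send) a GET request with browser-like headers."""
    headers = dict(BROWSER_LIKE_HEADERS)
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers)


req = build_request("https://www.adidas.de/")
# urllib stores header names in capitalized form, hence "Accept-language".
print(req.get_header("Accept-language"))
```

Passing the resulting object to `urllib.request.urlopen` would actually send it.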

Chrome not caching preflight

I'm implementing a REST API that should support cross-domain requests, and I want to use CORS to achieve this. Almost all of my requests are "not simple", meaning that for all non-GET requests the browser must send a preflight request.
To limit the number of preflight (OPTIONS) requests, I'm trying to get the browser to cache them. This seems to work in Firefox and Safari, but not in Chrome. I know Chrome caches preflight requests for only 10 minutes, but in my case it seems no caching takes place at all.
These are the HTTP requests and responses sent/received by Chrome:
Request:
OPTIONS /api/v1/sessions HTTP/1.1
Host: xxxxxxx
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Access-Control-Request-Method: POST
Origin: http://localhost:8000
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36
Access-Control-Request-Headers: content-type
Accept: */*
Referer: http://localhost:8000/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4
Response:
HTTP/1.1 200 OK
Date: Sun, 26 Jul 2015 09:33:27 GMT
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.9
Cache-Control: private, max-age=1440, pre-check=1440
Access-Control-Allow-Origin: http://localhost:8000
Access-Control-Allow-Methods: GET,POST,PATCH,DELETE
Access-Control-Max-Age: 86400
Access-Control-Allow-Headers: content-type
Content-Length: 0
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8
You have Pragma: no-cache and Cache-Control: no-cache headers set in the request.
Try removing them.
API requests do not set these headers by default, and I doubt Chrome does either.
You should check your code and find out where they are set.
Given that it's working fine in other browsers, you should also check whether you have enabled the no-cache option in Dev Tools.
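A quick way to check a captured request for this is sketched below in Python. Both helper names are hypothetical. The 600-second cap is taken from the 10-minute figure mentioned in the question and is an assumption, not a value read from Chrome.

```python
def has_no_cache_directive(request_headers):
    """True if the request carries Pragma: no-cache or Cache-Control: no-cache."""
    headers = {name.lower(): value.lower() for name, value in request_headers.items()}
    pragma = [token.strip() for token in headers.get("pragma", "").split(",")]
    cache_control = [token.strip() for token in headers.get("cache-control", "").split(",")]
    return "no-cache" in pragma or "no-cache" in cache_control


def effective_max_age(server_max_age, browser_cap):
    """Seconds a preflight result may be reused: the header value limited by the browser's cap."""
    return max(0, min(server_max_age, browser_cap))


# Request headers from the OPTIONS request in the question (abridged).
preflight_request = {
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
    "Access-Control-Request-Method": "POST",
    "Origin": "http://localhost:8000",
}

print(has_no_cache_directive(preflight_request))  # True
# Access-Control-Max-Age: 86400 from the response, with an assumed 10-minute cap.
print(effective_max_age(86400, 600))  # 600
```

If the first check prints True for requests you did not build yourself, the Dev Tools setting is the likely source.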
Preflight caching has a known bug in Chrome 98.
See the ticket below for more details:
https://bugs.chromium.org/p/chromium/issues/detail?id=1298477

ERR_CONTENT_LENGTH_MISMATCH when loading audio

I've been trying to get background music working for my browser-based game. It's working great, but in Chrome I frequently have the music cut short and this error appears:
Failed to load resource: net::ERR_CONTENT_LENGTH_MISMATCH
I watched the Network tab and saw the audio file being loaded as it should be, with the 206 Partial Content status, until it hit that error and just stopped.
Reloading the page will usually yield the same result, but at a different point in the track. I have yet to encounter this problem in IE; only Chrome seems to be affected.
Any suggestions as to what may be happening?
Example request/response:
GET /music/___________.mp3 HTTP/1.1
Host: ____________.net
Connection: keep-alive
Accept-Encoding: identity;q=1, *;q=0
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36
Accept: */*
DNT: 1
Referer: http://____________.net/
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Cookie: SID=XXXXXXXXXX
Range: bytes=0-
HTTP/1.1 206 Partial Content
Date: Tue, 26 Aug 2014 13:53:38 GMT
Server: Apache/2.2.26 (Unix) mod_ssl/2.2.26 OpenSSL/0.9.8e-fips-rhel5 mod_bwlimited/1.4
Last-Modified: Fri, 13 Jun 2014 21:00:31 GMT
ETag: "219f1a-8ed344-4fbbdf7c339c0"
Accept-Ranges: bytes
Content-Length: 9360196
Content-Range: bytes 0-9360195/9360196
Connection: close
Content-Type: audio/mpeg
Increasing Apache's timeout setting fixed it.
Basically Chrome was being "too clever" by only downloading fast enough to keep just ahead of the buffer, and Apache was getting bored.
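For anyone debugging the same error: as far as I understand it, the mismatch means fewer body bytes arrived than Content-Length promised, which is what happens when the server drops the connection mid-transfer. A minimal Python sketch of that check follows. The helper names are made up for illustration, and only the single-range `bytes first-last/total` form is handled.

```python
import re


def parse_content_range(value):
    """Parse 'bytes first-last/total' into a (first, last, total) tuple of ints."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value.strip())
    if match is None:
        raise ValueError("unsupported Content-Range: %r" % value)
    first, last, total = map(int, match.groups())
    return first, last, total


def expected_body_length(content_range):
    """Number of body bytes a 206 response with this Content-Range should carry."""
    first, last, _total = parse_content_range(content_range)
    return last - first + 1


def is_truncated(content_length, bytes_received):
    """True if the connection ended before the promised number of bytes arrived."""
    return bytes_received < content_length


# Values from the response in the question: the headers themselves are consistent.
print(expected_body_length("bytes 0-9360195/9360196") == 9360196)  # True
```

Since the headers agree with each other, the failure has to come from the transfer being cut short, which is consistent with the timeout explanation.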

Browser is not caching images in HTTPS (HTTP works fine). Even with Cache-Control: public

I'm trying to follow Google's caching recommendation, but neither IE nor Chrome is caching my images when HTTPS is used. The second request is not even a conditional GET. If I simply switch to HTTP, it works fine.
Here's request information, according to Chrome's request logger:
Remote Address: ::1:443
Request URL: https://localhost/getmyimage.php?id=123
Request Method: GET
Status Code: 200 OK
Request Headers
Accept: image/webp,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en;q=0.8
Connection: keep-alive
Cookie: PHPSESSID=gbk4vk7ejlr20nqgajcqgskul7
Host: localhost
Referer: https://localhost/
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36
Query String Parameters
id: 123
Response Headers
Cache-Control: public
Connection: Keep-Alive
Content-Length: 3224
Content-Type: image/png
Date: Tue, 27 May 2014 06:53:03 GMT
Expires: Mon, 25 Aug 2014 06:53:03 GMT
Keep-Alive: timeout=5, max=99
Last-Modified: Mon, 24 Feb 2014 02:17:21 GMT
Server: Apache/2.4.7 (Win32) OpenSSL/1.0.1e PHP/5.5.9
X-Powered-By: PHP/5.5.9
I think this is happening because of the URL format. You can use Apache's mod_rewrite to make the URLs for images served by this script look like localhost/image/123.png.
EDIT
After reading your comment, I can say that it's not about your server's config. According to this and this, there is nothing you can do about it because of how HTTPS is implemented, since you already have Cache-Control: public set.
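To back up the point that the response headers are not the problem, here is a small Python sketch that computes the freshness lifetime they imply. It assumes the usual HTTP rule that, when Cache-Control carries no max-age, the lifetime is Expires minus Date. The function name is hypothetical.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(date_header, expires_header):
    """Seconds a response stays fresh when only Date and Expires are given."""
    date = parsedate_to_datetime(date_header)
    expires = parsedate_to_datetime(expires_header)
    return max(0, int((expires - date).total_seconds()))


# Date and Expires values from the response in the question.
lifetime = freshness_lifetime(
    "Tue, 27 May 2014 06:53:03 GMT",
    "Mon, 25 Aug 2014 06:53:03 GMT",
)
print(lifetime, lifetime // 86400)  # 7776000 seconds, i.e. 90 days
```

So by its headers the image is cacheable for 90 days, which suggests looking at the browser or the HTTPS setup rather than at the server's caching headers.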

Chrome does not make additional request for seeking video file

I'm trying to achieve pseudo-streaming. I have HTML like so:
<video src="GetVideo.ashx?id=mp4" controls></video>
After the page loads, Chrome 28.0.1500.72 m sends a request (even before I click play):
GET /GetVideo.ashx?id=mp4 HTTP/1.1
Host: localhost
Connection: keep-alive
Accept-Encoding: identity;q=1, *;q=0
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36
Accept: */*
Referer: http://localhost/JWPlayerTestMp4Proper.aspx
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4
Cookie: jwplayer.volume=12
Range: bytes=0-
And the server responds with:
HTTP/1.1 206 Partial Content
Cache-Control: private
Content-Length: 5186931
Content-Type: video/mp4
Content-Range: bytes 0-5186930/5186931
Accept-Ranges: bytes
Server: Microsoft-IIS/8.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Mon, 22 Jul 2013 08:13:28 GMT
The file starts to play after I click play, but if I try to seek to a part that has not been downloaded yet, Chrome does not send an additional request for that part. It simply waits until the file has been downloaded up to the specified position.
When I do the same in Firefox 22.0:
first request (after page loading):
GET http://localhost/GetVideo.ashx?id=mp4 HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0
Accept: video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3
Range: bytes=0-
Referer: http://localhost/JWPlayerTestMp4Proper.aspx
Connection: keep-alive
first response (it is the same as for Chrome):
HTTP/1.1 206 Partial Content
Cache-Control: private
Content-Length: 5186931
Content-Type: video/mp4
Content-Range: bytes 0-5186930/5186931
Accept-Ranges: bytes
Server: Microsoft-IIS/8.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Mon, 22 Jul 2013 08:28:19 GMT
second request (after seeking to yet not downloaded part):
GET http://localhost/GetVideo.ashx?id=mp4 HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0
Accept: video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3
Range: bytes=2490368-
Referer: http://localhost/JWPlayerTestMp4Proper.aspx
Connection: keep-alive
second response:
HTTP/1.1 206 Partial Content
Cache-Control: private
Content-Length: 2696563
Content-Type: video/mp4
Content-Range: bytes 2490368-5186930/5186931
Accept-Ranges: bytes
Server: Microsoft-IIS/8.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Mon, 22 Jul 2013 08:35:34 GMT
IE 10 is working in the same way as Firefox.
What response header does Chrome expect in order to behave the same way, that is, to make additional requests after seeking to a part that has not been downloaded?
It turned out the response headers were correct.
The problem was that I was using a short video file, and Chrome appears to have some kind of optimization whereby it does not send an additional request if the difference in time is too small (less than 30 seconds or so).
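For reference, the header arithmetic behind the 206 responses shown in the question can be sketched in Python as below. The function name is made up for illustration, and only single `bytes=first-` or `bytes=first-last` ranges are handled (no suffix or multi-part ranges).

```python
import re


def partial_response_headers(range_header, total_size):
    """Build 206 response headers for a single 'bytes=first-[last]' range."""
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if match is None:
        raise ValueError("unsupported Range header: %r" % range_header)
    first = int(match.group(1))
    # An open-ended range ("bytes=N-") runs to the last byte of the file.
    last = int(match.group(2)) if match.group(2) else total_size - 1
    last = min(last, total_size - 1)
    if first > last:
        raise ValueError("range not satisfiable")
    return {
        "Content-Range": "bytes %d-%d/%d" % (first, last, total_size),
        "Content-Length": str(last - first + 1),
        "Accept-Ranges": "bytes",
    }


# The seek request Firefox sent in the question, against the 5186931-byte file.
headers = partial_response_headers("bytes=2490368-", 5186931)
print(headers["Content-Range"])   # bytes 2490368-5186930/5186931
print(headers["Content-Length"])  # 2696563
```

These values match the second Firefox response above, which is consistent with the server side being correct and the difference being in when Chrome decides to issue a new range request.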