HTTP "Ping-To" Headers - How can they be external URLs?

PS: I don't know if this question is better suited for Server Fault, or Webmasters. If so, it can be moved.
In HTML5 there is a concept called hyperlink auditing, which can involve a ping attribute on a hyperlink element. Using it results in Ping-To and Ping-From HTTP request headers being received. For all of the ones my site receives, the Ping-From value is a Google search results page.
From what I understand, when these are present in a link to a given website, that website would receive the Ping-To HTTP header with the website's URL as the value.
99.9% of the Ping-To HTTP request header values I am seeing are URLs within the website. The other 0.1% are external URLs.
I have reviewed the w3.org spec for hyperlink auditing (linked above) and, based on that, it seems the Ping-To value should always be a URL on the website receiving the Ping-To header.
If both the address of the Document object containing the hyperlink being audited and the ping URL have the same origin -
The request must include a Ping-From HTTP header with, as its value, the address of the document containing the hyperlink, and a Ping-To HTTP header with, as its value, the address of the absolute URL of the target of the hyperlink. The request must not include a Referer (sic) HTTP header.
Otherwise, if the origins are different, but the document containing the hyperlink being audited was not retrieved over an encrypted connection -
The request must include a Referer (sic) HTTP header [sic] with, as its value, the current address of the document containing the hyperlink, a Ping-From HTTP header with the same value, and a Ping-To HTTP header with, as its value, the address of the target of the hyperlink.
Otherwise, the origins are different and the document containing the hyperlink being audited was retrieved over an encrypted connection -
The request must include a Ping-To HTTP header with, as its value, the address of the target of the hyperlink. The request must neither include a Referer (sic) HTTP header nor include a Ping-From HTTP header.
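The three cases quoted above can be sketched as a small function. This is only an illustration of the quoted rules, not browser code: the function names and the example.com style URLs are made up, and "retrieved over an encrypted connection" is approximated by checking for the https scheme.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return (scheme, host, port), the tuple used here to compare origins."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname or "", port)


def ping_headers(document_url, ping_url, target_url):
    """Headers a ping request would carry according to the quoted rules.

    document_url: page containing the hyperlink
    ping_url:     URL listed in the ping attribute (receives the request)
    target_url:   href of the hyperlink
    """
    # Ping-To is present in all three cases and is always the link target.
    headers = {"Ping-To": target_url}
    if origin(document_url) == origin(ping_url):
        # Same origin: Ping-From, no Referer.
        headers["Ping-From"] = document_url
    elif urlsplit(document_url).scheme.lower() != "https":
        # Different origins, document not encrypted: Referer and Ping-From.
        headers["Referer"] = document_url
        headers["Ping-From"] = document_url
    # Different origins, encrypted document: Ping-To only.
    return headers
```

Note that in this sketch Ping-To is derived only from the hyperlink target; the server that receives the request is whatever the ping attribute names.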
The external URLs I am seeing aren't linked from my site in any way, so I can't think of any other relation, other than they might show up in similar search engine search results pages as does my site.
So my data of values in Ping-To request headers might look like this, for example. The only way this would be related to my site is that it would for sure show up in search results for the query in the Ping-From value:
Ping-From:
https://www.google.com/search?client=safari&hl=vi-us&biw=414&bih=622&tbm=isch&sa=1&ei=ZZwGWsrxF8OpjwTC56-IBg&q=denver+Carom+Billiard+Table+for+sale&oq=denver+Carom+Billiard+Table+for+sale&gs_l=mobile-gws-img.3...20443.29051.0.29760.14.12.2.0.0.0.140.1309.0j11.11.0....0...1.1.64.mobile-gws-img..1.3.369...35i39k1j30i10k1.150.Pm80h9oyYco#imgrc=eCyr8UATa475QM:
Ping-To:
https://www.yelp.com/biz/corner-pocket-billiards-wilton-manors
These are a pair of http headers on a single request that my site (which is not yelp.com) received.
What I don't understand is: What does it mean when a received Ping-To header has a value of some other web site? (what is the scenario occurring when I receive a Ping-To header with an external url as the value?)

Related

How to check the HTTP response body(such as HTML content) with wireshark?

I input the URL in the address bar in the browser of the virtual machine, and the URL requests an HTML document in my host computer (this HTML document is also written by me). Then the HTML document is successfully displayed in the virtual machine browser. It seems that the HTTP request and response are successfully sent, otherwise the HTML document cannot be seen in the browser.
I tracked this process with Wireshark and found the HTTP request (that is, the request for the HTML document) sent by the virtual machine to my host computer, as shown in the following screenshot (some personal information is masked in purple):
HTTP request
I double-clicked this request to see the details and found this:
request's detail
It showed the response is in frame 6.
However, the contents of the HTML document cannot be found after opening the details of frame 6. I wrote the HTML myself, so I know it contains elements like "button" and "input". Normally, the content of this HTML should be in the HTTP response, but I couldn't find it, not even a "body". There is no HTML document content in the HTTP response. The whole of frame 6 is shown below:
response's detail
You get an HTTP status code 304 Not Modified back, not a status code 200. This means that the browser already had the response body and just verified with the server that the body is still the same as the one it has cached. You should see an If-None-Match and/or If-Modified-Since field in the request.
Since the body has not changed the server just informs the browser using status code 304 about this so that it uses the already cached response. It does not send the content again in the response body since it is not needed by the browser. If you want to get the actual content in the response make sure it is not already cached, i.e. remove all caches in the browser or use a client which does not cache, like curl.
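To see this behaviour in isolation, here is a minimal sketch using only the Python standard library. The page content, the way the ETag is computed, and the helper names are all assumptions made for the demo; a real server decides for itself how to validate a cached copy.

```python
import hashlib
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><button>OK</button></body></html>"
# Hypothetical validator: a quoted hash of the content.
ETAG = '"' + hashlib.sha256(PAGE).hexdigest()[:16] + '"'


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("If-None-Match") == ETAG:
            # The client's cached copy is still valid: headers only, no body.
            self.send_response(304)
            self.send_header("ETag", ETAG)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("ETag", ETAG)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(port, etag=None):
    """GET / and return (status, ETag header, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    headers = {"If-None-Match": etag} if etag else {}
    conn.request("GET", "/", headers=headers)
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("ETag"), resp.read())
    conn.close()
    return result


def demo():
    """Fetch once without a validator, then again with the returned ETag."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        first = fetch(port)
        second = fetch(port, etag=first[1])
        return first, second
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the two results: the first request carries the HTML in its body, while the second, which sends If-None-Match, gets a 304 with an empty body. That is the situation captured in frame 6.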

Why is the HTTP location header only set for POST requests/201 (Created) responses?

Ignoring 3xx responses for a moment, I wonder why the HTTP location header is only used in conjunction with POST requests/201 (Created) responses.
From the RFC 2616 spec:
For 201 (Created) responses, the Location is that of the new resource which was created by the request.
This is a widely supported behavior, but why shouldn't it be used with other HTTP methods? Take the JSON API spec as an example:
It defines a self referencing link for the current resource inside the JSON payload (not uncommon for RESTful APIs). This link is included in every payload. The spec says that you MUST include an HTTP location header, if you create a new document via POST and that the value is the same as the self referencing link in the payload, but this is ONLY needed for POST. Why bother with a custom format for a self referencing link, if you could just use the HTTP location header?
Note: This isn't specific to JSON API. It's the same for HAL, JSON Hyper-Schema or other standards.
Note 2: It isn't even specific to the HTTP location header as it is the same with the HTTP link header. As you can see the JSON API, HAL and JSON Hyper-Schema not only define conventions for self referencing links, but also to express information about related resources or possible actions for a resource. But it seems that they all could just use the HTTP link header. (They could even put the self referencing link into the HTTP link header, if they don't want to use the HTTP location header.)
I don't want to rant, it just seems to be some sort of "reinventing the wheel". It also seems to be very limiting: if you would just use HTTP location/link header, it doesn't matter if you ask for JSON, XML or whatever in your HTTP accept header and you would get useful meta-information about your resource on a HEAD request, which wouldn't contain the links if you would use JSON API, HAL or JSON Hyper-Schema.
The semantics of the Location header aren't those of a self-referencing link, but of a link the user-agent should follow in order to complete the request. That makes sense in redirects, and when you create a new resource that will be in a new location you should go to. If your request is already completed, meaning you already have a full representation of the resource you wanted, it doesn't make sense to return a Location.
The Link header may be considered semantically equivalent to a hypertext link, but it should be used to reference metadata related to the given resource when the media-type is not hypermedia-aware, so it doesn't replace the functionality of a link to related resources in a RESTful API.
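As a rough illustration of the 201 case, the sketch below runs a throwaway server that answers a POST with 201 Created and a Location header pointing at the new resource. The /items path, the in-memory list, and the function name are hypothetical; only the status code and header come from the discussion above.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    items = []  # in-memory store shared by all requests (demo only)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.items.append(self.rfile.read(length))
        # 201: the request created a resource; Location says where it lives.
        self.send_response(201)
        self.send_header("Location", f"/items/{len(self.items)}")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def create_item(payload):
    """POST payload to a temporary local server; return (status, Location)."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        conn.request("POST", "/items", body=payload)
        resp = conn.getresponse()
        resp.read()
        result = (resp.status, resp.getheader("Location"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

The client gets no representation back here, so Location is the only way it learns where the new resource is. On a plain GET of an existing resource the client already knows the URL it asked for, which is why a Location header adds nothing there.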
The need for a custom link format in the resource representation is inherent to the need to decouple the resource from the underlying implementation and protocol. REST is not coupled to HTTP, and any protocol for which there's a valid URI scheme can be used. If you decided to use the Link header for all links, you're coupling to HTTP.
Let's say you present an FTP link for clients to follow. Where would be the Link in that case?
The semantics of the Location header depend on the status code. For 201, it links to the newly created resource, but in 3xx responses it can have multiple (although similar) meanings. I think that is why it is generally avoided for other usages.
The alternative is the Content-Location header, which always has a consistent meaning. It tells the client the canonical URL of the resource it requested. It is purely informative (in contrast to Location, which is expected to be processed by the client).
So, the Content-Location header seems to more closely resemble a self-referencing link. However, Content-Location also has no defined behavior for PUT and POST. It also seems to be quite rarely used.
The blog post "Location vs Content-Location" is a nice comparison. Here is a quote:
Finally, neither header is meant for general-purpose linking.
In sum, requiring a standardized, self link in the body seems to be good idea. It avoids a lot of confusion on the client side.

Why does yslow identify my images, scripts and css as cookies?

I am using yslow version 3.1.4
A cookie is sent with every request made to the host domain that matches the domain and path attributes that were specified when it was set with a set-cookie: response header.
When your browser issues a GET for print.css its request header will still contain a cookie: header if the domain & path match.
To prevent this, see the question "setting the path on cookie prevent it being sent in http static requests?".
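The matching rule can be sketched roughly as follows. This is a simplified version of the domain and path matching that browsers apply (it ignores IP-address hosts, host-only cookies, and the Secure and SameSite attributes), and the function names and example.com hosts are invented for the illustration.

```python
def domain_matches(host, cookie_domain):
    """True if the request host equals the cookie domain or is a subdomain of it."""
    host = host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)


def path_matches(request_path, cookie_path):
    """True if cookie_path is a whole-segment prefix of request_path."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # "/app" must match "/app/page" but not "/application".
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


def cookie_is_sent(host, request_path, cookie_domain, cookie_path):
    """Would a cookie with this domain and path accompany the request?"""
    return domain_matches(host, cookie_domain) and path_matches(request_path, cookie_path)
```

With a cookie set for path "/", a request for /css/print.css on the same domain matches, so the cookie goes along. Setting the cookie for a narrower path such as "/app", or serving static files from a separate domain, makes the check fail for those static requests.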

Do HTTP POST methods send data as a QueryString?

I'd like to know if the POST method on HTTP sends data as a QueryString, or if it use a special structure to pass the data to the server.
In fact, when I analyze the communication with POST method from client to server (with Fiddler for example), I don't see any QueryString, but a Form Body context with the name/value pairs.
The best way to visualize this is to use a packet analyzer like Wireshark and follow the TCP stream. HTTP simply uses TCP to send a stream of data starting with a few lines of HTTP headers. Often this data is easy to read because it consists of HTML, CSS, or XML, but it can be any type of data that gets transferred over the internet (executables, images, video, etc.).
For a GET request, your computer requests a specific URL and the web server usually responds with a 200 status code, and the content of the webpage is sent directly after the HTTP response headers. This content is the same content you would see if you viewed the source of the webpage in your browser. The query string you mentioned is just part of the URL and gets included in the HTTP GET request header that your computer sends to the web server. Below is an example of an HTTP GET request to http://accel91.citrix.com:8000/OA_HTML/OALogout.jsp?menu=Y, followed by a 302 redirect response from the server. Some of the HTTP headers are wrapped due to the size of the viewing window (these really only take one line each), and the 302 redirect includes a simple HTML webpage with a link to the redirected webpage (most browsers will automatically redirect any 302 response to the URL listed in the Location header instead of displaying the HTML response):
For a POST request, you may still have a query string, but this is uncommon and does not have anything to do with the data that you are POSTing. Instead, the data is included directly after the HTTP headers that your browser sends to the server, similar to the 200 response that the web server uses to respond to a GET request. In the case of POSTing a simple web form this data is encoded using the same URL encoding that a query string uses, but if you are using a SOAP web service it could also be encoded using a multi-part MIME format and XML data.
For example here is what an HTTP POST to an XML based SOAP web service located at http://192.168.24.23:8090/msh looks like in Wireshark Follow TCP Stream:
Post uses the message body to send the information back to the server, as opposed to Get, which uses the query string (everything after the question mark). It is possible to send both a Get query string and a Post message body in the same request, but that can get a bit confusing so is best avoided.
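To make the difference concrete, here is a small sketch that builds the text of a form POST by hand. The function name, path, host, and field names are placeholders; the point is only where the name/value pairs end up.

```python
from urllib.parse import urlencode


def build_form_post(path, host, fields):
    """Return the raw text of an HTTP/1.1 POST carrying url-encoded form data."""
    body = urlencode(fields)  # same encoding a query string would use
    lines = [
        f"POST {path} HTTP/1.1",  # no "?name=value" on the request line
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode('ascii'))}",
        "",  # blank line ends the headers
        body,  # the form data travels here, in the message body
    ]
    return "\r\n".join(lines)
```

Printing build_form_post("/login", "example.com", {"user": "ada", "note": "hello world"}) shows the pairs after the blank line that terminates the headers, which matches what Fiddler displays as the form body.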
Generally, best practice dictates that you use Get when you want to retrieve data, and Post when you want to alter it. (These rules aren't set in stone, the specs don't forbid altering data with Get, but it's generally avoided on the grounds that you don't want people making changes just by clicking a link or typing a URL)
Conversely, you can use Post to retrieve data without changing it, but using Get means you can bookmark the page, or share the URL with other people, things you couldn't do if you'd used Post.
http://en.wikipedia.org/wiki/GET_%28HTTP%29
http://en.wikipedia.org/wiki/POST_%28HTTP%29
As for the actual format of the data sent in the message body, that's entirely up to the sender and is specified with the Content-Type header. If not specified, the default content-type for HTML forms is application/x-www-form-urlencoded, which means the server will expect the post body to be a string encoded in a similar manner to a GET query string. However this can't be depended on in all cases. RFC2616 says the following on the Content-Type header:
Any HTTP/1.1 message containing an entity-body SHOULD include a
Content-Type header field defining the media type of that body. If
and only if the media type is not given by a Content-Type field, the
recipient MAY attempt to guess the media type via inspection of its
content and/or the name extension(s) of the URI used to identify the
resource. If the media type remains unknown, the recipient SHOULD
treat it as type "application/octet-stream".
A POST request can include a query string, however normally it doesn't - a standard HTML form with a POST action will not normally include a query string for example.
GET will send the data as a querystring, but POST will not. Rather it will send it in the body of the request.
If your POST targets the following URL
mypage.php?id=1
you will have the POST data and also GET data.
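A quick sketch of how the two sources stay separate on the receiving side, using the URL from above and a made-up form body. The function name is hypothetical; in PHP the same split shows up as $_GET and $_POST.

```python
from urllib.parse import parse_qs, urlsplit


def split_request_data(url, body):
    """Return (query-string fields, body fields) for a form POST."""
    query_fields = parse_qs(urlsplit(url).query)  # from the part after "?"
    body_fields = parse_qs(body)  # from the url-encoded message body
    return query_fields, body_fields
```

For example, split_request_data("mypage.php?id=1", "name=ada&lang=en") keeps id in the first dictionary and name and lang in the second.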

How does HTTP and HTML Work Together?

The answer to this little question will clear everything up for me.
Say I have a form tag that has a GET method and an action of some random script.
When I hit the submit button on the page, is the GET method sent to HTTP, and is HTTP what appends the query string to the URL? Does HTTP then return a 20X status if the response is good and a 40X if it is bad? And does our action go to our web server to run the script?
HTTP is transport and HTML is content. The form submit issues a GET or POST request to the server, depending on the method defined for the HTML form. The form's arguments are added to the HTTP request by the browser's form logic: depending on whether GET or POST is used, they are appended to the request URL or put into the request body.
Then the request is handled on the server and the result is returned by the server logic (which can be a CGI, some perl script, a J2EE application etc.).
The server responds with an HTTP status code (where everything below 300 is a success, and everything above 399 is an error - see here: HTTP status codes).
You are sending your form's data via HTTP using the "get" request. HTTP is a protocol and not a server. Your request is handled by a server that knows how to handle the HTTP protocol, e.g. Apache.
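The browser side of this can be sketched in a few lines. Both helper names are invented, and the GET branch is simplified: it assumes the action URL has no query string of its own.

```python
from urllib.parse import urlencode


def form_request(method, action, fields):
    """Return (url, body) the way a browser would submit a simple form."""
    encoded = urlencode(fields)
    if method.upper() == "GET":
        # GET: the browser appends the fields to the action URL.
        return f"{action}?{encoded}", None
    # POST: the URL stays as is and the fields go into the request body.
    return action, encoded


def status_class(code):
    """Coarse reading of a status code, following the rule of thumb above."""
    if 100 <= code < 200:
        return "informational"
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirection"
    if 400 <= code < 600:
        return "error"
    return "unknown"
```

So it is the browser, not HTTP itself, that builds the query string; HTTP just carries the resulting request to the server and the status code back.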
The server processes the data and sends back a response. As you mention there are different kind of responses. 404 is best known (document not found).
The script is not run on the server, it is run on the client (the browser).
HTML is the markup code that describes the structure of the page. Browsers interpret the HTML code they receive and construct your page from it. Check here for more details: Wikipedia: HTML
The HTTP is the protocol used by the browser to talk to the server. Check this for more details: Wikipedia again: HTTP