HTTP protocol: HTML only? - html

My informatics teacher explained that HTTP can really be used exclusively to transfer unencoded ASCII, and as soon as an image or the like have to be downloaded from the server, FTP is used instead.
Is this true?

No, it isn't. HTTP contains a header and data part, the latter gets interpreted by the receipent according to the CONTENT-TYPE header. HTTP can transfer arbitrary data.

If you want any more specific information about the HTTP protocol, the full RFC (request for comments) document is available here: http://www.w3.org/Protocols/rfc2616/rfc2616.html
An RFC is where the various different parties who have a stake in these protocols are presented the current version and invited to give their opinion. It's the method by which most of the internet has been built :)

Related

Why not use enctype="multipart/form-data" always?

By change I discovered that the django admin interfaces uses enctype="multipart/form-data" always.
I would like to adopt this pattern, but I am unsure if I see all consequences this has.
Why not use enctype="multipart/form-data" always?
Update
Since more than one year we use enctype="multipart/form-data" always in some forms. Works fine.
From the RFC that defines multipart/form-data:
Many web applications use the "application/x-www-form-urlencoded"
method for returning data from forms. This format is quite compact,
for example:
name=Xavier+Xantico&verdict=Yes&colour=Blue&happy=sad&Utf%F6r=Send
However, there is no opportunity to label the enclosed data with a
content type, apply a charset, or use other encoding mechanisms.
Many form-interpreting programs (primarily web browsers) now
implement and generate multipart/form-data, but a receiving
application might also need to support the
"application/x-www-form-urlencoded" format.
Aside from letting you upload files, multipart/form-data also allows you to use other charsets and encoding mechanisms. So the only reasons not to use it are:
If you want to save a bit of bandwidth (bearing in mind that this becomes much less of an issue if the request body is compressed).
If you need to support really old clients that can't handle file uploads and only know application/x-www-form-urlencoded, or that have issues handling anything other than ASCII.
There's a bit of overhead with using multipart/form-data for simple text forms. Compare a simple form with name and email.
Default (x-www-form-urlencoded)
Content-Type: application/x-www-form-urlencoded; charset=utf-8
name=Nomen+Nescio&email=foo%40bar.com
multipart/form-data
Content-Type: multipart/form-data; boundary=96a188ad5f9d4026822dacbdde47f43f
--96a188ad5f9d4026822dacbdde47f43f
Content-Disposition: form-data; name="name"
Nomen Nescio
--96a188ad5f9d4026822dacbdde47f43f
Content-Disposition: form-data; name="email"
foo#bar.com
--96a188ad5f9d4026822dacbdde47f43f
As you can see, you need to transmit a bunch of additional bytes in the body when using multipart encoding (37 bytes vs 252 bytes in this example)
But when you add the http headers and apply compression, the relative difference in payload would in most real life cases be much smaller.
The reason to prefer urlencoded over multipart is a small saving in http request size.
TL; DR
There's almost certainly no problem if you're targeting any modern browser and using SSL for any confidential data.
Background
The form-data type was originally developed as an experimental extension for file uploads in browsers, as explained in rfc 1867. There were compatibility issues at the time, but if your target browsers supports HTML 4.x and hence the enc-type, you're fine. As you can see here that's not an issue for all mainstream browsers.
As already noted in other answers, it is a more verbose format, but that is also not an issue when you can compress the request or even just rely on the improved speed of communications in the last 20 years.
Finally, you should also consider the potential for abuse of this format. Since it was designed to upload files, there was the potential for this to be used to extract information from the user's machine without their knowledge, or sending confidential information unencrypted, as noted in the HTML spec. Once again, though, modern browsers are so field hardened, I would be stunned if such low hanging fruit was left for hackers to abuse and you can use HTTPS for confidential data.
The enctype attribute specifies how the form-data should be encoded when submitting it to the server and enctype="multipart/form-data" is used when a user want to upload a file (images, text files etc.) to the server.

Why is the HTTP location header only set for POST requests/201 (Created) responses?

Ignoring 3xx responses for a moment, I wonder why the HTTP location header is only used in conjunction with POST requests/201 (Created) responses.
From the RFC 2616 spec:
For 201 (Created) responses, the Location is that of the new resource which was created by the request.
This is a widely supported behavior, but why shouldn't it be used with other HTTP methods? Take the JSON API spec as an example:
It defines a self referencing link for the current resource inside the JSON payload (not uncommon for RESTful APIs). This link is included in every payload. The spec says that you MUST include an HTTP location header, if you create a new document via POST and that the value is the same as the self referencing link in the payload, but this is ONLY needed for POST. Why bother with a custom format for a self referencing link, if you could just use the HTTP location header?
Note: This isn't specific to JSON API. It's the same for HAL, JSON Hyper-Schema or other standards.
Note 2: It isn't even specific to the HTTP location header as it is the same with the HTTP link header. As you can see the JSON API, HAL and JSON Hyper-Schema not only define conventions for self referencing links, but also to express information about related resources or possible actions for a resource. But it seems that they all could just use the HTTP link header. (They could even put the self referencing link into the HTTP link header, if they don't want to use the HTTP location header.)
I don't want to rant, it just seems to be some sort of "reinventing the wheel". It also seems to be very limiting: if you would just use HTTP location/link header, it doesn't matter if you ask for JSON, XML or whatever in your HTTP accept header and you would get useful meta-information about your resource on a HEAD request, which wouldn't contain the links if you would use JSON API, HAL or JSON Hyper-Schema.
The semantics of the Location header isn't that of a self-referencing link, but of a link the user-agent should follow in order to complete the request. That makes sense in redirects, and when you create a new resource that will be in a new location you should go to. If your request is already completed, meaning you already have a full representation of the resource you wanted, it doesn't make sense to return a Location.
The Link header may be considered semantically equivalent to an hypertext Link, but it should be used to reference metadata related to the given resource when the media-type is not hypermedia-aware, so it doesn't replace the functionality of a link to related resources in a RESTful API.
The need for a custom link format in the resource representation is inherent to the need to decouple the resource from the underlying implementation and protocol. REST is not coupled to HTTP, and any protocol for which there's a valid URI scheme can be used. If you decided to use the Link header for all links, you're coupling to HTTP.
Let's say you present an FTP link for clients to follow. Where would be the Link in that case?
The semantic of the Location header depends on the status code. For 201, it links to the newly created resource, but in 3xx requests it can have multiple (although similiar) meanings. I think that is why it is generally avoided for other usages.
The alternative is the Content-Location header, which always has a consistent meaning. It tells the client the canonical URL the resource it requested. It is purely informative (in contrast to the Location, which is expected to be processed by the client).
So, the Content-Location header seems to closer resemble a self-referencing link. However, the Content-Location also has no defined behavior for PUT and POST. It also seems to be quite rarely used.
This blogs post Location vs Content-Location is a nice comparison. Here is a quote:
Finally, neither header is meant for general-purpose linking.
In sum, requiring a standardized, self link in the body seems to be good idea. It avoids a lot of confusion on the client side.

REST : different representations of the same data

How to structure an API where the same data may request in different format, in a RESTful format. For example.
GET /person/<id> //get the details of resource <id>
Now depending on the client (browser) requirement, the data may send as html (say normal rendering) or Json (say ajax call). So my doubts are
Can I keep the same url for both requests, or should keep them seperate?
How to detect whether the request is for html/Json at the server. The request type is same (GET). So which parameter should I consider.
How to detect the difference in data type at client (html/Json)\
thanks,
bsr.
Similar question: REST Content-Type: Should it be based on extension or Accept header?
The accepted answers has great points.
Can I keep the same url for both requests, or should keep them seperate?
Yes, keep them the same. Its the same resource, you're just asking for different representations of it.
How to detect whether the request is for html/Json at the server. The request type is same (GET). So which parameter should I consider.
You can use the Accept header to specify the return content-type.
How to detect the difference in data type at client (html/Json)\
You would look at the "Content-Type" header.
What about adding a variable for output type?

Mimetypes for a RESTful API

The Sun Cloud API at http://kenai.com/projects/suncloudapis/pages/Home is a good example to follow for a RESTful API. True to RESTful principles, when you GET a resource you get no more nor less than a representation of that resource.
The Content-Type header in the response tells you exactly what the type of that resource is, for example application/vnd.com.sun.cloud.Snapshot+json. Sun has registered these mimetypes with the IANA.
How practical is this in general currently? Most API's I have seen have used the Content-Type of "application/json". This tells you that the response is JSON but nothing more about it. You have to have something in the JSON object, like a "type" property, to know what it is.
I'm designing a RESTful API (which will not be made public, therefore I wouldn't be registering mimetypes). I have been using RESTEasy and I find that even if I specify a complete mimetype, the Content-Type in the response header will be exactly what the Accept request header specified. If the request asks for "application/*+json" by default the response header will have "application/*+json". I can probably fix this by changing the header before the response goes out, but should I try to do that? Or should the response have a wildcard just like the request did?
Or should I just serve up "application/json" like most APIs seem to do?
Additional thoughts added later:
Another way of stating the question is: Should I use HTTP as the protocol, or should I use HTTP only as a transport mechanism to wrap my own protocol?
To use HTTP as the protocol, the entity body of the response contains the representation of the object requested (or the representation of an error message object), the "Content-Type" header contains the exact type of the object, and the "Status" header contains a success or error code.
To use HTTP as merely a transport mechanism, the "Status" header is always set to 200 OK, the "Content-Type" is something generic like "application/json", and the entity body contains something that itself has an object, an object type, an error code and whatever else you want. If your own protocol is RESTful, then the whole scheme is RESTful. (HTTP is a RESTful protocol but not the only possible one.)
Your own protocol will be opaque to all the transport layers. If you use HTTP as the protocol, all the transport layers will understand it and may do things you don't want; for instance a browser will intercept a "401 Unauthorized" response and put up a login dialog, even if you want to handle it yourself.
I use my own vnd.mycompany.mymediatype+xml media types for many of my representations. On the client I dispatch to the appropriate controller class based on the media type of the returned representation. This really allows the server to control the behavior of my client application in response to the user following a link.
Personally, I believe that using application/xml and application/json are one of the worst choices you can make if you hoping to support REST clients. The only exception to this is when the client only uses downloaded code (like Javascript) to interpret the data.
Or should I just serve up "application/json" like most APIs seem to do?
I don't think so.
A media type is the only point of coupling between your RESTful web application and the clients that use it. The documentation of your media types is the documentation of your API. Your media types are the contract between your clients and your application. Eliminate the specific media type and you eliminate an important element that makes REST workable.
Sun has registered these mimetypes with the IANA.
Couldn't find any mention of that here. AFAIK, there is no requirement to actually register your custom media type with the IANA. The convention seems to be to use the inverted domain notation of application/vnd.com.example.app.foo+json, which prevents namespace conflicts. If and when your media type becomes stable and public, it might be a good idea, but there's no requirement. Could be wrong on this, though.
Will you get any value by specifying a complete mimetype? Would you do anything with the complete mimetype different than you would if the mimetype was application/json?
My 2 cents- If the API is not going to be made public, then I see no reason for a complete mimetype. A mimetype of application/json should be more than enough. You already know the type of json that the response is returning. If the API eventually becomes public, then worry about a complete mimetype... or just let people figure it out.

How to respond to an alternate URI in a RESTful web service

I'm building a RESTful web service which has multiple URIs for one of its resources, because there is more than one unique identifier. Should the server respond to a GET request for an alternate URI by returning the resource, or should I send an HTTP 3xx redirect to the canonical URI? Is HTTP 303 (see also) the most appropriate redirect?
Clarification: the HTTP specification makes it clear that the choice of redirect depends on which URI future requests should use. In my application, the 'canonical' URI is the most stable of the alternatives; an alternative URI will always direct to same canonical URI, or become invalid.
I'd personally plump for returning the resource rather than faffing with a redirect, although I suspect that's only because my subcoscious is telling me redirects are slower.
However, if you were to decide to use a redirect I'd think a 302 or 307 might be more appropiate than a 303, although the w3.org has details of the different redirect codes you could use.
Under W3C's Architexture of the World Wide Web, Volume One, there is a section on URI aliases (Section 2.3.1) which states the following:
"When a URI alias does become common currency, the URI owner should use protocol techniques such as server-side redirects to relate the two resources. The community benefits when the URI owner supports redirection of an aliased URI to the corresponding "official" URI. For more information on redirection, see section 10.3, Redirection, in RFC2616. See also CHIPS for a discussion of some best practices for server administrators."
For what it's worth, I would recommend a 302 redirect.
The answer from Ubiguchi had what I needed, except that I now think a redirect is the way to go, via the link to the HTTP 1.1 specifiction section on response codes. It turns out that I actually need a 301 redirect because the URI I'm redirecting to is more 'correct' and stable, and should therefore be used for future requests.