HATEOAS links in Header or in Entity - json

I've seen two primary ways to add HATEOAS links to a JSON REST API, and I'm not sure which is more standard or what the pros and cons of each approach are.
The typical approach I see (Atom Links) is that the returned entity gets an extra field named either links or _links. This field is an array of rel=<rel> and href=<href> pairs.
But I've also seen links put into a header named "Link" (Link Headers). The Link header is a collection of entries in the format <href>; rel=<rel>.
Also, I noticed that JAX-RS doesn't seem to add Atom Links with fully qualified hrefs, only paths. By fully qualified I mean scheme and authority included. Is it considered bad practice to have a complete URI for the href when using Atom Links for HATEOAS?

All the HATEOAS formats I know use the link relation RFC https://www.rfc-editor.org/rfc/rfc5988 to abstractly define a link relationship. This RFC describes the Link header, which is a fine way of conveying link relationships. Other formats serialize/present links in different ways. _links is probably most associated with the HAL+JSON format, while links is used by Siren and Collection+JSON (which also allows for link headers).
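To make the two styles concrete, here is a minimal Python sketch that serializes the same set of link relations once as a HAL-style `_links` field and once as a `Link` header value. The order resource, its URLs, and the function names are made up for illustration; this is not taken from any particular framework.

```python
import json

# Hypothetical links for an order resource; the URLs are placeholders.
links = {
    "self": "https://api.example.com/orders/42",
    "next": "https://api.example.com/orders/43",
}

def as_hal_body(data, links):
    """Embed the links in the entity under `_links`, HAL style."""
    body = dict(data)
    body["_links"] = {rel: {"href": href} for rel, href in links.items()}
    return json.dumps(body, sort_keys=True)

def as_link_header(links):
    """Serialize the same links as a single Link header value."""
    return ", ".join(f'<{href}>; rel="{rel}"' for rel, href in links.items())

hal_body = as_hal_body({"id": 42}, links)
link_header = as_link_header(links)
print(hal_body)
print("Link:", link_header)
```

Either output carries the same rel/href pairs; the difference is only whether a client finds them by parsing the body or by reading a header.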
It's all a matter of preference. Often it comes down to whether you think of link relationships as metadata of the resource or as actually part of the resource. Sometimes it's both. HTML primarily treats them as part of the resource and has been wildly successful with that. Having them in the response body makes it very easy to see them in a browser; headers are a little trickier to see.
Whether URLs are absolute, scheme relative, root relative, or path relative is again a matter of preference. Something I like to keep in mind is that a resource is not always retrieved from a request, so relative paths can often be useless, for example when a resource is stored in a cache or on disk. Absolute or scheme relative URLs are much more portable across systems, and I personally prefer them over root or path relative URLs in most cases. There are interesting scenarios where you may actually want URLs to be relative so they target different destinations depending on the executing environment. There was an interesting discussion on this recently in the HAL forum: https://groups.google.com/forum/#!topic/hal-discuss/_rwYvjLOT7Q
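The portability point can be checked with Python's standard `urljoin`. In this sketch the hosts and paths are invented: the four kinds of href all resolve to the same target when the document was fetched from the production host, but the root relative and path relative ones follow the base URL when the same document is served from somewhere else.

```python
from urllib.parse import urljoin

# Four ways to write the same hypothetical link target.
HREFS = {
    "absolute": "https://api.example.com/customers/7",
    "scheme-relative": "//api.example.com/customers/7",
    "root-relative": "/customers/7",
    "path-relative": "../customers/7",
}

def resolve_all(base_url):
    """Resolve every href against the URL the document was retrieved from."""
    return {kind: urljoin(base_url, href) for kind, href in HREFS.items()}

live = resolve_all("https://api.example.com/orders/42")
staging = resolve_all("https://staging.example.net/orders/42")

for kind in HREFS:
    print(f"{kind:16} live={live[kind]}  staging={staging[kind]}")
```

A document read from a cache or from disk has no meaningful base URL at all, which is the case where only the absolute form still works.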

Adding another answer because HATEOAS links in headers have bitten us. If we had the option to do it again, we would surely use links in the payload instead of the headers.
The main reasons are:
When querying a collection, getting links for each of the records requires some special way to allow you to know which of the records a link is for.
<cry>We misused the title attribute to be a record ID </cry>
You can easily get a 431 Request Header Fields Too Large error if you have too many links.
<scream>Our collections don't send all the links because of that limitation, which means we have to fetch every instance separately to have access to all the actions.</scream>
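For comparison, this is roughly what per-record links look like when they live in the payload. It is only a sketch: the `items` field, the `cancel` relation, and the URLs are hypothetical, and the layout borrows the `_links` convention without claiming to be a strict instance of HAL or any other format.

```python
import json

def record_with_links(record, base):
    """Attach the links for one record directly to that record."""
    url = f"{base}/{record['id']}"
    return {
        **record,
        "_links": {
            "self": {"href": url},
            "cancel": {"href": url + "/cancel"},  # hypothetical action link
        },
    }

def collection(records, base):
    """Build a collection document whose items each carry their own links."""
    return {
        "_links": {"self": {"href": base}},
        "items": [record_with_links(r, base) for r in records],
    }

doc = collection([{"id": 1}, {"id": 2}], "https://api.example.com/orders")
print(json.dumps(doc, indent=2))
```

Because each link sits next to the record it belongs to, no extra attribute is needed to say which record a link is for, and the number of links is not bounded by header size limits.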

Related

Should the links in a REST API be placed in the response header or body?

Is there a best practice for where to put the links to other resources in a REST API response? When I look at standards like HAL, they always seem to put their links in the body. Is there a reason for that? I'm developing an API in JAX-RS, so it would be really easy to put the links in the header, and it would be great if that's a viable option.
If you have an actual Hypermedia content type, then, yes, the links should be in the body. Having the link type is part and parcel to what a Hypermedia data type is all about.
However, not all media types support hypermedia (for example, images), so any relevant links for those types can only belong in the header.
In the end, though, if as you say "it's really easy", then by all means, just use the header links.
You can also put them in both places. Use the header links for yourself in your system (i.e. you can assume they're there and leverage them), and if it's not a huge burden, populate the same links within the Hypermedia documents that you publish.
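A client that wants to cope with links in either place could look in the body first and fall back to the header. The Python sketch below assumes a HAL-style `_links` field in JSON bodies and uses a deliberately simplified regular expression for the `Link` header; it does not handle the full header grammar (multiple rel values, other parameters before rel, and so on).

```python
import re

# Simplified pattern: <href>; rel="name" or <href>; rel=name
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')

def header_links(value):
    """Parse a Link header value into a {rel: href} dict (simplified)."""
    return {rel: href for href, rel in LINK_RE.findall(value)}

def find_link(rel, headers, body):
    """Prefer a link embedded in the body; fall back to the Link header."""
    body_links = body.get("_links", {}) if isinstance(body, dict) else {}
    if rel in body_links:
        return body_links[rel]["href"]
    return header_links(headers.get("Link", "")).get(rel)

# An image has no room for links in its body, so only the header can carry them.
image_headers = {"Link": '<https://api.example.com/albums/3>; rel="up"'}
print(find_link("up", image_headers, b"\x89PNG..."))

# A hypermedia document can carry the same link itself.
doc = {"_links": {"up": {"href": "https://api.example.com/albums/3"}}}
print(find_link("up", {}, doc))
```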

Having links relative to path (i.e. http://domain/path/)

Are there commonly accepted ways to have all links and references to images, scripts, stylesheets be relative to some path regardless of current document's URL?
Let's start from the very beginning. I am developing a custom content managing system in PHP. I am using mod_rewrite to redirect all requests like http://domain.com/path/artist/edit/25 to http://domain.com/path/index.php?url=/artist/edit/25. So the part of the URL following http://domain.com/path/ is actually virtual.
I would like all links to be in the format like ... and references to images, scripts, etc. in the format like <link href="ui/css/style.css"...>.
Well, it seems to be possible with:
...
<base href="http://domain.com/path/" />
...
This way I can link to scripts and stylesheets in a way like below:
...
<!-- Custom page style CSS -->
<link href="ui/css/style.css" rel="stylesheet" type='text/css'>
<!-- Support for CSS3 media query in IE8 -->
<script type="text/javascript" src="ui/js/respond.js"></script>
<!-- MooTools 1.6.0 -->
<script type="text/javascript" src="ui/js/MooTools-Core-1.6.0.js"></script>
...
However, AFAIK the <base href=...> should match the current page request (which is http://domain.com/path/artist/edit/25). And it ruins the whole concept.
That's why I need you to clarify:
Is it a commonly accepted practice to have <base href=...> pointing to a directory and not to the current document URL?
Does this practice comply with the requirements for the usage of HTML <base> element?
Will it in any way affect crawlers like Googlebot? Do they require the <base href=...> to match each particular document URL?
I also would like to know how do you solve the problem of relative links and references to resources when some part of URL is virtual. I have discovered that projects like WordPress tend to completely avoid relative links and go the "absolute links way".
The whole point of the base element is to specify an arbitrary base URL to be used to resolve relative links instead of the current-document URL. Otherwise the element would not make sense since current-document URL is used as the base url by default anyway.
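The effect of an explicit base URL can be reproduced with Python's standard `urljoin`, which follows the same general relative-reference resolution rules that browsers apply. The URLs are the ones from the question; the variable names are only for illustration.

```python
from urllib.parse import urljoin

document_url = "http://domain.com/path/artist/edit/25"
base_href = "http://domain.com/path/"
stylesheet = "ui/css/style.css"

# Without <base>, the document URL is the base, so the "virtual" path leaks in.
without_base = urljoin(document_url, stylesheet)

# With <base href="http://domain.com/path/">, the relative URL resolves as intended.
with_base = urljoin(base_href, stylesheet)

print(without_base)
print(with_base)
```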
Major crawlers support both absolute and relative URLs as well as the base element. Some shake-and-bake crawlers don’t understand relative URLs and/or don’t support the base element (thus resulting in multiple 404 lines in your server logs, though this is a minor thing).
I would recommend not to use the base element. Relative links tend to be error-prone resulting in wrong resolved URLs while not providing any serious benefits. It’s generally more reasonable and easy to always use absolute URLs.
Is it a commonly accepted practice to have <base href=...> pointing to a directory and not to the current document URL?
No, it's not common. In fact I'd say it's very uncommon, because there are better ways to create a logical information architecture for your site without it.
Will it in any way affect crawlers like Googlebot? Do they require the <base href=...> to match each particular document URL?
It's hard to get the base tag correct and there are ways to do what you want using better methods that are transparent to googlebot etc.
Note that absolute links are what you're seeing in the source, but that does not mean the links physically map to directories and files. Using tools like mod_rewrite on Apache you can structure your site in as many ways as you please on top of practically any physical filesystem. This is also what I'd recommend, because as things change you're not tied to a particular solution. It is also why most PHP apps send everything through an index.php script: the application then controls the information architecture, not the filesystem.
"base href" can be used without problems, but it is not always the best solution. It is fine if your server will answer requests with different server names and paths (e.g. "http://www.example.com/companysection/especificservice" and "http://service.internalnetwork.dev/")
IMHO it's not the best solution for your case.
In the URL "http://example.com/path/index.php?url=/artist/edit/25" you want to turn part of the query into a path ( base example.com/path/index.php ?url= )... and this can be a big problem. How are you going to handle requests that also have a query string of their own (a search term or a form GET, for example)?
Apache mod_rewrite would be a better option, as Harry's answer suggests (or nginx rewrite rules). With it you can easily "transform" a request like http://example.com/path/artist/edit/25?search=something&order=ASC into http://example.com/path/index.php?url=artist/edit/25&search=something&order=ASC
This will give you fewer problems in the long term.
Check the last example in https://wiki.apache.org/httpd/RewriteQueryString , it's really close to fulfill all your rewriting needs
(you will just need to ensure you handle the rest of query properly)
Take a URL of the form http://example.com/path/var/val and transform
it into a var=val query http://example.com/path?var=val. Essentially
the reverse of the above recipe. This example will work for any valid
three level URL. http://example.com/path/var/val will be transformed
into http://example.com/path?var=val.
RewriteRule ^/path/([^/]+)/([^/]+) /path?$1=$2
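The transformation this answer asks for (virtual path moved into a url parameter, original query string kept) can be simulated in Python to see what the rewritten request should look like. This is only a model of the mapping, not Apache configuration; the `rewrite` function is hypothetical, and in mod_rewrite itself keeping the original query string is what the QSA flag is for.

```python
import re
from urllib.parse import urlsplit

def rewrite(request_target):
    """Map /path/<virtual>?<query> to /path/index.php?url=<virtual>&<query>."""
    parts = urlsplit(request_target)
    match = re.match(r"^/path/(.+)$", parts.path)
    # Leave unrelated paths and the front controller itself untouched.
    if match is None or parts.path == "/path/index.php":
        return request_target
    query = "url=" + match.group(1)
    if parts.query:
        query += "&" + parts.query  # append the original query string
    return "/path/index.php?" + query

print(rewrite("/path/artist/edit/25?search=something&order=ASC"))
print(rewrite("/path/artist/edit/25"))
```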

REST, hypertext and non-browser clients

I am confused on how a REST api can both be hypertext driven, but also machine readable. Say I design an API and some endpoint lists contents of a collection.
GET /api/companies/
The server should return a list with company resources e.g:
/api/companies/adobe
/api/companies/microsoft
/api/companies/apple
One way would be to generate a hypertext (HTML) page with <a> links to the respective resources. However, I would also like to make it easy for a non-browser client to read this list. For example, some client might want to populate a dropdown GUI with companies. In this case returning an HTML document is inappropriate, and it might be better to return a list in JSON or XML format.
It is not clear to me how REST style can satisfy both. Is there a practical solution or examples of a REST api that is both nice to browsers and non-browser clients?
What you're looking for is nowadays referred to as HATEOAS APIs. See this question for examples: Actual examples for HATEOAS (REST-architecture)
The REST architectural style, as originally defined by Roy Fielding, prescribes "hypermedia as the engine of application state" as one of the architectural constraints. However, this concept got more or less "lost in translation" when people started equating "RESTful APIs" with "using the HTTP verbs right" (plus a little more, if you're lucky). (Edit: Providing credence for my assertion are the first and second highest-rated answers in What exactly is RESTful programming? . The first talks only about HTTP verbs).
Some thoughts on your question: (mainly because the subject keeps fascinating me)
In HATEOAS, standardized media types with precise meaning are very important. It's generally thought to be best to reuse an existing media type when possible, to benefit from general understanding and tooling around this. One popular method is using XML, because it offers both general structure for data and a way to define semantics, i.e. through an XML schema or with namespaces. XML in and by itself is more or less meaningless when considering HATEOAS. The same applies for JSON.
For supporting links, you want to choose a media type that either supports links "natively" (i.e. text/html, application/xhtml+xml) or a media type that allows defining which pieces of the document must be interpreted as links through some embedded metadata, as XML can with, for example, XLink. I don't think you could use application/json, because JSON by itself has no pre-defined place to define metadata. I do think that it would be possible to design a media type based on JSON - call it application/x-descriptive-json - that defines up-front that the JSON document returned must consist of a "header" and "body" property, where header may contain further specified metadata. You could also design a media type for JSON just to support embedded links: a simpler media type, but less extensible. I wouldn't be surprised if both things I describe already exist in some form.
To be nice to both browsers and non-browser clients, all it takes is respecting the Accept header. You must assume that a client who asks for text/html is truly happy with text/html. This could be an argument for not using text/html as the media type for your non-browser API entry point. In principle, I think it could work though, if the only thing you want is links. Good HTML markup can be consumed very well by non-browser clients. HTML also defines a way to do paging, through rel="next" and rel="previous".
The three biggest problems of a singular media type for both browsers and non-browsers I see are:
you must ensure all your site's HTML is output with non-browser consumption in mind, i.e. embed sufficient metadata. Perhaps add hidden links in some places. It's a bit comparable to thinking about accessibility for visually impaired people, though now you're designing for a consumer who cannot read English, or any natural language for that matter. :)
there may be lots of markup and content that is essentially irrelevant to a non-browser client. Think of repeating header and footer text, navigation areas, that kind of thing.
HTML may simply lack the expressiveness you need. In principle, as soon as you "think up" some conventions specific to your site (i.e. say rel="original-image" means the link to the full-size, original image), then you're not doing strictly HATEOAS anymore (at least, that's my understanding). HTML leaves no room for defining new meaning for elements. XML does.
A work-around to problem three might be using XHTML, since XHTML, by the virtue of being XML, does allow specifying new kinds of elements through namespaces.
I see #robert_b_clarke mentioning Microformats, which is relevant in this discussion. It's indeed one way of trying to improve accessibility for non-human agents. The main issue with this from a technical point of view is that it essentially relies on "out-of-band" information. Microformats are not part of the text/html spec. In a way, it's comparable to saying: "Hey, if I say that there's a resource with type A and id X, you can access it at mysite.com/A/X." The example I gave with rel=original-image could be called a micro-format as well. But it is a way to go. "State in your API docs: We serve nicely formatted text/html. Our text/html also embeds the following microformats: ..." You can even define your own ones.
I think the following presentation is a nice down-to-earth explanation of HATEOAS:
http://www.slideshare.net/apigee/hateoas-101-opinionated-introduction-to-a-rest-api-style
Edit:
I only now read about HTML5 microdata (because of #robert_b_clarke). It seems like HTML5 does provide a way for supplying additional information beyond what's possible with standard HTML tags. Consider what I wrote dated. :) Edit edit: It's only a draft, phew. ;)
Edit 2
Re a "descriptive JSON" format: This has just been announced http://jsonapi.org/ . They have applied for their own MIME type. It's by Yehuda Katz (Ember.js) and Steve Klabnik, who's writing Designing Hypermedia APIs.
The HTTP Accept header can be used by clients to request a response in a specific content type. For example, your REST API clients might request JSON data using the following header:
GET http://yourdomain.com/api/companies
Accept: application/json
So your server app can then serve JSON or HTML for the same URL depending on the value of the Accept header. Of course all your REST client apps will have to include that header, which may or may not be practical.
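A minimal Python sketch of that idea follows: one function, one resource, two representations chosen from the Accept header. The Accept parsing is intentionally naive (it only compares q values for the two offered types and ignores wildcards), the JSON field name `companies` is made up, and a real application would normally rely on its framework's content negotiation instead.

```python
import json
from html import escape

COMPANIES = ["adobe", "microsoft", "apple"]

def preferred_type(accept, offered=("text/html", "application/json")):
    """Pick the offered media type with the highest q value (naive parser)."""
    best, best_q = offered[0], -1.0  # default to the first offered type
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if parts[0] in offered and q > best_q:
            best, best_q = parts[0], q
    return best

def render_companies(accept):
    """Return (content_type, body) for the company list."""
    hrefs = [f"/api/companies/{name}" for name in COMPANIES]
    if preferred_type(accept) == "application/json":
        return "application/json", json.dumps({"companies": hrefs})
    items = "".join(f'<li><a href="{escape(h)}">{escape(h)}</a></li>' for h in hrefs)
    return "text/html", f"<ul>{items}</ul>"

print(render_companies("application/json"))
print(render_companies("text/html,application/xhtml+xml;q=0.9"))
```

Both representations expose the same links; only the serialization differs, which is what lets a dropdown-filling client and a browser share one URL.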
There are numerous alternative approaches, one of which is to serve the same XHTML content to both browsers and client apps. You can use HTML5 microdata or Microformats to embed structured data within HTML. That approach has a number of limitations. API client requests will result in larger, more complicated responses than necessary as they will include a load of stuff that's only usable by a web browser. There are also other differences in behaviour you might like to enforce. For instance you would probably want an unauthorized GET request for a protected resource to result in an HTTP 401 response for a machine client, and a redirect to login page for a web browser.
You may find that the easiest way is to be less principled and serve the human-friendly and machine-friendly versions of your resources through separate URLs:
http://yourdomain.com/companies
http://yourdomain.com/api/companies
I've seen this question answered several ways. Some developers add a request parameter to indicate the format of the response, as in /api/companies/?rtnType=json. This method may be acceptable in a small application. It is a departure from true RESTful theology though.
The better way (in Java at least) is to use something like the Spring Framework. Spring can provide dynamic response formatting based on the media type in the HTTP request. The book "Spring in Action" (Walls, 2011) has an excellent explanation of this in chapter 11. And there are similar ways to accomplish dynamic response formatting in other languages without breaking REST.

Which values for a link-tag's rel-attribute should be used to represent a hierarchy of collections and documents?

I would like to represent links to subfolders and documents in a folder in hypertext documents (possibly using HAL). Therefore, a document representing a folder should have links to the parent folder, subfolders and files contained in the folder.
For the parent folder, <link rel="up" href=".." > seems to be the straightforward choice. However, I am unsure as to what is most appropriate for links to subfolders and documents contained in the folder.
There are a couple of options defined in RFC-5988. However, I couldn't say which one would be most appropriate to represent a tree of folders and files.
I could come up with my own values and produce documents. For example (using HTML syntax rather than HAL for familiarity):
...
<link rel="self" href="http://example.com/some/folder/">
<link rel="up" href="http://example.com/some/">
<link rel="file" href="image1.jpg">
<link rel="file" href="image2.png">
<link rel="folder" href="subfolder/">
...
Using custom rel-attributes has the clear disadvantage of applications consuming these documents needing to have explicit support for them. Consequently, I'd rather use something that an application could understand by just following standards and best practices.
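For reference, the same folder could be written as a HAL-style document along these lines. This Python sketch keeps the custom `file` and `folder` relations from the example above purely for illustration (they are not registered relation types), and relies on HAL allowing a relation to hold an array of link objects.

```python
import json

def folder_document(self_url, parent_url, files, subfolders):
    """Build a HAL-style document for one folder."""
    return {
        "_links": {
            "self": {"href": self_url},
            "up": {"href": parent_url},
            # Custom relations: several links of one rel go into an array.
            "file": [{"href": f} for f in files],
            "folder": [{"href": s} for s in subfolders],
        }
    }

doc = folder_document(
    "http://example.com/some/folder/",
    "http://example.com/some/",
    files=["image1.jpg", "image2.png"],
    subfolders=["subfolder/"],
)
print(json.dumps(doc, indent=2))
```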
Update:
AtomPub (RFC 5023) seems to use rel="edit" on links to members of a collection. They don't have a concept of sub-collections, I believe. rel="subsection" from RFC-5988 might be an option.
Representing a Hierarchy
One very common way to represent a hierarchy is to use the concept of parent-child relationships between levels in the hierarchy. This allows you to describe the hierarchy very generally without tying yourself to only representing folders and documents since parent-child/children can represent any hierarchy. DOM parsers, for example, use this concept heavily, since the DOM is a hierarchy.
Assuming your hierarchy is not fully balanced (i.e. you can have documents 3 levels deep, and also 20 levels deep), you need to know what type of "node" you are at, either a collection (folder) or a leaf (document). With the notion of collection vs. leaf, and parent vs. child/children, you will be able to easily notate the relationship of all the links you should need.
Content Type
The most important part of any REST interaction is the clear definition of the content types at both the client and the server. Content types are rather obscure as far as what they need to actually be, but for your purposes it could be as little as indicating that the general resource "format" is HAL, and would also define the meaning of the parent and child rel values.
Most people forget that content types are for humans. Only the names of the content types matter to user-agents, not the contents of the content type definition. The content type definitions are used by developers (or very smart user agents) to decide how to interpret things. The user-agent just uses the name to switch interpretation modes.
Standards are not Everything
In general, it's more important for your application to represent its own resources the way your application needs to than for you to try to shoehorn your representation(s) into some "standard". If it makes the most sense for your application to use up, folder, and file, then by all means use those link relations. The only caution I would have is to think hard about how you might want to change things down the road. Changing an API isn't the easiest thing to do, but introducing a new content type and deprecating an older one isn't too tough.
Don't get me wrong, I'm all for standards, but they, by their very nature, are almost always behind the real world. Case in point, the RFC doesn't seem to handle generic hierarchical collections of things well.

Adding ids to HTML tags for QA automation

I have a query. In our application we have lots of HTML tags. During development many tags were not given an id because there was no requirement for one. Now the QA team wants to automate the test cases using QTP. In most cases the tool doesn't recognize the elements because it does not find ids for most of the HTML tags. Now we are being asked to add ids to all the HTML tags.
I want to know if there will be any effect from adding the id attribute to these tags. Even positive impacts are welcome.
I do not think there will be any effect, either positive or negative: maybe the size of the HTML page will increase a bit, but probably not that much.
Still, are you sure you need to put "id" attributes on every HTML tag of your pages? Wouldn't only a few of those be enough? Like on form fields, on links, on error messages; and that's probably about it?
One thing you must take care of, though, is that "id", as in "identifier", must be unique; which implies it might be good, before you start adding them, to define some kind of "id policy", to say, for instance, that "ids for elements of that kind should be named that way".
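Uniqueness is easy to check automatically once ids are being added in bulk. The following Python sketch uses only the standard library's `html.parser`; the class and function names are made up, and it inspects a single HTML string rather than a rendered page.

```python
from collections import Counter
from html.parser import HTMLParser

class IdCollector(HTMLParser):
    """Count how often each id attribute value occurs in a document."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1

def duplicate_ids(html):
    """Return the id values that appear more than once, sorted."""
    collector = IdCollector()
    collector.feed(html)
    return sorted(i for i, count in collector.ids.items() if count > 1)

page = '<form id="login"><input id="user"><input id="user"><a id="help">?</a></form>'
print(duplicate_ids(page))
```

Running a check like this as part of the build is one way to enforce whatever id policy the team agrees on.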
And, for your next projects: have developers add those while they're developing ;-)
(And following the policy, of course)
Now that I'm thinking about it: a positive effect might be that it'll be easier to write JavaScript code interacting with your HTML document -- but that'll only be true for future projects or later changes to this one, when those ids are already present in the HTML at the time developers put the JS code in place...
Since there are no QTP related answers yet.
GUI recognition in QTP is object-oriented. In order to identify an object, QTP needs a unique combination of the object's properties, and checking them should be as fast as possible - that is why an HTML ID would be ideal.
Where it is especially critical is for objects that do not have other unique identifiers. The most typical example is HTML tables. Their contents are dynamic, and their number on the page may vary. By adding an HTML ID you allow the recognition mechanism to get straight to the right table.
Objects with other unique properties can be recognized well without HTML ID. For example, if you have a single "submit" link on the page QTP will successfully recognize it by inner text.
So the context-specific answer: don't start adding ids to every single tag. Ask the automation guys to prepare a list of objects they have problems with, and add ids to those objects.
PS. It also depends on automation programming skills. There are descriptive programming and dynamic recognition methods, which allow retrieving the right objects even when no ids are provided.
As Albert said, QTP doesn't rely solely on elements' id, in fact due to the fact that many web applications generate different ids for each session, (as far as I remember) the id property isn't part of the default description for most web test objects.
QTP is pretty good at recognizing most simple web controls, and if you're facing problems it may be the case that a Web Extensibility project will help you bridge the gap between the semantics of your web application and the raw HTML it is created in. If a complex control is recognized by QTP as a WebElement (which is actually the div that contains the span that drives the code), you will understandably have object recognition problems, since there are many divs on the page but probably far fewer complex controls.
If you are talking about side-effects - NO. Adding ids won't cause any problems (apart from taking up some extra bytes of course)
If you really have the need to add ids, go ahead and add them.
http://www.w3.org/TR/html4/struct/links.html#anchors-with-id says: The id and name attributes share the same name space. This means that they cannot both define an anchor with the same name in the same document. It is permissible to use both attributes to specify an element's unique identifier for the following elements: A, APPLET, FORM, FRAME, IFRAME, IMG, and MAP. When both attributes are used on a single element, their values must be identical.