I want to know, if the Json files that would be used in Elasticsearch should have a predefined structure. Or can any Json document can be uploaded?
I've seen some Json documents that before each record there's such:
{"index":{"_index":"plos","_type":"article","_id":0}}
{"id":"10.1371/journal.pone.0007737","title":"Phospholipase C-β4 Is Essential for the Progression of the Normal Sleep Sequence and Ultradian Body Temperature Rhythms in Mice"}
Theoretically you can upload any JSON document. However, be mindful that Elasticsearch can create/change the index mapping based on your create/update actions. So if you send a JSON that includes a previously unknown field? Congratulations, your index mapping now contains a new field! In this same way the data type of a field might also be affected by introducing a document with data of a different type. So, my advice is be very careful in constructing your requests to avoid surprises.
Fyi, the syntax you posted looks like a bulk request (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html). Those do have some demands on the syntax to clarify what you want to do to which documents. "Index" call sending one document is very unrestricted though.
I came across a few resources regarding REST but I'm not able to understand things clearly. It would help me if someone could explain things with respect to my example below.
I have a table named User
User table Content
id name
1 xxx
The URL I will be calling is /test/1
The result will be in JSON format, eg: { 1: "xxx" }
My understanding so far regarding REST:
Resource - User table content
Representation - table/JSON
State transfer - data in the form of table to JSON.
Please do let me know if my understanding is correct.
Else, please answer the below questions:
What is a Resource in my example?
What is a Representation in my example?
What is a state transfer or when does this happen in my example?
REST is about resource state manipulation through their representations on the top of stateless communication between client and server. It's a protocol independent architectural style but, in practice, it's commonly implemented on the top of the HTTP protocol.
When designing REST over HTTP, URLs are used to locate the resources, HTTP methods are used to express the operations over the resources and representations such as JSON and/or XML documents are used to represent the state of the resource. HTTP headers can be used to exchange some metadata about the request and response while HTTP status code are used to inform the client regarding the status of the operation.
What is a resource in my example?
Understand resource as the concept of a user. Don't think about the table in your database, think about an abstraction of a user with their set of attributes.
What is a representation in my example?
A JSON document can be used to represent the state of a particular resource. A resource can have many representations, such as JSON and/or XML documents, and the client can use content negotiation to request different representations of the same resource.
What is a state transfer or when does this happens in my example?
The state of a given resource can be retrieved and manipulated using representations.
A GET request, for example, allows you to retrieve a representation of the state of a resource, sent in the response payload. A PUT request, for example, allows you to replace the state of a resource with the state defined by the representation enclosed in the request payload.
Example
Consider a user resource with attributes such as id and name stored somehow in your server:
ID: 1
Name: John Doe
These details make the state of the resource.
A URL such as /users/1 can be used to locate the resource in your server.
Requests such as GET, PUT and DELETE can be performed against this URL to retrieve/manipulate the state of the resource using representations, such as JSON and/or XML documents (other representations can be supported according to your needs):
{
"id": 1,
"name": "John Doe"
}
<user>
<id>1</id>
<name>John Doe</name>
</user>
The above shown documents are not the resource itself. They are just a way to represent the resource. which is stored somehow in your server.
If you want to understand REST, you should really start from the source: Fielding's thesis.
What is a Resource in my example?
OK, review of the term:
The key abstraction of information in REST is a resource. Any information that can be named can be a resource: a document or image, a temporal service (e.g. "today's weather in Los Angeles"), a collection of other resources, a non-virtual object (e.g. a person), and so on. In other words, any concept that might be the target of an author's hypertext reference must fit within the definition of a resource. A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time.
In other words, the "resource" is the concept you are talking about. In this case, the user with name xxx. But it could be anything - the table that holds the data about the user with name xxx is also a "resource".
What is a Representation in my example?
Representations are fundamentally byte arrays
A representation is a sequence of bytes, plus representation metadata to describe those bytes. Other commonly used but less precise names for a representation include: document, file, and HTTP message entity, instance, or variant.
So your json document -- more precisely, the utf-8 encoded byte array, is a representation. A given resource might have many representations at any given time.
What is a state transfer or when does this happens in my example?
When the client and server exchange messages; the client server architectural style is the first of the architectural constraints in the REST architectural style.
I have a situation where I have to write a api to create a resource and amongst datafields that I need to accept is a string that is basically contents of a html file. As I see it I have a choice between structuring the entire thing as a json object where this field is a string field with urlencoded html string , and having the Content Type as multipart/form-data where each of the fields and the html string (UTF-8 encoded) is a part in the message.
Not using json is something I am not comfortable with as I feel violating the REST standards in not structuring the content of the entity I am about to create thus there is a loss of information for the consumers as they can't tell immediately looking at my api definition about what data to feed to it. But practically multipart/form-data handles stuff like html file content better and more efficient as I will not have to urlencode it and can control the char-encoding also.
What will be a better approach in current context and upholding RESTful principles ? Also are there other trade-offs i should be aware of ? what about parsing a json with a huge string field (~ 200 Kb)embedded?
EDIT :- I was reading some similar questions on SO and one approach that stood out was the 2-step approach of making a first call with metadata to create the entity and then upload the file as an UPDATE process to the created entity wherein we use multipart/form-data. In that context, I guess , what I am asking is how sound is an approach where I send both metadata and the file in a single api call as multipart data , where each metadata field is actually a part in the multipart message as is the file.
The canonical way to upload files to REST API is using the multipart/form-data. As W3 recommendation guide says:
The content type "multipart/form-data" should be used for submitting
forms that contain files, non-ASCII data, and binary data.
Multipart/form-data has advantages over base64 to represent binary data. Is sticked to REST/Http philosophy, and simplify the develop of API clients.
Returning values from Forms: multipart/form-data
W3 Recommendation guide
The good practice is to use multipart/form-data whenever files are uploaded to the server along with database fields. Do not send a base64 JSON string as the request to your Rest API as it might corrupt the file or degrade the performance of your application.
As far as documenting multipart/form-data Rest API for your consumers is concerned you have to force your API consumers to use the same form fields which you have predefined in your web service.
Returning Values from Forms: multipart/form-data
I started using FormData objects everywhere on the client-side, in lieu of regular form input fields, for dynamic REST posts. FormData is presented in a positive light in various tutorials, so I went with it.
However, down the line, this caused me problems when decoding the form data into my Go structs. FormData objects are sent as "multipart/form-data" (regardless of files being sent) and I believe my decoder in Go didn't convert the raw data back to string form. Eventually my SQL queries were throwing panics, as hex data was being sent in where strings should have been.
So with some adjustment, I could use FormData however I've decided to revert to the simple universal recommendation: Use "multipart/form-data" only for special cases like when sending files. Otherwise, just use regular "application/x-www-form-urlencoded".
A REST API can have arguments in several places:
In the request body - As part of a json body, or other MIME type
In the query string - e.g. /api/resource?p1=v1&p2=v2
As part of the URL-path - e.g. /api/resource/v1/v2
What are the best practices and considerations of choosing between 1 and 2 above?
2 vs 3 is covered here.
What are the best practices and considerations of choosing between 1
and 2 above?
Usually the content body is used for the data that is to be uploaded/downloaded to/from the server and the query parameters are used to specify the exact data requested. For example when you upload a file you specify the name, mime type, etc. in the body but when you fetch list of files you can use the query parameters to filter the list by some property of the files. In general, the query parameters are property of the query not the data.
Of course this is not a strict rule - you can implement it in whatever way you find more appropriate/working for you.
You might also want to check the wikipedia article about query string, especially the first two paragraphs.
I'll assume you are talking about POST/PUT requests. Semantically the request body should contain the data you are posting or patching.
The query string, as part of the URL (a URI), it's there to identify which resource you are posting or patching.
You asked for a best practices, following semantics are mine. Of course using your rules of thumb should work, specially if the web framework you use abstract this into parameters.
You most know:
Some web servers have limits on the length of the URI.
You can send parameters inside the request body with CURL.
Where you send the data shouldn't have effect on debugging.
The following are my rules of thumb...
When to use the body:
When the arguments don't have a flat key:value structure
If the values are not human readable, such as serialized binary data
When you have a very large number of arguments
When to use the query string:
When the arguments are such that you want to see them while debugging
When you want to be able to call them manually while developing the code e.g. with curl
When arguments are common across many web services
When you're already sending a different content-type such as application/octet-stream
Notice you can mix and match - put the the common ones, the ones that should be debugable in the query string, and throw all the rest in the json.
The reasoning I've always used is that because POST, PUT, and PATCH presumably have payloads containing information that customers might consider proprietary, the best practice is to put all payloads for those methods in the request body, and not in the URL parms, because it's very likely that somewhere, somehow, URL text is being logged by your web server and you don't want customer data getting splattered as plain text into your log filesystem.
That potential exposure via the URL isn't an issue for GET or DELETE or any of the other REST operations.
What are REST resources and how do they relate to resource names and resource representations?
I read a few articles on the subject, but they were too abstract and they left me more confused than I was before.
Is the following URL a resource? If it is, what is the name of that resource and what is its representation?
http://api.example.com/users.json?length=2&offset=5
The GET response of the URL should look something like:
[
{
id: 6,
name: "John"
},
{
id: 7,
name: "Jane"
}
]
The reason why articles on REST resources are abstract is because the concept of a REST resource is abstract. It's basically "whatever thing is accessed by the URL you supply". So, in your example, the resource would be the list of two users starting at offset 5 in some bigger list. Note that, how the resource is implemented is a detail you don't care about unless you are the one writing the implementation.
Is the following URL a resource?
The URL is not a resource, it is a label that identifies the resource, it is, if you like, the name of the resource.
The JSON is a representation of the resource.
What’s a Resource?
A resource is anything that’s important enough to be referenced as a
thing in itself. If your users might “want to create a hypertext link
to it, make or refute assertions about it, retrieve or cache a
representation of it, include all or part of it by reference into
another representation, annotate it, or perform other operations on
it”, then you should make it a resource.
Usually, a resource is something that can be stored on a computer and
represented as a stream of bits: a document, a row in a database, or
the result of running an algorithm. A resource may be a physical
object like an apple, or an abstract concept like courage, but (as
we’ll see later) the representations of such resources are bound to be
disappointing. Here are some possible resources:
Version 1.0.3 of the software release
The latest version of the software release
The first weblog entry for October 24, 2006
A road map of Little Rock, Arkansas
Some information about jellyfish
A directory of resources pertaining to jellyfish
The next prime number after 1024
The next five prime numbers after 1024
The sales numbers for Q42004
The relationship between two acquaintances, Alice and Bob
A list of the open bugs in the bug database
The text is from the O'Reilly book "RESTful Web Services".
The URL is never a resource or its name or its representation.
URL just tells where the resource is located and You can invoke GET,POST,PUT,DELETE etc on this URL to invoke the resource.
Data responded back are the resources while the form of the data is its representation.
Let's say Your URL with given GET parameters can output a JSON resource - this is the JSON representation of this resource. While with other flag in the GET it could respond with the same data in XML - that will be another representation of the very same resource.
EDIT: Due to the comments to the OP and to my answer I'm adding another explanations.
Also the resource name is considered to be the 'script name', e.g. in this case it is users.json while this resource name is self describing the resource representation itself - when calling this resource we expect the resource is in JSON, while when calling e.g. users.xml we would expect the data in XML.
When I change the offset parameter in GET the response contains
different data set - is it a new resource or its representation?
When I define which columns are returned in response in GET, is it a different resource or different representation, or?
Well, here the problem and answer are clear - we still call the same URL, the server responses with the data in the same form (still it is JSON), data still contains information about users - just the information itself has changed due to the new offset parameter. So it is obvious that it is still the same resource with the same representation and the same resource name as before.
Second problem could be a little confusing. Though we are calling the same resource, though the resource contains the same data (just with only predefined column set) and though the data is in the same representation it could seem to us as a different resource. But due to the points in the paragraph above it is nor the different resource or different representation. Though the data set contains less information the requesting side (filtering this data set) should be considering this and behave accordingly. So again: it is the same resource with the same resource name and the same resource representation.
REST
This architectural style was defined in the chapter 5 of Roy T. Fielding's dissertation.
REST is about resource state manipulation through their representations on the top of stateless communication between client and server. It's a protocol independent architectural style but, in practice, it's commonly implemented on the top of the HTTP protocol.
Resources
The resource itself is an abstraction and, in the words of the author, a resource can be any information that can be named. The domain entities of an application (e.g. a person, a user, a invoice, a collection of invoices, etc) can be resources. See the following quote from Fielding's dissertation:
5.2.1.1 Resources and Resource Identifiers
The key abstraction of information in REST is a resource. Any information that can be named can be a resource: a document or image, a temporal service (e.g. "today's weather in Los Angeles"), a collection of other resources, a non-virtual object (e.g. a person), and so on. In other words, any concept that might be the target of an author's hypertext reference must fit within the definition of a resource. A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time.
More precisely, a resource R is a temporally varying membership function MR(t), which for time t maps to a set of entities, or values, which are equivalent. The values in the set may be resource representations and/or resource identifiers. [...]
Resource representations
A JSON document is resource representation that allows you to represent the state of a resource. A server can provide different representations for the same resource. For example, using XML and JSON documents. A client can use content negotiation to request different representations of the same resource.
Quoting Fielding's dissertation:
5.2.1.2 Representations
REST components perform actions on a resource by using a representation to capture the current or intended state of that resource and transferring that representation between components. A representation is a sequence of bytes, plus representation metadata to describe those bytes. Other commonly used but less precise names for a representation include: document, file, and HTTP message entity, instance, or variant.
A representation consists of data, metadata describing the data, and, on occasion, metadata to describe the metadata (usually for the purpose of verifying message integrity). Metadata is in the form of name-value pairs, where the name corresponds to a standard that defines the value's structure and semantics. Response messages may include both representation metadata and resource metadata: information about the resource that is not specific to the supplied representation. [...]
Over HTTP, request and response headers can be used to exchange metadata about the representation.
Resource identifiers
A URL a resource identifier that identifies/locates a resource in the server.
This answer may also be insightful.
What are REST resources and how do they relate to resource names and resource representations?
REST doesn't mean a great deal more then you use HTTP verbs (GET, POST, PUT, DELETE, etc) properly.
Is the following URL a resource?
All URLs are strings that tell computers where a resource can be located. (Hence the name: Uniform Resource Locator).
A resource is:
a noun
that is unique
and can be represented as data
and has at least one URI
I go into more detail on my blog post, What, Exactly, Is a RESTful Resource?
Representational State Transfer (REST) is a style of software architecture for distributed systems such as the World Wide Web. REST-style architectures consist of clients and servers. Clients initiate requests to servers; servers process requests and return appropriate responses. Requests and responses are built around the transfer of representations of resources. Resources are a set of addressable objects, basically files and documents, linked using URLs. As correctly pointed out above by Quentin, REST archiecture simply implies that you'd use the HTTP verbs GET/POST/PUT/DELETE...
Conceptually you can think about a resource as everything which is accessible on the web using an URL.
If you stick to this rule http://api.example.com/users.json?length=2&offset=5 can be considered a resource
You've only provided what appear to be relative parameters rather than "ID" which is (or should be) concrete. Remember, get operations should be idempotent (i.e. repeatable with the same outcome).
What is REST?
REST is an architecture Style which stands for Representational(RE) State(S) transfer(T).
What is REST Resource ?
Rest Resource is data on which we want to perform operation(s).So this data can be present in database as record(s) of table(s) or in any other form.This record has unique identifier with which it can be identified like id for Employee.
Now when this data is requested by unique url like http://www.example.com/employees/123,so ultimately data or record which is present in database will be converted to JSON/XML/Plain text format by Rest Service and will be sent to Consumer.
So basically,what is happening here is REPRESENTATIONAL STATE TRANSFER, in a way that state of the data present in database is transferred to another format which can be JSON/XML or plain text.
So in this case 1 employee represents 1 resource which can be accessed by unique url like http://www.example.com/employees/123
In case we want to get list of all resources(employees),we will do:
http://www.example.com/employees
Hope this will help.
REST stands for REpresentational State Transfer. It's a method of transferring variable information from one location to another. A common method of doing this is using JSON - a way of formatting your variables so that they can be transferred without loss of information.
PHP, for example, has built in JSON support. Passing a PHP array to json_encode($array) will output a string in the format you posted (which by the way, is indeed a REST resource, because it gives you variables and information).
In PHP, the array you posted would turn out as:
Array (
[0]=>Array (
['id']=>6;
['name']=>'John';
)
[1]=>Array (
['id']=>7;
['name']=>'Jane';
)
)