What is the RESTful way to return JSON plus a binary file from an API?

I have to implement a REST endpoint that receives start and end dates (among other arguments). It does some computation to generate a result that is a kind of forecast, based on the server state at invocation time and the input data (imagine a weather forecast for the next few days).
Since the endpoint does not alter the system state, I plan to use the GET method and return JSON.
The issue is that the output also includes an image file (a plot). So my idea is to create a unique id for the file and include a URI in the JSON response to be consumed later (I think this is the way suggested by the HATEOAS principle).
My question is: since this image file is a resource that is valid only as part of the response to a single invocation of the original endpoint, I need a way to delete it once it has been consumed.
Would it be RESTful to delete it after serving it via a GET?
or expose it only via a DELETE?
or not delete it on consumption and keep it for some time? (purge should be performed anyway since I can't ensure the client consumes the file).
I would appreciate your ideas.

Would it be RESTful to delete it after serving it via a GET?
Yes.
or expose it only via a DELETE?
Yes.
or not delete it on consumption and keep it for some time?
Yes.
The last of these options (caching) is a decent fit for REST in HTTP, since we have metadata that we can use to communicate to general-purpose components that a given representation has a finite lifetime.
So the response carrying the report (which includes the link to the plot) could be accompanied by an Expires header that informs the client that the representation of the report has an expected shelf life.
You might, therefore, plan to garbage collect the image resource after 10 minutes, and if the client hasn't fetched it before then - poof, gone.
The reason that you might want to keep the image around after you send the response to the GET: the network is unreliable, and the GET message may never reach its destination. Having things in cache saves you the compute of trying to recalculate the image.
If you want confirmation that the client did receive the data, then you must introduce another message to the protocol, for the client to inform you that the image has been downloaded successfully.
It's reasonable to combine these strategies: schedule yourself to evict the image from the cache in some fixed amount of time, but also evict the image immediately if the consumer acknowledges receipt.
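As a concrete illustration, here is a minimal Express sketch of that combined strategy (Node.js is an assumption; renderPlot, the routes, and the ten-minute TTL are all illustrative, not from the question):

const express = require("express");
const crypto = require("crypto");

const app = express();
const imageStore = new Map(); // id -> { buffer, timer }
const TTL_MS = 10 * 60 * 1000; // garbage-collect after 10 minutes

app.get("/forecast", (req, res) => {
  const plot = renderPlot(req.query); // hypothetical: returns a PNG buffer
  const id = crypto.randomUUID();
  const timer = setTimeout(() => imageStore.delete(id), TTL_MS); // scheduled eviction
  imageStore.set(id, { buffer: plot, timer });
  res.set("Expires", new Date(Date.now() + TTL_MS).toUTCString());
  res.json({ forecast: {}, plot: "/plots/" + id });
});

app.get("/plots/:id", (req, res) => {
  const entry = imageStore.get(req.params.id);
  if (!entry) return res.status(404).end(); // expired, acknowledged, or never existed
  res.type("png").send(entry.buffer);
});

// Early eviction: the client acknowledges receipt with a DELETE.
app.delete("/plots/:id", (req, res) => {
  const entry = imageStore.get(req.params.id);
  if (entry) {
    clearTimeout(entry.timer);
    imageStore.delete(req.params.id);
  }
  res.status(204).end();
});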
But REST doesn't make any promises about liveness - you could send a response with a link to the image, but 404 Not Found every attempt to GET it, and that's fine (not useful, of course, but fine). REST doesn't promise that resources have stable representations, or that the resource is somehow eternal.
REST gives us standards for how we request things, and how responses should be interpreted, but we get a lot of freedom in choosing which response is appropriate for any given request.

You could offer a download link in the JSON response to that binary resource that also contains the parameters that are required to generate that resource. Then you can decide yourself when to clean that file up (managing disk space) or cache it - and you can always regenerate it because you still have the parameters. I assume here that the generation doesn't take significant time.
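A hedged sketch of that approach (Express again; renderPlot and the cache policy are assumptions): the link carries the generation parameters, so the server is free to drop the file at any time and rebuild it on the next request.

const express = require("express");
const app = express();
const plotCache = new Map(); // "start:end" -> PNG buffer; evict on your own schedule

// The JSON response links to e.g. /plots?start=2020-01-01&end=2020-01-07
app.get("/plots", (req, res) => {
  const { start, end } = req.query;
  const key = start + ":" + end;
  let plot = plotCache.get(key);
  if (!plot) {
    plot = renderPlot(start, end); // hypothetical: regenerate from the parameters
    plotCache.set(key, plot);
  }
  res.type("png").send(plot);
});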

It's a tricky one. GET requests should typically be repeatable, as an important HTTP feature, in case the original request failed; some clients might rely on that.
It could also be construed as a non-safe operation: a GET resulting in what is effectively a DELETE.
I would be inclined to expire the image after X seconds/minutes instead, perhaps also supporting DELETE at that endpoint if the client got the result and wants to clean up early.

Related

Ethereum: What's a good way to retrieve a large amount of old smartcontract log data from an RPC service for a backfill?

The problem I'm posed with is backfilling a specialized database, using data from the event log of a given smartcontract on an Ethereum blockchain.
The question, however, is how to do so without hitting the limits of eth_getLogs (and, even without such limits, how to keep RPC responses reasonably sized).
What I tried so far
I prefer to use Infura, but they limit this call to 100 entries per response. And rightfully so; querying should be done in small chunks for load balancing etc. Is API pagination + eth_getLogs the right way to collect data for backfills?
Idea 1: eth_getLogs on ranges of blocks
I don't know of any way to paginate eth_getLogs other than querying for ranges of blocks. A block may contain more than 100 events, however, which prevents me from reading all of the data when using Infura. Maybe there is a way to paginate on log index? (100 is something I came across when experimenting, but I can't find documentation on it.)
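For concreteness, a sketch of the block-range approach in JavaScript (ethers.js here is an assumption, since the question itself uses go-ethereum bindings, and the split-on-error heuristic is illustrative):

// Backfill by recursively splitting block ranges whenever the provider
// rejects a range (e.g. for returning too many results).
async function backfill(provider, filter, fromBlock, toBlock, handleLog) {
  try {
    const logs = await provider.getLogs({ ...filter, fromBlock, toBlock });
    logs.forEach(handleLog);
  } catch (err) {
    // Real code should inspect err to distinguish "too many results"
    // from network failures before deciding to split.
    if (fromBlock === toBlock) throw err; // a single block is still over the limit
    const mid = Math.floor((fromBlock + toBlock) / 2);
    await backfill(provider, filter, fromBlock, mid, handleLog);
    await backfill(provider, filter, mid + 1, toBlock, handleLog);
  }
}

// Usage (ethers v5; address, topics and saveLog are placeholders):
// const provider = new ethers.providers.JsonRpcProvider("https://mainnet.infura.io/v3/<project-id>");
// await backfill(provider, { address, topics }, 14000000, 14100000, saveLog);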
Idea 2: log filters
Using a filter RPC call is another option: i.e. start a "watcher" on a range of old blocks. I tried this, but the Infura websocket RPC I am using doesn't seem to give any response, and neither does Ganache when testing locally. Non-archive (i.e. live watching) logs work, so I know that my code is working as intended at least. (My go-ethereum Watch... generated binding call works, but does not produce responses on the output channel when specifying an old block in bind.WatchOpts.Start.)
Does anyone have any suggestions on how to retrieve large amounts of log data? Or a link to other projects that tackled this problem?

REST API - file (i.e. images) processing - best practices

We are developing a server with a REST API that accepts and responds with JSON. The problem is how to upload images from the client to the server.
Note: I am also talking about a use case where the entity (user) can have multiple files (carPhoto, licensePhoto) as well as other properties (name, email, ...), but when you create a new user, you don't send these images; they are added after the registration process.
These are the solutions I am aware of, but each of them has some flaws:
1. Use multipart/form-data instead of JSON
good: POST and PUT requests are as RESTful as possible; they can contain text inputs together with files.
cons: It is not JSON anymore, which is much easier to test, debug, etc. compared to multipart/form-data.
2. Allow to update separate files
A POST request for creating a new user does not allow adding images (which is ok in our use case, as I said at the beginning); uploading pictures is done by a PUT request as multipart/form-data to, for example, /users/4/carPhoto (see the sketch after this list).
good: Everything (except the file upload itself) remains in JSON; it is easy to test and debug (you can log complete JSON requests without being afraid of their length).
cons: It is not intuitive; you can't POST or PUT all variables of the entity at once. Also, an address like /users/4/carPhoto can be considered more of a collection (the standard use case for a REST API looks like /users/4/shipments). Usually you can't (and don't want to) GET/PUT each variable of an entity, for example /users/4/name; you can get the name with GET and change it with PUT at /users/4. If there is something after the id, it is usually another collection, like /users/4/reviews.
3. Use Base64
Send it as JSON but encode files with Base64.
good: Same as the first solution; it is as RESTful a service as possible.
cons: Once again, testing and debugging are a lot worse (the body can have megabytes of data), and there is an increase in size and also in processing time on both client and server.
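For reference, a minimal client-side sketch of solution 2 (browser fetch + FormData, inside an async function; fileInput is an assumed <input type="file"> element, the endpoint is the one from the question):

const form = new FormData();
form.append("carPhoto", fileInput.files[0]); // the chosen file
// fetch sets the multipart/form-data boundary header automatically
await fetch("/users/4/carPhoto", { method: "PUT", body: form });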
I would really like to use solution no. 2, but it has its cons... Can anyone give me better insight into the "best" solution?
My goal is to have RESTful services with as much standards included as possible, while I want to keep it as simple as possible.
OP here (I am answering this question after two years; the post made by Daniel Cerecedo was not bad at the time, but web services are developing very fast).
After three years of full-time software development (with a focus also on software architecture, project management and microservice architecture) I definitely choose the second way (but with one general endpoint) as the best one.
If you have a special endpoint for images, it gives you much more power over handling those images.
We have the same REST API (Node.js) for both mobile apps (iOS/Android) and the frontend (using React). This is 2017, therefore you don't want to store images locally; you want to upload them to some cloud storage (Google Cloud, S3, Cloudinary, ...), and therefore you want some general handling for them.
Our typical flow is that as soon as you select an image, it starts uploading in the background (usually a POST to an /images endpoint), returning the ID after the upload. This is really user-friendly, because the user chooses an image and then typically proceeds with some other fields (i.e. address, name, ...), so by the time he hits the "send" button, the image is usually already uploaded. He does not wait, watching a screen that says "uploading...".
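A sketch of that flow on the client (fileInput, nameField and submitButton are assumed DOM elements; the /images and /users endpoints follow the answer's example):

let imageId = null;

// Start uploading as soon as the user picks a file.
fileInput.addEventListener("change", async () => {
  const form = new FormData();
  form.append("image", fileInput.files[0]);
  const res = await fetch("/images", { method: "POST", body: form });
  imageId = (await res.json()).id; // finishes while the user fills other fields
});

// By the time the user hits "send", the upload is usually done.
submitButton.addEventListener("click", async () => {
  await fetch("/users", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name: nameField.value, photoId: imageId }),
  });
});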
The same goes for getting images. Especially with mobile phones and limited mobile data, you don't want to send original images; you want to send resized images, so they do not take that much bandwidth (and to make your mobile apps faster, you often don't want to resize at all; you want the image that fits perfectly into your view). For this reason, good apps use something like Cloudinary (or we have our own image server for resizing).
Also, if the data is not private, you send back to the app/frontend just a URL and it downloads the image from cloud storage directly, which is a huge saving of bandwidth and processing time for your server. In our bigger apps there are a lot of terabytes downloaded every month; you don't want to handle that directly on each of your REST API servers, which are focused on CRUD operations. You want to handle that in one place (our image server, which has caching etc.) or let cloud services handle all of it.
Small 2023 update: if possible, put a CDN in front of the pictures; it usually saves you a lot of money and makes the pictures even more available (i.e. no issues when traffic peaks happen).
Cons: The only "cons" you should think about are "unassigned images". The user selects an image and continues with filling in other fields, but then he says "nah" and turns off the app or tab, while you have meanwhile successfully uploaded the image. This means you have uploaded an image which is not assigned anywhere.
There are several ways of handling this. The easiest one is "I don't care", which is a relevant one if this is not happening very often, or if you actually want to store every image users send you (for any reason) and you don't want any deletion.
Another one is easy too: have a cron job that, e.g., every week deletes all unassigned images older than one week.
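A sketch of that cleanup job (node-cron and the data-layer helper are assumptions):

const cron = require("node-cron"); // assumed scheduling library

// Every Sunday at 03:00, drop images older than a week
// that were never assigned to any entity.
cron.schedule("0 3 * * 0", async () => {
  const cutoff = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);
  await deleteUnassignedImagesOlderThan(cutoff); // hypothetical data-layer helper
});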
There are several decisions to make:
The first is about the resource path:
Model the image as a resource on its own:
Nested in user (/user/:id/image): the relationship between the user and the image is made implicitly
In the root path (/image):
The client is held responsible for establishing the relationship between the image and the user, or;
If a security context is being provided with the POST request used to create an image, the server can implicitly establish a relationship between the authenticated user and the image.
Embed the image as part of the user
The second decision is about how to represent the image resource:
As a Base64-encoded JSON payload
As a multipart payload
This would be my decision track:
I usually favor design over performance unless there is a strong case for it. It makes the system more maintainable and can be more easily understood by integrators.
So my first thought is to go for a Base64 representation of the image resource because it lets you keep everything JSON. If you choose this option you can model the resource path as you like.
If the relationship between user and image is 1-to-1, I'd favor modeling the image as an attribute, especially if both data sets are updated at the same time. In any other case you can freely choose to model the image either as an attribute, updating it via PUT or PATCH, or as a separate resource.
If you choose the multipart payload, I'd feel compelled to model the image as a resource on its own, so that other resources (in our case, the user resource) are not impacted by the decision to use a binary representation for the image.
Then comes the question: is there any performance impact in choosing base64 vs multipart? We could think that exchanging data in multipart format should be more efficient, but this article shows how little the two representations differ in size.
My choice is Base64:
Consistent design decision
Negligible performance impact
As browsers understand data URIs (base64 encoded images), there is no need to transform these if the client is a browser
I won't cast a vote on whether to have it as an attribute or standalone resource, it depends on your problem domain (which I don't know) and your personal preference.
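A quick Node.js sketch of the Base64 option (the file name and user fields are illustrative):

const fs = require("fs");

const image = fs.readFileSync("carPhoto.jpg");
const user = {
  name: "Jane",
  // A data URI: browsers can assign this straight to an <img> src.
  carPhoto: "data:image/jpeg;base64," + image.toString("base64"),
};
// The whole entity, image included, now travels as ordinary JSON.
console.log(JSON.stringify(user).length, "bytes");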
Your second solution is probably the most correct. You should use the HTTP spec and mimetypes the way they were intended and upload the file via multipart/form-data. As far as handling the relationships, I'd use this process (keeping in mind I know zero about your assumptions or system design):
POST to /users to create the user entity.
POST the image to /images, making sure to return a Location header to where the image can be retrieved per the HTTP spec.
PATCH to /users/4/carPhoto and assign it the ID of the photo given in the Location header of step 2.
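A hedged fetch sketch of those three steps (inside an async function; photoFile is an assumed File/Blob, and the response shapes are illustrative):

// 1. Create the user entity.
const userRes = await fetch("/users", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ name: "Jane" }),
});
const userUrl = userRes.headers.get("Location");

// 2. Upload the image; the server returns its location per the HTTP spec.
const form = new FormData();
form.append("image", photoFile);
const imageRes = await fetch("/images", { method: "POST", body: form });
const imageUrl = imageRes.headers.get("Location");

// 3. Attach the photo to the user.
await fetch(userUrl + "/carPhoto", {
  method: "PATCH",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ image: imageUrl }),
});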
There's no easy solution. Each way has its pros and cons. But the canonical way is using the first option: multipart/form-data. As the W3C recommendation says:
The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.
We aren't really sending forms, but the implicit principle still applies. Using base64 as a binary representation is incorrect because you're using the wrong tool to accomplish your goal; on the other hand, the second option forces your API clients to do more work in order to consume your service. You should do the hard work on the server side in order to supply an easy-to-consume API. The first option is not easy to debug, but once you get it working, it probably never changes.
Using multipart/form-data, you stay within the REST/HTTP philosophy. You can view an answer to a similar question here.
Another option is mixing the alternatives: you can use multipart/form-data, but instead of sending every value separately, you send a value named payload with the JSON payload inside it. (I tried this approach using ASP.NET Web API 2 and it works fine.)
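A sketch of that mixed approach from the client side (inside an async function; field names and photoFile are illustrative):

const form = new FormData();
// One JSON part carrying all scalar fields...
form.append("payload", JSON.stringify({ name: "Jane", email: "jane@example.com" }));
// ...plus the binary parts.
form.append("carPhoto", photoFile);
await fetch("/users", { method: "POST", body: form });
// Server side: parse the "payload" part as JSON and the rest as files.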

Is it worth excluding null fields from a JSON server response in a web application to reduce traffic?

Let's say that the API is well documented and every possible response field is described.
Should a web application's server API exclude null fields in a JSON response to lower the amount of traffic? Is this a good idea at all?
I was trying to calculate the amount of traffic saved for a large app like Twitter, and the numbers are actually quite convincing.
For example: if you exclude a single response field, "someGenericProperty":null, which is 26 bytes, from every single API response, and Twitter reportedly serves 13 billion API requests per day, the traffic reduction will be >300 GB.
More than 300 GB less traffic every day is quite a money saver, isn't it? That's probably the most naive and simplistic calculation ever, but still.
In general, no. The more public the API and the more potential consumers it has, the more invariant the API should be.
Developers getting started with the API are confused when a field shows up sometimes but not at other times. This leads to frustration and ultimately wastes the API owner's time in the form of support requests.
There is no way to know exactly how downstream consumers are using an API. Often, they are not using it just as the API developer imagines. Elements that appear or disappear based on the context can break applications that consume the API. The API developer usually has no way to know when a downstream application has been broken, short of complaints from downstream developers.
When data elements appear or disappear, uncertainty is introduced. Was the data element not sent because the API considered it irrelevant? Or has the API itself changed? Or is some bug in the consumer's code not parsing the response correctly? If the consumer expects a field and it isn't there, how does that get debugged?
On the server side, extra code is needed to strip those fields from the response. What if the logic that strips out data is wrong? It's a chance to inject defects, and it means there is more code that must be maintained.
In many applications, network latency is the dominating factor, not bandwidth. For performance reasons, many API developers will favor a few large request/responses over many small ones. At my last company, the sales and billing systems would routinely exchange messages of 100 KB, 200 KB or more. Sometimes only a few KB of the data was needed, but overall system performance was better than fetching some data, discovering more was needed, and then sending an additional request for that data.
For most applications some inconsistency is more dangerous than superfluous data is wasteful.
As always, there are a million exceptions. I once interviewed for a job at a torpedo maintenance facility. They had underwater sensors on their firing range to track torpedoes. All sensor data were relayed via acoustic modems to a central underwater data collector. Acoustic underwater modems? Yes. At 300 baud, every byte counts.
There are also battery-powered embedded applications where every byte counts, as well as low-frequency RF communication systems.
Another exception is sparse data. For example, imagine a matrix with 4,000,000 rows and 10,000 columns where 99.99% of the values of the matrix are zero. The matrix should be represented with a sparse data structure that does not include the zeros.
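For illustration, a coordinate-list encoding (a generic sketch, not tied to any particular library):

// Dense: mostly zeros, e.g. [[0, 0, 5], [0, 0, 0], [7, 0, 0]]
// Sparse: ship only the non-zero entries plus the dimensions.
const sparse = {
  rows: 3,
  cols: 3,
  entries: [
    { r: 0, c: 2, v: 5 },
    { r: 2, c: 0, v: 7 },
  ],
};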
It definitely depends on the service and the amount of data it provides; you should evaluate the ratio of null to non-null data and set a threshold above which it is worth excluding those elements.
Thanks for sharing; it's an interesting point to me.
The question approaches this from the wrong side: JSON is not the best format for compressing or reducing traffic; something like Google Protocol Buffers or BSON is.
I am carefully re-evaluating nullables in the API schema right now. We use Swagger (OpenAPI), and JSON Schema does not really have something like a nullable type; I think there is a good reason for this.
If you have a JSON response that maps a DB integer field which is suddenly NULL (or can be, according to the DB schema), well, that is indeed ok for a relational DB but not at all healthy for your API.
I suggest adopting a much more elegant approach: make better use of "required" for the response as well.
If the field is optional in the response API schema and it has a null value in the DB, do not return the field.
We have enabled strict schema checks for API responses as well; this gives us much better control of our data and forces us not to rely on states in the API.
For the API client, that of course means doing checks like:
if ("key" in response) {
console.log("Optional key value:" + response[key]);
} else {
console.log("Optional key not found");
}

What is the need for the GET method in PHP, Java or .NET, when POST has so many advantages over GET?

In all these languages there are GET and POST methods for transferring data. POST is more secure than GET, and GET also has limits on the size of the data transferred. So why does every language have a GET method? What are the advantages of the GET method?
GET data is stored in the URL, so a page reached via a GET request can be bookmarked or linked. You just can't do that with POST. Almost every web page uses GET to specify the requested page, even stackoverflow.com.
Note that GET, POST (and PUT, DELETE, etc.) are not methods of the language you program in, but are HTTP protocol methods.
What do you mean by "transfer data"?
If, by this, you mean collecting data from the user in the browser (or another client application) and then sending it to the server to update a database, or to process it in some other way that creates/updates a resource on the server, consider the POST or PUT method instead (depending on whether the action is idempotent or not).
If, however, you mean collecting data from the user and sending it to the server to retrieve information, without updating/creating a resource on the server, then the GET method is appropriate.
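To make the distinction concrete, a small fetch sketch (inside an async function; paths are illustrative):

// GET: the parameters live in the URL, so the result can be
// bookmarked, linked, cached and retried safely.
const results = await fetch("/search?q=rest&page=2");

// POST: the parameters live in the body; the URL alone cannot
// reproduce the request, and retries may repeat the side effect.
await fetch("/users", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ name: "Jane" }),
});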
It's useful for direct linking for the user. You can immediately put the thread number in the address bar on forums, or the video id on YouTube, instead of having to browse the entire site.

Using messaging to do writes as well as reads

I come from a web background where I only have to deal with HTTP so please excuse my ignorance.
I have an app where clients listen for changes on a message queue which uses STOMP. Previously the clients only needed to listen to the relevant channels for messages telling them about changes on the server and update themselves accordingly. Simple stuff.
There is now a requirement for the client to be able to edit data and push those changes back to the server. The data on the server is already exposed via RESTful resources, so my first thought was just to make REST PUT requests to change the data on the server, but then I started to wonder whether I could find a solution using messaging. I could just open up another channel which the clients could publish changes to, and the server could subscribe to that channel and update itself accordingly. Implementing this would obviously be simple, but I would love to have some of the potential pitfalls pointed out to me ahead of time.
I am familiar with REST so I want to ask some questions in the context of REST:
Would I map a group of queues to REST/CRUD verbs for each resource, e.g. itemPostQueue, itemPutQueue, itemDeleteQueue?
What about GETs? How can I request data to read using a queue?
What do I use to replace my status-code mechanism to catch problems? Do I just fire and forget (gulp), or use error/receipt headers in STOMP somehow?
Any answers and advice will be much appreciated.
Regards,
Chris
While I am not clear on why you must use messaging here, a few thoughts:
You could map to REST on the wire with something like itemPostQueue, but this would likely feel unnatural to a message-oriented person. If you are using some kind of queue with guaranteed, deliver-once semantics built in, then go ahead and use that mechanism. For a shopping-cart example, you could put an AddItem message on the wire and trust the infrastructure to deliver it once to the server.
There is no direct GET-like concept in message queuing. You can simulate it with a pair of messages: I send you a request and you send me back a response. This is much like RPC, but even further decoupled. So the client sends a PublishCart request, and later on the server sends a CartContents message on a channel that the client is listening to.
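A sketch of that request/response pair using @stomp/stompjs in the browser (the library choice, destinations, and the reply-to convention are all assumptions; STOMP itself doesn't mandate a reply-to header):

const { Client } = require("@stomp/stompjs");

const client = new Client({ brokerURL: "ws://broker.example:61614/stomp" });

client.onConnect = () => {
  // Subscribe to the "response" channel first...
  client.subscribe("/queue/cart-contents.client-42", (msg) => {
    const cart = JSON.parse(msg.body);
    render(cart); // hypothetical UI update
  });
  // ...then publish the "request", naming where the reply should go.
  // The reply-to header is an application-level convention here.
  client.publish({
    destination: "/queue/publish-cart",
    headers: { "reply-to": "/queue/cart-contents.client-42" },
    body: JSON.stringify({ cartId: 42 }),
  });
};

client.activate();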
Status codes are more complex, and generally fall into two camps. First are the actual queue-library messages - deal with them just as you would any normal system message. Second you may have your own messages you want to put on the wire that signal failure at some place in the chain.
One thing that messaging does do is significantly decouple your app. Unlike HTTP, where you know that something happened, with a queue, you send a letter to somebody. It may get there. The postman might drop it in the snow. The dog might eat it. If you don't get a response in some period of time, you try other means to contact your relatives, or to pull back the analogy, to contact the server. Monitoring of the health of the queue infrastructure and depth of queues and the like take on added importance, as they are the plumbing that you are now depending upon.
Good Luck