Is it worth excluding null fields from a JSON server response in a web application to reduce traffic?

Let's say that the API is well documented and every possible response field is described.
Should a web application's server API exclude null fields from a JSON response to lower the amount of traffic? Is this a good idea at all?
I was trying to calculate the amount of traffic reduced for a large app like Twitter, and the numbers are actually quite convincing.
For example: if you exclude a single response field, "someGenericProperty":null, which is 26 bytes, from every single API response, and Twitter reportedly handles 13 billion API requests per day, the traffic reduction will be over 300 GB per day.
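(Back-of-the-envelope: 26 bytes × 13 × 10^9 requests ≈ 3.4 × 10^11 bytes, i.e. roughly 338 GB per day, before any compression is considered.)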
More than 300 GB less traffic every day is quite a money saver, isn't it? That's probably the most naive and simplistic calculation ever, but still.

In general, no. The more public the API and the more potential consumers of the API, the more invariant the API should be.
Developers getting started with the API are confused when a field shows up sometimes, but not other times. This leads to frustration and ultimately wastes the API owner's time in the form of support requests.
There is no way to know exactly how downstream consumers are using an API. Often, they are not using it just as the API developer imagines. Elements that appear or disappear based on the context can break applications that consume the API. The API developer usually has no way to know when a downstream application has been broken, short of complaints from downstream developers.
When data elements appear or disappear, uncertainty is introduced. Was the data element not sent because the API considered it irrelevant? Or has the API itself changed? Or is some bug in the consumer's code failing to parse the response correctly? If the consumer expects a field and it isn't there, how does that get debugged?
On the server side, extra code is needed to strip those fields out of the response. What if the logic that strips out data is wrong? It's a chance to inject defects, and it means there is more code that must be maintained.
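For illustration only, here is a minimal sketch (my example, not the answer's) of the kind of filtering code this implies, assuming a TypeScript/JavaScript serialization layer; it is exactly the sort of extra logic where a subtle bug can creep in:
// Hypothetical helper: drop null/undefined fields before serializing a response.
// Easy to get wrong: recursion, arrays, and "keep empty string / 0 / false" rules.
function stripNulls(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(stripNulls);
  }
  if (value !== null && typeof value === "object") {
    const result: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      if (inner !== null && inner !== undefined) {
        result[key] = stripNulls(inner);
      }
    }
    return result;
  }
  return value;
}
const body = JSON.stringify(stripNulls({ id: 42, someGenericProperty: null }));
// body === '{"id":42}'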
In many applications, network latency is the dominating factor, not bandwidth. For performance reasons, many API developers will favor a few large request/responses over many small ones. At my last company, the sales and billing systems would routinely exchange messages of 100 KB, 200 KB or more. Sometimes only a few KB of the data was needed, but overall system performance was better than fetching some data, discovering more was needed, and then sending an additional request for that data.
For most applications some inconsistency is more dangerous than superfluous data is wasteful.
As always, there are a million exceptions. I once interviewed for a job at a torpedo maintenance facility. They had underwater sensors on their firing range to track torpedoes. All sensor data were relayed via acoustic modems to a central underwater data collector. Acoustic underwater modems? Yes. At 300 baud, every byte counts.
There are battery-powered embedded applications where every byte counts, as well as low-frequency RF communication systems.
Another exception is sparse data. For example, imagine a matrix with 4,000,000 rows and 10,000 columns where 99.99% of the values of the matrix are zero. The matrix should be represented with a sparse data structure that does not include the zeros.
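As a rough sketch (my own example, assuming TypeScript and a simple "row,col" key scheme), a sparse representation stores only the non-zero cells:
// Sparse matrix sketch: store only the non-zero cells of a 4,000,000 x 10,000 matrix.
const nonZero = new Map<string, number>();
function set(row: number, col: number, value: number): void {
  if (value === 0) {
    nonZero.delete(`${row},${col}`); // zeros are implicit, never stored
  } else {
    nonZero.set(`${row},${col}`, value);
  }
}
function get(row: number, col: number): number {
  return nonZero.get(`${row},${col}`) ?? 0; // missing entries read as zero
}
set(1234567, 42, 3.14);
console.log(get(1234567, 42)); // 3.14
console.log(get(0, 0));        // 0 -- not stored, so it costs no memory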

It definitely depends on the service and the amount of data it provides; you should evaluate the ratio of null to non-null data and set a threshold above which it is worth excluding those elements.
Thanks for sharing; it's an interesting point to me.

The question comes at this from the wrong side: JSON is not the best format for compressing or reducing traffic; something like Google Protocol Buffers or BSON is.
I am carefully re-evaluating nullables in the API schema right now. We use Swagger (OpenAPI), and JSON Schema does not really have something like a nullable type, and I think there is a good reason for this.
If you have a JSON response that maps a DB integer field which is suddenly NULL (or can be, according to the DB schema), that is indeed fine for a relational DB but not at all healthy for your API.
I suggest adopting and following a much more elegant approach: make better use of "required", also for the response.
If a field is optional in the response API schema and it has a null value in the DB, do not return the field.
We have enabled strict schema checks for the API responses as well, and this gives us much better control of our data and forces us not to rely on states in the API.
For the API client that of course means doing checks like:
if ("key" in response) {
console.log("Optional key value:" + response[key]);
} else {
console.log("Optional key not found");
}

Related

Best strategy to script a web journey in JMeter with too many JSON submits in requests

I am in the process of developing a JMeter script for a web journey with close to 30 transactions. I observe that about 20 requests (API calls) submit a heavy JSON payload (with lots of fields), and this JSON structure varies significantly with key data (account number); correlating hundreds of fields in each of these JSON requests doesn't look like an efficient way of scripting. I did try submitting predefined JSON files in the requests, but that way I would need at least 10 JSON files (one per request) for each account number, and considering I am looking to test around 200 account numbers, this also doesn't look like a reasonable approach. Can someone please suggest pointers for approaching this? Should I test the APIs independently?
There is only one strategy: your load test must represent real-life application usage with 100% accuracy, so I would go for separate API testing only in two cases:
- As a last resort, if for some reason a realistic workload is not possible
- When one particular API is known to be a bottleneck and you need to repeat the test focusing on that one endpoint only
With regards to correlation, there are some semi- and fully-automated correlation solutions, like:
- Correlations Recorder Plugin for JMeter, where you can specify the correlation rules prior to recording and all matching values will be automatically replaced with the respective extractors/variables
- BlazeMeter Proxy Recorder, which is capable of exporting recorded scripts in "SmartJMX" mode with automatic detection and correlation of dynamic elements
If the next request can be "constructed" from the previous response, but with some transformation of the JSON structure, you can amend the response and convert it to the required form using JSR223 Test Elements; see the Apache Groovy - Parsing and Producing JSON article for more information.
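To illustrate the general idea only (sketched in TypeScript rather than the Groovy you would actually write inside a JSR223 element, and with made-up field names), the transformation from a previous response to the next request body might look like:
// Hypothetical shapes: the previous response carries account objects,
// while the next request only needs a flat list of account numbers.
interface PreviousResponse { accounts: { accountNumber: string; balance: number }[]; }
function buildNextPayload(previous: PreviousResponse): string {
  const payload = {
    accountNumbers: previous.accounts.map(a => a.accountNumber), // keep only what the next call needs
    requestedAt: new Date().toISOString(),
  };
  return JSON.stringify(payload);
}
console.log(buildNextPayload({ accounts: [{ accountNumber: "ACC-1", balance: 10 }] }));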

What is the RESTful way to return a JSON + binary file in an API

I have to implement a REST endpoint that receives start and end dates (among other arguments). It does some computations to generate a result that is a kind of forecast, according to the server state at the time of invocation and the input data (imagine a weather forecast for the next few days).
Since the endpoint does not alter the system state, I plan to use the GET method and return JSON.
The issue is that the output also includes an image file (a plot). So my idea is to create a unique id for the file and include a URI in the JSON response to be consumed later (I think this is the way suggested by the HATEOAS principle).
My question is: since this image file is a resource that is valid only as part of the response to a single invocation of the original endpoint, I need a way to delete it once it has been consumed.
Would it be RESTful to delete it after serving it via a GET?
or expose it only via a DELETE?
or not delete it on consumption and keep it for some time? (A purge should be performed anyway, since I can't ensure the client consumes the file.)
I would appreciate your ideas.
Would it be RESTful to delete it after serving it via a GET?
Yes.
or expose it only via a DELETE?
Yes.
or not delete it on consumption and keep it for some time?
Yes.
The last of these options (caching) is a decent fit for REST in HTTP, since we have metadata that we can use to tell general-purpose components that a given representation has a finite lifetime.
So the representation of the report (which includes the link to the plot) could be accompanied by an Expires header that informs the client that it has a limited shelf life.
You might, therefore, plan to garbage collect the image resource after 10 minutes, and if the client hasn't fetched it before then - poof, gone.
The reason that you might want to keep the image around after you send the response to the GET: the network is unreliable, and the GET message may never reach its destination. Having things in cache saves you the compute of trying to recalculate the image.
If you want confirmation that the client did receive the data, then you must introduce another message to the protocol, for the client to inform you that the image has been downloaded successfully.
It's reasonable to combine these strategies: schedule yourself to evict the image from the cache in some fixed amount of time, but also evict the image immediately if the consumer acknowledges receipt.
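A minimal sketch of that combination, assuming an in-memory store and invented function names:
// Sketch: keep each generated plot in memory, evict after 10 minutes or as soon as
// the consumer acknowledges receipt (e.g. via a DELETE on the plot's URI).
const plots = new Map<string, Uint8Array>();
const PLOT_LIFETIME_MS = 10 * 60 * 1000; // should match the shelf life advertised in the Expires header
function storePlot(id: string, png: Uint8Array): void {
  plots.set(id, png);
  // Scheduled eviction: even if the client never fetches or acknowledges, the plot goes away.
  setTimeout(() => plots.delete(id), PLOT_LIFETIME_MS);
}
function acknowledgeDownload(id: string): void {
  // Early eviction once the consumer confirms it has the bytes.
  plots.delete(id);
}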
But REST doesn't make any promises about liveness - you could send a response with a link to the image, but 404 Not Found every attempt to GET it, and that's fine (not useful, of course, but fine). REST doesn't promise that resources have stable representations, or that the resource is somehow eternal.
REST gives us standards for how we request things, and how responses should be interpreted, but we get a lot of freedom in choosing which response is appropriate for any given request.
You could offer a download link in the JSON response to that binary resource which also contains the parameters required to generate it. Then you can decide yourself when to clean that file up (managing disk space) or cache it, and you can always regenerate it because you still have the parameters. I assume here that the generation doesn't take significant time.
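For example (a sketch with invented field names), the response could carry a plot link that embeds the generation parameters, so the image can be regenerated on demand:
// Hypothetical response shape: the plot URI repeats the forecast parameters,
// so the server can rebuild the image even after it has been cleaned up.
interface ForecastResponse {
  forecast: number[];
  plot: string;
}
const response: ForecastResponse = {
  forecast: [12.3, 14.1, 13.8],
  plot: "/plots?start=2024-05-01&end=2024-05-04",
};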
It's a tricky one. Typically GET requests should be repeatable as an important HTTP feature, in case the original failed. Some people might rely on that.
It could also be construed as a 'non-safe' operation: a GET resulting in what is effectively a DELETE.
I would be inclined to expire the image after X seconds/minutes instead, perhaps also supporting DELETE at that endpoint if the client got the result and wants to clean up early.

Can we use MySQL with Ethereum?

We are thinking of building a dapp for finding salons near the user and allowing bookings. In such an application, end users have to be shown the salons within a certain distance of their current location. Where would we query that data from? I don't think that kind of querying is possible in Solidity. Do we need to bring in an RDBMS in such a scenario to store the salon data so that we can query it easily, while the booking information is sent to the blockchain?
Is a hybrid application the only way out? People say IPFS should be used for storing images, videos, and other data. Is that the solution? If yes, how would we query it from there? Moreover, would it be fast enough?
TL;DR: you might, but you shouldn't.
The real question here is what do you need ethereum for in this project?
In Ethereum, each write operation is costly, whereas reading data isn't. Write operations are transactions; reads are calls.
This means that "uploading" your salon list will cost you money (i.e. gas), and so will each data update (opening hours, bookings, ...).
As for the MySQL-specific part of your question: Ethereum is simply not designed for that kind of operation.
Something with an oracle might do the trick, but that is really not what it is designed for. Think of Ethereum as a way to intermediate transactions between peers that stores every transaction publicly and permanently.
According to the Wikipedia page, blockchains are basically a "continuously growing list of records". Ethereum adds the possibility of having workers run some code in exchange for gas.
This code's (or smart contract's) only purpose is "to facilitate, verify, or enforce the negotiation or performance of a contract" (here, contract means a legally binding contract).
From what you described, IMO a simple web application with "standard" SQL is more than enough.
You just have to store the salons' GPS coordinates and do the closest match(es) from the user's GPS coordinates.
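As a sketch of that closest-match step (plain TypeScript using the haversine formula; in a real application you would likely push this into a spatial index or a SQL geo function):
// Great-circle distance in kilometres between two GPS coordinates (haversine formula).
function distanceKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 6371 * 2 * Math.asin(Math.sqrt(a)); // 6371 km = mean Earth radius
}
interface Salon { name: string; lat: number; lon: number; }
// Return the salons within maxKm of the user, nearest first.
function salonsNear(user: { lat: number; lon: number }, salons: Salon[], maxKm: number): Salon[] {
  return salons
    .map(s => ({ salon: s, km: distanceKm(user.lat, user.lon, s.lat, s.lon) }))
    .filter(e => e.km <= maxKm)   // only salons within the requested radius
    .sort((a, b) => a.km - b.km)  // nearest first
    .map(e => e.salon);
}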
You likely want to separate your application into two parts:
- The part which shows the available booking times
- The part which makes a new booking

How to send a very large array to the client browser?

I have a very large array (20 million numbers, the output of a SQL query) in my MVC application, and I need to send it to the client browser (it will be visualized on a map using WebGL, and the user is supposed to play with the data locally). What is the best approach to send the data? (Please just do not suggest this is a bad idea! I am looking for an answer to this specific question, not alternative suggestions.)
This is my current code (called using AJAX), but when the array size goes above 3 million I receive an OutOfMemoryException. It seems the serialization (StringBuilder?) fails.
// Load all points from the database (roughly 20 million doubles).
List<double> results = DomainModel.GetPoints();
// Build a JsonResult; the whole list is serialized into one big string when the result executes.
JsonResult result = Json(results, JsonRequestBehavior.AllowGet);
// Raising the serializer's length limit does not reduce the memory needed to build that string.
result.MaxJsonLength = Int32.MaxValue;
return result;
I do not have much experience with web programming/javascript/MVC. I have been researching for the past 24 hours but did not get anywhere, so I need a hint/sample code to continue my research.
NO, NO, NO, you do not send that much information to the browser:
it results in huge memory usage that will most likely crash the web browser (and in fact in your case it already does)
it takes a large amount of time to retrieve; not everyone has a good internet connection, and even good connections can fluctuate over time
If you're building a map tool, then I'd recommend splitting the map into tiles and sending only the data corresponding to the portion of the map the user is currently working on. Also for larger zooms you can filter out data, as surely you can't place it all on the map.
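A rough sketch of what the tiled approach could look like on the client (the /points endpoint and its bounding-box parameters are invented for illustration):
// Fetch only the points inside the currently visible bounding box.
async function loadVisiblePoints(
  bbox: { west: number; south: number; east: number; north: number }
): Promise<Float64Array> {
  const query = `west=${bbox.west}&south=${bbox.south}&east=${bbox.east}&north=${bbox.north}`;
  const response = await fetch(`/points?${query}`); // hypothetical endpoint
  const buffer = await response.arrayBuffer();      // binary payload is far more compact than JSON
  return new Float64Array(buffer);                  // ready to hand to WebGL
}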
Edit: Another alternative, of sorts, would be to ask your users to use machines with at least 16 GB of RAM, or whatever RAM size is needed to deal with your huge data.

Is it a good idea to collect data from devices into a message queue?

Let's say you have thousands of devices all sending in data all the time; would a message queue be a good data collection tool for this data?
Obviously, it depends:
- Do you need to process every set of data, regardless of how old it is?
- Will data arrive in a steady stream, or be bursty?
- Can a single application process all the data, or will it need to be load balanced?
- Do you have any need for the "messaging" features, such as topics?
- If your devices would be the clients, is there a client implementation that works on them?
If you have a stream of data that can be processed by a single application, and you are tolerant of occasional lost data, I would keep it simple and post the data via REST or equivalent. I would only look to messaging once you need scalability, durability, fault tolerance, or the ability to level out load over time.
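As a sketch of that "keep it simple" option (the /readings endpoint and payload shape are invented), a device or gateway could simply POST each batch of readings:
// Minimal "just use REST" option: one POST per batch of readings, no broker involved.
async function postReadings(deviceId: string, readings: number[]): Promise<void> {
  const response = await fetch("https://example.com/readings", { // hypothetical collection endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ deviceId, readings, sentAt: Date.now() }),
  });
  if (!response.ok) {
    // With plain REST you accept that occasional batches may be lost (the trade-off noted above).
    console.warn(`Upload failed for ${deviceId}: ${response.status}`);
  }
}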
You can't go wrong design-wise with employing queues, but as Chris (the other responder) stated, it may not be worth your effort in terms of infrastructure, as web servers are pretty good at handling reasonable load.
In the "real world" I have seen commercial instruments report status into a queue for processing, so it is certainly a valid solution.