Architectural Decision: How to structure a big JSON response

I'm working on an app that will generate a potentially very big JSON response; in my tests this was about 8,000 rows. This is because it's an aggregation of a year's worth of data, and the details are required for display in the UI.
For example:
"voice1": {
"sum": 24000,
"items": [
{
"price": 2000,
"description": "desc1",
"date": "2021-11-01T00:00:00.000Z",
"info": {
"Id": "85fda619bbdc40369502ec3f792ae644",
"address": "add2",
"images": {
"icon": "img.png",
"banner": null
}
}
},
{
"price": 2000,
"description": "desc1",
"date": "2021-11-01T00:00:00.000Z",
"info": {
"Id": "85fda619bbdc40369502ec3f792ae644",
"address": "add2",
"images": {
"icon": "img.png",
"banner": null
}
}
}
]
},
The point is that I can potentially have 10 voices, and dozens and dozens of items for each.
I was wondering if you can point me to some best practices, or if you have some tips, because I've got the feeling this can be done better.

It sounds like you are finding out that JSON is a rather verbose format (not as bad as XML, but still very verbose). If you are worried about the size of messages between server and client, you have a few options:
JSON compresses rather well: you can see how most tokens repeat many times. So make sure to gzip (or Snappy-compress) responses before sending them to clients. This will drastically reduce the size, at some CPU cost for deflating/inflating.
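For illustration, a minimal sketch of the compression step using java.util.zip from the JDK (the payload here is an abbreviated stand-in for the real response):

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    public static void main(String[] args) throws Exception {
        // Build a repetitive JSON payload, like the aggregated response above.
        String json = "{\"items\":["
                + "{\"price\":2000,\"description\":\"desc1\"},".repeat(99)
                + "{\"price\":2000,\"description\":\"desc1\"}]}";

        // Gzip it; repeated keys and values compress extremely well.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("raw: " + json.length() + " bytes, gzipped: " + buf.size() + " bytes");
    }
}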
The other alternative is not to use JSON for transfer, but a more optimized format. One of the best options here is FlatBuffers. It does require you to provide schemas for the data you are sending, but it is an optimized binary format with minimal overhead. It can also drastically speed up your application because it removes the need for serialization/deserialization, which takes significant time with JSON. Another popular, but slightly slower, alternative is Protobuf.
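As a rough sketch of what the Protobuf route could look like for the data above (the schema below is hypothetical, and the Voice/Item classes would be generated from it by protoc, not written by hand):

// Hypothetical schema, mirroring the JSON example:
//   message Item  { int32 price = 1; string description = 2; string date = 3; }
//   message Voice { int64 sum = 1; repeated Item items = 2; }
// After compiling with protoc --java_out, usage looks roughly like:
Voice voice = Voice.newBuilder()
        .setSum(24000)
        .addItems(Item.newBuilder()
                .setPrice(2000)
                .setDescription("desc1")
                .setDate("2021-11-01T00:00:00.000Z"))
        .build();
byte[] wire = voice.toByteArray();     // compact binary instead of JSON text
Voice parsed = Voice.parseFrom(wire);  // schema-driven decode, no JSON parsing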

The only thing immediately obvious to me is that you would likely want to make a list of voices (like you have for items) rather than voice1, voice2, etc.
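For example, a sketch of the reshaped payload (the "name" field is just one way to preserve the old keys):

"voices": [
  { "name": "voice1", "sum": 24000, "items": [ ... ] },
  { "name": "voice2", "sum": ..., "items": [ ... ] }
]

This also lets clients iterate over the voices without knowing in advance how many there are.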
Beyond that, it really just depends on the structure of the data you start with (to create the JSON) and the structure of the data or code at the destination (and possibly also the method of transferring data, if size is a concern). If you're doing a significant amount of processing on either end to encode/decode the JSON, that can suggest there's a simpler way to structure the data. Can you share some additional context or examples of the overall process?

Related

What is the best practice in REST-Api, to pass structured data or key-value pair?

I have a data structure similar to the one given below, which I am supposed to process. I am designing an API which should accept a POST request similar to the one given below (ignore the headers, etc.):
{
  "Name": "Johny English",
  "Id": "534dsf",
  "Message": [
    {
      "Header": "Country of origin",
      "Value": "England"
    },
    {
      "Header": "Nature of work",
      "Value": "Secret Agent/Spy"
    }
  ]
}
Somehow I don't feel it's a correct way to pass/accept data. Here I am talking about structured data vs. key-value pairs. While I can extract the fields ("Name", "Id") directly into object attributes, for key-value pairs I need to loop through the collection and compare against strings (e.g. "Nature of work") to extract values.
I searched a few sites looking for best practices but could not reach any conclusion. Are there any best practices, suggestions, etc.?
I don't think you are going to find any firm, evidence-based arguments against including a list of key-value pairs in your message schema. But that's the sort of thing to look for - people writing about message schema design, how to design messages to support change, and so on.
As a practical matter, there's not a whole lot of difference between
{
  "Name": "Johny English",
  "Id": "534dsf",
  "Message": [
    {
      "Header": "Country of origin",
      "Value": "England"
    },
    {
      "Header": "Nature of work",
      "Value": "Secret Agent/Spy"
    }
  ]
}
or
{
  "Name": "Johny English",
  "Id": "534dsf",
  "Message": {
    "Country of origin": "England",
    "Nature of work": "Secret Agent/Spy"
  }
}
In the early days of the world wide web, "everything" was key-value pairs, because it was easy to describe a collection of key-value pairs in such a way that a general-purpose component, like a web browser, could work with it (i.e., definitions of HTML forms). It got the job done.
It's usually good to structure your response data the same as what you'd expect the input of the corresponding POST, PUT, and PATCH endpoints to be. This allows record alteration without requiring the consuming entity to transform the data first. So in that context, arrays of objects with "name"/"value" fields are much easier to write input validation for.
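If you do keep the key-value list shape, the looping can at least be centralized. A minimal sketch (class names are illustrative) that parses the payload with Jackson and flattens the list into a map, so consumers don't compare strings at every use site:

import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MessageParsing {
    record Pair(@JsonProperty("Header") String header,
                @JsonProperty("Value") String value) {}
    record Payload(@JsonProperty("Name") String name,
                   @JsonProperty("Id") String id,
                   @JsonProperty("Message") List<Pair> message) {}

    public static void main(String[] args) throws Exception {
        String json = """
                {"Name":"Johny English","Id":"534dsf",
                 "Message":[{"Header":"Country of origin","Value":"England"},
                            {"Header":"Nature of work","Value":"Secret Agent/Spy"}]}""";
        Payload p = new ObjectMapper().readValue(json, Payload.class);

        // Flatten the pair list into a map for direct lookup.
        Map<String, String> fields = p.message().stream()
                .collect(Collectors.toMap(Pair::header, Pair::value));
        System.out.println(fields.get("Nature of work")); // Secret Agent/Spy
    }
}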

Structuring json data in GET call query parameters

I'm trying to pass a list of the following objects as query params to a GET call to my Java service:
{
  "id": "123456",
  "country": "US",
  "locale": "en_us"
}
As a URL, this would look like:
GET endpoint.com/entity?id1=123456&country1=US&locale1=en_us&id2=...
What's the best way to handle this as a service? If I'm passing potentially 15 of these objects, is there a concise way to take in these parameters and convert them to Java objects on the server side?
I imagine with a URL like this, the service controller would have a lot of @QueryParam annotations...
Create the entire dataset as a JSON array, e.g.:
[
  {
    "id": "123456",
    "country": "US",
    "locale": "en_us"
  },
  {
    "id": "7890",
    "country": "UK",
    "locale": "en_gb"
  }
]
Base64-encode it and pass it as a single parameter, e.g.:
GET endpoint.com/entity?set=BASE64_ENCODED_DATASET
then decode it on the server and parse the JSON array into Java objects, perhaps using Spring Boot.
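A minimal sketch of the decode-and-bind step (Jackson for parsing; the Entity record and parameter name are illustrative):

import com.fasterxml.jackson.databind.ObjectMapper;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;

public class SetParam {
    record Entity(String id, String country, String locale) {}

    static List<Entity> decode(String setParam) throws Exception {
        // Reverse the client's Base64 step (the URL-safe alphabet also
        // sidesteps the +/= issue discussed below), then bind the array.
        byte[] jsonBytes = Base64.getUrlDecoder().decode(setParam);
        String json = new String(jsonBytes, StandardCharsets.UTF_8);
        return List.of(new ObjectMapper().readValue(json, Entity[].class));
    }
}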
Based on the comment about valid URL sizes (although ~2,000 characters is usable), you could put the data in a header instead; header limits range from roughly 8 to 16 KB depending on the server. GETting multiple resources at once is going to involve a compromise somewhere in the design.
As Base64 output can contain +, / and =, you can URL-encode it too, although I haven't found the need to do this in practice when using this technique in SAML.
Another approach would be to compromise by searching via country- and locale-specific IDs:
GET endpoint.com/entity/{country}/{locale}/{id_csv}
so you would search like this:
GET endpoint.com/entity/US/en_us/123456,0349,23421
Your backend (if using Spring) binds {country} and {locale} with @PathVariable and splits {id_csv} to get the list of IDs for that country/locale combination.
To get another country/locale search:
GET endpoint.com/entity/UK/en_gb/7890,234,123232
URLs are much smaller but you can't query the entire dataset in one go as you need to query based on country/locale each time.
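A minimal sketch of that controller with Spring Web (the handler body is a placeholder; the lookup itself is up to you):

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import java.util.Arrays;
import java.util.List;

@RestController
public class EntityController {

    @GetMapping("/entity/{country}/{locale}/{idCsv}")
    public List<String> find(@PathVariable String country,
                             @PathVariable String locale,
                             @PathVariable String idCsv) {
        // Split the comma-separated segment back into individual IDs.
        List<String> ids = Arrays.asList(idCsv.split(","));
        // ... look up entities for this country/locale and these IDs ...
        return ids; // placeholder: echoes the parsed IDs
    }
}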
It looks like your GET is fetching multiple resources from the server. I'd consider refactoring so that each GET request fetches one resource. If this causes performance issues, consider using HTTP caching.

Storage Optimisation: JSON vs String with delimiters

The below JSON file costs 163 bytes to store.
{
  "locations": [
    {
      "station": 6,
      "category": 1034,
      "type": 5
    },
    {
      "station": 3,
      "category": 1171,
      "type": 7
    }
  ]
}
But if the values are put together as a string with the delimiters ',' and '_', the same data as 6_1034_5,3_1171_7 costs only 17 bytes.
What are the problems with this approach?
Thank you.
The problems that I have seen with this sort of approach are mainly centered around maintainability.
With the delimited approach, the properties of your location items are identified by ordinal position. Since they are all numbers, there is nothing to tell you whether the first segment is the station, the category, or the type; you must know that in advance. Someone new to your code base may not know that and may therefore introduce bugs.
Right now all of your data are integers, which are relatively easy to encode and decode and do not risk conflicting with your delimiters. However, if you need to add user-supplied text at some point, you run the risk of that text containing your delimiters. In that case, you will have to invent an escaping/encoding mechanism to ensure that you can reliably detect your delimiters. This may seem simple, but it is more difficult than you may suspect. I've seen it done incorrectly many times.
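Both hazards are easy to see in code. A minimal sketch of the decoder (the field order here is exactly the kind of tribal knowledge described above):

public class DelimitedDecode {
    record Location(int station, int category, int type) {}

    public static void main(String[] args) {
        String packed = "6_1034_5,3_1171_7";
        for (String item : packed.split(",")) {
            String[] f = item.split("_");
            // Nothing in the data says which segment is which; this
            // ordinal mapping must be known in advance.
            Location loc = new Location(
                    Integer.parseInt(f[0]),
                    Integer.parseInt(f[1]),
                    Integer.parseInt(f[2]));
            System.out.println(loc);
        }
        // A future user-supplied field containing ',' or '_' would corrupt
        // the framing above unless an escaping scheme is invented.
    }
}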
Using a well-known structured text format like XML or JSON has the advantages that it has fully developed and tested rules for dealing with all types of text, and there are fully developed and tested libraries for reading and writing it.
Depending on your circumstances, this concern over the amount of storage could be a micro-optimization. You might want to try some capacity calculations (e.g., how much actual storage is required for X items) and compare that to the expected number of items vs. the expected amount of storage that will be available.
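For a rough sense of scale from the example above: the JSON costs about 82 bytes per location versus about 9 bytes delimited, so 100,000 locations would need roughly 8 MB versus 0.9 MB of raw storage, and gzip typically shrinks the repetitive JSON severalfold, narrowing the gap further. Whether that difference matters depends entirely on your expected item counts and available storage.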

Should we use enums in web json result

For example, I have a web API which returns a JSON HTTP response body. The fields in the JSON are meaningful, but the question is: should I use a string to describe them, or an int enum?
Example A:
{
  "user_id": 123,
  "sex": "male",
  "status": "active"
}
Example B:
{
  "user_id": 123,
  "sex": 1,
  "status": 1
}
Which is better, and why? Maybe Example B can save some network traffic?
This depends on a couple of things, mainly: how many times these values appear in the JSON you're sending, and whether you are using compression.
If you're compressing the data you're sending using gzip or something similar, then the difference will be negligible.
The best way to find out is to try both approaches for your use case and meter the data usage, and see which one works better for you.
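If you do measure both, note that either wire form can be produced from the same Java enum. A minimal sketch with Jackson (the enum and its codes are illustrative); @JsonValue selects what gets written:

import com.fasterxml.jackson.annotation.JsonValue;
import com.fasterxml.jackson.databind.ObjectMapper;

public class EnumWire {
    enum Status {
        ACTIVE(1), SUSPENDED(2);

        private final int code;
        Status(int code) { this.code = code; }

        // With @JsonValue here, Status serializes as 1 or 2 (Example B).
        // Annotate a method returning a string instead to get Example A.
        @JsonValue
        int code() { return code; }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new ObjectMapper().writeValueAsString(Status.ACTIVE)); // 1
    }
}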

Looking for configuration DSL that allows sparse specification and outputs JSON

Background: We have all seen several ways to configure a distributed application. For my purposes, two of them stand out:
Have a massive database that all nodes have access to. Each node knows its own identity, and so can perform queries against said database to pull out the configuration information specific to itself.
Use tailored (i.e., specific to each node) configuration files (e.g., JSON) so that the nodes do not have to touch a database at all. They simply read the tailored config file and do what it says.
There are pros and cons to each. For my purposes, I would like to explore #2 a little further, but the problem I'm running into is that the JSON files can get pretty big. I'm wondering if anyone knows of a DSL that is well suited to generating these JSON files.
Step-by-step examples to illustrate what I mean:
Suppose I make up this metalanguage that looks like this:
bike.%[0..3](Vehicle)
This would then output the following JSON:
{
  "bike.0": {
    "type": "Vehicle"
  },
  "bike.1": {
    "type": "Vehicle"
  },
  "bike.2": {
    "type": "Vehicle"
  },
  "bike.3": {
    "type": "Vehicle"
  }
}
The idea is that we've just created 4 bikes, each of which is of type Vehicle.
Going further:
bike[i=0..3](Vehicle)
label: "hello %i"
label.3: "last"
Now what this does is name the index variable 'i' so that it can be used in the configuration information of each item. The JSON output would be something like this:
{
  "bike.0": {
    "type": "Vehicle",
    "cfg": {
      "label": "hello 0"
    }
  },
  "bike.1": {
    "type": "Vehicle",
    "cfg": {
      "label": "hello 1"
    }
  },
  "bike.2": {
    "type": "Vehicle",
    "cfg": {
      "label": "hello 2"
    }
  },
  "bike.3": {
    "type": "Vehicle",
    "cfg": {
      "label": "last"
    }
  }
}
You can see how the last label was overridden, so this is a way to specify things sparsely. Is there anything already out there that lets one do this?
Thanks!
Rather than thinking of the metalanguage as a monolithic entity, it might be better to divide it into three parts:
An input specification. You can use a configuration file syntax to hold this specification.
A library or utility that can use print statements and for-loops to generate runtime configuration files. The Apache Velocity template engine comes to mind as something that is suitable for this purpose. I suggest you look at its user guide to get a feel for what it can do (see the sketch at the end of this answer).
Some glue code to join together the above two items. In particular, the glue code reads name=value pairs from the input specification, and passes them to the template engine, which uses them as parameters to "instantiate" the template versions of the configuration files that you want to generate.
My answer to another StackOverflow question provides some more details of the above idea.
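To make item 2 concrete, here is a minimal sketch of the template-engine step using Apache Velocity (the template text, parameter names, and the override convention are all illustrative; real templates would normally live in files, and the glue code would read the index bound and the overrides from the input specification):

import org.apache.velocity.VelocityContext;
import org.apache.velocity.app.Velocity;
import java.io.StringWriter;
import java.util.Map;

public class ConfigGen {
    public static void main(String[] args) {
        Velocity.init();

        VelocityContext ctx = new VelocityContext();
        ctx.put("last", 3);                      // highest index, from the input spec
        ctx.put("overrides", Map.of(3, "last")); // sparse per-index overrides

        // #foreach expands one "bike.N" entry per index; overrides win when present.
        String template = """
                {
                #foreach($i in [0..$last])
                #if($overrides.containsKey($i))#set($label = $overrides.get($i))#else#set($label = "hello $i")#end
                  "bike.$i": { "type": "Vehicle", "cfg": { "label": "$label" } }#if($i < $last),#end
                #end
                }""";

        StringWriter out = new StringWriter();
        Velocity.evaluate(ctx, out, "bikes", template);
        System.out.println(out);
    }
}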