Looking for a configuration DSL that allows sparse specification and outputs JSON

Background: We have all seen several ways to configure a distributed application. For my purposes, two of them stand out:
1. Have a massive database that all nodes have access to. Each node knows its own identity, and so can perform queries against said database to pull out the configuration information specific to itself.
2. Use tailored (i.e., specific to each node) configuration files (e.g., JSON) so that the nodes do not have to touch a database at all. They simply read the tailored config file and do what it says.
There are pros and cons to each. For my purposes, I would like to explore #2 a little further, but the problem I'm running into is that the JSON files can get pretty big. I'm wondering if anyone knows of a DSL that is well-suited for generating these JSON files.
Step-by-step examples to illustrate what I mean:
Suppose I make up this metalanguage that looks like this:
bike.%[0..3](Vehicle)
This would then output the following JSON:
{
  "bike.0":
  {
    "type": "Vehicle"
  },
  "bike.1":
  {
    "type": "Vehicle"
  },
  "bike.2":
  {
    "type": "Vehicle"
  },
  "bike.3":
  {
    "type": "Vehicle"
  }
}
The idea is that we've just created 4 bikes, each of which is of type Vehicle.
Going further:
bike[i=0..3](Vehicle)
label: "hello %i"
label.3: "last"
What this does is name the index variable 'i' so that it can be used in the configuration information of each item. The JSON output would be something like this:
{
  "bike.0":
  {
    "type": "Vehicle",
    "cfg":
    {
      "label": "hello 0"
    }
  },
  "bike.1":
  {
    "type": "Vehicle",
    "cfg":
    {
      "label": "hello 1"
    }
  },
  "bike.2":
  {
    "type": "Vehicle",
    "cfg":
    {
      "label": "hello 2"
    }
  },
  "bike.3":
  {
    "type": "Vehicle",
    "cfg":
    {
      "label": "last"
    }
  }
}
You can see how the last label was overridden, so this is a way to sparsely specify stuff. Is there anything already out there that lets one do this?
Thanks!

Rather than thinking of the metalanguage as a monolithic entity, it might be better to divide it into three parts:
An input specification. You can use a configuration file syntax to hold this specification.
A library or utility that can use print statements and for-loops to generate runtime configuration files. The Apache Velocity template engine comes to mind as something that is suitable for this purpose. I suggest you look at its user guide to get a feel for what it can do.
Some glue code to join together the above two items. In particular, the glue code reads name=value pairs from the input specification, and passes them to the template engine, which uses them as parameters to "instantiate" the template versions of the configuration files that you want to generate.
My answer to another StackOverflow question provides some more details of the above idea.
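To make the spec-plus-glue-code idea concrete, here is a minimal Python sketch that reproduces the sparse-override output from the question. The compact spec format, its field names, and the override syntax are all made up for illustration; a real template engine such as Velocity (or a data-templating language such as Jsonnet) would replace the hand-rolled expansion:

import json

# Hypothetical compact spec: a name prefix, an index range, a type,
# per-item defaults, and sparse per-index overrides.
spec = {
    "prefix": "bike",
    "range": (0, 3),
    "type": "Vehicle",
    "defaults": {"label": "hello {i}"},
    "overrides": {3: {"label": "last"}},
}

def expand(spec):
    out = {}
    lo, hi = spec["range"]
    for i in range(lo, hi + 1):
        cfg = {key: val.format(i=i) for key, val in spec["defaults"].items()}
        cfg.update(spec["overrides"].get(i, {}))  # sparse overrides win
        out[f'{spec["prefix"]}.{i}'] = {"type": spec["type"], "cfg": cfg}
    return out

print(json.dumps(expand(spec), indent=2))

Generating the files this way keeps the hand-written input small, while the nodes still receive plain JSON they can read without touching a database.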

Related

How to easily change a recurring property name in multiple schemas?

To be able to deserialize polymorphic types, I use a type discriminator across many of my JSON objects. E.g., { "$type": "SomeType", "otherProperties": "..." }
For the JSON schemas of concrete types, I specify a const value for $type.
{
  "type": "object",
  "properties": {
    "$type": { "const": "SomeType" },
    "otherProperties": { "type": "string" }
  }
}
This works, but distributes the chosen "$type" property name throughout many different JSON schemas. In fact, we are considering renaming it to "__type" to play more nicely with BSON.
Could I have prevented having to rename this property in all affected schemas?
I tried searching for a way to load the property name from elsewhere. As far as I can tell $ref only works for property values.
JSON Schema has no ability to dynamically load keys from another location the way you are asking, specifically because the value will be different and you want only the key to be loaded from elsewhere.
While you can't do this with JSON Schema, you could use a templating tool such as Jsonnet. I've seen this work well at scale.
This would require you to have a pre-processing step, but it sounds like that's something you're planning for already, creating some sort of pipeline to generate your schemas.
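As a rough illustration of such a pre-processing step (sketched in Python rather than Jsonnet; the TYPE_KEY constant, the template layout, and the output file names are assumptions), the discriminator's name can live in exactly one place:

import json
from pathlib import Path

TYPE_KEY = "$type"  # change to "__type" here and regenerate all schemas

def add_discriminator(template: dict, type_name: str) -> dict:
    """Inject the discriminator property into a concrete-type schema template."""
    template.setdefault("properties", {})[TYPE_KEY] = {"const": type_name}
    return template

# Assumed layout: one schema template per concrete type, keyed by type name.
templates = {
    "SomeType": {
        "type": "object",
        "properties": {"otherProperties": {"type": "string"}},
    },
}

for type_name, template in templates.items():
    schema = add_discriminator(template, type_name)
    Path(f"{type_name}.schema.json").write_text(json.dumps(schema, indent=2))

Renaming $type to __type then becomes a one-line change followed by regenerating the schemas.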
A word of warning: watch out for existing schema generation tooling. It is often only good for scaffolding and requires lots of modifications. It sounds like you're building your own, which is likely a better approach.

Architectural Decision: How to structure a big Json Response

I'm working on an app that will generate a potentially very big JSON response. In my tests this was 8000 rows. This is because it is an aggregation of a year's worth of data, and it is required to display details in the UI.
For example:
"voice1": {
"sum": 24000,
"items": [
{
"price": 2000,
"description": "desc1",
"date": "2021-11-01T00:00:00.000Z",
"info": {
"Id": "85fda619bbdc40369502ec3f792ae644",
"address": "add2",
"images": {
"icon": "img.png",
"banner": null
}
}
},
{
"price": 2000,
"description": "desc1",
"date": "2021-11-01T00:00:00.000Z",
"info": {
"Id": "85fda619bbdc40369502ec3f792ae644",
"address": "add2",
"images": {
"icon": "img.png",
"banner": null
}
}
}
]
},
The point is that I can potentially have 10 voices, and for each of them dozens and dozens of items.
I was wondering if you can point me to some best practices, or if you have some tips, because I've got the feeling this can be done better.
It sounds like you are finding out that JSON is a rather verbose format (not as bad as XML, but still very verbose). If you are worried about the size of messages between server and client, you have a few options:
JSON compresses rather well. You can see how most tokens repeat many times, so make sure to compress with Gzip or Snappy before sending to clients. This will drastically reduce the size, but cost some performance for inflating / deflating (see the sketch after these two options).
The other alternative is to not use JSON for transfer, but a more optimized format. One of the best options here is FlatBuffers. It does require you to provide schemas of the data that you are sending, but it is an optimized binary format with minimal overhead. It will also drastically speed up your application because it removes the need for serialization / deserialization, which takes significant time for JSON. Another popular, but slightly slower, alternative is Protobuf.
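To get a feel for the first option, here is a small Python sketch (standard library only; the payload shape is copied from the question and the item count is made up) that measures how well such a repetitive payload compresses:

import gzip
import json

# One item shaped like the example, repeated many times to mimic a year's aggregation.
item = {
    "price": 2000,
    "description": "desc1",
    "date": "2021-11-01T00:00:00.000Z",
    "info": {
        "Id": "85fda619bbdc40369502ec3f792ae644",
        "address": "add2",
        "images": {"icon": "img.png", "banner": None},
    },
}
payload = {"voice1": {"sum": 24000, "items": [item] * 4000}}

raw = json.dumps(payload).encode("utf-8")
compressed = gzip.compress(raw)
print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
# Highly repetitive JSON like this typically shrinks dramatically under gzip.

Most HTTP servers and clients can negotiate this transparently via Content-Encoding: gzip.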
The only thing immediately obvious to me is that you would likely want to make a list of voices (like you have for items) rather than voice1, voice2, etc.
Beyond that, it really just depends on the structure of the data you start with (to create the JSON) and the structure of the data or code at the destination (and possibly also the method of transferring data, if size is a concern). If you're doing a significant amount of processing on either end to encode/decode the JSON, that can suggest there's a simpler way to structure the data. Can you share some additional context or examples of the overall process?
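For instance, a small Python sketch of the "list of voices" reshaping suggested above (the name field and the surrounding structure are assumptions):

def voices_to_list(doc: dict) -> list:
    """Turn {"voice1": {...}, "voice2": {...}} into a list of voice objects."""
    return [{"name": name, **body} for name, body in doc.items()]

doc = {
    "voice1": {"sum": 24000, "items": []},
    "voice2": {"sum": 13000, "items": []},
}
print(voices_to_list(doc))
# [{'name': 'voice1', 'sum': 24000, 'items': []},
#  {'name': 'voice2', 'sum': 13000, 'items': []}]

A list keeps consumers from having to know the voice keys ahead of time.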

What is the best practice in a REST API: to pass structured data or key-value pairs?

I have a data structure similar to the one given below, which I am supposed to process. I am designing an API which should accept a POST request like this (ignore the headers, etc.):
{
  "Name": "Johny English",
  "Id": "534dsf",
  "Message": [
    {
      "Header": "Country of origin",
      "Value": "England"
    },
    {
      "Header": "Nature of work",
      "Value": "Secret Agent/Spy"
    }
  ]
}
Somehow I do not feel it's a correct way to pass/accept data. Here I am talking about structured data vs. key-value pairs. While I can extract the fields ("Name", "Id") directly into object attributes, for key-value pairs I need to loop through the collection and compare against strings (e.g. "Nature of work") to extract values.
I searched a few sites looking for best practices but could not reach any conclusion. Are there any best practices, suggestions, etc.?
I don't think you are going to find any firm, evidence-based arguments against including a list of key-value pairs in your message schema. But that's the sort of thing to look for: people writing about message schema design, how to design messages to support change, and so on.
As a practical matter, there's not a whole lot of difference between
{
  "Name": "Johny English",
  "Id": "534dsf",
  "Message": [
    {
      "Header": "Country of origin",
      "Value": "England"
    },
    {
      "Header": "Nature of work",
      "Value": "Secret Agent/Spy"
    }
  ]
}
or
{
  "Name": "Johny English",
  "Id": "534dsf",
  "Message": {
    "Country of origin": "England",
    "Nature of work": "Secret Agent/Spy"
  }
}
In the early days of the world wide web, "everything" was key-value pairs, because it was easy to describe a collection of key-value pairs in such a way that a general-purpose component, like a web browser, could work with it (i.e., definitions of HTML forms). It got the job done.
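To show how mechanically the two shapes map onto each other, here is a small Python sketch (field names taken from the example) converting the name/value list into the flat map and back:

def pairs_to_map(message: list) -> dict:
    """[{"Header": k, "Value": v}, ...]  ->  {k: v, ...}"""
    return {entry["Header"]: entry["Value"] for entry in message}

def map_to_pairs(message: dict) -> list:
    """{k: v, ...}  ->  [{"Header": k, "Value": v}, ...]"""
    return [{"Header": k, "Value": v} for k, v in message.items()]

pairs = [
    {"Header": "Country of origin", "Value": "England"},
    {"Header": "Nature of work", "Value": "Secret Agent/Spy"},
]
assert map_to_pairs(pairs_to_map(pairs)) == pairs  # round-trips (assuming distinct headers)

The flat map is the friendlier shape to bind to typed models; the pair list is friendlier when the set of headers is open-ended.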
It's usually good to structure your response data the same as what you'd expect the input of the corresponding POST, PUT, and PATCH endpoints to be. This allows record alteration to not require the consuming entity to transform the data first. In that context, arrays of objects with "name"/"value" fields are much easier to write input validation for.

Adding properties to fields in json-schema to tightly couple the schema with a web UI

My application will be receiving a large JSON payload from an upstream system. This upstream system is essentially a UI that will collect business requirements from a user, format those questions and facts into a JSON payload, and transmit the JSON to my application, which will validate it against a schema defined by the JSON Schema standard. The conundrum is that this upstream system is being built by a different team who doesn't necessarily understand all of the business requirements that need to be captured.
Take the following schema:
schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Requirements",
    "description": "A Business Requirements Payload",
    "type": "object",
    "properties": {
        "full_name": {
            "type": "string"
        },
        "sex": {
            "enum": ["m", "f"]
        },
        "age": {
            "type": "number"
        },
        "consents": {
            "type": "boolean"
        }
    },
    "required": ["full_name", "sex", "age", "consents"],
    "additionalProperties": False
}
Assume that the upstream system has no idea what a full_name, sex, or age is. Currently, I am having meetings to explain the nature of every field/question/fact that I require, the default values that should show up in the UI, the accompanying text labels that should appear next to each field, and so on.
In brainstorming a mechanism to make this easier for everyone, I thought of tightly coupling the JSON schema I am creating to the UI that the upstream system is building. What if I include these details inside the JSON schema itself, hand the schema to the upstream system, and let the UI team generate the UI with the accompanying text labels, default values, and so on?
For example, the full_name and sex fields could instead be described like this:
"full_name": {
"type": "string",
"default": "\"John Smith\"",
"label": "Full Name",
"text": "Please include your full name.",
"description": "This field will be the primary key in the database"
},
"sex": {
"enum": ["m", "f"],
"default": "m",
"enum_labels": ["Male", "Female"],
"label": "Sex",
"text": "Please include your sex.",
"description": "We want to run analytics on this field"
}
The UI team and I could come to an agreement on certain things (a sketch of this mapping appears after the list):
If the field is of type string, generate a text box.
If the field is an enum, generate a combo box.
Use the field's label property in front of the form entry.
If the field is of type enum, generate pretty labels for the enum values by comparing positionally against the enum_labels property.
Use the field's text property right below the form entry.
The Description field is only to help you, the UI guy, to know the business logic.
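A minimal Python sketch of that agreement (the output field-descriptor format is invented purely for illustration; only the schema keywords from the question are assumed):

def schema_to_form(schema: dict) -> list:
    """Walk the schema's properties and emit UI field descriptors following
    the agreed rules (string -> text box, enum -> combo box; label, text,
    default and enum_labels drive the widget's presentation)."""
    fields = []
    for name, prop in schema.get("properties", {}).items():
        field = {
            "name": name,
            "label": prop.get("label", name),
            "help_text": prop.get("text", ""),
            "default": prop.get("default"),
        }
        if "enum" in prop:
            field["widget"] = "combo"
            labels = prop.get("enum_labels", prop["enum"])
            field["choices"] = list(zip(prop["enum"], labels))  # positional pairing
        elif prop.get("type") == "string":
            field["widget"] = "text"
        fields.append(field)
    return fields

With this, handing over the annotated schema is the whole hand-off; the UI team regenerates the form whenever the schema changes.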
Here are some negatives to this approach:
Tightly coupling the view in this manner may not be optimal
If json-schema v5 introduces a keyword that I am using, such as text, the schema would break if I upgraded to v5 and then I would have to change the contract with the UI team. (What could also be done to avoid this is to use the description field to hold all the form-related keywords, delimited by some character, but it wouldn't look as nice).
Is it appropriate to tightly couple a JSON schema with a UI, and if it is, is there anything wrong with adding properties to the schema like I have described in order to accomplish this?
*I just stumbled across jsonform which is pretty much what I desire, but this question still applies to jsonform as well as a custom parser.
Just to be certain, you are aware there is an optional form object which is used to structure the form output? It allows custom grouping, custom ordering, conditional fields and more ...
https://github.com/joshfire/jsonform/wiki#fields
If your default schema object is satisfactory for both the form layout and how the data object gets stored, then there is nothing wrong with sticking to the schema for the layout of the form.
I am not sure if this answers your question, as the question is slightly unclear to me. Basically, yes, you can stick to the main schema, but if that is not sufficient for the form layout, you can populate the form object.

Elasticsearch Reindex or Flag Deleted Type Property

This is related to my original question here:
Elasticsearch Delete Mapping Property
From that post, it sounds like you are going to have to "reindex" your data. What is a safe strategy for doing this?
To summarize from the original post I am trying to take the mapping from:
{
  "propVal1": {
    "type": "double",
    "index": "analyzed"
  },
  "propVal2": {
    "type": "string",
    "analyzer": "keyword"
  },
  "propVal3": {
    "type": "string",
    "analyzer": "keyword"
  }
}
to this:
{
  "propVal1": {
    "type": "double",
    "index": "analyzed"
  },
  "propVal2": {
    "type": "string",
    "analyzer": "keyword"
  }
}
Removing all data for the property that was removed.
I have been contemplating using the REST API for this. This seems dangerous though since you are going to need to synchronize state with the client application making the REST calls, i.e. you need to send all of your documents to the client, modify them, and send them back.
What would be ideal is if there was a server side operation that could move and transform types around. Does something like this exist or am I missing something obvious with the "reindexing"?
Another approach would be to flag the data as no longer valid. Are there any built-in flags for this, in terms of the mapping, or is it necessary to create an auxiliary type to define whether another type property is valid?
You can have a look at the elasticsearch-reindex plugin.
A more manual operation could be to use the scan & scroll API to get back your original content and the bulk API to index it into a new index or type (sketched below).
Lastly, how did you get your docs into Elasticsearch in the first place? If you already have a data source somewhere, just use the same process as before.
If you don't want any downtime, use an alias on top of your old index and, once reindexing is done, just move the alias to the new index.
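A minimal sketch of the scan & scroll plus bulk suggestion, using the Python client's helpers (the index names are made up, the dropped field comes from the question, and exact client details vary between Elasticsearch versions):

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

def copy_index_without_field(source_index, dest_index, field="propVal3"):
    """Stream every document out of the old index, drop the removed property,
    and bulk-index the result into the new index (created with the new mapping)."""
    def actions():
        for hit in helpers.scan(es, index=source_index):
            doc = hit["_source"]
            doc.pop(field, None)  # remove the property that no longer exists in the mapping
            yield {"_index": dest_index, "_id": hit["_id"], "_source": doc}

    helpers.bulk(es, actions())

copy_index_without_field("myindex_v1", "myindex_v2")

Combined with the alias trick above, clients keep querying the alias while the copy runs, and the switch to the new index is a single atomic alias move.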