JSONata data mapping template - json

I'm starting to use JSONata for data transformation, and I was wondering if there is a way to have a file that contains value transformations for some fields in a JSON file.
I will need to do several types of transformation, but most cases will be translating a field value from "A" to "B", for example. It would be easier to do that in a file, so that I don't have to create new versions of the data transformation and only need to add a new entry to the file.
Regards

You can use $lookup to perform simple mappings. For example, with the following JSON:
{
"mapping": [
{ "a": "a1" },
{ "b": "b1" }
],
"values": [
"a", "b"
]
}
You can map the values using:
values.$lookup($$.mapping, $)
In which case the result will be:
[
"a1",
"b1"
]
Alternatively you can look at $sift which will allow you to write a function to sift through the mappings.
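The $lookup approach keeps the translations in data rather than in the expression, so adding a new "A" to "B" pair only means adding an entry to the mapping. The following is a minimal Python sketch of that idea, not JSONata itself: the helper names `lookup` and `translate` are hypothetical, the mapping file is inlined as a string so the example is self-contained, and dropping unmapped values is an assumption meant to mirror how JSONata leaves undefined results out of a mapped sequence.

```python
import json

# Hypothetical contents of a separate mapping file, in the same shape as the
# "mapping" array above. Inlined as a string so the sketch is self-contained.
MAPPING_JSON = '[{"a": "a1"}, {"b": "b1"}]'


def lookup(mappings, key):
    """Roughly mimic $lookup over an array of objects: collect the value of
    `key` from every object that has it (None if no object has it)."""
    matches = [m[key] for m in mappings if key in m]
    if not matches:
        return None
    return matches[0] if len(matches) == 1 else matches


def translate(values, mappings):
    """Apply lookup to each value, like values.$lookup($$.mapping, $).
    Values with no mapping are dropped (assumed to mirror JSONata)."""
    results = (lookup(mappings, v) for v in values)
    return [r for r in results if r is not None]


mappings = json.loads(MAPPING_JSON)
print(translate(["a", "b"], mappings))  # ['a1', 'b1']
```

Supporting a new value then only requires adding an object such as {"c": "c1"} to the mapping file; the transformation code stays unchanged.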

Related

JSONPath to get multiple values from nested json

I have a nested JSON in a field that contains multiple important keys that I would like to retrieve as an array:
{
"tasks": [
{
"id": "task_1",
"name": "task_1_name",
"assignees": [
{
"id": "assignee_1",
"name": "assignee_1_name"
}
]
},
{
"id": "task_2",
"name": "task_2_name",
"assignees": [
{
"id": "assignee_2",
"name": "assignee_2_name"
},
{
"id": "assignee_3",
"name": "assignee_3_name"
}
]}]}
All the queries I've tried so far, for example $.tasks.*.assignees..id and many others, have returned
[
"assignee_1",
"assignee_2",
"assignee_3"
]
But what I need is:
[
["assignee_1"],
["assignee_2", "assignee_3"]
]
Is it possible to do with JSONPath or any script inside of it, without involving 3rd party tools?
The problem you're facing is that tasks and assignees are arrays. You need to use [*] instead of .* to get the items in the array. So your path should look like
$.tasks[*].assignees[*].id
You can try it at https://json-everything.net/json-path.
NOTE The output from my site will give you both the value and its location within the original document.
Edit
(I didn't read the whole thing :) )
You're not going to be able to get
[
["assignee_1"],
["assignee_2", "assignee_3"]
]
because, as @Tomalak mentioned, JSON Path is a query language. It removes all structure and returns only values.
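Since the query result is flat, the per-task grouping has to be rebuilt in whatever code consumes the document. The Python sketch below is only an illustration of that, and it steps outside JSONPath, which the asker wanted to avoid; it is useful only where a scripting step is available. It contrasts the flat list that the path above returns with the grouped shape the question asks for.

```python
import json

DOC = '''{"tasks": [
  {"id": "task_1", "name": "task_1_name",
   "assignees": [{"id": "assignee_1", "name": "assignee_1_name"}]},
  {"id": "task_2", "name": "task_2_name",
   "assignees": [{"id": "assignee_2", "name": "assignee_2_name"},
                 {"id": "assignee_3", "name": "assignee_3_name"}]}
]}'''

data = json.loads(DOC)

# Flat result, the same values $.tasks[*].assignees[*].id returns
flat = [a["id"] for task in data["tasks"] for a in task["assignees"]]

# Grouped result: one inner list per task, keeping the task boundaries
grouped = [[a["id"] for a in task["assignees"]] for task in data["tasks"]]

print(flat)     # ['assignee_1', 'assignee_2', 'assignee_3']
print(grouped)  # [['assignee_1'], ['assignee_2', 'assignee_3']]
```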

How to split json using EvaluateJsonPath processor in NiFi

I want to split and transfer JSON data in NiFi. I want to split the JSON by the id1 and id2 arrays and transfer each one to its respective processor group, for example processor groups a and b. I tried EvaluateJsonPath with $.id1 and $.id2, but I didn't get the result I need. Can you please help me with this issue? My JSON structure looks like this:
{
"id1": [{
"u_name": "aa"
}, {
"addr": "bb"
}],
"id2": [{
"u_name": "aa"
}, {
"addr": "bb"
}]
}
The processor you're looking for is SplitJson.
Configure it as follows:
Then, you'll receive two FlowFiles:
The first one will contain id1:
[{
"u_name": "aa"
}, {
"addr": "bb"
}]
The second one will contain id2:
[{
"u_name": "aa"
}, {
"addr": "bb"
}]
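The SplitJson configuration itself is not reproduced here, so the following is only a plain-Python illustration of the split the answer describes: one piece per top-level array. It is not NiFi code, the helper name `split_top_level` is hypothetical, and routing each piece to its processor group is left out.

```python
import json

DOC = '''{"id1": [{"u_name": "aa"}, {"addr": "bb"}],
          "id2": [{"u_name": "aa"}, {"addr": "bb"}]}'''


def split_top_level(doc_text):
    """Return one (key, JSON text) pair per top-level field, mirroring the
    two FlowFiles described above (one for id1, one for id2)."""
    data = json.loads(doc_text)
    return [(key, json.dumps(value)) for key, value in data.items()]


for key, payload in split_top_level(DOC):
    # In NiFi each piece would continue to its own processor group;
    # here we only print it.
    print(key, payload)
```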
Here is how to get to the values you want with EvaluateJsonPath:
@varun_rathinam Accessing JSON in an array object via EvaluateJsonPath can be quite confusing. I also noticed that your JSON structure is a bit confusing, since both arrays have the same values. For testing, I changed id2's values to cc and dd so that I could tell the id1 and id2 values apart.
The solution you want is (see template for exact string values):
Notice that we use the normal path for each JSON object ($.object), then access the array index (0, 1), then access the properties of the array's objects. Also notice that the JSON object array can be accessed with or without a . before the [.
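The exact expressions live in the linked template and are not reproduced here. The paths below are an assumption based on the pattern just described (object key, then array index, then property), paired with the equivalent plain-Python indexing on the test data with id2 changed to cc and dd.

```python
import json

# Test data as described above: id2's values changed to cc and dd
DOC = '''{"id1": [{"u_name": "aa"}, {"addr": "bb"}],
          "id2": [{"u_name": "cc"}, {"addr": "dd"}]}'''
data = json.loads(DOC)

# Assumed JsonPath expressions, each paired with the Python indexing it
# corresponds to: object key, then array index, then property name.
paths = {
    "$.id1[0].u_name": data["id1"][0]["u_name"],
    "$.id1[1].addr": data["id1"][1]["addr"],
    "$.id2[0].u_name": data["id2"][0]["u_name"],
    "$.id2[1].addr": data["id2"][1]["addr"],
}

for expr, value in paths.items():
    print(expr, "->", value)
```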
Reference:
https://community.cloudera.com/t5/Support-Questions/how-to-extract-fields-in-flow-file-which-are-surrounded-by/m-p/208635
You can also find my template during testing of your issue on my GitHub:
https://github.com/steven-dfheinz/NiFi-Templates/blob/master/NiFI_EvaluateJsonPath_Demo.xml

How to use jq to reconstruct complete contents of json file, operating only on part of interest?

All the examples I've seen so far "reduce" the output (filter out) some part. I understand how to operate on the part of the input I want to, but I haven't figured out how to output the rest of the content "untouched".
The particular example would be an input file with several top-level entries, say "array1", "field1", "array2", and "array3". Each array's contents are different. The specific processing I want to do is to sort the "array1" entries by a "name" field, which is doable with:
jq '.array1 | sort_by(.name)' test.json
but I also want this output to stay under "array1", with all the other data preserved.
Example input:
{
"field1": "value1",
"array1":
[
{ "name": "B", "otherdata": "Bstuff" },
{ "name": "A", "otherdata": "Astuff" }
],
"array2" :
[
array2 stuff
],
"array3" :
[
array3 stuff
]
}
Expected output:
{
"field1": "value1",
"array1":
[
{ "name": "A", "otherdata": "Astuff" },
{ "name": "B", "otherdata": "Bstuff" }
],
"array2" :
[
array2 stuff
],
"array3" :
[
array3 stuff
]
}
I've tried using map but I can't seem to get the syntax correct to be able to handle any type of input other than the array I want to be sorted by name.
Whenever you use the assignment operators (=, |=, +=, etc.), the context of the expression is kept unchanged. So as long as your top-level filter(s) are assignments, in the end, you'll get the rest of the data (with your changes applied).
In this case, you're just sorting the array1 array so you could just update the array.
.array1 |= sort_by(.name)
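For comparison outside jq, the same "update one field, keep everything else" pattern can be sketched in Python. This is an analogue, not jq: the function name `sort_array1` is hypothetical, and because "array2 stuff" and "array3 stuff" in the question are placeholders rather than valid JSON, arbitrary values stand in for them.

```python
import json

DOC = '''{"field1": "value1",
  "array1": [{"name": "B", "otherdata": "Bstuff"},
             {"name": "A", "otherdata": "Astuff"}],
  "array2": [1, 2],
  "array3": ["x"]}'''


def sort_array1(doc):
    """Python analogue of jq's `.array1 |= sort_by(.name)`: replace one
    field and return the whole document."""
    result = dict(doc)  # shallow copy; other top-level fields are untouched
    result["array1"] = sorted(doc["array1"], key=lambda item: item["name"])
    return result


out = sort_array1(json.loads(DOC))
print(json.dumps(out, indent=2))
```

Like the jq version, the rest of the document passes through unchanged; only the value of "array1" is replaced.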

d3js dynamically accessor children

I am creating a tree with the d3 tree layout. My data is already a tree, but not in the d3js format ({name: "", children: []}); instead it uses a simple JSON tree format like:
[{
"A": [{
"AA": []
}, {
"AB": []
}, {
"B": [{
"BA": []
}, {
"BB": []
}]
}]
}]
Of course, the real data doesn't use "A" and "B"; they are only there to make the JSON clearer, and this is just part of my data. (My data doesn't follow a pattern like the example.)
I saw I could use tree.children() to change the name of the children property, but how can I do it dynamically?
I need to use this tree format with d3 tree layout.
So since you can write an accessor function, you can make it smarter than just returning a single property.
The function can check each of the object's keys and return the corresponding value if it contains children.
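The accessor itself would be written in JavaScript for d3, but its logic is language-neutral. Here is a minimal Python sketch of it, assuming each node is a single-key object whose value is its list of children, as in the question. The names `children`, `name`, and `to_d3` are hypothetical, and the conversion to d3's {name, children} shape is an optional alternative rather than part of the original answer.

```python
import json

DOC = '''[{"A": [{"AA": []}, {"AB": []},
                {"B": [{"BA": []}, {"BB": []}]}]}]'''


def children(node):
    """Accessor sketch: return the first value that is a non-empty list,
    i.e. the node's children; None when there are none."""
    for value in node.values():
        if isinstance(value, list) and value:
            return value
    return None


def name(node):
    # In this format each node is a single-key object; the key is its label.
    return next(iter(node))


def to_d3(node):
    """Optionally convert to d3's {name, children} shape via the accessors."""
    kids = children(node)
    converted = {"name": name(node)}
    if kids:
        converted["children"] = [to_d3(k) for k in kids]
    return converted


root = json.loads(DOC)[0]  # the question's top level is an array of roots
print(json.dumps(to_d3(root)))
```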

JSON Data Optimization by removing repeated column names

I have a basic JSON question. I have a JSON file in which every object repeats the same column names:
[
{
id: 1,
name: "ABCD"
},
{
id: 2,
name: "ABCDE"
},
{
id: 3,
name: "ABCDEF"
}
]
For optimization, I was thinking of removing the repeated column names:
{
"cols": [
"id",
"name"
],
"rows": [
[
"1",
"ABCD"
],
[
"2",
"ABCDE"
]
]
}
What I am trying to understand is: is this a better approach? Are there any disadvantages to this format, say for writing unit tests?
EDIT
The second case (after your edit) is valid JSON. You can map it to the following class using json2csharp:
using System.Collections.Generic;

public class RootObject
{
    public List<string> cols { get; set; }
    public List<List<string>> rows { get; set; }
}
The very important point to note is that, when values are stored as ordinary JSON objects, there is no way to represent them other than repeating the column names (or, more generally, the keys) in every object. You can test the validity of your JSON by pasting it into jsonlint.com.
But if you want to optimize your JSON by compressing it, in the way a library such as gzip compresses data, then I would recommend Json.HPack.
This format has several compression levels, ranging from 0 to 4 (4 is the best).
At compression level 0, you remove the keys (property names) from the structure and create a header at index 0 listing each property name. Your compressed JSON would then look like:
[
[
"id",
"name"
],
[
1,
"ABCD"
],
[
2,
"ABCDE"
],
[
3,
"ABCDEF"
]
]
In this way, you can compress your JSON at whichever level you want. But to work with it in an ordinary JSON library, you first have to decompress it back into the regular form, like the one you provided earlier with repeated property names.
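The level-0 layout described above is simple enough to sketch by hand. This is not the Json.HPack library or its API, only a hypothetical illustration of the same header-plus-rows idea; `pack` and `unpack` are made-up names, and the sketch assumes every record has the same keys as the first one.

```python
import json


def pack(records):
    """Header row of keys at index 0, then one row of values per record.
    Assumes every record has the same keys as the first record."""
    if not records:
        return []
    keys = list(records[0].keys())
    return [keys] + [[r[k] for k in keys] for r in records]


def unpack(packed):
    # Rebuild the ordinary array-of-objects form that JSON libraries expect.
    if not packed:
        return []
    keys, rows = packed[0], packed[1:]
    return [dict(zip(keys, row)) for row in rows]


records = [{"id": 1, "name": "ABCD"}, {"id": 2, "name": "ABCDE"},
           {"id": 3, "name": "ABCDEF"}]
packed = pack(records)
print(json.dumps(packed))  # [["id", "name"], [1, "ABCD"], [2, "ABCDE"], [3, "ABCDEF"]]
assert unpack(packed) == records  # round-trips back to the original
```

The round trip makes the trade-off visible: the packed form is smaller, but every consumer needs the `unpack` step before it can treat the data as ordinary objects.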
For your information, you can also look at comparisons between different compression techniques.
{
"cols": [
"id",
"name"
],
"rows": [
[
"1",
"ABCD"
], [
"2",
"ABCDE"
], [
"3",
"ABCDEF"
]
]
}
With this approach, it will be hard to determine which value stands for which item (id, name). Your first approach is good if you use this JSON for communication.
One solution is to use an Object-Relational Mapper of whichever type you prefer.
That way, you can compress your JSON data and still use a legible structure in your code.
Please see this article: What is "compressed JSON"?