Jolt to Map input fields with conditions - json

I am trying to write a Jolt spec that maps only one of two input fields to the output.
Any help or suggestions appreciated.
If topicA.owner and topicZ.owner are both present, output owner.name should be mapped from topicZ.owner.
If only topicA.owner is present, output owner.name should be mapped from topicA.owner.
If only topicZ.owner is present, output owner.name should be mapped from topicZ.owner.
Input :
{
"topicA": {
"owner": "topic_a_owner"
},
"topicZ": {
"owner": "topic_z_owner"
}
}
Jolt:
[
{
"spec": {
"*": {
"ta": "@(2,topicA.owner)",
"za": "@(2,topicZ.owner)"
}
},
"operation": "modify-default-beta"
},
{
"operation": "shift",
"spec": {
"topicA": {
"ta": "owner.name"
},
"topicZ": {
"za": "owner.name"
}
}
}
]
Expected Output:
{
"owner" : {
"name" : "topic_z_owner"
}
}

The three conditions you mention can be simplified to two:
If topicZ.owner is present (irrespective of whether topicA.owner is present), output owner.name should be mapped from topicZ.owner. (This merges your 1st and 3rd conditions.)
If only topicA.owner is present, output owner.name should be mapped from topicA.owner.
Based on this, you can do the following:
Use a modify-default-beta operation to copy the value of topicA.owner into the topicZ.owner field when topicZ.owner is not present.
Use a shift operation to map the value of topicZ.owner to the owner.name field of the output.
[
{
"spec": {
"topicZ": {
"owner": "@(2,topicA.owner)"
}
},
"operation": "modify-default-beta"
},
{
"operation": "shift",
"spec": {
"topicZ": {
"owner": "owner.name"
}
}
}
]
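For comparison, the intended precedence (topicZ.owner wins, topicA.owner is the fallback) can be sketched in plain Python. This is not Jolt and is only meant as a reference for checking the spec's output; the function name map_owner is made up for illustration, and returning an empty object when neither owner is present is an assumption, since the question does not cover that case.

```python
def map_owner(doc):
    """Build {"owner": {"name": ...}}, preferring topicZ.owner over topicA.owner."""
    z_owner = doc.get("topicZ", {}).get("owner")
    a_owner = doc.get("topicA", {}).get("owner")
    # topicZ.owner takes precedence; topicA.owner is only a fallback.
    chosen = z_owner if z_owner is not None else a_owner
    # Assumption: with neither owner present, produce an empty object.
    return {"owner": {"name": chosen}} if chosen is not None else {}


if __name__ == "__main__":
    both = {"topicA": {"owner": "topic_a_owner"}, "topicZ": {"owner": "topic_z_owner"}}
    print(map_owner(both))  # {'owner': {'name': 'topic_z_owner'}}
```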

Related

transforming all string attributes which are boolean to true booleans

I thought this would be simple, and perhaps it is, but on my Jolt learning journey I am once again struggling.
I have some JSON files (without a schema), up to about 30 MB in size, which contain many thousands of string attributes at all levels of the document. Some of these (say 20%) hold booleans as strings.
I understand that I can write a spec to pick out individual attributes and convert them, as described in this post: https://stackoverflow.com/questions/64972556/convert-boolean-to-string-for-map-values-in-nifi-jolt
That technique won't work for me, as the nesting and levels are arbitrary and there are simply too many attributes.
So how can I apply the data type transform to any attribute that holds a boolean represented as a string?
For example, input:
{
"name": "Fred",
"age": 45,
"opentowork" : "true",
"friends" : [
{
"name": "penny",
"closefriend": "false"
},
{
"name": "roger",
"farfriend": "true"
}
]
}
to desired
{
"name": "Fred",
"age": 45,
"opentowork" : true,
"friends" : [
{
"name": "penny",
"closefriend": false
},
{
"name": "roger",
"farfriend": true
}
]
}
I want to pick up the attributes opentowork, closefriend and farfriend without explicitly defining them in the spec. I also need to leave the values of the other attributes as they are (whatever level they are at).
You can use the =toBoolean conversion, separating the case inside the friends array (matched by the "f*" pattern) from the else case "*", such as:
[
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=toBoolean",
"f*": {
"*": {
"*": "=toBoolean"
}
}
}
}
]
Alternatively, several modify specs that do not explicitly name any attribute, array or object can be added, one per desired nesting level, such as:
[
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=toBoolean"
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": {
"*": "=toBoolean"
}
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": {
"*": {
"*": "=toBoolean"
}
}
}
}
]
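Both specs above handle a fixed number of nesting levels. For reference, the behaviour the question asks for (convert at any depth, leave everything else alone) can be sketched as a recursive walk in plain Python. This is not Jolt; the function name strings_to_booleans is made up, and the sketch assumes that only the exact strings "true" and "false" should be converted.

```python
def strings_to_booleans(value):
    """Recursively replace the strings "true"/"false" with real booleans."""
    if isinstance(value, dict):
        return {key: strings_to_booleans(item) for key, item in value.items()}
    if isinstance(value, list):
        return [strings_to_booleans(item) for item in value]
    # Assumption: only these two exact spellings count as booleans.
    if value == "true":
        return True
    if value == "false":
        return False
    # Numbers, other strings, nulls and real booleans pass through unchanged.
    return value
```

Running it on the example input turns opentowork, closefriend and farfriend into booleans and leaves name and age untouched, which makes it usable as a check on the output of whichever spec is chosen.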

Convert object property to array with one element per key

I am trying to use Jolt to transform an object into an array with one array element per key in the original object. I'd like the output objects to include the original key as a property, and preserve any properties from the source value. And I need to handle three scenarios for the input properties:
"key": null
"key": {}
"key": {...}
Here's an example:
{
"operations": {
"foo": null,
"bar": {},
"baz": {
"arbitrary": 1
}
}
}
And the desired output:
{
"operations": [
{
"operation": "foo"
},
{
"operation": "bar"
},
{
"operation": "baz",
"arbitrary": 1
}
]
}
Note: foo, bar and baz are arbitrary here. It needs to handle any property names inside the operations object.
This is really close to what I want:
[
{
"operation": "default",
"spec": {
"operations": {
"*": {}
}
}
},
{
"operation": "shift",
"spec": {
"operations": {
"*": {
"$": "operations[].operation"
}
}
}
}
]
But it drops "arbitrary": 1 from the baz operation.
Alternatively, this keeps the properties of each operation, but doesn't add a key for the operation name:
[
{
"operation": "default",
"spec": {
"operations": {
"*": {}
}
}
},
{
"operation": "shift",
"spec": {
"operations": {
"*": {
"@": "operations[]"
}
}
}
}
]
Any help getting both behaviors would be appreciated.
You can use a single shift transformation spec with wildcards rather than repeated literals, such as:
[
{
"operation": "shift",
"spec": {
"*s": {
"*": {
"$": "&2[#2].&(2,1)",
"*": "&2[#2].&"
}
}
}
}
]
where
&2 goes 2 levels up the tree, traversing { signs twice, in order to pick up the key name operations (a plain &, which is identical to &(0) or &(0,0), would traverse only the colon and reach $ to grab its value)
[#2] also goes 2 levels up, but since it sits on the right-hand side of the spec the : sign counts as a traversal along with the { sign; it asks the node reached there how many matches it has had so far
&(2,1) is a subkey lookup: the first argument goes 2 levels up the tree to the matched key name, and the second argument selects which part of that key, as partitioned by the * wildcard, to use (in this case it produces the literal operation, without the plural suffix)
the * wildcard, which always sits on the left-hand side, represents the rest of the attributes (the else case).
The spec can be tried out in the demo at http://jolt-demo.appspot.com.
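As a cross-check on the expected result, the same reshaping can be sketched in plain Python. This is not Jolt, the function name operations_to_list is made up, and the sketch assumes the top-level key is always operations and that each value is null, an empty object, or an object of extra properties.

```python
def operations_to_list(doc):
    """Turn {"operations": {name: props}} into a list with one entry per key."""
    entries = []
    for name, props in doc["operations"].items():
        entry = {"operation": name}  # keep the original key as a property
        entry.update(props or {})    # null and {} contribute no extra properties
        entries.append(entry)
    return {"operations": entries}
```

Python dictionaries preserve insertion order, so the entries come out in the same order as the keys of the input object.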

unable to convert json list to objects using jolt

I need to use a Jolt transform to do the JSON transformation below.
I need to create new columns from the list in the reeval column, where sometimes we get only one value and sometimes multiple values. My input data:
example 1:
{
"id":"1",
"reeval":["one","two"]
}
example 2:
{
"id":"2",
"reeval":["one","two","three"]
}
example 3:
{
"id":"3",
"reeval":["one"]
}
I have written the Jolt expression below:
[
{
"operation": "shift",
"spec": {
"id": "id",
"reeval": {
"*": "&"
}
}
}
]
The Jolt expression above works, but I am unable to add the column name.
Its output is as below:
example 1:
{
"id" : "1",
"0" : "one",
"1" : "two"
}
example 2:
{
"id" : "2",
"0" : "one",
"1" : "two",
"2" : "three"
}
Here I am unable to change the names of the columns as I need to.
My expected output after the Jolt transformation should be:
example 1:
{
"id":"1",
"reeval":"one",
"reeval1":"two"
}
example 2:
{
"id":"2",
"reeval":"one",
"reeval1":"two",
"reeval2":"three"
}
example 3:
{
"id":"3",
"reeval":"one"
}
Prepending &1 to the current ampersand suffices to go one level up the tree and grab the key name in the first shift transformation. A second shift then renames only the key with index zero, such as:
[
{
"operation": "shift",
"spec": {
"id": "id",
"reeval": {
"*": "&1&"
}
}
},
{
"operation": "shift",
"spec": {
"reeval0": "reeval",
"*": "&"
}
}
]
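The naming rule the two shifts implement (index 0 becomes reeval, index n becomes reevaln) can be sketched in plain Python for checking the output. This is not Jolt, and the function name flatten_reeval is made up for illustration.

```python
def flatten_reeval(doc):
    """Spread the reeval list into reeval, reeval1, reeval2, ... columns."""
    out = {"id": doc["id"]}
    for index, value in enumerate(doc.get("reeval", [])):
        # The first element keeps the bare name; later ones get their index appended.
        key = "reeval" if index == 0 else f"reeval{index}"
        out[key] = value
    return out


if __name__ == "__main__":
    print(flatten_reeval({"id": "3", "reeval": ["one"]}))  # {'id': '3', 'reeval': 'one'}
```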

How can I combine two arrays to create a key value pair with Jolt?

I've already created a spec to convert my JSON input
{
"rows": [
{
"row": [
"row1",
"row2",
"row3"
],
"header": [
"header1",
"header2",
"header3"
]
},
{
"row": [
"row4",
"row5",
"row6"
],
"header": [
"header4",
"header5",
"header6"
]
}
]
}
into key-value pairs, as in the following result:
{
"header1" : "row1",
"header2" : "row2",
"header3" : "row3",
"header4" : "row4",
"header5" : "row5",
"header6" : "row6"
}
Is this possible to do using Jolt?
Is there a copy/paste error in your input? Judging by your desired output, the second object's header array should be ["header4", "header5", "header6"]. If that's the case, this spec should work:
[
{
"operation": "shift",
"spec": {
"rows": {
"*": {
"header": {
"*": {
"*": {
"@(3,row[#2])": "&"
}
}
}
}
}
}
}
]
One option is to use the following shift transformation spec :
[
{
"operation": "shift",
"spec": {
"*s": { // rows array
"*": {
"&(1,1)": { // row array
"*": {
"@": "@(3,header[&1])"
}
}
}
}
}
}
]
where
"*s": { stands for rows
"&(1,1)": { -> looks not at the immediate (zeroth) level but one level further up, through &(1, and grabs the part matched by the first asterisk there, through &(..,1)
"@": "@(3,header[&1])" -> 3 levels are needed because, on the right-hand side, the colon is traversed as well in order to reach the level of &(1,1), which represents the "row" array. &1 goes one level up the tree to reach the indexes of the "row" array, while the @ on the left-hand side matches the values of "row".
The spec can be tried out in the demo at http://jolt-demo.appspot.com/.
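Either spec pairs each header with the row value at the same index. As a reference for the expected result, here is a minimal plain-Python sketch of that pairing. It is not Jolt, the function name rows_to_pairs is made up, and it assumes header and row always have the same length within each element of rows.

```python
def rows_to_pairs(doc):
    """Merge every header/row pair of arrays into one flat key-value object."""
    pairs = {}
    for block in doc["rows"]:
        # zip pairs header[i] with row[i]; later blocks add to the same object.
        pairs.update(zip(block["header"], block["row"]))
    return pairs
```

If the same header appeared in two blocks, this sketch would keep the later value. The Jolt specs may behave differently in that case, so it is only a reliable check when headers are unique, as in the example.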

Jolt spec for conditional presence of field

I have a scenario where I have two very similar input formats, but I need one Jolt spec to process both formats consistently.
This is input style 1:
{
"creationTime": 1503999158000,
"device": {
"ip": "155.157.36.226",
"hostname": "server-123.example.int"
}
}
and this is input style 2:
{
"creationTime": 1503999158000,
"device": {
"ip6": "2001::face",
"hostname": "server-123.example.int"
}
}
The only difference is that style 1 uses device.ip, and style 2 uses device.ip6. There will always be one or neither of those fields, but never both.
I want to simply extract the following:
{
"created_ts": 1503999158000,
"src_ip_addr": "....."
}
I need src_ip_addr to be set to whichever field was present out of ip and ip6. If neither field was present in the source data, the value should default to null.
Is this possible with a single Jolt spec?
A single spec with two operations.
Spec
[
{
"operation": "shift",
"spec": {
"creationTime": "created_ts",
"device": {
// map ip or ip6 to src_ip_addr
"ip|ip6": "src_ip_addr"
}
}
},
{
"operation": "default",
"spec": {
// if src_ip_addr does not exist, then apply a default of null
"src_ip_addr": null
}
}
]
I tried out the following and it worked for my requirements:
[
{
"operation": "shift",
"spec": {
"creationTime": "created_ts",
"device": {
// map both to src_ip_addr, whichever one is present will be used
"ip": "src_ip_addr",
"ip6": "src_ip_addr"
}
}
},
{
"operation": "default",
"spec": {
"src_ip_addr": null
}
}
]
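For checking either spec against sample data, the required behaviour (use ip or ip6, whichever is present, else null) can be sketched in plain Python. This is not Jolt; the function name extract is made up, and the sketch assumes creationTime is always present and that ip and ip6 never occur together, as stated in the question.

```python
def extract(doc):
    """Pick created_ts and whichever of device.ip / device.ip6 is present."""
    device = doc.get("device", {})
    return {
        "created_ts": doc["creationTime"],
        # Falls back to ip6 when ip is absent, and to None (JSON null) when both are.
        "src_ip_addr": device.get("ip", device.get("ip6")),
    }


if __name__ == "__main__":
    style2 = {"creationTime": 1503999158000, "device": {"ip6": "2001::face"}}
    print(extract(style2))  # {'created_ts': 1503999158000, 'src_ip_addr': '2001::face'}
```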