transforming all string attributes which are boolean to true booleans - json

I thought this would be simple, perhaps it is but on my jolt learning journey i am once again struggling.
I have some json files (without a schema) which can be up to say 30Mb in size which have many thousands of string attributes at all levels of the document some of which (say 20%) which hold booleans as strings types.
I get that i can write a spec to pick out individual ones and convert them as per (post)[https://stackoverflow.com/questions/64972556/convert-boolean-to-string-for-map-values-in-nifi-jolt]
They technique wont work for me as nesting and levels are very arbitrary and there are simply way to many of them.
so how can i apply the data type transform to any attribute which has a boolean represented as a string ?
for example input
{
"name": "Fred",
"age": 45,
"opentowork" : "true",
"friends" : [
{
"name": "penny",
"closefriend": "false"
},
{
"name": "roger",
"farfriend": "true"
}
]
}
to desired
{
"name": "Fred",
"age": 45,
"opentowork" : true,
"friends" : [
{
"name": "penny",
"closefriend": false
},
{
"name": "roger",
"farfriend": true
}
]
}
I want to pick up attributes opentowork, closefriend and farfriend without explicity defining them int the spec, i also need to leave the values of the other attributes as they are (whatever level they are at).

You can use =toBoolean conversion just a bit separating case within the friends array by using "f*" representation from the else case "*" such as
[
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=toBoolean",
"f*": {
"*": {
"*": "=toBoolean"
}
}
}
}
]
or some multiple modify specs, without explicitly defining any attribute/array/object, might be added at the number of desired levels such as
[
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=toBoolean"
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": {
"*": "=toBoolean"
}
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": {
"*": {
"*": "=toBoolean"
}
}
}
}
]

Related

Nifi Jolt convert after subtract

[
{
"dst_cnt": "149125"
},
{
"src_cnt": "149136"
},
{
"TABLENAME": "NAME"
}
]
I want to subtract dst_cnt and src_cnt from this data with NIfi jolt. Is data operation possible after type conversion in NIfi?
Yes, it's possible. You can try the following transformatios spec
[
{
// combine individual attributes within the common object
"operation": "shift",
"spec": {
"*": {
"*": "&"
}
}
},
{
// sum up the the integer values after determining the negative form of "dst_cnt"
"operation": "modify-overwrite-beta",
"spec": {
"dst_cnt_": "=divide(#(1,dst_cnt),-1)",
"cnt_dif": "=intSum(#(1,dst_cnt_),#(1,src_cnt))"
}
},
{
// only pick the result derived from the subtraction
"operation": "shift",
"spec": {
"cnt_*": "&"
}
}
]

Jolt transform specification input

i have the following input json:
{
"tags": {
"event": "observation",
"source": "hunter"
}
}
The output JSON should look like below:
{
"tags" : [ "event:observation", "source:hunter" ]
}
can anyone provide any guidance on how to build a proper jolt specification for the above?
thank you very much for the help ^_^
You can use this specification
[
{ // combine each key-value pair under within common arrays
"operation": "shift",
"spec": {
"tags": {
"*": {
"$": "&2_&1",
"#": "&2_&1"
}
}
}
},
{ // concatenate key-value pairs by colon characters
"operation": "modify-overwrite-beta",
"spec": {
"*": "=join(':',#(1,&))"
}
},
{
"operation": "shift",
"spec": { // make array key common("tags") for all arrays
// through use of _ seperator and * wildcard
"*_*": "&(0,1)"
}
}
]
the demo on the site http://jolt-demo.appspot.com/ is

Convert object property to array with one element per key

I am trying to use Jolt to transform an object into an array with one array element per key in the original object. I'd like the output objects to include the original key as a property, and preserve any properties from the source value. And I need to handle three scenarios for the input properties:
"key": null
"key": {}
"key": {...}
Here's an example:
{
"operations": {
"foo": null,
"bar": {},
"baz": {
"arbitrary": 1
}
}
}
And the desired output"
{
"operations": [
{
"operation": "foo"
},
{
"operation": "bar"
},
{
"operation": "baz",
"arbitrary": 1
}
]
}
Note: foo, bar and baz are arbitrary here. It needs to handle any property names inside the operations object.
This is really close to what I want:
[
{
"operation": "default",
"spec": {
"operations": {
"*": {}
}
}
},
{
"operation": "shift",
"spec": {
"operations": {
"*": {
"$": "operations[].operation"
}
}
}
}
]
But it drops "arbitrary": 1 from the baz operation.
Alternately this keeps the properties in the operations, but doesn't add a key for the operation name:
[
{
"operation": "default",
"spec": {
"operations": {
"*": {}
}
}
},
{
"operation": "shift",
"spec": {
"operations": {
"*": {
"#": "operations[]"
}
}
}
}
]
Any help getting both behaviors would be appreciated.
You can use one level of shift transformation spec along with symbolical usage(wildcards) rather than repeated literals such as
[
{
"operation": "shift",
"spec": {
"*s": {
"*": {
"$": "&2[#2].&(2,1)",
"*": "&2[#2].&"
}
}
}
}
]
where
&2 represents going 2 levels up the tree by traversing { signs twice in order to pick the key name operations (if it were only &->eg.identicals &(0) or &(0,0), then it would traverse only the colon and reach $ to grab its value)
[#2] also represents going 2 levels of traversing { signs and : sign, as it's already located on the Right Hand Side of the spec, in order to ask that reached node how many matches it has had
&(2,1) subkey lookup represents going 2 levels up the tree and grab the reached key name of the object by the first argument, and which part of the key, which's partitioned by * wildcard, to use by the second argument. (in this case we produce the literal operation without plural suffix)
* wildcard, which's always on the Left Hand Side, represents the rest of the attributes(else case).
the demo on the site http://jolt-demo.appspot.com is

How can I combine two arrays to create a key value pair with Jolt?

I've already created a spec to convert my JSON input
{
"rows": [
{
"row": [
"row1",
"row2",
"row3"
],
"header": [
"header1",
"header2",
"header3"
]
},
{
"row": [
"row4",
"row5",
"row6"
],
"header": [
"header4",
"header5",
"header6"
]
}
]
}
to convert to key-value pairs as following object result :
{
"header1" : "row1",
"header2" : "row2",
"header3" : "row3",
"header4" : "row4",
"header5" : "row5",
"header6" : "row6"
}
Is this possible to do using Jolt?
Is there a copy/paste error in your input? Judging by your desired output, the second object's header array should be ["header4", "header5", "header6"]. If that's the case, this spec should work:
[
{
"operation": "shift",
"spec": {
"rows": {
"*": {
"header": {
"*": {
"*": {
"#(3,row[#2])": "&"
}
}
}
}
}
}
}
]
One option is to use the following shift transformation spec :
[
{
"operation": "shift",
"spec": {
"*s": { // rows array
"*": {
"&(1,1)": { // row array
"*": {
"#": "#(3,header[&1])"
}
}
}
}
}
}
]
where
"*s": { stands for rows
"&(1,1)": { -> not immediate(zeroth) level but one more level up by using &(1, and grab the value there the first asterisk exists by &(..,1)
"#": "#(3,header[&1])" -> 3 levels needed as stated at the right hand side traverse the colon
as well in order to reach the level of &(1,1) which is used to
represent the "row" array along with &1 representation to go one level up the tree to reach the indexes of the array "row" while matching with the values of "row" through use of # on the left hand side
the demo on the site http://jolt-demo.appspot.com/ is :

Jolt spec for conditional presence of field

I have a scenario where I have two very similar inputs formats, but I need one Jolt spec to process both formats consistently.
This is input style 1:
{
"creationTime": 1503999158000,
"device": {
"ip": "155.157.36.226",
"hostname": "server-123.example.int"
}
}
and this is input style 2:
{
"creationTime": 1503999158000,
"device": {
"ip6": "2001::face",
"hostname": "server-123.example.int"
}
}
The only difference is that style 1 uses device.ip, and style 2 uses device.ip6. There will always be one or neither of those fields, but never both.
I want to simply extract the following:
{
"created_ts": 1503999158000,
"src_ip_addr": "....."
}
I need src_ip_addr to be set to whichever field was present out of ip and ip6. If neither field was present in the source data, the value should default to null.
Is this possible with a single Jolt spec?
A single spec with two operations.
Spec
[
{
"operation": "shift",
"spec": {
"creationTime": "created_ts",
"device": {
// map ip or ip6 to src_ip_addr
"ip|ip6": "src_ip_addr"
}
}
},
{
"operation": "default",
"spec": {
// if src_ip_addr does not exist, then apply a default of null
"src_ip_addr": null
}
}
]
I tried out the following and it worked for my requirements:
[
{
"operation": "shift",
"spec": {
"creationTime": "created_ts",
"device": {
// map both to src_ip_addr, whichever one is present will be used
"ip": "src_ip_addr",
"ip6": "src_ip_addr"
}
}
},
{
"operation": "default",
"spec": {
"src_ip_addr": null
}
}
]