I am trying to use Jolt to transform an object into an array with one array element per key in the original object. I'd like the output objects to include the original key as a property, and preserve any properties from the source value. And I need to handle three scenarios for the input properties:
"key": null
"key": {}
"key": {...}
Here's an example:
{
"operations": {
"foo": null,
"bar": {},
"baz": {
"arbitrary": 1
}
}
}
And the desired output"
{
"operations": [
{
"operation": "foo"
},
{
"operation": "bar"
},
{
"operation": "baz",
"arbitrary": 1
}
]
}
Note: foo, bar and baz are arbitrary here. It needs to handle any property names inside the operations object.
This is really close to what I want:
[
{
"operation": "default",
"spec": {
"operations": {
"*": {}
}
}
},
{
"operation": "shift",
"spec": {
"operations": {
"*": {
"$": "operations[].operation"
}
}
}
}
]
But it drops "arbitrary": 1 from the baz operation.
Alternately this keeps the properties in the operations, but doesn't add a key for the operation name:
[
{
"operation": "default",
"spec": {
"operations": {
"*": {}
}
}
},
{
"operation": "shift",
"spec": {
"operations": {
"*": {
"#": "operations[]"
}
}
}
}
]
Any help getting both behaviors would be appreciated.
You can use one level of shift transformation spec along with symbolical usage(wildcards) rather than repeated literals such as
[
{
"operation": "shift",
"spec": {
"*s": {
"*": {
"$": "&2[#2].&(2,1)",
"*": "&2[#2].&"
}
}
}
}
]
where
&2 represents going 2 levels up the tree by traversing { signs twice in order to pick the key name operations (if it were only &->eg.identicals &(0) or &(0,0), then it would traverse only the colon and reach $ to grab its value)
[#2] also represents going 2 levels of traversing { signs and : sign, as it's already located on the Right Hand Side of the spec, in order to ask that reached node how many matches it has had
&(2,1) subkey lookup represents going 2 levels up the tree and grab the reached key name of the object by the first argument, and which part of the key, which's partitioned by * wildcard, to use by the second argument. (in this case we produce the literal operation without plural suffix)
* wildcard, which's always on the Left Hand Side, represents the rest of the attributes(else case).
the demo on the site http://jolt-demo.appspot.com is
Related
I thought this would be simple, perhaps it is but on my jolt learning journey i am once again struggling.
I have some json files (without a schema) which can be up to say 30Mb in size which have many thousands of string attributes at all levels of the document some of which (say 20%) which hold booleans as strings types.
I get that i can write a spec to pick out individual ones and convert them as per (post)[https://stackoverflow.com/questions/64972556/convert-boolean-to-string-for-map-values-in-nifi-jolt]
They technique wont work for me as nesting and levels are very arbitrary and there are simply way to many of them.
so how can i apply the data type transform to any attribute which has a boolean represented as a string ?
for example input
{
"name": "Fred",
"age": 45,
"opentowork" : "true",
"friends" : [
{
"name": "penny",
"closefriend": "false"
},
{
"name": "roger",
"farfriend": "true"
}
]
}
to desired
{
"name": "Fred",
"age": 45,
"opentowork" : true,
"friends" : [
{
"name": "penny",
"closefriend": false
},
{
"name": "roger",
"farfriend": true
}
]
}
I want to pick up attributes opentowork, closefriend and farfriend without explicity defining them int the spec, i also need to leave the values of the other attributes as they are (whatever level they are at).
You can use =toBoolean conversion just a bit separating case within the friends array by using "f*" representation from the else case "*" such as
[
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=toBoolean",
"f*": {
"*": {
"*": "=toBoolean"
}
}
}
}
]
or some multiple modify specs, without explicitly defining any attribute/array/object, might be added at the number of desired levels such as
[
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=toBoolean"
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": {
"*": "=toBoolean"
}
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": {
"*": {
"*": "=toBoolean"
}
}
}
}
]
i have an input JSON file , which have some attributes those contains some informations which i want to delete on the output
( example input : Hello_World => output = World )
For example this is the input :
{
"test": "hello_world"
}
and this is output needed :
{
"test" : "world"
}
the result derived from the jolt spec i tried is not even close, this is what i tried :
[
{
"operation": "modify-overwrite-beta",
"spec": {
"test": "=test.substring(test.indexOf(_) + 1)"
}
}
]
Sorry im i a newbie at jolt, just started it.
You can split by _ character within a modify transformation spec, and pick the one with the index 1 in order to display the last component of the arrah considering the current case of having a single underscore such as
[
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=split('_', #(1,&))"
}
},
{
"operation": "shift",
"spec": {
"*": {
"1": "&1" // &1 represents going one level up to reach to the level of the label "test" to replicate it
}
}
}
]
the demo on the site http://jolt-demo.appspot.com/ is :
Considering an input with attribute having value more than one underscore :
{
"test": "how_are_you"
}
[More generic] solution would be as follows :
[
{
"operation": "modify-overwrite-beta",
"spec": {
"test_": "=split('_', #(1,test))",
"test": "=lastElement(#(1,test_))"
}
},
{
// get rid of the extra generated attribute
"operation": "remove",
"spec": {
"test_": ""
}
}
]
the demo on the site http://jolt-demo.appspot.com/ is :
I've already created a spec to convert my JSON input
{
"rows": [
{
"row": [
"row1",
"row2",
"row3"
],
"header": [
"header1",
"header2",
"header3"
]
},
{
"row": [
"row4",
"row5",
"row6"
],
"header": [
"header4",
"header5",
"header6"
]
}
]
}
to convert to key-value pairs as following object result :
{
"header1" : "row1",
"header2" : "row2",
"header3" : "row3",
"header4" : "row4",
"header5" : "row5",
"header6" : "row6"
}
Is this possible to do using Jolt?
Is there a copy/paste error in your input? Judging by your desired output, the second object's header array should be ["header4", "header5", "header6"]. If that's the case, this spec should work:
[
{
"operation": "shift",
"spec": {
"rows": {
"*": {
"header": {
"*": {
"*": {
"#(3,row[#2])": "&"
}
}
}
}
}
}
}
]
One option is to use the following shift transformation spec :
[
{
"operation": "shift",
"spec": {
"*s": { // rows array
"*": {
"&(1,1)": { // row array
"*": {
"#": "#(3,header[&1])"
}
}
}
}
}
}
]
where
"*s": { stands for rows
"&(1,1)": { -> not immediate(zeroth) level but one more level up by using &(1, and grab the value there the first asterisk exists by &(..,1)
"#": "#(3,header[&1])" -> 3 levels needed as stated at the right hand side traverse the colon
as well in order to reach the level of &(1,1) which is used to
represent the "row" array along with &1 representation to go one level up the tree to reach the indexes of the array "row" while matching with the values of "row" through use of # on the left hand side
the demo on the site http://jolt-demo.appspot.com/ is :
I am trying to transform a simple JSon object into an array of objects with keys and values broken out, but I'm not sure how to quite get there.
I have tried this a number of ways but the closest I got was to create an object with two arrays, instead of an array with multiple objects with two properties each:
EDIT: I am trying to write a spec which would take any object, not this specific object. I do not know what the incoming object will be other than it will have simple properties (values will not be arrays or other objects).
Sample Input:
{
"property": "someValue",
"propertyName" : "anotherValue"
}
Expected Output:
{
"split_attributes": [
{
"key" : "property",
"value": "someValue"
},
{
"key" : "propertyName",
"value" : "anotherValue"
}
]
}
My spec so far:
{
"operation": "shift",
"spec": {
"*": {
"$": "split_attributes[#0].key",
"#": "split_attributes[#0].value"
}
}
}
Produces
{
"split_attributes" : [
{
"key" : [ "property", "propertyName" ],
"value" : [ "someValue", "anotherValue"]
}
]
}
SOLUTION
I was pretty close, and after looking at the tests, the solution was obvious (it's identical to one of the tests)
{
"operation": "shift",
"spec": {
"*": {
"$": "split_attributes[#2].key",
"#": "split_attributes[#2].value"
}
}
}
From what it seems, I was creating an array but I was looking at the wrong level for an index to the new array. I'm still fuzzy on the whole # level (for example where in the "tree" (and of which object) is #0, #1 and #2 actually looking).
I have a scenario where I have two very similar inputs formats, but I need one Jolt spec to process both formats consistently.
This is input style 1:
{
"creationTime": 1503999158000,
"device": {
"ip": "155.157.36.226",
"hostname": "server-123.example.int"
}
}
and this is input style 2:
{
"creationTime": 1503999158000,
"device": {
"ip6": "2001::face",
"hostname": "server-123.example.int"
}
}
The only difference is that style 1 uses device.ip, and style 2 uses device.ip6. There will always be one or neither of those fields, but never both.
I want to simply extract the following:
{
"created_ts": 1503999158000,
"src_ip_addr": "....."
}
I need src_ip_addr to be set to whichever field was present out of ip and ip6. If neither field was present in the source data, the value should default to null.
Is this possible with a single Jolt spec?
A single spec with two operations.
Spec
[
{
"operation": "shift",
"spec": {
"creationTime": "created_ts",
"device": {
// map ip or ip6 to src_ip_addr
"ip|ip6": "src_ip_addr"
}
}
},
{
"operation": "default",
"spec": {
// if src_ip_addr does not exist, then apply a default of null
"src_ip_addr": null
}
}
]
I tried out the following and it worked for my requirements:
[
{
"operation": "shift",
"spec": {
"creationTime": "created_ts",
"device": {
// map both to src_ip_addr, whichever one is present will be used
"ip": "src_ip_addr",
"ip6": "src_ip_addr"
}
}
},
{
"operation": "default",
"spec": {
"src_ip_addr": null
}
}
]