Jolt reference first element in array as target name - json

I have been looking at this for a few weeks (in the background) and am stumped on how to convert JSON data approximating a CSV into a tagged set using the NiFi JoltTransformJson processor. What I mean is: use the values in the first row (the first inner array) of the input as the key names of the objects in the output.
As an example, I have this input data:
[
  ["Company", "Retail Cost", "Percentage"],
  ["ABC", "5,368.11", "17.09%"],
  ["DEF", "101.47", "0.32%"],
  ["GHI", "83.79", "0.27%"]
]
And what I am trying to get as output is:
[
  {
    "Company": "ABC",
    "Retail Cost": "5,368.11",
    "Percentage": "17.09%"
  },
  {
    "Company": "DEF",
    "Retail Cost": "101.47",
    "Percentage": "0.32%"
  },
  {
    "Company": "GHI",
    "Retail Cost": "83.79",
    "Percentage": "0.27%"
  }
]
I see this as primarily two problems: getting access to the contents of the first array, and then making sure that the output data does not include that first array.
I would love to post a Jolt spec that gets somewhat close, but the closest I have produces the correct shape of output without the correct content. It looks like this:
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": "[&1].&0"
      }
    }
  }
]
But it results in an output like this:
[ {
"0" : "Company",
"1" : "Retail Cost",
"2" : "Percentage"
}, {
"0" : "ABC",
"1" : "5,368.11",
"2" : "17.09%"
}, {
"0" : "DEF",
"1" : "101.47",
"2" : "0.32%"
}, {
"0" : "GHI",
"1" : "83.79",
"2" : "0.27%"
} ]
This clearly has the wrong key names, and it has one element too many in the output (the header row).
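To see why that attempt produces index keys, here is a rough plain JavaScript emulation of what the "[&1].&0" output path does. It is not the Jolt engine, only an approximation for illustration: "&1" resolves to the index in the outer array, and "&0" to the key matched at the current level, which for an inner array is just the element's index. So every row, header included, becomes an object keyed "0", "1", "2". The function name shiftByIndex is hypothetical.

```javascript
// Rough emulation of the shift spec { "*": { "*": "[&1].&0" } }.
// NOT the Jolt engine; it only mimics this one output path for illustration.
function shiftByIndex(rows) {
  const out = [];
  rows.forEach((row, outerIndex) => {      // "&1": index in the outer array
    const obj = {};
    row.forEach((value, innerIndex) => {   // "&0": matched key = inner array index
      obj[String(innerIndex)] = value;
    });
    out[outerIndex] = obj;                 // the header row is written too
  });
  return out;
}

const input = [
  ["Company", "Retail Cost", "Percentage"],
  ["ABC", "5,368.11", "17.09%"],
  ["DEF", "101.47", "0.32%"],
  ["GHI", "83.79", "0.27%"]
];

console.log(JSON.stringify(shiftByIndex(input), null, 2));
```

Both symptoms are visible here: the keys are positions rather than header names, and the header row survives as the first object.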

It can be done, but wow, it is hard to read; it looks like a terrible regex.
Spec
[
  {
    // This does most of the work, but produces an output
    // array with a null in the zeroth slot.
    "operation": "shift",
    "spec": {
      // Match the first item in the outer array and do
      // nothing with it, because it is just "header" data,
      // e.g. "Company", "Retail Cost", "Percentage".
      // We need to reference it, but not pass it through.
      "0": null,
      //
      // Loop over all the rest of the items in the outer array.
      "*": {
        // This is rather confusing.
        // "*" matches the array indices of the inner array,
        // and we write the value at that index ("ABC", etc.)
        // to "[&1].@(2,[0].[&])".
        // "[&1]" means make the output an array, and write at
        // index &1, which is the index of the outer array we are
        // currently in.
        // Then "look up the key" (Company, Retail Cost) using
        // @(2,[0].[&]),
        // which means go back up the tree to the root, then
        // come back down into the first item of the outer array
        // and index it by the array index of the current
        // inner array element we are at.
        "*": "[&1].@(2,[0].[&])"
      }
    }
  },
  {
    // We know the first item in the array will be null / junk,
    // because the first item in the input array was "header" info.
    // So we match the first item and discard it, and then accumulate
    // everything else into a new array.
    "operation": "shift",
    "spec": {
      "0": null,
      "*": "[]"
    }
  }
]
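The two passes above can be mirrored in plain JavaScript to make the data flow easier to follow. This is a sketch of the intended result, not the Jolt engine: pass one skips the header row and writes each value under the key found at the same index in the header row, keeping the outer index (so slot 0 stays empty); pass two drops that slot and re-packs the rest. The function name rowsToObjects is hypothetical, and the sketch assumes every data row has the same length as the header row.

```javascript
// Rough emulation of the two shift passes above; NOT the Jolt engine.
// Assumption: every data row has the same length as the header row.
function rowsToObjects(rows) {
  const header = rows[0];
  const pass1 = [];
  rows.forEach((row, outerIndex) => {
    if (outerIndex === 0) return; // "0": null -> the header row is not passed through
    const obj = {};
    row.forEach((value, innerIndex) => {
      obj[header[innerIndex]] = value; // key looked up in the header row at the same index
    });
    pass1[outerIndex] = obj; // "[&1]" keeps the outer index, so pass1[0] stays empty
  });
  // Second pass: discard slot 0 and re-pack the remaining objects into a new array.
  return pass1.slice(1);
}

const input = [
  ["Company", "Retail Cost", "Percentage"],
  ["ABC", "5,368.11", "17.09%"],
  ["DEF", "101.47", "0.32%"],
  ["GHI", "83.79", "0.27%"]
];

console.log(JSON.stringify(rowsToObjects(input), null, 2));
```

Doing the lookup and the compaction in one function is simpler in JavaScript; Jolt needs the second shift only because the first one preserves the outer indices.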

Related

How to use $ and * at the same level in a spec

I am new to Jolt, and while I like a lot of it, one thing that's really hurting me right now is how to use * and $ at the same level in a spec. I have the following input and desired output. But try as I might, I cannot seem to both transform the list of action IDs (the "1" and "2") into attribute values and move the list of action data associated with each ID into a sub-attribute.
Input
{
  "Attr1": "Attr1_data",
  "Actions": {
    "1": [
      "Action data 1 line 1",
      "Action data 1 line 2",
      "Action data 1 line 3"
    ],
    "2": [
      "Action data 2 line 1",
      "Action data 2 line 2",
      "Action data 2 line 3"
    ]
  },
  "Attr2": "Attr2_data"
}
Desired output
{
  "Attr1": "Attr1_data",
  "Action": [
    {
      "id": "1",
      "data": [
        "Action data 1 line 1",
        "Action data 1 line 2",
        "Action data 1 line 3"
      ]
    },
    {
      "id": "2",
      "data": [
        "Action data 2 line 1",
        "Action data 2 line 2",
        "Action data 2 line 3"
      ]
    }
  ],
  "Attr2": "Attr2_data"
}
Using the following spec,
[
  {
    "operation": "shift",
    "spec": {
      "Actions": {
        "*": {
          "$": "Action[].id"
        }
      },
      "*": "&"
    }
  }
]
I can generate:
{
  "Attr1": "Attr1_data",
  "Action": [
    {
      "id": "1"
    },
    {
      "id": "2"
    }
  ],
  "Attr2": "Attr2_data"
}
But try as I might, I cannot seem to copy the data lines into a new data attribute.
Can anyone please point me in the right direction?
You can convert yours to the following transformation spec:
[
  {
    "operation": "shift",
    "spec": {
      "*s": { // matches a tag (of an object/array/attribute) with a trailing letter "s". The reason for this form is to be able to use "Action" as the key of the inner array without writing it out repeatedly.
        "*": {
          "$": "&(2,1)[#2].id", // "$" looks one level up and copies the tag name ("1", "2"); &(2,1) goes two levels up the tree and picks the part matched by the first asterisk ("Action"); [#2] goes two levels up, to the level of the "Actions" object, so that the subelements distributed from that level are combined into an array under the "Action" (&(2,1)) label; the leaf "id" is the key of the current attribute
          "*": "&(2,1)[#2].data"
        }
      },
      "*": "&" // the rest of the attributes (objects/arrays) other than "Actions"
    }
  }
]
You can try this spec on the demo site http://jolt-demo.appspot.com/.
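For readers who think more easily in code than in Jolt paths, the spec's effect can be approximated in plain JavaScript. This is a sketch, not the Jolt engine: a top-level key ending in "s" is treated like the "*s" match, the part before the "s" becomes the output key, each child key becomes "id" and its array becomes "data", and every other key passes through unchanged. The function name reshapeActions is hypothetical, and the sketch assumes any key ending in "s" holds an object of id-to-array entries.

```javascript
// Rough emulation of the answer's shift spec; NOT the Jolt engine.
// Assumption: any top-level key ending in "s" holds an object of id -> array.
function reshapeActions(input) {
  const out = {};
  for (const [key, value] of Object.entries(input)) {
    if (key.endsWith("s")) {
      // "*s" matched: the part matched by "*" ("Action") becomes the output key.
      const label = key.slice(0, -1);
      // Each child key ("1", "2") becomes "id"; its array becomes "data".
      out[label] = Object.entries(value).map(([id, data]) => ({ id, data }));
    } else {
      // "*": "&" passes every other key through unchanged.
      out[key] = value;
    }
  }
  return out;
}

const input = {
  Attr1: "Attr1_data",
  Actions: {
    "1": ["Action data 1 line 1", "Action data 1 line 2", "Action data 1 line 3"],
    "2": ["Action data 2 line 1", "Action data 2 line 2", "Action data 2 line 3"]
  },
  Attr2: "Attr2_data"
};

console.log(JSON.stringify(reshapeActions(input), null, 2));
```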

How to perform an if-else drop operation in Apache NiFi

I have a use case where I have a couple of key values and need to perform an if-else operation on them. If the condition is not matched, the whole content should be dropped; otherwise the content should be passed through as the result.
Input JSON:
{
  "id": 30006,
  "SourceName": "network",
  "Number": 1,
  "SourceNameCopy": "network",
  "currenttime": "Thu Aug 30 21:19:27 IST 2022"
}
My Jolt spec:
[
  {
    "operation": "shift",
    "spec": {
      "SourceNameCopy": {
        "network": {
          "#1": "&2",
          "#id": "id",
          "#SourceName": "SourceName",
          "#Number": "Number",
          "#currenttime": "currenttime"
        },
        "hardware": {
          "#1": "&2",
          "#id": "id",
          "#SourceName": "SourceName",
          "#Number": "Number",
          "#currenttime": "currenttime"
        }
      }
    }
  }
]
Expected output:
If the condition matched:
{
  "id": 30006,
  "SourceName": "network",
  "Number": 1,
  "SourceNameCopy": "network",
  "currenttime": "Thu Aug 30 21:19:27 IST 2022"
}
Otherwise (condition not matched), drop the whole event as null.
Problem statement:
The key values are coming out as literal strings (for example "id": "id"), but the output should contain the actual values from the input.
If your aim is to check whether the value of SourceNameCopy matches one of the fixed cases network or hardware, combine them with the OR operator (|) and compare as in the following shift transformation spec:
[
{
"operation": "shift",
"spec": {
"SourceNameCopy": {
"network|hardware": {
"#2": "" // bring the whole value after going two levels up the tree
}
}
}
}
]
There is no need to include anything about the other cases; they return null automatically.
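The pass-or-drop behaviour of that spec can be sketched in plain JavaScript. This is an approximation for illustration, not the Jolt engine: the allowed list mirrors "network|hardware", a match copies the whole input object through, and anything else yields null, matching the answer's point that unmatched cases return null. The names ALLOWED_SOURCES and passOrDrop are hypothetical.

```javascript
// Rough emulation of the answer's shift spec; NOT the Jolt engine.
const ALLOWED_SOURCES = ["network", "hardware"]; // mirrors "network|hardware"

function passOrDrop(event) {
  // On a match, the whole input object is copied through unchanged;
  // any other SourceNameCopy value yields null (the event is dropped).
  return ALLOWED_SOURCES.includes(event.SourceNameCopy) ? { ...event } : null;
}

const event = {
  id: 30006,
  SourceName: "network",
  Number: 1,
  SourceNameCopy: "network",
  currenttime: "Thu Aug 30 21:19:27 IST 2022"
};

console.log(passOrDrop(event));
console.log(passOrDrop({ ...event, SourceNameCopy: "other" })); // null
```

Note that the values (30006, 1, ...) come through as-is, which is what the asker wanted instead of the literal strings.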

How can I use "not equal" condition while filtering array using JOLT specification

I want to filter a JSON array using a Jolt transformation where the condition is negative. In the example below, I want only the records whose Url value is not equal to "Not Available."
{
  "Photos": [
    {
      "Id": "327703",
      "Caption": "TEST>> photo 1",
      "Url": "Not Available."
    },
    {
      "Id": "327704",
      "Caption": "TEST>> photo 2",
      "Url": "http://bob.com/0001/327704/photo.jpg"
    },
    {
      "Id": "327705",
      "Caption": "TEST>> photo 3",
      "Url": "http://bob.com/0001/327705/photo.jpg"
    }
  ]
}
Take a look at the very similar question "Removing Elements from array based on a condition". Based on it, you can solve this as follows:
[
  {
    "operation": "shift",
    "spec": {
      "Photos": {
        // loop through all the photos
        "*": {
          // for each Url
          "Url": {
            // for "Not Available." do nothing
            "Not Available.": null,
            // in any other case, pass the whole photo through
            "*": {
              "@2": "Photos[]"
            }
          }
        }
      }
    }
  }
]
Generally, when you want to negate a filter, you match the values to exclude and map them to null, which skips those items, and let a "*" branch pass everything else through.
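The same negated filter, written as a plain JavaScript sketch (an approximation for illustration, not the Jolt engine): a Url equal to the literal "Not Available." is skipped, and any other photo is copied whole into the new Photos array. The function name dropUnavailablePhotos is hypothetical.

```javascript
// Rough emulation of the negated filter; NOT the Jolt engine.
function dropUnavailablePhotos(doc) {
  const photos = [];
  for (const photo of doc.Photos) {
    if (photo.Url === "Not Available.") continue; // "Not Available.": null -> skipped
    photos.push(photo);                           // "*" branch -> whole photo into Photos[]
  }
  return { Photos: photos };
}

const input = {
  Photos: [
    { Id: "327703", Caption: "TEST>> photo 1", Url: "Not Available." },
    { Id: "327704", Caption: "TEST>> photo 2", Url: "http://bob.com/0001/327704/photo.jpg" },
    { Id: "327705", Caption: "TEST>> photo 3", Url: "http://bob.com/0001/327705/photo.jpg" }
  ]
};

console.log(JSON.stringify(dropUnavailablePhotos(input).Photos.map((p) => p.Id))); // ["327704","327705"]
```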

Break down JSON properties to array of objects

I am trying to transform a simple JSON object into an array of objects with the keys and values broken out, but I'm not sure quite how to get there.
I have tried this a number of ways, but the closest I got was an object with two arrays, instead of an array of objects with two properties each:
EDIT: I am trying to write a spec which would take any object, not this specific object. I do not know what the incoming object will be other than it will have simple properties (values will not be arrays or other objects).
Sample Input:
{
  "property": "someValue",
  "propertyName": "anotherValue"
}
Expected Output:
{
  "split_attributes": [
    {
      "key": "property",
      "value": "someValue"
    },
    {
      "key": "propertyName",
      "value": "anotherValue"
    }
  ]
}
My spec so far:
{
  "operation": "shift",
  "spec": {
    "*": {
      "$": "split_attributes[#0].key",
      "@": "split_attributes[#0].value"
    }
  }
}
Produces
{
  "split_attributes": [
    {
      "key": ["property", "propertyName"],
      "value": ["someValue", "anotherValue"]
    }
  ]
}
SOLUTION
I was pretty close, and after looking at the tests, the solution was obvious (it's identical to one of the tests)
{
  "operation": "shift",
  "spec": {
    "*": {
      "$": "split_attributes[#2].key",
      "@": "split_attributes[#2].value"
    }
  }
}
From what it seems, I was creating an array, but I was looking at the wrong level for the index into the new array. I'm still fuzzy on the whole # level thing (for example, where in the "tree", and of which object, #0, #1 and #2 are actually looking).
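On the # level question: the Jolt shift documentation describes "[#2]" as going up the tree two levels, asking that node how many matches it has had, and using that count as the array index. The effect in the working spec is one new array slot per matched top-level property. The plain JavaScript sketch below mimics that with a running match counter; it is a simplification of the idea, not the Jolt engine, and the name splitAttributes is hypothetical.

```javascript
// Rough emulation of the working spec; NOT the Jolt engine.
// A running match count plays the role of [#2]: one array slot per property.
function splitAttributes(obj) {
  const split_attributes = [];
  let matchCount = 0;
  for (const [key, value] of Object.entries(obj)) {
    // the matched key and its value land in the same slot
    split_attributes[matchCount] = { key, value };
    matchCount += 1;
  }
  return { split_attributes };
}

const input = { property: "someValue", propertyName: "anotherValue" };
console.log(JSON.stringify(splitAttributes(input)));
// {"split_attributes":[{"key":"property","value":"someValue"},{"key":"propertyName","value":"anotherValue"}]}
```

Because the function never looks at specific key names, it works for any flat object, which matches the asker's requirement of a generic spec.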

JSON Array Structure Variations

Below are 3 JSON Array structure formats...
The first one, the one outlined at JSON.org, is the one I am familiar with:
Format #1
{"People": [
  {
    "name": "Sally",
    "age": "10"
  },
  {
    "name": "Greg",
    "age": "10"
  }
]}
The second one is a slight variation that names the elements of the array. I personally don't care for it; you don't name elements of an array in code (they are accessed by index), why name them in JSON?
Format #2
{"People": [
  "Person1": {
    "name": "Sally",
    "age": "10"
  },
  "Person2": {
    "name": "Greg",
    "age": "10"
  }
]}
This last one is another variation, quite similar to Format #2, but I have a hunch this one is incorrect because it appears to have extra curly braces where they do not belong.
Format #3
{"People": [
  {
    "Person1": {
      "name": "Sally",
      "age": "10"
    }
  },
  {
    "Person2": {
      "name": "Greg",
      "age": "10"
    }
  }
]}
Again, I'm confident that Format #1 is valid as it is the JSON Array format outlined at JSON.org. However, what about Format #2 and Format #3? Are either of those considered valid JSON? If yes, where did those formats come from? I do not see them outlined at JSON.org or on Wikipedia.
Both #1 and #3 are valid JSON as shown, but they encode different structures:
#1 gives you an Array of Objects, each with name and age String properties
#3 gives you an Array of Objects, each with a single Object property, each with name and age String properties.
#2 is invalid: arrays (delimited by [ ... ]) may not contain property names.
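A quick way to check these claims is to feed each format to a JSON parser. The sketch below uses JavaScript's built-in JSON.parse on condensed copies of the three formats (same structure, fewer lines); the helper name isValidJson is hypothetical. #2 should throw a SyntaxError, while #1 and #3 parse into different shapes.

```javascript
const format1 = `{"People": [
  {"name": "Sally", "age": "10"},
  {"name": "Greg", "age": "10"}
]}`;

const format2 = `{"People": [
  "Person1": {"name": "Sally", "age": "10"},
  "Person2": {"name": "Greg", "age": "10"}
]}`;

const format3 = `{"People": [
  {"Person1": {"name": "Sally", "age": "10"}},
  {"Person2": {"name": "Greg", "age": "10"}}
]}`;

// Returns true if the text parses as JSON, false if JSON.parse throws.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

for (const [name, text] of [["#1", format1], ["#2", format2], ["#3", format3]]) {
  console.log(`Format ${name}: ${isValidJson(text) ? "valid" : "invalid"}`);
}
// Format #1: valid
// Format #2: invalid
// Format #3: valid

// The two valid formats need different access paths:
console.log(JSON.parse(format1).People[0].name);         // Sally
console.log(JSON.parse(format3).People[0].Person1.name); // Sally
```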
Solution for Format #1
The literal syntax is:
var array = [];  // an empty array
var object = {}; // an empty object
JavaScript code:
var Json = {
  People: []
};
Json.People.push({
  "name": "Sally",
  "age": "10"
});
Json.People.push({
  "name": "Greg",
  "age": "10"
});
JSON Result:
{"People":
  [
    {
      "name": "Sally",
      "age": "10"
    },
    {
      "name": "Greg",
      "age": "10"
    }
  ]
}