How to perform an if-else drop operation in Apache NiFi - JSON

I have a use case where I have a couple of key-value pairs and need to perform an if-else operation on them. If the condition is not matched, the whole content should be dropped; otherwise the content should pass through as the result.
Input JSON :
{
"id": 30006,
"SourceName": "network",
"Number": 1,
"SourceNameCopy": "network",
"currenttime": "Thu Aug 30 21:19:27 IST 2022"
}
My Jolt Spec :
[
{
"operation": "shift",
"spec": {
"SourceNameCopy": {
"network": {
"#1": "&2",
"#id": "id",
"#SourceName": "SourceName",
"#Number": "Number",
"#currenttime": "currenttime"
},
"hardware": {
"#1": "&2",
"#id": "id",
"#SourceName": "SourceName",
"#Number": "Number",
"#currenttime": "currenttime"
}
}
}
}
]
Expected output :
if condition matched :
{
"id": 30006,
"SourceName": "network",
"Number": 1,
"SourceNameCopy": "network",
"currenttime": "Thu Aug 30 21:19:27 IST 2022"
}
Else (condition not matched): drop the whole event as null.
Problem Statement :
The values in the output come out as literal strings; the output should contain the actual values from the input instead.

If your aim is to check whether the value of SourceNameCopy matches one of the fixed cases network or hardware, then join them with an OR operator (|) and compare them within a shift transformation spec as follows:
[
{
"operation": "shift",
"spec": {
"SourceNameCopy": {
"network|hardware": {
"@2": "" // bring the whole value after going two levels up the tree
}
}
}
}
]
There is no need to include anything for the other cases; they return null automatically.
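To make the intended behaviour concrete, here is a minimal sketch of the same if-else drop logic in plain Python, outside NiFi. The names keep_or_drop and ALLOWED are made up for this illustration and are not part of Jolt or NiFi.

```python
ALLOWED = {"network", "hardware"}  # the fixed cases used in the spec above


def keep_or_drop(event):
    """Return the event unchanged if SourceNameCopy matches, else None (dropped)."""
    return event if event.get("SourceNameCopy") in ALLOWED else None


event = {
    "id": 30006,
    "SourceName": "network",
    "Number": 1,
    "SourceNameCopy": "network",
    "currenttime": "Thu Aug 30 21:19:27 IST 2022",
}

matched = keep_or_drop(event)                                     # condition matched
dropped = keep_or_drop({**event, "SourceNameCopy": "software"})   # condition not matched
print(matched)
print(dropped)
```

The matched event keeps its original value types (id and Number stay numeric), which is what the question asks for.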

Related

How to get column value of key in JOLT

I'm looking to break up the following nested JSON file and transform it into a SQL-ready format.
Input JSON file:
{
"Product1": {
"Purchase": 31
},
"Product2": {
"Purchase": 6213,
"Cancel": 1988,
"Change": 3702,
"Renewal": 5934
}
}
Desired output:
[
{
"product": "Product1",
"Purchase": 31
},
{
"product": "Product2",
"Purchase": 6213,
"Cancel": 1988,
"Change": 3702,
"Renewal": 5934
}
]
What you need is a $ wildcard within a shift transformation spec in order to replicate the key of the current attribute, such as
[
{
"operation": "shift",
"spec": {
"*": {
"$": "[#2].product", // $ grabs the key name of the current level (e.g. Product1)
"*": "[#2].&" // keeps the current attributes, placing them in objects nested within a common array
}
}
}
]
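For comparison, the same reshaping can be sketched in a few lines of Python. This only illustrates the expected result, it is not how Jolt works internally, and flatten_products is a name made up for the example.

```python
def flatten_products(data):
    """Turn {"Product1": {...}, ...} into [{"product": "Product1", ...}, ...]."""
    return [{"product": name, **attrs} for name, attrs in data.items()]


source = {
    "Product1": {"Purchase": 31},
    "Product2": {"Purchase": 6213, "Cancel": 1988, "Change": 3702, "Renewal": 5934},
}

rows = flatten_products(source)
print(rows)
```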

Nifi jolt transformation json array for each element

I have the following input in the NiFi Jolt Specification processor :
{
"transaction_id": 53279810162,
"bets": [
{
"event_name": "Mawkhar Sc - Rangdajied United FC (live)",
"match_start": "1660905000000",
"game_name": "Handicap ? £",
"outcome_name": "2 (3:0)"
},
{
"event_name": "University Azzurri Fc - Hellenic Athletic Club (live)",
"match_start": "1660905000000",
"game_name": "Handicap ? £",
"outcome_name": "2 (3:0)"
}
],
"user_id": 1009425,
"bet_type": "Multi"
}
and from this input I want to get such output :
{
"transaction_id": 53279810162,
"bets": [
{
"event_name": "Mawkhar Sc - Rangdajied United FC (live)",
"match_start": "1660905000000",
"outcomes": [
{
"outcome_name": "2 (3:0)",
"game_name": "Handicap ? £"
}
]
},
{
"event_name": "University Azzurri Fc - Hellenic Athletic Club (live)",
"match_start": "1660905000000",
"outcomes": [
{
"outcome_name": "2 (3:0)",
"game_name": "Handicap ? £"
}
]
}
],
"user_id": 1009425,
"bet_type": "Multi"
}
For each element of "bets" I need to create a new array named "outcomes" that holds game_name and outcome_name.
Can you explain how I can produce such output?
You can use a single shift transformation such as
[
{
"operation": "shift",
"spec": {
"*": "&", // the "else" case(the attributes other than "bets")
"bets": {
"*": {
"*": "&2[&1].&", // &2 represents "bets", and [&1] indexes of it
"ou*|ga*": "&2[&1].outcomes[#].&" // # replicates the indexes of the newly created array "outcomes", and & substitutes the names of the current attributes starting with "ou" OR "ga"
}
}
}
}
]
You can try this spec on the demo site http://jolt-demo.appspot.com/.
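If it helps to see the target shape spelled out, here is a rough Python equivalent of the regrouping. It is only a sketch: regroup_bets and NESTED are hypothetical names, and it matches the two attribute names exactly rather than by the "ou*" / "ga*" prefixes used in the spec.

```python
NESTED = ("outcome_name", "game_name")  # attributes that move into "outcomes"


def regroup_bets(doc):
    """Copy the document, moving NESTED attributes of each bet into an "outcomes" array."""
    result = dict(doc)
    result["bets"] = [
        {
            **{k: v for k, v in bet.items() if k not in NESTED},
            "outcomes": [{k: bet[k] for k in NESTED if k in bet}],
        }
        for bet in doc.get("bets", [])
    ]
    return result


doc = {
    "transaction_id": 53279810162,
    "bets": [
        {
            "event_name": "Mawkhar Sc - Rangdajied United FC (live)",
            "match_start": "1660905000000",
            "game_name": "Handicap",
            "outcome_name": "2 (3:0)",
        }
    ],
    "user_id": 1009425,
    "bet_type": "Multi",
}

result = regroup_bets(doc)
print(result)
```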

In Logic Apps, parsing a JSON array throws an error for a single object but works fine for multiple objects

While parsing JSON in an Azure Logic App, my array can contain a single value/object or multiple ones (Box, as shown in the example below).
Both types of input are correct, but when only a single object arrives, it throws the error "Invalid type. Expected Array but got Object".
Input 1 (Throwing error) : -
{
"MyBoxCollection":
{
"Box":{
"BoxName": "Box 1"
}
}
}
Input 2 (Working Fine) : -
{
"MyBoxCollection":
[
{
"Box":{
"BoxName": "Box 1"
},
"Box":{
"BoxName": "Box 2"
}
}]
}
JSON Schema :
"MyBoxCollection": {
"type": "object",
"properties": {
"box": {
"type": "array",
"items": {
"type": "object",
"properties": {
"BoxName": {
"type": "string"
},
......
.....
..
}
Error Details :-
[
{
"message": "Invalid type. Expected Array but got Object .",
"lineNumber": 0,
"linePosition": 0,
"path": "Order.MyBoxCollection.Box",
"schemaId": "#/properties/Root/properties/MyBoxCollection/properties/Box",
"errorType": "type",
"childErrors": []
}
]
I used to use the trick of injecting a couple of dummy rows in the resultset as suggested by the other posts, but I recently found a better way. Kudos to Thomas Prokov for providing the inspiration in his NETWORG blog post.
The JSON parse schema accepts multiple choices as type, so simply replace
"type": "array"
with
"type": ["array","object"]
and your parse step will happily parse either an array or a single value (or no value at all).
You may then need to identify which scenario you're in: 0, 1 or multiple records in the resultset? I'm pasting below how you can create a variable (ResultsetSize) which takes one of 3 values (rs_0, rs_1 or rs_n) for your switch:
"Initialize_ResultsetSize": {
"inputs": {
"variables": [
{
"name": "ResultsetSize",
"type": "string",
"value": "rs_n"
}
]
},
"runAfter": {
"<replace_with_name_of_previous_action>": [
"Succeeded"
]
},
"type": "InitializeVariable"
},
"Check_if_resultset_is_0_or_1_records": {
"actions": {
"Set_ResultsetSize_to_0": {
"inputs": {
"name": "ResultsetSize",
"value": "rs_0"
},
"runAfter": {},
"type": "SetVariable"
}
},
"else": {
"actions": {
"Set_ResultsetSize_to_1": {
"inputs": {
"name": "ResultsetSize",
"value": "rs_1"
},
"runAfter": {},
"type": "SetVariable"
}
}
},
"expression": {
"and": [
{
"equals": [
"@string(body('<replace_with_name_of_Parse_JSON_action>')?['<replace_with_name_of_root_element>']?['<replace_with_name_of_list_container_element>']?['<replace_with_name_of_item_element>']?['<replace_with_non_null_element_or_attribute>'])",
""
]
}
]
},
"runAfter": {
"Initialize_ResultsetSize": [
"Succeeded"
]
},
"type": "If"
},
"Process_resultset_depending_on_ResultsetSize": {
"cases": {
"Case_no_record": {
"actions": {
},
"case": "rs_0"
},
"Case_one_record_only": {
"actions": {
},
"case": "rs_1"
}
},
"default": {
"actions": {
}
},
"expression": "@variables('ResultsetSize')",
"runAfter": {
"Check_if_resultset_is_0_or_1_records": [
"Succeeded",
"Failed",
"Skipped",
"TimedOut"
]
},
"type": "Switch"
}
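The three cases that ResultsetSize distinguishes can also be sketched outside Logic Apps. The Python below is only an illustration of the idea (treat a missing value as empty, wrap a single object in a list, leave a list alone); as_list and resultset_size are hypothetical helper names, not Logic Apps functions.

```python
def as_list(value):
    """Normalise a value that may be missing, a single object, or a list into a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def resultset_size(value):
    """Map the normalised length onto the rs_0 / rs_1 / rs_n labels used above."""
    count = len(as_list(value))
    if count == 0:
        return "rs_0"
    return "rs_1" if count == 1 else "rs_n"


single = {"BoxName": "Box 1"}
many = [{"BoxName": "Box 1"}, {"BoxName": "Box 2"}]

print(resultset_size(None), resultset_size(single), resultset_size(many))
```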
For this problem, I found another Stack Overflow post that describes a similar issue. When there is only one "Box", it is rendered as a {key/value pair} rather than an [array] when converted to JSON. I think this is by design, so maybe you can just add a dummy "Box" record at the source of your XML data, such as:
<Box>specific_test</Box>
and then filter out "specific_test" in the next steps.
Another workaround for your reference:
If your JSON data has only one array, you can use that to make the decision: check whether the JSON data contains the "[" character. If it does, the return value is the index of the "[" character; if it doesn't, the return value is -1.
The expression is shown below:
indexOf('{"MyBoxCollection":{"Box":[aaa,bbb]}}', '[')
When the data doesn't contain "[", the expression returns -1.
Then we can add an "If" condition: if the result is greater than 0, do "Parse JSON" with one schema; if it is -1, do "Parse JSON" with the other schema.
Hope this helps with your problem.
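The same check is easy to try out in Python, where str.find also returns -1 when the character is absent. This is just a sketch of the decision; pick_schema and the two schema labels are placeholders invented for the example. Note that a "[" inside a string value would also trigger the array branch.

```python
def pick_schema(raw_json):
    """Choose a (hypothetical) schema label based on whether '[' occurs in the text."""
    position = raw_json.find("[")  # like indexOf: -1 when '[' is absent
    return "array_schema" if position != -1 else "object_schema"


with_array = '{"MyBoxCollection":{"Box":[aaa,bbb]}}'
without_array = '{"MyBoxCollection":{"Box":{"BoxName":"Box 1"}}}'

print(pick_schema(with_array))
print(pick_schema(without_array))
```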
We faced a similar issue. The only solution we found was to manipulate the XML before conversion. We updated the XML nodes that need to be an array even when there is only a single element. We used an Azure Function to update the required XML attributes and then returned the XML for conversion in Logic Apps. Hope this helps someone.

How can I use "not equal" condition while filtering array using JOLT specification

I want to filter a JSON array using a JOLT transformation where the condition is negative. In the example below I want only the records whose Url value is not equal to "Not Available.".
{
"Photos": [
{
"Id": "327703",
"Caption": "TEST>> photo 1",
"Url": "Not Available."
},
{
"Id": "327704",
"Caption": "TEST>> photo 2",
"Url": "http://bob.com/0001/327704/photo.jpg"
},
{
"Id": "327705",
"Caption": "TEST>> photo 3",
"Url": "http://bob.com/0001/327705/photo.jpg"
}
]
}
Take a look at the very similar question "Removing Elements from array based on a condition". Based on it, you can solve it as below:
[
{
"operation": "shift",
"spec": {
"Photos": {
// loop thru all the photos
"*": {
// for each URL
"Url": {
// For "Not Available." do nothing.
"Not Available.": null,
// In other case pass thru
"*": {
"@2": "Photos[]"
}
}
}
}
}
}
]
Generally, when you want a negated filter, you match the value you want to exclude and map it to null, which skips the item.
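For reference, the "not equal" filter reads like this in Python. It is only a sketch of the expected result; drop_unavailable and its marker parameter are names chosen for the example.

```python
def drop_unavailable(doc, marker="Not Available."):
    """Keep only the photos whose Url differs from the marker value."""
    return {"Photos": [p for p in doc.get("Photos", []) if p.get("Url") != marker]}


doc = {
    "Photos": [
        {"Id": "327703", "Caption": "TEST>> photo 1", "Url": "Not Available."},
        {"Id": "327704", "Caption": "TEST>> photo 2", "Url": "http://bob.com/0001/327704/photo.jpg"},
        {"Id": "327705", "Caption": "TEST>> photo 3", "Url": "http://bob.com/0001/327705/photo.jpg"},
    ]
}

filtered = drop_unavailable(doc)
print([p["Id"] for p in filtered["Photos"]])
```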

Jolt reference first element in array as target name

I have been looking at this for a few weeks (in the background) and am stumped on how to convert JSON data approximating a CSV into a tagged set using the NiFi JoltTransformJson processor. What I mean by this is to use the data from the first row of an array in the input as the JSON object name in the output.
As an example I have this input data:
[
[
"Company",
"Retail Cost",
"Percentage"
],
[
"ABC",
"5,368.11",
"17.09%"
],
[
"DEF",
"101.47",
"0.32%"
],
[
"GHI",
"83.79",
"0.27%"
]
]
and what I am trying to get as output is:
[
{
"Company": "ABC",
"Retail Cost": "5,368.11",
"Percentage": "17.09%"
},
{
"Company": "DEF",
"Retail Cost": "101.47",
"Percentage": "0.32%"
},
{
"Company": "GHI",
"Retail Cost": "83.79",
"Percentage": "0.27%"
}
]
I see this as primarily 2 problems: getting access to the content of the first array and then making sure that the output data does not include that first array.
I would love to post a Jolt Specification showing myself getting somewhat close, but the closest gives me the correct shape of output without the correct content. It looks like this:
[
{
"operation": "shift",
"spec": {
"*": {
"*": "[&1].&0"
}
}
}
]
But it results in an output like this:
[ {
"0" : "Company",
"1" : "Retail Cost",
"2" : "Percentage"
}, {
"0" : "ABC",
"1" : "5,368.11",
"2" : "17.09%"
}, {
"0" : "DEF",
"1" : "101.47",
"2" : "0.32%"
}, {
"0" : "GHI",
"1" : "83.79",
"2" : "0.27%"
} ]
Which clearly has the wrong object name and it has 1 too many elements in the output.
Can do it, but wow it is hard to read / looks like terrible regex
Spec
[
{
// this does most of the work, but produces an output
// array with a null in the Zeroth space.
"operation": "shift",
"spec": {
// match the first item in the outer array and do
// nothing with it, because it is just "header" data
// e.g. "Company", "Retail Cost", "Percentage".
// we need to reference it, but not pass it thru
"0": null,
//
// loop over all the rest of the items in the outer array
"*": {
// this is rather confusing
// "*" means match the array indices of the inner array
// and we will write the value at that index "ABC" etc
// to "[&1].@(2,[0].[&])"
// "[&1]" means make the output be an array, and at index
// &1, which is the index of the outer array we are
// currently in.
// Then "lookup the key" (Company, Retail Cost) using
// @(2,[0].[&])
// Which is go back up the tree to the root, then
// come back down into the first item of the outer array
// and index it by the array index of the current
// inner array that we are at.
"*": "[&1].@(2,[0].[&])"
}
}
},
{
// We know the first item in the array will be null / junk,
// because the first item in the input array was "header" info.
// So we match the first item, and then accumulate everything
// into a new array
"operation": "shift",
"spec": {
"0": null,
"*": "[]"
}
}
]
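As a sanity check on the expected output, the "first row is the header" idea can be written in Python as below. This is a sketch for comparison only, not a replacement for the Jolt spec; rows_to_records is a made-up name.

```python
def rows_to_records(table):
    """Use the first row as keys and turn every following row into a dict."""
    if not table:
        return []
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


table = [
    ["Company", "Retail Cost", "Percentage"],
    ["ABC", "5,368.11", "17.09%"],
    ["DEF", "101.47", "0.32%"],
    ["GHI", "83.79", "0.27%"],
]

records = rows_to_records(table)
print(records)
```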