Support for concat in JSON Jolt in Apache NiFi - json

Using Apache NiFi, I want to add a new field to every element of a JSON flow file, built by concatenating two other fields. I am trying to use the JoltTransformJSON processor for this. The Jolt transform I want to use works in online tools but does not work within NiFi. I suspect a version issue, but there may be a mistake in my Jolt specification.
The input JSON looks like this...
[
{
"id": 485842,
"cc": 1,
"x": 0,
"y": null
},
{
"id": 281733,
"cc": 1,
"x": 0,
"y": 10
},
{
"id": 721412,
"cc": 12,
"x": 0,
"y": null
}
]
The desired output I want is this...
[ {
"id" : 485842,
"cc" : 1,
"x" : 0,
"y" : null,
"id_cc" : "485842_1"
}, {
"id" : 281733,
"cc" : 1,
"x" : 0,
"y" : 10,
"id_cc" : "281733_1"
}, {
"id" : 721412,
"cc" : 12,
"x" : 0,
"y" : null,
"id_cc" : "721412_12"
} ]
And the Jolt transform I use on the online site is...
[{
"operation": "modify-default-beta",
"spec": {
"*": {
"id_cc": "=concat(@(1,id),'_',@(1,cc))"
}
}
}]
In NiFi, I configure the JoltTransformJSON processor to have Modify-Default and I use this slightly modified Jolt Specification...
{
"operation": "modify-default",
"spec": {
"*": {
"id_cc": "=concat(@(1,id),'_',@(1,cc))"
}
}
}
NiFi validates this and the processor runs, but the output JSON consists of a single record only, with one new field added like this:
"operation": "modify-default"
Is there a quick fix to the modify-default operation that will resolve this or is there an even easier way using a shift operation?
Thanks in advance for any pointers.

After some help from colleagues, I have found a way to make this work.
Firstly, set the Jolt Transformation DSL property of the JoltTransformJSON processor to be Chain.
Secondly, set the Jolt specification to the following...
[{
"operation": "modify-default-beta",
"spec": {
"*": {
"id_cc": "=concat(@(1,id),'_',@(1,cc))"
}
}
}]
The enclosing [] are vital, as is the -beta suffix in the operation name.
Thirdly, ensure the JSON fed into the Jolt processor is an array.
Get all this correct and the expected output will be produced.
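To check the expected output outside NiFi, the effect of this spec can be sketched in plain Python. This does not call Jolt. `add_id_cc` is a hypothetical helper name, and it assumes modify-default only writes the field when it is missing or null.

```python
import json


def add_id_cc(records):
    """Add id_cc = "<id>_<cc>" to each record, only where it is missing or null."""
    result = []
    for record in records:
        updated = dict(record)  # shallow copy so the input list is left untouched
        if updated.get("id_cc") is None:
            updated["id_cc"] = f"{updated['id']}_{updated['cc']}"
        result.append(updated)
    return result


# The input array from the question.
flow_file = json.loads("""
[
  {"id": 485842, "cc": 1, "x": 0, "y": null},
  {"id": 281733, "cc": 1, "x": 0, "y": 10},
  {"id": 721412, "cc": 12, "x": 0, "y": null}
]
""")

transformed = add_id_cc(flow_file)
print(json.dumps(transformed, indent=2))
```

Comparing this output with what the processor emits is a quick way to tell a spec problem from a configuration problem.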

Related

Apache VTL - Copy node

Is there a way to do deep copy using apache VTL?
I was trying to use x-amazon-apigateway-integration using requestTemplates.
The input JSON is as shown below,
{
"userid": "21d6523137f6",
"time": "2020-06-16T15:22:33Z",
"item": {
"UserID" : { "S": "21d6523137f6" },
<... some complex json nodes here ...>,
"TimeUTC" : { "S": "2020-06-16T15:22:33Z" },
}
}
The requestTemplate is as shown below,
requestTemplates:
application/json: !Sub
- |
#set($inputRoot = $input.path('$'))
{
"TableName": "${tableName}",
"ConditionExpression": "attribute_not_exists(TimeUTC) OR TimeUTC > :sk",
"ExpressionAttributeValues": {
":sk":{
"S": "$util.escapeJavaScript($input.path('$.time'))"
}
},
"Item": "$input.path('$.item')", <== Copy the entire item over to Item.
"ReturnValues": "ALL_OLD",
"ReturnConsumedCapacity": "INDEXES",
"ReturnItemCollectionMetrics": "SIZE"
}
- {
tableName: !Ref EventsTable
}
The problem is, the item gets copied like,
"Item": "{UserID={S=21d6523137f6}, Lat={S=37.33180957}, Lng={S=-122.03053391}, ... other json elements..., TimeUTC={S=2020-06-16T15:22:33Z}}",
As you can see, the whole nested JSON becomes a single attribute, whereas I expected it to become a full JSON node of its own, like below:
"Item": {
"UserID" : { "S": "21d6523137f6" },
"Lat": { "S": "37.33180957" },
"Lng": { "S": "-122.03053391" },
<.... JSON nodes ...>
"TimeUTC" : { "S": "2020-06-20T15:22:33Z" }
},
Is it possible to do a deep (nested) copy of a JSON node like the one above without the kung-fu of iterating over the node and appending the children to a JSON node variable, etc.?
btw, I'm using AWS API Gateway request template, so it may not support all the Apache VTL templating options.
You need to use the $input.json method instead of $input.path.
"Item": $input.json('$.item'),
Note that I removed the double quotes.
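The difference between the two calls is roughly the difference between printing a map and serializing it. The Python sketch below is only an analogy, not API Gateway code: `java_map_to_string` is a hypothetical helper that imitates the `{key=value}` text seen in the question, and `json.dumps` stands in for the JSON text that `$input.json` returns.

```python
import json


def java_map_to_string(value):
    """Imitate the {key=value, ...} text a nested map prints as (analogy only)."""
    if isinstance(value, dict):
        inner = ", ".join(f"{k}={java_map_to_string(v)}" for k, v in value.items())
        return "{" + inner + "}"
    return str(value)


item = {
    "UserID": {"S": "21d6523137f6"},
    "TimeUTC": {"S": "2020-06-16T15:22:33Z"},
}

as_text = java_map_to_string(item)  # what the question observed: not valid JSON
as_json = json.dumps(item)          # what the template needs: valid JSON text

print(as_text)
print(as_json)
```

Only the second form can be embedded unquoted in the template and still parse as a nested object.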
If you had the double quotes because you want to stringify $.item, you can do that like so:
"Item": "$util.escapeJavaScript($input.json('$.item'))",

Need Jolt Spec to convert matrix json to denormalized json format

Can anyone please help me with a JOLT spec to convert my matrix-type JSON to denormalized JSON? Please find my input JSON and my expected JSON output below.
Input Json:
[
{
"attributes": [
{
"name": "brand",
"value": "Patriot Lighting"
},
{
"name": "color",
"value": "Chrome"
},
{
"name": "price",
"value": "49.97 USD"
}
]
},
{
"attributes": [
{
"name": "brand",
"value": "Masterforce"
},
{
"name": "color",
"value": "Green"
},
{
"name": "price",
"value": "99.0 USD"
}
]
}
]
Expected Json output:
[
{
"brand": "Patriot Lighting",
"color": "Chrome",
"price": "49.97 USD"
},
{
"brand": "Masterforce",
"color": "Green",
"price": "99.0 USD"
}
]
I was trying to build a JOLT spec to convert this JSON, but the challenge is that my JSON has multiple tables with the "attributes" tag.
Thanks in advance!
JOLT is not easy to use, but I have got a lot out of other StackOverflow questions floating around, and I have just started reading the source code comments.
[
{
"operation": "shift",
"spec": {
// for each element in the array
"*": {
"attributes": {
// for each element in the attribute
"*": {
// grab the value
// - put it in an array
// - but it must be indexed by the "positions" found four steps back
// - put the value in a key
// - that is determined by moving one step back and looking at member name
"value": "[#4].@(1,name)"
}
}
}
}
}
]
This seems very obscure at first glance, but I hope the comments explain everything.
Please also read "JOLT transformation to copy single value along an array".
Also this is almost mandatory for JOLT beginners https://docs.google.com/presentation/d/1sAiuiFC4Lzz4-064sg1p8EQt2ev0o442MfEbvrpD1ls/edit#slide=id.g9a487080_011
If you need another example, I just answered a question here: "Nifi JOLT: flat JSON object to a list of JSON object".
And, probably your best friend can be found at https://jolt-demo.appspot.com
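As a cross-check on the expected output, the same pivot can be sketched in plain Python. This is not Jolt and not part of the answer's spec. `denormalize` is a hypothetical helper name, and it assumes every attribute has both a "name" and a "value".

```python
def denormalize(rows):
    """Turn each row's list of {name, value} pairs into one flat object."""
    return [
        {attr["name"]: attr["value"] for attr in row["attributes"]}
        for row in rows
    ]


matrix = [
    {"attributes": [
        {"name": "brand", "value": "Patriot Lighting"},
        {"name": "color", "value": "Chrome"},
        {"name": "price", "value": "49.97 USD"},
    ]},
    {"attributes": [
        {"name": "brand", "value": "Masterforce"},
        {"name": "color", "value": "Green"},
        {"name": "price", "value": "99.0 USD"},
    ]},
]

flat = denormalize(matrix)
print(flat)
```

The outer list position plays the role of the array index in the spec, and the "name" member supplies the output key.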

convert null values corresponding to an Array to empty array in nifi jolt

I want to achieve the following JSON transformation using the Jolt processor in NiFi.
Focusing on the values field: in the first input JSON (id 900551), values is populated as follows.
Input JSON:
{
"id": 900551,
"internal_name": [],
"values": [
{
"id": 1430156,
"form_field_id": 900551,
"pos": 0,
"weight": null,
"category": null,
"created_at": "2020-10-15 12:55:02",
"updated_at": "2020-11-27 10:45:09",
"deleted_at": null,
"settings": {
"image": "myimage.png",
"fix": false,
"bold": false,
"exclusive": false
},
"internal_value": "494699DV7271000,6343060SX0W1000,619740BWR0W1000",
"css_class": null,
"value": "DIFFERENCE",
"settings_lang": {},
"value_html": ""
}
]
}
In the second input JSON file to parse, values is null.
{
"id": 900552,
"internal_name": [],
"values": null
}
I would like to convert null values to an empty array in my conversion.
Is there a way to do this using existing Jolt operations?
Thanks.
The default operation is what you are looking for:
Defaultr walks the spec and asks "Does this exist in the data? If not, add it."
In our case:
if the value for "values" key is null, put the empty array instead
Here is the spec:
[
{
"operation": "default",
"spec": {
"values": []
}
}
]
tested with https://jolt-demo.appspot.com/
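The rule described above ("if it is missing or null, add it") can be sketched in Python for intuition. This is not the Jolt implementation. `apply_default` is a hypothetical helper that handles top-level keys only.

```python
import copy


def apply_default(record, defaults):
    """Fill in each default only where the key is absent or its value is null."""
    result = dict(record)
    for key, default_value in defaults.items():
        if result.get(key) is None:
            # deepcopy so records never share one mutable default list
            result[key] = copy.deepcopy(default_value)
    return result


spec = {"values": []}

with_null = {"id": 900552, "internal_name": [], "values": None}
with_data = {"id": 900551, "internal_name": [], "values": [{"id": 1430156}]}

print(apply_default(with_null, spec))
print(apply_default(with_data, spec))
```

A record whose values array is already populated passes through unchanged, which is the non-destructive behaviour the quoted description refers to.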
edit: answering the question from the comment:
Maybe this workaround will work for you

Update Json-Attributes in Apache-Nifi: Jolt

I'm new to Apache NiFi and have the following problem. I would like to transform a JSON file as follows:
From:
{
"Property1": "x1",
"Property2": "Tag_**2ABC**",
"Property3": "x3",
"Property4": "x4"
}
to:
{
"**2ABC**_Property1": "x1",
"**2ABC**_Property3": "x3",
"**2ABC**_Property4": "x4"
},
That is, I want to take the value of one attribute and use it to rename all the other attributes.
I could find examples using the JoltTransformJSON processor that work well when the update only adds a fixed string, but not for my case.
What I've done so far: I have set each attribute using the EvaluateJsonPath processor. I then tried many ways of using the UpdateAttribute processor, without success. All my attempts looked like this (within UpdateAttribute):
Property1 --> ${'Property2':substring(4,6)}"_"${'Property1'}
Using Jolt:
[
{"operation": "modify-overwrite-beta",
"spec": {
"Property1": "${'Property2':substring(4,6)}_${'Property1'}"
}
}
]
What am I missing here? Thanks in advance!
I don't know about Nifi, but here is how you can do it in Jolt.
Spec
[
{
"operation": "shift",
"spec": {
// match Property2
"Property2": {
"Tag_*": { // capture the nasty "**2ABC**" part to reference later
// go back up the tree to the root
"@2": {
// match and ignore Property2
"Property2": null,
//
// match Property* and use it and the captured
// "prefix" to create the output key
// &(2,1) references the Tag_*, and pull off the "**2ABC**" part
"Property*": "&(2,1)_&"
}
}
}
}
}
]
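For comparison, the renaming that this spec performs can be sketched in plain Python. This is not Jolt or NiFi code. `prefix_properties` is a hypothetical helper, and it assumes the tag attribute always starts with "Tag_".

```python
def prefix_properties(record, tag_key="Property2", tag_prefix="Tag_"):
    """Rename every other Property* key to "<captured part>_<key>" and drop the tag key."""
    tag_value = record[tag_key]
    if not tag_value.startswith(tag_prefix):
        raise ValueError(f"{tag_key} does not start with {tag_prefix!r}")
    prefix = tag_value[len(tag_prefix):]  # the part the wildcard would capture
    return {
        f"{prefix}_{key}": value
        for key, value in record.items()
        if key != tag_key and key.startswith("Property")
    }


source = {
    "Property1": "x1",
    "Property2": "Tag_**2ABC**",
    "Property3": "x3",
    "Property4": "x4",
}

renamed = prefix_properties(source)
print(renamed)
```

As in the spec, keys that do not start with "Property" are dropped rather than copied through.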

json schema for a map of similar objects

I wish to write a json schema to cover this (simplified) example
{
"errorMessage": "",
"nbRunningQueries": 0,
"isError": False,
"result": {
"foo": {"price":10.0, "country":"UK"},
"bar": {"price":100.2, "country":"UK"}
}
}
which can have this pretty trivial root schema
schema = {
"type":"object",
"required":True,
"properties":{
"errorMessage": {"type":"string", "required":True},
"isError": {"type":"boolean", "required":True},
"nbRunningQueries": {"type":"number", "required":True},
"result": {"type":"object","required":True}
}
}
The complication is the result {} element. Unlike the standard pattern, where result would be an array of identical objects, each with an id field or similar, this response models a Python dictionary, which looks like this:
{
"foo": {},
"bar": {},
...
}
So, given that I will be getting a result object of flexible size with no fixed keys, how can I write a JSON schema for this?
I don't control the input sadly or I'd rewrite it to be something like
{
"errorMessage": "",
"nbRunningQueries": 0,
"isError": False,
"result": [
{"id": "foo", "price":10.0, "country": "UK"},
{"id": "bar", "price":100.2, "country":"UK"}
]
}
Any help or links to pertinent examples would be great. Thanks.
With JSON Schema draft 4, you can use the additionalProperties keyword to specify the schema of any additional properties that you could receive in your result object.
"result" : {
"type" : "object",
"additionalProperties" : {
"type" : "number"
}
}
If you can restrict the allowed key names, then you may use the "patternProperties" keyword and a regular expression to limit the permitted key names.
Note that in JSON Schema draft 4, "required" must be an array that is bound to the object, not to each property.
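Putting both points together, a draft 4 style schema for the question's data might look like the Python dict below. This is a sketch under assumptions: the price/country entry schema is inferred from the sample data rather than given in the answer, and `result_entries_ok` is a hypothetical, hand-rolled check of the result map only, not a general validator.

```python
schema = {
    "type": "object",
    # draft 4: "required" is an array on the object, not a flag on each property
    "required": ["errorMessage", "isError", "nbRunningQueries", "result"],
    "properties": {
        "errorMessage": {"type": "string"},
        "isError": {"type": "boolean"},
        "nbRunningQueries": {"type": "number"},
        "result": {
            "type": "object",
            # applies to every key in result, whatever its name
            "additionalProperties": {
                "type": "object",
                "required": ["price", "country"],
                "properties": {
                    "price": {"type": "number"},
                    "country": {"type": "string"},
                },
            },
        },
    },
}


def result_entries_ok(doc, schema):
    """Check each value in doc["result"] against the additionalProperties schema."""
    entry_schema = schema["properties"]["result"]["additionalProperties"]
    python_types = {"number": (int, float), "string": (str,)}
    for entry in doc["result"].values():
        if not isinstance(entry, dict):
            return False
        if any(name not in entry for name in entry_schema["required"]):
            return False
        for name, sub in entry_schema["properties"].items():
            value = entry.get(name)
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, python_types[sub["type"]]):
                return False
    return True


response = {
    "errorMessage": "",
    "nbRunningQueries": 0,
    "isError": False,
    "result": {
        "foo": {"price": 10.0, "country": "UK"},
        "bar": {"price": 100.2, "country": "UK"},
    },
}

print(result_entries_ok(response, schema))
```

In practice a full validator library would replace the hand-rolled check. The point here is only that the entry schema is written once and applied to every key.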