Remove special characters from JSON data with the JOLT processor

I have the following JSON input. It contains many special characters, and I want to remove all of them from the data.
Input
{
"A": "pwnbfd%2hdj&mdnb",
"B": "my name is param (India) ",
"C": "#pqwe",
"D": "jfdk#djnsn(america) djhfb "
}
Expected Output
{
"A": "pwnbfd2hdjmdnb",
"B": "my name is param India",
"C": "pqwe",
"D": "jfdkdjnsnamerica djhfb"
}
I need to make these changes with the JoltTransformJSON processor in Apache NiFi. The JSON payload may contain many other keys.
I need to remove all the special characters from the input JSON, so please help!

I don't know of (and don't think there is) a straightforward way to remove all of them at once, but you can remove each unwanted character individually within a modify transformation by using successive split and join functions like this:
[
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=split('%',#(1,&))"
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=join('',#(1,&))"
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=split('&',#(1,&))"
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=join('',#(1,&))"
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=split('#',#(1,&))"
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=join('',#(1,&))"
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=split('\\)',#(1,&))"
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=join('',#(1,&))"
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=split('\\(',#(1,&))"
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=join('',#(1,&))"
}
}
]
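To make the intent of the split/join chain concrete, here is a minimal Python sketch of the same idea: split on one character, join the pieces back with an empty separator, and repeat for the next character. It is only an illustration and not a JOLT spec; the names strip_chars and clean_record are hypothetical helpers, and the character list is an assumption taken from the sample input.

```python
# Illustrative Python only; this is not a JOLT spec.
def strip_chars(text, chars):
    """Remove every character in `chars` from `text`, one split/join pass per character."""
    for ch in chars:
        parts = text.split(ch)   # comparable to the =split(...) step
        text = "".join(parts)    # comparable to the =join('', ...) step
    return text


def clean_record(record, chars="%&#()"):
    """Apply strip_chars to every string value of a flat dict; leave other values as-is."""
    return {
        key: strip_chars(value, chars) if isinstance(value, str) else value
        for key, value in record.items()
    }


sample = {
    "A": "pwnbfd%2hdj&mdnb",
    "B": "my name is param (India) ",
    "C": "#pqwe",
    "D": "jfdk#djnsn(america) djhfb ",
}
cleaned = clean_record(sample)
print(cleaned)
```

This sketch only removes the listed characters. It does not trim whitespace, so the trailing space in "B" and "D" is kept.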

Related

Broke nested dynamic JSON array with JOLT

I'm looking to flatten a nested JSON file into a SQL-ready format.
JSON file's content:
{
"ProductLine": [
"Product 1",
"Product 2"
],
"Purchase": 364,
"Cancel": [
140,
2
]
}
My current transformation:
[
{
"operation": "shift",
"spec": {
"*": {
"*": {
"#": "[#2].&2"
}
}
}
}
]
Desired output:
[
{
"ProductLine": "Product 1",
"Purchase": 364,
"Cancel": 140
},
{
"ProductLine": "Product 2",
"Cancel": 2
}
]
The difficulty is that the arrays can change: sometimes "Cancel" is an array, and sometimes the "Purchase" block is nested.
You can use this spec. It works whether or not Purchase or Cancel is an array:
[
{
"operation": "cardinality",
"spec": {
"*": "MANY"
}
},
{
"operation": "shift",
"spec": {
"ProductLine": {
"*": {
"*": {
"#1": "[&2].&3",
"#(3,Purchase[&1])": "[&2].Purchase",
"#(3,Cancel[&1])": "[&2].Cancel"
}
}
}
}
}
]
First, convert every value to an array. You can then loop over ProductLine and pick the matching elements from Purchase and Cancel.
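The two steps described here (wrap every value in an array, then build one row per index) can be sketched in Python as follows. This is an illustration of the logic only, not a JOLT spec; to_rows is a hypothetical helper name.

```python
# Illustrative Python only; this is not a JOLT spec.
def to_rows(record):
    """Turn {"key": scalar_or_list, ...} into a list of row dicts, one per index."""
    # Step 1: same idea as cardinality MANY, every value becomes a list.
    as_lists = {k: v if isinstance(v, list) else [v] for k, v in record.items()}
    # Step 2: one row per index; a key is skipped where its list is too short.
    length = max((len(v) for v in as_lists.values()), default=0)
    return [
        {k: v[i] for k, v in as_lists.items() if i < len(v)}
        for i in range(length)
    ]


source = {
    "ProductLine": ["Product 1", "Product 2"],
    "Purchase": 364,
    "Cancel": [140, 2],
}
rows = to_rows(source)
print(rows)
```

Because every value is normalized to a list first, the same code handles "Purchase" or "Cancel" arriving either as a scalar or as an array.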
Update: The following answer has been obtained in collaboration with Barbaros Özhan. Special thanks.
[
{
"operation": "cardinality",
"spec": {
"*": "MANY"
}
},
{
"operation": "shift",
"spec": {
"*": {
"*": {
"#": "[#2].&2"
}
}
}
}
]
We can pick Purchase at a different (outer) level, such as
[
{
"operation": "shift",
"spec": {
"*": {
"*": {
"#": "[#2].&2"
}
},
"Purchase": "[#].&"// at two level less than the inner object
}
}
]
You can try this on the demo site http://jolt-demo.appspot.com/.
Edit: Since any attribute may or may not be an array, you can alternatively use the following spec
[
{ //reform two separate objects
"operation": "shift",
"spec": {
"#": "orj",
"*": "non_array.&.#0[]"
}
},
{ // in order to keep the non-array values as the first component of the newly formed array(s)
"operation": "sort"
},
{
"operation": "shift",
"spec": {
"*": { //the topmost level
"*": { //level for the keys
"*": "&1[]" //match keys and values to convert non-arrays to arrays
}
}
}
},
{// pick the first component for the non-array(s)
"operation": "modify-overwrite-beta",
"spec": {
"*": {
"*": "=firstElement"
}
}
},
{ // apply the original spec after having got individual array values
"operation": "shift",
"spec": {
"*": {
"*": {
"#": "[#2].&2"
}
}
}
},
{ //get rid of the attributes with null values
"operation": "modify-overwrite-beta",
"spec": {
"*": "=recursivelySquashNulls"
}
}
]
Another straightforward alternative is to use your original spec after applying a cardinality spec, such as
[
{
"operation": "cardinality",
"spec": {
"*": "MANY"
}
},
{
"operation": "shift",
"spec": {
"*": {
"*": {
"#": "[#2].&2"
}
}
}
}
]

JoltTransformJson - Json Transformation

I have a JSON value as below:
{
"table": "table_name",
"op_type": "U",
"before": {
"AAAA": "1-1111",
"BBBB": "2022-08-31 03:57:01",
"CCCC": "2023-08-31 23:59:59"
},
"after": {
"AAAA": "1-1112",
"BBBB": "2022-08-31 10:10:34"
}
}
I want to produce the following. How can I do it?
{
"AAAA": "1-1112",
"BBBB": "2022-08-31 10:10:34",
"CCCC": "2023-08-31 23:59:59",
"changed_columns": "AAAA, BBBB"
}
The rule is: if after.AAAA exists, take it, otherwise take before.AAAA; likewise, if after.BBBB exists, take it, otherwise take before.BBBB.
I also want to add a changed_columns field like this:
,"changed_columns": "AAAA, BBBB"
Is there a way to do this?
You can use a shift operation that takes the after values first and the before values second. If a key appears in both, you end up with a two-element array for that key.
You can then take the first element, which is the after value, with a modify-overwrite-beta operation and the =firstElement function.
[
{
"operation": "shift",
"spec": {
"after": {
"*": {
"$": "changed_columns[]",
"#(1,&)": "&1"
}
},
"before": {
"*": "&"
}
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=firstElement(#(1,&))",
"changed_columns": "=join(', ',#(1,changed_columns))"
}
}
]
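As a cross-check of what the spec above is meant to produce, the same precedence rule (after wins over before) can be sketched in Python. This is an illustration only, not a JOLT spec; merge_change is a hypothetical helper name. Like the spec, it lists every key present in "after" as changed, without comparing the before and after values.

```python
# Illustrative Python only; this is not a JOLT spec.
def merge_change(event):
    """Merge "before" and "after", preferring "after", and list the keys found in "after"."""
    before = event.get("before", {})
    after = event.get("after", {})
    merged = {**before, **after}                  # later dict wins, so after overrides before
    merged["changed_columns"] = ", ".join(after)  # keys of "after", in their original order
    return merged


event = {
    "table": "table_name",
    "op_type": "U",
    "before": {
        "AAAA": "1-1111",
        "BBBB": "2022-08-31 03:57:01",
        "CCCC": "2023-08-31 23:59:59",
    },
    "after": {
        "AAAA": "1-1112",
        "BBBB": "2022-08-31 10:10:34",
    },
}
result = merge_change(event)
print(result)
```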
You can use a cardinality spec near the end, after matching "after|before" as the key in that order to set the precedence, and swap key-value pairs twice in a row to determine which components really changed in order to form "changed_columns", such as
[
{
// multiplex the attributes in order to generate three independent groups
"operation": "shift",
"spec": {
"after|before": { // this order is important to determine the precedence in the upcoming cardinality spec
"*": {
"#": "&",
"#(0)": "l.&",
"*": {
"#1": "f.&2"
}
}
}
}
},
{
// determine whether before vs. after values equal through this and next two specs
"operation": "modify-overwrite-beta",
"spec": {
"l": {
"*": "=lastElement(#(1,&))"
},
"f": {
"*": "=firstElement(#(1,&))"
}
}
},
{
"operation": "shift",
"spec": {
"*": "&",
"l|f": {
"*": {
"$": "lf.#(0)"
}
}
}
},
{
"operation": "shift",
"spec": {
"*": "&",
"lf": {
"*": {
"$": "&2.#(0)"
}
}
}
},
{
// construct an array from those newly formed keys
"operation": "shift",
"spec": {
"*": "&",
"lf": {
"*": {
"$": "changed_columns"
}
}
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"changed_columns": "=join(', ',#(1,&))"
}
},
{
"operation": "cardinality",
"spec": {
"*": "ONE"
}
},
{
"operation": "sort"
}
]

JoltTransformJson - How to get column names as values

I have a JSON value as below:
{
"table": "table_name",
"op_type": "U",
"before": {
"AAAA": "1-1111",
"BBBB": "2022-08-31 03:57:01"
},
"after": {
"AAAA": "1-1111",
"BBBB": "2022-08-31 10:10:34",
"DDDD": "2023-08-31 23:59:59"
}
}
I want to add a field holding the column names, like this:
,"changed_columns": "AAAA,BBBB,DDDD"
Is there a way to do this?
You can use the following spec. The main idea is to arrange the attributes with successive shift transformations so as to generate an array with unique elements, then combine them within a modify transformation, such as
[
{
// combine common key names for each respective values for the attributes
"operation": "shift",
"spec": {
"before|after": {
"*": {
"$": "&"
}
}
}
},
{
// construct an array from those newly formed keys
"operation": "shift",
"spec": {
"*": {
"$": "changed_columns"
}
}
},
{
// make them comma-separated
"operation": "modify-overwrite-beta",
"spec": {
"*": "=join(',',#(1,&))"
}
}
]
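The goal of these three steps, collecting each column name once across "before" and "after" and joining the names with commas, can be sketched in Python. This is an illustration only, not a JOLT spec; column_names is a hypothetical helper name.

```python
# Illustrative Python only; this is not a JOLT spec.
def column_names(event, sections=("before", "after")):
    """Return the unique keys found under the given sections, comma-separated."""
    seen = {}  # a dict keeps insertion order and drops duplicate keys
    for section in sections:
        for key in event.get(section, {}):
            seen[key] = None
    return ",".join(seen)


event = {
    "table": "table_name",
    "op_type": "U",
    "before": {"AAAA": "1-1111", "BBBB": "2022-08-31 03:57:01"},
    "after": {
        "AAAA": "1-1111",
        "BBBB": "2022-08-31 10:10:34",
        "DDDD": "2023-08-31 23:59:59",
    },
}
names = column_names(event)
print(names)
```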
Edit: If your aim is to keep the newly generated attribute along with the existing ones, you may prefer the following spec
[
{
"operation": "shift",
"spec": {
"*": "&", //else case
"before|after": {
"*": {
"$": "cc.&",
"#": "&2.&"
}
}
}
},
{
"operation": "shift",
"spec": {
"cc": {
"*": {
"$": "changed_columns"
}
},
"*": "&" //else case
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"changed_columns": "=join(',',#(1,&))"
}
}
]

Truncate every string in an array with JOLT Transformation

I am trying to truncate strings in an array with JOLT. The character "^" should be removed from any string that has it in the first position.
Sample Input:
{
"scores": [
"^aaaa",
"^bbbb",
"cccc",
"^dddd",
"eeee"
]
}
Expected Output:
{
"scores" : [ "aaaa", "bbbb", "cccc", "dddd", "eeee" ]
}
My Spec:
[
{
"operation": "modify-overwrite-beta",
"spec": {
"splittedScores": "=split('\\^',#(1,scores))",
"scores": "=join('',#(1,splittedScores))" // <--- Not working
}
}
]
Seems you'd need one extra shift transformation sandwiched between modify transformations to tame the subarrays such as
[
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=split('\\^',#(1,&))"
}
},
{
"operation": "shift",
"spec": {
"*": {
"*": "&"
}
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=join('',#(1,&))"
}
},
{
"operation": "shift",
"spec": {
"*": {
"#": "scores"
}
}
}
]
Alternatively, a single shift transformation is sufficient (a more straightforward option), such as
[
{
"operation": "shift",
"spec": {
"*": {
"*": {
"^*": {
"$(0,1)": "&3"
},
"*": {
"$": "&3"
}
}
}
}
}
]
The "$(0,1)" reference takes the part of the key matched by the * in "^*", that is, the string without its leading ^. &3 goes three levels up to grab the key name scores.
If it matters that ^ is removed only from the first position, use the second method; if every ^ character should be removed from every string, use the first method.
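The difference between the two behaviours is easy to see in a small Python sketch. This is an illustration only, not a JOLT spec; both function names are hypothetical.

```python
# Illustrative Python only; this is not a JOLT spec.
def strip_leading_caret(value):
    """Drop a single ^ only when it is the first character (second method)."""
    return value[1:] if value.startswith("^") else value


def strip_all_carets(value):
    """Drop every ^ wherever it appears (first method, split then join)."""
    return "".join(value.split("^"))


scores = ["^aaaa", "^bbbb", "cccc", "^dddd", "eeee"]
leading_only = [strip_leading_caret(s) for s in scores]
everywhere = [strip_all_carets(s) for s in scores]
print(leading_only)
print(everywhere)
```

For the sample input both give the same list. They differ only for a string such as "^a^b", where the first function keeps the inner ^ and the second removes it.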

Jolt Transform MapRecord to JSON

I need to flatten a DB using NiFi. I read in a table based on a PK ID. For each row, the NiFi content is shown as a MapRecord. I need to pull each field value out of the MapRecord and make it a JSON property.
Input:
{
"maxResults": 150,
"total": 89,
"issues": "MapRecord[{issueId=1, firstName=Jack, lastName=Smith}]",
"address": "MapRecord[{addressId=1, street=Mockingbird Lane, town=Timbuktoo}]"
}
Notice in the MapRecord nothing is in quotes. I don't know why this is like this. It is obviously not JSON.
I want the result to look like:
Expected output:
{
"maxResults": 150,
"total": 89,
"firstName": "Jack",
"lastName": "Smith",
"street": "Mockingbird Lane",
"town": "Timbuktoo"
}
Does anyone know how to do this using a JOLT transform?
Thanks.
You can solve it, though it takes quite a few transformation specs. The first step is to extract the key-value pairs; the split function within modify-overwrite-beta can be used for this. The later steps keep the existing pairs (maxResults and total) and combine them with the extracted ones using a few tricks, such as
[
{
"operation": "modify-overwrite-beta",
"spec": {
"i": "=split('}',#(1,issues))",
"a": "=split('}',#(1,address))"
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"i": "=split(', ',#(1,i))",
"a": "=split(', ',#(1,a))"
}
},
{
"operation": "shift",
"spec": {
"maxResults": "&",
"total": "&",
"i|a": {
"0": { "1|2": { "#": "theRest" } }
}
}
},
{
"operation": "shift",
"spec": {
"*": "arr0.&",
"theRest": { "*": "&" }
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=split('=',#(1,&))"
}
},
{
"operation": "shift",
"spec": {
"arr0": {
"*": {
"#": "arr[1]",
"$": "arr[0]"
}
},
"*": {
"*": {
"#": "arr.[&1]"
}
}
}
},
{
"operation": "shift",
"spec": {
"*": "&",
"arr": {
"0": null,
"*": {
"*": "[&1].#(2,[0].[&])"
}
}
}
},
{
"operation": "shift",
"spec": {
"0": null,
"*": ""
}
}
]
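For comparison, here is a minimal Python sketch of the same extraction: parse the text between "MapRecord[{" and "}]" into key-value pairs, skip the first pair (the ID field, which the expected output omits), and merge the rest into the top-level object. It is an illustration only, not a JOLT spec. The helper names are hypothetical, and the sketch assumes that values never contain ", " or "=".

```python
# Illustrative Python only; this is not a JOLT spec.
import re

MAP_RECORD = re.compile(r"MapRecord\[\{(.*)\}\]")


def parse_map_record(text):
    """Return the key=value pairs of a MapRecord string as a dict, or None if no match."""
    match = MAP_RECORD.fullmatch(text)
    if match is None:
        return None
    pairs = {}
    for part in match.group(1).split(", "):   # assumes values never contain ", "
        key, _, value = part.partition("=")   # assumes keys never contain "="
        pairs[key] = value
    return pairs


def flatten(record):
    """Copy plain values; expand MapRecord strings, skipping their first (ID) field."""
    out = {}
    for key, value in record.items():
        parsed = parse_map_record(value) if isinstance(value, str) else None
        if parsed is None:
            out[key] = value
        else:
            out.update(list(parsed.items())[1:])  # drop the leading ID pair
    return out


source = {
    "maxResults": 150,
    "total": 89,
    "issues": "MapRecord[{issueId=1, firstName=Jack, lastName=Smith}]",
    "address": "MapRecord[{addressId=1, street=Mockingbird Lane, town=Timbuktoo}]",
}
flat = flatten(source)
print(flat)
```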