Grouping JSON elements using Jolt transform

I need help with a Jolt transform spec. Below is my work so far.
Input:
[
{
"ID": "1234",
"Date": "2020-12-10",
"Time": "06:00:00",
"Rate": null,
"Interest": null,
"Term": 99
},
{
"ID": "1234",
"Date": "2020-12-11",
"Time": "07:00:00",
"Rate": 8,
"Interest": null,
"Term": 99
}
]
Jolt Code used:
[
{
"operation": "shift",
"spec": {
"*": {
"ID": "#(1,ID).id",
"Date": "#(1,ID).date",
"Time": "#(1,ID).group.time",
"Rate": "#(1,ID).group.rate",
"Interest": "#(1,ID).group.interest",
"Term": "#(1,ID).group.term"
}
}
},
{
"operation": "cardinality",
"spec": {
"*": {
"id": "ONE"
}
}
},
{
"operation": "shift",
"spec": {
"*": ""
}
}
]
Current output:
[
{
"id": "1234",
"date": ["2020-12-10", "2020-12-11"],
"group": {
"time": ["06:00:00", "07:00:00"],
"rate": 8,
"interest": null,
"term": [99, 99]
}
}
]
Expected output
[
{
"id": "1234",
"date": "2020-12-10",
"group": {
"time": "06:00:00",
"rate": null,
"interest": null,
"term": 99
}
},
{
"id": "1234",
"date": "2020-12-11",
"group": {
"time": "07:00:00",
"rate": 8,
"interest": null,
"term": 99
}
}
]
With a single JSON object, this spec works fine. But when the input contains multiple items with the same ID, it merges the values of all matching fields into arrays.

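You can use the square-bracket notation [&1] as the common prefix, so that each input element is written to its own index of the output array, while nesting every attribute other than ID and Date under group. Note that & copies the matched input key as-is, so the output keys keep their original capitalization (ID, Date, Time, ...); write the target names out explicitly if you need them lowercased. For example: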
[
{
"operation": "shift",
"spec": {
"*": {
"ID": "[&1].&",
"Date": "[&1].&",
"*": "[&1].group.&"
}
}
}
]
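To make the intended result concrete, here is a plain-Python equivalent of the reshaping. It is only an illustration, not Jolt itself: the function name regroup is hypothetical, and the lowercase output keys are an assumption taken from the expected output in the question.

```python
# Illustrative sketch only: mimics the per-element reshaping in plain Python.
# `regroup` is a hypothetical helper, not part of Jolt or NiFi.

def regroup(records):
    """Keep ID and Date at the top level; nest every other field under 'group'."""
    top_level = {"ID", "Date"}
    result = []
    for record in records:
        out = {}
        group = {}
        for key, value in record.items():
            if key in top_level:
                out[key.lower()] = value
            else:
                group[key.lower()] = value  # None (JSON null) values are kept
        out["group"] = group
        result.append(out)
    return result


records = [
    {"ID": "1234", "Date": "2020-12-10", "Time": "06:00:00",
     "Rate": None, "Interest": None, "Term": 99},
    {"ID": "1234", "Date": "2020-12-11", "Time": "07:00:00",
     "Rate": 8, "Interest": None, "Term": 99},
]
reshaped = regroup(records)
```

Each input element yields exactly one output element, which is the behaviour the [&1] index is meant to give: nothing is merged across elements, even when they share the same ID.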

Related

Applying cardinality for multiple columns in Jolt

I am trying to apply Jolt to the data below.
input:
[
{
"id": "500",
"code": "abc",
"date": "2020-10-10",
"category": 1,
"amount": 100,
"result": 0
},
{
"id": "500",
"code": "abc",
"date": "2020-10-10",
"category": 2,
"amount": 200,
"result": 1
}
]
jolt used:
[
{
"operation": "shift",
"spec": {
"*": {
"id": "#(1,id).id",
"code": "#(1,id).code",
"date": "#(1,id).group1.date",
"category": "#(1,id).group1.group2[&1].category"
}
}
},
{
"operation": "cardinality",
"spec": {
"*": {
"id": "ONE"
}
}
},
{
"operation": "shift",
"spec": {
"*": ""
}
}
]
current output:
{
"id": "500",
"code": [
"abc",
"abc"
],
"group1": {
"date": [
"2020-10-10",
"2020-10-10"
],
"group2": [
{
"category": 1
},
{
"category": 2
}
]
}
}
expected:
{
"id": "500",
"code": "abc",
"group1": {
"date": "2020-10-10",
"group2": [
{
"category": 1
},
{
"category": 2
}
]
}
}
If I list the code and date columns in the cardinality spec, it works. But in my use case there are many such columns to add. Is there a better way to handle this scenario?
You should list each nested node you created (here group1) explicitly and use the "*" wildcard to cover the remaining attributes at each level of the cardinality transformation, such as
{
"operation": "cardinality",
"spec": {
"*": {
"*": "ONE",
"group1": {
"*": "ONE",
"group2": "MANY"
}
}
}
}
where "group2": "MANY" exempts group2 from being reduced to its first element, so it remains a list.
You can try the spec on the demo site: http://jolt-demo.appspot.com/
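As a rough illustration of what ONE and MANY mean here, the sketch below applies the same rule in plain Python to the "current output" object from the question. It is not Jolt: apply_cardinality and its keep_as_list parameter are hypothetical names, and the sketch works on the single merged object rather than on the intermediate keyed map that the real spec sees.

```python
# Illustrative sketch only: "ONE" keeps the first element of a list,
# "MANY" leaves (or makes) the value a list. Names are hypothetical.

def apply_cardinality(node, keep_as_list=("group2",)):
    out = {}
    for key, value in node.items():
        if key in keep_as_list:
            # MANY: make sure the value is a list and leave its items alone
            out[key] = value if isinstance(value, list) else [value]
        elif isinstance(value, dict):
            # nested node such as group1: apply the same rule one level down
            out[key] = apply_cardinality(value, keep_as_list)
        elif isinstance(value, list) and value:
            # ONE: keep only the first element of the list
            out[key] = value[0]
        else:
            out[key] = value
    return out


merged = {
    "id": "500",
    "code": ["abc", "abc"],
    "group1": {
        "date": ["2020-10-10", "2020-10-10"],
        "group2": [{"category": 1}, {"category": 2}],
    },
}
reduced = apply_cardinality(merged)
```

The wildcard in the Jolt spec plays the role of the generic branches above, so new columns do not need to be listed one by one; only the exceptions (group2) do.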

Apache NiFi - how to create a custom date format

I am new to NiFi and I am trying to create week_start_date and week_number fields from the date in a JSON record.
I am using a Jolt transform.
The input is a Google Ads API response.
This is the spec I use:
[
{
"operation": "shift",
"spec": {
"customer_id": {
"*": "[&].customer_id"
},
"customer_name": {
"*": "[&].customer_name"
},
"account_currency_code": {
"*": "[&].account_currency_code"
},
"campaign_id": {
"*": "[&].campaign_id"
},
"campaign_name": {
"*": "[&].campaign_name"
},
"campaign_status": {
"*": "[&].campaign_status"
},
"ad_group_id": {
"*": "[&].ad_group_id"
},
"ad_group_name": {
"*": "[&].ad_group_name"
},
"clicks": {
"*": "[&].clicks"
},
"cost": {
"*": "[&].cost"
},
"impressions": {
"*": "[&].impressions"
},
"device": {
"*": "[&].device"
},
"date": {
"*": "[&].date"
},
"week_number": {
"*": "[&].week_number"
},
"year": {
"*": "[&].year"
},
"keywords": {
"*": "[&].keywords"
},
"keywords_id": {
"*": "[&].keywords_id"
}
}
},
{
"operation": "modify-default-beta",
"spec": {
"date": {
"date": "=intSubtract(#(1,date))"
}
}
}
]
The expected output should be:
[
{
"customer_id": "2538943578",
"customer_name": "test.com",
"account_currency_code": "USD",
"campaign_id": "11137311251",
"campaign_name": "testers",
"campaign_status": "ENABLED",
"ad_group_id": "1111",
"ad_group_name": "tesst- E",
"clicks": "6",
"cost": "26580000",
"impressions": "40",
"device": "DESKTOP",
"date": "2021-12-01",
"week_number": "48",
"week_start_date": "2021-11-29",
"year": 2021,
"keywords": "test",
"keywords_id": "56357925842"
}
]
the output I have:
[
{
"customer_id": "2538943578",
"customer_name": "test.com",
"account_currency_code": "USD",
"campaign_id": "11137311251",
"campaign_name": "testers",
"campaign_status": "ENABLED",
"ad_group_id": "1111",
"ad_group_name": "tesst- E",
"clicks": "6",
"cost": "26580000",
"impressions": "40",
"device": "DESKTOP",
"date": "2021-12-01",
"week_number": "2021-11-29",
"year": 2021,
"keywords": "test",
"keywords_id": "56357925842"
}
]
I am not sure how to use modify-default-beta correctly.
I also tried looking at the test resources in the Jolt repository:
https://github.com/bazaarvoice/jolt/tree/master/jolt-core/src/test/resources/json/shiftr
What is the right way to understand the structure of a spec?
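For reference, the date arithmetic the question is after can be sketched in plain Python. This is not a Jolt answer: as far as I know, Jolt's modify functions cover string, math, type-conversion and list helpers but no date arithmetic, which is an assumption worth checking against the Jolt version in use. The sketch also assumes ISO-8601 weeks starting on Monday, which matches the expected output above (week 48, week start 2021-11-29); add_week_fields is a hypothetical helper name.

```python
# Illustrative sketch only: derives week_number and week_start_date from an
# ISO date string, assuming ISO-8601 weeks that start on Monday.
from datetime import date, timedelta


def add_week_fields(record):
    day = date.fromisoformat(record["date"])
    _, iso_week, _ = day.isocalendar()               # (ISO year, ISO week, ISO weekday)
    monday = day - timedelta(days=day.weekday())     # weekday() is 0 for Monday
    enriched = dict(record)
    enriched["week_number"] = str(iso_week)          # string, as in the expected output
    enriched["week_start_date"] = monday.isoformat()
    return enriched


row = add_week_fields({"date": "2021-12-01", "year": 2021})
```

In a NiFi flow this logic would have to live outside the Jolt step, for example in a scripted or record-based processor; which one fits depends on the flow and is not something the question specifies.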

Convert sample JSON to nested JSON array using JOLT Transformation

I am having trouble transforming flat JSON into nested JSON using a Jolt transformation, and I am very new to Jolt. The input and expected output are given below.
My input:
[
{
"policyNo": 1,
"lProdCode": 500,
"name": "Prasad",
"id": "10",
"Age": "56"
},
{
"policyNo": 1,
"lProdCode": 500,
"name": "Mahapatra",
"id": "101",
"Age": "56"
},
{
"policyNo": 2,
"lProdCode": 500,
"name": "Pra",
"id": "109",
"Age": "56"
},
{
"policyNo": 3,
"lProdCode": 400,
"name": "Pra",
"id": "108",
"Age": "56"
},
{
"policyNo": 1,
"lProdCode": 500,
"name": "Pra",
"id": "108",
"Age": "56"
}
]
expected output
[
{
"policyNo": 1,
"lProdCode": 500,
"beneficiaries": [
{
"name": "Prasad",
"id": "10900629001",
"Age": "56"
},
{
"name": "Mahapatra",
"id": "10900629001",
"Age": "56"
},
{
"name": "Pra",
"id": "108",
"Age": "56"
}
]
},
{
"policyNo": 2,
"lProdCode": 500,
"beneficiaries": [
{
"name": "Pra",
"id": "10900629001",
"Age": "56"
}
]
},
{
"policyNo": 3,
"lProdCode": 400,
"beneficiaries": [
{
"name": "Pra",
"id": "108",
"Age": "56"
}
]
}
]
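Essentially, you need to group by the policyNo attribute while collecting the attributes other than policyNo and lProdCode into a new list (beneficiaries). A single shift transformation can handle that. Then add three more steps to clean up the rough edges left by the first transformation: removing the null gaps that the [&1] indices leave in beneficiaries, reducing the repeated policyNo and lProdCode values to a single value each, and converting the keyed object back into an array, such as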
[
{
"operation": "shift",
"spec": {
"*": {
"policyNo": "#(1,policyNo).&",
"lProdCode": "#(1,policyNo).&",
"*": "#(1,policyNo).beneficiaries[&1].&"
}
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"*": "=recursivelySquashNulls"
}
},
{
"operation": "cardinality",
"spec": {
"*": {
"policyNo": "ONE",
"lProdCode": "ONE"
}
}
},
{
"operation": "shift",
"spec": {
"*": ""
}
}
]
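The grouping logic behind the spec above can be sketched in plain Python as follows. This is an illustration of the intended result, not Jolt: group_by_policy is a hypothetical name, and the ids in the result are taken from the input as-is (the expected output in the question shows different id values, which no transformation of this input would produce).

```python
# Illustrative sketch only: group rows by policyNo and collect the remaining
# fields of each row into a "beneficiaries" list.

def group_by_policy(rows):
    grouped = {}  # dicts preserve insertion order, so groups keep first-seen order
    for row in rows:
        policy = grouped.setdefault(
            row["policyNo"],
            {"policyNo": row["policyNo"], "lProdCode": row["lProdCode"], "beneficiaries": []},
        )
        beneficiary = {k: v for k, v in row.items() if k not in ("policyNo", "lProdCode")}
        policy["beneficiaries"].append(beneficiary)
    return list(grouped.values())


rows = [
    {"policyNo": 1, "lProdCode": 500, "name": "Prasad", "id": "10", "Age": "56"},
    {"policyNo": 1, "lProdCode": 500, "name": "Mahapatra", "id": "101", "Age": "56"},
    {"policyNo": 2, "lProdCode": 500, "name": "Pra", "id": "109", "Age": "56"},
    {"policyNo": 3, "lProdCode": 400, "name": "Pra", "id": "108", "Age": "56"},
    {"policyNo": 1, "lProdCode": 500, "name": "Pra", "id": "108", "Age": "56"},
]
policies = group_by_policy(rows)
```

Because the beneficiaries are appended rather than written at their original input index, this version never produces the null gaps that the Jolt spec has to squash afterwards.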

jolt - copy or move a key from nested object to the top level

I'm looking for a way to copy or move a key from nested object to the top level
Input:
{
"id": "123",
"name": "foo",
"details": {
"orderNumber": "456789",
"addr": "N st 124",
"date": "2021-01-01"
}
}
desired output:
{
"id": "123",
"name": "foo",
"orderNumber": "456789",
"details": {
"orderNumber": "456789",
"addr": "N st 124",
"date": "2021-01-01"
}
}
or ideally
{
"id": "123",
"name": "foo",
"orderNumber": "456789",
"details": {
"addr": "N st 124",
"date": "2021-01-01"
}
}
The closest I could get is the transformation below, but it converts the details object into an array of values:
[
{
"operation": "shift",
"spec": {
"id": "id",
"name": "name",
"details": {
"orderNumber": "orderNumber",
"*": "details"
}
}
}
]
You're very close to the result; only a slight change is needed: use "&1.&" as the target, so that each remaining attribute is written under details with its own key, such as
[
{
"operation": "shift",
"spec": {
"id": "id",
"name": "name",
"details": {
"orderNumber": "orderNumber",
"*": "&1.&"
}
}
}
]
With this change the keys under details are preserved instead of their values being collected into an array.
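For comparison, the "ideal" output from the question (orderNumber moved to the top level and removed from details) corresponds to the following plain-Python operation. It is only a sketch of the intended result, not Jolt; promote_key is a hypothetical helper name.

```python
# Illustrative sketch only: move one key from a nested object to the top level.
import copy


def promote_key(doc, parent, key):
    out = copy.deepcopy(doc)          # leave the caller's document untouched
    out[key] = out[parent].pop(key)   # remove from the nested object, add at the top
    return out


order = {
    "id": "123",
    "name": "foo",
    "details": {"orderNumber": "456789", "addr": "N st 124", "date": "2021-01-01"},
}
promoted = promote_key(order, "details", "orderNumber")
```

Copying instead of moving (the first desired output) would just replace the pop with a plain lookup.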

Jolt transformation array

I have this JSON for input:
{
"id": 1031435,
"event_id": "Formula_257",
"formula_id": 257,
"ts_start": 1583164200084000,
"ts_end": 1583164484960000,
"type": "formula",
"details": {
"6aa0734f-6d6a-4b95-8a2b-2dde346f9df7": {
"PowerActiveTriPhase": 183836912
}
},
"ack_ts": null,
"ack_user": null
}
and I need to get this kind of output:
{
"id": 1031435,
"event_id": "Formula_257",
"formula_id": 257,
"ts_start": 1583164200084000,
"ts_end": 1583164484960000,
"type": "formula",
"equipment_id":"6aa0734f-6d6a-4b95-8a2b-2dde346f9df7",
"parameter":"PowerActiveTriPhase",
"value":183836912,
"ack_ts": null,
"ack_user": null
}
What kind of spec do I need to use?
Thanks a lot!
This should work; "$" writes the matched key name and "@" writes the value at the current level:
[
{
"operation": "shift",
"spec": {
"id": "id",
"event_id": "event_id",
"formula_id": "formula_id",
"ts_start": "ts_start",
"ts_end": "ts_end",
"type": "type",
"details": {
"*": {
"$": "equipment_id",
"*": {
"$": "parameter",
"@": "value"
}
}
},
"ack_ts": "ack_ts",
"ack_user": "ack_user"
}
}
]
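The spec above is meant to lift the two nested key names and the leaf value into flat fields. A plain-Python sketch of that result, under the assumption that details holds exactly one equipment id with exactly one parameter (as in the sample), could look like this; flatten_details is a hypothetical name and this is not Jolt.

```python
# Illustrative sketch only: flatten {"details": {<equipment>: {<parameter>: <value>}}}
# into equipment_id / parameter / value fields.

def flatten_details(event):
    out = {k: v for k, v in event.items() if k != "details"}
    for equipment_id, parameters in event["details"].items():
        for parameter, value in parameters.items():
            # With several entries the last one would win here; the sample has one.
            out["equipment_id"] = equipment_id
            out["parameter"] = parameter
            out["value"] = value
    return out


event = {
    "id": 1031435,
    "event_id": "Formula_257",
    "formula_id": 257,
    "ts_start": 1583164200084000,
    "ts_end": 1583164484960000,
    "type": "formula",
    "details": {
        "6aa0734f-6d6a-4b95-8a2b-2dde346f9df7": {"PowerActiveTriPhase": 183836912}
    },
    "ack_ts": None,
    "ack_user": None,
}
flat = flatten_details(event)
```

If details can contain more than one equipment or parameter, the flat shape no longer fits and each combination would need its own output record.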