Transform JSON-JSON JOLT - json

I am quite new to JOLT and I need to transform my JSON files to the desired schema. This is my input:
[
{
"PK": 12345,
"FULL_NAME":"Amit Prakash",
"BIRTHDATE":"1987-05-25",
"SEX":"M",
"EMAIL": "amprak#mail.com",
"PHONE": "809386731",
"TS":"2015-11-19 14:36:34.0"
},
{
"PK": 12654,
"FULL_NAME": "Rohit Dhand",
"BIRTHDATE":"1979-02-01",
"SEX":"M",
"EMAIL": "rodha#mail.com",
"PHONE": "937013861",
"TS":"2015-11-20 11:03:02.6"
},
...
]
and this is my desired output:
{
"records": [
{
"attribs": [{
"type": "customer",
"reference": "CUST"
}],
"name": "Amit Prakash",
"personal_email": "amprak#mail.com",
"mobile": "809386731",
"id": 12345
},
{
"attribs": [{
"type": "customer",
"reference": "CUST"
}],
"name": "Rohit Dhand",
"personal_email": "rodha#mail.com",
"mobile": "937013861",
"id": 12654
},
...
]
}
So far, I have only managed up to this point:
[
{
"operation": "remove",
"spec": {
"*": {
"BIRTHDATE": "",
"SEX": "",
"TS": ""
}
}
},
{
"operation": "shift",
"spec": {
"*": "records"
}
}
]
But I can't go on from here. I don't know how to rename keys in the output.
Also, what's the alternative to the remove operation? remove works well when you have fewer keys to exclude than to include, but what about the reverse (a few keys to include and many more to exclude within a JSON object)?

Spec
[
{
"operation": "shift",
"spec": {
"*": {
"PK": "records[&1].id",
"PHONE": "records[&1].mobile",
"EMAIL": "records[&1].personal_email",
"FULL_NAME": "records[&1].name"
}
}
},
{
"operation": "default",
"spec": {
"records[]": {
"*": {
"attribs[]": {
"0": {
"type": "customer",
"reference": "CUST"
}
}
}
}
}
}
]
Shift makes a copy of your data, whereas the other operations do not. Thus, one way to remove things is simply not to copy them across in your shift spec.
Generally, remove is used to get rid of things that would "mess up" the shift.
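To make the "just don't copy it" idea concrete, here is a rough plain-Python sketch of what the shift and default steps above produce. It is not Jolt, only an illustration of the same whitelist-and-rename logic; RENAMES and to_records are names made up for this example.

```python
# Plain-Python illustration of the shift + default steps (not Jolt itself).
# Only keys listed in RENAMES are copied, so BIRTHDATE, SEX and TS drop out
# without any separate "remove" step.
RENAMES = {
    "PK": "id",
    "PHONE": "mobile",
    "EMAIL": "personal_email",
    "FULL_NAME": "name",
}


def to_records(rows):
    records = []
    for row in rows:
        # shift step: copy only the whitelisted keys, under their new names
        record = {new: row[old] for old, new in RENAMES.items() if old in row}
        # default step: add the constant attribs block if it is not there yet
        record.setdefault("attribs", [{"type": "customer", "reference": "CUST"}])
        records.append(record)
    return {"records": records}


rows = [
    {
        "PK": 12345,
        "FULL_NAME": "Amit Prakash",
        "BIRTHDATE": "1987-05-25",
        "SEX": "M",
        "EMAIL": "amprak#mail.com",
        "PHONE": "809386731",
        "TS": "2015-11-19 14:36:34.0",
    }
]
result = to_records(rows)
print(result)
```

The point of the sketch is the whitelist: anything not named in the mapping never reaches the output, which is the answer to the "few keys to include" case.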

Related

Jolt Spec for Json that may or may not have Array

Any help is greatly appreciated.
I have input JSON in which Phone can be an array, an empty array, or missing entirely.
[
{
"Name": "abc",
"Phone": [
{
"office-1": "123",
"home-1": "989"
},
{
"office-1": "456",
"home-1": "999"
}
],
"Email": "abc#123.com"
},
{
"Name": "efg",
"Phone": [],
"Email": "efg#123.com"
},
{
"Name": "xyz",
"Email": "xyz#123.com"
}
]
My Jolt spec can already convert the Phone array, but it does not work when the Phone key is missing from the JSON input.
Expected output:
[
{
"Name": "abc",
"office-1": "123",
"home-1": "989",
"Email": "abc#123.com"
},
{
"Name": "abc",
"office-1": "456",
"home-1": "999",
"Email": "abc#123.com"
},
{
"Name": "efg",
"Email": "efg#123.com"
},
{
"Name": "xyz",
"Email": "xyz#123.com"
}
]
Please help
You can walk through the objects while handling the attributes under the Phone node separately from the others, such as
[
{
"operation": "shift",
"spec": {
"*": {
"Phone": {
"*": {
"*": "&3.&1.&",
"@(2,Name)": "&3.&1.Name",
"@(2,Email)": "&3.&1.Email"
}
},
"*": "&1.&1.&" // "else" case
}
}
},
{
// get rid of object labels
"operation": "shift",
"spec": {
"*": {
"*": ""
}
}
},
{
// get rid of duplicated values of some attributes
"operation": "cardinality",
"spec": {
"*": {
"*": "ONE"
}
}
}
]
You can try this spec on the demo site http://jolt-demo.appspot.com/.
new sample of data where I have kept non array data in between array data
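As a cross-check on the expected output in the question, the same flattening can be written in plain Python. This is only a sketch of the intended result, not a translation of how Jolt evaluates the spec; flatten_phones is a name made up for this example.

```python
def flatten_phones(people):
    """Emit one row per Phone entry, or one row per person when there is none."""
    out = []
    for person in people:
        # Everything except Phone is shared by all rows produced for this person.
        base = {key: value for key, value in person.items() if key != "Phone"}
        # A missing Phone key and an empty Phone array are treated the same way.
        phones = person.get("Phone") or []
        if not phones:
            out.append(base)
            continue
        for phone in phones:
            out.append({**base, **phone})
    return out


people = [
    {
        "Name": "abc",
        "Phone": [
            {"office-1": "123", "home-1": "989"},
            {"office-1": "456", "home-1": "999"},
        ],
        "Email": "abc#123.com",
    },
    {"Name": "efg", "Phone": [], "Email": "efg#123.com"},
    {"Name": "xyz", "Email": "xyz#123.com"},
]
flat = flatten_phones(people)
print(flat)
```

Comparing the Jolt output against a reference like this makes it easier to see whether the missing-Phone and empty-Phone cases are both covered.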

Jolt - Merge two arrays and rename fields

I have a problem writing objects to an array. Basically, I want to merge the arrays and rename the fields while keeping the objects separate.
My input json looks like this:
{
"board":[
{
"role":"Head of board",
"id":"111",
"name":"John Snow"
}
],
"leaders":[
{
"role":"Accounting leader",
"id":"222",
"name":"Amanda Johns"
},
{
"role":"HR leader",
"id":"333",
"name":"Frank Smith"
}
]
}
This is my spec: (I am aware that the values in brackets are probably not right)
[
{
"operation":"shift",
"spec":{
"board":{
"*":{
"id":"employees.bosses[#2].emp_num",
"role":"employees.bosses[#2].position",
"name":"employees.bosses[#2].name"
}
},
"leaders":{
"*":{
"id":"employees.bosses[#2].emp_num",
"role":"employees.bosses[#2].position",
"name":"employees.bosses[#2].name"
}
}
}
}
]
and this is my output:
{
"employees": {
"bosses": [ {
"emp_num": ["111", "222"],
"position": ["Head of board", "Accounting leader"],
"name": ["John Snow", "Amanda Johns"]
}, {
"emp_num": "333",
"position": "HR leader",
"name": "Frank Smith"
} ]
}
}
while I expect output that looks like this:
{
"employees": {
"bosses": [ {
"emp_num": "111",
"position": "Head of board",
"name": "John Snow"
}, {
"emp_num": "222",
"position": "Accounting leader",
"name": "Amanda Johns"
}, {
"emp_num": "333",
"position": "HR leader",
"name": "Frank Smith"
} ]
}
}
I have major trouble understanding what to do and how [#n] works. I would really appreciate any help with fixing my spec and an explanation of why this does or does not work!
You need to keep the indexes of the two arrays distinct while combining them. To do that, put the first array under an extra key in the first shift operation (here I chose a, as in a.[&1]) while renaming all keys as desired, then move everything under employees.bosses in a second shift operation, such as
[
{
"operation": "shift",
"spec": {
"board": {
"*": {
"id": "a.[&1].emp_num",
"role": "a.[&1].position",
"name": "a.[&1].name"
}
},
"leaders": {
"*": {
"id": "&1.emp_num",
"role": "&1.position",
"name": "&1.name"
}
}
}
},
{
"operation": "shift",
"spec": {
"*": "employees.bosses"
}
}
]
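The output in the question shows 111 and 222 landing in the same slot, that is, both arrays started writing at index 0. The target is a plain concatenation of the two arrays with renamed keys, which the following Python sketch spells out. It is an illustration of the desired result rather than of Jolt's internals; RENAMES and merge_bosses are hypothetical names.

```python
RENAMES = {"id": "emp_num", "role": "position", "name": "name"}


def merge_bosses(doc):
    """Concatenate board and leaders into one list, renaming keys on the way."""
    bosses = []
    for group in ("board", "leaders"):  # board entries come first
        for person in doc.get(group, []):
            bosses.append(
                {new: person[old] for old, new in RENAMES.items() if old in person}
            )
    return {"employees": {"bosses": bosses}}


doc = {
    "board": [{"role": "Head of board", "id": "111", "name": "John Snow"}],
    "leaders": [
        {"role": "Accounting leader", "id": "222", "name": "Amanda Johns"},
        {"role": "HR leader", "id": "333", "name": "Frank Smith"},
    ],
}
merged = merge_bosses(doc)
print(merged)
```

Each source element becomes exactly one output element, so no two people can share an index.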

Shift JOLT transformation - facing problem with below transformation

I'm trying to flatten the column names and their values in the input JSON below while retaining all the metadata.
Below is the input JSON for my CDC use case.
{
"type": "update",
"timestamp": 1558346256000,
"binlog_filename": "mysql-bin-changelog.000889",
"binlog_position": 635,
"database": "books",
"table_name": "publishers",
"table_id": 111,
"columns": [
{
"id": 1,
"name": "id",
"column_type": 4,
"last_value": 2,
"value": 2
},
{
"id": 2,
"name": "name",
"column_type": 12,
"last_value": "Suresh",
"value": "Suresh123"
},
{
"id": 3,
"name": "email",
"column_type": 12,
"last_value": "Suresh#yahoo.com",
"value": "Suresh#yahoo.com"
}
]
}
Below is the expected output json
[
{
"type": "update",
"timestamp": 1558346256000,
"binlog_filename": "mysql-bin-changelog.000889",
"binlog_position": 635,
"database": "books",
"table_name": "publishers",
"table_id": 111,
"columns": {
"id": "2",
"name": "Suresh123",
"email": "Suresh#yahoo.com"
}
}
]
I tried the spec below, which retrieves the columns object but not the rest of the metadata.
[
{
"operation": "shift",
"spec": {
"columns": {
"*": {
"@(value)": "[#1].@(1,name)"
}
}
}
}
]
Any leads would be very much appreciated.
I worked out the JOLT spec for the above transformation. I'm posting it here in case anyone stumbles upon something like this.
[
{
"operation": "shift",
"spec": {
"columns": {
"*": {
"@(value)": "columns.@(1,name)"
}
},
"*": "&"
}
}
]
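For comparison, here is roughly what that spec computes, written in plain Python. This is a sketch under assumptions: flatten_columns is a made-up name, the event is shortened, and the function keeps the original value types and returns a single object, whereas the expected output in the question shows id as a string and wraps the object in an array.

```python
def flatten_columns(event):
    # Pass every top-level key through unchanged, except columns.
    out = {key: value for key, value in event.items() if key != "columns"}
    # Columns rule: use each column's name as the key and its value as the value.
    out["columns"] = {col["name"]: col["value"] for col in event.get("columns", [])}
    return out


event = {
    "type": "update",
    "table_name": "publishers",
    "columns": [
        {"id": 1, "name": "id", "column_type": 4, "last_value": 2, "value": 2},
        {
            "id": 2,
            "name": "name",
            "column_type": 12,
            "last_value": "Suresh",
            "value": "Suresh123",
        },
    ],
}
flat = flatten_columns(event)
print(flat)
```

The two halves correspond to the two rules in the spec: one for the columns array and one catch-all for the metadata, which is the part the first attempt was missing.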

Unable to form the JOLT schema to transform JSON in NiFi

I am trying to use the jolt JSON to JSON transformation in Apache NiFi. I want to transform one JSON into another format.
Here is my original JSON:
{
"total_rows": 5884,
"offset": 0,
"rows": [
{
"id": "03888c0ab40c32451a018be6b409eba3",
"key": "03888c0ab40c32451a018be6b409eba3",
"value": {
"rev": "1-d5cc089dd8682422962ccab4f24bd21b"
},
"doc": {
"_id": "03888c0ab40c32451a018be6b409eba3",
"_rev": "1-d5cc089dd8682422962ccab4f24bd21b",
"topic": "iot-2/type/home-iot/id/1234/evt/temp/fmt/json",
"payload": {
"temperature": 36
},
"deviceId": "1234",
"deviceType": "home-iot",
"eventType": "temp",
"format": "json"
}
},
{
"id": "03888c0ab40c32451a018be6b409f163",
"key": "03888c0ab40c32451a018be6b409f163",
"value": {
"rev": "1-dee82cbb1b5ffa8a5e974135eb6340c5"
},
"doc": {
"_id": "03888c0ab40c32451a018be6b409f163",
"_rev": "1-dee82cbb1b5ffa8a5e974135eb6340c5",
"topic": "iot-2/type/home-iot/id/1234/evt/temp/fmt/json",
"payload": {
"temperature": 22
},
"deviceId": "1234",
"deviceType": "home-iot",
"eventType": "temp",
"format": "json"
}
}
]
}
I want this to be transformed into the following JSON:
[
{
"temperature":36,
"deviceId":"1234",
"deviceType":"home-iot",
"eventType":"temp"
},
{
"temperature":22,
"deviceId":"1234",
"deviceType":"home-iot",
"eventType":"temp"
}
]
This is what my spec looks like:
[
{
"operation": "shift",
"spec": {
"rows": {
"*": {
"doc": {
"deviceId": "[&1].deviceId",
"deviceType": "[&1].deviceType",
"eventType": "[&1].eventType",
"payload": {
"*": "[&1]"
}
}
}
}
}
}
]
I keep getting a null response. I am new to this and the documentation is not very easy to comprehend. Can somebody please help?
Because you are "down" one more level after the array index, by the time you get to deviceId you are 2 levels away from the index. Replace all the &1s with &2 except for payload. In that case you are another level "down" so you'll want to use &3 for the index. You also need to take whatever is matched by the * (temperature, e.g.) and set the outgoing field name to the same thing, by using & after the array index. Here's the resulting spec:
[
{
"operation": "shift",
"spec": {
"rows": {
"*": {
"doc": {
"deviceId": "[&2].deviceId",
"deviceType": "[&2].deviceType",
"eventType": "[&2].eventType",
"payload": {
"*": "[&3].&"
}
}
}
}
}
}
]
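The level counting can be checked with a small Python sketch that treats the keys matched on the way down as a list and looks n steps back from the end. This is only a model of the counting rule described above, not Jolt code; amp is a hypothetical helper name.

```python
def amp(path, n):
    """Key matched n levels up from the current position; &0 is the current key."""
    return path[len(path) - 1 - n]


# Keys matched on the way down to deviceId in the first row.
device_path = ["rows", "0", "doc", "deviceId"]
# Keys matched on the way down to the payload wildcard in the first row.
temp_path = ["rows", "0", "doc", "payload", "temperature"]

print(amp(device_path, 1))  # doc
print(amp(device_path, 2))  # 0
print(amp(temp_path, 3))    # 0
print(amp(temp_path, 0))    # temperature
```

With &1 at the deviceId level the lookup lands on doc rather than on the array index, which is why the original spec did not produce the expected rows.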

Jolt: Merge arrays from properties

I'm using the JOLT JSON transformation library and trying to extract and merge objects from an array contained in some (but not all) of my input elements.
Also, the arrays I'm trying to merge contain objects that don't always have the same properties. One key might be present in some, but not others.
The example is a contrived, nonsensical simplification, but it has the general shape of our data.
Input:
{
"Widgets": [
{
"Id": "1",
"PetFriendly": "True",
"Features": [
{
"Name": "Easy Button",
"Type": "Button"
},
{
"Name": "Lunch Lever",
"Type": "Food Service",
"MenuItems": [
"Pizza",
"Cheezburger"
]
}
]
},
{
"Id": "2",
"PetFriendly": "True"
},
{
"Id": "3",
"PetFriendly": "False",
"Features": [
{
"Name": "Missles",
"Type": "Attack"
}
]
},
{
"Id": "4",
"PetFriendly": "False",
"Features": [
{
"Name": "Bombs",
"Type": "Attack",
"MenuItems": [
"Rat Poison"
]
}
]
}
]
}
Desired output:
{
"Widgets": [
{
"Id": "1",
"PetFriendly": "True"
},
{
"Id": "2",
"PetFriendly": "True"
},
{
"Id": "3",
"PetFriendly": "False"
},
{
"Id": "4",
"PetFriendly": "False"
}
],
"Features": [
{
"WidgetId": "1",
"Name": "Easy Button",
"Type": "Button"
},
{
"WidgetId": "1",
"Name": "Lunch Lever",
"Type": "Food Service",
"MenuItems": [
"Pizza",
"Cheezburger"
]
},
{
"WidgetId": "3",
"Name": "Missles",
"Type": "Attack"
},
{
"WidgetId": "4",
"Name": "Bombs",
"Type": "Attack",
"MenuItems": [
"Rat Poison"
]
}
]
}
I have tried many transforms with no success, and read all the ShiftR documentation and its unit tests. A little help?
Spec
[
{
"operation": "shift",
"spec": {
"Widgets": {
"*": {
// build the finished "Widgets" output
"Id": "Widgets[&1].Id",
"PetFriendly": "Widgets[&1].PetFriendly",
//
// Process the Features by pushing the Id
// down into them, but maintain the same doubly
// nested structure.
// Shift works property by property, so first
// fix the properties inside each Features element
// (pulling the Id down).
// Then a 2nd shift can accumulate things into an array.
"Features": {
"*": {
"@(2,Id)": "temp[&3].Features[&1].WidgetId",
"*": "temp[&3].Features[&1].&"
}
}
}
}
}
},
{
"operation": "shift",
"spec": {
// passthru
"Widgets": "Widgets",
"temp": {
"*": {
"Features": {
// walk through the doubly nested structure and
// now accumulate all non-null items into
// the final Features array.
"*": "Features[]"
}
}
}
}
}
]
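The desired result of the two shifts above can be stated compactly in plain Python, which may help when checking the Jolt output. This is a sketch of the target shape only, with a shortened input; split_widgets is a name invented for this example.

```python
def split_widgets(doc):
    """Separate widgets from their features, tagging each feature with its widget Id."""
    widgets, features = [], []
    for widget in doc["Widgets"]:
        widgets.append({"Id": widget["Id"], "PetFriendly": widget["PetFriendly"]})
        # Widgets without a Features key simply contribute nothing.
        for feature in widget.get("Features", []):
            # Copy the feature as-is, so optional keys such as MenuItems
            # appear only where the input had them.
            features.append({"WidgetId": widget["Id"], **feature})
    return {"Widgets": widgets, "Features": features}


doc = {
    "Widgets": [
        {
            "Id": "1",
            "PetFriendly": "True",
            "Features": [
                {"Name": "Easy Button", "Type": "Button"},
                {
                    "Name": "Lunch Lever",
                    "Type": "Food Service",
                    "MenuItems": ["Pizza", "Cheezburger"],
                },
            ],
        },
        {"Id": "2", "PetFriendly": "True"},
    ]
}
split = split_widgets(doc)
print(split)
```

Because each feature is copied whole, no default value is needed for MenuItems and no empty arrays appear in the result.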
I finally got it working with the spec below, BUT it has an undesirable side effect: it leaves empty default arrays. Is there a way to remove empty arrays, or otherwise mark them during the default step so they can be deleted? I checked this GitHub issue, but I'm not sure how to translate it to arrays of strings. Does anyone have a better solution?
[
// First fill in default value for "MenuItems" since not all Features have it.
{
"operation": "default",
"spec": {
"Widgets[]": {
"*": {
"Features[]": {
"*": {
"MenuItems": []
}
}
}
}
}
},
{
// Extract the Features' properties into arrays. The defaults added above ensure that we can merge the arrays into Feature objects as in this example:
// https://github.com/bazaarvoice/jolt/blob/master/jolt-core/src/test/resources/json/shiftr/mergeParallelArrays2_and-do-not-transpose.json.
"operation": "shift",
"spec": {
"Widgets": {
"*": {
"Id": "Widgets[&1].Id",
"PetFriendly": "Widgets[&1].PetFriendly",
"Features": {
"*": {
"@(2,Id)": "temp.WidgetId",
"Name": "temp.Name",
"Type": "temp.Type",
"MenuItems": "temp.MenuItems[]"
}
}
}
}
}
},
// Finally merge the arrays into Feature objects.
{
"operation": "shift",
"spec": {
"Widgets": "Widgets",
"temp": {
"WidgetId": {
"*": "Features[&0].WidgetId"
},
"Name": {
"*": "Features[&0].Name"
},
"Type": {
"*": "Features[&0].Type"
},
"MenuItems": {
"*": "Features[&0].MenuItems"
}
}
}
}
]
Result:
{
"Widgets": [
{
"Id": "1",
"PetFriendly": "True"
},
{
"Id": "2",
"PetFriendly": "True"
},
{
"Id": "3",
"PetFriendly": "False"
},
{
"Id": "4",
"PetFriendly": "False"
}
],
"Features": [
{
"WidgetId": "1",
"Name": "Easy Button",
"Type": "Button",
"MenuItems": []
},
{
"WidgetId": "1",
"Name": "Lunch Lever",
"Type": "Food Service",
"MenuItems": [ "Pizza", "Cheezburger" ]
},
{
"WidgetId": "3",
"Name": "Missles",
"Type": "Attack",
"MenuItems": []
},
{
"WidgetId": "4",
"Name": "Bombs",
"Type": "Attack",
"MenuItems": [ "Rat Poison" ]
}
]
}