I'm trying to flatten the necessary column names and their values from the input JSON below while retaining all of the metadata.
Below is the input JSON that I have for my CDC use case.
{
"type": "update",
"timestamp": 1558346256000,
"binlog_filename": "mysql-bin-changelog.000889",
"binlog_position": 635,
"database": "books",
"table_name": "publishers",
"table_id": 111,
"columns": [
{
"id": 1,
"name": "id",
"column_type": 4,
"last_value": 2,
"value": 2
},
{
"id": 2,
"name": "name",
"column_type": 12,
"last_value": "Suresh",
"value": "Suresh123"
},
{
"id": 3,
"name": "email",
"column_type": 12,
"last_value": "Suresh#yahoo.com",
"value": "Suresh#yahoo.com"
}
]
}
Below is the expected output json
[
{
"type": "update",
"timestamp": 1558346256000,
"binlog_filename": "mysql-bin-changelog.000889",
"binlog_position": 635,
"database": "books",
"table_name": "publishers",
"table_id": 111,
"columns": {
"id": "2",
"name": "Suresh123",
"email": "Suresh#yahoo.com"
}
}
]
I tried the spec below, with which I'm able to retrieve the columns object but not the rest of the metadata.
[
{
"operation": "shift",
"spec": {
"columns": {
"*": {
"#(value)": "[#1].#(1,name)"
}
}
}
}
]
Any leads would be very much appreciated.
I figured out the JOLT spec for the above transformation. I'm posting it here in case anyone stumbles upon something like this. The extra "*": "&" entry is what copies all of the remaining metadata fields across unchanged.
[
{
"operation": "shift",
"spec": {
"columns": {
"*": {
"#(value)": "columns.#(1,name)"
}
},
"*": "&"
}
}
]
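For anyone who wants to run the above spec programmatically, here is a minimal sketch using the Bazaarvoice Jolt library (com.bazaarvoice.jolt); the classpath resource names spec.json and input.json are just placeholders for the spec and the CDC event shown above.

import com.bazaarvoice.jolt.Chainr;
import com.bazaarvoice.jolt.JsonUtils;

import java.util.List;

public class CdcFlattenExample {
    public static void main(String[] args) {
        // Placeholder classpath resources: spec.json holds the chainr spec above,
        // input.json holds the CDC event to be flattened.
        List<Object> spec = JsonUtils.classpathToList("/spec.json");
        Object input = JsonUtils.classpathToMap("/input.json");

        // Build the chain of operations from the spec and apply it to the input.
        Chainr chainr = Chainr.fromSpec(spec);
        Object output = chainr.transform(input);

        System.out.println(JsonUtils.toJsonString(output));
    }
}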
My input JSON is like
{
"common": {
"name": "abc"
},
"details": {
"id": 4,
"node": [
{
"name": "node1",
"array2": []
},
{
"name": "node2",
"array2": [
{
"name": "node2_a2_1"
}
]
},
{
"name": "node3",
"array2": [
{
"name": "node3_a2_1"
},
{
"name": "node3_a2_2"
},
{
"name": "node3_a2_3"
}
]
}
]
}
}
What I want: for each leaf node in array2 (e.g. {"name": "node3_a2_1"}), traverse towards the root, collect all of the common items, and produce an array of JSON objects without any nesting. So the output I want looks like this:
[
{
"common_name": "abc",
"id": 4,
"node_name": "node2",
"name": "node2_a2_1"
},
{
"common_name": "abc",
"id": 4,
"node_name": "node3",
"name": "node3_a2_1"
},
{
"common_name": "abc",
"id": 4,
"node_name": "node3",
"name": "node3_a2_2"
},
{
"common_name": "abc",
"id": 4,
"node_name": "node3",
"name": "node3_a2_3"
}
]
Could you please suggest how I can do that?
You can walk through array2 while picking the values of each element from their original locations in the JSON.
For example, traverse four levels up the tree to grab the value of id by using @(4,id), and use @(3,name)[&1] as a common grouping identifier for the attributes of each leaf object; the second shift then discards that grouping and collects everything into a flat array, such as
[
{
"operation": "shift",
"spec": {
"details": {
"node": {
"*": {
"array2": {
"*": {
"#(5,common.name)": "#(3,name)[&1].common_name",
"#(4,id)": "#(3,name)[&1].id",
"#(2,name)": "#(3,name)[&1].node_name",
"name": "#(3,name)[&1].&"
}
}
}
}
}
}
},
{
"operation": "shift",
"spec": {
"*": {
"*": ""
}
}
}
]
I want to use a JOLT transform to convert a dictionary into a JSON object. I demonstrate it with the example below.
At the root there is an array that contains the results from different computers: "Computer1", "Computer2", and so on.
This structure should stay the same. I have replaced the second and subsequent array elements, which repeat the same shape, with {...}.
Given Object:
[
{
"name": "Computer1",
"events": {
"counts": [
{
"countType": "CRITICAL",
"count": 5
},
{
"countType": "HIGH",
"count": 12
},
{
"countType": "LOW",
"count": 40
}
]
},
"processes": {
"counts": [
{
"countType": "CRITICAL",
"count": 0
},
{
"countType": "HIGH",
"count": 2
},
{
"countType": "LOW",
"count": 80
}
]
}
},
{
"name": "Computer2",
"events": {...},...
}
]
Desired Output:
[
{
"name": "Computer1",
"events": {
"CRITICAL": 5,
"HIGH": 12,
"LOW": 40
},
"processes": {
"CRITICAL": 0,
"HIGH": 2,
"LOW": 80
}
}
, {
"name": "Computer2",
"events": {...},
...
}
]
Please help to identify the right JOLT spec.
Thanks in advance
Marcus
You can use a shift transformation spec like the one below. The pipe in "events|processes" acts as OR logic and matches either key, while "&4" and "&3" go four and three levels up the tree to grab the index of the outermost array element and the name of the matched object ("events" or "processes"), respectively. The second shift then drops that index-based grouping and collects the results back into an array:
[
{
"operation": "shift",
"spec": {
"*": {
"name": "&1.&",
"events|processes": { // pipe represents "OR" logic
"counts": {
"*": {
"#count": "&4.&3.#countType" // "&4" and "&3" represent going four and three level up the tree to grab the indices of the outermost level list and the key name of the objects("events" and "processes") respectively
}
}
}
}
}
},
{
"operation": "shift",
"spec": {
"*": ""
}
}
]
I am trying to apply a JOLT transformation to the data below.
input:
[
{
"id": "500",
"code": "abc",
"date": "2020-10-10",
"category": 1,
"amount": 100,
"result": 0
},
{
"id": "500",
"code": "abc",
"date": "2020-10-10",
"category": 2,
"amount": 200,
"result": 1
}
]
jolt used:
[
{
"operation": "shift",
"spec": {
"*": {
"id": "#(1,id).id",
"code": "#(1,id).code",
"date": "#(1,id).group1.date",
"category": "#(1,id).group1.group2[&1].category"
}
}
},
{
"operation": "cardinality",
"spec": {
"*": {
"id": "ONE"
}
}
},
{
"operation": "shift",
"spec": {
"*": ""
}
}
]
current output:
{
"id": "500",
"code": [
"abc",
"abc"
],
"group1": {
"date": [
"2020-10-10",
"2020-10-10"
],
"group2": [
{
"category": 1
},
{
"category": 2
}
]
}
}
expected:
{
"id": "500",
"code": "abc",
"group1": {
"date": "2020-10-10",
"group2": [
{
"category": 1
},
{
"category": 2
}
]
}
}
If I add the code and date columns to the cardinality operation, it works fine. But in my use case there are many more such columns to add. Is there a better way to handle this scenario?
You can list each nested node explicitly and use the "*" wildcard to represent the rest of the attributes within the cardinality transformation, such as
{
"operation": "cardinality",
"spec": {
"*": {
"*": "ONE",
"group1": {
"*": "ONE",
"group2": "MANY"
}
}
}
}
where "group2": "MANY" will make group2 to be excepted for extracting only the first element of the respective list.
the demo on the site http://jolt-demo.appspot.com/ :
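As a quick illustration of the ONE vs MANY semantics, here is a minimal sketch that applies the cardinality operation directly through the CardinalityTransform class of the Bazaarvoice Jolt library (the field names are trimmed down from the example above):

import com.bazaarvoice.jolt.CardinalityTransform;
import com.bazaarvoice.jolt.JsonUtils;

public class CardinalityOneVsMany {
    public static void main(String[] args) {
        // "ONE" collapses a list to its first element; "MANY" keeps the list as-is.
        Object spec = JsonUtils.jsonToMap(
                "{ \"date\": \"ONE\", \"group2\": \"MANY\" }");
        Object input = JsonUtils.jsonToMap(
                "{ \"date\": [\"2020-10-10\", \"2020-10-10\"],"
              + "  \"group2\": [ { \"category\": 1 }, { \"category\": 2 } ] }");

        Object output = new CardinalityTransform(spec).transform(input);

        // Expected: {"date":"2020-10-10","group2":[{"category":1},{"category":2}]}
        System.out.println(JsonUtils.toJsonString(output));
    }
}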
I am trying to use the jolt JSON to JSON transformation in Apache NiFi. I want to transform one JSON into another format.
Here is my original JSON:
{
"total_rows": 5884,
"offset": 0,
"rows": [
{
"id": "03888c0ab40c32451a018be6b409eba3",
"key": "03888c0ab40c32451a018be6b409eba3",
"value": {
"rev": "1-d5cc089dd8682422962ccab4f24bd21b"
},
"doc": {
"_id": "03888c0ab40c32451a018be6b409eba3",
"_rev": "1-d5cc089dd8682422962ccab4f24bd21b",
"topic": "iot-2/type/home-iot/id/1234/evt/temp/fmt/json",
"payload": {
"temperature": 36
},
"deviceId": "1234",
"deviceType": "home-iot",
"eventType": "temp",
"format": "json"
}
},
{
"id": "03888c0ab40c32451a018be6b409f163",
"key": "03888c0ab40c32451a018be6b409f163",
"value": {
"rev": "1-dee82cbb1b5ffa8a5e974135eb6340c5"
},
"doc": {
"_id": "03888c0ab40c32451a018be6b409f163",
"_rev": "1-dee82cbb1b5ffa8a5e974135eb6340c5",
"topic": "iot-2/type/home-iot/id/1234/evt/temp/fmt/json",
"payload": {
"temperature": 22
},
"deviceId": "1234",
"deviceType": "home-iot",
"eventType": "temp",
"format": "json"
}
}
]
}
I want this to be transformed in the following JSON:
[
{
"temperature":36,
"deviceId":"1234",
"deviceType":"home-iot",
"eventType":"temp"
},
{
"temperature":22,
"deviceId":"1234",
"deviceType":"home-iot",
"eventType":"temp"
}
]
This is what my spec looks like:
[
{
"operation": "shift",
"spec": {
"rows": {
"*": {
"doc": {
"deviceId": "[&1].deviceId",
"deviceType": "[&1].deviceType",
"eventType": "[&1].eventType",
"payload": {
"*": "[&1]"
}
}
}
}
}
}
]
I keep getting a null response. I am new to this and the documentation is not very easy to comprehend. Can somebody please help?
Because you are "down" one more level after the array index, by the time you get to deviceId you are 2 levels away from the index. Replace all the &1s with &2 except for payload. In that case you are another level "down" so you'll want to use &3 for the index. You also need to take whatever is matched by the * (temperature, e.g.) and set the outgoing field name to the same thing, by using & after the array index. Here's the resulting spec:
[
{
"operation": "shift",
"spec": {
"rows": {
"*": {
"doc": {
"deviceId": "[&2].deviceId",
"deviceType": "[&2].deviceType",
"eventType": "[&2].eventType",
"payload": {
"*": "[&3].&"
}
}
}
}
}
}
]
I am quite new to JOLT and I need to transform my JSON files to the desired schema. This is my input
[
{
"PK": 12345,
"FULL_NAME":"Amit Prakash",
"BIRTHDATE":"1987-05-25",
"SEX":"M",
"EMAIL": "amprak#mail.com",
"PHONE": "809386731",
"TS":"2015-11-19 14:36:34.0"
},
{
"PK": 12654,
"FULL_NAME": "Rohit Dhand",
"BIRTHDATE":"1979-02-01",
"SEX":"M",
"EMAIL": "rodha#mail.com",
"PHONE": "937013861",
"TS":"2015-11-20 11:03:02.6"
},
...
]
and this is my desired output:
{
"records": [
{
"attribs": [{
"type": "customer",
"reference": "CUST"
}],
"name": "Amit Prakash",
"personal_email": "amprak#mail.com",
"mobile": "809386731",
"id": 12345
},
{
"attribs": [{
"type": "customer",
"reference": "CUST"
}],
"name": "Rohit Dhand",
"personal_email": "rodha#mail.com",
"mobile": "937013861",
"id": 12654
},
...
]
}
So far, I have only managed up to this point:
[
{
"operation": "remove",
"spec": {
"*": {
"BIRTHDATE": "",
"SEX": "",
"TS": ""
}
}
},
{
"operation": "shift",
"spec": {
"*": "records"
}
}
]
But I can't get any further from here; I don't know how to rename keys in the output.
Also, what is the alternative to the remove operation? remove works well when you have fewer keys to exclude than to include, but what about the reverse (only a few keys to include and many more to exclude within a JSON object)?
Spec
[
{
"operation": "shift",
"spec": {
"*": {
"PK": "records[&1].id",
"PHONE": "records[&1].mobile",
"EMAIL": "records[&1].personal_email",
"FULL_NAME": "records[&1].name"
}
}
},
{
"operation": "default",
"spec": {
"records[]": {
"*": {
"attribs[]": {
"0": {
"type": "customer",
"reference": "CUST"
}
}
}
}
}
}
]
Shift makes a copy of your data, whereas the other operations do not. Thus, one way to remove things is simply not to copy them across in your shift spec.
Generally, remove is used to get rid of things that would "mess up" the shift.
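To make that concrete, here is a minimal sketch using the Shiftr class from the Bazaarvoice Jolt library with a trimmed-down version of the input above: BIRTHDATE is never referenced in the shift spec, so it never reaches the output and no remove operation is needed.

import com.bazaarvoice.jolt.JsonUtils;
import com.bazaarvoice.jolt.Shiftr;

public class ShiftDropsUnreferencedKeys {
    public static void main(String[] args) {
        // Only PK and FULL_NAME are referenced; anything else is simply not copied.
        Object spec = JsonUtils.jsonToMap(
                "{ \"*\": { \"PK\": \"records[&1].id\", \"FULL_NAME\": \"records[&1].name\" } }");
        Object input = JsonUtils.jsonToList(
                "[ { \"PK\": 12345, \"FULL_NAME\": \"Amit Prakash\", \"BIRTHDATE\": \"1987-05-25\" } ]");

        Object output = new Shiftr(spec).transform(input);

        // Expected: {"records":[{"id":12345,"name":"Amit Prakash"}]} -- no BIRTHDATE
        System.out.println(JsonUtils.toJsonString(output));
    }
}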