Nifi - SplitJson retaining all other info - json

Working in Nifi, I have the following json structure in the content of a flow file:
{
    "firstname": "fred",
    "lastname": "jackson",
    "dob": "19550607",
    "children": [
        {
            "firstname": "janet",
            "lastname": "jackson",
            "dob": "20020607"
        },
        {
            "firstname": "michael",
            "lastname": "jackson",
            "dob": "20010201"
        },
        {
            "firstname": "tito",
            "lastname": "jackson",
            "dob": "20030707"
        }
    ]
}
I want to split this such that I would have three (3) flowfiles, each containing the top-level info but with just one child. For example, one of them would look like this:
{
    "firstname": "fred",
    "lastname": "jackson",
    "dob": "19550607",
    "children": {
        "firstname": "janet",
        "lastname": "jackson",
        "dob": "20020607"
    }
}
Again, I would have three different flow files, one for each child. The output does not have to look exactly like this; the important thing is that I am able to split the structure, yet maintain the common data in each of the resulting flow files.
I tried using SplitJson with a JsonPath Expression of "$.children", which does give me the three flow files, but I lose the parent info. I could save the key/values for the common elements in attributes, split, and then add them back, but the parent information can be more complex than my example (dynamic fields, etc.), so I am unsure how I would do this.
Appreciate any ideas or thoughts.

The simplest way would be to use ForkRecord with a JSON Reader/Writer: set Mode to Split, add a dynamic property whose RecordPath points at /children, and set Include Parent Fields to true to retain the parent fields.
However, this may flatten the JSON in a way that you don't want, so give it a try first.
Alternatively, look at JoltTransformJSON, which gives you a lot more flexibility, though it can take some effort to work out the appropriate spec. You can use https://jolt-demo.appspot.com/#inception to test your JOLT specs.
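As a starting point, a shift spec along these lines (a sketch written against your sample, so verify it in the demo site before relying on it) copies the parent fields into each element of a top-level array, one entry per child:
[
    {
        "operation": "shift",
        "spec": {
            "children": {
                "*": {
                    "@(2,firstname)": "[&1].firstname",
                    "@(2,lastname)": "[&1].lastname",
                    "@(2,dob)": "[&1].dob",
                    "@": "[&1].children"
                }
            }
        }
    }
]
The output is an array of three self-contained objects, so a SplitJson on the root array afterwards should give you one flowfile per child. Note the parent fields are listed explicitly here; a truly dynamic parent would need a more involved spec.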

Related

Add data to a json file using Talend

I have the following JSON:
[
    {
        "date": "29/11/2021",
        "Name": "jack"
    },
    {
        "date": "30/11/2021",
        "Name": "Adam"
    },
    {
        "date": "27/11/2021",
        "Name": "james"
    }
]
Using Talend, I want to add two fields to each object, to get something like:
[
    {
        "company": "AMA",
        "service": "BI",
        "date": "29/11/2021",
        "Name": "jack"
    },
    {
        "company": "AMA",
        "service": "BI",
        "date": "30/11/2021",
        "Name": "Adam"
    },
    {
        "company": "AMA",
        "service": "BI",
        "date": "27/11/2021",
        "Name": "james"
    }
]
Currently, I use 3 components (tJSONDocOpen, tFixedFlowInput, tJSONDocOutput), but I can't find the right configuration of the components to get the job done!
If you are not comfortable with JSON, just do these steps:
In the Metadata panel, create a File Json definition, then drop it into your job as a tFileInputJson.
Build your job design and mapping around it.
In your tFileOutputJson, don't forget to replace the name of the data block, "Data", with "".
What you need to do here, following standard Talend practice, is read your JSON, extract each object from it, add your properties, and finally rebuild your JSON in a file.
An efficient way to do this is with the tMap component.
The first tFileInputJSON has to specify which properties to read from the JSON, by setting your two fields in the mapping field.
The tMap then simply adds two columns to your main stream, for example with hard-coded string values like "AMA" and "BI". Depending on your needs, this component also lets you assign dynamic data to your two new columns; it's a powerful tool for manipulating the structure of a data stream.
You will find more info about this component in the official documentation, https://help.talend.com/r/en-US/7.3/tmap/tmap, especially the "tMap scenarios" part.
Note
If you are comfortable with Java, you can use a tJavaRow instead of the tMap. With it, you can set up your two new columns with whatever Java code you want, as long as you have defined the output schema of the component:
// Pass the existing columns through unchanged
output_row.Name = input_row.Name;
output_row.date = input_row.date;
// Add the two new columns (both must exist in the output schema)
output_row.company = "AMA";
output_row.service = "BI";

Azure Logic Apps - Map Json to Json with Liquid flatten array

Any help would be much appreciated. What I am trying to achieve is to request a record from Dynamics 365 (cloud) to an on-premise system (exposed by MuleSoft). I have decided to use Azure Logic Apps to do the integration and Liquid to do the mapping; however, I am battling to flatten the array with Liquid. I'm getting a JSON payload from the on-premise system which I need to transform so it will load into Dynamics 365. What I am getting is something like the following:
{
    "person": {
        "firstname": " Fred",
        "surname": "Smith",
        "age": 27,
        "phoneno": "123456789",
        "addresses": [
            {
                "address": {
                    "AddressLine1": "1 milky way",
                    "AddressLine2": "galaxy cresent",
                    "city": "tempest",
                    "state": "Idiho",
                    "postcode": "12345"
                }
            },
            {
                "address": {
                    "AddressLine1": "52 Saturn Drive",
                    "AddressLine2": "Wharfridge",
                    "city": "tempest",
                    "state": "Idiho",
                    "postcode": "12345"
                }
            }
        ]
    }
}
and what I need is to flatten the array into the root node like this:
{
    "person": {
        "firstname": " Fred",
        "surname": "Smith",
        "age": 27,
        "phoneno": "123456789",
        "addr1_AddressLine1": "1 milky way",
        "addr1_AddressLine2": "galaxy cresent",
        "addr1_city": "tempest",
        "addr1_state": "Idiho",
        "addr1_postcode": "12345",
        "addr2_AddressLine1": "52 Saturn Drive",
        "addr2_AddressLine2": "Wharfridge",
        "addr2_city": "tempest",
        "addr2_state": "Idiho",
        "addr2_postcode": "12345"
    }
}
If there are any other solutions/ideas, I am all ears.
Thanks in advance for your help
Paul
So I found a solution, or rather a workaround. For some reason the Liquid connector in Logic Apps does not support the "increment" tag, and this was causing my issue. I was able to evaluate a property from the input JSON to decide where my fields would reside. But thanks for the help.
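For anyone who hits the same wall: Liquid's built-in forloop.index (which the Logic Apps connector does support) can stand in for increment when numbering the flattened fields. A sketch of such a template, untested and assuming the payload arrives under the connector's usual content binding:
{
    "person": {
        "firstname": "{{ content.person.firstname }}",
        "surname": "{{ content.person.surname }}",
        "age": {{ content.person.age }},
        "phoneno": "{{ content.person.phoneno }}"
        {% for item in content.person.addresses %}
        ,"addr{{ forloop.index }}_AddressLine1": "{{ item.address.AddressLine1 }}",
        "addr{{ forloop.index }}_AddressLine2": "{{ item.address.AddressLine2 }}",
        "addr{{ forloop.index }}_city": "{{ item.address.city }}",
        "addr{{ forloop.index }}_state": "{{ item.address.state }}",
        "addr{{ forloop.index }}_postcode": "{{ item.address.postcode }}"
        {% endfor %}
    }
}
The leading comma inside the loop avoids a trailing comma after the last address, and forloop.index is 1-based, which matches the addr1_/addr2_ naming above.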

Reading complex json data without iteration

I am working with some data, and often the data is nested and I am required to perform some CRUD operations based on the structure of the data I have. For instance, I have this JSON structure:
{
    "_id": "KnNLkJEhrDsvWedLu",
    "createdAt": {
        "$date": "2016-10-13T11:24:13.843Z"
    },
    "services": {
        "password": {
            "bcrypt": "$2a$30$1/cniPwPNCuwZ/MQDPQkLej..cAATkoGX.qD1TS4iHgf/pwZYE.j."
        },
        "email": {
            "verificationTokens": [
                {
                    "token": "qxe_T9IS7jW7gntpK0Q7UQ35RJ9jO9m2lclnokO3z87",
                    "address": "drwho@gmail.com",
                    "when": {
                        "$date": "2016-10-13T11:24:14.428Z"
                    }
                }
            ]
        },
        "resume": {
            "loginTokens": []
        }
    },
    "username": "doctorwho",
    "emails": [
        {
            "address": "drwho@gmail.com",
            "verified": false
        }
    ],
    "persodata": {
        "lastlogin": {
            "$date": "2016-10-13T11:29:36.816Z"
        },
        "fname": "Doctor",
        "lname": "Who",
        "mobile": "+4480000000",
        "identity": "1",
        "email": "drwho@gmail.com",
        "gender": null
    }
}
I have several data sets with such complex structure. I need to read the data, and also edit and delete it. Before I get to iteration, I was wondering how I can read the data without iterating, and iterate only when I absolutely have to.
What are the rules I should keep in mind when reading such complex JSON structures, so that I can handle any complex structure I come across?
I am currently using JavaScript, but I am looking for rules that apply in other languages as well.
Parsing JSON in JavaScript should be easy: http://www.json.org/js.html.
"Since JSON is a proper subset of JavaScript, the compiler will correctly parse the text and produce an object structure". Just follow the examples on that page.
If you want to use another language: in Java you could use Jackson or Gson to map those JSON strings to objects, and then using them becomes easy. Both libraries are annotation-based and wouldn't be difficult to adopt.

json formats - which one to use?

What is the difference between these two JSON formats? Which format should I use?
[
    {
        "employeeid": "12345",
        "firstname": "joe",
        "lastname": "smith",
        "favoritefruit": "apple"
    },
    {
        "employeeid": "45678",
        "firstname": "paul",
        "lastname": "johnson",
        "favoritefruit": "orange"
    }
]
OR
[
    ["employeeid", "firstname", "lastname", "favoritefruit"],
    ["12345", "joe", "smith", "apple"],
    ["45678", "paul", "johnson", "orange"]
]
Definitely the first one. It will create an array of employee objects, while the second one will create an array of arrays, which will be more difficult to parse in most languages.
It depends on the context.
The first is much easier to parse if you want to create employee objects to work with.
The second may be better if you only need to work on the "raw" data. Furthermore, the second is much shorter. That's not important for small or medium datasets, but it could matter if, for example, you need to transfer large sets of employee data.
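To illustrate the parsing difference, here is a small JavaScript sketch (variable names are made up) that rebuilds employee objects from the second, tabular format -- the extra step the first format spares you:
const rows = [
    ["employeeid", "firstname", "lastname", "favoritefruit"],
    ["12345", "joe", "smith", "apple"],
    ["45678", "paul", "johnson", "orange"]
];

// The first row is the header; zip each remaining row against it
const [header, ...data] = rows;
const employees = data.map(row =>
    Object.fromEntries(row.map((value, i) => [header[i], value]))
);
// employees now matches the first format: [{ employeeid: "12345", ... }, ...]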

JSON Notation - Lists with Single Member

Let's say I have a JSON structure that contains the following:
{
    "ROWS": [
        {
            "name": "Greg",
            "age": "24"
        },
        {
            "name": "Tom",
            "age": "53"
        }
    ]
}
The value for the key "ROWS" is a list of dictionaries, right?
Okay, well what if I only have one entry? Is it still appropriate to use list notation, even if that list has a single element?
{
    "ROWS": [
        {
            "name": "Greg",
            "age": "24"
        }
    ]
}
Would there be any reason I could NOT do this?
There is no technical reason why you could not use a list. Your array could be empty and that's perfectly acceptable and valid technically.
For your ROWS property, I think the most important thing to consider is how many rows you could possibly have. You want to follow the computer-engineering principle of generality, so that you don't paint yourself into a corner by making ROWS an object. If you can ever expect to have more than one object as a row, even if currently there is only one, then it's absolutely appropriate to use an array, as the sketch below shows.
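Concretely, consumer code written against an array keeps working no matter how many rows turn up (a minimal JavaScript sketch; the response variable is hypothetical):
const payload = JSON.parse(response); // response: one of the ROWS documents above

// The same loop handles zero, one, or many rows --
// that is the generality the array buys you
for (const row of payload.ROWS) {
    console.log(row.name + " is " + row.age);
}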
For example, let's assume you expect a unique record, such as in a login system. Then it wouldn't make sense to use an array; in this case you should use an object instead:
{
    "LOGIN_ROW": {
        "name": "Greg",
        "age": "24"
    }
}
Again, I said should because it's up to you to shape your JSON object graph. But of course, if you have a scenario such as a list of employees, then it would make sense to use an array:
{
    "LIST_OF_ROWS": [
        {
            "name": "Greg",
            "age": "24"
        }
    ]
}
This is perfectly fine because you have one employee at this time, but you wish to expand your company, so you would expect to get more employees.