NiFi: convert JSON to CSV using ConvertRecord

I have a stream of JSON in Apache NiFi that contains dynamic fields (11 fields at most), and I want to convert it to a CSV file.
sample json:
{
"field1":"some text",
"field2":"some text",
"field3":"some text",
"field4":"some text",
"field5":"some text",
"field6":"some text",
"field7":"some text"
}
I don't want to use ReplaceText or EvaluateJsonPath; how do I do it with ConvertRecord?
I find this processor awkward and hard to work with...
To clarify what I mean by dynamic fields:
I have 11 fields in total. One record may contain 7 fields, the next record may contain 11 fields, the next 9 fields, and so on.

The steps below will get this done:
Connect your source processor, which generates/outputs the JSON files, to ConvertRecord.
Configure ConvertRecord: set 'Record Reader' to use a JsonTreeReader controller service and 'Record Writer' to use a CSVRecordSetWriter controller service.
Configure both controller services and set the 'Schema Registry' property to use AvroSchemaRegistry.
Configure AvroSchemaRegistry: go to the 'Properties' tab and click the + button, which lets you add a dynamic property.
Give it some property name (e.g. mySchema) and, for the value, provide the Avro schema expected for your input JSON. (You can use the InferAvroSchema processor to generate an Avro schema for your JSON.)
Configure both JsonTreeReader and CSVRecordSetWriter and set the 'Schema Name' property to the name provided above, in this case mySchema.
Connect the relationships of ConvertRecord to downstream processors as needed.
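Since records may omit some of the 11 fields, each field in the schema can be declared optional by unioning it with null and giving it a default of null; missing fields then come out as empty CSV columns instead of failing the reader. A sketch of such a schema (the record name and fields 8 through 11 are assumptions, since the sample only shows 7):

```
{
  "type": "record",
  "name": "mySchema",
  "fields": [
    {"name": "field1", "type": ["null", "string"], "default": null},
    {"name": "field2", "type": ["null", "string"], "default": null},
    {"name": "field3", "type": ["null", "string"], "default": null},
    {"name": "field4", "type": ["null", "string"], "default": null},
    {"name": "field5", "type": ["null", "string"], "default": null},
    {"name": "field6", "type": ["null", "string"], "default": null},
    {"name": "field7", "type": ["null", "string"], "default": null},
    {"name": "field8", "type": ["null", "string"], "default": null},
    {"name": "field9", "type": ["null", "string"], "default": null},
    {"name": "field10", "type": ["null", "string"], "default": null},
    {"name": "field11", "type": ["null", "string"], "default": null}
  ]
}
```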

Related

Data Factory Copy Data Source as Body of Sink Object

I've been trying to create an ADF pipeline to move data from one of our databases into an Azure storage folder, but I can't seem to get the transform to work correctly.
I'm using a Copy Data task with the source and sink set up as datasets, and data is flowing from one to the other; it's just the format that's bugging me.
In our database we have a single field that contains a JSON object. This needs to be mapped into the sink object, but it doesn't have a column name; it is simply the base object.
So, for example, the source looks like this,
and my output needs to look like this:
[
{
"ID": 123,
"Field":"Hello World",
"AnotherField":"Foo"
},
{
"ID": 456,
"Field":"Don't Panic",
"AnotherField":"Bar"
}
]
However, the Copy Data task only seems to accept a direct Source -> Sink mapping, and it is also treating the SQL Server field as VARCHAR (which I suppose it is). So as a result I'm getting this out the other side:
[
{
"Json": "{\"ID\": 123,\"Field\":\"Hello World\",\"AnotherField\":\"Foo\"}"
},
{
"Json": "{\"ID\": 456,\"Field\":\"Don't Panic\",\"AnotherField\":\"Bar\"}"
}
]
I've tried using the internal @json() parse function on the source field, but this causes errors in the pipeline. I also can't get the sink to just map the field directly as an object inside the output array.
I have a feeling I just shouldn't be using Copy Data, or that Copy Data doesn't support the level of transformation I'm trying to do. Can anybody set me on the right path?
Using a JSON dataset as a source in your data flow allows you to set five additional settings. These settings can be found under the JSON settings accordion in the Source Options tab. For the Document Form setting, you can select one of the Single document, Document per line, or Array of documents types.
Select Document Form as Array of documents.
Refer to https://learn.microsoft.com/en-us/azure/data-factory/format-json
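Conceptually, the transformation that's missing is parsing each stringified "Json" field back into an object; a minimal Python sketch of that unwrapping (outside ADF, just to illustrate what the data flow does for you):

```python
import json

# What the Copy Data task currently produces: each row's JSON
# arrives as an escaped string under a "Json" key.
copied = [
    {"Json": "{\"ID\": 123,\"Field\":\"Hello World\",\"AnotherField\":\"Foo\"}"},
    {"Json": "{\"ID\": 456,\"Field\":\"Don't Panic\",\"AnotherField\":\"Bar\"}"},
]

# What the sink should contain: the parsed objects themselves, as a plain array.
desired = [json.loads(row["Json"]) for row in copied]
print(json.dumps(desired, indent=2))
```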

Transform Avro file with Wrangler into JSON in cloud Datafusion

I am trying to read an Avro file, apply a basic transformation (remove records with name = Ben) using Wrangler, and write the result as a JSON file to Google Cloud Storage.
The Avro file has the following schema:
{
"type": "record",
"name": "etlSchemaBody",
"fields": [
{
"type": "string",
"name": "name"
}
]
}
Transformation in wrangler is the following:
transformation
The following is the output schema for JSON file:
output schema
When I run the pipeline it runs successfully and the JSON file is created in cloud storage. But the JSON output is empty.
When trying a preview run I get the following message:
warning message
Why is the JSON output file in gcloud storage empty?
When using Wrangler to make transformations, the default values for the GCS source are format: text and body: string (data type). However, to properly work with an Avro file in Wrangler you need to change those: set the format to blob and the body data type to bytes, as follows:
After that, the preview for your pipeline should produce output records. You can see my working example next:
Sample data
Transformations
Input records preview for GCS sink (final output)
Edit:
You need to set format: blob and the output schema to body: bytes if you want to parse the file to Avro within Wrangler, as described above, because it needs the content of the file in a binary format.
On the other hand if you only want to apply filters (within the Wrangler), you could do the following:
Open the file using format: avro, see img.
Set the output schema according to the fields that your Avro file has, in this case name with string data type, see img.
Use only filters on the Wrangler (no parsing to Avro here), see img.
And this way you can also get the desired result.
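The filter step itself is simple; outside Data Fusion it amounts to the following (Python sketch, sample names other than Ben are assumptions):

```python
# Records as they would look after the Avro file is parsed,
# matching the schema with a single string field "name":
records = [
    {"name": "Alice"},
    {"name": "Ben"},
    {"name": "Carol"},
]

# The Wrangler filter "remove records with name = Ben" keeps everything else.
filtered = [r for r in records if r["name"] != "Ben"]
print(filtered)
```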

NiFi non-Avro JSON Reader/Writer

It appears that the standard Apache NiFi readers/writers can only parse JSON input based on an Avro schema.
Avro schema is limiting for JSON; e.g. it does not allow valid JSON properties starting with digits.
The JoltTransformJSON processor can help here (it doesn't impose Avro limitations on how the input JSON may look), but it seems that this processor does not support batch FlowFiles. It is also not based on the readers and writers (maybe because of that).
Is there a way to read arbitrary valid batch JSON input, e.g. in multi-line form
{"myprop":"myval","12345":"12345",...}
{"myprop":"myval2","12345":"67890",...}
and transform it to other JSON structure, e.g. defined by JSON schema, and e.g. using JSON Patch transformation, without writing my own processor?
Update
I am using Apache NiFi 1.7.1
Update 2
Unfortunately, @Shu's suggestion did not work; I am getting the same error.
I reduced the case to a single UpdateRecord processor that reads JSON with numeric properties and writes JSON without such properties, using the
myprop : /data/5836c846e4b0f28d05b40202
mapping. Still the same error :(
it does not allow valid JSON properties starting with digits?
This bug, NIFI-4612, was fixed in NiFi 1.5. You can use AvroSchemaRegistry to define your schema and set the
Validate Field Names
property to
false
Then you can have Avro schema field names starting with digits.
For more details refer to this link.
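With Validate Field Names set to false, a schema like the following (field names taken from the question's sample; the record name is arbitrary) is accepted even though "12345" is not a legal Avro field name by the spec:

```
{
  "type": "record",
  "name": "myRecord",
  "fields": [
    {"name": "myprop", "type": "string"},
    {"name": "12345", "type": "string"}
  ]
}
```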
Is there a way to read arbitrary valid batch JSON input, e.g. in multi-line form?
This bug, NIFI-4456, was fixed in NiFi 1.7. If you are not using that version of NiFi, you can work around it by creating an array of JSON messages (with , as the delimiter) using the following flow:
Flow:
1. SplitText //split the flow file with a line count of 1
2. MergeRecord //merge the flow files into one
3. ConvertRecord
For more details on this particular issue, refer to this link (I have explained the flow there).
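The net effect of that workaround is to turn line-delimited JSON into a single JSON array; a Python sketch of the same conversion, using the multi-line sample from the question:

```python
import json

# Multi-line (line-delimited) JSON, one document per line:
ndjson = '{"myprop":"myval","12345":"12345"}\n{"myprop":"myval2","12345":"67890"}'

# Parse each line, then re-emit everything as one JSON array,
# which the record readers can consume as a batch:
records = [json.loads(line) for line in ndjson.splitlines() if line.strip()]
as_array = json.dumps(records)
print(as_array)
```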

How to insert multiple JSON records into HBase using NiFi?

Please tell me how to insert multiple JSON records into HBase using NiFi.
PutHbaseJson Image Output
PutHbaseCell Image Output
It fails when we try to insert more than one id/object.
This is the file I have tried with PutHBaseCell:
{"id" : "1334134","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"},
{"id" : "412","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
Image of PutHbaseCell Processor
PutHBaseJson expects each flow file to contain one JSON document, which becomes a row in HBase. The row id can be specified in the processor using expression language, or it can come from one of the fields in the JSON. The other field/value pairs in the JSON become the columns/values of the row in HBase.
If you want to use PutHBaseJson, you just need to split up your data in NiFi before it reaches this processor. There are many ways to do this: SplitJson, SplitText, SplitContent, ExecuteScript, or a custom processor.
Alternatively, there is a PutHBaseRecord processor which can use a record reader to read records from a flow file and send them all to HBase. In your case you would need a JSON record reader. The data also has to be in a format understood by the record reader, and I believe for JSON it would need to be an array of documents.
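The split step can be sketched in Python: the question's file is a comma-separated run of JSON documents, which only becomes valid JSON once wrapped in brackets, after which each element can go its own way (this mirrors what a SplitJson-style split achieves, not the processor's actual implementation):

```python
import json

# The question's file content: JSON documents separated by a comma,
# which is not itself valid JSON until wrapped in brackets.
content = (
    '{"id" : "1334134","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"},'
    '{"id" : "412","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}'
)

# Wrap in brackets to form a JSON array, then split into documents;
# each document would become one flow file and hence one HBase row.
docs = json.loads("[" + content + "]")
for doc in docs:
    print(json.dumps(doc))
```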

How to read the values of attributes in Apache NiFi

I'm working on a sample NiFi flow where I get a JSON file as input. I use the EvaluateJsonPath processor to get the value of the desired path. I've set the destination of EvaluateJsonPath to "flowfile-attribute" and added new properties with the required JsonPath. For example: property name: username, value: $.input.username. Now I need this value in the next processor, so I want to know which processor I should use to read the attributes of the flow file.
You don't need a special processor to read the attributes of a FlowFile.
If this is your attribute key/value pair:
username : $.input.username
you can read that value as shown below in any processor property that supports Expression Language:
${username}