I have a case where the flow content is always in JSON format and the data inside the JSON always changes (both keys and values). Is it possible to convert this flow content to CSV?
Please note that the keys in the JSON always change.
Many thanks,
To achieve this use case we need to generate an Avro schema dynamically for each JSON record first, then convert the JSON to Avro, and finally convert the Avro to CSV.
Flow:
1. SplitJson // split the array of JSON records into individual records
2. InferAvroSchema // infer the Avro schema from each JSON record and store it in an attribute
3. ConvertJSONToAvro // convert each JSON record into an Avro data file
4. ConvertRecord // read the Avro data file dynamically and convert it into CSV format
5. MergeContent (or) MergeRecord // merge the split flowfiles back into one flowfile using the Defragment strategy
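Outside NiFi, the core idea of the flow above (handling records whose keys always change) can be sketched in plain Python: discover the union of keys across all records, then emit a CSV with that dynamic header. This is only an illustrative sketch, not what the NiFi processors do internally; the function name and sample keys are made up.

```python
import csv
import io
import json

def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat records to CSV, discovering keys dynamically."""
    records = json.loads(json_text)
    # Build the union of keys across all records, preserving first-seen order,
    # since no fixed schema is known ahead of time.
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    # restval="" fills columns that a given record does not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

print(json_records_to_csv('[{"a": 1, "b": 2}, {"b": 3, "c": 4}]'))
```

Note that records missing a key simply get an empty cell, which is roughly the behavior you would expect when the schema is inferred per batch.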
Save this XML, upload it to your NiFi instance, and change it as per your requirements.
{"messageData":[{"vc":1,"ccid":"0010","hardwarePartNum":"00010","softwarePartNum":"000010","ecuName":"ecu333","ecuAssemblyNum":"4523001","requestId":"1001"},{"vc":2,"ccid":"0020","hardwarePartNum":"00020","softwarePartNum":"000020","ecuName":"ecu222","ecuAssemblyNum":"4523002","requestId":"2002"},{"vc":3,"ccid":"0010","hardwarePartNum":"00010","softwarePartNum":"000010","ecuName":"ecu333","ecuAssemblyNum":"4523001","requestId":"1001"},{"vc":4,"ccid":"0020","hardwarePartNum":"00020","softwarePartNum":"000020","ecuName":"ecu222","ecuAssemblyNum":"4523002","requestId":"2002"}]}
This is my JSON file, which I send to my Kafka consumer.
After parsing it and storing it in an ArrayList, it is now in the form of a list, i.e. like this:
[messageData [vc=1,ccid=0010,hardwarePartNum=00010,softwarePartNum=000010,ecuName=ecu333,ecuAssemblyNum=4523001,requestId=1001]
[messageData [vc=2,ccid=0020,hardwarePartNum=00020,softwarePartNum=000020,ecuName=ecu222,ecuAssemblyNum=4523001,requestId=2002]
[messageData [vc=3,ccid=0010,hardwarePartNum=00010,softwarePartNum=000010,ecuName=ecu333,ecuAssemblyNum=4523001,requestId=1001]
[messageData [vc=4,ccid=0020,hardwarePartNum=00020,softwarePartNum=000020,ecuName=ecu222,ecuAssemblyNum=4523001,requestId=2002]
Which data structure should I use for storage, so that the final stored file is also a JSON-format file?
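One approach, shown here as a Python sketch (in Java the equivalent would be a List of Maps serialized with a JSON library such as Jackson or Gson): keep each record as a map/dictionary inside a list, and serialize the whole structure with a JSON serializer instead of relying on the default toString() output, which is what produces the bracketed text above. The sample values are taken from the question; the variable names are illustrative.

```python
import json

# Store each record as a dict in a list instead of relying on toString();
# serializing the wrapper dict then yields valid JSON again.
records = [
    {"vc": 1, "ccid": "0010", "ecuName": "ecu333", "requestId": "1001"},
    {"vc": 2, "ccid": "0020", "ecuName": "ecu222", "requestId": "2002"},
]
json_text = json.dumps({"messageData": records}, indent=2)

# Round-trip to confirm the stored file is still valid JSON.
round_trip = json.loads(json_text)
```

The key point is that lists of maps round-trip cleanly through a JSON serializer, whereas a toString() dump like `[messageData [vc=1,ccid=0010,...]` cannot be parsed back.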
Snowflake supports multiple file types via CREATE FILE_FORMAT (Avro, JSON, CSV, etc.).
Now I have tested SELECTing from a Snowflake stage (S3) with both:
*.avro files (generated from a NiFi processor batching 10k rows from a source Oracle table).
*.json files (one JSON record per line).
And when I SELECT $1 FROM @myStg, Snowflake expands as many rows as there are records in the Avro or JSON files (cool), but... the $1 VARIANT is in JSON format in both cases, and now I wonder: whatever Snowflake FILE_FORMAT we use, do records always arrive as JSON in the VARIANT $1?
I haven't tested CSV or other Snowflake FILE_FORMATs.
Or I wonder if I get JSON from the Avro files (from the Oracle table) because maybe the NiFi processor creates Avro files (which internally use a JSON format).
Maybe I'm making some confusion here... I know Avro files contain both:
the Avro schema - written in a JSON-like key/value language, and
the compressed data (binary).
Thanks,
Emanuel O.
I tried with CSV. When it comes to CSV, each record in the file is parsed as a separate row.
When it comes to JSON, one complete JSON document is treated as one record, so it is displayed in JSON format.
When using crossfilter (for example for dc.js), do I always need to transform my data to a flat JSON for input?
Flat JSON data read from AJAX requests tends to be a lot larger than it needs to be (compared to, for example, nested JSON, value-to-array, or CSV data).
Is there an API available that can read in types other than flat JSON? Are there plans to add one?
I would like to avoid making the client transform the data before using it.
I want to convert my nested JSON into CSV. I used:
df.write.format("com.databricks.spark.csv").option("header", "true").save("mydata.csv")
But that works for normal (flat) JSON, not nested JSON. Is there any way I can convert my nested JSON to CSV? Help will be appreciated, thanks!
When you ask Spark to convert a JSON structure to CSV, Spark can only map the first level of the JSON.
This happens because of the simplicity of the CSV format: it just assigns a value to a name. That is why {"name1":"value1", "name2":"value2"...} can be represented as a CSV with this structure:
name1,name2, ...
value1,value2,...
In your case, you are converting a JSON with several levels, so the Spark exception is saying that it cannot figure out how to convert such a complex structure into a CSV.
If you add only a second level to your JSON, it will work, but be careful: it will drop the names of the second level and include only their values in an array.
You can have a look at this link to see the example for JSON datasets.
As I have no information about the nature of the data, I can't say much more about it. But if you need to write the information as CSV, you will need to simplify the structure of your data.
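One common way to "simplify the structure" is to flatten nested objects into dotted column names before writing CSV. Here is a minimal, Spark-free sketch of the idea (the record and key names are made up for illustration); in Spark itself a similar effect can be had by selecting nested fields with dotted column paths before writing.

```python
import csv
import io

def flatten(obj, prefix=""):
    """Flatten a nested dict into a single-level dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the key path.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

record = {"name": "n1", "address": {"city": "c1", "zip": "z1"}}
flat = flatten(record)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(flat))
writer.writeheader()
writer.writerow(flat)
print(out.getvalue())
```

This keeps the second-level names (as "address.city", "address.zip") instead of losing them in an array, which addresses the caveat mentioned above. Arrays still need a separate decision (explode into rows, or join into one cell).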
Read the JSON file in Spark and create a DataFrame:
val path = "examples/src/main/resources/people.json"
val people = sqlContext.read.json(path)
Save the DataFrame using spark-csv:
people.write
.format("com.databricks.spark.csv")
.option("header", "true")
.save("newcars.csv")
Source:
read json
save to csv
I have to develop a MapReduce program that needs to perform a join on two different data sets.
One of them is a CSV file and the other is an Avro file.
I am using MultipleInputs to process both sources. However, to process both datasets in a single reducer, I am converting the Avro data to Text by using
new Text(key.datum().toString())
My challenge is to convert the JSON string generated above back into an Avro record in the reducer, as the final output needs to be in Avro format.
Is there a particular function or class that can be used to do this?
If yes, can you please provide an example as well?