{"messageData":[{"vc":1,"ccid":"0010","hardwarePartNum":"00010","softwarePartNum":"000010","ecuName":"ecu333","ecuAssemblyNum":"4523001","requestId":"1001"},{"vc":2,"ccid":"0020","hardwarePartNum":"00020","softwarePartNum":"000020","ecuName":"ecu222","ecuAssemblyNum":"4523002","requestId":"2002"},{"vc":3,"ccid":"0010","hardwarePartNum":"00010","softwarePartNum":"000010","ecuName":"ecu333","ecuAssemblyNum":"4523001","requestId":"1001"},{"vc":4,"ccid":"0020","hardwarePartNum":"00020","softwarePartNum":"000020","ecuName":"ecu222","ecuAssemblyNum":"4523002","requestId":"2002"}]}
This is the JSON file that I send to my Kafka consumer.
After parsing it and storing it in an ArrayList, it is now in the form of a list, like this:
[messageData [vc=1, ccid=0010, hardwarePartNum=00010, softwarePartNum=000010, ecuName=ecu333, ecuAssemblyNum=4523001, requestId=1001],
 messageData [vc=2, ccid=0020, hardwarePartNum=00020, softwarePartNum=000020, ecuName=ecu222, ecuAssemblyNum=4523001, requestId=2002],
 messageData [vc=3, ccid=0010, hardwarePartNum=00010, softwarePartNum=000010, ecuName=ecu333, ecuAssemblyNum=4523001, requestId=1001],
 messageData [vc=4, ccid=0020, hardwarePartNum=00020, softwarePartNum=000020, ecuName=ecu222, ecuAssemblyNum=4523001, requestId=2002]]
Which data structure should I use to store this, so that the final stored file is also in JSON format?
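The question is in Java (the output above is an ArrayList of POJOs printed with toString), but a minimal sketch of the idea in Python may make it concrete: keep the parsed records as a plain list of dictionaries under the original messageData key and hand the whole structure to a JSON serializer when writing the file. In Java the analogue would be a List of messageData objects wrapped in a container class and written out with Jackson or Gson; the output file name below is just a placeholder.

import json

# Parsed records kept as a list of dicts (the analogue of a List<MessageData> in Java),
# with field names taken from the JSON message above.
records = [
    {"vc": 1, "ccid": "0010", "hardwarePartNum": "00010", "softwarePartNum": "000010",
     "ecuName": "ecu333", "ecuAssemblyNum": "4523001", "requestId": "1001"},
    {"vc": 2, "ccid": "0020", "hardwarePartNum": "00020", "softwarePartNum": "000020",
     "ecuName": "ecu222", "ecuAssemblyNum": "4523002", "requestId": "2002"},
]

# Wrap the list under the original top-level key and serialize the whole structure,
# so the stored file is valid JSON with the same shape as the incoming message.
with open("message_data_out.json", "w") as f:
    json.dump({"messageData": records}, f, indent=2)

The point is less the container type than how it is written out: serializing through a JSON library keeps the file valid JSON, whereas writing the list's toString output produces the bracketed text shown above.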
I want to write a DataFrame as JSON in PySpark, replicating the way JSON is written from pandas:
df.to_json(orient='columns')
which gives me
'{"col 1":{"row 1":"a","row 2":"c"},"col 2":{"row 1":"b","row 2":"d"}}'
But when I use this in AWS Glue:
df.write.mode('overwrite').json(path)
I get the same format that pandas produces with
df.to_json(orient='records')
'[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
I looked through the parameters of the JSON writer in PySpark and there is no orient option to set the JSON layout.
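As far as I can tell, Spark's DataFrameWriter.json has no orient option; it always writes records, one JSON object per line. A minimal sketch of one workaround, assuming the DataFrame is small enough to collect to the driver: convert it to pandas and reuse to_json(orient='columns'), then write the resulting string yourself. The function name and output path below are placeholders.

def write_columns_oriented_json(spark_df, local_path):
    # Only suitable for small DataFrames: toPandas() collects every row to the driver.
    pdf = spark_df.toPandas()                  # Spark DataFrame -> pandas DataFrame
    json_str = pdf.to_json(orient="columns")   # same layout as the pandas example above
    with open(local_path, "w") as f:
        f.write(json_str)

# Usage (hypothetical path; in a Glue job you would typically upload the string to S3 instead):
# write_columns_oriented_json(df, "/tmp/df_columns.json")

For data that does not fit on the driver, the DataFrame itself would have to be restructured before calling df.write.json, since the writer only emits row-wise records.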
I have a case where the flow content is always in JSON format and the data inside the JSON always changes (both keys and values). Is it possible to convert this flow content to CSV?
Please note that the keys in the JSON always change.
Many thanks,
To achieve this use case we need to generate the Avro schema dynamically for each JSON record first, then convert the record to Avro, and finally convert the Avro to CSV.
Flow:
1. SplitJson // split the array of JSON records into individual records
2. InferAvroSchema // infer the Avro schema from each JSON record and store it in an attribute
3. ConvertJSONToAvro // convert each JSON record into an Avro data file
4. ConvertRecord // read the Avro data file dynamically and convert it into CSV format
5. MergeContent (or) MergeRecord processor // merge the split flowfiles back into one flowfile using the defragment strategy
Save this XML template, upload it to your NiFi instance, and change it as per your requirements.
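Outside NiFi, the core of the problem is that the CSV columns have to be derived from each record rather than fixed up front. A minimal Python sketch of the same JSON-to-CSV conversion (hypothetical input, purely to illustrate what the flow above achieves):

import csv
import io
import json

def json_records_to_csv(json_text):
    # Convert an array of JSON records to CSV, deriving the columns from the data itself.
    records = json.loads(json_text)
    # Collect every key that appears in any record, since the keys are not fixed.
    columns = sorted({key for record in records for key in record})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    writer.writerows(records)   # records missing a key get an empty cell
    return out.getvalue()

# Hypothetical input with changing keys:
print(json_records_to_csv('[{"id": 1, "name": "a"}, {"id": 2, "city": "x"}]'))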
I understand that we can define a schema for a JSON structure, but I'm unable to find any info on how to tie the schema to the actual JSON data, i.e. where exactly within the JSON data we reference the schema.
I'm trying to understand how I can return JSON data along with its schema so that the end user/service that consumes my JSON can discover and understand the schema associated with the JSON data.
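There is no single required way to do this, but one common convention, shown here as a hedged sketch, is to return an envelope that carries both the schema and the data (or a $schema/$id URI pointing at it), so the consumer can discover the schema and validate the payload against it. The envelope field names and the use of the jsonschema package below are illustrative assumptions, not a fixed standard.

import json
from jsonschema import validate  # pip install jsonschema

# The schema travels next to the data in one envelope, so a consumer can discover it
# without any out-of-band knowledge. The keys "schema" and "data" are just a convention.
envelope = {
    "schema": {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": {"vc": {"type": "integer"}, "ecuName": {"type": "string"}},
        "required": ["vc", "ecuName"],
    },
    "data": {"vc": 1, "ecuName": "ecu333"},
}

# Consumer side: parse the payload, pull the schema out of it, and validate the data.
payload = json.loads(json.dumps(envelope))
validate(instance=payload["data"], schema=payload["schema"])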
I am making an Android app and I have a big JSON file that I would like to store in SQLite.
Should I convert the JSON file to objects before inserting it into SQLite?
Thanks.
I see that you were able to convert it to a JSON object. After that, you should convert the JSON object to a String and then save the string in a VARCHAR column in the DB.
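The question is about Android (where the insert would go through SQLiteOpenHelper and ContentValues), but a minimal sketch of the same idea using Python's sqlite3 module, since the underlying engine is the same: serialize the JSON object to a string, store it in a TEXT/VARCHAR column, and parse it back on read. Table and column names are made up for illustration.

import json
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute("CREATE TABLE IF NOT EXISTS payloads (id INTEGER PRIMARY KEY, body TEXT)")

# Store: serialize the JSON object to a string and insert it as text.
obj = {"ecuName": "ecu333", "requestId": "1001"}
conn.execute("INSERT INTO payloads (body) VALUES (?)", (json.dumps(obj),))
conn.commit()

# Read back: fetch the string and parse it into an object again.
row = conn.execute("SELECT body FROM payloads LIMIT 1").fetchone()
restored = json.loads(row[0])
print(restored["ecuName"])
conn.close()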