I have a CSV file with a column with JSON formatted data.
How can I extract the JSON data into a CSV file that can be processed in Access or SQL?
Which programming language can be used, and what would the code look like?
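One possible approach, sketched in Python with only the standard library. It assumes each cell in the JSON column holds a single JSON object, and that the JSON keys do not clash with the other column names. The function name `expand_json_column` and the sample column name `payload` are made up for illustration.

```python
import csv
import io
import json


def expand_json_column(src, dst, json_column):
    """Copy CSV rows from src to dst, replacing json_column with one column per JSON key."""
    rows = []
    json_keys = []  # JSON keys in first-seen order
    reader = csv.DictReader(src)
    for row in reader:
        raw = row.pop(json_column, "") or ""
        # Assumption: a non-empty cell contains one JSON object.
        payload = json.loads(raw) if raw.strip() else {}
        for key in payload:
            if key not in json_keys:
                json_keys.append(key)
        rows.append((row, payload))

    base_fields = [name for name in (reader.fieldnames or []) if name != json_column]
    writer = csv.DictWriter(dst, fieldnames=base_fields + json_keys, lineterminator="\n")
    writer.writeheader()
    for row, payload in rows:
        # Nested lists and objects are kept as JSON text so the output stays flat.
        flat = {
            key: value if isinstance(value, (str, int, float)) or value is None
            else json.dumps(value)
            for key, value in payload.items()
        }
        writer.writerow({**row, **flat})


if __name__ == "__main__":
    sample = 'id,payload\n1,"{""name"": ""Ann""}"\n2,"{""name"": ""Bob"", ""age"": 30}"\n'
    out = io.StringIO()
    expand_json_column(io.StringIO(sample), out, "payload")
    print(out.getvalue())
```

With real files, `src` and `dst` would be file objects opened with `newline=""`. The resulting flat CSV can then be imported into Access or a SQL database like any other CSV.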
Related
I am trying to create a Parquet file from a CSV file using Apache NiFi.
I am able to convert the CSV to a Parquet file, but the schema of the Parquet file contains struct types, which I need to convert to string type.
I am using Apache NiFi 1.14.0 on Windows Server 2016.
This is what I've tried so far to convert CSV to Parquet:
I have used the following 3 controller services:
CSVReader
CSVRecordSetWriter
ParquetRecordSetWriter
And these are the processors in the flow:
GetFile
ConvertRecord (CSVReader to CSVRecordSetWriter; this automatically generates the "avro.schema" attribute, which I update in the next step)
UpdateAttribute (updating the "avro.schema" attribute: wherever 2 data types were inferred, I replace them with '["null","string"]')
ConvertRecord (CSVReader to ParquetRecordSetWriter)
UpdateAttribute (for appending '.parquet' to the filename)
PutFile
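The schema rewrite done in the UpdateAttribute step can be illustrated outside NiFi with a short Python sketch. This is not NiFi code: the function name `stringify_mixed_unions` is hypothetical, and it assumes that "2 data types inferred" means an Avro union with more than one non-null member type.

```python
import json


def stringify_mixed_unions(avro_schema_text):
    """Replace every field whose union has several non-null types with ["null", "string"]."""
    schema = json.loads(avro_schema_text)
    for field in schema.get("fields", []):
        field_type = field["type"]
        # In Avro's JSON form, a union is written as a list of types.
        if isinstance(field_type, list):
            non_null = [t for t in field_type if t != "null"]
            if len(non_null) > 1:
                field["type"] = ["null", "string"]
    return json.dumps(schema)


if __name__ == "__main__":
    # Made-up schema, only for illustration.
    inferred = json.dumps({
        "type": "record",
        "name": "example",
        "fields": [
            {"name": "id", "type": ["null", "int"]},
            {"name": "code", "type": ["null", "int", "string"]},
        ],
    })
    print(stringify_mixed_unions(inferred))
```

Fields whose union is just null plus one type are left alone here; only the mixed ones are forced to a nullable string.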
I would also like to know how to view a .parquet file on Windows. Currently, I am reading the Parquet file via PySpark and checking the schema. :|
This is what the Parquet file schema looks like after conversion. I want string instead of struct as output.
Please note: there are lots of CSVs with many columns/fields, so I don't want to create the schema manually.
Any other ways to achieve this would also be very helpful.
Thanks!
After playing around with some more options of "ParquetRecordSetWriter", I was able to create a parquet file with the schema that I've captured in "avro.schema" attribute.
I need to convert a CSV file to a JSON file using Python. I used this:
variable = csv.DictReader(file.csv)
It throws this error:
csv.Error: line contains NULL byte
I checked the CSV file in Excel and it shows no NULL characters, but when I printed the data from the CSV file using Python, some values looked like SOHNULNULHG (the last 2 letters, HG, are the data displayed in Excel). I need to remove these ASCII control characters from the CSV file while converting to JSON (i.e. I need only HG from the above string).
I just ran into the same issue. I converted my CSV file to CSV UTF-8 and ran it again without any errors. That seemed to fix the ASCII character issue.
To convert the CSV type, I opened my file in Excel, chose Save As, then selected CSV UTF-8 (Comma delimited) (*.csv) as the Save as type.
Hope that helps.
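If re-saving in Excel is not an option, the control characters can also be stripped in Python before parsing. A minimal sketch, assuming the stray bytes (such as SOH and NUL) are noise and not a sign that the file uses a different encoding; the function name `csv_to_json_records` is made up for illustration.

```python
import csv
import io
import json
import re

# ASCII control characters, except tab, line feed and carriage return.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def csv_to_json_records(text):
    """Strip control characters from CSV text and return its rows as a list of dicts."""
    cleaned = CONTROL_CHARS.sub("", text)
    return list(csv.DictReader(io.StringIO(cleaned, newline="")))


if __name__ == "__main__":
    raw = "id,code\n1,\x01\x00\x00HG\n"
    print(json.dumps(csv_to_json_records(raw)))
```

For a file on disk, the text would first be read with something like `open("file.csv", newline="", encoding="utf-8", errors="replace").read()`; the encoding given there is an assumption and may need adjusting.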
I am attempting to write a Spark DataFrame as a JSON file; this will eventually be written out into a MapR JSON DB table.
grp_small.toJSON.write.save("<path>")
This seems to write the JSON file in snappy.parquet format. How do I force it to write readable JSON (text format)?
You can write the DataFrame as JSON, with each row written as a readable JSON object on its own line.
grp_small.write.json("path to output")
Hope this helps!
I am trying to parse an XML file and write the resulting DataFrame to a CSV file.
My problem is that some characters are not supported when I write the output to the CSV. For example, there is a field Nectarine tree named ‘Polar Zee’ that is written as Nectarine tree named ‘Polar Zee’.
Are there any settings that need to be changed, or any properties that need to be added?
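One common cause of this symptom is that the CSV is written as UTF-8 but later opened by a tool that assumes a legacy code page. Whether that applies here is an assumption, since the question does not say how the file is written or viewed. A minimal sketch using Python's standard `csv` module, with a hypothetical helper `write_rows_utf8`:

```python
import csv
import os
import tempfile


def write_rows_utf8(path, header, rows):
    """Write rows to a CSV file as UTF-8 with a byte order mark (BOM)."""
    # "utf-8-sig" prepends a BOM, which spreadsheet tools such as Excel
    # use to detect UTF-8 instead of falling back to a legacy code page.
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "plants.csv")
    write_rows_utf8(target, ["title"], [["Nectarine tree named ‘Polar Zee’"]])
    with open(target, encoding="utf-8-sig") as handle:
        print(handle.read())
```

If the DataFrame is a pandas one, `DataFrame.to_csv` accepts the same `encoding="utf-8-sig"` argument.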
I have JSON data like this.
I want to save the JSON as CSV.
The output should look like this, with each title holding the information for that title.
I hope this gets converted to a comment, but look at pandas; it can probably do what you want (pandas JSON to CSV).