Creating individual JSON files from a CSV file that is already in JSON format

I have JSON data in a CSV file that I need to break apart into separate JSON files. The data looks like this: {"EventMode":"","CalculateTax":"Y",.... There are multiple rows of this, and I want each row to be a separate JSON file. I have used code provided by Jatin Grover that parses the CSV into JSON:
lcount = 0
out = json.dumps(row)
jsonoutput = open( 'json_file_path/parsedJSONfile'+str(lcount)+'.json', 'w')
jsonoutput.write(out)
lcount+=1
This does an excellent job; the problem is that it adds "R": " before the {"EventMode... and adds extra \ between each element, as well as an extra item at the end.
Each row of the CSV file is already a valid JSON object. I just need to break each row into a separate file with the .json extension.
I hope that makes sense. I am very new to this all.

It's not clear from your picture what your CSV actually looks like.
I mocked up a really small CSV with JSON lines that looks like this:
Request
"{""id"":""1"", ""name"":""alice""}"
"{""id"":""2"", ""name"":""bob""}"
(all the double-quotes are for escaping the quotes that are part of the JSON)
When I run this little script:
import csv

with open('input.csv', newline='') as input_file:
    reader = csv.reader(input_file)
    next(reader)  # discard/skip the first line (the header)
    for i, row in enumerate(reader):
        with open(f'json_file_path/parsedJSONfile{i}.json', 'w') as output_file:
            output_file.write(row[0])
I get two files, json_file_path/parsedJSONfile0.json and json_file_path/parsedJSONfile1.json, that look like this:
{"id":"1", "name":"Alice"}
and
{"id":"2", "name":"bob"}
Note that I'm not using json.dumps(...); that only makes sense if you are starting with data inside Python and want to save it as JSON. Your file just has text that is complete JSON, so basically copy-paste each line as-is to a new file.
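If you want to be extra careful, here is a variant of the script above (just a sketch, using the same mocked-up input.csv) that parses each row with json.loads first, so a malformed row raises an error instead of silently producing a broken .json file:
import csv
import json

with open('input.csv', newline='') as input_file:
    reader = csv.reader(input_file)
    next(reader)  # skip the header line
    for i, row in enumerate(reader):
        parsed = json.loads(row[0])  # raises ValueError if the row isn't valid JSON
        with open(f'json_file_path/parsedJSONfile{i}.json', 'w') as output_file:
            json.dump(parsed, output_file)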


Reading a .dat file in Julia, issues with variable delimiter spacing

I am having issues reading a .dat file into a dataframe. I think the issue is with the delimiter. I have included a screenshot of what the data in the file looks like below. My best guess is that it is tab-delimited between columns and then newline-delimited between rows. I have tried reading in the data with the following commands:
df = CSV.File("FORCECHAIN00046.dat"; header=false) |> DataFrame!
df = CSV.File("FORCECHAIN00046.dat"; header=false, delim = ' ') |> DataFrame!
My result either way is just a DataFrame with only one column including all the data from each column concatenated into one string. I even tried to specify the types with the following code:
df = CSV.File("FORCECHAIN00046.dat"; types=[Float64,Float64,Float64,Float64,
Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64]) |> DataFrame!
And I received the following error:
┌ Warning: 2; something went wrong trying to determine row positions for multithreading; it'd be very helpful if you could open an issue at https://github.com/JuliaData/CSV.jl/issues so package authors can investigate
I can work around this by uploading it into google sheets and then downloading a csv, but I would like to find a way to make the original .dat file work.
Part of the issue here is that .dat is not a proper file format—it's just something that seems to be written out in a somewhat human-readable format with columns of numbers separated by variable numbers of spaces so that the numbers line up when you look at them in an editor. Google Sheets has a lot of clever tricks built in to "do what you want" for all kinds of ill-defined data files, so I'm not too surprised that it manages to parse this. The CSV package on the other hand supports using a single character as a delimiter or even a multi-character string, but not a variable number of spaces like this.
Possible solutions:
if the files aren't too big, you could easily roll your own parser that splits each line and then builds a matrix
you can also pre-process the file turning multiple spaces into single spaces
That's probably the easiest way to do this and here's some Julia code (untested since you didn't provide test data) that will open your file and convert it to a more reasonable format:
function dat2csv(dat_path::AbstractString, csv_path::AbstractString)
    open(csv_path, write=true) do io
        for line in eachline(dat_path)
            join(io, split(line), ',')
            println(io)
        end
    end
    return csv_path
end

function dat2csv(dat_path::AbstractString)
    base, ext = splitext(dat_path)
    ext == ".dat" ||
        throw(ArgumentError("file name doesn't end with `.dat`"))
    return dat2csv(dat_path, "$base.csv")
end
You would call this function as dat2csv("FORCECHAIN00046.dat") and it would create the file FORCECHAIN00046.csv, which would be a proper CSV file using commas as delimiters. That won't work well if the files contain any values with commas in them, but it looks like they are just numbers, in which case it should be fine. So you can use this function to convert the files to proper CSV and then load that file with the CSV package.
A little explanation of the code:
the two-argument dat2csv method opens csv_path for writing and then calls eachline on dat_path to read one line from it at a time
eachline strips any trailing newline from each line, so each line will be a bunch of numbers separated by whitespace, with some leading and/or trailing whitespace
split(line) does the default splitting of line which splits it on whitespace, dropping any empty values—this leaves just the non-whitespace entries as strings in an array
join(io, split(line), ',') joins the strings in the array together, separated by the , character and writes that to the io write handle for csv_path
println(io) writes a newline after that—otherwise everything would just end up on a single very long line
the one-argument dat2csv method calls splitext to split the file name into a base name and an extension, checking that the extension is the expected .dat and calling the two-argument version with the .dat replaced by .csv
Try using the readdlm function from the DelimitedFiles library, and convert to a DataFrame afterwards:
using DelimitedFiles, DataFrames
df = DataFrame(readdlm("FORCECHAIN00046.dat"), :auto)

how to set header from a single file when reading multiple csv files with spark?

I have multiple .csv files with the same format. Their names look like file_#.csv, and the header is only in the first file (file_1.csv).
I read these files with Spark using this code:
spark.read.csv('*.csv', header=True)
When I show the result, the header is not the header of the first file; it is one of the data rows.
How can I tell Spark which file contains the header?
If you know the file that has the header row, then you can generate the schema by reading the schema from the header file and then use the same schema to read all other files.
df1 = spark.read.csv('file_1.csv', header=True)
header = spark.read.csv('file_1.csv', header=False).first()
df2 = spark.read.schema(df1.schema).csv('*.csv', header=False)
df2 = df2.filter(df2[df2.columns[0]] != header[0])  # drop the header row from the data
The code also removes the header line from the data. You can improve upon the filter if a few fields can be used to distinguish the header from the data, as sketched below.
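For example, a hedged sketch of that improvement, matching on the first two columns instead of just the first (which columns to use is an assumption; pick fields whose header values never appear in real data):
from pyspark.sql import functions as F

# Drop rows that match the header on the first two fields rather than one
df2 = df2.filter(~((F.col(df2.columns[0]) == header[0]) &
                   (F.col(df2.columns[1]) == header[1])))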
Not possible in any generic elegant way using standard spark.read apis.

how to save json data into csv using python

I have JSON data like this: [image]
I wanted to save the JSON into a CSV.
The output should look like this: each title becomes a column holding the information under that title.
I hope this gets converted to a comment, but look at Pandas; it can probably do what you want (Pandas json to csv).
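For instance, a minimal sketch with pandas (the file name data.json is a stand-in, since the actual data was only shown as an image):
import json
import pandas as pd

with open('data.json') as f:          # stand-in file name
    data = json.load(f)
df = pd.json_normalize(data)          # flattens nested objects into one column per field
df.to_csv('output.csv', index=False)  # one column per title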

Zapier Code Step Model Data into CSV

I'm looking for help with some JavaScript to insert inside of a code step in Zapier. I have two inputs that are named/look like the following:
RIDS: 991,992,993
LineIDs: 1,2,3
The two lists always match in the number of items; there can be 1, 2, or 100 of them. The order is significant.
What I'm looking for is a code step to model the data into one CSV matching up the positions of each. So using the above data, my output would look like this:
991,1
992,2
993,3
Does anyone have code or easily know how to achieve this? I am not a JavaScript developer.
Zapier doesn't allow you to create files in a code step. You can, though, use the code step to generate text which can then be used in another step. I used Python for my example (I'm not as familiar with Javascript but the strategy is the same).
Create CSV file in Zapier from Raw Data
Code Step with LineIDs and RIDs as inputs
import csv
import io

# Convert inputs into lists
lids = input_data['LineIDs'].split(',')
rids = input_data['RIDs'].split(',')

# Create file-like CSV object
csvfile = io.StringIO()
filewriter = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)

# Write CSV rows
filewriter.writerow(['LineID', 'RID'])
for x in range(len(lids)):
    filewriter.writerow([lids[x], rids[x]])

# Get CSV object value as text and set to output
output = {'text': csvfile.getvalue()}
Use a Google Drive step to Create File from Text
File Content = Text from Step 1
Convert to Document = no
This will create a *.txt document
Use a CloudConvert step to Convert File from txt to csv.

How do I read a list of JSON files from file in python?

I have a list of JSON files saved to disk that I would like to read. Sometimes the JSON files span more than one line, so I think a simple list comprehension that loops over open(file, 'rb').readlines() will fail.
The files are surrounded in brackets and so passing them to json.load or json.loads won't work.
An example file would be:
[{key:value,key2:value2},{morekeys:morevalues},{evenmorekeys,evenmorevalues}]
What is the best/ most Pythonic way to read a saved list of JSON entries when the entries span more than one line?
Your example is structurally valid JSON (once the keys and values are quoted strings). [] defines a JSON array. What you have is an array of objects:
with open("myFile.json") as f:
objects = json.load(f)
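And if you have a whole directory of such files, here is a small sketch (the json_dir path and *.json glob are assumptions) that collects every entry into one list; json.load has no trouble with files that span multiple lines:
import json
from pathlib import Path

all_objects = []
for path in Path('json_dir').glob('*.json'):
    with open(path) as f:
        all_objects.extend(json.load(f))  # each file holds one JSON array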