I have a dict() of all the NMEA sentences that are found in a CSV. I tried creating another CSV and writing the results of the dict() into it for statistical and logging purposes. However, I can't, because the dict() is apparently not 'callable'?
import csv
#Counts the number of times a GPS command is observed
def list_gps_commands(data):
    """Counts the number of times a GPS command is observed.
    Returns a dictionary object."""
    gps_cmds = dict()
    for row in data:
        try:
            gps_cmds[row[0]] += 1
        except KeyError:
            gps_cmds[row[0]] = 1
    return gps_cmds

print(list_gps_commands(read_it))
print("- - - - - - - - - - - - -")

with open('gpsresults.csv', 'w') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',', dialect='excel')
    spamwriter.writerow(list_gps_commands(read_it))
Can someone help me? Is there a way I can convert the keys/values into sequences that the csv module can recognize? Or another way?
Use csv.DictWriter instead of csv.writer.
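For example, a minimal sketch (assuming read_it is the same parsed CSV data from your code):

import csv

counts = list_gps_commands(read_it)  # dict mapping sentence type -> count
with open('gpsresults.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=sorted(counts), dialect='excel')
    writer.writeheader()     # one column per sentence type
    writer.writerow(counts)  # one row with the counts

If you would rather have one key/value pair per row, plain csv.writer works too: spamwriter.writerows(list_gps_commands(read_it).items()).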
I am working on a time series problem. The training time series data is stored in a large JSON file of about 30 GB. In TensorFlow I know how to use TFRecords. Is there a similar way in PyTorch?
I suppose IterableDataset (docs) is what you need, because:
you probably want to traverse files without random access;
the number of samples in the JSON files is not pre-computed.
I've made a minimal usage example, assuming every line of the dataset file is a JSON object itself, but you can change the logic.
import json
from torch.utils.data import DataLoader, IterableDataset
class JsonDataset(IterableDataset):
    def __init__(self, files):
        self.files = files

    def __iter__(self):
        for json_file in self.files:
            with open(json_file) as f:
                for sample_line in f:
                    sample = json.loads(sample_line)
                    yield sample['x'], sample['time'], ...

...

dataset = JsonDataset(['data/1.json', 'data/2.json', ...])
dataloader = DataLoader(dataset, batch_size=32)

for batch in dataloader:
    y = model(batch)
Generally, you do not need to change/overload the default data.DataLoader.
What you should look into is how to create a custom data.Dataset.
Once you have your own Dataset that knows how to extract items one by one from the json files, you feed it to the "vanilla" data.DataLoader, and all the batching, multi-processing, etc. is done for you based on the dataset you provided.
If, for example, you have a folder with several json files, each containing several examples, you can have a Dataset that looks like this (made concrete here assuming each json file stores a single list of examples):
import bisect
import json
import os

from torch.utils import data

class MyJsonsDataset(data.Dataset):
    def __init__(self, jfolder):
        super(MyJsonsDataset, self).__init__()
        self.filenames = []  # keep track of the jfiles you need to load
        self.cumulative_sizes = [0]  # keep track of number of examples viewed so far
        # assuming each json file in jfolder stores a single list of examples
        for jsonfile in sorted(os.listdir(jfolder)):
            fullname = os.path.join(jfolder, jsonfile)
            self.filenames.append(fullname)
            with open(fullname) as f:
                l = len(json.load(f))  # number of examples in jsonfile
            self.cumulative_sizes.append(self.cumulative_sizes[-1] + l)
        # discard the first element
        self.cumulative_sizes.pop(0)

    def __len__(self):
        return self.cumulative_sizes[-1]

    def __getitem__(self, idx):
        # first you need to know which of the files holds the idx example
        jfile_idx = bisect.bisect_right(self.cumulative_sizes, idx)
        if jfile_idx == 0:
            sample_idx = idx
        else:
            sample_idx = idx - self.cumulative_sizes[jfile_idx - 1]
        # now retrieve the `sample_idx` example from self.filenames[jfile_idx]
        with open(self.filenames[jfile_idx]) as f:
            return json.load(f)[sample_idx]
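Once the Dataset is in place, the stock DataLoader does the rest. A quick usage sketch (the folder name and batch size are placeholders):

from torch.utils import data

dataset = MyJsonsDataset('data/')  # hypothetical folder of json files
loader = data.DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
for batch in loader:
    ...  # train on the batch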
I am using the following code in Python 3 to convert ~30,000 json files to a csv.
import json
from glob import glob

with open('out.csv', 'w') as f:
    for fname in glob("*.json"):  # reads all json files from the current directory
        with open(fname) as j:
            f.write(str(json.load(j)))
            f.write('\n')
The json files are timestamps and values, for example {"1501005600":956170,"1501048800":970046,...
The output currently puts each file's entire JSON on one line as a single string. How can I put each timestamp and value in their own respective cells?
I have tried many approaches with csv.writer but I cannot figure this out.
UPDATE
with open('out.csv', 'w') as f:
    for fname in glob("*.json"):
        with open(fname) as j:
            values = json.load(j)
            for k, v in values.items():
                f.write("{},{},".format(str(k), str(v)))
Parsing is correct but each .json file is on one row now.
A friend helped me get to the bottom of this, hope this may help others.
with open('[insert].csv', 'w') as f:
    for fname in glob("*.json"):
        with open(fname) as j:
            values = json.load(j)
            for k, v in values.items():
                f.write("{},{},".format(str(k), str(v)))
            f.write('\n')
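For what it's worth, the string formatting above leaves a trailing comma on every row; a sketch of the same loop with csv.writer (same assumptions about the file layout) avoids that:

import csv
import json
from glob import glob

with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for fname in glob("*.json"):
        with open(fname) as j:
            values = json.load(j)
        # flatten {"timestamp": value, ...} into one row: ts1, v1, ts2, v2, ...
        row = []
        for k, v in values.items():
            row.extend([k, v])
        writer.writerow(row)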
I want to write a program that loads data from a JSON database into a Python list of dictionaries and tallies the number of times the mean temperature was above versus below freezing. However, I am struggling to extract information from the database successfully; I am concerned my algorithm is off. My plan:
1) define a function that loads data from the json file
2) define a function that extracts information from the file
3) use that extracted information to tally the number of times the temp was above or below freezing
import json

def load_weather_data(): #function 1: Loads data
    with open("NYC4-syr-weather-dec-2015.json", encoding = 'utf8') as w: #w for weather
        data = w.read()
        weather = json.loads(data)
        print(type(weather))
        return weather

def extract_temp(weather): #function 2: Extracts information on weather
    info = {}
    info['Mean TemperatureF'] = weather['Mean TemperatureF'] #i keep getting a type error here
    return info

print("Above and below freezing")
weather = load_weather_data()
info = extract_temp(weather)

above_freezing = 0
below_freezing = 0
for temperature in weather: # summing the number of times the weather was above versus below freezing
    if info['Mean Temperature'] > 32:
        above_freezing = above_freezing + 1
    elif info['mean temperature'] < 32:
        below_freezing = below_freezing + 1

print(above_freezing)
print(below_freezing)
If you have any ideas, please let me know! Thank you.
You are trying to extract the temperature from the weather list once, before starting the loop, when really you should be doing it for each temperature object in the loop. You haven't posted sample data, but I think weather is a list and you are trying to use it as a dict. Below is a fix with a couple of other changes for tidiness.
import json

# fixed: call with filename so that the function works on other files
def load_weather_data(filename): #function 1: Loads data
    with open(filename, encoding='utf8') as w: #w for weather
        # fixed: fewer steps
        return json.load(w)

# fixed: not needed, doesn't simplify anything
#def extract_temp(weather): #function 2: Extracts information on weather
#    info = {}
#    info['Mean TemperatureF'] = weather['Mean TemperatureF'] #i keep getting a type error here
#    return info

print("Above and below freezing")
weather = load_weather_data("NYC4-syr-weather-dec-2015.json")

above_freezing = 0
below_freezing = 0
for temperature in weather: # summing the number of times the weather was above versus below freezing
    # fixed: look up the key on each temperature object in the loop,
    # using the capitalized 'Mean TemperatureF' key from the original code
    if temperature['Mean TemperatureF'] > 32:
        above_freezing += 1
    elif temperature['Mean TemperatureF'] < 32:
        below_freezing += 1

print(above_freezing)
print(below_freezing)
I want to output an empty dataframe to a csv file. I use this code:
df.repartition(1).write.csv(path, sep='\t', header=True)
But because there is no data in the dataframe, Spark won't write the header to the csv file.
Then I modified the code to:
if df.count() == 0:
    empty_data = [f.name for f in df.schema.fields]
    df = ss.createDataFrame([empty_data], df.schema)
    df.repartition(1).write.csv(path, sep='\t')
else:
    df.repartition(1).write.csv(path, sep='\t', header=True)
It works, but I want to ask whether there is a better way that avoids the count.
df.count() == 0 will make your driver program retrieve the count of all your dataframe's partitions across the executors.
In your case I would check just the first row instead: in PySpark, len(df.take(1)) == 0 (the Scala equivalent is df.take(1).isEmpty, Spark >= 2.1). Still slow, but preferable to a raw count().
Only header:
cols = '\t'.join(df.columns)
with open('./cols.csv', 'w') as f:
    f.write(cols)
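Putting the two together, a minimal sketch (path, df and the './cols.csv' fallback as above):

if len(df.take(1)) == 0:
    # empty dataframe: write only the header line, locally as above
    with open('./cols.csv', 'w') as f:
        f.write('\t'.join(df.columns))
else:
    df.repartition(1).write.csv(path, sep='\t', header=True)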
I've found some data that someone is downloading into a JSON file (I think! - I'm a newb!). The file contains data on nearly 600 football players.
Here you can find the file
In the past, I have downloaded the json file and then used this code:
import csv
import json
json_data = open("file.json")
data = json.load(json_data)

f = csv.writer(open("fix_hists.csv","wb+"))

arr = []

for i in data:
    fh = data[i]["fixture_history"]
    array = fh["all"]
    for j in array:
        try:
            j.insert(0,str(data[i]["first_name"]))
        except:
            j.insert(0,'error')
        try:
            j.insert(1,data[i]["web_name"])
        except:
            j.insert(1,'error')
        try:
            f.writerow(j)
        except:
            f.writerow(['error','error'])

json_data.close()
Sadly, when I run this now at the command prompt, I get the following error:
Traceback (most recent call last):
  File "fix_hist.py", line 12, in <module>
    fh = data[i]["fixture_history"]
TypeError: list indices must be integers, not str
Can this be fixed, or is there another way I can grab some of the data and convert it to .csv? Specifically the 'fixture_history', and then 'first_name', 'type_name', etc.
Thanks in advance for any help :)
Try this tool: http://www.convertcsv.com/json-to-csv.htm
You will need to configure a few things, but it should be easy enough.
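If you'd rather fix the script, the TypeError suggests the JSON's top level is now a list of player objects rather than a dict keyed by id, so you can iterate over the elements directly. A sketch under that assumption (field names taken from the original script):

import csv
import json

with open("file.json") as json_data:
    data = json.load(json_data)

# data is assumed to be a list of player dicts now, not a dict keyed by id
with open("fix_hists.csv", "w", newline='') as out:
    f = csv.writer(out)
    for player in data:
        for j in player["fixture_history"]["all"]:
            j.insert(0, str(player.get("first_name", 'error')))
            j.insert(1, player.get("web_name", 'error'))
            f.writerow(j)

Note that "wb+" in the old script is a Python 2 idiom; in Python 3 csv files are opened in text mode with newline=''.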