I have a .MF4 file and want to export a list of Channels to a CSV file.
Following is the function i used:
list.export(fmt='csv', filename='foo.csv',single_time_base=True, overwrite = True)
empty_channels are skipped by default see documentation
in the CSV i dont get any values for a specific Channel because there are multiple Signals with the same name. So he only find the empty channel and skips this.
How is it possible to get the Signal and add it to the CSV?
Using the development branch code I get the expected result, since an unique name is generate in case of multiple occurrence. So you should see the following columns in the .csv output: "Sig", "Sig_0", "Sig_1" ...
from asammdf import MDF, Signal
import numpy as np
mdf = MDF()
mdf.append(Signal(np.arange(10), np.arange(10), name='Sig'))
mdf.append(Signal([], [], name='Sig'))
mdf.append(Signal(np.arange(0, 10, 0.1), np.arange(0, 10, 0.1), name='Sig'))
mdf.export(fmt='csv', filename='foo.csv', single_time_base=True, overwrite = True)
Related
I am trying to load the airbnb_nyc data set from GCS bucket to BigqueryTable. Link to the dataset.
I am using the following Code:
def parse_file(element):
for line in csv.reader([element],delimiter=','):
return line
class DataIngestion2:
def parse_method2(self, values):
row1 = dict(
zip(('id', 'name', 'host_id', 'host_name', 'neighbourhood_group', 'neighbourhood', 'latitude', 'longitude',
'room_type', 'price', 'minimum_nights', 'number_of_reviews', 'last_review', 'reviews_per_month',
'calculated_host_listings_count', 'availability_365'),
values))
return row1
with beam.Pipeline(options=pipeline_options) as p:
lines= p | 'Read' >> ReadFromText(known_args.input,skip_header_lines=1)\
| 'parse' >> beam.Map(parse_file)
pipeline2 = lines | 'Format to Dict _ original CSV' >> beam.Map(lambda x: data_ingestion2.parse_method2(x))
pipeline2 | 'Load2' >> beam.io.WriteToBigQuery(table_spec, schema=table_schema,
write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED
)
`
But my output on BigQuery Table is wrong.
I am only getting values for the first two columns and the rest of the 14 columns are showing NULL. I am not able to figure out what I am doing wrong. Can Someone Help me find the error in my logic. I basically want to know how to transfer a csv from GCS bucket to BigQuery through DataFlow pipeline.
Thank you,
You can use the ReadFromText method and then create your own transform by extending beam.DoFn. Attached the code below for reference.
https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.io.textio.html#apache_beam.io.textio.ReadFromText
Note that you can use gs:// for GCS in file_pattern.
More details about Pardo and DoFn
https://beam.apache.org/documentation/programming-guide/#pardo
import apache_beam as beam
from apache_beam.io.textio import ReadAllFromText,ReadFromText
from apache_beam.io.gcp.bigquery import WriteToBigQuery
from apache_beam.io.gcp.gcsio import GcsIO
import csv
COLUMN_NAMES = ['id','name','host_id','host_name','neighbourhood_group','neighbourhood','latitude','longitude','room_type','price','minimum_nights','number_of_reviews','last_review','reviews_per_month','calculated_host_listings_count','availability_365']
def files(path='gs:/some/path'):
return list(GcsIO(storage_client='<ur storage client>').list_prefix(path=path).keys())
def transform_csv(element):
rows = []
with open(element,newline='\r\n') as f:
itr = csv.reader(f, delimiter = ',',quotechar= '"')
skip_head = next(itr)
for row in itr:
rows.append(row)
return rows
def to_dict(element):
rows = []
for item in element:
row_dict = {}
zipped = zip(COLUMN_NAMES,item)
for key,val in zipped:
row_dict[key] =val
rows.append(row_dict)
yield rows
with beam.Pipeline() as p:
read =(
p
|'read-file'>> beam.Create(files())
|'transform-dict'>>beam.Map(transform_csv)
|'list-to-dict'>>beam.FlatMap(to_dict )
|'print'>>beam.Map(print)
#|'write-to-bq'>>WriteToBigQuery(schema=COLUMN_NAMES,table='ur table',project='',dataset='')
)
EDITED1 The ReadFromText supports \r\n as newline char.But,this fails to consider the condition where column data itself has \r\n. Updating the code below.
EDITED 2 GcsIo error fixed.
Note - I have used GCSIO for getting the list of files.
Details here
Please Up-vote and mark as answer if this helps.
Let me suggest another approch for this use case. BiqQuery offers special feature for uploading from Google Could Storage (GCS) to Bigquery. You can load data in several formats and CSV is among them.
There is nice tutorial on Google documentation explaining how to do it. You do not have to use Dataflow or apache_beam. Such process is available through BigQuery API itself.
This is working in many languages, but you do not have to use any language as such process can be done from console or via Cloud SDK using bq command. Everything can be found in mentioned tutorial.
I am trying to extract json file data using python but running in some errors.
aircraft.json (json file):
{ "now" : 1609298440.3,
"messages" : 31501,
"aircraft" : [
{"hex":"abadf9","alt_baro":37000,"alt_geom":36625,"gs":541.9,"track":73.3,"baro_rate":0,"version":0,"nac_p":7,"nac_v":1,"sil":2,"sil_type":"unknown","mlat":[],"tisb":[],"messages":13,"seen":6.6,"rssi":-25.3},
{"hex":"acc02b","flight":"SWA312 ","alt_baro":37000,"alt_geom":36650,"gs":549.3,"track":62.2,"baro_rate":0,"category":"A3","nav_qnh":1013.6,"nav_altitude_mcp":36992,"nav_heading":56.2,"lat":42.171346,"lon":-93.298198,"nic":8,"rc":186,"seen_pos":66.3,"version":2,"nic_baro":1,"nac_p":8,"nac_v":1,"sil":3,"sil_type":"perhour","gva":1,"sda":2,"mlat":[],"tisb":[],"messages":1205,"seen":7.4,"rssi":-26.0},
{"hex":"ac9e9a","category":"A4","version":2,"sil_type":"perhour","mlat":[],"tisb":[],"messages":746,"seen":119.1,"rssi":-26.6},
{"hex":"a96577","flight":"DAL673 ","alt_baro":40025,"alt_geom":39625,"gs":371.4,"track":265.1,"baro_rate":0,"squawk":"2641","emergency":"none","category":"A4","nav_qnh":1013.6,"nav_altitude_mcp":40000,"nav_heading":258.8,"lat":42.057220,"lon":-94.098337,"nic":8,"rc":186,"seen_pos":0.9,"version":2,"nic_baro":1,"nac_p":9,"nac_v":1,"sil":3,"sil_type":"perhour","gva":2,"sda":2,"mlat":[],"tisb":[],"messages":3021,"seen":0.3,"rssi":-21.8},
{"hex":"aa56db","category":"A3","version":2,"sil_type":"perhour","mlat":[],"tisb":[],"messages":1651,"seen":85.3,"rssi":-26.4}
]
}
My code:
import json
json_file = open('test.json')
aircraft_json = json.load(json_file)
for i in aircraft_json['aircraft']:
print(i['hex'],i['flight'],i['alt_baro'],i['alt_geom'],i['gs'],i['gs'],i['track'],i['baro_rate'],i[
'category'],i['nav_qnh'],i['nav_altitude_mcp'],i['lat'],i['lon'],i['nic'],i['rc'],i['seen_pos'],i['version'],i['nic_baro'],i['nac_p'],i['nac_v'],i['sil'],i['sil_type'],i['gva'],i['sda'],i['mlat'],i['tisb'],i['messages'],i['seen'],i['rssi'])
json_file.close()
Output:
Traceback (most recent call last):
File "/home/pi/aircraft_json_to_csv.py", line 11, in <module>
print(i['hex'],i['flight'],i['alt_baro'],i['alt_geom'],i['gs'],i['gs'],i['track'],i['baro_rate'],i[
KeyError: 'flight
The json file is updated every second and json file may miss key values like 'flight' or any random key values. My question is if that key is missing then how to replace those missing value with empty space without getting keyerror.
Thank you
My advice would be to give each field a suitable default value and store these fields in a dictionary.
Then, instead of assuming the field is present, check if the field exists. If it doesn't, then apply the default value.
Below is a simple example of this in action.
The defaults dict has been populated with a few possible defaults
to get you started, to which you would add the rest of the fields as well.
I've adapted the loop to iterate through the keys of the dict (all the known fields so to speak), and add the default value for any missing field.
import json
with open('aircraft.json') as json_file:
aircraft_json = json.load(json_file)
defaults = {
'alt_baro': 0,
'alt_geom': 0,
'version': 0,
'baro_rate': 0,
'mlat': [],
'tisb': []
# similarly for the other fields
}
for dat in aircraft_json['aircraft']:
for field in defaults.keys():
if field not in dat:
dat[field] = defaults[field]
print(dat[field], end=' ')
print('')
I am currently trying to import a big csv file (50GB+) without any headers into a pyarrow table with the overall target to export this file into the Parquet format and further to process it in a Pandas or Dask DataFrame. How can i specify the column names and column dtypes within pyarrow for the csv file?
I already thought about to append the header to the csv file. This enforces a complete rewrite of the file which looks like a unnecssary overhead. As far as I know, pyarrow provides schemas to define the dtypes for specific columns, but the docs are missing a concrete example for doing so while transforming a csv file to an arrow table.
Imagine that this csv file just has for an easy example the two columns "A" and "B".
My current code looks like this:
import numpy as np
import pandas as pd
import pyarrow as pa
df_with_header = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
print(df_with_header)
df_with_header.to_csv("data.csv", header=False, index=False)
df_without_header = pd.read_csv('data.csv', header=None)
print(df_without_header)
opts = pa.csv.ConvertOptions(column_types={'A': 'int8',
'B': 'int8'})
table = pa.csv.read_csv(input_file = "data.csv", convert_options = opts)
print(table)
If I print out the final table, its not going to change the names of the columns.
pyarrow.Table
1: int64
3: int64
How can I now change the loaded column names and dtypes? Is there maybe also a possibility to for example pass in a dict containing the names and their dtypes?
You can specify type overrides for columns:
fp = io.BytesIO(b'one,two,three\n1,2,3\n4,5,6')
fp.seek(0)
table = csv.read_csv(
fp,
convert_options=csv.ConvertOptions(
column_types={
'one': pa.int8(),
'two': pa.int8(),
'three': pa.int8(),
}
))
But in your case you don't have a header, and as far as I can tell this use case is not supported in arrow:
fp = io.BytesIO(b'1,2,3\n4,5,6')
fp.seek(0)
table = csv.read_csv(
fp,
parse_options=csv.ParseOptions(header_rows=0)
)
This raises:
pyarrow.lib.ArrowInvalid: header_rows == 0 needs explicit column names
The code is here: https://github.com/apache/arrow/blob/3cf8f355e1268dd8761b99719ab09cc20d372185/cpp/src/arrow/csv/reader.cc#L138
This is similar to this question apache arrow - reading csv file
There should be fix for it in the next version: https://github.com/apache/arrow/pull/4898
BACKGROUND:
I am having issues trying to search through some CSV files.
I've gone through the python documentation: http://docs.python.org/2/library/csv.html
about the csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds) object of the csv module.
My understanding is that the csv.DictReader assumes the first line/row of the file are the fieldnames, however, my csv dictionary file simply starts with "key","value" and goes on for atleast 500,000 lines.
My program will ask the user for the title (thus the key) they are looking for, and present the value (which is the 2nd column) to the screen using the print function. My problem is how to use the csv.dictreader to search for a specific key, and print its value.
Sample Data:
Below is an example of the csv file and its contents...
"Mamer","285713:13"
"Champhol","461034:2"
"Station Palais","972811:0"
So if i want to find "Station Palais" (input), my output will be 972811:0. I am able to manipulate the string and create the overall program, I just need help with the csv.dictreader.I appreciate any assistance.
EDITED PART:
import csv
def main():
with open('anchor_summary2.csv', 'rb') as file_data:
list_of_stuff = []
reader = csv.DictReader(file_data, ("title", "value"))
for i in reader:
list_of_stuff.append(i)
print list_of_stuff
main()
The documentation you linked to provides half the answer:
class csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
[...] maps the information read into a dict whose keys are given by the optional fieldnames parameter. If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames.
It would seem that if the fieldnames parameter is passed, the given file will not have its first record interpreted as headers (the parameter will be used instead).
# file_data is the text of the file, not the filename
reader = csv.DictReader(file_data, ("title", "value"))
for i in reader:
list_of_stuff.append(i)
which will (apparently; I've been having trouble with it) produce the following data structure:
[{"title": "Mamer", "value": "285713:13"},
{"title": "Champhol", "value": "461034:2"},
{"title": "Station Palais", "value": "972811:0"}]
which may need to be further massaged into a title-to-value mapping by something like this:
data = {}
for i in list_of_stuff:
data[i["title"]] = i["value"]
Now just use the keys and values of data to complete your task.
And here it is as a dictionary comprehension:
data = {row["title"]: row["value"] for row in csv.DictReader(file_data, ("title", "value"))}
The currently accepted answer is fine, but there's a slightly more direct way of getting at the data. The dict() constructor in Python can take any iterable.
In addition, your code might have issues on Python 3, because Python 3's csv module expects the file to be opened in text mode, not binary mode. You can make your code compatible with 2 and 3 by using io.open instead of open.
import csv
import io
with io.open('anchor_summary2.csv', 'r', newline='', encoding='utf-8') as f:
data = dict(csv.reader(f))
print(data['Champhol'])
As a warning, if your csv file has two rows with the same value in the first column, the later value will overwrite the earlier value. (This is also true of the other posted solution.)
If your program really is only supposed to print the result, there's really no reason to build a keyed dictionary.
import csv
import io
# Python 2/3 compat
try:
input = raw_input
except NameError:
pass
def main():
# Case-insensitive & leading/trailing whitespace insensitive
user_city = input('Enter a city: ').strip().lower()
with io.open('anchor_summary2.csv', 'r', newline='', encoding='utf-8') as f:
for city, value in csv.reader(f):
if user_city == city.lower():
print(value)
break
else:
print("City not found.")
if __name __ == '__main__':
main()
The advantage of this technique is that the csv isn't loaded into memory and the data is only iterated over once. I also added a little code the calls lower on both the keys to make the match case-insensitive. Another advantage is if the city the user requests is near the top of the file, it returns almost immediately and stops looking through the file.
With all that said, if searching performance is your primary consideration, you should consider storing the data in a database.
I have a CSV file looks like
Hit39, Hit24, Hit9
Hit8, Hit39, Hit21
Hit46, Hit47, Hit20
Hit24, Hit 53, Hit46
I want to read file and create a dictionary based on the first come first serve first basis
like Hit39 : 1, Hit 24:2 and so on ...
but notice Hit39 appeared on column 2 and row2 . So if the reader reads it then it should not append it to dictionary it will move on with the new number.
Once a row number is visited it shouldn't include numbers after that if appeared.
Using Python - Best guess until the OP is clarified - treat the file as though it was one huge list and assign an incrementing variable to unique occurences of value.
import csv
from itertools import count
mydict = {}
counter = count(1)
with open('infile.csv') as fin:
for row in csv.reader(fin, skipinitialspace=True):
for col in row:
mydict[col] = mydict.get(col, next(counter))
Since Python is a popular language that has dictionaries, you must be using Python. At least I assume.
import csv
reader = csv.reader(file("filename.csv"))
d = dict((line[0], 1+lineno) for lineno, line in enumerate(reader))
print d