Extracting csv file rows as individual .txt files - csv

I am new to Python and trying to extract certain data from rows of a csv file into individual .txt files (to create a corpus for NLP). So far I have the following:
import csv
with open(r"file.csv", "r+", encoding='utf-8') as f:
    reader = csv.reader(f)
    data = list(reader)
t = data[1][91]
fn = str(data[1][90])
g = open("%s.txt" % fn, "w+")
g.write(t)
g.close()
Which does what I want for the 1st row; however, I am not sure how to get the program to loop up to row 1047. Note: the [1] signifies row 1; the [91] and [90] should remain fixed.
Thanks in advance!
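One way to extend this to rows 1 through 1047 is to wrap the per-row logic in a loop. A minimal sketch, keeping the fixed column indices 90 and 91 from the question (the function name and parameters are just illustrative):

```python
import csv

def rows_to_txt(csv_path, first_row=1, last_row=1047, name_col=90, text_col=91):
    """Write column text_col of rows first_row..last_row to '<column name_col>.txt' files."""
    # Read the whole file once, then loop over the requested rows.
    with open(csv_path, encoding="utf-8") as f:
        data = list(csv.reader(f))
    for row in data[first_row:last_row + 1]:
        with open("%s.txt" % row[name_col], "w", encoding="utf-8") as out:
            out.write(row[text_col])
```

Calling `rows_to_txt("file.csv")` would then produce one .txt file per row; the `with` blocks also close each file properly, which the `g.close` (without parentheses) in the original does not.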

Related

Azure Bicep - load excel file in Bicep

I would like to load the values from excel file, they are only names inside it and I have a lot of them. So I don't want to copy all of them and place them in an array. I want some solution if it's possible like [loadJsonContent].
If you are asking for a built-in Bicep file function for Excel, the answer is no.
According to the official documentation, Bicep has only three file functions:
loadFileAsBase64
Loads the file as a base64 string.
loadJsonContent
Loads the specified JSON file as an Any object.
loadTextContent
Loads the content of the specified file as a string.
This requirement needs to be achieved by writing code.
By the way, you didn't specify the Excel format: xlsx or csv? If possible, please provide a sample file so that we can provide specific code.
For example, suppose I have a Student.xlsx file with a 'Student Name' column in a sheet named 'Tabelle1' (a CSV file with the same structure also works).
Then I can use this Python code to parse and get the data I want:
import os
import openpyxl
import csv

# get the student names from the 'Student Name' column of sheet 'Tabelle1'
# of the file 'XLSX_Folder/Student.xlsx'
def get_student_name(file_path, sheet_name, col):
    student_name = []
    if file_path.endswith('.xlsx'):
        wb = openpyxl.load_workbook(file_path)
        sheet = wb[sheet_name]
        # get all the values in the column, skipping the header row
        for i in range(2, sheet.max_row + 1):
            student_name.append(sheet.cell(row=i, column=col).value)
        print('This is an xlsx file.')
        return student_name
    elif file_path.endswith('.csv'):
        # get all the values in the column, except the header row
        with open(file_path, 'r') as f:
            reader = csv.reader(f)
            for row in reader:
                if reader.line_num == 1:
                    continue
                student_name.append(row[col - 1])
        print('This is a csv file.')
        return student_name
    else:
        print('This is some other file format.')

XLSX_file_path = 'XLSX_Folder/Student.xlsx'
CSV_file_path = 'CSV_Folder/Student.csv'
sheet_name = 'Tabelle1'
col = 2
print(get_student_name(XLSX_file_path, sheet_name, col))
print(get_student_name(CSV_file_path, sheet_name, col))
Result: both calls print the list of student names.
After that, put the extracted data into your Bicep file.
The above code is just a demo; you can write your own in whatever development language you like. Either way, there is no built-in feature for this requirement.
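Since loadJsonContent is available, one practical route is to dump the extracted names to a JSON file from Python and load that file in Bicep. A minimal sketch (the file and variable names are just examples):

```python
import json

def names_to_json(names, out_path):
    # Write the extracted names as a JSON array that Bicep's
    # loadJsonContent can read back as an array of strings.
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(names, f, indent=2)
```

After something like `names_to_json(student_names, 'students.json')`, the Bicep side becomes `var studentNames = loadJsonContent('students.json')`.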

create a script to classify elements of a CSV file

I have the list of all the airports in the world in a CSV file. I would like to know if it is possible to create a script that creates a folder for each country, puts all the airports of the same country in that folder, and does this automatically for every country present in the CSV file.
Thanks for your help.
I am assuming that you have a csv file called input.csv which contains a column named Country.
The following python script creates a folder for every distinct country in the input file and appends the airport data in a file called data.csv inside that folder.
import os
import csv

countries = []
with open('input.csv', 'r') as read_obj:
    csv_dict_reader = csv.DictReader(read_obj)
    for row in csv_dict_reader:
        if row["Country"] not in countries:
            countries.append(row["Country"])
            try:
                os.mkdir(row["Country"])
            except FileExistsError:
                print(row["Country"], "already exists")
        with open(row["Country"] + '/data.csv', 'a+') as f:
            writer = csv.DictWriter(f, row.keys())
            writer.writerow(row)
You might want to check pandas for another way to achieve this.
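With pandas, the country case from the question can be sketched with groupby (assuming the same input.csv with a Country column as above):

```python
import os
import pandas as pd

def split_by_country(csv_path):
    # One folder per distinct country, each holding that country's rows.
    df = pd.read_csv(csv_path)
    for country, group in df.groupby("Country"):
        os.makedirs(str(country), exist_ok=True)  # no error if the folder exists
        group.to_csv(os.path.join(str(country), "data.csv"), index=False)
```

Unlike the csv-module version, this writes each country's file in one shot, header included.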
The following script reads 2 different csv files containing data that reference the same airport codes, creates a folder for each airport code, and saves its data in 2 different files, one per input. Change the input and output filenames according to your needs.
import os
import pandas as pd

df1 = pd.read_csv('input.csv')
df2 = pd.read_csv('input1.csv')
for c in df1['code'].unique():
    try:
        os.mkdir(c)
    except FileExistsError:
        print(c, "already exists")
    df1.loc[df1["code"] == c].to_csv(c + '/output1.csv', index=False)
for c in df2['code'].unique():
    try:
        os.mkdir(c)
    except FileExistsError:
        print(c, "already exists")
    df2.loc[df2["code"] == c].to_csv(c + '/output2.csv', index=False)

Python: copy and paste files based on paths in two csvs

I have two csv files, one with a list of paths for source files, the second, a list of paths for where to copy the files to. Both files have the same number of elements and each source file is only copied once.
How would I load the .csv files (Pandas? Numpy? csv.reader?), and how would I copy all of the items in the best possible way? I am able to get the following to work if src and dest each refer to one path.
import pandas as pd
srcdf = pd.read_csv('src.csv')
destdf = pd.read_csv('dest.csv')
from shutil import copyfile
copyfile(src,dest)
There are no headers or columns in my files; it's just a vector of comma-separated values. The values in my src csv file look like:
/Users/johndoe/Downloads/50.jpg,
/Users/johndoe/Downloads/51.jpg,
The values in my dest csv file look like:
/Users/johndoe/Downloads/newFolder/50.jpg,
/Users/johndoe/Downloads/newFolder/51.jpg,
Assuming your CSV is just a list of paths with a single path on each row, you could do something like this:
import csv
from shutil import copyfile

def load_paths(filename):
    pathdict = {}
    with open(filename) as csvfile:
        filereader = csv.reader(csvfile, delimiter=' ')
        a = 0
        for row in filereader:
            # join the row back together and drop the trailing comma
            pathdict[a] = ''.join(row).rstrip(',')
            a += 1
    return pathdict

srcpaths = load_paths('srcfile.csv')
dstpaths = load_paths('dstfile.csv')
for a in range(len(srcpaths)):
    copyfile(srcpaths[a], dstpaths[a])
You can use numpy genfromtxt as follows:
import numpy as np
from shutil import copyfile

# read each line's first comma-separated field as a string
# (the trailing commas make a second, empty column)
srcdf = np.genfromtxt('./src.csv', dtype=str, delimiter=',', usecols=0)
destdf = np.genfromtxt('./dest.csv', dtype=str, delimiter=',', usecols=0)
assert len(srcdf) == len(destdf)
for n in range(len(srcdf)):
    copyfile(srcdf[n], destdf[n])
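Since each file is just one path per line (with a trailing comma in the samples above), plain file reading plus zip also works without any csv machinery. A minimal sketch, with assumed filenames:

```python
import shutil

def read_paths(list_file):
    # one path per line; strip whitespace and any trailing comma
    with open(list_file) as f:
        return [line.strip().rstrip(",") for line in f if line.strip()]

def copy_all(src_list, dest_list):
    srcs, dests = read_paths(src_list), read_paths(dest_list)
    assert len(srcs) == len(dests), "both lists must have the same length"
    for src, dest in zip(srcs, dests):
        shutil.copyfile(src, dest)
```

The assert makes the "same number of elements" precondition from the question explicit instead of silently truncating.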

Python 3 .csv not writing

I am trying to insert quiz results from a quiz into a .csv file; however, the results are not being written to the file after it is created.
file_writer = csv.writer(open('Class Results.csv', 'w'), delimiter=',')
file_writer.writerow((name, Class, score))
Is any other part of my code required?
You never store or close the file object; the data is only flushed to disk when the file object is closed. Use a with block so it is closed automatically:
with open('Class Results.csv', 'w', newline='') as f:
    file_writer = csv.writer(f, delimiter=',')
    file_writer.writerow((name, Class, score))

NOAA data and writing a csv file

I am trying to create a csv file from NOAA data from their http://www.srh.noaa.gov/data/obhistory/PAFA.html.
At the moment, I am having problems writing the csv file.
import urllib2 as urllib
from bs4 import BeautifulSoup
from time import localtime, strftime
import csv

url = 'http://www.srh.noaa.gov/data/obhistory/PAFA.html'
file_pointer = urllib.urlopen(url)
soup = BeautifulSoup(file_pointer)
table = soup('table')[3]
table_rows = table.findAll('tr')
row_count = 0
for table_row in table_rows:
    row_count += 1
    if row_count < 4:
        continue
    date = table_row('td')[0].contents[0]
    time = table_row('td')[1].contents[0]
    wind = table_row('td')[2].contents[0]
    print date, time, wind
    with open("/home/eyalak/Documents/weather/weather.csv", "wb") as f:
        writer = csv.writer(f)
        print date, time, wind
        writer.writerow(('Title 1', 'Title 2', 'Title 3'))
        writer.writerow(str(time) + str(wind) + str(date) + '\n')
    if row_count == 74:
        print "74"
The printed result is fine, it is the file that is not. I get:
Title 1,Title 2,Title 3
0,5,:,5,3,C,a,l,m,0,8,"
The problems in the csv file created are:
1. The title is broken into the wrong columns; column 2 has "1,Title" versus "Title 2".
2. The data is comma-delineated in the wrong places.
3. As the script writes new lines, it overwrites the previous one instead of appending at the bottom.
Any thoughts?
As far as overwriting rows, try opening the file with the 'a' (append) mode rather than 'wb'. As for fixing the comma delineation, wrap each string in square brackets (i.e. a one-item list) so the writer treats it as a single field. Take a look at the two examples here to see the difference:
import csv

text = 'This is a string'
with open('test.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerow(text)
This creates a csv whose first row contains each character of text separated by a comma. Alternatively,
import csv

text = 'This is a string'
with open('test.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerow([text])
This will create a csv file whose first row contains only one item, text, with no commas separating its characters.
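Putting both fixes together for the original script's writing step: open the file once before the loop, write the header once, and pass each row as a sequence. A sketch of just that part (the scraping is omitted; rows are assumed to be already parsed):

```python
import csv

def write_weather(rows, out_path):
    # rows: iterable of (date, time, wind) tuples
    with open(out_path, "w", newline="") as f:   # opened once, so nothing is overwritten
        writer = csv.writer(f)
        writer.writerow(("Date", "Time", "Wind"))  # header written once
        for date, time, wind in rows:
            writer.writerow((date, time, wind))    # a sequence -> one cell per value
```

Concatenating the values into a single string, as the original script does, is what split every character into its own column.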