Python 3: write string list to csv file - csv

I have found several answers (encoding, decoding...) online, but I still don't get what to do.
I have a list called abc.
abc = ['sentence1','-1','sentence2','1','sentence3','0'...]
Now I would like to store this list in a CSV file, the following way:
sentence1, -1
sentence2, 1
sentence3, 0
I know that the format of my abc list probably isn't how it should be to achieve this. I guess it should be a list of lists? But the major problem is actually that I have no clue how to write this to a CSV file using Python 3. The only time it kind of worked, every character turned out to be separated by a comma.
Does anybody know how to solve this? Thank you!

You can use zip to pair each sentence with the value that follows it, and then write the pairs to a csv:
import csv

abc = ['sentence1', '-1', 'sentence2', '1', 'sentence3', '0']
rows = zip(abc[::2], abc[1::2])  # pair each sentence with its value

with open('test.csv', 'w', newline='') as fp:
    writer = csv.writer(fp, delimiter=',')
    writer.writerows(rows)
result:
sentence1,-1
sentence2,1
sentence3,0

Here is the documentation for working with files; a CSV is basically the same thing as a txt file, the difference being that you use commas to separate the columns and new lines to separate the rows.
In your example you could do this (or build the string in a loop):
formatted_to_csv = abc[0]+','+abc[1]+','+abc[2]+','+abc[3]...
The value of formatted_to_csv would be 'sentence1,-1,sentence2,1,sentence3,0'. Note that this is a single string, so it will generate a single row. Then write formatted_to_csv as text to the csv file:
f.write(formatted_to_csv)
To put all the sentences in the first column and all the numbers in the second column, it would be better to have a list of lists:
abc = [['sentence1','-1'],['sentence2','1'],['sentence3','0']...]
for row in abc:
    f.write(row[0]+','+row[1]+'\n')  # the '\n' ends the row
The "conversion" to a table will be done by Excel, Calc, or whatever program you use to read spreadsheets.
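As a sketch of the same idea, the csv module can write that list of lists directly and handles the commas and line endings for you (the file name abc.csv is just an example):

```python
import csv

abc = [['sentence1', '-1'], ['sentence2', '1'], ['sentence3', '0']]

# writerows emits one CSV row per inner list
with open('abc.csv', 'w', newline='') as f:
    csv.writer(f).writerows(abc)
```

This produces the same sentence1,-1 layout as the manual f.write loop, with the added benefit that fields containing commas or newlines are quoted automatically.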

Related

Replacing multiple values in CSV

I have a directory full of CSVs. A script I use loads each CSV via a Loop and corrects commonly known errors in several columns prior to being imported into an SQL database. The corrections I want to apply are stored in a JSON file so that a user can freely add/remove any corrections on-the-fly without altering the main script.
My script works fine for one value correction per column per CSV. However, I have noticed that two or more columns per CSV now contain additional errors, and more than one correction per column is now required.
Here is relevant code:
with open('lookup.json') as f:
    translation_table = json.load(f)

for filename in gl.glob("(Compacted)_*.csv"):
    df = pd.read_csv(filename, dtype=object)
    #... Some other enrichment...
    # Extract the file "key" with a regular expression (regex)
    filekey = re.match(r"^\(Compacted\)_([A-Z0-9-]+_[0-9A-z]+)_[0-9]{8}_[0-9]{6}.csv$", filename).group(1)
    # Use the translation table to apply any error fixes
    if filekey in translation_table["error_lookup"]:
        tablename = translation_table["error_lookup"][filekey]
        df[tablename[0]] = df[tablename[0]].replace({tablename[1]: tablename[2]})
And here is the lookup.json file:
{
  "error_lookup": {
    "T7000_08": ["MODCT", "C00", -5555],
    "T7000_17": ["MODCT", "C00", -5555],
    "T7000_20": ["CLLM5", "--", -5555],
    "T700_13": ["CODE", "100T", -5555]
  }
}
For example if a column (in a CSV that includes the key "T7000_20") has a new erroneous value of ";;" in column CLLM5, how can I ensure that values that contain "--" and ";;" are replaced with "-5555"? How do I account for another column in the same CSV too?
Can you change the JSON file? The example below would edit column colA (old1 → new1 and old2 → new2) and would make similar changes to column colB:
{"error_lookup": {"T7000_20": {"colA": ["old1", "new1", "old2", "new2"],
                               "colB": ["old3", "new3", "old4", "new4"]}}}
The JSON parsing gets a little more complex in order to handle both the current use case and the new requirements.
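As a sketch of what that adjusted parsing could look like (the apply_corrections helper and the sample values are illustrative, not from the original script):

```python
import pandas as pd

# nested format: file key -> column name -> flat [old1, new1, old2, new2, ...] list
translation_table = {
    "error_lookup": {
        "T7000_20": {"CLLM5": ["--", "-5555", ";;", "-5555"]}
    }
}

def apply_corrections(df, filekey, table=translation_table):
    """Apply every old -> new pair to every listed column for this file key."""
    for column, pairs in table["error_lookup"].get(filekey, {}).items():
        # walk the flat list two items at a time to build an old -> new mapping
        mapping = dict(zip(pairs[::2], pairs[1::2]))
        df[column] = df[column].replace(mapping)
    return df

df = pd.DataFrame({"CLLM5": ["--", ";;", "7"]})
df = apply_corrections(df, "T7000_20")
```

This keeps the existing per-file-key lookup intact while allowing any number of columns and any number of corrections per column.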

Reading csv files into table in GAMS

I am having trouble reading CSV files into table. Suppose I have csv file looking like:
;c1;c1;c3;c4
r1;(some numeric values separated by ";")
r2;(some numeric values separated by ";")
I have tried rewriting the csv into a .inc file, replacing ';' with spaces, and then doing something like this:
set col /c1 * c4/
set row /r1 * r2/
table(row, col)
$include("myfile.inc")
;
But this doesn't work because my columns are not aligned, and I can't align them manually because I have more than 500 columns.
My problem could be solved by:
finding a way to define table entries without the text being aligned
finding a way to read the csv directly into GAMS
What do you suggest I do?
There are two things that could help you:
Setting $onDelim (https://www.gams.com/latest/docs/UG_DollarControlOptions.html#DOLLARonoffdelim)
Using csv2gdx (https://www.gams.com/latest/docs/T_CSV2GDX.html)

How to remove duplicates from a csv file without sorting

I have a file with three columns: first-name, last-name, email
I want to remove duplicate rows without sorting the file. The file is in csv format.
How I can do that? Please help.
You can use Python to remove the duplicates from a csv file. There are multiple ways to do this in Python; e.g. you can use a set as shown below:
import fileinput

seen = set()
for line in fileinput.input('<name of your csv file>', inplace=True):
    if line not in seen:   # keep only the first occurrence of each line
        seen.add(line)
        print(line, end='')
This will remove the duplicates from your original file in place. Let me know if you are looking for something else.
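If the duplicates should instead be detected on a single column (say, the email), a small variation on the same set idea could work; dedupe_csv, the key index, and the file names below are illustrative:

```python
import csv

def dedupe_csv(src, dst, key_index=2):
    """Copy src to dst, keeping only the first row seen for each key column value."""
    seen = set()
    with open(src, newline='') as fin, open(dst, 'w', newline='') as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            if row[key_index] not in seen:
                seen.add(row[key_index])
                writer.writerow(row)
```

The rows keep their original order, so no sorting is needed.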

How to process multivariate time series given as multiline, multirow *.csv files with Apache Pig?

I need to process multivariate time series given as multiline, multirow *.csv files with Apache Pig. I am trying to use a custom UDF (EvalFunc) to solve my problem. However, every Loader I have tried (except org.apache.pig.impl.io.ReadToEndLoader, which I cannot get to work) loads the data in my csv files and passes it to the UDF as one record per line. What I need, however, is one whole column (or the content of the complete file), so that I can process a complete time series. Processing a single value is obviously useless, because I need longer sequences of values...
The data in the csv-files looks like this (30 columns, 1st is a datetime, all others are double values, here 3 sample lines):
17.06.2013 00:00:00;427;-13.793273;2.885583;-0.074701;209.790688;233.118828;1.411723;329.099170;331.554919;0.077026;0.485670;0.691253;2.847106;297.912382;50.000000;0.000000;0.012599;1.161726;0.023110;0.952259;0.024673;2.304819;0.027350;0.671688;0.025068;0.091313;0.026113;0.271128;0.032320;0
17.06.2013 00:00:01;430;-13.879651;3.137179;-0.067678;209.796500;233.141233;1.411920;329.176863;330.910693;0.071084;0.365037;0.564816;2.837506;293.418550;50.000000;0.000000;0.014108;1.159334;0.020250;0.954318;0.022934;2.294808;0.028274;0.668540;0.020850;0.093157;0.027120;0.265855;0.033370;0
17.06.2013 00:00:02;451;-15.080651;3.397742;-0.078467;209.781511;233.117081;1.410744;328.868437;330.494671;0.076037;0.358719;0.544694;2.841955;288.345883;50.000000;0.000000;0.017203;1.158976;0.022345;0.959076;0.018688;2.298611;0.027253;0.665095;0.025332;0.099996;0.023892;0.271983;0.024882;0
Has anyone an idea how I could process this as 29 time series?
Thanks in advance!
What do you want to achieve?
If you want to read all rows in all files as a single record, this can work:
a = LOAD '...' USING PigStorage(';') as <schema> ;
b = GROUP a ALL;
b will contain all the rows in a bag.
If you want to read each CSV file as a single record, this can work:
a = LOAD '...' USING PigStorage(';','tagsource') as <schema> ;
b = GROUP a BY $0; --$0 is the filename
b will contain all the rows per file in a bag.

prevent CRLF in CSV export data

I have an export functionality that reads data from a DB (entire records) and writes them to a .txt file, one record per row, with fields separated by ';'. The problem I am facing is that some fields contain CRLFs, and when I write them to the file the record spills onto the next line, destroying the structure of the file.
The only solution I have found is to replace the CRLFs with a custom value and replace them back at import. But I don't like this solution, because these files are huge and the replace operation decreases performance...
Do you have any other ideas?
thank you!
Yes, use a CSV generator that quotes string values. For example, Python's csv module.
For example (adapted from the csv docs):
import csv

def write(filename):
    with open(filename, 'w', newline='') as f:
        spamWriter = csv.writer(f, quoting=csv.QUOTE_ALL)
        spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
        spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam\nbar'])

def read(filename):
    with open(filename, newline='') as f:
        for row in csv.reader(f):
            print(row)

write('eggs.csv')
read('eggs.csv')
Outputs:
['Spam', 'Spam', 'Spam', 'Spam', 'Spam', 'Baked Beans']
['Spam', 'Lovely Spam', 'Wonderful Spam\nbar']
If you have control over how the file is exported and imported, then you might want to consider using XML. I believe you can also use double quotes to mark literal values, such as a ',' inside a field.
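To illustrate the quoting point: even with the default QUOTE_MINIMAL setting, Python's csv module quotes any field that contains the delimiter or a newline, so embedded line breaks round-trip safely:

```python
import csv
import io

buf = io.StringIO()
row = ['a,b', 'line1\nline2', 'plain']
csv.writer(buf).writerow(row)  # only the first two fields get quoted

# reading it back recovers the original fields, embedded newline included
rows = list(csv.reader(io.StringIO(buf.getvalue())))
```

So no custom placeholder value is needed; the quoting already preserves the structure of the file.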