prevent CRLF in CSV export data - csv

I have an export function that reads data from the DB (entire records) and writes it to a .txt file, one record per row, with the fields separated by ';'. The problem I am facing is that some fields contain CRLFs, and when I write them to the file the record continues on the next line, destroying the structure of the file.
The only solution I can think of is to replace the CRLFs with a custom value and replace them back with CRLF at import, but I don't like this solution because these files are huge and the replace operation hurts performance.
Do you have any other ideas?
Thank you!

Yes, use a CSV generator that quotes string values, for example Python's csv module.
Here is an example (ripped and modified from the csv docs):
import csv

def write(filename):
    spamWriter = csv.writer(open(filename, 'w'), quoting=csv.QUOTE_ALL)
    spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam\nbar'])

def read(filename):
    reader = csv.reader(open(filename, "rb"))
    for row in reader:
        print row

write('eggs.csv')
read('eggs.csv')
Outputs:
['Spam', 'Spam', 'Spam', 'Spam', 'Spam', 'Baked Beans']
['Spam', 'Lovely Spam', 'Wonderful Spam\r\nbar']
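Since the question's export uses ';' as the separator, here is a small Python 3 sketch of the same idea with that delimiter (the filename is just a placeholder). The csv module quotes any field that contains the delimiter, the quote character, or a newline, so embedded CRLFs survive the round trip:
import csv

rows = [['id1', 'a field with\r\nan embedded CRLF', 'last field'],
        ['id2', 'a plain field', 'another field']]

# Write ';'-separated records; fields containing CRLFs are quoted automatically.
with open('export.txt', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=';', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)

# Read them back; the embedded CRLF comes back intact inside its field.
with open('export.txt', newline='') as f:
    for row in csv.reader(f, delimiter=';'):
        print(row)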

If you have control over how the file is exported and imported, then you might want to consider using XML. Also, I believe you can use double quotes to indicate literals like "," in the values.

Related

Replacing multiple values in CSV

I have a directory full of CSVs. A script I use loads each CSV via a loop and corrects commonly known errors in several columns prior to importing it into an SQL database. The corrections I want to apply are stored in a JSON file so that a user can freely add/remove any corrections on the fly without altering the main script.
My script works fine for one value correction, per column, per CSV. However, I have noticed that two or more columns per CSV now contain additional errors, and more than one correction per column is now required.
Here is the relevant code:
import glob as gl
import json
import re

import pandas as pd

with open('lookup.json') as f:
    translation_table = json.load(f)

for filename in gl.glob("(Compacted)_*.csv"):
    df = pd.read_csv(filename, dtype=object)
    # ... some other enrichment ...
    # Extract the file "key" with a regular expression (regex)
    filekey = re.match(r"^\(Compacted\)_([A-Z0-9-]+_[0-9A-z]+)_[0-9]{8}_[0-9]{6}.csv$", filename).group(1)
    # Use the translation table to apply any error fixes
    if filekey in translation_table["error_lookup"]:
        tablename = translation_table["error_lookup"][filekey]
        df[tablename[0]] = df[tablename[0]].replace({tablename[1]: tablename[2]})
    else:
        pass
And here is the lookup.json file:
{
    "error_lookup": {
        "T7000_08": ["MODCT", "C00", -5555],
        "T7000_17": ["MODCT", "C00", -5555],
        "T7000_20": ["CLLM5", "--", -5555],
        "T700_13": ["CODE", "100T", -5555]
    }
}
For example, if a CSV whose key is "T7000_20" has a new erroneous value of ";;" in column CLLM5, how can I ensure that values containing "--" and ";;" are both replaced with "-5555"? And how do I account for another column in the same CSV?
Can you change the JSON file? The example below would edit colA (old1 → new1 and old2 → new2) and would make similar changes to colB:
{
    "error_lookup": {
        "T7000_20": {
            "colA": ["old1", "new1", "old2", "new2"],
            "colB": ["old3", "new3", "old4", "new4"]
        }
    }
}
The JSON parsing gets a bit more complex, in order to handle both the current use case and the new requirements.
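A minimal sketch of how the loop could consume that nested layout (the helper name is hypothetical; it assumes the colA/colB structure above and the df, translation_table and filekey names from the question):
import pandas as pd

def apply_corrections(df: pd.DataFrame, corrections: dict) -> pd.DataFrame:
    # corrections looks like {"colA": ["old1", "new1", "old2", "new2"], ...}
    for column, pairs in corrections.items():
        # Turn the flat [old, new, old, new, ...] list into an {old: new} mapping
        mapping = dict(zip(pairs[0::2], pairs[1::2]))
        df[column] = df[column].replace(mapping)
    return df

# Inside the existing loop, after filekey has been extracted:
# if filekey in translation_table["error_lookup"]:
#     df = apply_corrections(df, translation_table["error_lookup"][filekey])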

CSV import (neo4j browser) returning empty nodes only i.e. without properties

I am unable to successfully import a csv file in the neo4j browser, as the nodes are created but they do not show the properties. Does anyone see the problem? I will describe how I proceeded:
This is how the csv file looks
I have tested the csv file with this query:
LOAD CSV WITH HEADERS FROM "file:///testCSV3.csv" AS line
WITH line LIMIT 4
RETURN line
and the result is ok (I guess?):
I then tried various things, as e.g. this query:
LOAD CSV WITH HEADERS FROM "file:///testCSV3.csv" AS line
CREATE (:Activity {activityName: line.MyActivity, time: toInteger(line.Timestamp)})
The outcome is nodes without properties:
Any ideas what I am missing? Why are the properties activityName and time not showing up? - Thanks in advance!
(You should have shown your raw CSV file, to make the problem clearer.)
I assume your raw file starts out like this:
ID ;Timestamp;MyActivity
1;1;Run
2;2;Talk
3;3;Eat
LOAD CSV is sensitive to extra spaces, so your ID header should not be followed by a space. Also, the default field terminator is a comma, not a semicolon, so you need to specify the FIELDTERMINATOR option to override the default.
Your results would be more reasonable if you removed the extra space and changed your query to this:
LOAD CSV WITH HEADERS FROM "file:///testCSV3.csv" AS line FIELDTERMINATOR ';'
WITH line LIMIT 4
RETURN line

Importing PIPE delimited format txt into MySQL via PHPMyAdmin

I am importing some thousands of lines of data from a .txt file containing two columns, and the format is as follows:
A8041550408#=86^:|blablablablablablablablablablablablablablablablablablablabla1
blablablablablablablablablablablablablablablablablablablabla2
blablablablablablablablablablablablablablablablablablablabla3
A8041550408#=86^:|blablablablablablablablablablablablablablablablablablablabla1
blablablablablablablablablablablablablablablablablablablabla2
A8041550408#=86^:|blablablablablablablablablablablablablablablablablablablabla1
blablablablablablablablablablablablablablablablablablablabla2
blablablablablablablablablablablablablablablablablablablabla3
blablablablablablablablablablablablablablablablablablablabla4
etc....
What I have done so far is create a table with the two fields, but when I try to import the .txt file as a CSV with "Columns separated by: |", I get an error:
"Invalid column count in CSV input on line 2."
Which is quite obvious since the second line of the .txt file is empty.
Moreover, I have tried importing the file as a CSV using LOAD DATA, and that didn't work either; it just filled the table with random words and phrases from the .txt file.
So my question is: how can I import the data from this file?
You have to fix your file; in its current state you cannot expect the import module to be able to understand it. The first step would be to remove the empty lines: How to remove blank lines from a Unix file
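If a shell one-liner is not an option, here is a minimal Python sketch for the same cleanup (input and output filenames are placeholders), assuming the blank lines are the only problem:
# Copy the file, skipping lines that are empty or whitespace-only,
# so every remaining line is a complete record.
with open('data.txt', encoding='utf-8') as src, \
        open('data_clean.txt', 'w', encoding='utf-8') as dst:
    for line in src:
        if line.strip():
            dst.write(line)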

Python 3: write string list to csv file

I have found several answers (encoding, decoding...) online, but I still don't get what to do.
I have a list called abc.
abc = ['sentence1','-1','sentence2','1','sentence3','0'...]
Now I would like to store this list in a CSV file, the following way:
sentence1, -1
sentence2, 1
sentence3, 0
I know that the format of my abc list probably isn't how it should be to achieve this. I guess it should be a list of lists? But the major problem is that I have no clue how to write this to a CSV file using Python 3. The only time it kind of worked, every character ended up separated by a comma.
Does anybody know how to solve this? Thank you!
You can use zip to pair each sentence with its number and then write the pairs to a csv:
abc = ['sentence1', '-1', 'sentence2', '1', 'sentence3', '0']
new = list(zip(abc[0::2], abc[1::2]))  # [('sentence1', '-1'), ('sentence2', '1'), ('sentence3', '0')]
import csv
with open('test.csv', 'w', newline='') as fp:
    a = csv.writer(fp, delimiter=',')
    a.writerows(new)
result:
sentence1,-1
sentence2,1
sentence3,0
See the Python documentation on working with files. A CSV is basically the same thing as a txt file; the difference is that you use commas to separate the columns and newlines to separate the rows.
In your example you could do this (or iterate over a loop):
formated_to_csv = abc[0]+','+abc[1]+','+abc[2]+','+abc[3]...
The value of formated_to_csv would be 'sentence1,-1,sentence2,1,sentence3,0'. Note that this is a single string, so it will generate a single row; then write formated_to_csv as text to the csv file:
f.write(formated_to_csv)
To put all the sentences in the first column and all the numbers in the second column, it would be better to have a list of lists:
abc = [['sentence1','-1'],['sentence2','1'],['sentence3','0']...]
for row in abc:
    f.write(row[0] + ',' + row[1] + '\n')
The "conversion" to a table will be done by Excel, Calc, or whatever program you use to read spreadsheets.

Importing CSV file in Talend - how to set options to match Excel

I have a CSV file that I can open in Excel 2012 and it comes in perfectly. When I try to set up the metadata for this CSV file in Talend, the fields (columns) do not split the same way as Excel splits them. I suspect I am not setting the metadata properly.
The specific issue is that I have a column with string data in it which may contain commas within the string. For example suppose I have a CSV file with three columns: ID, Name and Age which looks like this:
ID,Name,Age
1,Ralph,34
2,Sue,14
3,"Smith, John", 42
When Excel reads this CSV file it treats the second element of the third row ("Smith, John") as a single token and places it into a cell by itself.
In Talend it tries to break this same token into two, since there is a comma within the token. Apparently Excel ignores all delimiters within a quoted string, while Talend by default does not.
My question is: how do I get Talend to behave the same as Excel?
If you use the tFileInputDelimited component to read this CSV file, set the field delimiter to "," and, under the component's CSV options, enable the Text Enclosure option with " as the enclosure character. Even if you use metadata, there is an option to define the string/text enclosure; set it to " there to resolve your problem.