Write a CSV based on another CSV file creating an additional empty row? [duplicate] - csv

import csv
with open('thefile.csv', 'rb') as f:
data = list(csv.reader(f))
import collections
counter = collections.defaultdict(int)
for row in data:
counter[row[10]] += 1
with open('/pythonwork/thefile_subset11.csv', 'w') as outfile:
writer = csv.writer(outfile)
for row in data:
if counter[row[10]] >= 504:
writer.writerow(row)
This code reads thefile.csv, makes changes, and writes results to thefile_subset1.
However, when I open the resulting csv in Microsoft Excel, there is an extra blank line after each record!
Is there a way to make it not put an extra blank line?

The csv.writer module directly controls line endings and writes \r\n into the file directly. In Python 3 the file must be opened in untranslated text mode with the parameters 'w', newline='' (empty string) or it will write \r\r\n on Windows, where the default text mode will translate each \n into \r\n.
#!python3
with open('/pythonwork/thefile_subset11.csv', 'w', newline='') as outfile:
writer = csv.writer(outfile)
In Python 2, use binary mode to open outfile with mode 'wb' instead of 'w' to prevent Windows newline translation. Python 2 also has problems with Unicode and requires other workarounds to write non-ASCII text. See the Python 2 link below and the UnicodeReader and UnicodeWriter examples at the end of the page if you have to deal with writing Unicode strings to CSVs on Python 2, or look into the 3rd party unicodecsv module:
#!python2
with open('/pythonwork/thefile_subset11.csv', 'wb') as outfile:
writer = csv.writer(outfile)
Documentation Links
https://docs.python.org/3/library/csv.html#csv.writer
https://docs.python.org/2/library/csv.html#csv.writer

Opening the file in binary mode "wb" will not work in Python 3+. Or rather, you'd have to convert your data to binary before writing it. That's just a hassle.
Instead, you should keep it in text mode, but override the newline as empty. Like so:
with open('/pythonwork/thefile_subset11.csv', 'w', newline='') as outfile:

Note: It seems this is not the preferred solution because of how the extra line was being added on a Windows system. As stated in the python document:
If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.
Windows is one such platform where that makes a difference. While changing the line terminator as I described below may have fixed the problem, the problem could be avoided altogether by opening the file in binary mode. One might say this solution is more "elegent". "Fiddling" with the line terminator would have likely resulted in unportable code between systems in this case, where opening a file in binary mode on a unix system results in no effect. ie. it results in cross system compatible code.
From Python Docs:
On Windows, 'b' appended to the mode
opens the file in binary mode, so
there are also modes like 'rb', 'wb',
and 'r+b'. Python on Windows makes a
distinction between text and binary
files; the end-of-line characters in
text files are automatically altered
slightly when data is read or written.
This behind-the-scenes modification to
file data is fine for ASCII text
files, but it’ll corrupt binary data
like that in JPEG or EXE files. Be
very careful to use binary mode when
reading and writing such files. On
Unix, it doesn’t hurt to append a 'b'
to the mode, so you can use it
platform-independently for all binary
files.
Original:
As part of optional paramaters for the csv.writer if you are getting extra blank lines you may have to change the lineterminator (info here). Example below adapated from the python page csv docs. Change it from '\n' to whatever it should be. As this is just a stab in the dark at the problem this may or may not work, but it's my best guess.
>>> import csv
>>> spamWriter = csv.writer(open('eggs.csv', 'w'), lineterminator='\n')
>>> spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
>>> spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

The simple answer is that csv files should always be opened in binary mode whether for input or output, as otherwise on Windows there are problems with the line ending. Specifically on output the csv module will write \r\n (the standard CSV row terminator) and then (in text mode) the runtime will replace the \n by \r\n (the Windows standard line terminator) giving a result of \r\r\n.
Fiddling with the lineterminator is NOT the solution.

A lot of the other answers have become out of date in the ten years since the original question. For Python3, the answer is right in the documentation:
If csvfile is a file object, it should be opened with newline=''
The footnote explains in more detail:
If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n linendings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.

Use the method defined below to write data to the CSV file.
open('outputFile.csv', 'a',newline='')
Just add an additional newline='' parameter inside the open method :
def writePhoneSpecsToCSV():
rowData=["field1", "field2"]
with open('outputFile.csv', 'a',newline='') as csv_file:
writer = csv.writer(csv_file)
writer.writerow(rowData)
This will write CSV rows without creating additional rows!

I'm writing this answer w.r.t. to python 3, as I've initially got the same problem.
I was supposed to get data from arduino using PySerial, and write them in a .csv file. Each reading in my case ended with '\r\n', so newline was always separating each line.
In my case, newline='' option didn't work. Because it showed some error like :
with open('op.csv', 'a',newline=' ') as csv_file:
ValueError: illegal newline value: ''
So it seemed that they don't accept omission of newline here.
Seeing one of the answers here only, I mentioned line terminator in the writer object, like,
writer = csv.writer(csv_file, delimiter=' ',lineterminator='\r')
and that worked for me for skipping the extra newlines.

with open(destPath+'\\'+csvXML, 'a+') as csvFile:
writer = csv.writer(csvFile, delimiter=';', lineterminator='\r')
writer.writerows(xmlList)
The "lineterminator='\r'" permit to pass to next row, without empty row between two.

Borrowing from this answer, it seems like the cleanest solution is to use io.TextIOWrapper. I managed to solve this problem for myself as follows:
from io import TextIOWrapper
...
with open(filename, 'wb') as csvfile, TextIOWrapper(csvfile, encoding='utf-8', newline='') as wrapper:
csvwriter = csv.writer(wrapper)
for data_row in data:
csvwriter.writerow(data_row)
The above answer is not compatible with Python 2. To have compatibility, I suppose one would simply need to wrap all the writing logic in an if block:
if sys.version_info < (3,):
# Python 2 way of handling CSVs
else:
# The above logic

I used writerow
def write_csv(writer, var1, var2, var3, var4):
"""
write four variables into a csv file
"""
writer.writerow([var1, var2, var3, var4])
numbers=set([1,2,3,4,5,6,7,2,4,6,8,10,12,14,16])
rules = list(permutations(numbers, 4))
#print(rules)
selection=[]
with open("count.csv", 'w',newline='') as csvfile:
writer = csv.writer(csvfile)
for rule in rules:
number1,number2,number3,number4=rule
if ((number1+number2+number3+number4)%5==0):
#print(rule)
selection.append(rule)
write_csv(writer,number1,number2,number3,number4)

When using Python 3 the empty lines can be avoid by using the codecs module. As stated in the documentation, files are opened in binary mode so no change of the newline kwarg is necessary. I was running into the same issue recently and that worked for me:
with codecs.open( csv_file, mode='w', encoding='utf-8') as out_csv:
csv_out_file = csv.DictWriter(out_csv)

Related

Invalid literal because symbol appears when reading a csv file

When I am using replit I can remove the little symbol that appears when I drag and drop in a csv file so my main.py can read it, otherwise I get invalid literal base 10 issue. I am trying to run this on local machine with sublime text and getting same error now as it is reading the file from the directory, so I assume it is adding this symbol in before reading.... I can click on the csv file in replit and edit, but cannot do this in sublime.
Can someone explain what this is for? HOw can I get it to read the basic comma delimited numbers in the file (It is a game tile map).
with open(f'level{level}_data.csv', newline= '') as csvfile:
reader = csv.reader(csvfile, delimiter=',')
Saved it is comma delimited csv instead of UTF-8 comma delimited csv. It then imports without the 'question mark in a diamon' symbol. I understand this is an unrecognised special character, but I have nothing apart from integers in my table. Maybe someone could clarify that?...

Reading a .dat file in Julia, issues with variable delimeter spacing

I am having issues reading a .dat file into a dataframe. I think the issue is with the delimiter. I have included a screen shot of what the data in the file looks like below. My best guess is that it is tab delimited between columns and then new-line delimited between rows. I have tried reading in the data with the following commands:
df = CSV.File("FORCECHAIN00046.dat"; header=false) |> DataFrame!
df = CSV.File("FORCECHAIN00046.dat"; header=false, delim = ' ') |> DataFrame!
My result either way is just a DataFrame with only one column including all the data frome each column concatenated into one string. I tried to even specify the types with the following code:
df = CSV.File("FORCECHAIN00046.dat"; types=[Float64,Float64,Float64,Float64,
Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64]) |> DataFrame!
And I received an the following error:
┌ Warning: 2; something went wrong trying to determine row positions for multithreading; it'd be very helpful if you could open an issue at https://github.com/JuliaData/CS
V.jl/issues so package authors can investigate
I can work around this by uploading it into google sheets and then downloading a csv, but I would like to find a way to make the original .dat file work.
Part of the issue here is that .dat is not a proper file format—it's just something that seems to be written out in a somewhat human-readable format with columns of numbers separated by variable numbers of spaces so that the numbers line up when you look at them in an editor. Google Sheets has a lot of clever tricks built in to "do what you want" for all kinds of ill-defined data files, so I'm not too surprised that it manages to parse this. The CSV package on the other hand supports using a single character as a delimiter or even a multi-character string, but not a variable number of spaces like this.
Possible solutions:
if the files aren't too big, you could easily roll your own parser that splits each line and then builds a matrix
you can also pre-process the file turning multiple spaces into single spaces
That's probably the easiest way to do this and here's some Julia code (untested since you didn't provide test data) that will open your file and convert it to a more reasonable format:
function dat2csv(dat_path::AbstractString, csv_path::AbstractString)
open(csv_path, write=true) do io
for line in eachline(dat_path)
join(io, split(line), ',')
println(io)
end
end
return csv_path
end
function dat2csv(dat_path::AbstractString)
base, ext = splitext(dat_path)
ext == ".dat" ||
throw(ArgumentError("file name doesn't end with `.dat`"))
return dat2csv(dat_path, "$base.csv")
end
You would call this function as dat2csv("FORCECHAIN00046.dat") and it would create the file FORCECHAIN00046.csv, which would be a proper CSV file using commas as delimiters. That won't work well if the files contain any values with commas in them, but it looks like they are just numbers, in which case it should be fine. So you can use this function to convert the files to proper CSV and then load that file with the CSV package.
A little explanation of the code:
the two-argument dat2csv method opens csv_path for writing and then calls eachline on dat_path to read one line form it at a time
eachline strips any trailing newline from each line, so each line will be bunch of numbers separated by whitespace with some leading and/or trailing whitespace
split(line) does the default splitting of line which splits it on whitespace, dropping any empty values—this leaves just the non-whitespace entries as strings in an array
join(io, split(line), ',') joins the strings in the array together, separated by the , character and writes that to the io write handle for csv_path
println(io) writes a newline after that—otherwise everything would just end up on a single very long line
the one-argument dat2csv method calls splitext to split the file name into a base name and an extension, checking that the extension is the expected .dat and calling the two-argument version with the .dat replaced by .csv
Try using the readdlm function in DelimitedFiles library, and convert to DataFrame afterwards:
using DelimitedFiles, DataFrames
df = DataFrame(readdlm("FORCECHAIN00046.dat"), :auto)

How to check encoding of a CSV file

I have a CSV file and I wish to understand its encoding. Is there a menu option in Microsoft Excel that can help me detect it
OR do I need to make use of programming languages like C# or PHP to deduce it.
You can use Notepad++ to evaluate a file's encoding without needing to write code. The evaluated encoding of the open file will display on the bottom bar, far right side. The encodings supported can be seen by going to Settings -> Preferences -> New Document/Default Directory and looking in the drop down.
In Linux systems, you can use file command. It will give the correct encoding
Sample:
file blah.csv
Output:
blah.csv: ISO-8859 text, with very long lines
If you use Python, just use a print() function to check the encoding of a csv file. For example:
with open('file_name.csv') as f:
print(f)
The output is something like this:
<_io.TextIOWrapper name='file_name.csv' mode='r' encoding='utf8'>
You can also use python chardet library
# install the chardet library
!pip install chardet
# import the chardet library
import chardet
# use the detect method to find the encoding
# 'rb' means read in the file as binary
with open("test.csv", 'rb') as file:
print(chardet.detect(file.read()))
Use chardet https://github.com/chardet/chardet (documentation is short and easy to read).
Install python, then pip install chardet, at last use the command line command.
I tested under GB2312 and it's pretty accurate. (Make sure you have at least a few characters, sample with only 1 character may fail easily).
file is not reliable as you can see.
Or you can execute in python console or in Jupyter Notebook:
import csv
data = open("file.csv","r")
data
You will see information about the data object like this:
<_io.TextIOWrapper name='arch.csv' mode='r' encoding='cp1250'>
As you can see it contains encoding infotmation.
CSV files have no headers indicating the encoding.
You can only guess by looking at:
the platform / application the file was created on
the bytes in the file
In 2021, emoticons are widely used, but many import tools fail to import them. The chardet library is often recommended in the answers above, but the lib does not handle emoticons well.
icecream = '🍦'
import csv
with open('test.csv', 'w') as f:
wf = csv.writer(f)
wf.writerow(['ice cream', icecream])
import chardet
with open('test.csv', 'rb') as f:
print(chardet.detect(f.read()))
{'encoding': 'Windows-1254', 'confidence': 0.3864823918622268, 'language': 'Turkish'}
This gives UnicodeDecodeError while trying to read the file with this encoding.
The default encoding on Mac is UTF-8. It's included explicitly here but that wasn't even necessary... but on Windows it might be.
with open('test.csv', 'r', encoding='utf-8') as f:
print(f.read())
ice cream,🍦
The file command also picked this up
file test.csv
test.csv: UTF-8 Unicode text, with CRLF line terminators
My advice in 2021, if the automatic detection goes wrong: try UTF-8 before resorting to chardet.
In Python, You can Try...
from encodings.aliases import aliases
alias_values = set(aliases.values())
for encoding in set(aliases.values()):
try:
df=pd.read_csv("test.csv", encoding=encoding)
print('successful', encoding)
except:
pass
As it is mentioned by #3724913 (Jitender Kumar) to use file command (it also works in WSL on Windows), I was able to get encoding information of a csv file by executing file --exclude encoding blah.csv using info available on man file as file blah.csv won't show the encoding info on my system.
import pandas as pd
import chardet
def read_csv(path: str, size: float = 0.10) -> pd.DataFrame:
"""
Reads a CSV file located at path and returns it as a Pandas DataFrame. If
nrows is provided, only the first nrows rows of the CSV file will be
read. Otherwise, all rows will be read.
Args:
path (str): The path to the CSV file.
size (float): The fraction of the file to be used for detecting the
encoding. Defaults to 0.10.
Returns:
pd.DataFrame: The CSV file as a Pandas DataFrame.
Raises:
UnicodeError: If the encoding of the file cannot be detected with the
initial size, the function will retry with a larger size (increased by
0.20) until the encoding can be detected or an error is raised.
"""
try:
byte_size = int(os.path.getsize(path) * size)
with open(path, "rb") as rawdata:
result = chardet.detect(rawdata.read(byte_size))
return pd.read_csv(path, encoding=result["encoding"])
except UnicodeError:
return read_csv(path=path, size=size + 0.20)
Hi, I just added a function to find the correct encoding and read the csv in the given file path. Thought it would be useful
Just add the encoding argument that matches the file you`re trying to upload.
open('example.csv', encoding='UTF8')

Python3 csv writer failing, exiting on error "TypeError: 'newline' is an invalid keyword argument for this function

What I'm trying to do:
I'm trying to change the formatting on a csv file from space delimited to comma delimited.
What I've done:
I can ingest the csv file just fine, and print the output row-by-row to the console. That code looks like this:
with open(txtpath, mode='r', newline='') as f:
fReader = csv.reader(f)
for rows in fReader:
print(rows)
This does exactly what it's supposed to, and spot checking the output confirms that the rows are being read correctly.
The Problem:
According to the official Python3 Documentation on csv.writer, "If csvfile is a file object, it should be opened with newline='' 1." My code looks like this:
with open(csvpath, 'w') as g:
gWriter = csv.writer(g, newline='')
gWriter.writerows(rows)
so all together, it looks like this:
with open(txtpath, mode='r', newline='') as f:
fReader = csv.reader(f)
for rows in fReader:
print(rows)
with open(csvpath, 'w') as g:
gWriter = csv.writer(g, newline='')
gWriter.writerows(rows)
However, when I run the code with both Pycharm (Anacondas 3.4 selected as project interpreter) and from the console with python3 mycode.py, both results tell me that newline "is an invalid keyword argument for this function" and references line 42, which is where my writer object is instantiated. I ran it through the debugger and it craps out as soon as I try to create the writer object. If I don't add the newline argument it asks for a dialect specification, so that doesn't work, either.
I'm sure there's something blindingly obvious that I'm missing, but I can't see it.
To avoid this error , open the file in 'wb' mode instead of 'w' mode. This will eliminate the need for newline.
Corrected code is as follows:
with open(csvpath, 'wb') as g:
gWriter = csv.writer(g)
gWriter.writerows(rows)
The wording is just a bit vague. The file should be opened with newline='', but newline is not a valid option for csv.writer() itself.
newline does not work in with open('output.csv', 'a',newline='') as fp. It will return back an error:
'newline' is an invalid keyword argument for this function
I used 'ab' method and it worked without blank lines between the lines:
with open('output.csv', 'ab',) as fp
I met the same problem and the solution may sound funny but it does work.
I actually changed the position of the newline argument to be in front of mode as below:
with open("database.csv", newline="", mode="a") as database2:
You can test and see the result.
The following will do
with open('filename_inCSV.csv',mode='r') as csvfile:
or
with open('filename_inCSV.csv','r') as csvfile:

Changing The Delimiter to CTRL+A in Python CSV Module

I'm trying to write a csv file with the delimiter ctrl+a. I'm going to have to eventually write the file to hadoop and I'm unable to use a standard delimiter.
Currently I'm trying this:
writer = csv.writer(f, delimiter = "\u0001")
for item in aList:
writer.writerow(item)
f.close()
However, the outputted excel file doesn't appear to be written correctly...
Some rows are condensed into one block, while others will have one field in the first and then the rest condensed into the second block, etc.
Is the error where I'm setting up the writer object, or am I just not familiar with separating files this way?
You can try using the nonprinting "group separator" character, which can be represented in python code as '\035'
see http://www.asciitable.com/index/asciifull.gif for some other nonprinting characters if you need more.
It may be helpful to include more context about why you want to use nonstandard delimiter. And whether Excel parsing of the file is necessary, or just a quick check to see if the file might be parsed properly by the target system, Hadoop.