Invalid literal because a symbol appears when reading a CSV file - csv

When I am using Replit I can remove the little symbol that appears when I drag and drop a CSV file in, so my main.py can read it; otherwise I get an 'invalid literal for int() with base 10' error. I am now trying to run this on my local machine with Sublime Text and I get the same error as it reads the file from the directory, so I assume this symbol is being added before the file is read. I can click on the CSV file in Replit and edit it, but I cannot do this in Sublime.
Can someone explain what this symbol is? How can I get it to read the basic comma-delimited numbers in the file (it is a game tile map)?
with open(f'level{level}_data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')

Saved it as comma-delimited CSV instead of UTF-8 comma-delimited CSV. It then imports without the 'question mark in a diamond' symbol. I understand this is an unrecognised special character, but I have nothing apart from integers in my table. Maybe someone could clarify that?...
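That symbol is almost certainly the UTF-8 byte order mark (BOM) that Excel writes at the start of files saved as "CSV UTF-8"; it gets attached to the first cell, so int() fails on a value like '\ufeff1'. A minimal sketch (the file name and the int conversion are assumptions for illustration) that handles this by opening the file with the utf-8-sig encoding, which strips a leading BOM if one is present:

import csv

# Minimal sketch: 'utf-8-sig' removes a leading BOM if there is one, so the
# first number parses cleanly; 'level1_data.csv' is a hypothetical file name.
with open('level1_data.csv', newline='', encoding='utf-8-sig') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    tile_map = [[int(tile) for tile in row] for row in reader]

print(tile_map)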

Related

How do I preserve the leading 0 of a number using Unoconv when converting from a .csv file to a .xls file?

I have a 3 column csv file. The 2nd column contains numbers with a leading zero. For example:
044934343
I need to convert a .csv file into a .xls and to do that I'm using the command line tool called 'unoconv'.
It's converting as expected, however when I load up the .xls in Excel, instead of showing '044934343' the cell shows '44934343' (the leading 0 has been removed).
I have tried surrounding the number in the .csv file with a single quote and a double quote however the leading 0 is still removed after conversion.
Is there a way to tell unoconv that a particular column should be of a TEXT type? I've tried to read the man page of unoconv, however the options are a little confusing.
Any help would be greatly appreciated.
Perhaps I came to the scene too late, but in case someone is looking for an answer to a similar question, this is how to do it:
unoconv -i FilterOptions=44,34,76,1,1/1/2/2/3/1 --format xls <csvFileName>
The key here is the "1/1/2/2/3/1" part, which tells unoconv that the second column's type should be "TEXT", leaving the first and third as "Standard".
You can find more info here: https://wiki.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options#Token_7.2C_csv_import
BTW this is my first post here...
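For reference, the FilterOptions tokens map to the CSV import settings of the underlying LibreOffice/OpenOffice filter: 44 is the ASCII code of the field separator (comma), 34 the text delimiter (double quote), 76 the character set (UTF-8), the next 1 is the first line to import, and the final token is a list of column-number/format pairs where 1 means Standard and 2 means Text. A minimal sketch (file names are hypothetical) of running the same conversion from Python:

import subprocess

# Minimal sketch, assuming unoconv is on PATH and the input is data.csv
# (hypothetical). FilterOptions: separator 44 (','), text delimiter 34 ('"'),
# character set 76 (UTF-8), import from line 1, and column/format pairs
# 1/1, 2/2, 3/1 -- format 2 (Text) preserves the leading zeros in column 2.
subprocess.run(
    ["unoconv", "-i", "FilterOptions=44,34,76,1,1/1/2/2/3/1",
     "--format", "xls", "data.csv"],
    check=True,
)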

Reading a .dat file in Julia, issues with variable delimiter spacing

I am having issues reading a .dat file into a dataframe. I think the issue is with the delimiter. I have included a screen shot of what the data in the file looks like below. My best guess is that it is tab delimited between columns and then new-line delimited between rows. I have tried reading in the data with the following commands:
df = CSV.File("FORCECHAIN00046.dat"; header=false) |> DataFrame!
df = CSV.File("FORCECHAIN00046.dat"; header=false, delim = ' ') |> DataFrame!
My result either way is just a DataFrame with only one column, with all the data from each column concatenated into one string. I even tried to specify the types with the following code:
df = CSV.File("FORCECHAIN00046.dat"; types=[Float64,Float64,Float64,Float64,
Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64]) |> DataFrame!
And I received the following error:
┌ Warning: 2; something went wrong trying to determine row positions for multithreading; it'd be very helpful if you could open an issue at https://github.com/JuliaData/CSV.jl/issues so package authors can investigate
I can work around this by uploading it into Google Sheets and then downloading a CSV, but I would like to find a way to make the original .dat file work.
Part of the issue here is that .dat is not a proper file format—it's just something that seems to be written out in a somewhat human-readable format with columns of numbers separated by variable numbers of spaces so that the numbers line up when you look at them in an editor. Google Sheets has a lot of clever tricks built in to "do what you want" for all kinds of ill-defined data files, so I'm not too surprised that it manages to parse this. The CSV package on the other hand supports using a single character as a delimiter or even a multi-character string, but not a variable number of spaces like this.
Possible solutions:
if the files aren't too big, you could easily roll your own parser that splits each line and then builds a matrix
you can also pre-process the file turning multiple spaces into single spaces
That's probably the easiest way to do this and here's some Julia code (untested since you didn't provide test data) that will open your file and convert it to a more reasonable format:
function dat2csv(dat_path::AbstractString, csv_path::AbstractString)
    open(csv_path, write=true) do io
        for line in eachline(dat_path)
            join(io, split(line), ',')
            println(io)
        end
    end
    return csv_path
end

function dat2csv(dat_path::AbstractString)
    base, ext = splitext(dat_path)
    ext == ".dat" ||
        throw(ArgumentError("file name doesn't end with `.dat`"))
    return dat2csv(dat_path, "$base.csv")
end
You would call this function as dat2csv("FORCECHAIN00046.dat") and it would create the file FORCECHAIN00046.csv, which would be a proper CSV file using commas as delimiters. That won't work well if the files contain any values with commas in them, but it looks like they are just numbers, in which case it should be fine. So you can use this function to convert the files to proper CSV and then load that file with the CSV package.
A little explanation of the code:
the two-argument dat2csv method opens csv_path for writing and then calls eachline on dat_path to read one line from it at a time
eachline strips any trailing newline from each line, so each line will be a bunch of numbers separated by whitespace with some leading and/or trailing whitespace
split(line) does the default splitting of line which splits it on whitespace, dropping any empty values—this leaves just the non-whitespace entries as strings in an array
join(io, split(line), ',') joins the strings in the array together, separated by the , character and writes that to the io write handle for csv_path
println(io) writes a newline after that—otherwise everything would just end up on a single very long line
the one-argument dat2csv method calls splitext to split the file name into a base name and an extension, checking that the extension is the expected .dat and calling the two-argument version with the .dat replaced by .csv
Try using the readdlm function in the DelimitedFiles standard library, and convert to a DataFrame afterwards:
using DelimitedFiles, DataFrames
df = DataFrame(readdlm("FORCECHAIN00046.dat"), :auto)

Reading csv without specifying enclosure characters in Weka

I have a dataset that I want to open in Weka, so I converted it to a CSV file. (The file contains some text including commas/apostrophes/quotation marks, while its separator is the pipe character.)
When I try to read this CSV file, in the options window I specify the pipe character (|) as my fieldSeparator, leave enclosureCharacters empty, and don't touch the rest of the options. This can be seen in the screenshot.
Then I get this error:
File not recognised as an 'CSV data files' file. Reason: Enclosures
can only be single characters.
It seems like Weka's CSV loader does not accept an empty enclosureCharacters field? What can I write into this field? I think my file does not have enclosures for its text data.

Remove all binary characters from a file

Occasionally, I have a hard time manipulating data in a CSV file because of the following error.
Binary file (standard input) matches
I researched several articles online but cannot seem to find one that helps me remove all of the binary characters or elements from a CSV file.
Unfortunately, I do not know where to start with this.
If I run the 'file' command on the file, I get the following output:
Little-endian UTF-16 Unicode text, with very long lines, with CRLF, CR line terminators
The second from last line in the file prints as:
"???? ?????, ???? ???",????,"?????, ????",???,,,,,,,,,,,,,,,,,,,,,,,,* Home,email#address.com,,
The second line in the file prints as:
,,,,,,,,,,,,,,,,,,,,,,,,,,,* ,email#address.com,,
This file contains too many lines to open in Excel or a GUI and do a "Save as..." to remove the binary elements that way.
Please help me. Thank you!
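Judging from the file output, the file is UTF-16 encoded, which is why grep reports "Binary file (standard input) matches" instead of treating it as text. A minimal sketch of one possible approach (the input and output file names are hypothetical): re-encode the file as UTF-8 and normalise the CR/CRLF line endings so standard text tools can work with it:

# Minimal sketch: re-encode a UTF-16 CSV (hypothetical name input.csv) as
# UTF-8 with normalised line endings, so grep/awk/sed see ordinary text.
with open("input.csv", "r", encoding="utf-16", newline=None) as src, \
     open("output.csv", "w", encoding="utf-8", newline="\n") as dst:
    for line in src:
        dst.write(line)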

Changing The Delimiter to CTRL+A in Python CSV Module

I'm trying to write a CSV file with the delimiter Ctrl+A. I'm eventually going to have to write the file to Hadoop and I'm unable to use a standard delimiter.
Currently I'm trying this:
writer = csv.writer(f, delimiter="\u0001")
for item in aList:
    writer.writerow(item)
f.close()
However, the output file doesn't appear to be written correctly when I open it in Excel...
Some rows are condensed into one block, while others will have one field in the first and then the rest condensed into the second block, etc.
Is the error where I'm setting up the writer object, or am I just not familiar with separating files this way?
You can try using the nonprinting "group separator" character, which can be represented in Python code as '\035'.
See http://www.asciitable.com/index/asciifull.gif for some other nonprinting characters if you need more.
It may be helpful to include more context about why you want to use a nonstandard delimiter, and whether Excel parsing of the file is actually necessary or just a quick check to see whether the file might be parsed properly by the target system, Hadoop.
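If Ctrl+A really is the required delimiter, csv.writer accepts it (any single-character string works as a delimiter); the rows just won't look right in Excel because Excel doesn't split on that character. A minimal sketch (the rows and file name are assumptions for illustration) that writes with Ctrl+A and then reads the result back to verify it instead of relying on Excel:

import csv

# Hypothetical sample rows; in the question these come from aList.
rows = [["1", "alice", "NY"], ["2", "bob", "LA"]]

# Write using Ctrl+A ("\x01", the same character as "\u0001") as the delimiter.
with open("out.txt", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\x01")
    for row in rows:
        writer.writerow(row)

# Excel won't split on "\x01", so check the file by reading it back with the
# same delimiter rather than opening it in Excel.
with open("out.txt", newline="") as f:
    for row in csv.reader(f, delimiter="\x01"):
        print(row)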