CSV::MalformedCSVError: Illegal quoting in line 1 with SmarterCSV - csv

I have an issue, when trying to process a csv file, using SmarterCSV.
The error I get is -
CSV::MalformedCSVError: Illegal quoting in line 1
This is where the code I use to process the csv file
SmarterCSV.process(file_path)
I have gone through similar questions. But no where I find a good fit that could help me.
I tried to resolve it using some options of SmarterCSV such as -
:remove_empty_values, :remove_empty_hashes etc. But in vain.
I welcome the suggestions or refactoring to make this work? Thanks all

This is due to illegal Unicode characters inside your file.
You can process file with Unicode characters with
f = File.open(file_path, "r:bom|utf-8"); data = SmarterCSV.process(f); f.close
here data will contain parsed data.
Also refer official documentation on this:https://github.com/tilo/smarter_csv#notes-about-file-encodings

Related

How do you fix the following error? I am trying to Data Table Import Wizard to load a csv file into Workbench

I am trying to upload a .csv file into Workbench using the Table Data Import Wizard.
I receive the following error whenever attempting to load it:
Unhandled exception: 'ascii' codec can't decode byte 0xc3 in position 1253: ordinal not in range(128)
I have tried previous solutions that suggested I encode the .csv file as a MS-DOS csv and as a UTF-8 csv. Neither have worked for me.
Attempting to change the data in the file would not be feasible since its made up of thousands of cells, so it would quite impractical. Is there anything that can be done to resolve this?
What was after the C3? What should have been there?
C3, when interpreted as "latin1" is à -- an unlikely character.
More likely is a 2-byte UTF-8 code that starts with C3. This includes the accented letters of Western European languages. Example é, hex C3A9.
You tried "UTF-8 csv" -- Please provide the specifics of how you tried it. What settings in the Wizard, etc.
Probably you should state that the data is "UTF-8" or utf8mb4, depending on whether you are referring to outside or inside MySQL.
Meanwhile, if you are loading the data into an existing "table", let's see SHOW CREATE TABLE. It should probably not say "ascii" anywhere; instead, it should probably say "utf8mb4".

Golang CSV read : extraneous " in field error

I am using a simple program to read CSV file, somehow I noticed when I created a CSV using EXCEL or windows based computer go library fails to read it. even when I use cat command it only shows me last line on the terminal. It always results in this error extraneous " in field.
I researched somewhat than I found it is somewhat related to carriage return differences between OS.
But I really want to ask how to make a generic csv reader. I tried reading the same csv using pandas and it was reading successfully. But i am not been able to achieve this using my Go code.
Also screen shot of correct csv Is here
Your file clearly shows that you've got an extra quote at the end of the content. While programs like pandas may be fine with that, I assume it's not valid csv so go does return an error.
Quick example of what's wrong with your data: https://play.golang.org/p/KBikSc1nzD
Update: After your update and a little bit of searching, I have to apoligize, the carriage return does matter and seems to be tha main culprit here, Go seems to be ok handling the \r\n windows variant but not the \r one. In that case what you can do is wrap the bytes.Reader into a custom reader that replaces the \r byte with the \n byte.
Here's an example: https://play.golang.org/p/vNjzwAHmtg
Please note, that the example is just that, an example, it's not handling all the possible cases where \r might be a legit byte.

Java.io.IOException: wrong number of values (WEKA CSV to ARFF)

Currently working on a Data Mining project using my own dataset I had found using Weka. The only issue is that taking my file from csv format and converting it into arff format is causing issues.
java.io.IOException: wrong number of values. Read 2, expected 5, Read Token[EOL], line 3
This is the error I am getting. I have browsed around online looking for similar issues and have tried removing all quotes and special characters that throw this exception. Every place I looked told me to remove special characters and I believe there are none left. The link to my dataset is here : https://docs.google.com/spreadsheets/d/1xqEe7MZE9SdKB_yvFSgWeSVYuDrq0b31Eu5oECNbGH0/edit#gid=1736568367&vpid=A1
This is the first three lines of my file where the first is the attribute names, file is separated by commas in note
Inequality Adjusted HPI Rank,Sub Region,Inequality Adjusted Life Expectancy,Inquality Adjusted Well being,Footprint
,Inequality adjusted HPI
1,1,73.1,6.9,2.5,48.2
2,6,65.17333333,5.487667631,1.390974448,45.97489063
If you open your file with a text editor, you will see that Footprint has quotes around it. Delete the quotes and you are good to go!
Weka is normally not that good in reading CSV files that include special characters, and ARFF files are normally easier to use. Therefore, in such cases, the easiest way is to convert your CSV file to an ARFF file using R ("RWeka" and "foreign" libraries can handle this conversion).
There is also another possibility. I was creating my CSV file and the header had a different number of elements compared to the rest of the data. So, check the header as well...!

When COPYing to Redshift, how to deal with special characters in a CSV?

I am using a COPY with ACCEPTINVCHARS to load a CSV into Amazon Redshift.
Unfortunately I get errors like
Missing newline: Unexpected character 0x69 found at location 129
However, if I try to use the ESCAPE option as well, I get the exception
CSV is not compatible with ESCAPE
What am I supposed to do in order to COPY this into Redshift? I'm fine if the chars get replaced with ? or whatever.
Ignore the header as the headers might not be of the same datatype as your fields.
Use IGNOREHEADER AS
Refer to the forum for more details,
https://forums.aws.amazon.com/thread.jspa?messageID=557452
For future generations, "CSV is not compatible with ESCAPE" is probably right but you don't actually need the CSV keyword to load CSV, so it's worth trying to remove the CSV keyword from your copy command.

how to use ascii character for quote in COPY in cqlsh

I am uploading data from a a big .csv file into Cassandra using copy in cqlsh.
I am using cassandra 1.2 and CQL 3.0.
However since " is part of my data I have to use some other character for uploading my data, I need to use any extended ASCII characters. I tried various approaches but fails.
The following works, but need to use an extended ascii characters for my purpose..
copy (<columnnames>) from <filename> where deleimiter='|' and quote = '"';
copy (<columnnames>) from <filename> where deleimiter='|' and quote = '~';
When I give quote='ß', I get the error below:
:"quotechar" must be an 1-character string
Pls advice on how I can use an extended ASCII character for quote parameter..
Thanks in advance
A note on the COPY documentation page suggests that for bulk loading (like in your case), the json2sstable utility should be used. You can then load the sstables to your cluster using sstableloader. So I suggest that you write a script/program to convert your CSV to JSON and use these tools for your big CSV. JSON will not have any problem handling all characters from ASCII table.
I had a similar problem, and inspected the source code of cqlsh (it's a python script). In my case, I was generating the csv with python, so it was a matter of finding the right python csv parameters.
Here's the key information from cqlsh:
csv_dialect_defaults = dict(delimiter=',', doublequote=False,
escapechar='\\', quotechar='"')
So if you are lucky enough to generate your .csv file from python, it's just a matter of using the csv module with:
writer = csv.writer(open("output.csv", 'w'), **csv_dialect_defaults)
Hope this helps, even if you are not using python.