Removing a bunch of commas from the end of all lines in a CSV

I need to turn a CSV into an ARFF file, but when I try to do it through the ArffViewer from Weka I get the following error:
"java.io.IOException: wrong number of values. Read 5, expected 6, read Token[EOL], line 2 encountered line: 2"
I've investigated this and found that there is a comma at the end of each line in my CSV. The problem is that it is not just one comma: there is a run of commas, not the same number on every line, and the file has 10,000 lines. What can I do?
Example CSV lines:
chicken,tropical fruit,domestic eggs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
pot plants,domestic eggs,diapers,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
specialty bar,white bread,diapers,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Examples of other trailing comma runs:
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,

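Try this in a vi editor: :%s/,,//g (search for ,, and replace it with nothing). Note that this deletes commas in pairs, so a line ending in an odd number of commas keeps one, and a ,, in the middle of a line (an empty field) is deleted too, merging its neighbours. Anchoring the pattern to the end of the line, :%s/,,*$//, removes only the trailing run.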
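For a file of this size a short script is another option. The following is a minimal Python sketch, not part of the original answer; the script name and the input/output paths are placeholders. It removes every comma at the end of each line and leaves commas elsewhere in the line alone.

```python
import sys


def strip_trailing_commas(line: str) -> str:
    """Remove the run of commas at the end of a line, keeping its line ending."""
    body = line.rstrip("\r\n")
    line_ending = line[len(body):]
    return body.rstrip(",") + line_ending


def clean_file(src: str, dst: str) -> None:
    # newline="" keeps the original \n or \r\n endings untouched.
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        for line in fin:
            fout.write(strip_trailing_commas(line))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python strip_commas.py input.csv output.csv
    clean_file(sys.argv[1], sys.argv[2])
```

Be aware that this also drops legitimately empty fields at the end of a row, so rows can end up with different numbers of values; Weka's "wrong number of values" check may then still fail on those rows, and they would need to be fixed by hand.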

Related

Uploading a CSV into Solr with SimplePostTool

I want to upload my .csv file into my Solr core using SimplePostTool, but there is a problem. Excel creates my .csv file with semicolons (;) as separators, so I replace each ; with a comma (,). But some of the data in my .csv file already contains commas, so when I try to upload the file, I get an error.
The fields are separated by 17 commas, but some of the field values contain commas too. I've been dealing with this for a long time without making any progress.
You can configure the separator with the separator argument when uploading CSV files.
&separator=%3B
(%3B is the URL encoded version of ;)
You can pass extra parameters to the bin/post command by adding -params "separator=%3B".
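To see where %3B comes from, here is a small Python sketch that builds the query string with the standard library's urlencode. The host, port and core name (localhost:8983, mycore) are placeholders, and commit=true is an extra parameter included only for illustration; the sketch only builds the URL and does not upload anything. With the separator set to ;, the file Excel produced can be posted without replacing its semicolons.

```python
from urllib.parse import urlencode


def solr_update_url(base_url: str, params: dict) -> str:
    """Append URL-encoded request parameters to a Solr update URL."""
    return base_url + "?" + urlencode(params)


# Placeholder core URL; urlencode turns ";" into "%3B".
url = solr_update_url(
    "http://localhost:8983/solr/mycore/update",
    {"separator": ";", "commit": "true"},
)
print(url)  # http://localhost:8983/solr/mycore/update?separator=%3B&commit=true
```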

I can't escape commas using Weka Explorer

I have a .csv file which contains sentences like this:
hi my name is Lorenzo, i want to solve this problem.
As you can see, there is a comma inside the sentence. I've tried writing it as
"," \, "\," ',' "'","'"
None of these worked.
The error Weka raises when I keep the comma in the sentence is:
wrong number of values. Read 2, expected 1
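In standard CSV a comma is escaped by wrapping the whole field in double quotes (and doubling any double quote inside it), not by quoting the comma alone or using a backslash. A minimal Python sketch with the csv module shows what such a line looks like; it assumes Weka's CSV loader accepts standard double-quoted fields.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one row using standard CSV quoting rules."""
    buffer = io.StringIO()
    # The default QUOTE_MINIMAL wraps a field in double quotes only when it
    # contains the delimiter, the quote character or a line break.
    csv.writer(buffer, lineterminator="\n").writerow(fields)
    return buffer.getvalue()


print(to_csv_line(["hi my name is Lorenzo, i want to solve this problem."]), end="")
# "hi my name is Lorenzo, i want to solve this problem."
```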

How to bulk load into Cassandra other than with the COPY method?

I am using the COPY method to copy .csv files into Cassandra tables,
but I'm getting a record mismatch error:
Record 41(Line 41) has mismatched number of records (85 instead of 82)
This happens for all of the .csv files, and all of them are system-generated.
Is there any workaround for this error?
Based on your error message, it sounds like the COPY command is working for you until record 41. What are you using as a delimiter? The default delimiter for the COPY command is a comma, and I'll bet that your data has some additional commas in it on line 41.
A few options:
Edit your data and remove the extra commas.
Alter your .csv file to enclose the values of all of your fields in double quotes, since COPY's default QUOTE character is ". This lets you keep the commas inside the text.
Alter your .csv file to delimit with pipes | instead of a comma, and set the COPY command's DELIMITER option to |.
Try using either the Cassandra bulk loader or json2sstable utility to import your data. I've never used them, but I would bet you'll have similar problems if you have commas in your data set.
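Before choosing one of these options, it helps to know which records are affected. The following is a minimal Python sketch, not a Cassandra tool: it reads the file with the csv module (so commas inside double-quoted fields are not counted as separators) and lists every record whose field count differs from the expected number of columns, such as the 82 in the error message. Record numbers match line numbers only when no quoted field spans several lines; the script name is hypothetical.

```python
import csv
import io
import sys


def find_mismatched_rows(text, expected, delimiter=","):
    """Return (record number, field count) for each record whose width differs from expected."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [
        (number, len(row))
        for number, row in enumerate(reader, start=1)
        if len(row) != expected
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical): python check_widths.py data.csv 82
    with open(sys.argv[1], newline="") as f:
        for number, count in find_mismatched_rows(f.read(), int(sys.argv[2])):
            print(f"record {number}: {count} fields")
```

The delimiter argument lets the same check run on a pipe-delimited file after switching the COPY DELIMITER.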

Wrong number of values when importing csv in Weka

I want to open a CSV file (saved from OpenOffice Calc) in Weka.
I keep getting an error: "wrong number of values. 140 read, 139 expected on line 3."
The CSV was already fixed with quotes around the labels, and I count 140 values on the first lines.
What is wrong here?
It turned out there was a value somewhere far out of sight in the Excel file I was exporting.
I noticed it because all the rows ended with a comma instead of nothing.
I carefully selected only the right range, copied it into a new document, and it worked.
Hope this helps somebody else as well.
I had the same error, and I found a solution:
just remove all double quotes and single quotes from the .csv or .xls file.
For example, if a value under the Name column is "john", Weka throws an error; change it to john by removing the quotes.
To remove all the quotes, open the Find and Replace box in Excel:
Find what - "
Replace with - (empty space)
I also ran into the same problem when importing a CSV file into Weka.
The problem was incorrect formatting in the file.
One of the columns contained the word GOV'T; I removed the ' and wrote out the whole word GOVERNMENT, and it worked.
Hope this helps!
I had the same error. The problem was a single quote character in a string value, and the solution for me was to enclose the whole string value in double quotes.
So I had to convert
this: ...,Uncharted 3: Drake's Deception,...
to this: ...,"Uncharted 3: Drake's Deception",...
using Weka 3.8.0.
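Doing that conversion by hand is tedious for a large file. A minimal Python sketch, assuming the values themselves contain no line breaks: it parses one line with the csv module and re-emits it, wrapping any field that contains a single quote, a double quote or a comma in double quotes. The surrounding column values in the example (1 and 2011) are made up for illustration.

```python
import csv


def quote_apostrophe_fields(line):
    """Re-emit one CSV line, wrapping fields that contain ', " or , in double quotes."""
    fields = next(csv.reader([line]), [])
    out = []
    for field in fields:
        if any(ch in field for ch in "'\","):
            # Double any embedded double quotes, as standard CSV requires.
            field = '"' + field.replace('"', '""') + '"'
        out.append(field)
    return ",".join(out)


print(quote_apostrophe_fields("1,Uncharted 3: Drake's Deception,2011"))
# 1,"Uncharted 3: Drake's Deception",2011
```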
This is caused by an extra column being added. To get rid of the error, select that entire column and delete it.
That should work fine. :)
I also encountered that error. My CSV file contains floating-point numbers written with a comma as the decimal separator, and I solved the problem by replacing "," with ".".
For me, all of the above worked: I replaced " ' , with a space.
I had the same error before. I fixed my .xls files so they had no blank rows. Sometimes Weka loaded too many commas, but once I cleared the blank rows, Weka worked.
If you copied data from another file using Ctrl+A, Ctrl+C and Ctrl+V, you copied extra columns. If you open the CSV file in Notepad, you will see a comma at the end of each row; that trailing comma is what causes this error.
To avoid it, hold Ctrl, select the columns one by one, press Ctrl+C, and paste them into the new file that you will use in Weka.
Alternatively, use another method that avoids the comma at the end of each row.
I encountered the same problem.
Replacing/erasing all " and ' with a space worked for me!

Loading a comma-separated .csv file into a database

I was trying to load a .csv file into my database. It is a comma-delimited file, and one of the columns contains a comma (,) inside the data, for example Texas,Houston. Can someone help me get rid of the comma in between? The package I created treats the value after the comma as a new column, but it should not. I get the error in the Flat File Source itself. I thought of using a Derived Column, but the package fails at the source.
Some "comma"-delimited files write a string as ,"something or other", and use ,numeric_value, only when the value is a number. If your file is like this, you can preprocess it: change ," to some (otherwise) rare character, do the same for ",, and then replace any , that occurs between two of those rare characters. Alternatively, count the commas in each line, and if the count is greater than the number of delimited columns, process those exceptions manually.
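The rare-character approach can be sketched in Python as follows. This is an illustration under assumptions, not a drop-in preprocessing step: it assumes string fields are wrapped in double quotes, that no quoted string itself contains a ," or ", sequence, and that the control characters \x01 and \x02 never appear in the data. Commas inside quoted strings are replaced by a space by default; the function name is hypothetical.

```python
import re

OPEN, CLOSE = "\x01", "\x02"  # rare characters assumed absent from the data
QUOTED = re.compile(r"\x01[^\x02]*\x02")


def replace_quoted_commas(line, replacement=" "):
    """Replace commas inside double-quoted fields of one line with `replacement`."""
    # Pad with commas so fields at the start and end of the line also match ," and ",
    s = "," + line + ","
    s = s.replace(',"', "," + OPEN).replace('",', CLOSE + ",")
    # Inside each OPEN...CLOSE span, swap the commas for the replacement text.
    s = QUOTED.sub(lambda m: m.group(0).replace(",", replacement), s)
    # Turn the markers back into double quotes and drop the padding commas.
    s = s.replace(OPEN, '"').replace(CLOSE, '"')
    return s[1:-1]


print(replace_quoted_commas('1,"Texas,Houston",5'))  # 1,"Texas Houston",5
```

Unquoted lines pass through unchanged, so the function can be applied to every line of the file before it reaches the loader.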