Premature end of line Weka error - csv

I'm new to Weka and I have to use it for a University project. So, I created a .csv file and when I try to upload it to Weka, it says: "not recognised as a CSV data file. Reason: 1 problem encountered on line 2".
Then, if I open the .csv file with Notepad and then save as .arff file, when I try to open it again with Weka, in this case I get another error message: "not recognised as an arff data file. Reason: premature end of line, read Token[EOL], line 8".
Please help, I don't know much about working with Weka and really don't know what could be the problem, even though I did a lot of research about this problem.
This is the file: https://app.box.com/s/adfpf1zatgpl5mo20u5hdd1gnqihnq40
#Relation "PIB_Rata inflatiei"
#Attribute "PIB" NUMERIC
#Attribute "Rata_inflatiei" NUMERIC
#Data
30624.3,20780.9,27980.4,31920.3,37657.0,37168.3,35838.9,41978.0,36183.4,37439.0,40717.1,46174.0,59867.6,76217.6,99699.2,123533.7,171540.2,208185.1,167421.6,167998.1,185362.3,171664.6,191548.1,199325.9,177956.0
128.0,211.2,255.2,136.8,32.2,38.8,154.8,59.1,45.8,45.7,34.5,22.5,15.3,11.3,9.0,6.6,4.8,7.8,5.6,6.1,5.8,3.3,4.0,1.1,-0.6

In the ARFF format (as well as CSV) instances are rows, and attributes are columns.
Your file thus has too many columns, ever row must have exactly.two values.

Related

What is causing a csv load error in weka?

Im receiving the following error when trying to open a CSV file in Weka version 3.8.5
File not recognized as an 'CSV data files' file Reason: wrong number
of values. Read 2, expected 12, read Token [EOL], Line 2 Problem
encountered on Line:2
I have read solutions to similar errors on this site and can't seem to find what is wrong with my particular file. However, as a very newbie weka user, it may just be my misunderstanding of the issue. Can someone take a look at the sample csv data below and let me know if you see what I am not understnding or missing?
LossMonth,LossYear,ClaimNumber,PolicyNumber,ClaimBranch,Agency,LocationCounty,CATCode,CauseCode,IncurredLoss,CurrentReserves,"
City",State,ZIPCODE,"
COLLISIONTYPECD","
CLOSEDDT",DaystoCLose,"
FATALITYCNT","
FATALITYIND","
FAULTRATINGIND","
AUTOGLASSIND","
DEERLOSSIND","
WEATHERRELATEDIND","
POLICYTIERCD",ClaimStatus,AgencyHandled,VEHICLEYEAR,DRIVERRELATIONTOINSUREDDESC,TOTALLOSSIND,INSURANCESCORE,Age
10,2016,4125858,20169200,4,113,73,1,comp,2525,0,PADUCAH,KY,42001,x,42692,18,0,0,0,0,0,1,70,1,0,2004,Other third party,0,703,73
1,2018,4265645,20137828,13,106,37,1,hail,3164,0,BAGDAD,KY,40003,x,43214,88,0,0,0,0,0,0,50,1,0,2010,Named Insured,1,799,63
12,2016,4136759,20322058,5,105,105,1,hail,2547,0,GEORGETOWN,KY,40324,x,42713,2,0,0,0,0,0,0,10,1,0,2010,Named Insured,0,999,68
1,2016,4033032,20175699,13,106,106,1,comp,15327,0,SIMPSONVILLE,KY,40067,x,42469,73,0,0,0,0,0,1,80,1,0,2000,Named Insured,1,999,34
9,2016,4116782,20133146,2,115,115,1,wind,7529,0,SPRINGFIELD,KY,40069,x,42649,8,0,0,0,0,0,0,10,1,0,2003,Named Insured,0,783,47
2,2016,4038442,20170355,7,148,10,1,hail,3631,0,ASHLAND,KY,41101,x,42417,1,0,0,0,0,0,0,50,1,0,2010,Named Insured,0,778,42
2,2016,4039439,20218265,7,45,10,1,hail,3579,0,FLATWOODS,KY,41139,x,42444,25,0,0,0,0,0,0,40,1,0,2013,Named Insured,0,820,52
2,2016,4039440,20218265,7,45,10,1,hail,570,0,FLATWOODS,KY,41139,x,42422,3,0,0,0,0,0,0,40,1,0,2012,Named Insured,0,820,52
3,2018,4275810,20126522,15,40,40,1,hail,3747,0,LANCASTER,KY,40444,x,43216,55,0,0,0,0,0,0,10,1,0,2009,Named Insured,1,999,74
5,2016,4071936,20461965,15,40,40,1,hail,525,0,LANCASTER,KY,40444,x,42521,7,0,0,0,0,0,0,50,1,0,2006,Named Insured,0,999,68
3,2016,4046685,20226270,7,35,35,1,hail,3558,0,FLEMINGSBURG,KY,41041,x,42447,2,0,0,0,0,0,0,80,1,0,2012,Named Insured,0,842,69
4,2016,4055942,20439287,7,35,35,1,hail,2551,0,EWING,KY,41039,x,42475,1,0,0,0,0,0,0,70,1,0,2006,Named Insured,0,867,48
1,2016,4026514,20394097,7,148,10,1,hail,1350,0,ASHLAND,KY,41101,x,42376,3,0,0,0,0,0,0,40,1,0,2007,Named Insured,0,637,65
3,2016,4047152,20212062,15,141,76,1,hail,1739,0,BEREA,KY,40403,x,42473,27,0,0,0,0,0,0,80,1,0,2008,Named Insured,0,777,77
2,2016,4035512,20103029,15,40,40,1,hail,2008,0,LANCASTER,KY,40444,x,42405,1,0,0,0,0,0,0,0,1,0,2000,Named Insured,1,885,72
1,2016,4030456,20385643,15,120,40,1,hail,1497,0,LANCASTER,KY,40444,x,42450,62,0,0,0,0,0,0,20,1,0,2013,Named Insured,0,839,65
4,2016,4053299,20251610,5,69,11,1,hail,1535,0,DANVILLE,KY,40422,x,42514,48,0,0,0,0,0,0,100,1,0,2013,Insured,0,999,64
6,2016,4076264,20337992,17,140,1,1,hail,1799,0,MILLTOWN,KY,42728,x,42529,2,0,0,0,0,0,0,50,1,0,2002,Named Insured,0,999,84
8,2017,4217498,20596983,8,86,86,1,hail,660,0,TOMPKINSVILLE,KY,42167,x,42954,0,0,0,0,0,0,0,100,1,0,2012,Named Insured,0,999,45
1,2016,4026053,20511114,4,113,113,1,hail,1310,0,STURGIS,KY,42459,x,42376,3,0,0,0,0,0,0,100,1,0,2003,Named Insured,0,694,44
1,2016,4026766,20656586,4,113,113,1,hail,2360,0,MORGANFIELD,KY,42437,x,42383,9,0,0,0,0,0,0,20,1,0,2010,Named Insured,0,999,89
1,2016,4027473,20085251,6,42,42,1,hail,1699,0,MAYFIELD,KY,42066,x,42381,5,0,0,0,0,0,0,90,1,0,2008,Named Insured,0,747,50
1,2016,4029284,20167051,17,109,109,1,wind,3133,0,CAMPBELLSVILLE,KY,42718,x,42387,5,0,0,0,0,0,0,10,1,0,1993,Named Insured,0,886,78
1,2016,4031937,20326278,3,81,12,1,comp,3385,0,FOSTER,KY,41043,x,42402,8,0,0,0,0,0,1,40,1,0,2003,Named Insured,0,723,79
1,2016,4027931,20339366,8,107,107,1,wind,5858,0,FRANKLIN,KY,42134,x,42447,70,0,0,0,0,0,0,20,1,0,2014,Named Insured,0,940,80
1,2016,4028456,20453076,15,87,87,1,comp,2056,0,JEFFERSONVILLE,KY,40337,x,42387,7,0,0,0,0,0,1,100,1,0,2013,Named Insured,0,999,51
1,2016,4028597,20051661,4,113,113,1,hail,5320,0,WAVERLY,KY,42462,x,42712,332,0,0,0,0,0,0,20,1,0,2014,Named Insured,0,717,58
3,2016,4046687,20018268,6,42,42,1,hail,2736,0,MAYFIELD,KY,42066,x,42450,5,0,0,0,0,0,0,110,1,0,2012,Named Insured,0,735,73
9,2016,4116499,20128172,3,96,59,1,glss,320,0,TAYLOR MILL,KY,41015,x,42660,20,0,0,0,0,0,1,0,1,0,1997,Spouse,0,923,81
1,2016,4026247,20086164,4,113,113,1,hail,1611,0,MORGANFIELD,KY,42437,x,42376,3,0,0,0,0,0,0,10,1,0,2013,Named Insured,0,902,61
1,2016,4027222,20033936,6,79,79,1,glss,105,0,CALVERT CITY,KY,42029,x,42389,14,0,0,0,0,0,1,110,1,0,2001,Named Insured,0,772,57
1,2016,4028311,20059964,4,75,75,1,comp,1040,0,SACRAMENTO,KY,42372,x,42382,2,0,0,0,0,0,1,10,1,0,1996,Named Insured,0,999,64
1,2016,4029164,20541039,6,42,42,1,wind,1495,0,SEDALIA,KY,42079,x,42382,0,0,0,0,0,0,0,0,1,0,2008,Named Insured,0,756,67
1,2016,4027475,20085251,6,42,42,1,hail,940,0,MAYFIELD,KY,42066,x,42381,5,0,0,0,0,0,0,90,1,0,2013,Named Insured,0,747,50
1,2016,4030356,20007300,4,117,117,1,hail,6550,0,DIXON,KY,42409,x,42436,49,0,0,0,0,0,0,40,1,0,2009,Named Insured,0,864,34
Weka's CSVLoader cannot handle rows that span multiple lines (despite quoting). Once all your rows (header and data) are one per line, you should be fine.
The common-csv (unofficial) Weka package should be able to handle rows spanning multiple lines.

Weka ARFF error

I am new to Weka.
I am trying to run my arff file in weka and I keep getting an error.
java.io.IOException: Unable to determine structure as arff(Reason: java.io.IOException:end of line expected, readToken[LINKEDIN],line1)
I don't see whats wrong with the line and I have tried to format the file and declare the attributes as best as I can.
I have attached the file
ARFF file.
Remove extra whitespace from your first line.
You can also try adding quotation marks, but I'm not sure if that is permitted by ARFF

Java.io.IOException: wrong number of values (WEKA CSV to ARFF)

Currently working on a Data Mining project using my own dataset I had found using Weka. The only issue is that taking my file from csv format and converting it into arff format is causing issues.
java.io.IOException: wrong number of values. Read 2, expected 5, Read Token[EOL], line 3
This is the error I am getting. I have browsed around online looking for similar issues and have tried removing all quotes and special characters that throw this exception. Every place I looked told me to remove special characters and I believe there are none left. The link to my dataset is here : https://docs.google.com/spreadsheets/d/1xqEe7MZE9SdKB_yvFSgWeSVYuDrq0b31Eu5oECNbGH0/edit#gid=1736568367&vpid=A1
This is the first three lines of my file where the first is the attribute names, file is separated by commas in note
Inequality Adjusted HPI Rank,Sub Region,Inequality Adjusted Life Expectancy,Inquality Adjusted Well being,Footprint
,Inequality adjusted HPI
1,1,73.1,6.9,2.5,48.2
2,6,65.17333333,5.487667631,1.390974448,45.97489063
If you open your file with a text editor, you will see that Footprint has quotes around it. Delete the quotes and you are good to go!
Weka is normally not that good in reading CSV files that include special characters, and ARFF files are normally easier to use. Therefore, in such cases, the easiest way is to convert your CSV file to an ARFF file using R ("RWeka" and "foreign" libraries can handle this conversion).
There is also another possibility. I was creating my CSV file and the header had a different number of elements compared to the rest of the data. So, check the header as well...!

Remove all binary characters from a file

Occasionally, I have a hard time manipulating data in a CSV file because of the following error.
Binary file (standard input) matches
I researched several articles online but cannot seem to find one that helps me remove all of the binary characters or elements from a CSV file.
Unfortunately, I do not know where to start with this.
If I run the 'file' command on the file, I get the following output:
Little-endian UTF-16 Unicode text, with very long lines, with CRLF, CR line terminators
The second from last line in the file prints as:
"???? ?????, ???? ???",????,"?????, ????",???,,,,,,,,,,,,,,,,,,,,,,,,* Home,email#address.com,,
The second line in the file prints as:
,,,,,,,,,,,,,,,,,,,,,,,,,,,* ,email#address.com,,
This file contains too many lines to open in Excel or a GUI, "Save as..." and remove the binary elements that way.
Please help me. Thank you!

org.supercsv.exception.SuperCsvException: unexpected end of file while reading quoted column beginning on line

I'm reading csv files using superCSV reader and got the following exception. the file has 80000 lines. As I remove the end lines the exception still happens so there's some line in file that's causing this problem. how do I fix this?
org.supercsv.exception.SuperCsvException: unexpected end of file while reading quoted column beginning on line 80000 and ending on line 80000
context=null
at org.supercsv.io.Tokenizer.readColumns(Tokenizer.java:198)
at org.supercsv.io.AbstractCsvReader.readRow(AbstractCsvReader.java:179)
at org.supercsv.io.CsvListReader.read(CsvListReader.java:69)
at csv.filter.CSVFilter.filterFile(CSVFilter.java:400)
at csv.filter.CSVFilter.filter(CSVFilter.java:369)
at csv.filter.CSVFilter.main(CSVFilter.java:292)
ICsvListReader reader = null;
String[] line=null;
ListlineList=null;
try{
reader = new CsvListReader(new FileReader(inputFile), CsvPreference.STANDARD_PREFERENCE);
while((lineList=reader.read())!=null){
line=lineList.toArray(new String[lineList.size()]);
}
}catch(Exception exp){
exp.printStackTrace();
error=true;
}
The fact that the exception states it begins and ends on line 80000 should mean that there's an incorrect number of quotes on that line.
You should get the same error with the following CSV (but the exception will say line 1):
one,two,"three,four
Because the 3rd column is missing the trailing quote, so Super CSV will reach the end of the file and not know how to interpret the input.
FYI here is the relevant unit test for this scenario from the project source.
You can try removing lines to find the culprit, just remember that CSV can span multiple lines so make sure you remove whole records.
The line shown in the error message is not necessarily the one with the problem, since unbalanced quotechars throw off SuperCSV's line detection.
If possible, open the csv in a spreadsheet problem (for instance libreoffice calc) and search (as in CTRL-F search) for the quote char.
Calc will usually import the file well, even if there is a mismatch but you will see the quotechar somewhere if you search for it. Then check in the csv if it is properly escaped. If it is, make sure SuperCSV knows about it. If it isn't, complain to the producer of the csv.