weka - csv file upload produces null error

Hi,
no matter what I try, I keep getting the error file not recognised as 'CSV data files' file, reason: null while loading a CSV file into the Weka Explorer. Any suggestions as to what could be wrong?
I have been trying to correct errors of the type Wrong number of values. Read 1, expected 2, Token[EOL], line 17, and after those stop appearing, the null one shows up.
The file in question: file link
Thank you in advance!

I've preprocessed the file with these shell commands.
# optional:
# The file uses "\r" (sometimes displayed as ^M) as the line separator.
# The "\n" character is better.
# make it a unix-compliant csv file
# original file is saved into ~/Downloads/rezultati.csv.bak
perl -pi.bak -E "s/\r/\n/g" ~/Downloads/rezultati.csv
# end optional
# keep only the first 240 lines, dropping the defective last line.
# I don't know what's wrong with it. Maybe it's "No newline at end of file".
# I'll just omit that single line, the one starting with ID 243.
head -240 ~/Downloads/rezultati.csv > ~/Downloads/rezultati-240.csv
rezultati-240.csv can be loaded into Weka.
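If similar "Wrong number of values" errors come back, it helps to locate the offending lines before loading. A minimal sketch in Perl, assuming a plain comma delimiter with no quoted commas, and taking the expected field count of 2 from the error message:
perl -F',' -ane 'print "line $.: ", scalar(@F), " fields\n" if @F != 2;' ~/Downloads/rezultati.csv
Any line it reports is one Weka will reject.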

Related

Perl CSV reading characters that aren't there

I'm reading in a CSV file on Ubuntu using Perl's Text::CSV_XS package:
use Text::CSV_XS;
my $list = Text::CSV_XS->new({ binary => 1 }); # parser object, added here so the snippet is complete
open my $fh, '<:encoding(utf8)', 'file.csv' or die "Can't read csv: $!"; # error shows on this line
while (my $row = $list->getline($fh)) {
    # ... process $row ...
}
and this reads just fine until one line gives an error:
UTF-8 "\xE9" does not map to Unicode at 0.xlsx_to_json.pl line 198, <$_[...]> line 14019.
Looking online, this suggests that this is a é character or something similar, which is strange because I don't see any such characters on line 14019; that line looks just like any other line.
I tried changing the open line to
open my $fh, '<', 'file.csv'
but that gives the same error.
I tried opening the CSV and saving it as CSV with a different delimiter, but I can't do that in Excel 2016 anymore; the option to change the delimiter simply doesn't appear.
I tried opening it in LibreOffice to save as a CSV, but an update removed the ability to change the delimiter.
How can I read this CSV file without this strange error?
Your file is not a valid UTF-8 file. Byte E9 appears where it's not expected.
In UTF-8, \xE9 is the leading byte of a three-byte sequence, so it must be followed by two continuation bytes. Followed by two continuation bytes = ok:
$ perl -M5.010 -MEncode=decode -e'
decode("UTF-8", "\xE9\xBF\xBF", Encode::FB_WARN | Encode::LEAVE_SRC);
say "done";
'
done
Not followed by two continuation bytes = bad:
$ perl -M5.010 -MEncode=decode -e'
decode("UTF-8", "\xE9\x41", Encode::FB_WARN | Encode::LEAVE_SRC);
say "done";
'
UTF-8 "\xE9" does not map to Unicode at -e line 2.
done
Fix your bad data.
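If you need to find where the bad bytes live first, you can decode the file line by line and report every line that fails. A minimal sketch using only the core Encode module; file.csv is a stand-in for your real filename:
$ perl -MEncode=decode -ne '
    eval { decode("UTF-8", $_, Encode::FB_CROAK | Encode::LEAVE_SRC); 1 }
        or print "line $.: $@";
' file.csv
Each reported line number points at data that is not valid UTF-8.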

error finding and uploading a file in octave

I tried converting my .csv file to .dat format and tried to load the file into Octave. It throws an error:
unable to find file filename
I also tried to load the file in .csv format using the syntax
x = csvread(filename)
and it throws the error:
'filename' undefined near line 1 column 13.
I also tried opening the file in the editor and loading it from there, and now it shows me
warning: load: 'filepath' found by searching load path
error: load: unable to determine file format of 'Salary_Data.dat'.
How can I load my data?
>> load Salary_Data.dat
error: load: unable to find file Salary_Data.dat
>> Salary_Data
error: 'Salary_Data' undefined near line 1 column 1
>> Salary_Data
error: 'Salary_Data' undefined near line 1 column 1
>> Salary_Data
error: 'Salary_Data' undefined near line 1 column 1
>> x = csvread(Salary_Data)
error: 'Salary_Data' undefined near line 1 column 13
>> x = csvread(Salary_Data.csv)
error: 'Salary_Data' undefined near line 1 column 13
>> load Salary_Data.dat
warning: load: 'C:/Users/vaith/Desktop\Salary_Data.dat' found by searching load path
error: load: unable to determine file format of 'Salary_Data.dat'
>> load Salary_Data.csv
warning: load: 'C:/Users/vaith/Desktop\Salary_Data.csv' found by searching load path
error: load: unable to determine file format of 'Salary_Data.csv'
Salary_Data.csv
YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
2.0,43525.00
2.2,39891.00
2.9,56642.00
3.0,60150.00
3.2,54445.00
3.2,64445.00
3.7,57189.00
3.9,63218.00
4.0,55794.00
4.0,56957.00
4.1,57081.00
4.5,61111.00
4.9,67938.00
5.1,66029.00
5.3,83088.00
5.9,81363.00
6.0,93940.00
6.8,91738.00
7.1,98273.00
7.9,101302.00
8.2,113812.00
8.7,109431.00
9.0,105582.00
9.5,116969.00
9.6,112635.00
10.3,122391.00
10.5,121872.00
Ok, you've stumbled through a whole pile of issues here.
It would help if you didn't give us error messages without the commands that produced them.
The first message means you were telling Octave to open something called filename and it couldn't find anything called filename. Did you define the variable filename? Your second command and the error message suggest you didn't.
Do you know what Octave's working directory is? Is it the same as where the file is located? From the response to your load commands, I'd guess not. The file is located at C:/Users/vaith/Desktop. Octave's working directory is probably somewhere else.
(Try the pwd command and see what it tells you. Use the file browser or the cd command to navigate to the same location as the file. help pwd and help cd commands would also provide useful information.)
The load command, used in command form (load file.txt), treats its argument as a filename whether or not it is quoted. The function forms (load('file.txt') or csvread('file.txt')) require a string input, hence the quotes around file.txt. So all of your csvread commands treated Salary_Data as an undefined variable name, not a filename.
Last, the fact that load couldn't read your data isn't overly surprising. Octave is trying to guess what kind of file it is and how to load it. I assume you tried help load to see what the different command options are? You can give it different options to help Octave figure it out. If it actually is a csv file though, and is all numbers not text, then csvread might still be your best option if you use it correctly. help csvread would be good information for you.
It looks from your data like you have a header line that is probably confusing the load command. For data formatted as simply as this, the csvread command can bring it in. It will replace your header text with zeros.
So, first, navigate to the location of the file:
>> cd C:/Users/vaith/Desktop
then open the file:
>> mydata = csvread('Salary_Data.csv')
mydata =
0.00000 0.00000
1.10000 39343.00000
1.30000 46205.00000
1.50000 37731.00000
2.00000 43525.00000
...
If you plan to reuse the filename, you can assign it to a variable, then open the file:
>> myfile = 'Salary_Data.csv'
myfile = Salary_Data.csv
>> mydata = csvread(myfile)
mydata =
0.00000 0.00000
1.10000 39343.00000
1.30000 46205.00000
1.50000 37731.00000
2.00000 43525.00000
...
Notice how the filename is stored and used as a string with quotation marks, but the variable name is not. Also, csvread converted the non-numeric header data to zeros. The help for csvread and dlmread shows how to change the fill value to something other than zero, or to skip a certain number of rows. If you want to preserve the header text, you'll have to use some other input function.

neo4j throws error for "\" character

I am importing a CSV file and need to read it line by line.
One of the lines in the CSV file contains the string "C:\Program Files\". Because of this line it throws the below error.
At D:\workdir\Neo4j_Database\Database1\import\Data.csv:22798 - there's
a field starting with a quote and whereas it ends that quote there
seems to be characters in that field after that ending quote. That
isn't supported. This is what I read: 'CMM 10.0.1 Silent Installation
will install SW always in "C:\Program Files"",V10.0,
,,,,,,,,105111,AVASAAIS AG,E,,"G,"'
If I remove the last \ of the line then it does not throw this error.
I am not sure how to resolve this without modifying the csv file.
Note: the CSV loader used is LOAD CSV.
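The parser behind LOAD CSV treats a backslash before a quote as an escape character, which is why the trailing \ in "C:\Program Files\" swallows the closing quote. Depending on your Neo4j version, the dbms.import.csv.legacy_quote_escaping setting is intended to switch that behaviour off without touching the file; check the docs for your release. Failing that, a preprocessing pass that doubles every backslash also works, assuming all backslashes in the data are literal path separators. A minimal sketch in Perl (keeps a .bak copy of the original; adjust path quoting to your shell):
perl -pi.bak -e 's/\\/\\\\/g' D:/workdir/Neo4j_Database/Database1/import/Data.csv
After this, \" in the file becomes \\", which the loader reads as a literal backslash followed by the field-closing quote.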

Remove all binary characters from a file

Occasionally, I have a hard time manipulating data in a CSV file because of the following error.
Binary file (standard input) matches
I researched several articles online but cannot seem to find one that helps me remove all of the binary characters or elements from a CSV file.
Unfortunately, I do not know where to start with this.
If I run the 'file' command on the file, I get the following output:
Little-endian UTF-16 Unicode text, with very long lines, with CRLF, CR line terminators
The second from last line in the file prints as:
"???? ?????, ???? ???",????,"?????, ????",???,,,,,,,,,,,,,,,,,,,,,,,,* Home,email#address.com,,
The second line in the file prints as:
,,,,,,,,,,,,,,,,,,,,,,,,,,,* ,email@address.com,,
This file contains too many lines to open it in Excel or another GUI, do a "Save as...", and remove the binary elements that way.
Please help me. Thank you!
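Going by that file output, this isn't really a binary file: it's UTF-16LE text, and the NUL bytes UTF-16 uses to pad ASCII characters are what make grep report "Binary file (standard input) matches". Converting to UTF-8 and normalizing the line endings usually clears it up. A minimal sketch in Perl; it slurps the whole file into memory, and file.csv / file-utf8.csv are stand-in names:
perl -0777 -MEncode -pe '
    $_ = encode("UTF-8", decode("UTF-16LE", $_)); # re-encode the UTF-16LE bytes as UTF-8
    s/^\xEF\xBB\xBF//;                            # drop the byte-order mark if present
    s/\r\n?/\n/g;                                 # fold CRLF and lone CR down to LF
' file.csv > file-utf8.csv
iconv plus tr could do the same job, but the single Perl pass keeps the BOM and line-ending handling in one place.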

JSON file without line breaks, can't import file to SAS

I have a large JSON file (250 Mb) that has no line breaks in it when I open the file in Notepad or SAS, but if I open it in WordPad, I get the correct line breaks. I suppose this means the JSON file uses unix line breaks, which Notepad can't render but WordPad can, from what I have read.
I need to import the file into SAS. One way of doing this might be to open the file in WordPad and save it as a text file, which will hopefully retain the correct line breaks so that I can read the file in SAS. I have tried reading the file as-is, but without line breaks I only get the first observation, and I can't get the program to find the next observation.
I have tried getting WordPad to save the file, but WordPad crashes each time, probably because of the file size. I also tried doing this through PowerShell, but I can't figure out how to save the file once it is opened, and I see no reason why it should work, seeing as WordPad crashes when I try it through point and click.
Is there another way to fix this json-file? Is there a way to view the unix code for line breaks and replace it with windows line breaks, or something to that effect?
EDIT:
I have tried adding the TERMSTR=LF option both in filename and infile, without any luck:
filename test "C:\path";
data datatest;
infile test lrecl = 32000 truncover scanover TERMSTR=LF;
input #'"Id":' ID $9.;
run;
However, if I manually edit a small portion of the file to have line breaks, it works. The TERMSTR option doesn't seem to do much for me.
EDIT 2:
Solved using RECFM=F
data datatest;
infile test lrecl = 42000 truncover scanover RECFM=F ;
input #'"Id":' ID $9.;
run;
EDIT 3:
Turns out it didn't solve the problem after all. RECFM=F means all records have a fixed length, which they don't, so my data gets mixed up and a lot of info is skipped. I tried RECFM=V(ariable), but this is not working either.
I guess you're using Windows, so try:
TYPE input_filename | MORE /P > output_filename
This should convert the unix-style text file into a windows/dos one.
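If MORE chokes on a 250 Mb file, a Perl one-liner does the same LF-to-CRLF conversion; a minimal sketch with input.json and output.json as stand-in names (the lookbehind keeps existing CRLF pairs from being doubled, and on Windows you may need binmode handling so perl's text layer doesn't add a second CR):
perl -pe 's/(?<!\r)\n/\r\n/g' input.json > output.json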
250 Mbytes is not too long to treat as a single record.
filename json 'C:\path'; /* assumed fileref assignment, mirroring the question's FILENAME statement */
data want ;
infile json lrecl=250000000; *250 Mb ;
input @'"Id":' ID :$9. @@; /* @@ holds the record so later iterations keep scanning the same line */
run;