I am trying to index a CSV file in Endeca. Indexing works fine when the line length is less than 65536 characters. For larger data it throws the exception below.
FATAL 02/18/14 15:45:53.122 UTC (1392738353122) FORGE {baseline}: TextObjectInputStream: while reading "/opt/soft/endeca/apps/MyApp/data/processing/TestRecord.csv", delimiter " " not found within allowed distance of 65536 characters.
.............................................
..............................................
ERROR 02/17/14 16:10:58.060 UTC (1392653458060) FORGE {baseline}: I/O Exception: Error reading data from Java: EdfException thrown in: edf/src/format/Shared/TextObjectInputStream.cpp:76. Message is: exit called
How can I increase this limit so that Endeca can index large data (more than 65536 characters in a single line)?
I imagine you've fixed this by now. If not: that error occurs when the row delimiter isn't set correctly in your Record Adapter.
If your records are legitimately that long in a CSV file, switch to XML or something else.
I am unable to load any CSV file into MySQL. Using the Table Data Import Wizard, this error pops up every time I get to the 'Configure Import Settings' step:
"Unhandled exception: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)"
... even though the CSV is encoded as UTF-8 and that seems to be the default encoding setting for MySQL Workbench. Granted, I am not very skilled with computers, I have only a few weeks' exposure to MySQL. This has not always happened to me. I had no issues with this a couple of months ago while I was in a database management course.
But I think this is where my problem lies: at one point I tried to uninstall MySQL Workbench and Community Server and re-installed them, and ever since, this error happens every time I try to load data. I am even using a very basic test file that still won't load (all column types are set to 'Text' in Excel and saved as UTF-8 CSV).
I am using MySQL 8.0.28 on MacOS 11.5.2 (Big Sur)
Case 1, you wanted ï ("LATIN SMALL LETTER I WITH DIAERESIS"):
Character set ASCII is not adequate for the accented letters you have. You probably need latin1.
Case 2, the first 3 bytes of the file are (hex) EF BB BF:
That is "BOM", which is a marker at the beginning of the file that indicates that it is encoded in UTF-8. But, apparently, the program reading it dos not handle such.
In some situations, you can remove the 3 bytes and proceed; in other situations, you need to read it using some UTF-8 setting.
Since you say "Text' in Excel and saved as UTF-8 CSV", I suspect that it is case 2. But that only addresses the source (Excel), over which you may not have enough control to get rid of the BOM.
I don't know which app the "Table Data Import Wizard" belongs to, so I cannot address the destination side of the problem. Maybe the wizard has a setting of UTF-8 or utf8mb4 or utf8; any of those might work instead of "ascii".
Sorry, I don't have the full explanation, but maybe the clues "BOM" or "EFBBBF" will help you find a solution either in Excel or in the Wizard.
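If you end up having to strip the BOM yourself, here is a minimal sketch in Java (the file names are placeholders, not anything from your setup): it copies the bytes of the CSV to a new file, dropping the leading EF BB BF bytes if they are present.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class StripBom {
    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get("test.csv")); // hypothetical input file
        int start = 0;
        // a UTF-8 BOM is the three bytes EF BB BF at the very start of the file
        if (bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            start = 3;
        }
        // write a BOM-free copy that the import wizard can read
        Files.write(Paths.get("test_nobom.csv"),
                Arrays.copyOfRange(bytes, start, bytes.length));
    }
}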
I was able to solve it by saving my Excel file to CSV using the MS-DOS CSV and Macintosh CSV formats. After that, I was able to import my CSV through the import wizard without hitting the bug.
I am trying to load JSON files into Snowflake using the COPY command. I have two files with the same structure. However, one file loaded without issue; the other one throws the error
"Error parsing JSON: misplaced { "
The simple example select parse_json($1) record from values ('{{'); also fails with "Error parsing JSON: misplaced {, pos 2", so your second file probably does in fact contain invalid JSON.
Try running the statement in validation mode (e.g. copy into mytable validation_mode = 'RETURN_ERRORS';), which will return a table containing useful troubleshooting info such as the line number and character position of the error(s).
The docs cover this here: https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#validating-staged-files
I am trying to read input from a CSV file (input.csv, in the same folder as this .f95 file) which has some numbers in certain columns. When I execute my program, it gives the error
program received signal SIGSEGV: segmentation fault - invalid memory reference. backtrace =#0 ffffffff
The code is below; it terminates after the first print statement.
program section
  implicit none
  integer :: no_num, count_le, count_cve, count_cvx, count_ce
  integer :: i, j
  double precision, allocatable, dimension(:) :: x1, y1, x2, y2
  double precision, allocatable, dimension(:) :: xc, yc, rc, sa, ea
  character(2), allocatable, dimension(:) :: asc, asl, sd
  character(100) :: rand_char ! random characters/txt

  print *, "program"
  open(unit=100, file='input.csv')
  print *, "check"
  close(100)
end program
I'm new to Weka and I have to use it for a university project. I created a .csv file, and when I try to load it into Weka, it says: "not recognised as a CSV data file. Reason: 1 problem encountered on line 2".
Then, if I open the .csv file with Notepad and save it as an .arff file, I get another error message when I try to open it in Weka: "not recognised as an arff data file. Reason: premature end of line, read Token[EOL], line 8".
Please help; I don't know much about working with Weka and can't figure out what the problem is, even though I've done a lot of research on it.
This is the file: https://app.box.com/s/adfpf1zatgpl5mo20u5hdd1gnqihnq40
#Relation "PIB_Rata inflatiei"
#Attribute "PIB" NUMERIC
#Attribute "Rata_inflatiei" NUMERIC
#Data
30624.3,20780.9,27980.4,31920.3,37657.0,37168.3,35838.9,41978.0,36183.4,37439.0,40717.1,46174.0,59867.6,76217.6,99699.2,123533.7,171540.2,208185.1,167421.6,167998.1,185362.3,171664.6,191548.1,199325.9,177956.0
128.0,211.2,255.2,136.8,32.2,38.8,154.8,59.1,45.8,45.7,34.5,22.5,15.3,11.3,9.0,6.6,4.8,7.8,5.6,6.1,5.8,3.3,4.0,1.1,-0.6
In the ARFF format (as well as CSV) instances are rows, and attributes are columns.
Your file thus has too many columns; every row must have exactly two values, one per attribute (see the sketch below).
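For example, using the standard @relation/@attribute/@data keywords, the corrected file would look something like this (only the first few data rows are shown; the remaining pairs from your file follow the same pattern):

@relation "PIB_Rata inflatiei"
@attribute "PIB" numeric
@attribute "Rata_inflatiei" numeric
@data
30624.3,128.0
20780.9,211.2
27980.4,255.2
31920.3,136.8
37657.0,32.2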
I'm reading CSV files using the Super CSV reader and got the following exception. The file has 80000 lines. Even when I remove the lines at the end, the exception still happens, so some line in the file must be causing this problem. How do I fix it?
org.supercsv.exception.SuperCsvException: unexpected end of file while reading quoted column beginning on line 80000 and ending on line 80000
context=null
at org.supercsv.io.Tokenizer.readColumns(Tokenizer.java:198)
at org.supercsv.io.AbstractCsvReader.readRow(AbstractCsvReader.java:179)
at org.supercsv.io.CsvListReader.read(CsvListReader.java:69)
at csv.filter.CSVFilter.filterFile(CSVFilter.java:400)
at csv.filter.CSVFilter.filter(CSVFilter.java:369)
at csv.filter.CSVFilter.main(CSVFilter.java:292)
// needed imports: java.io.FileReader, java.util.List,
//                 org.supercsv.io.CsvListReader, org.supercsv.io.ICsvListReader,
//                 org.supercsv.prefs.CsvPreference
ICsvListReader reader = null;
String[] line = null;
List<String> lineList = null;
boolean error = false;
try {
    reader = new CsvListReader(new FileReader(inputFile), CsvPreference.STANDARD_PREFERENCE);
    while ((lineList = reader.read()) != null) {
        line = lineList.toArray(new String[lineList.size()]);
    }
} catch (Exception exp) {
    exp.printStackTrace();
    error = true;
}
The fact that the exception states it begins and ends on line 80000 should mean that there's an incorrect number of quotes on that line.
You should get the same error with the following CSV (but the exception will say line 1):
one,two,"three,four
The 3rd column is missing its trailing quote, so Super CSV reaches the end of the file and does not know how to interpret the input.
FYI here is the relevant unit test for this scenario from the project source.
You can try removing lines to find the culprit; just remember that a CSV record can span multiple lines, so make sure you remove whole records.
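Here is a minimal sketch that reproduces the error with the one-line CSV above (assuming Super CSV 2.x on the classpath; the class name is just an example):

import java.io.StringReader;
import org.supercsv.io.CsvListReader;
import org.supercsv.prefs.CsvPreference;

public class UnbalancedQuoteDemo {
    public static void main(String[] args) throws Exception {
        // the quote opened in the 3rd column is never closed
        String csv = "one,two,\"three,four";
        try (CsvListReader reader =
                new CsvListReader(new StringReader(csv), CsvPreference.STANDARD_PREFERENCE)) {
            reader.read(); // throws SuperCsvException: unexpected end of file while reading quoted column
        }
    }
}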
The line shown in the error message is not necessarily the one with the problem, since unbalanced quotechars throw off SuperCSV's line detection.
If possible, open the CSV in a spreadsheet program (for instance LibreOffice Calc) and search (as in Ctrl+F) for the quote char.
Calc will usually import the file fine even if there is a mismatch, but you will see the quote char somewhere if you search for it. Then check in the CSV whether it is properly escaped. If it is, make sure Super CSV knows about it (see the sketch below); if it isn't, complain to the producer of the CSV.
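A minimal sketch of telling Super CSV about a non-default quote character, assuming (purely as an example) that the file uses single quotes; adjust the characters to whatever you actually find in the file:

// same imports as in the question, plus org.supercsv.prefs.CsvPreference
// arguments are: quote char, delimiter char, end-of-line symbols
CsvPreference prefs = new CsvPreference.Builder('\'', ',', "\r\n").build();
ICsvListReader reader = new CsvListReader(new FileReader(inputFile), prefs);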