Having trouble loading data in InfoBright ICE - mysql

ICE Version: infobright-3.5.2-p1-win_32
I’m trying to load a large file but keep running into problems with errors such as:
Wrong data or column definition. Row: 989, field: 5.
This is row 989, field 5:
"(450)568-3***"
Note: the last 3 characters are also digits, but I didn't want to post somebody's phone number here.
It's really no different from any of the other entries in that field.
The datatype of that field is VARCHAR(255) NOT NULL

Also, if you upgrade to the current release, 4.0.6, we now support row-level error checking during LOAD and a reject file.
To enable the reject file functionality, you must specify BH_REJECT_FILE_PATH and one of the associated parameters (BH_ABORT_ON_COUNT or BH_ABORT_ON_THRESHOLD). For example, if you want to load data from the file DATAFILE.csv into table T but you expect that 10 rows in this file might be wrongly formatted, you would run the following commands:
set @BH_REJECT_FILE_PATH = '/tmp/reject_file';
set @BH_ABORT_ON_COUNT = 10;
load data infile 'DATAFILE.csv' into table T;
If fewer than 10 rows are rejected, a warning will be output, the load will succeed, and all problematic rows will be written to the file /tmp/reject_file. If the Infobright loader finds a tenth bad row, the load will terminate with an error and all bad rows found so far will be written to /tmp/reject_file.

I've run into this issue when the last line of the file is not terminated by the value of --lines-terminated-by="\n".
For example, if I am importing a file with 9,000 lines of data, I have to make sure there is a newline at the end of the file.
Depending on the size of the file, you can just open it with a text editor and hit the return key at the end of the last line.

I have found this to be consistent with the '\r\n' vs. '\n' difference. Even when running the loader on Windows, '\n' succeeds 100% of the time (assuming you don't have real issues with your data vs. the column definition).
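A sketch of what that looks like in the load statement (the file and table names are placeholders, and the field terminator is an assumption):
LOAD DATA INFILE 'DATAFILE.csv' INTO TABLE T
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
If the file was produced on Windows and still contains CRLF line endings, either convert it first or use '\r\n' here instead.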

Difficulties creating CSV table in Google BigQuery

I'm having some difficulties creating a table in Google BigQuery using CSV data that we download from another system.
The goal is to have a bucket in Google Cloud Platform to which we will upload one CSV file per month. These CSV files have around 3,000-10,000 rows of data, depending on the month.
The error I am getting from the job history in the Big Query API is:
Error while reading data, error message: CSV table encountered too
many errors, giving up. Rows: 2949; errors: 1. Please look into the
errors[] collection for more details.
When I am uploading the CSV files, I am selecting the following:
file format: csv
table type: native table
auto detect: tried automatic and manual
partitioning: no partitioning
write preference: WRITE_EMPTY (cannot change this)
number of errors allowed: 0
ignore unknown values: unchecked
field delimiter: comma
header rows to skip: 1 (also tried 0 and manually deleting the header rows from the csv files).
Any help would be greatly appreciated.
This usually points to an error in the structure of the data source (in this case, your CSV file). Since your CSV file is small, you can run a little validation script to check that the number of columns is exactly the same across all rows of the CSV before running the export.
Maybe something like:
cat myfile.csv | awk -F, '{ a[NF]++ } END { for (n in a) print n, "rows have",a[n],"columns" }'
Or you can bind it to a condition (let's say your number of columns should be 5):
badrows=$(awk -F, 'NF != 5' myfile.csv | wc -l); if [ "$badrows" -eq 0 ]; then python myexportscript.py; else echo "rows with an invalid column count: $badrows"; fi
It's impossible to point out the error without seeing an example CSV file, but it's very likely that your file is incorrectly formatted; a single typo can confuse BQ into reporting thousands of errors. Let's say you have the following CSV file:
Sally Whittaker,2018,McCarren House,312,3.75
Belinda Jameson 2017,Cushing House,148,3.52 //Missing a comma after the name
Jeff Smith,2018,Prescott House,17-D,3.20
Sandy Allen,2019,Oliver House,108,3.48
With the following schema:
Name(String) Class(Int64) Dorm(String) Room(String) GPA(Float64)
Since the second row is missing a comma, everything in it is shifted one column over. With a large file, this results in thousands of errors as BigQuery attempts to insert Strings into Ints/Floats.
I suggest you run your csv file through a csv validator before uploading it to BQ. It might find something that breaks it. It's even possible that one of your fields has a comma inside the value which breaks everything.
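For example, a value that contains a comma has to be quoted (per the usual CSV convention, which BigQuery honors), or it will be read as two separate columns. A made-up row illustrating this:
Sandy Allen,2019,"Oliver House, West Wing",108,3.48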
Another thing to check is that all required columns receive an appropriate (non-null) value. A common cause of this error is casting data incorrectly, which returns a null value for a specific field in every row.
As mentioned by Scicrazed, this issue is usually generated when some file rows have an incorrect format, in which case you need to validate the file's contents in order to figure out the specific error that is causing the problem.
I recommend checking the errors[] collection, which might contain additional information about what is making the process fail. You can do this by using the Jobs: get method, which returns detailed information about your BigQuery job, or by referring to the additionalErrors field of the JobStatus Stackdriver logs, which contains the same complete error data reported by the service.
I'm probably too late for this, but it seems the file has some errors (it can be a character that cannot be parsed or just a string in an int column) and BigQuery cannot upload it automatically.
You need to understand what the error is and fix it somehow. An easy way to do it is by running this command on the terminal:
bq --format=prettyjson show -j <JobID>
and you will be able to see additional logs for the error to help you understand the problem.
If the error happens only a few times, you can just increase the number of errors allowed.
If it happens many times, you will need to manipulate your CSV file before you upload it.
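For the first case, a hedged example with the bq CLI (the dataset, table, and bucket names here are placeholders) that allows up to 10 bad rows:
bq load --source_format=CSV --skip_leading_rows=1 --max_bad_records=10 mydataset.mytable gs://mybucket/myfile.csv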
Hope it helps

LOAD DATA FROM S3 command failing because of timestamp

I'm running the "LOAD DATA FROM S3" command to load a CSV file from S3 into Aurora MySQL. The command works fine if I run it in MySQL Workbench (it gives me the below exception as warnings, though, but still inserts the dates fine), but when I run it in Java I get the following exception:
com.mysql.cj.jdbc.exceptions.MysqlDataTruncation:
Data truncation: Incorrect datetime value: '2018-05-16T00:31:14-07:00'
Is there a workaround? Is there something I need to set up on the MySQL side or in my app to make this transformation seamless? Should I somehow run a REPLACE() command on the timestamp?
Update 1:
When I use REPLACE to remove the "-07:00" from the original timestamp (2018-05-16T00:31:14-07:00), it loads the data appropriately. Here's my load statement:
LOAD DATA FROM S3 's3://bucket/object.csv'
REPLACE
INTO TABLE sample
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@myDate)
SET `created-date` = replace(@myDate, '-07:00', ' ');
For obvious reasons it's not a good solution. Why would the LOAD statement work in MySQL Workbench and not in my Java code? Can I set some parameter to make it work? Any help is appreciated!!
The way I solved it is by using MySQL's SUBSTRING function in the SET part of the LOAD DATA query (instead of REPLACE):
SUBSTRING(@myDate, 1, 10)
This way the trailing '-07:00' is removed. (I actually opted to remove the time as well, since I didn't need it, but you can keep it for TIMESTAMP columns.)
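If you need to keep the full date and time, a variation on the same idea (a sketch assuming the same column list; note that it simply drops the '-07:00' offset rather than converting it) is to parse the first 19 characters explicitly:
LOAD DATA FROM S3 's3://bucket/object.csv'
REPLACE
INTO TABLE sample
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@myDate)
SET `created-date` = STR_TO_DATE(SUBSTRING(@myDate, 1, 19), '%Y-%m-%dT%H:%i:%s');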

CSV file import errors in to Mysql Workbench 6.3

I'm new to MySQL and am using it to make use of several CSV files I have that are very large (some have over a million rows). I'm on Win7-64 Ultimate and have installed MySQL Workbench v. 6.3.6 build 511, 64-bit. I read a similar question, however I cannot comment on it since I am new, and I am getting a different error anyway.
I have set up a database called crash0715, and created a table called driver_old with five columns. The first column is a report number (set up as INT(20)) that will be keyed to other files. It contains some duplicates depending upon the data in the other columns. The next four columns contain numeric data that is either 1 or 2 digits.
I set up the report_number column as INT(20), primary key, not null.
The other 4 were set up as INT or INT(2)
When I tried to import a little over 1 million rows from the 5-column CSV file (named do.csv in my c:\ root) via the GUI, the program hung. I let it run for over 12 hours and Task Manager showed the program was using 25% CPU.
I next tried the command line. After switching to the database, I used
LOAD DATA LOCAL INFILE 'c:/do.csv' INTO TABLE driver_old FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
I had removed the header row from the CSV before trying both imports.
I got the following message:
QUERY OK, 111 rows affected, 65535 warnings <3.97 sec> Records: 1070145 Deleted: 0 Skipped: 1070034 Warnings: 2273755
I read the first few lines of SHOW WARNINGS and they were as follows:
1264 Out of range value for column 'report_number' for row 1.
1261 Row 1 doesn't contain data for all columns
These two repeated for all of the other lines.
There was also a
1062 Duplicate entry '123456789' for key 'primary' (123456789 is a representative value)
It also reoccurred with the other two codes.
The CSV file has no blanks on the first column, however there are a few in the other ones.
Any idea what I'm doing wrong here?
I solved this by saving and exporting the data as SQL INSERT statements.
I would use BIGINT instead of INT!
Using IGNORE or REPLACE may help with duplicate primary key values, for example:
LOAD DATA LOCAL INFILE 'c:/do.csv' IGNORE INTO TABLE driver_old FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
I cannot comment on this question, but it would be great if you could post a URL to a picture showing a few lines from the CSV file, along with the code you used to create the table and insert the data! That would be very helpful for answering the question!
I have now successfully imported the 1045767 records. As suggested by another member here, I imported a small 100 row file that gave the same errors. I then opened the csv in Libre Office and saved it. I was able to import it OK.
The problem was the spreadsheet program, GS-Calc. When saving csv files, it gives three options: UTF-8, UTF-16, and ANSI/OEM/ISO. I had initially saved it as UTF-8 and it returned the error.
I saved it as ANSI/OEM/ISO and it was able to be imported OK. I hope this helps others with large csv files in the future.
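If re-saving the file isn't an option, another thing worth trying (a sketch, untested against GS-Calc output, assuming a MySQL 5.5+ server) is telling MySQL the file's encoding explicitly; note that a UTF-8 byte-order mark at the start of the file can still corrupt the first field of the first row:
LOAD DATA LOCAL INFILE 'c:/do.csv'
INTO TABLE driver_old
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';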
I changed the default separator in MySQL to a comma.

Redshift COPY - No Errors, 0 Record(s) Loaded Successfully

I'm attempting to COPY a CSV file to Redshift from an S3 bucket. When I execute the command, I don't get any error messages, however the load doesn't work.
Command:
COPY temp FROM 's3://<bucket-redacted>/<object-redacted>.csv'
CREDENTIALS 'aws_access_key_id=<redacted>;aws_secret_access_key=<redacted>'
DELIMITER ',' IGNOREHEADER 1;
Response:
Load into table 'temp' completed, 0 record(s) loaded successfully.
I attempted to isolate the issue via the system tables, but there is no indication there are issues.
Table Definition:
CREATE TABLE temp ("id" BIGINT);
CSV Data:
id
123,
The lines in your CSV file probably don't end with a Unix newline character, so the COPY command probably sees your file as:
id123,
Given you have the IGNOREHEADER option enabled, and the line endings in the file aren't what COPY is expecting (my assumption based on past experience), the file contents get treated as one line, and then skipped.
I had this occur for some files created from a Windows environment.
I guess one thing to remember is that CSV is not a standard, more a convention, and different products/vendors have different implementations for csv file creation.
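One way to test this theory is to look at what the COPY actually scanned; this is a sketch against the standard Redshift system tables (column sets can vary by cluster version):
-- How many lines did recent loads read from each file?
SELECT query, TRIM(filename) AS filename, lines_scanned, curtime
FROM stl_load_commits
ORDER BY curtime DESC
LIMIT 10;
-- Were any parse errors recorded?
SELECT starttime, TRIM(filename) AS filename, line_number, colname, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;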
I repeated your instructions, and it worked just fine:
First, the CREATE TABLE
Then, the LOAD (from my own text file containing just the two lines you show)
This resulted in:
Code: 0 SQL State: 00000 --- Load into table 'temp' completed, 1 record(s) loaded successfully.
So, there's nothing obviously wrong with your commands.
At first, I thought that the comma at the end of your data line could cause Amazon Redshift to think there is an additional column of data that it can't map to your table, but it worked fine for me. Nonetheless, you might try removing the comma, or creating an additional column to store this 'empty' value (see the sketch below).
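A sketch of that second suggestion (the extra column name is hypothetical); the trailing comma then maps to the throwaway column instead of dangling:
CREATE TABLE temp ("id" BIGINT, "filler" VARCHAR(1));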

how to populate a database?

I have a mysql database with a single table, that includes an autoincrement ID, a string and two numbers. I want to populate this database with many strings, coming from a text file, with all numbers initially reset to 0.
Is there a way to do it quickly? I thought of creating a script that generates many INSERT statements, but that seems somewhat primitive and slow, especially since MySQL is on a remote site.
Yes - use LOAD DATA INFILE (the docs are here). Example:
LOAD DATA INFILE 'csvfile'
INTO TABLE table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 0 LINES
(cola, colb, colc)
SET cold = 0,
cole = 0;
Notice the SET line - this is where you set a default value.
Depending on your field separator, change the FIELDS TERMINATED BY ',' line.
The other answers only respond to half of your question. For the other half (zeroing numeric columns):
Either:
Set the default value of your number columns to 0,
In your text file, simply delete the numeric values,
This will cause the field to be read by LOAD DATA INFILE as null, and the default value, which you have set to 0, will be assigned (see the sketch below).
Or:
Once you have your data in the table, issue a MySQL command to zero the fields, like
UPDATE table SET first_numeric_column_name = 0, second_numeric_column_name = 0 WHERE 1;
And to sum everything up, use LOAD DATA INFILE.
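A minimal sketch of the default-value option (hypothetical table and column names):
ALTER TABLE mytable
  ALTER COLUMN first_numeric_column SET DEFAULT 0,
  ALTER COLUMN second_numeric_column SET DEFAULT 0;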
If you have access to the server's file system, you can utilize LOAD DATA.
If you don't want to fight with the syntax, the easiest way (if you're on Windows) is to use HeidiSQL.
It has a friendly wizard for this purpose.
Maybe I can help you with the right syntax if you post a sample line from the text file.
I recommend using SB Data Generator by Softbuilder (which I work for). Download and install the free trial.
First, create a connection to your MySQL database, then go to “Tools -> Virtual Data” and import your test data (the file must be in CSV format).
After importing the test data, you will be able to preview them and query them in the same window without inserting them into the database.
Now, if you want to insert test data into your database, go to “Tools -> Data Generation” and select "generate data from virtual data".
SB data generator from Softbuilder