sql loader not loading all the needed rows due to new line character and enclosement character - sql-loader

I have a problem with how SQL loader manage the end of a column value. I was hoping to manage CR LF, the enclosement character and the separator character but it seems I can't find a solution!
The data I receive from the .csv file looks like this:
"C","I","FLAGS","LASTUPDATEDATE","BOEVERSION","C_OSUSER_UPDATEDBY","I_OSUSER_UPDATEDBY","C_OSUSER_PWF","DESCRIPTION","DURATION","ENDDATE","I_OSUSER_PWF","LASTSTATUSCHA","STARTDATE","DURATIONUNIT","TYPE","STATUS","C_BNFTRGHT_CONDITIONS","I_BNFTRGHT_CONDITIONS","C_CNTRCT1_CONDITION","I_CNTRCT1_CONDITION","EXTBLOCKTYPE","EXTBLOCKDURATIONUNIT","EXTBLOCKDURATION","EXTBLOCKDESCRIPTION","PARTITIONID"
"7680","423","PE","2015-07-06 11:42:10","0","1000","1506","","No benefits are payable for a Total Disability period during a Parental or Family-Related Leave, for a Total Disability occurring during this period.
","0","","","","","69280000","69312015","71328000","7285","402","","","","","","","1"
"7680","426","PE","2015-07-06 11:42:10","0","1000","1506","","""Means to be admitted to a Hospital as an in-patient for more than 18 consecutive hours.
""
","0","","","","","69280000","69312021","71328000","7285","402","","","","","","","1"
My ctl file is as follow:
Load Data
infile 'C:\2020-07-29-03-04-48-TolCondition.csv'
CONTINUEIF LAST != '"'
into table TolCondition
REPLACE
FIELDS TERMINATED BY "," ENCLOSED by '"'
(
C,
I,
FLAGS,
LASTUPDATEDATE DATE "YYYY-MM-DD HH24:MI:SS",
BOEVERSION,
C_OSUSER_UPDATEDBY,
I_OSUSER_UPDATEDBY,
C_OSUSER_PWF,
DESCRIPTION CHAR(1000),
DURATION,
ENDDATE DATE "YYYY-MM-DD HH24:MI:SS",
I_OSUSER_PWF,
LASTSTATUSCHA DATE "YYYY-MM-DD HH24:MI:SS",
STARTDATE DATE "YYYY-MM-DD HH24:MI:SS",
DURATIONUNIT,
TYPE,
STATUS,
C_BNFTRGHT_CONDITIONS,
I_BNFTRGHT_CONDITIONS,
C_CNTRCT1_CONDITION,
I_CNTRCT1_CONDITION,
EXTBLOCKTYPE,
EXTBLOCKDURATIONUNIT,
EXTBLOCKDURATION,
EXTBLOCKDESCRIPTION,
PARTITIONID)
Here is what I tried in the control file:
CONTINUEIF LAST != '"'
CONTINUEIF THIS PRESERVE (1:2) != '",'
"str X'220D0A'"
Here is the result I currently have with "CONTINUEIF LAST != '"'
Record 2: Rejected - Error on table TOLCONDITION, column DESCRIPTION.
second enclosure string not present
Record 3: Rejected - Error on table TOLCONDITION, column C.
no terminator found after TERMINATED and ENCLOSED field
Table TOLCONDITION:
1 Row successfully loaded.
2 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Is there any way to manage line break and enclosement character in SQL Loader? I dont understand why we can`t change how it sees rows. Instead of seeing a new row when there is a CR LF, can we tell it to concacenate values until the last enclosement character (chr34 in my case) + the separator character (, in my case) has been seen.
I really hope to find a way to resolve this issue without having to change the .csv file.
Thank you

If you are on 12c you can use the "FIELDS CSV WITH EMBEDDED" clause to tell sqlldr that some columns have the data file's end of line character embedded within. This will cause the column to be inserted as it is in the data file. More info here
Load Data
infile 'C:\2020-07-29-03-04-48-TolCondition.csv'
into table TolCondition
REPLACE
FIELDS CSV WITH EMBEDDED TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'

Related

Has anyone ever encountered additional unknown characters appending to your column entries after importing from a .csv file?

I've filled my table of instruments using LOAD INTO FILE. It fills the rows successfully but then doesn't enclose the final column (status) with a vertical line. I didn't think this was an issue until I ran a query to check the number of column entries = "commissioning".
SELECT COUNT(*)
FROM instrument
WHERE status = 'commissioning';
All 60 rows contain "commissioning" so it should return 60, but instead it returns 0?
I retried the query with a wildcard search and returned the right result here (You can also see the table is not enclosed)
Perhaps something is going on when I imported from csv file, because a LENGTH(status) query returns 14 when "commissioning" is only 13 characters. Has anyone encountered this before or know what character could be causing this?
Heres the import from the csv file code for further clarity - but it worked fine with my other tables
The problem you are having is produced because Windows uses '\r\n' instead of '\n'. As you are telling the import statement to finish lines with '\n' you have an extra '\r' character in every line. You need to change your import statement as:
LOAD DATA INFILE 'instruments.csv'
INTO TABLE instruments
FILEDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS;

How can I load blank/NULL values with LOAD DATA INFILE from the MySQL command line

I am using the following command from the MySQL command line to try and import a csv file into one of my tables:
LOAD DATA INFILE 'file path' INTO TABLE table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(BASEID, BIGID, StartDate, EndDate)
Some of my rows have no value for EndDate which I want to be represented as NULL values in the database.
Unfortunately when I execute the above command I get the following error:
for column 'EndDate' at row 141lue:
If I remove the rows with blank cells the command works, so it is clearly the blank values for EndDate which are causing the problem.
I have also tried changing my csv file so that the blank cells say NULL or \N. I have also tried the following command instead but I still get the same error message:
LOAD DATA INFILE 'file path' INTO TABLE Table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(BASEID, BIGID, StartDate, #EndDate)
SET EndDate = nullif(#EndDate, ' ')
How can I load csv files which have some blank values? The suggest solutions I have seen on other posts don't seem to work as outlined above.
Is the issue that the value for the end date is missing, or that the column itself is missing? These are not the same thing. For the former case, I think LOAD DATA should be able to handle this, assuming that the target column for the end date can tolerate missing/null values.
What I suspect here is that some of your input lines look like this:
1,1,'2020-10-03'
That is, there is no fourth column present at all. If this be the case, then the most prudent thing to do here might be to run a simple regex over your input CSV flat file to fix these missing fourth column edge cases. You may try:
Find: ^([^,]+,[^,]+,'[^,]+')$
Replace: $1,
This would turn the sample line above into:
1,1,'2020-10-03',
Now, the date value is still missing, but at least LOAD DATA should detect that the line has four columns, instead of just three.

MySQL Format Date with a Load Data command

Hello I am trying to load data from a csv file but need to format the date when loading the file.
The date format of my csv file is: 1/1/2008 0:00
i'd like to format the date and to just the year with the load command
load data infile 'C:/mysql/ca_pop_educational_attainment.csv'
into table ca_pop.educational_attainment
fields terminated by ','
enclosed by '"'
ignore line 1
set year = STR_TO_DATE(#year,"%Y")
(ea_id, #year, age, educational_attainment, personal_income,pop_count)
The list of columns must come before the SET clause. That is, you list the columns or variables to import into, then following that, you can set individual columns.
Also, you need to use STR_TO_DATE() to parse the whole string into a date. Then you can wrap that in another function or expression to extract individual parts like the year.
Something like the following, but I have not tested it:
load data infile 'C:/mysql/ca_pop_educational_attainment.csv'
into table ca_pop.educational_attainment
fields terminated by ','
enclosed by '"'
ignore line 1
(ea_id, #year, age, educational_attainment, personal_income, pop_count)
set year = YEAR(STR_TO_DATE(#year,'%m/%d/%Y %k:%i'))

Load data from text file to DB

Data:
1|\N|"First\Line"
2|\N|"Second\Line"
3|100|\N
\N represents NULL in MYSQL & MariaDB.
I'm trying to load above data using LOAD DATA LOCAL INFILE method into a table named ID_OPR.
Table structure:
CREATE TABLE ID_OPR (
idnt decimal(4),
age decimal(3),
comment varchar(100)
);
My code looks like below:
LOAD DATA LOCAL INFILE <DATA FILE LOCATION> INTO TABLE <TABLE_NAME> FIELDS TERMINATED BY '|' ESCAPED BY '' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';
Problem with this code is it aborts with error Incorrect decimal value: '\\N' For column <Column name>.
Question:
How to load this data with NULL values in second decimal column and also without loosing \(Backslash) from third string column?
I'm trying this is MariaDB which is similar to Mysql in most case.
Update:
The error i have mentioned appears like a warning and the data is actually getting loaded into table. But the catch here is with the text data.
For example: Incase of the third record above it is being loaded as \N itself into string column. But i want it to be NULL.
Is there any way to make the software to recognize this null value? Something like decode in oracle?
You can't have it both ways - either \ is an escape character or it is not. From MySQL docs:
If the FIELDS ESCAPED BY character is empty, no characters are escaped and NULL is output as NULL, not \N. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.
So, I'd suggest a consistently formatted input file, however that was generated:
use \\ if you want to keep the backslash in the strings
make \ an escape character in your load command
OR
make strings always, not optionally, enclosed in quotes
leave escape character empty, as is
use NULL for nulls, not \N
BTW, this also explains the warnings you were experiencing loading \N in your decimal field.
Deal with nulls with blanks. that should fix it.
1||"First\Line"
2||"Second\Line"
3|100|
Thats how nulls are handled on CSVs and TSVs. And don't expect decimal datatype to go null as it stays 0, use int or bigint instead if needed. You should forget about "ESCAPED BY"; as long as string data is enclosed by "" that deals with the escaping problem.
we need three text file & 1 batch file for Load Data:
Suppose your file location 'D:\loaddata'
Your text file 'D:\loaddata\abc.txt'
1. D:\loaddata\abc.bad -- empty
2. D:\loaddata\abc.log -- empty
3. D:\loaddata\abc.ctl
a. Write Code Below for no separator
OPTIONS ( SKIP=1, DIRECT=TRUE, ERRORS=10000000, ROWS=5000000)
load data
infile 'D:\loaddata\abc.txt'
TRUNCATE
into table Your_table
(
a_column POSITION (1:7) char,
b_column POSITION (8:10) char,
c_column POSITION (11:12) char,
d_column POSITION (13:13) char,
f_column POSITION (14:20) char
)
b. Write Code Below for coma separator
OPTIONS ( SKIP=1, DIRECT=TRUE, ERRORS=10000000, ROWS=5000000)
load data
infile 'D:\loaddata\abc.txt'
TRUNCATE
into table Your_table
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
(a_column,
b_column,
c_column,
d_column,
e_column,
f_column
)
4.D:\loaddata\abc.bat "Write Code Below"
sqlldr db_user/db_passward#your_tns control=D:\loaddata\abc.ctl log=D:\loaddata\abc.log
After double click "D:\loaddata\abc.bat" file you data will be load desire oracle table. if anything wrong check you "D:\loaddata\abc.bad" and "D:\loaddata\abc.log" file

Convert Numerical Fields in MySQL

I have imported a CSV file where a specific column has a decimal number.
In the original excel file (before saving it to a CSV), the first number of the column shows up as 218,790. When I choose the cell, the number shows up as 218790.243077911.
In the CSV file the number shows up as 218790 and when I choose the cell it is 218,790.
When I import the file on mySQL and show the table I created, the number shows up as 218.000000000.
Here is the code I used:
create table Apolo_Test(
Leads decimal (15,9)
);
LOAD DATA LOCAL INFILE 'C:/Users/SCRIPTS/file.csv'
INTO TABLE Apolo_Test
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 7 ROWS
;
I tried updating the format with this :
update Apolo_Test set Leads = format(Leads, 10, 'de_DE');
but it did not work. I have never had a case where files had a comma before. I guess it is the UK version of numerical fields.
How is it possible to make it work on mySQL without using any MACROS in excel?
UPD:
It works but I get some warnings although I double checked the csv file and the fields :
create table Apolo_Test(
Ad_Group varchar(50),
Impacts int,
Leads decimal (10,3)
);
LOAD DATA LOCAL INFILE 'C:/Users/me/Desktop/SCRIPTS/11/Adalyser.csv'
INTO TABLE Apolo_Test
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 7 ROWS
(Ad_Group, Impacts, #Leads)
SET Leads = replace(#Leads, ',', '');
;
alter table Apolo_Test ADD IPL decimal (10,6) after Leads;
update Apolo_Test set IPL=Impacts/Leads;
select * from Apolo_Test;
You have to use this syntax:
LOAD DATA LOCAL INFILE 'C:/path/to/mytable.txt' IGNORE
INTO TABLE mytable
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\r\n'
(int_col, #float_col)
SET float_col = replace(#float_col, ',', '.');
For more information read here
The thousands-separator should not matter when moving data around -- Excel internal values and CSV files and MySQL internal values do not include it. Only "formatted" output includes it. And you should not use formatted output for moving numbers around.
Be careful with locale, such as de_DE.
The German "218.790" is the same as English "218,790".
"218790.243077911" is likely to be what Excel had internally for the number.
"218,790" is likely to be the English representation on the screen; note the English thousands separator.
In the CSV file the number shows up as 218790 and when I choose the cell it is 218,790.
What do you mean? Perhaps that there no comma or dot in the file, itself? But what you mean by "choose the cell"?
I can't see how to get "218.000000000" without truncation going on somewhere.