I am using LOAD DATA INFILE to import twenty |-delimited .dat files into a MySQL table, but some of the | field terminators are escaped with a backslash. The second field in the rows below is an example:
1001604|EMERITUS CORP\WA\|SC 13G|1996-02-13|edgar/data/1001604/0000728757-96-000006.txt
1001604|EMERITUS CORP\WA\|SC 13G|1996-02-14|edgar/data/1001604/0000903949-96-000038.txt
I get an error because the last field then clashes with the DATE type declared for the next-to-last column. I could open each .dat file and escape the escapes by hand, but is there a better way?
I could use a stream editor to double all backslashes, but that seems heavy-handed. Can I safely change the FIELDS ESCAPED BY option to something other than "\", or is that a bad idea? Thanks!
Here is my LOAD DATA INFILE command:
LOAD DATA INFILE 'C:/users/richard/research/data/edgar/masterfiles/master_1996.dat'
INTO TABLE edgar.master
FIELDS TERMINATED BY '|'
IGNORE 1 LINES;
Adding ESCAPED BY '' to my FIELDS clause allows the query to complete without error. I will update this if I find that it caused a silent failure.
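For reference, a sketch of the full command with that clause added (same file and table as above):
LOAD DATA INFILE 'C:/users/richard/research/data/edgar/masterfiles/master_1996.dat'
INTO TABLE edgar.master
FIELDS TERMINATED BY '|'
-- empty escape character: backslashes in the data are read literally
ESCAPED BY ''
IGNORE 1 LINES;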
Hi there, I am new to web development.
I am trying to import a CSV file into MySQL Workbench using the 'Table Data Import Wizard'. However, I have read that my file needs to be a CSV (MS-DOS), or I get the following error: "Can't analyze file. Please try to change encoding type. If that doesn't help, maybe the file is not: csv, or the file is empty."
I cannot use a CSV (MS-DOS), as my data contains a lot of different special characters, including those from Nordic European languages. When I convert my CSV (comma delimited) to CSV (MS-DOS), the special characters are no longer the same.
Is there a way to import a CSV (comma delimited) file into MySQL Workbench? Or is there a better solution for getting my data into the table, such as somehow keeping the special characters intact in the MS-DOS file?
You can import regular CSVs without an issue; just make sure the encoding matches.
Something like
LOAD DATA
INFILE 'yourfile.csv'
INTO TABLE tablename
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
LINES
TERMINATED BY '\n'
IGNORE 1 LINES;
should work. If your CSV doesn't have headers, remove the IGNORE 1 LINES line. If your formatting is different, change the enclosing and terminating characters accordingly.
You can look up the exact syntax in the manual.
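If the wizard's complaint was really about encoding, you can also declare the file's encoding in the statement itself. A sketch, assuming the file is saved as UTF-8 (adjust the character set to match your file):
LOAD DATA INFILE 'yourfile.csv'
INTO TABLE tablename
-- tell the server how the file is encoded instead of re-saving it as MS-DOS
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;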
Your CSV should work fine; you just need LOAD DATA INFILE.
You will likely need to define these settings, though:
LOAD DATA INFILE 'c:/tmp/discounts.csv'
INTO TABLE discounts
-- comma separated? maybe pipes '|'?
FIELDS TERMINATED BY ','
-- what surrounds input and is it optional? then add OPTIONALLY before ENCLOSED
ENCLOSED BY '"'
-- what is at the end of each line?
LINES TERMINATED BY '\n'
-- how many header rows are there, if any?
IGNORE 1 ROWS;
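For instance, if the answers to those comments turned out to be "pipes, optionally quoted, one header row", the sketch would become:
LOAD DATA INFILE 'c:/tmp/discounts.csv'
INTO TABLE discounts
-- pipe-delimited, quotes only around some fields
FIELDS TERMINATED BY '|'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
-- skip the single header row
IGNORE 1 ROWS;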
Data:
1|\N|"First\Line"
2|\N|"Second\Line"
3|100|\N
\N represents NULL in MySQL & MariaDB.
I'm trying to load the above data into a table named ID_OPR using LOAD DATA LOCAL INFILE.
Table structure:
CREATE TABLE ID_OPR (
idnt decimal(4),
age decimal(3),
comment varchar(100)
);
My command looks like this:
LOAD DATA LOCAL INFILE <DATA FILE LOCATION> INTO TABLE <TABLE_NAME> FIELDS TERMINATED BY '|' ESCAPED BY '' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';
The problem with this command is that it aborts with the error Incorrect decimal value: '\\N' for column <column name>.
Question:
How can I load this data with NULL values in the second (decimal) column, and without losing the \ (backslash) in the third (string) column?
I'm trying this in MariaDB, which behaves like MySQL in most cases.
Update:
The error I mentioned appears as a warning, and the data actually does get loaded into the table. But the catch is with the text data.
For example, in the case of the third record above, the literal string \N is loaded into the string column, but I want it to be NULL.
Is there any way to make it recognize this NULL value? Something like DECODE in Oracle?
You can't have it both ways - either \ is an escape character or it is not. From MySQL docs:
If the FIELDS ESCAPED BY character is empty, no characters are escaped and NULL is output as NULL, not \N. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.
So, I'd suggest a consistently formatted input file, however it was generated. Either:
use \\ if you want to keep the backslash in the strings
make \ an escape character in your load command
OR
make strings always, not optionally, enclosed in quotes
leave escape character empty, as is
use NULL for nulls, not \N
BTW, this also explains the warnings you were getting when loading \N into your decimal field.
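A sketch of the second option, assuming the input file can be regenerated. With a non-empty ENCLOSED BY and an empty escape character, an unenclosed NULL field is read as SQL NULL, and the backslashes inside the quoted strings are kept verbatim:
1|NULL|"First\Line"
2|NULL|"Second\Line"
3|100|NULL
-- the path is hypothetical; note ENCLOSED BY is no longer OPTIONALLY
LOAD DATA LOCAL INFILE '/path/to/data.txt'
INTO TABLE ID_OPR
FIELDS TERMINATED BY '|' ENCLOSED BY '"' ESCAPED BY ''
LINES TERMINATED BY '\n';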
Deal with the nulls using blanks; that should fix it:
1||"First\Line"
2||"Second\Line"
3|100|
That's how nulls are handled in CSVs and TSVs. And don't expect the decimal datatype to go NULL, as it stays 0; use int or bigint instead if needed. You should forget about ESCAPED BY; as long as string data is enclosed by "", that deals with the escaping problem.
We need three text files and one batch file to load the data (using Oracle's sqlldr):
Suppose your file location is 'D:\loaddata' and your text file is 'D:\loaddata\abc.txt'.
1. D:\loaddata\abc.bad -- empty
2. D:\loaddata\abc.log -- empty
3. D:\loaddata\abc.ctl
a. Write the code below for data with no separator (fixed column positions):
OPTIONS ( SKIP=1, DIRECT=TRUE, ERRORS=10000000, ROWS=5000000)
load data
infile 'D:\loaddata\abc.txt'
TRUNCATE
into table Your_table
(
a_column POSITION (1:7) char,
b_column POSITION (8:10) char,
c_column POSITION (11:12) char,
d_column POSITION (13:13) char,
f_column POSITION (14:20) char
)
b. Write the code below for comma-separated data:
OPTIONS ( SKIP=1, DIRECT=TRUE, ERRORS=10000000, ROWS=5000000)
load data
infile 'D:\loaddata\abc.txt'
TRUNCATE
into table Your_table
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
(a_column,
b_column,
c_column,
d_column,
e_column,
f_column
)
4. D:\loaddata\abc.bat (write the code below):
sqlldr db_user/db_password@your_tns control=D:\loaddata\abc.ctl log=D:\loaddata\abc.log
After double-clicking the "D:\loaddata\abc.bat" file, your data will be loaded into the desired Oracle table. If anything goes wrong, check your "D:\loaddata\abc.bad" and "D:\loaddata\abc.log" files.
Sorry for the wordy title. I'm running the following query:
LOAD DATA LOCAL INFILE 'file.csv' INTO TABLE tablename
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY ',\r\n';
I have a CSV with data in the following format:
1111, 2222, "32", "1,234.304", 1023.53,
All lines are terminated with an extra comma.
When attempting to load, I get an error when trying to put a "1,234.304"-type field into a NUMERIC(12,4) column. I end up with 1.0000 and a warning that the data was truncated; I had expected 1234.304.
Edit: It seems that even a regular INSERT of a value with comma thousand-separators upsets MySQL. Is there a way to modify the load command to get the desired behavior?
Break the problem down into two steps:
Get the data into the database
Get the data into the table/format you want it in
If you get all the data into a table with VARCHAR fields, you can easily manipulate it and explicitly convert the values into the types you want them in, rather than relying on whatever implicit conversion you get with LOAD DATA LOCAL INFILE.
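A minimal sketch of that two-step approach; the staging table and the final column name here are hypothetical:
-- step 1: stage everything as text
CREATE TABLE staging (
  c1 VARCHAR(50), c2 VARCHAR(50), c3 VARCHAR(50),
  c4 VARCHAR(50), c5 VARCHAR(50)
);
LOAD DATA LOCAL INFILE 'file.csv' INTO TABLE staging
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY ',\r\n';
-- step 2: convert explicitly, stripping the thousands separators
INSERT INTO tablename (amount)
SELECT CAST(REPLACE(c4, ',', '') AS DECIMAL(12,4)) FROM staging;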
I have .csv file data like this:
"UPRR 38 PAN AM "M"","1"
and I loaded the data into a table with two columns (a and b) using the command below.
LOAD DATA LOCAL INFILE 'E:\monthly_data.csv'
INTO TABLE test_data_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n';
But when I select from the table, I get the unexpected results shown below.
a contains:
UPRR 38 PAN AM "M","1
... and b is NULL.
Thanks
You can replace all the instances of the "double quote double quote" sequence in your file:
either A. open the files and find-and-replace them,
or B. make a script to open the files and replace the extra quote that is messing things up.
You have this:
ENCLOSED BY '"'
Thus " is not a regular character any more. It's a special character that has a special meaning: it highlights the start and end of a column value. If you want to type a " that does not behave that way you need to escape it. The RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files document explains how to do that:
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote
a;b
"UPRR 38 PAN AM ""M""";1
As they say, garbage in, garbage out ;-)
I'm trying to import a large CSV file with 27797 rows into MySQL. Here is my code:
load data local infile 'foo.csv' into table bar fields terminated by ',' enclosed by '"' lines terminated by '\n' ignore 1 lines;
It works fine. However, some rows of this file contain backslashes (\), for example:
"40395383771234304","40393156566585344","84996340","","","2011-02-23 12:59:44 +0000","引力波宇宙广播系统零号控制站","#woiu 太好了"
"40395151830421504","40392270645563392","23063222","","","2011-02-23 12:58:49 +0000","引力波宇宙广播系统零号控制站","#wx0 确切地讲安全电压是\""不高于36V\""而不是\""36V\"", 呵呵. 话说要如何才能测它的电压呢?"
"40391869477158912","40390512645124096","23063222","","","2011-02-23 12:45:46 +0000","引力波宇宙广播系统零号控制站","#wx0 这是别人的测量结果, 我没验证过. 不过麻麻的感觉的确是存在的, 而且用适配器充电时麻感比用电脑的前置USB接口充电高"
"15637769883","15637418359","35192559","","","2010-06-07 15:44:15 +0000","强互作用力宇宙探测器","#Hc95 那就不是DOS程序啦,只是个命令行程序,就像Android里的adb.exe。$ adb push d:\hc95.tar.gz /tmp/ $ adb pull /system/hc95/eyes d:\re\"
After importing, lines with backslashes are broken.
How can I fix this? Should I use sed or awk to substitute all \ with \\ (within 27797 rows...)? Or can this be fixed by just modifying the SQL query?
This is a bit more of a discussion than a direct answer. Do you need the double quotes in the middle of the values in the final data (in the DB)? The fact that you have a large amount of data to munge doesn't present any problems at all.
The "" thing is what Oracle does for quotes inside strings. I think whatever built that file attempted to escape the quote sequence. See the string-literal section of the MySQL manual. Either of these is valid:
select "hel""lo", "\"hello";
I would tend to do the editing separately from the import, so it's easier/faster to see whether things worked. If your text file is less than 10MB, it shouldn't take more than a minute to update it via sed:
sed -i -e 's/\\//g' foo.csv
From your comments, you can set the escape char to be something other than '\'.
ESCAPED BY 'char'
This means the loader will add the values verbatim. If it gets too complicated: if you Base64-encode the data before you insert it, this will stop any tools from breaking the UTF-8 sequences.
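For example, a sketch of your original command with the escape character emptied out, so backslashes in the data are taken verbatim (a doubled "" inside an enclosed field still collapses to a single quote):
load data local infile 'foo.csv' into table bar
fields terminated by ',' enclosed by '"' escaped by ''
lines terminated by '\n' ignore 1 lines;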
What I did in a similar situation was to build the query string in a test Java application first, then compile the test class and fix any errors that I found.
For example:
String me = "LOAD DATA LOCAL INFILE 'X:/access.log/' REPLACE INTO TABLE `logrecords`\n" +
    "FIELDS TERMINATED BY '|'\n" +
    "ENCLOSED BY '\"'\n" +
    "ESCAPED BY '\\\\'\n" +
    "LINES TERMINATED BY '\\r\\n' (\n" +
    "`startDate`,\n" +
    "`IP`,\n" +
    "`request`,\n" +
    "`threshold`,\n" +
    "`useragent`\n" +
    ")";
// Print the assembled statement so it can be checked and pasted into MySQL
System.out.println(me);