I'm trying to import a large CSV file with 27,797 rows into MySQL. Here is my code:
load data local infile 'foo.csv' into table bar fields terminated by ',' enclosed by '"' lines terminated by '\n' ignore 1 lines;
It works fine. However, some rows of this file contain backslashes (\), for example:
"40395383771234304","40393156566585344","84996340","","","2011-02-23 12:59:44 +0000","引力波宇宙广播系统零号控制站","#woiu 太好了"
"40395151830421504","40392270645563392","23063222","","","2011-02-23 12:58:49 +0000","引力波宇宙广播系统零号控制站","#wx0 确切地讲安全电压是\""不高于36V\""而不是\""36V\"", 呵呵. 话说要如何才能测它的电压呢?"
"40391869477158912","40390512645124096","23063222","","","2011-02-23 12:45:46 +0000","引力波宇宙广播系统零号控制站","#wx0 这是别人的测量结果, 我没验证过. 不过麻麻的感觉的确是存在的, 而且用适配器充电时麻感比用电脑的前置USB接口充电高"
"15637769883","15637418359","35192559","","","2010-06-07 15:44:15 +0000","强互作用力宇宙探测器","#Hc95 那就不是DOS程序啦,只是个命令行程序,就像Android里的adb.exe。$ adb push d:\hc95.tar.gz /tmp/ $ adb pull /system/hc95/eyes d:\re\"
After importing, the rows that contain backslashes come out broken.
How can I fix this? Should I use sed or awk to substitute every \ with \\ (across all 27,797 rows...)? Or can it be fixed just by modifying the SQL statement?
This is a bit more of a discussion than a direct answer. Do you need the double quotes in the middle of the values in the final data (in the DB)? The fact that you have a large amount of data to munge doesn't present any problems at all.
The "" thing is what Oracle does for quotes inside strings. I think whatever built that file attempted to escape the quote sequence. This is the string manual for MySQL. Either of these is valid::
select "hel""lo", "\"hello";
I would tend to do the editing separately from the import, so it's easier/faster to see whether things worked. If your text file is less than 10 MB, it shouldn't take more than a minute to update it via sed:
sed -e 's/\\//g' foo.csv   # the g flag strips every backslash on a line, not just the first
From your comments, you can set the escape character to something other than '\':
ESCAPED BY 'char'
This means the loader will add the values verbatim. If it gets too complicated, you can base64() the data before you insert it; that stops any tools from breaking the UTF-8 sequences.
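For example, applied to the statement from the question: an empty escape character disables escape-sequence handling entirely, so backslashes load verbatim (a sketch; only appropriate if the file contains no escape sequences you actually want interpreted):
load data local infile 'foo.csv' into table bar
    fields terminated by ',' enclosed by '"' escaped by ''
    lines terminated by '\n'
    ignore 1 lines;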
What I did in a similar situation was to build the statement as a Java string first in a test application, then compile the test class and fix any errors I found.
For example:
String me = "LOAD DATA LOCAL INFILE 'X:/access.log/' REPLACE INTO TABLE `logrecords`\n" +
            "FIELDS TERMINATED BY '|'\n" +
            "ENCLOSED BY '\"'\n" +
            "ESCAPED BY '\\\\'\n" +
            "LINES TERMINATED BY '\\r\\n'(\n" +
            "`startDate` ,\n" +
            "`IP` ,\n" +
            "`request` ,\n" +
            "`threshold` ,\n" +
            "`useragent`\n" +
            ")";
System.out.println(me);
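Printed, that string resolves to the following statement:
LOAD DATA LOCAL INFILE 'X:/access.log/' REPLACE INTO TABLE `logrecords`
FIELDS TERMINATED BY '|'
ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\r\n'(
`startDate` ,
`IP` ,
`request` ,
`threshold` ,
`useragent`
)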
Related
I have a CSV file delimited by ';'. It looks like the following (the third column should be empty):
id;theme_name;description
5cbde2fe-b70a-5245-bbde-c2504a4bced1;DevTools;allow web developers to test and debug their code. They are different from website builders and integrated development environments (IDEs) in that they do not assist in the direct creation of a webpage, rather they are tools used for testing the user interface of a website or web application.
c8bfc406-aaa6-5cf9-94c3-09fc54b934e7;AI;
Here is my script for inserting the data from the CSV into the DB:
mysql -u ${MYSQL_USER} --password=${MYSQL_PASSWORD} ${MYSQL_DATABASE} --local-infile=1 -h ${MYSQL_HOST} -e"LOAD DATA LOCAL INFILE '/tmp/init_data/$file' INTO TABLE $table_name FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '\"' IGNORE 1 LINES";
When I run a SELECT statement, I get a carriage return (\r) in the last column.
Here is the response from MySQL:
{
"themeName": "DevTools",
"description": "allow web developers to test and debug their code. They are different from website builders and integrated development environments (IDEs) in that they do not assist in the direct creation of a webpage, rather they are tools used for testing the user interface of a website or web application.\r"
}, {
"themeName": "AI",
"description": "\r"
}
When I add a ';' delimiter after the last column in the CSV file, the carriage return disappears from the response,
for example: c8bfc406-aaa6-5cf9-94c3-09fc54b934e7;AI;;
Why does MySQL add \r to the third column?
Is there any way to solve this (other than a REPLACE in the SELECT statement)?
Thanks
I bet your CSV file comes from Windows. Those files have \r\n at the end of every line.
Add this to your command ...
LINES TERMINATED BY '\\r\\n'
and things should work. If they don't, try this:
LINES TERMINATED BY '\r\n'
On UNIX-derived systems (like Linux) the default LINES TERMINATED BY '\n' works.
On files from classic Mac OS you need LINES TERMINATED BY '\r' (modern macOS is UNIX-derived and uses '\n').
If you add a trailing ; column separator you create a fourth column in your CSV. But the table you're loading only has three columns, so LOAD DATA INFILE ignores that fourth column, which has a \r in it.
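If you genuinely wanted to keep the trailing separator, you could load the phantom fourth column into a user variable and discard it. A sketch, assuming a table whose columns match the CSV header (id, theme_name, description); the table and file names here are placeholders:
LOAD DATA LOCAL INFILE '/tmp/init_data/themes.csv' INTO TABLE themes
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(id, theme_name, description, @dummy);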
Why this difference, you ask? Old-timey teletype devices (the original terminals for MS-DOS and UNIX) needed a Return code to move the print head back to the first column, and a Newline code to move the paper up one line. The UNIX Bell Labs team decided their tty driver would add the extra Return code so lines ended with a particular single character. MS-DOS's team (umm, Bill Gates) left them both in.
Why Mac with just Return? Maybe somebody knows.
Answer:
Following O. Jones's answer I added LINES TERMINATED BY '\r', but I also needed the \n, so the terminator that worked was '\r\n':
mysql -u ${MYSQL_USER} --password=${MYSQL_PASSWORD} ${MYSQL_DATABASE} --local-infile=1 -h ${MYSQL_HOST} -e"LOAD DATA LOCAL INFILE '/tmp/init_data/$file' INTO TABLE $table_name FIELDS TERMINATED BY ';' ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES";
I have a CSV file with millions of rows. Here is the command I am using to load the data:
load data local infile 'myfile' into table test.mytable
fields terminated by ',' optionally enclosed by '"'
lines terminated by '\n' ignore 1 lines
This handles almost everything except lines where there are double quotes inside a double-quoted string, as in:
"first column",second column,"third column has "double quotes" inside", fourth column
It truncates the third column and gives me a warning that this row does not contain data for all columns.
Appreciate your help
The CSV is broken. There is no way MySQL or any other program can import it as-is: double quotes inside a quoted column need to be escaped.
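For reference, a fixed version of the problem row using the doubled-quote convention that LOAD DATA recognizes for the ENCLOSED BY character:
"first column",second column,"third column has ""double quotes"" inside",fourth column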
You might be able to fix the CSV with a script. If a quote doesn't have a comma directly before or after it, it's probably part of the text and should be escaped.
The following regular expression uses a negative lookbehind and lookahead to find quotes that don't have a comma (or the start or end of the line) directly before or after them:
/(?<!^)(?<!,)(\s*)"(\s*)(?!,)(?!$)/
See it on regex101
On the command line you can run:
perl -pe 's/(?<!,)(?<!^)(\s*)"(\s*)(?!,)(?!$)/\1\\"\2/g' data.csv > data-fixed.csv
Note that this method isn't foolproof. If a double quote is part of the text but does have a comma next to it, there is little you can do to fix the CSV: the script simply has no way of knowing whether that quote is a column delimiter or not.
Try this:
mysqlimport --fields-optionally-enclosed-by='"' --fields-terminated-by=, --lines-terminated-by="\r\n" --user=YOUR_USERNAME --password YOUR_DATABASE YOUR_TABLE.csv
Note that mysqlimport derives the table name from the file name (minus the extension), so the file must be named after the target table.
Good Day
I have created a .bat file to import a text file into my MySQL database, and it looks as follows:
sqlcmd /user root /pass password /db "MyDB" /command "LOAD DATA LOCAL INFILE 'file.csv' INTO TABLE TG_Orders FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'"
My problem is that I cannot get the "Treat consecutive delimiters as one" to work...
How would I add that?
Now that we have actually got to the real crux of the problem, this is not a consecutive delimiter problem - it's a CSV file format problem.
If your CSV file contains fields like B121,535 that are not enclosed within quote marks of some kind, and your delimiter is ,, then no amount of SQL jiggery-pokery will sort out your problem. Un-quoted fields with commas like this will always be interpreted as two separate fields unless enclosed within quote marks.
Post a sample line from the CSV file which is causing problems and we can diagnose further. Failing that, export the data from the original system again, making sure the formatting is correct (either enclose everything in quote marks, or at least all string fields). Once the fields are quoted properly, a load along the lines of the sketch below should work.
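A sketch, reusing the table and file names from your batch file:
LOAD DATA LOCAL INFILE 'file.csv' INTO TABLE TG_Orders
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';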
Finally, are you sure that your database is MySQL and not Microsoft SQL Server? The only references to SQLCMD.EXE I can find all point to Microsoft sites in relation to SQL Server Express, and even then it has a different option structure (-U for user rather than /user). If this is the case, you could have saved a lot of hassle by tagging the question correctly. If not, then I would say SQLCMD.EXE is a custom-written application from somewhere, and the problem could stem from that; in that case, if the CSV formatting turns out to be correct, we can't help - you're on your own.
I downloaded a tab-delimited file from a well-known source and now want to upload it into a MySQL table. I am doing this using load data local infile.
This data file, which has over 10 million records, also has the misfortune of many backslashes.
$ grep '\\' tabd_file.txt | wc -l
223212
These backslashes aren't a problem, except when they come at the end of fields. MySQL interprets backslashes as an escape character, and when it comes at the end of the field, it messes up the next field, or possibly the next row.
In spite of these backslashes, I only received 6 warnings from MySQL when loading it into a table. In each of these warnings, a row doesn't have the proper number of columns precisely because the backslash concatenated two adjacent fields in the same row.
My question is, how to deal with these backslashes? Should I specify load data local infile [...] escaped by '' to remove any special meaning from them? Or would this have unintended consequences? I can't think of a single important use of an escape sequence in this data file. The actual tabs that terminate fields are "physical tabs", not "\t" sequences.
Or, is removing the escape character from my load command bad practice? Should I just replace every instance of '\' in the file with '\\'?
Thanks for any advice :-)
If you don't need the escaping, then definitely use ESCAPED BY ''.
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
"If the FIELDS ESCAPED BY character is empty, escape-sequence interpretation does not occur. "
I am using LOAD DATA INFILE to import twenty |-delimited .dat files into a MySQL table, but some of the | field terminators are escaped with a backslash. The second field below is an example:
1001604|EMERITUS CORP\WA\|SC 13G|1996-02-13|edgar/data/1001604/0000728757-96-000006.txt
1001604|EMERITUS CORP\WA\|SC 13G|1996-02-14|edgar/data/1001604/0000903949-96-000038.txt
I get an error because the escaped terminators shift the fields over, so the last field clashes with the DATE type declared for the next-to-last column. I can open the .dat file and escape the escapes by hand, but is there a better way?
I could use a stream editor to double all the backslashes, but that seems heavy-handed. Can I safely change the FIELDS ESCAPED BY option to something other than '\', or is that a bad idea? Thanks!
Here is my LOAD DATA INFILE command:
LOAD DATA INFILE 'C:/users/richard/research/data/edgar/masterfiles/master_1996.dat'
INTO TABLE edgar.master
FIELDS TERMINATED BY '|'
IGNORE 1 LINES;
Adding ESCAPED BY '' to my FIELDS clause allows the query to complete without error. I will update this answer if I find that it caused a silent failure.
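For reference, the full statement with that change:
LOAD DATA INFILE 'C:/users/richard/research/data/edgar/masterfiles/master_1996.dat'
INTO TABLE edgar.master
FIELDS TERMINATED BY '|' ESCAPED BY ''
IGNORE 1 LINES;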