Error with MySQL LOAD DATA INFILE on a field with double quotes

I have a .csv file with data like this:
"UPRR 38 PAN AM "M"","1"
I loaded the data into a table with two columns (a and b) using the command below:
LOAD DATA LOCAL INFILE 'E:\monthly_data.csv'
INTO TABLE test_data_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n';
But when I select from the table, I get the unexpected result shown below.
a contains:
UPRR 38 PAN AM "M","1
... and b is NULL.
Thanks

You can replace all the instances of two consecutive double quotes ("") in your file. Either:
A. open the file and use find and replace, or
B. write a script that opens the file and replaces the extra quote that is causing the problem.
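Option B could look something like the following Python sketch. It is only a heuristic: it assumes that fields are separated by commas, that every field is enclosed in double quotes, and that no field contains a quote directly next to a comma. The name fix_embedded_quotes is made up for illustration.

```python
import re

# A quote counts as "embedded" when it is neither at the start of a field
# (preceded by a comma or the start of a line) nor at the end of a field
# (followed by a comma or the end of a line).
EMBEDDED_QUOTE = re.compile(r'(?<=[^,\r\n])"(?=[^,\r\n])')


def fix_embedded_quotes(text):
    """Double every embedded quote so the data follows the CSV convention."""
    return EMBEDDED_QUOTE.sub('""', text)


if __name__ == "__main__":
    broken = '"UPRR 38 PAN AM "M"","1"\r\n'
    print(fix_embedded_quotes(broken), end="")
```

Run on the sample line from the question, this should turn "M" inside the first field into ""M"" and leave the enclosing quotes alone.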

You have this:
ENCLOSED BY '"'
Thus " is not a regular character any more. It's a special character that has a special meaning: it highlights the start and end of a column value. If you want to type a " that does not behave that way you need to escape it. The RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files document explains how to do that:
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote
a;b
"UPRR 38 PAN AM ""M""";1
As they say, garbage in, garbage out ;-)
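One way to check that a line is escaped correctly before handing it to MySQL is to parse it with a CSV reader that follows the same doubling rule. The sketch below uses Python's standard csv module on the corrected line above; the helper name parse_csv is an assumption for illustration.

```python
import csv
import io


def parse_csv(text, delimiter=","):
    """Parse CSV text, treating a doubled quote inside a quoted field as one quote."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return list(reader)


if __name__ == "__main__":
    corrected = '"UPRR 38 PAN AM ""M""";1\r\n'
    for row in parse_csv(corrected, delimiter=";"):
        print(row)
```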

Related

MySQL bulk load

I'm trying to load CSV files into a MySQL table.
Delimiter: , (comma)
In the source data, a few of the field values are enclosed in double quotes, and inside the double quotes there are commas.
There are a few records in which / is part of the field data and needs to be escaped.
By default / is escaped, and when I specify " as the escape character, " is escaped instead. Since we have multiple special characters in the same file, we need to escape more than one of them.
Any suggestions?
Eg:
id name location
1 A "Location , name here"
2 B "Different Location"
3 C Another Location
4 D Location / with escape character
LOAD DATA LOCAL INFILE 'data.csv' INTO TABLE table_name FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES;
I don't think this is possible. According to the LOAD DATA reference:
Any of the field- or line-handling options can specify an empty string (''). If not empty, the FIELDS [OPTIONALLY] ENCLOSED BY and FIELDS ESCAPED BY values must be a single character.
So only a single character is supported for ESCAPED BY.
My suggestion is to use a programming language (e.g. PHP, C#, etc.) to open the file and process it line by line with regular expressions before loading it.
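As a rough sketch of that kind of preprocessing, the Python snippet below reads the file with a CSV parser instead of regular expressions (a swapped-in technique, not what the answer literally proposes) and rewrites it so that every field is quoted and embedded quotes are doubled. The function name normalize_csv is hypothetical. The assumption, not tested here, is that the normalized file can then be loaded with FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '', so that no escape character is needed at all.

```python
import csv
import io


def normalize_csv(text):
    """Rewrite CSV text so every field is quoted and inner quotes are doubled."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "id,name,location\n"
        '1,A,"Location , name here"\n'
        "4,D,Location / with escape character\n"
    )
    print(normalize_csv(sample), end="")
```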

formatting MySQL output to valid CSV or XLSX

I have a query whose output I format and dump onto a CSV file.
This is the code I'm using,
(query.....)
INTO OUTFILE "/tmp/dump.csv"
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
However when I open the CSV in Google Sheets or Excel, the columns are broken up into hundreds of smaller ones.
When I open the CSV in a plain text editor, I see that the column values themselves contain quotes (single and double), commas, and line breaks.
Only the double-quotes are escaped.
Even though the double-quotes are escaped, they are omitted when interpreted by Google Sheets and Excel.
I tried manually editing the CSV entries, escaping the commas and such, but had no luck. The commas still break the columns. However, in a couple of instances they didn't break the column, and I am not able to figure out why.
So my question is: how do I correctly format the output to accommodate these characters and dump it to a CSV, or even an XLSX (in case a CSV cannot handle situations like these)?
For context, I'm operating in a WordPress environment. If there is a solution in PHP, that can work too.
EDIT ::
Here is a sample line from the CSV,
"1369","Blaze Pannier Mounts for KTM Duke 200 & 390","HTA.04.740.80200/B","<strong>Product Description</strong><span data-sheets-value=\"[null,2,"SW Motech brings you the Blaze Pannier Brackets for the Duke 200 & 390. "]\" data-sheets-userformat=\"[null,null,15293,[null,0],11]\">SW Motech brings you the Blaze Pannier Brackets for the Duke 200 & 390.</span>"," <strong>What's in the box? </strong><span data-sheets-value=\"[null,2,"2 Quick Lock SupportsnMounting materialnMounting Instructions"]\" data-sheets-userformat=\"[null,null,15293,[null,0],null,[null,[[null,2,0,null,null,[null,2,13421772]],[null,0,0,3],[null,1,0,null,1]]],[null,[[null,2,0,null,null,[null,2,13421772]],[null,0,0,3],[null,1,0,null,1]]],[null,[[null,2,0,null,null,[null,2,13421772]],[null,0,0,3],[null,1,0,null,1]]],[null,[[null,2,0,null,null,[null,2,13421772]],[null,0,0,3],[null,1,0,null,1]]],null,0,1,0,null,[null,2,0],"calibri,arial,sans,sans-serif",11]\">2 Quick Lock SupportsMounting materialMounting Instructions</span> ","Installation Instructions"
From RFC 4180
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
Any double quotes inside fields enclosed with double quotes need to be escaped with another double quote. So, given the three values abc, ab"c and ", the expected formatting would be abc,"ab""c","""".
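If the export can be done from a script instead of INTO OUTFILE, a CSV library will apply this doubling rule for you. The following Python sketch is only an illustration: rows_to_csv is a made-up helper, and the rows are assumed to have already been fetched from the database as lists of strings.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows with every field quoted and inner quotes doubled."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # A field containing double quotes, a comma and a line break.
    rows = [["1369", 'say "hi", then\nleave']]
    print(rows_to_csv(rows), end="")
```

Commas and line breaks need no escaping of their own here, because they sit inside a quoted field; only the double quotes are doubled.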

Unable to import UTF-8 .txt file into MySQLWorkbench

I have a .txt file in UTF-8 format with information I am trying to import into an existing table that already has rows in it.
The information in the .txt file is structured like this: (the quotes are included in the .txt file)
"Bob,Smith,25,California,,,,Single,"
"John,Doe,72,Nevada,,2,1,Married,"
"Will,Smith,22,Texas,1000005,2,1,Married,"
The query I'm using is:
LOAD DATA LOCAL INFILE 'myfile.txt' INTO TABLE mytable FIELDS ENCLOSED BY '"' TERMINATED BY ',' LINES TERMINATED BY '\n'
What happens is that all of these records get inserted but get inserted like this
Bob,null,null,null,null,null,null,null
John,null,null,null,null,null,null,null
Will,null,null,null,null,null,null,null
It's like the " is not being caught at the end, or something weird. Am I doing something wrong here?
According to the example data provided your fields are not enclosed by quotes but rather the whole record is.
You can use the STARTING BY option to ignore the initial quote, and the trailing one should be ignored automatically.
I think what you need is this:
LOAD DATA LOCAL INFILE 'your_file.txt' INTO TABLE your_table FIELDS TERMINATED BY ',' LINES STARTING BY '"' TERMINATED BY '\n';
The format of your text file does not make any sense. (Strangely enough, the way the importer handles it does not make any sense either. But we have no control over the importer, so let us ignore that.)
The double quotes must surround each field, not each line. The ENCLOSED BY '"' clause refers to the fields, not to the lines. So, you are telling the importer that your fields are enclosed in quotes, but you are enclosing your lines in quotes. So, the importer considers each one of your lines as a field.
(Then, the importer proceeds to further chop up your lines at the comma, which makes no sense because the commas are within the quotes, so it should be ignoring them, so the importer is kind of brain-damaged too.)
With FIELDS ENCLOSED BY '"', the input file must enclose EACH field in double quotes. Therefore the input file should look as follows:
"Bob","Smith","25","California","","","","Single",
"John","Doe","72","Nevada","","2","1","Married",
"Will","Smith","22","Texas","1000005","2","1","Married",
That should load the data into the right fields.
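If the file cannot be regenerated at the source, a small script can do the conversion. This Python sketch turns one line that is quoted as a whole into a line with each field quoted, keeping the trailing comma as in the example above. It assumes that no field contains a comma, and requote_line is a hypothetical name.

```python
def requote_line(line):
    """Turn '"a,b,c,"' into '"a","b","c",' (each field quoted, trailing comma kept)."""
    inner = line.strip()
    # Drop the pair of quotes that wraps the whole line.
    if len(inner) >= 2 and inner.startswith('"') and inner.endswith('"'):
        inner = inner[1:-1]
    fields = inner.split(",")
    # The trailing comma produces one empty piece at the end; discard it.
    if fields and fields[-1] == "":
        fields.pop()
    # Quote each field, doubling any quote that might appear inside it.
    return "".join('"' + f.replace('"', '""') + '",' for f in fields)


if __name__ == "__main__":
    print(requote_line('"Bob,Smith,25,California,,,,Single,"'))
```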

Prevent LOAD DATA INFILE from escaping double double quotes

I have csv data like the following:
"E12 98003";1085894;"HELLA";"8GS007949261";"";1
"5 3/4"";652493;"HELLA";"9HD140976001";"";1
Some fields are enclosed in double quotes. The problem is that, as you can see in the second line, the data in the first column contains a double quotation mark at the end as part of the data.
I tried something along the lines of:
LOAD DATA INFILE 'file.csv'
INTO TABLE mytable
FIELDS TERMINATED BY ';' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
but it will use the quotation mark in the data to escape the field enclosing quotation mark. I also tried ESCAPED BY '' and ESCAPED BY '\\' with no success.
Is there a way to stop the LOAD DATA INFILE command from escaping the double double quotation marks?
Or should I parse the csv and put double quotation marks when there is only one?
I am parsing the files anyway using powershell to change the encoding to utf8. Is there some way to fix this quickly there? My powershell code:
function Convert-FileToUTF8 {
    param([string]$infile,
          [string]$outfile,
          [System.Int32]$encodingCode)
    $encoding = [System.Text.Encoding]::GetEncoding($encodingCode)
    $text = [System.IO.File]::ReadAllText($infile, $encoding)
    [System.IO.File]::WriteAllText($outfile, $text)
}
Ok, I did it using a .NET regular expression to fix the csv. It is costly, but not too much.
I wrote
$text = [regex]::Replace($text, "(?m)(?<!^)(?<!\;)""(?!\;)(?!\r?$)", '""');
just before the last line in the function and it seems to work ok. Since I am a novice in regular expressions this could probably be improved.
The main problem is that the input data constitutes invalid CSV syntax, as stated in RFC-4180, paragraph 7:
If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.
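But in your PowerShell script you could try to fix this with one extra line, applied to $text once you have its value. A plain $text.Replace('"";', '""";') would also rewrite genuinely empty fields ("" followed by ;, as in the fifth column of your sample), so the version below only matches when the quotes follow a character that is not a separator or a line break:
$text = [regex]::Replace($text, '(?<=[^;\r\n])"";', '""";')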
This should be enough, as the loader will deal well with unescaped double quotes if they appear elsewhere in the data, as stated on mysql.com (my highlight):
If the field begins with the ENCLOSED BY character, instances of that character are recognized as terminating a field value only if followed by the field or line TERMINATED BY sequence.
Of course, if the badly formatted CSV has data that contains ";, then you still have a problem. But it is very hard to determine whether such an occurrence terminates the data or should be seen as part of the data, even for humans :-)
Another thing to pay attention to as found on mysql.com:
If the input values are not necessarily enclosed within quotation marks, use OPTIONALLY before the ENCLOSED BY keywords.
In addition: importing CSV files into MySQL with values enclosed in quotes works fine with the ENCLOSED BY option, UNLESS the enclosed field is the last field in a row AND you used Excel to create the CSV file. Excel omits the field separator after the last field in a row. MySQL doesn't mind, unless the last field is enclosed in quotes. Then the import terminates at that line.
Examples:
This works fine: ...;value2;value3 (no trailing separator)
This also works fine ...;"value 2";value3 (value enclosed in quotes)
This also works fine ...;value 2;"value3"; (last field value enclosed in quotes and trailing separator)
But this breaks the import: ...;value2;"value 3" (last field value enclosed in quotes and no trailing separator)
Took me some time to figure this out; hope sharing this saves somebody else that time.

MySQL fields terminated by tab

I am trying to upload a tab-delimited file with MySQL. I want a query something like this:
LOAD DATA LOCAL INFILE 'file' INTO TABLE tbl FIELDS TERMINATED BY 'TAB'
Is there something I can substitute for TAB to make this work?
Have you tried '\t'? The escape character followed by "t" is treated as a tab. I haven't tried it, but it might be what you need.
Just tried to find the answer to this question myself to save re-saving my file with commas separating instead of tabs...
From an old MySQL reference manual, a long way down the page, you can find that TAB is the default separator for files loaded using LOAD DATA in MySQL.
See: http://dev.mysql.com/doc/refman/4.1/en/load-data.html
I just loaded a CSV file in this way into MySQL5.1.
BW
Try this one:
FIELDS TERMINATED BY '\t'
Note :
Field and Line Handling
For both the LOAD DATA and SELECT ... INTO OUTFILE statements, the syntax of the FIELDS and LINES clauses is the same. Both clauses are optional, but FIELDS must precede LINES if both are specified.
If you specify a FIELDS clause, each of its subclauses (TERMINATED BY, [OPTIONALLY] ENCLOSED BY, and ESCAPED BY) is also optional, except that you must specify at least one of them. Arguments to these clauses are permitted to contain only ASCII characters.
If you specify no FIELDS or LINES clause, the defaults are the same as if you had written this:
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
LINES TERMINATED BY '\n' STARTING BY ''
Backslash is the MySQL escape character within strings in SQL statements. Thus, to specify a literal backslash, you must specify two backslashes for the value to be interpreted as a single backslash. The escape sequences '\t' and '\n' specify tab and newline characters, respectively.
In other words, the defaults cause LOAD DATA to act as follows when reading input:
Look for line boundaries at newlines.
Do not skip any line prefix.
Break lines into fields at tabs.
Do not expect fields to be enclosed within any quoting characters.
Interpret characters preceded by the escape character \ as escape sequences. For example, \t, \n, and \\ signify tab, newline, and backslash, respectively. See the discussion of FIELDS ESCAPED BY later for the full list of escape sequences.
Conversely, the defaults cause SELECT ... INTO OUTFILE to act as follows when writing output:
Write tabs between fields.
Do not enclose fields within any quoting characters.
Use \ to escape instances of tab, newline, or \ that occur within field values.
Write newlines at the ends of lines.
See https://dev.mysql.com/doc/refman/8.0/en/load-data.html for more details.
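To make the default rules above concrete, here is a minimal Python sketch that writes one line in a form LOAD DATA should be able to read with no FIELDS or LINES clause at all. It only handles the three sequences mentioned in the quoted text (\t, \n and \\), and the helper names escape_field and to_tsv_line are made up for illustration.

```python
def escape_field(value):
    """Escape backslash, tab and newline the way the default reader expects."""
    return (
        value.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace("\t", "\\t")
        .replace("\n", "\\n")
    )


def to_tsv_line(fields):
    """Join escaped fields with real tabs and end the line with a newline."""
    return "\t".join(escape_field(f) for f in fields) + "\n"


if __name__ == "__main__":
    # Fields containing a tab, a backslash and a newline respectively.
    print(to_tsv_line(["a\tb", "c\\d", "e\nf"]), end="")
```

The backslash is replaced first on purpose: doing it last would double the backslashes that the tab and newline escapes just introduced.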