I'm trying to load CSV files into a MySQL table.
Delimiter: , (comma)
In the source data, some field values are enclosed in double quotes, and the quoted values can contain commas.
There are a few records in which / is part of the field data, and we need to escape it.
By default, / is being escaped, and when I specify " as the escape character, " is escaped instead. Since the same file contains several special characters, we need to escape more than one of them.
Any suggestions?
Eg:
id name location
1 A "Location , name here"
2 B "Different Location"
3 C Another Location
4 D Location / with escape character
LOAD DATA LOCAL INFILE 'data.csv' INTO TABLE table_name FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES;
I don't think that's possible. From the LOAD DATA reference:
Any of the field- or line-handling options can specify an empty string (''). If not empty, the FIELDS [OPTIONALLY] ENCLOSED BY and FIELDS ESCAPED BY values must be a single character.
Only a single character is supported for ESCAPED BY.
My suggestion is to use a programming language (e.g. PHP, C#, etc.) to open the file and process it line by line, for example with regular expressions.
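As a rough illustration of that preprocessing idea, here is a minimal Python sketch. It uses the standard csv module rather than regular expressions, and it assumes the file really is comma-separated as the LOAD DATA statement says. The function name normalize_csv and the sample text are made up for the example. It rewrites the data so that every field is enclosed in double quotes, which makes the file uniform before it is handed to LOAD DATA.

```python
import csv
import io

def normalize_csv(text):
    """Rewrite CSV text so that every field is enclosed in double quotes.

    Hypothetical helper: the csv module parses the optionally quoted input,
    and QUOTE_ALL re-emits each field with quotes around it.
    """
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

# Sample rows modelled on the question (assumed to be comma-separated).
sample = (
    'id,name,location\n'
    '1,A,"Location , name here"\n'
    '4,D,Location / with escape character\n'
)
print(normalize_csv(sample))
```

The normalized file could then be loaded with ENCLOSED BY '"' instead of OPTIONALLY ENCLOSED BY '"'. Any further escaping rules would have to be added inside the loop.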
I'm using the following to load a CSV file into a MySQL database:
LOAD DATA LOCAL INFILE 'file_location.csv'
INTO TABLE social_spend
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
I have a text field written as 'D,i,Y', which shows up unchanged in my database after the upload: 'D,i,Y'
I have a numeric field written as '1,234', which is automatically truncated on upload to '1'
Why are commas preserved in text fields but not in numeric fields?
When MySQL encounters a text string in a numeric context, it coerces it to a number. Sloppily.
So if you use the string 00134abc in a numeric context, you get the number 134. And, in your case it sees 1,234 and coerces it to 1.
Yeah, we all know about this being a pain in the xxx neck.
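To make the coercion rule concrete, here is a small Python sketch that imitates it for integers only. This is a rough emulation for illustration, not MySQL's actual conversion code (which also handles decimals and exponents), and the function name leading_integer is made up.

```python
import re

def leading_integer(text):
    """Return the integer formed by the leading digits of text, or 0 if none.

    Rough emulation (assumption: integers only) of how a string is read
    in a numeric context: parsing stops at the first non-numeric character.
    """
    match = re.match(r"\s*([+-]?\d+)", text)
    return int(match.group(1)) if match else 0

print(leading_integer("00134abc"))  # 134
print(leading_integer("1,234"))     # 1
```

The comma in 1,234 is simply the first character that is not part of a number, so everything after it is dropped.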
You may be able to do your load with something like this (NOT debugged!)
LOAD DATA LOCAL INFILE 'file_location.csv'
INTO TABLE social_spend
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
(col1, col2, @yourNumber, col4)
SET numberColumn = CAST(REPLACE(@yourNumber, ',', '') AS SIGNED INTEGER);
Between the parentheses you can list columns of your table to correspond to columns in your CSV. If you put @something in place of a column name, the value is stored in a user variable that you can use in an expression in a SET clause. The expression I showed strips the commas out of your numbers.
I have a UTF-8 .txt file with information that I am trying to import into an existing table that already has rows in it.
The information in the .txt file is structured like this (the quotes are part of the .txt file):
"Bob,Smith,25,California,,,,Single,"
"John,Doe,72,Nevada,,2,1,Married,"
"Will,Smith,22,Texas,1000005,2,1,Married,"
The query I'm using is:
LOAD DATA LOCAL INFILE 'myfile.txt' INTO TABLE mytable FIELDS ENCLOSED BY '"' TERMINATED BY ',' LINES TERMINATED BY '\n'
What happens is that all of these records get inserted but get inserted like this
Bob,null,null,null,null,null,null,null
John,null,null,null,null,null,null,null
Will,null,null,null,null,null,null,null
It's as if the " is not being caught at the end, or something similarly weird. Am I doing something wrong here?
According to the example data provided your fields are not enclosed by quotes but rather the whole record is.
You can use the STARTING BY option to skip the initial quote; the trailing one should be ignored automatically.
I think what you need is this:
LOAD DATA LOCAL INFILE 'your_file.txt' INTO TABLE your_table FIELDS TERMINATED BY ',' LINES STARTING BY '"' TERMINATED BY '\n';
The format of your text file does not make any sense. (Strangely enough, the way the importer handles it does not make any sense either. But we have no control over the importer, so let us ignore that.)
The double quotes must surround each field, not each line. The ENCLOSED BY '"' clause refers to the fields, not to the lines. So, you are telling the importer that your fields are enclosed in quotes, but you are enclosing your lines in quotes. So, the importer considers each one of your lines as a field.
(Then, the importer proceeds to further chop up your lines at the comma, which makes no sense because the commas are within the quotes, so it should be ignoring them, so the importer is kind of brain-damaged too.)
With the FIELDS ENCLOSED BY '"' clause, the input file must enclose EACH field's data in ", so the input file should look as follows:
"Bob","Smith","25","California","","","","Single",
"John","Doe","72","Nevada","","2","1","Married",
"Will","Smith","22","Texas","1000005","2","1","Married",
That should add the data into the fields
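If editing the file by hand is impractical, the conversion from one quote pair per line to one quote pair per field can be scripted. The following Python sketch is one possible way to do it. The function name requote_fields is made up, and it assumes that no field contains a comma or a quote of its own and that the trailing comma on each line does not stand for an extra column (so it is dropped).

```python
import csv
import io

def requote_fields(text):
    """Turn lines quoted as a whole into lines where each field is quoted."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in text.splitlines():
        record = line.strip()
        # Drop the quote pair that wraps the whole record, if present.
        if len(record) >= 2 and record.startswith('"') and record.endswith('"'):
            record = record[1:-1]
        # Assumption: the trailing comma is not an extra (empty) column.
        if record.endswith(","):
            record = record[:-1]
        writer.writerow(record.split(","))
    return out.getvalue()

sample = '"Bob,Smith,25,California,,,,Single,"\n"John,Doe,72,Nevada,,2,1,Married,"\n'
print(requote_fields(sample))
```

Empty fields come out as "", which LOAD DATA reads as empty strings rather than NULL.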
I'm exporting a database report with a shell script. If I run the query in phpMyAdmin, the file comes out fine, with new lines only at the end of each database row.
However, when I run the query in my shell script using OUTFILE to generate the file, I get \n, \r and \r\n inside the content of some columns. I can't work out what causes this or how to avoid it.
The issue only seems to be caused in the colour column which is the third in the example export.
Query:
mysql $MYSQLOPTS << EOFMYSQL
SELECT Product_Name, Item_Size, Item_Colour, Item_Price, Current_Stock, Item_Price * Current_Stock AS Stock_Value
FROM Items
ORDER BY Product_Name
INTO OUTFILE '$FILE'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
EOFMYSQL
Example result:
"Scarf_in_Peach","ONE SIZE","12/04-B2B2 ",10.00,3,30.00
"Scarf_in_Pink","ONE SIZE ","11/06-odds-C1C12100",10.00,0,0.00
"Scarf_in_Red","ONE SIZE ","11/06-B7B2-C1C12100",10.00,0,0.00
"Scarf_in_Sand_","ONE SIZE","11/06-B1I3-C1C12100
",10.00,0,0.00
"Scarf_in_Sand_/_Blue_Flowers","ONE SIZE","12/04-B2E2-C1C12100 ",10.00,4,40.00
"Scarf_in_Teal","ONE SIZE","11/06-B5G1-C1C12100
",10.00,0,0.00
"Scarf_in_Teal_/_Red_Flowers","ONE SIZE","12/04 - B2B2 ",10.00,1,10.00
"Sunrise_Skinnies","16","ODD-R1S009-1-BLUE",20.00,0,0.00
"Sunrise_Skinnies","8","ODD-R1S009-1
BLUE",20.00,0,0.00
You have 2 options:
1. Replace carriage return and line feed characters with an empty string within your query. Pro: it is entirely up to you which characters you filter out and from which fields. Con: you have to write an expression for each affected field manually.
2. Use the FIELDS ESCAPED BY option of the SELECT ... INTO OUTFILE statement. From the manual:
FIELDS ESCAPED BY controls how to write special characters. If the
FIELDS ESCAPED BY character is not empty, it is used when necessary to
avoid ambiguity as a prefix that precedes following characters on
output:
The FIELDS ESCAPED BY character
The FIELDS [OPTIONALLY] ENCLOSED BY character
The first character of the FIELDS TERMINATED BY and LINES TERMINATED BY values
ASCII NUL (the zero-valued byte; what is actually written following the escape character is ASCII “0”, not a zero-valued byte)
The FIELDS TERMINATED BY, ENCLOSED BY, ESCAPED BY, or LINES TERMINATED
BY characters must be escaped so that you can read the file back in
reliably. ASCII NUL is escaped to make it easier to view with some
pagers.
Pro: this is a fast, standard approach that you can easily apply to all of your export functionality. Con: it is less flexible. For example, if LINES TERMINATED BY is set to \n, then \r is not escaped, which can still cause problems on some systems.
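The first option is meant to be done inside the SQL query itself. Purely to illustrate the filtering it describes, here is the same idea as a Python sketch applied to a single field value. The function name strip_line_breaks is hypothetical, and the sample values are modelled on the colour column in the question.

```python
def strip_line_breaks(value):
    """Remove carriage returns and line feeds from a field value."""
    # Map both characters to None so that str.translate deletes them.
    return value.translate({ord("\r"): None, ord("\n"): None})

print(strip_line_breaks("11/06-B1I3-C1C12100\r\n"))
print(strip_line_breaks("ODD-R1S009-1\nBLUE"))
```

Because the line breaks are replaced with an empty string, the two parts of a value such as ODD-R1S009-1 and BLUE are joined without a separator. Replacing them with a space instead would be a small change.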
I have data in a .csv file like this:
"UPRR 38 PAN AM "M"","1"
and I loaded it into a table with two columns (a and b) using the command below.
LOAD DATA LOCAL INFILE 'E:\monthly_data.csv'
INTO TABLE test_data_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n';
But when I select from the table, I get the unexpected results shown below.
a contains:
UPRR 38 PAN AM "M","1
... and b is NULL.
Thanks
You can replace all instances of two consecutive double quotes ("") in your file, either:
A. by opening the files and using find and replace, or
B. by writing a script that opens the files and replaces the extra quote that is causing the problem
You have this:
ENCLOSED BY '"'
Thus " is not a regular character any more. It's a special character that has a special meaning: it highlights the start and end of a column value. If you want to type a " that does not behave that way you need to escape it. The RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files document explains how to do that:
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote
a;b
"UPRR 38 PAN AM ""M""";1
As they say, garbage in, garbage out ;-)
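If you control the program that produces the file, the simplest fix is to let a CSV library do the quoting. As a minimal sketch (the helper name to_rfc4180_line is made up), Python's csv writer doubles embedded double quotes by default, which is the escaping RFC 4180 describes:

```python
import csv
import io

def to_rfc4180_line(fields):
    """Format one record with every field quoted and inner quotes doubled."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    writer.writerow(fields)
    return out.getvalue()

line = to_rfc4180_line(['UPRR 38 PAN AM "M"', "1"])
print(line)  # "UPRR 38 PAN AM ""M""","1"
```

A file written this way should load cleanly with FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n', since LOAD DATA treats a doubled enclosing character inside a field as a single one.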
I am trying to upload a tab-delimited file with MySQL. I want a query something like this: LOAD DATA LOCAL INFILE 'file' INTO TABLE tbl FIELDS TERMINATED BY 'TAB'. Is there something I can substitute for TAB to make this work?
Have you tried '\t'? The escape sequence backslash + "t" is interpreted as a tab. I haven't tried it, but it might be what you need.
Just tried to find the answer to this question myself to save re-saving my file with commas separating instead of tabs...
From an old MySQL reference manual, a long way down the page, you can find that TAB is the default separator for files loaded using LOAD DATA on MySQL.
See: http://dev.mysql.com/doc/refman/4.1/en/load-data.html
I just loaded a CSV file in this way into MySQL 5.1.
BW
fields terminated by '\t'
Try this one
Note :
Field and Line Handling
For both the LOAD DATA and SELECT ... INTO OUTFILE statements, the syntax of the FIELDS and LINES clauses is the same. Both clauses are optional, but FIELDS must precede LINES if both are specified.
If you specify a FIELDS clause, each of its subclauses (TERMINATED BY, [OPTIONALLY] ENCLOSED BY, and ESCAPED BY) is also optional, except that you must specify at least one of them. Arguments to these clauses are permitted to contain only ASCII characters.
If you specify no FIELDS or LINES clause, the defaults are the same as if you had written this:
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
LINES TERMINATED BY '\n' STARTING BY ''
Backslash is the MySQL escape character within strings in SQL statements. Thus, to specify a literal backslash, you must specify two backslashes for the value to be interpreted as a single backslash. The escape sequences '\t' and '\n' specify tab and newline characters, respectively.
In other words, the defaults cause LOAD DATA to act as follows when reading input:
Look for line boundaries at newlines.
Do not skip any line prefix.
Break lines into fields at tabs.
Do not expect fields to be enclosed within any quoting characters.
Interpret characters preceded by the escape character \ as escape sequences. For example, \t, \n, and \\ signify tab, newline, and backslash, respectively. See the discussion of FIELDS ESCAPED BY later for the full list of escape sequences.
Conversely, the defaults cause SELECT ... INTO OUTFILE to act as follows when writing output:
Write tabs between fields.
Do not enclose fields within any quoting characters.
Use \ to escape instances of tab, newline, or \ that occur within field values.
Write newlines at the ends of lines.
see: https://dev.mysql.com/doc/refman/8.0/en/load-data.html
for more details.
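To make the default reading rules a bit more tangible, here is a simplified Python sketch of how one line might be split under them: fields break at tabs, and a backslash turns the next character into an escape sequence. It is an illustration under stated assumptions, not MySQL's parser. The name split_default_line is made up, and the special \N (NULL) handling is left out.

```python
# Escape sequences recognized after a backslash (simplified; \N for NULL is omitted).
ESCAPES = {"0": "\0", "b": "\b", "n": "\n", "r": "\r", "t": "\t", "Z": "\x1a"}

def split_default_line(line):
    """Split one line (without its terminating newline) into fields."""
    fields, current = [], []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # Escaped character: translate known sequences, keep others literally.
            nxt = line[i + 1]
            current.append(ESCAPES.get(nxt, nxt))
            i += 2
        elif ch == "\t":
            # An unescaped tab ends the current field.
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields

# The first field contains an escaped tab, the second an escaped backslash.
print(split_default_line("a\\tb\tc\\\\d"))
```

In the sample, the escaped \t stays inside the first field as a real tab character, while the literal tab in the middle separates the two fields.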