I'm trying to import a CSV file into a MySQL 5.1 DB using phpMyAdmin. The file includes several date columns which may contain NULL values. According to the manual, NULL should be written as \N. However, after an otherwise successful import, \N appears as 0000-00-00 in the date columns (as opposed to NULL). How do I get NULLs imported?
Options set:
line separator ,
fields enclosed by ",
fields escaped by \,
lines terminated by auto.
phpMyAdmin distinguishes two CSV import formats: CSV and CSV with LOAD DATA. The latter option actually accepts \N as described in the manual.
If columns are enclosed by a character (as is the case here), NULLs may also be imported using the former method (without LOAD DATA) by writing the value as a non-enclosed literal NULL in the CSV file. This is in accordance with the following statement from the manual:
If FIELDS ENCLOSED BY is not empty, a field containing the literal
word NULL as its value is read as a NULL value. This differs from the
word NULL enclosed within FIELDS ENCLOSED BY characters, which is read
as the string 'NULL'.
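For instance, a hedged sketch of how such rows could look (the columns and values are only illustrative):
"1","2011-01-01","date imported as given"
"2",NULL,"date imported as a real NULL"
"3","NULL","read as the string 'NULL', not as a NULL"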
Related
I tried to import data from a CSV into MySQL with the wizard; rows containing NULL values are not included in the imported table.
How do I fix it?
You need to handle the NULLs in your CSV file, as described in the MySQL manual:
Handling of NULL values varies according to the FIELDS and LINES options in use:
For the default FIELDS and LINES values, NULL is written as a field value of \N for output, and a field value of \N is read as NULL for input (assuming that the ESCAPED BY character is \).
If FIELDS ENCLOSED BY is not empty, a field containing the literal word NULL as its value is read as a NULL value. This differs from the word NULL enclosed within FIELDS ENCLOSED BY characters, which is read as the string 'NULL'.
If FIELDS ESCAPED BY is empty, NULL is written as the word NULL.
With fixed-row format (which is used when FIELDS TERMINATED BY and FIELDS ENCLOSED BY are both empty), NULL is written as an empty string. This causes both NULL values and empty strings in the table to be indistinguishable when written to the file because both are written as empty strings. If you need to be able to tell the two apart when reading the file back in, you should not use fixed-row format.
So if there is a NULL in the CSV data, replace it with \N.
E.g.: replace
1, foo, NULL, bar
by:
1, foo, \N, bar
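If you run the import yourself rather than through the wizard, a minimal LOAD DATA sketch under the default escaping (the file path and table name are placeholders) would be:
LOAD DATA INFILE '/path/to/data.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ',' ESCAPED BY '\\'
LINES TERMINATED BY '\n';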
Summary:
Original question from a year ago: How to escape double quotes within data when it is already enclosed by double quotes
I have the same need as the original poster: I have a CSV file that matches the CSV RFC spec (my data has properly qualified double quotes, commas, and line feeds in it; Excel is able to read it just fine because the file matches the spec and Excel properly implements the spec).
Unfortunately, I can't figure out how to import files that match the CSV RFC 4180 spec into Snowflake. Any ideas?
Details:
We've been creating CSV files that match the RFC 4180 spec for years in order to maximize compatibility across applications and OSes.
Here is a sample of what my data looks like:
KEY,NAME,DESCRIPTION
1,AFRICA,This is a simple description
2,NORTH AMERICA,"This description has a comma, so I have to wrap the whole field in double quotes"
3,ASIA,"This description has ""double quotes"" in it, so I have to qualify the double quotes and wrap the field in double quotes"
4,EUROPE,"This field has a carriage
return so it is wrapped in double quotes"
5,MIDDLE EAST,Simple description with single ' quote
When opening this file in Excel, Excel properly reads the rows/columns (because Excel follows the RFC spec).
In order to import this file into Snowflake, I first try to create a file format and I set the following:
Name                          Value
Column Separator              Comma
Row Separator                 New Line
Header lines to skip          1
Field optionally enclosed by  Double Quote
Escape Character              "
Escape Unenclosed Field       None
But when I go to save the file format, I get this error:
Unable to create file format "CSV_SPEC".
SQL compilation error: value ["] for parameter 'FIELD_OPTIONALLY_ENCLOSED_BY' conflict with parameter 'ESCAPE'
It would appear that I'm missing something? I would think that I must be getting the Snowflake configuration wrong.
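For reference, the failing options expressed in SQL (the format name matches the error message; the rest is presumably equivalent to the table above) would look roughly like this, and it reports the same conflict:
CREATE FILE FORMAT CSV_SPEC
TYPE = 'CSV'
FIELD_DELIMITER = ','
RECORD_DELIMITER = '\n'
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
ESCAPE = '"'
ESCAPE_UNENCLOSED_FIELD = 'NONE';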
While writing up this question and testing all the scenarios I could think of, I found a file format that seems to work:
Name                          Value
Column Separator              Comma
Row Separator                 New Line
Header lines to skip          1
Field optionally enclosed by  Double Quote
Escape Character              None
Escape Unenclosed Field       None
Same information again, but in SQL form:
ALTER FILE FORMAT "DB_NAME"."SCHEMA_NAME"."CSV_SPEC3" SET
COMPRESSION = 'NONE'
FIELD_DELIMITER = ','
RECORD_DELIMITER = '\n'
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
TRIM_SPACE = FALSE
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
ESCAPE = 'NONE'
ESCAPE_UNENCLOSED_FIELD = 'NONE'
DATE_FORMAT = 'AUTO'
TIMESTAMP_FORMAT = 'AUTO'
NULL_IF = ('\\N');
I don't know why this works, but it does, so there you go.
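For completeness, a minimal COPY sketch using that file format (the stage, path, and table names are placeholders):
COPY INTO my_table
FROM @my_stage/my_file.csv
FILE_FORMAT = (FORMAT_NAME = 'CSV_SPEC3')
ON_ERROR = 'ABORT_STATEMENT';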
I am getting the below error when I try to load a CSV from my system into a Snowflake table:
Unable to copy files into table.
Numeric value '"4' is not recognized
File '#EMPP/ui1591621834308/snow.csv', line 2, character 25
Row 1, column "EMPP"["SALARY":5]
If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
You appear to be loading your CSV with the file format option of FIELD_OPTIONALLY_ENCLOSED_BY='"' specified.
This option will allow reading any fields properly quoted with the " character, and even support such fields carrying the delimiter character as well as the " character if properly escaped. Some examples that could be considered valid:
CSV FORM | ACTUAL DATA
------------------------
abc | abc
"abc" | abc
"a,bc" | a,bc
"a,""bc""" | a,"bc"
In particular, notice that the final example follows the specified rule:
When a field contains this character, escape it using the same character. For example, if the value is the double quote character and a field contains the string A "B" C, escape the double quotes as follows:
A ""B"" C
If your CSV file carries quote marks within the data but is not necessarily quoting the fields (and delimiters and newlines do not appear within data fields), you can remove the FIELD_OPTIONALLY_ENCLOSED_BY option from your file format definition and just read the file as plain comma-delimited fields.
If your CSV does use quoting, ensure that whatever is producing the CSV files is using a valid CSV format writer and not simple string munging, and recreate it with the quotes properly escaped. If the above data example is to be considered valid in quoted form, it must instead appear within the file as "4" or 4.
The error message is saying that you have a value in your file that contains "4, which is being loaded into a table column with a number type. Since that isn't a number, it fails. This appears to be happening in the very first row of your file, so you could open it up and take a look at the value. If it's just one record, you can add ON_ERROR = 'CONTINUE' to your command so that it skips it and moves on.
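For example, a hedged sketch of such a COPY command (the table stage, file name, and file format options here are assumptions based on the error message, not your actual command):
COPY INTO EMPP
FROM @%EMPP/snow.csv
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
ON_ERROR = 'CONTINUE';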
I'm trying to load a CSV file into Redshift.
Delimiter '|'
1st row of the CSV:
1 |Bhuvi|"This is ok"|xyz#domain.com
I used this command to load it:
copy tbl from 's3://datawarehouse/source.csv'
iam_role 'arn:aws:iam:::role/xxx'
delimiter '|'
removequotes
ACCEPTINVCHARS ;
ERROR:
raw_field_value | This is ok" |xyz#domain.com
err_code | 1214
err_reason | Delimited value missing end quote
Then I tried this too:
copy tbl from 's3://datawarehouse/source.csv'
iam_role 'arn:aws:iam:::role/xxx'
CSV QUOTE '\"'
DELIMITER '|'
ACCEPTINVCHARS ;
Disclaimer: even though this post does not answer the question asked here, I am posting this analysis in case it helps someone.
The error "Delimited value missing end quote" can be reported in cases where a quoted text column is missing the end quote, or if the text column value has a new line in the value itself. In my case, there was a newline in the text column value.
As per RFC 4180, the CSV specification says:
Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes.
For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
So a valid CSV can have multi-line rows, and the correct way to import it in Redshift is to specify the CSV format option. This also assumes that all columns having the quote character in the value will have the quote character escaped by another preceding quote character. This is also as per the CSV RFC specification.
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote.
For example:
"aaa","b""bb","ccc"
If the file that we are trying to import is not a valid CSV, and is merely named as a .CSV file, as may be the case, then we have the following options.
Try copying the file without specifying the CSV option, and fine-tuning the delimiter, escape, and quoting behaviour with the corresponding copy options.
If a set of options is not able to consistently copy data, then pre-process the file to make it consistent.
In general, it helps to make the behaviour deterministic if we try to export and import data in formats that are consistent.
Data:
1|\N|"First\Line"
2|\N|"Second\Line"
3|100|\N
\N represents NULL in MySQL and MariaDB.
I'm trying to load the above data using the LOAD DATA LOCAL INFILE method into a table named ID_OPR.
Table structure:
CREATE TABLE ID_OPR (
idnt decimal(4),
age decimal(3),
comment varchar(100)
);
My code looks like this:
LOAD DATA LOCAL INFILE <DATA FILE LOCATION> INTO TABLE <TABLE_NAME> FIELDS TERMINATED BY '|' ESCAPED BY '' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';
The problem with this code is that it aborts with the error Incorrect decimal value: '\\N' for column <Column name>.
Question:
How do I load this data with NULL values in the second (decimal) column, and also without losing the \ (backslash) from the third (string) column?
I'm trying this in MariaDB, which is similar to MySQL in most cases.
Update:
The error I have mentioned appears as a warning, and the data is actually getting loaded into the table. But the catch here is with the text data.
For example, in the case of the third record above, \N is being loaded as the literal string \N into the string column. But I want it to be NULL.
Is there any way to make it recognize this NULL value? Something like DECODE in Oracle?
You can't have it both ways: either \ is an escape character or it is not. From the MySQL docs:
If the FIELDS ESCAPED BY character is empty, no characters are escaped and NULL is output as NULL, not \N. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.
So, I'd suggest a consistently formatted input file, however that was generated:
use \\ if you want to keep the backslash in the strings
make \ an escape character in your load command
OR
make strings always, not optionally, enclosed in quotes
leave escape character empty, as is
use NULL for nulls, not \N
BTW, this also explains the warnings you were experiencing loading \N in your decimal field.
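To illustrate the first option, a hedged sketch, assuming the input file is rewritten to use \\ for literal backslashes (the file path is a placeholder):
1|\N|"First\\Line"
2|\N|"Second\\Line"
3|100|\N
LOAD DATA LOCAL INFILE '/path/to/data.txt'
INTO TABLE ID_OPR
FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\n';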
Deal with nulls by using blanks; that should fix it.
1||"First\Line"
2||"Second\Line"
3|100|
That's how nulls are handled in CSVs and TSVs. And don't expect the decimal datatype to go NULL, as it stays 0; use int or bigint instead if needed. You should forget about ESCAPED BY; as long as string data is enclosed by "", that deals with the escaping problem.
We need three text files and one batch file for the load:
Suppose your file location is 'D:\loaddata' and your text file is 'D:\loaddata\abc.txt'.
1. D:\loaddata\abc.bad -- empty
2. D:\loaddata\abc.log -- empty
3. D:\loaddata\abc.ctl
a. Write the code below for no separator (fixed column positions):
OPTIONS ( SKIP=1, DIRECT=TRUE, ERRORS=10000000, ROWS=5000000)
load data
infile 'D:\loaddata\abc.txt'
TRUNCATE
into table Your_table
(
a_column POSITION (1:7) char,
b_column POSITION (8:10) char,
c_column POSITION (11:12) char,
d_column POSITION (13:13) char,
f_column POSITION (14:20) char
)
b. Write the code below for a comma separator:
OPTIONS ( SKIP=1, DIRECT=TRUE, ERRORS=10000000, ROWS=5000000)
load data
infile 'D:\loaddata\abc.txt'
TRUNCATE
into table Your_table
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
(a_column,
b_column,
c_column,
d_column,
e_column,
f_column
)
4. D:\loaddata\abc.bat -- write the code below:
sqlldr db_user/db_password@your_tns control=D:\loaddata\abc.ctl log=D:\loaddata\abc.log
After double-clicking the "D:\loaddata\abc.bat" file, your data will be loaded into the desired Oracle table. If anything goes wrong, check your "D:\loaddata\abc.bad" and "D:\loaddata\abc.log" files.