MySQL "LOAD DATA INFILE" is importing unquoted "NULL" string as `NULL` - mysql

I'm using MySQL 5.7.35. If I use the LOAD DATA INFILE command on a CSV file with NULL as an unquoted string value in the CSV file, the value is imported as NULL in MySQL.
For example, if I import a CSV file with the following content:
record_number,a,b,c,d,e,f
1,1,2,3,4,5,6
2,NULL,null,Null,nUlL,,"NULL"
The imported table will have the following values:
+---------------+------+--------+--------+--------+--------+--------+
| record_number | a | b | c | d | e | f |
+---------------+------+--------+--------+--------+--------+--------+
| 1 | 1 | 2 | 3 | 4 | 5 | 6 |
| 2 | NULL | "null" | "Null" | "nUlL" | "" | "NULL" |
+---------------+------+--------+--------+--------+--------+--------+
Is there any way to force column a, record 2, to be imported as a string without modifying the CSV file?
Update
@Barmar pointed out that there's a paragraph in the MySQL documentation on this behavior:
If FIELDS ENCLOSED BY is not empty, a field containing the literal
word NULL as its value is read as a NULL value. This differs from the
word NULL enclosed within FIELDS ENCLOSED BY characters, which is read
as the string 'NULL'.

This is documented here:
If FIELDS ENCLOSED BY is not empty, a field containing the literal word NULL as its value is read as a NULL value. This differs from the word NULL enclosed within FIELDS ENCLOSED BY characters, which is read as the string 'NULL'.
So you need to specify the quoting character with something like FIELDS ENCLOSED BY '"' and then write "NULL" in the CSV file.
You could check for a NULL value in your code and convert it to a string.
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(record_number, @a, @b, @c, @d, @e, @f)
SET a = IFNULL(@a, 'NULL'),
    b = IFNULL(@b, 'NULL'),
    c = IFNULL(@c, 'NULL'),
    d = IFNULL(@d, 'NULL'),
    e = IFNULL(@e, 'NULL'),
    f = IFNULL(@f, 'NULL')
However, this can't distinguish between an intentional NULL written as \N and the unquoted word NULL that MySQL reads as NULL: both end up as the string 'NULL'.
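The rule quoted from the documentation, together with the import result shown in the question, can be modelled in a few lines of Python. This is only a toy model of the behavior described in this thread, not MySQL's actual parser. The function name interpret_field is made up for illustration, and the sketch assumes FIELDS ENCLOSED BY '"' and the default escape character.

```python
def interpret_field(raw, enclosed_by='"'):
    """Toy model of how one raw CSV field is read (assumes ENCLOSED BY '"')."""
    # \N is the explicit NULL marker; the bare, uppercase word NULL is also read as NULL.
    if raw in (r"\N", "NULL"):
        return None
    # A field wrapped in the enclosing character is always a string, even "NULL".
    if len(raw) >= 2 and raw.startswith(enclosed_by) and raw.endswith(enclosed_by):
        return raw[1:-1]
    # Anything else (null, Null, the empty field, ...) is kept as a plain string.
    return raw


# Record 2 from the question's CSV file.
row = ["2", "NULL", "null", "Null", "nUlL", "", '"NULL"']
print([interpret_field(field) for field in row])
# ['2', None, 'null', 'Null', 'nUlL', '', 'NULL']
```

Only column a comes back as None, which matches the table in the question. It also shows why IFNULL alone cannot tell \N apart from the bare word NULL.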

Related

Greenplum turn strings into table

Everyone, how can Greenplum turn a dictionary-like string such as the one below into a table with field names and values?
I mean, how do I do this in SQL?
{"useCoupon":true,"useLevel":false,"usePoints":false,"useActivity" :false}
create table test(a json);
insert into test(a) values('{"useCoupon":true,"useLevel":false,"usePoints":false,"useActivity" :false}');
insert into test(a) values('{"useCoupon":false,"useLevel":false,"usePoints":true,"useActivity" :true}');
After loading the data into a staging table using copy, gpfdist, or any other utility, select the required fields as shown below and load the data into the final table.
gpadmin=# select a -> 'useCoupon' as useCoupon,a -> 'useLevel' as useLevel,a -> 'usePoints' as usePoints,a -> 'useActivity' as useActivity from test;
usecoupon | uselevel | usepoints | useactivity
-----------+----------+-----------+-------------
true | false | false | false
false | false | true | true
(2 rows)

Jsonb extract in PostgreSQL - problem with '{}'::

I have this jsonb column in a PostgreSQL table.
{
"{\"start\":\"14:00\",\"end\":\"14:50\"}",
"{\"start\":\"14:51\",\"end\":\"15:40\"}",
"{\"start\":\"15:41\",\"end\":\"16:30\"}",
"{\"start\":\"16:31\",\"end\":\"17:20\"}"
}
I need to extract all values of start and end.
I want the result to be like this
id | start1 | end1 | start2 | end2 | start3 | end3 | start4 | end4
or
id | start1 | end1
id | start2 | end2
id | start3 | end3
id | start4 | end4
The usual ->> doesn't work for this and I have no clue how to do that.
You don't say what version of Postgres you are using, but it probably has:
https://www.postgresql.org/docs/12/runtime-config-compatible.html#RUNTIME-CONFIG-COMPATIBLE-VERSION
standard_conforming_strings (boolean)
This controls whether ordinary string literals ('...') treat backslashes literally, as specified in the SQL standard. [...] The presence of this parameter can also be taken as an indication that the escape string syntax (E'...') is supported. Escape string syntax (Section 4.1.2.2) should be used if an application desires backslashes to be treated as escape characters.
In that case to deal with the escapes in your JSON you need to do:
select E'{\"start\":\"14:00\",\"end\":\"14:50\"}'::jsonb;
jsonb
------------------------------------
{"end": "14:50", "start": "14:00"}
(1 row)
select E'{\"start\":\"14:00\",\"end\":\"14:50\"}'::jsonb ->> 'start';
?column?
----------
14:00
select E'[
{\"start\":\"14:00\",\"end\":\"14:50\"},
{\"start\":\"14:51\",\"end\":\"15:40\"},
{\"start\":\"15:41\",\"end\":\"16:30\"},
{\"start\":\"16:31\",\"end\":\"17:20\"}
]'::jsonb;
--------------------------------------------------------------------------------------------------------------------------------------------------
[{"end": "14:50", "start": "14:00"}, {"end": "15:40", "start": "14:51"}, {"end": "16:30", "start": "15:41"}, {"end": "17:20", "start": "16:31"}]
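The value in the question looks like an array whose elements are themselves JSON text, so each element has to be decoded on its own before start and end can be read. The thread does not show a complete SQL query for that, so the following is only a client-side illustration in Python. The names slots and expand_slots and the id value 1 are made up, and it assumes the column reaches the client as a list of JSON-encoded strings.

```python
import json

# Assumption: the column arrives in the client as a list of JSON-encoded strings.
slots = [
    '{"start":"14:00","end":"14:50"}',
    '{"start":"14:51","end":"15:40"}',
    '{"start":"15:41","end":"16:30"}',
    '{"start":"16:31","end":"17:20"}',
]


def expand_slots(row_id, raw_slots):
    """Return one (id, start, end) tuple per encoded slot."""
    rows = []
    for raw in raw_slots:
        slot = json.loads(raw)  # decode the inner JSON object
        rows.append((row_id, slot["start"], slot["end"]))
    return rows


for row in expand_slots(1, slots):
    print(row)
```

This produces the second layout asked for in the question: one row per start/end pair.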

Loading quoted numbers into snowflake table from CSV with COPY INTO <TABLE>

I have a problem with loading CSV data into a Snowflake table. Fields are wrapped in double quote marks, and hence there is a problem with importing them into the table.
I know that COPY INTO has the CSV-specific option FIELD_OPTIONALLY_ENCLOSED_BY = '"', but it's not working at all.
Here are some pieces of the table definition and the copy command:
CREATE TABLE ...
(
GamePlayId NUMBER NOT NULL,
etc...
....);
COPY INTO ...
FROM ...csv.gz'
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 1
error_on_column_count_mismatch=false
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "ABORT_STATEMENT"
;
Csv file looks like this:
"3922000","14733370","57256","2","3","2","2","2019-05-23 14:14:44",",00000000",",00000000",",00000000",",00000000","1000,00000000","1000,00000000","1317,50400000","1166,50000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000"
I get an error
Numeric value '"3922000"' is not recognized
I'm pretty sure it's because NUMBER value is interpreted as string when snowflake is reading "" marks, but since I use
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
it shouldn't even be there... Does anyone have some solution to this?
Maybe something is incorrect with your file? I was just able to run the following without issue.
1. create the test table:
CREATE OR REPLACE TABLE
dbNameHere.schemaNameHere.stacko_58322339 (
num1 NUMBER,
num2 NUMBER,
num3 NUMBER);
2. create test file, contents as follows
1,2,3
"3922000","14733370","57256"
3,"2",1
4,5,"6"
3. create stage and put file in stage
4. run the following copy command
COPY INTO dbNameHere.schemaNameHere.STACKO_58322339
FROM @stageNameHere/stacko_58322339.csv.gz
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 0
ERROR_ON_COLUMN_COUNT_MISMATCH=FALSE
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "CONTINUE";
5. results
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| stageNameHere/stacko_58322339.csv.gz | LOADED | 4 | 4 | 4 | 0 | NULL | NULL | NULL | NULL |
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 2.436s
6. view the records
>SELECT * FROM dbNameHere.schemaNameHere.stacko_58322339;
+---------+----------+-------+
| NUM1 | NUM2 | NUM3 |
|---------+----------+-------|
| 1 | 2 | 3 |
| 3922000 | 14733370 | 57256 |
| 3 | 2 | 1 |
| 4 | 5 | 6 |
+---------+----------+-------+
Can you try with a similar test as this?
EDIT: A quick look at your data shows that many of your numeric fields appear to start with commas, so something is definitely amiss with the data.
Assuming your numbers are European-formatted (, as the decimal separator and . as the thousands separator), and going by the numeric formatting help, it seems Snowflake does not support this as input. I'd open a feature request.
But if you read the column in as text, you can then use REPLACE like
SELECT '100,1234'::text as A
,REPLACE(A,',','.') as B
,TRY_TO_DECIMAL(b, 20,10 ) as C;
gives:
A B C
100,1234 100.1234 100.1234000000
Safer would be to strip the thousands separators first, like
SELECT '1.100,1234'::text as A
,REPLACE(A,'.') as B
,REPLACE(B,',','.') as C
,TRY_TO_DECIMAL(C, 20,10 ) as D;
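The same two-step replacement can be sketched outside the database, for example to check sample values before loading them. This is a plain Python illustration of the logic, not Snowflake code. The function name parse_european_decimal is made up, and returning None on failure is only meant to loosely mirror what TRY_TO_DECIMAL does.

```python
from decimal import Decimal, InvalidOperation


def parse_european_decimal(text):
    """Convert '1.100,1234' style text to Decimal; return None if it cannot be parsed."""
    cleaned = text.replace(".", "")      # drop thousands separators first
    cleaned = cleaned.replace(",", ".")  # then turn the decimal comma into a point
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


print(parse_european_decimal("1.100,1234"))     # 1100.1234
print(parse_european_decimal("1317,50400000"))  # 1317.50400000
print(parse_european_decimal("abc"))            # None
```

The order matters: replacing the comma first would turn 1.100,1234 into 1.100.1234, which can no longer be parsed.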

spark df.write quote all fields but not null values

I am trying to create a csv from values stored in the table:
| col1 | col2 | col3 |
| "one" | null | "one" |
| "two" | "two" | "two" |
hive > select * from table where col2 is null;
one null one
I am getting the csv using the below code:
df.repartition(1)
.write.option("header",true)
.option("delimiter", ",")
.option("quoteAll", true)
.option("nullValue", "")
.csv(S3Destination)
Csv I get:
"col1","col2","col3"
"one","","one"
"two","two","two"
Expected CSV (with no double quotes for the null value):
"col1","col2","col3"
"one",,"one"
"two","two","two"
Any help is appreciated to know if the dataframe writer has options to do this.
You can take a UDF approach and apply it to the column (using withColumn on the repartitioned DataFrame above) wherever a double-quoted empty string is possible; see the sample code below.
sqlContext.udf().register("convertToEmptyWithOutQuotes",(String abc) -> (abc.trim().length() > 0 ? abc : abc.replace("\"", " ")),DataTypes.StringType);
String has a replace method, which does the job.
val a = Array("'x'","","z")
println(a.mkString(",").replace("\"", " "))
will produce 'x',,z
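To make the target format concrete, here is a small sketch of the quoting rule the question asks for: quote every non-null field and leave null fields empty and unquoted. It is plain Python rather than a Spark writer option, and the helper name to_csv_line is made up for illustration. It only shows the per-line formatting a custom writer or post-processing step would have to produce.

```python
def to_csv_line(values):
    """Quote every non-null field; write None as an empty, unquoted field."""
    fields = []
    for value in values:
        if value is None:
            fields.append("")
        else:
            # Double any embedded quote characters, then wrap the field in quotes.
            fields.append('"' + str(value).replace('"', '""') + '"')
    return ",".join(fields)


rows = [("col1", "col2", "col3"), ("one", None, "one"), ("two", "two", "two")]
for row in rows:
    print(to_csv_line(row))
# "col1","col2","col3"
# "one",,"one"
# "two","two","two"
```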

Remove double quotes from the return of a function in PostgreSQL

I have the following function in PostgreSQL
CREATE OR REPLACE FUNCTION public.translatejson(JSONB, TEXT)
RETURNS TEXT
AS
$BODY$
SELECT ($1->$2)::TEXT
$BODY$
LANGUAGE sql STABLE;
When I execute it I receive the values surrounded by double quotes. For example:
SELECT id, translatejson("title", 'en-US') AS "tname" FROM types."FuelTypes";
in return I get a table like this
-------------------
| id | tname |
-------------------
| 1 | "gasoline" |
| 2 | "diesel" |
-------------------
The values in the 'title' column are in JSON format:
{ "en-US":"gasoline", "fr-FR":"essence" }.
How I can omit the double quotes to return just the string of the result?
The -> operator returns a json result. Casting it to text leaves it in its json representation, quotes included.
The ->> operator returns a text result. Use that instead.
test=> SELECT '{"car": "going"}'::jsonb -> 'car';
?column?
----------
"going"
(1 row)
test=> SELECT '{"car": "going"}'::jsonb ->> 'car';
?column?
----------
going
(1 row)
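The difference is the same as the one between serializing a value back to JSON and reading the decoded string. As a rough analogy only (this is Python, not PostgreSQL, and the variable names are made up), using the title value from the question:

```python
import json

title = json.loads('{ "en-US":"gasoline", "fr-FR":"essence" }')
value = title["en-US"]

as_json_text = json.dumps(value)  # analogous to (col -> key)::text
as_plain_text = value             # analogous to col ->> key

print(as_json_text)   # "gasoline"
print(as_plain_text)  # gasoline
```

In the function from the question, this corresponds to returning $1->>$2 instead of ($1->$2)::TEXT.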