Insert blanks as NULL to MySQL

I'm building an AWS pipeline to insert CSV files from S3 to an RDS MySQL DB. The problem I'm facing is that when it attempts to load the file, it treats blanks as empty strings instead of NULLs. For example, Line 1 of the CSV is:
"3","John","Doe",""
Where the last value corresponds to an integer column in the MySQL table, and of course the error in the pipeline is:
Incorrect integer value: '' for column 'col4' at row 1
I was researching the MySQL JDBC parameters to modify the connection string:
jdbc:mysql://my-rds-endpoint:3306/my_db_name?jdbcCompliantTruncation=false
jdbcCompliantTruncation is just an example; is there any of these parameters that can help me insert those blanks as NULLs?
Thanks!
EDIT:
A little context, the CSV files are UNLOADS from redshift, so the blanks are originally NULLs when I put them in S3.

the csv files are UNLOADS from redshift
Then look at the documentation for the Redshift UNLOAD command and add the NULL AS option. For example:
NULL AS 'NULL'
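A minimal sketch of what that could look like (the table name, S3 path, and credentials below are placeholders, not from the original question):
unload ('SELECT * FROM my_table')
to 's3://mybucket/prefix/'
credentials 'aws_access_key_id=...;aws_secret_access_key=...'
delimiter ','
null as 'NULL';
With that, NULLs arrive in the file as an explicit token rather than an empty field, so the loading side can tell a NULL apart from an empty string.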

Using null as '\\N' in the UNLOAD writes NULLs as \N instead of blanks, so they load back into MySQL as NULL:
unload ('SELECT * FROM table')
to 's3://path' credentials
'aws_access_key_id=sdfsdhgfdsjfhgdsjfhgdsjfh;aws_secret_access_key=dsjfhsdjkfhsdjfksdhjkfsdhfjkdshfs'
delimiter '|' null as '\\N' ;

I resolved this issue using the NULLIF function:
insert into table values (NULLIF(?,''),NULLIF(?,''),NULLIF(?,''),NULLIF(?,''))
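For reference, NULLIF(expr1, expr2) returns NULL when its two arguments are equal and expr1 otherwise, so empty strings become NULL while real values pass through untouched. A quick check with made-up values:
SELECT NULLIF('', ''), NULLIF('3', '');
-- returns NULL, '3'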

Related

Oracle sqlldr - handling null values in input dataset

I am working on moving data from MySQL to Oracle. The MySQL input datasets have been provided via a MySQL data dump. Null values in the MySQL database were written as "\N" (without the quotes) in the output file.
I am using sqlldr to get the data into Oracle and "\N" values are problematic in columns mapped to NUMBER data type because Oracle thinks they are strings.
How do I tell sqlldr that any \N values in the input dataset should be mapped to Nulls in Oracle?
Thanks.
This is what worked for me. Note that if you are on unix-based systems, the \N will need to be escaped as follows:
...
COLUMN_NM CHAR(4000) NULLIF COLUMN_NM='\\N',
...
You can use NULLIF in the control file. It will assign NULL if it finds \N in that column. See the syntax below.
<COLUMN_NUMBER> NULLIF <COLUMN_NUMBER> = '\\N'
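Putting it together, a minimal control file might look like this (a sketch only; the table, columns, and data file name are hypothetical, and the doubled backslash is the Unix-side escaping mentioned above):
-- sketch of a control file; names are hypothetical
LOAD DATA
INFILE 'mysql_dump.dat'
APPEND
INTO TABLE TARGET_TABLE
FIELDS TERMINATED BY X'09'   -- tab-separated dump
(
  ID_NUM       NULLIF ID_NUM='\\N',
  AMOUNT       NULLIF AMOUNT='\\N',
  DESCRIPTION  CHAR(4000) NULLIF DESCRIPTION='\\N'
)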

MySQL 5.6, how to export NULL as \N?

I am migrating a MySQL 5.5 physical-host database to a MySQL 5.6 AWS Aurora database. I noticed that when data is written to a file using INTO OUTFILE, 5.5 writes NULL values as '\N' and empty strings as ''. However, 5.6 writes both empty strings and NULL as ''.
Query
SELECT * FROM $databasename.$tablename INTO OUTFILE $filename CHARACTER SET utf8 FIELDS ESCAPED BY '\\\\' TERMINATED BY $delimiter;
I found the official documentation about this:
https://dev.mysql.com/doc/refman/5.6/en/load-data.html
With fixed-row format (which is used when FIELDS TERMINATED BY and
FIELDS ENCLOSED BY are both empty), NULL is written as an empty
string. This causes both NULL values and empty strings in the table to
be indistinguishable when written to the file because both are written
as empty strings. If you need to be able to tell the two apart when
reading the file back in, you should not use fixed-row format.
How do I export NULL as '\N'?
How do I export NULL as '\N'?
First of all, that's strange; why would you want to do that? But if for some reason you want to export it that way, then you will have to change your query from SELECT * to using a CASE expression, like:
select
case when col1 is null then '\\N' else col1 end as col1,
...
from $databasename.$tablename....
As commented, you can also use the IFNULL() or COALESCE() function for the same purpose.
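For example, with IFNULL() (a sketch only; col1 and col2 stand in for the real column list, and the $-placeholders are from the original query):
-- wrap nullable columns so NULLs come out as the two characters \N
select
    ifnull(col1, '\\N') as col1,
    ifnull(col2, '\\N') as col2
from $databasename.$tablename
into outfile $filename character set utf8 fields terminated by $delimiter;
Note that if FIELDS ESCAPED BY is set to a backslash, the backslash produced here will itself be escaped on output (written as \\N), so you may need FIELDS ESCAPED BY '' for the \N token to survive as-is.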

Redshift Error 1202 "Extra column(s) found" using COPY command

I'm getting a 1202 Extra column(s) found error in Redshift when trying to load a simple CSV. I've made sure that there are no additional columns nor any unescaped characters in the file that would cause the COPY command to fail with this error.
Here's the created target table:
create table test_table(
name varchar(500),
email varchar(500),
developer_id integer,
developer_name varchar(500),
country varchar(20),
devdatabase varchar(50));
I'm using a simple CSV with no header and only 3 rows of data:
john smith,john#gmail.com,123,johndev,US,comet
jane smith,jane#gmail.com,124,janedev,GB,titan
jack smith,jack#gmail.com,125,jackdev,US,comet
Unfortunately my COPY command fails with err_1202 "Extra column(s) found".
COPY test_table
FROM 's3://mybucket/test/test_contacts.csv'
WITH credentials AS 'aws_access_key_id=<awskey>;aws_secret_access_key=<mykey>'
CSV;
There are no additional columns in the file.
I was also facing the same issue while loading the data. I rectified it using the following COPY options:
copy yourtablename
from 'your S3 Locations'
credentials 'your AWS credentials'
delimiter ',' IGNOREHEADER 1
removequotes
emptyasnull
blanksasnull
maxerror 5;
Try this:
COPY test_table
FROM 's3://mybucket/test/test_contacts.csv'
WITH credentials AS 'aws_access_key_id=<awskey>;aws_secret_access_key=<mykey>'
delimiter ','
ignoreheader as 1
emptyasnull
blanksasnull
removequotes
escape;
Source: https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html#r_COPY_command_examples-copy-data-with-the-escape-option
Make sure the correct delimiter is specified in the COPY statement (and in the source files). I ran into the same issue. After a couple of attempts with different delimiters (while unloading the table to S3 files, then copying into another table from the S3 files), I was able to solve the issue by using the delimiter '\t'. Here is the full example in my case:
copy <TABLE-NAME>
from 's3://<FILES/LOCATION>'
access_key_id '<INSERT>'
secret_access_key '<INSERT>'
delimiter '\t'
ignoreheader 1
maxerror 10;
Notice that Glue is not as robust as one might think: column order plays a major role, so check your table order as well as the table input and make sure the order and data types are identical; see the AWS Glue Developer Guide for more info.
In addition, make sure you disable 'Job bookmark' in the 'Job details' tab; for any development or generic job this is a major source of headaches and trouble.
This mostly happens because you are using CSV format, which by default has ',' as the delimiter, and your data contains fields whose values include ','. This causes extra columns to appear when you try to load into Redshift. There are quite a few ways to fix this, and it is mostly easy once you have identified which column has commas in its values. You can identify the columns by looking at stl_load_errors:
SELECT starttime, err_reason, raw_line, err_code, query, session, tbl FROM stl_load_errors WHERE filename LIKE 's3://mybucket/test/%' ORDER BY query DESC, starttime DESC
Then fix the column that has the extra commas. Let's say, in this example, the 'name' column has extra commas; clean that data:
from pyspark.sql import functions as F
# replace commas inside the 'name' values so they no longer split into extra columns
df = df.withColumn('name', F.regexp_replace(F.col('name'), ',', ' '))
Store the new DataFrame in S3 and then use the below COPY command to load it into Redshift:
COPY table_name
FROM 's3 path'
IAM_ROLE 'iam role'
DELIMITER ','
ESCAPE
IGNOREHEADER 1
MAXERROR AS 5
COMPUPDATE FALSE
ACCEPTINVCHARS
ACCEPTANYDATE
FILLRECORD
EMPTYASNULL
BLANKSASNULL
NULL AS 'null';
For me, it turned out to be that I executed the scripts on the wrong database within the cluster.

NULL values while exporting data from remote MySQL server into TSVs

I need to export a large number of tables (~50) as TSVs from a remote MySQL server (so SELECT INTO OUTFILE is not an option per the documentation). I am using mysql -e 'SELECT * FROM table' > file.tsv (in a script that loops over each of the ~50 tables). The problem is that this way, NULL values in all the tables are represented as the string 'NULL' instead of \N. The 'NULL' then gets converted/cast to odd/undesirable values when the TSVs are loaded back into a local DB (using LOAD DATA INFILE). For example, a date column with NULL is read as '00-00-0000', and a varchar(3) column is read as 'NUL'.
I've confirmed that when using SELECT INTO OUTFILE, NULLs are correctly represented as \N in the TSV and are therefore loaded back into the DB correctly. I've also confirmed that if I change the 'NULL' in the TSV to \N, the data is loaded correctly.
My question is: how do I export the data from the remote server such that the TSVs retain \N in the first place? Are there better ways than doing a SELECT * and redirecting the output to a file? Appreciate any tips, as this NULL issue is bothersome.
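One option in the spirit of the IFNULL()/CASE answer above is to have the SELECT emit the \N token itself instead of relying on the client (a sketch only; mydb.mytable and its columns are placeholders, and depending on the client's batch-mode escaping you may also need to run mysql with --raw so the backslash is not escaped on output):
-- wrap nullable columns so NULLs come out as the two characters \N
select
    id,
    ifnull(name, '\\N')        as name,
    ifnull(modified_at, '\\N') as modified_at
from mydb.mytable;
Run that through the same mysql -e loop; once the TSV contains \N, LOAD DATA INFILE will read it back as NULL, as you already confirmed.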

BCP: Retaining null values as '\N'

I have to move a table from MS SQL Server to MySQL (~8M rows with 8 columns). One of the columns (DECIMAL type) is exported as an empty string with a "bcp" export to a CSV file. When I use this CSV file to load data into the MySQL table, it fails saying "Incorrect decimal value".
Looking for possible work arounds or suggestions.
I would create a view in MS SQL which converts the decimal column to a varchar column:
CREATE VIEW MySQLExport AS
SELECT [...]
COALESCE(CAST(DecimalColumn AS VARCHAR(50)),'') AS DecimalColumn
FROM SourceTable;
Then, import into a staging table in MySQL, and use a CASE statement for the final INSERT:
INSERT INTO DestinationTable ([...])
SELECT [...]
CASE DecimalColumn
WHEN '' THEN NULL
ELSE CAST(DecimalColumn AS DECIMAL(10,5))
END AS DecimalColumn,
[...]
FROM ImportMSSQLStagingTable;
This is safe because the only way the value can be an empty string in the export file is if it's NULL.
Note that I doubt you can cheat by exporting it with COALESCE(CAST(DecimalColumn AS VARCHAR(50)),'\N'), because LOAD DATA INFILE would see that as the literal string '\N', which is not the same as \N.
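If a staging table is not convenient, a variant of the NULLIF() trick from the first question above can do the conversion during the load itself (a sketch only, assuming LOAD DATA is available; the file path, table, and column names are hypothetical):
-- read the decimal field into a user variable, then map '' to NULL
LOAD DATA LOCAL INFILE '/path/to/export.csv'
INTO TABLE DestinationTable
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(Col1, Col2, @dec_val)
SET DecimalColumn = NULLIF(@dec_val, '');
This keeps the empty-string-means-NULL convention from the view export while skipping the intermediate table.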