MySQL LOAD DATA: SET IF for all or a given subset of columns?

I have working import code for CSV files into my 8-column table:
LOAD DATA LOCAL INFILE 'file.csv' INTO TABLE myTable
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(col1, col2, @var3, @var4, col5, col6, col7, col8)
SET
col3 = IF(@var3 = '', NULL, @var3),
col4 = IF(@var4 = '', NULL, @var4)
;
It works fine at converting empty entries to NULL values, but...
Is there any way to shorten the SET part so I don't have to specify a condition for each and every column?
I actually need this for 7 of the 8 columns above, and this particular table is rather small.

Is there any way to shorten the SET part
Yes, MySQL provides the shorthand function NULLIF():
col3 = NULLIF(@var3, '') -- etc.
so I don't have to specify a condition for each and every column?
Sadly not, although it should be fairly trivial to generate the desired SQL dynamically in your application code.
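As a sketch of that dynamic approach (in Python; the column and variable names here are illustrative, not from any real schema), the column list and SET clause can be generated from a plain list of columns:

```python
# Build the (...) column list and SET clause of a LOAD DATA statement,
# mapping empty strings to NULL via NULLIF for every nullable column.
# Nullable columns are read into user variables; the rest load directly.
plain_cols = ["col1"]
nullable_cols = ["col2", "col3", "col4", "col5", "col6", "col7", "col8"]

# The (...) list must follow the CSV's column order.
col_list = ", ".join(
    f"@var_{c}" if c in nullable_cols else c
    for c in plain_cols + nullable_cols
)
set_clause = ",\n".join(
    f"{c} = NULLIF(@var_{c}, '')" for c in nullable_cols
)

sql = (
    "LOAD DATA LOCAL INFILE 'file.csv' INTO TABLE myTable\n"
    "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
    "LINES TERMINATED BY '\\n' IGNORE 1 LINES\n"
    f"({col_list})\nSET\n{set_clause};"
)
print(sql)
```

The generated statement is then sent through whatever client library the application already uses.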

Related

sqlldr does not recognize empty tab delimited columns

I am seeing a strange problem when loading my data with sqlldr. Here is my table schema:
CREATE TABLE TEST(
"COL1" VARCHAR2 (255 BYTE),
"COL2" VARCHAR2 (255 BYTE),
"COL3" NUMBER,
"COL4" VARCHAR2 (255 BYTE)
);
and here is just one row of data I am trying to ingest from the tab delimited file test.txt:
COL1 COL2 COL3 COL4
10 17-cc
Notice that the first two columns are empty (null). So my row is really:
\t\t10\t17-cc
my loader script:
load data
infile 'test.txt'
append into table TEST
fields terminated by "\t" optionally enclosed by '"'
TRAILING NULLCOLS
(COL1,COL2,COL3,COL4)
This will be loaded into my table as:
COL1 COL2 COL3 COL4
10 17-CC (null) (null)
which is incorrect. It seems that the two leading tabs in the data row were ignored, and the COL3 value (10) was assigned to COL1. However, if I try to import the data as a comma-separated file:
COL1,COL2,COL3,COL4
,,10,17-cc
it works as expected. Why does the tab-delimited version fail here?
NOTE - Fixed my original wrong answer.
Your TAB delimiter is defined just fine. You need the NULLIF clause:
load data
infile 'test.txt'
append into table TEST
fields terminated by "\t" optionally enclosed by '"'
TRAILING NULLCOLS
(COL1 NULLIF(COL1=BLANKS),
COL2 NULLIF(COL2=BLANKS),
COL3 NULLIF(COL3=BLANKS),
COL4 NULLIF(COL4=BLANKS)
)
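As a quick sanity check (a Python sketch, not part of SQL*Loader), splitting the raw line shows the empty leading fields really are present in the data, so the collapsing happens inside sqlldr rather than in the file:

```python
import csv
import io

# The raw tab-delimited row from test.txt: two empty leading fields.
raw = "\t\t10\t17-cc\n"

# Python's csv module preserves empty fields between consecutive delimiters.
row = next(csv.reader(io.StringIO(raw), delimiter="\t"))
print(row)  # ['', '', '10', '17-cc']
```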

phpMyAdmin - manually import csv and skip columns

In phpMyAdmin, in the tab import, how to specify which columns of the csv must be skipped by the import?
For example, I have this csv:
col1 col2 col3 col4 col5
a b c x 0
1 2 3 y 1
I need to manually import that csv skipping the 4th column.
With the "CSV using LOAD DATA" format I can specify the column names; in that field I tried the following, but neither worked:
col1, col2, col3, @dummy, col5
Invalid column (@dummy) specified! Ensure that columns names are spelled correctly, separated by commas, and not enclosed in quotes.
and
col1, col2, col3, , col5
SQL query:
LOAD DATA LOCAL INFILE '/tmp/phpBlaBlaBla'
INTO TABLE `tblName` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\' LINES TERMINATED BY '\n' IGNORE 1 LINES
(`col1` , `col2` , `col3` , , `col5`)
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near `col5`)' at line 1
The phpMyAdmin version is 4.0.10.16.
Thanks in advance!

mysql infile can't access @user variable in subquery

I am trying to read a CSV file and insert rows into a table. I can insert them without any problem when I assign the value directly, but the same SQL stops working when I try to use a @user variable.
All help is appreciated.
This works:
LOAD DATA INFILE 'tmp/test.csv'
INTO TABLE T1
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@var1)
SET create_timestamp = NOW(),
col1 = (SELECT MAX(id) FROM T2 WHERE
col2 = 1234);
This doesn't work:
LOAD DATA INFILE 'tmp/test.csv'
INTO TABLE T1
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@var1)
SET create_timestamp = NOW(),
col1 = (SELECT MAX(id) FROM T2 WHERE
col2 = @var1);
I suspect that the subquery (SELECT MAX(id) FROM T2 ...) is being executed before a value is assigned to @var1.
I'm actually surprised that the first example works at all; I've never tried running a subquery in a LOAD DATA.
When you say "This doesn't work" do you mean that the LOAD DATA throws an error? Or do you mean that the LOAD DATA completes successfully, but with unexpected results?
In the latter case, I'd recommend a simple test: do a separate SET @var1 = 1234; statement before the LOAD DATA, and see what happens for the first row.
If it's throwing an error, it may be that a subquery isn't supported in that context. (We'd need to consult the MySQL Reference Manual, to see if that is supported.)
Those are my guesses: 1) an unexpected order of operations (@var1 is evaluated before the value from the row is assigned to it), or 2) a subquery isn't valid in that context.
EDIT
According to the MySQL 5.7 Reference Manual:
http://dev.mysql.com/doc/refman/5.7/en/load-data.html
It looks like a subquery is supported in the context of a LOAD DATA statement.
I'm having the same problem. Part of my issue was that, using your table definition as an example, T1.col1 is declared NOT NULL, but the expression "select max(id) from T2 where col2 = @var1" returned nothing, so MySQL attempted to assign a NULL value to col1. That didn't solve everything, but it was a start.
Here's what I'm working with:
LOAD DATA LOCAL INFILE 'streets.csv'
INTO TABLE streets
CHARACTER SET latin1
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(pre_dir, name, suffix, post_dir, @community, @state)
SET community_id = (SELECT community_id FROM communities_v WHERE community_name = @community AND state_abbr = @state);
EDIT: I had originally put a bunch of other stuff here that I thought was related to the problem, but, as it turns out, wasn't. It really was a data problem: the subquery I'm using had certain combinations of @community and @state with no value for community_id, so I would double-check your own subquery to see if it returns a value in all cases.

How to recognize duplicate row when using "load data infile"

I'm trying to load data from a CSV file into my database. However, rows need to be unique; two rows are considered the same if all their columns contain equal data.
My current approach is to add a new column that contains the MD5 sum of all the other columns. To this end I created a stored procedure (using information_schema.columns) that returns a string with all the columns except the one that will hold this MD5 sum (uniqueIdentifier).
The code for importing the data now looks as follows (I want to keep it flexible so that, at a later stage, I can apply it to other files as well):
call select_all_exclude_one('vegas', 'uniqueIdentifier', @exclude_fields);
set @file_input = 'C:/MRTK_Enigma_IRD_VegaOSWP_New_Format.csv';
set @field_terminate = '|';
set @line_terminate = '\\n';
set @date_format = '%Y%m%d %H:%i:%s';
set @columns_input = 'deskCode, bookName, riskType, riskTypeShiftSizeInBP, productCode, currency,
maturity, maturityUnderlying, riskValue, currencyRiskValue,
issuerCategory, countryOfIssuer, ratingCategory, postDate,
@the_date, strike, currencyBase, indexCategory, EOL';
set @sql = concat('LOAD DATA INFILE ''', @file_input, '''
IGNORE
INTO TABLE skewrisk.vegas
FIELDS TERMINATED BY ''', @field_terminate, '''
LINES TERMINATED BY ''', @line_terminate, '''
IGNORE 1 LINES \n(', @columns_input,
')\nset lastUpdate = str_to_date(@the_date, ''', @date_format, '''),
uniqueIdentifier = MD5(concat(', @exclude_fields, '))');
select @sql;
prepare stmt from @sql;
execute stmt;
At the end of the code, the column uniqueIdentifier (which is marked as primary key) is set to contain the MD5 sum of all the columns except itself.
However, when running this code I get the following error:
Action: prepare stmt from @sql
Message: Error Code: 1295. This command is not supported in the prepared statement protocol yet
Questions:
1) Is there a simpler approach to what I'm trying?
2) If not, how can this be solved?
Define a UNIQUE constraint over all of the columns:
ALTER TABLE skewrisk.vegas ADD UNIQUE (
deskCode, bookName, riskType, riskTypeShiftSizeInBP, productCode, currency,
maturity, maturityUnderlying, riskValue, currencyRiskValue,
issuerCategory, countryOfIssuer, ratingCategory, postDate,
strike, currencyBase, indexCategory, EOL, lastUpdate
);
Then use LOAD DATA IGNORE:
If you specify IGNORE, input rows that duplicate an existing row on a unique key value are skipped.
NB: if the file contains the table's primary key, this shouldn't be necessary (as the PK is necessarily unique).
I had a similar problem and concatenated the row's values into the primary key column. No single column value is unique, but the combination of the date and the values of a couple of columns gave me a unique index value, so if anyone tried to import those values again, the duplicate would be rejected:
LOAD DATA LOCAL INFILE 'file'
INTO TABLE table_name
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\\r\\n'
(unique_id, date, col1, col2, ... )
SET unique_id = CONCAT(date,col1,col2)
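The MD5-of-all-columns idea from the question can also be applied outside MySQL before loading; here is a Python sketch (the file contents and column names are made up for illustration) that drops duplicate CSV rows by hashing the concatenation of their values:

```python
import csv
import hashlib
import io

# Sample pipe-delimited data standing in for the real file.
data = "deskCode|bookName|riskValue\nNY1|alpha|12.5\nNY1|alpha|12.5\nLN2|beta|7.0\n"

seen = set()
unique_rows = []
for row in csv.reader(io.StringIO(data), delimiter="|"):
    # Hash the concatenation of all column values, like MD5(CONCAT(...)).
    key = hashlib.md5("".join(row).encode("utf-8")).hexdigest()
    if key not in seen:  # skip rows whose hash was already seen
        seen.add(key)
        unique_rows.append(row)

print(len(unique_rows))  # header row + 2 distinct data rows = 3
```

The deduplicated rows can then be written back out and loaded with a plain LOAD DATA, with no prepared-statement restriction to work around.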

string to timestamp mysql Error #1411

I'm trying to convert timestamps from string to datetime on the fly when importing a CSV file into MySQL, but I am getting a #1411 - Incorrect datetime value: '2007-03-30 16:01:15' for function str_to_date error.
The SQL:
load data infile 'C:/ProgramData/MySQL/MySQL Server 5.5/data/testfile.csv'
into table test
fields terminated by ','
lines terminated by '\n'
(date, col1,col2,col3,col4)
SET
date = str_to_date(date,'%Y.%m.%d %H:%i:%s.%f');
All rows in the .csv are formatted like this:
2007.03.30 16:01:15.901,117.53,117.55,35600000,43700000
I've applied
SELECT str_to_date(date, '%Y.%m.%d %H:%i:%s.%f') FROM test
to sample data that was already stored in MySQL, and it did work.
The target column date is of type DATETIME.
You need to go via a user variable. As the manual says:
The column list can contain either column names or user variables. With user variables, the SET clause enables you to perform transformations on their values before assigning the result to columns.
User variables in the SET clause can be used in several ways. The following example uses the first input column directly for the value of t1.column1, and assigns the second input column to a user variable that is subjected to a division operation before being used for the value of t1.column2:
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, @var1)
SET column2 = @var1/100;
In your case:
LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 5.5/data/testfile.csv'
INTO TABLE test
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(@date, col1, col2, col3, col4)
SET date = STR_TO_DATE(@date, '%Y.%m.%d %H:%i:%s.%f');
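As a quick way to sanity-check the format string against a sample value (a Python sketch; note that strptime's directives differ slightly from MySQL's, with minutes as %M and seconds as %S instead of %i and %s):

```python
from datetime import datetime

# The sample row's timestamp, in the layout STR_TO_DATE is given.
raw = "2007.03.30 16:01:15.901"

# MySQL '%Y.%m.%d %H:%i:%s.%f' corresponds to Python strptime's
# '%Y.%m.%d %H:%M:%S.%f'.
ts = datetime.strptime(raw, "%Y.%m.%d %H:%M:%S.%f")
print(ts.isoformat(sep=" "))  # 2007-03-30 16:01:15.901000
```

If the sample parses here, the remaining work is only translating the directives back into MySQL's STR_TO_DATE syntax.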