Reformat input DATE data - mysql

I have a rather large (over 200,000 records) file that I am inserting into a table with 59 columns. The data contains several DATETIME fields. The input dates are in the form '10/06/2019 10:45:58'. How can I input these fields as DATETIME (or maybe just DATE, since the TIME field is irrelevant for my purposes)?
If I set the various date fields to type DATETIME, the fields come up as 0000-00-00 00:00:00 after loading. That's because the database does not know what to do with the input format.
I see two different approaches, but each has issues:
Preprocessing:
I could create a script to detect the date fields with a regex and reformat them to the expected form, using something like Perl's DateTime::Format::DBI. There is a risk here, since the records include freeform TEXT fields that may contain embedded commas and quotes; positively identifying the DATE fields is difficult simply because of the scale.
Post-processing:
Create the table with the date fields as VARCHAR and use the STR_TO_DATE SQL function to populate the date columns.
INSERT INTO mytable (DATELastDetected, DATEFirstDetected)
SELECT STR_TO_DATE(LastDetected, '%c/%e/%Y %H:%i'),
       STR_TO_DATE(FirstDetected, '%c/%e/%Y %H:%i')
FROM mytable;
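If the raw VARCHAR columns and the target DATETIME columns live in the same table, an in-place UPDATE is probably closer to the intent than an INSERT ... SELECT (which would append new rows rather than fill the existing ones). A rough sketch, reusing the column names above and including seconds since the sample value carries them:
-- sketch only: assumes mytable holds both the VARCHAR source columns and the DATETIME targets
UPDATE mytable
SET DATELastDetected = STR_TO_DATE(LastDetected, '%c/%e/%Y %H:%i:%s'),
    DATEFirstDetected = STR_TO_DATE(FirstDetected, '%c/%e/%Y %H:%i:%s');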
Third Option?
I've wondered whether I might specify the expected format of the input data for the DATE columns in the CREATE TABLE statement, which would render the whole discussion moot. I've seen another question that mentions the use of DATEFORMAT in a CREATE TABLE statement, but I have not found the right syntax to use.
Appreciate any thoughts.

@ben-personick answered it with his comment. Here's what my Load statement looks like:
LOAD DATA INFILE '/opt/mysql/work/report.csv'
INTO TABLE `my_db`.`tbl_reportdata`
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(`IP`,
[...]
`OS`,
@FirstDetectedVar, # This field is defined as DATETIME
@LastDetectedVar,  # This field is defined as DATETIME
[...]
`Category`)
set
`FirstDetected` = str_to_date(@FirstDetectedVar, '%m/%d/%Y %H:%i'),
`LastDetected` = str_to_date(@LastDetectedVar, '%m/%d/%Y %H:%i');
I figured the answer was out there. Hopefully this working example will help someone else.
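If it helps anyone verify a load like this, a quick sanity check (assuming the table and column names above) is to look at the parsed date range and the row count:
-- sanity check after the load
SELECT MIN(`FirstDetected`) AS earliest,
       MAX(`LastDetected`) AS latest,
       COUNT(*) AS rows_loaded
FROM `my_db`.`tbl_reportdata`;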

Related

Dates not importing correctly

I know similar questions have been posted before, but when I try to follow similar approaches as per the suggestions in the comments, it simply does not help. My query is the following:
LOAD DATA INFILE 'File.txt'
IGNORE
INTO TABLE table_name
FIELDS TERMINATED BY '^~'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
(RUN_DATE, PROC_DT, STL_DT, TRD_DT)
SET RUN_DATE = STR_TO_DATE(RUN_DATE, '%d-%b-%y');
The records in the file look something like this:
RUN_DATE^~PROC_DT^~STL_DT^~TRD_DT
21-DEC-20^~23-DEC-20^~23-DEC-20^~21-DEC-20
The dates that get loaded are all populated as '0000-00-00 00:00:00', which I know are the default values when there is a datatype error and IGNORE is used. From what I found online, the issue has to do with the in-file date not being in the yyyy-mm-dd format that is MySQL's default, but the '%d-%b-%y' in the STR_TO_DATE function should alleviate this issue, since:
%d: day of the month as a numeric value (01 to 31)
%b: abbreviated month name (Jan to Dec)
%y: year as a numeric, two-digit value
Why is this not helping? I also tried making the months lower case using LOWER() thinking maybe the abbreviated months needed to be all lower case, but this produces the same result. What am I missing here?
To read from the file but store a modified value, you need to use variables:
LOAD DATA INFILE 'File.txt'
IGNORE
INTO TABLE table_name
FIELDS TERMINATED BY '^~'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
(@RUN_DATE, @PROC_DT, @STL_DT, @TRD_DT)
SET RUN_DATE = STR_TO_DATE(@RUN_DATE, '%d-%b-%y'),
    PROC_DT = STR_TO_DATE(@PROC_DT, '%d-%b-%y'),
    STL_DT = STR_TO_DATE(@STL_DT, '%d-%b-%y'),
    TRD_DT = STR_TO_DATE(@TRD_DT, '%d-%b-%y');
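As a side check, the format string itself parses the sample value fine, which is consistent with the diagnosis above: the zeros came from loading the raw field straight into the column before SET could run.
SELECT STR_TO_DATE('21-DEC-20', '%d-%b-%y');
-- returns 2020-12-21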

How to use UTC_TIMESTAMP() in csv file

I need a way to use the UTC_TIMESTAMP() function in a CSV file.
Currently I'm using the following Syntax;
LOAD DATA INFILE '/path/to/file.csv'
INTO TABLE my_table FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
I'm not quite sure what to do to keep the UTC_TIMESTAMP() from being enclosed in quotes. When entered in the database, I get the following result
"0000-00-00 00:00:00"
This was the only example I could find on Stack Overflow or on Google for converting a string to a MySQL value.
MySQL CSV import: datetime value
I solved the problem by following the MySQL documentation on this page.
https://dev.mysql.com/doc/refman/5.7/en/load-data.html
About halfway down the page there is a section that shows you how to read fields into variables and then set table columns equal to native MySQL functions or to values assigned to those variables (whichever you choose).
The example in the documentation that I'm referring to looks like this.
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, column2)
SET column3 = CURRENT_TIMESTAMP;
I fixed my problem by restructuring my code like this...
LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE veh_icodes FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, vcode, year, make, model, body_style, tran_type, engine_cap, drive_train, doors, trim, created_by, updated_by, @created_at, @updated_at, active)
SET created_at = CURRENT_TIMESTAMP, updated_at = CURRENT_TIMESTAMP;
I hope this helps someone. :)
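Since the original goal was UTC specifically, the same pattern should also work with UTC_TIMESTAMP() in place of CURRENT_TIMESTAMP, because the SET clause accepts arbitrary expressions. A minimal sketch with hypothetical table and column names:
LOAD DATA INFILE '/path/to/file.csv'
INTO TABLE my_table FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(col1, col2)
SET created_at_utc = UTC_TIMESTAMP(); -- server-side UTC time, not read from the file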

MySQL: values are not correctly imported from yyyymmdd to date variable, using str_to_date

Here is my code:
CREATE TABLE A
(`ID` INT NULL,
`DATE` DATE NULL,
`NUM` INT NULL
);
LOAD DATA LOCAL INFILE "fakepath/file.csv"
INTO TABLE A
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(ID,DATE,NUM)
SET
DATE = str_to_date(@DATE, '%Y%m%d');
The original data in the csv file looks like this: 20160101, 20160102, 20160103 (the dates differ). After I execute the code, all the dates in the DATE column of table A end up as a single value, such as 2016-01-02.
Why does this happen? I have another table that uses the same code (with a different column name).
How can I fix it? Thank you!
You have to tell MySQL to load the value from the CSV into a variable first:
LOAD DATA LOCAL INFILE "fakepath/file.csv"
[..snip..]
IGNORE 1 LINES
(ID,DATE,NUM)
^---table field, **NOT** a variable
SET
DATE = str_to_date(@DATE, '%Y%m%d');
^---variable never gets populated
Try
(ID, @DATE, NUM)
^--note this
instead. That'll load the id/num values directly into the table but put your date value into the variable, which you can then use in the SET portion of the query.
The fact that you actually get a date value put into the table with a proper date format indicates that somewhere else, in previous code, you did set a @DATE variable, and it's simply being re-used in this query. But since you don't CHANGE that variable's value in this query, you end up using the SAME date value for all records.
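Putting the pieces together, the corrected statement from the question would presumably look like this:
LOAD DATA LOCAL INFILE "fakepath/file.csv"
INTO TABLE A
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(ID, @DATE, NUM) -- read the date field into a variable
SET DATE = str_to_date(@DATE, '%Y%m%d'); -- then convert it into the column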

Importing from oracle to mysql using csv

I have a CSV file (12 GB) which was exported from an Oracle database, formatted like this:
6436,,N,,,,,,,,,,,,04/01/1999,04/01/1999,352,1270,1270,406,406,1999,1,31/01/1999,0,88,0,A,11/12/2005,N,0,11/12/2005,,,,1270,1,0,,2974,,,,,,,,,,,,,,,,,,,,,,,,
As you can see, it has a lot of null values (mostly in integer columns), and when I import it into a MySQL database, it fills the null values with zeros, like this:
6436,0,0,,0,0,0,0,0,0,0,0,0,0,4,04/01/1999,04/01/1999,1270,1270,406,406,1999,1,31,31/01/1999,88,0,A,11/12/2005,N,0,11/12/2005,0,0,0,1270,1,0,,2974,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,0,0,00/00/0000,0,,0,0
What is the real issue here?
Thanks.
I figured it out with the help of @Marc B's comment: empty, unquoted fields get implicitly converted to 0 when loaded straight into numeric columns, so they have to be read into variables and turned into NULL explicitly. I wrote something like this:
LOAD DATA LOCAL INFILE '/path/data.csv' INTO TABLE table_name
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(columns with @)
SET
column = IF(length(@column) = 0, null, @column),
date = str_to_date(@date, '%d/%m/%Y');
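NULLIF is a slightly more compact way to express the same empty-string check; a sketch with the same placeholder names:
SET
column = NULLIF(@column, ''), -- empty string becomes NULL
date = str_to_date(@date, '%d/%m/%Y');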

Modify column before inserting XML value to MySQL table

I'm trying to import an XML file into a MySQL table. In the XML file there is a timestamp in <CurrentTime> in the following format:
2016-01-26T09:52:19.3420655+01:00
This timestamp should go into the corresponding DATETIME column CurrentTime in my table. So I did the following:
LOAD XML INFILE 'xxx.xml'
INTO TABLE test.events
ROWS IDENTIFIED BY '<Event>'
SET CurrentTime = str_to_date(CurrentTime, '%Y-%m-%dT%H:%i:%s.%f');
But it quits with the error
Error Code: 1292. Incorrect datetime value: '2016-01-25T16:22:24.1840792+01:00' for column 'CurrentTime' at row 1
So it seems it doesn't convert the string at all. Why?
I think that error is thrown when the string value from the file is loaded directly to the column. The error is thrown before you get to the SET clause.
Here's an abbreviated example of how to use user-defined variables to pass the value of a field down to the SET, bypassing the assignment to the column.
Note that the columns _row and account_number are populated directly from the first two fields in the file. The later fields in the file are assigned to user-defined variables (identifiers beginning with @).
The SET clause evaluates the user-defined variables, and assigns the result of the expression to the actual column in the table.
In this example, the "dates" were formatted YYYYMMDD. I used the STR_TO_DATE() function to have that string converted to a proper DATE.
I abbreviated this sample somewhat, but it demonstrates the approach of reading field values into user-defined variables.
CREATE TABLE _import_water
(`_row` INT
,`account_number` VARCHAR(255)
,`total_due` DECIMAL(18,2)
,`end_date` DATE
,`start_date` DATE
,`ccf` DECIMAL(18,4)
);
LOAD DATA LOCAL INFILE '//server/share$/users/me/mydir/myfile.csv'
INTO TABLE _import_water
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(_row
,account_number
,@total_due
,@end_date
,@start_date
,@ccf
)
SET `total_due` = NULLIF(@total_due,'')
  , `end_date` = STR_TO_DATE(@end_date,'%Y%m%d')
  , `start_date` = STR_TO_DATE(@start_date,'%Y%m%d')
  , `ccf` = NULLIF(@ccf,'');
Also, it doesn't look like there's any problem with your STR_TO_DATE; it seems to evaluate just fine.
testing...
SELECT STR_TO_DATE('2016-01-25T16:22:24.1840792+01:00','%Y-%m-%dT%H:%i:%s.%f') AS mydatetime
returns:
mydatetime
--------------------------
2016-01-25 16:22:24.184079