date conversion in csv to mysql db format - mysql

I am importing a csv file into a mysql db using the LOAD DATA INFILE syntax.
The date in the csv is in the 2009/10/31 7:8:57.0 format. Is there any way to convert this while loading to something like 2009-10-31 07:08:57 ?

Use STR_TO_DATE(datefromcsv, '%Y/%m/%d %H:%i:%s.%f') when you are doing the insert into the db; MySQL will store the result as 2009-10-31 07:08:57. Wrap it in DATE_FORMAT(..., '%Y-%m-%d %H:%i:%s') if you need the formatted string rather than a DATETIME value.

(Usual caveats apply here.) A regular expression might be what you need: substitute / with - and remove the trailing .0.
I am surprised, though, that MySQL can't handle dates like the one you provided. See for example the MySQL manual. Have you tried feeding it to MySQL and seeing what happens?
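A minimal sketch of doing the conversion inside the LOAD DATA INFILE statement itself, assuming a table named events with an id column and a created_at DATETIME column (the table, column, and path names are placeholders; the format string matches the 2009/10/31 7:8:57.0 sample above):
LOAD DATA INFILE '/path/to/data.csv'
INTO TABLE events
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, @raw_date)
SET created_at = STR_TO_DATE(@raw_date, '%Y/%m/%d %H:%i:%s.%f');
The raw field is read into the user variable @raw_date, parsed with the format the file actually uses, and MySQL then stores it in its native YYYY-MM-DD HH:MM:SS form.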

Related

In MYSQL, how to upload a csv file that contains a date in the format of '1/1/2020' properly into a DATE data type format (standard YYYY-MM-DD)

I have a column of data, let's call it bank_date, that I receive from an external vendor as a csv file every day. As such the dates in that column show as '1/1/2020'.
I am trying to upload that raw csv file directly to SQL daily. We used to store the SQL bank_date format as text, but we have converted it to a DATE data type, and now it keeps zero'ing out every time, with some sort of truncate / "datetime value incorrect" error.
I have now tested 17 different versions of utilizing STR_TO_date (mostly), CAST, and CONVERT, and feel like I'm close, but I'm not quite getting the syntax right.
Also for reference, I did find 2 other workarounds that are successful, but my boss specifically wants it uploaded and converted directly through the import process (not manipulating the raw csv data) for safety reasons. For reference:
Workaround 1: Convert csv date column to the YYYY-MM-DD format and save file. The issue with this is that if you try to open that CSV file again, it auto-changes the date format back to the standard mm/dd/yyyy. If someone doesn't know to watch out for this and is re-opening the csv file to double check something, they're gonna find an error when they upload, and the problem is not easy to identify.
Workaround 2: Create an extra dummy_date column in the table that is formatted as a text data type and upload as normal. Then copy the data into the correct bank_date column using a str_to_date function as follows: UPDATE bank_table SET bank_date = STR_TO_DATE(dummy_date, '%c/%e/%Y'); The issue with this is that it creates extra unnecessary data that can cause confusion when other people don't know that one of the columns is not intended for querying.
Here is my current code:
USE database_name;
LOAD DATA LOCAL INFILE 'C:/Users/Shelly/Desktop/Date Import.csv'
INTO TABLE bank_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
(bank_date, bank_amount)
SET bank_date = str_to_date(bank_date,'%Y-%m-%d');
The "SET" line is what I cannot work out on syntax to convert a csv's 1/5/2020' to SQL's 2020-1-5 format. Every test I've made either produces 0000-00-00 or nulls the column cells. I'm thinking maybe I need to tell SQL how to understand the csv's format in order for it to know how to convert it. Newbie here and stuck.
You need to specify the format of the date as it appears in the file, not the "required" target format:
SET bank_date = str_to_date(bank_date,'%c/%e/%Y');
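A minimal sketch of the full statement with that fix applied, reading the raw value into a user variable first (which is the documented LOAD DATA pattern for transforming a field before it reaches the column; everything else follows the question's statement):
USE database_name;
LOAD DATA LOCAL INFILE 'C:/Users/Shelly/Desktop/Date Import.csv'
INTO TABLE bank_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
(@raw_bank_date, bank_amount)
SET bank_date = STR_TO_DATE(@raw_bank_date, '%c/%e/%Y');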

LOAD DATA FROM S3 command failing because of timestamp

I'm running the "LOAD DATA FROM S3" command to load a CSV file from S3 to Aurora MySQL. The command works fine if run it in the Mysql Workbench (it gives me the below exception as warnings though but still inserts the dates fine), but when I run it in Java I get the following exception:
com.mysql.cj.jdbc.exceptions.MysqlDataTruncation:
Data truncation: Incorrect datetime value: '2018-05-16T00:31:14-07:00'
Is there a workaround? Is there something I need to setup on the mysql side or in my app to make this transformation seamless? Should I somehow run a REPLACE() command on the timestamp?
Update 1:
When I use REPLACE to remove the "-07:00" from the original timestamp (2018-05-16T00:31:14-07:00), it loads the data appropriately. Here's my load statement:
LOAD DATA FROM S3 's3://bucket/object.csv'
REPLACE
INTO TABLE sample
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@myDate)
SET `created-date` = replace(@myDate, '-07:00', ' ');
For obvious reasons it's not a good solution. Why would the LOAD statement work in MySQL Workbench and not in my Java code? Can I set some parameter to make it work? Any help is appreciated!!
The way I solved it is by using MySQL's SUBSTRING function in the 'SET' part of the LOAD DATA query (instead of the 'replace'):
SUBSTRING(@myDate, 1, 10)
This way the trailing '-07:00' was removed. (I actually opted to keep only the date, since I didn't need the time, but the same approach works for full timestamps as well.)
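A minimal sketch of a SET clause that keeps the full time but strips the offset, parsing the first 19 characters with an explicit format (this assumes created-date is a DATETIME column and reuses the variable name from the question):
SET `created-date` = STR_TO_DATE(SUBSTRING(@myDate, 1, 19), '%Y-%m-%dT%H:%i:%s');
If the -07:00 offset actually matters, the parsed value would still need to be shifted (for example with CONVERT_TZ), which assumes the offset is known and consistent across the file.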

MySQL to GeoMesa through .csv

I have a MySQL table whose data I have to export to .csv and then ingest this .csv to GeoMesa.
My MySQL table structure is like below (shown as a screenshot in the original post):
Now, as you can see, the the_geom attribute of the table has data type POINT, and in the database it is stored as a BLOB.
Now I have two problems:
When I export the MySQL data into a .csv file, the csv file shows (...) for the the_geom attribute instead of any binary or text representation that would allow it to be ingested into GeoMesa. So, how do I overcome this?
The csv file also shows # for any attribute with a datetime datatype, but if you widen the column the date-time can be seen. (My question is: will this cause a problem in GeoMesa?)
For #1, MySQL's export does not automatically convert the POINT datatype into text for you. You might need to call a conversion function such as AsWKT (ST_AsText in newer MySQL versions) to output the geometry as Well-Known Text. The WKT format can be used by GeoMesa to read in the Point data.
For #2, I think you'll need to do the same for the date field. Check out the date and time functions.
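A minimal sketch of an export that applies both conversions, assuming a table named my_table with id, the_geom, and created_at columns (the names and output path are placeholders) and using SELECT ... INTO OUTFILE:
SELECT id,
       ST_AsText(the_geom),
       DATE_FORMAT(created_at, '%Y-%m-%dT%H:%i:%s')
INTO OUTFILE '/tmp/geomesa_export.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM my_table;
The geometry comes out as Well-Known Text (e.g. POINT(77.59 12.97)) and the date as an ISO-8601-style string, both of which a GeoMesa converter can be configured to parse.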

Unable to import 3.4GB csv into redshift because values contains free-text with commas

And so we found a 3.6GB csv that we have uploaded onto S3 and now want to import into Redshift, then do the querying and analysis from iPython.
Problem 1:
This comma-delimited file contains free-text values that themselves contain commas, and this interferes with the delimiting, so we can't upload it to Redshift.
When we tried opening the sample dataset in Excel, Excel surprisingly puts them into columns correctly.
Problem 2:
A column that is supposed to contain integers has some records containing letters to indicate some other scenario.
So, the only way to get the import through is to declare this column as varchar. But then we can't do calculations on it later.
Problem 3:
The datetime data type requires the date time value to be in the format YYYY-MM-DD HH:MM:SS, but the csv doesn’t contain the SS and the database is rejecting the import.
We can’t manipulate the data on a local machine because it is too big, and we can’t upload onto the cloud for computing because it is not in the correct format.
The last resort would be to scale the instance running iPython all the way up so that we can read the big csv directly from S3, but this approach doesn’t make sense as a long-term solution.
Your suggestions?
Train: https://s3-ap-southeast-1.amazonaws.com/bucketbigdataclass/stack_overflow_train.csv (3.4GB)
Train Sample: https://s3-ap-southeast-1.amazonaws.com/bucketbigdataclass/stack_overflow_train-sample.csv (133MB)
Try using a different delimiter or escape characters.
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_preparing_data.html
For the second issue, if you want to extract only the numbers from the column after loading it as char, use regexp_replace or other string functions.
For the third issue, you can also load it into a VARCHAR field and then use
cast(left(column_name, 10)||' '||right(column_name, 6)||':00' as timestamp)
to load it into the final table from the staging table.
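A hedged sketch of a COPY command that addresses the quoting and the missing-seconds issues together, assuming the free-text fields are double-quoted in the file and the timestamps look like 2016-01-05 13:45 (the table name and IAM role are placeholders):
COPY stack_overflow_train
FROM 's3://bucketbigdataclass/stack_overflow_train.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
FORMAT AS CSV
TIMEFORMAT 'YYYY-MM-DD HH:MI';
CSV mode honours quoted fields, so embedded commas stop splitting the columns, and TIMEFORMAT tells Redshift how to parse timestamps without seconds. If the free text is not actually quoted in the file, the data has to be re-exported or pre-processed first.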
For the first issue, you need to find a way to differentiate between the two types of commas - the delimiter and the text commas. Once you have done that, replace the delimiters with a different delimiter and use that as the delimiter in the COPY command for Redshift.
For the second issue, you need to first figure out whether this column needs to be available for numerical aggregations once loaded. If yes, you need to get this data cleaned up before loading. If no, you can directly load it as a char/varchar field. All your queries will still work, but you will not be able to do any aggregations (sum/avg and the like) on this field.
For problem 3, you can use the TEXT(date, "yyyy-mm-dd hh:mm:ss") function in Excel to do a mass replace for this field.
Let me know if this works out.
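For the integer column, a hedged sketch of the regexp_replace cleanup suggested earlier, assuming it was loaded as varchar under the placeholder name raw_amount; rows that contain only letters become NULL instead of failing the cast:
SELECT CAST(NULLIF(REGEXP_REPLACE(raw_amount, '[^0-9]', ''), '') AS INTEGER) AS amount
FROM stack_overflow_train;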

MySQL CSV Import - date entered as 0000-00-00 00:00:00 if timestamp has milliseconds?

I currently have a large number of CSVs to import to a MySQL database. The files contain timestamps for each record, which are in the format (for example):
2011-10-13 09:36:02.297000000
I am aware of the MySQL bug #8523, which indicates that storing milliseconds in a datetime field is not supported. Despite this, I would have expected the datetime field to truncate the record after the seconds, instead of being entered as blank.
I have narrowed down the problem to the milliseconds (as opposed to the formatting of the csv etc.), since
2011-10-13 09:36:02
imports correctly.
Could anyone suggest a way that I can get this data imported without zeros? I have too many CSVs to go into each manually and adjust the length/formatting of the timestamps.
I should point out that while milliseconds would be a nice-to-have, they are not necessary to my application, so I would be happy with a solution that allows me to easily truncate the numbers and import them.
Thanks!
EDIT: To clarify, I am importing the CSVs using the following command:
mysqlimport --fields-enclosed-by="" --fields-terminated-by="," --lines-terminated-by="\n" --columns=id,#x,Pair,Time -p --local gain [file].csv
This is very fast for importing the records - I have around 50m to import, so reading each line in is not a great option.
I don't know how you are importing the CSVs, but the way I would do it is to write a script (php/perl) to read each file, round or trim the timestamp to seconds, and execute INSERT statements on the database.
Something like
<?php
// Read the CSV line by line, trim each timestamp to whole seconds, then INSERT the row.
$file = fopen("your.csv", "r");
mysql_connect($ip, $user, $pass);
mysql_select_db("gain");
while (!feof($file))
{
    $line = explode(',', fgets($file));
    if (count($line) < 2) continue; // skip blank or malformed lines
    // substr(..., 0, 19) keeps 'YYYY-MM-DD HH:MM:SS' and drops the fractional seconds
    mysql_query("INSERT INTO TABLE1 (ID, DATE) VALUES ("
        . intval($line[0]) . ", '" . substr(trim($line[1]), 0, 19) . "')");
}
fclose($file);
?>
Execute this from the command line and it should do the job
It will not import with the milliseconds, but it does import without them. So you have to substring one way or another. MySQL has various string functions, such as SUBSTRING, which you could use, since you need to cut off the milliseconds at exactly the same position each time.
However, you would use this when performing the query. If you cannot modify the query, because it is automated in some way, you can add a step to the process and change the data first, then add it to your database. A simple script would be able to read the csv, change it, and write it back again, or perform the query directly.
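Given the roughly 50 million rows, a hedged alternative to a row-by-row script is to replace the mysqlimport call with an equivalent LOAD DATA LOCAL INFILE statement, which supports a SET clause that can trim the value during the load (mysqlimport itself has no such transformation option). This is only a sketch: the table name and file path are placeholders (mysqlimport derives the table from the file name), and the column/variable layout mirrors the --columns list from the question:
LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE gain.your_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, @x, Pair, @raw_time)
SET `Time` = STR_TO_DATE(SUBSTRING(@raw_time, 1, 19), '%Y-%m-%d %H:%i:%s');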