I have a MySQL script that, on a weekly basis, imports a large dataset (~250,000 records) into a database. This table has 85 data fields, of which 18 are DATETIME fields. For each of these 18 date fields, the script must run the following commands:
ALTER TABLE InputTable
ADD COLUMN EnrollmentDate DATETIME
AFTER EnrollmentDateRAW;
UPDATE InputTable
SET EnrollmentDate = STR_TO_DATE(EnrollmentDateRAW, '%c/%e/%Y %l:%i:%s %p')
WHERE EnrollmentDateRAW > '';
ALTER TABLE InputTable
DROP EnrollmentDateRAW;
Of course, in an effort to optimize the script, it actually uses a single ALTER statement that adds all the DATETIME columns, and another single ALTER statement that removes the RAW data fields after conversion.
As you can probably imagine, running the conversion eighteen times on a quarter million records takes quite a bit of time. My question is: is there a way to have the import itself convert the dates, instead of running the conversion after the import?
My advice? Leave both columns in there to avoid painful schema changes if this is a regular thing.
You could also try fixing the data before you import it, so that it is in the correct date format to begin with. Ideally this is the ISO 8601 format, YYYY-MM-DD HH:MM:SS.
If you're pulling in via a CSV, this is usually easy to fix as a pass before doing your LOAD DATA step.
You could also stage your data in a temporary table, alter it as necessary, then merge that data into the main set only when all the operations are complete. Altering the schema of a 250K row table isn't that bad. For millions of rows it can be brutal.
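A minimal sketch of that staging flow, assuming a permanent table named MainTable and a weekly CSV extract; every name other than EnrollmentDateRAW/EnrollmentDate (taken from the question) is a placeholder, as are the file name and delimiters:
-- Stage the raw rows, then convert while merging into the permanent table
CREATE TABLE InputStaging (
    Id INT,
    EnrollmentDateRAW VARCHAR(30)
    -- ...remaining raw columns...
);

LOAD DATA LOCAL INFILE 'weekly_extract.csv'
INTO TABLE InputStaging
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
IGNORE 1 LINES;

INSERT INTO MainTable (Id, EnrollmentDate /* , ... */)
SELECT Id,
       STR_TO_DATE(NULLIF(EnrollmentDateRAW, ''), '%c/%e/%Y %l:%i:%s %p')
FROM InputStaging;

DROP TABLE InputStaging;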
I have created a connection to Cloud SQL and used EXTERNAL_QUERY() to export the data to BigQuery. My problem is that I do not know a computationally efficient way to export a new day's data, since the Cloud SQL table is not partitioned; it does, however, have a date column, date_field, but it is of the datatype char.
I have tried running the following query, with a view to scheduling a similar one so that it inserts the results: SELECT * FROM EXTERNAL_QUERY("connection", "SELECT period FROM table where date_field = cast(current_date() as char);") but it takes very long to run, whereas SELECT * FROM EXTERNAL_QUERY("connection", "SELECT period FROM table where date_field = '2020-03-20';") is almost instant.
Firstly, it’s highly recommended to convert the ‘date_field’ column to the datatype DATE. This would improve simplicity and performance in the future.
When comparing two strings, MySQL will make use of indexes to speed up the query. This works when the string is defined literally, as in ‘2020-03-20’ for example. When casting the current date to a string, it's possible that the character sets used in the comparison aren't the same, so indexes can't be used.
You may want to check the character set of current_date() once it has been cast, compared to the values in the ‘date_field’ column. You could then use this command instead of CAST:
CONVERT(current_date() USING enter_char_sets_here)
Here is the documentation for the different casting functions.
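For example, a rough sketch of the scheduled query with CONVERT in place of CAST; the connection name is kept from the question, and utf8mb4 is only a placeholder character set that should be matched to whatever date_field actually uses:
SELECT *
FROM EXTERNAL_QUERY(
  "connection",
  "SELECT period FROM table WHERE date_field = CONVERT(CURRENT_DATE() USING utf8mb4);"
);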
I have a CSV file with lots of different columns, with one column having a date which is in DD-MM-YYYY format. MySQL's default format is YYYY-MM-DD. I was planning to load my CSV file directly into my MySQL table, but this may cause a problem. What should I do? PS - I am not planning to run code from some other language on the file, so I would appreciate solutions that use MySQL itself.
If you are able to perform multiple steps you could first import the csv into an intermediate table which has the same properties as your real data table, except for the date columns.
Then you can just insert it into the real table and format the date string while selecting:
INSERT INTO realTable (/* ... */)
SELECT /* ... */, STR_TO_DATE('01-5-2013', '%d-%m-%Y')
FROM intermediateTable;
When you're done, truncate the intermediate table. This should all be done in a transaction so you can roll back if something goes wrong.
edit:
There seems to be a better way: https://stackoverflow.com/a/4963561/3595565. With that approach you should be able to format the date while importing by using a variable, along these lines:
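A minimal sketch, assuming a CSV named data.csv with a header row, comma-separated fields, and the raw date in the third column; the table and column names (realTable, date_col) are placeholders:
LOAD DATA LOCAL INFILE 'data.csv'
INTO TABLE realTable
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
IGNORE 1 LINES
(col1, col2, @raw_date)
SET date_col = STR_TO_DATE(@raw_date, '%d-%m-%Y');
The raw text is read into the user variable @raw_date and converted by STR_TO_DATE as each row is loaded, so no second pass is needed.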
You can do two things.
Since your column format is DD-MM-YYYY, you can't convert it to YYYY-MM-DD directly. You can use SUBSTR to get DD, MM, and YYYY separately and insert the reassembled value into your table, as sketched below.
Otherwise you can store it as VARCHAR rather than a DATE type.
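A rough sketch of the SUBSTR approach, assuming the raw DD-MM-YYYY text sits in a column called raw_date in an intermediate table (all names here are illustrative):
-- Rebuild 'DD-MM-YYYY' as 'YYYY-MM-DD', which MySQL accepts for a DATE column
INSERT INTO realTable (date_col)
SELECT CONCAT(SUBSTR(raw_date, 7, 4), '-',   -- YYYY
              SUBSTR(raw_date, 4, 2), '-',   -- MM
              SUBSTR(raw_date, 1, 2))        -- DD
FROM intermediateTable;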
I'm new to SQL and trying to do a dump through phpMyAdmin.
At the moment, date data is stored in my DB as int(11).
When I export from phpMyAdmin, the data is naturally exported as numbers like '1325336400', but I would like this to display as 01/01/2012 or a similar format. Is there any way I can do this?
Many thanks in advance
Jus
If you're storing your "date data" (as you put it) in 32-bit integers, I guess you are using *nix timestamp values (seconds since the 1-jan-1970 00:00 UTC epoch).
(You know this may overflow sometime in 2038, right? http://en.wikipedia.org/wiki/Year_2038_problem)
phpmyadmin has a hard time with these values, as you have discovered.
MySQL has a TIMESTAMP data type which also uses *nix-style timestamps. (Note that TIMESTAMP has the same 2038 upper limit as a 32-bit integer; if you need dates beyond that, use DATETIME instead.)
You really do need to convert your date data to the TIMESTAMP data type. Otherwise dealing with time will be a huge pain in the neck, forever. Here's how to do it.
First, add a column to your table in this way,
ALTER TABLE mytable ADD COLUMN ts TIMESTAMP AFTER myinttimestamp
Then populate your new ts column using the values you already have.
UPDATE mytable SET ts = FROM_UNIXTIME(myinttimestamp)
Next, change the definition of your new column so it disallows NULL values and uses the current time as a default:
ALTER TABLE mytable
CHANGE ts
ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
Finally, if you want you can get rid of the old column with the INT values in it.
ALTER TABLE mytable DROP COLUMN myinttimestamp
(You should consider trying all this out on a copy of your table; it would stink to make a mistake and wreck your data).
When you use the TIMESTAMP data type, MySQL does its best to store all these timestamps internally in UTC (time-zone-insensitive) time, and convert them to local time upon display, based on how you set
SET time_zone = 'Asia/Vladivostok'
or whatever. It will also convert them from local time to UTC when you put them into the database.
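As a quick illustration of that behaviour (named zones like 'Asia/Vladivostok' require the MySQL time zone tables to be loaded; otherwise use a numeric offset such as '+10:00'):
-- The same stored instant displays differently depending on the session time zone
SET time_zone = '+00:00';
SELECT ts FROM mytable LIMIT 1;   -- shown in UTC

SET time_zone = 'Asia/Vladivostok';
SELECT ts FROM mytable LIMIT 1;   -- the same row, shown in local time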
Here's a write up.
https://dev.mysql.com/doc/refman/5.5/en/time-zone-support.html
I am writing a series of SQL scripts to import large datasets in CSV format. I know that the syntax:
STR_TO_DATE('1/19/2013 5:11:28 PM', '%c/%e/%Y %l:%i:%s %p')
will convert the incoming date/time strings correctly, like so:
2013-01-19 17:11:28
One dataset that I am bringing in has 240,000 records with 78 fields/columns, with at least 16 of those columns being DATETIME fields.
I will be performing this import on a periodic basis, using different datasets.
For each import, I will rename the tables for backup and start with clean, empty new ones.
My question is this: in terms of best practices, which is the better approach to take on the imports?
Perform the date conversions as I bring them in using LOAD DATA LOCAL INFILE
Bring all of the data into VARCHAR fields using LOAD DATA..., then go back and convert each of the 16 columns separately
I think that I can write the script to use either approach, but I am seeking feedback as to which approach is "better".
You can convert all of the columns in several simple passes, sketched below:
Import the data as-is storing your ad-hoc dates in a VARCHAR column.
Use ALTER TABLE to create the date columns in the correct DATE or DATETIME format.
Use UPDATE to do the conversion from the raw column to the DATETIME column.
Delete the original raw columns.
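A hedged sketch of those passes, shown for just two of the sixteen date columns (the table and column names are made up; the format string is the one from the question). Each ALTER and UPDATE can cover all of the columns in a single statement:
-- Add the real DATETIME columns in one ALTER
ALTER TABLE ImportTable
  ADD COLUMN EnrollDate DATETIME AFTER EnrollDateRAW,
  ADD COLUMN ExitDate   DATETIME AFTER ExitDateRAW;

-- Convert every raw column in one UPDATE
UPDATE ImportTable
SET EnrollDate = STR_TO_DATE(NULLIF(EnrollDateRAW, ''), '%c/%e/%Y %l:%i:%s %p'),
    ExitDate   = STR_TO_DATE(NULLIF(ExitDateRAW, ''), '%c/%e/%Y %l:%i:%s %p');

-- Drop the raw columns in one ALTER
ALTER TABLE ImportTable
  DROP COLUMN EnrollDateRAW,
  DROP COLUMN ExitDateRAW;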
The alternative is to pre-process the CSV file before import which side-steps all of this.
I have 2 stored procedures, Encode and Decode, and I want to use them to convert my datetime column values (say Dob) to an encrypted date. The problem is that the encrypted format is not a datetime (it is varbinary), and hence it can't be inserted into that field. Changing the datatype or adding a new column doesn't favour me, as my db is a huge one with lots of tables and sps. The steps I use presently are:
declare @datetime datetime
set @datetime='01/02/2008 12:45 PM'
declare @secretDate varchar(400)
declare @date varchar(200)
set @date=(select Convert(varchar(200),@datetime,120))
EXEC @secretDate=dbo.Encode @date
set @date=(select Convert(varchar(200),@secretDate,120))
select Convert(varchar(200),convert(varbinary(MAX),@date)) as EncryptedDate
Any suggestion is appreciated!
You would have to do this change of the column definition in multiple steps.
1) Add a new encryptedDate column set to the encoded value.
2) Drop the existing date column from the table.
3) Rename the encryptedDate to existing date column name.
You may be able to do steps 2 + 3 in one command, but I'm not sure of the syntax.
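A loose sketch of those steps in T-SQL; the table and column names (Patients, Dob, DobEncrypted) are made up, and populating the new column is only hinted at because a stored procedure such as dbo.Encode cannot be called inline in an UPDATE (a cursor or client-side loop would be needed):
-- 1) Add the new column that will hold the encoded value
ALTER TABLE Patients ADD DobEncrypted varbinary(max) NULL;

-- (populate DobEncrypted row by row via dbo.Encode here)

-- 2) Drop the original date column
ALTER TABLE Patients DROP COLUMN Dob;

-- 3) Rename the new column into the old column's place
EXEC sp_rename 'Patients.DobEncrypted', 'Dob', 'COLUMN';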
This whole thing sounds like a bad idea. If the data is encrypted but the 'Decode' function is a stored procedure in the DB, then the data is effectively not encrypted. Doing this also prevents all data compares from working, which is a Bad Thing.
Why not just encode the data when you read it from the DB if you don't want to present it to users?
Times, and particularly dates, have a very unusual, non-linear structure. Even storing dates in structures intended for dates is difficult. If you need to store this data encrypted, then don't try to store it in a date / datetime field.