I am writing a series of SQL scripts to import large datasets in CSV format. I know that the syntax:
STR_TO_DATE('1/19/2013 5:11:28 PM', '%c/%e/%Y %l:%i:%s %p')
will convert the incoming date/time strings correctly, like so:
2013-01-19 17:11:28
One dataset that I am bringing in has 240,000 records with 78 fields/columns, with at least 16 of those columns being DATETIME fields.
I will be performing this import on a periodic basis, using different datasets.
For each import, I will rename the tables for backup and start with clean, empty new ones.
My question is this: in terms of best practices, which is the better approach to take on the imports?
Perform the date conversions as I bring them in using LOAD DATA LOCAL INFILE
Bring all of the data into VARCHAR fields using LOAD DATA..., then go back and convert each of the 16 columns separately
I think that I can write the script to use either approach, but I am seeking feedback as to which approach is "better".
You can convert all of the columns in several simple passes:
Import the data as-is storing your ad-hoc dates in a VARCHAR column.
Use ALTER TABLE to create the date columns in the correct DATE or DATETIME format.
Use UPDATE to do the conversion from the raw column to the DATETIME column.
Delete the original raw columns (see the sketch after this list).
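A minimal sketch of those passes, assuming a raw table named import_raw with two placeholder date columns (your real table has 16; all names here are illustrative):
-- add the typed columns in a single pass
ALTER TABLE import_raw
    ADD COLUMN created_at DATETIME,
    ADD COLUMN updated_at DATETIME;
-- convert the raw strings; NULLIF avoids STR_TO_DATE errors on empty values
UPDATE import_raw
SET created_at = STR_TO_DATE(NULLIF(created_raw, ''), '%c/%e/%Y %l:%i:%s %p'),
    updated_at = STR_TO_DATE(NULLIF(updated_raw, ''), '%c/%e/%Y %l:%i:%s %p');
-- drop the raw columns in a single pass
ALTER TABLE import_raw
    DROP COLUMN created_raw,
    DROP COLUMN updated_raw;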
The alternative is to pre-process the CSV file before import which side-steps all of this.
Related
I have created a connection to Cloud SQL and used EXTERNAL_QUERY() to export the data to BigQuery. My problem is that I do not know a computationally efficient way to export a new day's data, since the Cloud SQL table is not partitioned; it does, however, have a date column date_field, but it is of the datatype CHAR.
I have tried running the following query with a view to scheduling a similar one so that it inserts the results: SELECT * FROM EXTERNAL_QUERY("connection", "SELECT period FROM table where date_field = cast(current_date() as char);") but it takes very long to run, whereas: SELECT * FROM EXTERNAL_QUERY("connection", "SELECT period FROM table where date_field = '2020-03-20';") is almost instant.
Firstly, it’s highly recommended to convert the ‘date_field’ column to the datatype DATE. This would improve simplicity and performance in the future.
When comparing two strings, MySQL will make use of indexes to speed up the query. This works when the string is given as a literal such as '2020-03-20'. When casting the current date to a string, it's possible that the character sets used in the comparison aren't the same, so indexes can't be used.
You may want to check the character set of current_date() once it has been cast, and compare it to that of the values in the 'date_field' column. You could then use this instead of CAST:
CONVERT(current_date() USING enter_char_sets_here)
Here is the documentation for the different casting functions.
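For example, a minimal sketch of the scheduled query using CONVERT; the utf8mb4 character set here is only an assumption, so substitute whatever character set date_field actually uses:
SELECT * FROM EXTERNAL_QUERY(
  "connection",
  "SELECT period FROM table WHERE date_field = CONVERT(current_date() USING utf8mb4);"
);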
I have a DB2 11 database with a large table that has JSON data stored in a CLOB column. Given that I'd like to perform queries on it using the JSON_VAL function, I always need to use JSON2BSON to convert it first, which I assume is a significant overhead. I would like to move the data into another table that has exactly the same structure, except for the CLOB column which I'd like to replace with a BLOB one to store the JSON immediately in BLOB, hoping that this will speed up my queries.
My approach to this was to write the following:
insert into newtable (ID, BLOBDATA) select ID, SYSTOOLS.JSON2BSON(CLOBDATA) from oldtable;
After doing this I realized that long JSON objects got truncated. I have googled this and learned that selects tend to truncate large objects.
I am reaching out here to see if there is any simple way for me to do this exercise, without having to write a program to read out and write back all the data. (I got burnt by similar truncation when I used the DB2 CSV export features.)
Thanks.
Starting with Db2 11.1.4.4 there are new JSON functions based on the ISO technical paper. I would advise using them; they are the strategic functionality going forward.
You could use JSON_VALUE to perform the equivalent of what you planned to with JSON_VAL.
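For example, a minimal sketch against the existing CLOB column, where the JSON path '$.customerName' and the VARCHAR(100) return type are assumptions for illustration only:
-- query the JSON stored in the CLOB column directly, no JSON2BSON needed
SELECT ID,
       JSON_VALUE(CLOBDATA, '$.customerName' RETURNING VARCHAR(100)) AS CUSTOMER_NAME
FROM oldtable;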
I have a CSV file with lots of different columns, one of which holds a date in DD-MM-YYYY format. MySQL's default format is YYYY-MM-DD. I was planning to load my CSV file directly into my MySQL table, but this may cause a problem. What should I do? PS: I am not planning to run code from another language on the file, so I would appreciate solutions that use MySQL itself.
If you are able to perform multiple steps, you could first import the CSV into an intermediate table which has the same properties as your real data table, except for the date columns.
Then you can just insert it into the real table and format the date string while selecting:
INSERT INTO realTable (/* ... */)
SELECT /*.... */, STR_TO_DATE('01-5-2013','%d-%m-%Y')
FROM intermediateTable;
When ready, truncate the intermediate table and you're done. This should be done in a transaction so you can roll back if something goes wrong.
Edit:
There seems to be a better way: https://stackoverflow.com/a/4963561/3595565. You should be able to format the date while importing by using a variable, as in the sketch below.
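A minimal sketch of that approach; the file path and the column names are placeholders for illustration:
-- read the raw date string into a user variable, convert it in the SET clause
LOAD DATA INFILE '/path/to/data.csv'
INTO TABLE realTable
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(col1, col2, @raw_date)
SET date_col = STR_TO_DATE(@raw_date, '%d-%m-%Y');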
You can do one of two things.
Since your column format is DD-MM-YYYY, you can't convert it to YYYY-MM-DD directly. You can use SUBSTR to get the DD, MM, and YYYY parts separately and insert them into your table (see the sketch below).
Otherwise you can store the value as VARCHAR rather than a DATE type.
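A minimal sketch of the SUBSTR approach, with table and column names assumed for illustration:
-- rebuild 'DD-MM-YYYY' as 'YYYY-MM-DD' from its parts
UPDATE myTable
SET date_col = CONCAT(SUBSTRING(raw_date, 7, 4), '-',
                      SUBSTRING(raw_date, 4, 2), '-',
                      SUBSTRING(raw_date, 1, 2));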
I have a MySQL script that, on a weekly basis, imports a large dataset (~250,000 records) into a database. This table has 85 data fields, of which 18 are DATETIME fields. For each of these 18 date fields, the script must run the following commands:
ALTER TABLE InputTable
ADD COLUMN EnrollmentDate DATETIME
AFTER EnrollmentDateRAW;
UPDATE InputTable
SET EnrollmentDate = STR_TO_DATE(EnrollmentDateRAW, '%c/%e/%Y %l:%i:%s %p')
WHERE EnrollmentDateRAW > '';
ALTER TABLE InputTable
DROP EnrollmentDateRAW;
Of course, in an effort to optimize the script, it has a single ALTER statement that adds all the DATETIME columns, and another single ALTER statement that removes the RAW data fields after conversion.
As you can probably imagine, running the conversion eighteen times on a quarter million records takes quite a bit of time. My question is: is there a way to have the import convert the dates itself, instead of running the conversion after the import?
My advice? Leave both columns in there to avoid painful schema changes if this is a regular thing.
You could also try fixing the data before you import it so that it is in the correct date format to begin with. Ideally this is the ISO 8601 format, YYYY-MM-DD HH:MM:SS.
If you're pulling in via a CSV, this is usually easy to fix as a pass before doing your LOAD DATA step.
You could also stage your data in a temporary table, alter it as necessary, then merge that data into the main set only when all the operations are complete. Altering the schema of a 250K row table isn't that bad. For millions of rows it can be brutal.
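A minimal sketch of that staging approach, showing just one of the eighteen date columns; the staging table name and file path are placeholders, while InputTable and EnrollmentDateRAW come from the question:
-- load the raw CSV into a staging table first
LOAD DATA LOCAL INFILE '/path/to/weekly.csv'
INTO TABLE StagingTable
FIELDS TERMINATED BY ','
IGNORE 1 LINES;
-- convert the raw strings while merging into the main table
INSERT INTO InputTable (/* ..., */ EnrollmentDate)
SELECT /* ..., */
       STR_TO_DATE(NULLIF(EnrollmentDateRAW, ''), '%c/%e/%Y %l:%i:%s %p')
FROM StagingTable;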
I am trying to import a CSV with columns [ID, timestamp, account, liters] -> [#, datetime, #, #] using MySQL Workbench 6.3. MySQL creates the table fine from the CSV (importing 1 record), and then I re-import the same CSV for the data rows. MySQL identifies the columns and matches them with the table well. Then it takes a long time and reports that all rows have been imported, but the table shows no rows.
I have searched forums, and it seems people have had problems with timestamps, but the solutions involved using the command line. Can someone please guide me on whether I should format my CSV differently? It's a big pain. Thanks.
OK, so this was a problem with the date format. When I specified the datetime field, I couldn't see the field popping up at the bottom to specify the format as per https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior (screen resolution issues). Anyway, I typed the matching format in there and the CSV is importing now. Thanks.
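For example, if the timestamp values in the CSV look like 2013-01-19 17:11:28, the matching format string to type into that Workbench field would be %Y-%m-%d %H:%M:%S (an assumed example; adjust it to your actual data).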