How to standardize a dirty date field in MySQL

I have a MySQL table with a column that represents a date but is stored as a string.
The dates in this column (date) are not standardized (dirty) and range from "Jan 5, 2004" to "Jun 22 2:45 AM".
For the records that are missing the year I have another column (OpeningDate) that can be null or hold a value like "22 June 2005", and a column "Deadline", which is also dirty, with values like ("26 January 2004", "01 July 2005, 6 pm
ABOUT: BearingPoint, Inc. Commercial Law and Economic Regulation
Program").
How do I get a normalized representation of the values in the date field?
For other tables I've been able to normalize the date field with the following query, but for this table the solutions I come up with are too convoluted and not even close to accurate.
SELECT DATE_FORMAT(STR_TO_DATE(DATE,'%M %d, %Y'), '%Y-%m-%d') FROM `data job posts`

I'm not sure there is a clean way to do this, since the strings are so far from normalized. The cleanest approach is probably to work through the data in chunks: identify the patterns you can recognize and convert those rows first, so that what remains is a much smaller group of highly irregular strings.
As an example, something similar to this:
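UPDATE `data job posts`
SET `DATE` = CASE
  -- "Jan 5, 2004": month name, day, comma, four-digit year
  WHEN `DATE` REGEXP '^[[:alpha:]]+ [0-9]+, [0-9]{4}$' THEN DATE_FORMAT(STR_TO_DATE(`DATE`,'%M %d, %Y'), '%Y-%m-%d')
  -- "Jun 22 2:45 AM": no year in the string, so only month and day are kept
  WHEN `DATE` REGEXP '^[[:alpha:]]+ [0-9]+ [0-9]+:[0-9]+ [AP]M$' THEN DATE_FORMAT(STR_TO_DATE(`DATE`,'%M %d %l:%i %p'), '%m-%d')
  ELSE `DATE` END;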
It might help to write the result to a new column, then drop the old column and rename the new one once the operation is complete. If the current table is live and needs to stay queryable, consider writing the result to a new table instead, since updating the records may lock the table.
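If it turns out to be easier to do the pattern matching outside MySQL, the same "recognize known patterns, leave the rest" idea can be sketched in Python. This is only a sketch under assumptions: the two patterns are taken from the sample values in the question, the names PATTERNS, opening_year and normalize are made up for illustration, and the month-name parsing relies on an English locale.

```python
import re
from datetime import datetime

# Hypothetical pattern table: (regex, strptime format, does the string contain a year?)
PATTERNS = [
    # "Jan 5, 2004"
    (re.compile(r"^[A-Za-z]{3} \d{1,2}, \d{4}$"), "%b %d, %Y", True),
    # "Jun 22 2:45 AM"
    (re.compile(r"^[A-Za-z]{3} \d{1,2} \d{1,2}:\d{2} [AP]M$"), "%b %d %I:%M %p", False),
]


def opening_year(opening_date):
    """Extract the year from an OpeningDate value like '22 June 2005', or None."""
    if not opening_date:
        return None
    try:
        return datetime.strptime(opening_date.strip(), "%d %B %Y").year
    except ValueError:
        return None


def normalize(raw, fallback_year=None):
    """Return 'YYYY-MM-DD' for recognized patterns, None for everything else."""
    text = raw.strip()
    for regex, fmt, has_year in PATTERNS:
        if not regex.match(text):
            continue
        if not has_year:
            if fallback_year is None:
                return None
            # Attach the year before parsing so that Feb 29 is validated correctly.
            text, fmt = f"{fallback_year} {text}", "%Y " + fmt
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            return None
    return None
```

Rows for which normalize returns None are the "highly irregular" remainder that still needs manual patterns or review.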

Related

mysql Search range by year

I want to search cats by date of birth, using a range of years.
The year ranges are as follows, and several can be selected at once.
Year : [2000, 2010, 2020]
If I select 2020, the period is 2020-01-01 to 2029-12-31.
If I select 2000 and 2020, the periods are 2000-01-01 to 2009-12-31 and 2020-01-01 to 2029-12-31.
<TABLE>
CAT {
ID number,
Birth DateTime,
...
}
I have searched through books and Google, but I can't find a way to do what I want.
select * from CAT
where birth between '2000-01-01' and '2009-12-31'
or birth between '2010-01-01' and '2019-12-31'
or birth between '2020-01-01' and '2029-12-31'
I tried to use BETWEEN or '-01-01', but if [2000, 2020] is selected, the ranges must be connected with OR.
The more ORs there are, the slower the query gets.
Please tell me a good way to do the range calculation that can still use an index.
There is an index on the birth date column.
Addendum: in my database, a query using SUBSTRING(YEAR(CAT.birth),1,3) IN (200,202) runs quickly.
I have 500,000 rows; can I use it like this?
All you need to do is add an index on the birth column and run your query above with BETWEEN and OR.
If you are using MySQL, have you tried the YEAR() function?
Example:
SELECT * FROM cat WHERE YEAR(birth) BETWEEN 1990 AND 2018 ORDER BY YEAR(birth) ASC;
See the MySQL documentation for YEAR().
If you expect to get more than about 20% of the rows from a table, then an INDEX will be eschewed for simply scanning all the rows.
Otherwise, having INDEX(birth) will help with certain queries, but not with the ones that wrap birth in a function, such as YEAR(birth) or SUBSTRING(YEAR(CAT.birth),1,3). Those are not "sargable".
To use the index (and be efficient for a limited range of years or dates), something like this is probably what you need. This example covers 2 calendar years.
WHERE CAT.birth >= '2018-01-01'
AND CAT.birth < '2020-01-01'
BTW: SUBSTRING(YEAR(CAT.birth),1,3) can be simplified to LEFT(CAT.birth, 3), but that still cannot use the recommended index.
BTW: A 'bug' in your code: since birth is a DATETIME, BETWEEN ... AND '2009-12-31' excludes all of New Year's Eve except midnight at the very start of that day. Note how I avoided this common 'bug' by using < and the next day. This works whether you have DATE, DATETIME or DATETIME(6), etc.
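For the multi-select case in the question, the half-open ranges could be generated in application code. The following is a rough sketch, not a definitive implementation: decade_ranges and birth_filter are hypothetical helper names, and the %s placeholders assume a MySQL driver that uses that parameter style.

```python
from datetime import date


def decade_ranges(decades):
    """Return merged half-open [start, end) date ranges for the selected decades."""
    ranges = []
    for start_year in sorted(set(decades)):
        start, end = date(start_year, 1, 1), date(start_year + 10, 1, 1)
        if ranges and ranges[-1][1] == start:
            # Adjacent decades collapse into one range, so fewer ORs are needed.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


def birth_filter(decades):
    """Build a sargable WHERE fragment plus its parameters (hypothetical helper)."""
    ranges = decade_ranges(decades)
    if not ranges:
        raise ValueError("at least one decade must be selected")
    clause = " OR ".join("(birth >= %s AND birth < %s)" for _ in ranges)
    params = [d.isoformat() for r in ranges for d in r]
    return clause, params
```

Selecting [2000, 2010, 2020] then produces a single range from 2000-01-01 up to, but not including, 2030-01-01, while [2000, 2020] produces two ranges joined by OR.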

how to change the year of birth to age in .sql?

I have a SQL file containing a users table with bYear and u_age columns.
I would like to know how I can convert all the values in bYear, such as 1986 or 2000, into ages in u_age, such as 33 or 19.
Thanks so much !!
If you are looking to update the table (not the file), you can just do:
update users set u_age = year(curdate()) - bYear;
curdate() gives you the current date, from which you can extract the year using the year() function.
Please note that this computation is not accurate at all: to compute an age, you need the entire date of birth (including month and day). The above computation behaves like the date of birth is actually the first day of year bYear.
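To make the inaccuracy concrete, here is a small sketch comparing the year-only computation with one that uses a full date of birth. The function names and the sample dates are made up for illustration; the table in the question only stores the year.

```python
from datetime import date


def age_from_year(birth_year, today):
    """Year-only age, the same arithmetic as year(curdate()) - bYear."""
    return today.year - birth_year


def age_from_date(born, today):
    """Exact age: subtract one if this year's birthday has not happened yet."""
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


today = date(2019, 6, 15)
year_only = age_from_year(1986, today)                # 33
born_in_march = age_from_date(date(1986, 3, 1), today)    # birthday already passed
born_in_december = age_from_date(date(1986, 12, 1), today)  # birthday still to come
```

For anyone born later in the year than the current date, the year-only result is one year too high.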
If you are looking to update a SQL file: as commented by Raymond Nijland, just don't. This is much more complicated and far less efficient. Instead, load the file into a table, update the table and then export it to a file.

SQL Query to get data between two weeks?

I have a week column with week numbers such as w0, w1, w2.... I am trying to get the last six weeks of data. Here's the SQL query I am using.
SELECT * FROM week
WHERE uid = '9df984da-4318-1035-9589-493e89385fad'
AND report_week BETWEEN 'w52' AND 'w5';
'w52' is essentially week 52 in December 2015 and 'w5' is in January 2016. The BETWEEN does not seem to work. What's the best way to get the data between these two weeks?
Here's the CREATE TABLE statement:
CREATE TABLE `week` (`uid` VARCHAR(255) DEFAULT '' NOT NULL,
`report_week` VARCHAR(7) NOT NULL,
`report_files_active` BIGINT DEFAULT NULL);
Essentially this table is getting populated from other table which has date column. It uses dates from other table and summarizes weekly data into this.
Any help is appreciated.
Refer to this SO Discussion which details the reasons for a problem similar to yours.
BETWEEN 'a' and 'b' actually matches to columnValue >='a' and columnValue <= 'b'
In your case 'w52' is greater than 'w5' because strings are ordered lexicographically. This means the BETWEEN clause will never be true (think of it as equivalent to saying BETWEEN 10 AND 1 instead of BETWEEN 1 AND 10).
Edit to my response:
Refrain from storing the week value as a string. Instead, here are a couple of approaches in order of preference:
1. Have a timestamp column. You can then easily use MySQL query facilities to extract the week information from it. For a reference see this post.
2. Maintain two columns, YEAR and WEEKNO, where YEAR stores values like 2015, 2016, etc. and WEEKNO stores the week number. This way you can query data for any week in any year.
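The difference between the string comparison and the two-column approach can be illustrated outside the database. This sketch uses Python's string and tuple ordering as a stand-in for what BETWEEN does; the sample values and the helper name between are invented for illustration.

```python
def between(value, low, high):
    """Same rule as SQL BETWEEN: low <= value <= high."""
    return low <= value <= high


# Week labels stored as strings: 'w5' sorts before 'w52', so the range is empty.
week_labels = ["w50", "w51", "w52", "w0", "w1", "w5"]
string_matches = [w for w in week_labels if between(w, "w52", "w5")]

# (year, week) pairs compare element by element, so the range crosses the new year.
week_pairs = [(2015, 50), (2015, 51), (2015, 52), (2016, 0), (2016, 1), (2016, 5), (2016, 6)]
pair_matches = [p for p in week_pairs if between(p, (2015, 52), (2016, 5))]
```

Here string_matches ends up empty, while pair_matches contains the weeks from week 52 of 2015 through week 5 of 2016.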
Please show the table structure and database name, because the answer depends on them. If there is a timestamp column, you can use BETWEEN with the date six weeks ago as the lower bound and the current date as the upper bound.

Sort Date in Mysql table in DESC order

I want to show a date column in DESC order. The date is stored as VARCHAR in the form 20-JUN-2007. I have already used ORDER BY RIGHT(vPublishedDate, 4), but it doesn't affect the month and day.
Here is one way to do it using STR_TO_DATE (take into account the other answers about converting the column to date, although you may not have control over the database):
SELECT ...
FROM ...
ORDER BY STR_TO_DATE(vPublishedDate,'%d-%M-%Y') DESC
As an example:
SELECT STR_TO_DATE('20-JUN-2007','%d-%M-%Y') as Date;
+------------+
| Date       |
+------------+
| 2007-06-20 |
+------------+
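The same idea, sorting on the parsed date instead of on a slice of the string, can be sketched in Python for anyone who has to sort after fetching the rows. The sample values and the name parse_published are assumptions for illustration, and the month abbreviations rely on an English locale.

```python
from datetime import datetime

published = ["01-JUN-2007", "20-JUN-2007", "03-JAN-2008", "15-NOV-2006"]


def parse_published(value):
    """Parse strings of the form 20-JUN-2007 into datetime objects."""
    return datetime.strptime(value, "%d-%b-%Y")


# Sorting on the last four characters only orders by year, like RIGHT(vPublishedDate, 4).
by_year_only = sorted(published, key=lambda v: v[-4:], reverse=True)

# Sorting on the parsed value orders by year, month and day.
by_date = sorted(published, key=parse_published, reverse=True)
```

With this data the year-only sort leaves the two June 2007 entries in their original order, whereas the parsed sort puts 20-JUN-2007 ahead of 01-JUN-2007.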
Why are you using a VARCHAR to store a DATE? Use a DATE to store a DATE and then, as if by magic, sorting works all on its own.
You really should be storing dates as dates, not character-type fields. Then you wouldn't need to worry about this sort of "SQL gymnastics" (as I like to call it).
Databases are for storing data, not formatting.
By forcing yourself to manipulate sub-columns, you basically prevent the database from performing any useful optimisations.
In order to do what you want with the data you have, you have to do something like:
use substring to extract individual sub-column information to get them in the order you want; and
use some sort of lookup to turn a string like "NOV" into 11 (since the month names will sort as APR, AUG, DEC, FEB, JAN, JUL, JUN, MAR, MAY, NOV, OCT, SEP).
And this would be a serious performance killer. Now there may be a function which can turn that particular date format into a proper date but I urge you: don't use it.
Set up or change your database to use an intelligent schema and all these problems will magically disappear.
It's a lot easier to turn a date column into any sort of output format than to do the same with a character column.
Change that VARCHAR to a DATE type column, if you can.
You can also try this, although this is NOT the RIGHT approach.
SELECT STR_TO_DATE(your_date_column,'%d-%b-%Y') AS your_new_date FROM your_table ORDER BY your_new_date DESC
Try converting the VARCHAR to a date using STR_TO_DATE, and then you can apply the sorting logic.
I would suggest changing the column type to DATE.
Then run a script that converts your dates to the correct DB format.
Sorting would then be just as simple as sorting ids in MySQL.

Order by date (varchar)?

I want to order by date.
e.g.
table_date
February 2011
January 2011
December 2010
I've already tried:
SELECT distinct(table_date) FROM tables ORDER BY table_date DESC
but it doesn't work.
I get this instead:
January 2011
February 2011
December 2010
Can you help me please?
If you must store the dates in a VARCHAR, which as others pointed out is not recommended, you could use:
SELECT table_date FROM tables ORDER BY STR_TO_DATE(table_date, '%M %Y') DESC;
If you want to order by date, store it as a date, not a string. Unless your date string is of the form yyyy-mm-dd, it will not sort as you want it.
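A quick way to see why only the yyyy-mm-dd form sorts correctly as text is to sort both forms. This is just an illustrative sketch in Python using the values from the question; the variable names are made up.

```python
from datetime import datetime

labels = ["February 2011", "January 2011", "December 2010"]

# Plain text sort, descending: month names are compared alphabetically.
as_text_desc = sorted(labels, reverse=True)

# Convert each label to yyyy-mm-dd (the day defaults to 01), then sort on that text.
iso = {label: datetime.strptime(label, "%B %Y").strftime("%Y-%m-%d") for label in labels}
by_iso_desc = sorted(labels, key=iso.get, reverse=True)
```

The text sort reproduces the order from the question (January, February, December), while sorting on the yyyy-mm-dd strings gives February 2011, January 2011, December 2010.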
Databases are hard enough work as-is, without people making it harder, and you should be striving as much as possible to avoid what I like to call SQL gymnastics.
Store it as a date then, if you must, use date functions to get it in the form February 2011.
It'll be a lot easier going that way than what you're trying to do.
Even if you can't change any of the current columns due to code restrictions, you can always add another column to the table, such as table_date_as_date, and put in an insert/update trigger to populate it based on table_date.
Then just do:
UPDATE tables SET table_date = table_date;
or something similar, to fire the trigger for all rows.
Then, your query can still get at table_date but use table_date_as_date for ordering. That's a kludge of course but I've had to use tricks like that in the past when it was imperative the code could not change, so we had to resort to DBMS trickery.
Store dates as DATE, not as VARCHAR; using VARCHAR is a huge mistake. Use STR_TO_DATE() to convert your content. When you're done, you can order by dates without any problems.
Date should be stored as date and not VARCHAR.
Suppose you have table_date in the following format (YYYY-MM-DD)
table_date
2011-01-01
2011-02-01
2010-12-01
Now you can use an ORDER BY clause in the following way
SELECT * FROM table_order ORDER BY STR_TO_DATE(table_date, '%Y-%m-%d') ASC
I doubt if the output will be in ordered form