SQL SELECT - order dates with wrong format - mysql

I was tasked with ordering some entries in our web application. The current solution, written by someone else 10 years ago, runs a SELECT on the db, then iterates over the result and builds a table.
The problem is that the date is stored in dd-mm-yyyy format in a varchar column.
And I'm not really sure I am brave enough to make changes to the database.
So is there some way to order it anyway within the SELECT, maybe some way to order it by the end of the string? Or is changing the db the only way that avoids writing some gruesome function in code?

You can use the STR_TO_DATE() function for this. Try
ORDER BY STR_TO_DATE(varcharDateColumn, '%d-%m-%Y')
It converts your character-string dates to the DATE datatype where ordering works without trouble.
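To see why the conversion matters, here is a small Python sketch (not from the original answer; the sample values are made up). It sorts dd-mm-yyyy strings first as plain text and then by their parsed date, which is roughly the difference between ordering by the varchar column and ordering by STR_TO_DATE(...).

```python
from datetime import datetime

# Hypothetical sample values in the dd-mm-yyyy layout stored in the varchar column.
rows = ["15-01-2014", "02-12-2013", "01-06-2013", "30-03-2012"]

# Plain string ordering compares the day digits first, so the years end up mixed.
as_text = sorted(rows)

# Parsing with the same pattern as STR_TO_DATE(col, '%d-%m-%Y') gives chronological order.
as_dates = sorted(rows, key=lambda s: datetime.strptime(s, "%d-%m-%Y"))

print(as_text)
print(as_dates)
```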
In MySQL 5.7 or later, you can add a so-called generated column to your table without touching the other data.
ALTER TABLE tbl
ADD COLUMN goodDate DATE
AS (STR_TO_DATE(varcharDateColumn, '%d-%m-%Y'))
STORED;
You can even put an index on that column if you need to use it for searching.
ALTER TABLE tbl ADD INDEX goodDate(goodDate);

You can use the STR_TO_DATE function, but this works well only for small tables (maybe thousands of records); on large data sets you will face performance problems:
SELECT *
FROM (
SELECT '01-5-2013' AS Date
UNION ALL
SELECT '02-6-2013' AS Date
UNION ALL
SELECT '01-6-2013' AS Date
) AS t1
ORDER BY STR_TO_DATE(Date,'%d-%m-%Y')
The long-term solution should be converting that column to a proper date type.
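Before converting the column, it can help to check which stored values actually parse. The following Python sketch is only an illustration under assumed data: the helper name to_iso and the sample values are hypothetical, and a real migration would read the values from the table.

```python
from datetime import datetime

def to_iso(value):
    """Return 'YYYY-MM-DD' for a dd-mm-yyyy string, or None if it does not parse."""
    try:
        return datetime.strptime(value.strip(), "%d-%m-%Y").date().isoformat()
    except ValueError:
        return None

# Hypothetical values as they might come back from the varchar column.
stored = ["01-5-2013", "02-6-2013", "31-02-2013", "n/a"]

converted = {v: to_iso(v) for v in stored}
# Values that would need manual cleanup before the column type can change.
bad_values = [v for v, iso in converted.items() if iso is None]

print(converted)
print(bad_values)
```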

Related

Why does MySQL drop my index when using DATE(`table`.`column`)

I have a MySQL InnoDB table with a few columns.
One of them is named "dateCreated"; it is a DATETIME column and it is indexed.
My query:
SELECT
*
FROM
`table1`
WHERE
DATE(`dateCreated`) BETWEEN '2014-8-7' AND '2013-8-7'
MySQL for some reason refuses to use the index on the dateCreated column (even with USE INDEX or FORCE INDEX).
However, if I change the query to this:
SELECT
*
FROM
`table1`
WHERE
`dateCreated` BETWEEN '2014-8-7' AND '2013-8-7'
note the DATE(...) removal
MySQL uses the index just fine.
I could manage without using the DATE() function, but this is just weird to me.
I understand that maybe MySQL indexes the full date and time, and when searching on only a part of it, it gets confused or something. But there must be a way to use a partial date (let's say MONTH(...) or DATE(...)) and still benefit from the indexed column and avoid the full table scan.
Any thoughts..?
Thanks.
As you have observed, once you apply a function to that field you lose access to the index.
So it will help if you don't wrap the column in DATE() and don't use BETWEEN. The rationale for applying the function to the data is to make the data match the parameters. But there are just 2 parameter dates and several hundred? thousand? million? rows of data. Why not reverse this and change the parameters to suit the data (making it a "sargable" predicate)?
SELECT
*
FROM
`table1`
WHERE
( `dateCreated` >= '2013-08-07' AND `dateCreated` < '2014-08-07' )
;
Note that 2013-08-07 is used first; this needs to be true when using BETWEEN as well. You will not get any results from BETWEEN if the first date is later than the second date.
Also note that exactly 12 months of data are covered by >= '2013-08-07' AND < '2014-08-07'; I presume this is what you are seeking.
Using the combination of date(dateCreated) and between would include 1 too many days as all events during '2014-08-07' would be included. If you deliberately wanted one year and 1 day then add 1 day to the higher date i.e. so it would be < '2014-08-08'
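The effect on index use can be tried out locally. The sketch below uses Python's built-in sqlite3 module purely as a stand-in: SQLite is not MySQL, its planner output differs from MySQL's EXPLAIN, and the payload column and the index name idx_dateCreated are assumptions added for the demo. The principle it illustrates is the same, though: wrapping the column in a function hides it from the index, while comparing the bare column does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; 'payload' only keeps the index from covering the whole row.
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, dateCreated TEXT, payload TEXT)")
conn.execute("CREATE INDEX idx_dateCreated ON table1 (dateCreated)")

def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Function applied to the column: the planner cannot use the index on dateCreated.
wrapped_plan = plan(
    "SELECT * FROM table1 "
    "WHERE date(dateCreated) BETWEEN '2013-08-07' AND '2014-08-07'"
)
# Bare column with a half-open range: the index on dateCreated is usable.
bare_plan = plan(
    "SELECT * FROM table1 "
    "WHERE dateCreated >= '2013-08-07' AND dateCreated < '2014-08-07'"
)

print(wrapped_plan)
print(bare_plan)
```

On MySQL itself, running EXPLAIN on both forms of the query is the way to confirm the same behaviour.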

Explain how the following query works?

I have a mysql query which works in a strange way. I am posting the 2 queries with input data changed and the output are listed under each query.
Query 1 (Area to be noted BETWEEN '13/05/11' AND '30/05/11'):
SELECT COUNT(pos_transaction_id) AS total,
DATE_FORMAT(pt.timestamp,'%d-%m-%Y %H:%i:%S') AS Date,
SUM(amount) AS amount
FROM pos_transactions pt
WHERE DATE_FORMAT(pt.timestamp,'%e/%m/%y') BETWEEN '13/05/11' AND '30/05/11'
GROUP BY WEEK(pt.timestamp) ORDER BY pt.timestamp
Output:
Query 2 (Area to be noted BETWEEN '3/05/11' AND '30/05/11'):
SELECT COUNT(pos_transaction_id) AS total,
DATE_FORMAT(pt.timestamp,'%d-%m-%Y %H:%i:%S') AS Date,
SUM(amount) AS amount
FROM pos_transactions pt
WHERE DATE_FORMAT(pt.timestamp,'%e/%m/%y') BETWEEN '3/05/11' AND '30/05/11'
GROUP BY WEEK(pt.timestamp) ORDER BY pt.timestamp
Output:
Now, when the range is widened in the second query, why am I getting just one record? And even in the first query I am getting records which are out of range. What is wrong with it?
EDIT
The changed query looks like this, and it is still not doing what I wanted it to do.
SELECT COUNT(pos_transaction_id) AS total,
DATE_FORMAT(pt.timestamp,'%d-%m-%Y %H:%i:%S') AS Date,
SUM(amount) AS amount
FROM pos_transactions pt
WHERE DATE_FORMAT(pt.timestamp,'%e/%m/%y') BETWEEN STR_TO_DATE('01/05/11','%e/%m/%y') AND STR_TO_DATE('30/05/11','%e/%m/%y')
GROUP BY WEEK(pt.timestamp) ORDER BY pt.timestamp
The output is:
I think you're seeing the result of the intersection of two bad practices.
First, the date_format() function returns a string. Your WHERE clause does a string comparison. In PostgreSQL
select '26/04/2011' between '13/05/11' AND '30/05/11';
--
T
That's because the string '26' is between the strings '13' and '30'. If you write them as dates, though, PostgreSQL will correctly tell you that '2011-04-26' (following the datestyle setting on my server) isn't in that range.
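The same effect can be reproduced outside the database. This Python sketch (an illustration only; the helper names between and parse are made up) applies BETWEEN-style comparisons to the literals from both queries, once as strings and once as parsed dates.

```python
from datetime import datetime

def between(value, low, high):
    """Mimic SQL BETWEEN: low <= value <= high."""
    return low <= value <= high

def parse(text):
    # Day/month/two-digit year, the layout used in the question's literals.
    return datetime.strptime(text, "%d/%m/%y").date()

# Query 1: 26 April is outside 13-30 May, yet as strings it compares as in range.
q1_strings = between("26/04/11", "13/05/11", "30/05/11")
q1_dates = between(parse("26/04/11"), parse("13/05/11"), parse("30/05/11"))

# Query 2: 13 May is inside 3-30 May, yet "13..." sorts before "3..." as text.
q2_strings = between("13/05/11", "3/05/11", "30/05/11")
q2_dates = between(parse("13/05/11"), parse("3/05/11"), parse("30/05/11"))

print(q1_strings, q1_dates)
print(q2_strings, q2_dates)
```

This is consistent with what the question reports: the first query lets out-of-range rows through, and the second one drops most of the rows that belong in the range.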
Second, I'm guessing that the odd out-of-range values appear because you're using an indeterminate expression in your aggregate query. The query groups by WEEK(pt.timestamp), but the SELECT list contains DATE_FORMAT(pt.timestamp, ...), which is neither aggregated nor part of the GROUP BY. I think most other SQL engines will throw an error if you try to do that. MySQL (unless ONLY_FULL_GROUP_BY is enabled) instead returns an apparently random value from the rows in each group.
To avoid these kinds of errors, don't do string comparisons on date or timestamp ranges, and don't use indeterminate aggregate expressions.
Posting DDL and minimal SQL INSERT statements to reproduce the problem helps people help you.
I'm not certain, but the comparison is probably being done on strings and not on dates.
DATE_FORMAT returns a string, and both of your bounds are strings too.
You should try it without the DATE_FORMAT, using just the column, and convert the bounds to dates.
I'm thinking of something like this:
pt.timestamp BETWEEN STR_TO_DATE('13/05/11', '%e/%m/%y') AND STR_TO_DATE('30/05/11', '%e/%m/%y')
I am pretty sure you mean to do
WHERE pt.timestamp BETWEEN STR_TO_DATE('13/05/11', '%d/%m/%y') AND STR_TO_DATE('30/05/11', '%d/%m/%y')
Previously you were asking for a string between two other strings.
Update
I think a few points are being missed here. Based on the calculations you are doing on pos_transactions.timestamp, I am going to assume its type is a timestamp. In your query you need to use the timestamp directly if you want to do a range comparison. A timestamp already contains all the data you need for this comparison. You don't need to convert it to day/month/year to compare it.
What you need to do is this:
Find all rows where the timestamp is between a date created from '13/05/11' and a date created from '30/05/11'. pt.timestamp is already a timestamp, so there is no need to convert it in your WHERE clause.
What you keep doing is converting it into a string representation. That's OK when you want to display it, but not when you want to compare it with other values.

storing dates in mysql

Is it better to store dates in MySQL in three columns or in just one column? Which one is faster? Also, if I just want to do inserts with today's date in the format dd/mm/yy, how do I do that, and how do I do selects with that? Also, let's say I wanted to get results for all the Wednesdays, how do I do that? Or, say, the 25th of every month and year, how do I do that?
Thanks People.
I am using PHP with Apache and Mysql.
What are the drawbacks of using the structure that I am proposing? I can easily get all the 25ths by using the date table, and I can get all the days using another column for days. How much difference would there be in terms of speed between my proposed solution and using a DATE table?
You will want to use a proper column type, such as DATE, DATETIME, or TIMESTAMP, depending on your needs. They are built specifically to handle dates, and can more easily perform other functions (adding, comparing, etc.) that would be difficult to perform on 3 separate columns.
DAYOFWEEK(date) will give you a numeric representation for the day. In your case, 4 = Wednesday. DAYOFMONTH(date) will work for finding all 25th days of the month.
DAYNAME(date) will return the name of the weekday for date
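The numbering used by DAYOFWEEK() (1 = Sunday through 7 = Saturday) is easy to get wrong, so here is a small Python sketch that reproduces it for a quick sanity check. The helper name mysql_dayofweek and the sample dates are assumptions for illustration, not part of any library.

```python
from datetime import date

def mysql_dayofweek(d):
    """Same numbering as MySQL DAYOFWEEK(): 1 = Sunday ... 7 = Saturday."""
    # isoweekday() is 1 = Monday ... 7 = Sunday, so shift it by one day.
    return d.isoweekday() % 7 + 1

# Hypothetical stored dates.
dates = [date(2012, 1, 22), date(2012, 1, 25), date(2012, 2, 25)]

# Rows that DAYOFWEEK(date) = 4 would select (Wednesdays).
wednesdays = [d for d in dates if mysql_dayofweek(d) == 4]
# Rows that DAYOFMONTH(date) = 25 would select.
twenty_fifths = [d for d in dates if d.day == 25]

print(wednesdays)
print(twenty_fifths)
```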
Also, if I just want to do inserts with todays date in format dd/mm/yy ,how do I do that.
Well, it depends on the format your date is passed in through your form, but you are going to want to store your date in YYYY-MM-DD format.
INSERT INTO my_table (timefieldname) VALUES ( '$date' );
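If the form delivers the date as dd/mm/yy, it has to be converted before it reaches the INSERT. The sketch below shows one way to do that conversion; it is written in Python rather than the PHP the question mentions, and the helper name to_mysql_date and the sample input are assumptions. The placeholder-style statement at the end reflects how Python MySQL drivers usually accept values separately from the SQL text, which also avoids pasting user input into the query string.

```python
from datetime import datetime

def to_mysql_date(user_input):
    """Convert 'dd/mm/yy' text to the 'YYYY-MM-DD' form a MySQL DATE column expects."""
    return datetime.strptime(user_input, "%d/%m/%y").date().isoformat()

value = to_mysql_date("25/12/12")  # hypothetical form input

# With a MySQL driver the value would be passed separately, for example
# cursor.execute(sql, (value,)), instead of being interpolated into the string.
sql = "INSERT INTO my_table (timefieldname) VALUES (%s)"

print(value)
```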
and also how do I do selects with that.
SELECT timefieldname FROM my_table;
-- or you can format the date - this will give you month/day/year 01/01/2012
SELECT DATE_FORMAT(timefieldname, '%m/%d/%Y') FROM my_table;
Also, lets say if I wanted to get results for all the wednesdays,
SELECT timefieldname FROM my_table WHERE DAYNAME(timefieldname) = 'Wednesday';
How do I do that or lets say one date 25th of all the months and years, how do i do that.
SELECT timefieldname FROM my_table WHERE DAY(timefieldname) = '25';
You can avoid passing dates from your codebase and let MySQL insert them for you, provided they are timestamps:
ALTER TABLE tablename ADD `timefieldname` TIMESTAMP ON UPDATE CURRENT_TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ;
It's not much of a speed boost, but it does reduce your need to code and validate variables passed to the database.

Timestamp as int field, query performance

I'm storing timestamps in an int field. On a large table it takes too long to get the rows inserted on a given date, because I'm using the MySQL function FROM_UNIXTIME.
SELECT * FROM table WHERE FROM_UNIXTIME(timestamp_field, '%Y-%m-%d') = '2010-04-04'
Is there any way to speed up this query? Maybe I should query for rows using timestamp_field >= x AND timestamp_field < y?
Thank you
EDITED: This query works great, but make sure there is an index on timestamp_field.
SELECT * FROM table WHERE
timestamp_field >= UNIX_TIMESTAMP('2010-04-14 00:00:00')
AND timestamp_field <= UNIX_TIMESTAMP('2010-04-14 23:59:59')
Use UNIX_TIMESTAMP on the constant instead of FROM_UNIXTIME on the column:
SELECT * FROM table
WHERE timestamp_field
BETWEEN UNIX_TIMESTAMP('2010-04-14 00:00:00')
AND UNIX_TIMESTAMP('2010-04-14 23:59:59')
This can be faster because it allows the database to use an index on the column timestamp_field, if one exists. It is not possible for the database to use the index when you use a non-sargable function like FROM_UNIXTIME on the column.
If you don't have an index on timestamp_field then add one.
Once you have done this you can also try to further improve performance by selecting the columns you need instead of using SELECT *.
If you're able to, it would be faster to either store the date as a proper datetime field, or, in the code running the query, to convert the date you're after to a unix timestamp before sending it to the query.
The FROM_UNIXTIME would have to convert every record in the table before it can check it which, as you can see, has performance issues. Using a native datatype that is closest to what you're actually using in your queries, or querying with the column's data type, is the fastest way.
So, if you need to continue using an int field for your time, then yes, using < and > on a plain integer would boost performance greatly, assuming you store things to the second rather than storing the timestamp for midnight of that day.
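The second option mentioned above, computing the bounds in the calling code, might look like the following Python sketch. It assumes the int column holds seconds since the epoch and that a "day" means a UTC day; MySQL's UNIX_TIMESTAMP() interprets its argument in the session time zone, so the numbers can differ if the server is not on UTC. The helper name day_bounds_utc is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def day_bounds_utc(year, month, day):
    """Return (start, end) Unix timestamps for one UTC day as a half-open range."""
    start = datetime(year, month, day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())

start_ts, end_ts = day_bounds_utc(2010, 4, 14)

# Intended use: WHERE timestamp_field >= start_ts AND timestamp_field < end_ts
print(start_ts, end_ts)
```

Using >= on the start and < on the next day's start avoids having to spell out 23:59:59 as the last second of the day.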

Better to use two columns or DATETIME

I'm working on a MySQL database which will create a "Today at" list and send it to subscribers. I'm wondering if it's better to use the DATETIME data type on the start and end fields, or to have two columns, startDate and startTime (with the appropriate data types). My first thought was to use DATETIME, but that makes subsequent use of the system a bit awkward, since you can no longer write:
SELECT * FROM event_list WHERE startAt='2009-04-20';
Instead, the best I found was:
SELECT * FROM event_list WHERE startAt LIKE '2009-04-20%';
and I don't like the hack or its potential impact on performance.
Just use the DATE() function.
SELECT * FROM event_list WHERE DATE(startAt) = '2009-04-20'
SELECT * FROM event_list WHERE startAt >= '2009-04-20' AND startAt < '2009-04-21'
This will use an index on startAt efficiently and handle the boundary conditions correctly. (A WHERE clause that wraps the column in a function won't be able to use an index on it: the server has no way to know that the expression result has the same ordering as the column values.)
Using two columns is a bit like having columns for the integer and decimal parts of real numbers. If you don't need the time, just don't save it in the first place.
You can try something like this:
select * from event_list where date(startAt) = '2009-04-20'
How about the best of both worlds -- have a table that uses a single datetime column and a view of that table that gives you both date and time fields.
create view vw_event_list
as select ..., date(startAt) as startDate, time(startAt) as startTime
from event_list;
select * from vw_event_list where startDate = '2009-04-20';
The real consideration between separate date and time fields or 1 datetime field is indexing. You do not want to do this:
select * from event_list where date(startAt) = '2009-04-20'
on a datetime field because it won't use an index. MySQL will convert the startAt data to a date in order to compare it, which means it can't use the index.
You want to do this:
select * from event_list where startAt BETWEEN '2009-04-20 00:00:00' AND '2009-04-20 23:59:59'
The problem with a datetime field is that you can't really use it in a compound index, since the value is fairly unique. For example, a compound index on startAt+event isn't going to allow you to search on date+event, only datetime+event.
But if you split the data between date and time fields, you can index startDate+event and search on it efficiently.
That's just an example for discussion purposes, you could obviously index on event+startAt instead and it would work. But you may find yourself wanting to search/summarize based on date plus another field. Creating a compound index on that data would make it very efficient.
Just one more thing to add: beware of time zones. If you're offering an online service it'll come up sooner or later, and it's really difficult to handle retroactively.
Daylight Saving Time is especially bad.
(DAMHIK)