I have the following table in MySQL that records counts of events happening each day:
event_date event_count
2011-05-03 21
2011-05-04 12
2011-05-05 12
I want to be able to query this efficiently by date range AND by day of week. For example: "What is the event_count on Tuesdays in May?"
Currently the event_date field is of type DATE. Are there any functions in MySQL that let me query this column by day of week, or should I add another column to the table to store the day of week?
The table will hold hundreds of thousands of rows, so given a choice I'll choose the most efficient solution (as opposed to the simplest).
Use DAYOFWEEK in your query, something like:
SELECT * FROM mytable WHERE MONTH(event_date) = 5 AND DAYOFWEEK(event_date) = 7;
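This will find all rows for Saturdays in May (of any year, since MONTH() ignores the year). DAYOFWEEK() returns 1 for Sunday through 7 for Saturday, so Tuesdays would be DAYOFWEEK(event_date) = 3.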
To get the fastest reads, store a denormalized field holding the day of the week (and whatever else you need). That way you can index those columns and avoid full table scans; a condition like DAYOFWEEK(event_date) = 7 wraps the column in a function, so it generally cannot use an index on event_date.
Just try the above first to see if it suits your needs, and if it doesn't, add some extra columns and fill them in on write. Just watch out for update anomalies (make sure you update the day_of_week column if you change event_date).
Note that the denormalized fields will make writes slower (there is extra calculation on every write) and take up more space. Make sure you really need the benefit and can measure that it helps you.
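The trade-off above can be sketched in Python (this models the logic; it is not MySQL code). It assumes the three sample rows from the question, and `mysql_dayofweek`, `scan` and `EventTable` are hypothetical names. `scan` plays the role of the DAYOFWEEK() query, which must look at every row; `EventTable` plays the role of a denormalized, indexed day-of-week column maintained on write, including the update-anomaly case where an event_date changes.

```python
from collections import defaultdict
from datetime import date

def mysql_dayofweek(d: date) -> int:
    """Same numbering as MySQL DAYOFWEEK(): 1 = Sunday ... 7 = Saturday."""
    return d.isoweekday() % 7 + 1  # isoweekday(): Monday = 1 ... Sunday = 7

# Sample rows from the question: (event_date, event_count)
rows = [(date(2011, 5, 3), 21), (date(2011, 5, 4), 12), (date(2011, 5, 5), 12)]

def scan(rows, month, dow):
    """Compute on read, like WHERE MONTH(event_date) = ... AND DAYOFWEEK(event_date) = ...:
    every row has to be examined."""
    return [(d, n) for d, n in rows if d.month == month and mysql_dayofweek(d) == dow]

class EventTable:
    """Denormalize on write: the (month, day-of-week) key is computed once per
    write and kept in an index, so a read is a single lookup."""

    def __init__(self):
        self.rows = {}                  # event_date -> event_count
        self.index = defaultdict(set)   # (month, day_of_week) -> {event_date, ...}

    def upsert(self, d, count):
        self.rows[d] = count
        self.index[(d.month, mysql_dayofweek(d))].add(d)

    def move(self, old, new):
        """Change an event_date. Forgetting the discard() below would be the
        update anomaly: the index would still list the old day of week."""
        count = self.rows.pop(old)
        self.index[(old.month, mysql_dayofweek(old))].discard(old)
        self.upsert(new, count)

    def lookup(self, month, dow):
        return sorted((d, self.rows[d]) for d in self.index[(month, dow)])
```

For "Tuesdays in May", both `scan(rows, 5, 3)` and `lookup(5, 3)` on a table loaded with the sample rows return `[(date(2011, 5, 3), 21)]`; the difference is only in how much work each read does.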
Check the DAYOFWEEK() function.
If you want a textual representation of the day of the week, use the DAYNAME() function.
I need to calculate the number of "working minutes" between two datetime values; let's call them 'Created' and 'Finished'.
'Finished' is always later than 'Created'. The two values can differ by anything from 1 second to several years. The median difference is 50,000 seconds, or roughly 14 hours.
Working minutes are defined as those occurring between 0900 and 1700 hours, Monday to Friday, excluding weekends and official holidays in our country.
I decided a lookup table was the way to go, so I generated a table of all work minutes, explicitly excluding weekends, nights and holidays...
CREATE TABLE `work_minutes` (
`min` datetime NOT NULL,
-- PRIMARY KEY already enforces uniqueness; a second UNIQUE KEY on `min` would only duplicate the index
PRIMARY KEY (`min`)
);
I populated this programmatically with all the "working minutes" from 2017 to 2024, and at that point I started to feel I was being very inefficient, as the table had ballooned to several hundred thousand rows.
I can do a lookup easily enough, for instance:
SELECT COUNT(min) FROM `work_minutes` AS wm
WHERE wm.min > '2022-01-04 00:04:03'
AND wm.min <= '2022-02-03 14:13:09';
#Returns 10394 'working minutes' in 0.078 sec
This is good enough for a one-off lookup, but running it for a table of 70,000 value pairs takes over 90 minutes.
So, I am uncomfortable with the slowness of the query and the sense that the lookup table is unnecessarily bloated.
I am thinking I need to set up two tables, one just for dates and another just for minutes, but I'm not sure how to implement that. Date logic has never been my forte. The most important thing to me is that the lookup can process over 70,000 values reasonably quickly and efficiently.
Working in MySQL 5.7.30. Thanks in advance for your expertise.
Divide the time range into 3 parts: the incomplete days at the start and at the finish, and the middle part, which consists of complete days. Of course, if the start and finish timestamps have the same date part, there is only one part; if their dates are consecutive, you'll have 2 parts to process.
Calculating the number of working minutes in an incomplete day is straightforward: the common interval-overlap formula plus a weekday check does it.
Create a static calendar (service) table that starts from a date guaranteed to be earlier than any possible start timestamp and extends past any possible finish timestamp. Store the cumulative number of working minutes for each date. This table lets you calculate the amount of working time in any range of complete days with a single subtraction.
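A Python sketch of this three-part method (a model of the logic, not MySQL code). Assumptions: the dict returned by the hypothetical `build_calendar` stands in for the calendar table (in MySQL it could be a table with a date column and a cumulative-minutes column), holidays are passed in as a set of dates, and the result is elapsed working time in minutes, which may differ by a fraction of a minute from counting whole minute rows with `min > Created AND min <= Finished`.

```python
from datetime import date, datetime, time, timedelta

WORK_START, WORK_END = time(9, 0), time(17, 0)
MINUTES_PER_DAY = 8 * 60  # 09:00-17:00

def is_workday(d, holidays):
    return d.weekday() < 5 and d not in holidays  # Monday = 0 ... Friday = 4

def build_calendar(first, last, holidays):
    """Calendar table: for every date d in [first, last + 1 day], the working
    minutes in all complete days from `first` up to but excluding d."""
    cum, total, d = {}, 0, first
    while d <= last + timedelta(days=1):
        cum[d] = total
        if is_workday(d, holidays):
            total += MINUTES_PER_DAY
        d += timedelta(days=1)
    return cum

def partial_day(day, created, finished, holidays):
    """Overlap of [created, finished] with that day's 09:00-17:00 window."""
    if not is_workday(day, holidays):
        return 0.0
    lo = max(created, datetime.combine(day, WORK_START))
    hi = min(finished, datetime.combine(day, WORK_END))
    return max(0.0, (hi - lo).total_seconds() / 60)

def working_minutes(created, finished, cum, holidays):
    d1, d2 = created.date(), finished.date()
    if d1 == d2:  # one part only
        return partial_day(d1, created, finished, holidays)
    head = partial_day(d1, created, finished, holidays)  # rest of the first day
    tail = partial_day(d2, created, finished, holidays)  # start of the last day
    # Complete days d1+1 .. d2-1 with a single subtraction (0 if the dates are consecutive)
    middle = cum[d2] - cum[d1 + timedelta(days=1)]
    return head + middle + tail
```

Each Created/Finished pair then costs two partial-day calculations and two calendar lookups regardless of how far apart they are, and the calendar needs only one row per day (roughly 2,900 rows for 2017 to 2024) instead of one per working minute.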
Plan A: Convert the DATETIME values to seconds (from some arbitrary time) via TO_SECONDS(), then manipulate them with simple arithmetic.
Plan B: Use the DATEDIFF() function.
Your COUNT(min) counts the number of rows where min IS NOT NULL. You may as well say COUNT(*). But did you really want to count the number of rows?
SELECT
name,
start_time,
TIME(cancelled_date) AS cancelled_time,
TIMEDIFF(start_time, TIME(cancelled_date)) AS difference
FROM
bookings
I'm trying to get from the database a list of bookings that were cancelled with less than an hour's notice. The comparison is between two TIME values (I know a timestamp would have made this easier). So above I've calculated the time difference between the two values, and now I need to add a WHERE clause to restrict it to only those records with a difference under 1:00:00. Obviously this isn't a number, it's a time, so a simple bit of maths won't do it.
start_time is a TIME
cancelled_date is a DATETIME but I'm converting it to TIME in the query to then calculate cancelled_time and difference.
I would be inclined to do this by adding an hour to the cancellation time instead, something like this:
WHERE start_time < ADDTIME(TIME(cancelled_date), '01:00:00')
I can't quite tell what the right logic is from the question, because your column names don't match the description.
In this case, doing a subtraction or doing the comparison performs about the same. But if you had a constant instead of cancelled_date, there would be a difference. The following:
WHERE start_time < date_add(now(), interval -1 hour)
allows the engine to use an index on start_time, because start_time stands alone on one side of the comparison and the other side is a constant.
You can use HAVING difference < TIME('01:00:00'), since HAVING can refer to the difference alias from the select list.
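A small Python model of that HAVING condition, assuming hypothetical booking tuples of (name, start_time, cancelled_date). `notice()` puts both times on the same placeholder day, which is effectively what comparing TIME values does, so like the SQL it ignores the cancellation's date and treats a cancellation after the start time (a negative difference) as late too.

```python
from datetime import date, datetime, time, timedelta

PLACEHOLDER_DAY = date(2000, 1, 1)  # arbitrary; only the times matter

def notice(start_time: time, cancelled_date: datetime) -> timedelta:
    """Like TIMEDIFF(start_time, TIME(cancelled_date)): both times are put on
    the same placeholder day, so the cancellation's date is ignored."""
    start = datetime.combine(PLACEHOLDER_DAY, start_time)
    cancelled = datetime.combine(PLACEHOLDER_DAY, cancelled_date.time())
    return start - cancelled

def late_cancellations(bookings, limit=timedelta(hours=1)):
    """Like HAVING difference < TIME('01:00:00'). A negative difference
    (cancelled after the start time) also counts as late."""
    return [name for name, start_time, cancelled_date in bookings
            if notice(start_time, cancelled_date) < limit]

# Hypothetical rows: (name, start_time, cancelled_date)
bookings = [
    ("a", time(10, 0), datetime(2020, 1, 1, 9, 30)),   # 30 minutes' notice
    ("b", time(10, 0), datetime(2020, 1, 1, 8, 0)),    # 2 hours' notice
    ("c", time(10, 0), datetime(2020, 1, 1, 10, 15)),  # cancelled 15 minutes after start
]
print(late_cancellations(bookings))  # ['a', 'c']
```

If cancellations after the start time should not count, add a lower bound (difference >= 0) to the condition.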
I generally use a DATETIME field to store the created and updated times of data within an application.
But now I have come across a database table that keeps the date and time in separate fields.
In which schemas should each of these be used, and why?
What are the pros and cons of each?
There is a huge difference in performance when using a DATE field instead of a DATETIME field. I have a table with more than 4,000,000 records, and for testing purposes I added two fields, each with its own index: one of type DATETIME and one of type DATE.
I disabled the MySQL query cache so I could test properly, and ran the same query 1000 times in a loop:
SELECT * FROM `logs` WHERE `dt` BETWEEN '2015-04-01' AND '2015-05-01' LIMIT 10000,10;
DATETIME index: 197.564 seconds.
SELECT * FROM `logs` WHERE `d` BETWEEN '2015-04-01' AND '2015-05-01' LIMIT 10000,10;
DATE index: 107.577 seconds.
Using the DATE-indexed field reduced the total time by 45.55%.
So if you expect a lot of data in your table, consider storing the date separately from the time, each with its own index.
I tend to think there are basically no advantages to storing the date and time in separate fields. MySQL offers very convenient functions for extracting the date and time parts of a datetime value.
Okay, there can be some efficiency reasons. In MySQL, you can put separate indexes on the fields. So if you want to search for particular times, for instance with a query that counts by hour of the day, that query can use an index on the time field. An index on a datetime field would not be used in this case. A separate date field might make it easier to write a query that uses the date index, but, strictly speaking, a datetime should also work.
The one time where I've seen dates and times stored separately is in a trading system. In this case, the trade has a valuation date. The valuation time is something like "NY Open" or "London Close" -- this is not a real time value. It is a description of the time of day used for valuation.
The tricky part is when you have to do date arithmetic on a time value and you do not want a date portion coming into the mix. Ex:
myapptdate = 2014-01-02 09:00:00
SELECT such and such WHERE myapptdate BETWEEN '2014-01-02 07:00:00' AND '2014-01-02 13:00:00'
If the 07:00 bound is built from a time-only value, it can pick up a placeholder date and arrive as
1900-01-02 07:00:00
instead of
2014-01-02 07:00:00
One difference I found is when using BETWEEN on values with a non-zero time part.
Imagine a search with a "between dates" filter. The standard user expectation is that it will also return records from the end day, so with DATETIME you always have to add an extra day to the upper bound to get the expected result, while with DATE you only pass what the user entered, with no extra logic needed.
So query
SELECT * FROM mytable WHERE mydate BETWEEN '2020-06-24' AND '2020-06-25'
will return the record from 2020-06-25 16:30:00, while the query:
SELECT * FROM mytable WHERE mydatetime BETWEEN '2020-06-24' AND '2020-06-25'
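won't. You'd have to extend the upper bound by one day, preferably as a half-open range so that rows at exactly 2020-06-26 00:00:00 are not included:
SELECT * FROM mytable WHERE mydatetime >= '2020-06-24' AND mydatetime < '2020-06-26'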
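A Python model of these comparisons, assuming (as the answer describes) that MySQL compares a DATETIME with a bare '2020-06-25' as 2020-06-25 00:00:00. The `between` helper is a hypothetical stand-in for SQL's BETWEEN, which is inclusive at both ends.

```python
from datetime import date, datetime

def between(value, low, high):
    """SQL BETWEEN is inclusive at both ends: low <= value <= high."""
    return low <= value <= high

row = datetime(2020, 6, 25, 16, 30)  # the record from the example

# mydatetime BETWEEN '2020-06-24' AND '2020-06-25': the bounds are midnights
datetime_hit = between(row, datetime(2020, 6, 24), datetime(2020, 6, 25))       # False
# mydate BETWEEN '2020-06-24' AND '2020-06-25': the DATE column has no time part
date_hit = between(row.date(), date(2020, 6, 24), date(2020, 6, 25))            # True
# mydatetime >= '2020-06-24' AND mydatetime < '2020-06-26': half-open range
half_open_hit = datetime(2020, 6, 24) <= row < datetime(2020, 6, 26)             # True

# Why half-open: BETWEEN ... AND '2020-06-26' also matches the next midnight
midnight = datetime(2020, 6, 26)
extra_day_hit = between(midnight, datetime(2020, 6, 24), datetime(2020, 6, 26))  # True (unwanted)
half_open_midnight = datetime(2020, 6, 24) <= midnight < datetime(2020, 6, 26)    # False
```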
But as victor diaz mentioned, doing datetime calculations with separate date and time fields would be an inefficient nightmare, far worse than just adding a day to the upper bound. Therefore I'd only use DATE if the time is irrelevant, or as a "cache" to speed up date lookups (see Elwin's answer).
I store top views and 'likes' in a table called 'counts'. Once a night I run this query:
UPDATE `counts` SET `rank`=d7+d6+d5+d4+d3+d2+d1,d7=d6,d6=d5,d5=d4,d4=d3,d3=d2,d2=d1,d1=0;
Each day of the week has its own column, d1 to d7, and each night we shift every value 'down' one column and recalculate the sum.
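To make the nightly UPDATE's semantics explicit, here is a Python model of what it does to a single row; the deque and the sample values are illustrative assumptions, not part of the schema. It relies on MySQL evaluating single-table UPDATE assignments left to right, so `rank` is summed from the old values before the columns shift. It also shows why the job grows with the site: every row of counts is rewritten every night.

```python
from collections import deque

def nightly_roll(days: deque) -> int:
    """Model of the nightly UPDATE for one row.

    days holds [d1, d2, ..., d7] (d1 = most recent) in a deque with maxlen=7.
    rank is computed from the old values, then the columns are shifted.
    """
    rank = sum(days)
    days.appendleft(0)  # d1=0, old d1 moves to d2, ..., old d7 falls off the end
    return rank

row = deque([5, 4, 3, 2, 1, 0, 9], maxlen=7)  # hypothetical d1..d7
print(nightly_roll(row), list(row))  # 24 [0, 5, 4, 3, 2, 1, 0]
```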
As my site has grown, this query now takes ~20 minutes.
I'm looking for suggestions on how to organize this more efficiently, as it seems like it might be a common pattern.
As the comments say, we need to see the schema. But I'll make a suggestion anyway. Don't have 7 different fields d1-d7. What if later you decide to keep the score over a year? Ouch.
I'm going to assume that counts has view_id as its PK. Then have another table, ranks, with columns view_id (a FK into counts), rank (generalizes d1-d7, whatever datatype they are) and rank_date, which is a DATE. Now every night you run:
UPDATE counts SET `rank` = (SELECT COALESCE(SUM(r.`rank`), 0) FROM ranks r WHERE r.view_id = counts.view_id
AND r.rank_date >= DATE_SUB(CURDATE(), INTERVAL 1 WEEK));
[Some RDBMSs allow a JOIN-type syntax in UPDATE queries. MySQL supports this as a multi-table UPDATE, something like the following:
UPDATE counts, (SELECT view_id, SUM(`rank`) AS srank FROM ranks r
WHERE r.rank_date >= DATE_SUB(CURDATE(), INTERVAL 1 WEEK)
GROUP BY r.view_id) AS q1
SET counts.`rank` = q1.srank
WHERE counts.view_id = q1.view_id;
]
This will probably run faster than the first version, because the grouped subquery is evaluated once rather than once per row of counts. One difference: rows in counts with no ranks rows in the past week keep their old rank instead of being reset.
Optionally, to clean up, you can delete rows from ranks that are more than a week old, but with this more flexible schema you don't have to.
If I have a MySQL query like this, summing word frequencies per week:
SELECT
SUM(`city`),
SUM(`officers`),
SUM(`uk`),
SUM(`wednesday`),
DATE_FORMAT(`dateTime`, '%d/%m/%Y')
FROM myTable
WHERE dateTime BETWEEN '2011-09-28 18:00:00' AND '2011-10-29 18:59:00'
GROUP BY WEEK(dateTime)
The results given by MySQL take the first value of column dateTime, in this case 28/09/2011, which happens to be a Wednesday.
Is it possible to adjust the query in MySQL to show the date on which the week starts, even if there is no data available, so that for the above, 2011-09-28 would be replaced with 2011-09-26 instead? That is, the date of the start of the week, being a Monday. Or would it be better to adjust the dates programmatically after the query has run?
The dateTime column is in format 2011/10/02 12:05:00
It is possible to do it in SQL, but it would be easier and more efficient to do it in your program code. Also, while MySQL accepts your query (only when the ONLY_FULL_GROUP_BY SQL mode is off; it is on by default since MySQL 5.7.5), it doesn't quite make sense: you have DATE_FORMAT(dateTime, '%d/%m/%Y') in the select list while you group by WEEK(dateTime). This means the DB engine has to pick an arbitrary date from each group (week). For example, consider records for 27.09.2011, 28.09.2011 and 29.09.2011: they all fall in the same week, so the final result set contains only one row for those three records. Which of the three dates should be passed to DATE_FORMAT()? Adding ORDER BY doesn't settle this, because sorting happens after the server has chosen the values; it simply doesn't make sense to put fields or expressions in the select list that are neither in the GROUP BY nor aggregates. You should return the week number in the select list (instead of the DATE_FORMAT() call) and then calculate the start and end dates from it in your code.
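Following that suggestion, a Python sketch of computing week-start dates in code. It assumes Monday-first ISO 8601 weeks, which correspond to MySQL's WEEK(dateTime, 3) mode; the default WEEK(dateTime) follows default_week_format, which is 0 (Sunday-first) unless changed. The function names are hypothetical. `all_week_starts` also yields labels for weeks with no rows, which the question asks for.

```python
from datetime import date, timedelta

def week_start(d: date) -> date:
    """Monday of the week containing d (date.weekday(): Monday == 0)."""
    return d - timedelta(days=d.weekday())

def monday_of_iso_week(year: int, week: int) -> date:
    """Start date from an ISO year/week pair. Near New Year the ISO year can
    differ from the calendar year, so the pair must be ISO-consistent."""
    return date.fromisocalendar(year, week, 1)  # Python 3.8+

def all_week_starts(first: date, last: date) -> list:
    """Every Monday from the week of `first` through `last`, so that weeks
    with no data still get a row in the output."""
    out, d = [], week_start(first)
    while d <= last:
        out.append(d)
        d += timedelta(weeks=1)
    return out

print(week_start(date(2011, 9, 28)))  # 2011-09-26
```

Merging the query's per-week sums into the list from `all_week_starts(date(2011, 9, 28), date(2011, 10, 29))` gives one row per week, with zeros where no data exists.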