I have a column in a MySQL database called "dueDate". I want to update the record on the due date at 11:59 PM.
Is there a way to create an event or cronjob that could do that?
You can set an event, or cronjob, for any particular time, if your system administration allows you to use either facility. It's easy to look up how to create a MySQL repeating event.
But this is a brittle way of dealing with time dependencies in your business rules. What do I mean by "brittle"? For one thing, if something goes wrong and the job doesn't run, your business rules are fouled up and need to be repaired. For another thing, cronjobs and events don't run at precise times of day; they run on or after that time of day. They can take a while to start.
So, I suggest you use rules in a query to enforce your business rules. Suppose, for example, your original desire is to set a column called is_overdue to 1 at the end of the due date. Instead, use a query like this to compute your is_overdue column.
SELECT whatever, dueDate,
IF(dueDate < CURDATE(), 1, 0) AS is_overdue
FROM table ...
This has the advantage that it will always be correct, down to the millisecond, and won't depend on the running of a brittle background job.
Events and cronjobs are better used for purging of stale records. For example, you can get rid of any records that have been expired for 30 days or more by using this kind of query in them.
DELETE FROM table WHERE dueDate <= CURDATE() - INTERVAL 30 DAY
If your cronjob / event fails to run on a particular day, the next day's run will still do the cleanup correctly.
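For what it's worth, here is a minimal sketch of what such a purge event might look like in MySQL. The table name tasks and the event name purge_stale_tasks are made up for illustration, and it assumes the event scheduler is enabled on your server and that your account has the EVENT privilege.

```sql
-- Hypothetical names: `tasks` (table) and `purge_stale_tasks` (event).
-- The scheduler must be running; an administrator can enable it with
--   SET GLOBAL event_scheduler = ON;
CREATE EVENT IF NOT EXISTS purge_stale_tasks
  ON SCHEDULE EVERY 1 DAY
  COMMENT 'Delete rows whose dueDate passed 30 or more days ago'
  DO
    -- Same condition as the DELETE above. A skipped run is harmless,
    -- because the next run removes everything that qualifies by then.
    DELETE FROM tasks
    WHERE dueDate <= CURDATE() - INTERVAL 30 DAY;
```

Because the condition is evaluated against the current date on each run, the job does not need to fire at any exact moment to be correct.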
Edit. The point of this suggestion is to compute the time-dependent column (is_overdue in the example). If you follow this suggestion, you won't update the table at all. Instead, you'll use the suggested query whenever you retrieve the is_overdue value.
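If you don't want to repeat the expression in every query, one option is to wrap it in a view. This is only a sketch: the table tasks, its id column, and the view name are assumptions, not something from the question.

```sql
-- Hypothetical table `tasks` with columns `id` and `dueDate`.
-- CURDATE() is evaluated each time the view is queried, so the flag
-- is computed at read time rather than stored.
CREATE OR REPLACE VIEW tasks_with_status AS
SELECT t.id,
       t.dueDate,
       (t.dueDate < CURDATE()) AS is_overdue  -- 1 once the due date has ended
FROM tasks AS t;

-- Usage: list everything that is overdue right now.
SELECT id, dueDate
FROM tasks_with_status
WHERE is_overdue = 1;
```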
Pro tip. When you want to do something at a time <= the last moment of a particular day, you're better off doing it at a time < the first moment of the next day. That is, for best results use
WHERE dueDate < '2017-11-17' + INTERVAL 1 DAY
in place of
WHERE dueDate <= '2017-11-17 23:59:59'
Why? The last moment of a day is hard to express precisely, especially if your system's timing uses subsecond precision. But the first moment of a day is easy to express precisely.
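The same idea extends to selecting everything that falls on one particular day: use a half-open range with an inclusive lower bound and an exclusive upper bound. A small sketch, assuming a hypothetical tasks table with id and dueDate columns:

```sql
-- All rows with a dueDate at any time on 2017-11-17, whatever
-- fractional-second precision the column uses.
SELECT id, dueDate
FROM tasks
WHERE dueDate >= '2017-11-17'
  AND dueDate <  '2017-11-17' + INTERVAL 1 DAY;
```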
Related
Last year I was working on a project for university where one feature necessitated the expiry of records in the database with almost to-the-second precision (i.e. exactly x minutes/hours after creation). I say 'almost' because a few seconds probably wouldn't have meant the end of the world for me, although I can imagine that in something like an auction site, this probably would be important (I'm sure these types of sites use different measures, but just as an example).
I did research on MySQL events and did end up using them, although now that I think back on it I'm wondering if there is a better way to do what I did (which wasn't all that precise or efficient). There are three methods I can think of that use events to achieve this. I want to know if these methods would be effective and efficient, or if there is some better way:
1. Schedule an event to run every second and update expired records. I imagine that this would cause issues as the number of records increases and takes longer than a second to execute, and might even interfere with normal database operations. Correct me if I'm wrong.
2. Schedule an event that runs every half-hour or so (could be any time interval, really), updating expired records. At the same time, impose selection criteria when querying the database to only return records whose expiration date has not yet passed, so that any records that expired since the last event execution are not retrieved. While this would be accurate at the time of retrieval, it defeats the purpose of having the event in the first place, and I'd assume the extra selection criteria would slow down the select query. In my project last year, I used this method, and the event updating the records was really only for backend logging purposes.
3. At insert, have a trigger that creates a dynamic event specific to the record that will expire it precisely when it should expire. After the expiry, delete the event. I feel like this would be a great method of doing it, but I'm not too sure if having so many events running at once would impact on the performance of the database (imagine a database that has even 60 inserts an hour - that's 60 events all running simultaneously for just one hour. Over time, depending on how long the expiration is, this would add up).
I'm sure there are more ways that you could do this - maybe using a separate script that runs externally to the RDBMS is an option - but these are the ones I was thinking about. If anyone has any insight as to how you might expire a record with precision, please let me know.
Also, despite the fact that I actually did use it in the past, I don't really like method 2 because while this works for the expiration of records, it doesn't really help me if instead of expiring a record at a precise time, I wanted to make it active at a certain time (i.e. a scheduled post in a blog site). So for this reason, if you have a method that would work to update a record at a precise time, regardless of what that update does (expire or post), I'd be happy to hear it.
Option 3:
At insert, have a trigger that creates a dynamic event specific to the record that will expire it precisely when it should expire. After the expiry, delete the event. I feel like this would be a great method of doing it, but I'm not too sure if having so many events running at once would impact on the performance of the database (imagine a database that has even 60 inserts an hour - that's 60 events all running simultaneously for just one hour. Over time, depending on how long the expiration is, this would add up).
If you know the expiry time on insert, just put it in the table:
library_record - id, ..., create_at, expire_at
And query live records with the condition:
expire_at > NOW()
Same with publishing:
library_record - id, ..., create_at, publish_at, expire_at
Where:
publish_at <= NOW() AND expire_at > NOW()
You can set publish_at = create_at for immediate publication or just drop create_at if you don't need it.
Each of these, with the correct indexing, will have performance comparable to an is_live = 1 flag in the table and save you a lot of event related headache.
Also you will be able to see exactly why a record isn't live and when it expired/should be published easily. You can also query things such as records that expire soon and send reminders with ease.
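To make that concrete, here is a rough sketch of how the table and queries might look. The title column, the index names and the index choice are my own assumptions for illustration; the right indexes depend on your actual queries. DEFAULT CURRENT_TIMESTAMP on a DATETIME column needs MySQL 5.6.5 or later.

```sql
-- Hypothetical layout for library_record.
CREATE TABLE library_record (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title      VARCHAR(255) NOT NULL,
  create_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  publish_at DATETIME NOT NULL,
  expire_at  DATETIME NOT NULL,
  INDEX idx_publish_expire (publish_at, expire_at),
  INDEX idx_expire (expire_at)
);

-- Live records: already published and not yet expired.
SELECT id, title
FROM library_record
WHERE publish_at <= NOW()
  AND expire_at  >  NOW();

-- Records expiring within the next 24 hours, e.g. for reminders.
SELECT id, title, expire_at
FROM library_record
WHERE expire_at >  NOW()
  AND expire_at <= NOW() + INTERVAL 1 DAY
ORDER BY expire_at;
```

No event or trigger is involved: a record becomes live or expired simply because NOW() has moved past the stored timestamp.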
In order to analyze dates and times I am creating a MySQL table where I want to keep the time information. Some example analyses will be stuff like:
Items per day/week/month/year
Items per weekday
Items per hour
etc.
Now in regards to performance, how should I store the time information in my table:
date type: Unix timestamp?
date type: datetime?
or keep date information in one column each, e.g. year, month, day in separate fields?
The last one, for example, would be handy if I'm analysing by weekday; I wouldn't have to perform WEEKDAY(item.date) on MySQL but could simply use WHERE item.weekday = :w.
Based on your usage, you want to use the native datetime format. Unix formats are most useful when the major operations are (1) ordering; (2) taking differences in seconds/minutes/hours/days; and (3) adding seconds/minutes/hours/days. They need to be converted to internal date time formats to get the month or week day, for instance.
You also have a potential indexing issue. If you want to select ranges of days, hours, months and so on for your results, then you want an index on the column. For this purpose an index on a datetime is probably sufficient.
If the summaries are by hour, you might find it helpful to store the date component in a date field and the hour in a separate column. That would be particularly helpful if you are combining hours from different days.
Whether you break out other components of the date, such as weekday and month, for indexing purposes would depend on the volume of data in the table, performance requirements, and the queries you are planning on running. I would not be inclined to do this, except as a later optimization.
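As a rough illustration of the plain-datetime approach, the analyses listed in the question could look something like this. The table item and the column created_at are hypothetical stand-ins for your own names.

```sql
-- Hypothetical table: one row per item, with an indexed DATETIME.
CREATE TABLE item (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  created_at DATETIME NOT NULL,
  INDEX idx_created_at (created_at)
);

-- Items per day within a date range. The range predicate is written
-- against the bare column so the index can be used.
SELECT DATE(created_at) AS item_day, COUNT(*) AS items
FROM item
WHERE created_at >= '2015-04-03'
  AND created_at <  '2015-05-16'
GROUP BY DATE(created_at);

-- Items per weekday (WEEKDAY() returns 0 for Monday through 6 for Sunday).
SELECT WEEKDAY(created_at) AS weekday_no, COUNT(*) AS items
FROM item
GROUP BY WEEKDAY(created_at);

-- Items per hour of day, combining hours from different days.
SELECT HOUR(created_at) AS hour_no, COUNT(*) AS items
FROM item
GROUP BY HOUR(created_at);
```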
The rule of thumb is: store things as they should be stored, and don't do performance tweaks until you hit a bottleneck. If you store your date as separate fields, you'll eventually stumble upon a situation where you need the date as a whole inside your database (e.g. an update query for a particular range of time), and this will be hell: a condition covering 3 April 2015 through 15 May 2015 would be enormous.
You should keep your dates as a date type. This will grant you maximum flexibility and (most probably) query readability, and will keep all of your options open for working with them. The only thing I really can recommend is storing the same date divided into year/month/day in additional columns - of course, this will bloat your database and require extreme caution in update scenarios, but this will allow you to use any variant of source data in your queries.
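If you do go down the road of extra columns, generated columns (MySQL 5.7+ or MariaDB 10.2+) are one way to avoid the update hazard, since the server derives them from the datetime for you. A sketch with made-up table and column names:

```sql
-- Hypothetical table. weekday_no and hour_no are derived by the server
-- from created_at, so they cannot drift out of sync on UPDATE.
CREATE TABLE item_with_parts (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  created_at DATETIME NOT NULL,
  weekday_no TINYINT AS (WEEKDAY(created_at)) STORED,  -- 0 = Monday
  hour_no    TINYINT AS (HOUR(created_at)) STORED,
  INDEX idx_created_at (created_at),
  INDEX idx_weekday (weekday_no),
  INDEX idx_hour (hour_no)
);

-- Only created_at is ever written; the parts follow automatically.
INSERT INTO item_with_parts (created_at) VALUES ('2015-04-03 14:30:00');

-- The WHERE item.weekday = :w style of query from the question.
SELECT COUNT(*) AS items
FROM item_with_parts
WHERE weekday_no = 4;  -- Fridays
```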
I've got a dataset that I want to be able to slice up by date interval. It's a bunch of scraped web data and each item has a unix-style millisecond timestamp as well as a standard UTC datetime.
I'd like to be able to query the dataset, picking out the rows that are closest to various time intervals:
e.g.: Every hour, once a day, once a week, etc.
There is no guarantee that the timestamps are going to fall evenly on the interval times, otherwise I'd just do a mod query on the timestamp.
Is there a way to do this with SQL commands that doesn't involve stored procs or some sort of pre-computed support tables?
I use the latest MariaDB.
EDIT:
The marked answer doesn't quite answer my specific question but it is a decent answer to the more generalized problem so I went ahead and marked it.
I was specifically looking for a way to query a set of data where the timestamp is highly variable and to grab out rows that are reasonably close to periodic time intervals. E.g.: get all the rows that are the closest to being on 24 hour intervals from right now.
I ended up using a modulus query to solve the problem: timestamp % interval < average spacing between data points. This occasionally grabs extra points and misses a few but was good enough for my graphing application.
And then I got sick of the node-mysql library crashing all the time so I moved to MongoDB.
You say you want 'closest to various time intervals' but then say 'every hour/day/week', so the actual implementation will depend on what you really want, but you can use a host of standard date/time functions to group records, for example count by day:
SELECT DATE(your_DateTime) AS Dt, COUNT(something) AS CT
FROM yourTable
GROUP BY DATE(your_DateTime)
Count by Hour:
SELECT DATE(your_DateTime) AS Dt,HOUR(your_DateTime) AS Hr, COUNT(something) AS CT
FROM yourTable
GROUP BY DATE(your_DateTime), HOUR(your_DateTime)
See the full list of supported date and time functions here:
https://mariadb.com/kb/en/date-and-time-functions/
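For the narrower problem described in the question's edit (one row per interval, whichever is nearest to the boundary), a window function is another possibility on MariaDB 10.2+ or MySQL 8.0+. This is only a sketch: the table scraped and its columns id and ts_ms (epoch milliseconds) are assumed names, and the boundaries are aligned to the epoch (whole UTC hours here) rather than to "right now".

```sql
-- 3600000 ms = 1 hour; change the literal in all three places for
-- another interval (86400000 for daily, and so on).
SELECT id, ts_ms
FROM (
  SELECT s.id,
         s.ts_ms,
         ROW_NUMBER() OVER (
           -- Bucket each row with its nearest hour boundary...
           PARTITION BY ROUND(s.ts_ms / 3600000)
           -- ...and rank rows by their distance from that boundary.
           ORDER BY ABS(s.ts_ms - ROUND(s.ts_ms / 3600000) * 3600000)
         ) AS rn
  FROM scraped AS s
) AS ranked
WHERE rn = 1
ORDER BY ts_ms;
```

Unlike the modulus filter, this returns exactly one row for every interval that has any data near it, however uneven the spacing is.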
I have a fairly 'active' CDR table, and every 5 minutes or so I want to select the records from the last 5 minutes. The problem is that its IDs are SHA hashes generated from a few of the other columns, so all I have to lean on is a timestamp field, which I filter on to select the time window of records I want.
The next problem is that obviously I cannot guarantee my script will run on the second precisely every time, or that the wall clocks of the server will be correct (which doesn't matter), and most importantly there will almost certainly be more than one record per second: say there are 3 rows for '2013-08-08 14:57:05', and before the second expires one more might be inserted.
By the time I query for '2013-08-08 14:57:05' and get records BETWEEN '2013-08-08 14:57:05' AND '2013-08-08 15:02:05', there will be more records for '2013-08-08 14:57:05' which I would have missed.
Essentially:
imprecise wall clock time
no sequential IDs
multiple records per second
query execution time
unreliable frequency of running the query
are all preventing me from getting a valid set of rows in a specified rolling time window. Any suggestions for how I can get around these?
If you are using the same clock then I see no reason why things would be wrong. A solution you might want to consider is a datetime table. That way, on every run you update the start and stop times based on the server time, and as things are added they are guaranteed to fall within that timeframe.
I mean, you COULD do it by hardcoding, but my way would sort of forcibly store a start and stop point in the database to use.
I would use cron to handle the intervals and timing, not to take the time from it, but just to avoid locking up the database by checking all the time.
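Roughly, the stored start/stop idea could be sketched like this. All names (poll_window, cdr, call_time) are hypothetical, and the 10-second lag is an assumption: it only helps if rows are stamped with the database clock and become visible within that lag.

```sql
-- One-row bookkeeping table that remembers where the last run stopped.
CREATE TABLE IF NOT EXISTS poll_window (
  id       TINYINT  NOT NULL PRIMARY KEY,
  last_end DATETIME NOT NULL
);
-- Seed it once; later runs leave the existing row alone.
INSERT IGNORE INTO poll_window (id, last_end)
VALUES (1, NOW() - INTERVAL 5 MINUTE);

-- On every run (from cron, whenever it happens to fire):
SELECT last_end INTO @window_start FROM poll_window WHERE id = 1;
-- Stop a little in the past so the final second is already complete.
SET @window_end = NOW() - INTERVAL 10 SECOND;

-- Half-open window: each row belongs to exactly one run, with no gaps
-- or overlaps, no matter how irregularly the script runs.
SELECT *
FROM cdr
WHERE call_time >= @window_start
  AND call_time <  @window_end;

-- Remember the stop point for the next run.
UPDATE poll_window SET last_end = @window_end WHERE id = 1;
```

The window boundaries come from the database, not from the script's clock, so an imprecise wall clock on the machine running cron doesn't matter.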
I probably haven't got all the details, but to answer your question title "Reliably select from a database table at fixed time intervals"...
I don't think you could even hope for a query to be run at "second precise" time.
One key problem with that approach is that you will have to deal with concurrent access and locks. You might be able to send the query at a fixed time, but your query might wait on the DB server for several seconds (or be executed against a fairly outdated snapshot of the DB), especially in your case since the table is apparently "busy".
As a suggestion, if I were you, I would spend some time thinking about message queue systems (like http://www.rabbitmq.com/ just to cite one, without implying it is somehow "your" solution). In any case, those kinds of tools are probably better suited to your needs.
Distilling this project down to the simplest of terms;
Users click a button, a record is made with a timestamp of NOW().
NOW() of course equals the time on the server of record creation.
I need to show them stats based on their timezone, not mine.
What is the best method for dealing with time zone offsets in MySQL? Is there a specific field format that is designed to deal with an offset?
I will need to run things along the lines of:
SELECT DATE_FORMAT(b_stamp, '%W') AS week_day, count(*) AS b_total, b_stamp
FROM table
WHERE
(b_stamp >= DATE_SUB(NOW(), INTERVAL 7 DAY))
AND
(user_id = '$user_id') GROUP BY week_day ORDER BY b_stamp DESC
I would rather not ask the user what time zone they are in, I assume JS is the only way to pull this data out of the browser. Maybe if they are on a mobile device, and this is not a web based app, I could get it there, but that may not be the direction this goes in.
I am considering the best way may be to determine their offset, and set a variable to "server_time" +/- their_offset. This makes it appear as if the server is in a different location. I believe this would be best, as there would be no additional +/- logic I need to add to the code, muddying it and making it ugly.
On the other hand, that puts the data in the database with time stamps that are all over the board.
Suggestions?
You can use JavaScript to get the time zone offset (in hours) from the client as follows:
var timeZone = (new Date().getTimezoneOffset() / 60) * (-1);
Print the variable out and test before using it. I think this will be your simplest bet.
Other than using JS, you could get the time zone of their IP address (using something like ip2location) then use MySQL's CONVERT_TZ() function.
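Either way, once you have the user's offset, the weekday query from the question could be adapted along these lines. This is a sketch under a few assumptions: b_stamp is stored in UTC, the table is called clicks, and @user_tz and @user_id are placeholder values your application would supply. A fixed offset such as '-05:00' ignores daylight saving time; named zones such as 'America/New_York' handle it, but CONVERT_TZ() returns NULL for them unless the MySQL time zone tables have been loaded.

```sql
SET @user_tz = '-05:00';  -- hypothetical: offset obtained from the client
SET @user_id = 42;        -- hypothetical user

-- Shift each UTC timestamp into the user's zone before taking the weekday,
-- so a click at 01:00 UTC on Tuesday counts as Monday for this user.
SELECT DATE_FORMAT(CONVERT_TZ(b_stamp, '+00:00', @user_tz), '%W') AS week_day,
       COUNT(*) AS b_total
FROM clicks
WHERE b_stamp >= UTC_TIMESTAMP() - INTERVAL 7 DAY
  AND user_id = @user_id
GROUP BY week_day
ORDER BY b_total DESC;
```

The stored data stays in one consistent zone; only the presentation is shifted per user.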