Users of my application need to be able to schedule certain tasks to run at certain times (e.g. once only, every minute, every hour, etc.). My plan is to have cron run a script every minute that checks the application for tasks to execute and, if there are any, executes them.
Questions:
Is running cron every minute a good idea?
How do I model intervals in the database the way cron does (e.g. every minute, every 5th minute of every hour, etc.)?
I'm using LAMP.
Or, rather than doing any, you know, real work, simply create an interface for the users and then publish entries in cron! Rather than having cron call you every minute, have it call scripts as directed by the users. When they add or change jobs, rewrite the crontab.
No big deal.
In Unix, cron allows each user (each Unix login, that is) to have their own crontab, so you can have one dedicated to your app; you don't have to use the root crontab for this.
Do you mean that you have a series of user-defined jobs that need to be executed at user-defined intervals, and you'd like cron to facilitate the processing of those jobs? If so, you'd want a database table with at least 2 fields:
JOB,
OFTEN
where OFTEN is how often they'd like the job to run, using syntax similar to cron's.
You'd then need to write a script (in Python, Ruby, or some similar language) to parse that data; this script is what runs every minute via your actual cron.
Take a look at this Stack Overflow question and this Stack Overflow question regarding how to parse crontab data via Python.
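As a rough illustration, here is a minimal sketch of that per-minute checker in Python, assuming the two-field table above and the third-party croniter package (the job commands and the loaded rows are hypothetical):

# check_jobs.py -- run from the real crontab once per minute.
from datetime import datetime, timedelta
import subprocess

from croniter import croniter

def is_due(cron_expr, now):
    # The job is due if its next firing time, computed from one
    # minute back, lands within the current minute.
    base = now - timedelta(minutes=1)
    return croniter(cron_expr, base).get_next(datetime) <= now

now = datetime.now().replace(second=0, microsecond=0)
# (JOB, OFTEN) rows as they might come back from the database.
jobs = [
    ("/usr/local/bin/report.sh", "*/5 * * * *"),   # every 5th minute
    ("/usr/local/bin/cleanup.sh", "0 * * * *"),    # hourly
]
for command, schedule in jobs:
    if is_due(schedule, now):
        subprocess.Popen(command, shell=True)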
I would like to delay deletion of data from the database. I am using MySQL and NestJS. I heard that cron is what I need: I want each entry to be deleted one week after it is created. Can you help me with this? Is cron what I need, or should I use something else?
A cron job (or at on Windows) or a MySQL EVENT can be created to periodically check for something and take action. The resolution is only 1 minute.
If you need very precise resolution, another technique is required. For example, if you don't want to show a user anything that is more than 1 week old, to the second, simply exclude that from the SELECT. That is, add something like this to the WHERE clause: AND created_date >= NOW() - INTERVAL 7 DAY.
Doing the above gives you the freedom to schedule the actual DELETE for only, say, once a day -- rather than pounding on the database only to usually find nothing to do.
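For illustration, that once-a-day DELETE could be done with a MySQL EVENT; a sketch, where the entries table name is hypothetical and created_date comes from the WHERE clause above:

-- Requires the event scheduler: SET GLOBAL event_scheduler = ON;
CREATE EVENT purge_old_rows
    ON SCHEDULE EVERY 1 DAY
    DO
        DELETE FROM entries
        WHERE created_date < NOW() - INTERVAL 7 DAY;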
If you do choose to "pound on the database", be aware of the following problem: if one instance of the deleter script runs for a long time (for any of a number of reasons), it might not finish before the next copy comes along. In some situations these scripts can stumble over each other to the extent of effectively "crashing" the server.
That leads to another solution -- a single script that runs forever. It has a simple loop:
Do the actions needed (deleting old rows)
Sleep 1 -- or 10 or 60 or whatever -- this is to be a "nice guy" and not "pound on the system".
The only tricky part is making sure that it starts up again after any server restart or crash of the script.
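A minimal sketch of such a forever-running script in Python, assuming the mysql-connector-python package and the hypothetical entries table from the sketch above (keeping it alive across restarts is typically delegated to something like systemd or supervisord):

# deleter.py -- single long-running copy; no overlapping cron instances.
import time

import mysql.connector

conn = mysql.connector.connect(user="app", password="...", database="mydb")

while True:
    cur = conn.cursor()
    # Delete in modest batches so a single pass never holds locks for long.
    cur.execute(
        "DELETE FROM entries"
        " WHERE created_date < NOW() - INTERVAL 7 DAY"
        " LIMIT 1000"
    )
    conn.commit()
    cur.close()
    # Be a "nice guy": sleep between passes instead of pounding the server.
    time.sleep(60)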
You can configure a cron job to delete it periodically.
There are several ways to configure a cron job.
You can write a shell script that periodically deletes entities in the DB and schedule it with the Linux crontab, or you can use an application that provides cron jobs, such as Jenkins or Airflow.
AWS Lambda can also be triggered on a cron-style schedule.
Using the cron support provided by NestJS seems to be the simplest way to solve the problem.
See the NestJS docs on task scheduling: https://docs.nestjs.com/techniques/task-scheduling
I know that it's possible to do a manual import of data from an Excel file to a MySQL database.
But is it possible to make this an automated job that runs every 24 hours, or even every 60 seconds?
In general you should be looking at your operating system's feature set and not at MySQL if you want a certain task to be executed on a periodic basis.
On Unix systems, you can have a look at cron jobs.
On Windows, there's the Task Scheduler.
Concept
The idea is that you have a script (or something else you can execute) that does exactly what you want, once. Then you tell your operating system, by means of a cron job, a scheduled task, or similar, to run that script every 24 hours, every 60 seconds, or whatever interval you like.
Example
Run the script (or program) excel_to_db.sh every 24 hours:
0 0 * * * /home/user/scripts/excel_to_db.sh
You might want to extend this cron job to redirect the script output (stdout and stderr) to a file and so on to make this more robust.
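For example, appending a redirect to the entry above (the log path is just an illustration):

0 0 * * * /home/user/scripts/excel_to_db.sh >> /home/user/logs/excel_to_db.log 2>&1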
I have a pipeline taking data from a MySQL server and inserting it into Datastore using the Dataflow runner.
It works fine as a batch job executing once. The thing is that I want to get the new data from the MySQL server into Datastore in near real time, but JdbcIO gives bounded data as a source (as it is the result of a query), so my pipeline executes only once.
Do I have to execute the pipeline and resubmit a Dataflow job every 30 seconds?
Or is there a way to make the pipeline redo it automatically without having to submit another job?
It is similar to the topic Running periodic Dataflow job, but I cannot find the CountingInput class. I thought that maybe it was replaced by the GenerateSequence class, but I don't really understand how to use it.
Any help would be welcome!
This is possible and there's a couple ways you can go about it. It depends on the structure of your database and whether it admits efficiently finding new elements that appeared since the last sync. E.g., do your elements have an insertion timestamp? Can you afford to have another table in MySQL containing the last timestamp that has been saved to Datastore?
You can, indeed, use GenerateSequence.from(0).withRate(1, Duration.standardSeconds(1)) that will give you a PCollection<Long> into which 1 element per second is emitted. You can piggyback on that PCollection with a ParDo (or a more complex chain of transforms) that does the necessary periodic synchronization. You may find JdbcIO.readAll() handy because it can take a PCollection of query parameters and so can be triggered every time a new element in a PCollection appears.
If the amount of data in MySql is not that large (at most, something like hundreds of thousands of records), you can use the Watch.growthOf() transform to continually poll the entire database (using regular JDBC APIs) and emit new elements.
That said, what Andrew suggested (emitting records additionally to Pubsub) is also a very valid approach.
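For what it's worth, the same periodic-polling idea can be sketched with the Beam Python SDK (the CountingInput/GenerateSequence names in the question are from the Java SDK); PeriodicImpulse plays the role of GenerateSequence here, and the two helper functions are hypothetical placeholders:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.periodicsequence import PeriodicImpulse

def fetch_rows_since_last_sync():
    # Hypothetical: query MySQL for rows newer than the last saved sync point.
    return []

def write_to_datastore(row):
    # Hypothetical: convert a row to an entity and write it to Datastore.
    return row

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    (p
     | PeriodicImpulse(fire_interval=30)        # one impulse every 30 seconds
     | "PollMySQL" >> beam.FlatMap(lambda _: fetch_rows_since_last_sync())
     | "ToDatastore" >> beam.Map(write_to_datastore))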
Do I have to execute the pipeline and resubmit a Dataflow job every 30 seconds?
Yes. For bounded data sources, it is not possible to have the Dataflow job continually read from MySQL. When using the JdbcIO class, a new job must be deployed each time.
Or is there a way to make the pipeline redo it automatically without having to submit another job?
A better approach would be to have whatever system is inserting records into MySQL also publish a message to a Pub/Sub topic. Since Pub/Sub is an unbounded data source, Dataflow can continually pull messages from it.
So I got the error message in the subject on the 1 (one) script I had running yesterday, and I am assuming I will get a similar message today.
I have improved the script (which has a trigger to run once per minute) so it functions more along the lines of how it is supposed to; however, the error message got me thinking about what sorts of functions or parts of a program might be asking for more service time than others.
For example, I have had to use multiple sleep calls in my google apps script to allow the data import to run and again for the worksheet changes/copy paste calls to process. Are all those sleep calls counting against me in terms of service time used?
I would ask on the community's behalf that this be left as an open-ended question, not specific to the sleep function: what parts of a script demand service time, and which (if any) do not?
Every call to a service (Spreadsheet, Calendar or whatever) takes more time than regular JavaScript operations.
For example, if you have to modify 10 cells in a Spreadsheet, calling range.setValue() 10 times takes far more time than putting all the data in an array and then updating the spreadsheet in one go using range.setValues().
If you can paste pieces of your code, the community will be able to offer more advice on how to improve your script.
The limit is on CPU time used by time-based triggers, and I believe those sleep calls do count against your limit. I'd encourage you to find ways to avoid the sleep calls, or schedule your script to run less often.
I run a game statistics site. Its MySQL database is small potatoes compared to most of the things people work on around here, but shared hosting does necessitate an eye on query optimization, particularly when performing lots of joins and sub-queries.
Earlier this week I moved a rather slow (~0.5s) query that grouped, counted, averaged, and sorted the ratings of members to a nightly cron job. Results are stored in a table.
Because we average about one new rating per day, the change does not cause any perceptible data inaccuracy to my users, AND the new query which just grabs rows from the table runs in the ~0.000X range, so all pages pulling that data are noticeably faster.
Clearly this is a good thing!
And as I sat there basking in the glow of my cron job, my mind started running through other aspects of the site and mentally tagging those that could be cron'd... (many)
Which leads me to wonder - is it possible to use cron too much?
Because my site's database changes about once a day, I could conceivably run ALL complex queries (there are many) through nightly cron jobs and store the results in tables.
Is there ever a downside? (apart from data occasionally not being up-to-the-second accurate?)
Cron is great; it's usually a good thing to refrain from reinventing wheels. Some applications have more precise timing needs than cron can accommodate, so that's one reason not to use it. Also, distributing and managing cron jobs that form an integral part of your app can be difficult and error-prone, especially in the absence of a competent OS package manager. Troubleshooting can be a bit of a pain, particularly when one server is missing one of its 100 cron jobs or something, but that can be managed with an OS package manager or with something like Puppet.
But my opinion is to use cron whenever you can and it makes sense, rather than rolling your own.
You're not beginning to approach the limits of what amount of jobs can (or should) be scheduled with cron. You'll be just fine. :)
You might want to consider a worker/message-queue system like Gearman to trigger jobs that should be run 'after the fact', but not necessarily on a fixed schedule.
How about one cron job that runs all your procedures?
I once worked on a Unix system that failed pretty miserably after the cron job queue exceeded 20 entries. The queue did not execute in any predictable order (FIFO, LIFO, etc.); it was simply randomized.
You might consider using triggers to keep your summary statistics up to date. There's also an event scheduler in MySQL 5.1+ if you like running queries periodically.
http://dev.mysql.com/doc/refman/5.0/en/triggers.html
http://dev.mysql.com/doc/refman/5.1/en/events.html
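For instance, a trigger-based version of the nightly ratings rollup might look like this sketch, where the table and column names are all hypothetical:

CREATE TRIGGER rating_added
AFTER INSERT ON ratings
FOR EACH ROW
    UPDATE rating_summary
    SET rating_count = rating_count + 1,
        rating_sum   = rating_sum + NEW.rating;

With the count and sum kept current on every insert, pages can compute the average from the summary row instead of re-scanning the whole ratings table.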