I would like to delay the deletion of data from the database. I am using MySQL and nest.js. I heard that CRON is what I need. I want to delete an entry one week after it is created. Can you help me with this? Is CRON what I need, or should I use something else?
A cron job (or "at" on Windows) or a MySQL EVENT can be created to periodically check for something and take action. With cron, the resolution is only 1 minute.
If you need a very precise resolution, another technique would be required. For example, if you don't want to show a user something that is more than 1 week old to the second, then simply exclude that from the SELECT. That is, add something like this to the WHERE: AND created_date >= NOW() - INTERVAL 7 DAY.
Doing the above gives you the freedom to schedule the actual DELETE for only, say, once a day -- rather than pounding on the database only to usually find nothing to do.
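A minimal sketch of that split (filter at read time, delete lazily), written in Python. Assumptions: the table name entries, the helper names, and the use of an in-memory SQLite database as a stand-in for MySQL are all mine, chosen only so the example is self-contained. Against MySQL the cutoff would normally be computed in SQL with NOW() - INTERVAL 7 DAY, as described above.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)      # "delete after a week"
FMT = "%Y-%m-%d %H:%M:%S"          # fixed-width text, so string order == time order


def make_demo_db():
    # Stand-in for the real MySQL table (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (id INTEGER PRIMARY KEY, body TEXT, created_date TEXT)"
    )
    return conn


def add_entry(conn, body, created):
    conn.execute(
        "INSERT INTO entries (body, created_date) VALUES (?, ?)",
        (body, created.strftime(FMT)),
    )


def visible_entries(conn, now):
    # Read path: expired rows are hidden to the second, whether or not
    # the cleanup job has run yet.
    cutoff = (now - RETENTION).strftime(FMT)
    rows = conn.execute(
        "SELECT body FROM entries WHERE created_date >= ? ORDER BY id", (cutoff,)
    )
    return [row[0] for row in rows]


def purge_expired(conn, now):
    # Cleanup path: can run once a day, since readers never see expired rows.
    cutoff = (now - RETENTION).strftime(FMT)
    cur = conn.execute("DELETE FROM entries WHERE created_date < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Because the read path and the cleanup path use the same cutoff rule, how often purge_expired runs only affects disk usage, not what users see.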
If you do choose to "pound on the database", be aware of the following problem. If one instance of the deleter script is running for a long time (for any of a number of reasons), it might not be finished before the next copy comes along. In some situations these scripts can stumble over each other to the extent of effectively "crashing" the server.
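One common way to keep overlapping copies from piling up is a lock that the script takes before doing any work. The following is only a sketch under my own assumptions: the lock file path and the function name run_exclusively are hypothetical, and a lock file can be left behind if the process is killed hard, so a real deployment would need stale-lock handling. MySQL's GET_LOCK() function is another option for the same purpose.

```python
import os
import tempfile

# Hypothetical location for the lock file.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "purge_job.lock")


def run_exclusively(job, lock_path=LOCK_PATH):
    """Run job() only if no other copy holds the lock.

    Returns True if the job ran, False if another copy was already running.
    """
    try:
        # O_EXCL makes creation fail if the file already exists, so only
        # one process can get past this line at a time.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        job()
        return True
    finally:
        os.close(fd)
        os.remove(lock_path)
```

With this guard, a copy started by cron while the previous one is still running simply exits without touching the database.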
That leads to another solution -- a single script that runs forever. It has a simple loop:
Do the actions needed (deleting old rows)
Sleep 1 -- or 10 or 60 or whatever -- this is to be a "nice guy" and not "pound on the system".
The only tricky part is making sure that the script starts up again after any server restart or crash of the script.
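The loop described above can be sketched like this in Python. The names run_forever, sleep and max_iterations are my own; the last two exist only so the loop can be tested without actually waiting. Restarting the script after a reboot or crash is not handled here and would be left to something like a process supervisor.

```python
import time


def run_forever(action, interval_seconds=60, sleep=time.sleep, max_iterations=None):
    """Call action(), sleep, repeat.

    max_iterations=None means loop until the process is stopped.
    Returns the number of iterations performed.
    """
    done = 0
    while max_iterations is None or done < max_iterations:
        try:
            action()  # e.g. delete the old rows
        except Exception as exc:
            # One failed pass should not kill the long-running script.
            print(f"action failed, will retry on the next pass: {exc}")
        done += 1
        sleep(interval_seconds)  # be a "nice guy" between passes
    return done
```

Since there is only ever one loop, two passes can never overlap, which avoids the problem with scripts stumbling over each other.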
You can configure a cronjob to periodically delete it.
There are several ways to configure a cron job.
You can write a shell script that deletes the rows from the database and run it periodically from the Linux crontab, or you can use an application that provides scheduled jobs, such as Jenkins or Airflow.
AWS Lambda functions can also be run on a cron-style schedule.
Using the task scheduling (cron) support provided by NestJS seems to be the simplest way to solve the problem.
See this link
https://docs.nestjs.com/techniques/task-scheduling
Related
I have a pipeline taking data from a MySQL server and inserting it into a Datastore using the Dataflow runner.
It works fine as a batch job executing once. The thing is that I want to get the new data from the MySQL server in near real-time into the Datastore but the JdbcIO gives bounded data as source (as it is the result of a query) so my pipeline is executing only once.
Do I have to execute the pipeline and resubmit a Dataflow job every 30 seconds?
Or is there a way to make the pipeline redo it automatically without having to submit another job?
It is similar to the topic Running periodic Dataflow job, but I cannot find the CountingInput class. I thought that maybe it was replaced by the GenerateSequence class, but I don't really understand how to use it.
Any help would be welcome!
This is possible and there's a couple ways you can go about it. It depends on the structure of your database and whether it admits efficiently finding new elements that appeared since the last sync. E.g., do your elements have an insertion timestamp? Can you afford to have another table in MySQL containing the last timestamp that has been saved to Datastore?
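To make the "last timestamp" idea concrete, here is a small sketch of one sync pass, independent of Beam. Everything in it is an assumption for illustration: the tables items and sync_state, the integer inserted_at column, the function names, and the in-memory SQLite database standing in for MySQL. The sink callable stands in for the write to Datastore.

```python
import sqlite3


def make_source():
    # Stand-in for the MySQL source plus the extra bookkeeping table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT, inserted_at INTEGER)"
    )
    conn.execute("CREATE TABLE sync_state (name TEXT PRIMARY KEY, last_ts INTEGER)")
    conn.execute("INSERT INTO sync_state VALUES ('datastore', 0)")
    return conn


def sync_new_rows(conn, sink):
    """Send rows inserted since the last pass to sink; return how many were sent."""
    (last_ts,) = conn.execute(
        "SELECT last_ts FROM sync_state WHERE name = 'datastore'"
    ).fetchone()
    rows = conn.execute(
        "SELECT payload, inserted_at FROM items "
        "WHERE inserted_at > ? ORDER BY inserted_at",
        (last_ts,),
    ).fetchall()
    for payload, ts in rows:
        sink(payload)
        last_ts = ts
    # Remember how far we got so the next pass only sees newer rows.
    conn.execute(
        "UPDATE sync_state SET last_ts = ? WHERE name = 'datastore'", (last_ts,)
    )
    conn.commit()
    return len(rows)
```

One caveat of this simple version: a row that arrives later with a timestamp equal to the stored last_ts would be skipped, so a real implementation would need a tie-breaker such as the primary key.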
You can, indeed, use GenerateSequence.from(0).withRate(1, Duration.standardSeconds(1)) that will give you a PCollection<Long> into which 1 element per second is emitted. You can piggyback on that PCollection with a ParDo (or a more complex chain of transforms) that does the necessary periodic synchronization. You may find JdbcIO.readAll() handy because it can take a PCollection of query parameters and so can be triggered every time a new element in a PCollection appears.
If the amount of data in MySQL is not that large (at most, something like hundreds of thousands of records), you can use the Watch.growthOf() transform to continually poll the entire database (using regular JDBC APIs) and emit new elements.
That said, what Andrew suggested (emitting records additionally to Pubsub) is also a very valid approach.
Do I have to execute the pipeline and resubmit a Dataflow job every 30 seconds?
Yes. For bounded data sources, it is not possible to have the Dataflow job continually read from MySQL. When using the JdbcIO class, a new job must be deployed each time.
Or is there a way to make the pipeline redoing it automatically without having to submit another job?
A better approach would be to have whatever system is inserting records into MySQL also publish a message to a Pub/Sub topic. Since Pub/Sub is an unbounded data source, Dataflow can continually pull messages from it.
Users of my application need to be able to schedule certain tasks to run at certain times (e.g. once only, every minute, every hour, etc.). My plan is to have cron run a script every minute to check whether the application has tasks to execute. If so, the script executes them.
Questions:
Is the running of cron every minute a good idea?
How do I model intervals in the database the way cron does (e.g. every minute, every 5th minute of every hour, etc.)?
I'm using LAMP.
Or, rather than doing any, you know, real work, simply create an interface for the users, and then publish entries in cron! Rather than having cron call you every minute, have it call scripts as directed by the users. When they add or change jobs, rewrite the crontab.
No big deal.
In unix, cron allows each user (unix login that is) to have their own crontab, so you can have one dedicated to your app, don't have to use the root crontab for this.
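As a rough sketch of the "rewrite the crontab" idea, the function below turns a list of user-defined jobs into crontab text. The function name and the job dictionary keys are hypothetical, and the commands are assumed to come from a fixed set the application controls; letting users type arbitrary commands into a crontab would be a serious security hole.

```python
def render_crontab(jobs):
    """Build the full text of the app's dedicated crontab.

    jobs is a list of dicts with a 5-field "schedule" and a "command".
    """
    lines = ["# Generated by the application; manual edits will be overwritten."]
    for job in jobs:
        schedule = job["schedule"].strip()
        if len(schedule.split()) != 5:
            raise ValueError(f"expected 5 cron fields, got: {schedule!r}")
        command = job["command"].strip()
        if "\n" in command:
            # A newline would smuggle an extra entry into the crontab.
            raise ValueError("command must be a single line")
        lines.append(f"{schedule} {command}")
    return "\n".join(lines) + "\n"
```

The resulting text can be written to a file and installed for the app's dedicated unix user with the crontab command, which replaces that user's whole crontab, so the file should always be regenerated from the complete job list.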
Do you mean that you have a series of user-defined jobs that need to be executed at user-defined intervals, and you'd like to have cron facilitate the processing of those jobs? If so, you'd want to have a database table with at least 2 fields:
JOB,
OFTEN
where OFTEN is how often they'd like the job to run, using syntax similar to CRON.
You'd then need to write a script (in Python, Ruby, or some similar language) to parse that data. This script would be what runs every minute via your actual cron.
Take a look at this StackOverflow question, and this StackOverflow question, regarding how to parse crontab data via Python.
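Here is a simplified sketch of what such a script could do with the OFTEN column, in Python. It is not a full cron parser: it only handles numbers, *, ranges, lists and /step, it does not accept names like MON or 7 for Sunday, and it requires the day-of-month and day-of-week fields to both match, which differs from real cron when both are restricted. The function names are my own.

```python
from datetime import datetime


def _field_matches(field, value, minimum):
    """Check one cron field (e.g. "*/5", "1-10", "0,30") against a value."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            low, high = minimum, None
        elif "-" in part:
            low_text, high_text = part.split("-")
            low, high = int(low_text), int(high_text)
        else:
            low = high = int(part)
        if value < low or (high is not None and value > high):
            continue
        if (value - low) % step == 0:
            return True
    return False


def is_due(expression, when):
    """Return True if the 5-field cron-like expression matches this minute."""
    minute, hour, dom, month, dow = expression.split()
    cron_weekday = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    return (
        _field_matches(minute, when.minute, 0)
        and _field_matches(hour, when.hour, 0)
        and _field_matches(dom, when.day, 1)
        and _field_matches(month, when.month, 1)
        and _field_matches(dow, cron_weekday, 0)
    )


def due_jobs(rows, when):
    """rows are (JOB, OFTEN) pairs as read from the database table."""
    return [job for job, often in rows if is_due(often, when)]
```

The script that real cron runs every minute would load the (JOB, OFTEN) rows, call due_jobs with the current time, and execute whatever comes back.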
I have never used CRON or anything like that, rails etc.. before, but I think that I will need to run one. My idea is to create another DB (MySQL) to take stats of another MySQL database everyday. I would also like this to happen for every week and then every month.
Please could you tell me how I could do this?
Is CRON the right thing to use, and am I spelling it right?!
Cron is a task scheduler for *nix systems. There are plenty of resources out there how to use it. Briefly:
You need a script written in a language that can connect to your database (Perl and PHP are good options).
Assuming cron is installed on your system, type crontab -e in a terminal to edit your schedule; the entry format is described on Wikipedia.
Is there another way to prevent nightly cron jobs that do batch processing against mysql from impacting online webserver->mysql queries other than setting query priority? I'm thinking there may be a way to segment these, but I'm not sure if this is possible?
Try and break the queries down, perhaps rather than processing lots of data in one go try and process smaller batches but more often. This way you will lock tables for less time and allow gaps for queries from the frontend to be executed.
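A sketch of the "smaller batches" idea for a cleanup job, in Python. Assumptions: the events table, the integer created_at column and the function name are hypothetical, and SQLite is used as a stand-in so the example runs on its own. The subquery form below is what works in SQLite; in MySQL the same idea is usually written as DELETE ... WHERE ... LIMIT n.

```python
import sqlite3
import time


def delete_in_batches(conn, cutoff, batch_size=1000, pause_seconds=0.0):
    """Delete rows older than cutoff a few at a time; return the total deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE id IN "
            "(SELECT id FROM events WHERE created_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # end the transaction so locks are released between batches
        if cur.rowcount == 0:
            break
        total += cur.rowcount
        if pause_seconds:
            time.sleep(pause_seconds)  # leave a gap for frontend queries
    return total
```

Each batch holds its locks only briefly, and the pause between batches gives queries from the web frontend a chance to run.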
Another solution would be to process more often but even during the day. My last project used an event system, so that a user would comment something and this event would go into a queue. A background process (executed from The Fat Controller) would then take this event and insert data so that all the user's friends news feeds were updated about the comment. That way feeds are updated by simple insert statements and not rebuilt from scratch every x hours.
I'm new to databases and web servers and that kind of thing. So I am looking for information so I can begin to figure out a starting point and options open to me.
I need to have a database that can be accessed by an iPhone app. So logically it will be hosted on a webserver somewhere.
To get/insert the data from/into the database, the app would make an HTTP connection to a PHP file on the same server as the DB, which would then insert/return the relevant data. To stop random hackers messing with the DB, the app would have some validation code inside it to send to the PHP file, so the file can check that it's not a hacker trying to mess with the database. Does this all make sense, or will that not be secure enough?
Now the most confusing part to get my head around is :
I need to check every minute whether any data in the database has become too old, and remove it if so. So something needs to be running on the server, constantly checking/managing the database. What would this be? What is commonly used to do this kind of thing? Is there some keyword for it that I can start searching and reading about to see what options there are?
Thanks for your advice,
-Code
One way to do this is to have a purge script run via crontab. The script can run every minute and check for old data and remove it.
MySQL 5.1.6 and later has a built-in event scheduler, which can be used to schedule periodic jobs inside the MySQL server itself.
http://dev.mysql.com/doc/refman/5.1/en/events.html
Sounds to me like you need a cron job. Cron is the standard task scheduler for Unix-type systems.
You would have some sort of script that connects to the database and performs a cleanup query, and you would schedule that script via cron.
http://en.wikipedia.org/wiki/Cron