Is there any way to link two Cron Jobs in OpenShift?

Hi everyone!
Is there any way to link two Cron Jobs in OpenShift? I have two Cron Jobs. The first one makes a .tar archive from some data, and the second one should operate on this archive. Can I add some condition to the second Cron Job so that it runs only once the first one has finished? The first Cron Job can run for anywhere from several seconds to several hours, so it is not practical to guess a time interval that guarantees it has completed.
I will be thankful for any ideas.

Share the archive file via a persistent volume, and in the second job use a readiness probe to check whether the tar file exists, then decide whether to go ahead or abort until the next check; move the archive to another location after processing finishes.
Alternatively, use the Kubernetes API from within the first job to schedule the second job once the archive is ready.
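A minimal sketch of that second approach, using the official Kubernetes Python client (the job name, namespace, image, and command below are hypothetical, and the pod's service account is assumed to be allowed to create Jobs):

    # Run at the end of the first CronJob's script to launch the follow-up job.
    from kubernetes import client, config

    config.load_incluster_config()  # we are running inside the cluster
    batch = client.BatchV1Api()

    job = client.V1Job(
        metadata=client.V1ObjectMeta(name="archive-consumer"),  # hypothetical name
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(
                        name="consumer",
                        image="registry.example.com/consumer:latest",  # hypothetical
                        command=["process-archive", "/data/archive.tar"],
                    )],
                )
            )
        ),
    )
    batch.create_namespaced_job(namespace="my-project", body=job)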

Related

What exactly triggers a pipeline job to run in Code Repositories?

I want to understand when the pipeline job runs so I can more effectively understand the pipeline build process. Does it pick up code changes from the master branch of the Code Repository?
Building a job on the pipeline builds the artifact that was delivered to the instance, not what has just been merged onto master.
It should be the same, but there is a checking process after the merge onto master and before the delivery of the artifact, like you would have with a regular Git/Jenkins/Artifactory setup.
So there is a delay.
And moreover if these checks don't pass, your change, even though merged onto master, will never appear on the pipeline.
To add a bit more precision to what @Kevin Zhang wrote:
There's also the possibility to trigger a job using an API call, even though it's not the most common approach.
Also you can combine the different events to say things like:
- Before work hours: build only if the schedule of the morning update has succeeded.
- During work hours: build every hour if an input has new data, and if a schedule has run successfully or another dataset has been updated.
- After hours: build whenever an input has new data.
It can also help you create loops. For instance, if you have a huge amount of data coming in on input B and it impacts your sync toward the ontology, or a time series, etc., you could create a job that takes a limited number of rows from input B and logs their IDs in a table so they are not taken again; you process those rows, and when output C is updated you rerun your job, and when there are no rows left you update output D.
You can also add a schedule on the job that produces input B from input A stating to rerun it only when output D is updated.
This would enable you to process a number of files from a source, process the data from those files chunk by chunk and then take another batch of files and iterate.
By naming your schedules functionally you can have a more controlled build of your pipeline and a finer grain of data governance, and you can also add audit or log tables based on these schedules, which will make debugging and auditing much easier.
You would have a trace of when and where a specific source update has reached.
Of course, you need such precision only if your pipeline is complex: many different sources, updated at different times and updating multiple parts of your pipeline.
For instance, if you're unifying client data that was previously separated into many silos, or if it is a multinational group of many different local or global entities, like big car manufacturers.
It depends on what type of trigger you’ve set up.
If your schedule is a single cron schedule (i.e. by scheduled time), the build will not look at the master branch of the repo; it'll just build according to the cron schedule.
If your schedule contains an event trigger (e.g. one of the 4 event types: Job Spec Put, Transaction Committed, Job Succeeded, and Schedule Ran Successfully), then it'll trigger based on the event, where only the Job Spec Put event type triggers based on master branch code changes.

How can I delay deletion?

I would like to delay deletion of data from the database. I am using MySQL and NestJS. I heard that cron is what I need: I want to delete an entry one week after it is created. Can you help me with this? Is cron what I need, or should I use something else?
A cron job (or at on Windows) or a MySQL EVENT can be created to periodically check for something and take action. The resolution is only 1 minute.
If you need a very precise resolution, another technique would be required. For example, if you don't want to show a user something that is more than 1 week old to the second, then simply exclude that from the SELECT. That is, add something like this to the WHERE clause: AND created_date >= NOW() - INTERVAL 7 DAY.
Doing the above gives you the freedom to schedule the actual DELETE for only, say, once a day -- rather than pounding on the database only to usually find nothing to do.
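A sketch of both halves of that idea in Python (the table and column names are hypothetical, and it assumes the mysql-connector-python package):

    # Hide week-old rows at read time; actually purge them once a day from cron.
    import mysql.connector

    conn = mysql.connector.connect(user="app", password="secret", database="appdb")
    cur = conn.cursor()

    # Read path: exclude anything older than 7 days, to the second.
    cur.execute(
        "SELECT id, body FROM entries "
        "WHERE created_date >= NOW() - INTERVAL 7 DAY"
    )
    rows = cur.fetchall()

    # Cleanup path, scheduled once daily: delete the old rows for real.
    cur.execute("DELETE FROM entries WHERE created_date < NOW() - INTERVAL 7 DAY")
    conn.commit()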
If you do choose to "pound on the database", be aware of the following problem. If one instance of the deleter script is running for a long time (for any of a number of reasons), it might not be finished before the next copy comes along. In some situations these scripts can stumble over each other to the extent of effectively "crashing" the server.
That leads to another solution -- a single script that runs forever. It has a simple loop:
Do the actions needed (deleting old rows)
Sleep 1 -- or 10 or 60 or whatever -- this is to be a "nice guy" and not "pound on the system".
The only tricky part is making sure that the script starts up again after any server restart or crash of the script.
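That long-running variant might look like this in Python (a sketch with hypothetical connection details; something like systemd or supervisord would handle restarting it after a crash or reboot):

    # Single long-lived deleter: only one copy runs, so passes can never overlap.
    import time
    import mysql.connector

    conn = mysql.connector.connect(user="app", password="secret", database="appdb")

    while True:
        cur = conn.cursor()
        cur.execute("DELETE FROM entries WHERE created_date < NOW() - INTERVAL 7 DAY")
        conn.commit()
        cur.close()
        time.sleep(60)  # be a "nice guy" and pause between passes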
You can configure a cronjob to periodically delete it.
There are several ways to configure a cron job.
You can write a shell script that periodically deletes entries in the DB using the Linux crontab, or you can configure an application that provides cron jobs, such as Jenkins or Airflow.
AWS Lambda also provides cron-style scheduling.
Using the crontab support provided by NestJS seems to be the simplest way to solve the problem.
See this link
https://docs.nestjs.com/techniques/task-scheduling

SSIS ETL solution needs to import 600,000 small simple files every hour. What would be the optimal Agent scheduling interval?

The hardware, infrastructure, and redundancy are not in the scope of this question.
I am building an SSIS ETL solution that needs to import ~600,000 small, simple files per hour. With my current design, SQL Agent runs the SSIS package, and it takes "n" files and processes them.
Number of files per batch “n” is configurable
The SQL Agent schedule that runs the SSIS package is configurable
I wonder if the above approach is the right choice? Or, alternatively, should I have an infinite loop in the SSIS package that keeps taking and processing files?
So the question boils down to a choice between an infinite loop vs. batch + schedule. Is there any other, better option?
Thank you
In a similar situation, I run an agent job every minute and process all files present. If the job takes 5 minutes to run because there are a lot of files, the agent skips the scheduled runs until the first one finishes, so there is no worry that two processes will conflict with each other.
Is SSIS the right tool?
Maybe. Let's start with the numbers
600000 files / 60 minutes = 10,000 files per minute
600000 files / (60 minutes * 60 seconds) = 167 files per second.
Regardless of what technology you use, you're looking at some extremes here. Windows NTFS starts to choke around 10k files in a folder, so you'll need to employ some folder strategy to keep that count down, in addition to regular maintenance.
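One common folder strategy is to shard incoming files into subfolders keyed by a hash of the file name, keeping each folder far below that threshold; a sketch in Python with hypothetical paths:

    # Shard incoming files into 256 subfolders keyed by a hash of the name.
    import hashlib
    import shutil
    from pathlib import Path

    INBOX = Path(r"D:\etl\inbox")    # hypothetical landing folder
    SHARDS = Path(r"D:\etl\shards")  # hypothetical working area

    for f in INBOX.iterdir():
        if f.is_file():
            shard = hashlib.md5(f.name.encode()).hexdigest()[:2]  # "00".."ff"
            dest = SHARDS / shard
            dest.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(dest / f.name))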
In 2008, the SSIS team managed to load 1TB in 30 minutes which was all sourced from disk so SSIS can perform very well. It can also perform really poorly which is how I've managed to gain ~36k SO Unicorn points.
Six years is a lifetime in the world of computing, so you may not need to take such drastic measures as the SSIS team did to set their benchmark, but you will need to look at their approach. I know you've stated that the hardware is outside the scope of discussion, but it is very much relevant: if the file system (SAN, NAS, local disk, flash, or whatever) can't serve 600k files, then you'll never be able to clear your work queue.
Your goal is to get as many workers as possible engaged in processing these files. The Work Pile Pattern can be pretty effective to this end. Basically, a process asks: Is there work to be done? If so, I'll take a bit and go work on it. And then you scale up the number of workers asking and doing work. The challenge here is to ensure you have some mechanism to prevent workers from processing the same file. Maybe that's as simple as filtering by directory or file name or some other mechanism that is right for your situation.
I think you're headed down this approach based on your problem definition with the agent jobs that handle N files, but I wanted to give your pattern a name for further research.
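A language-neutral sketch of the pattern (shown here in Python, with hypothetical paths and a stubbed-out import routine): each worker claims a file by renaming it into its own folder, and since a rename on the same volume is atomic, no two workers can process the same file.

    # Work Pile worker: claim a file by renaming it, then process it.
    import os
    from pathlib import Path

    PILE = Path(r"D:\etl\shards")                         # shared pile of files
    MINE = Path(r"D:\etl\worker-{}".format(os.getpid()))  # this worker's folder
    MINE.mkdir(parents=True, exist_ok=True)

    def process(path):
        print("processing", path)  # stand-in for the real per-file import

    def claim_one():
        for f in PILE.rglob("*"):
            if f.is_file():
                target = MINE / f.name
                try:
                    f.rename(target)  # atomic on one volume: only one worker wins
                except OSError:
                    continue          # another worker claimed it first
                return target
        return None

    work = claim_one()
    while work is not None:
        process(work)
        work = claim_one()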
I would agree with Joe C's answer - schedule the SQL Agent job to run as frequently as needed. If it's already running, it won't spawn a second process. Perhaps you're going to have multiple agents that all start every minute - AgentFolderA, AgentFolderB... AgentFolderZZH and they are each launching a master package that then has subprocesses looking for work.
Use a WMI event watcher to detect when a new file has arrived; as the next step you can call the job scheduler, or execute the SSIS package directly.
More details on WMI events:
https://msdn.microsoft.com/en-us/library/ms141130%28v=sql.105%29.aspx

LAMP: How to Implement Scheduling?

Users of my application need to be able to schedule certain tasks to run at certain times (e.g. once only, every minute, every hour, etc.). My plan is to have a cron run a script every minute to check the application to see if it has tasks to execute. If so, then execute the tasks.
Questions:
Is the running of cron every minute a good idea?
How do I model in the database intervals like cron does (e.g. every minute, every 5th minute of every hour, etc.)?
I'm using LAMP.
Or, rather than doing any, you know, real work, simply create an interface for the users, and then publish entries in cron! Rather than having cron call you every minute, have it call scripts as directed by the users. When they add or change jobs, rewrite the crontab.
No big deal.
In Unix, cron allows each user (each Unix login, that is) to have their own crontab, so you can have one dedicated to your app; you don't have to use the root crontab for this.
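A sketch of the rewrite-the-crontab idea, using the third-party python-crontab package (the package choice and the command are assumptions, not part of the original answer):

    # Rebuild the app user's crontab from the user-defined jobs whenever they change.
    from crontab import CronTab  # pip install python-crontab

    def publish_jobs(jobs):
        """jobs: list of (cron_expression, command) pairs from the app's database."""
        cron = CronTab(user=True)       # the dedicated app user's crontab
        cron.remove_all(comment="app")  # drop everything we previously published
        for expr, command in jobs:
            entry = cron.new(command=command, comment="app")
            entry.setall(expr)
        cron.write()

    publish_jobs([("*/5 * * * *", "/usr/local/bin/run-task 42")])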
Do you mean that you have a series of user-defined jobs that need to be executed at user-defined intervals, and you'd like to have cron facilitate the processing of those jobs? If so, you'd want to have a database table with at least two fields: JOB and OFTEN, where OFTEN is how often they'd like the job to run, using syntax similar to cron's.
You'd then need to write a script (in Python, Ruby, or some similar language) to parse that data. This script would be what runs every minute via your actual cron.
Take a look at this StackOverflow question, and this StackOverflow question, regarding how to parse crontab data via Python.
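For the parsing itself, the third-party croniter package (an assumption; the linked questions discuss alternatives) can tell the every-minute poller whether a job's OFTEN expression is due:

    # Minute-by-minute poller: run every job whose OFTEN expression is due now.
    from datetime import datetime, timedelta
    from croniter import croniter  # pip install croniter

    def is_due(often, now):
        # True if the cron expression last fired within the current minute.
        prev = croniter(often, now).get_prev(datetime)
        return now - prev < timedelta(minutes=1)

    now = datetime.now()
    for job, often in [("report.sh", "*/5 * * * *"), ("backup.sh", "0 2 * * *")]:
        if is_due(often, now):
            print("run", job)  # stand-in for a real subprocess call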

Build job if some specific last job build was successful

I have a job A that is run every hour. Also, job B is run after each commit to GitHub (an integration tests job). How can I check, before running job A, whether the last build of job B was successful, and skip the build of A if the last build of B was unstable?
Thanks.
As far as I know, this is not possible with Hudson out of the box. Without any specifics about your job dependencies, it is also not easy to design the right workaround.
Different Options:
If your job A runs fast, let it run anyway.
Since job A runs every hour, could you get away with running job B every hour as well? In that case, when job B is successful it will trigger job A.
Have an external shell script that triggers job A every hour. Before triggering, check the status of the last build of job B (http:///job//api/xml?xpath=/mavenModuleSetBuild/result/text%28%29). For info on how to trigger a build, have a look at the "Trigger builds remotely" option in your job configuration. A sketch of such a script follows after this list.
This list is probably not exhaustive.
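A sketch of that external script in Python (the Jenkins/Hudson host, job names, and token are hypothetical; /lastBuild/api/json and /build are standard Jenkins REST endpoints, and job A must have "Trigger builds remotely" enabled):

    # Run from cron every hour: trigger job A only if job B's last build succeeded.
    import requests

    JENKINS = "http://jenkins.example.com"  # hypothetical host

    status = requests.get(JENKINS + "/job/B/lastBuild/api/json").json()
    if status.get("result") == "SUCCESS":
        requests.post(JENKINS + "/job/A/build", params={"token": "secret-token"})
    else:
        print("Skipping job A: last build of B was", status.get("result"))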