Successful build, but incomplete writeAndRead tasks - palantir-foundry

Is it usual for a build to be successful when not all of the tasks for the writeAndRead stage completed?
E.g. I have 38637 of 50000 tasks complete.

Related

Need a kibana watcher script

I need a watcher because one of our applications is returning Not Found errors in our main Mule application. We implemented a new application: payloads that fail with Not Found errors in the main application are sent to the new application, which retries them n times; if they still fail after n retries, they go to the error queue. For the errors that land in the error queue after the retries, I need a watcher notification in Kibana. The watcher should trigger every 24 hours and notify the team by email.
I need a Kibana watcher script for the above scenario.
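A watcher for this scenario might be sketched as below. This is only an illustration, not a drop-in script: the index name mule-error-queue, the status field/value, and the recipient address are all placeholders you would replace with whatever your error-queue application actually logs.

```shell
# Sketch: register a watch that fires every 24h, searches the last day of
# error-queue documents, and emails the team if any were found.
curl -X PUT "localhost:9200/_watcher/watch/error_queue_daily" \
  -H 'Content-Type: application/json' -d'
{
  "trigger":   { "schedule": { "interval": "24h" } },
  "input": {
    "search": {
      "request": {
        "indices": ["mule-error-queue"],
        "body": {
          "query": {
            "bool": {
              "filter": [
                { "term":  { "status": "ERROR" } },
                { "range": { "@timestamp": { "gte": "now-24h" } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": { "compare": { "ctx.payload.hits.total": { "gt": 0 } } },
  "actions": {
    "notify_team": {
      "email": {
        "to": ["team@example.com"],
        "subject": "Payloads in error queue after retries",
        "body": "{{ctx.payload.hits.total}} payloads failed after retries in the last 24h."
      }
    }
  }
}'
```

An email action also requires an email account to be configured in elasticsearch.yml; the exact endpoint and payload shape depend on your Elasticsearch version.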

Make sure a GitHub Actions job can only be run by the owner

I have an Actions job which uploads content to another website. The token is set and stored in secrets.MY_TOKEN.
But others who open a pull request also trigger this job, using the token I set.
How can I restrict execution so that only I can run this job?
FYI, my ci.yml is as follows:
name: foobar
on: [push, pull_request]
jobs:
  upload:
    runs-on: ubuntu-latest
    steps:
      ....
      - name: execute upload
        env:
          TOKEN: ${{ secrets.MYTOKEN }}
        run: |
          upl --token ${TOKEN}
I assume there are two security problems here:
The token is printed in the log file.
Others can use this private token by triggering the action for their own purposes.
Use the github.repository_owner context
https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#github-context
The syntax should be something like:
- if: github.repository_owner == 'owner_name'
There is a new feature which could help, since July 2022:
Differentiating triggering actor from executing actor
Starting next week, workflow re-runs in GitHub Actions will use the initial run’s actor for privilege evaluation.
The actor who triggered the re-run will continue to be displayed in the UI, and can be accessed in a workflow via the triggering_actor field in the GitHub context.
Currently, the privileges (e.g. – secrets, permissions) of a run are derived from the triggering actor.
This poses a challenge in situations where the actor triggering a re-run is different than the original executing actor.
The upcoming change will differentiate the initial executing actor from the triggering actor, enabling the stable execution of re-runs.
For more details see Re-running workflows and jobs.
I don't believe allowing actions to run only for certain users is a native feature.
However, you can simply check the actor in the action context and exit early if the actor is not you (or the owner of the repo, or whatever condition you'd like).
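Applied to the workflow from the question, the actor check might look like this (a sketch: 'your_username' is a placeholder, and the condition could equally use github.repository_owner as suggested above):

```yaml
jobs:
  upload:
    # Skip this whole job unless the triggering actor is you.
    if: github.actor == 'your_username'
    runs-on: ubuntu-latest
    steps:
      - name: execute upload
        env:
          TOKEN: ${{ secrets.MYTOKEN }}
        run: upl --token ${TOKEN}
```

Note that pull requests from forks already receive no secrets; a job-level if additionally stops same-repo collaborators from running the upload step.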

Receive email only when all the tasks are completed

I am launching a lot of jobs on a cluster as an array (similar to what is explained in http://www3.imperial.ac.uk/bioinfsupport/help/cluster_usage/submitting_array_jobs).
If I use -m ea I receive hundreds of emails, one per job.
How can I receive an email only when all the tasks are completed? Is it possible to receive one when all the tasks are completed, but also an email when any of the tasks is aborted?
As far as I know, this does not seem possible; others with more experience may know better.
However, what you can do is:
Submit your job array without the -m option (or with -m a to track aborted tasks).
Then submit a second, single dummy job with -hold_jid_ad <job_id_of_job_array> and the -m e option.
This will send an email when the hold on the single job (step 2) is satisfied, i.e. when all tasks in your job array (step 1) have completed.
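The two steps might be sketched as follows. This assumes an SGE-style qsub; array_task.sh, the task range, and the use of /bin/true as a do-nothing dummy job are placeholders:

```shell
# 1. Submit the array, mailing only on aborted tasks (-m a).
#    With -terse, qsub prints something like "12345.1-50000:1",
#    so strip the task-range suffix to get the bare job id.
JOBID=$(qsub -terse -t 1-50000 -m a array_task.sh | cut -d. -f1)

# 2. Submit a dummy job that holds until the array finishes,
#    then mails on its own completion (-m e).
#    (The answer suggests -hold_jid_ad; for a single dummy job,
#    plain -hold_jid also waits on the whole array.)
qsub -hold_jid "$JOBID" -m e -b y true
```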

SGE hold_jid and catching failed jobs

I have a script that submits a number of jobs to run in parallel on an SGE queue, and another gathering script that is executed when this list of jobs are finished. I am using -hold_jid wc_job_list to hold the execution of the gathering script while the parallel jobs are running.
I just noticed that sometimes some of the parallel jobs fail and the gathering script still runs. The documentation states that:
If any of the referenced jobs exits with exit code 100, the submitted
job will remain ineligible for execution.
How can I catch the parallel failed jobs exit status so that if any of them fail for any reason, the gathering script is not executed or gives an error message?
In the case of Bash, you can check the exit status of your program (available as $?) and, if it is not 0 (the exit status for normal termination), call exit 100 at the end of your job script.
The problem with this is that your job will remain in the queue in state Eqw and has to be deleted manually.
UPDATE: for every job you set to Eqw, your administrators get an email...
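The exit-status check above can be sketched as a small wrapper for the job script (a sketch: run_with_sge_failcode and the command it wraps are placeholders for your real work):

```shell
#!/bin/bash
# Sketch: run the real work and map any failure to exit code 100, which
# (per the qsub man page) keeps a -hold_jid-dependent job ineligible.
run_with_sge_failcode() {
    "$@"                  # run the real command
    local status=$?
    if [ "$status" -ne 0 ]; then
        return 100        # in the actual job script: exit 100
    fi
    return 0
}
```

In the job script itself you would call the real command directly and `exit 100` on failure; the function form here just makes the status mapping explicit.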

Avoid printing job exit codes in SGE with option -sync yes

I have a Perl script which submits a bunch of array jobs to SGE. I want all the jobs to be run in parallel to save me time, and the script to wait for them all to finish, then go on to the next processing step, which integrates information from all SGE output files and produces the final output.
In order to send all the jobs into the background and then wait, I use Parallel::ForkManager and a loop:
$fork_manager = new Parallel::ForkManager($#as);
# $#as: max nb of processes to run simultaneously
for $a (@as) {
    $fork_manager->start and next; # starts the child process
    system "qsub <qsub_options> ./script.plx";
    $fork_manager->finish; # terminates the child process
}
$fork_manager->wait_all_children;
<next processing step, local>
For the "waiting" part to work, however, I have had to add -sync yes to the qsub options. But as a side effect, SGE prints the exit code for each task in each array job, and since there are many jobs and the single tasks are light, this basically renders my shell unusable due to all those interrupting messages while the qsub jobs are running.
How can I get rid of those messages? If anything, I would be interested in checking qsub's exit code for the jobs (so I can check everything went OK before the next step), but not in one exit code per task (I log the tasks' errors via the -e option anyway in case I need them).
The simplest solution would be to redirect the output from qsub somewhere, e.g.
system("qsub <qsub options> ./script.plx >/dev/null 2>&1");
but this masks errors that you might want to see. Alternatively, you can use open() to start the subprocess and read its output, only printing something if the subprocess generates an error.
I do have an alternate solution for you, though. You could submit the jobs to SGE without -sync y, and capture the job id when qsub prints it. Then, turn your summarization and results-collection code into a follow-on job and submit it with a dependency on the completion of the first jobs. You can submit this final job with -sync y so your calling script waits for it to end. See the documentation for -hold_jid in the qsub man page.
Also, rather than making your calling script decide when to submit the next job (up to your maximum), use SGE's -tc option to specify the maximum number of simultaneous tasks (note that -tc isn't in the man page, but it is in qsub's -help output). This depends on you using a new enough version of SGE to have -tc, of course.
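Putting the two suggestions together, the flow might look like this (a sketch: the task range, the -tc limit, and ./collect.pl as the name of the gathering step are placeholders; ./script.plx is from the question):

```shell
# 1. Submit the array without -sync y, letting SGE throttle concurrency
#    to 50 tasks at a time via -tc; -terse prints the job id, which for an
#    array job carries a ".1-1000:1" suffix that we strip.
JOBID=$(qsub -terse -t 1-1000 -tc 50 ./script.plx | cut -d. -f1)

# 2. Submit the gathering step with a dependency on the whole array,
#    and wait only on this one job.
qsub -sync y -hold_jid "$JOBID" ./collect.pl
```

This replaces both the Parallel::ForkManager loop and the per-task -sync output: the calling script blocks on a single qsub, so only one exit code comes back.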