I am new to Apache Airflow, and I plan to run Python and R script files using the BashOperator class. I want to understand how Exceptions should work in two situations:
1. The R or Python script fails for some reason; or
2. The R or Python script completes but I want to require human input before the DAG proceeds to the next task.
I have two very basic questions:
1. How does an Exception get passed from an R or Python script file to the BashOperator to the DAG? For example, should the call to an R script file be inside a try block in the BashOperator?
2. How do I pass a custom exception (warning? error?) so that even if the R or Python script completes successfully, I can pause execution of the DAG?
I'd appreciate any examples of Airflow exception handling that you could point me to.
Forget for a moment that you are using any Airflow operators.
Just assume you are writing a Bash script that runs the R or Python script.
Case 1: Bash Script that fails if Python Script fails:
set -e
python test_file.py
Case 2: Bash Script that passes even if Python Script fails:
python test_file.py || exit 0
Now just pass one of the above bash scripts in BashOperator.
Basically, BashOperator just runs normal Bash commands/scripts and passes along the logs and the exit status of the script. In Case 1 your Airflow task would also fail, and in Case 2 the Airflow task would succeed even if the Python script fails.
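To make that concrete, here is a minimal sketch of wiring both cases into a DAG (the import path matches Airflow 2.x, and the dag_id, task_ids, and script paths are placeholders):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # airflow.operators.bash_operator.BashOperator on 1.x

with DAG(
    dag_id="run_scripts",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,  # called "schedule" on newer Airflow versions
) as dag:
    # Case 1: set -e makes the bash command exit non-zero when the script fails,
    # so the Airflow task fails as well
    run_python = BashOperator(
        task_id="run_python_script",
        bash_command="set -e; python /path/to/test_file.py",
    )

    # Case 2: the || exit 0 swallows any failure, so the task always succeeds
    run_r = BashOperator(
        task_id="run_r_script",
        bash_command="Rscript /path/to/test_file.R || exit 0",
    )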
I have a PID 1 problem. OK, so in order to explain I need to focus on my formulation of the problem.
I have a service which, in order to run, depends on a hostid and on a license file generated to match that hostid. How the hostid is generated is unknown to me.
If the service does not have a valid license, the process shuts down.
So I'm unable to containerize just this simple service.
But what if I have another process running first, like an API to query for the hostid and put the license file in place? Now to the tricky part: how can I switch the process running as PID 1? Because the service needs to run as PID 1.
I was thinking of working around this by making PID 1 a bash script which first starts the API and then, when the API exits, starts the service.
Would this be possible?
And how would you create that bash script?
The C execve(2) function replaces the current process with a new one; the new process keeps properties like the effective user ID and it has the same process ID. The Bourne shell includes an exec built-in that does the same thing.
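A quick way to see that behaviour (purely illustrative):

#!/bin/sh
# The shell prints its own PID, then exec replaces it with sleep,
# which keeps that same PID; the final echo never runs.
echo "shell pid: $$"
exec sleep 60
echo "never reached"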
A common pattern in a Docker image is to use an entrypoint wrapper script to do first-time setup. If a container has both an entrypoint and a command, the command gets passed as arguments to the entrypoint. So you can write a script like:
#!/bin/sh
# Do whatever's needed to get the license
/opt/myapp/bin/get_license
# Then run the command part
# exec replaces this script, so it will have pid 1
# "$#" is the command-line arguments
exec "$#"
In the Dockerfile, set the ENTRYPOINT to this wrapper, and the CMD to run the real service.
# Run the script above
# ENTRYPOINT must have JSON-array syntax in this usage
ENTRYPOINT ["/opt/myapp/bin/start_with_license"]
# Say the normal thing you want the container to do
# CMD can have either JSON-array or shell syntax
CMD ["/opt/myapp/bin/server", "--foreground"]
I am using Slurm to allocate GPUs to train my model. I configured the Python environment on node A, which is where my code and data are stored. The common practice is like this:
srun -p gpu --ntasks-per-node=1 --gres=gpu:2 python train.py
This lets Slurm find a node for me and run my code on that node. I found my code runs about 3 times slower than it does on a local machine with the same number of GPUs. I guess the reason is that the data used by the code is stored on node A, while Slurm assigned some other node B to run it, so the data has to be continuously transferred from node A to node B, which slows down the process.
My question is: is there a way to copy my data to node B so that the code can use it as if it were local?
You can replace the python train.py part of your command with a Bash script that first transfers the data and then runs python train.py.
Even better would be to create a proper submission script and submit it with sbatch rather than using srun on its own:
#!/bin/bash
#SBATCH -p gpu
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:2
cp /global/directory/data /local/directory/
python train.py
You would need to replace the line cp /global/directory/data /local/directory/ with a proper command to copy the files. It could be scp rather than cp.
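If node A's storage is not on a filesystem shared with the compute nodes, the copy step could look something like this (the hostname and paths are placeholders):

# Pull the data from node A over SSH into node-local storage before training
scp -r nodeA:/global/directory/data /local/directory/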
I have an SSIS package that executes WinSCP.exe via an Execute Process Task.
All works well.
However, I am working on a logging routine for the files that were successfully/unsuccessfully downloaded, and I have a header and a detail logging table to capture the data.
Is there a way in the Execute Process Task to have the exit code returned into a variable?
Thanks
I renamed an action_hook from a non-cartridge-specific action hook (such as post_restart) to be cartridge-specific (such as post_restart_cron) and then encountered strange new errors such as:
/var/lib/openshift/${USER}/app-root/runtime/repo/.openshift/action_hooks/post_restart_cron: line 5: `firstcron-secondcron': not a valid identifier
The script file post_restart_cron is:
#!/bin/bash
function firstcron-secondcron {
echo in function
}
The issue is that non-cartridge-specific action hooks apparently run bash in non-POSIX mode which allows hyphens in function names, but cartridge-specific action hooks run bash in POSIX mode which does not allow hyphens in function names.
Why do cartridge-specific action hooks run bash in POSIX mode? I'm not 100% sure, but I think the following occurs:
v2_cart_model.rb:cartridge_hooks remembers the hook as source <hook filepath>.
v2_cart_model.rb:do_control_with_directory creates a command string set -e; <path to control script> <action> <other args>; source <hook filepath>.
It probably passes that string to sh -c, which runs in POSIX mode, and because it uses source, the script is read directly instead of being run in a new process (which would read the #!/bin/bash line and run /bin/bash, which is non-POSIX mode by default).
There must be something different about the non-cartridge-specific codepath in v2_cart_model.rb that avoids the steps above.
My solution was to use unset POSIXLY_CORRECT in my script which disabled POSIX mode.
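Applied to the hook above, the workaround looks like this (the unset has to come before the function definition, since defining the function is what triggers the error):

#!/bin/bash
# The hook is source'd by a shell in POSIX mode; unsetting POSIXLY_CORRECT
# turns POSIX mode back off so the hyphenated function name is accepted
unset POSIXLY_CORRECT

function firstcron-secondcron {
  echo in function
}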
I debugged this issue by running the set command in my script, which showed a variety of bash variables; investigating those by process of elimination led me to the cause.
I need an explanation of how to pass arguments to WinSCP in an SSIS Execute Process Task.
I want to download the latest file using SFTP on a daily basis.
I am able to connect to the remote server using WinSCP in SSIS.
FTP Task steps:
http://winscp.net/eng/docs/guide_ssis
I followed the steps as in:
http://winscp.net/eng/docs/scripting
My problem is:
I want to pass a parameter to my WinSCP script.
My script has the command get "%1%".
Complete Script:
option batch abort
option confirm off
open sftp://username:password@ftp.dummy.com
option transfer binary
cd /root
get "%1%" C:\Data\
close
exit
On my Execute Process Task editor, I am passing the argument *20120817*.xml as below, but it is not working:
/script=scriptB.txt *20120817*.xml
You are missing the /parameter switch:
/script=scriptB.txt /parameter *20120817*.xml
Refer to:
https://winscp.net/eng/docs/commandline#scripting
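With the /parameter switch in place, WinSCP substitutes the first value for %1% in the script, so the get line effectively runs as:
get "*20120817*.xml" C:\Data\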