Sun Grid Engine: How to check the machine that a job was run on?

Is there a flag to make SGE output the machine that it finally dispatched a job to run on?
I looked through the man pages but couldn't pinpoint anything.

There are several possibilities:
1. While the job is running, you can use qstat -g t to get the nodes your job(s) are running on.
2. After the job has finished, qacct -j [jobid] shows information for each node the job ran on.
3. On Linux you can execute the command hostname (or mpirun hostname) from the job script to print the respective nodes.
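For options 2 and 3, a minimal Python sketch, assuming the job script can run Python: one function prints the execution host at job start, the other collects the hostname lines from qacct -j output. The qacct layout assumed here (one "key value" pair per line, one hostname line per node) is an assumption; check it against the output of your SGE version.

```python
import os
import socket


def log_execution_host() -> str:
    """Print and return the name of the host this job is running on.

    Call it at the top of the job script. SGE sets JOB_ID in the job's
    environment; outside SGE the fallback "n/a" is printed instead.
    """
    host = socket.gethostname()
    job_id = os.environ.get("JOB_ID", "n/a")
    print(f"job {job_id} running on {host}", flush=True)
    return host


def hosts_from_qacct(text: str) -> list[str]:
    """Collect the values of all 'hostname' lines in `qacct -j <jobid>` output.

    Assumes the usual 'key  value' layout, one pair per line, with one
    record (and therefore one hostname line) per node the job ran on.
    """
    hosts = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "hostname":
            hosts.append(parts[1].strip())
    return hosts
```

To use the parser, feed it the captured output, e.g. hosts_from_qacct(subprocess.run(["qacct", "-j", "4711"], capture_output=True, text=True).stdout), where 4711 is a placeholder job id.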

Related

gitlab ci shell runner reports profile loading problem

I'm trying to set up the GitLab CI shell runner. I've used the Docker runner successfully before, but now I'd like to use another Docker container within my testing routine and have therefore switched to the shell runner.
After registering I'm running into an exception:
ERROR: Job failed (system failure): prepare environment: exit status 1. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
So, I went through the linked material, but that didn't cure the problem. I verified that the gitlab-runner user exists and has access to Docker (needed to run the Docker test container), and that it is a member of the docker group. I can also log in as that user (--login) and start /bin/bash without problems.
Still, all I get from the runner is the enigmatic message above. What other checks do I need to do to track down this issue?
The careful reader will find the answer on the linked page:
"A common failure is when you have a .bash_logout that tries to clear the console."
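Following that hint, a minimal Python sketch for disabling a .bash_logout that calls clear_console (the GitLab page suggests commenting that section out). It comments out every active line rather than only the clear_console call, because on Debian/Ubuntu that call sits inside an if ... fi block, and an if with an empty body is a bash syntax error. The function names are hypothetical, and so is the path in the usage note, which assumes the default gitlab-runner home directory.

```python
from pathlib import Path


def disable_clear_console(text: str) -> str:
    """Comment out every active line of a .bash_logout that calls clear_console.

    Commenting out only the clear_console line would leave an empty
    `if ...; then fi` block, which bash rejects, so the whole file is
    disabled instead. Files that never mention clear_console are
    returned unchanged.
    """
    if "clear_console" not in text:
        return text
    out = []
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped and not stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            line = f"{indent}# {stripped}"  # keep indentation, prefix "# "
        out.append(line)
    return "".join(out)


def patch_bash_logout(path: Path) -> bool:
    """Rewrite `path` in place, keeping a .bak copy. Returns True if it changed."""
    original = path.read_text()
    patched = disable_clear_console(original)
    if patched == original:
        return False
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(patched)
    return True
```

Usage, run as root or as the gitlab-runner user: patch_bash_logout(Path("/home/gitlab-runner/.bash_logout")), then retry the job.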

Limit a unit test to a single core

I have an issue where my unit test passes on my dev machine (multi-core), but the same code fails in pre-production (single core).
Is it possible to limit the number of cores available to a unit test so that my dev machine mimics that environment? Unfortunately I'm not able to run the unit tests on the pre-prod machine.
There are several ways to do that.
Use taskset command
The taskset command binds all threads of a process to a subset of cores. Using it is easy: taskset -c 0 your_command
This binds every thread to the first CPU (core 0).
So to do this you need to be able to run your unit tests from the command line. If you use a build tool, just put the build command after taskset, with each word as a separate argument rather than one quoted string. For example:
taskset -c 0 mvn clean compile test
If you run your tests from an IDE, you can copy the full command the IDE prints when it runs the test. In that case it will look something like:
taskset -c 0 "C:\Program Files\Java\jdk1.8.0_73\bin\java" -cp classpath com.intellij.rt.execution.junit.JUnitStarter name_of_test
See the taskset(1) man page for more details.
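A small Python sketch of the same idea, for driving the pinned test run from a script on Linux. The helper names are made up for illustration; the only real interfaces used are the taskset command line shown above and os.sched_setaffinity.

```python
import os
import shutil
import subprocess


def taskset_argv(command: list[str], cores: str = "0") -> list[str]:
    """Build 'taskset -c <cores> <command...>' as an argument list.

    Every word of the command is its own argument. Passing the whole command
    as one quoted string would make taskset look for a program literally
    named "mvn clean compile test".
    """
    return ["taskset", "-c", cores, *command]


def run_pinned(command: list[str], cores: str = "0") -> int:
    """Run `command` restricted to `cores` and return its exit code (Linux only)."""
    if shutil.which("taskset") is None:
        raise RuntimeError("taskset not found (it ships with util-linux on Linux)")
    return subprocess.run(taskset_argv(command, cores)).returncode


def pin_calling_thread(cpus: set[int]) -> None:
    """Alternative without taskset: restrict the calling thread to `cpus`.

    On Linux, threads and child processes created afterwards inherit the
    mask, so call this before starting the test run.
    """
    os.sched_setaffinity(0, cpus)
```

For example, run_pinned(["mvn", "clean", "compile", "test"]) returns Maven's exit code, so a CI script can fail on it directly.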
Use affinity locks
An affinity lock can be used programmatically to bind some code to a particular core. In that case, though, I'm not sure whether it also binds threads created later during execution. I think taskset is easier to use and does all the work.
Check OpenHFT/Java-Thread-Affinity, as it's the most popular affinity lock library for Java.

Elastic Beanstalk stops at EbExtensionPostBuild

I am having a problem deploying an EB instance with a custom .ebextensions file. This is the relevant part in that file:
container_commands:
  01_migrate:
    command: 'python db_migrate.py'
  02_npm_build:
    command: 'npm install && npm run prod'
As you can see, these commands are for migrating my PostgreSQL database (via a Flask backend) and building my React .jsx files.
If I leave these commands out, the deployment completes perfectly well. However, once I put them in, eb-activity.log shows it stalling at this step forever (as far as I can tell):
[2017-04-10T02:39:24.106Z] INFO [3023] - [Application deployment app-613e-170409_223418#1/StartupStage0/EbExtensionPostBuild] : Starting activity...
I also get this message on the Health overview in the console (this is after 1 day):
Performing application deployment (running for 1 day).
I have also tried deploying without those container_commands and then adding them back after the initial deployment succeeded. Then I get the same error as before in eb-activity.log, and this message on the Health overview:
Incorrect application version "app-2a3d-170409_214923" (deployment 1). Expected version "app-2a3d-170409_214923" (deployment 1).
Which is very strange because those two versions referenced are the same versions. I don't know what this means!
I found a solution.
Remove all your container_commands from .ebextensions/.
SSH into the instance and kill the hung process with:
sudo killall python
Then deploy a new version without container_commands.
Then start debugging your container_commands one by one over SSH.
Have fun.
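To make that one-by-one debugging a bit more systematic on the instance, here is a hypothetical Python helper (not part of any EB tooling). It runs the commands in alphabetical order by name, which is the order EB uses for container_commands, with stdin connected to /dev/null and a timeout, and stops at the first one that fails or hangs. A command that sits waiting for keyboard input (an interactive prompt in a migration script, for instance) then shows up as a failure or timeout instead of a deployment that never finishes.

```python
import subprocess


def run_container_commands(commands: dict[str, str], timeout: float = 600.0) -> dict[str, str]:
    """Run each command in name order and record how it ended.

    `commands` maps names such as "01_migrate" to the shell strings copied from
    .ebextensions. Stops at the first command that exits non-zero or exceeds
    `timeout` seconds, since later commands would not run in a deployment either.
    """
    results: dict[str, str] = {}
    for name in sorted(commands):  # EB processes container_commands alphabetically
        try:
            proc = subprocess.run(commands[name], shell=True,
                                  stdin=subprocess.DEVNULL, timeout=timeout)
        except subprocess.TimeoutExpired:
            # Only the shell is killed here; a child it started may keep running.
            results[name] = f"timed out after {timeout}s"
            break
        results[name] = f"exit {proc.returncode}"
        if proc.returncode != 0:
            break
    return results
```

For the file in the question, call it from the directory holding your application source with {"01_migrate": "python db_migrate.py", "02_npm_build": "npm install && npm run prod"}.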

Jenkins doesn't launch the application under test in the Chrome browser

I ran into an issue with Jenkins that I've never seen before, so I thought I'd ask for advice. Jenkins won't launch the AUT (application under test) in the Chrome browser for running Selenium tests.
Steps that I followed:
A Jenkins master and slave are set up on the same machine, not as a Windows service; I launch them manually from the command prompt
I set up a project on the slave node with 2 build steps: one for MSBuild (I downloaded the plugin) to build the solution, and a second that executes the Windows batch command that starts the tests
I also have a TFS plugin to fetch the server version of the solution to build on Jenkins
So when I build the job on Jenkins Slave,
The solution gets built successfully without any errors
Then for the next build step, Jenkins executes the windows batch command and loads the .dll file. Says “starting execution..”
Chromedriver launches. It opens up the chrome browser
But the Chrome browser won't load the AUT. It just keeps trying to load it and hangs indefinitely until my Jenkins job times out
While this is happening, CPU utilization is at 100%: the browser running the Jenkins UI on localhost and Java.exe *32 consume all of it
I ran the exact same MSTest.exe command (that I entered in the build step) in command prompt when Jenkins is not running and it launches the AUT successfully and tests ran
I ran the exact same MSTest.exe command (that I entered in the build step) in command prompt when Jenkins is running. It again spikes the CPU to 100% and AUT never launches
Any thoughts?
I was also running into this issue and solved it as follows.
Basically, the Jenkins slave has to be started from the Windows Startup folder through a batch file.
Here is the step by step process.
Node URL: http://host:port/computer/nodeName/
1. Go to the "Node URL".
2. Click "Mark this node temporarily offline".
3. Go to the machine where the slave is running.
4. Open a command prompt in admin mode.
5. cd to the location where Jenkins is installed.
6. Execute jenkins-slave uninstall
7. Go to Services (type services in Run) and stop the running Jenkins slave.
8. Restart the machine.
9. cd C:\Users\myUserName\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup
10. Create a new batch file (named, say, LaunchJenkinsSlave.bat) with the following content:
    java -jar C:/Jenkins/slave.jar -jnlpUrl http://host:port/computer/nodeName/slave-agent.jnlp -secret yourSecret
    netsh advfirewall firewall set rule group="remote desktop" new enable=Yes
    FYI: you can look in jenkins-slave.xml in your Jenkins install location for yourSecret, nodeName, host, etc. if you have forgotten them.
11. Restart your machine.
    Observation: the Jenkins slave will be started automatically.
12. Go to the "Node URL" and bring the node back online.
Hope this helps.
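If you have to redo this on several machines, a hypothetical Python helper can pull the URL and secret out of jenkins-slave.xml and produce the batch file content. It assumes the -jnlpUrl and -secret values appear as plain arguments in that XML; the jar path C:/Jenkins/slave.jar is copied from the answer and may differ on your machine. The netsh line is left out because it only enables the Remote Desktop firewall rules and is not needed to start the agent.

```python
import re
from pathlib import PureWindowsPath


def agent_args_from_service_xml(xml_text: str) -> tuple[str, str]:
    """Return (jnlp_url, secret) found in the text of jenkins-slave.xml.

    Assumes both values are plain '-jnlpUrl <url>' and '-secret <token>'
    arguments in the file; raises ValueError if either is missing.
    """
    url = re.search(r'-jnlpUrl\s+([^\s<"]+)', xml_text)
    secret = re.search(r"-secret\s+([0-9A-Za-z]+)", xml_text)
    if url is None or secret is None:
        raise ValueError("-jnlpUrl or -secret not found")
    return url.group(1), secret.group(1)


def launcher_bat(jnlp_url: str, secret: str, jar: str = "C:/Jenkins/slave.jar") -> str:
    """Content of LaunchJenkinsSlave.bat (text-mode writes on Windows add CRLF)."""
    return f"java -jar {jar} -jnlpUrl {jnlp_url} -secret {secret}\n"


def startup_folder(appdata: str) -> PureWindowsPath:
    """Per-user Startup folder below %APPDATA% (pass os.environ["APPDATA"])."""
    return PureWindowsPath(appdata, "Microsoft", "Windows", "Start Menu",
                           "Programs", "Startup")
```

On the slave machine you would then write launcher_bat(url, secret) to Path(startup_folder(os.environ["APPDATA"])) / "LaunchJenkinsSlave.bat" with Path.write_text.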

Troubleshooting failed packer build

I am just getting started with Packer, and have had several instances where my build is failing and I'd LOVE to log in to the box to investigate the cause. However, there doesn't seem to be a packer login or similar command to give me a shell. Instead, the run just terminates and tears down the box before I have a chance to investigate.
I know I can use the --debug flag to pause execution at each stage, but I'm curious whether there is a way to just pause after a failed run (before cleanup) and then run the cleanup after my debugging is complete.
Thanks.
This was my top annoyance with packer. Thankfully, packer build now has an option -on-error that gives you options.
packer build -on-error=ask ... to the rescue.
From the packer build docs:
-on-error=cleanup (default), -on-error=abort, -on-error=ask - Selects what to do when the build fails. cleanup cleans up after the previous steps, deleting temporary files and virtual machines. abort exits without any cleanup, which might require the next build to use -force. ask presents a prompt and waits for you to decide to clean up, abort, or retry the failed step.
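If you drive Packer from a wrapper script, one way to apply this is sketched below; the function name and the rule of choosing by whether stdin is a terminal are assumptions, not Packer behavior. It uses ask only when someone is at the keyboard and abort otherwise, so an unattended CI run should leave the box behind instead of waiting at a prompt.

```python
import sys
from typing import Optional


def packer_build_argv(template: str, interactive: Optional[bool] = None) -> list[str]:
    """Build a 'packer build' command line with an -on-error mode.

    ask:   prompt to clean up, abort, or retry the failed step (needs a terminal).
    abort: exit without cleanup, leaving temporary files and the VM for
           inspection; the next build may then need -force.
    """
    if interactive is None:
        interactive = sys.stdin.isatty()
    mode = "ask" if interactive else "abort"
    return ["packer", "build", f"-on-error={mode}", template]
```

Pass the result to subprocess.run, e.g. subprocess.run(packer_build_argv("template.json")).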
Having used Packer extensively, I find the --debug flag most helpful. Once the process is paused, you SSH to the box with the key (in the current dir) and figure out what is going on.
Yeah, the way I handle this is to put a long sleep in a script inline provisioner after the failing step, then I can ssh onto the box and see what's up. Certainly the debug flag is useful, but if you're running the packer build remotely (I do it on jenkins) you can't really sit there and hit the button.
I do try and run tests on all the stuff I'm packing outside of the build - using the Chef provisioner I've got kitchen tests all over everything before it gets packed. It's a royal pain to try and debug anything besides packer during a packer run.
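The sleep trick can be scripted for JSON templates. The sketch below uses a hypothetical helper name; the provisioner it inserts (type "shell" with an "inline" list of commands) is Packer's standard shell provisioner syntax. It puts a provisioner that just runs sleep at a chosen position in the provisioners list, so the build pauses there and you can SSH in.

```python
import copy


def with_debug_sleep(template: dict, index: int, seconds: int = 3600) -> dict:
    """Return a copy of a Packer JSON template with a sleeping shell
    provisioner inserted at position `index` of "provisioners"."""
    patched = copy.deepcopy(template)  # leave the caller's template untouched
    provisioners = patched.setdefault("provisioners", [])
    provisioners.insert(index, {"type": "shell", "inline": [f"sleep {seconds}"]})
    return patched
```

Write the result to a separate debug template with json.dump and build that, so the normal template never contains the sleep.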
While looking up info for this myself, I ran across numerous bug reports/feature requests for Packer.
Apparently, someone added new features to the virtualbox and vmware builders a year ago (https://github.com/mitchellh/packer/issues/409), but it hasn't gotten merged into main.
In another bug (https://github.com/mitchellh/packer/issues/1687), they were looking at adding additional features to --debug, but that seemed to stall out.
If a Packer build is failing, first check where the build process has got stuck, but do the check in this sequence:
1. Are the boot commands the right ones?
2. Is the preseed config OK?
3. If 1 and 2 are OK, the box has booted, and the next thing to check is the login: SSH keys, ports, etc.
4. Finally, check for issues within the provisioning scripts.