SGE: What's the difference between -hold_jid_ad and -hold_jid?

When qsub-ing jobs, I want one job to wait until the first one has completed successfully before it executes. I've been looking at the options -hold_jid_ad and -hold_jid to do this, but can't see a difference between them.

Let's say you have jobs A and B, and A depends on B. That is, you want to run something like qsub -hold_jid[_ad] B A.sh
-hold_jid_ad
This is for array jobs only. Use it if:
A and B are both array jobs,
A and B have the same range of tasks (i.e. not qsub -t 1-3 A.sh and qsub -t 2-4 B.sh),
A[i] depends on B[i], AND
A[i] does not depend on B[j] for i != j.
-hold_jid
Use this in any other situation. When in doubt, use this.
This is based on a diff of the documentation and a few tries on our engine.
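For concreteness, here is a sketch of both submission patterns (the job names, script names, and -t range are made up; for -hold_jid_ad the two ranges must match):
# task-by-task dependency: A[i] starts as soon as B[i] has finished
qsub -t 1-10 -N B B.sh
qsub -t 1-10 -N A -hold_jid_ad B A.sh
# whole-job dependency: every task of A waits for all of B to finish
qsub -t 1-10 -N B B.sh
qsub -t 1-10 -N A -hold_jid B A.sh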

The difference in the man page is subtle. It appears that -hold_jid_ad is for array jobs, whereas -hold_jid is for regular jobs.

Computing powers of -1

Is there an established idiom for implementing (-1)^n * a?
The obvious choice of pow(-1,n) * a seems wasteful, and (1-2*(n%2)) * a is ugly and not perfectly efficient either (two multiplications and one addition instead of just setting the sign). I think I will go with n%2 ? -a : a for now, but introducing a conditional seems a bit dubious as well.
Making certain assumptions about your programming language, compiler, and CPU...
To repeat the conventional -- and correct -- wisdom, do not even think about optimizing this sort of thing unless your profiling tool says it is a bottleneck. If so, n % 2 ? -a : a will likely generate very efficient code; namely one AND, one test against zero, one negation, and one conditional move, with the AND+test and negation independent so they can potentially execute simultaneously.
Another option looks something like this:
int zero_or_minus_one = (n << 31) >> 31;             /* -1 if n is odd, 0 if n is even */
return (a ^ zero_or_minus_one) - zero_or_minus_one;  /* conditionally negate a */
This assumes 32-bit integers, arithmetic right shift, defined behavior on integer overflow, two's-complement representation, etc. It will likely compile into four instructions as well (left shift, right shift, XOR, and subtract), with a dependency between each... But it can be better for certain instruction sets; e.g., if you are vectorizing code using SSE instructions.
Incidentally, your question will get a lot more views -- and probably more useful answers -- if you tag it with a specific language.
As others have written, in most cases readability is more important than performance, and compilers, interpreters, and libraries are better at optimizing than most people think. Therefore pow(-1,n) * a is likely to be an efficient solution on your platform.
If you really have a performance issue, your own suggestion n%2 ? -a : a is fine. I don't see a reason to worry about the conditional assignment.
If your language has a bitwise AND operator, you could also use n & 1 ? -a : a which should be very efficient even without any optimization. It is likely that on many platforms, this is what pow(a,b) actually does in the special case of a == -1 and b being an integer.
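Putting the two suggestions side by side, a minimal C sketch (the function names are mine, and the branch-free variant assumes two's-complement integers):
#include <stdio.h>

/* (-1)^n * a via a branch; compilers typically turn this into a
   conditional move rather than an actual jump. */
static int sign_flip_branch(int n, int a) { return (n & 1) ? -a : a; }

/* Branch-free variant: mask is -1 when n is odd and 0 when n is even,
   and (a ^ mask) - mask negates a exactly when mask is -1. */
static int sign_flip_mask(int n, int a) {
    int mask = -(n & 1);
    return (a ^ mask) - mask;
}

int main(void) {
    printf("%d %d\n", sign_flip_branch(3, 5), sign_flip_mask(3, 5)); /* -5 -5 */
    printf("%d %d\n", sign_flip_branch(4, 5), sign_flip_mask(4, 5)); /* 5 5 */
    return 0;
}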

Filling the gap in my understanding

I was going through the following series of lecture notes on OS:
http://williamstallings.com/Extras/OS-Notes/h3.html
While trying to explain the different outcomes a threaded program can produce, it breaks down the execution of a function and says the following:
"sum first reads the value of a into a register. It then increments the register, then stores the contents of the register back into a. It then reads the values of of the control string, p and a into the registers that it uses to pass arguments to the printf routine. It then calls printf, which prints out the data"
I don't know exactly how a function is executed at the level of registers, and at the same time I don't know which topic I should study to learn more about it.
So, which topic covers the execution of a function at the level of registers and of electronic circuits?
Please also elaborate on how the stack changes while a value is being read during the execution of a function.
Thanks in advance.
The advice to look at the assembler code is already a good one. You can look up the assembler instructions and think about what happens if, at any instruction, the thread execution switches to the other thread.
Look at this code
la a, %r0       ! load the address of a
ld [%r0],%r1    ! read the value of a into a register
add %r1,1,%r1   ! increment the register
st %r1,[%r0]    ! store the register back into a
ld [%r0], %o2   ! parameters are passed starting with %o0
mov %o0, %o1    ! move p into the second argument register
la .L17, %o0    ! address of the control string
call printf
In the first four lines (the a++) there are different possibilities for how the execution can interleave. You don't know whether sum(1) or sum(0) is called first.
To understand what is going on at a deeper level, I suggest you look up 'computer organization'. See for example the Computer Organisation WikiBook.
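To see the interleaving problem in a runnable form, here is a minimal C sketch using POSIX threads (compile with cc -pthread; the sum function is a paraphrase of the lecture's example, not its exact code):
#include <pthread.h>
#include <stdio.h>

static int a = 0;

/* a++ compiles to a load, an increment, and a store; two threads can
   interleave between those steps, so an update can be lost. */
static void *sum(void *arg) {
    a++;
    printf("thread %ld sees a = %d\n", (long)arg, a);
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, NULL, sum, (void *)0L);
    pthread_create(&t1, NULL, sum, (void *)1L);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}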

In Vowpal Wabbit, what is the difference between a namespace and feature?

While carrying out analysis in R or Python we only deal with feature names (and their values). In Vowpal Wabbit we also have namespaces.
I am unable to understand:
a. what is meant by a namespace;
b. how it is different from features;
c. when it is used and when not (that is, can we avoid using it?);
d. and how it is used.
I would be grateful for one or two examples. Sorry for so many questions.
In vowpal wabbit, name-spaces are used to conveniently generate interaction features on the fly at run-time, without the need to predeclare them.
A simple example format, without a name-space, is:
1 | a:2 b:3
where 1 is the label, and a, b are regular input features.
Note that there's a space after the |.
Contrast the above with using two name-spaces x and y (note there is no space between the | separator and the name-space):
1 |x a:2 |y b:3
This example is essentially equivalent (except for feature hash locations) to the first example. It still has two features with the same values as the original example. The difference is that now with these name-spaces, we can cross features by passing options to vw. For example:
vw -q xy
will generate additional features on-the-fly by crossing all features in name-space x with all features in name-space y. The names of the auto-generated features will be the concatenation of the names from the two name-spaces and the values will be the products of their respective values. In this particular case, it would be as if our data-set had one additional feature: ab:6 (*)
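Conceptually, with -q xy the name-spaced example above behaves as if the data had been written without name-spaces but with the crossed feature appended (see the (*) note below for the real hashed name):
1 | a:2 b:3 ab:6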
Obviously, this is a very simple example; imagine that you have an example with 3 features in a name-space:
1 |x a:2 b:3 c:5
By adding -q xx to vw you could automatically generate 6 additional interaction features: aa, ab, ac, bb, bc, cc on the fly. And if you had 3 name-spaces, say x, y, z, you could cross any (or any desired subset) of them: -q xx -q xy -q xz -q yz -q yy -q zz on the command line to get all possible interactions between the separate sets of features.
That's all there is to it. It is a powerful feature allowing you to experiment and add interaction features on the fly.
There are several options which accept (first letters of) name-spaces as arguments, among them:
-q
--cubic
--ignore
--keep
--redefine (very new)
--lrq
Check out the vw command line arguments wiki for more details.
(*) In practice, the feature names will have the name-spaces prepended to them with a ^ separator in between, so the actual hashed string would be x^a^y^b:6 rather than ab:6 (you may verify this by using the --audit option), but this is just a detail.
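For a concrete end-to-end run, here is a sketch (the file name train.vw and its two examples are made up):
1 |x a:2 b:3 |y c:5
-1 |x a:1 |y c:2
Saving those two lines as train.vw and running
vw -d train.vw -q xy --audit
trains on the original features plus the x-y crossings (a*c and b*c in the first example), and --audit prints each feature, including the generated ones, with its hashed name and value.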

Pipeline Hazards questions

I'm currently studying for an exam tomorrow and need some help understanding the following:
The following program is given:
ADDF R12, R13, R14
ADD R1,R8,R9
MUL R4,R2,R3
MUL R5,R6,R7
ADD R10,R5,R7
ADD R11,R2,R3
Find the potential conflicts that can arise if the architecture has:
a) No pipeline
b) A Pipeline
c) Multiple pipelines
So for (b) I would say the instruction on line 5 is a data hazard, because it reads the value of R5, which is produced on the previous line as the result of a multiplication, so that instruction may not yet be finished.
But what happens if an architecture doesn't have a pipeline? My best guess is that no hazards exist, but I'm not sure.
Also, what happens if it has 2 or more pipelines?
Cheers.
You are correct to suggest that for a) there are no hazards as each instruction must complete before the next starts.
For b):
There is a "Read After Write" dependency between lines 4 and 5.
There are "Read After Read" dependencies between lines 4 and 5 and also between lines 2 and 6.
I suspect that the difference between parts b) and c) is that the question assumes you know ahead of time that the pipe-line has a well defined number of stages. For example we know that if the pipe-line has 3 stages then the RAR dependency between lines 2 and 6 is irrelevant.
In a system with multiple pipelines, however, the system could fetch say 4 instructions per cycle, making dependencies that were formerly too far apart now potential hazards.
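To make those dependencies explicit, here is the same listing annotated with the registers involved (assuming destination-first operand order):
ADDF R12, R13, R14   ; 1: writes R12 - no conflict with later lines
ADD  R1,  R8,  R9    ; 2: writes R1, reads R8, R9 - independent
MUL  R4,  R2,  R3    ; 3: reads R2, R3
MUL  R5,  R6,  R7    ; 4: writes R5, reads R6, R7
ADD  R10, R5,  R7    ; 5: reads R5 (RAW with line 4), R7 (RAR with line 4)
ADD  R11, R2,  R3    ; 6: reads R2, R3 (RAR with line 3)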

Why are register-based virtual machines better than stack-based ones?

Why are register-based virtual machines better than stack-based ones?
Specifically, in the Parrot VM's document, the designer explains the benefits of register machines:
[...] many programs in high-level languages consist of nested function and method calls, sometimes with lexical variables to hold intermediate results. Under non-JIT settings, a stack-based VM will be popping and then pushing the same operands many times, while a register-based VM will simply allocate the right amount of registers and operate on them, which can significantly reduce the amount of operations and CPU time.
but why are the same operands pushed many times?
It seems like they are describing a VM which executes the code as described in the language design, bytecode by bytecode, without compilation or optimisation. In that case it is true. Think about code doing something like this, for example:
x = first(a,b,c)
y = second(a,b,c)
third(y,x)
With a register-based system, you might be able to simply put the arguments in whatever position they're expected (if registers can be used to pass arguments). If all registers are "global", not per-function (or at least restored when popping the call stack), you might not need to do anything between the calls to first and second.
If you have a stack-based VM, you'd end up with something like (hopefully you do have swap):
push a
push b
push c
call first
push a # pushing same arguments again
push b
push c
call second
swap
call third
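For contrast, here is a sketch of what a register-based VM might emit for the same three calls (the instruction names and calling convention are made up: arguments go in r1-r3, the result comes back in r0):
mov r1, a        ; load the arguments once
mov r2, b
mov r3, c
call first       ; x is returned in r0
mov r4, r0       ; save x
call second      ; a, b, c are still in r1-r3, nothing to reload
mov r5, r0       ; save y
mov r1, r5       ; set up third(y, x)
mov r2, r4
call third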
Also if you calculate a math expression which reuses the same variables, you might need to do something like this:
push a
push b
add
push a
push c
add
add
instead of (assuming there are registers a,b,c and you can destroy the contents of b and c):
add b, a
add c, a
add b, c # result in b
This avoids reloading a, which required a separate push in the stack-based case.
Then again, I'm just guessing at the examples; maybe they meant some other case...