Minor bug in pararrayfun? Is there a workaround (threads vs processes)? - octave

I am using Octave version 4.2.2, but I think the question applies to previous versions as well.
I want to know whether the following behavior is well known and caused by my own ignorance, or whether it is a bug that should be addressed, and also whether there is a workaround. Note that I am focusing on parallelization in situations where vectorization is not possible and where copying by value is not an option.
Basically, my problem is that functions such as pararrayfun or parcellfun seem to violate the principle that the properties of handles are passed by reference.
As an example, suppose we define
classdef data_class < handle
  properties
    data
  end
end
and that we want to take each element from the .data property of one object input_arr from this class, apply some stochastic function fancy_func to them, and copy the result to the corresponding index of the .data property of another object arr of this class. The .data property is just a matrix of size 12*1, and we want to have 4 processes that each process 3 elements.
elem_per_process=3;
num_processes=4;
start_indexes={1,4,7,10};
function []=fill_arr(start_idx,num_elems,in_arr_handle,arr_to_fill_handle)
  for i=1:num_elems
    arr_to_fill_handle.data(start_idx+i-1,:)=fancy_func(in_arr_handle.data(start_idx+i-1,:));
  end
end
filler=@(start_idx)fill_arr(start_idx,elem_per_process,input_arr,arr);
cellfun(filler,start_indexes);                  % works fine
parcellfun(num_processes,filler,start_indexes); % PROBLEM! Nothing is copied
So, this is actually worse than what I thought:
parcellfun copies object handle properties by value
It breaks code that works with cellfun!
A quick look at
<octave_dir>/packages/parallel-3.1.1/parcellfun.m
seems to indicate that the parallelization relies on processes instead of threads (also note pararrayfun uses parcellfun at its core), so this is expected. What bothers me is that it does so without a single warning, while going against the fundamental properties that many Octave users expect to hold.
So, to summarize:
Is this a (minor) bug?
Is there any way to parallelize using threads instead of processes (for situations where vectorization is not possible, and where copy by value is unacceptable)?


Why is my %h is List = 1,2; a valid assignment?

While finalizing my upcoming Raku Advent Calendar post on sigils, I decided to double-check my understanding of the type constraints that sigils create. The docs describe sigil type constraints with the table
below:
Based on this table (and my general understanding of how sigils and containers work), I strongly expected this code
my %percent-sigil is List = 1,2;
my @at-sigil is Map = :k<v>;
to throw an error.
Specifically, I expected that is List would attempt to bind the %-sigiled variable to a List, and that this would throw an X::TypeCheck::Binding error – the same error that my %h := 1,2 throws.
But it didn't error. The first line created a List that seemed perfectly ordinary in every way, other than the sigil on its variable. And the second created a seemingly normal Map. Neither of them secretly had Scalar intermediaries, at least as far as I could tell with VAR and similar introspection.
I took a very quick look at the World.nqp source code, and it seems at least plausible that discarding the % type constraint with is List is intended behavior.
So, is this behavior correct/intended? If so, why? And how does that fit in with the type constraints and other guarantees that sigils typically provide?
(I have to admit, seeing an %-sigiled variable that doesn't support Associative indexing kind of shocked me…)
I think this is a grey area, somewhere between DIHWIDT (Doctor, It Hurts When I Do This) and an oversight in implementation.
Thing is, you can create your own class and use that in the is trait. Basically, that overrides the type with which the object will be created, replacing the default Hash (for %) and Array (for @ sigils). As long as you provide the interface methods, it (currently) works. For example:
class Foo {
    method AT-KEY($) { 42 }
}
my %h is Foo;
say %h<a>; # 42
However, if you want to pass such an object as an argument to a sub with a % sigil in the signature, it will fail because the class did not consume the Associative role:
sub bar(%) { 666 }
say bar(%h);
===SORRY!=== Error while compiling -e
Calling bar(Foo) will never work with declared signature (%)
I'm not sure why the test for Associative (for the % sigil) and Positional (for @) is not enforced at compile time with the is trait. I would assume it was an oversight, maybe something to be fixed in 6.e.
Quoting the Parameters and arguments section of the S06 specification/speculation document about the related issue of binding arguments to routine parameters:
Array and hash parameters are simply bound "as is". (Conjectural: future versions ... may do static analysis and forbid assignments to array and hash parameters that can be caught by it. This will, however, only happen with the appropriate use declaration to opt in to that language version.)
Sure enough, the Rakudo compiler implemented some rudimentary static analysis (in its AOT compilation optimization pass) that normally (but see footnote 3 in this SO answer) insists on binding @ routine parameters to values that do the Positional role and % ones to Associatives.
I think this has been the case since the first official Raku-supporting release of Rakudo, in 2016, but regardless, I'm pretty sure the "appropriate use declaration" is any language version declaration, including none. If your/our druthers are static typing for the win for @ and % sigils, and I think they are, then that's presumably very appropriate!
Another source is the IRC logs. A quick search got me nothing.
Hmm. Let's check the blame for the above verbiage so I can find when it was last updated and maybe spot contemporaneous IRC discussion. Oooh.
That is an extraordinary read.
"oversight" isn't the right word.
I don't have time tonight to search the IRC logs to see what led up to that commit, but I daresay it's interesting. The previous text was talking about a PL design I really liked the sound of in terms of immutability, such that code could become increasingly immutable by simply swapping out one kind of scalar container for another. Very nice! But reality is important, and Jonathan switched the verbiage to the implementation reality. The switch toward static typing certainty is welcome, but has it seriously harmed the performance and immutability options? I don't know. Time for me to go to sleep and head off for seasonal family visits. Happy holidays...

How to detect redundant piece of code containing array?

The lecture for my Java class has this piece of code:
for (int i = 0; i < arr.length; i = i + 10) {
    if (i % 10 == 0) {
        System.out.println(arr[i]);
    }
}
If you start at 0 and then go 10, 20, etc., why do you need the if condition? Naturally, all of these numbers are divisible by 10.
It's redundant. The only way it could have an effect is when the array length is close to the Integer max value and you're causing overflows by adding 10, but then your code would loop infinitely anyway (or crash when accessing negative array values).
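For illustration, the loop behaves identically with the check removed, since i only ever takes the values 0, 10, 20, and so on:

for (int i = 0; i < arr.length; i = i + 10) {
    // i is always a multiple of 10 here, so the i % 10 == 0 test adds nothing
    System.out.println(arr[i]);
}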
To me, the code in the if condition might exist for two reasons:
It is a way to monitor the progress of the loop (although since the update step of the for loop is i=i+10 rather than i++, it is less meaningful in this case). This is very common when a script executes a task over a lot of data (normally in a single process, taking some time). By printing the progress periodically, we can know (or estimate) how much data has been read/written, or how many times the code in the loop body has executed.
More code might later be added inside the for loop that modifies i; in that case, i%10 == 0 becomes meaningful.
In other words, without any more context, it does seem like the if condition is redundant in this case.
To answer the question in the title, here's what we usually do. First, have the code reviewed by someone else before you merge your branch. Having a colleague review your code is good practice, as they bring a fresh pair of eyes to correctness and code style. Second, if you find something that looks suspicious but you are not sure (for example, the "redundant code" here), write unit tests to cover the part of the code you would like to change, make the change, rerun the unit tests, and see if you still get what is expected. A minimal sketch of that second step follows below.
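As a sketch, assuming the loop is first extracted into a hypothetical helper collectEveryTenth so its behavior can be asserted before and after removing the if:

import static org.junit.Assert.assertEquals;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.junit.Test;

public class EveryTenthTest {

    // Hypothetical helper containing the loop from the lecture, made testable.
    static List<Integer> collectEveryTenth(int[] arr) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < arr.length; i = i + 10) {
            if (i % 10 == 0) {   // the suspected redundant check
                out.add(arr[i]);
            }
        }
        return out;
    }

    @Test
    public void picksElementsAtIndexes0And10And20() {
        int[] arr = new int[25];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = i * 2;
        }
        // Pin down the current behavior; rerun after deleting the if to confirm nothing changed.
        assertEquals(Arrays.asList(0, 20, 40), collectEveryTenth(arr));
    }
}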
Personally, I haven't heard of any tool that is able to detect "redundant code" like the example here, as "redundant" might not be redundant at all under different circumstances.

Questions about the Boundary Value Check

I'm doing my JUnit homework and need some explanations here.
Here's the quotation from my homework description:
One of the issues with boundary conditions is that the system needs to behave well even if the boundary is approached multiple times. This should be obvious, but it doesn't always happen in practice.
Remember that we can characterize an object as state and behavior. Typically, the state is not directly accessible, but instead, is accessed indirectly by means of the behavior. That is, the behavior reflects the state of the object.
Now, if we think about boundaries in math, it might not be too surprising to imagine that the value at some boundary will be different if we approach that boundary in different ways. So, if the value can be likened to the state, the state at the boundary may vary depending on how we got there. This would mean that the behavior could be different.
To make objects that behave consistently, we would have to insure that the internal state at those boundaries is consistent. So, test cases should check this assumption. To receive challenge points for this homework assignment enhance your test cases so that potential problems around the boundaries may be discovered.
Clearly mark the Challenge test cases with the string "### challenge ###" in the comments. Include in those comments what boundary is being tested, and how you're guessing that the state of the object may be different depending on how the boundary is being approached.
I don't understand this, especially the highlighted part. What does he mean by "objects that behave consistently" and the "potential problems"?
Also, how is this different from a general boundary check that just throws the exception I expected in JUnit?
Thank you!
Without knowing the details of the homework, an answer could only be somewhat generic, but I'll try.
Boundary checking is not just exception checking; it's about seeing which paths in your code are executed under which conditions. If you have control statements (loops, if-else, switch, etc.), you have to verify under what conditions of your internal state those statements are processed in what way.
To me, boundary testing is that you change certain values of an instance field in a way that would cause the behavior to run through different branches of your code.
for example, you have this behavior:
if (someInstanceValue > 5) {
    return "great";
} else {
    return "poor";
}
Now you could test with data for someInstanceValue that define the boundary:
5 : "poor"
6 : "great"
If you have multiple fields in your class, all of them define the state but only some of them may affect a certain path in your code. As the test is a specification of your class under test, written in code, you should specify which fields are relevant to a function, and which are not (by leaving them out).
So you should set up your instance-under-test accordingly (calling all setters) or if you require more complex objects, you could use frameworks like Mockito to specify the state (in a when().thenReturn() syntax).
If you want to verify if you covered all your boundaries, you could run a mutation test against your suite using a mutation testing tool like PIT. It will flip the switches in your code (i.e. replacing a < with a >=) to check whether your test will fail. Often, it's a good source of inspiration for improving the way you test.
Nevertheless, some parts of the homework assignment sound a bit confusing to me. You may approach a boundary from two sides, OK, but there is no such thing as a state that represents THE boundary; you're either on one side of the boundary or the other. If the way you approached one side of a boundary matters, and the object behaves differently depending on that "history" of how you reached that state, then the history becomes part of the state. In other words: different history = different state.
Keep in mind: every instance field is part of the state. Every possible combination of values of your instance fields defines a single state. Every transition from one combination to another is a state transition triggered by calling a behavior. Now think of your test as describing this state machine, by listing the triples {currentState, input} -> nextState (with the input being a method invocation). Which is basically the Given-When-Then structure that good tests should have.
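To make "different history = different state" concrete, here is a made-up sketch (the class and its fields are invented for illustration): two objects reach the same boundary (full capacity), but the one that overshot it carries extra internal state that later behavior exposes.

public class BoundedBuffer {
    private final int capacity = 10;
    private int size = 0;
    private int rejectedCount = 0;   // part of the state, only observable through behavior

    // Adds n items; anything that would exceed the capacity is rejected.
    public void add(int n) {
        if (size + n <= capacity) {
            size += n;
        } else {
            size = capacity;
            rejectedCount++;
        }
    }

    public boolean isFull()        { return size == capacity; }
    public boolean hadRejections() { return rejectedCount > 0; }

    public static void main(String[] args) {
        BoundedBuffer a = new BoundedBuffer();
        a.add(10);                    // reaches the boundary exactly
        BoundedBuffer b = new BoundedBuffer();
        b.add(12);                    // overshoots and is clamped to the boundary
        System.out.println(a.isFull() + " " + b.isFull());               // true true
        System.out.println(a.hadRejections() + " " + b.hadRejections()); // false true
    }
}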

What does Backpatching mean?

What does backpatching mean? Please illustrate with a simple example.
Back patching usually refers to the process of resolving forward branches that have been planted in the code, e.g. at 'if' statements, when the value of the target becomes known, e.g. when the closing brace or matching 'else' is encountered.
In the intermediate code generation stage of a compiler, we often need to emit "jump" instructions to places in the code that don't exist yet. To handle such cases, a target label is inserted for that instruction.
A marker nonterminal in the production rule causes the semantic action to pick up the index of the next instruction to be generated, which is later used as the jump target.
Statements such as conditionals, while loops, etc. are represented as a bunch of "if" and "goto" instructions when generating the intermediate code.
The problem is that these "goto" instructions do not have a valid target at the beginning (when the compiler starts reading the source code line by line, a.k.a. the first pass). Only after reading the whole source code for the first time are the labels and addresses these "goto"s point to determined.
The question is: can we make the compiler fill in the X in the "goto X" statements in one single pass or not?
The answer is yes.
If we don't use backpatching, this can be achieved by a two-pass analysis of the source code. Backpatching instead lets us create and maintain a separate list dedicated to the "goto" statements. Since everything is done in a single pass, the compiler cannot fill in the X of a "goto X" when it first emits the instruction, because it doesn't yet know where X is. Instead, it records the location of that incomplete instruction in the list, and once the target X is found later in the code, the blank is replaced by the actual address or reference.
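As a small, made-up illustration of that list (in Java rather than in a real compiler), here is a toy code generator that emits gotos with blank targets, remembers their positions, and patches them once the label's address is known:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Backpatcher {
    private final List<String> code = new ArrayList<>();                // generated instructions
    private final Map<String, List<Integer>> pending = new HashMap<>(); // label -> indexes to patch

    // Emit a jump whose target is not known yet.
    public void emitGoto(String label) {
        pending.computeIfAbsent(label, k -> new ArrayList<>()).add(code.size());
        code.add("goto ?");                                              // blank target for now
    }

    // The label's address is now known: backpatch all pending jumps to it.
    public void defineLabel(String label) {
        int target = code.size();
        for (int index : pending.getOrDefault(label, List.of())) {
            code.set(index, "goto " + target);
        }
        pending.remove(label);
    }

    public void emit(String instruction) {
        code.add(instruction);
    }

    public static void main(String[] args) {
        Backpatcher gen = new Backpatcher();
        gen.emit("if x < 0");        // 0
        gen.emitGoto("ELSE");        // 1: goto ?   (target unknown yet)
        gen.emit("y = 1");           // 2
        gen.emitGoto("END");         // 3: goto ?
        gen.defineLabel("ELSE");     // ELSE resolves to address 4
        gen.emit("y = -1");          // 4
        gen.defineLabel("END");      // END resolves to address 5
        gen.code.forEach(System.out::println);
        // The two "goto ?" lines have been backpatched to "goto 4" and "goto 5".
    }
}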
Backpatching is the process of leaving the target entry of a goto instruction blank when the target address is unknown (a forward transfer) during the first pass, and filling in these unknowns in the second pass.
Backpatching:
The syntax-directed definition can be implemented in two or more passes (we have both synthesized attributes and inherited attributes):
Build the tree first.
Walk the tree in depth-first order.
The main difficulty with code generation in one pass is that we may not know the target of a branch when we generate code for flow-of-control statements.
Backpatching is the technique to get around this problem:
Generate branch instructions with empty targets.
When the target is known, fill in the label of the branch instructions (backpatching).
Backpatching is a process in which the operand field of an instruction containing a forward reference is left blank initially. The address of the forward-referenced symbol is put into this field when its definition is encountered in the program.
Back patching is the activity of filling in the unspecified information of labels, using the appropriate semantic actions, during the code generation process.
It is typically needed for:
boolean expressions
flow-of-control statements

Programming style: should you return early if a guard condition is not satisfied?

One thing I've sometimes wondered is which is the better style out of the two shown below (if any)? Is it better to return immediately if a guard condition hasn't been satisfied, or should you only do the other stuff if the guard condition is satisfied?
For the sake of argument, please assume that the guard condition is a simple test that returns a boolean, such as checking to see if an element is in a collection, rather than something that might affect the control flow by throwing an exception. Also assume that methods/functions are short enough not to require editor scrolling.
// Style 1
public SomeType aMethod() {
    SomeType result = null;
    if (!guardCondition()) {
        return result;
    }
    doStuffToResult(result);
    doMoreStuffToResult(result);
    return result;
}

// Style 2
public SomeType aMethod() {
    SomeType result = null;
    if (guardCondition()) {
        doStuffToResult(result);
        doMoreStuffToResult(result);
    }
    return result;
}
I prefer the first style, except that I wouldn't create a variable when there is no need for it. I'd do this:
// Style 3
public SomeType aMethod() {
    if (!guardCondition()) {
        return null;
    }
    SomeType result = new SomeType();
    doStuffToResult(result);
    doMoreStuffToResult(result);
    return result;
}
Having been trained in Jackson Structured Programming in the late '80s, my ingrained philosophy was always "a function should have a single entry-point and a single exit-point"; this meant I wrote code according to Style 2.
In the last few years I have come to realise that code written in this style is often overcomplex and hard to read/maintain, and I have switched to Style 1.
Who says old dogs can't learn new tricks? ;)
Style 1 is what the Linux kernel indirectly recommends.
From https://www.kernel.org/doc/Documentation/process/coding-style.rst, chapter 1:
Now, some people will claim that having 8-character indentations makes
the code move too far to the right, and makes it hard to read on a
80-character terminal screen. The answer to that is that if you need
more than 3 levels of indentation, you're screwed anyway, and should fix
your program.
Style 2 adds levels of indentation, ergo, it is discouraged.
Personally, I like style 1 as well. Style 2 makes it harder to match up closing braces in functions that have several guard tests.
I don't know if guard is the right word here. Normally an unsatisfied guard results in an exception or assertion.
But besides this, I'd go with style 1, because it keeps the code cleaner in my opinion. You have a simple example with only one condition, but what happens with many conditions and style 2? It leads to a lot of nested ifs or huge if-conditions (with ||, &&). I think it is better to return from a method as soon as you know that you can.
But this is certainly very subjective ^^
Martin Fowler refers to this refactoring as :
"Replace Nested Conditional with Guard Clauses"
If/else statements also bring cyclomatic complexity, which makes the code harder to test: in order to cover all the if/else blocks you might need to feed in lots of input combinations.
Whereas if there are guard clauses, you can test them first and then deal with the real logic inside the if/else clauses in a clearer fashion.
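For example, a sketch (class and condition names invented) that contrasts the two shapes; the guard-clause version reads straight down, while the nested version adds a level of indentation per condition:

public class GuardExample {
    private boolean knownUser;
    private boolean hasPermission;
    private boolean inputValid;

    // Nested conditionals: every extra condition adds another level of indentation.
    String describeNested() {
        String result = null;
        if (knownUser) {
            if (hasPermission) {
                if (inputValid) {
                    result = "processed";
                }
            }
        }
        return result;
    }

    // Guard clauses: each precondition is checked once, up front,
    // and the remaining code deals only with the real logic.
    String describeWithGuards() {
        if (!knownUser) {
            return null;
        }
        if (!hasPermission) {
            return null;
        }
        if (!inputValid) {
            return null;
        }
        return "processed";
    }
}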
If you dig through the .NET Framework using .NET Reflector, you will see that the .NET programmers use style 1 (or maybe style 3, already mentioned by unbeli).
The reasons are already mentioned in the answers above, and maybe one other reason is to make the code more readable, concise, and clear.
The place this style is used most is when checking input parameters; you always have to do this if you write some kind of framework/library/DLL.
First check all input parameters, then work with them.
It sometimes depends on the language and what kinds of "resources" that you are using (e.g. open file handles).
In C, Style 2 is definitely safer and more convenient because a function has to close and/or release any resources that it obtained during execution. This includes allocated memory blocks, file handles, handles to operating system resources such as threads or drawing contexts, locks on mutexes, and any number of other things. Delaying the return until the very end or otherwise restricting the number of exits from a function allows the programmer to more easily ensure that s/he properly cleans up, helping to prevent memory leaks, handle leaks, deadlock, and other problems.
In C++ using RAII-style programming, both styles are equally safe, so you can pick one that is more convenient. Personally I use Style 1 with RAII-style C++. C++ without RAII is like C, so, again, Style 2 is probably better in that case.
In languages like Java with garbage collection, the runtime helps smooth over the differences between the two styles because it cleans up after itself. However, there can be subtle issues with these languages, too, if you don't explicitly "close" some types of objects. For example, if you construct a new java.io.FileOutputStream and do not close it before returning, then the associated operating system handle will remain open until the runtime garbage collects the FileOutputStream instance that has fallen out of scope. This could mean that another process or thread that needs to open the file for writing may be unable to until the FileOutputStream instance is collected.
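That concern is smaller with modern Java: try-with-resources (not something the answer above mentions) closes the stream on every exit path, so early returns stay safe. A minimal sketch, with a made-up file name and method:

import java.io.FileOutputStream;
import java.io.IOException;

public class EarlyReturnWithResources {

    // Style 1 (early return) remains safe because try-with-resources closes
    // the stream on every exit path, including the guard's return.
    static boolean writeMarker(String path, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(path)) {
            if (data == null || data.length == 0) {
                return false;   // guard: the stream is still closed automatically
            }
            out.write(data);
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeMarker("marker.bin", new byte[] {1, 2, 3}));
    }
}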
Although it goes against best practices that I have been taught, I find it much better to reduce the nesting of if statements when I have a condition such as this. I think it is much easier to read, and although it exits in more than one place, it is still very easy to debug.
I would say that style 1 became more widely used because it is the best practice when you combine it with small methods.
Style 2 looks like a better solution when you have big methods. When you have them ... you have some common code that you want to execute no matter how you exit. But the proper solution is not to force a single exit point; it is to make the methods smaller.
For example, if you want to extract a sequence of code from a big method and this method has two exit points, you start to have problems; it is hard to do automatically. When I have a big method written in style 1, I usually transform it into style 2, then extract methods, and then in each of them I should have style 1 code.
So style 1 is best, but only in combination with small methods.
Style 2 is not so good, but it is recommended if you have big methods that you don't want to, or don't have time to, split.
I prefer to use method #1 myself; it is logically easier to read and also closer to what we are actually trying to do (if something bad happens, exit the function NOW; do not pass go, do not collect $200).
Furthermore, most of the time you would want to return a value that is not a logically possible result (i.e. -1) to indicate to the caller that the function failed to execute properly, so they can take appropriate action. This lends itself better to method #1 as well.
I would say "It depends on..."
In situations where I have to perform a cleanup sequence of more than 2 or 3 lines before leaving a function/method, I would prefer style 2, because the cleanup sequence has to be written and modified only once. That makes maintenance easier.
In all other cases I would prefer style 1.
Number 1 is typically the easy, lazy, and sloppy way. Number 2 expresses the logic cleanly. What others have pointed out is that, yes, it can become cumbersome. That tendency, though, has an important benefit: style #1 can hide that your function is probably doing too much. It doesn't visually demonstrate the complexity of what's going on very well, i.e. it prevents the code from saying to you "hey, this is getting a bit too complex for this one function". It also makes it a bit easier for other developers who don't know your code to miss those returns sprinkled here and there, at first glance anyway.
So let the code speak. When you see long conditions appearing or nested if statements, it is telling you that maybe it would be better to break this stuff up into multiple functions, or that it needs to be rewritten more elegantly.