I am working on MUMPS in my recent project. I have a question regarding naked indicators. I am confused between routines and naked global references.
Can anybody help me to understand the difference between routines and naked indicators? The syntax for routines seems very similar to the syntax for naked indicators.
I'm not sure I fully understand your question, but I suspect you're referring to the fact that both Routine and Global references start with a caret (^).
Routines use the caret to distinguish between the routine and a label within the current routine. For example:
D COMPUTATION ; executes the COMPUTATION label in the current routine
D ^COMPUTATION ; executes the COMPUTATION routine
D SUBCOMP^COMPUTATION ; executes the SUBCOMP label in the COMPUTATION routine.
For variables, the caret indicates it is a global variable and not a local variable. This is the case whether you use naked references or not (this is where more clarification on your question might be in order, since as I understand it the reference being naked makes no difference). The difference being, of course, with the naked reference you can drop the variable name, and all but the last subscript of the global. For example:
S ^MYGLOB(1,1)="one"
S ^MYGLOB(1,2)="two"
is equivalent to
S ^MYGLOB(1,1)="one"
S ^(2)="two" ;naked, ewww
All that said, I would strongly advise against using naked references. They are intended to save time when entering code from the command prompt, but are very dangerous in code that has to be maintained. For example, if a reference such as ^OTHERGLOB(5) were inserted between the two lines of code above, ^(2) would now reference ^OTHERGLOB(2), not ^MYGLOB(1,2). Not to mention, it's a pain to read.
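To make the hazard concrete, here is a toy model of the naked indicator. It is written in Python rather than MUMPS purely so it can be run anywhere; the `GlobalStore` class and its method names are hypothetical, and it ignores details of real implementations (for example, an unsubscripted reference leaves the naked indicator undefined).

```python
class GlobalStore:
    """Toy model of MUMPS globals plus the naked indicator (illustrative only)."""

    def __init__(self):
        self.data = {}      # maps (name, subscript, ...) -> value
        self.naked = None   # (name, leading subscripts) left by the last reference

    def set(self, name, subs, value):
        # A full reference: remember the name and all but the last subscript.
        self.naked = (name, tuple(subs[:-1]))
        self.data[(name, *subs)] = value

    def set_naked(self, subs, value):
        # A naked reference: reuse whatever the last reference left behind.
        if self.naked is None:
            raise RuntimeError("naked indicator is undefined")
        name, prefix = self.naked
        self.set(name, prefix + tuple(subs), value)


store = GlobalStore()
store.set("MYGLOB", (1, 1), "one")
store.set_naked((2,), "two")            # lands in ^MYGLOB(1,2)

store2 = GlobalStore()
store2.set("MYGLOB", (1, 1), "one")
store2.set("OTHERGLOB", (5,), "x")      # an unrelated reference inserted later
store2.set_naked((2,), "two")           # now lands in ^OTHERGLOB(2)
```

The second store shows the maintenance problem: the naked line itself did not change, yet it writes to a different global.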
While finalizing my upcoming Raku Advent Calendar post on sigils, I decided to double-check my understanding of the type constraints that sigils create. The docs describe sigil type constraints with the table below:
Based on this table (and my general understanding of how sigils and containers work), I strongly expected this code
my %percent-sigil is List = 1,2;
my @at-sigil is Map = :k<v>;
to throw an error.
Specifically, I expected that is List would attempt to bind the %-sigiled variable to a List, and that this would throw an X::TypeCheck::Binding error – the same error that my %h := 1,2 throws.
But it didn't error. The first line created a List that seemed perfectly ordinary in every way, other than the sigil on its variable. And the second created a seemingly normal Map. Neither of them secretly had Scalar intermediaries, at least as far as I could tell with VAR and similar introspection.
I took a very quick look at the World.nqp source code, and it seems at least plausible that discarding the % type constraint with is List is intended behavior.
So, is this behavior correct/intended? If so, why? And how does that fit in with the type constraints and other guarantees that sigils typically provide?
(I have to admit, seeing an %-sigiled variable that doesn't support Associative indexing kind of shocked me…)
I think this is a grey area, somewhere between DIHWIDT (Doctor, It Hurts When I Do This) and an oversight in implementation.
Thing is, you can create your own class and use that in the is trait. Basically, that overrides the type with which the object will be created from the default Hash (for %) and Array (for @ sigils). As long as you provide the interface methods, it (currently) works. For example:
class Foo {
method AT-KEY($) { 42 }
}
my %h is Foo;
say %h<a>; # 42
However, if you want to pass such an object as an argument to a sub with a % sigil in the signature, it will fail because the class did not consume the Associative role:
sub bar(%) { 666 }
say bar(%h);
===SORRY!=== Error while compiling -e
Calling bar(A) will never work with declared signature (%)
I'm not sure why the test for Associative (for the % sigil) and Positional (for @) is not enforced at compile time with the is trait. I would assume it was an oversight, maybe something to be fixed in 6.e.
Quoting the Parameters and arguments section of the S06 specification/speculation document about the related issue of binding arguments to routine parameters:
Array and hash parameters are simply bound "as is". (Conjectural: future versions ... may do static analysis and forbid assignments to array and hash parameters that can be caught by it. This will, however, only happen with the appropriate use declaration to opt in to that language version.)
Sure enough, the Rakudo compiler implemented some rudimentary static analysis (in its AOT compilation optimization pass) that normally (but see footnote 3 in this SO answer) insists on binding @ routine parameters to values that do the Positional role and % ones to Associatives.
I think this was the case from the first official Raku supporting release of Rakudo, in 2016, but regardless, I'm pretty sure the "appropriate use declaration" is any language version declaration, including none. If your/our druthers are static typing for the win for @ and % sigils, and I think they are, then that's presumably very appropriate!
Another source is the IRC logs. A quick search turned up nothing.
Hmm. Let's check the blame for the above verbiage so I can find when it was last updated and maybe spot contemporaneous IRC discussion. Oooh.
That is an extraordinary read.
"oversight" isn't the right word.
I don't have time tonight to search the IRC logs to see what led up to that commit, but I daresay it's interesting. The previous text was talking about a PL design I really liked the sound of in terms of immutability, such that code could become increasingly immutable by simply swapping out one kind of scalar container for another. Very nice! But reality is important, and Jonathan switched the verbiage to the implementation reality. The switch toward static typing certainty is welcome, but has it seriously harmed the performance and immutability options? I don't know. Time for me to go to sleep and head off for seasonal family visits. Happy holidays...
I want to know when a function body ends in assembly. For example, in C you have these brackets {} that tell you where the function body starts and where it ends, but how do I know this in assembly?
Is there a parser that can extract all the functions from assembly, with the start line and end line of each body?
There's no foolproof way, and there might not even be a well-defined correct answer in hand-written asm.
Usually (e.g. in compiler-generated code) you know a function ends when you see the next global symbol, like objdump does to decide when to print a new "banner". But without all function-start symbols being visible, there's no unambiguous way. That's why some object file formats have room for size metadata associated with a symbol. Like .size foo, . - foo in GAS syntax.
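As a rough sketch of that heuristic (in Python, with made-up symbol names and addresses; `function_extents` is a hypothetical helper, not part of objdump or any real tool): use the symbol's size metadata when it is present, otherwise assume the function runs until the next symbol or the end of the section.

```python
def function_extents(symbols, section_end):
    """Guess [start, end) address ranges for functions.

    symbols: list of (name, start_address, size_or_None) tuples.
    section_end: address just past the last byte of the code section.
    """
    ordered = sorted(symbols, key=lambda s: s[1])
    extents = {}
    for i, (name, start, size) in enumerate(ordered):
        if size is not None:
            end = start + size            # size metadata, e.g. from .size
        elif i + 1 < len(ordered):
            end = ordered[i + 1][1]       # fall back to the next symbol's start
        else:
            end = section_end             # last symbol: run to end of section
        extents[name] = (start, end)
    return extents


# Hypothetical symbol table: only "helper" carries a size.
symbols = [
    ("main", 0x1040, None),
    ("helper", 0x1000, 0x30),
    ("cleanup", 0x1080, None),
]
extents = function_extents(symbols, section_end=0x10C0)
```

Note that the fallback over-estimates `main` if there is padding or an unlabeled function between it and `cleanup`, which is exactly the ambiguity described above.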
It's not as easy as looking for a ret; some functions end with a jmp tail-call to another function. And some call a noreturn function like abort or __stack_chk_fail (not tailcall because they want to push a return address for a backtrace.) Or just fall off into whatever's next because that path had undefined behaviour in the source so the compiler assumed it wasn't reachable and stopped generating instructions for it, e.g. a C++ non-void function where execution can/does fall off the end without a return.
In general, assembly can blur the lines of what a function is.
Asm has features you can use to implement the high-level concept of a function, but you're not restricted to that.
e.g. multiple asm functions could all return by jumping to a common block of code that pops some registers before a ret. Is that shared tail a separate function that's tail-called with a special calling convention?
Compilers don't usually do that, but humans could.
As for function entry points, usually some other code somewhere in the program will contain a call to it. But not necessarily; it might only be reachable via a table of function pointers, and you don't know that a block of .rodata holds function pointers until you find some code loading from it and calling or jumping.
But that doesn't work if the lowest-address instruction of the function isn't its entry point. See Does a function with instructions before the entry-point label cause problems for anything (linking)? for an example.
Compilers don't generate code like that, but humans can. (It's a handy trick sometimes for https://codegolf.stackexchange.com/ questions.)
Or in the general case, a function might have multiple entry points. Or you could describe that as multiple functions with overlapping implementations. Sometimes it's as simple as one tailcalling another by falling into it without needing a jmp, i.e. it starts a few instructions before another.
I want to know when a function body ends in assembly, [...]
There are mainly four ways that the execution of a stream of (userspace) instructions can "end":
An unconditional jump like jmp or a conditional one like Jcc (je,jnz,jg ...)
A ret instruction (meaning the end of a subroutine) which probably comes closest to the intent of your question (including the ExitProcess "ret" command)
The call of another "function"
An exception. Not a C style exception, but rather an exception like "Invalid instruction" or "Division by 0" which terminates the user space program
[...] for example, in C you have these brackets {} that tell you where the function body starts and where it ends, but how do I know this in assembly?
Simple answer: you don't. On the machine level every address can (theoretically) be an entry point to a "function". So there is no unique entry point to a "function" other than defined - and you can define anything.
On a tangent, this relates to self-modifying code and viruses, but it need not. The exit/end is as described in the first part above.
Is there a parser that can extract all the functions from assembly, with the start line and end line of each body?
Disassemblers create some kind of "functions" with entry and exit points. But they are merely assumed. No way to know if that assumption is correct. This may cause problems.
The usual approach is to use a disassembler; the work of regrouping the stream of instructions into separate "functions" is left to the person who took on this task (that is, you). Some tools exist that claim to simplify this, but I cannot judge their efficacy.
From the perspective of a high-level language, there are decompilers that try to reverse the transformation from (for example) C to assembly/machine code. They try to automate that task, and work more or less well, or only in some cases.
I am using Octave version 4.2.2, but I think the question applies to previous versions as well.
I want to know if the following behavior is well-known and caused by my ignorance or if it is a bug that should be addressed, and also if there is a workaround. Note that I am focusing on parallelization in situations where vectorization is not possible, and where copying by value is not an option.
Basically, my problem is that functions such as pararrayfun or parcellfun seem to violate the principle that the properties of handles are passed by reference.
As an example, suppose we define
classdef data_class < handle
properties
data
end
end
and that we want to take each element from the .data property of one object input_arr from this class, apply some stochastic function fancy_func to them, and copy the result to the corresponding index of the .data property of another object arr of this class. The .data property is just a matrix of size 12*1, and we want to have 4 processes that each process 3 elements.
elem_per_process=3;
num_processes=4;
start_indexes={1,4,7,10};
function []=fill_arr(start_idx,num_elems,in_arr_handle,arr_to_fill_handle)
for i=1:num_elems
arr_to_fill_handle.data(start_idx+i-1,:)=fancy_func(in_arr_handle.data(start_idx+i-1,:));
end
end
filler=@(start_idx)fill_arr(start_idx,elem_per_process,input_arr,arr);
cellfun(filler,start_indexes);%works fine
parcellfun(num_processes,filler,start_indexes);% PROBLEM! Nothing is copied
So, this is actually worse than I thought:
parcellfun copies object handle properties by value
It breaks code that works with cellfun !
A quick look at
<octave_dir>/packages/parallel-3.1.1/parcellfun.m
seems to indicate that the parallelization relies on processes instead of threads (also note pararrayfun uses parcellfun at its core), so this is expected. What bothers me is that it does so without a single warning, while going against the fundamental properties that many Octave users expect to hold.
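The effect of process-based parallelism on a handle-like object can be sketched in a few lines of Python. This is only an analogy, not a description of parcellfun's internals: it assumes that arguments reach a worker process as a serialized copy, which is modeled here with a pickle round trip, and the names `fill`, `shared`, `original` and `worker_copy` are made up for illustration.

```python
import pickle


def fill(holder, start, count):
    # Mutates the object it was given, like writing to a handle's property.
    for i in range(start, start + count):
        holder["data"][i] = i * 10


# Same-process call (the cellfun case): the caller sees the mutation.
shared = {"data": [0] * 12}
fill(shared, 0, 3)

# Process-style call: the worker operates on a deserialized copy,
# so the caller's object is never touched.
original = {"data": [0] * 12}
worker_copy = pickle.loads(pickle.dumps(original))
fill(worker_copy, 0, 3)
```

Under this model the only way to get results back is to return them from the worker and assemble them in the parent process, rather than relying on side effects through the handle.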
So, to summarize:
Is this a (minor) bug?
Is there any way to parallelize using threads instead of processes (for situations where vectorization is not possible, and where copy by value is unacceptable)?
What does backpatching mean? Please illustrate with a simple example.
Back patching usually refers to the process of resolving forward branches that have been planted in the code, e.g. at 'if' statements, when the value of the target becomes known, e.g. when the closing brace or matching 'else' is encountered.
In the intermediate code generation stage of a compiler we often need to emit "jump" instructions to places in the code that don't exist yet. To deal with such cases, a placeholder target label is inserted for that instruction and filled in later.
A marker nonterminal in the production rule triggers a semantic action that records the index of the next instruction, so that it can later be used as a jump target.
Some statements like conditional statements, while, etc. will be represented as a bunch of "if" and "goto" syntax while generating the intermediate code.
The problem is that these "goto" instructions do not have a valid target at the beginning (when the compiler starts reading the source code line by line, a.k.a. the first pass). But after reading the whole source code for the first time, the labels and references these "goto"s point to are determined.
The question is whether we can make the compiler fill in the X in the "goto X" statements in a single pass.
The answer is yes.
If we don't use backpatching, this can be achieved with a two-pass analysis of the source code. Backpatching instead lets us create and maintain a separate list dedicated to "goto" statements. Since everything is done in one pass, the compiler cannot fill in the X in a "goto X" statement when it first emits it, because it doesn't yet know where X is. It does, however, record the statement in that list, and once X has been found, the X is replaced by the corresponding address or reference.
Backpatching is the process of leaving blank entries for goto instructions whose target address is unknown during the forward transfer in the first pass, and filling in these unknowns in the second pass.
Backpatching:
The syntax directed definition can be implemented in two or more passes (we have both synthesized attributes and inherited attributes).
Build the tree first.
Walk the tree in the depth-first order.
The main difficulty with code generation in one pass is that we may not know the target of a branch when we generate code for flow of control statements
Backpatching is the technique to get around this problem.
Generate branch instructions with empty targets
When the target is known, fill in the label of the branch instructions (backpatching).
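The two steps above can be sketched for a single if/else statement. This is a minimal illustration in Python, not the scheme of any particular compiler; the `Emitter` class, the `gen_if_else` helper and the textual instruction format are all assumptions made for the example.

```python
class Emitter:
    """Collects instructions as [operation, jump_target] pairs."""

    def __init__(self):
        self.code = []

    def next_instr(self):
        return len(self.code)

    def emit(self, op, target=None):
        # Jumps are emitted with target=None and patched later.
        self.code.append([op, target])
        return len(self.code) - 1

    def backpatch(self, pending, target):
        # Fill in the now-known target for every pending jump.
        for index in pending:
            self.code[index][1] = target


def gen_if_else(em, cond, then_ops, else_ops):
    false_list = [em.emit(f"if_false {cond} goto")]   # target unknown yet
    for op in then_ops:
        em.emit(op)
    end_list = [em.emit("goto")]                      # target unknown yet
    em.backpatch(false_list, em.next_instr())         # else branch starts here
    for op in else_ops:
        em.emit(op)
    em.backpatch(end_list, em.next_instr())           # statement ends here


em = Emitter()
gen_if_else(em, "x < 0", ["y = -x"], ["y = x"])
for index, (op, target) in enumerate(em.code):
    print(index, op, "" if target is None else target)
```

After generation, instruction 0 jumps to 3 (the start of the else branch) and instruction 2 jumps to 4 (the first instruction after the statement), all in a single pass over the source.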
Backpatching is a process in which the operand field of an instruction containing a forward reference is left blank initially. The address of the forward-referenced symbol is put into this field when its definition is encountered in the program.
Back patching is the activity of filling in the unspecified information of labels by using the appropriate semantic actions during the code generation process.
It is applied to:
Boolean expressions.
Flow-of-control statements.
For example, in the simple code below, fun1 takes two numbers as input, adds them together, and passes the result to fun2, which prints it. var1_in is local to each function, so is it OK to use the name var1_in in both functions, or is it better practice to call them different things?
fun1 <- function (var1_in, var2_in) {
var3 = var1_in + var2_in
fun2(var3)
}
fun2 <- function (var1_in) {
var4 = var1_in
print(var4)
}
As long as the functions are short enough to easily understand, identifying the scope of local variables and parameters will be easy as well. But there isn't a hard and fast rule for this. What's important is that the code is easy to understand and that the names of variables are relevant and meaningful, regardless of whether this means name duplication. Modern IDEs will also help here by highlighting the instances of such variables, making it easy to see their declaration and various usage points. The point being, I would focus more on quality and meaningful naming than on avoiding duplication of variable names.
EDIT - Of course, one situation to avoid would be naming a local variable or parameter the same as a global variable. This can confuse things greatly and lead to many a subtle bug.