Are fork...join statements allowed in functions in SystemVerilog?

They are definitely allowed in tasks, but I could not find whether they are allowed in functions.
Thanks in advance for your help.

Yes, fork...join_none is allowed within functions.
A fork block can only be used in a function if it is matched with a join_none. The reason is that functions must execute in zero time; because the statements in a fork...join_none are spawned into a separate thread/process, the function can still complete in zero time.
This is clearly stated in IEEE 1800-2012, section 13.4.4, Background processes spawned by function calls:
Functions shall execute with no delay. Thus, a process calling a function shall return immediately. Statements that do not block shall be allowed inside a function; specifically, nonblocking assignments, event triggers, clocking drives, and fork-join_none constructs shall be allowed inside a function.

My simulation tool allows fork...join_none in functions, but it issues a warning that fork...join (and probably fork...join_any) will be converted to begin...end. I couldn't find anything in the standard about this, which is most likely why I don't get a strict compile error.
Be careful, as different simulator vendors may implement different rules. In two of the big three simulators, fork...join_none in functions definitely works. fork...join/join_any doesn't make sense in the context of a function, so I would avoid it altogether.

Tcl_DoOneEvent is blocked if tkwait / vwait is called

There is an external C++ function that is called from Tcl/Tk and does some stuff in a noticeable amount of time. Tcl caller has to get the result of that function so it waits until it's finished. To avoid blocking of GUI, that C++ function has some kind of event loop implemented in its body:
while (m_curSyncProc.isRunning()) {
    const clock_t tm = clock();
    while (Tcl_DoOneEvent(TCL_ALL_EVENTS | TCL_DONT_WAIT) > 0) {} // <- stuck here in case of tkwait/vwait
    // Pause for 10 ms to avoid 100% CPU usage
    if (double(clock() - tm) / CLOCKS_PER_SEC < 0.005) {
        const struct timespec ts = {0, 10 * 1000 * 1000}; // 10 ms
        nanosleep(&ts, nullptr);
    }
}
Everything works great unless tkwait/vwait is in action in the Tcl code.
For example, for dialogs, tkwait variable someVariable is used to wait until the Ok/Close/<whatever> button is pressed. I see that even the standard Tk bgerror uses the same method (it uses vwait).
The problem is that, once called, Tcl_DoOneEvent does not return while the Tcl code is waiting at a tkwait/vwait line; otherwise it works well. Is it possible to fix this in that event loop without totally redesigning the C++ code? That code is rather old and complicated, and its author is not reachable anymore.
Beware! This is a complex topic!
The Tcl_DoOneEvent() call is essentially what vwait, tkwait and update are thin wrappers around (passing different flags and setting up different callbacks). Nested calls to any of them create nested event loops; you don't really want those unless you're supremely careful. An event loop only terminates when it is not processing any active event callbacks, and if those event callbacks create inner event loops, the outer event loop will not get to do anything at all until the inner one has finished.
As you're taking control of the outer event loop (in a very inefficient way, but oh well), you really want the inner event loops not to run at all. There are three possible ways to deal with this; I suspect that the third (coroutines) will be the most suitable for you, and that the first is what you're really trying to avoid, but that's definitely your call.
1. Continuation Passing
You can rewrite the inner code into continuation-passing style — a big pile of procedures that hands off from step to step through a state machine/workflow — so that it doesn't actually call vwait (and friends). The only one of the family that tends to be vaguely safe is update idletasks (which is really just Tcl_DoOneEvent(TCL_IDLE_EVENTS | TCL_DONT_WAIT)) to process Tk internally-generated alterations.
This option was your main choice up to Tcl 8.5, and it was a lot of work.
2. Threads
You can move to a multi-threaded application. This can be easy… or very difficult; the details depend on an examination of what you're doing throughout the application.
If going this route, remember that Tcl interpreters and Tcl values are totally thread-bound; they internally use thread-specific data so that they can avoid big global locks. This means that threads in Tcl are comparatively expensive to set up, but actually use multiple CPUs very efficiently afterwards; thread pooling is a very common approach.
3. Coroutines
Starting in 8.6, you can put the inner code in a coroutine. Almost everything in 8.6 is coroutine-aware (“non-recursive” in our internal lingo) by default (including commands you wouldn't normally think of, such as source) and once you've done that, you can replace the vwait calls with equivalents from the Tcllib coroutine package and things will typically “just work”. (For example, vwait var becomes coroutine::vwait var, and after 123 becomes coroutine::after 123.)
The only things that don't have direct replacements are tkwait window and tkwait visibility; you'll need to simulate those with waiting for a <Destroy> or <Visibility> event (the latter is uncommon as it is unsupported on some platforms), which you do by binding a trivial callback on those that just sets a variable that you can coroutine::vwait on (which is essentially all that tkwait does internally anyway).
Coroutines can become messy in a few cases, such as when you've got C code that is not coroutine-aware. The main places in Tcl where these come into play are in trace callbacks, inter-interpreter calls, and the scripted implementations of channels; the issue there is that the internal APIs these sit behind are rather complicated already (especially channels) and nobody's felt up to wading in and enabling a non-recursive implementation.

How many arguments are passed in a function call?

I wish to analyze assembly code that calls functions, and for each 'call' find out how many arguments are passed to the function. I assume that the target functions are not accessible to me, but only the calling code.
I limit myself to code that was compiled with GCC only, and to System V ABI calling convention.
I tried scanning back from each 'call' instruction, but I failed to find a good enough convention (e.g., where do I stop scanning? what happens on two subsequent calls with the same arguments?). Assistance is highly appreciated.
Reposting my comments as an answer.
You can't reliably tell in optimized code. And even doing a good job most of the time probably requires human-level AI. E.g., did a function leave a value in RSI because it's a second argument, or was it just using RSI as a scratch register while computing a value for RDI (the first argument)? As Ross says, gcc-generated code for stack-args calling conventions has more obvious patterns, but still nothing easy to detect.
It's also potentially hard to tell the difference between stores that spill locals to the stack vs. stores that store args to the stack (since gcc can and does use mov stores for stack-args sometimes: see -maccumulate-outgoing-args). One way to tell the difference is that locals will be reloaded later, but args are always assumed to be clobbered.
what happens on two subsequent calls with the same arguments?
Compilers always re-write args before making another call, because they assume that functions clobber their args (even on the stack). The ABI says that functions "own" their args. Compilers do make code that does this (see comments), but compiler-generated code isn't always willing to re-purpose the stack memory holding its args for storing completely different args in order to enable tail-call optimization. :( This is hand-wavey because I don't remember exactly what I've seen as far as missed tail-call optimization opportunities.
Yet if arguments are passed by the stack, then it shall probably be the easier case (and I conclude that all 6 registers are used as well).
Even that isn't reliable. The System V x86-64 ABI is not simple.
int foo(int, big_struct, int) would pass the two integer args in regs, but pass the big struct by value on the stack. FP args are also a major complication. You can't conclude that seeing stuff on the stack means that all 6 integer arg-passing slots are used.
The Windows x64 ABI is significantly different: For example, if the 2nd arg (after adding a hidden return-value pointer if needed) is integer/pointer, it always goes in RDX, regardless of whether the first arg went in RCX, XMM0, or on the stack. It also requires the caller to leave "shadow space".
So you might be able to come up with some heuristics that will work OK for un-optimized code. Even that will be hard to get right.
For optimized code generated by different compilers, I think it would be more work to implement anything even close to useful than you'd ever save by having it.

What are the advantages of inline?

I've been reading a bit about how to use the inline metadata when using the ASC 2.0 compiler.
However, I can't find any source of info explaining why I should use it.
Does anyone know?
Functions induce overhead in any programming language. In ActionScript, when function execution begins, a number of objects and properties are created.
First, an activation object is created that stores the parameters and local variables declared in a function body. It's an internal mechanism that cannot be directly accessed.
Second, a scope chain is created that contains an ordered list of objects that the Flash platform checks for identifier declarations. Every function that executes has a scope chain that is stored in an internal property.
Function closures maintain a snapshot of a function and its lexical environment.
Moving code inline reduces the creation of these objects and the number of references maintained on the stack. In Flash, you may see a 4x performance increase.
Of course, there are tradeoffs: using the inline keyword induces code complexity, and inlining code increases the amount of bytecode. Besides producing larger applications, the virtual machine spends additional time verifying and JIT-compiling the extra code.
To simplify, inline is some sort of copy/paste of code. Since method calls are expensive and cost execution time, using the inline keyword will copy/paste the body of the method wherever the method call appears in your code, so the call is replaced by the method body. Since this is done at compile time, it will in theory increase the size of the resulting app (if an inline method is called 10 times, its body is copied and pasted 10 times), but since all calls are replaced you gain execution speed. This is of course only relevant for performance-critical code, such as loops running on every frame.

What is the difference between message-passing and method-invocation?

Is there a difference between message-passing and method-invocation, or can they be considered equivalent? This is probably specific to the language; many languages don't support message-passing (though all the ones I can think of support methods) and the ones that do can have entirely different implementations. Also, there are big differences in method-invocation depending on the language (C vs. Java vs. Lisp vs. your favorite language). I believe this is language-agnostic. What can you do with a passed message that you can't do with an invoked method, and vice versa (in your favorite language)?
Using Objective-C as an example of messages and Java for methods, the major difference is that when you pass messages, the object decides how it wants to handle that message (this usually results in an instance method of the object being called).
In Java, however, method invocation is a more static thing, because you must have a reference to an object of the type you are calling the method on, and a method with the same name and type signature must exist in that type, or the compiler will complain. What is interesting is that the actual call is dynamic, although this is not obvious to the programmer.
For example, consider a class such as
class MyClass {
    void doSomething() {}
}

class AnotherClass {
    void someMethod() {
        Object object = new Object();
        // object.doSomething(); // the compiler checks and complains that Object contains no such method
        // However, through an explicit cast, you can calm the compiler down,
        // even though your program will crash at runtime
        ((MyClass) object).doSomething(); // syntactically valid, yet incorrect
    }
}
In Objective-C, however, the compiler simply issues a warning when you pass a message to an object that it thinks may not understand it, and ignoring the warning doesn't stop your program from executing.
While this is very powerful and flexible, it can result in hard-to-find bugs when used incorrectly, because of stack corruption.
Adapted from the article here.
Also see this article for more information.
As a first approximation, the answer is: none, as long as you "behave normally".
Even though many people think there is - technically, it is usually the same: a cached lookup of a piece of code to be executed for a particular named-operation (at least for the normal case). Calling the name of the operation a "message" or a "virtual-method" does not make a difference.
BUT: the Actor language is really different: by having active objects (every object has an implicit message queue and a worker thread, at least conceptually), parallel processing becomes easier to handle (google also "communicating sequential processes" for more).
BUT: in Smalltalk, it is possible to wrap objects to make them actor-like, without actually changing the compiler, the syntax or even recompiling.
BUT: in Smalltalk, when you try to send a message which is not understood by the receiver (i.e. "someObject foo:arg"), a message object is created, containing the name and the arguments, and that message object is passed as the argument to the "doesNotUnderstand" message. Thus, an object can decide itself how to deal with unimplemented message sends (aka calls of an unimplemented method). It can, of course, push them into a queue for a worker process to sequentialize them...
Of course, this is impossible with statically typed languages (unless you make very heavy use of reflection), but it is actually a VERY useful feature. Proxy objects, code loading on demand, remote procedure calls, learning and self-modifying code, adapting and self-optimizing programs, CORBA and DCOM wrappers, and worker queues are all built upon that scheme. It can be misused, and lead to runtime bugs - of course.
So it is a two-sided sword: sharp and powerful, but dangerous in the hands of beginners...
EDIT: I am writing about language implementations here (as in Java vs. Smalltalk), not inter-process mechanisms.
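To make that doesNotUnderstand-style trap concrete outside Smalltalk, here is a minimal Python sketch (illustrative only; MessageProxy, Greeter and the queueing policy are invented for this example) of an object that decides at runtime how to handle "messages" it does not implement, queueing them instead of failing:

import queue

class MessageProxy:
    def __init__(self, target):
        self._target = target
        self._pending = queue.Queue()

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails -- roughly the
        # doesNotUnderstand hook described above.
        def handler(*args, **kwargs):
            if hasattr(self._target, name):
                return getattr(self._target, name)(*args, **kwargs)
            self._pending.put((name, args, kwargs))  # record the unknown message
            return None
        return handler

class Greeter:
    def hello(self, who):
        return f"hello, {who}"

proxy = MessageProxy(Greeter())
print(proxy.hello("world"))    # forwarded to the real method
proxy.frobnicate(42)           # unknown "message" is queued, not a crash
print(proxy._pending.qsize())  # 1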
IIRC, they've been formally proven to be equivalent. It doesn't take a whole lot of thinking to at least indicate that they should be. About all it takes is ignoring, for a moment, the direct equivalence of the called address with an actual spot in memory, and consider it simply as a number. From this viewpoint, the number is simply an abstract identifier that uniquely identifies a particular type of functionality you wish to invoke.
Even when you are invoking functions in the same machine, there's no real requirement that the called address directly specify the physical (or even virtual) address of the called function. For example, although almost nobody ever really uses them, Intel protected mode task gates allow a call to be made directly to the task gate itself. In this case, only the segment part of the address is treated as an actual address -- i.e., any call to a task gate segment ends up invoking the same address, regardless of the specified offset. If so desired, the processing code can examine the specified offset, and use it to decide upon an individual method to be invoked -- but the relationship between the specified offset and the address of the invoked function can be entirely arbitrary.
A member function call is simply a type of message passing that provides (or at least facilitates) an optimization under the common circumstance that the client and server of the service in question share a common address space. The 1:1 correspondence between the abstract service identifier and the address at which the provider of that service resides allows a trivial, exceptionally fast mapping from one to the other.
At the same time, make no mistake about it: the fact that something looks like a member function call doesn't prevent it from actually executing on another machine or asynchronously, or (frequently) both. The typical mechanism to accomplish this is a proxy function that translates the "virtual message" of a member function call into a "real message" that can (for example) be transmitted over a network as needed (e.g., Microsoft's DCOM and CORBA both do this quite routinely).
They really aren't the same thing in practice. Message passing is a way to transfer data and instructions between two or more parallel processes. Method invocation is a way to call a subroutine. Erlang's concurrency is built on the former concept with its Concurrency Oriented Programming.
Message passing most likely involves a form of method invocation, but method invocation doesn't necessarily involve message passing (if it did, it would be message passing). Message passing is one form of performing synchronization between two parallel processes. Method invocation generally means synchronous activity: the caller waits for the method to finish before it can continue. Message passing is a form of coroutine; method invocation is a form of subroutine.
All subroutines are coroutines, but not all coroutines are subroutines.
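A small Python sketch of that contrast (the Counter class, mailbox queue and message names are made up, and a thread stands in for a separate process): a direct method invocation runs synchronously in the caller, while a message posted to a queue is handled asynchronously by a worker.

import queue
import threading

class Counter:
    def __init__(self):
        self.value = 0

    def increment(self, amount):
        self.value += amount

counter = Counter()

# Method invocation: the caller executes the code itself and waits for it.
counter.increment(1)

# Message passing: the caller only enqueues a request and moves on;
# the worker decides how (and when) to handle each message.
mailbox = queue.Queue()

def worker():
    while True:
        msg, arg = mailbox.get()
        if msg == "stop":
            break
        if msg == "increment":
            counter.increment(arg)

t = threading.Thread(target=worker)
t.start()
mailbox.put(("increment", 41))
mailbox.put(("stop", None))
t.join()
print(counter.value)  # 42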
Is there a difference between message-passing and method-invocation, or can they be considered equivalent?
They're similar. Some differences:
Messages can be passed synchronously or asynchronously (e.g. the difference between SendMessage and PostMessage in Windows)
You might send a message without knowing exactly which remote object you're sending it to
The target object might be on a remote machine or O/S.

Why is memoization not a language feature?

I was wondering: why is memoization not provided natively as a language feature in any language I know about?
Edit: to clarify, what I mean is that the language provides a keyword to specify a given function as memoizable, not that every function is automatically memoized "by default" unless specified otherwise. For example, Fortran provides the keyword PURE to specify a specific function as such. I guess the compiler can take advantage of this information to memoize the call, but I don't know what happens if you declare a function with side effects as PURE.
What YOU want from memoization may not be the same as what the compiler memoization option would provide.
You may know that it is only profitable to memoize the last 10 or so distinct values computed, because you know how the function will be used.
You may know that it only makes sense to memoize the last 2 or 3 values, because you will never use values older than that. (The Fibonacci sequence comes to mind.)
You may be generating a LOT of values on some runs, and just a few on others.
You may want to "throw away" some of the memoized values and start over. (I memoized a random number generator this way, so I could replay the sequence of random numbers that built a certain structure, while some other parameters of the structure had been changed.)
Memoization as an optimization depends on the search for the memoized value being a lot cheaper than recomputation of the value. This in turn depends on the ordering of the input requests. This has implications for the memoization database: Does it use a stack, an array of all possible input values (which may be very large), a bucket hash, or a b-tree?
The memoizing compiler has to either provide a "one size fits all" memoization, or it has to provide lots of possible alternatives, and parameters to control the alternatives. At some point, it becomes easier for everyone to require the user to provide his own memoization.
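As a rough Python sketch of the kind of knobs a user-written memoizer can expose (the decorator below is illustrative, not a standard library API): a bounded least-recently-used cache plus an explicit way to throw the memoized values away and start over.

from collections import OrderedDict
import functools

def memoize(maxsize=10):
    def decorator(fn):
        cache = OrderedDict()

        @functools.wraps(fn)
        def wrapper(*args):
            if args in cache:
                cache.move_to_end(args)       # mark as recently used
                return cache[args]
            result = fn(*args)
            cache[args] = result
            if len(cache) > maxsize:
                cache.popitem(last=False)     # evict the least recently used entry
            return result

        wrapper.clear = cache.clear           # "throw away" the memoized values
        return wrapper
    return decorator

@memoize(maxsize=64)   # large enough that nothing is evicted for this call
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040
fib.clear()     # start over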
Because compilers have to emit semantically correct programs. You can't memoize a function without changing program semantics unless it is referentially transparent. In most programming languages not all functions are referentially transparent (pure functional programming languages are an exception) so you can't memoize everything. But then a mechanism is needed for detecting referential transparency and that is too hard.
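A tiny Python illustration of that point (the function names are made up): memoizing a function that is not referentially transparent silently changes what the program means.

import functools
import random

def roll():
    return random.randint(1, 6)

memo_roll = functools.lru_cache(maxsize=None)(roll)

print(roll(), roll())            # two independent rolls
print(memo_roll(), memo_roll())  # always the same value: the semantics changed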
In Haskell, memoization is automatic for (pure) functions you've defined that take no arguments. And the Fibonacci example in that wiki is really about the simplest demonstrable example I could think of, too.
Haskell can do this because your pure functions are defined to produce the same results every time; of course, monadic functions that depend on side effects won't be memoized.
I'm not sure what the upper limits are -- obviously, it won't memoize more than the available memory. And I'm also not sure offhand if the memoization occurs at compile-time (if the values can be determined at compile-time), or if it always occurs the first time the function is called.
Clojure has a memoize function (http://richhickey.github.com/clojure/clojure.core-api.html#clojure.core/memoize):
memoize (function)
Usage: (memoize f)
Returns a memoized version of a referentially transparent function. The memoized version of the function keeps a cache of the mapping from arguments to results and, when calls with the same arguments are repeated often, has higher performance at the expense of higher memory use.
A) Memoization trades space for time. I imagine that this can turn out to be a fairly unbounded property, in the sense that the amount of data programs or libraries would have to store could consume large parts of memory really quickly.
For a couple of languages, memoization is easy to implement and easy to customize for the given requirements.
As an example, take some natural language processing on large bodies of text, where you don't want to compute basic properties of texts (word count, frequency, co-occurrences, ...) over and over again. In that case, memoization in combination with object serialization can be useful, as opposed to memory caching, since you may run your application multiple times on unchanged corpora.
B) Another aspect: it's not true that all functions or methods yield the same output for a given input. In any case, some keyword or syntax for memoization would be necessary, along with configuration (memory limits, invalidation policy, ...).
Because you shouldn't implement something as a language feature when it can easily be implemented in the language itself. A memoization feature belongs in a library, which is exactly where most languages put it.
Your question also leaves open the solution of your learning more languages. I think that Lisp supports memoization, and I know that Mathematica does.
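Python is another example of the library approach: functools.lru_cache memoizes any function whose arguments are hashable, with no language keyword involved.

import functools

@functools.lru_cache(maxsize=None)   # unbounded cache
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))           # returns immediately thanks to the cache
print(fib.cache_info())   # hit/miss statistics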
In order for memoization to work as a language feature, there would be a couple of requirements.
The compiler would need to be able to identify valid functions for memoization (e.g., ones that are referentially transparent).
The run-time would have to be able to intelligently select candidates for memoization without slowing down the overall performance.
There are some assumptions in the other answers, but if we can get performance gains from just-in-time compilation of hot spots in a Java VM, then one can surely write an automated memoization system.
While non-trivial, I think this is all theoretically possible, could yield performance gains in a language (especially an interpreted one), and is a worthwhile area for research.
Not all languages natively support function decorators. I guess decorators would be a more general feature to support than just memoization.
Reverse the question: why should it be? As someone has said, it can be put in a library, so there is no need to add syntax to the language; it's only usable on pure functions, which are hard to identify automatically (unless you force the programmer to annotate them). It's also very hard to determine whether memoization is going to speed things up or not. I don't think it's a desirable feature for a programming language.
I really think such an option should exist.
In data processing tasks there is often immutable input data (time series, for example, where for a given time, as soon as a value is known, it can never change). Given how affordable RAM is today, if a function result depends only on such immutable data, it is rational to memoize it rather than re-read it every time it's needed. Currently I have (in Scala and C#) to manually introduce an in-memory storage table and write three functions instead of one: one reading a value from a file/db/web service, one storing it into the in-memory table, and one wrapping them to read from memory if available or call the raw function if not. I think this could and should be implemented as a keyword and done behind the scenes.
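For what it's worth, the three-function pattern described above looks roughly like this Python sketch (load_price, store_price and get_price are invented names, and the slow lookup is only simulated):

import time

_cache = {}

def load_price(ts):
    # Raw reader: stands in for an expensive file/db/web-service lookup.
    time.sleep(0.1)            # simulate I/O latency
    return hash(ts) % 1000     # placeholder value

def store_price(ts, value):
    # Keep the immutable value in memory once it is known.
    _cache[ts] = value

def get_price(ts):
    # Read-through wrapper: memory first, raw reader only on a miss.
    if ts not in _cache:
        store_price(ts, load_price(ts))
    return _cache[ts]

print(get_price("2024-01-02"))  # slow: goes to the backing store
print(get_price("2024-01-02"))  # fast: served from memory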