This isn't a question about a specific language, but about programming in general. It came up after I got into an argument with a friend about "function prototypes", which I learnt about recently in my C++ course. I told him that prototypes are function headers that you have to write at the beginning of your code so the compiler allocates some space at runtime before getting to the actual function. We then started wondering whether other programming languages (like Java or Python) that do not use function prototypes, as far as we know, actually have a similar system and simply handle it themselves rather than having the user write them.
So we're curious to know: what are function prototypes after all? Are they only applicable to C/C++, or do other programming languages make use of them? And is it something I'd need to learn more about as a future programmer? Thanks for the help!
With respect to C and C++, the word "prototype" refers to a specific declaration syntax.
In the earliest versions of C, function definitions were written as
int func( arg1, arg2, arg3 ) // no types in argument list, just identifiers
int arg1;
double arg2;
char *arg3;
{
// function body
}
and declarations were written as
int func( ); // no argument list
The function argument list only contained identifiers and no type information - that was supplied separately. Function declarations didn't include arguments, just the return type.
C++ introduced and C later adopted the concept of prototype syntax, where the type information is included in the parameter list in both the definition:
int func( int arg1, double arg2, char *arg3 )
{
// function body
}
and the declaration
int func( int, double, char * );
This allowed compilers to check the number and types of arguments in a function call and issue diagnostics if they didn't match, rather than waiting until runtime to find out if there's a problem.
While the old-style function declaration and definition syntax was supported for a long time (C23 finally removes it), it should not be used for new code development. We're almost to the point where the word "prototype" is redundant, since prototype syntax is the norm rather than the exception.
Statically-typed languages like Fortran and Pascal and Ada all have separate function declarations, but they don't refer to those declarations as prototypes. And again with C, "prototype" refers to a specific style of function declaration and definition, not just a declaration in and of itself.
This is greatly oversimplified, but the reason for function prototypes is "one-pass compilers."
If your compiler makes only one pass through the code, it needs to know what functions are available before it reaches a call to them. This is where prototypes come in.
Compilers that make multiple passes through the code build symbol tables that tell them where all the functions are, so there's no need for function prototypes.
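To make the declare-before-use rule concrete, here is a minimal C++ sketch (the names is_even and is_odd are made up for illustration). The two functions call each other, so whichever is defined first has to call one the compiler has not seen yet. The prototype on the first line is what lets that call compile.

```cpp
// Prototype: announces is_odd's signature before its definition appears.
bool is_odd(unsigned n);

// is_even calls is_odd, which is only defined further down the file.
bool is_even(unsigned n) {
    return n == 0 ? true : is_odd(n - 1);
}

// The definition must match the prototype above.
bool is_odd(unsigned n) {
    return n == 0 ? false : is_even(n - 1);
}
```

Without the prototype, the call inside is_even is an error in C++, because is_odd has not been declared at that point.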
In C compilers, prototypes are used to check the type and number of function parameters (i.e. the function signature). This program:
#include <stdio.h>
int main()
{
printf("%d\n",add(3));
}
int add(int i, int j)
{
return i+j;
}
compiles and executes in Clang 7 with a warning, even though the result is meaningless (i.e. undefined behavior).
Whereas this program, incorporating a function prototype:
#include <stdio.h>
int add (int, int); /* function prototype for add */
int main()
{
printf("%d\n",add(3));
}
int add(int i, int j)
{
return i+j;
}
fails to compile.
C and C++ are compiled to native code and support calling between compilation units (files). To call a function XYZ from a neighboring compilation unit the compiler inserts a reference "calling XYZ" which is later resolved by the linker. But you need to know what to prepare on the stack for the function. The prototype supplies that information without having to compile the whole function.
Early C treated everything as int, and because the C calling convention is caller-cleans-up, you as the caller know how many ints to remove from the stack after a function returns. If you call printf with three arguments without declaring it first, the C compiler can still figure out what code to generate. If you misspell it as vrintf, it will compile but fail to link. So plain C worked to some extent without prototypes; implicit declarations were dropped from the standard in C99, but many compilers still accept them and treat a missing prototype as just a warning.
As C++ can pass all kinds of crazy stuff as function arguments, if you try to call something without explaining its argument types first, the compiler does not know what code to generate and you get an error.
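As a small illustration of "the prototype supplies that information", the sketch below declares two functions and calls them before their definitions are visible. The names half, label_length, call_half and call_label_length are hypothetical. The point is only that the declarations alone tell the compiler which argument conversions to emit.

```cpp
#include <cstddef>
#include <string>

// Declarations only: in a real project these would typically live in a header.
double half(double x);
std::size_t label_length(const std::string& label);

// These calls compile against the declarations alone. The compiler converts
// the int 3 to double and the string literal to std::string for us.
double call_half() { return half(3); }
std::size_t call_label_length() { return label_length("abc"); }

// Definitions, which could just as well sit in another translation unit.
double half(double x) { return x / 2.0; }
std::size_t label_length(const std::string& label) { return label.size(); }
```

If the definitions lived in a different file, the calls would compile the same way and the linker would resolve the references afterwards.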
This outside article is reasonably nice: http://www.cplusplus.com/articles/yAqpX9L8/
Why would one write a C++ lambda with a name so it can be called from somewhere? Would that not defeat the very purpose of a lambda? Is it better to write a function instead there? If not, why? Would a function instead have any disadvantages?
One use of this is to have a function access the enclosing scope.
In C++, we don't have nested functions as we do in some other languages.
Having a named lambda solves this problem.
An example:
#include <iostream>
int main ()
{
int x;
auto fun = [&] (int y) {
return x + y;
};
std::cin >> x;
int t;
std::cin >> t;
std::cout << fun (fun (t));
return 0;
}
Here, the function fun is basically a nested function in main, able to access its local variables.
We can format it so that it resembles a regular function, and use it more than once.
A good reason to use names is to express intent. Then one can check that the lambda does 'the right thing' and the reader can check the intent. Given:
std::string key;
std::map<std::string, int> v;
one can write the following:
std::find_if( v.begin(), v.end(), [&](auto const& elem){ return elem.first == key; } );
but it's hard to tell whether it does 'the right thing'. Whereas if we spell it out:
auto matches_key = [&](auto const& elem){ return elem.first == key; };
std::find_if( v.begin(), v.end(), matches_key );
it is clearer that we do want the equality comparison and the readability is improved.
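For completeness, here is one way the fragment above could be wrapped into something compilable. The helper name lookup and the -1 "not found" value are assumptions made for this sketch, and it needs C++14 or later for the generic lambda.

```cpp
#include <algorithm>
#include <map>
#include <string>

// Hypothetical helper: returns the value stored under `key`, or -1 if absent.
int lookup(const std::map<std::string, int>& v, const std::string& key) {
    // The name states the intent; the body can be checked against it.
    auto matches_key = [&](const auto& elem) { return elem.first == key; };

    auto it = std::find_if(v.begin(), v.end(), matches_key);
    return it == v.end() ? -1 : it->second;
}
```

For a std::map, v.find(key) would normally be the better tool; find_if is kept here only to mirror the example being discussed.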
I see three things to consider when choosing between a named lambda and a free function:
1. Do you need variables from the surrounding scope? If yes, choose a lambda and leverage its closure. Otherwise, go with a free function (because of 3.).
2. Could the closure state equally well be passed as a function parameter? If yes, consider preferring a free function (because of 3.).
3. Do you want to write a test for the callable and/or reuse it in multiple translation units? If yes, choose a free function, because you must declare it in a header file, and capturing variables in a lambda closure
   - is a bit confusing in a header file (though this is debatable, of course).
   - requires the types to be known. You therefore can't get by with forward declarations of function parameters and return types to reduce compilation times.
When your lambda is itself a recursive function, you need to give it a name. Also, a plain auto declaration won't suffice for a lambda that captures itself by reference; the straightforward way is to declare it as a std::function with the return type and the argument list.
Below is the example for a function that returns the Nth Fibonacci number:
std::function<int(int)> fibonacci = [&](int n) {
if (n == 0 || n == 1) {
return 1;
} else {
return fibonacci(n - 1) + fibonacci(n - 2);
}
};
You have to give it a name in order to capture it with &. And auto won't work here, because the variable's type would have to be deduced from an initializer that already uses the variable.
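As a variation, assuming C++14 or later, a generic lambda can receive itself as an argument, which avoids std::function altogether. This is only a sketch: the wrapper name fibonacci is chosen for illustration, and it keeps the same base cases as the std::function version above.

```cpp
// Hypothetical wrapper around a self-passing recursive lambda.
int fibonacci(int n) {
    // The lambda receives itself as `self`, so it needs no capture and
    // its type can be deduced with auto. The explicit return type keeps
    // the recursive calls well-formed.
    auto fib = [](const auto& self, int k) -> int {
        if (k == 0 || k == 1) {
            return 1;
        }
        return self(self, k - 1) + self(self, k - 2);
    };
    return fib(fib, n);
}
```

The lambda still has a name here; the difference is only that it is passed in explicitly instead of being captured by reference.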
This is basically an opinion-based question. It's up to you whether you prefer functions or lambdas; they are equivalent. A lambda shines when you need variables from the surrounding scope. You can just capture them instead of passing them as parameters, which is neat.
But besides that, there is no difference.
When tuning a C++ application, a named lambda is easier to tune and trace than an anonymous (unnamed) lambda.
I always consider lambdas a nicety: I did plenty of C++ coding without them before they were introduced. So in some ways, I don't consider that there are many shoulds or shouldn'ts surrounding them. They are there to use however they make your life easier.
One time I use named lambdas is to scope a function, i.e. the lambda is only going to be used within another function. Perhaps it does something a little dangerous that you don't want other functions to have access to, or perhaps you don't want to pollute the namespace.
If your lambda is too long to be an easy one-liner, but you don't want it to be available outside of your scope, then a named lambda is an ideal way to produce tidy, easy-to-read code.
I've been playing around with the reflect package, and I notice how limited the functionality of functions is.
package main
import (
"fmt"
"reflect"
"strings"
)
func main() {
v := reflect.ValueOf(strings.ToUpper)
fmt.Printf("Address: %v\n", v) // 0xd54a0
fmt.Printf("Can set? %t\n", v.CanSet()) // false
fmt.Printf("Can address? %t\n", v.CanAddr()) // false
fmt.Printf("Element? %v\n", v.Elem()) // panics: Elem is only valid for interface and pointer kinds
}
I've been taught that functions are addresses to memory with a set of instructions (hence v prints out 0xd54a0), but it looks like I can't get an address to this memory address, set it, or dereference it.
So, how are Go functions implemented under the hood? Eventually, I'd ideally want to manipulate the strings.ToUpper function by making the function point to my own code.
Disclaimers:
I've only recently started to delve deeper into the golang compiler, more specifically: the go assembler and mapping thereof. Because I'm by no means an expert, I'm not going to attempt explaining all the details here (as my knowledge is most likely still lacking). I will provide a couple of links at the bottom that might be worth checking out for more details.
What you're trying to do makes very, very little sense to me. If, at runtime, you're trying to modify a function, you're probably doing something wrong earlier on. And that applies to messing with any function. The fact that you're trying to do something with a function from the strings package makes this all the more worrying. The reflect package allows you to write very generic functions (e.g. a service with request handlers where you want to pass arbitrary arguments to those handlers: this requires a single entry-point handler that processes the raw request and then calls the corresponding handler. You cannot possibly know what that handler looks like, so you use reflection to work out the arguments required...).
Now, how are functions implemented?
The Go compiler is a tricky codebase to wrap one's head around, but thankfully the language design and its implementation have been discussed openly. From what I gather, Go functions are essentially compiled in pretty much the same way as a function in, for example, C. However, calling a function is a bit different. Go functions are first-class objects; that's why you can pass them as arguments, declare a function type, and why the reflect package has to allow you to use reflection on a function argument.
Essentially, functions are not addressed directly. Functions are passed and invoked through a function "pointer". A function value is effectively a reference type, similar to a map or a slice: it holds a pointer to the actual code, plus some associated data. In simple terms, think of a function as a type (in pseudo-code; the real runtime layout differs, and the argument and return types live in the static type rather than in each value):
type SomeFunc struct {
actualFunc *func(...) // pointer to actual function body
data struct {
args []interface{} // arguments
rVal []interface{} // returns
// any other info
}
}
This means that the reflect package can be used to, for example, count the number of arguments and return values the function expects. It also tells you what the return value(s) will be. The overall function "type" will be able to tell you where the function resides, and what arguments it expects and returns, but that's about it. IMO, that's all you really need though.
Because of this implementation, you can create fields or variables with a function type like this:
var callback func(string) string
This would create an underlying value that, based on the pseudo code above, looks something like this:
callback := struct{
actualFunc: nil, // no actual code to point to, function is nil
data: struct{
args: []interface{}{string}, // where string is a value representing the actual string type
rVal: []interface{}{string},
},
}
Then, by assigning any function that matches the args and rVal constraints, you can determine what executable code the callback variable points to:
callback = strings.ToUpper
callback = func(a string) string {
return fmt.Sprintf("a = %s", a)
}
callback = myOwnToUpper
I hope this cleared 1 or 2 things up a bit, but if not, here's a bunch of links that might shed some more light on the matter.
Go functions implementation and design
Introduction to go's ASM
Rob Pike on the go compiler written in go, and the plan 9 derived asm mapping
Writing a JIT in go asm
a "case study" attempting to use golang ASM for optimisation
Go and assembly introduction
Plan 9 assembly docs
Update
Seeing as you're attempting to swap out a function you're using for testing purposes, I would suggest you not use reflection, but instead inject mock functions, which is a more common practice WRT testing to begin with. Not to mention it being so much easier:
type someT struct {
toUpper func(string) string
}
func New(toUpper func(string) string) *someT {
if toUpper == nil {
toUpper = strings.ToUpper
}
return &someT{
toUpper: toUpper,
}
}
func (s *someT) FuncToTest(t string) string {
return s.toUpper(t)
}
This is a basic example of how you could inject a specific function. From within your foo_test.go file, you'd just call New, passing a different function.
In more complex scenarios, using interfaces is the easiest way to go. Simply implement the interface in the test file, and pass the alternative implementation to the New function:
type StringProcessor interface {
ToUpper(string) string
Join([]string, string) string
// all of the functions you need
}
func New(sp StringProcessor) *someT {
return &someT{
processor: sp,
}
}
From that point on, simply create a mock implementation of that interface, and you can test everything without having to muck about with reflection. This makes your tests easier to maintain and, because reflection is complex, it makes it far less likely for your tests to be faulty.
If your test is faulty, it could cause your actual tests to pass, even though the code you're trying to test isn't working. I'm always suspicious if the test code is more complex than the code you're covering to begin with...
Underneath the covers, a Go function is probably just as you describe it: an address to a set of instructions in memory, with parameters and return values filled in according to your system's linkage conventions as the function executes.
However, Go's function abstraction is much more limited, on purpose (it's a language design decision). You can't just replace functions, or even override methods from other imported packages, like you might do in a normal object-oriented language. You certainly can't do dynamic replacement of functions under normal circumstances (I suppose you could write into arbitrary memory locations using the unsafe package, but that's willful circumvention of the language rules, and all bets are off at that point).
Are you trying to do some sort of dependency injection for unit testing? If so, the idiomatic way to do this in Go is to define an interface that you pass around to your functions/methods, and replace it with a test version in your tests. In your case, an interface may wrap the call to strings.ToUpper in the normal implementation, but a test implementation might call something else.
For example:
type Upper interface {
ToUpper(string) string
}
type defaultUpper struct {}
func (d *defaultUpper) ToUpper(s string) string {
return strings.ToUpper(s)
}
...
// normal implementation: pass in &defaultUpper{}
// test implementation: pass in a test version that
// does something else
func SomethingUseful(s string, d Upper) string {
return d.ToUpper(s)
}
Finally, you can also pass function values around. For example:
var fn func(string) string
fn = strings.ToUpper
...
fn("hello")
... but this won't let you replace the system's strings.ToUpper implementation, of course.
Either way, you can only sort of approximate what you want to do in Go via interfaces or function values. It's not like Python, where everything is dynamic and replaceable.
Or are there pointers and references, like in C?
I'm trying to get started with Vala, and it would be good to know whether Vala is "pass by reference" or "pass by value".
First of all, you should understand that the default Vala compiler, valac, compiles to C (as an intermediate language). The code is then compiled using a C compiler (usually gcc).
valac -C example.vala will compile to example.c
So you can inspect the produced C code yourself.
Now to the real question:
Vala supports both call-by-value and call-by-reference. It is even a bit more fine grained than that.
Let's take an example using a plain C data type (int).
Call-by-value:
public void my_func (int value) {
// ...
}
The value will be copied into the function, no matter what you do with value inside my_func it won't affect the caller.
Call-by-reference using ref:
public void my_func (ref int value) {
// ...
}
The address will be copied into the function. Everything you do with value inside my_func will be reflected on the caller side as well.
Call-by-reference using out:
public void my_func (out int value) {
// ...
}
Basically the same as ref, but the value doesn't have to be initialized before calling my_func.
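As a rough analogy only (this is C++, not Vala, and not the C that valac actually generates), the three parameter modes above behave much like the following. All function names here are invented for the sketch.

```cpp
// Copy of the caller's value: changes stay local (like a plain Vala parameter).
void by_value(int value) { value = 99; }

// Reference to the caller's variable: changes are visible (like `ref`).
void by_ref(int& value) { value += 1; }

// Also a reference, but only written to, never read (like `out`).
void by_out(int& value) { value = 42; }

// Small drivers that return what the caller sees afterwards.
int after_by_value() { int n = 1; by_value(n); return n; }  // n is unchanged
int after_by_ref()   { int n = 1; by_ref(n);   return n; }  // n was incremented
int after_by_out()   { int n;     by_out(n);   return n; }  // n was assigned inside
```

Note that after_by_out can leave n uninitialized before the call, which mirrors the one practical difference between out and ref described above.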
For GObject based data types (non-static classes) it gets more complicated, because you have to take memory management into account.
Since those are always managed using pointers (implicitly), the ref and out modifiers now reflect how the (implicit) pointer is passed.
It adds one more level of indirection so to speak.
string and array data types are also internally managed using pointers and automatic reference counting (ARC).
Though discouraged, Vala also does support pointers, so you can have an int * or MyClass * just like in C.
Technically, it is pass by value, since the underlying code is converted to C. Simple types (numeric types, booleans, enums, flags) are passed by value. Strings are passed by reference, but since they are immutable, they might as well be pass by value.
However, arrays, objects, and structs are all passed using pointers in C, so they are pass by reference. There are also the ref and out modifiers for function parameters, which force those parameters to be passed by reference.
When there are no proper, up-to-date tutorials for some particular library functions (in my case, the latest allegro5), how can one learn by oneself how to call and use those functions? Is there some clue in the header files?
thanks in advance
The header files are going to provide you with the bare minimum information required to correctly compile a program with those functions. They contain the types, constants, and function prototypes. Nothing (short of comments) is going to explain how to correctly use the functions, just how to call them.
General
For example, if you see:
int do_something(int n, const char* desc);
You can only infer that you need to pass an integer n and a (C) string desc. That function returns an integer as well.
For a more complex example:
typedef struct {
int foo;
double bar;
} blam_t;
void munge(blam_t info);
You know that munge takes one argument of type blam_t which is a custom structure, as defined above. You could use that to create a blam_t variable and pass it to munge():
blam_t myvar;
myvar.foo = 42;
myvar.bar = 0.67;
munge(myvar);
Allegro5
If we look at the source of include/allegro5/display.h we see things like this:
AL_FUNC(void, al_set_new_display_flags, (int flags));
This is an uncommon way of declaring functions. They are using a macro AL_FUNC to declare their functions. We see (by clicking on it) that AL_FUNC is defined as:
#define AL_FUNC(type, name, args) type name args
So that first example basically becomes:
void al_set_new_display_flags(int flags);
And we can call it with just an integer argument.
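The same expansion can be tried in isolation. The sketch below uses a made-up macro MY_FUNC with the same shape as the AL_FUNC definition quoted above, and a made-up function combine_flags; neither is part of Allegro.

```cpp
// Hypothetical macro with the same shape as the AL_FUNC definition quoted above.
#define MY_FUNC(type, name, args) type name args

// Expands to the ordinary prototype: int combine_flags (int a, int b);
MY_FUNC(int, combine_flags, (int a, int b));

// The definition is written normally and must match the expanded prototype.
int combine_flags(int a, int b) {
    return a | b;
}
```

Running only the preprocessor over a header (for example with the -E option of gcc or clang) is a handy way to see what such macros expand to.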
Without any documentation, you can only hope to learn by trying the functions. Then this becomes more a reverse engineering task.
Is it currently possible to override the structure constructor in Fortran? I have seen proposed examples like this (such as in the Fortran 2003 spec):
module mymod
type mytype
integer :: x
! Other stuff
end type
interface mytype
module procedure init_mytype
end interface
contains
type(mytype) function init_mytype(i)
integer, intent(in) :: i
if(i > 0) then
init_mytype%x = 1
else
init_mytype%x = 2
end if
end function
end
program test
use mymod
type(mytype) :: x
x = mytype(0)
end program
This basically generates a heap of errors due to redundant variable names (e.g. Error: DERIVED attribute of 'mytype' conflicts with PROCEDURE attribute at (1)). A verbatim copy of the fortran 2003 example generates similar errors. I've tried this in gfortran 4.4, ifort 10.1 and 11.1 and they all produce the same errors.
My question: is this just an unimplemented feature of fortran 2003? Or am I implementing this incorrectly?
Edit: I've come across a bug report and an announced patch to gfortran regarding this issue. However, I've tried using a November build of gcc46 with no luck and similar errors.
Edit 2: The above code appears to work using Intel Fortran 12.1.0.
Is it currently possible to override the structure constructor in Fortran?
No. In any case, your approach is not really about constructor overriding. The main reason is that a structure constructor ≠ an OOP constructor. There is some similarity, but it is a different idea.
You cannot use a non-intrinsic function in an initialization expression. You can use only constants, array or structure constructors, intrinsic functions, ... For more information, take a look at 7.1.7 Initialization expression in the Fortran 2003 draft.
Taking that fact into account, I completely fail to understand what the real difference is between
type(mytype) :: x
x = mytype(0)
and
type(mytype) :: x
x = init_mytype(0)
and what the whole point of using an INTERFACE block inside the mymod MODULE is.
Well, honestly speaking, there is a difference, and a huge one: the first way is misleading. This function is not a constructor (because there are no OOP constructors at all in Fortran); it is an initializer.
In mainstream OOP, a constructor is responsible for doing two things in sequence:
Memory allocation.
Member initialization.
Let's take a look at some examples of instantiating classes in different languages.
In Java:
MyType mt = new MyType(1);
a very important fact is hidden: the fact that the object is actually a pointer to a variable of a class type. The equivalent in C++ is allocation on the heap using:
MyType* mt = new MyType(1);
But in both languages one can see that the two constructor duties are reflected even at the syntax level. The expression consists of two parts: the keyword new (allocation) and the constructor name (initialization). In Objective-C syntax this fact is emphasized even more:
MyType* mt = [[MyType alloc] init:1];
Many times, however, you can see some other form of constructor invocation. In the case of allocation on the stack, C++ uses a special (very poor) syntax construction
MyType mt(1);
which is actually so misleading that we can just not consider it.
In Python
mt = MyType(1)
both the fact that the object is actually a pointer and the fact that allocation takes place first are hidden (at the syntax level). And this method is called ... __init__! O_O So misleading. C++ stack allocation fades in comparison with that one. =)
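The two duties (allocation, then initialization) can be made explicit in C++ by splitting what new normally does in one go. The sketch below is only an illustration: MyType here is a stand-in whose constructor mimics the rule from the question's init_mytype, and two_step is a made-up helper.

```cpp
#include <new>

// Stand-in type: the constructor applies the same rule as init_mytype.
struct MyType {
    int x;
    explicit MyType(int i) : x(i > 0 ? 1 : 2) {}
};

// Hypothetical helper that performs the two duties as separate steps.
int two_step(int i) {
    void* raw = ::operator new(sizeof(MyType));  // 1. allocation only
    MyType* mt = new (raw) MyType(i);            // 2. initialization only (placement new)

    int result = mt->x;

    mt->~MyType();           // undo step 2: destroy the object
    ::operator delete(raw);  // undo step 1: release the memory
    return result;
}
```

In everyday code a plain new MyType(i), or better a smart pointer, does both steps at once; the split form is shown only to make the two responsibilities visible.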
Anyway, the idea of having constructors in a language implies the ability to do allocation and initialization in one statement using some special kind of method. And if you think that this is the "true OOP" way, I have bad news for you. Even Smalltalk doesn't have constructors. It is just a convention to have a new method on the classes themselves (they are singleton objects of metaclasses). The Factory design pattern is used in many other languages to achieve the same goal.
I read somewhere that the concept of modules in Fortran was inspired by Modula-2. And it seems to me that the OOP features were inspired by Oberon-2. There are no constructors in Oberon-2 either. But there is, of course, pure allocation with the predeclared procedure NEW (like ALLOCATE in Fortran, except that ALLOCATE is a statement). After allocation you can (and in practice should) call some initializer, which is just an ordinary method. Nothing special there.
So you can use some sort of factory to initialize objects. That is what you actually did, using modules instead of singleton objects. Or it is better to say that they (Java/C#/... programmers) use methods of singleton objects instead of ordinary functions due to the lack of the latter (no modules means no way to have ordinary functions, only methods).
You can also use a type-bound SUBROUTINE instead.
MODULE mymod
TYPE mytype
PRIVATE
INTEGER :: x
CONTAINS
PROCEDURE, PASS :: init
END TYPE
CONTAINS
SUBROUTINE init(this, i)
CLASS(mytype), INTENT(OUT) :: this
INTEGER, INTENT(IN) :: i
IF(i > 0) THEN
this%x = 1
ELSE
this%x = 2
END IF
END SUBROUTINE init
END
PROGRAM test
USE mymod
TYPE(mytype) :: x
CALL x%init(1)
END PROGRAM
INTENT(OUT) for the this argument of the init SUBROUTINE seems fine, because we expect this method to be called only once, right after allocation. It might be a good idea to verify that this assumption holds: add a boolean flag LOGICAL :: inited to mytype, check that it is .false. and set it to .true. upon first initialization, and do something else on an attempt at re-initialization. I definitely remember a thread about this in Google Groups, but I cannot find it.
I consulted my copy of the Fortran 2008 standard. That does allow you to define a generic interface with the same name as a derived type. My compiler (Intel Fortran 11.1) won't compile the code though so I'm left suspecting (without a copy of the 2003 standard to hand) that this is an as-yet-unimplemented feature of the Fortran 2003 standard.
Besides that, there is an error in your program. Your function declaration:
type(mytype) function init_mytype
integer, intent(in) :: i
specifies the existence and intent of an argument which is not present in the function specification, which should perhaps be rewritten as:
type(mytype) function init_mytype(i)