Golang Function Types with a binding to a struct?

This might seem like a silly question, but I want to make a struct with a collection of functions, but the functions bind to the struct. I can sorta see that this is a cycle, but humor me with this example:
type FuncType func() error

type FuncSet struct {
	TokenVariable int
	FuncTyper     FuncType
}
and I want to be able to create a function bound to the FuncSet type so it can operate on TokenVariable, thusly:
func (f *FuncSet) FuncType() error {
	f.TokenVariable = 100
	return nil
}
However, this changes the signature of the function (I can't find any information about receiver bindings as part of function type specifications), so assigning this method to the struct field tells me the function/variable is not found.
I can see an easy work-around: prefix the parameters with a pointer to the struct type. It's just a bit ugly.
Looking around a little further, what I'm after is something like a closure, in that it can be passed a variable from the immediately enclosing scope. I'll be glad to be corrected about this apparent absence of receiver bindings in function types, but for now passing a pointer to the type looks like the way to go.
I think I found the solution:
type nullTester func(*Bast, uint32) bool

type Bast struct {
	...
	isNull nullTester
	...
}

func isNull(b *Bast, d uint32) bool {
	return d == 0
}
and then I can bind it to the type like this:
func NewBast() (b *Bast) {
	...
	b.isNull = isNull
	...
}

// IsNull - tests if a value in the tree is null
func (b *Bast) IsNull(d uint32) bool {
	return b.isNull(b, d)
}
It seems a bit hackish, and I'm not sure what will happen in a second library I plan to write that uses a different type for the uint32 parameter, but go vet is happy, so maybe this is the correct way to do it.
It does seem to me that func types should really have a field in the grammar to specify a receiver (binding) type, but maybe I just found a hack that sort of lets me do polymorphism. Calling programs will only see the nice exported method that binds to the type as planned, and I get my readability as well as the ability to retarget the base library to store a different type of data.
I think this is the proper solution. I just can't find anything that confirms or denies whether a type Name func(...) specification can assert a receiver type. It really shouldn't match up, since the binding is part of a method's signature, but the syntax for function types does not appear to allow this binding.
My actual code is here, and you can see by looking at it what I am aiming to do:
https://github.com/calibrae-project/bast/blob/master/pkg/bast/bast.go
The differences between the types of data the tree stores are entirely superficial, because it is intended primarily for sorting unsigned integers of various lengths, and one important requirement is to be able to work from, for example, a 64-bit integer but sort by only the first or last half (as I have a bigger project that treats these hash values as coordinates in an adjacency list). In theory it could also be used instead of a hash table lookup, with low variance in the time to find elements because of the binary tree structure.
It's not a conventional reference-based tree: the store itself is an array with an unconventional power-of-two mapping, a 'dense' tree. The main purpose of implementing it this way is that when the tree is walked or rotated, much of the time sequential blocks of memory are being accessed, which should mean far fewer cache misses than a conventional binary tree (which is why this kind of application usually just uses some kind of sort, like a bucket sort, instead).
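For contrast, here is a minimal sketch (my own, not taken from the linked repository) of the conventional dense, array-backed binary tree layout, where node i keeps its children at 2i+1 and 2i+2. The bast code uses its own power-of-two mapping, but the cache-friendliness argument is the same: a walk mostly touches contiguous blocks of the backing array.

package main

import "fmt"

// Conventional dense (array-backed) binary tree indexing, shown only for
// contrast with the question's own mapping.
func left(i int) int   { return 2*i + 1 }
func right(i int) int  { return 2*i + 2 }
func parent(i int) int { return (i - 1) / 2 }

func main() {
	store := []uint32{50, 30, 70, 20, 40, 60, 80} // the whole tree lives in one slice
	fmt.Println(store[0], store[left(0)], store[right(0)]) // 50 30 70
	fmt.Println(parent(left(0)) == 0)                      // true
}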

You could use an anonymous field with an interface that defines the method set that you want to use (that might change).
Go playground here
You'd define your interface
type validator interface {
	IsRightOf(a, b interface{}) bool
	... // other methods
}
and your type:
type Bast struct {
	validator // anonymous interface field
	...       // some fields
}
Then you can access the methods of validator from the Bast type
b := bast.New()
b.IsRightOf(c, d) // this is valid, you do not need to do b.validator.IsRightOf(...)
Because validator is an interface, you can change those methods however you like.
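A runnable sketch of that idea (the IsRightOf logic below is a placeholder of my own, not the real bast comparison):

package main

import "fmt"

// validator is the method set we may want to swap out later.
type validator interface {
	IsRightOf(a, b interface{}) bool
}

// defaultValidator is one possible implementation.
type defaultValidator struct{}

func (defaultValidator) IsRightOf(a, b interface{}) bool {
	return fmt.Sprint(a) > fmt.Sprint(b) // placeholder comparison
}

// Bast embeds the interface, so its methods are promoted onto Bast.
type Bast struct {
	validator
	// ... other fields
}

// New wires in a default validator; callers can pass their own instead.
func New(v validator) *Bast {
	if v == nil {
		v = defaultValidator{}
	}
	return &Bast{validator: v}
}

func main() {
	b := New(nil)
	fmt.Println(b.IsRightOf(2, 1)) // true; the call is promoted from the embedded interface
}

Swapping the comparison logic is then just a matter of passing a different validator implementation to New.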

Related

How to do type checking for a recursive function with no explicit return type?

I am writing a language where functions are not typed, which means I need to infer the return type of a function call in order to do type checking. However, when somebody writes a recursive function, the type checker goes into infinite recursion trying to infer the type of the function call inside the function body.
The type checker does something like this:
Infer the types of the function call actual arguments.
Create a mapping of the actual argument types to the formal arguments.
Use the mapping to annotate types on the arguments used inside the function body.
Infer and return the return type of the function body.
Step 4 tries to then infer the type of the function call inside the function body, which calls the same type checker function again, causing an infinite recursion.
An example of a recursive function that gives me this problem:
function factorial(n) = n<1 ? 1 : n*factorial(n-1); // Function definition.
...
assert 24 == factorial(4); // Function call expression usage example.
How can I solve this problem without going in to an infinite recursion loop? Is there a way to infer the type of the recursive function call without having to go into the body again? Or some clean way to infer the type from context?
I know the easy solution might be to add types annotations to functions, this way the problem is trivial, but before doing that I want to know if there is a way to solve this without resorting to that.
I'd also like for the solution to work for mutual recursion.
Type inference can vary a lot depending on the language's type system and on what properties you want to have in terms of when annotations are needed. But whatever your language looks like, I think there's one seminal case you really should read about, which is ML. ML's type inference holds a nice sweet spot where it all fits together in a relatively simple paradigm. No type annotations are needed, and any expression has a single most general type (this property is called principality of typing).
ML's type system is the Hindley-Milner type system, which has parametric polymorphism. The type of an expression is either a specific type, or “any”. More precisely, the type constructor of an expression is either a specific type constructor or “any”, and type constructors can have arguments which themselves either have a specific type constructor or “any”. For example, the empty list has the type “list of any”. Two expressions that can have “any” type in isolation may be constrained to have the same type, whatever it is, so “any” is expressed with variables. For example, function list_of_two(x, y) = [x, y] (in a notation like your language) constrains x and y to have the same type, because they're inserted in the same list, but that type can be any type, so the type of this function is “take any two parameters of the same type α, and return a value of type list of α”.
The basic type inference algorithm for Hindley-Milner is algorithm W. At its core, it works by giving each subexpression a type that's a variable: α₁, α₂, α₃, … Programming language constructions then impose constraints on those variables. For example, if a list contains two elements of types α₁ and α₂ and the list itself has the type α₃, this imposes the constraints α₁ = α₂ and α₃ = list of α₁. Putting all these constraints together is a unification problem.
The constraints are based on a purely syntactic reading of the program. If there's a recursive call, you don't need to know the type of the function: it just means that there's a constraint that the variable for the return type of the function is the same as the type at its point of use. That's just one more equation to add to the set of constraints.
I left out an important aspect of ML which is that an expression's type can be generalized: an expression can be used with different types at different places. This is what allows polymorphism. For example,
let empty_list = [] in
(empty_list # [3]), (empty_list # ["hello"])
is a valid program where empty_list is used once with the type “list of integers” and once with the type “list of strings”. The type of empty_list is “for any α, list of α”: that's parametric polymorphism. Generalization adds some complexity to the algorithm, but it also removes complexity elsewhere, because that's what allows principality. Without it, let empty_list = [] in … would be ambiguous: empty_list would have to have some type, but there's no way to know what type without analyzing …, and then when you do analyze the … above you'd need to make a choice between integer and string.
Depending on your language's type system, ML and algorithm W may be directly reusable or may just provide some vague inspiration. But the principle of using variables during the inference, and progressively constraining these variables, is very general.
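To make the "one more equation" point concrete, here is a toy sketch in Go (the representation and names are mine, not part of the answer): the recursive call to factorial simply unifies the call's result variable with the function's return-type variable, so the body never has to be re-inferred.

package main

import "fmt"

// A Type is either a concrete type name ("Int", "Bool") or an unknown,
// represented by a type-variable id.
type Type struct {
	Name string // "" means "this is a type variable"
	Var  int    // variable id, meaningful only when Name == ""
}

// subst records what each type variable has been unified with so far.
var subst = map[int]Type{}

// resolve follows variable bindings until it reaches a concrete type
// or an unbound variable.
func resolve(t Type) Type {
	for t.Name == "" {
		next, ok := subst[t.Var]
		if !ok {
			return t
		}
		t = next
	}
	return t
}

// unify records the constraint a = b, failing on a concrete mismatch.
func unify(a, b Type) {
	a, b = resolve(a), resolve(b)
	switch {
	case a.Name == "" && b.Name == "" && a.Var == b.Var:
		// already the same variable; nothing to record
	case a.Name == "":
		subst[a.Var] = b
	case b.Name == "":
		subst[b.Var] = a
	case a.Name != b.Name:
		panic(fmt.Sprintf("type error: %s vs %s", a.Name, b.Name))
	}
}

func main() {
	intT := Type{Name: "Int"}
	tN := Type{Var: 1}    // type of the parameter n
	tRet := Type{Var: 2}  // return type of factorial
	tCall := Type{Var: 3} // type of the recursive call factorial(n-1)

	unify(tN, intT)    // n < 1: n must be Int
	unify(tCall, tRet) // factorial(n-1): just equate the call with the return-type variable
	unify(tCall, intT) // n * factorial(n-1): the call's result is Int
	unify(tRet, intT)  // both branches of the ?: are Int, so the return type is Int

	fmt.Println("factorial :", resolve(tN).Name, "->", resolve(tRet).Name) // Int -> Int
}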

How are functions implemented?

I've been playing around with the reflect package, and I noticed how limited the functionality of functions is.
package main

import (
	"fmt"
	"reflect"
	"strings"
)

func main() {
	v := reflect.ValueOf(strings.ToUpper)
	fmt.Printf("Address: %v\n", v)               // 0xd54a0
	fmt.Printf("Can set? %v\n", v.CanSet())      // false
	fmt.Printf("Can address? %v\n", v.CanAddr()) // false
	fmt.Printf("Element? %v\n", v.Elem())        // panics
}
Playground link here.
I've been taught that functions are addresses in memory pointing to a set of instructions (hence v prints out 0xd54a0), but it looks like I can't take the address of this function value, set it, or dereference it.
So, how are Go functions implemented under the hood? Eventually, I'd ideally want to manipulate the strings.ToUpper function by making the function point to my own code.
Disclaimers:
I've only recently started to delve deeper into the golang compiler, more specifically: the go assembler and mapping thereof. Because I'm by no means an expert, I'm not going to attempt explaining all the details here (as my knowledge is most likely still lacking). I will provide a couple of links at the bottom that might be worth checking out for more details.
What you're trying to do makes very, very little sense to me. If, at runtime, you're trying to modify a function, you're probably doing something wrong earlier on, and that goes for messing with any function; the fact that you're trying to do it to a function from the strings package makes it all the more worrying. The reflect package allows you to write very generic functions (e.g. a service with request handlers that should accept arbitrary arguments: that requires you to have a single handler that processes the raw request and then calls the corresponding handler; you cannot possibly know in advance what that handler looks like, so you use reflection to work out the arguments required...).
Now, how are functions implemented?
The go compiler is a tricky codebase to wrap ones head around, but thankfully the language design, and the implementation thereof has been discussed openly. From what I gather, golang functions are essentially compiled in pretty much the same way as a function in, for example, C. However, calling a function is a bit different. Go functions are first-class objects, that's why you can pass them as arguments, declare a function type, and why the reflect package has to allow you to use reflection on a function argument.
Essentially, functions are not addressed directly. Functions are passed and invoked through a function "pointer". Functions are effectively a type, similar to a map or a slice: they hold a pointer to the actual code, and the call data. In simple terms, think of a function as a type like this (in pseudo-code):
type SomeFunc struct {
	actualFunc *func(...) // pointer to actual function body
	data       struct {
		args []interface{} // arguments
		rVal []interface{} // returns
		// any other info
	}
}
This means that the reflect package can be used to, for example, count the number of arguments and return values the function expects. It also tells you what the return value(s) will be. The overall function "type" will be able to tell you where the function resides, and what arguments it expects and returns, but that's about it. IMO, that's all you really need though.
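For example, here is what the reflect package will tell you about strings.ToUpper, and how you can invoke it dynamically; it will not, however, let you rewrite what the function points to:

package main

import (
	"fmt"
	"reflect"
	"strings"
)

func main() {
	t := reflect.TypeOf(strings.ToUpper)
	fmt.Println("kind:", t.Kind())      // func
	fmt.Println("inputs:", t.NumIn())   // 1
	fmt.Println("in[0]:", t.In(0))      // string
	fmt.Println("outputs:", t.NumOut()) // 1
	fmt.Println("out[0]:", t.Out(0))    // string

	// You can also call the function dynamically through a reflect.Value.
	v := reflect.ValueOf(strings.ToUpper)
	out := v.Call([]reflect.Value{reflect.ValueOf("hello")})
	fmt.Println(out[0].String()) // HELLO
}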
Because of this implementation, you can create fields or variables with a function type like this:
var callback func(string) string
This would create an underlying value that, based on the pseudo code above, looks something like this:
callback := struct{
	actualFunc: nil, // no actual code to point to, function is nil
	data: struct{
		args: []interface{}{string}, // where string is a value representing the actual string type
		rVal: []interface{}{string},
	},
}
Then, by assigning any function that matches the args and rVal constraints, you can determine what executable code the callback variable points to:
callback = strings.ToUpper
callback = func(a string) string {
return fmt.Sprintf("a = %s", a)
}
callback = myOwnToUpper
I hope this cleared 1 or 2 things up a bit, but if not, here's a bunch of links that might shed some more light on the matter.
Go functions implementation and design
Introduction to go's ASM
Rob Pike on the go compiler written in go, and the plan 9 derived asm mapping
Writing a JIT in go asm
a "case study" attempting to use golang ASM for optimisation
Go and assembly introduction
Plan 9 assembly docs
Update
Seeing as you're attempting to swap out a function you're using for testing purposes, I would suggest you not use reflection, but instead inject mock functions, which is a more common practice WRT testing to begin with. Not to mention it being so much easier:
type someT struct {
	toUpper func(string) string
}

func New(toUpper func(string) string) *someT {
	if toUpper == nil {
		toUpper = strings.ToUpper
	}
	return &someT{
		toUpper: toUpper,
	}
}

func (s *someT) FuncToTest(t string) string {
	return s.toUpper(t)
}
This is a basic example of how you could inject a specific function. From within your foo_test.go file, you'd just call New, passing a different function.
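A sketch of what that test might look like (the package name foo and the expected output are my own assumptions, since the answer doesn't show the test file):

// foo_test.go
package foo

import "testing"

func TestFuncToTest(t *testing.T) {
	// Inject a fake toUpper so the test fully controls the behaviour.
	s := New(func(in string) string { return "FAKE:" + in })

	if got := s.FuncToTest("abc"); got != "FAKE:abc" {
		t.Fatalf("unexpected result: %q", got)
	}
}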
In more complex scenarios, using interfaces is the easiest way to go. Simply implement the interface in the test file, and pass the alternative implementation to the New function:
type StringProcessor interface {
	ToUpper(string) string
	Join([]string, string) string
	// all of the functions you need
}

func New(sp StringProcessor) *someT {
	return &someT{
		processor: sp,
	}
}
From that point on, simply create a mock implementation of that interface, and you can test everything without having to muck about with reflection. This makes your tests easier to maintain and, because reflection is complex, it makes it far less likely for your tests to be faulty.
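As a sketch (assuming someT stores the injected StringProcessor in a processor field, as the New constructor above suggests), a hand-written test double could look like this:

package foo

import "testing"

// fakeProcessor is a test double for the StringProcessor interface above.
type fakeProcessor struct{}

func (fakeProcessor) ToUpper(s string) string                { return "FAKE-UPPER:" + s }
func (fakeProcessor) Join(parts []string, sep string) string { return "FAKE-JOIN" }

func TestWithFakeProcessor(t *testing.T) {
	s := New(fakeProcessor{}) // inject the fake through the New constructor above
	if got := s.processor.ToUpper("abc"); got != "FAKE-UPPER:abc" {
		t.Errorf("unexpected result: %q", got)
	}
}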
If your test code is faulty, it could cause your tests to pass even though the code you're trying to test isn't working. I'm always suspicious when the test code is more complex than the code it is covering to begin with...
Underneath the covers, a Go function is probably just as you describe it: an address to a set of instructions in memory, with parameters and return values filled in according to your system's linkage conventions as the function executes.
However, Go's function abstraction is much more limited, on purpose (it's a language design decision). You can't just replace functions, or even override methods from other imported packages, like you might do in a normal object-oriented language. You certainly can't do dynamic replacement of functions under normal circumstances (I suppose you could write into arbitrary memory locations using the unsafe package, but that's willful circumvention of the language rules, and all bets are off at that point).
Are you trying to do some sort of dependency injection for unit testing? If so, the idiomatic way to do this in Go is to define an interface that you pass around to your functions/methods, and replace it with a test version in your tests. In your case, an interface may wrap the call to strings.ToUpper in the normal implementation, but a test implementation might call something else.
For example:
type Upper interface {
	ToUpper(string) string
}

type defaultUpper struct{}

func (d *defaultUpper) ToUpper(s string) string {
	return strings.ToUpper(s)
}

...

// normal implementation: pass in &defaultUpper{}
// test implementation: pass in a test version that
// does something else
func SomethingUseful(s string, d Upper) string {
	return d.ToUpper(s)
}
Finally, you can also pass function values around. For example:
var fn func(string) string
fn = strings.ToUpper
...
fn("hello")
... but this won't let you replace the system's strings.ToUpper implementation, of course.
Either way, you can only sort of approximate what you want to do in Go via interfaces or function values. It's not like Python, where everything is dynamic and replaceable.

Value receiver vs. pointer receiver

It is very unclear to me in which cases I would want to use a value receiver instead of always using a pointer receiver.
To recap from the docs:
type T struct {
	a int
}

func (tv T) Mv(a int) int          { return 0 } // value receiver
func (tp *T) Mp(f float32) float32 { return 1 } // pointer receiver
The docs also say "For types such as basic types, slices, and small structs, a value receiver is very cheap so unless the semantics of the method requires a pointer, a value receiver is efficient and clear."
First, the docs say a value receiver is "very cheap", but the question is whether it is cheaper than a pointer receiver. So I made a small benchmark (code on gist), which showed me that a pointer receiver is faster even for a struct that has only one string field. These are the results:
// Struct one empty string property
BenchmarkChangePointerReceiver 2000000000 0.36 ns/op
BenchmarkChangeItValueReceiver 500000000 3.62 ns/op
// Struct one zero int property
BenchmarkChangePointerReceiver 2000000000 0.36 ns/op
BenchmarkChangeItValueReceiver 2000000000 0.36 ns/op
(Edit: Please note that second point became invalid in newer go versions, see comments.)
Second, the docs say that a value receiver is "efficient and clear", which is more a matter of taste, isn't it? Personally, I prefer consistency and use the same thing everywhere. Efficiency in what sense? Performance-wise it seems pointers are almost always more efficient. A few test runs with one int property showed a minimal advantage for the value receiver (a range of 0.01-0.1 ns/op).
Can someone tell me a case where a value receiver clearly makes more sense than a pointer receiver? Or am I doing something wrong in the benchmark? Did I overlook other factors?
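For reference, a minimal sketch of what such a benchmark might look like; the original gist isn't reproduced here, so the type and method names below are my own:

// receiver_bench_test.go
package bench

import "testing"

type item struct{ value string }

func (i *item) setPointer(v string) { i.value = v }

func (i item) setValue(v string) item {
	i.value = v
	return i
}

func BenchmarkChangePointerReceiver(b *testing.B) {
	it := item{}
	for n := 0; n < b.N; n++ {
		it.setPointer("x")
	}
}

func BenchmarkChangeItValueReceiver(b *testing.B) {
	it := item{}
	for n := 0; n < b.N; n++ {
		it = it.setValue("x")
	}
}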
Note that the FAQ does mention consistency
Next is consistency. If some of the methods of the type must have pointer receivers, the rest should too, so the method set is consistent regardless of how the type is used. See the section on method sets for details.
As mentioned in this thread:
The rule about pointers vs. values for receivers is that value methods can be invoked on pointers and values, but pointer methods can only be invoked on pointers.
Which is not true, as commented by Sart Simha
Both value receiver and pointer receiver methods can be invoked on a correctly-typed pointer or non-pointer.
Regardless of what the method is called on, within the method body the identifier of the receiver refers to a by-copy value when a value receiver is used, and a pointer when a pointer receiver is used; see the example below.
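A minimal sketch illustrating that difference (the counter type is my own example):

package main

import "fmt"

type counter struct{ n int }

// Value receiver: the method works on a copy, so the caller's value is unchanged.
func (c counter) IncValue() { c.n++ }

// Pointer receiver: the method mutates the caller's value.
func (c *counter) IncPointer() { c.n++ }

func main() {
	c := counter{}
	c.IncValue()
	fmt.Println(c.n) // 0
	c.IncPointer()   // Go takes &c automatically because c is addressable
	fmt.Println(c.n) // 1
}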
Now:
Can someone tell me a case where a value receiver clearly makes more sense than a pointer receiver?
The Code Review comment can help:
If the receiver is a map, func or chan, don't use a pointer to it.
If the receiver is a slice and the method doesn't reslice or reallocate the slice, don't use a pointer to it.
If the method needs to mutate the receiver, the receiver must be a pointer.
If the receiver is a struct that contains a sync.Mutex or similar synchronizing field, the receiver must be a pointer to avoid copying.
If the receiver is a large struct or array, a pointer receiver is more efficient. How large is large? Assume it's equivalent to passing all its elements as arguments to the method. If that feels too large, it's also too large for the receiver.
Can function or methods, either concurrently or when called from this method, be mutating the receiver? A value type creates a copy of the receiver when the method is invoked, so outside updates will not be applied to this receiver. If changes must be visible in the original receiver, the receiver must be a pointer.
If the receiver is a struct, array or slice and any of its elements is a pointer to something that might be mutating, prefer a pointer receiver, as it will make the intention more clear to the reader.
If the receiver is a small array or struct that is naturally a value type (for instance, something like the time.Time type), with no mutable fields and no pointers, or is just a simple basic type such as int or string, a value receiver makes sense.
A value receiver can reduce the amount of garbage that can be generated; if a value is passed to a value method, an on-stack copy can be used instead of allocating on the heap. (The compiler tries to be smart about avoiding this allocation, but it can't always succeed.) Don't choose a value receiver type for this reason without profiling first.
Finally, when in doubt, use a pointer receiver.
The part in bold is found for instance in net/http/server.go#Write():
// Write writes the headers described in h to w.
//
// This method has a value receiver, despite the somewhat large size
// of h, because it prevents an allocation. The escape analysis isn't
// smart enough to realize this function doesn't mutate h.
func (h extraHeader) Write(w *bufio.Writer) {
...
}
Note: irbull points out in the comments a warning about interface methods:
Following the advice that the receiver type should be consistent, if you have a pointer receiver, then your (p *type) String() string method should also use a pointer receiver.
But this does not implement the Stringer interface, unless the caller of your API also uses pointer to your type, which might be a usability problem of your API.
I don't know if consistency beats usability here.
The same comment also points to:
"Method Sets (Pointer vs Value Receiver)"
you can mix and match methods with value receivers and methods with pointer receivers, and use them with variables containing values and pointers, without worrying about which is which.
Both will work, and the syntax is the same.
However, if methods with pointer receivers are needed to satisfy an interface, then only a pointer will be assignable to the interface — a value won't be valid.
"Go interfaces and automatically generated functions" from Chris Siebenmann (June 2017)
Calling value receiver methods through interfaces always creates extra copies of your values.
Interface values are fundamentally pointers, while your value receiver methods require values; ergo every call requires Go to create a new copy of the value, call your method with it, and then throw the value away.
There is no way to avoid this as long as you use value receiver methods and call them through interface values; it's a fundamental requirement of Go.
"Learning about Go's unaddressable values and slicing" (still from Chris (Sept. 2018))
It covers the concept of unaddressable values, which are the opposite of addressable values. The careful technical version is in the Go specification under Address operators, but the hand-waving summary is that most anonymous values are not addressable (one big exception is composite literals).
To add to VonC's great, informative answer:
I am surprised no one really mentioned the maintenance cost once the project gets larger, old devs leave and new ones come. Go surely is a young language.
Generally speaking, I try to avoid pointers when I can but they do have their place and beauty.
I use pointers when:
working with large datasets
have a struct maintaining state, e.g. TokenCache,
I make sure ALL fields are PRIVATE, interaction is possible only via defined method receivers
I don't pass this function to any goroutine
E.g:
type TokenCache struct {
	cache map[string]map[string]bool
}

func (c *TokenCache) Add(contract string, token string, authorized bool) {
	tokens := c.cache[contract]
	if tokens == nil {
		tokens = make(map[string]bool)
	}
	tokens[token] = authorized
	c.cache[contract] = tokens
}
Reasons why I avoid pointers:
pointers are not concurrency-safe (and concurrency is the whole point of Go)
once pointer receiver, always pointer receiver (for all of the struct's methods, for consistency)
mutexes are surely more expensive, slower and harder to maintain compared to the "value copy cost"
speaking of "value copy cost", is that really an issue? Premature optimization is the root of all evil, and you can always add pointers later
it directly, consciously forces me to design small structs
pointers can be mostly avoided by designing pure functions with clear intention and obvious I/O
garbage collection is harder with pointers, I believe
easier to argue about encapsulation and responsibilities
keep it simple, stupid (yes, pointers can be tricky because you never know the next project's dev)
unit testing is like walking through a pink garden (a Slovak-only expression?), meaning it's easy
no nil if-conditions (nil can be passed where a pointer was expected)
My rule of thumb, write as many encapsulated methods as possible such as:
package rsa

// EncryptPKCS1v15 encrypts the given message with RSA and the padding scheme from PKCS#1 v1.5.
func EncryptPKCS1v15(rand io.Reader, pub *PublicKey, msg []byte) ([]byte, error) {
	return []byte("secret text"), nil
}

cipherText, err := rsa.EncryptPKCS1v15(rand, pub, keyBlock)
UPDATE:
This question inspired me to research the topic more and write a blog post about it https://medium.com/gophersland/gopher-vs-object-oriented-golang-4fa62b88c701
It is a question of semantics. Imagine you write a function taking two numbers as arguments. You don't want to suddenly find out that either of these numbers got mutated by the function call; if they are passed as pointers, that is possible. Lots of things should act just like numbers: points, 2D vectors, dates, rectangles, circles, etc. These things don't have identity. Two circles at the same position and with the same radius should not be distinguished from each other. They are value types.
But something like a database connection or a file handle, a button in the GUI is something where identity matters. In these cases you want a pointer to the object.
When something is inherently a value type, such as a rectangle or point, it is really preferable to be able to pass it without using pointers. Why? Because it means you are certain to avoid mutating the object. It clarifies semantics and intent to the reader of your code. It is clear that the function receiving the object cannot and will not mutate it.
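In Go terms, a small sketch of such a value type with a value receiver:

package main

import "fmt"

// Point is naturally a value type: it has no identity and no mutable state.
type Point struct{ X, Y float64 }

// Translate returns a new Point instead of mutating the receiver.
func (p Point) Translate(dx, dy float64) Point {
	return Point{p.X + dx, p.Y + dy}
}

func main() {
	a := Point{1, 2}
	b := a.Translate(3, 4)
	fmt.Println(a, b) // {1 2} {4 6}; a is untouched
}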

What is an existential type?

I read through the Wikipedia article Existential types. I gathered that they're called existential types because of the existential operator (∃). I'm not sure what the point of it is, though. What's the difference between
T = ∃X { X a; int f(X); }
and
T = ∀X { X a; int f(X); }
?
When someone defines a universal type ∀X they're saying: You can plug in whatever type you want, I don't need to know anything about the type to do my job, I'll only refer to it opaquely as X.
When someone defines an existential type ∃X they're saying: I'll use whatever type I want here; you won't know anything about the type, so you can only refer to it opaquely as X.
Universal types let you write things like:
void copy<T>(List<T> source, List<T> dest) {
	...
}
The copy function has no idea what T will actually be, but it doesn't need to know.
Existential types would let you write things like:
interface VirtualMachine<B> {
	B compile(String source);
	void run(B bytecode);
}

// Now, if you had a list of VMs you wanted to run on the same input:
void runAllCompilers(List<∃B:VirtualMachine<B>> vms, String source) {
	for (∃B:VirtualMachine<B> vm : vms) {
		B bytecode = vm.compile(source);
		vm.run(bytecode);
	}
}
Each virtual machine implementation in the list can have a different bytecode type. The runAllCompilers function has no idea what the bytecode type is, but it doesn't need to; all it does is relay the bytecode from VirtualMachine.compile to VirtualMachine.run.
Java type wildcards (ex: List<?>) are a very limited form of existential types.
Update: Forgot to mention that you can sort of simulate existential types with universal types. First, wrap your universal type to hide the type parameter. Second, invert control (this effectively swaps the "you" and "I" part in the definitions above, which is the primary difference between existentials and universals).
// A wrapper that hides the type parameter 'B'
interface VMWrapper {
	void unwrap(VMHandler handler);
}

// A callback (control inversion)
interface VMHandler {
	<B> void handle(VirtualMachine<B> vm);
}
Now, we can have the VMWrapper call our own VMHandler which has a universally-typed handle function. The net effect is the same, our code has to treat B as opaque.
void runWithAll(List<VMWrapper> vms, final String input)
{
	for (VMWrapper vm : vms) {
		vm.unwrap(new VMHandler() {
			public <B> void handle(VirtualMachine<B> vm) {
				B bytecode = vm.compile(input);
				vm.run(bytecode);
			}
		});
	}
}
An example VM implementation:
class MyVM implements VirtualMachine<byte[]>, VMWrapper {
	public byte[] compile(String input) {
		return null; // TODO: somehow compile the input
	}
	public void run(byte[] bytecode) {
		// TODO: Somehow evaluate 'bytecode'
	}
	public void unwrap(VMHandler handler) {
		handler.handle(this);
	}
}
A value of an existential type like ∃x. F(x) is a pair containing some type x and a value of the type F(x). Whereas a value of a polymorphic type like ∀x. F(x) is a function that takes some type x and produces a value of type F(x). In both cases, the type closes over some type constructor F.
Note that this view mixes types and values. The existential proof is one type and one value. The universal proof is an entire family of values indexed by type (or a mapping from types to values).
So the difference between the two types you specified is as follows:
T = ∃X { X a; int f(X); }
This means: A value of type T contains a type called X, a value a:X, and a function f:X->int. A producer of values of type T gets to choose any type for X and a consumer can't know anything about X. Except that there's one example of it called a and that this value can be turned into an int by giving it to f. In other words, a value of type T knows how to produce an int somehow. Well, we could eliminate the intermediate type X and just say:
T = int
The universally quantified one is a little different.
T = ∀X { X a; int f(X); }
This means: A value of type T can be given any type X, and it will produce a value a:X, and a function f:X->int no matter what X is. In other words: a consumer of values of type T can choose any type for X. And a producer of values of type T can't know anything at all about X, but it has to be able to produce a value a for any choice of X, and be able to turn such a value into an int.
Obviously implementing this type is impossible, because there is no program that can produce a value of every imaginable type. Unless you allow absurdities like null or bottoms.
Since an existential is a pair, an existential argument can be converted to a universal one via currying.
(∃b. F(b)) -> Int
is the same as:
∀b. (F(b) -> Int)
The former is a rank-2 existential. This leads to the following useful property:
Every existentially quantified type of rank n+1 is a universally quantified type of rank n.
There is a standard algorithm for turning existentials into universals, called Skolemization.
I think it makes sense to explain existential types together with universal types, since the two concepts are complementary, i.e. one is the "opposite" of the other.
I cannot answer every detail about existential types (such as giving an exact definition, list all possible uses, their relation to abstract data types, etc.) because I'm simply not knowledgeable enough for that. I'll demonstrate only (using Java) what this HaskellWiki article states to be the principal effect of existential types:
Existential types can be used for several different purposes. But what they do is to 'hide' a type variable on the right-hand side. Normally, any type variable appearing on the right must also appear on the left […]
Example set-up:
The following pseudo-code is not quite valid Java, even though it would be easy enough to fix that. In fact, that's exactly what I'm going to do in this answer!
class Tree<α>
{
	α value;
	Tree<α> left;
	Tree<α> right;
}

int height(Tree<α> t)
{
	return (t != null) ? 1 + max( height(t.left), height(t.right) )
	                   : 0;
}
Let me briefly spell this out for you. We are defining…
a recursive type Tree<α> which represents a node in a binary tree. Each node stores a value of some type α and has references to optional left and right subtrees of the same type.
a function height which returns the furthest distance from any leaf node to the root node t.
Now, let's turn the above pseudo-code for height into proper Java syntax! (I'll keep on omitting some boilerplate for brevity's sake, such as object-orientation and accessibility modifiers.) I'm going to show two possible solutions.
1. Universal type solution:
The most obvious fix is to simply make height generic by introducing the type parameter α into its signature:
<α> int height(Tree<α> t)
{
	return (t != null) ? 1 + max( height(t.left), height(t.right) )
	                   : 0;
}
This would allow you to declare variables and create expressions of type α inside that function, if you wanted to. But...
2. Existential type solution:
If you look at our method's body, you will notice that we're not actually accessing, or working with, anything of type α! There are no expressions having that type, nor any variables declared with that type... so, why do we have to make height generic at all? Why can't we simply forget about α? As it turns out, we can:
int height(Tree<?> t)
{
	return (t != null) ? 1 + max( height(t.left), height(t.right) )
	                   : 0;
}
As I wrote at the very beginning of this answer, existential and universal types are complementary / dual in nature. Thus, if the universal type solution was to make height more generic, then we should expect that existential types have the opposite effect: making it less generic, namely by hiding/removing the type parameter α.
As a consequence, you can no longer refer to the type of t.value in this method nor manipulate any expressions of that type, because no identifier has been bound to it. (The ? wildcard is a special token, not an identifier that "captures" a type.) t.value has effectively become opaque; perhaps the only thing you can still do with it is type-cast it to Object.
Summary:
=====================+=====================================
                     |  universally         existentially
                     |  quantified type     quantified type
---------------------+-------------------------------------
calling method       |
needs to know        |        yes                 no
the type argument    |
---------------------+-------------------------------------
called method        |
can use / refer to   |        yes                 no
the type argument    |
=====================+=====================================
These are all good examples, but I choose to answer it a little bit differently. Recall from math, that ∀x. P(x) means "for all x's, I can prove that P(x)". In other words, it is a kind of function, you give me an x and I have a method to prove it for you.
In type theory, we are not talking about proofs, but of types. So in this space we mean "for any type X you give me, I will give you a specific type P". Now, since we don't give P much information about X besides the fact that it is a type, P can't do much with it, but there are some examples. P can create the type of "all pairs of the same type": P<X> = Pair<X, X> = (X, X). Or we can create the option type: P<X> = Option<X> = X | Nil, where Nil is the type of the null pointers. We can make a list out of it: List<X> = (X, List<X>) | Nil. Notice that the last one is recursive, values of List<X> are either pairs where the first element is an X and the second element is a List<X> or else it is a null pointer.
Now, in math ∃x. P(x) means "I can prove that there is a particular x such that P(x) is true". There may be many such x's, but to prove it, one is enough. Another way to think of it is that there must exist a non-empty set of evidence-and-proof pairs {(x, P(x))}.
Translated to type theory: a type in the family ∃X.P<X> is a type X and a corresponding type P<X>. Notice that while before we gave X to P (so we knew everything about X but P knew very little), the opposite is true now. P<X> doesn't promise to give any information about X, just that there is one, and that it is indeed a type.
How is this useful? Well, P could be a type that has a way of exposing its internal type X. An example would be an object which hides the internal representation of its state X. Though we have no way of directly manipulating it, we can observe its effect by poking at P. There could be many implementations of this type, but you could use all of these types no matter which particular one was chosen.
To directly answer your question:
With the universal type, uses of T must include the type parameter X. For example T<String> or T<Integer>. For the existential type uses of T do not include that type parameter because it is unknown or irrelevant - just use T (or in Java T<?>).
Further information:
Universal/abstract types and existential types are a duality of perspective between the consumer/client of an object/function and the producer/implementation of it. When one side sees a universal type the other sees an existential type.
In Java you can define a generic class:
public class MyClass<T> {
	// T is existential in here
	T whatever;
	public MyClass(T w) { this.whatever = w; }
	public static MyClass<?> secretMessage() { return new MyClass("bazzlebleeb"); }
}
// T is universal from out here
MyClass<String> mc1 = new MyClass("foo");
MyClass<Integer> mc2 = new MyClass(123);
MyClass<?> mc3 = MyClass.secretMessage();
From the perspective of a client of MyClass, T is universal because you can substitute any type for T when you use that class and you must know the actual type of T whenever you use an instance of MyClass
From the perspective of instance methods in MyClass itself, T is existential because it doesn't know the real type of T
In Java, ? represents the existential type - thus when you are inside the class, T is basically ?. If you want to handle an instance of MyClass with T existential, you can declare MyClass<?> as in the secretMessage() example above.
Existential types are sometimes used to hide the implementation details of something, as discussed elsewhere. A Java version of this might look like:
public class ToDraw<T> {
	T obj;
	Function<Pair<T,Graphics>, Void> draw;
	ToDraw(T obj, Function<Pair<T,Graphics>, Void> draw) {
		this.obj = obj;
		this.draw = draw;
	}
	static void draw(ToDraw<?> d, Graphics g) { d.draw.apply(new Pair(d.obj, g)); }
}
// Now you can put these in a list and draw them like so:
List<ToDraw<?>> drawList = ... ;
for (ToDraw<?> td : drawList) ToDraw.draw(td, g); // g is whatever Graphics context you're drawing into
It's a bit tricky to capture this properly because I'm pretending to be in some sort of functional programming language, which Java isn't. But the point here is that you are capturing some sort of state plus a list of functions that operate on that state and you don't know the real type of the state part, but the functions do since they were matched up with that type already.
Now, in Java all non-final non-primitive types are partly existential. This may sound strange, but because a variable declared as Object could potentially be a subclass of Object instead, you cannot declare the specific type, only "this type or a subclass". And so, objects are represented as a bit of state plus a list of functions that operate on that state - exactly which function to call is determined at runtime by lookup. This is very much like the use of existential types above where you have an existential state part and a function that operates on that state.
In statically typed programming languages without subtyping and casts, existential types allow one to manage lists of differently typed objects. A list of T<Int> cannot contain a T<Long>. However, a list of T<?> can contain any variation of T, allowing one to put many different types of data into the list and convert them all to an int (or do whatever operations are provided inside the data structure) on demand.
One can pretty much always convert a record with an existential type into a record without using closures. A closure is existentially typed, too, in that the free variables it is closed over are hidden from the caller. Thus a language that supports closures but not existential types can allow you to make closures that share the same hidden state that you would have put into the existential part of an object.
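For instance, in Go (to pick one language with closures), two closures can share state that the caller can neither see nor name, much like the hidden component of an existential package:

package main

import "fmt"

// newCounter returns two closures sharing hidden state; callers can only
// interact with that state through the functions, never directly.
func newCounter() (incr func(), read func() int) {
	n := 0 // the "hidden" existential part
	incr = func() { n++ }
	read = func() int { return n }
	return incr, read
}

func main() {
	incr, read := newCounter()
	incr()
	incr()
	fmt.Println(read()) // 2
}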
An existential type is an opaque type.
Think of a file handle in Unix. You know its type is int, so you can easily forge it. You can, for instance, try to read from handle 43. If it so happens that the program has a file open with this particular handle, you'll read from it. Your code doesn't have to be malicious, just sloppy (e.g., the handle could be an uninitialized variable).
An existential type is hidden from your program. If fopen returned an existential type, all you could do with it is to use it with some library functions that accept this existential type. For instance, the following pseudo-code would compile:
let exfile = fopen("foo.txt"); // No type for exfile!
read(exfile, buf, size);
The interface "read" is declared as:
There exists a type T such that:
size_t read(T exfile, char* buf, size_t size);
The variable exfile is not an int, not a char*, not a struct File—nothing you can express in the type system. You can't declare a variable whose type is unknown and you cannot cast, say, a pointer into that unknown type. The language won't let you.
It seems I'm coming a bit late, but anyway, this document adds another view of what existential types are. Although it is not entirely language-agnostic, it should make existential types fairly easy to understand: http://www.cs.uu.nl/groups/ST/Projects/ehc/ehc-book.pdf (chapter 8)
The difference between a universally and existentially quantified type can be characterized by the following observation:
The use of a value with a ∀ quantified type determines the type to choose for the instantiation of the quantified type variable. For example, the caller of the identity function “id :: ∀a.a → a” determines the type to choose for the type variable a for this particular application of id. For the function application “id 3” this type equals Int.
The creation of a value with a ∃ quantified type determines, and hides, the type of the quantified type variable. For example, a creator of a “∃a.(a, a → Int)” may have constructed a value of that type from “(3, λx → x)”; another creator has constructed a value with the same type from “(’x’, λx → ord x)”. From a users point of view both values have the same type and are thus interchangeable. The value has a specific type chosen for type variable a, but we do not know which type, so this information can no longer be exploited. This value specific type information has been ‘forgotten’; we only know it exists.
A universal type exists for all values of the type parameter(s). An existential type exists only for values of the type parameter(s) that satisfy the constraints of the existential type.
For example in Scala one way to express an existential type is an abstract type which is constrained to some upper or lower bounds.
trait Existential {
  type Parameter <: Interface
}
Equivalently a constrained universal type is an existential type as in the following example.
trait Existential[Parameter <: Interface]
Any use site can employ the Interface because any instantiable subtypes of Existential must define the type Parameter which must implement the Interface.
A degenerate case of an existential type in Scala is an abstract type which is never referred to and thus need not be defined by any subtype. This effectively has a shorthand notation of List[_] in Scala and List<?> in Java.
My answer was inspired by Martin Odersky's proposal to unify abstract and existential types. The accompanying slide aids understanding.
Research into abstract datatypes and information hiding brought existential types into programming languages. Making a datatype abstract hides info about that type, so a client of that type cannot abuse it. Say you've got a reference to an object... some languages allow you to cast that reference to a reference to bytes and do anything you want to that piece of memory. For purposes of guaranteeing behavior of a program, it's useful for a language to enforce that you only act on the reference to the object via the methods the designer of the object provides. You know the type exists, but nothing more.
See:
Abstract Types Have Existential Type, Mitchell & Plotkin
http://theory.stanford.edu/~jcm/papers/mitch-plotkin-88.pdf
I created this diagram. I don't know if it's rigorous. But if it helps, I'm glad.
As I understand it, it's a mathematical way to describe interfaces/abstract classes.
As for T = ∃X { X a; int f(X); }
For C# it would translate to a generic abstract type:
abstract class MyType<T> {
	private T a;
	public abstract int f(T x);
}
"Existential" just means that there is some type that obey to the rules defined here.

What is Type-safe?

What does "type-safe" mean?
Type safety means that the compiler will validate types while compiling, and throw an error if you try to assign the wrong type to a variable.
Some simple examples:
// Fails, Trying to put an integer in a string
String one = 1;
// Also fails.
int foo = "bar";
This also applies to method arguments, since you are passing explicit types to them:
int AddTwoNumbers(int a, int b)
{
return a + b;
}
If I tried to call that using:
int Sum = AddTwoNumbers(5, "5");
The compiler would throw an error, because I am passing a string ("5"), and it is expecting an integer.
In a loosely typed language, such as javascript, I can do the following:
function AddTwoNumbers(a, b)
{
return a + b;
}
if I call it like this:
Sum = AddTwoNumbers(5, "5");
Javascript automatically converts the 5 to a string, and returns "55". This is due to javascript using the + sign for string concatenation. To make it type-aware, you would need to do something like:
function AddTwoNumbers(a, b)
{
return Number(a) + Number(b);
}
Or, possibly:
function AddOnlyTwoNumbers(a, b)
{
	if (isNaN(a) || isNaN(b))
		return false;
	return Number(a) + Number(b);
}
if I call it like this:
Sum = AddTwoNumbers(5, " dogs");
Javascript automatically converts the 5 to a string, and appends them, to return "5 dogs".
Not all dynamic languages are as forgiving as javascript (in fact, a dynamic language does not implicitly imply a loosely typed language; see Python); some of them will actually give you a runtime error on invalid type casting.
While it's convenient, it opens you up to a lot of errors that can be easily missed, and only identified by testing the running program. Personally, I prefer to have my compiler tell me if I made that mistake.
Now, back to C#...
C# supports a language feature called covariance; this basically means that you can substitute a child type where a base type is expected and not cause an error, for example:
public class Foo : Bar
{
}
Here, I created a new class (Foo) that subclasses Bar. I can now create a method:
void DoSomething(Bar myBar)
And call it using either a Foo, or a Bar as an argument, both will work without causing an error. This works because C# knows that any child class of Bar will implement the interface of Bar.
However, you cannot do the inverse:
void DoSomething(Foo myFoo)
In this situation, I cannot pass Bar to this method, because the compiler does not know that Bar implements Foo's interface. This is because a child class can (and usually will) be much different than the parent class.
Of course, now I've gone way off the deep end and beyond the scope of the original question, but it's all good stuff to know :)
Type-safety should not be confused with static / dynamic typing or strong / weak typing.
A type-safe language is one where the only operations that one can execute on data are the ones that are condoned by the data's type. That is, if your data is of type X and X doesn't support operation y, then the language will not allow you to execute y(X).
This definition doesn't set rules on when this is checked. It can be at compile time (static typing) or at runtime (dynamic typing), typically through exceptions. It can be a bit of both: some statically typed languages allow you to cast data from one type to another, and the validity of casts must be checked at runtime (imagine that you're trying to cast an Object to a Consumer - the compiler has no way of knowing whether it's acceptable or not).
Type-safety does not necessarily mean strongly typed, either - some languages are notoriously weakly typed, but still arguably type safe. Take Javascript, for example: its type system is as weak as they come, but still strictly defined. It allows automatic casting of data (say, strings to ints), but within well defined rules. There is to my knowledge no case where a Javascript program will behave in an undefined fashion, and if you're clever enough (I'm not), you should be able to predict what will happen when reading Javascript code.
An example of a type-unsafe programming language is C: reading / writing an array value outside of the array's bounds has an undefined behaviour by specification. It's impossible to predict what will happen. C is a language that has a type system, but is not type safe.
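Go is a handy illustration of both points: a "cast" to an interface type is checked at run time, and an out-of-range slice access panics instead of silently reading adjacent memory the way C might. A small sketch (the consumer interface is just an example of mine):

package main

import "fmt"

type consumer interface{ consume(string) }

func main() {
	var x interface{} = "just a string"

	// The validity of this "cast" can only be decided at run time.
	c, ok := x.(consumer)
	fmt.Println(c, ok) // <nil> false: checked, not undefined behaviour

	s := []int{1, 2, 3}
	fmt.Println(len(s)) // 3
	// fmt.Println(s[5]) // would panic: index out of range, never garbage memory
}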
Type safety is not just a compile time constraint, but a run time constraint. I feel even after all this time, we can add further clarity to this.
There are 2 main issues related to type safety. Memory** and data type (with its corresponding operations).
Memory**
A char typically requires 1 byte per character, or 8 bits (depends on language, Java and C# store unicode chars which require 16 bits).
An int requires 4 bytes, or 32 bits (usually).
Visually:
char: |-|-|-|-|-|-|-|-|
int : |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-|
A type safe language does not allow an int to be inserted into a char at run-time (this should throw some kind of class cast or out of memory exception). However, in a type unsafe language, you would overwrite existing data in 3 more adjacent bytes of memory.
int >> char:
|-|-|-|-|-|-|-|-| |?|?|?|?|?|?|?|?| |?|?|?|?|?|?|?|?| |?|?|?|?|?|?|?|?|
In the above case, the 3 bytes to the right are overwritten, so any pointers to that memory (say 3 consecutive chars) which expect to get a predictable char value will now have garbage. This causes undefined behavior in your program (or worse, possibly in other programs depending on how the OS allocates memory - very unlikely these days).
** While this first issue is not technically about data type, type safe languages address it inherently and it visually describes the issue to those unaware of how memory allocation "looks".
Data Type
The more subtle and direct type issue is where two data types use the same memory allocation. Take an int vs. an unsigned int. Both are 32 bits. (It could just as easily be a char[4] and an int, but the more common issue is uint vs. int.)
|-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-|
|-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-|
A type unsafe language allows the programmer to reference a properly allocated span of 32 bits, but when the value of an unsigned int is read into the space of an int (or vice versa), we again have undefined behavior. Imagine the problems this could cause in a banking program:
"Dude! I overdrafted $30 and now I have $65,506 left!!"
...'course, banking programs use much larger data types. ;) LOL!
As others have already pointed out, the next issue is computational operations on types. That has already been sufficiently covered.
Speed vs Safety
Most programmers today never need to worry about such things unless they are using something like C or C++. Both of these languages allow programmers to easily violate type safety at run time (direct memory referencing) despite the compilers' best efforts to minimize the risk. HOWEVER, this is not all bad.
One reason these languages are so computationally fast is they are not burdened by verifying type compatibility during run time operations like, for example, Java. They assume the developer is a good rational being who won't add a string and an int together and for that, the developer is rewarded with speed/efficiency.
Many answers here conflate type-safety with static-typing and dynamic-typing. A dynamically typed language (like smalltalk) can be type-safe as well.
A short answer: a language is considered type-safe if no operation leads to undefined behavior. Many consider the requirement of explicit type conversions necessary for a language to be strictly typed, as automatic conversions can sometimes lead to well defined but unexpected/unintuitive behaviors.
A programming language that is 'type-safe' means the following things:
You can't read from uninitialized variables
You can't index arrays beyond their bounds
You can't perform unchecked type casts
An explanation from a liberal arts major, not a comp sci major:
When people say that a language or language feature is type safe, they mean that the language will help prevent you from, for example, passing something that isn't an integer to some logic that expects an integer.
For example, in C#, I define a function as:
void foo(int arg)
The compiler will then stop me from doing this:
// call foo
foo("hello world")
In other languages, the compiler would not stop me (or there is no compiler...), so the string would be passed to the logic and then probably something bad will happen.
Type safe languages try to catch more at "compile time".
On the down side, with type safe languages, when you have a string like "123" and you want to operate on it like an int, you have to write more code to convert the string to an int, or when you have an int like 123 and want to use it in a message like, "The answer is 123", you have to write more code to convert/cast it to a string.
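In Go, for example, those conversions are short but explicit (strconv is the standard library package for this):

package main

import (
	"fmt"
	"strconv"
)

func main() {
	// string -> int needs an explicit conversion, plus error handling.
	n, err := strconv.Atoi("123")
	if err != nil {
		panic(err)
	}
	fmt.Println(n + 1) // 124

	// int -> string for use in a message.
	fmt.Println("The answer is " + strconv.Itoa(123))
}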
To get a better understanding do watch the below video which demonstrates code in type safe language (C#) and NOT type safe language ( javascript).
http://www.youtube.com/watch?v=Rlw_njQhkxw
Now for the long text.
Type safety means preventing type errors. A type error occurs when a value of one type is UNKNOWINGLY assigned to another type, and we get undesirable results.
For instance, JavaScript is NOT a type safe language. In the code below, "num" is a numeric variable and "str" is a string. JavaScript allows me to do "num + str"; now guess: will it do arithmetic or concatenation?
For the code below the result is "55", but the important point is the confusion about what kind of operation it will do.
This happens because JavaScript is not a type safe language: it allows one type of data to be assigned to another type without restrictions.
<script>
	var num = 5;       // numeric
	var str = "5";     // string
	var z = num + str; // arithmetic or concat ????
	alert(z);          // displays "55"
</script>
C# is a type safe language. It does not allow a value of one data type to be assigned to another data type. The code below does not allow the "+" operator on different data types.
Concept:
To put it very simply, type safety, as the name suggests, makes sure that the types of your variables are safe, so that:
no wrong data type is used, e.g. you can't save or initialize a variable of string type with an integer
out-of-bound indexes are not accessible
only the specific memory location is allowed
So it is all about the safety of the types of your storage in terms of variables.
Type-safe means that programmatically, the type of data for a variable, return value, or argument must fit within a certain criteria.
In practice, this means that 7 (an integer type) is different from "7" (a quoted character of string type).
PHP, Javascript and other dynamic scripting languages are usually weakly-typed, in that they will convert a (string) "7" to an (integer) 7 if you try to add "7" + 3, although sometimes you have to do this explicitly (and Javascript uses the "+" character for concatenation).
C/C++/Java will not understand that, or will concatenate the result into "73" instead. Type-safety prevents these types of bugs in code by making the type requirement explicit.
Type-safety is very useful. The solution to the above "7" + 3 would be to type cast (int) "7" + 3 (equals 10).
Try this explanation on...
TypeSafe means that variables are statically checked for appropriate assignment at compile time. For example, consider a string or an integer. These two different data types cannot be cross-assigned (i.e., you can't assign an integer to a string, nor can you assign a string to an integer).
For non-typesafe behavior, consider this:
object x = 89;
int y;
if you attempt to do this:
y = x;
the compiler throws an error that says it can't convert a System.Object to an Integer. You need to do that explicitly. One way would be:
y = Convert.ToInt32( x );
The assignment above is not typesafe. A typesafe assignment is one where the types can be directly assigned to each other.
Non typesafe collections abound in ASP.NET (eg, the application, session, and viewstate collections). The good news about these collections is that (minimizing multiple server state management considerations) you can put pretty much any data type in any of the three collections. The bad news: because these collections aren't typesafe, you'll need to cast the values appropriately when you fetch them back out.
For example:
Session[ "x" ] = 34;
works fine. But to assign the integer value back, you'll need to:
int i = Convert.ToInt32( Session[ "x" ] );
Read about generics for ways that facility helps you easily implement typesafe collections.
C# is a typesafe language but watch for articles about C# 4.0; interesting dynamic possibilities loom (is it a good thing that C# is essentially getting Option Strict: Off... we'll see).
Type-Safe is code that accesses only the memory locations it is authorized to access, and only in well-defined, allowable ways.
Type-safe code cannot perform an operation on an object that is invalid for that object. The C# and VB.NET language compilers always produce type-safe code, which is verified to be type-safe during JIT compilation.
Type-safe means that the set of values that may be assigned to a program variable must fit well-defined and testable criteria. Type-safe variables lead to more robust programs because the algorithms that manipulate the variables can trust that the variable will only take one of a well-defined set of values. Keeping this trust ensures the integrity and quality of the data and the program.
For many variables, the set of values that may be assigned to a variable is defined at the time the program is written. For example, a variable called "colour" may be allowed to take on the values "red", "green", or "blue" and never any other values. For other variables those criteria may change at run-time. For example, a variable called "colour" may only be allowed to take on values in the "name" column of a "Colours" table in a relational database, where "red, "green", and "blue", are three values for "name" in the "Colours" table, but some other part of the computer program may be able to add to that list while the program is running, and the variable can take on the new values after they are added to the Colours table.
Many type-safe languages give the illusion of "type-safety" by insisting on strictly defining types for variables and only allowing a variable to be assigned values of the same "type". There are a couple of problems with this approach. For example, a program may have a variable "yearOfBirth" which is the year a person was born, and it is tempting to type-cast it as a short integer. However, it is not a short integer. This year, it is a number that is less than 2009 and greater than -10000. However, this set grows by 1 every year as the program runs. Making this a "short int" is not adequate. What is needed to make this variable type-safe is a run-time validation function that ensures that the number is always greater than -10000 and less than the next calendar year. There is no compiler that can enforce such criteria because these criteria are always unique characteristics of the problem domain.
Languages that use dynamic typing (or duck-typing, or manifest typing) such as Perl, Python, Ruby, SQLite, and Lua don't have the notion of typed variables. This forces the programmer to write a run-time validation routine for every variable to ensure that it is correct, or endure the consequences of unexplained run-time exceptions. In my experience, programmers in statically typed languages such as C, C++, Java, and C# are often lulled into thinking that statically defined types is all they need to do to get the benefits of type-safety. This is simply not true for many useful computer programs, and it is hard to predict if it is true for any particular computer program.
The long & the short.... Do you want type-safety? If so, then write run-time functions to ensure that when a variable is assigned a value, it conforms to well-defined criteria. The down-side is that it makes domain analysis really difficult for most computer programs because you have to explicitly define the criteria for each program variable.
Type Safety
In modern C++, type safety is very important. Type safety means that you use the types correctly and, therefore, avoid unsafe casts and unions. Every object in C++ is used according to its type and an object needs to be initialized before its use.
Safe Initialization: {}
The compiler protects from information loss during type conversion. For example,
int a{7};   The initialization is OK.
int b{7.5}; The compiler shows an ERROR because of information loss.
Unsafe Initialization: = or ()
The compiler doesn't protect from information loss during type conversion.
int a = 7;   The initialization is OK.
int a = 7.5; The initialization is OK, but information loss occurs. The actual value of a will become 7.
int c(7);    The initialization is OK.
int c(7.5);  The initialization is OK, but information loss occurs. The actual value of c will become 7.