Value polymorphism and "generating an exception"

Per The Definition of Standard ML (Revised):
The idea is that dynamic evaluation of a non-expansive expression will neither generate an exception nor extend the domain of the memory, while the evaluation of an expansive expression might.
[§4.7, p19; emphasis mine]
I've found a lot of information online about the ref-cell part, but almost none about the exception part. (A few sources point out that it's still possible for a polymorphic binding to raise Bind, and that this inconsistency can have type-theoretic and/or implementation consequences, but I'm not sure whether that's related.)
I've been able to come up with one exception-related unsoundness that, if I'm not mistaken, is prevented only by the value restriction; but that unsoundness does not depend on raising an exception:
local
  val (wrapAnyValueInExn, unwrapExnToAnyType) =
    let exception EXN of 'a
    in (EXN, fn EXN value => value)
    end
in
  val castAnyValueToAnyType =
    fn value => unwrapExnToAnyType (wrapAnyValueInExn value)
end
So, can anyone tell me what the Definition is getting at, and why it mentions exceptions?
(Is it possible that "generate an exception" means generating an exception name, rather than generating an exception packet?)

I'm not a type theorist or formal semanticist, but I think I understand what the definition is trying to get at from an operational point of view.
ML exceptions being generative means that, whenever the flow of control reaches the same exception declaration twice, two different exceptions are created. Not only are these distinct objects in memory, but they are also extensionally unequal: we can distinguish them by pattern-matching against exception constructors.
[Incidentally, this shows an important difference between ML exceptions and exceptions in most other languages. In ML, new exception classes can be created at runtime.]
On the other hand, if your program builds the same list of integers twice, you may have two different objects in memory, but your program has no way to distinguish between them. They are extensionally equal.
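Here is a small self-contained sketch of this (my own illustration, not from the Definition; make, isMine1, etc. are hypothetical names):

(* Each evaluation of "exception E" creates a brand-new exception name. *)
fun make () =
  let
    exception E
  in
    (E, fn x => (((raise x) handle E => true) handle _ => false))
  end

val (e1, isMine1) = make ()
val (e2, isMine2) = make ()

val a = isMine1 e1  (* true:  e1 came from the same evaluation of the declaration *)
val b = isMine1 e2  (* false: e2 carries a different exception name, so the
                     * handler for this generation's E does not match it *)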
As an example of why generative exceptions are useful, consider MLton's sample implementation of a universal type:
signature UNIV =
sig
  type univ
  val embed : unit -> { inject : 'a -> univ
                      , project : univ -> 'a option
                      }
end

structure Univ :> UNIV =
struct
  type univ = exn

  fun 'a embed () =
    let
      exception E of 'a
    in
      { inject = E
      , project = fn (E x) => SOME x | _ => NONE
      }
    end
end
This code would cause a huge type safety hole if ML had no value restriction:
val { inject = inj1, project = proj1 } = Univ.embed ()
val { inject = inj2, project = proj2 } = Univ.embed ()
(* `inj1` and `proj1` share the same internal exception. This is
* why `proj1` can project values injected with `inj1`.
*
* `inj2` and `proj2` similarly share the same internal exception.
* But this exception is different from the one used by `inj1` and
* `proj1`.
*
* Furthermore, the value restriction makes all of these functions
* monomorphic. However, at this point, we don't know yet what these
* monomorphic types might be.
*)
val univ1 = inj1 "hello"
val univ2 = inj2 5
(* Now we do know:
*
* inj1 : string -> Univ.univ
* proj1 : Univ.univ -> string option
* inj2 : int -> Univ.univ
* proj2 : Univ.univ -> int option
*)
val NONE = proj1 univ2
val NONE = proj2 univ1
(* Which confirms that exceptions are generative. *)
val SOME str = proj1 univ1
val SOME int = proj2 univ2
(* Without the value restriction, `str` and `int` would both
* have type `'a`, which is obviously unsound. Thanks to the
* value restriction, they have types `string` and `int`,
* respectively.
*)

[Hat-tip to Eduardo León's answer for stating that the Definition is indeed referring to this, and for bringing in the phrase "generative exceptions". I've upvoted his answer, but am posting this separately, because I felt that his answer came at the question from the wrong direction, somewhat: most of that answer is an exposition of things that are already presupposed by the question.]
Is it possible that "generate an exception" means generating an exception name, rather than generating an exception packet?
Yes, I think so. Although the Definition doesn't usually use the word "exception" alone, other sources do commonly refer to exception names as simply "exceptions" — including in the specific context of generating them. For example, from http://mlton.org/GenerativeException:
In Standard ML, exception declarations are said to be generative, because each time an exception declaration is evaluated, it yields a new exception.
(And as you can see there, that page consistently refers to exception names as "exceptions".)
The Standard ML Basis Library, likewise, uses "exception" in this way. For example, from page 29:
At one extreme, a programmer could employ the standard exception General.Fail everywhere, letting it carry a string describing the particular failure. […] For example, one technique is to have a function sampleFn in a structure Sample raise the exception Fail "Sample.sampleFn".
As you can see, this paragraph uses the term "exception" twice, once in reference to an exception name, and once in reference to an exception value, relying on context to make the meaning clear.
So it's quite reasonable for the Definition to use the phrase "generate an exception" to refer to generating an exception name (though even so, it is probably a small mistake; the Definition is usually more precise and formal than this, and usually indicates when it intends to rely on context for disambiguation).

Related

Do you have to declare a function's type?

One thing I do not fully understand about Haskell is declaring functions and their types: is it something you have to do or is it just something you should do for clarity? Or are there certain scenarios where you need to do it, just not all?
You don’t need to declare the type of any function that uses only standard Haskell type system features. Haskell 98 is specified with global type inference, meaning that the types of all top-level bindings are guaranteed to be inferable.
However, it’s good practice to include type annotations for top-level definitions, for a few reasons:
Verifying that the inferred type matches your expectations
Helping the compiler produce better diagnostic messages when there are type mismatches
Most importantly, documenting your intent and making the code more readable for humans!
As for definitions in where clauses, it's a matter of style. The conventional style is to omit them, partly because in some cases their types could not be written explicitly before the ScopedTypeVariables extension. I consider the lack of scoped type variables a bit of a bug in the 1998 and 2010 standards; GHC is the de facto standard compiler today, but ScopedTypeVariables is still a nonstandard extension. Regardless, for nontrivial code it's good practice to include annotations where possible, and helpful for you as a programmer.
In practice, it’s common to use some language extensions that complicate type inference or make it “undecidable”, meaning that, at least for arbitrary programs, it’s impossible to always infer a type, or at least a unique “best” type. But for the sake of usability, extensions are usually designed very carefully to only require annotations at the point where you actually use them.
For example, GHC (and standard Haskell) will only infer polymorphic types with top-level foralls, which are normally left entirely implicit. (They can be written explicitly using ExplicitForAll.) If you need to pass a polymorphic function as an argument to another function, as in (forall t. …) -> … with RankNTypes, an annotation is required to override the compiler's assumption that you meant something like forall t. (… -> …) and that you mistakenly applied the function at different types.
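Here is a minimal, made-up sketch of where such an annotation becomes mandatory (applyToBoth is my own name, not from anything above):

{-# LANGUAGE RankNTypes #-}

-- The rank-2 signature below is required. Without it, GHC would try to
-- use the argument f at a single type, and then reject applying it to
-- both an Int and a Bool.
applyToBoth :: (forall t. t -> t) -> (Int, Bool)
applyToBoth f = (f 3, f True)

main :: IO ()
main = print (applyToBoth id)  -- prints (3,True)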
If an extension requires annotations, the rules for when and where you must include them are typically documented in places like the GHC User’s Guide, and formally specified in the papers specifying the feature.
Short answer: Functions are defined in "bindings" and have their types declared in "type signatures". Type signatures for bindings are always syntactically optional, as the language doesn't require their use in any particular case. (There are some places type signatures are required for things other than bindings, like in class definitions or in declarations of data types, but I don't think there's any case where a binding requires an accompanying type signature according to the syntax of the language, though I might be forgetting some weird situation.) The reason they aren't required is that the compiler can usually, though not always, figure out the types of functions itself as part of its type-checking operation.
However, some programs may not compile unless a type signature is added to a binding, and some programs may not compile unless a type signature is removed, so sometimes you need to use them, and sometimes you can't use them (at least not without a language extension and some changes to the syntax of other, nearby type signatures to use the extension).
It is considered best practice to include type signatures for every top-level binding, and the GHC -Wall flag will warn you if any top-level bindings lack an associated type signature. The rationale for this is that top-level signatures (1) provide documentation for the "interface" of your code, (2) aren't so numerous that they overburden the programmer, and (3) usually provide sufficient guidance to the compiler that you get better error messages than if you omit type signatures entirely.
If you look at almost any real-world Haskell source code (e.g., browse the source of any decent library on Hackage), you'll see this convention being used -- all top-level bindings have type signatures, and type signatures are used sparingly in other contexts (in expressions or where or let clauses). I'd encourage any beginner to use this same convention in the code they write as they're learning Haskell. It's a good habit and will avoid many frustrating error messages.
Long answer:
In Haskell, a binding assigns a name to a chunk of code, like the following binding for the function hypo:
hypo a b = sqrt (a*a + b*b)
When compiling a binding (or collection of related bindings), the compiler performs a type-checking operation on the expressions and subexpressions that are involved.
It is this type-checking operation that allows the compiler to determine that the variable a in the above expression must be of some type t that has a Num t constraint (in order to support the * operation), that the result of a*a will be the same type t, and that this implies that b*b and so b are also of this same type t (since only two values of the same type can be added together with +), and that a*a + b*b is therefore of type t, and so the result of the sqrt must also be of this same type t which must incidentally have a Floating t constraint to support the sqrt operation. The information collected and type relationships deduced during this type checking allow the compiler to infer a general type signature for the hypo function automatically, namely:
hypo :: (Floating t) => t -> t -> t
(The Num t constraint doesn't appear because it's implied by Floating t).
Because the compiler can learn the type signatures of (most) bound names, like hypo, automatically as a side-effect of the type-checking operation, there's no fundamental need for the programmer to explicitly supply this information, and that's the motivation for the language making type signatures optional. The only requirements the language places on type signatures is that if they are supplied, they must appear in the same declaration list as the associated binding (e.g., both must appear in the same module, or in the same where clause or whatever, and you can't have a type signature without a binding), there must be at most one type signature for a binding (no duplicate type signatures, even if they are identical, unlike in C, say), and the type supplied in the type signature must not be in conflict with the results of type checking.
The language allows the type signature and binding to appear anywhere in the same declaration list, in any order, and with no requirement they be next to each other, so the following is valid Haskell code:
double :: (Num a) => a -> a
half x = x / 2
double x = x + x
half :: (Fractional a) => a -> a
Such silliness is not recommended, however, and the convention is to place the type signature immediately before the corresponding binding, though one exception is to have a type signature shared across multiple bindings of the same type, whose definitions follow:
ex1, ex2, ex3 :: Tree Int
ex1 = Leaf 1
ex2 = Node (Leaf 2) (Leaf 3)
ex3 = Node (Node (Leaf 4) (Leaf 5)) (Leaf 5)
In some situations, the compiler cannot automatically infer the correct type for a binding, and a type signature may be required. The following binding requires a type signature and won't compile without it. (The technical problem is that toList is written using polymorphic recursion.)
data Binary a = Leaf a | Pair (Binary (a,a)) deriving (Show)
-- following type signature is required...
toList :: Binary a -> [a]
toList (Leaf x) = [x]
toList (Pair b) = concatMap (\(x,y) -> [x,y]) (toList b)
In other situations, the compiler can automatically infer the correct type for a binding, but the type can't be expressed in a type signature (at least, not without some GHC extensions to the standard language). This happens most often in where clauses. (The technical problem is that type variables aren't scoped, and go's type involves the type variable a from the type signature of myLookup.)
myLookup :: Eq a => a -> [(a,b)] -> Maybe b
myLookup k = go
  where
    -- go :: [(a,b)] -> Maybe b
    go ((k',v):rest) | k == k'   = Just v
                     | otherwise = go rest
    go [] = Nothing
There's no type signature in standard Haskell for go that would work here. However, if you enable an extension, you can write one if you also modify the type signature for myLookup itself to scope the type variables.
{-# LANGUAGE ScopedTypeVariables #-}

myLookup :: forall a b. Eq a => a -> [(a,b)] -> Maybe b
myLookup k = go
  where
    go :: [(a,b)] -> Maybe b
    go ((k',v):rest) | k == k'   = Just v
                     | otherwise = go rest
    go [] = Nothing
It's considered best practice to put type signatures on all top-level bindings and use them sparingly elsewhere. The -Wall compiler flag turns on the -Wmissing-signatures warning which warns about any missing top-level signatures.
The main motivation, I think, is that top-level bindings are the ones that are most likely to be used in multiple places throughout the code at some distance from where they are defined, and the type signature usually provides concise documentation for what a function does and how it's intended to be used. Consider the following type signatures from a Sudoku solver I wrote many years ago. Is there much doubt what these functions do?
possibleSymbols :: Index -> Board -> [Symbol]
possibleBoards :: Index -> Board -> [Board]
setSymbol :: Index -> Board -> Symbol -> Board
While the type signatures auto-generated by the compiler also serve as decent documentation and can be inspected in GHCi, it's convenient to have the type signatures in the source code, as a form of compiler-checked comment documenting the binding's purpose.
Any Haskell programmer who's spent a moment trying to use an unfamiliar library, read someone else's code, or read their own past code knows how helpful top-level signatures are as documentation. (Admittedly, a frequently levelled criticism of Haskell is that sometimes the type signatures are the only documentation for a library.)
A secondary motivation is that in developing and refactoring code, type signatures make it easier to "control" the types and localize errors. Without any signatures, the compiler can infer some really crazy types for code, and the error messages that get generated can be baffling, often identifying parts of the code that have nothing to do with the underlying error.
For example, consider this program:
data Tree a = Leaf a | Node (Tree a) (Tree a)
leaves (Leaf x) = x
leaves (Node l r) = leaves l ++ leaves r
hasLeaf x t = elem x (leaves t)
main = do
  -- some tests
  print $ hasLeaf 1 (Leaf 1)
  print $ hasLeaf 1 (Node (Leaf 2) (Leaf 3))
The functions leaves and hasLeaf compile fine, but main barfs out the following cascade of errors (abbreviated for this posting):
Leaves.hs:12:11-28: error:
• Ambiguous type variable ‘a0’ arising from a use of ‘hasLeaf’
prevents the constraint ‘(Eq a0)’ from being solved.
Probable fix: use a type annotation to specify what ‘a0’ should be.
Leaves.hs:12:19: error:
• Ambiguous type variable ‘a0’ arising from the literal ‘1’
prevents the constraint ‘(Num a0)’ from being solved.
Probable fix: use a type annotation to specify what ‘a0’ should be.
Leaves.hs:12:27: error:
• No instance for (Num [a0]) arising from the literal ‘1’
Leaves.hs:13:11-44: error:
• Ambiguous type variable ‘a1’ arising from a use of ‘hasLeaf’
prevents the constraint ‘(Eq a1)’ from being solved.
Probable fix: use a type annotation to specify what ‘a1’ should be.
Leaves.hs:13:19: error:
• Ambiguous type variable ‘a1’ arising from the literal ‘1’
prevents the constraint ‘(Num a1)’ from being solved.
Probable fix: use a type annotation to specify what ‘a1’ should be.
Leaves.hs:13:33: error:
• No instance for (Num [a1]) arising from the literal ‘2’
With programmer-supplied top-level type signatures:
leaves :: Tree a -> [a]
leaves (Leaf x) = x
leaves (Node l r) = leaves l ++ leaves r
hasLeaf :: (Eq a) => a -> Tree a -> Bool
hasLeaf x t = elem x (leaves t)
a single error is immediately localized to the offending line:
leaves (Leaf x) = x
^
Leaves.hs:4:19: error:
• Occurs check: cannot construct the infinite type: a ~ [a]
Beginners might not understand the "occurs check" but are at least looking at the right place to make the simple fix:
leaves (Leaf x) = [x]
So, why not add type signatures everywhere, not just at top-level? Well, if you literally tried to add type signatures everywhere they were syntactically valid, you'd be writing code like:
{-# LANGUAGE ScopedTypeVariables #-}
hypo :: forall t. (Floating t) => t -> t -> t
hypo (a :: t) (b :: t) = sqrt (((a :: t) * (a :: t) :: t) + ((b :: t) * (b :: t) :: t) :: t) :: t
so you want to draw the line somewhere. The main argument against adding them for all bindings in let and where clauses is that those bindings are often short bindings easily understood at a glance, and they're localized to the code that you're trying to understand "all at once" anyway. The signatures are also potentially less useful as documentation because bindings in these clauses are more likely to refer to and use other nearby bindings of arguments or intermediate results, so they aren't "self-contained" like a top-level binding. The signature only documents a small portion of what the binding is doing. For example, in:
qsort :: (Ord a) => [a] -> [a]
qsort (x:xs) = qsort l ++ [x] ++ qsort r
  where
    -- l, r :: [a]
    l = filter (<=x) xs
    r = filter (>x) xs
qsort [] = []
having type signatures l, r :: [a] in the where clause wouldn't add very much. There's also the additional complication that you'd need the ScopedTypeVariables extension to write it, as above, so that's maybe another reason to omit it.
As I say, I think any Haskell beginner should be encouraged to adopt a similar convention of writing top-level type signatures, ideally writing the top-level signature before starting to write the accompanying bindings. It's one of the easiest ways to leverage the type system to guide the design process and write good Haskell code.

Practical difference between def f(x: Int) = x+1 and val f = (x: Int) => x+1 in Scala

I'm new to Scala and I'm having a problem understanding this. Why are there two syntaxes for the same concept, with neither of them more efficient or shorter than the other? (That is merely from a typing standpoint; maybe they differ in behavior, which is what I'm asking.)
In Go the analogues have a practical difference - you can't forward-reference the lambda assigned to a variable, but you can reference a named function from anywhere. Scala blends these two if I understand it correctly: you can forward-reference any variable (please correct me if I'm wrong).
Please note that this question is not a duplicate of What is the difference between “def” and “val” to define a function.
I know that def evaluates the expression after = each time it is referenced/called, and val only once. But this is different because the expression in the val definition evaluates to a function.
It is also not a duplicate of Functions vs methods in Scala.
This question concerns the syntax of Scala, and is not asking about the difference between functions and methods directly. Even though the answers may be similar in content, it's still valuable to have this exact point cleared up on this site.
There are three main differences (that I know of):
1. Internal Representation
Function expressions (aka anonymous functions or lambdas) are represented in the generated bytecode as instances of any of the Function traits. This means that function expressions are also objects. Method definitions, on the other hand, are first class citizens on the JVM and have a special bytecode representation. How this impacts performance is hard to tell without profiling.
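To make this concrete, here is a rough sketch (my own, and simplified; the exact representation varies across Scala versions) of what a function literal desugars to:

// A function value is (roughly) an anonymous Function1 instance:
val explicit: Int => Int = new Function1[Int, Int] {
  def apply(x: Int): Int = x + 1
}
val sugar: Int => Int = (x: Int) => x + 1

explicit(41)  // 42 -- explicit(41) is sugar for explicit.apply(41)
sugar(41)     // 42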
2. Reference Syntax
References to functions and methods have different syntaxes. You can't just say foo when you want to send the reference of a method as an argument to some other part of your code. You'll have to say foo _. With functions you can just say foo and things will work as intended. The syntax foo _ is effectively wrapping the call to foo inside an anonymous function.
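A short sketch of the difference (Scala 2 semantics; Scala 3 eta-expands method references automatically, so the trailing underscore is rarely needed there). The names are made up for illustration:

def addOne(x: Int): Int = x + 1      // a method
val addOneF = (x: Int) => x + 1      // a function value

val f: Int => Int = addOne _         // method: explicit eta-expansion with _
val g: Int => Int = addOneF          // function value: already a value

List(1, 2, 3).map(addOne _)          // List(2, 3, 4)
List(1, 2, 3).map(addOneF)           // List(2, 3, 4)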
3. Generics Support
Methods support type parametrization, functions do not. For example, there's no way to express the following using a function value:
def identity[A](a: A): A = a
The closest would be this, but it loses the type information:
val identity = (a: Any) => a
As an extension to Ionut's first point, it may be worth taking a quick look at http://www.scala-lang.org/api/current/#scala.Function1.
From my understanding, an instance of a function as you described (i.e. val f = (x: Int) => x + 1) extends the Function1 trait. The implication is that an instance of a function consumes more memory than a method definition. Methods are native to the JVM, so they can be resolved at compile time. The obvious cost of a Function is its memory consumption, but with it come added benefits such as composition with other Function objects.
If I understand correctly, the reason defs and lambdas can work together is that the Function1 class has a self-type (T1) ⇒ R, which is implied by its apply() method (https://github.com/scala/scala/blob/v2.11.8/src/library/scala/Function1.scala#L36). (At least I think that's what's going on; please correct me if I'm wrong.) This is all just my own speculation, however. There's certain to be some extra compiler magic taking place underneath to allow method and function interoperability.

Are there disadvantages to return type inference? If yes, what are they?

A lot of statically typed languages, like C++ and C#, have local variable type inference (with the keywords auto and var respectively, I think).
However, I haven't seen many C-derived languages (apart from those mentioned in the comments) implementing compile-time return type inference. I'll describe what I mean by "return type inference" before I ask the question. (I definitely don't mean overloading by return type.)
Consider this code in a hypothetical C#-like language:
private auto SomeMethod(int x)
{
return 3 * x;
}
It's more than obvious (to humans and to the compiler) that the return type is int (and the compilers can verify it).
The same goes for multiple paths:
private auto SomeOtherMethod(int x)
{
if(x == 0) return 1;
else return 3 * x;
}
It's still not ambiguous at all, because there is already an algorithm in said languages to resolve whether two expressions have compatible types:
private auto YetAnotherMethod(int x)
{
var r = (x == 0) ? 1 : 3 * x;
return r;
}
Since the algorithm exists and it is already implemented in some form, it's probably not a technical problem in this regard. But still, I haven't seen it anywhere in statically typed languages, which got me thinking about whether there's something bad about it.
My question:
Does return type inference, as a concept, have any disadvantage or subtle pitfall that I'm not seeing? (Apart from readability - I already understand that.)
Is there some corner case where it would introduce problems or ambiguity to a statically typed language? (By "introduce", I'm referring to issues that local variable type inference doesn't already have.)
Yes, there are disadvantages. One you already mentioned: readability. Second, the type has to be computed, which takes time (and in Turing-complete type systems it may never terminate). But there is also something else: the theory of type systems becomes much more complicated.
Let's write a function that takes a list and returns its head. What's its type? Or a function that takes a function and a parameter, applies the one to the other, and returns the result. In many languages you can't declare such types. To support this kind of thing, Java introduced generics, and it failed miserably; generics are currently one of the most hated features of the language because of consistency problems.
Another thing: the return type may depend not only on the body of the function but also on the context of the invocation. Look at Haskell (which has the best type system I've ever seen): http://learnyouahaskell.com/types-and-typeclasses
There is a function called read that takes a string, parses it, and returns... whatever you need: an Int, an array, and so on.
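For example (a minimal, self-contained sketch):

-- The return type of 'read' is chosen by the calling context:
n :: Int
n = read "42"

xs :: [Int]
xs = read "[1,2,3]"

main :: IO ()
main = print (n, xs)  -- prints (42,[1,2,3])

-- With no context at all, `read "42"` by itself is ambiguous and rejected.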
So each time a type system is designed, the designer has to choose at which level to stop. Dynamic languages decided not to infer types at all; Scala decided to do some local inference, but not, for example, for overloaded or recursive functions; and C++ decided not to infer function results.

What is an existential type?

I read through the Wikipedia article Existential types. I gathered that they're called existential types because of the existential operator (∃). I'm not sure what the point of it is, though. What's the difference between
T = ∃X { X a; int f(X); }
and
T = ∀X { X a; int f(X); }
?
When someone defines a universal type ∀X they're saying: You can plug in whatever type you want, I don't need to know anything about the type to do my job, I'll only refer to it opaquely as X.
When someone defines an existential type ∃X they're saying: I'll use whatever type I want here; you won't know anything about the type, so you can only refer to it opaquely as X.
Universal types let you write things like:
void copy<T>(List<T> source, List<T> dest) {
...
}
The copy function has no idea what T will actually be, but it doesn't need to know.
Existential types would let you write things like:
interface VirtualMachine<B> {
  B compile(String source);
  void run(B bytecode);
}

// Now, if you had a list of VMs you wanted to run on the same input:
void runAllCompilers(List<∃B:VirtualMachine<B>> vms, String source) {
  for (∃B:VirtualMachine<B> vm : vms) {
    B bytecode = vm.compile(source);
    vm.run(bytecode);
  }
}
Each virtual machine implementation in the list can have a different bytecode type. The runAllCompilers function has no idea what the bytecode type is, but it doesn't need to; all it does is relay the bytecode from VirtualMachine.compile to VirtualMachine.run.
Java type wildcards (ex: List<?>) are a very limited form of existential types.
Update: Forgot to mention that you can sort of simulate existential types with universal types. First, wrap your universal type to hide the type parameter. Second, invert control (this effectively swaps the "you" and "I" part in the definitions above, which is the primary difference between existentials and universals).
// A wrapper that hides the type parameter 'B'
interface VMWrapper {
  void unwrap(VMHandler handler);
}

// A callback (control inversion)
interface VMHandler {
  <B> void handle(VirtualMachine<B> vm);
}
Now we can have the VMWrapper call our own VMHandler, which has a universally-typed handle function. The net effect is the same: our code has to treat B as opaque.
void runWithAll(List<VMWrapper> vms, final String input)
{
  for (VMWrapper vm : vms) {
    vm.unwrap(new VMHandler() {
      public <B> void handle(VirtualMachine<B> vm) {
        B bytecode = vm.compile(input);
        vm.run(bytecode);
      }
    });
  }
}
An example VM implementation:
class MyVM implements VirtualMachine<byte[]>, VMWrapper {
  public byte[] compile(String input) {
    return null; // TODO: somehow compile the input
  }
  public void run(byte[] bytecode) {
    // TODO: somehow evaluate 'bytecode'
  }
  public void unwrap(VMHandler handler) {
    handler.handle(this);
  }
}
A value of an existential type like ∃x. F(x) is a pair containing some type x and a value of the type F(x). Whereas a value of a polymorphic type like ∀x. F(x) is a function that takes some type x and produces a value of type F(x). In both cases, the type closes over some type constructor F.
Note that this view mixes types and values. The existential proof is one type and one value. The universal proof is an entire family of values indexed by type (or a mapping from types to values).
So the difference between the two types you specified is as follows:
T = ∃X { X a; int f(X); }
This means: a value of type T contains a type called X, a value a:X, and a function f:X->int. A producer of values of type T gets to choose any type for X, and a consumer can't know anything about X, except that there's one example of it called a, and that this value can be turned into an int by giving it to f. In other words, a value of type T knows how to produce an int somehow. Well, we could eliminate the intermediate type X and just say:
T = int
The universally quantified one is a little different.
T = ∀X { X a; int f(X); }
This means: A value of type T can be given any type X, and it will produce a value a:X, and a function f:X->int no matter what X is. In other words: a consumer of values of type T can choose any type for X. And a producer of values of type T can't know anything at all about X, but it has to be able to produce a value a for any choice of X, and be able to turn such a value into an int.
Obviously implementing this type is impossible, because there is no program that can produce a value of every imaginable type. Unless you allow absurdities like null or bottoms.
Since an existential is a pair, an existential argument can be converted to a universal one via currying.
(∃b. F(b)) -> Int
is the same as:
∀b. (F(b) -> Int)
The former is a rank-2 existential. This leads to the following useful property:
Every existentially quantified type of rank n+1 is a universally quantified type of rank n.
There is a standard algorithm for turning existentials into universals, called Skolemization.
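A small Haskell sketch of this correspondence (my own, using GHC's ExistentialQuantification and RankNTypes extensions, which the pseudo-notation above doesn't assume):

{-# LANGUAGE ExistentialQuantification, RankNTypes #-}

-- Pack some hidden type b together with evidence F(b) = (b, b -> Int).
data Ex = forall b. Ex b (b -> Int)

-- Consuming an existential: (∃b. F(b)) -> Int ...
fromEx :: Ex -> Int
fromEx (Ex x f) = f x

-- ... corresponds to the curried universal: ∀b. (F(b) -> Int)
fromAll :: forall b. (b, b -> Int) -> Int
fromAll (x, f) = f x

main :: IO ()
main = print (fromEx (Ex "hello" length), fromAll ("hello", length))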
I think it makes sense to explain existential types together with universal types, since the two concepts are complementary, i.e. one is the "opposite" of the other.
I cannot answer every detail about existential types (such as giving an exact definition, list all possible uses, their relation to abstract data types, etc.) because I'm simply not knowledgeable enough for that. I'll demonstrate only (using Java) what this HaskellWiki article states to be the principal effect of existential types:
Existential types can be used for several different purposes. But what they do is to 'hide' a type variable on the right-hand side. Normally, any type variable appearing on the right must also appear on the left […]
Example set-up:
The following pseudo-code is not quite valid Java, even though it would be easy enough to fix that. In fact, that's exactly what I'm going to do in this answer!
class Tree<α>
{
  α value;
  Tree<α> left;
  Tree<α> right;
}

int height(Tree<α> t)
{
  return (t != null) ? 1 + max( height(t.left), height(t.right) )
                     : 0;
}
Let me briefly spell this out for you. We are defining…
a recursive type Tree<α> which represents a node in a binary tree. Each node stores a value of some type α and has references to optional left and right subtrees of the same type.
a function height which returns the furthest distance from any leaf node to the root node t.
Now, let's turn the above pseudo-code for height into proper Java syntax! (I'll keep on omitting some boilerplate for brevity's sake, such as object-orientation and accessibility modifiers.) I'm going to show two possible solutions.
1. Universal type solution:
The most obvious fix is to simply make height generic by introducing the type parameter α into its signature:
<α> int height(Tree<α> t)
{
  return (t != null) ? 1 + max( height(t.left), height(t.right) )
                     : 0;
}
This would allow you to declare variables and create expressions of type α inside that function, if you wanted to. But...
2. Existential type solution:
If you look at our method's body, you will notice that we're not actually accessing, or working with, anything of type α! There are no expressions having that type, nor any variables declared with that type... so, why do we have to make height generic at all? Why can't we simply forget about α? As it turns out, we can:
int height(Tree<?> t)
{
  return (t != null) ? 1 + max( height(t.left), height(t.right) )
                     : 0;
}
As I wrote at the very beginning of this answer, existential and universal types are complementary / dual in nature. Thus, if the universal type solution was to make height more generic, then we should expect that existential types have the opposite effect: making it less generic, namely by hiding/removing the type parameter α.
As a consequence, you can no longer refer to the type of t.value in this method nor manipulate any expressions of that type, because no identifier has been bound to it. (The ? wildcard is a special token, not an identifier that "captures" a type.) t.value has effectively become opaque; perhaps the only thing you can still do with it is type-cast it to Object.
Summary:
=====================================================
                     |  universally     existentially
                     |  quantified type quantified type
---------------------+-------------------------------
calling method       |
needs to know        |  yes             no
the type argument    |
---------------------+-------------------------------
called method        |
can use / refer to   |  yes             no
the type argument    |
=====================================================
These are all good examples, but I choose to answer it a little bit differently. Recall from math, that ∀x. P(x) means "for all x's, I can prove that P(x)". In other words, it is a kind of function, you give me an x and I have a method to prove it for you.
In type theory, we are not talking about proofs, but of types. So in this space we mean "for any type X you give me, I will give you a specific type P". Now, since we don't give P much information about X besides the fact that it is a type, P can't do much with it, but there are some examples. P can create the type of "all pairs of the same type": P<X> = Pair<X, X> = (X, X). Or we can create the option type: P<X> = Option<X> = X | Nil, where Nil is the type of the null pointers. We can make a list out of it: List<X> = (X, List<X>) | Nil. Notice that the last one is recursive, values of List<X> are either pairs where the first element is an X and the second element is a List<X> or else it is a null pointer.
Now, in math ∃x. P(x) means "I can prove that there is a particular x such that P(x) is true". There may be many such x's, but to prove it, one is enough. Another way to think of it is that there must exist a non-empty set of evidence-and-proof pairs {(x, P(x))}.
Translated to type theory: a type in the family ∃X.P<X> is a type X and a corresponding type P<X>. Notice that whereas before we gave X to P (so that we knew everything about X, but P knew very little about it), the opposite is true now: P<X> doesn't promise to give any information about X, just that there is one, and that it is indeed a type.
How is this useful? Well, P could be a type that has a way of exposing its internal type X. An example would be an object which hides the internal representation of its state X. Though we have no way of directly manipulating it, we can observe its effect by poking at P. There could be many implementations of this type, but you could use all of these types no matter which particular one was chosen.
To directly answer your question:
With the universal type, uses of T must include the type parameter X. For example T<String> or T<Integer>. For the existential type uses of T do not include that type parameter because it is unknown or irrelevant - just use T (or in Java T<?>).
Further information:
Universal/abstract types and existential types are a duality of perspective between the consumer/client of an object/function and the producer/implementation of it. When one side sees a universal type the other sees an existential type.
In Java you can define a generic class:
public class MyClass<T> {
  // T is existential in here
  T whatever;
  public MyClass(T w) { this.whatever = w; }
  public static MyClass<?> secretMessage() { return new MyClass("bazzlebleeb"); }
}

// T is universal from out here
MyClass<String> mc1 = new MyClass("foo");
MyClass<Integer> mc2 = new MyClass(123);
MyClass<?> mc3 = MyClass.secretMessage();
From the perspective of a client of MyClass, T is universal, because you can substitute any type for T when you use that class, and you must know the actual type of T whenever you use an instance of MyClass.
From the perspective of instance methods in MyClass itself, T is existential, because it doesn't know the real type of T.
In Java, ? represents the existential type - thus when you are inside the class, T is basically ?. If you want to handle an instance of MyClass with T existential, you can declare MyClass<?> as in the secretMessage() example above.
Existential types are sometimes used to hide the implementation details of something, as discussed elsewhere. A Java version of this might look like:
public class ToDraw<T> {
  T obj;
  Function<Pair<T,Graphics>, Void> draw;
  ToDraw(T obj, Function<Pair<T,Graphics>, Void> draw) {
    this.obj = obj;
    this.draw = draw;
  }
  static void draw(ToDraw<?> d, Graphics g) { d.draw.apply(new Pair(d.obj, g)); }
}

// Now you can put these in a list and draw them like so:
List<ToDraw<?>> drawList = ... ;
for (ToDraw<?> td : drawList) ToDraw.draw(td, g);
It's a bit tricky to capture this properly because I'm pretending to be in some sort of functional programming language, which Java isn't. But the point here is that you are capturing some sort of state plus a list of functions that operate on that state and you don't know the real type of the state part, but the functions do since they were matched up with that type already.
Now, in Java all non-final non-primitive types are partly existential. This may sound strange, but because a variable declared as Object could potentially be a subclass of Object instead, you cannot declare the specific type, only "this type or a subclass". And so, objects are represented as a bit of state plus a list of functions that operate on that state - exactly which function to call is determined at runtime by lookup. This is very much like the use of existential types above where you have an existential state part and a function that operates on that state.
In statically typed programming languages without subtyping and casts, existential types allow one to manage lists of differently typed objects. A list of T<Int> cannot contain a T<Long>. However, a list of T<?> can contain any variation of T, allowing one to put many different types of data into the list and convert them all to an int (or do whatever operations are provided inside the data structure) on demand.
One can pretty much always convert a record with an existential type into a record without using closures. A closure is existentially typed, too, in that the free variables it is closed over are hidden from the caller. Thus a language that supports closures but not existential types can allow you to make closures that share the same hidden state that you would have put into the existential part of an object.
An existential type is an opaque type.
Think of a file handle in Unix. You know its type is int, so you can easily forge it. You can, for instance, try to read from handle 43. If it so happens that the program has a file open with this particular handle, you'll read from it. Your code doesn't have to be malicious, just sloppy (e.g., the handle could be an uninitialized variable).
An existential type is hidden from your program. If fopen returned an existential type, all you could do with it is to use it with some library functions that accept this existential type. For instance, the following pseudo-code would compile:
let exfile = fopen("foo.txt"); // No type for exfile!
read(exfile, buf, size);
The interface "read" is declared as:
There exists a type T such that:
size_t read(T exfile, char* buf, size_t size);
The variable exfile is not an int, not a char*, not a struct File—nothing you can express in the type system. You can't declare a variable whose type is unknown and you cannot cast, say, a pointer into that unknown type. The language won't let you.
Seems I’m coming a bit late, but anyway, this document adds another view of what existential types are, although not specifically language-agnostic, it should be then fairly easier to understand existential types: http://www.cs.uu.nl/groups/ST/Projects/ehc/ehc-book.pdf (chapter 8)
The difference between a universally and existentially quantified type can be characterized by the following observation:
The use of a value with a ∀ quantified type determines the type to choose for the instantiation of the quantified type variable. For example, the caller of the identity function “id :: ∀a.a → a” determines the type to choose for the type variable a for this particular application of id. For the function application “id 3” this type equals Int.
The creation of a value with a ∃ quantified type determines, and hides, the type of the quantified type variable. For example, a creator of a “∃a.(a, a → Int)” may have constructed a value of that type from “(3, λx → x)”; another creator has constructed a value with the same type from “(’x’, λx → ord x)”. From a users point of view both values have the same type and are thus interchangeable. The value has a specific type chosen for type variable a, but we do not know which type, so this information can no longer be exploited. This value specific type information has been ‘forgotten’; we only know it exists.
A universal type exists for all values of the type parameter(s). An existential type exists only for values of the type parameter(s) that satisfy the constraints of the existential type.
For example in Scala one way to express an existential type is an abstract type which is constrained to some upper or lower bounds.
trait Existential {
  type Parameter <: Interface
}
Equivalently a constrained universal type is an existential type as in the following example.
trait Existential[Parameter <: Interface]
Any use site can employ the Interface because any instantiable subtypes of Existential must define the type Parameter which must implement the Interface.
A degenerate case of an existential type in Scala is an abstract type which is never referred to and thus need not be defined by any subtype. This effectively has a shorthand notation of List[_] in Scala and List<?> in Java.
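A tiny Scala sketch of that shorthand (my own example):

// An existential element type lets one list hold differently parameterized
// values; only operations that don't mention the hidden type are available.
val xs: List[List[_]] = List(List(1, 2, 3), List("a", "b"))
xs.map(_.length)  // List(3, 2)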
My answer was inspired by Martin Odersky's proposal to unify abstract and existential types. The accompanying slide aids understanding.
Research into abstract datatypes and information hiding brought existential types into programming languages. Making a datatype abstract hides info about that type, so a client of that type cannot abuse it. Say you've got a reference to an object... some languages allow you to cast that reference to a reference to bytes and do anything you want to that piece of memory. For purposes of guaranteeing behavior of a program, it's useful for a language to enforce that you only act on the reference to the object via the methods the designer of the object provides. You know the type exists, but nothing more.
See:
Abstract Types Have Existential Type, Mitchell & Plotkin:
http://theory.stanford.edu/~jcm/papers/mitch-plotkin-88.pdf
I created a diagram for this (not reproduced here); I don't know if it's rigorous, but if it helps, I'm glad.
As I understand it, this is a mathematical way to describe interfaces/abstract classes.
As for T = ∃X { X a; int f(X); }
For C# it would translate to a generic abstract type:
abstract class MyType<T> {
  private T a;
  public abstract int f(T x);
}
"Existential" just means that there is some type that obey to the rules defined here.

Poll: Correct behavior of equality when passed object does not match LHS type?

I asked a related question about findbugs, but let's ask a more general question.
Suppose that I am working with an object-oriented language in which polymorphism is possible.
Suppose that the language supports static type checking (e.g., Java, C++)
Suppose that the language does not allow variance in parameters (e.g., Java, again...)
If I am overriding the equality operation, which takes Object as a parameter, what should I do when the parameter is not the same type as, or a subtype of, the LHS on which equals was called?
Option 1 - Return false, because the objects are clearly not equal.
Option 2 - Throw a casting exception, because if the language actually supported variance (which would have been preferable), this would have been caught at compile time as an error; thus, detecting this error at runtime makes sense, since a situation where another type is sent should have been illegal.
I vote for option 1. It is possible for two objects of different types to be equal; for example, int and double, if they were first-class objects, could validly be cast to each other and are mathematically comparable. Also, you may want to consider differing subclasses equal in some respects, even though neither can be cast to the other (though they may be derived from the same parent).
Return false, because the objects are not equal.
I don't see how throwing a ClassCastException would be any better here.
There are contracts in interfaces such as Collection or List that actually depend on any object being able to check for equality with any other object.
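For illustration, a minimal Java sketch of option 1 (Point and its fields are hypothetical names, not from the question):

public final class Point {
    private final int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false; // option 1: wrong type simply means "not equal"
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() { return 31 * x + y; } // kept consistent with equals
}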
It depends.
SomeClass obj1 = new SomeClass();
object other = (object)obj1;
return obj1.Equals(other); // should return "true", since they are really the same reference.

SomeClass obj1 = new SomeClass();
object other = new object();
return obj1.Equals(other); // should return "false", because they're different reference objects.

class SomeClass { }
class OtherClass { }

SomeClass obj1 = new SomeClass();
OtherClass obj2 = new OtherClass();
return obj1.Equals(obj2); // should return "false", because they're different reference objects.
If the types are two completely different types that don't inherit from one another, then there is no possible way they can be the same reference.
You shouldn't be throwing any type of casting exception because you're accepting the base object class as a parameter.
Hmmm.. I like Option 1 as well, for the following 4 reasons:
1) The objects were apparently not equal by whatever condition you used to check
2) Throwing ClassCastException requires a check for that exception every time you do a comparison, and I think that contributes to the code being less understandable or at least longer...
3) The class cast exception is merely a symptom of the problem, which is that the two objects were not equal, even at the type level.
4) As user "cbo" mentioned above, double/int being equal despite their types being different (4.0 == 4) and this applies to other types as well.
Disclaimer: I may be letting my Python ways colour a Java debate :)
Dr. Dobb's Java Q&A says the best practice is that both objects be of the same type for equality to hold. So I vote for option 1.