Should Tuples Subclass Each Other? - language-agnostic

Given a set of tuple classes in an OOP language: Pair, Triple and Quad, should Triple subclass Pair, and Quad subclass Triple?
The issue, as I see it, is whether a Triple should be substitutable for a Pair, and likewise a Quad for a Triple or a Pair; in other words, whether a Triple is also a Pair, and a Quad is also a Triple and a Pair.
In one context, such a relationship might be valuable for extensibility - today this thing returns a Pair of things, tomorrow I need it to return a Triple without breaking existing callers, who are only using the first two of the three.
On the other hand, should they each be distinct types? I can see benefit in stronger type checking - where you can't pass a Triple to a method that expects a Pair.
I am leaning towards using inheritance, but would really appreciate input from others.
PS: In case it matters, the classes will (of course) be generic.
PPS: On a way more subjective side, should the names be Tuple2, Tuple3 and Tuple4?
Edit: I am thinking of these more as loosely coupled groups, not specifically for things like x/y or x/y/z coordinates, though they may be used for such. It would be things like needing a general solution for multiple return values from a method, but in a form with very simple semantics.
That said, I am interested in all the ways others have actually used tuples.
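For concreteness, here is a minimal sketch of the inheritance option (Scala; Pair and Triple are hypothetical names): a caller written against Pair keeps working when a producer is upgraded to return a Triple.

// Hypothetical generic tuple classes, with Triple subclassing Pair.
class Pair[A, B](val first: A, val second: B)
class Triple[A, B, C](first: A, second: B, val third: C)
  extends Pair[A, B](first, second)

// An existing caller that only uses the first two components...
def describe(p: Pair[String, Int]): String = s"${p.first}: ${p.second}"

// ...is unaffected when a producer starts returning a Triple.
val t = new Triple("answer", 42, true)
describe(t) // "answer: 42"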

Tuples of different lengths are different types. (Well, in many type systems, anyway.) In a strongly typed language, I wouldn't think that they should be a collection.
This is a good thing, as it ensures more safety. Places that return tuples usually carry somewhat coupled information along with them: the implicit knowledge of what each component is. It's worse if you pass in more values in a tuple than expected -- what's that supposed to mean? It doesn't fit inheritance.
Another potential issue is if you decide to use overloading. If tuples inherit from each other, then overload resolution will fail where it should not. But this is probably a better argument against overloading.
Of course, none of this matters if you have a specific use case and find that certain behaviours will help you.
Edit: If you want general information, try perusing a bit of Haskell or the ML family (OCaml/F#) to see how tuples are used there, and then form your own decisions.

It seems to me that you should make a generic Tuple interface (or use something like the Collection mentioned above), and have your pair and 3-tuple classes implement that interface. That way, you can take advantage of polymorphism but also allow a pair to use a simpler implementation than an arbitrary-sized tuple. You'd probably want to make your Tuple interface include .x and .y accessors as shorthand for the first two elements, and larger tuples can implement their own shorthands as appropriate for items with higher indices.
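One way to sketch that suggestion (Scala; all names illustrative, and indexed access is deliberately untyped to keep the sketch short):

// A common interface for all tuple sizes; x and y are shorthand for the
// first two elements.
trait Tuple {
  def arity: Int
  def element(i: Int): Any
  def x: Any = element(0)
  def y: Any = element(1)
}

// A pair can use a simpler, field-based implementation than an
// arbitrary-sized tuple would need.
class Pair[A, B](val first: A, val second: B) extends Tuple {
  def arity: Int = 2
  def element(i: Int): Any = if (i == 0) first else second
}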

Like most design related questions, the answer is - It depends.
If you are looking for conventional Tuple design, Tuple2, Tuple3, etc. is the way to go. The problem with inheritance is that, first of all, a Triple is not a kind of Pair. How would you implement the equals method for it? Is a Triple equal to a Pair with the same first two items? If you have a collection of Pairs, can you add a Triple to it, or vice versa? If in your domain this is fine, you can go with inheritance.
In any case, it pays to have an interface/abstract class (maybe Tuple) which all of these implement.
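To see why equals gets awkward under inheritance, consider this sketch (Scala; hypothetical classes): if Pair compares its two fields and Triple adds a third, symmetry breaks.

class Pair[A, B](val first: A, val second: B) {
  override def equals(other: Any): Boolean = other match {
    case p: Pair[_, _] => first == p.first && second == p.second
    case _             => false
  }
}
class Triple[A, B, C](first: A, second: B, val third: C)
  extends Pair[A, B](first, second)

// new Pair(1, 2) == new Triple(1, 2, 3) is true, since Pair's equals sees
// only two fields; but any Triple.equals that also compares `third` would
// answer false in the other direction, breaking the symmetry contract.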

It depends on the semantics that you need:
a pair of opposites is not semantically compatible with a 3-tuple of similar objects
a pair of coordinates in polar space is not semantically compatible with a 3-tuple of coordinates in Euclidean space
if your semantics are simple compositions, then a generic class Tuple<N> would make more sense

I'd go with 0, 1, 2, or infinity: e.g., null, one object, your Pair class, or else a collection of some sort.
Your Pair could even implement a Collection interface.
If there's a specific relationship between Three or Four items, it should probably be named.
[Perhaps I'm missing the problem, but I just can't think of a case where I want to specifically link 3 things in a generic way]

Gilad Bracha blogged about tuples, which I found interesting reading.
One point he made (whether correctly or not I can't yet judge) was:
Literal tuples are best defined as read only. One reason for this is that readonly tuples are more polymorphic. Long tuples are subtypes of short ones:
{S. T. U. V } <= {S. T. U} <= {S. T} <= {S}
[and] read only tuples are covariant:
T1 <= T2, S1 <= S2 ==> {S1. T1} <= {S2. T2}
That would seem to suggest that my inclination towards using inheritance may be correct, and it would contradict amit.dev's claim that a Triple is not a Pair.
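Bracha's covariance point can be sketched in Scala: because an immutable pair never writes its fields, its type parameters can safely be declared covariant, and the subtyping he describes falls out.

class Animal
class Dog extends Animal

// Read-only, hence safely covariant in both components.
case class RPair[+S, +T](first: S, second: T)

val dogs: RPair[Dog, Dog] = RPair(new Dog, new Dog)
// Dog <= Animal ==> RPair[Dog, Dog] <= RPair[Animal, Animal]
val animals: RPair[Animal, Animal] = dogs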


Guidelines for listing the order of function arguments

Are there any rules that you follow to determine the order of function arguments? For example, float pow(float x, float exponent) vs float pow(float exponent, float x). For concreteness, C++ could be used, but the question is valid for all programming languages.
My main concern is from the usability point of view, not runtime performance.
Edit:
Some possible bases for ordering could be:
Inputs versus Output
The way a "formula" is usually written, i.e., arguments from left-to-write.
Specificity of the argument to the context of the function, i.e., whether it is a "general" argument (e.g., a singleton object of the system) or a specific one.
In the example you cite, I think the order was decided on the basis of the mathematical notation x^exponent, in which the base is written before the exponent and becomes the left parameter.
I'm not aware of any really sound general principle other than to try to imagine what your users will expect and/or easily remember. People aren't even wholly agreed whether you should write (source, destination) or (destination, source) when copying (compare std::copy with std::memcpy), although I'm pretty sure that the former is now much more common.
There are a whole lot of general conventions, though, followed to different extents by different people:
if the function is considered primarily to act upon a particular object, put it first
parameters that are considered to "configure" the operation of the function come after parameters that are considered the main subject of the function.
out-params come last (but I suspect some people follow the reverse)
To some extent it doesn't really matter -- namely the extent to which your users have IDEs that tell them the parameter order as they type the function name.
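To make those conventions concrete, here is a hypothetical helper (Scala) that puts the main subject first and the out-param last, following the now-more-common (source, destination) order:

// Source (the subject) comes first; destination (the out-param) comes last.
def copyInto(source: Array[Int], destination: Array[Int]): Unit =
  Array.copy(source, 0, destination, 0, source.length)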

Does the term "monad" apply to values of types like Maybe or List, or does it instead apply only to the types themselves?

I've noticed that the word "monad" seems to be used in a somewhat inconsistent way. I've come to believe that this is because many (if not most) of the monad tutorials out there are written by folks who have only just started to figure monads out themselves (e.g., nuclear waste spacesuit burritos), and so the term ends up getting overloaded and corrupted.
In particular, I'm wondering whether the term "monad" can be applied to individual values of types like Maybe, List or IO, or if the term "monad" should really only be applied to the types themselves.
This is a subtle distinction, so perhaps an analogy might make it clearer. In mathematics we have rings, fields, groups, etc. These terms apply to an entire set of values along with the operations that can be performed on them, rather than to individual elements. For example, the integers (along with the operations of addition, negation and multiplication) form a ring. You could say "Integer is a ring", but you would never say "5 is a ring".
So, can you say "Just 5 is a monad", or would that be as wrong as saying "5 is a ring"? I don't know category theory, but I'm under the impression that it really only makes sense to say "Maybe is a monad" and not "Just 5 is a monad".
"Monad" (and "Functor") are popularly misused as describing values.
No value is a monad, functor, monoid, applicative functor, etc.
Only types & type constructors (higher-kinded types) can be.
When you hear (and you will) that "lists are monoids" or "functions are monads", etc, or "this function takes a monad as an argument", don't believe it.
Ask the speaker: "How can any value be a monoid (or monad, or ...), considering that Haskell's classes classify types (including higher-order ones) rather than values?"
Lists are not monoids (etc). List a is.
My guess is that this popular misuse stems from mainstream languages having value classes and not type classes, so that habitual, unconscious value-class thinking sneaks in.
Why does it matter whether we use language precisely?
Because we think in language and we build & convey understandings via language.
So in order to have clear thoughts, it helps to have clear language (or be able to at any time).
"The slovenliness of our language makes it easier for us to have foolish thoughts. The point is that the process is reversible." - George Orwell, Politics and the English Language
Edit: These remarks apply to Haskell, not to the more general setting of category theory.
List is a monad, List a is a type, and [] is a List a (an element of a type).
Technically, a monad is a functor with extra structure; and in Haskell we only use functors from the category of Haskell types to itself.
It is thus in particular a "function" which takes a type and returns another type (it has kind * -> *).
List, State s, Maybe, etc are monads. State is not a monad, since it has kind * -> * -> *.
(aside: to confuse matters, Monads are just functors, and if I give myself a partially ordered set A, then it forms a category, with Hom(a, b) = { 1 element } if a <= b and Hom(a, b) = empty otherwise. Now any increasing function f : A -> A forms a functor, and monads are those functions which satisfy x <= f(x) and f(f(x)) <= f(x), hence f(f(x)) = f(x) -- monads here are technically "elements of A -> A". See also closure operators.)
(aside 2: since you appear to know some mathematics, I encourage you to read about category theory. You'll see among others that algebraic structures can be seen as arising from monads. See this excellent blog entry from the excellent blog by Dan Piponi for a teaser.)
To be exact, monads are structures from category theory. They don't have a direct code counterpart. For simplicity let's talk about general functors instead of monads. In the case of Haskell roughly speaking a functor is a mapping from a class of types to a class of types that also maps functions in the first class to functions in the second. The Functor instance gives you access to the mapping function, but doesn't directly capture the concept of functors.
It is however fair to say that the type constructor as mentioned in the Functor instance is the actual functor:
instance Functor Tree
In this case Tree is the functor. However, because Tree is a type constructor, it can't stand for both of the mappings that make up a functor (the mapping on types and the mapping on functions) at the same time. The function that maps functions is called fmap. So if you want to be precise, you have to say that the tuple (Tree, fmap) is the functor, where fmap is the particular fmap from Tree's Functor instance. For convenience, again, we say that Tree is the functor, because the corresponding fmap follows from its Functor instance.
Note that functors are always types of kind * -> *. So Maybe Int is not a functor – the functor is Maybe. Also people often talk about "the state monad", which is also imprecise. State is a whole family of infinitely many state monads, as you can see in the instance:
instance Monad (State s)
For every type s the type constructor State s (of kind * -> *) is a state monad, one of many.
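The same point can be made in a language with an explicit typeclass encoding. In this Scala sketch (names are illustrative), the Monad trait abstracts over a type constructor F[_] of kind * -> *, so an instance can only ever be declared for something like Option, never for a value like Just 5:

// A minimal Monad typeclass. F[_] ranges over type constructors, so the
// instance below attaches to Option itself, not to any particular Option value.
trait Monad[F[_]] {
  def pure[A](a: A): F[A]
  def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
}

val optionMonad: Monad[Option] = new Monad[Option] {
  def pure[A](a: A): Option[A] = Some(a)
  def flatMap[A, B](fa: Option[A])(f: A => Option[B]): Option[B] =
    fa.flatMap(f)
}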
So, can you say "Just 5 is a monad", or would that be as wrong as saying "5 is a ring"?
Your intuition is exactly right. Int is to Ring (or AbelianGroup or whatever) as Maybe is to Monad (or Functor or whatever). Values (5, Just 5, etc.) are unimportant.
In algebra, we say the set of integers form a ring; in Haskell we would say (informally) that Int is a member of the Ring typeclass, or (slightly more formally) that there exists a Ring instance for Int. You might find this proposal fun and/or useful. Anyway, same deal with monads.
I don't know category theory, but ...
Whatever, if you know a thing or two about abstract algebra, you're golden.
I would say "Just 5 is of a type that is an instance of a Monad" like i would say "5 is a number that has type (Integer) is a ring".
I use the term instance because is how in Haskell you declare an implementation of a typeclass, and Monad is one of them.

In which languages is function abstraction not primitive

In Haskell, the function type (->) is given; it's not an algebraic data type constructor, and one cannot re-implement it to be identical to (->).
So I wonder: which languages would allow me to write my own version of (->)? What is this property called?
UPD: reformulations of the question, thanks to the discussion:
Which languages don't have -> as a primitive type?
Why is -> necessarily primitive?
I can't think of any languages that have arrows as a user-defined type. The reason is that arrows -- types for functions -- are baked into the type system, all the way down to the simply typed lambda calculus. That the arrow type must be fundamental to the language comes directly from the fact that the way you form functions in the lambda calculus is via lambda abstraction (which, at the type level, introduces arrows).
Although Marcin aptly notes that you can program in a point free style, this doesn't change the essence of what you're doing. Having a language without arrow types as primitives goes against the most fundamental building blocks of Haskell. (The language you reference in the question.)
Having the arrow as a primitive type also has important ties to constructive logic: you can read the function arrow type as implication in intuitionistic logic, and programs having that type as "proofs". (Namely, if you have something of type A -> B, you have a proof that takes some premise of type A and produces a proof of B.)
The fact that you're perturbed by having arrows baked into the language might imply that you're not fundamentally grasping why they're so tied to the design of the language; perhaps it's time to read a few chapters from Benjamin Pierce's "Types and Programming Languages".
Edit: You can always look at languages which don't have a strong notion of functions and have their semantics defined in some other way -- such as Forth or PostScript -- but in these languages you don't define inductive data types in the same way as in functional languages like Haskell, ML, or Coq. To put it another way, in any language in which you define constructors for datatypes, arrows arise naturally from the constructors for these types. But in languages where you don't define inductive datatypes in the typical way, you don't get arrow types as naturally, because the language just doesn't work that way.
Another edit: I will stick in one more comment, since I thought of it last night. Function types (and function abstraction) form the basis of pretty much all programming languages -- at least at some level, even if it's "under the hood". However, there are languages designed to define the semantics of other languages. While this doesn't strictly match what you're talking about, PLT Redex is one such system; it is used for specifying and debugging the semantics of programming languages. It's not super useful from a practitioner's perspective (unless your goal is to design new languages, in which case it is fairly useful), but maybe that fits what you want.
Do you mean meta-circular evaluators like in SICP? Being able to write your own DSL? If you create your own "function type", you'll have to take care of "applying" it, yourself.
Just as an example, you could create your own "function" in C for instance, with a look-up table holding function pointers, and use integers as functions. You'd have to provide your own "call" function for such "functions", of course:
/* table of function pointers, indexed by our integer "function" values */
void (*lookup_table[256])(int);

void call(unsigned int function, int data) {
    lookup_table[function](data);
}
You'd also probably want some means of creating more complex functions from primitive ones, for instance using arrays of ints to signify sequential execution of your "primitive functions" 1, 2, 3, ... and you'd end up inventing a whole new language for yourself.
I think early assemblers had no ability to create callable "macros" and had to use GOTO.
You could use trampolining to simulate function calls. You could have only a global variable store, perhaps with shallow binding. In such a language "functions" would be definable, though not a primitive type.
So having functions in a language is not necessary, though it is convenient.
In Common Lisp, defun is nothing but a macro associating a name with a callable object (though lambda is still a built-in). In AutoLisp, originally, there was no special function type at all: functions were represented directly by quoted lists of s-expressions, with the first element being an argument list. You can construct such a function directly in AutoLisp, from symbols, through use of the cons and list functions:
(setq a (list (cons 'x NIL) '(+ 1 x)))
(a 5)
==> 6
Some languages (like Python) support more than one primitive function type, each with its own calling protocol: notably, generators support multiple re-entry and returns (even if defined syntactically through the same def keyword). You can easily imagine a language which would let you define your own calling protocol, thus creating new function types.
Edit: as an example, consider dealing with multiple arguments in a function call: the choice between automatic currying, automatic optional args, etc. In Common Lisp, say, you could easily create two different call macros to directly represent the two calling protocols. Consider functions returning multiple values not through a kludge of aggregates (tuples, in Haskell), but directly into designated recipient vars/slots. All of these are different types of functions.
Function definition is usually primitive because (a) functions are how programs get things done; and (b) this sort of lambda-abstraction is necessary to be able to program in a pointful style (i.e., with explicit arguments).
Probably the closest you will come to a language that meets your criteria is one based on a purely pointfree model which allows you to create your own lambda operator. You might like to explore pointfree languages in general, and ones based on SKI calculus in particular: http://en.wikipedia.org/wiki/SKI_combinator_calculus
In such a case, you still have primitive function types, and you always will, because it is a fundamental element of the type system. If you want to get away from that at all, probably the best you could do would be some kind of type system based on a category-theoretic generalisation of functions, such that functions would be a special case of another type. See http://en.wikipedia.org/wiki/Category_theory.
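For a taste of the SKI basis, here it is sketched as ordinary Scala values (purely illustrative): every lambda term can be compiled into combinations of these three combinators, so lambda abstraction itself becomes derivable rather than primitive.

def I[A]: A => A = a => a
def K[A, B]: A => B => A = a => _ => a
def S[A, B, C]: (A => B => C) => (A => B) => A => C =
  f => g => a => f(a)(g(a))

// The identity combinator is itself recoverable as S K K:
val idViaSKK: Int => Int = S[Int, Int => Int, Int](K[Int, Int => Int])(K[Int, Int])
// idViaSKK(7) == 7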
Which languages don't have -> as a primitive type?
Well, if you mean a type that can be named, then there are many languages that don't have it. All languages where functions are not first-class citizens don't have -> as a type you could mention somewhere.
But, as @Kristopher eloquently and excellently explained, functions are (or can, at least, be perceived as) the very basic building blocks of all computation. Hence even in Java, say, there are functions, but they are carefully hidden from you.
And, as someone mentioned assembler: one could maintain that the machine language (of most contemporary computers) is an approximation of the register machine model. But how is it done? With millions and billions of logic circuits, each of them a materialization of quite primitive pure functions like NOT or NAND, arranged in a certain physical order (which is, obviously, how hardware engineers implement function composition).
Hence, while you may not see functions in machine code, they're still the basis.
In Martin-Löf type theory, function types are defined via indexed product types (so-called Π-types).
Basically, the type of functions from A to B can be interpreted as a (possibly infinite) record, where all the fields are of the same type B, and the field names are exactly all the elements of A. When you need to apply a function f to an argument x, you look up the field in f corresponding to x.
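That record reading can be made concrete for a tiny domain (a Scala sketch, not Martin-Löf syntax): a function out of Bool is just a record with one field per element of Bool, and application is field lookup.

// "Bool -> B" as a record whose field names are the two elements of Bool.
case class BoolFun[B](ifTrue: B, ifFalse: B) {
  // Applying the "function" means looking up the field for the argument.
  def apply(b: Boolean): B = if (b) ifTrue else ifFalse
}

val not = BoolFun(ifTrue = false, ifFalse = true)
// not(true) == false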
The Wikipedia article lists some programming languages that are based on Martin-Löf type theory. I am not familiar with them, but I assume that they are a possible answer to your question.
Philip Wadler's paper Call-by-value is dual to call-by-name presents a calculus in which variable abstraction and covariable abstraction are more primitive than function abstraction. Two definitions of function types in terms of those primitives are provided: one implements call-by-value, and the other call-by-name.
Inspired by Wadler's paper, I implemented a language (Ambidexter) which provides two function type constructors that are synonyms for types constructed from the primitives. One is for call-by-value and one for call-by-name. Neither Wadler's dual calculus nor Ambidexter provides user-defined type constructors. However, these examples show that function types are not necessarily primitive, and that a language in which you can define your own (->) is conceivable.
In Scala you can mix in one of the Function traits; e.g., a Set[A] can be used as A => Boolean because it implements the Function1[A, Boolean] trait. Another example is PartialFunction[A, B], which extends the usual functions by providing a "range-check" method, isDefinedAt.
However, in Scala methods and functions are different, and there is no way to change how methods work. Usually you don't notice the difference, as methods are automatically lifted to functions.
So you have a lot of control over how you implement and extend functions in Scala, but I think you have a real "replacement" in mind, and I'm not sure that even makes sense.
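That said, Scala's apply-method convention does let you define your own arrow-like type that is callable with function syntax (a sketch; MyFun is a made-up name, distinct from the built-in Function1):

trait MyFun[-A, +B] {
  def apply(a: A): B // the apply convention makes values callable as f(x)
}

val twice: MyFun[Int, Int] = new MyFun[Int, Int] {
  def apply(a: Int): Int = a * 2
}
// twice(21) == 42, even though MyFun is not the built-in function type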
Or maybe you are looking for languages with some kind of generalization of functions? Then Haskell with Arrow syntax would qualify: http://www.haskell.org/arrows/syntax.html
I suppose the dumb answer to your question is assembly code. This provides you with primitives even "lower" level than functions. You can create functions as macros that make use of register and jump primitives.
Most sane programming languages will give you a way to create functions as a baked-in language feature, because functions (or "subroutines") are the essence of good programming: code reuse.

Pattern matching with associative and commutative operators

Pattern matching (as found in e.g. Prolog, the ML family languages and various expert system shells) normally operates by matching a query against data element by element in strict order.
In domains like automated theorem proving, however, there is a requirement to take into account that some operators are associative and commutative. Suppose we have data
A or B or C
and query
C or $X
Going by surface syntax this doesn't match, but logically it should match with $X bound to A or B because or is associative and commutative.
Is there any existing system, in any language, that does this sort of thing?
Associative-commutative pattern matching has been around since at least 1981, and it is still a hot topic today.
There are lots of systems that implement this idea and make it useful; it means you can avoid writing complicated pattern matches when associativity or commutativity could be used to make the pattern match. Yes, it can be expensive; better that the pattern matcher do this automatically than that you do it badly by hand.
You can see an example in a rewrite system for algebra and simple calculus implemented using our program transformation system. In this example, the symbolic language to be processed is defined by grammar rules, and those rules that have A-C properties are marked. Rewrites on trees produced by parsing the symbolic language are automatically extended to match.
The Maude term rewriter implements associative and commutative pattern matching.
http://maude.cs.uiuc.edu/
I've never encountered such a thing, and I just had a more detailed look.
There is a sound computational reason for not implementing this by default: either one has to generate all combinations of the input before pattern matching, or one has to generate the full cross-product's worth of match clauses.
I suspect that the usual way to implement this would be to simply write both patterns (in the binary case), i.e., have patterns for both C or $X and $X or C.
Depending on the underlying organisation of data (it's usually tuples), this pattern matching would involve rearranging the order of tuple elements, which would be weird (particularly in a strongly typed environment!). If it's lists instead, then you're on even shakier ground.
Incidentally, I suspect that the operation you fundamentally want is disjoint union patterns on sets, e.g.:
foo (Or ({C} disjointUnion {X})) = ...
The only programming environment I've seen that deals with sets in any detail would be Isabelle/HOL, and I'm still not sure that you can construct pattern matches over them.
EDIT: It looks like Isabelle's function functionality (rather than fun) will let you define complex non-constructor patterns, except then you have to prove that they are used consistently, and you can't use the code generator anymore.
EDIT 2: The way I implemented similar functionality over n commutative, associative and transitive operators was this:
My terms were of the form A | B | C | D, while queries were of the form B | C | $X, where $X was permitted to match zero or more things. I pre-sorted these using lexicographic ordering, so that variables always occurred in the last position.
First, you construct all pairwise matches, ignoring variables for now, and recording those that match according to your rules.
{ (B,B), (C,C) }
If you treat this as a bipartite graph, then you are essentially solving a perfect matching ("marriage") problem, for which fast algorithms exist.
Assuming you find one, then you gather up everything that does not appear on the left-hand side of your relation (in this example, A and D), and you stuff them into the variable $X, and your match is complete. Obviously you can fail at any stage here, but this will mostly happen if there is no variable free on the RHS, or if there exists a constructor on the LHS that is not matched by anything (preventing you from finding a perfect match).
Sorry if this is a bit muddled. It's been a while since I wrote this code, but I hope this helps you, even a little bit!
For the record, this might not be a good approach in all cases. I had very complex notions of 'match' on subterms (i.e., not simple equality), and so building sets or anything would not have worked. Maybe that'll work in your case though and you can compute disjoint unions directly.
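For the simple equality-based case in the question, though, the multiset idea can be sketched directly (Scala; all names are made up for this example): treat the operands of the AC operator as a multiset and bind the variable to whatever is left over.

sealed trait Term
case class Atom(name: String) extends Term
case class Var(name: String) extends Term
case class Or(operands: List[Term]) extends Term // "or" is associative & commutative

// Bind the query's single variable to the operands remaining after cancelling
// the query's concrete operands against the data's multiset.
def acMatch(data: Or, query: Or): Option[Map[String, Term]] = {
  val (vars, concrete) = query.operands.partition(_.isInstanceOf[Var])
  vars match {
    case List(Var(x)) =>
      // Remove each concrete query operand from the data, one occurrence each.
      val leftover = concrete.foldLeft(Option(data.operands)) {
        case (Some(rest), t) if rest.contains(t) => Some(rest.diff(List(t)))
        case _                                   => None
      }
      leftover.map(rest => Map(x -> Or(rest)))
    case _ => None // only the single-variable case is sketched
  }
}

// acMatch(Or(List(Atom("A"), Atom("B"), Atom("C"))), Or(List(Atom("C"), Var("X"))))
// ==> Some(Map("X" -> Or(List(Atom("A"), Atom("B")))))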

When to use sentinel values?

I recently had to use a GPS location API where each location object had, among other things, two properties: altitude and verticalAccuracy. A negative verticalAccuracy signifies that altitude is invalid, whereas normally a smaller but positive value of verticalAccuracy actually means that altitude is more precise (since it's the vertical distance that the reading may be off by; I'll leave the discussion of why this measure is called verticalAccuracy and not verticalInaccuracy for some other time).
This got me thinking: When is it a good idea to use sentinel values like this API does and when would it be better to explicitly make a separate hasValidAltitude property? Are there other options?
Sometimes, sentinel values aren't really possible: the function's range may coincide with its codomain, leaving no spare value to serve as a sentinel. This isn't the case with altitude, unless you allow negative altitudes (maybe in the future there will be underwater cities). For instance, consider the intersection between lines (not a great example, since floating-point numbers have a few built-in sentinels like +INF and NaN) or the precise integer quotient (without rounding, this is not guaranteed to exist... 7 and 3, for instance... here, the remainder after division can be viewed as either a sentinel or an "exact integer quotient exists" property). More generally, any reliable sentinel can be trivially used to construct a property-based mechanism.
Based on this, I'd recommend avoiding sentinels wherever this is possible and makes sense. My reasoning is that they are an internal implementation detail of the module, and should be encapsulated behind an information-hiding interface.
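As a sketch of the property-based alternative (Scala; Location and its fields are hypothetical, not the actual GPS API), absence can be made explicit in the type instead of being encoded as a sentinel:

// No valid altitude is represented as None, not as a negative accuracy.
case class Location(
  altitude: Option[Double],        // None means "no valid altitude reading"
  verticalAccuracy: Option[Double] // present only alongside altitude
)

// Callers are forced by the type to handle the missing case:
def describe(loc: Location): String = loc.altitude match {
  case Some(alt) => s"altitude: $alt m"
  case None      => "altitude unavailable"
}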