Haskell storing functions in data structures - use cases - function

I was going through the Software Foundations book (http://www.cis.upenn.edu/~bcpierce/sf/Basics.html). In the second paragraph of Introduction, it is mentioned that as functions are treated as first class values, they can be stored in a data structure. A quick Google search for the applications of this turned up acid-state (http://acid-state.seize.it/) which probably uses this for storing functions (please correct me if I am wrong). Are there any other applications in which storing functions as data in data structures is/can be used? Is storing functions as data a common pattern in Haskell? Thanks in advance.
Edit: I am aware of the higher order functions.

A simple and easily-used functions-in-data-structure example is that of having a record containing functions (and possibly data).
A good example is in Luke Palmer's blog post Haskell Antipattern: Existential Typeclass where he explains that rather than try to reproduce an OO-like class for widgets, with many instances and then store your UI widgets in an existentially quantified list, you're better off making a record of the actual functions you'll use and returning that instead of retaining your original datatype - once you're existentially quantified, you can't get at the original datatype anyway, so you don't lose anything in expressiveness, but gain a lot in manipulatability. He concludes
Functions are the masters of reuse: when you use an advanced feature, you need a yet more advanced feature to abstract over it (think: classes < existential types < universally quantified constraints < unknown). But all you need to abstract over a function is another function.
This means that we can write higher-order functions to deal with those functions, and pass them around and manipulate them easily, since they're data.

First class functions are extremely useful -- I would contend that their main use case is in higher order functions, which simplify many problems that require more complex patterns in other languages, like Java.

Related

Efficient implementation of multiple return values?

Is it possible to implement efficiently (with little to no runtime overhead) functions that return multiple vales / a tuple type?
In a C-like language something like this:
int, float f(int a) {
return a*2 , a / 2;
}
Is there a reason why very few statically compiled languages do this?
Yes, it can be efficient. You may need to spill registers, but it is possible.
GHC for example, implements the "constructed product return" optimization, that:
determines when a function can profitably return multiple results in registers. The analysis is based only on a function's definition, and not on its uses (so separate compilation is easily supported) and the results of the analysis can be expressed by a transformation of the function definition alone.
CPR is a huge win for returning small structures (i.e. tuples, tagged unions).
More information:
CPR analysis in GHC.
More on demand analysis.
The best paper I have read exactly about this topic is An Efficient Implementation of Multiple Return Values in Scheme (PDF). Although it is about Scheme programming language, they explain the matter in terms of low machine level stack/registry implementation.
This article actually made me think that many high-level features normally considered inefficient are a solved problem regarding the efficient implementation and just the inertia of the popular languages is in the way.
If your tuple doesn't fit into a single register (32- or 64-bit, depending on your architecture, most likely), then there's going to be actual allocation (most likely on the heap) involved in implementing this.
That said, the reasons why very few languages permit this style is unlikely to be related to performance as much as it is likely related to stylistic concerns in the language (i.e., there are other idiomatic ways to achieve the same thing, such as returning a struct). Introducing new primitives to the language can be clumsy and introduce inconsistencies. For example, if tuples become first-class values, can I use them anywhere? How do I access them? Do we enforce immutability? How do I allocate or deallocate them?
Languages with more expressive type systems tend to make it easier to add these kinds of language features in a principled manner, which is why you'll find tuples (and all sorts of other exotic creatures) as first-class values in languages derived from the ML family (amongst others).
In C, just return a struct holding the values. Sure, the result won't fit in a register, unlike an int, so it will not be as efficient as an int return, but it will still be efficient, if the struct is a local variable and thus allocated on the stack rather than the heap.

Does any programming language support defining constraints on primitive data types?

Last night I was thining that programming languages can have a feature in which we should be able to constraints the values assigned to primitive data types.
For example I should be able to say my variable of type int can only have value between 0 and 100
int<0, 100> progress;
This would then act as a normal integer in all scenarios except the fact that you won't be able to specify values out of the range defined in constraint. The compiler will not compile the code progress=200.
This constraint can be carried over with type information.
Is this possible? Is it done in any programming language? If yes then which language has it and what is this technique called?
It is generally not possible. It makes little sense to use integers without any arithmetic operators. With arithmetic operators you have this:
int<0,100> x, u, v;
...
x = u + v; // is it in range?
If you're willing to do checks at run-time, then yes, several mainstream languages support it, starting with Pascal.
I believe Pascal (and Delphi) offers something similar with subrange types.
I think this is not possible at all in Java and in Ruby (well, in Ruby probably it is possible, but requires some effort). I have no idea about other languages, though.
Ada allows something like what you describe with ranges:
type My_Int is range 1..100;
So if you try assign a value to a My_Int that's less than 1 or greater than 100, Ada will raise the exception Constraint_Error.
Note that I've never used Ada. I've only read about this feature, so do your research before you plunge in.
It is certainly possible. There are many different techniques to do that, but 'dependent types' is the most popular.
The constraints can be even checked statically at compile time by compiler. See, for example, Agda2 and ATS (ats-lang.org).
Weaker forms of your 'range types' are possible without full dependent types, I think.
Some keywords to search for research papers:
- Guarded types
- Refinment types
- Subrange types
Certainly! In case you missed it: C. Do you C? You don't C? You don't count short as a constraint on Integer? Ok, so C only gives you pre-packaged constrained types.
BTW: It seems the answer that Pascal has subrange types misses the point of them. In Pascal array bounds violations are not possible. This is because the array index must of the same type as the array was declared with. In turn this means that to use an integer index you must coerce it down to the subrange, and that is where the run time check is done, not accessing the array.
This is a very important idea because it means a for loop over an array index type may access the array components safely without any run time checking.
Pascal has subranges. Ada extended that a bit, so you can do something like a subrange, or you can create an entirely new type with characteristics of the existing type, but not compatible with it (e.g., even if it was in the right range, you wouldn't be able to assign an Integer to your new type based off of Integer).
C++ doesn't support the idea directly, but is flexible enough that you can implement it if you want to. If you decide to support all the compound assignment operators (+=, -=, *=, etc.) this can be a lot of work though.
Other languages that support operator overloading (e.g., ML and company) can probably support it in much the same way as C++.
Also note that there are a few non-trivial decisions involved in the design. In particular, if the type is used in a way that could/does result in an intermediate result that overflows the specified range, but produces a final result that's within the specified range, what do you want to happen? Depending on your situation, that might be an error, or it might be entirely acceptable, and you'll have to decide which.
I really doubt that you can do that. Afterall these are primitive datatypes, with emphasis on primitive!
adding a constraint will make the type a subclass of its primitive state, thus extending it.
from wikipedia:
a basic type is a data type provided by a programming language as a basic building block. Most languages allow more complicated composite types to be recursively constructed starting from basic types.
a built-in type is a data type for which the programming language provides built-in support.
So personally, even if it is possible i wouldnt do, since its a bad practice. Instead just create an object that returns this type and the constraints (which i am sure you thought of this solution).
SQL has domains, which consist of a base type together with a dynamically-checked constraint. For example, you might define the domain telephone_number as a character string with an appropriate number of digits, valid area code, etc.

Can a variable like 'int' be considered a primitive/fundamental data structure?

A rough definition of a data structure is that it allows you to store data and apply a set of operations on that data while preserving consistency of data before and after the operation.
However some people insist that a primitive variable like 'int' can also be considered as a data structure. I get that part where it allows you to store data but I guess the operation part is missing. Primitive variables don't have operations attached to them. So I feel that unless you have a set of operations defined and attached to it you cannot call it a data structure. 'int' doesn't have any operation attached to it, it can be operated upon with a set of generic operators.
Please advise if I got something wrong here.
To say that something is structured implies that there is a form or formatting that defines HOW the data is structured. Note this has nothing to do with how the data is actually stored. You could for example create a data structure that exists entirely within a single Integer, yet represents a number of different values.
A data structure is an arbitrary construct used to describe how to store data in a system. It may be as simple as a single primitive, or as complex as a class. So the answer is largely subjective. It's "yes" if you decide to use a primitive as such, that a simple primitive may be considered a primitive data structure, because it describes HOW you wish to store an element of data. The answer is also "no", because it describes an element of a structure and not necessarily the whole structure in itself.
As for how this relates to operations, strictly speaking a data structure has nothing to do with behaviour, it is simply a storage mechanism. Preserving consistency of data is really a behavioural thing. Yes, your compiler probably spits out errors if you try to shoe-horn a 32-bit value into a Byte, but that's symptomatic of the behaviour of the system (Ie: compilation) acting on the data structure of your application, of which your primitives are an element.
I don't think your definition of data structure is correct.
It seems to me that a struct (with no methods) is a valid data structure, but it has no real 'operations'. And that's not important. It's holding data.
To that end, and int holds data, an Object holds data. They are data structures (technically).
That said, I don't ever find myself saying "What datastructure shall I use? I know! an int!".
I would say you need to re-evaluate the meaning of "data structure".
Your definition of a data structure isn't quite correct. A data structure doesn't necessarily have any attached behaviors or operations. An ADT or Abstract Data Type is what you are describing as a data structure. An ADT includes the data and the behaviors or operations that work on that data. An int by itself is not an ADT, but I suppose you could call it a data structure. If you encapsulate an int and its operations then you have an ADT which is what I think you are trying to describe as a data structure. Classes provide a mechanism for implementation of ADTs in modern languages.
wikipedia has a good description of abstract data types.
I would argue that "int" is a data structure - it has a defined representation and meaning. That is, depending on your system, it has a specific length, a specific set of operators available to it, and a specified representation (be it twos-compliment). It is designed to hold "integer numbers".
Practically, the distinction isn't particularly relevant.
Primitives do have operations attached to them; however, they may not be in the format of methods as you would expect in an object-oriented paradigm.
Assignment =, addition +, subtraction -, comparison ==, etc are all operations. Especially if you consider that you can explicitly define, override, or overload these operations for arbitrary classes (i.e.: data structures) in some languages (e.g. C++), then the primitive int, char, or what have you, are not very different.
Primitive variables don't have operations attached to them. So I feel that unless you have a set of operations defined and attached to it you cannot call it a data structure.
'int' doesn't have any operation attached to it, it can be operated upon with a set of generic operators.
Generic? Then how come 2+2 works, but "ninja" + List<float> doesn't? If the operator was generic, it'd work on anything. It doesn't. It only works on a few predefined types, such as integers.
Ints certainly have a set of operations defined on them. Arithmetic operations such as addition, subtraction, multiplication or division, for example. Most languages also have some kind of ToString()-like functionality defined on integers. But you can't do just anything with an int. For example, you can't pass an int to a function expecting a string. ints have a very specific set of operations defined on them. Those operations just aren't member methods. They come in the form of operators and non-member functions or member methods of other classes. But they are still operations that work on integers.
I don't think, (ref: Wikipedia entry) that Data Structure includes the definition of permissible operations (on operators). Of course, we could cite the C++ class as a counter-example in which we can define overloaded operators. At the same time, we define a structure as just a composite/user defined datatype and do not declare any permissible operations on them. We allow the compiler to figure that out.
'int' doesn't have any operation attached to it, it can be operated upon with a set of generic operators.
Operations are intrinsically linked with the things that they operate on; there's no such thing as generic operations.
This is true in the mathematical sense (< works for in set of integers, but has no meaning for complex numbers), and also in the computer scientific sense (evaluating a + b requires that a and b are or can be converted to compatible types on which the + operation is defined).
Of course, it depends what you mean by "data structure." Others have focused on whether your definition is correct and raise good questions. But what if we say, "Let's ignore the term for now, and focus on what you described?" In other words, what if look at
A piece of data that has a designated interpretation of its value
A set of operations on that data
Then certainly, int qualifies. (If there were no operations on int, we'd all be stuck!)
For a more mathematical approach to programming that begins with these questions, and takes them to what some have called "an algebra of computation," see Elements of Programming by Alex Stepanov and Paul McJones.

Functional, Declarative, and Imperative Programming [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
What do the terms functional, declarative, and imperative programming mean?
At the time of writing this, the top voted answers on this page are imprecise and muddled on the declarative vs. imperative definition, including the answer that quotes Wikipedia. Some answers are conflating the terms in different ways.
Refer also to my explanation of why spreadsheet programming is declarative, regardless that the formulas mutate the cells.
Also, several answers claim that functional programming must be a subset of declarative. On that point it depends if we differentiate "function" from "procedure". Lets handle imperative vs. declarative first.
Definition of declarative expression
The only attribute that can possibly differentiate a declarative expression from an imperative expression is the referential transparency (RT) of its sub-expressions. All other attributes are either shared between both types of expressions, or derived from the RT.
A 100% declarative language (i.e. one in which every possible expression is RT) does not (among other RT requirements) allow the mutation of stored values, e.g. HTML and most of Haskell.
Definition of RT expression
RT is often referred to as having "no side-effects". The term effects does not have a precise definition, so some people don't agree that "no side-effects" is the same as RT. RT has a precise definition:
An expression e is referentially transparent if for all programs p every occurrence of e in p can be replaced with the result of evaluating e, without affecting the observable result of p.
Since every sub-expression is conceptually a function call, RT requires that the implementation of a function (i.e. the expression(s) inside the called function) may not access the mutable state that is external to the function (accessing the mutable local state is allowed). Put simply, the function (implementation) should be pure.
Definition of pure function
A pure function is often said to have "no side-effects". The term effects does not have a precise definition, so some people don't agree.
Pure functions have the following attributes.
the only observable output is the return value.
the only output dependency is the arguments.
arguments are fully determined before any output is generated.
Remember that RT applies to expressions (which includes function calls) and purity applies to (implementations of) functions.
An obscure example of impure functions that make RT expressions is concurrency, but this is because the purity is broken at the interrupt abstraction layer. You don't really need to know this. To make RT expressions, you call pure functions.
Derivative attributes of RT
Any other attribute cited for declarative programming, e.g. the citation from 1999 used by Wikipedia, either derives from RT, or is shared with imperative programming. Thus proving that my precise definition is correct.
Note, immutability of external values is a subset of the requirements for RT.
Declarative languages don't have looping control structures, e.g. for and while, because due to immutability, the loop condition would never change.
Declarative languages don't express control-flow other than nested function order (a.k.a logical dependencies), because due to immutability, other choices of evaluation order do not change the result (see below).
Declarative languages express logical "steps" (i.e. the nested RT function call order), but whether each function call is a higher level semantic (i.e. "what to do") is not a requirement of declarative programming. The distinction from imperative is that due to immutability (i.e. more generally RT), these "steps" cannot depend on mutable state, rather only the relational order of the expressed logic (i.e. the order of nesting of the function calls, a.k.a. sub-expressions).
For example, the HTML paragraph <p> cannot be displayed until the sub-expressions (i.e. tags) in the paragraph have been evaluated. There is no mutable state, only an order dependency due to the logical relationship of tag hierarchy (nesting of sub-expressions, which are analogously nested function calls).
Thus there is the derivative attribute of immutability (more generally RT), that declarative expressions, express only the logical relationships of the constituent parts (i.e. of the sub-expression function arguments) and not mutable state relationships.
Evaluation order
The choice of evaluation order of sub-expressions can only give a varying result when any of the function calls are not RT (i.e. the function is not pure), e.g. some mutable state external to a function is accessed within the function.
For example, given some nested expressions, e.g. f( g(a, b), h(c, d) ), eager and lazy evaluation of the function arguments will give the same results if the functions f, g, and h are pure.
Whereas, if the functions f, g, and h are not pure, then the choice of evaluation order can give a different result.
Note, nested expressions are conceptually nested functions, since expression operators are just function calls masquerading as unary prefix, unary postfix, or binary infix notation.
Tangentially, if all identifiers, e.g. a, b, c, d, are immutable everywhere, state external to the program cannot be accessed (i.e. I/O), and there is no abstraction layer breakage, then functions are always pure.
By the way, Haskell has a different syntax, f (g a b) (h c d).
Evaluation order details
A function is a state transition (not a mutable stored value) from the input to the output. For RT compositions of calls to pure functions, the order-of-execution of these state transitions is independent. The state transition of each function call is independent of the others, due to lack of side-effects and the principle that an RT function may be replaced by its cached value. To correct a popular misconception, pure monadic composition is always declarative and RT, in spite of the fact that Haskell's IO monad is arguably impure and thus imperative w.r.t. the World state external to the program (but in the sense of the caveat below, the side-effects are isolated).
Eager evaluation means the functions arguments are evaluated before the function is called, and lazy evaluation means the arguments are not evaluated until (and if) they are accessed within the function.
Definition: function parameters are declared at the function definition site, and function arguments are supplied at the function call site. Know the difference between parameter and argument.
Conceptually, all expressions are (a composition of) function calls, e.g. constants are functions without inputs, unary operators are functions with one input, binary infix operators are functions with two inputs, constructors are functions, and even control statements (e.g. if, for, while) can be modeled with functions. The order that these argument functions (do not confuse with nested function call order) are evaluated is not declared by the syntax, e.g. f( g() ) could eagerly evaluate g then f on g's result or it could evaluate f and only lazily evaluate g when its result is needed within f.
Caveat, no Turing complete language (i.e. that allows unbounded recursion) is perfectly declarative, e.g. lazy evaluation introduces memory and time indeterminism. But these side-effects due to the choice of evaluation order are limited to memory consumption, execution time, latency, non-termination, and external hysteresis thus external synchronization.
Functional programming
Because declarative programming cannot have loops, then the only way to iterate is functional recursion. It is in this sense that functional programming is related to declarative programming.
But functional programming is not limited to declarative programming. Functional composition can be contrasted with subtyping, especially with respect to the Expression Problem, where extension can be achieved by either adding subtypes or functional decomposition. Extension can be a mix of both methodologies.
Functional programming usually makes the function a first-class object, meaning the function type can appear in the grammar anywhere any other type may. The upshot is that functions can input and operate on functions, thus providing for separation-of-concerns by emphasizing function composition, i.e. separating the dependencies among the subcomputations of a deterministic computation.
For example, instead of writing a separate function (and employing recursion instead of loops if the function must also be declarative) for each of an infinite number of possible specialized actions that could be applied to each element of a collection, functional programming employs reusable iteration functions, e.g. map, fold, filter. These iteration functions input a first-class specialized action function. These iteration functions iterate the collection and call the input specialized action function for each element. These action functions are more concise because they no longer need to contain the looping statements to iterate the collection.
However, note that if a function is not pure, then it is really a procedure. We can perhaps argue that functional programming that uses impure functions, is really procedural programming. Thus if we agree that declarative expressions are RT, then we can say that procedural programming is not declarative programming, and thus we might argue that functional programming is always RT and must be a subset of declarative programming.
Parallelism
This functional composition with first-class functions can express the depth in the parallelism by separating out the independent function.
Brent’s Principle: computation with work w and depth d can be
implemented in a p-processor PRAM in time O(max(w/p, d)).
Both concurrency and parallelism also require declarative programming, i.e. immutability and RT.
So where did this dangerous assumption that Parallelism == Concurrency
come from? It’s a natural consequence of languages with side-effects:
when your language has side-effects everywhere, then any time you try
to do more than one thing at a time you essentially have
non-determinism caused by the interleaving of the effects from each
operation. So in side-effecty languages, the only way to get
parallelism is concurrency; it’s therefore not surprising that we
often see the two conflated.
FP evaluation order
Note the evaluation order also impacts the termination and performance side-effects of functional composition.
Eager (CBV) and lazy (CBN) are categorical duels[10], because they have reversed evaluation order, i.e. whether the outer or inner functions respectively are evaluated first. Imagine an upside-down tree, then eager evaluates from function tree branch tips up the branch hierarchy to the top-level function trunk; whereas, lazy evaluates from the trunk down to the branch tips. Eager doesn't have conjunctive products ("and", a/k/a categorical "products") and lazy doesn't have disjunctive coproducts ("or", a/k/a categorical "sums")[11].
Performance
Eager
As with non-termination, eager is too eager with conjunctive functional composition, i.e. compositional control structure does unnecessary work that isn't done with lazy. For example, eager eagerly and unnecessarily maps the entire list to booleans, when it is composed with a fold that terminates on the first true element.
This unnecessary work is the cause of the claimed "up to" an extra log n factor in the sequential time complexity of eager versus lazy, both with pure functions. A solution is to use functors (e.g. lists) with lazy constructors (i.e. eager with optional lazy products), because with eager the eagerness incorrectness originates from the inner function. This is because products are constructive types, i.e. inductive types with an initial algebra on an initial fixpoint[11]
Lazy
As with non-termination, lazy is too lazy with disjunctive functional composition, i.e. coinductive finality can occur later than necessary, resulting in both unnecessary work and non-determinism of the lateness that isn't the case with eager[10][11]. Examples of finality are state, timing, non-termination, and runtime exceptions. These are imperative side-effects, but even in a pure declarative language (e.g. Haskell), there is state in the imperative IO monad (note: not all monads are imperative!) implicit in space allocation, and timing is state relative to the imperative real world. Using lazy even with optional eager coproducts leaks "laziness" into inner coproducts, because with lazy the laziness incorrectness originates from the outer function (see the example in the Non-termination section, where == is an outer binary operator function). This is because coproducts are bounded by finality, i.e. coinductive types with a final algebra on an final object[11].
Lazy causes indeterminism in the design and debugging of functions for latency and space, the debugging of which is probably beyond the capabilities of the majority of programmers, because of the dissonance between the declared function hierarchy and the runtime order-of-evaluation. Lazy pure functions evaluated with eager, could potentially introduce previously unseen non-termination at runtime. Conversely, eager pure functions evaluated with lazy, could potentially introduce previously unseen space and latency indeterminism at runtime.
Non-termination
At compile-time, due to the Halting problem and mutual recursion in a Turing complete language, functions can't generally be guaranteed to terminate.
Eager
With eager but not lazy, for the conjunction of Head "and" Tail, if either Head or Tail doesn't terminate, then respectively either List( Head(), Tail() ).tail == Tail() or List( Head(), Tail() ).head == Head() is not true because the left-side doesn't, and right-side does, terminate.
Whereas, with lazy both sides terminate. Thus eager is too eager with conjunctive products, and non-terminates (including runtime exceptions) in those cases where it isn't necessary.
Lazy
With lazy but not eager, for the disjunction of 1 "or" 2, if f doesn't terminate, then List( f ? 1 : 2, 3 ).tail == (f ? List( 1, 3 ) : List( 2, 3 )).tail is not true because the left-side terminates, and right-side doesn't.
Whereas, with eager neither side terminates so the equality test is never reached. Thus lazy is too lazy with disjunctive coproducts, and in those cases fails to terminate (including runtime exceptions) after doing more work than eager would have.
[10] Declarative Continuations and Categorical Duality, Filinski, sections 2.5.4 A comparison of CBV and CBN, and 3.6.1 CBV and CBN in the SCL.
[11] Declarative Continuations and Categorical Duality, Filinski, sections 2.2.1 Products and coproducts, 2.2.2 Terminal and initial objects, 2.5.2 CBV with lazy products, and 2.5.3 CBN with eager coproducts.
There's not really any non-ambiguous, objective definition for these. Here is how I would define them:
Imperative - The focus is on what steps the computer should take rather than what the computer will do (ex. C, C++, Java).
Declarative - The focus is on what the computer should do rather than how it should do it (ex. SQL).
Functional - a subset of declarative languages that has heavy focus on recursion
imperative and declarative describe two opposing styles of programming. imperative is the traditional "step by step recipe" approach while declarative is more "this is what i want, now you work out how to do it".
these two approaches occur throughout programming - even with the same language and the same program. generally the declarative approach is considered preferable, because it frees the programmer from having to specify so many details, while also having less chance for bugs (if you describe the result you want, and some well-tested automatic process can work backwards from that to define the steps then you might hope that things are more reliable than having to specify each step by hand).
on the other hand, an imperative approach gives you more low level control - it's the "micromanager approach" to programming. and that can allow the programmer to exploit knowledge about the problem to give a more efficient answer. so it's not unusual for some parts of a program to be written in a more declarative style, but for the speed-critical parts to be more imperative.
as you might imagine, the language you use to write a program affects how declarative you can be - a language that has built-in "smarts" for working out what to do given a description of the result is going to allow a much more declarative approach than one where the programmer needs to first add that kind of intelligence with imperative code before being able to build a more declarative layer on top. so, for example, a language like prolog is considered very declarative because it has, built-in, a process that searches for answers.
so far, you'll notice that i haven't mentioned functional programming. that's because it's a term whose meaning isn't immediately related to the other two. at its most simple, functional programming means that you use functions. in particular, that you use a language that supports functions as "first class values" - that means that not only can you write functions, but you can write functions that write functions (that write functions that...), and pass functions to functions. in short - that functions are as flexible and common as things like strings and numbers.
it might seem odd, then, that functional, imperative and declarative are often mentioned together. the reason for this is a consequence of taking the idea of functional programming "to the extreme". a function, in it's purest sense, is something from maths - a kind of "black box" that takes some input and always gives the same output. and that kind of behaviour doesn't require storing changing variables. so if you design a programming language whose aim is to implement a very pure, mathematically influenced idea of functional programming, you end up rejecting, largely, the idea of values that can change (in a certain, limited, technical sense).
and if you do that - if you limit how variables can change - then almost by accident you end up forcing the programmer to write programs that are more declarative, because a large part of imperative programming is describing how variables change, and you can no longer do that! so it turns out that functional programming - particularly, programming in a functional language - tends to give more declarative code.
to summarise, then:
imperative and declarative are two opposing styles of programming (the same names are used for programming languages that encourage those styles)
functional programming is a style of programming where functions become very important and, as a consequence, changing values become less important. the limited ability to specify changes in values forces a more declarative style.
so "functional programming" is often described as "declarative".
In a nutshell:
An imperative language specfies a series of instructions that the computer executes in sequence (do this, then do that).
A declarative language declares a set of rules about what outputs should result from which inputs (eg. if you have A, then the result is B). An engine will apply these rules to inputs, and give an output.
A functional language declares a set of mathematical/logical functions which define how input is translated to output. eg. f(y) = y * y. it is a type of declarative language.
Imperative: how to achieve our goal
Take the next customer from a list.
If the customer lives in Spain, show their details.
If there are more customers in the list, go to the beginning
Declarative: what we want to achieve
Show customer details of every customer living in Spain
Imperative Programming means any style of programming where your program is structured out of instructions describing how the operations performed by a computer will happen.
Declarative Programming means any style of programming where your program is a description either of the problem or the solution - but doesn't explicitly state how the work will be done.
Functional Programming is programming by evaluating functions and functions of functions... As (strictly defined) functional programming means programming by defining side-effect free mathematical functions so it is a form of declarative programming but it isn't the only kind of declarative programming.
Logic Programming (for example in Prolog) is another form of declarative programming. It involves computing by deciding whether a logical statement is true (or whether it can be satisfied). The program is typically a series of facts and rules - i.e. a description rather than a series of instructions.
Term Rewriting (for example CASL) is another form of declarative programming. It involves symbolic transformation of algebraic terms. It's completely distinct from logic programming and functional programming.
imperative - expressions describe sequence of actions to perform (associative)
declarative - expressions are declarations that contribute to behavior of program (associative, commutative, idempotent, monotonic)
functional - expressions have value as only effect; semantics support equational reasoning
Since I wrote my prior answer, I have formulated a new definition of the declarative property which is quoted below. I have also defined imperative programming as the dual property.
This definition is superior to the one I provided in my prior answer, because it is succinct and it is more general. But it may be more difficult to grok, because the implication of the incompleteness theorems applicable to programming and life in general are difficult for humans to wrap their mind around.
The quoted explanation of the definition discusses the role pure functional programming plays in declarative programming.
All exotic types of programming fit into the following taxonomy of declarative versus imperative, since the following definition claims they are duals.
Declarative vs. Imperative
The declarative property is weird, obtuse, and difficult to capture in a technically precise definition that remains general and not ambiguous, because it is a naive notion that we can declare the meaning (a.k.a semantics) of the program without incurring unintended side effects. There is an inherent tension between expression of meaning and avoidance of unintended effects, and this tension actually derives from the incompleteness theorems of programming and our universe.
It is oversimplification, technically imprecise, and often ambiguous to define declarative as “what to do” and imperative as “how to do”. An ambiguous case is the “what” is the “how” in a program that outputs a program— a compiler.
Evidently the unbounded recursion that makes a language Turing complete, is also analogously in the semantics— not only in the syntactical structure of evaluation (a.k.a. operational semantics). This is logically an example analogous to Gödel's theorem— “any complete system of axioms is also inconsistent”. Ponder the contradictory weirdness of that quote! It is also an example that demonstrates how the expression of semantics does not have a provable bound, thus we can't prove2 that a program (and analogously its semantics) halt a.k.a. the Halting theorem.
The incompleteness theorems derive from the fundamental nature of our universe, which as stated in the Second Law of Thermodynamics is “the entropy (a.k.a. the # of independent possibilities) is trending to maximum forever”. The coding and design of a program is never finished— it's alive!— because it attempts to address a real world need, and the semantics of the real world are always changing and trending to more possibilities. Humans never stop discovering new things (including errors in programs ;-).
To precisely and technically capture this aforementioned desired notion within this weird universe that has no edge (ponder that! there is no “outside” of our universe), requires a terse but deceptively-not-simple definition which will sound incorrect until it is explained deeply.
Definition:
The declarative property is where there can exist only one possible set of statements that can express each specific modular semantic.
The imperative property3 is the dual, where semantics are inconsistent under composition and/or can be expressed with variations of sets of statements.
This definition of declarative is distinctively local in semantic scope, meaning that it requires that a modular semantic maintain its consistent meaning regardless where and how it's instantiated and employed in global scope. Thus each declarative modular semantic should be intrinsically orthogonal to all possible others— and not an impossible (due to incompleteness theorems) global algorithm or model for witnessing consistency, which is also the point of “More Is Not Always Better” by Robert Harper, Professor of Computer Science at Carnegie Mellon University, one of the designers of Standard ML.
Examples of these modular declarative semantics include category theory functors e.g. the Applicative, nominal typing, namespaces, named fields, and w.r.t. to operational level of semantics then pure functional programming.
Thus well designed declarative languages can more clearly express meaning, albeit with some loss of generality in what can be expressed, yet a gain in what can be expressed with intrinsic consistency.
An example of the aforementioned definition is the set of formulas in the cells of a spreadsheet program— which are not expected to give the same meaning when moved to different column and row cells, i.e. cell identifiers changed. The cell identifiers are part of and not superfluous to the intended meaning. So each spreadsheet result is unique w.r.t. to the cell identifiers in a set of formulas. The consistent modular semantic in this case is use of cell identifiers as the input and output of pure functions for cells formulas (see below).
Hyper Text Markup Language a.k.a. HTML— the language for static web pages— is an example of a highly (but not perfectly3) declarative language that (at least before HTML 5) had no capability to express dynamic behavior. HTML is perhaps the easiest language to learn. For dynamic behavior, an imperative scripting language such as JavaScript was usually combined with HTML. HTML without JavaScript fits the declarative definition because each nominal type (i.e. the tags) maintains its consistent meaning under composition within the rules of the syntax.
A competing definition for declarative is the commutative and idempotent properties of the semantic statements, i.e. that statements can be reordered and duplicated without changing the meaning. For example, statements assigning values to named fields can be reordered and duplicated without changed the meaning of the program, if those names are modular w.r.t. to any implied order. Names sometimes imply an order, e.g. cell identifiers include their column and row position— moving a total on spreadsheet changes its meaning. Otherwise, these properties implicitly require global consistency of semantics. It is generally impossible to design the semantics of statements so they remain consistent if randomly ordered or duplicated, because order and duplication are intrinsic to semantics. For example, the statements “Foo exists” (or construction) and “Foo does not exist” (and destruction). If one considers random inconsistency endemical of the intended semantics, then one accepts this definition as general enough for the declarative property. In essence this definition is vacuous as a generalized definition because it attempts to make consistency orthogonal to semantics, i.e. to defy the fact that the universe of semantics is dynamically unbounded and can't be captured in a global coherence paradigm.
Requiring the commutative and idempotent properties for the (structural evaluation order of the) lower-level operational semantics converts operational semantics to a declarative localized modular semantic, e.g. pure functional programming (including recursion instead of imperative loops). Then the operational order of the implementation details do not impact (i.e. spread globally into) the consistency of the higher-level semantics. For example, the order of evaluation of (and theoretically also the duplication of) the spreadsheet formulas doesn't matter because the outputs are not copied to the inputs until after all outputs have been computed, i.e. analogous to pure functions.
C, Java, C++, C#, PHP, and JavaScript aren't particularly declarative.
Copute's syntax and Python's syntax are more declaratively coupled to
intended results, i.e. consistent syntactical semantics that eliminate the extraneous so one can readily
comprehend code after they've forgotten it. Copute and Haskell enforce
determinism of the operational semantics and encourage “don't repeat
yourself” (DRY), because they only allow the pure functional paradigm.
2 Even where we can prove the semantics of a program, e.g. with the language Coq, this is limited to the semantics that are expressed in the typing, and typing can never capture all of the semantics of a program— not even for languages that are not Turing complete, e.g. with HTML+CSS it is possible to express inconsistent combinations which thus have undefined semantics.
3 Many explanations incorrectly claim that only imperative programming has syntactically ordered statements. I clarified this confusion between imperative and functional programming. For example, the order of HTML statements does not reduce the consistency of their meaning.
Edit: I posted the following comment to Robert Harper's blog:
in functional programming ... the range of variation of a variable is a type
Depending on how one distinguishes functional from imperative
programming, your ‘assignable’ in an imperative program also may have
a type placing a bound on its variability.
The only non-muddled definition I currently appreciate for functional
programming is a) functions as first-class objects and types, b)
preference for recursion over loops, and/or c) pure functions— i.e.
those functions which do not impact the desired semantics of the
program when memoized (thus perfectly pure functional
programming doesn't exist in a general purpose denotational semantics
due to impacts of operational semantics, e.g. memory
allocation).
The idempotent property of a pure function means the function call on
its variables can be substituted by its value, which is not generally
the case for the arguments of an imperative procedure. Pure functions
seem to be declarative w.r.t. to the uncomposed state transitions
between the input and result types.
But the composition of pure functions does not maintain any such
consistency, because it is possible to model a side-effect (global
state) imperative process in a pure functional programming language,
e.g. Haskell's IOMonad and moreover it is entirely impossible to
prevent doing such in any Turing complete pure functional programming
language.
As I wrote in 2012 which seems to the similar consensus of
comments in your recent blog, that declarative programming is an
attempt to capture the notion that the intended semantics are never
opaque. Examples of opaque semantics are dependence on order,
dependence on erasure of higher-level semantics at the operational
semantics layer (e.g. casts are not conversions and reified generics
limit higher-level semantics), and dependence on variable values
which can not be checked (proved correct) by the programming language.
Thus I have concluded that only non-Turing complete languages can be
declarative.
Thus one unambiguous and distinct attribute of a declarative language
could be that its output can be proven to obey some enumerable set of
generative rules. For example, for any specific HTML program (ignoring
differences in the ways interpreters diverge) that is not scripted
(i.e. is not Turing complete) then its output variability can be
enumerable. Or more succinctly an HTML program is a pure function of
its variability. Ditto a spreadsheet program is a pure function of its
input variables.
So it seems to me that declarative languages are the antithesis of
unbounded recursion, i.e. per Gödel's second incompleteness
theorem self-referential theorems can't be proven.
Lesie Lamport wrote a fairytale about how Euclid might have
worked around Gödel's incompleteness theorems applied to math proofs
in the programming language context by to congruence between types and
logic (Curry-Howard correspondence, etc).
Imperative programming: telling the “machine” how to do something, and as a result what you want to happen will happen.
Declarative programming: telling the “machine” what you would like to happen, and letting the computer figure out how to do it.
Example of imperative
function makeWidget(options) {
const element = document.createElement('div');
element.style.backgroundColor = options.bgColor;
element.style.width = options.width;
element.style.height = options.height;
element.textContent = options.txt;
return element;
}
Example of declarative
function makeWidget(type, txt) {
return new Element(type, txt);
}
Note: The difference is not one of brevity or complexity or abstraction. As stated, the difference is how vs what.
I think that your taxonomy is incorrect. There are two opposite types imperative and declarative. Functional is just a subtype of declarative. BTW, wikipedia states the same fact.
Nowadays, new focus: we need the old classifications?
The Imperative/Declarative/Functional aspects was good in the past to classify generic languages, but in nowadays all "big language" (as Java, Python, Javascript, etc.) have some option (typically frameworks) to express with "other focus" than its main one (usual imperative), and to express parallel processes, declarative functions, lambdas, etc.
So a good variant of this question is "What aspect is good to classify frameworks today?"
... An important aspect is something that we can labeling "programming style"...
Focus on the fusion of data with algorithm
A good example to explain. As you can read about jQuery at Wikipedia,
The set of jQuery core features — DOM element selections, traversal and manipulation —, enabled by its selector engine (...), created a new "programming style", fusing algorithms and DOM-data-structures
So jQuery is the best (popular) example of focusing on a "new programming style", that is not only object orientation, is "Fusing algorithms and data-structures". jQuery is somewhat reactive as spreadsheets, but not "cell-oriented", is "DOM-node oriented"... Comparing the main styles in this context:
No fusion: in all "big languages", in any Functional/Declarative/Imperative expression, the usual is "no fusion" of data and algorithm, except by some object-orientation, that is a fusion in strict algebric structure point of view.
Some fusion: all classic strategies of fusion, in nowadays have some framework using it as paradigm... dataflow, Event-driven programming (or old domain specific languages as awk and XSLT)... Like programming with modern spreadsheets, they are also examples of reactive programming style.
Big fusion: is "the jQuery style"... jQuery is a domain specific language focusing on "fusing algorithms and DOM-data-structures". PS: other "query languages", as XQuery, SQL (with PL as imperative expression option) are also data-algorith-fusion examples, but they are islands, with no fusion with other system modules... Spring, when using find()-variants and Specification clauses, is another good fusion example.
Declarative programming is programming by expressing some timeless logic between the input and the output, for instance, in pseudocode, the following example would be declarative:
def factorial(n):
if n < 2:
return 1
else:
return factorial(n-1)
output = factorial(argvec[0])
We just define a relationship called the 'factorial' here, and defined the relationship between the output and the input as the that relationship. As should be evident here, about any structured language allows declarative programming to some extend. A central idea of declarative programming is immutable data, if you assign to a variable, you only do so once, and then never again. Other, stricter definitions entail that there may be no side-effects at all, these languages are some times called 'purely declarative'.
The same result in an imperative style would be:
a = 1
b = argvec[0]
while(b < 2):
a * b--
output = a
In this example, we expressed no timeless static logical relationship between the input and the output, we changed memory addresses manually until one of them held the desired result. It should be evident that all languages allow declarative semantics to some extend, but not all allow imperative, some 'purely' declarative languages permit side effects and mutation altogether.
Declarative languages are often said to specify 'what must be done', as opposed to 'how to do it', I think that is a misnomer, declarative programs still specify how one must get from input to output, but in another way, the relationship you specify must be effectively computable (important term, look it up if you don't know it). Another approach is nondeterministic programming, that really just specifies what conditions a result much meet, before your implementation just goes to exhaust all paths on trial and error until it succeeds.
Purely declarative languages include Haskell and Pure Prolog. A sliding scale from one and to the other would be: Pure Prolog, Haskell, OCaml, Scheme/Lisp, Python, Javascript, C--, Perl, PHP, C++, Pascall, C, Fortran, Assembly
Some good answers here regarding the noted "types".
I submit some additional, more "exotic" concepts often associated with the functional programming crowd:
Domain Specific Language or DSL Programming: creating a new language to deal with the problem at hand.
Meta-Programming: when your program writes other programs.
Evolutionary Programming: where you build a system that continually improves itself or generates successively better generations of sub-programs.
In a nutshell, the more a programming style emphasizes What (to do) abstracting away the details of How (to do it) the more that style is considered to be declarative. The opposite is true for imperative. Functional programming is associated with the declarative style.

What's the best way to model an unordered list (i.e., a set)?

What's the most natural way to model a group of objects that form a set? For example, you might have a bunch of user objects who are all subscribers to a mailing list.
Obviously you could model this as an array, but then you have to order the elements and whoever is using your interface might be confused as to why you're encoding arbitrary ordering data.
You can use a hash where the members are keys that map to "1" or "true", but in most languages there are restrictions on what data types a hash key can be.
What's the standard way to do this in modern languages (PHP, Perl, Ruby, Python, etc)?
In Python, you would use the set datatype. A set supports containing any hashable object, so if you have a custom class you need to store in a set and the default hashable behaviour is not appropriate, you can implement __hash__ to implement the behaviour you want.
C# has the HashSet<T> generic collection.
public class EmailAddress // probably needs to override GetHashCode()
{
...
}
var addresses = new HashSet<EmailAddress>();
Most modern languages are going to have some form of Set data structure. Java has HashSet, which implements the Set interface.
In PHP you can use an array to store your data. Either search the array before you add a new element, or use array_unique to remove duplicates after inserting all elements.
In c as a stand-in for understanding the machine directly:
For small, discrete and well defined ranges: use a bitwise array to indicate the presence of each possible item (set for present, unset for absent).
Use a hash-table for all other cases.
Write functions to implement adding and removing items, testing for presence or absence, testing for sub-sets, etc as needed.
As the other answers note, however, if you just want the functionality, use a language feature or third-party library that is already well debugged.
A lot of the time hash-based sets are the correct thing to use, but if you don't need to do key-based lookups and don't worry about enforcing unique values, a vector or list is fine. There is overhead to a hash table, after all.
You seem to be concerned that people will think that the order in the vector is important, but I think that it is a common enough usage that, with documentation, you shouldn't confuse people.
It really depends on how you want to access and use the data.
and Array is usually the simplest way to store data, without any other requirements. Usually other data types are used for different reasons (you want to append data, you want to search data in constant time, you need quick set union/intersection, etc) If your only concern is the abstraction you could wrap it in some kind of unordered facade.
In Perl I would use a hash, definitely. In other languages I would lament the lack of a hash.