What are the differences between genetic algorithms and genetic programming?

What are the differences between genetic algorithms and genetic programming? - terminology

I would like to have a simple explanation of the differences between genetic algorithms and genetic programming (without too much programming jargon). Examples would also be appreciated.
Apparently, in genetic programming, solutions are computer programs. On the other hand, genetic algorithms represent a solution as a string of numbers. Any other differences?

Genetic algorithms (GA) are search algorithms that mimic the process of natural evolution, where each individual is a candidate solution: individuals are generally "raw data" (in whatever encoding format has been defined).
Genetic programming (GP) is considered a special case of GA, where each individual is a computer program (not just "raw data"). GP explore the algorithmic search space and evolve computer programs to perform a defined task.

Genetic programming and genetic algorithms are very similar. They are both used to evolve the answer to a problem, by comparing the fitness of each candidate in a population of potential candidates over many generations.
Each generation, new candidates are found by randomly changing (mutation) or swapping parts (crossover) of other candidates. The least 'fit' candidates are removed from the population.
Structural differences
The main difference between them is the representation of the algorithm/program.
A genetic algorithm is represented as a list of actions and values, often a string. for example:
1+x*3-5*6
A parser has to be written for this encoding, to understand how to turn this into a function. The resulting function might look like this:
function(x) { return 1 * x * 3 - 5 * 6; }
The parser also needs to know how to deal with invalid states, because mutation and crossover operations don't care about the semantics of the algorithm, for example the following string could be produced: 1+/3-2*. An approach needs to be decided to deal with these invalid states.
A genetic program is represented as a tree structure of actions and values, usually a nested data structure. Here's the same example, illustrated as a tree:
-
/ \
* *
/ \ / \
1 * 5 6
/ \
x 3
A parser also has to be written for this encoding, but genetic programming does not (usually) produce invalid states because mutation and crossover operations work within the structure of the tree.
Practical differences
Genetic algorithms
Inherently have a fixed length, meaning the resulting function has bounded complexity
Often produces invalid states, so these need to be handled non-destructively
Often rely on operator precedence (e.g. in our example multiplication happens before subtraction) which could be seen as a limitation
Genetic programs
Inherently have a variable length, meaning they are more flexible, but often grow in complexity
Rarely produces invalid states, these can usually be discarded
Use an explicit structure to avoid operator precedence entirely

To make it simple, (on the way I see it) Genetic Programming is an application of Genetic Algorithm. The Genetic Algorithm is used to create another solution via a computer program.

Practical answer:
GA is when using a population and evolve the generations of population to a better state.
(For example how the humans have evolved from animals to people, by breading and get better genes)
GP is when by known definition of the problem generate code into better solve a problem.
(GP will usually give a lots of if/else statements, that will explain the solution)

Lots of good partial answers above. As Koza put it in his seminal texts on the subject, "[if a GA was the best solution for a problem then a GP would evolve a GA to solve it]." Simply put, a GP is a type of GA that evolves programs that are evaluated by a cost function. The fact that the genome is a program rather than a collection of inputs for the cost function IMHO is the material difference.
https://en.wikipedia.org/wiki/Genetic_programming

genetic programming is much more powerful than genetic algorithms. the output of the genetic algorithms is a quantity while the output of the genetic programming is another computer program.

Related

Efficient implementation of multiple return values?

Is it possible to implement efficiently (with little to no runtime overhead) functions that return multiple vales / a tuple type?
In a C-like language something like this:
int, float f(int a) {
return a*2 , a / 2;
}
Is there a reason why very few statically compiled languages do this?

Yes, it can be efficient. You may need to spill registers, but it is possible.
GHC for example, implements the "constructed product return" optimization, that:
determines when a function can profitably return multiple results in registers. The analysis is based only on a function's definition, and not on its uses (so separate compilation is easily supported) and the results of the analysis can be expressed by a transformation of the function definition alone.
CPR is a huge win for returning small structures (i.e. tuples, tagged unions).
More information:
CPR analysis in GHC.
More on demand analysis.

The best paper I have read exactly about this topic is An Eﬃcient Implementation of Multiple Return Values in Scheme (PDF). Although it is about Scheme programming language, they explain the matter in terms of low machine level stack/registry implementation.
This article actually made me think that many high-level features normally considered inefficient are a solved problem regarding the efficient implementation and just the inertia of the popular languages is in the way.

If your tuple doesn't fit into a single register (32- or 64-bit, depending on your architecture, most likely), then there's going to be actual allocation (most likely on the heap) involved in implementing this.
That said, the reasons why very few languages permit this style is unlikely to be related to performance as much as it is likely related to stylistic concerns in the language (i.e., there are other idiomatic ways to achieve the same thing, such as returning a struct). Introducing new primitives to the language can be clumsy and introduce inconsistencies. For example, if tuples become first-class values, can I use them anywhere? How do I access them? Do we enforce immutability? How do I allocate or deallocate them?
Languages with more expressive type systems tend to make it easier to add these kinds of language features in a principled manner, which is why you'll find tuples (and all sorts of other exotic creatures) as first-class values in languages derived from the ML family (amongst others).

In C, just return a struct holding the values. Sure, the result won't fit in a register, unlike an int, so it will not be as efficient as an int return, but it will still be efficient, if the struct is a local variable and thus allocated on the stack rather than the heap.

How do I evaluate a text summarization tool?

I have written a system that summarizes a long document containing thousands of words. Are there any norms on how such a system should be evaluated in the context of a user survey?
In short, is there a metric for evaluating the time that my tool has saved a human? Currently, I was thinking of using the (Time taken to read the original document/Time taken to read the summary) as a way of determining the time saved, but are there better metrics?
Currently, I am asking the user subjective questions about the accuracy of the summary.

In general:
Bleu measures precision: how much the words (and/or n-grams) in the machine generated summaries appeared in the human reference summaries.
Rouge measures recall: how much the words (and/or n-grams) in the human reference summaries appeared in the machine generated summaries.
Naturally - these results are complementing, as is often the case in precision vs recall. If you have many words/ngrams from the system results appearing in the human references you will have high Bleu, and if you have many words/ngrams from the human references appearing in the system results you will have high Rouge.
There's something called brevity penalty, which is quite important and has already been added to standard Bleu implementations. It penalizes system results which are shorter than the general length of a reference (read more about it here). This complements the n-gram metric behavior which in effect penalizes longer than reference results, since the denominator grows the longer the system result is.
You could also implement something similar for Rouge, but this time penalizing system results which are longer than the general reference length, which would otherwise enable them to obtain artificially higher Rouge scores (since the longer the result, the higher the chance you would hit some word appearing in the references). In Rouge we divide by the length of the human references, so we would need an additional penalty for longer system results which could artificially raise their Rouge score.
Finally, you could use the F1 measure to make the metrics work together: F1 = 2 * (Bleu * Rouge) / (Bleu + Rouge)

BLEU
Bleu measures precision
Bilingual Evaluation Understudy
Originally for machine translation(Bilingual)
W(machine generates summary) in (Human reference Summary)
That is how much the word (and/or n-grams) in the machine generated summaries appeared in the human reference summaries
The closer a machine translation is to a professional human translation, the better it is
ROUGE
Rouge measures recall
Recall Oriented Understudy for Gisting Evaluation
-W(Human Reference Summary) In w(machine generates summary)
That is how much the words (and/or n-grams) in the machine generates summaries appeared in the machine generated summaries.
Overlap of N-grams between the system and references summaries.
-Rouge N, ehere N is n-gram
reference_text = """Artificial intelligence (AI, also machine intelligence, MI) is intelligence demonstrated by machines, in contrast to the natural intelligence (NI) displayed by humans and other animals. In computer science AI research is defined as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving". See glossary of artificial intelligence. The scope of AI is disputed: as machines become increasingly capable, tasks considered as requiring "intelligence" are often removed from the definition, a phenomenon known as the AI effect, leading to the quip "AI is whatever hasn't been done yet." For instance, optical character recognition is frequently excluded from "artificial intelligence", having become a routine technology. Capabilities generally classified as AI as of 2017 include successfully understanding human speech, competing at a high level in strategic game systems (such as chess and Go), autonomous cars, intelligent routing in content delivery networks, military simulations, and interpreting complex data, including images and videos. Artificial intelligence was founded as an academic discipline in 1956, and in the years since has experienced several waves of optimism, followed by disappointment and the loss of funding (known as an "AI winter"), followed by new approaches, success and renewed funding. For most of its history, AI research has been divided into subfields that often fail to communicate with each other. These sub-fields are based on technical considerations, such as particular goals (e.g. "robotics" or "machine learning"), the use of particular tools ("logic" or "neural networks"), or deep philosophical differences. Subfields have also been based on social factors (particular institutions or the work of particular researchers). The traditional problems (or goals) of AI research include reasoning, knowledge, planning, learning, natural language processing, perception and the ability to move and manipulate objects. General intelligence is among the field's long-term goals. Approaches include statistical methods, computational intelligence, and traditional symbolic AI. Many tools are used in AI, including versions of search and mathematical optimization, neural networks and methods based on statistics, probability and economics. The AI field draws upon computer science, mathematics, psychology, linguistics, philosophy and many others. The field was founded on the claim that human intelligence "can be so precisely described that a machine can be made to simulate it". This raises philosophical arguments about the nature of the mind and the ethics of creating artificial beings endowed with human-like intelligence, issues which have been explored by myth, fiction and philosophy since antiquity. Some people also consider AI to be a danger to humanity if it progresses unabatedly. Others believe that AI, unlike previous technological revolutions, will create a risk of mass unemployment. In the twenty-first century, AI techniques have experienced a resurgence following concurrent advances in computer power, large amounts of data, and theoretical understanding; and AI techniques have become an essential part of the technology industry, helping to solve many challenging problems in computer science."""
Abstractive summarization
# Abstractive Summarize
len(reference_text.split())
from transformers import pipeline
summarization = pipeline("summarization")
abstractve_summarization = summarization(reference_text)[0]["summary_text"]
Abstractive Output
In computer science AI research is defined as the study of "intelligent agents" Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving" Capabilities generally classified as AI as of 2017 include successfully understanding human speech, competing at a high level in strategic game systems (such as chess and Go)
EXtractive summarization
# Extractive summarize
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer
parser = PlaintextParser.from_string(reference_text, Tokenizer("english"))
# parser.document.sentences
summarizer = LexRankSummarizer()
extractve_summarization = summarizer(parser.document,2)
extractve_summarization) = ' '.join([str(s) for s in list(extractve_summarization)])
Extractive Output
Colloquially, the term "artificial intelligence" is often used to describe machines that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect. Sub-fields have also been based on social factors (particular institutions or the work of particular researchers).The traditional problems (or goals) of AI research include reasoning, knowledge representation, planning, learning, natural language processing, perception and the ability to move and manipulate objects.
Using Rouge to Evaluate abstractive Summary
from rouge import Rouge
r = Rouge()
r.get_scores(abstractve_summarization, reference_text)
Using Rouge Abstractive summary output
[{'rouge-1': {'f': 0.22299651364421083,
'p': 0.9696969696969697,
'r': 0.12598425196850394},
'rouge-2': {'f': 0.21328671127225052,
'p': 0.9384615384615385,
'r': 0.1203155818540434},
'rouge-l': {'f': 0.29041095634452996,
'p': 0.9636363636363636,
'r': 0.17096774193548386}}]
Using Rouge to Evaluate abstractive Summary
from rouge import Rouge
r = Rouge()
r.get_scores(extractve_summarization, reference_text)
Using Rouge Extractive summary output
[{'rouge-1': {'f': 0.27860696251962963,
'p': 0.8842105263157894,
'r': 0.16535433070866143},
'rouge-2': {'f': 0.22296172781038814,
'p': 0.7127659574468085,
'r': 0.13214990138067062},
'rouge-l': {'f': 0.354755780824869,
'p': 0.8734177215189873,
'r': 0.22258064516129034}}]
Interpreting rouge scores
ROUGE is a score of overlapping words. ROUGE-N refers to overlapping n-grams. Specifically:
I tried to simplify the notation when compared with the original paper. Let's assume we are calculating ROUGE-2, aka bigram matches. The numerator ∑s loops through all bigrams in a single reference summary and calculates the number of times a matching bigram is found in the candidate summary (proposed by the summarization algorithm). If there are more than one reference summary, ∑r ensures we repeat the process over all reference summaries.
The denominator simply counts the total number of bigrams in all reference summaries. This is the process for one document-summary pair. You repeat the process for all documents, and average all the scores and that gives you a ROUGE-N score. So a higher score would mean that on average there is a high overlap of n-grams between your summaries and the references.
Example:
S1. police killed the gunman
S2. police kill the gunman
S3. the gunman kill police
S1 is the reference and S2 and S3 are candidates. Note S2 and S3 both have one overlapping bigram with the reference, so they have the same ROUGE-2 score, although S2 should be better. An additional ROUGE-L score deals with this, where L stands for Longest Common Subsequence. In S2, the first word and last two words match the reference, so it scores 3/4, whereas S3 only matches the bigram, so scores 2/4.

Historically, summarization systems have often been evaluated by comparing to human-generated reference summaries. In some cases, the human summarizer constructs a summary by selecting relevant sentences from the original document; in others, the summaries are hand-written from scratch.
Those two techniques are analogous to the two major categories of automatic summarization systems - extractive vs. abstractive (more details available on Wikipedia).
One standard tool is Rouge, a script (or a set of scripts; I can't remember offhand) that computes n-gram overlap between the automatic summary and a reference summary. Rough can optionally compute overlap allowing word insertions or deletions between the two summaries (e.g. if allowing a 2-word skip, 'installed pumps' would be credited as a match to 'installed defective flood-control pumps').
My understanding is that Rouge's n-gram overlap scores were fairly well correlated with human evaluation of summaries up to some level of accuracy, but that the relationship may break down as summarization quality improves. I.e., that beyond some quality threshold, summaries that are judged better by human evaluators may be scored similarly to - or outscored by - summaries judged inferior. Nevertheless, Rouge scores might be a helpful first cut at comparing 2 candidate summarization systems, or a way to automate regression testing and weed out serious regressions before passing a system on to human evaluators.
Your approach of collecting human judgements is probably the best evaluation, if you're able to afford the time / monetary cost. To add a little rigor to that process, you might look at the scoring criteria used in recent summarization tasks (see the various conferences mentioned by #John Lehmann). The scoresheets used by those evaluators might help guide your own evaluation.

I'm not sure about the time evaluation, but regarding accuracy you might consult literature under the topic Automatic Document Summarization. The primary evaluation was the Document Understanding Conference (DUC) until the Summarization task was moved into Text Analysis Conference (TAC) in 2008. Most of these focus on advanced summarization topics such as multi-document, multi-lingual, and update summaries.
You can find the evaluation guidelines for each of these events posted online. For single document summarization tasks look at DUC 2002-2004.
Or, you might consult the ADS evaluation section in Wikipedia.

There is also the very recent BERTScore metric (arXiv'19, ICLR'20, already almost 90 citations) that does not suffer from the well-known issues of ROUGE and BLEU.
Abstract from the paper:
We propose BERTScore, an automatic evaluation metric for text
generation. Analogously to common metrics, BERTScore computes a
similarity score for each token in the candidate sentence with each
token in the reference sentence. However, instead of exact matches, we
compute token similarity using contextual embeddings. We evaluate
using the outputs of 363 machine translation and image captioning
systems. BERTScore correlates better with human judgments and provides
stronger model selection performance than existing metrics. Finally,
we use an adversarial paraphrase detection task to show that BERTScore
is more robust to challenging examples when compared to existing
metrics.
Paper: https://arxiv.org/pdf/1904.09675.pdf
Code: https://github.com/Tiiiger/bert_score
Full reference:
Zhang, Tianyi, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. "Bertscore: Evaluating text generation with bert." arXiv preprint arXiv:1904.09675 (2019).

There are many parameters against which you can evaluate your summarization system.
like
Precision = Number of important sentences/Total number of sentences summarized.
Recall = Total number of important sentences Retrieved / Total number of important sentences present.
F Score = 2*(Precision*Recall/Precision+ Recall)
Compressed Rate = Total number of words in the summary / Total number of words in original document.

When you are evaluating an automatic summarisation system you would typically look at the content of the summary rather than time.
Your idea of:
(Time taken to read the original document/Time taken to read the summary)
Doesn't tell you much about your summarisation system, it really only gives you and idea of the compression rate of your system (i.e. the summary is 10% of the original document).
You may want to consider the time it takes your system to summarise a document vs. the time it would take a human (system: 2s, human: 10 mins).

I recommend BartScore. Check the Github page and the article. The authors issued also a meta-evaluation on the ExplainaBoard platform, "which allows to interactively understand the strengths, weaknesses, and complementarity of each metric". You can find the list of most of the state-of-the-art metrics there.

As a quick summary for collection of metrics, I wrote a post descrbing the evaluation metrics, what kind of metrics do we have ? what's the difference between human evaluation ? etc. You can read the blog post Evaluation Metrics: Assessing the quality of NLG outputs.
Also, along with the NLP projects we created and publicly released an evaluation package Jury which is still actively maintained and you can see the reasons why we created such a package in the repo. There are packages to carry out evaluation in NLP (some of them are specialized in a spesific NLP task):
jury
datasets
sacrebleu (Machine Translation)
SummEval (Summarization)

Calculating 4th power differences

I am using Modelica for solving a system of equations for heat transfer problems, and one of them is radiation which is written as
Ta^4-Tb^4
Can someone say if it is computationally faster solving a system with the equation written as:
(Ta-Tb)(Ta+Tb)(Ta^2+Tb^2)
?

There cannot be a definitive answer to this question. This is because the Modelica specification is used to formally define the problem statement but it says nothing about how tools solve such equations. Furthermore, since most Modelica tools do symbolic manipulation anyway, it is hard to predict what steps they might take with such an equation. For example, a tool may very well transform this into a Horner polynomial on its own (without your manual intervention).
If you are going to solve for the temperatures in such an equation as a non-linear system, be careful about negative temperature solutions. You should investigate the "start" attribute to specify initial (positive) guesses when these temperatures are iteration variables in non-linear problems.

I would say that there are two reasons why splitting it into (Ta-Tb)(Ta+Tb)(Ta^2+Tb^2) is SLOWER and NOT FASTER.
(Ta^2+Tb^2) requires 2 multiplications and an addition, which means that (Ta-Tb)(Ta+Tb)(Ta^2+Tb^2) requires 4 multiplications and 3 additions. On the other hand, i guess that Ta^4-Tb^4 is done like this: ((Ta^2)^2 - (Tb^2)^2) which means 1 addition and 4 multiplications.
Mathematica, like a more generic compiler probably knows very well how to optimise these very simple expression. Which means that it is generally safer in terms of computation time to use simple patterns which will be easily caugth and translated into super efficient machine code.
I might obviously be wrong, but I cannot see any reason why (Ta-Tb)(Ta+Tb)(Ta^2+Tb^2) could be FASTER. Hope it helps.
Oscar

What are the lesser known but useful data structures?

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
There are some data structures around that are really useful but are unknown to most programmers. Which ones are they?
Everybody knows about linked lists, binary trees, and hashes, but what about Skip lists and Bloom filters for example. I would like to know more data structures that are not so common, but are worth knowing because they rely on great ideas and enrich a programmer's tool box.
PS: I am also interested in techniques like Dancing links which make clever use of properties of a common data structure.
EDIT:
Please try to include links to pages describing the data structures in more detail. Also, try to add a couple of words on why a data structure is cool (as Jonas Kölker already pointed out). Also, try to provide one data-structure per answer. This will allow the better data structures to float to the top based on their votes alone.

Tries, also known as prefix-trees or crit-bit trees, have existed for over 40 years but are still relatively unknown. A very cool use of tries is described in "TRASH - A dynamic LC-trie and hash data structure", which combines a trie with a hash function.

Bloom filter: Bit array of m bits, initially all set to 0.
To add an item you run it through k hash functions that will give you k indices in the array which you then set to 1.
To check if an item is in the set, compute the k indices and check if they are all set to 1.
Of course, this gives some probability of false-positives (according to wikipedia it's about 0.61^(m/n) where n is the number of inserted items). False-negatives are not possible.
Removing an item is impossible, but you can implement counting bloom filter, represented by array of ints and increment/decrement.

Rope: It's a string that allows for cheap prepends, substrings, middle insertions and appends. I've really only had use for it once, but no other structure would have sufficed. Regular strings and arrays prepends were just far too expensive for what we needed to do, and reversing everthing was out of the question.

Skip lists are pretty neat.
Wikipedia
A skip list is a probabilistic data structure, based on multiple parallel, sorted linked lists, with efficiency comparable to a binary search tree (order log n average time for most operations).
They can be used as an alternative to balanced trees (using probalistic balancing rather than strict enforcement of balancing). They are easy to implement and faster than say, a red-black tree. I think they should be in every good programmers toolchest.
If you want to get an in-depth introduction to skip-lists here is a link to a video of MIT's Introduction to Algorithms lecture on them.
Also, here is a Java applet demonstrating Skip Lists visually.

Spatial Indices, in particular R-trees and KD-trees, store spatial data efficiently. They are good for geographical map coordinate data and VLSI place and route algorithms, and sometimes for nearest-neighbor search.
Bit Arrays store individual bits compactly and allow fast bit operations.

Zippers - derivatives of data structures that modify the structure to have a natural notion of 'cursor' -- current location. These are really useful as they guarantee indicies cannot be out of bound -- used, e.g. in the xmonad window manager to track which window has focused.
Amazingly, you can derive them by applying techniques from calculus to the type of the original data structure!

Here are a few:
Suffix tries. Useful for almost all kinds of string searching (http://en.wikipedia.org/wiki/Suffix_trie#Functionality). See also suffix arrays; they're not quite as fast as suffix trees, but a whole lot smaller.
Splay trees (as mentioned above). The reason they are cool is threefold:
They are small: you only need the left and right pointers like you do in any binary tree (no node-color or size information needs to be stored)
They are (comparatively) very easy to implement
They offer optimal amortized complexity for a whole host of "measurement criteria" (log n lookup time being the one everybody knows). See http://en.wikipedia.org/wiki/Splay_tree#Performance_theorems
Heap-ordered search trees: you store a bunch of (key, prio) pairs in a tree, such that it's a search tree with respect to the keys, and heap-ordered with respect to the priorities. One can show that such a tree has a unique shape (and it's not always fully packed up-and-to-the-left). With random priorities, it gives you expected O(log n) search time, IIRC.
A niche one is adjacency lists for undirected planar graphs with O(1) neighbour queries. This is not so much a data structure as a particular way to organize an existing data structure. Here's how you do it: every planar graph has a node with degree at most 6. Pick such a node, put its neighbors in its neighbor list, remove it from the graph, and recurse until the graph is empty. When given a pair (u, v), look for u in v's neighbor list and for v in u's neighbor list. Both have size at most 6, so this is O(1).
By the above algorithm, if u and v are neighbors, you won't have both u in v's list and v in u's list. If you need this, just add each node's missing neighbors to that node's neighbor list, but store how much of the neighbor list you need to look through for fast lookup.

I think lock-free alternatives to standard data structures i.e lock-free queue, stack and list are much overlooked.
They are increasingly relevant as concurrency becomes a higher priority and are much more admirable goal than using Mutexes or locks to handle concurrent read/writes.
Here's some links
http://www.cl.cam.ac.uk/research/srg/netos/lock-free/
http://www.research.ibm.com/people/m/michael/podc-1996.pdf [Links to PDF]
http://www.boyet.com/Articles/LockfreeStack.html
Mike Acton's (often provocative) blog has some excellent articles on lock-free design and approaches

I think Disjoint Set is pretty nifty for cases when you need to divide a bunch of items into distinct sets and query membership. Good implementation of the Union and Find operations result in amortized costs that are effectively constant (inverse of Ackermnan's Function, if I recall my data structures class correctly).

Fibonacci heaps
They're used in some of the fastest known algorithms (asymptotically) for a lot of graph-related problems, such as the Shortest Path problem. Dijkstra's algorithm runs in O(E log V) time with standard binary heaps; using Fibonacci heaps improves that to O(E + V log V), which is a huge speedup for dense graphs. Unfortunately, though, they have a high constant factor, often making them impractical in practice.

Anyone with experience in 3D rendering should be familiar with BSP trees. Generally, it's the method by structuring a 3D scene to be manageable for rendering knowing the camera coordinates and bearing.
Binary space partitioning (BSP) is a
method for recursively subdividing a
space into convex sets by hyperplanes.
This subdivision gives rise to a
representation of the scene by means
of a tree data structure known as a
BSP tree.
In other words, it is a method of
breaking up intricately shaped
polygons into convex sets, or smaller
polygons consisting entirely of
non-reflex angles (angles smaller than
180°). For a more general description
of space partitioning, see space
partitioning.
Originally, this approach was proposed
in 3D computer graphics to increase
the rendering efficiency. Some other
applications include performing
geometrical operations with shapes
(constructive solid geometry) in CAD,
collision detection in robotics and 3D
computer games, and other computer
applications that involve handling of
complex spatial scenes.

Huffman trees - used for compression.

Have a look at Finger Trees, especially if you're a fan of the previously mentioned purely functional data structures. They're a functional representation of persistent sequences supporting access to the ends in amortized constant time, and concatenation and splitting in time logarithmic in the size of the smaller piece.
As per the original article:
Our functional 2-3 finger trees are an instance of a general design technique in- troduced by Okasaki (1998), called implicit recursive slowdown. We have already noted that these trees are an extension of his implicit deque structure, replacing pairs with 2-3 nodes to provide the flexibility required for efficient concatenation and splitting.
A Finger Tree can be parameterized with a monoid, and using different monoids will result in different behaviors for the tree. This lets Finger Trees simulate other data structures.

Circular or ring buffer - used for streaming, among other things.

I'm surprised no one has mentioned Merkle trees (ie. Hash Trees).
Used in many cases (P2P programs, digital signatures) where you want to verify the hash of a whole file when you only have part of the file available to you.

<zvrba> Van Emde-Boas trees
I think it'd be useful to know why they're cool. In general, the question "why" is the most important to ask ;)
My answer is that they give you O(log log n) dictionaries with {1..n} keys, independent of how many of the keys are in use. Just like repeated halving gives you O(log n), repeated sqrting gives you O(log log n), which is what happens in the vEB tree.

How about splay trees?
Also, Chris Okasaki's purely functional data structures come to mind.

An interesting variant of the hash table is called Cuckoo Hashing. It uses multiple hash functions instead of just 1 in order to deal with hash collisions. Collisions are resolved by removing the old object from the location specified by the primary hash, and moving it to a location specified by an alternate hash function. Cuckoo Hashing allows for more efficient use of memory space because you can increase your load factor up to 91% with only 3 hash functions and still have good access time.

A min-max heap is a variation of a heap that implements a double-ended priority queue. It achieves this by by a simple change to the heap property: A tree is said to be min-max ordered if every element on even (odd) levels are less (greater) than all childrens and grand children. The levels are numbered starting from 1.
http://internet512.chonbuk.ac.kr/datastructure/heap/img/heap8.jpg

I like Cache Oblivious datastructures. The basic idea is to lay out a tree in recursively smaller blocks so that caches of many different sizes will take advantage of blocks that convenient fit in them. This leads to efficient use of caching at everything from L1 cache in RAM to big chunks of data read off of the disk without needing to know the specifics of the sizes of any of those caching layers.

Left Leaning Red-Black Trees. A significantly simplified implementation of red-black trees by Robert Sedgewick published in 2008 (~half the lines of code to implement). If you've ever had trouble wrapping your head around the implementation of a Red-Black tree, read about this variant.
Very similar (if not identical) to Andersson Trees.

Work Stealing Queue
Lock-free data structure for dividing the work equaly among multiple threads
Implementation of a work stealing queue in C/C++?

Bootstrapped skew-binomial heaps by Gerth Stølting Brodal and Chris Okasaki:
Despite their long name, they provide asymptotically optimal heap operations, even in a function setting.
O(1) size, union, insert, minimum
O(log n) deleteMin
Note that union takes O(1) rather than O(log n) time unlike the more well-known heaps that are commonly covered in data structure textbooks, such as leftist heaps. And unlike Fibonacci heaps, those asymptotics are worst-case, rather than amortized, even if used persistently!
There are multiple implementations in Haskell.
They were jointly derived by Brodal and Okasaki, after Brodal came up with an imperative heap with the same asymptotics.

Kd-Trees, spatial data structure used (amongst others) in Real-Time Raytracing, has the downside that triangles that cross intersect the different spaces need to be clipped. Generally BVH's are faster because they are more lightweight.
MX-CIF Quadtrees, store bounding boxes instead of arbitrary point sets by combining a regular quadtree with a binary tree on the edges of the quads.
HAMT, hierarchical hash map with access times that generally exceed O(1) hash-maps due to the constants involved.
Inverted Index, quite well known in the search-engine circles, because it's used for fast retrieval of documents associated with different search-terms.
Most, if not all, of these are documented on the NIST Dictionary of Algorithms and Data Structures

Ball Trees. Just because they make people giggle.
A ball tree is a data structure that indexes points in a metric space. Here's an article on building them. They are often used for finding nearest neighbors to a point or accelerating k-means.

Not really a data structure; more of a way to optimize dynamically allocated arrays, but the gap buffers used in Emacs are kind of cool.

Fenwick Tree. It's a data structure to keep count of the sum of all elements in a vector, between two given subindexes i and j. The trivial solution, precalculating the sum since the begining doesn't allow to update a item (you have to do O(n) work to keep up).
Fenwick Trees allow you to update and query in O(log n), and how it works is really cool and simple. It's really well explained in Fenwick's original paper, freely available here:
http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol24/issue3/spe884.pdf
Its father, the RQM tree is also very cool: It allows you to keep info about the minimum element between two indexes of the vector, and it also works in O(log n) update and query. I like to teach first the RQM and then the Fenwick Tree.

Van Emde-Boas trees. I have even a C++ implementation of it, for up to 2^20 integers.

Nested sets are nice for representing trees in the relational databases and running queries on them. For instance, ActiveRecord (Ruby on Rails' default ORM) comes with a very simple nested set plugin, which makes working with trees trivial.

It's pretty domain-specific, but half-edge data structure is pretty neat. It provides a way to iterate over polygon meshes (faces and edges) which is very useful in computer graphics and computational geometry.

Functional, Declarative, and Imperative Programming [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
What do the terms functional, declarative, and imperative programming mean?

At the time of writing this, the top voted answers on this page are imprecise and muddled on the declarative vs. imperative definition, including the answer that quotes Wikipedia. Some answers are conflating the terms in different ways.
Refer also to my explanation of why spreadsheet programming is declarative, regardless that the formulas mutate the cells.
Also, several answers claim that functional programming must be a subset of declarative. On that point it depends if we differentiate "function" from "procedure". Lets handle imperative vs. declarative first.
Definition of declarative expression
The only attribute that can possibly differentiate a declarative expression from an imperative expression is the referential transparency (RT) of its sub-expressions. All other attributes are either shared between both types of expressions, or derived from the RT.
A 100% declarative language (i.e. one in which every possible expression is RT) does not (among other RT requirements) allow the mutation of stored values, e.g. HTML and most of Haskell.
Definition of RT expression
RT is often referred to as having "no side-effects". The term effects does not have a precise definition, so some people don't agree that "no side-effects" is the same as RT. RT has a precise definition:
An expression e is referentially transparent if for all programs p every occurrence of e in p can be replaced with the result of evaluating e, without affecting the observable result of p.
Since every sub-expression is conceptually a function call, RT requires that the implementation of a function (i.e. the expression(s) inside the called function) may not access the mutable state that is external to the function (accessing the mutable local state is allowed). Put simply, the function (implementation) should be pure.
Definition of pure function
A pure function is often said to have "no side-effects". The term effects does not have a precise definition, so some people don't agree.
Pure functions have the following attributes.
the only observable output is the return value.
the only output dependency is the arguments.
arguments are fully determined before any output is generated.
Remember that RT applies to expressions (which includes function calls) and purity applies to (implementations of) functions.
An obscure example of impure functions that make RT expressions is concurrency, but this is because the purity is broken at the interrupt abstraction layer. You don't really need to know this. To make RT expressions, you call pure functions.
Derivative attributes of RT
Any other attribute cited for declarative programming, e.g. the citation from 1999 used by Wikipedia, either derives from RT, or is shared with imperative programming. Thus proving that my precise definition is correct.
Note, immutability of external values is a subset of the requirements for RT.
Declarative languages don't have looping control structures, e.g. for and while, because due to immutability, the loop condition would never change.
Declarative languages don't express control-flow other than nested function order (a.k.a logical dependencies), because due to immutability, other choices of evaluation order do not change the result (see below).
Declarative languages express logical "steps" (i.e. the nested RT function call order), but whether each function call is a higher level semantic (i.e. "what to do") is not a requirement of declarative programming. The distinction from imperative is that due to immutability (i.e. more generally RT), these "steps" cannot depend on mutable state, rather only the relational order of the expressed logic (i.e. the order of nesting of the function calls, a.k.a. sub-expressions).
For example, the HTML paragraph <p> cannot be displayed until the sub-expressions (i.e. tags) in the paragraph have been evaluated. There is no mutable state, only an order dependency due to the logical relationship of tag hierarchy (nesting of sub-expressions, which are analogously nested function calls).
Thus there is the derivative attribute of immutability (more generally RT), that declarative expressions, express only the logical relationships of the constituent parts (i.e. of the sub-expression function arguments) and not mutable state relationships.
Evaluation order
The choice of evaluation order of sub-expressions can only give a varying result when any of the function calls are not RT (i.e. the function is not pure), e.g. some mutable state external to a function is accessed within the function.
For example, given some nested expressions, e.g. f( g(a, b), h(c, d) ), eager and lazy evaluation of the function arguments will give the same results if the functions f, g, and h are pure.
Whereas, if the functions f, g, and h are not pure, then the choice of evaluation order can give a different result.
Note, nested expressions are conceptually nested functions, since expression operators are just function calls masquerading as unary prefix, unary postfix, or binary infix notation.
Tangentially, if all identifiers, e.g. a, b, c, d, are immutable everywhere, state external to the program cannot be accessed (i.e. I/O), and there is no abstraction layer breakage, then functions are always pure.
By the way, Haskell has a different syntax, f (g a b) (h c d).
Evaluation order details
A function is a state transition (not a mutable stored value) from the input to the output. For RT compositions of calls to pure functions, the order-of-execution of these state transitions is independent. The state transition of each function call is independent of the others, due to lack of side-effects and the principle that an RT function may be replaced by its cached value. To correct a popular misconception, pure monadic composition is always declarative and RT, in spite of the fact that Haskell's IO monad is arguably impure and thus imperative w.r.t. the World state external to the program (but in the sense of the caveat below, the side-effects are isolated).
Eager evaluation means the functions arguments are evaluated before the function is called, and lazy evaluation means the arguments are not evaluated until (and if) they are accessed within the function.
Definition: function parameters are declared at the function definition site, and function arguments are supplied at the function call site. Know the difference between parameter and argument.
Conceptually, all expressions are (a composition of) function calls, e.g. constants are functions without inputs, unary operators are functions with one input, binary infix operators are functions with two inputs, constructors are functions, and even control statements (e.g. if, for, while) can be modeled with functions. The order that these argument functions (do not confuse with nested function call order) are evaluated is not declared by the syntax, e.g. f( g() ) could eagerly evaluate g then f on g's result or it could evaluate f and only lazily evaluate g when its result is needed within f.
Caveat, no Turing complete language (i.e. that allows unbounded recursion) is perfectly declarative, e.g. lazy evaluation introduces memory and time indeterminism. But these side-effects due to the choice of evaluation order are limited to memory consumption, execution time, latency, non-termination, and external hysteresis thus external synchronization.
Functional programming
Because declarative programming cannot have loops, then the only way to iterate is functional recursion. It is in this sense that functional programming is related to declarative programming.
But functional programming is not limited to declarative programming. Functional composition can be contrasted with subtyping, especially with respect to the Expression Problem, where extension can be achieved by either adding subtypes or functional decomposition. Extension can be a mix of both methodologies.
Functional programming usually makes the function a first-class object, meaning the function type can appear in the grammar anywhere any other type may. The upshot is that functions can input and operate on functions, thus providing for separation-of-concerns by emphasizing function composition, i.e. separating the dependencies among the subcomputations of a deterministic computation.
For example, instead of writing a separate function (and employing recursion instead of loops if the function must also be declarative) for each of an infinite number of possible specialized actions that could be applied to each element of a collection, functional programming employs reusable iteration functions, e.g. map, fold, filter. These iteration functions input a first-class specialized action function. These iteration functions iterate the collection and call the input specialized action function for each element. These action functions are more concise because they no longer need to contain the looping statements to iterate the collection.
However, note that if a function is not pure, then it is really a procedure. We can perhaps argue that functional programming that uses impure functions, is really procedural programming. Thus if we agree that declarative expressions are RT, then we can say that procedural programming is not declarative programming, and thus we might argue that functional programming is always RT and must be a subset of declarative programming.
Parallelism
This functional composition with first-class functions can express the depth in the parallelism by separating out the independent function.
Brent’s Principle: computation with work w and depth d can be
implemented in a p-processor PRAM in time O(max(w/p, d)).
Both concurrency and parallelism also require declarative programming, i.e. immutability and RT.
So where did this dangerous assumption that Parallelism == Concurrency
come from? It’s a natural consequence of languages with side-effects:
when your language has side-effects everywhere, then any time you try
to do more than one thing at a time you essentially have
non-determinism caused by the interleaving of the effects from each
operation. So in side-effecty languages, the only way to get
parallelism is concurrency; it’s therefore not surprising that we
often see the two conflated.
FP evaluation order
Note the evaluation order also impacts the termination and performance side-effects of functional composition.
Eager (CBV) and lazy (CBN) are categorical duels[10], because they have reversed evaluation order, i.e. whether the outer or inner functions respectively are evaluated first. Imagine an upside-down tree, then eager evaluates from function tree branch tips up the branch hierarchy to the top-level function trunk; whereas, lazy evaluates from the trunk down to the branch tips. Eager doesn't have conjunctive products ("and", a/k/a categorical "products") and lazy doesn't have disjunctive coproducts ("or", a/k/a categorical "sums")[11].
Performance
Eager
As with non-termination, eager is too eager with conjunctive functional composition, i.e. compositional control structure does unnecessary work that isn't done with lazy. For example, eager eagerly and unnecessarily maps the entire list to booleans, when it is composed with a fold that terminates on the first true element.
This unnecessary work is the cause of the claimed "up to" an extra log n factor in the sequential time complexity of eager versus lazy, both with pure functions. A solution is to use functors (e.g. lists) with lazy constructors (i.e. eager with optional lazy products), because with eager the eagerness incorrectness originates from the inner function. This is because products are constructive types, i.e. inductive types with an initial algebra on an initial fixpoint[11]
Lazy
As with non-termination, lazy is too lazy with disjunctive functional composition, i.e. coinductive finality can occur later than necessary, resulting in both unnecessary work and non-determinism of the lateness that isn't the case with eager[10][11]. Examples of finality are state, timing, non-termination, and runtime exceptions. These are imperative side-effects, but even in a pure declarative language (e.g. Haskell), there is state in the imperative IO monad (note: not all monads are imperative!) implicit in space allocation, and timing is state relative to the imperative real world. Using lazy even with optional eager coproducts leaks "laziness" into inner coproducts, because with lazy the laziness incorrectness originates from the outer function (see the example in the Non-termination section, where == is an outer binary operator function). This is because coproducts are bounded by finality, i.e. coinductive types with a final algebra on an final object[11].
Lazy causes indeterminism in the design and debugging of functions for latency and space, the debugging of which is probably beyond the capabilities of the majority of programmers, because of the dissonance between the declared function hierarchy and the runtime order-of-evaluation. Lazy pure functions evaluated with eager, could potentially introduce previously unseen non-termination at runtime. Conversely, eager pure functions evaluated with lazy, could potentially introduce previously unseen space and latency indeterminism at runtime.
Non-termination
At compile-time, due to the Halting problem and mutual recursion in a Turing complete language, functions can't generally be guaranteed to terminate.
Eager
With eager but not lazy, for the conjunction of Head "and" Tail, if either Head or Tail doesn't terminate, then respectively either List( Head(), Tail() ).tail == Tail() or List( Head(), Tail() ).head == Head() is not true because the left-side doesn't, and right-side does, terminate.
Whereas, with lazy both sides terminate. Thus eager is too eager with conjunctive products, and non-terminates (including runtime exceptions) in those cases where it isn't necessary.
Lazy
With lazy but not eager, for the disjunction of 1 "or" 2, if f doesn't terminate, then List( f ? 1 : 2, 3 ).tail == (f ? List( 1, 3 ) : List( 2, 3 )).tail is not true because the left-side terminates, and right-side doesn't.
Whereas, with eager neither side terminates so the equality test is never reached. Thus lazy is too lazy with disjunctive coproducts, and in those cases fails to terminate (including runtime exceptions) after doing more work than eager would have.
[10] Declarative Continuations and Categorical Duality, Filinski, sections 2.5.4 A comparison of CBV and CBN, and 3.6.1 CBV and CBN in the SCL.
[11] Declarative Continuations and Categorical Duality, Filinski, sections 2.2.1 Products and coproducts, 2.2.2 Terminal and initial objects, 2.5.2 CBV with lazy products, and 2.5.3 CBN with eager coproducts.

There's not really any non-ambiguous, objective definition for these. Here is how I would define them:
Imperative - The focus is on what steps the computer should take rather than what the computer will do (ex. C, C++, Java).
Declarative - The focus is on what the computer should do rather than how it should do it (ex. SQL).
Functional - a subset of declarative languages that has heavy focus on recursion

imperative and declarative describe two opposing styles of programming. imperative is the traditional "step by step recipe" approach while declarative is more "this is what i want, now you work out how to do it".
these two approaches occur throughout programming - even with the same language and the same program. generally the declarative approach is considered preferable, because it frees the programmer from having to specify so many details, while also having less chance for bugs (if you describe the result you want, and some well-tested automatic process can work backwards from that to define the steps then you might hope that things are more reliable than having to specify each step by hand).
on the other hand, an imperative approach gives you more low level control - it's the "micromanager approach" to programming. and that can allow the programmer to exploit knowledge about the problem to give a more efficient answer. so it's not unusual for some parts of a program to be written in a more declarative style, but for the speed-critical parts to be more imperative.
as you might imagine, the language you use to write a program affects how declarative you can be - a language that has built-in "smarts" for working out what to do given a description of the result is going to allow a much more declarative approach than one where the programmer needs to first add that kind of intelligence with imperative code before being able to build a more declarative layer on top. so, for example, a language like prolog is considered very declarative because it has, built-in, a process that searches for answers.
so far, you'll notice that i haven't mentioned functional programming. that's because it's a term whose meaning isn't immediately related to the other two. at its most simple, functional programming means that you use functions. in particular, that you use a language that supports functions as "first class values" - that means that not only can you write functions, but you can write functions that write functions (that write functions that...), and pass functions to functions. in short - that functions are as flexible and common as things like strings and numbers.
it might seem odd, then, that functional, imperative and declarative are often mentioned together. the reason for this is a consequence of taking the idea of functional programming "to the extreme". a function, in it's purest sense, is something from maths - a kind of "black box" that takes some input and always gives the same output. and that kind of behaviour doesn't require storing changing variables. so if you design a programming language whose aim is to implement a very pure, mathematically influenced idea of functional programming, you end up rejecting, largely, the idea of values that can change (in a certain, limited, technical sense).
and if you do that - if you limit how variables can change - then almost by accident you end up forcing the programmer to write programs that are more declarative, because a large part of imperative programming is describing how variables change, and you can no longer do that! so it turns out that functional programming - particularly, programming in a functional language - tends to give more declarative code.
to summarise, then:
imperative and declarative are two opposing styles of programming (the same names are used for programming languages that encourage those styles)
functional programming is a style of programming where functions become very important and, as a consequence, changing values become less important. the limited ability to specify changes in values forces a more declarative style.
so "functional programming" is often described as "declarative".

In a nutshell:
An imperative language specfies a series of instructions that the computer executes in sequence (do this, then do that).
A declarative language declares a set of rules about what outputs should result from which inputs (eg. if you have A, then the result is B). An engine will apply these rules to inputs, and give an output.
A functional language declares a set of mathematical/logical functions which define how input is translated to output. eg. f(y) = y * y. it is a type of declarative language.

Imperative: how to achieve our goal
Take the next customer from a list.
If the customer lives in Spain, show their details.
If there are more customers in the list, go to the beginning
Declarative: what we want to achieve
Show customer details of every customer living in Spain

Imperative Programming means any style of programming where your program is structured out of instructions describing how the operations performed by a computer will happen.
Declarative Programming means any style of programming where your program is a description either of the problem or the solution - but doesn't explicitly state how the work will be done.
Functional Programming is programming by evaluating functions and functions of functions... As (strictly defined) functional programming means programming by defining side-effect free mathematical functions so it is a form of declarative programming but it isn't the only kind of declarative programming.
Logic Programming (for example in Prolog) is another form of declarative programming. It involves computing by deciding whether a logical statement is true (or whether it can be satisfied). The program is typically a series of facts and rules - i.e. a description rather than a series of instructions.
Term Rewriting (for example CASL) is another form of declarative programming. It involves symbolic transformation of algebraic terms. It's completely distinct from logic programming and functional programming.

imperative - expressions describe sequence of actions to perform (associative)
declarative - expressions are declarations that contribute to behavior of program (associative, commutative, idempotent, monotonic)
functional - expressions have value as only effect; semantics support equational reasoning

Since I wrote my prior answer, I have formulated a new definition of the declarative property which is quoted below. I have also defined imperative programming as the dual property.
This definition is superior to the one I provided in my prior answer, because it is succinct and it is more general. But it may be more difficult to grok, because the implication of the incompleteness theorems applicable to programming and life in general are difficult for humans to wrap their mind around.
The quoted explanation of the definition discusses the role pure functional programming plays in declarative programming.
All exotic types of programming fit into the following taxonomy of declarative versus imperative, since the following definition claims they are duals.
Declarative vs. Imperative
The declarative property is weird, obtuse, and difficult to capture in a technically precise definition that remains general and not ambiguous, because it is a naive notion that we can declare the meaning (a.k.a semantics) of the program without incurring unintended side effects. There is an inherent tension between expression of meaning and avoidance of unintended effects, and this tension actually derives from the incompleteness theorems of programming and our universe.
It is oversimplification, technically imprecise, and often ambiguous to define declarative as “what to do” and imperative as “how to do”. An ambiguous case is the “what” is the “how” in a program that outputs a program— a compiler.
Evidently the unbounded recursion that makes a language Turing complete, is also analogously in the semantics— not only in the syntactical structure of evaluation (a.k.a. operational semantics). This is logically an example analogous to Gödel's theorem— “any complete system of axioms is also inconsistent”. Ponder the contradictory weirdness of that quote! It is also an example that demonstrates how the expression of semantics does not have a provable bound, thus we can't prove2 that a program (and analogously its semantics) halt a.k.a. the Halting theorem.
The incompleteness theorems derive from the fundamental nature of our universe, which as stated in the Second Law of Thermodynamics is “the entropy (a.k.a. the # of independent possibilities) is trending to maximum forever”. The coding and design of a program is never finished— it's alive!— because it attempts to address a real world need, and the semantics of the real world are always changing and trending to more possibilities. Humans never stop discovering new things (including errors in programs ;-).
To precisely and technically capture this aforementioned desired notion within this weird universe that has no edge (ponder that! there is no “outside” of our universe), requires a terse but deceptively-not-simple definition which will sound incorrect until it is explained deeply.
Definition:
The declarative property is where there can exist only one possible set of statements that can express each specific modular semantic.
The imperative property3 is the dual, where semantics are inconsistent under composition and/or can be expressed with variations of sets of statements.
This definition of declarative is distinctively local in semantic scope, meaning that it requires that a modular semantic maintain its consistent meaning regardless where and how it's instantiated and employed in global scope. Thus each declarative modular semantic should be intrinsically orthogonal to all possible others— and not an impossible (due to incompleteness theorems) global algorithm or model for witnessing consistency, which is also the point of “More Is Not Always Better” by Robert Harper, Professor of Computer Science at Carnegie Mellon University, one of the designers of Standard ML.
Examples of these modular declarative semantics include category theory functors e.g. the Applicative, nominal typing, namespaces, named fields, and w.r.t. to operational level of semantics then pure functional programming.
Thus well designed declarative languages can more clearly express meaning, albeit with some loss of generality in what can be expressed, yet a gain in what can be expressed with intrinsic consistency.
An example of the aforementioned definition is the set of formulas in the cells of a spreadsheet program— which are not expected to give the same meaning when moved to different column and row cells, i.e. cell identifiers changed. The cell identifiers are part of and not superfluous to the intended meaning. So each spreadsheet result is unique w.r.t. to the cell identifiers in a set of formulas. The consistent modular semantic in this case is use of cell identifiers as the input and output of pure functions for cells formulas (see below).
Hyper Text Markup Language a.k.a. HTML— the language for static web pages— is an example of a highly (but not perfectly3) declarative language that (at least before HTML 5) had no capability to express dynamic behavior. HTML is perhaps the easiest language to learn. For dynamic behavior, an imperative scripting language such as JavaScript was usually combined with HTML. HTML without JavaScript fits the declarative definition because each nominal type (i.e. the tags) maintains its consistent meaning under composition within the rules of the syntax.
A competing definition for declarative is the commutative and idempotent properties of the semantic statements, i.e. that statements can be reordered and duplicated without changing the meaning. For example, statements assigning values to named fields can be reordered and duplicated without changed the meaning of the program, if those names are modular w.r.t. to any implied order. Names sometimes imply an order, e.g. cell identifiers include their column and row position— moving a total on spreadsheet changes its meaning. Otherwise, these properties implicitly require global consistency of semantics. It is generally impossible to design the semantics of statements so they remain consistent if randomly ordered or duplicated, because order and duplication are intrinsic to semantics. For example, the statements “Foo exists” (or construction) and “Foo does not exist” (and destruction). If one considers random inconsistency endemical of the intended semantics, then one accepts this definition as general enough for the declarative property. In essence this definition is vacuous as a generalized definition because it attempts to make consistency orthogonal to semantics, i.e. to defy the fact that the universe of semantics is dynamically unbounded and can't be captured in a global coherence paradigm.
Requiring the commutative and idempotent properties for the (structural evaluation order of the) lower-level operational semantics converts operational semantics to a declarative localized modular semantic, e.g. pure functional programming (including recursion instead of imperative loops). Then the operational order of the implementation details do not impact (i.e. spread globally into) the consistency of the higher-level semantics. For example, the order of evaluation of (and theoretically also the duplication of) the spreadsheet formulas doesn't matter because the outputs are not copied to the inputs until after all outputs have been computed, i.e. analogous to pure functions.
C, Java, C++, C#, PHP, and JavaScript aren't particularly declarative.
Copute's syntax and Python's syntax are more declaratively coupled to
intended results, i.e. consistent syntactical semantics that eliminate the extraneous so one can readily
comprehend code after they've forgotten it. Copute and Haskell enforce
determinism of the operational semantics and encourage “don't repeat
yourself” (DRY), because they only allow the pure functional paradigm.
2 Even where we can prove the semantics of a program, e.g. with the language Coq, this is limited to the semantics that are expressed in the typing, and typing can never capture all of the semantics of a program— not even for languages that are not Turing complete, e.g. with HTML+CSS it is possible to express inconsistent combinations which thus have undefined semantics.
3 Many explanations incorrectly claim that only imperative programming has syntactically ordered statements. I clarified this confusion between imperative and functional programming. For example, the order of HTML statements does not reduce the consistency of their meaning.
Edit: I posted the following comment to Robert Harper's blog:
in functional programming ... the range of variation of a variable is a type
Depending on how one distinguishes functional from imperative
programming, your ‘assignable’ in an imperative program also may have
a type placing a bound on its variability.
The only non-muddled definition I currently appreciate for functional
programming is a) functions as first-class objects and types, b)
preference for recursion over loops, and/or c) pure functions— i.e.
those functions which do not impact the desired semantics of the
program when memoized (thus perfectly pure functional
programming doesn't exist in a general purpose denotational semantics
due to impacts of operational semantics, e.g. memory
allocation).
The idempotent property of a pure function means the function call on
its variables can be substituted by its value, which is not generally
the case for the arguments of an imperative procedure. Pure functions
seem to be declarative w.r.t. to the uncomposed state transitions
between the input and result types.
But the composition of pure functions does not maintain any such
consistency, because it is possible to model a side-effect (global
state) imperative process in a pure functional programming language,
e.g. Haskell's IOMonad and moreover it is entirely impossible to
prevent doing such in any Turing complete pure functional programming
language.
As I wrote in 2012 which seems to the similar consensus of
comments in your recent blog, that declarative programming is an
attempt to capture the notion that the intended semantics are never
opaque. Examples of opaque semantics are dependence on order,
dependence on erasure of higher-level semantics at the operational
semantics layer (e.g. casts are not conversions and reified generics
limit higher-level semantics), and dependence on variable values
which can not be checked (proved correct) by the programming language.
Thus I have concluded that only non-Turing complete languages can be
declarative.
Thus one unambiguous and distinct attribute of a declarative language
could be that its output can be proven to obey some enumerable set of
generative rules. For example, for any specific HTML program (ignoring
differences in the ways interpreters diverge) that is not scripted
(i.e. is not Turing complete) then its output variability can be
enumerable. Or more succinctly an HTML program is a pure function of
its variability. Ditto a spreadsheet program is a pure function of its
input variables.
So it seems to me that declarative languages are the antithesis of
unbounded recursion, i.e. per Gödel's second incompleteness
theorem self-referential theorems can't be proven.
Lesie Lamport wrote a fairytale about how Euclid might have
worked around Gödel's incompleteness theorems applied to math proofs
in the programming language context by to congruence between types and
logic (Curry-Howard correspondence, etc).

Imperative programming: telling the “machine” how to do something, and as a result what you want to happen will happen.
Declarative programming: telling the “machine” what you would like to happen, and letting the computer figure out how to do it.
Example of imperative
function makeWidget(options) {
const element = document.createElement('div');
element.style.backgroundColor = options.bgColor;
element.style.width = options.width;
element.style.height = options.height;
element.textContent = options.txt;
return element;
}
Example of declarative
function makeWidget(type, txt) {
return new Element(type, txt);
}
Note: The difference is not one of brevity or complexity or abstraction. As stated, the difference is how vs what.

I think that your taxonomy is incorrect. There are two opposite types imperative and declarative. Functional is just a subtype of declarative. BTW, wikipedia states the same fact.

Nowadays, new focus: we need the old classifications?
The Imperative/Declarative/Functional aspects was good in the past to classify generic languages, but in nowadays all "big language" (as Java, Python, Javascript, etc.) have some option (typically frameworks) to express with "other focus" than its main one (usual imperative), and to express parallel processes, declarative functions, lambdas, etc.
So a good variant of this question is "What aspect is good to classify frameworks today?"
... An important aspect is something that we can labeling "programming style"...
Focus on the fusion of data with algorithm
A good example to explain. As you can read about jQuery at Wikipedia,
The set of jQuery core features — DOM element selections, traversal and manipulation —, enabled by its selector engine (...), created a new "programming style", fusing algorithms and DOM-data-structures
So jQuery is the best (popular) example of focusing on a "new programming style", that is not only object orientation, is "Fusing algorithms and data-structures". jQuery is somewhat reactive as spreadsheets, but not "cell-oriented", is "DOM-node oriented"... Comparing the main styles in this context:
No fusion: in all "big languages", in any Functional/Declarative/Imperative expression, the usual is "no fusion" of data and algorithm, except by some object-orientation, that is a fusion in strict algebric structure point of view.
Some fusion: all classic strategies of fusion, in nowadays have some framework using it as paradigm... dataflow, Event-driven programming (or old domain specific languages as awk and XSLT)... Like programming with modern spreadsheets, they are also examples of reactive programming style.
Big fusion: is "the jQuery style"... jQuery is a domain specific language focusing on "fusing algorithms and DOM-data-structures". PS: other "query languages", as XQuery, SQL (with PL as imperative expression option) are also data-algorith-fusion examples, but they are islands, with no fusion with other system modules... Spring, when using find()-variants and Specification clauses, is another good fusion example.

Declarative programming is programming by expressing some timeless logic between the input and the output, for instance, in pseudocode, the following example would be declarative:
def factorial(n):
if n < 2:
return 1
else:
return factorial(n-1)
output = factorial(argvec[0])
We just define a relationship called the 'factorial' here, and defined the relationship between the output and the input as the that relationship. As should be evident here, about any structured language allows declarative programming to some extend. A central idea of declarative programming is immutable data, if you assign to a variable, you only do so once, and then never again. Other, stricter definitions entail that there may be no side-effects at all, these languages are some times called 'purely declarative'.
The same result in an imperative style would be:
a = 1
b = argvec[0]
while(b < 2):
a * b--
output = a
In this example, we expressed no timeless static logical relationship between the input and the output, we changed memory addresses manually until one of them held the desired result. It should be evident that all languages allow declarative semantics to some extend, but not all allow imperative, some 'purely' declarative languages permit side effects and mutation altogether.
Declarative languages are often said to specify 'what must be done', as opposed to 'how to do it', I think that is a misnomer, declarative programs still specify how one must get from input to output, but in another way, the relationship you specify must be effectively computable (important term, look it up if you don't know it). Another approach is nondeterministic programming, that really just specifies what conditions a result much meet, before your implementation just goes to exhaust all paths on trial and error until it succeeds.
Purely declarative languages include Haskell and Pure Prolog. A sliding scale from one and to the other would be: Pure Prolog, Haskell, OCaml, Scheme/Lisp, Python, Javascript, C--, Perl, PHP, C++, Pascall, C, Fortran, Assembly

Some good answers here regarding the noted "types".
I submit some additional, more "exotic" concepts often associated with the functional programming crowd:
Domain Specific Language or DSL Programming: creating a new language to deal with the problem at hand.
Meta-Programming: when your program writes other programs.
Evolutionary Programming: where you build a system that continually improves itself or generates successively better generations of sub-programs.

In a nutshell, the more a programming style emphasizes What (to do) abstracting away the details of How (to do it) the more that style is considered to be declarative. The opposite is true for imperative. Functional programming is associated with the declarative style.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008