What is differentiable programming? - language-agnostic

Native support for differentiable programming has been added to Swift for the Swift for TensorFlow project. Julia has something similar with Zygote.
What exactly is differentiable programming?
What does it enable? Wikipedia says
the programs can be differentiated throughout
but what does that mean?
How would one use it (e.g. a simple example)?
And how does it relate to automatic differentiation (the two seem conflated a lot of the time)?

I like to think about this question in terms of user-facing features (differentiable programming) vs implementation details (automatic differentiation).
From a user's perspective:
"Differentiable programming" is APIs for differentiation. An example is a def gradient(f) higher-order function for computing the gradient of f. These APIs may be first-class language features, or implemented in and provided by libraries.
"Automatic differentiation" is an implementation detail for automatically computing derivative functions. There are many techniques (e.g. source code transformation, operator overloading) and multiple modes (e.g. forward-mode, reverse-mode).
Explained in code:
def f(x):
    return x * x * x

grad_f = gradient(f)  # ∇f
print(grad_f(4))  # 48.0
# Using the `gradient` API:
# ▶ differentiable programming.
# How `gradient` works to compute the gradient of `f`:
# ▶ automatic differentiation.
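
For a concrete (if simplified) picture of the second half, here is a minimal sketch of how a `gradient` API could be implemented with forward-mode automatic differentiation via operator overloading. The `Dual` class and this particular `gradient` are illustrative only, not the API of Swift, Zygote, or any specific library:

class Dual:
    """A value paired with its derivative (forward-mode AD)."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def gradient(f):
    """Return a function that computes df/dx by evaluating f on dual numbers."""
    return lambda x: f(Dual(x, 1.0)).deriv

def f(x):  # same f as above
    return x * x * x

print(gradient(f)(4.0))  # 48.0, since d/dx x^3 = 3x^2 = 48 at x = 4

Real systems typically add reverse mode and source transformation for efficiency, but the division of labour stays the same: `gradient` is the differentiable-programming surface, and the machinery inside it is automatic differentiation.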

I had never heard the term "differentiable programming" before reading your question, but having used the concepts noted in your references, both writing code to compute derivatives with symbolic differentiation and with automatic differentiation, and having written interpreters and compilers, to me this just means that they have made it easier to calculate the numeric value of the derivative of a function. I don't know whether they made it a first-class citizen, but the new way doesn't require a function/method call; it is done with syntax, and the compiler/interpreter hides the translation into calls.
If you look at the Zygote example, it clearly shows the use of prime notation:
julia> f(10), f'(10)
Most seasoned programmers would guess what I just noted without needing a research paper to explain it. In other words, it is just that obvious.
Another way to think about it: if you have ever tried to calculate a derivative in a programming language, you know how hard it can be at times, and you may have asked yourself why they (the language designers and implementers) don't just add it to the language. In these cases they did.
What surprises me is how long it took before derivatives became available via syntax instead of calls, but if you have ever worked with scientific code or coded neural networks at that level then you will understand why this is a concept that is being touted as something of value.
Also I would not view this as another programming paradigm, but I am sure it will be added to the list.
How does it relate to automatic differentiation (the two seem conflated a lot of the time)?
In both cases that you referenced, they use automatic differentiation to calculate the derivative instead of using symbolic differentiation. I do not view differentiable programming and automatic differentiation as two distinct sets; rather, differentiable programming needs a means of being implemented, and the means they chose was automatic differentiation. They could have chosen symbolic differentiation or some other approach.
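To make that distinction concrete, here is a small sketch contrasting the two implementation strategies (it assumes SymPy is available for the symbolic part; the hand-rolled forward-mode lines are only for illustration):

import sympy

# Symbolic differentiation: manipulates an expression and returns another
# expression, which can then be evaluated anywhere.
x = sympy.symbols('x')
dfdx = sympy.diff(x**3, x)
print(dfdx)             # 3*x**2
print(dfdx.subs(x, 4))  # 48

# Automatic differentiation (forward mode, done by hand for f(x) = x*x*x):
# carry (value, derivative) pairs through the computation; no expression
# for the derivative is ever built.
xv, xd = 4.0, 1.0                    # x and dx/dx at x = 4
tv, td = xv * xv, xd * xv + xv * xd  # t = x*x  -> (16.0, 8.0)
fv, fd = tv * xv, td * xv + tv * xd  # f = t*x  -> (64.0, 48.0)
print(fv, fd)                        # 64.0 48.0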
It seems you are trying to read more into what differentiable programming is than it really is. It is not a new way of programming, but just a nice feature added for doing derivatives.
Perhaps if they named it differentiable syntax it might have been more clear. The use of the word programming gives it more panache than I think it deserves.
EDIT
After skimming Swift Differentiable Programming Mega-Proposal and trying to compare it with the Julia example using Zygote, I would have to split the answer into a part about Zygote and then switch gears to talk about Swift. They each took a different path, but the commonality and bottom line is that the languages know something about differentiation, which makes the job of coding with derivatives easier and hopefully produces fewer errors.
About the Wikipedia quote that
the programs can be differentiated throughout
At first reading it seems like nonsense, or at least it lacks enough detail to be understood in context, which I am sure is why you asked.
After many years of digging into what others are trying to communicate, one learns to take a source with a grain of salt unless it has been peer reviewed, and, unless it is absolutely necessary to understand, to just ignore it. In this case, if you ignore that sentence, most of your reference makes sense. However, I take it that you want an answer, so let's try to figure out what it means.
The key word that has me perplexed is throughout. Since you note the statement came from Wikipedia, and Wikipedia gives three references for the statement, I searched for the word throughout; it appears in only one of them:
∂P: A Differentiable Programming System to Bridge Machine Learning and Scientific Computing
Thus, since our ∂P system does not require primitives to handle new types, this means that almost all functions and types defined throughout the language are automatically supported by Zygote, and users can easily accelerate specific functions as they deem necessary.
So my take on this is that by going back to the source, e.g. the paper, you can better understand how that percolated up into Wikipedia, but it seems that the meaning was lost along the way.
In this case, if you really want to know the meaning of that statement, you should go to the Wikipedia talk page and ask the author of the statement directly.
Also note that the paper referenced is not peer reviewed, so the statements in there may not have any meaning amongst peers at present. As I said, I would just ignore it and get on with writing wonderful code.

You can guess its definition from the applications of differentiability.
Differentiation has long been used for optimization, i.e. to calculate minimum or maximum values.
Many of these problems can be solved by finding an appropriate function and then using calculus techniques to find the required maximum or minimum.
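For instance, here is a minimal gradient-descent sketch (the function, step size, and iteration count are arbitrary, illustrative choices) showing how the derivative drives an optimizer toward a minimum:

def f(x):
    return (x - 3.0) ** 2      # minimum at x = 3

def dfdx(x):
    return 2.0 * (x - 3.0)     # derivative of f

x = 0.0                        # starting guess
for _ in range(100):
    x -= 0.1 * dfdx(x)         # step against the gradient
print(round(x, 4))             # ~3.0, the minimizer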

Related

High-order forward-mode automatic differentiation

For quite some time I have been wondering how automatic differentiation works. However, I am a bit confused about how the forward mode works -- I am not equipped to deal with reverse mode at the moment. I have tried to read the source code of some libraries (mainly autodiff) and read some papers (e.g. FAD) in order to understand how people are doing it, with little success.
My main issue is I don't get how dual numbers are used. For example, let's say we define a class of dual numbers (in C++) that holds two numbers: value and derivative. Then we can overload different mathematical functions and operators in order to define the dual number algebra (as in the complex number case). Then, and this is my problem, no matter what we do, we are only going to get first derivatives (a sketch of one way past this, by nesting duals, appears at the end of this question).
I keep reading about implementation of hyper-dual numbers, which are described as duals that store values, Jacobian, Hessian, etc. If this is true, then if I have a function of 15 variables and I need the third derivative wrt all of them, my computer is going to blow up... Since there are very efficient libraries out there that do such calculations, I am clearly missing something.
I don't have a specific coding question, I would appreciate any input on how forward mode autodiff can be implemented in a practical way.
More info
I have written a basic dual number library in C++, which you can find on GitHub. However, once I finished writing the class and a few function overloads, I gave up due to the problem I describe above (DualNumbers.cpp has several examples, though).
Recently I also started again, this time using expression templates (because I wanted to learn how to use them) -- see GitHub -- but this approach has another issue I describe in another question.
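
To make the dual-number picture above concrete, and to show one way past the "only first derivatives" limitation, here is a rough sketch of nesting duals: a dual number whose components are themselves duals carries the second derivative along automatically. It is written in Python rather than C++ for brevity, and the `Dual` class is purely illustrative:

class Dual:
    """value + derivative; the components may themselves be Dual (nesting)."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule, applied recursively when the components are Duals.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

def f(x):
    return x * x * x

# Nest: the outer dual differentiates the inner dual computation once more.
x = Dual(Dual(4.0, 1.0), Dual(1.0, 0.0))
y = f(x)
print(y.val.val)  # 64.0 = f(4)
print(y.val.der)  # 48.0 = f'(4)
print(y.der.der)  # 24.0 = f''(4)

Nesting n levels deep gives n-th derivatives at the cost of 2^n numbers per quantity, which is one reason practical higher-order forward-mode implementations often prefer truncated Taylor polynomial or hyper-dual representations instead.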

Is there a way to prove a program has no bug?

I was thinking about the fact that we can prove a program has bugs. We can test it to assess that it is more or less bug resistant.
But is there a way (even theoretically) to prove that a program has no bugs?
For simple programs, such as a "Hello World", I guess we should be able to do it.
But what about larger programs ?
There are nowadays many different formalisms that can be used to prove programs correct (e.g., formalizations in proof assistants, dependently typed programming languages, separation logic, ...). As noted by others, there is no automatic way to prove any given program correct (see the halting problem). However, those mentioned formalisms are often applicable to specific programs. (Such an application can be far from automatic and require a tremendous amount of creativity.)
Another very important point is what we actually mean by proving a program correct or as you stated prove that a program has no bug. Even with formal methods there is typically no way to say that nothing whatsoever can go wrong with a program. The reason is that formal methods usually show that a program conforms to a specification.
You can think of a specification as a logical formula (that states some property about a program) and of the correctness proof as a formal proof that a program satisfies this formula (i.e., enjoys the corresponding property). Due to this setup, everything outside the specification is not even "considered" by the proof. So to really show that a program has no bugs you would first have to write down a logical formula that states when a program does not have bugs.
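As a small, hypothetical illustration of that point: suppose the specification for a sorting routine only says "the output is in non-decreasing order", and forgets to require that the output is a permutation of the input. Then even a proof against that specification would not catch the following obviously buggy "sort" (a Python sketch with made-up names):

from collections import Counter

def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

def spec_weak(inp, out):
    # Incomplete specification: the output must be sorted.
    return is_sorted(out)

def spec_full(inp, out):
    # Fuller specification: sorted AND a permutation of the input.
    return is_sorted(out) and Counter(inp) == Counter(out)

def broken_sort(xs):
    return []  # "sorts" by throwing everything away

print(spec_weak([3, 1, 2], broken_sort([3, 1, 2])))  # True  - the weak spec is satisfied
print(spec_full([3, 1, 2], broken_sort([3, 1, 2])))  # False - the fuller spec catches the bug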
So it would maybe be more honest to say that formal methods are often able to prove (without doubt) that a program does not have certain kinds of bugs (depending on the used specification).

Why are "pure" functions called "pure"? [closed]

A pure function is one that has no side effects -- it cannot do any kind of I/O and it cannot modify the state of anything -- and it is referentially transparent -- when called multiple times with the same inputs, it always gives the same outputs.
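For example (a minimal Python illustration; the function names are mine):

total = 0

def add_pure(x, y):
    # Pure: depends only on its arguments and touches nothing else.
    return x + y

def add_impure(x):
    # Impure: reads and modifies global state, so repeated calls with the
    # same argument give different results.
    global total
    total += x
    return total

print(add_pure(2, 3), add_pure(2, 3))  # 5 5  (same inputs, same output)
print(add_impure(2), add_impure(2))    # 2 4  (same input, different outputs)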
Why is the word "pure" used to describe functions with those properties? Who first used the word "pure" in that way, and when? Are there other words that mean roughly the same thing?
To answer your first question, mathematical functions have often been described as "pure" in terms of some specified variables. e.g.:
the first term is a pure function of x and the second term is a pure function of y
Because of this, I don't think you'll find a true "first" occurrence.
For programming languages, a little searching shows that Ada 95 (pragma Pure), High Performance Fortran (1993) (PURE) and VHDL-93 (pure) all contain formal notions of 'pure functions'.
Haskell (1990) is fairly obvious, but purity isn't explicit. GCC's C has various function attributes for various differing levels of 'pure'.
A couple of books: Rationale for the C programming language (1990) uses the term, as does Programming Languages and their Definitions (1984). However, both apparently only use it once! Programming the IBM Personal Computer, Pascal (also 1984) uses the term, but it isn't clear from Google's restricted view whether or not the Pascal compiler had support for it. (I suspect not.)
An interesting note is that Green, the predecessor to Ada, actually had a fairly strict 'function' definition - even memory allocation was disallowed. However, this was dropped before it became Ada, where functions can have side-effects (I/O or global variables), but can't modify their arguments.
C28-6571-3 (the first PL/I reference manual, written before the compiler) shows that PL/I had support for pure functions, in the form of the REDUCIBLE (= pure) attribute, as far back as 1966 - when the compiler was first released. (This also answers your third question.)
This last document specifically notes that it includes REDUCIBLE as a new change since document C28-6571-2. So REDUCIBLE, which is possibly the first incarnation of formal pure functions in programming languages, appeared somewhere between January and July 1966.
Update: The earliest instance of "pure function" on Google Groups in this sense is from 1988, which easily postdates the book references.
A couple of myths:
The term "pure functional" doesn't come from mathematics, where all functions are by nature "pure" and, so, there was never any need to call anything a "pure function".
The term doesn't come from imperative programming. The early imperative programming languages, Fortran, Algol 60, Pascal etc., always had two kinds of abstractions: "functions" that produced results based on their inputs and "procedures" which took some inputs and did an action. It was considered good programming practice for "functions" not to have side effects. There was no need for them to have side effects because one could always use procedures instead.
So, where else could the term "pure functional" have come from? The answer is, sort of, obvious. It came from impure functional programming languages, the foremost among them being Lisp. Lisp was designed sometime between 1958 and 1960 (between the first and second reports of Algol 60, whose design McCarthy was involved in, but didn't feel satisfied with). Lisp's design was based fundamentally on functional programming. However, it also allowed side effects as a pragmatic choice. It did not have a notion of a command or a procedure. So, in Lisp, one mostly wrote "pure functions", but occasionally one wrote "impure functions", i.e., functions with side effects, to get something done. The terms "pure Lisp" or "purely functional subset of Lisp" have been in use for a long time. Slowly, by osmosis, this idea of "purity" has come to invade all our space.
The imperative programming languages could have resisted the trend. But, once C decided to abolish the idea of "procedures" and call them "void functions" instead, they didn't have much of a leg to stand on.
It comes from the mathematical definition of "function", where it is not possible for functions to have side effects.
Why is the word "pure" used to describe functions with those properties?
From Wiktionary > pure # adjective
free of flaws or imperfections; unsullied
free of foreign material or pollutants
free of immoral behavior or qualities; clean
of a branch of science, done for its own sake instead of serving another branch of science.
It should be obvious that the behavior of interacting functions is easiest to reason about when they are influenced only by their inputs, and they themselves influence only their outputs. Therefore it is inevitable that these kinds of functions will be noticed and classified. Now what word could we use to describe a function with such properties? "free of foreign material or pollutants" and "free of immoral behavior or qualities" seem to describe this rather well.
Who first used the word "pure" in that way, and when?
I am much too young to answer this with any degree of confidence. I argue, however, that it was inevitable that the word pure (or some very close synonym) would be used to describe functions that behave in this way.
Are there other words that mean roughly the same thing?
You said it yourself: "referentially transparent". However, you seem to suggest that "referential transparency" encompasses only part of the meaning of the phrase "pure function". I disagree; I feel it is entirely synonymous. From Wikipedia > Referential Transparency:
An expression is said to be referentially transparent if it can be replaced with its value without changing the behavior of a program. (emphasis mine)
The Haskell community sometimes uses the adjective "safe" in a similar manner. (See the Safe library, made to avoid throwing exceptions. Contrast with unsafePerformIO)
I can't think of any other synonyms right now.
The concept of a function originated in mathematics. The mathematical concept of a function is more-or-less a mapping from one set onto another. In this sense it's impossible for functions to have side effects; not because they're "better" that way or because they're specifically defined as to not have side effects, but because the concept of "having side effects" doesn't make any sense with this definition of a function. Mathematical functions aren't a series of steps that execute, so how could any of those steps somehow "affect" other mathematical objects you're talking about?
When people started studying computation, they became interested in machine-implementable algorithms for computing the values of mathematical functions given their inputs. People started talking about computable functions. But functions as implemented in a computer (in imperative languages at least, which are what programmers first worked with) are a series of executable steps, which obviously can have side effects.
So it became natural for programmers to think about functions as algorithms, not as mathematical functions. So then a pure function is one that is purely a mathematical function, to which all the hundreds of years of theory about functions applies, as opposed to the generalised programmer's function, which can't be reasoned about that way.

Is there a Universal Model for languages?

Many programming languages share generic and even fairly universal features. For example, if you compared Java, VB6, .NET, PHP, Python, then you would find common functions such as control structures, numeric and string manipulation, etc.
What has been done to define these features at a meta-language (or language-agnostic) level?
UML offers a descriptive reference of software in every aspect, but the real-world focus seems to be data processes. Is UML relevant?
I'm not asking "Why we don't have a single language that replaces the current plethora." We need many different tools (at least in this eon).
I'm not asking that all languages fit a template -- assembly vs. compiled languages are different enough to make that unfeasible (and some folks call HTML a language, though I wouldn't). Any attempt would start with a properly narrow scope. In line with this, I wouldn't expect the model to cover even a small selection with full validity.
I would expect, however, that such a model could be used to transpose from one language to another (with limited goals -- think gist translation).
There have been many attempts at this, but none have been very successful. The earliest I'm aware of is UNCOL more than 50 years ago.
You've given a list of languages that have a lot in common because they're pretty similar -- they're all procedural languages with common roots and some OO extensions thrown in, so that's not too surprising. If you start looking at different languages like Lisp, Haskell, Erlang, Prolog, or even SQL, you start seeing very different things.
What you're describing sounds like the formal semantics of programming languages. There are a variety of approaches and each will give a way to formally specify the meaning of a program in some programming language. In some cases, this specification is essentially a translation into another language such as lambda calculus, or compilation for a formally specified abstract machine such as SECD.
There is so much work here it's hard to pick a specific reference. But I hope I've given you some useful keywords to continue your search.
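To give a flavour of what such a semantics looks like, here is a toy denotational-style "meaning function" for a miniature expression language, written in Python; the language and all names are invented for illustration:

# Abstract syntax of a tiny language: numbers, variables, addition, let.
# The meaning of a term is defined by structural recursion over its syntax,
# relative to an environment mapping variable names to values.

def meaning(term, env):
    kind = term[0]
    if kind == "num":    # ("num", 3)
        return term[1]
    if kind == "var":    # ("var", "x")
        return env[term[1]]
    if kind == "add":    # ("add", t1, t2)
        return meaning(term[1], env) + meaning(term[2], env)
    if kind == "let":    # ("let", "x", t1, t2)
        return meaning(term[3], {**env, term[1]: meaning(term[2], env)})
    raise ValueError("unknown term: " + kind)

# let x = 2 + 3 in x + x   ==>  10
prog = ("let", "x", ("add", ("num", 2), ("num", 3)),
        ("add", ("var", "x"), ("var", "x")))
print(meaning(prog, {}))  # 10

A real semantics would be written in mathematics or a proof assistant rather than Python, but the shape is the same: meaning is assigned compositionally, by recursion over the syntax.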
UML is typically used to define algorithms/code in simpler terms before moving on to real code.
To answer what I am guessing to be your question: there is already a commonly shared set of language constructs (while, for, if, else, ...). Will this ever be set as a standard, or made into a base library that is used by all languages? No, because the different developers of languages like to do it themselves.
I think the closest you can get to this without loss of generality is a Turing machine, which is not very useful for practical purposes. But if you allow Turing machine languages to be "labeled" and reused, you could build up the concepts you need, working from low- to high-level.
I think that MOF is the universal language.
You can, for example, create UML diagrams from MOF via a UML metamodel. If you save this metamodel information into XMI, then you can store whatever information you need, and even more than in any particular language. XMI's semantics are rich enough that there is practically no limit to its use. If you map UML to XMI on top of a metamodel that is live-synchronized with MOF, then that, for me, is the universal language.
The author of Pattern Calculus seems to propose such a universal model. I expect that it will turn out to be just as useful as previous attempts to define a universal model, that is to say, good in parts but not the last word.

handling amorphous subsystems in formal software design

People like Alexander Stepanov and Sean Parent argue for a formal and abstract approach to software design.
The idea is to break complex systems down into a directed acyclic graph and hide cyclic behaviour in nodes representing that behaviour.
Parent gave presentations at boost-con and Google (slides from boost-con; p. 24 introduces the approach; there is also a video of the Google talk).
While I like the approach and think it's a necessary development, I have a problem imagining how to handle subsystems with amorphous behaviour.
Imagine for example a common pattern for state-machines: using an interface which all states support and having different behaviour in concrete implementations for the states.
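For concreteness, a minimal sketch of that pattern (illustrative names, written in Python rather than any particular language):

from abc import ABC, abstractmethod

class State(ABC):
    """Interface that every concrete state supports."""
    @abstractmethod
    def handle(self, event):
        """Process an event and return the next state."""

class Idle(State):
    def handle(self, event):
        return Running() if event == "start" else self

class Running(State):
    def handle(self, event):
        return Idle() if event == "stop" else self

class Machine:
    def __init__(self):
        self.state = Idle()

    def handle(self, event):
        # The differing behaviour lives in the concrete state classes.
        self.state = self.state.handle(event)

m = Machine()
m.handle("start")
print(type(m.state).__name__)  # Running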
How would one solve that?
Note that I am just looking for an abstract approach.
I can think of hiding that behaviour behind a node and defining different sub-DAGs for the states, but that complicates the design considerably if you want to influence the behaviour of the main DAG from a sub-DAG.
Your question is not clear. Define amorphous subsystems.
You are "just looking for an abstract approach" but then you seem to want details about an implementation in a conventional programming language ("common pattern for state-machines"). So, what are you asking for? How to implement nested finite state-machines?
Some more detail will help the conversation.
For a real abstract approach, look at something like Stream X-Machines:
... The X-machine model is structurally the same as the finite state machine, except that the symbols used to label the machine's transitions denote relations of type X→X. ... The Stream X-Machine differs from Eilenberg's model, in that the fundamental data type X = Out* × Mem × In*, where In* is an input sequence, Out* is an output sequence, and Mem is the (rest of the) memory.

The advantage of this model is that it allows a system to be driven, one step at a time, through its states and transitions, while observing the outputs at each step. These are witness values, that guarantee that particular functions were executed on each step. As a result, complex software systems may be decomposed into a hierarchy of Stream X-Machines, designed in a top-down way and tested in a bottom-up way. This divide-and-conquer approach to design and testing is backed by Florentin Ipate's proof of correct integration, which proves how testing the layered machines independently is equivalent to testing the composed system. ...
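Loosely, and only as an illustration (the names and the particular transition function are invented, not taken from the X-machine literature), the stepwise picture described in the quote could be sketched like this:

# The machine's configuration is a triple (outputs so far, memory, remaining
# inputs); each transition label is a function over such triples.  Driving the
# machine one step consumes an input, updates memory, and emits a witness output.

def add_and_report(out, mem, inp):
    head, rest = inp[0], inp[1:]
    new_mem = mem + head
    return out + [("added", head, new_mem)], new_mem, rest

state = ([], 0, [3, 5])           # (Out*, Mem, In*)
state = add_and_report(*state)    # -> ([('added', 3, 3)], 3, [5])
state = add_and_report(*state)    # -> ([('added', 3, 3), ('added', 5, 8)], 8, [])
print(state[0])                   # the observed outputs witness each step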
But I don't see how the presentation is related to this. He seems to speak about a quite mainstream approach to programming, nothing similar to X-Machines. Anyway, the presentation is quite confusing and I have no time to see the video right now.
First impression of the talk, reading the slides only
The author touches haphazardly on numerous fields/problems/solutions, apparently without recognizing it: from Peopleware (for example Psychology of programming), to Software Engineering (for example software product lines), to various programming techniques.
How the various parts are linked and what exactly he is advocating is not clear at all (I'm accustomed to just reading slides, and they usually follow one another logically):
Dataflow programming?
Constraints solving for User Interfaces? For practical implementations, see Garnet for Common Lisp, Amulet/OpenAmulet for C++.
What advantages does this "new" concept-based generic programming give us with respect to well-known approaches (for example, tools based on Hoare logic pre/post conditions and invariants or, better, Hoare's Communicating Sequential Processes (CSP) or Hehner's Practical Theory of Programming, or some programming language with a sophisticated type system like ATS, Qi or Epigram, and so on)? It seems to me that introducing "concepts" (which, as-is, are specific to C++) is no simpler than using the alternatives. Is it just about jargon and "politics"? (Finally formal methods... but disguised.)
Why organize program modules as a DAG and not as a tree, like David Parnas advocated decades ago in Designing software for ease of extension and contraction? (here a directly accessible .pdf and here slides from a lecture). The work on X-Machines is probably an answer to this question (going even beyond DAGs), but, again, the author seems to speak about a quite conventional program development regime in which Parnas' approach is the only sensible one.
If/when I will see the video I will update this answer.