Why is memoization not a language feature? - language-agnostic

I was wondering: why is memoization not provided natively as a language feature in any language I know about?
Edit: to clarify, what I mean is that the language provides a keyword to specify a given function as memoizable, not that every function is automatically memoized "by default" unless specified otherwise. For example, Fortran provides the keyword PURE to specify a specific function as such. I guess that the compiler can take advantage of this information to memoize the call, but I ignore what happens if you declare PURE a function with side effects.

What YOU want from memoization may not be the same as what the compiler memoization option would provide.
You may know that it is only profitable to memoize the last 10 or so distinct values computed, because you know how the function will be used.
You may know that it only makes sense to memoize the last 2 or 3 values, because you will never use values older than that. (Fibonacci's Sequence comes to mind.)
You may be generating a LOT of values on some runs, and just a few on others.
You may want to "throw away" some of the memoized values and start over. (I memoized a random number generator this way, so I could replay the sequence of random numbers that built a certain structure, while some other parameters of the structure had been changed.)
Memoization as an optimization depends on the search for the memoized value being a lot cheaper than recomputation of the value. This in turn depends on the ordering of the input requests. This has implications for the memoization database: Does it use a stack, an array of all possible input values (which may be very large), a bucket hash, or a b-tree?
The memoizing compiler has to either provide a "one size fits all" memoization, or it has to provide lots of possible alternatives, and parameters to control the alternatives. At some point, it becomes easier for everyone to require the user to provide his own memoization.

Because compilers have to emit semantically correct programs. You can't memoize a function without changing program semantics unless it is referentially transparent. In most programming languages not all functions are referentially transparent (pure functional programming languages are an exception) so you can't memoize everything. But then a mechanism is needed for detecting referential transparency and that is too hard.

In Haskell, memoization is automatic for (pure) functions you've defined that take no arguments. And the Fibonacci example in that Wiki is really about the simplest demonstrable example I would be able to think of either.
Haskell can do this because your pure functions are defined to produce the same results every time; of course, monadic functions that depend on side effects won't be memoized.
I'm not sure what the upper limits are -- obviously, it won't memoize more than the available memory. And I'm also not sure offhand if the memoization occurs at compile-time (if the values can be determined at compile-time), or if it always occurs the first time the function is called.

Clojure has a memoize function (http://richhickey.github.com/clojure/clojure.core-api.html#clojure.core/memoize):
memoize
function
Usage: (memoize f)
Returns a memoized version of a referentially transparent function. The
memoized version of the function keeps a cache of the mapping from arguments
to results and, when calls with the same arguments are repeated often, has
higher performance at the expense of higher memory use.

A) Memoization trades space for time. I imagine that this can turn out to a fairly unbound property, in the sense, that the amount of data programs or libraries would have to store could consume large parts of memory really quick.
For a couple of languages, memoization is easy to implement and easy to customize for the given requirements.
As an example take some natural language processing on large bodies of text, where you don't want to compute basic properties of texts (word count, frequency, cooccurrences, ...) over and over again. In that case a memoization in combination with object serialization can be useful as opposed to memory caching, since you may run your application multiple times on unchanged corpora.
B) Another aspect: It's not true, that all functions or methods yield the same output for a same given input. Anyway some keyword or syntax for memoization would be necessary, along with configuration (memory limits, invalidation policy, ...) ...

Because you shouldn't implement something as a language feature when it can easily be implemented in the language itself. A memoization feature belongs in a library, which is exactly where most languages put it.

Your question also leaves open the solution of your learning more languages. I think that Lisp supports memoization, and I know that Mathematica does.

In order for memoization to work as a language feature there would be a couple requirements.
The compiler would need to be identify valid functions for memoization (e.g. they are referentially transparent).
The run-time would have to be able to intelligently select candidates for memoization without slowing down the overall performance.
There are some assumptions in the other language, but if we can have performance gains by just-in-time compilation of hot-spots in a Java VM, then one can surely write an automated memoziation system.
While non-trivial I think this is all theoretically possible to get performance gains in a language (especially an interpreted one) and is a worthwhile area for research.

Not all the languages natively support function decorators. I guess it would be a more general approach to support rather than supporting just memoization.

Reverse the question. Why it should? As someone has said, it can be put in a library so no need of add syntax to the language, it's only usable on pure functions which are hard to identify automatically(unless you force the programmer to annotate them). It's also very hard to determine if memoization is going to speed up things or not. I don't think it's a desirable feature for a programming language.

I really think such an option should be.
In data processing tasks there is an immutable input data (as time series, for example, where for a given time as soon as a value is known, it can never change). Taking in mind today RAM affordability, if a function result only depends on such immutable data, it is rational to memoize it rather than reread every time it's needed. Currently I have (in Scala and C#) to manually introduce an in-memory storage table and write 3 functions instead of one - one reading a value from file/db/ws, one storing it into an in-memory table, one to wrap them and read from memory if available or call the raw function if not. I think this could and should be implemented as a keyword and done behind the scenes.

Related

Regarding (When Executed) in Haskell IO Monad

I have no problem with the IO Monad. But I want to understand the followings:
In All/almost Haskell tutorials/ text books they keep saying that getChar is not a pure function, because it can give you a different result. My question is: Who said that this is a function in the first place. Unless you give me the implementation of this function, and I study that implementation, I can't guarantee it is pure. So, where is that implementation?
In All/almost Haskell tutorials/ text books, it's said that, say (IO String) is an action that (When executed) it can give you back a value of type String. This is fine, but who/where this execution is taking place. Of course! The computer is doing this execution. This is OK too. but since I am only a beginner, I hope you forgive me to ask, where is the recipe for this "execution". I would guess it is not written in Haskell. Does this later idea mean that, after all, that a Haskell program is converted into a C-like program, which will eventually be converted into Assembly -> Machine code? If so, where one can find the implementation of the IO stuff in Haskell?
Many thanks
Haskell functions are not the same as computations.
A computation is a piece of imperative code (perhaps written in C or Assembler, and then compiled to machine code, directly executable on a processor), that is by nature effectful and even unrestricted in its effects. That is, once it is ran, a computation may access and alter any memory and perform any operations, such as interacting with keyboard and screen, or even launching missiles.
By contrast, a function in a pure language, such as Haskell, is unable to alter arbitrary memory and launch missiles. It can only alter its own personal section of memory and return a result that is specified in its type.
So, in a sense, Haskell is a language that cannot do anything. Haskell is useless. This was a major problem during the 1990's, until IO was integrated into Haskell.
Now, an IO a value is a link to a separately prepared computation that will, eventually, hopefully, produce a. You will not be able to create an IO a out of pure Haskell functions. All the IO primitives are designed separately, and packaged into GHC. You can then compose these simple computations into less trivial ones, and eventually your program may have any effects you may wish.
One point, though: pure functions are separate from each other, they can only influence each other if you use them together. Computations, on the other hand, may interact with each other freely (as I said, they can generally do anything), and therefore can (and do) accidentally break each other. That's why there are so many bugs in software written in imperative languages! So, in Haskell, computations are kept in IO.
I hope this dispels at least some of your confusion.

CUDA intrinsic function advantage

Similar to this question, is there any advantage of using the intrinsics (single, double or half) in the CUDA Math API. I understand some have faster (less accurate) versions such as __fdivdef and these can always be used with -use_fast_math, however what about the other functions. For example, why would one use __fadd_rd(A,B) instead of A+B or __fmaf_rd(A,B,C) instead of A+B+C? One reason I can think of is that one can choose the rounding method more conveniently - fine.
Also some functions, for example __fmul_rd "will never be merged into a single multiply-add instruction" (according to the CUDA Math API documentation). Why would this be advantageous?
The really short answer is that using something like __fmul_rd is never "advantageous", but sometimes use floating point instructions with clearly defined and fully predictable (or standardised) rounding and compilation behaviour is required to make calculations work correctly. This, for example.
The general rule is that if you don't understand why those floating point intrinsic functions exist, you shouldn't use them.
Intrinsics give you finer control over exactly what operations your inner loop is going to do. If I call __fmaf_rd, I am virtually certain that the emitted PTX is going to have an fma.rd instruction without having to resort to writing inline assembly code.
I will therefore have no worries that the compiler might optimize the loop differently than I want*, or that there might be some subtlety of the standards I'm overlooking that requires the compiler to implement something more complicated than I thought I wrote.
Naturally, this is only a good motivation if I really know what I'm doing in this regard, but if I do, it's there for me to use. And being an intrinsic is superior to inline assembly, because the compiler actually understands the instruction.
*: You can't understand how frustrating it is when you know the best way to implement the loop, but the compiler keeps "optimizing" to something less efficient

Using functions in VHDL for synthesis

I do use functions in VHDL now and then, mostly in testbenches and seldom in synthesized projects, and I'm quite happy with that.
However, I was wondering if for projects that will be synthesized, it really is a smart move (in terms of LE use mostly?) I've read quite a lot of things about that online, however I can't find anything satisfying.
For instance, I've read something like that : "The function is synthesized each time it's called !!". Is it really so? (I thought of it more like a component instantiated once but whose inputs and output and accessed from various places in the design but I guess that may be incorrect).
In the case of a once-used function, what would change between that and writing the VHDL directly in the process for example? (In terms of LE use?).
A circuit in hardware, for example a FPGA, executes everywhere all the time, where in compare a program for an CPU executes only one place at a time. This allows a program on a CPU to reuse program code for different data, where a circuit in hardware must have sufficient resources to process all the data all the time.
So a circuit written in VHDL is generally translated by the synthesis tool as massive parallel construction that allows concurrent operation of all of the design all the time. The VHDL language is created with the purpose of concurrent execution, and this is a major different from ordinary programming languages.
As a consequence, a design that implements an algorithm with functions vs. a design that implements the same algorithm with separate logic, will have the exact same size and speed since the synthesis tool will expand the functions to dedicated logic in order to make the required hardware available.
That being said, it is possible to reuse the same hardware for different data, but the designer must generally explicitly create the design to support this, and thereby interleave different data sets when timing allows it.
And finally, as scary_jeff also points out, it is a smart move to use functions since there is nothing to loose in terms of size or speed, but all the advantages of creating a manageable design. But be aware, that functions can't contain state, so it is only possible to create functions for combinatorial logic between flip-flops, which usually limits the possible complexity in order to meet timing.
Yes, you should use functions and procedures.
Many people and companies use functions and procedures in synthesizable code. Some coding styles disallow functions for no good reason. If you feel uncertain about a certain construct in VHDL (in this case: functions), just type up a small example and inspect the synthesis result.
Functions are really powerful and they can help you create better hardware with less effort. As with all powerful things, you can create really bad code (and bad synthesis results) with functions too.

Where does the compiler spend most of its time during parsing?

I read in Sebesta book, that the compiler spends most of its time in lexing source code. So, optimizing the lexer is a necessity, unlike the syntax analyzer.
If this is true, why lexical analysis stage takes so much time compared to syntax analysis in general ?
I mean by syntax analysis the the derivation process.
First, I don't think it actually is true: in many compilers, most time is not spend in lexing source code. For example, in C++ compilers (e.g. g++), most time is spend in semantic analysis, in particular in overload resolution (trying to find out what implicit template instantiations to perform). Also, in C and C++, most time is often spend in optimization (creating graph representations of individual functions or the whole translation unit, and then running long algorithms on these graphs).
When comparing lexical and syntactical analysis, it may indeed be the case that lexical analysis is more expensive. This is because both use state machines, i.e. there is a fixed number of actions per element, but the number of elements is much larger in lexical analysis (characters) than in syntactical analysis (tokens).
Lexical analysis is the process whereby all the characters in the source code are converted to tokens. For instance
foreach (x in o)
is read character by character - "f", "o", etc.
The lexical analyser must determine the keywords being seen ("foreach", not "for" and so on.)
By the time syntactic analysis occurs the program code is "just" a series of tokens. That said, I agree with the answer above that lexical analysis is not necessarily the most time-consuming process, just that it has the biggest stream to work with.
It depends really where you draw the line between lexing and parsing. I tend to have a very limited view of what a token is, and as a result my parsers spend a lot more time on parsing than on lexing, not because they are faster, but because they simply do less.
It certainly used to be the case that lexing was expensive. Part of that had to do with limited memory and doing multiple file operations to read in bits of program. Now that memory is measured in GB this is no longer an issue and for the same reason a lot more work can be done, so optimization is more important. Of course, whether the optimization helps much is another question.

What is "Orthogonality"?

What does "orthogonality" mean when talking about programming languages?
What are some examples of Orthogonality?
Orthogonality is the property that means "Changing A does not change B". An example of an orthogonal system would be a radio, where changing the station does not change the volume and vice-versa.
A non-orthogonal system would be like a helicopter where changing the speed can change the direction.
In programming languages this means that when you execute an instruction, nothing but that instruction happens (which is very important for debugging).
There is also a specific meaning when referring to instruction sets.
From Eric S. Raymond's "Art of UNIX programming"
Orthogonality is one of the most important properties that can help make even complex designs compact. In a purely orthogonal design, operations do not have side effects; each action (whether it's an API call, a macro invocation, or a language operation) changes just one thing without affecting others. There is one and only one way to change each property of whatever system you are controlling.
Think of it has being able to change one thing without having an unseen affect on another part.
Broadly, orthogonality is a relationship between two things such that they have minimal effect on each other.
The term comes from mathematics, where two vectors are orthogonal if they intersect at right angles.
Think about a typical 2 dimensional cartesian space (your typical grid with X/Y axes). Plot two lines: x=1 and y=1. The two lines are orthogonal. You can change x=1 by changing x, and this will have no effect on the other line, and vice versa.
In software, the term can be appropriately used in situations where you're talking about two parts of a system which behave independently of each other.
If you have a set of constructs. A langauge is said to be orthogonal if it allows the programmer to mix these constructs freely. For example, in C you can't return an array(static array), C is said to be unorthognal in this case:
int[] fun(); // you can't return a static array.
// Of course you can return a pointer, but the langauge allows passing arrays.
// So, it is unorthognal in case.
Most of the answers are very long-winded, and even obscure. The point is: if a tool is orthogonal, it can be added, replaced, or removed, in favor of better tools, without screwing everything else up.
It's the difference between a carpenter having a hammer and a saw, which can be used for hammering or sawing, or having some new-fangled hammer/saw combo, which is designed to saw wood, then hammer it together. Either will work for sawing and then hammering together, but if you get some task that requires sawing, but not hammering, then only the orthogonal tools will work. Likewise, if you need to screw instead of hammering, you won't need to throw away your saw, if it's orthogonal (not mixed up with) your hammer.
The classic example is unix command line tools: you have one tool for getting the contents of a disk (dd), another for filtering lines from the file (grep), another for writing those lines to a file (cat), etc. These can all be mixed and matched at will.
While talking about project decisions on programming languages, orthogonality may be seen as how easy is for you to predict other things about that language for what you've seen in the past.
For instance, in one language you can have:
str.split
for splitting a string and
len(str)
for getting the lenght.
On a language more orthogonal, you would always use str.x or x(str).
When you would clone an object or do anything else, you would know whether to use
clone(obj)
or
obj.clone
That's one of the main points on programming languages being orthogonal. That avoids you to consult the manual or ask someone.
The wikipedia article talks more about orthogonality on complex designs or low level languages.
As someone suggested above on a comment, the Sebesta book talks cleanly about orthogonality.
If I would use only one sentence, I would say that a programming language is orthogonal when its unknown parts act as expected based on what you've seen.
Or... no surprises.
;)
From Robert W. Sebesta's "Concepts of Programming Languages":
As examples of the lack of orthogonality in a high-level language,
consider the following rules and exceptions in C. Although C has two
kinds of structured data types, arrays and records (structs), records
can be returned from functions but arrays cannot. A member of a
structure can be any data type except void or a structure of the same
type. An array element can be any data type except void or a function.
Parameters are passed by value, unless they are arrays, in which case
they are, in effect, passed by reference (because the appearance of an
array name without a subscript in a C program is interpreted to be
the address of the array’s first element)
from wikipedia:
Computer science
Orthogonality is a system design property facilitating feasibility and compactness of complex designs. Orthogonality guarantees that modifying the technical effect produced by a component of a system neither creates nor propagates side effects to other components of the system. The emergent behavior of a system consisting of components should be controlled strictly by formal definitions of its logic and not by side effects resulting from poor integration, i.e. non-orthogonal design of modules and interfaces. Orthogonality reduces testing and development time because it is easier to verify designs that neither cause side effects nor depend on them.
For example, a car has orthogonal components and controls (e.g. accelerating the vehicle does not influence anything else but the components involved exclusively with the acceleration function). On the other hand, a non-orthogonal design might have its steering influence its braking (e.g. electronic stability control), or its speed tweak its suspension.1 Consequently, this usage is seen to be derived from the use of orthogonal in mathematics: One may project a vector onto a subspace by projecting it onto each member of a set of basis vectors separately and adding the projections if and only if the basis vectors are mutually orthogonal.
An instruction set is said to be orthogonal if any instruction can use any register in any addressing mode. This terminology results from considering an instruction as a vector whose components are the instruction fields. One field identifies the registers to be operated upon, and another specifies the addressing mode. An orthogonal instruction set uniquely encodes all combinations of registers and addressing modes.
From Wikipedia:
Orthogonality is a system design
property facilitating feasibility and
compactness of complex designs.
Orthogonality guarantees that
modifying the technical effect
produced by a component of a system
neither creates nor propagates side
effects to other components of the
system. The emergent behavior of a
system consisting of components should
be controlled strictly by formal
definitions of its logic and not by
side effects resulting from poor
integration, i.e. non-orthogonal
design of modules and interfaces.
Orthogonality reduces testing and
development time because it is easier
to verify designs that neither cause
side effects nor depend on them.
For example, a car has orthogonal
components and controls (e.g.
accelerating the vehicle does not
influence anything else but the
components involved exclusively with
the acceleration function). On the
other hand, a non-orthogonal design
might have its steering influence its
braking (e.g. electronic stability
control), or its speed tweak its
suspension.[1] Consequently, this
usage is seen to be derived from the
use of orthogonal in mathematics: One
may project a vector onto a subspace
by projecting it onto each member of a
set of basis vectors separately and
adding the projections if and only if
the basis vectors are mutually
orthogonal.
An instruction set is said to be
orthogonal if any instruction can use
any register in any addressing mode.
This terminology results from
considering an instruction as a vector
whose components are the instruction
fields. One field identifies the
registers to be operated upon, and
another specifies the addressing mode.
An orthogonal instruction set uniquely
encodes all combinations of registers
and addressing modes.
To put it in the simplest terms possible, two things are orthogonal if changing one has no effect upon the other.
Orthogonality means the degree to which language consists of a set of independent primitive constructs that can be combined as necessary to express a program.
Features are orthogonal if there are no restrictions on how they may be combined
Example : non-orthogonality
PASCAL: functions can't return structured types.
Functional Languages are highly orthogonal.
Real life examples of orthogonality in programming languages
There are a lot of answers already that explain what orthogonality generally is while specifying some made up examples. E.g. this answer explains it well. I wanted to provide (and gather) some real life examples of orthogonal or non-orthogonal features in programming languages:
Orthogonal: C++20 Modules and Namespaces
On the cppreference-page about the new Modules system in c++20 is written:
Modules are orthogonal to namespaces
In this case they write that modules are orthogonal to namespaces because a statement like import foo will not import the module-namespace related to foo:
import foo; // foo exports foo::bar()
bar (); // Error
foo::bar (); // Ok
using namespace foo;
bar (); // Ok
(adapted from modules-cppcon2017 slide 9)
In programming languages a programming language feature is said to be orthogonal if it is bounded with no restrictions (or exceptions).
For example, in Pascal functions can't return structured types. This is a restriction on returning values from a function. Therefore we it is considered as a non-orthogonal feature. ;)
Orthogonality in Programming:
Orthogonality is an important concept, addressing how a relatively small number of components can be combined in a relatively small number of ways to get the desired results. It is associated with simplicity; the more orthogonal the design, the fewer exceptions. This makes it easier to learn, read and write programs in a programming language. The meaning of an orthogonal feature is independent of context; the key parameters are symmetry and consistency (for example, a pointer is an orthogonal concept).
from Wikipedia
Orthogonality in a programming language means that a relatively small set of
primitive constructs can be combined in a relatively small number of ways to
build the control and data structures of the language. Furthermore, every pos-
sible combination of primitives is legal and meaningful. For example, consider data types. Suppose a language has four primitive data types (integer, float,
double, and character) and two type operators (array and pointer). If the two
type operators can be applied to themselves and the four primitive data types,
a large number of data structures can be defined.
The meaning of an orthogonal language feature is independent of the
context of its appearance in a program. (the word orthogonal comes from the
mathematical concept of orthogonal vectors, which are independent of each
other.) Orthogonality follows from a symmetry of relationships among primi-
tives. A lack of orthogonality leads to exceptions to the rules of the language.
For example, in a programming language that supports pointers, it should be
possible to define a pointer to point to any specific type defined in the language.
However, if pointers are not allowed to point to arrays, many potentially useful user-defined data structures cannot be defined.
We can illustrate the use of orthogonality as a design concept by compar-
ing one aspect of the assembly languages of the IBM mainframe computers
and the VAX series of minicomputers. We consider a single simple situation:
adding two 32-bit integer values that reside in either memory or registers and
replacing one of the two values with the sum. The IBM mainframes have two
instructions for this purpose, which have the forms
A Reg1, memory_cell
AR Reg1, Reg2
where Reg1 and Reg2 represent registers. The semantics of these are
Reg1 ← contents(Reg1) + contents(memory_cell)
Reg1 ← contents(Reg1) + contents(Reg2)
The VAX addition instruction for 32-bit integer values is
ADDL operand_1, operand_2
whose semantics is
operand_2 ← contents(operand_1) + contents(operand_2)
In this case, either operand can be a register or a memory cell.
The VAX instruction design is orthogonal in that a single instruction can
use either registers or memory cells as the operands. There are two ways to
specify operands, which can be combined in all possible ways. The IBM design
is not orthogonal. Only two out of four operand combinations possibilities are
legal, and the two require different instructions, A and AR . The IBM design
is more restricted and therefore less writable. For example, you cannot add
two values and store the sum in a memory location. Furthermore, the IBM
design is more difficult to learn because of the restrictions and the additional instruction.
Orthogonality is closely related to simplicity: The more orthogonal the
design of a language, the fewer exceptions the language rules require. Fewer
exceptions mean a higher degree of regularity in the design, which makes the
language easier to learn, read, and understand. Anyone who has learned a sig-
nificant part of the English language can testify to the difficulty of learning its
many rule exceptions (for example, i before e except after c).
The basic idea of orthogonality is that things that are not related conceptually should not be related in the system. Parts of the architecture that really have nothing to do with the other, such as the database and the UI, should not need to be changed together. A change to one should not cause a change to the other.
Orthogonality is the idea that things that are not related conceptually should not be related in the system so parts of the architecture that have nothing to do with each other, like the database and UI should not be changed together. A change to one part of your system should not cause the change to the other.
If for example, you change a few lines on the screen and cause a change in the database schema, this is called coupling. You usually want to minimize coupling between things that are mostly unrelated because it can grow and the system can become a nightmare to maintain in the long run.
From Michael C. Feathers' book "Working Effectively With Legacy Code":
If you want to change existing behavior in your code and there is exactly one place you have to go to make that change, you've got orthogonality.