What is "Orthogonality"? - language-agnostic

What is "Orthogonality"? - language-agnostic

What does "orthogonality" mean when talking about programming languages?
What are some examples of Orthogonality?

Orthogonality is the property that means "Changing A does not change B". An example of an orthogonal system would be a radio, where changing the station does not change the volume and vice-versa.
A non-orthogonal system would be like a helicopter where changing the speed can change the direction.
In programming languages this means that when you execute an instruction, nothing but that instruction happens (which is very important for debugging).
There is also a specific meaning when referring to instruction sets.

From Eric S. Raymond's "Art of UNIX programming"
Orthogonality is one of the most important properties that can help make even complex designs compact. In a purely orthogonal design, operations do not have side effects; each action (whether it's an API call, a macro invocation, or a language operation) changes just one thing without affecting others. There is one and only one way to change each property of whatever system you are controlling.

Think of it has being able to change one thing without having an unseen affect on another part.

Broadly, orthogonality is a relationship between two things such that they have minimal effect on each other.
The term comes from mathematics, where two vectors are orthogonal if they intersect at right angles.
Think about a typical 2 dimensional cartesian space (your typical grid with X/Y axes). Plot two lines: x=1 and y=1. The two lines are orthogonal. You can change x=1 by changing x, and this will have no effect on the other line, and vice versa.
In software, the term can be appropriately used in situations where you're talking about two parts of a system which behave independently of each other.

If you have a set of constructs. A langauge is said to be orthogonal if it allows the programmer to mix these constructs freely. For example, in C you can't return an array(static array), C is said to be unorthognal in this case:
int[] fun(); // you can't return a static array.
// Of course you can return a pointer, but the langauge allows passing arrays.
// So, it is unorthognal in case.

Most of the answers are very long-winded, and even obscure. The point is: if a tool is orthogonal, it can be added, replaced, or removed, in favor of better tools, without screwing everything else up.
It's the difference between a carpenter having a hammer and a saw, which can be used for hammering or sawing, or having some new-fangled hammer/saw combo, which is designed to saw wood, then hammer it together. Either will work for sawing and then hammering together, but if you get some task that requires sawing, but not hammering, then only the orthogonal tools will work. Likewise, if you need to screw instead of hammering, you won't need to throw away your saw, if it's orthogonal (not mixed up with) your hammer.
The classic example is unix command line tools: you have one tool for getting the contents of a disk (dd), another for filtering lines from the file (grep), another for writing those lines to a file (cat), etc. These can all be mixed and matched at will.

While talking about project decisions on programming languages, orthogonality may be seen as how easy is for you to predict other things about that language for what you've seen in the past.
For instance, in one language you can have:
str.split
for splitting a string and
len(str)
for getting the lenght.
On a language more orthogonal, you would always use str.x or x(str).
When you would clone an object or do anything else, you would know whether to use
clone(obj)
or
obj.clone
That's one of the main points on programming languages being orthogonal. That avoids you to consult the manual or ask someone.
The wikipedia article talks more about orthogonality on complex designs or low level languages.
As someone suggested above on a comment, the Sebesta book talks cleanly about orthogonality.
If I would use only one sentence, I would say that a programming language is orthogonal when its unknown parts act as expected based on what you've seen.
Or... no surprises.
;)

From Robert W. Sebesta's "Concepts of Programming Languages":
As examples of the lack of orthogonality in a high-level language,
consider the following rules and exceptions in C. Although C has two
kinds of structured data types, arrays and records (structs), records
can be returned from functions but arrays cannot. A member of a
structure can be any data type except void or a structure of the same
type. An array element can be any data type except void or a function.
Parameters are passed by value, unless they are arrays, in which case
they are, in effect, passed by reference (because the appearance of an
array name without a subscript in a C program is interpreted to be
the address of the array’s first element)

from wikipedia:
Computer science
Orthogonality is a system design property facilitating feasibility and compactness of complex designs. Orthogonality guarantees that modifying the technical effect produced by a component of a system neither creates nor propagates side effects to other components of the system. The emergent behavior of a system consisting of components should be controlled strictly by formal definitions of its logic and not by side effects resulting from poor integration, i.e. non-orthogonal design of modules and interfaces. Orthogonality reduces testing and development time because it is easier to verify designs that neither cause side effects nor depend on them.
For example, a car has orthogonal components and controls (e.g. accelerating the vehicle does not influence anything else but the components involved exclusively with the acceleration function). On the other hand, a non-orthogonal design might have its steering influence its braking (e.g. electronic stability control), or its speed tweak its suspension.1 Consequently, this usage is seen to be derived from the use of orthogonal in mathematics: One may project a vector onto a subspace by projecting it onto each member of a set of basis vectors separately and adding the projections if and only if the basis vectors are mutually orthogonal.
An instruction set is said to be orthogonal if any instruction can use any register in any addressing mode. This terminology results from considering an instruction as a vector whose components are the instruction fields. One field identifies the registers to be operated upon, and another specifies the addressing mode. An orthogonal instruction set uniquely encodes all combinations of registers and addressing modes.

From Wikipedia:
Orthogonality is a system design
property facilitating feasibility and
compactness of complex designs.
Orthogonality guarantees that
modifying the technical effect
produced by a component of a system
neither creates nor propagates side
effects to other components of the
system. The emergent behavior of a
system consisting of components should
be controlled strictly by formal
definitions of its logic and not by
side effects resulting from poor
integration, i.e. non-orthogonal
design of modules and interfaces.
Orthogonality reduces testing and
development time because it is easier
to verify designs that neither cause
side effects nor depend on them.
For example, a car has orthogonal
components and controls (e.g.
accelerating the vehicle does not
influence anything else but the
components involved exclusively with
the acceleration function). On the
other hand, a non-orthogonal design
might have its steering influence its
braking (e.g. electronic stability
control), or its speed tweak its
suspension.[1] Consequently, this
usage is seen to be derived from the
use of orthogonal in mathematics: One
may project a vector onto a subspace
by projecting it onto each member of a
set of basis vectors separately and
adding the projections if and only if
the basis vectors are mutually
orthogonal.
An instruction set is said to be
orthogonal if any instruction can use
any register in any addressing mode.
This terminology results from
considering an instruction as a vector
whose components are the instruction
fields. One field identifies the
registers to be operated upon, and
another specifies the addressing mode.
An orthogonal instruction set uniquely
encodes all combinations of registers
and addressing modes.
To put it in the simplest terms possible, two things are orthogonal if changing one has no effect upon the other.

Orthogonality means the degree to which language consists of a set of independent primitive constructs that can be combined as necessary to express a program.
Features are orthogonal if there are no restrictions on how they may be combined
Example : non-orthogonality
PASCAL: functions can't return structured types.
Functional Languages are highly orthogonal.

Real life examples of orthogonality in programming languages
There are a lot of answers already that explain what orthogonality generally is while specifying some made up examples. E.g. this answer explains it well. I wanted to provide (and gather) some real life examples of orthogonal or non-orthogonal features in programming languages:
Orthogonal: C++20 Modules and Namespaces
On the cppreference-page about the new Modules system in c++20 is written:
Modules are orthogonal to namespaces
In this case they write that modules are orthogonal to namespaces because a statement like import foo will not import the module-namespace related to foo:
import foo; // foo exports foo::bar()
bar (); // Error
foo::bar (); // Ok
using namespace foo;
bar (); // Ok
(adapted from modules-cppcon2017 slide 9)

In programming languages a programming language feature is said to be orthogonal if it is bounded with no restrictions (or exceptions).
For example, in Pascal functions can't return structured types. This is a restriction on returning values from a function. Therefore we it is considered as a non-orthogonal feature. ;)

Orthogonality in Programming:
Orthogonality is an important concept, addressing how a relatively small number of components can be combined in a relatively small number of ways to get the desired results. It is associated with simplicity; the more orthogonal the design, the fewer exceptions. This makes it easier to learn, read and write programs in a programming language. The meaning of an orthogonal feature is independent of context; the key parameters are symmetry and consistency (for example, a pointer is an orthogonal concept).
from Wikipedia

Orthogonality in a programming language means that a relatively small set of
primitive constructs can be combined in a relatively small number of ways to
build the control and data structures of the language. Furthermore, every pos-
sible combination of primitives is legal and meaningful. For example, consider data types. Suppose a language has four primitive data types (integer, float,
double, and character) and two type operators (array and pointer). If the two
type operators can be applied to themselves and the four primitive data types,
a large number of data structures can be defined.
The meaning of an orthogonal language feature is independent of the
context of its appearance in a program. (the word orthogonal comes from the
mathematical concept of orthogonal vectors, which are independent of each
other.) Orthogonality follows from a symmetry of relationships among primi-
tives. A lack of orthogonality leads to exceptions to the rules of the language.
For example, in a programming language that supports pointers, it should be
possible to define a pointer to point to any specific type defined in the language.
However, if pointers are not allowed to point to arrays, many potentially useful user-defined data structures cannot be defined.
We can illustrate the use of orthogonality as a design concept by compar-
ing one aspect of the assembly languages of the IBM mainframe computers
and the VAX series of minicomputers. We consider a single simple situation:
adding two 32-bit integer values that reside in either memory or registers and
replacing one of the two values with the sum. The IBM mainframes have two
instructions for this purpose, which have the forms
A Reg1, memory_cell
AR Reg1, Reg2
where Reg1 and Reg2 represent registers. The semantics of these are
Reg1 ← contents(Reg1) + contents(memory_cell)
Reg1 ← contents(Reg1) + contents(Reg2)
The VAX addition instruction for 32-bit integer values is
ADDL operand_1, operand_2
whose semantics is
operand_2 ← contents(operand_1) + contents(operand_2)
In this case, either operand can be a register or a memory cell.
The VAX instruction design is orthogonal in that a single instruction can
use either registers or memory cells as the operands. There are two ways to
specify operands, which can be combined in all possible ways. The IBM design
is not orthogonal. Only two out of four operand combinations possibilities are
legal, and the two require different instructions, A and AR . The IBM design
is more restricted and therefore less writable. For example, you cannot add
two values and store the sum in a memory location. Furthermore, the IBM
design is more difficult to learn because of the restrictions and the additional instruction.
Orthogonality is closely related to simplicity: The more orthogonal the
design of a language, the fewer exceptions the language rules require. Fewer
exceptions mean a higher degree of regularity in the design, which makes the
language easier to learn, read, and understand. Anyone who has learned a sig-
nificant part of the English language can testify to the difficulty of learning its
many rule exceptions (for example, i before e except after c).

The basic idea of orthogonality is that things that are not related conceptually should not be related in the system. Parts of the architecture that really have nothing to do with the other, such as the database and the UI, should not need to be changed together. A change to one should not cause a change to the other.

Orthogonality is the idea that things that are not related conceptually should not be related in the system so parts of the architecture that have nothing to do with each other, like the database and UI should not be changed together. A change to one part of your system should not cause the change to the other.
If for example, you change a few lines on the screen and cause a change in the database schema, this is called coupling. You usually want to minimize coupling between things that are mostly unrelated because it can grow and the system can become a nightmare to maintain in the long run.

From Michael C. Feathers' book "Working Effectively With Legacy Code":
If you want to change existing behavior in your code and there is exactly one place you have to go to make that change, you've got orthogonality.

Related

Using functions in VHDL for synthesis

I do use functions in VHDL now and then, mostly in testbenches and seldom in synthesized projects, and I'm quite happy with that.
However, I was wondering if for projects that will be synthesized, it really is a smart move (in terms of LE use mostly?) I've read quite a lot of things about that online, however I can't find anything satisfying.
For instance, I've read something like that : "The function is synthesized each time it's called !!". Is it really so? (I thought of it more like a component instantiated once but whose inputs and output and accessed from various places in the design but I guess that may be incorrect).
In the case of a once-used function, what would change between that and writing the VHDL directly in the process for example? (In terms of LE use?).

A circuit in hardware, for example a FPGA, executes everywhere all the time, where in compare a program for an CPU executes only one place at a time. This allows a program on a CPU to reuse program code for different data, where a circuit in hardware must have sufficient resources to process all the data all the time.
So a circuit written in VHDL is generally translated by the synthesis tool as massive parallel construction that allows concurrent operation of all of the design all the time. The VHDL language is created with the purpose of concurrent execution, and this is a major different from ordinary programming languages.
As a consequence, a design that implements an algorithm with functions vs. a design that implements the same algorithm with separate logic, will have the exact same size and speed since the synthesis tool will expand the functions to dedicated logic in order to make the required hardware available.
That being said, it is possible to reuse the same hardware for different data, but the designer must generally explicitly create the design to support this, and thereby interleave different data sets when timing allows it.
And finally, as scary_jeff also points out, it is a smart move to use functions since there is nothing to loose in terms of size or speed, but all the advantages of creating a manageable design. But be aware, that functions can't contain state, so it is only possible to create functions for combinatorial logic between flip-flops, which usually limits the possible complexity in order to meet timing.

Yes, you should use functions and procedures.
Many people and companies use functions and procedures in synthesizable code. Some coding styles disallow functions for no good reason. If you feel uncertain about a certain construct in VHDL (in this case: functions), just type up a small example and inspect the synthesis result.
Functions are really powerful and they can help you create better hardware with less effort. As with all powerful things, you can create really bad code (and bad synthesis results) with functions too.

CUDA - Use the CURAND Library for Dummies

I was reading the CURAND Library API and I am a newbie in CUDA and I wanted to see if someone could actually show me a simple code that uses the CURAND Library to generate random numbers. I am looking into generating a large amount of number to use with Discrete Event Simulation. My task is just to develop the algorithms to use GPGPU's to speed up the random number generation. I have implemented the LCG, Multiplicative, and Fibonacci methods in standard C Language Programming. However I want to "port" those codes into CUDA and take advantage of threads and blocks to speed up the process of generating random numbers.
Link 1: http://adnanboz.wordpress.com/tag/nvidia-curand/
That person has two of the methods I will need (LCG and Mersenne Twister) but the codes do not provide much detail. I was wondering if anyone could expand on those initial implementations to actually point me in the right direction on how to use them properly.
Thanks!

Your question is misleading - you say "Use the cuRAND Library for Dummies" but you don't actually want to use cuRAND. If I understand correctly, you actually want to implement your own RNG from scratch rather than use the optimised RNGs available in cuRAND.
First recommendation is to revisit your decision to use your own RNG, why not use cuRAND? If the statistical properties are suitable for your application then you would be much better off using cuRAND in the knowledge that it is tuned for all generations of the GPU. It includes Marsaglia's XORWOW, l'Ecuyer's MRG32k3a, and the MTGP32 Mersenne Twister (as well as Sobol' for Quasi-RNG).
You could also look at Thrust, which has some simple RNGs, for an example see the Monte Carlo sample.
If you really need to create your own generator, then there's some useful techniques in GPU Computing Gems (Emerald Edition, Chapter 16: Parallelization Techniques for Random Number Generators).
As a side note, remember that while a simple LCG is fast and easy to skip-ahead, they typically have fairly poor statistical properties especially when using large quantities of draws. When you say you will need "Mersenne Twister" I assume you mean MT19937. The referenced Gems book talks about parallelising MT19937 but the original developers created the MTGP generators (also referenced above) since MT19937 is fairly complex to implement skip-ahead.
Also as another side note, just using a different seed to achieve parallelisation is usually a bad idea, statistically you are not assured of the independence. You either need to skip-ahead or leap-frog, or else use some other technique (e.g. DCMT) for ensuring there is no correlation between sequences.

Can these sorts of programs exist in every Turing-complete language?

In every Turing-Complete language, is it possible to create a working
Compiler for itself which first runs on an interpreter written in some other language and then compiles it's own source code? (Bootstrapping)
Standards-Compilant C++ compiler which outputs binaries for, e.g.: Windows?
Regex Parser and Evaluater?
World of Warcraft clone? (Assuming the language gets the necessary API bindings as, for example, OpenGL and the WoW source code is available)
(Everything here theoretical)
Let's take Brainf*ck as an example language.

In every Turing-Complete language, is it possible to create a working...
If one Turing-complete language can do it, then they all can. In this sense, they're all equally "powerful". Since everything you described already exists in at least one Turing-complete language, any of these programs can be written in any other Turing-complete language.
However, merely because something is possible doesn't mean it's easy, or even feasible. That's a hugely important distinction, and it's the crux of why different programming languages exist. They're not all equally good at making specific kinds of software -- if they were, we'd only need one language!

Turing-Complete only express computation capability, nothing about I/O capability !

No, Turing completeness have nothing to do with I/O and hardwares.
However, you can pretend I/O, hardware systems and graphic systems existed by using variables (or the "memory tape"). In BF, you could use the first 2 cells (x, y) for the "pretended" screen resolution, then another x times y cells for all pixels on the screen, then next cell (n) for the "pretended" filesystem size, then next n cells for the filesystem content...

Yes, of course, all of those. That's what "Turing-complete" means, after all: it can compute everything that can be computed.

All that being Turing-complete really requires is that you can do simple math, have some variables, and can do a while loop. Or any number of equivalent things. If you want to do real programs, you need a bit more (notably syscalls) and you have to worry about efficiency too (turing machines can be very slow...) In theory there's no difference between turing-equivalent systems, but in practice there is.
If anyone does a WoW client in BF, I will be very impressed!

Any algorithm that can be implemented in one Turing-Complete language can be implemented in any other. Your questions have more to do with the operating system services and APIs which must be made available through the language in question.
In short, the answer is yes to all of the above from a formal language point of view.

Theoretical, yes. But a much more interesting question is if it would be practically possible given a certain "esoteric" programming language.

Every Turing-Complete language could compute the same set of functions.
So, a turing-complete language could do all what you wrote because that things are computed with other turing-complete languages.

Why is memoization not a language feature?

I was wondering: why is memoization not provided natively as a language feature in any language I know about?
Edit: to clarify, what I mean is that the language provides a keyword to specify a given function as memoizable, not that every function is automatically memoized "by default" unless specified otherwise. For example, Fortran provides the keyword PURE to specify a specific function as such. I guess that the compiler can take advantage of this information to memoize the call, but I ignore what happens if you declare PURE a function with side effects.

What YOU want from memoization may not be the same as what the compiler memoization option would provide.
You may know that it is only profitable to memoize the last 10 or so distinct values computed, because you know how the function will be used.
You may know that it only makes sense to memoize the last 2 or 3 values, because you will never use values older than that. (Fibonacci's Sequence comes to mind.)
You may be generating a LOT of values on some runs, and just a few on others.
You may want to "throw away" some of the memoized values and start over. (I memoized a random number generator this way, so I could replay the sequence of random numbers that built a certain structure, while some other parameters of the structure had been changed.)
Memoization as an optimization depends on the search for the memoized value being a lot cheaper than recomputation of the value. This in turn depends on the ordering of the input requests. This has implications for the memoization database: Does it use a stack, an array of all possible input values (which may be very large), a bucket hash, or a b-tree?
The memoizing compiler has to either provide a "one size fits all" memoization, or it has to provide lots of possible alternatives, and parameters to control the alternatives. At some point, it becomes easier for everyone to require the user to provide his own memoization.

Because compilers have to emit semantically correct programs. You can't memoize a function without changing program semantics unless it is referentially transparent. In most programming languages not all functions are referentially transparent (pure functional programming languages are an exception) so you can't memoize everything. But then a mechanism is needed for detecting referential transparency and that is too hard.

In Haskell, memoization is automatic for (pure) functions you've defined that take no arguments. And the Fibonacci example in that Wiki is really about the simplest demonstrable example I would be able to think of either.
Haskell can do this because your pure functions are defined to produce the same results every time; of course, monadic functions that depend on side effects won't be memoized.
I'm not sure what the upper limits are -- obviously, it won't memoize more than the available memory. And I'm also not sure offhand if the memoization occurs at compile-time (if the values can be determined at compile-time), or if it always occurs the first time the function is called.

Clojure has a memoize function (http://richhickey.github.com/clojure/clojure.core-api.html#clojure.core/memoize):
memoize
function
Usage: (memoize f)
Returns a memoized version of a referentially transparent function. The
memoized version of the function keeps a cache of the mapping from arguments
to results and, when calls with the same arguments are repeated often, has
higher performance at the expense of higher memory use.

A) Memoization trades space for time. I imagine that this can turn out to a fairly unbound property, in the sense, that the amount of data programs or libraries would have to store could consume large parts of memory really quick.
For a couple of languages, memoization is easy to implement and easy to customize for the given requirements.
As an example take some natural language processing on large bodies of text, where you don't want to compute basic properties of texts (word count, frequency, cooccurrences, ...) over and over again. In that case a memoization in combination with object serialization can be useful as opposed to memory caching, since you may run your application multiple times on unchanged corpora.
B) Another aspect: It's not true, that all functions or methods yield the same output for a same given input. Anyway some keyword or syntax for memoization would be necessary, along with configuration (memory limits, invalidation policy, ...) ...

Because you shouldn't implement something as a language feature when it can easily be implemented in the language itself. A memoization feature belongs in a library, which is exactly where most languages put it.

Your question also leaves open the solution of your learning more languages. I think that Lisp supports memoization, and I know that Mathematica does.

In order for memoization to work as a language feature there would be a couple requirements.
The compiler would need to be identify valid functions for memoization (e.g. they are referentially transparent).
The run-time would have to be able to intelligently select candidates for memoization without slowing down the overall performance.
There are some assumptions in the other language, but if we can have performance gains by just-in-time compilation of hot-spots in a Java VM, then one can surely write an automated memoziation system.
While non-trivial I think this is all theoretically possible to get performance gains in a language (especially an interpreted one) and is a worthwhile area for research.

Not all the languages natively support function decorators. I guess it would be a more general approach to support rather than supporting just memoization.

Reverse the question. Why it should? As someone has said, it can be put in a library so no need of add syntax to the language, it's only usable on pure functions which are hard to identify automatically(unless you force the programmer to annotate them). It's also very hard to determine if memoization is going to speed up things or not. I don't think it's a desirable feature for a programming language.

I really think such an option should be.
In data processing tasks there is an immutable input data (as time series, for example, where for a given time as soon as a value is known, it can never change). Taking in mind today RAM affordability, if a function result only depends on such immutable data, it is rational to memoize it rather than reread every time it's needed. Currently I have (in Scala and C#) to manually introduce an in-memory storage table and write 3 functions instead of one - one reading a value from file/db/ws, one storing it into an in-memory table, one to wrap them and read from memory if available or call the raw function if not. I think this could and should be implemented as a keyword and done behind the scenes.

Performances evaluation with Message Passing

I have to build a distributed application, using MPI.
One of the decision that I have to take is how to map instances of classes into process (and then into machines), in order to take maximum advantages from a distributed environment.
My question is: there is a model that let me choose the better mapping? I mean, some arrangements are surely wrong (for ex., putting in two different machines two objects that should process together a fairly large amount of data, in a sequential manner, without a stream of tokens to process), but there's a systematically way to determine such wrong arrangements, determined by flow of execution, message complexity, time taken by the computation done by the algorithmic components?

Well, there are data flow diagrams. Those can help identify parallelism's opportunities and pitfalls. The references on the wikipedia page might give you some more theoretical grounding.
When I worked at Lockheed Martin, I was exposed to CSIM, a tool they developed for modeling algorithm mapping to processing blocks.

Another thing you might try is the Join Calculus. I've found examples of programming with it to be surprisingly intuitive, and I think it's well grounded in theory. I'm not sure why it hasn't caught on more.
The other approach is the Pi Calculus, and I think that might be more popular, though it seems harder to understand.

A practical solution to this would be using a different model of distributed-memory parallel programming, that directly addresses your concerns. I work on the Charm++ programming system, whose model is that of individual objects sending messages from one to another. The runtime system facilitates automatic mapping of these objects to available processors, to account for issues of load balance and communication locality.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008