Beyond type theory - language-agnostic

There has been much fuss about dynamically vs. statically typed languages. To my eye, however, while statically typed languages enable the compiler (or interpreter) to know a bit more about your intentions, they only barely scratch the surface of what could be conveyed. Indeed, some languages have an orthogonal mechanism for providing a bit more information in annotations.
I am aware of strongly typed languages like Agda and Coq that are very persnickety about what they allow you to do; I'm not terribly interested in those. Rather, I'm wondering what languages or theory exist that expand the richness of what you can explain to the compiler about what it is that you intend. For example, if you have a mutable vector and you turn it into a unit vector, why couldn't your compiler select a unit-vector form of vector projection instead of the more computationally expensive general form? The type has not changed--and the work required to build all the requisite types would be off-putting even in a language with amazingly easy typing such as Haskell--and yet it seems that the compiler could be empowered to know a great deal about the situation.
Does some language already enable things like this, either outside of standard type-theory or within one of its more advanced branches?

there are languages with turing-complete system type. which means that your types can express any computable property. for example list of length 6 or valid credit card number. however most mainstream languages uses simpler system types. haskell is considered to have very powerful system type

Related

Why can't programming language turn off select "features"

As programming evolved over the years (from assembler to high level languages), more and more features (garbage collection, exceptions, dynamic typing) have been added as standard to some languages. Is it possible to create a high level language that starts of with all features on by default, and once a program runs as well, then to be able to selectively turn features of in code, or have sections of code which a quarantined off so that they do not use these features. Perhaps modifying branches in the Abstract Syntax Tree to be statically typed, instead of dynamic; compiled, instead of interpreted.
Is there any programming language which can be used as dynamic and static, and also selectively turn of garbage collection, by releasing used objects, even up to disabling exception handling, all the way to the point where the run-time consist of only c like constructs, or any the above mentioned?
For a language to do what you're asking, it would have to be built to support both alternatives: garbage collection and manual memory management or static and dynamic typing and make the two worlds interoperate.
In other words, what you're saying to be just "turn off A", is actually "design A, design B, design transitioning between A and B". So, doing this would be a significant amount of additional design and implementation work, it would make the language more complicated and the language might end up as "worst of both worlds".
Now, languages that support both combinations of the features you mentioned do exist, in a limited form:
C# is normally a statically typed language, but it also has the dynamic keyword, which allows you to switch to dynamic typing for certain variables. This was primarily meant for interoperation with dynamic languages, and is not used much in practice.
C++/CLI is a language that supports both manually managed memory, (* pointers, new to allocate and delete to deallocate) and garbage collected memory (^ pointers, gcnew to allocate). It is primarily meant for interoperation between C++ code and .Net code and is not widely used in practice.
You might have noticed a theme here: in both cases, the feature/language were created to bridge the two worlds, but didn't gain much traction.

Pros and cons of weak and strong typing

I'm making the transition from Java to PHP/Javascript and discovering all the practical aspects of using a weakly typed language.
As I'm in a position to fully compare the two I'd like to know the pros and cons of each approach. Also, are there any other forms of typing out there?
A weakly dynamically typed programming language (like PHP) made that the programmer's mistakes occur as non-coherent behaviours (for instance, the program gonna display stupid informations).
With a strongly dynamically typed language (like python), the programming mistakes causes error message. It makes the mistakes easier to uncover and diagnosis but in general the program became not usable after the message has been shown.
Finally, with a strongly statically typed language (like Java, Ada, OCaml, Haskell, ...) some mistakes can be uncovered at compile time and hence reduce the risk to provide an bugged program. (but the release occurs later)
Yes. Python uses Dynamic Typing.
Generally it's a matter of personal preference and the role that the architects of a given language's intended use.
PHP (a scripting language) for example makes sense to be weakly typed, as the tasks it generally performs are far less complex, and require less constraints then say a compiled language.
Regarding your final question, Mathematica is said to be "typeless."
High-level, typeless, dynamic language with consistent symbolic syntax and semantics across all data, functions, and interfaces
PHP/javascript can be used to develope better looking UI's than Java. PHP will be having less constraints and easy to learn and execute than java.

Idiom vs. pattern

In the context of programming, how do idioms differ from patterns?
I use the terms interchangeably and normally follow the most popular way I've heard something called, or the way it was called most recently in the current conversation, e.g. "the copy-swap idiom" and "singleton pattern".
The best difference I can come up with is code which is meant to be copied almost literally is more often called pattern while code meant to be taken less literally is more often called idiom, but such isn't even always true. This doesn't seem to be more than a stylistic or buzzword difference. Does that match your perception of how the terms are used? Is there a semantic difference?
Idioms are language-specific.
Patterns are language-independent design principles, usually written in a "pattern language" (a uniform template) describing things such as the motivating circumstances, pros & cons, related patterns, etc.
When people observing program development from On High (Analysts, consultants, academics, methodology gurus, etc) see developers doing the same thing over and over again in various situations and environments, then the intelligence gained from that observation can be distilled into a Pattern. A pattern is a way of "doing things" with the software tools at hand that represent a common abstraction.
Some examples:
OO programming took global variables away from developers. For those cases where they really still need global variables but need a way to make their use look clean and object oriented, there's the Singleton Pattern.
Sometimes you need to create a new object having one of a variety of possible different types, depending on some circumstances. An ugly way might involve an ever-expanding case statement. The accepted "elegant" way to achieve this in an OO-clean way is via the "Factory" or "Factory Method" pattern.
Sometimes, a lot of developers do things in a certain way but it's a bad way that should be disrecommended. This can be formalized in an antipattern.
Patterns are a high-level way of doing things, and most are language independent. Whether you create your objects with new Object or Object.new is immaterial to the pattern.
Since patterns are something a bit theoretical and formal, there is usually a formal pattern (heh - word overload! let's say "template") for their description. Such a template may include:
Name
Effect achieved
Rationale
Restrictions and Limitations
How to do it
Idioms are something much lower-level, and usually operate at the language level. Example:
*dst++ = *src++
in C copies a data element from src to dst while incrementing the pointers to both; it's usually done in a loop. Obviously, you won't see this idiom in Java or Object Pascal.
while <INFILE> { print chomp; }
is (roughly quoted from memory) a Perl idiom for looping over an input file and printing out all lines in the file. There's a lot of implicit variable use in that statement. Again, you won't see this particular syntax anywhere but in Perl; but an old Perl hacker will take a quick look at the statement and immediately recognize what you're doing.
Contrary to the idea that patterns are language agnostic, both Paul Graham and Peter Norvig have suggested that the need to use a pattern is a sign that your language is missing a feature. (Visitor Pattern is often singled out as the most glaring example of this.)
I generally think the main difference between "patterns" and "idioms" to be one of size. An idiom is something small, like "use an interface for the type of a variable that holds a collection" while Patterns tend to be larger. I think the smallness of idioms does mean that they're more often language specific (the example I just gave was a Java idiom), but I don't think of that as their defining characteristic.
Since if you put 5 programmers in a room they will probably not even agree on what things are patterns, there's no real "right answer" to this.
One opinion that I've heard once and really liked (though can't for the life of me recall the source), is that idioms are things that should probably be in your language or there is some language that has them. Conversely, they are tricks that we use because our language doesn't offer a direct primitive for them. For instance, there's no singleton in Java, but we can mimic it by hiding the constructor and offering a getInstance method.
Patterns, on the other hand, are more language agnostic (though they often refer to a specific paradigm). You may have some infrastructure to support them (e.g., Spring for MVC), but they're not and not going to be language constructs, and yet you could need them in any language from that paradigm.

What language features are required in a programming language to make a compiler?

Programming languages seem to go through several stages. Firstly, someone dreams up a new language, Foo Language. The compiler/interpreter is written in another language, usually C or some other low level language. At some point, FooL matures and grows, and eventually someone, somewhere will write a compiler and/or interpreter for FooL in FooL itself.
My question is this: What is the minimal subset of language features such that someone could implement that language in itself?
Compiler can be written even using a Turing machine - a Universal Turing Machine is basically a compiler/interpreter of any Turing machine, so any Turing-complete language should be enough :)
In theory, surprisingly little. A computability theorist would say that all you need is mu-recursion or a Turing machine or the like.
However, from a practical point of view, you're not going to be very happy trying to implement a programming language in a Turing machine. I would say that, at a minimum, you would want to have all the usual control-flow constructs, the primitive datatypes, subroutines, as well as arrays and structs. That should be enough to let you implement that subset of the language in the language itself -- and you can then bootstrap yourself up from there.
One option is a read-eval-print loop. This can be used to build many higher-level constructs. I believe this is the path taken by LISP.
I am unsure about the beginnings of C, but I think it started with a few system calls to implement branching, loops, assignment and single-character I/O, and built from there.
Id assume a assembler would make the cut.
My question is this: What is the minimal subset of language features such that someone could implement that language in itself?
There is no requirement for the language to be useful for anything other than compiling itself? I present to you Useless, the language in which every text is a proper program and means "a program that takes any input and produces itself" (this is also known as Useless compiler).

Syntactic sugar vs. feature

In C# (and Java) a string is little more than a char array with a stored length and a few methods tacked on. Likewise, (reference vs. value stuff aside) objects are little more than glorified structs with inheritance and interfaces added.
On one level, these additions feel like clear features and enhancements unto themselves. On another level, they feel like a marginal upgrade from the status of "syntactic sugar."
To take this idea further, consider (I may have some details wrong, but the point remains):
transistor
elementary logic gate
compound gate
| |
ALU flip-flop
| | |
| register RAM
| |
CPU
microcode
assembly
C
C++
| |
MSIL JavaScript
C# jQuery
Many times, any single layer of abstraction looks a lot like syntactic sugar but multiple layers of separation feel very removed from each other.
How do you know when something has stopped being syntactic sugar and started being a bona fide feature?
It turns out to be a feature instead of syntactic sugar when it implies a different way of thinking.
You are right when you say objects are in fact glorified structs with methods and inheritance. That, however, is just the implementation detail. What objects allow is to think in a different way. You can relate more easily to real world entities when thinking about objects. The same thing happened when even further back in time, we jumped from using go-to's to procedural programming. Under the hood, the processor still keeps on jmp'ing from OP to OP, but we could think in a different, more black-box, way.
Having said that, in extreme, you can say everything is syntactic sugar, but some of that sugar is a feature when it allows you to think differently.
Syntactic sugar is a feature.
All of software is a giant stack of abstractions built on top of other abstractions. A string may be nothing more than an array of characters, but there are many operations that feel natural on strings, but awkward on character arrays. The goal of all of these abstractions is the same: remove irrelevant details so that the developer can focus on the important parts of the problem.
As you point out, all modern programming languages could be eliminated, and we could go back to working in assembly language. But our productivity would plummet.
I guess people call something syntactic sugar when they feel they get little benefit from it, and a feature when they feel the get a large benefit from it. That makes the distinction very fuzzy, and quite subjective.
When the change provides value? I have coded in assembler. I switched to C and looked at the output from the compiler. It's code was 95+% as good as my hand crafted assembler and it was much easier to write. For me that provided value so I'd say it wasn't sugar.
C++ helps me translate my object oriented thoughts into code. As long as the overhead isn't terribly high then I think it's a feature.
I'm a practical sort. "If I can see it's valuable" is my answer
"Syntactic sugar" is a feature you don't like
It seems that syntactical surgar is a syntax that changes nothing about the abilities of the language, and using a different construct accomplishes exactly the same thing. A String (thinking in Java) is not just syntatical sugar over a char array. A char array is mutable (in content if not in length). You could not make a char array immutable with an existing language feature without a String array.
On the other hand, the plus operator working on Strings is indeed syntatical sugar for using a StringBuilder and calling append.
I would have to say when the same result is cannot be achieved simply by writing different code, with the same type of "time-constraint" as using the syntactical sugar.
My Example would be a Lambda expression, writing a foreach loop doesn't take a lot of effort, but using .Foreach() sure is nice too; versus rewriting the whole HttpRequest class on your own. One is syntactical, one is a feature. Both save time, one in a much bigger way than the other.
Generally the term "syntactic sugar" refers to language features which never allowed a programmer to do something which could not be done before, but rather provided a nice means of expressing something that could already expressed in the language, even if somewhat more awkwardly.
Certain constructs may be unambiguously regarded as syntactic sugar. For example, in VB.NET, code to test for whether two references weren't equal used to require If Not (ref1 Is Ref2) but newer versions of the language allow If ref1 IsNot Ref2. Nothing can be expressed in the new syntax that couldn't be expressed in the old, but the new syntax is cleaner, introduces no ambiguities, and the only reason not to use it would be if code had to be back-compatible with old versions of the language.
Some constructs may be a bit harder to define as sugar. In particular, if a language adds constructs which will work identically to existing construct when used with other types, but will fail compilation with others, such constructs may provide a means of compile-time type verification which did not exist previously. Java generics may generally be viewed in this light. One can add a Cat to an ArrayList<Cat> just as easily as to an ArrayList; what the ArrayList<Cat> adds is a guard to reject Dogs at compile time. Since compile-time constraints don't allow one to write any program that couldn't be written without them, some people may view them as syntactic sugar. On the other hand, even though type verification is performed at compile-time rather than run-time, it might still be viewed as one of the jobs of a program.
Syntactic sugar and language feature are basically describing the same thing, even if syntactic sugar is sometimes used in a pejorative way whereas feature is often associated with deeper changes in the language architecture (introducing lambdas etc.).
But this distinction is very dependent on a individual point of view (and its subjectively felt usefulness).
Regarding language-design aspects and your example with strings and char-arrays, I would say that this should be neither a feature nor sugar, but simply expressible in the languages basic syntax (LOP - language-oriented programming). Generic concepts (typeclasses, metaprogramming etc.) allow you to express many new and useful constructs by yourself without waiting for the language to get a new feature. Just look at Haskell or C++'s metaprogramming capabilities.