I am writing a language where functions are not typed. Which means I need to infer the return type of a function call in order to do type checking. However when somebody writes a recursive function the type checker goes into an infinite recursion trying to infer the type of the function call inside the function body.
The type checker does something like this:
Infer the types of the function call actual arguments.
Create a mapping of the actual argument types to the formal arguments.
Use the mapping to annotate types on the arguments used inside the function body.
Infer and return the return type of the function body.
Step 4 tries to then infer the type of the function call inside the function body, which calls the same type checker function again, causing an infinite recursion.
An example of a recursive function that gives me this problem:
function factorial(n) = n<1 ? 1 : n*factorial(n-1); // Function definition.
...
assert 24 == factorial(4); // Function call expression usage example.
How can I solve this problem without going in to an infinite recursion loop? Is there a way to infer the type of the recursive function call without having to go into the body again? Or some clean way to infer the type from context?
I know the easy solution might be to add types annotations to functions, this way the problem is trivial, but before doing that I want to know if there is a way to solve this without resorting to that.
I'd also like for the solution to work for mutual recursion.
Type inference can vary a lot depending on the language's type system and on what properties you want to have in terms of when annotations are needed. But whatever your language looks like, I think there's one seminal case you really should read about, which is ML. ML's type inference holds a nice sweet spot where it all fits together in a relatively simple paradigm. No type annotations are needed, and any expression has a single most general type (this property is called principality of typing).
ML's type system is the Hindley-Milner type system, which has parametric polymorphism. The type of an expression is either a specific type, or “any”. More precisely, the type constructor of an expression is either a specific type constructor or “any”, and type constructors can have arguments which themselves either have a specific type constructor or “any”. For example, the empty list has the type “list of any”. Two expressions that can have “any” type in isolation may be constrained to have the same type, whatever it is, so “any” is expressed with variables. For example, function list_of_two(x, y) = [x, y] (in a notation like your language) constrains x and y to have the same type, because they're inserted in the same list, but that type can be any type, so the type of this function is “take any two parameters of the same type α, and return a value of type list of α”.
The basic type inference algorithm for Hindley-Milner is algorithm W. At its core, it works by giving each subexpression a type that's a variable: α₁, α₂, α₃, … Programming language constructions then impose constraints on those variables. For example, if a list contains two elements of types α₁ and α₂ and the list itself has the type α₃, this constraints α₁ = α₂ and α₃ = list of α₁. Putting all these constraints together is a unification problem.
The constraints are based on a purely syntactic reading of the program. If there's a recursive call, you don't need to know the type of the function: it just means that there's a constraint that the variable for the return type of the function is the same as the type at its point of use. That's just one more equation to add to the set of constraints.
I left out an important aspect of ML which is that an expression's type can be generalized: an expression can be used with different types at different places. This is what allows polymorphism. For example,
let empty_list = [] in
(empty_list # [3]), (empty_list # ["hello"])
is a valid program where empty_list is used once with the type “list of integers” and once with the type “list of strings”. The type of empty_list is “for any α, list of α”: that's parametric polymorphism. Generalization adds some complexity to the algorithm, but it also removes complexity elsewhere, because that's what allows principality. Without it, let empty_list = [] in … would be ambiguous: empty_list would have to have some type, but there's no way to know what type without analyzing …, and then when you do analyze the … above you'd need to make a choice between integer and string.
Depending on your language's type system, ML and algorithm W may be directly reusable or may just provide some vague inspiration. But the principle of using variables during the inference, and progressively constraining these variables, is very general.
Is it possible to remove the last element from a tuple in typesafe manner for arbitrary arity?
I want something like this:
[A,B,C] abc = [a,b,c];
[A,B] ab = removeLast(abc);
No, unfortunately it's not possible, the reason being that a tuple type is represented within the type system as a linked list of instantiations of Tuple, but the type system can't express loops or recursion within the signature of a function. (And having loops/recursion would almost certainly make the type system undecidable.)
One way we could, in principle, solve this in future would be to have a built-in primitive type function that evaluates the last element type of a tuple type.
By "primitive" type function, I mean a type function that can't be written in the language itself, but is instead provided as a built-in by the compiler.
Ceylon doesn't currently have any of these sorts of primitive type functions, but there are a couple of other similar problems which could be solved in this manner.
I'm starting to learn Lisp with a Java background. In SICP's exercise there are many tasks where students should create abstract functions with many parameters, like
(define (filtered-accumulate combiner null-value term a next b filter)...)
in exercise 1.33. In Java (language with safe, static typing discipline) - a method with more than 4 arguments usually smells, but in Lisp/Scheme it doesn't, does it? I'm wondering how many arguments do you use in your functions? If you use it in production, do you make as many layers?
SICP uses a subset of Scheme
SICP is a book used in introductory computer science course. While it explains some advanced concepts, it uses a very tiny language, a subset of the Scheme language and a sub-subset of any real world Scheme or Lisp a typical implementation provides. Students using SICP are supposed to start with a simple and easy to learn language. From there they learn to implement more complex language additions.
Only positional parameters are being used in plain educational Scheme
There are for example no macros developed in SICP. Add that standard Scheme does have only positional parameters for functions.
Lisp and Scheme offer also more expressive argument lists
In 'real' Lisp or Scheme one can use one or more of the following:
objects or records/structures (poor man's closures) which group things. An object passed can contain several data items, which otherwise would need to be passed 'spread'.
defaults for optional variables. Thus we need only to pass those that we want to have a certain non-default value
optional and named arguments. This allows flexible argument lists which are much more descriptive.
computed arguments. The value or the default value of arguments can be computed based on other arguments
Above leads to more complicated to write function interfaces, but which are often easier to use.
In Lisp it is good style to have descriptive names for arguments and also provide online documentation for the interface. The development environment will display information about the interface of a function, so this information is typically only a keystroke away or is even display automatically.
It's also good style for any non-trivial interface which is supposed to be used interactively by the user/developer to check its arguments at runtime.
Example for a complex, but readable argument list
When there are more arguments, then Common Lisp provides named arguments, which can appear in any order after the normal argument. Named arguments provide also defaults and can be omitted:
(defun order-product (product
&key
buyer
seller
(vat (local-vat seller))
(price (best-price product))
amount
free-delivery-p)
"The function ORDER-PRODUCT ..." ; documentation string
(declare (type ratio vat price) ; type declarations
(type (integer 0) amount)
(type boolean free-delivery-p))
...)
We would use it then:
(order-product 'sicp
:seller 'mit-press
:buyer 'stan-kurilin
:amount 1)
Above uses the seller argument before the buyerargument. It also omits various arguments, some of which have their values computed.
Now we can ask whether such extensive arguments are good or bad. The arguments for them:
the function call gets more descriptive
functions have standard mechanisms to attach documentation
functions can be asked for their parameter lists
type declarations are possible -> thus types don't need to be written as comments
many parameters can have sensible default values and don't need to be mentioned
Several Scheme implementations have adopted similar argument lists.
I'm writing a toy compiler thingy which can optimise function calls if the result depends only on the values of the arguments. So functions like xor and concatenate depend only on their inputs, calling them with the same input always gives the same output. But functions like time and rand depend on "hidden" program state, and calling them with the same input may give different output. I'm just trying to figure out what the adjective is that distinguishes these two types of function, like "isomorphic" or "re-entrant" or something. Can someone tell me the word I'm looking for?
The term you are looking for is Pure
http://en.wikipedia.org/wiki/Pure_function
I think it's called Pure Function:
In computer programming, a function may be described as pure if both these statements about the function hold:
The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change as program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices.
Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.
The result value need not depend on all (or any) of the argument values. However, it must depend on nothing other than the argument values.
I guess you could say the adjective is "pure" if you go by "pure function".
I always learnt that a function whose output is always the same when the arguments are always the same is called "deterministic". Personally, I feel that that is a more descriptive term. I guess a "pure function" is by definition deterministic, and it seems a pure function is also required to not have any side-effects. I assume that that need not be the case for all deterministic functions (as long as the return value is always the same for the same arguments).
Wikipedia link: http://en.wikipedia.org/wiki/Deterministic_algorithm
Quote:
Given a particular input, it will always produce the same output, and the underlying machine will always pass through the same sequence of states.
What does "type-safe" mean?
Type safety means that the compiler will validate types while compiling, and throw an error if you try to assign the wrong type to a variable.
Some simple examples:
// Fails, Trying to put an integer in a string
String one = 1;
// Also fails.
int foo = "bar";
This also applies to method arguments, since you are passing explicit types to them:
int AddTwoNumbers(int a, int b)
{
return a + b;
}
If I tried to call that using:
int Sum = AddTwoNumbers(5, "5");
The compiler would throw an error, because I am passing a string ("5"), and it is expecting an integer.
In a loosely typed language, such as javascript, I can do the following:
function AddTwoNumbers(a, b)
{
return a + b;
}
if I call it like this:
Sum = AddTwoNumbers(5, "5");
Javascript automaticly converts the 5 to a string, and returns "55". This is due to javascript using the + sign for string concatenation. To make it type-aware, you would need to do something like:
function AddTwoNumbers(a, b)
{
return Number(a) + Number(b);
}
Or, possibly:
function AddOnlyTwoNumbers(a, b)
{
if (isNaN(a) || isNaN(b))
return false;
return Number(a) + Number(b);
}
if I call it like this:
Sum = AddTwoNumbers(5, " dogs");
Javascript automatically converts the 5 to a string, and appends them, to return "5 dogs".
Not all dynamic languages are as forgiving as javascript (In fact a dynamic language does not implicity imply a loose typed language (see Python)), some of them will actually give you a runtime error on invalid type casting.
While its convenient, it opens you up to a lot of errors that can be easily missed, and only identified by testing the running program. Personally, I prefer to have my compiler tell me if I made that mistake.
Now, back to C#...
C# supports a language feature called covariance, this basically means that you can substitute a base type for a child type and not cause an error, for example:
public class Foo : Bar
{
}
Here, I created a new class (Foo) that subclasses Bar. I can now create a method:
void DoSomething(Bar myBar)
And call it using either a Foo, or a Bar as an argument, both will work without causing an error. This works because C# knows that any child class of Bar will implement the interface of Bar.
However, you cannot do the inverse:
void DoSomething(Foo myFoo)
In this situation, I cannot pass Bar to this method, because the compiler does not know that Bar implements Foo's interface. This is because a child class can (and usually will) be much different than the parent class.
Of course, now I've gone way off the deep end and beyond the scope of the original question, but its all good stuff to know :)
Type-safety should not be confused with static / dynamic typing or strong / weak typing.
A type-safe language is one where the only operations that one can execute on data are the ones that are condoned by the data's type. That is, if your data is of type X and X doesn't support operation y, then the language will not allow you to to execute y(X).
This definition doesn't set rules on when this is checked. It can be at compile time (static typing) or at runtime (dynamic typing), typically through exceptions. It can be a bit of both: some statically typed languages allow you to cast data from one type to another, and the validity of casts must be checked at runtime (imagine that you're trying to cast an Object to a Consumer - the compiler has no way of knowing whether it's acceptable or not).
Type-safety does not necessarily mean strongly typed, either - some languages are notoriously weakly typed, but still arguably type safe. Take Javascript, for example: its type system is as weak as they come, but still strictly defined. It allows automatic casting of data (say, strings to ints), but within well defined rules. There is to my knowledge no case where a Javascript program will behave in an undefined fashion, and if you're clever enough (I'm not), you should be able to predict what will happen when reading Javascript code.
An example of a type-unsafe programming language is C: reading / writing an array value outside of the array's bounds has an undefined behaviour by specification. It's impossible to predict what will happen. C is a language that has a type system, but is not type safe.
Type safety is not just a compile time constraint, but a run time constraint. I feel even after all this time, we can add further clarity to this.
There are 2 main issues related to type safety. Memory** and data type (with its corresponding operations).
Memory**
A char typically requires 1 byte per character, or 8 bits (depends on language, Java and C# store unicode chars which require 16 bits).
An int requires 4 bytes, or 32 bits (usually).
Visually:
char: |-|-|-|-|-|-|-|-|
int : |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-|
A type safe language does not allow an int to be inserted into a char at run-time (this should throw some kind of class cast or out of memory exception). However, in a type unsafe language, you would overwrite existing data in 3 more adjacent bytes of memory.
int >> char:
|-|-|-|-|-|-|-|-| |?|?|?|?|?|?|?|?| |?|?|?|?|?|?|?|?| |?|?|?|?|?|?|?|?|
In the above case, the 3 bytes to the right are overwritten, so any pointers to that memory (say 3 consecutive chars) which expect to get a predictable char value will now have garbage. This causes undefined behavior in your program (or worse, possibly in other programs depending on how the OS allocates memory - very unlikely these days).
** While this first issue is not technically about data type, type safe languages address it inherently and it visually describes the issue to those unaware of how memory allocation "looks".
Data Type
The more subtle and direct type issue is where two data types use the same memory allocation. Take a int vs an unsigned int. Both are 32 bits. (Just as easily could be a char[4] and an int, but the more common issue is uint vs. int).
|-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-|
|-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-|
A type unsafe language allows the programmer to reference a properly allocated span of 32 bits, but when the value of a unsigned int is read into the space of an int (or vice versa), we again have undefined behavior. Imagine the problems this could cause in a banking program:
"Dude! I overdrafted $30 and now I have $65,506 left!!"
...'course, banking programs use much larger data types. ;) LOL!
As others have already pointed out, the next issue is computational operations on types. That has already been sufficiently covered.
Speed vs Safety
Most programmers today never need to worry about such things unless they are using something like C or C++. Both of these languages allow programmers to easily violate type safety at run time (direct memory referencing) despite the compilers' best efforts to minimize the risk. HOWEVER, this is not all bad.
One reason these languages are so computationally fast is they are not burdened by verifying type compatibility during run time operations like, for example, Java. They assume the developer is a good rational being who won't add a string and an int together and for that, the developer is rewarded with speed/efficiency.
Many answers here conflate type-safety with static-typing and dynamic-typing. A dynamically typed language (like smalltalk) can be type-safe as well.
A short answer: a language is considered type-safe if no operation leads to undefined behavior. Many consider the requirement of explicit type conversions necessary for a language to be strictly typed, as automatic conversions can sometimes leads to well defined but unexpected/unintuitive behaviors.
A programming language that is 'type-safe' means following things:
You can't read from uninitialized variables
You can't index arrays beyond their bounds
You can't perform unchecked type casts
An explanation from a liberal arts major, not a comp sci major:
When people say that a language or language feature is type safe, they mean that the language will help prevent you from, for example, passing something that isn't an integer to some logic that expects an integer.
For example, in C#, I define a function as:
void foo(int arg)
The compiler will then stop me from doing this:
// call foo
foo("hello world")
In other languages, the compiler would not stop me (or there is no compiler...), so the string would be passed to the logic and then probably something bad will happen.
Type safe languages try to catch more at "compile time".
On the down side, with type safe languages, when you have a string like "123" and you want to operate on it like an int, you have to write more code to convert the string to an int, or when you have an int like 123 and want to use it in a message like, "The answer is 123", you have to write more code to convert/cast it to a string.
To get a better understanding do watch the below video which demonstrates code in type safe language (C#) and NOT type safe language ( javascript).
http://www.youtube.com/watch?v=Rlw_njQhkxw
Now for the long text.
Type safety means preventing type errors. Type error occurs when data type of one type is assigned to other type UNKNOWINGLY and we get undesirable results.
For instance JavaScript is a NOT a type safe language. In the below code “num” is a numeric variable and “str” is string. Javascript allows me to do “num + str” , now GUESS will it do arithmetic or concatenation .
Now for the below code the results are “55” but the important point is the confusion created what kind of operation it will do.
This is happening because javascript is not a type safe language. Its allowing to set one type of data to the other type without restrictions.
<script>
var num = 5; // numeric
var str = "5"; // string
var z = num + str; // arthimetic or concat ????
alert(z); // displays “55”
</script>
C# is a type safe language. It does not allow one data type to be assigned to other data type. The below code does not allow “+” operator on different data types.
Concept:
To be very simple Type Safe like the meanings, it makes sure that type of the variable should be safe like
no wrong data type e.g. can't save or initialized a variable of string type with integer
Out of bound indexes are not accessible
Allow only the specific memory location
so it is all about the safety of the types of your storage in terms of variables.
Type-safe means that programmatically, the type of data for a variable, return value, or argument must fit within a certain criteria.
In practice, this means that 7 (an integer type) is different from "7" (a quoted character of string type).
PHP, Javascript and other dynamic scripting languages are usually weakly-typed, in that they will convert a (string) "7" to an (integer) 7 if you try to add "7" + 3, although sometimes you have to do this explicitly (and Javascript uses the "+" character for concatenation).
C/C++/Java will not understand that, or will concatenate the result into "73" instead. Type-safety prevents these types of bugs in code by making the type requirement explicit.
Type-safety is very useful. The solution to the above "7" + 3 would be to type cast (int) "7" + 3 (equals 10).
Try this explanation on...
TypeSafe means that variables are statically checked for appropriate assignment at compile time. For example, consder a string or an integer. These two different data types cannot be cross-assigned (ie, you can't assign an integer to a string nor can you assign a string to an integer).
For non-typesafe behavior, consider this:
object x = 89;
int y;
if you attempt to do this:
y = x;
the compiler throws an error that says it can't convert a System.Object to an Integer. You need to do that explicitly. One way would be:
y = Convert.ToInt32( x );
The assignment above is not typesafe. A typesafe assignement is where the types can directly be assigned to each other.
Non typesafe collections abound in ASP.NET (eg, the application, session, and viewstate collections). The good news about these collections is that (minimizing multiple server state management considerations) you can put pretty much any data type in any of the three collections. The bad news: because these collections aren't typesafe, you'll need to cast the values appropriately when you fetch them back out.
For example:
Session[ "x" ] = 34;
works fine. But to assign the integer value back, you'll need to:
int i = Convert.ToInt32( Session[ "x" ] );
Read about generics for ways that facility helps you easily implement typesafe collections.
C# is a typesafe language but watch for articles about C# 4.0; interesting dynamic possibilities loom (is it a good thing that C# is essentially getting Option Strict: Off... we'll see).
Type-Safe is code that accesses only the memory locations it is authorized to access, and only in well-defined, allowable ways.
Type-safe code cannot perform an operation on an object that is invalid for that object. The C# and VB.NET language compilers always produce type-safe code, which is verified to be type-safe during JIT compilation.
Type-safe means that the set of values that may be assigned to a program variable must fit well-defined and testable criteria. Type-safe variables lead to more robust programs because the algorithms that manipulate the variables can trust that the variable will only take one of a well-defined set of values. Keeping this trust ensures the integrity and quality of the data and the program.
For many variables, the set of values that may be assigned to a variable is defined at the time the program is written. For example, a variable called "colour" may be allowed to take on the values "red", "green", or "blue" and never any other values. For other variables those criteria may change at run-time. For example, a variable called "colour" may only be allowed to take on values in the "name" column of a "Colours" table in a relational database, where "red, "green", and "blue", are three values for "name" in the "Colours" table, but some other part of the computer program may be able to add to that list while the program is running, and the variable can take on the new values after they are added to the Colours table.
Many type-safe languages give the illusion of "type-safety" by insisting on strictly defining types for variables and only allowing a variable to be assigned values of the same "type". There are a couple of problems with this approach. For example, a program may have a variable "yearOfBirth" which is the year a person was born, and it is tempting to type-cast it as a short integer. However, it is not a short integer. This year, it is a number that is less than 2009 and greater than -10000. However, this set grows by 1 every year as the program runs. Making this a "short int" is not adequate. What is needed to make this variable type-safe is a run-time validation function that ensures that the number is always greater than -10000 and less than the next calendar year. There is no compiler that can enforce such criteria because these criteria are always unique characteristics of the problem domain.
Languages that use dynamic typing (or duck-typing, or manifest typing) such as Perl, Python, Ruby, SQLite, and Lua don't have the notion of typed variables. This forces the programmer to write a run-time validation routine for every variable to ensure that it is correct, or endure the consequences of unexplained run-time exceptions. In my experience, programmers in statically typed languages such as C, C++, Java, and C# are often lulled into thinking that statically defined types is all they need to do to get the benefits of type-safety. This is simply not true for many useful computer programs, and it is hard to predict if it is true for any particular computer program.
The long & the short.... Do you want type-safety? If so, then write run-time functions to ensure that when a variable is assigned a value, it conforms to well-defined criteria. The down-side is that it makes domain analysis really difficult for most computer programs because you have to explicitly define the criteria for each program variable.
Type Safety
In modern C++, type safety is very important. Type safety means that you use the types correctly and, therefore, avoid unsafe casts and unions. Every object in C++ is used according to its type and an object needs to be initialized before its use.
Safe Initialization: {}
The compiler protects from information loss during type conversion. For example,
int a{7}; The initialization is OK
int b{7.5} Compiler shows ERROR because of information loss.\
Unsafe Initialization: = or ()
The compiler doesn't protect from information loss during type conversion.
int a = 7 The initialization is OK
int a = 7.5 The initialization is OK, but information loss occurs. The actual value of a will become 7.0
int c(7) The initialization is OK
int c(7.5) The initialization is OK, but information loss occurs. The actual value of a will become 7.0