Why is 'bicrement' not possible like this? - language-agnostic

Why is this not possible?
int c = 0;
++c++;
Or in PHP:
$c = 0;
++$c++;
I would expect it to increment the variable c by 2, or perhaps do something weird, but instead it gives an error while compiling. I've tried to come up with a reason but got nothing really... My reasoning was this:
The compiler reads ++
It reads the variable
It does whatever it does to make the value of the variable increment and then get returned when executing the application
It encounters another ++
It uses the previous variable in order to return it before incrementing the value
This is where it gets confusing: does it use the variable c, or does it try to read the value that (++c) returned? Or, since you can only do varname++ (and not 2++), does it see (++c) as a pointer and then tries to read that memory location? Whatever it does, why would it give a compile error? Or is the error preventive, because the compiler's programmer knew it wouldn't do anything useful?
It's not really that I would want to use this, and certainly not for code that is not one-time use only, but I'm just curious.

For the same reason you can't do:
c++ = 5;
It returns a value, which can be neither modified nor assigned to. That's not a runtime error, either - that's a compilation error. (Like this one.)
Returning a reference wouldn't make sense either, because then:
$a = 1;
$b = $a++; // How can it be a reference if b should be 1 and a should be 2?

This isn't necessarily language-agnostic, although I'd be a little surprised to see a language in which it was different.
In C (and hence I assume in all non-annoying languages based on C), the operator precedence means your expression is equivalent to ++(c++). Based on your step-by-step, you were expecting it to be equivalent to (++c)++, but it isn't. Postfix ++ "binds more tightly" than prefix ++.
Like everyone says, the expression c++ results in a value, not a modifiable object (an expression in C that refers to an object is called an lvalue). This is necessary because the object c no longer holds the value that the expression evaluates to -- there is no object that c++ could refer to.
In C++ (not to be confused with c++!), ++c is an lvalue. It refers to the object c, which now has the new value.
So in C++ (++c)++ is syntactically correct. It happens to have undefined behavior if c is of type int (at least, it did in C++03: C++11 made some things defined that used to be undefined and I'm not up to date on those changes).
So, if you imagine (or go ahead and invent) a C++-like language in which the operator precedence is what you expected, then you could arrange for ++c++ to be valid. Presumably it would increment c twice, and evaluate to the value in between the old and new values. This imagined language would be "annoying" in the sense that it's only subtly different from C++, which would tend to be confusing.
++c++ is also valid in C++ if c is an instance of a user-defined type that overloads both increment operators. The reason is that an rvalue of user-defined type is an object (a "temporary object"), and can be modified. But as soon as the expression has been evaluated, the temporary is destroyed and the modification is lost.
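Python has no ++ operators, so the following is only a rough model of the user-defined-type case, not C++ itself. The class Counter and the method names pre_increment and post_increment are hypothetical stand-ins for the two overloaded operators. The sketch keeps a reference to the temporary so it can be inspected; in C++ it would be destroyed at the end of the expression.

```python
import copy


class Counter:
    """Hypothetical stand-in for a C++ class that overloads both ++ operators."""

    def __init__(self, value=0):
        self.value = value

    def pre_increment(self):
        # Models ++c: modify this object and return the object itself.
        self.value += 1
        return self

    def post_increment(self):
        # Models c++: return a copy holding the old value (the "temporary").
        old = copy.copy(self)
        self.value += 1
        return old


c = Counter(0)
# Models ++c++, which parses as ++(c++): the outer increment hits the copy.
temporary = c.post_increment().pre_increment()

print(c.value)          # prints 1: c itself was incremented only once
print(temporary.value)  # prints 1: the copy (old value 0) got the second increment
```

So under this model c ends up at 1, not 2, and the second increment lands on an object nobody keeps.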

Another way to explain this is that the ++ operator only operates on an lvalue. But when you combine them, it's parsed as either ++(c++) or (++c)++ -- in either case, the parameter to the operator outside the parentheses is an rvalue (c's value before or after the increment), not an lvalue.

Related

C99 parameter evaluation before C function is called

I know that this question is similar to this one, but I feel like I don't fully understand the C99 standard. I want to ask about parameter evaluation itself, for example:
int index = 0;
sprintf(somebuf, "some-text-%d", index++);
So, it seems like index is not incremented before function call (I got some-text-0 as a result). Is it expected behavior?
By using the post-increment operator (++ follows index), the value is used first, then incremented. If you wanted to use the incremented value, you should have used the pre-increment operator (++index). FYI, this dates back to the earliest versions of C.

Term for function parameters defined for generic use, not specific

Is there a term for this technique? One prominent example is the WinAPI: SendMessage( hwnd, msg, info1, info2 ) where parameters #3 and #4 only make sense per msg (which also means there are cases when only one or none of those two parameters are needed). See MSDN.
Rephrased: having an all-purpose function that always accepts multiple arguments, but interpreting them depends on a previous argument. I don't want to talk about open arrays, open arguments, typeless arguments... I know all that. That's not what I'm asking - I want to have the term for this type of function (and maybe also how unspecific parameters are called).
This is not about casts or passing by reference - the parameter types are always the same. Other example: calculate( char operation, int a, int b ) which is then used as
calculate( '+', 2, 5 ) (parameters #2 and #3 are summands)
calculate( '/', 4, 2 ) (parameter #2 is the dividend and parameter #3 is the divisor)
calculate( '!', 3, 0 ) (parameter #2 is the operand of the factorial and parameter #3 is unused)
In all these cases the data type is always the same and never cast. But the meaning of parameters #2 and #3 differs per parameter #1. And since this is the case it is difficult to give those parameters a meaningful name. Of course the function itself most likely uses a switch(), but that is not subject to my question. How are parameters #2 and #3 called, where a distinct name cannot be found, but data types are always the same?
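For reference, the calculate example from the question could look roughly like this. It is a minimal sketch: the use of integer division for '/' and the ValueError for unknown operations are assumptions, since the question does not specify either.

```python
import math


def calculate(operation, a, b):
    """Interpret a and b according to the operation code, as in the question."""
    if operation == "+":
        return a + b              # a and b are both summands
    if operation == "/":
        return a // b             # a is the dividend, b the divisor (integer division assumed)
    if operation == "!":
        return math.factorial(a)  # a is the operand, b is ignored
    raise ValueError(f"unknown operation: {operation!r}")


print(calculate("+", 2, 5))  # prints 7
print(calculate("/", 4, 2))  # prints 2
print(calculate("!", 3, 0))  # prints 6
```

The names a and b carry no meaning on their own here; only the branch taken gives them one, which is exactly the naming problem being asked about.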
The msg argument "changes" the meaning of the parameters through a simple switch statement. Each "msg" case in the switch knows the parameters (with their types) that it needs and casts them appropriately.
This "technique" is called passing by reference, or passing by address. The latter is usually used for method pointers.
There is no special name, if that is what you are asking. It is a regular function, method or procedure.
The referenced function is a Win32 API function, which may be referred to as a "Windows function call."
This is an example of a static parameter and multiple dynamic parameters.
The static is the "msg" and the dynamic is described as the following:
These parameters are generic pointers, passed by reference. They can point to any data type or to no value, i.e. a null pointer. It is up to the sender to lock the memory in place, and the receiving method to interpret the pointer correctly (through pointer casts).
This is an example of typeless argument passing. The only thing passed is a memory address. It is dangerous since the types passed must be agreed upon ahead of time (by convention, not by contract as with a typed language construct) and must match on both sides of the call.
This was common before C++; in the C days, we only had C structs to pass around, leading to many General Protection Fault errors. Since then, typed interfaces have mostly replaced the generic equivalents through libraries. But the underlying Win32 methods remain the same. The main substantial change since its inception is the acceptance of 64-bit pointers.
Although not widely supported, what you are referring to would be a dependently typed function (or dependently typed parameters).
To quote wikipedia on dependent types
A "pair of integers" is a type. A "pair of integers where the second is greater than the first" is a dependent type because of the dependence on the value.
The parameters could have a type that depends on a value. The type of info1 depends on the value msg as does info2.
In order to make this approach work in a language without dependent types, the dependent parameters are given a very generic type that is only refined later on when more information is available. Only when the value of msg becomes known (at runtime) are the types of info1 and info2 assumed. Even though the language doesn't allow you to express this dependency, I would still call the approach a dependently typed one.

Functions in Lua

I am starting to learn Lua from Programming in Lua (2nd edition). I didn't understand the following in the book.
network = {
{name ="grauna", IP="210.26.30.34"},
{name ="araial", IP="210.26.30.23"},
}
If we want to sort the table by field name, the author mentions
table.sort(network, function (a,b) return (a.name > b.name) end)
What's happening here? What does function (a,b) stand for? Is function a keyword or something?
I was playing around with it and created a table order
order={x=1,x=22,x=10} -- not sure this is legal
and then did
print (table.sort(order,function(a,b) return (a.x > b.x) end))
I did not get any output. Where am I going wrong?
Thanks
It's an anonymous function that takes two arguments and returns true if the name of the first argument is greater than the name of the second. table.sort() calls this function on pairs of elements to decide which of the two should come first, so here the table ends up sorted by name in descending order.
I think (but I am not sure) that order={x=1,x=22,x=10} has the same meaning in Lua as order={x=10}, a table with one key "x" associated with the value 10. Maybe you meant {{x=1},{x=22},{x=10}} to make an "array" of 3 components, each having the key "x".
To answer the second part of your question: table.sort sorts the table in place and returns nothing, so print has nothing to show. Also, Lua is very small, and doesn't provide a way to print a table directly. If you use a table as a list or array, you can do this:
print(unpack(some_table))
unpack({1, 2, 3}) returns 1, 2, 3. A very useful function.
function in Lua is a keyword, similar to lambda in Scheme or Common Lisp (& also Python), or fun in OCaml, to introduce anonymous functions with closed variables, i.e. closures.
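The same ideas can be tried out in Python, which is mentioned above as having a comparable lambda. This is only an analogy, not Lua: Python's sort takes a one-argument key function rather than a two-argument comparison, so reverse=True is used here to mirror the a.name > b.name ordering.

```python
network = [
    {"name": "grauna", "IP": "210.26.30.34"},
    {"name": "araial", "IP": "210.26.30.23"},
]

# The lambda plays the role of Lua's anonymous `function (a, b) ... end`.
# reverse=True mirrors the `a.name > b.name` comparison (descending order).
by_name_descending = sorted(network, key=lambda entry: entry["name"], reverse=True)
print([entry["name"] for entry in by_name_descending])  # prints ['grauna', 'araial']

# In Python, repeating a key in one literal keeps only the last value.
order = {"x": 1, "x": 22, "x": 10}
print(order)  # prints {'x': 10}

# A list of one-key records is something that can actually be sorted.
records = [{"x": 1}, {"x": 22}, {"x": 10}]
records.sort(key=lambda record: record["x"], reverse=True)  # in place, returns None
print([record["x"] for record in records])  # prints [22, 10, 1]
```

Note that the in-place sort returns None, so printing its return value directly would not show the sorted data either.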

What is Type-safe?

What does "type-safe" mean?
Type safety means that the compiler will validate types while compiling, and throw an error if you try to assign the wrong type to a variable.
Some simple examples:
// Fails, Trying to put an integer in a string
String one = 1;
// Also fails.
int foo = "bar";
This also applies to method arguments, since you are passing explicit types to them:
int AddTwoNumbers(int a, int b)
{
return a + b;
}
If I tried to call that using:
int Sum = AddTwoNumbers(5, "5");
The compiler would throw an error, because I am passing a string ("5"), and it is expecting an integer.
In a loosely typed language, such as JavaScript, I can do the following:
function AddTwoNumbers(a, b)
{
return a + b;
}
If I call it like this:
Sum = AddTwoNumbers(5, "5");
JavaScript automatically converts the 5 to a string, and returns "55". This is due to JavaScript using the + sign for string concatenation. To make it type-aware, you would need to do something like:
function AddTwoNumbers(a, b)
{
return Number(a) + Number(b);
}
Or, possibly:
function AddOnlyTwoNumbers(a, b)
{
if (isNaN(a) || isNaN(b))
return false;
return Number(a) + Number(b);
}
If I call the original version like this:
Sum = AddTwoNumbers(5, " dogs");
JavaScript automatically converts the 5 to a string, and appends them, to return "5 dogs".
Not all dynamic languages are as forgiving as JavaScript (in fact, a dynamic language is not necessarily a loosely typed language; see Python); some of them will actually give you a runtime error on an invalid type conversion.
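To illustrate the Python case, here is a small sketch of the same function. The helper describe_add is a hypothetical wrapper added only so the error can be shown as a value instead of stopping the script.

```python
def add_two_numbers(a, b):
    return a + b


def describe_add(a, b):
    """Return the sum, or the name of the exception Python raises instead."""
    try:
        return add_two_numbers(a, b)
    except TypeError as error:
        return type(error).__name__


print(describe_add(5, 5))      # prints 10
print(describe_add(5, "5"))    # prints TypeError: no silent conversion to "55"
print(describe_add("5", "5"))  # prints 55: two strings concatenate as usual
```

The mixed call is rejected at run time rather than compile time, which is the distinction being made here between dynamic typing and loose typing.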
While it's convenient, it opens you up to a lot of errors that can be easily missed, and only identified by testing the running program. Personally, I prefer to have my compiler tell me if I made that mistake.
Now, back to C#...
C# supports a language feature called covariance; this basically means that you can substitute a child type for a base type and not cause an error, for example:
public class Foo : Bar
{
}
Here, I created a new class (Foo) that subclasses Bar. I can now create a method:
void DoSomething(Bar myBar)
And call it using either a Foo, or a Bar as an argument, both will work without causing an error. This works because C# knows that any child class of Bar will implement the interface of Bar.
However, you cannot do the inverse:
void DoSomething(Foo myFoo)
In this situation, I cannot pass Bar to this method, because the compiler does not know that Bar implements Foo's interface. This is because a child class can (and usually will) be much different than the parent class.
Of course, now I've gone way off the deep end and beyond the scope of the original question, but it's all good stuff to know :)
Type-safety should not be confused with static / dynamic typing or strong / weak typing.
A type-safe language is one where the only operations that one can execute on data are the ones that are condoned by the data's type. That is, if your data is of type X and X doesn't support operation y, then the language will not allow you to execute y(X).
This definition doesn't set rules on when this is checked. It can be at compile time (static typing) or at runtime (dynamic typing), typically through exceptions. It can be a bit of both: some statically typed languages allow you to cast data from one type to another, and the validity of casts must be checked at runtime (imagine that you're trying to cast an Object to a Consumer - the compiler has no way of knowing whether it's acceptable or not).
Type-safety does not necessarily mean strongly typed, either - some languages are notoriously weakly typed, but still arguably type safe. Take JavaScript, for example: its type system is as weak as they come, but still strictly defined. It allows automatic casting of data (say, strings to ints), but within well defined rules. There is to my knowledge no case where a JavaScript program will behave in an undefined fashion, and if you're clever enough (I'm not), you should be able to predict what will happen when reading JavaScript code.
An example of a type-unsafe programming language is C: reading / writing an array value outside of the array's bounds has an undefined behaviour by specification. It's impossible to predict what will happen. C is a language that has a type system, but is not type safe.
Type safety is not just a compile time constraint, but a run time constraint. I feel even after all this time, we can add further clarity to this.
There are 2 main issues related to type safety. Memory** and data type (with its corresponding operations).
Memory**
A char typically requires 1 byte per character, or 8 bits (depends on language, Java and C# store unicode chars which require 16 bits).
An int requires 4 bytes, or 32 bits (usually).
Visually:
char: |-|-|-|-|-|-|-|-|
int : |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-|
A type safe language does not allow an int to be inserted into a char at run-time (this should throw some kind of class cast or out of memory exception). However, in a type unsafe language, you would overwrite existing data in 3 more adjacent bytes of memory.
int >> char:
|-|-|-|-|-|-|-|-| |?|?|?|?|?|?|?|?| |?|?|?|?|?|?|?|?| |?|?|?|?|?|?|?|?|
In the above case, the 3 bytes to the right are overwritten, so any pointers to that memory (say 3 consecutive chars) which expect to get a predictable char value will now have garbage. This causes undefined behavior in your program (or worse, possibly in other programs depending on how the OS allocates memory - very unlikely these days).
** While this first issue is not technically about data type, type safe languages address it inherently and it visually describes the issue to those unaware of how memory allocation "looks".
Data Type
The more subtle and direct type issue is where two data types use the same memory allocation. Take an int vs an unsigned int. Both are 32 bits. (Just as easily could be a char[4] and an int, but the more common issue is uint vs. int).
|-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-|
|-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-|
A type unsafe language allows the programmer to reference a properly allocated span of 32 bits, but when the value of an unsigned int is read into the space of an int (or vice versa), we again have undefined behavior. Imagine the problems this could cause in a banking program:
"Dude! I overdrafted $30 and now I have $65,506 left!!"
...'course, banking programs use much larger data types. ;) LOL!
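The arithmetic behind the overdraft joke can be checked with a short sketch. It assumes two's-complement representation; the helper name as_unsigned is made up for illustration. The figure in the quote matches a 16-bit width, while the 32-bit width discussed above gives a much larger number.

```python
def as_unsigned(value, bits):
    """Reinterpret a two's-complement signed integer as unsigned with the given width."""
    return value & ((1 << bits) - 1)


balance = -30  # an overdraft of 30

print(as_unsigned(balance, 16))  # prints 65506
print(as_unsigned(balance, 32))  # prints 4294967266
```

The bits are identical in both readings; only the type used to interpret them changes the value.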
As others have already pointed out, the next issue is computational operations on types. That has already been sufficiently covered.
Speed vs Safety
Most programmers today never need to worry about such things unless they are using something like C or C++. Both of these languages allow programmers to easily violate type safety at run time (direct memory referencing) despite the compilers' best efforts to minimize the risk. HOWEVER, this is not all bad.
One reason these languages are so computationally fast is they are not burdened by verifying type compatibility during run time operations like, for example, Java. They assume the developer is a good rational being who won't add a string and an int together and for that, the developer is rewarded with speed/efficiency.
Many answers here conflate type-safety with static-typing and dynamic-typing. A dynamically typed language (like Smalltalk) can be type-safe as well.
A short answer: a language is considered type-safe if no operation leads to undefined behavior. Many consider the requirement of explicit type conversions necessary for a language to be strictly typed, as automatic conversions can sometimes lead to well defined but unexpected/unintuitive behaviors.
A programming language that is 'type-safe' means the following things:
You can't read from uninitialized variables
You can't index arrays beyond their bounds
You can't perform unchecked type casts
An explanation from a liberal arts major, not a comp sci major:
When people say that a language or language feature is type safe, they mean that the language will help prevent you from, for example, passing something that isn't an integer to some logic that expects an integer.
For example, in C#, I define a function as:
void foo(int arg)
The compiler will then stop me from doing this:
// call foo
foo("hello world")
In other languages, the compiler would not stop me (or there is no compiler...), so the string would be passed to the logic and then probably something bad will happen.
Type safe languages try to catch more at "compile time".
On the down side, with type safe languages, when you have a string like "123" and you want to operate on it like an int, you have to write more code to convert the string to an int, or when you have an int like 123 and want to use it in a message like, "The answer is 123", you have to write more code to convert/cast it to a string.
To get a better understanding, watch the video below, which demonstrates code in a type-safe language (C#) and in a language that is NOT type safe (JavaScript).
http://www.youtube.com/watch?v=Rlw_njQhkxw
Now for the long text.
Type safety means preventing type errors. A type error occurs when data of one type is assigned to another type UNKNOWINGLY and we get undesirable results.
For instance, JavaScript is NOT a type-safe language. In the code below, “num” is a numeric variable and “str” is a string. JavaScript allows me to do “num + str”; now GUESS whether it will do arithmetic or concatenation.
For the code below the result is “55”, but the important point is the confusion about what kind of operation it will perform.
This happens because JavaScript is not a type-safe language. It allows one type of data to be set to another type without restrictions.
<script>
var num = 5; // numeric
var str = "5"; // string
var z = num + str; // arithmetic or concat ????
alert(z); // displays “55”
</script>
C# is a type-safe language. It does not allow one data type to be assigned to another data type. The code below does not allow the “+” operator on different data types.
Concept:
Put very simply, type safety, as the name suggests, makes sure that the type of a variable is kept safe, for example:
No wrong data type, e.g. a variable of string type can't be assigned or initialized with an integer
Out-of-bounds indexes are not accessible
Access is allowed only to the specific memory location
So it is all about the safety of the types of your storage, in terms of variables.
Type-safe means that programmatically, the type of data for a variable, return value, or argument must fit within certain criteria.
In practice, this means that 7 (an integer type) is different from "7" (a quoted character of string type).
PHP, Javascript and other dynamic scripting languages are usually weakly-typed, in that they will convert a (string) "7" to an (integer) 7 if you try to add "7" + 3, although sometimes you have to do this explicitly (and Javascript uses the "+" character for concatenation).
C/C++/Java will not understand that, or will concatenate the result into "73" instead. Type-safety prevents these types of bugs in code by making the type requirement explicit.
Type-safety is very useful. The solution to the above "7" + 3 would be to type cast (int) "7" + 3 (equals 10).
Try this explanation on...
TypeSafe means that variables are statically checked for appropriate assignment at compile time. For example, consider a string or an integer. These two different data types cannot be cross-assigned (i.e., you can't assign an integer to a string nor can you assign a string to an integer).
For non-typesafe behavior, consider this:
object x = 89;
int y;
if you attempt to do this:
y = x;
the compiler throws an error that says it can't convert a System.Object to an Integer. You need to do that explicitly. One way would be:
y = Convert.ToInt32( x );
The assignment above is not typesafe. A typesafe assignment is where the types can directly be assigned to each other.
Non typesafe collections abound in ASP.NET (e.g., the application, session, and viewstate collections). The good news about these collections is that (minimizing multiple server state management considerations) you can put pretty much any data type in any of the three collections. The bad news: because these collections aren't typesafe, you'll need to cast the values appropriately when you fetch them back out.
For example:
Session[ "x" ] = 34;
works fine. But to assign the integer value back, you'll need to:
int i = Convert.ToInt32( Session[ "x" ] );
Read about generics for ways that facility helps you easily implement typesafe collections.
C# is a typesafe language but watch for articles about C# 4.0; interesting dynamic possibilities loom (is it a good thing that C# is essentially getting Option Strict: Off... we'll see).
Type-Safe is code that accesses only the memory locations it is authorized to access, and only in well-defined, allowable ways.
Type-safe code cannot perform an operation on an object that is invalid for that object. The C# and VB.NET language compilers always produce type-safe code, which is verified to be type-safe during JIT compilation.
Type-safe means that the set of values that may be assigned to a program variable must fit well-defined and testable criteria. Type-safe variables lead to more robust programs because the algorithms that manipulate the variables can trust that the variable will only take one of a well-defined set of values. Keeping this trust ensures the integrity and quality of the data and the program.
For many variables, the set of values that may be assigned to a variable is defined at the time the program is written. For example, a variable called "colour" may be allowed to take on the values "red", "green", or "blue" and never any other values. For other variables those criteria may change at run-time. For example, a variable called "colour" may only be allowed to take on values in the "name" column of a "Colours" table in a relational database, where "red", "green", and "blue" are three values for "name" in the "Colours" table, but some other part of the computer program may be able to add to that list while the program is running, and the variable can take on the new values after they are added to the Colours table.
Many type-safe languages give the illusion of "type-safety" by insisting on strictly defining types for variables and only allowing a variable to be assigned values of the same "type". There are a couple of problems with this approach. For example, a program may have a variable "yearOfBirth" which is the year a person was born, and it is tempting to type-cast it as a short integer. However, it is not a short integer. This year, it is a number that is less than 2009 and greater than -10000. However, this set grows by 1 every year as the program runs. Making this a "short int" is not adequate. What is needed to make this variable type-safe is a run-time validation function that ensures that the number is always greater than -10000 and less than the next calendar year. There is no compiler that can enforce such criteria because these criteria are always unique characteristics of the problem domain.
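A run-time check of the kind described for "yearOfBirth" might look like the sketch below. The function name is_valid_year_of_birth and the optional today parameter (there only to make the check repeatable) are assumptions, not part of the original answer.

```python
from datetime import date


def is_valid_year_of_birth(year, today=None):
    """Return True if year is an int strictly between -10000 and next calendar year."""
    today = today or date.today()
    next_year = today.year + 1
    # The upper bound moves forward every year, so it is computed at run time.
    return isinstance(year, int) and -10000 < year < next_year


reference_day = date(2009, 6, 1)  # fixed date so the result does not depend on the clock
print(is_valid_year_of_birth(1980, today=reference_day))    # prints True
print(is_valid_year_of_birth(2010, today=reference_day))    # prints False
print(is_valid_year_of_birth(-10000, today=reference_day))  # prints False
```

A fixed-width integer type alone cannot express the moving upper bound, which is why the check has to run when the value is assigned.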
Languages that use dynamic typing (or duck-typing, or manifest typing) such as Perl, Python, Ruby, SQLite, and Lua don't have the notion of typed variables. This forces the programmer to write a run-time validation routine for every variable to ensure that it is correct, or endure the consequences of unexplained run-time exceptions. In my experience, programmers in statically typed languages such as C, C++, Java, and C# are often lulled into thinking that statically defined types is all they need to do to get the benefits of type-safety. This is simply not true for many useful computer programs, and it is hard to predict if it is true for any particular computer program.
The long & the short.... Do you want type-safety? If so, then write run-time functions to ensure that when a variable is assigned a value, it conforms to well-defined criteria. The down-side is that it makes domain analysis really difficult for most computer programs because you have to explicitly define the criteria for each program variable.
Type Safety
In modern C++, type safety is very important. Type safety means that you use the types correctly and, therefore, avoid unsafe casts and unions. Every object in C++ is used according to its type and an object needs to be initialized before its use.
Safe Initialization: {}
The compiler protects from information loss during type conversion. For example,
int a{7}; // The initialization is OK
int b{7.5}; // Compiler shows ERROR because of information loss
Unsafe Initialization: = or ()
The compiler doesn't protect from information loss during type conversion.
int a = 7; // The initialization is OK
int a = 7.5; // The initialization is OK, but information loss occurs. The actual value of a will become 7
int c(7); // The initialization is OK
int c(7.5); // The initialization is OK, but information loss occurs. The actual value of c will become 7

Expression Versus Statement

I'm asking with regards to C#, but I assume it's the same in most other languages.
Does anyone have a good definition of expressions and statements and what the differences are?
Expression: Something which evaluates to a value. Example: 1+2/x
Statement: A line of code which does something. Example: GOTO 100
In the earliest general-purpose programming languages, like FORTRAN, the distinction was crystal-clear. In FORTRAN, a statement was one unit of execution, a thing that you did. The only reason it wasn't called a "line" was because sometimes it spanned multiple lines. An expression on its own couldn't do anything... you had to assign it to a variable.
1 + 2 / X
is an error in FORTRAN, because it doesn't do anything. You had to do something with that expression:
X = 1 + 2 / X
FORTRAN didn't have a grammar as we know it today—that idea was invented, along with Backus-Naur Form (BNF), as part of the definition of Algol-60. At that point the semantic distinction ("have a value" versus "do something") was enshrined in syntax: one kind of phrase was an expression, and another was a statement, and the parser could tell them apart.
Designers of later languages blurred the distinction: they allowed syntactic expressions to do things, and they allowed syntactic statements that had values.
The earliest popular language example that still survives is C. The designers of C realized that no harm was done if you were allowed to evaluate an expression and throw away the result. In C, every syntactic expression can be made into a statement just by tacking a semicolon along the end:
1 + 2 / x;
is a totally legit statement even though absolutely nothing will happen. Similarly, in C, an expression can have side-effects—it can change something.
1 + 2 / callfunc(12);
because callfunc might just do something useful.
Once you allow any expression to be a statement, you might as well allow the assignment operator (=) inside expressions. That's why C lets you do things like
callfunc(x = 2);
This evaluates the expression x = 2 (assigning the value of 2 to x) and then passes that (the 2) to the function callfunc.
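Python keeps plain = as a statement, but since version 3.8 it has a separate assignment expression (:=) that behaves much like the C call above. A minimal sketch, where callfunc is a hypothetical function that only reports its argument:

```python
def callfunc(value):
    # Hypothetical function; it just reports what it received.
    return f"received {value}"


# The assignment expression binds x to 2 and then passes that 2 along.
result = callfunc(x := 2)

print(x)       # prints 2
print(result)  # prints received 2
```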
This blurring of expressions and statements occurs in all the C-derivatives (C, C++, C#, and Java), which still have some statements (like while) but which allow almost any expression to be used as a statement (in C# only assignment, call, increment, and decrement expressions may be used as statements; see Scott Wisniewski's answer).
Having two "syntactic categories" (which is the technical name for the sort of thing statements and expressions are) can lead to duplication of effort. For example, C has two forms of conditional, the statement form
if (E) S1; else S2;
and the expression form
E ? E1 : E2
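The same duplication exists outside C. As a rough illustration in Python (the function names are made up), both forms below compute the same thing, one as a statement and one as an expression:

```python
def sign_statement(n):
    # Statement form: each branch is a statement that assigns a label.
    if n < 0:
        label = "negative"
    else:
        label = "non-negative"
    return label


def sign_expression(n):
    # Expression form: the whole conditional evaluates to a value.
    return "negative" if n < 0 else "non-negative"


print(sign_statement(-3), sign_expression(-3))  # prints negative negative
print(sign_statement(4), sign_expression(4))    # prints non-negative non-negative
```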
And sometimes people want duplication that isn't there: in standard C, for example, only a statement can declare a new local variable, but this ability is useful enough that the GNU C compiler provides a GNU extension that enables an expression to declare a local variable as well.
Designers of other languages didn't like this kind of duplication, and they saw early on that if expressions can have side effects as well as values, then the syntactic distinction between statements and expressions is not all that useful, so they got rid of it. Haskell, Icon, Lisp, and ML are all languages that don't have syntactic statements; they only have expressions. Even the classic structured looping and conditional forms are considered expressions, and they have values, but not very interesting ones.
An expression is anything that yields a value: 2 + 2
A statement is one of the basic "blocks" of program execution.
Note that in C, "=" is actually an operator, which does two things:
returns the value of the right hand subexpression.
copies the value of the right hand subexpression into the variable on the left hand side.
Here's an extract from the ANSI C grammar. You can see that C doesn't have many different kinds of statements... the majority of statements in a program are expression statements, i.e. an expression with a semicolon at the end.
statement
: labeled_statement
| compound_statement
| expression_statement
| selection_statement
| iteration_statement
| jump_statement
;
expression_statement
: ';'
| expression ';'
;
http://www.lysator.liu.se/c/ANSI-C-grammar-y.html
An expression is something that returns a value, whereas a statement does not.
For example:
1 + 2 * 4 * foo.bar() //Expression
foo.voidFunc(1); //Statement
The Big Deal between the two is that you can chain expressions together, whereas statements cannot be chained.
You can find this on wikipedia, but expressions are evaluated to some value, while statements have no evaluated value.
Thus, expressions can be used in statements, but not the other way around.
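One way to see the syntactic split in practice is to ask a parser. The sketch below uses Python's standard ast module, so it reflects Python's grammar rather than C#'s; is_expression is a hypothetical helper name.

```python
import ast


def is_expression(source):
    """Return True if source parses as a single Python expression."""
    try:
        ast.parse(source, mode="eval")  # "eval" mode accepts expressions only
        return True
    except SyntaxError:
        return False


print(is_expression("1 + 2 / x"))      # prints True
print(is_expression("x = 1 + 2 / x"))  # prints False: assignment is a statement
print(is_expression("foo.bar()"))      # prints True: a call is an expression
```

The names x and foo never need to exist, because the code is only parsed and never evaluated.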
Note that some languages (such as Lisp, and I believe Ruby, and many others) do not differentiate statement vs expression... in such languages, everything is an expression and can be chained with other expressions.
For an explanation of important differences in composability (chainability) of expressions vs statements, my favorite reference is John Backus's Turing award paper, Can programming be liberated from the von Neumann style?.
Imperative languages (Fortran, C, Java, ...) emphasize statements for structuring programs, and have expressions as a sort of afterthought. Functional languages emphasize expressions. Purely functional languages have such powerful expressions that statements can be eliminated altogether.
Expressions can be evaluated to get a value, whereas statements don't return a value (they're of type void).
Function call expressions can also be considered statements of course, but unless the execution environment has a special built-in variable to hold the returned value, there is no way to retrieve it.
Statement-oriented languages require all procedures to be a list of statements. In expression-oriented languages, which probably include all functional languages, procedures are lists of expressions or, in the case of LISP, one long S-expression that represents a list of expressions.
Although both types can be composed, most expressions can be composed arbitrarily as long as the types match up. Each type of statement has its own way of composing other statements, if it can do that at all. Foreach and if statements require either a single statement or that all subordinate statements go in a statement block, one after another, unless the substatements allow for their own substatements.
Statements can also include expressions, whereas an expression doesn't really include any statements. One exception, though, would be a lambda expression, which represents a function, and so can include anything a function can include unless the language only allows for limited lambdas, like Python's single-expression lambdas.
In an expression-based language, all you need is a single expression for a function since all control structures return a value (a lot of them return NIL). There's no need for a return statement since the last-evaluated expression in the function is the return value.
Simply: an expression evaluates to a value, a statement doesn't.
Some things about expression based languages:
Most important: Everything returns a value
There is no difference between curly brackets and braces for delimiting code blocks and expressions, since everything is an expression. This doesn't prevent lexical scoping though: A local variable could be defined for the expression in which its definition is contained and all statements contained within that, for example.
In an expression based language, everything returns a value. This can be a bit strange at first -- What does (FOR i = 1 TO 10 DO (print i)) return?
Some simple examples:
(1) returns 1
(1 + 1) returns 2
(1 == 1) returns TRUE
(1 == 2) returns FALSE
(IF 1 == 1 THEN 10 ELSE 5) returns 10
(IF 1 == 2 THEN 10 ELSE 5) returns 5
A couple more complex examples:
Some things, such as some function calls, don't really have a meaningful value to return (Things that only produce side effects?). Calling OpenADoor(), FlushTheToilet() or TwiddleYourThumbs() will return some sort of mundane value, such as OK, Done, or Success.
When multiple unlinked expressions are evaluated within one larger expression, the value of the last thing evaluated in the large expression becomes the value of the large expression. To take the example of (FOR i = 1 TO 10 DO (print i)), the value of the for loop is "10": it causes the (print i) expression to be evaluated 10 times, each time returning i as a string. The final time through returns 10, our final answer.
It often requires a slight change of mindset to get the most out of an expression based language, since the fact that everything is an expression makes it possible to 'inline' a lot of things.
As a quick example:
FOR i = 1 to (IF MyString == "Hello, World!" THEN 10 ELSE 5) DO
(
LotsOfCode
)
is a perfectly valid replacement for the non expression-based
IF MyString == "Hello, World!" THEN TempVar = 10 ELSE TempVar = 5
FOR i = 1 TO TempVar DO
(
LotsOfCode
)
In some cases, the layout that expression-based code permits feels much more natural to me
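The same contrast can be sketched in Python, which is statement-oriented but does have a conditional expression. The variable names below are illustrative and not taken from the MaxScript original:

```python
my_string = "Hello, World!"

# Expression style: the conditional expression is used directly as the loop bound.
inline_runs = 0
for i in range(10 if my_string == "Hello, World!" else 5):
    inline_runs += 1

# Statement style: a temporary variable is needed to carry the choice.
if my_string == "Hello, World!":
    temp_var = 10
else:
    temp_var = 5
temp_runs = 0
for i in range(temp_var):
    temp_runs += 1

print(inline_runs, temp_runs)  # 10 10
```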
Of course, this can lead to madness. As part of a hobby project in an expression-based scripting language called MaxScript, I managed to come up with this monster line
IF FindSectionStart "rigidifiers" != 0 THEN FOR i = 1 TO (local rigidifier_array = (FOR i = (local NodeStart = FindsectionStart "rigidifiers" + 1) TO (FindSectionEnd(NodeStart) - 1) collect full_array[i])).count DO
(
LotsOfCode
)
I am not really satisfied with any of the answers here. I looked at the grammar for C++ (ISO 2008). However maybe for the sake of didactics and programming the answers might suffice to distinguish the two elements (reality looks more complicated though).
A statement consists of zero or more expressions, but can also be other language concepts. This is the Extended Backus Naur form for the grammar (excerpt for statement):
statement:
labeled-statement
expression-statement <-- can be zero or more expressions
compound-statement
selection-statement
iteration-statement
jump-statement
declaration-statement
try-block
We can see the other concepts that are considered statements in C++.
expression-statement is self-explanatory (a statement can consist of zero or more expressions; read the grammar carefully, it's tricky)
case, for example, is a labeled-statement
selection-statements are if, if/else, switch
iteration-statements are while, do...while, for (...)
jump-statements are break, continue, return (can return expression), goto
declaration-statement is the set of declarations
try-block is statement representing try/catch blocks
and there might be some more down the grammar
This is an excerpt showing the expressions part:
expression:
assignment-expression
expression "," assignment-expression
assignment-expression:
conditional-expression
logical-or-expression assignment-operator initializer-clause
throw-expression
expressions often are or contain assignments
conditional-expression (sounds misleading) refers to usage of the operators (+, -, *, /, &, |, &&, ||, ...)
throw-expression - uh? the throw clause is an expression too
The de-facto basis of these concepts is:
Expressions: A syntactic category whose instance can be evaluated to a value.
Statement: A syntactic category whose instance may be involved with evaluations of an expression and the resulted value of the evaluation (if any) is not guaranteed available.
Outside the very early context of FORTRAN in its first decades, both definitions of expressions and statements in the accepted answer are obviously wrong:
Expressions can be unevaluated operands. Values are never produced from them.
Subexpressions in non-strict evaluations can be definitely unevaluated.
Most C-like languages have so-called short-circuit evaluation rules, which conditionally skip the evaluation of some subexpressions when it cannot change the final result, regardless of any side effects they would have had.
C and some C-like languages have the notion of an unevaluated operand, which may even be normatively defined in the language specification. Such constructs are used to avoid evaluation entirely, so the remaining context information (e.g. types or alignment requirements) can be determined statically without changing the behavior after program translation.
For example, an expression used as the operand of the sizeof operator is never evaluated.
Statements have nothing to do with line constructs. They can do something more than expressions, depending on the language specifications.
Modern Fortran, as the direct descendant of the old FORTRAN, has concepts of executable statements and nonexecutable statements.
Similarly, C++ defines declarations as the top-level subcategory of a translation unit. A declaration in C++ is a statement. (This is not true in C.) There are also expression-statements like Fortran's executable statements.
For the purpose of comparison with expressions, only the "executable" statements matter. But you can't ignore the fact that statements are already generalized to be the constructs forming translation units in such imperative languages. So, as you can see, the definitions of the category vary a lot. The (probably) only remaining common property preserved among these languages is that statements are expected to be interpreted in lexical order (for most users, left-to-right and top-to-bottom).
(BTW, I want to add [citation needed] to that answer concerning materials about C because I can't recall whether DMR has such opinions. It seems not, otherwise there should be no reasons to preserve the functionality duplication in the design of C: notably, the comma operator vs. the statements.)
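The short-circuit rule mentioned above can be observed directly. A minimal sketch in Python; the noisy helper is hypothetical, and since Python has no sizeof-style unevaluated operands, only short-circuiting is shown:

```python
calls = []

def noisy(value):
    """Record that evaluation happened, then return the value."""
    calls.append(value)
    return value

# The right operand is skipped because the left one already decides the result.
result_and = noisy(False) and noisy("never evaluated")
result_or = noisy(True) or noisy("never evaluated either")

print(result_and)  # False
print(result_or)   # True
print(calls)       # [False, True]
```

The skipped subexpressions are still expressions syntactically, even though no value is ever produced from them.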
(The following rationale is not the direct response to the original question, but I feel it necessary to clarify something already answered here.)
Nevertheless, it is doubtful that we need a specific category of "statements" in general-purpose programming languages:
Statements are not guaranteed to have more semantic capabilities over expressions in usual designs.
Many languages have already successfully abandoned the notion of statements to get clean, neat and consistent overall designs.
In such languages, expressions can do everything old-style statements can do: just drop the unused results when the expressions are evaluated, either by leaving the results explicitly unspecified (e.g. in RnRS Scheme), or having a special value (as a value of a unit type) not producible from normal expression evaluations.
The lexical order rules of evaluation of expressions can be replaced by explicit sequence control operator (e.g. begin in Scheme) or syntactic sugar of monadic structures.
The lexical order rules of other kinds of "statements" can be derived as syntactic extensions (using hygienic macros, for example) to get the similar syntactic functionality. (And it can actually do more.)
On the contrary, statements cannot have such conventional rules, because they don't compose on evaluation: there is just no such common notion of "substatement evaluation". (Even if there were, I doubt it could be much more than a copy and paste of the existing rules for evaluating expressions.)
Typically, languages that keep statements also have expressions to express computations, and one top-level subcategory of statements is reserved for expression evaluation. For example, C++ has the so-called expression-statement as that subcategory, and uses the discarded-value expression evaluation rules to specify the general cases of full-expression evaluations in such a context. Some languages like C# choose to refine the contexts to simplify the use cases, but this bloats the specification more.
For users of programming languages, the significance of statements may confuse them further.
The separation of rules of expressions and statements in the languages requires more effort to learn a language.
The naive lexical order interpretation hides the more important notion: expression evaluation. (This is probably most problematic over all.)
Even if the evaluations of full expressions in statements are constrained by lexical order, subexpressions are not (necessarily). Users should ultimately learn this in addition to any rules coupled to the statements. (Consider how to make a newbie get the point that ++i + ++i is meaningless in C.)
Some languages like Java and C# further constrain the order of evaluation of subexpressions so that users can get away with ignoring the evaluation rules. This can be even more problematic.
This seems overspecified to users who have already learned the idea of expression evaluation. It also encourages the user community to follow the blurred mental model of the language design.
It bloats the language specification even more.
It is harmful to optimization by missing the expressiveness of nondeterminism on evaluations, before more complicated primitives are introduced.
A few languages like C++ (particularly, C++17) specify more subtle contexts of evaluation rules, as a compromise of the problems above.
It bloats the language specification a lot.
This goes totally against simplicity for average users...
So why statements? Anyway, the history is already a mess. It seems most language designers do not take their choice carefully.
Worse, it even gives some type system enthusiasts (who are not familiar enough with the PL history) some misconceptions that type systems must have important things to do with the more essential designs of rules on the operational semantics.
Seriously, reasoning depending on types are not that bad in many cases, but particularly not constructive in this special one. Even experts can screw things up.
For example, someone emphasizes the well-typing nature as the central argument against the traditional treatment of undelimited continuations. Although the conclusion is somewhat reasonable and the insights about composed functions are OK (but still far too naive as to the essence), this argument is not sound because it totally ignores the "side channel" approach used in practice, like _Noreturn any_of_returnable_types (in C11), to encode Falsum. And strictly speaking, an abstract machine with unpredictable state is not identical to "a crashed computer".
A statement is a special case of an expression, one with void type. The tendency of languages to treat statements differently often causes problems, and it would be better if they were properly generalized.
For example, in C# we have the very useful Func<T1, T2, T3, TResult> overloaded set of generic delegates. But we also have to have a corresponding Action<T1, T2, T3> set as well, and general purpose higher-order programming constantly has to be duplicated to deal with this unfortunate bifurcation.
Trivial example - a function that checks whether a reference is null before calling onto another function:
TResult IfNotNull<TValue, TResult>(TValue value, Func<TValue, TResult> func)
where TValue : class
{
return (value == null) ? default(TResult) : func(value);
}
Could the compiler deal with the possibility of TResult being void? Yes. All it has to do is require that return is followed by an expression that is of type void. The result of default(void) would be of type void, and the func being passed in would need to be of the form Func<TValue, void> (which would be equivalent to Action<TValue>).
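For comparison, here is a sketch in Python, where a function without a return value still yields None (playing roughly the role of a unit value), so one higher-order helper covers both the Func-like and the Action-like case. The names if_not_null, shout and record are hypothetical and only for illustration:

```python
def if_not_null(value, func):
    """Call func(value) unless value is None; otherwise return None."""
    return None if value is None else func(value)

def shout(text):
    return text.upper()      # "Func"-like: produces a value

log = []

def record(text):
    log.append(text)         # "Action"-like: side effect only, returns None

print(if_not_null("hi", shout))    # HI
print(if_not_null("hi", record))   # None
print(if_not_null(None, shout))    # None
print(log)                         # ['hi']
```

No second overload is needed here because every call is an expression with some value, even when that value is uninteresting.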
A number of other answers imply that you can't chain statements like you can with expressions, but I'm not sure where this idea comes from. We can think of the ; that appears after statements as a binary infix operator, taking two expressions of type void and combining them into a single expression of type void.
Statements -> Instructions to follow sequentially
Expressions -> Evaluation that returns a value
Statements are basically like steps, or instructions in an algorithm; the result of executing a statement is an update of the instruction pointer (as it is called in assembler).
Expressions do not imply an execution order at first sight; their purpose is to evaluate and return a value. In imperative programming languages the evaluation of an expression has an order, but that is a consequence of the imperative model, not part of their essence.
Examples of Statements:
for
goto
return
if
(all of them imply the advance of the line (statement) of execution to another line)
Example of expressions:
2+2
(it doesn't imply the idea of execution, but of the evaluation)
Statement,
A statement is a procedural building-block from which all C# programs are constructed. A statement can declare a local variable or constant, call a method, create an object, or assign a value to a variable, property, or field.
A series of statements surrounded by curly braces form a block of code. A method body is one example of a code block.
bool IsPositive(int number)
{
if (number > 0)
{
return true;
}
else
{
return false;
}
}
Statements in C# often contain expressions. An expression in C# is a fragment of code containing a literal value, a simple name, or an operator and its operands.
Expression,
An expression is a fragment of code that can be evaluated to a single value, object, method, or namespace. The two simplest types of expressions are literals and simple names. A literal is a constant value that has no name.
int i = 5;
string s = "Hello World";
Both i and s are simple names identifying local variables. When those variables are used in an expression, the value of the variable is retrieved and used for the expression.
I prefer the meaning of statement in the formal logic sense of the word. It is one that changes the state of one or more of the variables in the computation, enabling a true or false statement to be made about their value(s).
I guess there will always be confusion in the computing world and science in general when new terminology or words are introduced, existing words are 'repurposed' or users are ignorant of the existing, established or 'proper' terminology for what they are describing
Here is a summary of one of the simplest answers I found,
originally answered by Anders Kaseorg:
A statement is a complete line of code that performs some action, while an expression is any section of the code that evaluates to a value.
Expressions can be combined “horizontally” into larger expressions using operators, while statements can only be combined “vertically” by writing one after another, or with block constructs.
Every expression can be used as a statement (whose effect is to evaluate the expression and ignore the resulting value), but most statements cannot be used as expressions.
http://www.quora.com/Python-programming-language-1/Whats-the-difference-between-a-statement-and-an-expression-in-Python
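Python's own parser can be used to check this distinction. The following is a minimal sketch using the standard ast module; the helper name is_expression is made up for this example:

```python
import ast

def is_expression(source):
    """Return True if source parses as a single Python expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False

print(is_expression("(x + 5) / 9.0"))         # True
print(is_expression("x = 5"))                 # False
print(is_expression("while k < 10: k += 1"))  # False

# An expression used as a statement is wrapped in an ast.Expr node.
stmt = ast.parse("x + 5").body[0]
print(type(stmt).__name__)                    # Expr
```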
Statements are grammatically complete sentences. Expressions are not. For example
x = 5
reads as "x gets 5." This is a complete sentence. The code
(x + 5)/9.0
reads, "x plus 5 all divided by 9.0." This is not a complete sentence. The statement
while k < 10:
print k
k += 1
is a complete sentence. Notice that the loop header is not; "while k < 10," is a subordinating clause.
In a statement-oriented programming language, a code block is defined as a list of statements. In other words, a statement is a piece of syntax that you can put inside a code block without causing a syntax error.
Wikipedia defines the word statement similarly
In computer programming, a statement is a syntactic unit of an imperative programming language that expresses some action to be carried out. A program written in such a language is formed by a sequence of one or more statements
Notice the latter statement. (Although "a program" in this case is technically wrong, because both C and Java reject a program that consists of nothing but statements.)
Wikipedia defines the word expression as
An expression in a programming language is a syntactic entity that may be evaluated to determine its value
This is, however, false, because in Kotlin, throw Exception("") is an expression, but when evaluated it simply throws an exception and never returns any value.
In a statically typed programming language, every expression has a type. This definition, however, doesn't work in a dynamically typed programming language.
Personally, I define an expression as a piece of syntax that can be composed with an operator or function calls to yield a bigger expression. This is actually similar to the explanation of expression by Wikipedia:
It is a combination of one or more constants, variables, functions, and operators that the programming language interprets (according to its particular rules of precedence and of association) and computes to produce ("to return", in a stateful environment) another value
But, the problem is in C programming language, given a function executeSomething like this:
void executeSomething(void){
return;
}
Is executeSomething() an expression or is it a statement? According to my definition, it is a statement because as defined in Microsoft's C reference grammar,
You cannot use the (nonexistent) value of an expression that has type void in any way, nor can you convert a void expression (by implicit or explicit conversion) to any type except void
But the same page clearly indicates that such syntax is an expression.
A statement is a block of code that doesn't return anything and which is just a standalone unit of execution. For example-
if(a>=0)
printf("Hello Humen,I'm a statement");
An expression, on the other hand, returns or evaluates a new value. For example -
if(a>=0)
return a+10;//This is an expression because it evaluates to a new value;
or
a=10+y;//This is also an expression because it returns a new value.
Expression
A piece of syntax which can be evaluated to some value. In other words, an expression is an accumulation of expression elements like literals, names, attribute access, operators or function calls which all return a value. In contrast to many other languages, not all language constructs are expressions. There are also statements which cannot be used as expressions, such as while. Assignments are also statements, not expressions.
Statement
A statement is part of a suite (a “block” of code). A statement is either an expression or one of several constructs with a keyword, such as if, while or for.
To improve on and validate my prior answer, definitions of programming language terms should be explained from computer science type theory when applicable.
An expression has a type other than the Bottom type, i.e. it has a value. A statement has the Unit or Bottom type.
From this it follows that a statement can only have any effect in a program when it creates a side-effect, because it either cannot return a value or it only returns the value of the Unit type, which is either nonassignable (in some languages, such as C's void) or (such as in Scala) can be stored for a delayed evaluation of the statement.
Obviously a #pragma or a /*comment*/ have no type and thus are differentiated from statements. Thus the only type of statement that would have no side-effects would be a non-operation. Non-operation is only useful as a placeholder for future side-effects. Any other action due to a statement would be a side-effect. Again a compiler hint, e.g. #pragma, is not a statement because it has no type.
Most precisely, a statement must have a "side-effect" (i.e. be imperative) and an expression must have a value type (i.e. not the bottom type).
The type of a statement is the unit type, but due to the halting theorem unit is a fiction, so let's say the bottom type.
Void is not precisely the bottom type (it isn't the subtype of all possible types). It exists in languages that don't have a completely sound type system. That may sound like a snobbish statement, but completeness such as variance annotations are critical to writing extensible software.
Let's see what Wikipedia has to say on this matter.
https://en.wikipedia.org/wiki/Statement_(computer_science)
In computer programming a statement is the smallest standalone element of an imperative programming language that expresses some action to be carried out.
Many languages (e.g. C) make a distinction between statements and definitions, with a statement only containing executable code and a definition declaring an identifier, while an expression evaluates to a value only.