Why does the for loop use a semicolon? - language-agnostic

In most C-derived languages (C, Java, JavaScript, etc.), the for loop has the same basic syntax:
for (int i = 0; i < 100; i++) {
// code here
}
Why does this syntax contain semicolons, when semicolons are usually reserved for the end of the line? Also, why is there no semicolon after i++?

This pseudo-code:
for (A; B; C) {
D;
}
can be internally converted to
{ // scope bracket
A;
while (B) {
D;
C;
}
}
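As a rough illustration of that rewrite, here is a sketch in Python. Python has no C-style for loop, so only the while form can be written directly; the function name count_up and its limit parameter are made up for this example, and the comments map each line back to A, B, C and D above.

```python
def count_up(limit):
    """Hand-expanded form of the C loop: for (i = 0; i < limit; i++) { ... }"""
    visited = []
    i = 0                  # A: initialization, runs exactly once
    while i < limit:       # B: condition, tested before every pass
        visited.append(i)  # D: the loop body
        i += 1             # C: step, runs after the body on every pass
    return visited


print(count_up(3))  # prints [0, 1, 2]
```

One caveat with the hand-written version: in C, a continue inside the body still runs C before B is tested again, whereas a continue inside this while loop would skip the step.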

The semicolon is not for ending lines. It's for ending instructions. In most of those languages you can do:
int i;i = 0;
And it's legal. Look at any minified JavaScript code. You'll see thousands of semicolons per line.
By the same principle, a for block takes three instructions. They are separated by semicolons so that the compiler or interpreter knows where each command starts and ends.
This is perfectly legal (though it results in an infinite loop):
for (;;) {}

I feel this can be answered sufficiently by examining how language parsers resolve the syntax. For example, a common depiction of a for loop is:
for (initialization; condition; increment-decrement) {
/** statements **/
}
You can generalize that to:
for (expression; expression; expression) {
/** statements **/
}
Note that the generalization is not entirely accurate, because the middle expression is typically used as a condition (most often a relational expression), while the other two often act like statements or statement lists. For example, in C and C++ you can perform several actions in the initializer or increment-decrement regions by joining expressions with the comma (,) operator.
It may help to note that statements are usually a collection of zero or more expressions, often separated by operators.
Why does the for loop syntax use semicolons?
In many languages, a semicolon doesn't occur at the end of a line of code; it typically occurs at the end of a statement. A statement is often defined as the smallest standalone executable element of a piece of code. A common type of statement is an expression statement, which is a statement composed of exactly one expression. This helps to explain why there is a semicolon at the end of each expression in the for loop construct: it is consistent with how the language parser interprets statements.
Why is there no semicolon at the end?
There is no semicolon at the end simply because that's how the language grammar is defined. The other components, as mentioned above, could be for consistency.
What about languages that don't use semicolons as mentioned?
This is difficult to answer, but I think a probable reason is that it's consistent with how it has been done in the past. C was/is a very popular language, and many languages base their syntax on some variant of C including C++, C#, Objective-C, Java, Python, Perl, and JavaScript.

Probably there is no rational explanation for why this specific for(;;){} syntax was born. You should ask Kernighan or Ritchie about that.
Going back in the history of programming languages to the early '60s, the curly braces could have appeared in the BCPL programming language, while the parentheses wrapping conditions were glorified by B (which only had a while statement, no for). C was modelled on B.
Since 1972, C has penetrated all sectors of computer engineering, and subsequent languages were often modelled on C syntax (C++, Java, JavaScript, C#, Scala, just to name a few) so as not to upset the habits of established programmers. This includes the for(;;) loop syntax and curly braces.
As a sidenote, there are many widespread languages that do not resort to a C-style syntax, such as Python, whose for loop you may find more logical (obviously, this is a personal opinion), or Ruby.

The general form is:
for ( expression; expression; expression ) { ... }
The parser can easily recognize these expressions because their syntax is identical to the syntax of "normal" expressions:
{
expression;
expression;
...
}
The last expression can easily be recognized because it ends with a ')'.
Furthermore, commas couldn't be used because they can be put inside single expressions:
for ( i=1,j=10; i<10,j>0; i++,j--) { ... }

for is a language keyword/instruction, not to be confused with a call to a library function.
Keywords/instructions such as if, while, do, and for are part of the language. As such, the compiler needs to generate assembly code (or instruction code of some kind) for them during compilation or interpretation. When the for instruction was being developed in the C language (by Brian Kernighan and Dennis Ritchie), they would have had to choose the syntax of the for operation, and how the operation was going to break down into assembler.
From:
for( start-condition ; end-condition ; step-control )
to something like this:
mov eax, $x
beginning:
cmp eax, 0x0A
jg end
inc eax
jmp beginning
end:
mov $x, eax
This syntax was then used in C++, and other languages followed suit.
In C the semicolon is a command terminator. In some other languages it is a separator between commands. In the for loop it is a separator for the terms.
In C not all terms are required:
for(;;)
is valid code, and is equivalent to while(1) (pseudo: while(true)).
In Pascal/Modula-2 the structure is different:
for i:= start_value to end_value do

Related

Magic methods on other programming languages

Python has methods such as __add__, __mul__, __cmp__ and so on (called magic methods), which are defined on a class and can give a different meaning to adding (+), multiplying (*), comparing (==), ... two instances of that class. My question is: do other languages have similar methods? I'm familiar with Java, C++, Ruby and PHP, but never came across such a thing. I know all four have a constructor method which corresponds to __init__, but what about other magic methods?
I tried googling "Magic methods in other programming languages" but nothing related showed up; they probably go by different names in different languages.
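For readers who have not seen these hooks, a minimal sketch of what they look like in Python follows. The Vector class is hypothetical and exists only for illustration. Note that __cmp__ is a Python 2 hook; Python 3 uses __eq__, __lt__ and friends instead, which is what this sketch assumes.

```python
class Vector:
    """Tiny 2-D vector, hypothetical, only here to show the hooks."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):      # called for: self + other
        return Vector(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):     # called for: self * scalar
        return Vector(self.x * scalar, self.y * scalar)

    def __eq__(self, other):       # called for: self == other
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):            # called by print() and repr()
        return f"Vector({self.x}, {self.y})"


print(Vector(1, 2) + Vector(3, 4))   # prints Vector(4, 6)
print(Vector(1, 2) * 3)              # prints Vector(3, 6)
print(Vector(1, 2) == Vector(1, 2))  # prints True
```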
In general, having too much "magic" in a language is a sign of bad language design. Maybe that is why there are not many languages which have magic methods?
Magic like this creates a two-class system: the language designer can add new magic methods to the language, but the programmer is restricted to only use the methods that the High Priest Of Language Design allows them to. In general, it should be possible for the programmer to do as much as possible without requiring changes to the language specification.
For example, in Scala, +, -, *, /, ==, !=, <, >, <=, >=, ::, |, &, ||, &&, **, ^, +=, -=, *=, /=, and so on and so forth, are simply legal identifiers. So, if you want to implement your own version of multiplication for your own objects, you just write a method named *. This is just a boring old standard method, there is absolutely nothing "magic" about it.
Conversely, any method can be called using operator notation, i.e. without a dot. And any method that takes exactly one argument can be called without parentheses in operator notation.
This does not only apply to methods. Also, any type constructor with exactly two type arguments can be used in infix notation, so if I have
class ↔[A, B]
I can do
class Foo extends (String ↔ Int)
which is the same as
class Foo extends ↔[String, Int]
Well … I kinda lied: there is some syntactic sugar in Scala:
foo() is translated to foo.apply() if there is no method named foo in scope. This allows you to effectively overload the function call operator.
foo.bar = baz is translated to foo.bar_=(baz). This allows you to effectively overload property assignment. (This is how you write setters in Scala.)
foo(bar) = baz is translated to foo.update(bar, baz). This allows you to effectively overload index assignment. (This is how you write array or dictionary access in Scala, for example).
!foo (and a couple of others) are translated to foo.unary_!.
foo += bar will try to call the += method of foo, i.e. it is equivalent to foo.+=(bar). But if this fails and foo is a valid lvalue, and foo has a method named +, then Scala will also try foo = foo + bar instead.
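Also, precedence, associativity, and fixity are fixed in Scala. Precedence is determined by the first character of the method name, i.e. all methods starting with * have the same precedence, all methods starting with - have the same precedence, and so on. Associativity depends on the last character: methods whose names end in a colon are right-associative, all others are left-associative.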
Haskell goes a step further: there is no fundamental difference between functions and operators. Every function can be used in function call notation and in operator notation. The only difference is lexical: if the function name consists of operator characters, then when I want to use it in function call notation, I have to wrap it in parentheses. OTOH, if the function name consists of alphanumeric characters and I want to use it in operator notation, I need to wrap it in backticks. So, the following are equivalent:
a + b
(+) a b
a `plus` b
plus a b
For operator usage of functions, you can freely define the fixity, associativity, and precedence, e.g.:
infixr 5 <!==!>
In Ruby, there is a pre-defined set of operators that has corresponding methods, e.g.:
def +(other)
plus(other)
end
In C++, operator overloading is what you are looking for.
Java has no native support for operator overloading (Reference).
C has no operator overloading (Reference). Thus, a lot of add, mult and so on functions are written. Often those are macros, because then they can be used for different types. IMHO this is why I like C++ better.
@Alex gave a reference to a nice overview of operator overloading.

Is the "if" statement considered a method?

An interesting discussion came up among my peers as to whether or not the "if" statement is considered a method. Although "if" is followed by the word "statement", it still behaves similarly to a simple method with no return value.
For example:
if(myValue) //myValue is the parameter passed in
{
//Execute
}
Likewise a method could perform the same operation:
public void MyMethod(bool myValue)
{
switch(myValue)
{
case true:
//Logic
break;
case false:
//Logic
break;
}
}
Is it accurate to call (consider) the "if" statement a simple predefined method in a programming language?
In languages such as C, C++, C#, Java, IF is a statement implemented as a reserved word, part of the core of the language. In programming languages of the LISP family (Scheme comes to mind) IF is an expression (meaning that it returns a value) and is implemented as a special form. On the other hand, in pure object-oriented languages such as Smalltalk, IF really is a method (more precisely: a message), typically implemented on the Boolean class or one of its subclasses.
Bottom line: the true nature of the conditional instruction IF depends on the programming language, and on the programming paradigm of that language.
No, the "if" statement is nothing like a method in C#. Consider the ways in which it is not like a method:
The entities in the containing block are in scope in the body of an "if". But a method does not get any access to the binding environment of its caller.
In many languages methods are members of something -- a type, probably. Statements are not members.
In languages with first-class methods, methods can be passed around as data. (In C#, by converting them to delegates.) "if" statements are not first-class.
and so on. The differences are myriad.
Now, it does make sense to think of some things as a kind of method, just not "if" statements. Many operators, for instance, are a lot like methods. There's very little conceptual difference between:
decimal x = y + z;
and
decimal x = Add(y, z);
And in fact if you disassemble an addition of two decimals in C#, you'll find that the generated code actually is a method call.
Some operators have unusual characteristics that make it hard to characterize them as methods though:
bool x = Y() && Z();
is different from
bool x = And(Y(), Z());
in a language that has eager evaluation of method arguments; in the first, Z() is not evaluated if Y() is false. In the second, both are evaluated.
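The difference can be observed directly. The sketch below uses Python rather than C#, on the assumption that the behaviour carries over: Python's and short-circuits like &&, and Python also evaluates function arguments eagerly. The names y, z, logical_and and calls are made up for the example.

```python
calls = []  # records which functions actually ran


def y():
    calls.append("y")
    return False


def z():
    calls.append("z")
    return True


def logical_and(left, right):
    """An ordinary function: both arguments are evaluated before it runs."""
    return left and right


calls.clear()
result_operator = y() and z()            # z() is skipped because y() is False
calls_operator = list(calls)

calls.clear()
result_function = logical_and(y(), z())  # y() and z() both run first
calls_function = list(calls)

print(calls_operator, calls_function)  # prints ['y'] ['y', 'z']
```

Both forms produce False here; only the side effects differ.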
Your creation of an "if" method rather begs the question; the implementation is more complicated than an "if" statement. Saying that you can emulate "if" with a switch is like saying that you can emulate a bicycle with a motorcycle; replacing something simple with something far more complex is not compelling. It would be more reasonable to point out that a switch is actually a fancy "if".
You can't create a myIfStatement() method and expect the following to work:
...
myIfStatement(something == somethingElse)
{
// execute if equal
}
else
{
// execute if different
}
if is a control statement, and cannot be replicated by a method, nor can you replace a method call with if:
myVariable = if(something == somethingElse);
if cannot be overloaded.
These are a few signs that if is not a method, but there are others I suspect.
Depends on the language for sure, but in C, Java, and Perl, no, they're language commands. Reserved words. If they were functions, you'd be able to overload them and get pointers to them and do all the other things that you can do with functions.
This is more of a philosophical question than a programming question though.
A method has a signature and its main intention is reusable logic, whereas if is simply a condition that controls the flow of execution.
If you understand assembly, you would know that both are different even on a very low level.
You can of course write If() and IfElse() methods but that does not make them the same.
if() is defined as a statement in the language, at the same level as method calls. But there are differences in, among other things, syntax and optimization possibilities.
So: No, the if() statement is not a method. You cannot, for instance, assign it to a delegate.
Considering the if statement to be a method only makes it confusing, in my opinion. The similarities with a method call are just superficial.
The if statement is one of the statements that control the execution flow. When it's compiled into native machine code, it will evaluate the expression and make a conditional jump.
Pseudo code:
load myValue, reg0
test reg0
jumpeq .skip
; code inside the if
.skip:
If you use else, you will get two jumps:
load myValue, reg0
test reg0
jumpeq .else
; code inside the if
jmp .done
.else:
; code inside the else
.done:
Is the “if” statement considered a method?
No, it's not considered a method as you may have already seen in the other answers. However, if your question were - "Does it behave like a method?", then the answer could be yes depending on the language in question. Any language that supports first-class functions could do without an in-built construct/statement like if. Ignore all the fluffy stuff like return values and syntax, as basically it is just a function that evaluates a boolean value and if it is true, then it executes some block of code. Also ignore OO and functional differences because the following examples can be implemented as a method on the Boolean class in whatever language is being used like Smalltalk does it.
Ruby supports blocks of executable code that can be stored in a variable and passed around to methods. So here's a custom _if_ function implemented in Ruby. The stuff within the { .. } is a piece of executable code that's passed to the function. It's also known as a block in Ruby.
def _if_ (condition)
condition && yield
end
# simple statement
_if_ (42 > 0) { print "42 is indeed greater than 0" }
# complicated statement
_if_ (2 + 3 == 5) {
_if_ (3 + 5 == 8) { puts "3 + 5 is 8" }
_if_ (5 + 8 == 13) { puts "5 + 8 is 13" }
}
We can do the same thing in C, C++, Objective-C, JavaScript, Python, LISP, and many other languages. Here's a JavaScript example.
function _if_(condition, code) {
condition && code();
}
_if_(42 > 0, function() { console.log("Yes!"); });
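The same idea can be sketched in Python, one of the languages listed above. The name if_ and the log list are made up for this example; the key point is that the code to run has to be wrapped in a callable (here a lambda), otherwise it would be evaluated before if_ is even called.

```python
def if_(condition, code):
    """Run `code` only when `condition` is truthy, like the JavaScript version."""
    return condition and code()


log = []
if_(42 > 0, lambda: log.append("42 is greater than 0"))
if_(2 + 2 == 5, lambda: log.append("never recorded"))

print(log)  # prints ['42 is greater than 0']
```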
If it were to be classed as a method then surely we would be in the realms of OO, however we're not, so I'll assume we're on about a function. Certainly a function/subroutine could be written to replicate the if behaviour (I think it is actually a function in lisp/scheme).
I wouldn't class it as a function or even a subroutine though, just control flow.
If by method we understand a block of code that could be called and the control flow automatically returns to the caller when the method ends, then ifs aren't methods. The control flow doesn't return anywhere after an if is executed.
The IF statement is a conditional construct found in most languages, which selects an execution path based on whether a boolean condition evaluates to true or false. Apart from the case of branch predication, this is always achieved by selectively altering the control flow based on some condition.
The IF construct is the most basic and needed logic used when programming. It allows the building blocks for functions to be introduced.
Yes, if is a function in certain languages, even though it's rare and the uses are limited.
Usually the construct is something like if(booleanCondition, functionPointerToCallIfConditionTrue, functionPointerToCallIfConditionFalse). This can itself be used as a delegate to other functions if you want.
Mathematica, for example, behaves this way, and even C# can do so with a bit of work if you use LINQ expressions; take a look at System.Linq.Expressions.Expression.IfThenElse.
No. You don't return back when you are finished with an if. It's merely a control statement.
Note that in your example, you replaced one "selection statement" (C# 4 specification, section 8.7), the if statement (section 8.7.1) with another, the switch statement (section 8.7.2). You also refactored the selection statement into a separate method. You haven't replaced the use of a selection statement with a method, however.
The answer to your question is "no".

Can if statements be implemented as function calls?

One of the stylistic 'conventions' I find slightly irritating in published code, is the use of:
if(condition) {
instead of (my preference):
if (condition) {
A slight difference, and probably an unimportant one, but it occurred to me that the first style might be justified if 'if' statements were implemented as a kind of function call. Then I could stop objecting to it.
Does anyone know of a programming language where an if statement is implemented as a function call, where the argument is a single boolean expression?
EDIT: I realise the blocks following the if() are problematic, and the way I expressed my question was probably too naive, but I'm encouraged by the answers so far.
Tcl is one language which implements if as a regular built-in function/command which takes two parameters: the condition and the code block to execute.
if {$vbl == 1} { puts "vbl is one" }
http://tmml.sourceforge.net/doc/tcl/if.html
In fact, all language constructs in Tcl (for loop, while loop, etc.) are implemented as commands/functions.
It's impossible for it to have a single argument since it has to decide which code path to follow, which would have to be done outside of said function. It would need at least two arguments, but three would allow an "else" condition.
Lisp's if has exactly the same syntax as any other macro in the language (it's not quite exactly a function, but the difference is minimal): (if cond then else)
Both the 'then' and 'else' clauses are left unevaluated unless the condition selects them.
In Smalltalk, an if statement is kind of a function call -- sort of, in (of course) a completely object oriented way, so it's really a method not a free function. I'm not sure how it would affect your thinking on syntax though, since the syntax is completely different, looking like:
someBoolean
ifTrue: [ do_something ]
ifFalse: [ do_something_else ]
Given that this doesn't contain any parentheses at all, you can probably interpret it as proving whatever you wanted to believe. :-)
If the if function is to be a regular function, then it can't just take the condition, it needs as its parameters the block of code to run depending on whether the condition evaluates to true or not.
A prototype for a function like that in C++ might be something along the lines of
void custom_if(bool cond, void (*block)());
This function can either call the block function, or not, depending on cond.
In some functional languages things are much easier. In Haskell, a simple function like:
if' True a _ = a
if' _ _ b = b
allows you to write code like this:
if' (1 == 1)
(putStrLn "Here")
(putStrLn "There")
which will always print Here.
I don't know of any languages where if(condition) is implemented as a regular function call, but Perl implements try { } catch { } etc.. {} as function calls.

Why do programming languages not allow spaces in identifiers?

This may seem like a dumb question, but still I don't know the answer.
Why do programming languages not allow spaces in the names ( for instance method names )?
I understand it is to facilitate ( allow ) the parsing, and at some point it would be impossible to parse anything if spaces were allowed.
Nowadays we are so used to it that the norm is not to see spaces.
For instance:
object.saveData( data );
object.save_data( data )
object.SaveData( data );
[object saveData:data];
etc.
Could be written as:
object.save data( data ) // looks ugly, but that's the "natural" way.
If it is only for parsing, I guess the identifier could be between . and ( of course, procedural languages wouldn't be able to use it because there is no '.' but OO do..
I wonder if parsing is the only reason, and if it is, how important it is ( I assume that it will be and it will be impossible to do it otherwise, unless all the programming language designers just... forget the option )
EDIT
I agree that allowing spaces in identifiers in general (as in the FORTRAN example) is a bad idea. Narrowing it down to OO languages, and specifically to methods, I don't see (which doesn't mean there isn't) a reason why it should be that way. After all, the . and the first ( could be used as delimiters.
And forget the saveData method , consider this one:
key.ToString().StartsWith("TextBox")
as:
key.to string().starts with("textbox");
Be cause i twoul d makepa rsing suc hcode reallydif ficult.
I used an implementation of ALGOL (c. 1978) which—extremely annoyingly—required quoting of what is now known as reserved words, and allowed spaces in identifiers:
"proc" filter = ("proc" ("int") "bool" p, "list" l) "list":
"if" l "is" "nil" "then" "nil"
"elif" p(hd(l)) "then" cons(hd(l), filter(p,tl(l)))
"else" filter(p, tl(l))
"fi";
Also, FORTRAN (the capitalized form means F77 or earlier), was more or less insensitive to spaces. So this could be written:
799 S = FLO AT F (I A+I B+I C) / 2 . 0
A R E A = SQ R T ( S *(S - F L O ATF(IA)) * (S - FLOATF(IB)) *
+ (S - F LOA TF (I C)))
which was syntactically identical to
799 S = FLOATF (IA + IB + IC) / 2.0
AREA = SQRT( S * (S - FLOATF(IA)) * (S - FLOATF(IB)) *
+ (S - FLOATF(IC)))
With that kind of history of abuse, why make parsing difficult for humans? Let alone complicate computer parsing.
Yes, it's the parsing - both human and computer. It's easier to read and easier to parse if you can safely assume that whitespace doesn't matter. Otherwise, you can have potentially ambiguous statements, statements where it's not clear how things go together, statements that are hard to read, etc.
Such a change would make for an ambiguous language in the best of cases. For example, in a C99-like language:
if not foo(int x) {
...
}
is that equivalent to:
A function definition of foo that returns a value of type ifnot:
ifnot foo(int x) {
...
}
A call to a function called notfoo with a variable named intx:
if notfoo(intx) {
...
}
A negated call to a function called foo (with C99's not which means !):
if not foo(intx) {
...
}
This is just a small sample of the ambiguities you might run into.
Update: I just noticed that obviously, in a C99-like language, the condition of an if statement would be enclosed in parentheses. Extra punctuation can help with ambiguities if you choose to ignore whitespace, but your language will end up having lots of extra punctuation wherever you would normally have used whitespace.
Before the interpreter or compiler can build a parse tree, it must perform lexical analysis, turning the stream of characters into a stream of tokens. Consider how you would want the following parsed:
a = 1.2423 / (4343.23 * 2332.2);
And how your rule above would work on it. Hard to know how to lexify it without understanding the meaning of the tokens. It would be really hard to build a parser that did lexification at the same time.
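To make the lexing step concrete, here is a minimal sketch of a regex-based tokenizer in Python. The token categories and the names TOKEN_PATTERN and tokenize are assumptions made for this illustration and are not taken from any real compiler. It treats whitespace purely as a separator, which is exactly the assumption that spaces inside identifiers would break.

```python
import re

# One alternative per token kind; whitespace is matched but later dropped.
TOKEN_PATTERN = re.compile(r"""
    (?P<number>\d+(?:\.\d+)?)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<punct>[=/*()+\-;.,])
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(source):
    """Split `source` into (kind, text) pairs without knowing what they mean."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_PATTERN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(len(tokenize("a = 1.2423 / (4343.23 * 2332.2);")))  # prints 10
print(tokenize("save data"))  # prints [('name', 'save'), ('name', 'data')]
```

With this scheme "save data" comes out as two separate names. Deciding that they were meant as one identifier would require the lexer to know which names exist, which is the kind of meaning-dependent lexing described above.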
There are a few languages which allow spaces in identifiers. Nearly all languages constrain the set of characters in identifiers because parsing is easier that way and most programmers are accustomed to the compact no-whitespace style.
I don't think there's a real reason.
Check out Stroustrup's classic Generalizing Overloading for C++2000.
We were allowed to put spaces in filenames back in the 1960's, and computers still don't handle them very well (everything used to break, then most things, now it's just a few things - but they still break).
We simply can't wait another 50 years before our code will work again.
:-)
(And what everyone else said, of course. In English, we use spaces and punctuation to separate the words. The same is true for computer languages, except that computer parsers define "words" in a slightly different sense)
Using space as part of an identifier makes parsing really murky (is that a syntactic space or part of an identifier?), but the same sort of "natural reading" behavior is achieved with keyword arguments. object.save(data: something, atomically: true)
The TikZ language for creating graphics in LaTeX allows whitespace in parameter names (also known as 'keys'). For instance, you see things like
\shade[
top color=yellow!70,
bottom color=red!70,
shading angle={45},
]
In this restricted setting of a comma-separated list of key-value pairs, there's no parsing difficulty. In fact, I think it's much easier to read than the alternatives like topColor, top_color or topcolor.

Expression Versus Statement

I'm asking with regard to C#, but I assume it's the same in most other languages.
Does anyone have a good definition of expressions and statements and what the differences are?
Expression: Something which evaluates to a value. Example: 1+2/x
Statement: A line of code which does something. Example: GOTO 100
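These two definitions can be probed in a language that keeps the distinction. The sketch below uses Python as an assumption for illustration (the question is about C#): Python's built-in compile accepts only expressions in "eval" mode, so it can act as a quick test. The helper name is_expression is made up.

```python
def is_expression(source):
    """Return True if `source` parses as a single Python expression."""
    try:
        compile(source, "<example>", "eval")  # "eval" mode accepts expressions only
        return True
    except SyntaxError:
        return False


print(is_expression("1 + 2 / x"))      # prints True
print(is_expression("x = 1 + 2 / x"))  # prints False
print(is_expression("while x: pass"))  # prints False
```

Nothing is evaluated here, so x does not need to exist; only the syntax is checked.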
In the earliest general-purpose programming languages, like FORTRAN, the distinction was crystal-clear. In FORTRAN, a statement was one unit of execution, a thing that you did. The only reason it wasn't called a "line" was because sometimes it spanned multiple lines. An expression on its own couldn't do anything... you had to assign it to a variable.
1 + 2 / X
is an error in FORTRAN, because it doesn't do anything. You had to do something with that expression:
X = 1 + 2 / X
FORTRAN didn't have a grammar as we know it today—that idea was invented, along with Backus-Naur Form (BNF), as part of the definition of Algol-60. At that point the semantic distinction ("have a value" versus "do something") was enshrined in syntax: one kind of phrase was an expression, and another was a statement, and the parser could tell them apart.
Designers of later languages blurred the distinction: they allowed syntactic expressions to do things, and they allowed syntactic statements that had values.
The earliest popular language example that still survives is C. The designers of C realized that no harm was done if you were allowed to evaluate an expression and throw away the result. In C, every syntactic expression can be made into a statement just by tacking a semicolon onto the end:
1 + 2 / x;
is a totally legit statement even though absolutely nothing will happen. Similarly, in C, an expression can have side-effects—it can change something.
1 + 2 / callfunc(12);
because callfunc might just do something useful.
Once you allow any expression to be a statement, you might as well allow the assignment operator (=) inside expressions. That's why C lets you do things like
callfunc(x = 2);
This evaluates the expression x = 2 (assigning the value of 2 to x) and then passes that (the 2) to the function callfunc.
This blurring of expressions and statements occurs in all the C-derivatives (C, C++, C#, and Java), which still have some statements (like while) but which allow almost any expression to be used as a statement (in C# only assignment, call, increment, and decrement expressions may be used as statements; see Scott Wisniewski's answer).
Having two "syntactic categories" (which is the technical name for the sort of thing statements and expressions are) can lead to duplication of effort. For example, C has two forms of conditional, the statement form
if (E) S1; else S2;
and the expression form
E ? E1 : E2
And sometimes people want duplication that isn't there: in standard C, for example, only a statement can declare a new local variable, but this ability is useful enough that the GNU C compiler provides a GNU extension that enables an expression to declare a local variable as well.
Designers of other languages didn't like this kind of duplication, and they saw early on that if expressions can have side effects as well as values, then the syntactic distinction between statements and expressions is not all that useful—so they got rid of it. Haskell, Icon, Lisp, and ML are all languages that don't have syntactic statements—they only have expressions. Even the class structured looping and conditional forms are considered expressions, and they have values—but not very interesting ones.
an expression is anything that yields a value: 2 + 2
a statement is one of the basic "blocks" of program execution.
Note that in C, "=" is actually an operator, which does two things:
returns the value of the right hand subexpression.
copies the value of the right hand subexpression into the variable on the left hand side.
Here's an extract from the ANSI C grammar. You can see that C doesn't have many different kinds of statements... the majority of statements in a program are expression statements, i.e. an expression with a semicolon at the end.
statement
: labeled_statement
| compound_statement
| expression_statement
| selection_statement
| iteration_statement
| jump_statement
;
expression_statement
: ';'
| expression ';'
;
http://www.lysator.liu.se/c/ANSI-C-grammar-y.html
An expression is something that returns a value, whereas a statement does not.
For examples:
1 + 2 * 4 * foo.bar() //Expression
foo.voidFunc(1); //Statement
The Big Deal between the two is that you can chain expressions together, whereas statements cannot be chained.
You can find this on wikipedia, but expressions are evaluated to some value, while statements have no evaluated value.
Thus, expressions can be used in statements, but not the other way around.
Note that some languages (such as Lisp, and I believe Ruby, and many others) do not differentiate statement vs expression... in such languages, everything is an expression and can be chained with other expressions.
For an explanation of important differences in composability (chainability) of expressions vs statements, my favorite reference is John Backus's Turing award paper, Can programming be liberated from the von Neumann style?.
Imperative languages (Fortran, C, Java, ...) emphasize statements for structuring programs, and have expressions as a sort of after-thought. Functional languages emphasize expressions. Purely functional languages have such powerful expressions that statements can be eliminated altogether.
Expressions can be evaluated to get a value, whereas statements don't return a value (they're of type void).
Function call expressions can also be considered statements of course, but unless the execution environment has a special built-in variable to hold the returned value, there is no way to retrieve it.
Statement-oriented languages require all procedures to be a list of statements. Programs in expression-oriented languages, which probably includes all functional languages, are lists of expressions, or in the case of LISP, one long S-expression that represents a list of expressions.
Although both types can be composed, most expressions can be composed arbitrarily as long as the types match up. Each type of statement has its own way of composing other statements, if it can do that at all. Foreach and if statements require either a single statement or that all subordinate statements go in a statement block, one after another, unless the substatements allow for their own substatements.
Statements can also include expressions, whereas an expression doesn't really include any statements. One exception, though, would be a lambda expression, which represents a function, and so can include anything a function can include unless the language only allows for limited lambdas, like Python's single-expression lambdas.
In an expression-based language, all you need is a single expression for a function since all control structures return a value (a lot of them return NIL). There's no need for a return statement since the last-evaluated expression in the function is the return value.
Simply: an expression evaluates to a value, a statement doesn't.
Some things about expression based languages:
Most important: Everything returns a value
There is no difference between curly brackets and braces for delimiting code blocks and expressions, since everything is an expression. This doesn't prevent lexical scoping though: A local variable could be defined for the expression in which its definition is contained and all statements contained within that, for example.
In an expression based language, everything returns a value. This can be a bit strange at first -- What does (FOR i = 1 TO 10 DO (print i)) return?
Some simple examples:
(1) returns 1
(1 + 1) returns 2
(1 == 1) returns TRUE
(1 == 2) returns FALSE
(IF 1 == 1 THEN 10 ELSE 5) returns 10
(IF 1 == 2 THEN 10 ELSE 5) returns 5
A couple more complex examples:
Some things, such as some function calls, don't really have a meaningful value to return (Things that only produce side effects?). Calling OpenADoor(), FlushTheToilet() or TwiddleYourThumbs() will return some sort of mundane value, such as OK, Done, or Success.
When multiple unlinked expressions are evaluated within one larger expression, the value of the last thing evaluated in the large expression becomes the value of the large expression. To take the example of (FOR i = 1 TO 10 DO (print i)), the value of the for loop is "10": it causes the (print i) expression to be evaluated 10 times, each time returning i as a string. The final time through returns 10, our final answer.
It often requires a slight change of mindset to get the most out of an expression based language, since the fact that everything is an expression makes it possible to 'inline' a lot of things.
As a quick example:
FOR i = 1 to (IF MyString == "Hello, World!" THEN 10 ELSE 5) DO
(
LotsOfCode
)
is a perfectly valid replacement for the non expression-based
IF MyString == "Hello, World!" THEN TempVar = 10 ELSE TempVar = 5
FOR i = 1 TO TempVar DO
(
LotsOfCode
)
In some cases, the layout that expression-based code permits feels much more natural to me.
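The same inlining can be sketched in a mainstream language wherever the language offers a conditional expression. The snippet below uses Python's "a if cond else b" form as a stand-in for the pseudocode above; loop_count and my_string are names invented for the example, and the counter takes the place of LotsOfCode.

```python
def loop_count(my_string: str) -> int:
    """Count the iterations of a loop whose bound is chosen inline."""
    count = 0
    # The conditional expression replaces the temporary variable TempVar.
    for _ in range(10 if my_string == "Hello, World!" else 5):
        count += 1  # stands in for LotsOfCode
    return count


print(loop_count("Hello, World!"))  # 10
print(loop_count("something else"))  # 5
```

Python's if statement itself still cannot appear in that position; only the dedicated conditional expression can.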
Of course, this can lead to madness. As part of a hobby project in an expression-based scripting language called MaxScript, I managed to come up with this monster line:
IF FindSectionStart "rigidifiers" != 0 THEN FOR i = 1 TO (local rigidifier_array = (FOR i = (local NodeStart = FindsectionStart "rigidifiers" + 1) TO (FindSectionEnd(NodeStart) - 1) collect full_array[i])).count DO
(
LotsOfCode
)
I am not really satisfied with any of the answers here. I looked at the grammar for C++ (ISO 2008). However, for the sake of teaching and everyday programming, the answers might suffice to distinguish the two elements (reality looks more complicated, though).
A statement consists of zero or more expressions, but can also be other language concepts. This is the Extended Backus Naur form for the grammar (excerpt for statement):
statement:
labeled-statement
expression-statement <-- can be zero or more expressions
compound-statement
selection-statement
iteration-statement
jump-statement
declaration-statement
try-block
We can see the other concepts that are considered statements in C++.
expression-statements is self-explanatory (a statement can consist of zero or more expressions, read the grammar carefully, it's tricky)
case for example is a labeled-statement
selection-statements are if, if/else, switch
iteration-statements are while, do...while, for (...)
jump-statements are break, continue, return (can return expression), goto
declaration-statement is the set of declarations
try-block is a statement representing try/catch blocks
and there might be some more down the grammar
This is an excerpt showing the expressions part:
expression:
assignment-expression
expression "," assignment-expression
assignment-expression:
conditional-expression
logical-or-expression assignment-operator initializer-clause
throw-expression
expressions often are, or contain, assignments
conditional-expression (sounds misleading) refers to usage of the operators (+, -, *, /, &, |, &&, ||, ...)
throw-expression - uh? the throw clause is an expression too
The de-facto basis of these concepts is:
Expressions: A syntactic category whose instance can be evaluated to a value.
Statement: A syntactic category whose instance may be involved with evaluations of an expression and the resulted value of the evaluation (if any) is not guaranteed available.
Apart from the very early context of FORTRAN in its first decades, both definitions of expressions and statements in the accepted answer are obviously wrong:
Expressions can be unevaluated operands. Values are never produced from them.
Subexpressions in non-strict evaluations can be definitely unevaluated.
Most C-like languages have so-called short-circuit evaluation rules that conditionally skip the evaluation of some subexpressions when it cannot change the final result, in spite of the side effects.
C and some C-like languages have the notion of an unevaluated operand, which may even be normatively defined in the language specification. Such constructs are used to avoid evaluation entirely, so that the remaining context information (e.g. types or alignment requirements) can be determined statically without changing the behavior of the translated program.
For example, an expression used as the operand of the sizeof operator is never evaluated.
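The short-circuit case is easy to observe. Here is a small sketch in Python (used only for illustration; it has no counterpart to sizeof, so it covers the short-circuit point alone). The helper noisy and the list calls are invented for the example and simply record which subexpressions were actually evaluated.

```python
calls = []


def noisy(value):
    """Record that this subexpression was evaluated, then return it."""
    calls.append(value)
    return value


# The right operand of `and` is skipped when the left one is falsy.
and_result = noisy(False) and noisy(True)
calls_after_and = list(calls)

calls.clear()

# The right operand of `or` is skipped when the left one is truthy.
or_result = noisy(True) or noisy(False)
calls_after_or = list(calls)

print(and_result, calls_after_and)  # False [False]
print(or_result, calls_after_or)    # True [True]
```

In both cases the second call to noisy is a perfectly good subexpression that never produces a value.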
Statements have nothing to do with line constructs. They can do something more than expressions, depending on the language specifications.
Modern Fortran, as the direct descendant of the old FORTRAN, has concepts of executable statements and nonexecutable statements.
Similarly, C++ defines declarations as the top-level subcategory of a translation unit. A declaration in C++ is a statement. (This is not true in C.) There are also expression-statements like Fortran's executable statements.
For the purposes of comparison with expressions, only the "executable" statements matter. But you can't ignore the fact that statements are already generalized to be the constructs forming translation units in such imperative languages. So, as you can see, the definitions of the category vary a lot. Probably the only common property remaining among these languages is that statements are expected to be interpreted in lexical order (for most users, left-to-right and top-to-bottom).
(BTW, I want to add [citation needed] to that answer concerning materials about C because I can't recall whether DMR has such opinions. It seems not, otherwise there should be no reasons to preserve the functionality duplication in the design of C: notably, the comma operator vs. the statements.)
(The following rationale is not the direct response to the original question, but I feel it necessary to clarify something already answered here.)
Nevertheless, it is doubtful that we need a specific category of "statements" in general-purpose programming languages:
Statements are not guaranteed to have more semantic capabilities over expressions in usual designs.
Many languages have already successfully abandoned the notion of statements to get clean, neat and consistent overall designs.
In such languages, expressions can do everything old-style statements can do: just drop the unused results when the expressions are evaluated, either by leaving the results explicitly unspecified (e.g. in RnRS Scheme), or having a special value (as a value of a unit type) not producible from normal expression evaluations.
The lexical order rules of evaluation of expressions can be replaced by explicit sequence control operator (e.g. begin in Scheme) or syntactic sugar of monadic structures.
The lexical order rules of other kinds of "statements" can be derived as syntactic extensions (using hygienic macros, for example) to get the similar syntactic functionality. (And it can actually do more.)
In contrast, statements cannot have such conventional rules, because they don't compose on evaluation: there is just no common notion of "substatement evaluation". (Even if there were, I doubt it could be much more than a copy and paste of the existing rules for evaluating expressions.)
Typically, languages preserving statements will also have expressions to express computations, and there is a top-level subcategory of statements reserved for expression evaluation. For example, C++ has the so-called expression-statement as that subcategory, and uses the discarded-value expression evaluation rules to specify the general cases of full-expression evaluations in such contexts. Some languages like C# choose to refine the contexts to simplify the use cases, but this bloats the specification more.
For users of programming languages, the significance of statements may confuse them further.
The separation of rules of expressions and statements in the languages requires more effort to learn a language.
The naive lexical order interpretation hides the more important notion: expression evaluation. (This is probably most problematic over all.)
Even if the evaluations of full expressions in statements are constrained by lexical order, subexpressions are not (necessarily). Users should ultimately learn this in addition to any rules coupled to statements. (Consider how to get a newbie to see that ++i + ++i is meaningless in C.)
Some languages like Java and C# further constrain the order of evaluation of subexpressions so that users can remain ignorant of the evaluation rules. This can be even more problematic.
This seems overspecified to users who have already learned the idea of expression evaluation. It also encourages the user community to follow the blurred mental model of the language design.
It bloats the language specification even more.
It is harmful to optimization by missing the expressiveness of nondeterminism on evaluations, before more complicated primitives are introduced.
A few languages like C++ (particularly, C++17) specify more subtle contexts of evaluation rules, as a compromise of the problems above.
It bloats the language specification a lot.
This goes totally against simplicity for average users...
So why statements? Anyway, the history is already a mess. It seems most language designers do not take their choice carefully.
Worse, it even gives some type system enthusiasts (who are not familiar enough with the PL history) some misconceptions that type systems must have important things to do with the more essential designs of rules on the operational semantics.
Seriously, reasoning that depends on types is not that bad in many cases, but it is particularly unconstructive in this special one. Even experts can screw things up.
For example, someone emphasizes the well-typing nature as the central argument against the traditional treatment of undelimited continuations. Although the conclusion is somewhat reasonable and the insights about composed functions are OK (but still far too naive about the essence), this argument is not sound because it totally ignores the "side channel" approach in practice like _Noreturn any_of_returnable_types (in C11) to encode Falsum. And strictly speaking, an abstract machine with unpredictable state is not identical to "a crashed computer".
A statement is a special case of an expression, one with void type. The tendency of languages to treat statements differently often causes problems, and it would be better if they were properly generalized.
For example, in C# we have the very useful Func<T1, T2, T3, TResult> overloaded set of generic delegates. But we also have to have a corresponding Action<T1, T2, T3> set as well, and general purpose higher-order programming constantly has to be duplicated to deal with this unfortunate bifurcation.
Trivial example - a function that checks whether a reference is null before calling onto another function:
TResult IfNotNull<TValue, TResult>(TValue value, Func<TValue, TResult> func)
where TValue : class
{
return (value == null) ? default(TResult) : func(value);
}
Could the compiler deal with the possibility of TResult being void? Yes. All it has to do is require that return is followed by an expression that is of type void. The result of default(void) would be of type void, and the func being passed in would need to be of the form Func<TValue, void> (which would be equivalent to Action<TValue>).
A number of other answers imply that you can't chain statements like you can with expressions, but I'm not sure where this idea comes from. We can think of the ; that appears after statements as a binary infix operator, taking two expressions of type void and combining them into a single expression of type void.
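To make the "; as a binary operator" reading concrete, here is a rough sketch in Python, where a function that returns None plays the role of an expression of type void. The names Action, seq and log are invented for this illustration; this is one way to model the idea, not how any compiler treats the semicolon.

```python
from typing import Callable, List

# A "statement" modelled as a zero-argument function returning None.
Action = Callable[[], None]


def seq(first: Action, second: Action) -> Action:
    """Model `first; second`: combine two actions into one action."""
    def combined() -> None:
        first()
        second()
    return combined


log: List[str] = []
step_a: Action = lambda: log.append("A")
step_b: Action = lambda: log.append("B")
step_c: Action = lambda: log.append("C")

# (A; B); C built purely by composition, then run once.
program = seq(seq(step_a, step_b), step_c)
result = program()

print(log)     # ['A', 'B', 'C']
print(result)  # None
```

Because seq returns another Action, the chaining nests just like any other binary operator would.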
Statements -> Instructions to follow sequentially
Expressions -> Evaluation that returns a value
Statements are basically like steps or instructions in an algorithm; the result of executing a statement is the updating of the instruction pointer (as it is called in assembly).
Expressions do not imply an execution order at first sight; their purpose is to evaluate and return a value. In imperative programming languages the evaluation of an expression has an order, but that is a consequence of the imperative model, not their essence.
Examples of Statements:
for
goto
return
if
(all of them imply the advance of the line (statement) of execution to another line)
Example of expressions:
2+2
(it doesn't imply the idea of execution, but of the evaluation)
Statement,
A statement is a procedural building-block from which all C# programs are constructed. A statement can declare a local variable or constant, call a method, create an object, or assign a value to a variable, property, or field.
A series of statements surrounded by curly braces form a block of code. A method body is one example of a code block.
bool IsPositive(int number)
{
if (number > 0)
{
return true;
}
else
{
return false;
}
}
Statements in C# often contain expressions. An expression in C# is a fragment of code containing a literal value, a simple name, or an operator and its operands.
Expression,
An expression is a fragment of code that can be evaluated to a single value, object, method, or namespace. The two simplest types of expressions are literals and simple names. A literal is a constant value that has no name.
int i = 5;
string s = "Hello World";
Both i and s are simple names identifying local variables. When those variables are used in an expression, the value of the variable is retrieved and used for the expression.
I prefer the meaning of statement in the formal logic sense of the word. It is one that changes the state of one or more of the variables in the computation, enabling a true or false statement to be made about their value(s).
I guess there will always be confusion in the computing world and science in general when new terminology or words are introduced, existing words are 'repurposed' or users are ignorant of the existing, established or 'proper' terminology for what they are describing
Here is the summary of one of the simplest answers I found.
Originally answered by Anders Kaseorg
A statement is a complete line of code that performs some action, while an expression is any section of the code that evaluates to a value.
Expressions can be combined “horizontally” into larger expressions using operators, while statements can only be combined “vertically” by writing one after another, or with block constructs.
Every expression can be used as a statement (whose effect is to evaluate the expression and ignore the resulting value), but most statements cannot be used as expressions.
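Since the quoted answer is about Python, the asymmetry can be checked with the standard ast module. In the sketch below, top_level_node is a helper name invented for the example; ast.Expr is the statement node Python uses to wrap a bare expression.

```python
import ast


def top_level_node(source: str) -> ast.stmt:
    """Parse `source` and return its first top-level statement node."""
    return ast.parse(source).body[0]


expr_stmt = top_level_node("1 + 2")
assignment = top_level_node("x = 5")

# A bare expression is accepted as a statement: an Expr node wrapping a BinOp.
print(type(expr_stmt).__name__, type(expr_stmt.value).__name__)  # Expr BinOp

# An assignment is a statement node and is not an expression node.
print(type(assignment).__name__, isinstance(assignment, ast.expr))  # Assign False
```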
http://www.quora.com/Python-programming-language-1/Whats-the-difference-between-a-statement-and-an-expression-in-Python
Statements are grammatically complete sentences. Expressions are not. For example
x = 5
reads as "x gets 5." This is a complete sentence. The code
(x + 5)/9.0
reads, "x plus 5 all divided by 9.0." This is not a complete sentence. The statement
while k < 10:
    print(k)
    k += 1
is a complete sentence. Notice that the loop header is not; "while k < 10," is a subordinating clause.
In a statement-oriented programming language, a code block is defined as a list of statements. In other words, a statement is a piece of syntax that you can put inside a code block without causing a syntax error.
Wikipedia defines the word statement similarly
In computer programming, a statement is a syntactic unit of an imperative programming language that expresses some action to be carried out. A program written in such a language is formed by a sequence of one or more statements
Notice the latter sentence (although "a program" in this case is technically wrong, because both C and Java reject a program that consists of nothing but statements).
Wikipedia defines the word expression as
An expression in a programming language is a syntactic entity that may be evaluated to determine its value
This is, however, false, because in Kotlin, throw Exception("") is an expression but when evaluated, it simply throws an exception, never returning any value.
In a statically typed programming language, every expression has a type. This definition, however, doesn't work in a dynamically typed programming language.
Personally, I define an expression as a piece of syntax that can be composed with an operator or function calls to yield a bigger expression. This is actually similar to the explanation of expression by Wikipedia:
It is a combination of one or more constants, variables, functions, and operators that the programming language interprets (according to its particular rules of precedence and of association) and computes to produce ("to return", in a stateful environment) another value
But, the problem is in C programming language, given a function executeSomething like this:
void executeSomething(void){
return;
}
Is executeSomething() an expression or is it a statement? According to my definition, it is a statement because as defined in Microsoft's C reference grammar,
You cannot use the (nonexistent) value of an expression that has type void in any way, nor can you convert a void expression (by implicit or explicit conversion) to any type except void
But the same page clearly indicates that such syntax is an expression.
A statement is a block of code that doesn't return anything and which is just a standalone unit of execution. For example-
if(a>=0)
    printf("Hello Human, I'm a statement");
An expression, on the other hand, returns or evaluates a new value. For example -
if(a>=0)
    return a+10; // a+10 is an expression because it evaluates to a new value
or
a=10+y; // This is also an expression because it returns a new value.
Expression
A piece of syntax which can be evaluated to some value. In other words, an expression is an accumulation of expression elements like literals, names, attribute access, operators or function calls which all return a value. In contrast to many other languages, not all language constructs are expressions. There are also statements which cannot be used as expressions, such as while. Assignments are also statements, not expressions.
Statement
A statement is part of a suite (a “block” of code). A statement is either an expression or one of several constructs with a keyword, such as if, while or for.
To improve on and validate my prior answer, definitions of programming language terms should be explained from computer science type theory when applicable.
An expression has a type other than the Bottom type, i.e. it has a value. A statement has the Unit or Bottom type.
From this it follows that a statement can only have any effect in a program when it creates a side-effect, because it either cannot return a value or it only returns the value of the Unit type, which is either nonassignable (in some languages, such as C's void) or (such as in Scala) can be stored for a delayed evaluation of the statement.
Obviously a #pragma or a /*comment*/ have no type and thus are differentiated from statements. Thus the only type of statement that would have no side-effects would be a non-operation. Non-operation is only useful as a placeholder for future side-effects. Any other action due to a statement would be a side-effect. Again a compiler hint, e.g. #pragma, is not a statement because it has no type.
Most precisely, a statement must have a "side-effect" (i.e. be imperative) and an expression must have a value type (i.e. not the bottom type).
The type of a statement is the unit type, but due to the halting theorem unit is a fiction, so let's say the bottom type.
Void is not precisely the bottom type (it isn't the subtype of all possible types). It exists in languages that don't have a completely sound type system. That may sound like a snobbish statement, but completeness such as variance annotations are critical to writing extensible software.
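As a loose illustration of the Unit versus Bottom distinction, here is a Python sketch using type hints. It assumes that None can stand in for a unit value and that typing.NoReturn can stand in for the bottom type; that mapping is an approximation for illustration, not a claim about Python's type theory. The function names record and fail are invented.

```python
from typing import List, NoReturn

events: List[str] = []


def record(message: str) -> None:
    """Unit-like: returns normally, and its only effect is a side effect."""
    events.append(message)


def fail(message: str) -> NoReturn:
    """Bottom-like: never returns a value to its caller."""
    raise RuntimeError(message)


# The unit-like result exists and can be stored, but it carries no information.
unit_value = record("door opened")

# The bottom-like call produces nothing to store; control leaves via the exception.
try:
    never_assigned = fail("no value comes back")
except RuntimeError as error:
    caught = str(error)

print(unit_value, events, caught)
```

The variable never_assigned is never bound, which is the practical difference between the two cases.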
Let's see what Wikipedia has to say on this matter.
https://en.wikipedia.org/wiki/Statement_(computer_science)
In computer programming a statement is the smallest standalone element of an imperative programming language that expresses some action to be carried out.
Many languages (e.g. C) make a distinction between statements and definitions, with a statement only containing executable code and a definition declaring an identifier, while an expression evaluates to a value only.