As of 2021, what are all common lisp special functions? - function

Special functions receive this name because they are different from macros and ordinary functions.
Just like macros, special functions do not evaluate their inputs. But unlike
macros, they do not return Lisp expressions that are to be evaluated. Special
functions provide the primitives on which Lisp is built, such as assignment,
conditionals, and block structure.
In 1989 [1], David S. Touretzky wrote that there were 24 built-in Common Lisp special functions:
BLOCK,
CATCH,
COMPILER-LET,
DECLARE,
EVAL-WHEN,
FLET,
FUNCTION,
GO,
IF,
LABELS,
LET,
LET*,
MACROLET,
MULTIPLE-VALUE-CALL,
MULTIPLE-VALUE-PROG1,
PROGN,
PROGV,
QUOTE,
RETURN-FROM,
SETQ,
TAGBODY,
THE,
THROW;
and, UNWIND-PROTECT.
He also said that:
This list may change with future revisions of the Common Lisp standard.
There was a new release of the book in 2014. However, the information stood the same. Even in the book in 2014, the text starts exactly like the first edition in 1989 with:
"As of mid-1989, the 24 built-in Common Lisp special functions are: ...." (page 507).
Not sure if they gave a careful look on this point, if they did, I guess they would have updated the year to be "As of mid-2014..."
Thus, as of mid-2021 is the list any different now from how it was in 1989?
Source:
[1] - COMMON LISP: A Gentle Introduction to Symbolic Computation

In the Common Lisp standard (which was published in 1994) there is no concept of a special function. The standard defines the concept of a special operator and these must not be functions. As Will Ness commented, the special operators are listed here in the Common Lisp HyperSpec (which is HTML pages derived from the standard - the original standard is published as a printed doc and a PDF file): 3.1.2.1.2.1 Special Forms. This is a fixed list -> the standard has no language mechanism provided to the Common Lisp user to add new ones. Though I think that some implementations of Common Lisp have a few additional special operators.
a form is an object meant to be evaluated: a symbol, a compound form or a self-evaluating object.
a compound form: a list with an operator or a lambda expression as the first element: a macro form, a function form, a special form or a lambda form
a special form: a compound form, which has a special operator as its first element
special operator: one of the symbols listed in chapter 3.1.2.1.2.1
Why does the book not use the terminology of the Common Lisp standard? Maybe the author was not aware of it or thought it to be too complex (or too much effort) to update the wording to use the standard wording - which would possibly mean changing a lot of things in the text.
Another useful reference is this late draft of the standard in PDF form: draft proposed ANSI CL standard. The draft has basically the same content as the published standard, but is freely available. The Common Lisp HyperSpec also has basically the same content, but in a different form.

Why does the book not use the terminology of the Common Lisp standard? Maybe the author was not aware of it or thought it to be too complex (or too much effort) to update the wording to use the standard wording - which would possibly mean changing a lot of things in the text.
I believe that the standard used an early form of CLTL, the copyright was "relaxed" to make this possible per Guy Steele Jr.

Related

Having Multiple Commands for Calling a Specific Programming Language: To Provide a Delimiter-less Option or Not?

After re-reading the off/on topic lists, I'm still not certain if this question is best posted to this site, so apologies in advance, if it is not.
Overview:
I am working on a project that mixes several programming languages and we are trying to determine important considerations for the command used to call one in particular.
For definiteness, I will list the specific languages; however, I think the principles ought to be general, so familiarity with these specific languages is not really essential.
Specific Context
Specifically, we are using: Maxima, KaTeX, Markdown and HTML). While building the prototype, we have used the following (I believe, standard) conventions:
KaTeX delimited by $ $ or $$ $$;
HTML delimited by < > </ > pairs;
Markdown works anywhere in the body, except within KaTeX or Maxima environments;
The only non-standard convention we used during this design phase was to call on Maxima using \comp{<Maxima commands>}. This command works within all the other environments (which is desired).
Now that we are ready to start using the platform, it has become apparent that this temporary command for calling Maxima is cumbersome for our users. The vast majority of use cases involve simply calling a single variable or function, e.g.
As such, we have $\eval{function-name()}(\eval{variable-name})$
as opposed to actually using Maxima for computation, e.g.
Here, it is clear that $\eval{a} + \eval{b} = \eval{a+b}$
(where \eval{a+b} would return the actual sum, as calculated by Maxima).
As such, our users would prefer a delimiter-less command option for invoking a single variable or function, e.g. \#<variable-name-in-Maxima> and \#<function-name>(<argument>) (where # is some reserved character not used in the other languages), while also having a delimited alternative for the (much less frequent) cases where they actually want to use Maxima for computation; perhaps something like \#{a+b}.
However, we have a general sense that this is not a best practice, even though we can't foresee any specific issue.
"Research" / Comparisons:
Indeed, there is precedence for delimit-less expressions for single arguments like x^2 (on any calculator) or Knuth's a \over b in TeX (which persists in LaTeX with \frac12 being parsed as \frac{1}{2}.
IIRC Knuth's point was that this delimit-less notation was more semantic (and so, in his view, preferable), and because delimiters can be added, ambiguity can be avoided, whenever the need arises: e.g. x^{22}, {a+b}\over{c+d} and \frac{12}{3}.
The Question, Proper:
Can anyone point to or explain actual shortcomings / risks associated with a dual solution like:
\#<var>, \#<function>(<arg>) and,
\#[<extended expression>],
(where # is a reserved (& escapable) character), for calling one language amongst others, as opposed to only using a delimited command?
Any alternative suggestions for how to achieve the ease-of-use and more semantic code enabled by the above solution, while keeping the code unambiguous would be very much welcome and appreciated.

Common Lisp a Lisp-n?

I'm aware that Common Lisp has different binding environments for functions and variables, but I believe that it also has another binding environment for tagbody labels. Are there even more binding environments than this? If so, then is it fair to categorize Common Lisp as a Lisp-2?
These question are not meant as pedantry or bike-shedding, I only want to gain a better understanding of Common Lisp and hopefully get some pointers into where to dig deeper into its spec.
I'm aware that Common Lisp has different binding environments for
functions and variables,
That would be namespaces, according to the HyperSpec:
namespace n. 1. bindings whose denotations are restricted to a
particular kind. The bindings of names to tags is the tag
namespace.'' 2. any mapping whose domain is a set of names.A
package defines a namespace.''
(Point 1.)
but I believe that it also has another binding environment for tagbody
labels. Are there even more binding environments than this?
Yes, there are more namespaces. I even remember a little snippet exposing most of them, but unfortunately, I can't find it anymore¹. It at least exposed variable, function, tag, and block namespaces, but maybe also types and declarations were included. There is also another SO answer that lists these namespaces.
If so, then is it fair to categorize Common Lisp as a Lisp-2?
In the comments to the above linked answer, Rainer Joswig agrees that the "general debate is about Lisp-1 against Lisp-n".
The "2" might be due to the relative importance of the distinction between value and function slots, or because the objects of the other namespaces aren't first-class objects. For example in the Gabriel/Pitman paper referenced in the other answer:
There is really a larger number of namespaces than just the two that
are discussed here. As we noted earlier, other namespaces include at
least those of blocks and tags; type names and declaration names are
often considered namespaces. Thus, the names Lisp1 and Lisp2, which we
have been using are misleading. The names Lisp5 and Lisp6 might be
more appropriate.
and:
In this paper, there are two namespaces of concern, which we
shall term the "value namespace" and the "function namespace." Other
namespaces include tag names (used by TAGBODY and GO) and block names
(used by BLOCK and RETURN-FROM), but the objects in the location parts
of their bindings are not first-class Lisp objects.
¹) PAIP, p. 837:
(defun f (f)
(block f
(tagbody
f (catch 'f
(if (typep f 'f)
(throw 'f (go f)))
(funcall #'f (get (symbol-value 'f) 'f))))))
In PAIP, Peter Norvig says "Common Lisp has at least seven name spaces" (p. 836).
The seven he lists are:
functions and macros
variables
special variables
data types
label for go statements within a tagbody
a block name for return-from statements within a block
symbols inside a quoted expression
Peter Seibel makes a great point in his comp.lang.lisp post about "compiler" versus "library" namespaces. I think all of Norvig's seven namespaces are "compiler" namespaces.
See for example this old discussion post from comp.lang.lisp:
http://coding.derkeiler.com/Archive/Lisp/comp.lang.lisp/2004-04/0737.html
Yes - http://www.lispworks.com/documentation/lw51/CLHS/Body/t_symbol.htm#symbol specifies a separate value cell and function cell, consonant with a lisp-2.
There is also a property list, but as there is no context in which a symbol "naturally" refers to its property list, it is not usual to describe CL as a lisp-3 (in fact, I am not aware of any language usually so designated).

What does 'Language Construct' mean?

I am learning C from 'Programming in C' by Stephen Kochan.
Though the author is careful from the beginning only not to confuse the students with jargon, but occasionally he has used few terms without explaining their meaning. I have figured out the meaning of many such terms with the help of internet.
However, I could not understand the exactly meaning of the phrase 'language construct', and unfortunately the web doesn't provide a good explanation.
Considering I am a beginner, what does 'language construct' mean?
First, you need to understand what a constructed language Formal Language is. All programming languages are constructed formal languages (read the reference). You may then read a little bit about compiler construction, including this reference as well.
Going back to your question, consider this: The English language (a natural language) has tokens 'A-Z/0-9/,;"...' which we use to build "words" and we use languages rules to build sentences out of words. So, in the English language, a construct is what we build out of tokens.
Consider this brick-and-mortar example: Imagine if you set out to build a house, the basic materials you might use are: sand, iron, wood, cement, water (just five for simplicity). Anything you build out of these 4 or 5+ items would be a "construct", which in turn helps you build your house.
I have intentionally omitted details to further simplify the answer; hope this is helpful.
A language construct is a piece of language syntax. For example, the following is a language construct in C that lets you control the flow of a program:
if ( condition ) {
/* when condition is true */
} else {
/* when condition is false */
}
They usually use the term language construct because these are parts of most programming languages, but may be written differently, depending on the language. For example, a similar language construct in bourne shell would be:
if COMMAND; then
# when command returns 0
else
# when command returns anything else
fi
The function of this construct is the same, however, the way it's written is a bit different.
Hope this helps. If you need more detail, you may want to do a bit more research. As one of the comments suggests, Wikipedia may be helpful.
They are the base units from which the language is built up. They can't be used as a function rollback. They are directly called by the parser.
It includes all the syntax, semantics and coding styles of a language.
For more clarification you may refer to this question.
Wikipedia definition:
A language construct is a syntactically allowable part of a program that may be formed from one or more lexical tokens in accordance with the rules of a programming language.
The term Language Constructs is often used as a synonym for control structure, and should not be confused with a function.
Without seeing the context that the phrase is used in, I cannot be sure, but generally the phrase 'language construct' just means the combination of keywords, grammar and structure of a coding language. Basically, how to format/write/construct a piece of code.
Let say you want to create a class containing methods and properties, so:
Construct is an architecture of a class you are about to create. The architecture of the class consists of methods and properties created by you by using predefined utilities (such as: 'if', 'else', 'switch', 'break', etc)
That's my take on construct.
In reference to a programming language
Language Constructs mean the basic constructs of a programming languge e.g
1. Conditions (if, else, switch)
2. Loops (For, While, Do-while) etc
C is a structural language so while compiling your code everything thing goes statement by statement. Thus it becomes necessary to place your statement properly. This placing i.e. putting your statement correctly is your language construct else there may be syntax error or logical error.
Language constructs according to the GCSE book are basic building block of a programming language. that are
1. Sequential,
2. Selection, if, if/else
3. Iteration, while, for
Language construct is a piece of language syntax.
Example:
A declaration of a variable is a language construct:
{
int a; // declaration of a variable "a"
}
A language construct is a piece of syntax that the compiler has intimate knowledge about, usually because it needs to handle it specially. Typical examples of language constructs are the short-circuiting operators found in many imperative languages. Because these operators require lazy evaluation in an otherwise eager language, they must be handled specially by the compiler.
So, a stricter definition of a language construct may be: a syntactical form that is handled specially by the compiler, having functionality that cannot be implemented by a user.

Is there a programming language with no controls structures or operators?

Like Smalltalk or Lisp?
EDIT
Where control structures are like:
Java Python
if( condition ) { if cond:
doSomething doSomething
}
Or
Java Python
while( true ) { while True:
print("Hello"); print "Hello"
}
And operators
Java, Python
1 + 2 // + operator
2 * 5 // * op
In Smalltalk ( if I'm correct ) that would be:
condition ifTrue:[
doSomething
]
True whileTrue:[
"Hello" print
]
1 + 2 // + is a method of 1 and the parameter is 2 like 1.add(2)
2 * 5 // same thing
how come you've never heard of lisp before?
You mean without special syntax for achieving the same?
Lots of languages have control structures and operators that are "really" some form of message passing or functional call system that can be redefined. Most "pure" object languages and pure functional languages fit the bill. But they are all still going to have your "+" and some form of code block--including SmallTalk!--so your question is a little misleading.
Assembly
Befunge
Prolog*
*I cannot be held accountable for any frustration and/or headaches caused by trying to get your head around this technology, nor am I liable for any damages caused by you due to aforementioned conditions including, but not limited to, broken keyboard, punched-in screen and/or head-shaped dents in your desk.
Pure lambda calculus? Here's the grammar for the entire language:
e ::= x | e1 e2 | \x . e
All you have are variables, function application, and function creation. It's equivalent in power to a Turing machine. There are well-known codings (typically "Church encodings") for such constructs as
If-then-else
while-do
recursion
and such datatypes as
Booleans
integers
records
lists, trees, and other recursive types
Coding in lambda calculus can be a lot of fun—our students will do it in the undergraduate languages course next spring.
Forth may qualify, depending on exactly what you mean by "no control structures or operators". Forth may appear to have them, but really they are all just symbols, and the "control structures" and "operators" can be defined (or redefined) by the programmer.
What about Logo or more specifically, Turtle Graphics? I'm sure we all remember that, PEN UP, PEN DOWN, FORWARD 10, etc.
The SMITH programming language:
http://esolangs.org/wiki/SMITH
http://catseye.tc/projects/smith/
It has no jumps and is Turing complete. I've also made a Haskell interpreter for this bad boy a few years back.
I'll be first to mention brain**** then.
In Tcl, there's no control structures; there's just commands and they can all be redefined. Every last one. There's also no operators. Well, except for in expressions, but that's really just an imported foreign syntax that isn't part of the language itself. (We can also import full C or Fortran or just about anything else.)
How about FRACTRAN?
FRACTRAN is a Turing-complete esoteric programming language invented by the mathematician John Conway. A FRACTRAN program is an ordered list of positive fractions together with an initial positive integer input n. The program is run by updating the integer (n) as follows:
for the first fraction f in the list for which nf is an integer, replace n by nf
repeat this rule until no fraction in the list produces an integer when multiplied by n, then halt.
Of course there is an implicit control structure in rule 2.
D (used in DTrace)?
APT - (Automatic Programmed Tool) used extensively for programming NC machine tools.
The language also has no IO capabilities.
XSLT (or XSL, some say) has control structures like if and for, but you should generally avoid them and deal with everything by writing rules with the correct level of specificity. So the control structures are there, but are implied by the default thing the translation engine does: apply potentially-recursive rules.
For and if (and some others) do exist, but in many many situations you can and should work around them.
How about Whenever?
Programs consist of "to-do list" - a series of statements which are executed in random order. Each statement can contain a prerequisite, which if not fulfilled causes the statement to be deferred until some (random) later time.
I'm not entirely clear on the concept, but I think PostScript meets the criteria, although it calls all of its functions operators (the way LISP calls all of its operators functions).
Makefile syntax doesn't seem to have any operators or control structures. I'd say it's a programming language but it isn't Turing Complete (without extensions to the POSIX standard anyway)
So... you're looking for a super-simple language? How about Batch programming? If you have any version of Windows, then you have access to a Batch compiler. It's also more useful than you'd think, since you can carry out basic file functions (copy, rename, make directory, delete file, etc.)
http://www.csulb.edu/~murdock/dosindex.html
Example
Open notepad and make a .Bat file on your Windows box.
Open the .Bat file with notepad
In the first line, type "echo off"
In the second line, type "echo hello world"
In the third line, type "pause"
Save and run the file.
If you're looking for a way to learn some very basic programming, this is a good way to start. (Just be careful with the Delete and Format commands. Don't experiment with those.)

Why shouldn't I use "Hungarian Notation"?

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I know what Hungarian refers to - giving information about a variable, parameter, or type as a prefix to its name. Everyone seems to be rabidly against it, even though in some cases it seems to be a good idea. If I feel that useful information is being imparted, why shouldn't I put it right there where it's available?
See also: Do people use the Hungarian naming conventions in the real world?
vUsing adjHungarian nnotation vmakes nreading ncode adjdifficult.
Most people use Hungarian notation in a wrong way and are getting wrong results.
Read this excellent article by Joel Spolsky: Making Wrong Code Look Wrong.
In short, Hungarian Notation where you prefix your variable names with their type (string) (Systems Hungarian) is bad because it's useless.
Hungarian Notation as it was intended by its author where you prefix the variable name with its kind (using Joel's example: safe string or unsafe string), so called Apps Hungarian has its uses and is still valuable.
Joel is wrong, and here is why.
That "application" information he's talking about should be encoded in the type system. You should not depend on flipping variable names to make sure you don't pass unsafe data to functions requiring safe data. You should make it a type error, so that it is impossible to do so. Any unsafe data should have a type that is marked unsafe, so that it simply cannot be passed to a safe function. To convert from unsafe to safe should require processing with some kind of a sanitize function.
A lot of the things that Joel talks of as "kinds" are not kinds; they are, in fact, types.
What most languages lack, however, is a type system that's expressive enough to enforce these kind of distinctions. For example, if C had a kind of "strong typedef" (where the typedef name had all the operations of the base type, but was not convertible to it) then a lot of these problems would go away. For example, if you could say, strong typedef std::string unsafe_string; to introduce a new type unsafe_string that could not be converted to a std::string (and so could participate in overload resolution etc. etc.) then we would not need silly prefixes.
So, the central claim that Hungarian is for things that are not types is wrong. It's being used for type information. Richer type information than the traditional C type information, certainly; it's type information that encodes some kind of semantic detail to indicate the purpose of the objects. But it's still type information, and the proper solution has always been to encode it into the type system. Encoding it into the type system is far and away the best way to obtain proper validation and enforcement of the rules. Variables names simply do not cut the mustard.
In other words, the aim should not be "make wrong code look wrong to the developer". It should be "make wrong code look wrong to the compiler".
I think it massively clutters up the source code.
It also doesn't gain you much in a strongly typed language. If you do any form of type mismatch tomfoolery, the compiler will tell you about it.
Hungarian notation only makes sense in languages without user-defined types. In a modern functional or OO-language, you would encode information about the "kind" of value into the datatype or class rather than into the variable name.
Several answers reference Joels article. Note however that his example is in VBScript, which didn't support user-defined classes (for a long time at least). In a language with user-defined types you would solve the same problem by creating a HtmlEncodedString-type and then let the Write method accept only that. In a statically typed language, the compiler will catch any encoding-errors, in a dynamically typed you would get a runtime exception - but in any case you are protected against writing unencoded strings. Hungarian notations just turns the programmer into a human type-checker, with is the kind of job that is typically better handled by software.
Joel distinguishes between "systems hungarian" and "apps hungarian", where "systems hungarian" encodes the built-in types like int, float and so on, and "apps hungarian" encodes "kinds", which is higher-level meta-info about variable beyound the machine type, In a OO or modern functional language you can create user-defined types, so there is no distinction between type and "kind" in this sense - both can be represented by the type system - and "apps" hungarian is just as redundant as "systems" hungarian.
So to answer your question: Systems hungarian would only be useful in a unsafe, weakly typed language where e.g. assigning a float value to an int variable will crash the system. Hungarian notation was specifically invented in the sixties for use in BCPL, a pretty low-level language which didn't do any type checking at all. I dont think any language in general use today have this problem, but the notation lived on as a kind of cargo cult programming.
Apps hungarian will make sense if you are working with a language without user defined types, like legacy VBScript or early versions of VB. Perhaps also early versions of Perl and PHP. Again, using it in a modern languge is pure cargo cult.
In any other language, hungarian is just ugly, redundant and fragile. It repeats information already known from the type system, and you should not repeat yourself. Use a descriptive name for the variable that describes the intent of this specific instance of the type. Use the type system to encode invariants and meta info about "kinds" or "classes" of variables - ie. types.
The general point of Joels article - to have wrong code look wrong - is a very good principle. However an even better protection against bugs is to - when at all possible - have wrong code to be detected automatically by the compiler.
I always use Hungarian notation for all my projects. I find it really helpful when I'm dealing with 100s of different identifier names.
For example, when I call a function requiring a string I can type 's' and hit control-space and my IDE will show me exactly the variable names prefixed with 's' .
Another advantage, when I prefix u for unsigned and i for signed ints, I immediately see where I am mixing signed and unsigned in potentially dangerous ways.
I cannot remember the number of times when in a huge 75000 line codebase, bugs were caused (by me and others too) due to naming local variables the same as existing member variables of that class. Since then, I always prefix members with 'm_'
Its a question of taste and experience. Don't knock it until you've tried it.
You're forgetting the number one reason to include this information. It has nothing to do with you, the programmer. It has everything to do with the person coming down the road 2 or 3 years after you leave the company who has to read that stuff.
Yes, an IDE will quickly identify types for you. However, when you're reading through some long batches of 'business rules' code, it's nice to not have to pause on each variable to find out what type it is. When I see things like strUserID, intProduct or guiProductID, it makes for much easier 'ramp up' time.
I agree that MS went way too far with some of their naming conventions - I categorize that in the "too much of a good thing" pile.
Naming conventions are good things, provided you stick to them. I've gone through enough old code that had me constantly going back to look at the definitions for so many similarly-named variables that I push "camel casing" (as it was called at a previous job). Right now I'm on a job that has many thousand of lines of completely uncommented classic ASP code with VBScript and it's a nightmare trying to figure things out.
Tacking on cryptic characters at the beginning of each variable name is unnecessary and shows that the variable name by itself isn't descriptive enough. Most languages require the variable type at declaration anyway, so that information is already available.
There's also the situation where, during maintenance, a variable type needs to change. Example: if a variable declared as "uint_16 u16foo" needs to become a 64-bit unsigned, one of two things will happen:
You'll go through and change each variable name (making sure not to hose any unrelated variables with the same name), or
Just change the type and not change the name, which will only cause confusion.
Joel Spolsky wrote a good blog post about this.
http://www.joelonsoftware.com/articles/Wrong.html
Basically it comes down to not making your code harder to read when a decent IDE will tell you want type the variable is if you can't remember. Also, if you make your code compartmentalized enough, you don't have to remember what a variable was declared as three pages up.
Isn't scope more important than type these days, e.g.
* l for local
* a for argument
* m for member
* g for global
* etc
With modern techniques of refactoring old code, search and replace of a symbol because you changed its type is tedious, the compiler will catch type changes, but often will not catch incorrect use of scope, sensible naming conventions help here.
There is no reason why you should not make correct use of Hungarian notation. It's unpopularity is due to a long-running back-lash against the mis-use of Hungarian notation, especially in the Windows APIs.
In the bad-old days, before anything resembling an IDE existed for DOS (odds are you didn't have enough free memory to run the compiler under Windows, so your development was done in DOS), you didn't get any help from hovering your mouse over a variable name. (Assuming you had a mouse.) What did you did have to deal with were event callback functions in which everything was passed to you as either a 16-bit int (WORD) or 32-bit int (LONG WORD). You then had to cast those parameter to the appropriate types for the given event type. In effect, much of the API was virtually type-less.
The result, an API with parameter names like these:
LRESULT CALLBACK WindowProc(HWND hwnd,
UINT uMsg,
WPARAM wParam,
LPARAM lParam);
Note that the names wParam and lParam, although pretty awful, aren't really any worse than naming them param1 and param2.
To make matters worse, Window 3.0/3.1 had two types of pointers, near and far. So, for example, the return value from memory management function LocalLock was a PVOID, but the return value from GlobalLock was an LPVOID (with the 'L' for long). That awful notation then got extended so that a long pointer string was prefixed lp, to distinguish it from a string that had simply been malloc'd.
It's no surprise that there was a backlash against this sort of thing.
Hungarian Notation can be useful in languages without compile-time type checking, as it would allow developer to quickly remind herself of how the particular variable is used. It does nothing for performance or behavior. It is supposed to improve code readability and is mostly a matter a taste and coding style. For this very reason it is criticized by many developers -- not everybody has the same wiring in the brain.
For the compile-time type-checking languages it is mostly useless -- scrolling up a few lines should reveal the declaration and thus type. If you global variables or your code block spans for much more than one screen, you have grave design and reusability issues. Thus one of the criticisms is that Hungarian Notation allows developers to have bad design and easily get away with it. This is probably one of the reasons for hatered.
On the other hand, there can be cases where even compile-time type-checking languages would benefit from Hungarian Notation -- void pointers or HANDLE's in win32 API. These obfuscates the actual data type, and there might be a merit to use Hungarian Notation there. Yet, if one can know the type of data at build time, why not to use the appropriate data type.
In general, there are no hard reasons not to use Hungarian Notation. It is a matter of likes, policies, and coding style.
As a Python programmer, Hungarian Notation falls apart pretty fast. In Python, I don't care if something is a string - I care if it can act like a string (i.e. if it has a ___str___() method which returns a string).
For example, let's say we have foo as an integer, 12
foo = 12
Hungarian notation tells us that we should call that iFoo or something, to denote it's an integer, so that later on, we know what it is. Except in Python, that doesn't work, or rather, it doesn't make sense. In Python, I decide what type I want when I use it. Do I want a string? well if I do something like this:
print "The current value of foo is %s" % foo
Note the %s - string. Foo isn't a string, but the % operator will call foo.___str___() and use the result (assuming it exists). foo is still an integer, but we treat it as a string if we want a string. If we want a float, then we treat it as a float. In dynamically typed languages like Python, Hungarian Notation is pointless, because it doesn't matter what type something is until you use it, and if you need a specific type, then just make sure to cast it to that type (e.g. float(foo)) when you use it.
Note that dynamic languages like PHP don't have this benefit - PHP tries to do 'the right thing' in the background based on an obscure set of rules that almost no one has memorized, which often results in catastrophic messes unexpectedly. In this case, some sort of naming mechanism, like $files_count or $file_name, can be handy.
In my view, Hungarian Notation is like leeches. Maybe in the past they were useful, or at least they seemed useful, but nowadays it's just a lot of extra typing for not a lot of benefit.
The IDE should impart that useful information. Hungarian might have made some sort (not a whole lot, but some sort) of sense when IDE's were much less advanced.
Apps Hungarian is Greek to me--in a good way
As an engineer, not a programmer, I immediately took to Joel's article on the merits of Apps Hungarian: "Making Wrong Code Look Wrong". I like Apps Hungarian because it mimics how engineering, science, and mathematics represent equations and formulas using sub- and super-scripted symbols (like Greek letters, mathematical operators, etc.). Take a particular example of Newton's Law of Universal Gravity: first in standard mathematical notation, and then in Apps Hungarian pseudo-code:
frcGravityEarthMars = G * massEarth * massMars / norm(posEarth - posMars)
In the mathematical notation, the most prominent symbols are those representing the kind of information stored in the variable: force, mass, position vector, etc. The subscripts play second fiddle to clarify: position of what? This is exactly what Apps Hungarian is doing; it's telling you the kind of thing stored in the variable first and then getting into specifics--about the closest code can get to mathematical notation.
Clearly strong typing can resolve the safe vs. unsafe string example from Joel's essay, but you wouldn't define separate types for position and velocity vectors; both are double arrays of size three and anything you're likely to do to one might apply to the other. Furthermore, it make perfect sense to concatenate position and velocity (to make a state vector) or take their dot product, but probably not to add them. How would typing allow the first two and prohibit the second, and how would such a system extend to every possible operation you might want to protect? Unless you were willing to encode all of math and physics in your typing system.
On top of all that, lots of engineering is done in weakly typed high-level languages like Matlab, or old ones like Fortran 77 or Ada.
So if you have a fancy language and IDE and Apps Hungarian doesn't help you then forget it--lots of folks apparently have. But for me, a worse than a novice programmer who is working in weakly or dynamically typed languages, I can write better code faster with Apps Hungarian than without.
It's incredibly redundant and useless is most modern IDEs, where they do a good job of making the type apparent.
Plus -- to me -- it's just annoying to see intI, strUserName, etc. :)
If I feel that useful information is being imparted, why shouldn't I put it right there where it's available?
Then who cares what anybody else thinks? If you find it useful, then use the notation.
Im my experience, it is bad because:
1 - then you break all the code if you need to change the type of a variable (i.e. if you need to extend a 32 bits integer to a 64 bits integer);
2 - this is useless information as the type is either already in the declaration or you use a dynamic language where the actual type should not be so important in the first place.
Moreover, with a language accepting generic programming (i.e. functions where the type of some variables is not determine when you write the function) or with dynamic typing system (i.e. when the type is not even determine at compile time), how would you name your variables? And most modern languages support one or the other, even if in a restricted form.
In Joel Spolsky's Making Wrong Code Look Wrong he explains that what everybody thinks of as Hungarian Notation (which he calls Systems Hungarian) is not what was it was really intended to be (what he calls Apps Hungarian). Scroll down to the I’m Hungary heading to see this discussion.
Basically, Systems Hungarian is worthless. It just tells you the same thing your compiler and/or IDE will tell you.
Apps Hungarian tells you what the variable is supposed to mean, and can actually be useful.
I've always thought that a prefix or two in the right place wouldn't hurt. I think if I can impart something useful, like "Hey this is an interface, don't count on specific behaviour" right there, as in IEnumerable, I oughtta do it. Comment can clutter things up much more than just a one or two character symbol.
It's a useful convention for naming controls on a form (btnOK, txtLastName etc.), if the list of controls shows up in an alphabetized pull-down list in your IDE.
I tend to use Hungarian Notation with ASP.NET server controls only, otherwise I find it too hard to work out what controls are what on the form.
Take this code snippet:
<asp:Label ID="lblFirstName" runat="server" Text="First Name" />
<asp:TextBox ID="txtFirstName" runat="server" />
<asp:RequiredFieldValidator ID="rfvFirstName" runat="server" ... />
If someone can show a better way of having that set of control names without Hungarian I'd be tempted to move to it.
Joel's article is great, but it seems to omit one major point:
Hungarian makes a particular 'idea' (kind + identifier name) unique,
or near-unique, across the codebase - even a very large codebase.
That's huge for code maintenance.
It means you can use good ol' single-line text search
(grep, findstr, 'find in all files') to find EVERY mention of that 'idea'.
Why is that important when we have IDE's that know how to read code?
Because they're not very good at it yet. This is hard to see in a small codebase,
but obvious in a large one - when the 'idea' might be mentioned in comments,
XML files, Perl scripts, and also in places outside source control (documents, wikis,
bug databases).
You do have to be a little careful even here - e.g. token-pasting in C/C++ macros
can hide mentions of the identifier. Such cases can be dealt with using
coding conventions, and anyway they tend to affect only a minority of the identifiers in the
codebase.
P.S. To the point about using the type system vs. Hungarian - it's best to use both.
You only need wrong code to look wrong if the compiler won't catch it for you. There are plenty of cases where it is infeasible to make the compiler catch it. But where it's feasible - yes, please do that instead!
When considering feasibility, though, do consider the negative effects of splitting up types. e.g. in C#, wrapping 'int' with a non-built-in type has huge consequences. So it makes sense in some situations, but not in all of them.
Debunking the benefits of Hungarian Notation
It provides a way of distinguishing variables.
If the type is all that distinguishes the one value from another, then it can only be for the conversion of one type to another. If you have the same value that is being converted between types, chances are you should be doing this in a function dedicated to conversion. (I have seen hungarianed VB6 leftovers use strings on all of their method parameters simply because they could not figure out how to deserialize a JSON object, or properly comprehend how to declare or use nullable types.) If you have two variables distinguished only by the Hungarian prefix, and they are not a conversion from one to the other, then you need to elaborate on your intention with them.
It makes the code more readable.
I have found that Hungarian notation makes people lazy with their variable names. They have something to distinguish it by, and they feel no need to elaborate to its purpose. This is what you will typically find in Hungarian notated code vs. modern: sSQL vs. groupSelectSql (or usually no sSQL at all because they are supposed to be using the ORM that was put in by earlier developers.), sValue vs. formCollectionValue (or usually no sValue either, because they happen to be in MVC and should be using its model binding features), sType vs. publishSource, etc.
It can't be readability. I see more sTemp1, sTemp2... sTempN from any given hungarianed VB6 leftover than everybody else combined.
It prevents errors.
This would be by virtue of number 2, which is false.
In the words of the master:
http://www.joelonsoftware.com/articles/Wrong.html
An interesting reading, as usual.
Extracts:
"Somebody, somewhere, read Simonyi’s paper, where he used the word “type,” and thought he meant type, like class, like in a type system, like the type checking that the compiler does. He did not. He explained very carefully exactly what he meant by the word “type,” but it didn’t help. The damage was done."
"But there’s still a tremendous amount of value to Apps Hungarian, in that it increases collocation in code, which makes the code easier to read, write, debug, and maintain, and, most importantly, it makes wrong code look wrong."
Make sure you have some time before reading Joel On Software. :)
Several reasons:
Any modern IDE will give you the variable type by simply hovering your mouse over the variable.
Most type names are way long (think HttpClientRequestProvider) to be reasonably used as prefix.
The type information does not carry the right information, it is just paraphrasing the variable declaration, instead of outlining the purpose of the variable (think myInteger vs. pageSize).
I don't think everyone is rabidly against it. In languages without static types, it's pretty useful. I definitely prefer it when it's used to give information that is not already in the type. Like in C, char * szName says that the variable will refer to a null terminated string -- that's not implicit in char* -- of course, a typedef would also help.
Joel had a great article on using hungarian to tell if a variable was HTML encoded or not:
http://www.joelonsoftware.com/articles/Wrong.html
Anyway, I tend to dislike Hungarian when it's used to impart information I already know.
Of course when 99% of programmers agree on something, there is something wrong. The reason they agree here is because most of them have never used Hungarian notation correctly.
For a detailed argument, I refer you to a blog post I have made on the subject.
http://codingthriller.blogspot.com/2007/11/rediscovering-hungarian-notation.html
I started coding pretty much the about the time Hungarian notation was invented and the first time I was forced to use it on a project I hated it.
After a while I realised that when it was done properly it did actually help and these days I love it.
But like all things good, it has to be learnt and understood and to do it properly takes time.
The Hungarian notation was abused, particularly by Microsoft, leading to prefixes longer than the variable name, and showing it is quite rigid, particularly when you change the types (the infamous lparam/wparam, of different type/size in Win16, identical in Win32).
Thus, both due to this abuse, and its use by M$, it was put down as useless.
At my work, we code in Java, but the founder cames from MFC world, so use similar code style (aligned braces, I like this!, capitals to method names, I am used to that, prefix like m_ to class members (fields), s_ to static members, etc.).
And they said all variables should have a prefix showing its type (eg. a BufferedReader is named brData). Which shown as being a bad idea, as the types can change but the names doesn't follow, or coders are not consistent in the use of these prefixes (I even see aBuffer, theProxy, etc.!).
Personally, I chose for a few prefixes that I find useful, the most important being b to prefix boolean variables, as they are the only ones where I allow syntax like if (bVar) (no use of autocast of some values to true or false).
When I coded in C, I used a prefix for variables allocated with malloc, as a reminder it should be freed later. Etc.
So, basically, I don't reject this notation as a whole, but took what seems fitting for my needs.
And of course, when contributing to some project (work, open source), I just use the conventions in place!