Using magic strings or constants in processing punctuation? - language-agnostic

We do a lot of lexical processing with arbitrary strings which include arbitrary punctuation. I am divided as to whether to use magic characters/strings or symbolic constants.
The examples should be read as language-independent although most are Java.
There are clear examples where punctuation has a semantic role and should be identified as a constant:
File.separator not "/" or "\\"; // a no-brainer as it is OS-dependent
and I write XML_PREFIX_SEPARATOR = ":";
However, let's say I need to replace all occurrences of "" with an empty string. I can write:
s = s.replaceAll("\"\"", "");
or
s = s.replaceAll(S_QUOT+S_QUOT, S_EMPTY);
(I have defined all common punctuation as S_FOO (string) and C_FOO (char))
In favour of magic strings/characters:
It's shorter
It's natural to read (sometimes)
The named constants may not be familiar (C_APOS vs '\'')
In favour of constants
It's harder to make typos (e.g. contrast "''" + '"' with S_APOS+S_APOS + C_QUOT)
It removes escaping problems: should a regex be "\\s+" or "\s+" or "\\\\s+"?
It's easy to search the code for punctuation
(There is a limit to this - I would not write regexes this way even though regex syntax is one of the most cognitively dysfunctional parts of all programming. I think we need a better syntax.)
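To make the comparison concrete, here is a minimal Python sketch of both styles. S_QUOT and S_EMPTY mirror the S_FOO convention described above; the function names are made up for illustration. Note that Python's str.replace takes a literal string while Java's replaceAll takes a regex, so this is an analogy rather than a translation.

```python
import re

# Hypothetical named constants in the S_FOO style from the question.
S_QUOT = '"'
S_EMPTY = ""


def strip_doubled_quotes_literal(s: str) -> str:
    """Magic-string version: short, but the quoting needs a careful read."""
    return s.replace('""', "")


def strip_doubled_quotes_named(s: str) -> str:
    """Named-constant version: longer, but nothing to escape."""
    return s.replace(S_QUOT + S_QUOT, S_EMPTY)


# A raw string sidesteps the "\\s+" versus "\s+" question for regexes.
WHITESPACE_RUN = re.compile(r"\s+")


def collapse_whitespace(s: str) -> str:
    """Replace every run of whitespace with a single space."""
    return WHITESPACE_RUN.sub(" ", s)
```

Both replace functions behave identically; the choice is purely about what the next reader finds easier to scan.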

If the definitions may change over time or between installations, I tend to put these things in a config file, and pick up the information at startup or on-demand (depending on the situation). Then provide a static class with read-only interface and clear names on the properties for exposing the information to the system.
Usage could look like this:
s = s.replaceAll(CharConfig.Quotation + CharConfig.Quotation, CharConfig.EmptyString);
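A rough Python sketch of that idea follows. The JSON format, the field names and load_char_config are all assumptions made for the example, not part of any real library; the point is only that the values are read once and then exposed read-only.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen=True makes instances read-only after creation
class CharConfig:
    quotation: str
    empty_string: str


def load_char_config(text: str) -> CharConfig:
    """Parse the (assumed) JSON config text once, e.g. at startup."""
    raw = json.loads(text)
    return CharConfig(
        quotation=raw["quotation"],
        empty_string=raw.get("empty_string", ""),
    )


# The config text would normally come from a file read at startup.
config = load_char_config(json.dumps({"quotation": '"'}))
cleaned = 'say ""hi""'.replace(config.quotation * 2, config.empty_string)
```

Trying to assign to config.quotation afterwards raises an error, which is the "read-only interface" part.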

For general string processing, I wouldn't use special symbols. A space is always going to be a space, and it's just more natural to read (and write!):
s.replace("String", " ");
than:
s.replace("String", S_SPACE);
I would take special care to use things like "\t" to represent tabs, for example, since they can't easily be distinguished from spaces in a string.
As for things like XML_PREFIX_SEPARATOR or FILE_SEPARATOR, you should probably never have to deal with constants like that, since you should use a library to do the work for you. For example, you shouldn't be hand-writing: dir + FILE_SEPARATOR + filename, but rather be calling: file_system_library.join(dir, filename) (or whatever equivalent you're using).
This way, you'll not only have an answer for things like the constants, you'll actually get much better handling of various edge cases which you probably aren't thinking about right now.
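As one possible illustration in Python, the standard pathlib module plays the role of the file_system_library above. The directory and file names are invented for the example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same join expression yields the right separator for each path flavour.
posix_path = PurePosixPath("reports") / "2024" / "summary.txt"
windows_path = PureWindowsPath("reports") / "2024" / "summary.txt"

# Edge case: a trailing separator on the directory does not get doubled.
tidy_path = PurePosixPath("reports/") / "summary.txt"
```

No separator constant appears anywhere in the calling code, which is the point being made.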

Related

Why does gensim ignore underscores during preprocessing?

Going through the gensim source, I noticed the simple_preprocess utility function clears all punctuations except those with words starting with an underscore, _. Is there a reason for this?
def simple_preprocess(doc, deacc=False, min_len=2, max_len=15):
    tokens = [
        token for token in tokenize(doc, lower=True, deacc=deacc, errors='ignore')
        if min_len <= len(token) <= max_len and not token.startswith('_')
    ]
    return tokens
The underscore ('_') isn't typically meaningful punctuation, but is often considered a "word" character in programming and text-processing.
For example, common regular-expression syntax uses \w to indicate a "word character". Per https://www.regular-expressions.info/shorthand.html :
\w stands for "word character". It always matches the ASCII characters
[A-Za-z0-9_]. Notice the inclusion of the underscore and digits. In
most flavors that support Unicode, \w includes many characters from
other scripts. There is a lot of inconsistency about which characters
are actually included. Letters and digits from alphabetic scripts and
ideographs are generally included. Connector punctuation other than
the underscore and numeric symbols that aren't digits may or may not
be included. XML Schema and XPath even include all symbols in \w.
Again, Java, JavaScript, and PCRE match only ASCII characters with \w.
As such, it's often used in authoring, or in other text-preprocessing steps, to connect other groups of letters/numbers that should be kept together as a unit. Thus it's not often cleared with other true punctuation.
The code you've referenced also does something else, different than your question about clearing punctuation: it drops word-tokens beginning with _.
I'm not sure why it does that; at some point that code may have been designed with some specific text-format in mind where leading-underscore tokens were semantically-unimportant formatting directives.
The simple_preprocess() function in gensim is just a quick-and-dirty baseline helpful for internal tests and compact beginner tutorials. It shouldn't be considered a "best practice".
Real projects should give more consideration to the kind of word-tokenization that makes sense for their data and purposes – and either look to libraries with more options, or custom approaches (which still need not be more than a few lines of Python), to implement tokenization that best suits their needs.
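For a sense of how small a custom approach can be, here is a sketch using only the standard re module. It is not gensim's implementation; simple_tokens and its drop_leading_underscore flag are hypothetical names chosen for this example.

```python
import re

# \w+ keeps underscores inside tokens, because "_" counts as a word character.
TOKEN = re.compile(r"\w+")


def simple_tokens(text, min_len=2, max_len=15, drop_leading_underscore=False):
    """Lowercase the text, split on non-word characters, filter by length."""
    tokens = []
    for tok in TOKEN.findall(text.lower()):
        if not (min_len <= len(tok) <= max_len):
            continue
        if drop_leading_underscore and tok.startswith("_"):
            continue
        tokens.append(tok)
    return tokens
```

With the flag off, a token such as _private survives; with it on, the behaviour resembles the filter in the snippet quoted above.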

Why do programming languages not allow spaces in identifiers?

This may seem like a dumb question, but still I don't know the answer.
Why do programming languages not allow spaces in the names ( for instance method names )?
I understand it is to facilitate ( allow ) the parsing, and at some point it would be impossible to parse anything if spaces were allowed.
Nowadays we are so used to it that the norm is not to see spaces.
For instance:
object.saveData( data );
object.save_data( data )
object.SaveData( data );
[object saveData:data];
etc.
Could be written as:
object.save data( data ) // looks ugly, but that's the "natural" way.
If it is only for parsing, I guess the identifier could be whatever sits between the . and the (. Of course, procedural languages wouldn't be able to use that because there is no '.', but OO languages could.
I wonder if parsing is the only reason, and if it is, how important it is ( I assume that it will be and it will be impossible to do it otherwise, unless all the programming language designers just... forget the option )
EDIT
I agree that allowing spaces in identifiers in general (as in the FORTRAN example) is a bad idea. Narrowing to OO languages and specifically to methods, I don't see (I don't mean there is not) a reason why it should be that way. After all, the . and the first ( may be used as delimiters.
And forget the saveData method , consider this one:
key.ToString().StartsWith("TextBox")
as:
key.to string().starts with("textbox");
Be cause i twoul d makepa rsing suc hcode reallydif ficult.
I used an implementation of ALGOL (c. 1978) which—extremely annoyingly—required quoting of what is now known as reserved words, and allowed spaces in identifiers:
"proc" filter = ("proc" ("int") "bool" p, "list" l) "list":
"if" l "is" "nil" "then" "nil"
"elif" p(hd(l)) "then" cons(hd(l), filter(p,tl(l)))
"else" filter(p, tl(l))
"fi";
Also, FORTRAN (the capitalized form means F77 or earlier), was more or less insensitive to spaces. So this could be written:
799 S = FLO AT F (I A+I B+I C) / 2 . 0
A R E A = SQ R T ( S *(S - F L O ATF(IA)) * (S - FLOATF(IB)) *
+ (S - F LOA TF (I C)))
which was syntactically identical to
799 S = FLOATF (IA + IB + IC) / 2.0
AREA = SQRT( S * (S - FLOATF(IA)) * (S - FLOATF(IB)) *
+ (S - FLOATF(IC)))
With that kind of history of abuse, why make parsing difficult for humans? Let alone complicate computer parsing.
Yes, it's the parsing - both human and computer. It's easier to read and easier to parse if you can safely assume that whitespace doesn't matter. Otherwise, you can have potentially ambiguous statements, statements where it's not clear how things go together, statements that are hard to read, etc.
Such a change would make for an ambiguous language in the best of cases. For example, in a C99-like language:
if not foo(int x) {
...
}
is that equivalent to:
A function definition of foo that returns a value of type ifnot:
ifnot foo(int x) {
...
}
A call to a function called notfoo with a variable named intx:
if notfoo(intx) {
...
}
A negated call to a function called foo (with C99's not which means !):
if not foo(intx) {
...
}
This is just a small sample of the ambiguities you might run into.
Update: I just noticed that obviously, in a C99-like language, the condition of an if statement would be enclosed in parentheses. Extra punctuation can help with ambiguities if you choose to ignore whitespace, but your language will end up having lots of extra punctuation wherever you would normally have used whitespace.
Before the interpreter or compiler can build a parse tree, it must perform lexical analysis, turning the stream of characters into a stream of tokens. Consider how you would want the following parsed:
a = 1.2423 / (4343.23 * 2332.2);
And how your rule above would work on it. Hard to know how to lexify it without understanding the meaning of the tokens. It would be really hard to build a parser that did lexification at the same time.
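To illustrate what the lexical-analysis step looks like when whitespace is insignificant, here is a toy regex-based lexer in Python. The token categories and the lex function are assumptions for this sketch, not taken from any particular compiler.

```python
import re

# Token categories for a tiny expression language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/();]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(src):
    """Turn a character stream into (kind, text) tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(src):
        match = MASTER.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Fed "save data", this lexer produces two separate IDENT tokens. Deciding that they form one name would need knowledge of the surrounding grammar, which is exactly the coupling between lexer and parser described above.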
There are a few languages which allow spaces in identifiers. Nearly all languages constrain the set of characters in identifiers because parsing is easier that way and most programmers are accustomed to the compact no-whitespace style.
I don’t think there’s a real reason.
Check out Stroustrup's classic Generalizing Overloading for C++2000.
We were allowed to put spaces in filenames back in the 1960's, and computers still don't handle them very well (everything used to break, then most things, now it's just a few things - but they still break).
We simply can't wait another 50 years before our code will work again.
:-)
(And what everyone else said, of course. In English, we use spaces and punctuation to separate the words. The same is true for computer languages, except that computer parsers define "words" in a slightly different sense)
Using space as part of an identifier makes parsing really murky (is that a syntactic space or an identifier?), but the same sort of "natural reading" behavior is achieved with keyword arguments. object.save(data: something, atomically: true)
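In Python the same effect might look like the sketch below. The save function is hypothetical and just returns a description instead of doing any I/O; the bare * forces callers to spell out the keyword.

```python
def save(data, *, atomically=False):
    """Hypothetical save routine; returns a description instead of doing I/O."""
    mode = "atomic" if atomically else "plain"
    return f"{mode} save of {len(data)} bytes"


# The call site reads almost like a sentence, with no spaces in any identifier.
result = save(data=b"payload", atomically=True)
```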
The TikZ language for creating graphics in LaTeX allows whitespace in parameter names (also known as 'keys'). For instance, you see things like
\shade[
top color=yellow!70,
bottom color=red!70,
shading angle={45},
]
In this restricted setting of a comma-separated list of key-value pairs, there's no parsing difficulty. In fact, I think it's much easier to read than the alternatives like topColor, top_color or topcolor.

Are hard-coded STRINGS ever acceptable?

Similar to Is hard-coding literals ever acceptable?, but I'm specifically thinking of "magic strings" here.
On a large project, we have a table of configuration options like these:
Name Value
---- -----
FOO_ENABLED Y
BAR_ENABLED N
...
(Hundreds of them).
The common practice is to call a generic function to test an option like this:
if (config_options.value('FOO_ENABLED') == 'Y') ...
(Of course, this same option may need to be checked in many places in the system code.)
When adding a new option, I was considering adding a function to hide the "magic string" like this:
if (config_options.foo_enabled()) ...
However, colleagues thought I'd gone overboard and objected to doing this, preferring the hard-coding because:
That's what we normally do
It makes it easier to see what's going on when debugging the code
The trouble is, I can see their point! Realistically, we are never going to rename the options for any reason, so about the only advantage I can think of for my function is that the compiler would catch any typo like fo_enabled(), but not 'FO_ENABLED'.
What do you think? Have I missed any other advantages/disadvantages?
If I use a string once in the code, I don't generally worry about making it a constant somewhere.
If I use a string twice in the code, I'll consider making it a constant.
If I use a string three times in the code, I'll almost certainly make it a constant.
if (config_options.isTrue('FOO_ENABLED')) {...
}
Restrict your hard coded Y check to one place, even if it means writing a wrapper class for your Map.
if (config_options.isFooEnabled()) {...
}
Might seem okay until you have 100 configuration options and 100 methods (so here you can make a judgement about future application growth and needs before deciding on your implementation). Otherwise it is better to have a class of static strings for parameter names.
if (config_options.isTrue(ConfigKeys.FOO_ENABLED)) {...
}
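A small Python sketch of that last variant, assuming the options live in a plain dictionary. ConfigOptions, is_true and the sample values are illustrative names, not the asker's real API.

```python
class ConfigKeys:
    """One place that spells each option name (illustrative subset)."""
    FOO_ENABLED = "FOO_ENABLED"
    BAR_ENABLED = "BAR_ENABLED"


class ConfigOptions:
    """Thin wrapper so the 'Y' comparison lives in exactly one place."""

    def __init__(self, values):
        self._values = dict(values)

    def is_true(self, key):
        # An unknown key raises KeyError instead of silently reading as false.
        return self._values[key] == "Y"


config_options = ConfigOptions({"FOO_ENABLED": "Y", "BAR_ENABLED": "N"})
foo_on = config_options.is_true(ConfigKeys.FOO_ENABLED)
```

A misspelled ConfigKeys.FO_ENABLED fails with AttributeError as soon as the line runs, and a misspelled literal fails with KeyError, so neither typo goes unnoticed.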
I realise the question is old, but it came up on my margin.
AFAIC, the issue here has not been identified accurately, either in the question or in the answers. Forget about "hardcoding strings" or not, for a moment.
The database has a Reference table, containing config_options. The PK is a string.
There are two types of PKs:
Meaningful Identifiers, that the users (and developers) see and use. These PKs are supposed to be stable, they can be relied upon.
Meaningless Id columns which the users should never see, that the developers have to be aware of, and code around. These cannot be relied upon.
It is ordinary, normal, to write code using the absolute value of a meaningful PK IF CustomerCode = "IBM" ... or IF CountryCode = "AUS" etc.
Referencing the absolute value of a meaningless PK is not acceptable (due to auto-increment; gaps being changed; values being replaced wholesale).
Your reference table uses meaningful PKs. Referencing those literal strings in code is unavoidable. Hiding the value will make maintenance more difficult; the code is no longer literal; your colleagues are right. Plus there is the additional redundant function that chews cycles. If there is a typo in the literal, you will soon find that out during Dev testing, long before UAT.
Hundreds of functions for hundreds of literals is absurd. If you do implement a function, then Normalise your code, and provide a single function that can be used for any of the hundreds of literals. In which case, we are back to a naked literal, and the function can be dispensed with.
The point is, the attempt to hide the literal has no value.
It cannot be construed as "hardcoding"; that is something quite different. I think that is where your issue is: identifying these constructs as "hardcoded". It is just referencing a Meaningful PK literally.
Now from the perspective of any code segment only, if you use the same value a few times, you can improve the code by capturing the literal string in a variable, and then using the variable in the rest of the code block. Certainly not a function. But that is an efficiency and good practice issue. Even that does not change the effect IF CountryCode = #cc_aus
You really should use constants rather than hard-coded literals.
You can say they won't be changed, but you can never know. It is best to make a habit of using symbolic constants.
In my experience, this kind of issue is masking a deeper problem: failure to do actual OOP and to follow the DRY principle.
In a nutshell, capture the decision at startup time by an appropriate definition for each action inside the if statements, and then throw away both the config_options and the run-time tests.
Details below.
The sample usage was:
if (config_options.value('FOO_ENABLED') == 'Y') ...
which raises the obvious question, "What's going on in the ellipsis?", especially given the following statement:
(Of course, this same option may need to be checked in many places in the system code.)
Let's assume that each of these config_option values really does correspond to a single problem domain (or implementation strategy) concept.
Instead of doing this (repeatedly, in various places throughout the code):
Take a string (tag),
Find its corresponding other string (value),
Test that value as a boolean-equivalent,
Based on that test, decide whether to perform some action.
I suggest encapsulating the concept of a "configurable action".
Let's take as an example (obviously just as hypothetical as FOO_ENABLED ... ;-) that your code has to work in either English units or metric units. If METRIC_ENABLED is "true", convert user-entered data from metric to English for internal computation, and convert back prior to displaying results.
Define an interface:
public interface MetricConverter {
    double toInches(double length);
    double toCentimeters(double length);
    double toPounds(double weight);
    double toKilograms(double weight);
}
which identifies in one place all the behavior associated with the concept of METRIC_ENABLED.
Then write concrete implementations of all the ways those behaviors are to be carried out:
public class NullConv implements MetricConverter {
    // Interface methods are implicitly public, so implementations must be too.
    public double toInches(double length) {return length;}
    public double toCentimeters(double length) {return length;}
    public double toPounds(double weight) {return weight;}
    public double toKilograms(double weight) {return weight;}
}
and
// lame implementation, just for illustration!!!!
public class MetricConv implements MetricConverter {
    public static final double LBS_PER_KG = 2.2D;
    public static final double CM_PER_IN = 2.54D;
    // centimeters -> inches divides by CM_PER_IN; inches -> centimeters multiplies
    public double toInches(double length) {return length / CM_PER_IN;}
    public double toCentimeters(double length) {return length * CM_PER_IN;}
    public double toPounds(double weight) {return weight * LBS_PER_KG;}
    public double toKilograms(double weight) {return weight / LBS_PER_KG;}
}
At startup time, instead of loading a bunch of config_options values, initialize a set of configurable actions, as in:
MetricConverter converter = (metricOption()) ? new MetricConv() : new NullConv();
(where the expression metricOption() above is a stand-in for whatever one-time-only check you need to make, including looking at the value of METRIC_ENABLED ;-)
Then, wherever the code would have said:
double length = getLengthFromGui();
if (config_options.value('METRIC_ENABLED') == 'Y') {
    length = length / 2.54D;
}
// do some computation to produce result
// ...
if (config_options.value('METRIC_ENABLED') == 'Y') {
    result = result * 2.54D;
}
displayResultingLengthOnGui(result);
rewrite it as:
double length = converter.toInches(getLengthFromGui());
// do some computation to produce result
// ...
displayResultingLengthOnGui(converter.toCentimeters(result));
Because all of the implementation details related to that one concept are now packaged cleanly, all future maintenance related to METRIC_ENABLED can be done in one place. In addition, the run-time trade-off is a win; the "overhead" of invoking a method is trivial compared with the overhead of fetching a String value from a Map and performing String#equals.
I believe the two reasons you mentioned, a possible misspelling in the string that cannot be detected until run time, and the possibility (although slim) of a name change, would justify your idea.
On top of that, you get typed functions. Right now it seems you only store booleans, but what if you need to store an int, a string, etc.? I would rather use get_foo() with a type than get_string("FOO") or get_int("FOO").
I think there are two different issues here:
In the current project, the convention of using hard-coded strings is already well established, so all the developers working on the project are familiar with it. It might be a sub-optimal convention for all the reasons that have been listed, but everybody familiar with the code can look at it and instinctively knows what the code is supposed to do. Changing the code so that in certain parts, it uses the "new" functionality will make the code slightly harder to read (because people will have to think and remember what the new convention does) and thus a little harder to maintain. But I would guess that changing over the whole project to the new convention would potentially be prohibitively expensive unless you can quickly script the conversion.
On a new project, symbolic constants are the way IMO, for all the reasons listed. Especially because anything that makes the compiler catch errors at compile time that would otherwise be caught by a human at run time is a very useful convention to establish.
Another thing to consider is intent. If you are on a project that requires localization hard coded strings can be ambiguous. Consider the following:
const string HELLO_WORLD = "Hello world!";
print(HELLO_WORLD);
The programmer's intent is clear. Using a constant implies that this string does not need to be localized. Now look at this example:
print("Hello world!");
Here we aren't so sure. Did the programmer really not want this string to be localized or did the programmer forget about localization while he was writing this code?
I too prefer a strongly-typed configuration class if it is used through-out the code. With properly named methods you don't lose any readability. If you need to do conversions from strings to another data type (decimal/float/int), you don't need to repeat the code that does the conversion in multiple places and can cache the result so the conversion only takes place once. You've already got the basis of this in place already so I don't think it would take much to get used to the new way of doing things.
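A minimal Python sketch of such a class, assuming the raw options arrive as a string-to-string mapping. TypedConfig and the MAX_RETRIES option are invented for illustration; functools.cached_property needs Python 3.8 or later.

```python
from functools import cached_property


class TypedConfig:
    """Typed, cached view over raw string options (illustrative sketch)."""

    def __init__(self, raw):
        self._raw = dict(raw)

    @cached_property
    def foo_enabled(self) -> bool:
        # The "Y"/"N" convention is decoded here and nowhere else.
        return self._raw["FOO_ENABLED"] == "Y"

    @cached_property
    def max_retries(self) -> int:
        # MAX_RETRIES is a hypothetical integer option; parsed on first access only.
        return int(self._raw["MAX_RETRIES"])


config = TypedConfig({"FOO_ENABLED": "Y", "MAX_RETRIES": "3"})
retries = config.max_retries
```

Callers get a bool or an int directly, and the string conversion happens once per option rather than at every call site.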

What is your system for avoiding keyword naming clashes?

Typically languages have keywords that you are unable to use directly with the exact same spelling and case for naming things (variables,functions,classes ...) in your program. Yet sometimes a keyword is the only natural choice for naming something. What is your system for avoiding/getting around this clash in your chosen technology?
I just avoid the name, usually. Either find a different name or change it slightly - e.g. clazz instead of class in C# or Java. In C# you can use the @ prefix, but it's horrible:
int @int = 5; // Ick!
There is nothing intrinsically all-encompassing about a keyword, in that it should stop you from being able to name your variables. Since all names are just generalized instances of some type to one degree or another, you can always go up or down in the abstraction to find another useful name.
For example, if you're writing a system that tracks students and you want an object to represent their study in a specific field, i.e. they've taken a "class" in something, if you can't use the term directly, or the plural "classes", or an alternative like "studies", you might find a more "instanced" variation: studentClass, currentClass, etc. or a higher perspective: "courses", "courseClass" or a specific type attribute: dailyClass, nightClass, etc.
Lots of options, you should just prefer the simplest and most obvious one, that's all.
I always like to listen to the users talk, because the scope of their language helps define the scope of the problem, often if you listen long enough you'll find they have many multiple terms for the same underlying things (with only subtle differences). They usually have the answer ...
Paul.
My system is: don't use keywords, period!
If I have a function/variable/class and it only seems logical to name it with a keyword, I'll put a descriptive word in front of the keyword,
in (adjectiveNoun) format, i.e. personName instead of Name where "Name" is a keyword.
I just use a more descriptive name. For instance, 'id' becomes identifier, 'string' becomes 'descriptionString,' and so on.
In Python I usually use proper namespacing on my modules to avoid name clashes.
import re
re.compile()
instead of:
from re import *
compile()
Sometimes, when I can't avoid keyword name clashes I simply drop the last letter off the name of my variable.
for fil in files:
pass
As stated before either change class to clazz in Java/C#, or use some underscore as a prefix, for example
int _int = 0;
There should be no reason to use keywords as variable names. Either use a more detailed word or use a thesaurus. Capitalizing certain letters of the word to make it not exactly like the keyword is not going to help much to someone inheriting your code later.
Happy those with a language without ANY keywords...
Joking apart, I think that in the rare situations where "Yet sometimes a keyword is the only natural choice for naming something." applies, you can get along by prefixing it with "my", "do", "_" or similar.
I honestly can't really think of many such instances where the keyword alone makes a good name ("int", "for" and "if" are definitely bad anyway). The only few in the C-language family which might make sense are "continue" (make it "doContinue"), "break" (how about "breakWhenEOFIsreached" or similar ?) and the already mentioned "class" (how about "classOfThingy" ?).
In other words: make the names more reasonable.
And always remember: code is WRITTEN only once, but usually READ very often.
Typically I follow Hungarian Notation. So if, for whatever reason, I wanted to use 'End' as a variable of type integer I would declare it as 'iEnd'. A string would be 'strEnd', etc. This usually gives me some room as far as variables go.
If I'm working on a particular personal project that other people will only ever look at to see what I did, for example, when making an add-on to a game using the UnrealEngine, I might use my initials somewhere in the name. 'DS_iEnd' perhaps.
I write my own [vim] syntax highlighters for each language, and I give all keywords an obvious colour so that I notice them when I'm coding. Languages like PHP and Perl use $ for variables, making it a non-issue.
Developing in Ruby on Rails, I sometimes look up this list of reserved words.
In 15 years of programming, I've rarely had this problem.
One place I can immediately think of, is perhaps a css class, and in that case, I'd use a more descriptive name. So instead of 'class', I might use 'targetClass' or something similar.
In Python the generally accepted method is to append an '_':
class -> class_
or -> or_
and -> and_
You can see this exemplified in the operator module.
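A short sketch of how that convention can be applied mechanically, using the standard keyword module. The safe_name helper is a made-up name for this example; operator.and_ and operator.or_ are the real functions referred to above.

```python
import keyword
import operator


def safe_name(name):
    """Append a trailing underscore only when the name is a Python keyword."""
    return name + "_" if keyword.iskeyword(name) else name


# The standard library follows the same convention for its own names.
bitwise_and = operator.and_(6, 3)
bitwise_or = operator.or_(6, 3)
```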
I switched to a language which doesn't restrict identifier names at all.
First of all, most code conventions prevent such a thing from happening.
If not, I usually add a descriptive prose prefix or suffix:
the_class or theClass infix_or (prefix_or(class_param, in_class) , a_class) or_postfix
A practice, that is usually in keeping with every code style advice you can find ("long names don't kill", "Longer variable names don't take up more space in memory, I promise.")
Generally, if you think the keyword is the best description, a slightly worse one would be better.
Note that, by the very premise of your question you introduce ambiguity, which is bad for the reader, be it a compiler or human. Even if it is a custom to use class, clazz or klass and even if that custom is not so custom that it is a custom: it takes a word word, precisely descriptive as word may be, and distorts it, effectively shooting w0rd's precision in the "wrd". Somebody used to another w_Rd convention or language might have a few harsh wordz for your wolds.
Most of us have more to say about things than "Flower", "House" or "Car", so there's usually more to say about typeNames, decoratees, class_params, BaseClasses and typeReferences.
This is where my personal code obfuscation tolerance ends:
Never(!!!) rely on scoping or arcane syntax rules to prevent name clashes with "key words". (Don't know any compiler that would allow that, but, these days, you never know...).
Try that and someone will w**d you in the wörd so __rd, Word will look like TeX to you!
My system in Java is to capitalize the second letter of the word, so for example:
int dEfault;
boolean tRansient;
Class cLass;

How to name variables

What rules do you use to name your variables?
Where are single letter vars allowed?
How much info do you put in the name?
How about for example code?
What are your preferred meaningless variable names? (after foo & bar)
Why are they spelled "foo" and "bar" rather than FUBAR?
function startEditing(){
    if (user.canEdit(currentDocument)){
        editorControl.setEditMode(true);
        setButtonDown(btnStartEditing);
    }
}
Should read like a narrative work.
One rule I always follow is this: if a variable encodes a value that is in some particular units, then those units have to be part of the variable name. Example:
int postalCodeDistanceMiles;
decimal reactorCoreTemperatureKelvin;
decimal altitudeMsl;
int userExperienceWongBakerPainScale
I will NOT be responsible for crashing any Mars landers (or the equivalent failure in my boring CRUD business applications).
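The same rule sketched in Python, with the unit carried in every name so that a mismatch is visible at the call site. The function and variable names are invented for the example; the conversion factor is the definition of the international mile.

```python
KM_PER_MILE = 1.609344  # exact, by definition of the international mile


def miles_to_km(distance_miles: float) -> float:
    """Convert a distance in miles to kilometres."""
    return distance_miles * KM_PER_MILE


postal_code_distance_miles = 10.0
postal_code_distance_km = miles_to_km(postal_code_distance_miles)
```

Passing postal_code_distance_km into a parameter named distance_miles would look wrong on sight, which is the whole benefit.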
Well, it all depends on the language you are developing in. As I am currently using C#, I tend to use the following.
camelCase for variables.
camelCase for parameters.
PascalCase for properties.
m_PascalCase for member variables.
Where are single letter vars allowed?
I tend to do this in for loops but feel a bit guilty whenever I do so. But with foreach and lambda expressions for loops are not really that common now.
How much info do you put in the name?
If the code is a bit difficult to understand, write a comment. Don't turn a variable name into a comment, i.e.
int theTotalAccountValueIsStoredHere
is not required.
what are your preferred meaningless variable names? (after foo & bar)
i or x. foo and bar are a bit too university text book example for me.
why are they spelled "foo" and "bar" rather than FUBAR?
Tradition
These are all C# conventions.
Variable-name casing
Case indicates scope. Pascal-cased variables are fields of the owning class. Camel-cased variables are local to the current method.
I have only one prefix-character convention. Backing fields for class properties are Pascal-cased and prefixed with an underscore:
private int _Foo;
public int Foo { get { return _Foo; } set { _Foo = value; } }
There's some C# variable-naming convention I've seen out there - I'm pretty sure it was a Microsoft document - that inveighs against using an underscore prefix. That seems crazy to me. If I look in my code and see something like
_Foo = GetResult();
the very first thing that I ask myself is, "Did I have a good reason not to use a property accessor to update that field?" The answer is often "Yes, and you'd better know what that is before you start monkeying around with this code."
Single-letter (and short) variable names
While I tend to agree with the dictum that variable names should be meaningful, in practice there are lots of circumstances under which making their names meaningful adds nothing to the code's readability or maintainability.
Loop iterators and array indices are the obvious places to use short and arbitrary variable names. Less obvious, but no less appropriate in my book, are nonce usages, e.g.:
XmlWriterSettings xws = new XmlWriterSettings();
xws.Indent = true;
XmlWriter xw = XmlWriter.Create(outputStream, xws);
That's from C# 2.0 code; if I wrote it today, of course, I wouldn't need the nonce variable:
XmlWriter xw = XmlWriter.Create(
    outputStream,
    new XmlWriterSettings() { Indent = true });
But there are still plenty of places in C# code where I have to create an object that I'm just going to pass elsewhere and then throw away.
A lot of developers would use a name like xwsTemp in those circumstances. I find that the Temp suffix is redundant. The fact that I named the variable xws in its declaration (and I'm only using it within visual range of that declaration; that's important) tells me that it's a temporary variable.
Another place I'll use short variable names is in a method that's making heavy use of a single object. Here's a piece of production code:
internal void WriteXml(XmlWriter xw)
{
    if (!Active)
    {
        return;
    }
    xw.WriteStartElement(Row.Table.TableName);
    xw.WriteAttributeString("ID", Row["ID"].ToString());
    xw.WriteAttributeString("RowState", Row.RowState.ToString());
    for (int i = 0; i < ColumnManagers.Length; i++)
    {
        ColumnManagers[i].Value = Row.ItemArray[i];
        xw.WriteElementString(ColumnManagers[i].ColumnName, ColumnManagers[i].ToXmlString());
    }
    ...
There's no way in the world that code would be easier to read (or safer to modify) if I gave the XmlWriter a longer name.
Oh, how do I know that xw isn't a temporary variable? Because I can't see its declaration. I only use temporary variables within 4 or 5 lines of their declaration. If I'm going to need one for more code than that, I either give it a meaningful name or refactor the code using it into a method that - hey, what a coincidence - takes the short variable as an argument.
How much info do you put in the name?
Enough.
That turns out to be something of a black art. There's plenty of information I don't have to put into the name. I know when a variable's the backing field of a property accessor, or temporary, or an argument to the current method, because my naming conventions tell me that. So my names don't.
Here's why it's not that important.
In practice, I don't need to spend much energy figuring out variable names. I put all of that cognitive effort into naming types, properties and methods. This is a much bigger deal than naming variables, because these names are very often public in scope (or at least visible throughout the namespace). Names within a namespace need to convey meaning the same way.
There's only one variable in this block of code:
RowManager r = (RowManager)sender;
// if the settings allow adding a new row, add one if the context row
// is the last sibling, and it is now active.
if (Settings.AllowAdds && r.IsLastSibling && r.Active)
{
r.ParentRowManager.AddNewChildRow(r.RecordTypeRow, false);
}
The property names almost make the comment redundant. (Almost. There's actually a reason why the property is called AllowAdds and not AllowAddingNewRows that a lot of thought went into, but it doesn't apply to this particular piece of code, which is why there's a comment.) The variable name? Who cares?
Pretty much every modern language that has seen wide use has its own coding standards. These are a great starting point. If all else fails, just use whatever is recommended. There are exceptions of course, but these are general guidelines. If your team prefers certain variations, as long as you agree with them, then that's fine as well.
But at the end of the day it's not necessarily what standards you use, but the fact that you have them in the first place and that they are adhered to.
I only use single character variables for loop control or very short functions.
for(int i = 0; i< endPoint; i++) {...}
int max( int a, int b) {
if (a > b)
return a;
return b;
}
The amount of information depends on the scope of the variable: the more places it could be used, the more information I want the name to carry so I can keep track of its purpose.
When I write example code, I try to use variable names as I would in real code (although functions might get useless names like foo or bar).
See Etymology of "Foo"
What rules do you use to name your variables?
Typically, as I am a C# developer, I follow the variable naming conventions specified by the IDesign C# Coding Standard, for two reasons:
1) I like it, and find it easy to read.
2) It is the default that comes with the Code Style Enforcer AddIn for Visual Studio 2005 / 2008 which I use extensively these days.
Where are single letter vars allowed?
There are a few places where I will allow single letter variables. Usually these are simple loop indexers, OR mathematical concepts like X,Y,Z coordinates. Other than that, never! (Everywhere else I have used them, I have typically been bitten by them when rereading the code).
How much info do you put in the name?
Enough to know PRECISELY what the variable is being used for. As Robert Martin says:
The name of a variable, function, or
class, should answer all the big
questions. It should tell you why it
exists, what it does, and how it is
used. If a name requires a comment,
then the name does not reveal its
intent.
From Clean Code - A Handbook of Agile Software Craftsmanship
I never use meaningless variable names like foo or bar, unless, of course, the code is truly throw-away.
For loop variables, I double up the letter so that it's easier to search for the variable within the file. For example,
for (int ii=0; ii < array.length; ii++)
{
int element = array[ii];
printf("%d", element);
}
What rules do you use to name your variables? I've switched between underscore between words (load_vars), camel casing (loadVars) and no spaces (loadvars). Classes are always CamelCase, capitalized.
Where are single letter vars allowed? Loops, mostly. Temporary vars in throwaway code.
How much info do you put in the name? Enough to remind me what it is while I'm coding. (Yes this can lead to problems later!)
What are your preferred meaningless variable names? (after foo & bar) temp, res, r. I actually don't use foo and bar much.
What rules do you use to name your variables?
I need to be able to understand it in a year's time. Should also conform with preexisting style.
Where are single letter vars allowed?
Ultra-obvious things, e.g. char c; c = getc(); and loop indices (i, j, k).
How much info do you put in the name?
Plenty and lots.
how about for example code?
Same as above.
what are your preferred meaningless variable names? (after foo & bar)
I don't like having meaningless variable names. If a variable doesn't mean anything, why is it in my code?
why are they spelled "foo" and "bar" rather than FUBAR
Tradition.
The rules I adhere to are:
Does the name fully and accurately describe what the variable represents?
Does the name refer to the real-world problem rather than the programming language solution?
Is the name long enough that you don't have to puzzle it out?
Are computed value qualifiers, if any, at the end of the name?
Are they instantiated only at the point where they are first required?
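To illustrate the rule about computed value qualifiers, here is a small Java sketch. OrderStats and its method names are hypothetical, chosen only for the example. The qualifier (Total, Average, Max) goes at the end, so the related names line up as orderTotal, orderAverage and orderMax, and the most significant part of the name comes first:

```java
import java.util.List;

public class OrderStats {
    // Qualifier last: "orderTotal", not "totalOrder" or "sumOfOrders".
    public static double orderTotal(List<Double> orderAmounts) {
        double orderTotal = 0.0;
        for (double orderAmount : orderAmounts) {
            orderTotal += orderAmount;
        }
        return orderTotal;
    }

    // Returns 0.0 for an empty list to avoid dividing by zero.
    public static double orderAverage(List<Double> orderAmounts) {
        if (orderAmounts.isEmpty()) {
            return 0.0;
        }
        return orderTotal(orderAmounts) / orderAmounts.size();
    }

    // Returns the largest amount, or 0.0 for an empty list.
    public static double orderMax(List<Double> orderAmounts) {
        double orderMax = 0.0;
        boolean first = true;
        for (double orderAmount : orderAmounts) {
            if (first || orderAmount > orderMax) {
                orderMax = orderAmount;
                first = false;
            }
        }
        return orderMax;
    }
}
```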
What rules do you use to name your variables?
camelCase for all important variables, CamelCase for all classes
Where are single letter vars allowed?
In loop constructs and in mathematical functions where the single letter var name is consistent with the mathematical definition.
How much info do you put in the name?
You should be able to read the code like a book. Function names should tell you what the function does (scalarProd(), addCustomer(), etc)
How about for example code?
what are your preferred meaningless variable names? (after foo & bar)
temp, tmp, input, I never really use foo and bar.
I would say try to name them as clearly as possible. Never use single letter variables and only use 'foo' and 'bar' if you're just testing something out (e.g., in interactive mode) and won't use it in production.
I like to prefix my variables with what they're going to be: str = String, int = Integer, bool = Boolean, etc.
Using a single letter is quick and easy in Loops: For i = 0 to 4...Loop
Variables are made to be a short but descriptive substitute for what you're using. If the variable is too short, you might not understand what it's for. If it's too long, you'll be typing forever for a variable that represents 5.
Foo & Bar are used in example code to show how the code works. You can use just about any other nonsensical names instead. I usually just use i, x, & y.
My personal opinion of foo bar vs. fu bar is that it's too obvious and no one likes 2-character variables, 3 is much better!
In DSLs and other fluent interfaces often variable- and method-name taken together form a lexical entity. For example, I personally like the (admittedly heretic) naming pattern where the verb is put into the variable name rather than the method name. #see 6th Rule of Variable Naming
Also, I like the spartan use of $ as variable name for the main variable of a piece of code. For example, a class that pretty prints a tree structure can use $ for the StringBuffer inst var. #see This is Verbose!
Otherwise I refer to the Programmer's Phrasebook by Einar Hoest. #see http://www.nr.no/~einarwh/phrasebook/
I always use single letter variables in for loops, it's just nicer-looking and easier to read.
A lot of it depends on the language you're programming in, too. I don't name variables the same in C++ as I do in Java (Java lends itself better to the excessively long variable names imo, but this could just be a personal preference. Or it may have something to do with how Java built-ins are named...).
locals: fooBar;
members/types/functions FooBar
interfaces: IFooBar
As for me, single letters are only valid if the name is classic: i/j/k only for local loop indexes, x,y,z for vector parts.
vars have names that convey meaning but are short enough to not wrap lines
foo,bar,baz. Pickle is also a favorite.
I learned not to ever use single-letter variable names back in my VB3 days. The problem is that if you want to search everywhere that a variable is used, it's kinda hard to search on a single letter!
The newer versions of Visual Studio have intelligent variable searching functions that avoid this problem, but old habits and all that. Anyway, I prefer to err on the side of ridiculous.
for (int firstStageRocketEngineIndex = 0; firstStageRocketEngineIndex < firstStageRocketEngines.Length; firstStageRocketEngineIndex++)
{
firstStageRocketEngines[firstStageRocketEngineIndex].Ignite();
Thread.Sleep(100); // Don't start them all at once. That would be bad.
}
It's pretty much unimportant how you name variables. You really don't need any rules, other than those specified by the language, or at minimum, those enforced by your compiler.
It's considered polite to pick names you think your teammates can figure out, but style rules don't really help with that as much as people think.
Since I work as a contractor, moving among different companies and projects, I prefer to avoid custom naming conventions. They make it more difficult for a new developer, or a maintenance developer, to become acquainted with (and follow) the standard being used.
So, while one can find points in them to disagree with, I look to the official Microsoft .NET guidelines for a consistent set of naming conventions.
With some exceptions (Hungarian notation), I think consistent usage may be more useful than any arbitrary set of rules. That is, do it the same way every time.
I work in MathCAD, and I'm happy because MathCAD gives me incredible possibilities in naming, which I use a lot. I can't understand how to program without them.
To tell one var from another I have to include a lot of information in the name, for example:
1. In the first position: what it is, e.g. N for quantity, F for force and so on
2. In the second: additional indices, e.g. for the direction of a force
3. In the third: indexing inside a vector or matrix var; for convenience I put the var name in {} or [] brackets to show its dimensions.
So, in conclusion, my var names look like
N.dirs / Fx i.row / {F}.w.(i,j.k) / {F}.w.(k,i.j).
Sometimes I have to add the name of the coordinate system for vector values
{F}.{GCS}.w.(i,j.k) / {F}.{LCS}.w.(i,j.k)
And as a final step I add the name of the external module in BOLD at the end of an external function or var, like Row.MTX.f([M]), because MathCAD doesn't have a help string for functions.
Use variable names that clearly describe what the variable contains. If the class is going to get big, or if the variable is in the public scope, the name needs to be more precise. Of course, good naming helps you and other people understand the code better.
For example: use "employeeNumber" instead of just "number".
Use Btn or Button at the end of the names of variables referring to buttons, str for strings and so on.
Start variables with lower case, start classes with uppercase.
example of class "MyBigClass", example of variable "myStringVariable"
Use upper case to indicate a new word for better readability. Don't use "_", because it looks uglier and takes longer to write.
For example: use "employeeName".
Only use single character variables in loops.
Updated
First off, naming depends on existing conventions, whether from language, framework, library, or project. (When in Rome...) Example: Use the jQuery style for jQuery plugins, use the Apple style for iOS apps. The former example requires more vigilance (since JavaScript can get messy and isn't automatically checked), while the latter example is simpler since the standard has been well-enforced and followed. YMMV depending on the leaders, the community, and especially the tools.
I will set aside all my naming habits to follow any existing conventions.
In general, I follow these principles, all of which center around programming being another form of interpersonal communication through written language.
Readability - important parts should have solid names; but these names should not be a replacement for proper documentation of intent. The test for code readability is if you can come back to it months later and still be understanding enough to not toss the entire thing upon first impression. This means avoiding abbreviation; see the case against Hungarian notation.
Writeability - common areas and boilerplate should be kept simple (esp. if there's no IDE), so code is easier and more fun to write. This is a bit inspired by Rob Pike's style.
Maintainability - if I add the type to my name like arrItems, then it would suck if I changed that property to be an instance of a CustomSet class that extends Array. Type notes should be kept in documentation, and only if appropriate (for APIs and such).
Standard, common naming - For dumb environments (text editors): Classes should be in ProperCase, variables should be short and if needed be in snake_case and functions should be in camelCase.
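The maintainability point can be sketched in Java (the example in the text is JavaScript; Cart, items, add and itemCount are hypothetical names for illustration). The field is named for its role, so swapping the underlying collection type does not make the name a lie, which is what would happen to something like arrItems or listItems:

```java
import java.util.Collection;
import java.util.LinkedHashSet;

public class Cart {
    // Named for what it holds, not how it is stored. This started life as a
    // list in this hypothetical example; switching to a set needed no rename.
    private final Collection<String> items = new LinkedHashSet<>();

    public void add(String item) {
        items.add(item);
    }

    public int itemCount() {
        return items.size();
    }
}
```

With the set in place, adding the same item twice counts it once, and no caller or field name had to change to reflect the new type.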
For JavaScript, it's a classic case of the constraints of the language and the tools affecting naming. It helps to distinguish variables from functions through different naming, since there's no IDE to hold your hand while this and prototype and other boilerplate obscure your vision and confuse your differentiation skills. It's also not uncommon to see all the unimportant or globally-derived vars in a scope be abbreviated. The language has no import [path] as [alias];, so local vars become aliases. And then there's the slew of different whitespacing conventions. The only solution here (and anywhere, really) is proper documentation of intent (and identity).
Also, the language itself is based around function level scope and closures, so that amount of flexibility can make blocks with variables in 2+ scope levels feel very messy, so I've seen naming where _ is prepended for each level in the scope chain to the vars in that scope.
I do a lot of PHP nowadays. It was not always like that, though, and I have learned a couple of tricks when it comes to variable naming.
//this is my string variable
$strVar = "";
//this would represent an array
$arrCards = array();
//this is for an integer
$intTotal = NULL;
//object
$objDB = new database_class();
//boolean
$blValid = true;