Related
This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Anyone else find naming classes and methods one of the most difficult part in programming?
Sometimes it seems i cant really find any name for a function i am writing, can this be because the function is not cohesive enough?
What do you do when no good name for a function comes to mind?
For naming functions, just avoid having simply nouns and rather name them after verbs. Some pointers:
Have function names that are unique visibly, e.g. don't have validateInput() and validateUserInput() since it's hard to say what one does over another. Also, avoid having characters that look very similar, e.g. the number 1 and lowercase 'l'. Sometimes it makes a difference.
Are you working on a project with multiple people? You should spend some time going over naming conventions as well, such as if the function name should have underscores, should be camelCase, etc.
Hungarian notation is a bad idea; avoid doing it.
Think about what the function is doing. The cohesion that you mentioned in your question comes to mind. Generally, functions should do just one thing, so don't name it constructCarAndRunCar() but rather have one function that constructs and another that runs it. If your functions are between, say 20 and 40 lines, you're good.
Sometimes, and this depends on the project, you might also want to prefix your function names with the class if the class is purely procedural (only composed of functions). So if you have a class that takes care of running a simulation, name your functions sim_pauseSimulation() and sim_restartSimulation(). If your class is OOP-based, this isn't an issue as much.
Don't use the underlying data structures in the functions themselves; these should be abstracted away. Rather than having functions like addToVector() or addToArray(), have them be addToList() instead. This is especially true if these are prototypes or the data structures might change later.
Finally, be consistent in your naming conventions. Once you come up with a convention after some thinking, stick to it. PHP comes to mind when thinking of inconsistent function names.
Happy coding! :)
Give it your best-shot and re-factor later if it still doesn't fit.
Sometimes it could be that your function is too large and therefore doing too many things. Try splitting up your function into other functions and it might be clearer what to call each individual function.
Don't worry about naming things with one or two words. Sometimes if functions do something that can be explained in a mini-sentence of sorts, go ahead and name the function a little longer if it'll help other developers understand what is going on.
Another suggestion is to get feedback from others. Often others who come from another perspective and seeing the function for the first time will have a better idea on what to call the function.
I follow following rule: Name according to the purpose (Why? - design decision) and not to the contents (What, How? - can be seen in the code).
For functions it is almost always an action (verb) followed by the noun of parameters and (or results. (Off-topic but for variables do not use "arrayOfNames" or "listOfNames", these are type information but simply "names"). This will also avoid inconsistencies if you refactor the code partly.
For given patterns like object creation, be consistent and always use the same naming like "Create..." (and not sometimes "Allocate..." or "Build..." otherwise you or your collegues will end up in scratching their head wound)
I find it easier to name functions when I don't have to cut back on the words. As long as your not doing javascript for the google start page you can do longer names.
For example you have the method dequeueReusableCellWithIdentifierandmergeChangesFromContextDidSaveNotification in apples cocoa framework.
As long as it's clear what the function is doing you can name it whatever you want and refactor it later.
Almost as important as the function name is that you are consistent with comments. Many IDEs will user your properly formatted comments not only to provide context sensitive help for a function you might be using, but they can be used to generate documentation. This is invaluable when returning to a project after a long period or when working with other developers.
In academic settings, they provide an appreciated demonstration of your intentions.
A good rule of thumb is [verb]returnDescription. This is easy with GetName() type functions and can't be applied universally. It's tough to find a balance between unobtrusive and descriptive code.
Here's a .Net convention guide, but it is applicable to most languages.
Go to www.thesaurus.com and try to find a better suited name though synonyms.
As a practical rule of my own, if a function name is too long, it should be atomized in a new object. Yet, i agree with all posts above. btw, nice noob question
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
On my previous job, providing all methods with javadoc was mandatory, which resulted in things like:
/**
* Sets the Frobber.
*
* #param frobber The frobber
*/
public setFrobber(Frobber frobber) { ... }
As you can see, the documentation takes up space and work, but adds little to the code.
Should documenting all methods be mandatory or optional? Is there a rule for which methods to document? What are pros and cons of requiring every method to be documented?
"providing all methods with javadoc was mandatory"
I strongly suspect that documenting all methods was mandatory, but providing javadoc comments was all that could be automatically enforced and hence all that was uniformly done.
Personally I think it's better to have no javadoc than completely useless javadoc - at least you can see from a glance at the HTML which methods are undocumented, because there are no descriptions of the parameters etc.
Documentation is frequently underrated, because it always seems less important and urgent when you're writing the code, than it does when you're using it later. But the style and form of documentation is often overrated - auto-generated XML nonsense is still nonsense. Given the choice, I'd rather have the code comment // Sets this object to use the specified frobber for all future frobbing, than your zero-information javadoc.
For all I know from your docs, the function doesn't actually modify this object at all, it might call the set() function on frobber, or it might be while(!frobber.isset()) { refrigerator.add(frobber); sleep(3600); refrigerator.remove(frobber); } Hence it "sets the frobber". I'm sure I read somewhere that "set" is the word with the most distinct definitions in the OED. Brief descriptions are ambiguous and hence misleading, and the purpose of documentation is to stop people relying on your source, and hence on details of your current implementation. My comment doesn't really take any longer to write than it took to add "Sets the frobber" and "the frobber" to the IDE-generated javadoc stub. It doesn't explain what frobbing is or when this object does it (hopefully that's elsewhere in the class docs) but at least it tries to tell you what the function does.
As for when to mandate documentation - I think every interface must be documented. If you're not defining Java interface s, the "interface" is every public and protected method, and every package-protected method unless the package is tiny. Implementation doesn't have to be documented, although it should be commented if the way it works is non-obvious. Documentation might be as simple as the sentence in my comment above - you don't necessarily need a separate sentence for each parameter if the method description already says what they are.
If you have code review, then IMO the answer is to review comments and documentation at the same time. If you don't have code review, then you need a cone of shame for whichever developer most recently forced someone else to come over and ask what the code actually does.
The same applies to anyone who relied on undocumented behaviour of a function, with a result that an implementation change that didn't change the interface, breaks their code. The way you enforce that code be documented, is to complain that you can't call it until you know what it guarantees to do. Arbitrary rules like, "javadoc comments must exist" become less important, at least for functions that other developers need to call.
For big projects or frameworks/libraries or even open source project that you are creating, it is mandatory. For small personal or private projects it is optional. Having said that, it is always a good idea to document your code so if you come back to your project after a year whether small or big, you know what it was doing. This really helps greatly.
You should always document your code. especially if someone else work or will work on your code. Maybe you didn't have a chance yet to work on legacy not-documented code but it can be a real pain!
About the comment itself, one thing to avoid is writing a comment because it is mandatory, Just think a few second and you'll find something to tell about your method, something that's not already in the method name, something that might not be obvious to the next developer. Explain what your method does, what are the corner cases, what it expect as input.
And remember :
Always code as if the guy who ends up
maintaining your code will be a
violent psychopath who knows where you
live.
it applies to comments too :)
It's much easier to maintain "self-documenting" code. If you choose good function and variable names, keep functions short (eg. < 10 lines with only a single idea per function), this will help keep the purpose of the code clear. And you won't have to try to keep the comments up to date - the only thing worse than no comments is comments that are wrong!
There's a good and recent summary of various points of view at InfoQ.
Documentation of code is very important. But Javadoc (or similar tools) are not the only and not the best method for this. The biggest downside is, that Javadoc-documentation must be kept up to date. If the method is changed, but the description stays the same, this documentation can do more trouble than good.
To avoid the problem with documentation not in sync with the code, use code to document. Unit-tests show how your code is used and asserts in the code can ensure that parameters and return-values are validated. In a project I added asserts to a calculation, that the probabilities in this calculation are always between 0 and 1. Later this assert triggered in a use case and pointed me directly to a bug.
The most important documentation is a good naming. If you set a Frobber, then setFrobber is a good name. The Javadoc given in your example adds nothing to this naming. frobIt would be a not so good name, method3 would be very bad. Code reviews should help to get good naming.
Javadocs and ither documentation should be added, if the other methods aren't sufficient. But in this case you need to take care, that this documentation is always up to date.
Q: Should documenting all methods be mandatory or optional?
A: Mandatory.
Q: Is there a rule for which methods to document?
A: All of them.
Q: What are pros and cons of requiring every method to be documented?
A: Pros: Smart people can spend time focusing on code writing, not code figuring-out. Code is well explained. Code can be passed to newbies. Cons: Whining. Stale comments.
A focus on quality commenting obviates the 'code is self-documenting' issues.
In the case of getters and setters, not every get and set is trivial. Sometimes it is, that's great. When it isn't, the comment should note the information. It's better to be conservative and always have comments than unconservative and have to scrap code and waste time figuring it out.
Final example: The Carmack Inverse Square Root code. Self-documenting, eh?
The problem:
You have some data and your program needs specified input. For example strings which are numbers. You are searching for a way to transform the original data in a format you need.
And the problem is: The source can be anything. It can be XML, property lists, binary which
contains the needed data deeply embedded in binary junk. And your output format may vary
also: It can be number strings, float, doubles....
You don't want to program. You want routines which gives you commands capable to transform the data in a form you wish. Surely it contains regular expressions, but it is very good designed and it offers capabilities which are sometimes much more easier and more powerful.
ADDITION:
Many users have this problem and hope that their programs can convert, read and write data which is given by other sources. If it can't, they are doomed or use programs like business
intelligence. That is NOT the problem.
I am talking of a tool for a developer who knows what is he doing, but who is also dissatisfied to write every time routines in a regular language. A professional data manipulation tool, something like a hex editor, regex, vi, grep, parser melted together
accessible by routines or a REPL.
If you have the spec of the data format, you can access and transform the data at once. No need to debug or plan meticulously how to program the transformation. I am searching for a solution because I don't believe the problem is new.
It allows:
joining/grouping/merging of results
inserting/deleting/finding/replacing
write macros which allows to execute a command chain repeatedly
meta-grouping (lists->tables->n-dimensional tables)
Example (No, I am not looking for a solution to this, it is just an example):
You want to read xml strings embedded in a binary file with variable length records. Your
tool reads the record length and deletes the junk surrounding your text. Now it splits open
the xml and extracts the strings. Being Indian number glyphs and containing decimal commas instead of decimal points, your tool transforms it into ASCII and replaces commas with points. Now the results must be stored into matrices of variable length....etc. etc.
I am searching for a good language / language-design and if possible, an implementation.
Which design do you like or even, if it does not fulfill the conditions, wouldn't you want to miss ?
EDIT: The question is if a solution for the problem exists and if yes, which implementations are available. You DO NOT implement your own sorting algorithm if Quicksort, Mergesort and Heapsort is available. You DO NOT invent your own text parsing
method if you have regular expressions. You DO NOT invent your own 3D language for graphics if OpenGL/Direct3D is available. There are existing solutions or at least papers describing the problem and giving suggestions. And there are people who may have worked and experienced such problems and who can give ideas and suggestions. The idea that this problem is totally new and I should work out and implement it myself without background
knowledge seems for me, I must admit, totally off the mark.
UPDATE:
Unfortunately I had less time than anticipated to delve in the subject because our development team is currently in a hot phase. But I have contacted the author of TextTransformer and he kindly answered my questions.
I have investigated TextTransformer (http://www.texttransformer.de) in the meantime and so far I can see it offers a complete and efficient solution if you are going to parse character data.
For anyone who will give it a try to implement a good parsing language, the smallest set of operators to directly transform any input data to any output data if (!) they were powerful enough seems to be:
Insert/Remove: Self-explaining
Group/Ungroup: Split the input data into a set of tokens and organize them into groups
and supergroups (datastructures, lists, tables etc.)
Transform
Substituition: Change the content of the tokens (special operation: replace)
Transposition: Change the order of tokens (swap,merge etc.)
Have you investigated TextTransformer?
I have no experience with this, but it sounds pretty good and the author makes quite competent posts in the comp.compilers newsgroup.
You still have to some programming work.
For a programmer, I would suggest:
Perl against a SQL backend.
For a non-programmer, what it sounds like you're looking for is some sort of business intelligence suite.
This suggestion may broaden the scope of your search too much... but here it is:
You could either reuse, as-is, or otherwise get "inspiration" from the [open source] code of the SnapLogic framework.
Edit (answering the comment on SnapLogic documentation etc.)
I agree, the SnapLogic documentation leaves a bit to be desired, in particular for people in your situation, i.e. when just trying to quickly get an overview of what SnapLogic can do, and if it would generally meet their needs, without investing much time and learn the system in earnest.
Also, I realize that the scope and typical uses of of SnapLogic differ, somewhat, from the requirements expressed in the question, and I should have taken the time to better articulate the possible connection.
So here goes...
A salient and powerful feature of SnapLogic is its ability to [virtually] codelessly create "pipelines" i.e. processes made from pre-built components;
Components addressing the most common needs of Data Integration tasks at-large are supplied with the SnapLogic framework. For example, there are components to
read in and/or write to files in CSV or XML or fixed length format
connect to various SQL backends (for either input, output or both)
transform/format [readily parsed] data fields
sort records
join records for lookup and general "denormalized" record building (akin to SQL joins but applicable to any input [of reasonnable size])
merge sources
Filter records within a source (to select and, at a later step, work on say only records with attribute "State" equal to "NY")
see this list of available components for more details
A relatively weak area of functionality of SnapLogic (for the described purpose of the OP) is with regards to parsing. Standard components will only read generic file formats (XML, RSS, CSV, Fixed Len, DBMSes...) therefore structured (or semi-structured?) files such as the one described in the question, with mixed binary and text and such are unlikely to ever be a standard component.
You'd therefore need to write your own parsing logic, in Python or Java, respecting the SnapLogic API of course so the module can later "play nice" with the other ones.
BTW, the task of parsing the files described could be done in one of two ways, with a "monolithic" reader component (i.e. one which takes in the whole file and produces an array of readily parsed records), or with a multi-component approach, whereby an input component reads in and parse the file at "record" level (or line level or block level whatever this may be), and other standard or custom SnapLogic components are used to create a pipeline which effectively expresses the logic of parsing a record (or block or...) into its individual fields/attributes.
The second approach is of course more modular and may be applicable if the goal is to process many different files format, whereby each new format requires piecing together components with no or little coding. Whatever the approach used for the input / parsing of the file(s), the SnapLogic framework remains available to create pipelines to then process the parsed input in various fashion.
My understanding of the question therefore prompted me to suggest SnapLogic as a possible framework for the problem at hand, because I understood the gap in feature concerning the "codeless" parsing of odd-formatted files, but also saw some commonality of features with regards to creating various processing pipelines.
I also edged my suggestion, with an expression like "inspire onself from", because of the possible feature gap, but also because of the relative lack of maturity of the SnapLogic offering and its apparent commercial/open-source ambivalence.
(Note: this statement is neither a critique of the technical maturity/value of the framework per-se, nor a critique of business-oriented use of open-source, but rather a warning that business/commercial pressures may shape the offering in various direction)
To summarize:
Depending on the specific details of the vision expressed in the question, SnapLogic may be worthy of consideration, provided one understands that "some-assembly-required" will apply, in particular in the area of file parsing, and that the specific shape and nature of the product may evolve (but then again it is open source so one can freeze it or bend it as needed).
A more generic remark is that SnapLogic is based on Python which is a very swell language for coding various connectors, convertion logic etc.
In reply to Paul Nathan you mentioned writing throwaway code as something rather unpleasant. I don't see why it should be so. After all, all of our code will be thrown away and replaced eventually, no matter how perfect we wrote it. So my opinion is that writing throwaway code is pretty much ok, if you don't spend too much time writing it.
So, it seems that there are two approaches to solving your solution: either a) find some specific tool intended for the purpose (parse data, perform some basic operations on it and storing it in some specific structure) or b) use some general purpose language with lots of libraries and code it yourself.
I don't think that approach a) is viable because sooner or later you'll bump into an obstacle not covered by the tool and you'll spend your time and nerves hacking the tool, or mailing the authors and waiting for them to implement what you need. I might as well be wrong, so please if you find a perfect tool, drop here a link (I myself am doing lots of data processing in my day job and I can't swear that I couldn't do it more efficiently).
Approach b) may at first seem "unpleasant", but given a nice high-level expressive language with bunch of useful libraries (regexps, XML manipulation, creating parsers...) it shouldn't be too hard, and may be gradually turned into a DSL for the very purpose. Beside Perl which was already mentioned, Python and Ruby sound like good candidates for these languages (I bet some Lisp derivatives too, but I have no experience there).
You might find AntlrWorks useful if you go so far as defining formal grammars for what you're parsing.
I have been reading over design-by-contract posts and examples, and there is something that I cannot seem to wrap my head around. In all of the examples I have seen, DbC is used on a trivial class testing its own state in the post-conditions (e.g. lots of Bank Accounts).
It seems to me that most of the time when you call a method of a class, it does much more work delegating method calls to its external dependencies. I understand how to check for this in a Unit-Test with specific scenarios using dependency inversion and mock objects that focus on the external behavior of the method, but how does this work with DbC and post-conditions?
My second question has to deal with understanding complex post-conditions. It seems to me that to write out a post-condition for many functions, that you basically have to re-write the body of the function for your post-condition to know what the new state is going to be. What is the point of that?
I really do like the notion of DbC and I think that it has great promise, particularly if I can figure out how to reproduce some failure state once I find a validated contract. Over the past couple of hours I have been reading some neat stuff wrt. automatic test generation in Eiffel. I am currently trying to improve my processes in C++ development, but I am open to learning something new if I can figure out how to not lose all of the ground I have made in my current projects. Thanks.
but how does this work with DbC and
post-conditions?
Every function is basically one of these:
A sequence of statements
A conditional statement
A loop
The idea is that you should check any postconditions about the results of the function that go beyond the union of the postconditions of all the functions called.
that you basically have to re-write
the body of the function for your
post-condition to know what the new
state is going to be
Think about it the other way round. What made you write the function in the first place? What were you pursuing? Can that be expressed in a postcondition which is more simple than the function body itself? A postcondition will typically use queries (what in C++ are const functions), while the body usually combines commands and queries (methods that modify the object and methods which only get information from it).
In some cases, yes, you will find out that you can really add little value with postconditions. In these cases, writing a bunch of tests will typically be enough.
See also:
Bertrand Meyer, Contract Driven
Development
Related questions 1, 2
Delegation at the contract level
most of the time when you call a
method of a class, it does much more
work delegating method calls to its
external dependencies
As for this first question: the implementation of a function/method may call many other function/methods, but if the designer of the code had a clear mind, this does not imply that the specification of the caller is the concatenation of the specifications of the callees. For a method that calls many others, the size of the specification can remain contained if the method accomplishes a precise and well-defined task. Which it should if the whole system was well designed.
You are clearly asking your question from the point of view of run-time assertion checking. In this context, the above would perhaps be expressed as "you don't need to re-check in the post-condition of the caller that all the callees have respected their respective contracts. These checks will already be made on each call. In the post-condition of the caller, only check the functionally visible result of the caller."
Understanding complex post-conditions
You may find this "ACSL by example" document interesting (although probably different from what you're used to). It contains many examples of formal contracts for C functions. The language of the contracts is intended for static verification instead of run-time checking, with all the advantages and the drawbacks that it entails. They are a little more sophisticated than the "Bank Accounts" that you mention — these functions implement real algorithms, although simple ones. The document keeps the contracts short and readable by introducing well-thought-out auxiliary predicates (which would be called queries in Eiffel, as Daniel points out in his answer).
I recently expressed my view about this elsewhere* , but I think it deserves further analysis so I'm posting this as its own question.
Let's say that I need to create and pass around a container in my program. I probably don't have a strong opinion about one kind of container versus another, at least at this stage, but I do pick one; for sake of argument, let's say I'm going to use a List<>.
The question is: Is it better to write my methods to accept and return a high level interface such as C#'s IEnumerable? Or should I write methods to take and pass the specific container class that I have chosen.
What factors and criteria should I look for to decide? What kind of programs work benefit from one or the other? Does the computer language affect your decision? Performance? Program size? Personal style?
(Does it even matter?)
**(Homework: find it. But please post your answer here before you look for my own, so as not bias you.)*
Your method should always accept the least-specific type it needs to execute its function. If your method needs to enumerate, accept IEnumerable. If it needs to do IList<>-specific things, by definition you must give it a IList<>.
The only thing that should affect your decision is how you plan to use the parameter. If you're only iterating over it, use IEnumerable<T>. If you are accessing indexed members (eg var x = list[3]) or modifying the list in any way (eg list.Add(x)) then use ICollection<T> or IList<T>.
There is always a tradeoff. The general rule of thumb is to declare things as high up the hierarchy as possible. So if all you need is access to the methods in IEnumerable then that is what you should use.
Another recent example of a SO question was a C API that took a filename instead of a File * (or file descriptor). There the filename severly limited what sores of things could be passed in (there are many things you can pass in with a file descriptor, but only one that has a filename).
Once you have to start casting you have either gone too high OR you should be making a second method that takes a more specific type.
The only exception to this that I can think of is when speed is an absolute must and you do not want to go through the expense of a virtual method call. Declaring the specific type removes the overhead of virtual functions (will depend on the language/environment/implementation, but as a general statement that is likely correct).
It was a discussion with me that prompted this question, so Euro Micelli already knows my answer, but here it is! :)
I think Linq to Objects already provides a great answer to this question. By using the simplest interface to a sequence of items it could, it gives maximum flexibility about how you implement that sequence, which allows lazy generation, boosting productivity without sacrificing performance (not in any real sense).
It is true that premature abstraction can have a cost - but mainly it is the cost of discovering/inventing new abstractions. But if you already have perfectly good ones provided to you, then you'd be crazy not to take advantage of them, and that is what the generic collection interfaces provides you with.
There are those who will tell you that it is "easier" to make all the data in a class public, just in case you will need to access it. In the same way, Euro advised that it would be better to use a rich interface to a container such as IList<T> (or even the concrete class List<T>) and then clean up the mess later.
But I think, just as it is better to hide the data members of a class that you don't want to access, to allow you to modify the implementation of that class easily later, so you should use the simplest interface available to refer to a sequence of items. It is easier in practice to start by exposing something simple and basic and then "loosen" it later, than it is to start with something loose and struggle to impose order on it.
So assume IEnumerable<T> will do to represent a sequence. Then in those cases where you need to Add or Remove items (but still don't need by-index lookup), use IContainer<T>, which inherits IEnumerable<T> and so will be perfectly interoperable with your other code.
This way it will be perfectly clear (just from local examination of some code) precisely what that code will be able to do with the data.
Small programs require less abstraction, it is true. But if they are successful, they tend to become big programs. This is much easier if they employ simple abstractions in the first place.
It does matter, but the correct solution completely depends on usage. If you only need to do a simple enumeration then sure use IEnumerable that way you can pass any implementer to access the functionality you need. However if you need list functionality and you don't want to have to create a new instance of a list if by chance every time the method is called the enumerable that was passed wasn't a list then go with a list.
I answered a similar C# question here. I think you should always provide the simplest contract you can, which in the case of collections in my opinion, ordinarily is IEnumerable Of T.
The implementation can be provided by an internal BCL type - be it Set, Collection, List etcetera - whose required members are exposed by your type.
Your abstract type can always inherit simple BCL types, which are implemented by your concrete types. This in my opinion allows you to adhere to LSP easier.