"Convenience" Functions - language-agnostic

Several parts of my library come with "convenience" functions. For example, a container class might have a function to parse information from a string. These functions are not necessarily needed (or wanted) all the time, so I'd like to put them in separate files so they can be included or left out according to the users' needs.
How should this be structured? Should I put all the "convenience" stuff in header files in a separate folder? Or perhaps it belongs in a completely separate library...?
How do big libraries (like Boost) handle this sort of thing? Or do they just avoid it altogether?

"Should" is a word which tends to provoke religious responses, but I think you'll be better off thinking about this as though you were a user of your library.
How would you want it to be structured? Everything in one api so you can find it, or scattered around the classpath?
Is there any real reason to even consider putting (for example) your container class's "parseString" method anywhere else than in the container class?

It's common for library providers to organize their library into logical pieces, but then provide a way to include the entire library in one go (in C/C++, a single header file; in Ruby, a single include, &c.). This allows for good cohesion, and allows library users to include just the pieces they need, if they desire.

Related

When, why and how to use wrappers?

I'm talking about wrappers for third-party libraries. Until recently I was trying to provide a general enough wrapper so I could easily switch libraries if needed. This however proved to be nearly impossible since libraries can vary greatly even in terms of how basic concepts are handled.
So the question came to me why one should use wrappers at all. (In the past I have been encouraged by experienced coders to write wrappers for 3rd-party libs.) I came to the following conclusions; please tell me if they are wrong or if you have anything to add.
If the library isn't widely used in the application (e.g. used by only one or two classes), don't write a wrapper at all, just use it directly. (Especially if it's a portable lib.)
When you do write wrappers don't think you can make one-size-fit-all wrapper. Write something appropriate for the strengths of the lib.
... But in some cases you can still generalize the wrapper enough so that it'll be somewhat easier to switch libraries. (E.g.: most graphics libraries use images and fonts.)
Wrappers are useful for when the library offers more functionality than you need. You can hide the unneeded functionality in the wrapper.
In the case of C libs (if you're using C++), you can also write a wrapper to help you with automatic memory management.
What do you think are the (dis)advantages of using wrappers, and how should they be used properly?
I think you've hit the nail on the head, wrappers just to allow something to potentially be swapped out is a bad idea. The classic example is a database and who has actually ever had to switch from SQL to Oracle (I know people have, but how often and did having a wrapper really help?).
In my experience a wrapper only helps if it is hiding 2+ calls to the 3rd party component or api's into a single call that means something to the calling code (basically a facade pattern) or if it is wrapping the code and adding value / type conversion for the caller (an adapter pattern).
So the wrapper must provide a benefit here and now to the consumer, not a potential future benefit (to the system coder) that may never be needed.
Wrappers are powerfull if you want to test in isolation. For example my development system has no connection to my customers activedirectory that holds usernames and roles. so i have a UserInfoWrapper-Interface with two implementations: one that uses activedirectory and one with fake userdata used for development.
"All problems in computer science can be solved by another level of indirection" by Butler Lampson
There is a cost involved in abstracting third party libraries by creating a wrapper. You need to decide whether the cost is worth it or not. For e.g. it is extremely difficult (or at least involves significant development cost) to create wrapper on UI toolkits or libraries. On the contrary it is relatively easy to create wrappers for third party logging libraries.
Wrapper can also be used to provide domain specific and a simplified API on top of third party libary. Facade pattern could be of help (as Paul Hadfield has mentioned above).

How do you organize your code?

I'm not referring to the indentation or the directory structure but the actual file itself.
Do you arrange your members and methods alphabetically? Maybe in their order of use or order of complex logic (either ascending or descending)? Is there any rhyme to your madness?
I'm thinking about making the switch to alphabetically but some situations would just madden me:
var _height
var _properties
var _width
width and height should definitely be grouped together... but sometimes finding the right method in a larger file can be pretty disheartening.
What do you do?
I tend to be disorganized. I certainly don't arrange my variables alphabetically. And with C#, I am even less organized because some variables I use stuff_like_this, and with properties I might DoItLikeThis. It kind of drives me mad at times, but in the end I like letting the IDE features do the work for me. Visual Studio is incredibly nice, and being able to just right click on a variable to go to its definition, or see everywhere in the code that it's being used is downright awesome. I don't know what IDE you use, but hopefully it's got similar features. In the end, I care less about the organization and really need to worry if my design is fundamentally sound (and that part usually takes me a few tries to get right).
I would suggest not trying to apply any sweeping general rules to your code. Rather, on a case-by-case basis, organize it in a way that makes sense. In the example you give, purely alphabetical would not make sense. In other cases, it would.
Methods: public first, then private. Methods which are related (e.g. getHeight(), getWidth(), getArea()) are placed near each other. If there is a hierarchal relationship between methods, then high-level before the low level (e.g. getArea() uses getWidth() so it's placed before it)
variables: similar to functions, public first, then private. Grouped according to context.
Alphabetical? I don't like it. It can be a nightmare when reading / modifying the code. I cannot tell whether a function's name is sqrt() or calculateSqrt(), so I won't enjoy looking for it. If the functions are organized according to context, it will be much easier to find them.
Generally order the members by their nature (constants, fields, constructors, methods) and then by their visibility (private, protected, public).
For finding the right thing I rely a lot on a modern IDE's capabilities (e.g. Visual Studio + Resharper)
Functions are topologically sorted with the "root" at the bottom of the file.
Variables hardly matter -- if a single "unit" (function, class, etc.) has enough variables for their arrangement to mean anything, that's probably something that needs fixing in itself.
I organize my code is by type of the elements, meaning that all the methods are at one place (at the end of the file, that is), all the constructor in one (before methods), etc. The same goes for variables, constants, enums etc. Usually I place private variables first and others after them, but that's not a strict rule. Also by nature code related to each other often ends up at the same place, but that's just because it is easiest.
I don't bother keeping my code in alphabetical order as I do that easily in the Eclipse outline view. However, if the editor does not support this, then I might use alphabetical order. Well, at least for bigger files.

Class member order in source code

This has been asked before (question no. 308581), but that particular question and the answers are a bit C++ specific and a lot of things there are not really relevant in languages like Java or C#.
The thing is, that even after refactorization, I find that there is a bit of mess in my source code files. I mean, the function bodies are alright, but I'm not quite happy with the way the functions themselves are ordered. Of course, in an IDE like Visual Studio it is relatively easy to find a member if you remember how it is called, but this is not always the case.
I've tried a couple of approaches like putting public methods first but that the drawback of this approach is that a function at the top of the file ends up calling an other private function at the bottom of the file so I end up scrolling all the time.
Another approach is to try to group related methods together (maybe into regions) but obviously this has its limits as if there are many non-related methods in the same class then maybe it's time to break up the class to two or more smaller classes.
So consider this: your code has been refactored properly so that it satisfies all the requirements mentioned in Code Complete, but you would still like to reorder your methods for ergonomic purposes. What's your approach?
(Actually, while not exactly a technical problem, this is problem really annoys the hell out of me so I would be really grateful if someone could come up with a good approach)
Actually I totally rely on the navigation functionality of my IDE, i.e. Visual Studio. Most of the time I use F12 to jump to the declaration (or Shift-F12 to find all references) and the Ctrl+- to jump back.
The reason for that is that most of the time I am working on code that I haven't written myself and I don't want to spend my time re-ordering methods and fields.
P.S.: And I also use RockScroll, a VS add-in which makes navigating and scrolling large files quite easy
If you're really having problems scrolling and finding, it's possible you're suffering from god class syndrome.
Fwiw, I personally tend to go with:
class
{
#statics (if any)
#constructor
#destructor (if any)
#member variables
#properties (if any)
#public methods (overrides, etc, first then extensions)
#private (aka helper) methods (if any)
}
And I have no aversion to region blocks, nor comments, so make free use of both to denote relationships.
From my (Java) point of view I would say constructors, public methods, private methods, in that order. I always try to group methods implementing a certain interface together.
My favorite weapon of choice is IntelliJ IDEA, which has some nice possibilities to fold methods bodies so it is quite easy to display two methods directly above each other even when their actual position in the source file is 700 lines apart.
I would be careful with monkeying around with the position of methods in the actual source. Your IDE should give you the ability to view the source in the way you want. This is especially relevant when working on a project where developers can use their IDE of choice.
My order, here it comes.
I usually put statics first.
Next come member variables and properties, a property that accesses one specific member is grouped together with this member. I try to group related information together, for example all strings that contain path information.
Third is the constructor (or constructors if you have several).
After that follow the methods. Those are ordered by whatever appears logical for that specific class. I often group methods by their access level: private, protected, public. But I recently had a class that needed to override a lot of methods from its base class. Since I was doing a lot of work there, I put them together in one group, regardless of their access level.
My recommendation: Order your classes so that it helps your workflow. Do not simply order them, just to have order. The time spent on ordering should be an investment that helps you save more time that you would otherwise need to scroll up and down.
In C# I use #region to seperate those groups from each other, but that is a matter of taste. There are a lot of people who don't like regions. I do.
I place the most recent method I just created on top of the class. That way when I open the project, I'm back at the last method I'm developing. Easier for me to get back "in the zone."
It also reflected the fact that the method(which uses other methods) I just created is the topmost layer of other methods.
Group related functions together, don't be hard-pressed to put all private functions at the bottom. Likewise, imitate the design rationale of C#'s properties, related functions should be in close proximity to each other, the C# language construct for properties reinforces that idea.
P.S.
If only C# can nest functions like Pascal or Delphi. Maybe Anders Hejlsberg can put it in C#, he also invented Turbo Pascal and Delphi :-) D language has nested functions.
A few years ago I spent far too much time pondering this question, and came up with a horrendously complex system for ordering the declarations within a class. The order would depend on the access specifier, whether a method or field was static, transient, volatile etc.
It wasn't worth it. IMHO you get no real benefit from such a complex arrangement.
What I do nowadays is much simpler:
Constructors (default constructor first, otherwise order doesn't matter.)
Methods, sorted by name (static vs. non-static doesn't matter, nor abstract vs. concrete, virtual vs. final etc.)
Inner classes, sorted by name (interface vs. class etc. doesn't matter)
Fields, sorted by name (static vs. non-static doesn't matter.) Optionally constants (public static final) first, but this is not essential.
i pretty sure there was a visual studio addin that could re-order the class members in the code.
so i.e. ctors on the top of the class then static methods then instance methods...
something like that
unfortunately i can't remember the name of this addin! i also think that this addin was for free!
maybe someone other can help us out?
My personal take for structuring a class is as follows:
I'm strict with
constants and static fields first, in alpha order
non-private inner classes and enums in alpha order
fields (and attributes where applicable), in alpha order
ctors (and dtors where applicable)
static methods and factory methods
methods below, in alpha order, regardless of visibility.
I use the auto-formatting capabilities of an IDE at all times. So I'm constantly hitting Ctrl+Shift+F when I'm working. I export auto-formatting capabilities in an xml file which I carry with me everywhere.
It helps down the lane when doing merges and rebases. And it is the type of thing you can automate in your IDE or build process so that you don't have to make a brain cell sweat for it.
I'm not claiming MY WAY is the way. But pick something, configure it, use it consistently until it becomes a reflex, and thus forget about it.

How many classes should a programmer put in one file?

In your object-oriented language, what guidelines do you follow for grouping classes into a single file? Do you always give each class a seperate file? Do you put tightly coupled classes together? Have you ever specified a couple of implementations of an interface in one file? Do you do it based on how many lines of code the implementation might be or how "cluttered" it might look to the user of the class? Or would the user prefer to have everything on one place?
Personally, I suggest one class per file unless the secondary classes are private to the primary class in the file. For example, a nested class in C# would remain in the parent classes file, but utility classes that might be useful elsewhere get broken into their own file or even namespace.
The key is to understand your environment and where people will look for things. If there is an established methodology in place, think carefully before you upset it. If your coworkers expect that related, tightly bound classes will be in a single document, having to search for them could be annoying (although with modern IDEs it shouldn't be a problem).
An additional reason for breaking things into more files rather than less is version control. If you make a small change, it should change only a small file where possible. If you make a sweeping change, it is obvious looking at the logs because of all the files (and indirectly, classes) that were affected are noted.
I think best practices in all OO languages I have ever used is to have one class in one file. I believe some languages may require this but I am not sure of that fact. But I would say that one class per file, and the name of the file matching the name of the class (as well as the directory structure matching the package structure for the most part) is best-practice.
1 class = 2 files. An .h and a .c, you kids are so lucky :)
There is no hard and fast rule that must always be followed (unless a particular language enforces it). There are good reasons for having just one class, or having multiple classes in a file. And it does depend on the language.
In C# and Java people tend to stick to one file per class.
I'd say in C++ though I often put multiple classes in one file. Often those classes are small and very related. Eg. One class for each message in some communications protocol. In this case a file for each would mean a lot of files and actually make maintenance and reading of the code more difficult than if they were in one file.
In C++ the implementation of a class is separate from the class definition, so each class { /body/ } is smaller than in other language and that means classes are more conveniently sized for grouping together in one file.
In C++ if you're writing a library (eg the standard template library) , you can put all the classes in one file. Users only need to include the one header file and they get all the classes then, so its easier for them to work with.
There's a balance. The answer is whatever is most easy to comprehend and maintain. By default it makes sense to have one class per file, but there are plenty of cases when it's more practical to work with a related set of classes if they are defined in one file.
I put classes into the same file if they belong together, either for techinical or aesthetic reasons. For example, in an application that provides a plugin interface, the classes Plugin (base class for plugins) and PluginManager I would usually put together in the same file. However, if the file grows too big for my taste, I would split them into separate file.
I note that I write code mostly in Python at the moment, and this influences my design. Python is very flexible in how I get to divide stuff into modules, and has good tools for managing the name spaces of things. For example, I usually put all the code for an application in a Python module (a directory with __init__.py) and have the module import specific names from sub-modules. The API is then something like applib.PluginManager rather than applib.pluginstuff.PluginManager.
This makes it easy to move things around, which also allows me to not be so fussy when I am creating the design: I can always fix things later.
One class = one file. Always. Apart from when one class = multiple files in C#, or a class contains inner classes etc of course ;)
One per a file is our standard. The only exception is that for a class and it's typed collection we put those together.
Over time I've come to relize, that "small class" always tend to grow. And then you'll want to split them up, confusing everyone else on the team (and your self).
I try to keep one class per file (like most of the above), unless they are small classes. If there are a lot of them, I may split them into subjects otherwise. But usually I just keep them all in one file with code-folding in editors. For my private hacks, it just isn't worth the (minimal) effort to me.
One class per file seems to be the standard. This is the way that I usually do it as well.
There have been a few times where I've strayed away from this. Particularly when a smaller class is a member of another class. For example, when designing a data structure, I would likely implement a "node" class within the same file as the "bigstructure" class.
In your object-oriented language, what guidelines do you follow for grouping classes into a single file?
It depends. In team work I try to follow team standards; in solo work I tend more towards whatever-I-please.
In solo work, then ...
Do you always give each class a seperate file? Do you put tightly coupled classes together? Have you ever specified a couple of implementations of an interface in one file?
No. Sometimes. Yes.
Do you do it based on how many lines of code the implementation might be or how "cluttered" it might look to the user of the class? Or would the user prefer to have everything on one place?
It's mostly based on:
How easy is it to navigate? A huge long source file, with many classes, is more difficult.
How easy is it to edit? When editing multiple, short, related classes, it may be easier if they're all in one source file with a splitter than if they're in several source files, given that I run my text editor maximized showing one source file at a time.
I prefer 1 to 1 for classes unless the inner class will be entirely private.
Even then I usually break it out for ease of finding it and tracking changes in SVN.

What's the difference between data and code?

To take an example, consider a set of discounts available to a supermarket shopper.
We could define these rules as data in some standard fashion (lists of qualifying items, applicable dates, coupon codes) and write generic code to handle these. Or, we could write each as a chunk of code, which checks for the appropriate things given the customer's shopping list and returns any applicable discounts.
You could reasonably store the rules as objects, serialised into Blobs or stored in code files, so that each rule could choose its own division between data and code, to allow for future rules that wouldn't fit the type of generic processor considered above.
It's often easy to criticise code that mixes data in, via if statements that check for 6 different things that should be in a file or a database, but is there a rule that helps in the edge cases?
Or is this the point of Object Oriented design, to stop us worrying about the line between data and code?
To clarify, the underlying question is this: How would you code the above example? Is there a rule of thumb that made you decide what is data and what is code?
(Note: I know, code can be compiled, but in a world of dynamic languages and JIT compilation, even that is a blurry concept.)
Fundamentally, there is of course no difference between data and code, but for real software infrastructures, there can be a big difference. Apart from obvious things like, as you mentioned, compilation, the biggest issue is this:
Most sufficiently large projects are designed to produce "releases" that are one big bundle, produced in 3-month (or longer) cycles, tested extensively and cannot be changed afterwards except in tightly controlled ways. "Code" most definitely cannot be changed, so anything that does need to be changed has to be factored out and made "configuration data" so that changing it becomes palatable those whose job it is to ensure that a release works.
Of course, in most cases bad configuration data can break a release just as thoroughly as bad code, so the whole thing is largely an illusion - in reality it doesn't matter whether it's code or "configuration data" that changes, what matters is that the interface between the main system and the parts that change is narrow and well-defined enough to give you a good chance that the person who does the change understands all consequences of what he's doing.
This is already harder than most people think when it's really just a few strings and numbers that are configured (I've personally witnessed a production mainframe system crash because it had one boolean value set differently than another system it was talking to). When your "configuration data" contains complex logic, it's almost impossible to achieve. But the situation isn't going to be any better ust because you use a badly-designed ad hoc "rules configuration" language instead of "real" code.
This is a rather philosophical question (which I like) so I'll answer it in a philosophical way: with nothing much to back it up. ;)
Data is the part of a system that can change. Code defines behavior; the way in which data can change into new data.
To put it more accurately: Data can be described by two components: a description of what the datum is supposed to represent (for instance, a variable with a name and a type) and a value.
The value of the variable can change according to rules defined in code. The description does not change, of course, because if it does, we have a whole new piece of information.
The code itself does not change, unless requirements (what we expect of the system) change.
To a compiler (or a VM), code is actually the data on which it performs its operations. However, the to-be-compiled code does not specify behavior for the compiler, the compiler's own code does that.
It all depends on the requirement. If the data is like lookup data and changes frequently you dont really want to do it in code, but things like Day of the Week, should not chnage for the next 200 years or so, so code that.
You might consider changing your topic, as the first thing I thought of when I saw it, was the age old LISP discussion of code vs data. Lucky in Scheme code and data looks the same, but thats about it, you can never accidentally mix code with data as is very possible in LISP with unhygienic macros.
Data are information that are processed by instructions called Code. I'm not sure I feel there's a blurring in OOD, there are still properties (Data) and methods (Code). The OO theory encapsulates both into a gestalt entity called a Class but they are still discrete within the Class.
How flexible you want to make your code in a matter of choice. Including constant values (what you are doing by using if statements as described above) is inflexible without re-processing your source, whereas using dynamically sourced data is more flexible. Is either approach wrong? I would say it really depends on the circumstances. As Leppie said, there are certain 'data' points that are invariate, like the days of the week that can be hard coded but even there it may be advantageous to do it dynamically in certain circumstances.
In Lisp, your code is data, and your
data is code
In Prolog clauses are terms, and terms
are clauses.
The important note is that you want to separate out the part of your code that will execute the same every time, (i.e. applying a discount) from the part of your code which could change (i.e. the products to be discounted, or the % of the discount, etc.)
This is simply for safety. If a discount changes, you won't have to re-write your discount code, you'll only need to go into your discounts repository (DB, or app file, or xml file, or however you choose to implement it) and make a small change to a number.
Also, if the discount code is separated into an XML file, then you can give the entire application to a manager, and with sufficient instructions, they won't need to pester you whenever they want to change the discount rates.
When you mix in data and code, you are exponentially increasing the odds of breaking when anything changes. So, as leppie said, you need to extract the constantly changing parts, and put them in a separate place.
Huge difference. Data is a given to system while code is a part of system.
Wrong data is senseless: our code===handler is good and what you put that you take, it is not a trouble of system that you meant something else. But if code is bad - system is bad.
In example, let's consider some JSON, some bad code parser.js by me and let's say good V8. For my system bad parser.js is a code and my system works wrong. But for Google system my bad parser is data that no how says about quality of V8.
The question is very practical, no sophistic.
https://en.wikipedia.org/wiki/Systems_engineering tries to make good answer and money.
Data is information. It's not about where you decide to put it, be it a db, config file, config through code or inside the classes.
The same happens for behaviors / code. It's not about where you decide to put it or how you choose to represent it.
The line between data and code (program) is blurry. It's ultimately just a question of terminology - for example, you could say that data is everything that is not code. But, as you wrote, they can be happily mixed together (although usually it's better to keep them separate).
Code is any data which can be executed. Now since all data is used as input to some program at some point of time, it can be said that this data is executed by a program! Thus your program acts as a virtual machine for your data. Hence in theory there is no difference between data and code!
In the end what matters is software engineering/development considerations like performance, efficiency etc. For example data driven programs may not be as efficient as programs which have hard coded (and hence fragile) conditional statements. Hence I choose to define code as any data which can be efficiently executed and all else being plain data.
It's a tradeoff between flexibility and efficiency. Executable data (like XML rules) offers more flexibility (sometimes) while the same data/rules when coded as part of the application will run more efficiently but changing it frequently becomes cumbersome. In other words executable data is easy to deploy but is inefficient and vice-versa. So ultimately the decision rests with you - the software designer.
Please correct me if I wrong.
Relationship between code and data is as follows:
code after compiled to a program processes the data while execution
program can extract data, transform data, load data, generate data ...
Also
program can extract code, transform code, load code, generate code tooooooo...
Hence code without compiled or interperator is useless, data is always worth..., but code after compiled can do all the above activities....
For eg)
Sourcecontrolsystem process Sourcecodes
here source code itself is a code
Backupscripts process files
here files is a data and so on...
I would say that the distinction between data, code and configuration is something to be made within the context of a particular component. Sometimes it's obvious, sometimes less so.
For example, to a compiler, the source code it consumes and the object code it creates are both data - and should be separated from the compiler's own code.
In your case you seem to be describing the option of a particularly powerful configuration file, which can contain code. Much as, for example, the GIMP lets you 'configure' plugins using Scheme. As the developer of the component that reads this configuration, you would think of it as data. When working at a different level -- writing the configuration -- you would think of it as code.
This is a very powerful way of designing.
Applying this to the underlying question ("How would you code the above example?"), one option might be to adopt or design a high level Domain Specific Language (DSL) for specifying rules. At startup, or when first required, the server reads the rule and executes it.
Provide an admin interface allowing the administrator to
test a new rule file
replace the current configuration with that from a new rule file
... all of which would happen at runtime.
A DSL might be something as simple as a table parser or an XML parser, or it could be something as sophisticated as a scripting language. From C, it's easy to embed Python or Lua. From Java it's easy to embed Groovy or Clojure.
You could switch in compiled code at runtime, with clever linking or classloader tricks. This seems more difficult and less valuable than the embedded DSL option, in my opinion.
The best practical answer to this question I found is this:
Any class that needs to be serialized, now or in any foreseeable future, is data.
Everything else is code.
That's why, for example, Java's HashMap is data - although it has a lot of code, API methods and specific implementation (i.e., it might look as code at first glance).