In theory, source code should not contain hardcoded values beyond 0, 1 and the empty string. In practice, I find it very hard to avoid all hardcoded values when I am on very tight delivery schedules, so I end up using a few of them and feeling a little guilty.
How do you reconcile avoiding hardcoded values with tight delivery times?
To avoid hard-coding you should
use configuration files (put your values in XML or ini-like text files).
use database table(s) to store your values.
Of course, not all values qualify to be moved to a config file. For those, you should use constructs provided by the programming language, such as constants and enums.
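As a rough sketch of where that line might fall (the AppSettings class, file path and property keys below are hypothetical, not anything prescribed here): truly fixed values stay as language-level constants or enums, while deployment-specific values come from an ini-like .properties file.

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public final class AppSettings {
    // Values that never change at deployment time stay as constants/enums.
    public static final int MAX_RETRIES = 3;
    public enum LogLevel { DEBUG, INFO, WARN, ERROR }

    private final Properties props = new Properties();

    public AppSettings(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            props.load(in); // deployment-specific values live in the file
        }
    }

    public String serverUrl() {
        return props.getProperty("server.url", "http://localhost:8080");
    }
}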
I just saw an answer suggesting a "Constant Interface". With all due respect to the poster and the voters, this is not recommended. You can read more about that at:
http://en.wikipedia.org/wiki/Constant_interface
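For reference, the alternative usually recommended instead of a constant interface is a non-instantiable holder class, optionally combined with static imports. A minimal sketch, reusing the REST URL constant names that appear in another answer on this page:

public final class RestUrls {
    private RestUrls() { } // prevents instantiation

    public static final String DEBUG    = "/debug";
    public static final String SYSTEM   = "/system";
    public static final String GENERATE = "/generate";
}

// Callers either qualify the name (RestUrls.SYSTEM) or use a static import
// (import static com.example.RestUrls.SYSTEM; with your real package name).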
The assumption behind the question seems to me invalid.
For most software, configuration files are massively more difficult to change than source code. For widely installed software, this could easily be a factor of a million times more difficult: there could easily be that many files hanging around on user installations, of which you have little knowledge and over which you have no control.
Having numeric literals in the software is no different from having functional or algorithmic literals: it's just source code. It is the responsibility of any software that intends to be useful to get those values right.
Failing that, make them at least maintainable: well named and organised.
Making them configurable is the kind of last-ditch compromise you might be forced into if you are on a tight schedule.
This comes down to a little bit of planning; in most cases it is as simple as having a configuration file, or possibly a database table, that stores critical configuration items. I don't see any reason that you "have" to have hardcoded values, and offloading them to a configuration mechanism shouldn't take so much additional time that tight timelines become a valid excuse.
The problem with hardcoded values is that sometimes it's not obvious that particular code relies on them. For example, in Java it is possible to move all constants into a separate interface and group related constants into inner sub-interfaces. It's quite convenient and obvious. It's also easy to find where constants are used with IDE facilities ("find usages") and to change or refactor them.
Here's an example:
public interface IConstants {

    public interface URL {
        String ALL = "/**";
    }

    public interface REST_URL {
        String DEBUG = "/debug";
        String SYSTEM = "/system";
        String GENERATE = "/generate";
    }
}
Referencing is quite human readable: IConstants.REST_URL.SYSTEM
Most non-trivial enterprise-y projects will have some central concept of properties or configuration options, which already takes care of loading options from a file/database. In these cases, it's usually simple (as in, less than 5 minutes' work) to extend this to support the new propert(ies) you need.
If your project doesn't have one, then either:
It could benefit from one - in which case write it, taking values from a flat .properties file to start with. This shouldn't take more than an hour, tops, and is reusable for any config stuff in future
Even that hour would be a waste - in which case, you can still have a default value but allow it to be overridden by a system property (sketched at the end of this answer). This requires no infrastructure work and minimal time to implement in your class.
There's really no excuse for hardcoding values - it only saves you a few minutes at most, and if your project deadline is measured in minutes then you've got bigger problems than how to code for configurability.
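As a hedged sketch of the system-property fallback mentioned in the list above (the property name and default value are invented for illustration):

public class RetryPolicy {
    private static final int DEFAULT_MAX_RETRIES = 3;

    static int maxRetries() {
        // Running with -Dapp.maxRetries=5 overrides the default;
        // no configuration infrastructure is required.
        return Integer.getInteger("app.maxRetries", DEFAULT_MAX_RETRIES);
    }
}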
Admittedly, I hardcode a lot of stuff in my current hobby project. Configuration files are ridiculously easy to use instead (at least with Python, which comes with a great and simple .cfg parser); I just don't bother to use them because I am 99% confident that I will never have to change the values - and even if that assumption proved false, the project is small enough to refactor with reasonable effort. For anything larger or more important, however, I would never type if foo == "hardcoded bar", but rather if foo == cfg.bar (likely with a more meaningful name than cfg). cfg is a global singleton (yeah, I know...) which is fed the .cfg file at startup, and the next time some sentinel value changes, you change the configuration file and not the source.
With a dynamic/reflective language, you don't even need to change the part loading the .cfg when you add another value to it - make it populate the cfg object dynamically with all entries in the file (or use a hashmap, for that matter) and be done.
Two suggestions here:
First, if you are working on an embedded system using a language like C, simply work out a coding convention to use a #define for any string or constant. All the #defines should be categorized in a .h file. That should be enough - not too complex, but enough for maintainability. You don't need to scatter constants between the lines of code.
Second, if you are working on an application with access to a database, it is simple to keep all the constant values in the database. You just need a very simple interface API to do the retrieval.
With simple tricks, both methods can be extended to support multi-language feature.
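A minimal JDBC sketch of such a retrieval API; the app_constants table and its column names are assumptions, not something the answer specifies:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ConstantStore {
    private final Connection connection;

    public ConstantStore(Connection connection) {
        this.connection = connection;
    }

    // Returns the stored value for a key, or the supplied default if the key is absent.
    public String get(String key, String defaultValue) throws SQLException {
        String sql = "SELECT prop_value FROM app_constants WHERE prop_name = ?";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setString(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : defaultValue;
            }
        }
    }
}

A multi-language feature could then be a matter of adding a language column to the same table.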
For TDD you have to
Create a test that fail
Do the simplest thing that could possibly work to pass the test
Add more variants of the test and repeat
Refactor when a pattern emerges
With this approach you're supposed to cover all the cases (those that come to my mind, at least), but I wonder if I am being too strict here, and whether it is possible to "think ahead" to some scenarios instead of simply discovering them.
For instance, I'm processing a file, and if it doesn't conform to a certain format I am to throw an InvalidFormatException.
So my first test was:
@Test
void testFormat() {
    // empty doesn't do anything nor throw anything
    processor.validate("empty.txt");
    try {
        processor.validate("invalid.txt");
        assert false : "Should have thrown InvalidFormatException";
    } catch( InvalidFormatException ife ) {
        assert "Invalid format".equals( ife.getMessage() );
    }
}
I run it and it fails because it doesn't throw an exception.
So the next thing that comes to my mind is "do the simplest thing that could possibly work", so I write:
public void validate( String fileName ) throws InvalidFormatException {
    if( fileName.equals("invalid.txt") ) {
        throw new InvalidFormatException("Invalid format");
    }
}
Doh!! (Although the real code is a bit more complicated, I found myself doing something like this several times.)
I know that I will eventually have to add another file name and further tests that would make this approach impractical and force me to refactor to something that makes sense (which, if I understood correctly, is the point of TDD: to discover the patterns that usage unveils), but:
Q: am I taking the "do the simplest thing..." advice too literally?
I think your approach is fine, if you're comfortable with it. You didn't waste time writing a silly case and solving it in a silly way - you wrote a serious test for real desired functionality and made it pass in - as you say - the simplest way that could possibly work. Now - and into the future, as you add more and more real functionality - you're ensuring that your code has the desired behavior of throwing the correct exception on one particular badly-formatted file. What comes next is to make that behavior real - and you can drive that by writing more tests. When it becomes simpler to write the correct code than to fake it again, that's when you'll write the correct code. That assessment varies among programmers - and of course some would decide that time is when the first failing test is written.
You're using very small steps, and that's the most comfortable approach for me and some other TDDers. If you're more comfortable with larger steps, that's fine, too - but know you can always fall back on a finer-grained process on those occasions when the big steps trip you up.
Of course your interpretation of the rule is too literal.
It should probably sound like "Do the simplest potentially useful thing..."
Also, I think that when writing the implementation you should forget the body of the test which you are trying to satisfy. You should remember only the name of the test (which should tell you what it tests). In this way you will be forced to write code that is generic enough to be useful.
I too am a TDD newbie struggling with this question. While researching, I found this blog post by Roy Osherove that was the first and only concrete and tangible definition of "the simplest thing that could possibly work" that I have found (and even Roy admitted it was just a start).
In a nutshell, Roy says:
Look at the code you just wrote in your production code and ask yourself the following:
“Can I implement the same solution in a way that is ..”
“.. More hard-coded ..”
“.. Closer to the beginning of the method I wrote it in.. “
“.. Less indented (in as less “scopes” as possible like ifs, loops, try-catch) ..”
“.. shorter (literally less characters to write) yet still readable ..”
“… and still make all the tests pass?”
If the answer to one of these is “yes” then do that, and see all the tests still passing.
Lots of comments:
If validation of "empty.txt" throws an exception, you don't catch it.
Don't Repeat Yourself. You should have a single test function that decides whether validation does or does not throw the exception. Then call that function twice, with two different expected results (a sketch follows at the end of this answer).
I don't see any signs of a unit-testing framework. Maybe I'm missing them? But just using assert won't scale to larger systems. When you get a result from validation, you should have a way to announce to a testing framework that a given test, with a given name, succeeded or failed.
I'm alarmed at the idea that checking a file name (as opposed to contents) constitutes "validation". In my mind, that's a little too simple.
Regarding your basic question, I think you would benefit from a broader idea of what the simplest thing is. I'm also not a fundamentalist TDDer, and I'd be fine with allowing you to "think ahead" a little bit. This means thinking ahead to this afternoon or tomorrow morning, not thinking ahead to next week.
You missed point #0 in your list: know what to do. You say you are processing a file for validation purposes. Once you have specified what "validation" means (hint: do this before writing any code), you might have a better idea of how to a) write tests that, well, test the specification as implemented, and b) write the simplest thing.
If, e.g., validation is "must be XML", your test case is just some non-xml-conformant string, and your implementation is using an XML library and (if necessary) transform its exceptions into those specified for your "validation" feature.
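To make the "single check, two expectations" comment above concrete, here is one possible sketch; it reuses the processor and InvalidFormatException names from the question and is only one way to structure it:

private void assertValidation(String fileName, boolean expectValid) {
    try {
        processor.validate(fileName);
        assert expectValid : fileName + " should have thrown InvalidFormatException";
    } catch (InvalidFormatException ife) {
        assert !expectValid : fileName + " should not have thrown: " + ife.getMessage();
    }
}

@Test
void testEmptyFileIsAccepted() {
    assertValidation("empty.txt", true);
}

@Test
void testInvalidFileIsRejected() {
    assertValidation("invalid.txt", false);
}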
One thing of note to future TDD learners - the TDD mantra doesn't actually include "Do the simplest thing that could possibly work." Kent Beck's TDD Book has only 3 steps:
Red— Write a little test that doesn't work, and perhaps doesn't even compile at first.
Green— Make the test work quickly, committing whatever sins necessary in the process.
Refactor— Eliminate all of the duplication created in merely getting the test to work.
Although the phrase "Do the simplest thing..." is often attributed to Ward Cunningham, he actually asked a question, "What's the simplest thing that could possibly work?", but that question was later turned into a command - which Ward believes may confuse rather than help.
Edit: I can't recommend reading Beck's TDD Book strongly enough - it's like having a pairing session with the master himself, giving you his insights and thoughts on the Test Driven Development process.
Like a method should do one thing only, one test should test one thing (behavior) only. To address the example given, I'd write two tests, for instance, test_no_exception_for_empty_file and test_exception_for_invalid_file. The second could indeed be several tests - one per sort of invalidity.
The third step of the TDD process should be interpreted as "add a new variant of the test", not "add a new variant to the test". Indeed, a unit test should be atomic (test one thing only) and generally follows the triple-A pattern: Arrange - Act - Assert. And it's very important to verify that the test fails first, to ensure it is really testing something.
I would also separate the responsibility of reading the file from that of validating its content. That way, a test can pass a buffer to the validate() function, and the tests do not have to read files. Usually unit tests do not access the filesystem, because this slows them down.
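A hedged sketch of that separation; the class shape and the looksLikeExpectedFormat helper are illustrative placeholders, not the question's real format rules:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Processor {
    // Thin wrapper that touches the filesystem; not exercised by unit tests.
    public void validateFile(String fileName) throws IOException, InvalidFormatException {
        validate(new String(Files.readAllBytes(Paths.get(fileName))));
    }

    // Pure logic that tests can call with an in-memory string.
    public void validate(String content) throws InvalidFormatException {
        if (content.isEmpty()) {
            return; // empty input is accepted, as in the question
        }
        if (!looksLikeExpectedFormat(content)) {
            throw new InvalidFormatException("Invalid format");
        }
    }

    private boolean looksLikeExpectedFormat(String content) {
        return true; // hypothetical: the real format check goes here
    }
}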
Imagine I have a function with a bug in it:
Pseudo-code:
void Foo(LPVOID o)
{
    // implementation details omitted
}
The problem is the user passed null:
Object bar = null;
...
Foo(bar);
Then the function might crash due to an access violation; but it could also happen to work fine. The bug is that the function should have been checking for the invalid case of passing null, but it never did. It was never an issue because developers were trusted to know what they were doing.
If I now change the function to:
Pseudo-code:
void Foo(LPVOID o)
{
    if (o == null) throw new EArgumentNullException("o");
    // implementation details omitted
}
then people who were happily using the function, and happened not to get an access violation, will now suddenly begin seeing an EArgumentNullException.
Do I continue to let people use the function improperly, and create a new version of the function? Or do I fix the function to include the check it should have originally had?
So now the moral dilemma: do you ever add new sanity checks, safety checks, or assertions to existing code? Or do you declare the old function abandoned and write a new one?
Consider a bug so common that Microsoft had to fix it for developers:
MessageBox(GetDesktopWindow, ...);
You never, ever, ever want to make a window modal against the desktop. You'll lock up the system. Do you continue to let developers lock up the user's computer? Or do you change the function to:
MessageBox(HWND hWndParent, ...)
{
    if (hWndParent == GetDesktopWindow)
        throw new Exception("hWndParent cannot be the desktop window. Use NULL instead.");
    ...
}
In reality Microsoft changed the Window Manager to auto-fix the bad parameter:
MessageBox(HWND hWndParent, ...)
{
    if (hWndParent == GetDesktopWindow)
        hWndParent = 0;
    ...
}
In my made-up example there is no way to patch the function - if I wasn't given an object, I can't do what I need to do with it.
Do you risk breaking existing code by adding parameter validation? Do you let existing code continue to be wrong, getting incorrect results?
The problem is that not only are you fixing a bug, but you are changing the semantic signature of the method by introducing an error case.
From a software engineering perspective I would advocate that you try to specify methods as best as possible (for instance using pre and post-conditions) but once the method is out there, specification changes are a no-go (or at least you would have to check all occurrences of the method) and a new method would be better.
I'd keep the old function and simply have it emit a warning that notifies you of every (possibly) wrong use, and then I'd just kick the developer who used it wrongly until he uses it properly.
You cannot catch everything. What if someone wrote "MakeLocation("Ian Boyd", "is stupid");"? Would you create a new function or change the function to catch that? No, you would fire the developer (or at least punish him).
Of course this requires that you document what your function requires as input.
This is where having automated tests [unit testing, integration testing, automated functional testing] is great: they give you the power to change existing code with confidence.
When making changes like this I would suggest finding all usages and ensuring they are behaving how you believe they should.
I myself would make bug fixes to existing functions rather than duplicating them 99% of the time. If the fix changes behavior a lot and there are a lot of calls to the function, you need to be very sure of your change.
So go ahead and make your change, run your unit tests, then your automated functional tests. Fix any errors and you're golden!
If your code has a bug in it, you should do what you normally do when any bug is reported. One part of that is assessing the impacts of fixing and of not fixing it. Sometimes the right thing to do with a bug is to not fix it, because the behaviour it exposes has become accepted. Sometimes the cost of fixing it, or the inconvenience of releasing a fix outside the normal release cycle, stops you releasing the fix for a while. This isn't a moral dilemma, it's an economic question of costs and benefits. If you are disturbed at the thought of having known bugs in your published code, publish a known-bugs list.
One option none of the other respondents seems to have suggested is to wrap the buggy function in another function which imposes the new behaviour that you require. In a world where functions can run to many lines, it is sometimes less likely to introduce new bugs if you preserve a 99%-correct piece of code and address the change without modifying the existing code. Of course, this is not always possible.
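In Java-ish terms, a sketch of that wrapper idea might look like the following; the names are placeholders, and the question's Foo/LPVOID pseudo-code is mapped loosely onto an Object parameter:

public class Api {
    // Original behaviour, left untouched for existing callers.
    @Deprecated
    public void foo(Object o) {
        // ...implementation details omitted, as in the question...
    }

    // New entry point that imposes the missing precondition.
    public void fooChecked(Object o) {
        if (o == null) {
            throw new IllegalArgumentException("o must not be null");
        }
        foo(o);
    }
}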
Two choices:
Give the error checking version a new name and deprecate the old version (one version later have it start issuing warnings (compile time if possible, run time if necessary), two versions later remove it).
[not always possible] Place the newly introduced error check in such a way that it only triggers if the unmodified version would crash or produce undefined behavior. (So that users who were taking care in their code don't get any unpleasant surprises.)
It entirely depends on you, your codebase, and your users.
If you are Microsoft and you have a bug in your API that is used by millions of devs around the world, then you will probably want to just create a new function and update the docs on the old one. If you can, you would also want to update the compiler to give warnings as well. (Though even then you may be able to change the existing system; remember when MS switched VC to the C++ standard and you had to update all of your #include iostreams and add using stds to get simple, existing console apps working again?)
It basically depends on what the function is. If it is something basic that will have massive ripple effects, then it could break a lot of code. If it is just an ancillary function, then you may as well fix it. Of course if you are Microsoft and your other code depends on a bug in one of your functions, then you probably should fix it since that is just plain embarrassing to keep. If other devs rely on the bug (that you created), then you may have an obligation to the users to not break their code that you caused to be buggy.
If you are a small company or independent developer, then sure, go ahead and fix the function. If you only need to update yourself or a few people on the new usage then fixing it is the best solution, especially since it is not even a big deal because all it really requires is an added note to the docs for the function. eg do not pass NULL or an exception is thrown if hWnd is the desktop, etc.
Another option as a sort of compromise would be to create a wrapper function. You could create a small, inline function that checks the args and then calls the existing function. That way you don't really have to do much in the short term, and eventually, when people have moved to the new one, you can deprecate or even remove the old one, moving the code into the new one after the checks.
In most scenarios, it is better to fix a buggy function, particularly if you are merely adding argument checks as opposed to completely changing the behavior of the function. It is not really a good idea to facilitate (read: encourage) bad coding just because it would break some existing code (especially if the code is free!). Think about it: if someone is creating a new program, then they can do it right from the start instead of relying on a bug. If they are re-compiling an old program that depends on the bug, then they can just update the code. Again, it depends on how messy and convoluted the code is, how many people are affected, and whether or not they pay you, but it is quite common to have to update old code to, for example, initialize variables that hadn't been, or check for error codes, etc.
To sum up, in your specific example (given the information provided), you should just fix it.
Many people have argued about function size. They say that functions in general should be pretty short. Opinions vary from something like 15 lines to "about one screen", which today is probably about 40-80 lines.
Also, functions should always fulfill one task only.
However, there is one kind of function that frequently fails in both criteria in my code: Initialization functions.
For example, in an audio application, the audio hardware/API has to be set up, audio data has to be converted to a suitable format, and the object state has to be properly initialized. These are clearly three different tasks, and depending on the API this can easily span more than 50 lines.
The thing with init-functions is that they are generally only called once, so there is no need to re-use any of the components. Would you still break them up into several smaller functions, or would you consider big initialization functions to be OK?
I would still break the function up by task, and then call each of the lower level functions from within my public-facing initialize function:
void _init_hardware() { }
void _convert_format() { }
void _setup_state() { }

void initialize_audio() {
    _init_hardware();
    _convert_format();
    _setup_state();
}
Writing succinct functions is as much about isolating fault and change as keeping things readable. If you know the failure is in _convert_format(), you can track down the ~40 lines responsible for a bug quite a bit faster. The same thing applies if you commit changes that only touch one function.
A final point, I make use of assert() quite frequently so I can "fail often and fail early", and the beginning of a function is the best place for a couple of sanity-checking asserts. Keeping the function short allows you to test the function more thoroughly based on its more narrow set of duties. It's very hard to unit-test a 400 line function that does 10 different things.
If breaking a function into smaller parts makes the code better structured and/or more readable, do it no matter what the function does. It's not about the number of lines; it's about code quality.
I would still try to break up the functions into logical units. They should be as long or as short as makes sense. For example:
SetupAudioHardware();
ConvertAudioData();
SetupState();
Assigning them clear names makes everything more intuitive and readable. Also, breaking them apart makes it easier for future changes and/or other programs to reuse them.
In a situation like this I think it comes down to a matter of personal preference. I prefer to have functions do only one thing so I would split the initialization into separate functions, even if they are only called once. However, if someone wanted to do it all in a single function I wouldn't worry about it too much (as long as the code was clear). There are more important things to argue about (like whether curly braces belong on their own separate line).
If you have a lot of components that need to be plugged into each other, it can certainly be reasonably natural to have a large method - even if the creation of each component is refactored into a separate method where feasible.
One alternative to this is to use a Dependency Injection framework (e.g. Spring, Castle Windsor, Guice etc). That has definite pros and cons... while working your way through one big method can be quite painful, you at least have a good idea of where everything is initialized, and there's no need to worry about what "magic" might be going on. Then again, the initialization can't be changed after deployment (as it can with an XML file for Spring, for example).
I think it makes sense to design the main body of your code so that it can be injected - but whether that injection is via a framework or just a hard-coded (and potentially long) list of initialization calls is a choice which may well change for different projects. In both cases the results are hard to test other than by just running the application.
First, a factory should be used instead of an initialization function. That is, rather than have initialize_audio(), you have a new AudioObjectFactory (you can think of a better name here). This maintains separation of concerns.
However, be careful also not to abstract too early. Clearly you do have two concerns already: 1) audio initialization and 2) using that audio. Until, for example, you abstract the audio device to be initialized, or the way a given device may be configured during initialization, your factory method (audioObjectFactory.Create() or whatever) should really be kept to just one big method. Early abstraction serves only to obfuscate design.
Note that audioObjectFactory.Create() is not something that can be unit-tested. Testing it is an integration test, and until there are parts of it that can be abstracted, it will remain an integration test. Later on, you may find that you have multiple different factories for different configurations; at that point, it might be beneficial to abstract the hardware calls into an interface, so that you can create unit tests to ensure the various factories configure the hardware in a proper way.
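A sketch of what that later abstraction might look like; every name here is invented for illustration, and a fake AudioHardware implementation is what the unit tests would inject:

interface AudioHardware {
    void open(int sampleRate);
    void setBufferSize(int frames);
}

class AudioObject {
    private final AudioHardware hardware;
    AudioObject(AudioHardware hardware) { this.hardware = hardware; }
}

class AudioObjectFactory {
    private final AudioHardware hardware;
    AudioObjectFactory(AudioHardware hardware) { this.hardware = hardware; }

    AudioObject create() {
        hardware.open(44_100);        // initialize the device
        hardware.setBufferSize(512);  // configure it
        return new AudioObject(hardware);
    }
}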
I think it's the wrong approach to try and count the number of lines and determine functions based on that. For something like initialization code I often have a separate function for it, but mostly so that the Load or Init or New functions aren't cluttered and confusing. If you can separate it into a few tasks like others have suggested, then you can name it something useful and help organize. Even if you are calling it just once, it's not a bad habit, and often you find that there are other times when you may want to re-init things and can use that function again.
Just thought I'd throw this out there, since it hasn't been mentioned yet - the Facade Pattern is sometimes cited as an interface to a complex subsystem. I haven't done much with it myself, but the metaphors are usually something like turning on a computer (requires several steps), or turning on a home theater system (turn on TV, turn on receiver, turn down lights, etc...)
Depending on the code structure, it might be something worth considering to abstract away your large initialization functions. I still agree with meagar's point, though, that breaking down functions into _init_X(), _init_Y(), etc. is a good way to go. Even if you aren't going to reuse components from this code, on your next project, when you say to yourself, "How did I initialize that X-component?", it'll be much easier to go back and pick it out of the smaller _init_X() function than it would be to pick it out of a larger function, especially if the X-initialization is scattered throughout it.
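For illustration, a minimal Facade sketch along those lines (all class names are made up):

class AudioFacade {
    private final Driver driver = new Driver();
    private final FormatConverter converter = new FormatConverter();
    private final StateStore state = new StateStore();

    // One call hides the multi-step start-up sequence behind a simple interface.
    void startAudio() {
        driver.load();
        converter.prepare();
        state.reset();
    }
}

class Driver          { void load()    { /* ... */ } }
class FormatConverter { void prepare() { /* ... */ } }
class StateStore      { void reset()   { /* ... */ } }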
Function length is, as you tagged, a very subjective matter. However, a standard best-practice is to isolate code that is often repeated and/or can function as its own entity. For instance, if your initialization function is loading library files or objects that will be used by a specific library, that block of code should be modularized.
With that said, it's not bad to have an initialization method that's long, as long as it's not long because of lots of repeated code or other snippets that can be abstracted away.
To take an example, consider a set of discounts available to a supermarket shopper.
We could define these rules as data in some standard fashion (lists of qualifying items, applicable dates, coupon codes) and write generic code to handle these. Or, we could write each as a chunk of code, which checks for the appropriate things given the customer's shopping list and returns any applicable discounts.
You could reasonably store the rules as objects, serialised into Blobs or stored in code files, so that each rule could choose its own division between data and code, to allow for future rules that wouldn't fit the type of generic processor considered above.
It's often easy to criticise code that mixes data in, via if statements that check for 6 different things that should be in a file or a database, but is there a rule that helps in the edge cases?
Or is this the point of Object Oriented design, to stop us worrying about the line between data and code?
To clarify, the underlying question is this: How would you code the above example? Is there a rule of thumb that made you decide what is data and what is code?
(Note: I know, code can be compiled, but in a world of dynamic languages and JIT compilation, even that is a blurry concept.)
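To make the example concrete, here is a hedged sketch of the two extremes described above; the item names, percentages and types are invented:

import java.util.List;

// Rules as data: a generic evaluator interprets plain records
// (which could be loaded from a file or database).
class DataDrivenDiscount {
    final String qualifyingItem;
    final double percentOff;

    DataDrivenDiscount(String qualifyingItem, double percentOff) {
        this.qualifyingItem = qualifyingItem;
        this.percentOff = percentOff;
    }

    double discountFor(List<String> basket) {
        return basket.contains(qualifyingItem) ? percentOff : 0.0;
    }
}

// Rules as code: each rule is free-form logic behind a common interface.
interface DiscountRule {
    double discountFor(List<String> basket);
}

class BuyTwoSoapsGetTenPercent implements DiscountRule {
    public double discountFor(List<String> basket) {
        long soaps = basket.stream().filter("soap"::equals).count();
        return soaps >= 2 ? 10.0 : 0.0;
    }
}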
Fundamentally, there is of course no difference between data and code, but for real software infrastructures, there can be a big difference. Apart from obvious things like, as you mentioned, compilation, the biggest issue is this:
Most sufficiently large projects are designed to produce "releases" that are one big bundle, produced in 3-month (or longer) cycles, tested extensively, and not changed afterwards except in tightly controlled ways. "Code" most definitely cannot be changed, so anything that does need to be changed has to be factored out and made "configuration data", so that changing it becomes palatable to those whose job it is to ensure that a release works.
Of course, in most cases bad configuration data can break a release just as thoroughly as bad code, so the whole thing is largely an illusion - in reality it doesn't matter whether it's code or "configuration data" that changes, what matters is that the interface between the main system and the parts that change is narrow and well-defined enough to give you a good chance that the person who does the change understands all consequences of what he's doing.
This is already harder than most people think when it's really just a few strings and numbers that are configured (I've personally witnessed a production mainframe system crash because it had one boolean value set differently than another system it was talking to). When your "configuration data" contains complex logic, it's almost impossible to achieve. But the situation isn't going to be any better just because you use a badly-designed ad hoc "rules configuration" language instead of "real" code.
This is a rather philosophical question (which I like) so I'll answer it in a philosophical way: with nothing much to back it up. ;)
Data is the part of a system that can change. Code defines behavior; the way in which data can change into new data.
To put it more accurately: Data can be described by two components: a description of what the datum is supposed to represent (for instance, a variable with a name and a type) and a value.
The value of the variable can change according to rules defined in code. The description does not change, of course, because if it does, we have a whole new piece of information.
The code itself does not change, unless requirements (what we expect of the system) change.
To a compiler (or a VM), code is actually the data on which it performs its operations. However, the to-be-compiled code does not specify behavior for the compiler, the compiler's own code does that.
It all depends on the requirement. If the data is lookup-type data that changes frequently, you don't really want to put it in code; but things like the days of the week should not change for the next 200 years or so, so code those.
You might consider changing your topic, as the first thing I thought of when I saw it was the age-old Lisp discussion of code vs. data. Luckily, in Scheme code and data look the same, but that's about it; you can never accidentally mix code with data, as is very possible in Lisp with unhygienic macros.
Data is information that is processed by instructions called Code. I'm not sure I feel there's a blurring in OOD; there are still properties (Data) and methods (Code). The OO theory encapsulates both into a gestalt entity called a Class, but they are still discrete within the Class.
How flexible you want to make your code is a matter of choice. Including constant values (what you are doing by using if statements as described above) is inflexible without re-processing your source, whereas using dynamically sourced data is more flexible. Is either approach wrong? I would say it really depends on the circumstances. As leppie said, there are certain 'data' points that are invariant, like the days of the week, that can be hard coded, but even there it may be advantageous to do it dynamically in certain circumstances.
In Lisp, your code is data, and your data is code.
In Prolog, clauses are terms, and terms are clauses.
The important point is that you want to separate out the part of your code that will execute the same way every time (i.e. applying a discount) from the part of your code which could change (i.e. the products to be discounted, the % of the discount, etc.).
This is simply for safety. If a discount changes, you won't have to re-write your discount code, you'll only need to go into your discounts repository (DB, or app file, or xml file, or however you choose to implement it) and make a small change to a number.
Also, if the discount code is separated into an XML file, then you can give the entire application to a manager, and with sufficient instructions, they won't need to pester you whenever they want to change the discount rates.
When you mix in data and code, you are exponentially increasing the odds of breaking when anything changes. So, as leppie said, you need to extract the constantly changing parts, and put them in a separate place.
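Since the answer mentions an XML file, here is one hedged sketch of that split using Java's built-in XML properties format; the file path, key and class names are assumptions:

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

class DiscountConfig {
    private final Properties props = new Properties();

    DiscountConfig(String xmlPath) throws IOException {
        try (FileInputStream in = new FileInputStream(xmlPath)) {
            props.loadFromXML(in); // e.g. <entry key="discount.rate">0.15</entry>
        }
    }

    double rate() {
        return Double.parseDouble(props.getProperty("discount.rate", "0.0"));
    }
}

class Checkout {
    double applyDiscount(double total, DiscountConfig config) {
        return total * (1.0 - config.rate()); // fixed logic, variable data
    }
}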
Huge difference. Data is a given to the system, while code is a part of the system.
Wrong data is harmless to the system: if our code (the handler) is good, then what you put in determines what you get out, and it is not the system's fault if you meant something else. But if the code is bad, the system is bad.
For example, consider some JSON, a bad parser.js written by me, and, say, the good V8 engine. For my system the bad parser.js is code, and my system works incorrectly. But to Google's system my bad parser is just data, which says nothing about the quality of V8.
The question is very practical, not sophistry.
Systems engineering (https://en.wikipedia.org/wiki/Systems_engineering) tries to give a good answer to it (and to make money doing so).
Data is information. It's not about where you decide to put it, be it a db, config file, config through code or inside the classes.
The same goes for behaviors/code. It's not about where you decide to put it or how you choose to represent it.
The line between data and code (program) is blurry. It's ultimately just a question of terminology - for example, you could say that data is everything that is not code. But, as you wrote, they can be happily mixed together (although usually it's better to keep them separate).
Code is any data which can be executed. Now since all data is used as input to some program at some point of time, it can be said that this data is executed by a program! Thus your program acts as a virtual machine for your data. Hence in theory there is no difference between data and code!
In the end what matters is software engineering/development considerations like performance, efficiency etc. For example data driven programs may not be as efficient as programs which have hard coded (and hence fragile) conditional statements. Hence I choose to define code as any data which can be efficiently executed and all else being plain data.
It's a tradeoff between flexibility and efficiency. Executable data (like XML rules) offers more flexibility (sometimes) while the same data/rules when coded as part of the application will run more efficiently but changing it frequently becomes cumbersome. In other words executable data is easy to deploy but is inefficient and vice-versa. So ultimately the decision rests with you - the software designer.
Please correct me if I am wrong.
The relationship between code and data is as follows:
Code, once compiled into a program, processes data during execution.
A program can extract data, transform data, load data, and generate data...
Also:
A program can extract code, transform code, load code, and generate code too...
Hence code without a compiler or interpreter is useless, while data always has worth; but code, once compiled, can do all of the above activities.
For example:
A source control system processes source code - here the thing being processed is itself code.
Backup scripts process files - here the things being processed are data, and so on...
I would say that the distinction between data, code and configuration is something to be made within the context of a particular component. Sometimes it's obvious, sometimes less so.
For example, to a compiler, the source code it consumes and the object code it creates are both data - and should be separated from the compiler's own code.
In your case you seem to be describing the option of a particularly powerful configuration file, which can contain code. Much as, for example, the GIMP lets you 'configure' plugins using Scheme. As the developer of the component that reads this configuration, you would think of it as data. When working at a different level -- writing the configuration -- you would think of it as code.
This is a very powerful way of designing.
Applying this to the underlying question ("How would you code the above example?"), one option might be to adopt or design a high level Domain Specific Language (DSL) for specifying rules. At startup, or when first required, the server reads the rule and executes it.
Provide an admin interface allowing the administrator to
test a new rule file
replace the current configuration with that from a new rule file
... all of which would happen at runtime.
A DSL might be something as simple as a table parser or an XML parser, or it could be something as sophisticated as a scripting language. From C, it's easy to embed Python or Lua. From Java it's easy to embed Groovy or Clojure.
You could switch in compiled code at runtime, with clever linking or classloader tricks. This seems more difficult and less valuable than the embedded DSL option, in my opinion.
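As a sketch of the embedded-DSL option using the standard javax.script API (it assumes a suitable script engine, such as Groovy's, is on the classpath, and the rule text and binding names are invented):

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

class RuleRunner {
    private final ScriptEngine engine =
            new ScriptEngineManager().getEngineByName("groovy"); // null if the engine is absent

    // Evaluates a rule loaded at runtime against the current basket total.
    double evaluate(String ruleScript, double basketTotal) throws ScriptException {
        engine.put("total", basketTotal);        // expose data to the rule
        Object result = engine.eval(ruleScript); // e.g. "total > 100 ? 0.10 : 0.0"
        return ((Number) result).doubleValue();
    }
}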
The best practical answer to this question I found is this:
Any class that needs to be serialized, now or in any foreseeable future, is data.
Everything else is code.
That's why, for example, Java's HashMap is data - although it has a lot of code, API methods and a specific implementation (i.e., it might look like code at first glance).
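Applied to the supermarket example from the question, that heuristic might be sketched like this (the names are invented): the serializable rule description is data, while the engine that interprets it is code.

import java.io.Serializable;
import java.util.List;

class DiscountRuleSpec implements Serializable { // data: gets stored and shipped around
    String qualifyingItem;
    double percentOff;
}

class DiscountEngine {                           // code: only ever executed, never serialized
    double apply(DiscountRuleSpec rule, List<String> basket) {
        return basket.contains(rule.qualifyingItem) ? rule.percentOff : 0.0;
    }
}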