How many variables are too much for a class? - language-agnostic

I want to see if anyone has a better design for a class (class as in OOP) I am writing. We have a script that puts shared folder stats in a CSV file. I am reading that in and putting it in a Share class.
My boss wants to know information like:
Total Number of Files
Total Size of Files
Number of Office Files
Size of Office Files
Number of Exe Files
Size of Exe Files
etc ....
I have a class with variables like $numOfficeFiles, $sizeOfficeFiles, etc. with a ton of get/set methods. Isn't there a better way to do this? What is the general rule if you have a class with a lot of variables/properties?
I think of this as a language agnostic question, but if it matters, I am using PHP.

Whenever I see more than 5 or 6 non-final variables in a class I get antsy.
Chances are that they should probably be placed in a smaller class as suggested by Outlaw Programmer. There's also a good chance it could just be placed in a hashtable.
Here's a good rule of thumb: If you have a variable that has nothing but a setter and a getter, you have DATA, not code--get it out of your class and place it into a collection or something.
Having a variable with a setter and a getter just means that either you never do anything with it (it's data) or the code that manipulates it is in another class (terrible OO design, move the variable to the other class).
Remember--every piece of data that is a class member is something you will have to write specific code to access; for instance, when you transfer it from your object to a control on a GUI.
I often tag GUI controls with a name so I can iterate over a collection and automatically transfer data from the collection to the screen and back, significantly reducing boilerplate code; storing the data as member variables makes this process much more complicated (requires reflection).

Sometimes, data can be just data:
files = {
'total': { count: 200, size: 3492834 },
'office': { count: 25, size: 2344 },
'exe': { count: 30, size: 342344 },
...
}
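As a rough sketch of how such a structure could be filled from the CSV, here is a Python version. The column names `extension` and `size` and the extension-to-category mapping are assumptions for illustration, since the question does not show the real file layout:

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping from file extension to reporting category.
CATEGORIES = {"doc": "office", "xls": "office", "exe": "exe"}

def summarize(csv_text):
    """Return {category: {"count": n, "size": bytes}}, including a 'total' entry."""
    stats = defaultdict(lambda: {"count": 0, "size": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        size = int(row["size"])
        category = CATEGORIES.get(row["extension"].lower(), "other")
        # Every file counts towards the grand total and its own category.
        for key in ("total", category):
            stats[key]["count"] += 1
            stats[key]["size"] += size
    return dict(stats)

# Made-up sample input standing in for the real share report.
sample = "extension,size\ndoc,100\nexe,2000\nxls,50\ntxt,7\n"
files = summarize(sample)
```

Adding a new category then means adding an entry to the mapping, not a new pair of variables with getters and setters.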

"A class should do one thing, and do it well"
If you're not breaking this rule, then I'd say there aren't too many.
However it depends.
If by too many you mean hundreds, then you might want to break it into a data class and a collection, as shown in the edit below.
Then you have only one get/set operation; however, there are pros and cons to this "laziness".
EDIT:
On second glance, you have pairs of variables: Count and Size.
There should be another class, e.g. FileInfo, with a count and a size; now your first class just holds FileInfo objects.
You can also put the file type, e.g. "All", "Exe", ..., on the FileInfo class.
Now the parent class becomes a collection of FileInfo objects.
Personally, I think I'd go for that.
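A minimal Python sketch of that layout, assuming a `Share` parent class and a `record` method that are made up here for illustration:

```python
from dataclasses import dataclass

@dataclass
class FileInfo:
    file_type: str   # e.g. "All", "Exe", "Office"
    count: int = 0
    size: int = 0    # total size in bytes

    def add(self, file_size):
        self.count += 1
        self.size += file_size

class Share:
    """Parent class: just a collection of FileInfo objects keyed by type."""
    def __init__(self):
        self._infos = {}

    def record(self, file_type, file_size):
        # Create the FileInfo for this type on first use, then update it.
        info = self._infos.setdefault(file_type, FileInfo(file_type))
        info.add(file_size)

    def info(self, file_type):
        return self._infos[file_type]

share = Share()
share.record("Exe", 2000)
share.record("Office", 100)
share.record("Office", 50)
```

The parent class no longer knows which file types exist; it only stores whatever types it is given.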

I think the answer is "there's no such thing as too many variables."
But then, if this data is going to be kept for a while, you might just want to put it in a database and have your functions query the database.
I assume you don't want to recalculate all these values every time you're asked for them.

Each class' "max variables" count really is a function of what data makes sense for the class in question. If there are truly X different values for a class and all data is related, that should be your structure. It can be a bit tedious to create depending on the language being used, but I wouldn't say there is any "limit" that you shouldn't exceed. It is dictated by the purpose.

Sounds like you might have a ton of duplicate code. You want the # of files and the size of files for a bunch of different types. You can start with a class that looks like this:
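public class FileStats
{
    private int size;      // total size of the matching files
    private int numFiles;  // number of matching files

    public FileStats(String extension)
    {
        // logic to discover files and fill in size and numFiles goes here
    }
    public int getSize() { return size; }
    public int getNumFiles() { return numFiles; }
}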
Then, in your main class, you can have an array of all the file types you want, and a collection of these helper objects:
import java.util.HashMap;
import java.util.Map;

public class Statistics
{
    private static final String[] TYPES = { "exe", "doc", "png" };
    // keyed by file type, so a single getter can look up any type
    private final Map<String, FileStats> stats = new HashMap<String, FileStats>();

    public void collectStats()
    {
        stats.clear();
        for (String type : TYPES)
            stats.put(type, new FileStats(type));
    }
}
You can clean up your API by passing a parameter to the getter method:
public int getNumFiles(String type)
{
return stats.get(type).getNumFiles();
}

There is no "hard" limit. OO design does however have a notion of coupling and cohesion. As long as your class is loosely coupled and highly cohesive I believe that you are ok with as many members/methods as you need.

Maybe I didn't understand the goal, but why do you load all the values into memory by using the variables, just to dump them to the csv file (when?). I'd prefer a stateless listener to the directory and writing values immediately to the csv.

I always try to think of a Class as being the "name of my container" or the "name of the task" that I am going to compute. Methods in the Class are "actions" part of the task.
In this case, it seems like you can start grouping things together; for example, you are repeating the number and size actions many times. Why not create a superclass that other classes inherit from? For example:
class NameOfSuperClass {
public $type;
function __construct($type) {
$this->type = $type;
$this->getNumber();
$this->getSize();
}
public function getNumber() {
// do something with the type and the number
}
public function getSize() {
// do something with the type and the size
}
}
class OfficeFiles extends NameOfSuperClass {
function __construct() {
parent::__construct("office");
}
}
I'm not sure if this is right in PHP, but you get my point. Things will start to look a lot cleaner and better to manage.

Just from what I glanced at:
If you keep an array with all of the file names in it, all of those variables can be computed on the fly.
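A small Python sketch of computing the figures on demand. It assumes the list also carries each file's size, which the answer does not spell out, and the sample data and `stats_for` helper are made up:

```python
import os

# Hypothetical sample data: (file name, size in bytes) pairs.
FILES = [("report.doc", 100), ("setup.exe", 2000), ("budget.xls", 50)]

def stats_for(files, extensions=None):
    """Count and total size of files, optionally restricted to some extensions."""
    matching = [
        size for name, size in files
        if extensions is None
        or os.path.splitext(name)[1].lstrip(".").lower() in extensions
    ]
    return len(matching), sum(matching)

office_count, office_size = stats_for(FILES, {"doc", "xls"})
```

Nothing is stored per category; each figure is derived from the one list whenever it is asked for.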

It's more of a readability issue.
I would wrap all the data into an array. And use just one pair of get/set methods.
Something like:
class Test
{
    private $DATA = array();
    function set($what, $data) {
        $this->DATA[$what] = $data;
    }
    function get($what) {
        return $this->DATA[$what];
    }
}

JSON: better define a type of object inside or outside the object?

Context
We are building a JSON API for web (HTML+JS) and mobile (iOS/Android/Windows).
The server needs to send data with a base structure and a variable structure. In our example, the base structure includes "name" and "description"; the variable structure is called "template" and has different fields depending on its type. We figured out at least three ways to write it (there may be more):
A: variable structure type defined outside the object
{
"id": "A001",
"name": "My First Game",
...,
"template_type": "BATTLE",
"template": {
...
}
}
In this scenario, the client should look at "template_type" in order to determine how to parse "template". The "template" object alone is not self-sufficient to know what it is.
B: variable structure type defined inside the object
{
"id": "A001",
"name": "My First Game",
...,
"template": {
"type": "BATTLE",
...
}
}
In this scenario, the client should look at "type" inside "template" in order to determine how to parse "template". The "template" object alone is self-sufficient to know what it is.
C: variable structure type defined by the key of the object
{
"id": "A001",
"name": "My First Game",
...,
"template_battle": {
...
}
}
In this scenario, the client should look at all keys ("template_battle", "template_puzzle", ...) in order to determine which type of game we have. The "template_battle" object alone is self-sufficient to know what it is, because it would always be the "BATTLE" type.
Question
Any recommendation on which JSON solution is the most client friendly for web and mobile to parse and use? (you can propose other solutions)
Personally, I would put the type on the template itself for a simple reason, that is encapsulation. Imagine you want to separate the creation of the template and the outside object (remember separation of concerns and the single responsibility principle (https://en.wikipedia.org/wiki/Single_responsibility_principle)). If the type is on the outside object, you will always have to specify the type of the template, to be able to create it. That's a possibility, but it increases coupling and violates encapsulation.
For further reading I recommend https://en.wikipedia.org/wiki/SOLID_(object-oriented_design) for the beginning.
I would recommend going with option A, for 2 simple reasons.
A > B because separate data and types:
It separates the type information from the data itself. By doing this, you would not have naming conflicts if, say, you wanted a template_type property associated with it. You could potentially simplify parsing: enumerate all the properties and set them on your custom object, without needing a special case to ignore the type property.
A > C because less-work:
Parsing the key string is more work. To find the template_* key in the first place, you would need to enumerate the properties and loop over them to find the one you want.
Ultimately, I think option A will give you the easiest method of parsing and using the data.
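To make the option A argument concrete, here is a small Python sketch. The parser functions and the `enemies` and `pieces` fields are hypothetical, since the question elides the template contents:

```python
import json

# Hypothetical parsers, one per template type.
def parse_battle(template):
    return ("battle", template["enemies"])

def parse_puzzle(template):
    return ("puzzle", template["pieces"])

PARSERS = {"BATTLE": parse_battle, "PUZZLE": parse_puzzle}

def parse_game_a(text):
    game = json.loads(text)
    # The type sits next to the template, so the template holds only data.
    parser = PARSERS[game["template_type"]]
    return game["id"], parser(game["template"])

doc = '{"id": "A001", "template_type": "BATTLE", "template": {"enemies": 3}}'
result = parse_game_a(doc)
```

The parser for each type receives a dictionary with no type field to skip over, but it can only be chosen by someone who has the enclosing object.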
Approach B would be much better, IMHO, simply because it gives the user a generic way to access the template's attributes without worrying about its type. The user can write the program against a generic template that includes the type as one of its own attributes.
For example, imagine you have an object type named Template which maps the JSON definition of a template to a Java object.
class Template {
String type;
String attribute1;
String attribute2;
......
......
}
By using approach B, you can map the JSON definition of that template directly to the Template object above. (In this case it is a Java object, but of course the concept works for any other programming language.) The user does not need any prior knowledge of the template type before accessing the template's definition. That's why it is said to be a more generic approach.
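The same idea as a Python sketch, assuming made-up `Battle` and `Puzzle` classes and fields. The point is that `parse_template` needs nothing but the template object itself:

```python
import json
from dataclasses import dataclass

@dataclass
class Battle:
    enemies: int

@dataclass
class Puzzle:
    pieces: int

# Hypothetical registry from the "type" value to the class to build.
TEMPLATE_CLASSES = {"BATTLE": Battle, "PUZZLE": Puzzle}

def parse_template(template):
    """Build the right object from a template dict alone (option B)."""
    fields = dict(template)          # copy so the caller's dict is untouched
    cls = TEMPLATE_CLASSES[fields.pop("type")]
    return cls(**fields)

game = json.loads('{"id": "A001", "template": {"type": "BATTLE", "enemies": 3}}')
template = parse_template(game["template"])
```

The one extra step compared with option A is removing the `type` key before handing the remaining fields to the class.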
I'd prefer your B variant over A and C.
However, you might also consider a structure like this:
{
"longDesc": "The Long description and other(?)necessary hints here",
"type": "template",
"ID": {
"A001": {
"name": "My First Game",
"type": "BATTLE"
/*more data here*/
},
"A002": {
"name": "My 2nd Game",
"type": "STRATEGY"
/*more data here*/
}
}
};
It might give a better feel in everyday use.
I would prefer B over the others, because it separates the concerns/data.
If you want to process only the template data, with B you can extract it in one step (e.g. Obj.template), which is not as easy with A.
Also, if you add more template types in the future, extracting the template data stays straightforward with B (e.g. Obj.template), but with C you need to write code like the following:
if (template_type == 'temp1') {
    template = Obj["template_temp1"];
}
if (template_type == 'temp2') {
    template = Obj["template_temp2"];
}
if (template_type == 'temp3') {
    template = Obj["template_temp3"];
}
or
you need to write code like template = Obj["template_" + Obj.template_type].
So I prefer B over the others.
B: It is easier to use a standalone json node
As other answers say, I would go for B for encapsulation reasons, but I will give another pragmatic reason: think about what a generic process would do, whether you develop it yourself or use a library. I will use Jackson (it seems possible to use it on Android).
If the type is outside the JsonNode you parse, you have to specify in your deserializer, for each property, where the type is. If it is inside the same node, you specify only once where the type is "inside the object", and it can be the same for many objects.
An additional argument: if you pass only the "Battle" object, it doesn't have a container, so there are no external properties to specify the type.
Another argument: at least one JS library uses this technique, ExtJS; see the "xtype" property in the documentation: http://docs.sencha.com/extjs/5.0.1/guides/getting_started/getting_started.html
So here is the node you want to parse with the right type:
{
"type": "BATTLE",
"aPropertyOfBattle":1
}
Here is the Jackson code for this:
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = As.PROPERTY, property = "type")
@JsonSubTypes({
@JsonSubTypes.Type(Battle.class),
//...all your types
})
public interface ICustomType {}
@JsonTypeName("BATTLE")
public class Battle implements ICustomType{
int aPropertyOfBattle;
// getters/setters...
}
Jackson provides a solution for "guessing" the type.
Full working code:
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = As.PROPERTY, property = "type")
@JsonSubTypes({
@JsonSubTypes.Type(Battle.class),
//...all your types
})
public interface ICustomType {}
@JsonTypeName("BATTLE")
public class Battle implements ICustomType{
int aPropertyOfBattle;
// getters setters...
}
public class BattleContainer {
private ICustomType template;
private String id;
private String name;
// getters/setters
}
public class BattleTest {
@Test
public void testBattle() throws IOException {
ObjectMapper objectMapper = new ObjectMapper();
ICustomType battle = objectMapper.readValue("{'type': 'BATTLE','aPropertyOfBattle':1}".replace('\'','"'),Battle.class );
Assert.assertTrue("Instance of battle",battle instanceof Battle);
Assert.assertEquals(((Battle)battle).getaPropertyOfBattle(),1);
}
@Test
public void testBattleContainer() throws IOException {
ObjectMapper objectMapper = new ObjectMapper();
BattleContainer battleContainer = objectMapper.readValue("{'id': 'A001','name': 'My First Game','template': {'type': 'BATTLE', 'aPropertyOfBattle':1}}"
.replace('\'','"'),BattleContainer.class );
Assert.assertTrue("Instance of battle",battleContainer.getTemplate() instanceof Battle);
Assert.assertEquals(((Battle)battleContainer.getTemplate()).getaPropertyOfBattle(),1);
}
}
Note that this is not Jackson-specific; you can also parse the node using a plain JsonNode in Java.
Edit: Since I give a technical solution, this may seem off-topic, so to be precise: the argument here is not that you should use Jackson, but that whatever language and library you choose, the "B" solution can be used in an elegant way.
D: An encapsulating node
Another solution is this one:
{
"BATTLE":{
"aPropertyOfBattle":1
}
}
It may be easier to parse: you get the property name, then you parse the sub-node using any tool (Gson or other...)
In jackson, the only difference is that you use include = As.WRAPPER_OBJECT
The drawback is that it is less logical to use in JavaScript, since you have a useless node in the middle of your structure.
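Without any library, parsing the wrapper form comes down to reading the single key. A minimal Python sketch, where the `unwrap` helper is made up for illustration:

```python
import json

def unwrap(node):
    """Split a single-key wrapper object into (type name, payload)."""
    if len(node) != 1:
        raise ValueError("expected exactly one key naming the type")
    (type_name, payload), = node.items()
    return type_name, payload

type_name, payload = unwrap(json.loads('{"BATTLE": {"aPropertyOfBattle": 1}}'))
```

The payload that comes back contains only the data of the object, with no type field mixed in.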
Other Jackson options
This library has other options for include = As....
As.WRAPPER_ARRAY
As.EXTERNAL_PROPERTY
WRAPPER_ARRAY is easier to parse too, but I don't find it elegant (this is totally subjective):
[
"BATTLE",
{
"aPropertyOfBattle":1
}
]
EXTERNAL_PROPERTY would be solution A, but as I said, you must specify it every time you use your variable rather than once in your type class (this lacks coherence to me, because you can use the "Battle" object in different contexts, with different type naming conventions):
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = As.EXTERNAL_PROPERTY, property = "template_type")
private ICustomType template;
Note again that I am inspired by Jackson's functionality, but this can be applied to every solution in every language.
I would recommend keeping the static and the dynamic structure in two different collections, as shown below.
Static structure: the dynamic part is referenced as an array of unique IDs (or of any other field you consider unique).
{
"id": "A001",
"name": "My First Game",
"description" : "GGWP noob",
...,
"template": ["temp1", "temp2", "temp3", "temp4"]
}
Dynamic structure: here you can serve the rest of the fields through a different API, since major functionality like searching and autocomplete might depend on them. It can also easily be referenced by the parent API.
{
"id" : "temp1",
"type": "BATTLE",
... //other features
}
This also allows faster searching, indexing, and good compression. Rather than traversing one large JSON response to search for the relevant tags, the dynamic structure helps reduce the overhead.
There are many other uses of this approach, but I have mentioned only the few that I think would help with your design.

Communication between visitor and visitee

My current project contains a complex object hierarchy. The following structure is a simplified example of this hierarchy for demonstration purposes:
Library
Category "Fiction"
Category "Science Fiction"
Book A (Each book contains pages, not displayed here)
Book B
Category "Crime"
Book C
Category "Non-fiction"
(Many subcategories)
Now, I want to avoid having nested loops all over my code whenever I need some information from the data structure, because when the structure changes I'd have to update all the loops.
So I plan on using the visitor pattern, which seems to give me the flexibility I need. It would look something like this:
class Library
{
void Accept(ILibraryVisitor visitor)
{
IterateCategories(this.categories, visitor);
}
void IterateCategories(
IEnumerable<Category> categorySequence,
ILibraryVisitor visitor)
{
foreach (var category in categorySequence)
{
visitor.VisitCategory(category.Name);
IterateCategories(category.Subcategories, visitor);
foreach (var book in category.Books)
{
// Could also pass in a book instance, not sure about that yet...
visitor.VisitBook(book.Title, book.Author, book.PublishingDate);
foreach (var page in book.Pages)
{
visitor.VisitPage(page.Number, page.Content);
}
}
}
}
}
interface ILibraryVisitor
{
void VisitCategory(string name);
void VisitBook(string title, string author, DateTime publishingDate);
void VisitPage(int pageNumber, string content);
}
I'm already seeing some possible problems though, so I'm hoping you can give me some advice.
Question 1
If I wanted to create a list of book titles prefixed by the (sub)categories it belongs to (e.g. Fiction » Science Fiction » Book A), a simple visitor implementation would appear to do the trick:
// LibraryVisitor is a base implementation with no-op methods
class BookListingVisitor : LibraryVisitor
{
private Stack<string> categoryStack = new Stack<string>();
void VisitCategory(string name)
{
this.categoryStack.Push(name);
}
// Other methods
}
Here I have already run into a problem: I have no clue on when to pop the stack, because I don't know when a category ends. Is it a common approach to split up the VisitCategory method into two methods, like below?
interface ILibraryVisitor
{
void VisitCategoryStart(string name);
void VisitCategoryEnd();
// Other methods
}
Or are there other ways of dealing with structures like this, which have a clear scope with a start and end?
Question 2
Suppose I only want to list the books that were published in 1982. A decorator visitor would separate the filtering from the listing logic:
class BooksPublishedIn1982 : LibraryVisitor
{
private ILibraryVisitor visitor;
public BooksPublishedIn1982(ILibraryVisitor visitor)
{
this.visitor = visitor;
}
void VisitBook(string title, string author, DateTime publishingDate)
{
if (publishingDate.Year == 1982)
{
this.visitor.VisitBook(title, author, publishingDate);
}
}
// Other methods that simply delegate to this.visitor
}
The problem here is that VisitPage will still be called for books that are not published in 1982. So the decorator somehow needs to communicate with the visited object:
Visitor: 'Hey, this book isn't from 1982, so please don't tell me anything about it.'
Library: 'Oh ok, then I won't show you its pages.'
The visit methods currently return void. I could change it to return a boolean which indicates whether to visit sub-items, but that feels kind of dirty. Are there common practices for letting the visitee know that it should skip certain items? Or perhaps I should look into a different design pattern?
P.S. If you think these should be two separate questions, just let me know and I'll be happy to split them up.
The Visitor pattern, as described by the GoF book, deals with class hierarchies and not with object hierarchies. To put it simply, adding a new Visitor type acts like adding a new virtual function to the base class and all the children, without touching their code.
The machinery of a Visitor consists of one Visitor::Visit function per class in the hierarchy, and the Accept function in the parent class and in all the descendants. It works by calling Accept(visitor) through a parent class reference. The implementation of Accept in the object that happens to be referenced calls the right kind of Visitor::Visit(this). It is all fully orthogonal to any object hierarchy that may exist between instances of different subclasses of our root class.
In your case, the ILibraryVisitor interface would have a VisitLibrary(Library) method, a VisitCategory(Category) method, a VisitBook(Book) method, and so on, while each of Library, Category, Book and so on would inherit a common base class and reimplement its Accept(ILibraryVisitor) method.
So far so good. But from this point on your implementation seems to get a bit disoriented. A Visitor does not call its own Visit functions! Members of the hierarchy do; the Visitor implements these functions for their benefit. So how do we go down the category tree?
Remember that calling Accept(FooVisitor) replaces the method Foo in the root of the hierarchy, and FooVisitor::VisitBar replaces the implementation of Bar::Foo. When we want to do something with an object, we call its methods, don't we? So let's do it (in pseudocode).
class LibraryVisitor : ILibraryVisitor
{
IterateChildren (List<ILibraryObject> objects) {
foreach obj in objects {
obj.Accept(this);
}
}
IterateSubcategories (Category cat) {
stack.push (cat); # we need a stack here to build a path
IterateChildren (cat.children); # both books and subcategories
stack.pop();
}
VisitLibrary (Library) = abstract
VisitCategory (Category) = abstract
VisitBook (Book) = abstract
VisitPage (Page) = abstract
}
class MyLibraryVisitor : LibraryVisitor {
VisitLibrary (Library l ) { ... IterateChildren (categories) ... }
VisitCategory (Category c) = { ... IterateSubcategories (c) ... }
VisitBook (Book) = { ... IterateChildren (pages) ... }
VisitPage (Page) = { ... no children here, end of walk ... }
}
Note the ping-pong action between Visit and Accept. Visitor calls Accept on the children of the current visitee, the children call Visitor::Visit back, and Visitor calls Accept on their children etc.
This is how your second question is answered:
class BooksPublishedIn1982 : LibraryVisitor
{
VisitBook (Book b) {
if b.publishedIn (1982) {
IterateChildren(b.pages)
}
}
}
Once again, it is apparent that the tree walk and the visitor machinery have just about nothing to do with each other.
I have left the decision of iterating or not iterating children entirely with each Visit implementation. This need not be the case, you can easily split each VisitXYZ into two functions, VisitXYZProper and VisitXYZChildren. By default, VisitXYZ will call both and each concrete visitor may override that decision.
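For readers who want something executable, here is a compact Python sketch of the same ping-pong. The class and method names are adapted from the pseudocode, and the sample library and publication years are invented for the example:

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

class Library(Node):
    def accept(self, visitor):
        visitor.visit_library(self)

class Category(Node):
    def accept(self, visitor):
        visitor.visit_category(self)

class Book(Node):
    def __init__(self, name, year, children=()):
        super().__init__(name, children)
        self.year = year

    def accept(self, visitor):
        visitor.visit_book(self)

class Page(Node):
    def accept(self, visitor):
        visitor.visit_page(self)

class LibraryVisitor:
    """Default visitor: walks the whole tree and does nothing else."""
    def iterate_children(self, node):
        for child in node.children:
            child.accept(self)       # the child calls back the matching visit_*

    def visit_library(self, library):
        self.iterate_children(library)

    def visit_category(self, category):
        self.iterate_children(category)

    def visit_book(self, book):
        self.iterate_children(book)

    def visit_page(self, page):
        pass

class BooksPublishedIn(LibraryVisitor):
    """Lists 'Category » Subcategory » Title' for books from one year."""
    def __init__(self, year):
        self.year = year
        self.path = []               # stack of enclosing category names
        self.titles = []
        self.pages_seen = 0

    def visit_category(self, category):
        self.path.append(category.name)
        self.iterate_children(category)
        self.path.pop()              # the category ends once its children are done

    def visit_book(self, book):
        if book.year == self.year:   # other books and their pages are skipped
            self.titles.append(" » ".join(self.path + [book.name]))
            self.iterate_children(book)

    def visit_page(self, page):
        self.pages_seen += 1

library = Library("Library", [
    Category("Fiction", [
        Category("Science Fiction", [
            Book("Book A", 1982, [Page("1"), Page("2")]),
            Book("Book B", 1990, [Page("1")]),
        ]),
        Category("Crime", [Book("Book C", 1982)]),
    ]),
])
visitor = BooksPublishedIn(1982)
library.accept(visitor)
```

Because the visitor drives the walk, the push and pop naturally bracket the recursion, and skipping a book's pages is just a matter of not iterating its children.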

How can I create a subclass that takes in different parameters for the same function name?

So I have made this simple interface:
package{
public interface GraphADT{
function addNode(newNode:Node):Boolean;
}
}
I have also created a simple class Graph:
package{
    public class Graph implements GraphADT{
        protected var nodes:LinkedList;
        public function Graph(){
            nodes = new LinkedList();
        }
        public function addNode (newNode:Node):Boolean{
            return nodes.add(newNode);
        }
    }
}
Last but not least, I have created another simple class, AdjacancyListGraph:
package{
    public class AdjacancyListGraph extends Graph{
        public function AdjacancyListGraph(){
            super();
        }
        override public function addNode(newNode:AwareNode):Boolean{
            return nodes.add(newNode);
        }
    }
}
Having this setup here is giving me errors, namely:
1144: Interface method addNode in namespace GraphADT is implemented with an incompatible signature in class AdjacancyListGraph.
Upon closer inspection, it was apparent that AS3 doesn't like the different parameter types in the two Graph classes: newNode:Node in Graph, and newNode:AwareNode in AdjacancyListGraph.
However, I don't understand why that would be a problem, since AwareNode is a subclass of Node.
Is there any way I can make my code work, while keeping the integrity of the code?
Simple answer:
If you don't really, really need your 'addNode()' function to accept only an AwareNode, you can just change the parameter type to Node. Since AwareNode extends Node, you can pass in an AwareNode without problems. You could check for type correctness within the function body :
subclass... {
override public function addNode (node:Node ) : Boolean {
if (node is AwareNode) return nodes.add(node);
return false;
}
}
Longer answer:
I agree with @32bitkid that you are getting an error because the parameter type defined for addNode() in your interface differs from the type in your subclass.
However, the main problem at hand is that ActionScript generally does not allow function overloading (having more than one method of the same name, but with different parameters or return values), because each function is treated like a generic class member - the same way a variable is. You might call a function like this:
myClass.addNode (node);
but you might also call it like this:
myClass["addNode"](node);
Each member is stored by name - and you can always use that name to access it. Unfortunately, this means that you are only allowed to use each function name once within a class, regardless of how many parameters of which type it takes - nothing comes without a price: You gain flexibility in one regard, you lose some comfort in another.
Hence, you are only allowed to override methods with the exact same signature - it's a way to make you stick to what you decided upon when you wrote the base class. While you could obviously argue that this is a bad idea, and that it makes more sense to use overloading or allow different signatures in subclasses, there are some advantages to the way that AS handles functions, which will eventually help you solve your problem: You can use a type-checking function, or even pass one on as a parameter!
Consider this:
class... {
protected function check (node:Node) : Boolean {
return node is Node;
}
public function addNode (node:Node) : Boolean {
if (check(node)) return nodes.add(node);
return false;
}
}
In this example, you could override check (node:Node):
subclass... {
override protected function check (node:Node) : Boolean {
return node is AwareNode;
}
}
and achieve the exact same effect you desired, without breaking the interface contract - except, in your example, the compiler would throw an error if you passed in the wrong type, while in this one, the mistake would only be visible at runtime (a false return value).
You can also make this even more dynamic:
class... {
public function addNode (node:Node, check : Function ) : Boolean {
if (check(node)) return nodes.add(node);
return false;
}
}
Note that this addNode function accepts a Function as a parameter, and that we call that function instead of a class method:
var f:Function = function (node:Node) : Boolean {
return node is AwareNode;
}
addNode (node, f);
This would allow you to become very flexible with your implementation - you can even do plausibility checks in the anonymous function, such as verifying the node's content. And you wouldn't even have to extend your class, unless you were going to add other functionality than just type correctness.
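Both variants (the overridable check method and the predicate passed as an argument) can be combined. Here is a language-neutral sketch in Python; the names are adapted for the example, and the optional `check` parameter is an assumption about how one might merge the two ideas:

```python
class Node:
    pass

class AwareNode(Node):
    pass

class Graph:
    def __init__(self):
        self.nodes = []

    def check(self, node):
        """Hook that subclasses override to narrow what they accept."""
        return isinstance(node, Node)

    def add_node(self, node, check=None):
        # An explicit predicate wins; otherwise fall back to the class hook.
        accept = check if check is not None else self.check
        if accept(node):
            self.nodes.append(node)
            return True
        return False

class AdjacencyListGraph(Graph):
    def check(self, node):
        return isinstance(node, AwareNode)

graph = AdjacencyListGraph()
added_aware = graph.add_node(AwareNode())
added_plain = graph.add_node(Node())
```

The signature of `add_node` is identical in both classes, so the interface contract holds, and the narrowing happens at runtime through the return value.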
Having an interface will also allow you to create implementations that don't inherit from the original base class - you can write a whole different class hierarchy, it only has to implement the interface, and all your previous code will remain valid.
I guess the question is really this: What are you trying to accomplish?
As to why you are getting an error, consider this:
public class AnotherNode extends Node { }
and then:
var alGraph:AdjacancyListGraph = new AdjacancyListGraph();
alGraph.addNode(new AnotherNode());
// Won't work. AnotherNode isn't compatible with the signature
// for addNode(node:AwareNode)
// but what about the contract?
var igraphADT:GraphADT = GraphADT(alGraph);
igraphADT.addNode(new AnotherNode()); // WTF?
According to the interface this should be fine. But your implementation says otherwise: it says that it will only accept an AwareNode. There is an obvious mismatch. If you are going to have an interface, a contract that your object should follow, then you might as well follow it. Otherwise, what's the point of the interface in the first place?
I submit that the architecture is messed up somewhere if you are trying to do this. Even if the language were to support it, I would say that it's a "Bad Idea™".
There's an easier way than those suggested above, but it is less safe:
public class Parent {
public function get foo():Function { return this._foo; }
protected var _foo:Function = function(node:Node):void { ... }}
public class Child extends Parent {
public function Child() {
super();
this._foo = function(node:AnotherNode):void { ... }}}
Of course, _foo need not be declared in place; the syntax used is for brevity and demonstration purposes only.
You will lose the compiler's ability to check types, but runtime type matching will still apply.
Yet another way to go about it: don't declare the methods in the classes they specialize on; rather, make them static, so that they are not inherited automatically:
public class Parent {
public static function foo(parent:Parent, node:Node):Function { ... }}
public class Child extends Parent {
public static function foo(parent:Child, node:Node):Function { ... }}
Note that in the second case protected fields are accessible inside the static method, so you can still achieve a degree of encapsulation. Besides, if you have a lot of Parent or Child instances, you will save on per-instance memory footprint (a static method exists only once, whereas a function stored in an instance field, as _foo is in the first approach, is created for every instance). The disadvantage is that you won't be able to use interfaces (which can actually be an improvement, depending on your personal preferences).

Abstract syntax tree construction and traversal

I am unclear on the structure of abstract syntax trees. To go "down (forward)" in the source of the program that the AST represents, do you go right on the very top node, or do you go down? For instance, would the example program
a = 1
b = 2
c = 3
d = 4
e = 5
Result in an AST that looks like this:
or this:
Where in the first one, going "right" on the main node will advance you through the program, but in the second one simply following the next pointer on each node will do the same.
It seems like the second one would be more correct since you don't need something like a special node type with a potentially extremely long array of pointers for the very first node. Although, I can see the second one becoming more complicated than the first when you get into for loops and if branches and more complicated things.
The first representation is the more typical one, though the second is compatible with the construction of a tree as a recursive data structure, as may be used when the implementation platform is functional rather than imperative.
Consider:
This is your first example, except shortened and with the "main" node (a conceptual straw man) more appropriately named "block," to reflect the common construct of a "block" containing a sequence of statements in an imperative programming language. Different kinds of nodes have different kinds of children, and sometimes those children include collections of subsidiary nodes whose order is important, as is the case with "block." The same might arise from, say, an array initialization:
int[] arr = {1, 2}
Consider how this might be represented in a syntax tree:
Here, the array-literal-type node also has multiple children of the same type whose order is important.
> Where in the first one, going "right" on the main node will advance you through the program, but in the second one simply following the next pointer on each node will do the same.
> It seems like the second one would be more correct since you don't need something like a special node type with a potentially extremely long array of pointers for the very first node
I'd nearly always prefer the first approach, and I think you'll find it much easier to construct your AST when you don't need to maintain a pointer to the next node.
I think it's generally easier to have all objects descend from a common base class, similar to this:
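// Common base class for every node in the tree.
abstract class Expr { }
// A sequence of statements; the order of the array is the source order.
class Block : Expr
{
    public Expr[] Statements { get; set; }
    public Block(Expr[] statements) { Statements = statements; }
}
class Assign : Expr
{
    public Var Variable { get; set; }
    public Expr Expression { get; set; }
    public Assign(Var variable, Expr expression) { Variable = variable; Expression = expression; }
}
class Var : Expr
{
    public string Name { get; set; }
    // The constructor must carry the class name, Var.
    public Var(string name) { Name = name; }
}
class Int : Expr
{
    public int Value { get; set; }
    public Int(int value) { Value = value; }
}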
Resulting AST is as follows:
Expr program =
    new Block(new Expr[]
    {
        new Assign(new Var("a"), new Int(1)),
        new Assign(new Var("b"), new Int(2)),
        new Assign(new Var("c"), new Int(3)),
        new Assign(new Var("d"), new Int(4)),
        new Assign(new Var("e"), new Int(5)),
    });
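Once the tree has this shape, traversal is typically a recursive function that dispatches on the node type. The sketch below is in Python rather than C#, and the `evaluate` and `execute` helpers are assumptions added for illustration; they are not part of the code above.

```python
from dataclasses import dataclass
from typing import Dict, List


class Expr:
    """Common base class, mirroring the C# hierarchy above."""


@dataclass
class Int(Expr):
    value: int


@dataclass
class Var(Expr):
    name: str


@dataclass
class Assign(Expr):
    variable: Var
    expression: Expr


@dataclass
class Block(Expr):
    statements: List[Expr]


def evaluate(node: Expr, env: Dict[str, int]) -> int:
    """Return the value of an expression node."""
    if isinstance(node, Int):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    raise TypeError(f"not a value expression: {node!r}")


def execute(node: Expr, env: Dict[str, int]) -> None:
    """Run a statement node, updating `env` in place."""
    if isinstance(node, Block):
        # Children are visited in list order, which is source order.
        for statement in node.statements:
            execute(statement, env)
    elif isinstance(node, Assign):
        env[node.variable.name] = evaluate(node.expression, env)
    else:
        raise TypeError(f"not a statement: {node!r}")


program = Block([
    Assign(Var("a"), Int(1)),
    Assign(Var("b"), Int(2)),
    Assign(Var("c"), Var("a")),
])

env: Dict[str, int] = {}
execute(program, env)
assert env == {"a": 1, "b": 2, "c": 1}
```

Note that no "next" pointer is needed anywhere: the `Block` node's list is the only place where statement order lives.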
It depends on the language. In C, you'd have to use the first form to capture the notion of a block, since a block has a variable scope:
{
    {
        int a = 1;
    }
    // a doesn't exist here
}
The variable scope would be an attribute of what you call the "main node".
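A rough sketch of that idea, written in Python rather than C and assuming a hypothetical `Scope` class attached to each block node: entering a nested block creates a child scope, and names declared there are not visible from the enclosing block.

```python
from typing import Dict, Optional


class Scope:
    """Variable table owned by one block; `parent` is the enclosing block's scope."""

    def __init__(self, parent: Optional["Scope"] = None) -> None:
        self.parent = parent
        self.names: Dict[str, int] = {}

    def declare(self, name: str, value: int) -> None:
        self.names[name] = value

    def lookup(self, name: str) -> int:
        # Search this block first, then walk outward through enclosing blocks.
        scope: Optional[Scope] = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise NameError(name)


outer = Scope()                # the outer { ... }
inner = Scope(parent=outer)    # the nested { ... }
inner.declare("a", 1)

assert inner.lookup("a") == 1  # visible inside the inner block
try:
    outer.lookup("a")          # not visible in the outer block
except NameError:
    pass
else:
    raise AssertionError("'a' should not be visible in the outer scope")
```

With a flat chain of statements there is no natural node to hang such a table on, which is one reason the block form fits C-like languages.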
I believe your first version makes more sense, for a couple of reasons.
First, it more clearly demonstrates the "nestedness" of the program, and it is also clearly implemented as a rooted tree (which is the usual concept of a tree).
The second, and more important, reason is that your "main node" could really have been a "branch node" (for example), which can simply be another node within a larger AST. This way, your AST can be viewed in a recursive sense, where each AST is a node with other ASTs as its children. This makes the design of the first much simpler, more general, and very homogeneous.
Suggestion: When dealing with tree data structures, whether a compiler-related AST or some other kind, always use a single "root" node. It may help you perform operations and gives you more control:
class ASTTreeNode {
    bool isRoot() {...}
    // prints this node, plus each subnode recursively
    void Show() { ... }
    // ...
}
void main ()
{
    ASTTreeNode MyRoot = new ASTTreeNode();
    // ...
    // prints the root node, plus each subnode recursively
    MyRoot.Show();
}
Cheers.

Access to global application settings

A database application that I'm currently working on stores all sorts of settings in the database. Most of those settings are there to customize certain business rules, but there's also some other stuff in there.
The app contains objects that specifically do a certain task, e.g., a certain complicated calculation. Those non-UI objects are unit-tested, but also need access to lots of those global settings. The way we've implemented this right now, is by giving the objects properties that are filled by the Application Controller at runtime. When testing, we create the objects in the test and fill in values for testing (not from the database).
This works well, and in any case much better than having all those objects depend on some global Settings object, which of course effectively makes unit testing impossible :) The disadvantage is that you sometimes need to set a dozen properties, or that you need to let those properties 'percolate' into sub-objects.
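A minimal Python sketch of the approach described above, for illustration only. `InterestCalculator` and its two settings are invented names, not taken from the actual application; the point is just that the object receives plain values and never reaches for a global.

```python
class InterestCalculator:
    """Non-UI object whose settings are filled in from outside (hypothetical example)."""

    def __init__(self) -> None:
        # Properties set by the application controller at runtime,
        # or directly by a test.
        self.annual_rate = 0.0
        self.rounding_digits = 2

    def yearly_interest(self, principal: float) -> float:
        return round(principal * self.annual_rate, self.rounding_digits)


def test_yearly_interest() -> None:
    # The test supplies the values itself; no database or global settings needed.
    calc = InterestCalculator()
    calc.annual_rate = 0.5
    calc.rounding_digits = 2
    assert calc.yearly_interest(10.0) == 5.0


test_yearly_interest()
```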
So the general question is: how do you provide access to global application settings in your projects, without the need for global variables, while still being able to unit test your code? This must be a problem that's been solved hundreds of times...
(Note: I'm not too much of an experienced programmer, as you'll have noticed; but I love to learn! And of course, I've already done research into this topic, but I'm really looking for some first-hand experiences)
You could use Martin Fowler's Service Locator pattern. In PHP it could look like this:
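class ServiceLocator {
    private static $soleInstance;
    private $globalSettings;
    public static function load($locator) {
        self::$soleInstance = $locator;
    }
    // Lets a test inject mock settings before the locator is loaded.
    public function setGlobalSettings($settings) {
        $this->globalSettings = $settings;
    }
    public static function globalSettings() {
        // Lazily create the real settings object on first use.
        if (!isset(self::$soleInstance->globalSettings)) {
            self::$soleInstance->setGlobalSettings(new GlobalSettings());
        }
        return self::$soleInstance->globalSettings;
    }
}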
Your production code then initializes the service locator like this:
ServiceLocator::load(new ServiceLocator());
In your test-code, you insert your mock-settings like this:
$s = new ServiceLocator();
$s->setGlobalSettings(new MockGlobalSettings());
ServiceLocator::load($s);
It's a repository for singletons that can be exchanged for testing purposes.
I like to model my configuration access on the Service Locator pattern. This gives me a single point to get any configuration value that I need, and putting it outside the application in a separate library allows reuse and testability. Here is some sample code. I am not sure what language you are using, so I wrote it in C#.
First I create a generic class that models a ConfigurationItem.
public class ConfigurationItem<T>
{
    private T item;
    public ConfigurationItem(T item)
    {
        this.item = item;
    }
    public T GetValue()
    {
        return item;
    }
}
Then I create a class that exposes public static readonly variables for the configuration items. Here I am just reading the ConnectionStringSettings from a config file, which is just XML. Of course, for other items you can read the values from any source.
using System.Configuration;
public class ConfigurationItems
{
    public static readonly ConfigurationItem<ConnectionStringSettings> ConnectionSettings = new ConfigurationItem<ConnectionStringSettings>(RetrieveConnectionString());
    private static ConnectionStringSettings RetrieveConnectionString()
    {
        // In .NET, we store our connection string in the application/web config file.
        // We can access those values through the ConfigurationManager class.
        return ConfigurationManager.ConnectionStrings[ConfigurationManager.AppSettings["ConnectionKey"]];
    }
}
Then when I need a ConfigurationItem for use, I call it like this:
ConfigurationItems.ConnectionSettings.GetValue();
It returns a type-safe value, which I can then cache or do whatever I want with.
Here's a sample test:
[TestFixture]
public class ConfigurationItemsTest
{
    [Test]
    public void ShouldBeAbleToAccessConnectionStringSettings()
    {
        ConnectionStringSettings item = ConfigurationItems.ConnectionSettings.GetValue();
        Assert.IsNotNull(item);
    }
}
Hope this helps.
Usually this is handled by an INI file or an XML configuration file. Then you just have a class that reads the settings when needed.
.NET has this built in with the ConfigurationManager classes, but it's quite easy to implement yourself: just read text files, load XML into a DOM, or parse them by hand in code.
Having config settings in the database is OK, but it does tie you to the database and creates an extra dependency for your app, one that INI/XML files avoid.
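As a small illustration of "a class that reads the settings when needed", here is a sketch using Python's standard `configparser` module. The `IniSettings` wrapper and the section and key names are made up for the example.

```python
import configparser


class IniSettings:
    """Reads settings from INI text on demand (hypothetical wrapper class)."""

    def __init__(self, ini_text: str) -> None:
        self._parser = configparser.ConfigParser()
        self._parser.read_string(ini_text)

    def get_float(self, section: str, key: str) -> float:
        return self._parser.getfloat(section, key)

    def get_str(self, section: str, key: str) -> str:
        return self._parser.get(section, key)


# In production the text would come from a file on disk; a test can pass
# a literal string instead. Section and key names here are invented.
SAMPLE = """
[business_rules]
vat_rate = 0.21
currency = EUR
"""

settings = IniSettings(SAMPLE)
assert settings.get_float("business_rules", "vat_rate") == 0.21
assert settings.get_str("business_rules", "currency") == "EUR"
```

Because the class takes the text rather than a file path, a unit test needs neither the file system nor the database.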
I did this:
public class MySettings
{
    public static double Setting1
    { get { return SettingsCache.Instance.GetDouble("Setting1"); } }
    public static string Setting2
    { get { return SettingsCache.Instance.GetString("Setting2"); } }
}
I put this in a separate infrastructure module to remove any issues with circular dependencies.
This way I am not tied to any specific configuration method, and I have no strings wreaking havoc in my application's code.
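The same idea can be sketched in Python. The answer does not show how `SettingsCache` is implemented, so everything inside it here is an assumption, including the `install` method, which is added so that a test can swap in values from a plain dictionary.

```python
from typing import Dict, Optional


class SettingsCache:
    """Holds raw setting values; the backing source is an assumption of this sketch."""

    _instance: Optional["SettingsCache"] = None

    def __init__(self, values: Dict[str, str]) -> None:
        self._values = values

    @classmethod
    def install(cls, cache: "SettingsCache") -> None:
        # Production code would install a cache loaded from the real source;
        # tests install one built from a plain dict.
        cls._instance = cache

    @classmethod
    def instance(cls) -> "SettingsCache":
        if cls._instance is None:
            raise RuntimeError("no SettingsCache installed")
        return cls._instance

    def get_double(self, key: str) -> float:
        return float(self._values[key])

    def get_string(self, key: str) -> str:
        return self._values[key]


class MySettings:
    """Typed accessors, so each setting name appears in exactly one place."""

    @staticmethod
    def setting1() -> float:
        return SettingsCache.instance().get_double("Setting1")

    @staticmethod
    def setting2() -> str:
        return SettingsCache.instance().get_string("Setting2")


SettingsCache.install(SettingsCache({"Setting1": "1.5", "Setting2": "hello"}))
assert MySettings.setting1() == 1.5
assert MySettings.setting2() == "hello"
```

Callers only ever see `MySettings.setting1()`, so the storage behind the cache (database, INI file, or a test dictionary) can change without touching them.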