Code optimization - Unused methods - language-agnostic

How can I tell if a method will never be used ?
I know that for dll files and libraries you can't really know if someone else (another project) will ever use the code.
In general I assume that anything public might be used somewhere else.
But what about private methods ? Is it safe to assume that if I don't see an explicit call to that method, it won't be used ?
I assume that for private methods it's easier to decide. But is it safe to decide it ONLY for private methods ?

Depends on the language, but commonly, a name that occurs once in the program and is not public/exported is not used. There are exceptions, such as constructors and destructors, operator overloads (in C++ and Python, where the name at the point of definition does not match the name at the call site) and various other methods.
For example, in Python, to allow indexing (foo[x]) to work, you define a method __getitem__ in the class to which foo belongs. But hardly ever would you call __getitem__ explicitly.
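A minimal sketch of the Python case just described. The class name `Shelf` and its contents are made up for illustration; the point is only that `__getitem__` appears once in the source (at its definition) and is still used.

```python
class Shelf:
    """Hypothetical container, used only for illustration."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        # Invoked implicitly by the indexing syntax shelf[index].
        return self._items[index]


shelf = Shelf(["a", "b", "c"])
first = shelf[0]  # equivalent to Shelf.__getitem__(shelf, 0)
```

A naive "name occurs only once, so it is dead" check would wrongly flag `__getitem__` here.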

What you need to know is the (or all possible) entry point(s) to your code:
For a simple command line program, this is the "main" method or, in the most simple case, the top of your script.
For libraries, in fact, it is everything visible from outside.
The situation gets more complicated if methods can be referenced from outside by means of introspection. This is language-specific and requires knowledge of the details of the techniques used.
What you need to do is follow all references from all entry points recursively to mark up all used methods. Whatever remains unmarked can safely - and should - be removed.
Since this is tedious but routine work, there are tools that do it for various programming languages. Examples include ReSharper for C# or ProGuard for Java.
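The marking procedure described above can be sketched as a plain graph traversal. This is only an illustration under strong assumptions: the call graph is given as a hand-written dictionary (the function names in it are hypothetical), and it ignores introspection, which a real tool has to account for.

```python
from collections import deque


def find_unused(call_graph, entry_points):
    """Return the names that cannot be reached from any entry point.

    call_graph maps each function name to the names it references.
    """
    reachable = set()
    queue = deque(entry_points)
    while queue:
        name = queue.popleft()
        if name in reachable:
            continue
        reachable.add(name)
        # Follow every reference made by this function.
        queue.extend(call_graph.get(name, ()))
    return set(call_graph) - reachable


# Hypothetical program: old_helper references format, but nothing references it.
graph = {
    "main": ["parse", "run"],
    "parse": ["format"],
    "run": [],
    "format": [],
    "old_helper": ["format"],
}
unused = find_unused(graph, ["main"])
```

For a library, `entry_points` would be every publicly visible name rather than just `main`.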

Making superclass variables read-only to children in TCL OO

Let's say I have a class foo with a variable bar. If there is a class moo that has foo as a superclass, I want any attempt to write to bar, or better yet even to refer to it directly, to raise an error. This would prevent someone using my code (which could be compiled to byte-code) from accidentally overriding bar by declaring their own variable with the same name.
TclOO simply does not support the concept. Classes are not security boundaries in TclOO, just as namespaces are not security boundaries in plain Tcl (TclOO objects are really just fancy namespaces). Tcl's security boundaries are between interpreters, and between the Tcl script level and the (usually) C implementation level. We're considering adding “private” instance variables for Tcl 8.7, but even those won't be truly private; their names will still be predictable if you know how (and they will be accessible from outside the class; that's important for when using the variable with third-party code such as Tk). To reiterate: classes are not security boundaries.
If you have something that must be locked out of sight, it is easiest to implement it in C. You can plug in methods implemented in C into TclOO (applying whatever controls you can think of) and those methods can use the (C level only) metadata mechanism to create instance-attached storage that they can use. All the callbacks are in place to do deletion correctly at the right time. Methods in C are not much more complicated than commands in C; the function callback signature is a little different and the usage is a bit more complicated (because there are other standard operations on methods such as copying them) but if you can do one, you can figure out how to do the other too.

How are classes implemented in dynamic languages?

How are classes implemented in dynamic languages?
I know that JavaScript uses a prototype pattern (there is 'somewhere' a container of unbound JS functions, which are bound when they are called through an object), but I have no idea how it works in other languages.
I'm curious about this, because I can't think of an efficient way to have native bound methods without wasting memory and/or cpu by copying members for each instance.
(By bound method, I mean that the following code should work :)
class Foo { function bar() : return 42; };
var test = new Foo();
var method = test.bar;
method() == 42;
This highly depends on the language and the implementation. I'll tell you what I know about CPython and PyPy.
The general idea, which is also what CPython does for the most part, goes like this:
Every object has a class, specifically a reference to that class object.
Apart from instance members, which are obviously stored in the individual object, the class also has members. This includes methods, so methods don't have a per-object cost.
A class has a method resolution order (MRO) determined by the inheritance relationships, wherein each base class occurs exactly once. If we didn't have multiple inheritance, this would simply be a reference to the base class, but this way the MRO is hard to figure out on the fly (you'd have to start from the most derived class every time).
(Classes are also objects and have classes themselves, but we'll gloss over that for now.)
If attribute lookup on an object fails, the same attribute is looked up on the classes in the MRO, in the order specified by the MRO. (This is the default behavior, which can be changed by defining magic methods like __getattr__ and __getattribute__.)
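A small sketch of the lookup order described above, using made-up class names. It shows that each base class occurs exactly once in the MRO and that a method is found on the first class in that order which defines it.

```python
class Base:
    def greet(self):
        return "Base"


class Left(Base):
    pass


class Right(Base):
    def greet(self):
        return "Right"


class Child(Left, Right):
    pass


# The MRO is computed once per class and stored on it.
mro_names = [cls.__name__ for cls in Child.__mro__]

child = Child()
# Nothing named greet is stored on the instance, so the classes are searched.
result = child.greet()
```

Here `Left` does not define `greet`, so the lookup continues to `Right` before it ever reaches `Base`.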
So far so simple, and not really an explanation for bound methods. I just wanted to make sure we're talking about the same thing. The missing piece is descriptors. The descriptor protocol is defined in the "deep magic" section of the language reference, but the short and simple story is that lookup on a class can be hijacked by the object it results in via a __get__ method. More importantly, this __get__ method is told whether the lookup started on an instance or on the "owner" (the class).
In Python 2, looking a function up on a class yields an ugly and unnecessary "unbound method" object, which simply wraps the function to throw errors on Class.method(self) if self is not of an acceptable type. In Python 3, the __get__ is simply part of all function objects, and unbound methods are gone. In both cases, __get__ returns something callable as Class.method(instance) when you look it up on a class (in Python 3, the function itself; this is useful in a few cases) and a "bound method" object when you look it up on an object. This bound method object does nothing more than store the raw function and the instance, and pass the latter as the first argument to the former in its __call__ (the special method overriding the function call syntax).
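The descriptor mechanism described above can be observed directly in Python 3. This sketch reuses the `Foo`/`bar` names from the question; it only inspects standard attributes (`__get__`, `__func__`, `__self__`) and is not meant as a description of CPython internals.

```python
class Foo:
    def bar(self):
        return 42


test = Foo()

# The plain function is stored once, in the class namespace.
raw_function = Foo.__dict__["bar"]

# Attribute access on the instance goes through the function's __get__.
manually_bound = raw_function.__get__(test, Foo)

# The bound method from the question: it can be stored and called later.
method = test.bar
value = method()

# A bound method is little more than a (function, instance) pair.
same_function = method.__func__ is raw_function
same_instance = method.__self__ is test
```

Nothing is copied per instance: the instance itself holds no entry for `bar`, and the bound method object is created on demand at lookup time.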
So, for CPython: While there is a cost to bound methods, it's smaller than you might think. Only two references are needed space-wise, and the CPU cost is limited to a small memory allocation, and an extra indirection when calling. Note though that this cost applies to all method calls, not just those which actually make use of bound method features. a.f() has to call the descriptor and use its return value, because in a dynamic language we don't know if it's monkey-patched to do something different.
In PyPy, things are more interesting. As it's an implementation which doesn't compromise on correctness, the above model is still correct for reasoning about semantics. However, it's actually faster. Apart from the fact that the JIT compiler inlines and then eliminates the entire mess described above in most cases, they also tackle the problem on bytecode level. There are two new bytecode instructions, which preserve the semantics but omit the allocation of the bound method object in the case of a.f(). There is also a method cache which can simplify the lookup process, but requires some additional bookkeeping (though some of that bookkeeping is already done for the JIT).

What are namespaces for ? what about usages?

what is the purpose of namespaces ?
and, more important, should they be used as objects in java (things that have data and functions and that try to achieve encapsulation) ? is this idea too far-fetched ? :)
or should they be used as packages in java ?
or should they be used more generally as a module system or something ?
Given that you use the Clojure tag, I suppose that you'll be interested in a Clojure-specific answer:
what is the purpose of namespaces ?
Clojure namespaces, Java packages, Haskell / Python / whatever modules... At a very high level, they're all different names for the same basic mechanism whose primary purpose is to prevent name clashes in non-trivial codebases. Of course, each solution has its own little twists and quirks which make sense in the context of a given language and would not make sense outside of it. The rest of this answer will deal with the twists and quirks specific to Clojure.
A Clojure namespace groups Vars, which are containers holding functions (most often), macro functions (functions used by the compiler to generate macroexpansions of appropriate forms, normally defined with defmacro; actually they are just regular Clojure functions, although there is some magic to the way in which they are registered with the compiler) and occasionally various "global parameters" (say, clojure.core/*in* for standard input), Atoms / Refs etc. The protocol facility introduced in Clojure 1.2 has the nice property that protocols are backed by Vars, as are the individual protocol functions; this is key to the way in which protocols present a solution to the expression problem (which is however probably out of the scope of this answer!).
It stands to reason that namespaces should group Vars which are somehow related. In general, creating a namespace is a quick & cheap operation, so it is perfectly fine (and indeed usual) to use a single namespace in early stages of development, then as independent chunks of functionality emerge, factor those out into their own namespaces, rinse & repeat... Only the things which are part of the public API need to be distributed between namespaces up front (or rather: prior to a stable release), since the fact that function such-and-such resides in namespace so-and-so is of course a part of the API.
and, more important, should they be used as objects in java (things that have data and functions and that try to achieve encapsulation) ? is this idea too far-fetched ? :)
Normally, the answer is no. You might get a picture not too far from the truth if you approach them as classes with lots of static methods, no instance methods, no public constructors and often no state (though occasionally there may be some "class data members" in the form of Vars holding Atoms / Refs); but arguably it may be more useful not to try to apply Java-ish metaphors to Clojure idioms and to approach a namespace as a group of functions etc. and not "a class holding a group of functions" or some such thing.
There is an important exception to this general rule: namespaces which include :gen-class in their ns form. These are meant precisely to implement a Java class which may later be instantiated, which might have instance methods and per-instance state etc. Note that :gen-class is an interop feature -- pure Clojure code should generally avoid it.
or should they be used as packages in java ?
They serve some of the same purposes packages were designed to serve (as already mentioned above); the analogy, although it's certainly there, is not that useful, however, just because the things which packages group together (Java classes) are not at all like the things which Clojure namespaces group together (Clojure Vars), the various "access levels" (private / package / public in Java, {:private true} or not in Clojure) work very differently etc.
That being said, one has to remember that there is a certain correspondence between namespaces and packages / classes residing in particular packages. A namespace called foo.bar, when compiled, produces a class called bar in the package foo; this means, in particular, that namespace names should contain at least one dot, as so-called single-segment names apparently lead to classes being put in the "default package", leading to all sorts of weirdness. (E.g. I find it impossible to have VisualVM's profiler notice any functions defined in single-segment namespaces.)
Also, deftype / defrecord-created types do not reside in namespaces. A (defrecord Foo [...] ...) form in the file where namespace foo.bar is defined creates a class called Foo in the package foo.bar. To use the type Foo from another namespace, one would have to :import the class Foo from the foo.bar package -- :use / :require would not work, since they pull in Vars from namespaces, which records / types are not.
So, in this particular case, there is a certain correspondence between namespaces and packages which Clojure programmers who wish to take advantage of some of the newer language features need to be aware of. Some find that this gives an "interop flavour" to features which are not otherwise considered to belong in the realm of interop (defrecord / deftype / defprotocol are a good abstraction mechanism even if we forget about their role in achieving platform speed on the JVM) and it is certainly possible that in some future version of Clojure this flavour might be done away with, so that the namespace name / package name correspondence for deftype & Co. can be treated as an implementation detail.
or should they be used more generally as a module system or something ?
They are a module system and this is indeed how they should be used.
A package in Java has its own namespace, which provides a logical grouping of classes. It also helps prevent naming collisions. For example, in Java you will find java.util.Date and java.sql.Date: two different classes with the same name, differentiated by their namespace. If you try to import both into a Java file, you will see that it won't compile. At least one of them will need to be referred to by its fully qualified name.
From a language-independent point of view, namespaces are a way to isolate things (i.e. to encapsulate, in a sense). It's a more general concept (see XML namespaces, for example). You can "create" namespaces in several ways, depending on the language you use: packages, static classes, modules and so on. All of these provide namespaces for the objects/data/functions they contain. This lets you organize the code better and isolate features, and it tends to improve code reuse and adaptability (as encapsulation does).
As stated in the "Zen of Python", "Namespaces are one honking great idea -- let's do more of those!".
Think of them as containers for your classes. For instance, if you had a helper class for building strings and you wanted it in your business layer, you would use a namespace such as MyApp.Business.Helpers. This keeps your classes in sensible locations, so that when you or someone else referencing your code wants to consume them, they can be located easily. As another example, if you wanted to consume a SQL connection helper class you would probably use something like:
MyApp.Data.SqlConnectionHelper sqlHelper = new MyApp.Data.SqlConnectionHelper();
In reality you would use a "using" statement so you wouldn't need to fully qualify the namespace just to declare the variable.

Self-Configuring Classes W/ Command Line Args: Pattern or Anti-Pattern?

I've got a program where a lot of classes have really complicated configuration requirements. I've adopted the pattern of decentralizing the configuration and allowing each class to take and parse the command line/configuration file arguments in its c'tor and do whatever it needs with them. (These are very coarse-grained classes that are only instantiated a few times, so there is absolutely no performance issue here.) This avoids having to do shotgun surgery to plumb new options I add through all the levels they need to be passed through. It also avoids having to specify each configuration option in multiple places (where it's parsed and where it's used).
What are some advantages/disadvantages of this style of programming? It seems to reduce separation of concerns in that every class is now doing configuration stuff, and to make programs less self-documenting because what parameters a class takes becomes less explicit. OTOH, it seems to increase encapsulation in that it makes each class more self-contained because no other part of the program needs to know exactly what configuration parameters a class might need.
Regardless of the way you do it, you have several "modules" which compete for the same sequence of command-line arguments. There must be some cooperation so that the same command-line arguments can be processed by your classes without clashes.
By having each class implement the parsing, you simply make that cooperation implicit. There is no module dedicated to the cooperation between your classes. The problem becomes a matter of documentation rather than a matter of code. That is not bad in itself, but it may seductively appear as if the problem has simply "gone away". In short, this practice requires more discipline.
Also, it will make major overhauls of command-line argument syntax more difficult.
it seems to increase encapsulation in that it makes each class more self-contained because no other part of the program needs to know exactly what configuration parameters a class might need.
If I understand what you're proposing, it really makes each class hide its dependencies.
The dependencies in this case may be simple (primitive), but if a class needs a username and password to function correctly, it should say so in its constructor. Otherwise, the class's callers need to look at the source code or documentation to use it.
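One way to read this advice is sketched below, as an assumption about what "say so in its constructor" could look like rather than a prescribed design. The names `Uploader` and `build_uploader` and the two flags are hypothetical.

```python
import argparse


class Uploader:
    """Hypothetical class whose dependencies are visible in its signature."""

    def __init__(self, username, password):
        self.username = username
        self.password = password


def build_uploader(argv):
    # Command-line parsing happens once, at the edge of the program,
    # and the class itself never sees the raw argument list.
    parser = argparse.ArgumentParser()
    parser.add_argument("--username", required=True)
    parser.add_argument("--password", required=True)
    args = parser.parse_args(argv)
    return Uploader(args.username, args.password)


uploader = build_uploader(["--username", "alice", "--password", "s3cret"])
```

The trade-off from the question remains: each new option has to be plumbed through a function like `build_uploader`, but callers and tests can construct `Uploader` directly without knowing anything about the command line.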

What is the benefit of explicitly naming getters and setters as "get..." and "set..."?

Does this rankle anyone else out there? I would much rather see:
block.key(newKey); // set the key for this block
and
testKey = block.key(); // look up the key for this block
than
block.setKey(newKey); // set the key for this block
testKey = block.getKey(); // look up the key for this block
First, the "set" and "get" are redundant (and hence add noise reducing the readability). The action (set/get) is defined by the syntax of each statement. I understand the overloading. In fact, using the same exact identifier reinforces that these are the same property of the object. When I read "getKey()" and "setKey()" I may not be so sure.
Second, if "get" and "set" are to be strictly interpreted as getter and setter, then any other semantics associated with setting/getting a value, side effects for example, will be surprising.
I suppose this bias comes from my Smalltalk background, but in a world where polymorphism works just fine, wouldn't we be better off without the "get" and "set" sprinkled everywhere? Just think of how much more code we could type if we didn't have to type those three letters over and over again?! (tongue somewhat in cheek)
Anyone out there feel the same way?
The designers of C# apparently agree, and the 'C#' tag outnumbers the next most popular language by 2:1 on StackOverflow. So I suspect you're preaching to the choir.
Several languages have different ways of handling getters and setters. In Java you have getName and setName, in Qt you have name and setName. I prefer the Java way for these reasons:
What if you have a function that is called drive? Does it cause your class to drive, or does it set/get a drive?
With suggestions turned on, you can type get, and then get all the getters. This is very useful if you don't remember the name of a getter you need.
Building on reason one, it separates the functions into different groups. You have the functions that do something, they don't start with get or set (though maybe they should start with do?). Then you have the functions that get a property, and they all start with get. Then you have the functions that set a property, and they all start with set.
For me, get and set prefixes make my brain do less work when reading code. When used consistently, get/set methods make it easier to grep a header, possibly making the class easier to learn & use. I spend a disproportionately large amount of time reading code vs. writing code, so the extra characters for get and set are fine by me.
The Java language currently doesn't have properties, so the getter / setter syntax has become the de facto standard. If you're writing Java code, you'll be well served to use the Java convention. This isn't just so other programmers can read your code; more importantly, hundreds of Java frameworks are built to handle objects supporting Java Bean style getters / setters.
For example, in the Velocity templating engine, you could write something like:
The answer is $block.key
The above will attempt to invoke:
block.getkey();
block.getKey();
If you've defined block.getKey(), then all will work fine. Generally it's best to follow the conventions of the language.
Prefixing accessors with "get" and mutators with "set" is a practice that varies from language to language, and seems to be mostly prevalent in Java. For example:
In Python, you have the concept of properties, and so an accessor might look like obj.foo and a mutator might look like obj.foo = bar (even though methods are called behind the scenes). Similar concepts exist in languages such as Ruby. So in a lot of languages, calls to accessors and mutators look like direct "calls" to instance variables, and you never see anything like "setFoo" or "getFoo".
Languages such as Objective-C discourage the practice of prefixing accessors with "get", so in Objective-C you'd see calls like [obj foo] and [obj setFoo:bar].
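The Python case from the list above can be sketched as follows. The `Block` class and the non-empty rule are made up for illustration, reusing the `block.key` naming from the question.

```python
class Block:
    """Hypothetical class whose key must be a non-empty string."""

    def __init__(self, key):
        self.key = key  # goes through the setter below

    @property
    def key(self):
        # Runs on every read of block.key.
        return self._key

    @key.setter
    def key(self, value):
        # Runs on every assignment to block.key.
        if not value:
            raise ValueError("key must not be empty")
        self._key = value


block = Block("abc")
block.key = "xyz"      # looks like plain assignment, calls the setter
test_key = block.key   # looks like plain attribute access, calls the getter
```

The call sites never mention "get" or "set", yet the class can still attach validation or other logic to both operations.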
I never liked the practice of prefixing accessors in Java with "get", either, but I do see the merit in prefixing mutators with "set". That said, since the prevailing practice is to prefix with "get"/"set" in Java, I continue the trend in my own code.
In general property names should be nouns whereas method names should be verbs, so the property in the above example would be called 'driver' and the method would be 'drive'. There will of course be cases where there is overlap but in general this works and makes the code more readable.
In C++ I just use overloading
int parameter() const { return m_param; }
void parameter(int param) { m_param = param; }
Yes, the get/set in Java is essentially a workaround for a problem in the language.
Compare it to C# properties
http://www.csharp-station.com/Tutorials/Lesson10.aspx
Or python
http://blog.fedecarg.com/2008/08/17/no-need-for-setget-methods-in-python/
I think this is one of the biggest failings of Java.
C# getters and setters are "first class" entities and don't resemble function calls syntactically (though any arbitrary code can run in the context of an accessor).
I only use get/set in languages that force me to, like Objective-C.
You can easily search your codebase for references to getDrive() or setDrive(). If your method is just named 'drive', you will get many more false positives when searching.
Hear, hear.
I really like special notation for getters and setters. CLU did this best:
Using p.x in an expression was equivalent to the call get_x(p).
Using p.x := e (assignment) was equivalent to the call set_x(p, e).
If the interface for object p didn't export get_x or set_x, you were prevented from doing the corresponding operation. Simple.
Let's hear it for syntactic sugar!