History of Namespaces/Packages/Modules? - language-agnostic

I've been researching how different languages manage organization of source code. It appears most modern languages use some form of named abstract container. What it's called and how it's implemented varies from one language to the next, but it boils down to a programming construct that operates beyond file boundaries to group related code.
In Java and .NET languages it is used as the basis for organizing dependencies (you include/import the namespace/package a class belongs to rather than the file it is defined in), while C++ uses it only for avoiding name clashes.
I'm curious as to who first proposed this idea and when it was proposed. Also, which language was the first to implement it?

Namespaces and modules are separate concerns. Namespaces provide separate conceptual grouping of identifiers. If project A uses namespace A and all its identifiers are in A or subnamespaces of A, then it cannot clash with project B using namespace B. In a language with one big flat namespace such as C, problems can occur when different projects want to use the same identifier.
Modules are individual units of code. Generally they are files or groups of files, though I don't think a strict definition is possible. Modules can contain submodules which contain submodules.
The difference here is that while it is common for each module to have its own namespace in a one-to-one relationship, it's not required in general. For example, the C++ STL is divided into different modules such as <vector>, <functional> etc but they all use the same namespace std::. In C, you can have modular code (in .c/.h pairs) but you can't have namespaces - or equivalently, all modules use one namespace.
The name "package" in general can be ambiguous: I have seen it refer to either a namespace (as in Perl), or to a namespace/module combination (as in Java).
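To make the distinction concrete, here is a minimal Python sketch. It only models the idea with plain dictionaries and SimpleNamespace objects; the names flat, project_a, project_b, std, vector_module and functional_module are made up for illustration and are not how C or C++ actually implement anything.

```python
from types import SimpleNamespace

# One big flat namespace (the C situation): the second definition of
# "connect" silently replaces the first, so project A's version is lost.
flat = {}
flat["connect"] = "project A's connect"
flat["connect"] = "project B's connect"

# One namespace per project: the same identifier can exist twice.
project_a = SimpleNamespace(connect="project A's connect")
project_b = SimpleNamespace(connect="project B's connect")

# Modules and namespaces need not map one-to-one: two hypothetical
# "modules" both contribute their identifiers to a single namespace,
# loosely like <vector> and <functional> both populating std.
std = SimpleNamespace()
vector_module = {"vector": "container type"}
functional_module = {"function": "callable wrapper"}
for module in (vector_module, functional_module):
    for name, value in module.items():
        setattr(std, name, value)
```

After this runs, flat holds only project B's connect, while project_a.connect and project_b.connect are both still reachable, and std exposes names that came from two different modules.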

Related

tcl - when to use package+namespace vs interp?

I'm just starting with Tcl and trying to get my head around how to best define and integrate modules. There seems to be much effort put into the package+namespace concept, but from what I can tell interp is more powerful and lean for every conceivable scenario, in particular when it comes to hiding and renaming procedures, but also in avoiding creep in the global namespace. The only reason to use package+namespaces seems to be because "once upon a time Sun said so".
When should I ever use package+namespace instead of interp?
Namespaces and packages work together. Interpreters are something else.
A namespace is a small scale naming context in Tcl. It can contain commands, variables and other namespaces. You can refer to entities in a namespace via either local names (foo) or via qualified names (bar::foo); if a qualified name starts with ::, it is relative to the (interpreter-)global namespace, and can be used to refer to its command or variable from anywhere in the interpreter. (FWIW, the TclOO object system builds extensively on top of namespaces; there is one namespace per object.)
A package is a high-level concept for a bunch of code supplied by some sort of library. Packages have abstract names (the names do not have to correspond to how the library's implementation is stored on disk) and a distinct version; you can ask for a particular version if necessary, though most of the time you don't bother. Packages can be implemented by multiple mechanisms, but they almost all come down to sourcing some number of Tcl scripts and loading some number of DLLs. Almost all packages declare commands, and they conventionally are encouraged to put those commands in a namespace with the same general name as the package. However, quite a few older packages do not do this for various reasons, mostly to do with compatibility with existing code.
An interpreter is a security context in Tcl. By default, Tcl creates one interpreter (plus another if it sets up the console window in wish). Named entities in one interpreter are completely distinct from named entities in another interpreter with a few key exceptions:
Channels have common names across all interpreters. This means that an interpreter can talk about channels owned by another interpreter, but merely being able to mention its name doesn't give permission to access the channel. (The stdin, stdout and stderr channels are shared by default.)
The interp alias command can be used to make alias commands, which are such that invoking a command (the alias) in one interpreter can cause a command (the implementation) in another interpreter to be called, with all arguments safely passed over. This allows one interpreter to expose whatever special calls it wants another interpreter to access without losing control, but it is up to the implementation of those commands to act safely on those arguments.
A safe interpreter is one with the unsafe commands of Tcl hidden by default. (That's things like open, socket, source, load, cd, etc.) The parent interpreter that created the safe child interpreter can use the alias mechanism to add in exactly the functionality desired; it's very much analogous to an OS system call except you can easily make your own application-specific ones.
Tcl's threading package is designed to create one interpreter per thread (and the aliasing mechanism does not work across threads). That means that there's very little in the way of shared resources by default, and inter-thread communication is done via queued message passing.
In general, packages are required at most once per interpreter and are how you are recommended to access most third-party functionality. Namespaces are fairly lightweight and are used for all sorts of things, and interpreters are considered to be expensive; lots of quite thoroughly production-grade Tcl scripts only ever work with a single interpreter. (Threads are even more expensive than interpreters; it's good practice to match the number of threads you create to the hardware load that you wish to impose, probably through the use of suitable thread pools.)
The purpose of a module is to provide modular code, i.e. code that can easily be used by applications beyond the module writer's knowledge and control, and that encapsulates their own internals.
Package/namespace-based and interpreter-based modules are probably equally good at encapsulation, but it's not as easy to make interpreter-based modules that play well with arbitrary applications (though it is of course possible).
My own opinion is that interpreters are application level (I mostly use them for user input and for controlled evaluation), not module level. Both namespaces and packages have their warts, but in most cases they do what is expected of them with a minimum of fuss.
My recommendation is that if you are writing modules for your own benefit and interpreters serve you well, by all means use them. If you write modules that other people are to use, possibly including yourself in 18 months, you should stick with namespaces and packages.

Origin of hierarchical structuring

Why are libraries located behind com/ or net/ directory structures?
This is agnostic to Flash, Flex or any language. It's been used for a long time in general software development. I believe it stemmed from the Java package structure, but I'm not sure. It's used because it's now a standard on how to do things and helps split up projects in a fairly unique way.
It normally goes like <domain extension>/<domain>/<project name>/<sub component>/<whatever>.
This format/structure is called the reverse domain name structure. This structure is used for the package namespace for your classes.
Here is a good article on The Classpath Demystified by Jody Hall
If you're talking about class packages, the point is that every package should be unique. Imagine you wrote a class named MyGreatClass, either without any package or within some simple package, as test.MyGreatClass (this is called the fully qualified class name). In this project you've decided to use some library where somebody wrote another test.MyGreatClass class (they didn't realize you have another one). So you'll have a conflict between two classes.
To avoid that situation there is a convention to start package names with the author's site name in reverse order, bearing in mind that every domain name is unique. Following this convention you can be sure your class won't conflict with others.
Since com and net are the most common top-level domains, you can see com.example (for http://example.com/) and net.example (for http://example.net/) very often.
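The reverse domain name convention described above can be sketched in a few lines of Python. This is just an illustration of the naming rule, not part of any build tool; the helper names reverse_domain_package and package_to_path and the project name myproject are invented for the example.

```python
def reverse_domain_package(domain, *parts):
    """Build a package name from a domain plus project/sub-component names.

    "example.com" with parts ("myproject", "util") gives
    "com.example.myproject.util".
    """
    labels = domain.lower().split(".")
    return ".".join(list(reversed(labels)) + list(parts))


def package_to_path(package):
    """Map a dotted package name onto the matching directory layout."""
    return package.replace(".", "/")


package = reverse_domain_package("example.com", "myproject", "util")
directory = package_to_path(package)
```

Here package comes out as com.example.myproject.util and directory as com/example/myproject/util, which is why so many libraries end up under a com/ or net/ folder.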
Advantages of OOP
Inheritance
maintainability
Re-usability
A class is considered an object.
Having a package structure allows for all the advantages of OOP
Having a standard folder "com" where all your custom classes are allows you to reuse those classes with ease.
I make sure that all libraries I did not create go into my com folder. So when I make a new project I just have to point the project settings to that folder, and then I can access those libraries just by adding an import statement.
For example, I keep the AS3crypto library in the com folder.

What are namespaces for ? what about usages?

what is the purpose of namespaces ?
and, more important, should they be used as objects in java (things that have data and functions and that try to achieve encapsulation) ? is this idea too far-fetched ? :)
or should they be used as packages in java ?
or should they be used more generally as a module system or something ?
Given that you use the Clojure tag, I suppose that you'll be interested in a Clojure-specific answer:
what is the purpose of namespaces ?
Clojure namespaces, Java packages, Haskell / Python / whatever modules... At a very high level, they're all different names for the same basic mechanism whose primary purpose is to prevent name clashes in non-trivial codebases. Of course, each solution has its own little twists and quirks which make sense in the context of a given language and would not make sense outside of it. The rest of this answer will deal with the twists and quirks specific to Clojure.
A Clojure namespace groups Vars, which are containers holding functions (most often), macro functions (functions used by the compiler to generate macroexpansions of appropriate forms, normally defined with defmacro; actually they are just regular Clojure functions, although there is some magic to the way in which they are registered with the compiler) and occasionally various "global parameters" (say, clojure.core/*in* for standard input), Atoms / Refs etc. The protocol facility introduced in Clojure 1.2 has the nice property that protocols are backed by Vars, as are the individual protocol functions; this is key to the way in which protocols present a solution to the expression problem (which is however probably out of the scope of this answer!).
It stands to reason that namespaces should group Vars which are somehow related. In general, creating a namespace is a quick & cheap operation, so it is perfectly fine (and indeed usual) to use a single namespace in early stages of development, then as independent chunks of functionality emerge, factor those out into their own namespaces, rinse & repeat... Only the things which are part of the public API need to be distributed between namespaces up front (or rather: prior to a stable release), since the fact that function such-and-such resides in namespace so-and-so is of course a part of the API.
and, more important, should they be used as objects in java (things that have data and functions and that try to achieve encapsulation) ? is this idea too far-fetched ? :)
Normally, the answer is no. You might get a picture not too far from the truth if you approach them as classes with lots of static methods, no instance methods, no public constructors and often no state (though occasionally there may be some "class data members" in the form of Vars holding Atoms / Refs); but arguably it may be more useful not to try to apply Java-ish metaphors to Clojure idioms and to approach a namespace as a group of functions etc. and not "a class holding a group of functions" or some such thing.
There is an important exception to this general rule: namespaces which include :gen-class in their ns form. These are meant precisely to implement a Java class which may later be instantiated, which might have instance methods and per-instance state etc. Note that :gen-class is an interop feature -- pure Clojure code should generally avoid it.
or should they be used as packages in java ?
They serve some of the same purposes packages were designed to serve (as already mentioned above); the analogy, although it's certainly there, is not that useful, however, just because the things which packages group together (Java classes) are not at all like the things which Clojure namespaces group together (Clojure Vars), the various "access levels" (private / package / public in Java, {:private true} or not in Clojure) work very differently etc.
That being said, one has to remember that there is a certain correspondence between namespaces and packages / classes residing in particular packages. A namespace called foo.bar, when compiled, produces a class called bar in the package foo; this means, in particular, that namespace names should contain at least one dot, as so-called single-segment names apparently lead to classes being put in the "default package", leading to all sorts of weirdness. (E.g. I find it impossible to have VisualVM's profiler notice any functions defined in single-segment namespaces.)
Also, deftype / defrecord-created types do not reside in namespaces. A (defrecord Foo [...] ...) form in the file where namespace foo.bar is defined creates a class called Foo in the package foo.bar. To use the type Foo from another namespace, one would have to :import the class Foo from the foo.bar package -- :use / :require would not work, since they pull in Vars from namespaces, which records / types are not.
So, in this particular case, there is a certain correspondence between namespaces and packages which Clojure programmers who wish to take advantage of some of the newer language features need to be aware of. Some find that this gives an "interop flavour" to features which are not otherwise considered to belong in the realm of interop (defrecord / deftype / defprotocol are a good abstraction mechanism even if we forget about their role in achieving platform speed on the JVM) and it is certainly possible that in some future version of Clojure this flavour might be done away with, so that the namespace name / package name correspondence for deftype & Co. can be treated as an implementation detail.
or should they be used more generally as a module system or something ?
They are a module system and this is indeed how they should be used.
A package in Java has its own namespace, which provides a logical grouping of classes. It also helps prevent naming collisions. For example, in Java you will find java.util.Date and java.sql.Date - two different classes with the same name, differentiated by their namespace. If you try to import both into a Java file, you will see that it won't compile. At least one of them will need to be referred to by its fully qualified name.
From a language-independent view, namespaces are a way to isolate things (i.e. encapsulate, in a sense). It's a more general concept (see XML namespaces for example). You can "create" namespaces in several ways, depending on the language you use: packages, static classes, modules and so on. All of these provide namespaces to the objects/data/functions they contain. This allows you to organize the code better and to isolate features, and tends toward better code reuse and adaptability (as with encapsulation).
As stated in the "Zen of Python", "Namespaces are one honking great idea -- let's do more of those!".
Think of them as containers for your classes. For instance, if you had a helper class for building strings and you wanted it in your business layer, you would use a namespace such as MyApp.Business.Helpers. This allows your classes to be kept in sensible locations, so when you or someone else referencing your code wants to consume them, they can be located easily. For another example, if you wanted to consume a SQL connection helper class you would probably use something like:
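The same kind of collision can be shown with a rough Python analogue (not Java): the standard json and pickle modules both define a function called loads. One difference worth flagging is that Python does not refuse the double import; the second import silently rebinds the name, so qualified names are the way to keep both usable.

```python
import json
import pickle

from json import loads
from pickle import loads  # rebinds the name: loads now refers to pickle.loads

# The unqualified name now points at pickle's function only.
shadowed = loads is pickle.loads

# Qualifying each call with its module (its namespace) removes the ambiguity.
config = json.loads('{"retries": 3}')
blob = pickle.loads(pickle.dumps({"retries": 3}))
```

Both config and blob end up as the same dictionary, but each call states which loads it means.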
MyApp.Data.SqlConnectionHelper sqlHelper = new MyApp.Data.SqlConnectionHelper();
In reality you would use a "using" statement so you wouldn't need to fully qualify the namespace just to declare the variable.

What are common conventions for using namespaces in Clojure?

I'm having trouble finding good advice and common practices for the use of namespaces in Clojure. I realize that namespaces are not the same as Java packages so I'm trying to tease out the conventions in Clojure, which seem surprisingly hard to determine.
I think I have a pretty good idea how to split functions into clj files and even roughly how I'd want to organize those files into directories. But beyond that I'm having trouble finding the mechanics for my dev environment. Some inter-related questions:
1. Do I use the same uniqueness conventions for Clojure namespaces as I would normally use for Java packages? [i.e. backwards-company-domain.project.subsystem]
2. Should I save my files in a directory structure that matches my namespaces? [à la Java]
3. If I have multiple namespaces, do I need to compile all of my code into a jar and add it to my classpath to make it accessible?
4. Should each namespace compile to one jar? Or should I create a single jar that contains clj code from many namespaces?
Thanks...
1. I guess it's OK if you think it helps, but many Clojure projects don't do so -- cf. Compojure (using a top-level compojure ns and various compojure.* ns's for specific functionality), Ring, Leiningen... Clojure itself uses clojure.* (and clojure.contrib.* for contrib libraries), but that's a special case, I suppose.
2. Yes! You absolutely must do so, or else Clojure won't be able to find your namespaces. Also note that you mustn't use the underscore in namespace names or the hyphen in filenames; wherever you use a hyphen in a namespace name, you must use an underscore in the filename (so that the ns my.cool-project is defined in a file called cool_project.clj in a directory called my).
3. You need to make sure all your stuff is on the classpath, but it doesn't matter if it's in a jar, multiple jars, a mixture of jars and directories on the filesystem... As long as it obeys the correct naming conventions (your point no. 2) you should be fine.
However, do not compile things ahead-of-time if there's no particular reason to do so -- this may prevent your code from being portable across various versions of Clojure without providing any benefits besides a slightly improved loading time.
You'll still need to use AOT compilation sometimes, notably in some Java interop scenarios -- the documentation of the relevant functions / macros always mentions that. There are examples of things requiring AOT in clojure.contrib; I've never needed it, so I can't provide much in the way of details.
4. I'd say you should use jars for functional units of code. E.g. Compojure and Ring get packaged as single jars containing many namespaces which together compose the whole package. Also, clojure.contrib is notably packaged as a single jar with multiple unrelated libraries; but that again may be a special case.
On the other hand, a single jar containing all of your project's code together with its dependencies might occasionally be useful for deployment. Check out the Leiningen build tool and its 'uberjar' facility if you think that sort of thing may be useful to you.
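The namespace-to-file rule from the answer to the second question (dots become directories, hyphens become underscores) can be sketched as a small Python helper. This is only an illustration of the naming convention as stated in that answer, not Clojure's actual loader; the function name namespace_to_path and the default .clj extension are assumptions made for the example.

```python
def namespace_to_path(ns, extension=".clj"):
    """Return the classpath-relative file expected for a namespace name.

    Each dot-separated segment becomes a directory level, and every
    hyphen in the namespace name becomes an underscore in the path.
    """
    segments = ns.replace("-", "_").split(".")
    return "/".join(segments) + extension


path = namespace_to_path("my.cool-project")
```

With the namespace from the answer, path is my/cool_project.clj, i.e. a file called cool_project.clj in a directory called my.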
1. Strictly speaking, it's not necessary; many Java projects have dropped that convention as well, especially for internal projects or private APIs. Do avoid single-segment namespaces though, which would result in classfiles being generated in the default package.
2. Yes.
Regarding 3 & 4, packaging and AOT compilation are entirely orthogonal to the question of namespace conventions.

Hiding Complexity by Building Concise Libraries

I'm developing a product with a bunch of interlocking pieces (server, client, libraries, etc) and one of the pieces is a tiny library that users will link into their own client-side code (something kind of like the Flickr API or the Google Maps API). Once they've included that library, all of the interlocking bits magically hook themselves together. So API simplicity is a major, important goal.
The API that I expose to users has a grand total of two classes and seven public methods. Easy peasy, lemon-squeezy.
But the simplicity is a carefully crafted illusion. The library I'm distributing actually depends on another library, with 136 classes of its own (and more than a thousand public methods). During the build process, I link the two libraries together into a single deliverable, for ease of integration and deployment by the API consumer.
The problem I'm facing now is that I don't want the end user (an application developer integrating my software to enhance their own functionality) to ever be bothered with all that extra cruft, drowning in a torrent of unnecessary complexity.
From the outside, the library should look like it contains exactly two public classes, with exactly seven public methods.
How do you handle this sort of thing in your own projects? I'm interested in the language agnostic solutions, as well as the various techniques for different languages, compilers, and build tools.
In my specific case, I'm developing for the Flash platform (AIR/Flex/ActionScript) with SWC library files. The build methodology is analogous to the Java platform, where all classes are bundled into a zipped code module with equal visibility (an ActionScript SWC file is, conceptually, almost exactly identical to a Java JAR file).
Doesn't .NET have an "internal" modifier for classes and methods? That's exactly the sort of thing I'm looking for, and if anyone knows of a tricky technique to hide the visibility of classes between SWC boundaries, I'd love to hear it.
It's pretty hard to hide things in AS. There is an internal access specifier and there are also namespaces. Adobe has some help on Packages and namespaces that may be useful to you.
It is important to note that namespaces do not limit access - they are really used to place symbols into a different ... well, namespace. This could be used to have 2 versions of the same library accessed in the same swf. My guess is it just does some name-mangling behind the scenes before inserting definitions into the symbol table. If users want, they can just import the namespace and access anything that is "hidden" behind it. I've done that when hacking apart Adobe components. That said, if the user doesn't have the original source and is incapable of determining the namespace identifier, then you have a bit of security through obscurity.
Access specifiers (e.g. private and internal) are closer to what you want. But if you can access classes outside package boundaries then the user can too. There are even hacks I've seen around that can examine a swfc and spit out a list of embedded classes, which one can then instantiate via getDefinitionByName.
So, you can hide the classes' existence in your documentation, use internal and private access specifiers wherever possible and then use namespaces to mangle the class names. But you cannot prevent a determined person from finding and using these classes.
I think you can pull this off by using namespaces:
http://livedocs.adobe.com/flash/9.0/main/wwhelp/wwhimpl/common/html/wwhelp.htm?context=LiveDocs_Parts&file=00000040.html
Note that namespaces in ActionScript are not the same as in C#; they are more like namespaces in XML.
Incidentally, one of the other tricks that I've used (since I didn't know about the "internal" modifier or namespaces) is to hide classes by declaring them outside the current package, like this:
package com.example {
    public class A {
        // ...
    }
}
class B {
    // ...
}
class C {
    // ...
}
I've even thought about writing a little tool that will analyze all the "import" directives within a project and move all external dependencies into these kinds of hidden private classes.