Decision tree induction open-source code [closed] - open-source

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
I am preparing a task for a computer vision class, which involves training a simple classifier after extracting features from images. Since machine learning is not the main topic here, I don't want students to implement a learning algorithm from scratch. So, I have to recommend some reference implementations to them. I believe the decision tree classifier is suitable for that.
The problem is that the variety of languages allowed for the class is quite large: C++, C#, Delphi. Also, I don't want students to spend a lot of time on technical issues like linking a library. WEKA is great for Java. We could also use OpenCV with all its wrappers, but it is quite big and clumsy, while I want something simple and sweet.
So, do you know any simple C++/C#/Delphi libraries for learning decision trees?

I know of two such libraries, only one of which I have used recently: Waffles and the Tilburg Memory-Based Learner (TiMBL). Both are free and open-source (LGPL and GNU GPL, respectively). In addition, both are stable, mature libraries. Waffles was created and is currently maintained by a single developer, while TiMBL, I believe, is an academic project (directed at the field of linguistics).
Of these two, I have only used the decision tree module in Waffles (in class GDecisionTree, see the documentation here). Waffles might be the library of choice here because it includes a decent set of functions for descriptive statistics as well as plotting functions for diagnostics, to visualize the solution space, and whatnot. The library author (Mike Gashler) also included a set of demo apps, though I don't recall if one of these apps is a decision tree.
I have used several of the classes in the Waffles library (including the decision tree class) and I can certainly recommend it. I'm unable to say anything more about TiMBL, though, because I have never used its decision tree class.

Have you looked at the "decision forest" implementation in Alglib? It's free for academic use. The webpage claims support for C++/C# and (maybe) Delphi. It's not a decision tree implementation but random forests tend to be better classifiers than single decision trees on many problems and they don't take much longer to train. My guess is that it will be hard to find a consistent decision tree implementation across multiple languages because there are so many different types of decision tree algorithms.
There are a number of other open source random forest libraries listed in the Wikipedia article if the Alglib one is not what you need. Caveat: the Alglib implementation claims not to be a traditional random forest.

The programming language is not the real problem. It is very hard to find a decision tree implementation for each language, and nearly impossible to guarantee that all the versions are the same implementation.
Since a decision tree can be treated as a black box, you can write the training and testing data into standard files (e.g. the ARFF format in Weka; OpenCV also has its own format) and call the tree learner and tester from the command line. In this way, all the students have the same decision tree. Otherwise, if student A uses a good tree learner and student B uses a bad one, then when their results differ you don't know whether the difference comes from the decision tree or from the CV part (e.g. feature processing), and you end up having to care about the details and implementation quality of the tree learners.
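As a rough sketch of the "write the features to a standard file" step, the snippet below renders feature vectors as ARFF text. The function name, relation name, feature names and sample values are made up for illustration; only the ARFF keywords (@relation, @attribute, @data) come from the format itself.

```python
def to_arff(relation, feature_names, rows, class_labels):
    """Render (features, label) pairs as ARFF text, the format WEKA reads."""
    lines = [f"@relation {relation}", ""]
    for name in feature_names:
        # Every extracted image feature is treated as a numeric attribute.
        lines.append(f"@attribute {name} numeric")
    # The class attribute is nominal: its allowed values are listed in braces.
    lines.append("@attribute class {" + ",".join(class_labels) + "}")
    lines += ["", "@data"]
    for features, label in rows:
        values = [repr(float(v)) for v in features] + [label]
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


# Hypothetical features extracted from two images.
rows = [([0.12, 3.4], "cat"), ([0.98, 1.1], "dog")]
arff_text = to_arff("image_features", ["mean_hue", "edge_density"], rows, ["cat", "dog"])
print(arff_text)
```

Each student can emit such a file from C++, C# or Delphi with plain string formatting, and everyone then runs the same learner on it. With WEKA that is typically something like `java -cp weka.jar weka.classifiers.trees.J48 -t train.arff -T test.arff`, though the exact invocation should be checked against the installed version.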


New design patterns/design strategies [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I've studied and implemented design patterns for a few years now, and I'm wondering: what are some of the newer design patterns (since the GoF)? Also, what should one, similar to myself, study [in the way of software design] next?
Note: I've been using TDD and UML for some time now. I'm curious about the newer paradigm shifts and/or newer design patterns.
There is roughly an infinite number of design patterns. Design patterns are just that: a recurrence of tricks programmers use to get things done. The most useful thing about the GoF patterns is how famous they are. In that, they have become a language -- exactly what the GoF hoped to achieve.
Many other patterns you'll find on the web and in literature are "just" useful tricks, not so much a language you can use when you speak to fellow programmers. That said, there are a number of patterns that arose in the past ten years or so, particularly in the realm of web development. See the patterns listed in Martin Fowler's patterns book.
I'm surprised that no one has mentioned Martin Fowler's book Patterns of Enterprise Application Architecture. This is an outstanding book with dozens of patterns, many of which are used in modern ORM design (repository, active record), along with a number of UI layer patterns. Highly recommended.
I am an avid follower and supporter of the PCMEF (now PCBMER) framework
Here's a simpler overview of it.
It recognizes that enterprise systems are huge and complex, and by combining a bunch of other design patterns into the PCBMER framework (Presentation, Control, Mediator, Entity and Resource), even the most complex system remains easy to understand and manage.
One of the newer ones that I found particularly useful is Domain Driven Design. Not so much a pattern in its own right, but more of a mindset: concentrate on the domain objects, i.e. the things that you model, and build the rest of the application around them.
I found that it gave meaning to principles that we all knew before but were too lazy to deal with - like Single Responsibility Principle and Separation of Concerns. I take those two especially more seriously now.
Another axis of improvement for me was TDD and Dependency Injection. I have discovered that with lots of interfaces and classes implementing them I was able to let go of this fear of only defining something once. That is not to say that it is in conflict with DRY(Don't Repeat Yourself) much. It's OK to have two classes with the same properties if their purposes are different. Encapsulation and SRP are much more important than only defining a property once.
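A small sketch may make the dependency-injection point concrete. All names here (Clock, FixedClock, SessionChecker) are hypothetical and chosen only for illustration: the class under test depends on an interface, and a test hands it a stand-in implementation.

```python
from typing import Protocol


class Clock(Protocol):
    """The interface the rest of the code depends on."""

    def now(self) -> float: ...


class FixedClock:
    """Test double: always reports the same time."""

    def __init__(self, value: float) -> None:
        self.value = value

    def now(self) -> float:
        return self.value


class SessionChecker:
    """Single responsibility: decide whether a session has expired.

    The clock is injected, so the class never reaches for a global timer.
    """

    def __init__(self, clock: Clock, ttl_seconds: float) -> None:
        self.clock = clock
        self.ttl_seconds = ttl_seconds

    def is_expired(self, started_at: float) -> bool:
        return self.clock.now() - started_at > self.ttl_seconds


# In a test, inject a clock frozen at t = 1000 seconds.
checker = SessionChecker(FixedClock(1000.0), ttl_seconds=60.0)
print(checker.is_expired(900.0))   # session started 100 s ago
print(checker.is_expired(970.0))   # session started 30 s ago
```

A production clock class would repeat the same `now` signature as FixedClock, which is the kind of harmless duplication the answer is describing: two classes with the same shape but different purposes.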
Umm...none of the things people have mentioned are design patterns.
The GoF book was written with C++ and Smalltalk in mind (its sample code uses both), and it explored that space pretty well. However, once you go into other languages some patterns are no longer necessary (Observer is rarely used in a language like C# that supports events) and some new ones spring forth. Grab yourself the Pro JavaScript Design Patterns or Design Patterns In Ruby books and see what happens to the stand-by patterns in these very different paradigms.
My favorites lately have come from leaning on the functional drift of modern languages. I'm a big fan of nested closures and of the functional ways of tackling some of the same problems that GoF does (again, see the Ruby book for great examples). I also am currently in love with the idea of internal domain-specific languages which open up into a whole series of design patterns of their own (including nested closures). Also event-aggregation seems to be poised to hit it big in the .Net world in the near future.
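To illustrate how first-class functions and closures shrink a GoF pattern, here is a minimal sketch of Observer in which observers are plain callables rather than classes implementing an interface. The Event class and make_counter helper are invented for this example and are not taken from either book.

```python
class Event:
    """A minimal event: any callable can subscribe as an observer."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def fire(self, *args):
        for handler in self._handlers:
            handler(*args)


def make_counter():
    """Return a handler and a getter that share state through a closure."""
    count = 0

    def on_event(_payload):
        nonlocal count
        count += 1

    def get():
        return count

    return on_event, get


saved = Event()
log = []
saved.subscribe(log.append)             # a bound method works as an observer
on_saved, saved_count = make_counter()
saved.subscribe(on_saved)               # so does a closure

saved.fire("report.txt")
saved.fire("notes.txt")
print(log, saved_count())
```

No Observer or Subject base classes are needed; the language's own function values carry the pattern.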
A couple other big ones that have hit the scene but aren't discussed as much in GoF - probably because they are more high level than what those guys were going for - are Inversion Of Control Containers, Message Bussing, Aspect-Oriented-Programming, Model-View-Controller, Model-View-Presenter, Model-View-ViewModel, and their ilk.
By the way, these are not design patterns, but if you're looking to progress beyond TDD start looking into Behavior-Driven-Development and Context/Specification.
A huge change from a maintenance aspect is the use of DVCS. If you don't know what one is or haven't used one, I highly suggest reading up on the two hard hitters:
Mercurial (hg): https://www.mercurial-scm.org/
git : http://git-scm.com/
They've done quite a bit to change the workflow of the common programming environment. Not really a pattern/design I suppose, but I don't think TDD or UML are technical patterns/designs either at some level. Maybe more like common practices surrounding programming.

Looking for (c)lisp examples of mini-languages, that is, DSLs [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
Reading well-written code seems to help me learn a language. (At least it worked with C.) [deleting the 'over-specified' part of the question]
I'm interested in particular in lisp's reputation as a language suited to creating a mini-language or DSL specific to a problem. The program ought to be open-source, of course, and available over the web, preferably.
I've Googled and found this example:
http://lispm.dyndns.org/news?ID=NEWS-2005-07-08-1
Anybody have another? (And, yes, I will continue reading "Practical Common Lisp".)
After 11 hours (only 11 hours!): Thanks, everyone. What a wonderful site, and what a bunch of good answers and tips!
I feel your constraints are over-specified:
"small enough to comprehend, varied enough to show off most of (c)lisp's tricks and features without being opaque (the 'well-written' part of the wish), and independent of other packages."
Common Lisp is a huge language, and the power set that emerges when you combine the language elements is much larger. You can't have a small program showing "most tricks" in CL.
There are also many concepts that you will find alien when you learn CL coming from another language. As such, CL is less about tricks and more about its fundamental paradigms.
My suggestion is to read up on it a bit first and then start building your own programs or looking into open source code.
Edi Weitz for example usually writes good code. Check out his projects at http://www.weitz.de/.
And now go read PCL. :)
I'm too lazy to find the links, but you should be able to 'Google'/'Bing' them. The following list mentions very different ways to embed languages and very different embedded languages.
ITERATE for iterations
System/Module/File description in 'defsystem's, an example would be ASDF
infix readmacro
define-application-frame in CLIM for specifying user interfaces
embedded Lispified SQL queries in LispWorks and CLSQL
Knowledgeworks of LispWorks: logic language with rules, queries, ...
embedded Prolog in Allegro CL
embedded HTML in various forms
XMLisp, integrates XML and Lisp
Screamer for non-deterministic programming
PWGL, visual programming for composing music
Note that there are simple embedded languages and really complex ones that are providing whole new paradigms like Prolog, Screamer, CORBA, ...
If you haven't taken a look at it yet, the book Practical Common Lisp is available free online and has several example projects.
The LOOP macro is an almost perfect example of a DSL embedded in Common Lisp. However, since it's already part of the standard, it may not be what you're after.
CL's format function has a mini-DSL.
http://cybertiggyr.com/fmt/
I think that this DSL for printing strings gets compiled to machine code.
(format nil "~{~A~#[~:;, ~]~}" lst)
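For readers who do not read format directives fluently: in the call above, ~{ ... ~} iterates over the list, ~A prints each element, and the ~#[ ... ~:;, ~] conditional emits ", " only while elements remain. A rough Python rendering of the same behavior follows; the function name is made up, and Python's str is only an approximation of how ~A prints Lisp objects.

```python
def format_comma_list(items):
    """Mimic the format directive above: elements separated by ', '."""
    out = []
    remaining = len(items)
    for item in items:
        remaining -= 1
        out.append(str(item))      # ~A: print the element
        if remaining:              # ~#[ ... ~:;, ~]: separator unless this is the last one
            out.append(", ")
    return "".join(out)


print(format_comma_list([1, 2, 3]))
```

The Lisp version packs the whole loop, including the "no trailing separator" rule, into one directive string, which is what makes format a small language of its own.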
CLSQL provides a Lispy notation for SQL queries, which it compiles to SQL, and just about all Lisp HTML and XML generation libraries qualify. Metabang bind is a DSL for lexically binding variables. You probably didn't know you needed one, but it turns out to be amazingly useful.
SERIES is kind of a DSL, depending on your definition. It's in an appendix to CLTL2, though it's not actually part of the language.

Where to find programming projects that help science? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I would like to work on a programming project in my spare time and would like to know if there is a project where I can help the science community in some way?
Sure, plenty! I see I'm not the first to think of numerical computation libraries like Numpy/Scipy - the code in that is actually fairly mature but they could certainly use help documenting. There's also GNU Octave, which does much of the same things as Numpy but doesn't require Python. A slightly related area in which there's a lot of work to do is computer algebra systems (CAS), basically open source equivalents of Mathematica; for example Maxima, and more are listed at http://sage.math.washington.edu/home/wdj/sigsam/opensource_math.html. You could also help with visualization libraries, i.e. creation of 2D and 3D plots and figures. For Scipy the most commonly used plot generator is Matplotlib, for example. There are also loads of more specialized data visualization tools that I'm sure you can find with a few searches.
One area that I personally think needs a lot of work is creating GUIs for the programs mentioned in the previous paragraph; one major advantage that commercial programs like Matlab and Mathematica enjoy over their open source equivalents is easy-to-use graphical interfaces. Having a nice usable interface would be great for scientists who may not be skilled in command-line-fu, but open source projects have a long way to go if they're going to catch up.
Projects like scipy and numpy are largely contributed by the scientific community. I'm sure they would appreciate any help you thought you could provide.
I know BOINC is always looking for help
Edit: Here is their programming help page http://boinc.berkeley.edu/trac/wiki/DevProjects
The Bio* projects like BioPerl, BioPython, or BioRuby would certainly like some help, too.
http://sourceforge.net/search/?type_of_search=soft&words=science
In addition to searching open source projects online, you can try to contact your local university and ask if any of their researchers (students or faculty) need development help.
If you are still looking, feel free to contact me via my profile page - I know of a hardware product that needs software - it is used for research (chemistry and biology)
The nuclear and particle physics communities make heavy use of ROOT, which is developed using an open source methodology. They accept suggestions and patches without much trouble. The main work is in C++, but there are bindings and support for other languages as well.
I'm sure that other disciplines have their own domain specific tools. For instance, I know that there are open Computational Fluid Dynamics and Finite Element systems.
Have a look around. While domain knowledge would be helpful, most big tools are going to need help with routine stuff like RDBMS access, GUIs, documentation, and so on...
You can discover the current problems of Science by reading the abstracts of the academic journals. e.g. the Bioinformatics journal.
A few examples:
Find faster, more efficient methods to assemble a huge set of short DNA reads
Find a way to build an efficient social scientific network
Find a way to compare thousands of human genomes
....
You could also offer your help on Nature Network:Collaboration or FriendFeed: The life scientists
There are many exciting opportunities in chemistry. There is a strong Open Source community, much of which is organized under the Blue Obelisk (http://www.blueobelisk.org). There have been major contributions in visualisation and algorithms which did not need previous chemical knowledge and the community is very welcoming to anyone who wishes to help.
For an example of the standard which has been achieved take a look at Jmol which visualizes molecules and other chemistry in 3D (http://www.jmol.org);
There is also real opportunity to do porting between platforms/languages. The commonest ones are Java, Python, C++ and we have been working in C#. You don't have to be an ace programmer either - contributions to data standards, data resources, tutorials, packaging, installers, testing, etc. are all highly valued.
Some of these projects are within the top 100-500 projects on Sourceforge.
Don't forget that if you find a project to be a bit over your head or you aren't able to really contribute, but you still like the idea of it, you can always donate!

What does it take to make a language successful? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I have an interesting idea for a new programming language. It's based on a new programming paradigm that I've been working out in my head for some time. I finally got around to start working on a basic parser and interpreter for it a few weeks ago.
I want my new language to be successful and I want to eventually create a community around it when it's ready to release. The idea behind it is fairly innovative, so I don't expect it to gain a lot of ground in the business world, but it would thrill me more than anything else to see a handful of startups or open source projects use it.
So taking those aims into account, what can I do to help make my language successful? What do language projects do to become successful? What should I avoid at all costs? I'd love to hear opinions or stories about other languages -- successful or not -- so I can think about them as I continue to develop.
So far, the biggest concerns on my mind are finding a market, access to existing libraries, and having amazing tool support. What else might I add to this list?
The true answer is by having a beard.
http://blogs.microsoft.co.il/blogs/tamir/archive/2008/04/28/computer-languages-and-facial-hair-take-two.aspx
Although not specific to new programming languages, the book Producing Open Source Software by Karl Fogel (available to read online) may contain some hints on the issue of building a community around your new programming language.
In terms of adoption of programming languages in general, it seems like the trend lately has been to have a rich library to make development times shorter.
As there isn't much detail on what your language is like, it's hard to determine whether adoption of the language is going to depend on the availability of a rich library. Perhaps your language will be able to fill a niche that has been overlooked by other languages and be able to gain users. Or perhaps it has a slick name that will draw people in -- there are many factors which can affect the adoption of a language.
Here are some factors that come to mind when thinking about recent successful languages:
* Ability to leverage existing libraries in the new language.
  * Having an adapter to external libraries written in other languages.
    * Python allows access to code written in C through the Python/C API.
  * Targeting a platform which already has plenty of libraries available for use.
    * Groovy and Scala target the Java platform, therefore allowing the use of and interoperation between existing Java code.
* Language design and syntax to allow increased productivity.
  * Many dynamically-typed languages have gained popularity, such as Ruby and Python to name a couple.
  * More concise and clear code can be written in languages such as Groovy, as opposed to verbose languages such as Java.
  * Offering features such as functions as first-class objects and closures which aren't offered in more "traditional" languages such as C and Java.
* A community of dedicated users who also are willing to teach newcomers on the benefits of a language
  * The human factor is going to be big in wide-spread support for a language -- if people never start using your language, it won't gain more users.
Also, another suggestion that I could add is to make the development of your language open -- keep your users posted on developments in your language, and allow people to give you feedback. Better yet, let your users take part in the decision-making process, if you feel that is appropriate.
I believe that by offering ways to participate in the bringing up of a language, the more people will feel that they have a stake in the success of the new language, so the more likely it will gain more support.
Good luck!
Most languages that end up taking off rapidly do so by means of a killer app. For C it was Unix. Ruby had Rails. JavaScript is the only available programming system common to most browsers without third-party add-ons.
Another means of success is by fiat. This only works if you have significant clout. For example C#, as nice a language as it might be, wouldn't be anywhere near as popular as it is now if Microsoft had not pushed it as hard as it does. Objective-C is the language of MacOS X simply because Apple says so.
The vast majority of languages, though, which lack a single killer app or a major corporate backer have gained success through long term investment of their respective creators. Perl and Python are prime examples. C++ has no single entity behind it, but it has evolved as the needs of developers have changed.
Don't worry about trying to make the language be successful; worry about using it to solve real problems and make real money.
You'll either make lots of money from using this language, or not. Once you have lots of money, others may care how you did it. Or not, either way you have lots of money.
If you don't make lots of money, nobody will want to know how you did it.
Edit based on comment: I define successful as people using it, and people use languages to solve problems, most for profit, thus successful == profitable.
In addition to making the language easy to use (which has several meanings), you should develop a comprehensive library that covers and also provides a good level of abstraction over (the following most important areas):
* Data structures and manipulation
* File I/O support
* XML processing
* Networking (plus web based technologies like HTTP/HTTPS)
* Database support
* Synchronous and asynchronous I/O
* Processes and threads
* Math
A well thought out framework that makes rapid development faster (and easier to maintain) would be a great addition. For this, you should know the currently popular frameworks well.
Keep in mind that it takes a lot of time. I think it took Python about 10 years (someone please correct me if I'm wrong).
So even if your community still seems small after say, 5 years, that's not the end of the story.
"It's based on a new programming paradigm that I've been working out in my head for some time."
While laudable, odds are really good that someone has already done something with your "new" paradigm.
To make a language usable, it must build on prior art. Totally new is not a good path to success. My favorite example is Algol 68.
Algol 60 was wildly popular (back in the day, which is a while ago, admittedly).
The experts wanted to build on this success. They proposed some new paradigms, the effort split into factions. The purists put the new paradigms into Algol 68; it disappeared into obscurity. Some folks created a different version of Algol, called PL/I. It did not have any really new paradigms. It actually went somewhere and was used heavily. Another group created Pascal -- it didn't have much that was new -- it discarded things from Algol 60. It actually went somewhere and was used heavily.
Your new paradigm must have a clear and concise summary so people can fit it into a context of where the language is usable, how it can be used, what the costs and benefits of using it are.
A "new programming paradigm" causes some people to say "why learn a completely new paradigm when the ones I have work so nicely?" You have to be very clear on how it helps to have a new paradigm.
The language and libraries must work, and work very, very well. A language that isn't rock-solid is worthless. In order to be rock-solid it must be very simple.
It has to have a tutorial that will help anyone get started with your language.
Good Framework for Common Tasks
Easy Installation/Deployment
Good Documentation
Debugger/IDE and other Tools
A popular flagship product that uses your language!
Good documentation, including a detailed reference manual as well as simple examples to get people started quickly.
Good library support so that people can actually write useful programs.
Most popular languages seem to be very strong in either or both of those.
Use Trojan Horse approach
C++ - The Forgotten Trojan Horse
An interesting article on why C++ can grab the heart of programmers successfully.

Open source expert system [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
Does anyone know about an open source expert system? Actually, I'm rather interested in calling its inference engine from C#.
Both CLIPS and JESS are already mentioned in other answers, so I will supply this link to CLIPS versus JESS:
http://www.comp.lancs.ac.uk/~kristof/research/notes/clipsvsjess/
It was written June 4, 1999, and at that time the advantage was clearly with CLIPS.
If you don't want to read it all, here are the conclusions:
Chapter 3 The conclusions
Both CLIPS and JESS are products with a large support on the internet,
but CLIPS seems to have a broader audience, probably because it exists
longer. This difference in age results in the CLIPS package being more
stable and complete, while JESS users will still experience some minor
bugs. JESS is constantly updated and the author, Ernest Friedman-Hill,
has been very responsive to user/developer feedback and regularly puts
out new releases and bug fixes.
Nowadays, the choice between JESS and CLIPS depends on the
application. If it is web-based or should reside in applet-form, the
choice of JESS is a very logical one (which is even supported by the
authors of CLIPS). For the more classic applications, CLIPS will
probably be chosen because of its reputation of being more stable and
having more support.
The future of JESS depends highly on the evolution of the web, the
Java programming language and its own future stability. These three
conditions make that there is a great possibility that JESS will
become more popular and more frequently used. Especially the
object-oriented possibilities and the easy integration into Java code
makes JESS’ future very promising.
CLIPS, on the other hand, is more likely to implement the new and
sophisticated features first as they come out, since it still has the
advantage in time. CLIPS has also various extensions and variants(like
FuzzyCLIPS, AGENT CLIPS, DYNACLIPS, KnowExec, CAPE, PerlCLIPS, wxCLIPS
and EHSIS to name a few) that give it an advantage with respect to
support of methods like fuzzy logic and agents.
The multifunctional developing environment of CLIPS for operating
systems that support windows is also an advantage, while JESS has just
one window with two buttons (‘clear window’ and ‘quit’), without a
menu. Figures 1 and 2 depict both environments.
To summarize, CLIPS is still more complete and stable than JESS, but
this might change in the future, since the JESS package is being
improved constantly. Besides that, JESS has also the property of using
Java, which in the long run might prove to be a big advantage over
CLIPS.
These links may also be of interest:
http://en.wikipedia.org/wiki/CLIPS
Commercial & Freeware Expert System Shells
http://www.kbsc.com/rulebase.html
Are there open source expert systems with reasoning capabilities?
I went through the same process, about a year ago, trying to find a good .Net system for this. I recall finding a few decent engines, but they were all too general, and required too many assumptions.
In the end I found that writing my own system was pretty easy to do, and it did exactly what I wanted it to, without any extra bull to make it work with some abstract generalized engine.
It might help to know what your intended use is.
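To give a sense of how small a hand-written engine can be, here is a minimal forward-chaining sketch. It is an assumption about what such a system might look like, not the code this answer describes, and it is far simpler than the pattern matching that engines like CLIPS perform. The rule and fact names are invented.

```python
def forward_chain(facts, rules):
    """Apply rules repeatedly until no new fact can be derived.

    facts: iterable of hashable facts.
    rules: list of (premises, conclusion) pairs; a rule fires when
           every one of its premises is already a known fact.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rule base.
rules = [
    (("has_fur", "says_meow"), "is_cat"),
    (("is_cat",), "is_mammal"),
]
derived = forward_chain({"has_fur", "says_meow"}, rules)
print(sorted(derived))
```

The same loop ports directly to C#, with rules held as a list of (premises, conclusion) records. A general-purpose engine becomes worth its overhead once rules need variables, priorities or retraction.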
Take a look at CLIPS -- it is coded in C.
There's more info on CLIPS at Wikipedia.
If you'd consider a rule-processing engine, JBoss Rules (also known as Drools) is the best that I know of. Open Source and free. It's written in Java, but designed for integration. You can incorporate objects in the rules and rule-base applications in your components. You can even build or modify rule-bases on the fly.
AI::ExpertSystem::Advanced or AI::ExpertSystem::Simple is a Perl solution.
You can try JESS, but it is Java-based. Amzilogic also provides a good platform.