Studying standard library sources - language-agnostic

How does one study open-source libraries code, particularly standard libraries?
The code base is often vast and hard to navigate. How to find some function or class definition?
Do I search through downloaded source files?
Do I need cvs/svn for that?
Maybe web-search?
Should I just know the structure of the standard library?
Is there any reference on it?
Or do some IDEs have such features? Or some other tools?
How to do it effectively without one?
What are the best practices of doing this in any open-source libraries?
Is there any convention of how are sources manipulated on Linux/Unix systems?
What are the differences for specific programming languages?
Broad presentation of the subject is highly encouraged.
I mark this 'community wiki' so everyone can rephrase and expand my awkward formulations!
Update: Probably didn't express the problem clear enough. What I want to, is to view just the source code of some specific library class or function. And the problem is mostly about work organization and usability - how do I navigate in the huge pile of sources to find the thing, maybe there are specific tools or approaches? It feels like there should've long existed some solution(s) for that.

One thing to note is that standard libraries are sometimes (often?) optimized more than is good for most production code.
Because they are widely used, they have to perform well over a wide variety of conditions, and may be full of clever tricks and special logic for corner cases.
Maybe they are not the best thing to study as a beginner.
Just a thought.

Well, I think that it's insane to just site down and read a library's code. My approach is to search whenever I come across the need to implement something by myself and then study the way that it's implemented in those libraries.
And there's also allot of projects/libraries with excellent documentation, which I find more important to read than the code. In Unix based systems you often find valuable information in the man pages.

Wow, that's a big question.
The short answer: it depends.
The long answer:
Some libraries provide documentation while others don't. Standard libraries are usually pretty well documented, whether your chosen implementation of the library includes documentation or not. For instance you may have found an implementation of the c standard library without documentation but the c standard has been around long enough that there are hundreds of good reference books available. Documentation with hyperlinks is a very useful way to learn a new API. In any case the first place I would look is the library's main website
For less well known libraries lacking documentation I find two different approaches very helpful.
First is a doc generator. Nearly every language I know of has one. It basically parses an source tree and creates documentation (usually as html or xml) which can be used to learn a library. Some use specially formatted comments in the code to create more complete documentation. JavaDoc is one good example of this. Doc generators for many other languages borrow from JavaDoc.
Second an IDE with a class browser. These act as a sort of on the fly documentation. Some display just the library's interface. Other's include description comments from the library's source.
Both of these will require access to the libraries source (which will come in handy if you intend actually use a library).
Many of these tools and techniques work equally well for closed/proprietary libraries.

The standard Java libraries' source code is available. For a beginning Java programmer these can be a great read. Especially the Collections framework is a good place to start. Take for instance the implementation of ArrayList and learn how you can implement a resizeable array in Java. Most of the source has even useful comments.
The best parts to read are probably whose purpose you can understand immediately. Start with the easy pieces and try to follow all the steps that are hidden behind that single call you make from your own code.

Something I do from time to time :
apt-get source foo
Then new C++ project (or whatever) in Eclipse and import.
=> Wow ! Browsable ! (use F3)

Related

what's DSL in plain words?

I heard from someone that DSL is really powerful in some specific fields. So i want to find out if i can put it into my skill sets.
The first problem came out is What is DSL exactly? After doing some search, it seems Groovy supports DSL very well. Then i go and read Groovy's documents and try it out by myself.
And i got the impression that DSL is just some kind of configuration files consisting of texts, XMLs and you use some tools like Groovy to parse it, it magically become some methods or functions you can invoke. What happened?
I read something, but cannot get it straight. Any Help?
Did you read this? Martin Fowler is an authority on the subject and a great writer. I doubt that anyone will improve on the first paragraph. If you still don't get it, give it some time and re-read the article a few times.
I'd recommend looking into JetBrain's MPS
A book might be overwhelming, but there's a relatively new one available.
And i got the impression that DSL is just some kind of configuration
files consisting of texts, XMLs and you use some tools like Groovy to
parse it, it magically become some methods or functions you can
invoke. What happened?
I don't think your impression is entirely accurate. I'd forget about Groovy and parsing and all the implementation details for now. Focus on the problem that DSL is trying to solve.
A DSL designer tries to come up with a pseudo programming language that an expert, who is unfamilar with programming languages like Groovy or Java or C#, would recognize as a simple language describing they way they solve problems.
The DSL uses terms and concepts familiar to any one knowledgable about that domain.
The DSL shields users from the underlying implementation details so they can focus on how to attack their problems.
A DSL is written for the convenience of business users, not developers.
Keep that in mind and the rest is implementation. Eye on the prize....
A domain specific language (DSL) is a programing language that is not fully featured. The point is that programing in a DSL can be easier than programing in a general purpose language, and be less prone to bugs. The "domain" in "domain specific language" refers to the specific purpose the language will be used for.
For example, the language that a calculator uses with just + - * / and numbers could be called a domain specific language. It has the advantage over a regular programing language in that programs will never segfault, crash, loop forever, etc. Other examples of domains might be web development -- for example, Ur/Web is a DSL for building web applications. SQL is a database domain specific language. etc.
I don't know much about Groovy, but it seems that there are particular tools for using it to create DSLs. Fundamentally, to create a DSL you need to specify a syntax, along with some sort of semantics. How exactly Groovy does this I do not know.
DSL is a language dedicated to a specific domain. For instance, the well-known CSS is a Domain Specific Language serving the look and formatting of a document.
By using Groovy you might create your own DSL focusing on any selected domain - e.g. accounting, telecommunications, banking etc. This means, that the language will use the common terminology of this area meeting the needs of this domain. This language will be easily understood by people of this domain that are not necessarily technical (e.g. accountants). In some times, it focuses on being used by non-programmers. Especially Groovy is a dynamic language with which you can enable end-users to add code scripts dynamically similarly to what Excel does with VB, through configuration files.
You should delve into Martin Fowler's publications if you are interested in this subject, anyway.

Extending embedded Python in C++ - Design to interact with C++ instances

There are several packages out there that help in automating the task of writing bindings between C\C++ and other languages.
In my case, I'd like to bind Python, some options for such packages are: SWIG, Boost.Python and Robin.
It seems that the straight forward process is to use these packages to create C\C++ linkable libraries (with mostly static functions) and have the higher language be extended using them.
However, my situation is that I already have a developed working system in C++ therefore plan to embed Python into it so that future development will be in Python.
It's not clear to me how, and if at all possible, to use these packages in helping to extend embedded Python in such a way that the Python code would be able to interact with the various Singleton instances already running in the system, and instantiate C++ classes and interact with them.
What I'm looking for is an insight regarding the design best fitted for this situation.
Boost.python lets you do a lot of those things right out of the box, especially if you use smart pointers. You can even inherit from C++ classes in Python, then pass instances of those back to your C++ code and have everything still work. My favorite resource on how to do various stuff is this (especially check out the "How To" section): http://wiki.python.org/moin/boost.python/ .
Boost.python is especially good if you're using smart pointers or intrusive pointers, as those translate transparently into PyObject reference counting. Also, it's very good at making factory functions look like Python constructors, which makes for very clean Python APIs.
If you're not using smart pointers, it's still possible to do all the things you want, but you have to mess with various return and lifetime policies, which can give you a headache.
To make it short: There is the modern alternative pybind11.
Long version: I also had to embed python. The C++ Python interface is small so I decided to use the C Api. That turned out to be a nightmare. Exposing classes lets you write tons of complicated boilerplate code. Boost::Python greatly avoids this by using readable interface definitions. However I found that boost lacks a sophisticated documentation and dor some things you still have to call the Python api. Further their build system seems to give people troubles. I cant tell since i use packages provided by the system. Finally I tried the boost python fork pybind11 and have to say that it is really convenient and fixes some shortcomings of boost like the necessity of the use of the Python Api, ability to use lambdas, the lack of an easy comprehensible documentation and automatic exception translation. Further it is header only and does not pull the huge boost dependency on deployment, so I can definitively recommend it.

Text User Interface Design Reference?

Is there a good book or other references on Text User Interface Design? I am not interested in graphical user interfaces. I am interested in usability for good command line and scripting interfaces.
Your interface should follow the Rule of Least Surprise as described by ESR in The Art of Unix Programming. If your programm supports command line options, make sure they have the traditional meaning. Be sure to read the chapter about Tradeoffs between CLI and Visual Interfaces.
IBM developed a standard called Common User Access. The Common User Access Basic Interface Design Guide has been published in the BookManager format and in HTML here.
The guide was written as a standard for developing 3270 applications. In my opinion the most important parts are the function keys standard and a color standard.
I'd use a favorite program as a reference for something like this. What command line utility do you think has a good, efficient interface that you could model your program on? Use it.
Update: So I think I need to revise this a little. It was taken way too literally. Google and this site proved that the internet is very democratic. What is popular is replicated, linked to or reproduced in someway.
Given this, plus one's personal experiences with computers, I think it is feasible to derive a pretty good solution based on personal experience and consideration for the solution to be provided.
For example, vim is a great program. A lot of people use it and love it. But that type of interface is probably not going to work (at least well) for a version control system. But both interfaces are very elegant for the purpose they suite. On the other hand, the vim type interface might work for a section of the version control system -- the commit dialog for example.
Now, I know that vim is normally used for the "commit dialog" (by default) for svn (on unix based OSes). This is just an example of mixing two styles of interfaces to come up with a cohesive solution.
You should have a look at some of the ideas behind Ubiquity as well as some of the ideas Aza Raskin talks about, seems like the same kind of thing.

Essential Dojo

I'm starting to use Dojo; this is (essentially) my introduction to AJAX. We have a Java backend (torque / turbine / velocity) and are using the jabsorb JSON-RPC library to bridge Java and Javascript.
What do I need to know? What is the big picture of Dojo and JSON, and what are the nasty little details that will catch me up? What did you spend a couple of days tracking down, when you started with Dojo, that you now take for granted? Thanks for any and all tips.
The first thing to do is get familiar with the Dojo Object Model. JavaScript does not have a class system so the Dojo toolkit has created a sort of "by convention" object model that works rather well but is very different to how it works in Java for example.
The reason I suggest getting familiar with it is so you can dig into the code base whenever you start experiencing issues. The documentation available has improved significantly over the past year, but every now and then I find myself having to work out a bug in my code by learning exactly how the Dojo code involved works.
Another tip is to make use of the custom build feature which will significantly improve performance once your application is ready.
As a general tip on DHTML programming, use firebug (a plug-in for Firefox). It allows JavaScript debugging, DOM inspection, HTML editing in real-time and a whole lot more. I've become totally reliant on it now when I'm working in DHTML!
Good luck!
I too just dove head first into Dojo, they have a good API documentation at http://api.dojotoolkit.org/. Even Dojo Campus has some good examples of the plug ins.
If you ask me O'Reilly's Dojo: The Definitive Guide is the best Dojo book on the market.
I also would like any tips and pointers from the Dojo masters.
Cheers
Make sure documentation you read pertains to as recent a release as possible, since a lot has changed very quickly in the Dojo architecture.
Also a great way to see how some Dojo or Dijit widget is used is to look at the source code for the tests - for example, the DataGrid has poor documentation but the tests show a lot of use cases and configurations.
Sitepen is a good resource for Dojo articles.
Also, read up on Deferred (andDeferredList), as well as hitch() - two extremely flexible and powerful features of Dojo. SitePen has a great article on demystifying Deferreds.
Check out plugd, a collection of Dojo extensions that make some things more convenient or adds some clever functionalities to the language. It's made by one of the core Dojo authors so it's rather reliable. It even brings some jQuery niceties into the framework.
Some more things: look into data stores, they're very useful and a much cleaner way to handle Ajax. DojoX has a lot of nice ones too, just remember that DojoX ranges in how well documented or how experimental the components are. Learn the differences between dojo.byId and dijit.byId, as well as the HTML attributes id versus jsId (again, Sitepen has an article).
A couple of things that caught me when I started writing widgets where:
[Understand what dojoAttachPoint, dojoAttachEvent, containerNode and widgitsInTemplate do][1]
have a firm grasp of closures,
Get your head around deferreds
understand ItemFileReadStore, ItemFileWriteStore and stores in general
You can look at stores like a ResultSet (sort of) as well you can data bind them to widgets.
With these major concepts you can start to put together some compelling applications.
Generally what I do is I build a JavaScript facade around my service calls and then I will scrub the response into a store by attaching the first callback in the facade, that call back converts the results into a store and then returns it. This allows me to not hard bind my services to Dojo constructs (so I can support mobile, etc.) while also retuning the data from the facade in a format that data aware widgets expect.
As well if you are doing Java service development you my want to look into JAX-RS. I started out using JSON-RPC which became JABS-ORB but after working with JAX-RS I prefer it, as it integrates well with JPA-EJB and JAXB.
First read how to configure Dojo in your application. Try to understand basic structure of Dojo like if we are writing dijit.form.Button or dijit/form/Button it means Button.js resides in dijit/form folder. Try to understand require, define, declare modules of Dojo. This is enough to start Dojo Toolkit.
Very important fact, indulge with your own sample project using Dojo.

runnable pseudocode?

I am attempting to determine prior art for the following idea:
1) user types in some code in a language called (insert_name_here);
2) user chooses a destination language from a list of well-known output candidates (javascript, ruby, perl, python);
3) the processor translates insert_name_here into runnable code in destination language;
4) the processor then runs the code using the relevant system call based on the chosen language
The reason this works is because there is a pre-established 1 to 1 mapping between all language constructs from insert_name_here to all supported destination languages.
(Disclaimer: This obviously does not produce "elegant" code that is well-tailored to the destination language. It simply does a rudimentary translation that is runnable. The purpose is to allow developers to get a quick-and-dirty implementation of algorithms in several different languages for those cases where they do not feel like re-inventing the wheel, but are required for whatever reason to work with a specific language on a specific project.)
Does this already exist?
The .NET CLR is designed such that C++.Net, C#.Net, and VB.Net all compile to the same machine language, and you can "decompile" that CLI back in to any one of those languages.
So yes, I would say it already exists though not exactly as you describe.
There are converters available for different languages. The problem you are going to have is dealing with libraries. While mapping between language statements might be easy, finding mappings between library functions will be very difficult.
I'm not really sure how useful that type of code generator would be. Why would you want to write something in one language and then immediately convert it to something else? I can see the rationale for 4th Gen languages that convert diagrams or models into code but I don't really see the point of your effort.
Yes, a program that transform a program from one representation to another does exist. It's called a "compiler".
And as to your question whether that is always possible: as long as your target language is at least as powerful as the source language, then it is possible. So, if your target language is Turing-complete, then it is always possible, because there can be no language that is more powerful than a Turing-complete language.
However, there does not need to be a dumb 1:1 mapping.
For example: the Microsoft Volta compiler which compiles CIL bytecode to JavaScript sourcecode has a problem: .NET has threads, JavaScript doesn't. But you can implement threads with continuations. Well, JavaScript doesn't have continuations either, but you can implement continuations with exceptions. So, Volta transforms the CIL to CPS and then implements CPS with exceptions. (Newer versions of JavaScript have semi-coroutines in the form of generators; those could also be used, but Volta is intended to work across a wide range of JavaScript versions, including obviously JScript in Internet Explorer.)
This seems a little bizarre. If you're using the term "prior art" in its most common form, you're discussing a potentially patentable idea. If that is the case, you have:
1/ Published the idea, starting the clock running on patent filing - I'm assuming, perhaps incorrectly, that you're based in the U.S. Other jurisdictions may have other rules.
2/ Told the entire planet your idea, which means it's pretty much useless to try and patent it, unless you act very fast.
If you're not thinking about patenting this and were just using the term "prior art" in a laypersons sense, I apologize. I work for a company that takes patents very seriously and it's drilled into us, in great detail, what we're allowed to do with information before filing.
Having said that, patentable ideas must be novel, useful and non-obvious. I would think that your idea would not pass on the third of these since you're describing a language translator which would have the prior art of the many pascal-to-c and fortran-to-c converters out there.
The one glimmer of hope would be the ability of your idea to generate one of multiple output languages (which p2c and f2c don't do) but I think even that would be covered by the likes of cross compilers (such as gcc) which turn source into one of many different object languages.
IBM has a product called Visual Age Generator in which you code in one (proprietary) language and it's converted into COBOL/C/Java/others to run on different target platforms from PCs to the big honkin' System z mainframes, so there's your first problem (thinking about patenting an idea that IBM, the biggest patenter in the world, is already using).
Tons of them. p2c, f2c, and the original implementation s of C++ and Objective C strike me immediately. Beyond that, it's kind of hard to distinguish what you're describing from any compiler, especially for us old guys whose compilers generated ASM code for an intermediate represetation anyway.