On K.I.S.S and paving cowpaths [closed]


Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm currently developing a PHP application that uses an Access database as its backend. Not by choice, you understand: the database is what the client originally used, and using it is part of the requirements.
One of the problems with this database is that the column names follow the most insane naming convention you could possibly imagine: uppercase, lowercase, underscores, spaces, and the plain insane. For example, the column "gender" holds a date, and so does the column "User2". There are a lot more, but you get the idea.
Faced with this I decided to create an array to map the database columns to PHP variables so we can isolate the code from the madness. However my colleague believes that I'm over-complicating things and we should use the database's column names for the corresponding PHP variables so we don't need to go through the mapping array to find what goes where.
So my question is this... am I doing the right thing or am I complicating things?

Absolutely you are on the right track. If you don't abstract away the madness you will eventually succumb to the madness yourself.
Your colleague has a valid point, though, so I suggest you also code an easy way to look up the data-to-column mapping in PHP.
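A minimal sketch of such a lookup, written in Java (the language of the code later on this page) rather than the asker's PHP: the mapping lives in one place and can be queried in both directions, so nobody has to hunt for "what goes where". The column names "gender" and "User2" come from the question; the class `ColumnMap` and the logical names `birthDate` and `registeredOn` are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Single source of truth: logical field name <-> legacy Access column name.
// Logical names are hypothetical; the column names come from the question.
final class ColumnMap {
    private static final Map<String, String> FIELD_TO_COLUMN = new LinkedHashMap<>();
    private static final Map<String, String> COLUMN_TO_FIELD = new LinkedHashMap<>();

    static {
        register("birthDate", "gender");    // "gender" actually holds a date
        register("registeredOn", "User2");  // so does "User2"
    }

    private ColumnMap() {}

    private static void register(String field, String column) {
        FIELD_TO_COLUMN.put(field, column);
        COLUMN_TO_FIELD.put(column, field);
    }

    // Which legacy column backs this field?
    static String columnFor(String field) {
        String column = FIELD_TO_COLUMN.get(field);
        if (column == null) {
            throw new IllegalArgumentException("Unmapped field: " + field);
        }
        return column;
    }

    // Which field does this legacy column feed?
    static String fieldFor(String column) {
        String field = COLUMN_TO_FIELD.get(column);
        if (field == null) {
            throw new IllegalArgumentException("Unmapped column: " + column);
        }
        return field;
    }

    // Read-only view, handy for printing the whole mapping while debugging.
    static Map<String, String> asMap() {
        return Collections.unmodifiableMap(FIELD_TO_COLUMN);
    }
}
```

For example, `ColumnMap.columnFor("birthDate")` gives `"gender"`, and `ColumnMap.fieldFor("User2")` gives `"registeredOn"`.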
This isn't about keeping it simple, it's about retrofitting a solid foundation to build upon.
The thing that would worry me is that this kind of random design often hides certain business rules, things like "...if the gender is a date, then they must have purchased a widget at some point, therefore they can't be allowed to fribbish the lubdub...". Crazy, I know, but more common than it should be.

Names are exceptionally important. If you want your application to be maintainable, fix them before the code base grows further.

I wouldn't say you are complicating things.
Eric Evans's book Domain-Driven Design has a lovely term for this: Anti-Corruption Layer.
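In its simplest form, an anti-corruption layer is a translator at the boundary: legacy rows go in, clean domain objects come out, and nothing past the boundary ever sees the legacy names. A minimal Java sketch, assuming each row arrives as a `Map<String, Object>` keyed by column name with values already converted to Java types; `Customer`, `LegacyCustomerTranslator`, and the column name "Cust Name" are hypothetical.

```java
import java.time.LocalDate;
import java.util.Map;

// Clean domain type used by the rest of the application.
record Customer(String name, LocalDate birthDate) {}

// The only class that knows the legacy column names.
final class LegacyCustomerTranslator {
    private static final String NAME_COLUMN = "Cust Name";   // hypothetical
    private static final String BIRTH_DATE_COLUMN = "gender"; // holds a date

    Customer toDomain(Map<String, Object> legacyRow) {
        String name = (String) legacyRow.get(NAME_COLUMN);
        LocalDate birthDate = (LocalDate) legacyRow.get(BIRTH_DATE_COLUMN);
        if (name == null || birthDate == null) {
            throw new IllegalArgumentException("Incomplete legacy row: " + legacyRow);
        }
        // Tidy-up of legacy data also happens here, not in business code.
        return new Customer(name.trim(), birthDate);
    }
}
```

Any hidden business rules discovered in the legacy data can be made explicit inside the translator instead of leaking into the rest of the code.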

To play devil's advocate, there's something to be said for not adding an unnecessary layer of indirection to your short-term memory load when working with the system. Once you are familiar with the code, you will know what goes in which variable, so the main benefit is to someone new picking up the code from scratch. However, fixing that problem properly would also require fixing the database schema, which would (a) be a significant body of work, and (b) largely make the problem go away.
There is no black-and-white answer to this question, and the lack of an obvious answer to your specific problem suggests that you may want to let sleeping dogs lie.
On the other hand, if a cleanup operation is within the bounds of possibility then you may want to do it on a re-factoring type basis, incrementally fixing up the DB column names as the opportunity arises.

Just create views where it is most needed.

This is a good question, as it goes to the heart of coding, IMHO.
I would go with you and abstract the bad names into readable, decent names. The cost is a little extra complexity; the payoff is much more logical, understandable and readable code.

You didn't say you can't rename the columns in Access, so... do that! Another possibility would be to create views for each table, and rename the columns in the view. Then, instead of working with table Employees, you work with view vEmployees. If I recall correctly, Access lets you update views as well as select from them. If you are using an ORM with PHP, however, it may not support updating views.

Hard-coding table names and column names is never a good idea, even when the names make sense.
I don't know if using arrays is the best solution, though. I'm not really familiar with PHP, but I would have gone with something like constant strings to store the table names. In the languages I work in, this would lead to more readable code.
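In Java, the constant-strings idea might look like the sketch below: one holder class per table, so every query refers to a named constant instead of a raw string. The class `Users`, its table and column names (apart from "gender"), and `selectBirthDates` are hypothetical; the square brackets are how Access SQL quotes identifiers that contain spaces.

```java
// Hypothetical holder for one legacy table's names.
final class Users {
    private Users() {} // constants only, never instantiated

    static final String TABLE = "tbl Users";
    static final String ID = "User_ID";
    static final String BIRTH_DATE = "gender"; // legacy name, holds a date

    // Access SQL quotes identifiers with square brackets.
    static String quote(String identifier) {
        return "[" + identifier + "]";
    }

    // Queries are built from the constants, never from string literals.
    static String selectBirthDates() {
        return "SELECT " + quote(ID) + ", " + quote(BIRTH_DATE)
                + " FROM " + quote(TABLE);
    }
}
```

`Users.selectBirthDates()` produces `SELECT [User_ID], [gender] FROM [tbl Users]`; renaming a column later means changing one constant.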

You are very unlucky to be stuck with this database but I think on the whole a way of abstracting the field names into something more sensible is smarter.
I would perhaps create a data structure containing the database name, the sanitised name, the type, and a field for the content when you pull the data out of the DB. That would give you a convenient way of drawing things together, so you're not only mapping away the crazy naming scheme.
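One way to read that suggestion, sketched in Java: a small descriptor per column (database name, sanitised name, type) that produces a value object when a row is read. `FieldSpec`, `FieldValue`, `RowReader`, and the example column names are hypothetical, and rows are again assumed to be `Map<String, Object>` keyed by column name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// One column's value: where it lives, what we call it, its type, its content.
record FieldValue<T>(String dbName, String cleanName, Class<T> type, T value) {}

// Describes a column before any data has been read.
record FieldSpec<T>(String dbName, String cleanName, Class<T> type) {
    FieldValue<T> read(Map<String, Object> row) {
        // Class.cast fails fast if the column holds an unexpected type.
        return new FieldValue<>(dbName, cleanName, type, type.cast(row.get(dbName)));
    }
}

final class RowReader {
    private RowReader() {}

    // Reads every described column from one row, in the order given.
    static List<FieldValue<?>> readAll(List<FieldSpec<?>> specs, Map<String, Object> row) {
        List<FieldValue<?>> values = new ArrayList<>();
        for (FieldSpec<?> spec : specs) {
            values.add(spec.read(row));
        }
        return values;
    }
}
```

Keeping the type in the descriptor means a column that suddenly holds something unexpected (a date in "gender", say) fails at read time with a `ClassCastException` rather than further downstream.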

Absolutely, you're doing the right thing. In my opinion, it's better to implement some sanity there. Going forward, your logic wouldn't be thrown away if they decided to change that database or any of its column names. If you build your mapping the right way, it should be easy to just plug the new tables/columns right in.
If anything, what you're doing improves the agility of your overall solution.
Of course I would still say KISS applies to the method of your mapping!

Using proper column names on your end of the application is the best you can do. And you should do it, unless you want to have to look up "what was that field supposed to be again?" when you come back to it after working on something else.
Your colleague's point is not to overcomplicate things. That's valid, too.
So encapsulate access to the fields in a method or methods, and have that method do the translation. Using maps, this shouldn't be a performance problem.
In fact, putting all the mapping to the data source in one object might help you if your customer decides to move to a real database. And customers love to change their minds.

Why not create a data layer with classes that map onto each table? Then you can define class methods to access the columns and give the methods whatever names you want. The data layer's database access code is then the only thing that needs to know about the real column names. I suspect that someone (perhaps several someones) has already developed a framework to do this. Google "php orm".

Use an ORM; you will be changing the DB soon...

You still need to maintain the database. One possible approach is to map field names in application code, as you plan to do. But sooner or later you will have to start dealing with this naming madness in the field names themselves and fix it. It is not a good idea to just hide the problem and pretend that this is a safe solution and a good way to go. It is only a temporary workaround. Do not fool yourself about it.

Related

Downside of having string properties in service contracts that can contain a full json model

We are working with a DDD framework in our company. We are changing a lot of core things in our API because we are still growing and are still in our infancy when it comes to designing a good API.
The problem is that there are a lot of flows already in the same API, and they are not compatible with each other.
We have an order service and a product service.
Normally, when the product model changes radically, it has a major impact on the order model.
Now, I'm listing all kinds of red flags that should never happen, but I simply don't have control over how it needs to be done. That is mostly management pushing for a fast solution, which leads to bad shortcuts...
The way it was decided to avoid Order having to adapt constantly: they made a property on the order line called productConfiguration. It is part of the service contract and is stored as-is in the DB tables. It contains the product model, which can change, in JSON format.
To me it's very clear that this is very dangerous, because in the end you need to turn this JSON into an actual object. So you just move the restrictions from the service contract into code logic, which makes it worse, because problems will only show up at run time...
Are there other major issues I should know about, so I can bring them to the table to avoid this way of working?

Storing strings that are converted directly into DB tables is not bad design just in your opinion. It's an opinion shared by a lot of us.
What do you do when an object changes? For example, the new one requires an attribute that the old one didn't have. How do you manage this situation? I suppose that you have to change everything, including the objects stored before. Or build a kind of transformation layer where you translate objects from the old design to the new one. A lot of extra work.
Anyway, given that the two domains are separate, what information changes so much that it requires such a design? I mean, for most things you could know from the beginning what you need for your part of the domain. For the rest, I would prefer a kind of service that, given an ID, gives you the information from the other domain. You can change this service (here it could also be a JSON object, if nothing more than displaying it is required) and adapt it to your/their needs. But it's just a solution that comes from my limited knowledge of your processes.
Other ways are also possible, as long as you can always tell which version of the design you are using.
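If the JSON column has to stay, the versioning point can be made concrete: store a version number with every productConfiguration and upgrade old payloads explicitly at one boundary, so a model change fails loudly in one place rather than deep inside order processing. A minimal Java sketch, assuming the JSON has already been parsed into a `Map<String, Object>` (no particular JSON library implied); `ProductConfigV2`, `ProductConfigReader`, the field names, and the v1-to-v2 change are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Typed view of the current (version 2) product configuration.
record ProductConfigV2(String sku, String color, int quantity) {}

final class ProductConfigReader {
    static final int CURRENT_VERSION = 2;

    private ProductConfigReader() {}

    // Hypothetical model change: v1 had no "quantity"; v2 requires it.
    private static Map<String, Object> upgradeV1ToV2(Map<String, Object> v1) {
        Map<String, Object> v2 = new HashMap<>(v1);
        v2.putIfAbsent("quantity", 1); // assumed default for old orders
        v2.put("version", 2);
        return v2;
    }

    // Single entry point: upgrade old payloads, then validate everything.
    static ProductConfigV2 read(Map<String, Object> parsedJson) {
        Map<String, Object> data = parsedJson;
        int version = versionOf(data);
        if (version == 1) {
            data = upgradeV1ToV2(data);
            version = versionOf(data);
        }
        if (version != CURRENT_VERSION) {
            throw new IllegalStateException(
                    "Unsupported productConfiguration version: " + version);
        }
        return new ProductConfigV2(
                requireString(data, "sku"),
                requireString(data, "color"),
                requireInt(data, "quantity"));
    }

    // Payloads written before versioning existed are treated as version 1.
    private static int versionOf(Map<String, Object> data) {
        return data.get("version") == null ? 1 : requireInt(data, "version");
    }

    private static String requireString(Map<String, Object> data, String key) {
        if (data.get(key) instanceof String s) {
            return s;
        }
        throw new IllegalStateException("Missing or invalid field: " + key);
    }

    private static int requireInt(Map<String, Object> data, String key) {
        if (data.get(key) instanceof Number n) {
            return n.intValue();
        }
        throw new IllegalStateException("Missing or invalid field: " + key);
    }
}
```

This does not remove the run-time risk the question describes. It only concentrates it: every incompatible change needs an explicit upgrade step, and a payload nobody planned for fails immediately with a descriptive message.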

Passing my own project on someone else - what to do? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
Often a project is passed on to someone else, and often this process is unpleasant for both sides: the new owner complains about horrible documentation, bugs and bad design. The original owner is then bothered for months with questions about the project, requests to fix old bugs, etc.
I might soon be in a situation where one of my projects will be given to someone else so I can focus on my other projects. I wonder what I should do to make this transfer as smooth as possible. What I already have is decent documentation, the code is quite well commented, and I'm still improving it. It's a medium-sized project: not very large, but still not something you can code in a week.
I'm looking for a list of things that should be done to help the future owner take over the project and that, at the same time, will spare me all those annoying questions like "and what does this function do, what purpose does this class have...". I know documentation is a must; what else?
Note: although my project is in C++, I believe this is a language-agnostic question. If there are things you think are specific to some language, please mention them too.

Documentation is one thing, getting it into the head of your new project owner another. IMHO this is a typical situation where "less is more" - the less documentation your colleague has to read to understand something, the better. And, of course, learning takes time - for both of you, accept it.
So
instead of writing lots of documentation, make your code self-documenting
have all documents / source code etc. in a clean and well-named folder structure
make sure your build process is almost completely automatic
don't forget to document your deployment process, if it is not automatic, too
clean up, clean up, clean up!

When taking over a project, documentation is of course desirable, but even more so is a good test suite. Trying to modify a program that you have no means of testing for correctness is a nightmare.
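One cheap way to start such a suite on inherited code is a characterization test: pin down what the code does today, right or wrong, so that any later change in behaviour is noticed. A minimal sketch in plain Java with no test framework; `Legacy.legacyDiscount` stands in for some hypothetical inherited routine, and the expected values are simply whatever it currently returns.

```java
// Hypothetical inherited routine whose exact rules nobody documented.
final class Legacy {
    static int legacyDiscount(int orderTotal) {
        if (orderTotal >= 100) {
            return orderTotal / 10;
        }
        return orderTotal >= 50 ? 5 : 0;
    }
}

// Characterization tests: record current behaviour, including the edges.
final class LegacyDiscountCharacterization {
    static void check(int input, int expected) {
        int actual = Legacy.legacyDiscount(input);
        if (actual != expected) {
            throw new AssertionError("legacyDiscount(" + input + ") = " + actual
                    + ", previously " + expected);
        }
    }

    public static void main(String[] args) {
        check(0, 0);
        check(49, 0);
        check(50, 5);
        check(99, 5);
        check(100, 10);
        check(250, 25);
        System.out.println("all characterization checks passed");
    }
}
```

Once these pass, the new owner can refactor with some confidence, and any check that starts failing marks a behaviour change worth asking about.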

Documentation, but on all levels:
API docs
High level architecture: What components are there, what are their relationships and dependencies
For each component, a high level description pointing to important code sections
Tutorials: If you want to do X, here's how
Data: What data does it use and how, database schemas
Idioms: If you've created some idioms within your code, explain them
And, to start, give the guy a personal introduction to all of the above in person, hopefully doing some needed change in a pair programming way

the new owner complains about horrible documentation, bugs and bad design.
I suspect that no matter what you do, the new owner will always complain about something. People are different, so something that looks easy to understand to you will look horrible and extremely complicated to someone else.
The original owner is then bothered for months with questions about the project, requests to fix old bugs etc.
In this case you should clearly refuse to help. If you don't refuse, you'll probably end up doing someone else's job for free. If maintaining the project is no longer your job, then the new guy should fix his problems without your help. If "the new guy" can't deal with that, he isn't suitable for the job and should quit.
It's a medium-sized project,
"Medium-sized" compared to what? How many lines of code, how many files, how many megabytes of code?
I wonder what I should do to make this transfer as smooth as possible. What I already have is decent documentation, the code is quite well commented, and I'm still improving it.
I would handle it like this:
1. First, do a sweep through the entire code base and:
1.1 Remove all commented-out blocks of code.
1.2 Remove all unused routines and classes (I'm talking about "forgotten" routines, not parts of a utility library).
1.3 Make sure all code follows consistent formatting rules, i.e. you shouldn't mix class_a, ClassA and CClassA in the same app, you shouldn't use different styles for placing braces, etc.
1.4 Make sure that all names (class, variable, function) are self-explanatory. Your code should be as self-explanatory as possible; this will save you from writing too much documentation.
1.5 Where a function is complicated or hard to understand, write comments. Keep them as short as possible, and add them only when they are absolutely necessary.
1.6 Try to make sure that there are no known bugs left. If there are known bugs, document them and their behavior.
1.7 Remove garbage from the project directories (files that are not used in the project, etc.).
1.8 If possible, make sure that the code still compiles and works as expected.
2. Generate HTML documentation with Doxygen. Review it a few times, and adjust the code comments until you're satisfied, or at least somewhat satisfied, with the result. Do not skip this step.
3. If there is a version control repository (say, a git repository) with the entire development history, hand it over to the new maintainer, or give them a functional copy of the repository. This will be useful for bisecting (e.g. with git bisect) and finding the source of bugs.
4. Once this is done and the code has been transferred to the new maintainer, do not offer "free help" unless you're paid for it (or unless you get something else for helping, or unless it is an order from your boss, which makes helping the new maintainer part of your current task). Maintaining the code is no longer your job, and if the new maintainer can't handle it, they aren't qualified for the job.

I think most of the problems can be avoided with just two simple rules.
Keep the code consistent with the platform's style guide.
Naming, naming and naming.
If the project is huge, then you just need to run some code camps with the new guys. There's no shortcut for this one.
Remember also that complaints happen mostly because the new guy is not qualified enough, i.e. doesn't understand something. That's why it is important to keep things simple. And in case he is more qualified, then I guess you deserve it ;)
Some good advice on where to start hacking/changing things is always better than documentation. Consider documentation as backup material once you are familiar with the code; it should never be the starting point (unless you are an exceptional technical writer with unlimited resources and time).

If there is good documentation and commented code as you say, then you've done your part. Just make sure that the documentation includes high-level documentation (architecture, data flow, etc.) as well as lower module or procedure-level documentation.
If this is a situation where you can, I would strongly suggest you protect yourself with some type of contract that specifies what future support (if any) you will provide and for how long.

I think for a situation like this, the most important thing is a working, complete build that automatically compiles, documents, and tests the project. That way, there is a well-defined point at which the new developer has it working. He can then figure things out from the tests and documentation, in principle.

What programming practice that you once liked have you since changed your mind about? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion.
Closed 11 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
As we program, we all develop practices and patterns that we use and rely on. However, over time, as our understanding, maturity, and even technology usage changes, we come to realize that some practices that we once thought were great are not (or no longer apply).
An example of a practice I once used quite often, but have in recent years changed, is the use of the Singleton object pattern.
Through my own experience and long debates with colleagues, I've come to realize that singletons are not always desirable: they can make testing more difficult (by inhibiting techniques like mocking) and can create undesirable coupling between parts of a system. Instead, I now use object factories (typically with an IoC container) that hide the nature and existence of singletons from parts of the system that don't care or need to know. Those parts rely on a factory (or service locator) to acquire access to such objects.
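A minimal Java sketch of that shift: rather than a class reaching for a global `getInstance()` itself, it receives its dependency through the constructor, and a factory (or an IoC container doing the same job) alone decides that the dependency is shared. `TimeSource`, `SystemTimeSource`, `ReportService`, and `ServiceFactory` are hypothetical names; no particular container is implied.

```java
import java.time.LocalDate;

// The dependency is described by an interface, not by a singleton class.
interface TimeSource {
    LocalDate today();
}

final class SystemTimeSource implements TimeSource {
    @Override
    public LocalDate today() {
        return LocalDate.now();
    }
}

// The consumer neither knows nor cares whether its TimeSource is shared.
final class ReportService {
    private final TimeSource time;

    ReportService(TimeSource time) {
        this.time = time;
    }

    String title() {
        return "Report for " + time.today();
    }
}

// The factory is the only place that knows there is exactly one TimeSource.
final class ServiceFactory {
    private static final TimeSource SHARED_TIME = new SystemTimeSource();

    private ServiceFactory() {}

    static ReportService reportService() {
        return new ReportService(SHARED_TIME);
    }
}
```

In a test, the singleton disappears entirely: `new ReportService(() -> LocalDate.of(2020, 1, 31))` gives a fixed, mock-like clock, and `title()` returns `"Report for 2020-01-31"`.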
My questions to the community, in the spirit of self-improvement, are:
What programming patterns or practices have you reconsidered recently, and now try to avoid?
What did you decide to replace them with?

//Coming out of university, we were taught to ensure we always had an abundance
//of commenting around our code. But applying that to the real world, made it
//clear that over-commenting not only has the potential to confuse/complicate
//things but can make the code hard to follow. Now I spend more time on
//improving the simplicity and readability of the code and inserting fewer yet
//relevant comments, instead of spending that time writing overly-descriptive
//commentaries all throughout the code.

Single return points.
I once preferred a single return point for each method, because with that I could ensure that any cleanup needed by the routine was not overlooked.
Since then, I've moved to much smaller routines - so the likelihood of overlooking cleanup is reduced and in fact the need for cleanup is reduced - and find that early returns reduce the apparent complexity (the nesting level) of the code. Artifacts of the single return point - keeping "result" variables around, keeping flag variables, conditional clauses for not-already-done situations - make the code appear much more complex than it actually is, make it harder to read and maintain. Early exits, and smaller methods, are the way to go.
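A small Java illustration of the difference, using a hypothetical shipping-cost rule (with `-1` as an error sentinel, purely for brevity): the first version keeps a `result` variable and nests to reach its single exit; the second returns as soon as the answer is known. Both compute exactly the same values.

```java
final class Shipping {
    private Shipping() {}

    // Single return point: a result variable and nested conditions.
    static int costSingleExit(Integer weightKg, boolean express) {
        int result = -1;
        if (weightKg != null) {
            if (weightKg > 0) {
                if (express) {
                    result = 20 + weightKg * 2;
                } else {
                    result = 5 + weightKg;
                }
            }
        }
        return result;
    }

    // Early exits: a guard clause first, then the main path, no nesting.
    static int costEarlyExit(Integer weightKg, boolean express) {
        if (weightKg == null || weightKg <= 0) {
            return -1; // invalid input
        }
        if (express) {
            return 20 + weightKg * 2;
        }
        return 5 + weightKg;
    }
}
```

For example, `Shipping.costEarlyExit(3, true)` returns `26` and `Shipping.costEarlyExit(3, false)` returns `8`, matching the single-exit version.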

Trying to code things perfectly on the first try.
Trying to create perfect OO model before coding.
Designing everything for flexibility and future improvements.
In one word overengineering.

Hungarian notation (both Forms and Systems).
I used to prefix everything. strSomeString or txtFoo.
Now I use someString and textBoxFoo. It's far more readable and easier for someone new to come along and pick up. As an added bonus, it's trivial to keep it consistent: camelCase the control and append a useful/descriptive name. Forms Hungarian has the drawback of not always being consistent, and Systems Hungarian doesn't really gain you much. Chunking all your variables together isn't really that useful, especially with modern IDEs.

The "perfect" architecture
I came up with THE architecture a couple of years ago. Pushed myself technically as far as I could so there were 100% loosely coupled layers, extensive use of delegates, and lightweight objects. It was technical heaven.
And it was crap. The technical purity of the architecture just slowed my dev team down as they aimed for perfection over results, and I almost achieved complete failure.
We now have a much simpler, less technically perfect architecture, and our delivery rate has skyrocketed.

The use of caffeine. It once kept me awake and in a glorious programming mood, where the code flew from my fingers with feverish fluidity. Now it does nothing, and if I don't have it I get a headache.

Commenting out code. I used to think that code was precious and that you can't just delete those beautiful gems that you crafted. I now delete any commented-out code I come across unless there's a TODO or NOTE attached because it's too perilous to leave it in. To wit, I've come across old classes with huge commented-out portions and it really confused me why they were there: were they recently commented out? is this a dev environment change? why does it do this unrelated block?
Seriously consider not commenting out code and just deleting it instead. If you need it, it's still in source control. YAGNI though.

The overuse / abuse of #region directives. It's just a little thing, but in C#, I previously would use #region directives all over the place, to organize my classes. For example, I'd group all class properties together in a region.
Now I look back at old code and mostly just get annoyed by them. I don't think it really makes things clearer most of the time, and sometimes they just plain slow you down.
So I have now changed my mind and feel that well laid out classes are mostly cleaner without region directives.

Waterfall development in general, and in specific, the practice of writing complete and comprehensive functional and design specifications that are somehow expected to be canonical and then expecting an implementation of those to be correct and acceptable. I've seen it replaced with Scrum, and good riddance to it, I say. The simple fact is that the changing nature of customer needs and desires makes any fixed specification effectively useless; the only way to really properly approach the problem is with an iterative approach. Not that Scrum is a silver bullet, of course; I've seen it misused and abused many, many times. But it beats waterfall.

Never crashing.
It seems like such a good idea, doesn't it? Users don't like programs that crash, so let's write programs that don't crash, and users should like the program, right? That's how I started out.
Nowadays, I'm more inclined to think that if it doesn't work, it shouldn't pretend it's working. Fail as soon as you can, with a good error message. If you don't, your program is going to crash even harder just a few instructions later, but with some nondescript null-pointer error that'll take you an hour to debug.
My favorite "don't crash" pattern is this:
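// The anti-pattern: swallow every failure and hand back a placeholder object.
public User readUserFromDb(int id) {
    User u = null;
    try (PreparedStatement ps = connection.prepareStatement(
            "SELECT * FROM user WHERE id = ?")) {
        ps.setInt(1, id);
        try (ResultSet rs = ps.executeQuery()) {
            if (rs.next()) {
                u = new User();
                u.setFirstName(rs.getString("fname"));
                u.setSurname(rs.getString("sname"));
                // etc
            }
        }
    } catch (Exception e) {
        log.info(e); // logged quietly, otherwise ignored
    }
    if (u == null) {
        // A missing row and a database error now look exactly the same.
        u = new User();
        u.setFirstName("error communicating with database");
        u.setSurname("error communicating with database");
        // etc
    }
    u.setId(id);
    return u;
}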
Now, instead of asking your users to copy/paste the error message and send it to you, you have to dive into the logs trying to find the log entry. (And since they entered an invalid user ID, there'll be no log entry.)
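For contrast, a minimal fail-fast sketch. To stay self-contained it reads from a hypothetical in-memory `Map` instead of a real database, and it uses a small immutable `User` record rather than the mutable `User` class above. The point is only the shape: a missing user produces a specific, descriptive exception instead of a placeholder object. In a real data layer, a database failure would likewise be allowed to propagate (or be wrapped) rather than logged and swallowed. `UserRepository` and `UserNotFoundException` are hypothetical names.

```java
import java.util.Map;
import java.util.Objects;

record User(int id, String firstName, String surname) {}

// Thrown when the lookup itself worked but found nothing.
final class UserNotFoundException extends RuntimeException {
    UserNotFoundException(int id) {
        super("No user with id " + id);
    }
}

final class UserRepository {
    private final Map<Integer, User> store; // stand-in for the real database

    UserRepository(Map<Integer, User> store) {
        this.store = Objects.requireNonNull(store, "store must not be null");
    }

    User readUser(int id) {
        User user = store.get(id);
        if (user == null) {
            // Fail immediately, with a message that names the bad input.
            throw new UserNotFoundException(id);
        }
        return user;
    }
}
```

The caller (or a top-level error handler) now sees "No user with id 42" right where the problem happened, instead of a user called "error communicating with database" several screens later.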

I thought it made sense to apply design patterns whenever I recognised them.
Little did I know that I was actually copying styles from foreign programming languages, while the language I was working with allowed for far more elegant or easier solutions.
Using multiple (very) different languages opened my eyes and made me realise that I don't have to mis-apply other people's solutions to problems that aren't mine. Now I shudder when I see the factory pattern applied in a language like Ruby.

Obsessive testing. I used to be a rabid proponent of test-first development. For some projects it makes a lot of sense, but I've come to realize that for many projects it is not only infeasible but actually detrimental to slavishly adhere to a doctrine of writing unit tests for every single piece of functionality.
Really, slavishly adhering to anything can be detrimental.

This is a small thing, but: Caring about where the braces go (on the same line or next line?), suggested maximum line lengths of code, naming conventions for variables, and other elements of style. I've found that everyone seems to care more about this than I do, so I just go with the flow of whoever I'm working with nowadays.
Edit: The exception to this being, of course, when I'm the one who cares the most (or is the one in a position to set the style for a group). In that case, I do what I want!
(Note that this is not the same as having no consistent style. I think a consistent style in a codebase is very important for readability.)

Perhaps the most important "programming practice" I have since changed my mind about, is the idea that my code is better than everyone else's. This is common for programmers (especially newbies).

Utility libraries. I used to carry around an assembly with a variety of helper methods and classes with the theory that I could use them somewhere else someday.
In reality, I just created a huge namespace with a lot of poorly organized bits of functionality.
Now, I just leave them in the project I created them in. In all probability I'm not going to need it, and if I do, I can always refactor them into something reusable later. Sometimes I will flag them with a //TODO for possible extraction into a common assembly.

Designing more than I coded.
After a while, it turns into analysis paralysis.

The use of a DataSet to perform business logic. This binds the code too tightly to the database; also, the DataSet is usually created from SQL, which makes things even more fragile. If the SQL or the database changes, the change tends to trickle through to everything the DataSet touches.
Performing any business logic inside an object constructor. Inheritance and the ability to create overloaded constructors tend to make such code difficult to maintain.
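A minimal Java sketch of one common alternative: keep the constructor to plain field assignment and put the work (here, a hypothetical invoice-total rule) in a named static factory method. `Invoice`, `fromLines`, `computeTotal`, and the amounts-in-cents representation are assumptions for illustration.

```java
import java.util.List;

final class Invoice {
    private final List<Integer> lineAmountsCents;
    private final int totalCents;

    // The constructor only assigns fields; no business logic lives here.
    private Invoice(List<Integer> lineAmountsCents, int totalCents) {
        this.lineAmountsCents = List.copyOf(lineAmountsCents);
        this.totalCents = totalCents;
    }

    // The business rule lives in a named, separately testable method.
    static int computeTotal(List<Integer> lineAmountsCents) {
        int total = 0;
        for (int amount : lineAmountsCents) {
            if (amount < 0) {
                throw new IllegalArgumentException("Negative line amount: " + amount);
            }
            total += amount;
        }
        return total;
    }

    // Factory method: does the work, then calls the trivial constructor.
    static Invoice fromLines(List<Integer> lineAmountsCents) {
        return new Invoice(lineAmountsCents, computeTotal(lineAmountsCents));
    }

    int totalCents() {
        return totalCents;
    }

    List<Integer> lineAmountsCents() {
        return lineAmountsCents;
    }
}
```

Further ways of building an invoice can be added as more factory methods reusing `computeTotal`, instead of a chain of overloaded constructors that each run hidden logic.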

Abbreviating variable/method/table/... names
I used to do this all of the time, even when working in languages with no enforced limits on the length of names (well, they were probably 255 or something). One of the side effects was a lot of comments littered throughout the code explaining the (non-standard) abbreviations. And of course, if the names were changed for any reason...
Now I much prefer to call things what they really are, with good descriptive names, using only standard abbreviations. There's no need for useless comments, and the code is far more readable and understandable.

Wrapping existing Data Access components, like the Enterprise Library, with a custom layer of helper methods.
It doesn't make anybody's life easier
It's more code that can have bugs in it
A lot of people know how to use the EntLib data access components. No one but the local team knows how to use the in house data access solution

I first heard about object-oriented programming while reading about Smalltalk in 1984, but I didn't have access to an o-o language until I used the cfront C++ compiler in 1992. I finally got to use Smalltalk in 1995. I had eagerly anticipated o-o technology, and bought into the idea that it would save software development.
Now, I just see o-o as one technique that has some advantages, but it's just one tool in the toolbox. I do most of my work in Python, and I often write standalone functions that are not class members, and I often collect groups of data in tuples or lists where in the past I would have created a class. I still create classes when the data structure is complicated, or I need behavior associated with the data, but I tend to resist it.
I'm actually interested in doing some work in Clojure when I get the time, which doesn't provide o-o facilities, although it can use Java objects if I understand correctly. I'm not ready to say anything like o-o is dead, but personally I'm not the fan I used to be.

In C#, using _notation for private members. I now think it's ugly.
I then changed to this.notation for private members, but found I was inconsistent in using it, so I dropped that too.

I stopped going by the university recommended method of design before implementation. Working in a chaotic and complex system has forced me to change attitude.
Of course I still do code research, especially when I'm about to touch code I've never touched before, but normally I try to focus on implementations as small as possible to get something going first. This is the primary goal. Then I gradually refine the logic and let the design emerge by itself. Programming is an iterative process and works very well with an agile approach and lots of refactoring.
The code will not look at all like what you first thought it would. Happens every time :)

I used to be big into design-by-contract. This meant putting a lot of error checking at the beginning of all my functions. Contracts are still important from the perspective of separation of concerns, but rather than trying to enforce what my code shouldn't do, I try to use unit tests to verify what it does do.

I used statics in a lot of methods/classes because it was more concise. When I started writing tests, that practice changed very quickly.

Checked Exceptions
An amazing idea on paper - defines the contract clearly, no room for mistake or forgetting to check for some exception condition. I was sold when I first heard about it.
Of course, it turned out to be such a mess in practice, to the point that some libraries today, like Spring JDBC, have hiding legacy checked exceptions as one of their main features.
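Spring's approach is to translate the checked `SQLException` into its own unchecked `DataAccessException` hierarchy. A minimal Java sketch of the same idea without Spring; `DataAccessFailure`, `SqlCall`, and `Jdbc.unchecked` are hypothetical, and only `java.sql.SQLException` is a real JDK type here.

```java
import java.sql.SQLException;

// Hypothetical unchecked wrapper, in the spirit of Spring's DataAccessException.
final class DataAccessFailure extends RuntimeException {
    DataAccessFailure(String message, Throwable cause) {
        super(message, cause);
    }
}

// A JDBC-style operation that is allowed to throw the checked SQLException.
@FunctionalInterface
interface SqlCall<T> {
    T run() throws SQLException;
}

final class Jdbc {
    private Jdbc() {}

    // Callers no longer need throws clauses or empty catch blocks,
    // but the original exception is preserved as the cause.
    static <T> T unchecked(String description, SqlCall<T> call) {
        try {
            return call.run();
        } catch (SQLException e) {
            throw new DataAccessFailure(description + " failed: " + e.getMessage(), e);
        }
    }
}
```

The trade-off is the one this answer describes: the compiler no longer forces every caller to handle the failure, so handling moves to wherever it can actually be done meaningfully.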

That anything worthwhile was only coded in one particular language. In my case I believed that C was the best language ever and I never had any reason to code anything in any other language... ever.
I have since come to appreciate many different languages and the benefits/functionality they offer. If I want to code something small - quickly - I would use Python. If I want to work on a large project I would code in C++ or C#. If I want to develop a brain tumour I would code in Perl.

When I needed to do some refactoring, I thought it was faster and cleaner to start straightaway and implement the new design, fixing up the connections until they work. Then I realized it's better to do a series of small refactorings to slowly but reliably progress towards the new design.

Perhaps the biggest thing that has changed in my coding practices, as well as in others', is the acceptance of outside classes and libraries downloaded from the internet as the basis for behaviors and functionality in applications. When I attended college, we were encouraged to figure out how to make things better via our own code and to rely on the language to solve our problems. With the advances in all aspects of user interfaces and service/data consumption, this is no longer a realistic notion.
There are certain things which will never change in a language, and having a library that wraps this code in a simpler transaction and in fewer lines of code that I have to write is a blessing. Connecting to a database will always be the same. Selecting an element within the DOM will not change. Sending an email via a server-side script will never change. Having to write this time and again wastes time that I could be using to improve my core logic in the application.

Initializing all class members.
I used to explicitly initialize every class member with something, usually NULL. I have come to realize that this:
normally means that every variable is initialized twice before ever being read
is silly, because many languages (Java and C#, for example) automatically initialize class members to null or another default value
incurs a slight performance hit in most languages
can bloat code on larger projects

Like you, I have also embraced IoC patterns to reduce coupling between the various components of my apps. It makes maintenance and swapping parts much simpler, as long as I can keep each component as independent as possible. I'm also using more object-relational mapping (ORM) frameworks such as NHibernate to simplify database management chores.
In a nutshell, I'm using "mini" frameworks to aid in building software more quickly and efficiently. These mini-frameworks save lots of time, and if done right can make an application super simple to maintain down the road. Plug 'n Play for the win!
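To make the coupling point concrete, here is a minimal constructor-injection sketch in Python. The names (`Notifier`, `EmailNotifier`, `FakeNotifier`, `OrderService`) are hypothetical; the only idea shown is that a component receives its collaborator from outside instead of constructing it, so the part can be swapped without touching the component.

```python
from typing import Protocol


class Notifier(Protocol):
    """Anything with a send(recipient, message) method can be plugged in."""

    def send(self, recipient: str, message: str) -> None: ...


class EmailNotifier:
    """Hypothetical production implementation (sending is stubbed out here)."""

    def send(self, recipient: str, message: str) -> None:
        print(f"email to {recipient}: {message}")


class FakeNotifier:
    """Test double that records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, recipient: str, message: str) -> None:
        self.sent.append((recipient, message))


class OrderService:
    # The dependency arrives through the constructor; OrderService never
    # decides which Notifier implementation it talks to.
    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier

    def place_order(self, customer: str, item: str) -> None:
        # ... persist the order, etc. ...
        self._notifier.send(customer, f"Order received: {item}")


# Swapping parts is a one-line change where the objects are wired together.
fake = FakeNotifier()
service = OrderService(fake)
service.place_order("alice@example.com", "widget")
```

An IoC container automates the wiring step, but the decoupling itself comes from the constructor parameter.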

How to partition a problem into smaller understandable portions?

I'm not sure it's possible to give general advice on this topic, but please try. My case is too complex to explain here, and that is exactly the problem.
I seem to constantly stumble on a situation where I try to design some part of my project, but it has so many things to take into consideration that I'm unable to get a grasp of it.
Are there any general tips or advice on how to look at my system in smaller pieces at a time? How to find smaller portions that could be designed separately on their own?

Create a glossary.
In other words, identify the terms that are meaningful to the project domain — not from the programmer's point of view, but from a user's, who is familiar with the subject matter.
Then define the terms as precisely and discretely as you can. A good definition in this form can serve as a kind of pseudocode.
Since you have not identified even the domain of your problem, I'll choose a random example. In a civilian personnel system, you might have terms like:
billet: a term of service (from start date to end date) at a particular grade and step
employee: a series of billets associated with a particular SSN
grade and step: row and column in the federal general schedule
And so on. This isn't to identify functional units, as it sounds like you are trying to do, but it's a good preparatory step before doing so, so that you can express your functional steps in well-defined terms.
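As an illustration of definitions doubling as pseudocode, the three example terms translate almost directly into types. This is a hypothetical Python sketch; the field names and the `grade_and_step_on` helper are assumptions, not part of any real personnel system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class GradeStep:
    """grade and step: row and column in the federal general schedule."""
    grade: int
    step: int


@dataclass(frozen=True)
class Billet:
    """billet: a term of service (start to end date) at a grade and step."""
    start: date
    end: date
    grade_step: GradeStep

    def covers(self, day: date) -> bool:
        return self.start <= day <= self.end


@dataclass
class Employee:
    """employee: a series of billets associated with a particular SSN."""
    ssn: str
    billets: List[Billet] = field(default_factory=list)

    def grade_and_step_on(self, day: date) -> Optional[GradeStep]:
        # Hypothetical query phrased entirely in glossary terms.
        for billet in self.billets:
            if billet.covers(day):
                return billet.grade_step
        return None


emp = Employee("000-00-0000", [Billet(date(2020, 1, 1), date(2020, 12, 31), GradeStep(7, 3))])
print(emp.grade_and_step_on(date(2020, 6, 1)))  # GradeStep(grade=7, step=3)
```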

Your key goals are:
High cohesion: Code (methods, fields, classes) within one piece/module/partition should interact intensively; it should make sense for these elements to know about each other. If you find that some of them don't interact much with the rest, they probably belong somewhere else or should form their own partition. If you find code outside interacting intensively with the partition and knowing too much about its inner workings, it probably belongs inside. The typical example is OO code written in a procedural style, with "dumb" data objects and "manager" code that operates on them but should really be part of the data objects.
Loose coupling: Interaction between pieces/modules/partitions should only happen through narrow, well-defined, well-documented APIs. Try to identify such APIs and see what code is needed to implement them and what code will use them.
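The "dumb data object plus manager" pattern from the cohesion point looks roughly like this in Python; the `Account` example and its methods are hypothetical. The second version moves the rule next to the data it protects, so other code only sees a narrow API (`deposit`, `withdraw`, `balance`).

```python
# Before: a "dumb" data object and a manager that knows its internals.
class AccountData:
    def __init__(self) -> None:
        self.balance = 0


class AccountManager:
    def withdraw(self, account: AccountData, amount: int) -> None:
        if amount > account.balance:
            raise ValueError("insufficient funds")
        account.balance -= amount


# After: the behaviour lives with the data (high cohesion), and callers
# interact only through a small public API (loose coupling).
class Account:
    def __init__(self) -> None:
        self._balance = 0

    @property
    def balance(self) -> int:
        return self._balance

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def withdraw(self, amount: int) -> None:
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount
```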

It's useful to approach problem decomposition both top-down and bottom-up.
If you're having trouble splitting a big problem into two or more smaller problems, try to think of the smallest possible problems that will need to be solved. Once those are handled, you may start to see ways to combine them into larger problems as you approach your original large problem.

When I find myself copying and pasting chunks of code with minimal adjustments I realize that's a "partition" and then create a class, method, function, or whatever.
Actually, that is what the whole object-oriented approach is about. Try thinking of your application as tangible things that do stuff. Write pseudocode describing what the things are and what they do; I find lots of "partitions" this way.
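A small, hypothetical Python example of the copy-and-paste signal: two report functions differ only in the field they total, so the shared shape becomes one helper with the difference as a parameter. The asserts at the end check that the extraction preserved behaviour.

```python
# Before: the same loop pasted twice with one word changed.
def total_sales(rows):
    total = 0
    for row in rows:
        if row["status"] == "paid":
            total += row["sales"]
    return total


def total_tax(rows):
    total = 0
    for row in rows:
        if row["status"] == "paid":
            total += row["tax"]
    return total


# After: the duplicated shape is extracted; the varying part is a parameter.
def total_paid(rows, field):
    """Sum `field` over rows whose status is 'paid'."""
    return sum(row[field] for row in rows if row["status"] == "paid")


rows = [
    {"status": "paid", "sales": 100, "tax": 8},
    {"status": "void", "sales": 50, "tax": 4},
    {"status": "paid", "sales": 20, "tax": 2},
]
assert total_paid(rows, "sales") == total_sales(rows) == 120
assert total_paid(rows, "tax") == total_tax(rows) == 10
```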

Here's a try, kind of wild guess.
People usually underestimate how long the work will take them. If your project is large, you'll most likely need several people to work on it, so try planning with that in mind. A person can be expected to hold only one area in their head, so you'll need to explain to each person exactly what kind of task they're supposed to do.
So I'd suggest writing a job description that covers as much as one person can seriously concentrate on. Repeat until you have broken your project into the parts you want. As a bonus, you're then ready to assemble your team. And if the parts turn out to be small, you may still be able to do it all yourself.

What's the best way to become familiar with a large codebase? [closed]

Closed 10 years ago.
Joining an existing team with a large codebase already in place can be daunting. What's the best approach?
Broad: try to get a general overview of how everything links together, from the code
Narrow: focus on small sections of code at a time, understanding how they work fully
Pick a feature to develop and learn as you go along
Try to gain insight from class diagrams and UML, if available (and up to date)
Something else entirely?
I'm working on what is currently an approx 20k line C++ app & library (Edit: small in the grand scheme of things!). In industry I imagine you'd get an introduction by an experienced programmer. However if this is not the case, what can you do to start adding value as quickly as possible?
--
Summary of answers:
Step through code in debug mode to see how it works
Pair up with someone more familiar with the code base than you, taking turns to be the person coding and the person watching/discussing. Rotate partners amongst team members so knowledge gets spread around.
Write unit tests. Start with an assertion of how you think the code will work. If it turns out as you expected, you've probably understood the code. If not, you've got a puzzle to solve and/or an enquiry to make. (Thanks Donal, this is a great answer)
Go through existing unit tests for functional code, in a similar fashion to above
Read UML, Doxygen generated class diagrams and other documentation to get a broad feel of the code.
Make small edits or bug fixes, then gradually build up
Keep notes, and don't jump in and start developing; it's more valuable to spend time understanding than to generate messy or inappropriate code.
This post is a partial duplicate of "The best way to familiarize yourself with an inherited codebase".

Start with some small task if possible, debug the code around your problem.
Stepping through code in debug mode is the easiest way to learn how something works.

Another option is to write tests for the features you're interested in. Setting up the test harness is a good way of establishing what dependencies the system has and where its state resides. Each test starts with an assertion about the way you think the system should work. If it turns out to work that way, you've achieved something and you've got some working sample code to reproduce it. If it doesn't work that way, you've got a puzzle to solve and a line of enquiry to follow.
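A minimal sketch of that approach with Python's `unittest`. `legacy_discount` is a hypothetical stand-in for existing code you don't yet understand, and each test method records one belief about it. A passing test becomes working sample code; a failing one is the puzzle described above.

```python
import unittest


def legacy_discount(total, is_member):
    """Stand-in for existing code whose behaviour you are learning."""
    if is_member and total >= 100:
        return round(total * 0.9, 2)
    return total


class WhatIThinkItDoes(unittest.TestCase):
    def test_members_get_ten_percent_off_at_100_or_more(self):
        self.assertEqual(legacy_discount(200, True), 180.0)

    def test_non_members_pay_full_price(self):
        self.assertEqual(legacy_discount(200, False), 200)

    def test_threshold_is_inclusive(self):
        # If this fails, the boundary is not where you assumed:
        # something to investigate or to ask the team about.
        self.assertEqual(legacy_discount(100, True), 90.0)
```

Run it with `python -m unittest <file>`; setting up even this much forces you to find out what the code under test needs in order to run.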

One thing that I usually suggest to people that has not yet been mentioned is that it is important to become a competent user of the existing code base before you can be a developer. When new developers come into our large software project, I suggest that they spend time becoming expert users before diving in trying to work on the code.
Maybe that's obvious, but I have seen a lot of people try to jump into the code too quickly because they are eager to start making progress.

This is quite dependent on what sort of learner and what sort of programmer you are, but:
Broad first - you need an idea of scope and size. This might include skimming docs/uml if they're good. If it's a long term project and you're going to need a full understanding of everything, I might actually read the docs properly. Again, if they're good.
Narrow - pick something manageable and try to understand it. Get a "taste" for the code.
Pick a feature - possibly a different one to the one you just looked at if you're feeling confident, and start making some small changes.
Iterate - assess how well things have gone and see if you could benefit from repeating an early step in more depth.

Pairing with strict rotation.
If possible, while going through the documentation and codebase, try pairing with strict rotation. That is, two of you sit together for a fixed period of time (say, a 2-hour session); then you switch pairs: one person continues working on that task while the other moves to another task with another partner.
In pairs you'll both pick up a piece of knowledge, which can then be passed on to other team members when the rotation occurs. Another benefit is that when a new pair is brought together, the one who worked on the task (in this case, investigating the code) can summarise and explain the concepts in a more easily understood way. Over time everyone should reach a similar level of understanding, which hopefully avoids the "Oh, only John knows that bit of the code" syndrome.
From what I can tell about your scenario, you have a good number for this (3 pairs), however, if you're distributed, or not working to the same timescale, it's unlikely to be possible.
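One way to model that rotation, as a hypothetical Python sketch. It assumes (the answer doesn't say) that the person who has been on a task longer is the one who moves on, while the newer arrival stays and briefs the next partner. Six people in three pairs mirror the "3 pairs" mentioned; the names are placeholders.

```python
def rotate(pairs):
    """One rotation step.

    pairs[i] is (anchor, visitor) for task i. Afterwards the old visitor
    stays on task i as its anchor, and the old anchor of task i-1 arrives
    as the new visitor, so the longest-serving person always moves on.
    """
    return [(pairs[i][1], pairs[i - 1][0]) for i in range(len(pairs))]


def schedule(pairs, sessions):
    """Return the pairings for the given number of sessions."""
    plan = [pairs]
    for _ in range(sessions - 1):
        plan.append(rotate(plan[-1]))
    return plan


start = [("Ann", "Bob"), ("Cat", "Dan"), ("Eve", "Fay")]
for session, pairs in enumerate(schedule(start, 4), 1):
    print(session, pairs)
```

Because everyone spends exactly two sessions on a task before moving on, knowledge travels around the team at a steady rate.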

I would suggest running Doxygen on it to get an up-to-date class diagram, then taking a broad look for a while. This gives you a quick big picture that you can refer to as you get up close and dirty with the code.

I agree that it depends entirely on what type of learner you are. Having said that, I've been at two companies which had very large code-bases to begin with. Typically, I work like this:
If possible, before looking at any of the functional code, I go through unit tests that are already written. These can generally help out quite a lot. If they aren't available, then I do the following.
First, I largely ignore implementation and look only at header files, or just the class interfaces. I try to get an idea of what the purpose of each class is. Second, I go one level deep into the implementation starting with what seems to be the area of most importance. This is hard to gauge, so occasionally I just start at the top and work my way down in the file list. I call this breadth-first learning. After this initial step, I generally go depth-wise through the rest of the code. The initial breadth-first look helps to solidify/fix any ideas I got from the interface level, and then the depth-wise look shows me the patterns that have been used to implement the system, as well as the different design ideas. By depth-first, I mean you basically step through the program using the debugger, stepping into each function to see how it works, and so on. This obviously isn't possible with really large systems, but 20k LOC is not that many. :)
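The "interfaces first" step has a rough equivalent when the code is Python rather than C++: list each class's public methods and their signatures without reading any bodies. This is a hypothetical helper built on the standard `inspect` module; for the C++ app in the question, the header files themselves serve the same purpose.

```python
import inspect


def interface_summary(cls):
    """Return 'name(signature)' strings for a class's public methods."""
    lines = []
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if not name.startswith("_"):
            lines.append(f"{name}{inspect.signature(member)}")
    return lines


def module_outline(module):
    """Map each class defined in `module` to its public interface."""
    return {
        name: interface_summary(cls)
        for name, cls in inspect.getmembers(module, inspect.isclass)
        if cls.__module__ == module.__name__  # skip classes imported from elsewhere
    }


class Greeter:
    def greet(self, name: str) -> str:
        return f"hi {name}"

    def _helper(self):
        pass


print(interface_summary(Greeter))  # ['greet(self, name: str) -> str']
```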

Work with another programmer who is more familiar with the system to develop a new feature or to fix a bug. This is the method that I've seen work out the best.

I think you need to tie this to a particular task. When you have time on your hands, go for whichever approach you are in the mood for.
When you have something that needs to get done, give yourself a narrow focus and get it done.

Get the team to put you on bug fixing for two weeks (if you have two weeks). They'll be happy to get someone to take responsibility for that, and by the end of the period you will have spent so much time problem-solving with the library that you'll probably know it pretty well.

If it has unit tests (I'm betting it doesn't), start small and make sure the unit tests don't fail. If you stare at the entire codebase at once, your eyes will glaze over and you will feel overwhelmed.
If there are no unit tests, you need to focus on the feature that you want. Run the app and look at the results of things that your feature should affect. Then start looking through the code trying to figure out how the app creates the things you want to change. Finally change it and check that the results come out the way you want.
You mentioned it is an app and a library. First change the app and stick to using the library as a user. Then after you learn the library it will be easier to change.
From a top-down perspective, the app probably has a main loop or a main GUI that controls all the action, and it is worth reading the code to get a broad overview of that main control flow. If it is a GUI app, sketch on paper which screens there are and how to get from one screen to another. If it is a command-line app, work out how the processing is done.
Even in companies this approach is not unusual. Often no one fully understands how an application works, and people don't have time to show you around. They prefer specific questions about specific things, so you have to dig in and experiment on your own. Then, once you have a specific question, you can try to find the person who knows that piece of the application and ask them.

Start by understanding the 'problem domain' (is it a payroll system? inventory? real time control or whatever). If you don't understand the jargon the users use, you'll never understand the code.
Then look at the object model; there might already be a diagram, or you might have to reverse engineer one (either manually or with a tool, as suggested by Doug). At this stage you could also investigate the database (if any); it should follow the object model, but it may not, and that's important to know.
Have a look at the change history or bug database. If an area comes up a lot, look into that bit first. This doesn't mean it's badly written, just that it's the bit everyone uses.
Lastly, keep some notes (I prefer a wiki).
The existing guys can use it to sanity check your assumptions and help you out.
You will need to refer back to it later.
The next new guy on the team will really thank you.
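For the change-history tip, if the project happens to use Git (an assumption; the answer doesn't say), a rough hot-spot list is simply how often each file appears across commits. A hypothetical Python sketch; `git log --name-only --pretty=format:` prints only the changed paths of each commit.

```python
import subprocess
from collections import Counter


def count_changes(log_text):
    """Count how often each path appears in `git log --name-only` output."""
    paths = (line.strip() for line in log_text.splitlines())
    return Counter(p for p in paths if p)  # blank lines separate commits


def hot_spots(repo_dir=".", top=10):
    """Return the `top` most frequently changed files in a Git repository."""
    log = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return count_changes(log).most_common(top)
```

Usage: `for path, n in hot_spots("path/to/repo"): print(n, path)`. As noted above, a frequently changed file is not necessarily badly written; it is where the activity is.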

I had a similar situation. I'd say you go like this:
If it's a database-driven application, start from the database and try to make sense of each table, its fields, and its relations to the other tables.
Once you're comfortable with the underlying store, move up to the ORM layer. Those tables must have some kind of representation in code.
Once done with that, move on to where these objects come from and how. Through an interface? Which interface? Are there any validations? What preprocessing happens to them before they go to the datastore?
This would familiarize you better with the system. Remember that trying to write or understand unit tests is only possible when you know very well what is being tested and why it needs to be tested in only that way.
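For the database-first route, the schema itself can be queried. A minimal sketch using Python's built-in `sqlite3`, assuming a SQLite database purely for illustration (other engines expose similar information through their own catalog views). `PRAGMA table_info` lists a table's columns and `PRAGMA foreign_key_list` lists its references to other tables; the `customer`/`orders` tables are made up.

```python
import sqlite3


def describe_schema(conn):
    """Return {table: {"columns": [...], "references": [...]}} for a SQLite DB."""
    schema = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(r[1], r[2]) for r in conn.execute(f'PRAGMA table_info("{table}")')]
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        refs = [(r[3], r[2], r[4]) for r in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        schema[table] = {"columns": columns, "references": refs}
    return schema


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(id));
""")
for table, info in describe_schema(conn).items():
    print(table, info)
```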
For a large application that is not database-driven, I'd recommend a different approach:
What is the main goal of the system?
What major components does the system use to achieve it?
How do the components interact with each other? Make a graph that depicts component dependencies, and ask someone already working on the system. The components must be exchanging something with each other, so try to figure that out as well (for example, the IO component might return a File object to the GUI).
Once you're comfortable with this, dive into the component that depends least on the others. Study how that component is further divided into classes and how they interact with each other. This way you get a handle on one complete component.
Move on to the next least dependent component.
Finally, move to the core component, which typically depends on many of the components you've already tackled.
While looking at the core component, you might find yourself referring back to the components you examined earlier, so don't worry; keep working hard!
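The "least dependent component first, core last" order is a topological sort of the dependency graph. A sketch with Python's standard `graphlib` (3.9+), using word-processor components like those in the example that follows; the specific dependency edges are assumed for illustration.

```python
from graphlib import TopologicalSorter

# component -> the components it depends on (assumed edges, for illustration)
dependencies = {
    "IO": set(),
    "Page": {"IO"},
    "UI": {"Page", "IO"},
    "Core": {"UI", "Page", "IO"},
}

# static_order() yields each component only after everything it depends on,
# so studying in this order starts with the least dependent component.
study_order = list(TopologicalSorter(dependencies).static_order())
print(study_order)  # ['IO', 'Page', 'UI', 'Core']
```

If the graph has a cycle, `static_order()` raises `graphlib.CycleError`, which is itself useful information: those components probably have to be studied together.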
For the first strategy:
Take this Stack Overflow site, for instance. Examine the datastore: what is being stored, how it is stored, what representations those items have in the code, and how and where they are presented in the UI. Where do they come from, and what processing happens to them on their way back to the datastore?
For the second one
Take a word processor, for example. What components are there? IO, UI, Page, and so on. How do they interact with each other? Move along as you learn more.
Be relaxed. Written code is someone's mindset, their frozen logic and thinking style, and it takes time to read that mind.

First, if you have team members available who have experience with the code you should arrange for them to do an overview of the code with you. Each team member should provide you with information on their area of expertise. It is usually valuable to get multiple people explaining things, because some will be better at explaining than others and some will have a better understanding than others.
Then, you need to start reading the code for a while without any pressure (a couple of days or a week if your boss will provide that). It often helps to compile/build the project yourself and be able to run the project in debug mode so you can step through the code. Then, start getting your feet wet, fixing small bugs and making small enhancements. You will hopefully soon be ready for a medium-sized project, and later, a big project. Continue to lean on your team-mates as you go - often you can find one in particular who is willing to mentor you.
Don't be too hard on yourself if you struggle; that's normal. It can take a long time, maybe years, to understand a large codebase. In fact, even after years there are often parts of the code that are still a bit scary and opaque. When you get downtime between projects you can dig into those areas, and you'll often find that after a few tries you can figure even those parts out.
Good luck!

You may want to consider looking at source code reverse engineering tools. There are two tools that I know of:
SWAG Kit (Linux only)
Bauhaus (academic and commercial versions)
Both tools offer similar feature sets that include static analysis that produces graphs of the relations between modules in the software.
This mostly consists of call graphs and type/class dependencies. Viewing this information should give you a good picture of how the parts of the code relate to one another. Using it, you can dig into the actual source for the parts that you are most interested in and need to understand or modify first.
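Those tools target C/C++ and are far more thorough, but the core idea, extracting a module dependency graph by static analysis, can be sketched in a few lines for Python sources with the standard `ast` module. This is an illustrative, hypothetical helper that only looks at `import` statements, not at calls.

```python
import ast
from pathlib import Path


def imports_of(source):
    """Return the set of top-level module names imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            found.add(node.module.split(".")[0])
    return found


def import_graph(root):
    """Map each .py file under `root` (by file stem) to the modules it imports.

    Files with the same name in different packages collide in this sketch.
    """
    return {
        path.stem: imports_of(path.read_text(encoding="utf-8"))
        for path in Path(root).rglob("*.py")
    }
```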

I find that just jumping into the code can be a bit overwhelming. Try to read as much of the design documentation as possible; it will hopefully explain the purpose and structure of each component. It's best if an existing developer can take you through it, but that isn't always possible.
Once you are comfortable with the high-level structure of the code, try to fix a bug or two. This will help you get to grips with the actual code.

I like all the answers that say you should use a tool like Doxygen to get a class diagram, and first try to understand the big picture. I totally agree with this.
That said, this largely depends on how well factored the code is to begin with. If it's a gigantic mess, it's going to be hard to learn. If it's clean and properly organized, it shouldn't be that bad.

See this answer on how to use test coverage tools to locate the code for a feature of interest, without knowing anything about where that feature is, or how it is spread across many modules.
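The coverage technique can be sketched as differential coverage: record which lines run during a scenario that exercises the feature, record the lines for a similar scenario that doesn't, and look at the difference. A minimal, hypothetical Python version using `sys.settrace`; real coverage tools such as coverage.py do this far more efficiently, and `checkout`/`apply_coupon` are stand-ins for application code.

```python
import sys


def lines_executed(func, *args):
    """Run func(*args) and return the set of (filename, line) pairs executed."""
    executed = set()

    def tracer(frame, event, arg):
        if event == "line":
            executed.add((frame.f_code.co_filename, frame.f_lineno))
        return tracer  # returning the tracer enables line events for this frame

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return executed


# Hypothetical application code standing in for the large codebase.
def checkout(cart, coupon=None):
    total = sum(cart)
    if coupon:
        total = apply_coupon(total, coupon)
    return total


def apply_coupon(total, coupon):
    return total - 5 if coupon == "FIVEOFF" else total


with_feature = lines_executed(checkout, [10, 20], "FIVEOFF")
without_feature = lines_executed(checkout, [10, 20])
feature_lines = sorted(line for _, line in with_feature - without_feature)
print(feature_lines)  # line numbers that only run when the coupon feature is used
```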

(shameless marketing ahead)
You should check out nWire. It is an Eclipse plugin for navigating and visualizing large codebases. Many of our customers use it to break-in new developers by printing out visualizations of the major flows.
