I read in The Mythical Man-Month that integration takes three times the amount of time it took to develop the individual components.
What have you experienced?
Yes and no. Integration effort is probably still 3x, but now it's amortized over the whole development process (e.g. early integration, integration tests (esp. in TDD), etc.)
We still have to do the work but it doesn't catch us by surprise anymore.
That was referring to the bad old days when they built all the software components separately and then tried to put them all together. Smart people don't work like that anymore - they integrate continuously.
I would agree, if not put it higher. Though it really depends on the integration touch points.
I was involved in a project to integrate a number of modules between Siebel and SAP. While both of these products have integration modules available, all the problems on the project (and there were many) were in the integration.
It wasn't helped by the fact that the majority of the SAP system we were using was in German, and the messages being transferred were in different XML encodings (UTF-8 / UTF-16).
Once we'd got to grips with the intricacies of what SAP wanted to send and receive, the whole project moved along much quicker.
Key things for a successful integration project:
Good documentation (in English!) on the integration modules
Good documentation on the message formats
Good project management
The project management bit is important as they supply the pizza, and do show some understanding when you have been working 30 hours straight to get an account name from one textbox on one machine to appear in another textbox on another machine.
Our project lasted over a year. The rest of the Siebel configuration that we did, which was a lot, took only a couple of months.
So integration: 10 months+; the rest of the configuration: 2 months.
Integration time depends on a few factors: the size of the project, communication between teams, and your integration philosophy.
Small projects take less time to integrate. Large, very large, or huge projects will take more time to integrate. I've been on small projects where integration was minimal. I've been on huge projects spread across multiple component teams where integration took a very long time.
Time to integrate also depends on how well your teams communicate with each other. If your teams are not communicating, it can take 3x or more time to integrate and work out all the related bugs.
Continuous integration helps with the perception that integration takes less time. With CI the time to integrate is amortized over the life of the project. But again, if you have a poor relationship with the other component teams, the total time across all integrations will still add up.
CI is definitely better than the alternative. Waiting until late in the development cycle to integrate is bound to cause you much pain. The earlier you begin the integration the more comfortable each team becomes with the process.
As a side note, there is an interesting talk by Juval Löwy on this. If you have many tiny components, you increase the effort to integrate. If you have only a few large components, you decrease the integration effort, but you have more complex components to maintain. So your effort to integrate depends on the architecture and where it balances the number of components against their complexity. A good balance is key, because if you drift too far to one side or the other, the effort required increases exponentially. If you have 20 components and add a 21st, it's not just 5% (1/20) more complex, because you have to consider the interactions with potentially all of the other 20 components. So adding one component adds the potential for 20 new interactions with the existing components. Of course a good design will limit this interaction to as few components as possible, but that was basically why Juval felt the growth was exponential.
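To put rough numbers on that argument, here is a quick back-of-the-envelope sketch (purely illustrative, not anything from Löwy's talk) of the worst case where every component can interact with every other one:

```python
# Worst-case pairwise interactions between n components.
# Purely illustrative; a good design limits each component to far fewer neighbours.
def pairwise_interactions(n: int) -> int:
    return n * (n - 1) // 2

for n in (5, 10, 20, 21):
    print(n, "components ->", pairwise_interactions(n), "potential interactions")

# 20 components -> 190 potential pairs; the 21st adds 20 more (210 total),
# i.e. roughly a 10% jump in integration surface for a 5% increase in components.
```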
Neil and Markus' answers are spot on. In Windows, we integrate continuously. Our source code control system (we call it "Source Depot") is hierarchical, with "WinMAIN" at the top and a tree of branches underneath it that generally corresponds to the organizational structure.
In general, changes flow up and down this tree in a process of "forward integrations" and "reverse integrations". This happens very regularly - almost every day for the top level branches under winmain, and a bit less often for the lower level branches. Each major branch builds fully every day in at least four flavors. The top level branches build in six flavors. A flavor is something like x86/fre, or x64/chk. We also get some daily pseudo localized builds as well.
By "build" I mean build a fully installable Windows client product from source. This happens several hundred times per day.
This works well for us - there are two important goals here:
maintaining good code flow (we call it velocity) up and down the tree. The idea is that any branch is never too different from WinMAIN.
Catching integration errors as early as possible.
Markus is very correct that this amortizes integration costs over the life of the project. For us, this makes the costs a LOT lower than they would be if we deferred them toward the ends of cycles. This is how things used to work. Not to air too much dirty laundry, but when I started in Windows about 5 1/2 years ago, getting a build out took a very, very long time. They now happen every day, like clockwork. The Windows Engineering Tools and Release group (WETR) gets all the credit for this.
As many others have suggested in various forums - regular integration and full daily automated builds are essential for most projects.
Note, daily and regular automated tests are a whole other topic (it's a massive effort, as you can imagine).
If you have an integration phase towards the end of your project on your project plan, you are doomed, and 3x is not too bad.
You should instead go for continuous integration, where you have a release, say every two weeks, with some integration before it.
I think I'm in somewhat of a unique situation: I have a decent amount of coding experience in C/C++ in a Linux environment. However, I don't really have "project experience." For example, I'm familiar with the concept of version control, but I've never used any. And I've never worked on a project with more than a half dozen source files.
So, where I am now is that I'm working on this project with a large amount of code already existing. I have to write all my code in a Windows environment using Visual Studio 2008 (Visual C++ to be specific). So I have a few questions:
How do I integrate the already existing code into VC++? I'm using TortoiseSVN and I have all the code on my machine...
Does anyone have any general advice on moving from small projects to larger projects?
Any help/advice would be greatly appreciated
Some of the keys to coming to grips with a large codebase:
Learn the ins and outs of source control. For now, start with just learning how all the SVN commands work. SVNBook is a good resource for this. Use VisualSVN or a similar plugin to interact with your repository within your IDE (Tortoise is still useful when you want to interact from elsewhere). Down the road, you'll want to become intimately familiar with branching and merging (and the tools for doing so quickly and correctly), and perhaps learn a DVCS (distributed version control system) like Git or Mercurial. This will at the very least expand your mind a bit, and likely teach you some lessons that will be useful even in projects where you use more traditional (centralized) version control.
Learn how to quickly look for things, find the declarations of unfamiliar classes and variables, and trace the execution of the larger application. There are lots of approaches to this, and you'll likely use most of them at one point or another: your IDE's built-in features (most IDEs are quite robust in this regard; you should be able to right-click on a class name and find its declaration easily), grep and the like (ack is a variant that is very well suited to code spelunking), and ctags if you swing that (C/C++) way.
Learn the basics of unit testing, maybe even try out a framework like NUnit. Personally I'm not big into unit tests, but I recognize their utility and many swear by them, so don't knock it 'til you've tried it, at least (there's a small sketch of what a test looks like just after this list).
Even if you're a flawless programmer with a robust battery of unit tests, large codebases demand a higher level of debugging skill, due to the inherent complexity of the problems that crop up. Learning how to write concise, descriptive debug printf()-like statements, becoming more familiar with your debugger, and even learning the ins and outs of your language (e.g. the corner cases of the type system/object model) can all be helpful in unwinding these complex issues.
Unfortunately, I haven't used Visual Studio, but I think getting to know your IDE's project import/migration flow will be instructive too. Maybe somebody else will chime in with more concrete advice on that front. The process can be onerous, especially if you had a non-standard custom build system before and you want everything to be done The One True Visual Studio Way henceforth, but the tools for automatic dependency extraction from code are getting better and better.
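As promised under the unit-testing point above, here is a minimal, hypothetical example of what a unit test looks like. It uses Python's built-in unittest module purely to show the shape; in the C++/.NET world you would reach for NUnit, Google Test, or similar, but the idea is identical. The function under test is made up for illustration.

```python
import unittest

# Hypothetical function under test: trims whitespace and collapses internal runs of spaces.
def normalize_account_name(raw: str) -> str:
    return " ".join(raw.split())

class NormalizeAccountNameTests(unittest.TestCase):
    def test_strips_leading_and_trailing_whitespace(self):
        self.assertEqual(normalize_account_name("  Acme Corp  "), "Acme Corp")

    def test_collapses_internal_whitespace(self):
        self.assertEqual(normalize_account_name("Acme\t  Corp"), "Acme Corp")

    def test_empty_input_yields_empty_string(self):
        self.assertEqual(normalize_account_name("   "), "")

if __name__ == "__main__":
    unittest.main()
```

The point is that each test documents one expectation, and the whole suite can be run automatically after every change.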
The ideas already given are very good. But you also might want to read Mike Gunderloy's Coder to Developer. From your description of your current experience, I think you'll find it useful. Also read The Pragmatic Programmer; I keep it in my office by my desk, and often find myself loaning it to younger developers.
Just dive in. Hopefully whoever has been working on this project so far organized the code into logical groups (namespaces, class hierarchies, folders).
I'll also second Matt J on learning how to use the IDE: I'm not familiar with Visual Studio specifically, but there should be contextual menu items when you click on a class or method to take you to the place where it was declared, and from there to the classes it was derived from.
Get version control set up first though: you'll feel more comfortable poking around once you learn how to "revert" ;)
I have been using VisualSVN for quite a while without problems. It integrates perfectly with vs2008. As for moving on to large projects, a great way to see how things are done is to download the source of a decent size existing project and see how it was put together. After you've got a good idea of how things are structured, the best thing you can do for yourself is to write code. Brainstorm a project and go at it. Depending on how you feel about your outcome after completion, you could use it as part of your portfolio as well.
Unit tests. Use them or you'll regret it.
Get to know Visual Studio well, if you live in it, you really have to know it well.
AnkSVN is a free plug-in for Visual Studio 2008, and it works very well.
Also, Refactor for C++ is another free plug-in and one of the only ways to get refactoring support in Visual C++.
Also, you will soon learn that on large projects, 80% of your time is going to be spent doing code maintenance, so do yourself a favour and make your code a place in which you want to live, not a place of terror that you shrink back from. Clean code, the occasional comment, and unit tests will go a long way to making you want to get up and go to work in the morning, rather than dreading that you have to work on that monstrosity where anytime you touch anything it breaks.
I have worked on single threaded business logic/back-end programming for most of my career. I now wish to learn web programming but would like to know how web programming is different from non-GUI programming (e.g. writing an API or a file processing application). I am not talking about the GUI design aspects (someone has already asked that question here) but more about programming complexity.
On the few occasions when I have worked on a web application, I felt that web applications are relatively more non-deterministic and unpredictable (for example, due to the event-driven, multi-threaded model of web applications, there are several permutations and combinations of events and actions one needs to take care of).
What would you say are some of the basic features of web programming that makes it different from non-GUI applications? What are the pitfalls/mistakes a back-end developer might commit while working on web applications?
EDIT
My definition of back-end programming means non-GUI applications like an API or a file-processing batch application that parses a large data file, reads the records, does a lot of number-crunching calculations on the data and spews out the results into another file or database. Another example could be a library of date and time utilities.
The biggest challenge with web programming is dealing with state. HTTP is a stateless protocol. This can make maintaining state more challenging than in a desktop application. Web applications tend to have a different life cycle due to this. Each web development platform deals with this somewhat differently, but they all need to deal with it in some way.
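To make the statelessness point concrete, here is a minimal sketch, assuming Flask purely for illustration (any web stack has an equivalent session mechanism). State that would just live in a member variable in a desktop app has to be round-tripped between requests; the route, counter, and secret key here are all hypothetical.

```python
from flask import Flask, session

app = Flask(__name__)
app.secret_key = "change-me"  # needed to sign the session cookie; placeholder value

# In a desktop app this counter could just be a field on a window object.
# On the web, every request arrives with no memory of the last one, so the
# value has to be stashed in a session (here, a signed cookie) between requests.
@app.route("/visit")
def visit():
    session["count"] = session.get("count", 0) + 1
    return f"You have visited {session['count']} times this session."

if __name__ == "__main__":
    app.run(debug=True)
```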
Web applications generally feel like single threaded applications, as you - the application developer - rarely create threads of your own. If anything, it's actually a lot easier, because the stateless nature of the web transactions means that you have to load the data for the page each time from the database. Therefore, you don't have to worry about concurrency, since 'whatever is there' is usually good enough.
The biggest problem with Web development is all of the background knowledge that you have to accumulate over time. How do you lay out web pages? How do you style things with CSS? How do you get parameters from the query string? How do you validate a field value in JavaScript? All of those things are actually really easy to learn, but there's just so many of them that it can be a real pain.
The biggest pitfall I've witnessed application developers fall into when moving to the web is not considering the costs of their code. Either they abuse MySQL to the point of bogging the RDBMS down, they write code that uses too much memory, or they make front-end pages that are too big to fit down dial-up/cell-phone or low-end broadband/DSL pipes.
Sometimes a heavy-duty page can't be avoided, but you can at least try to cache as much as possible. Too often, when writing a page that will be hit a lot, they make no effort to profile and optimize queries before it goes out the door.
It's not that these people are stupid; it's just a lack of experience and awareness that they need to play nice and write code that's somewhat lean.
Back-end programming is infinitely easier than web programming. (You have been warned!) Web programming is the easiest to show off to everybody.
Most web sites have a back end component as well. Typical structure will be something like:
UI - html/css/javascript
Controller - if using MVC
Business Logic/Services - this is backend
Database - this is also backend
So building web sites will still mean a lot of back end work. In regards to the UI, the main difference is that you will need to have a good eye for design and layout to do it well. The html/css technology is pretty simple in itself.
HTML was actually developed to deliver physics papers. You can still see it in some of the old meta tags. At any rate, the difference is that web programming is stateless and thick-client development is not.
As you have adeptly indicated, it's all driven by events. True, JavaScript has mucked up web development a bit by creating the illusion of a stateful environment, but in the end everything comes down to simple HTML.
It's never too late to start learning. I would say start making some static HTML pages and move your way up to an MVC framework; I suggest the Microsoft MVC Framework. It's pretty fantastic. There are others you could use, like ASP.NET WebForms, but you won't learn anything by dragging and dropping things onto a designer ;).
Web and GUI applications interface with humans; back-end applications interface with services and databases. As such, your specifications need to include significant consideration of your user's mental model - making things behave as people expect them to. And doing that - understanding how users think - is not always easy or logical. You may have elegant algorithmic solutions that simply fail to engage, because people don't always think logically. Many times, quite elegant UIs are extremely twisted coding-wise, which is very contrary to system-to-system programming.
Depending on problem-space, much of this can be more art than science.
One consideration (amongst many) with web programming is that users won't just be stupid (not that they all are, but you always have to factor that in), they will sometimes (assume always) be downright malicious and nasty, and will do everything in their power to destroy your application, your database, your weekends, your sanity...
Be as paranoid as a very small nun at a penguin shoot. Do not trust your users.
Another consideration is that Back End programming as per your definition is easier to test.
Once you begin web programming you're at the mercy of the various browsers' different interpretations of the same code. Plus the user, with inputs of mouse and keyboard, has a variety of ways to break what you produce.
Web programming isn't back-end programming. It shows stuff on the front end, the web.
Are you defining it otherwise?
EDIT
Web programming pulls you into presenting data consistently, visually, to everyone. Back end coding means constructing that data, in the same way for presentation, but not presenting it.
Based on your definition of "back end programming," your question applies not only to web applications, but to any GUI application.
It kind of depends what kind of GUI application we're talking about. For example:
Internal business applications tend to involve lots of business process workflow logic, record keeping, and interoperability between separate systems. No fancy algorithms or number crunching are needed. Your audience is limited, so performance is not a big deal, but cross-platform compatibility is important, so these tend to be web applications. Your main concerns are making it easy to tie business systems together, and keeping the API layered to ensure that the GUI code does not have to deal with any of the business logic code.
Public web sites (such as this one) tend to involve less of a formal architecture, and more of a mentality of "just get this cool feature to work so we can get more visitors." Again, no number crunching or algorithms unless performance is an issue. Performance is more of an issue for massively popular sites like Slashdot or Google, so if you anticipate rapid growth it pays to design for scalability in advance.
Public e-commerce web sites are kind of like both of the above: features and performance are important, but equally important is the structured architecture underneath it that ties all of the commerce business systems together (purchasing, supplier, shopping cart, payment gateways, etc.)
For the actual GUI portion, the complexity of the application kind of determines how complex the GUI code will be. For highly complex, nested GUIs where your requirements change often, it's easy to fall into the trap of putting too much GUI stuff into one page. Soon the page exceeds most people's complexity threshold, making the page very difficult to maintain. It pays to think in advance how you can separate different portions of the GUI into separate components, and then tie them together. If you're new to GUI programming, read some articles on the Model-View-Controller (MVC) pattern.
For simple web sites, where most pages are fairly static, this issue doesn't come up so much because each individual page is easy to maintain.
Most web programming is done in the style popular in the early seventies, before Dijkstra's 'goto considered harmful' was well-known.
I have been doing a little reading on Flow Based Programming over the last few days. There is a wiki which provides further detail. And wikipedia has a good overview on it too. My first thought was, "Great another proponent of lego-land pretend programming" - a concept harking back to the late 80's. But, as I read more, I must admit I have become intrigued.
Have you used FBP for a real project?
What is your opinion of FBP?
Does FBP have a future?
In some senses, it seems like the holy grail of reuse that our industry has pursued since the advent of procedural languages.
1. Have you used FBP for a real project?
We've designed and implemented a DF server for our automation project (dispatcher, component interface, a bunch of components, DF language, DF compiler, UI). It is written in bare C++, and runs on several Unix-like systems (Linux x86, MIPS, AVR32 etc., Mac OS X). It lacks several features, e.g. sophisticated flow control and complex thread control (there is only a not-too-advanced component for that), so it is just a prototype, even though it works. We're now working on a full-featured server. We learnt a lot during implementing and using the prototype.
Also, we'll make a visual editor some day.
2. What is your opinion of FBP?
2.1. First of all, dataflow programming is ultimate fun
When I met dataflow programming, I felt like I did 20 years ago, when I first met programming. Although DF programming differs from procedural/OOP programming, it's just a kind of programming. There are lots of things to discover, even sooo simple ones! It's very funny when, as an experienced programmer, you meet a DF problem which is a very, very basic thing, but it was completely unknown to you before. So if you jump into DF programming, you will feel like a rookie programmer who has just met the "cycle" or the "condition" for the first time.
2.2. It can be used only for specific architectures
It's just a hammer, which is for hammering nails. DF is not suitable for UIs, web servers and so on.
2.3. Dataflow architecture is optimal for some problems
A dataflow framework can do magic things. It can parallelize procedures which were not originally designed for parallelization. Components are single-threaded, but when they're organized into a DF graph, they become multi-threaded.
Example: did you know that make is a DF system? Try make -j (see the man page for what -j does). If you have a multi-core machine, compile your project with and without -j, and compare the times.
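A minimal toy sketch of that idea (mine, in Python, not the poster's C++ server): each component is an ordinary single-threaded loop, but once they are wired together with queues, the stages run concurrently, much like independent targets under make -j. All names here are made up for illustration.

```python
import threading, queue

def producer(out_q):
    for i in range(10):
        out_q.put(i)
    out_q.put(None)                      # end-of-stream marker

def doubler(in_q, out_q):
    while (item := in_q.get()) is not None:
        out_q.put(item * 2)              # single-threaded logic, no locks needed
    out_q.put(None)

def printer(in_q):
    while (item := in_q.get()) is not None:
        print(item)

# "Configuration": wire the components into a graph; the framework (here, plain
# threads and queues) runs each one concurrently.
a, b = queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=producer, args=(a,)),
    threading.Thread(target=doubler, args=(a, b)),
    threading.Thread(target=printer, args=(b,)),
]
for t in threads: t.start()
for t in threads: t.join()
```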
2.4. Optimal split of the problem
If you're writing a program, you often split the problem up into smaller sub-problems. There are usual split points for well-known sub-problems, which you don't need to implement; just use the existing solutions, like SQL for databases, or OpenGL for graphics/animation, etc.
DF architecture splits your problem in a very interesting way:
the dataflow framework, which provides the architecture (just use an existing one),
the components: the programmer creates components; the components are simple, well-separated units - it's easy to make components;
the configuration: a.k.a. dataflow programming: the configurator puts the dataflow graph (program) together using components provided by the programmer.
If your component set is well designed, the configurator can build systems the programmer has never even dreamed about. The configurator can implement new features without disturbing the programmer. Customers are happy because they get a personalised solution. The software manufacturer is also happy, because they don't need to maintain several customer-specific branches of the software, just customer-specific configurations.
2.5. Speed
If the system is built on native components, the DF program is fast. The only time loss, compared to a simple OOP program, is the message dispatching between components, and even that is minimal.
3. Does FBP have a future?
Yes, sure.
The main reason is that it can solve massive multiprocessing issues without introducing brand-new strange software architectures or weird languages. Dataflow programming is easy, and I mean both parts: component programming and dataflow configuration building. (Even dataflow framework writing is not rocket science.)
Also, it's very economical. If you have a good set of components, you only need to put the Lego bricks together. A DF program is easy to maintain. The DF config building requires no experienced programmer, just a system integrator.
I would be happy if native systems spread, with the door open for creating custom components. There should also be a standard DF language, so that it could be used with platform-independent visual editors and several DF servers.
Interesting discussion! It occurred to me yesterday that part of the confusion may be due to the fact that many different notations use directed arcs, but use them to mean different things. In FBP, the lines represent bounded buffers, across which travel streams of data packets. Since the components are typically long-running processes, streams may comprise huge numbers of packets, and FBP applications can run for very long periods - perhaps even "perpetually" (see a 2007 paper on a project called Eon, mostly by folks at UMass Amherst). Since a send to a bounded buffer suspends when the buffer is (temporarily) full, and a receive suspends when it is (temporarily) empty, indefinite amounts of data can be processed using finite resources.
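A toy sketch of that bounded-buffer behaviour (mine, not from the FBP literature): a fast producer feeding a slow consumer through a small queue simply blocks whenever the queue is full, so memory stays bounded no matter how long the stream runs.

```python
import threading, time, queue

buf = queue.Queue(maxsize=4)   # the bounded buffer between two processes

def fast_producer():
    for i in range(20):
        buf.put(i)             # blocks (suspends) whenever the buffer is full
        print(f"sent {i}")

def slow_consumer():
    for _ in range(20):
        time.sleep(0.05)       # pretend this stage is expensive
        print(f"    received {buf.get()}")

t1 = threading.Thread(target=fast_producer)
t2 = threading.Thread(target=slow_consumer)
t1.start(); t2.start()
t1.join(); t2.join()
```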
By comparison, the E in Grafcet comes from Etapes, meaning "steps", which is a rather different concept. In this kind of model (and there are a number of these out there), the data flowing between steps is either limited to what can be held in high-speed memory at one time, or has to be held on disk. FBP also supports loops in the network, which is hard to do in step-based systems - see for example http://www.jpaulmorrison.com/cgi-bin/wiki.pl?BrokerageApplication - notice that this application used both MQSeries and CORBA in a natural way. Furthermore, FBP is natively parallel, so it lends itself to programming of grid networks, multicore machines, and a number of the directions of modern computing. One last comment: in the literature I have found many related projects, but few of them have all the characteristics of FBP. A list that I have amassed over the years (a number of them closer than Grafcet) can be found in http://www.jpaulmorrison.com/cgi-bin/wiki.pl?FlowLikeProjects .
I do have to disagree with the comment about FBP being just a means of implementing FSMs: I think FSMs are neat, and I believe they have a definite role in building applications, but the core concept of FBP is of multiple component processes running asynchronously, communicating by means of streams of data chunks which run across what are now called bounded buffers. Yes, definitely FSMs are one way of building component processes, and in fact there is a whole chapter in my book on FBP devoted to this idea, and the related one of PDAs (1) - http://www.jpaulmorrison.com/fbp/compil.htm - but in my opinion an FSM implementing a non-trivial FBP network would be impossibly complex. As an example the diagram shown in
is about 1/3 of a single batch job running on a mainframe. Every one of those blocks is running asynchronously with all the others. By the way, I would be very interested to hearing more answers to the questions in the first post!
1: http://en.wikipedia.org/wiki/Pushdown_automaton Push-down automata
Whenever I hear the term flow-based programming I think of LabVIEW, conceptually, i.e. component processes whose scheduling is driven primarily by changes to their input data. This really IS Lego programming, in the sense that the LabVIEW platform was used for the latest crop of Mindstorms products. However, I disagree that this makes it a less useful programming model.
For industrial systems, which typically involve data collection, control, and automation, it fits very well. What is any control system if not data in transformed into data out? I.e., which component in your control scheme would you not prefer to represent as a black box in a bigger picture, if you could do so? To achieve that level of architectural clarity using other methodologies, you might have to draw a data domain class diagram, then a problem domain run-time class relationship, then on top of that a use case diagram, and flip back and forth between them. With flow-driven systems you have the luxury of being able to collapse a lot of this information together accurately enough that you can realistically design a system visually once the components are built and defined.
One question I never had to ask when looking at an application written in LabVIEW is "What piece of code set this value?", as it was inherent and easy to trace backwards from the data, and mistakes like multiple unintended writers were impossible to create by accident.
If only that was true of code written in a more typically procedural fashion!
1) I built a small FBP framework for an anomaly detection project, and it turned out to be a great idea.
You can also have a look at some of the KNIME videos, that give a good idea of what a flow based framework feels like when the framework is put together by a great team. Admittedly, it is batch based and not created for continuous operation.
By far the best example of flow-based programming, however, is UNIX pipes, one of the oldest and most overlooked FBP frameworks. I don't think I have to elaborate on the power of *nix pipes...
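For what it's worth, the same pull-style composition is easy to mimic in ordinary code. Here is a hedged sketch using Python generators, where each stage only pulls what it needs from the stage upstream, roughly in the spirit of cat | grep | head (the file name and search string are just placeholders):

```python
def read_lines(path):
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def grep(lines, needle):
    return (line for line in lines if needle in line)

def head(lines, n=10):
    for i, line in enumerate(lines):
        if i >= n:
            break
        yield line

# Equivalent in spirit to: cat app.log | grep ERROR | head -5
# ("app.log" and "ERROR" are placeholder names for the example.)
for line in head(grep(read_lines("app.log"), "ERROR"), n=5):
    print(line)
```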
2) FBP is a very powerful tool for a large set of problems. The intrinsic parallelism is a great advantage, and any FBP framework can be made completely network transparent by using adapter modules. Smart frameworks are also absurdly fault tolerant, and able to dynamically reload crashed modules when necessary. The conceptual simplicity also allows cleaner communication with everybody involved in a project, and much cleaner code.
3) Absolutely! Pipes are here to stay, and are one of the most powerful features of Unix. The advantages of an FBP framework compared to a static program are many, and they trivialise change, to the point where some frameworks can be reconfigured while running with no special measures.
FBP FTW! ;-)
In automotive development there is a language-agnostic messaging protocol which is part of the MOST specification (Media Oriented Systems Transport). It was designed for communication between components over a network or within the same device. Systems usually have both a real and a virtualized message bus, so you effectively have a form of flow-based programming.
That was what made the light bulb go on for me several years ago and brought me here. It really is a fantastic way to work and so much more fun than conventional programming. The message catalog forms the central specification and point of reference. It works well for both developers and management, i.e. management are able to browse the message catalog instead of looking at source.
With integrated logging also referencing the catalog to produce intelligible analysis, things can get really productive. I have real-world experience of developing commercial products in this way. I am interested in taking things further, particularly with regard to tools and IDEs. Unfortunately I think many people within the automotive sector have missed the point about how great this is and have failed to build on it. They are now distracted by other fads and have failed to realize that there was far more to MOST development than the physical bus.
I've used Spring Web Flow extensively in Java web applications to model (typically) application processes, which tend to be complex wizard-like affairs with lots of conditional logic as to which pages to display. It's incredibly powerful. A new product was added and I managed to recut the existing pieces into a completely new application process in an hour or two (adding a couple of new views/states).
I also looked into using OS Workflow to model business processes but that project got canned for various reasons.
In the Microsoft world you have Windows Workflow Foundation (WF), which is becoming more popular, particularly in conjunction with SharePoint.
FBP is just a means of implementing a finite state machine. It's nothing new.
I realize that it is not exactly the same thing, but this model has been used for years in PLC programming. ISO calls it Sequential Flow Chart, but many people call it Grafcet after a popular implementation. It offers parallel processing and defines transitions between states.
It's being used in the Business Intelligence world these days to mash up and process data. Data processing steps like ETL, querying, joining, and producing reports can be done by the end user. I'm a developer on an open system, ComposableAnalytics.com. In CA, the flow-based apps can be shared and executed via the browser.
This is what MQ Series, MSMQ and JMS are for.
This is cornerstone of Web Services and Enterprise Service Bus implementations.
Products like TIBCO and Sun's JCAPS are basically flow-based without using this particular buzz-word.
Most of the work of the application is done with small modules that pass messages through a processing network.
Why should I do Nightly Builds?
You should do nightly builds to ensure that your codebase stays healthy.
A side effect of doing nightly builds is that it forces the team to create and maintain a fully automated build script. This helps to ensure that your build process is documented and repeatable.
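As a hedged illustration of what "fully automated" can mean at its simplest, here is a toy nightly-build driver in Python. The svn/make commands are placeholders for whatever your real tooling is, and in practice this job would be kicked off by cron or a CI server rather than run by hand.

```python
import subprocess, sys, datetime

# Each step is a placeholder command; substitute your real update/build/test tooling.
STEPS = [
    ("update sources", ["svn", "update"]),
    ("build",          ["make", "all"]),
    ("run tests",      ["make", "test"]),
]

def run_nightly():
    print("Nightly build started", datetime.datetime.now().isoformat())
    for name, cmd in STEPS:
        print(f"--- {name}: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"FAILED at step '{name}' (exit code {result.returncode})")
            sys.exit(1)          # a failed step should notify the team somehow
    print("Nightly build succeeded")

if __name__ == "__main__":
    run_nightly()
```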
Automated builds are good at finding the following problems:
Somebody checked in something that breaks stuff.
Somebody forgot to check in a necessary file or change.
Your build scripts no longer work.
Your build machine is broken.
Doing this nightly ensures that you catch such problems within 24 hours of when they occur. That is preferable to finding all the problems 24 hours before you are supposed to deliver the software.
You should also, of course, have automated unit tests that are run for each nightly build.
I've personally found continuous integration to be better than nightly builds:
http://en.wikipedia.org/wiki/Continuous_integration
I even use it on one man projects, it's amazing how fast you can expose issues and take care of them right there.
I've been doing build engineering (among other things) for 16 years. I am a strong believer in build-early, build-often, continuous integration. So the first thing I do with a project is establish how it will be built (Java: Ant or Maven; .NET: NAnt or MSBuild) and how it will be managed (Subversion or some other version control). Then I'll add Continuous Integration (CruiseControl or CruiseControl.NET) depending upon the platform, then let the other developers loose.
As the project grows, and the need for reports and documentation grows, eventually the builds will take longer to run. At that point I'll split the builds into continuous builds (run on check-in) that only compile and run unit tests, and daily builds that build everything, run all the reports, and build any generated documentation. I may also add a delivery build that tags the repository and does any additional packaging for a customer delivery. I'll use fine-grained build targets to manage the details, so that any developer can build any part of the system -- the Continuous Integration server uses the exact same build steps as any developer. Most importantly, we never deliver a build for testing or a customer that wasn't built using the build server.
That's what I do -- here's why I do it (and why you should too):
Suppose you have a typical application, with multiple projects and several developers. While the developers may start with a common, consistent development environment (same OS, same patches, same tools, same compilers), over the course of time their environments will diverge. Some developers will religiously apply all security patches and upgrades, others won't. Some developers will add new (maybe better) tools, others won't. Some will remember to update their complete workspace before building; others will only update the part of the project they're developing. Some developers will add source code and data files to the project, but forget to add them to source control. Others will write unit tests that depend upon specific quirks of their environment. As a consequence, you'll quickly see the ever-popular "Well, it builds/works on my machine" excuses.
By having a separate, stable, consistent, known-good server for building your application, you'll easily discover these sorts of problems, and by running builds from every commit, you'll be able to pinpoint when a problem crept into the system. Even more importantly, because you use a separate server for building and packaging your application, it will always package everything the same way, every time. There is nothing worse than having a developer ship a custom build to a customer, have it work, and then have no idea how to reproduce the customizations.
When I saw this question, the first thing I did was search for Joel Spolsky's answer. I was a bit disappointed not to find it, so I'll add it here.
Hopefully everyone is aware of the Joel Test on Careers.
From his blog on The Joel Test: 12 Steps to Better Code
3. Do you make daily builds?
When you're using source control, sometimes one programmer accidentally checks in something that breaks the build. For example, they've added a new source file, and everything compiles fine on their machine, but they forgot to add the source file to the code repository. So they lock their machine and go home, oblivious and happy. But nobody else can work, so they have to go home too, unhappy.
Breaking the build is so bad (and so common) that it helps to make daily builds, to insure that no breakage goes unnoticed. On large teams, one good way to insure that breakages are fixed right away is to do the daily build every afternoon at, say, lunchtime. Everyone does as many checkins as possible before lunch. When they come back, the build is done. If it worked, great! Everybody checks out the latest version of the source and goes on working. If the build failed, you fix it, but everybody can keep on working with the pre-build, unbroken version of the source.
On the Excel team we had a rule that whoever broke the build, as their "punishment", was responsible for babysitting the builds until someone else broke it. This was a good incentive not to break the build, and a good way to rotate everyone through the build process so that everyone learned how it worked.
Though I haven't had an opportunity to do daily builds, I'm a great fan of them.
Still not convinced? Check out the brief here in Daily Builds Are Your Friend!!
You don't, actually; what you should want is continuous integration and automated testing (which is a step further than nightly builds).
If you are in any doubt you should read this article by Martin Fowler about Continuous Integration.
To summarize, you want to build and test as early and often as possible to spot errors immediately so they can be fixed while what you were trying to achieve when you caused them is still fresh in your mind.
I'd actually recommend to do builds every time you check in. In other words, I'd recommend setting up a Continuous Integration system.
The advantages of such a system and other details can be found in Fowler's article and on the Wikipedia entry among other places.
In my personal experience, it's a matter of quality control: every time code (or tests, which can be seen as a form of requirements) is modified, bugs might creep in. To ensure quality you should make a fresh build of the product as it would be shipped and perform all the tests available. The more often this is done, the less likely bugs will be allowed to form a colony. Therefore, daily (nightly) or continuous cycles are preferred.
In addition, whether you restrict access to your project to developers or open it to a larger group of users, a nightly build enables everyone to be on the latest version, minimizing the pain of merging their own contributions back into the code.
You want to do builds on a regular schedule in order to catch problems with integration of code between developers. The reason you want to do this nightly, as opposed to weekly or on some longer schedule, is that the longer you wait to discover these kinds of problems, the more difficult it will be to resolve them. The practice of doing a build on every check in (Continuous Integration) is just taking the nightly build process to a logical extreme.
The side benefit of having a repeatable build process is important in the long run as well. If you work on a team where there are multiple projects going on, then at some point you will need to be able to easily recreate an old build, perhaps for creating a patch. :(
The more you can automate the build process, the more time you will save for each subsequent build. It also takes the build process itself off of the critical path of delivering the final product, which should make your manager happy. :)
It also depends on the size and structure of the team(s) working on your project. If there are different teams relying on each other's APIs, it may make a lot of sense to have nightly builds for frequent integration. If you're hacking away with only one or two teammates, it may or may not be worth it.
Depending on the complexity of your product, continuous integration may or may not be able to run a full test suite.
Imagine Cisco testing a router with literally thousands of different setups. Running a full test suite on some products takes time, sometimes weeks. So you need builds for different purposes. A nightly build can be the basis for a more thorough test suite.
I think they are very important, especially on projects with more than one person. The team needs to know ASAP if someone:
checks in a bad file
doesn't check in a file
...
Any build automation is better than no build automation :-)
Personally, I prefer daily builds - that way if the build doesn't work then everyone is around to get it fixed.
In fact, if at all possible then Continuous Integration builds are the way to go (i.e. a build on every check-in) as that minimizes the amount of change between a build and so makes it easy to tell who broke the build and also easy to fix the build.
Well... I guess it depends a lot on your project, of course. If it's just your hobby project, with no releases, no dependencies, and no one but you submitting code, it might be overkill.
If, on the other hand, there's a team of developers all submitting code, automatic nightly builds will help you ensure the quality of the code in the repository. If someone does something that "breaks the build" for all others, it will quickly be noticed. It is possible to break the build without noticing, for instance by forgetting to add a new file to the repository, and nightly builds in a centralized location will detect these quite quickly.
There are of course other possible benefits, I'm sure others will supply them. :)
Nightly builds are only necessary for significantly large projects (when the build takes too long to run often throughout the day). If you have a small project that does not take long to build, you can build it as you get functional pieces of code done, so that you know you did not mess anything up in the process. However, with larger projects this is not possible, so it is important to build the project regularly just so that you know that everything is still in working order.
There are several reasons; some will be more applicable than others:
If your project is being worked on by two or more people
It's a good way to grab the latest version of code that you aren't working on
A nightly build provides a slice in time of the current state of the code
A nightly build will give you a stable build if you need to send code to people
Nightly builds aren't always necessary - I think they're only really useful on big projects. But if you're on a big project, a nightly build is a good way of checking that everything is working - you can run all your tests (unit tests, integration tests), build all your code - in short, verify that nothing is broken in your project.
If you've got a smaller project your build and test times will be shorter so you can probably afford to do more regular builds.
Nightly builds are ideal for performing static code analysis (see qalab and the projects it collects stats from, if you are in the Java world). Unfortunately, this is something that's rarely done.