We're developing a web site. One of the development tools we're using has an alpha release of its next version available, and it includes a number of features we really want to use (i.e. they'd save us from having to implement thousands of lines that do pretty much exactly the same thing anyway).
I've done some initial evaluations of it and I like what I see. The question is, should we start actually using it for real? I.e., beyond just evaluating it, should we actually use it for our development and rely on it?
As alpha software, it obviously isn't ready for release yet... but then nor is our own code. It is open source, and we have the skills needed to debug it, so we could in theory actually contribute bug fixes back.
But on the other hand, we don't know its release schedule (they haven't published one yet), and while I feel okay developing with it, I wouldn't be so sure about using it in production. If it isn't ready before we are, it may delay our own launch.
What do you think? Is it worth taking the risk? Do you have any experiences (good or bad) of similar situations?
[EDIT]
I've deliberately not specified the language we're using or the dev-tool in question in order to keep the scope of the question broad, as I feel it's a question that can apply to pretty much any dev environment.
[EDIT2]
Thank you to Marjan for the very helpful reply. I was hoping for more responses though, so I'm putting a bounty on this.
I've had experience contributing to an open source project once, as you say you hope to do. They ignored the patch for a year (they have customers to attend to, of course; they don't sell the software itself, only support for it). After a year, they rejected the patch with no alternative solution to the problem, and without a sound rationale for doing so. It was just out of their scope at that time, I guess.
In your situation I would try to fix one or two of their not-so-high-priority, already-reported bugs and see how responsive they are, and then decide. Your success in hitting deadlines will be tied to theirs. If you end up having to maintain your own copy of their code, that's guaranteed pain.
In short: not only evaluate the product, evaluate the producers.
Regards.
My personal take on this: don't. If they don't come through for you in your time scale, you're stuck and will still have to put in the thousands of lines yourself and probably under a heavy time restriction.
Having said that, there is one way I see you could try and have your cake and eat it too.
If you see a way to abstract it out, that is, to insulate your own code from the library's (for example using the adapter or facade patterns), then go ahead and use the alpha for development. But determine beforehand the latest date, according to your release schedule, by which you must start developing your own thousands-of-lines version behind the adapter/facade. If the alpha hasn't turned into an RC by then: grin and bear it, and develop your own.
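To make the facade idea concrete, here is a minimal sketch. All names are invented: WidgetRenderer is an interface you would own, and the alpha library's actual API is only hinted at in a comment.

    import java.util.Map;

    // The interface the rest of your code base depends on.
    public interface WidgetRenderer {
        String render(String templateName, Map<String, Object> model);
    }

    // The ONLY class allowed to touch the alpha library directly.
    class AlphaLibraryRenderer implements WidgetRenderer {
        @Override
        public String render(String templateName, Map<String, Object> model) {
            // Delegate to the alpha library here, e.g. (hypothetical call):
            // return new AlphaTemplateEngine().process(templateName, model);
            throw new UnsupportedOperationException("wire up the alpha library here");
        }
    }

If the alpha never reaches RC in time, only AlphaLibraryRenderer has to be replaced by a hand-rolled implementation; everything else keeps depending solely on WidgetRenderer.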
It depends.
For open source environments it depends more on the quality of the release than on the label (alpha/beta/stable) it carries. I've worked with alpha code that is rock solid compared to alleged production code from other producers.
If you've got the source, you can fix any bugs yourself. With closed source (usually commercially supported), you could never release production code built on a beta product, because the beta is unsupported by the vendor who holds the code, and so you can't fix it.
So in your position I'd be assessing the quality of the alpha version and then deciding if that could go into production.
Of course all of the above doesn't apply to anything even remotely safety critical.
It is just a question of managing risks. In open source, an alpha release can mean a lot of different things. You need to be prepared to:
handle API changes;
provide bug fixes and workarounds;
test stability, performance and scalability yourself;
track changes much more closely, and decide whether to adopt them yet;
track the progress they are making and their responsiveness to patches/issues.
You do use continuous integration, don't you?
I'm trying to choose between a couple of different HTML parsers for a project I am working on, part of which accepts HTML input from the client.
I've built a simple automated test for each one, to see if they fit my needs. I have a large number of real-life HTML fragments to test, but they aren't enough for testing for safety, since they (probably) do not contain any malicious code.
I don't mind reviewing the outputs by hand.
My question is, is there a freely available database or list of HTML snippets containing malformed HTML and scripts intended for testing for XSS?
The ha.ckers XSS cheatsheet is pretty comprehensive, and was the catalyst for me to build a whitelist-based sanitiser into jsoup.
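For illustration, whitelist-based cleaning with jsoup looks roughly like this (the class is Safelist in current releases, Whitelist in older ones; the input string is just a made-up fragment):

    import org.jsoup.Jsoup;
    import org.jsoup.safety.Safelist; // called Whitelist in older jsoup releases

    public class SanitizeExample {
        public static void main(String[] args) {
            String untrusted = "<p>Hello <script>alert('xss')</script>world</p>";
            // Keep only tags/attributes on the basic allow-list; everything else is dropped.
            String safe = Jsoup.clean(untrusted, Safelist.basic());
            System.out.println(safe); // roughly: <p>Hello world</p>
        }
    }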
Google's home page seems to be malformed; maybe you can use that?
http://validator.w3.org/check?uri=www.google.com&charset=%28detect+automatically%29&doctype=Inline&group=0
http://www.codinghorror.com/blog/2006/11/its-a-malformed-world.html
I built html-sanitizer-testbed for exactly this purpose. It consists of two components:
A suite of tests designed to check the security of an HTML sanitizer. I have collected every tricky case I've been able to find: everything on the ha.ckers.org XSS cheatsheet, plus many other test cases gathered over the years. I've analyzed dozens of HTML sanitizers (most of them were vulnerable) and added a test case for every security vulnerability I've ever found, so this is a pretty nice collection.
Also, it provides some test automation functionality, so that you don't need to review the outputs by hand: you can fire up a browser and check whether it appears to have executed any JavaScript in the outputs of the sanitizer (in which case the sanitizer is broken). This part is not 100% reliable and comes with no guarantees whatsoever, so for maximum effectiveness you might still want to review the outputs by hand. However, it has worked pretty well for me so far.
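This is not the testbed's actual API, but the general idea it automates can be sketched like this: render each sanitized fragment in a real browser and treat any JavaScript execution (here, an alert() fired by a typical cheatsheet payload) as a failure. The sanitizer call itself is a placeholder.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import org.openqa.selenium.NoAlertPresentException;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;

    public class SanitizerSmokeTest {
        // Returns true if loading the sanitized fragment triggered an alert(),
        // i.e. a payload survived sanitization and executed.
        static boolean executesScript(WebDriver driver, String sanitizedHtml) throws Exception {
            Path page = Files.createTempFile("sanitized", ".html");
            Files.writeString(page, "<!doctype html><html><body>" + sanitizedHtml + "</body></html>");
            driver.get(page.toUri().toString());
            try {
                driver.switchTo().alert().accept();
                return true;
            } catch (NoAlertPresentException e) {
                return false;
            }
        }

        public static void main(String[] args) throws Exception {
            WebDriver driver = new FirefoxDriver();
            // Substitute the output of whichever sanitizer you are evaluating.
            String output = "<p>already sanitized fragment</p>";
            System.out.println(executesScript(driver, output) ? "FAIL: script executed" : "ok");
            driver.quit();
        }
    }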
I welcome feedback and contributions.
I am after some advice and pointers on integration testing for a web app. Our project has been running for a number of years, and it is reasonably complex. We are pretty well covered with unit tests, but we are missing a decent set of integration tests. We don't have documented use cases or even a reasonable set of test cases beyond our unit tests. 'Integration testing' today consists of the developer's knowledge of the likely impact of a change and manual, ad-hoc testing of the app. It is really not ideal - we now want to design and automate a solid set of tests to allow us to perform regression testing, and increase our confidence in the quality of the app.
We have finally built a platform (based on Selenium) to allow us to quickly author and automate the execution of the tests. The problem now: we don't have any tests; the page is well and truly blank. The system has around 30 classes which interact with each other and influence the UI. For a new user signing up, there are about 40 properties that can be set, each one affecting the experience. Over the user's lifetime they will generate even more states. Given so many variables and possible states, it is a daunting prospect to get started, which is probably why it has been neglected thus far.
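To give a sense of the mechanics (authoring a single test is not the hard part), a test on the platform looks roughly like this; the URL, element ids and the assertion below are placeholder examples, not our real app:

    import org.junit.jupiter.api.*;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;

    class SignUpFlowTest {
        private WebDriver driver;

        @BeforeEach
        void startBrowser() {
            driver = new FirefoxDriver();
        }

        @Test
        void newUserWithDefaultsSeesWelcomePage() {
            driver.get("http://localhost:8080/signup");
            driver.findElement(By.id("email")).sendKeys("new.user@example.com");
            driver.findElement(By.id("password")).sendKeys("s3cret!");
            driver.findElement(By.id("submit")).click();

            // One observable outcome per test keeps failures easy to diagnose.
            Assertions.assertTrue(driver.getPageSource().contains("Welcome"));
        }

        @AfterEach
        void stopBrowser() {
            driver.quit();
        }
    }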
The pain of not having a decent set of tests is now becoming destructive. I am dedicating time to get this problem fixed - I am after some practical advice on the authoring of the tests. How do you approach it? Do you have any links I may find useful? How can I stop my mind running away with the seemingly infinite number of states for a user's data? How can I flush out the edge cases which are failing (and which our users seem to be finding)?
If it is the sheer number of combinations that is holding you back from generating test cases, you should definitely take a look at all-pairs testing.
We have used PICT from Microsoft as a tool to successfully minimize the number of test cases while still being reasonably confident of having most cases covered (there's a small model sketch after the quote below).
The reasoning behind all-pairs testing is this: the simplest bugs in a program are generally triggered by a single input parameter. The next simplest category of bugs consists of those dependent on interactions between pairs of parameters, which can be caught with all-pairs testing. Bugs involving interactions between three or more parameters are progressively less common, whilst at the same time being progressively more expensive to find by exhaustive testing, which has as its limit the exhaustive testing of all possible inputs.
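As a concrete (made-up) illustration, a PICT model is just a plain text file listing each parameter and its values; running the pict command prints a minimal set of test cases that covers every pair of values. The sign-up properties below are invented placeholders:

    # signup.model -- replace these with your app's real properties
    AccountType: Free, Trial, Paid
    Country:     US, UK, DE, Other
    Newsletter:  Yes, No
    Referrer:    None, Search, Affiliate
    Browser:     Chrome, Firefox, IE

    # generate the pairwise test cases:
    #   pict signup.model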
I have heard many developers refer to code as "legacy". Most of the time it is code that has been written by someone who no longer works on the project. What is it that makes code legacy code?
Update in response to:
"Something handed down from an ancestor or a predecessor or from the past" http://www.thefreedictionary.com/legacy. Clearly you wanted to know something else. Could you clarify or expand your question? S.Lott
I am looking for the symptoms of legacy code that make it unusable or a nightmare to work with. When is it better to throw it away? It is my opinion that code should be thrown away more often, and that reinventing the wheel is a valuable part of development. The academic ideal of not reinventing the wheel is a nice one, but it is not very practical.
On the other hand there is obviously legacy code worth keeping.
By using hardware, software, APIs, languages, technologies or features that are either no longer supported or have been superseded, typically combined with little to no possibility of ever replacing that code; instead it is used until it or the system dies.
What is it that makes code legacy code?
As with a plain legacy, when the author is dead or missing, you as an heir get all or some of his code.
You shed some tears and try to figure out what to do with all this rubbish.
Michael Feathers has an interesting definition in his book Working Effectively with Legacy Code. According to him legacy code is code without automated tests.
It is a very general (and oft-abused) term, but any of the following would be legitimate reasons to call an app legacy:
The code base is based on a language/platform which is entirely unsupported by the manufacturer of the original product (often said manufacturer has gone out of business).
(really 1a) The code base, or the platform on which it is built, is so old that getting qualified or experienced developers for the system is both hard and expensive.
The application supports some aspect of the business which is no longer actively grown and for which alterations are extremely rare, normally to fix it if something entirely unexpected changes around it (the canonical example being the Y2K issue) or if some regulation or external pressure forces it. Since both reasons are pressing and normally unavoidable, yet no significant development has occurred on the project, it is likely that the people assigned to deal with this will be unfamiliar with the system (and its accumulated behaviours and intricacies). In these cases this would often be reason to increase the perceived and planned-for risk associated with the project.
The system has been, or is being, replaced with another. As such the system may be used for much less than originally intended, or perhaps only as a means of viewing historical data.
Legacy generally refers to code that is no longer being developed - meaning that if you use it, you have to use it on its original terms - you cannot just edit it to support the way the world looks today. For example, legacy code has to run on hardware that may not exist today - or is no longer supported.
According to Michael Feathers, the author of the excellent Working Effectively with Legacy Code, legacy code is code which has no tests: there is no way to know what breaks when the code changes.
The main thing that distinguishes legacy code from non-legacy code is tests, or rather a lack of tests. We can get a sense of this with a little thought experiment: how easy would it be to modify your code base if it could bite back, if it could tell you when you made a mistake? It would be pretty easy, wouldn't it? Most of the fear involved in making changes to large code bases is fear of introducing subtle bugs; fear of changing things inadvertently. With tests, you can make things better with impunity. To me, the difference is so critical, it overwhelms any other distinction. With tests, you can make things better. Without them, you just don't know whether things are getting better or worse.
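A tiny (made-up) illustration of what that looks like in practice is a characterization test: before touching untested legacy code, you pin down whatever it currently does, so that later changes can "bite back". The pricing logic below is an invented stand-in for the inherited code:

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    class LegacyBilling {
        // Imagine this is the inherited code nobody dares to modify.
        static double invoiceTotal(double net, String country) {
            double rate = country.equals("DE") ? 0.19 : 0.20;
            return Math.round(net * (1 + rate) * 100) / 100.0;
        }
    }

    class InvoiceCharacterizationTest {
        @Test
        void totalMatchesCurrentBehaviour() {
            // The expected values are simply what the code produces today;
            // the test's job is to scream if a refactoring changes them.
            assertEquals(119.00, LegacyBilling.invoiceTotal(100.00, "DE"), 0.001);
            assertEquals(120.00, LegacyBilling.invoiceTotal(100.00, "UK"), 0.001);
        }
    }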
Nobody is gonna read this, but I feel the other answers don't get it quite right:
It has value; if it wasn't useful it would've been thrown away long ago.
It's hard to reason about, because of any of:
lack of documentation,
the original author cannot be found or has forgotten it (yes, two months later your code can be legacy code too!!),
lack of tests or a type system,
it doesn't follow modern practices (i.e. no context to hold on to).
There is a requirement to change or extend it. If there isn't a requirement to change it, it isn't legacy code, since nobody cares about it. It does its thing and there is nobody around to call it legacy code.
A colleague once told me that legacy code was any code that you hadn't written yourself.
Arguably, it's just a pejorative term for code that we don't like any more, for whatever reason (typically because it's not cool or fashionable, even though it works).
The TDD brigade might suggest that any code without tests is legacy code.
Legacy code is source code that relates to a no-longer supported or manufactured operating system or other computer technology.
http://en.wikipedia.org/wiki/Legacy_code
"Legacy code is source code that relates to a no-longer supported or manufactured "
Any code with support (or documentation) missing. Be it:
inline comments
technical documentation
spoken documentation (the person who wrote it)
unit tests documenting the workings of the code
For me legacy code is code that was written prior to some paradigm shift.
It may still be very much in use but it is in the process of being refactored to bring it into line.
e.g. Old procedural code hanging around in an otherwise OO system.
Code (or anything else, really) becomes "legacy" when it has been replaced by something newer/better, and yet despite this it's still used and kept alive "in the wild".
Preserving legacy code is not so much an academic ideal as it is keeping code that works, no matter how poorly. In many conservative enterprise situations, that would be considered more practical than throwing it away and starting again from scratch. Better the devil you know...
Legacy code is code that is painful/expensive to keep current with changing requirements.
There are two ways that this can happen:
The code is unsuitable for change
The semantics of the code have been swapped out to silicon
1) is the easier of the two to recognize. It is software that has fundamental limits making it unable to keep up with the ecosystem around it. For example, a system built around an O(n^2) algorithm won't scale beyond a certain point and must be rewritten if requirements move in that direction. Another example is code using libraries that are not supported on the latest OS versions.
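As a toy illustration of that first kind of limit (the code and data here are invented), a quadratic duplicate check is fine for hundreds of records but must eventually give way to a linear rewrite once the data outgrows the original assumption:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    class DuplicateCheck {
        // O(n^2): every record compared against every other record.
        static boolean hasDuplicatesQuadratic(List<String> ids) {
            for (int i = 0; i < ids.size(); i++)
                for (int j = i + 1; j < ids.size(); j++)
                    if (ids.get(i).equals(ids.get(j))) return true;
            return false;
        }

        // O(n): the shape the code has to be rewritten into when volumes grow.
        static boolean hasDuplicatesLinear(List<String> ids) {
            Set<String> seen = new HashSet<>();
            for (String id : ids)
                if (!seen.add(id)) return true;
            return false;
        }
    }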
2) is harder to recognize, but all code of this kind shares the characteristic that people are afraid to change it. This could be because it was badly written or documented to begin with, because it is untested, or because it is non-trivial and the original authors who understood it have left the team.
The ASCII/Unicode characters that comprise living code have semantic meaning - the "whys", "whats" and, to some degree, the "hows" - in the minds of the people associated with it. Legacy code is either un-owned, or its owners no longer hold that meaning for large portions of it. Once this happens (and it could happen the next day with really poorly written code), to change the code someone must learn it and understand it. This process is a significant fraction of the time it takes to write it in the first place.
The day you're afraid to refactor your code is the day when your code has become legacy.
I consider code "legacy" if any or all of the following conditions apply:
It was written using a language or methodology that is a generation behind current standards
The code is a complete mess with no planning or design behind it
It is written in outdated languages and in an outdated, non object-oriented style
It is difficult to find developers who know the language because it is so old
Unlike some of the other opinions here, I've seen plenty of modern applications that work decently without unit tests. Unit testing still has not caught on with everyone. Perhaps ten years from now the next generation of programmers will look at our current applications and consider them "legacy" for not containing unit tests, just as I consider non object-oriented applications to be legacy.
If few changes need to be made to a legacy codebase, it's better to simply leave it as-is and go with the flow. If the application needs drastic functionality changes, a GUI overhaul, and/or you can't find anyone who knows the programming language, it's time to throw it away and start over. A word of warning, however: rewriting from scratch can be very time-consuming, and it's difficult to know if you've replicated all functionality. You'll probably want to have test cases and unit tests written for both the legacy application and the new application.
Quite honestly, legacy code is any code, framework, API, or other software construct that's not "cool" anymore. For example, COBOL is unanimously regarded as legacy while APL is not. One can make the case that COBOL is considered legacy and APL is not because COBOL has about a million times the install base of APL. However, if you say that you need to work on APL code, the reply would not be "oh no, that legacy stuff" but rather "oh my god, I guess you won't be doing anything for the next century". See the difference?
This is a general term thrown around quite often (and quite generically) in the software ecosystem.
Well, I like to think of legacy code as inherited code. It is simply code that was written in the past. In most cases, legacy code does not follow new/current practices and is often considered archaic.
Legacy code is anything written more than a month ago :-)
It's often any code that isn't written in the trendy scripting language du jour, and I'm only half joking.
Any useful metrics will be fine
One of the things that I look for in code is unit tests. They give you the freedom to refactor it. So if the code does not have tests, I consider it legacy code.
If the code:
has been replaced by newer code that implements the same functionality or better
is not being used by current systems
is soon to be replaced by something else altogether
has been archived for historic reasons
is no longer supported by its vendors
We use the term "legacy" to refer to any code, still in use, developed using technology we have ceased active development in.
It is code that we would rather rewrite using more recent tools than modify in its current state.
Michael Feathers, author of the excellent "Working Effectively with Legacy Code", defines it as any code that does not have tests.
A better question would probably be what marks a piece of code as non legacy.
To me legacy means unchangeable. So as soon as you're no longer 'able' to change it it's legacy.
Whether that ability is removed by fixed requirements, fear of breakage, knowledge loss, or some other impact is largely irrelevant.
A related note is that I don't think I'd ever use the exact word legacy as it stirs up too many emotions to be useful.
I don't believe there is a definitive answer, but I do believe that the likelihood that code is legacy code increases with the number of people who don't want to touch it and the likelihood that changing it will cause it to break.
the term "legacy code" is subjective and is probably a loaded term. but in general I subscribe to the view that legacy code is one that is not unit-testable and as such is hard to refactor.
When the code is old enough that you never met the developer who originally wrote it.
When 3rd party libraries aren't supported anymore.
In my opinion, all code that is written is legacy code. It might take some time before the original intent and all the decisions made about the code are forgotten, but sooner or later you cannot imagine what they were thinking while writing it. You never write legacy code yourself, right?
Using unit tests, or some measure like seconds since the developer left the building, does not really tell you whether or not the code is legacy code. Legacy code may have a good set of unit tests and comments, and it may have undergone strict code review and other analysis. That doesn't mean the code is still relevant for the program at hand; it just suggests the code is comparatively well written. And if it is no longer relevant, the code will actually make it harder to solve the problem the program is developed for.
Legacy code has been defined in many places as "code without tests". I don't think they are specific in the types of tests, but in general, if you can't make a change to your code without the fear of something unknown happening, well, it quickly devolves.
See "Working Effectively with Legacy Code"
I may be wrong, but I don't think there is an established metric for this.
Usually a piece of code is deemed legacy when it has seen at least 5-6 release cycles (maybe more). More often than not, the original implementer is no longer around and the code is simply being maintained.
Almost seconds after the devs leave the premises. :)
If...
there's no money in the bank for new features
you can't find anyone that admits working on the project that needs fixing
the source code to the project you own has gone MIA
...then you're working on legacy code.
Usually people refer to something as legacy code when no one is still around that is familiar with or feels comfortable maintaining the code.
Unit tests make it easier for people unfamiliar with code to dig into it, so the theory is it helps prevent code from becoming "legacy".
Often, when code is legacy, it is changed in a different manner. People are afraid to change it, but the changes they do make tend to be quick and dirty because nobody understands the full consequences. Code duplication issues may arise because people don't want to take the risk associated with deeper changes.
So, in such circumstances, the situation may get worse, at an increasing rate.
I don't know of any real metrics that can be used to determine if something is "legacy code" or not, but anything older than just written could be considered legacy. Legacy code means different things to different people/organizations, so it really is somewhat subjective.