What is the best practice to write Selenium-based integration testing from zero for a complex application? - language-agnostic

I am after some advice and pointers on integration testing for a web app. Our project has been running for a number of years, and it is reasonably complex. We are pretty well covered with unit tests, but we are missing a decent set of integration tests. We don't have documented use cases or even a reasonable set of test cases beyond our unit tests. 'Integration testing' today consists of the developer's knowledge of the likely impact of a change and manual, ad-hoc testing of the app. It is really not ideal - we now want to design and automate a solid set of tests to allow us to perform regression testing, and increase our confidence in the quality of the app.
We have finally built a platform (based on Selenium) to allow us to quickly author and automate the execution of the tests. The problem now: we don't have any tests, the page is well and truly blank. The system has around 30 classes which interact with each other and influence the UI. For a new user signing up, there are about 40 properties that can be set, with each once impacting the experience. Over the user life time they will generate even more states. Given so many variables and possible states, it is a daunting prospect to get started, which is probably why it has been neglected thus far.
The pain of not having a decent set of tests is now becoming destructive. I am dedicating time to get this problem fixed - I am after some practical advice on the authoring of the tests. How do you approach it? Do you have any links I may find useful? How can I stop my mind running away with the seemingly infinite number of states for a user's data? How can I flush out the edge cases which are failing (and our users seeming to be finding)?

If it is the sheer amount of combinations that is holding you back in trying to generate testcases, you should definitly take a look at all-pair testing.
We have used PICT from microsoft as a tool to successfully minimize the amount of testcases while still being reasonable confident to have most cases covered.
the reasoning behind all-pairs testing
is this: the simplest bugs in a
program are generally triggered by a
single input parameter. The next
simplest category of bugs consists of
those dependent on interactions
between pairs of parameters, which can
be caught with all-pairs testing.1
Bugs involving interactions between
three or more parameters are
progressively less common2, whilst
at the same time being progressively
more expensive to find by exhaustive
testing, which has as its limit the
exhaustive testing of all possible
inputs.

Related

Some questions about the process of BDD from a beginner

These days I've read several articles about BDD to find what it is talking about. Now I get a basic understandings, but still not clear about the whole process.
The following is what I think to do in a BDD process:
All the stakeholders(BA, customer, Dev, QA) are sitting together to discuss the requirements, and write the agreed features on story cards. Here I take a "user registeration" feature as example:
As a user,
I want to register on the system,
so that I can use its services
Create several scenarios in Given/When/Then format, and here is one of them:
Scenario: user successfully register
Given an register page
And an un-registered user
When the user fills username "Jeff" and password "123456"
And click on "Register"
Then the user can see a "Success" message
And the user "Jeff" is created in the system
Implement this scenario with some BDD testing framework, say cucumber-jvm, like:
import cucumber.api.java.en.Given;
public class Stepdefs {
#Given("an register page")
public void an_register_page throws Throwable {
// ...
}
#Given("an un-registered user")
public void an_register_page throws Throwable {
// ...
}
// ...
}
for the steps one by one.
But I soon find myself in trouble: there are pages, models, maybe databases need for this scenario, seems a lot of thing to do.
What should I do now?
What should I do now? Do I need to discuss this scenario with all the stakeholders? For BA/Customer/QA, I don't think they really care about the implementations, is it a good idea to discuss it with some other developers?
Suppose after I discuss it with some other developers, we agree to split it to several small parts. Can we make these small parts as "scenario"s with Scenario/Given/When/Then format as what we just did with cucumber-jvm, or can we use JUnit as we normally do in TDD?
1. If choose "cucumber-jvm", it seems a little heavy for small part
2. If choose JUnit, we need to involve more than one testing framework in a project
3. Is it the best if there is a single testing framework to do both things (not sure if there is)
Suppose I choose option 2, using JUnit for the small tasks
The following is what I will do after this decision:
Now we create new small tests to drive implementations, like creating user in database, as we normally do in TDD. (red->green->refactoring). And we don't care the cucumber test Scenario: user successfully register (which is failed) for now, just leave it there. Right?
We develop more small tests with JUnit, made them red -> green -> refactored. (And the imcomplete cucumber test is always failed)
Until all the small tests are passed, we turn to the cucumber test Scenario: user successfully register. Completed it and make sure it turn green at last.
Now develop another scenario, if it's easy, we can implement it just with cucumber, otherwise we will have to split it and write several jUnit test
There are definitely a lot of mis-understandings, even very basic ones. Because I don't find myself gain much value from BDD except the "discuss with all stakeholders" part.
Where is my mistake? Appreciate for any sugguestions!
Don't start with logging in; start with the thing that's different to the other systems out there. Why is someone logging in? Why do they want to use the service? Hard-code a user, pretend they're logged in, focus on the value.
If you focus on UI details you tie yourself to the UI very strongly, and it makes the UI hard to change. Instead, look at what capabilities the system is delivering. I don't recommend using a login scenario anyway, but if I did, I'd expect it to look more like:
Given Jeff isn't registered with the site
When he registers with the username "Jeff" and password "123456"
Then his account creation should be confirmed
And he should be invited to log in for the first time.
Look up "declarative vs. imperative" here to see more on this.
If your UI is really in flux, try out the scenario manually until the UI has settled down a bit. It will be easier to automate then. As you move into more stable scenarios it will be better to automate first (TDD-style).
What should you do now? Well, most people don't write class-level tests for the UI, so don't worry about it until you start driving out the controller and presenter layers. It's normally easier to have frameworks in the same language, but two different frameworks is fine. Cucumber / RSpec, JBehave / JUnit, SpecFlow / NUnit are pretty typical combinations. Do the smallest amount you need to get that first scenario working. It won't be much, because you can hard-code a lot of it. The second scenario will start introducing more interesting behaviour, and then you'll start to see your class-level tests emerge.
BTW, BDD started at a class level, so you can do the same thing with the classes; think of an example of how you might use it, write "Given, When, Then" in comments if your framework doesn't work that way already, and then fill in the gaps!
Yes, your Cucumber scenario will be red throughout, until it isn't.
Ideally you'll be making one last unit test and the Cucumber scenario pass at the same time, rather than just writing a bit of extra code. It's very satisfying to see it finally go green.
The original point of BDD was to get rid of the word "test", since it causes people to think of things like TDD as being about testing. TDD's really about clean design; understanding the responsibilities and behaviour of your code, in the same way that scenarios help you understand the capabilities and behaviour of your system. It should be normal to write both system-level scenarios and class-level tests too.
You're already ahead of all the people who forget to discuss the scenarios before they start coding, though! The conversations with stakeholders are the most important part. You might get value out of including a tester in those conversations. Testers are very good at spotting scenarios that other people miss.
It looks like you're pretty much on the right track where the rest of the process is concerned. You might find some of the other BDD answers in my profile helpful for you too. Congrats and good luck!
I think doing registration/sign_in first is a really good thing to do when you are learning the mechanics of doing BDD. Pretty much everyone understands why you would want to sign into a system, and everyone understands that the system has to know who you are before you can do this, so you have to register first.
Doing this simple task allows you to concentrate on a smaller subset of BDD. By narrowing your focus you can improve quality, whilst being aware that there is much more to learn a little later on.
To write your sign in scenarios you need to focus on two things:
writing scenarios
implementing step definitions
These are the basic mechanics of BDD, but they are only a small part of the overall process. Still I think you'd benefit from working on them because at the moment you are not executing the mechanics very well, which is to be expected because you are new to this.
When you write scenarios you should concentrate on 'what' you are doing and 'why' you are doing it. Scenarios have no need to know anything about 'how' you do things. Anything to do with filling in stuff, clicking on stuff etc. is a smell. When your scenarios only deal with the what and why they become much simpler.
Feature: Registration
A pre-requistite for signing in, see sign_in.feature
Scenario: Register
Given I am a new user
When I register
Then I should be registered
Feature: Sign in
Dependant on registration ...
I want to sign in so I can have personalised content and ...
Scenario: Sign in
Given I am registered
When I sign in
Then I should be signed in
You really don't much more than this to drive the development of a simple sign_in system. Once you have that running you can deal with some sad paths e.g.
Scenario: Sign in with bad password
Given I am registered
When I sign in with a bad password
Then I should not be signed in
And I should be told ...
If you implement things nicely this sad path scenario should be trivial to implement as all the infrastructure is already in place to sign in, all that is different is you are using a bad password.
You can see an example of this at https://github.com/diabolo/cuke_up. The way to use this example is follow the commit history, and in particular notice how I am using the extract_method refactor to take all the code out of the step definitions. Each method I extract is a tool to reused when writing subsequent scenarios. Making an effective set of tools is the key to productivity when implementing scenarios.
Nowadys sign_up is so simple because we can rely on a 3rd party library and their unit tests. This means we can get pretty good results without ever having to worry about the transition to our own code and doing bits of TDD. So for now there really is no need to think about TDD.
So long as you are aware that you are only doing a small subset of BDD, I think you can successfully use this approach to provide foundations for all the extra stuff you have to deal with when working with the things that differentiate your system from others.
To summarize, just focus on
writing simple scenarios
making your step definitions elegant
creating tools (extracted methods) that can be used in the next scenario you write
You have plenty of time to learn the other stuff, and it will be much easier if your basic mechanics are better developed.

Disagreement on software time estimation

How do you deal with a client who has different time estimates for the software product than yours?
I am going to describe a scenario that is not mine, but that captures broadly the same problem. I am working as a subcontractor to a large company that has a programming department. The software project we are working on is in an area that the department believe they have a handle on, but because their expertise and mine are very different we tend to get different results.
Example: At the start of the project I suggested one way of development which they rubbished as being unrealistically difficult and suggested integrating a different framework (one they are familiar with) with the programming language we are using (Python) to get more or less the same result.
Their estimate for this integration: less than a week (they haven't done the integration before).
My estimate for the integration: above two weeks.
Using my suggested way to get the result needed (including using matplotlib among other libraries used elsewhere within the project): 45 minutes. This is not an estimate, the bit was actually finished in 45 minutes.
Example: for the software to be integrated with their internal system, they needed to provide a web service for me to use. They provided a broken one, though it does work with their internal tool (doesn't work with .Net or Java mainstream packages among other options). They maintain that it is my fault that the integration has taken longer than the time estimated.
The problem is not that they don't know, the problem is that they have enough knowledge about programming to be dangerous (in my opinion). Is there some guidelines for how to deal with this type of situation? A way for expectation management? Or may be I shouldn't get involved in such projects from the start and in this case what are the telltale signs?
If a client isn't happy with a time estimate, don't do the work. If they think they can do it better or faster, tell them to go ahead.
The one thing I never allow is for my estimates to be modified. That's something that caught me out early on in my career but we learn our lessons.
If clients were so good at doing the work, they wouldn't be hiring me. I'd simply point out that they hired me for my expertise so why are they disregarding that expertise. Of course, if they were to allow the scope of the project to change (i.e., less work), that would be another matter, and one up for discussion.
If you didn't lock in exactly what they were meant to provide as part of the deal, then it's a "he says, she says" situation and, unfortunately, the customer controls the purse strings. However, often, the greatest power you can have is the ability to just walk away.
No-one says you have to do the job.
Of course, all that advice above is worth every cent you paid for it :-)
I don't know your specific circumstances.
Or may be I shouldn't get involved in such projects from the start and in this case what are the telltale signs?
My answer for sure. If you can avoid those projects, do it.
Some signs : people thinking they know how to do things when you can guess they can't. The "oh no let's not use this perfectly suitable tool because I don't know it" is a major indicator that the person is technically challenged.
first of all, it is no fun to be in such an environment.
So, if you like to have fun at your job, and you do not need to take this job for extenuating financial reasons, then simply do not take the job that is not fun.
Since that is hardly realistic in many cases, you will end up with the job and need to manage the situation as best you can. One way is to make sure there is a paper trail documenting your objections and concerns with the plan. Try not to be overtly negative, but try to be constructive and present valid alternatives. Here you will need to feel out the political landscape, determine if the 'boss' will be appreciative or threatened by your commentary, and act accordingly.
Many times there are other issues that management is dealing with that you are not aware of. Be cautious of this fact, and maybe ask the management team if this is the case, again without being condescending or negative.
Finally, if you have alternatives that take less time than the meetings it would take to discuss them, just try it in a sandbox, and show it off. This would go a long way to 'proving' your points. Caution here is that you could be accused of not being a team player, or of wasting resources, or not following direction. Make sure this is mitigated by doing these types of things on your own time, or after careful consideration of how long you are spending on these things as well as how vested your boss seems to be on the alternatives.
hth
I ran into the same problem with integration. Example: for the
software to be integrated with their internal system, they needed to
provide a web service for me to use...They maintain that it is my
fault that the integration has taken longer than the time estimated.
Wow very similar to what I was experiencing with a client. The best thing I can suggest is to keep good documentation. In the end that is what saved me. When it came to finger pointing I had all of the emails and facts in order and was prepared to defend my self.
One thing I would suggest is to separate out a target/goal and an estimation. I would not change my estimate unless it involved actually removing features or something is revealed that would make it easier. Tell them you will try to hit the target in anyway you can and you care about the business goal. However, your estimate will not change. If its getting no where and they are just dense then smile and nod and take it if its the only gig around.
Was just writing about this in my blog
How to estimate the WRONG way

How do you stress test your own software?

I've been working on an app, by myself, and I am at a stage where everything works great--as long as the user does everything he or she is supposed to do. :-) The software needs more testing to see how robust it is, how well it works when people do things like click the same button repeatedly, try to open the wrong kind of files, put data in the wrong places, etc.
I'm having a little trouble with this because it's a bit difficult for me to think in terms of using the application incorrectly. These are all edge cases to me. Still, I'd like to have the application as stable and well tested as possible before I start giving it to beta testers. Assuming that I am not talking about hiring professional testers at this point, I'm curious whether y'all have any tips or systematic ways of thinking about this task.
Thanks, as always.
Well it sounds like you are talking about 2 different things
"Testing your application's functionality" and "Stress testing"(which is the title of your question)
Stress testing is when you have a website, and want to check that it can server 100,000 people at the same time. Seeing how your application performs under stress. You can do this a number of ways, e.g by recording some actions and then getting a number of agent machines to hit your application concurrently.
This questions sounds more like a Quality Assurance question. That is what testers / beta testers are for. But there are things that you can do yourself to validate your application works the best it can.
Unit testing your code would be a good start, it helps you to try and find those edge cases. If your method takes in things like ints, try passing in int.max, int.min, and seeing what blows up. Pass nulls into everything. If you are using .Net you might want to look at PEX, it will go through all the branches/codepaths that your application has. That might help you to further refine your unit tests to test your application the best you can.
Integration tests, see what happens end to end for some of your usual things. This will help you 'find bugs' as you are developing later.
Those are some quick tips on things you can do yourself to try and find edge cases that you may have missed. But yes, eventually you will need to pass your app off to someone else to test. Just make sure that you have covered off as much as you can before it hits them :-)
Make sure you have adequate code coverage in your unit tests and integration tests.
Use appropriate UI validation, and test combinations that can break it.
I have found that a well-architected application that reduces the number of possible permutations in the UI (ways the user can break it) helps a lot. Design patterns like MVC can be especially useful in this regard, since they make your UI veneer as thin as possible.
Automation.
(Re)Factor your code so that another program can throw user-events at it. Create simple scripts of user events and play them back to your program. Capture events from beta users and save those as test scripts (useful for reproducing problems and checking for regressions). Write a fuzz-tester that applies small random changes to the scripts and try them against your program as well.
With this kind of automation you can stress and application and find glaring problems like caches and memory leaks. It won't test the actual functionality. For functionality, unit tests can be helpful. There are a ton of unit testing frameworks out there to try. Pick something useful, learn to write good tests, and integrate them into your build process.

How smoothly does a website launch usually go for you?

My coworkers and I were having a discussion about this yesterday. It seems that no matter how well we prepare and no matter how much we test and no matter what the client says immediately before the site becomes public, initial site launches almost always seem to be somewhat rocky. Some clients are better than others, but often things that were just fine during testing suddenly go horribly wrong when the site becomes public.
Is this a common experience? I'm not just talking about functionality breaking down (although that's often a problem as well). I'm also talking about sites that work exactly the way we wanted them to, but suddenly are not satisfactory to the client when it's time to make the site public. And I'm talking about clients that have been familiar with the site during most of the development process. Meaning, the public launch is definitely not the first time they've seen the site.
If you've dealt with this problem before, have you found a way to improve the situation? Or is this just something that will always be somewhat of a problem?
Don't worry. This is completely and entirely normal and happens with every piece of software. Everything that can go wrong will go wrong, and the most volatile entity in the development process, the client, will be the cause of these things.
You could do all the Requirements Gathering in the world, write a 100 page Proposal, provide screenshots and updates to the project hourly and the client will still not approve. On a personal note, I feel that the Internet is one of the worst mediums for this, as designs are a lot more free-flowing nowadays and the client will always have a certain picture in his/her mind; one that won't look like the finished product.
I find that a bulletproof contact with defined stages and sign-off sheets are the best way to handle such a situation. Assuming that your work is contracted you should ensure that at each stage the client is shown the work and is forced to approve each and every change made. At least that way if the client wants something changed you can tell them that they've already signed off that section and the additional work will cost them extra (also defined within the contract).
Not only did this approach work for me, it made the client stop and think about what he/she REALLY wanted. Luckily for me many of my clients are already tech-oriented, so they understand that these things can take time, but those that haven't a clue about Web Development expect things to be perfect within a couple of days. As long as you make sure that everything is covered in the contract the client will think about what they want and won't pester you with issues after.
Of course, anything you can do in regards to Quality Control would be fantastic and help the project move along nicely. Also ensure that some form of methodology is planned out before the project and that this methodology is known by the client(s). Often changes in fundamental areas can be costly and many clients do not seem to realise that a small change can require many things to be changed.
Yes, saw this several times on our projects (human beings are fickle).
Something that helps us in these situations is a good PM/Account Manager that can handle the customer, which makes things a little bit bearable on the technical level.
Web site launches are usually fairly smooth for us. Of course, we do extensive validation including code inspections, deployments to proto-servers (identical to our production servers), and mountains of documentation.
After every launch, we have a meeting to discuss what went well and what didn't so that we can make adjustments to our overall process and best-known-methods documents.
As for clients that change their minds at the last minute... sigh... we minimize that by having them sign off on the beta version. That way, there is no disagreement when the project is launched. If there is a disagreement, there is always a next release.
For what it's worth, the last site launch I did went off without a hitch. Now, it wasn't a high-traffic site, and there were some bugs that I did eventually fix, but there wasn't anything troubling on the day of the actual launch.
This was an ASP.NET/C# site. It wasn't terribly large or complicated, but it wasn't trivial either. Probably the most notable thing is that it was 100% designed, implemented, and tested by myself, from the database schema all the way up to the CSS. It was also my first time using ASP.NET. There were plenty of bumps in development but by the time I launched it I was pretty familiar with them and so knew what to expect.
I think the lesson to be learned from this is to have a good design up-front, solid implementation skills, and good testing, and a new site doesn't have to be a nightmare. There's at least a possibility of a trouble-free launch.
I wouldn't limit your statement to just web sites. I have worked on a lot of projects over the years and there are always details that get "discovered" when going live. No amount of testing removes all the fun things that can happen.
One thing I will say is what you learn in the first couple of hours of a new system going "on-line" is way move valuable that all the stuff learned during development. It's show time when the real cool problems and scenarios appear. Learn to love them and use these times as a learning point for the next time. Then each time it will be just at fun!
We used to have this problem a lot, but much less recently.
Partly for us it is about firmer project management and documenting the specification (as suggested in other answers here) but I believe more of the difference came from:
Expectation management - getting the client to accept that iterative changes are still to be expected after launch, that this is normal and not to worry about it
Increasing authority - we are now a well established (13 years) web developer and we can speak with a lot of expertise
Simply being more experienced - we can now predict in advance most of the queries that are likely to come up, and either resolve them, mitigate them or bring them to the client's attention so they don't sting us on the day
Plus, we rarely do big fanfare launches - a soft launch makes things much less stressful.
My experience is that web site launches are almost always rocky. I've only had two exceptions to this common truth. The first was a site developed for a small business ran by one person. This went smoothly because, well there was only one person to please so is was fairly easy to track what they wanted. The other was a multi-million dollar website launched by a fortune 500 company. This happened to go smoothly because there were 2 PMs and a small army of consultants there to manage the needs of the customer. That coupled with a one month of straight application load testing and a 1,000 user beta launch meant when the site finally went "live", I was able to get a full nights sleep (which is fairly uncommon). Neither of these situations constitute then norm though. Of course, there's nothing better than several thousand beta testers hitting your site to help find those contingencies that you never thought of.
I'm sure you can figure out the kind of errors that always sneak in, so for example is it due to rather superficial testing? E.g. randomly clicking around and checking if things appear to be right.
In order to improve I propose something along the following:
Create documents/checklists that specify all testing procedures.
Get regular people to test, not just the folks who built the application.
Setup a staging environment which closely resembles production.
Post-launch, analyze what went wrong and why it went wrong.
Maybe get external QA to check on your procedures.
Now, all those suggestions are of course very obvious but implementing them into your launch procedures will require time.
In general this really is an ongoing process which will help you and your colleagues to improve. And also be happier, because fixing bugs in production just makes you age rapidly. ;-)
Keep in mind, you won't be done the first time. Documents are heavy which is why people don't read them. People are also lazy and don't follow the procedures. This means that you always have analyze what happened, go back and improve the procedures.
If you have the opportunity I'd also spend some time on looking why nothing went wrong with another launch and comparing this to the usual.

How to measure usability to get hard data?

There are a few posts on usability but none of them was useful to me.
I need a quantitative measure of usability of some part of an application.
I need to estimate it in hard numbers to be able to compare it with future versions (for e.g. reporting purposes). The simplest way is to count clicks and keystrokes, but this seems too simple (for example is the cost of filling a text field a simple sum of typing all the letters ? - I guess it is more complicated).
I need some mathematical model for that so I can estimate the numbers.
Does anyone know anything about this?
P.S. I don't need links to resources about designing user interfaces. I already have them. What I need is a mathematical apparatus to measure existing applications interface usability in hard numbers.
Thanks in advance.
http://www.techsmith.com/morae.asp
This is what Microsoft used in part when they spent millions redesigning Office 2007 with the ribbon toolbar.
Here is how Office 2007 was analyzed:
http://cs.winona.edu/CSConference/2007proceedings/caty.pdf
Be sure to check out the references at the end of the PDF too, there's a ton of good stuff there. Look up how Microsoft did Office 2007 (regardless of how you feel about it), they spent a ton of money on this stuff.
Your main ideas to approach in this are Effectiveness and Efficiency (and, in some cases, Efficacy). The basic points to remember are outlined on this webpage.
What you really want to look at doing is 'inspection' methods of measuring usability. These are typically more expensive to set up (both in terms of time, and finance), but can yield significant results if done properly. These methods include things like heuristic evaluation, which is simply comparing the system interface, and the usage of the system interface, with your usability heuristics (though, from what you've said above, this probably isn't what you're after).
More suited to your use, however, will be 'testing' methods, whereby you observe users performing tasks on your system. This is partially related to the point of effectiveness and efficiency, but can include various things, such as the "Think Aloud" concept (which works really well in certain circumstances, depending on the software being tested).
Jakob Nielsen has a decent (short) article on his website. There's another one, but it's more related to how to test in order to be representative, rather than how to perform the testing itself.
Consider measuring the time to perform critical tasks (using a new user and an experienced user) and the number of data entry errors for performing those tasks.
First you want to define goals: for example increasing the percentage of users who can complete a certain set of tasks, and reducing the time they need for it.
Then, get two cameras, a few users (5-10) give them a list of tasks to complete and ask them to think out loud. Half of the users should use the "old" system, the rest should use the new one.
Review the tapes, measure the time it took, measure success rates, discuss endlessly about interpretations.
Alternatively, you can develop a system for bucket-testing -- it works the same way, though it makes it far more difficult to find out something new. On the other hand, it's much cheaper, so you can do many more iterations. Of course that's limited to sites you can open to public testing.
That obviously implies you're trying to get comparative data between two designs. I can't think of a way of expressing usability as a value.
You might want to look into the GOMS model (Goals, Operators, Methods, and Selection rules). It is a very difficult research tool to use in my opinion, but it does provide a "mathematical" basis to measure performance in a strictly controlled environment. It is best used with "expert" users. See this very interesting case study of Project Ernestine for New England Telephone operators.
Measuring usability quantitatively is an extremely hard problem. I tackled this as a part of my doctoral work. The short answer is, yes, you can measure it; no, you can't use the results in a vacuum. You have to understand why something took longer or shorter; simply comparing numbers is worse than useless, because it's misleading.
For comparing alternate interfaces it works okay. In a longitudinal study, where users are bringing their past expertise with version 1 into their use of version 2, it's not going to be as useful. You will also need to take into account time to learn the interface, including time to re-understand the interface if the user's been away from it. Finally, if the task is of variable difficulty (and this is the usual case in the real world) then your numbers will be all over the map unless you have some way to factor out this difficulty.
GOMS (mentioned above) is a good method to use during the design phase to get an intuition about whether interface A is better than B at doing a specific task. However, it only addresses error-free performance by expert users, and only measures low-level task execution time. If the user figures out a more efficient way to do their work that you haven't thought of, you won't have a GOMS estimate for it and will have to draft one up.
Some specific measures that you could look into:
Measuring clock time for a standard task is good if you want to know what takes a long time. However, lab tests generally involve test subjects working much harder and concentrating much more than they do in everyday work, so comparing results from the lab to real users is going to be misleading.
Error rate: how often the user makes mistakes or backtracks. Especially if you notice the same sort of error occurring over and over again.
Appearance of workarounds; if your users are working around a feature, or taking a bunch of steps that you think are dumb, it may be a sign that your interface doesn't give the tools to figure out how to solve their problems.
Don't underestimate simply asking users how well they thought things went. Subjective usability is finicky but can be revealing.