How to measure usability to get hard data?

There are a few posts on usability but none of them was useful to me.
I need a quantitative measure of usability of some part of an application.
I need to estimate it in hard numbers so I can compare it with future versions (e.g., for reporting purposes). The simplest way is to count clicks and keystrokes, but this seems too simplistic (for example, is the cost of filling in a text field really just the sum of typing each letter? I suspect it is more complicated than that).
I need some mathematical model for that so I can estimate the numbers.
Does anyone know anything about this?
P.S. I don't need links to resources about designing user interfaces. I already have them. What I need is a mathematical apparatus for measuring the usability of an existing application's interface in hard numbers.
Thanks in advance.

http://www.techsmith.com/morae.asp
This is what Microsoft used in part when they spent millions redesigning Office 2007 with the ribbon toolbar.
Here is how Office 2007 was analyzed:
http://cs.winona.edu/CSConference/2007proceedings/caty.pdf
Be sure to check out the references at the end of the PDF too; there's a ton of good stuff there. It's worth studying how Microsoft approached Office 2007 (regardless of how you feel about the result); they spent a ton of money on this research.

The main concepts to focus on here are effectiveness and efficiency (and, in some cases, efficacy). The basic points to remember are outlined on this webpage.
What you really want to look at are 'inspection' methods of measuring usability. These are typically more expensive to set up (both in time and money), but can yield significant results if done properly. These methods include things like heuristic evaluation, which simply compares the system interface, and the way it is used, against your usability heuristics (though, from what you've said above, this probably isn't what you're after).
More suited to your use, however, will be 'testing' methods, whereby you observe users performing tasks on your system. This is partially related to the point of effectiveness and efficiency, but can include various things, such as the "Think Aloud" concept (which works really well in certain circumstances, depending on the software being tested).
Jakob Nielsen has a decent (short) article on his website. There's another one, but it's more related to how to test in order to be representative, rather than how to perform the testing itself.

Consider measuring the time to perform critical tasks (using a new user and an experienced user) and the number of data entry errors for performing those tasks.

First you want to define goals: for example, increasing the percentage of users who can complete a certain set of tasks, or reducing the time they need to do so.
Then get two cameras and a few users (5-10), give them a list of tasks to complete, and ask them to think out loud. Half of the users should use the "old" system, the rest should use the new one.
Review the tapes, measure the time it took, measure success rates, discuss endlessly about interpretations.
Alternatively, you can develop a system for bucket-testing -- it works the same way, though it makes it far more difficult to find out something new. On the other hand, it's much cheaper, so you can do many more iterations. Of course that's limited to sites you can open to public testing.
That obviously implies you're trying to get comparative data between two designs. I can't think of a way of expressing usability as a value.

You might want to look into the GOMS model (Goals, Operators, Methods, and Selection rules). It is a very difficult research tool to use in my opinion, but it does provide a "mathematical" basis to measure performance in a strictly controlled environment. It is best used with "expert" users. See this very interesting case study of Project Ernestine for New England Telephone operators.
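For a taste of how a GOMS/KLM-style estimate works in practice, here is a minimal sketch. The operator times are the commonly cited Card, Moran & Newell averages, and the task breakdown is a made-up example; both should be calibrated for your own users and devices before you trust the numbers.
    # Minimal Keystroke-Level Model (KLM) estimator (sketch only).
    KLM_SECONDS = {
        "K": 0.28,  # press one key (average typist; reported values range ~0.12-1.2)
        "P": 1.10,  # point at a target with the mouse
        "B": 0.10,  # press or release a mouse button
        "H": 0.40,  # move a hand between keyboard and mouse ("homing")
        "M": 1.35,  # mental preparation before an action
    }

    def klm_estimate(operators):
        """Sum the operator times for a sequence such as "MHPBB"."""
        return sum(KLM_SECONDS[op] for op in operators)

    # Hypothetical task: think, home to the mouse, point at a text field,
    # click (press + release), home back to the keyboard, type an
    # 8-character value, then press Tab.
    task = "MHPBB" + "H" + "K" * 8 + "K"
    print(f"Estimated expert, error-free time: {klm_estimate(task):.2f} s")
Note that, as the answers below point out, this only models error-free expert performance; it says nothing about learnability or about the mistakes real users make.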

Measuring usability quantitatively is an extremely hard problem. I tackled this as a part of my doctoral work. The short answer is, yes, you can measure it; no, you can't use the results in a vacuum. You have to understand why something took longer or shorter; simply comparing numbers is worse than useless, because it's misleading.
For comparing alternate interfaces it works okay. In a longitudinal study, where users are bringing their past expertise with version 1 into their use of version 2, it's not going to be as useful. You will also need to take into account time to learn the interface, including time to re-understand the interface if the user's been away from it. Finally, if the task is of variable difficulty (and this is the usual case in the real world) then your numbers will be all over the map unless you have some way to factor out this difficulty.
GOMS (mentioned above) is a good method to use during the design phase to get an intuition about whether interface A is better than B at doing a specific task. However, it only addresses error-free performance by expert users, and only measures low-level task execution time. If the user figures out a more efficient way to do their work that you haven't thought of, you won't have a GOMS estimate for it and will have to draft one up.
Some specific measures that you could look into:
Measuring clock time for a standard task is good if you want to know what takes a long time. However, lab tests generally involve test subjects working much harder and concentrating much more than they do in everyday work, so comparing results from the lab to real users is going to be misleading.
Error rate: how often the user makes mistakes or backtracks, especially if you notice the same sort of error occurring over and over again.
Appearance of workarounds: if your users are working around a feature, or taking a bunch of steps that you think are dumb, it may be a sign that your interface doesn't give them the tools they need to solve their problems.
Don't underestimate simply asking users how well they thought things went. Subjective usability is finicky but can be revealing.
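If you do instrument your test sessions, reducing the raw event log to the objective measures above (time on task, error count, completion) is straightforward. A minimal sketch; the event format here is hypothetical, so adapt it to whatever your logging actually records.
    from dataclasses import dataclass

    @dataclass
    class Event:
        timestamp: float   # seconds since session start
        kind: str          # "task_start", "task_end", "error", "backtrack"
        task: str

    def summarize(events):
        """Return {task: {"seconds": ..., "errors": ..., "completed": ...}}."""
        summary, starts = {}, {}
        for e in events:
            s = summary.setdefault(e.task, {"seconds": None, "errors": 0, "completed": False})
            if e.kind == "task_start":
                starts[e.task] = e.timestamp
            elif e.kind == "task_end":
                s["seconds"] = e.timestamp - starts[e.task]
                s["completed"] = True
            elif e.kind in ("error", "backtrack"):
                s["errors"] += 1
        return summary

    # One hypothetical session:
    events = [
        Event(0.0, "task_start", "create profile"),
        Event(41.2, "error", "create profile"),
        Event(97.5, "task_end", "create profile"),
    ]
    print(summarize(events))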


Disagreement on software time estimation

How do you deal with a client who has different time estimates for the software product than yours?
I am going to describe a scenario that is not mine, but that captures broadly the same problem. I am working as a subcontractor to a large company that has a programming department. The software project we are working on is in an area that the department believe they have a handle on, but because their expertise and mine are very different we tend to get different results.
Example: At the start of the project I suggested one way of development which they rubbished as being unrealistically difficult and suggested integrating a different framework (one they are familiar with) with the programming language we are using (Python) to get more or less the same result.
Their estimate for this integration: less than a week (they haven't done the integration before).
My estimate for the integration: above two weeks.
Using my suggested way to get the result needed (including using matplotlib, among other libraries used elsewhere in the project): 45 minutes. This is not an estimate; that piece was actually finished in 45 minutes.
Example: for the software to be integrated with their internal system, they needed to provide a web service for me to use. They provided a broken one, though it does work with their internal tool (it doesn't work with mainstream .NET or Java packages, among other options). They maintain that it is my fault that the integration has taken longer than the time estimated.
The problem is not that they don't know; the problem is that they have enough knowledge about programming to be dangerous (in my opinion). Are there any guidelines for how to deal with this type of situation? A way to manage expectations? Or maybe I shouldn't get involved in such projects from the start, and in that case, what are the telltale signs?
If a client isn't happy with a time estimate, don't do the work. If they think they can do it better or faster, tell them to go ahead.
The one thing I never allow is for my estimates to be modified. That's something that caught me out early on in my career but we learn our lessons.
If clients were that good at doing the work, they wouldn't be hiring me. I'd simply point out that they hired me for my expertise, so why are they disregarding it? Of course, if they were to allow the scope of the project to change (i.e., less work), that would be another matter, and one up for discussion.
If you didn't lock in exactly what they were meant to provide as part of the deal, then it's a "he says, she says" situation and, unfortunately, the customer controls the purse strings. However, often, the greatest power you can have is the ability to just walk away.
No-one says you have to do the job.
Of course, all that advice above is worth every cent you paid for it :-)
I don't know your specific circumstances.
Or maybe I shouldn't get involved in such projects from the start, and in that case, what are the telltale signs?
My answer for sure. If you can avoid those projects, do it.
Some signs: people thinking they know how to do things when you can tell they can't. The "oh no, let's not use this perfectly suitable tool because I don't know it" attitude is a major indicator that the person is technically challenged.
First of all, it is no fun to be in such an environment.
So, if you like to have fun at your job, and you do not need to take this job for pressing financial reasons, then simply do not take a job that is not fun.
Since that is hardly realistic in many cases, you will end up with the job and need to manage the situation as best you can. One way is to make sure there is a paper trail documenting your objections and concerns with the plan. Try not to be overtly negative, but try to be constructive and present valid alternatives. Here you will need to feel out the political landscape, determine if the 'boss' will be appreciative or threatened by your commentary, and act accordingly.
Many times there are other issues that management is dealing with that you are not aware of. Be cautious of this fact, and maybe ask the management team if this is the case, again without being condescending or negative.
Finally, if you have alternatives that take less time than the meetings it would take to discuss them, just try it in a sandbox, and show it off. This would go a long way to 'proving' your points. Caution here is that you could be accused of not being a team player, or of wasting resources, or not following direction. Make sure this is mitigated by doing these types of things on your own time, or after careful consideration of how long you are spending on these things as well as how vested your boss seems to be on the alternatives.
hth
I ran into the same problem with integration:
"Example: for the software to be integrated with their internal system, they needed to provide a web service for me to use... They maintain that it is my fault that the integration has taken longer than the time estimated."
Wow, very similar to what I was experiencing with a client. The best thing I can suggest is to keep good documentation. In the end that is what saved me. When it came to finger pointing, I had all of the emails and facts in order and was prepared to defend myself.
One thing I would suggest is to separate the target/goal from the estimate. I would not change my estimate unless it involved actually removing features, or something were revealed that would make the work easier. Tell them you will try to hit the target any way you can and that you care about the business goal; however, your estimate will not change. If it's getting nowhere and they are just dense, then smile and nod and take it if it's the only gig around.
I was just writing about this on my blog:
How to estimate the WRONG way

Nielsen's usability scale

Just wondering if anyone out there knows of a standard survey (preferably based on Jakob Nielsen's work on usability) that web admins can administer to test groups for usability?
I could just make up my own, but I feel there has got to be some solid research out there on the sort of judgments about tasks I should be asking users to make.
For example:
Q: Ask the user to find the profile page.
Do I...
A.) Present them with a standard Likert scale after each question
B.) Present them with the Likert scale after all the questions
...
Then, what should that Likert scale be? I know Nielsen's usability dimensions are learnability, efficiency of use, memorability, error rate, and satisfaction, but the only Likert scale I can imagine designing would effectively measure satisfaction... how am I supposed to ask a user to rank the memorability of a site on a 1-5 scale after one use? Surely someone has devised a good way to pose the question?
A few recommendations:
Don't determine your standard exclusively by listening to the users and waiting for their feedback. Nielsen says that rule #1 in usability is "Don't listen to users"; it's more important to watch them work.
Here is an FAQ regarding development of Likert questionnaires. I would err on the side of simplicity and brevity if you are going to ask users a list of questions after every task. There are advantages and disadvantages to both of the options you are considering. If you make a user wait until they have finished all of their tasks before they fill out a survey, they may not remember their initial difficulties with the interface as they adjust to its learning curve. On the other hand, if you ask them questions after each task, they may start rushing through the questionnaire as they get toward the end of the list of tasks. An extra option, depending on how many tasks you have, may be to have the user fill out a survey after every several tasks.
The University of Maryland HCI Laboratory maintains a Questionnaire for User Interaction Satisfaction, which is available for download and now on version 7.0. You may be able to use their survey, or at least tailor it for your use.
The short and easy System Usability Scale (SUS) has been found by Tullis and Stetson (2004) to psychometrically outperform other subjective scales including the renowned QUIS. Most SUS items seem related to learnability or memorability, along with a couple for efficiency. However, I wouldn’t try to break it into subscales; all items are highly intercorrelated suggesting this scale measures a single underlying construct.
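For reference, scoring the standard ten-item SUS is simple arithmetic: responses are on a 1-5 scale, odd-numbered (positively worded) items contribute (response - 1), even-numbered (negatively worded) items contribute (5 - response), and the total is multiplied by 2.5 to give a 0-100 score. A quick sketch, with one hypothetical participant's responses:
    def sus_score(responses):
        """Score the 10-item System Usability Scale (responses 1-5, items 1-10 in order)."""
        if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
            raise ValueError("SUS expects ten responses, each between 1 and 5")
        total = sum(r - 1 if i % 2 == 0 else 5 - r   # 0-based index: even index = odd item
                    for i, r in enumerate(responses))
        return total * 2.5

    print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))   # -> 85.0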
I would doubt you can get a scale to measure each of Nielsen’s dimensions separately. A user can tell you if a product is “hard” to use, but it’s much more difficult for them to break it down further. They know it took a lot of work to do something, but was it because they couldn’t figure out an easier way (learnability)? Or maybe they had learned a better way on a previous task, but forgot it (memorability)? Or is that just the way it has to be (efficiency)? Users are not going to have sufficient information to make the distinction.
If you are specifically interested in each of Nielsen’s dimensions separately, then assess them separately and directly. You can measure learnability crudely through recording the number of errors or time between clicks, and precisely by how many trials it takes for users to learn the normative interaction sequence. For efficiency, after you train users to do the normative interaction sequence, record how long it takes them to do it. You can also get a pretty good answer analytically using something like GOMS-KLM. For memorability, bring the same users in a week or so later and compare their performance to that of the efficiency-measuring trial.
Like nearly all subjective scales, the SUS is primarily useful for comparing the overall subjective experience of different products. It’s hard to know what to make out of a single score without something to compare it to. These scales won’t tell what specific problems a product has or why it has them (e.g., to help you determine improvements). For that, qualitative observation and debriefing your test participants is best.

What is the best practice for writing Selenium-based integration tests from scratch for a complex application?

I am after some advice and pointers on integration testing for a web app. Our project has been running for a number of years, and it is reasonably complex. We are pretty well covered with unit tests, but we are missing a decent set of integration tests. We don't have documented use cases or even a reasonable set of test cases beyond our unit tests. 'Integration testing' today consists of the developer's knowledge of the likely impact of a change and manual, ad-hoc testing of the app. It is really not ideal - we now want to design and automate a solid set of tests to allow us to perform regression testing, and increase our confidence in the quality of the app.
We have finally built a platform (based on Selenium) to allow us to quickly author and automate the execution of the tests. The problem now: we don't have any tests; the page is well and truly blank. The system has around 30 classes which interact with each other and influence the UI. For a new user signing up, there are about 40 properties that can be set, with each one impacting the experience. Over the user's lifetime they will generate even more states. Given so many variables and possible states, it is a daunting prospect to get started, which is probably why it has been neglected thus far.
The pain of not having a decent set of tests is now becoming destructive. I am dedicating time to getting this problem fixed, and I am after some practical advice on the authoring of the tests. How do you approach it? Do you have any links I may find useful? How can I stop my mind running away with the seemingly infinite number of states for a user's data? How can I flush out the edge cases which are failing (and which our users seem to be finding)?
If it is the sheer number of combinations that is holding you back from generating test cases, you should definitely take a look at all-pairs testing.
We have used PICT from Microsoft as a tool to successfully minimize the number of test cases while still being reasonably confident that most cases are covered.
The reasoning behind all-pairs testing is this: the simplest bugs in a program are generally triggered by a single input parameter. The next simplest category of bugs consists of those dependent on interactions between pairs of parameters, which can be caught with all-pairs testing. Bugs involving interactions between three or more parameters are progressively less common, whilst at the same time being progressively more expensive to find by exhaustive testing, which has as its limit the exhaustive testing of all possible inputs.
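PICT is a standalone command-line tool, but the underlying idea is easy to see in a few lines. Here is a naive greedy sketch of pairwise generation; it is fine for small models only, a real tool like PICT handles constraints and scale far better, and the parameters below are hypothetical:
    from itertools import combinations, product

    def all_pairs(parameters):
        """Greedy pairwise test-case generation.

        parameters maps each parameter name to its list of values. The result
        is a list of test cases (dicts) such that every pair of values from
        every two parameters appears in at least one case.
        """
        names = list(parameters)
        uncovered = set()
        for a, b in combinations(names, 2):
            for va, vb in product(parameters[a], parameters[b]):
                uncovered.add((a, va, b, vb))

        cases = []
        while uncovered:
            best_case, best_hits = None, -1
            # Exhaustively score every candidate case; acceptable for small models.
            for values in product(*(parameters[n] for n in names)):
                case = dict(zip(names, values))
                hits = sum(1 for (a, va, b, vb) in uncovered
                           if case[a] == va and case[b] == vb)
                if hits > best_hits:
                    best_case, best_hits = case, hits
            cases.append(best_case)
            uncovered = {p for p in uncovered
                         if not (best_case[p[0]] == p[1] and best_case[p[2]] == p[3])}
        return cases

    model = {
        "browser": ["Firefox", "Chrome", "IE"],
        "account": ["free", "premium"],
        "locale": ["en", "de", "fr"],
    }
    for case in all_pairs(model):
        print(case)
With three parameters of 3, 2 and 3 values, exhaustive testing needs 18 cases; pairwise coverage typically gets down to around 9, and the gap grows rapidly as you add parameters.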

So was that Data Structures & Algorithms course really useful after all?

I remember when I was taking Data Structures & Algorithms thinking "wtf, O(n)?" and wondering where I would ever use it other than in grad school, or unless you're a PhD like Bloch. Somehow uses for it do pop up in business applications, so I was wondering: when have you had to call on your Big-O skills to work out how to write an algorithm, which data structure did you pick to fit the problem, and did you ever have to actually create a new data structure (like your own implementation of a splay tree or trie)?
Understanding Data Structures has been fundamental to many of the projects I've worked on, and that goes beyond the ten minute song 'n dance one does when asked such a question in an interview situation.
Granted that modern environments with all sorts of collection classes can make light work of storing and accessing large amounts of data, but having an understanding that a particular problem is best solved with a particular data structure can be a great timesaver. And by "timesaver" I mean "the difference between something working and not working".
Honestly, being able to answer that stuff is my biggest criterion for taking interviewees seriously in an interview. Knowing how basic data structures work, basic O(n) analysis, and some light theory is really crucial to being able to write large applications successfully.
It's important in the interview because it's important in the job. I've worked with techs in the past that were self taught, without taking the data structures course or reading a data structures book, and their code is occasionally bad in ways they should have seen coming.
If you don't know that n² is going to run slowly compared to n log n, you've got more to learn.
As far as the latter half of a data structures course goes, it isn't generally applicable to most tech jobs, but if you ever do wind up needing it, you'll wish you had paid more attention.
Big-O notation is one of the basic notations used when describing algorithms implemented by a particular library. For example, all documentation on STL that I've seen describes various operations in terms of big-O, so naturally you have to e.g. understand the difference between O(1), O(log n) and O(n) to understand the implications of your choice of STL containers and algorithms. MSDN also does that for .NET classes, and IIRC Java documentation does that for standard Java classes. So, I'd say that knowing the notation is pretty much a requirement for understanding documentation of most popular frameworks out there.
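The same point in Python terms: a membership test is O(n) on a list but O(1) on average on a set, so the container you choose changes how the code scales. A tiny, unscientific timing sketch:
    import timeit

    n = 100_000
    as_list = list(range(n))
    as_set = set(as_list)
    probe = n - 1  # worst case for the list: the last element

    print("list:", timeit.timeit(lambda: probe in as_list, number=200))
    print("set: ", timeit.timeit(lambda: probe in as_set, number=200))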
Sure (even though I'm a humble MS in EE -- no PhD, no CS degree, unlike my colleague Joshua Bloch), I write a lot of stuff that needs to be highly scalable (or components that may need to be reused in highly scalable apps), so big-O considerations are almost always at work in my design (and it's not hard to take them into account). The data structures I use are almost always from Python's simple but rich supply (which I did lend a hand in developing ;-)); rarely is a totally custom one needed (rather than building on top of list, dict, etc.), but when it does happen (e.g., the bitvectors in my open source project gmpy), it's no big deal.
I was able to use B-trees right when I learned about them in my algorithms class (that was about 15 years ago, when there were far fewer open source implementations available). And even later, knowledge about the differences between e.g. container classes came in handy...
Absolutely: even though stacks, queues, etc. are pretty straightforward, it helps to have been introduced to them in a disciplined fashion.
B-trees and more advanced sorting algorithms are a bit more difficult, so learning them early was a big benefit, and I have indeed had to implement each of them at various points.
Finally, I created an algorithm for single-connected components a few years back that was significantly better than the one our signal-processing team was using, but I couldn't convince them it was better until I could show that it had O(n) complexity rather than O(n log n) (a linear-time sketch of the general idea follows below).
...just to name a few examples.
Of course, if you are content to remain a CRUD-system hacker with no real desire to do more than collect a paycheck, then it may not be necessary...
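The answer above doesn't show its algorithm, but as a rough illustration of why linear time is plausible for this kind of problem: connected components of an undirected graph fall out of a plain breadth-first traversal in O(V + E), with no sorting involved. The graph below is a made-up example:
    from collections import deque

    def connected_components(adjacency):
        """adjacency maps each node to an iterable of its neighbours."""
        seen, components = set(), []
        for start in adjacency:
            if start in seen:
                continue
            component, queue = [], deque([start])
            seen.add(start)
            while queue:
                node = queue.popleft()
                component.append(node)
                for neighbour in adjacency[node]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
            components.append(component)
        return components

    graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
    print(connected_components(graph))   # [[1, 2, 3], [4, 5], [6]]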
I found my knowledge of data structures very useful when I needed to implement a customizable event-driven system about ten years ago. That's the biggie, but I use that sort of knowledge fairly frequently in lesser ways.
For me, knowing the exact algorithms has been... nice as background knowledge. However, the thing that's been the most useful is the more general habit of paying attention to how different pieces of an algorithm interact. For instance, there can be places where moving one piece of code (e.g., outside a loop) can make a huge difference in both time and space.
It's less the specific knowledge the course taught and more that it acted like several years of experience: the course took variations that might take years of pure "real world experience" to encounter (and have drilled into you) and condensed them.
The title of your question asks about data structures and algorithms, but the body of your question focuses on complexity analysis, so I'll focus on that too:
There are lots of programming jobs where being able to do complexity analysis is at least occasionally useful. See What career can I hope for if I like algorithms? for some examples of these.
I can think of several instances in my career where either I or a co-worker have discovered a piece of code where the (usually time, sometimes space) complexity was higher than it should have been, e.g., something that was quadratic or cubic when it could have been linear or n log(n). Such code would work fine when given small inputs, but on larger inputs would quickly become really slow or consume all available memory. Knowing alternative algorithms and data structures, their complexities, and also how to analyze the complexity of new algorithms is vital in being able to correct these problems (or avoid them in the first place).
Networking is all I've used it for: in an implementation of the traveling salesman problem.
Unfortunately I do a lot of "line of business" and "forms over data" apps, so most problems I work on can be solved by hammering together arrays, linked lists, and hash tables. However, I've had the chance to work my data structures magic here and there:
Due to weird complex business rules, I worked on an application which used a custom thread pool implemented as a leftist-heap.
My dev team struggled to write a complex multithreaded app. It was plagued with race conditions, deadlocks, and lousy performance due to very fine-grained locking. We re-worked the code so that threads no longer shared mutable state, writing a very lightweight wrapper to facilitate message passing, and converted our linked lists and hash tables to immutable stacks and immutable red-black trees. After that we had no more problems with thread safety or performance, and the resulting code was immaculate and surprisingly readable.
Frequently, a business rules engine requires you to roll your own state machine, which is very naturally modelled as a graph where vertices are states and edges are transitions between states (a minimal sketch follows below).
If for no other reason, I'm glad I took the time to read about data structures and algorithms simply to be able to picture novel problems a little differently, especially combinatorial problems and graph problems. Graph theory is no longer a synonym for "scary".
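As promised above, a minimal sketch of a graph-shaped state machine of the kind a rules engine might use; the states and events here are hypothetical placeholders:
    # States are vertices; each (state, event) key is a labelled edge to the next state.
    TRANSITIONS = {
        ("draft", "submit"): "pending_approval",
        ("pending_approval", "approve"): "approved",
        ("pending_approval", "reject"): "draft",
        ("approved", "archive"): "archived",
    }

    def next_state(state, event):
        try:
            return TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} is not allowed in state {state!r}")

    state = "draft"
    for event in ("submit", "approve", "archive"):
        state = next_state(state, event)
    print(state)   # archived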

Not letting users make mistakes vs. giving them flexibility

I'm working on a product which is meant to be simple to use and simple to set up; the competition largely requires a long set-up period and, in some cases, goes as far as a bespoke solution for each customer. One part of our application is now expanding based on customer requests, and it looks like we'll need to make it very flexible so each customer can have a lot of control over how it behaves for them. The problem is that I don't want to make the system too configurable, as I believe that makes it more complex to learn and to work with. I'm also concerned it opens the door to people messing things up for themselves, kind of like handing them a gun, although I'm not actually pointing it at their foot for them.
Has anyone else faced a similar dilemma of putting power in users' hands? How did you solve it, and what was the result?
I don't normally like to subscribe to the idea that all users are stupid, but there is a rule which can still be applied:
If you give them the opportunity, they WILL break it
Now it is up to you whether or not to give them the ability to do potentially dumb things. Or, better yet, develop it so that when they do do the stupid voodoo that they do, it can be reverted or recovered from gracefully.
I highly recommend you read Joel's Controlling Your Environment Makes You Happy, which can be described as a treatise on user interface design but is really about usability with a healthy dab of psychology thrown in.
The section I'm referring to is Choices:
Every time you provide an option, you're asking the user to make a decision.
This is something I strongly agree with. Many developers, product managers and so on take the easy route: instead of figuring out what users actually need, they just give them a choice. You see this in enterprise bloatware like ClearCase or PVCS, where there are so many options (90% of which you'd never change) that the designers have clearly tried to make it all things to all men rather than doing one or two things exceptionally well.
Instead it just does lots of things badly.
Keep it simple, follow conventions, don't overwhelm the user with pointless and unnecessary choices and make the software behave like a normal user would expect. That alone would set you apart from an awful lot of other products.
Personally I like the TurboTax model (http://turbotax.intuit.com/). When creating a tax return, I get a simple, tell-me-like-I'm-five wizard that takes me step-by-step through the process, but I can step outside the process at any time and use more advanced features, returning to the process later.
Make it easy and simple and uncluttered for your user to do what they're going to do 80% of the time, but give them the power to deliberately step outside of the norm.
Interesting timing for your question. In the U.S. this is Income Tax week. Filling out the ol' 1040 and associated subforms should give us some sympathy for what users endure.
Lessons I take away are:
Only ask questions that relate to the user domain; avoid questions relating to the software system; and if you can derive the answer or suggest a most likely answer, do so.
Put related questions together (as long as they are normally entered by the same person using data most likely available at the same place and time, which is the definition of related for these purposes).
Make it support incremental input. It should be easy to enter the data they have, and defer completing it when the rest is available.
Show validity and completeness status. Make it clear and obvious how far they are from having complete, valid data.
Make it interruptible. Make sure it's possible to interrupt the process, leave the application, come back, and resume where they left off.
Yup, it's harder to program. Embrace it.
There are at least two ways to build a good software product:
Focus on a narrow set of functionality, and implement that functionality very well.
Design your system to be customizable (ideally, through scripting). If you do the base system right, it will be easy to provide the basic, no-options, just-do-what-I-want functionality on top of the customization layer (a small sketch follows below).
Unfortunately, there are many more ways to create a bad software product.
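To make the second option concrete, here is a tiny sketch of the shape it usually takes: the no-options behaviour is just a sensible default plugged into the same hook a power user can override. All names here are hypothetical.
    def default_formatter(record):
        # The sensible default that most users never see or change.
        return f"{record['name']}: {record['value']}"

    def export_report(records, formatter=default_formatter):
        """Most callers never pass a formatter; advanced users can script their own."""
        return "\n".join(formatter(r) for r in records)

    records = [{"name": "uptime", "value": "99.9%"}, {"name": "users", "value": 1432}]
    print(export_report(records))                                # default behaviour
    print(export_report(records, lambda r: r["name"].upper()))   # customized behaviour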
Your question implies that you can either provide a flexible solution OR make it foolproof.
I wouldn't put it like that. To me this is rather a matter of user expectations and the question in the first place would be:
How can I meet all important user expectations (even if they conflict with each other) without corrupting the application?
For instance, a web application which has a menu, breadcrumb navigation, a site map and a search offers, together with the inline links, five different ways to find what you're looking for and to get there.
That way most users can quickly and easily find the functionality they are expecting, and the need for extensive documentation might actually decrease.
So the answer might be to offer several different, carefully chosen ways to solve one specific task, while each of them can be streamlined independently to avoid user mistakes.
The answer with this lies in who your end-users are. I used to write software that got used by professional sports coaches. While these guys were definitely good at what they did, they were hardly proficient at computer use, so our configurability was kept to a minimum (at least as far as what could be done in the GUI).
On the other hand, if you're dealing with power users, adding options is usually not a bad thing as long as they aren't intrusive.
It's all about who's going to be getting them.
Read Jeff Atwood's Training Your Users. It's a great article with some very useful links.
I like the approach Firefox takes towards this. The basic options are accessible in the options menu; all the rest is under about:config. Thus you have an easy interface and incredible flexibility if you need it.
I've had great success, and been happiest as a user, when using sensible defaults. In other words, make the most common use case easy (or, even better, free), but give users the ability to step outside of that use case when the situation calls for it.