Code duplication metrics - Best practice - duplicates

When looking at code duplication metrics over a long period of time (>10 years), are there guidelines / best practices for what level of code duplication is "normal" or "recommended"?
I have difficulty with this question because if the code quality were great, nobody would need to maintain it, so who cares? But, in general terms, are there references on what is "normal", say, for a 10-line duplication threshold?
Is a duplication level of, say, X% unusual or normal? If it is normal, does that mean there are healthy, profitable projects out there with this level of duplication?
Perhaps the answer is a study that includes code duplication as one of the metrics measured against success / average / failure? Or perhaps people can share their experience of maintenance costs at a given level of code duplication?

In my opinion there is no general answer to this.
You should inspect every finding of the tool and decide whether it is a false positive, justified (maybe some standard coding pattern or generated code), or genuinely problematic copy-pasted code that should be refactored and extracted into its own function/module/whatever.
In the case of a false positive or intended duplication, there should be a tool-specific way to suppress the warning for that occurrence, or more generally for a specific pattern.
That way, future runs will always warn you when new duplicates are found (or when older, unfixed true positives remain).
For false positives, consider filing a bug report with the tool's author (crafting the smallest possible code sample that still triggers the warning).

Related

Write programs that do one thing and do it well

I can grasp the part "do one thing" via encapsulation, Dependency Injection, Principle of Least Knowledge, and You Ain't Gonna Need It; but how do I understand the second part "do it well?"
An example given was the notion of completeness, given in the same YAGNI article:
for example, among features which allow adding items, deleting items, or modifying items, completeness could be used to also recommend "renaming items".
However, I found reasoning like that could easily be abused into feature creep, thus violating the "do one thing" part.
So, what is a litmus test for deciding whether a feature belongs to the "do it well" category (and hence should be included in the function/class/program) or falls foul of the "do one thing" category (and hence should be excluded)?
The first part, "do one thing," is best understood via UNIX's ls command as a counterexample: its inclusion of an excessive number of flags for formatting its output should have been delegated entirely to an external program. But I don't have a good example for the second part, "do it well."
What is a good example where removing any further feature would make it not "do it well?"
I see "Do It Well" as being as much about quality of implementation of a function than about the completeness of a set functions (in your example having rename, as well as create and delete).
Do It Well manifests in many ways, some ways of thinking:
Behaviour in response to "special" inputs. Example, calculating the mean of some integers:
int mean(int[] values) { ... }
What does this do if the array has zero elements? If the items total more than MAX_INT? (See the sketch after this list.)
Performance Characteristics. Has sufficient attention been given to behaviour as the data volumes increase?
Dependency Failures. If our implementation depends upon other modules or infrastructure, what happens when these fail? Examples: file system full, database down.
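Taking the mean example literally, here is a minimal C sketch (the signature and the return convention are my assumptions, since the original snippet is only a stub) that makes the behaviour for those special cases explicit:

#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature: returns false instead of a number when no
 * meaningful mean exists (empty array) or when the sum would overflow. */
bool mean(const int values[], size_t count, long long *result)
{
    if (values == NULL || result == NULL || count == 0)
        return false;

    long long sum = 0;
    for (size_t i = 0; i < count; i++) {
        /* check for overflow before adding */
        if ((values[i] > 0 && sum > LLONG_MAX - values[i]) ||
            (values[i] < 0 && sum < LLONG_MIN - values[i]))
            return false;
        sum += values[i];
    }
    *result = sum / (long long)count;
    return true;
}

The exact policy matters less than the fact that it is decided and documented rather than left to accident.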
Concerning feature creep itself, I think you're correct to identify a tension here. One thing you might consider: you don't need to implement every feature, provided it's pretty obvious that a feature can be added easily without a complete rewrite.
The whole purpose of this advice is to make you favor quality over quantity.
The concept of one thing is subjective and depends on granularity. Would you say that a spreadsheet application does more than one thing if it can also print, or is that part of that one thing?
The point is that you should make sure that any feature, and the application itself, is done and will delight customers before you scramble to add new features.
I think your question points out the fundamentally organic nature of feature creep, and in understanding that nature, you will be empowered to meditate on the larger question.
Think of it like a garden: If you plant one thing and plant it well, say, a chrysanthemum, you aren't done at simply planting the seed. In fact you'll need to ensure that the soil is well tended, that the area is sufficiently protected, that the season is right, etc.
As your chrysanthemum (your one thing) grows, so too will other competitive plants - some that need to be weeded out and others that may actually complement the original one thing. In fact, these other organisms may in some cases prove vital for the survival of your one thing.
Like those features that YAGN, a bit of vigilance is required to determine which weeds represent feature creep and which represent vital and complementary functions.
Regardless, having done it well means simply that your chrysanthemum is hardy, healthy, and on time. :-)
I would say an email program without the ability to add attachments would be a good example.
This may sound like an odd example, but I'd say dropbox is a good, albeit complex example.
It's managed to beat off a swathe of similar competing apps through a dedication to simplification and a lack of the feature creep that, as you mentioned, would violate the 'do one thing' principle. The app lets you store documents in a folder that you can access anywhere, and that's about the limit of it. They drilled down to the core problem and solved it in a way that works perfectly well in 90+% of cases.
It's hard to put a hard and fast rule to it, but I'd say that catering to around the 90% majority of use cases and ignoring 'fringe requirements' is the best way to stick to this rule.
I'd guess 90+% of ls use is with no arguments or maybe two or three of the most popular. The 'do it well' principle should focus on what the majority of users need, instead of catering for power users or fringe cases, as ls does with its plethora of options.
This is what dropbox does successfully and why it is pretty well agreed upon as an example of good application design.

Being pressured to GOTO the dark-side

We have a situation at work where developers working on a legacy (core) system are being pressured into using GOTO statements when adding new features into existing code that is already infected with spaghetti code.
Now, I understand there may be arguments for using 'just one little GOTO' instead of spending the time on refactoring to a more maintainable solution. The issue is, this isolated 'just one little GOTO' isn't so isolated. At least once every week or so there is a new 'one little GOTO' to add. This codebase is already a horror to work with due to code dating back to or before 1984 being riddled with GOTOs that would make many Pastafarians believe it was inspired by the Flying Spaghetti Monster itself.
Unfortunately the language this is written in doesn't have any ready made refactoring tools, so it makes it harder to push the 'Refactor to increase productivity later' because short-term wins are the only wins paid attention to here...
Has anyone else experienced this issue whereby everybody agrees that we cannot keep adding new GOTOs to jump 2000 lines to a random section, but analysts continually insist on doing it just this one time and management approves it?
tldr;
How can one go about addressing the issue of developers being pressured (forced) to continually add GOTO statements (by add, I mean add to jump to random sections many lines away) because it 'gets that feature in quicker'?
I'm beginning to fear we may lose valuable developers to the raptors over this...
Clarification:
Goto here
alsoThere: No, I'm talking about the kind of goto that jumps 1000 lines out of one subroutine into another one mid way into a while loop. Goto somewhereClose
there: I wasn't even talking about the kind of gotos you can reasonably read over and determine what a program was doing. Goto alsoThere
somewhereClose: This is the sort of code that makes meatballs midpoint: If first time here Goto nextpoint detail:(each one almost completely different) Goto pointlessReturn
here: In this question, I was not talking about the occasionally okay use of a goto. Goto there
tacoBell: and it has just gone back to the drawing board. Goto Jail
elsewhere: When it takes analysts weeks to decipher what a program is doing each time it is touched, something is deeply wrong with your codebase. In fact, I'm actually up to my hell:if not up-to-date goto 4 rendition of the spec goto detail pointlessReturn: goto tacoBell
Jail: Actually, just a small update with a small victory. I spent 4 hours refactoring a portion of this particular program a single label at a time, saving each iteration in svn as I went. Each step (about 20 of them) was smallish, logical and easy enough to goto bypass nextpoint: spontaneously jump out of your meal and onto you screen through some weird sort of spaghetti-meatball magnetism. Goto elseWhere
bypass: reasonably verify that it should not introduce any logic changes. Using this new more readable version, I've sat down with the analyst and completed almost all of this change now. Goto end
4: first *if first time here goto hell, no second if first time here goto hell, no third if first time here goto hell fourth now up-to-date goto hell
end:
How many bugs have been introduced because of incorrectly written GOTOs? How much money did they cost the company? Turn the issue into something concrete, rather than "this feels bad". Once you can get it recognized as a problem by the people in charge, turn it into a policy like, "no new GOTOs for anything except simplifying the exit logic for a function", or "no new GOTOs for any functions that don't have 100% unit test coverage". Over time, tighten the policies.
GOTOs don't by themselves turn good code into spaghetti; there are a multitude of other factors involved. This Linux kernel mailing list discussion can help put a few things into perspective (comments from Linus Torvalds about the bigger picture of using gotos).
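For contrast with the question's 2000-line jumps, here is a small C sketch of the kind of goto defended in that thread: forward jumps to a single cleanup/exit block at the end of one function (the function and resource names here are made up for illustration):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example: goto used only as a forward "exit ramp"
 * so every failure path releases exactly what it acquired. */
int process_file(const char *path)
{
    int rc = -1;
    char *buffer = NULL;
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        goto out;                 /* nothing acquired yet */

    buffer = malloc(4096);
    if (buffer == NULL)
        goto out_close;           /* only the file needs closing */

    if (fread(buffer, 1, 4096, fp) == 0)
        goto out_free;            /* treat an empty read as failure here */

    rc = 0;                       /* success: fall through the cleanup */

out_free:
    free(buffer);
out_close:
    fclose(fp);
out:
    return rc;
}

The jumps are local, always forward, and land on labels whose only job is releasing resources, which is a very different animal from jumping into the middle of another subroutine's loop.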
Trying to institute a "no goto policy" just for the sake of not having gotos will not achive anything in the long run, and will not make your code more maintainable. The changes will need to be more subtle and focus on increasing the overall quality of the code, think along the lines of using best practices for the platform/language, unit test coverage, static analysis etc.
The reality of development is that despite all the flowery words about doing it the right way, most clients are more interested in doing it the fast way. The concept of a code base rapidly moving towards the point of imploding and the resulting fallout on their business is something that they cannot comprehend because that would mean having to think beyond today.
What you have is just one example. How you stand on this will dictate how you do development in the future. I think you have 4 options:
Accept the request and accept that you will always be doing this.
Accept the request, and immediately start looking for a new job.
Refuse to do it and be prepared to fight to fix the mess.
Resign.
Which option you choose is going to depend on how much you value your job and your self esteem.
Maybe you can use the Boy Scout rule: leave the place a little cleaner than you found it. So for every feature request: don't introduce new gotos, but remove one.
This doesn't take much time away from improvements and leaves more time to find newly introduced bugs. It may not cost much additional time at all if you remove a goto from the part of the code you already had to spend time understanding in order to bring the new feature in.
Argue, that a refactoring of 2 hours will save 20 times 15 minutes in the future, and allow faster and deeper improvements.
Back to principles:
Is it readable?
Does it work?
Is it maintainable?
This is the classic "management" vs. "techies" conflict.
In spite of being on the "techie" team, over the years I have settled firmly the "management" side of this debate.
There are a number of reasons for this:
It's quite possible to have well written, easy to read, properly structured programs with gotos! Ask any assembler programmer; a conditional branch and a primitive do loop are all they have to work with. Just because the "style" is not current doesn't mean it's not well written.
If it is a mess, then it's going to be a real pain to extract the business rules you will need if you are going for a complete re-write; or, if you are doing a technical refactoring of the program, you will never be sure the behaviour of the refactored program is "correct", i.e. that it does exactly what the old program does.
Return on investment -- sticking to the original programming style and making minimal changes will take minimum effort and quickly satisfy the customer's request. Spending a lot of time and effort refactoring will be more expensive and take longer.
Risk -- rewrites and refactoring are hard to get right; extensive testing of the refactored code is required, and things that look like "bugs" may have been "features" as far as the customer is concerned. A particular problem with "improving" legacy code is that business users may have well-established workarounds that depend on a bug being there, or may exploit the existence of a bug to change business procedures without informing the IT department.
So all in all management is faced with a decision -- "add one little goto" test the change and get it into production in double quick time with little risk -- or -- go in for a major programming effort and have the business scream at them when a new set of bugs crops up.
And if you do decide to refactor in five years time some snotty nosed college graduate will be moaning that your refactored program is not buzzword compliant any more and demanding he be allowed to spend weeks rewriting it.
If it ain't broke, don't fix it!
PS: Even Joel thinks this is a bad idea: things you should never do
Update!-
OK if you want to refactor and improve the code you need to go about it properly.
The basic problem is that you are saying to the client: "I want to spend n weeks working on this program and, if everything goes well, it will do exactly what it does now." That is a hard sell, to say the least.
You need to gather long-term data on the number of crashes and outages, the time spent analysing and programming seemingly small changes, the number of change requests that were not done because they were too hard, and business opportunities lost because the system could not change fast enough. Also gather some academic data on the costs of maintaining well-structured programs vs. letting them sink.
Unless you have a watertight case to present to the beancounters, you will not get the budget. You really have to sell this to the business management, not your immediate bosses.
I recently had to work on some code that wasn't legacy per se, but the former developers' habits certainly were, and thus GOTOs were everywhere. I don't like GOTOs; they make a hideous mess of things and make debugging a nightmare. Worse yet, replacing them with normal code is not always straightforward.
Even if you can't unwind your existing GOTOs, I certainly recommend that you stop adding new ones.
Unfortunately the language this is written in doesn't have any ready made refactoring tools
Don't you have editors with macro capabilities? Don't you have shell scripts? I've done tons of refactoring over the years, very little of it with refactoring browsers.
The underlying problem here seems to be that you have 'analysts' who describe, presumably necessary, functional changes in terms of adding a goto to some code. And then you have 'programmers' whose job appears to be limited to typing in that change and then complaining about it.
To make anything different, that dysfunctional allocation of responsibilities between those people needs to change. There are a lot of different ways to split up the work. The classic, conventional one (which is quite likely the official, but ignored, policy at your work) is to have analysts write an implementation-independent specification document and programmers implement it as maintainably as they can.
The problem with that theoretical approach is that it is unworkably unrealistic in many common situations. In particular, doing it 'properly' requires employees seen as junior to win an argument with colleagues who are more senior, have better social connections in the office, and are more business-savvy. If the argument 'goto is not implementation-independent, so as an analyst you simply can't say that word' doesn't fly at your workplace, then presumably this is the case.
Far better in many circumstances are alternatives like:
Analysts write client-facing code and developers write infrastructure.
Analysts write executable specifications which are used as reference implementations for unit tests.
Developers create a domain specific language in which analysts program.
You drop the distinction between one and the other, having only a hybrid role.
If a change to the program requires "just one little goto" I would say that the code was very maintainable.
This is a common problem when dealing with legacy code: do you stick to the style the program was originally written in, or do you "modernize" the code? For me the answer is usually to stick with the original style unless you have a really big change that would justify a complete re-write.
Also be aware that several "modern" language features, like Java's "throw" statement or SQL's "ON ERROR", are really GOTOs in disguise.

How to prioritize bugs?

In my current company there isn't a clear understanding between the test and development teams as to how severe a bug should be. There are arguments which go back and forth about whether to reduce or to increase the severity. We are not currently aware of any document which lays out the rules. The tester raises the bug and assigns priority based on his intuition. The developer then requests a change based on his workload or some other factor.
How are severity/priority of bugs classified? Are there any standards which guide how software defect priorities need to be determined based on customer needs, timelines and other things?
Use priority levels that deliberately have nothing to do with severity or impact, and describe only the conceptual position of the bug in the schedule. This field will determine which bugs get worked on, so it will be a target for negotiation.
Use severity levels that deliberately have concrete, verifiable definitions not open to negotiation, that have nothing to do with scheduling or priority. I've worked successfully with the severity definitions used by the Debian BTS, generalised to apply to programming projects in general.
That way, the severity is much more a matter of verifiable fact, independent of a statement of priority. The priority is then free to be tweaked up and down by negotiation or whatever, without affecting the factual information in the severity field.
Attempting to conflate both “severity” and “priority” into a single field will lead to soul-draining arguments and wasted time. The bug reporter needs a firm guide of fact to determine how “bad” the bug is, and this needs to be easily agreed on by independent parties. The priority, on the other hand, is the correct target for negotiation and scheduling games.
I work on emergency control centre systems, so this set of bug levels is a little, well... extreme:
someone dies
total system failure requiring DR invocation
server failure requiring engineer response
failure involving loss of call continuity
failure involving loss of data
incorrect data recorded
application failure - non-recoverable
application failure - non-recoverable, but automatically restarted
does not meet requirement spec, no workaround
does not meet requirement spec, but has workaround
cosmetic - layout etc.
actually a feature request
That's off the top of my head. In case you were wondering, it's from most extreme to least :-)
Some stuff we used before. We split the defect rating into priority and severity.
Severity (set by submitter during submission of defect)
Highest (5): Data loss, hardware damage possible, or a security-related failure
High (4): Loss of functionality without any reasonable workaround
Medium (3): Loss of functionality with a reasonable workaround
Low (2): Partial loss of a function or a feature set (feature still hits the design requirements)
Lowest (1): A cosmetic error
Priority (adjusted by development, management and QA during defect evaluation)
Highest (5): The system is practically unusable with this defect.
High (4): The defect will have a serious impact on the company’s ability to sell and maintain this system.
Medium (3): The company will lose some money if this defect is in the system, but it might be more important to meet the schedule. Fix after release.
Low (2): Do not delay the release, but do fix this problem afterwards.
Lowest (1): Fix as time and resources allow.
Both numbers together create a risk priority number (RPN). Simply multiply severity with priority. Higher result means higher risk. 25 defines the ultimate defect bomb. 1 can be done during idle time or if someone is bored and needs something to do.
First goal: Defects with a rating of highest or high of any kind should be fixed before release.
Second goal: Defects with RPN > 8 should be fixed before releasing the product.
This is of course a little bit artificial, but it helps to give all parties (Support, QA/Test, Engineering, and Product Managers) a tool to set priorities without blowing away the opinion of the other side.
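A minimal C sketch of that arithmetic and of the two release goals (the struct and function names are mine; the thresholds are simply the ones stated above):

/* Severity and priority both range from 1 (lowest) to 5 (highest). */
typedef struct {
    int severity;   /* set by the submitter */
    int priority;   /* set during defect evaluation */
} defect;

/* Risk Priority Number: severity times priority.
 * 25 is the ultimate defect bomb, 1 is idle-time work. */
int rpn(const defect *d)
{
    return d->severity * d->priority;
}

/* Release gate combining the two goals above: anything rated high or
 * highest in either dimension, or with RPN > 8, blocks the release. */
int must_fix_before_release(const defect *d)
{
    return d->severity >= 4 || d->priority >= 4 || rpn(d) > 8;
}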
Replace your bug tracking system with FogBugz and get rid of the severity field altogether.
See Priority vs Severity
"Been there done that".
I've had this discussion over and over again, on different projects. We've tried to combine priority with severity, but the lesson I've learned is: do not combine severity with priority!
We've had a lot of brainstorms and meetings which ended with the words "this is it". Multiple guideline documents have been created and spread between the different "parties", but after a while we discovered that it didn't work in the end. Different "parties" think differently about bugs: our helpdesk has a different understanding of priority than the development team or the sales team.
Having both a severity and a priority level will very quickly become very confusing because:
when using numbers (between 1 and 5) one will not know what each number means
what if an issue has the highest possible priority, but the lowest possible severity - and I'm sure that this will happen!
what if someone reduces a severity, does he need to reduce the priority also?
"So what should you do then?":
Only use one kind of indicator for the 'level' of an issue: Doesn't matter how you call it.
Use numbers (eg. 1 - 5, but it could be more or fewer depending on your needs) to clearly indicate the importance, but combine them with a keyword so that it's clear what they mean (eg. 'nice to have', 'show stopper'). For some people priority 1 means the most important, for others 5 does -> therefore a keyword to indicate what a number means is necessary.
Make a distinction between a 'normal issue' or a 'red alert'. In our case a 'Red Alert' must be solved immediately and put in production immediately. A normal issue will follow the normal development-test-deployment-flow. The priority/severity/however-how-you-call-it should only be set for normal issues and will be ignored for 'red alerts'.
In practice, a 'Red Alert' can become a 'Normal Issue': the support team discovered a major bug and created a 'Red Alert'. But after some investigation we discovered that data had become 'corrupt' in the database, since it was inserted there directly and not via the application.
Choose a good tool that allows you to customize the flow; but most tools do.
As for a standard, there is the IEEE guide to classification for software anomalies (IEEE 1044.1-1995), although I am not sure how widely it is adopted.
One option is to have the product owner determine the priority of the bug. While there is some general intuition about how "bad" a bug is, it can be the responsibility of the owner of the product to set an order of precedence (i.e. bug A should be fixed before bug B, etc.).
The more clear and concise information that can be provided to the product owner, the easier it is for that individual to make those determinations (e.g. how many users have experienced the bug, what features are not available as a result of the bug, etc.).
Must be done now
Must be done before we ship
Minor annoyance (Doesn't prevent the user from exercising the functionality)
Edge case/Remote/Tester-from-Mordor scenario
Well, I just made that up... my point being that categorizing bugs should not be a weekly hour-plus-long ritual.
IMHO, prioritizing according to a flowchart is wasted time. Fix bugs in Cat#1 and #2 as quickly as they surface. If you find yourself swamped by bugs, slow down and reflect. Defer Cat#3 and Cat#4 if the schedule doesn't permit them or higher priority items override.
The critical thing is that all of you have a shared understanding of this severity and expected quality. Don't let compliance to the holy standards of X slow you down from delivering what the customer wants... working software.
Personally I favour the two-tier severity/priority model. I know the arguments for a single level, but in the places I've worked I've generally seen a two-level hierarchy work better.
Severity is set by the support team (based on input from the client). Priority is set by the client (with input from the support team).
For severity I use:
1 - Blocker/show stopper
2 - Major functionality unavailable (or effectively unavailable), no practical work around possible
3 - Major functionality unavailable (or ...), work around possible
4 - Minor functionality unavailable (or effectively unavailable), no work around possible
5 - Minor functionality unavailable (or ...), work around possible
6 - Cosmetic or other trivial
Then for priority I just use High, Medium, Low but anything from 3 - 5 levels works (much more than that is just over the top).
I'd generally then order by Priority first and then severity within that. The important thing about this is that the client has the most important say. If they say the way their logo is printing out on a report is the highest priority then that's what gets looked at BUT it gets looked at after the other client's high priority which is stopping them logging in.
Generally speaking I wouldn't release with any high priority issues or any medium priority issues with severity 1 - 4. Obviously in an ideal world you'd fix everything but I've never been lucky enough to have that option.
The tester tells what is broken
The developer estimates how much work it will be to fix
The customer decides the business value, i.e. the priority.
Set the requirements of the project so you can base the priority of a fix on the priority of the requirements affected by the bug.
I had the same issue with one of our customers. In the end we set up a document together describing what kind of bugs would map to a certain severity. Aside from an occasional discussion, using this document as a guideline appears to work.
But be well aware that test teams and development teams may have very different opinions on what is a severe bug and what is not. From the point of view of the testers a small layout bug can be high priority when a developer would just say that no one will notice.
In our document those bugs can be high priority if they are "brand damaging", i.e. if the layout bug is in the logo or one of the products then it is severe - if it's just a paragraph on the page that is 2 pixels off then it's not.
I use the following categories both for features and bugs:
Showstopper, the program (or a major feature) will not work
Must have, a significant part of the customers will be bothered by this
Would have, some customers will be bothered
Nice to have, a few customers want this
Normally you plan to fix 1, 2 and 3, but 3 is often postponed to a next release due to time constraints.
I think this is the scale we used at a previous job:
Causes loss of files or system instability.
Crashes the program.
Feature doesn't work.
Feature doesn't work, but there are workarounds.
Cosmetic issue.
Request for enhancement.
Sometimes this was abused - if a feature was so poorly designed that someone couldn't figure out how to use it, that was classified as a 6, and it never got fixed.
I agree with the FogBugz folks that this should be kept super simple: http://fogbugz.stackexchange.com/questions/352/priority-vs-severity
I made up this scheme, which I find easy to remember:
pS: seconds matter, eg, server is on fire
pM: minutes matter, eg, something is broken
pH: hours matter, ie, don't go to bed till this is done
pd: days matter, ie, normal priority
pw: weeks matter, ie, lower priority
pm: months matter, ie, no hurry
py: years matter, ie, maybe/someday, ie, wishlist
It roughly parallels Debian's scheme: http://www.debian.org/Bugs/Developer#severities
I like it because it straightforwardly combines priority and severity into a single field that's easy to set a value for.
PS: You can also pick intermediate urgencies like "pMH" for in between "minutes matter" and "hours matter". Or "pHd" is in between "hours matter" and "days matter" -- roughly, "don't literally pull an all-nighter for it, but don't work on anything else till it's done".

What code metric(s) convince you that provided code is "crappy"? [closed]

Code lines per file, methods per class, cyclomatic complexity and so on. Developers resist and work around most if not all of them! There is a good Joel article on it (no time to find it now).
What code metric(s) do you recommend for automatically identifying "crappy code"?
What can convince most developers (you can't convince all of us of some crappy metric! :O)) that their code is "crap"?
Only metrics that can be automatically measured count!
Not an automated solution, but I find WTF's per minute useful.
(image: the "WTFs per minute" code review cartoon, source: osnews.com)
For me, metrics regarding coding style are not what such a warning should be based on. It is about static analysis of the code, which can truly be 'on' all the time:
cyclomatic complexity (detected by checkstyle)
dependency cycle detection (through findbugs for instance)
critical errors detected by, for instance, FindBugs.
I would put test coverage in a second step, as such tests can take time.
Do not forget that "crappy" code is not detected by metrics alone, but by the combination and evolution (as in "trend") of metrics: see the "What is the fascination with code metrics?" question.
That means you do not just have to recommend code metrics to "automatically identify crappy code"; you also have to recommend the right combination and trend analysis to go along with those metrics.
On a side note, I do share your frustration ;), and I do not share the point of view of tloach (in the comments of another answer): "Ask a vague question, get a vague answer", he says... your question deserves a specific answer.
Number of warnings the compiler spits out when I do a build.
Number of commented out lines per line of production code. Generally it indicates a sloppy programmer that doesn't understand version control.
Developers are always concerned with metrics being used against them and calling "crappy" code is not a good start. This is important because if you are worried about your developers gaming around them then don't use the metrics for anything that is to their advantage/disadvantage.
The way this works best is: don't let the metric tell you where the code is crappy, but use the metric to determine where you need to look. You look by having a code review, and the decision of how to fix the issue is between the developer and the reviewer. I would also err on the side of the developer against the metric. If the code is still popping up on the metric but the reviewers think it is good, leave it alone.
But it is important to keep in mind this gaming effect when your metrics start to improve. Great, I now have 100% coverage but are the unit tests any good? The metric tells me I am ok, but I still need to check it out and look at what got us there.
Bottom line, the human trumps the machine.
number of global variables.
Non-existent tests (revealed by code coverage). It's not necessarily an indicator that the code is bad, but it's a big warning sign.
Profanity in comments.
Metrics alone do not identify crappy code. However they can identify suspicious code.
There are a lot of metrics for OO software. Some of them can be very useful:
Average method size (both in LOC/Statements or complexity). Large methods can be a sign of bad design.
Number of methods overridden by a subclass. A large number indicates bad class design.
Specialization index (number of overridden methods * nesting level / total number of methods). High numbers indicate possible problems in the class diagram.
There are a lot more viable metrics, and they can be calculated using tools. This can be a nice help in identifying crappy code.
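For instance, the specialization index above is easy to compute once you have the three counts; a tiny C sketch (the names are mine) that also makes the grouping of the formula explicit:

/* Specialization index for one class, as defined above:
 * (number of overridden methods * nesting level) / total number of methods.
 * Guard against classes with no methods to avoid dividing by zero. */
double specialization_index(int overridden_methods, int nesting_level, int total_methods)
{
    if (total_methods <= 0)
        return 0.0;
    return (double)(overridden_methods * nesting_level) / total_methods;
}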
global variables
magic numbers
code/comment ratio
heavy coupling (for example, in C++ you can measure this by looking at class relations or the number of cpp/header files that cross-include each other)
const_cast or other types of casting within the same code-base (not w/ external libs)
large portions of code commented-out and left in there
My personal favourite warning flag: comment free code. Usually means the coder hasn't stopped to think about it; plus it automatically makes it hard to understand, so ups the crappy ratio.
At first sight: cargo cult application of code idioms.
As soon as I have a closer look: obvious bugs and misconceptions by the programmer.
My bet: combination of cyclomatic complexity (CC) and code coverage from automated tests (TC).
CC | TC
2 | 0% - good anyway, cyclomatic complexity too small
10 | 70% - good
10 | 50% - could be better
10 | 20% - bad
20 | 85% - good
20 | 70% - could be better
20 | 50% - bad
...
crap4j - possible tool (for java) and concept explanation ... in search for C# friendly tool :(
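The crap4j concept combines exactly these two numbers into one score. Here is a rough C sketch of the published C.R.A.P. formula as I recall it (treat the exact formula and the commonly cited threshold of around 30 as assumptions to verify against the crap4j documentation):

#include <stdio.h>

/* C.R.A.P. score for one method, as used by crap4j (my transcription):
 * comp = cyclomatic complexity, cov = test coverage in percent (0-100).
 * CRAP = comp^2 * (1 - cov/100)^3 + comp */
double crap_score(double comp, double cov)
{
    double uncovered = 1.0 - cov / 100.0;
    return comp * comp * uncovered * uncovered * uncovered + comp;
}

int main(void)
{
    /* Mirrors the table above: high complexity with low coverage scores
     * far above the commonly cited "crappy" threshold of around 30... */
    printf("CC=20, TC=50%%: %.1f\n", crap_score(20, 50));
    /* ...while the same complexity with good coverage stays near 21. */
    printf("CC=20, TC=85%%: %.1f\n", crap_score(20, 85));
    return 0;
}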
Number of worthless comments to meaningful comments:
'Set i to 1'
Dim i as Integer = 1
I don't believe there is any such metric. With the exception of code that actually doesn't do what it's supposed to (which is a whole extra level of crappiness) 'crappy' code means code that is hard to maintain. That usually means it's hard for the maintainer to understand, which is always to some extent a subjective thing, just like bad writing. Of course there are cases where everyone agrees the writing (or the code) is crappy, but it's very hard to write a metric for it.
Plus everything is relative. Code doing a highly complex function, in minimal memory, optimized for every last cycle of speed, will look very bad compared with a simple function under no restrictions. But it's not crappy - it's just doing what it has to.
Unfortunately there is not a metric that I know of. Something to keep in mind is no matter what you choose the programmers will game the system to make their code look good. I have seen that everywhere any kind of "automatic" metric is put into place.
A lot of conversions to and from strings. Generally it's a sign that the developer isn't clear about what's going on and is merely trying random things until something works. For example, I've often seen code like this:
object num = GetABoxedInt();
// long myLong = (long) num; // throws exception
long myLong = Int64.Parse(num.ToString());
when what they really wanted was:
long myLong = (long)(int)num;
I am surprised no one has mentioned crap4j.
Watch out for ratio of Pattern classes vs. standard classes. A high ratio would indicate Patternitis
Check for magic numbers not defined as constants
Use a pattern matching utility to detect potentially duplicated code
Sometimes, you just know it when you see it. For example, this morning I saw:
void mdLicense::SetWindows(bool Option) {
_windows = (Option ? true: false);
}
I just had to ask myself 'why would anyone ever do this?'.
Code coverage has some value, but otherwise I tend to rely more on code profiling to tell if the code is crappy.
Ratio of comments that include profanity to comments that don't.
Higher = better code.
Lines of comments / Lines of code
value > 1 -> bad (too many comments)
value < 0.1 -> bad (not enough comments)
Adjust numeric values according to your own experience ;-)
I take a multi-tiered approach with the first tier being reasonable readability offset only by the complexity of the problem being solved. If it can't pass the readability test I usually consider the code less than good.
TODO: comments in production code. Simply shows that the developer does not execute tasks to completion.
Methods with 30 arguments. On a web service. That is all.
Well, there are various ways to point out whether or not code is good code. Following are some of them:
Cohesion: If a block of code, whether a class or a method, is found to be serving multiple pieces of functionality, then it is low in cohesion. Code that is low in cohesion tends to be low in reusability and, in turn, lower in maintainability.
Code complexity: One can use McCabe cyclomatic complexity (the number of decision points) to determine code complexity. High code complexity indicates code with low usability (difficult to read and understand).
Documentation: Code without enough documentation can also contribute to low software quality from the perspective of the usability of the code.
Check out the following page to read about a checklist for code review.
This hilarious blog post on The Code C.R.A.P Metric could be useful.

How to measure usability to get hard data?

There are a few posts on usability but none of them was useful to me.
I need a quantitative measure of usability of some part of an application.
I need to estimate it in hard numbers to be able to compare it with future versions (e.g. for reporting purposes). The simplest way is to count clicks and keystrokes, but this seems too simple (for example, is the cost of filling a text field simply the sum of typing all the letters? - I guess it is more complicated).
I need some mathematical model for that so I can estimate the numbers.
Does anyone know anything about this?
P.S. I don't need links to resources about designing user interfaces. I already have them. What I need is a mathematical apparatus to measure existing applications interface usability in hard numbers.
Thanks in advance.
http://www.techsmith.com/morae.asp
This is what Microsoft used in part when they spent millions redesigning Office 2007 with the ribbon toolbar.
Here is how Office 2007 was analyzed:
http://cs.winona.edu/CSConference/2007proceedings/caty.pdf
Be sure to check out the references at the end of the PDF too, there's a ton of good stuff there. Look up how Microsoft did Office 2007 (regardless of how you feel about it), they spent a ton of money on this stuff.
Your main ideas to approach in this are Effectiveness and Efficiency (and, in some cases, Efficacy). The basic points to remember are outlined on this webpage.
What you really want to look at doing is 'inspection' methods of measuring usability. These are typically more expensive to set up (both in terms of time, and finance), but can yield significant results if done properly. These methods include things like heuristic evaluation, which is simply comparing the system interface, and the usage of the system interface, with your usability heuristics (though, from what you've said above, this probably isn't what you're after).
More suited to your use, however, will be 'testing' methods, whereby you observe users performing tasks on your system. This is partially related to the point of effectiveness and efficiency, but can include various things, such as the "Think Aloud" concept (which works really well in certain circumstances, depending on the software being tested).
Jakob Nielsen has a decent (short) article on his website. There's another one, but it's more related to how to test in order to be representative, rather than how to perform the testing itself.
Consider measuring the time to perform critical tasks (using a new user and an experienced user) and the number of data entry errors for performing those tasks.
First you want to define goals: for example increasing the percentage of users who can complete a certain set of tasks, and reducing the time they need for it.
Then, get two cameras, a few users (5-10) give them a list of tasks to complete and ask them to think out loud. Half of the users should use the "old" system, the rest should use the new one.
Review the tapes, measure the time it took, measure success rates, discuss endlessly about interpretations.
Alternatively, you can develop a system for bucket-testing -- it works the same way, though it makes it far more difficult to find out something new. On the other hand, it's much cheaper, so you can do many more iterations. Of course that's limited to sites you can open to public testing.
That obviously implies you're trying to get comparative data between two designs. I can't think of a way of expressing usability as a value.
You might want to look into the GOMS model (Goals, Operators, Methods, and Selection rules). It is a very difficult research tool to use in my opinion, but it does provide a "mathematical" basis to measure performance in a strictly controlled environment. It is best used with "expert" users. See this very interesting case study of Project Ernestine for New England Telephone operators.
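As a concrete member of the GOMS family, the Keystroke-Level Model (KLM) directly addresses the text-field question from the original post: the estimated cost is not just the keystrokes, it also includes pointing, homing the hands and mental preparation. Here is a rough C sketch using commonly cited approximate operator times (the exact constants vary by source, so treat them as assumptions):

#include <stdio.h>

/* Approximate Keystroke-Level Model operator times in seconds
 * (commonly cited averages; exact values vary by source). */
#define KLM_K 0.28  /* press a key or button (average typist) */
#define KLM_P 1.10  /* point at a target with the mouse */
#define KLM_H 0.40  /* move hands between mouse and keyboard */
#define KLM_M 1.35  /* mental preparation */

/* Estimated time to fill one text field of 'chars' characters:
 * think, point at the field, click, home to the keyboard, then type. */
double fill_text_field_seconds(int chars)
{
    return KLM_M + KLM_P + KLM_K + KLM_H + chars * KLM_K;
}

int main(void)
{
    /* A 10-character field comes out near 6 seconds - noticeably more
     * than the raw cost of 10 keystrokes alone. */
    printf("estimated time: %.2f s\n", fill_text_field_seconds(10));
    return 0;
}

Comparing such estimates between two designs of the same task gives the kind of hard number asked for, although, as noted in the other answers, only for error-free expert performance.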
Measuring usability quantitatively is an extremely hard problem. I tackled this as a part of my doctoral work. The short answer is, yes, you can measure it; no, you can't use the results in a vacuum. You have to understand why something took longer or shorter; simply comparing numbers is worse than useless, because it's misleading.
For comparing alternate interfaces it works okay. In a longitudinal study, where users are bringing their past expertise with version 1 into their use of version 2, it's not going to be as useful. You will also need to take into account time to learn the interface, including time to re-understand the interface if the user's been away from it. Finally, if the task is of variable difficulty (and this is the usual case in the real world) then your numbers will be all over the map unless you have some way to factor out this difficulty.
GOMS (mentioned above) is a good method to use during the design phase to get an intuition about whether interface A is better than B at doing a specific task. However, it only addresses error-free performance by expert users, and only measures low-level task execution time. If the user figures out a more efficient way to do their work that you haven't thought of, you won't have a GOMS estimate for it and will have to draft one up.
Some specific measures that you could look into:
Measuring clock time for a standard task is good if you want to know what takes a long time. However, lab tests generally involve test subjects working much harder and concentrating much more than they do in everyday work, so comparing results from the lab to real users is going to be misleading.
Error rate: how often the user makes mistakes or backtracks. Especially if you notice the same sort of error occurring over and over again.
Appearance of workarounds: if your users are working around a feature, or taking a bunch of steps that you think are dumb, it may be a sign that your interface doesn't give them the tools to figure out how to solve their problems.
Don't underestimate simply asking users how well they thought things went. Subjective usability is finicky but can be revealing.