What factors do I need to consider to determine whether I should "trust the defaults" with respect to encryption - language-agnostic

Background
With respect to cryptography in general, the following advice is so common that it may well be platform- and language-agnostic:
Cryptography is an incredibly complex subject which developers should leave to security experts.
I understand and agree with the reasoning behind this statement, and therefore follow the advice when using cryptography in an application.
That being said, because cryptography is treated so lightly in all but crypto-specific reference material, I do not know enough about how cryptography works to determine whether the default provided to me is adequate for the situation I'm in. There are thousands of crypto frameworks out there in a myriad of different languages, and I refuse to believe that every one of those implementations is secure, because I don't believe every crypto implementation was created by a crypto expert; if popular opinion is to be believed, there just aren't that many of them.
Question:
What information do I need to know about a given encryption algorithm to be able to determine for myself whether an algorithm is a reasonable choice?

You need to know the current estimates of time-to-break for each algorithm variant.
You need to know the certifications for particular libraries.
You need to know the required effective security level for the data you are encrypting. Health information in the USA has particular requirements, for example. So do electric utilities.
The more technical you want to get with crypto algorithm evaluation, the more you need the services of an expert. :-/
Consider http://www.cryptopp.com as an example of the kind of information provided; for instance, it is certified by NIST.

What information do I need to know about a given encryption algorithm to be able to determine for myself whether an algorithm is a reasonable choice?
Once you identify what you actually need, there are only a few peer-reviewed solutions you should trust. For example:
Symmetric Encryption: AES (Rijndael), Triple DES
Asymmetric Encryption: Diffie-Hellman, RSA
Hashing: The SHA family of functions
These are proven, battle-tested solutions. Until someone proves otherwise, they can be used safely. It's been a while since cryptography departed from security through obscurity and "roll your own" implementations.
There's a lot of cryptographic quackery out there, so be careful when choosing your solution. Make sure it's built on proven technologies, and if it sounds too good or uses words like "unbreakable," "revolutionary" or the like, you can be 99% sure that it's bogus.
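To make the advice concrete, here is a minimal sketch of "trusting the defaults" of a vetted library rather than assembling primitives yourself. It assumes Python and the third-party cryptography package, neither of which the question specifies; the point is simply that a reviewed, high-level recipe chooses the cipher mode, IV handling and authentication for you:

    from cryptography.fernet import Fernet

    key = Fernet.generate_key()            # 32 random bytes, urlsafe-base64 encoded
    f = Fernet(key)                        # AES-CBC + HMAC-SHA256 chosen for you
    token = f.encrypt(b"my secret data")   # authenticated, timestamped ciphertext
    assert f.decrypt(token) == b"my secret data"   # raises InvalidToken if tampered with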

The effective methods are well documented and extensively used. I tend to think of three situations relative to cryptography:
If a government-sized entity wants your stuff, they'll get it.
For confidential personal or business stuff, social engineering and non-cryptographic means are almost always more effective than code-breaking.
For hiding stuff from friends, relations, and mere interlopers, anything off the shelf is sufficient. In these scenarios, the fact that you have hidden stuff is typically more damning than the stuff itself might be.
There was a time when railroad boxcars switched from heavy-duty padlocks to easily defeated but hard to forge loops of wire. Make the lock stronger and they just go in through the walls. Turn the lock into an intrusion detector and you've gained something.
Signing and authentication are turning out to be better uses of cryptography than mere encryption.
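As a hedged illustration of that last point (assuming Python's standard library, which the answer itself does not name), message authentication with off-the-shelf primitives is only a few lines, and it behaves much like the tamper-evident wire seal above:

    import hashlib
    import hmac
    import secrets

    key = secrets.token_bytes(32)              # shared secret, never sent over the wire
    message = b"boxcar 1147 sealed at 06:00"

    # sender attaches an HMAC tag to the message
    tag = hmac.new(key, message, hashlib.sha256).digest()

    # receiver recomputes the tag and compares in constant time;
    # any change to the message or the tag makes the check fail
    expected = hmac.new(key, message, hashlib.sha256).digest()
    assert hmac.compare_digest(tag, expected)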


How large a role does subjectiveness play in programming?

I often read about the importance of readability and maintainability. Or, I read very strong opinions about which syntax features are bad or good. Or discussions about the values of certain paradigms, like OOP.
Aside from that, this same question floats about in my mind whenever I read debates on SO or Meta about subjective questions. Or read questions about best practices and sometimes find myself or others disagreeing.
What role does subjectiveness play within the programming realm?
Sometimes I think it plays a large role. Software developers are engineers in a way, but also people. A large part of programming is dealing with code that's human-readable. This is very different from Math or Physics or other disciplines with very exact and structured rules. Here the exact structure and rules are largely up in the air, changeable on a whim, hence the number of languages in existence. And one person may find one language very readable, while another person may find their own language the most comforting.
The same with practices. One person may not like certain accepted practices. I myself find splitting classes into different files very unreadable, for instance.
But, I can't say rules haven't helped in general. Certain practices have and do make life easier. And new languages have given rise to syntax and structure that make life easier. There's certainly been a progression towards code that is easier to read and maintain even given a largely diverse group of people. So maybe these things aren't as subjective as I thought.
It reminds me, in a way, of UI design. Certainly it's subjective, but then there's an entire discipline involved in crafting good UI and it tends to work.
Is there something non-subjective about the ideas behind maintainability, readability, and other best practices? Is there something tangible to grasp when one develops a new language or thinks of new practices?
Arguably your question is really about the distinction between programming, which is mathematical, algorithmic and scientific, and software engineering, which is subjective, variable and human-focused.
Great programmers are not necessarily great software engineers, and vice versa. The two skillsets, while not exclusive by any means, have less overlap than they appear at first. Their relative importance depends a lot on the project: a brilliant programmer working alone can turn out amazing examples of technical genius, and it doesn't matter that nobody else can understand or maintain it, because he's not going to share the code anyway. But move into an enterprise environment -- like corporate in-house software development -- and I'll gladly trade you ten "cave troll" geniuses for a mediocre programmer who understands the importance of readability and documentation.
It's been my experience that the world needs great software engineers more than it needs great programmers. Relatively few people in this day and age are writing software which is truly performance-critical (OS kernels, compilers, graphics engines, realtime embedded systems, etc), and the Internet allows mediocre programmers to quickly grab algorithmic solutions for problems they couldn't solve alone. But nearly everyone writing professional code has to work within a team. And team productivity rises and falls dramatically on the ability of its members to communicate effectively and distribute workload efficiently, two skills which are highly subjective and impossible to prove by rigid formula.
Most software engineering principles are built on experience rather than objective law. Much like the social sciences, we study, learn, adapt and apply -- but with no real guarantees of outcome. All we can say is that some things seem to work better than others in most groups.
I think a lot of it is necessarily determined by how much our mind is able to process at one time. So it comes down to how much the language and tools enable a team or a developer to break down the problem into chunks that are meaningful by themselves, but not so large that they become too hard to grasp. The common theme is the art of organizing information (in this case the code, the logic, ...). But that's not so different from Maths or Physics, by the way.
Just as the best authors borrow from many styles, the best programmers keep a huge range of patterns in their mental arsenal. Slavishly following a few patterns and adhering to some absolute truth is both lazy and dangerous.
Put it another way, the day we rely on robots for code review is the day I quit.
It all depends on your point of view :-)
But to answer your questions, I think one way to view subjectivity is to recognize that software languages, tools, and best practices are a shared means of communication among individuals. Yes, a programming language is a formal way of instructing a computer how to behave, but a programming language may also be viewed as a way to define and communicate specifications to a high level of detail (the code is the ultimate spec, is it not?).
So as far as we may want to concern ourselves with the degree of subjectivity in software languages, tools, and best practices, I would say that the lack of subjectivity may indicate how well communication is facilitated.
Yes, individuals have certain proclivities that are expressed in their habits and tendencies, but that should not ultimately matter too much in the perfect platform for development.
Turning to my Maths PhD wife, I asked if there's any subjectivity in mathematics. Her answer was yes, there is, mainly in the way we as humans arrive at the answer.
If a mathematical proof is the result, how you get to that result can vary. If the dataset is large you may need to use a computer, which can introduce errors, and thus there is debate about whether that is the right approach. Or sometimes mathematicians can disagree on the theory - one is trying to prove that x is true while the other is trying to prove that x is false.
I think the same thing exists in computer science. A correct answer is a program that runs correctly, but that definition of correct may be different for each project. Sometimes correct means no bugs. Sometimes it means running efficiently.
From here programmers can argue how best to achieve the "correct" result. A good example of this is the FizzBuzz application. A simple answer would be just a for loop, but Enterprise FizzBuzz is also "correct" in that it produces the correct answer, yet it is generally laughed at as "bad" engineering due to its overcomplication of the idea (it was a joke app, after all).
How large a role does subjectiveness play in programming? I'd say it's a very large part of what we do, simply because we are human, and because there are multiple ways of getting the "correct" answer so there is disagreement over which way is the best.
Studies have been done showing that certain practices reduce defect rates in software. For instance, one study found a strong correlation between cyclomatic complexity and the probability of being fault-prone. Other studies put the average effectiveness of design and code inspections at 55 and 60 percent, respectively. So it appears to be in our best interests to favor simplicity, check metrics, and do code reviews.
We're talking probabilities here, though. If I review your code, I'm not guaranteed to find 60% of your bugs. There are also few absolutes in software development; experienced developers know that the correct answer is generally "it depends." That said, there are a number of practices with objective data in their favor.

Analyzing client requirements and formulating software functionalities

How does one go about properly analyzing the requirements of a client in terms of software and writing down a list of functionalities? Feel free to share your own experiences.
Requirements engineering is a discipline of its own. It can actually be quite hard, because most of the time people want something but don't know what it is. Then comes the hard part: identifying and determining what is really needed, which is called requirements elicitation.
There are various methods to do so, including workshop sessions with customers, or sending analysts to the customer's site to identify how the client works and what it needs. There are also numerous tools to support this activity and to capture and organize requirements efficiently.
The hardest part, IMHO, is to identify contradicting requirements, e.g. I want fulltext search, but I also want all data to be encrypted. Pretty hard to achieve.
Without a more precise question, I can only provide such a generic answer.

How to make sure that your code is secure?

I am a programmer. I have about 5 years' experience programming in different kinds of languages. I have been concerned about my code's speed, about optimizing the memory my code uses, about good coding style, and so on, but I have never thought about how secure my code is. So I disassembled my code to see what a hacker could do. Would it be easy to crack my code?
And I saw that it is! It is very easy, because I was storing:
the serial number as a plain string
the encryption/decryption keys as well
So if someone has even minimal knowledge of assembler, he/she can simply disassemble it, and after 10-20 minutes of debugging my code is cracked!!! It could probably even be done by opening the exe with Notepad, I guess! :-)
So what I am asking is the following:
Where should I store that kind of sensitive information?
What are the common strategies for delivering secure code?
The first thing you must realize is that you'll never prevent a determined reverser from cracking any protection scheme, because anything the code can do, the reverser will eventually find out how to replicate. The only way you can achieve any sort of reliable protection is to have the shipped program be nothing more than a dumb client and have the brunt of the software on some server the reverser has no access to.
With that out of the way, you can certainly make it harder for a would-be reverser to break your protections. Obfuscation is the first step in achieving this. I have no experience using obfuscators, but I'm sure you can find some suggestions on SO. Also, if you're using a lower-level language like C/C++, simply compiling the code with full optimization and stripping all debugging symbols gets you a decent amount of obfuscation.
I read this article a few years ago, but I still think its techniques hold up today. It's by one of the developers of a video game called Spyro, talking about the set of techniques they used to prevent piracy. They claim it wasn't until 3 months after the release that a cracked version became available, which is fairly impressive.
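As a rough sketch of the "dumb client" idea above (in Python; the licensing URL and its JSON response are entirely hypothetical), the client only asks a server whether a feature may run, so the real decision stays out of the reverser's reach:

    import json
    import urllib.request

    def feature_enabled(license_key: str) -> bool:
        """Ask a (hypothetical) licensing server whether this key may use the feature."""
        url = "https://licensing.example.com/check?key=" + license_key  # made-up endpoint
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return json.load(resp).get("allowed", False)
        except (OSError, ValueError):
            return False  # fail closed if the server is unreachable or replies garbage

    if feature_enabled("ABCD-1234"):
        print("premium feature unlocked")

Of course, as the answer notes, a reverser can still patch the client to skip this call; the scheme only really holds for functionality that genuinely runs on the server.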
If you are concerned about piracy, then there are many avenues you can take. Making the code security tighter (obfuscation, license codes, binding the software to a particular PC, hardware/dongle protection, etc) is one, but it's worth bearing in mind that every piece of software can be cracked if someone sufficiently talented can be bothered.
Another approach is to consider the pricing model for your software. If you charge $1000 a copy, then there is a big incentive for someone to have a go at cracking it. If you only charge $5 then why should anyone bother to crack it?
So what is needed is a balance. Even the most basic protection will stop ordinary people from making casual copies. Beyond that, simple techniques (obfuscation and license codes) and a sensible pricing strategy will hold most would-be crackers at bay by making it not worth the bother of cracking. After that, you start getting into ever more sophisticated techniques (dongles/CDs needing to be present to run the software, only being able to run the software after logging on to an online licensing system) that take a lot of effort/cost to implement and significantly increase the risk of annoying genuine customers (remember how annoyed everyone got when they bought Half-Life but it wouldn't let them play the game?) - unless you have a popular mainstream product (i.e. a huge revenue stream to protect), there probably isn't much point going to that much effort.
Make it a web app.
It will generally not be well protected unless there's an external service doing the checking that you are in control of - and that service can still be spoofed by those who really want to "crack" it. Instead, trust the customer and provide only minimal copyright protection. I'm sure there was an article or podcast about this by Joel Spolsky somewhere... here's another related SO question.
I have no idea if it will help but Windows provides (since 2000) a mechanism to retrieve and store encrypted information and you can also salt this storage on a per-application basis if needed: Data Protection API (DPAPI)
This works at a machine or user level, but storing serials and perhaps some keys using it might be better than having them hidden in the application.
What sort of secure are you talking about?
Secure from the perspective that you are guarding your users' data well? If so, study some real cryptography and use existing libraries to encrypt your data. The Win32 API is pretty good for this.
But if you're talking about stopping a cracker from stealing your application? There are many methods, but just give up. They slow crackers down, they don't stop them.
Look at the How to hide a string in binary code? question.
First you have to define what your code should be secure against, being secure as such is meaningless.
You seem to be worried about reverse engineering and about users generating license codes without paying, though you don't say so. To make this harder you can obfuscate your code and key information in various ways. There are also techniques to make the use of debuggers harder, to prevent the reverse engineer from stepping through the code and seeing the information in the clear.
But this only makes reverse engineering somewhat harder, not impossible.
Another common security threat is execution of unwanted code, for example via buffer overflows.
A simple technique for doing this is to XOR over all your code and XOR it back when you need it... but this needs intimate knowledge of assembly... I'm not sure, but you could try something like this:
    void (*encryptionFunctn)(void);

    /* XOR the bytes of encryptionFunctn in place until a return opcode is
       reached. Note that the code page must first be made writable (e.g. with
       VirtualProtect/mprotect) for this self-modification to work at all. */
    void hideEncryptnFunctn(void)
    {
        volatile unsigned char *i = (volatile unsigned char *)encryptionFunctn;
        while (*i != 0xC3)     /* 0xC3 is the x86 opcode for ret */
        {
            *i++ ^= 0x45;      /* or any other key byte */
        }
    }
To prevent hackers from viewing your code, you should use an obfuscator. An obfuscator applies various techniques that make it extremely difficult to make sense of the obfuscated code. Some techniques used are string encryption, symbol renaming, control flow obfuscation, etc. Check out Crypto Obfuscator, which also has external method call hiding, Anti-Reflector, Anti-Debugging, etc.
The goal is to erect as many obstacles as possible in the path of a would-be hacker.

Under what circumstances is it reasonable to use a vendor-specific API?

I'm working with Websphere Portal and Oracle. On two occasions last week, I found that the most direct way to code a particular solution involved using APIs provided by IBM and Oracle. However, I realize every line of code written using these vendor APIs renders us a little bit more locked in to these systems (particularly the portal, which we only implemented recently).
How would you decide whether the benefits of using a vendor API outweigh the costs of being tied (Bound? Shackled? Fettered? Pinioned?) to a particular product?
Obviously a very personal / specific question...
One thing to remember, however, is that you can offset some of the risks of being "held prisoner" by wrapping whatever proprietary API is supplied in your own defined API. In doing so you will not only decouple your own code from the 3rd-party API, but you may also make it easier to write your own logic, by streamlining the 3rd-party API to the minimal set of features you desire. A small risk with this minimalist approach, however, is that you may sometimes miss out on some cool features supplied by the 3rd-party library/API.
If other vendors are likely to provide similar functionality with a different API, you can define your own interface for the operations, and create a vendor-specific implementation. This helps you avoid lock-in.
Whether this is worthwhile depends on several factors: how much application code will depend on the API, how complex is the API (and the wrapper it will require), how likely are you to switch vendors.
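A minimal sketch of that wrapping idea (in Python, with entirely hypothetical names; the answers above don't prescribe any particular API): application code depends only on your own small interface, and the vendor-specific calls live in a single adapter class:

    from abc import ABC, abstractmethod

    class DocumentStore(ABC):
        """The interface your application code depends on."""
        @abstractmethod
        def save(self, doc_id: str, content: bytes) -> None: ...
        @abstractmethod
        def load(self, doc_id: str) -> bytes: ...

    class VendorXDocumentStore(DocumentStore):
        """Adapter around a hypothetical vendor SDK; the only place that touches the vendor API."""
        def __init__(self, vendor_client):
            self._client = vendor_client                   # e.g. an IBM/Oracle SDK object

        def save(self, doc_id: str, content: bytes) -> None:
            self._client.put_document(doc_id, content)     # hypothetical vendor-specific call

        def load(self, doc_id: str) -> bytes:
            return self._client.get_document(doc_id)       # hypothetical vendor-specific call

    # Application code only ever sees DocumentStore, so switching vendors means
    # writing one new adapter rather than touching every caller.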
The cost-benefit ratio of being tightly bound to a custom/vendor-specific API varies a lot depending on what you're working with and with who will use your product. Vendor lock-in may be an impediment for some customers and may go unnoticed on others. Perhaps you may find these guidelines useful when deciding what to do:
If your product will target a specific company only and they're already bound to some technology stack because they have contracts that make them pay big bucks for support and there's no way that in a near or far future they'll change that stack, you should go lock-in.
If your product will target a variety of companies in the same sector and you verify that most of them have the same kind of systems, then you may go lock-in for now because it's faster or easier, and later on you may adapt your code for the minority.
If you will target a broad range of customers, try to avoid lock-in. However, if you find that lock-in plays a fundamental role in your product's time-to-market, you may again use vendor APIs for your first customers and then adapt the code to the needs of the others, but only as they come.
This is a pretty subjective question. My take is that if you're using Oracle, and don't have immediate plans to move away from Oracle, I can't think of a good reason to deny yourself the power and flexibility of the APIs provided by the vendor. Abstracting your application to the Nth degree in pursuit of some holy grail of portability often causes more trouble than it's worth, and definitely increases your cost. It can also lead to less-than-optimal performance. Even SQL has been enhanced and tweaked by every DB vendor to the point that you're probably going to be using some proprietary extension in a query at some point; you may as well get all of the benefits while you're at it.

Do formal methods of program verification have a place in industry?

I got a glimpse of Hoare Logic in college. What we did was really simple. Most of what I did was proving the correctness of simple programs consisting of while loops, if statements, and sequences of instructions, but nothing more. These methods seem very useful!
Are formal methods used in industry widely?
Are these methods used to prove mission-critical software?
Well, Sir Tony Hoare joined Microsoft Research about 10 years ago, and one of the things he started was a formal verification of the Windows NT kernel. Indeed, this was one of the reasons for the long delay of Windows Vista: starting with Vista, large parts of the kernel are actually formally verified with respect to certain properties like the absence of deadlocks, the absence of information leaks, etc.
This is certainly not typical, but it is probably the single most important application of formal program verification, in terms of its impact (after all, almost every human being is in some way, shape or form affected by a computer running Windows).
This is a question close to my heart (I'm a researcher in Software Verification using formal logics), so you'll probably not be surprised when I say I think these techniques have a useful place, and are not yet used enough in the industry.
There are many levels of "formal methods", so I'll assume you mean those resting on a rigorous mathematical basis (as opposed to, say, following some 6-Sigma style process). Some types of formal methods have had great success - type systems being one example. Static analysis tools based on data flow analysis are also popular, model checking is almost ubiquitous in hardware design, and computational models like Pi-Calculus and CCS seem to be inspiring some real change in practical language design for concurrency. Termination analysis is one that's had a lot of press recently - the SDV project at Microsoft and work by Byron Cook are recent examples of research/practice crossover in formal methods.
Hoare Reasoning has not, so far, made great inroads in the industry - this is for more reasons than I can list, but I suspect is mostly around the complexity of writing then proving specifications for real programs (they tend to get big, and fail to express properties of many real world environments). Various sub-fields in this type of reasoning are now making big inroads into these problems - Separation Logic being one.
This is partially the nature of ongoing (hard) research. But I must confess that we, as theorists, have entirely failed to educate the industry on why our techniques are useful, to keep them relevant to industry needs, and to make them approachable to software developers. At some level, that's not our problem - we're researchers, often mathematicians, and practical usage is not foremost in our minds. Also, the techniques being developed are often too embryonic for use in large scale systems - we work on small programs, on simplified systems, get the math working, and move on. I don't much buy these excuses though - we should be more active in pushing our ideas, and getting a feedback loop between the industry and our work (one of the main reasons I went back to research).
It's probably a good idea for me to resurrect my weblog, and make some more posts on this stuff...
I cannot comment much on mission-critical software, although I know that the avionics industry uses a wide variety of techniques to validate software, including Hoare-style methods.
Formal methods have suffered because early advocates like Edsger Dijkstra insisted that they ought to be used everywhere. Neither the formalisms nor the software support were up to the job. More sensible advocates believe that these methods should be used on problems that are hard. They are not widely used in industry, but adoption is increasing. Probably the greatest inroads have been in the use of formal methods to check safety properties of software. Some of my favorite examples are the SPIN model checker and George Necula's proof-carrying code.
Moving away from practice and into research, Microsoft's Singularity operating-system project is about using formal methods to provide safety guarantees that ordinarily require hardware support. This in turn leads to faster performance and stronger guarantees. For example, in Singularity they have proved that if a third-party device driver is allowed into the system (which means basic verification conditions have been proved), then it cannot possibly bring down the whole OS - the worst it can do is hose its own device.
Formal methods are not yet widely used in industry, but they are more widely used than they were 20 years ago, and 20 years from now they will be more widely used still. So you are future-proofed :-)
Yes, they are used, but not widely in all areas. There are more methods than just Hoare logic; some are used more, some less, depending on their suitability for the given task. The common problem is that software is biiiiiiig and verifying that all of it is correct is still too hard a problem.
For example, the theorem prover ACL2 (software that aids humans in proving program correctness) has been used to prove that a certain floating-point processing unit does not have a certain type of bug. It was a big task, so this technique is not too common.
Model checking, another kind of formal verification, is used rather widely nowadays, for example Microsoft provides a type of model checker in the driver development kit and it can be used to verify the driver for a set of common bugs. Model checkers are also often used in verifying hardware circuits.
Rigorous testing can also be thought of as formal verification - there are formal specifications of which paths of the program should be tested, and so on.
"Are formal methods used in industry?"
Yes.
The assert statement in many programming languages is related to formal methods for verifying a program.
"Are formal methods used in industry widely ?"
No.
"Are these methods used to prove mission-critical software ?"
Sometimes. More often, they're used to prove that the software is secure. More formally, they're used to prove certain security-related assertions about the software.
There are two different approaches to formal methods in the industry.
One approach is to change the development process completely. The Z notation and the B method that were mentioned are in this first category. B was applied to the development of the driverless subway line 14 in Paris (if you get a chance, climb in the front wagon. It's not often that you get a chance to see the rails in front of you).
Another, more incremental, approach is to preserve the existing development and verification processes and to replace only one of the verification tasks at a time with a new method. This is very attractive, but it means developing static analysis tools for existing, widely used languages that are often not easy to analyse (because they were not designed to be).
If you go to (for instance) http://dblp.uni-trier.de/db/indices/a-tree/d/Delmas:David.html
you will find instances of practical applications of formal methods to the verification of C programs (with static analyzers Astrée, Caveat, Fluctuat, Frama-C) and binary code (with tools from AbsInt GmbH).
By the way, since you mentioned Hoare Logic, in the above list of tools, only Caveat is based on Hoare logic (and Frama-C has a Hoare logic plug-in). The others rely on abstract interpretation, a different technique with a more automatic approach.
My area of expertise is the use of formal methods for static code analysis to show that software is free of run-time errors. This is implemented using a formal methods technique known as "abstract interpretation". The technique essentially enables you to prove certain attributes of a software program, e.g. prove that a+b will not overflow or that x/(x-y) will not result in a divide by zero. An example static analysis tool that uses this technique is Polyspace.
With respect to your question: "Are formal methods used in industry widely?" and "Are these methods used to prove mission-critical software?"
The answer is yes. This opinion is based on my experience supporting the Polyspace tool for industries that rely on embedded software to control safety-critical systems, such as the electronic throttle in an automobile, the braking system for a train, jet engine controllers, drug delivery infusion pumps, etc. These industries do indeed use these types of formal methods tools.
I don't believe 100% of these industry segments are using these tools, but their use is increasing. My opinion is that the Aerospace and Automotive industries lead, with the Medical Device industry quickly ramping up its use.
Polyspace is a (hideously expensive, but very good) commercial product based on program verification. It's fairly pragmatic, in that it scales up from 'enhanced unit testing that will probably find some bugs' to 'the next three years of your life will be spent showing these 10 files have zero defects'.
It is based more on negative verification ('this program won't corrupt your stack') than on positive verification ('this program will do precisely what these 50 pages of equations say it will').
To add to Jorg's answer, here's an interview with Tony Hoare. The tools Jorg's referring to, I think, are PREfast and PREfix. See here for more information.
Besides other, more procedural approaches, Hoare logic was the basis of Design by Contract, introduced as an object-oriented technique by Bertrand Meyer in Eiffel (see Meyer's 1992 article, page 4). While Design by Contract is not the same as formal verification methods (for one thing, DbC doesn't prove anything until the software is executed), in my opinion it provides a more practical use.
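To illustrate the contract idea, here is a minimal sketch using plain assert statements in Python (not Eiffel) to approximate Meyer's require/ensure clauses; unlike a Hoare-style proof, these checks only fire when the program actually runs, which is exactly the distinction drawn above:

    def withdraw(balance: int, amount: int) -> int:
        """Contract-style routine: the caller promises the precondition,
        the routine promises the postcondition."""
        # precondition (Eiffel's "require")
        assert 0 < amount <= balance, "precondition violated"

        new_balance = balance - amount

        # postcondition (Eiffel's "ensure")
        assert new_balance == balance - amount and new_balance >= 0
        return new_balance

    withdraw(100, 30)     # satisfies the contract
    # withdraw(100, 200)  # would trip the precondition at run time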