What exactly is GUID? Why and where I should use it? - language-agnostic

What exactly is GUID? Why and where I should use it?
I've seen references to GUIDs in a lot of places, including Wikipedia,
but it isn't very clear about where to use them.
If someone could answer this, it would be nice.
Thanks

GUID technically stands for globally unique identifier. What it is, actually, is a 128 bit structure that is unlikely to ever repeat or create a collision. If you do the maths, the domain of values is in the undecillions.
Use GUIDs when you have multiple independent systems or clients generating IDs that need to be unique.
For example, if I have 5 client apps creating and inserting transactional data into a table that has a unique constraint on the ID, then use guids. This prevents having to force a client to request an issued ID from the server first.
This is also great for object factories and systems that have numerous object types stored in different tables where you don't want any 2 objects to have the same ID. This makes caching and scavenging schemas much easier to implement.
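As a rough illustration of the multi-client case, here is a minimal Python sketch. The client names and record layout are made up for the example, and a set stands in for the table's unique constraint; the only real API used is the standard library's uuid.uuid4().

```python
import uuid

def new_record(client_name: str, payload: str) -> dict:
    """Build a record whose ID is generated on the client, with no server round trip."""
    return {"id": str(uuid.uuid4()), "client": client_name, "payload": payload}

# Five hypothetical clients each create 1000 records independently.
records = [
    new_record(f"client-{c}", f"txn-{n}")
    for c in range(5)
    for n in range(1000)
]

# A set stands in for the table's unique constraint on the ID column.
ids = {r["id"] for r in records}
print(len(records), len(ids))
```

No client had to ask anyone for an ID before building its records, which is the point of the approach.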

A GUID is a "Globally Unique IDentifier". You use it anywhere you need an identifier that is guaranteed to be different from every other.
Usually, you only need a value to be "locally unique": the primary key identity in a database table, for example, need only be different from the other rows in that table, but can be the same as an ID in other tables (no need for a GUID here).
GUIDs are generally used when you will be defining an ID that must be different from an ID that someone else (outside of your control) will be defining. One such place is the interface identifier on ActiveX controls. Anyone can create an ActiveX control without knowing which other controls it will be used alongside, and there's nothing to stop everyone from giving their controls the same name. GUIDs keep them distinct.
Time-based GUIDs (one of several generation schemes) combine the time (in very small fractions of a second), so that the value is different from any GUID defined before or later, with a number identifying your location (sometimes taken from the MAC address of your network card), so that it is different from any other GUID defined at the same moment by someone else.
They are also sometimes known as UUIDs (universally unique ID).
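The time-plus-location scheme described above corresponds to what the UUID specification calls version 1. A small sketch using Python's standard uuid module shows the two parts. The conversion to a date is my own addition and assumes the version 1 timestamp layout (100-nanosecond intervals since 1582-10-15).

```python
import uuid
from datetime import datetime, timedelta, timezone

u = uuid.uuid1()  # version 1: timestamp + node identifier

# The timestamp field counts 100 ns intervals since the Gregorian epoch.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

print(u)
print("version:", u.version)
print("created:", created.isoformat())
# The node field is 48 bits: a MAC address, or a random value if none is available.
print("node: %012x" % u.node)
```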

In addition to all the other answers, here is an online GUID generator:
http://www.guidgenerator.com/
What is a GUID?
GUID (or UUID) is an acronym for
'Globally Unique Identifier' (or
'Universally Unique Identifier'). It
is a 128-bit integer number used to
identify resources. The term GUID is
generally used by developers working
with Microsoft technologies, while
UUID is used everywhere else.
How unique is a GUID?
128-bits is big enough and the
generation algorithm is unique enough
that if 1,000,000,000 GUIDs per
second were generated for 1 year the
probability of a duplicate would be
only 50%. Or if every human on Earth
generated 600,000,000 GUIDs there
would only be a 50% probability of a
duplicate.
How are GUIDs used?
GUIDs are used in software development
as database keys, component
identifiers, or just about anywhere
else a truly unique identifier is
required. GUIDs are also used to
identify all interfaces and objects in
COM programming.
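The uniqueness figures quoted above can be sanity-checked with the usual birthday-bound approximation. The sketch below assumes purely random IDs with 122 random bits (the case for version 4 UUIDs); the function names are mine.

```python
import math

def collision_probability(n: float, bits: int = 122) -> float:
    """Birthday-bound approximation: p = 1 - exp(-n^2 / (2 * 2^bits))."""
    return -math.expm1(-(n * n) / (2.0 * 2.0 ** bits))

def count_for_probability(p: float, bits: int = 122) -> float:
    """Approximate number of random IDs needed to reach collision probability p."""
    return math.sqrt(2.0 * 2.0 ** bits * math.log(1.0 / (1.0 - p)))

SECONDS_PER_YEAR = 365.25 * 24 * 3600

n_half = count_for_probability(0.5)
years = n_half / 1e9 / SECONDS_PER_YEAR

print(f"IDs needed for a 50% collision chance: {n_half:.2e}")
print(f"Years at one billion IDs per second:   {years:.0f}")
print(f"Chance after one year at that rate:    {collision_probability(1e9 * SECONDS_PER_YEAR):.1e}")
```

Under these assumptions the 50% threshold is on the order of 10^18 identifiers, the same ballpark as the per-person figure in the quote. At a billion per second that takes decades rather than one year, so the quoted one-year figure looks pessimistic.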

A GUID is a "Globally Unique ID". Also called a UUID (Universally Unique ID).
It's basically a 128 bit number that is generated in a way (see RFC 4122 http://www.ietf.org/rfc/rfc4122.txt) that makes it nearly impossible for duplicates to be generated. This way, I can generate GUIDs without some third party organization having to give them to me to ensure they are unique.
One widespread use of GUIDs is as identifiers for COM entities on Windows (classes, typelibs, interfaces, etc.). Using GUIDs, developers could build their COM components without going to Microsoft to get a unique identifier. Even though identifying COM entities is a major use of GUIDs, they are used for many things that need unique identifiers. Some developers will generate GUIDs for database records to provide them an ID that can be used even when they must be unique across many different databases.
Generally, you can think of a GUID as a serial number that can be generated by anyone at anytime and they'll know that the serial number will be unique.
Other ways to get unique identifiers include getting a domain name. To ensure the uniqueness of domain names, you have to get it from some organization (ultimately administered by ICANN).
Because GUIDs can be unwieldy (from a human readable point of view they are a string of hexadecimal numbers, usually grouped like so: aaaaaaaa-bbbb-cccc-dddd-ffffffffffff), some namespaces that need unique names across different organizations use another scheme (often based on Internet domain names).
So, the namespace for Java packages by convention starts with the organization's domain name (reversed) followed by names that are determined in some organization-specific way. For example, a Java package might be named:
com.example.jpackage
This means that dealing with name collisions becomes the responsibility of each organization.
XML namespaces are also made unique in a similar way - by convention, someone creating an XML namespace is supposed to make it 'underneath' a registered domain name under their control. For example:
xmlns="http://www.w3.org/1999/xhtml"
Another way that unique IDs have been managed is for Ethernet MAC addresses. A company that makes Ethernet cards has to get a block of addresses assigned to them by the IEEE (I think it's the IEEE). In this case the scheme has worked pretty well, and even if a manufacturer screws up and issues cards with duplicate MAC addresses, things will still work OK as long as those cards are not on the same subnet, since beyond a subnet, only the IP address is used to route packets. Although there are some other uses of MAC addresses that might be affected - one of the algorithms for generating GUIDs uses the MAC address as one parameter. This GUID generation method is not as widely used anymore because it is considered a privacy threat.
One example of a scheme to come up with unique identifiers that didn't work very well was the Microsoft provided ID's for 'VxD' drivers in Windows 9x. Developers of third party VxD drivers were supposed to ask Microsoft for a set of IDs to use for any drivers the third party wrote. This way, Microsoft could ensure there were not duplicate IDs. Unfortunately, many driver writers never bothered, and simply used whatever ID was in the example VxD they used as a starting point. I'm not sure how much trouble this caused - I don't think VxD ID uniqueness was absolutely necessary, but it probably affected some functionality in some APIs.

GUID or UUID (Globally vs. Universally Unique IDentifier) is, well, a unique ID :) When you need something really unique and machine-generated, there are libraries to get you one.
See GUID on wikipedia for details.
As to when you don't need a GUID, it is when a counter that you control (one way or another, like a SERIAL SQL type or a sequence) gets incremented. Indexing a "text" value (GUID in textual form) or a 128 bit binary value (which a GUID is) is far more expensive than an integer.

Someone said they are conceptually 128-bit random values, and that is substantially true, but having done a little reading on UUID (GUID usually refers to Microsoft's implementation of UUID), I see that there are several different UUID versions, and most of them are not actually random. So it is possible to generate a UUID for a machine (or something else) and be able to reliably repeat that process to obtain the same UUID down the road, which is important for some applications.
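As a sketch of the repeatable case, Python's standard uuid module offers name-based (version 5) UUIDs, which are derived from a namespace and a name instead of from random data. The hostnames and the machine_uuid helper are made up for the example.

```python
import uuid

def machine_uuid(hostname: str) -> uuid.UUID:
    """Derive a UUID from a hostname; the same input always gives the same UUID."""
    return uuid.uuid5(uuid.NAMESPACE_DNS, hostname)

a = machine_uuid("db01.example.com")
b = machine_uuid("db01.example.com")  # repeated later, same result
c = machine_uuid("db02.example.com")  # different name, different UUID

print(a)
print("repeatable:", a == b)
print("distinct for another host:", a != c)
```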

For me it's easier to think of them as simply "128-bit random values". Which is essentially what they are. There are some algorithms for including a bit of information in a few digits of your GUID (thus the random part gets a bit smaller), but still they are pretty large almost-random values.
Since they are so large, it is extremely unlikely that two GUIDs will ever be generated that are the same. For all practical purposes, every GUID ever generated is unique in the world.
I'll leave it to you to figure out where to use them, but other answers already have some examples. Let your imagination run wild. :)

Can be a hard thing to understand because of all the maths that goes on behind generating them. Think of it as a unique id. You can get Visual Studio to generate one for you, or .NET if you happen to be using C# or one of the many other applications or websites. They are considered unique because there is such a silly small chance you'll see the same one twice that it isn't worth considering.

128-bit Globally Unique ID. You can generate GUIDs from now until sunset and you'll never generate the same GUID twice, and neither will anyone else. They are used a lot with COM.
As for example of something you would use them for, we use them in one of our products. Our users can generate categories and cards on various devices. We want to make sure that we don't confuse a category made on one device with a category created on a different one, so it's important that IDs are unique no matter who generates them, where they generate them, and when they generate them. So we use GUIDs (actually we use our own scheme using 64-bit numbers but they are similar to GUIDs).

I worked on an ACD call center system a few years back where we wanted to gather call detail records from multiple call processors into a single database. I set up a column in MS SQL to generate a GUID for the database key rather than using a system-generated sequential ID (identity column). Back then, this required setting the default value to NEWID() (or generating it in the code, but the NEWID() function was safer). Of course, having a large value for a key may raise a few eyebrows, but I would rather give up the space than risk a collision.
I didn't see anyone address using a GUID as a database key so I thought it might help to know you could do that too.

GUID stands for "Globally Unique Identifier" and you use it when you want to have, erm, a Globally Unique Identifier.
In RSS feeds, for example, you should have a GUID for each item in the feed. That way, the feed reader software can keep track of whether you have read that item or not. Without a GUID, it would be impossible to tell.
A GUID differs from something like a database ID in that no matter who creates an object -- you, me, the guy down the street -- our GUIDs will always be different. There should be no collisions using a GUID.
You'll also see the term UUID, which stands for "Universally Unique Identifier." There is essentially no difference between the two. UUID is the more appropriate term. GUID is the term used by Microsoft.

If you need to generate an identifier that needs to be unique during the whole lifetime of your application, you use a GUID.
Imagine you have a server with sessions, if you give each session a GUID, you are certain that it will be unique for every session ever created by your server. This is useful for tracing bugs.

One particularly useful application of GUIDs that I've found is using them to track unique visitors in webapps where the visitors are anonymous (i.e. not logged in or registered).

GUID = Globally Unique IDentifier.
Use it when you want to uniquely identify something in a global context.
This generator can be handy.

The Wikipedia article on GUIDs is pretty clear on what they are used for - maybe rephrasing your question would help - what do you need a GUID for?

To actually see what one looks like on a Windows computer, open cmd or PowerShell.
Powershell => [guid]::NewGuid()
CMD => powershell [guid]::NewGuid()

Related

How to sanitize or randomize sensitive database fields

What's the most efficient method or tool to randomize a list of database table columns to obscure sensitive information?
I have a Django application used by several clients, and I need to onboard some development contractors to do work on the application. When they work on bugs (e.g. page /admin/model/123 has an error), ideally they'd need a snapshot of the client database in order to reproduce and fix the bug. However, because they're off-site contractors, I'd like to mitigate risk in the event they expose the client database (unintentionally or otherwise). I don't want to have to explain to a client why all their data's been published online because a foreign contractor left his laptop in an unlocked car.
To do this, I'd like to find or write a tool to "randomize" sensitive fields in the database, like usernames, email addresses, account numbers, company names, phone numbers, etc so that the structure of the data is maintained, but all personally identifiable information is removed.
Presumably, this is a task that many other people have had to do, but I'm not sure what the technical term is, so I'm not finding much through Google. Are there any existing tools to do this with a Django application running a MySQL or PostgreSQL backend?
Anonymize and sanitize are good words for this chore.
It's relatively easy to do. Use queries like
UPDATE person
SET name = CONCAT('Person', person_id),
    email = CONCAT('Person', person_id, '@example.com');
and so forth, to stomp actual names and emails and all that. It's helpful to preserve the uniqueness of entries, and the autoincrementing IDs of various tables can help you do that.
(Adding this as an answer, as I am not allowed to comment yet.)
As Cerin said, O. Jones approach for anonymizing/sanitizing works for simple fields, but not more complicated ones like addresses, phone number or account numbers that need to match a specific format. However, the method can be modified to allow this too.
Let's take a phone number with format aaa-bbbb-ccc as an example and use the autoincrementing person_id as the source of unique numbers. For the ccc part of the phone number, use MOD(person_id,1000). This gives the remainder of person_id divided by 1000. For bbbb, take MOD((person_id-MOD(person_id,1000))/1000,10000). It looks complicated, but what this is doing is taking person_id, removing the last three digits (which were used for ccc) and dividing by 1000. The last four digits of the resulting number are used as bbbb. I think you'll be able to figure out how to calculate aaa.
The three parts of the phone number can then be concatenated to give the complete phone number: CONCAT(aaa,"-",bbbb,"-",ccc)
(You might have to explicitly convert the numbers to string, I'm not sure)
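The same digit-slicing arithmetic can be sketched in Python to check it before writing the SQL. The zero-padding is my addition (without it, small IDs would produce short parts such as "42" instead of "042"); in SQL the equivalent would be something like LPAD.

```python
def fake_phone(person_id: int) -> str:
    """Map a unique person_id to a unique phone number of the form aaa-bbbb-ccc."""
    ccc = person_id % 1000                   # last three digits
    bbbb = (person_id // 1000) % 10000       # next four digits
    aaa = (person_id // 10_000_000) % 1000   # next three digits
    # Zero-pad each part so the format is preserved for small IDs.
    return f"{aaa:03d}-{bbbb:04d}-{ccc:03d}"

print(fake_phone(42))          # 000-0000-042
print(fake_phone(1234567890))  # 123-4567-890
```

The mapping stays unique as long as person_id is below 10^10, since each ID then maps to its own ten digits.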

I know a GUID is nearly unique. But is it acceptable practice to assume it is unique?

So I completely understand the mathematical unlikeliness of creating two GUID values with the same number. But is it acceptable practice to assume they are unique?
For example, I am working with a system for dealing with medical files. When I began to lay out the database structure, the manager (not very technically knowledgeable, but likes to think he is, and decides things that would be better left to the more technically minded) said he wants to use GUIDs to separate different medical records instead of INT because it is "more unique". I explained that an INT is always going to be unique because it is sequential. I suggested we use BIGINT if it would make him feel more comfortable, since its range would not run out even if the population of the planet increased to the point that people would only fit standing next to one another across the planet, but he is insisting on using GUIDs.
My feeling is although it is NEARLY IMPOSSIBLE for there to be a mix up, when dealing with medical records, why take the chance? What is the advantage of using a GUID vs an INT in this scenario?
But is it acceptable practice to assume it is unique?
Yes. That is the entire purpose of UUID, to be used as a reliable unique identifier without centralized coordination. (A GUID is Microsoft’s variation of a UUID.)
Only you (or your appropriate management) can make the final judgement for your particular project.
But if you truly begin to appreciate the enormity of the numerical range of 12x bits (which is actually incomprehensible to the human mind), then you know you can remove the usage of a properly generated UUID from your list of worries.
By “properly generated” I mean things like using the date-time Versions, or for lower number of values use the random (Version 4) if backed by a cryptographically-strong random number generator. Nearly every modern operating system today includes a UUID generation library. Or you can use the OSSP UUID project. Improperly-generated would include roll-your-own implementations you may see bandied about the inter webs.
As for the suggestion to use a database’s auto-incrementing serial/sequence number, every database person I know with years of real-world experience has been burned by those. I’ve never heard of or read of anyone ever having a collision with properly-generated UUIDs. I'm not saying sequences are necessarily bad or don't have their place, I'm just saying that all I can do is laugh when I hear people turn away from a UUID because of some beyond-astronomically incomprehensibly minute possibility of a UUID collision and choose a sequence instead.
when dealing with medical records, why take the chance?
Your medical system is far far more likely to fail because of faulty data-entry or other human error with handling records. But do you post 3 clerks on duty to independently triple-enter the same data to reduce that chance of error? No. And that risk is incomprehensibly mathematically more likely to happen than a UUID problem. Yet every medical facility I know of accepts that enormous risk without even thinking about it.
What is the advantage of using a GUID vs an INT
The advantages include:
No need to manage your sequences. Examples include: resetting for development, test, and production environments. Or when restoring a backup. Or fixing the sequence after faults in the system’s serial generation library (my own experience).
Avoid users’ intuited assumptions being confused about missing numbers in the sequence. I've had that conversation far too often.
Federating data between distributed systems. This is the biggest advantage: each system can act independently yet easily share data back and forth with other systems. Without UUIDs, the administrative overhead and the risk of error are bothersome at first and only grow over time.
Downsides include:
Larger memory and storage usage. Serial numbers are usually 32-bit integers, sometimes 64-bit. A good database with native support for UUID as a data type will use 128 bits.
Less readable by humans. One workaround is to just read several of the first or last digits for casual work.
Possibly less efficient indexing, with very large number of entries.
Using an incrementing integer ID ensures uniqueness only within its own domain/type. An advantage of UUIDs/GUIDs is that they uniquely identify the owning thing in the entire universe.
So if you have multiple objects, say MedicalRecord, ID = 5, and VaccinationForm, ID = 5, then you need to specify both the type ("medicalRecord" or "vaccinationForm") and the ID value of 5, whereas with a GUID you only need to store a single piece of information to uniquely identify it.
It can be argued that using GUIDs is a waste of space as they are 16 bytes long (a 128-bit value).
If your system is self-contained and not interfacing with others you might want to use SQL Server's "sequence" concept, where instead of each table storing its own identity sequence, the sequence is maintained for all tables, making it a Locally-Unique ID value. You can use any size integer too.
See here: https://msdn.microsoft.com/en-us/library/ff878091.aspx

Are there any inobvious ways of abusing GUIDs?

GUIDs are typically used for uniquely identifying all kinds of entities: requests from external systems, files, whatever. They work like magic: you call a "GiveMeGuid()" function (UuidCreate() on Windows) and a fresh new GUID is at your service.
Given my code really calls that "GiveMeGuid()" function each time I need a new GUID is there any not so obvious way to misuse it?
Just found an answer to an old question: How deterministic Are .Net GUIDs?. Requoting it:
It's not a complete answer, but I can tell you that the 13th hex digit is always 4 because it denotes the version of the algorithm used to generate the GUID (id est, v4); also, and I quote Wikipedia:
Cryptanalysis of the WinAPI GUID generator shows that, since the sequence of V4 GUIDs is pseudo-random, given the initial state one can predict up to the next 250 000 GUIDs returned by the function UuidCreate. This is why GUIDs should not be used in cryptography, e.g., as random keys.
So, if you got lucky and get same seed, you'll break 250k mirrors in sequence. To quote another Wikipedia piece:
While each generated GUID is not guaranteed to be unique, the total number of unique keys (2^128 or 3.4×10^38) is so large that the probability of the same number being generated twice is extremely small.
Bottom line: perhaps one form of misuse is to assume that a GUID is always unique.
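The claim about the 13th hex digit is easy to check. The sketch below uses Python's uuid.uuid4() instead of the .NET or WinAPI generators discussed above, so it shows the version 4 layout in general, not the behaviour of UuidCreate specifically.

```python
import uuid

samples = [uuid.uuid4() for _ in range(1000)]

# u.hex is the 32 hex digits without dashes; index 12 is the 13th digit.
version_digits = {u.hex[12] for u in samples}
# The 17th digit carries the variant bits, so only four values occur.
variant_digits = {u.hex[16] for u in samples}

print("13th digit values:", sorted(version_digits))
print("17th digit values:", sorted(variant_digits))
```

So a version 4 GUID has 122 random bits, not 128; the other six are fixed by the format.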
It depends. Some implementations of GUID generation are time dependent, so calling CreateGuid in quick succession MAY create clashing GUIDs.
edit: I now remember the problem. I was once working on some php code where the GUID generating function was reseeding the RNG with the system time each call. Don't do this.
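A minimal Python sketch of that reseeding mistake, for illustration only. The original code was PHP and is not shown here, so bad_guid is a hypothetical reconstruction of the pattern, with the clock value passed in so the effect is reproducible.

```python
import random
import uuid

def bad_guid(now_seconds: int) -> uuid.UUID:
    """Anti-pattern: reseed the RNG from the clock on every call."""
    random.seed(now_seconds)
    return uuid.UUID(int=random.getrandbits(128))

def better_guid() -> uuid.UUID:
    """Let the library draw from the operating system's entropy source."""
    return uuid.uuid4()

t = 1_700_000_000  # stands in for two calls landing in the same clock tick

print("reseeded, same tick, identical:", bad_guid(t) == bad_guid(t))
print("uuid4, back to back, identical:", better_guid() == better_guid())
```

Any two calls that see the same clock value produce the same "GUID", which is exactly the clash described above.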
The only way I can see of misusing a Guid is trying to interpret the value in some logical manner. Not that it really invites you to do so, which is one of the characteristics around Guid's that I really like.
Some GUIDs include an identifier of the machine they were generated on, so they can be used in client/server environments, but some don't. If yours doesn't, be careful about using them in, for instance, a database that multiple clients access.
Maybe the entropy could be manipulated by playing with some parameters used to generate the GUIDs in the first place (e.g. interface identifiers).

Should I obscure primary key values?

I'm building a web application where the front end is a highly-specialized search engine. Searching is handled at the main URL, and the user is passed off to a sub-directory when they click on a search result for a more detailed display. This hand-off is being done as a GET request with the primary key being passed in the query string. I seem to recall reading somewhere that exposing primary keys to the user was not a good idea, so I decided to implement reversible encryption.
I'm starting to wonder if I'm just being paranoid. The reversible encryption (base64) is probably easily broken by anybody who cares to try, makes the URLs very ugly, and also longer than they otherwise would be. Should I just drop the encryption and send my primary keys in the clear?
What you're doing is basically obfuscation. A reversible encrypted (and base64 doesn't really count as encryption) primary key is still a primary key.
What you were reading comes down to this: you generally don't want to have your primary keys have any kind of meaning outside the system. This is called a technical primary key rather than a natural primary key. That's why you might use an auto number field for Patient ID rather than SSN (which is called a natural primary key).
Technical primary keys are generally favoured over natural primary keys because things that seem constant do change and this can cause problems. Even countries can come into existence and cease to exist.
If you do have technical primary keys you don't want to make them de facto natural primary keys by giving them meaning they didn't otherwise have. I think it's fine to put a primary key in a URL but security is a separate topic. If someone can change that URL and get access to something they shouldn't have access to then it's a security problem and needs to be handled by authentication and authorization.
Some will argue they should never be seen by users. I don't think you need to go that far.
On the dangers of exposing your primary key, you'll want to read "autoincrement considered harmful", by Joshua Schachter.
URLs that include an identifier will
let you down for three reasons.
The first is that given the URL for
some object, you can figure out the
URLs for objects that were created
around it. This exposes the number of
objects in your database to possible
competitors or other people you might
not want having this information (as
famously demonstrated by the Allies
guessing German tank production levels
by looking at the serial numbers.)
Secondly, at some point some jerk will
get the idea to write a shell script
with a for-loop and try to fetch every
single object from your system; this
is definitely no fun.
Finally, in the case of users, it
allows people to derive some sort of
social hierarchy. Witness the frequent
hijacking and/or hacking of
high-prestige low-digit ICQ ids.
If you're worried about someone altering the URL to try and look at other values, then perhaps you need to look at token generation.
For instance, instead of giving the user a 'SearchID' value, you give them a SearchToken, which is some long unique pseudo-random value (read: GUID), which you then map to the SearchID internally.
Of course, you'll still need to apply session security and so forth, because even a unique URL with a non-sequential ID isn't protected against sniffing by anything between your server and the user.
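A minimal sketch of the token idea in Python. The TokenMap class and its in-memory dictionary are hypothetical; a real application would more likely keep the mapping in a database column. Python's uuid.uuid4() draws from the operating system's random source, which is the assumption that makes the tokens hard to guess here.

```python
import uuid
from typing import Optional

class TokenMap:
    """Hand out opaque tokens and map them back to internal IDs."""

    def __init__(self) -> None:
        self._id_by_token = {}

    def issue(self, search_id: int) -> str:
        # 32 hex digits; reveals nothing about the underlying ID or its neighbours.
        token = uuid.uuid4().hex
        self._id_by_token[token] = search_id
        return token

    def resolve(self, token: str) -> Optional[int]:
        # Unknown or tampered tokens simply resolve to nothing.
        return self._id_by_token.get(token)

tokens = TokenMap()
t = tokens.issue(123)
print(t, "->", tokens.resolve(t))
print("guess ->", tokens.resolve("0" * 32))
```

The token only hides the ID; whether the caller is allowed to see the record is still a separate check.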
If you're obscuring the primary keys for a security reason, don't do it. That's called security by obscurity and there is a better way. Having said that, there is at least one valid reason to obscure primary keys and that's to prevent someone from scraping all your content by simply examining a querystring in a URL and determining that they can simply increment an id value and pull down every record. A determined scraper may still be able to discover your means of obscuring and do this despite your best efforts, but at least you haven't made it easy.
PostgreSQL provides multiple solutions for this problem, which could be adapted for other RDBMSs:
hashids : https://hashids.org/postgresql/
Hashids is a small open-source library that generates short, unique, non-sequential ids from numbers.
It converts numbers like 347 into strings like “yr8”, or array of numbers like [27, 986] into “3kTMd”.
You can also decode those ids back. This is useful in bundling several parameters into one or simply using them as short UIDs.
optimus is similar to hashids but provides only integers as output: https://github.com/jenssegers/optimus
skip32 at https://wiki.postgresql.org/wiki/Skip32_(crypt_32_bits):
It may be used to generate series of unique values that look random, or to obfuscate a SERIAL primary key without loosing its unicity property.
pseudo_encrypt() at https://wiki.postgresql.org/wiki/Pseudo_encrypt:
pseudo_encrypt(int) can be used as a pseudo-random generator of unique values. It produces an integer output that is uniquely associated to its integer input (by a mathematical permutation), but looks random at the same time, with zero collision. This is useful to communicate numbers generated sequentially without revealing their ordinal position in the sequence (for ticket numbers, URLs shorteners, promo codes...)
this article gives details on how this is done at Instagram: https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c and it boils down to:
We’ve delegated ID creation to each table inside each shard, by using PL/PGSQL, Postgres’ internal programming language, and Postgres’ existing auto-increment functionality.
Each of our IDs consists of:
41 bits for time in milliseconds (gives us 41 years of IDs with a custom epoch)
13 bits that represent the logical shard ID
10 bits that represent an auto-incrementing sequence, modulus 1024. This means we can generate 1024 IDs, per shard, per millisecond
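The bit layout in the Instagram quote can be sketched with plain integer arithmetic. This is a Python illustration of the 41/13/10 split only, not Instagram's PL/PGSQL function; the function names and the epoch value used below are placeholders.

```python
def make_id(now_ms: int, epoch_ms: int, shard_id: int, seq: int) -> int:
    """Pack time, shard and sequence into one 64-bit integer (41 + 13 + 10 bits)."""
    elapsed = (now_ms - epoch_ms) & ((1 << 41) - 1)  # 41 bits of milliseconds
    shard = shard_id & ((1 << 13) - 1)               # 13 bits of logical shard ID
    counter = seq % 1024                             # 10 bits of sequence
    return (elapsed << 23) | (shard << 10) | counter

def split_id(value: int) -> tuple:
    """Recover (elapsed_ms, shard_id, sequence) from a packed ID."""
    return value >> 23, (value >> 10) & 0x1FFF, value & 0x3FF

EPOCH_MS = 1_000  # placeholder custom epoch

packed = make_id(now_ms=EPOCH_MS + 5, epoch_ms=EPOCH_MS, shard_id=7, seq=1025)
print(packed, split_id(packed))
```

Because the time occupies the high bits, IDs generated later sort after earlier ones, which keeps them roughly ordered by creation time.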
Just send the primary keys. As long as your database operations are sealed off from the user interface, this is no problem.
For your purposes (building a search engine) the security benefit of encrypting database primary keys is negligible. Base64 encoding isn't encryption - it's security through obscurity and won't even be a speedbump to an attacker.
If you're trying to secure database query input just use parametrized queries. There's no reason at all to hide primary keys if they are manipulated by the public.
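For illustration, here is what a parametrized lookup by primary key might look like in Python with the standard sqlite3 module. The item table and the fetch_item helper are made up for the example, and the integer check is my addition on top of the parametrized query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO item (id, title) VALUES (?, ?)",
    [(1, "first"), (2, "second")],
)

def fetch_item(raw_id: str):
    """Look up a row using a value taken straight from the query string."""
    try:
        item_id = int(raw_id)  # reject anything that is not a plain integer
    except ValueError:
        return None
    # The value is bound as a parameter, never spliced into the SQL text.
    return conn.execute(
        "SELECT id, title FROM item WHERE id = ?", (item_id,)
    ).fetchone()

print(fetch_item("2"))
print(fetch_item("2 OR 1=1"))
```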
When you see base64 in the URL, you are pretty much guaranteed the developers of that site don't know what they are doing and the site is vulnerable.
URLs that include an identifier will
let you down for three reasons.
Wrong, wrong, wrong.
First - every request has to be validated, regardless of it coming in the form of a HTTP GET with an id, or a POST, or a web service call.
Second - a properly made web-site needs protection against bots which relies on IP address tracking and request frequency analysis; hiding ids might stop some people from writing a shell script to get a sequence of objects, but there are other ways to exploit a web site by using a bruteforce attack of some sort.
Third - ICQ ids are valuable but only because they're related to users and are a user's primary means of identification; it's a one-of-a-kind approach to user authentication, not used by any other service, program or web-site.
So, to conclude.. Yes, you need to worry about scrapers and DDOS attacks and data protection and a whole bunch of other stuff, but hiding ids will not properly solve any of those problems.
When I need a query string parameter to be able to identify a single row in a table, I normally add a GUID column to that table, and then pass the GUID in the query string instead of the row's primary key value.

What's a good approach for developing a simple serial number generator/verifier?

I'm working on an app I'd like to sell some day -- sooner rather than later! I'd like to develop a reasonably simple serial number scheme to protect it.
A simple number/letter combination not more than 25-30 alphanumeric characters long (think Microsoft product keys)
Does not require the user to enter any personal information (like an email address) as part of the verification
I've been thinking about this a (very little) bit, and I think public key cryptography is a good place to start. I could generate a string that identifies the license (like SKU + plain ole' integral serial number), hash it, encrypt it, and encode the serial number + identifier into a 25 digit (or so) alphanumeric key. The app would then decode the key into a serial number and "signature", generate an identifier hash, decrypt the "signature" using a corresponding public key and compare it against the generated identifier hash.
Essentially, the product key carries two pieces of data: the serial number the user claims to own plus a signature of sorts the program can use to verify that claim. I don't know if 25 alphanumeric characters (which encode 5 bits each for a realistic total of 120 bits) is enough for all this. But, it doesn't have to be cryptographically secure, just enough that the codes aren't easily guessable. I'm OK with short key lengths and short hashes.
As far as implementation goes, the app is written in Objective-C for Mac OS X, but given how easy it is to inject code into Cocoa apps, I'll probably write the verification code in straight C.
I would not use any strong cryptography, since you have to decrypt it in program anyways, making keygens or at least cracks easy to do.
I would do the following - take a, say, 25 digit number. Now add some rules, such as:
- number must be divisible by 31
- it must start and end with the last letter
...
Always generate keys using these rules. Use 20 rules or more (more the better). When deploying the app, use smaller number of rules, e.g. 10 to check if key is valid.
These rules will then be disassembled and used to create a keygen.
On every update enable one of the rules you didn't use before. If rules are selected correctly, you will disable most of keys generated by keygens.
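A toy Python sketch of the rule-set idea, using four rules where the answer suggests twenty or more. The specific rules are invented for the example and are not a recommendation.

```python
import random

# Hypothetical rule set; every issued key must satisfy all of them.
ALL_RULES = [
    lambda n: n % 31 == 0,
    lambda n: n % 17 == 3,
    lambda n: sum(map(int, str(n))) % 7 == 0,
    lambda n: (n // 1000) % 13 == 5,
]

def is_valid(key: int, enabled: int) -> bool:
    """Check only the first `enabled` rules, as a shipped build would."""
    return all(rule(key) for rule in ALL_RULES[:enabled])

def generate_key(rng: random.Random, digits: int = 25) -> int:
    """Rejection-sample a number of the given length that passes every rule."""
    low, high = 10 ** (digits - 1), 10 ** digits
    while True:
        candidate = rng.randrange(low, high)
        if is_valid(candidate, len(ALL_RULES)):
            return candidate

key = generate_key(random.SystemRandom())
print(key)
print("passes version 1.0 check (2 rules):", is_valid(key, 2))
print("passes version 1.1 check (3 rules):", is_valid(key, 3))
```

A keygen derived from the two rules shipped in version 1.0 would produce numbers that mostly fail once the third rule is enabled, while genuinely issued keys keep passing.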
I like @bh213's method, however it isn't going to prevent the key-gens from being fixed as you update your serial number rules.
On a more personal preference note, I prefer the key generator based on a rule set method because if hackers have to patch a binary, you stand to get a bad review on your software because of a bad hacker patch and the ensuing battle between you and hackers.
My preference is based on a universal software truth: Software can and will be hacked if it is popular enough, there is no scheme, no ingenious method that will prevent this from happening. This battle is between the developer who has limited resources and limited time and a hacker group with unlimited time on their hands.
Your key generation scheme is really only to keep the honest customers honest - it's easier to get a check cut from Accounts Payable than it is to get Security to sign off on a key generator.
Redbeard 0x0A is correct: CD keys are there to keep honest customers honest. It just needs to be slightly less difficult to buy your product than to find a keygen.
If you are selling your product online, then the best way to do it is to give them a file containing the serial. This way the serial can be however long you want and your paying customers don't have to waste time entering a serial.
A serial scheme can be very simple:
Have a large serial space (25 alphanumeric characters is about 10^44, but just use a serial file and do 80 chars x 16 rows for a key space of 10^2311)
Select a few rules to reduce valid serial number space to 100 times what you think you will sell in your wildest dreams
If your product has an online component (like how games have multiplayer online), you could further reduce valid serial numbers using a cryptographically strong random number generator (to select a subset of the rule based keys, product uses rule to check serial, server uses final true serial list). When your product requests service from your server, the server can check the serial.
All rules whether custom developed or encryption based can be found out and cracked. Think about who you're building your application for. Many of my products are for businesses so they're either going to buy it or they won't. They're typically not going to run a hacked version on their networks. People looking for a keygen aren't likely to purchase your product regardless. You just want to make sure you don't annoy your customer to the point where they're no longer wanting to buy your app.
That being said, I've written a library for this sort of thing to use in my applications based on AES encryption. I'm selling it for $25, and it uses a passphrase and a salt to make your serial number unique. If you're interested you can find it here: http://simpleserials.com
There is a Blog Post on sigpipe.macromates.com explaining how you use private/public key crypto for checking a serial number. It can verify that the user and Serialnumber match. (Signing/Verify). I would probably add some salt, just to be sure.
As this post is from 2004, you should consider the recommended keylength at keylength.com.