How do Keygens function? - key-generator

A keying algorithm used to register software and games. Let's say there is no server-side verification to get in the way...
I would like to know how these can be replicated in C Code so I can have a better understanding of creating my own keygens.
I'm currently a student learning Reverse Engineering and I am trying to reverse crackmes in order to build a keygen for the software.
What are the steps to creating my own keygen that I can debug and crack?
Are all keygens basically a sum of ASCII numbers?
Are there different variants of keygens?

The key generators you're thinking of work very similar to Cryptographic hash functions and often (if not checking with an online authority) verify against some files or checksums and the software just stops execution if the hash/key wasn't found or entered correctly.
You're getting your terminology mixed up speaking about "ASCII Numbers". Any data on a computer can be converted to numbers, and if chosen to convert blocks of 8 bits to numbers between 0-255 you'll always find a corresponding ASCII character. Often you will actually find that keys are encoded in hex, characters from 0-9 and A-F using 4 bits per character.
And, there are many.

Related

How data stored in Binary form?

I want to know the concept of data That how data is stored in a computer system in binary form. There is huge amount of data and how many combinations computer system make to store that data.
Binary is as the name suggests, only two points that are expressed in a number of ways depending on where you look but all represent the same meaning: -
1 or 0
1 or -1
+ or -
up or down
on or off
A computer does nothing more than put all them together very fast to make things, similar to Morse code. In Morse code a series of on and off makes up letters, letters make up words, sentences, paragraphs, chapters, books, libraries.
How is the data stored, its nothing more than binary, a piece of metal that is magnetized (or switched) to be + or - to represent one of the binary choices.
This was not the right question for this site and you should have searched and read up on this yourself.

Checking social security numbers and business identity codes in Access 97

In my work, I was asked to write an Access 97 program which checks whether a given social security number is valid or not. The problem is that I was unable to find any good site which gives the format of SSN's in different languages. Are there any such site or where can I find those formats? The another problem is that there are languages with different alphabets so is is possible to put for example Greek letters in Access database if my Windows is in Finnish and Access 97 is in English? Or do functions like len or mid works in different alphabets?
Also, I need an algorithm to check whether a given Business ID is valid in different languages.
Access 97? So last century.
SSN is an American-only program, so I don't see what Finnish has to do with it. Digits are digits in all languages.
If you're asking what other countries do for individual or tax ID numbers, that's a different issue.
I think a design that uses data like this for primary keys is doomed to failure. A lot of US states prohibit you from collecting SSN. What do you do for those people? Better to have a surrogate key and make SSN just another indexed column with a unique constraint.

What's a good approach for developing a simple serial number generator/verifier?

I'm working on an app I'd like to sell some day -- sooner rather than later! I'd like to develop a reasonably simple serial number scheme to protect it.
A simple number/letter combination not more than 25-30 alphanumeric characters long (think Microsoft product keys)
Does not require the user to enter any personal information (like an email address) as part of the verification
I've been thinking about this a (very little) bit, and I think public key cryptography is a good place to start. I could generate a string that identifies the license (like SKU + plain ole' integral serial number), hash it, encrypt it, and encode the serial number + identifier into a 25 digit (or so) alphanumeric key. The app would then decode the key into a serial number and "signature", generate an identifier hash, decrypt the "signature" using a corresponding public key and compare it against the generated identifier hash.
Essentially, the product key carries two pieces of data: the serial number the user claims to own plus a signature of sorts the program can use to verify that claim. I don't know if 25 alphanumeric characters (which encode 5 bits each for a realistic total of 120 bits) is enough for all this. But, it doesn't have to be cryptographically secure, just enough that the codes aren't easily guessable. I'm OK with short key lengths and short hashes.
As far as implementation goes, the app is written in Objective-C for Mac OS X, but given how easy it is to inject code into Cocoa apps, I'll probably write the verification code in straight C.
I would not use any strong cryptography, since you have to decrypt it in program anyways, making keygens or at least cracks easy to do.
I would do the following - take a, say, 25 digit number. Now add some rules, such as:
- number must be divisible by 31
- it must start and end with the last letter
...
Always generate keys using these rules. Use 20 rules or more (more the better). When deploying the app, use smaller number of rules, e.g. 10 to check if key is valid.
These rules will then be disassemled and used to create keygen.
On every update enable one of the rules you didn't use before. If rules are selected correctly, you will disable most of keys generated by keygens.
I like #bh213's method, however it isn't going to prevent the key-gens from being fixed as you update your serial number rules.
On a more personal preference note, I prefer the key generator based on a rule set method because if hackers have to patch a binary, you stand to get a bad review on your software because of a bad hacker patch and the ensuing battle between you and hackers.
My preference is based on a universal software truth: Software can and will be hacked if it is popular enough, there is no scheme, no ingenious method that will prevent this from happening. This battle is between the developer who has limited resources and limited time and a hacker group with unlimited time on their hands.
Your key generation scheme is really only to keep the honest customers honest - its easier to get a check cut from Accounts Payable than it is to get Security to sign off on a key generator.
As Redbeard 0x0A is correct, CD-Keys are to keep honest customers honest. It just needs to be slightly less difficult to buy your product than to find a keygen.
If you are selling your product online, then the best way to do it is to give them a file containing the serial. This way the serial can be however long you want and your paying customers don't have to waste time entering a serial.
A serial scheme can be very simple:
Have a large serial space (25 alpha numeric is about 1044, but just use a serial file and do 80 char x 16 rows for a key space of 102311)
Select a few rules to reduce valid serial number space to 100 times what you think you will sell in your wildest dreams
If your product has an online component (like how games have multiplayer online), you could further reduce valid serial numbers using a cryptographically strong random number generator (to select a subset of the rule based keys, product uses rule to check serial, server uses final true serial list). When your product requests service from your server, the server can check the serial.
All rules whether custom developed or encryption based can be found out and cracked. Think about who you're building your application for. Many of my products are for businesses so they're either going to buy it or they won't. They're typically not going to run a hacked version on their networks. People looking for a keygen aren't likely to purchase your product regardless. You just want to make sure you don't annoy your customer to the point where they're no longer wanting to buy your app.
That being said, I've written a library for this sort of thing to use in my applications based on AES encryption. I'm selling it for $25, and it uses a passphrase and a salt to make your serial number unique. If you're interested you can find it here: http://simpleserials.com
There is a Blog Post on sigpipe.macromates.com explaining how you use private/public key crypto for checking a serial number. It can verify that the user and Serialnumber match. (Signing/Verify). I would probably add some salt, just to be sure.
As this post is from 2004, you should consider the recommended keylength at keylength.com.

How do I do a fuzzy match of company names in MYSQL with PHP for auto-complete?

My users will import through cut and paste a large string that will contain company names.
I have an existing and growing MYSQL database of companies names, each with a unique company_id.
I want to be able to parse through the string and assign to each of the user-inputed company names a fuzzy match.
Right now, just doing a straight-up string match, is also slow. ** Will Soundex indexing be faster? How can I give the user some options as they are typing? **
For example, someone writes:
Microsoft -> Microsoft
Bare Essentials -> Bare Escentuals
Polycom, Inc. -> Polycom
I have found the following threads that seem similar to this question, but the poster has not approved and I'm not sure if their use-case is applicable:
How to find best fuzzy match for a string in a large string database
Matching inexact company names in Java
You can start with using SOUNDEX(), this will probably do for what you need (I picture an auto-suggestion box of already-existing alternatives for what the user is typing).
The drawbacks of SOUNDEX() are:
its inability to differentiate longer strings. Only the first few characters are taken into account, longer strings that diverge at the end generate the same SOUNDEX value
the fact the the first letter must be the same or you won't find a match easily. SQL Server has DIFFERENCE() function to tell you how much two SOUNDEX values are apart, but I think MySQL has nothing of that kind built in.
for MySQL, at least according to the docs, SOUNDEX is broken for unicode input
Example:
SELECT SOUNDEX('Microsoft')
SELECT SOUNDEX('Microsift')
SELECT SOUNDEX('Microsift Corporation')
SELECT SOUNDEX('Microsift Subsidary')
/* all of these return 'M262' */
For more advanced needs, I think you need to look at the Levenshtein distance (also called "edit distance") of two strings and work with a threshold. This is the more complex (=slower) solution, but it allows for greater flexibility.
Main drawback is, that you need both strings to calculate the distance between them. With SOUNDEX you can store a pre-calculated SOUNDEX in your table and compare/sort/group/filter on that. With the Levenshtein distance, you might find that the difference between "Microsoft" and "Nzcrosoft" is only 2, but it will take a lot more time to come to that result.
In any case, an example Levenshtein distance function for MySQL can be found at codejanitor.com: Levenshtein Distance as a MySQL Stored Function (Feb. 10th, 2007).
SOUNDEX is an OK algorithm for this, but there have been recent advances on this topic. Another algorithm was created called the Metaphone, and it was later revised to a Double Metaphone algorithm. I have personally used the java apache commons implementation of double metaphone and it is customizable and accurate.
They have implementations in lots of other languages on the wikipedia page for it, too. This question has been answered, but should you find any of the identified problems with the SOUNDEX appearing in your application, it's nice to know there are options. Sometimes it can generate the same code for two really different words. Double metaphone was created to help take care of that problem.
Stolen from wikipedia: http://en.wikipedia.org/wiki/Soundex
As a response to deficiencies in the
Soundex algorithm, Lawrence Philips
developed the Metaphone algorithm for
the same purpose. Philips later
developed an improvement to Metaphone,
which he called Double-Metaphone.
Double-Metaphone includes a much
larger encoding rule set than its
predecessor, handles a subset of
non-Latin characters, and returns a
primary and a secondary encoding to
account for different pronunciations
of a single word in English.
At the bottom of the double metaphone page, they have the implementations of it for all kinds of programming languages: http://en.wikipedia.org/wiki/Double-Metaphone
Python & MySQL implementation: https://github.com/AtomBoy/double-metaphone
Firstly, I would like to add that you should be very careful when using any form of Phonetic/Fuzzy Matching Algorithm, as this kind of logic is exactly that, Fuzzy or to put it more simply; potentially inaccurate. Especially true when used for matching company names.
A good approach is to seek corroboration from other data, such as address information, postal codes, tel numbers, Geo Coordinates etc. This will help confirm the probability of your data being accurately matched.
There are a whole range of issues related to B2B Data Matching too many to be addressed here, I have written more about Company Name Matching in my blog (also an updated article), but in summary the key issues are:
Looking at the whole string is unhelpful as the most important part
of a Company Name is not necessarily at the beginning of the Company
Name. i.e. ‘The Proctor and Gamble Company’ or ‘United States Federal
Reserve ‘
Abbreviations are common place in Company Names i.e. HP, GM, GE, P&G,
D&B etc..
Some companies deliberately spell their names incorrectly as part of
their branding and to differentiate themselves from other companies.
Matching exact data is easy, but matching non-exact data can be much more time consuming and I would suggest that you should consider how you will be validating the non-exact matches to ensure these are of acceptable quality.
Before we built Match2Lists.com, we used to spend an unhealthy amount of time validating fuzzy matches. In Match2Lists we incorporated a powerful Visualisation tool enabling us to review non-exact matches, this proved to be a real game changer in terms of match validation, reducing our costs and enabling us to deliver results much more quickly.
Best of Luck!!
Here's a link to the php discussion of the soundex functions in mysql and php. I'd start from there, then expand into your other not-so-well-defined requirements.
Your reference references the Levenshtein methodology for matching. Two problems. 1. It's more appropriate for measuring the difference between two known words, not for searching. 2. It discusses a solution designed more to detect things like proofing errors (using "Levenshtien" for "Levenshtein") rather than spelling errors (where the user doesn't know how to spell, say "Levenshtein" and types in "Levinstein". I usually associate it with looking for a phrase in a book rather than a key value in a database.
EDIT: In response to comment--
Can you at least get the users to put the company names into multiple text boxes; 2. or use an unambigous name delimiter (say backslash); 3. leave out articles ("The") and generic abbreviations (or you can filter for these); 4. Squoosh the spaces out and match for that also (so Micro Soft => microsoft, Bare Essentials => bareessentials); 5. Filter out punctuation; 6. Do "OR" searches on words ("bare" OR "essentials") - people will inevitably leave one or the other out sometimes.
Test like mad and use the feedback loop from users.
the best function for fuzzy matching is levenshtein. it's traditionally used by spell checkers, so that might be the way to go. there's a UDF for it available here: http://joshdrew.com/
the downside to using levenshtein is that it won't scale very well. a better idea might be to dump the whole table in to a spell checker custom dictionary file and do the suggestion from your application tier instead of the database tier.
This answer results in indexed lookup of almost any entity using input of 2 or 3 characters or more.
Basically, create a new table with 2 columns, word and key. Run a process on the original table containing the column to be fuzzy searched. This process will extract every individual word from the original column and write these words to the word table along with the original key. During this process, commonly occurring words like 'the','and', etc should be discarded.
We then create several indices on the word table, as follows...
A normal, lowercase index on word + key
An index on the 2nd through 5th character + key
An index on the 3rd through 6th character + key
Alternately, create a SOUNDEX() index on the word column.
Once this is in place, we take any user input and search using normal word = input or LIKE input%. We never do a LIKE %input as we are always looking for a match on any of the first 3 characters, which are all indexed.
If your original table is massive, you could partition the word table by chunks of the alphabet to ensure the user's input is being narrowed down to candidate rows immediately.
Though the question asks about how to do fuzzy searches in MySQL, I'd recommend considering using a separate fuzzy search (aka typo tolerant) engine to accomplish this. Here are some search engines to consider:
ElasticSearch (Open source, has a ton of features, and so is also complex to operate)
Algolia (Proprietary, but has great docs and super easy to get up and running)
Typesense (Open source, provides the same fuzzy search-as-you-type feature as Algolia)
Check if it's spelled wrong before querying using a trusted and well tested spell checking library on the server side, then do a simple query for the original text AND the first suggested correct spelling (if spell check determined it was misspelled).
You can create custom dictionaries for any spell check library worth using, which you may need to do for matching more obscure company names.
It's way faster to match against two simple strings than it is to do a Levenshtein distance calculation against an entire table. MySQL is not well suited for this.
I tackled a similar problem recently and wasted a lot of time fiddling around with algorithms, so I really wish there had been more people out there cautioning against doing this in MySQL.
Probably been suggested before but why not dump the data out to Excel and use the Fuzzy Match Excel plugin. This will give a score from 0 to 1 (1 being 100%).
I did this for business partner (company) data that was held in a database.
Download the latest UK Companies House data and score against that.
For ROW data its more complex as we had to do a more manual process.

1-1 mappings for id obfuscation

I'm using sequential ids as primary keys and there are cases where I don't want those ids to be visible to users, for example I might want to avoid urls like ?invoice_id=1234 that allow users to guess how many invoices the system as a whole is issuing.
I could add a database field with a GUID or something conjured up from hash functions, random strings and/or numeric base conversions, but schemes of that kind have three issues that I find annoying:
Having to allocate the extra database field. I know I could use the GUID as my primary key, but my auto-increment integer PK's are the right thing for most purposes, and I don't want to change that.
Having to think about the possibility of hash/GUID collisions. I give my full assent to all the arguments about GUID collisions being as likely as spontaneous combustion or whatever, but disregarding exceptional cases because they're exceptional goes against everything else I've been taught, and it continues to bother me even when I know I should be more bothered about other things.
I don't know how to safely trim hash-based identifiers, so even if my private ids are 16 or 32 bits, I'm stuck with 128 bit generated identifiers that are a nuisance in urls.
I'm interested in 1-1 mappings of an id range, stretchable or shrinkable so that for example 16-bit ids are mapped to 16 bit ids, 32 bit ids mapped to 32 bit ids, etc, and that would stop somebody from trying to guess the total number of ids allocated or the rate of id allocation over a period.
For example, if my user ids are 16 bit integers (0..65535), then an example of a transformation that somewhat obfuscates the id allocation is the function f(x) = (x mult 1001) mod 65536. The internal id sequence of 1, 2, 3 becomes the public id sequence of 1001, 2002, 3003. With a further layer of obfuscation from base conversion, for example to base 36, the sequence becomes 'rt', '1jm', '2bf'. When the system gets a request to the url ?userid=2bf, it converts from base 36 to get 3003 and it applies the inverse transformation g(x) = (x mult 1113) mod 65536 to get back to the internal id=3.
A scheme of that kind is enough to stop casual observation by casual users, but it's easily solvable by someone who's interested enough to try to puzzle it through. Can anyone suggest something that's a bit stronger, but is easily implementable in say PHP without special libraries? This is getting close to a roll-your-own encryption scheme, so maybe there is a proper encryption algorithm that's widely available and has the stretchability property mentioned above?
EDIT: Stepping back a little bit, some discussion at codinghorror about choosing from three kinds of keys - surrogate (guid-based), surrogate (integer-based), natural. In those terms, I'm trying to hide an integer surrogate key from users but I'm looking for something shrinkable that makes urls that aren't too long, which I don't know how to do with the standard 128-bit GUID. Sometimes, as commenter Princess suggests below, the issue can be sidestepped with a natural key.
EDIT 2/SUMMARY:
Given the constraints of the question I asked (stretchability, reversibility, ease of implementation), the most suitable solution so far seems to be the XOR-based obfuscation suggested by Someone and Breton.
It would be irresponsible of me to assume that I can achieve anything more than obfuscation/security by obscurity. The knowledge that it's an integer sequence is probably a crib that any competent attacker would be able to take advantage of.
I've given some more thought to the idea of the extra database field. One advantage of the extra field is that it makes it a lot more straightforward for future programmers who are trying to familiarise themselves with the system by looking at the database. Otherwise they'd have to dig through the source code (or documentation, ahem) to work out how a request to a given url is resolved to a given record in the database.
If I allow the extra database field, then some of the other assumptions in the question become irrelevant (for example the transformation doesn't need to be reversible). That becomes a different question, so I'll leave it there.
I find that simple XOR encryption is best suited for URL obfuscation. You can continue using whatever serial number you are using without change. Further XOR encryption doesn't increase the length of source string. If your text is 22 bytes, the encrypted string will be 22 bytes too. It's not easy enough as to be guessed like rot 13 but not heavy weight like DSE/RSA.
Search the net for PHP XOR encryption to find some implementation. The first one I found is here.
I've toyed with this sort of thing myself, in my amateurish way, and arrived at a kind of kooky number scrambling algorithm, involving mixed radices. Basically I have a function that maps a number between 0-N to another number in the 0-N range. For URLS I then map that number to a couple of english words. (words are easier to remember).
A simplified version of what I do, without mixed radices: You have a number that is 32 bits, so ahead of time, have a passkey which is 32-bits long, and XOR the passkey with your input number. Then shuffle the bits around in a determinate reordering. (possibly based on your passkey).
The nice thing about this is
No collisions, as long as you shuffle and xor the same way each time
No need to store the obfuscated keys in the database
Still use your ordered IDS internally, since you can reverse the obfuscation
You can repeat the operation several times to get more obfuscated results.
if you're up for the mixed radix version, it's basically the same, except that I add the steps of converting the input to a mixed raddix number, using the maximum range's prime factors as the digit's bases. Then I shuffle the digits around, keeping the bases with the digits, and turn it back into a standard integer.
You might find it useful to revisit the idea of using a GUID, because you can construct GUIDs in a way that isn't subject to collision.
Check out the Wikipedia page on GUIDs - the "Type 1" algorithm uses both the MAC address of the PC, and the current date/time as inputs. This guarantees that collisions are simply impossible.
Alternatively, if you create a GUID column in your database as an alternative-key (keep using your auto-increment primary keys), define it as unique. Then, if your GUID generation approach does give a duplicate, you'll get an appropriate error on insert that you can handle.
I saw this question yesterday: how reddit generates an alphanum id
I think it's a reasonably good method (and particularily clever)
it uses Python
def to_base(q, alphabet):
if q < 0: raise ValueError, "must supply a positive integer"
l = len(alphabet)
converted = []
while q != 0:
q, r = divmod(q, l)
converted.insert(0, alphabet[r])
return "".join(converted) or '0'
def to36(q):
return to_base(q, '0123456789abcdefghijklmnopqrstuvwxyz')
Add a char(10) field to your order table... call it 'order_number'.
After you create a new order, randomly generate an integer from 1...9999999999. Check to see if it exists in the database under 'order_number'. If not, update your latest row with this value. If it does exist, pick another number at random.
Use 'order_number' for publicly viewable URLs, maybe always padded with zeros.
There's a race condition concern for when two threads attempt to add the same number at the same time... you could do a table lock if you were really concerned, but that's a big hammer. Add a second check after updating, re-select to ensure it's unique. Call recursively until you get a unique entry. Dwell for a random number of milliseconds between calls, and use the current time as a seed for the random number generator.
Swiped from here.
UPDATED As with using the GUID aproach described by Bevan, if the column is constrained as unique, then you don't have to sweat it. I guess this is no different that using a GUID, except that the customer and Customer Service will have an easier time referring to the order.
I've found a much simpler way. Say you want to map N digits, pseudorandomly to N digits. you find the next highest prime from N, and you make your function
prandmap(x) return x * nextPrime(N) % N
this will produce a function that repeats (or has a period) every N, no number is produced twice until x=N+1. It always starts at 0, but is pseudorandom thereafter.
I honestly thing encrypting/decrypting query string data is a bad approach to this problem. The easiest solution is sending data using POST instead of GET. If users are clicking on links with querystring data, you have to resort to some javascript hacks to send data by POST (keep accessibility in mind for users with Javascript turned off). This doesn't prevent users from viewing source, but at the very least it keeps sensitive from being indexed by search engines, assuming the data you're trying to hide really that sensitive in the first place.
Another approach is to use a natural unique key. For example, if you're issuing invoices to customers on a monthly basis, then "yyyyMM[customerID]" uniquely identifies a particular invoice for a particular user.
From your description, personally, I would start off by working with whatever standard encryption library is available (I'm a Java programmer, but I assume, say, a basic AES encryption library must be available for PHP):
on the database, just key things as you normally would
whenever you need to transmit a key to/from a client, use a fairly strong, standard encryption system (e.g. AES) to convert the key to/from a string of garbage. As your plain text, use a (say) 128-byte buffer containing: a (say) 4-byte key, 60 random bytes, and then a 64-byte medium-quality hash of the previous 64 bytes (see Numerical Recipes for an example)-- obviously when you receive such a string, you decrypt it then check if the hash matches before hitting the DB. If you're being a bit more paranoid, send an AES-encrypted buffer of random bytes with your key in an arbitrary position, plus a secure hash of that buffer as a separate parameter. The first option is probably a reasonable tradeoff between performance and security for your purposes, though, especially when combined with other security measures.
the day that you're processing so many invoices a second that AES encrypting them in transit is too performance expensive, go out and buy yourself a big fat server with lots of CPUs to celebrate.
Also, if you want to hide that the variable is an invoice ID, you might consider calling it something other than "invoice_id".