Algorithm to generate all possible letter combinations of given string down to 2 letters - actionscript-3

Trying to create an Anagram solver in AS3, such as this one found here:
http://homepage.ntlworld.com/adam.bozon/anagramsolver.htm
I'm having a problem wrapping my brain around generating all possible letter combinations for the various lengths of strings. If I was only generating permutations for a fixed length, it wouldn't be such a problem for me... but I'm looking to reduce the length of the string and obtain all the possible permutations from the original set of letters for a string with a max length smaller than the original string. For example, say I want a string length of 2, yet I have a 3 letter string of “abc”, the output would be: ab ac ba bc ca cb.
Ideally the algorithm would produce a complete list of possible combinations starting with the original string length, down to the smallest string length of 2. I have a feeling there is probably a small recursive algorithm to do this, but can't wrap my brain around it. I'm working in AS3.
Thanks!

For the purpose of writing an anagram solver the kind of which you linked, the algorithm that you are requesting is not necessary. It is also VERY expensive.
Let's look at a 6-letter word like MONKEY, for example. All 6 letters of the word are different, so you would create:
6*5*4*3*2*1 different 6-letter words
6*5*4*3*2 different 5-letter words
6*5*4*3 different 4-letter words
6*5*4 different 3-letter words
6*5 different 2-letter words
For a total of 1950 words
Now, presumably you're not trying to spit out all 1950 words (e.g. 'OEYKMN') as anagrams (which they are, but most of them are also gibberish). I'm guessing you have a dictionary of legal English words, and you just want to check if any of those words are anagrams of the query word, with the option of not using all letters.
If that is the case, then the problem is simple.
To determine if 2 words are anagrams of each other, all you need to do is count how many times each letter is used, and compare these numbers!
Let's restrict ourselves to only 26 letters A-Z, case insensitive. What you need to do is write a function countLetters that takes a word and returns an array of 26 numbers. The first number in the array corresponds to the count of the letter A in the word, the second number corresponds to the count of B, etc.
Then, two words W1 and W2 are exact anagrams if countLetters(W1)[i] == countLetters(W2)[i] for every i! That is, each word uses each letter the exact same number of times!
For what I'd call sub-anagrams (MONEY is a sub-anagram of MONKEY), W1 is a sub-anagram of W2 if countLetters(W1)[i] <= countLetters(W2)[i] for every i! That is, the sub-anagram may use fewer of certain letters, but not more!
(note: MONKEY is also a sub-anagram of MONKEY).
This should give you a fast enough algorithm, where given a query string, all you need to do is read through the dictionary once, comparing the letter count array of each word against the letter count array of the query word. You can do some minor optimizations, but this should be good enough.
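The dictionary scan described above can be sketched roughly as follows. This is a minimal Python illustration, not AS3; the names count_letters, is_sub_anagram and find_sub_anagrams, and the min_len parameter, are made up for the example, and the dictionary is assumed to be a plain list of words.

```python
def count_letters(word):
    """Return 26 counts: index 0 is A, index 1 is B, and so on (case insensitive)."""
    counts = [0] * 26
    for ch in word.upper():
        if "A" <= ch <= "Z":          # anything outside A-Z is ignored
            counts[ord(ch) - ord("A")] += 1
    return counts


def is_sub_anagram(candidate, query):
    """True if candidate uses no letter more often than query does."""
    query_counts = count_letters(query)
    return all(c <= q for c, q in zip(count_letters(candidate), query_counts))


def find_sub_anagrams(query, dictionary, min_len=2):
    """One pass over the dictionary, keeping words that fit inside query."""
    return [w for w in dictionary if len(w) >= min_len and is_sub_anagram(w, query)]
```

For example, find_sub_anagrams("monkey", ["money", "key", "donkey", "on"]) keeps "money", "key" and "on" but rejects "donkey", which needs a D.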
Alternatively, if you want utmost performance, you can preprocess the dictionary (which is known in advance) and create a directed acyclic graph of sub-anagram relationship.
Here's a portion of such a graph for illustration:
D=1,G=1,O=1 ----------> D=1,O=1
 {dog,god}  \            {do,od}
             \
              \-------> G=1,O=1
                         {go}
Basically each node is a bucket for all words that have the same letter count array (i.e. they're exact anagrams). Then there's an edge from N1 to N2 if N2's array is <= (as defined above) N1's array (you can perform transitive reduction to store the least amount of edges).
Then to list all sub-anagrams of a word, all you have to do is find the node corresponding to its letter count array, and recursively explore all nodes reachable from that node. All their buckets would contain the sub-anagrams.

The following js code will find all possible "words" in an n letter word. Of course this doesn't mean that they are real words but does give you all the combinations. On my machine it takes about 0.4 seconds for a 7 letter word and 15 secs for a 9 letter word (up to almost a million possibilities if no repeated letters). However those times include looking in a dictionary and finding which are real words.
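var getWordsNew = function (masterword) {
    var result = {};
    var a = masterword.split("");
    var l = a.length;
    // used[i] is true while letter i is part of the current key.
    // (Tracking indices in a string breaks for words over 10 letters,
    // because index "10" would match a used "1" followed by "0".)
    var used = [];

    // Extend key by every unused letter, recording each new string.
    function nextLetter(key) {
        var i;
        if (key.length == l) {
            return;
        }
        for (i = 0; i < l; i++) {
            if (!used[i]) {
                used[i] = true;
                result[key + a[i]] = "";
                nextLetter(key + a[i]);
                used[i] = false;
            }
        }
    }

    nextLetter("");
    return result;
};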
Complete code at
Code for finding words in words

You want a sort of arrangements. If you're familiar with the permutation algorithm then you know you have a check to see when you've generated enough numbers. Just change that limit:
I don't know AS3, but here's some pseudocode:
st = an array
Arrangements(LettersInYourWord, MinimumLettersInArrangement, k = 1)
    if ( k > MinimumLettersInArrangement )
    {
        print st;
    }
    if ( k > LettersInYourWord )
        return;
    for ( each position i in your word that hasn't been used before )
    {
        st[k] = YourWord[i];
        Arrangements(<same>, <same>, k + 1);
    }
for "abc" and Arrangements(3, 2, 1); this will print:
ab
abc
ac
acb
...
If you want those with three first, and then those with two, consider this:
st = an array
Arrangements(LettersInYourWord, DesiredLettersInArrangement, k = 1)
    if ( k > DesiredLettersInArrangement )
    {
        print st;
        return;
    }
    for ( each position i in your word that hasn't been used before )
    {
        st[k] = YourWord[i];
        Arrangements(<same>, <same>, k + 1);
    }
Then for "abc" call Arrangements(3, 3, 1); and then Arrangements(3, 2, 1);
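As a rough runnable version of the second pseudocode (in Python rather than AS3), here is one way to track which positions have been used. The names arrangements, all_arrangements and min_len are illustrative and not from the answer above.

```python
def arrangements(word, length):
    """Yield every ordering of `length` distinct positions of `word`."""
    used = [False] * len(word)   # which positions are already in st
    st = []                      # the arrangement built so far

    def recurse():
        if len(st) == length:
            yield "".join(st)
            return
        for i, ch in enumerate(word):
            if not used[i]:
                used[i] = True
                st.append(ch)
                yield from recurse()
                st.pop()
                used[i] = False

    yield from recurse()


def all_arrangements(word, min_len=2):
    """Longest arrangements first, down to min_len letters."""
    for length in range(len(word), min_len - 1, -1):
        yield from arrangements(word, length)
```

list(arrangements("abc", 2)) gives ab, ac, ba, bc, ca, cb, matching the output asked for in the question. Positions are tracked rather than letters, so a word with repeated letters produces repeated strings.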

You can generate all words in an alphabet by finding all paths in a complete graph of the letters. You can find all paths in that graph by doing a depth first search from each letter and returning the current path at each point.

There is a simple O(N) solution, where N is the size of the vocabulary.
Just sort the letters in each word in the vocabulary or, better, create a binary mask of them, and then compare with the letters you have.

Related

Generate unique serial from id number

I have a database that increases id incrementally. I need a function that converts that id to a unique number between 0 and 1000. (the actual max is much larger but just for simplicity's sake.)
1 => 3301,
2 => 0234,
3 => 7928,
4 => 9821
The number generated cannot have duplicates.
It can not be incremental.
Need it generated on the fly (not create a table of uniform numbers to read from)
I thought a hash function but there is a possibility for collisions.
Random numbers could also have duplicates.
I need a minimal perfect hash function but cannot find a simple solution.
Since the criteria are sort of vague (good enough to fool the average person), I am unsure exactly which route to take. Here are some ideas:
You could use a Pearson hash. According to the Wikipedia page:
Given a small, privileged set of inputs (e.g., reserved words for a compiler), the permutation table can be adjusted so that those inputs yield distinct hash values, producing what is called a perfect hash function.
You could just use a complicated looking one-to-one mathematical function. The drawback of this is that it would be difficult to make one that was not strictly increasing or strictly decreasing due to the one-to-one requirement. If you did something like (id ^ 2) + id * 2, the interval between ids would change and it wouldn't be immediately obvious what the function was without knowing the original ids.
You could do something like this:
new_id = (old_id << 4) + arbitrary_4bit_hash(old_id);
This would give unique IDs, and it wouldn't be immediately obvious that the lowest 4 bits are just garbage (especially when reading the numbers in decimal format). Like the last option, the new IDs would be in the same order as the old ones. I don't know if that would be a problem.
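A small sketch of that shift-and-pad idea, in Python for brevity. The mixing function here is an arbitrary stand-in chosen for illustration (any function returning 0 to 15 would do), and recover_id is an added helper showing that the mapping is reversible.

```python
def arbitrary_4bit_hash(old_id):
    # Hypothetical mixer: multiply by a large odd constant and keep 4 bits.
    return ((old_id * 2654435761) >> 13) & 0xF


def obscure_id(old_id):
    # The real ID sits in the high bits; the low 4 bits are filler.
    return (old_id << 4) + arbitrary_4bit_hash(old_id)


def recover_id(new_id):
    # Dropping the filler bits gives the original ID back.
    return new_id >> 4
```

Uniqueness comes entirely from the shifted ID, so the filler can be anything deterministic.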
You could just hardcode all ID conversions by making a lookup array full of "random" numbers.
You could use some kind of hash function generator like gperf.
GNU gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only.
You could encrypt the ids with a key using a cryptographically secure mechanism.
Hopefully one of these works for you.
Update
Here is the rotational shift the OP requested:
function map($number)
{
    // Rotate within 10 bits: move the high 7 bits down to the low end
    // and the low 3 bits up to the high end.
    // Masking to 10 bits gives a unique mapping from 0-1023 to 0-1023.
    $high_bits = 0b0000001111111000 & $number;
    $new_low_bits = $high_bits >> 3;
    $low_bits = 0b0000000000000111 & $number;
    $new_high_bits = $low_bits << 7;
    // Recombine bits
    $new_number = $new_high_bits | $new_low_bits;
    return $new_number;
}
function demap($number)
{
    // Undo map(): move the high 3 bits back down to the low end
    // and the low 7 bits back up to the high end.
    $high_bits = 0b0000001110000000 & $number;
    $new_low_bits = $high_bits >> 7;
    $low_bits = 0b0000000001111111 & $number;
    $new_high_bits = $low_bits << 3;
    // Recombine bits
    $new_number = $new_high_bits | $new_low_bits;
    return $new_number;
}
This method has its advantages and disadvantages. The main disadvantage that I can think of (besides the security aspect) is that for lower IDs consecutive numbers will be exactly the same (multiplicative) interval apart until digits start wrapping around. That is to say
map(1) * 2 == map(2)
map(1) * 3 == map(3)
This happens, of course, because with lower numbers, all the higher bits are 0, so the map function is equivalent to just shifting. This is why I suggested using pseudo-random data for the lower bits rather than the higher bits of the number. It would make the regular interval less noticeable. To help mitigate this problem, the function I wrote shifts only the first 3 bits and rotates the rest. By doing this, the regular interval will be less noticeable for all IDs greater than 7.
It seems that it doesn't have to be numerical? What about an MD5-Hash?
select md5(id+rand(10000)) from ...

How to convert a QuadTree Cell's Spatial Index (Binary Index) to Position and Dimension values?

Sorry in advance for misusing any terminology in this question, but basically I'm looking into creating a QuadTree that makes use of Binary Indexing, like this:
As you can see in the two illustrations above, if each cell is given a binary ID (ex: 1010, 1011), then every ODD binary index controls the X offset and every EVEN binary index controls the Y offset.
For example, in the case of the Level 2 grid (16 cells), 1010 (cell #10) could be said to have 1s at its 4th and 2nd index, therefore those would perform two Y offsets. The first '1###' (on the leftmost side) would indicate an offset of one cell-height, then the second '##1#' would additionally offset it by twice the cell height.
As in:
// If Cell Height = 64pixels
1### = 64 pixels
+ ##1# = 128 pixels
__________________
1#1# = 192 pixels
The same can be applied to the X axis, only it uses the odd numbers instead (ex: #1#1).
Now, when I initialize my QuadTree, I began calculating the maximum nodes it may contain if all cells and all depths are used. I have calculated this with the sum of 4 to the power of each depths:
_totalNodes = 0;
var t:int=0, tLen:int=_maxLevels;
for (; t<tLen; t++) {
    _totalNodes += Math.pow(4, t); //Adds 1, 4, 16, 64, 256, etc...
}
Then, I create another loop (iterating from 0 to _totalNodes) which instantiates the nodes and stores them in a long array. It passes the current iteration integer to the Node constructor, which stores it as its index.
So far I've been able to determine which depth (aka: Level) the Node would be stored in by figuring out its index's Most Significant Bit:
public static function MSB( pValue:uint ):int {
    var bits:int = 0;
    while ( pValue >>= 1) {
        bits++;
    }
    return bits;
}
But now, I'm stuck trying to figure out how to convert the index from binary form to actual Cell X and Y positions. Like I said above, the dimensions of each cell are found. It's just a matter of doing some logical operations on the whole index (or "bit-code", as I refer to it in my code).
If you know of a good example that uses logical-operations (binary level) to convert the binary index values to X and Y positions, could you please post a link or explanation here?
Thanks!
Here's a reference where I got this idea from (note: different programming language):
L. Spiro Engine - http://lspiroengine.com/?p=530
I'm not familiar with the language used in that article though, so I can't really follow it and convert it easily to ActionScript 3.0.
Your task is described by Hanan Samet.
This works by first building the quadtree, and then assigning to each quad cell the corresponding Morton code (bit-interleaving code).
Once you have the code, you assign it to the objects in the quad. Then you can delete the quadtree. You can then search by converting a coordinate to the corresponding Morton code and doing a binary search on the Morton index. Instead of Morton (also called Z-order) you can also use Hilbert or Gray codes.
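To make the bit interleaving concrete, here is a small Python sketch of decoding a Morton code back into a cell position. It assumes the conventional Z-order layout, where bits 0, 2, 4, ... (counting from the right) form the column and bits 1, 3, 5, ... form the row, with more significant bits selecting larger offsets. It also assumes the code is local to one level rather than an index into one long array of all levels. The names morton_decode, cell_rect, levels and world_size are illustrative only.

```python
def morton_decode(code, levels):
    """Split an interleaved code into (column, row) for a grid `levels` deep."""
    x = y = 0
    for i in range(levels):
        x |= ((code >> (2 * i)) & 1) << i        # even bit positions -> column
        y |= ((code >> (2 * i + 1)) & 1) << i    # odd bit positions  -> row
    return x, y


def cell_rect(code, levels, world_size):
    """Return (x, y, size) of the cell, for a square world of `world_size` units."""
    cell_size = world_size / (1 << levels)       # 2**levels cells per side
    col, row = morton_decode(code, levels)
    return col * cell_size, row * cell_size, cell_size
```

With a 256-unit world and 2 levels (64-unit cells), code 1010 decodes to column 0, row 3, i.e. a Y offset of 192, which agrees with the total in the question's worked example.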

What is the probability of collision with a 6 digit random alphanumeric code?

I'm using the following perl code to generate random alphanumeric strings (uppercase letters and numbers, only) to use as unique identifiers for records in my MySQL database. The database is likely to stay under 1,000,000 rows, but the absolute realistic maximum would be around 3,000,000. Do I have a dangerous chance of 2 records having the same random code, or is it likely to happen an insignificantly small number of times? I know very little about probability (if that isn't already abundantly clear from the nature of this question) and would love someone's input.
perl -le 'print map { ("A".."Z", 0..9)[rand 36] } 1..6'
Because of the Birthday Paradox it's more likely than you might think.
There are 2,176,782,336 possible codes, but even inserting just 50,000 rows there is already a quite high chance of a collision. For 1,000,000 rows it is almost inevitable that there will be many collisions (I think about 250 on average).
I ran a few tests and this is the number of codes I could generate before the first collision occurred:
73366
59307
79297
36909
Collisions will become more frequent as the number of codes increases.
Here was my test code (written in Python):
>>> import random
>>> codes = set()
>>> while 1:
...     code = ''.join(random.choice('1234567890qwertyuiopasdfghjklzxcvbnm') for x in range(6))
...     if code in codes: break
...     codes.add(code)
...
>>> len(codes)
36909
Well, you have 36**6 possible codes, which is about 2 billion. Call this d. Using a formula found here, we find that the probability of a collision, for n codes, is approximately
1 - ((d-1)/d)**(n*(n-1)/2)
For any n over 50,000 or so, that's pretty high.
Looks like a 10-character code has a collision probability of only about 1/800 at 3,000,000 rows. So go with 10 or more.
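The approximation above is easy to evaluate directly. A short Python sketch, using log1p/expm1 so the tiny 1/d term does not get lost to rounding; the function name collision_probability is just for this example.

```python
import math


def collision_probability(n, d):
    """Approximate chance of at least one collision among n random codes out of d."""
    pairs = n * (n - 1) / 2
    # 1 - ((d-1)/d)**pairs, rewritten as -expm1(pairs * log(1 - 1/d))
    return -math.expm1(pairs * math.log1p(-1.0 / d))


six_chars = 36 ** 6     # 2,176,782,336 possible codes
ten_chars = 36 ** 10

print(collision_probability(55_000, six_chars))      # close to one half
print(collision_probability(1_000_000, six_chars))   # essentially certain
print(collision_probability(3_000_000, ten_chars))   # on the order of 1/800
```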
Based on the equations given at http://en.wikipedia.org/wiki/Birthday_paradox#Approximation_of_number_of_people, there is a 50% chance of encountering at least one collision after inserting only 55,000 records or so into a universe of this size:
http://wolfr.am/niaHIF
Trying to insert two to six times as many records will almost certainly lead to a collision. You'll need to assign codes nonrandomly, or use a larger code.
As mentioned previously, the birthday paradox makes this event quite likely. In particular, an accurate approximation can be determined when the problem is cast as a collision problem. Let p(n; d) be the probability that at least two numbers are the same, d be the number of combinations and n the number of trials. Then, we can show that p(n; d) is approximately equal to:
1 - ((d-1)/d)^(n*(n-1)/2)
We can easily plot this in R:
> d = 2176782336
> n = 1:100000
> plot(n,1 - ((d-1)/d)^(n*(n-1)/2), type='l')
which gives
As you can see the collision probability increases very quickly with the number of trials/rows
While I don't know the specifics of exactly how you want to use these pseudo-random IDs, you may want to consider generating an array of 3000000 integers (from 1 to 3000000) and randomly shuffling it. That would guarantee that the numbers are unique.
See Fisher-Yates shuffle on Wikipedia.
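A minimal sketch of that shuffle in Python. The function name shuffled_ids and the rng parameter are illustrative; in practice the standard library's random.shuffle does the same job.

```python
import random


def shuffled_ids(count, rng=random):
    """Return the integers 1..count in random order (Fisher-Yates shuffle)."""
    ids = list(range(1, count + 1))
    # Walk from the end, swapping each slot with a random earlier-or-same slot.
    for i in range(count - 1, 0, -1):
        j = rng.randint(0, i)        # randint is inclusive at both ends
        ids[i], ids[j] = ids[j], ids[i]
    return ids
```

Every value appears exactly once, so handing out ids[0], ids[1], ... in order can never produce a duplicate.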
A caution: Beware of relying on the built-in rand where the quality of the pseudo random number generator matters. I recently found out about Math::Random::MT::Auto:
The Mersenne Twister is a fast pseudorandom number generator (PRNG) that is capable of providing large volumes (> 10^6004) of "high quality" pseudorandom data to applications that may exhaust available "truly" random data sources or system-provided PRNGs such as rand.
The module provides a drop in replacement for rand which is handy.
You can generate the sequence of keys with the following code:
#!/usr/bin/env perl
use warnings; use strict;
use Math::Random::MT::Auto qw( rand );
my $SEQUENCE_LENGTH = 1_000_000;
my %dict;
my $picks;
for my $i (1 .. $SEQUENCE_LENGTH) {
    my $pick = pick_one();
    $picks += 1;
    redo if exists $dict{ $pick };
    $dict{ $pick } = undef;
}
printf "Generated %d keys with %d picks\n", scalar keys %dict, $picks;
sub pick_one {
    join '', map { ("A".."Z", 0..9)[rand 36] } 1..6;
}
Some time ago, I wrote about the limited range of built-in rand on Windows. You may not be on Windows, but there might be other limitations or pitfalls on your system.

Generating unique codes that are different in two digits

I want to generate unique code numbers (composed of 7 digits exactly). The code number is generated randomly and saved in MySQL table.
I have another requirement. All generated codes should differ in at least two digits. This is useful to prevent errors while typing the user code. Hopefully, it will prevent referring to another user code while doing some operations as it is more unlikely to miss two digits and match another existing user code.
The generation algorithm works simply like this:
1. Retrieve all previous codes, if any, from the MySQL table.
2. Generate one code at a time.
3. Subtract the generated code from all previous codes.
4. Check the number of non-zero digits in the subtraction result.
5. If it is > 1, accept the generated code and add it to the previous codes.
6. Otherwise, jump to 2.
7. Repeat steps 2 to 6 for the number of requested codes.
8. Save the generated codes in the DB table.
The algorithm works fine, but the problem is related to performance. It takes a very long time to finish generating the codes when requesting a large number of codes, like 10,000.
The question: Is there any way to improve the performance of this algorithm?
I am using perl + MySQL on Ubuntu server if that matters.
Have you considered a variant of the Luhn algorithm? Luhn is used to generate a check digit for strings of numbers in lots of applications, including credit card account numbers. It's part of the ISO/IEC 7812-1 standard for generating identifiers. It will catch any number that is entered with one incorrect digit, which implies any two valid numbers differ in at least two digits.
Check out Algorithm::LUHN in CPAN for a perl implementation.
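For illustration, here is the check-digit idea sketched in Python rather than Perl. It builds a 7-digit code from a 6-digit payload plus one Luhn digit; the names luhn_check_digit and make_code are made up for this example and are not the Algorithm::LUHN API.

```python
def luhn_check_digit(payload):
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled first.
    for pos, ch in enumerate(reversed(payload)):
        d = int(ch)
        if pos % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def make_code(serial):
    """Turn a number below 1,000,000 into a 7-digit code ending in a check digit."""
    payload = f"{serial:06d}"
    return payload + str(luhn_check_digit(payload))
```

Because a single wrong digit always changes the checksum, no two codes built this way can differ in exactly one position, and no comparison against previously issued codes is needed.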
Don't retrieve the existing codes, just generate a potential new code and see if there are any conflicting ones in the database:
SELECT code FROM table WHERE abs(code-?) regexp '^[1-9]?0*$';
(where the placeholder is the newly generated code).
Ah, I missed the generating lots of codes at once part. Do it like this (completely untested):
my @codes = existing_codes();
my $frontwards_index = {};
my $backwards_index = {};
for my $code (@codes) {
    index_code($code, $frontwards_index);
    index_code(scalar reverse($code), $backwards_index);
}
my @new_codes = map generate_code($frontwards_index, $backwards_index), 1..10000;

# File each code under its first half, so a lookup only scans one bucket.
sub index_code {
    my ($code, $index) = @_;
    push @{ $index->{ substr($code, 0, length($code)/2) } }, $code;
    return;
}

# Count codes in the matching bucket that differ from $code in at most one digit.
sub check_index {
    my ($code, $index) = @_;
    my $bucket = $index->{ substr($code, 0, length($code)/2) } || [];
    my $found = grep { ($_ ^ $code) =~ y/\0//c <= 1 } @$bucket;
    return $found;
}

sub generate_code {
    my ($frontwards_index, $backwards_index) = @_;
    my $new_code;
    do {
        $new_code = sprintf("%07d", rand(10000000));
    } while check_index($new_code, $frontwards_index)
         || check_index(scalar reverse($new_code), $backwards_index);
    index_code($new_code, $frontwards_index);
    index_code(scalar reverse($new_code), $backwards_index);
    return $new_code;
}
Put the numbers 0 through 9,999,999 in an augmented binary search tree. The augmentation is to keep track of the number of sub-nodes to the left and to the right. So for example when your algorithm begins, the top node should have value 5,000,000, and it should know that it has 5,000,000 nodes to the left, and 4,999,999 nodes to the right. Now create a hashtable. For each value you've used already, remove its node from the augmented binary search tree and add the value to the hashtable. Make sure to maintain the augmentation.
To get a single value, follow these steps.
1. Use the top node to determine how many nodes are left in the tree. Let's say you have n nodes left. Pick a random index k between 0 and n-1. Using the augmentation, you can find the kth node in your tree in O(log n) time.
2. Once you've found that node, compute all the values that would make the value at that node invalid. Let's say your node has value 1,111,111. If you already have 2,111,111 or 3,111,111 or... then you can't use 1,111,111. Since there are 9 other options per digit and 7 digits, you only need to check 63 possible values. Check to see if any of those values are in your hashtable. If you haven't used any of those values yet, you can use your random node. If you have used any of them, then you can't.
3. Remove your node from the augmented tree. Make sure that you maintain the augmented information.
4. If you can't use that value, return to step 1.
5. If you can use that value, you have a new random code. Add it to the hashtable.
Now, checking to see if a value is available takes O(1) time instead of O(n) time. Also, finding another available random value to check takes O(log n) time instead of... ah... I'm not sure how to analyze your algorithm.
Long story short, if you start from scratch and use this algorithm, you will generate a complete list of valid codes in O(n log n). Since n is 10,000,000, it will take a few seconds or something.
Did I do the math right there everybody? Let me know if that doesn't check out or if I need to clarify anything.
Use a hash.
After generating a successful code (not conflicting with any existing code), put that code in the hash table, and also put the 63 other codes that differ by exactly one digit into the hash.
To see if a randomly generated code will conflict with an existing code, just check if that code exists in the hash.
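A rough Python sketch of this hash approach, using a set as the hash table. The names neighbours, generate_codes and blocked are illustrative; codes already stored in the database would have to be added to blocked (along with their neighbours) before generating new ones.

```python
import random

DIGITS = "0123456789"


def neighbours(code):
    """Yield every code that differs from `code` in exactly one digit."""
    for pos, ch in enumerate(code):
        for digit in DIGITS:
            if digit != ch:
                yield code[:pos] + digit + code[pos + 1:]


def generate_codes(count, length=7, rng=random):
    """Generate `count` random codes that pairwise differ in at least two digits."""
    blocked = set()      # accepted codes plus all of their one-digit neighbours
    codes = []
    while len(codes) < count:
        code = "".join(rng.choice(DIGITS) for _ in range(length))
        if code in blocked:
            continue     # equal to, or one digit away from, an accepted code
        codes.append(code)
        blocked.add(code)
        blocked.update(neighbours(code))
    return codes
```

Each accepted 7-digit code blocks 64 entries (itself plus 63 neighbours), so checking a candidate is a single set lookup instead of a comparison against every earlier code.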
How about:
Generate a 6 digit code by autoincrementing the previous one.
Generate a 1 digit code by incrementing the previous one mod 10.
Concatenate the two.
Presto, guaranteed to differ in two digits. :D
(Yes, being slightly facetious. I'm assuming that 'random' or at least quasi-random is necessary. In which case, generate a 6 digit random key, repeat until it's not a duplicate (i.e. make the column unique, repeat until the insert doesn't fail the constraint), then generate a check digit, as someone already said.)

Rot13 for numbers

EDIT: Now a Major Motion Blog Post at http://messymatters.com/sealedbids
The idea of rot13 is to obscure text, for example to prevent spoilers. It's not meant to be cryptographically secure but to simply make sure that only people who are sure they want to read it will read it.
I'd like to do something similar for numbers, for an application involving sealed bids. Roughly I want to send someone my number and trust them to pick their own number, uninfluenced by mine, but then they should be able to reveal mine (purely client-side) when they're ready. They should not require further input from me or any third party.
(Added: Note the assumption that the recipient is being trusted not to cheat.)
It's not as simple as rot13 because certain numbers, like 1 and 2, will recur often enough that you might remember that, say, 34.2 is really 1.
Here's what I'm looking for specifically:
A function seal() that maps a real number to a real number (or a string). It should not be deterministic -- seal(7) should not map to the same thing every time. But the corresponding function unseal() should be deterministic -- unseal(seal(x)) should equal x for all x. I don't want seal or unseal to call any webservices or even get the system time (because I don't want to assume synchronized clocks). (Added: It's fine to assume that all bids will be less than some maximum, known to everyone, say a million.)
Sanity check:
> seal(7)
482.2382 # some random-seeming number or string.
> seal(7)
71.9217 # a completely different random-seeming number or string.
> unseal(seal(7))
7 # we always recover the original number by unsealing.
You can pack your number as a 4 byte float together with another random float into a double and send that. The client then just has to pick up the first four bytes. In python:
import struct, random
def seal(f):
    return struct.unpack("d", struct.pack("ff", f, random.random()))[0]
def unseal(f):
    return struct.unpack("ff", struct.pack("d", f))[0]
>>> unseal( seal( 3))
3.0
>>> seal(3)
4.4533985422978706e-009
>>> seal(3)
9.0767582382536571e-010
Here's a solution inspired by Svante's answer.
M = 9999 # Upper bound on bid.
seal(x) = M * randInt(9,99) + x
unseal(x) = x % M
Sanity check:
> seal(7)
709936
> seal(7)
509956
> unseal(seal(7))
7
This needs tweaking to allow negative bids though:
M = 9999 # Numbers between -M/2 and M/2 can be sealed.
seal(x) = M * randInt(9,99) + x
unseal(x) =
m = x % M;
if m > M/2 return m - M else return m
A nice thing about this solution is how trivial it is for the recipient to decode -- just mod by 9999 (and if that's 5000 or more then it was a negative bid so subtract another 9999). It's also nice that the obscured bid will be at most 6 digits long. (This is plenty security for what I have in mind -- if the bids can possibly exceed $5k then I'd use a more secure method. Though of course the max bid in this method can be set as high as you want.)
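The scheme above, including the negative-bid tweak, can be written out as a short Python sketch. M and the 9 to 99 multiplier range come from the answer; the rng parameter is added here only so the example is easy to test.

```python
import random

M = 9999  # bids from -(M // 2) to M // 2 can be sealed


def seal(bid, rng=random):
    """Hide the bid under a random multiple of M."""
    return M * rng.randint(9, 99) + bid


def unseal(sealed):
    """Recover the bid: the remainder mod M, shifted down if it was negative."""
    m = sealed % M
    return m - M if m > M // 2 else m
```

Python's % always returns a non-negative remainder for a positive modulus, which is what makes the "subtract another 9999" rule work for negative bids.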
Instructions for Lay Folk
Pick a number between 9 and 99 and multiply it by 9999, then add your bid.
This will yield a 5 or 6-digit number that encodes your bid.
To unseal it, divide by 9999, subtract the part to the left of the decimal point, then multiply by 9999.
(This is known to children and mathematicians as "finding the remainder when dividing by 9999" or "mod'ing by 9999", respectively.)
This works for nonnegative bids less than 9999 (if that's not enough, use 99999 or as many digits as you want).
If you want to allow negative bids, then the magic 9999 number needs to be twice the biggest possible bid.
And when decoding, if the result is greater than half of 9999, ie, 5000 or more, then subtract 9999 to get the actual (negative) bid.
Again, note that this is on the honor system: there's nothing technically preventing you from unsealing the other person's number as soon as you see it.
If you're relying on honesty of the user and only dealing with integer bids, a simple XOR operation with a random number should be all you need, an example in C#:
static Random rng = new Random();
static string EncodeBid(int bid)
{
    int i = rng.Next();
    return String.Format("{0}:{1}", i, bid ^ i);
}
static int DecodeBid(string encodedBid)
{
    string[] d = encodedBid.Split(":".ToCharArray());
    return Convert.ToInt32(d[0]) ^ Convert.ToInt32(d[1]);
}
Use:
int bid = 500;
string encodedBid = EncodeBid(bid); // encodedBid is something like 54017514:4017054 and will be different each time
int decodedBid = DecodeBid(encodedBid); // decodedBid is 500
Converting the decode process to a client side construct should be simple enough.
Is there a maximum bid? If so, you could do this:
Let max-bid be the maximum bid and a-bid the bid you want to encode. Multiply max-bid by a rather large random number (if you want to use base64 encoding in the last step, max-rand should be (2^24/max-bid)-1, and min-rand perhaps half of that), then add a-bid. Encode this, e.g. through base64.
The recipient then just has to decode and find the remainder modulo max-bid.
What you want to do (a Commitment scheme) is impossible to do client-side-only. The best you could do is encrypt with a shared key.
If the client doesn't need your cooperation to reveal the number, they can just modify the program to reveal the number. You might as well have just sent it and not displayed it.
To do it properly, you could send a secure hash of your bid + a random salt. That commits you to your bid. The other client can commit to their bid in the same way. Then you each share your bid and salt.
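As a sketch of that commit-then-reveal exchange in Python: SHA-256 and the "bid:salt" message layout are choices made for this example, and the function names commit and verify are hypothetical.

```python
import hashlib
import secrets


def commit(bid):
    """Return (digest, salt). Send the digest now; keep bid and salt for later."""
    salt = secrets.token_hex(16)     # fresh random salt for every commitment
    digest = hashlib.sha256(f"{bid}:{salt}".encode()).hexdigest()
    return digest, salt


def verify(digest, bid, salt):
    """Check a revealed bid and salt against the digest received earlier."""
    expected = hashlib.sha256(f"{bid}:{salt}".encode()).hexdigest()
    return secrets.compare_digest(digest, expected)
```

Both sides exchange digests first, then reveal bid and salt. The salt is what stops the other party from simply hashing every plausible bid to find yours.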
[edit] Since you trust the other client:
Sender:
Let M be your message
K = random 4-byte key
C1 = M xor hash(K) //hash optional: hides patterns in M xor K
//(you can repeat or truncate hash(K) as necessary to cover the message)
//(could also xor with output of a PRNG instead)
C2 = K append C1 //they need to know K to reveal the message
send C2 //(convert bytes to hex representation if needed)
Receiver:
receive C2
K = C2[:4]
C1 = C2[4:]
M = C1 xor hash(K)
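A Python sketch of this sender/receiver pseudocode. SHA-256 as the hash, a counter to extend the hash output over longer messages, and hex as the wire format are all assumptions made for the example. Like the pseudocode, it only obscures the message; anyone holding the sealed value can unseal it.

```python
import hashlib
import os


def _keystream(key, length):
    """Stretch hash(K) to `length` bytes by hashing K with a running counter."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return stream[:length]


def seal(message):
    """message is bytes; returns hex of K followed by C1 = M xor hash(K)."""
    key = os.urandom(4)                                   # K: random 4-byte key
    c1 = bytes(m ^ s for m, s in zip(message, _keystream(key, len(message))))
    return (key + c1).hex()                               # C2 = K append C1


def unseal(sealed_hex):
    c2 = bytes.fromhex(sealed_hex)
    key, c1 = c2[:4], c2[4:]                              # K = C2[:4], C1 = C2[4:]
    return bytes(c ^ s for c, s in zip(c1, _keystream(key, len(c1))))
```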
Are you aware that you need a larger 'sealed' set of numbers than your original, if you want that to work?
So you need to restrict your real numbers somehow, or store extra info that you don't show.
One simple way is to write a message like:
"my bid is: $14.23: aduigfurjwjnfdjfugfojdjkdskdfdhfddfuiodrnfnghfifyis"
All that junk is randomly-generated, and different every time.
Send the other person the SHA256 hash of the message. Have them send you the hash of their bid. Then, once you both have the hashes, send the full message, and confirm that their bid corresponds to the hash they gave you.
This gives rather stronger guarantees than you need - it's actually not possible from them to work out your bid before you send them your full message. However, there is no unseal() function as you describe.
This simple scheme has various weaknesses that a full zero-knowledge scheme would not have. For example, if they fake you out by sending you a random number instead of a hash, then they can work out your bid without revealing their own. But you didn't ask for bullet-proof. This prevents both accidental and (I think) undetectable cheating, and uses only a commonly-available command line utility, plus a random number generator (dice will do).
If, as you say, you want them to be able to recover your bid without any further input from you, and you are willing to trust them only to do it after posting their bid, then just encrypt using any old symmetric cipher (gpg --symmetric, perhaps) and the key, "rot13". This will prevent accidental cheating, but allow undetectable cheating.
One idea that popped into my mind was to maybe base your algorithm on the mathematics
used for secure key sharing.
If you want to give two persons, Bob and Alice, half a key each so
that only when combining them they will be able to open whatever the key locks, how do you do that? The solution to this comes from mathematics. Say you have two points A (-2,2) and B (2,0) in a x/y coordinate system.
               |
       A       +
               |
               C
               |
---+---+---+---|---+---B---+---+---+---
               |
               +
               |
               +
If you draw a straight line between them it will cross the y axis at exactly one single point, C (0,1).
If you only know one of the points A or B it is impossible to tell where it will cross.
Thus you can let the points A and B be the shared keys which when combined will reveal the y-value
of the crossing point (i.e. 1 in this example) and this value is then typically used as
a real key for something.
For your bidding application you could let seal() and unseal() swap the y-value between the C and B points
(deterministic) but have the A point vary from time to time.
This way seal(y-value of point B) will give completely different results depending on point A,
but unseal(seal(y-value of point B)) should return the y-value of B which is what you ask for.
PS
It is not required to have A and B on different sides of the y-axis, but is much simpler conceptually to think of it this way (and I recommend implementing it that way as well).
With this straight line you can then share keys between several persons so that only two of
them are needed to unlock whatever. It is possible to use curve types other than straight lines to create other
key sharing properties (i.e. 3 out of 3 keys are required etc).
Pseudo code:
encode:
    value = 2000
    key = random(0..255); // our key is only 1 byte
    // 'sealing it'
    value = value XOR key;
    // add key
    sealed = (value << 16) | key
decode:
    key = sealed & 0xFF
    unsealed = key XOR (sealed >> 16)
Would that work?
Since it seems that you are assuming that the other person doesn't want to know your bid until after they've placed their own, and can be trusted not to cheat, you could try a variable rotation scheme:
from random import randint
def seal(input):
    r = randint(0, 50)
    obfuscate = [str(r)] + [ str(ord(c) + r) for c in '%s' % input ]
    return ':'.join(obfuscate)
def unseal(input):
    tmp = input.split(':')
    r = int(tmp.pop(0))
    deobfuscate = [ chr(int(c) - r) for c in tmp ]
    return ''.join(deobfuscate)
# I suppose you would put your bid in here, for 100 dollars
tmp = seal('$100.00') # --> '1:37:50:49:49:47:49:49' (output varies)
print(unseal(tmp)) # --> '$100.00'
At some point (I think we may have already passed it) this becomes silly, and because it is so easy, you should just use simple encryption, where the message recipient always knows the key - the person's username, perhaps.
If the bids are fairly large numbers, how about a bitwise XOR with some predetermined random-ish number? XORing again will then retrieve the original value.
You can change the number as often as you like, as long as both client and server know it.
You could set a different base (like 16, 17, 18, etc.) and keep track of which base you've "sealed" the bid with...
Of course, this presumes large numbers (> the base you're using, at least). If they were decimal, you could drop the point (for example, 27.04 becomes 2704, which you then translate to base 29...)
You'd probably want to use base 17 to 36 (only because some people might recognize hex and be able to translate it in their head...)
This way, you would have numbers like G4 or Z3 or KW (depending on the numbers you're sealing)...
Here's a cheap way to piggyback off rot13:
Assume we have a function gibberish() that generates something like "fdjk alqef lwwqisvz" and a function words(x) that converts a number x to words, eg, words(42) returns "forty two" (no hyphens).
Then define
seal(x) = rot13(gibberish() + words(x) + gibberish())
and
unseal(x) = rot13(x)
Of course the output of unseal is not an actual number and is only useful to a human, but that might be ok.
You could make it a little more sophisticated with a words-to-number function that would also just throw away all the gibberish words (defined as anything that's not one of the number words; there are fewer than a hundred of those, I think).
Sanity check:
> seal(7)
fhrlls hqufw huqfha frira afsb ht ahuqw ajaijzji
> seal(7)
qbua adfshua hqgya ubiwi ahp wqwia qhu frira wge
> unseal(seal(7))
sueyyf udhsj seven ahkua snsfo ug nuhdj nwnvwmwv
I know this is silly but it's a way to do it "by hand" if all you have is rot13 available.