Real world use cases of bitwise operators [closed] - language-agnostic

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
What are some real world use cases of the following bitwise operators?
AND
XOR
NOT
OR
Left/Right shift

Bit fields (flags)
They're the most efficient way of representing something whose state is defined by several "yes or no" properties. ACLs are a good example; if you have let's say 4 discrete permissions (read, write, execute, change policy), it's better to store this in 1 byte rather than waste 4. These can be mapped to enumeration types in many languages for added convenience.
Communication over ports/sockets
Always involves checksums, parity, stop bits, flow control algorithms, and so on, which usually depend on the logic values of individual bytes as opposed to numeric values, since the medium may only be capable of transmitting one bit at a time.
Compression, Encryption
Both of these are heavily dependent on bitwise algorithms. Look at the deflate algorithm for an example - everything is in bits, not bytes.
Finite State Machines
I'm speaking primarily of the kind embedded in some piece of hardware, although they can be found in software too. These are combinatorial in nature - they might literally be getting "compiled" down to a bunch of logic gates, so they have to be expressed as AND, OR, NOT, etc.
Graphics
There's hardly enough space here to get into every area where these operators are used in graphics programming. XOR (or ^) is particularly interesting here because applying the same input a second time will undo the first. Older GUIs used to rely on this for selection highlighting and other overlays, in order to eliminate the need for costly redraws. They're still useful in slow graphics protocols (i.e. remote desktop).
Those were just the first few examples I came up with - this is hardly an exhaustive list.

Is it odd?
(value & 0x1) > 0
Is it divisible by two (even)?
(value & 0x1) == 0

I've used bitwise operations in implementing a security model for a CMS. It had pages which could be accessed by users if they were in appropriate groups. A user could be in multiple groups, so we needed to check if there was an intersection between the users groups and the pages groups. So we assigned each group a unique power-of-2 identifier, e.g.:
Group A = 1 --> 00000001
Group B = 2 --> 00000010
Group C = 3 --> 00000100
We OR these values together, and store the value (as a single int) with the page. E.g. if a page could be accessed by groups A & B, we store the value 3 (which in binary is 00000011) as the pages access control. In much the same way, we store a value of ORed group identifiers with a user to represent which groups they are in.
So to check if a given user can access a given page, you just need to AND the values together and check if the value is non-zero. This is very fast as this check is implemented in a single instruction, no looping, no database round-trips.

Here's some common idioms dealing with flags stored as individual bits.
enum CDRIndicators {
Local = 1 << 0,
External = 1 << 1,
CallerIDMissing = 1 << 2,
Chargeable = 1 << 3
};
unsigned int flags = 0;
Set the Chargeable flag:
flags |= Chargeable;
Clear CallerIDMissing flag:
flags &= ~CallerIDMissing;
Test whether CallerIDMissing and Chargeable are set:
if((flags & (CallerIDMissing | Chargeable )) == (CallerIDMissing | Chargeable)) {
}

Low-level programming is a good example. You may, for instance, need to write a specific bit to a memory-mapped register to make some piece of hardware do what you want it to:
volatile uint32_t *register = (volatile uint32_t *)0x87000000;
uint32_t value;
uint32_t set_bit = 0x00010000;
uint32_t clear_bit = 0x00001000;
value = *register; // get current value from the register
value = value & ~clear_bit; // clear a bit
value = value | set_bit; // set a bit
*register = value; // write it back to the register
Also, htonl() and htons() are implemented using the & and | operators (on machines whose endianness(Byte order) doesn't match network order):
#define htons(a) ((((a) & 0xff00) >> 8) | \
(((a) & 0x00ff) << 8))
#define htonl(a) ((((a) & 0xff000000) >> 24) | \
(((a) & 0x00ff0000) >> 8) | \
(((a) & 0x0000ff00) << 8) | \
(((a) & 0x000000ff) << 24))

I use them to get RGB(A) values from packed colorvalues, for instance.

When I have a bunch of boolean flags, I like to store them all in an int.
I get them out using bitwise-AND. For example:
int flags;
if (flags & 0x10) {
// Turn this feature on.
}
if (flags & 0x08) {
// Turn a second feature on.
}
etc.

& = AND:
Mask out specific bits.
You are defining the specific bits which should be displayed
or not displayed. 0x0 & x will clear all bits in a byte while 0xFF will not change x.
0x0F will display the bits in the lower nibble.
Conversion:
To cast shorter variables into longer ones with bit identity it is necessary to adjust the bits because -1 in an int is 0xFFFFFFFF while -1 in a long is 0xFFFFFFFFFFFFFFFF. To preserve
the identity you apply a mask after conversion.
|=OR
Set bits. The bits will be set indepently if they are already set. Many datastructures (bitfields) have flags like IS_HSET = 0, IS_VSET = 1 which can be indepently set.
To set the flags, you apply IS_HSET | IS_VSET (In C and assembly this is very convenient to read)
^=XOR
Find bits which are the same or different.
~= NOT
Flip bits.
It can be shown that all possible local bit operations can be implemented by these operations.
So if you like you can implement an ADD instruction solely by bit operations.
Some wonderful hacks:
http://www.ugcs.caltech.edu/~wnoise/base2.html
http://www.jjj.de/bitwizardry/bitwizardrypage.html

Encryption is all bitwise operations.

You can use them as a quick and dirty way to hash data.
int a = 1230123;
int b = 1234555;
int c = 5865683;
int hash = a ^ b ^ c;

I just used bitwise-XOR (^) about three minutes ago to calculate a checksum for serial communication with a PLC...

This is an example to read colours from a bitmap image in byte format
byte imagePixel = 0xCCDDEE; /* Image in RRGGBB format R=Red, G=Green, B=Blue */
//To only have red
byte redColour = imagePixel & 0xFF0000; /*Bitmasking with AND operator */
//Now, we only want red colour
redColour = (redColour >> 24) & 0xFF; /* This now returns a red colour between 0x00 and 0xFF.
I hope this tiny examples helps....

In the abstracted world of today's modern language, not too many. File IO is an easy one that comes to mind, though that's exercising bitwise operations on something already implemented and is not implementing something that uses bitwise operations. Still, as an easy example, this code demonstrates removing the read-only attribute on a file (so that it can be used with a new FileStream specifying FileMode.Create) in c#:
//Hidden files posses some extra attibutes that make the FileStream throw an exception
//even with FileMode.Create (if exists -> overwrite) so delete it and don't worry about it!
if(File.Exists(targetName))
{
FileAttributes attributes = File.GetAttributes(targetName);
if ((attributes & FileAttributes.ReadOnly) == FileAttributes.ReadOnly)
File.SetAttributes(targetName, attributes & (~FileAttributes.ReadOnly));
File.Delete(targetName);
}
As far as custom implementations, here's a recent example:
I created a "message center" for sending secure messages from one installation of our distributed application to another. Basically, it's analogous to email, complete with Inbox, Outbox, Sent, etc, but it also has guaranteed delivery with read receipts, so there are additional subfolders beyond "inbox" and "sent." What this amounted to was a requirement for me to define generically what's "in the inbox" or what's "in the sent folder". Of the sent folder, I need to know what's read and what's unread. Of what's unread, I need to know what's received and what's not received. I use this information to build a dynamic where clause which filters a local datasource and displays the appropriate information.
Here's how the enum is put together:
public enum MemoView :int
{
InboundMemos = 1, // 0000 0001
InboundMemosForMyOrders = 3, // 0000 0011
SentMemosAll = 16, // 0001 0000
SentMemosNotReceived = 48, // 0011
SentMemosReceivedNotRead = 80, // 0101
SentMemosRead = 144, // 1001
Outbox = 272, //0001 0001 0000
OutBoxErrors = 784 //0011 0001 0000
}
Do you see what this does? By anding (&) with the "Inbox" enum value, InboundMemos, I know that InboundMemosForMyOrders is in the inbox.
Here's a boiled down version of the method that builds and returns the filter that defines a view for the currently selected folder:
private string GetFilterForView(MemoView view, DefaultableBoolean readOnly)
{
string filter = string.Empty;
if((view & MemoView.InboundMemos) == MemoView.InboundMemos)
{
filter = "<inbox filter conditions>";
if((view & MemoView.InboundMemosForMyOrders) == MemoView.InboundMemosForMyOrders)
{
filter += "<my memo filter conditions>";
}
}
else if((view & MemoView.SentMemosAll) == MemoView.SentMemosAll)
{
//all sent items have originating system = to local
filter = "<memos leaving current system>";
if((view & MemoView.Outbox) == MemoView.Outbox)
{
...
}
else
{
//sent sub folders
filter += "<all sent items>";
if((view & MemoView.SentMemosNotReceived) == MemoView.SentMemosNotReceived)
{
if((view & MemoView.SentMemosReceivedNotRead) == MemoView.SentMemosReceivedNotRead)
{
filter += "<not received and not read conditions>";
}
else
filter += "<received and not read conditions>";
}
}
}
return filter;
}
Extremely simple, but a neat implementation at a level of abstraction that doesn't typically require bitwise operations.

Usually bitwise operations are faster than doing multiply/divide. So if you need to multiply a variable x by say 9, you will do x<<3 + x which would be a few cycles faster than x*9. If this code is inside an ISR, you will save on response time.
Similarly if you want to use an array as a circular queue, it'd be faster (and more elegant) to handle wrap around checks with bit wise operations. (your array size should be a power of 2). Eg: , you can use tail = ((tail & MASK) + 1) instead of tail = ((tail +1) < size) ? tail+1 : 0, if you want to insert/delete.
Also if you want a error flag to hold multiple error codes together, each bit can hold a separate value. You can AND it with each individual error code as a check. This is used in Unix error codes.
Also a n-bit bitmap can be a really cool and compact data structure. If you want to allocate a resource pool of size n, we can use a n-bits to represent the current status.

Bitwise & is used to mask/extract a certain part of a byte.
1 Byte variable
01110010
&00001111 Bitmask of 0x0F to find out the lower nibble
--------
00000010
Specially the shift operator (<< >>) are often used for calculations.

Bitwise operators are useful for looping arrays which length is power of 2. As many people mentioned, bitwise operators are extremely useful and are used in Flags, Graphics, Networking, Encryption. Not only that, but they are extremely fast. My personal favorite use is to loop an array without conditionals. Suppose you have a zero-index based array(e.g. first element's index is 0) and you need to loop it indefinitely. By indefinitely I mean going from first element to last and returning to first. One way to implement this is:
int[] arr = new int[8];
int i = 0;
while (true) {
print(arr[i]);
i = i + 1;
if (i >= arr.length)
i = 0;
}
This is the simplest approach, if you'd like to avoid if statement, you can use modulus approach like so:
int[] arr = new int[8];
int i = 0;
while (true) {
print(arr[i]);
i = i + 1;
i = i % arr.length;
}
The down side of these two methods is that modulus operator is expensive, since it looks for a remainder after integer division. And the first method runs an if statement on each iteration. With bitwise operator however if length of your array is a power of 2, you can easily generate a sequence like 0 .. length - 1 by using & (bitwise and) operator like so i & length. So knowing this, the code from above becomes
int[] arr = new int[8];
int i = 0;
while (true){
print(arr[i]);
i = i + 1;
i = i & (arr.length - 1);
}
Here is how it works. In binary format every number that is power of 2 subtracted by 1 is expressed only with ones. For example 3 in binary is 11, 7 is 111, 15 is 1111 and so on, you get the idea. Now, what happens if you & any number against a number consisting only of ones in binary? Let's say we do this:
num & 7;
If num is smaller or equal to 7 then the result will be num because each bit &-ed with 1 is itself. If num is bigger than 7, during the & operation computer will consider 7's leading zeros which of course will stay as zeros after & operation only the trailing part will remain. Like in case of 9 & 7 in binary it will look like
1001 & 0111
the result will be 0001 which is 1 in decimal and addresses second element in array.

Base64 encoding is an example. Base64 encoding is used to represent binary data as a printable characters for sending over email systems (and other purposes). Base64 encoding converts a series of 8 bit bytes into 6 bit character lookup indexes. Bit operations, shifting, and'ing, or'ing, not'ing are very useful for implementing the bit operations necessary for Base64 encoding and decoding.
This of course is only 1 of countless examples.

I'm suprised no one picked the obvious answer for the Internet age. Calculating valid network addresses for a subnet.
http://www.topwebhosts.org/tools/netmask.php

Nobody seems to have mentioned fixed point maths.
(Yeah, I'm old, ok?)

Is a number x a power of 2? (Useful for example in algorithms where a counter is incremented, and an action is to be taken only logarithmic number of times)
(x & (x - 1)) == 0
Which is the highest bit of an integer x? (This for example can be used to find the minimum power of 2 that is larger than x)
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
return x - (x >>> 1); // ">>>" is unsigned right shift
Which is the lowest 1 bit of an integer x? (Helps find number of times divisible by 2.)
x & -x

If you ever want to calculate your number mod(%) a certain power of 2, you can use yourNumber & 2^N-1, which in this case is the same as yourNumber % 2^N.
number % 16 = number & 15;
number % 128 = number & 127;
This is probably only useful being an alternative to modulus operation with a very big dividend that is 2^N... But even then its speed boost over the modulus operation is negligible in my test on .NET 2.0. I suspect modern compilers already perform optimizations like this. Anyone know more about this?

I use them for multi select options, this way I only store one value instead of 10 or more

it can also be handy in a sql relational model, let's say you have the following tables: BlogEntry, BlogCategory
traditonally you could create a n-n relationship between them using a BlogEntryCategory table
or when there are not that much BlogCategory records you could use one value in BlogEntry to link to multiple BlogCategory records just like you would do with flagged enums,
in most RDBMS there are also a very fast operators to select on that 'flagged' column...

When you only want to change some bits of a microcontroller's Outputs, but the register to write to is a byte, you do something like this (pseudocode):
char newOut = OutRegister & 0b00011111 //clear 3 msb's
newOut = newOut | 0b10100000 //write '101' to the 3 msb's
OutRegister = newOut //Update Outputs
Of course, many microcontrollers allow you to change each bit individually...

I've seen them used in role based access control systems.

There is a real world use in my question here -
Respond to only the first WM_KEYDOWN notification?
When consuming a WM_KEYDOWN message in the windows C api bit 30 specifies the previous key state. The value is 1 if the key is down before the message is sent, or it is zero if the key is up

They are mostly used for bitwise operations (surprise). Here are a few real-world examples found in PHP codebase.
Character encoding:
if (s <= 0 && (c & ~MBFL_WCSPLANE_MASK) == MBFL_WCSPLANE_KOI8R) {
Data structures:
ar_flags = other->ar_flags & ~SPL_ARRAY_INT_MASK;
Database drivers:
dbh->transaction_flags &= ~(PDO_TRANS_ACCESS_MODE^PDO_TRANS_READONLY);
Compiler implementation:
opline->extended_value = (opline->extended_value & ~ZEND_FETCH_CLASS_MASK) | ZEND_FETCH_CLASS_INTERFACE;

I've seen it in a few game development books as a more efficient way to multiply and divide.
2 << 3 == 2 * 8
32 >> 4 == 32 / 16

Whenever I first started C programming, I understood truth tables and all that, but it didn't all click with how to actually use it until I read this article http://www.gamedev.net/reference/articles/article1563.asp (which gives real life examples)

I don't think this counts as bitwise, but ruby's Array defines set operations through the normal integer bitwise operators. So [1,2,4] & [1,2,3] # => [1,2]. Similarly for a ^ b #=> set difference and a | b #=> union.

Related

Generate unique serial from id number

I have a database that increases id incrementally. I need a function that converts that id to a unique number between 0 and 1000. (the actual max is much larger but just for simplicity's sake.)
1 => 3301,
2 => 0234,
3 => 7928,
4 => 9821
The number generated cannot have duplicates.
It can not be incremental.
Need it generated on the fly (not create a table of uniform numbers to read from)
I thought a hash function but there is a possibility for collisions.
Random numbers could also have duplicates.
I need a minimal perfect hash function but cannot find a simple solution.
Since the criteria are sort of vague (good enough to fool the average person), I am unsure exactly which route to take. Here are some ideas:
You could use a Pearson hash. According to the Wikipedia page:
Given a small, privileged set of inputs (e.g., reserved words for a compiler), the permutation table can be adjusted so that those inputs yield distinct hash values, producing what is called a perfect hash function.
You could just use a complicated looking one-to-one mathematical function. The drawback of this is that it would be difficult to make one that was not strictly increasing or strictly decreasing due to the one-to-one requirement. If you did something like (id ^ 2) + id * 2, the interval between ids would change and it wouldn't be immediately obvious what the function was without knowing the original ids.
You could do something like this:
new_id = (old_id << 4) + arbitrary_4bit_hash(old_id);
This would give the unique IDs and it wouldn't be immediately obvious that the first 4 bits are just garbage (especially when reading the numbers in decimal format). Like the last option, the new IDs would be in the same order as the old ones. I don't know if that would be a problem.
You could just hardcode all ID conversions by making a lookup array full of "random" numbers.
You could use some kind of hash function generator like gperf.
GNU gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only.
You could encrypt the ids with a key using a cryptographically secure mechanism.
Hopefully one of these works for you.
Update
Here is the rotational shift the OP requested:
function map($number)
{
// Shift the high bits down to the low end and the low bits
// down to the high end
// Also, mask out all but 10 bits. This allows unique mappings
// from 0-1023 to 0-1023
$high_bits = 0b0000001111111000 & $number;
$new_low_bits = $high_bits >> 3;
$low_bits = 0b0000000000000111 & $number;
$new_high_bits = $low_bits << 7;
// Recombine bits
$new_number = $new_high_bits | $new_low_bits;
return $new_number;
}
function demap($number)
{
// Shift the high bits down to the low end and the low bits
// down to the high end
$high_bits = 0b0000001110000000 & $number;
$new_low_bits = $high_bits >> 7;
$low_bits = 0b0000000001111111 & $number;
$new_high_bits = $low_bits << 3;
// Recombine bits
$new_number = $new_high_bits | $new_low_bits;
return $new_number;
}
This method has its advantages and disadvantages. The main disadvantage that I can think of (besides the security aspect) is that for lower IDs consecutive numbers will be exactly the same (multiplicative) interval apart until digits start wrapping around. That is to say
map(1) * 2 == map(2)
map(1) * 3 == map(3)
This happens, of course, because with lower numbers, all the higher bits are 0, so the map function is equivalent to just shifting. This is why I suggested using pseudo-random data for the lower bits rather than the higher bits of the number. It would make the regular interval less noticeable. To help mitigate this problem, the function I wrote shifts only the first 3 bits and rotates the rest. By doing this, the regular interval will be less noticeable for all IDs greater than 7.
It seems that it doesn't have to be numerical? What about an MD5-Hash?
select md5(id+rand(10000)) from ...

What is an effective way to filter numbers according to certain number-ranges using CUDA?

I have a lot of random floating point numbers residing in global GPU memory. I also have "buckets" that specify ranges of numbers they will accept and a capacity of numbers they will accept.
ie:
numbers: -2 0 2 4
buckets(size=1): [-2, 0], [1, 5]
I want to run a filtration process that yields me
filtered_nums: -2 2
(where filtered_nums can be a new block of memory)
But every approach I take runs into a huge overhead of trying to synchronize threads across bucket counters. If I try to use a single-thread, the algorithm completes successfully, but takes frighteningly long (over 100 times slower than generating the numbers in the first place).
What I am asking for is a general high-level, efficient, as-simple-as-possible approach algorithm that you would use to filter these numbers.
edit
I will be dealing with 10 buckets and half a million numbers. Where all the numbers fall into exactly 1 of the 10 bucket ranges. Each bucket will hold 43000 elements. (There are excess elements, since the objective is to fill every bucket, and many numbers will be discarded).
2nd edit
It's important to point out that the buckets do not have to be stored individually. The objective is just to discard elements that would not fit into a bucket.
You can use thrust::remove_copy_if
struct within_limit
{
__host__ __device__
bool operator()(const int x)
{
return (x >=lo && x < hi);
}
};
thrust::remove_copy_if(input, input + N, result, within_limit());
You will have to replace lo and hi with constants for each bin..
I think you can templatize the kernel, but then again you will have to instantiate the template with actual constants. I can't see an easy way at it, but I may be missing something.
If you are willing to look at third party libraries, arrayfire may offer an easier solution.
array I = array(N, input, afDevice);
float **Res = (float **)malloc(sizeof(float *) * nbins);
for(int i = 0; i < nbins; i++) {
array res = where(I >= lo[i] && I < hi[i]);
Res[i] = res.device<float>();
}

How does Bit Masking Buffer Index Result in Wrap around

Can someone explain how bit masking works in terms of a circular buffer index. Specifically in the following code:
#define USART_RX_BUFFER_SIZE 128 /* 2,4,8,16,32,64,128 or 256 bytes */
#define USART_RX_BUFFER_MASK ( USART_RX_BUFFER_SIZE - 1 )
ISR(USART_RX_vect)
{
unsigned char data;
unsigned char tmphead;
/* Read the received data */
data = UDR0;
/* Calculate buffer index */
tmphead = ( USART_RxHead + 1 ) & USART_RX_BUFFER_MASK;
USART_RxHead = tmphead; /* Store new index */
if ( tmphead == USART_RxTail )
{
/* ERROR! Receive buffer overflow */
}
USART_RxBuf[tmphead] = data; /* Store received data in buffer */
}
I know the result of bit masking the index is that the index wraps around; my question is why? Also, why does the "USART_RX_BUFFER_SIZE" have to be a power of 2?
Thank you
Joe
To understand this, you have to understand some binary, and you have to understand binary operations.
As you probably know, everything in computers is stored in binary, sequences of ones and zeros. This means that any string of data in memory can be treated as number, theoretically. Since your code is usin chars, I will focus on them.
in C, chars are either signed or unsigned, it is important that you use unsigned for this. I won't get into two's complement representation, but suffice it to say that it would break if you used signed chars. A char is a single byte, and that is normally considered to be 8 bits, like so:
00000000 -> 0
00001001 -> 9
Basically each bit represents a power of two (I'm using MSB-first here), so the second number is 2^1 + 2^3 = 1 + 8 = 9. So you can see how it can be used to index into an array.
Bitwise operations operate on the individual bits of some data. In this case, you are using binary and (&), and the act of applying binary and is called bit-masking.
data - 00101100
mask - 11110110
----------
result - 00101100
As you can see, the result has bits set to 1 only where both the data and mask has 1.
Now back to our binary representation. Since each bit is a power of two, a power of two in binary can be represented using a single 1 in amongst 0's.
01000000 - 64
And just like 1000 - 1 = 999, 01000000 - 1 = 00111111, where 00111111 is 63.
Using that we can find that when working out the next index, we perform the following operation:
(a + 1) & 00111111
if a is (for example) 10, then we get
(00001010 + 1) = 00001011 (11)
00001011 & 00111111 = 00001011
So masking made no change, but in the case of 63:
(00111111 + 1) = 01000000 (64)
01000000 & 00111111 = 00000000 (0)
So rather than trying to index into 64 (which is the 65th element, and therefore an error), you go back to the beginning.
This is why the buffer size has to be a power of two, if it wasn't then the mask wouldn't calculate properly and you would have to use modulo (%), or a comparison, rather than bit masking. This is important because bitwise operators are very fast, given that they are normally only a single instruction in most processors, and & would require very few cycles. A modulo may be a single instruction, but it would probably be integer division, and that is traditionally quite slow on most platforms. And a comparison would require several instructions, registers and at least 1 jump.
JOhn writed:
I second the comment about modulo, we found on our micro it was taking
something like 10-20x longer to do "a %= MODULUS" than to do if(a>b)
a/=MODULUS; – John U Aug 3 '12 at 17:01
But there is still division : a/=MODULUS and therefore effeciency is same as modulo operation, i assume ..

What's the absolute minimum a programmer should know about binary numbers and arithmetic? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
Although I know the basic concepts of binary representation, I have never really written any code that uses binary arithmetic and operations.
I want to know
What are the basic concepts any
programmer should know about binary
numbers and arithmetic ? , and
In what "practical" ways can binary
operations be used in programming. I
have seen some "cool" uses of shift
operators and XOR etc. but are there
some typical problems where using binary
operations is an obvious choice.
Please give pointers to some good reference material.
If you are developing lower-level code, it is critical that you understand the binary representation of various types. You will find this particularly useful if you are developing embedded applications or if you are dealing with low-level transmission or storage of data.
That being said, I also believe that understanding how things work at a low level is useful even if you are working at much higher levels of abstraction. I have found, for example, that my ability to develop efficient code is improved by understanding how things are represented and manipulated at a low level. I have also found such understanding useful in working with debuggers.
Here is a short-list of binary representation topics for study:
numbering systems (binary, hex, octal, decimal, ...)
binary data organization (bits, nibbles, bytes, words, ...)
binary arithmetic
other binary operations (AND,OR,XOR,NOT,SHL,SHR,ROL,ROR,...)
type representation (boolean,integer,float,struct,...)
bit fields and packed data
Finally...here is a nice set of Bit Twiddling Hacks you might find useful.
Unless you're working with lower level stuff, or are trying to be smart, you never really get to play with binary stuff.
I've been through a computer science degree, and I've never used any of the binary arithmetic stuff we learned since my course ended.
Have a squizz here: http://www.swarthmore.edu/NatSci/echeeve1/Ref/BinaryMath/BinaryMath.html
You must understand bit masks.
Many languages and situations require the use of bit masks, for example flags in arguments or configs.
PHP has its error level which you control with bit masks:
error_reporting = E_ALL & ~E_NOTICE
Or simply checking if an int is odd or even:
isOdd = myInt & 1
I believe basic know-hows on binary operations line AND, OR, XOR, NOT would be handy as most of the programming languages support these operations in the form of bit-wise operators.
These operations are also used in image processing and other areas in graphics.
One important use of XOR operation which I can think of is Parity check. Check this http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BitOp/xor.html
cheers
The following are things I regularly appreciate knowing in my quite conventional programming work:
Know the powers of 2 up to 2^16, and know that 2^32 is about 4.3 billion. Know them well enough so that if you see the number 2147204921 pop up somewhere your first thought is "hmm, that looks pretty close to 2^31" -- that's a very effective module for your bug radar.
Be able to do simple arithmetic; e.g. convert a hexadecimal digit to a nybble and back.
Have some vague idea of how floating-point numbers are represented in binary.
Understand standard conventions that you might encounter in other people's code related to bit twiddling (flags get ORed together to make composite values and AND checks if one's set, shift operators pack and unpack numbers into different bytes, XOR something twice and you get the same something back, that kind of thing.)
Further knowledge is mostly gravy unless you work with significant performance constraints or do other less common work.
At the absolute bare minimum you should be able to implement a bit mask solution. The tasks associated with bit mask operations should ensure that you at least understand binary at a superficial level.
From the top of my head, here are some examples of where I've used bitwise operators to do useful stuff.
A piece of javascript that needed one of those "check all" boxes was something along these lines:
var check = true;
for(var i = 0; i < elements.length; i++)
check &= elements[i].checked;
checkAll.checked = check;
Calculate the corner points of a cube.
Vec3f m_Corners[8];
void corners(float a_Size){
for(size_t i = 0; i < 8; i++){
m_Corners[i] = a_Size * Vec3f(axis(i, Vec3f::X), axis(i, Vec3f::Y), axis(i, Vec3f::Z));
}
}
float axis(size_t a_Corner, int a_Axis) const{
return ((a_Corner >> a_Axis) & 1) == 1
? -.5f
: +.5f;
}
Draw a Sierpinski triangle
for(int y = 0; y < 512; y++)
for(int x = 0; x < 512; x++)
if(x & y) pixels[x + y * w] = someColor;
else pixels[x + y * w] = someOtherColor;
Finding the next power of two
int next = 1 << ((int)(log(number) / log(2));
Checking if a number is a power of two
bool powerOfTwo = number & (number - 1);
The list can go on and on, but for me these are (except for Sierpinksi) everyday examples. Once you'll understand and work with it though, you'll encounter it in more and more places such as the corners of a cube.
You don't specifically mention (nor rule out!-) floating point binary numbers and arithmetic, so I won't miss the opportunity to flog one of my favorite articles ever (seriously: I sometimes wish I could make passing a strict quiz on it a pre-req of working as a programmer...;-).
The most important thing every programmer should know about binary numbers and arithmetic is : Every number in a computer is represented in some kind of binary encoding, and all arithmetic on a computer is binary arithmetic.
The consequences of this are many:
Floating point "bugs" when doing math with IEEE floating point binary numbers (Which is all numbers in javascript, and quite a few in JAVA, and C)
The upper and lower bounds of representable numbers for each type
The performance cost of multiplication/division/square root etc operations (for embedded systems
Precision loss, and accumulation errors
and more. This is stuff you need to know even if you never do a bitwise xor, or not, or whatever in your life. You'll still run into these things.
This really depends on the language you're using. Recent languages such as C# and Java abstract the binary representation from you -- this makes working with binary difficult and is not usually the best way to do things anyway in these languages.
Middle and low level languages like C and C++, however, require you to understand quite a bit about how the numbers are stored underneath -- especially regarding endianness.
Binary knowledge is also useful when implementing a cross platform protcol of some sort .... for example, on x86 machines, byte order is little endian. but most network protocols want big endian numbers. Therefore you have to realize you need to do the conversion for things to go smoothly. Many RFCs, such as this one -> https://www.rfc-editor.org/rfc/rfc4648 require binary knowledge to understand.
In short, it's completely dependent on what you're trying to do.
Billy3
It's handy to know the numbers 256 and 65536. It's handy to know how two's complement negative numbers work.
Maybe you won't run into a lot of binary. I still use it pretty often, but maybe out of habit.
A good familiarity with bitwise operations should make you more facile with boolean algebra, and I think that's important for every programmer--you want to be able to quickly simplify complex logic expressions.
Absolute minimum is, that "2" is not a binary digit and 10b is smaller than 3.
If you never do low-level programming (like C in embedded systems), never have to use a debugger, and never have to work with real numbers, then I suppose you could get by without knowing binary. But knowing binary will make you a stronger programmer, even if indirectly.
Once you venture into those areas you will need to know binary (and its ``sister'' base, hexadecimal). Without knowing it:
Embedded systems programming would be impossible.
Debugging would be hard because you wouldn't know what you were looking at in memory.
Numerical calculations with decimals would give you answers you don't understand.
I learned to twiddle bits back when c and asm were still used for "mainstream" programming. Although I no longer have much use for that knowledge, I recently used it to solve a real-world business problem.
We use a fax service that posts a message back to us when the fax has been sent or failed after x number of retries. The only way I had to identify the fax was a 15 character field. We wanted to consolidate this into one URL for all of our clients. Before we consolidated, all we had to fit in this field was the FaxID PK (32 bit int) column which we just sent as a string.
Now we had to identify the client (a 4 character code) and the database (32 bit int) underneath the client. I was able to do this using base 64 encoding. Without understanding the binary representation of numbers and characters, I probably would never have even thought of this solution.
Some useful information about the number system.
Binary | base 2
Hexadecimal | base 16
Decimal | base 10
Octal | base 8
These are the most common.
Converting them is faily easy.
112 base 8 = (1 x 8^2) + (2 x 8^1) + (4 x 8^0)
74 base 10 = (7 x 10^1) + (4 x 10^0)
The AND, OR, XOR, and etc. are used in logic gates. Search boolean algebra, something well worth the time knowing.
Say for instance, you have 11001111 base 2 and you want to extract the last four only.
Truth table for AND:
P | Q | R
T | T | T
T | F | F
F | F | F
F | T | F
You can use 11001111 base 2 AND 00111111 base 2 = 00001111 base 2
There are plenty of resources on the internet.

Rot13 for numbers

EDIT: Now a Major Motion Blog Post at http://messymatters.com/sealedbids
The idea of rot13 is to obscure text, for example to prevent spoilers. It's not meant to be cryptographically secure but to simply make sure that only people who are sure they want to read it will read it.
I'd like to do something similar for numbers, for an application involving sealed bids. Roughly I want to send someone my number and trust them to pick their own number, uninfluenced by mine, but then they should be able to reveal mine (purely client-side) when they're ready. They should not require further input from me or any third party.
(Added: Note the assumption that the recipient is being trusted not to cheat.)
It's not as simple as rot13 because certain numbers, like 1 and 2, will recur often enough that you might remember that, say, 34.2 is really 1.
Here's what I'm looking for specifically:
A function seal() that maps a real number to a real number (or a string). It should not be deterministic -- seal(7) should not map to the same thing every time. But the corresponding function unseal() should be deterministic -- unseal(seal(x)) should equal x for all x. I don't want seal or unseal to call any webservices or even get the system time (because I don't want to assume synchronized clocks). (Added: It's fine to assume that all bids will be less than some maximum, known to everyone, say a million.)
Sanity check:
> seal(7)
482.2382 # some random-seeming number or string.
> seal(7)
71.9217 # a completely different random-seeming number or string.
> unseal(seal(7))
7 # we always recover the original number by unsealing.
You can pack your number as a 4 byte float together with another random float into a double and send that. The client then just has to pick up the first four bytes. In python:
import struct, random
def seal(f):
return struct.unpack("d",struct.pack("ff", f, random.random() ))[0]
def unseal(f):
return struct.unpack("ff",struct.pack("d", f))[0]
>>> unseal( seal( 3))
3.0
>>> seal(3)
4.4533985422978706e-009
>>> seal(3)
9.0767582382536571e-010
Here's a solution inspired by Svante's answer.
M = 9999 # Upper bound on bid.
seal(x) = M * randInt(9,99) + x
unseal(x) = x % M
Sanity check:
> seal(7)
716017
> seal(7)
518497
> unseal(seal(7))
7
This needs tweaking to allow negative bids though:
M = 9999 # Numbers between -M/2 and M/2 can be sealed.
seal(x) = M * randInt(9,99) + x
unseal(x) =
m = x % M;
if m > M/2 return m - M else return m
A nice thing about this solution is how trivial it is for the recipient to decode -- just mod by 9999 (and if that's 5000 or more then it was a negative bid so subtract another 9999). It's also nice that the obscured bid will be at most 6 digits long. (This is plenty security for what I have in mind -- if the bids can possibly exceed $5k then I'd use a more secure method. Though of course the max bid in this method can be set as high as you want.)
Instructions for Lay Folk
Pick a number between 9 and 99 and multiply it by 9999, then add your bid.
This will yield a 5 or 6-digit number that encodes your bid.
To unseal it, divide by 9999, subtract the part to the left of the decimal point, then multiply by 9999.
(This is known to children and mathematicians as "finding the remainder when dividing by 9999" or "mod'ing by 9999", respectively.)
This works for nonnegative bids less than 9999 (if that's not enough, use 99999 or as many digits as you want).
If you want to allow negative bids, then the magic 9999 number needs to be twice the biggest possible bid.
And when decoding, if the result is greater than half of 9999, ie, 5000 or more, then subtract 9999 to get the actual (negative) bid.
Again, note that this is on the honor system: there's nothing technically preventing you from unsealing the other person's number as soon as you see it.
If you're relying on honesty of the user and only dealing with integer bids, a simple XOR operation with a random number should be all you need, an example in C#:
static Random rng = new Random();
static string EncodeBid(int bid)
{
int i = rng.Next();
return String.Format("{0}:{1}", i, bid ^ i);
}
static int DecodeBid(string encodedBid)
{
string[] d = encodedBid.Split(":".ToCharArray());
return Convert.ToInt32(d[0]) ^ Convert.ToInt32(d[1]);
}
Use:
int bid = 500;
string encodedBid = EncodeBid(bid); // encodedBid is something like 54017514:4017054 and will be different each time
int decodedBid = DecodeBid(encodedBid); // decodedBid is 500
Converting the decode process to a client side construct should be simple enough.
Is there a maximum bid? If so, you could do this:
Let max-bid be the maximum bid and a-bid the bid you want to encode. Multiply max-bid by a rather large random number (if you want to use base64 encoding in the last step, max-rand should be (2^24/max-bid)-1, and min-rand perhaps half of that), then add a-bid. Encode this, e.g. through base64.
The recipient then just has to decode and find the remainder modulo max-bid.
What you want to do (a Commitment scheme) is impossible to do client-side-only. The best you could do is encrypt with a shared key.
If the client doesn't need your cooperation to reveal the number, they can just modify the program to reveal the number. You might as well have just sent it and not displayed it.
To do it properly, you could send a secure hash of your bid + a random salt. That commits you to your bid. The other client can commit to their bid in the same way. Then you each share your bid and salt.
[edit] Since you trust the other client:
Sender:
Let M be your message
K = random 4-byte key
C1 = M xor hash(K) //hash optional: hides patterns in M xor K
//(you can repeat or truncate hash(K) as necessary to cover the message)
//(could also xor with output of a PRNG instead)
C2 = K append M //they need to know K to reveal the message
send C2 //(convert bytes to hex representation if needed)
Receiver:
receive C2
K = C2[:4]
C1 = C2[4:]
M = C1 xor hash(K)
Are you aware that you need a larger 'sealed' set of numbers than your original, if you want that to work?
So you need to restrict your real numbers somehow, or store extra info that you don't show.
One simple way is to write a message like:
"my bid is: $14.23: aduigfurjwjnfdjfugfojdjkdskdfdhfddfuiodrnfnghfifyis"
All that junk is randomly-generated, and different every time.
Send the other person the SHA256 hash of the message. Have them send you the hash of their bid. Then, once you both have the hashes, send the full message, and confirm that their bid corresponds to the hash they gave you.
This gives rather stronger guarantees than you need - it's actually not possible from them to work out your bid before you send them your full message. However, there is no unseal() function as you describe.
This simple scheme has various weaknesses that a full zero-knowledge scheme would not have. For example, if they fake you out by sending you a random number instead of a hash, then they can work out your bid without revealing their own. But you didn't ask for bullet-proof. This prevents both accidental and (I think) undetectable cheating, and uses only a commonly-available command line utility, plus a random number generator (dice will do).
If, as you say, you want them to be able to recover your bid without any further input from you, and you are willing to trust them only to do it after posting their bid, then just encrypt using any old symmetric cipher (gpg --symmetric, perhaps) and the key, "rot13". This will prevent accidental cheating, but allow undetectable cheating.
One idea that poped into my mind was to maybe base your algorithm on the mathematics
used for secure key sharing.
If you want to give two persons, Bob and Alice, half a key each so
that only when combining them they will be able to open whatever the key locks, how do you do that? The solution to this comes from mathematics. Say you have two points A (-2,2) and B (2,0) in a x/y coordinate system.
|
A +
|
C
|
---+---+---+---|---+---B---+---+---+---
|
+
|
+
If you draw a straight line between them it will cross the y axis at exactly one single point, C (0,1).
If you only know one of the points A or B it is impossible to tell where it will cross.
Thus you can let the points A and B be the shared keys which when combined will reveal the y-value
of the crossing point (i.e. 1 in this example) and this value is then typically used as
a real key for something.
For your bidding application you could let seal() and unseal() swap the y-value between the C and B points
(deterministic) but have the A point vary from time to time.
This way seal(y-value of point B) will give completely different results depending on point A,
but unseal(seal(y-value of point B)) should return the y-value of B which is what you ask for.
PS
It is not required to have A and B on different sides of the y-axis, but is much simpler conceptually to think of it this way (and I recommend implementing it that way as well).
With this straight line you can then share keys between several persons so that only two of
them are needed to unlock whatever. It is possible to use curve types other then straight lines to create other
key sharing properties (i.e. 3 out of 3 keys are required etc).
Pseudo code:
encode:
value = 2000
key = random(0..255); // our key is only 2 bytes
// 'sealing it'
value = value XOR 2000;
// add key
sealed = (value << 16) | key
decode:
key = sealed & 0xFF
unsealed = key XOR (sealed >> 16)
Would that work?
Since it seems that you are assuming that the other person doesn't want to know your bid until after they've placed their own, and can be trusted not to cheat, you could try a variable rotation scheme:
from random import randint
def seal(input):
r = randint(0, 50)
obfuscate = [str(r)] + [ str(ord(c) + r) for c in '%s' % input ]
return ':'.join(obfuscate)
def unseal(input):
tmp = input.split(':')
r = int(tmp.pop(0))
deobfuscate = [ chr(int(c) - r) for c in tmp ]
return ''.join(deobfuscate)
# I suppose you would put your bid in here, for 100 dollars
tmp = seal('$100.00') # --> '1:37:50:49:49:47:49:49' (output varies)
print unseal(tmp) # --> '$100.00'
At some point (I think we may have already passed it) this becomes silly, and because it is so easy, you should just use simple encryption, where the message recipient always knows the key - the person's username, perhaps.
If the bids are fairly large numbers, how about a bitwise XOR with some predetermined random-ish number? XORing again will then retrieve the original value.
You can change the number as often as you like, as long as both client and server know it.
You could set a different base (like 16, 17, 18, etc.) and keep track of which base you've "sealed" the bid with...
Of course, this presumes large numbers (> the base you're using, at least). If they were decimal, you could drop the point (for example, 27.04 becomes 2704, which you then translate to base 29...)
You'd probably want to use base 17 to 36 (only because some people might recognize hex and be able to translate it in their head...)
This way, you would have numbers like G4 or Z3 or KW (depending on the numbers you're sealing)...
Here's a cheap way to piggyback off rot13:
Assume we have a function gibberish() that generates something like "fdjk alqef lwwqisvz" and a function words(x) that converts a number x to words, eg, words(42) returns "forty two" (no hyphens).
Then define
seal(x) = rot13(gibberish() + words(x) + gibberish())
and
unseal(x) = rot13(x)
Of course the output of unseal is not an actual number and is only useful to a human, but that might be ok.
You could make it a little more sophisticated with words-to-number function that would also just throw away all the gibberish words (defined as anything that's not one of the number words -- there are less than a hundred of those, I think).
Sanity check:
> seal(7)
fhrlls hqufw huqfha frira afsb ht ahuqw ajaijzji
> seal(7)
qbua adfshua hqgya ubiwi ahp wqwia qhu frira wge
> unseal(seal(7))
sueyyf udhsj seven ahkua snsfo ug nuhdj nwnvwmwv
I know this is silly but it's a way to do it "by hand" if all you have is rot13 available.