Is there a relationship between these bit operations and the numbers they produce? - binary

Hey guys, I'm a second-year uni student and I'm really new to using bits and bitwise operations. When I take the binary of 3 (0011), reverse it (1100) and then shift it right, I get (0110), which is 6 in decimal. If I do this with 2 I get 2. I was wondering if there is some generic relationship describing what n becomes after doing those 2 operations on it, because I think it might be the key to one of my homework questions.
Also, does anyone have a good resource for learning about simple bitwise operations and their properties in general, at a school-kid level?

When I got the binary of 3 (0011), and reversed it (1100) and then
shifted it right I get (0110) and 6 as the decimal number.
You are assuming a 4-bit number; what if it's a 16-bit or 32-bit number? Your results will change accordingly.
There are actually two right-shift operations: unsigned right shift and signed right shift. It seems you performed an unsigned right shift, since a signed right shift fills the vacated bits according to the sign: if the number is negative (the most significant bit is 1), all the shifted-in bits are 1, otherwise 0.
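For example (a minimal sketch of my own; the question never names a language, so I'll use Java, which spells the two shifts differently):

int x = 0b1100;            // 12
int logical = x >>> 1;     // unsigned right shift: 0110 = 6, zeros come in from the left
int arithmetic = -8 >> 1;  // signed right shift: the sign bit is replicated, giving -4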
I was wondering if there was some generic relationship to find what n
would become doing those 2 operations on it.
The answer to this is: it totally varies with the word width you assume.
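One way to see the width-dependence concretely (my own sketch, not part of the original answer): reversing n within w bits and then shifting right once is the same as reversing n within w - 1 bits, as long as the top bit of n is 0, so the result moves with the width you pick.

static int reverseThenShift(int n, int w) {
    int reversed = Integer.reverse(n) >>> (32 - w);  // reverse n within the low w bits
    return reversed >>> 1;                           // then one unsigned right shift
}

// reverseThenShift(3, 4) == 6, as in the question, but reverseThenShift(3, 8) == 96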
Also does anyone have a good resource to learn about simple bitwise
operations and properties in general for a school kid
Luckily, I have just recently blogged about Number System and Bit operations.

Related

Signed integer conversion

What is -10234 (base 10) in binary with a fixed width of 16 bits, in 1) one's complement, 2) two's complement, 3) signed magnitude?
Please help me step by step; I feel confused about the above three. Many thanks.
That sounds like a homework problem. I'm not going to do your homework for you, because the goal is for you to learn, but I can explain the stuff in my own words for you. In my experience, most of the people who get lost on this stuff just need to hear things said in a way that works for them, rather than having the same thing repeated.
The first thing that you need to understand for this is what the positive version of that number is in base 2. Since the problem said you have 16 bits to handle the signed version in, you'll only have 15 bits to get this done.
As far as how to make it negative...
When you're doing signed magnitude, you would have one of those bits signal whether it was positive or negative. For an example, I'll do 4 bits of signed magnitude. Our number starts off as 3, that is 0011. The signed bit is always the most significant bit, so -3 would be 1011.
When you're doing one's complement, you just flip all of the bits. (So if you had an 8-bit one's complement number that's currently positive - let's say it's 25(9+1), or 00011001(1+1) - to make that 25 negative in one's complement, you'd flip all of those bits, so -25(9+1) is 11100110(1+1) in one's complement.)
Two's complement is the same sort of thing, except that rather than having all 1s (11111111(1+1) for the 8-bit version) be -0, a number we rarely care to distinguish from +0, it adjusts all of the negative numbers by one so that all 1s is now -1.
Note that I'm giving the bases in the form of number+1, because every base is base 10 in that base. But that's me, a grizzled computer professional; if you're still in school, represent bases the way your instructor tells you to, but understand they're crazy. (I can prove they're crazy: 1. They're human. 2. QED. In future years when some people are just learning from AIs, the proof is slightly more complicated: 1. They were made, directly or indirectly, by humans. 2. All humans are crazy. 3. QED.)
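For anyone who wants to check their pencil-and-paper work, here is a small sketch of my own (not part of the original answer) that prints the three encodings of -3 in 8 bits; the mask keeps every result within the chosen width:

public class SignedEncodings {
    public static void main(String[] args) {
        int magnitude = 3;
        int bits = 8;
        int mask = (1 << bits) - 1;

        int signMagnitude  = (1 << (bits - 1)) | magnitude;  // set the sign bit:   10000011
        int onesComplement = ~magnitude & mask;              // flip every bit:     11111100
        int twosComplement = (onesComplement + 1) & mask;    // flip, then add one: 11111101

        System.out.println(Integer.toBinaryString(signMagnitude));
        System.out.println(Integer.toBinaryString(onesComplement));
        System.out.println(Integer.toBinaryString(twosComplement));
    }
}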

How does a computer work out if a value is greater than another?

I understand basic binary logic and how to do basic addition, subtraction etc. I get that each of the characters in this text is just a binary number representing a number in a charset. The numbers don't really mean anything to the computer. I'm confused, however, as to how a computer works out that a number is greater than another. What does it do at the bit level?
If you have two numbers, you can compare each bit, from most significant to least significant, using a 1-bit comparator gate.
Of course n-bit comparator gates exist and are described further here.
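The answer's gate diagram isn't reproduced here, but a single comparator bit reduces to simple boolean logic; a sketch of my own: for one-bit values a and b, "a greater" is a AND (NOT b), and "a less" is (NOT a) AND b.

static int compareBits(int a, int b) {   // a and b are single bits, 0 or 1
    return (a & ~b & 1) - (~a & b & 1);  // 1 if a > b, -1 if a < b, 0 if equal
}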
It subtracts one from the other and sees if the result is less than 0 (by checking the highest-order bit, which is 1 on a number less than 0 since computers use 2's complement notation).
http://academic.evergreen.edu/projects/biophysics/technotes/program/2s_comp.htm
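A quick sketch of that idea (my own, not from the linked page; note that a plain subtraction can overflow for operands of opposite sign and extreme magnitude, which is why real hardware also consults an overflow flag):

static boolean greaterThan(int a, int b) {
    int diff = a - b;                        // assume no overflow for this illustration
    return diff != 0 && (diff >>> 31) == 0;  // non-zero with the sign bit clear means a > b
}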
It subtracts the two numbers and checks if the result is positive, negative (the highest bit, aka "the minus bit", is set), or zero.
Within the processor, there is often microcode to do operations, using hardwired functions such as add/subtract that are already there.
So, to do a comparison of an integer the microcode can just do a subtraction, and based on the result determine if one is greater than the other.
Microcode is basically just low-level programs invoked by the assembly instructions, to make it look like there are more commands than are actually hardwired on the processor.
You may find this useful:
http://www.osdata.com/topic/language/asm/intarith.htm
I guess it does a bitwise comparison of two numbers from the most significant bit to the least significant bit, and when they differ, the number with the bit set to "1" is the greater.
In a big-endian architecture, the comparison of the following bytes:
A: 0010 1101
B: 0010 1010
would result in A being greater than B, because its 6th bit (from the left) is set to one while the preceding bits are equal to B's.
But this is just a quick theoretical answer, with no concern for floating point numbers or negative numbers.
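Here is that idea as code (my own sketch, treating both values as unsigned 8-bit quantities, in line with the answer's caveat about negatives and floating point):

static int compareMsbFirst(int a, int b) {
    for (int bit = 7; bit >= 0; bit--) {       // most significant bit first
        int abit = (a >>> bit) & 1;
        int bbit = (b >>> bit) & 1;
        if (abit != bbit) return abit - bbit;  // the first differing bit decides
    }
    return 0;                                  // all bits equal
}

// compareMsbFirst(0b00101101, 0b00101010) > 0, matching the A/B example above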

Why is it useful to count the number of bits?

I've seen the numerous questions about counting the number of set bits in an integer input, but why is it useful?
For those looking for algorithms about bit counting, look here:
Counting common bits in a sequence of unsigned longs
Fastest way to count number of bit transitions in an unsigned int
How to count the number of set bits in a 32-bit integer?
You can regard a string of bits as a set, with a 1 representing membership of the set for the corresponding element. The bit count therefore gives you the population count of the set.
Practical applications include compression, cryptography and error-correcting codes. See e.g. wikipedia.org/wiki/Hamming_weight and wikipedia.org/wiki/Hamming_distance.
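As a tiny concrete example of the set view (my own, using Java's built-in popcount):

int a = 0b0010_1101, b = 0b0010_1010;
int common   = Integer.bitCount(a & b);  // size of the intersection of the two "sets": 2
int distance = Integer.bitCount(a ^ b);  // Hamming distance, i.e. differing positions: 3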
If you're rolling your own parity scheme, you might want to count the number of bits. (In general, of course, I'd rather use somebody else's.) If you're emulating an old computer and want to keep track of how fast it would have run on the original, some had multiplication instructions whose speed varied with the number of 1 bits.
I can't think of any time I've wanted to do it over the past ten years or so, so I suspect this is more of a programming exercise than a practical need.
In an ironic sort of fashion, it's useful for an interview question because it requires some detailed low-level thinking and doesn't seem to be taught as a standard algorithm in comp sci courses.
Some people like to use bitmaps to indicate presence/absence of "stuff".
There's a simple hack to isolate the least-significant 1 bit in a word by converting it to a field of ones covering that bit and the bits below it; then you can find the bit number by counting the 1-bits.
countbits(x XOR (x - 1)) - 1
Watch it work.
Let x = 00101100
Then x-1 = 00101011
x XOR x-1 = 00000111
Which has 3 bits set, so bit 2 was the least-significant 1 bit in the original word.
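The same trick in Java (my rendering of the pseudocode above; note it misbehaves for x == 0, where there is no 1 bit to find):

static int lowestSetBitIndex(int x) {
    return Integer.bitCount(x ^ (x - 1)) - 1;  // returns 2 for x = 00101100, as above
}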

Easiest way to find the correct kademlia bucket

In the Kademlia protocol node IDs are 160 bit numbers. Nodes are stored in buckets, bucket 0 stores all the nodes which have the same ID as this node except for the very last bit, bucket 1 stores all the nodes which have the same ID as this node except for the last 2 bits, and so on for all 160 buckets.
What's the fastest way to find which bucket I should put a new node into?
I have my buckets simply stored in an array, and need a method like so:
Bucket[] buckets; // array with 160 items

public Bucket GetBucket(Int160 myId, Int160 otherId)
{
    // some stuff goes here
}
The obvious approach is to work down from the most significant bit, comparing bit by bit until I find a difference. I'm hoping there is a better approach based on clever bit twiddling.
Practical note:
My Int160 is stored in a byte array with 20 items; solutions which work well with that kind of structure will be preferred.
Would you be willing to consider an array of five 32-bit integers (or three 64-bit integers)? Working with whole words may give you better performance than working with bytes, but the method should work in either case.
XOR the corresponding words of the two node IDs, starting with the most significant. If the XOR result is zero, move on to the next most significant word.
Otherwise, find the most significant bit that is set in this XOR result, using the constant-time method from Hacker's Delight. That algorithm yields 32 (or 64) if the most significant bit is set, 1 if the least significant bit is set, and so on. This index, combined with the index of the current word, tells you which bit is different.
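Roughly, in code (a sketch of my own, assuming the five-int layout suggested above; Java's Integer.numberOfLeadingZeros plays the role of the Hacker's Delight routine, returning 0 when the most significant bit is set):

static int firstDifference(int[] myId, int[] otherId) {  // 5 words, most significant first
    for (int i = 0; i < myId.length; i++) {
        int x = myId[i] ^ otherId[i];
        if (x != 0) {
            return i * 32 + Integer.numberOfLeadingZeros(x);  // bit offset from the overall MSB
        }
    }
    return -1;  // the IDs are identical
}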
For starters you could compare byte-by-byte (or word-by-word), and when you find a difference search within that byte (or word) for the first bit of difference.
It seems vaguely implausible to me that adding a node to an array of buckets will be so fast that it matters whether you do clever bit-twiddling to find the first bit of difference within a byte (or word), or just churn in a loop up to CHAR_BIT (or something). Possible, though.
Also, if IDs are essentially random with uniform distribution, then you will find a difference in the first 8 bits about 255/256 of the time. If all you care about is average-case behaviour, not worst-case, then just do the stupid thing: it's very unlikely that your loop will run for long.
For reference, though, the first bit of difference between numbers x and y is the first bit set in x ^ y. If you were programming in GNU C, __builtin_clz (count leading zeros) would be your friend here, since you want the first difference counted from the top; __builtin_ctz counts from the other end.
Your code looks like Java, though, so I guess the bitfoo you're looking for is integer log.
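Putting the byte-by-byte version together for the 20-byte layout from the question (my own sketch; the bucket number falls out as the index, counted from the least significant end, of the highest bit where the two IDs differ):

static int bucketIndex(byte[] myId, byte[] otherId) {
    for (int i = 0; i < 20; i++) {                 // byte 0 is the most significant
        int x = (myId[i] ^ otherId[i]) & 0xFF;
        if (x != 0) {
            int bitInByte = 31 - Integer.numberOfLeadingZeros(x);  // 7..0 within this byte
            return (19 - i) * 8 + bitInByte;       // IDs differing only in the last bit land in bucket 0
        }
    }
    return -1;                                     // identical IDs: no bucket
}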

Is TimeSpan unnecessary?

EDIT 2009-Nov-04
OK, so it's been a little while since I first posted this question. It seems to me that many of the initial responders failed to really get what I was saying--a common response was some variation on "What you're saying doesn't make any sense"--and so I've made some handy diagrams to really illustrate my point.
When we speak of numbers, we are generally referring to points on what grade school children learn is called the Number Line:
Now, when we learn arithmetic, our minds learn to perform a very interesting transformation of this concept. Evaluating the expression 1 + 0.5, for example, if we simply applied our "number line thinking", would require us to somehow make sense of this:
It's difficult to really illustrate that, because it's difficult to think about that: "adding" two points. This is where a lot of responders struggled with the idea of adding dates (or simply dismissed it as absurd), because they were thinking of dates as points.
However, the expression 1 + 0.5 does make sense to us, because when we think of it, we're really imagining this:
That is, the number (or point) 1, plus the vector 0.5, resulting in point 1.5.
Alternately, we may be imagining this:
That is, the vector 1, plus the vector 0.5, resulting in the vector 1.5.
In other words, when dealing with numbers, we treat points and vectors interchangeably. But what about dates? Dates are, after all, basically numbers. If you don't believe me, compare this line to the number line above:
Notice the correspondence between the timeline and the number line? This was my point: if we perform the transformation above with numbers, we ought to be able to do it with dates as well. So, applying "timeline thinking", the expression 0001-Jan-02 00:00:00 + 0001-Jan-01 12:00:00 doesn't make a lot of sense, as plenty of responders pointed out:
But, if we do the same conceptual transformation in our head that we perform every time we add or subtract numbers, we can easily "rethink" the above as this:
So clearly, the difference between a DateTime and a TimeSpan is the same difference that exists between a point and a vector. What I think caused a lot of people to respond negatively to my suggestion is that it just feels so unnatural to think of dates as magnitudes in this way. But I don't buy the argument that there's no obvious reference point to use as zero. There is an obvious reference point, and I'll give you a hint where it is: about 2010 years ago.
Don't get me wrong: I'm not questioning the usefulness of drawing a conceptual divide between the notion of a DateTime and a TimeSpan. Really, my question all along should have been (as ChrisW indirectly suggested), why do we treat numbers and vectors interchangeably when dealing with regular numeric types? (Or: why do we have just one int type, instead of int and intspan?) There's a big difference, and yet we don't ever really think about it until sometime in junior high or high school, when we begin geometry. And then it's treated as this new mathematical concept, when in reality it's something we've been utilizing ever since we learned to add numbers by counting with our fingers.
In the end, the best answer came from Strilanc, who pointed out that the use of DateTime and TimeSpan is really an implementation of an affine space, which has the convenient property of not needing a reference point to treat as the origin. So thanks, Strilanc. I'm giving the accepted answer to ChrisW, however, for being the first one to bring up the concept of vectors and points, which really got to the crux of the matter.
ORIGINAL QUESTION (for posterity)
I am certainly no programming jack of all trades, but I know both PHP and .NET have a TimeSpan class in addition to a DateTime class (or structure in .NET), and I am guessing this is the case in a variety of other languages and frameworks as well (though I am writing this primarily with reference to the .NET structures). This might seem a strange question, but isn't TimeSpan redundant?
In case you think the answer is obvious ("A DateTime is an absolute point in time, while a TimeSpan is a range of time -- simple as that!"), consider this: an integer can be conceptualized as either an absolute value (the point on the number line) or a distance between values--and we don't need two separate data types for these different conceptualizations. I can still write 5 + 6 without any ambiguity as to what I mean.
As long as there is a consistent zero-point reference, it seems to me there should be no reason why one would need a TimeSpan object to perform arithmetic operations on DateTime objects, or to get the distance between them.
What am I missing? Why can't the unique methods and properties of the TimeSpan structure simply be folded into DateTime?
(Disclaimer: It isn't like I'm passionate about this or anything; I'm fine using DateTime and TimeSpan objects as they're intended all the time. I'm just asking a question.)
EDIT: Okay, over-simplified example to illustrate my point:
Consider the equation 10 - 5 = 5. One could read this as "Start at 10 (value), move 5 to the left (span), and you end up at 5 (value)."
Suppose, just to make things easy, we let January 1 1900 be point zero and we define TimeSpan objects in terms of days only.
Then 10 - 5 = 5 could be understood, in DateTime terms, as January 11 1900 - January 6 1900 = January 6 1900. This is fine, because January 11 is just "10" by our definition and January 6 is "5". The fact that we are viewing the 10 as a value, the first 5 as a span, and the last 5 as a value again is merely for our own conceptual benefit. My point is just this: that the only difference is in how you think of the number, not in what it actually is. This is why we don't have separate structures for, say, integer values and integer spans -- a plain old integer covers all our bases.
Am I making any sense?
consider this: an integer can be conceptualized as either an absolute value (the point on the number line) or a distance between values
By your logic, it isn't TimeSpan that's unnecessary: rather it's DateTime that's unnecessary, and could be replaced by TimeSpan (duration since zero).
Plus there's the fact that integers have an obvious zero, whereas dates don't; and an obvious zero is necessary if you want to replace "place on the number line" with "distance/span from the zero/origin".
Edit:
A point (location on a plane) isn't the same as a vector.
They seem similar ...
A vector (distance from origin) can represent a point
A point (relative to the origin) can represent a vector
... however the value of the vector that's required to represent a given point will change if the origin changes.
It always makes sense to add two (relative) vectors; but, it makes no sense to add two points, except by converting those points to vectors and then adding the vectors.
The sum of two vectors is unaffected by a change in the origin, but the sum of two points would be affected by a change in the origin if you summed them by converting them to vectors and adding the vectors (because changing the origin would affect the values of those vectors).
[Replace 'point' with DateTime and 'vector' with TimeSpan in the argument above.]
I think there is a genuine difference between absolute and relative values. I don't know why that difference isn't more apparent in arithmetic, i.e. why 'numbers' are used seemingly interchangeably to represent both absolute and relative values.
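The point/vector split shows up directly in code. A sketch of my own in Java's java.time (standing in for the .NET types), where Instant plays the part of DateTime and Duration plays TimeSpan:

import java.time.Duration;
import java.time.Instant;

public class PointsAndVectors {
    public static void main(String[] args) {
        Instant party = Instant.now().plus(Duration.ofDays(7));          // point + vector = point
        Duration total = Duration.ofDays(1).plus(Duration.ofHours(12));  // vector + vector = vector
        // party.plus(Instant.now());  // point + point: does not compile, which is the point
        System.out.println(party + " / " + total);
    }
}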
(Speaking as a mathematician) It's because arithmetic operations on a "date" aren't closed or well defined, necessitating an additional structure.
For example, January 1, 2000 - December 1, 1999 = ... ? We know there's 31 days between them, but if this were interpreted as a date, then the answer is Epoch (i.e., zero) + 31 days. This is not a valid "date" anymore.
Similarly, not all arithmetic operations on integers are well defined (1 / 2 has no answer in the integers; integer math returns zero here, but 0 * 2 = 0, not 1 as you would expect). This necessitates an additional structure that we call fractions.
Just because you can define an operation doesn't mean you should. For example, one of the reasons division by zero is undefined is because defining it would require sacrificing some very useful properties of arithmetic (e.g. associativity).
The distinction between a timespan and a date comes down to addition. It makes sense to add two timespans, but it doesn't make sense to add two dates unless you have an arbitrary reference date. By not allowing addition of dates, you abstract away that arbitrary reference date. I don't know what date '0' is in .Net, and I've never needed to know. Isn't that nice?
Adding two dates is almost always a bug (seriously, try to think of where this makes sense outside of numerology). By introducing timespans (creating an Affine Space) you eliminate a whole class of bugs.
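To make the subtraction example from the mathematician's answer concrete (my own illustration, again with java.time): subtracting two dates yields a span, not a date, and adding two dates simply isn't offered.

import java.time.Duration;
import java.time.Instant;

public class DateSubtraction {
    public static void main(String[] args) {
        Instant y2k   = Instant.parse("2000-01-01T00:00:00Z");
        Instant dec99 = Instant.parse("1999-12-01T00:00:00Z");
        Duration gap  = Duration.between(dec99, y2k);  // a Duration (span), not an Instant (date)
        System.out.println(gap.toDays());              // prints 31
    }
}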
One reason is that splitting the types prevents a class of bugs where you think you have a relative time but really have an absolute time, and vice versa. For example, addition of two absolute times can be flagged as a compiler error if the two types are separate.
Also, IntelliSense (and discovery for newbies) works better when the number of members is smaller-- by splitting methods between the two types, working with each gets easier.
Asked the other way round: what would the benefit of weakening the type system in that regard be?
It's all a question of cost vs. benefit, and DateTime has the great benefit of reducing bugs due to illogical date/time calculations by forbidding such actions. DateTime exists for very much the same reasons that a strict type-checking system exists in the first place: to make semantic errors in the code produce compile-time messages that notify the programmers of errors in their code.
Conversely, there’s the cost of having DateTime: zilch.
Now consider dropping DateTime. What would we gain?
To answer your question directly: "isn't TimeSpan redundant?" Absolutely not; it reduces bugs. It definitely has for me.
Think about it conceptually. If I tell you that I'm having a party 7 days from now, is "7 days" in that sentence a date? Could I just say my party is on "7 days"? Of course not, because 7 days isn't a date. One of the key ideas of object-oriented programming is to represent concepts like this in the system as types. It's true that we could represent everything as an integer (and in fact, many people have and do), but in object-oriented programming we have the notion of types of items, with their behaviors and properties, and in that sense it makes sense to have an object that expresses this.
I think you could make the opposite argument that DateTime is redundant, and we should only have TimeSpan :)
Seriously, all dates really are just time spans. They are all relative to some starting point. Technically, there is no "year zero" in the Christian calendar (since you can't really have a "zeroth year of our lord"), but if we assign 12:00 A.M. January 1, 0001 B.C. as the "zero point", then every date that comes after (or before) can be thought of as relative to that date. So, 12:00 A.M. on September 19, 2009 would have a TimeSpan of 734033 days.
So, mathematically, DateTime and TimeSpan are redundant. But when we write code, we are attempting to communicate much more than just abstract mathematical constructs. Any given DateTime instance may in fact just be a time span relative to some arbitrary zero point, but to most people reading your code, it will imply a particular point on the calendar. Similarly, a TimeSpan implies the gap between two points on the calendar.
In this case, Microsoft has chosen to be clear rather than parsimonious. I can't say I disagree with the decision.
There are a lot of complications in dates, for example:
leap years
leap seconds
the 1582 change to the Gregorian calendar
the fact that there is no such thing as 0 years
differences in the lengths of months
Treating Dates and TimeSpans as different things means that these kinds of issues are much less likely to confuse you in practice.
It's sugar, no more or less...