This may be a silly question since I can see a relationship between "16 is a power of two" and "n bits can represent 2^n values", so it is logical that two hex digits represent a byte's possible values. However, I am still too blind to see how it would be any better than decimal, since either way it needs to be converted into binary.
Can someone explain the numerical notation conversion process in low-level pseudo-programming to make it more understandable? I really don't see how the conversion process is shortened if there is a conversion going on anyway.
As you have noticed, both are equally machine friendly. Both 10 and 0x0a are the same number, in the same way that 1/2 and 0.5 are the same number in decimal.
From the machine's point of view it doesn't matter. The machine doesn't see "10" or "0x0a". The machine just sees 00001010. Actually, not even that. What's really in the machine are two bits at a voltage (or charge) level that represents on and six bits at a voltage level that represents off.
The reason for the popularity of hexadecimal numbers is that it is easier to read for a "human". At least for engineers and programmers (to be honest I'm a bit hesitant to mention programmers because it isn't true for a vast majority of programmers).
First of all, engineers and programmers who have to write device drivers rarely care if a variable has the value 10 or 62 or 233 or whatever. They do care whether a number fits within memory or not. What values are sent to the hardware is the user's problem. Whether or not that value can be sent is what the engineer or driver writer has to deal with.
For this, hex numbers have a very significant advantage because each character aligns to exactly one nybble. Which means that a byte is represented by exactly two characters and two characters represent exactly one byte. Contrast this with decimal, where a byte requires three characters but three characters can represent values needing up to 10 bits, so the digit boundaries don't line up with byte boundaries.
Quick, can you fit 133 in one byte? Maybe you know that one byte can represent numbers from 0-255 so it may seem obvious, but a number like 0x133 obviously requires two bytes. In fact, to make it more clear you'll often see engineers write (hex) 01 33 in data dumps or documentation to make it more obvious that it's two bytes.
It's even more useful at higher bit counts. Quick, can you fit 4311111111 in 32 bits? I have a vague idea that 32 bits tops out at roughly 4 billion but am not sure if that number fits. Contrast this with its hexadecimal representation, 0x1 00F6 55C7: it is much more obvious that you need at least 33 bits to represent that number.
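To make that "does it fit?" check concrete, here is a minimal Python sketch using the values from the paragraphs above; int.bit_length() does the counting that the hex digits let you estimate by eye (each hex digit is one nibble, so the digit count times four is an upper bound):

for value, width in [(133, 8), (0x133, 8), (4311111111, 32)]:
    needed = value.bit_length()
    print(f"{value} = {value:#x}: needs {needed} bits, fits in {width}? {needed <= width}")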
Engineers are also used to seeing bit patterns in hexadecimal. The basic pattern is simple:
1 = first bit
2 = second bit
4 = third bit
8 = fourth bit
So, if you want to set bit number 19 in a register, your thought process would go something like this:
Bit 19 is after the 16th bit. And you know that the 16th bit is `0x0000 8000`
(when I think in hex, my mind always adds the blank spaces for clarity).
19 is 3 bits higher than 16. And the number that represents the third bit is 4.
So bit number 19 is: `0x0004 0000`
That part I mentioned about the 16th bit is another basic pattern most engineers trained to read hex recognize:
00 00 00 80 = bit no 8
00 00 80 00 = bit no 16
00 80 00 00 = bit no 24
80 00 00 00 = bit no 32
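Both of these patterns fall out of the same arithmetic. Here is a minimal Python sketch, using 1-indexed bit numbers to match the answer above (so bit 16 is 0x8000 and bit 19 is 0x0004 0000):

def bit_mask(n):
    """Mask with only the n-th bit (1-indexed) set."""
    return 1 << (n - 1)

for n in (8, 16, 19, 24, 32):
    print(f"bit {n:2d} = 0x{bit_mask(n):08x}")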
Another common pattern that I've trained myself to recognize is square waves:
0x55 = 01010101
0xaa = 10101010
0x33 = 00110011
0xcc = 11001100
0x66 = 01100110
0x99 = 10011001
0xf0 = 11110000
0x0f = 00001111
Which means that, for someone like me, if you tell me you want to set the first, third and fourth bits of a variable, within two seconds I'd go "Aha: 0x0d", because obviously the third and fourth bits together are c (square wave pattern), so add bit one and it becomes d.
You may be thinking to yourself: wait a minute, wouldn't binary representation be better? Well, yes, (bin) 0000 1101 much more obviously has the first, third and fourth bits set. The problem with binary representation is that at anything higher than 8 bits the number becomes too large for the human eye to easily comprehend, and in practice it has led to even more mistakes and misunderstandings compared to hexadecimal.
Also, not many languages have support for binary representation, and of those that do, most don't allow spaces or other separators (underscores, for example) between digits, which makes the number even less readable compared to hex.
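As it happens, Python 3.6+ is one of the languages that does allow binary literals with underscore separators, so the comparison can be written out directly (just an illustrative sketch):

assert 0b0000_1101 == 0x0d == 13   # first, third and fourth bits set
assert 0b1100_1100 == 0xcc         # one of the square-wave patterns above
print(f"{0x0d:#010b}")             # prints 0b00001101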
Hexadecimal hasn't always been the preferred representation among engineers and (some) programmers. At one point octal numbers were just as popular. But octals have the same problem as decimal. A byte requires three octal digits to represent but three octal digits can actually represent 9 bits. So it doesn't have that nice perfect division that hexadecimal has.
There is at least one other method to represent binary values that's very popular. You've seen it many times but may not have realized what it actually is: the dotted decimal notation of IPv4 addresses is actually an encoding that represents the values as decimal yet attempts to fit them within the structure of memory:
192.168.0.1 = 11000000 10101000 00000000 00000001
              (192)    (168)    (0)      (1)
Alas, as we move forward with IPv6, we seem to have decided to go back to hexadecimal encoding.
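For illustration, here is a small Python sketch of that encoding, using the address from the example above; each decimal group is simply one byte of the 32-bit value:

def dotted_to_bits(addr):
    octets = [int(part) for part in addr.split('.')]
    return ' '.join(f"{o:08b}" for o in octets)

print(dotted_to_bits("192.168.0.1"))   # -> 11000000 10101000 00000000 00000001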
It isn't more machine-friendly, it's more people-friendly. As the question comments have pointed out, machines deal almost exclusively with bits. But they expose design elements to people in a variety of ways. Different machine designs have encouraged different powers-of-two encodings in a number of ways. There's the obvious effect of the now-standard 8-bit byte, but bytes haven't always been 8 bits. A lot of the use of hexadecimal comes from how machines, especially those designed prior to near-universal microcoding, lay out instruction codes. In some designs, octal makes better sense. For example, the Intel 8080 CPU had a MOV instruction that contained two register IDs. Expressed in hexadecimal, the various register combinations produce opcodes ranging from 40 to 7F, with no obvious patterns. But in octal, it's obvious: 100 through 177, or rather 1xy, where x and y are register IDs.
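To make the octal argument concrete, here is a hedged Python sketch; it assumes the 8080 MOV encoding packs the destination and source registers into two 3-bit fields (binary 01 DDD SSS), which is what makes the octal digits line up:

def show_mov(opcode):
    dst = (opcode >> 3) & 0b111   # DDD field
    src = opcode & 0b111          # SSS field
    print(f"hex {opcode:02x}  octal {opcode:03o}  dst={dst} src={src}")

for op in (0x40, 0x41, 0x78, 0x7f):
    show_mov(op)
# the octal column reads 1xy, where x and y are the register fields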
Related
Question with regards to the use of hexadecimal. Here's a statement I have a question about:
Hexadecimal is often the preferred means of representing data because it uses fewer digits than binary.
For example, the word vim can be represented by:
Hexadecimal: 76 69 6D
Decimal: 118 105 109
Binary: 01110110 01101001 01101101
Obviously, hex is shorter than binary in this example. However, won't the hex values eventually be converted to binary at the machine level, so the end result for hex vs. binary is exactly the same?
This is a good question.
Yes, the hexadecimal values will be converted into binary at the machine level.
But you are looking at the question from the machine's point of view.
Hexadecimal notation was introduced because:
It's easier for humans to read and memorize than binary. For example, if you are reading memory addresses, you can observe that they are actually written in hexadecimal, which is far simpler to read than binary.
It's easier to convert between binary and hexadecimal than between binary and another base (like our everyday base 10): each hex digit corresponds to exactly 4 bits, so you can group binary digits into hex in your head (see the sketch below).
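For instance, a quick Python check of that grouping, using the "vim" example from the question (each hex digit is one 4-bit nibble, each pair of hex digits one byte):

for ch in "vim":
    code = ord(ch)
    print(f"{ch}: hex {code:02x}  dec {code}  bin {code:08b}")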
I suggest this article, which gives some easy example calculations, to better understand the advantages of hexadecimal.
Instead of writing ffff, why is the syntax for writing hexadecimal numbers like 0*ffff? What is the meaning of "0*"? Does it specify something?
Anyhow, the A, B, C, D, E, F notations appear only in the hexadecimal number system, so what's the need for "0*"?
Sorry, "*" was not the character I meant; it is "x".
Is it a nomenclature or notation for hexadecimal number systems?
I don't know what language you are talking about, but if, for example, you write the following in C#:
var ffffff = "Some unrelated string";
...
var nowYouveDoneIt = ffffff;
what do you expect to happen? How does the compiler know if ffffff refers to the hexadecimal representation of the decimal number 16777215 or to the string variable defined earlier?
Since identifiers (in C#) can't begin with a number, prefixing with a 0 and some other character (in C# it's 0xffffff for hex and 0b111111111111111111111111 for binary, IIRC) is a handy way of communicating what base the number literal is in.
EDIT: Another issue, if you were to write var myCoolNumber = 10, how would you have ANY way of knowing if this means 2, 10 or 16? Or something else entirely.
It's typically 0xFFFF: the letter, not the multiplication symbol.
As for why, 0x is just the most common convention, like how some programming languages allow binary to be prefixed by 0b. Prefixing a number with just 0 is typically reserved for octal, or base 8; they wanted a way to tell the machine that the following number is in hexadecimal, or base 16 (10 != 0b10 [2] != 010 [8] != 0x10 [16]). They typically omitted a small 'o' from identifying octal for human readability purposes.
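For what it's worth, here is the same comparison as a tiny Python sketch; Python spells octal with a 0o prefix rather than C's bare leading zero, but the prefix idea is the same:

print(10, 0b10, 0o10, 0x10)   # -> 10 2 8 16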
Interestingly enough, most Assembly-based implementations I've come across use (or at least allow the use of) 0h instead or as well.
It's there to indicate the number is hex. It's not '*', it's actually 'x'.
See:
http://www.tutorialspoint.com/cprogramming/c_constants.htm
I am trying to figure out how to read a historical binary data file. I believe it came from an older 32 bit Solaris system. I am looking at a section of the file that I believe contains 32 bit floating numbers (not IEEE floats). The format appears to be (as a hex dump):
xx41 xxxx
xx42 xxxx
The 41 and 42 in those positions appear consistently through the floating point numbers. I'm afraid that I do not have any additional information to add to this. So the first part of my question is, what format is this? If the first part can not be answered directly, a list of likely possibilities would be great. Lastly, how would you suggest going about determining what format this is? Thank you for your input.
Could this be PDP-11 format? The giveaway for me is that the second byte is mostly constant, which suggests that the exponent of the floating-point format is ending up in the second byte rather than the first (as you'd expect for a big-endian machine) or the last (for a little-endian machine). The PDP-11 is notorious for its funny byte order for floats and integers; see the material near the bottom of this Floating-Point Formats page.
The values of 41 and 42 would appear to be consistent with positive values of roughly unit-magnitude: the exponent bias for the PDP-11 format appears to be 128, so with the unusual byte-order I'd expect the 2nd byte that you list to contain the sign and the topmost 7 bits of the exponent; that would make the unbiased exponent for a second byte of 41 be either 2 or 3 depending on the 8th exponent bit (which should appear as the MSB of the first byte).
See also this page for a brief description of the PDP-11 format.
[EDIT] Here's some Python code to convert from a 4-byte string in the form you describe to a Python float, assuming that the 4-byte string represents a float in PDP-11 format.
import struct

def pdp_to_float(xs):
    """Convert a 4-byte PDP-11 single-precision float to a Python float.

    Expects a 4-character byte string (Python 2 str) in the swapped
    byte order described above.
    """
    # Reorder the PDP-11 word-swapped bytes into plain big-endian order.
    ordered_bytes = ''.join(xs[i] for i in [1, 0, 3, 2])
    n = struct.unpack('>I', ordered_bytes)[0]
    fraction = n & 0x007fffff            # low 23 bits of the significand
    exponent = (n & 0x7f800000) >> 23    # 8-bit exponent field, bias 128
    sign = (n & 0x80000000) >> 31
    hidden = 1 if exponent != 0 else 0   # implicit leading bit, except for zero
    return (-1)**sign * 2**(exponent - 128) * (hidden + fraction / 2.0**23)
Example:
>>> pdp_to_float('\x00\x00\x00\x00')
0.0
>>> pdp_to_float('\x23\x41\x01\x00')
5.093750476837158
>>> pdp_to_float('\x00\x42\x00\x00')
16.0
The data described is consistent with the usual IEEE 754 format, stored in big-endian order, then displayed by a little-endian dump program two bytes at a time.
32-bit floats in the interval [8, 128) have first bytes of 0x41 or 0x42. Consider such a number, perhaps 0x41010203. Stored big end first, it would appear in memory as the four bytes 0x41, 0x01, 0x02, and 0x03. When the dump program reads 16-bit integers, little end first, it will read and display 0x0141 and 0x0302.
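Here is a small Python sketch of that explanation; it assumes an ordinary IEEE 754 single stored big end first and then dumped 16 bits at a time in little-endian order (12.25 is just an arbitrary value from the [8, 128) range):

import struct

raw = struct.pack('>f', 12.25)               # big-endian IEEE 754: b'\x41\x44\x00\x00'
words = struct.unpack('<2H', raw)            # read as two little-endian 16-bit words
print(' '.join(f"{w:04x}" for w in words))   # -> 4441 0000, the "xx41 xxxx" shape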
Sorry for the earlier not-so-clear questions; I should have explained my doubt rather than just posting the question.
Just to clarify: let's say it is a 3-bit float and a 3-bit integer. For the int it's
2^3, [_ _ _ _], e.g. 8 = [1 0 0 0].
What about the float? Will it be like
[ _ _ _ _ . _ _ _ _ ]
For example, can I represent 8.5 as
[1 0 0 0 . 1 0 0 0]
Is what I wrote correct? I get how to represent an integer value given how many bits it should take, but for a float, are the number of bits the same before and after the decimal point?
A 3-bit floating point number has a maximum of 3 bits, which makes it, if not impossible, completely impractical. Here is why, from the definition of a floating-point representation (IEEE 754):
Floating point comprises finite numbers, which may be either base 2 (binary) or base 10 (decimal). Each finite number is described by three integers: s = a sign (zero or one), c = a significand (or 'coefficient'), and q = an exponent. This means in your case that you have a sign bit (bit 1), a significand (bit 2) and an exponent (bit 3). The range for this is useless.
You don't string series of three bits together because floating point is a predetermined fixed length that everybody agrees on ahead of time, at least in the sense you mean.
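To see just how useless the range is, here is a toy Python enumeration under one assumed layout (1 sign bit, 1 exponent bit, 1 significand bit, read naively as (-1)^s * (1 + m/2) * 2^e with no bias and no special cases):

values = set()
for s in (0, 1):
    for e in (0, 1):
        for m in (0, 1):
            values.add((-1) ** s * (1 + m / 2) * 2 ** e)
print(sorted(values))   # eight bit patterns, only eight values: -3.0 ... 3.0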
There are examples of how to do extended floating point in GSL source:
http://www.gnu.org/software/gsl/
Decimal numbers can also be represented by other formats like: BCD
http://en.wikipedia.org/wiki/Binary-coded_decimal
Final point: you CAN make 3-bit FP numbers, and then string them together if you want to make bigger numbers. They just won't be 3-bit anymore: 6-bit, 12-bit, etc.
Need a refresher on bits/bytes, hex notation and how it relates to programming (C# preferred).
Looking for a good reading list (online preferably).
There are several layers to consider here:
Electronic
In the electronic paradigm, everything is a wire. A single wire represents a single bit. 0 is the LOW voltage, 1 is the HIGH voltage. The voltages may be [0, 5], [-3.3, 3], [-5, 5], [0, 1.3], etc. The key thing is that there are only two voltage levels which control the action of the transistors.
A byte is a collection of wires (to be precise, it's probably collected in a set of flip-flops called registers, but let's leave it as "wires" for now).
Programming
A bit is 0 or 1.
A byte is - in modern systems - 8 bits. Ancient systems might have had 10-bit bytes or other sizes; they don't exist today.
A nybble is 4 bits; half a byte.
Hexadecimal is an efficient representation of bits, four at a time: F maps to 1111, which is more compact than writing 15. It is especially clear when you are writing down multiple byte values: FF is unambiguous; 1515 can be read several different ways.
Historically, octal (base 8) has also been used. However, the only place where I have met it is in Unix permissions.
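The permission use works because each octal digit is exactly one rwx triplet of bits; a quick Python sketch with the familiar mode 755:

mode = 0o755
for who, shift in (("user", 6), ("group", 3), ("other", 0)):
    bits = (mode >> shift) & 0b111
    perms = ''.join(flag if bits & mask else '-'
                    for flag, mask in (('r', 4), ('w', 2), ('x', 1)))
    print(f"{who}: {bits:o} -> {perms}")   # user: 7 -> rwx, group/other: 5 -> r-x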
Since on the electronic layer it is most efficient to collect memory in groups of 2^n, hex is a natural notation for representing memory. Further, if you happen to work at the driver level, you may need to specifically control a given bit, which will require the use of bit-level operators. It is clearer which bits are set if you write 0xF & outputByte than 15 & outputByte.
In general, much of modern programming does not need to concern itself with binary and hexadecimal. However, if you are in a place where you need to know it, there is no slipping by - you really need to know it then.
Particular areas that need the knowledge of binary include: embedded systems, driver writing, operating system writing, network protocols, and compression algorithms.
While you wanted C#, C# is really not the right language for bit-level manipulation. Traditionally, C and C++ are the languages used for bit work. Erlang works with bit manipulation, and Perl supports it as well. VHDL is completely bit-oriented, but is fairly difficult to work with from the typical programming perspective.
Here is some sample C code for performing different logical operations:
char a, b, c;
c = a ^ b; //XOR
c = a & b; //AND
c = a | b; //OR
c = ~(a & b); //NOT AND(NAND)
c = ~a; //NOT
c = a << 2; //Left shift 2 places
c = a >> 2; //Right shift 2 places.
A bit is either 1 or 0.
A byte is 8 bits.
Each character in hex is 4 bits represented as 0-F
0000 is 0
0001 is 1
0010 is 2
0011 is 3
...
1110 is E
1111 is F
There's a pretty good intro to C#'s bit-munching operations here
Here is some basic reading: http://www.learn-c.com/data_lines.htm
Bits and bytes hardly ever relate to C#, since the CLR handles memory by itself. There are classes and methods handling hex notation and all those things in the framework too. But it is still a fun read.
Write Great Code is a good primer on this topic, among others... it brings you from the bare metal to higher-order languages.