Consider a system that has a byte-addressable memory organized in 32-bit words according to the big-endian scheme. A program reads ASCII characters entered at a keyboard and stores them in successive byte locations starting at location 1000.
Show the contents of the two memory words at locations 1000 and 1004 after the name johnson has been entered. Write this in the little-endian scheme.
What I got was:
[NULL, n], [o, s], [n,h], [o,j]
00, 6E 6F, 73 6E, 68 6F, 6A
I just want to know if this is correct and if not, what I did wrong.
Thank you all!
There is no such thing as endianness for storing a single byte (such as an ASCII character). Endianness only comes into play when a value is represented as multiple bytes. Storing a sequence of individual bytes therefore looks the same on little- and big-endian machines; only the layout of multi-byte values differs. For example, take the number 3 735 928 559 (0xdeadbeef in hex notation) and store it as a 32-bit word (e.g., an int) at memory location 1000. You get:
ADR: 1000 1001 1002 1003
BE: de ad be ef
LE: ef be ad de
So, if you were to actually represent your ASCII character as a 32-bit word you would get:
[0, 0, 0, 6a], [0, 0, 0, 6f], ... or,
[6a, 0, 0, 0], [6f, 0, 0, 0], ...
for BE and LE respectively.
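If you want to see this on an actual machine, here is a short Python 3 sketch (purely illustrative, using the standard struct module) that shows both byte orders for 0xdeadbeef and for a single character widened to a 32-bit word:

import struct

# 0xdeadbeef packed as a 32-bit word in both byte orders
print(struct.pack('>I', 0xdeadbeef).hex())   # deadbeef (big-endian)
print(struct.pack('<I', 0xdeadbeef).hex())   # efbeadde (little-endian)

# a single ASCII character stored as a 32-bit word
print(struct.pack('>I', ord('j')).hex())     # 0000006a (big-endian)
print(struct.pack('<I', ord('j')).hex())     # 6a000000 (little-endian)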
I find this question quite confusing.
A byte is normally defined as the smallest addressable unit, so saying that a machine has byte-addressable memory tells you nothing: every machine has byte-addressable memory, because that is the definition of a byte. What can change is how many bits a byte has.
If the question is talking about a machine with 32-bit bytes (I know they exist, but I have personally used only machines with 8-bit and 16-bit bytes), then it's not clear what role endianness plays, given that no multibyte processing is needed to store ASCII.
What is often done on large-byte machines, however, is storing multiple characters per byte to save space (a 16-bit-byte machine is not necessarily "big": the one I know is a DSP with a very limited amount of memory), but this seems unrelated to the question, and there is no "standard" way to do it anyway.
If instead the question assumes that a byte is always 8 bits by definition and talks about storing ASCII chars, then once again endianness plays no role; chars are just stored in memory one after another in consecutive locations. For example, if the string "johnson" has been stored (assuming the C string convention), the content of memory would be:
0x6A 0x6F 0x68 0x6E 0x73 0x6F 0x6E 0x00
Reading this memory content as two 32-bit words would of course be affected by endianness, but saying that the machine uses big-endian and then asking to display the result in the little-endian scheme is nonsense.
In a big endian scheme (e.g. 68k) the two 32-bit words would be 0x6A6F686E and 0x736F6E00, in a little-endian scheme (e.g. x86) they would be 0x6E686F6A and 0x006E6F73.
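To see those two words come out of the same eight bytes, a small Python 3 sketch (illustrative only) unpacks the memory content both ways:

import struct

# the bytes of "johnson" plus a terminating NUL, as laid out in memory
mem = b'johnson\x00'

big = struct.unpack('>2I', mem)      # two big-endian 32-bit words
little = struct.unpack('<2I', mem)   # two little-endian 32-bit words

print(['0x%08X' % w for w in big])      # ['0x6A6F686E', '0x736F6E00']
print(['0x%08X' % w for w in little])   # ['0x6E686F6A', '0x006E6F73']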
Related
What is the difference between 65 and the letter A in binary, as both represent the same bit-level information?
Basically, a computer only understands numbers, and not every kind of number: it only understands numbers represented in binary, i.e., numbers that can be represented using only two different states (for example, 0 and 1, 0 V and 5 V, open and closed, true or false, etc.).
Unfortunately, we poor humans don't really like reading zeros and ones... So we have created codes to use numbers as if they were characters: one of them is called ASCII (American Standard Code for Information Interchange), but there are also others, such as Unicode. The principle is simple: all the program has to do is manipulate numbers, which any CPU does very well, but when it comes to displaying this data, the display represents it as real characters, such as 'A', '4', '#', or even a space or a newline.
Now, as long as you are using ASCII, the number 65 will represent the letter 'A'. It is all a question of representation: for example, the binary number 0b00001111, the hexadecimal 0x0F, the octal 017 and the decimal 15 all represent the same number. It's the same for the letter 'A': think of ASCII as a kind of base, but instead of being used like base 2 (binary), 8 (octal), 10 (decimal) or 16 (hexadecimal) to display numbers, it's used in a completely different manner.
To answer your question: ASCII 'A' is hexadecimal 0x41 is decimal 65 is octal 0101 is binary 0b01000001.
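You can check this mapping yourself in Python (just an illustration); ord and chr convert between the character and its code, and the built-in formatting functions show the other bases:

print(ord('A'))   # 65
print(chr(65))    # A
print(hex(65))    # 0x41
print(oct(65))    # 0o101
print(bin(65))    # 0b1000001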
Every character is represented by a number. The mapping between numbers and characters is called an encoding. Many encodings use the number 65 for the letter A. Since memory has no special cells for characters as opposed to numbers, both are represented the same way; only the interpretation by a given program differs.
I may be misunderstanding the question, and if so I apologise for getting it wrong.
But if I'm right, I believe you're asking what the difference is between a char and an int in the binary representation of the value 65, which is the ASCII decimal value for the letter A (in capital form).
First off, we need to appreciate data types, which reserve blocks of memory in the RAM modules.
An integer is usually 16 bits, or more if it is a float or a long (in C# this declaration is made by stating uint16, int16, int32, uint32, and so forth).
A character is an 8-bit memory block.
Therefore the binary would appear as follows:
A byte (8 bits) - char
Place values (decimal): 128, 64, 32, 16, 8, 4, 2, 1
Binary: 01000001
2 bytes (16 bits) - int16
Binary: 0000000001000001
It's all down to the size of the memory block reserved, based on the data type in the variable declaration.
I'd have done the decimal calculations for the 2-byte version, but I'm on the bus at the moment.
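For what it's worth, a couple of lines of Python (purely illustrative) reproduce the table above, padding the same value 65 to 8 and 16 bits and listing the place values:

value = ord('A')                 # 65
print(format(value, '08b'))      # 01000001         (1 byte, char-sized)
print(format(value, '016b'))     # 0000000001000001 (2 bytes, int16-sized)
print([2 ** i for i in range(7, -1, -1)])   # [128, 64, 32, 16, 8, 4, 2, 1]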
First of all, the difference can be in the size of the memory used (8 bits, 16 bits or 32 bits). See this question: bytes of a string in java.
Secondly, to store the letter 'A' you can have different encodings and different interpretations of memory. The ASCII character 'A' in C occupies exactly one byte (7 bits plus an unused sign bit) and has exactly the same binary value as the integer 65 stored in a char. But the bitwise interpretation of numbers and characters is not always the same. Just consider that you can also store signed values in 8 bits. See this question: what is an unsigned char.
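The signed/unsigned point is easy to demonstrate with Python's struct module (a small sketch, not tied to any particular C compiler): the same 8-bit pattern reads back differently depending on whether you ask for a signed or an unsigned char.

import struct

raw = bytes([0xC1])                  # one byte, bit pattern 11000001
print(struct.unpack('B', raw)[0])    # 193 (interpreted as unsigned char)
print(struct.unpack('b', raw)[0])    # -63 (interpreted as signed char)
print(struct.unpack('B', b'A')[0])   # 65  (the byte holding ASCII 'A')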
In Deflate algorithm there are two ways to encode a length of 258:
Code 284 + 5 extra bits of all 1's
Code 285 + 0 extra bits;
At first glance, this is not optimal, because proper use of code 285 would allow a length of 259 to be encoded.
Is this duality a specification mistake that was never fixed for compatibility reasons, or is there some argument for it; for example, must a length of 258 be encoded with the shorter code (0 extra bits) for some reason?
We may never know. The developer of the deflate format, Phil Katz, passed away many years ago at a young age.
My theory is that a match length was limited to 258 so that a match length in the range 3..258 could fit in a byte, encoded as 0..255. This format was developed around 1990, when this might make a difference in an assembler implementation.
Adding a second answer here to underscore Mark's guess that allowing the length to be encoded in a byte is helpful for assembler implementations. At the time, 8086-level assembler was still common, and using the 8-bit form of the registers gave you more of them to work with than using them at their 16-bit size.
The benefit is even more pronounced on 8 bit processors such as the 6502. It starts with the length decoding. Symbols 257 .. 264 represent a match length of 3 .. 10 respectively. If you take the low byte of those symbols (1 .. 8) you get exactly 2 less than the match length.
A more complicated yet fairly easy to compute formula gives 2 less than the match length of symbols 265 through 284. 2 less than the match length of symbol 285 is 256. That doesn't fit in a byte but we can store 0 which turns out to be equivalent.
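To make that concrete, here is a small Python check (the table is transcribed from RFC 1951; the loop itself is mine) confirming that every deflate match length from 3 to 258 maps to a "length minus 2" value that fits in one byte, with 258 wrapping around to 0:

# base match length and number of extra bits for symbols 257..285 (RFC 1951)
BASE  = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
         35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
EXTRA = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
         3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0]

for base, extra in zip(BASE, EXTRA):
    for bits in range(1 << extra):        # every possible extra-bits value
        length = base + bits              # actual match length, 3..258
        minus2 = (length - 2) & 0xFF      # what gets stored in a single byte
        assert (minus2 if minus2 else 256) == length - 2
print('all match lengths 3..258 fit in one byte as length minus 2')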
zlib6502 uses this for considerable advantage. It calculates the match length in inflateCodes_lengthMinus2. And once the back pointer into the window has been determined it copies the data like so:
        jsr copyByte
        jsr copyByte
inflateCodes_copyByte
        jsr copyByte
        dec inflateCodes_lengthMinus2
        bne inflateCodes_copyByte
It makes two explicit calls to copy a byte and then loops over the length less 2, which works as you would expect for stored values of 1 to 255. For a stored value of 0 it actually iterates 256 times, as we desire: the first time through the loop, the 0 is decremented to 255, which is non-zero, so the loop continues 255 more times, for a total of 256.
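As a sanity check of that wraparound argument, a few lines of Python (my own model, not the zlib6502 source) simulate the dec/bne loop with an 8-bit counter:

def bytes_copied(length_minus_2):
    copied = 2                          # the two explicit jsr copyByte calls
    counter = length_minus_2
    while True:
        copied += 1                     # the jsr copyByte inside the loop
        counter = (counter - 1) & 0xFF  # dec: 0 wraps around to 255
        if counter == 0:                # bne falls through only at zero
            break
    return copied

assert bytes_copied(1) == 3      # shortest match
assert bytes_copied(255) == 257
assert bytes_copied(0) == 258    # the wraparound gives 256 passes through the loop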
I'd have to think that Phil Katz understood intuitively if not explicitly the benefits of keeping the length of matches within 8 bits.
8 bits is called a "byte". What is 16 bits called? A "short"? A "word"?
And what about 32 bits? I know "int" is CPU-dependent; I'm interested in universally applicable names.
A byte is the smallest unit of data that a computer can work with. The C language defines char to be one "byte", and it has CHAR_BIT bits. On most systems this is 8 bits.
A word, on the other hand, is usually the size of the values typically handled by the CPU. Most of the time, this is the size of the general-purpose registers. The problem with this definition is that it doesn't age well.
For example, the MS Windows WORD datatype was defined back in the early days, when 16-bit CPUs were the norm. When 32-bit CPUs came around, the definition stayed, and a 32-bit integer became a DWORD. And now we have 64-bit QWORDs.
Far from "universal", but here are several different takes on the matter:
Windows:
BYTE - 8 bits, unsigned
WORD - 16 bits, unsigned
DWORD - 32 bits, unsigned
QWORD - 64 bits, unsigned
GDB:
Byte
Halfword (two bytes).
Word (four bytes).
Giant words (eight bytes).
<stdint.h>:
uint8_t - 8 bits, unsigned
uint16_t - 16 bits, unsigned
uint32_t - 32 bits, unsigned
uint64_t - 64 bits, unsigned
uintptr_t - pointer-sized integer, unsigned
(Signed types exist as well.)
If you're trying to write portable code that relies upon the size of a particular data type (e.g. you're implementing a network protocol), always use <stdint.h>.
The correct name for a group of exactly 8 bits is really an octet. A byte may have more than or fewer than 8 bits (although this is relatively rare).
Beyond this there are no rigorously well-defined terms for 16 bits, 32 bits, etc, as far as I know.
Dr. Werner Buchholz coined the word byte to mean, "a unit of digital information to describe an ordered group of bits, as the smallest amount of data that a computer could process." Therefore, the word's actual meaning is dependent on the machine in question's architecture. The number of bits in a byte is therefore arbitrary, and could be 8, 16, or even 32.
For a thorough dissertation on the subject, refer to Wikipedia.
There's no universal name for 16-bit or 32-bit units of measurement.
The term 'word' is used to describe the number of bits processed at a time by a program or operating system. So, in a 16-bit CPU, the word length is 16 bits; in a 32-bit CPU, it is 32 bits. I also believe the term is a little flexible, so if I write a program that does all its processing in chunks of, say, 10 bits, I could refer to those 10-bit chunks as 'words'.
And just to be clear: 'int' is not a unit of measurement for computer memory. It is just the data type used to store integer numbers (i.e., whole numbers with no fractional component). So if you find a way to implement integers using only 2 bits (or whatever) in your programming language, that would still be an int.
short, word and int are all dependent on the compiler and/or architecture.
int is a datatype and is usually 32-bit on desktop 32-bit or 64-bit systems. I don't think it's ever larger than the register size of the underlying hardware, so it should always be a fast (and usually large enough) datatype for common uses.
short may be smaller in size than int; that's all you know. In practice it's usually 16 bits, but you cannot depend on it.
word is not a datatype, it rather denotes the natural register size of the underlying hardware.
And regarding the names of 16 or 32 bits, there aren't any. There is no reason to label them.
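If you are curious what your own platform uses, Python's ctypes module will report the sizes of the C types it was built against (just a probe; the numbers will differ from machine to machine):

import ctypes

for name, ctype in [('short', ctypes.c_short), ('int', ctypes.c_int),
                    ('long', ctypes.c_long), ('void *', ctypes.c_void_p)]:
    print(name, ctypes.sizeof(ctype) * 8, 'bits')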
I used to hear them referred to as byte, word and long word. But as others mention, it is dependent on the native architecture you are working on.
They are called 2 bytes and 4 bytes
There aren't any universal terms for 16 and 32 bits. The size of a word is machine dependent.
I am trying to figure out how to read a historical binary data file. I believe it came from an older 32-bit Solaris system. I am looking at a section of the file that I believe contains 32-bit floating-point numbers (not IEEE floats). The format appears to be (as a hex dump):
xx41 xxxx
xx42 xxxx
The 41 and 42 in those positions appear consistently through the floating point numbers. I'm afraid that I do not have any additional information to add to this. So the first part of my question is, what format is this? If the first part can not be answered directly, a list of likely possibilities would be great. Lastly, how would you suggest going about determining what format this is? Thank you for your input.
Could this be PDP-11 format? The giveaway for me is that the second byte is mostly constant, which suggests that the exponent of the floating-point format is ending up in the second byte rather than the first (as you'd expect for a big-endian machine) or the last (for a little-endian machine). The PDP-11 is notorious for its funny byte order for floats and integers; see the material near the bottom of this Floating-Point Formats page.
The values of 41 and 42 would appear to be consistent with positive values of roughly unit-magnitude: the exponent bias for the PDP-11 format appears to be 128, so with the unusual byte-order I'd expect the 2nd byte that you list to contain the sign and the topmost 7 bits of the exponent; that would make the unbiased exponent for a second byte of 41 be either 2 or 3 depending on the 8th exponent bit (which should appear as the MSB of the first byte).
See also this page for a brief description of the PDP-11 format.
[EDIT] Here's some Python code to convert from a 4-byte string in the form you describe to a Python float, assuming that the 4-byte string represents a float in PDP-11 format.
import struct

def pdp_to_float(xs):
    """Convert a 4-byte PDP-11 single-precision float to a Python float."""
    ordered_bytes = ''.join(xs[i] for i in [1, 0, 3, 2])
    n = struct.unpack('>I', ordered_bytes)[0]
    fraction = n & 0x007fffff
    exponent = (n & 0x7f800000) >> 23
    sign = (n & 0x80000000) >> 31
    hidden = 1 if exponent != 0 else 0
    return (-1)**sign * 2**(exponent-128) * (hidden + fraction / 2.0**23)
Example:
>>> pdp_to_float('\x00\x00\x00\x00')
0.0
>>> pdp_to_float('\x23\x41\x01\x00')
5.093750476837158
>>> pdp_to_float('\x00\x42\x00\x00')
16.0
The data described is consistent with the usual IEEE 754 format, stored in big-endian order, then displayed by a little-endian dump program two bytes at a time.
32-bit floats in the interval [8, 128) have a first byte of 0x41 or 0x42. Consider such a number, perhaps 0x41010203. Stored big end first, it would appear in memory as the four bytes 0x41, 0x01, 0x02, and 0x03. When the dump program reads 16-bit integers, little end first, it will read and display 0x0141 and 0x0302.
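This scenario is easy to reproduce in Python (a rough sketch, not proof of the file's actual format): pack an IEEE 754 single big-endian and then read the same four bytes back as little-endian 16-bit words, the way such a dump program would.

import struct

raw = struct.pack('>f', 8.5)            # big-endian IEEE 754 single: 41 08 00 00
words = struct.unpack('<2H', raw)       # reread as two little-endian 16-bit words
print(' '.join('%04x' % w for w in words))   # 0841 0000 -- the xx41 xxxx pattern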
My instructor stated that "an octal byte consists of 6 bits". I am having difficulty understanding why this is, as an octal digit consists of 3 binary bits. I also do not understand the significance of an octal byte being defined as '6 bits' as opposed to some other number.
Can anyone explain why this is, if it is in fact true, or point me to a useful explanation?
This is all speculation and guesswork, since none of this is in any way standard terminology.
An 8-bit byte can be written as two hexadecimal digits, because each digit expresses 4 bits. The largest such byte value is 0xFF.
By analogy, two octal digits can express 2 × 3 = 6 bits. The largest such value is 077. So if you like, you can call a pair of octal digits an "octal byte", but only if you will also call an 8-bit byte a "hexadecimal byte".
In my personal opinion, neither notion is helpful or useful, and you'd be best off just saying how many bits your byte has.
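A quick check of that arithmetic in Python (illustration only):

print(0o77)         # 63, the largest value of two octal digits
print(2 ** 6 - 1)   # 63, the largest value of six bits
print(0xFF)         # 255, the largest value of two hexadecimal digits
print(2 ** 8 - 1)   # 255, the largest value of eight bits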
Like @Kerrek SB, I'd have to guess the same.
Tell him an octal byte is two octal nibbles, that should sort him out.
Two hexadecimal digits make an 8-bit byte, so each group of four bits was called a nibble.
As Wikipedia puts it: the byte is a unit of digital information that most commonly consists of 8 bits. The bit is a contraction of binary digit; it represents a logical state with one of two possible values, most commonly written as "1" or "0".
So a byte is a bunch of bits, specifically 8 of them.
Note that this definition says nothing about decimal, octal, hexadecimal or any other notation for integer numbers. Indeed, a byte and an integer number are not the same thing. So where does the connection come from?
If we write out the bits of a byte, say 01001010, we can map this object one-to-one onto the binary notation of an integer number, 01001010:
(byte) 01001010 --> (binary integer) 01001010
The two objects look the same, but this is really just a mapping. From here on we work with the integer number instead of the abstract "byte" object, and an integer number can be written in different notations: decimal, binary, hexadecimal, octal, and so on.
A notation such as octal (or hex, or decimal) is just a way of writing down an integer number; it does not influence the nature of the original byte object.
A byte has 8 bits, whatever notation is used.
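That the notation does not change the stored bits can be seen in a couple of Python lines (illustrative):

b = 0b01001010                     # the example bit pattern from above
print(bin(b), oct(b), hex(b), b)   # 0b1001010 0o112 0x4a 74
print(format(b, '08b'))            # 01001010 -- still the same 8 bits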
ISO/IEC 2382-1:1993 says:
1. The number of bits in a byte is usually 8.
2. An octet is a byte that consists of eight bits.