The name of 16 and 32 bits - terminology

8 bits is called a "byte". What is 16 bits called? "Short"? "Word"?
And what about 32 bits? I know "int" is CPU-dependent; I'm interested in universally applicable names.

A byte is the smallest unit of data that a computer can work with. The C language defines char to be one "byte", which has CHAR_BIT bits. On most systems this is 8 bits.
A word, on the other hand, is usually the size of values typically handled by the CPU. Most of the time, this is the size of the general-purpose registers. The problem with this definition is that it doesn't age well.
For example, the MS Windows WORD datatype was defined back in the early days, when 16-bit CPUs were the norm. When 32-bit CPUs came around, the definition stayed, and a 32-bit integer became a DWORD. And now we have 64-bit QWORDs.
Far from "universal", but here are several different takes on the matter:
Windows:
BYTE - 8 bits, unsigned
WORD - 16 bits, unsigned
DWORD - 32 bits, unsigned
QWORD - 64 bits, unsigned
GDB:
Byte.
Halfword (two bytes).
Word (four bytes).
Giant words (eight bytes).
<stdint.h>:
uint8_t - 8 bits, unsigned
uint16_t - 16 bits, unsigned
uint32_t - 32 bits, unsigned
uint64_t - 64 bits, unsigned
uintptr_t - pointer-sized integer, unsigned
(Signed types exist as well.)
If you're trying to write portable code that relies upon the size of a particular data type (e.g. you're implementing a network protocol), always use <stdint.h>.
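For instance, a minimal sketch of what that looks like in practice (the struct and field names here are made up for illustration):

#include <stdint.h>

/* A hypothetical network packet header: every field has an exact,
   platform-independent width, so the layout is the same on every machine. */
struct packet_header {
    uint8_t  version;   /* exactly 8 bits */
    uint8_t  flags;     /* exactly 8 bits */
    uint16_t length;    /* exactly 16 bits */
    uint32_t sequence;  /* exactly 32 bits */
};

Note that <stdint.h> fixes only the widths; byte order and struct padding are separate concerns a real protocol implementation still has to handle.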

The correct name for a group of exactly 8 bits is really an octet. A byte may have more or fewer than 8 bits (although this is relatively rare).
Beyond this there are no rigorously well-defined terms for 16 bits, 32 bits, etc, as far as I know.

Dr. Werner Buchholz coined the word byte to mean "a unit of digital information to describe an ordered group of bits, as the smallest amount of data that a computer could process." Therefore, the word's actual meaning depends on the architecture of the machine in question. The number of bits in a byte is therefore arbitrary, and could be 8, 16, or even 32.
For a thorough dissertation on the subject, refer to Wikipedia.

There's no universal name for 16-bit or 32-bit units of measurement.
The term 'word' is used to describe the number of bits processed at a time by a program or operating system. So, in a 16-bit CPU, the word length is 16 bits. In a 32-bit CPU, the word length is 32 bits. I also believe the term is a little flexible, so if I write a program that does all its processing in chunks of say, 10 bits, I could refer to those 10-bit chunks as 'words'.
And just to be clear: 'int' is not a unit of measurement for computer memory. It really is just the data type used to store integer numbers (i.e. numbers with no fractional component). So if you find a way to implement integers using only 2 bits (or whatever) in your programming language, that would still be an int.

short, word and int are all dependent on the compiler and/or architecture.
int is a datatype and is usually 32 bits on 32-bit and 64-bit desktop systems. I don't think it's ever larger than the register size of the underlying hardware, so it should always be a fast (and usually large enough) datatype for common uses.
short may be smaller than int; that's all you know. In practice, shorts are usually 16 bits, but you cannot depend on it.
word is not a datatype, it rather denotes the natural register size of the underlying hardware.
And regarding the names of 16 or 32 bits, there aren't any. There is no reason to label them.

I used to hear them referred to as byte, word and long word. But as others mention, it is dependent on the native architecture you are working on.

They are called 2 bytes and 4 bytes.

There aren't any universal terms for 16 and 32 bits. The size of a word is machine dependent.

Related

What does "extend immediate to 32 bits" mean in MIPS?

I'm reading about the Instruction Decode (ID) phase in the MIPS datapath, and I've got the following quote: "Once operands are known, read the actual data (from registers) or extend the data to 32 bits (immediates)."
Can someone explain what the "extend the data to 32 bits (immediates)" part means? I know that registers all contain 32 bits, and I know what an immediate is. I just don't understand why you need to extend the immediate from 26 to 32 bits.
Thanks!
26-bit immediates are only in jump instructions, and aren't sign- or zero-extended to 32 bits, because they're not displacements to be added/subtracted.
I-type instructions with 16-bit immediates are different.
addi / addiu immediates are sign-extended (by duplicating the top/sign bit of the immediate to all higher bits).
https://en.wikipedia.org/wiki/Two%27s_complement#Sign_extension
This allows 2's complement numbers from -2^15 .. +2^15-1 to be encoded.
(0xFFFF8000 to 0x00007FFF)
ori/andi/xori boolean immediates are zero-extended (by setting all higher bits to zero)
This allows unsigned / 2's complement numbers from 0 .. 2^16-1 to be encoded.
(0x00000000 to 0x0000FFFF)
For other instructions, see this instruction-set reference, which breaks down each instruction showing 0^16 || [I15..0] for zero-extension or [I15]^16 || [I15..0] for sign-extension.
This makes it possible to use 16-bit immediates as inputs to a 32-bit binary operation that only makes sense with 2 equal-width inputs. (In a simple classic MIPS pipeline, the decode stage fetches operands from registers and/or immediates. Register inputs are always going to be 32-bit, so the ALU is wired up for 32-bit inputs. Extending immediates to 32-bit means the rest of the CPU doesn't have to care whether the data came from an immediate or a register.)
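To make the two extensions concrete, here is a minimal C sketch of the arithmetic (this just models the operation, it is not MIPS code, and it assumes the usual two's complement behaviour of the int16_t conversion):

#include <stdint.h>

/* Sign-extend a 16-bit immediate to 32 bits: bit 15 is copied into bits 31..16.
   E.g. 0x8000 becomes 0xFFFF8000. */
int32_t sign_extend16(uint16_t imm) {
    return (int32_t)(int16_t)imm;  /* the int16_t cast makes bit 15 the sign bit */
}

/* Zero-extend a 16-bit immediate to 32 bits: bits 31..16 are set to 0.
   E.g. 0x8000 becomes 0x00008000. */
uint32_t zero_extend16(uint16_t imm) {
    return (uint32_t)imm;          /* widening an unsigned value pads with zeros */
}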
Also sign-extended:
offsets in the reg+imm16 addressing mode used by lw/sw and other load/store instructions
relative branches (PC += imm16<<2)
maybe others; check the manual for instructions I didn't mention to see if they sign- or zero-extend.
You might be wondering "why does addiu sign-extend its immediate even though it's unsigned?"
Remember that there's no subiu, only addiu with a negative immediate. Being able to add or subtract numbers in the range -2^15 .. +2^15-1 is more useful than only being able to add 0 .. 2^16-1.
And usually you don't want to raise an exception on signed overflow, so normally compilers use addu / addiu even on signed integers. addu is badly named: it's not "for unsigned integers", it's just a wrapping-allowed / never-faulting version of add / addi. It sort of makes sense if you think of C, where signed overflow is undefined behaviour (and thus a compiler could use add and raise an exception in that case if it wanted to implement it that way), but unsigned integers have well-defined overflow behaviour: base 2 wraparound.
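To tie that to C, a minimal sketch of the difference (unsigned wraparound is defined by the language; signed overflow is not):

#include <stdio.h>
#include <limits.h>

int main(void) {
    unsigned int u = UINT_MAX;
    u = u + 1;            /* well-defined: wraps around to 0 (base 2 wraparound) */
    printf("%u\n", u);    /* prints 0 */

    /* int s = INT_MAX; s = s + 1;  -- undefined behaviour in C; a compiler
       targeting MIPS could legally use add here and trap on overflow. */
    return 0;
}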
On a 32-bit CPU, most of the operations you do (like adding, subtracting, dereferencing a pointer) are done with 32-bit numbers. When you have a number with fewer bits, you need to somehow decide what those other bits are going to be when you want to use that number in one of those operations. The act of deciding what those new high bits are is called "extending".
Assuming you are just doing a standard zero extension or sign extension, extending is very cheap. However, it does require some circuitry, so it makes sense that a description of the MIPS datapath would mention it.

Number of bits required to store unsigned Int

I need to know the correct solution for the minimum bits required to store an unsigned int. Say I have 403; its binary representation as an unsigned int will be 00000000000000000000000110010011, which adds up to 32 bits. Now, I know that an unsigned integer takes 32 bits to store. But why do we have all those zeros in front when the number can be expressed with only 9 bits, 110010011? Moreover, how come an unsigned int takes 32 bits to store while a decimal digit takes only 8 bits?
Please explain in detail. Thanks
This has nothing to do with how many bits are needed, and everything to do with how many bits your computer is wired for (32). Though 9 bits are enough, your computer has data channels that are 32 bits wide - it's physically wired to be efficient for 32, 64, 128, etc. And your compiler presumably chose 32 bits for you.
The decimal representation of "403" is three digits, and to represent each digit in binary requires at least four bits (2^4 is 16, so you have 6 spare codes); so the minimum "decimal" representation of "403" requires 12 bits, not eight.
However, to represent a normal character (including the decimal digits as well as alpha, punctuation, etc) it's common to use 8 bits, which allows up to 2^8 or 256 possible characters. Represented this way, it takes 3x8 or 24 binary bits to represent 403.
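If you want to compute the minimum number of bits a value actually needs (as opposed to the fixed width the compiler allocates), a simple sketch in C:

#include <stdio.h>

/* Count the bits needed to represent v: shift right until nothing is left.
   403 (binary 110010011) needs 9 bits, even though an int stores it in 32. */
unsigned bits_needed(unsigned v) {
    unsigned n = 0;
    while (v != 0) { v >>= 1; n++; }
    return n;
}

int main(void) {
    printf("%u\n", bits_needed(403)); /* prints 9 */
    return 0;
}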

How to know if it is either a positive/negative number or it is referring to a number in binary?

I'm learning integer data formats in a computer science book, and as far as I understand, the binary representation of an integer, whether it is positive or negative, has the leftmost bit (MSB) be either 0 for positive or 1 for negative. Let's say on an 8-bit computer: how would I know if 10000010 is talking about 130 in base 10 or if it is referring to negative 2?
I might be wrong; if I am, please correct me.
If you were to just see the string 10000010 somewhere, I don't know... written on a wall or something, how would you know how to interpret it?
You might say, hey, that's ten million and ten (you thought it was base 10), or you might say, hey, that's -126 (you thought it was two's complement binary), or you might say that's positive 130 (you thought it was standard binary).
It is, in a theoretical sense, up to whatever is doing the interpreting how it is interpreted.
So, when a computer is holding 8 bits of data, it's up to the computer how it interprets them.
Now if you're programming, you can tell the computer how you want something interpreted. For example, in C++:
// char is 1 byte
unsigned char x = 130u;
Here I have told the compiler to put 130 unsigned into a byte, so the computer will store 10000010 and will interpret it as the value 130.
Now consider
// char is 1 byte
char x = -126;
Here I have told the compiler to put -126 signed into a byte, so the computer will again store 10000010 but this time it will interpret it as the value -126.
Take a look at the answer posted to this question: Negative numbers are stored as 2's complement in memory, how does the CPU know if it's negative or positive?
The CPU uses something called an opcode in order to determine which function it will perform when manipulating a memory location (in this case, the value 10000010). It is that function within the CPU that will manipulate it as either a negative or a positive number. The CPU doesn't have access to whether the number is signed or unsigned - it uses the opcode when manipulating that number to determine whether it should be a signed or unsigned operation.
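A small C sketch of the same idea: the stored bit pattern never changes, only the interpretation the operation applies to it (this assumes an 8-bit char and the usual two's complement conversion behaviour):

#include <stdio.h>

int main(void) {
    unsigned char u = 0x82;               /* 10000010 read as unsigned: 130 */
    signed char   s = (signed char)0x82;  /* same bits read as signed: -126 */

    printf("%u\n", (unsigned)u);  /* prints 130 */
    printf("%d\n", (int)s);       /* prints -126 */
    return 0;
}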

negative integers in binary

5 (decimal) in binary 00000101
-5 (two's complement) in binary 11111011
but 11111011 is also 251 (decimal)!
How does the computer discern one from the other??
How does it know whether it's -5 or 251??
it's THE SAME 11111011
Thanks in advance!!
Signed bytes have a maximum of 127.
Unsigned bytes cannot be negative.
The compiler knows whether the variable holding that value is of signed or unsigned type, and treats it appropriately.
If your program chooses to treat the byte as signed, the run-time system decides whether the byte is to be considered positive or negative according to the high-order bit. A 1 in that high-order bit (bit 7, counting from the low-order bit 0) means the number is negative; a 0 in that bit position means the number is positive. So, in the case of 11111011, bit 7 is set to 1 and the number is treated, accordingly, as negative.
Because the sign bit takes up one bit position, the absolute magnitude of the number can range from 0 to 127, as was said before.
If your program chooses to treat the byte as unsigned, on the other hand, what would have been the sign bit is included in the magnitude, which can then range from 0 to 255.
Two's complement is designed to allow signed numbers to be added to/subtracted from one another in the same way unsigned numbers are. So there are only two cases where the signedness of numbers affects the computer at a low level:
when there are overflows
when you are performing operations on mixed operands: one signed, one unsigned
Different processors take different tacks for this. With regard to overflows, the MIPS RISC architecture, for example, deals with overflows using traps. See http://en.wikipedia.org/wiki/MIPS_architecture#MIPS_I_instruction_formats
To the best of my knowledge, mixing signed and unsigned needs to be avoided at the program level.
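A quick C sketch of why the hardware can use one addition circuit for both (assuming two's complement, which every mainstream CPU uses): the same 8-bit add gives the correct answer under both interpretations.

#include <stdio.h>

int main(void) {
    unsigned char a = 0xFB;     /* 251 unsigned, or -5 as two's complement */
    unsigned char b = 5;
    unsigned char sum = a + b;  /* 0xFB + 0x05 = 0x100, truncated to 0x00 */

    printf("%u\n", (unsigned)sum);     /* 0: correct for 251 + 5 mod 256 */
    printf("%d\n", (signed char)sum);  /* 0: correct for -5 + 5 */
    return 0;
}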
If you're asking "how does the program know how to interpret the value" - in general it's because you've told the compiler the "type" of the variable you assigned the value to. The program doesn't actually care that 00000101 is "5 decimal"; it just has an unsigned integer with value 00000101 on which it can perform the operations legal for unsigned integers, and it will behave in a given manner if you try to compare with or cast to a different "type" of variable.
At the end of the day everything in programming comes down to binary - all data (strings, numbers, images, sounds etc etc) and the compiled code just ends up as a large binary blob.

What is the name for a number which is the same in both big- and little-endian byte order?

One of my old classmates just asked me this and I'm at a total loss. Google gave me a lot of definitions of endianness, but not the term for this special case. Is there even a term for this?
See palindrome. Consider, for example, a 32-bit integer as a sequence of four byte values when stored in memory. If the sequence of four bytes is a palindrome, then it has the same integer value in both big- and little-endian. So,
all 8-bit integers are palindromes,
all 16-bit integers of the form AA (where A is a byte) are palindromes,
all 32-bit integers of the form AAAA or ABBA (where A and B are bytes) are palindromes,
and so on. Historically, there have been architectures with mixed endianness (notably the VAX), but here I'm limiting myself to pure big- or little-endian representations.
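As a sketch, here's one way to test that property in C (the function name is mine): a value reads the same in both byte orders exactly when its bytes form a palindrome, so the check doesn't depend on the host's own endianness.

#include <stdint.h>
#include <string.h>

/* Returns 1 if v has the same value under big- and little-endian storage,
   i.e. its four bytes form a palindrome (the AAAA and ABBA patterns above). */
int is_endian_palindrome(uint32_t v) {
    unsigned char b[4];
    memcpy(b, &v, sizeof b);
    return b[0] == b[3] && b[1] == b[2];
}

For example, it returns 1 for 0 and 0x12ABAB12, but 0 for 0x12345678.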
how about zero :-)
The answer is, "A byte".
Palindromic, even though that term is usually used for numbers whose digits are the same both ways.