How to properly pad binary numbers (with and without radix)

So I've just got a quick question on padding with 0's. The examples I made are below; I've assumed that length doesn't matter and there is no sign bit.
a) 1000 = 00000001000
b) 110.101 = 110.00000101
c) 110.101 = 110.10100000
I know that padding the integer part of a binary number in front is fine (notwithstanding buffers and stuff like that). It's more when I deal with radix points that I can never remember how it works. Do you pad beside the radix point (b) or at the end of the digits (c)? Also, does padding like this affect the complements of these numbers?
Thanks
EDIT:
The reason I'm asking is that I'm converting from binary to hex and from binary to octal. To do this I have to split the bits into groups of 4 or 3. I don't have enough bits, so I need to add zeros somewhere.

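Base-2 (radix-2) numbers work the same way as base-10 numbers (according to Wikipedia): zeros added to the left of the integer part or to the right of the fractional part don't change the value.
Thus, parts (a) and (c) are correct, but part (b) is not: inserting zeros right after the binary point moves the fraction bits to smaller place values.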
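Applied to the hex/octal conversion from the edit, the rule looks like this. It is a minimal Python sketch, assuming an unsigned bit string with at most one binary point (`group_binary` is just an illustrative name): zeros go on the left of the integer part and on the right of the fraction part, so the value never changes.

```python
HEX_DIGITS = "0123456789ABCDEF"

def group_binary(bits: str, group: int) -> str:
    """Convert an unsigned binary string such as '110.101' to base 2**group
    (group=3 -> octal, group=4 -> hex) by zero-padding, then grouping."""
    int_part, _, frac_part = bits.partition(".")
    int_part = int_part or "0"
    # Integer part: pad on the LEFT up to a multiple of `group` (example a).
    width = -(-len(int_part) // group) * group
    int_part = int_part.zfill(width)
    # Fraction part: pad on the RIGHT up to a multiple of `group` (example c).
    width = -(-len(frac_part) // group) * group
    frac_part = frac_part.ljust(width, "0")

    def to_digits(s: str) -> str:
        # Each chunk of `group` bits becomes one digit.
        return "".join(HEX_DIGITS[int(s[i:i + group], 2)]
                       for i in range(0, len(s), group))

    result = to_digits(int_part)
    if frac_part:
        result += "." + to_digits(frac_part)
    return result

print(group_binary("110.101", 4))  # 6.A (hex)
print(group_binary("110.101", 3))  # 6.5 (octal)
```

Both results equal 6.625. Feeding in the (b) form, `group_binary("110.00000101", 4)`, gives 6.05 in hex, which is a different number.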
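When finding the complement, the bits to the right of the binary point follow the same algorithm as the bits to the left (invert every bit, then add 1 at the rightmost position), and the fractional bits keep their positive weights, the first one after the point being worth $2^{-1}$. Padding with zeros on the right therefore does not change the value of the complement.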
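To check the complement question numerically, here is a small Python sketch. It assumes the whole bit string is treated as one fixed-point number whose complement is formed by inverting every bit and adding 1 at the rightmost position (`fixed_value` and `twos_complement` are hypothetical helper names).

```python
from fractions import Fraction

def fixed_value(bits: str) -> Fraction:
    """Unsigned value of a bit string with an optional binary point."""
    int_part, _, frac_part = bits.partition(".")
    return Fraction(int(int_part + frac_part, 2), 2 ** len(frac_part))

def twos_complement(bits: str) -> str:
    """Invert every bit and add 1 at the rightmost position,
    keeping the binary point where it was."""
    int_part, _, frac_part = bits.partition(".")
    raw = int_part + frac_part
    n = len(raw)
    # -x masked to n bits is the same as (~x + 1) truncated to n bits.
    comp = (-int(raw, 2)) & ((1 << n) - 1)
    out = format(comp, f"0{n}b")
    if not frac_part:
        return out
    return out[:len(int_part)] + "." + out[len(int_part):]

for s in ("110.101", "110.10100", "110.00000101"):
    c = twos_complement(s)
    print(s, fixed_value(s), c, fixed_value(c))
```

The first two rows, 110.101 and its right-padded form (c), have the same value (53/8) and complements with the same value (11/8). The (b) form is already a different number before any complement is taken.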


Why does the addition of 8-bit signed integers 0xA3 + 0xF9 NOT produce overflow, but the addition of signed integers 0x9F + 0xA3 does?

I presented the hexadecimal values for these integers in the title just to save space; however, I am actually trying to add them in binary.
The overflow conditions for signed binary addition (from what I could understand) are as follows:
Suppose we have 2 8-bit signed integers x and y.
If we are adding x and y, and they are both positive, but the result is negative, then there is overflow.
Likewise, if we are adding x and y, and they are both negative, but the result is positive, then there is overflow.
Lastly, if the carry into the most significant bit (MSB) is not equal to the carry out of the MSB, then there is overflow (I think this is the condition I have the most trouble understanding, but that will be clearer after you've seen how I worked out both of these problems).
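The conditions above can be checked mechanically. Below is a short Python sketch of 8-bit addition as an adder performs it, applied to the two examples from the title (`add8` and `to_signed` are hypothetical helpers). It computes both overflow rules and checks that they agree; the carry-out (the extra 9th bit) is reported separately, because on its own it does not signal signed overflow.

```python
def to_signed(b: int) -> int:
    """Interpret an 8-bit pattern (0..255) as a two's complement value."""
    return b - 256 if b & 0x80 else b

def add8(x: int, y: int):
    """Add two 8-bit patterns; return (8-bit sum, carry-out, signed overflow)."""
    total = x + y
    result = total & 0xFF
    carry_out = total >> 8                          # the extra 9th bit
    # Carry INTO the MSB: add only the low 7 bits and see if bit 7 is produced.
    carry_in_msb = ((x & 0x7F) + (y & 0x7F)) >> 7
    overflow_carry_rule = carry_in_msb != carry_out
    # Sign rule: both operands have the same sign, the result has the other.
    sx, sy, sr = x >> 7, y >> 7, result >> 7
    overflow_sign_rule = sx == sy and sr != sx
    assert overflow_carry_rule == overflow_sign_rule  # the two rules always agree
    return result, carry_out, overflow_sign_rule

for x, y in [(0xA3, 0xF9), (0x9F, 0xA3)]:
    r, c, v = add8(x, y)
    print(f"{x:08b} + {y:08b} = {r:08b}  "
          f"({to_signed(x)} + {to_signed(y)} -> {to_signed(r)}), "
          f"carry-out={c}, overflow={v}")
```

Both sums produce a carry-out of 1, but only the second overflows: -93 + -7 = -100 fits in a signed byte, while -97 + -93 = -190 does not, so the result wraps to +66. The same adder circuit serves signed and unsigned numbers alike, which is one reason two's complement is used.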
Now for the actual addition, I have attached my work:
Firstly, am I even doing signed addition correctly? If so, then what is the point of 2's complement if I am just going to add these signed integers without even using it?
Secondly, in the first problem and the second, there is an extra 9th bit, however overflow only occurs in the second problem. What gives?
I thought I understood the concept of signed binary, however now that I am actually performing computations with 8-bit binary numbers, I have run into lots of confusion, especially where overflow is thrown into the mix. If you need any clarification or if I am breaking any rules, please let me know so I can edit my post accordingly.

Representing numbers with IEEE754 with Round to Nearest Even

I'm currently learning about IEEE754 standard and rounding, and I have an exercise which is the following:
Add -325.875 to 0.546875 in IEEE754, but with 3 bits dedicated to the mantissa instead of 23.
I'm having a lot of trouble doing this, especially representing the intermediate values and the guard/round/sticky bits. Can someone give me a step-by-step solution to the problem?
My biggest problem is that obviously I can't represent 0.546875 as 0.100011 as that would have more precision than the system has. So how would that be represented?
Apologies if the wording is confusing.
Preliminaries
The preferred term for the fraction portion of a floating-point number is “significand,” not “mantissa.” “Mantissa” is an old word for the fraction portion of a logarithm. Mantissas are logarithmic; adding to the mantissa multiplies the number represented. Significands are linear; adding to the significand adds to the number represented (as scaled by the exponent).
When working with a significand, use its mathematical precision, not the number of bits in the storage format. The IEEE-754 binary32 format has 23 bits in its primary field for the encoding of a significand, but another bit is encoded via the exponent field. Mathematically, numbers in the binary32 format behave as if they have 24 bits in their significands.
So, the task is to work with numbers with four bits in their significands, not three.
Work
In binary, −325.875 is $-101000101.111_2 \cdot 2^0$. In scientific notation, that is $-1.01000101111_2 \cdot 2^8$. Rounding it to four bits in the significand gives $-1.010_2 \cdot 2^8$.
In binary, 0.546875 is $.100011_2$. In scientific notation, that is $1.00011_2 \cdot 2^{-1}$. Rounding it to four bits in the significand gives $1.001_2 \cdot 2^{-1}$. Note that the first four bits are 1000, but they are immediately followed by 11, so we round up. 1.00011 is closer to 1.001 than it is to 1.000.
So, in a floating-point format with four-bit significands, we want to add $-1.010_2 \cdot 2^8$ and $1.001_2 \cdot 2^{-1}$. If we adjust the latter number to have the same exponent as the former, we have $-1.010_2 \cdot 2^8$ and $0.000000001001_2 \cdot 2^8$. To add those, we note the signs are different, so we want to subtract the magnitudes. It may help to line up the digits as we were taught in elementary school:
1.010000000000
0.000000001001
——————————————
1.001111110111
Thus, the mathematical result would be $-1.001111110111_2 \cdot 2^8$. However, we need to round the significand to four bits. The first four bits are 1001, but they are followed by 11, so we round up, producing 1010. So the final result is $-1.010_2 \cdot 2^8$.
$-1.010_2 \cdot 2^8$ is $-1.25 \cdot 2^8 = -320$.
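The steps above can be reproduced with exact rational arithmetic. This is a minimal Python sketch, assuming a 4-bit significand (3 stored bits plus the hidden bit) and round-half-to-even, and ignoring exponent range, subnormals, and overflow (`round_sig` is a hypothetical helper). It rounds each operand, adds them exactly, and rounds once more; guard, round, and sticky bits are how hardware reaches the same correctly rounded answer without keeping every bit of the exact sum.

```python
from fractions import Fraction

def round_sig(x: Fraction, p: int) -> Fraction:
    """Round x to p significant bits with round-half-to-even.
    Models only the significand; exponent limits are ignored."""
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    m = abs(x)
    # Find e such that 2**e <= m < 2**(e + 1).
    e = m.numerator.bit_length() - m.denominator.bit_length()
    if Fraction(2) ** e > m:
        e -= 1
    # Scale so the p significant bits sit left of the binary point.
    scale = Fraction(2) ** (p - 1 - e)
    # round() on a Fraction rounds exact halves to the even integer.
    return sign * Fraction(round(m * scale)) / scale

P = 4                               # 3 stored bits + 1 hidden bit
a = round_sig(Fraction("-325.875"), P)
b = round_sig(Fraction("0.546875"), P)
exact_sum = a + b                   # exact, as if the adder had unlimited bits
result = round_sig(exact_sum, P)
print(a, b, exact_sum, result)      # -320 9/16 -5111/16 -320
```

Here -320 is $-1.010_2 \cdot 2^8$, 9/16 is $1.001_2 \cdot 2^{-1}$, and -5111/16 is $-1.001111110111_2 \cdot 2^8$, matching the hand calculation.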

Binary numbers addition

I have just started doing some binary number exercises to prepare for a class that I will start next month. I got the hang of all the conversions from decimal to binary and vice versa, but with the two letters 'a' and 'b' in this exercise, I am not sure how to apply that knowledge to add the bits:
Given two binary numbers $a = (a_7 a_6 \ldots a_0)$ and $b = (b_7 b_6 \ldots b_0)$. There is a calculator that can add 4-bit binary numbers. How many bits will be used to represent the result of a 4-bit addition? Why?
We would like to use our calculator to calculate a + b. For this we can put as many as eight bits (4 bits of the first and 4 bits of the second number) of our choice into the calculator and continue to use the result bit by bit.
At most, how many additions does our calculator have to carry out to add a and b? What is the maximum length of the result in bits?
What is the minimum number of additions the calculator has to perform for the result to be correct for all possible inputs a and b?
The number of bits needed to represent the result of a 4-bit binary addition is 5, because a carry out of the top bit can push the result to 5 bits.
For example, 1111 + 0010 = 10001.
This can be done the same way as adding decimal numbers. From right to left, add the bits of the same significance. If the two bits are 1 + 1, the result is 10, so that place becomes 0 and the 1 carries over to the next pair of bits, just like decimal addition.
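The right-to-left procedure just described can be written out directly. A short Python sketch operating on bit strings (`add_binary` is an illustrative name):

```python
def add_binary(x: str, y: str) -> str:
    """Add two binary strings column by column, right to left, with carries."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)
    carry = 0
    out = []
    for bx, by in zip(reversed(x), reversed(y)):
        s = int(bx) + int(by) + carry   # 0, 1, 2 or 3
        out.append(str(s % 2))          # bit that stays in this column
        carry = s // 2                  # bit carried to the next column
    if carry:
        out.append("1")                 # final carry becomes the extra (5th) bit
    return "".join(reversed(out))

print(add_binary("1111", "0010"))  # 10001
```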
With regard to the min/max number of steps, this seems more like an algorithm-specific question. Look up some different binary addition algorithms, like ripple-carry for instance, and it should give you a better idea of what the question means.
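One way to picture the exercise is to model the 4-bit calculator as a function and build an 8-bit addition out of it nibble by nibble, ripple-carry style. The Python sketch below assumes the calculator takes two 4-bit inputs and returns a 5-bit sum (`add4` and `add8_with_add4` are hypothetical names). It is one possible scheme, not necessarily the one the course expects, and it counts how many times it used the calculator.

```python
def add4(x: int, y: int) -> int:
    """Hypothetical model of the exercise's calculator: adds two 4-bit numbers.
    The result fits in 5 bits, since 15 + 15 = 30 = 11110."""
    if not (0 <= x <= 15 and 0 <= y <= 15):
        raise ValueError("calculator only accepts 4-bit inputs")
    return x + y

def add8_with_add4(a: int, b: int) -> tuple[int, int]:
    """Add two 8-bit numbers using only add4, ripple-carry style.
    Returns (sum, number of calculator uses)."""
    uses = 0
    low = add4(a & 0xF, b & 0xF)      # low nibbles a3..a0 + b3..b0
    uses += 1
    carry = low >> 4                  # 5th bit of the low sum
    high = add4(a >> 4, b >> 4)       # high nibbles a7..a4 + b7..b4
    uses += 1
    if carry:
        # Add the carry to the low 4 bits of the high sum; keep its own 5th bit.
        high = (high & 0x10) + add4(high & 0xF, carry)
        uses += 1
    return (high << 4) | (low & 0xF), uses

print(add8_with_add4(0x12, 0x34))  # (70, 2): no carry between the nibbles
print(add8_with_add4(0xFF, 0xFF))  # (510, 3): 510 needs 9 bits
```

In this scheme the third use is needed only when the low nibbles produce a carry.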

Huffman Coding: handling negative ambiguity with zero

I've written a simple text file compressor that uses Huffman coding. I encode the text and write the binary resulting from Huffman to a file. To decode, I read in the binary and step through the Huffman tree.
That part is straightforward. The problem arises with 0 and negative numbers. For practice/fun/learning, I decided to do my own binary conversion methods (from a Java byte to a string and vice-versa) and I decided to represent negative numbers by flipping the last bit to a 1.
E.g., -2 = 00000101; 2 = 00000100 (the extra 0's are padding, since even the unnecessary 0's matter in Huffman... it's irrelevant, though).
However, 0 can be written both as 00000000 and as 00000001.
This may not seem like a problem, but those two binary strings map to two different characters in the Huffman tree.
Is there a better way to handle negatives in binary that gets around this?
I'm not sure this will help you, but I will try:
First of all, there are different kinds of binary representation. Pure (unsigned) binary doesn't allow negatives; it only represents values from 0 upward.
You can use sign-magnitude, another representation that allows negative numbers: the + or - sign is stored in the most significant bit of the number, for example:
A number with 4 bits:
0010=2
1010=-2
(1 bit for the sign, the most significant one on the far left, and the other 3 bits for the magnitude)
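A small Python sketch of the 4-bit sign-magnitude scheme described above (`sm_encode` and `sm_decode` are illustrative names). Decoding shows that 0000 and 1000 both mean 0, so sign-magnitude has the same two-zeros issue as the scheme in the question.

```python
def sm_encode(n: int, width: int = 4) -> str:
    """Sign-magnitude: leftmost bit is the sign (1 = negative),
    the remaining width-1 bits hold the magnitude |n|."""
    mag_bits = width - 1
    if abs(n) >= 1 << mag_bits:
        raise ValueError("magnitude does not fit")
    return ("1" if n < 0 else "0") + format(abs(n), f"0{mag_bits}b")

def sm_decode(bits: str) -> int:
    """Read the magnitude from the low bits and apply the sign bit."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude

print(sm_encode(2), sm_encode(-2))            # 0010 1010
print(sm_decode("0000"), sm_decode("1000"))   # both decode to 0
```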
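You can also use two's complement, which has only one representation of zero (sign-magnitude, like the scheme in the question, has both 0000 and 1000). It takes a bit more work: you get the magnitude in binary first and then convert it to the other form (invert the bits and add 1 for negative numbers).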
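For comparison, a Python sketch of 8-bit two's complement, which is the bit pattern a Java `byte` already holds (`byte_to_bits` and `bits_to_byte` are hypothetical names). Every value from -128 to 127 gets exactly one bit string, so there is only one zero.

```python
def byte_to_bits(n: int) -> str:
    """8-bit two's complement, the same bit pattern a Java byte holds."""
    if not -128 <= n <= 127:
        raise ValueError("value does not fit in a byte")
    return format(n & 0xFF, "08b")   # n & 0xFF maps -1 -> 255, -128 -> 128, ...

def bits_to_byte(bits: str) -> int:
    """Inverse of byte_to_bits: a leading 1 means the value is negative."""
    value = int(bits, 2)
    return value - 256 if value >= 128 else value

print(byte_to_bits(2), byte_to_bits(-2), byte_to_bits(0))  # 00000010 11111110 00000000
# Every byte value gets its own pattern, so there is exactly one zero.
patterns = {byte_to_bits(n) for n in range(-128, 128)}
print(len(patterns))  # 256
```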
I hope I could help you, and sorry for the many mistakes in English!

Help Understanding 8bit Floating Point Conversions with Decimals and Binary

I'm in a basic engineering class and we're going through binary conversions. I can do base 10 to binary or hex conversions really well; however, the 8-bit floating point conversions are kicking my ass, and I can't find anything online that breaks it down at a n00b level and shows the steps. Wondering if any gurus have found anything online that would be helpful for this situation?
I have questions like: 00101010 (8-bit FP) = what number in base 10?
Whenever I want to remember how floating point works, I refer back to the wikipedia page on 32 bit floats. I think it lays out the concepts pretty well.
http://en.wikipedia.org/wiki/Single_precision_floating-point_format
Note that Wikipedia doesn't know what 8-bit floats are; I think your professor may have invented them ;)
Binary floating point formats are usually broken down into 3 fields: Sign bit, exponent and mantissa. The sign bit is simply set to 1 if the entire number should be negative, and 0 if the number is positive. The exponent is usually an unsigned int with an offset, where 2 to the 0'th power (1) is in the middle of the range. It's simpler in hardware and software to compare sizes this way. The mantissa works similarly to the mantissa in regular scientific notation, with the following caveat: The most significant bit is hidden. This is due to the requirement of normalizing scientific notation to have one significant digit above the decimal point. Remember when your math teacher in elementary school would whack your knuckles with a ruler for writing 35.648 x 10^6 or 0.35648 x 10^8 instead of the correct 3.5648 x 10^7? Since binary only has two states, this required digit above the decimal point is always one, and eliminating it allows another bit of accuracy at the low end of the mantissa.
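The three fields can be decoded mechanically. Below is a hedged Python sketch, assuming an IEEE-754-like 8-bit layout of 1 sign bit, 4 exponent bits with bias 7, and 3 fraction bits with a hidden leading 1, with an all-zero exponent field treated as subnormal and an all-ones field reserved as in IEEE 754. Your professor's format may split the bits differently, so `exp_bits` and `frac_bits` are parameters; `decode_minifloat` is an illustrative name.

```python
from fractions import Fraction

def decode_minifloat(bits: str, exp_bits: int = 4, frac_bits: int = 3) -> Fraction:
    """Decode sign | exponent | fraction, IEEE-754 style (assumed layout)."""
    if len(bits) != 1 + exp_bits + frac_bits:
        raise ValueError("bit string length does not match the layout")
    sign = -1 if bits[0] == "1" else 1
    e = int(bits[1:1 + exp_bits], 2)             # stored (biased) exponent
    f = int(bits[1 + exp_bits:], 2)              # stored fraction bits
    bias = (1 << (exp_bits - 1)) - 1             # 7 for a 4-bit exponent
    if e == (1 << exp_bits) - 1:
        raise ValueError("all-ones exponent encodes infinity/NaN")
    if e == 0:
        # Subnormal: no hidden 1, exponent fixed at 1 - bias.
        return sign * Fraction(f, 1 << frac_bits) * Fraction(2) ** (1 - bias)
    # Normal: restore the hidden leading 1 in front of the fraction bits.
    return sign * (1 + Fraction(f, 1 << frac_bits)) * Fraction(2) ** (e - bias)

value = decode_minifloat("00101010")
print(value, float(value))  # 5/16 0.3125
```

Under that assumption, 00101010 splits as 0 | 0101 | 010, giving $+1.010_2 \cdot 2^{5-7} = 1.25 \cdot 0.25 = 0.3125$. With a 1-3-4 split instead (`decode_minifloat("00101010", 3, 4)`), the same bits would mean 0.8125, which is why the exact layout your class uses matters.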