How to use binary symbols to represent the alphabet - binary

I was reading an article on binary numbers and it had some practice problems at the end, but it didn't give the solutions to the problems. The last one is "How many bits are required to represent the alphabet?". Can you tell me the answer to that question and briefly explain why?
Thanks.

You would only need 5 bits because you are counting to 26 (if we take only upper or lowercase letters). 5 bits will count up to 31, so you've actually got more space than you need. You can't use 4 because that only counts to 15.
If you want both upper and lowercase then 6 bits is your answer - 6 bits will happily count to 63, while your double alphabet has (2 * 26 = 52) characters, again leaving plenty of headroom.
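As a quick check on that counting argument, here is a small Python sketch (my own illustration, not from the thread): the answer is the smallest number of bits n with 2^n at least the symbol count, i.e. the ceiling of log2 of the symbol count.

    import math

    def bits_needed(symbol_count):
        """Smallest number of bits such that 2**bits covers symbol_count distinct codes."""
        return math.ceil(math.log2(symbol_count))

    print(bits_needed(26))  # 5 -> one case of A-Z
    print(bits_needed(52))  # 6 -> upper- and lowercase together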

It depends on your definition of alphabet. If you want to represent one character from the 26-letter Roman alphabet (A-Z), then you need log2(26) ≈ 4.7 bits. Obviously, in practice, you'll need 5 bits.
However, given an infinite stream of characters, you could theoretically come up with an encoding scheme that got close to 4.7 bits (there just won't be a one-to-one mapping between individual characters and bit vectors any more).
If you're talking about representing actual human language, then you can get away with a far lower number than this (in the region of 1.5 bits/character), due to redundancy. But that's too complicated to get into in a single post here... (Google keywords are "entropy", and "information content").

There are 26 letters in the alphabet, so 5 bits (2^5 = 32) is the minimum word length that can contain all the letters.

How direct does the representation need to be? If you need 1:1 with no translation layer, then 5 bits will do. But if a translation layer is an option, then you can get away with less. Morse code, for example, can do it in 3 bits. :)

Related

Need some assistance understanding binary addition/subtraction using 2's complement

If A = 01110011, B = 10010100, how would I add these?
I did this:
i.e: 01110011 + 10010100 = 100000111
Though isn't it essentially 115 + (-108) = 7, whereas I'm getting -249?
Edit: I see that removing the highest order bit (overflow) I get 7 which is what I'm looking for but I'm not getting why you wouldn't have the extra bit.
Edit 2: Ok, I figured it out. There was no overflow as I had assumed, because 7 is within [-128, 127] (8 bits). Instead, as Omar hinted, I was supposed to drop the "extra" 1 from the addition.
Your calculation is correct and the result is correct.
You stated that the second number is -108, so both your numbers are interpreted as signed 8-bit values. Thus, you should also interpret your result as a signed 8-bit value; this is why the 9th bit must be dropped, and the result is 7 (00000111).
On real hardware, such as an 8-bit CPU, all the registers are 8 bits wide, so you are only able to store the lowest 8 bits of the result, which here is 7 (00000111).
In some cases, the 9th bit may also be put inside a carry/overflow flag so it's not completely "dropped".
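To make the "drop the 9th bit" step concrete, here is a small Python sketch (my own illustration, not from the answer) that masks the raw sum down to 8 bits and re-reads it as a signed value:

    def to_signed_8bit(value):
        """Interpret the low 8 bits of value as a two's complement number."""
        value &= 0xFF
        return value - 256 if value >= 128 else value

    a = 0b01110011          # 115
    b = 0b10010100          # -108 when read as signed 8-bit
    raw_sum = a + b         # 0b100000111 (263) -- 9 bits wide
    result = raw_sum & 0xFF # drop the 9th bit, keep the low 8 bits

    print(bin(raw_sum))            # 0b100000111
    print(to_signed_8bit(result))  # 7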

Binary numbers addition

I have just started doing some binary number exercises to prepare for a class that I will start next month, and I got the hang of all the conversions from decimal to binary and vice versa. But now, with the two letters 'a' and 'b' in this exercise, I am not sure how I can apply that knowledge to add the bits in the following exercise.
Given two binary numbers a = (a7a6 ... a0) and b = (b7b6 ... b0). There is a calculator that can add 4-bit binary numbers. How many bits will be used to represent the result of a 4-bit addition? Why?
We would like to use our calculator to calculate a + b. For this we can put as many as eight bits (4 bits of the first and 4 bits of the second number) of our choice into the calculator and then reuse the result bit by bit.
How many additions does our calculator have to carry out, at most, for the addition of a and b? How many bits long can the result be at most?
How many additions does the calculator have to perform, at least, for the result to be correct for all possible inputs a and b?
The number of bits needed to represent the result of a 4-bit binary addition is 5. This is because there could be a carry bit that pushes the result to 5 bits.
For example 1111 + 0010 = 10001.
This can be done the same way as adding decimal numbers. From right to left just add the numbers of the same significance. If the two bits are 1+1, the result is 10 so that place becomes a zero and the 1 carries over to the next pair of bits, just like decimal addition.
With regard to the min/max number of steps, this seems more like an algorithm-specific question. Look up some different binary addition algorithms, such as ripple-carry addition, and it should give you a better idea of what is meant by the question.
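As a rough illustration of how an 8-bit sum can be built from two 4-bit additions, here is a Python sketch (the function names and the low/high split are my own assumptions, not part of the exercise): add the low nibbles first, then feed the carry into the addition of the high nibbles.

    def add4(x, y, carry_in=0):
        """Simulate a 4-bit adder: returns (4-bit sum, carry_out)."""
        total = x + y + carry_in
        return total & 0xF, total >> 4

    def add8_with_4bit_adder(a, b):
        """Add two 8-bit numbers using the 4-bit adder twice."""
        low, carry = add4(a & 0xF, b & 0xF)        # low nibbles first
        high, carry = add4(a >> 4, b >> 4, carry)  # high nibbles plus the carry
        return (carry << 8) | (high << 4) | low    # result can be up to 9 bits long

    print(bin(add8_with_4bit_adder(0b11110000, 0b00010001)))  # 0b100000001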

Is this just a coincidence that 0xaaaaaaaa represents binary with even positions set as 1

Is it just a coincidence that hexadecimal 0xaaaaaaaa represents binary with even positions set to 1?
Similarly, something as elegant as 0x55555555 represents binary with odd positions set to 1?
The binary representation of hexadecimal A is 1010 and of 5 is 0101, so 0xAAAAAAAA and 0x55555555 each have 16 ones and 16 zeros, with the ones and zeros in alternating positions. Similarly, 0x33333333 (3 is 0011) has 16 ones and 16 zeros, with 2 consecutive ones and 2 consecutive zeros alternating.
Nothing special about those numbers per se, other than the fact that their corresponding bit patterns are useful.
I think the key realization here is that it's super easy to come up with a compact hex number to represent any longer bit pattern (even easier if it's repeating), right off the top of your head.
Why? Because it's trivial to convert from hex-to-binary or binary-to-hex - every four bits of the pattern can be neatly represented by one hex digit:
So let's say I wanted this 16-bit mask: 1110111011101110. This is 1110 repeated 4 times, so it's just some hex digit, 4 times. Since 1110 is 14 in decimal, that's gonna be "E", so our mask would be: 0xEEEE.
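A quick way to see the nibble-for-digit correspondence is to print a few of these constants in binary; here is a small Python sketch (my own illustration) of the patterns discussed above:

    for mask in (0xAAAAAAAA, 0x55555555, 0x33333333, 0xEEEE):
        # Each hex digit expands to exactly four bits.
        print(f"{mask:>10X} -> {mask:032b}")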

Deflate length of 258 double encoding

In the Deflate algorithm there are two ways to encode a length of 258:
Code 284 + 5 extra bits of all 1's;
Code 285 + 0 extra bits.
At first glance this is not optimal, because the proper use of code 285 would allow a length of 259 to be encoded.
Is this duality a specification mistake, not fixed for compatibility reasons, or are there arguments for it - for example, must a length of 258 be encoded with the shorter code (0 extra bits) for some reason?
We may never know. The developer of the deflate format, Phil Katz, passed away many years ago at a young age.
My theory is that a match length was limited to 258 so that a match length in the range 3..258 could fit in a byte, encoded as 0..255. This format was developed around 1990, when this might make a difference in an assembler implementation.
Adding a second answer here to underscore Mark's guess that allowing the length to be encoded in a byte is helpful to assembler implementations. At the time, 8086-level assembler was still common, and using the 8-bit forms of the registers gave you more of them to work with than using them at their 16-bit size.
The benefit is even more pronounced on 8 bit processors such as the 6502. It starts with the length decoding. Symbols 257 .. 264 represent a match length of 3 .. 10 respectively. If you take the low byte of those symbols (1 .. 8) you get exactly 2 less than the match length.
A more complicated yet fairly easy to compute formula gives 2 less than the match length of symbols 265 through 284. 2 less than the match length of symbol 285 is 256. That doesn't fit in a byte but we can store 0 which turns out to be equivalent.
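Before looking at the assembler, here is a minimal Python sketch of those easy cases of the "length minus 2" mapping (my own illustration; zlib6502's actual decoding is more involved):

    def length_minus_2(symbol):
        """Sketch of the 'length minus 2' trick for the easy cases described above."""
        if 257 <= symbol <= 264:
            return symbol & 0xFF  # low byte of 257..264 is 1..8, i.e. length - 2
        if symbol == 285:
            return 0              # 258 - 2 = 256, stored as 0 in a single byte
        raise NotImplementedError("symbols 265..284 need the extra-bits formula")

    print(length_minus_2(257))  # 1 -> match length 3
    print(length_minus_2(264))  # 8 -> match length 10
    print(length_minus_2(285))  # 0 -> behaves like 256 -> match length 258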
zlib6502 uses this for considerable advantage. It calculates the match length in inflateCodes_lengthMinus2. And once the back pointer into the window has been determined it copies the data like so:
        jsr copyByte                  ; copy byte #1 of the match
        jsr copyByte                  ; copy byte #2 of the match
    inflateCodes_copyByte
        jsr copyByte                  ; copy one more byte
        dec inflateCodes_lengthMinus2 ; count down the remaining (length - 2) bytes
        bne inflateCodes_copyByte     ; branch back while the counter is non-zero
It makes two explicit calls to copy a byte and then loops over the length less 2, which works as you would expect for counts of 1 to 255. For a count of 0 it actually iterates 256 times, as we desire: the first time through the loop the 0 is decremented to 255, which is non-zero, so the loop continues 255 more times for a total of 256.
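To see why the wrap-around gives exactly 256 iterations, here is a tiny Python sketch (purely illustrative, not part of zlib6502) of the dec/bne loop on an 8-bit counter:

    def copies_made(length_minus_2):
        """Count the iterations of an 8-bit dec/bne loop starting at length_minus_2."""
        counter = length_minus_2
        copies = 0
        while True:
            copies += 1                     # jsr copyByte
            counter = (counter - 1) & 0xFF  # dec: 0 wraps around to 255
            if counter == 0:                # bne: exit only when the counter hits 0
                break
        return copies

    print(copies_made(1))    # 1
    print(copies_made(255))  # 255
    print(copies_made(0))    # 256 -> covers a match length of 258 (2 + 256)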
I'd have to think that Phil Katz understood intuitively if not explicitly the benefits of keeping the length of matches within 8 bits.

Why is it useful to know how to convert between numeric bases?

We are learning about converting Binary to Decimal (and vice-versa) as well as other base-conversion methods, but I don't understand the necessity of this knowledge.
Are there any real-world uses for converting numbers between different bases?
When dealing with Unicode escape codes: '\u2014' in JavaScript is &mdash; in HTML
When debugging: many debuggers show all numbers in hex
When writing bitmasks: it's more convenient to specify powers of two in hex (or by writing 1 << 4); a short example follows below
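Here is a small Python sketch (my own illustration) of those points: the same single-bit mask written several ways, and the base conversion hiding inside a Unicode escape:

    # The same single-bit mask, four ways:
    assert 16 == 0x10 == 0b10000 == 1 << 4

    # Base conversion also connects a character to its escape code:
    print(hex(ord("\u2014")))  # 0x2014 -> the code point of the '\u2014' character
    print(chr(0x2014))         # ...and back to the character itself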
In this article I describe a concrete use case. In short, suppose you have a series of bytes you want to transfer using some transport mechanism, but you cannot simply pass the payload as bytes, because you are not able to send binary content. Let's say you can only use 64 characters for encoding the payload. A solution to this problem is to convert the bytes (8-bit characters) into 6-bit characters. Here the number conversion comes into play. Consider the series of bytes as a big number whose base is 256. Then convert it into a number with base 64 and you are done. Each digit of the new base 64 number now denotes a character of your encoded payload...
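As a rough sketch of that idea in Python (the 64-character alphabet and function name are my own choices, and real Base64 also handles padding and block boundaries differently), treating the bytes as one big base-256 number and re-expressing it in base 64 looks like this:

    ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

    def encode_base64ish(data: bytes) -> str:
        """Treat data as a big base-256 number and rewrite it in base 64."""
        number = int.from_bytes(data, "big")    # digits of base 256 -> one integer
        digits = []
        while number:
            number, remainder = divmod(number, 64)
            digits.append(ALPHABET[remainder])  # each remainder is one base-64 digit
        return "".join(reversed(digits)) or ALPHABET[0]

    print(encode_base64ish(b"Hi!"))  # SGkh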
If you have a device, such as a hard drive, that can only have a set number of states, you can only count in a number system with that many states.
Because a computer's bit only has two states, on and off, you can only represent 0 and 1. Therefore a base-2 system is used.
If you had a device with 3 states, you could represent 0, 1 and 2, and therefore count in a base-3 system.
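As a tiny illustration (my own sketch), the same value can be written out in any base; a hypothetical 3-state device would count using the base-3 digits this produces:

    def to_base(value, base):
        """Write a non-negative integer as digits in the given base."""
        digits = []
        while True:
            value, digit = divmod(value, base)
            digits.append(str(digit))
            if value == 0:
                break
        return "".join(reversed(digits))

    print(to_base(11, 2))  # 1011
    print(to_base(11, 3))  # 102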