Why is it useful to know how to convert between numeric bases?

We are learning about converting Binary to Decimal (and vice-versa) as well as other base-conversion methods, but I don't understand the necessity of this knowledge.
Are there any real-world uses for converting numbers between different bases?

When dealing with Unicode escape codes: '\u2014' in JavaScript is the em dash, &#x2014; in HTML
When debugging: many debuggers show all numbers in hex
When writing bitmasks: it's more convenient to specify powers of two in hex (or by writing 1 << 4); see the sketch below
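A few of those cases sketched in JavaScript (the mask names below are just made-up examples):
// Unicode escape: '\u2014' is code point 0x2014, the em dash.
console.log('\u2014'.charCodeAt(0).toString(16)); // "2014"
// Debugger-style hex versus decimal:
console.log((255).toString(16));                  // "ff"
console.log(parseInt('ff', 16));                  // 255
// The same kind of bitmask written three equivalent ways:
const READ  = 0x10;       // hex literal
const WRITE = 1 << 5;     // shift
const EXEC  = 0b1000000;  // binary literal
console.log(READ, WRITE, EXEC);                   // 16 32 64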

In this article I describe a concrete use case. In short, suppose you have a series of bytes you want to transfer using some transport mechanism, but you cannot simply pass the payload as bytes, because you are not able to send binary content. Let's say you can only use 64 characters for encoding the payload. A solution to this problem is to convert the bytes (8-bit characters) into 6-bit characters. Here the number conversion comes into play. Consider the series of bytes as a big number whose base is 256. Then convert it into a number with base 64 and you are done. Each digit of the new base 64 number now denotes a character of your encoded payload...
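A rough JavaScript sketch of that idea (the alphabet and function name are illustrative, and this ignores the padding rules of real Base64):
// Treat the byte sequence as one big base-256 number, then re-express it in
// base 64 by repeatedly dividing and collecting remainders as "digits".
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function bytesToBase64Digits(bytes) {
  let n = 0n;
  for (const b of bytes) {
    n = n * 256n + BigInt(b);               // accumulate the base-256 value
  }
  let out = '';
  do {
    out = ALPHABET[Number(n % 64n)] + out;  // remainder = next base-64 digit
    n /= 64n;
  } while (n > 0n);
  return out;
}

console.log(bytesToBase64Digits([72, 105, 33])); // "SGkh" - the base-64 digits of 0x486921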

If you have a device, such as a hard drive, that can only have a set number of states, you can only count in a number system with that many states.
Because a computer's bit has only two states, on and off, it can only represent 0 and 1. Therefore a base-2 system is used.
If you had a device with 3 states, you could represent 0, 1 and 2, and therefore count in a base-3 system.
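For example, JavaScript's built-in radix conversions let you count in such a base-3 system directly:
console.log((5).toString(3));    // "12" - one group of three, plus two
console.log(parseInt('210', 3)); // 21   - 2*9 + 1*3 + 0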

Related

When JSON is sent over the network, how are numbers represented (as binary or text)?

This might be a trivial question... or might not be. When I serialize an object to JSON, how are numbers represented?
Specifically, I need to know how efficiently they are encoded to binary. There are 2 ways:
Transform the number to its decimal string representation and then encode that string to binary.
Or encode the number directly to binary.
Which is the case?
That is a big difference: say the serialized object contains the number 12345678. Encoded the first way it takes 8 B to transfer; encoded the second way, only 4 B. When it comes to lots of big numbers (my case), in the first case I would be better off using base64 as a pre-processing step before serialization.
I can imagine that this might depend on the serializer (though I really hope it does not). In that case: I am using the Firebase Realtime Database SDK.
JSON is a textual notation. So the number 12345678 is sent as those eight characters, 1, 2, 3, etc. Depending on your text encoding, that's probably eight bytes (e.g., UTF-8 or Windows-1252; but if you were using UTF-16, for instance, it would be 16 bytes).
There have been various "binary JSON" proposals over the years, but I don't think any of them really caught on outside of specific applications (for instance, BSON in MongoDB).
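A quick way to see this from Node.js (Buffer.byteLength is only used here to count the encoded bytes):
const payload = JSON.stringify({ n: 12345678 });
console.log(payload);                               // {"n":12345678}
console.log(Buffer.byteLength(payload, 'utf8'));    // 14 bytes in total
console.log(Buffer.byteLength('12345678', 'utf8')); // 8 bytes just for the digits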

What's the exact meaning of the statement "Since ASCII used 7 bits for the character, it could only represent 128 different characters"?

I came across the statement below while studying HTML character sets and character encoding:
Since ASCII used 7 bits for the character, it could only represent 128
different characters.
When we convert any decimal value from the ASCII character set to its binary equivalent, it comes out as a 7-bit-long binary number.
E.g. for the capital English letter 'E', the decimal value 69 exists in the ASCII table. If we convert 69 to its binary equivalent, it comes out as the 7-bit-long binary number 1000101.
Then why is it shown in the ASCII table as the 8-bit-long binary number 01000101 instead of the 7-bit-long binary number 1000101?
This seems to contradict the statement
Since ASCII used 7 bits for the character, it could only represent 128
different characters.
which says that ASCII used 7 bits per character.
Please clear up my confusion about the binary equivalent of a decimal value: should I consider the 7-bit-long binary equivalent or the 8-bit-long binary equivalent of any decimal value from the ASCII table? Please explain it in easy-to-understand language.
Again, consider the statement:
Since ASCII used 7 bits for the character, it could only represent 128
different characters.
According to this statement, how does the number of characters (128) that ASCII supports relate to the fact that ASCII uses 7 bits to represent any character?
Please clear up the confusion.
Thank you.
In most processors, memory is byte-addressable and not bit-addressable. That is, a memory address gives the location of an 8-bit value. So, almost all data is manipulated in multiples of 8 bits at a time.
If we were to store a value that has by its nature only 7 bits, we would very often use one byte per value. If the data is a sequence of such values, as text might be, we would still use one byte per value to make counting, sizing, indexing and iterating easier.
When we describe the value of a byte, we often show all of its bits, either in binary or hexadecimal. If a value is some sort of integer (say of 1, 2, 4, or 8 bytes) and its decimal representation would be more understandable, we would write the decimal digits for the whole integer. But in those cases, we might lose the concept of how many bytes it is.
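A quick JavaScript check of the 'E' example from the question (padStart(8, '0') just makes the 8-bit storage width explicit):
const code = 'E'.charCodeAt(0);                  // 69, the ASCII code for 'E'
console.log(code.toString(2));                   // "1000101" - 7 significant bits
console.log(code.toString(2).padStart(8, '0'));  // "01000101" - as stored in an 8-bit byte
console.log(2 ** 7);                             // 128 distinct values fit in 7 bits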
BTW—HTML doesn't have anything to do with ASCII. And, Extended ASCII isn't one encoding. The fundamental rule of character encodings is to read (decode) with the encoding the text was written (encoded) with. So, a communication consists of the transferring of bytes and a shared understanding of the character encoding. (That makes saying "Extended ASCII" so inadequate as to be nearly useless.)
An HTML document represents a sequence of Unicode characters, so one of the Unicode character encodings (UTF-8) is the most common encoding for an HTML document. Regardless, after it is read, the result is Unicode. An HTML document could be encoded in ASCII, but why do that? If you did know it was ASCII, you could just as easily treat it as UTF-8, since ASCII is a subset of UTF-8.
Outside of HTML, ASCII is used billions, if not trillions, of times per second. But unless you know exactly how it pertains to your work, forget about it: you probably aren't using ASCII.

Representation of numbers in the computer

When the computer represents input, are numbers treated as characters and encoded with an ASCII code, or are they converted directly to binary? Put another way: when is my input treated as an integer rather than as a character?
Both are possible, and it depends on the application; in other words, the software programmer decides. In general, binary representation is more efficient in terms of storage requirements and processing speed, so binary representation is more common, but there are good cases where it is better to keep numbers as strings (see the sketch below):
to avoid problems with conversions
phone numbers
when no adequate binary representation is available (e.g. 100 digits of pi)
numbers where no processing takes place
to be continued ...
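To make the string-versus-binary distinction concrete, here is a small JavaScript sketch (the variable names are just for illustration):
const asText = '42';                    // two characters, stored as the bytes 0x34 0x32
const asNumber = parseInt(asText, 10);  // a binary integer, 0b101010
console.log([...asText].map(c => c.charCodeAt(0))); // [ 52, 50 ] - the ASCII codes of '4' and '2'
console.log(asNumber.toString(2));                  // "101010"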
The most basic building block of electronic data is a bit. It can have only 2 values, 0 and 1. Other data structures are built from collections of bits, such as an 8-bit byte or a 32-bit float.
When a collection of bits needs to represent a character, a certain encoding is used to give lexical meaning to those bits, such as ASCII, UTF-8, and others.
When you want to display character information to the screen, you use a graphical layer to draw pixels representing the "character" (collection of bits with matching encoding) to the screen.

Correct way to store a bit array

I'm working on a project that needs to store something like
101110101010100011010101001
into the database. It's not a file or archive: it's only a bit array, and I think that storing it in a varchar column is a waste of space/performance.
I've looked into the BLOB and VARBINARY types, but both of them allow me to insert a value like 54563423523515453453, which is not exactly a bit array.
For sure, if I store a bit array like 10001000 in a BLOB/varbinary/varchar column, it will consume more than a byte, and I want the minimum space to be consumed. In the case of eight bits it should consume only one byte; for 16 bits, two bytes; and so on.
If that's not possible, then what is the best approach to waste the minimum amount of space in this case?
Important notes: The size of the array is variable, and is not divisible by eight in every situation. Sometimes I will need to store 325 bits, other times 7143 bits....
In one of my previous projects, I converted streams of 1's and 0's to decimal, but they were shorter. I don't know if that would be applicable to your project.
On the other hand, IMHO, you should clarify what you will need to do with that data once it is stored. Search? Compare? It might largely depend on the purpose of the database.
Could you gzip it and then store it? Is that applicable?
Binary is a string representation of a number. The string
101110101010100011010101001
represents the number
... + 1*2^5 + 0*2^4 + 1*2^3 + 0*2^2 + 0*2^1 + 1*2^0
As such, it can be stored in a 32-bit integer if it were converted from a binary string to the number it represents. In Perl, one would use
oct('0b'.$binary)
But you have a variable number of bits. Not a problem! Just process them 8 at a time to create a string of bytes to place in a BLOB or similar.
Ah, but there's a catch. You'll need to add padding to get a number of bits divisible by 8, which means you'll also need a way to remove that padding. A simple approach, if there's a known maximum length, is to use a length prefix. E.g. if you know the number of bits is never going to exceed 65,535, encode the number of bits in the first two bytes of the string.
pack('nB*', length($binary), $binary)
which is reversed using
my ($length, $binary) = unpack('nB*', $packed);
substr($binary, $length) = '';
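For comparison, here is a rough JavaScript/Node.js sketch of the same length-prefix-plus-padding idea (the function names are just illustrative):
function packBits(bitString) {
  const byteCount = Math.ceil(bitString.length / 8);
  const buf = Buffer.alloc(2 + byteCount);     // zero-filled
  buf.writeUInt16BE(bitString.length, 0);      // 16-bit length prefix
  for (let i = 0; i < bitString.length; i++) {
    if (bitString[i] === '1') {
      buf[2 + (i >> 3)] |= 0x80 >> (i & 7);    // set the bit, MSB-first per byte
    }
  }
  return buf;
}

function unpackBits(buf) {
  const length = buf.readUInt16BE(0);          // recover the original bit count
  let bits = '';
  for (let i = 0; i < length; i++) {
    bits += (buf[2 + (i >> 3)] >> (7 - (i & 7))) & 1;
  }
  return bits;
}

console.log(unpackBits(packBits('101110101010100011010101001'))); // round-trips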

In what situations is octal base used?

I've seen binary and hex used quite often but never octal. Yet octal has its own convention for being used in some languages (i.e., a leading 0 indicating octal). When is octal used? What are some typical situations when one would use octal or octal would be easier to reason about? Or is it merely a matter of taste?
Octal is used when the number of bits in one word is a multiple of 3, or if the grouping of the bits makes sense to notate in groups of 3. Examples are
ancient systems with 18-bit word sizes (mostly historical)
systems with 9-bit bytes (mostly historical)
Unix file permissions with 9 bits (3*3 bits; "rwxr-x---" is 0750)
Unix file permissions with 12 bits (the same as the 9-bit version but with three extra bits in front for setuid, setgid, and sticky, e.g. 01777, though the letter notation is more complicated there)
I have not encountered any uses of octal other than Unix file permission bits during my roughly 25 years in IT.
If the number of bits in your word is a multiple of 4, however, please do use hex, by all means.
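To see the 3-bit grouping behind an octal file mode concretely, here is a quick JavaScript check using its octal and binary literals:
const mode = 0o750;                                 // octal literal for rwxr-x---
console.log(mode.toString(2));                      // "111101000" -> rwx r-x ---
console.log(mode.toString(8));                      // "750"
console.log(parseInt('111101000', 2).toString(8));  // "750" again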
Octal is used as a shorthand for representing file permissions on UNIX systems. For example, file mode rwxr-xr-x would be 0755.
Octal is used when the syntax is a relic from the ages when it perhaps made sense on some platform (system words haven't always been a multiple of 8 bits). Nowadays hex is the thing to use.
I hadn't thought of this before, but: digital displays!
Several other uses from:
http://en.wikipedia.org/wiki/Octal
One of the main reasons octal used to be more frequently used was that it is easier to convert between octal and binary in your head than hex to binary: you only have to remember the binary representation of the 8 octal digits (0-7).
Back in the days when debugging meant reading register contents from a row of LEDs, or entering data with an array of toggle switches, this was a big concern. The panels on many of these early computers grouped the LEDs and switches in groups of threes to facilitate this.
However, hex won out as word sizes that are multiples of 8-bit bytes became the norm, and the need to read and enter data in raw binary faded (first with console text UIs and later with GUI debuggers).
If birds could count, my guess would be that they use octal. While most birds have 3 digits on their feathered "hands", most are tetradactyl, meaning they have 4 toes on each foot.
In avionics, ARINC 429 word labels are almost always expressed in octal.
Music, as long as you stay away from (most) sharps and flats.
FYI, there are a few places where Windows and JavaScript automatically decide that a number prefixed with a zero is octal and convert it.
In Windows, if you ping an address like 10.0.2.010, it will actually ping 10.0.2.8.
Windows also does this if you enter such a value as the IP/DNS address for the computer.
Though the behavior is deprecated, JavaScript does this by default in some functions, such as parseInt, if you do not specify a radix: http://www.w3schools.com/jsref/jsref_parseint.asp
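A quick JavaScript illustration of why you should always pass a radix to parseInt (older engines treated a leading 0 as octal; the 0x prefix is still auto-detected today):
console.log(parseInt('010', 10)); // 10 - explicit decimal
console.log(parseInt('010', 8));  // 8  - explicit octal
console.log(parseInt('0x10'));    // 16 - hex auto-detected from the prefix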