How exactly do computers convert ASCII to binary?

I have read that when you press a key on the keyboard, the OS will translate it to the corresponding ASCII code, and then the computer will convert the ASCII to binary. But what part of the computer converts ASCII to binary? The question may be stupid because I have only just started to learn CS.

Bear with me, it has been a while since I dealt with this sort of thing...
When you press a key on the keyboard, a (very low) voltage signal is raised and is detected by one of the I/O subsystems on the motherboard - in this case the one responsible for the signals at the port that the keyboard is connected to (e.g. USB, DIN, Bluetooth, etc).
The I/O handler then signals this to the interrupt handler, which in turn sends it as a keyboard interrupt to the operating system's keyboard driver. The keyboard driver maps this high-priority interrupt signal to a binary value according to the hardware's specific rules. And this binary representation of the pressed key is the value that the operating system uses and/or hands over to another program (like a word processor, console/terminal, email, etc).
For example, and assuming a very simple, single-byte ASCII-based system (it gets a lot more complicated these days - UTF-8, UTF-16, EBCDIC, etc., etc.):
When you press the letters g and H, the two voltages get translated into binary values 01100111 and 01001000 respectively. But since the computer does not understand the concept of "letters", these binary values represent numbers instead (no problem for the computer). In decimal this would be 103 and 72 respectively.
So where do the actual letters come in? The ASCII code is a mapping between the binary representation, its numeric value (in decimal, hex, octal, etc.) and a corresponding symbol. The symbols in this case being g and H, which the computer then "paints" on the screen. Letters, numbers, punctuation marks - they are all graphical representations of a number -- little images if you like.
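If you want to see that mapping for yourself, here is a small Python sketch reproducing the g/H example above:

```python
# Inspect the ASCII mapping for the two characters from the example above.
for ch in "gH":
    code = ord(ch)                        # numeric value of the character
    print(ch, code, format(code, "08b"))  # symbol, decimal, 8-bit binary

# Output:
# g 103 01100111
# H 72 01001000

# And the reverse direction: a number back to its symbol.
print(chr(103), chr(72))  # g H
```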

Related

How do computers differentiate between letters and numbers in binary?

I was just curious because 65 is the same as the letter A
If this is the wrong stack, sorry.
"65 is the same as the letter A": It is true if you say it is. But not saying more than that isn't very useful.
There is no text but encoded text. There are no numbers but encoded numbers. To the CPU, some number encodings are native, everything else is just undifferentiated data.
(Some data is just data for programs, other data is the CPU instructions of programs. It's a security problem if a CPU executes data as instructions inappropriately. Some architectures keep program data and instructions separate.)
Common native number encodings are signed and unsigned integers of 1, 2, 4, and 8 bytes and IEEE-754 single and double precision floating point numbers. Signed integers are usually two's-complement. Multi-byte integers have a byte ordering (or endianness) because on typical machines each byte is individually addressable. If a number encoding is not native, a program library is needed to process such data.
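As a rough illustration of the point about byte ordering and native encodings, here is a small Python sketch (using the standard struct module) that shows the same four bytes interpreted three different ways:

```python
import struct

raw = bytes([0x00, 0x00, 0x00, 0x41])   # four arbitrary bytes

# The same bytes, read as a 4-byte unsigned integer in both byte orders:
print(struct.unpack(">I", raw)[0])      # big-endian    -> 65
print(struct.unpack("<I", raw)[0])      # little-endian -> 1090519040

# ...or as an IEEE-754 single-precision float:
print(struct.unpack(">f", raw)[0])      # ~9.1e-44 (a denormal float)

# Nothing in the bytes themselves says which reading is "right";
# that is exactly the program's (or the metadata's) job.
```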
Text is a sequence of encoded characters from a character set. There are hundreds of character sets. A character set is an assignment of a conceptual character to a number called a codepoint. Sometimes the conceptual characters are categorized as lowercase letter, digit, symbol, etc. A codepoint value is mapped to bytes using a character encoding. Most character sets have one encoding, but Unicode has several. Some character sets are subsets of other character sets—such relationships are not generally useful because exactly one character set is used in any one context.
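To make the codepoint/encoding distinction concrete, a short Python sketch: the character 'é' has one codepoint but a different byte sequence under each encoding:

```python
ch = "é"                         # one conceptual character
print(hex(ord(ch)))              # its Unicode codepoint: 0xe9

# The same codepoint, mapped to bytes by different character encodings:
print(ch.encode("latin-1"))      # b'\xe9'      (one byte)
print(ch.encode("utf-8"))        # b'\xc3\xa9'  (two bytes)
print(ch.encode("utf-16-le"))    # b'\xe9\x00'  (two bytes, little-endian)
```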
A program is a set of instructions that operate on data. It must apply the correct operations to the right data. So it is the program that differentiates between text and number, usually by where the data sits or by which code path handles it.
Stored data must be in a known layout of encoded text and numbers. Sometimes the layout is stored also. The layout is called metadata. Without the metadata accompanying the data, or being agreed upon, the data cannot be used.
It's all quite simple with appropriate bookkeeping. But there are several methods of bookkeeping so there is no general solution to how to handle data without metadata. Methods include: Well-known and/or registered file extensions, HTTP headers, MIME types, HTML meta charset tag, XML encoding declaration. Some methods only work in a certain context, such as audio/video codecs having a four-character code (FourCC), and unix shell scripts with a shebang. Some methods only help narrow guessing, such as file signatures. Needless to say, guessing should be avoided; it leads to security issues and data loss.
Unfortunately, text files are often without metadata. It is particularly important to agree upon or separately communicate the metadata.
Data without metadata is just "binary". So the writer of text must agree with the reader on which character encoding is to be used, and similarly for all other types of data. Here reader and writer can each be humans or programs.
Short answer: they don't. Longer answer: every binary combination from 00000000 to 01111111 has a character representation in the ASCII character set, and extended code pages assign meanings to the rest, up to 11111111. 01000001 just happens to be the first capital letter of the Latin alphabet, as designated back in the 1960s. There are other character sets and code pages that represent different letters, numbers, non-printable characters and accented letters. It's entirely possible that the binary 01000001 could be a lowercase z with a tilde over the top in a different character set. Computers don't know (or care) what a particular binary representation means to humans.
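The point about the same bits meaning different characters is easy to demonstrate in Python; the byte 0xE9 is used below purely for illustration:

```python
b = bytes([0xE9])                  # one byte, one bit pattern

print(b.decode("latin-1"))         # 'é'  (ISO 8859-1)
print(b.decode("cp437"))           # 'Θ'  (the old IBM PC code page)

# ASCII itself only defines 0-127, so this byte is not valid ASCII at all:
try:
    b.decode("ascii")
except UnicodeDecodeError as e:
    print("not ASCII:", e)

# And the letter 'A' (01000001 in ASCII) is a different bit pattern in EBCDIC:
print("A".encode("cp037"))         # b'\xc1'
```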

Representation of numbers in the computer

In the representation of inputs in the computer, are numbers taken as characters and encoded with ASCII, or are they converted directly to binary? In other words: when is my input treated as an integer and when as a character?
Both are possible, and it depends on the application; in other words, the software programmer decides. In general, binary representation is more efficient in terms of storage requirements and processing speed. Therefore binary representation is more usual, but there are good examples of when it is better to keep numbers as strings (a couple of these cases are sketched in code after the list):
to avoid problems with conversions
phone numbers
when no adequate binary representation is available (e.g. 100 digits of pi)
numbers where no processing takes place
to be continued ...
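A quick Python sketch of the string-vs-binary difference mentioned above (the phone number is made up for illustration):

```python
import struct

phone = "0201 555 0123"               # a made-up phone number kept as text:
                                      # the leading zero and spacing would be lost as an int
n = 1234

as_text   = str(n).encode("ascii")    # b'1234'             -> 4 bytes of ASCII digits
as_binary = struct.pack("<i", n)      # b'\xd2\x04\x00\x00' -> 4-byte little-endian integer

print(as_text, as_binary)
print(int(as_text))                   # converting back from text: 1234
```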
The most basic building block of electronic data is a bit. It can have only 2 values, 0 and 1. Other data structures are built from collections of bits, such as an 8-bit byte or a 32-bit float.
When a collection of bits needs to represent a character, a character encoding such as ASCII or UTF-8 is used to give those bits textual meaning.
When you want to display character information to the screen, you use a graphical layer to draw pixels representing the "character" (collection of bits with matching encoding) to the screen.

Why is it useful to know how to convert between numeric bases?

We are learning about converting Binary to Decimal (and vice-versa) as well as other base-conversion methods, but I don't understand the necessity of this knowledge.
Are there any real-world uses for converting numbers between different bases?
When dealing with Unicode escape codes: '\u2014' in JavaScript is the em dash character (&mdash; in HTML)
When debugging: many debuggers show all numbers in hex
When writing bitmasks: it's more convenient to specify powers of two in hex (or by writing 1 << 4); a small sketch follows below
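For the bitmask case in particular, here is a small Python sketch (the flag names and values are made up):

```python
# Hypothetical permission flags, written as powers of two.
READ    = 0x01        # 1 << 0
WRITE   = 0x02        # 1 << 1
EXECUTE = 0x10        # 1 << 4 -- easy to see in hex, awkward as decimal 16

flags = READ | EXECUTE

print(hex(flags))             # 0x11
print(bool(flags & WRITE))    # False: the WRITE bit is not set
print(format(flags, "08b"))   # 00010001 -- the binary view makes the mask obvious
```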
In this article I describe a concrete use case. In short, suppose you have a series of bytes you want to transfer using some transport mechanism, but you cannot simply pass the payload as bytes, because you are not able to send binary content. Let's say you can only use 64 characters for encoding the payload. A solution to this problem is to convert the bytes (8-bit characters) into 6-bit characters. Here the number conversion comes into play. Consider the series of bytes as a big number whose base is 256. Then convert it into a number with base 64 and you are done. Each digit of the new base 64 number now denotes a character of your encoded payload...
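A minimal sketch of that idea in Python: it treats the payload as one big base-256 number and re-expresses it in base 64. Real Base64 also pads and works in 3-byte groups, so this is only the core of the idea, not a drop-in replacement:

```python
# A 64-character alphabet for the digits of the base-64 number.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_base64ish(payload: bytes) -> str:
    # Interpret the byte string as one big number with base 256...
    n = int.from_bytes(payload, "big")
    # ...then peel off base-64 digits, least significant first.
    digits = []
    while n:
        n, d = divmod(n, 64)
        digits.append(ALPHABET[d])
    return "".join(reversed(digits)) or ALPHABET[0]

# The two bytes 0x48 0x69 ("Hi") re-expressed as base-64 digits:
print(encode_base64ish(b"Hi"))
```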
If you have a device, such as a hard drive, whose storage cells can only take a set number of states, you can only count in a number system with that many digits.
Because a computer's bits only have on and off, you can only represent 0 and 1. Therefore a base-2 system is used.
If you had a device with 3 states, you could represent 0, 1 and 2, and therefore count in a base-3 system.
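More generally, "counting in base N" is just repeated division; a small Python sketch:

```python
def to_base(n: int, base: int) -> str:
    """Express a non-negative integer as a digit string in the given base (2-36)."""
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(digits[r])
    return "".join(reversed(out))

print(to_base(42, 2))   # 101010
print(to_base(42, 3))   # 1120
print(to_base(42, 16))  # 2a
```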

In what situations is octal base used?

I've seen binary and hex used quite often but never octal. Yet octal has its own convention in some languages (i.e., a leading 0 indicating octal). When is octal used? What are some typical situations when one would use octal, or when octal would be easier to reason about? Or is it merely a matter of taste?
Octal is used when the number of bits in one word is a multiple of 3, or if the grouping of the bits makes sense to notate in groups of 3. Examples are
ancient systems with 18-bit word sizes (mostly historical)
systems with 9-bit bytes (mostly historical)
unix file permissions with 9 bits (3×3 bits, "rwxr-x---" is 0750)
unix file permissions with 12 bits (the same as the 9-bit version but with three bits in front for setuid, setgid, and sticky, e.g. 01777, though the letter notation is more complicated there)
I have not encountered any uses of octal other than unix file permission bits in my roughly 25 years in IT.
If the number of bits in your word is a multiple of 4, however, please do use hex, by all means.
Octal is used as a shorthand for representing file permissions on UNIX systems. For example, file mode rwxr-xr-x would be 0755.
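Here is a small Python sketch of that mapping; each octal digit covers one three-bit r/w/x group (the file path in the comment is just an example):

```python
mode = 0o755                          # the rwxr-xr-x example from above

# Each octal digit is one three-bit group: owner, group, others.
for who, shift in (("owner", 6), ("group", 3), ("others", 0)):
    bits = (mode >> shift) & 0b111
    rwx = "".join(c if bits & b else "-"
                  for c, b in (("r", 4), ("w", 2), ("x", 1)))
    print(who, oct(bits), rwx)

# owner 0o7 rwx
# group 0o5 r-x
# others 0o5 r-x

# Applying such a mode to a real file (path is just an example):
# import os; os.chmod("/tmp/example.sh", 0o755)
```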
Octal is used when the syntax is a relic from the ages when it perhaps made sense on some platform (system words haven't always been a multiple of 8 bits). Nowadays hex is the thing to use.
Didn't think of this but Digital displays!
Several other uses from:
http://en.wikipedia.org/wiki/Octal
One of the main reasons octal used to be more frequently used was that it is easier to convert between octal and binary in your head than hex to binary: you only have to remember the binary representation of the 8 octal digits (0-7).
Back in the days when debugging meant reading register contents from a row of LEDs, or entering data with an array of toggle switches, this was a big concern. The panels on many of these early computers grouped the LEDs and switches in groups of threes to facilitate this.
However, hex began to win out as word sizes that are multiples of 8-bit bytes became the norm, and the need to read and enter data in binary became unnecessary (with console text UIs and later GUI debuggers).
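The "groups of three" trick is easy to see in Python:

```python
bits = "111101101"                 # LEDs / switches read in groups of three: 111 101 101
print(oct(int(bits, 2)))           # 0o755 -- each group of three becomes one octal digit
print(format(0o755, "09b"))        # 111101101 -- and back again
```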
If birds could count, my guess would be that they use octal. While most birds have 3 digits on their feathered "hands", most are tetradactyl, meaning they have 4 toes on each foot.
In avionics, ARINC 429 word labels are almost always expressed in octal.
Music, as long as you stay away from (most) sharps and flats.
FYI, there are a few places where Windows and JavaScript automatically decide that a number prefixed with a zero is octal and convert the number.
In Windows, if you ping an address like 10.0.2.010 it will actually ping 10.0.2.8.
Windows also does this if you enter it as the IP/DNS address for the computer.
Though it is deprecated, JavaScript does this by default in some functions like parseInt if you do not specify a radix: http://www.w3schools.com/jsref/jsref_parseint.asp
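The underlying behaviour is just "leading zero means base 8"; a quick Python illustration of why 010 becomes 8 (Python 3 itself requires an explicit 0o prefix for literals, so the base is spelled out here):

```python
print(int("010", 8))     # 8  -- "010" read as octal, which is what the old parsers did
print(int("010", 10))    # 10 -- the same text read as decimal
print(0o10)              # 8  -- Python 3's explicit octal literal syntax
```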

How can you reverse engineer a binary thrift file?

I've been asked to process some files serialized as binary (not text/JSON unfortunately) Thrift objects, but I don't have access to the program or programmer that created the files, so I have no idea of their structure, field order, etc. Is there a way using the Thrift libraries to open a binary file and analyze it, getting a list of the field types, values, nesting, etc.?
Unfortunately, Thrift's binary protocol does not appear to do much tagging of data at all; to decode it, you are assumed to have the .thrift file in hand so you know that, say, the next 4 bytes are supposed to be an integer and aren't actually the first half of a float. So it appears you are stuck with, basically, looking at the files in a hex editor (or equivalent) and trying to deduce fields from the patterns you see.
There are a very few helpful bits:
Each file begins with a version, a protocol identifier string, and a sequence number.
Maps begin with 6 bytes that identify the key and value types (first two bytes, as integer type codes) plus the number of elements as a 4-byte integer.
The type codes appear to be standard (the canonical location of their definitions seems to be TProtocol.h in the Thrift sources; for instance, a boolean value is specified by type code 2, a UTF-8 string by type code 16, and so on).
Strings are prefixed by a 4-byte integer length field, and lists are prefixed by the element type (1 byte) and a 4-byte length.
It looks like all integer fields are saved big-endian, and floating-point values are saved in IEEE format (which should make doubles relatively easy to find, at least).
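Based only on the observations above (4-byte big-endian length prefixes, big-endian integers), here is a minimal Python sketch of the kind of manual poking this involves. read_string_at is a hypothetical helper, not part of the Thrift libraries, and it will happily misread bytes that are not actually a string field, so treat it as an exploration aid only:

```python
import struct

def hexdump(data: bytes, offset: int = 0, length: int = 64) -> None:
    """Print a crude hex view of a slice of the file, 16 bytes per row."""
    chunk = data[offset:offset + length]
    for i in range(0, len(chunk), 16):
        row = chunk[i:i + 16]
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        print(f"{offset + i:08x}  {row.hex(' '):<47}  {text}")

def read_string_at(data: bytes, offset: int):
    """Try to read a 4-byte big-endian length followed by that many bytes."""
    (length,) = struct.unpack_from(">i", data, offset)
    if length < 0 or offset + 4 + length > len(data):
        return None                       # almost certainly not a string here
    return data[offset + 4:offset + 4 + length]

with open("some_file.thrift.bin", "rb") as f:   # file name is just a placeholder
    data = f.read()

hexdump(data)                    # eyeball the first bytes: version, protocol id, seq number
print(read_string_at(data, 8))   # guess an offset and see if a plausible string falls out
```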
The TBinaryProtocol* files in Thrift have a few more helpful details; on the plus side, there are a number of different implementations so you can read the ones implemented in the language you are most comfortable with.
Sorry, I know this probably isn't that helpful, but it really does appear this is all the information the Thrift binary format provides; clearly the binary format was designed with the intent that you would always know the exact protocol spec already, and that the goal was to minimize wire space rather than make it at all easy to decode blindly.