Why HTML decimal and HTML hex?

I have been Googling for quite a while now for an answer to why HTML entities can be written in either HTML decimal or HTML hex. So my questions are:
What is the difference between HTML decimal and HTML hex?
Why are there two systems to do the same thing?

Originally, HTML was nominally based on SGML, which has decimal character references only. Later, the hexadecimal alternative was added in HTML 4.01 (and soon implemented in browsers), then retrofitted into SGML in the Web Adaptations Annex.
The apparent main reason for adding the hexadecimal alternative was that all modern character code and encoding standards, such as Unicode, use hexadecimal notation for the code numbers of characters. The ability to refer to a character by its Unicode number, written in the conventional hexadecimal notation, just prefixed with &#x and suffixed with ;, helps to avoid errors that may arise if people convert from hexadecimal to decimal notation.
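To make that correspondence concrete, here is a minimal Python sketch (Python is used only as an illustration; the entity syntax itself is plain HTML) that builds both reference forms for the same character:

```python
# Build the decimal and hexadecimal numeric character references for one
# character. Both refer to the same Unicode code point; only the notation differs.
ch = "€"            # U+20AC EURO SIGN, used here just as an example
cp = ord(ch)        # 8364 in decimal, 0x20AC in hexadecimal
print(f"&#{cp};")   # &#8364;   (decimal reference)
print(f"&#x{cp:X};")  # &#x20AC; (hexadecimal reference)
```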

There are four radixes commonly used in computer technologies:
Binary, radix 2, because ultimately integers are arrays of switches, each of which may be on (1) or off (0).
Octal, radix 8, because each digit represents exactly 3 bits, so it's easy to convert to binary.
Decimal, radix 10, because humans have 10 fingers and because we grew up using this radix.
Hexadecimal, radix 16, because like octal it's easy to convert to bits, but even better because 2 hex digits represent exactly 1 byte. If, for example, you see an rgba value given in hex as 0x00ff00ff, you can see instantly that it represents opaque green.
So, to answer the question posed, for some of us hex is the natural way to express integers as it gives more insight into the storage. For others it's decimal. To each his or her own!
Finishing with an HTML example: could &#65536; fit in a single UTF-16 code unit? In hex it's easy to see that the answer is no, because it's the same as &#x10000;, which needs more than 16 bits.
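A short Python sketch of that last check (just an illustration of the arithmetic, nothing HTML-specific):

```python
cp = 65536
print(hex(cp))           # 0x10000 - one past the largest 16-bit value
print(cp.bit_length())   # 17, so it cannot fit in a single 16-bit code unit
print(cp <= 0xFFFF)      # False
```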

Related

What's the exact meaning of the statement "Since ASCII used 7 bits for the character, it could only represent 128 different characters"?

I came across the statement below while studying HTML character sets and character encoding:
Since ASCII used 7 bits for the character, it could only represent 128 different characters.
When we convert any decimal value from the ASCII character set to its binary equivalent, it comes down to a binary number that is at most 7 bits long.
E.g. for the capital English letter 'E', the decimal value 69 exists in the ASCII table. If we convert 69 to its binary equivalent, it comes down to the 7-bit binary number 1000101.
Then why is it shown in the ASCII table as the 8-bit binary number 01000101 instead of the 7-bit binary number 1000101?
This seems to contradict the statement
Since ASCII used 7 bits for the character, it could only represent 128 different characters.
which says that ASCII used 7 bits for each character.
Please clear up my confusion about the binary equivalent of a decimal value: should I consider the 7-bit or the 8-bit binary equivalent of a decimal value from the ASCII table? Please explain in easy-to-understand language.
Again, consider the statement:
Since ASCII used 7 bits for the character, it could only represent 128 different characters.
According to the above statement, how does the number of characters (128) that ASCII supports relate to the fact that ASCII uses 7 bits to represent any character?
Please clear the confusion.
Thank You.
In most processors, memory is byte-addressable and not bit-addressable. That is, a memory address gives the location of an 8-bit value. So, almost all data is manipulated in multiples of 8 bits at a time.
If we were to store a value that has by its nature only 7 bits, we would very often use one byte per value. If the data is a sequence of such values, as text might be, we would still use one byte per value to make counting, sizing, indexing and iterating easier.
When we describe the value of a byte, we often show all of its bits, either in binary or hexadecimal. If a value is some sort of integer (say of 1, 2, 4, or 8 bytes) and its decimal representation would be more understandable, we would write the decimal digits for the whole integer. But in those cases, we might lose the concept of how many bytes it is.
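A small Python sketch of both points: 7 bits allow 2^7 = 128 distinct values, and a 7-bit ASCII code is normally shown padded to a full 8-bit byte:

```python
print(2 ** 7)               # 128 distinct values representable in 7 bits
code = ord("E")             # 69
print(format(code, "07b"))  # 1000101  - the 7 significant bits
print(format(code, "08b"))  # 01000101 - the same value stored in one 8-bit byte
```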
BTW, HTML doesn't have anything to do with ASCII. And "Extended ASCII" isn't one encoding. The fundamental rule of character encodings is to read (decode) with the encoding the text was written (encoded) with. So, a communication consists of transferring bytes plus a shared understanding of the character encoding. (That is what makes saying "Extended ASCII" so inadequate as to be nearly useless.)
An HTML document represents a sequence of Unicode characters, so one of the Unicode character encodings (UTF-8) is the most common encoding for an HTML document. Regardless, after it is read, the result is Unicode. An HTML document could be encoded in ASCII, but why do that? If you know it is ASCII, you could just as easily call it UTF-8, since ASCII is a subset of UTF-8.
Outside of HTML, ASCII is used billions, if not trillions, of times per second. But unless you know exactly how it pertains to your work, forget about it; you probably aren't using ASCII.
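A minimal Python sketch of the "ASCII is a subset of UTF-8" point above (the specific characters are just examples):

```python
text = "Hello"                                        # ASCII-only text
print(text.encode("ascii") == text.encode("utf-8"))   # True: byte-for-byte identical
print("€".encode("utf-8"))                            # b'\xe2\x82\xac' - beyond ASCII
```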

Converting from lowercase to uppercase using decimal/binary representation of alphabets

I'm using RISC-V, and I am limited to using just and, or, xori, and the addition, subtraction, multiplication, and division of integer values.
So for instance, the letter "a" will be represented as 97 and "aa" will be represented as 24929, and so on. The UI converts the binary sequence into a decimal representation, and I cannot directly modify the n-th bit.
Is there any way I can find a simple, general equation for converting the decimal representation of a string of at most 8 letters from lowercase to uppercase?
Also, I forgot to add, I can't partition the string into individual letters either. Maybe it's possible, but I don't know how to do it.
Letters or characters are usually represented as byte values, which are easier to read in hexadecimal. This can be seen if you convert 97 and 24929 to hex.
You did not mention the system which was used to encode the characters; mentioning the value for one character is not definitive. Assuming your letters are encoded as ASCII, find an ASCII table and figure out the DIFFERENCE between upper- and lowercase character codes.
Use this knowledge to design an algorithm to transform lowercase character codes to uppercase.
A good uppercase conversion algorithm will not modify characters that are not lowercase letters.
This can be extended to a SIMD-style approach if you are careful to avoid carries between bytes when you add or subtract.
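Here is a minimal Python sketch of that per-byte approach (Python rather than RISC-V assembly, purely to illustrate the arithmetic). It assumes ASCII, where each uppercase code is exactly 32 below its lowercase counterpart; the modulo is shorthand for n - (n // place) * 256 * ..., which stays within division, multiplication, and subtraction, while the range test would need to be emulated with the allowed operations on the real target:

```python
def to_upper_packed(n, num_chars=8):
    """Uppercase an integer that packs up to num_chars ASCII characters."""
    result = 0
    for i in range(num_chars):
        place = 256 ** i
        byte = (n // place) % 256     # extract one character's code
        if 97 <= byte <= 122:         # is it a lowercase letter 'a'..'z'?
            byte -= 32                # the difference between the cases
        result += byte * place        # put it back in the same position
    return result

print(to_upper_packed(24929))  # 'aa' packed as 24929 -> 'AA' packed as 16705
```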

Representation of numbers in the computer

When input is represented in the computer, are numbers treated as characters and encoded with ASCII, or are they converted directly to binary? Put another way: when is my input considered an integer and not a character?
Both are possible, and it depends on the application; in other words, the software programmer decides. In general, a binary representation is more efficient in terms of storage requirements and processing speed, so it is the more usual choice, but there are good cases where it is better to keep numbers as strings (see the sketch after this list):
to avoid problems with conversions
phone numbers
when no adequate binary representation is available (e.g. 100 digits of pi)
numbers where no processing takes places
to be continued ...
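A minimal Python sketch of the contrast between the two representations (struct is used only to show one fixed-width binary form; the details are illustrative):

```python
import struct

n = 12345
as_text = str(n).encode("ascii")   # b'12345' - five bytes, one ASCII digit each
as_binary = struct.pack("<i", n)   # b'90\x00\x00' - a 32-bit little-endian integer

print(as_text, len(as_text))       # 5 bytes of characters
print(as_binary, len(as_binary))   # 4 bytes, ready for arithmetic once unpacked
```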
The most basic building block of electronic data is a bit. It can have only 2 values, 0 and 1. Other data structures are built from collection of bits, such as an 8-bit byte, or a 32-bit float.
When a collection of bits needs to represent a character, a particular encoding is used to give those bits meaning as characters, such as ASCII, UTF-8, and others.
When you want to display character information on the screen, you use a graphical layer to draw pixels representing the "character" (a collection of bits with a matching encoding).
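A tiny Python sketch of that chain, from bits to byte value to character (the bit pattern is just an example):

```python
bits = "01000101"        # eight bits, i.e. one byte
value = int(bits, 2)     # 69 - the byte's integer value
print(chr(value))        # 'E' - the meaning assigned by the encoding (ASCII/Unicode)
```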

Why are leading zeroes used to represent octal numbers?

I've always wondered why leading zeroes (0) are used to represent octal numbers, instead of, for example, 0o. The 0o prefix would be just as helpful, but would not cause as many problems as a leading 0 (e.g. parseInt('08'); in JavaScript). What are the reasons behind this design choice?
All modern languages import this convention from C, which imported it from B, which imported it from BCPL.
Except that BCPL used #1234 for octal and #x1234 for hexadecimal. B departed from this convention because # was a unary operator in B (integer-to-floating-point conversion), so #1234 could not be used, and # as a base indicator was replaced with 0.
The designers of B tried to make the syntax very compact. I guess this is the reason they did not use a two-character prefix.
Worth noting that in Python 3.0 they decided that octal literals must be prefixed with '0o', and the old '0' prefix became a SyntaxError, for the exact reasons you mention in your question:
https://www.python.org/dev/peps/pep-3127/#removal-of-old-octal-syntax
"0b" is often used for binary rather than for octal. The leading "0" is, I suspect for "O -ctal".
If you want the string treated as base ten, pass the radix explicitly: parseInt('08', 10); makes JavaScript treat the number as base ten.
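A quick Python sketch of the behaviour the PEP 3127 link above describes (the values are just examples):

```python
print(0o750)           # 488 - octal literal with the explicit 0o prefix
# print(0750)          # SyntaxError in Python 3: the bare leading-zero form was removed

# When parsing strings, stating the base removes the ambiguity entirely:
print(int("08", 10))   # 8
print(int("750", 8))   # 488
```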

In what situations is octal base used?

I've seen binary and hex used quite often but never octal. Yet octal has its own convention for being used in some languages (i.e., a leading 0 indicating octal base). When is octal used? What are some typical situations when one would use octal, or when octal would be easier to reason about? Or is it merely a matter of taste?
Octal is used when the number of bits in one word is a multiple of 3, or if the grouping of the bits makes sense to notate in groups of 3. Examples are:
ancient systems with 18-bit word sizes (mostly historical)
systems with 9-bit bytes (mostly historical)
Unix file permissions with 9 bits (3×3 bits; "rwxr-x---" is 0750)
Unix file permissions with 12 bits (the same as the 9-bit version, but with three bits in front for setuid, setgid, and sticky, e.g. 01777; the letter form is more complicated here)
I have not encountered any uses of octal other than Unix file permission bits in my roughly 25 years in IT.
If the number of bits in your word is a multiple of 4, however, please do use hex, by all means.
Octal is used as a shorthand for representing file permissions on UNIX systems. For example, file mode rwxr-xr-x would be 0755.
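A minimal Python sketch of reading those permission bits on a Unix-like system (the path is just an example):

```python
import os
import stat

st = os.stat("/etc/passwd")        # any file will do; this path is an example
mode = stat.S_IMODE(st.st_mode)    # keep only the permission bits
print(oct(mode))                   # e.g. 0o644
print(stat.filemode(st.st_mode))   # e.g. '-rw-r--r--'
```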
Octal is used when the syntax is a relic from the ages when it perhaps made sense on some platform (system words haven't always been a multiple of 8 bits). Nowadays hex is the thing to use.
Didn't think of this, but digital displays!
Several other uses from:
http://en.wikipedia.org/wiki/Octal
One of the main reasons octal used to be more common is that it is easier to convert between octal and binary in your head than between hex and binary: you only have to remember the binary representation of the 8 octal digits (0-7).
Back in the days when debugging meant reading register contents from a row of LEDs, or entering data with an array of toggle switches, this was a big concern. The panels on many of these early computers grouped the LEDs and switches in groups of three to facilitate this.
However, hex won out as word sizes that are multiples of 8-bit bytes became the norm, and the need to read and enter data in binary became unnecessary (with console text UIs and later GUI debuggers).
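A quick Python illustration of the octal-to-binary mapping described in this answer: each octal digit corresponds to exactly three bits.

```python
n = 0o750
print(bin(n))                                              # 0b111101000
print(" ".join(format(int(d, 8), "03b") for d in "750"))   # 111 101 000
```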
If birds could count, my guess would be that they use octal. While most birds have 3 digits on their feathered "hands", most are tetradactyl, meaning 4 toes on each foot.
In avionics, ARINC 429 word labels are almost always expressed in octal.
Music, as long as you stay away from (most) sharps and flats.
FYI, there are a few places where Windows and JavaScript automatically decide that a number prefixed with a zero is octal and convert the number.
In Windows, if you ping an address like 10.0.2.010, it will actually ping 10.0.2.8.
Windows also does this if you enter it as the IP/DNS address for the computer.
Though it is deprecated, JavaScript does this by default in some functions like parseInt if you do not specify a radix: http://www.w3schools.com/jsref/jsref_parseint.asp