As nibble is for 4-bit and octet for 8-bit, is there an English word for 2-bit or 3-bit? - terminology

Also, though the length of a machine word is platform-dependent, it is often 16 bits long. Are there English words explicitly defined for 2-bit, 3-bit, and 16-bit quantities?

Usually, a short refers to a 16-bit integer, but that's not guaranteed. A short only needs to be char <= short <= int in terms of bit size.
Another term suggested by that Wikipedia article is halfword, but I've never seen anyone use that term.
The Jargon file proposes crumb (and some others, tongue firmly in cheek) for 2-bit datatypes, but doesn't have anything for 3 bits.


Handling Double values on CUDA (Compute Capability 1.1) [duplicate]

I am writing a program for embedded hardware that only supports 32-bit single-precision floating-point arithmetic. The algorithm I am implementing, however, requires a 64-bit double-precision addition and comparison. I am trying to emulate the double datatype using a tuple of two floats. So a double d will be emulated as a struct containing the tuple: (float d.hi, float d.low).
The comparison should be straightforward using a lexicographic ordering. The addition, however, is a bit tricky because I am not sure which base I should use. Should it be FLT_MAX? And how can I detect a carry?
How can this be done?
Edit (Clarity): I need the extra significant digits rather than the extra range.
double-float is a technique that uses pairs of single-precision numbers to achieve almost twice the precision of single-precision arithmetic, accompanied by a slight reduction of the single-precision exponent range (due to intermediate underflow and overflow at the far ends of the range). The basic algorithms were developed by T.J. Dekker and William Kahan in the 1970s. Below I list two fairly recent papers that show how these techniques can be adapted to GPUs; however, much of the material covered in these papers is applicable independent of platform, so it should be useful for the task at hand.
https://hal.archives-ouvertes.fr/hal-00021443
Guillaume Da Graça, David Defour
Implementation of float-float operators on graphics hardware,
7th conference on Real Numbers and Computers, RNC7.
http://andrewthall.org/papers/df64_qf128.pdf
Andrew Thall
Extended-Precision Floating-Point Numbers for GPU Computation.
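Not part of either paper's text, but as a rough sketch of the core building block they describe: Knuth's two-sum recovers the exact rounding error of a float addition, and a double-float add combines two of them. The names dfloat, two_sum, and df_add are mine, and this assumes IEEE 754 round-to-nearest; compile without -ffast-math so the compiler cannot reassociate the error terms away.

#include <stdio.h>

typedef struct { float hi, lo; } dfloat;    /* value = hi + lo, with |lo| much smaller than |hi| */

/* Knuth's two-sum: hi = fl(a+b), lo = the exact rounding error of that sum. */
static dfloat two_sum(float a, float b) {
    dfloat r;
    r.hi = a + b;
    float v = r.hi - a;
    r.lo = (a - (r.hi - v)) + (b - v);
    return r;
}

/* Add two double-float values and renormalize the result. */
static dfloat df_add(dfloat x, dfloat y) {
    dfloat s = two_sum(x.hi, y.hi);
    s.lo += x.lo + y.lo;                    /* fold in the low parts */
    return two_sum(s.hi, s.lo);             /* renormalize */
}

int main(void) {
    /* 1 + 1e-8 is not representable as a single float, but survives as a pair. */
    dfloat a = { 1.0f, 1e-8f };
    dfloat b = { 1.0f, 1e-8f };
    dfloat c = df_add(a, b);
    printf("hi=%.9g lo=%.9g\n", c.hi, c.lo); /* hi=2, lo=2e-08 */
    return 0;
}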
This is not going to be simple.
A float (IEEE 754 single-precision) has 1 sign bit, 8 exponent bits, and 23 bits of mantissa (well, effectively 24).
A double (IEEE 754 double-precision) has 1 sign bit, 11 exponent bits, and 52 bits of mantissa (effectively 53).
You can use the sign bit and 8 exponent bits from one of your floats, but how are you going to get 3 more exponent bits and 29 bits of mantissa out of the other?
Maybe somebody else can come up with something clever, but my answer is "this is impossible". (Or at least, "no easier than using a 64-bit struct and implementing your own operations")
It depends a bit on what types of operations you want to perform. If you only care about additions and subtractions, Kahan Summation can be a great solution.
If you need both the precision and a wide range, you'll be needing a software implementation of double precision floating point, such as SoftFloat.
(For addition, the basic principle is to break the representation (e.g. 64 bits) of each value into its three constituent parts - sign, exponent and mantissa; then shift the mantissa of one part based on the difference in the exponents, add to or subtract from the mantissa of the other part based on the sign bits, and possibly renormalise the result by shifting the mantissa and adjusting the exponent correspondingly. Along the way, there are a lot of fiddly details to account for, in order to avoid unnecessary loss of accuracy, and deal with special values such as infinities, NaNs, and denormalised numbers.)
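To illustrate the Kahan summation suggestion above, here is a minimal C sketch; the data and the function name are just examples, and it must be compiled without -ffast-math, which would optimize the compensation away.

#include <stdio.h>

float kahan_sum(const float *x, int n) {
    float sum = 0.0f;
    float c = 0.0f;                 /* running compensation for lost low-order bits */
    for (int i = 0; i < n; i++) {
        float y = x[i] - c;         /* apply the correction to the next term */
        float t = sum + y;          /* low-order bits of y may be lost here */
        c = (t - sum) - y;          /* recover what was lost */
        sum = t;
    }
    return sum;
}

int main(void) {
    enum { N = 100000 };
    static float data[N];
    float naive = 0.0f;
    for (int i = 0; i < N; i++) data[i] = 0.01f;
    for (int i = 0; i < N; i++) naive += data[i];
    /* Naive accumulation drifts from the true sum of the stored values;
       the compensated sum stays much closer. */
    printf("naive = %f, kahan = %f\n", naive, kahan_sum(data, N));
    return 0;
}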
Given all the constraints for high precision over 23 magnitudes, I think the most fruitful method would be to implement a custom arithmetic package.
A quick survey shows Briggs' doubledouble C++ library should address your needs and then some. See this.[*] The default implementation is based on double to achieve 30 significant figure computation, but it is readily rewritten to use float to achieve 13 or 14 significant figures. That may be enough for your requirements if care is taken to segregate addition operations with similar magnitude values, only adding extremes together in the last operations.
Beware though, the comments mention messing around with the x87 control register. I didn't check into the details, but that might make the code too non-portable for your use.
[*] The C++ source is linked from that article, but the gzipped tar was the only link that was not dead.
This is similar to the double-double arithmetic used by many compilers for long double on some machines that have only hardware double calculation support. It's also used as float-float on older NVIDIA GPUs where there's no double support. See Emulating FP64 with 2 FP32 on a GPU. This way the calculation will be much faster than a software floating-point library.
However, most microcontrollers have no hardware support for floats, so floats are implemented purely in software there. Because of that, using float-float may not improve performance, and it introduces some memory overhead for storing the extra exponent bytes.
If you really need the longer mantissa, try using a custom floating-point library. You can choose whatever is enough for you; for example, adapt the library to a new 48-bit float type of your own if only 40 bits of mantissa and 7 bits of exponent are needed, so no time is spent calculating or storing the unnecessary 16 bits. Such a library needs to be very efficient, though, because compilers' own libraries often have assembly-level optimizations for their built-in float types.
Another software-based solution that might be of use: GNU MPFR
It takes care of many special cases that you would otherwise have to handle yourself, and it allows arbitrary precision (better than 64-bit double).
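For completeness, a small usage sketch, assuming MPFR (and GMP) are installed and the program is linked with -lmpfr -lgmp; the 113-bit precision here is an arbitrary choice comfortably above double's 53 bits.

#include <stdio.h>   /* must come before mpfr.h for mpfr_printf */
#include <mpfr.h>

int main(void) {
    mpfr_t a, b, sum;
    mpfr_inits2(113, a, b, sum, (mpfr_ptr) 0);   /* 113-bit significands */
    mpfr_set_d(a, 1.0, MPFR_RNDN);
    mpfr_set_str(b, "1e-30", 10, MPFR_RNDN);
    mpfr_add(sum, a, b, MPFR_RNDN);
    /* The 1e-30 term is preserved, unlike with a plain 53-bit double. */
    mpfr_printf("%.35Rg\n", sum);
    mpfr_clears(a, b, sum, (mpfr_ptr) 0);
    return 0;
}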
That's not practical. If it were, every embedded 32-bit processor (or compiler) would emulate double precision by doing that. As it stands, none that I am aware of do. Most of them just substitute float for double.
If you need the precision and not the dynamic range, your best bet would be to use fixed point. If the compiler supports 64-bit integers, this will be easier too.
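A minimal sketch of the fixed-point idea, assuming a signed Q32.32 split stored in a 64-bit integer (the split and the helper names are mine; pick whatever matches your value range). Addition and comparison then reduce to ordinary integer operations; the double-based conversion helpers are only for illustration on a host machine.

#include <stdint.h>
#include <stdio.h>

typedef int64_t q32_32;                        /* value = raw / 2^32 */

static q32_32 q_from_double(double d) { return (q32_32)(d * 4294967296.0); }
static double q_to_double(q32_32 q)   { return (double)q / 4294967296.0; }

int main(void) {
    q32_32 a = q_from_double(1.25);
    q32_32 b = q_from_double(2.50);
    q32_32 sum = a + b;                        /* plain integer add, no rounding */
    printf("sum = %f, a < b: %d\n", q_to_double(sum), a < b);
    return 0;
}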

negative integers in binary

5 (decimal) in binary 00000101
-5 (two's complement) in binary 11111011
but 11111011 is also 251 (decimal)!
How does the computer discern one from the other??
How does it know whether it's -5 or 251??
It's THE SAME 11111011!
Thanks in advance!!
Signed bytes have a maximum of 127.
Unsigned bytes cannot be negative.
The compiler knows whether the variable holding that value is of signed or unsigned type, and treats it appropriately.
If your program chooses to treat the byte as signed, the run-time system decides whether the byte is to be considered positive or negative according to the high-order bit. A 1 in that high-order bit (bit 7, counting from the low-order bit 0) means the number is negative; a 0 in that bit position means the number is positive. So, in the case of 11111011, bit 7 is set to 1 and the number is treated, accordingly, as negative.
Because the sign bit takes up one bit position, the absolute magnitude of the number can range from 0 to 127, as was said before.
If your program chooses to treat the byte as unsigned, on the other hand, what would have been the sign bit is included in the magnitude, which can then range from 0 to 255.
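As a small illustration (the variable names are mine): the same bit pattern gives 251 or -5 purely depending on the declared type. Strictly speaking, converting 0xFB to signed char is implementation-defined in C, but on the usual two's-complement machines it yields -5.

#include <stdio.h>

int main(void) {
    unsigned char u = 0xFB;              /* 1111 1011 */
    signed char   s = (signed char)0xFB; /* same bits, signed interpretation */
    printf("%d %d\n", u, s);             /* prints: 251 -5 */
    return 0;
}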
Two's complement is designed to allow signed numbers to be added/subtracted in the same way unsigned numbers are. So there are only two cases where the signed-ness of numbers affects the computer at a low level:
when there are overflows
when you are performing operations on mixed operands: one signed, one unsigned
Different processors take different tacks for this. With regard to overflows, the MIPS RISC architecture, for example, deals with overflows using traps. See http://en.wikipedia.org/wiki/MIPS_architecture#MIPS_I_instruction_formats
To the best of my knowledge, mixing signed and unsigned needs to be avoided at the program level.
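A short sketch of why the hardware itself doesn't care during addition: adding 5 to the bit pattern 11111011 wraps to 00000000 whether that pattern is read as 251 or as -5 (251 + 5 = 256, which overflows an 8-bit register back to 0).

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t u = 251;   /* 1111 1011 as unsigned */
    int8_t  s = -5;    /* same bit pattern, signed */
    printf("%d %d\n", (uint8_t)(u + 5), (int8_t)(s + 5));   /* prints: 0 0 */
    return 0;
}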
If you're asking "how does the program know how to interpret the value" - in general it's because you've told the compiler the "type" of the variable you assigned the value to. The program doesn't actually care that 00000101 is "5 decimal"; it just has an unsigned integer with the value 00000101 that it can perform operations legal for unsigned integers upon, and it will behave in a given manner if you try to compare it with or cast it to a different "type" of variable.
At the end of the day everything in programming comes down to binary - all data (strings, numbers, images, sounds etc etc) and the compiled code just ends up as a large binary blob.

What is the name for a number which is the same in both big- and little-endian byte order?

One of my old classmates just asked me this and I'm at a total loss. Google gave me a lot of definitions of endianness, but not the term for this special case. Is there even a term for this?
See palindrome. Consider, for example, a 32-bit integer as a sequence of four byte values when stored in memory. If the sequence of four bytes is a palindrome, then it has the same integer value in both big- and little-endian. So,
all 8-bit integers are palindromes,
all 16-bit integers of the form AA (where A is a byte) are palindromes,
all 32-bit integers of the form AAAA or ABBA (where A and B are bytes) are palindromes,
and so on. Historically, there have been architectures with mixed endianness (notably the VAX), but here I'm limiting myself to pure big- or little-endian representations.
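A quick way to test the property in code is to compare a value with its byte-swapped form. This sketch uses __builtin_bswap32, which is a GCC/Clang builtin; portable code would swap the bytes by hand.

#include <stdint.h>
#include <stdio.h>

static int is_byte_palindrome32(uint32_t x) {
    return x == __builtin_bswap32(x);        /* same value in either byte order */
}

int main(void) {
    printf("%d\n", is_byte_palindrome32(0xAABBBBAAu));  /* 1: form ABBA */
    printf("%d\n", is_byte_palindrome32(0x12345678u));  /* 0 */
    return 0;
}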
how about zero :-)
The answer is, "A byte".
Palindromic, even though that term is usually used for numbers that have digits that are the same both ways.

Help Understanding 8bit Floating Point Conversions with Decimals and Binary

I'm in a basic Engineering class and we're going through binary conversions. I can figure out the base 10 to binary or hex conversions really well, but the 8-bit floating point conversions are kicking my ass and I can't find anything online that breaks it down at a n00b level and shows the steps. Wondering if any gurus have found anything online that would be helpful for this situation.
I have questions like 00101010(8bfp) = what number in base 10
Whenever I want to remember how floating point works, I refer back to the wikipedia page on 32 bit floats. I think it lays out the concepts pretty well.
http://en.wikipedia.org/wiki/Single_precision_floating-point_format
Note that wikipedia doesn't know what 8 bit floats are, I think your professor may have invented them ;)
Binary floating point formats are usually broken down into 3 fields: sign bit, exponent and mantissa. The sign bit is simply set to 1 if the entire number should be negative, and 0 if the number is positive.
The exponent is usually an unsigned int with an offset, where 2 to the 0th power (1) is in the middle of the range. It's simpler in hardware and software to compare sizes this way.
The mantissa works similarly to the mantissa in regular scientific notation, with the following caveat: the most significant bit is hidden. This is due to the requirement of normalizing scientific notation to have one significant digit above the decimal point. Remember when your math teacher in elementary school would whack your knuckles with a ruler for writing 35.648 x 10^6 or 0.35648 x 10^8 instead of the correct 3.5648 x 10^7? Since binary only has two states, this required digit above the decimal point is always one, and eliminating it allows another bit of accuracy at the low end of the mantissa.
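As a worked example under an assumed layout (8-bit formats vary, so check your course's definition): with 1 sign bit, 4 exponent bits biased by 7, and 3 mantissa bits, 00101010 has sign 0, exponent 0101 = 5, mantissa 010, so its value is 1.010 (binary) x 2^(5-7) = 1.25 x 0.25 = 0.3125. A small C sketch of that decoding, ignoring subnormals, infinities and NaN (link with -lm for pow):

#include <stdio.h>
#include <math.h>

double decode8(unsigned char v) {
    int sign = (v >> 7) & 0x1;          /* bit 7 */
    int exp  = (v >> 3) & 0xF;          /* bits 6..3, biased by 7 */
    int mant =  v       & 0x7;          /* bits 2..0 */
    double frac = 1.0 + mant / 8.0;     /* hidden leading 1 */
    double val  = frac * pow(2.0, exp - 7);
    return sign ? -val : val;
}

int main(void) {
    printf("%g\n", decode8(0x2A));      /* 00101010 -> 0.3125 */
    return 0;
}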

Are digits represented in sequence in all text encodings?

This question is language agnostic but is inspired by these c/c++ questions.
How to convert a single char into an int
Char to int conversion in C
Is it safe to assume that the characters for digits (0123456789) appear contiguously in all text encodings?
i.e. is it safe to assume that
'9'-'8' = 1
'9'-'7' = 2
...
'9'-'0' = 9
in all encodings?
I'm looking forward to a definitive answer to this one :)
Thanks,
Update: OK, let me limit all encodings to mean anything as old as ASCII and/or EBCDIC and afterwards. Sanskrit I'm not so worried about . . .
I don't know about all encodings, but at least in ASCII and <shudder> EBCDIC, the digits 0-9 all come consecutively and in increasing numeric order. Which means that all ASCII- and EBCDIC-based encodings should also have their digits in order. So for pretty much anything you'll encounter, barring Morse code or worse, I'm going to say yes.
You're going to find it hard to prove a negative. Nobody can possibly know every text encoding ever invented.
All encodings in common use today (except EBCDIC, is it still in common use?) are supersets of ASCII. I'd say you're more likely to win the lottery than you are to find a practical environment where the strict ordering of '0' to '9' doesn't hold.
Both the C++ Standard and the C standard require that this be so, for C++ and C program text.
According to K&R ANSI C it is.
Excerpt:
..."This particular program relies on the properties of the character representation of the digits. For example, the test
if (c >= '0' && c <= '9') ...
determines whether the character in c is a digit. If it is, the numeric value of that
digit is
c - '0'
This works only if '0', '1', ..., '9' have consecutive increasing values. Fortunately, this is true for all character sets...."
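As a minimal sketch of that idiom in use (the function name is illustrative), converting a digit string to an integer relies only on '0'..'9' being consecutive:

#include <stdio.h>

int parse_uint(const char *s) {
    int n = 0;
    while (*s >= '0' && *s <= '9') {
        n = n * 10 + (*s - '0');   /* works because the digits are contiguous */
        s++;
    }
    return n;
}

int main(void) {
    printf("%d\n", parse_uint("451"));   /* prints 451 */
    return 0;
}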
All text encodings I know of typically order each representation of digits sequentially. However, your question becomes a lot broader when you include all of the other representations of digits in other encodings, such as the full-width digits used in Japanese text: １２３４５６７８９０. Notice how the characters for the numbers are different? They are actually different code points. So I really think the answer to your question is a hard maybe, since there are so many encodings out there, and many of them have multiple representations of digits.
A better question is to ask yourself, why do I need to count on digits to be in sequential code points in the first place?