Not sure if the title of my question makes sense, so bear with me. I'd like to find a system for representing single digit numbers with as few bits as possible. There is a method called "Densely packed decimal" (https://en.wikipedia.org/wiki/Densely_packed_decimal) which would be my ideal solution, but I wouldn't even know if that's possible or how I could implement it without further research or guidance from a guru.
The next best thing would be to be able to use a 4-bit addressing system to represent digits, but once again I'm not sure if that is even possible.
So! Barring implementations of the above methods/systems, I could settle for a 1-byte data type which I could use to represent pairs of single-digit integers. Is there a 1-byte data type in Fortran, or does it not allow for that level of control?
There is a 1-byte datatype in (almost) every programming language: the character. That is essentially the definition of a byte, the amount of storage needed to hold a default character.
There is also a 1-byte (strictly speaking 1-octet) integer type in Fortran, accessible as integer(int8) where int8 is a constant from the iso_fortran_env module (Fortran 2008).
Both can be used to implement such things. Whether you use division by other numbers, XOR-ing, or Fortran's bit manipulation intrinsic functions https://www.nsc.liu.se/~boein/f77to90/a5.html#section10 (probably the best option) is up to you.
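To make the bit-manipulation route concrete, here is a minimal sketch (in C++ purely for illustration) of packing two decimal digits into one byte with a shift and a mask. In Fortran the same operations are spelled IOR, IAND and ISHFT on an integer(int8), and IBITS can do the extraction in a single call.

    #include <cstdint>
    #include <iostream>

    // Pack two decimal digits (0-9 each) into one byte:
    // the high nibble holds 'hi', the low nibble holds 'lo'.
    std::uint8_t pack_digits(unsigned hi, unsigned lo) {
        return static_cast<std::uint8_t>((hi << 4) | (lo & 0x0F));
    }

    // Unpack them again with a shift and a mask.
    void unpack_digits(std::uint8_t packed, unsigned& hi, unsigned& lo) {
        hi = packed >> 4;     // upper 4 bits
        lo = packed & 0x0F;   // lower 4 bits
    }

    int main() {
        std::uint8_t b = pack_digits(3, 7);
        unsigned hi = 0, lo = 0;
        unpack_digits(b, hi, lo);
        std::cout << hi << lo << "\n";   // prints 37
    }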
Related
I was wondering why a computer would need binary code converters to convert from BCD to Excess-3, for example. Why is this necessary? Can't computers just use one form of binary code?
Some older forms of binary representation persist even after a newer, "better" form comes into use. For example, legacy hardware that is still in use, running legacy code that would be too costly to rewrite. Word lengths were not standardized in the early years of computing, so machines with words varying from 5 to 12 bits in length will naturally require different schemes for representing the same numbers.
In some cases, a company might persist in using a particular representation for compatibility with its own older products, or because it's an ingrained habit or "the company way." For example, the use of big-endian representation in Motorola and PowerPC chips vs. little-endian representation in Intel chips. (Though note that many PowerPC processors support both types of endianness, even if manufacturers typically only use one in a product.)
The previous paragraph only really touches upon byte ordering, but that can still be an issue for data interchange.
Even for BCD, there are many ways to store it (e.g., 1 BCD digit per word, or 2 BCD digits packed per byte). IBM has a clever representation called zoned decimal where they store a value in the high-order nybble which, combined with the BCD value in the low-order nybble, forms an EBCDIC character representing the value. This is pretty useful if you're married to the concept of representing characters using EBCDIC instead of ASCII (and using BCD instead of 2's complement or unsigned binary).
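As a sketch of the idea (unsigned case only; on real IBM hardware the zone of the last byte usually carries the sign instead of 0xF, conventionally 0xC for positive and 0xD for negative):

    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <vector>

    // Unsigned zoned decimal: each byte is zone nibble 0xF plus one BCD
    // digit, which is exactly the EBCDIC code for that digit ('0' = 0xF0
    // through '9' = 0xF9).
    std::vector<std::uint8_t> to_zoned(unsigned value) {
        std::vector<std::uint8_t> out;
        for (char c : std::to_string(value))
            out.push_back(static_cast<std::uint8_t>(0xF0 | (c - '0')));
        return out;
    }

    int main() {
        for (std::uint8_t b : to_zoned(351))
            std::printf("%02X ", b);   // prints F3 F5 F1
        std::printf("\n");
    }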
Tangentially related: IBM mainframes from the 1960s apparently converted BCD into an intermediate form called qui-binary before performing an arithmetic operation, then converted the result back to BCD. This is sort of a Rube Goldberg contraption, but according to the linked article, the intermediate form gives some error detection benefits.
The IBM System/360 (and probably a bunch of newer machines) supported both packed BCD and pure binary representations, though you have to watch out for IBM nomenclature — I have heard an old IBMer refer to BCD as "binary," and pure binary (unsigned, 2's complement, whatever) as "binary coded hex." This provides a lot of flexibility; some data may naturally be best represented in one format, some in the other, and the machine provides instructions to convert between forms conveniently.
In the case of floating point arithmetic, there are some values that cannot be represented exactly in binary floating point, but can be with BCD or a similar representation. For example, the number 0.1 has no exact binary floating point equivalent. This is why BCD and fixed-point arithmetic are preferred for things like representing amounts of currency, where you need to exactly represent things like $3.51 and can't allow floating point error to creep in when adding.
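A small illustration of both points, accumulating 0.1 in a double versus keeping currency in integer cents (the exact digits printed assume IEEE 754 doubles):

    #include <cstdio>

    int main() {
        // 0.1 has no exact binary floating point representation, so
        // repeated addition drifts away from the "obvious" answer.
        double total = 0.0;
        for (int i = 0; i < 10; ++i) total += 0.1;
        std::printf("%.17f\n", total);   // prints 0.99999999999999989, not 1

        // Fixed-point alternative: keep currency amounts in integer cents.
        long long cents = 351;           // $3.51, represented exactly
        cents += 10 * 49;                // add ten items at $0.49 each
        std::printf("$%lld.%02lld\n", cents / 100, cents % 100);   // $8.41
    }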
Intended application is important. Arbitrary precision arithmetic (e.g., Java's BigDecimal class) requires a different representation strategy than the fixed-width registers in your CPU. Many languages support arbitrary precision (e.g., Scheme, Haskell), though the underlying implementation of arbitrary precision numbers varies. I'm honestly not sure what is preferable for arbitrary precision, a BCD-type scheme or a denser pure binary representation. In the case of Java's BigDecimal, conversion from binary floating point to BigDecimal is best done by first converting to a String; this makes such conversions potentially inefficient, so you really need to know ahead of time whether float or double is good enough, or whether you really need arbitrary precision, and when.
Another tangent: Groovy, a JVM language, quietly treats all floating point numeric literals in code as BigDecimal values, and uses BigDecimal arithmetic in preference to float or double. That's one reason Groovy is very popular with the insurance industry.
tl;dr There is no one-size-fits-all numeric data type, and as long as that remains the case (probably the heat death of the universe), you'll need to convert between representations.
The semantics of integers and doubles are quite different. Lua recently added integer support as well (even though only as a subtype of number). Python is, in a sense, perfect in its type completeness, and I am not even talking about heavier-weight languages like C++/C#/Java ...
There are systems that treat integers significantly differently from floating point numbers (or doubles), and when using JSON it is a real mental burden that all the integers written to the wire come back as doubles. In the high-level application logic one can probably differentiate based on the property, but the extra double-to-int cast makes the code unintuitive and misleading; people ask questions like: why do you cast here? Are you sure this is an integer?
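For illustration only (the Payload alias, the as_int helper, and the field names are made up, and the map just stands in for whatever a JSON decoder hands back), this is the kind of cast I mean:

    #include <cmath>
    #include <map>
    #include <stdexcept>
    #include <string>

    // Stand-in for a decoded JSON object: like many decoders, every
    // number comes back as a double, even logically integral fields.
    using Payload = std::map<std::string, double>;

    // The cast that draws questions in code review: "why the cast?
    // are you sure this is an integer?"
    int as_int(const Payload& p, const std::string& key) {
        double raw = p.at(key);
        if (std::floor(raw) != raw)
            throw std::runtime_error(key + " is not an integral value");
        return static_cast<int>(raw);
    }

    int main() {
        Payload msg{{"retry_count", 3.0}, {"timeout_sec", 2.5}};
        int retries = as_int(msg, "retry_count");   // works, but reads oddly
        (void)retries;
    }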
So in a sense, the intent is not completely clear when there is no explicit integer support.
Can anyone shed some light?
Thanks!
I hope this isn't too opinionated for SO; it may not have a good answer.
In a portion of a library I'm writing, I have a byte array that gets populated with values supplied by the user. These values might be of type Float, Double, Int (of different sizes), etc. with binary representations you might expect from C, say. This is all we can say about the values.
I have an opportunity for an optimization: I can initialize my byte array with the byte MAGIC, and then whenever no byte of the user-supplied value is equal to MAGIC I can take a fast path, otherwise I need to take the slow path.
So my question is: what is a principled way to go about choosing my magic byte, such that it will be reasonably likely not to appear in the (variously-encoded and distributed) data I receive?
Part of my question, I suppose, is whether there's something like a Benford's law that can tell me something about the distribution of bytes in many sorts of data.
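Concretely, the check I have in mind looks something like this (MAGIC = 0xFE is only a placeholder value, and the helper name is made up):

    #include <cstddef>
    #include <cstring>

    constexpr unsigned char MAGIC = 0xFE;   // placeholder choice

    // The fast path is allowed only if no byte of the user-supplied value,
    // viewed as raw bytes, happens to equal MAGIC.
    template <typename T>
    bool fast_path_ok(const T& value) {
        unsigned char bytes[sizeof(T)];
        std::memcpy(bytes, &value, sizeof(T));   // portable "view as bytes"
        return std::memchr(bytes, MAGIC, sizeof(T)) == nullptr;
    }

    int main() {
        double d = 3.14;
        bool ok = fast_path_ok(d);   // true iff no byte of d equals MAGIC
        (void)ok;
    }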
Capture real-world data from a diverse set of inputs that would be used by applications of your library.
Write a quick and dirty program to analyze the dataset (a sketch is given below). It sounds like what you want to know is which bytes are most frequently totally excluded. So the output of the program would say, for each byte value, how many inputs do not contain it.
This is not the same as least frequent byte. In data analysis you need to be careful to mind exactly what you're measuring!
Use the analysis to define your architecture. If there is no byte value that is reliably absent from the data, you can abandon the optimization entirely.
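A rough sketch of such an analysis program, counting for each byte value how many input files exclude it entirely (note again that this is not the same as finding the least frequent byte):

    #include <array>
    #include <cstdio>
    #include <fstream>

    int main(int argc, char** argv) {
        // excluded_from[b] = number of input files containing no byte b
        std::array<long, 256> excluded_from{};
        for (int i = 1; i < argc; ++i) {
            std::ifstream in(argv[i], std::ios::binary);
            std::array<bool, 256> seen{};
            char c;
            while (in.get(c))
                seen[static_cast<unsigned char>(c)] = true;
            for (int b = 0; b < 256; ++b)
                if (!seen[b]) ++excluded_from[b];
        }
        for (int b = 0; b < 256; ++b)
            std::printf("byte 0x%02X absent from %ld of %d inputs\n",
                        b, excluded_from[b], argc - 1);
    }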
I was inclined to use byte 255, but I discovered that it is also prevalent in MS Word files. So I now use byte 254 as an EOF code to terminate a file.
Although this question is not obviously related to a program, I think that it is quite interesting and it will help me with a program I am working on.
My question is this:
Computers are binary systems and have 3 fundamental operations available to them: AND, OR, and NOT (as I understand it), from which all of their other functions are derived. I can understand how the system can perform arithmetic on binary numbers using these operators, but how can the system then convert these numbers into decimal for the user without using the conventional operators (i.e. +, -, *, /)?
You have BCD or IEEE floating point to deal with decimal in a binary system. There are other specifications, but these are the most common, and IEEE floating point is the one computers use nowadays, I think.
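To answer the "how" more directly, one classic technique is the double-dabble (shift-and-add-3) algorithm, which converts a binary value into BCD digits using nothing but shifts, comparisons, and small additions; each of those is in turn built out of AND/OR/NOT gates in hardware. A minimal C++ sketch for 8-bit values:

    #include <cstdint>
    #include <cstdio>

    // Double dabble: convert an 8-bit binary value into packed BCD.
    // Layout of 'scratch': bits 0-7 hold the remaining binary value,
    // bits 8-11 the ones digit, 12-15 the tens, 16-19 the hundreds.
    std::uint32_t binary_to_bcd(std::uint8_t value) {
        std::uint32_t scratch = value;
        for (int i = 0; i < 8; ++i) {
            // Before each shift, add 3 to any BCD digit that is 5 or more.
            for (int d = 0; d < 3; ++d) {
                std::uint32_t digit = (scratch >> (8 + 4 * d)) & 0xF;
                if (digit >= 5)
                    scratch += 3u << (8 + 4 * d);
            }
            scratch <<= 1;   // shift the next binary bit into the BCD area
        }
        return scratch >> 8;   // drop the (now empty) binary area
    }

    int main() {
        // prints 237: BCD digits 2, 3, 7
        std::printf("%X\n", static_cast<unsigned>(binary_to_bcd(237)));
    }

Once the value is in BCD, each 4-bit digit can be mapped to its character code and displayed, so the whole path from binary to something readable really does reduce to the three fundamental operations the question mentions.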
Last night I was thinking that programming languages could have a feature that lets us constrain the values assigned to primitive data types.
For example, I should be able to say that my variable of type int can only hold values between 0 and 100:
int<0, 100> progress;
This would then act as a normal integer in all scenarios except that you won't be able to assign values outside the range defined in the constraint. The compiler will not compile the code progress = 200.
This constraint can be carried over with type information.
Is this possible? Is it done in any programming language? If yes then which language has it and what is this technique called?
It is generally not possible to enforce such constraints at compile time. It makes little sense to use integers without any arithmetic operators, and with arithmetic operators you have this:
int<0,100> x, u, v;
...
x = u + v; // is it in range?
If you're willing to do checks at run-time, then yes, several mainstream languages support it, starting with Pascal.
I believe Pascal (and Delphi) offers something similar with subrange types.
I think this is not possible at all in Java and in Ruby (well, in Ruby probably it is possible, but requires some effort). I have no idea about other languages, though.
Ada allows something like what you describe with ranges:
type My_Int is range 1..100;
So if you try to assign a value to a My_Int that's less than 1 or greater than 100, Ada will raise the exception Constraint_Error.
Note that I've never used Ada. I've only read about this feature, so do your research before you plunge in.
It is certainly possible. There are many different techniques to do that, but 'dependent types' is the most popular.
The constraints can even be checked statically at compile time by the compiler. See, for example, Agda2 and ATS (ats-lang.org).
Weaker forms of your 'range types' are possible without full dependent types, I think.
Some keywords to search for research papers:
- Guarded types
- Refinement types
- Subrange types
Certainly! In case you missed it: C. Do you C? You don't C? You don't count short as a constraint on Integer? Ok, so C only gives you pre-packaged constrained types.
BTW: It seems the answer that Pascal has subrange types misses the point of them. In Pascal, array bounds violations are not possible. This is because the array index must be of the same type as the array was declared with. In turn this means that to use an integer index you must coerce it down to the subrange, and that is where the run-time check is done, not when accessing the array.
This is a very important idea because it means a for loop over an array index type may access the array components safely without any run time checking.
Pascal has subranges. Ada extended that a bit, so you can do something like a subrange, or you can create an entirely new type with characteristics of the existing type, but not compatible with it (e.g., even if it was in the right range, you wouldn't be able to assign an Integer to your new type based off of Integer).
C++ doesn't support the idea directly, but is flexible enough that you can implement it if you want to. If you decide to support all the compound assignment operators (+=, -=, *=, etc.) this can be a lot of work though.
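To make the C++ remark concrete, here is a minimal sketch of a run-time-checked range type (the class name ranged_int and its limited operator set are my own choices, not a standard facility; a full implementation would need the rest of the compound assignment and arithmetic operators):

    #include <stdexcept>

    // A run-time-checked range type, roughly what a Pascal subrange or an
    // Ada range type provides.
    template <int Lo, int Hi>
    class ranged_int {
        int v_;
        static int check(int x) {
            if (x < Lo || x > Hi)
                throw std::out_of_range("value outside range");
            return x;
        }
    public:
        ranged_int(int x = Lo) : v_(check(x)) {}
        ranged_int& operator=(int x) { v_ = check(x); return *this; }
        operator int() const { return v_; }   // reads behave like a normal int
        ranged_int& operator+=(int x) { return *this = v_ + x; }
        ranged_int& operator-=(int x) { return *this = v_ - x; }
    };

    int main() {
        ranged_int<0, 100> progress = 42;
        progress += 10;       // fine: 52
        try {
            progress = 200;   // the check happens at run time, not compile time
        } catch (const std::out_of_range&) {
            // out-of-range assignment rejected
        }
    }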
Other languages that support operator overloading (e.g., ML and company) can probably support it in much the same way as C++.
Also note that there are a few non-trivial decisions involved in the design. In particular, if the type is used in a way that could/does result in an intermediate result that overflows the specified range, but produces a final result that's within the specified range, what do you want to happen? Depending on your situation, that might be an error, or it might be entirely acceptable, and you'll have to decide which.
I really doubt that you can do that. After all, these are primitive datatypes, with emphasis on primitive!
Adding a constraint would make the type a subclass of its primitive form, thus extending it.
From Wikipedia:
a basic type is a data type provided by a programming language as a basic building block. Most languages allow more complicated composite types to be recursively constructed starting from basic types.
a built-in type is a data type for which the programming language provides built-in support.
So personally, even if it is possible, I wouldn't do it, since it's bad practice. Instead, just create an object that holds this type and the constraints (a solution I am sure you have already thought of).
SQL has domains, which consist of a base type together with a dynamically-checked constraint. For example, you might define the domain telephone_number as a character string with an appropriate number of digits, valid area code, etc.