The semantics of integers and doubles are quite different. Lua recently added integer support as well (albeit as a subtype of number). Python, in a sense, is perfect in its type completeness, and I am not even talking about heavier-weight languages like C++/C#/Java ...
There are systems that treat integers very differently from floating-point numbers (or doubles), and when using JSON it is a real mental burden that all the integers written to the wire come back as doubles. At the application-logic level one can probably differentiate based on the property, but the extra double-to-int cast makes the code unintuitive and misleading; people would ask questions like: why do you cast here? Are you sure this is an integer?
So in a sense, the intent is not completely clear when there is no explicit integer support.
Can anyone shed some light?
Thanks!
Not sure if the title of my question makes sense, so bear with me. I'd like to find a system for representing single digit numbers with as few bits as possible. There is a method called "Densely packed decimal" (https://en.wikipedia.org/wiki/Densely_packed_decimal) which would be my ideal solution, but I wouldn't even know if that's possible or how I could implement it without further research or guidance from a guru.
The next best thing would be to be able to use a 4-bit addressing system to represent digits, but once again I'm not sure if that is even possible.
So! Barring implementations of the above methods/systems, I could settle for a 1-byte data type which I could use to represent pairs of two integers. Is there a 1-byte data-type in Fortran, or does it not allow for that level of control?
There is a 1-byte data type in (almost) every programming language: the character. It is in fact part of the definition of a byte that it can hold a default character.
There is also a 1-byte (strictly speaking 1-octet) integer type in Fortran, accessible as integer(int8) where int8 is a constant from the iso_fortran_env module (Fortran 2008).
Both can be used to implement such things. Whether you use division and modulo, XOR-ing, or Fortran's bit-manipulation intrinsic functions https://www.nsc.liu.se/~boein/f77to90/a5.html#section10 (probably the best option) is up to you.
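To make the packing idea concrete, here is a minimal sketch of storing two decimal digits in one byte (written in C++ rather than Fortran purely as a language-neutral illustration; the same masks and shifts map directly onto Fortran's IAND, IOR and ISHFT intrinsics):

#include <cassert>
#include <cstdint>
#include <iostream>

// Pack two decimal digits (0-9 each) into one byte:
// the high nibble holds the first digit, the low nibble the second.
std::uint8_t pack_digits(int hi, int lo) {
    assert(hi >= 0 && hi <= 9 && lo >= 0 && lo <= 9);
    return static_cast<std::uint8_t>((hi << 4) | lo);
}

void unpack_digits(std::uint8_t packed, int& hi, int& lo) {
    hi = (packed >> 4) & 0x0F;  // upper four bits
    lo = packed & 0x0F;         // lower four bits
}

int main() {
    std::uint8_t b = pack_digits(7, 3);
    int hi = 0, lo = 0;
    unpack_digits(b, hi, lo);
    std::cout << hi << lo << "\n";  // prints 73
}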
Flash is known to behave in very unpredictable ways when it comes to manipulating data. I'm curious whether there is any performance/memory benefit to using Numbers instead of ints, aside from values that need precision. I have heard that some basic operations in Flash may convert multiple times between the two types to resolve an expression. I've also heard that the Flash runtime, under the hood, actually maps ints to non-precision Numbers/Floats at runtime. Is any of this true?
Flash runtime is a dark place indeed.
As you mentioned, AVM2 does convert large ints into Number.
Whole Numbers are actually ints.
And there's more stuff about ints.
Uints used to be slow when used in a loop, BUT NOW THEY ARE NOT (the results in the article seem to be a combination of weird bytecode generation and JIT optimizations).
Numbers take more space in memory but this is nothing compared to even a single JPEG file.
Logically it feels better to use uints in loops.
Numbers are floating-point values, so you have to be careful when comparing them.
Jackson Dunstan does pretty good tests of the performance of different AS3 language constructs. Of course it's always good to check the results yourself. From the series about Flash Player 10.2 performance you can see that with every new Flash Player version some things get optimized while other things might get slower: 1 2 3.
P.S. This answer might become outdated very soon, yet it may well still be cited a couple of years from now, at which point it will of course be wrong.
You don't get any real performance benefit with int over Number. So if you're not using a variable for stuff like loop indices or things that require exact increments, Number is fine. In fact, a Number can be NaN if you get an invalid result, which is a nice benefit.
Last night I was thinking that programming languages could have a feature that lets us constrain the values assigned to primitive data types.
For example, I should be able to say that my variable of type int can only hold values between 0 and 100:
int<0, 100> progress;
This would then act as a normal integer in all scenarios, except that you would not be able to assign values outside the range defined in the constraint. The compiler would refuse to compile progress = 200.
This constraint would be carried along with the type information.
Is this possible? Is it done in any programming language? If yes then which language has it and what is this technique called?
It is generally not possible. It makes little sense to use integers without any arithmetic operators. With arithmetic operators you have this:
int<0,100> x, u, v;
...
x = u + v; // is it in range?
If you're willing to do checks at run-time, then yes, several mainstream languages support it, starting with Pascal.
I believe Pascal (and Delphi) offers something similar with subrange types.
I think this is not possible at all in Java and in Ruby (well, in Ruby probably it is possible, but requires some effort). I have no idea about other languages, though.
Ada allows something like what you describe with ranges:
type My_Int is range 1..100;
So if you try to assign a value to a My_Int that's less than 1 or greater than 100, Ada will raise the exception Constraint_Error.
Note that I've never used Ada. I've only read about this feature, so do your research before you plunge in.
It is certainly possible. There are many different techniques to do that, but 'dependent types' is the most popular.
The constraints can even be checked statically at compile time by the compiler. See, for example, Agda2 and ATS (ats-lang.org).
Weaker forms of your 'range types' are possible without full dependent types, I think.
Some keywords to search for research papers:
- Guarded types
- Refinement types
- Subrange types
Certainly! In case you missed it: C. Do you C? You don't C? You don't count short as a constraint on Integer? Ok, so C only gives you pre-packaged constrained types.
BTW: It seems the answer that Pascal has subrange types misses the point of them. In Pascal, array bounds violations are not possible. This is because the array index must be of the same type as the array was declared with. In turn this means that to use an integer index you must coerce it down to the subrange, and that is where the run-time check is done, not when accessing the array.
This is a very important idea because it means a for loop over an array index type may access the array components safely without any run time checking.
Pascal has subranges. Ada extended that a bit, so you can do something like a subrange, or you can create an entirely new type with characteristics of the existing type, but not compatible with it (e.g., even if it was in the right range, you wouldn't be able to assign an Integer to your new type based off of Integer).
C++ doesn't support the idea directly, but it is flexible enough that you can implement it if you want to (a minimal sketch follows below). If you decide to support all the compound assignment operators (+=, -=, *=, etc.), though, this can be a lot of work.
Other languages that support operator overloading (e.g., ML and company) can probably support it in much the same way as C++.
Also note that there are a few non-trivial decisions involved in the design. In particular, if the type is used in a way that could/does result in an intermediate result that overflows the specified range, but produces a final result that's within the specified range, what do you want to happen? Depending on your situation, that might be an error, or it might be entirely acceptable, and you'll have to decide which.
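To make the C++ route concrete, here is a minimal run-time-checked sketch (the name bounded_int and the choice of exception are my own, not from any library); a genuinely compile-time version would need the static checking or dependent types discussed in the other answers.

#include <stdexcept>

// A minimal run-time-checked range type, roughly what int<0,100> might mean.
// Bounds are template parameters; violations throw instead of failing to compile.
template <int Lo, int Hi>
class bounded_int {
    int value_;
    static int check(int v) {
        if (v < Lo || v > Hi) throw std::out_of_range("value outside range");
        return v;
    }
public:
    bounded_int(int v = Lo) : value_(check(v)) {}
    bounded_int& operator=(int v) { value_ = check(v); return *this; }
    operator int() const { return value_; }                        // read back as a plain int
    bounded_int& operator+=(int v) { return *this = value_ + v; }  // re-check after arithmetic
};

int main() {
    bounded_int<0, 100> progress = 50;
    progress += 40;   // fine: 90 is still in range
    progress = 200;   // throws std::out_of_range at run time
}

Note that the intermediate-overflow question from the previous paragraph shows up here too: every operation re-checks its result, so an out-of-range intermediate value is rejected even if the final result would have been in range.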
I really doubt that you can do that. After all, these are primitive data types, with emphasis on primitive!
Adding a constraint would make the type a subtype of its primitive form, thus extending it.
From Wikipedia:
a basic type is a data type provided by a programming language as a basic building block. Most languages allow more complicated composite types to be recursively constructed starting from basic types.
a built-in type is a data type for which the programming language provides built-in support.
So personally, even if it is possible, I wouldn't do it, since it's bad practice. Instead, just create an object that wraps this type and its constraints (a solution I am sure you have already thought of).
SQL has domains, which consist of a base type together with a dynamically-checked constraint. For example, you might define the domain telephone_number as a character string with an appropriate number of digits, valid area code, etc.
When programming in a C-like language, should one's "default" integer type be int or uint/unsigned int? By "default" I mean cases where you don't need negative numbers but either one would easily be big enough for the data you're holding. I can think of good arguments for both:
signed: Better-behaved mathematically, less possibility of weird behavior if you try to go below zero in some boundary case you didn't think of, generally avoids odd corner cases better.
unsigned: Provides a little extra assurance against overflow, just in case your assumptions about the values are wrong. Serves as documentation that the value represented by the variable should never be negative.
The Google C++ Style Guide has an interesting opinion on unsigned integers:
(quote follows:)
On Unsigned Integers
Some people, including some textbook authors, recommend using unsigned types to represent numbers that are never negative. This is intended as a form of self-documentation. However, in C, the advantages of such documentation are outweighed by the real bugs it can introduce. Consider:
for (unsigned int i = foo.Length()-1; i >= 0; --i) ...
This code will never terminate! Sometimes gcc will notice this bug and warn you, but often it will not. Equally bad bugs can occur when comparing signed and unsigned variables. Basically, C's type-promotion scheme causes unsigned types to behave differently than one might expect.
So, document that a variable is non-negative using assertions. Don't use an unsigned type.
(end quote)
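To see the quoted bug in isolation, here is a small sketch (using a std::vector as a stand-in for foo.Length()) of the wrap-around and one signed rewrite:

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v{10, 20, 30};

    // Buggy: i is unsigned, so "i >= 0" is always true.
    // When i reaches 0, --i wraps around to a huge value instead of -1.
    // for (unsigned int i = v.size() - 1; i >= 0; --i) { ... }   // never terminates

    // One fix: use a signed index and convert the size explicitly.
    for (int i = static_cast<int>(v.size()) - 1; i >= 0; --i)
        std::printf("%d\n", v[i]);
}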
Certainly signed. If overflow worries you, underflow should worry you more, because going "below zero" by accident is easier than going above INT_MAX.
"unsigned" should be a conscious choice that makes the developer think about the potential risks, used only where you are absolutely sure that you can never go negative (not even accidentally), and that you need the additional value space.
As a rough rule of thumb, I use unsigned ints for counting things, and signed ints for measuring things.
If you find yourself decrementing or subtracting from an unsigned int, then you should be in a context where you already expect to be taking great care not to underflow (for instance, because you're in some low-level code stepping back from the end of a string, so of course you have first ensured that the string is long enough to support this). If you aren't in a context like that, where it's absolutely critical that you don't go below zero, then you should have used a signed value.
In my usage, unsigned ints are for values which absolutely cannot go negative (or for that one in a million situation where you actually want modulo 2^N arithmetic), not for values which just so happen not to be negative, in the current implementation, probably.
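If you do stay unsigned in such low-level code, one careful idiom is to test before decrementing, sketched here with a std::size_t index (my own example, not from the answer above):

#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v{10, 20, 30};

    // Test-then-decrement: the condition reads i and then decrements it,
    // so the body sees i = size-1, ..., 1, 0 and the loop exits before
    // any wrapped-around index could be used.
    for (std::size_t i = v.size(); i-- > 0; )
        std::printf("%d\n", v[i]);
}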
I tend to go with signed, unless I know I need unsigned, as int is typically signed, and it takes more effort to type unsigned int, and uint may cause another programmer a slight pause to think about what the values can be.
So, I don't see any benefit to just defaulting to an unsigned, since the normal int is signed.
You don't get much "assurance against overflow" with unsigned. You're as likely to get different but stranger behaviour than with signed, just slightly later... Better to get those assumptions right beforehand, maybe?
Giving a more specific type assignment (like unsigned int) conveys more information about the usage of the variable, and can help the compiler to keep track of any times when you're assigning an "incorrect" value. For instance, if you're using a variable to track the database ID of an object/element, there (likely) should never be a time when the ID is less than zero (or one); in this sort of case, rather than asserting that state, using an unsigned integer value conveys that statement to other developers as well as the compiler.
I doubt there is a really good language-agnostic answer to this. There are enough differences between languages and how they handle mixed types that no one answer is going to make sense for all (or even most).
In the languages I use most often, I use signed unless I have a specific reason to do otherwise. That's mostly C and C++ though. In another language, I might well give a different answer.
I wanted to see if folks were using decimal for financial applications instead of double. I have seen lots of folks using double all over the place with unintended consequences...
Do you see others making this mistake?
We did, unfortunately, and we regret it. We had to change all doubles to decimals. Decimals are good for financial applications. You can look at this article:
A Money type for the CLR:
A convenient, high-performance money structure for the CLR which handles arithmetic operations, currency types, formatting, and careful distribution and rounding without loss.
Yes, using float or double for financials is a common mistake, leading to much, much pain. decimal is the most obvious choice in this scenario.
For general knowledge, a good discussion of each is here (float/double) and here (decimal).
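For a concrete feel of the problem, here is a tiny C++ sketch (C++ has no built-in decimal type, so integer cents stand in for it here) of why binary floating point surprises people in money code:

#include <cstdio>

int main() {
    // Ten payments of 0.10 do not sum to exactly 1.00 in binary floating point.
    double total = 0.0;
    for (int i = 0; i < 10; ++i) total += 0.10;
    std::printf("%.17f\n", total);   // prints something like 0.99999999999999989, not 1
    std::printf("%s\n", total == 1.0 ? "equal" : "not equal");   // not equal

    // A common workaround: keep amounts as an integer count of cents.
    long long cents = 0;
    for (int i = 0; i < 10; ++i) cents += 10;   // 10 cents each
    std::printf("%lld cents\n", cents);          // exactly 100
}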
This is not as obvious as you may think. I recently had the controller of a large corporation tell me that he wanted his financial reports to match what Excel would generate, which is maintaining calculated results internally at maximum precision and only rounding at the last minute for display purposes. This means that you can't always match the Excel answers by manual calculations using only displayed values. His explanation was that there were multiple algorithms for generating the results, each one doing rounding at a different place using decimal values, therefore potentially generating conflicting answers, but the Excel method always generated the same answer.
I personally think he's wrong, but with so many financial people using Excel without understanding how to use it properly for financial calculations, I'll bet there's a lot of people agreeing with this controller.
I don't want to start a religious war, but I'd love to hear other opinions on this.
If it is a "scientific" measurement (I mean weight, length, area, etc.), use double.
If it is financial, or has anything to do with law (e.g. the area of a property), then use decimal.
The hard part is rounding.
If the tax is 2.4%, do you round in the details or after the sum?
Most of the time you have to do both (AND fix the differences).
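A small sketch of that 2.4% situation (my own numbers, chosen to expose the mismatch): rounding each line's tax and rounding the tax on the sum can differ by a cent, which is exactly the difference you then have to reconcile.

#include <cmath>
#include <cstdio>

// Round a monetary amount to whole cents.
double round_cents(double amount) {
    return std::round(amount * 100.0) / 100.0;
}

int main() {
    const double rate = 0.024;                  // 2.4% tax
    const double lines[] = {1.05, 1.05, 1.05};  // three identical line items

    double tax_per_line = 0.0, total = 0.0;
    for (double line : lines) {
        tax_per_line += round_cents(line * rate);  // round each line's tax
        total += line;
    }
    double tax_on_total = round_cents(total * rate);  // round once, on the sum

    std::printf("per-line: %.2f  on total: %.2f\n", tax_per_line, tax_on_total);
    // per-line: 0.09  on total: 0.08 -- the one-cent difference you have to fix
}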
I've run into this a few times. Many languages have nothing of the sort built in, and to someone who doesn't understand the problem it seems like just another hassle, especially if it looks like it works as intended without it.
I have always used Decimal. At least when I had a language that supports it. Otherwise, rounding errors will kill you.
I totally agree with the correctness issues of floating point vs. decimal mentioned above, but many financial applications are performance-critical.
In such cases you may consider using float/double, as decimal has a big impact on performance in systems where decimal types are not supported in hardware. It is still possible to wrap floating-point types in higher-level classes (e.g. Tax, Commission, Balance, Dividend, Quote, Tick, etc.) that represent the domain model and encapsulate all the rounding logic, as well as the valid operators on these types and their interactions.
And yes - in some projects I have implemented custom rounding functions to squeeze up to 20% more out of calculations compared to .NET or win32 methods.
Another thing to consider is whether you pass your objects out of process: serializing decimals (which are usually four integers) and passing them over the wire is much more CPU-intensive (especially if not supported in hardware) and results in significantly more bandwidth and a larger memory footprint.
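As a sketch of the wrapping idea mentioned above (a minimal made-up Money type, not the classes from those projects): keep the raw double for speed, but funnel every derived amount through one central rounding policy.

#include <cmath>
#include <cstdio>

// Minimal domain wrapper: stores a raw double, but every derived
// amount passes through one rounding policy (whole cents here).
class Money {
public:
    static double round_to_cents(double v) { return std::round(v * 100.0) / 100.0; }

    explicit Money(double amount) : amount_(round_to_cents(amount)) {}

    Money operator+(Money other) const { return Money(amount_ + other.amount_); }
    Money scale(double factor) const { return Money(amount_ * factor); }  // e.g. apply a rate
    double value() const { return amount_; }

private:
    double amount_;
};

int main() {
    Money balance(1234.56);
    Money commission = balance.scale(0.0025);   // a 0.25% commission, rounded to cents
    Money total = balance + commission;
    std::printf("commission: %.2f  total: %.2f\n", commission.value(), total.value());
}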