In Unison, is it possible to have smart constructors for unique types? - unison-lang

Take for example:
unique type ZeroOneClosed = ZeroOneClosed Float
> ZeroOneClosed 1.5
Here the idea is that we would have a function like newZeroOneClosed: Float -> Either Float ZeroOneClosed that takes any Float and only returns the right value if the float is in [0,1], while at the same time, preventing us directly to call ZeroOneClosed x outside of some restricted scope.

You can create the smart constructor, but restricting calls to ZeroOneClosed isn’t possible today, though it’s on the long-term roadmap.
For now, the closest you could get would be to use delete.term to remove the name of the constructor, which would make it a little harder (but not impossible) to call and a little harder to read as well, which you probably don’t want.

Related

Guidelines for listing the order of function arguments

Are there any rules that you follow to determine the order of function arguments? For example, float pow(float x, float exponent) vs float pow(float exponent, float x). For concreteness, C++ could be used, but the question is valid for all programming languages.
My main concern is from the usability point of view, not runtime performance.
Edit:
Some possible bases for ordering could be:
Inputs versus Output
The way a "formula" is usually written, i.e., arguments from left-to-write.
Specificity to the argument to the context of the function, i.e., whether it is a "general" argument, e.g., a singleton object of the system, or specific.
In the example you cite, I think the order was decided on the basis of the mathematical notation xexponent, in which the base is written before the exponent and becomes the left parameter.
I'm not aware of any really sound general principle other than to try to imagine what your users will expect and/or easily remember. People aren't even wholly agreed whether you should write (source, destination) or (destination, source) when copying (compare std::copy with std::memcpy), although I'm pretty sure that the former is now much more common.
There are a whole lot of general conventions, though, followed to different extents by different people:
if the function is considered primarly to act upon a particular object, put it first
parameters that are considered to "configure" the operation of the function come after parameters that are considered the main subject of the function.
out-params come last (but I suspect some people follow the reverse)
To some extent it doesn't really matter -- namely the extent to which your users have IDEs that tell them the parameter order as they type the function name.

When to use sentinel values?

I recently had to use a GPS location API where each location object had among other things two properties altitude and verticalAccuracy. A negative verticalAccuracy signifies that altitude is invalid, whereas normally a smaller but positive value of verticalAccuracy actually means that altitude is more precise (since it's the vertical distance that it may be off by - I'll leave the discussion as to why this measure is called verticalAccuracy and not verticalInaccuracy for some other time).
This got me thinking: When is it a good idea to use sentinel values like this API does and when would it be better to explicitly make a separate hasValidAltitude property? Are there other options?
Sometimes, sentinel answers aren't really possible; maybe the function's range coincides with the codomain (range). This isn't the case with altitude, unless you allow negative altitudes (maybe in the future, there will be underwater cities). For instance, maybe we're talking about the intersection between lines (not a great example, since floating-points have a few built-in sentinels like +INF and NaN) or the precise integer quotient (without rounding, this is not guaranteed to exist... 7 and 3, for instance... here, the remainder after division can be viewed as either a sentinel or a "exact integer quotient exists" property). More generally, any reliable sentinel can be trivially used to construct a property-based mechanism.
Based on this, I'd recommend avoiding sentinels wherever this is possible and makes sense. My reasoning is that they are an internal implementation detail of the module, and should be encapsulated behind an information-hiding interface.

What exactly is the danger of using magic debug values (such as 0xDEADBEEF) as literals?

It goes without saying that using hard-coded, hex literal pointers is a disaster:
int *i = 0xDEADBEEF;
// god knows if that location is available
However, what exactly is the danger in using hex literals as variable values?
int i = 0xDEADBEEF;
// what can go wrong?
If these values are indeed "dangerous" due to their use in various debugging scenarios, then this means that even if I do not use these literals, any program that during runtime happens to stumble upon one of these values might crash.
Anyone care to explain the real dangers of using hex literals?
Edit: just to clarify, I am not referring to the general use of constants in source code. I am specifically talking about debug-scenario issues that might come up to the use of hex values, with the specific example of 0xDEADBEEF.
There's no more danger in using a hex literal than any other kind of literal.
If your debugging session ends up executing data as code without you intending it to, you're in a world of pain anyway.
Of course, there's the normal "magic value" vs "well-named constant" code smell/cleanliness issue, but that's not really the sort of danger I think you're talking about.
With few exceptions, nothing is "constant".
We prefer to call them "slow variables" -- their value changes so slowly that we don't mind recompiling to change them.
However, we don't want to have many instances of 0x07 all through an application or a test script, where each instance has a different meaning.
We want to put a label on each constant that makes it totally unambiguous what it means.
if( x == 7 )
What does "7" mean in the above statement? Is it the same thing as
d = y / 7;
Is that the same meaning of "7"?
Test Cases are a slightly different problem. We don't need extensive, careful management of each instance of a numeric literal. Instead, we need documentation.
We can -- to an extent -- explain where "7" comes from by including a tiny bit of a hint in the code.
assertEquals( 7, someFunction(3,4), "Expected 7, see paragraph 7 of use case 7" );
A "constant" should be stated -- and named -- exactly once.
A "result" in a unit test isn't the same thing as a constant, and requires a little care in explaining where it came from.
A hex literal is no different than a decimal literal like 1. Any special significance of a value is due to the context of a particular program.
I believe the concern raised in the IP address formatting question earlier today was not related to the use of hex literals in general, but the specific use of 0xDEADBEEF. At least, that's the way I read it.
There is a concern with using 0xDEADBEEF in particular, though in my opinion it is a small one. The problem is that many debuggers and runtime systems have already co-opted this particular value as a marker value to indicate unallocated heap, bad pointers on the stack, etc.
I don't recall off the top of my head just which debugging and runtime systems use this particular value, but I have seen it used this way several times over the years. If you are debugging in one of these environments, the existence of the 0xDEADBEEF constant in your code will be indistinguishable from the values in unallocated RAM or whatever, so at best you will not have as useful RAM dumps, and at worst you will get warnings from the debugger.
Anyhow, that's what I think the original commenter meant when he told you it was bad for "use in various debugging scenarios."
There's no reason why you shouldn't assign 0xdeadbeef to a variable.
But woe betide the programmer who tries to assign decimal 3735928559, or octal 33653337357, or worst of all: binary 11011110101011011011111011101111.
Big Endian or Little Endian?
One danger is when constants are assigned to an array or structure with different sized members; the endian-ness of the compiler or machine (including JVM vs CLR) will affect the ordering of the bytes.
This issue is true of non-constant values, too, of course.
Here's an, admittedly contrived, example. What is the value of buffer[0] after the last line?
const int TEST[] = { 0x01BADA55, 0xDEADBEEF };
char buffer[BUFSZ];
memcpy( buffer, (void*)TEST, sizeof(TEST));
I don't see any problem with using it as a value. Its just a number after all.
There's no danger in using a hard-coded hex value for a pointer (like your first example) in the right context. In particular, when doing very low-level hardware development, this is the way you access memory-mapped registers. (Though it's best to give them names with a #define, for example.) But at the application level you shouldn't ever need to do an assignment like that.
I use CAFEBABE
I haven't seen it used by any debuggers before.
int *i = 0xDEADBEEF;
// god knows if that location is available
int i = 0xDEADBEEF;
// what can go wrong?
The danger that I see is the same in both cases: you've created a flag value that has no immediate context. There's nothing about i in either case that will let me know 100, 1000 or 10000 lines that there is a potentially critical flag value associated with it. What you've planted is a landmine bug that, if I don't remember to check for it in every possible use, I could be faced with a terrible debugging problem. Every use of i will now have to look like this:
if (i != 0xDEADBEEF) { // Curse the original designer to oblivion
// Actual useful work goes here
}
Repeat the above for all of the 7000 instances where you need to use i in your code.
Now, why is the above worse than this?
if (isIProperlyInitialized()) { // Which could just be a boolean
// Actual useful work goes here
}
At a minimum, I can spot several critical issues:
Spelling: I'm a terrible typist. How easily will you spot 0xDAEDBEEF in a code review? Or 0xDEADBEFF? On the other hand, I know that my compile will barf immediately on isIProperlyInitialised() (insert the obligatory s vs. z debate here).
Exposure of meaning. Rather than trying to hide your flags in the code, you've intentionally created a method that the rest of the code can see.
Opportunities for coupling. It's entirely possible that a pointer or reference is connected to a loosely defined cache. An initialization check could be overloaded to check first if the value is in cache, then to try to bring it back into cache and, if all that fails, return false.
In short, it's just as easy to write the code you really need as it is to create a mysterious magic value. The code-maintainer of the future (who quite likely will be you) will thank you.

Why don't languages raise errors on integer overflow by default?

In several modern programming languages (including C++, Java, and C#), the language allows integer overflow to occur at runtime without raising any kind of error condition.
For example, consider this (contrived) C# method, which does not account for the possibility of overflow/underflow. (For brevity, the method also doesn't handle the case where the specified list is a null reference.)
//Returns the sum of the values in the specified list.
private static int sumList(List<int> list)
{
int sum = 0;
foreach (int listItem in list)
{
sum += listItem;
}
return sum;
}
If this method is called as follows:
List<int> list = new List<int>();
list.Add(2000000000);
list.Add(2000000000);
int sum = sumList(list);
An overflow will occur in the sumList() method (because the int type in C# is a 32-bit signed integer, and the sum of the values in the list exceeds the value of the maximum 32-bit signed integer). The sum variable will have a value of -294967296 (not a value of 4000000000); this most likely is not what the (hypothetical) developer of the sumList method intended.
Obviously, there are various techniques that can be used by developers to avoid the possibility of integer overflow, such as using a type like Java's BigInteger, or the checked keyword and /checked compiler switch in C#.
However, the question that I'm interested in is why these languages were designed to by default allow integer overflows to happen in the first place, instead of, for example, raising an exception when an operation is performed at runtime that would result in an overflow. It seems like such behavior would help avoid bugs in cases where a developer neglects to account for the possibility of overflow when writing code that performs an arithmetic operation that could result in overflow. (These languages could have included something like an "unchecked" keyword that could designate a block where integer overflow is permitted to occur without an exception being raised, in those cases where that behavior is explicitly intended by the developer; C# actually does have this.)
Does the answer simply boil down to performance -- the language designers didn't want their respective languages to default to having "slow" arithmetic integer operations where the runtime would need to do extra work to check whether an overflow occurred, on every applicable arithmetic operation -- and this performance consideration outweighed the value of avoiding "silent" failures in the case that an inadvertent overflow occurs?
Are there other reasons for this language design decision as well, other than performance considerations?
In C#, it was a question of performance. Specifically, out-of-box benchmarking.
When C# was new, Microsoft was hoping a lot of C++ developers would switch to it. They knew that many C++ folks thought of C++ as being fast, especially faster than languages that "wasted" time on automatic memory management and the like.
Both potential adopters and magazine reviewers are likely to get a copy of the new C#, install it, build a trivial app that no one would ever write in the real world, run it in a tight loop, and measure how long it took. Then they'd make a decision for their company or publish an article based on that result.
The fact that their test showed C# to be slower than natively compiled C++ is the kind of thing that would turn people off C# quickly. The fact that your C# app is going to catch overflow/underflow automatically is the kind of thing that they might miss. So, it's off by default.
I think it's obvious that 99% of the time we want /checked to be on. It's an unfortunate compromise.
I think performance is a pretty good reason. If you consider every instruction in a typical program that increments an integer, and if instead of the simple op to add 1, it had to check every time if adding 1 would overflow the type, then the cost in extra cycles would be pretty severe.
You work under the assumption that integer overflow is always undesired behavior.
Sometimes integer overflow is desired behavior. One example I've seen is representation of an absolute heading value as a fixed point number. Given an unsigned int, 0 is 0 or 360 degrees and the max 32 bit unsigned integer (0xffffffff) is the biggest value just below 360 degrees.
int main()
{
uint32_t shipsHeadingInDegrees= 0;
// Rotate by a bunch of degrees
shipsHeadingInDegrees += 0x80000000; // 180 degrees
shipsHeadingInDegrees += 0x80000000; // another 180 degrees, overflows
shipsHeadingInDegrees += 0x80000000; // another 180 degrees
// Ships heading now will be 180 degrees
cout << "Ships Heading Is" << (double(shipsHeadingInDegrees) / double(0xffffffff)) * 360.0 << std::endl;
}
There are probably other situations where overflow is acceptable, similar to this example.
C/C++ never mandate trap behaviour. Even the obvious division by 0 is undefined behaviour in C++, not a specified kind of trap.
The C language doesn't have any concept of trapping, unless you count signals.
C++ has a design principle that it doesn't introduce overhead not present in C unless you ask for it. So Stroustrup would not have wanted to mandate that integers behave in a way which requires any explicit checking.
Some early compilers, and lightweight implementations for restricted hardware, don't support exceptions at all, and exceptions can often be disabled with compiler options. Mandating exceptions for language built-ins would be problematic.
Even if C++ had made integers checked, 99% of programmers in the early days would have turned if off for the performance boost...
Because checking for overflow takes time. Each primitive mathematical operation, which normally translates into a single assembly instruction would have to include a check for overflow, resulting in multiple assembly instructions, potentially resulting in a program that is several times slower.
It is likely 99% performance. On x86 would have to check the overflow flag on every operation which would be a huge performance hit.
The other 1% would cover those cases where people are doing fancy bit manipulations or being 'imprecise' in mixing signed and unsigned operations and want the overflow semantics.
Backwards compatibility is a big one. With C, it was assumed that you were paying enough attention to the size of your datatypes that if an over/underflow occurred, that that was what you wanted. Then with C++, C# and Java, very little changed with how the "built-in" data types worked.
If integer overflow is defined as immediately raising a signal, throwing an exception, or otherwise deflecting program execution, then any computations which might overflow will need to be performed in the specified sequence. Even on platforms where integer overflow checking wouldn't cost anything directly, the requirement that integer overflow be trapped at exactly the right point in a program's execution sequence would severely impede many useful optimizations.
If a language were to specify that integer overflows would instead set a latching error flag, were to limit how actions on that flag within a function could affect its value within calling code, and were to provide that the flag need not be set in circumstances where an overflow could not result in erroneous output or behavior, then compilers could generate more efficient code than any kind of manual overflow-checking programmers could use. As a simple example, if one had a function in C that would multiply two numbers and return a result, setting an error flag in case of overflow, a compiler would be required to perform the multiplication whether or not the caller would ever use the result. In a language with looser rules like I described, however, a compiler that determined that nothing ever uses the result of the multiply could infer that overflow could not affect a program's output, and skip the multiply altogether.
From a practical standpoint, most programs don't care about precisely when overflows occur, so much as they need to guarantee that they don't produce erroneous results as a consequence of overflow. Unfortunately, programming languages' integer-overflow-detection semantics have not caught up with what would be necessary to let compilers produce efficient code.
My understanding of why errors would not be raised by default at runtime boils down to the legacy of desiring to create programming languages with ACID-like behavior. Specifically, the tenet that anything that you code it to do (or don't code), it will do (or not do). If you didn't code some error handler, then the machine will "assume" by virtue of no error handler, that you really want to do the ridiculous, crash-prone thing you're telling it to do.
(ACID reference: http://en.wikipedia.org/wiki/ACID)

When to use unsigned values over signed ones?

When is it appropriate to use an unsigned variable over a signed one? What about in a for loop?
I hear a lot of opinions about this and I wanted to see if there was anything resembling a consensus.
for (unsigned int i = 0; i < someThing.length(); i++) {
SomeThing var = someThing.at(i);
// You get the idea.
}
I know Java doesn't have unsigned values, and that must have been a concious decision on Sun Microsystems' part.
I was glad to find a good conversation on this subject, as I hadn't really given it much thought before.
In summary, signed is a good general choice - even when you're dead sure all the numbers are positive - if you're going to do arithmetic on the variable (like in a typical for loop case).
unsigned starts to make more sense when:
You're going to do bitwise things like masks, or
You're desperate to to take advantage of the sign bit for that extra positive range .
Personally, I like signed because I don't trust myself to stay consistent and avoid mixing the two types (like the article warns against).
In your example above, when 'i' will always be positive and a higher range would be beneficial, unsigned would be useful. Like if you're using 'declare' statements, such as:
#declare BIT1 (unsigned int 1)
#declare BIT32 (unsigned int reallybignumber)
Especially when these values will never change.
However, if you're doing an accounting program where the people are irresponsible with their money and are constantly in the red, you will most definitely want to use 'signed'.
I do agree with saint though that a good rule of thumb is to use signed, which C actually defaults to, so you're covered.
I would think that if your business case dictates that a negative number is invalid, you would want to have an error shown or thrown.
With that in mind, I only just recently found out about unsigned integers while working on a project processing data in a binary file and storing the data into a database. I was purposely "corrupting" the binary data, and ended up getting negative values instead of an expected error. I found that even though the value converted, the value was not valid for my business case.
My program did not error, and I ended up getting wrong data into the database. It would have been better if I had used uint and had the program fail.
C and C++ compilers will generate a warning when you compare signed and unsigned types; in your example code, you couldn't make your loop variable unsigned and have the compiler generate code without warnings (assuming said warnings were turned on).
Naturally, you're compiling with warnings turned all the way up, right?
And, have you considered compiling with "treat warnings as errors" to take it that one step further?
The downside with using signed numbers is that there's a temptation to overload them so that, for example, the values 0->n are the menu selection, and -1 means nothing's selected - rather than creating a class that has two variables, one to indicate if something is selected and another to store what that selection is. Before you know it, you're testing for negative one all over the place and the compiler is complaining about how you're wanting to compare the menu selection against the number of menu selections you have - but that's dangerous because they're different types. So don't do that.
size_t is often a good choice for this, or size_type if you're using an STL class.