I've been using strings to represent decoded JSON integers larger than 32 bits. It seems the string_of_int is capable of dealing with large integer inputs. So a decoder, written (in the Json.Decode namespace):
id: json |> field("id", int) |> string_of_int, /* 'id' is string */
is successfully dealing with integers of at least 37 bits.
Encoding, on the other hand, is proving troublesome for me. The remote server won't accept a string representation, and is expecting an int64. Is it possible to make bs-json support the int64 type? I was hoping something like this could be made to work:
type myData = { id: int64 };
let encodeMyData = (data: myData) => Json.Encode.(object_([("id", int64(data.id))]));
Having to roll my own encoder is not nearly as formidable as a decoder, but ... I'd rather not.
You don't say exactly what problem you have with encoding. The int encoder does literally nothing except change the type, trusting that the int value is actually valid. So I would assume it's the int_of_string operation that causes problems. But that begs the question, if you can successfully decode it as an int, why are you then converting it to a string?
The underlying problem here is that JavaScript doesn't have 64 bit integers. The max safe integer is 2^53 - 1. JavaScript doesn't actually have integers at all, only floats, which can represent a certain range of integers, but can't efficiently do integer arithmetic unless they're converted to either 32-bit or 64-bit ints. And so for whatever reason, probably consistent overflow handling, it was decided in the EcmaScript specification that binary bitwise operations should operate on 32-bit integers. And so that opened the possibility for an internal 32-bit representation, a notation for creating 32-bit integers, and the possibility of optimized integer arithmetic on those.
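To make the 2^53 - 1 limit concrete, here is a minimal sketch in Java, whose double is the same IEEE 754 format as a JavaScript Number. The class and method names are made up for illustration; the point is only that a 64-bit integer pushed through a double stops being exact just above 2^53.

```java
public class SafeIntegerDemo {
    // Same value as JavaScript's Number.MAX_SAFE_INTEGER: 2^53 - 1.
    static final long MAX_SAFE = (1L << 53) - 1;

    // Converts a 64-bit integer to a double and back, which is roughly what
    // happens when such a value is stored in a JavaScript Number.
    static long roundTrip(long n) {
        return (long) (double) n;
    }

    public static void main(String[] args) {
        long justAbove = (1L << 53) + 1;
        System.out.println(roundTrip(MAX_SAFE) == MAX_SAFE);   // exact
        System.out.println(roundTrip(justAbove) == justAbove); // rounded to 2^53
    }
}
```

A 37-bit id is well inside the exact range, which matches the decoding experience described in the question.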
So to your question:
Would it be "safe" to just add external int64 : int64 -> Js.Json.t = "%identity" to the encoder files?
No. There is no 64-bit integer representation in JavaScript, so int64 values are represented as an array of two Numbers, I believe, but that is an internal implementation detail that's subject to change. Just casting it to Js.Json.t will not yield the result you expect.
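As a rough illustration of what "two Numbers" means, the sketch below (in Java, with hypothetical names) splits a 64-bit value into two 32-bit halves and joins them again. It does not claim to match the compiler's actual internal layout, which as noted is an implementation detail; it only shows why such a pair would not serialize as a single JSON number.

```java
public class Int64Halves {
    // Upper 32 bits of the value.
    static int hi(long v) {
        return (int) (v >>> 32);
    }

    // Lower 32 bits of the value.
    static int lo(long v) {
        return (int) v;
    }

    // Reassembles the original 64-bit value from its two halves.
    static long join(int hi, int lo) {
        return ((long) hi << 32) | (lo & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long id = 1L << 37;
        System.out.println(hi(id) + ", " + lo(id));
        System.out.println(join(hi(id), lo(id)) == id);
    }
}
```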
So what can you do then?
I would recommend using float. In most respects this will behave exactly like JavaScript numbers, giving you access to its full range.
Alternatively you can use nativeint, which should behave like floats except for division, where the result is truncated to a 32-bit integer.
Lastly, you could also implement your own int_of_string to create an int that is technically out of range by using a couple of lightweight JavaScript functions directly, though I wouldn't really recommend doing this:
let bad_int_of_string = str =>
str |> Js.Float.fromString |> Js.Math.floor_int;
I'm writing dispose methods for all my classes so I can make their objects eligible for Garbage Collection by reference counting when I'm done with them. If a class variable is for an int, uint, or Number, I don't have to null it out in my dispose method, correct? What about arrays/vectors that contain those data types? I don't have to do array.length = 0 either, right? But I have to do array = null. What about strings? Are there any other data types that I don't have to null references for?
First of all, why are you doing this, and when will you call those dispose methods?
In Flash Player (FP) there is no event fired when the page closes. The GC is smart enough to deal with almost all problems, and you don't have to do manual reference counting.
But let's leave that aside.
In AS3 you don't have to nullify any of the primitive types (String, Number, int, uint, Boolean), nor arrays or vectors that hold them. (As far as the GC is concerned: if you want to free memory you can clear the array, and when FP or AIR needs memory the GC will collect it.)
Calling array.length = 0 will truncate the array, and the objects will be collected by the GC (if there isn't another reference to them).
Strings are immutable, so if you have a variable of type String that holds some string and then assign null to it, for example, the original string will either remain until the end of the program or be collected at some point.
Basically, I wonder if a language exists where this code will be invalid because even though counter and distance are both int under the hood, they represent incompatible types in the real world:
#include <stdio.h>
typedef int counter;
typedef int distance;
int main() {
counter pies = 1;
distance lengthOfBiscuit = 4;
printf("total pies: %d\n", pies + lengthOfBiscuit);
return 0;
}
That compiles with no warnings with "gcc -pedantic -Wall" and all other languages where I've tried it. It seems like it would be a good idea to disallow accidentally adding a counter and a distance, so where is the language support?
(Incidentally, the real-life example that prompted this question was web dev work in PHP and Python -- I was trying to make "HTML-escaped string", "SQL-escaped string" and "raw dangerous user input" incompatible, but the best I can seem to get is Apps Hungarian notation as suggested here --> http://www.joelonsoftware.com/articles/Wrong.html <-- and that still relies on human checking ("wrong code looks wrong") rather than compiler support ("wrong code is wrong"))
Haskell can do this, with GeneralizedNewtypeDeriving you can treat wrapped values as the underlying thing, whilst only exposing what you need:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
newtype Counter = Counter Int deriving Num
newtype Distance = Distance Int deriving Num
main :: IO ()
main = print $ Counter 1 + Distance 2
Now you get the error:
Add.hs:6:28:
Couldn't match expected type ‘Counter’ with actual type ‘Distance’
In the second argument of ‘(+)’, namely ‘Distance 2’
In the second argument of ‘($)’, namely ‘Counter 1 + Distance 2’
You can still "force" the underlying data type with "coerce", or by unwrapping the Ints explicitly.
I should add that any language with "real" types should be able to do this.
In Ada you can have types that use the same representation but are still distinct types. This is what a "strong typedef" would be (if it existed) in C or C++.
In your case, you could do
type counter is new Integer;
type distance is new Integer;
to create two new types that behave like integers, but cannot be mixed.
Derived types and sub types in Ada
You could create an object wrapping the underlying type in a member variable and define operations (even in the form of functions) that make sense on that type (e.g. Length would define "plus", allowing addition to another Length but not to an Angle).
A drawback of this approach is you have to create a wrapper for each underlying type you care about and define the appropriate operations for each sensible combination, which might be tedious and possibly error-prone.
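The wrapper approach can be sketched as follows. This is a minimal Java example with made-up class names (Counter, Distance), not a recommendation of any particular library; each wrapper only defines the operations that make sense for it, so mixing the two is rejected at compile time.

```java
public class DistinctTypes {
    // Wraps an int that counts things.
    static final class Counter {
        final int value;
        Counter(int value) { this.value = value; }
        // Only another Counter can be added to a Counter.
        Counter plus(Counter other) { return new Counter(value + other.value); }
    }

    // Wraps an int that measures a length.
    static final class Distance {
        final int value;
        Distance(int value) { this.value = value; }
        Distance plus(Distance other) { return new Distance(value + other.value); }
    }

    public static void main(String[] args) {
        Counter pies = new Counter(1).plus(new Counter(2));
        Distance lengthOfBiscuit = new Distance(4);
        System.out.println("total pies: " + pies.value);
        System.out.println("length: " + lengthOfBiscuit.value);
        // pies.plus(lengthOfBiscuit); // rejected by the compiler: a Distance is not a Counter
    }
}
```

The drawback mentioned above is visible even here: the two classes are nearly identical boilerplate, and every additional operation has to be written out for each wrapper.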
In C++, you could check out Boost's support for dimensions. The example given is designed primarily for physical dimensions, but I think you could adapt it to many others as well.
From the standard C library:
int fputc(int c, FILE *stream);
The same pattern occurs many times, e.g.:
int putc(int c, FILE *stream);
int putchar(int c);
Why not use char, since that is what the argument really is?
If using int is necessary, when should I use int instead of char?
Most likely (in my opinion, since much of the rationale behind early C is lost in the depths of time), it was simply to mirror the types used in the fgetc type functions, which must be able to return any real character plus the special EOF value. The fgetc function gets the next character converted to an int, and uses a special marker value EOF to indicate the end of the stream.
To do that, they needed the wider int type since a char isn't quite large enough to hold all possible characters plus one more thing.
And, since the developers of C seemed to prefer a rather minimalist approach to code, it makes sense that they would use the same type, to allow for code such as:
filecopy(ifp, ofp)
FILE *ifp;
FILE *ofp;
{
int c;
while ((c = fgetc (ifp)) != EOF)
fputc (c, ofp);
}
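The same design shows up outside C. As an analogous sketch in Java (class and method names are made up for illustration), InputStream.read() returns an int so that it can deliver every byte value 0..255 plus the out-of-band end marker -1. Narrowing the result too early makes a legitimate 0xFF data byte indistinguishable from end of stream.

```java
import java.io.ByteArrayInputStream;

public class EofDemo {
    // Correct: keep the result in an int so -1 cannot collide with a data byte.
    static int countBytes(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int n = 0;
        int c;
        while ((c = in.read()) != -1) {
            n++;
        }
        return n;
    }

    // Buggy on purpose: after narrowing to byte, the data byte 0xFF becomes -1
    // and is mistaken for the end-of-stream marker.
    static int countBytesNarrowed(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int n = 0;
        byte b;
        while ((b = (byte) in.read()) != -1) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] data = {0x41, (byte) 0xFF, 0x42};
        System.out.println(countBytes(data));
        System.out.println(countBytesNarrowed(data));
    }
}
```

With the three-byte input above, the first loop sees all three bytes while the narrowed loop stops after the first one.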
No char parameters in K&R C
One reason is that in early versions [1] of C there were no char parameters.
Yes, you could declare a parameter as char or float, but it was treated as int or double. It would therefore have been somewhat misleading to document an interface as taking a char argument.
I believe this is still true today for functions declared without prototypes, in order for it to be possible to interoperate with older code.
1. Early, but still widespread. C was a quick success and became the first (and still, mostly, the only) widely successful systems programming language.
I'm looking for a clear, concise and accurate answer.
Ideally as the actual answer, although links to good explanations welcome.
Boxed values are data structures that are minimal wrappers around primitive types*. Boxed values are typically stored as pointers to objects on the heap.
Thus, boxed values use more memory and take at minimum two memory lookups to access: once to get the pointer, and another to follow that pointer to the primitive. Obviously this isn't the kind of thing you want in your inner loops. On the other hand, boxed values typically play better with other types in the system. Since they are first-class data structures in the language, they have the expected metadata and structure that other data structures have.
In Java and Haskell generic collections can't contain unboxed values. Generic collections in .NET can hold unboxed values with no penalties. Where Java's generics are only used for compile-time type checking, .NET will generate specific classes for each generic type instantiated at run time.
Java and Haskell have unboxed arrays, but they're distinctly less convenient than the other collections. However, when peak performance is needed it's worth a little inconvenience to avoid the overhead of boxing and unboxing.
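A minimal Java sketch of the trade-off, with hypothetical names: the int[] stores the numbers themselves, while the List<Integer> stores references to Integer objects that must be unboxed on every access. No timings are claimed here; the example only shows where the boxing and unboxing happen.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxedVsUnboxed {
    // Unboxed: the array elements are the ints themselves.
    static long sumUnboxed(int[] xs) {
        long total = 0;
        for (int x : xs) {
            total += x;
        }
        return total;
    }

    // Boxed: each element is a reference to an Integer that is unboxed here.
    static long sumBoxed(List<Integer> xs) {
        long total = 0;
        for (Integer x : xs) {
            total += x; // implicit x.intValue()
        }
        return total;
    }

    // Copies an int[] into a generic collection, boxing every element.
    static List<Integer> box(int[] xs) {
        List<Integer> out = new ArrayList<>(xs.length);
        for (int x : xs) {
            out.add(x); // implicit Integer.valueOf(x)
        }
        return out;
    }

    public static void main(String[] args) {
        int[] xs = {1, 2, 3};
        System.out.println(sumUnboxed(xs));
        System.out.println(sumBoxed(box(xs)));
    }
}
```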
* For this discussion, a primitive value is any that can be stored on the call stack, rather than stored as a pointer to a value on the heap. Frequently that's just the machine types (ints, floats, etc), structs, and sometimes static sized arrays. .NET-land calls them value types (as opposed to reference types). Java folks call them primitive types. Haskellions just call them unboxed.
** I'm also focusing on Java, Haskell, and C# in this answer, because that's what I know. For what it's worth, Python, Ruby, and Javascript all have exclusively boxed values. This is also known as the "Everything is an object" approach***.
*** Caveat: A sufficiently advanced compiler / JIT can in some cases actually detect that a value which is semantically boxed when looking at the source, can safely be an unboxed value at runtime. In essence, thanks to brilliant language implementors your boxes are sometimes free.
from C# 3.0 In a Nutshell:
Boxing is the act of casting a value type into a reference type:
int x = 9;
object o = x; // boxing the int
unboxing is... the reverse:
// unboxing o
object o = 9;
int x = (int)o;
Boxing & unboxing is the process of converting a primitive value into an object oriented wrapper class (boxing), or converting a value from an object oriented wrapper class back to the primitive value (unboxing).
For example, in java, you may need to convert an int value into an Integer (boxing) if you want to store it in a Collection because primitives can't be stored in a Collection, only objects. But when you want to get it back out of the Collection you may want to get the value as an int and not an Integer so you would unbox it.
Boxing and unboxing is not inherently bad, but it is a tradeoff. Depending on the language implementation, it can be slower and more memory intensive than just using primitives. However, it may also allow you to use higher level data structures and achieve greater flexibility in your code.
These days, it is most commonly discussed in the context of the "autoboxing/autounboxing" feature of Java (and other languages). Here is a Java-centric explanation of autoboxing.
In .Net:
Often you can't rely on the type of variable a function will consume, so you need to use a variable of the lowest common denominator type, from which everything else derives. In .Net this is object.
However object is a class and stores its contents as a reference.
List<int> notBoxed = new List<int> { 1, 2, 3 };
int i = notBoxed[1]; // this is the actual value
List<object> boxed = new List<object> { 1, 2, 3 };
int j = (int) boxed[1]; // this is an object that can be 'unboxed' to an int
While both these hold the same information the second list is larger and slower. Each value in the second list is actually a reference to an object that holds the int.
This is called boxing because the int is wrapped by the object. When it's cast back, the int is unboxed, i.e. converted back to its value.
For value types (i.e. all structs) this is slow, and potentially uses a lot more space.
For reference types (i.e. all classes) this is far less of a problem, as they are stored as a reference anyway.
A further problem with a boxed value type is that it's not obvious that you're dealing with the box, rather than the value. When you compare two structs then you're comparing values, but when you compare two classes then (by default) you're comparing the reference - i.e. are these the same instance?
This can be confusing when dealing with boxed value types:
int a = 7;
int b = 7;
if(a == b) // Evaluates to true, because a and b have the same value
object c = (object) 7;
object d = (object) 7;
if(c == d) // Evaluates to false, because c and d are different instances
It's easy to work around:
if(c.Equals(d)) // Evaluates to true because it calls the underlying int's equals
if(((int) c) == ((int) d)) // Evaluates to true once the values are cast
However it is another thing to be careful of when dealing with boxed values.
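The same pitfall exists in Java, and a rough analogue (with hypothetical names) is sketched below. Comparing two boxed values with == compares references, and whether two equal boxed ints share an instance depends on the runtime's caching, so the sketch deliberately makes no claim about that result; equals and explicit unboxing compare the values.

```java
public class BoxedEquality {
    // Compares the boxed values through equals, which looks inside the box.
    static boolean sameValue(Object c, Object d) {
        return c.equals(d);
    }

    // Unboxes both sides first, then compares the plain ints.
    static boolean sameValueUnboxed(Object c, Object d) {
        return ((Integer) c).intValue() == ((Integer) d).intValue();
    }

    public static void main(String[] args) {
        Object c = 100000; // autoboxed to an Integer
        Object d = 100000; // autoboxed again, possibly a different instance
        System.out.println(c == d); // reference comparison, result depends on the runtime
        System.out.println(sameValue(c, d));
        System.out.println(sameValueUnboxed(c, d));
    }
}
```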
Boxing is the process of conversion of a value type into a reference type. Whereas Unboxing is the conversion of a reference type into a value type.
EX: int i = 123;
object o = i;// Boxing
int j = (int)o;// UnBoxing
Value types are: int, char, structures and enumerations.
Reference types are: classes, interfaces, arrays, strings and objects.
The language-agnostic meaning of a box is just "an object that contains some other value".
Literally, boxing is an operation to put some value into the box. More specifically, it is an operation to create a new box containing the value. After boxing, the boxed value can be accessed from the box object, by unboxing.
Note that objects (not OOP-specific) in many programming languages are about identities, but values are not. Two objects are same iff. they have identities not distinguishable in the program semantics. Values can also be the same (usually under some equality operators), but we do not distinguish them as "one" or "two" unique values.
Providing boxes is mainly about the effort to distinguish side effects (typically, mutation) from the states on the objects otherwise probably invisible to the users.
A language may limit the allowed ways to access an object and hide the identity of the object by default. For example, typical Lisp dialects have no explicit distinction between objects and values. As a result, the implementation has the freedom to share the underlying storage of the objects until some mutation operation occurs on the object (so the object must be "detached" from the shared instance after the operation to make the effect visible, i.e. the mutated value stored in the object could be different from the other objects having the old value). This technique is sometimes called object interning.
Interning makes the program more memory efficient at runtime if the objects are shared without frequent needs of mutation, at the cost that:
The users cannot distinguish the identity of the objects.
There is no way to identify an object and to ensure it has state explicitly independent of other objects in the program before some side effects have actually occurred (and the implementation does not aggressively do the interning concurrently; this should be the rare case, though).
There may be more problems with interoperation that requires identifying different objects for different operations.
There are risks that such assumptions can be false, so the performance is actually made worse by applying the interning.
This depends on the programming paradigm. Imperative programming which mutates objects frequently certainly would not work well with interning.
Implementations depending on COW (copy-on-write) to ensure interning can incur serious performance degradation in concurrent environments.
Even local sharing specifically for a few internal data structures can be bad. For example, ISO C++ 11 did not allow sharing of the internal elements of std::basic_string for this reason exactly, even at the cost of breaking the ABI on at least one mainstream implementation (libstdc++).
Boxing and unboxing incur performance penalties. This is especially obvious when these operations could easily be avoided by hand but are not easy for the optimizer to eliminate. The concrete cost depends on the implementation, or even on the individual program, though.
Mutable cells, i.e. boxes, are well-established facilities exactly to resolve the problems of the 1st and 2nd bullets listed above. Additionally, there can be immutable boxes for implementation of assignment in a functional language. See SRFI-111 for a practical instance.
Using mutable cells as function arguments with a call-by-value strategy makes the visible effects of mutation shared between the caller and the callee. The object contained by a box is effectively "called by sharing" in this sense.
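A minimal sketch of that idea in Java, where arguments are always passed by value. The Box class and method names are hypothetical; the box itself is passed by value (the reference is copied), yet the mutation of its content is visible to the caller, while reassigning a plain int parameter is not.

```java
public class BoxCell {
    // A one-slot mutable cell.
    static final class Box<T> {
        T value;
        Box(T value) { this.value = value; }
    }

    // Changes only the callee's private copy of n.
    static void incrementPlain(int n) {
        n = n + 1;
    }

    // Mutates the content of the cell shared with the caller.
    static void incrementBoxed(Box<Integer> cell) {
        cell.value = cell.value + 1;
    }

    public static void main(String[] args) {
        int plain = 1;
        incrementPlain(plain);
        Box<Integer> cell = new Box<>(1);
        incrementBoxed(cell);
        System.out.println(plain);      // still 1
        System.out.println(cell.value); // now 2
    }
}
```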
Sometimes boxes are referred to as references (which is technically false), so the shared semantics are named "reference semantics". This is not correct, because not all references can propagate the visible side effects (e.g. immutable references). References are more about exposing access by indirection, while boxes are an effort to expose minimal details of the access, such as whether there is indirection or not (which is uninteresting and better avoided by the implementation).
Moreover, "value semantics" is irrelevant here. Values are not opposed to references, nor to boxes. All of the discussion above is based on the call-by-value strategy. For others (like call-by-name or call-by-need), no boxes are needed to share object contents in this way.
Java is probably the first programming language to make these features popular in the industry. Unfortunately, there seem to be many bad consequences in this area:
The overall programming paradigm does not fit the design.
Practically, interning is limited to specific objects like immutable strings, and the cost of (auto-)boxing and unboxing is often blamed.
Fundamental PL knowledge, like the definition of the term "object" (as "instance of a class") in the language specification, as well as the descriptions of parameter passing, became biased compared to the original, well-known meaning during the adoption of Java by programmers.
At least CLR languages follow similar parlance.
Some more tips on implementations (and comments to this answer):
Whether to put objects on the call stack or the heap is an implementation detail, and irrelevant to the implementation of boxes.
Some language implementations do not maintain a contiguous storage as the call stack.
Some language implementations do not even make the (per thread) activation records a linear stack.
Some language implementations do allocate stacks on the free store ("the heap") and transfer slices of frames between the stacks and the heap back and forth.
These strategies have nothing to do with boxes. For instance, many Scheme implementations have boxes, with different activation record layouts, including all the ways listed above.
Besides the technical inaccuracy, the statement "everything is an object" is irrelevant to boxing.
Python, Ruby, and JavaScript all use latent typing (by default), so all identifiers referring to some objects will evaluate to values having the same static type. So does Scheme.
Some JavaScript and Ruby implementations use the so-called NaN-boxing to allow inlining allocation of some objects. Some others (including CPython) do not. With NaN boxing, a normal double object needs no unboxing to access its value, while a value of some other types can be boxed in a host double object, and there is no reference for double or the boxed value. With the naive pointer approach, a value of host object pointer like PyObject* is an object reference holding a box whose boxed value is stored in the dynamically allocated space.
At least in Python, objects are not "everything". They are also not known as "boxed values" unless you are talking about interoperability with specific implementations.
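The NaN-boxing idea mentioned above can be sketched in a few lines of Java. This is a toy model under stated assumptions, not how any particular engine lays out its values: every value is one 64-bit word, ordinary doubles are stored as their own bit pattern, and a 32-bit int is hidden in the payload of a NaN pattern. The tag constant is a hypothetical choice; it works here because Double.doubleToLongBits collapses every NaN to the single canonical pattern 0x7ff8000000000000, so the tagged patterns can never come from a real double.

```java
public class NanBoxDemo {
    // Hypothetical tag: a NaN bit pattern that doubleToLongBits never returns.
    static final long TAG_INT = 0x7FFC000000000000L;
    static final long TAG_MASK = 0xFFFF000000000000L;

    // A double needs no boxing: the word is just its bit pattern.
    static long boxDouble(double d) {
        return Double.doubleToLongBits(d);
    }

    // An int is stored in the low 32 bits of a tagged NaN pattern.
    static long boxInt(int i) {
        return TAG_INT | (i & 0xFFFFFFFFL);
    }

    // True if the word carries a tagged int rather than a double.
    static boolean isInt(long word) {
        return (word & TAG_MASK) == TAG_INT;
    }

    static double unboxDouble(long word) {
        return Double.longBitsToDouble(word);
    }

    static int unboxInt(long word) {
        return (int) word;
    }

    public static void main(String[] args) {
        long a = boxInt(-5);
        long b = boxDouble(1.5);
        System.out.println(isInt(a) + " " + unboxInt(a));
        System.out.println(isInt(b) + " " + unboxDouble(b));
    }
}
```

Neither kind of value needs a heap allocation or a pointer, which is the sense in which no reference is involved.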
The .NET FCL generic collections:
List<T>
Dictionary<TKey, TValue>
SortedDictionary<TKey, TValue>
Stack<T>
Queue<T>
LinkedList<T>
were all designed to overcome the performance issues of boxing and unboxing in previous collection implementations.
For more, see chapter 16, CLR via C# (2nd Edition).
Boxing and unboxing allow value types to be treated as objects. Boxing means converting a value to an instance of the object reference type. For example, in a language where Integer is a class and int is a primitive data type (as in Java), converting an int to an Integer is boxing, whereas converting an Integer to an int is unboxing. Because the boxed copy is an ordinary heap object, it is managed by the garbage collector. Unboxing, on the other hand, converts the object type back to a value type.
int i=123;
object o=(object)i; //Boxing
o=123;
i=(int)o; //Unboxing.
Like anything else, autoboxing can be problematic if not used carefully. The classic is to end up with a NullPointerException and not be able to track it down. Even with a debugger. Try this:
public class TestAutoboxNPE
{
public static void main(String[] args)
{
Integer i = null;
// .. do some other stuff and forget to initialise i
i = addOne(i); // Whoa! NPE!
}
public static int addOne(int i)
{
return i + 1;
}
}
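One way to guard against this, sketched below with hypothetical names, is to check for null before the value is unboxed. The first helper just confirms that unboxing a null Integer is what throws; the second substitutes a caller-supplied fallback instead.

```java
public class NullSafeUnboxing {
    static int addOne(int i) {
        return i + 1;
    }

    // Demonstrates the failure: passing a null Integer where an int is
    // expected triggers unboxing, which throws NullPointerException.
    static boolean unboxingNullThrows() {
        Integer i = null;
        try {
            addOne(i);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Checks for null before unboxing and uses the fallback instead.
    static int addOneOrDefault(Integer i, int fallback) {
        return addOne(i != null ? i : fallback);
    }

    public static void main(String[] args) {
        System.out.println(unboxingNullThrows());
        System.out.println(addOneOrDefault(null, 0));
        System.out.println(addOneOrDefault(41, 0));
    }
}
```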