What is boxing and unboxing and what are the trade offs? - language-agnostic

I'm looking for a clear, concise and accurate answer.
Ideally as the actual answer, although links to good explanations welcome.

Boxed values are data structures that are minimal wrappers around primitive types*. Boxed values are typically stored as pointers to objects on the heap.
Thus, boxed values use more memory and take at minimum two memory lookups to access: once to get the pointer, and another to follow that pointer to the primitive. Obviously this isn't the kind of thing you want in your inner loops. On the other hand, boxed values typically play better with other types in the system. Since they are first-class data structures in the language, they have the expected metadata and structure that other data structures have.
In Java and Haskell, generic collections can't contain unboxed values, whereas generic collections in .NET can hold unboxed values with no penalty. Where Java's generics are only used for compile-time type checking (erasure), .NET generates specialized code for each generic type instantiated at run time.
Java and Haskell have unboxed arrays, but they're distinctly less convenient than the other collections. However, when peak performance is needed it's worth a little inconvenience to avoid the overhead of boxing and unboxing.
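To make the trade-off concrete, here is a minimal Java sketch (an illustrative micro-example, not a rigorous benchmark): the unboxed array is one flat block of machine ints, while the boxed array holds references to Integer objects that must be followed and unboxed on every read.
public class BoxingCost {
    public static void main(String[] args) {
        int n = 1_000_000;

        // Unboxed: a single contiguous block of machine ints.
        int[] unboxed = new int[n];
        long sum1 = 0;
        for (int i = 0; i < n; i++) sum1 += unboxed[i];

        // Boxed: an array of references to Integer objects on the heap;
        // each read follows a pointer and then unboxes.
        Integer[] boxed = new Integer[n];
        java.util.Arrays.fill(boxed, 0);
        long sum2 = 0;
        for (int i = 0; i < n; i++) sum2 += boxed[i];

        System.out.println(sum1 + " " + sum2);
    }
}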
* For this discussion, a primitive value is any that can be stored on the call stack, rather than stored as a pointer to a value on the heap. Frequently that's just the machine types (ints, floats, etc.), structs, and sometimes statically sized arrays. .NET-land calls them value types (as opposed to reference types). Java folks call them primitive types. Haskellions just call them unboxed.
** I'm also focusing on Java, Haskell, and C# in this answer, because that's what I know. For what it's worth, Python, Ruby, and Javascript all have exclusively boxed values. This is also known as the "Everything is an object" approach***.
*** Caveat: A sufficiently advanced compiler / JIT can in some cases detect that a value which is semantically boxed in the source can safely be unboxed at runtime. In essence, thanks to brilliant language implementors, your boxes are sometimes free.

from C# 3.0 In a Nutshell:
Boxing is the act of casting a value
type into a reference type:
int x = 9;
object o = x; // boxing the int
unboxing is... the reverse:
// unboxing o
object o = 9;
int x = (int)o;

Boxing & unboxing is the process of converting a primitive value into an object oriented wrapper class (boxing), or converting a value from an object oriented wrapper class back to the primitive value (unboxing).
For example, in java, you may need to convert an int value into an Integer (boxing) if you want to store it in a Collection because primitives can't be stored in a Collection, only objects. But when you want to get it back out of the Collection you may want to get the value as an int and not an Integer so you would unbox it.
Boxing and unboxing is not inherently bad, but it is a tradeoff. Depending on the language implementation, it can be slower and more memory intensive than just using primitives. However, it may also allow you to use higher level data structures and achieve greater flexibility in your code.
These days, it is most commonly discussed in the context of the "autoboxing/autounboxing" feature of Java (and other languages). Here is a Java-centric explanation of autoboxing.
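As a quick illustration of autoboxing in Java, the compiler inserts the wrapper conversions for you (this compiles to calls to Integer.valueOf and Integer.intValue):
import java.util.ArrayList;
import java.util.List;

public class AutoboxDemo {
    public static void main(String[] args) {
        List<Integer> xs = new ArrayList<>();
        xs.add(42);        // autoboxing: effectively xs.add(Integer.valueOf(42))
        int x = xs.get(0); // auto-unboxing: effectively xs.get(0).intValue()
        System.out.println(x);
    }
}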

In .Net:
Often you can't rely on knowing what type of variable a function will consume, so you need to use a variable of the lowest common denominator type - in .Net this is object.
However object is a class and stores its contents as a reference.
List<int> notBoxed = new List<int> { 1, 2, 3 };
int i = notBoxed[1]; // this is the actual value
List<object> boxed = new List<object> { 1, 2, 3 };
int j = (int) boxed[1]; // this is an object that can be 'unboxed' to an int
While both of these hold the same information, the second list is larger and slower. Each value in the second list is actually a reference to an object that holds the int.
This is called boxed because the int is wrapped by the object. When it's cast back the int is unboxed - converted back to its value.
For value types (i.e. all structs) this is slow, and potentially uses a lot more space.
For reference types (i.e. all classes) this is far less of a problem, as they are stored as a reference anyway.
A further problem with a boxed value type is that it's not obvious that you're dealing with the box, rather than the value. When you compare two structs then you're comparing values, but when you compare two classes then (by default) you're comparing the reference - i.e. are these the same instance?
This can be confusing when dealing with boxed value types:
int a = 7;
int b = 7;
if(a == b) // Evaluates to true, because a and b have the same value
object c = (object) 7;
object d = (object) 7;
if(c == d) // Evaluates to false, because c and d are different instances
It's easy to work around:
if(c.Equals(d)) // Evaluates to true because it calls the underlying int's equals
if(((int) c) == ((int) d)) // Evaluates to true once the values are cast
However it is another thing to be careful of when dealing with boxed values.
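For what it's worth, Java has the same trap with boxed values, made more surprising by the Integer cache (values from -128 to 127 are guaranteed to be shared by Integer.valueOf, so == sometimes "works"):
public class BoxedEquality {
    public static void main(String[] args) {
        Integer a = 127, b = 127;
        System.out.println(a == b);      // true: small values come from a shared cache
        Integer c = 128, d = 128;
        System.out.println(c == d);      // false (typically): two distinct boxes
        System.out.println(c.equals(d)); // true: compares the underlying values
    }
}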

Boxing is the process of converting a value type into a reference type, whereas unboxing is the conversion of a reference type into a value type.
Ex:
int i = 123;
object o = i;   // boxing
int j = (int)o; // unboxing
Value types: int, char, structs, and enumerations.
Reference types: classes, interfaces, arrays, strings, and objects.

The language-agnostic meaning of a box is just "an object that contains some other value".
Literally, boxing is an operation that puts some value into the box. More specifically, it is an operation that creates a new box containing the value. After boxing, the boxed value can be accessed from the box object by unboxing.
Note that objects (not OOP-specific) in many programming languages are about identities, but values are not. Two objects are the same iff they have identities that are not distinguishable in the program semantics. Values can also be the same (usually under some equality operator), but we do not distinguish them as "one" or "two" unique values.
Providing boxes is mainly an effort to distinguish side effects (typically, mutation) on the states of objects that would otherwise be invisible to users.
A language may limit the allowed ways to access an object and hide the identity of the object by default. For example, typical Lisp dialects have no explicit distinction between objects and values. As a result, the implementation has the freedom to share the underlying storage of objects until some mutation operation occurs on an object (after which the object must be "detached" from the shared instance to make the effect visible, i.e. the mutated value stored in the object can differ from that of other objects holding the old value). This technique is sometimes called object interning.
Interning makes the program more memory-efficient at runtime if the objects are shared without frequent need of mutation (a small Java illustration of interning follows this list), at the cost that:
The users cannot distinguish the identity of the objects.
There is no way to identify an object and ensure that its state is explicitly independent of other objects in the program before some side effect has actually occurred (and provided the implementation does not aggressively intern concurrently; that should be a rare case, though).
There may be further problems with interoperation that requires identifying distinct objects for different operations.
There are risks that such assumptions turn out false, so that applying the interning actually makes performance worse.
This depends on the programming paradigm. Imperative programming which mutates objects frequently certainly would not work well with interning.
Implementations depending on COW (copy-on-write) to ensure interning can incur serious performance degradation in concurrent environments.
Even local sharing, specifically for a few internal data structures, can be bad. For example, ISO C++ 11 disallowed sharing of the internal elements of std::basic_string for exactly this reason, even at the cost of breaking the ABI of at least one mainstream implementation (libstdc++).
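As a concrete instance of interning, Java interns strings on request; this small sketch shows two objects with distinct identities collapsing to one shared instance:
public class InternDemo {
    public static void main(String[] args) {
        String a = new String("box"); // a fresh heap object
        String b = new String("box"); // another fresh heap object
        System.out.println(a == b);                   // false: distinct identities
        System.out.println(a.intern() == b.intern()); // true: one shared, interned instance
    }
}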
Boxing and unboxing incur performance penalties. This is especially obvious when the operations could be avoided by hand but are not easy for the optimizer to remove. The concrete measurement of the cost depends on the implementation, or even on the individual program.
Mutable cells, i.e. boxes, are well-established facilities designed exactly to resolve the problems in the 1st and 2nd bullets listed above. Additionally, there can be immutable boxes, used to implement assignment in a functional language. See SRFI-111 for a practical instance.
Using mutable cells as function arguments with a call-by-value strategy makes the visible effects of mutation shared between the caller and the callee. The object contained by a box is effectively "called by sharing" in this sense; a minimal Java sketch follows.
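In this sketch (the Box class is illustrative, not a standard library type), the reference to the box is copied by value, yet caller and callee share the boxed content, so the callee's mutation is visible to the caller.
// A minimal mutable cell ("box").
final class Box<T> {
    private T value;
    Box(T value) { this.value = value; }
    T get() { return value; }
    void set(T value) { this.value = value; }
}

public class BoxDemo {
    static void increment(Box<Integer> b) {
        b.set(b.get() + 1); // mutate the shared content
    }

    public static void main(String[] args) {
        Box<Integer> b = new Box<>(41);
        increment(b);                // the Box reference is passed by value
        System.out.println(b.get()); // prints 42: the mutation is shared
    }
}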
Sometimes the boxes are referred to as references (which is technically false), so the shared semantics get named "reference semantics". This is not correct, because not all references can propagate visible side effects (e.g. immutable references). References are more about exposing access by indirection, while boxes are an effort to expose minimal details of the access, such as whether there is indirection at all (which is uninteresting and better left to the implementation).
Moreover, "value semantics" is irrelevant here. Values are not opposed to references, nor to boxes. All the discussion above is based on the call-by-value strategy. For others (like call-by-name or call-by-need), no boxes are needed to share object contents in this way.
Java is probably the first programming language to make these features popular in industry. Unfortunately, there seem to be many bad consequences in this topic:
The overall programming paradigm does not fit the design.
Practically, interning is limited to specific objects like immutable strings, and the cost of (auto-)boxing and unboxing is often blamed.
Fundamental PL knowledge, like the definition of the term "object" (as "instance of a class") in the language specification, as well as the descriptions of parameter passing, became biased compared to the original, well-known meanings during the adoption of Java by programmers.
At least the CLR languages follow similar parlance.
Some more tips on implementations (and comments to this answer):
Whether to put objects on the call stack or the heap is an implementation detail, and irrelevant to the implementation of boxes.
Some language implementations do not maintain a contiguous storage as the call stack.
Some language implementations do not even make the (per thread) activation records a linear stack.
Some language implementations do allocate stacks on the free store ("the heap") and transfer slices of frames between the stacks and the heap back and forth.
These strategies have nothing to do with boxes. For instance, many Scheme implementations have boxes, with different activation-record layouts, including all the ways listed above.
Besides the technical inaccuracy, the statement "everything is an object" is irrelevant to boxing.
Python, Ruby, and JavaScript all use latent typing (by default), so all identifiers referring to some objects will evaluate to values having the same static type. So does Scheme.
Some JavaScript and Ruby implementations use so-called NaN-boxing to allow inline allocation of some objects. Some others (including CPython) do not. With NaN-boxing, a normal double object needs no unboxing to access its value, while a value of some other type can be boxed in a host double object, and there is no reference for the double or the boxed value. With the naive pointer approach, a value of a host object pointer type like PyObject* is an object reference holding a box whose boxed value is stored in dynamically allocated space.
At least in Python, objects are not "everything". They are also not known as "boxed values" unless you are talking about interoperability with specific implementations.

The .NET FCL generic collections:
List<T>
Dictionary<TKey, TValue>
SortedDictionary<TKey, TValue>
Stack<T>
Queue<T>
LinkedList<T>
were all designed to overcome the performance issues of boxing and unboxing in previous collection implementations.
For more, see chapter 16, CLR via C# (2nd Edition).

Boxing and unboxing allow value types to be treated as objects. Boxing means converting a value to an instance of an object reference type. For example, where Integer is a wrapper class and int is a primitive data type, converting an int to an Integer exemplifies boxing, whereas unboxing, on the other hand, converts the object type back to the value type.
int i = 123;
object o = (object)i; // boxing
o = 123;
i = (int)o; // unboxing

Like anything else, autoboxing can be problematic if not used carefully. The classic is to end up with a NullPointerException and not be able to track it down. Even with a debugger. Try this:
public class TestAutoboxNPE
{
    public static void main(String[] args)
    {
        Integer i = null;
        // .. do some other stuff and forget to initialise i
        i = addOne(i); // Whoa! NPE! (the null Integer is auto-unboxed via i.intValue())
    }

    public static int addOne(int i)
    {
        return i + 1;
    }
}

Related

Why is the int class lowercase but the String class is not?

I've noticed this across a few languages, but I'll make my question specific to AS3. Why is an int class lowercase but a String or Number is not?
var myInt:int = 0;
var myString:String = "";
var myNum:Number = 0;
That's simply primitive values vs. Objects. You can do something like String.substring() because it's an Object, but you can't do anything with int; it's just a number.
==== EDIT ====
According to the comment below, int in AS3 is a class, so you can call some methods on it. However, it is still a primitive type. The difference is explained here: http://www.adobe.com/devnet/actionscript/learning/as3-fundamentals/data-types.html
"Primitive values are usually faster than complex values because ActionScript 3 stores primitive values in a special way that makes memory and speed optimizations possible.
Note: For readers interested in the technical details, ActionScript 3 stores primitive values internally as immutable objects. The fact that they are stored as immutable objects means that passing by reference is effectively the same as passing by value. This cuts down on memory usage and increases execution speed, because references are usually significantly smaller than the values themselves."
This is most likely due to the fact that they followed the ECMAScript 3 standard, where on page 14 of the PDF you find the definition of primitive values of type Number, String, and Boolean.
Page 26 of the PDF lists future reserved keywords, where int appears. Naturally if they want to support unsigned integers, it should be called uint.
Personally I think it would make more sense to name them Int and Uint (or do like Java)

In which languages is function abstraction not primitive

In Haskell the function type (->) is given; it's not an algebraic data type constructor, and one cannot re-implement it to be identical to (->).
So I wonder, what languages would allow me to write my own version of (->)? What is this property called?
UPD: Reformulations of the question, thanks to the discussion:
Which languages don't have -> as a primitive type?
Why is -> necessarily primitive?
I can't think of any languages that have arrows as a user-defined type. The reason is that arrows -- types for functions -- are baked into the type system, all the way down to the simply typed lambda calculus. That the arrow type must be fundamental to the language comes directly from the fact that the way you form functions in the lambda calculus is via lambda abstraction (which, at the type level, introduces arrows).
Although Marcin aptly notes that you can program in a point free style, this doesn't change the essence of what you're doing. Having a language without arrow types as primitives goes against the most fundamental building blocks of Haskell. (The language you reference in the question.)
Having the arrow as a primitive type also has some important ties to constructive logic: you can read the function arrow type as implication in intuitionistic logic, and programs having that type as "proofs". (Namely, if you have something of type A -> B, you have a proof that takes some premise of type A and produces a proof of B.)
The fact that you're perturbed by having arrows baked into the language might mean that you're not fundamentally grasping why they're so tied to the design of the language; perhaps it's time to read a few chapters from Ben Pierce's "Types and Programming Languages".
Edit: You can always look at languages which don't have a strong notion of functions and have their semantics defined some other way -- such as Forth or PostScript -- but in these languages you don't define inductive data types in the same way as in functional languages like Haskell, ML, or Coq. To put it another way, in any language in which you define constructors for datatypes, arrows arise naturally from the constructors of these types. But in languages where you don't define inductive datatypes in the typical way, you don't get arrow types as naturally, because the language just doesn't work that way.
Another edit: I will stick in one more comment, since I thought of it last night. Function types (and function abstraction) form the basis of pretty much all programming languages -- at least at some level, even if it's "under the hood." However, there are languages designed to define the semantics of other languages. While this doesn't strictly match what you're talking about, PLT Redex is one such system, used for specifying and debugging the semantics of programming languages. It's not super useful from a practitioner's perspective (unless your goal is to design new languages, in which case it is fairly useful), but maybe that fits what you want.
Do you mean meta-circular evaluators like in SICP? Being able to write your own DSL? If you create your own "function type", you'll have to take care of "applying" it, yourself.
Just as an example, you could create your own "function" in C for instance, with a look-up table holding function pointers, and use integers as functions. You'd have to provide your own "call" function for such "functions", of course:
typedef void (*func_t)(int);   /* the representation of a "function" */
extern func_t lookup_table[];  /* populated elsewhere */

void call(unsigned int function, int data) {
    lookup_table[function](data); /* "apply" the integer-encoded function */
}
You'd also probably want some means of creating more complex functions from primitive ones, for instance using arrays of ints to signify sequential execution of your "primitive functions" 1, 2, 3, ..., and you would end up inventing a whole new language for yourself.
I think early assemblers had no ability to create callable "macros" and had to use GOTO.
You could use trampolining to simulate function calls. You could have only a global variable store, with shallow binding perhaps. In such a language "functions" would be definable, though not a primitive type.
So having functions in a language is not necessary, though it is convenient.
In Common Lisp defun is nothing but a macro associating a name and a callable object (though lambda is still a built-in). In AutoLisp originally there was no special function type at all, and functions were represented directly by quoted lists of s-expressions, with first element an arguments list. You can construct your function through use of cons and list functions, from symbols, directly, in AutoLisp:
(setq a (list (cons 'x NIL) '(+ 1 x)))
(a 5)
==> 6
Some languages (like Python) support more than one primitive function type, each with its own calling protocol - namely, generators support multiple re-entry and returns (even if, syntactically, through the use of the same def keyword). You can easily imagine a language which would let you define your own calling protocol, thus creating new function types.
Edit: as an example, consider dealing with multiple arguments in a function call, and the choice between automatic currying and automatic optional args, etc. In Common Lisp, say, you could easily create two different call macros to directly represent the two calling protocols. Consider functions returning multiple values not through a kludge of aggregates (tuples, in Haskell), but directly into designated recipient vars/slots. All are different types of functions.
Function definition is usually primitive because (a) functions are how programmes get things done; and (b) this sort of lambda-abstraction is necessary to be able to programme in a pointful style (i.e. with explicit arguments).
Probably the closest you will come to a language that meets your criteria is one based on a purely pointfree model which allows you to create your own lambda operator. You might like to explore pointfree languages in general, and ones based on SKI calculus in particular: http://en.wikipedia.org/wiki/SKI_combinator_calculus
In such a case, you still have primitive function types, and you always will, because it is a fundamental element of the type system. If you want to get away from that at all, probably the best you could do would be some kind of type system based on a category-theoretic generalisation of functions, such that functions would be a special case of another type. See http://en.wikipedia.org/wiki/Category_theory.
Which languages don't have -> as a primitive type?
Well, if you mean a type that can be named, then there are many languages that don't have them. All languages where functions are not first-class citizens don't have -> as a type you could mention somewhere.
But, as @Kristopher eloquently and excellently explained, functions are (or can, at least, be perceived as) the very basic building blocks of all computation. Hence even in Java, say, there are functions, but they are carefully hidden from you.
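As a sketch of that hiding, in (pre-lambda) Java you can hand-roll your own "arrow" as an ordinary interface (the Fn name is illustrative); application becomes an explicit method call:
// A user-defined function type: apply takes an A and yields a B.
interface Fn<A, B> {
    B apply(A a);
}

public class ArrowDemo {
    public static void main(String[] args) {
        Fn<Integer, Integer> inc = new Fn<Integer, Integer>() {
            public Integer apply(Integer x) { return x + 1; }
        };
        System.out.println(inc.apply(41)); // 42
    }
}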
And, as someone mentioned assembler - one could maintain that the machine language (of most contemporary computers) is an approximation of the register machine model. But how is it done? With millions and billions of logical circuits, each of them a materialization of quite primitive pure functions like NOT or NAND, arranged in a certain physical order (which is, obviously, the way hardware engineers implement function composition).
Hence, while you may not see functions in machine code, they're still the basis.
In Martin-Löf type theory, function types are defined via indexed product types (so-called Π-types).
Basically, the type of functions from A to B can be interpreted as a (possibly infinite) record, where all the fields are of the same type B, and the field names are exactly all the elements of A. When you need to apply a function f to an argument x, you look up the field in f corresponding to x.
The wikipedia article lists some programming languages that are based on Martin-Löf type theory. I am not familiar with them, but I assume that they are a possible answer to your question.
Philip Wadler's paper Call-by-value is dual to call-by-name presents a calculus in which variable abstraction and covariable abstraction are more primitive than function abstraction. Two definitions of function types in terms of those primitives are provided: one implements call-by-value, and the other call-by-name.
Inspired by Wadler's paper, I implemented a language (Ambidexter) which provides two function type constructors that are synonyms for types constructed from the primitives. One is for call-by-value and one for call-by-name. Neither Wadler's dual calculus nor Ambidexter provides user-defined type constructors. However, these examples show that function types are not necessarily primitive, and that a language in which you can define your own (->) is conceivable.
In Scala you can mix in one of the Function traits, e.g. a Set[A] can be used as an A => Boolean because it implements the Function1[A, Boolean] trait. Another example is PartialFunction[A, B], which extends usual functions by providing a "range-check" method, isDefinedAt.
However, in Scala methods and functions are different, and there is no way to change how methods work. Usually you don't notice the difference, as methods are automatically lifted to functions.
So you have a lot of control over how you implement and extend functions in Scala, but I think you have a real "replacement" in mind. I'm not sure that even makes sense.
Or maybe you are looking for languages with some kind of generalization of functions? Then Haskell with Arrow syntax would qualify: http://www.haskell.org/arrows/syntax.html
I suppose the dumb answer to your question is assembly code. This provides you with primitives even "lower" level than functions. You can create functions as macros that make use of register and jump primitives.
Most sane programming languages will give you a way to create functions as a baked-in language feature, because functions (or "subroutines") are the essence of good programming: code reuse.

What is ADT? (Abstract Data Type)

I am currently studying Abstract Data Types (ADTs) but I don't get the concept at all. Can someone please explain to me what this actually is? Also, what are the Collection, Bag, and List ADTs, in simple terms?
An Abstract Data Type (ADT) is a data type where only behavior is defined, not implementation.
The opposite of an ADT is a Concrete Data Type (CDT), which contains an implementation of an ADT.
Examples:
Array, List, Map, Queue, Set, Stack, Table, Tree, and Vector are ADTs. Each of these ADTs has many implementations, i.e. CDTs. A container is a high-level ADT encompassing all of the above.
Real life example:
A book is abstract (a telephone book is an implementation)
The Abstract data type Wikipedia article has a lot to say.
In computer science, an abstract data type (ADT) is a mathematical model for a certain class of data structures that have similar behavior; or for certain data types of one or more programming languages that have similar semantics. An abstract data type is defined indirectly, only by the operations that may be performed on it and by mathematical constraints on the effects (and possibly cost) of those operations.
In slightly more concrete terms, you can take Java's List interface as an example. The interface doesn't explicitly define any behavior at all because there is no concrete List class. The interface only defines a set of methods that other classes (e.g. ArrayList and LinkedList) must implement in order to be considered a List.
A collection is another abstract data type. In the case of Java's Collection interface, it's even more abstract than List, since
The List interface places additional stipulations, beyond those specified in the Collection interface, on the contracts of the iterator, add, remove, equals, and hashCode methods.
A bag is also known as a multiset.
In mathematics, the notion of multiset (or bag) is a generalization of the notion of set in which members are allowed to appear more than once. For example, there is a unique set that contains the elements a and b and no others, but there are many multisets with this property, such as the multiset that contains two copies of a and one of b or the multiset that contains three copies of both a and b.
In Java, a Bag would be a collection that implements a very simple interface. You only need to be able to add items to a bag, check its size, and iterate over the items it contains. See Bag.java for an example implementation (from Sedgewick & Wayne's Algorithms 4th edition).
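A minimal sketch of such a bag in Java (the class name is illustrative, modeled loosely on the Sedgewick & Wayne interface): you can add, count, and iterate, but never remove.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SimpleBag<T> implements Iterable<T> {
    private final List<T> items = new ArrayList<>();

    public void add(T item) { items.add(item); } // add only: no remove operation
    public int size() { return items.size(); }
    public Iterator<T> iterator() { return items.iterator(); }
}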
A truly abstract data type describes the properties of its instances without commitment to their representation or particular operations. For example the abstract (mathematical) type Integer is a discrete, unlimited, linearly ordered set of instances. A concrete type gives a specific representation for instances and implements a specific set of operations.
Notation of Abstract Data Type (ADT)
An abstract data type could be defined as a mathematical model with a
collection of operations defined on it. A simple example is the set of
integers together with the operations of union, intersection defined
on the set.
ADTs are generalizations of primitive data types (integer, char,
etc.), and they encapsulate a data type in the sense that the definition
of the type and all operations on that type are localized to one section
of the program. They are treated as a primitive data type outside the
section in which the ADT and its operations are defined.
An implementation of an ADT is the translation into statements of
a programming language of the declaration that defines a variable to
be of that ADT, plus a procedure in that language for each
operation of that ADT. The implementation of the ADT chooses a
data structure to represent the ADT.
A useful tool for specifying the logical properties of data type is
the abstract data type. Fundamentally, a data type is a collection of
values and a set of operations on those values. That collection and
those operations form a mathematical construct that may be implemented
using a particular hardware and software data structure. The term
"abstract data type" refers to the basic mathematical concept that defines the data type.
In defining an abstract data type as a mathematical concept, we are not
concerned with space or time efficiency. Those are implementation
issues. In fact, the definition of an ADT is not concerned with
implementation details at all. It may not even be possible to implement
a particular ADT on a particular piece of hardware or using a
particular software system. For example, we have already seen that an
ADT integer is not universally implementable.
To illustrate the concept of an ADT and my specification method,
consider the ADT RATIONAL which corresponds to the mathematical
concept of a rational number. A rational number is a number that can
be expressed as the quotient of two integers. The operations on
rational numbers that we define are the creation of a rational number
from two integers, addition, multiplication, and testing for equality.
The following is an initial specification of this ADT.
/* Value definition */
abstract typedef <integer, integer> RATIONAL;
condition RATIONAL[1] != 0;

/* Operator definition */
abstract RATIONAL makerational(a, b)
int a, b;
precondition b != 0;
postcondition makerational[0] == a;
              makerational[1] == b;

abstract RATIONAL add(a, b)
RATIONAL a, b;
postcondition add[1] == a[1] * b[1]
              add[0] == a[0] * b[1] + b[0] * a[1]

abstract RATIONAL mult(a, b)
RATIONAL a, b;
postcondition mult[0] == a[0] * b[0]
              mult[1] == a[1] * b[1]

abstract equal(a, b)
RATIONAL a, b;
postcondition equal == (a[0] * b[1] == b[0] * a[1])
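One possible Java rendering of the RATIONAL specification above (names follow the spec; the representation and error handling are implementation choices, not part of the ADT):
public final class Rational {
    private final int num; // RATIONAL[0]
    private final int den; // RATIONAL[1]; never 0 (the condition clause)

    private Rational(int num, int den) {
        this.num = num;
        this.den = den;
    }

    // makerational: precondition b != 0
    public static Rational makeRational(int a, int b) {
        if (b == 0) throw new IllegalArgumentException("denominator must not be 0");
        return new Rational(a, b);
    }

    // add[0] == a[0]*b[1] + b[0]*a[1]; add[1] == a[1]*b[1]
    public Rational add(Rational other) {
        return new Rational(num * other.den + other.num * den, den * other.den);
    }

    // mult[0] == a[0]*b[0]; mult[1] == a[1]*b[1]
    public Rational mult(Rational other) {
        return new Rational(num * other.num, den * other.den);
    }

    // equal == (a[0]*b[1] == b[0]*a[1])
    public boolean equal(Rational other) {
        return num * other.den == other.num * den;
    }
}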
An ADT consists of two parts:
1) Value definition
2) Operator definition
1) Value Definition:
The value definition defines the collection of values for the ADT and
consists of two parts:
1) Definition Clause
2) Condition Clause
For example, the value definition for the ADT RATIONAL states that
a RATIONAL value consists of two integers, the second of which does
not equal 0.
The keyword abstract typedef introduces a value definitions and the
keyword condition is used to specify any conditions on the newly
defined data type. In this definition the condition specifies that the
denominator may not be 0. The definition clause is required, but the
condition may not be necessary for every ADT.
2) Operator Definition:
Each operator is defined as an abstract function with three parts:
1)Header
2)Optional Preconditions
3)Optional Postconditions
For example, the operator definition of the ADT RATIONAL includes the
operations of creation (makerational), addition (add) and
multiplication (mult), as well as a test for equality (equal). Let us
consider the specification for multiplication first, since it is the
simplest. It contains a header and post-conditions, but no
pre-conditions.
abstract RATIONAL mult(a, b)
RATIONAL a, b;
postcondition mult[0] == a[0] * b[0]
              mult[1] == a[1] * b[1]
The header of this definition is the first two lines, which are just
like a C function header. The keyword abstract indicates that it is
not a C function but an ADT operator definition.
The post-condition specifies what the operation does. In a
post-condition, the name of the function (in this case, mult) is used
to denote the result of the operation. Thus, mult[0] represents the
numerator of the result and mult[1] represents the denominator of the
result. That is, it specifies what conditions become true after the
operation is executed. In this example the post-condition specifies
that the numerator of the result of a rational multiplication equals
the integer product of the numerators of the two inputs, and the
denominator equals the integer product of the two denominators.
List
In computer science, a list or sequence is an abstract data type that
represents a countable number of ordered values, where the same value
may occur more than once. An instance of a list is a computer
representation of the mathematical concept of a finite sequence; the
(potentially) infinite analog of a list is a stream. Lists are a basic
example of containers, as they contain other values. If the same value
occurs multiple times, each occurrence is considered a distinct item.
The name list is also used for several concrete data structures that
can be used to implement abstract lists, especially linked lists.
Bag
A bag is a collection of objects, where you can keep adding objects to
the bag, but you cannot remove them once added to the bag. So with a
bag data structure, you can collect all the objects and then iterate
through them. You will use bags routinely when you program in Java.
Collection
A collection in the Java sense refers to any class that implements the
Collection interface. A collection in a generic sense is just a group
of objects.
Actually, an Abstract Data Type is:
A concept or theoretical model that defines a data type logically
It specifies the set of data and the set of operations that can be performed on that data
It does not mention anything about how the operations will be implemented
It is "existing as an idea but not having a physical existence"
For example, let's see specifications of some Abstract Data Types (a Java sketch of the Stack ADT follows this list):
List Abstract Data Type: initialize(), get(), insert(), remove(), etc.
Stack Abstract Data Type: push(), pop(), peek(), isEmpty(), isNull(), etc.
Queue Abstract Data Type: enqueue(), dequeue(), size(), peek(), etc.
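A minimal Java sketch of the split between the specification and an implementation (StackADT and LinkedStack are illustrative names):
// The ADT: what a stack does.
interface StackADT<T> {
    void push(T item);
    T pop();
    T peek();
    boolean isEmpty();
}

// One concrete data type: how this stack does it (a linked list).
class LinkedStack<T> implements StackADT<T> {
    private static final class Node<E> {
        final E value;
        final Node<E> next;
        Node(E value, Node<E> next) { this.value = value; this.next = next; }
    }

    private Node<T> top;

    public void push(T item) { top = new Node<>(item, top); }

    public T pop() {
        if (top == null) throw new java.util.NoSuchElementException("empty stack");
        T value = top.value;
        top = top.next;
        return value;
    }

    public T peek() {
        if (top == null) throw new java.util.NoSuchElementException("empty stack");
        return top.value;
    }

    public boolean isEmpty() { return top == null; }
}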
One of the simplest explanations is given on Brilliant's wiki:
Abstract data types, commonly abbreviated ADTs, are a way of
classifying data structures based on how they are used and the
behaviors they provide. They do not specify how the data structure
must be implemented or laid out in memory, but simply provide a
minimal expected interface and set of behaviors. For example, a stack
is an abstract data type that specifies a linear data structure with
LIFO (last in, first out) behavior. Stacks are commonly implemented
using arrays or linked lists, but a needlessly complicated
implementation using a binary search tree is still a valid
implementation. To be clear, it is incorrect to say that stacks are
arrays or vice versa. An array can be used as a stack. Likewise, a
stack can be implemented using an array.
Since abstract data types don't specify an implementation, this means
it's also incorrect to talk about the time complexity of a given
abstract data type. An associative array may or may not have O(1)
average search times. An associative array that is implemented by a
hash table does have O(1) average search times.
Examples of ADTs: List (can be implemented using an Array or a LinkedList), Queue, Deque, Stack, Associative array, Set.
https://brilliant.org/wiki/abstract-data-types/?subtopic=types-and-data-structures&chapter=abstract-data-types
ADTs are a set of data values and associated operations that are precisely independent of any particular implementation. The strength of an ADT is that the implementation is hidden from the user; only the interface is declared. This means that an ADT can be implemented in various ways.
An Abstract Data Type is a mathematical model of data together with various operations defined on it. Implementation details are hidden, and that's why it is called abstract. Abstraction allows you to organise the complexity of the task by focusing on the logical properties of data and actions.
In programming languages, a type is some data and the associated operations. An ADT is a user-defined data aggregate together with the operations over that data. It is characterized by encapsulation (the data and operations are represented, or at least declared, in a single syntactic unit) and information hiding (only the relevant operations are visible to the user of the ADT, its interface, in the same way as for a normal data type in the programming language). It's an abstraction because the internal representation of the data and the implementation of the operations are of no concern to the ADT user.
Before defining abstract data types, let us consider the different
view of system-defined data types. We all know that by default all
primitive data types (int, float, etc.) support basic operations such
as addition and subtraction. The system provides the implementations
for the primitive data types. For user-defined data types, we also
need to define operations. The implementation for these operations can
be done when we want to actually use them. That means in general,
user-defined data types are defined along with their operations.
To simplify the process of solving problems, we combine the data
structures with their operations and we call this "Abstract Data
Type". (ADT's).
Commonly used ADT'S include: Linked List, Stacks, Queues, Binary Tree,
Dictionaries, Disjoint Sets (Union and find), Hash Tables and many
others.
ADTs consist of two parts:
1. Declaration of data.
2. Declaration of operations.
Simply put, an Abstract Data Type is nothing but a set of operations and a set of data, used for storing some other data efficiently in the machine.
No particular type declaration is needed;
it just requires an implementation of the ADT.
To solve problems we combine the data structure with their operations. An ADT consists of two parts:
Declaration of Data.
Declaration of Operation.
Commonly used ADTs are Linked Lists, Stacks, Queues, Priority Queues, Trees, etc. While defining ADTs we don't need to worry about implementation details. They come into the picture only when we want to use them.
Abstract data types are like user-defined data types on which we can perform functions without knowing what is inside the data type or how the operations are performed on it. As the information is not exposed, it's abstracted. E.g. List, Array, Stack, Queue. On a Stack we can perform functions like push and pop, but we are not sure how they're implemented behind the curtains.
An ADT is a set of objects and operations; nowhere in an ADT's definition is there any mention of how the set of operations is implemented. Programmers who use collections only need to know how to instantiate and access data in some pre-determined manner, without concern for the details of the collection's implementation. In other words, from a user's perspective, a collection is an abstraction, and for this reason, in computer science, some collections are referred to as abstract data types (ADTs). The user is only concerned with learning its interface, or the set of operations it performs.
In a simple word: an abstract data type is a collection of data and operations that work on that data. The operations both describe the data to the rest of the program and allow the rest of the program to change the data. The word "data" in "abstract data type" is used loosely. An ADT might be a graphics window with all the operations that affect it, a file and file operations, an insurance-rates table and the operations on it, or something else.
from the Code Complete 2 book
Abstract data type is the collection of values and any kind of operation on these values. For example, since String is not a primitive data type, we can include it in abstract data types.
An ADT is a data type in which a collection of data and the operations on that data work together. It focuses on the concept more than the implementation; it's up to you which language you use to make it concrete.
Example:
A Stack is an ADT while an Array is not. A Stack is an ADT because we can implement it in many languages (Python, C, C++, Java, and many more), while an Array is a built-in data type.
An abstract data type, sometimes abbreviated ADT, is a logical description of how we view the data and the operations that are allowed without regard to how they will be implemented. This means that we are concerned only with what the data is representing and not with how it will eventually be constructed.
https://runestone.academy/runestone/books/published/pythonds/Introduction/WhyStudyDataStructuresandAbstractDataTypes.html
Abstractions give you only information(service information) but not implementation.
For example, when you go to withdraw money from an ATM, you know just one thing: put your card into the machine, click the withdraw option, enter the amount, and your money comes out if there is money.
This is all you know about the ATM. But do you know how you are receiving money? What business logic is going on behind it? Which database is being called? Which server at which location is being invoked? No, you know only the service information, i.e. that you can withdraw money. This is an abstraction.
Similarly, ADT gives you an overview of data types: what they are / can be stored and what operations you can perform on those data types. But it doesn’t provide how to implement that. This is ADT. It only defines the logical form of your data types.
Another analogy is :
In a car or bike, you only know that when you press the brake your vehicle will stop. But do you know how the bike stops when you press the brake? No; the implementation detail is hidden. You only know what the brake does when you press it, not how it does it.
An abstract data type(ADT) is an abstraction of a data structure that provides only the interface to which the data structure must adhere. The interface does not give any specific details about how something should be implemented or in what programming language.
The term data type refers to the type of data which a particular variable can hold - it may be an integer, a character, a float, or any range of simple data storage representations. However, when we build an object-oriented system, we use other data types, known as abstract data types, which represent more realistic entities.
E.g.: we might be interested in representing a 'bank account' data type, which describes how all bank accounts are handled in a program. Abstraction is about reducing complexity and ignoring unnecessary details.

Is there any advantage in using Vector.<Object> in place of a standard Array?

Because of the inability to create Vectors dynamically, I'm forced to create one with a very primitive type, i.e. Object:
var list:Vector.<Object> = new Vector.<Object>();
I'm assuming that Vector gains its power from being typed as closely as possible, rather than the above, but I may be wrong and there are in fact still gains when using the above in place of a normal Array or Object:
var list:Array = [];
var list:Object = {};
Does anyone have any insight on this?
You will not gain any benefits from Vector.<Object> compared to Array, or vice versa. The underlying data structure will be the same even if you have a more tightly typed Vector such as Vector.<Foo>. The only optimization gains will come from using value types. The reason for this is that ECMAScript is still late-binding and all reference objects share the same referencing byte structure.
However, in ECMAScript 4 (of which ActionScript 3 is an implementation) the Vector generic datatype adds bounds checking to element access (a non-Vector array will simply grow), so the functionality varies slightly and consequently the number of CPU clock cycles will vary a little. This is negligible, however.
One advantage I've seen is that coding is a bit easier with Vectors, because FlashDevelop (and most coding tools for AS3) can do code hinting better: I can type myVector. and see its methods and functions, while an Array won't let you do that without casting, e.g. myArr[2] as MyObject (though this kind of casting is rumoured to make it faster, not slower).
Array's sort functions are faster, however; but if it is speed you're after, you might be better served by linked lists (depending on the application).
I think using Vectors is the proper way to be coding, but not necessarily better.
Excellent question - Vectors have tremendous value! Vector.<Object> vs Array is a bad example of the differences, though, and benchmarks may be similar. However, Vector.<int> vs Array is DEFINITELY better for both memory and processing. The speed improvement comes from Flash not needing to "box" and "unbox" the values (multiple mathematical operations are required for this). Also, Array cannot allocate memory as effectively as a typed Vector. Strictly-typed collections are almost always better.
Benchmarks:
http://jacksondunstan.com/articles/636
http://www.mikechambers.com/blog/2008/09/24/actioscript-3-vector-array-performance-comparison/
Even .NET suffers from boxing collections (Array):
http://msdn.microsoft.com/en-us/library/ms173196.aspx
UPDATE:
I've been corrected! Only primitive numeric types get a performance enhancement
from Vectors. You won't see any improvement with Array vs Vector.<Object>.

Is my understanding of type systems correct?

The following statements represent my understanding of type systems (which suffers from too little hands-on experience outside the Java world); please correct any errors.
The static/dynamic distinction seems pretty clear-cut:
Statically typed languages assign each variable, field and parameter a type, and the compiler prevents assignments between incompatible types. Examples: C, Java, Pascal.
Dynamically typed languages treat variables as generic bins that can hold anything you want - types are checked (if at all) only at runtime when you actually perform operations on the values, not when you assign them. Examples: Smalltalk, Python, JavaScript.
Type inference allows statically typed languages to look like (and have some of the advantages of) dynamically typed ones, by inferring types from the context so that you don't have to declare them most of the time - but unlike in dynamic languages, you cannot e.g. use a variable to hold a string initially and then assign an integer to it. Examples: Haskell, Scala
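For instance, Java's local-variable type inference (var, since Java 10) shows this restriction in a statically typed setting:
public class InferenceDemo {
    public static void main(String[] args) {
        var s = "hello"; // statically inferred as String, once
        s = "world";     // fine: still a String
        // s = 42;       // compile-time error: an int cannot be assigned to a String
        System.out.println(s);
    }
}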
I am much less certain about the strong/weak distinction, and I suspect that it's not very clearly defined:
Strongly typed languages assign each runtime value a type and only allow operations to be performed that are defined for that type, otherwise there is an explicit type error.
Weakly typed languages don't have runtime type checks - if you try to perform an operation on a value that it does not support, the results are unpredictable. It may actually do something useful, but more likely you'll get corrupted data, a crash, or some undecipherable secondary error.
There seems to be at least two different kinds of weakly typed languages (or perhaps a continuum):
In C and assembler, values are basically buckets of bits, so anything is possible and if you get the compiler to dereference the first 4 bytes of a null-terminated string, you better hope it leads somewhere that does not contain legal machine code.
PHP and JavaScript are also generally considered weakly typed, but do not consider values to be opaque bit buckets; they will, however, perform implicit type conversions.
But these implicit conversions seem to apply mainly to string/integer/float variables - does that really warrant the classification as weakly typed? Or are there other issues where these languages' type systems may obfuscate errors?
I am much less certain about the strong/weak distinction, and I suspect that it's not very clearly defined.
You are right: it isn't.
This is what Benjamin C. Pierce, author of Types and Programming Languages and Advanced Types and Programming Languages has to say:
I spent a few weeks... trying to sort out the terminology of "strongly typed," "statically typed," "safe," etc., and found it amazingly difficult.... The usage of these terms is so various as to render them almost useless.
Luca Cardelli, in his Typeful Programming article, defines it as the absence of unchecked run-time type errors. Tony Hoare calls that exact same property "security". Other papers call it "type safety" or simply "safety".
Mark-Jason Dominus wrote a classic rant about this a couple of years ago on the comp.lang.perl.moderated newsgroup, in a discussion about whether or not Perl was strongly typed. In this rant he states that within just a few hours of research, he was able to find 8 different, sometimes contradictory definitions, mostly from respected sources like college textbooks or peer-reviewed papers. In particular, those texts contained examples that were meant to help the students distinguish between strongly and weakly typed languages, and according to those examples, C is strongly typed, C is weakly typed, C++ is strongly typed, C++ is weakly typed, Lisp is strongly typed, Lisp is weakly typed, Perl is strongly typed, Perl is weakly typed. (Does that clear up any confusion?)
The only definition that I have seen consistently applied is:
strongly typed: my programming language
weakly typed: your programming language
Regarding static and dynamic typing you are dead on the money. Static typing means that programs are checked before being executed, and a program might be rejected before it starts. Dynamic typing means that the types of values are checked during execution, and a poorly typed operation might cause the program to halt or otherwise signal an error at run time. A primary reason for static typing is to rule out programs that might have such "dynamic type errors".
Bob Harper has argued that a dynamically typed language can (and should) be considered to be a statically typed language with a single type, which Bob calls "value". This view is fair, but it's helpful only in limited contexts, such as trying to be precise about the type theory of languages.
Although I think you grasp the concept, your bullets do not make it clear that type inference is simply a special case of static typing. In most languages with type inference, type annotations are optional, but not necessarily in all contexts. (Example: signatures in ML.) Advanced static type systems often give you a tradeoff between annotations and inference; for example, in Haskell you can type polymorphic functions of higher rank (forall to the left of an arrow), but only with an annotation. So, if you are willing to add an annotation, you can get the compiler to accept a program that would be rejected without the annotation. I think this is the wave of the future in type inference.
The ideas of "strong" and "weak" typing I would characterize as not useful, because they don't have a universally agreed on technical meaning. Strong typing generally means that there are no loopholes in the type system, whereas weak typing means the type system can be subverted (invalidating any guarantees). The terms are often used incorrectly to mean static and dynamic typing. To see the difference, think of C: the language is type-checked at compile time (static typing), but there are plenty of loopholes; you can pretty much cast a value of any type to another type of the same size—in particular, you can cast pointer types freely. Pascal was a language that was intended to be strongly typed but famously had an unforeseen loophole: a variant record with no tag.
Implementations of strongly typed languages often acquire loopholes over time, usually so that part of the run-time system can be implemented in the high-level language. For example, Objective Caml has a function called Obj.magic which has the run-time effect of simply returning its argument, but at compile time it converts a value of any type to one of any other type. My favorite example is Modula-3, whose designers called their type-casting construct LOOPHOLE.
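Java's erasure-based generics offer a comparable loophole through raw types; a sketch (it compiles with unchecked warnings and fails only at run time):
import java.util.ArrayList;
import java.util.List;

public class LoopholeDemo {
    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        List raw = ints;         // raw type: the static checker loses track
        raw.add("not an int");   // compiles (with a warning), corrupting the list
        Integer i = ints.get(0); // throws ClassCastException at run time
        System.out.println(i);
    }
}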
I encourage you to avoid the terms "strong" and "weak" with regard to type systems, and instead say precisely what you mean, e.g., "the type system guarantees that the following class of errors cannot occur at run time" (strong), "the static type system does not protect against certain run-time errors" (weak), or "the type system has a loophole" (weak). Just calling a type system "strong" or "weak" by itself does not communicate very much.
This is a pretty accurate reflection of my own understanding of the topic of the static/dynamic, strong/weak typing discussion. In addition, you can consider those other languages:
In languages such as TCL and Bourne Shell, the "main" value type is the string. Numeric operators are available that implicitly coerce input values from string representation and result values to string representation. They can be considered examples of dynamic, weakly typed languages.
Forth may be an example of a static, weakly typed language. The language performs no type checking of its own, and the main stack may interchangeably contain pointers, integers, strings (conventionally represented as two cells, start and length). Inconsistent use of operators can lead to either interesting, or unspecified behavior. Typical Forth implementations provide a separate stack for floating point numbers.
Maybe this Book can help. Be prepared for some math though. If I remember correctly, a "non-math" statement was: "Strongly typed: A language that I feel safe to program with".
There seems to be at least two different kinds of weakly typed languages (or perhaps a continuum):
In C and assembler, values are basically buckets of bits, so anything is possible and if you get the compiler to dereference the first 4 bytes of a null-terminated string, you better hope it leads somewhere that does not contain legal machine code.
I would disagree with this statement, at least in C. You can manipulate the type system in C in such a way that you can treat any given memory location as a bucket of bits, but a variable most definitely has a type and that type has specific properties. The fact that there are no runtime checks (unless you consider floating point exceptions or segmentation faults to be runtime checks) isn't really relevant. C can be considered "weakly typed" in the sense that the compiler will perform some implicit type conversion for you, but it doesn't go very far with it.
I consider strong/weak to be about implicit conversion, and a good example is addition of a string and a number. In a strongly typed language the conversion won't happen (at least in all languages I can think of) and you'll get an error. Weakly typed languages like VB (with Option Strict Off) and JavaScript will try to cast one of the operands to the other type.
In VB.Net with Option Strict Off:
Dim A As String = "5"
Dim B As Integer = 5
Trace.WriteLine(A + B) 'returns 10
With Option Strict On (turning VB into a strongly typed language) you'll get a compiler error.
In Javascript:
var A = '5';
var B = 5;
alert(A + B);//returns 55
Some people will say that the results are not predictable but they actually do follow a set of rules.
Hmm, I don't know much more either, but I wanted to mention C++ and its implicit conversions (implicit constructors). This might as well be an example of weak typing.
I agree with the others who say "there doesn't seem to be a hard and fast definition here." My answer tends to be based on how much rope the language gives you WRT types. If you can pretty much fake anything you want, then it's weak. If it really doesn't let you get yourself into trouble, even if you want to, it's strong.
I really haven't seen too many languages that skirt this border, so I can't say that I've ever needed a better definition than that...