What is ADT? (Abstract Data Type) - language-agnostic

I am currently studying Abstract Data Types (ADTs), but I don't get the concept at all. Can someone please explain what this actually is? Also, what are the Collection, Bag, and List ADTs, in simple terms?

An Abstract Data Type (ADT) is a data type for which only the behavior is defined, not the implementation.
The opposite of an ADT is a Concrete Data Type (CDT), which contains an implementation of an ADT.
Examples:
Array, List, Map, Queue, Set, Stack, Table, Tree, and Vector are ADTs. Each of these ADTs has many implementations, i.e. CDTs. Container is a higher-level ADT above all of these ADTs.
Real-life example:
A book is abstract (a telephone book is an implementation).

The Abstract data type Wikipedia article has a lot to say.
In computer science, an abstract data type (ADT) is a mathematical model for a certain class of data structures that have similar behavior; or for certain data types of one or more programming languages that have similar semantics. An abstract data type is defined indirectly, only by the operations that may be performed on it and by mathematical constraints on the effects (and possibly cost) of those operations.
In slightly more concrete terms, you can take Java's List interface as an example. The interface doesn't explicitly define any behavior at all because there is no concrete List class. The interface only defines a set of methods that other classes (e.g. ArrayList and LinkedList) must implement in order to be considered a List.
A collection is another abstract data type. In the case of Java's Collection interface, it's even more abstract than List, since
The List interface places additional stipulations, beyond those specified in the Collection interface, on the contracts of the iterator, add, remove, equals, and hashCode methods.
A bag is also known as a multiset.
In mathematics, the notion of multiset (or bag) is a generalization of the notion of set in which members are allowed to appear more than once. For example, there is a unique set that contains the elements a and b and no others, but there are many multisets with this property, such as the multiset that contains two copies of a and one of b or the multiset that contains three copies of both a and b.
In Java, a Bag would be a collection that implements a very simple interface. You only need to be able to add items to a bag, check its size, and iterate over the items it contains. See Bag.java for an example implementation (from Sedgewick & Wayne's Algorithms 4th edition).
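To give a rough idea of how small that interface is, here is a minimal Bag sketch in Java. This is not the Sedgewick & Wayne implementation (theirs is backed by a linked list); it simply delegates to an ArrayList, but it exposes the same tiny set of operations: add, size, and iteration.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// A generic Bag: items can be added and iterated over, but never removed.
public class Bag<T> implements Iterable<T> {
    private final List<T> items = new ArrayList<>();

    public void add(T item) { items.add(item); }

    public int size() { return items.size(); }

    public boolean isEmpty() { return items.isEmpty(); }

    @Override
    public Iterator<T> iterator() { return items.iterator(); }
}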

A truly abstract data type describes the properties of its instances without commitment to their representation or particular operations. For example the abstract (mathematical) type Integer is a discrete, unlimited, linearly ordered set of instances. A concrete type gives a specific representation for instances and implements a specific set of operations.

Notation of Abstract Data Types (ADTs)
An abstract data type can be defined as a mathematical model with a
collection of operations defined on it. A simple example is the set of
integers together with the operations of union and intersection defined
on the set.
ADTs are generalizations of primitive data types (integer, char,
etc.), and they encapsulate a data type in the sense that the definition
of the type and all operations on that type are localized to one section
of the program. They are treated as a primitive data type outside the
section in which the ADT and its operations are defined.
An implementation of an ADT is the translation into statements of
a programming language of the declaration that defines a variable to
be of that ADT, plus a procedure in that language for each
operation of that ADT. The implementation of the ADT chooses a
data structure to represent the ADT.
A useful tool for specifying the logical properties of a data type is
the abstract data type. Fundamentally, a data type is a collection of
values and a set of operations on those values. That collection and
those operations form a mathematical construct that may be implemented
using a particular hardware and software data structure. The term
"abstract data type" refers to the basic mathematical concept that defines the data type.
In defining an abstract data type as a mathematical concept, we are not
concerned with space or time efficiency. Those are implementation
issues. In fact, the definition of an ADT is not concerned with
implementation details at all. It may not even be possible to implement
a particular ADT on a particular piece of hardware or using a
particular software system. For example, we have already seen that an
ADT integer is not universally implementable.
To illustrate the concept of an ADT and my specification method,
consider the ADT RATIONAL, which corresponds to the mathematical
concept of a rational number. A rational number is a number that can
be expressed as the quotient of two integers. The operations on
rational numbers that we define are the creation of a rational number
from two integers, addition, multiplication, and testing for equality.
The following is an initial specification of this ADT.
/* Value definition */
abstract typedef <integer, integer> RATIONAL;
condition RATIONAL[1] != 0;

/* Operator definition */
abstract RATIONAL makerational(a, b)
int a, b;
precondition b != 0;
postcondition makerational[0] == a;
              makerational[1] == b;

abstract RATIONAL add(a, b)
RATIONAL a, b;
postcondition add[1] == a[1] * b[1];
              add[0] == a[0] * b[1] + b[0] * a[1];

abstract RATIONAL mult(a, b)
RATIONAL a, b;
postcondition mult[0] == a[0] * b[0];
              mult[1] == a[1] * b[1];

abstract equal(a, b)
RATIONAL a, b;
postcondition equal == (a[0] * b[1] == b[0] * a[1]);
An ADT consists of two parts:-
1) Value definition
2) Operation definition
1) Value Definition:-
The value definition defines the collection of values for the ADT and
consists of two parts:
1) Definition Clause
2) Condition Clause
For example, the value definition for the ADT RATIONAL states that
a RATIONAL value consists of two integers, the second of which is
not equal to 0.
The keyword abstract typedef introduces a value definition and the
keyword condition is used to specify any conditions on the newly
defined data type. In this definition the condition specifies that the
denominator may not be 0. The definition clause is required, but the
condition clause may not be necessary for every ADT.
2) Operator Definition:-
Each operator is defined as an abstract function with three parts.
1)Header
2)Optional Preconditions
3)Optional Postconditions
For example, the operator definition of the ADT RATIONAL includes the
operations of creation (makerational), addition (add) and
multiplication (mult) as well as a test for equality (equal). Let us
consider the specification for multiplication first, since it is the
simplest. It contains a header and post-conditions, but no
pre-conditions.
abstract RATIONAL mult(a, b)
RATIONAL a, b;
postcondition mult[0] == a[0] * b[0];
              mult[1] == a[1] * b[1];
The header of this definition is the first two lines, which are just
like a C function header. The keyword abstract indicates that it is
not a C function but an ADT operator definition.
The post-condition specifies what the operation does. In a
post-condition, the name of the function (in this case, mult) is used
to denote the result of the operation. Thus, mult[0] represents the
numerator of the result and mult[1] represents the denominator of the
result. That is, it specifies what conditions become true after the
operation is executed. In this example the post-condition specifies
that the numerator of the result of a rational multiplication equals
the integer product of the numerators of the two inputs, and the
denominator equals the integer product of the two denominators.
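As a hedged illustration, the RATIONAL specification above could be transcribed into Java roughly as follows. The class and method names mirror the specification; this is my own sketch, not the author's code, and it ignores overflow and normalization of fractions.
public final class Rational {
    private final int num; // corresponds to RATIONAL[0]
    private final int den; // corresponds to RATIONAL[1]; the condition clause requires den != 0

    // makerational: precondition b != 0
    public static Rational makerational(int a, int b) {
        if (b == 0) throw new IllegalArgumentException("denominator must not be 0");
        return new Rational(a, b);
    }

    private Rational(int a, int b) { this.num = a; this.den = b; }

    // add[0] == a[0]*b[1] + b[0]*a[1], add[1] == a[1]*b[1]
    public Rational add(Rational other) {
        return new Rational(this.num * other.den + other.num * this.den,
                            this.den * other.den);
    }

    // mult[0] == a[0]*b[0], mult[1] == a[1]*b[1]
    public Rational mult(Rational other) {
        return new Rational(this.num * other.num, this.den * other.den);
    }

    // equal == (a[0]*b[1] == b[0]*a[1]), i.e. comparison by cross-multiplication
    public boolean equal(Rational other) {
        return this.num * other.den == other.num * this.den;
    }
}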
List
In computer science, a list or sequence is an abstract data type that
represents a countable number of ordered values, where the same value
may occur more than once. An instance of a list is a computer
representation of the mathematical concept of a finite sequence; the
(potentially) infinite analog of a list is a stream. Lists are a basic
example of containers, as they contain other values. If the same value
occurs multiple times, each occurrence is considered a distinct item.
The name list is also used for several concrete data structures that
can be used to implement abstract lists, especially linked lists.
Bag
A bag is a collection of objects, where you can keep adding objects to
the bag, but you cannot remove them once added to the bag. So with a
bag data structure, you can collect all the objects, and then iterate
through them. You will bags normally when you program in Java.
Collection
A collection in the Java sense refers to any class that implements the
Collection interface. A collection in a generic sense is just a group
of objects.

Actually, an Abstract Data Type is:
A concept or theoretical model that defines a data type logically
It specifies a set of data and a set of operations that can be performed on that data
It does not mention anything about how the operations will be implemented
"Existing as an idea but not having a physical existence"
For example, let's look at the specifications of some Abstract Data Types (a Java sketch of one of them follows this list):
List Abstract Data Type: initialize(), get(), insert(), remove(), etc.
Stack Abstract Data Type: push(), pop(), peek(), isEmpty(), isNull(), etc.
Queue Abstract Data Type: enqueue(), dequeue(), size(), peek(), etc.
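To make that concrete, one of the specifications above can be written down as a Java interface. This is only a sketch of the idea: the operation names follow the list, and any backing data structure may implement it.
// The Queue ADT: what a queue does, with no commitment to how it is stored.
public interface QueueADT<T> {
    void enqueue(T item); // add an item at the back
    T dequeue();          // remove and return the item at the front
    T peek();             // return the front item without removing it
    int size();           // number of items currently in the queue
}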

One of the simplest explanations is given on Brilliant's wiki:
Abstract data types, commonly abbreviated ADTs, are a way of
classifying data structures based on how they are used and the
behaviors they provide. They do not specify how the data structure
must be implemented or laid out in memory, but simply provide a
minimal expected interface and set of behaviors. For example, a stack
is an abstract data type that specifies a linear data structure with
LIFO (last in, first out) behavior. Stacks are commonly implemented
using arrays or linked lists, but a needlessly complicated
implementation using a binary search tree is still a valid
implementation. To be clear, it is incorrect to say that stacks are
arrays or vice versa. An array can be used as a stack. Likewise, a
stack can be implemented using an array.
Since abstract data types don't specify an implementation, this means
it's also incorrect to talk about the time complexity of a given
abstract data type. An associative array may or may not have O(1)
average search times. An associative array that is implemented by a
hash table does have O(1) average search times.
Examples of ADTs: List (which can be implemented using an array or a linked list), Queue, Deque, Stack, Associative array, Set.
https://brilliant.org/wiki/abstract-data-types/?subtopic=types-and-data-structures&chapter=abstract-data-types
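To illustrate the stack example from that quote, here is a hedged Java sketch of my own: one Stack interface (the ADT) and one array-backed implementation. A linked-list-backed class implementing the same interface would be an equally valid stack, and callers could not tell the difference.
import java.util.ArrayList;
import java.util.List;

interface StackADT<T> {
    void push(T item);
    T pop();
    T peek();
    boolean isEmpty();
}

// One possible concrete data type for the Stack ADT, backed by a resizable array.
class ArrayStack<T> implements StackADT<T> {
    private final List<T> items = new ArrayList<>();

    public void push(T item) { items.add(item); }

    public T pop() {
        if (isEmpty()) throw new IllegalStateException("stack is empty");
        return items.remove(items.size() - 1); // LIFO: remove the most recently pushed item
    }

    public T peek() {
        if (isEmpty()) throw new IllegalStateException("stack is empty");
        return items.get(items.size() - 1);
    }

    public boolean isEmpty() { return items.isEmpty(); }
}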

ADTs are a set of data values and associated operations that are precisely independent of any particular implementation. The strength of an ADT is that the implementation is hidden from the user; only the interface is declared. This means that the ADT can be implemented in various ways.

An Abstract Data Type is a mathematical model that includes data together with various operations on that data. Implementation details are hidden, and that is why it is called abstract. Abstraction allows you to organize the complexity of a task by focusing on the logical properties of the data and its operations.

In programming languages, a type is some data and the operations associated with it. An ADT is a user-defined data aggregate together with the operations over that data. It is characterized by encapsulation (the data and the operations are represented, or at least declared, in a single syntactic unit) and by information hiding (only the relevant operations are visible to the user of the ADT, through the ADT interface, in the same way as for a built-in data type of the programming language). It's an abstraction because the internal representation of the data and the implementation of the operations are of no concern to the ADT user.

Before defining abstract data types, let us consider the different
views of system-defined data types. We all know that by default all
primitive data types (int, float, etc.) support basic operations such
as addition and subtraction. The system provides the implementations
for the primitive data types. For user-defined data types, we also
need to define operations. The implementation for these operations can
be done when we want to actually use them. That means in general,
user-defined data types are defined along with their operations.
To simplify the process of solving problems, we combine the data
structures with their operations, and we call this an "Abstract Data
Type" (ADT).
Commonly used ADTs include: Linked List, Stacks, Queues, Binary Tree,
Dictionaries, Disjoint Sets (union and find), Hash Tables and many
others.
An ADT consists of two parts:
1. Declaration of data.
2. Declaration of operations.

Simply put, an Abstract Data Type is nothing but a set of operations and a set of data, used for storing some other data efficiently in the machine.
It does not require any particular type declaration.
It just requires an implementation of the ADT.

To solve problems we combine the data structure with their operations. An ADT consists of two parts:
Declaration of Data.
Declaration of Operation.
Commonly used ADTs are Linked Lists, Stacks, Queues, Priority Queues, Trees, etc. While defining ADTs we don't need to worry about implementation details. They come into the picture only when we want to use them.

Abstract data types are like user-defined data types on which we can perform operations without knowing what is inside the data type or how the operations are performed on it. As the information is not exposed, it is abstracted. E.g. List, Array, Stack, Queue. On a Stack we can perform operations like push and pop, but we are not sure how they are implemented behind the curtain.

An ADT is a set of objects and operations; nowhere in an ADT's definition is there any mention of how the set of operations is implemented. Programmers who use collections only need to know how to instantiate and access data in some pre-determined manner, without concern for the details of the collection's implementation. In other words, from a user's perspective, a collection is an abstraction, and for this reason, in computer science, some collections are referred to as abstract data types (ADTs). The user is only concerned with learning the interface, or the set of operations it performs.

In simple words: an abstract data type is a collection of data and operations that work on that data. The operations both describe the data to the rest of the program and allow the rest of the program to change the data. The word "data" in "abstract data type" is used loosely. An ADT might be a graphics window with all the operations that affect it, a file and file operations, an insurance-rates table and the operations on it, or something else.
(from the book Code Complete 2)

An abstract data type is a collection of values and the operations on those values. For example, since String is not a primitive data type, we can consider it an abstract data type.

An ADT is a data type in which a collection of data and the operations that work on that data are combined. It focuses on the concept more than the implementation.
It's up to you which language you use to make it concrete.
Example:
Stack is an ADT, while an array is not.
Stack is an ADT because we can implement it in many languages (Python, C, C++, Java and many more), while an array is a built-in data type.

An abstract data type, sometimes abbreviated ADT, is a logical description of how we view the data and the operations that are allowed without regard to how they will be implemented. This means that we are concerned only with what the data is representing and not with how it will eventually be constructed.
https://runestone.academy/runestone/books/published/pythonds/Introduction/WhyStudyDataStructuresandAbstractDataTypes.html

Abstraction gives you only information (service information) but not the implementation.
For example: when you go to withdraw money from an ATM, you just know one thing: put your ATM card into the machine, click the withdraw option, enter the amount, and your money comes out if there is money.
This is all you know about ATMs. But do you know how you are receiving the money? What business logic is going on behind it? Which database is being called? Which server at which location is being invoked? No, you only know the service information, i.e. that you can withdraw money. This is abstraction.
Similarly, an ADT gives you an overview of a data type: what can be stored in it and what operations you can perform on it. But it doesn't say how to implement it. This is an ADT. It only defines the logical form of your data types.
Another analogy:
In a car or bike, you only know that when you press the brake your vehicle will stop. But do you know how the bike stops when you press the brake? No; the implementation detail is hidden. You only know what the brake does when you press it, but not how it does it.

An abstract data type (ADT) is an abstraction of a data structure that provides only the interface to which the data structure must adhere. The interface does not give any specific details about how something should be implemented or in what programming language.

The term data type refers to the type of data that a particular variable can hold: it may be an integer, a character, a float, or any other simple data storage representation. However, when we build an object-oriented system, we use other data types, known as abstract data types, which represent more realistic entities.
E.g.: we might be interested in representing a 'bank account' data type, which describes how all bank accounts are handled in a program. Abstraction is about reducing complexity and ignoring unnecessary details.
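As a hedged sketch of that 'bank account' data type in Java (the names here are illustrative, not from any particular framework): the interface captures what every account can do, and one possible implementation hides how the balance is stored.
interface BankAccount {
    void deposit(long amountInCents);
    boolean withdraw(long amountInCents); // returns false if funds are insufficient
    long balance();                       // current balance, in cents
}

// One possible implementation; callers see only the BankAccount operations.
class InMemoryAccount implements BankAccount {
    private long balanceInCents = 0;

    public void deposit(long amountInCents) { balanceInCents += amountInCents; }

    public boolean withdraw(long amountInCents) {
        if (amountInCents > balanceInCents) return false;
        balanceInCents -= amountInCents;
        return true;
    }

    public long balance() { return balanceInCents; }
}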

Related

In which languages is function abstraction not primitive

In Haskell, the function type (->) is given; it's not an algebraic data type constructor and one cannot re-implement it to be identical to (->).
So I wonder, what languages will allow me to write my own version of (->)? What is this property called?
UPD: reformulations of the question, thanks to the discussion:
Which languages don't have -> as a primitive type?
Why is -> necessarily primitive?
I can't think of any languages that have arrows as a user defined type. The reason is that arrows -- types for functions -- are baked in to the type system, all the way down to the simply typed lambda calculus. That the arrow type must be fundamental to the language comes directly from the fact that the way you form functions in the lambda calculus is via lambda abstraction (which, at the type level, introduces arrows).
Although Marcin aptly notes that you can program in a point free style, this doesn't change the essence of what you're doing. Having a language without arrow types as primitives goes against the most fundamental building blocks of Haskell. (The language you reference in the question.)
Having the arrow as a primitive type also shares some important ties to constructive logic: you can read the function arrow type as implication from intuitionistic logic, and programs having that type as "proofs." (Namely, if you have something of type A -> B, you have a proof that takes some premise of type A, and produces a proof for B.)
The fact that you're perturbed by having arrows baked into the language might imply that you're not fundamentally grasping why they're so tied to the design of the language; perhaps it's time to read a few chapters from Ben Pierce's "Types and Programming Languages".
Edit: You can always look at languages which don't have a strong notion of functions and have their semantics defined with respect to some other way -- such as forth or PostScript -- but in these languages you don't define inductive data types in the same way as in functional languages like Haskell, ML, or Coq. To put it another way, in any language in which you define constructors for datatypes, arrows arise naturally from the constructors for these types. But in languages where you don't define inductive datatypes in the typical way, you don't get arrow types as naturally because the language just doesn't work that way.
Another edit: I will stick in one more comment, since I thought of it last night. Function types (and function abstraction) forms the basis of pretty much all programming languages -- at least at some level, even if it's "under the hood." However, there are languages designed to define the semantics of other languages. While this doesn't strictly match what you're talking about, PLT Redex is one such system, and is used for specifying and debugging the semantics of programming languages. It's not super useful from a practitioners perspective (unless your goal is to design new languages, in which case it is fairly useful), but maybe that fits what you want.
Do you mean meta-circular evaluators like in SICP? Being able to write your own DSL? If you create your own "function type", you'll have to take care of "applying" it, yourself.
Just as an example, you could create your own "function" in C for instance, with a look-up table holding function pointers, and use integers as functions. You'd have to provide your own "call" function for such "functions", of course:
/* lookup_table is an array of function pointers, indexed by the integer "function" id */
void call(unsigned int function, int data) {
    lookup_table[function](data);
}
You'd also probably want some means of creating more complex functions from primitive ones, for instance using arrays of ints to signify sequential execution of your "primitive functions" 1, 2, 3, ... and end up inventing a whole new language for yourself.
I think early assemblers had no ability to create callable "macros" and had to use GOTO.
You could use trampolining to simulate function calls. You could have only a global variable store, with shallow binding perhaps. In such a language "functions" would be definable, though not a primitive type.
So having functions in a language is not necessary, though it is convenient.
In Common Lisp defun is nothing but a macro associating a name and a callable object (though lambda is still a built-in). In AutoLisp originally there was no special function type at all, and functions were represented directly by quoted lists of s-expressions, with the first element being an argument list. You can construct your function through use of cons and list functions, from symbols, directly, in AutoLisp:
(setq a (list (cons 'x NIL) '(+ 1 x)))
(a 5)
==> 6
Some languages (like Python) support more than one primitive function type, each with its calling protocol - namely, generators support multiple re-entry and returns (even if syntactically through the use of same def keyword). You can easily imagine a language which would let you define your own calling protocol, thus creating new function types.
Edit: as an example, consider dealing with multiple arguments in a function call, the choice between automatic currying or automatic optional args, etc. In Common LISP, say, you could easily create yourself two different call macros to directly represent the two calling protocols. Consider functions returning multiple values not through a kludge of aggregates (tuples, in Haskell), but directly into designated recipient vars/slots. All are different types of functions.
Function definition is usually primitive because (a) functions are how programmes get things done; and (b) this sort of lambda-abstraction is necessary to be able to programme in a pointful style (i.e. with explicit arguments).
Probably the closest you will come to a language that meets your criteria is one based on a purely pointfree model which allows you to create your own lambda operator. You might like to explore pointfree languages in general, and ones based on SKI calculus in particular: http://en.wikipedia.org/wiki/SKI_combinator_calculus
In such a case, you still have primitive function types, and you always will, because it is a fundamental element of the type system. If you want to get away from that at all, probably the best you could do would be some kind of type system based on a category-theoretic generalisation of functions, such that functions would be a special case of another type. See http://en.wikipedia.org/wiki/Category_theory.
Which languages don't have -> as a primitive type?
Well, if you mean a type that can be named, then there are many languages that don't have them. All languages where functions are not first class citizens don't have -> as a type you could mention somewhere.
But, as @Kristopher eloquently and excellently explained, functions are (or can, at least, be perceived as) the very basic building blocks of all computation. Hence even in Java, say, there are functions, but they are carefully hidden from you.
And, as someone mentioned assembler - one could maintain that the machine language (of most contemporary computers) is an approximation of the model of the register machine. But how is it done? With millions and billions of logical circuits, each of them being a materialization of quite primitive pure functions like NOT or NAND, arranged in a certain physical order (which is, obviously, the way hardware engineers implement function composition).
Hence, while you may not see functions in machine code, they're still the basis.
In Martin-Löf type theory, function types are defined via indexed product types (so-called Π-types).
Basically, the type of functions from A to B can be interpreted as a (possibly infinite) record, where all the fields are of the same type B, and the field names are exactly all the elements of A. When you need to apply a function f to an argument x, you look up the field in f corresponding to x.
The wikipedia article lists some programming languages that are based on Martin-Löf type theory. I am not familiar with them, but I assume that they are a possible answer to your question.
Philip Wadler's paper Call-by-value is dual to call-by-name presents a calculus in which variable abstraction and covariable abstraction are more primitive than function abstraction. Two definitions of function types in terms of those primitives are provided: one implements call-by-value, and the other call-by-name.
Inspired by Wadler's paper, I implemented a language (Ambidexer) which provides two function type constructors that are synonyms for types constructed from the primitives. One is for call-by-value and one for call-by-name. Neither Wadler's dual calculus nor Ambidexter provides user-defined type constructors. However, these examples show that function types are not necessarily primitive, and that a language in which you can define your own (->) is conceivable.
In Scala you can mixin one of the Function traits, e.g. a Set[A] can be used as A => Boolean because it implements the Function1[A,Boolean] trait. Another example is PartialFunction[A,B], which extends usual functions by providing a "range-check" method isDefinedAt.
However, in Scala methods and functions are different, and there is no way to change how methods work. Usually you don't notice the difference, as methods are automatically lifted to functions.
So you have a lot of control over how you implement and extend functions in Scala, but I think you have a real "replacement" in mind. I'm not sure this even makes sense.
Or maybe you are looking for languages with some kind of generalization of functions? Then Haskell with Arrow syntax would qualify: http://www.haskell.org/arrows/syntax.html
I suppose the dumb answer to your question is assembly code. This provides you with primitives even "lower" level than functions. You can create functions as macros that make use of register and jump primitives.
Most sane programming languages will give you a way to create functions as a baked-in language feature, because functions (or "subroutines") are the essence of good programming: code reuse.

What exactly is a datatype?

I understand what a datatype is (intuitively). But I need the formal definition. I don't understand if it is a set or it's the names 'int' 'float' etc. The formal definition found on wikipedia is confusing.
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.
Can anyone help me with that?
Yep. What that's saying is that a data type has three pieces:
The various possible values. So, for example, an eight-bit signed integer might have -128..127. Think of that as a set of values V.
The operations: so an 8-bit signed integer might have +, -, * (multiply), and / (divide). The full definition would define those as functions from V into V, or possibly as a function from V into float for division.
The way it's stored -- I sort of gave it away when I said "eight-bit signed integer". The other detail is that I'm assuming a specific representation by the way I showed the range of values.
You might, if you're into object oriented programming, notice that this is very much like the definition of a class, which is defined by the storage used by each object, and the methods of the class. Providing those parts for some arbitrary thing, but not inheritance rules, gives you what's called an abstract data type.
Update
@Appy, there's some room for differences in the formalities. I was a little subtle because it was late and I was suddenly uncertain if I'd assumed one's complement or two's complement -- of course it's two's complement. So interpretation is included in my description. Abstractly, though, you'd say it is an algebraic structure T=(V,O) where V is a set of values, O a set of functions from V into some arbitrary type -- remember '==' for example will be a function eq: V × V → {0,1}, so you can't expect every operation to be into V.
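As a toy illustration of that (V, O) view (my own sketch, not part of the answer above): an "8-bit signed integer" type written out in Java, where the representation is one two's-complement byte, the values are -128..127, and the operations are functions over those values.
// Illustrative only: a data type spelled out as (values, operations, representation).
final class Int8 {
    private final byte value; // representation: one byte, two's complement

    Int8(int v) { this.value = (byte) v; } // values: -128..127 (the narrowing conversion wraps)

    Int8 add(Int8 other) { return new Int8(this.value + other.value); }

    Int8 mul(Int8 other) { return new Int8(this.value * other.value); }

    // eq : V x V -> {0, 1}, an operation that does not map back into V
    boolean eq(Int8 other) { return this.value == other.value; }

    @Override
    public String toString() { return Integer.toString(value); }
}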
I can define it as a classification of a particular type of information. It is easy for humans to distinguish between different types of data. We can usually tell at a glance whether a number is a percentage, a time, or an amount of money. We do this through special symbols %, :, and $.
Basically it's a concept that I am sure you grok. For computers, however, a data type is defined and has various associated attributes, like a size, a definition keyword (sometimes), the values it can take (numbers or characters, for example) and the operations that can be done on it (add and subtract for numbers, append on strings or compare on characters, etc.). These differ from language to language and even from environment to environment (16- vs 32-bit ints, 32- vs 64-bit environments, etc.).
If there is anything I am missing or needs refining please ask as this is fairly open ended.

What is the preferred way to store different versions of data?

When you're writing an application that needs to read and work with two versions of data in the same way, what is the best way to structure your classes to represent that data? I have come up with three scenarios:
Common Base/Specific Children
Data Union
Distinct Structures
Version 1 Car Example
byte DoorCount
int Color
byte HasMoonroof
byte HasSpoiler
float EngineSize
byte CylinderCount
Version 2 Car
byte DoorCount
int Color
enum:int MoonRoofType
enum:int TrunkAccessories
enum:int EngineType
Common Base/Specific Children
With this method, there is a base class of common fields between the two versions of data and a child class for each version of the data.
class Car {
    byte DoorCount;
    int Color;
}

class CarVersion1 : Car {
    byte HasMoonroof;
    byte HasSpoiler;
    float EngineSize;
    byte CylinderCount;
}

class CarVersion2 : Car {
    int MoonRoofType;
    int TrunkAccessories;
    int EngineType;
}
Strengths
OOP Paradigm
Weaknesses
Existing child classes will have to change if a new version is released that removes a common field
Data for one conceptual unit is split between two definitions, not because of any division that is meaningful to the data itself.
Data Union
Here, a Car is defined as the union of the fields of Car across all versions of the data.
class Car {
    CarVersion version;
    byte DoorCount;
    int Color;
    int MoonRoofType;      // boolean if Version 1
    int TrunkAccessories;  // boolean if Version 1
    int EngineType;        // CylinderCount if Version 1
    float EngineSize;      // Not used if Version 2
}
Strengths
Um... Everything is in one place.
Weaknesses
Forced case driven code.
Difficult to maintain when another version is released or legacy is removed.
Difficult to conceptualize. The meanings of the fields changed based on the version.
Distinct Structures
Here the structures have no OOP relationship to each other. However, interfaces may be implemented by both classes if/when the code expects to treat them in the same fashion.
class CarVersion1 {
    byte DoorCount;
    int Color;
    byte HasMoonroof;
    byte HasSpoiler;
    float EngineSize;
    byte CylinderCount;
}

class CarVersion2 {
    byte DoorCount;
    int Color;
    int MoonRoofType;
    int TrunkAccessories;
    int EngineType;
}
Strengths
Straightforward approach
Easy to maintain if a new version is added or legacy is removed.
Weaknesses
It's an anti-pattern.
Is there a better way that I didn't think of? It's probably obvious that I favor the last methodology, but is the first one better?
Why is the third option, distinct structures for each version, a bad idea or anti-pattern?
If the two versions of data structures are used in a common application/module - they will have to implement the same interface. Period. It is definitely untenable to write two different application modules to handle two different versions of data structure. The fact that the underlying data model is extremely different should be irrelevant. After all, the goal of writing objects is to achieve a practical level of encapsulation.
As you continue writing code in this way, you should eventually find places where the code in both classes is similar or redundant. If you move these common pieces of code out of the various version classes, you may eventually end up with version classes that not only implement the same interface, but can also implement the same base/abstract class. Voila, you've found your way to your "first" option.
I think this is the best path in an environment with constantly evolving data. It requires some diligence and "looking behind" on older code, but worth the benefits of code clarity and reusable components.
Another thought: in your example, the base class is "Car". In my opinion, it hardly ever turns out that the base class is so "near" to its inheritors. A more realistic set of base classes or interfaces might be "Versionable", "Upgradeable", "OptionContainer", etc. Just speaking from my experience, YMMV.
Use the second approach and enhance it with interfaces. Remember that you can implement multiple interface "versions", which gives you the power of backward compatibility! I hope you get what I meant to say ;)
Going on the following requirement:
an application that needs to read and work with two versions of data in the same way
I would say that the most important thing is that you funnel all logic through a data abstraction layer, so that none of your logic will have to care about whether you're using version 1, 2 or n of the data.
One way to do this is to have just one data class, that is the most "buffed up" version of the data. Basically, it would have MoonRoofType, but not HasMoonRoof since that can be inferred. This class should not have any obsolete properties either, since it's up to the data abstraction layer to decide what the default values should be.
In the end, you'll have an application that doesn't care about the data versions at all.
As for the data abstraction layer, you may or may not want to have data classes for every version. Most likely, all you'll need is one class for every version of the data structure with Save and Load methods for storing/creating the data instances used by your application logic.
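A hedged Java sketch of that data abstraction layer idea (all names here, such as Car, CarReader, and MoonRoofType, are hypothetical): the application works only with the richest model, and one reader per stored version maps its raw fields onto it. A CarReaderV2 would be analogous, and the rest of the application never sees version numbers.
import java.util.Map;

enum MoonRoofType { NONE, MANUAL, POWER }

// The single, version-agnostic model the rest of the application uses.
class Car {
    int doorCount;
    int color;
    MoonRoofType moonRoofType; // version 1's HasMoonroof flag is mapped onto this
    int engineType;            // version 1's CylinderCount is mapped onto this
}

// One reader per stored version; application logic only depends on this interface.
interface CarReader {
    Car read(Map<String, Number> rawRecord);
}

class CarReaderV1 implements CarReader {
    public Car read(Map<String, Number> raw) {
        Car car = new Car();
        car.doorCount = raw.get("DoorCount").intValue();
        car.color = raw.get("Color").intValue();
        // V1 stored only a yes/no flag; infer the richer V2-style value from it.
        car.moonRoofType = raw.get("HasMoonroof").intValue() != 0
                ? MoonRoofType.POWER : MoonRoofType.NONE;
        car.engineType = raw.get("CylinderCount").intValue();
        return car;
    }
}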

Should Tuples Subclass Each Other?

Given a set of tuple classes in an OOP language: Pair, Triple and Quad, should Triple subclass Pair, and Quad subclass Triple?
The issue, as I see it, is whether a Triple should be substitutable as a Pair, and likewise Quad for Triple or Pair. Whether Triple is also a Pair and Quad is also a Triple and a Pair.
In one context, such a relationship might be valuable for extensibility - today this thing returns a Pair of things, tomorrow I need it to return a Triple without breaking existing callers, who are only using the first two of the three.
On the other hand, should they each be distinct types? I can see benefit in stronger type checking - where you can't pass a Triple to a method that expects a Pair.
I am leaning towards using inheritance, but would really appreciate input from others?
PS: In case it matters, the classes will (of course) be generic.
PPS: On a way more subjective side, should the names be Tuple2, Tuple3 and Tuple4?
Edit: I am thinking of these more as loosely coupled groups; not specifically for things like x/y x/y/z coordinates, though they may be used for such. It would be things like needing a general solution for multiple return values from a method, but in a form with very simple semantics.
That said, I am interested in all the ways others have actually used tuples.
A tuple of a different length is a different type. (Well, in many type systems anyway.) In a strongly typed language, I wouldn't think that they should be a collection.
This is a good thing as it ensures more safety. Places where you return tuples usually have somewhat coupled information along with it, the implicit knowledge of what each component is. It's worse if you pass in more values in a tuple than expected -- what's that supposed to mean? It doesn't fit inheritance.
Another potential issue is if you decide to use overloading. If tuples inherit from each other, then overload resolution will fail where it should not. But this is probably a better argument against overloading.
Of course, none of this matters if you have a specific use case and find that certain behaviours will help you.
Edit: If you want general information, try perusing a bit of Haskell or ML family (OCaml/F#) to see how they're used and then form your own decisions.
It seems to me that you should make a generic Tuple interface (or use something like the Collection mentioned above), and have your pair and 3-tuple classes implement that interface. That way, you can take advantage of polymorphism but also allow a pair to use a simpler implementation than an arbitrary-sized tuple. You'd probably want to make your Tuple interface include .x and .y accessors as shorthand for the first two elements, and larger tuples can implement their own shorthands as appropriate for items with higher indices.
Like most design related questions, the answer is - It depends.
If you are looking for conventional Tuple design, Tuple2, Tuple3 etc. is the way to go. The problem with inheritance is that, first of all, a Triple is not a kind of Pair. How would you implement the equals method for it? Is a Triple equal to a Pair with the first two items the same? If you have a collection of Pairs, can you add a Triple to it or vice versa? If in your domain this is fine, you can go with inheritance.
In any case, it pays to have an interface/abstract class (maybe Tuple) which all these implement.
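A minimal Java sketch of that "common interface, no subclassing" shape (Tuple, Tuple2, and Tuple3 are illustrative names, not a standard library API): each arity is its own final class, but they share a read-only Tuple interface. With this layout a Tuple3 cannot be passed where a Tuple2 is expected, which gives the stricter type checking discussed in the question.
interface Tuple {
    int arity();
    Object get(int index);
}

final class Tuple2<A, B> implements Tuple {
    final A first;
    final B second;

    Tuple2(A first, B second) { this.first = first; this.second = second; }

    public int arity() { return 2; }

    public Object get(int index) {
        switch (index) {
            case 0: return first;
            case 1: return second;
            default: throw new IndexOutOfBoundsException("index " + index);
        }
    }
}

final class Tuple3<A, B, C> implements Tuple {
    final A first;
    final B second;
    final C third;

    Tuple3(A first, B second, C third) { this.first = first; this.second = second; this.third = third; }

    public int arity() { return 3; }

    public Object get(int index) {
        switch (index) {
            case 0: return first;
            case 1: return second;
            case 2: return third;
            default: throw new IndexOutOfBoundsException("index " + index);
        }
    }
}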
it depends on the semantics that you need -
a pair of opposites is not semantically compatible with a 3-tuple of similar objects
a pair of coordinates in polar space is not semantically compatible with a 3-tuple of coordinates in Euclidean space
if your semantics are simple compositions, then a generic class Tuple<N> would make more sense
I'd go with 0,1,2 or infinity. e.g. null, 1 object, your Pair class, or then a collection of some sort.
Your Pair could even implement a Collection interface.
If there's a specific relationship between Three or Four items, it should probably be named.
[Perhaps I'm missing the problem, but I just can't think of a case where I want to specifically link 3 things in a generic way]
Gilad Bracha blogged about tuples, which I found interesting reading.
One point he made (whether correctly or not I can't yet judge) was:
Literal tuples are best defined as read only. One reason for this is that readonly tuples are more polymorphic. Long tuples are subtypes of short ones:
{S. T. U. V } <= {S. T. U} <= {S. T} <= {S}
[and] read only tuples are covariant:
T1 <= T2, S1 <= S2 ==> {S1. T1} <= {S2. T2}
That would seem to suggest my inclination to use inheritance may be correct, and would contradict amit.dev when he says that a Triple is not a Pair.

What is boxing and unboxing and what are the trade offs?

I'm looking for a clear, concise and accurate answer.
Ideally as the actual answer, although links to good explanations welcome.
Boxed values are data structures that are minimal wrappers around primitive types*. Boxed values are typically stored as pointers to objects on the heap.
Thus, boxed values use more memory and take at minimum two memory lookups to access: once to get the pointer, and another to follow that pointer to the primitive. Obviously this isn't the kind of thing you want in your inner loops. On the other hand, boxed values typically play better with other types in the system. Since they are first-class data structures in the language, they have the expected metadata and structure that other data structures have.
In Java and Haskell generic collections can't contain unboxed values. Generic collections in .NET can hold unboxed values with no penalties. Where Java's generics are only used for compile-time type checking, .NET will generate specific classes for each generic type instantiated at run time.
Java and Haskell have unboxed arrays, but they're distinctly less convenient than the other collections. However, when peak performance is needed it's worth a little inconvenience to avoid the overhead of boxing and unboxing.
* For this discussion, a primitive value is any that can be stored on the call stack, rather than stored as a pointer to a value on the heap. Frequently that's just the machine types (ints, floats, etc), structs, and sometimes static sized arrays. .NET-land calls them value types (as opposed to reference types). Java folks call them primitive types. Haskellions just call them unboxed.
** I'm also focusing on Java, Haskell, and C# in this answer, because that's what I know. For what it's worth, Python, Ruby, and Javascript all have exclusively boxed values. This is also known as the "Everything is an object" approach***.
*** Caveat: A sufficiently advanced compiler / JIT can in some cases actually detect that a value which is semantically boxed when looking at the source, can safely be an unboxed value at runtime. In essence, thanks to brilliant language implementors your boxes are sometimes free.
from C# 3.0 In a Nutshell:
Boxing is the act of casting a value
type into a reference type:
int x = 9;
object o = x; // boxing the int
unboxing is... the reverse:
object o = 9;
int x = (int)o; // unboxing o
Boxing & unboxing is the process of converting a primitive value into an object oriented wrapper class (boxing), or converting a value from an object oriented wrapper class back to the primitive value (unboxing).
For example, in Java, you may need to convert an int value into an Integer (boxing) if you want to store it in a Collection because primitives can't be stored in a Collection, only objects. But when you want to get it back out of the Collection you may want to get the value as an int and not an Integer so you would unbox it.
Boxing and unboxing is not inherently bad, but it is a tradeoff. Depending on the language implementation, it can be slower and more memory intensive than just using primitives. However, it may also allow you to use higher level data structures and achieve greater flexibility in your code.
These days, it is most commonly discussed in the context of Java's (and other languages') "autoboxing/autounboxing" feature. Here is a Java-centric explanation of autoboxing.
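For instance, here is a short (hedged) Java snippet of my own showing where the compiler inserts the boxing and unboxing conversions:
import java.util.ArrayList;
import java.util.List;

public class AutoboxDemo {
    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        numbers.add(42);        // autoboxing: compiled as numbers.add(Integer.valueOf(42))
        int n = numbers.get(0); // auto-unboxing: compiled as numbers.get(0).intValue()
        System.out.println(n);  // prints 42
    }
}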
In .Net:
Often you can't rely on what type of variable a function will consume, so you need to use an object variable which extends from the lowest common denominator - in .Net this is object.
However object is a class and stores its contents as a reference.
List<int> notBoxed = new List<int> { 1, 2, 3 };
int i = notBoxed[1]; // this is the actual value
List<object> boxed = new List<object> { 1, 2, 3 };
int j = (int) boxed[1]; // this is an object that can be 'unboxed' to an int
While both these hold the same information the second list is larger and slower. Each value in the second list is actually a reference to an object that holds the int.
This is called boxing because the int is wrapped by the object. When it's cast back, the int is unboxed - converted back to its value.
For value types (i.e. all structs) this is slow, and potentially uses a lot more space.
For reference types (i.e. all classes) this is far less of a problem, as they are stored as a reference anyway.
A further problem with a boxed value type is that it's not obvious that you're dealing with the box, rather than the value. When you compare two structs then you're comparing values, but when you compare two classes then (by default) you're comparing the reference - i.e. are these the same instance?
This can be confusing when dealing with boxed value types:
int a = 7;
int b = 7;
if(a == b) // Evaluates to true, because a and b have the same value
object c = (object) 7;
object d = (object) 7;
if(c == d) // Evaluates to false, because c and d are different instances
It's easy to work around:
if(c.Equals(d)) // Evaluates to true because it calls the underlying int's equals
if(((int) c) == ((int) d)) // Evaluates to true once the values are cast
However it is another thing to be careful of when dealing with boxed values.
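The same trap exists in Java with boxed Integer values, sketched below. (Note that the JVM caches boxes for small values, roughly -128 to 127, so == may happen to be true in that range.)
public class BoxedEquality {
    public static void main(String[] args) {
        Integer a = 1000;
        Integer b = 1000;
        System.out.println(a == b);      // false: compares references to two distinct boxed objects
        System.out.println(a.equals(b)); // true: compares the wrapped int values
    }
}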
Boxing is the process of converting a value type into a reference type, whereas unboxing is the conversion of a reference type into a value type.
Example:
int i = 123;
object o = i;   // Boxing
int j = (int)o; // Unboxing
Value types are: int, char, structures, and enumerations.
Reference types are: classes, interfaces, arrays, strings, and objects.
The language-agnostic meaning of a box is just "an object contains some other value".
Literally, boxing is an operation to put some value into the box. More specifically, it is an operation to create a new box containing the value. After boxing, the boxed value can be accessed from the box object, by unboxing.
Note that objects (not OOP-specific) in many programming languages are about identities, but values are not. Two objects are the same iff they have identities that are not distinguishable in the program semantics. Values can also be the same (usually under some equality operators), but we do not distinguish them as "one" or "two" unique values.
Providing boxes is mainly about the effort to distinguish side effects (typically, mutation) from the state of the objects, which would otherwise probably be invisible to the users.
A language may limit the allowed ways to access an object and hide the identity of the object by default. For example, typical Lisp dialects have no explicit distinction between objects and values. As a result, the implementation has the freedom to share the underlying storage of the objects until some mutation operation occurs on the object (so the object must be "detached" from the shared instance after the operation to make the effect visible, i.e. the mutated value stored in the object could be different from that of the other objects still holding the old value). This technique is sometimes called object interning.
Interning makes the program more memory efficient at runtime if the objects are shared without frequent needs of mutation, at the cost that:
The users cannot distinguish the identity of the objects.
There is no way to identify an object and ensure it has state explicitly independent of other objects in the program before some side effects have actually occurred (assuming the implementation does not aggressively do the interning concurrently; this should be the rare case, though).
There may be more problems with interoperation, which requires identifying different objects for different operations.
There are risks that such assumptions can be false, so the performance is actually made worse by applying the interning.
This depends on the programming paradigm. Imperative programming which mutates objects frequently certainly would not work well with interning.
Implementations depending on COW (copy-on-write) to ensure interning can incur serious performance degradation in concurrent environments.
Even local sharing specifically for a few internal data structures can be bad. For example, ISO C++ 11 did not allow sharing of the internal elements of std::basic_string for this reason exactly, even at the cost of breaking the ABI on at least one mainstream implementation (libstdc++).
Boxing and unboxing incur performance penalties. This is obvious especially when these operations can be naively avoided by hand but actually not easy for the optimizer. The concrete measurement of the cost depends (on per-implementation or even per-program basis), though.
Mutable cells, i.e. boxes, are well-established facilities exactly to resolve the problems of the 1st and 2nd bullets listed above. Additionally, there can be immutable boxes for implementation of assignment in a functional language. See SRFI-111 for a practical instance.
Using mutable cells as function arguments with a call-by-value strategy implements the visible effects of mutation being shared between the caller and the callee. The object contained by a box is effectively "called by sharing" in this sense.
Sometimes, the boxes are referred to as references (which is technically false), so the shared semantics are named "reference semantics". This is not correct, because not all references can propagate the visible side effects (e.g. immutable references). References are more about exposing access by indirection, while boxes are an effort to expose minimal details of the accesses, such as whether there is indirection or not (which is uninteresting and better left hidden by the implementation).
Moreover, "value semantics" is irrelevant here. Values are not opposed to references, nor to boxes. All the discussion above is based on the call-by-value strategy. For others (like call-by-name or call-by-need), no boxes are needed to share object contents in this way.
Java is probably the first programming language to make these features popular in the industry. Unfortunately, there seem to be many bad consequences related to this topic:
The overall programming paradigm does not fit the design.
Practically, the interning is limited to specific objects like immutable strings, and the cost of (auto-)boxing and unboxing is often blamed.
Fundamental PL knowledge like the definition of the term "object" (as "instance of a class") in the language specification, as well as the descriptions of parameter passing, became biased compared to the original, well-known meanings during the adoption of Java by programmers.
At least CLR languages are following the similar parlance.
Some more tips on implementations (and comments to this answer):
Whether to put the objects on the call stack or the heap is an implementation detail, and it is irrelevant to the implementation of boxes.
Some language implementations do not maintain a contiguous storage as the call stack.
Some language implementations do not even make the (per thread) activation records a linear stack.
Some language implementations do allocate stacks on the free store ("the heap") and transfer slices of frames between the stacks and the heap back and forth.
These strategies have nothing to do with boxes. For instance, many Scheme implementations have boxes, with different activation record layouts, including all the ways listed above.
Besides the technical inaccuracy, the statement "everything is an object" is irrelevant to boxing.
Python, Ruby, and JavaScript all use latent typing (by default), so all identifiers referring to some objects will evaluate to values having the same static type. So does Scheme.
Some JavaScript and Ruby implementations use the so-called NaN-boxing to allow inlining allocation of some objects. Some others (including CPython) do not. With NaN boxing, a normal double object needs no unboxing to access its value, while a value of some other types can be boxed in a host double object, and there is no reference for double or the boxed value. With the naive pointer approach, a value of host object pointer like PyObject* is an object reference holding a box whose boxed value is stored in the dynamically allocated space.
At least in Python, objects are not "everything". They are also not known as "boxed values" unless you are talking about interoperability with specific implementations.
The .NET FCL generic collections:
List<T>
Dictionary<TKey, TValue>
SortedDictionary<TKey, TValue>
Stack<T>
Queue<T>
LinkedList<T>
were all designed to overcome the performance issues of boxing and unboxing in previous collection implementations.
For more, see chapter 16, CLR via C# (2nd Edition).
Boxing and unboxing allow value types to be treated as objects. Boxing means converting a value to an instance of an object reference type. For example, in Java, Integer is a class and int is a primitive data type; converting an int to an Integer is an example of boxing, whereas converting an Integer back to an int is unboxing. The boxed object is then managed by the garbage collector. Unboxing, on the other hand, converts an object type back to a value type.
int i = 123;
object o = (object)i; // Boxing
o = 123;
i = (int)o;           // Unboxing
Like anything else, autoboxing can be problematic if not used carefully. The classic is to end up with a NullPointerException and not be able to track it down. Even with a debugger. Try this:
public class TestAutoboxNPE
{
    public static void main(String[] args)
    {
        Integer i = null;

        // .. do some other stuff and forget to initialise i

        i = addOne(i); // Whoa! NPE! The compiler unboxes i by calling i.intValue(), and i is null.
    }

    public static int addOne(int i)
    {
        return i + 1;
    }
}