Regarding 'list' and 'array' specifically: does the difference depend on the programming language, or is it universal across computer science?
I'm new to CS, and for some reason I only ever hear 'list' mentioned in Python and 'array' in Java, never 'array' in Python or 'list' in Java. Does this imply a difference in implementation, or just in naming?
It may depend on the actual programming language and/or platform... but in general terms:
An array is a block of contiguous memory that holds elements of the same type in a "packed" way.
A list is a collection of items linked by pointers (or similar mechanisms); the items are not necessarily contiguous in memory.
This is a very broad description... different languages and platforms may implement things differently. In some, an array and a list may be indistinguishable.
Most likely, an array has a fixed size (to resize it you have to create a new array and copy the original contents into it) and can't have "holes". A list normally has dynamic capacity, and you can insert or remove items at any point without destroying the original list. But again, these are implementation details that may vary depending on the programming language you use.
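To make the distinction concrete, here is a minimal sketch in C# (chosen only for illustration; the question's Python list and Java ArrayList behave like the resizable case). Note that C#'s own List<T> is array-backed internally, so LinkedList<T> is used below for the pointer-linked layout described above.

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Array: a contiguous block of same-typed elements with a fixed size.
        // "Resizing" means allocating a new array and copying the contents over.
        int[] arr = { 1, 2, 3 };
        int[] bigger = new int[6];
        Array.Copy(arr, bigger, arr.Length);

        // Linked list: items live in separate nodes joined by references,
        // not necessarily contiguous in memory; insertion at a node is cheap.
        var linked = new LinkedList<int>(new[] { 1, 2, 3 });
        linked.AddAfter(linked.First!, 42);   // 1, 42, 2, 3

        Console.WriteLine(string.Join(", ", linked));
    }
}
```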
As programming evolved over the years (from assembler to high-level languages), more and more features (garbage collection, exceptions, dynamic typing) have been added as standard to some languages. Is it possible to create a high-level language that starts off with all features on by default, and then, once a program runs well, lets you selectively turn features off in code, or have sections of code quarantined off so that they do not use these features? Perhaps modifying branches in the Abstract Syntax Tree to be statically typed instead of dynamic, or compiled instead of interpreted.
Is there any programming language that can be used as both dynamic and static, and that also lets you selectively turn off garbage collection (by releasing used objects yourself), even up to disabling exception handling, all the way to the point where the run-time consists of only C-like constructs, or any of the above?
For a language to do what you're asking, it would have to be built to support both alternatives: garbage collection and manual memory management, or static and dynamic typing, and it would have to make the two worlds interoperate.
In other words, what you describe as just "turn off A" is actually "design A, design B, and design the transition between A and B". Doing this would be a significant amount of additional design and implementation work, it would make the language more complicated, and the language might end up as the "worst of both worlds".
Now, languages that support both sides of the features you mentioned do exist, in a limited form:
C# is normally a statically typed language, but it also has the dynamic keyword, which allows you to switch to dynamic typing for certain variables. This was primarily meant for interoperation with dynamic languages, and is not used much in practice.
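For instance, a minimal sketch of the dynamic keyword (illustrative only):

```csharp
using System;

class Program
{
    static void Main()
    {
        // Statically typed: members are checked by the compiler.
        string s = "hello";
        Console.WriteLine(s.Length);

        // Dynamically typed: member lookup is deferred to run time.
        dynamic d = "hello";
        Console.WriteLine(d.Length);          // resolved at run time
        // Console.WriteLine(d.NoSuchMember); // would compile, but throw at run time
    }
}
```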
C++/CLI is a language that supports both manually managed memory (* pointers, new to allocate and delete to deallocate) and garbage-collected memory (^ pointers, gcnew to allocate). It is primarily meant for interoperation between C++ code and .NET code and is not widely used in practice.
You might have noticed a theme here: in both cases, the feature or language was created to bridge the two worlds, but didn't gain much traction.
There has been much fuss about dynamically vs. statically typed languages. To my eye, however, while statically typed languages enable the compiler (or interpreter) to know a bit more about your intentions, they only barely scratch the surface of what could be conveyed. Indeed, some languages have an orthogonal mechanism for providing a bit more information in annotations.
I am aware of strongly typed languages like Agda and Coq that are very persnickety about what they allow you to do; I'm not terribly interested in those. Rather, I'm wondering what languages or theory exist that expand the richness of what you can explain to the compiler about what it is that you intend. For example, if you have a mutable vector and you turn it into a unit vector, why couldn't your compiler select a unit-vector form of vector projection instead of the more computationally expensive general form? The type has not changed--and the work required to build all the requisite types would be off-putting even in a language with amazingly easy typing such as Haskell--and yet it seems that the compiler could be empowered to know a great deal about the situation.
Does some language already enable things like this, either outside of standard type-theory or within one of its more advanced branches?
There are languages with Turing-complete type systems, which means that your types can express any computable property: for example, "list of length 6" or "valid credit card number". However, most mainstream languages use simpler type systems. Haskell is considered to have a very powerful type system.
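To connect this to the questioner's own unit-vector example, here is a rough C# sketch (all type and method names are hypothetical) of the manual, type-level route the question calls off-putting: encode the extra property as a distinct type, and overload resolution then picks the cheaper code path.

```csharp
using System;

// Hypothetical illustration types: a plain 2-D vector, and a wrapper type
// whose constructor guarantees the wrapped vector has length 1.
readonly struct Vec2
{
    public readonly double X, Y;
    public Vec2(double x, double y) { X = x; Y = y; }
    public double Dot(Vec2 other) => X * other.X + Y * other.Y;
    public double LengthSquared => X * X + Y * Y;
}

readonly struct UnitVec2
{
    public readonly Vec2 V;
    public UnitVec2(Vec2 v)
    {
        double len = Math.Sqrt(v.LengthSquared);
        V = new Vec2(v.X / len, v.Y / len);   // normalized once, at construction
    }
}

static class Projection
{
    // General projection: needs a division by |onto|^2.
    public static Vec2 Project(Vec2 v, Vec2 onto)
    {
        double scale = v.Dot(onto) / onto.LengthSquared;
        return new Vec2(onto.X * scale, onto.Y * scale);
    }

    // Cheaper overload: the type promises |onto| == 1, so the division is skipped,
    // and the compiler selects this version automatically.
    public static Vec2 Project(Vec2 v, UnitVec2 onto)
    {
        double scale = v.Dot(onto.V);
        return new Vec2(onto.V.X * scale, onto.V.Y * scale);
    }
}

class Program
{
    static void Main()
    {
        var axis = new UnitVec2(new Vec2(0, 2));
        Console.WriteLine(Projection.Project(new Vec2(3, 4), axis).Y);  // 4
    }
}
```

Dependently typed languages take this much further (for example, tracking the length of a list in its type), but the idea is the same: the property has to be expressed somewhere the compiler can see it.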
I found it interesting to read about one of the ways you can do functional dynamic dispatch in SICP: using a table of (type tag, name) -> function that you can fetch from or add to.
I was wondering, is this a typical type-dispatch mechanism for a dynamic, non-OO language?
Also, what would be the typical way to monkey-patch this: using a chain of tables (if you don't find it in the first table, try the next table recursively)? Rebinding the table within local scope to a modified copy? Etc.?
I believe this is a typical type dispatch mechanism, even for non-dynamic non-OO languages, based on this article about the JHC Haskell compiler and how it implements type classes. The implication in the article is that most Haskell compilers implement type classes (a kind of type dispatch) by passing dictionaries. His alternative is direct case analysis, which likely would not be applicable in dynamically typed languages, since you don't know ahead of time what the types of the constituents of your expression will be. On the other hand, this isn't extensible at run-time either.
As for dynamic non-OO languages, I'm not aware of many examples outside Lisp/Scheme. Common Lisp's CLOS makes Lisp a proper OO language and provides dynamic dispatch as well as multiple dispatch (you can add or remove generics and methods at run-time, and they can key off the type of more than just the first parameter). I don't know how this is usually implemented, but I do know that it is usually an add-on facility rather than a built-in facility, which implies it's using functionality available to the would-be monkey-patcher, and also that certain versions have been criticized for their lack of speed (CLISP, I think, but they may have resolved this). Of course, you could implement this type of parallel dispatch mechanism within an OO language as well, and you can probably find plenty of examples of that.
If you were using purely functional persistent maps or dictionaries, you could certainly implement this facility without even needing the chain of inherited maps; as you "modify" the map, you get a new map back, but all the existing references to the old map would still be valid and see it as the old version. If you were implementing a language with this facility you could interpret it by putting the type->function map in the Reader monad and wrapping your interpreter in it.
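As a rough sketch of the table idea and the chained-table style of monkey-patching (written in C# purely for illustration; this is not the SICP code), a (type tag, operation) -> function table might look like this:

```csharp
using System;
using System.Collections.Generic;

// A minimal data-directed dispatch table: (type tag, operation) -> function.
// A child table can shadow or extend its parent without mutating it,
// which is one way to "monkey-patch" by chaining lookups.
class DispatchTable
{
    private readonly Dictionary<(string Tag, string Op), Func<object, object>> entries = new();
    private readonly DispatchTable? parent;

    public DispatchTable(DispatchTable? parent = null) { this.parent = parent; }

    public void Put(string tag, string op, Func<object, object> fn) => entries[(tag, op)] = fn;

    public Func<object, object> Get(string tag, string op)
    {
        if (entries.TryGetValue((tag, op), out var fn)) return fn;
        if (parent != null) return parent.Get(tag, op);
        throw new KeyNotFoundException($"No handler for ({tag}, {op})");
    }
}

class Program
{
    static void Main()
    {
        var baseTable = new DispatchTable();
        baseTable.Put("rational", "print", x => $"rational {x}");

        // "Monkey-patch" by layering a new table over the old one.
        var patched = new DispatchTable(baseTable);
        patched.Put("rational", "print", x => $"RATIONAL {x}");

        Console.WriteLine(baseTable.Get("rational", "print")(42)); // rational 42
        Console.WriteLine(patched.Get("rational", "print")(42));   // RATIONAL 42
    }
}
```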
In the context of functional programming, which is the correct term to use: persistent or immutable? When I Google "immutable data structures" I get a Wikipedia link to an article on "Persistent data structure", which even goes on to say:
such data structures are effectively immutable
Which further confuses things for me. Do functional programs rely on persistent data structures or immutable data structures? Or are they always the same thing?
The proper term for functional data structures is immutable. The term "persistent" is used in at least three ways:
A persistent data structure refers to the situation where you have an old data structure, you create a new one, but you keep a pointer to the old one. Typically the old one and the new one share a lot of state—they may differ only by a constant number of heap objects or perhaps a linear number of heap data structures. This kind of persistence is a consequence of having immutable data structures, plus an algorithm that retains pointers to old versions of a data structure, allowing them to persist. (See the sketch after this list.)
A persistent variable is one whose value persists across multiple invocations of the same program. This can be done with language features or libraries.
A persistent programming language is one that provides persistent variables. The holy grail is orthogonal persistence: a programmer can decide whether a variable is persistent, independent of all the other properties of that variable. At the moment, this kind of programming language is far-out research, but it's useful to preserve the terminological distinction.
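A minimal sketch of that first sense in C# (illustrative only): an immutable singly linked list where "adding" builds a new version that shares all of the old version's nodes, so the old version persists unchanged.

```csharp
using System;

// An immutable cons cell; "modifying" the list means building a new head
// that points at the existing (shared) tail.
sealed class ConsList<T>
{
    public T Head { get; }
    public ConsList<T>? Tail { get; }
    public ConsList(T head, ConsList<T>? tail) { Head = head; Tail = tail; }

    public ConsList<T> Prepend(T value) => new ConsList<T>(value, this);
}

class Program
{
    static void Main()
    {
        var v1 = new ConsList<int>(2, new ConsList<int>(3, null));
        var v2 = v1.Prepend(1);          // new version: 1 -> 2 -> 3

        // v1 is untouched (2 -> 3); v2 shares v1's nodes rather than copying them.
        Console.WriteLine(v1.Head);                       // 2
        Console.WriteLine(v2.Head);                       // 1
        Console.WriteLine(ReferenceEquals(v2.Tail, v1));  // True: structural sharing
    }
}
```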
I don't feel up to editing Wikipedia today :-(
A persistent data structure preserves previous versions of itself after a change. Depending on the type of persistent data structure... you may or may not be able to modify previous versions.
An immutable type cannot be changed at all.
Functional languages primarily rely on immutable types (also called values) for their data storage (you can use mutable types in some of them, but it has to be done explicitly).
The article also says "in a purely functional program all data is immutable," which is true.
In my opinion you don't really need to make this distinction. If you're programming in a functional language or in a completely functional style -- as opposed to using functional idioms in imperative code where convenient -- then you can just say "data structure." By definition they will be immutable and persistent.
If you need to make the distinction for some reason, then "persistent" might be more appropriate for dynamic structures like trees and queues, where values appear to "change" based on execution traces, and "immutable" for simple value objects.
Immutable generally means "does not change". Persistent is generally taken to mean "stored on a permanent storage medium". However, the Wikipedia article you mention takes the two words to mean very similar (but not exactly the same) things. In fact it states:
A persistent data structure is not a data structure committed to persistent storage, such as a disk; this is a different and unrelated sense of the word "persistent."
"Immutable" is used far more often, as "persistent" is overloaded (normally it means "stored outside of and outliving the program") and even the correct definition carries additional semantic baggage that's often unrelated to the distinguishing quality of purely functional programming — the fact that there is no mutable state. A = A and always will for all values of A.
In this article, the authors use the word "persistent" as meaning "observationally immutable, although implemented with mutations under the hood". In that particular case, the mutations are hidden by the module system of a functional, but not pure, programming language.
I recently expressed my view about this elsewhere* , but I think it deserves further analysis so I'm posting this as its own question.
Let's say that I need to create and pass around a container in my program. I probably don't have a strong opinion about one kind of container versus another, at least at this stage, but I do pick one; for the sake of argument, let's say I'm going to use a List<>.
The question is: is it better to write my methods to accept and return a high-level interface such as C#'s IEnumerable? Or should I write methods that take and return the specific container class I have chosen?
What factors and criteria should I look at to decide? What kinds of programs or work benefit from one or the other? Does the computer language affect your decision? Performance? Program size? Personal style?
(Does it even matter?)
*(Homework: find it. But please post your answer here before you look for mine, so as not to bias you.)*
Your method should always accept the least-specific type it needs to execute its function. If your method needs to enumerate, accept IEnumerable. If it needs to do IList<>-specific things, by definition you must give it an IList<>.
The only thing that should affect your decision is how you plan to use the parameter. If you're only iterating over it, use IEnumerable<T>. If you are accessing indexed members (e.g. var x = list[3]) or modifying the list in any way (e.g. list.Add(x)), then use ICollection<T> or IList<T>.
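A small sketch of that rule (the method names here are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Report
{
    // Only needs to enumerate, so it asks for IEnumerable<T>: arrays, List<T>,
    // LINQ queries, iterators, etc. can all be passed in.
    public static int CountPositive(IEnumerable<int> numbers) =>
        numbers.Count(n => n > 0);

    // Needs indexed access and mutation, so it asks for IList<T>.
    public static void DoubleInPlace(IList<int> numbers)
    {
        for (int i = 0; i < numbers.Count; i++)
            numbers[i] *= 2;
    }
}

class Program
{
    static void Main()
    {
        var list = new List<int> { -1, 2, 3 };
        Console.WriteLine(Report.CountPositive(list));            // 2
        Console.WriteLine(Report.CountPositive(new[] { 1, 2 }));  // arrays work too
        Report.DoubleInPlace(list);                               // { -2, 4, 6 }
    }
}
```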
There is always a tradeoff. The general rule of thumb is to declare things as high up the hierarchy as possible. So if all you need is access to the methods in IEnumerable then that is what you should use.
Another recent example, from an SO question, was a C API that took a filename instead of a FILE * (or file descriptor). There the filename severely limited what sorts of things could be passed in (there are many things you can pass in with a file descriptor, but only things that actually have a filename can be passed by name).
Once you have to start casting you have either gone too high OR you should be making a second method that takes a more specific type.
The only exception to this that I can think of is when speed is an absolute must and you do not want to go through the expense of a virtual method call. Declaring the specific type removes the overhead of virtual functions (will depend on the language/environment/implementation, but as a general statement that is likely correct).
It was a discussion with me that prompted this question, so Euro Micelli already knows my answer, but here it is! :)
I think LINQ to Objects already provides a great answer to this question. By using the simplest possible interface to a sequence of items, it gives maximum flexibility in how you implement that sequence, which allows lazy generation, boosting productivity without sacrificing performance (in any real sense).
It is true that premature abstraction can have a cost - but mainly it is the cost of discovering/inventing new abstractions. If you already have perfectly good ones provided to you, then you'd be crazy not to take advantage of them, and that is what the generic collection interfaces provide you with.
There are those who will tell you that it is "easier" to make all the data in a class public, just in case you will need to access it. In the same way, Euro advised that it would be better to use a rich interface to a container such as IList<T> (or even the concrete class List<T>) and then clean up the mess later.
But I think, just as it is better to hide the data members of a class that you don't want to access, to allow you to modify the implementation of that class easily later, so you should use the simplest interface available to refer to a sequence of items. It is easier in practice to start by exposing something simple and basic and then "loosen" it later, than it is to start with something loose and struggle to impose order on it.
So assume IEnumerable<T> will do to represent a sequence. Then in those cases where you need to Add or Remove items (but still don't need by-index lookup), use ICollection<T>, which inherits IEnumerable<T> and so will be perfectly interoperable with your other code.
This way it will be perfectly clear (just from local examination of some code) precisely what that code will be able to do with the data.
Small programs require less abstraction, it is true. But if they are successful, they tend to become big programs. This is much easier if they employ simple abstractions in the first place.
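To illustrate the lazy-generation point (a hedged sketch; the names are made up): a method declared against IEnumerable<T> happily consumes an iterator that never materializes a full list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // Lazily generates values on demand; no list is ever allocated.
    static IEnumerable<int> Squares()
    {
        for (int i = 1; ; i++)
            yield return i * i;
    }

    // Declared against the simplest interface, so it accepts the lazy
    // sequence above just as happily as a List<int> or an array.
    static int SumOfFirst(IEnumerable<int> source, int count) =>
        source.Take(count).Sum();

    static void Main()
    {
        Console.WriteLine(SumOfFirst(Squares(), 3));                  // 1 + 4 + 9 = 14
        Console.WriteLine(SumOfFirst(new List<int> { 1, 2, 3 }, 2));  // 3
    }
}
```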
It does matter, but the correct solution completely depends on usage. If you only need to do a simple enumeration, then sure, use IEnumerable; that way you can pass in any implementer and access the functionality you need. However, if you need list functionality, and you don't want to have to create a new list instance whenever the enumerable that was passed in happens not to be a list, then go with a list.
I answered a similar C# question here. I think you should always provide the simplest contract you can, which in the case of collections in my opinion, ordinarily is IEnumerable Of T.
The implementation can be provided by an internal BCL type - be it Set, Collection, List etcetera - whose required members are exposed by your type.
Your abstract type can always inherit simple BCL types, which are implemented by your concrete types. This, in my opinion, makes it easier to adhere to the LSP.