Tcl and records (structs) - tcl

Package struct::record from TCLLIB provides means for emulating record types. But record instances are commands in the current namespace and not variables in the current scope. This means there is no garbage collection for record instances. Passing name of the record instance to a procedure means passing it by reference not by value, it is possible to pass string representation of the record as parameter but it requires to create another instance in the procedure, configure it and delete by hand, it's annoying. I wonder about the rationale behind this design. A simple alternative is provide a lisp-style records - a set of construction, access and modification procedures and represent records as lists.

The struct::record implementation is, from my viewpoint, an oo-style implementation. If you're searching for a data-style implementation (like lisp) where the commands are totally separate from the data, you might want to look at the dict command.
I'll note that oo-style and data-style are really not good descriptions, but they were the best I could think of offhand.

You most certainly can do it “the Lisp way”.
proc mkFooBarRecord {foo bar} {
# Keep index #0 for a "type" for easier debugging
return [list "fooBarRecord" $foo $bar]
}
proc getFoo {fooBarRecord} {
if {[lindex $fooBarRecord 0] ne "fooBarRecord"} {error "not fooBarRecord"}
return [lindex $fooBarRecord 1]
}
# Etc.
That works quite well. (Write it in C and you can make it more efficient too.) Mind you, as a generic data structure, it seems that many people prefer Tcl 8.5's dictionaries. There are many ways to use them; here's one:
proc mkFooBarRecord {foo bar} {
return [dict create "type" fooBarRecord "foo" $foo "bar" $bar]
}
proc getFoo {fooBarRecord} {
dict with fooBarRecord {
if {$type ne "fooBarRecord"} {error "not fooBarRecord"}
return $foo
}
}
As for the whole structures versus objects debate, Tcl tends to regard objects as state with operations (leading to a natural presentation as a command, a fairly heavyweight concept) whereas structures are pure values (and so lightweight). Having written a fair chunk on this, I really don't know what's best in general; I work on a case-by-case basis. If you are going with “structures”, also consider whether you should have collections that represent fields across many structures (equivalent to using column-wise storage instead of row-wise storage in a database) as that can lead to more efficient handling in some cases.
Also consider using a database; SQLite integrates extremely well with Tcl, is reasonably efficient, and supports in-memory databases if you don't want to futz around with disk files.

I will not answer your question, because I was not been using Tcl for many years and I never use this kind of struct, but I can give you the path to two possible places that are very plausible to provide a good answer for you:
The Tcl'ers Wiki http://wiki.tcl.tk
The Frenode's Tcl IRC channel
At the time I used Tcl they proved to be invaluable resources.

Related

Is everything really a string in TCL?

And what is it, if it isn't?
Everything I've read about TCL states that everything is just a string in it. There can be some other types and structures inside of an interpreter (for performance), but at TCL language level everything must behave just like a string. Or am I wrong?
I'm using an IDE for FPGA programming called Vivado. TCL automation is actively used there. (TCL version is still 8.5, if it helps)
Vivado's TCL scripts rely on some kind of "object oriented" system. Web search doesn't show any traces of this system elsewhere.
In this system objects are usually obtained from internal database with "get_*" commands. I can manipulate properties of these objects with commands like get_property, set_property, report_property, etc.
But these objects seem to be something more than just a string.
I'll try to illustrate:
> set vcu [get_bd_cells /vcu_0]
/vcu_0
> puts "|$vcu|"
|/vcu_0|
> report_property $vcu
Property Type Read-only Value
CLASS string true bd_cell
CONFIG.AXI_DEC_BASE0 string false 0
<...>
> report_property "$vcu"
Property Type Read-only Value
CLASS string true bd_cell
CONFIG.AXI_DEC_BASE0 string false 0
<...>
But:
> report_property "/vcu_0"
ERROR: [Common 17-58] '/vcu_0' is not a valid first class Tcl object.
> report_property {/vcu_0}
ERROR: [Common 17-58] '/vcu_0' is not a valid first class Tcl object.
> report_property /vcu_0
ERROR: [Common 17-58] '/vcu_0' is not a valid first class Tcl object.
> puts |$vcu|
|/vcu_0|
> report_property [string range $vcu 0 end]
ERROR: [Common 17-58] '/vcu_0' is not a valid first class Tcl object.
So, my question is: what exactly is this "valid first class Tcl object"?
Clarification:
This question might seem like asking for help with Vivado scripting, but it is not. (I was even in doubt about adding [vivado] to tags.)
I can just live and script with these mystic objects.
But it would be quite useful (for me, and maybe for others) to better understand their inner workings.
Is this "object system" a dirty hack? Or is it a perfectly valid TCL usage?
If it's valid, where can I read about it?
If it is a hack, how is it (or can it be) implemented? Where exactly does string end and object starts?
Related:
A part of this answer can be considered as an opinion in favor of the "hack" version, but it is quite shallow in a sense of my question.
A first class Tcl value is a sequence of characters, where those characters are drawn from the Basic Multilingual Plane of the Unicode specification. (We're going to relax that BMP restriction in a future version, but that's not yet in a version we'd recommend for use.) All other values are logically considered to be subtypes of that. For example, binary strings have the characters come from the range [U+000000, U+0000FF], and integers are ASCII digit sequences possibly preceded by a small number of prefixes (e.g., - for a negative number).
In terms of implementation, there's more going on. For example, integers are usually implemented using 64-bit binary values in the endianness that your system uses (but can be expanded to bignums when required) inside a value boxing mechanism, and the string version of the value is generated on demand and cached while the integer value doesn't change. Floating point numbers are IEEE double-precision floats. Lists are internally implemented as an array of values (with smartness for handling allocation). Dictionaries are hash tables with linked lists hanging off each of the hash buckets. And so on. THESE ARE ALL IMPLEMENTATION DETAILS! As a programmer, you can and should typically ignore them totally. What you need to know is that if two values are the same, they will have the same string, and if they have the same string, they are the same in the other interpretation. (Values with different strings can also be equal for other reasons: for example, 0xFF is numerically equal to 255 — hex vs decimal — but they are not string equal. Tcl's true natural equality is string equality.)
True mutable entities are typically represented as named objects: only the name is a Tcl value. This is how Tcl's procedures, classes, I/O system, etc. all work. You can invoke operations on them, but you can only see inside to a limited extent.
Vivado TCL is not TCL. Vivado will not really document their language they call TCL, but refer you to the real TCL language documentation. Where Vivado TCL and TCL differ, you are left on your own without help. TCL was a poor choice for a scripting language given the very large data bases, so they had to bastardize it to get it half functional. You are better off getting help on the Xilinx forums then in general TCL forums. Why they went with TCL rather than python is beyond anyone's comprehension.

Performance comparison: one argument or a list of arguments?

I am defining a new TCL command whose implementation is C++. The command is to query a data stream and the syntax is something like this:
mycmd <arg1> <arg2> ...
The idea is this command takes a list of arguments and returns a list which has the corresponding data for each argument.
My colleague commented that it is best just to use a single argument and when multi values are needed, just call the command multiple times.
There are some other discussions, but one thing we cannot agree with each other is, the performance.
I think my version, list of argument should be quicker because when we want multi arguments, it is one time cost going through TCL interpreter.
His comment is new to me -
function implementation is cached
accessing TCL function is quicker than accessing TCL data
Is this reasoning sound?
If you use Tcl_EvalObjv to invoke the command, you won't go through the Tcl interpreter. The cost will be one hash-table lookup (or less, if you reuse the Tcl_Obj* containing the command name) and then you'll be in the implementation of the command. Otherwise, constructing a list Tcl_Obj* (e.g., with Tcl_NewListObj) and then calling Tcl_EvalObj is nearly as cheap, as that's a special case because the list construction code is guaranteed to produce lists that are also substitution-free commands.
Building a normal string and passing that through Tcl_Eval (or Tcl_EvalObj) is significantly slower, as that has to be parsed. (OTOH, passing the same Tcl_Obj* through Tcl_EvalObj multiple times in a row will be faster as it will be compiled internally to bytecode.)
Accessing into values (i.e., into Tcl_Obj* references) is pretty fast, provided the internal representation of those values matches the type that the access function requires. If there's a mismatch, an internal type conversion function may be called and they're often relatively expensive. To understand internal representations, here's a few for you to think about:
string — array of unicode characters
integer — a C long (except when you spill over into arbitrary precision work)
list — array of Tcl_Obj* references
dict — hash table that maps Tcl_Obj* to Tcl_Obj*
script — bytecoded version
command — pointer to the implementation function
OK, those aren't the exact types (there's often other bookkeeping data too) but they're what you should think of as the model.
As to “which is fastest”, the only sane way to answer the question is to try it and see which is fastest for real: the answer will depend on too many factors for anyone without the actual code to predict it. If you're calling from Tcl, the time command is perfect for this sort of performance analysis work (it is what it is designed for). If you're calling from C or C++, use that language's performance measurement idioms (which I don't know, but would search Stack Overflow for).
Myself? I advise writing the API to be as clear and clean as possible. Describe the actual operations, and don't distort everything to try to squeeze an extra 0.01% of performance out.

What language feature can allow transition from visitor to sequence?

I will use C# syntax since I am familiar with it, but it is not really language-specific.
Let's say we want to provide an API to go over a Tree and do something with each Node.
Solution 1: void Visit(Tree tree, Action<Node> action)
It takes a tree, and calls action on each node in the tree.
Solution 2: IEnumerable<Node> ToEnumerable(Tree tree)
It converts tree to a flat lazy sequence so we can go over and call action on each node.
Now, let's see how we can convert one API to another.
It is pretty trivial to provide Visit on top of ToEnumerable:
void Visit(Tree tree, Action<Node> action) {
ToEnumerable(tree).ForEach(action);
}
However, is there a concept/feature in any language that will allow to provide ToEnumerable on top of Visit (as lazy sequence, so list is not created in advance)?
Not sure if I understand you correctly, but in Python, you can create iterable interface on any object.
So you would just add special method __iter__ (which will yield nodes while traversing the tree).
The visit procedure is then just about iterating through Tree object and calling action on each node.
If you are writing the code that will visit each node (as with a tree), it's possible to have an iterator call iterators for each branch, and perform a yield return on leaf nodes. This approach will work, and is very simple, but has the serious disadvantage that it's very easy to end up with code that will be very readable but execute very slowly. Some other questions and answers on this site will offer insight as to how to traverse trees efficiently within an iterator.
If the "tree" was just an example, and what you really have is a class which exposes a routine to call some delegate upon each node (similar to List.ForEach()), but does not expose an IEnumerable, you may be able to use the former to produce a List, which you could then iterate. Use something like var myList = new List<someThing>(); myCollection.ForEach( (x) => myList.Add(x) ); and then you may be able to enumerate myList.
If even that isn't sufficient, because the objects that were added to the list may not be valid by the time enumeration is complete, it may in rare cases be possible to use multiple threading to accomplish what's needed. For example, if you have two sorted collections whose ForEach method prepares each items for use, does the specified action, and then cleans up each item before proceeding to the next, and if you need to interleave the actions on items from two independent collections, one might be able to iterate the collections on separate threads, and use synchronization primitives so each thread will wait as necessary for the other.
Note that collections which only expose themselves via ForEach method are apt to restrict access during the execution of such a ForEach (if such restriction weren't necessary, they would probably implement IEnumerable). It may be possible for the "item action" called by one ForEach to perform another ForEach on the same collection on the same thread, since the latter ForEach would have to complete before the former one could resume. While one ForEach is running, however, an attempt to call a ForEach on a second thread would likely either malfunction or wait for the first operation to complete. If the first ForEach was waiting for some action by the second, deadlock would result. Because of this, scenarios where multi-threading will work better than simply building a List are rare. Nonetheless, there are a few cases where it may be helpful (e.g. the above-mentioned "zipper" operation on independent collections).
I think now I understand the idea. The concept that I need here is called first-class continuations or, specifically, call/cc. The confusing thing about it for me is that C# already provides a limited implementation of this concept in yield return, but it is not applicable to my scenario.
So if C# provided full implementation, the solution would look like:
IEnumerable<Node> ToEnumerable(Tree tree) {
tree.Visit(node => magic yield return node);
}
where magic yield return instead of returning sequence from node => ... lambda returns next element from ToEnumerable.
However, this answer is still not complete as I do not see the exact correlation between yield return and call/cc. I will update the answer when I understand this.

Why all the functions from object oriented language allows to return only one value (General)

I am curious to know about this.
whenever I write a function which have to return multiple values, either I have to use pass by reference or create an array store values in it and pass them.
Why all the Object Orinented languages functions are not allowed to return multiple parameters as we pass them as input. Like is there anything inbuilt structure of the language which is restricting from doing this.
Dont you think it will be fun and easy if we are allowed to do so.
It's not true that all Object-Oriented languages follow this paradigm.
e.g. in Python (from here):
def quadcube (x):
return x**2, x**3
a, b = quadcube(3)
a will be 9 and b will be 27.
The difference between the traditional
OutTypeA SomeFunction(out OutTypeB, TypeC someOtherInputParam)
and your
{ OutTypeA, OutTypeB } SomeFunction(TypeC someOtherInputParam)
is just syntactic sugar. Also, the tradition of returning one single parameter type allows writing in the easy readable natural language of result = SomeFunction(...). It's just convenience and ease of use.
And yes, as others said, you have tuples in some languages.
This is likely because of the way processors have been designed and hence carried over to modern languages such as Java or C#. The processor can load multiple things (pointers) into parameter registers but only has one return value register that holds a pointer.
I do agree that not all OOP languages only support returning one value, but for the ones that "apparently" do, this I think is the reason why.
Also for returning a tuple, pair or struct for that matter in C/C++, essentially, the compiler is returning a pointer to that object.
First answer: They don't. many OOP languages allow you to return a tuple. This is true for instance in python, in C++ you have pair<> and in C++0x a fully fledged tuple<> is in TR1.
Second answer: Because that's the way it should be. A method should be short and do only one thing and thus can be argued, only need to return one thing.
In PHP, it is like that because the only way you can receive a value is by assigning the function to a variable (or putting it in place of a variable). Although I know array_map allows you to do return something & something;
To return multiple parameters, you return an single object that contains both of those parameters.
public MyResult GetResult(x)
{
return new MyResult { Squared = Math.Pow(x,2), Cubed = Math.Pow(x,3) };
}
For some languages you can create anonymous types on the fly. For others you have to specify a return object as a concrete class. One observation with OO is you do end up with a lot of little classes.
The syntactic niceties of python (see #Cowan's answer) are up to the language designer. The compiler / runtime could creating an anonymous class to hold the result for you, even in a strongly typed environment like the .net CLR.
Yes it can be easier to read in some circumstances, and yes it would be nice. However, if you read Eric Lippert's blog, you'll often read dialogue's and hear him go on about how there are many nice features that could be implemented, but there's a lot of effort that goes into every feature, and some things just don't make the cut because in the end they can't be justified.
It's not a restriction, it is just the architecture of the Object Oriented and Structured programming paradigms. I don't know if it would be more fun if functions returned more than one value, but it would be sure more messy and complicated. I think the designers of the above programming paradigms thought about it, and they probably had good reasons not to implement that "feature" -it is unnecessary, since you can already return multiple values by packing them in some kind of collection. Programming languages are designed to be compact, so usually unnecessary features are not implemented.

"Necessary" Uses of Recursion in Imperative Languages

I've recently seen in a couple of different places comments along the lines of, "I learned about recursion in school, but have never used it or felt the need for it since then." (Recursion seems to be a popular example of "book learning" amongst a certain group of programmers.)
Well, it's true that in imperative languages such as Java and Ruby[1], we generally use iteration and avoid recursion, in part because of the risk of stack overflows, and in part because it's the style most programmers in those languages are used to.
Now I know that, strictly speaking, there are no "necessary" uses of recursion in such languages: one can always somehow replace recursion with iteration, no matter how complex things get. By "necessary" here, I'm talking about the following:
Can you think of any particular examples of code in such languages where recursion was so much better than iteration (for reasons of clarity, efficiency, or otherwise) that you used recursion anyway, and converting to iteration would have been a big loss?
Recursively walking trees has been mentioned several times in the answers: what was it exactly about your particular use of it that made recursion better than using a library-defined iterator, had it been available?
[1]: Yes, I know that these are also object-oriented languages. That's not directly relevant to this question, however.
There are no "necessary" uses of recursion. All recursive algorithms can be converted to iterative ones. I seem to recall a stack being necessary, but I can't recall the exact construction off the top of my head.
Practically speaking, if you're not using recursion for the following (even in imperative languages) you're a little mad:
Tree traversal
Graphs
Lexing/Parsing
Sorting
When you are walking any kind of tree structure, for example
parsing a grammar using a recursive-descent parser
walking a DOM tree (e.g. parsed HTML or XML)
also, every toString() method that calls the toString() of the object members can be considered recursive, too. All object serializing algorithms are recursive.
In my work recursion is very rarely used for anything algorithmic. Things like factorials etc are solved much more readably (and efficiently) using simple loops. When it does show up it is usually because you are processing some data that is recursive in nature. For example, the nodes on a tree structure could be processed recursively.
If you were to write a program to walk the nodes of a binary tree for example, you could write a function that processed one node, and called itself to process each of it's children. This would be more effective than trying to maintain all the different states for each child node as you looped through them.
The most well-known example is probably the quicksort algorithm developed by by C.A.R. Hoare.
Another example is traversing a directory tree for finding a file.
In my opinion, recursive algorithms are a natural fit when the data structure is also recursive.
def traverse(node, function):
function(this)
for each childnode in children:
traverse(childnode, function)
I can't see why I'd want to write that iteratively.
It's all about the data you are processing.
I wrote a simple parser to convert a string into a data structure, it's probably the only example in 5 years' work in Java, but I think it was the right way to do it.
The string looked like this:
"{ index = 1, ID = ['A', 'B', 'C'], data = {" +
"count = 112, flags = FLAG_1 | FLAG_2 }}"
The best abstraction for this was a tree, where all leaf nodes are primitive data types, and branches could be arrays or objects. This is the typical recursive problem, a non-recursive solution is possible but much more complex.
Recursion can always be rewritten as iteration with an external stack. However if you're sure that you don't risk very deep recursion that would lead to stackoverflow, recursion is a very convenient thing.
One good example is traversing a directory structure on a known operating system. You usually know how deep it can be (maximum path length is limited) and therefore will not have a stackoverflow. Doing the same via iteration with an external stack is not so convenient.
It was said "anything tree". I may be too cautious, and I know that stacks are big nowadays, but I still won't use recursion on a typical tree. I would, however, do it on a balanced tree.
I have a List of reports. I am using indexers on my class that contains this list. The reports are retrieved by their screen names using the indexers. In the indexer, if the report for that screen name doesn't exist it loads the report and recursively calls itself.
public class ReportDictionary
{
private static List<Report> _reportList = null;
public ReportColumnList this[string screenName]
{
get
{
Report rc = _reportList.Find(delegate(Report obj) { return obj.ReportName == screenName; });
if (rc == null)
{
this.Load(screenName);
return this[screenName]; // Recursive call
}
else
return rc.ReportColumnList.Copy();
}
private set
{
this.Add(screenName, value);
}
}
}
This can be done without recursion using some additional lines of code.