What is the best way to create and access multidimensional hashes in Tcl, like the ones present in Perl? For example:
if ($line =~ /(\D+) ..../) {
    $hash{name}   = $1;
    $hash{height} = $2;
}
etc
You can either use composite keys like this (the neatest option in the simplest cases):
set x 1
set y 2
set d($x,$y) 3
Or you can put a dictionary in an array element:
set x 1
set y 2
dict set d($x) $y 3
Or you can use a nested dictionary:
set x 1
set y 2
dict set d $x $y 3
There are some subtleties about the differences between them, but most of the time most people's code doesn't really care and doesn't need to care.
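For reference, reading a value back looks like this for each of the three approaches (a quick sketch; assume each snippet above used its own fresh d, since an array and a plain dictionary can't share one variable):
puts $d($x,$y)              ;# composite key
puts [dict get $d($x) $y]   ;# dict in an array element
puts [dict get $d $x $y]    ;# nested dictionary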
Here's the case where you are most likely to need to care. If you're doing the first option and you can't make guarantees about what characters might be in the atomic keys, you can use list to build the overall key (as that knows how to apply quoting rules to avoid confusion):
set x "the quick, cunning brown fox"
set y "the ever-so, ever-so lazy dog"
set d([list $x $y]) "jumps over"
Of course, that makes access more awkward, as you need to use list (or another list-building command) when building the keys (or have the right string literals, which is annoying for larger keys). The other two options have no problems at all with arbitrary keys; for dictionaries, not getting mixed up over such things was an explicit design goal.
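Access then has to build the key the same way, e.g.:
puts $d([list $x $y])   ;# prints: jumps over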
For example, I want to pass a dictionary with 10 pairs into a function to bypass the valence limit of 8 arguments. I then want each key of the dictionary to be assigned as a local variable bound to its value, and to use those as the parameters for my function. Alternatively, is there a better way to pull this off?
I am afraid there is no way to functionally assign a value to a local-scope variable. E.g.
{eval parse "a: 10";}1b
creates a global variable a.
You may fix some namespace, e.g. .l, keep variables there, and clear that namespace before the function returns, e.g.:
{
  eval each (:),'flip (`$".l.",/:string key x; value x);
  r: .l.a + .l.b + .l.c;
  delete from `.l;
  r
 }`a`b`c!1 2 3
But getting values directly from the input dictionary (like x[`a]) seems an easier and clearer approach.
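For instance, with the same example dictionary, that direct style reads:
{x[`a] + x[`b] + x[`c]}`a`b`c!1 2 3   / 6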
The apply operator (.) helps to simplify invoking other functions, using dictionary values as parameters. This may be what you are looking for. E.g.
{
  f: {x+y+z};
  f . x`a`b`c
 }`a`b`c!1 2 3
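As a side note, here is the . (apply) operator on its own, spreading a plain list across a function's three parameters (a tiny illustrative example):
{x+y+z} . 1 2 3   / 6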
I have been working through the Advent of Code problems in Perl 6 this year and was attempting to use a grammar to parse Day 3's input.
Given input in this form: #1 # 1,3: 4x4 and this grammar that I created:
grammar Claim {
    token TOP {
        '#' <id> \s* '#' \s* <coordinates> ':' \s* <dimensions>
    }
    token digits {
        <digit>+
    }
    token id {
        <digits>
    }
    token coordinates {
        <digits> ',' <digits>
    }
    token dimensions {
        <digits> 'x' <digits>
    }
}
say Claim.parse('#1 # 1,3: 4x4');
I am interested in extracting the actual tokens that were matched, i.e. id, x + y from coordinates, and height + width from dimensions, from the resulting parse. I understand that I can pull them from the resulting Match object of Claim.parse(<input>), but I have to dig down through each grammar production to get the value I need, e.g.
say $match<id>.hash<digits>.<digit>;
this seems a little messy, is there a better way?
For the particular challenge you're solving, using a grammar is like using a sledgehammer to crack a nut.
Like @Scimon says, a single regex would be fine. You can keep it nicely readable by laying it out appropriately. You can name the captures and keep them all at the top level:
'#1 # 1,3: 4x4' ~~
/ ^
  '#' $<id>=(\d+) ' '
  '# ' $<x>=(\d+) ',' $<y>=(\d+)
  ': ' $<w>=(\d+) 'x' $<d>=(\d+)
  $
/;
say ~$<id x y w d>; # 1 1 3 4 4
(The prefix ~ calls .Str on the value on its right hand side. Called on a Match object it stringifies to the matched strings.)
With that out the way, your question remains perfectly cromulent as it is because it's important to know how P6 scales in this regard from simple regexes like the one above to the largest and most complex parsing tasks. So that's what the rest of this answer covers, using your example as the starting point.
Digging less messily
say $match<id>.hash<digits>.<digit>; # [「1」]
this seems a little messy, is there a better way?
Your say includes unnecessary code and output nesting. You could just simplify to something like:
say ~$match<id> # 1
Digging a little deeper less messily
I am interested in extracting the actual tokens that were matched, i.e. id, x + y from coordinates, and height + width from dimensions, from the resulting parse.
For matches of multiple tokens you no longer have the luxury of relying on Perl 6 guessing which one you mean. (When there's only one, guess which one it guesses you mean. :))
One way to write your say to get the y coordinate:
say ~$match<coordinates><digits>[1] # 3
If you want to drop the <digits> you can mark which parts of a pattern should be stored in a list of numbered captures. One way to do so is to put parentheses around those parts:
token coordinates { (<digits>) ',' (<digits>) }
Now you've eliminated the need to mention <digits>:
say ~$match<coordinates>[1] # 3
You could also name the new parenthesized captures:
token coordinates { $<x>=(<digits>) ',' $<y>=(<digits>) }
say ~$match<coordinates><y> # 3
Pre-digging
I have to dig down through each grammar production to get the value I need
The above techniques still all dig down into the automatically generated parse tree which by default precisely corresponds to the tree implicit in the grammar's hierarchy of rule calls. The above techniques just make the way you dig into it seem a little shallower.
Another step is to do the digging work as part of the parsing process so that the say is simple.
You could inline some code right into the TOP token to store just the interesting data you've made. Just insert a {...} block in the appropriate spot (for this sort of thing that means the end of the token given that you need the token pattern to have already done its matching work):
my $made;
grammar Claim {
    token TOP {
        '#' <id> \s* '#' \s* <coordinates> ':' \s* <dimensions>
        { $made = ~($<id>, $<coordinates><x y>, $<dimensions><digits>[0,1]) }
    }
    ...
Now you can write just:
say $made # 1 1 3 4 4
This illustrates that you can just write arbitrary code at any point in any rule -- something that's not possible with most parsing formalisms and their related tools -- and the code can access the parse state as it is at that point.
Pre-digging less messily
Inlining code is quick and dirty. So is using a variable.
The normal thing to do for storing data is to instead use the make function. This hangs data off the match object that's being constructed corresponding to a given rule. This can then be retrieved using the .made method. So instead of $made = you'd have:
{ make ~($<id>, $<coordinates><x y>, $<dimensions><digits>[0,1]) }
And now you can write:
say $match.made # 1 1 3 4 4
That's much tidier. But there's more.
A sparse subtree of a parse tree
.oO ( 🎶 On the first day of an imagined 2019 Perl 6 Christmas Advent calendar 🎶 a StackOverflow title said to me ... )
In the above example I constructed a .made payload for just the TOP node. For larger grammars it's common to form a sparse subtree (a term I coined for this because I couldn't find a standard existing term).
This sparse subtree consists of the .made payload for TOP: a data structure referring to the .made payloads of lower level rules, which in turn refer to payloads of still lower level rules, and so on, skipping uninteresting intermediate rules.
The canonical use case for this is to form an Abstract Syntax Tree after parsing some programming code.
In fact there's an alias for .made, namely .ast:
say $match.ast # 1 1 3 4 4
While this is trivial to use, it's also fully general. P6 uses a P6 grammar to parse P6 code -- and then builds an AST using this mechanism.
Making it all elegant
For maintainability and reusability you can, and typically should, avoid inserting code inline at the end of rules; instead, use Action objects.
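A rough sketch of that approach for this grammar (the class name and payload shapes here are illustrative; the grammar is the unmodified one from the question):
class ClaimActions {
    method coordinates ($/) {
        # <digits> matched twice, so $<digits> is a two-element list
        make ~$<digits>
    }
    method TOP ($/) {
        # combine the id, the coordinates payload, and the dimensions
        make ~($<id>, $<coordinates>.made, $<dimensions><digits>[0, 1])
    }
}
say Claim.parse('#1 # 1,3: 4x4', :actions(ClaimActions.new)).made; # 1 1 3 4 4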
In summary
There are a range of general mechanisms that scale from simple to complex scenarios and can be combined as best fits any given use case.
Add parentheses as I explained above, naming the capture that those parentheses zero in on, if that is a nice simplification for digging into the parse tree.
Inline any action you wish to take during parsing of a rule. You get full access to the parse state at that point. This is great for making it easy to extract just the data you want from a parse because you can use the make convenience function. And you can abstract all actions that are to be taken at the end of successfully matching rules out of a grammar, ensuring this is a clean solution code-wise and that a single grammar remains reusable for multiple actions.
One final thing. You may wish to prune the parse tree to omit unnecessary leaf detail (to reduce memory consumption and/or simplify parse tree displays). To do so, write <.foo>, with a dot preceding the rule name, to switch the default automatic capturing off for that rule.
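For example, applied to the digits token from the question, a one-character change stops each individual digit being captured, while digits itself still is:
token digits {
    <.digit>+
}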
You can refer to each of your named portions directly. So to get the coordinates you can access:
say $match.<coordinates>.<digits>
This will return the Array of digits matches. If you just want the values, the easiest way is probably:
say $match.<coordinates>.<digits>.map( *.Int )
or
say $match.<coordinates>.<digits>>>.Int
or even
say $match.<coordinates>.<digits>».Int
to cast them to Ints.
For the id field it's even easier: you can just cast the <id> match to an Int:
say $match.<id>.Int
I am trying to compare using an if condition:
xorg != "t8405" or "t9405" or "t7805" or "t8605" or "t8705"
I want to check that xorg is not equal to any of these values on the right side; if it isn't, then perform Y.
I am trying to figure out whether there is a smarter way to write this comparison, or should I compare xorg with each value one by one?
Regards
I think the in and ni (not in) operators are what you should look at. They test for membership (or non-membership) of a list. In this case:
if {$xorg ni {"t8405" "t9405" "t7805" "t8605" "t8705"}} {
    puts "it wasn't in there!"
}
If you've got a lot of these things and are testing frequently, you're actually better off putting the values into the keys of an array and using info exists:
foreach key {"t8405" "t9405" "t7805" "t8605" "t8705"} {
    set ary($key) 1
}
if {![info exists ary($xorg)]} {
    puts "it wasn't in there!"
}
It takes more setup doing it this way, but it's actually faster per test after that (especially from 8.5 onwards). The speedup is because arrays are internally implemented using fast hash tables; hash lookups are quicker than linear table scans. You can also use dictionaries (approximately dict set instead of set and dict exists instead of info exists) but the speed is similar.
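For reference, a sketch of that dictionary-based variant:
# Build a dict whose keys are the interesting values
set d [dict create]
foreach key {"t8405" "t9405" "t7805" "t8605" "t8705"} {
    dict set d $key 1
}
if {![dict exists $d $xorg]} {
    puts "it wasn't in there!"
}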
The final option is to use lsearch -sorted if you put that list of things in order, since that switches from linear scanning to binary search. This can also be very quick and has potentially no setup cost (if you store the list sorted in the first place) but it's the option that is least clear in my experience. (The in operator uses a very simplified lsearch internally, but just in linear-scanning mode.)
# Note: I've pre-sorted this list
set items {"t7805" "t8405" "t8605" "t8705" "t9405"}
if {[lsearch -sorted -exact $items $xorg] < 0} {
    puts "it wasn't in there!"
}
I usually use either the membership operators (because they're easy) or info exists if I've got a convenient set of array keys. I often have the latter around in practice...
I feel like I understand MAKE as being a constructor for a datatype. It takes two arguments... the first the target datatype, and the second a "spec".
In the case of objects it's fairly obvious that a block of Rebol data can be used as the "spec" to get back a value of type object!
>> foo: make object! [x: 10 y: 20 z: func [value] [print x + y + value] ]
== make object! [
x: 10
y: 20
]
>> print foo/x
10
>> foo/z 1
31
I know that if you pass an integer when you create a block, it will preallocate enough underlying memory to hold a block of that length, despite being empty:
>> foo: make block! 10
== []
That makes some sense. If you pass a string in, then you get the string parsed into Rebol tokens...
>> foo: make block! "some-set-word: {String in braces} some-word 12-Dec-2012"
== [some-set-word: "String in braces" some-word 12-Dec-2012]
Not all types are accepted, and again I'll say so far... so good.
>> foo: make block! 12-Dec-2012
** Script error: invalid argument: 12-Dec-2012
** Where: make
** Near: make block! 12-Dec-2012
By contrast, the TO operation is defined very similarly, except it is for "conversion" instead of "construction". It also takes a target type as a first parameter, and then a "spec". It acts differently on values:
>> foo: to block! 10
== [10]
>> foo: to block! 12-Dec-2012
== [12-Dec-2012]
That seems reasonable. If it received a non-series value, it wrapped it in a block. If you try an any-block! value with it, I'd imagine it would give you a block! series with the same values inside:
>> foo: to block! quote (a + b)
== [a + b]
So I'd expect a string to be wrapped in a block, but it just does the same thing MAKE does:
>> foo: to block! "some-set-word: {String in braces} some-word 12-Dec-2012"
== [some-set-word: "String in braces" some-word 12-Dec-2012]
Why is TO so redundant with MAKE here, and what is the logic behind their distinction? Passing an integer into to block! gets the number inside a block (instead of invoking the special preallocation behavior), and a date goes into to block! as the date in a block instead of raising an error as with MAKE. So why wouldn't one want a to block! of a string to put that string inside a block?
Also: beyond reading the C sources for the interpreter, where is the comprehensive list of specs accepted by MAKE and TO for each target type?
MAKE is a constructor, TO is a converter. The reason that we have both is that for many types that operation is different. If they weren't different, we could get by with one operation.
MAKE takes a spec that is supposed to be a description of the value you're constructing. This is why you can pass MAKE a block and get values like objects or functions that aren't block-like at all. You can even pass an integer to MAKE and have it be treated like an allocation directive.
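To see the contrast concretely, reusing the integer examples from the question:
>> make block! 10   ; the spec is an allocation directive
== []
>> to block! 10     ; the value is converted directly
== [10]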
TO takes a value that is intended to be more directly converted to the target type (this value being called "spec" is just an unfortunate naming mishap). This is why the values in the input more directly correspond to the values in the output. Whenever there is a sensible default conversion, TO does it. That is why many types don't have TO conversions defined between them; the types are too different conceptually. We have fairly comprehensive conversions for some types where this is appropriate, such as to strings and blocks, but have carefully restricted some other conversions that are more useful to prohibit, such as from none to most types.
In some cases of simple types, there really isn't a complex way to describe the type. For them, it doesn't hurt to have the constructors just take self-describing values as their specs. Coincidentally, this ends up being the same behavior as TO for the same type and values. This doesn't hurt, so it's not useful to trigger an error in this case.
There are no comprehensive docs for the behavior of MAKE and TO because in Rebol 3 their behavior isn't completely finalized. There is still some debate in some cases about what the proper behavior should be. We're trying to make things more balanced, without losing any valuable functionality. We've already done a lot of work improving none and binary conversions, for instance. Once they are more finalized, and once we have a place to put them, we'll have more docs. In the meanwhile most of the Rebol 2 behavior is documented, and most of the changes so far for Rebol 3 are in CureCode.
Also: beyond reading the C sources for the interpreter, where is the comprehensive list of specs accepted by MAKE and TO for each target type?
This may not be that useful, since it's Red-specific:
comparison-matrix
conversion-matrix
But they do at least mention whether the behaviour is similar to or different from Rebol's.
I would like advice from Tcl professionals on best practice.
Say you want to construct a list with specific data by using a proc. Now which is the best way?
proc processList { myList } {
    upvar $myList list_
    # append necessary data into list_
}

proc returnList {} {
    set list_ {}
    # append necessary data into list_
    return $list_
}

set list1 {}
processList list1

set list2 [returnList]
Which one of these practices is recommended?
EDIT: I am sorry, but I can't understand the consensus (and the explanations) of the people who answered this question.
I virtually always use the second method:
proc returnList {} {
    set result {}
    # ... accumulate the result like this ...
    lappend result a b c d e
    return $result
}

set lst [returnList]
There's virtually no difference in memory usage or speed, but I find it easier to think functionally. Also, in Tcl 8.5 you can do the splitting up of the result list relatively simply (if that's what you need):
set remainderList [lassign [returnList] firstValue secondValue]
With that, you'd end up with a in $firstValue, b in $secondValue, and c d e in $remainderList.
The first option modifies an existing list whereas the second creates a new list. For a better comparison, I would add a parameter to returnList and create the return value from that parameter.
Given that, the difference is in the way of passing parameters -- by reference or by value -- and in the memory budget needed by each operation.
There is no side effect with the second method, but it could consume a lot of memory if the list is huge.
There is no general rule for recommending one over the other. My own rule is to start with the second way unless other constraints dictate otherwise.
What would the syntax be for the equivalent of:
set lst [returnList]
if you wanted to return more than one list at once?
I thought that if you did something like this:
return [$list1 $list2]
It was supposed to return a list of lists, which you could then access with lindex. But that doesn't appear to be exactly what it does; it really just gives you two lists back, without outer curly braces grouping them into a single list. In Perl, you can do something like:
($var1, $var2) = process_that_returns_two_values;
But I don't think "set" allows that in Tcl.
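For what it's worth, a sketch of how the list/lassign combination shown earlier in the thread covers this (the proc and variable names are illustrative):
proc returnTwoLists {} {
    set l1 {a b c}
    set l2 {d e f}
    # list adds the outer level of grouping that [$list1 $list2] lacks
    return [list $l1 $l2]
}

lassign [returnTwoLists] var1 var2
puts $var1   ;# prints: a b c
puts $var2   ;# prints: d e f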