parse specific branch in hierarchically formatted text file in Tcl - tcl

I'm trying to parse an ASCII text file that looks like this:
KEY1 VAL1
  KEY2 VAL2
    KEY3 VAL3
      KEY4 VAL4
  KEY5 VAL5
    KEY6 VAL6
      KEY7 VAL7
KEY8 VAL8
  KEY9 VAL9
I would like to convert this to a flat table of the values from KEYs 1, 5 and 7. I have a very ugly brute-force algorithm that loops through the file and sets flags to read values, but it doesn't seem very efficient.
Something like:
set f [open $filename]
set data [split [read $f] "\n"]
foreach line $data {
if {[string match KEY1* $line] ==1} {set key1match 1}
if {($key1match==1) && ([string match KEY5* $line] ==1)} {set key5match 1}
...
Is there a more elegant way to generate this mapping?

Is this what you wanted?
set keylist {}
set keyset {KEY1 KEY5 KEY7}
set flatDict {}
foreach line [split [string trim $input] \n] {
if {[regexp {(\s*)(\w+)\s*(.*)} $line -> indent key val] && $key in $keyset} {
set level [expr {[string length $indent] / 2}]
set keylist [lrange $keylist 0 $level-1]
lappend keylist $key
dict set flatDict $keylist $val
}
}
% set flatDict
KEY1 VAL1 {KEY1 KEY5} VAL5 {KEY1 KEY5 KEY7} VAL7
This code keeps a list of keys, keylist, that grows (by lappend) and contracts (by lrange) according to indentation, so it is completely dependent on the indentation being correct. Only keys in a given set, keyset, are considered. For each value added to the dictionary, the current $keylist is used as a single key. (The dict command can deal with key hierarchies, but then the keys must be passed as separate arguments and not inside a list, e.g. dict set myDict foo bar 123.)
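To make the difference between a list used as one key and a key hierarchy concrete, here is a minimal sketch. The variable names flat and nested are made up for illustration, and the {*} expansion assumes Tcl 8.5 or later.

```tcl
set keylist {KEY1 KEY5}

# Flat: the whole list is stored as ONE key.
set flat {}
dict set flat $keylist VAL5

# Nested: {*} expands the list, so each word becomes one level.
set nested {}
dict set nested {*}$keylist VAL5

puts [dict keys $flat]            ;# {KEY1 KEY5}
puts [dict get $flat $keylist]    ;# VAL5
puts [dict keys $nested]          ;# KEY1
puts [dict get $nested KEY1 KEY5] ;# VAL5
```

The flat form is what lets KEY1 hold VAL1 while {KEY1 KEY5} holds VAL5. In the nested form, KEY1 would have to be either a value or a sub-dictionary, not both.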
Documentation: && (operator), / (operator), dict, expr, foreach, if, in (operator), lappend, lrange, regexp, set, split, string, Syntax of Tcl regular expressions
Afterthought: with your selection of keys there is actually no need to contract the key list. If you only use keys that follow a single line of descent from the root, you could use this code:
set keylist {}
set flatDict {}
foreach line [split [string trim $input] \n] {
set val [lassign [split [string trim $line]] key]
if {$key in $keyset} {
lappend keylist $key
dict set flatDict $keylist $val
}
}
% set flatDict
KEY1 VAL1 {KEY1 KEY5} VAL5 {KEY1 KEY5 KEY7} VAL7
Note that in both examples, I have provided for values that might contain whitespace. The code can be made a little more regular if the value is always atomic.
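As a rough sketch of the "more regular" variant for atomic values: if each line holds exactly one key and one whitespace-free value, both can be picked up with a single lassign. The sample input below is an assumption reconstructed from the question, and the in operator needs Tcl 8.5 or later.

```tcl
set input "KEY1 VAL1\n  KEY2 VAL2\n  KEY5 VAL5\n    KEY6 VAL6\n      KEY7 VAL7"
set keyset {KEY1 KEY5 KEY7}
set keylist {}
set flatDict {}
foreach line [split [string trim $input] \n] {
    # Collect the non-blank words; the first two are key and value.
    lassign [regexp -all -inline {\S+} $line] key val
    if {$key in $keyset} {
        lappend keylist $key
        dict set flatDict $keylist $val
    }
}
puts $flatDict
# KEY1 VAL1 {KEY1 KEY5} VAL5 {KEY1 KEY5 KEY7} VAL7
```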

Here's some code to parse that data into a dictionary:
set indent_width 2
set d [dict create]
set fh [open [lindex $argv 0] r]
while {[gets $fh line] != -1} {
if {![regexp {^(\s*)(\S+)\s*(.*)} $line -> indent key value]} continue
set level [expr {[string length $indent] / $indent_width}]
dict set d $key level $level
dict set d $key value $value
dict set d $key children [list]
dict set d $key parent ""
dict set d last $level $key
set prev_level [expr {$level - 1}]
if {$prev_level >= 0} {
set parent_key [dict get $d last $prev_level]
dict update d $parent_key item {
dict lappend item children $key
}
dict set d $key parent $parent_key
}
}
close $fh
dict unset d last
dict for {key value} $d {puts [list $key $value]}
This outputs:
KEY1 {level 0 value VAL1 children {KEY2 KEY5} parent {}}
KEY2 {level 1 value VAL2 children KEY3 parent KEY1}
KEY3 {level 2 value VAL3 children KEY4 parent KEY2}
KEY4 {level 3 value VAL4 children {} parent KEY3}
KEY5 {level 1 value VAL5 children KEY6 parent KEY1}
KEY6 {level 2 value VAL6 children KEY7 parent KEY5}
KEY7 {level 3 value VAL7 children {} parent KEY6}
KEY8 {level 0 value VAL8 children KEY9 parent {}}
KEY9 {level 1 value VAL9 children {} parent KEY8}
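With the parent links in place, the branch leading to any key can be recovered by walking upwards. The following is only a sketch: the proc name ancestry is hypothetical, and d is a hand-copied subset of the output above so that the example is self-contained.

```tcl
# Subset of the parsed dictionary shown above.
set d {
    KEY1 {level 0 value VAL1 children {KEY2 KEY5} parent {}}
    KEY5 {level 1 value VAL5 children KEY6 parent KEY1}
    KEY6 {level 2 value VAL6 children KEY7 parent KEY5}
    KEY7 {level 3 value VAL7 children {} parent KEY6}
}

# Follow the parent links from key up to the root and
# return the keys in root-first order.
proc ancestry {d key} {
    set path {}
    while {$key ne ""} {
        set path [linsert $path 0 $key]
        set key [dict get $d $key parent]
    }
    return $path
}

puts [ancestry $d KEY7]   ;# KEY1 KEY5 KEY6 KEY7
foreach k [ancestry $d KEY7] {
    puts "$k [dict get $d $k value]"
}
```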

Related

assign a key value list to an array in Tcl

I have a list that is a key value paired list. Something like the following
key1 value1 key2 value2 key3 value3
I would like to map this to an array or a dictionary.
Currently, my code looks like this
for {set i 0} {$i < [llength $list]} {incr i} {
if {[expr {fmod($i,2)}] == 0} {
set key [lindex $list $i]
} else {
set v_array(${key}) [lindex $list $i]
}
}
In Perl, I know this can be assigned to a key-value dictionary in one pass. Is there a similarly simple method in Tcl?
You can create an array in one line (I'm using one line to define the list):
% set list [list key1 value1 key2 value2 key3 value3]
key1 value1 key2 value2 key3 value3
% array set v_array $list
And if you want to check the contents, you can use parray:
% parray v_array
v_array(key1) = value1
v_array(key2) = value2
v_array(key3) = value3
The documentation for the other array commands can be found on the array manual page, with examples for each.
If you somehow cannot avoid a loop, then foreach would be easier (be sure the list has an even number of elements):
foreach {a b} $list {
set v_array($a) $b
}
(Here foreach takes the elements of $list two at a time and assigns them to a and b.)
You can use the dict command for creating and manipulating dictionaries in Tcl.
% set mydict [dict create key1 value1 key2 value2 key3 value3]
key1 value1 key2 value2 key3 value3
% dict get $mydict
key1 value1 key2 value2 key3 value3
% dict get $mydict key3
value3
% dict get $mydict key1
value1
Even without the dict create command, you can fetch the keys and values straight from a list, as long as it is in key-value form, i.e. it has an even number of elements.
For example:
% set mylist {key1 value1 key2 value2 key3 value3}
key1 value1 key2 value2 key3 value3
% dict get $mylist key2
value2
As you can see, I have not used the dict create command here, but I am still able to access the dictionary items.
Reference : dict
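A small sketch of what else works on a plain key-value list, and what happens when the list has an odd number of elements (the variable name odd is just for illustration):

```tcl
set mylist {key1 value1 key2 value2 key3 value3}

# Any even-length list can be iterated and queried as a dictionary.
dict for {k v} $mylist {
    puts "$k -> $v"
}
puts [dict exists $mylist key2]   ;# 1
puts [dict size $mylist]          ;# 3

# An odd-length list is not a valid dictionary: dict raises an error.
set odd {key1 value1 key2}
set failed [catch {dict get $odd key1} msg]
puts "failed=$failed: $msg"
```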
Easiest solution:
set mylist {key1 value1 key2 value2 key3 value3}
array set arr $mylist
That's it.
Now use parray to check.
file: t3
#
set list [list key1 value1 key2 value2 key3 value3]
array set arr $list
parray arr
#
Execute the file: tclsh t3
arr(key1) = value1
arr(key2) = value2
arr(key3) = value3

Robust way to pick value from X-Y table in TCL

I would like to know the most robust method for extracting the Y value for a given X value from columns of X-Y data.
I am currently doing this with the following code, but it is very unreliable: it keeps failing with the error "can't read "var_01": no such variable".
Please advise.
Iterate based on Column Z
for {set i 0} {$i < [llength $Col_z]} {incr i} {
set Xdata [lindex $Col_x $i]
set Ydata [lindex $Col_y $i]
lappend var $Ydata
if { $Xdata >= 0.9 && $Xdata <= 1.1 } {
set a [lindex $var $i]
lappend var_01 $a
} else {lappend var_01 0
#set var_01 0}
}
It's very hard to work out what you want to do, but maybe it helps to simplify the code a bit:
foreach z $Col_z x $Col_x y $Col_y {
if {$z eq {}} {
break
}
if {$x >= 0.9 && $x <= 1.1} {
lappend var_01 $y
} else {
lappend var_01 0
}
}
Edit according to comment: is this better?
set var_01 {}
foreach z $Col_z x $Col_x y $Col_y {
if {$z eq {}} {
break
}
if {$x >= 0.9 && $x <= 1.1} {
lappend var_01 $y
}
}
Note that var_01 might be empty if no value of x is within the range.
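The same filter can be wrapped in a proc so that the result variable always exists, even when nothing matches. This is a sketch only: the proc name pickY and its parameters are hypothetical, and the sample columns are made up.

```tcl
# Return the y values whose matching x lies within [lo, hi].
proc pickY {xs ys lo hi} {
    set picked {}            ;# always defined, possibly empty
    foreach x $xs y $ys {
        if {$x >= $lo && $x <= $hi} {
            lappend picked $y
        }
    }
    return $picked
}

puts [pickY {0.5 0.95 1.0 1.3} {10 20 30 40} 0.9 1.1]   ;# 20 30
puts [llength [pickY {0.5 1.3} {10 40} 0.9 1.1]]        ;# 0
```

Initialising the result before the loop is what avoids the "no such variable" error when no x value falls in the range.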
Documentation: && (operator), <= (operator), >= (operator), break, eq (operator), foreach, if, lappend, set
A very convenient way to represent tables in Tcl is with a simple array. Here is an example:
array set xy {}
foreach i {1 2 3} {
foreach j {10 20 30} {
set xy($i,$j) [expr {$i + $j}]
}
}
Now xy is an array whose keys look like table indexes. Here:
% array names xy
3,10 2,20 1,30 3,20 2,30 3,30 1,10 2,10 1,20
Or, more clearly:
% foreach k [array names xy] {puts $k}
3,10
2,20
1,30
3,20
2,30
3,30
1,10
2,10
1,20
Here is how to access them:
% puts $xy(3,10)
13
The 3,10 inside the parentheses is a string! The array returns the value that the loop above associated with that string. (Therefore there must be no space after the comma.)
It's easy to access the values if the indexes are given in variables:
% set x 3
3
% set y 10
10
% puts $x,$y
3,10
The last command is equivalent to explicit quotation marks:
% puts "$x,$y"
3,10
And here is how we access the array element at that key:
% puts $xy($x,$y)
13
And if the key doesn't exist:
% puts $xy(4,10)
can't read "xy(4,10)": no such element in array
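To avoid that error, the element can be tested with info exists before it is read. A minimal sketch, where the helper name lookup and its default argument are assumptions made for this example:

```tcl
array set xy {3,10 13 2,20 22}

# Return arr(x,y) if it exists, otherwise the given default.
proc lookup {arrName x y {default ""}} {
    upvar 1 $arrName arr
    if {[info exists arr($x,$y)]} {
        return $arr($x,$y)
    }
    return $default
}

puts [lookup xy 3 10]      ;# 13
puts [lookup xy 4 10 NA]   ;# NA
```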
Let's conclude with printing the keys and values of the array:
% foreach k [array names xy] {puts "$k: $xy($k)"}
3,10: 13
2,20: 22
1,30: 31
3,20: 23
2,30: 32
3,30: 33
1,10: 11
2,10: 12
1,20: 21
ADDED
Now suppose you have the y and z values, how do you find the x?
set y 20
set z 23
We can use the special, powerful Tcl property that everything is a string.
Here we find all keys and values matching the key pattern *,20:
set results [array get xy *,$y]
Let's see:
% puts $results
2,20 22 3,20 23 1,20 21
We got a list of 3 pairs, each contains the key and value.
Now let's extract the key/value pair that corresponds to our $z. We will use the powerful regexp command, treating $results as a string instead of a list:
regexp "(\\d+),($y) ($z)" $results whole x1 y1 z1
And now x1, y1, z1 hold all the information we want:
% puts "$x1 $y1 $z1"
3 20 23
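As an alternative sketch that does not depend on the string form of the result, the key/value pairs returned by array get can be walked with foreach and the key split on the comma. This is a different technique from the regexp above, and the small array here is a stand-in for the full table.

```tcl
array set xy {1,20 21 2,20 22 3,20 23 3,10 13}
set y 20
set z 23

set x1 ""
foreach {key value} [array get xy *,$y] {
    if {$value == $z} {
        # key looks like "3,20": split it into its x and y parts
        lassign [split $key ,] x1 y1
        break
    }
}
puts "$x1 $y1 $z"   ;# 3 20 23
```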

Count number of unique element in a list

Say I have a list, a b c b b d e e f …, and I don't know how many different kinds of elements are in there.
How do I count the number of each unique element and print them out?
The output would look like:
a: 32
b: 12
c: 6
…
You have to count them up. This isn't too hard with an array or dictionary of counters. I'll use a dictionary since then they'll be printed in order of first occurrence. (With an array, you'd get a “random” order or you'd have to sort them.)
set counters {}
foreach item $list {
dict incr counters $item
}
dict for {item count} $counters {
puts "${item}: $count"
}
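For comparison, the array variant mentioned above might look like the following sketch. It assumes Tcl 8.5 or later, where incr creates a missing variable automatically, and it sorts the names because array order is not defined. The array name count is just for illustration.

```tcl
set list {a b c b b d e e f}

# One counter per distinct element, stored in an array.
foreach item $list {
    incr count($item)
}

# Arrays have no defined order, so sort the names before printing.
foreach item [lsort [array names count]] {
    puts "$item: $count($item)"
}
```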
Try this if you have Tcl 8.4 or older:
set lst "a a a a b b b c c c d d a a a f f f f f s s s s"
set unique [lsort -unique $lst]
foreach f $unique {
set cnt 0
foreach item $lst {
if {$item == $f} {
incr cnt
}
}
puts "$f :: $cnt"
}
This gives output like:
% tclsh main.tcl
a :: 7
b :: 3
c :: 3
d :: 2
f :: 5
s :: 4
It can easily be done using lsearch and llength.
Let's say your list is {a c a c s a a c a}. Then:
set tempList {a c a c s a a c a}
puts "c : [llength [lsearch -all -exact $tempList c]]"
puts "a : [llength [lsearch -all -exact $tempList a]]"
puts "d : [llength [lsearch -all -exact $tempList d]]"
Output :
c : 3
a : 5
d : 0
Explanation: lsearch -all returns the indices of all matching elements (-exact avoids the default glob matching), and llength then counts the entries in that list of indices.
The dict or array solution is the best one and should be preferred. Another way that works on a sorted list of tokens is to match contiguous regions of non-blank tokens.
% regexp -all -inline {(\S+)(?:\s+\1)*} {a a b b b c d d}
{a a} a {b b b} b c c {d d} d
The result is an even-sized list that alternates between each matched region of tokens and the token matched in that region. This can be used to print a frequency report for the tokens in the list stored in the variable list.
foreach {a b} [regexp -all -inline {(\S+)(?:\s+\1)*} [lsort $list]] {
puts "$b: [llength $a]"
}
Note the limitation that the tokens cannot contain blanks. This can be overcome, but it's simpler to use the array / dict solution which only requires the tokens to be valid list elements.
Documentation: foreach, llength, lsort, puts, Syntax of Tcl regular expressions, regexp

TCL sort a file mathematically

I have a file which has multiple lines like :-
A B A 10 20
A B A 10 20
C D A 10 15
A B Q 15 20
A B A 35 45
A B A 15 20
C D A 10 15
A B A 20 25
.
.
.
A A A x1 y1
The first three fields are some text patterns.
Now I want to write a program in Tcl which does BOTH of the following:
Does a unique sort ("sort -u") of the file, removes the repeated lines and dumps the output in a new file.
For cases where the first three fields are the same, dumps only those lines where the numbers differ by more than 10 from each other.
For example, the output for the above file satisfying both conditions will be:
A B A 10 20
A B A 35 45
C D A 10 15
A B Q 15 20
The order of lines is not important in file.
##Changed the program
set input [open "data.txt" "r"]
set content [read $input]
set lines [lsort -unique [split $content "\n"]]
set keylist ""
set valuelist ""
foreach line $lines {
if {$line == ""} { continue }
set data [split $line " "]
set key [join [lrange $data 0 2] "_"]
set index [lsearch $keylist $key]
if {$index != -1} {
set value [lindex $valuelist $index]
set diff_a [expr {[lindex $data 3] - [lindex $value 0]}]
set diff_b [expr {[lindex $data 4] - [lindex $value 1]}]
if {$diff_a > 10 && $diff_b > 10 } {
puts $line
}
set a [lreplace $valuelist $index $index [lrange $data 3 4]]
set valuelist $a
} else {
lappend keylist $key
lappend valuelist [lrange $data 3 4]
puts $line
}
}
It's not a smart solution, but it works.
set input [open "data.txt" "r"]
set content [read $input]
set lines [lsort -unique [split $content "\n"]]
set keylist ""
set valuelist ""
foreach line $lines {
if {$line == ""} { continue }
set data [split $line " "]
set key [join [lrange $data 0 2] "_"]
set index [lsearch $keylist $key]
if {$index != -1} {
set value [lindex $valuelist $index]
set diff_a [expr {[lindex $data 3] - [lindex $value 0]}]
set diff_b [expr {[lindex $data 4] - [lindex $value 1]}]
if {$diff_a > 10 && $diff_b > 10 } {
puts $line
}
set valuelist [lreplace $valuelist $index $index [lrange $data 3 4]]
} else {
lappend keylist $key
lappend valuelist [lrange $data 3 4]
puts $line
}
}
Output:
A B A 10 20
A B A 35 45
A B Q 15 20
C D A 10 15
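The parallel keylist/valuelist bookkeeping can also be expressed with a dictionary keyed on the first three fields. The sketch below follows the same logic as the program above (compare each line with the previous line that had the same key). The proc name filterLines is hypothetical, and the input is passed in as a list of lines instead of being read from data.txt.

```tcl
# Keep the first line for each key, and later lines only when both
# numbers exceed the previous line's numbers by more than 10.
proc filterLines {lines} {
    set last [dict create]   ;# key -> {a b} of the previous line
    set kept {}
    foreach line [lsort -unique $lines] {
        if {[string trim $line] eq ""} continue
        lassign $line f1 f2 f3 a b
        set key [list $f1 $f2 $f3]
        if {![dict exists $last $key]} {
            lappend kept $line
        } else {
            lassign [dict get $last $key] pa pb
            if {$a - $pa > 10 && $b - $pb > 10} {
                lappend kept $line
            }
        }
        dict set last $key [list $a $b]
    }
    return $kept
}

set lines {
    "A B A 10 20" "A B A 10 20" "C D A 10 15" "A B Q 15 20"
    "A B A 35 45" "A B A 15 20" "C D A 10 15" "A B A 20 25"
}
puts [join [filterLines $lines] \n]
```

Like the original, this relies on lsort ordering the lines as strings, so it assumes the numeric fields sort sensibly as text.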

How to pass a dictionary with more arguments into a proc in tcl?

proc test {a b c } {
puts $a
puts $b
puts $c
}
set test_dict [dict create a 2 b 3 c 4 d 5]
Now I want to pass the dict into test like this:
test $test_dict
How can I make test select only the three elements of the dict whose keys have the same names as its parameters? The expected output should be:
2
3
4
This is because it selects a, b and c from the dictionary but not d. How can I do this? I have seen some code that does this, but I can't make it work.
I think you should use dict get:
proc test {test_dic} {
puts [dict get $test_dic a]
puts [dict get $test_dic b]
puts [dict get $test_dic c]
}
set test_dict [dict create a 2 b 3 c 4 d 5]
test $test_dict
Edit:
Another variant would be to use dict with:
proc test {test_dic} {
dict with test_dic {
puts $a
puts $b
puts $c
}
}
set test_dict [dict create a 2 b 3 c 4 d 5]
test $test_dict
But test still receives the dictionary as a single argument.
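If the three-parameter signature of test has to stay as it is, one possible sketch is a small wrapper that looks up each parameter name in the dictionary and expands the values into the call. The helper name callWithDict is hypothetical. It uses info args to read the parameter names and {*} (Tcl 8.5 or later) to expand the list. Here test returns its arguments instead of printing them so that the result is easy to check.

```tcl
proc test {a b c} {
    return [list $a $b $c]
}

# Call procName with the dictionary entries whose keys match
# its parameter names; extra keys such as d are ignored.
proc callWithDict {procName d} {
    set values {}
    foreach name [info args $procName] {
        lappend values [dict get $d $name]
    }
    return [$procName {*}$values]
}

set test_dict [dict create a 2 b 3 c 4 d 5]
puts [callWithDict test $test_dict]   ;# 2 3 4
```

A missing key makes dict get raise an error, which is probably the behaviour you want when a required parameter is absent.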