How to break a single Tcl list into multiple sublists that are easily searchable?

I have a single Tcl list that is extracted from a text file:
{ First Name = John
Last Name = Doe
Country = USA
Hello;
World;
Hello;
World;
First Name = Dwayne
Last Name = Jhonson
Country = USA
DoYou;
Smellwhatthe;
RockisCooking;
First Name = Harry
Last Name = Potter
Country = UK
The;
BoyWHo;
Lived; }
I want the user to be able to input the text file (list), first name, last name and country. The code then needs to dump out the remaining information for further post-processing.
The way I am thinking of coding it right now is with multiple for loops, but I am sure there is a more efficient way to do this. Any tips?
proc display_name_information {text_file first_name last_name country} {
    set fid [open $text_file r]
    set filecontent [read $fid]
    close $fid
    set input_list [split $filecontent "\n"]
    foreach line $input_list {
        set first_word [lindex $line 0]
        set second_word [lindex $line 1]
        set third_word [lindex $line 2]
        set fourth_word [lindex $line 3]
        # For "First Name = John", third_word is "=" and fourth_word is "John"
        if {$first_word eq "First" && $fourth_word eq $first_name} {
            # Then similarly check last name and country,
            # and then output everything until the keyword "First Name" appears again
        }
    }
}
This feels very inefficient for large files.

A generic method of processing a text file is a state machine. In the following example, each time a text line matches the expected pattern, the state machine moves to the next state. In each state, you may do further processing, such as extracting data from the text line. This continues until all lines have been read.
set state 0
set chn1 [open input_file r]
array set record [list]
while { [gets $chn1 s1] >= 0 } {
    set s1 [string trim $s1]
    switch -- $state {
        0 - 3 {
            if { [regexp {^First Name\s*=\s*(.*)$} $s1 match data] } {
                set first_name $data
                set state 1
            } elseif { $state == 3 } {
                # Any other line belongs to the record that is currently open
                append record($key) $s1 { }
            }
        }
        1 {
            if { [regexp {^Last Name\s*=\s*(.*)$} $s1 match data] } {
                set last_name $data
                set state 2
            }
        }
        2 {
            if { [regexp {^Country\s*=\s*(.*)$} $s1 match data] } {
                set country $data
                set key ${first_name},${last_name},${country}
                set state 3
            }
        }
    }
}
close $chn1
parray record
exit
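For the original goal (look a person up and dump the remaining lines), one option is to key the records by a three-element list in a dict instead of a comma-joined string, so the lookup is a single dict get. This is a sketch under assumptions: the input is passed as a string rather than read from a file, and parse_records and lookup_record are hypothetical names, not part of the answer above.

```tcl
# Hypothetical helpers: parse_records builds a dict keyed by
# {first last country}; lookup_record returns that person's remaining lines.
proc parse_records {text} {
    set records [dict create]
    set header [dict create]
    set key ""
    foreach line [split $text \n] {
        set line [string trim $line]
        if {$line eq ""} continue
        if {[regexp {^(First Name|Last Name|Country)\s*=\s*(.*)$} $line -> field value]} {
            dict set header $field $value
            if {$field eq "Country"} {
                # Header complete: open a new record for this person
                set key [list [dict get $header "First Name"] [dict get $header "Last Name"] $value]
                dict set records $key {}
            }
        } elseif {$key ne ""} {
            # Any other line belongs to the record that is currently open
            dict lappend records $key $line
        }
    }
    return $records
}

proc lookup_record {records first_name last_name country} {
    set key [list $first_name $last_name $country]
    if {[dict exists $records $key]} {
        return [dict get $records $key]
    }
    return {}
}

# Sample input: the first two records from the question
set text {First Name = John
Last Name = Doe
Country = USA
Hello;
World;
Hello;
World;
First Name = Dwayne
Last Name = Jhonson
Country = USA
DoYou;
Smellwhatthe;
RockisCooking;}

set records [parse_records $text]
# Prints DoYou; Smellwhatthe; RockisCooking; on separate lines
puts [join [lookup_record $records Dwayne Jhonson USA] \n]
```

Parsing happens once; every later lookup is a dict access rather than another pass over the file.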

Related

Tcl: how to sort certain words in the text and take the last one

I have a text that contains:
#AA_VERSION = Aa/10.10-d87_1
#AA_VERSION = Aa/10.10-d887_1
#AA_VERSION = Aa/10.10-d138_1
#AA_VERSION = Aa/10.10-d82_1
How can I sort all the lines beginning with #AA_VERSION = and print the last one?
And if the text doesn't have a line beginning with #, how can I show a blank or report that there is no version?
Thanks for your kind help!
Assuming you've already got a list of the contents of the lines, what you need to do is iterate over that list and test whether the line in question matches your criteria; if it does, you store the matched information in a variable. At the end of the loop, the variable will contain the last such information that was matched.
set version ""
set current ""
foreach line $lines {
    if {[regexp {^(#?)AA_VERSION *= *(.+)} $line -> commented info]} {
        if {$commented eq "#"} {
            set version [string trim $info]
        } else {
            if {$current ne ""} {
                puts stderr "WARNING: multiple current versions"
            }
            set current [string trim $info]
        }
    }
}
# All lines scanned; describe what we've found
if {$version eq ""} {
    puts "no #AA_VERSION line"
} else {
    puts "#AA_VERSION is $version"
}
if {$current eq ""} {
    puts "no current AA_VERSION"
} else {
    puts "current AA_VERSION is $current"
}
The classic way to get a list of all lines in a file is this procedure:
proc linesOf {filename} {
    set f [open $filename]
    # read from the channel, not from the file name
    set data [read $f]
    close $f
    return [split $data "\n"]
}
set lines [linesOf "mydata.txt"]
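The loop above keeps the last matching line in file order. If you literally want the highest version after sorting, as the question asks, lsort -dictionary compares embedded numbers as integers, so d82 sorts before d87, d138 and d887. A minimal sketch, assuming the lines are already in a list called lines (here filled with the question's sample data instead of linesOf):

```tcl
# Sample data from the question (in practice: set lines [linesOf "mydata.txt"])
set lines {
    "#AA_VERSION = Aa/10.10-d87_1"
    "#AA_VERSION = Aa/10.10-d887_1"
    "#AA_VERSION = Aa/10.10-d138_1"
    "#AA_VERSION = Aa/10.10-d82_1"
}

# Collect the version string from every #AA_VERSION line
set versions {}
foreach line $lines {
    if {[regexp {^#AA_VERSION\s*=\s*(\S+)} $line -> v]} {
        lappend versions $v
    }
}

if {[llength $versions] == 0} {
    puts "no #AA_VERSION line"
} else {
    # Dictionary sort treats digit runs as numbers, so the last element is the highest
    set latest [lindex [lsort -dictionary $versions] end]
    puts "highest #AA_VERSION is $latest"
}
```

A plain lsort (ASCII order) would compare character by character and put d887 before d138's successors incorrectly, which is why -dictionary is used here.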

Tcl script to search strings and store in variable

Loop = (
{
Value = (
{
{
Key :
{
A = "B";
C = "D";
Class = (
{
section = "section_a";
Pairs = (
{
Name = "Ram";
Mark = "80";
},
{
Name = "Latha";
Mark = "70";
},
{
Name = "Mohan";
Mark = "90";
},
{
Name = "David";
Mark = "76";
} );
} );
Id = 1;
};
Absent :
{
DAYS = "Two days";
};
},
{
Key :
{
A = "B";
C = "D";
Class = (
{
section = "section_b";
Pairs = (
{
Name = "Ram";
Mark = "30";
},
{
Name = "Latha";
Mark = "45";
},
{
Name = "Mohan";
Mark = "100";
},
{
Name = "David";
Mark = "76";
} );
} );
Id = 2;
};
Absent :
{
DAYS = "Four days";
};
},
} );
} );
I am new to Tcl scripting. I have a text file in the above format. Using a Tcl script, I have to store strings (section, Name, Mark and Absent days) in different variables so that they can be written to a CSV file.
I tried the code below to search for the word Key:
set search "Key"
set file [open "Marks.txt" r]
while {[gets $file data] != -1} {
    if {[string match *[string toupper $search]* [string toupper $data]]} {
        puts "Found '$search' in the line '$data'"
    } else {
        puts "does not match"
    }
}
It finds the word Key and prints a message for every line that contains it. But here I want to find the word Loop first and then search for the word Key inside the loop. When it sees the word Key, it has to copy some of the strings inside that block into variables. The word Key is repeated multiple times in the file, and each Key is followed by a {...} block with content. The script has to read that content and store it in variables.
For example, the script needs to find the word Key in the text file and then look for section; if present, its value needs to be stored in the variable temp1 (e.g. temp1=section_a). Likewise:
If it sees Ram, the Mark on the line below needs to be stored in temp2 (e.g. temp2=80).
If it sees Latha, the Mark on the line below needs to be stored in temp3 (e.g. temp3=70).
Then it should find Id and store its value (1) in temp4 (e.g. temp4=1).
Then DAYS needs to be stored in temp5 (e.g. temp5=Two days).
These temp values need to be written to the CSV file every time the word Key is found in the text file, in the format below:
section Ram Latha id Days
Section_a 80 70 1 Two days
Section_b 30 45 2 Four days
Can you help me write the Tcl script to do this? Thank you.
This is the kind of thing that awk is really good for:
# Note: gensub() is a GNU awk (gawk) extension
awk '
    function quoted(string, a) {
        split(string, a, /"/)
        return a[2]
    }
    BEGIN { OFS="\t"; print "section", "Ram", "Latha", "id", "Days" }
    $1 == "section" {section = quoted($0); delete marks}
    $1 == "Name"    {name = quoted($0)}
    $1 == "Mark"    {marks[name] = quoted($0)}
    $1 == "Id"      {id = gensub(/;/, "", 1, $3)}
    $1 == "DAYS"    {print section, marks["Ram"], marks["Latha"], id, quoted($0)}
' file
Translating that as a Tcl "one-liner", I'd write
echo '
proc trim {str} {string trim $str {"}}
puts "section\tRam\tLatha\tid\tdays"
set fh [open [lindex $argv end]]
while {[gets $fh line] != -1} {
    if {[regexp -- {(\S+) = ("[^"]+"|\d+);} $line -> key value]} {
        switch -exact -- $key {
            section {set section [trim $value]; array set marks {}}
            Name    {set name [trim $value]}
            Mark    {set marks($name) [trim $value]}
            Id      {set id $value}
            DAYS    {puts [join [list $section $marks(Ram) $marks(Latha) $id [trim $value]] \t]}
        }
    }
}
' | tclsh - file
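Both answers print tab-separated output, while the question asks for a CSV file. A sketch of the same Tcl regexp-and-switch parse that produces comma-separated rows instead, under assumptions: csv_field and marks_to_csv are hypothetical helpers, the input is a trimmed excerpt of the question's file held in a string, and marks.csv is a hypothetical output file name.

```tcl
# Hypothetical helper: quote a CSV field only when it contains a comma or a double quote
proc csv_field {s} {
    if {[string match {*[,"]*} $s]} {
        return "\"[string map {\" \"\"} $s]\""
    }
    return $s
}

# Same regexp and switch as the answer above, but rows are collected
# as comma-separated lines instead of being printed with tabs
proc marks_to_csv {text} {
    set rows [list [join {section Ram Latha id Days} ,]]
    foreach line [split $text \n] {
        if {![regexp -- {(\S+) = ("[^"]+"|\d+);} $line -> key value]} continue
        set value [string trim $value {"}]
        switch -exact -- $key {
            section {set section $value; array unset marks}
            Name    {set name $value}
            Mark    {set marks($name) $value}
            Id      {set id $value}
            DAYS    {
                set row {}
                foreach field [list $section $marks(Ram) $marks(Latha) $id $value] {
                    lappend row [csv_field $field]
                }
                lappend rows [join $row ,]
            }
        }
    }
    return [join $rows \n]
}

# Trimmed excerpt of the question's input: only the lines the parser uses
set text {
section = "section_a";
Name = "Ram";
Mark = "80";
Name = "Latha";
Mark = "70";
Id = 1;
DAYS = "Two days";
section = "section_b";
Name = "Ram";
Mark = "30";
Name = "Latha";
Mark = "45";
Id = 2;
DAYS = "Four days";
}
set csv [marks_to_csv $text]

# "marks.csv" is a hypothetical output file name
set fh [open marks.csv w]
puts $fh $csv
close $fh
```

Lines that do not match the key = value; pattern (braces, Key :, Pairs = ( and so on) are simply skipped, so the full file from the question can be passed in unchanged.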

How to write back the variables in a tcl readable format to the file intelligently using tcl in the same format

This code has a class named class1 with the methods set, get, load and save:
# Class1 definition
oo::class create class1 {
    # filePath must be declared too, otherwise it is a local variable of save
    variable dataArr filePath
    method set {key value} {
        set dataArr($key) $value
    }
    method get {key} {
        if {[info exists dataArr($key)]} {
            return $dataArr($key)
        }
    }
    method load {} {
        set fp [open /home/karthikc/data.tcl r]
        set file_data [read $fp]
        puts $file_data
        eval $file_data
        close $fp
    }
    method save {{newFilePath ""}} {
        if {[info exists filePath]} {
            set tmpFP $filePath
        }
        if {$newFilePath ne ""} {
            set tmpFP $newFilePath
        }
        if {![info exists tmpFP]} {
            puts "neither newFilePath argument is passed nor filePath variable is present"
            return 0
        }
        try {
            set fhandle [open $tmpFP w]
            if {![info exists dataArr]} {
                puts "dataArr variable doesn't exist in the object [self]"
                return 0
            }
            foreach key [array names dataArr] {
                set kvPair [list $key $dataArr($key)]
                lappend dataLst $kvPair
                puts $fhandle "my set $key $dataArr($key)"
                puts "my set $key $dataArr($key)"
            }
            set filePath $tmpFP
            puts "dictionary is successfully saved in the file path"
        } on error {result opts} {
            puts $result
            puts "Return options dictionary"
            puts $opts
            return 0
        } finally {
            if {[info exists fhandle]} {
                close $fhandle
            }
        }
        return 1
    }
}
Which I use like this:
# create object instance
set obj [class1 new]
# call load method
$obj load
# call save method
$obj save /home/karthikc/data.tcl
My data.tcl is:
my set key1 value1
my set key2 value2
my set key3 [list valueA valueB valueC]
my set key4 [list valueX [list valueY valueZ]]
I want to write the data back in the same format, or as some other list of lists.
Suggestions for improvement
You can simplify your serializer, and make it more robust along the way.
First, don't serialize the object's state into a script; serialize it as a literal map (an associative array or a dict in Tcl), and read it back as such.
data.tcl could look like:
key3 {valueA valueB valueC}
key4 {valueX {valueY valueZ}}
key1 value1
key2 value2
Your load method could use array set to read this directly:
method load {} {
    set fp [open /tmp/data.tcl r]
    set file_data [read $fp]
    array set dataArr $file_data
    close $fp
}
Your save method can directly use array get and produce formatted output:
method save2 {{newFilePath ""}} {
    if {[array exists dataArr]} {
        set fhandle [open $newFilePath w]
        set out ""
        foreach {k v} [array get dataArr] {
            # [list $k $v] quotes both key and value, so keys with spaces survive too
            append out [list $k $v] \n
        }
        puts $fhandle $out
        close $fhandle
    }
}
The key idea is to avoid eval and, therefore, code injection. The serialization format also maps 1:1 onto first-class Tcl data structures.
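To see why this format round-trips safely, here is a minimal sketch outside the class, assuming plain arrays and an in-memory string in place of the file: each pair is written as [list $k $v], and array set reads the whole text back, with nested lists and values containing spaces intact and without any eval.

```tcl
# Values taken from the question, plus one value containing spaces
array set dataArr {}
set dataArr(key1) value1
set dataArr(key3) [list valueA valueB valueC]
set dataArr(key4) [list valueX [list valueY valueZ]]
set dataArr(key5) "a value with spaces"

# Serialize: one properly quoted key/value list per line (the save2 format)
set out ""
foreach {k v} [array get dataArr] {
    append out [list $k $v] \n
}

# Deserialize: the whole text is itself a flat key/value list (the load format)
array set loaded $out
puts [lindex $loaded(key4) 1 0]   ;# prints valueY
```

Because newlines are just list separators, the file can be read back in one array set call no matter how many pairs it holds.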
Improving your question
Allow me to say that your question is not a proper question. It does not state a problem; one has to read between the lines and snippets to sense what you are after. Also, the code example should be reduced to the minimum needed to demonstrate your perceived problem. It is not helpful to paste your entire code.
I changed the format of storing the lists
from
my set key1 value1
my set key2 value2
my set key3 [list valueA valueB valueC]
my set key4 [list valueX [list valueY valueZ]]
to
my set key1 value1
my set key2 value2
my set key3 {valueA valueB valueC}
my set key4 {valueX {valueY valueZ}}
and in the save method I changed
puts $fhandle "my set $key $dataArr($key)"
to
puts $fhandle [list my set $key $dataArr($key)]
These changes serve the purpose.
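This fix works because [list my set $key $value] builds a command in which the value is always a single, correctly quoted word. A minimal sketch under assumptions: the class name store and the methods dump and restore are hypothetical, and the saved script is kept in a string instead of a file. It still evaluates the saved text, so it is only appropriate for trusted files.

```tcl
oo::class create store {
    variable dataArr
    # Inside method bodies, plain "set" is still Tcl's built-in set command
    method set {key value} {
        set dataArr($key) $value
    }
    method get {key} {
        return $dataArr($key)
    }
    # One "my set key value" command per entry, quoted with list
    method dump {} {
        set out ""
        foreach key [lsort [array names dataArr]] {
            append out [list my set $key $dataArr($key)] \n
        }
        return $out
    }
    # Evaluate saved commands in this object's context, so "my" resolves
    method restore {script} {
        eval $script
    }
}

set a [store new]
$a set key3 [list valueA valueB valueC]
$a set key4 [list valueX [list valueY valueZ]]
set saved [$a dump]
puts $saved

set b [store new]
$b restore $saved
puts [$b get key4]
```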

Tcl: how to print one set

My file to be parsed looks like this:
Name : John
Pin : 5400
Age : 40
Place: Korea
Amount : 4000
Name : Peter
Pin : 6700
Age : 10
Place : Japan
Amount : 3600
My Tcl code is:
set start "Name"
set pn "Pin"
set ag "Age"
set ag_cutoff 15
set amnt "Amount"
set count 0
set rep_taken 0
foreach line [split $content "\n"] {
    if {[regexp $start $line]} {
        set count 1
        set rep_taken 0
        set l1 $line
    }
    if {[regexp $pn $line] && $count == 1} {
        set pin_val [lindex $line 2]
        set l2 $line
    }
    if {[regexp $ag $line] && $count == 1} {
        # keep the age in its own variable so the pattern in $ag is not overwritten
        set age_val [lindex $line 2]
        if {$age_val > $ag_cutoff} {
            set rep_taken 1
            set l3 $line
        }
    }
    if {[regexp $amnt $line] && $count == 1 && $rep_taken == 1} {
        set amount_val [lindex $line 2]
        puts $op1 "$age_val $amount_val"
        puts $op2 "$l1\n$l2\n$l3\n"
    }
}
This code is fine for plots.
However, I also want to output a file with the complete record for every entry where the age is greater than $ag_cutoff.
Now, with puts $op3 "$l1\n$l2\n$l3\n", I am able to print to a file. But how do I print the Place line, which is never evaluated? Is there a better way to accomplish this? For John, the output should be:
Name : John
Pin : 5400
Age : 40
Place : Korea
Amount : 4000
It would be a lot simpler to let the parsing loop just create a dictionary (this replaces your code above):
set data {}
set count 0
foreach line [split $content \n] {
    if {[lindex $line 0] eq "Name"} {
        incr count
    }
    dict set data $count [lindex $line 0] [lindex $line 2]
}
This will blow up if the first line doesn't start with "Name", or if there is a missing blank between a colon and a word, and also if a value consists of several words. All of these are easy to fix.
Here, for instance, is an expanded version that takes care of the last two problems, should they occur:
set data {}
set count 0
foreach line [split $content \n] {
    set keyword [string trimright [lindex $line 0] :]
    set value [string trimleft [lrange $line 1 end] {: }]
    if {$keyword eq "Name"} {
        incr count
    }
    dict set data $count $keyword $value
}
When all records are stored, one can output selected records using dictionary iteration:
set ag_cutoff 15
dict for {count record} $data {
    if {[dict get $record Age] > $ag_cutoff} {
        dict for {k v} $record {
            puts "$k : $v"
        }
    }
}
This also means that you can keep adding fields to the records, and the code will still work without change.
Precautions
If the data in content has empty lines at the beginning or end, or between some lines, these methods won't work. A simple way to guard against empty or blank lines at the beginning or the end is to replace
foreach line [split $content \n] {
with
foreach line [split [string trim $content] \n] {
If empty / blank lines may occur within the data, one can use this to skip them:
foreach line [split $content \n] {
if {[string is space $line]} continue
If one is 100% sure that all data is in proper list form, it is possible (but a bit code-smelly) to use list commands like lindex on it directly. If one is less sure, or if one wants to be more correct, one should convert each line to a list before working on it:
foreach line [split $content \n] {
set line [split $line]
Documentation: dict, foreach, if, incr, lindex, lrange, puts, set, split, string
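Putting these pieces together for the original goal (writing the complete record, including Place, to a separate file), here is a sketch assuming the sample data is inlined as content and that selected.txt is a hypothetical output file name:

```tcl
# Sample input from the question; note "Place: Korea" has no blank before the colon
set content {Name : John
Pin : 5400
Age : 40
Place: Korea
Amount : 4000
Name : Peter
Pin : 6700
Age : 10
Place : Japan
Amount : 3600}

# Parse into a dict of records (expanded version above, plus the blank-line guards)
set data {}
set count 0
foreach line [split [string trim $content] \n] {
    if {[string is space $line]} continue
    set keyword [string trimright [lindex $line 0] :]
    set value [string trimleft [lrange $line 1 end] {: }]
    if {$keyword eq "Name"} {
        incr count
    }
    dict set data $count $keyword $value
}

# Collect every field of each record whose Age exceeds the cutoff
set ag_cutoff 15
set selected ""
dict for {count record} $data {
    if {[dict get $record Age] > $ag_cutoff} {
        dict for {k v} $record {
            append selected "$k : $v\n"
        }
        append selected \n
    }
}

# "selected.txt" is a hypothetical output file name
set op3 [open selected.txt w]
puts -nonewline $op3 $selected
close $op3
```

Dicts keep keys in insertion order, so the fields come out in the same order as in the input file.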

How to get selective data from a file in TCL?

I am trying to parse selected data from a file based on certain keywords using Tcl. For example, I have a file like this:
...
...
..
...
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
...
...
...
How do I capture only the 30, 90 and 214 between "data_start" and "data_end"? What I have so far (Tcl newbie):
proc get_data_value {data_file} {
    set fp [open $data_file r]
    # gets reads from the channel, so there is no need to read the whole file first
    while {[gets $fp line] >= 0} {
        if {[string match "data_start" $line]} {
            # Capture only the first number?
            # Use regex? or something else?
            if {[string match "data_end" $line]} {
                break
            } else {
                ## Do nothing?
            }
        }
    }
    close $fp
}
If your file is small, you can use the read command to slurp the whole file into a variable and then apply regexp to extract the required information.
input.txt
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
data_start
130 abc1 xyz
190 abc2 xyz
1214 abc3 xyz
data_end
extractNumbers.tcl
set fp [open input.txt r]
set data [read $fp]
close $fp
set result [regexp -inline -all {data_start.*?\n(\d+).*?\n(\d+).*?\n(\d+).*?data_end} $data]
foreach {whole_match number1 number2 number3} $result {
    puts "$number1, $number2, $number3"
}
Output :
30, 90, 214
130, 190, 1214
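The regexp above expects exactly three lines per block. A sketch of a variation that does not depend on the line count, assuming the data is inlined as a string: capture each whole block with a single non-greedy group, then take the leading number from every captured line.

```tcl
# Same content as input.txt above, inlined for the example
set data "data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
data_start
130 abc1 xyz
190 abc2 xyz
1214 abc3 xyz
data_end
"

set nums {}
# -all -inline returns {whole submatch whole submatch ...}
foreach {whole block} [regexp -all -inline {data_start\n(.*?)\ndata_end} $data] {
    foreach line [split $block \n] {
        if {[regexp {^(\d+)} $line -> num]} {
            lappend nums $num
        }
    }
}
puts $nums
```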
Update:
Reading a large file's content into a single variable can crash the program, depending on the memory of your PC. When I tried to read an 890 MB file with the read command on a Windows 7 laptop with 8 GB of RAM, I got the error message "unable to realloc 531631112 bytes" and tclsh crashed. After some benchmarking, I found that it was able to read a file of 500,015,901 bytes, but the program then consumes about 500 MB of memory, since it has to hold the data.
Also, holding this much data in one variable is not efficient when extracting information via regexp. In such cases, it is better to read the content line by line.
Load all the data from the file into a variable. Find the start and end tokens with string first and work between those positions. Process the item line by line. A Tcl list is a string whose elements are separated by white space, so we can process the items in each line with foreach {a b c} $line {...}.
tcl:
set data {...
...
..
...
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
...
...
...}
set i 0
set start_str "data_start"
set start_len [string length $start_str]
set end_str "data_end"
set end_len [string length $end_str]
while {[set start [string first $start_str $data $i]] != -1} {
    set start [expr {$start + $start_len}]
    set end [string first $end_str $data $start]
    # Stop if a data_start has no matching data_end (avoids an endless loop)
    if {$end == -1} break
    set end [expr {$end - 1}]
    set item [string range $data $start $end]
    set lines [split $item "\n"]
    foreach {line} $lines {
        foreach {a b c} $line {
            puts "a=$a, b=$b, c=$c"
        }
    }
    set i [expr {$end + $end_len}]
}
output:
a=30, b=abc1, c=xyz
a=90, b=abc2, c=xyz
a=214, b=abc3, c=xyz
I'd write that as
set fid [open $data_file]
set p 0
set nums {}
while {[gets $fid line] != -1} {
    switch -regexp -- $line {
        {^data_end} {set p 0}
        {^data_start} {set p 1}
        default {
            if {$p && [regexp {^(\d+)\M} $line -> num]} {
                lappend nums $num
            }
        }
    }
}
close $fid
puts $nums
or, even
set nums [exec sed -rn {/data_start/,/data_end/ {/^([[:digit:]]+).*/ s//\1/p}} $data_file]
puts $nums
My favorite method would be to declare procs for each of the acceptable tokens and utilize the unknown mechanism to quietly ignore the unacceptable ones.
proc 30 args {
... handle 30 $args
}
proc 90 args {
... process 90 $args
}
rename unknown original_unknown
proc unknown args {
# This space was deliberately left blank
}
source datafile.txt
rename original_unknown unknown
You'll be using Tcl's built-in parsing, which should be considerably faster. It also looks better in my opinion.
You can also put the line-handling logic into your unknown-procedure entirely:
rename unknown original_unknown
proc unknown {first args} {
process $first $args
}
source input.txt
rename original_unknown unknown
Either way, the trick is that Tcl's own parser (implemented in C) will be breaking up the input lines into tokens for you -- so you don't have to implement the parsing in Tcl yourself.
This does not always work -- if, for example, the input is using multi-line syntax (without { and }) or if the tokens are separated with something other than white space. But in your case it should do nicely.
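A variation of the same trick that avoids renaming unknown in the main interpreter is to evaluate the data in a separate safe interpreter and alias its unknown command to a handler in the main one. This is a sketch under assumptions, not the answerer's code: the data is inlined as a string, and collect is a hypothetical handler that only keeps leading integers between data_start and data_end.

```tcl
# Sample data from the question
set data {...
...
..
...
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
...}

set inside 0
set nums {}
# Hypothetical handler: receives the words of every line that is not a known command
proc collect {first args} {
    global inside nums
    switch -- $first {
        data_start {set inside 1}
        data_end   {set inside 0}
        default {
            if {$inside && [string is integer -strict $first]} {
                lappend nums $first
            }
        }
    }
}

# Tcl's own parser splits each line into words inside the child interpreter
set child [interp create -safe]
interp alias $child unknown {} collect
interp eval $child $data
interp delete $child
puts $nums
```

The same limitations as above still apply: a line whose first word happens to be a real Tcl command in the child, or one containing [ ] or $, will be interpreted rather than collected, although the safe interpreter limits what such a line can do.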