Looking for a search string in a file and using those lines for processing in TCL - tcl

To be more precise:
I need to be looking into a file abc.txt which has contents something like this:
files/f1/atmp.c 98 100
files/f1/atmp1.c 89 100
files/f1/atmp2.c !! 75 100
files/f2/btmp.c 92 100
files/f2/btmp2.c !! 85 100
files/f3/xtmp.c 92 100
The script needs to find "!!" and use those lines to print out the following as output:
atmp2.c 75
btmp2.c 85
Any help?

this should do the trick.
set data {files/f1/atmp.c 98 100
files/f1/atmp1.c 89 100
files/f1/atmp2.c !! 75 100
files/f2/btmp.c 92 100
files/f2/btmp2.c !! 85 100
files/f3/xtmp.c 92 100}
set lines [split $data \n]
foreach line $lines {
set match [regexp {(\S+)\s+!!\s+(\d+)} $line -> file num]
if {$match} {puts "$file $num"}
}
Although regexp has a -all switch I don't think we can use it here as we only get the last match vars with -all

If your file isn't huge, you can slurp the whole thing into memory, split the lines into a TCL list, and then iterate through the list looking for a match. For example:
set fh [open foo]
set lines [read $fh]
close $fh
set lines [split $lines "\n"]
foreach line $lines {
if { [regexp {.*/(\S+\.c)\s*!!\s*(\d+)} $line match file data] } {
puts "$file $data"
}
}
This will successfully return just the lines with "!!" in them. With your posted corpus, the results are:
atmp2.c 75
btmp2.c 85

I might be tempted in this case to exec to awk:
set output [exec awk {$2 == "!!" {print $1, $3}} abc.txt]
puts $output

The trick is to combine the code that reads lines from the file with a regular expression that detects matching lines and extracts the relevant parts (a one-step process with regexp). The only tricky part is working out what exactly to use as the regular expression, so that you get exactly what you want. I'm going to guess that you're after the parts of the filenames after the /, that those filenames won't contain spaces, and that the number you're after is the entirety of the first digit sequence after the double exclamation. (Other formats are possible, some of which are easier to extract with other tools such as scan.) That would give us something like this:
set f [open abc.txt]
while {[gets $f line] >= 0} {
if {[regexp {([^\s/]+)\s+!!\s+(\d+)} $line -> name value]} {
# Or do whatever you want with these
puts "$name $value"
}
}
close $f
(The gets command with two arguments returns the length of line read, or -1 on failure. For normal files the only failure mode is EOF, so we can just terminate the loop when we get a negative value. Other kinds of channels can be more complex…)

Related

how to read and and perform calculations on fix point value with tcl

I would like to read this file below with tcl:
BEGIN
%Time (real) HG (real)
!Time HG
-0.000110400001 0.6
-0.000110399901 0.6
-0.000110399801 0.6
-0.000110399701 0.6
-0.000110399601 0.55
-0.000110399501 0.5
-0.000110399401 0.45
-0.000110399301 0.4
-0.000110399201 0.45
-0.000110399101 0.5
-0.000110399001 0.55
-0.000110398901 0.6
For each Time column, i would like to increment by +0.000110400001 and write this result in new file. i would like other column doesn't be modified and copy as such.
I began to coding (see below), i can open and read the value but I don't how to convert string in fix point and make addition on this. If anyone help me that would be nice.
set inVector [lindex $argv 0]
puts "input vector : $inVector"
set filename "resultat.mdf"
set fileId [open $filename "w"]
set PROCESSING_FILE [open "$inVector" r]
while {[eof $PROCESSING_FILE]==0} {
set string [gets $PROCESSING_FILE]
if {[string index $string 3] != "B"} {
if {[string index $string 3] != "%"} {
if {[string index $string 3] != "!"} {
foreach line $string {
puts "input value : $line"
}
} else {
puts $fileId $string
}
} else {
puts $fileId $string
}
} else {
puts $fileId $string
}
}
close $PROCESSING_FILE
close $fileId
For lines with digits on, you could probably read them like this:
scan $string "%f %f" time hg
If that returns 2 (for two fields processed) you've successfully read two (floating point) numbers from that line. Otherwise, the line is something else. This leads to code like this (with some standard line-by-line idioms that must've been written up already in some other question):
# Skipping all the code for opening files
# While we successfully read a line from the input file
while {[gets $PROCESSING_FILE line] >= 0} {
# Attempt to parse two floats (with at least one whitespace between) out of line
if {[scan $line "%f %f" time hg] == 2} {
# Action to take on a matched line
puts "input line: '$line' time:$time HG:$hg"
} else {
# Action to take on an unmatched line
puts $fileId $line
}
}
# Skipping the code for closing files
For files with truly fixed width fields, you use string range to pick out pieces of the line and then attempt to parse those (or you write a messy regular expression and use regexp). Those tend to need more tuning to the data.

Manipulating file in tcl language

First time poster and new to TCL so please pardon my knowledge.
I've found a few examples on stackoverflow and with that help created a script.
I need to modify few lines of a file, I've tried the following (see code). I can seem to add the line of interest but it does not write it in the correct location e.g. if I want to replace line 3 it adds line after line 3
and moreover deletes subsequent lines if there is more than one line operation.
Lastly could some one kindly suggest the best way to identify the line of interest with name rather than line number. Name is always in the form Filter.HpOrd_n =
where n is 0...k
Data in info.dat
AA
BB
Filter.HpOrd_1 = 2
Filter.HpOrd_2 = 2
Filter.HpOrd_3 = 0.1
Filter.HpOrd_4 = 0.2
CC
DD
EE
FF
Code:
set fd [open "info.dat" r+]
set i 0
while { [gets $fd line] != -1 } {
set line [split $line "\n"]
incr i
if {$i == 3} {
set nLine [lreplace $line 0 0 Filter.LoPass]
puts $fd [join $nLine "\n"]
}
if {$i == 6} {
set nLine [lreplace $line 0 0 Filter.Butterworth]
puts $fd [join $nLine "\n"]
}
}
close $fd
With plain Tcl:
# the input and output file handles
set fin [open info.dat r]
set fout [file tempfile fname]
# process the file
while {[gets $fin line] != -1} {
puts $fout [string map {
"Filter.HpOrd_1" "Filter.LoPass"
"Filter.HpOrd_4" "Filter.Butterworth"
} $line]
}
close $fin
close $fout
# backup the original and overwrite it
file link -hard info.dat.bak info.dat
file rename -force -- $fname info.dat
TCL is just a meta language and set fd [open "info.dat" r+] is related to general file descriptor handling. If you open a file descriptor "r+" you can read and write to that file descriptor, but one file descriptor always points to one point in a file.
With "r+" your file descriptor initially points to the start of the file. Then you gets $fd line a line from the file, so $fd points to the start of the second line afterwards. Now you puts $fs [join $nline "\n"] blindly overwriting from the start of the second line and so on.
Generally you cannot replace lines in one file, but you will write a second file and move that after you closed both files. You can overwrite with seek, but you overwrite from a point in the file. So what you put should always have the same size, of you have read before.
Plain files (in basically all programming languages) are byte/character oriented rather than line oriented. This means 1) that you need to use a seek operation to get back to the beginning of the line you want to overwrite, and 2) unless the new line is exactly the same length as the old one, you will experience stub lines around it.
You have other problems as well. set line [split $line "\n"] doesn't do anything: you've just read line from gets, so it's guaranteed not to have any newlines in it. [join $nLine "\n"] doesn't do what you probably think it does: it will replace any sequences of whitespace in $line with single newlines, but it will not place any newline at the end of the string.
Unless your files are insanely large, I recommend something like this:
Replace by line number
proc lineReplace args {
set lines [split [lindex $args end] \n]
foreach {n line} [lrange $args 0 end-1] {
set index [incr n -1]
if {$index > 0} {
lset lines $index $line
}
}
join $lines \n
}
package require fileutil
fileutil::updateInPlace info.dat {
lineReplace
3 Filter.LoPass
6 Filter.Butterworth
}
In the "front end" you only specify the command to use and thereafter pairs of line number / new line text.
In the "back end" (the lineReplace command) the parameter args will contain those number / line pairs and at the end, as a single item, the complete contents of the file. The file contents are then split into a list of lines, and for every number / line pair you replace one of the items in that list. Finally, the list of lines are joined back into a string with newlines between each line. This string is returned by lineReplace to fileutil::updateInPlace, which replaces the old contents in the file with the returned string.
Replace by name
proc lineReplaceByName args {
set lines [split [lindex $args end] \n]
foreach {name line} [lrange $args 0 end-1] {
set index [lsearch $lines $name*]
if {$index > 0} {
lset lines $index $line
}
}
join $lines \n
}
fileutil::updateInPlace info.dat {
lineReplaceByName
Filter.HpOrd_1 Filter.LoPass
Filter.HpOrd_4 Filter.Butterworth
}
In this case the "back end" calculates the line number by searching for the given name at the beginning of each line. If the name isn't found, the replacement operation is skipped. Otherwise it's the same as before.
Replacing just the name
If you don't want to replace the complete line, but just the name part of it, some changes are necessary. If you are 100% sure that 1) the name never has any whitespace in it, and 2) there is always whitespace between the name and the =, you can just replace lset lines $index $line with lset lines $index 0 $line. If you want to play it safer, you can replace the line with
lset lines $index [regsub {.+(?=\s*=\s*)} [lindex $lines $index] $line]
which uses a regular expression to find the character region that precedes the = character (optionally with whitespace around it) and then replaces that with the text you provided.
The fileutil package is a part of the Tcllib companion library to Tcl.
Documentation: fileutil package, foreach, if, incr, join, lindex, lrange, lsearch, lset, package, proc, regsub, seek, set, split

How to get selective data from a file in TCL?

I am trying to parse selective data from a file based on certain key words using tcl,for example I have a file like this
...
...
..
...
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
...
...
...
How do I catch only the 30, 90 and 214 between "data_start" and "data_end"? What I have so far(tcl newbie),
proc get_data_value{ data_file } {
set lindex 0
set fp [open $data_file r]
set filecontent [read $fp]
while {[gets $filecontent line] >= 0} {
if { [string match "data_start" ]} {
#Capture only the first number?
#Use regex? or something else?
if { [string match "data_end" ] } {
break
} else {
##Do Nothing?
}
}
}
close $fp
}
If your file is smaller in size, then you can use read command to slurp the whole data into a variable and then apply regexp to extract the required information.
input.txt
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
data_start
130 abc1 xyz
190 abc2 xyz
1214 abc3 xyz
data_end
extractNumbers.tcl
set fp [open input.txt r]
set data [read $fp]
close $fp
set result [regexp -inline -all {data_start.*?\n(\d+).*?\n(\d+).*?\n(\d+).*?data_end} $data]
foreach {whole_match number1 number2 number3} $result {
puts "$number1, $number2, $number3"
}
Output :
30, 90, 214
130, 190, 1214
Update :
Reading a larger file's content into a single variable will cause the program to crash depends on the memory of your PC. When I tried to read a file of size 890MB with read command in a Win7 8GB RAM laptop, I got unable to realloc 531631112 bytes error message and tclsh crashed. After some bench-marking found that it is able to read a file with a size of 500,015,901 bytes. But the program will consume 500MB of memory since it has to hold the data.
Also, having a variable to hold this much data is not efficient when it comes to extracting the information via regexp. So, in such cases, it is better to go ahead with read the content line by line.
Read more about this here.
Load all the data from the file into a variable. Set start and end tokens and seek to those positions. Process the item line by line. Tcl uses lists of strings separated by white space so we can process the items in the line with foreach {a b c} $line {...}.
tcl:
set data {...
...
..
...
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
...
...
...}
set i 0
set start_str "data_start"
set start_len [string length $start_str]
set end_str "data_end"
set end_len [string length $end_str]
while {[set start [string first $start_str $data $i]] != -1} {
set start [expr $start + $start_len]
set end [string first $end_str $data $start]
set end [expr $end - 1]
set item [string range $data $start $end]
set lines [split $item "\n"]
foreach {line} $lines {
foreach {a b c} $line {
puts "a=$a, b=$b, c=$c"
}
}
set i [expr $end + $end_len]
}
output:
a=30, b=abc1, c=xyz
a=90, b=abc2, c=xyz
a=214, b=abc3, c=xyz
I'd write that as
set fid [open $data_file]
set p 0
while {[gets $fid line] != -1} {
switch -regexp -- $line {
{^data_end} {set p 0}
{^data_start} {set p 1}
default {
if {$p && [regexp {^(\d+)\M} $line -> num]} {
lappend nums $num
}
}
}
}
close $fid
puts $nums
or, even
set nums [exec sed -rn {/data_start/,/data_end/ {/^([[:digit:]]+).*/ s//\1/p}} $data_file]
puts $nums
My favorite method would be to declare procs for each of the acceptable tokens and utilize the unknown mechanism to quietly ignore the unacceptable ones.
proc 30 args {
... handle 30 $args
}
proc 90 args {
... process 90 $args
}
rename unknown original_unknown
proc unknown args {
# This space was deliberately left blank
}
source datafile.txt
rename original_unknown unknown
You'll be using Tcl's built-in parsing, which should be considerably faster. It also looks better in my opinion.
You can also put the line-handling logic into your unknown-procedure entirely:
rename unknown original_unknown
proc unknown {first args} {
process $first $args
}
source input.txt
rename original_unknown unknown
Either way, the trick is that Tcl's own parser (implemented in C) will be breaking up the input lines into tokens for you -- so you don't have to implement the parsing in Tcl yourself.
This does not always work -- if, for example, the input is using multi-line syntax (without { and }) or if the tokens are separated with something other than white space. But in your case it should do nicely.

How to manipulate each line in a file in TCL

I'm trying to write some data from iperf to a file using tcl script.The file has more than 100 lines. Now i need to parse the first 10 lines, neglect it and consider the next set of 10 lines and print it, again i need to neglect the next set of 10 lines and print the next 10 lines and keep continuing until i reach the end of file. How could i do it programmatic ally?
exec c:\\iperf_new\\iperf -c $REF_WLAN_IPAddr -f m -w 2M -i 1 -t $run_time > xx.txt
set fp [open "xx.txt" r ]
set file_data [read $fp]
set data [split $file_data "\n"]
foreach line $data {
if {[regexp {(MBytes) +([0-9\.]*)} $line match pre tput]==1 } {
puts "Throughput: $tput Mbps"
}
Well, as your example shows, you have found out how to split a (slurped) file into lines and process them one-by-one.
Now what's the problem with implementing "skip ten lines, process ten lines, skip another ten lines etc"? It's just about using a variable which counts lines seen so far plus selecting a branch of code based on its value. This approach has nothing special when it comes to Tcl: there are commands available to count, conditionally select branches of code and control looping.
If branching based on the current value of a line counter looks too lame, you could implement a state machine around that counter variable. But for this simple case it looks like over-engeneering.
Another approach would be to pick the necessary series of lines out of the list returned by split using lrange. This approach might use a nice property of lrange which can be told to return a sublist "since this index and until the end of the list", so the solution really boils down to:
set lines [split [read $fd] \n]
parse_header [lrange $lines 0 9]
puts [join [lrange $lines 10 19] \n]
parse_something_else [lrange 20 29]
puts [join [lrange $lines 30 end] \n]
For a small file this solution looks pretty compact and clean.
If I understood you correctly, you want to print lines 11-20, 31-40, 51-60,... The following will do what you want:
package require Tclx
set counter 0
for_file line xxx.txt {
if {$counter % 20 >= 10} { puts $line }
incr counter
}
The Tclx package provides a simple way to read lines from a file: the for_file command.

How to ensure my regular expression does not match too much

A file has few words with numbers in the begining of them. i want to extract a particular no line.when given 1, it extracts line 1 also with 11, 21
FILE.txt has contents:
1.sample
lines of
2.sentences
present in
...
...
10.the
11.file
when Executed pro 1 file.txt
gives results from line 1,10 and also from line 11
as these three results have 1 in their string. i.e
Output of the script:
1.sample
10.the
11.file
Expected output: the output which i am expecting
is only line 1 contents and not the line 10 or line 11 contents.
i.e
Expected output:
1.sample
My current code:
proc pro { pattern args} {
set file [open $args r]
set lnum 0
set occ 0
while {[gets $file line] >=0} {
incr lnum
if {[regexp $pattern $line]} {
incr occ
puts "The pattern is present in line: $lnum"
puts "$line"
} else {
puts "not found"
}
}
puts "total number of occurencese : $occ"
close $file
}
the program is working fine but the thing is i am retrieving lines that i dont want to along with the expected line. As the number (1) which i want to retrieve is present in the other strings such as 11, 21, 14 etc these lines are also getting printed.
kindly tolerate my unclear way of explaining the question.
You can solve the problem using word boundaries as suggested by glen but you can also consider the following things:
If after every line number there is a . then you can use it as delimiter in regular expression
regexp "^$lineNo\\." $a
I would also suggest to use ^ (match at the beginning of line) so that even if number is present in the line elsewhere it would not get counted.
tcl word boundaries are well explained at http://www.regular-expressions.info/wordboundaries.html
You have to ensure your pattern matches only between word boundaries:
if {[regexp "\\m$pattern\\M" $line]} { ...
See the documentation for regular expression syntax.
If what you're looking to do is as constrained as what you're describing, why not just use something like
if { [string range $line 0 [string length $pattern]] eq "${pattern}." } {
...
}