TCL: reading next line of a file once a certain pattern has been identified - tcl

I have this data
# Curve 0 of 2, 7 points
# x y xlow xhigh type
20.781 1 20.781 20.781
20.8102 10 20.8102 20.8102
20.8395 18 20.8395 20.8395
20.8687 13 20.8687 20.8687
20.898 15 20.898 20.898
20.9273 18 20.9273 20.9273
20.9565 13 20.9565 20.9565
# Curve 1 of 2, 7 points
# x y xlow xhigh type
21.635 2 21.635 21.635
21.6625 19 21.6625 21.6625
21.6899 29 21.6899 21.6899
21.7173 63 21.7173 21.7173
21.7447 137 21.7447 21.7447
21.7721 168 21.7721 21.7721
21.7996 109 21.7996 21.7996
All the information is an unique file, i.e block data are separated by three blank lines.
I want to collect the information that is just at the beginning of the next line that has characters # x y xlow xhigh type. Also I want to collect the information that is at the end of each block. In other words, I want to print on screen the values that are in bold letter (20.781 20.9565 21.635 21.7996).
I wrote these lines of code but I don't know how to print the info that is just below the characters # x y.
set input [open "dataHist.dat" r]
while { [gets $input line] != -1 } {
if { [string range 0 4] == "# x y"} {
}
}

Since the first line of each block tells you long it is, you can use that to tell which lines you want to extract the first number from:
#!/usr/bin/env tclsh
proc must_gets {ch var_} {
upvar $var_ var
if {[gets $ch var] < 0} {
error "Premature end of file"
}
}
proc extract_numbers {filename} {
set ch [open $filename]
try {
set nums {}
while {[gets $ch line] >= 0} {
if {[regexp {^# Curve \d+ of \d+, (\d+) points} $line -> nPoints]} {
must_gets $ch line ;# Discard '# x y ...' line.
must_gets $ch line ;# First point line
# Extract first element of it
lappend nums [lindex [split $line] 0]
# Read remaining point lines
for {set n 2} {$n <= $nPoints} {incr n} {
must_gets $ch line
}
# And extract first element of last one
lappend nums [lindex [split $line] 0]
}
}
return $nums
} finally {
chan close $ch
}
}
# 20.781 20.9565 21.635 21.7996
puts [extract_numbers dataHist.dat]

Related

how to split a file to list of lists TCL

I'm coding TCL and I would like to split a file into two lists of lists,
the file contain:
(1,2) (3,4) (5,6)
(7,8) (9,10) (11,12)
and I would like to get two list
one for each line, that contain lists that each one contain to two number
for example:
puts $list1 #-> {1 2} {3 4} {5 6}
puts [lindex $list1 0] #-> 1 2
puts [lindex $list2 2] #-> 11 12
I tried to use regexp and split but no success
The idea of using regexp is good, but you'll need to do some post-processing on its output.
# This is what you'd read from a file
set inputdata "(1,2) (3,4) (5,6)\n(7,8) (9,10) (11,12)\n"
foreach line [split $inputdata "\n"] {
# Skip empty lines.
# (I often put a comment format in my data files too; this is where I'd handle it.)
if {$line eq ""} continue
# Parse the line.
set bits [regexp -all -inline {\(\s*(\d+)\s*,\s*(\d+)\s*\)} $line]
# Example results of regexp:
# (1,2) 1 2 (3,4) 3 4 (5,6) 5 6
# Post-process to build the lists you really want
set list([incr idx]) [lmap {- a b} $bits {list $a $b}]
}
Note that this is building up an array; long experience says that calling variables list1, list2, …, when you're building them in a loop is a bad idea, and that an array should be used, effectively giving variables like list(1), list(2), …, as that yields a much lower bug rate.
An alternate approach is to use a simpler regexp and then have scan parse the results. This can be more effective when the numbers aren't just digit strings.
foreach line [split $inputdata "\n"] {
if {$line eq ""} continue
set bits [regexp -all -inline {\([^()]+\)} $line]
set list([incr idx]) [lmap substr $bits {scan $substr "(%d,%d)"}]
}
If you're not using Tcl 8.6, you won't have lmap yet. In that case you'd do something like this instead:
foreach line [split $inputdata "\n"] {
if {$line eq ""} continue
set bits [regexp -all -inline {\(\s*(\d+)\s*,\s*(\d+)\s*\)} $line]
set list([incr idx]) {}
foreach {- a b} $bits {
lappend list($idx) [list $a b]
}
}
foreach line [split $inputdata "\n"] {
if {$line eq ""} continue
set bits [regexp -all -inline {\([^()]+\)} $line]
set list([incr idx]) {}
foreach substr $bits {
lappend list($idx) [scan $substr "(%d,%d)"]
# In *very* old Tcl you'd need this:
# scan $substr "(%d,%d)" a b
# lappend list($idx) [list $a $b]
}
}
You have an answer already, but it can actually be done a little bit simpler (or at least without regexp, which is usually a good thing).
Like Donal, I'll assume this to be the text read from a file:
set lines "(1,2) (3,4) (5,6)\n(7,8) (9,10) (11,12)\n"
Clean it up a bit, removing the parentheses and any white space before and after the data:
% set lines [string map {( {} ) {}} [string trim $lines]]
1,2 3,4 5,6
7,8 9,10 11,12
One way to do it with good old-fashioned Tcl, resulting in a cluster of variables named lineN, where N is an integer 1, 2, 3...:
set idx 0
foreach lin [split $lines \n] {
set res {}
foreach li [split $lin] {
lappend res [split $li ,]
}
set line[incr idx] $res
}
A doubly iterative structure like this (a number of lines, each having a number of pairs of numbers separated by a single comma) is easy to process using one foreach within the other. The variable res is used for storing result lines as they are assembled. At the innermost level, the pairs are split and list-appended to the result. For each completed line, a variable is created to store the result: its name consists of the string "line" and an increasing index.
As Donal says, it's not a good idea to use clusters of variables. It's much better to collect them into an array (same code, except for how the result variable is named):
set idx 0
foreach lin [split $lines \n] {
set res {}
foreach li [split $lin] {
lappend res [split $li ,]
}
set line([incr idx]) $res
}
If you have the results in an array, you can use the parray utility command to list them in one fell swoop:
% parray line
line(1) = {1 2} {3 4} {5 6}
line(2) = {7 8} {9 10} {11 12}
(Note that this is printed output, not a function return value.)
You can get whole lines from this result:
% set line(1)
{1 2} {3 4} {5 6}
Or you can access pairs:
% lindex $line(1) 0
1 2
% lindex $line(2) 2
11 12
If you have the lmap command (or the replacement linked to below), you can simplify the solution somewhat (you don't need the res variable):
set idx 0
foreach lin [split $lines \n] {
set line([incr idx]) [lmap li [split $lin] {
split $li ,
}]
}
Still simpler is to let the result be a nested list:
set lineList [lmap lin [split $lines \n] {
lmap li [split $lin] {
split $li ,
}
}]
You can access parts of the result similar to above:
% lindex $lineList 0
{1 2} {3 4} {5 6}
% lindex $lineList 0 0
1 2
% lindex $lineList 1 2
11 12
Documentation:
array,
foreach,
incr,
lappend,
lindex,
lmap (for Tcl 8.5),
lmap,
parray,
set,
split,
string
The code works for windows :
TCL file code is :
proc captureImage {} {
#open the image config file.
set configFile [open "C:/main/image_config.txt" r]
#To retrive the values from the config file.
while {![eof $configFile]} {
set part [split [gets $configFile] "="]
set props([string trimright [lindex $part 0]]) [string trimleft [lindex $part 1]]
}
close $configFile
set time [clock format [clock seconds] -format %Y%m%d_%H%M%S]
set date [clock format [clock seconds] -format %Y%m%d]
#create the folder with the current date
set folderPath $props(folderPath)
append folderDate $folderPath "" $date "/"
set FolderCreation [file mkdir $folderDate]
while {0} {
if { [file exists $date] == 1} {
}
break
}
#camera selection to capture image.
set camera "video"
append cctv $camera "=" $props(cctv)
#set the image resolution (XxY).
set resolutionX $props(resolutionX)
set resolutionY $props(resolutionY)
append resolution $resolutionX "x" $resolutionY
#set the name to the save image
set imagePrefix $props(imagePrefix)
set imageFormat $props(imageFormat)
append filename $folderDate "" $imagePrefix "_" $time "." $imageFormat
set logPrefix "Image_log"
append logFile $folderDate "" $logPrefix "" $date ".txt"
#ffmpeg command to capture image in background
exec ffmpeg -f dshow -benchmark -i $cctv -s $resolution $filename >& $logFile &
after 3000
}
}
captureImage
thext file code is :
cctv=Integrated Webcam
resolutionX=1920
resolutionY=1080
imagePrefix=ImageCapture
imageFormat=jpg
folderPath=c:/test/
//camera=video=Integrated Webcam,Logitech HD Webcam C525
This code works for me me accept the code from text file were list of parameters are passed.

TCL:How to read each line of a file and split each column and save it as a variable

I wanted to read a file one line a time, split its columns and saved it in variables.
File looks like this:
%% cat Memory_minus.list
MEM_rf_2p_hce_1024x8_naxos1 RF dual_port 1024 8
MEM_rf_2p_hce_128x18_naxos1 RF dual_port 128 18
MEM_rf_2p_hce_16x128_fusion RF dual_port 16 128
MEM_rf_2p_hce_16x80_fusion4 RF dual_port 16 80
MEM_rf_2p_hce_256x18_naxos1 RF dual_port 256 18
MEM_rf_2p_hce_256x8_naxos1 RF dual_port 256 8
in a single foreach loop(one line) i wanted to have all 5 columns converted to variables, eg for first line each variable should return following.
puts $var1 ->>> MEM_rf_2p_hce_1024x8_naxos1
puts $var2 ->>> RF
puts $var3 ->>> dual_port
puts $var4 ->>> 1024
puts $var5 ->>> 8
I tried writing a code to read a file line by line and then extracting the items but it didn't worked.
set f [open Memory_minus.list]
while {1} {
set line [gets $f]
if {[eof $f]} {
close $f
break
}
#set base_name [exec awk {{print $1}} $line ]
foreach lt $line {
puts $lt
}
}
$lt return whole scrambled outputs.
set fp [open input.txt r]
while {[gets $fp line]!=-1} {
# To make sure that the line has five words
if {[llength $line]!=5} {
puts "Incorrect format in the line -> '$line'"
continue
}
foreach {var1 var2 var3 var4 var5} $line {
puts ->>>$var1;puts ->>>$var2;puts ->>>$var3;puts ->>>$var4;puts ->>>$var5;
}
}
close $fp
Output : (Showing only first line's output)
->>>MEM_rf_2p_hce_1024x8_naxos1
->>>RF
->>>dual_port
->>>1024
->>>8

How to get selective data from a file in TCL?

I am trying to parse selective data from a file based on certain key words using tcl,for example I have a file like this
...
...
..
...
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
...
...
...
How do I catch only the 30, 90 and 214 between "data_start" and "data_end"? What I have so far(tcl newbie),
proc get_data_value{ data_file } {
set lindex 0
set fp [open $data_file r]
set filecontent [read $fp]
while {[gets $filecontent line] >= 0} {
if { [string match "data_start" ]} {
#Capture only the first number?
#Use regex? or something else?
if { [string match "data_end" ] } {
break
} else {
##Do Nothing?
}
}
}
close $fp
}
If your file is smaller in size, then you can use read command to slurp the whole data into a variable and then apply regexp to extract the required information.
input.txt
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
data_start
130 abc1 xyz
190 abc2 xyz
1214 abc3 xyz
data_end
extractNumbers.tcl
set fp [open input.txt r]
set data [read $fp]
close $fp
set result [regexp -inline -all {data_start.*?\n(\d+).*?\n(\d+).*?\n(\d+).*?data_end} $data]
foreach {whole_match number1 number2 number3} $result {
puts "$number1, $number2, $number3"
}
Output :
30, 90, 214
130, 190, 1214
Update :
Reading a larger file's content into a single variable will cause the program to crash depends on the memory of your PC. When I tried to read a file of size 890MB with read command in a Win7 8GB RAM laptop, I got unable to realloc 531631112 bytes error message and tclsh crashed. After some bench-marking found that it is able to read a file with a size of 500,015,901 bytes. But the program will consume 500MB of memory since it has to hold the data.
Also, having a variable to hold this much data is not efficient when it comes to extracting the information via regexp. So, in such cases, it is better to go ahead with read the content line by line.
Read more about this here.
Load all the data from the file into a variable. Set start and end tokens and seek to those positions. Process the item line by line. Tcl uses lists of strings separated by white space so we can process the items in the line with foreach {a b c} $line {...}.
tcl:
set data {...
...
..
...
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
...
...
...}
set i 0
set start_str "data_start"
set start_len [string length $start_str]
set end_str "data_end"
set end_len [string length $end_str]
while {[set start [string first $start_str $data $i]] != -1} {
set start [expr $start + $start_len]
set end [string first $end_str $data $start]
set end [expr $end - 1]
set item [string range $data $start $end]
set lines [split $item "\n"]
foreach {line} $lines {
foreach {a b c} $line {
puts "a=$a, b=$b, c=$c"
}
}
set i [expr $end + $end_len]
}
output:
a=30, b=abc1, c=xyz
a=90, b=abc2, c=xyz
a=214, b=abc3, c=xyz
I'd write that as
set fid [open $data_file]
set p 0
while {[gets $fid line] != -1} {
switch -regexp -- $line {
{^data_end} {set p 0}
{^data_start} {set p 1}
default {
if {$p && [regexp {^(\d+)\M} $line -> num]} {
lappend nums $num
}
}
}
}
close $fid
puts $nums
or, even
set nums [exec sed -rn {/data_start/,/data_end/ {/^([[:digit:]]+).*/ s//\1/p}} $data_file]
puts $nums
My favorite method would be to declare procs for each of the acceptable tokens and utilize the unknown mechanism to quietly ignore the unacceptable ones.
proc 30 args {
... handle 30 $args
}
proc 90 args {
... process 90 $args
}
rename unknown original_unknown
proc unknown args {
# This space was deliberately left blank
}
source datafile.txt
rename original_unknown unknown
You'll be using Tcl's built-in parsing, which should be considerably faster. It also looks better in my opinion.
You can also put the line-handling logic into your unknown-procedure entirely:
rename unknown original_unknown
proc unknown {first args} {
process $first $args
}
source input.txt
rename original_unknown unknown
Either way, the trick is that Tcl's own parser (implemented in C) will be breaking up the input lines into tokens for you -- so you don't have to implement the parsing in Tcl yourself.
This does not always work -- if, for example, the input is using multi-line syntax (without { and }) or if the tokens are separated with something other than white space. But in your case it should do nicely.

How to read number count of words?

How to read number count of words?
Lines has this format:
vertices_count
X, Y
X, Y
X, Y
(X, Y pair can be in the same line)
for example:
3
12.5, 56.8
12.5, 56.8
12.5, 56.8
I would like to read vertices_count number of words(escaping comma):
So for above example reading words should be:
12.5 56.8 12.5 56.8 12.5 56.8
set fh [open f r]
gets $fh num
read $fh data
close $fh
set number_re {-?\d+(?:\.\d*)?|-?\d*\.\d+}
set vertices {}
foreach {_ x y} [regexp -inline -all "($number_re),\\s*($number_re)" $data] {
lappend vertices $x $y
if {[llength $vertices] == $num * 2} break
}
puts $vertices
# => 12.5 56.8 12.5 56.8 12.5 56.8
while {[llength $vertices] < $num * 2} {
gets $fh line
foreach {_ x y} [regexp -inline -all "($number_re),\\s*($number_re)" $line] {
lappend vertices $x $y
if {[llength $vertices] == $num * 2} break
}
}
close $fh
I'm still not clear exactly what you are after. Here is some code to read data from a named file. Judging from your other question, you can have several sets of data in your input stream and this code returns them all as a list. Each element of the list is one set of coordinates
# Read the input from file
set fil [open filename.file]
set input [read $fil]
close $fil
set data [list]; # No output so for
set seekCount yes; # Next token is a vertex count
foreach token [string map {, " "} $input] {
# Convert commas to spaces
if {$seekCount} {
set nCoords [expr $token * 2];
# Save number of coordinates
set datum [list]; # Clean out vertex buffer
} else {
lappend datum $token; # Save coordinate
incr nCoords -1
if {$nCoords <= 0} {
# That was the last coordinate
lappend data $datum; # Append the list of coordinates
set seekCount yes; # and look for anopther count
}
}
}
This is a very quick-and-dirty solution, which makes no attempt to handle errors. One thing, however that it will cope with is variable amounds of whitespace and missing whitespace after the commas.
Good luck, I hope this helps.
This procedure first reads a count line, then reads that number of lines and puts as a list into $varName. It returns the number of elements in $varName, or -1 if EOF occured before a count was read.
proc getNLines {stream varName} {
upvar 1 $varName lines
set lines {}
if {[gets $stream n] < 0} {
return -1
}
while {$n > 0} {
if {[gets $stream line] < 0} {
error "bad data format"
}
lappend lines $line
incr n -1
}
return [llength $lines]
}
while {[getNLines stdin lines] >= 0} {
# ...
}

TCL comparing two fields in a file and printing the line

I have two files.
File a
0 10 20 30
10 20 30 40
0 10 23 34
Values of File a are in (x1 y1 x2 y2 format)
File b
format is P M M x y -(some number)
P M M 10 20 -100
P M M 20 30 -150
P M M 50 60 -100
in File B, I want to search if field 4,5(starting from P as field 1) is in range in File a. If it is then dont print the line of FileB else print the line.
we will not print line of File b where
x1 < x < x2 & y1 < y < y2
so the O/P of the script should be
File c
P M M 50 60 -100
I have written the following script in tcl but my issue is that it is not searching for all the content of file b in File a.
set abc "b"
set ab "a"
set cord [open $ab "r"]
if [catch {open $abc r} FILE_R {
puts "failed to read $abc"
return -1
}
while { [gets $FILE_R line] >= 0 } {
if [regexp {^#} $line ] {
} else {
set x_cord [lindex $line 3]
set y_cord [lindex $line 4]
while { [gets $cord line] >= 0 } {
set x1_cord [lindex $line 0]
set y1_cord [lindex $line 1]
set x2_cord [lindex $line 2]
set y2_cord [lindex $line 3]
if { [expr x1_cord < x_cord && x_cord < x2_cord && y1_cord < y_cord && y_cord < y2_cord ] == 1 } {
} else {
puts $line
}
}
}
}
close $FILE_R
This line:
if { [expr x1_cord < x_cord && x_cord < x2_cord && y1_cord < y_cord && y_cord < y2_cord ] == 1 } {
is rather wrong in a number if respects. In particular, those variables are not being read from. You're also doing extra work by putting expr inside an if condition, as they already use the same syntax. Instead, use this:
if {$x1_cord < $x_cord && $x_cord < $x2_cord && $y1_cord < $y_cord && $y_cord < $y2_cord} {
You also seem to be using if {somecondition} {} else { somescript }; that's not especially inefficient, but it looks strange. Just negate the condition and do it like this: if {!(somecondition)} { somescript }
First open the file of coordinates and read each line until you have the maximum and minimum values for each of x1,y1,x2,y2. That will be 8 values.
Then open the second file and read each line once. Then split it and compare it with your limits.
The problem you have above is you open each file once but you are attempting to re-read the coord file each time you have a FILE_R line. But you never seek back to the start of the file. If you did, it would be hugely inefficient but might work. Get the limits first then process the second file.
It is also smarter to match lines with regexp but split them with scan. This way you are less likely to run into problems with things Tcl considers significant like open braces or semicolons or excessive numbers of spaces. For instance:
% scan "P M M 10 20 -100" {P M M %d %d %d} x y z
3
% list $x $y $z
10 20 -100
Notice we can check we got the right number of scanned parameters (3).
Here is my solution:
package require Tclx
# Read file a.txt, which contains x1 y1 x2 y2
# After reading, coords will be
# {{0 10 20 30} {10 20 30 40} {0 10 23 34}}
# which is a list of lists
set coords [split [read_file a.txt] \n]
# Read file b.txt, line by line
# where each line contains P M M x y -(some number)
for_file line b.txt {
if {[regexp {^#} $line]} { continue }; # skip comment
#lassign $line p m1 m2 x y n; # parse the line into proper fields
foreach {p m1 m2 x y n} $line {};
set inrange 0; # assume out of range, will flag it if found otherwise
foreach coord $coords {
#lassign $coord x1 y1 x2 y2; # parse the range
foreach {x1 y1 x2 y2} $coord {}
if {$x1 < $x && $x < $x2 && $y1 < $y && $y < $y2} {
set inrange 1; # we are in range, set the flag for not printing
}
}
# Print if not in any range
if {!$inrange} { puts $line }
}
Discussion
The commands read_file, for_file, and lassign comes from the Tclx package, which simplify life quite a bit. The lassign becomes a built-in command starting with Tcl version 8.5.
To answer the comment: you can use read_file as long as you use the Tclx package. It does not matter if you have Tcl 8.4, 8.5, or beyond.
The rest of the code is straight-forward, I hope you don't have trouble understanding.
Since user2095095 does not have access to Tclx and he/she is running Tcl 8.4, I have to replace lassign with the foreach trick.