How to get selective data from a file in TCL?

I am trying to parse selective data from a file based on certain keywords using Tcl. For example, I have a file like this:
...
...
..
...
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
...
...
...
How do I catch only the 30, 90 and 214 between "data_start" and "data_end"? What I have so far (Tcl newbie):
proc get_data_value { data_file } {
    set lindex 0
    set fp [open $data_file r]
    set filecontent [read $fp]
    foreach line [split $filecontent "\n"] {
        if { [string match "data_start" $line] } {
            #Capture only the first number?
            #Use regex? or something else?
            if { [string match "data_end" $line] } {
                break
            } else {
                ##Do Nothing?
            }
        }
    }
    close $fp
}

If your file is small, you can use the read command to slurp the whole file into a variable and then apply regexp to extract the required information.
input.txt
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
data_start
130 abc1 xyz
190 abc2 xyz
1214 abc3 xyz
data_end
extractNumbers.tcl
set fp [open input.txt r]
set data [read $fp]
close $fp
set result [regexp -inline -all {data_start.*?\n(\d+).*?\n(\d+).*?\n(\d+).*?data_end} $data]
foreach {whole_match number1 number2 number3} $result {
    puts "$number1, $number2, $number3"
}
Output:
30, 90, 214
130, 190, 1214
Update:
Reading a larger file's content into a single variable may cause the program to crash, depending on the memory of your PC. When I tried to read a file of size 890MB with the read command on a Win7 laptop with 8GB RAM, I got an "unable to realloc 531631112 bytes" error message and tclsh crashed. After some benchmarking, I found it was able to read a file of 500,015,901 bytes, but the program then consumed about 500MB of memory, since it has to hold the data.
Also, having a variable hold this much data is not efficient when it comes to extracting information via regexp. So, in such cases, it is better to read the content line by line.
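A minimal line-by-line sketch of the same extraction, so the whole file never has to sit in memory (variable names are illustrative):
set fp [open input.txt r]
set collecting 0
while {[gets $fp line] >= 0} {
    if {$line eq "data_start"} {
        set collecting 1
        set nums {}
    } elseif {$line eq "data_end"} {
        set collecting 0
        puts [join $nums ", "]
    } elseif {$collecting && [regexp {^(\d+)} $line -> num]} {
        lappend nums $num
    }
}
close $fp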

Load all the data from the file into a variable. Set start and end tokens and seek to those positions, then process each item line by line. Tcl lists are strings of elements separated by white space, so we can process the items in a line with foreach {a b c} $line {...}.
tcl:
set data {...
...
..
...
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
...
...
...}
set i 0
set start_str "data_start"
set start_len [string length $start_str]
set end_str "data_end"
set end_len [string length $end_str]
while {[set start [string first $start_str $data $i]] != -1} {
    set start [expr {$start + $start_len}]
    set end [string first $end_str $data $start]
    set end [expr {$end - 1}]
    set item [string range $data $start $end]
    set lines [split $item "\n"]
    foreach line $lines {
        foreach {a b c} $line {
            puts "a=$a, b=$b, c=$c"
        }
    }
    set i [expr {$end + $end_len}]
}
output:
a=30, b=abc1, c=xyz
a=90, b=abc2, c=xyz
a=214, b=abc3, c=xyz

I'd write that as
set fid [open $data_file]
set p 0
set nums {}
while {[gets $fid line] != -1} {
    switch -regexp -- $line {
        {^data_end}   {set p 0}
        {^data_start} {set p 1}
        default {
            if {$p && [regexp {^(\d+)\M} $line -> num]} {
                lappend nums $num
            }
        }
    }
}
close $fid
puts $nums
or, even
set nums [exec sed -rn {/data_start/,/data_end/ {/^([[:digit:]]+).*/ s//\1/p}} $data_file]
puts $nums

My favorite method would be to declare procs for each of the acceptable tokens and utilize the unknown mechanism to quietly ignore the unacceptable ones.
proc 30 args {
    ... handle 30 $args
}
proc 90 args {
    ... process 90 $args
}

rename unknown original_unknown
proc unknown args {
    # This space was deliberately left blank
}

source datafile.txt

rename unknown {}
rename original_unknown unknown
You'll be using Tcl's built-in parsing, which should be considerably faster. It also looks better in my opinion.
You can also put the line-handling logic into your unknown-procedure entirely:
rename unknown original_unknown
proc unknown {first args} {
    process $first $args
}
source input.txt
rename unknown {}
rename original_unknown unknown
Either way, the trick is that Tcl's own parser (implemented in C) will be breaking up the input lines into tokens for you -- so you don't have to implement the parsing in Tcl yourself.
This does not always work -- if, for example, the input is using multi-line syntax (without { and }) or if the tokens are separated with something other than white space. But in your case it should do nicely.
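Applied to the original data_start/data_end file, the unknown-only variant could look roughly like this sketch (the collecting flag and nums list are illustrative names):
proc data_start {args} {set ::collecting 1}
proc data_end   {args} {set ::collecting 0}
set collecting 0
set nums {}

rename unknown original_unknown
proc unknown {first args} {
    # Numeric first words inside a data_start/data_end block are collected;
    # everything else (the "..." lines, stray tokens) is silently ignored.
    if {$::collecting && [string is integer -strict $first]} {
        lappend ::nums $first
    }
}

source datafile.txt

rename unknown {}
rename original_unknown unknown
puts $nums   ;# 30 90 214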

Related

how to read and perform calculations on fixed-point values with tcl

I would like to read this file below with tcl:
BEGIN
%Time (real) HG (real)
!Time HG
-0.000110400001 0.6
-0.000110399901 0.6
-0.000110399801 0.6
-0.000110399701 0.6
-0.000110399601 0.55
-0.000110399501 0.5
-0.000110399401 0.45
-0.000110399301 0.4
-0.000110399201 0.45
-0.000110399101 0.5
-0.000110399001 0.55
-0.000110398901 0.6
For each value in the Time column, I would like to increment it by +0.000110400001 and write the result to a new file. I would like the other column to be left unmodified and copied as-is.
I began coding (see below); I can open and read the values, but I don't know how to convert the strings to fixed-point numbers and do the addition on them. If anyone can help, that would be nice.
set inVector [lindex $argv 0]
puts "input vector : $inVector"
set filename "resultat.mdf"
set fileId [open $filename "w"]
set PROCESSING_FILE [open "$inVector" r]
while {[eof $PROCESSING_FILE]==0} {
    set string [gets $PROCESSING_FILE]
    if {[string index $string 3] != "B"} {
        if {[string index $string 3] != "%"} {
            if {[string index $string 3] != "!"} {
                foreach line $string {
                    puts "input value : $line"
                }
            } else {
                puts $fileId $string
            }
        } else {
            puts $fileId $string
        }
    } else {
        puts $fileId $string
    }
}
close $PROCESSING_FILE
close $fileId
For lines with digits on, you could probably read them like this:
scan $string "%f %f" time hg
If that returns 2 (for two fields processed) you've successfully read two (floating point) numbers from that line. Otherwise, the line is something else. This leads to code like this (with some standard line-by-line idioms that must've been written up already in some other question):
# Skipping all the code for opening files
# While we successfully read a line from the input file
while {[gets $PROCESSING_FILE line] >= 0} {
    # Attempt to parse two floats (with at least one whitespace between) out of line
    if {[scan $line "%f %f" time hg] == 2} {
        # Action to take on a matched line
        puts "input line: '$line' time:$time HG:$hg"
    } else {
        # Action to take on an unmatched line
        puts $fileId $line
    }
}
# Skipping the code for closing files
For files with truly fixed width fields, you use string range to pick out pieces of the line and then attempt to parse those (or you write a messy regular expression and use regexp). Those tend to need more tuning to the data.
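For completeness, here is a hedged sketch of the whole task from the question: shift every value in the Time column by the offset and copy the other lines through unchanged. The format strings are illustrative choices, not requirements.
set offset 0.000110400001
set in  [open [lindex $argv 0] r]
set out [open "resultat.mdf" w]
while {[gets $in line] >= 0} {
    if {[scan $line "%f %f" time hg] == 2} {
        # Data line: shift the time column; %.12g keeps roughly the original precision.
        puts $out [format "%.12g %g" [expr {$time + $offset}] $hg]
    } else {
        # Header lines (BEGIN, %..., !...) are copied unchanged.
        puts $out $line
    }
}
close $in
close $out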

Parse CSV into key value pair

I have a csv file which has hostname and attached serial numbers. I want to create a key value pair with key being hostname and value being the list of serial numbers. The serial numbers can be one or many.
For example:
A, 1, 2, 3, 4
B, 5, 6
C, 7, 8, 9
D, 10
I need to access key A and get {1 2 3 4} as output, and if I access D I should get {10}.
How should I do this? The version of Tcl I am using doesn't include packages like csv, and I won't be able to install any since it is on the server, so I am looking for a solution that doesn't involve any packages.
For now, I am splitting the data on \n and processing each element. Then I split each element on ",", which gives me the hostname and serial numbers as a list. I then use index 0 of the list as the hostname and the remaining values as the serial numbers. Is there a cleaner solution?
I'd do something like:
#!/usr/bin/env tclsh
package require csv
package require struct::queue

set filename "file.csv"
set fh [open $filename r]
set q [struct::queue]
csv::read2queue $fh $q
close $fh

set data [dict create]
while {[$q size] > 0} {
    set values [lassign [$q get] hostname]
    dict set data $hostname [lmap elem $values {string trimleft $elem}]
}
dict for {key value} $data {
    puts "$key => $value"
}
then
$ tclsh csv.tcl
A => 1 2 3 4
B => 5 6
C => 7 8 9
D => 10
The repeated recommendation given here is to use the csv package for this purpose; see also the answer by @glenn-jackman. If it is unavailable, the time is better invested in getting it onto your server.
To get you started, however, you might want to adopt something along the lines of:
set dat {
A, 1, 2, 3, 4
B, 5, 6
C, 7, 8, 9
D, 10
}
set d [dict create]
foreach row [split [string trim $dat] \n] {
    set row [lassign [split $row ,] key]
    dict set d [string trim $key] [concat {*}$row]
}
dict get $d A
dict get $d D
Be warned, however, that such hand-knitted solutions typically only serve their purpose when you have full control of the data being processed and its representation. Again, the time is better invested in obtaining the csv package.
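For completeness, if the csv package can be obtained after all, csv::split parses one record at a time, so a sketch of the whole thing (file name taken from the question) might look like:
package require csv

set fh [open host_slno.csv r]
set d [dict create]
while {[gets $fh line] >= 0} {
    if {[string trim $line] eq ""} continue
    # csv::split returns the record's fields; trim the stray spaces after the commas.
    set fields [lmap f [csv::split $line] {string trim $f}]
    dict set d [lindex $fields 0] [lrange $fields 1 end]
}
close $fh

puts [dict get $d A]   ;# 1 2 3 4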
I tried this way and got it working. Thanks again for your inputs. Yes, I know the csv package would be easier, but I cannot install it on the server/product.
set multihost "host_slno.csv"
set fh1 [open $multihost r]
set data [read -nonewline $fh1]
close $fh1
set hostslnodata [split $data "\n"]
set hostslno [dict create]
foreach line $hostslnodata {
    set line1 [join [split $line ", "]]
    puts "$line1"
    if {[regexp {([A-Za-z0-9_\-]+)\s+(.*)} $line1 match hostname serial_numbers]} {
        dict lappend hostslno $hostname $serial_numbers
    }
}
puts [dict get $hostslno]
The source code for the csv package is available. If you are unable to install the full csv package, you can include the code from here:
http://core.tcl.tk/tcllib/artifact/2898cd911697ecdb
If you still can't use that option, then stripping out all the whitespace and splitting on "," is required.
An alternative to the earlier answers is using string map:
set row [split [string map {" " ""} $row ] ,]
The string map will remove all spaces, and then split on ","
Once you have converted the lines of text into valid tcl lists:
A 1 2 3 4
B 5 6
C 7 8 9
D 10
Then you can use the lindex and lrange commands to pluck off all the pieces.
foreach row $data {
    set server [lindex $row 0]
    set serial_numbers [lrange $row 1 end]
    dict set ...
}
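Putting those pieces together, a minimal sketch (file name taken from the question; error handling omitted):
set fh [open host_slno.csv r]
set hosts [dict create]
while {[gets $fh row] >= 0} {
    if {$row eq ""} continue
    # Strip all spaces, then split on "," to get a clean Tcl list.
    set row [split [string map {" " ""} $row] ,]
    dict set hosts [lindex $row 0] [lrange $row 1 end]
}
close $fh

puts [dict get $hosts A]   ;# 1 2 3 4
puts [dict get $hosts D]   ;# 10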
One possibility:
set hostslno [dict create]
set multihost "host_slno.csv"
set fh1 [open $multihost]
while {[gets $fh1 line] >= 0} {
    set numbers [lassign [regexp -inline -all {[^\s,]+} $line] hostname]
    dict set hostslno $hostname $numbers
}
close $fh1
puts [dict get $hostslno A]

how to split a file to list of lists TCL

I'm coding in Tcl and I would like to split a file into two lists of lists.
The file contains:
(1,2) (3,4) (5,6)
(7,8) (9,10) (11,12)
and I would like to get two lists, one for each line, where each inner list contains two numbers. For example:
puts $list1 #-> {1 2} {3 4} {5 6}
puts [lindex $list1 0] #-> 1 2
puts [lindex $list2 2] #-> 11 12
I tried to use regexp and split, but with no success.
The idea of using regexp is good, but you'll need to do some post-processing on its output.
# This is what you'd read from a file
set inputdata "(1,2) (3,4) (5,6)\n(7,8) (9,10) (11,12)\n"
foreach line [split $inputdata "\n"] {
    # Skip empty lines.
    # (I often put a comment format in my data files too; this is where I'd handle it.)
    if {$line eq ""} continue
    # Parse the line.
    set bits [regexp -all -inline {\(\s*(\d+)\s*,\s*(\d+)\s*\)} $line]
    # Example results of regexp:
    #   (1,2) 1 2 (3,4) 3 4 (5,6) 5 6
    # Post-process to build the lists you really want
    set list([incr idx]) [lmap {- a b} $bits {list $a $b}]
}
Note that this is building up an array; long experience says that calling variables list1, list2, …, when you're building them in a loop is a bad idea, and that an array should be used, effectively giving variables like list(1), list(2), …, as that yields a much lower bug rate.
An alternate approach is to use a simpler regexp and then have scan parse the results. This can be more effective when the numbers aren't just digit strings.
foreach line [split $inputdata "\n"] {
    if {$line eq ""} continue
    set bits [regexp -all -inline {\([^()]+\)} $line]
    set list([incr idx]) [lmap substr $bits {scan $substr "(%d,%d)"}]
}
If you're not using Tcl 8.6, you won't have lmap yet. In that case you'd do something like this instead:
foreach line [split $inputdata "\n"] {
    if {$line eq ""} continue
    set bits [regexp -all -inline {\(\s*(\d+)\s*,\s*(\d+)\s*\)} $line]
    set list([incr idx]) {}
    foreach {- a b} $bits {
        lappend list($idx) [list $a $b]
    }
}
foreach line [split $inputdata "\n"] {
    if {$line eq ""} continue
    set bits [regexp -all -inline {\([^()]+\)} $line]
    set list([incr idx]) {}
    foreach substr $bits {
        lappend list($idx) [scan $substr "(%d,%d)"]
        # In *very* old Tcl you'd need this:
        #   scan $substr "(%d,%d)" a b
        #   lappend list($idx) [list $a $b]
    }
}
You have an answer already, but it can actually be done a little bit simpler (or at least without regexp, which is usually a good thing).
Like Donal, I'll assume this to be the text read from a file:
set lines "(1,2) (3,4) (5,6)\n(7,8) (9,10) (11,12)\n"
Clean it up a bit, removing the parentheses and any white space before and after the data:
% set lines [string map {( {} ) {}} [string trim $lines]]
1,2 3,4 5,6
7,8 9,10 11,12
One way to do it with good old-fashioned Tcl, resulting in a cluster of variables named lineN, where N is an integer 1, 2, 3...:
set idx 0
foreach lin [split $lines \n] {
    set res {}
    foreach li [split $lin] {
        lappend res [split $li ,]
    }
    set line[incr idx] $res
}
A doubly iterative structure like this (a number of lines, each having a number of pairs of numbers separated by a single comma) is easy to process using one foreach within the other. The variable res is used for storing result lines as they are assembled. At the innermost level, the pairs are split and list-appended to the result. For each completed line, a variable is created to store the result: its name consists of the string "line" and an increasing index.
As Donal says, it's not a good idea to use clusters of variables. It's much better to collect them into an array (same code, except for how the result variable is named):
set idx 0
foreach lin [split $lines \n] {
    set res {}
    foreach li [split $lin] {
        lappend res [split $li ,]
    }
    set line([incr idx]) $res
}
If you have the results in an array, you can use the parray utility command to list them in one fell swoop:
% parray line
line(1) = {1 2} {3 4} {5 6}
line(2) = {7 8} {9 10} {11 12}
(Note that this is printed output, not a function return value.)
You can get whole lines from this result:
% set line(1)
{1 2} {3 4} {5 6}
Or you can access pairs:
% lindex $line(1) 0
1 2
% lindex $line(2) 2
11 12
If you have the lmap command (or the replacement linked to below), you can simplify the solution somewhat (you don't need the res variable):
set idx 0
foreach lin [split $lines \n] {
    set line([incr idx]) [lmap li [split $lin] {
        split $li ,
    }]
}
Still simpler is to let the result be a nested list:
set lineList [lmap lin [split $lines \n] {
    lmap li [split $lin] {
        split $li ,
    }
}]
You can access parts of the result similar to above:
% lindex $lineList 0
{1 2} {3 4} {5 6}
% lindex $lineList 0 0
1 2
% lindex $lineList 1 2
11 12
Documentation:
array,
foreach,
incr,
lappend,
lindex,
lmap (for Tcl 8.5),
lmap,
parray,
set,
split,
string
The code works for Windows.
Tcl file code:
proc captureImage {} {
    #open the image config file.
    set configFile [open "C:/main/image_config.txt" r]

    #To retrieve the values from the config file.
    while {![eof $configFile]} {
        set part [split [gets $configFile] "="]
        set props([string trimright [lindex $part 0]]) [string trimleft [lindex $part 1]]
    }
    close $configFile

    set time [clock format [clock seconds] -format %Y%m%d_%H%M%S]
    set date [clock format [clock seconds] -format %Y%m%d]

    #create the folder with the current date
    set folderPath $props(folderPath)
    append folderDate $folderPath "" $date "/"
    set FolderCreation [file mkdir $folderDate]
    while {0} {
        if { [file exists $date] == 1} {
        }
        break
    }

    #camera selection to capture image.
    set camera "video"
    append cctv $camera "=" $props(cctv)

    #set the image resolution (XxY).
    set resolutionX $props(resolutionX)
    set resolutionY $props(resolutionY)
    append resolution $resolutionX "x" $resolutionY

    #set the name of the saved image
    set imagePrefix $props(imagePrefix)
    set imageFormat $props(imageFormat)
    append filename $folderDate "" $imagePrefix "_" $time "." $imageFormat

    set logPrefix "Image_log"
    append logFile $folderDate "" $logPrefix "" $date ".txt"

    #ffmpeg command to capture image in background
    exec ffmpeg -f dshow -benchmark -i $cctv -s $resolution $filename >& $logFile &
    after 3000
}
captureImage
The text file content is:
cctv=Integrated Webcam
resolutionX=1920
resolutionY=1080
imagePrefix=ImageCapture
imageFormat=jpg
folderPath=c:/test/
//camera=video=Integrated Webcam,Logitech HD Webcam C525
This code works for me; it accepts the values from the text file where the list of parameters is passed.

Ignoring lines in a file TCL

Suppose I have a file1 with a few queries inside,
Query 1
Query 2
Query 3
And I have a normal text file2 containing a bunch of data
Data 1 Query 1 something something
Data something Query 2 something something
Something Query 3 something something
Data1 continue no query
Data2 continue no query
How do I create a loop that ignores the lines containing queries from file1 and prints only the lines without those queries? So in this case, only these lines get printed:
Data1 continue no query
Data2 continue no query
I tried producing the results using this loop script I made.
Storing the queries to be ignored from file1 into $wlistItems
set openFile1 [open file1.txt r]
while {[gets $openFile1 data] > -1} {
    set wlist $data
    append wListItems "{$wlist}\n"
}
close $openFile1
close $openFile1
Processing file2 to print lines without ignored queries
set openFile2 [open file2.txt r]
while {[gets $openFile2 data] > -1} {
    for {set n 0} {$n < [llength $wListItems]} {incr n} {
        if {[regexp -all "[lindex $wListItems $n]" $data all value]} {
            continue
        }
        puts $data
    }
}
close $openFile2
However, the script does not skip the lines. It instead prints out repeated data from file2.
The puts $data sits inside the inner for loop, so the line gets printed once for every query that does not match it. Track whether any query matched, and print the line only once, afterwards:
while {[gets $openFile2 data] > -1} {
    set found 0
    for {set n 0} {$n < [llength $wListItems]} {incr n} {
        if {[regexp -all "[lindex $wListItems $n]" $data all value]} {
            set found 1
            break
        }
    }
    if {!$found} {
        puts $data
    }
}
A simpler solution:
package require fileutil
set queries [join [split [string trim [::fileutil::cat file1]] \n] |]
::fileutil::foreachLine line file2 {
    if {![regexp ($queries) $line]} {
        puts $line
    }
}
The first command (after the package require) reads the file with the queries and packs them up as a set of branches (Query 1|Query 2|Query 3). The second command processes the second file line by line and prints those lines that don't contain any of those branches.
Documentation: fileutil package, if, join, package, puts, Syntax of Tcl regular expressions, regexp, set, split, string
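If fileutil isn't available, the same idea works with plain open and gets; a rough equivalent:
# Build the alternation (Query 1|Query 2|Query 3) from file1 by hand.
set f [open file1.txt r]
set queries [join [split [string trim [read $f]] \n] |]
close $f

# Print only the file2 lines that match none of the queries.
set f [open file2.txt r]
while {[gets $f line] >= 0} {
    if {![regexp ($queries) $line]} {
        puts $line
    }
}
close $f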
I'd just do this:
puts [exec grep -Fvf file1 file2]

Looking for a search string in a file and using those lines for processing in TCL

To be more precise:
I need to be looking into a file abc.txt which has contents something like this:
files/f1/atmp.c 98 100
files/f1/atmp1.c 89 100
files/f1/atmp2.c !! 75 100
files/f2/btmp.c 92 100
files/f2/btmp2.c !! 85 100
files/f3/xtmp.c 92 100
The script needs to find "!!" and use those lines to print out the following as output:
atmp2.c 75
btmp2.c 85
Any help?
This should do the trick.
set data {files/f1/atmp.c 98 100
files/f1/atmp1.c 89 100
files/f1/atmp2.c !! 75 100
files/f2/btmp.c 92 100
files/f2/btmp2.c !! 85 100
files/f3/xtmp.c 92 100}
set lines [split $data \n]
foreach line $lines {
    # capture the bare file name (after any /) and the number that follows !!
    set match [regexp {([^\s/]+)\s+!!\s+(\d+)} $line -> file num]
    if {$match} {puts "$file $num"}
}
Although regexp has a -all switch, I don't think we can use it here, as we only get the last match's vars with -all.
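That said, the -inline form of -all (used in other answers in this thread) returns every match and its submatches as one flat list, so it can be made to work; a small sketch reusing $data from above:
# Each match contributes three elements: whole match, file name, number.
foreach {whole file num} [regexp -all -inline {([^\s/]+)\s+!!\s+(\d+)} $data] {
    puts "$file $num"
}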
If your file isn't huge, you can slurp the whole thing into memory, split the lines into a TCL list, and then iterate through the list looking for a match. For example:
set fh [open foo]
set lines [read $fh]
close $fh
set lines [split $lines "\n"]
foreach line $lines {
    if { [regexp {.*/(\S+\.c)\s*!!\s*(\d+)} $line match file data] } {
        puts "$file $data"
    }
}
This will successfully return just the lines with "!!" in them. With your posted corpus, the results are:
atmp2.c 75
btmp2.c 85
I might be tempted in this case to exec to awk:
set output [exec awk {$2 == "!!" {print $1, $3}} abc.txt]
puts $output
The trick is to combine the code that reads lines from the file with a regular expression that detects matching lines and extracts the relevant parts (a one-step process with regexp). The only tricky part is working out what exactly to use as the regular expression, so that you get exactly what you want. I'm going to guess that you're after the parts of the filenames after the /, that those filenames won't contain spaces, and that the number you're after is the entirety of the first digit sequence after the double exclamation. (Other formats are possible, some of which are easier to extract with other tools such as scan.) That would give us something like this:
set f [open abc.txt]
while {[gets $f line] >= 0} {
    if {[regexp {([^\s/]+)\s+!!\s+(\d+)} $line -> name value]} {
        # Or do whatever you want with these
        puts "$name $value"
    }
}
close $f
(The gets command with two arguments returns the length of the line read, or -1 on failure. For normal files the only failure mode is EOF, so we can just terminate the loop when we get a negative value. Other kinds of channels can be more complex…)