File writing in Tcl takes a lot of time

set filePointer [open "fileName" "r"]
set fileWritePointer [open "fileNameWrite" "w"]
set lines [split [read $filePointer] "\n"]
close $filePointer
set length [llength $lines]
for {set i 0} {$i<$length} {incr i} {
set line [lindex $lines $i]
if {[regexp "Matching1" $line]} {
puts $fileWritePointer $line
}
if {[regexp "Matching" $line]} {
puts $fileWritePointer $line
}
}
close $fileWritePointer
I read all the lines of the file at once, split the contents on the newline character, and process one line at a time inside the for loop.
After some syntax checks on each line using regexp, I write only the selected lines to a new file using the syntax below.
puts $fileWritePointer $line
My file has around 2 million lines.
There are many regexp checks like these, roughly around 1.5.

Without knowing why the code is slow (or what baseline you are measuring against), it's hard to be sure what will speed it up. However, you can try switching to streaming processing:
set fin [open "fileName"]
set fout [open "fileNameWrite" "w"]
while {[gets $fin line] >= 0} {
if {[regexp "Matching1" $line]} {
puts $fout $line
}
if {[regexp "Matching" $line]} {
puts $fout $line
}
}
close $fout
close $fin
You should make sure that your regular expressions are constant values for the duration of the processing to avoid recompiling them for every line (which would be very slow!) though those constant values can be stored in variables, so long as those variables are used without anything being added to them:
set RE1 "Matching1"
set RE2 "Matching"
# Note: these variables are NOT assigned to below! They are just used!
set fin [open "fileName"]
set fout [open "fileNameWrite" "w"]
while {[gets $fin line] >= 0} {
# Added "--" so that an RE starting with "-" is never interpreted as an option
if {[regexp -- $RE1 $line]} {
puts $fout $line
}
if {[regexp -- $RE2 $line]} {
puts $fout $line
}
}
close $fout
close $fin
You might also get extra speed by choosing the right encodings, putting all this code in a procedure, etc. As noted, it's hard to be sure what is the best thing to try without knowing why the code is actually slow, and that in part depends on the system on which it is being run.
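As a rough sketch of the "procedure" and "encodings" suggestions, the loop could be wrapped as shown here. The name filterLines and its arguments are hypothetical, and the iso8859-1 setting is only an assumption that the data is plain single-byte text; unlike the code above, this version writes a matching line once even if several patterns match it. Code inside a procedure is byte-compiled and uses local variables, which is usually faster than the same code at global level, but measure on your own data.

```tcl
# Hypothetical helper, not from the original answer: copy every line of
# inName that matches at least one regular expression in patterns to outName.
# Returns the number of lines written.
proc filterLines {inName outName patterns} {
    set fin [open $inName r]
    set fout [open $outName w]
    # Assumption: the data is plain single-byte text, so a cheap encoding
    # such as iso8859-1 is safe. Adjust or remove these lines for other data.
    fconfigure $fin -encoding iso8859-1
    fconfigure $fout -encoding iso8859-1
    set written 0
    while {[gets $fin line] >= 0} {
        foreach re $patterns {
            if {[regexp -- $re $line]} {
                puts $fout $line
                incr written
                # One match is enough; skip the remaining patterns
                break
            }
        }
    }
    close $fout
    close $fin
    return $written
}

# Example call (file names taken from the question):
#   filterLines "fileName" "fileNameWrite" {Matching1 Matching}
```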

Do you actually need regular expression matching? String matching is likely to be faster.
Can more than one match be made against the same line, and in that case do you really need the line to be written once for each match? If not, you can speed things up by skipping the rest of the matching attempts once one has succeeded:
if {[regexp -- $RE1 $line]} {
puts $fout $line
} elseif {[regexp -- $RE2 $line]} {
puts $fout $line
} elseif { ... } {
or
if {
[regexp -- $RE1 $line] ||
[regexp -- $RE2 $line] ||
...
} then {
puts $fout $line
}
or
switch -regexp -- $line \
$RE1 - \
$RE2 - \
... {
# No "default" branch here: it would match (and write) every line
puts $fout $line
}
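On the earlier point about plain string matching: if the patterns are literal substrings rather than real regular expressions, a check along these lines may be enough. This is a minimal sketch; containsAny is a hypothetical helper name, and whether it is actually faster than regexp depends on the data and the Tcl version.

```tcl
# Hypothetical helper: return 1 if line contains any of the literal
# substrings in needles, otherwise 0. No regular expressions involved.
proc containsAny {line needles} {
    foreach needle $needles {
        # string first returns -1 when the substring is absent
        if {[string first $needle $line] >= 0} {
            return 1
        }
    }
    return 0
}

# Inside the read loop this would be used as:
#   if {[containsAny $line {Matching1 Matching}]} { puts $fout $line }
```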

Related

Tcl finding min value in a textfile

I have a text file with thousands of values and some alphanumeric characters, like this:
\Test1
+3.00000E-04
+5.00000E-04
+4.00000E-04
Now I want to scan this file and write the values into variables.
set path "C:/test.txt"
set in [open $path r]
while {[gets $in line] != -1} {
set Cache [gets $in line]
if { $Cache < $Cache } {
set lowest "$Cache"
}
}
Does anybody have an idea? I'm getting an alert which tells me the directory couldn't be deleted?!
You could use the core math function tcl::mathfunc::min. If there is "junk" (i.e. lines that contain text that aren't numbers), you can filter those lines out first:
set numbers {}
set f [open test.txt]
while {[gets $f line] >= 0} {
if {[string is double -strict $line]} {
lappend numbers [string trim $line]
}
}
close $f
tcl::mathfunc::min {*}$numbers
# => +3.00000E-04
If every line is a valid double precision floating point number, you can dispense with the filtering:
set f [open test.txt]
set numbers [split [string trim [read $f]]]
close $f
tcl::mathfunc::min {*}$numbers
# => +3.00000E-04
If you can use the Tcllib module fileutil, which is easy to pick up from the Tcllib site if not available on your installation (it is included in the ActiveTcl installation already), you can simplify the code somewhat:
package require fileutil
set numbers {}
::fileutil::foreachLine line test.txt {
if {[string is double -strict $line]} {
lappend ::numbers [string trim $line]
}
}
tcl::mathfunc::min {*}$numbers
or
package require fileutil
tcl::mathfunc::min {*}[split [string trim [::fileutil::cat test.txt]]]
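For very large files, an alternative to collecting every number and expanding the list into one min call is to keep a running minimum. The sketch below assumes the lines are already available as a Tcl list; minOfLines is a hypothetical name, and non-numeric lines are skipped in the same way as in the filtering example.

```tcl
# Hypothetical helper: return the smallest numeric value found in lines,
# or the empty string if no line holds a number.
proc minOfLines {lines} {
    set lowest ""
    foreach line $lines {
        set value [string trim $line]
        # Skip junk such as "\Test1" and empty lines
        if {![string is double -strict $value]} {
            continue
        }
        if {$lowest eq "" || $value < $lowest} {
            set lowest $value
        }
    }
    return $lowest
}

# Sample data from the question
set sample [list {\Test1} +3.00000E-04 +5.00000E-04 +4.00000E-04]
puts [minOfLines $sample]
```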
Documentation: >= (operator), close, fileutil (package), gets, if, lappend, namespace, open, package, read, set, split, string, while, {*} (syntax), Mathematical functions for Tcl expressions

need to print two different strings from a text file using tcl script

I have a requirement to get two different strings from two consecutive lines of a file using a Tcl script.
I tried the following, but it doesn't work.
I need to print the strings "Clock" and "b0". I am able to print "Clock", but I need both "Clock" and "b0".
set f [eval exec "cat src.txt"]
set linenumber 0
while {[gets $f line] >= 0} {
incr linenumber
if {[string match "Clock" $line] >= 0 } {
# ignore by just going straight to the next loop iteration
while {[gets $f line] >= 0} {
incr linenumber
if { [string match "b0" $line"]} {
close $out
puts "final $line"
}
puts "\n$line"
continue
}
}
}
close $f
Is this what you want?
set f [open "src.txt" r]; # open the file
# Read one line per iteration; gets returns -1 at end of file
while {[gets $f line] >= 0} {
if {[string match "*Clock*" $line]} {
# Clock found
puts $line
# Read the next line; stop if the file ends here
if {[gets $f line] < 0} break
if {[string match "*b0*" $line]} {
# b0 found
puts "final $line"
}
}
}
close $f

How to get the data between two strings from a file in tcl?

In Tcl scripting:
I have a file and I know how to search it for a string, but how do I get the line number where the string is found? Please answer if it is possible.
Here is what I have:
set fd [open test.txt r]
while {![eof $fd]} {
set buffer [read $fd]
}
set lines [split $buffer "\n"]
if {[regexp "S1 Application Protocol" $lines]} {
puts "string found"
} else {puts "not found"}
#puts $lines
#set i 0
#while {[regexp -start 0 "S1 Application Protocol" $lines]==0} {incr i
#puts $i
#}
#puts [llength $lines]
#puts [lsearch -exact $buffer S1]
#puts [lrange $lines 261 320]
With the above program I get "string found" as output. If I give a string that is not in this file, I get "not found".
The concept of 'a line' is just a convention that we layer on top of the stream of data that we get from a file. So if you want to work with line numbers, you have to calculate them yourself. The gets command documentation contains the following example:
set chan [open "some.file.txt"]
set lineNumber 0
while {[gets $chan line] >= 0} {
puts "[incr lineNumber]: $line"
}
close $chan
So you just need to replace the puts statement with your code to find the pattern of text you want, and when you find it, the value of $lineNumber gives you the line number.
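A minimal sketch of that replacement, collecting the numbers of all lines that contain a literal substring. The procedure name findLineNumbers is hypothetical, and string first is used on the assumption that a plain substring search is sufficient; swap in regexp if a real pattern is needed.

```tcl
# Hypothetical helper: return the 1-based numbers of every line in
# fileName that contains the literal text needle.
proc findLineNumbers {fileName needle} {
    set result {}
    set chan [open $fileName]
    set lineNumber 0
    while {[gets $chan line] >= 0} {
        incr lineNumber
        if {[string first $needle $line] >= 0} {
            lappend result $lineNumber
        }
    }
    close $chan
    return $result
}

# Example call (file name and text taken from the question):
#   puts [findLineNumbers test.txt "S1 Application Protocol"]
```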
To copy text that lies between two other lines I'd use something like the following
set chan [open "some.file.txt"]
set out [open "output.file.txt" "w"]
set lineNumber 0
# Read until we find the start pattern
while {[gets $chan line] >= 0} {
incr lineNumber
if { [string match "startpattern" $line]} {
# Now read until we find the stop pattern
while {[gets $chan line] >= 0} {
incr lineNumber
if { [string match "stoppattern" $line] } {
break
} else {
puts $out $line
}
}
}
}
# Close the output once, after all reading is done
close $out
close $chan
The easiest way is to use the fileutil::grep command:
package require fileutil
# Search for ipsum from test.txt
foreach match [fileutil::grep "ipsum" test.txt] {
# Each match is file:line:text
set match [split $match ":"]
set lineNumber [lindex $match 1]
set lineText [lindex $match 2]
# do something with lineNumber and lineText
puts "$lineNumber - $lineText"
}
Update
I realized that if the line text itself contains a colon, lineText is truncated there, because split breaks the match at every colon. So, instead of:
set lineText [lindex $match 2]
we need:
set lineText [join [lrange $match 2 end] ":"]

I want to append two consecutive lines with Tcl, how do I do that?

Suppose my file contains:
we must greap the ep
the whole ep
endpoint: /usr/home/bin/tcl_
giga/hope (v)
beginpoint" /usr/home/bin/lp50 (^)
I only want to print the endpoint path, i.e. /usr/home/bin/tcl_giga/hope, on one line.
Can anyone help me with this? I have written my code like this:
set fp [open "text" "r+"]
while {![eof $fp]} {
gets $fp line
puts $line
if {[regexp {endpoint:} $line]} {
set new_line $line
puts $new_line
}
}
But that only prints the first endpoint line.
This is one way to do it. It allows you to change the number of lines to print after an "endpoint:" line.
set printNMore 0
while {[gets $fp line] >= 0} {
if {[string first "endpoint:" $line] >= 0} {
set printNMore 2
}
if {$printNMore > 0} {
puts $line
incr printNMore -1
}
}
The simplest way to do that line-at-a-time is something like this:
set fp [open "text" "r+"]
while {[gets $fp line] >= 0} {
if {[string match "endpoint: *" $line]} {
puts -nonewline [string range $line 10 end]
# Plus the whole of the next line...
puts [gets $fp]
}
}
However, if I was doing this myself I'd slurp the whole file in at once and use a regular expression:
set fp [open "text" "r+"]
set contents [read $fp]
close $fp
if {[regexp {endpoint: ?([^\n]+)\n([^\n]*)} $contents -> bit1 bit2]} {
# Print the concatenation of the capture groups
puts "$bit1$bit2"
}
I could write a better pattern if I knew what the file format was more precisely…

How to insert a line break and grab data inside foreach loop

I have a set of fields to parse from a file, and I'm doing it line by line inside a foreach loop. I want to know how I can skip a line and go to the next line.
For example, if I encounter a string called "ABC", I need to grab the number on the next line:
some characters "ABC"
123
The problem is that there are a lot of numbers in the file, but I need one specific number: the one on the line right after the string "ABC".
How can I do this?
It's a bit easier to do with a while loop, reading one line at a time, since you can then easily read an extra line when you find your trigger case (assuming you don't have a run of lines with "ABC" in them):
set fd [open $theFilename]
while {[gets $fd line] >= 0} {
if {
[string match *"ABC"* $line]
&& [gets $fd line] >= 0
&& [regexp {\d+} $line num]
} then { # I like to use 'then' after a multi-line conditional; it's optional
puts "Found number $num after \"ABC\""
}
}
close $fd
The reason this is awkward with foreach is that it will always process the same number of elements each time through the loop.
If you're dealing with data which can have the run-of-lines issue alluded to above, you are actually better off with foreach curiously enough:
set fd [open $theFilename]
set lines [split [read $fd] \n]
close $fd
foreach line $lines {
incr idx; # Always the index of the *next* line
if {
[string match *"ABC"* $line]
&& [regexp {\d+} [lindex $lines $idx] num]
} then {
puts "Found number $num after \"ABC\""
}
}
This works because when you do lindex of something past the end, it produces the empty string (which won't match that simple regular expression).
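The out-of-range behaviour can be checked in isolation with a few lines; the list contents here are made up purely for illustration.

```tcl
# A three-element list; index 10 is well past the end
set lines {alpha beta gamma}
set past [lindex $lines 10]

# lindex does not raise an error; it returns the empty string
puts "length of result: [string length $past]"

# The empty string contains no digits, so the match fails and returns 0
set matched [regexp {\d+} $past]
puts "matched: $matched"
```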
You can try this simple solution
set trigger 0
set fh [open "your_file" "r"]
while {[gets $fh line] != -1} {
if {[regexp -- {"ABC"} $line]} {
incr trigger
continue
}
if {$trigger > 0} {
puts $line ; # or do something else
incr trigger -1
}
}
close $fh