Extracting data from a text file and writing it elsewhere - tcl

How can I read a file, put its elements into a list, and write the contents to another file?
The file contents are:
this is my house and it is very good
this is my village and it is the best
"good" and "best" have to be repeated 10 times.
Please help if possible.

Did you mean something like this?
set fi [open "filename"]
set fo [open "outputfile" w]
while {[gets $fi line] != -1} {
    if {$line ne ""} {
        for {set i 0} {$i < 10} {incr i} {
            puts $fo $line
        }
    }
}
close $fi
close $fo

Your question is unclear.
Do you mean the lines containing "good" or "best" need to be repeated 10 times?
set fin [open infile r]
set fout [open outfile w]
while {[gets $fin line] != -1} {
    switch -glob -- $line {
        *good* -
        *best* {
            for {set i 1} {$i <= 10} {incr i} {
                puts $fout $line
            }
        }
        default {
            # what to do for lines NOT containing "good" or "best"?
        }
    }
}
close $fin
close $fout
If you mean the actual words need to be repeated 10 times:
while {[gets $fin line] != -1} {
    puts $fout [regsub -all {\y(good|best)\y} $line {& & & & & & & & & &}]
}
An example for that regsub command:
set str "these goods are good; that bestseller is the best"
puts [regsub -all {\y(good|best)\y} $str {& & &}]
# prints: these goods are good good good; that bestseller is the best best best
Update
Based on your comments, you might want:
array set values {good 10 best 20}
while {[gets $fin line] != -1} {
    regsub -all {\y(good|best)\y} $line {&-$values(&)} line
    puts $fout [subst -nobackslashes -nocommands $line]
}
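The answers above process the file line by line and never build the list the question asks about. A minimal sketch of that part follows; the proc names readLines and writeLines are made up for illustration and are not built-in Tcl commands.

```tcl
# Hypothetical helper: return the lines of a file as a Tcl list.
proc readLines {filename} {
    set f [open $filename r]
    # -nonewline drops the final newline so the list has no empty last element
    set data [read -nonewline $f]
    close $f
    return [split $data "\n"]
}

# Hypothetical helper: write each element of lines to filename, one per line.
proc writeLines {filename lines} {
    set f [open $filename w]
    foreach line $lines {
        puts $f $line
    }
    close $f
}
```

With these, copying a file through a list would look like: writeLines outputfile [readLines filename]. Any of the repeat logic shown above can be applied to the list in between.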

Related

file writing operation in tcl takes lot of time

set filePointer [open "fileName" "r"]
set fileWritePointer [open "fileNameWrite" "w"]
set lines [split [read $filePointer] "\n"]
close $filePointer
set length [llength $lines]
for {set i 0} {$i<$length} {incr i} {
    # fetch the current line from the list
    set line [lindex $lines $i]
    if {[regexp "Matching1" $line]} {
        puts $fileWritePointer $line
    }
    if {[regexp "Matching" $line]} {
        puts $fileWritePointer $line
    }
}
close $fileWritePointer
I read all the lines of the file at once, split them on the newline character, and then process one line at a time inside the for loop.
After some checks on each line using regexp, I write only the selected lines to a new file using the syntax below.
puts $fileWritePointer $line
My file has around 2 million lines of code.
Like this many regexp matching is present roughly around 1.5.
Without knowing why the code is slow (or what baseline you're measuring against), it's hard to be sure what would speed it up. However, you can try switching to streaming processing:
set fin [open "fileName"]
set fout [open "fileNameWrite" "w"]
while {[gets $fin line] >= 0} {
    if {[regexp "Matching1" $line]} {
        puts $fout $line
    }
    if {[regexp "Matching" $line]} {
        puts $fout $line
    }
}
close $fout
close $fin
You should make sure that your regular expressions are constant values for the duration of the processing, to avoid recompiling them for every line (which would be very slow!). Those constant values can be stored in variables, so long as the variables are used without anything being added to them:
set RE1 "Matching1"
set RE2 "Matching"
# Note: these variables are NOT assigned to below! They are just used!
set fin [open "fileName"]
set fout [open "fileNameWrite" "w"]
while {[gets $fin line] >= 0} {
    # Added "--" to make sure that the REs are never interpreted as anything else
    if {[regexp -- $RE1 $line]} {
        puts $fout $line
    }
    if {[regexp -- $RE2 $line]} {
        puts $fout $line
    }
}
close $fout
close $fin
You might also get extra speed by choosing the right encodings, putting all this code in a procedure, etc. As noted, it's hard to be sure what is the best thing to try without knowing why the code is actually slow, and that in part depends on the system on which it is being run.
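As a rough sketch of the "put it in a procedure" suggestion: the proc name filterLines and its arguments are made up for illustration. Inside a procedure the loop works on local variables, which Tcl's bytecode compiler handles more efficiently than globals. The UTF-8 encoding is only an assumption about the data. Unlike the loops above, this version writes a line at most once even if several patterns match.

```tcl
# Hypothetical helper: copy every line of inName that matches any of the
# regular expressions in patterns to outName; returns the number written.
proc filterLines {inName outName patterns} {
    set fin  [open $inName r]
    set fout [open $outName w]
    # Assumption: the data is UTF-8; pick the encoding your file really uses.
    fconfigure $fin  -encoding utf-8
    fconfigure $fout -encoding utf-8
    set written 0
    while {[gets $fin line] >= 0} {
        foreach re $patterns {
            if {[regexp -- $re $line]} {
                puts $fout $line
                incr written
                break   ;# write each line at most once
            }
        }
    }
    close $fout
    close $fin
    return $written
}
```

A call for the file names used in the question would be: filterLines fileName fileNameWrite {Matching1 Matching}. Whether this is measurably faster depends on the system and the data, so time it before and after.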
Do you actually need regular expression matching? String matching is likely to be faster.
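If the patterns are plain text rather than real regular expressions, a literal substring test is enough. A small sketch, where containsText is a hypothetical helper name and "Matching" stands in for whatever text you are looking for:

```tcl
# Hypothetical predicate: true if line contains the literal text needle.
proc containsText {line needle} {
    # string first returns the index of the first match, or -1 if there is none
    return [expr {[string first $needle $line] >= 0}]
}

puts [containsText "xx Matching1 yy" "Matching"]   ;# prints 1
puts [containsText "no hit here" "Matching"]       ;# prints 0
```

A glob test such as string match "*Matching*" $line would also work, but then characters like * ? [ ] in the search text are treated as wildcards.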
Can more than one match be made against the same line, and in that case do you really need the line to be written once for each match? If not, you can speed things up by skipping the rest of the matching attempts once one has succeeded:
if {[regexp -- $RE1 $line]} {
    puts $fout $line
} elseif {[regexp -- $RE2 $line]} {
    puts $fout $line
} elseif { ... } {
or
if {
    [regexp -- $RE1 $line] ||
    [regexp -- $RE2 $line] ||
    ...
} then {
    puts $fout $line
}
or
switch -regexp -- $line \
    $RE1 - \
    $RE2 - \
    ... {
        puts $fout $line
    }

Tcl echo file contents to transcript

I'm using a simulator (Questa Sim) that uses Tcl for its transcript commands.
I want to echo a file's contents, like the "cat" command in Unix.
Can it be done with a one-line command in Tcl? Is it also possible to "cat" just the first 5 lines of a file?
In one line (note that this never closes the channel):
puts [read [open data.dat r]]
OR step by step..
set handle [open data.dat r]
puts [read $handle]
close $handle
To open a file and echo its contents to standard output (just like cat), do this:
set f [open $filename]
fcopy $f stdout
close $f
To just do the first five lines (which is just like head -5), use this procedure:
proc head {filename {lineCount 5}} {
    set f [open $filename]
    for {set i 0} {$i < $lineCount} {incr i} {
        if {[gets $f line] >= 0} {
            puts $line
        }
    }
    close $f
}
It takes more work because it's more complex to detect line endings than it is to just ship bytes around.
The following code reads 5 lines at a time from a given file.
#!/usr/bin/tclsh
set prev_count -1
set fp [open "input-file.txt" "r"]
set num_lines [split [read $fp] \n]
close $fp
for {set i 4} {$i < [llength $num_lines]} {incr i 5} {
    set line_5 [lrange $num_lines [incr prev_count] $i]
    set prev_count $i
    puts "$line_5\n\n"
}
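The loop above only prints complete groups of five, so a shorter group at the end of the file is dropped. A streaming variant that keeps the remainder could look like the sketch below; linesInGroups is a hypothetical name, not an existing command.

```tcl
# Hypothetical helper: return the lines of filename as a list of groups,
# each group holding up to size lines (the last group may be shorter).
proc linesInGroups {filename {size 5}} {
    set fp [open $filename r]
    set groups {}
    set group {}
    while {[gets $fp line] >= 0} {
        lappend group $line
        if {[llength $group] == $size} {
            lappend groups $group
            set group {}
        }
    }
    close $fp
    # Keep a trailing partial group, if any
    if {[llength $group] > 0} {
        lappend groups $group
    }
    return $groups
}
```

To print the groups the same way as above: foreach group [linesInGroups input-file.txt] {puts "$group\n\n"}.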

need to print two different strings from a text file using tcl script

I have a requirement to get two different strings from two consecutive lines of a file using a Tcl script.
I tried the following, but it doesn't work.
I need to print the strings "Clock" and "b0". I am able to print "Clock", but I need both "Clock" and "b0".
set f [eval exec "cat src.txt"]
set linenumber 0
while {[gets $f line] >= 0} {
incr linenumber
if {[string match "Clock" $line] >= 0 } {
# ignore by just going straight to the next loop iteration
while {[gets $f line] >= 0} {
incr linenumber
if { [string match "b0" $line"]} {
close $out
puts "final $line"
}
puts "\n$line"
continue
}
}
}
close $f
Is this what you want?
set f [open "src.txt" r]; # open the file
while {[gets $f line] >= 0} {
    if {[string match "*Clock*" $line]} {
        # Clock found
        puts $line
        # read the next line; gets returns -1 at end of file
        if {[gets $f line] >= 0 && [string match "*b0*" $line]} {
            # b0 found on the following line
            puts "final $line"
        }
    }
}
close $f

How to get the data between two strings from a file in tcl?

In TCL Scripting:
I have a file, and I know how to search it for a string, but how do I get the line number where the string is found? Please answer if it is possible.
or
set fd [open test.txt r]
while {![eof $fd]} {
set buffer [read $fd]
}
set lines [split $buffer "\n"]
if {[regexp "S1 Application Protocol" $lines]} {
puts "string found"
} else {puts "not found"}
#puts $lines
#set i 0
#while {[regexp -start 0 "S1 Application Protocol" $lines]==0} {incr i
#puts $i
#}
#puts [llength $lines]
#puts [lsearch -exact $buffer S1]
#puts [lrange $lines 261 320]
With the program above I get "string found" as output. If I search for a string that is not in the file, I get "not found".
The concept of 'a line' is just a convention that we layer on top of the stream of data that we get from a file. So if you want to work with line numbers, you have to calculate them yourself. The gets command documentation contains the following example:
set chan [open "some.file.txt"]
set lineNumber 0
while {[gets $chan line] >= 0} {
puts "[incr lineNumber]: $line"
}
close $chan
So you just need to replace the puts statement with your own code that looks for the pattern of text you want. When it matches, the value of $lineNumber gives you the line number.
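A minimal sketch of that idea, assuming a plain substring search is enough; findLineNumbers is a hypothetical helper name:

```tcl
# Hypothetical helper: return the 1-based numbers of the lines in filename
# that contain the literal text needle (an empty list if there are none).
proc findLineNumbers {filename needle} {
    set chan [open $filename]
    set lineNumber 0
    set hits {}
    while {[gets $chan line] >= 0} {
        incr lineNumber
        if {[string first $needle $line] >= 0} {
            lappend hits $lineNumber
        }
    }
    close $chan
    return $hits
}
```

For the file in the question this would be called as: puts [findLineNumbers test.txt "S1 Application Protocol"].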
To copy the text that lies between two other lines, I'd use something like the following:
set chan [open "some.file.txt"]
set out [open "output.file.txt" "w"]
set lineNumber 0
# Read until we find the start pattern
while {[gets $chan line] >= 0} {
    incr lineNumber
    if {[string match "startpattern" $line]} {
        # Now read until we find the stop pattern
        while {[gets $chan line] >= 0} {
            incr lineNumber
            if {[string match "stoppattern" $line]} {
                break
            } else {
                puts $out $line
            }
        }
    }
}
# Close the output only once, after all reading is done
close $out
close $chan
The easiest way is to use the fileutil::grep command (from Tcllib):
package require fileutil
# Search for ipsum from test.txt
foreach match [fileutil::grep "ipsum" test.txt] {
    # Each match is file:line:text
    set match [split $match ":"]
    set lineNumber [lindex $match 1]
    set lineText [lindex $match 2]
    # do something with lineNumber and lineText
    puts "$lineNumber - $lineText"
}
Update
I realized that if the line contains a colon, then lineText is truncated at the third colon. So, instead of:
set lineText [lindex $match 2]
we need:
set lineText [join [lrange $match 2 end] ":"]

I want to append two simultaneous line with tcl, how to do that?

Suppose my file contains:
we must greap the ep
the whole ep
endpoint: /usr/home/bin/tcl_
giga/hope (v)
beginpoint" /usr/home/bin/lp50 (^)
I only want to print the endpoint path, i.e. /usr/home/bin/tcl_giga/hope, on one line.
Can anyone help me with this? I have written my code like this:
set fp [open "text" "r+"]
while {![eof $fp]} {
gets $fp line
puts $line
if {[regexp {endpoint:} $line]} {
set new_line $line
puts $new_line
}
}
But that is only printing the 1st endpoint line.
This is one way to do it. It allows you to change the number of lines to print after an "endpoint:" line.
set printNMore 0
while {[gets $fp line] >= 0} {
    if {[string first "endpoint:" $line] >= 0} {
        set printNMore 2
    }
    if {$printNMore > 0} {
        puts $line
        incr printNMore -1
    }
}
The simplest way to do that line-at-a-time is something like this:
set fp [open "text" "r+"]
while {[gets $fp line] >= 0} {
    if {[string match "endpoint: *" $line]} {
        puts -nonewline [string range $line 10 end]
        # Plus the whole of the next line...
        puts [gets $fp]
    }
}
However, if I was doing this myself I'd slurp the whole file in at once and use a regular expression:
set fp [open "text" "r+"]
set contents [read $fp]
close $fp
if {[regexp {endpoint: ?([^\n]+)\n([^\n]*)} $contents -> bit1 bit2]} {
    # Print the concatenation of the capture groups
    puts "$bit1$bit2"
}
I could write a better pattern if I knew what the file format was more precisely…