TCL: Read lines from file that contain only relevant words - tcl

I'm reading file and make some manipulation on the data.
Unfortunately I get the below error message:
unable to alloc 347392 bytes
Abort
Since the file is huge, I want to read only the lines that contain some word (describe in "regexp_or ")
Is there any way to read only the lines that contain "regexp_or" and save the foreach loop?
set regexp_or "^Err|warning|Fatal error"
set file [open [lindex $argv 1] r]
set data [ read $file ]
foreach line [ split $data "\n" ] {
if {[regexp [subst $regexp_or] $line]} {
puts $line
}
}

You could pull your input through grep:
set file [open |[list grep -E $regexp_or [lindex $argv 1]] r]
But that depends on grep being available. To do it completely in Tcl, you can process the file in chunks:
set file [open [lindex $argv 1] r]
while {![eof $file]} {
# Read a million characters
set data [read $file 1000000]
# Make sure to only work with complete lines
append data [gets $file]
foreach line [lsearch -inline -all -regexp [split $data \n] $regexp_or] {
puts $line
}
}
close $file

Related

How to parse a text file in tcl using separators?

I have a text file of the format
35|46
36|49
37|51
38|22
40|1
39|36
41|4
I have to read the file into an array across the separator "|" where left side will be the key of the array and right side will be the value.
I have used the following code
foreach {line} [split [read $lFile] \n] {
#puts $line
foreach {lStr} [split $line |] {
if { $lStr!="" } {
set lPartNumber [lindex $lStr 0]
set lNodeNumber [lindex $lStr 1]
set ::capPartsInterConnected::lMapPartNumberToNodeNumber($lPartNumber) $lNodeNumber
}
}
}
close $lFile
I am not able to read the left side of the separator "|". How to do it?
And similarly for this :
35|C:\AI\DESIGNS\SAMPLEDSN50\BENCH_WORKLIB.OLB|R
36|C:\AI\DESIGNS\SAMPLEDSN50\BENCH_WORKLIB.OLB|R
I need to assign all three strings in different variables
You are making mistake in the foreach where the result of split will be assigned to a loop variable lStr where it will contain only one value at a time causing the failure.
With lassign, this can be performed easily.
set fp [open input.txt r]
set data [split [read $fp] \n]
close $fp
foreach line $data {
if {$line eq {}} {
continue
}
lassign [split $line | ] key value
set result($key) $value
}
parray result
lassign [split "35|C:\\AI\\DESIGNS\\SAMPLEDSN50\\BENCH_WORKLIB.OLB|R" |] num userDir name
puts "num : $num"
puts "userDir : $userDir"
puts "name : $name"

How to read a string in tcl using split with the last character?

I am trying to read the following text printing each string before ;
0:1:2:3;
1:2:0;
10:13:15;
I wrote the following code
foreach {line} [split [read $lFile] \n] {
lassign [split $line ;] a
puts $a
}
But the output is the same string. I want the string before ;
In Tcl, semicolon marks the end of a command line, as such, you are actually doing split $line and not split $line ;. You will have to quote the ; for it to work:
foreach {line} [split [read $lFile] \n] {
lassign [split $line ";"] a
puts $a
}
Or using braces:
foreach {line} [split [read $lFile] \n] {
lassign [split $line {;}] a
puts $a
}
You could also use
set a [regsub {;.*} $a ""]
or, assuming no text after the semicolon
set a [string trimright $a ";"]
Output is the same string because you have a mistake in the foreach (as it was explained here). Though, you don't have to use foreach. You can read file line by line using while loop.
set file [open lFile.txt r];
while {![eof $file]} {
gets $file line;
lassign [split $line ";"] splittedFile;
puts stdout $splittedFile;
}
Or in other words, as long as file has not reached its end (![eof $file]), split file and print it to standard output.

How to get the data between two strings from a file in tcl?

In TCL Scripting:
I have a file in that i know how to search a string but how to get the line number when string is found.please answer me if it is possible
or
set fd [open test.txt r]
while {![eof $fd]} {
set buffer [read $fd]
}
set lines [split $buffer "\n"]
if {[regexp "S1 Application Protocol" $lines]} {
puts "string found"
} else {puts "not found"}
#puts $lines
#set i 0
#while {[regexp -start 0 "S1 Application Protocol" $line``s]==0} {incr i
#puts $i
#}
#puts [llength $lines]
#puts [lsearch -exact $buffer S1]
#puts [lrange $lines 261 320]
in the above program i am getting the output as string found .if i will give the string other than in this file i am getting string not found.
The concept of 'a line' is just a convention that we layer on top of the stream of data that we get from a file. So if you want to work with line numbers then you have to calculate them yourself. The gets command documnetion contains the following example:
set chan [open "some.file.txt"]
set lineNumber 0
while {[gets $chan line] >= 0} {
puts "[incr lineNumber]: $line"
}
close $chan
So you just need to replace the puts statement with your code to find the pattern of text you want to find and when you find it the value of $line gives you the line number.
To copy text that lies between two other lines I'd use something like the following
set chan [open "some.file.txt"]
set out [open "output.file.txt" "w"]
set lineNumber 0
# Read until we find the start pattern
while {[gets $chan line] >= 0} {
incr lineNumber
if { [string match "startpattern" $line]} {
# Now read until we find the stop pattern
while {[gets $chan line] >= 0} {
incr lineNumber
if { [string match "stoppattern" $line] } {
close $out
break
} else {
puts $out $line
}
}
}
}
close $chan
The easiest way is to use the fileutil::grep command:
package require fileutil
# Search for ipsum from test.txt
foreach match [fileutil::grep "ipsum" test.txt] {
# Each match is file:line:text
set match [split $match ":"]
set lineNumber [lindex $match 1]
set lineText [lindex $match 2]
# do something with lineNumber and lineText
puts "$lineNumber - $lineText"
}
Update
I realized that if the line contains colon, then lineText is truncated at the third colon. So, instead of:
set lineText [lindex $match 2]
we need:
set lineText [join [lrange $match 2 end] ":"]

Insert lines of code in a file after n numbers of lines using tcl

I am trying to write a tcl script in which I need to insert some lines of code after finding a regular expression .
For instance , I need to insert more #define lines of codes after finding the last occurrence of #define in the present file.
Thanks !
When making edits to a text file, you read it in and operate on it in memory. Since you're dealing with lines of code in that text file, we want to represent the file's contents as a list of strings (each of which is the contents of a line). That then lets us use lsearch (with the -regexp option) to find the insertion location (which we'll do on the reversed list so we find the last instead of the first location) and we can do the insertion with linsert.
Overall, we get code a bit like this:
# Read lines of file (name in “filename” variable) into variable “lines”
set f [open $filename "r"]
set lines [split [read $f] "\n"]
close $f
# Find the insertion index in the reversed list
set idx [lsearch -regexp [lreverse $lines] "^#define "]
if {$idx < 0} {
error "did not find insertion point in $filename"
}
# Insert the lines (I'm assuming they're listed in the variable “linesToInsert”)
set lines [linsert $lines end-$idx {*}$linesToInsert]
# Write the lines back to the file
set f [open $filename "w"]
puts $f [join $lines "\n"]
close $f
Prior to Tcl 8.5, the style changes a little:
# Read lines of file (name in “filename” variable) into variable “lines”
set f [open $filename "r"]
set lines [split [read $f] "\n"]
close $f
# Find the insertion index in the reversed list
set indices [lsearch -all -regexp $lines "^#define "]
if {![llength $indices]} {
error "did not find insertion point in $filename"
}
set idx [expr {[lindex $indices end] + 1}]
# Insert the lines (I'm assuming they're listed in the variable “linesToInsert”)
set lines [eval [linsert $linesToInsert 0 linsert $lines $idx]]
### ALTERNATIVE
# set lines [eval [list linsert $lines $idx] $linesToInsert]
# Write the lines back to the file
set f [open $filename "w"]
puts $f [join $lines "\n"]
close $f
The searching for all the indices (and adding one to the last one) is reasonable enough, but the contortions for the insertion are pretty ugly. (Pre-8.4? Upgrade.)
Not exactly the answer to your question, but this is the type of task that lends towards shell scripting (even if my solution is a bit ugly).
tac inputfile | sed -n '/#define/,$p' | tac
echo "$yourlines"
tac inputfile | sed '/#define/Q' | tac
should work!
set filename content.txt
set fh [open $filename r]
set lines [read $fh]
close $fh
set line_con [split $lines "\n"]
set line_num {}
set i 0
foreach line $line_con {
if [regexp {^#define} $line] {
lappend line_num $i
incr i
}
}
if {[llength $line_num ] > 0 } {
linsert $line_con [lindex $line_num end] $line_insert
} else {
puts "no insert point"
}
set filename content_new.txt
set fh [open $filename w]
puts $fh file_con
close $fh

TCL - find a regular pattern in a file and return the occurrence and number of occurrences

I am writing a code to grep a regular expression pattern from a file, and output that regular expression and the number of times it has occured.
Here is the code: I am trying to find the pattern "grep" in my file hello.txt:
set file1 [open "hello.txt" r]
set file2 [read $file1]
regexp {grep} $file2 matched
puts $matched
while {[eof $file2] != 1} {
set number 0
if {[regexp {grep} $file2 matched] >= 0} {
incr number
}
puts $number
}
Output that I got:
grep
--------
can not find channel named "qwerty
iiiiiii
wxseddtt
lsakdfhaiowehf'
jbsdcfiweg
kajsbndimm s
grep
afnQWFH
ACV;SKDJNCV;
qw qde
kI UQWG
grep
grep"
while executing
"eof $file2"
It's usually a mistake to check for eof in a while loop -- check the return code from gets instead:
set filename "hello.txt"
set pattern {grep}
set count 0
set fid [open $filename r]
while {[gets $fid line] != -1} {
incr count [regexp -all -- $pattern $line]
}
close $fid
puts "$count occurrances of $pattern in $filename"
Another thought: if you're just counting pattern matches, assuming your file is not too large:
set fid [open $filename r]
set count [regexp -all -- $pattern [read $fid [file size $filename]]]
close $fid
The error message is caused by the command eof $file2. The reason is that $file2 is not a file handle (resp. channel) but contains the content of the file hello.txt itself. You read this file content with set file2 [read $file1].
If you want to do it like that I would suggest to rename $file2 into something like $filecontent and loop over every contained line:
foreach line [split $filecontent "\n"] {
... do something ...
}
Glenn is spot on. Here is another solution: Tcl comes with the fileutil package, which has the grep command:
package require fileutil
set pattern {grep}
set filename hello.txt
puts "[llength [fileutil::grep $pattern $filename]] occurrences found"
If you care about performance, go with Glenn's solution.