TCL - find a regular pattern in a file and return the occurrence and number of occurrences


I am writing code to grep a regular expression pattern in a file and output the matched text and the number of times it occurs.
Here is the code; I am trying to find the pattern "grep" in my file hello.txt:
set file1 [open "hello.txt" r]
set file2 [read $file1]
regexp {grep} $file2 matched
puts $matched
while {[eof $file2] != 1} {
set number 0
if {[regexp {grep} $file2 matched] >= 0} {
incr number
}
puts $number
}
Output that I got:
grep
--------
can not find channel named "qwerty
iiiiiii
wxseddtt
lsakdfhaiowehf'
jbsdcfiweg
kajsbndimm s
grep
afnQWFH
ACV;SKDJNCV;
qw qde
kI UQWG
grep
grep"
while executing
"eof $file2"

It's usually a mistake to check for eof in a while loop -- check the return code from gets instead:
set filename "hello.txt"
set pattern {grep}
set count 0
set fid [open $filename r]
while {[gets $fid line] != -1} {
incr count [regexp -all -- $pattern $line]
}
close $fid
puts "$count occurrences of $pattern in $filename"
Another thought: if you're just counting pattern matches, assuming your file is not too large:
set fid [open $filename r]
set count [regexp -all -- $pattern [read $fid [file size $filename]]]
close $fid

The error message is caused by the command eof $file2: $file2 is not a file handle (that is, a channel) but contains the content of hello.txt itself, which you read with set file2 [read $file1].
If you want to do it that way, I would suggest renaming $file2 to something like $filecontent and looping over every line it contains:
foreach line [split $filecontent "\n"] {
# ... do something with $line ...
}
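Putting that suggestion together, a minimal sketch of the whole task might look like the following. It writes a small sample file first so that it can run on its own; the name hello_sample.txt and its contents are made up for the demonstration. regexp -all is used so that several matches on one line are all counted, and regexp -inline returns the matched text itself.

```tcl
# Hypothetical sample file so the sketch is self-contained
set filename hello_sample.txt
set out [open $filename w]
puts $out "qwerty\ngrep\nabc grep grep\nxyz"
close $out

set pattern {grep}

# Read the whole file; this variable holds content, not a channel
set file1 [open $filename r]
set filecontent [read $file1]
close $file1

set total 0
foreach line [split $filecontent "\n"] {
    # regexp -all returns the number of matches in this line
    incr total [regexp -all -- $pattern $line]
}

# regexp -inline returns a list whose first element is the matched text
set matched [lindex [regexp -inline -- $pattern $filecontent] 0]
puts "$matched: $total occurrences"   ;# prints "grep: 3 occurrences"
file delete $filename
```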

Glenn is spot on. Here is another solution: Tcllib comes with the fileutil package, which has a grep command:
package require fileutil
set pattern {grep}
set filename hello.txt
puts "[llength [fileutil::grep $pattern $filename]] occurrences found"
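Note that fileutil::grep returns one list element per matching line, so this counts matching lines rather than individual occurrences. If you care about performance, go with Glenn's solution.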

Related

TCL: Read lines from file that contain only relevant words

I'm reading a file and doing some manipulation on the data.
Unfortunately, I get the following error message:
unable to alloc 347392 bytes
Abort
Since the file is huge, I want to read only the lines that contain certain words (described by the pattern in regexp_or).
Is there any way to read only the lines that match regexp_or and avoid the foreach loop?
set regexp_or "^Err|warning|Fatal error"
set file [open [lindex $argv 1] r]
set data [ read $file ]
foreach line [ split $data "\n" ] {
if {[regexp [subst $regexp_or] $line]} {
puts $line
}
}

You could pull your input through grep:
set file [open |[list grep -E $regexp_or [lindex $argv 1]] r]
But that depends on grep being available. To do it completely in Tcl, you can process the file in chunks:
set file [open [lindex $argv 1] r]
while {![eof $file]} {
# Read a million characters
set data [read $file 1000000]
# Make sure to only work with complete lines
append data [gets $file]
foreach line [lsearch -inline -all -regexp [split $data \n] $regexp_or] {
puts $line
}
}
close $file
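If memory is the main concern, a simpler alternative is to read one line at a time with gets, which keeps only the current line in memory and needs no chunk bookkeeping. A minimal sketch follows; the file name log_sample.txt and its contents are hypothetical (in the question's script the name would come from [lindex $argv 1]). The [subst ...] in the question is not needed: the pattern can be passed to regexp directly.

```tcl
set regexp_or {^Err|warning|Fatal error}

# Hypothetical input file so the sketch can run on its own
set filename log_sample.txt
set out [open $filename w]
puts $out "Err: disk full\ninfo: all good\nsome warning here\nFatal error: abort\nok"
close $out

set matches {}
set in [open $filename r]
# gets returns -1 at end of file, so no separate eof test is needed
while {[gets $in line] >= 0} {
    if {[regexp -- $regexp_or $line]} {
        puts $line
        lappend matches $line
    }
}
close $in
file delete $filename
```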

Read lines from file exactly as they appear

I am reading from a file and need to find the exact line $(eval $(call CreateTest, KEYWORD and everything that follows it on that line (the rest is random). This is how I am currently trying to find it, but it always reports that nothing matches.
proc listFromFile {$path1} {
set find {$(eval $(call CreateTest, KEYWORD}
upvar path1 path1
set f [open $path1 r]
set data [split [string trim [read $f]] \n]
close $f
# return [lsearch -all -inline $data *KEYWORD*]
return [lsearch -exact -all -inline $data $find*]
}
The commented-out line is the closest I can get to working, but it pulls every line with KEYWORD anywhere in it. KEYWORD can also appear in lines I do not want to read, so I need to pull exactly the line stated above.
EDIT
I should have mentioned that the file is formatted like so:
$(eval $(call CreateTest, KEYWORD ...
$(eval $(call CreateTest, NOT_KEYWORD ...
$(eval $(call CreateTest, KEYWORD ...
$(eval $(call CreateTest, KEYWORD ...
$(eval $(call CreateTest, NOT_KEYWORD ...
$(eval $(call CreateTest, KEYWORD ...
which means I only want to pull the lines containing the exact string followed by the keyword. There are lines in between the ones I am looking for that I do not want to display.

I think you should just apply your match to each line as you read them.
proc getMatchingLines {filename match} {
set result {}
set f [open $filename r]
while {[gets $f line] != -1} {
if {[string match ${match}* $line]} {
lappend result $line
}
}
close $f
return $result
}
set find {$(eval $(call CreateTest, KEYWORD}
set matching [getMatchingLines $filename $find]
foreach line $matching {
# do something with the matching line
}
You could build up a list of results or act immediately on each matching line, as suits your application. The main difference from regexp is that string match has very few metacharacters: only *, ?, [...] and \ are special, so it is simple to match a line consisting of your string followed by anything, i.e. ${find}*.
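As a quick check of that prefix test, here it is applied to sample lines from the question, held in a list instead of read from a file ("..." stands for the random rest of each line). The search string contains none of the characters string match treats specially, so ${find}* behaves as a plain "starts with" test.

```tcl
set find {$(eval $(call CreateTest, KEYWORD}
# Sample lines from the question
set lines {
    {$(eval $(call CreateTest, KEYWORD ...}
    {$(eval $(call CreateTest, NOT_KEYWORD ...}
    {$(eval $(call CreateTest, KEYWORD ...}
}
set result {}
foreach line $lines {
    # $ and ( are ordinary characters for string match
    if {[string match ${find}* $line]} {
        lappend result $line
    }
}
puts [llength $result]   ;# prints 2
```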

Use string first and string range instead:
# foo.tcl
set f [open "data.txt" r]
set body [read $f]
puts -nonewline [string range $body [string first "ccc" $body] end]
close $f
Test:
$ cat data.txt
aaa
bbb
ccc
ddd
eee
$ tclsh foo.tcl
ccc
ddd
eee

I think in your code you meant * as a glob wildcard:
return [lsearch -exact -all -inline $data $find*]
When the -exact flag is used, lsearch compares each element for exact equality and treats that * as a literal *, so nothing matches. Dropping -exact (lsearch then defaults to glob matching) lets ${find}* match every line that starts with the search string. Naming the parameter plainly also makes the upvar unnecessary:
proc listFromFile {path1} {
set find {$(eval $(call CreateTest, KEYWORD }
set f [open $path1 r]
set data [split [string trim [read $f]] \n]
close $f
return [lsearch -all -inline $data ${find}*]
}

This should work:
proc listFromFile path {
set f [open $path r]
set data [split [string trim [read $f]] \n]
close $f
return [lsearch -all -inline $data {* KEYWORD*}]
}
In my answer to your earlier question, I suggested lsearch (without -exact) and KEYWORD* as a pattern because that seemed to be what you were after. Considering the lines you show here, searching for a space character followed by the string KEYWORD (the glob pattern * KEYWORD*) seems more likely to work, and it leaves out the NOT_KEYWORD lines.
Another thing: your problem with the parameter (which you tried to solve with upvar) was that you had a dollar sign attached to the parameter name. If you leave out the dollar sign you get a usable parameter name like in the code above (it is possible to use it even with the dollar sign, but it's a lot harder).
Documentation: close, lsearch, open, proc, read, return, set, split, string
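To illustrate the parameter problem: with proc listFromFile {$path1}, the parameter is literally named $path1, so the plain reference $path1 looks for a variable called path1, which does not exist. Such a variable can still be read with the braced form ${$path1}, which is why it is possible but awkward. The procedure names below are made up for the demonstration.

```tcl
# Hypothetical procedures showing how a "$" in a parameter name behaves

# The parameter is literally named "$path1"
proc withDollar {$path1} {
    # ${...} accepts any characters except "}" in a variable name
    return ${$path1}
}

# The usual form: the parameter is simply named path1
proc withoutDollar {path1} {
    return $path1
}

# Plain $path1 looks for a variable called "path1", which does not exist here
proc broken {$path1} {
    return $path1
}

puts [withDollar data.txt]      ;# prints data.txt
puts [withoutDollar data.txt]   ;# prints data.txt
set failed [catch {broken data.txt} msg]
puts "$failed: $msg"
```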

TCL: Check file existence by SHELL environment variable (another one)

I have a file containing lines with paths to files. Sometimes a path contains a shell environment variable, and I want to check whether the file exists.
The following is my solution:
set fh [open "the_file_contain_path" "r"]
while {![eof $fh]} {
set line [gets $fh]
if {[regexp -- {\$\S+} $line]} {
catch {exec /usr/local/bin/tcsh -c "echo $line" } line
if {![file exists $line]} {
puts "ERROR: the file $line is not exists"
}
}
}
I'm sure there is a more elegant solution that doesn't use
/usr/local/bin/tcsh -c

You can capture the variable name in the regexp command and look it up in Tcl's global env array. Also, using eof as the while condition means your loop will iterate one time too many (see http://phaseit.net/claird/comp.lang.tcl/fmm.html#eof).
set fh [open "the_file_contain_path" "r"]
while {[gets $fh line] != -1} {
# this can handle "$FOO/bar/$BAZ"
if {[string first {$} $line] != -1} {
regsub -all {(\$)(\w+)} $line {\1::env(\2)} new
set line [subst -nocommands -nobackslashes $new]
}
if {![file exists $line]} {
puts "ERROR: the file $line does not exist"
}
}

First off, it's usually easier (for small files, say of no more than 1–2MB) to read in the whole file and split it into lines instead of using gets and eof in a while loop. (The split command is very fast.)
Secondly, to do the replacement you need to know where in the string the reference is, so use regexp -indices. That does mean a slightly more complex approach to the replacement itself, with string range and string replace doing some of the work. Assuming you're using Tcl 8.5…
set fh [open "the_file_contain_path" "r"]
foreach line [split [read -nonewline $fh] "\n"] {
# Find a replacement while there are any to do
while {[regexp -indices {\$(\w+)} $line matchRange nameRange]} {
# Get what to replace with (without any errors, just like tcsh)
set replacement {}
catch {set replacement $::env([string range $line {*}$nameRange])}
# Do the replacement
set line [string replace $line {*}$matchRange $replacement]
}
# Your test on the result
if {![file exists $line]} {
puts "ERROR: the file $line does not exist"
}
}
close $fh

Tcl programs can read environment variables through the built-in global array env. Read the line, look for $ followed by a name, look up $::env($name), and substitute its value for the reference.
Using the shell for this is very bad if the file is supplied by untrusted users. What if they put ; rm * in the file? And if you're going to use a shell, you should at least use sh or bash, not tcsh.
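A minimal sketch of that approach, with no shell involved. The helper name expandEnv is hypothetical, and it assumes references are written as $NAME (the ${NAME} form is not handled). It scans left to right with regexp -start, so text that comes from a variable's value is never expanded again, and a reference to an unset variable is left as it was, so the file test simply fails for that line.

```tcl
# Hypothetical helper: expand $NAME references using the ::env array
proc expandEnv {line} {
    set result ""
    set start 0
    # With -indices, whole and name receive {first last} index pairs
    while {[regexp -start $start -indices {\$(\w+)} $line whole name]} {
        lassign $whole from to
        set varName [string range $line {*}$name]
        # Copy the text between the previous reference and this one
        append result [string range $line $start [expr {$from - 1}]]
        if {[info exists ::env($varName)]} {
            append result $::env($varName)
        } else {
            # Leave references to unset variables unchanged
            append result [string range $line $from $to]
        }
        set start [expr {$to + 1}]
    }
    # Copy whatever follows the last reference
    append result [string range $line $start end]
    return $result
}

# Hypothetical variable for the demonstration
set ::env(DEMO_ROOT) /tmp/demo
puts [expandEnv {$DEMO_ROOT/bin/$NO_SUCH_VAR_12345/tool}]
```

In the question's loop, the test would then become if {![file exists [expandEnv $line]]} {...}.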

Insert lines of code in a file after n numbers of lines using tcl

I am trying to write a Tcl script in which I need to insert some lines of code after finding a regular expression.
For instance, I need to insert more #define lines after the last occurrence of #define in the current file.
Thanks!

When making edits to a text file, you read it in and operate on it in memory. Since you're dealing with lines of code in that text file, we want to represent the file's contents as a list of strings (each of which is the contents of a line). That then lets us use lsearch (with the -regexp option) to find the insertion location (which we'll do on the reversed list so we find the last instead of the first location) and we can do the insertion with linsert.
Overall, we get code a bit like this:
# Read lines of file (name in “filename” variable) into variable “lines”
set f [open $filename "r"]
set lines [split [read -nonewline $f] "\n"]
close $f
# Find the insertion index in the reversed list
set idx [lsearch -regexp [lreverse $lines] "^#define "]
if {$idx < 0} {
error "did not find insertion point in $filename"
}
# Insert the lines (I'm assuming they're listed in the variable “linesToInsert”)
set lines [linsert $lines end-$idx {*}$linesToInsert]
# Write the lines back to the file
set f [open $filename "w"]
puts $f [join $lines "\n"]
close $f
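Why end-$idx lands right after the last #define: on the reversed list, lsearch returns how many lines follow the last #define, and linsert treats end as the position after the final element. A quick in-memory check with made-up lines:

```tcl
set lines [list "#include <stdio.h>" "#define A 1" "#define B 2" "" "int x;"]
set linesToInsert [list "#define C 3" "#define D 4"]

# Two lines ("" and "int x;") follow the last #define, so idx is 2
set idx [lsearch -regexp [lreverse $lines] "^#define "]

# end-2 is index 3 here: just after "#define B 2"
set lines [linsert $lines end-$idx {*}$linesToInsert]
puts [join $lines "\n"]
```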
Prior to Tcl 8.5, the style changes a little:
# Read lines of file (name in “filename” variable) into variable “lines”
set f [open $filename "r"]
set lines [split [read -nonewline $f] "\n"]
close $f
# Find the indices of all the #define lines
set indices [lsearch -all -regexp $lines "^#define "]
if {![llength $indices]} {
error "did not find insertion point in $filename"
}
set idx [expr {[lindex $indices end] + 1}]
# Insert the lines (I'm assuming they're listed in the variable “linesToInsert”)
set lines [eval [linsert $linesToInsert 0 linsert $lines $idx]]
### ALTERNATIVE
# set lines [eval [list linsert $lines $idx] $linesToInsert]
# Write the lines back to the file
set f [open $filename "w"]
puts $f [join $lines "\n"]
close $f
The searching for all the indices (and adding one to the last one) is reasonable enough, but the contortions for the insertion are pretty ugly. (Pre-8.4? Upgrade.)

Not exactly the answer to your question, but this is the type of task that lends itself to shell scripting (even if my solution is a bit ugly):
tac inputfile | sed -n '/#define/,$p' | tac
echo "$yourlines"
tac inputfile | sed '/#define/Q' | tac
should work! (The Q command is a GNU sed extension.)

set filename content.txt
set fh [open $filename r]
set lines [read -nonewline $fh]
close $fh
set line_con [split $lines "\n"]
set line_num {}
set i 0
foreach line $line_con {
if {[regexp {^#define} $line]} {
lappend line_num $i
}
# Count every line, not just the matching ones
incr i
}
if {[llength $line_num] > 0} {
# Insert $line_insert (assumed to hold the new text) after the last #define
set line_con [linsert $line_con [expr {[lindex $line_num end] + 1}] $line_insert]
} else {
puts "no insert point"
}
set filename content_new.txt
set fh [open $filename w]
puts $fh [join $line_con "\n"]
close $fh

Parsing a file with Tcl

I have a file which has multiple set statements, and I want to extract the lines of interest. Can the following code help?
set in [open filename r]
seek $in 0 start
while{ [gets $in line ] != -1} {
regexp (line to be extracted)
}

Other solution:
Instead of using gets, I prefer to use read to get the whole contents of the file and then process it line by line. Having the file as a list of lines puts you in complete control of how it is processed.
set fileName [lindex $argv 0]
if {[catch {open $fileName r} fptr]} {
puts stderr "cannot open $fileName: $fptr"
exit 1
}
set contents [read -nonewline $fptr] ;#Read the file contents
close $fptr ;#Close the file since it has been read now
set splitCont [split $contents "\n"] ;#Split the files contents on new line
foreach ele $splitCont {
if {[regexp {^set +(\S+) +(.*)} $ele -> name value]} {
puts "The name \"$name\" maps to the value \"$value\""
}
}
How to run this code:
Say the code above is saved in test.tcl. Then run
tclsh test.tcl FileName
FileName can be a full path; a relative path is resolved against the current working directory.

First, you don't need to seek to the beginning straight after opening a file for reading; that's where it starts.
Second, the pattern for reading a file is this:
set f [open $filename]
while {[gets $f line] > -1} {
# Process lines
if {[regexp {^set +(\S+) +(.*)} $line -> name value]} {
puts "The name \"$name\" maps to the value \"$value\""
}
}
close $f
OK, that's a very simple RE in the middle there (and for more complicated files you'll need several) but that's the general pattern. Note that, as usual for Tcl, the space after the while command word is important, as is the space between the while expression and the while body. For specific help with what RE to use for particular types of input data, ask further questions here on Stack Overflow.

Yet another solution:
As the input looks like a Tcl script, you could create a safe interpreter with interp in which only the set command (and any others you need) is exposed, hide all other commands, and replace unknown with a procedure that silently skips anything unrecognised. Then source the input in this interpreter.
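A minimal sketch of that idea, assuming the input only needs set; the sample script is made up and stands in for the file's contents. A safe interpreter already lacks exec, open and similar commands, and hiding everything except set and an empty unknown procedure turns any other command into a no-op.

```tcl
# Hypothetical input standing in for the file's contents
set script {
set foo bar
set width 42
puts "this line is ignored"
exec whoami
}

# Safe interpreters start without exec, open, file, etc.
set child [interp create -safe]

# Any command that cannot be found becomes a silent no-op
$child eval {proc unknown args {}}

# Remember which variables exist before the input runs
set before [$child eval {info vars}]

# Hide every exposed command except set and the unknown handler
foreach cmd [$child eval {info commands}] {
    if {$cmd ni {set unknown}} {
        $child hide $cmd
    }
}

# Run the input in the restricted interpreter
$child eval $script

# Collect the variables the input defined (info is now hidden, so use invokehidden)
set settings [dict create]
foreach var [$child invokehidden info vars] {
    if {$var ni $before} {
        dict set settings $var [$child eval [list set $var]]
    }
}
interp delete $child
puts $settings
```

For a real file, $child invokehidden source $filename should run it in the same restricted interpreter, since invokehidden can call hidden commands such as source.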

Here is yet another solution: use the file-scanning feature of TclX (see the TclX documentation for more information). I like this solution because you can have several scanmatch blocks.
package require Tclx
# Open a file, skip error checking for simplicity
set inputFile [open sample.tcl r]
# Scan the file
set scanHandle [scancontext create]
scanmatch $scanHandle {^\s*set} {
lassign $matchInfo(line) setCmd varName varValue; # parse the line
puts "$varName = $varValue"
}
scanfile $scanHandle $inputFile
close $inputFile
Yet another solution: use the grep command from the fileutil package:
package require fileutil
puts [lindex $argv 0]
set matchedLines [fileutil::grep {^\s*set} [lindex $argv 0]]
foreach line $matchedLines {
# Each element has the format filename:lineNumber:line, for example
# sample.tcl:3:set foo bar (so list element 0 is "sample.tcl:3:set")
set varName [lindex $line 1]
set varValue [lindex $line 2]
puts "$varName = $varValue"
}

I've read your comments so far, and if I understand you correctly your input data file has 6 (or 9, depending which comment) data fields per line, separated by spaces. You want to use a regexp to parse them into 6 (or 9) arrays or lists, one per data field.
If so, I'd try something like this (using lists):
set f [open $filename]
while {[gets $f line] > -1} {
# Process lines
if {[regexp {(\S+) (\S+) (\S+) (\S+) (\S+) (\S+)} $line -> name source drain gate bulk inst]} {
lappend nameL $name
lappend sourceL $source
lappend drainL $drain
lappend gateL $gate
lappend bulkL $bulk
lappend instL $inst
}
}
close $f
Now you should have a set of 6 lists, one per field, with one entry in each list for every matching line of your input file. To access the i-th name, for example, use [lindex $nameL $i].
If (as I suspect) your main goal is to get the parameters of the device whose name is "foo", you'd use a structure like this:
set name "foo"
set i [lsearch $nameL $name]
if {$i != -1} {
set source [lindex $sourceL $i]
} else {
puts "item $name not found."
set source {}
# or set to 0, or whatever "not found" marker you like
}

set File [open $fileName r]
while {[gets $File line] >= 0} {
if {[regexp {(set) ([a-zA-Z0-9_]+) (.*)} $line str1 str2 str3 str4]} {
# str1 contains the whole match
# str2 contains "set"
# str3 contains the variable to be set
# str4 contains the value to be set
puts "$str3 = $str4"
}
}
close $File
