How to ensure my regular expression does not match too much

A file has a few words with numbers at the beginning of them. I want to extract the line with a particular number. When given 1, it extracts line 1 but also lines such as 11 and 21.
FILE.txt has contents:
1.sample
lines of
2.sentences
present in
...
...
10.the
11.file
When executed as pro 1 file.txt, it returns lines 1, 10, and 11, because all three contain a 1 in their string, i.e.
Output of the script:
1.sample
10.the
11.file
The output I am expecting is only the contents of line 1, not those of line 10 or line 11, i.e.
Expected output:
1.sample
My current code:
proc pro { pattern args} {
    set file [open $args r]
    set lnum 0
    set occ 0
    while {[gets $file line] >=0} {
        incr lnum
        if {[regexp $pattern $line]} {
            incr occ
            puts "The pattern is present in line: $lnum"
            puts "$line"
        } else {
            puts "not found"
        }
    }
    puts "total number of occurrences : $occ"
    close $file
}
The program works, but along with the expected line I am retrieving lines that I don't want. Because the number I want to retrieve (1) is also present in other strings such as 11, 21, 14, etc., these lines are also printed.
Kindly tolerate my unclear way of explaining the question.

You can solve the problem using word boundaries as suggested by glen, but you can also consider the following:
If every line number is followed by a ., you can use it as a delimiter in the regular expression:
regexp "^$lineNo\\." $a
I would also suggest using ^ (match at the beginning of the line), so that the number is not counted if it appears elsewhere in the line.
Tcl word boundaries are well explained at http://www.regular-expressions.info/wordboundaries.html
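A minimal sketch of the anchored approach, assuming every numbered line starts with the number followed directly by a dot. The proc name numberedLines and the in-memory list of lines are illustrative only and do not come from the question.

```tcl
# Hypothetical helper: return the lines whose leading number is exactly $lineNo.
proc numberedLines {lineNo lines} {
    set hits {}
    foreach line $lines {
        # ^ anchors at the start of the line; the escaped dot must follow
        # the number directly, so 1 cannot match the lines for 10 or 11.
        if {[regexp "^$lineNo\\." $line]} {
            lappend hits $line
        }
    }
    return $hits
}

set lines {1.sample {lines of} 2.sentences 10.the 11.file}
puts [numberedLines 1 $lines]
```

With the sample lines above, only 1.sample is printed.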

You have to ensure your pattern matches only between word boundaries:
if {[regexp "\\m$pattern\\M" $line]} { ...
See the documentation for regular expression syntax.
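To see what the word-boundary escapes change, the sketch below compares an unanchored match with a bounded one on a few sample lines. The helper name boundedMatch and the last sample line are assumptions added for illustration.

```tcl
# Hypothetical helper wrapping the word-boundary test shown above.
proc boundedMatch {pattern line} {
    # \m matches only at the start of a word, \M only at the end of a word
    return [regexp "\\m$pattern\\M" $line]
}

foreach line {1.sample 10.the 11.file {see item 1 here}} {
    puts "[list $line]: plain=[regexp 1 $line] bounded=[boundedMatch 1 $line]"
}
```

The bounded test rejects 10.the and 11.file, but it still accepts a standalone 1 anywhere in the line, which is why anchoring with ^ may be worth combining with it.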

If what you're looking to do is as constrained as what you're describing, why not just use something like
if { [string range $line 0 [string length $pattern]] eq "${pattern}." } {
...
}

Related

Extract information from a list using Tcl

I have multiple log files which contain values like this with headers :
I want to make a header file which contains each row from column 1 as individual column headers and min - max from each of the row and present it in column format.
Info in log files:
Trace Header Min Max Mean
aaa 1 6 xx
bbb 2 7 xxx
What I want :
aaa bbb
1-6 2-7
Thanks for help

Try this (the long listing is supposed to be in the data variable, read from a file or whatever):
foreach line [split $data \n] {
    if {[scan $line {%s %d %d} header min max] eq 3} {
        set result($header) $min-$max
    }
}
% parray result
result(aaa) = 1-6
result(bbb) = 2-7
The scan command looks for three fields on each line: one text field and two decimal integer fields. A matching line reports three fields found; empty lines or lines with only text report fewer. If it finds a match, the entry is added to the result array.
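The return value of scan can be checked directly. This is a small sketch using the sample lines from the question; the helper name fieldCount is hypothetical.

```tcl
# Hypothetical helper: how many of the three fields does scan convert?
proc fieldCount {line} {
    return [scan $line {%s %d %d} header min max]
}

foreach line {{aaa 1 6 xx} {Trace Header Min Max Mean} {bbb 2 7 xxx}} {
    puts "[fieldCount $line] fields: $line"
}
```

The data lines convert all three fields, while the header line stops after the first text field because Header is not a decimal integer.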
ETA:
To deal with the real-world log file you mentioned in a comment:
foreach line [split $data \n] {
    if {[scan $line {%59[ #()-./0-9:=>A-Za-z]%s %d %d} header stuff min max] eq 4} {
        set result([string trim $header]) $min-$max
    }
}
(Note that duplicate headers are compacted into one in the array.)
If you have whitespace in a field, you can't consume the data with %s. Instead you can find out what kind of data the header might contain by using
% set chars [string map {\n {}} [join [lsort -unique [split $data {}]] {}]]
#()-./0123456789:=>ABCDEFGHILMNOPRSTUVWXY[]abcdefghijklmnopqrstuvwxyz
which is easy to simplify to the field specification
[ #()-./0-9:=>A-Za-z]
If you need to be able to match brackets, put them in like this:
[][ #()-./0-9:=>A-Za-z]
To split at lines containing uppercase text and blanks, then only equal-signs and possibly more blanks up to line end,
package require textutil::split
::textutil::splitx $data {(?n)^[[:upper:] ]+=+\s*$}
Documentation:
eq (operator),
foreach,
if,
join,
lsort,
package,
parray,
regexp,
Syntax of Tcl regular expressions,
scan,
set,
split,
string,
textutil::split (package)

Code snippet:
set foo {
Info in log files:
Trace Header Min Max Mean
aaa 1 6 xx
bbb 2 7 xxx
}
set pattern {^(.*)\s+(\d+)\s+(\d+)\s+.*$}
set result [regexp -line -inline -all -- $pattern $foo]
array set bar {}
puts "Here's one view..."
foreach {all item min max} $result {
    puts "$item $min-$max"
    set bar([string trim $item]) $min-$max
}
puts ""
puts "Here's another one..."
puts [join [lsort [array names bar]] "\t"]
foreach item [lsort [array names bar]] {
    puts -nonewline "$bar($item)\t"
}
Execution output:
Here's one view...
aaa 1-6
bbb 2-7
Here's another one...
aaa bbb
1-6 2-7

How to delete a part of the text file if a pattern is found matching using tcl?

How can I remove a part of the text file if the pattern I am searching is matched?
eg:
pg_pin (VSS) {
direction : inout;
pg_type : primary_ground;
related_bias_pin : "VBN";
voltage_name : "VSS";
}
leakage_power () {
value : 0;
when : "A1&A2&X";
related_pg_pin : VBN;
}
My pattern is related_pg_pin. If this pattern is found, I want to remove that particular section (starting from leakage_power () { up to the closing brace }).

proc getSection f {
    set section ""
    set inSection false
    while {[gets $f line] >= 0} {
        if {$inSection} {
            append section $line\n
            # find the end of the section (a single right brace, #x7d)
            if {[string match \x7d [string trim $line]]} {
                return $section
            }
        } else {
            # find the beginning of the section, with a left brace (#x7b) at the end
            if {[string match *\x7b [string trim $line]]} {
                append section $line\n
                set inSection true
            }
        }
    }
    return
}
set f [open data.txt]
set g [open output.txt w]
set section [getSection $f]
while {$section ne {}} {
    if {![regexp related_pg_pin $section]} {
        puts $g $section
    }
    set section [getSection $f]
}
close $f
close $g
Starting with the last paragraph of the code, we open a file for reading (through the channel $f) and then get a section. (The procedure to get a section is a little bit convoluted, so it goes into a command procedure to be out of the way.) As long as non-empty sections keep coming, we check if the pattern occurs: if not, we print the section to the output file through the channel $g. Then we get the next section and go to the next iteration.
To get a section, first assume we haven't yet seen any part of a section. Then we keep reading lines until the end of the file is found. If a line ending with a left brace is found, we add it to the section and take a note that we are now in a section. From then on, we add every line to the section. If a line consisting of a single right brace is found, we quit the procedure and deliver the section to the caller.
Documentation:
! (operator),
>= (operator),
append,
close,
gets,
if,
ne (operator),
open,
proc,
puts,
regexp,
return,
set,
string,
while,
Syntax of Tcl regular expressions
Syntax of Tcl string matching:
* matches a sequence of zero or more characters
? matches a single character
[chars] matches a single character in the set given by chars (^ does not negate; a range can be given as a-z)
\x matches the character x, even if that character is special (one of *?[]\)
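The rules listed above can be tried out with a few calls. This is only a sketch; the sample strings are made up for illustration, and the last call reuses the \x7b escape from the code above for a left brace.

```tcl
set checks [list \
    [string match {a?c} abc] \
    [string match {[a-c]x} bx] \
    [string match {\*} *] \
    [string match {\*} x] \
    [string match {[^a]} b] \
    [string match {[^a]} ^] \
    [string match *\x7b "leakage_power () \x7b"]]
puts $checks
```

This prints 1 1 1 0 0 1 1. The fifth and sixth results show that ^ inside brackets is an ordinary set member rather than a negation.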

Here's a "clever" way to do it:
proc unknown args {
    set body [lindex $args end]
    if {[string first "related_pg_pin" $body] == -1} {puts $args}
}
source file.txt
Your data file appears to be Tcl-syntax-compatible, so execute it like a Tcl file, and for unknown commands, check to see if the last argument of the "command" contains the string you want to avoid.
This is clearly insanely risky, but it's fun.

String replacement not happening

I am reading line by line from file. In that file I want to replace
#endif statement with comment line as /******/. The following code is not touching that line
while {[gets $in line] !=-1}
{
# if substring #endif is present in the string
if { [regexp {endif} $line] } {
set line [string replace "#endif" 1 7 "/*****/" } $line]
}
}

As @Donal wrote, your "string replace" line has several issues: the "} $line" part is a syntax error, for one, and the range you are giving is longer than the string you want to replace. Maybe you meant:
set line [string replace ${line} 1 6 "/*****/"]
But that assumes the "#endif" part is hard coded to start from the second character of the line.
I think for what you asked it is simpler to use "regsub":
set line [regsub {#endif} ${line} {/*****/}]
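As a rough sketch of how the regsub form could be applied to every line, the example below works on a list of lines rather than a file channel. The helper name commentOutEndif and the sample input are assumptions made for illustration.

```tcl
# Hypothetical helper: rewrite every line that contains #endif.
proc commentOutEndif {lines} {
    set out {}
    foreach line $lines {
        # With no result variable, regsub returns the new string (Tcl 8.5+);
        # lines without #endif come back unchanged.
        lappend out [regsub {#endif} $line {/*****/}]
    }
    return $out
}

puts [join [commentOutEndif {{#ifdef DEBUG} {  x = 1;} {#endif}}] \n]
```

Because regsub leaves non-matching lines untouched, no separate regexp test is needed before the replacement.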

catch multiple empty lines in file in tcl

There are 4 empty lines in my file, which is open on the channel wr_fp. I want to detect these four empty lines in code, but the code below is not working.
while {[gets $wr_fp line3] >= 0} {
    if {[regexp "\n\s+\n\s+\n\s+\n" $line3]} { puts "found 4 empty lines"}
}

tl;dr: Don't put REs in "quotes", put them in {braces}.
The problem is that you've put your RE in quotes, so that it is actually this:
s+
s+
s+
Because of Tcl's general substitution rules, \n becomes a newline and \s becomes a simple s. Putting the RE in braces inhibits this (unwanted in this case) behaviour.
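The difference is easy to check in a small sketch. The variable names and test strings below are made up for illustration.

```tcl
set quoted "\n\s+"   ;# Tcl substitutes first: newline, s, +
set braced {\n\s+}   ;# passed to regexp untouched: \n\s+

puts [string length $quoted]
puts [string length $braced]
puts [regexp $quoted "a\n  b"]
puts [regexp $braced "a\n  b"]
```

The quoted form is 3 characters long and only matches a newline followed by a literal s, so it fails on the test string; the braced form is 5 characters long and matches the newline followed by whitespace.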

This is my answer; it does what I want.
while {[gets $rd_fp line] >= 0} {
    if {[string match "" $line]} {
        if {[expr $count % 4] == 1} {puts "found 4 space"}
        incr count
    }
}

The gets / chan gets command reads one line at a time and discards the newline character from each line, so your test will never succeed. You need to read in the full contents of the file at once:
set txt [chan read $wr_fp]
if {[regexp {\n\s+\n\s+\n\s+\n} $txt]} { puts "found 4 empty lines"}
Note that you need to use braces around the regular expression as Donal explains.
On some typical pitfalls of RE formulation:
do you really intend to specify that there must be at least one whitespace character on each 'empty' line? If you want to allow lines with no characters at all between the newlines, use \s* instead of \s+.
Also note that this regular expression will match ranges with more than four newlines: the extra newlines will be consumed by one of the \s+ groups. If you want to disallow extra newlines, match with (e.g.) [ \t\f\r] (or any other combination of whitespace you want) instead of \s. Note that this means the expression will match exactly three lines with nothing but blanks, tabs, form feeds, and returns, the lines surrounded and separated by newlines: you might want to extend it with one more subgroup to match the fourth line.
I'm a bit mystified by your solution as described in your own answer, since it doesn't do what was specified in the question. With the following text file:
abc
def
ghi
jkl
mno
pqr
stu
vwx
yz.
(where there is a tab character in the second line after "pqr")
and assuming count has the value 0 when the code is called, your code outputs "found 4 space" after reading the blank lines after "def", "pqr", and "vwx", but not after the line before "stu", where your question indicated it should be.
This code
set count 0
while {[gets $rd_fp line] >= 0} {
    if {[string is space $line]} {
        incr count
        if {$count == 4} {puts "found 4 space"}
    } else {
        set count 0
    }
}
does do what you asked for (nearly): it accepts lines containing whitespace as empty, and it prints its message only after finding four consecutive empty lines. The major difference from the specification in your question is that it also accepts lines without any characters as empty. To match your specification, string is space -strict $line should be used instead.
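The effect of -strict can be seen with a few direct calls. This is a sketch with made-up sample strings.

```tcl
set plainEmpty  [string is space ""]
set strictEmpty [string is space -strict ""]
set strictBlank [string is space -strict " \t"]
set plainText   [string is space "abc"]
puts "$plainEmpty $strictEmpty $strictBlank $plainText"
```

This prints 1 0 1 0: without -strict the empty string counts as space, while with -strict only a line holding at least one whitespace character does.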
Documentation: chan, gets, if, incr, puts, regexp, set, string, while

Looking for a search string in a file and using those lines for processing in TCL

To be more precise:
I need to be looking into a file abc.txt which has contents something like this:
files/f1/atmp.c 98 100
files/f1/atmp1.c 89 100
files/f1/atmp2.c !! 75 100
files/f2/btmp.c 92 100
files/f2/btmp2.c !! 85 100
files/f3/xtmp.c 92 100
The script needs to find "!!" and use those lines to print out the following as output:
atmp2.c 75
btmp2.c 85
Any help?

this should do the trick.
set data {files/f1/atmp.c 98 100
files/f1/atmp1.c 89 100
files/f1/atmp2.c !! 75 100
files/f2/btmp.c 92 100
files/f2/btmp2.c !! 85 100
files/f3/xtmp.c 92 100}
set lines [split $data \n]
foreach line $lines {
    set match [regexp {(\S+)\s+!!\s+(\d+)} $line -> file num]
    if {$match} {puts "$file $num"}
}
Although regexp has a -all switch I don't think we can use it here as we only get the last match vars with -all
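One possible way around that limitation is to combine -all with -inline, which returns every match and its submatches as a flat list instead of setting match variables. The sketch below assumes the same sample data; the variable names whole, path, and report are illustrative, and file tail is used to drop the directory part.

```tcl
set data {files/f1/atmp.c 98 100
files/f1/atmp1.c 89 100
files/f1/atmp2.c !! 75 100
files/f2/btmp.c 92 100
files/f2/btmp2.c !! 85 100
files/f3/xtmp.c 92 100}

set report {}
# Each match contributes three list elements: whole match, path, number.
foreach {whole path num} [regexp -all -inline {(\S+)\s+!!\s+(\d+)} $data] {
    lappend report "[file tail $path] $num"
}
puts [join $report \n]
```

With this data the report contains atmp2.c 75 and btmp2.c 85.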

If your file isn't huge, you can slurp the whole thing into memory, split the lines into a TCL list, and then iterate through the list looking for a match. For example:
set fh [open foo]
set lines [read $fh]
close $fh
set lines [split $lines "\n"]
foreach line $lines {
    if { [regexp {.*/(\S+\.c)\s*!!\s*(\d+)} $line match file data] } {
        puts "$file $data"
    }
}
This will successfully return just the lines with "!!" in them. With your posted corpus, the results are:
atmp2.c 75
btmp2.c 85

I might be tempted in this case to exec to awk:
set output [exec awk {$2 == "!!" {print $1, $3}} abc.txt]
puts $output

The trick is to combine the code that reads lines from the file with a regular expression that detects matching lines and extracts the relevant parts (a one-step process with regexp). The only tricky part is working out what exactly to use as the regular expression, so that you get exactly what you want. I'm going to guess that you're after the parts of the filenames after the /, that those filenames won't contain spaces, and that the number you're after is the entirety of the first digit sequence after the double exclamation. (Other formats are possible, some of which are easier to extract with other tools such as scan.) That would give us something like this:
set f [open abc.txt]
while {[gets $f line] >= 0} {
    if {[regexp {([^\s/]+)\s+!!\s+(\d+)} $line -> name value]} {
        # Or do whatever you want with these
        puts "$name $value"
    }
}
close $f
(The gets command with two arguments returns the length of line read, or -1 on failure. For normal files the only failure mode is EOF, so we can just terminate the loop when we get a negative value. Other kinds of channels can be more complex…)
