editing file at multiple places using tcl/tk patterns - tcl

I have a file in which I have to search for each "if" statement and its corresponding "end if" statement. Currently I do this with lsearch (separately for "if" and "end if") and then use lappend to combine the two results. The problem arises with nested if statements, which make it difficult to identify the matching "if" and "end if" pairs. If there is no assignment between the two statements, I use lreplace to delete the lines between the if and end if pair. This has to run in a loop because there are multiple such pairs, and every time lreplace is used, lsearch has to be run again to calculate the new indexes. I find this implementation very inefficient. Can anyone suggest some pointers to improve it?

This is not a simple thing to do. The issue is that you're really needing a pushdown automaton rather than a simple finite automaton. Simple searching won't cut it.
What you can do though is this: go through and replace each if and end if keyword with characters otherwise unused (\u0080 and \u0081 are good candidates; the C1 controls are really obscure). Then you can use a simple match in a loop to pick off each innermost pair while requiring there to be no unmatched \u0080/\u0081 inside. With each match, you swap the characters back to the tokens and do the other processing you want at the same time. Once there are no more matches, you're done.
set txt [string map {"end if" "\u0081" "if" "\u0080"} $txt]
while {[regexp -indices {\u0080[^\u0080\u0081]*\u0081} $txt span]} {
    # Restore the keywords in the innermost matched block
    set bit [string map {"\u0081" "end if" "\u0080" "if"} [string range $txt {*}$span]]
    puts "matched $bit"
    # ...
    # string replace takes the indices first, then the replacement text
    set txt [string replace $txt {*}$span $bit]
}
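As a minimal sketch of how the loop behaves on nested input, the following runs the same idea on a small made-up sample. The sample text and the found list are assumptions for illustration only; the real file contents are not shown in the question.

```tcl
# Hypothetical sample input with one nested pair.
set original "if a\n  if b\n    x = 1\n  end if\nend if"

# Replace the keywords with marker characters ("end if" first, so that
# its "if" is not consumed by the shorter pattern).
set txt [string map [list "end if" \u0081 "if" \u0080] $original]

set found {}
while {[regexp -indices {\u0080[^\u0080\u0081]*\u0081} $txt span]} {
    # Only a pair with no markers inside can match, i.e. an innermost pair.
    set bit [string map [list \u0081 "end if" \u0080 "if"] \
        [string range $txt {*}$span]]
    lappend found $bit
    # Put the restored text back so the enclosing pair can match next.
    set txt [string replace $txt {*}$span $bit]
}

foreach block $found {
    puts "matched: $block"
}
```

The inner block is reported first and the enclosing block second, and once the loop ends $txt is back to the original text. Any per-block processing (such as dropping a body that contains no assignment) would go where the block is appended to found.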

Related

Split camelcase value with TCL

I have this TCL expression:
[string toupper [join [lrange [file split [value [topnode].file]] 1 1]]]
This retrieves companyName value from c:/companyName... and I need to split that value before the first capital letter into Company Name. Any ideas?
Thanks in advance.
That's rather more in one word than I would consider a good idea. It makes the whole thing quite opaque! Let's split it up.
Firstly, I would expect the base company name to be better retrieved with lindex from the split filename.
set companyName [lindex [file split [value [topnode].file]] 1]
Now, we need to process that to get the human-readable version out of it. Alas, that's going to be a bit difficult without knowing what's been done to it, but if we use as our example fooBarBoo_grill then we can see what we can do. First, we get the pieces with some regular expressions (this part might need tweaking if there are non-ASCII characters involved, or if certain critical characters need special treatment):
# set companyName "fooBarBoo_grill"
set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $companyName]
# pieces = foo Bar Boo grill
Next, we need to capitalise. I'll assume you're using Tcl 8.6 and so have lmap as it is perfect for this task. The string totitle command has been around for a very long time.
set pieces [lmap word $pieces {string totitle $word}]
# pieces = Foo Bar Boo Grill
That list might need a bit more tweaking, or it might be OK as it is. An example of tweaking that might be necessary is if you've got an Irish name like O'Hanrahan, or if you need to insert a comma before and period after Inc.
Finally, we properly ought to set companyName [join $pieces] to get back a true string, but that doesn't have a noticeable effect with a list of words made purely out of letters. Also, more complex joins with regular expressions might be needed if you've done insertion of prefixing punctuation (the , Inc. case).
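Putting the steps above together, a minimal sketch might look like this. The proc name humanizeName is made up for illustration, and it assumes Tcl 8.6 (for lmap) and ASCII-only names.

```tcl
# Hypothetical helper: split a camelCase/underscore name into title-cased words.
proc humanizeName {name} {
    # Runs of lowercase letters, or an uppercase letter plus trailing lowercase.
    set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $name]
    # Capitalise each piece and join back into a plain string.
    return [join [lmap word $pieces {string totitle $word}]]
}

puts [humanizeName companyName]
puts [humanizeName fooBarBoo_grill]
```

With these inputs it prints "Company Name" and "Foo Bar Boo Grill". Special cases such as O'Hanrahan or ", Inc." would still need their own handling.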
If I was doing this for real, I'd try to have the proper company name expressed directly elsewhere rather than relying on the filename. Much simpler to get right!
To begin with, try using
lindex [file split [value [topnode].file]] 1
The lrange command will return a list, which might cause problems with some directory names. The join command should be pointless if you don't use lrange, and string toupper removes the information you need to do the operation you want to do.
To split before uppercase letters, you can use repetitive matches of either (?:[a-z]+|[A-Z][a-z]+) (ASCII / English alphabet letters only) or (?:[[:lower:]]+|[[:upper:]][[:lower:]]+) (any Unicode letters).
% regexp -all -inline {(?:[a-z]+|[A-Z][a-z]+)} camelCaseWord
camel Case Word
Use string totitle to change the first letter of the first word to upper case.
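As a rough sketch of that last step, assuming the value has already been extracted with lindex and is the plain string companyName:

```tcl
# Split before each uppercase letter (ASCII letters only).
set words [regexp -all -inline {(?:[a-z]+|[A-Z][a-z]+)} companyName]

# Title-case only the first word; the others already start with a capital.
lset words 0 [string totitle [lindex $words 0]]

set result [join $words]
puts $result
```

This prints "Company Name". The variable names words and result are only illustrative.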
Documentation:
file,
lindex,
regexp,
string,
Syntax of Tcl regular expressions

How to add many selections to one variable in tcl

After doing some other things in my script I end up with a series of variables set in Tcl ($sel1, $sel2, $sel3, ...) and I need to add them to the following line:
set all [::TopoTools::selections2mol "$box $sel1 $sel2 $sel3 $sel4"]
Now, if I only had four this would be fine by hand, but in the final version I will have hundreds which is untenable to do by hand. I'm sure the answer is some kind of loop, but I've been giving it some thought now and I can't quite figure it out. If I had, say, $sel1, $sel2, all the way to a given number, how would I add them to that line in the format shown at any amount that I want, with the $box at the beginning as shown? Thanks very much for your help.
It may or may not be relevant, but I define the variables in a loop as follows:
set sel$i [atomselect $id all]
I'm not familiar with the software you are using, but it should be possible to fix this without too much hassle.
If you put this inside the loop instead:
set sel$i [atomselect $id all]
append valueStr " " [set sel$i]
(or perhaps this, even if it looks a little like C:)
append valueStr " " [set sel$i [atomselect $id all]]
you will get the string that " $sel1 $sel2 $sel3 $sel4" is substituted into (remember to put $box in as well).
With Tcl 8.5 or later, you can do
dict set values $i [atomselect $id all]
inside the loop, which gives you a dictionary structure containing all values, and then create the sequence of values with:
set all [::TopoTools::selections2mol [concat $box [dict values $values]]]
Depending on the output and input formats of atomselect and selections2mol, the latter might not actually work without a little fine-tuning, but it should be worth a try.
In the latter case, you aren't getting the variables, but each value is available as
dict get $values $i
You can do this with an array also:
set values($i) [atomselect $id all]
but then you need to sort the keys before collecting the values, like this
set valueStr [list $box]
foreach key [lsort -integer [array names values]] {
append valueStr " " $values($key)
}
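To see why the keys need sorting, here is a small stand-alone sketch. The values box0 and atomselectN are stand-ins (the real ones would come from the surrounding script and from atomselect), and selList is a made-up name. It uses lappend rather than append so the result is built as a proper list.

```tcl
# Stand-in values; in the real script these come from atomselect.
set box box0
foreach i {1 2 10} {
    set values($i) "atomselect$i"
}

# array names returns keys in no particular order, so sort them numerically.
set selList [list $box]
foreach key [lsort -integer [array names values]] {
    lappend selList $values($key)
}
puts $selList
```

This prints "box0 atomselect1 atomselect2 atomselect10". A plain lsort without -integer would put key 10 before key 2.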
Documentation:
append,
array,
concat,
dict,
foreach,
list,
lsort,
set

How to grep parameters inside square brackets?

Could you please help me with the following script?
It is a Tcl script which Synopsys IC Compiler II will source.
set_dont_use [get_lib_cells */*CKGT*0P*] -power
set_dont_use [get_lib_cells */*CKTT*0P*] -setup
May I know how to take only */*CKGT*0P* and */*CKTT*0P* and assign these to a variable?
Of course you can treat a Tcl script as something you search through; it's just a file with text in it after all.
Let's write a script to select the text out. It'll be a Tcl script, of course. For readability, I'm going to put the regular expression itself in a global variable; treat it like a constant. (In larger scripts, I find it helps a lot to give names to REs like this, as those names can be used to remind me of the purpose of the regular expression. I'll call it “RE” here.)
set f [open theScript.tcl]
# Even with 10 million lines, modern computers will chew through it rapidly
set lines [split [read $f] "\n"]
close $f
# This RE will match the sample lines you've told us about; it might need tuning
# for other inputs (and knowing what's best is part of the art of RE writing)
set RE {^set_dont_use \[get_lib_cells ([\w*/]+)\] -\w+$}
foreach line $lines {
if {[regexp $RE $line -> term]} {
# At this point, the part you want is assigned to $term
puts "FOUND: $term"
}
}
The key things in the RE above? It's in braces to reduce backslash-itis. Literal square brackets are backslashed. The bit in parentheses is the bit we're capturing into the term variable. [\w*/]+ matches a sequence of one or more characters from a set consisting of “standard word characters” plus * and /.
The use of regexp has -> as a funny name for a variable that is ignored. I could have called it dummy instead; it's going to have the whole matched string in it when the RE matches, but we already have that in $term as we're using a fully-anchored RE. But I like using -> as a mnemonic for “assign the submatches into these”. Also, the formal result of regexp is the number of times the RE matched; without the -all option, that's effectively a boolean that is true exactly when there was a match, which is useful. Very useful.
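If the goal is to end up with the patterns in a variable rather than printed, a small variation collects them into a list. This sketch inlines the two sample lines instead of reading a file so that it is self-contained; the variable names script and terms are only illustrative.

```tcl
# The two sample lines from the question, inlined instead of read from a file.
set script {set_dont_use [get_lib_cells */*CKGT*0P*] -power
set_dont_use [get_lib_cells */*CKTT*0P*] -setup}

set RE {^set_dont_use \[get_lib_cells ([\w*/]+)\] -\w+$}

# Collect every captured pattern into one list.
set terms {}
foreach line [split $script "\n"] {
    if {[regexp $RE $line -> term]} {
        lappend terms $term
    }
}
puts $terms
```

Afterwards $terms holds the two patterns, and [lindex $terms 0] gives */*CKGT*0P*.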
To assign the output of any command <command> to a variable with a name <name>, use set <name> [<command>]:
> set hello_length [string length hello]
5
> puts "The length of 'hello' is $hello_length."
The length of 'hello' is 5.
In your case, maybe this is what you want? (I still don't quite understand the question, though.)
set ckgt_cells [get_lib_cells */*CKGT*0P*]
set cktt_cells [get_lib_cells */*CKTT*0P*]

TCL RegExp IP exceptions

I have the following Tcl regexp to extract an exact IP from a line:
set ip [regexp -all -inline {((([2][5][0-5]|([2][0-4]|[1][0-9]|[0-9])?[0-9])\.){3})([2][5][0-5]|([2][0-4]|[1][0-9]|[0-9])?[0-9])} $ip_text]
I'm using it to analyze a log file, and it works fine, except it's also extracting the IP portion of a domain name when the domain name also contains an IP format (but usually in reverse), which I don't want.
eg when ip_text = Log File 61.140.142.192 - 2012-06-16, 192.142.140.61.broad.gz.gd.dynamic.163data.com.cn, CHN, 1
I get 61.140.142.192 & 192.142.140.61 but only 61.140.142.192 is legit.
and when ip_text = Entry "61.140.170.118" resolved from 118.170.140.61.broad.gz.gd.dynamic.163data.com.cn, and 61.140.185.45 verified.
I get 61.140.170.118, 118.170.140.61 & 61.140.185.45 but only 61.140.170.118 & 61.140.185.45 are legit.
Is there a way to make the regexp exclude IPs that are followed by a domain-name character? I.e. exclude <IP><dot>, <IP><dash> or <IP><any alphanumeric character>.
You can use a negative lookahead constraint on the end of that RE. Those are written as (?!\.|\d) in this case, which matches when the next character is not a . or a digit (it also matches at the end of the string, when there's no next character at all). With complicated regular expressions it's often easier to save them in a variable (often global) since that effectively lets you name the RE.
set IPAddrRE {(((25[0-5]|(2[0-4]|1[0-9]|[1-9])?[0-9])\.){3})(25[0-5]|(2[0-4]|1[0-9]|[1-9])?[0-9])(?!\.|\d)}
set ip [regexp -all -inline $IPAddrRE $ip_text]
The reason you need to prevent the follower being a digit? Without that, the RE can stop matching one character earlier, allowing it to pick 192.142.140.6 out of your sample text as well as the value you actually want.
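A small sketch of the difference, using a shortened version of the sample line. The helper variables octet, base, dotOnly and dotOrDigit are made up for illustration; note the ${base} form, since a bare $base followed by ( would be read as an array reference.

```tcl
# Build the address pattern from one octet pattern (non-capturing groups).
set octet {(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9])?[0-9])}
set base "(?:$octet\\.){3}$octet"

# Two variants of the trailing lookahead constraint.
set dotOnly    "${base}(?!\\.)"
set dotOrDigit "${base}(?!\\.|\\d)"

set text "61.140.142.192 - 192.142.140.61.broad.gz"

set loose  [regexp -all -inline $dotOnly $text]
set strict [regexp -all -inline $dotOrDigit $text]
puts "dot only:     $loose"
puts "dot or digit: $strict"
```

With only the dot excluded, the matcher can back off by one digit and report 192.142.140.6 from the reversed address; with the digit excluded as well, only 61.140.142.192 is returned.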
You should consider using non-capturing grouping for this task. Replacing (…) with (?:…) will allow the RE engine to use a more efficient matcher internally. On a lot of text, this will make a substantial difference. For example, with this version:
set IPAddrRE {(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9])?[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9])?[0-9])(?!\.|\d)}
I see that the execution time is about half that of the version listed in the first part of this answer (and about 40% of what your original version required). However, it produces different results (none of the bits that you probably don't require), so you'll need to adapt other code too:
% set ip [regexp -all -inline $IPAddrRE $ip_text]
61.140.142.192
It's often a good idea to dumb down your regular expressions instead of trying to make them smarter.
lmap candidate [regexp -inline -all {[\d.]+} $txt] {
if {[llength [split $candidate .]] == 4} {
set candidate
} else {
continue
}
}
will pick out the exact three numbers you wanted from your text.
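The same idea can be wrapped in a proc and tightened slightly. The sketch below is an assumption-laden variant, not the code above verbatim: the name extractIPs is made up, and it adds a check that each of the four parts is a number no larger than 255, which the simple llength test does not do (it would accept something like 1..2.3).

```tcl
# Hypothetical helper: pick dotted quads out of free text.
proc extractIPs {txt} {
    lmap candidate [regexp -inline -all {[\d.]+} $txt] {
        set parts [split $candidate .]
        # A trailing dot (as in a reversed-IP host name) yields five parts.
        if {[llength $parts] != 4} continue
        set ok 1
        foreach p $parts {
            # scan with %d avoids leading zeros being read as octal.
            if {![string is digit -strict $p] || [scan $p %d] > 255} {
                set ok 0
            }
        }
        if {!$ok} continue
        set candidate
    }
}

set line1 {Log File 61.140.142.192 - 2012-06-16, 192.142.140.61.broad.gz.gd.dynamic.163data.com.cn, CHN, 1}
set line2 {Entry "61.140.170.118" resolved from 118.170.140.61.broad.gz.gd.dynamic.163data.com.cn, and 61.140.185.45 verified.}
puts [extractIPs $line1]
puts [extractIPs $line2]
```

One caveat of this approach: an address that ends a sentence, and is therefore followed directly by a period, is also rejected because the period becomes part of the candidate.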
Documentation: continue, if, llength, lmap, lmap replacement, Syntax of Tcl regular expressions, regexp, set, split

How to append two strings in Tcl with a space between them?

I'm trying to append two strings in Tcl. I'm reading from a CSV file and setting values to variables, which I will then assign in my application. I tried the code below.
set vMyvalue [lindex $lsLine 17]
append vMyvalue " [lindex $lsLine 18]"
It gives me the expected result. For example, if I have the values 250 and km in the 17th and 18th positions in the CSV, I get
250 km
But the problem is that when there are no values in the 17th and 18th positions, i.e. when both are empty, it still adds the space. My application won't allow me to assign a space for that value. How can I resolve this? I have just started working in Tcl and am not aware of many functions.
If you don't know a function that does this, I think the most intuitive way to handle cases like this one (including, for example, joining two strings with some character but doing something different when either of them is empty) is to use if. In this case:
if {$vMyvalue eq " "} {set vMyvalue ""}
If you want to make your code somewhat shorter, you can make use of the functions lrange (list range), join and string:
set vMyvalue [string trim [join [lrange $lsLine 17 18] " "]]
lrange returns a list of elements from the list $lsLine between indices 17 to 18 inclusive, then join literally joins those elements with a space, and last, string trim cleans up any leading and trailing spaces (removing the space completely if it is the only character in the string).
There are several ways to do this. The minimum modification to the code you already have is probably to trim the result. string trim removes leading and trailing whitespace, so a value that consists only of whitespace is trimmed to an empty string. So:
set vMyvalue [string trim $vMyvalue]
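If this comes up for several columns, one option is a small helper that drops empty parts before joining. This is only a sketch: the proc name joinNonEmpty and the sample CSV line are made up, and the indices are shortened from 17 and 18 to keep the example small.

```tcl
# Hypothetical helper: join the non-empty arguments with a single space.
proc joinNonEmpty {args} {
    set kept {}
    foreach part $args {
        set part [string trim $part]
        if {$part ne ""} {
            lappend kept $part
        }
    }
    return [join $kept " "]
}

# Made-up CSV line: value and unit at indices 2 and 3, empty fields at 4 and 5.
set lsLine [split "a,b,250,km,," ","]
set full  [joinNonEmpty [lindex $lsLine 2] [lindex $lsLine 3]]
set empty [joinNonEmpty [lindex $lsLine 4] [lindex $lsLine 5]]
puts "full='$full' empty='$empty'"
```

Here $full is "250 km" and $empty is the empty string, with no stray space in either case.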