Help needed in writing a regular expression in Tcl

I need a regular expression in Tcl that matches all of the following strings:
i) ( XYZ XZZ XVZ XWZ )
Clue: every string starts with X and ends with Z; only the middle character differs (Y, Z, V or W).
My trial: [regexp {^X([Y|Z|V|W]*)Z$}]
I also want another regexp that matches only the following string, wherever it occurs:
ii) (XYZ)
My trial: [regexp {^X([Y]*)Z$}] or simply regexp {^XYZ$}
I just want to make sure this is a correct approach. Is there any other way to optimize the regexp? :)
i) I tested the first question:
set to_Match_Str "XYZ XZZ XVZ XWZ"
foreach {wholeStr to_Match_Str} [regexp -all -inline {X[YZVW]Z} $to_Match_Str] {
puts "MATCH $to_Match_Str in the list"
}
It prints only XZZ and XWZ from the list; it leaves out XYZ and XVZ.
When I include the parentheses, [regexp -all -inline {X([YZVW])Z} $to_Match_Str], it prints all the middle characters correctly: Y Z V W

i) (XYZ XZZ XVZ XWZ)
Clue: every string starts with X and ends with Z; only the middle character differs (Y, Z, V or W).
My trial: [regexp {^X([Y|Z|V|W]*)Z$}]
Assuming you're not after literal parentheses around the whole lot, you match that using this:
regexp {X([YZVW])Z} $string -> matchedSubstr
That's because the interior strings are all single characters. (It also stores the matched substring in the variable matchedSubstr; choose any variable name there that you want.) You should not use | inside a [] in a regular expression, as it has no special meaning there. (You might need to add ^$ anchors round the outside.)
On the other hand, if you want to match multiple character sequences (which the Y etc. are just stand-ins for) then you use this:
regexp {X(Y|Z|V|W)Z} $string -> matchedSubstr
Notice that | is being used here, but [] is not.
If your real string has many of these strings (whichever pattern you're using to match them) then the easiest way to extract them all is with the -all -inline options to regexp, typically used in a foreach like this:
foreach {wholeStr matchedSubstr} [regexp -all -inline {X([YZVW])Z} $string] {
puts "Hey! I found a $matchedSubstr in there!"
}
Mix and match to taste.
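As a rough sketch of why the shape of the -all -inline result matters (the variable names whole, pairs and middles are made up for illustration): without a capture group the result has one element per match, while with one capture group each match contributes two elements. Iterating two variables over the group-less result therefore consumes the matches in pairs, which would explain why only XZZ and XWZ were printed in the question's test.

```tcl
set input "XYZ XZZ XVZ XWZ"

# No capture group: one list element per match.
set whole [regexp -all -inline {X[YZVW]Z} $input]
puts $whole    ;# XYZ XZZ XVZ XWZ

# One capture group: each match yields the whole match, then the group.
set pairs [regexp -all -inline {X([YZVW])Z} $input]
puts $pairs    ;# XYZ Y XZZ Z XVZ V XWZ W

# Walk the list two elements at a time to pick out the captured part.
set middles {}
foreach {wholeStr middle} $pairs {
    lappend middles $middle
}
puts $middles  ;# Y Z V W
```

If only whole matches are needed, a single-variable foreach over the group-less result is enough.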
My trial: [regexp {^X([Y]*)Z$}] or simply regexp {^XYZ$}
I just want to make sure this is a correct approach. Is there any other way to optimize the regexp? :)
That's optimal for an exact comparison. And in fact Tcl will optimize that internally to a straight string equality test if that's literal.

My trial: [regexp {^X([Y|Z|V|W]*)Z$}]
That would match the strings given, but as you are using the * multiplier it would also match strings like "XZ", "XYYYYYYYYYYYYYYYYZ" and "XYZYVWZWWWZVYYWZ". To match the middle character only once, don't use a multiplier (and leave out the | characters as well: inside [] they are literal, so the set would also accept "X|Z"):
^X([YZVW])Z$
My trial: [regexp {^X([Y]*)Z$}]
The same there, it will also match strings like "XZ", "XYYZ" and "XYYYYYYYYYYYYYYYYZ". Don't put a multiplier after the set:
^X([Y])Z$
or simply regexp {^XYZ$}
That matches the same string but won't capture anything. To make it do the same as the other (capture the Y character), you need the parentheses:
^X(Y)Z$
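A quick way to see the difference is to run both patterns over a few probe strings. This is only a sketch; the array names looseHit and strictHit and the probe strings are chosen here for illustration.

```tcl
# Pattern from the question: a set containing | and a * multiplier.
set loose {^X([Y|Z|V|W]*)Z$}
# Tightened pattern: exactly one character from the set, no stray |.
set strict {^X([YZVW])Z$}

foreach s {XYZ XZ XYYYZ X|Z} {
    set looseHit($s)  [regexp $loose $s]
    set strictHit($s) [regexp $strict $s]
    puts "${s}: loose=$looseHit($s) strict=$strictHit($s)"
}
# The loose pattern accepts all four probes; the strict one accepts only XYZ.
```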

You can use the Visual Regexp tool to help; it provides feedback as you construct your regular expression.

Related

How to search for 0,a1[4],* where * is a wildcard in a list of 0,a2,4 0,a1[4],3 0,a1[4],5 .... in tcl

I tried lsearch -all $list_ 0,a1[4],*
The a1[4] part is stored in a variable, so basically I need:
set var {a1[4]}
lsearch -all $list_ 0,$var,*
By default lsearch uses glob patterns (as described by the documentation for string match — it's the exact same matching engine being used). That's good because it means that * is a wildcard, but awkward because it means that [ is also special (it starts a character set match). You need some simple escaping, and to keep that sane you should put your whole pattern in {braces} so we don't need to fight with Tcl over what the meanings of bracket and backslash are:
lsearch -all $list_ {0,a1\[4\],*}
You don't need braces; you could write this instead:
lsearch -all $list_ 0,a1\\\[4\\\],*
But that's ugly! And difficult to maintain (trust me on that). So use braces, OK?
In the case where you're pulling the subpattern from a variable, things get more complicated. The fix is to use string map (or regsub) to condition the pattern piece.
# Split into three lines for clarity; qvar = “quoted var”
set ADD_BACKSLASHES {[ {\[} ] {\]}}
set qvar [string map $ADD_BACKSLASHES $var]
lsearch -all $list_ 0,$qvar,*
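Put together with the sample data from the question, the whole thing might look like the sketch below. The variable names unescaped and escaped are invented here; note that only [ and ] are escaped, so a value containing *, ? or a backslash would need those added to the map as well.

```tcl
set list_ [list 0,a2,4 {0,a1[4],3} {0,a1[4],5}]
set var {a1[4]}

# Unescaped: [4] is read as a character set, so the pattern looks for
# "0,a14,..." and finds nothing in this list.
set unescaped [lsearch -all $list_ 0,$var,*]

# Escaped: the brackets are matched literally.
set qvar [string map {[ {\[} ] {\]}} $var]
set escaped [lsearch -all $list_ 0,$qvar,*]

puts "unescaped: {$unescaped}"   ;# unescaped: {}
puts "escaped:   {$escaped}"     ;# escaped:   {1 2}
```

Adding -inline to the lsearch call would return the matching elements rather than their indices.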

Split camelcase value with TCL

I have this TCL expression:
[string toupper [join [lrange [file split [value [topnode].file]] 1 1]]]
This retrieves the companyName value from c:/companyName... and I need to split that value before the capital letter to get Company Name. Any ideas?
Thanks in advance.
That's rather more in one word than I would consider a good idea. It makes the whole thing quite opaque! Let's split it up.
Firstly, I would expect the base company name to be better retrieved with lindex from the split filename.
set companyName [lindex [file split [value [topnode].file]] 1]
Now, we need to process that to get the human-readable version out of it. Alas, that's going to be a bit difficult without knowing what's been done to it, but if we use as our example fooBarBoo_grill then we can see what we can do. First, we get the pieces with some regular expressions (this part might need tweaking if there are non-ASCII characters involved, or if certain critical characters need special treatment):
# set companyName "fooBarBoo_grill"
set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $companyName]
# pieces = foo Bar Boo grill
Next, we need to capitalise. I'll assume you're using Tcl 8.6 and so have lmap as it is perfect for this task. The string totitle command has been around for a very long time.
set pieces [lmap word $pieces {string totitle $word}]
# pieces = Foo Bar Boo Grill
That list might need a bit more tweaking, or it might be OK as it is. An example of tweaking that might be necessary is if you've got an Irish name like O'Hanrahan, or if you need to insert a comma before and period after Inc.
Finally, we properly ought to set companyName [join $pieces] to get back a true string, but that doesn't have a noticeable effect with a list of words made purely out of letters. Also, more complex joins with regular expressions might be needed if you've done insertion of prefixing punctuation (the , Inc. case).
If I was doing this for real, I'd try to have the proper company name expressed directly elsewhere rather than relying on the filename. Much simpler to get right!
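For reference, the steps above can be rolled into one small procedure. This is a sketch assuming Tcl 8.6 (for lmap) and ASCII-only names; the procedure name humanize is made up for this example.

```tcl
# Split a camelCase (or underscore-separated) name into words and
# capitalise each word. ASCII letters only; anything else is dropped.
proc humanize {name} {
    set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $name]
    return [join [lmap word $pieces {string totitle $word}]]
}

puts [humanize companyName]       ;# Company Name
puts [humanize fooBarBoo_grill]   ;# Foo Bar Boo Grill
```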
To begin with, try using
lindex [file split [value [topnode].file]] 1
The lrange command will return a list, which might cause problems with some directory names. The join command should be pointless if you don't use lrange, and string toupper removes the information you need to do the operation you want to do.
To split before uppercase letters, you can use repetitive matches of either (?:[a-z]+|[A-Z][a-z]+) (ASCII / English alphabet letters only) or (?:[[:lower:]]+|[[:upper:]][[:lower:]]+) (any Unicode letters).
% regexp -all -inline {(?:[a-z]+|[A-Z][a-z]+)} camelCaseWord
camel Case Word
Use string totitle to change the first letter of the first word to upper case.
Documentation: file, lindex, regexp, string, Syntax of Tcl regular expressions

How to grep parameters inside square brackets?

Could you please help me with the following script?
It is a Tcl script which Synopsys IC Compiler II will source.
set_dont_use [get_lib_cells */*CKGT*0P*] -power
set_dont_use [get_lib_cells */*CKTT*0P*] -setup
How can I extract only */*CKGT*0P* and */*CKTT*0P* and assign them to a variable?
Of course you can treat a Tcl script as something you search through; it's just a file with text in it after all.
Let's write a script to select the text out. It'll be a Tcl script, of course. For readability, I'm going to put the regular expression itself in a global variable; treat it like a constant. (In larger scripts, I find it helps a lot to give names to REs like this, as those names can be used to remind me of the purpose of the regular expression. I'll call it “RE” here.)
set f [open theScript.tcl]
# Even with 10 million lines, modern computers will chew through it rapidly
set lines [split [read $f] "\n"]
close $f
# This RE will match the sample lines you've told us about; it might need tuning
# for other inputs (and knowing what's best is part of the art of RE writing)
set RE {^set_dont_use \[get_lib_cells ([\w*/]+)\] -\w+$}
foreach line $lines {
if {[regexp $RE $line -> term]} {
# At this point, the part you want is assigned to $term
puts "FOUND: $term"
}
}
The key things in the RE above? It's in braces to reduce backslash-itis. Literal square brackets are backslashed. The bit in parentheses is the bit we're capturing into the term variable. [\w*/]+ matches a sequence of one or more characters from a set consisting of “standard word characters” plus * and /.
The use of regexp has -> as a funny name for a variable that is ignored. I could have called it dummy instead; it's going to have the whole matched string in it when the RE matches, but we already have that in $term as we're using a fully-anchored RE. But I like using -> as a mnemonic for “assign the submatches into these”. Also, the formal result of regexp is the number of times the RE matched; without the -all option, that's effectively a boolean that is true exactly when there was a match, which is useful. Very useful.
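If the goal is to end up with the patterns in a single variable, the same RE can feed a list. The sketch below uses the two sample lines as an in-memory string instead of reading a file; the variable names script and patterns are placeholders.

```tcl
# The two sample lines, held in a string so the example is self-contained.
set script {set_dont_use [get_lib_cells */*CKGT*0P*] -power
set_dont_use [get_lib_cells */*CKTT*0P*] -setup}

set RE {^set_dont_use \[get_lib_cells ([\w*/]+)\] -\w+$}

set patterns {}
foreach line [split $script "\n"] {
    # regexp returns 1 on a match and stores the captured group in term
    if {[regexp $RE $line -> term]} {
        lappend patterns $term
    }
}
puts $patterns   ;# */*CKGT*0P* */*CKTT*0P*
```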
To assign the output of any command <command> to a variable with a name <name>, use set <name> [<command>]:
> set hello_length [string length hello]
5
> puts "The length of 'hello' is $hello_length."
The length of 'hello' is 5.
In your case, maybe this is what you want? (I still don't quite understand the question, though.)
set ckgt_cells [get_lib_cells */*CKGT*0P*]
set cktt_cells [get_lib_cells */*CKTT*0P*]

How to trim two words from right of a string

I want to remove two words from right of a string.
For example:
set str "sachin is the pride of india"
I need to remove india and of from the right, and there should be no trailing space after that.
I have tried using string trimright.
The string trimright command is exactly the wrong tool for this; it treats its trim argument as a set of characters to remove, not a literal. The simplest way of doing this is with lreplace, provided the string doesn't contain list metacharacters and you don't care about the number of spaces.
set shortened [lreplace $str end-1 end]
If you need to do it reliably, regular expressions are the tool of choice.
set shortened [regsub {\s*\S+\s+\S+\s*$} $str ""]
Use regsub for this. Please.
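Here is a small side-by-side sketch of the approaches mentioned above. The second input string, "the land of india", is invented here to show how string trimright goes wrong, since it strips any trailing characters that appear in its second argument.

```tcl
set str "sachin is the pride of india"

# List-based: fine while the string is also a well-formed list.
set viaList [lreplace $str end-1 end]

# Regular expression: removes the last two words plus surrounding spaces.
set viaRegsub [regsub {\s*\S+\s+\S+\s*$} $str ""]

puts $viaList     ;# sachin is the pride
puts $viaRegsub   ;# sachin is the pride

# trimright treats "of india" as the character set {o f space i n d a},
# so it keeps eating into "land" until it hits the l.
set overTrimmed [string trimright "the land of india" "of india"]
puts $overTrimmed ;# the l
```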

TCL: Get a number out of a string using scan

I have the following string:
set operating_period "1.86ns" ; # set dominant default period , from create_clock command in sdc
I would like to get the number out of this. So the result should be
1.86
Any suggestions how to do that in TCL?
I tried scan, but obviously I fail =( ...
Use scan:
% set operating_period "1.86ns"
1.86ns
% set x [scan $operating_period %f]
1.86
http://www.tcl.tk/man/tcl8.6/TclCmd/scan.htm
http://www.tcl.tk/man/tcl8.6/TclCmd/format.htm
Sometimes, when working with particularly ill-formed data (e.g., anything written free-form by people) you have to use a mixture of techniques to extract the data. For example, you can use both regexp and scan:
set inputString "wow yet 183.326ns another float"
if {[scan [regexp -inline {[\d.]+ns} $inputString] "%f" value] == 1} {
# Found something! It's in $value now
}
The regexp does the extraction (-inline is nice; it makes regexp return what it matched) and scan “extracts the sense” from what was found and stores a sane floating-point number in $value, assuming there was any there in the first place. You might need to tweak the RE to get best results (for example, the current one won't cope with negative numbers right now).
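One possible tweak for signed values is sketched below. The procedure name extract_ns is hypothetical, and the RE assumes a plain decimal number immediately followed by the literal unit ns (no exponents, no space before the unit).

```tcl
# Return the first number that is directly followed by "ns",
# or an empty string when there is none.
proc extract_ns {text} {
    # Optional sign, digits, optional fractional part, then the unit.
    if {[regexp {(-?\d+(?:\.\d+)?)ns} $text -> number]
            && [scan $number %f value] == 1} {
        return $value
    }
    return ""
}

puts [extract_ns "1.86ns"]
puts [extract_ns "wow yet -183.326ns another float"]
puts "none: '[extract_ns {no period here}]'"
```

Returning an empty string for "not found" is just one convention; raising an error would work as well if a missing period should be treated as fatal.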