TCL: Get a number out of a string using scan

I have the following string:
set operating_period "1.86ns" ; # set dominant default period , from create_clock command in sdc
I would like to get the number out of this. So the result should be
1.86
Any suggestions how to do that in TCL?
I tried scan, but obviously I fail =( ...

Use scan:
% set operating_period "1.86ns"
1.86ns
% set x [scan $operating_period %f]
1.86
http://www.tcl.tk/man/tcl8.6/TclCmd/scan.htm
http://www.tcl.tk/man/tcl8.6/TclCmd/format.htm
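If you also want to know whether the parse worked, a minimal sketch (assuming the value is always a number immediately followed by a unit; the variable names value and unit are just illustrative) is to give scan variable names and check its return count:

```tcl
set operating_period "1.86ns"

# With variable names, scan returns the number of conversions it performed.
# %f takes the number, %s takes the rest of the non-whitespace text (the unit).
if {[scan $operating_period "%f%s" value unit] == 2} {
    puts "value=$value unit=$unit"
} else {
    puts "could not parse '$operating_period'"
}
```

Checking the count against 2 rejects input such as "ns" or "1.86", where one of the two conversions cannot be made.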

Sometimes, when working with particularly ill-formed data (e.g., anything written free-form by people) you have to use a mixture of techniques to extract the data. For example, you can use both regexp and scan:
set inputString "wow yet 183.326ns another float"
if {[scan [regexp -inline {[\d.]+ns} $inputString] "%f" value] == 1} {
# Found something! It's in $value now
}
The regexp does the extraction (-inline is nice; it makes regexp return what it matched) and scan “extracts the sense” from what was found and stores a sane floating-point number in $value, assuming there was any there in the first place. You might need to tweak the RE to get best results (for example, the current one won't cope with negative numbers right now).
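As one possible tweak for signed values, here is a sketch that assumes the numbers look like an optional sign, digits and an optional fractional part, directly followed by ns (the sample text and the name signedNsRE are made up for illustration):

```tcl
set inputString "drift of -12.5ns against a 183.326ns period"

# Assumed shape: optional sign, digits, optional fractional part, then "ns".
# The parentheses capture just the numeric part.
set signedNsRE {([-+]?\d+(?:\.\d+)?)ns}

set values {}
foreach {whole number} [regexp -all -inline -- $signedNsRE $inputString] {
    # scan turns the captured text into a real floating-point value
    if {[scan $number "%f" value] == 1} {
        lappend values $value
        puts "$whole -> $value"
    }
}
```

It still won't handle exponents such as 1e-9ns; the RE would need extending for those.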

Related

Split camelcase value with TCL

I have this TCL expression:
[string toupper [join [lrange [file split [value [topnode].file]] 1 1]]]
This retrieves companyName value from c:/companyName... and I need to split that value before the first capital letter into Company Name. Any ideas?
Thanks in advance.
That's rather more in one word than I would consider a good idea. It makes the whole thing quite opaque! Let's split it up.
Firstly, I would expect the base company name to be better retrieved with lindex from the split filename.
set companyName [lindex [file split [value [topnode].file]] 1]
Now, we need to process that to get the human-readable version out of it. Alas, that's going to be a bit difficult without knowing what's been done to it, but if we use as our example fooBarBoo_grill then we can see what we can do. First, we get the pieces with some regular expressions (this part might need tweaking if there are non-ASCII characters involved, or if certain critical characters need special treatment):
# set companyName "fooBarBoo_grill"
set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $companyName]
# pieces = foo Bar Boo grill
Next, we need to capitalise. I'll assume you're using Tcl 8.6 and so have lmap as it is perfect for this task. The string totitle command has been around for a very long time.
set pieces [lmap word $pieces {string totitle $word}]
# pieces = Foo Bar Boo Grill
That list might need a bit more tweaking, or it might be OK as it is. An example of tweaking that might be necessary is if you've got an Irish name like O'Hanrahan, or if you need to insert a comma before and period after Inc.
Finally, we properly ought to set companyName [join $pieces] to get back a true string, but that doesn't have a noticeable effect with a list of words made purely out of letters. Also, more complex joins with regular expressions might be needed if you've done insertion of prefixing punctuation (the , Inc. case).
If I was doing this for real, I'd try to have the proper company name expressed directly elsewhere rather than relying on the filename. Much simpler to get right!
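Putting the steps above together, a rough sketch might look like this (humanise is a hypothetical helper name, and lmap means it assumes Tcl 8.6):

```tcl
# Hypothetical helper: split a camelCase or snake_case name into
# capitalised words. ASCII letters only; requires Tcl 8.6 for lmap.
proc humanise {name} {
    set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $name]
    return [join [lmap word $pieces {string totitle $word}]]
}

puts [humanise companyName]      ;# Company Name
puts [humanise fooBarBoo_grill]  ;# Foo Bar Boo Grill
```

Anything that is not an ASCII letter (digits, underscores, apostrophes) is simply dropped by this RE, which may or may not be what you want.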
To begin with, try using
lindex [file split [value [topnode].file]] 1
The lrange command will return a list, which might cause problems with some directory names. The join command should be pointless if you don't use lrange, and string toupper removes the information you need to do the operation you want to do.
To split before uppercase letters, you can use repetitive matches of either (?:[a-z]+|[A-Z][a-z]+) (ASCII / English alphabet letters only) or (?:[[:lower:]]+|[[:upper:]][[:lower:]]+) (any Unicode letters).
% regexp -all -inline {(?:[a-z]+|[A-Z][a-z]+)} camelCaseWord
camel Case Word
Use string totitle to change the first letter of the first word to upper case.
Documentation: file, lindex, regexp, string, Syntax of Tcl regular expressions

How to grep parameters inside square brackets?

Could you please help me with the following script?
It is a Tcl script which Synopsys IC Compiler II will source.
set_dont_use [get_lib_cells */*CKGT*0P*] -power
set_dont_use [get_lib_cells */*CKTT*0P*] -setup
May I know how to take only */*CKGT*0P* and */*CKTT*0P* and assign these to a variable?
Of course you can treat a Tcl script as something you search through; it's just a file with text in it after all.
Let's write a script to select the text out. It'll be a Tcl script, of course. For readability, I'm going to put the regular expression itself in a global variable; treat it like a constant. (In larger scripts, I find it helps a lot to give names to REs like this, as those names can be used to remind me of the purpose of the regular expression. I'll call it “RE” here.)
set f [open theScript.tcl]
# Even with 10 million lines, modern computers will chew through it rapidly
set lines [split [read $f] "\n"]
close $f
# This RE will match the sample lines you've told us about; it might need tuning
# for other inputs (and knowing what's best is part of the art of RE writing)
set RE {^set_dont_use \[get_lib_cells ([\w*/]+)\] -\w+$}
foreach line $lines {
if {[regexp $RE $line -> term]} {
# At this point, the part you want is assigned to $term
puts "FOUND: $term"
}
}
The key things in the RE above? It's in braces to reduce backslash-itis. Literal square brackets are backslashed. The bit in parentheses is the bit we're capturing into the term variable. [\w*/]+ matches a sequence of one or more characters from a set consisting of “standard word characters” plus * and /.
The use of regexp has -> as a funny name for a variable that is ignored. I could have called it dummy instead; it's going to have the whole matched string in it when the RE matches, but we already have that in $term as we're using a fully-anchored RE. But I like using -> as a mnemonic for “assign the submatches into these”. Also, the formal result of regexp is the number of times the RE matched; without the -all option, that's effectively a boolean that is true exactly when there was a match, which is useful. Very useful.
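If you'd rather collect the terms into a single variable than print them, one variation is sketched below. It assumes the script text is already in memory (here a literal in the made-up variable scriptText, instead of reading a file) and uses the -line option so that ^ and $ anchor at each line:

```tcl
# Stand-in for the contents of the script file (braces prevent substitution)
set scriptText {set_dont_use [get_lib_cells */*CKGT*0P*] -power
set_dont_use [get_lib_cells */*CKTT*0P*] -setup}

set RE {^set_dont_use \[get_lib_cells ([\w*/]+)\] -\w+$}

# -all -inline returns whole-match/submatch pairs; keep only the submatches
set terms {}
foreach {whole term} [regexp -all -inline -line $RE $scriptText] {
    lappend terms $term
}
puts $terms
```

After the loop, terms is a list with one pattern per matching line, in file order.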
To assign the output of any command <command> to a variable with a name <name>, use set <name> [<command>]:
> set hello_length [string length hello]
5
> puts "The length of 'hello' is $hello_length."
The length of 'hello' is 5.
In your case, maybe this is what you want? (I still don't quite understand the question, though.)
set ckgt_cells [get_lib_cells */*CKGT*0P*]
set cktt_cells [get_lib_cells */*CKTT*0P*]

TCL RegExp IP exceptions

I have the following Tcl regexp to extract an exact IP from a line:
set ip [regexp -all -inline {((([2][5][0-5]|([2][0-4]|[1][0-9]|[0-9])?[0-9])\.){3})([2][5][0-5]|([2][0-4]|[1][0-9]|[0-9])?[0-9])} $ip_text]
I'm using it to analyze a log file, and it works fine, except it's also extracting the IP-like portion of a domain name when the domain name contains something in IP format (usually reversed), which I don't want.
eg when ip_text = Log File 61.140.142.192 - 2012-06-16, 192.142.140.61.broad.gz.gd.dynamic.163data.com.cn, CHN, 1
I get 61.140.142.192 & 192.142.140.61 but only 61.140.142.192 is legit.
and when ip_text = Entry "61.140.170.118" resolved from 118.170.140.61.broad.gz.gd.dynamic.163data.com.cn, and 61.140.185.45 verified.
I get 61.140.170.118, 118.170.140.61 & 61.140.185.45 but only 61.140.170.118 & 61.140.185.45 are legit.
Is there a way to make the regexp exclude IPs that have a domain name character after them? That is, exclude <IP><dot>, <IP><dash> or <IP><any alphanumeric character>.
You can use a negative lookahead constraint on the end of that RE. Those are written as (?!\.|\d) in this case, which matches when the next character is not a . or a digit (it also matches at the end of the string, when there's no next character at all). With complicated regular expressions it's often easier to save them in a variable (often global) since that effectively lets you name the RE.
set IPAddrRE {(((25[0-5]|(2[0-4]|1[0-9]|[1-9])?[0-9])\.){3})(25[0-5]|(2[0-4]|1[0-9]|[1-9])?[0-9])(?!\.|\d)}
set ip [regexp -all -inline $IPAddrRE $ip_text]
The reason you need to prevent the follower being a digit? Without that, the RE can stop matching one character earlier, allowing it to pick 192.142.140.6 out of your sample text as well as the value you actually want.
You should consider using non-capturing grouping for this task. Replacing (…) with (?:…) will allow the RE engine to use a more efficient matcher internally. On a lot of text, this will make a substantial difference. For example, with this version:
set IPAddrRE {(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9])?[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9])?[0-9])(?!\.|\d)}
I see that the time to execute is about half that of the version I listed in the first part of this answer (and about 40% of what your original version required). However, it produces different results (none of the bits that you probably don't require), so you'll need to adapt other code too:
% set ip [regexp -all -inline $IPAddrRE $ip_text]
61.140.142.192
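The question also asked about rejecting a dash or an alphanumeric character after the address. One way that might be done is to widen the lookahead to a bracket expression; this is only a sketch (the name strictIPAddrRE is made up), tried against the second sample line from the question:

```tcl
set ip_text {Entry "61.140.170.118" resolved from 118.170.140.61.broad.gz.gd.dynamic.163data.com.cn, and 61.140.185.45 verified.}

# Same octet pattern as above; the lookahead now rejects a following
# dash, dot, or word character (letters, digits, underscore).
set strictIPAddrRE {(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9])?[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9])?[0-9])(?![-.\w])}

set ip [regexp -all -inline $strictIPAddrRE $ip_text]
puts $ip   ;# 61.140.170.118 61.140.185.45
```

Be aware that this also rejects an address that ends a sentence and is directly followed by a full stop.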
It's often a good idea to dumb down your regular expressions instead of trying to make them smarter.
lmap candidate [regexp -inline -all {[\d.]+} $txt] {
if {[llength [split $candidate .]] == 4} {
set candidate
} else {
continue
}
}
will pick out the exact three numbers you wanted from your text.
Documentation: continue, if, llength, lmap, lmap replacement, Syntax of Tcl regular expressions, regexp, set, split
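The simple filter above only counts the dot-separated parts. If the range of each part matters too, a possible extension is sketched here (isIPv4 is a hypothetical helper name; Tcl 8.6 assumed for lmap):

```tcl
# Hypothetical helper: true if the candidate is four dot-separated
# decimal numbers, each between 0 and 255.
proc isIPv4 {candidate} {
    set parts [split $candidate .]
    if {[llength $parts] != 4} { return 0 }
    foreach p $parts {
        # 1 to 3 digits, then a numeric range check (scan reads it as decimal)
        if {![regexp {^\d{1,3}$} $p] || [scan $p %d] > 255} { return 0 }
    }
    return 1
}

set txt {Entry "61.140.170.118" resolved from 118.170.140.61.broad.gz.gd.dynamic.163data.com.cn, and 61.140.185.45 verified.}
set ips [lmap c [regexp -all -inline {[\d.]+} $txt] {
    if {![isIPv4 $c]} continue
    set c
}]
puts $ips   ;# 61.140.170.118 61.140.185.45
```

The reversed address embedded in the host name is picked up as 118.170.140.61. with its trailing dot, so it splits into five parts and is dropped.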

Parse ipv6 address in tcl

In TCL, I need to split an ipv6 address and port combination in the format [fec1::10]:80 to fec1::10 and 80.
Please suggest a way to do it.
Thanks!
(In the examples below I assume that the address will be subjected to further processing (expansion, etc) because there are a lot of forms that it can take: hence, in this preliminary stage I treat it simply as a string of any character rather than groups of hex digits separated by colons. The ip package mentioned by kostix is excellent for processing the address, just not for separating the address from the port number.)
Given the variable
set addrport {[fec1::10]:80}
There are several possible ways, including brute-force regular expression matching:
regexp -- {\[(.+)\]:(\d+)} $addrport -> addr port
(which means "capture a non-empty sequence of any character that is inside literal brackets, then skip a colon and thereafter capture a non-empty sequence of any digit"; the three variables at the end of the invocation get the whole match, the first captured submatch, and the second captured submatch, respectively)
(note 1: American usage of the word 'brackets' here: for British speakers I mean square brackets, not round brackets/parentheses)
(note 2: I'm using the code fragment -> in two ways: as a variable name in the above example, and as a commenting symbol denoting return value in some of the following examples. I hope you're not confused by it. Both usages are kind of a convention and are seen a lot in Tcl examples.)
regexp -inline -- {\[(.+)\]:(\d+)} $addrport
# -> {[fec1::10]:80} fec1::10 80
will instead give you a list with three elements (again, the whole match, the address, and the port).
Many programmers will stop looking for possible solutions here, but you're still with me, aren't you? Because there are more, possibly better, methods.
Another alternative is to convert the string to a two-element list (where the first element is the address and the second the port number):
split [string map {[ {} ]: { }} $addrport]
# -> fec1::10 80
(which means "replace any left brackets with empty strings (i.e. remove them) and any substrings that consist of a right bracket and a colon with a single space; then split the resulting string into a list")
it can be used to assign to variables like so:
lassign [split [string map {[ {} ]: { }} $addrport]] addr port
(which performs a sequential assign from the resulting list into two variables).
The scan command will also work:
scan $addrport {[%[^]]]:%d} addr port
(which means "after a left bracket, take a sequence of characters that does not include a right bracket, then skip a right bracket and a colon and then take a decimal number")
Want the result as a list instead?
scan $addrport {[%[^]]]:%d}
# -> fec1::10 80
Even split works, in a slightly roundabout way:
set list [split $addrport {[]:}]
# -> {} fec1 {} 10 {} 80
set addr [lindex $list 1]::[lindex $list 3]
set port [lindex $list 5]
(note: this will have to be rewritten for addresses that are expanded to more than two groups).
Take your pick, but remember to be wary of regular expressions. Quicker, easier, more seductive they are, but always bite you in the ass in the end, they will.
(Note: the 'Hoodiecrow' mentioned in the comments is me, I used that nick earlier. Also note that at the time this question appeared I was still sceptical towards the ip module: today I swear by it. One is never too old to learn, hopefully.)
The ip package from the Tcl standard library can do that, and more.
One of the simplest ways to parse these sorts of things is with scan. It's the command that many Tclers forget!
set toParse {[fec1::10]:80}
scan $toParse {[%[a-f0-9:]]:%d} ip port
puts "host is $ip and port is $port"
The trick is that you want “scan characters from a limited set”. And in production code you want to check the result of scan, which should be the number of groups matched (2 in this case).
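A sketch of what that check could look like, wrapped in a hypothetical helper called splitHostPort (this one uses a "anything but a right bracket" set rather than a hex-digit set, so it does no validation of the address itself):

```tcl
# Hypothetical helper: split "[address]:port" into a two-element list,
# raising an error when the input does not have that shape.
proc splitHostPort {text} {
    # scan returns the number of conversions made; both must succeed
    if {[scan $text {[%[^]]]:%d} host port] != 2} {
        error "not in \[address\]:port form: $text"
    }
    return [list $host $port]
}

lassign [splitHostPort {[fec1::10]:80}] host port
puts "host is $host and port is $port"
```

Input without the brackets, or without a numeric port, makes scan return fewer than 2 conversions and so triggers the error.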

Help needed in writing regular expression -- TCL

Just seeking a favour: I want to write a Tcl regular expression which matches all of the following strings.
i) ( XYZ XZZ XVZ XWZ )
Clue: the starting character X and the ending character Z are the same for all of the strings. Only the middle character differs: Y, Z, V or W.
My trial: [regexp {^X([Y|Z|V|W]*)Z$}]
I want to write another regexp which matches only the following string, wherever it occurs:
ii) (XYZ)
My trial: [regexp {^X([Y]*)Z$}] or simply regexp {^XYZ$}
Just want to make sure it's a correct approach. Is there any other way available to optimize the regexp :)
i) 1st Question Tested
set to_Match_Str "XYZ XZZ XVZ XWZ"
foreach {wholeStr to_Match_Str} [regexp -all -inline {X[YZVW]Z} $to_Match_Str] {
puts "MATCH $to_Match_Str in the list"
}
It prints only XZZ and XWZ from the list. It leaves out XYZ and XVZ.
When I include the parentheses, [regexp -all -inline {X([YZVW])Z} $to_Match_Str], it prints all the middle characters correctly: Y Z V W
i) (XYZ XZZ XVZ XWZ)
Clue: the starting character X and the ending character Z are the same for all of the strings. Only the middle character differs: Y, Z, V or W.
My trial: [regexp {^X([Y|Z|V|W]*)Z$}]
Assuming you're not after literal parentheses around the whole lot, you match that using this:
regexp {X([YZVW])Z} $string -> matchedSubstr
That's because the interior strings are all single characters. (It also stores the matched substring in the variable matchedSubstr; choose any variable name there that you want.) You should not use | inside a [] in a regular expression, as it has no special meaning there. (You might need to add ^$ anchors round the outside.)
On the other hand, if you want to match multiple character sequences (which the Y etc. are just stand-ins for) then you use this:
regexp {X(Y|Z|V|W)Z} $string -> matchedSubstr
Notice that | is being used here, but [] is not.
If your real string has many of these strings (whichever pattern you're using to match them) then the easiest way to extract them all is with the -all -inline options to regexp, typically used in a foreach like this:
foreach {wholeStr matchedSubstr} [regexp -all -inline {X([YZVW])Z} $string] {
puts "Hey! I found a $matchedSubstr in there!"
}
Mix and match to taste.
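To see why the number of loop variables has to follow the number of capture groups, it may help to look at what -all -inline actually returns in each case (a small sketch using the sample string from the question):

```tcl
set string "XYZ XZZ XVZ XWZ"

# No parentheses: one list element per match
set plain [regexp -all -inline {X[YZVW]Z} $string]
puts $plain      ;# XYZ XZZ XVZ XWZ

# One capture group: each whole match is followed by its submatch
set captured [regexp -all -inline {X([YZVW])Z} $string]
puts $captured   ;# XYZ Y XZZ Z XVZ V XWZ W

# So without parentheses, iterate with a single loop variable
foreach whole $plain {
    puts "MATCH $whole in the list"
}
```

Using two loop variables over the first list pairs up neighbouring matches, which is why only every second one shows up in the second variable.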
My trial: [regexp {^X([Y]*)Z$}] or simply regexp {^XYZ$}
Just want to make sure it's a correct approach. Is there any other way available to optimize the regexp :)
That's optimal for an exact comparison. And in fact Tcl will optimize that internally to a straight string equality test if that's literal.
My trial: [regexp {^X([Y|Z|V|W]*)Z$}]
That would match the strings given, but as you are using the * multiplier it would also match strings like "XZ", "XYYYYYYYYYYYYYYYYZ" and "XYZYVWZWWWZVYYWZ". To match the middle character only once, don't use a multiplier (and drop the | characters, which are literal inside a set and would let "X|Z" match):
^X([YZVW])Z$
My trial: [regexp {^X([Y]*)Z$}]
The same there, it will also match strings like "XZ", "XYYZ" and "XYYYYYYYYYYYYYYYYZ". Don't put a multiplier after the set:
^X([Y])Z$
or simply regexp {^XYZ$}
That matches, but it won't capture anything. To make it do the same as the other (capture the Y character), you need the parentheses:
^X(Y)Z$
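A quick way to compare these variants is to run each pattern over a few test strings. The sketch below does that for three forms of the first pattern; the labels star, pipe and plain are made up for the output:

```tcl
# Labels are illustrative: "star" keeps the * multiplier, "pipe" keeps
# | inside the set, "plain" is the set without either.
set patterns {
    star  {^X([Y|Z|V|W]*)Z$}
    pipe  {^X([Y|Z|V|W])Z$}
    plain {^X([YZVW])Z$}
}

set report {}
foreach s {XYZ XZ XYYZ X|Z} {
    set row {}
    dict for {name re} $patterns {
        lappend row "$name=[regexp -- $re $s]"
    }
    lappend report "${s}: [join $row { }]"
}
puts [join $report \n]
# XYZ: star=1 pipe=1 plain=1
# XZ: star=1 pipe=0 plain=0
# XYYZ: star=1 pipe=0 plain=0
# X|Z: star=1 pipe=1 plain=0
```

Only the plain set rejects all three unwanted strings, since | inside square brackets is just another character to match.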
You can use the Visual Regexp tool to help, it provides feedback as you construct your regular expression.