Parse IPv6 address in Tcl

In Tcl, I need to split an IPv6 address and port combination in the format [fec1::10]:80 into fec1::10 and 80.
Please suggest a way to do it.
Thanks!

(In the examples below I assume that the address will be subjected to further processing (expansion, etc.) because it can take many different forms; hence, at this preliminary stage I treat it simply as a string of arbitrary characters rather than as groups of hex digits separated by colons. The ip package mentioned by kostix is excellent for processing the address, just not for separating the address from the port number.)
Given the variable
set addrport {[fec1::10]:80}
There are several possible ways, including brute-force regular expression matching:
regexp -- {\[(.+)\]:(\d+)} $addrport -> addr port
(which means "capture a non-empty sequence of any character that is inside literal brackets, then skip a colon and thereafter capture a non-empty sequence of any digit"; the three variables at the end of the invocation get the whole match, the first captured submatch, and the second captured submatch, respectively)
(note 1: American usage of the word 'brackets' here: for British speakers I mean square brackets, not round brackets/parentheses)
(note 2: I'm using the code fragment -> in two ways: as a variable name in the above example, and as a commenting symbol denoting return value in some of the following examples. I hope you're not confused by it. Both usages are kind of a convention and are seen a lot in Tcl examples.)
regexp -inline -- {\[(.+)\]:(\d+)} $addrport
# -> {[fec1::10]:80} fec1::10 80
will instead give you a list with three elements (again, the whole match, the address, and the port).
Many programmers will stop looking for possible solutions here, but you're still with me, aren't you? Because there are more, possibly better, methods.
Another alternative is to convert the string to a two-element list (where the first element is the address and the second the port number):
split [string map {[ {} ]: { }} $addrport]
# -> fec1::10 80
(which means "replace any left brackets with empty strings (i.e. remove them) and any substrings that consist of a right bracket and a colon with a single space; then split the resulting string into a list")
It can be used to assign to variables like so:
lassign [split [string map {[ {} ]: { }} $addrport]] addr port
(which performs a sequential assign from the resulting list into two variables).
The scan command will also work:
scan $addrport {[%[^]]]:%d} addr port
(which means "after a left bracket, take a sequence of characters that does not include a right bracket, then skip a right bracket and a colon and then take a decimal number")
Want the result as a list instead?
scan $addrport {[%[^]]]:%d}
# -> fec1::10 80
Even split works, in a slightly roundabout way:
set list [split $addrport {[]:}]
# -> {} fec1 {} 10 {} 80
set addr [lindex $list 1]::[lindex $list 3]
set port [lindex $list 5]
(note: this will have to be rewritten for addresses that are expanded to more than two groups).
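As noted, the split version depends on how many groups the address has. One way around that, which swaps in string last and string range rather than any of the commands above, is to cut at the last "]:" so the colons inside the address no longer matter. A minimal sketch; the proc name splitAddrPort is hypothetical, and it does not validate the address or the port:

```tcl
# Hypothetical helper: split "[address]:port" at the LAST "]:" so that the
# number of colon-separated groups inside the address does not matter.
proc splitAddrPort {addrport} {
    set idx [string last "\]:" $addrport]
    if {[string index $addrport 0] ne "\[" || $idx < 0} {
        error "not in \[address\]:port form: $addrport"
    }
    # Characters 1 .. idx-1 are the address; everything after "]:" is the port.
    set addr [string range $addrport 1 [expr {$idx - 1}]]
    set port [string range $addrport [expr {$idx + 2}] end]
    return [list $addr $port]
}

lassign [splitAddrPort {[fec1:0:0:0:0:0:0:10]:80}] addr port
puts "$addr $port"   ;# fec1:0:0:0:0:0:0:10 80
```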
Take your pick, but remember to be wary of regular expressions. Quicker, easier, more seductive they are, but always bite you in the ass in the end, they will.
(Note: the 'Hoodiecrow' mentioned in the comments is me; I used that nick earlier. Also note that at the time this question appeared I was still sceptical of the ip module: today I swear by it. One is never too old to learn, hopefully.)

The ip package from the Tcl standard library can do that, and more.

One of the simplest ways to parse these sorts of things is with scan. It's the command that many Tclers forget!
set toParse {[fec1::10]:80}
scan $toParse {[%[a-f0-9:]]:%d} ip port
puts "host is $ip and port is $port"
The trick is that you want “scan characters from a limited set”. In production code you should also check the result of scan, which is the number of conversions performed (2 in this case).
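A sketch of that check, wrapped in a hypothetical parseAddrPort proc. It also adds A-F to the character set on the assumption that upper-case hex digits may occur; note that scan ignores anything after the port, so trailing junk is not rejected:

```tcl
# Hypothetical helper: parse "[address]:port" with scan and insist that
# both conversions succeeded before trusting the variables.
proc parseAddrPort {text} {
    set n [scan $text {[%[a-fA-F0-9:]]:%d} ip port]
    if {$n != 2} {
        error "could not parse \"$text\" (scan made $n conversions)"
    }
    return [list $ip $port]
}

puts [parseAddrPort {[fec1::10]:80}]          ;# fec1::10 80
puts [catch {parseAddrPort {fec1::10:80}} msg] ;# 1 (no leading bracket)
puts $msg
```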

Related

apparent inconsistency read/write variable

I'm just starting to learn Tcl. I've seen only a bit of it so far; for instance, to create a variable (and initialize it) you can do
set varname value
I am getting used to the fact that basically everything is a string, such as "value" above, but "varname" gets a kind of special treatment, I guess because of the "set" built-in command, so varname is interpreted not as a string but as a name.
I can later access the value with $varname, and that is fine with me: the $ specifies that varname is not to be considered as a string.
I'm now reading about lists, and a couple of commands confuse me a bit:
set colors {"aqua" "maroon" "cyan"}
puts "list length is [llength $colors]"
lappend colors "purple"
So clearly "lappend" is another command like set that interprets its first argument as a name rather than a string, but then why didn't they make llength work the same way (no need for $)?
I'm thinking that it's just a convention that, in general, when you "read" a variable you need the $ while you don't for "writing".
A different look at the question: what Tcl commands are appropriate for list literals?
It's valid to count the elements of a list literal:
llength {my dog has fleas}
But it doesn't make sense to append a new element to a literal
lappend {my dog has fleas} and ticks
(That is actually valid Tcl, but it sets the odd variable ${my dog has fleas})
This is more sensible:
set mydog {my dog has fleas}
lappend mydog and ticks
Names are strings. Or rather a string is a name because it is used as a name. And $ in Tcl means “read this variable right now”, unlike in some other languages where it really means “here is a variable name”.
The $blah syntax for reading from a variable is convenient syntax that approximately stands in for doing [set blah] (with just one argument). For simple names, they become the same bytecode, but the $… form doesn't handle all the weird edge cases (usually with generated names) that the other one does. If a command (such as set, lappend, unset or incr) takes a variable name, it's because it is going to write to that variable and it will typically be documented to take a varName (variable name, of course) or something like that. Things that just read the value (e.g., llength or lindex) will take the value directly and not the name of a variable, and it is up to the caller to provide the value using whatever they want, perhaps $blah or [call something].
In particular, if you have:
proc ListRangeBy {from to {by 1}} {
    set result {}
    for {set x $from} {$x <= $to} {incr x $by} {
        lappend result $x
    }
    return $result
}
then you can do:
llength [ListRangeBy 3 77 8]
and
set listVar [ListRangeBy 3 77 8]
llength $listVar
and get exactly the same value out of the llength. The llength doesn't need to know anything special about what is going on.
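A small illustration of the read/write distinction described above, using only core commands. The variable names are arbitrary:

```tcl
# Reading: llength takes a value, so the caller supplies it, here via $colors.
set colors {aqua maroon cyan}
puts [llength $colors]        ;# 3
# The one-argument form [set colors] reads the same value as $colors.
puts [llength [set colors]]   ;# 3

# Writing: lappend takes the *name* of the variable it will modify.
lappend colors purple
puts $colors                  ;# aqua maroon cyan purple

# Appending to a "literal" really appends to a variable with that odd name.
lappend {my dog has fleas} and ticks
puts [set {my dog has fleas}] ;# and ticks

# A generated name held in another variable: $$name does not do a double
# lookup, but [set $name] does (and ${my dog} works when the name is fixed).
set name "my dog"
set $name "has fleas"
puts [set $name]              ;# has fleas
puts ${my dog}                ;# has fleas
```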

Split camelcase value with TCL

I have this TCL expression:
[string toupper [join [lrange [file split [value [topnode].file]] 1 1]]]
This retrieves companyName value from c:/companyName... and I need to split that value before the first capital letter into Company Name. Any ideas?
Thanks in advance.
That's rather more in one word than I would consider a good idea. It makes the whole thing quite opaque! Let's split it up.
Firstly, I would expect the base company name to be better retrieved with lindex from the split filename.
set companyName [lindex [file split [value [topnode].file]] 1]
Now, we need to process that to get the human-readable version out of it. Alas, that's going to be a bit difficult without knowing what's been done to it, but if we use as our example fooBarBoo_grill then we can see what we can do. First, we get the pieces with some regular expressions (this part might need tweaking if there are non-ASCII characters involved, or if certain critical characters need special treatment):
# set companyName "fooBarBoo_grill"
set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $companyName]
# pieces = foo Bar Boo grill
Next, we need to capitalise. I'll assume you're using Tcl 8.6 and so have lmap as it is perfect for this task. The string totitle command has been around for a very long time.
set pieces [lmap word $pieces {string totitle $word}]
# pieces = Foo Bar Boo Grill
That list might need a bit more tweaking, or it might be OK as it is. An example of tweaking that might be necessary is if you've got an Irish name like O'Hanrahan, or if you need to insert a comma before and a period after Inc.
Finally, we properly ought to set companyName [join $pieces] to get back a true string, but that doesn't have a noticeable effect with a list of words made purely out of letters. Also, more complex joins with regular expressions might be needed if you've done insertion of prefixing punctuation (the , Inc. case).
If I was doing this for real, I'd try to have the proper company name expressed directly elsewhere rather than relying on the filename. Much simpler to get right!
To begin with, try using
lindex [file split [value [topnode].file]] 1
The lrange command will return a list, which might cause problems with some directory names. The join command should be pointless if you don't use lrange, and string toupper removes the information you need to do the operation you want to do.
To split before uppercase letters, you can use repetitive matches of either (?:[a-z]+|[A-Z][a-z]+) (ASCII / English alphabet letters only) or (?:[[:lower:]]+|[[:upper:]][[:lower:]]+) (any Unicode letters).
% regexp -all -inline {(?:[a-z]+|[A-Z][a-z]+)} camelCaseWord
camel Case Word
Use string totitle to change the first letter of the first word to upper case.
Documentation: file, lindex, regexp, string, Syntax of Tcl regular expressions
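Putting the pieces from both answers together: a minimal sketch, assuming ASCII letters only and Tcl 8.6 (for lmap). The proc name humanizeCamelCase is hypothetical; value and topnode belong to the asker's environment, so they are left out of the runnable part:

```tcl
# Hypothetical helper: "companyName" -> "Company Name".
# ASCII only (an assumption); use [[:lower:]] / [[:upper:]] for other letters.
proc humanizeCamelCase {word} {
    # Runs of lower case, or one upper-case letter followed by lower case.
    set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $word]
    # Capitalise each piece (lmap needs Tcl 8.6).
    set pieces [lmap piece $pieces {string totitle $piece}]
    return [join $pieces " "]
}

puts [humanizeCamelCase companyName]      ;# Company Name
puts [humanizeCamelCase fooBarBoo_grill]  ;# Foo Bar Boo Grill
```

In the original context it would be called as humanizeCamelCase [lindex [file split [value [topnode].file]] 1].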

Tcl quoting proc to sanitise string to pass to other shells

I want to pass a dict value to another shell (in my application it passes through a few 'shell' levels), and the dict contains characters (space, double quotes, etc) that cause issues.
I can use something like ::base64::encode -wrapchar $dict and the corresponding ::base64::decode $str and it works as expected but the result is, of course, pretty much unreadable.
However, for debugging & presentation reasons I would prefer an encoded/sanitised string that resembles the original dict value as far as is reasonable and uses a character set that avoids spaces, quotes, etc.
So, I am looking for something like the ::base64 mapping procs but with a lighter touch.
Any suggestions would be appreciated.
You can make lighter-touch quoting schemes using either string map or regsub to do the main work.
Here's an example of string map:
set input "O'Donnell's Bait Shop"
set quoted '[string map {' {'\''}} $input]'
puts $quoted
# ==> 'O'\''Donnell'\''s Bait Shop'
Here's an example of regsub:
set input "This uses a hypothetical quoting of some letters"
set quoted <[regsub -all {[pqr]} $input {«&»}]>
puts $quoted
# ==> <This uses a hy«p»othetical «q»uoting of some lette«r»s>
You'll need to decide what sort of quoting you really want to use. For myself, if I was going through several shells, I'd be wanting to avoid quoting at all (because it is difficult to get right) and instead find ways to send the data in some other way, perhaps over a pipeline or in a temporary file. At a pinch, I'd use an environment variable, as shells tend to not mess around with those nearly as much as arguments.
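If a readable but reversible encoding is still wanted, one lighter-touch option (not from the answer above) is percent-style escaping of just the troublesome characters, built on string map. A minimal sketch; the proc names and the exact character set are assumptions to adjust for the shells involved:

```tcl
# Hypothetical encode/decode pair: percent-style escapes for characters that
# tend to upset shells and Tcl parsing. The character set is an assumption.
set ::shellSafeMap [list \
    %  %25   " " %20   \t %09   \n %0A   \" %22   '  %27 \
    \\ %5C   \$  %24   `  %60   \; %3B   &  %26   |  %7C \
    <  %3C   >   %3E   (  %28   )  %29   \[ %5B   \] %5D \
    \{ %7B   \}  %7D   *  %2A   ?  %3F   ~  %7E   #  %23]

proc shellSafeEncode {str} {
    # All keys are single characters and string map never rescans its own
    # output, so escaping "%" in the same pass as the others is safe.
    string map $::shellSafeMap $str
}

proc shellSafeDecode {str} {
    # lreverse turns {k1 v1 k2 v2 ...} into {... v2 k2 v1 k1}, the inverse map.
    string map [lreverse $::shellSafeMap] $str
}

set d [dict create name {O'Donnell's "Bait" Shop} tags {a b}]
set enc [shellSafeEncode $d]
puts $enc
puts [expr {[shellSafeDecode $enc] eq $d}]   ;# 1
```

Because every literal % is escaped too, decoding is unambiguous, while letters, digits, and most other punctuation pass through unchanged, so the encoded dict stays fairly readable.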

How to grep parameters inside square brackets?

Could you please help me with the following script?
It is a Tcl script which Synopsys IC Compiler II will source.
set_dont_use [get_lib_cells */*CKGT*0P*] -power
set_dont_use [get_lib_cells */*CKTT*0P*] -setup
How can I extract only */*CKGT*0P* and */*CKTT*0P* and assign them to a variable?
Of course you can treat a Tcl script as something you search through; it's just a file with text in it after all.
Let's write a script to select the text out. It'll be a Tcl script, of course. For readability, I'm going to put the regular expression itself in a global variable; treat it like a constant. (In larger scripts, I find it helps a lot to give names to REs like this, as those names can be used to remind me of the purpose of the regular expression. I'll call it “RE” here.)
set f [open theScript.tcl]
# Even with 10 million lines, modern computers will chew through it rapidly
set lines [split [read $f] "\n"]
close $f
# This RE will match the sample lines you've told us about; it might need tuning
# for other inputs (and knowing what's best is part of the art of RE writing)
set RE {^set_dont_use \[get_lib_cells ([\w*/]+)\] -\w+$}
foreach line $lines {
    if {[regexp $RE $line -> term]} {
        # At this point, the part you want is assigned to $term
        puts "FOUND: $term"
    }
}
The key things in the RE above? It's in braces to reduce backslash-itis. Literal square brackets are backslashed. The bit in parentheses is the bit we're capturing into the term variable. [\w*/]+ matches a sequence of one or more characters from a set consisting of “standard word characters” plus * and /.
The use of regexp has -> as a funny name for a variable that is ignored. I could have called it dummy instead; it's going to have the whole matched string in it when the RE matches, but we already have that in $term as we're using a fully-anchored RE. But I like using -> as a mnemonic for “assign the submatches into these”. Also, the formal result of regexp is the number of times the RE matched; without the -all option, that's effectively a boolean that is true exactly when there was a match, which is useful. Very useful.
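If you want all the matches collected into a variable instead of printed, the same RE can be run over the whole text at once with regexp -all -inline -line. A sketch; the proc name dontUsePatterns and the sample text are illustrative:

```tcl
# Hypothetical helper: return every PATTERN found in lines of the form
# "set_dont_use [get_lib_cells PATTERN] -option".
proc dontUsePatterns {scriptText} {
    set RE {^set_dont_use \[get_lib_cells ([\w*/]+)\] -\w+$}
    set terms {}
    # -line makes ^ and $ match at each line; -all -inline returns
    # {whole-match submatch whole-match submatch ...}.
    foreach {-> term} [regexp -all -inline -line $RE $scriptText] {
        lappend terms $term
    }
    return $terms
}

set text "set_dont_use \[get_lib_cells */*CKGT*0P*\] -power\nset_dont_use \[get_lib_cells */*CKTT*0P*\] -setup"
set patterns [dontUsePatterns $text]
puts $patterns   ;# */*CKGT*0P* */*CKTT*0P*
lassign $patterns ckgt_pattern cktt_pattern
```

To use it on the real file, pass it [read $f] from the channel opened above.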
To assign the output of any command <command> to a variable with a name <name>, use set <name> [<command>]:
> set hello_length [string length hello]
5
> puts "The length of 'hello' is $hello_length."
The length of 'hello' is 5.
In your case, maybe this is what you want? (I still don't quite understand the question, though.)
set ckgt_cells [get_lib_cells */*CKGT*0P*]
set cktt_cells [get_lib_cells */*CKTT*0P*]

TCL RegExp IP exceptions

I have the following TCl regexp to extract an exact IP from a line:
set ip [regexp -all -inline {((([2][5][0-5]|([2][0-4]|[1][0-9]|[0-9])?[0-9])\.){3})([2][5][0-5]|([2][0-4]|[1][0-9]|[0-9])?[0-9])} $ip_text]
I'm using it to analyze a log file, and it works fine, except that it also extracts the IP portion of the domain name when the domain name contains an IP-like pattern (usually reversed), which I don't want.
For example, when ip_text = Log File 61.140.142.192 - 2012-06-16, 192.142.140.61.broad.gz.gd.dynamic.163data.com.cn, CHN, 1
I get 61.140.142.192 & 192.142.140.61 but only 61.140.142.192 is legit.
and when ip_text = Entry "61.140.170.118" resolved from 118.170.140.61.broad.gz.gd.dynamic.163data.com.cn, and 61.140.185.45 verified.
I get 61.140.170.118, 118.170.140.61 & 164.111.111.34 but only 61.140.170.118 & 61.140.185.45 are legit.
Is there a way to make the regexp exclude IPs that have a domain-name character after them? i.e. exclude <IP><dot> or <IP><dash> or <IP><any alphanumeric character>
You can use a negative lookahead constraint on the end of that RE. Those are written as (?!\.|\d) in this case, which matches when the next character is not a . or a digit (it also matches at the end of the string, when there's no next character at all). With complicated regular expressions it's often easier to save them in a variable (often global) since that effectively lets you name the RE.
set IPAddrRE {(((25[0-5]|(2[0-4]|1[0-9]|[1-9])?[0-9])\.){3})(25[0-5]|(2[0-4]|1[0-9]|[1-9])?[0-9])(?!\.|\d)}
set ip [regexp -all -inline $IPAddrRE $ip_text]
The reason you need to prevent the follower being a digit? Without that, the RE can stop matching one character earlier, allowing it to pick 192.142.140.6 out of your sample text as well as the value you actually want.
You should consider using non-capturing grouping for this task. Replacing (…) with (?:…) will allow the RE engine to use a more efficient matcher internally. On a lot of text, this will make a substantial difference. For example, with this version:
set IPAddrRE {(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9])?[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9])?[0-9])(?!\.|\d)}
I see that the time to execute is about half what the version I listed in the first part of this answer is (and about 40% of what your original version required). However, it produces different results — none of the bits that you probably don't require — so you'll need to adapt other code too:
% set ip [regexp -all -inline $IPAddrRE $ip_text]
61.140.142.192
It's often a good idea to dumb down your regular expressions instead of trying to make them smarter.
lmap candidate [regexp -inline -all {[\d.]+} $ip_text] {
    if {[llength [split $candidate .]] == 4} {
        set candidate
    } else {
        continue
    }
}
will pick out exactly the addresses you wanted: 61.140.142.192 from your first sample, and 61.140.170.118 and 61.140.185.45 from your second.
Documentation: continue, if, llength, lmap, lmap replacement, Syntax of Tcl regular expressions, regexp, set, split
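The "dumb" approach above keeps anything with four dot-separated parts, so it would also accept something like 999.1.2.3. A sketch that adds a plain-Tcl range check on each part; the proc name extractIPv4 is hypothetical, and the two sample lines are the ones from the question:

```tcl
# Hypothetical helper: "dumb" candidate extraction, then plain Tcl checks.
# A candidate is kept only if it has exactly four dot-separated parts,
# each 1-3 digits with a value of at most 255.
proc extractIPv4 {text} {
    lmap candidate [regexp -all -inline {[\d.]+} $text] {
        set parts [split $candidate .]
        if {[llength $parts] != 4} continue
        set ok 1
        foreach part $parts {
            # scan %d reads "08" as decimal 8, avoiding octal surprises
            if {![regexp {^\d{1,3}$} $part] || [scan $part %d] > 255} {
                set ok 0
                break
            }
        }
        if {!$ok} continue
        set candidate
    }
}

set s1 {Log File 61.140.142.192 - 2012-06-16, 192.142.140.61.broad.gz.gd.dynamic.163data.com.cn, CHN, 1}
set s2 {Entry "61.140.170.118" resolved from 118.170.140.61.broad.gz.gd.dynamic.163data.com.cn, and 61.140.185.45 verified.}
puts [extractIPv4 $s1]   ;# 61.140.142.192
puts [extractIPv4 $s2]   ;# 61.140.170.118 61.140.185.45
```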