not able to understand output of regexp in tcl


Please explain the output of this Tcl command; I don't understand the result.
In tclsh:
set line "Clock Domain: clk"
regexp {Clock Domain:\s*(.+)} $line tmp1 Pnr_clk
$tmp1 = "Clock Domain: clk"
$Pnr_clk = clk
How are these values assigned?

The Tcl regexp command is documented to assign the submatches to the variables whose names you provide. The first such variable you give is tmp1, which gets the whole string that the overall RE matched (which might be a substring of the overall input string; Tcl's RE engine does not anchor matches by default). The second such variable is Pnr_clk, which gets what the first parenthesized sub-RE matches, which in this case is clk because the \s* before the parenthesis greedily consumed the whitespace after Clock Domain:.
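A minimal sketch of the behaviour described above, in case it helps to see it run. The variable names whole, clock, noisy, whole2 and clock2 are made up for illustration; the pattern is the one from the question.

```tcl
set line "Clock Domain: clk"

# regexp returns 1 on a match and fills the named variables:
# first the whole matched text, then one variable per parenthesized group.
set matched [regexp {Clock Domain:\s*(.+)} $line whole clock]
puts "matched=$matched whole='$whole' clock='$clock'"

# The match is not anchored, so leading text that the RE does not
# cover is simply left out of the "whole match" variable.
set noisy "INFO Clock Domain:    clk"
regexp {Clock Domain:\s*(.+)} $noisy whole2 clock2
puts "whole2='$whole2' clock2='$clock2'"
```

In the second call the \s* swallows all four spaces, so the capture group still starts at clk.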

Related

String.IndexOf() returns unexpected value - cannot extract substring between two search strings

Script to manipulate some proper names in a web story to help my reading tool pronounce them correctly.
I get the content of a webpage via
$webpage = (Invoke-WebRequest -URI 'https://wanderinginn.com/2018/03/20/4-20-e/').Content
This $webpage should be of type String.
Now
$webpage.IndexOf('<div class="entry-content">')
returns correct value, yet
$webpage.IndexOf("Previous Chapter")
returns unexpected value and I need some explanation why or how I can find the error myself.
In theory it should cut the "body" of the page run it through a list of proper nouns I want to Replace and push this into a htm-file.
It all works, but the value of IndexOf("Prev...") does not.
Edit:
After invoke-webrequest I can
Set-Clipboard $webrequest
and post this in notepad++, there I can find both 'div class="entry-content"' and 'Previous Chapter'.
If I do something like
Set-Clipboard $webpage.substring(
$webpage.IndexOf('<div class="entry-content">'),
$webpage.IndexOf('PreviousChapter')
)
I would expect Powershell to correctly determine both first instances of those strings and cut between. Therefore my clipboard should now have my desired content, yet the string goes further than the first occurrence.

tl;dr
You had a misconception about how String.Substring() method works: the second argument must be the length of the substring to extract, not the end index (character position) - see below.
As an alternative, you can use a more concise (albeit more complex) regex operation with -replace to extract the substring of interest in a single operation - see below.
Overall, it's better to use an HTML parser to extract the desired information, because string processing is brittle (HTML allows variations in whitespace, quoting style, ...).
As Lee_Dailey points out, you had a misconception about how the String.Substring() method works: its arguments are:
a starting index (0-based character position),
from which a substring of a given length should be returned.
Instead, you tried to pass another index as the length argument.
To fix this, you must subtract the lower index from the higher one, so as to obtain the length of the substring you want to extract:
A simplified example:
# Sample input from which to extract the substring
# '>>this up to here'
# or, better,
# 'this up to here'.
$webpage = 'Return from >>this up to here<<'
# WRONG (your attempt):
# *index* of 2nd substring is mistakenly used as the *length* of the
# substring to extract, which in this case even *breaks*, because a length
# that exceeds the bounds of the string is specified.
$webpage.Substring(
  $webpage.IndexOf('>>'),
  $webpage.IndexOf('<<')
)
# OK, extracts '>>this up to here'
# The difference between the two indices is the correct length
# of the substring to extract.
$webpage.Substring(
  ($firstIndex = $webpage.IndexOf('>>')),
  $webpage.IndexOf('<<') - $firstIndex
)
# BETTER, extracts 'this up to here'
$startDelimiter = '>>'
$endDelimiter = '<<'
$webpage.Substring(
  ($firstIndex = $webpage.IndexOf($startDelimiter) + $startDelimiter.Length),
  $webpage.IndexOf($endDelimiter) - $firstIndex
)
General caveats re .Substring():
In the following cases this .NET method throws an exception, which PowerShell surfaces as a statement-terminating error; that is, by default the statement itself is terminated, but execution continues:
If you specify an index that is outside the bounds of the string (a 0-based character position that is less than 0 or greater than the length of the string):
'abc'.Substring(4) # ERROR "startIndex cannot be larger than length of string"
If you specify a length whose endpoint would fall outside the bounds of the string (if the index plus the length yields an index that is greater than the length of the string).
'abc'.Substring(1, 3) # ERROR "Index and length must refer to a location within the string"
That said, you could use a single regex (regular expression) to extract the substring of interest, via the -replace operator:
$webpage = 'Return from >>this up to here<<'
# Outputs 'this up to here'
$webpage -replace '^.*?>>(.*?)<<.*', '$1'
The key is to have the regex match the entire string and extract the substring of interest via a capture group ((...)) whose value ($1) can then be used as the replacement string, effectively returning just that.
For more information about -replace, see this answer.
Note: In your specific case an additional tweak is needed, because you're dealing with a multiline string:
$webpage -replace '(?s).*?<div class="entry-content">(.*?)Previous Chapter.*', '$1'
Inline option ((?...)) s ensures that metacharacter . also matches newline characters (so that .* matches across lines), which it doesn't by default.
Note that you may have to apply escaping to the search strings to embed in the regex, if they happen to contain regex metacharacters (characters with special meaning in the context of a regex):
With embedded literal strings, \-escape characters as needed; e.g., escape .txt as \.txt
If a string to embed comes from a variable, apply [regex]::Escape() to its value first; e.g.:
$var = '.txt'
# [regex]::Escape() yields '\.txt', which ensures
# that '.txt' doesn't also match '_txt'
'a_txt a.txt' -replace ('a' + [regex]::Escape($var)), 'a.csv'

How to grep parameters inside square brackets?

Could you please help me with the following script?
It is a Tcl script which Synopsys IC Compiler II will source.
set_dont_use [get_lib_cells */*CKGT*0P*] -power
set_dont_use [get_lib_cells */*CKTT*0P*] -setup
May I know how to take only */*CKGT*0P* and */*CKTT*0P* and assign these to a variable.

Of course you can treat a Tcl script as something you search through; it's just a file with text in it after all.
Let's write a script to select the text out. It'll be a Tcl script, of course. For readability, I'm going to put the regular expression itself in a global variable; treat it like a constant. (In larger scripts, I find it helps a lot to give names to REs like this, as those names can be used to remind me of the purpose of the regular expression. I'll call it “RE” here.)
set f [open theScript.tcl]
# Even with 10 million lines, modern computers will chew through it rapidly
set lines [split [read $f] "\n"]
close $f
# This RE will match the sample lines you've told us about; it might need tuning
# for other inputs (and knowing what's best is part of the art of RE writing)
set RE {^set_dont_use \[get_lib_cells ([\w*/]+)\] -\w+$}
foreach line $lines {
    if {[regexp $RE $line -> term]} {
        # At this point, the part you want is assigned to $term
        puts "FOUND: $term"
    }
}
The key things in the RE above? It's in braces to reduce backslash-itis. Literal square brackets are backslashed. The bit in parentheses is the bit we're capturing into the term variable. [\w*/]+ matches a sequence of one or more characters from a set consisting of “standard word characters” plus * and /.
The use of regexp has -> as a funny name for a variable that is ignored. I could have called it dummy instead; it's going to have the whole matched string in it when the RE matches, but we already have that in $term as we're using a fully-anchored RE. But I like using -> as a mnemonic for “assign the submatches into these”. Also, the formal result of regexp is the number of times the RE matched; without the -all option, that's effectively a boolean that is true exactly when there was a match, which is useful. Very useful.
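To try the RE without a file on disk, here is a small self-contained sketch that runs it over the two sample lines from the question, held in a string. The variable names script and patterns, and the extra non-matching third line, are assumptions made for the demo.

```tcl
# Sample input: the two lines from the question plus one line
# that should NOT match (made up for the demo).
set script {set_dont_use [get_lib_cells */*CKGT*0P*] -power
set_dont_use [get_lib_cells */*CKTT*0P*] -setup
set some_other_setting true}

set RE {^set_dont_use \[get_lib_cells ([\w*/]+)\] -\w+$}

# Collect every captured pattern into a list
set patterns {}
foreach line [split $script "\n"] {
    if {[regexp $RE $line -> term]} {
        lappend patterns $term
    }
}
puts [join $patterns ", "]
```

This should print the two cell patterns separated by a comma; the third line is skipped because the anchored RE does not match it.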

To assign the output of any command <command> to a variable with a name <name>, use set <name> [<command>]:
> set hello_length [string length hello]
5
> puts "The length of 'hello' is $hello_length."
The length of 'hello' is 5.
In your case, maybe this is what you want? (I still don't quite understand the question, though.)
set ckgt_cells [get_lib_cells */*CKGT*0P*]
set cktt_cells [get_lib_cells */*CKTT*0P*]

tcl scripts, struggling with [...] and [expr ...]

I can't understand how assignments and use of variables work in Tcl.
Namely:
If I do something like
set a 5
set b 10
and I do
set c [$a + $b]
Following what internet says:
You obtain the results of a command by placing the command in square
brackets ([]). This is the functional equivalent of the back single
quote (`) in sh programming, or using the return value of a function
in C.
So my statement should set c to 15, right?
If yes, what's the difference with
set c [expr $a + $b]
?
If no, what does that statement do?

Tcl's a really strict language at its core; it always follows the rules. For your case, we can therefore analyse it like this:
set c [$a + $b]
That's three words, set (i.e., the standard “write to a variable” command), c, and what we get from evaluating the contents of the brackets in [$a + $b]. That in turn is a script formed by a single command invocation with another three words, the contents of the a variable (5), +, and the contents of the b variable (10). That the values look like numbers is irrelevant: the rules are the same in all cases.
Since you probably haven't got a command called 5, that will give you an error. On the other hand, if you did this beforehand:
proc 5 {x y} {
    return "flarblegarble fleek"
}
then your script would “work”, writing some (clearly defined) utter nonsense words into the c variable. If you want to evaluate a somewhat mathematical expression, you use the expr command; that's its one job in life, to concatenate all its arguments (with a space between them) and evaluate the result as an expression using the documented little expression language that it understands.
You virtually always want to put braces around the expression, FWIW.
There are other ways to make what you wrote do what you expect, but don't do them. They're slow. OTOH, if you are willing to put the + first, you can make stuff go fast with minimum interference:
# Get extra commands available for Lisp-like math...
namespace path ::tcl::mathop
set c [+ $a $b]
If you're not a fan of Lisp-style prefix math, use expr. It's what most Tcl programmers do, after all.
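Putting the three variants side by side, as a rough sketch (the names status, msg, viaExpr and viaMathop are just for the demo, and it assumes no command called 5 has been defined):

```tcl
set a 5
set b 10

# [$a + $b] tries to run a command named "5"; catch traps the error
set status [catch {set c [$a + $b]} msg]
puts "status=$status msg=$msg"

# expr evaluates the braced expression
set viaExpr [expr {$a + $b}]
puts "expr: $viaExpr"

# With ::tcl::mathop on the namespace path, + is an ordinary prefix command
namespace path ::tcl::mathop
set viaMathop [+ $a $b]
puts "mathop: $viaMathop"
```

Both working forms give 15; the first form fails with invalid command name "5".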

set c [$a + $b]
Running the above command gives the error message invalid command name "5".
For mathematical operations, use expr, because Tcl treats every value as a string.
set c [expr $a + $b]
In this case, the value of a and b is passed and addition is performed.
Here, it is always safe and recommended to brace the expressions as,
set c [expr {$a+$b}]
To avoid any possible surprises in the evaluation.
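One kind of surprise, as a small sketch: when a variable holds something other than a plain number, the unbraced form lets Tcl substitute it first and expr then parses the resulting text all over again. The value "1 + 2" and the names unbraced, status and msg are made up for illustration.

```tcl
set a "1 + 2"

# Unbraced: Tcl substitutes $a first, so expr sees the text: 1 + 2 * 3
set unbraced [expr $a * 3]
puts "unbraced: $unbraced"

# Braced: expr reads $a itself as ONE operand; "1 + 2" is not a number,
# so this raises an error instead of silently computing something else.
set status [catch {expr {$a * 3}} msg]
puts "braced status: $status"
```

The unbraced call yields 7 (not 9), while the braced call reports an error, which is usually what you want when the input is not what you expected.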
Update 1 :
In Tcl, everything is based on commands. A command can be a user-defined proc or an existing built-in command such as lindex. The first word of a command is always treated as the command name, and the contents of [ and ] are evaluated as a command in the same way.
In your case, $a is replaced with the value of the variable a, and because it is the first word inside the square brackets, Tcl tries to call a command named 5. There is no such command, so you get the error.

Can we give array name with hyphen in TCL

I am declaring a array in TCL say
set JDSU-12-1(key) element
parray JDSU-12-1
I am getting error saying JDSU is not a array
Even simple puts statement is not working
% puts $JDSU-12-1(key)
can't read "JDSU": no such variable
Is there any way i can declare array name with hyphen. I know _ works in array but not sure about hyphen

You can use special characters in Tcl variable names. You need the braces for those though:
% puts ${JDSU-12-1(key)}
element
You can even use $:
% set \$word "Hello world" ;# Or set {$word} "Hello world"
% puts ${$word}
Hello world
EDIT: Some reference:
beedub.com (Emphasis mine)
The set command is used to assign a value to a variable. It takes two arguments: the first is the name of the variable and the second is the value. Variable names can be any length, and case is significant. In fact, you can use any character in a variable name.

You can use almost any character for the name of a variable in Tcl — the only restrictions relate to :: as that is a namespace separator, and ( as that is used for arrays — but the $ syntax is more restrictive; the name it accepts (without using the ${…} form) has to consist of just ASCII letters, ASCII digits, underscores or namespace separators. Dashes aren't on that list.
The standard (and simplest) way of reading from a variable with a “weird” name is to use set with only one argument, as that happily accepts any legal variable name at all:
puts "the element is '[set JDSU-12-1(key)]'"
However, if you're doing this a lot it is actually easier to make an alias to the (array) variable name:
upvar 0 JDSU-12-1 theArray
puts "the element is $theArray(key)"
That's exactly how parray does it, though it uses upvar 1 because it is aliasing to a variable in the calling scope and not in the current scope.
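A short sketch of the access styles mentioned so far, run at the global level (the names viaBraces, viaSet and viaAlias are made up for the demo):

```tcl
set JDSU-12-1(key) element

# ${...} accepts any name, including the (key) part
set viaBraces ${JDSU-12-1(key)}

# One-argument set reads any legal variable name
set viaSet [set JDSU-12-1(key)]

# upvar 0 creates a local alias with a $-friendly name
upvar 0 JDSU-12-1 theArray
set viaAlias $theArray(key)

puts "$viaBraces $viaSet $viaAlias"
```

All three reads return element, and writes through theArray also land in the original array.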

Although you can use such special characters, you can only use a few when you try to access variables with $varname.
To quote the relevant section from the manual:
$name
Name is the name of a scalar variable; the name is a sequence of one or more characters that are a letter, digit, underscore, or namespace separators (two or more colons). Letters and digits are only the standard ASCII ones (0-9, A-Z and a-z).
$name(index)
Name gives the name of an array variable and index gives the name of an element within that array. Name must contain only letters, digits, underscores, and namespace separators, and may be an empty string. Letters and digits are only the standard ASCII ones (0-9, A-Z and a-z). Command substitutions, variable substitutions, and backslash substitutions are performed on the characters of index.
${name}
Name is the name of a scalar variable or array element. It may contain any characters whatsoever except for close braces. It indicates an array element if name is in the form “arrayName(index)” where arrayName does not contain any open parenthesis characters, “(”, or close brace characters, “}”, and index can be any sequence of characters except for close brace characters. No further substitutions are performed during the parsing of name.
There may be any number of variable substitutions in a single word. Variable substitution is not performed on words enclosed in braces.
Note that variables may contain character sequences other than those listed above, but in that case other mechanisms must be used to access them (e.g., via the set command's single-argument form).
I want to emphasize the last paragraph a bit:
You are always able to read any variable with set varname:
set JDSU-12-1(key) element
puts [set JDSU-12-1(key)]
Unlike the ${varname} form, set lets you substitute part of the variable name (in your case the array key) or the entire name: set k "key"; puts [set JDSU-12-1($k)] works, while puts ${JDSU-12-1($k)} does not, because no substitution is performed inside ${...}.
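The difference with a computed array key can be checked with a sketch like this (k, viaSet, status and msg are illustrative names):

```tcl
set JDSU-12-1(key) element
set k key

# $k is substituted before set sees the name, so this reads element "key"
set viaSet [set JDSU-12-1($k)]
puts "set form: $viaSet"

# Inside ${...} nothing is substituted, so Tcl looks for an element
# literally named "$k", which does not exist.
set status [catch {puts ${JDSU-12-1($k)}} msg]
puts "brace form status: $status"
```

The set form returns element; the brace form fails because the array has no element whose name is the two characters $k.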

You can easily do that:
set set-var "test"
and then access it with ${set-var}

Unlike in most other programming languages, a Tcl variable name may contain almost any character, including a hyphen or dash (-). The restriction is in the $ substitution syntax, which only accepts ASCII letters, digits, underscores and namespace separators: $x-1 is always read as the variable x followed by the literal text -1, never as a variable named "x-1". To read a variable with such a name, use ${x-1} or [set x-1].

Try this :)
subst $\{[subst ${conn}](phan)\}

Which version are you using?
It works in my Tcl:
% set JDSU-12-1(key) element
element
% parray JDSU-12-1
JDSU-12-1(key) = element

changing csh regexp match code to tcl

I need to change the following piece of code in shell to tcl. Please help.
if (` expr $_f : proj_lp_ ` == 8) then
I need the tcl equivalent of the condition inside the if condition.
Thanks!

See the expr manual page, where it states:
STRING : REGEXP
anchored pattern match of REGEXP in STRING
So your _f variable holds a string and you are comparing it with the literal proj_lp_. The result is the length of the match. In Tcl code that could be if {[regexp {^proj_lp_} $_f]} { ...} as you only care whether it matches. You could also just use if {[string match "proj_lp_*" $_f]} {...}. The expr(1) page says this is an anchored regexp, hence the added caret. Both of the examples I have given will only match at the start of the input string (i.e., they are anchored).
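A quick sketch comparing the two tests on a few made-up values of _f (the sample strings and the results list are assumptions for the demo):

```tcl
set results {}
foreach _f {proj_lp_top my_proj_lp_ proj_lp} {
    # Anchored regexp: 1 only if the string starts with proj_lp_
    set byRegexp [regexp {^proj_lp_} $_f]
    # Glob-style match: same test without a regular expression
    set byGlob [string match "proj_lp_*" $_f]
    lappend results $byRegexp $byGlob
    puts [format "%s: regexp=%d glob=%d" $_f $byRegexp $byGlob]
}
```

Only proj_lp_top passes both tests; my_proj_lp_ contains the prefix but not at the start, and proj_lp is missing the trailing underscore.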
