I have single letters and numbers in a variable that I would like to remove
example inputs:
USA-2019-1-aoiwer
USA-A-jowerasf
BB-a_owierlasdf-2019
flsfwer_5_2015-asfdlwer
desired outputs:
USA-2019--aoiwer
USA--jowerasf
BB-_owierlasdf-2019
flsfwer__2015-asfdlwer
my code:
bind pub "-|-" !aa proc:aa
proc proc:aa { nick host handle channel arg } {
set line [lindex $arg 0]
set line [string map {[a-z] """} $line]
set line [string map {[0-9] """} $line]
putnow "PRIVMSG $channel :$line"
}
Unfortunately that does not work, and I have no other ideas.
Regards
string map cannot do this: it replaces literal substrings, not patterns, so a mapping like {[a-z] ""} would only replace the literal text [a-z]. You also have unbalanced quotes, which cause a syntax error when the proc body is evaluated.
I would recommend using regsub. The hard part, however, would be to get a proper expression to do the task. I will suggest the following:
bind pub "-|-" !aa proc:aa
proc proc:aa { nick host handle channel arg } {
    set line [lindex $arg 0]
    regsub -nocase -all {([^a-z0-9]|\y)[a-z0-9]([^a-z0-9]|\y)} $line {\1\2} line
    putnow "PRIVMSG $channel :$line"
}
Basically, ([^a-z0-9]|\y) matches either a non-alphanumeric character or a word boundary (which can match at the start or end of the string, for example), and stores what it matched (this is the purpose of the parens).
The matched groups are stored in order starting with 1, so in the replace portion of regsub, I'm placing the parts that shouldn't be replaced back where they were.
The above should work fine.
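As a quick check, here is a minimal sketch that runs the expression over the sample inputs from the question. The proc name removeSingles is made up for this illustration and is not part of the bot script above.

```tcl
# Hypothetical helper wrapping the regsub call from the answer above.
proc removeSingles {text} {
    set re {([^a-z0-9]|\y)[a-z0-9]([^a-z0-9]|\y)}
    # \1 and \2 put back whatever surrounded the single character.
    return [regsub -nocase -all $re $text {\1\2}]
}

foreach input {
    USA-2019-1-aoiwer
    USA-A-jowerasf
    BB-a_owierlasdf-2019
    flsfwer_5_2015-asfdlwer
} {
    puts "$input => [removeSingles $input]"
}
```

The results should line up with the desired outputs listed in the question.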
You could technically go a little fancier with a slightly different expression:
regsub -nocase -all {([^a-z0-9]|\y)[a-z0-9](?![a-z0-9])} $line {\1} line
Which uses a negative lookahead ((?! ... )).
Anyway, if you do want to get more in depth, I recommend reading the manual on regular expression syntax
I have tried to split the string, but it still fails.
set strdata "34a64323R6662w0332665323020346t534r66662v43037333444533053534a64323R6662w0332665323020346t534r66662v430373334445330535"
puts [split $strdata "3334445330535"] ;#<---- this command does not work
The result needed as below:
{34a64323R6662w0332665323020346t534r66662v43037} {34a64323R6662w0332665323020346t534r66662v43037}
The split command's optional second argument is interpreted as a set of characters to split on, so it really isn't going to do what you want. However, there are other approaches. One of the simpler methods of doing what you want is to use string map to convert the character sequence into a character that isn't in the input data (Unicode is full of those!) and then split on that:
set strdata "34a64323R6662w0332665323020346t534r66662v43037333444533053534a64323R6662w0332665323020346t534r66662v430373334445330535"
set splitterm "3334445330535"
set items [split [string map [list $splitterm "\uFFFF"] $strdata] "\uFFFF"]
foreach i $items {
    puts "==> $i"
}
# ==> 34a64323R6662w0332665323020346t534r66662v43037
# ==> 34a64323R6662w0332665323020346t534r66662v43037
# ==> {}
Note that there is a {} (i.e., an empty-string list element) at the end because that's the string that came after the last split element. If you don't want that, add a string trimright between the string map and the split:
# Doing this in steps because the line is a bit long otherwise
set mapped [string map [list $splitterm "\uFFFF"] $strdata]
set trimmed [string trimright $mapped "\uFFFF"]
set items [split $trimmed "\uFFFF"]
The split command doesn't work like that, see the documentation.
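A small sketch with made-up data shows what split does with a multi-character second argument: each character is treated as a separator on its own.

```tcl
# The second argument is a set of separator characters, not a substring.
set parts [split "abcXYdef" "XY"]
puts [llength $parts]   ;# 3: abc, an empty string between X and Y, and def
puts $parts
```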
Try making the data string into a list like this:
regsub -all 3334445330535 $strdata " "
i.e. replacing the delimiter with a space.
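A short sketch of that idea, using made-up data with 123 as the delimiter. Note the assumption: treating the result as a list is only safe when the data contains no whitespace or other characters that are special in Tcl lists (braces, quotes, backslashes).

```tcl
set data "ab123cd123"                    ;# hypothetical data, delimiter is 123
set items [regsub -all 123 $data " "]    ;# the string "ab cd " read as a list
puts [llength $items]                    ;# 2 (the trailing space adds no element)
puts [lindex $items 1]                   ;# cd
```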
Documentation:
regsub,
split
How can I remove a part of the text file if the pattern I am searching is matched?
eg:
pg_pin (VSS) {
direction : inout;
pg_type : primary_ground;
related_bias_pin : "VBN";
voltage_name : "VSS";
}
leakage_power () {
value : 0;
when : "A1&A2&X";
related_pg_pin : VBN;
}
My pattern is related_pg_pin. If this pattern is found, I want to remove that particular section (starting from leakage_power () { up to and including the closing brace }).
proc getSection f {
    set section ""
    set inSection false
    while {[gets $f line] >= 0} {
        if {$inSection} {
            append section $line\n
            # find the end of the section (a single right brace, #x7d)
            if {[string match \x7d [string trim $line]]} {
                return $section
            }
        } else {
            # find the beginning of the section, with a left brace (#x7b) at the end
            if {[string match *\x7b [string trim $line]]} {
                append section $line\n
                set inSection true
            }
        }
    }
    return
}
set f [open data.txt]
set g [open output.txt w]
set section [getSection $f]
while {$section ne {}} {
    if {![regexp related_pg_pin $section]} {
        puts $g $section
    }
    set section [getSection $f]
}
close $f
close $g
Starting with the last paragraph of the code, we open a file for reading (through the channel $f) and then get a section. (The procedure to get a section is a little bit convoluted, so it goes into a command procedure to be out of the way.) As long as non-empty sections keep coming, we check if the pattern occurs: if not, we print the section to the output file through the channel $g. Then we get the next section and go to the next iteration.
To get a section, first assume we haven't yet seen any part of a section. Then we keep reading lines until the end of the file is found. If a line ending with a left brace is found, we add it to the section and take a note that we are now in a section. From then on, we add every line to the section. If a line consisting of a single right brace is found, we quit the procedure and deliver the section to the caller.
Documentation:
! (operator),
>= (operator),
append,
close,
gets,
if,
ne (operator),
open,
proc,
puts,
regexp,
return,
set,
string,
while,
Syntax of Tcl regular expressions
Syntax of Tcl string matching:
* matches a sequence of zero or more characters
? matches a single character
[chars] matches a single character in the set given by chars (^ does not negate; a range can be given as a-z)
\x matches the character x, even if that character is special (one of *?[]\)
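The rules above can be tried out with a few calls; the strings here are made up for illustration.

```tcl
puts [string match {leak*} leakage_power]   ;# 1: * matches "age_power"
puts [string match {a?c} abc]               ;# 1: ? matches the single "b"
puts [string match {a?c} abbc]              ;# 0: ? matches exactly one character
puts [string match {[a-c]x} bx]             ;# 1: b is in the range a-c
puts [string match {\*} *]                  ;# 1: \* is a literal asterisk
puts [string match {\*} a]                  ;# 0
```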
Here's a "clever" way to do it:
proc unknown args {
    set body [lindex $args end]
    if {[string first "related_pg_pin" $body] == -1} {puts $args}
}
source file.txt
Your data file appears to be Tcl-syntax-compatible, so execute it like a Tcl file, and for unknown commands, check to see if the last argument of the "command" contains the string you want to avoid.
This is clearly insanely risky, but it's fun.
I am new to Tcl and trying to learn; I need help with the following.
My string in ConfigFileBuf looks like the one below. I am trying to replace the second occurrence (the ConfENB:local-udp-port entry with 31001) with XYZ, but the regsub command I tried always replaces the first occurrence (37896). Please help me replace the second occurrence with XYZ.
set ConfigFileBuf "<ConfENB:virtual-phy>
</ConfENB:local-ip-addr>
<ConfENB:local-udp-port>37896</ConfENB:local-udp-port>
</ConfENB:local-ip-addr>
<ConfENB:local-udp-port>31001</ConfENB:local-udp-port>
</ConfENB:virtual-phy>"
regsub -start 1 "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>" $ConfigFileBuf "XYZ" ConfigFileBuf
puts $ConfigFileBuf
You have to use regexp -indices to find where to start the replacement, and only then regsub. It's not too bad if you put the regular expression in its own variable.
set RE "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>"
set start [lindex [regexp -all -indices -inline $RE $ConfigFileBuf] 1 0]
regsub -start $start $RE $ConfigFileBuf "XYZ" ConfigFileBuf
The 1 is the number of submatches in the RE (zero in this case) plus 1. You can compute it with the help of regexp -about, giving this piece of trickiness:
set RE "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>"
set relen [expr {1 + [lindex [regexp -about $RE] 0]}]
set start [lindex [regexp -all -indices -inline $RE $ConfigFileBuf] $relen 0]
regsub -start $start $RE $ConfigFileBuf "XYZ" ConfigFileBuf
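To see why the index depends on the number of subexpressions, here is a small sketch with a made-up RE and input (neither comes from the question). With two capturing groups, each match contributes three index pairs to the result list, so the second whole match sits at index 3.

```tcl
set RE {(\d+)-(\w+)}                 ;# hypothetical RE with two subexpressions
set text "12-ab 34-cd 56-ef"         ;# hypothetical input
set step [expr {1 + [lindex [regexp -about $RE] 0]}]
set all [regexp -all -indices -inline $RE $text]
set second [lindex $all $step]       ;# index pair of the second whole match
puts $step                           ;# 3
puts [string range $text {*}$second] ;# 34-cd
```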
If your string was well-formed XML I'd suggest something like tDOM to manipulate it. DOM-style manipulation is almost always better than regular expression-based manipulation on XML markup. (I mention this on the off chance that it's actually supposed to be XML and you just quoted it wrong.)
It looks like you're trying to use -start 1 to tell regsub to skip the first match. The starting index is actually a character index, so in this invocation regsub will just skip the first character in the string. You could set -start further into your string, but that's fragile unless you use regexp to calculate where the first match ends.
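A tiny sketch with a made-up string shows the effect: -start 1 skips one character, not one match.

```tcl
puts [regsub a "aaa" X]             ;# Xaa: the first match is replaced
puts [regsub -start 1 a "aaa" X]    ;# aXa: matching begins at character index 1
```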
I think the best solution would be to get a list of indices to matches by invoking regexp with -all -inline -indices, pick out the second index pair using lindex and finally use string replace to perform the substitution, like this:
set pattern {</ConfENB:local-ip-addr>[ \n\t]+<ConfENB:local-udp-port>[0-9 ]+</ConfENB:local-udp-port>}
set matches [regexp -all -inline -indices -- $pattern $ConfigFileBuf]
set match [lindex $matches 1]
set ConfigFileBuf [string replace $ConfigFileBuf {*}$match XYZ]
The variable match contains a pair of indices (start and end, respectively) for the range of characters you want to replace. As string replace expects those indices to be in different arguments you need to expand $match with the {*} prefix. If you have an earlier version of Tcl than 8.5, you need a slight change to the above code:
foreach {start end} $match break
set ConfigFileBuf [string replace $ConfigFileBuf $start $end XYZ]
In passing, note that you can avoid escaping e.g. character sets in a regular expression if you quote it with braces instead of double quotes.
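As a small illustration with a made-up pattern and input, both spellings below match the same text; the braced one simply needs no backslashes in front of the brackets.

```tcl
set quoted "\[ \n\t\]+\[0-9\]+"   ;# brackets escaped to stop command substitution
set braced {[ \n\t]+[0-9]+}       ;# braces suppress substitution; the RE engine reads \n and \t
set sample "port \n\t42"
puts [regexp $quoted $sample]     ;# 1
puts [regexp $braced $sample]     ;# 1
```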
Documentation links: regexp, lindex, string
I'm having issues trying to debug this 'extra characters after close-brace' error. The error message points to my proc line ... I haven't been able to spot it for 2 days!
# {{{ MAIN PROGRAM
proc MAIN_PROGRAM { INPUT_GDS_OASIS_FILE L CELL_LIST_FILE } {
if { [file exists $CELL_LIST_FILE] == 0 } {
set celllist [$L cells]
} else {
set fp [open $CELL_LIST_FILE r]
set file_data [read $fp]
close $fp
set celllist [split $file_data "\n"]
set totalcells [expr [llength $celllist] - 1]
}
set counter 0
foreach cell $celllist {
set counter [expr {$counter + 1}]
set value [string length $cell]
set value3 [regexp {\$} $cell]
if { $value > 0 && $value2 == 0 && $value3 == 0 } {
# EXTRACT BOUNDRARY SIZE FIRST
puts "INFO -- READING Num : $counter/$totalcells -- $cell ..."
ONEIP_EXTRACT_BOUNDARY_SIZE $cell $L "IP_SIZE/$cell.txt"
exec gzip -f "IP_SIZE/$cell.txt"
}
}
# }}}
}
# }}}
This seems to be an unfortunate case of using braces in comments. The Tcl parser looks at braces before comments (http://tcl.tk/man/tcl8.5/TclCmd/Tcl.htm). It is a problem if putting braces in comments causes a mismatched number of open/close braces.
Try using a different commenting style, and remove the "{{{" and "}}}" from your comments.
I'm pretty sure that this is down to braces in comments within the proc body.
The wiki page here has a good explanation. In short, a Tcl comment isn't like a comment in most other languages, and having unmatched braces in them leads to all sorts of issues.
So the braces in the #}}} just before the end of the proc are probably the problem.
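The effect can be reproduced in a few lines. This is only a sketch: the proc name demo is made up, and the script is built in a double-quoted string with \{ and \} so that the example itself stays brace-balanced.

```tcl
# Equivalent to a braced proc body whose only comment line is "# }}}".
set script "proc demo \{\} \{\n    # \}\}\}\n    puts hi\n\}"
set failed [catch {eval $script} message]
puts $failed     ;# 1: the definition is rejected
puts $message    ;# the parser's complaint about the close-brace
```

The first } in the comment ends the proc body, and the braces right after it are what the parser objects to.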
Tcl requires procedure bodies to be brace-balanced, even within comments.
OK, that's a total lie. Tcl really requires brace-quoted strings to be brace-balanced (Tcl's brace-quoted strings are just like single-quoted strings in bash, except they nest). The proc command just interprets its third argument as a script (used to define the procedure body) and it's very common to use brace-quoted strings for that sort of thing. This is a feature of Tcl's general syntax, and is why Tcl is very good indeed at handling things like DSLs.
You could instead do this:
proc brace-demo args "puts hi; # {{{"
brace-demo do it yeah
and that will work fine. Totally legal Tcl, and has a comment in a procedure body with unbalanced braces. It just happens that for virtually any real procedure, putting in all the required backslashes to stop interpretation of variable and command substitutions too soon is a total bear. Everyone uses braces for simplicity, and so has to balance them.
It's hardly ever a problem except occasionally for comments.
For example, I have a sentence: whAT is yOur hoUSe nUmBer ? Is iT 26. I have to convert the first letter of each word to uppercase and the rest to lowercase. I am supposed to use lsearch, lindex, lreplace and so on to write the code. Can someone tell me how to do this?
The string totitle command is close: it lowercases the whole string except for the first char which is uppercase.
set s {whAT is yOur hoUSe nUmBer ? Is iT 26.}
string totitle $s
What is your house number ? is it 26.
To capitalize each word is a little more involved:
proc CapitalizeEachWord {sentence} {
    subst -nobackslashes -novariables [regsub -all {\S+} $sentence {[string totitle &]}]
}
set s {whAT is yOur hoUSe nUmBer ? Is iT 26.}
CapitalizeEachWord $s
What Is Your House Number ? Is It 26.
The regsub command takes each space-separated word and replaces it with the literal string "[string totitle word]":
"[string totitle whAT] [string totitle is] [string totitle yOur] [string totitle hoUSe] [string totitle nUmBer] [string totitle ?] [string totitle Is] [string totitle iT] [string totitle 26.]"
Then we use the subst command to evaluate all the individual "string totitle" commands.
When Tcl 8.7 comes out, we'll be able to do:
proc CapitalizeEachWord {sentence} {
    regsub -all -command {\S+} $sentence {string totitle}
}
The usual model (in 8.6 and before) for applying a command to a bunch of regular-expression-chosen substrings of a string is this:
subst [regsub -all $REtoFindTheSubstrings [MakeSafe $input] {[TheCommandToApply &]}]
The MakeSafe is needed because subst doesn't just do the bits that you want. Even with disabling some substitution classes (e.g., with the -novariables option), you still need the trickiest one of all, command substitutions, and that means that strings like hello[pwd]goodbye can catch you out. To deal with this you make the string “safe” by replacing every Tcl metacharacter (or at least the ones that matter in a subst) by its backslashed version. Here's a classic version of MakeSafe (that you'll often see inlined):
proc MakeSafe {inputString} {
    regsub -all {[][$\\{}"" ]} $inputString {\\&}
}
Demonstrating it interactively:
% MakeSafe {hello[pwd]goodbye}
hello\[pwd\]goodbye
With that version, no substitution classes need to be turned off in subst though you could turn off variables, and there's no surprises possible when you're applying the command as things that could crop up in the substituted argument string have been escaped. But there's a big disadvantage: you potentially need to change the regular expression in your transformation to take account of the extra backslashes now present. It's not required for the question's RE (as that just selects sequences of word characters) and indeed that could safely be this reduced version:
subst [regsub -all {\w+} [regsub -all {[][\\$]} $input {\\&}] {[string totitle &]}]
In 8.7 onwards, there's a -command option to regsub that avoids all this mess. It's also quite a bit faster, as subst works by compiling its transformations into bytecode (that's not a good win for a one-off substitution!) and regsub -command uses direct command invoking instead, much more likely to be fast.
regsub -all -command {\w+} $input {string totitle}
The internal approach used by regsub -all -command can be emulated in 8.6 (or earlier with more extra shims), but it is non-trivial:
proc regsubAllCommand {RE input command} {
    # I'll assume there's no sub-expressions in the RE in order to keep this code shorter
    set indices [regexp -all -inline -indices -- $RE $input]
    # Gather the replacements first to make state behave right
    set replacements [lmap indexPair $indices {
        # Safe version of: uplevel $command [string range $input {*}$indexPair]
        uplevel 1 [list {*}$command [string range $input {*}$indexPair]]
    }]
    # Apply the replacements in reverse order
    set output $input
    foreach indexPair [lreverse $indices] replacement [lreverse $replacements] {
        set output [string replace $output {*}$indexPair $replacement]
    }
    return $output
}
The C implementation of regsub uses working buffers and so on internally, but that's not quite so convenient at the Tcl level.
You can use an Initcap function to make the first letter uppercase and the rest lowercase.
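As far as I know, Tcl has no built-in command called Initcap (the name is familiar from SQL dialects), so you would have to write it yourself. A minimal sketch, assuming words are separated by single spaces; the proc below is hypothetical and built on string totitle:

```tcl
# Hypothetical helper; Initcap is not a built-in Tcl command.
proc Initcap {sentence} {
    set words {}
    foreach word [split $sentence " "] {
        # totitle uppercases the first character and lowercases the rest
        lappend words [string totitle $word]
    }
    return [join $words " "]
}

puts [Initcap {whAT is yOur hoUSe nUmBer ? Is iT 26.}]
# What Is Your House Number ? Is It 26.
```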