I have a string that spans multiple lines:
line1
Link1: //website/go/<example>
line2
I am trying to trim it down to just the web page link part (just the address, which can vary; in this case "//website/go/"), but some extra characters remain before and after it.
My try:
set temp [string map { " " "" "line1" "" "Link1: " "" "line2" "" } $output]
puts "found link : $temp"
And the output of it is:
found link :<empty line>
//website/go/<example>
<empty line>
How can I remove all whitespace, newlines, etc. and trim the string so that I get just the part I am looking for, in this case //website/go/<example>?
Given this input data:
line1
Link1: //website/go/<example>
line2
You can use string map as you do and then post-process the result with string trim (assuming you only expect one thing left at the end) or remove the newlines with another mapping element. On two lines for clarity:
set temp [string map { " " "" "line1" "" "Link1: " "" "line2" "" } $output]
set temp [string trim $temp]
puts "found link : $temp"
However, in this case I'd actually use regexp to pick the data I want:
regexp -line {Link1:\s+(.*\S)} $output -> temp
puts "found link : $temp"
Regular expressions tend to be more suitable for parsing part-formatted data, provided you remember to keep them short. The longer a RE is, the harder it is to understand.
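As a rough sketch of the regexp approach with a check for the no-match case (the sample value of output is reconstructed from the question, the ^ anchor is my addition, and the variable name link is only for illustration):

```tcl
# Sample input, assumed from the question
set output "line1\nLink1: //website/go/<example>\nline2"

# regexp returns 1 on a match and 0 otherwise, so test it before
# relying on the captured value. With -line, ^ matches at the start
# of each line and . does not cross a newline.
if {[regexp -line {^Link1:\s+(.*\S)} $output -> link]} {
    puts "found link: $link"
} else {
    puts "no link found"
}
```

Testing the result this way avoids using a capture variable that was never set when the input has no Link1: line.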
I am trying to find multiple string patterns in a string in Tcl, but I cannot find a correct and efficient way to do it.
I tried the code below, but it does not work.
I need to find -h, -he, -hel or -help in the string -help.
set args "-help"
set res1 [string first "-h" $args]
set res2 [ string first -he $args]
set res3 [string first -hel $args]
set res4 [string first "-help" $args]
if { $res1 == -1 || $res2 || $res3 || $res4 } {
puts "\n string not found"
} else {
puts "\n string found"
}
I am not sure how to use regexp here, so I need some input.
The expected output is
This is a case where using regexp is easier. (Asking if a string is a prefix of -help is a separate problem.) The trick here is to use ? and (…) (or rather (?:…), which is the non-capturing version) in the RE. You must also use the -- option because the RE begins with a -:
if {[regexp -- {-h(?:e(?:lp?)?)?} $string]} {
puts "Found the string"
} else {
puts "Did not find the string"
}
If you want to know what string you actually found, add in a variable to pick up the overall match:
if {[regexp -- {-h(?:e(?:lp?)?)?} $string matched]} {
puts "Found the string '$matched'"
} else {
puts "Did not find the string"
}
If you instead want the indices where it matched, you need an extra option:
if {[regexp -indices -- {-h(?:e(?:lp?)?)?} $string match]} {
puts "Found the string at $match"
} else {
puts "Did not find the string"
}
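To see how these variations behave on a few inputs, here is a small sketch (the sample strings and the variable names s, span and found are mine, not from the question). Note that the RE is unanchored, so it also matches when the option appears in the middle of a longer string:

```tcl
set found {}
foreach s [list -h -he -hel -help "run -he now" -x] {
    if {[regexp -indices -- {-h(?:e(?:lp?)?)?} $s span]} {
        # span holds the first and last index of the overall match
        lassign $span first last
        set matched [string range $s $first $last]
        puts "'$s': matched '$matched' at $first..$last"
    } else {
        set matched ""
        puts "'$s': no match"
    }
    lappend found $matched
}
```

If the whole string has to be one of the four forms, anchoring the RE with ^ and $ would be the usual adjustment.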
If you were instead interested in whether the string was a prefix of -help, you should do:
if {[string equal -length [string length $string] $string "-help"]} {
puts "Found the string"
} else {
puts "Did not find the string"
}
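The prefix check can be wrapped in a small helper. This is only a sketch: the proc name isPrefixOf is hypothetical, and it assumes you do not want an empty string to count (an empty string is trivially a prefix of anything):

```tcl
# Hypothetical helper: is $candidate a non-empty prefix of $full?
proc isPrefixOf {candidate full} {
    set n [string length $candidate]
    # Compare only the first n characters of both strings
    expr {$n > 0 && [string equal -length $n $candidate $full]}
}

foreach s [list -h -hel -help -helpme -x ""] {
    puts [format "%-10s -> %d" "'$s'" [isPrefixOf $s -help]]
}
```

Here -helpme is rejected because it is longer than -help, so the first seven characters of the two strings differ.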
Many uses of this sort of thing are actually doing command line parsing. In that case, the tcl::prefix command is very useful. For example, tcl::prefix match finds the entry in a list of options that a string is a unique prefix of and generates an error message when things are ambiguous or simply don't match; the result can be switched on easily:
set MY_OPTIONS {
-help
-someOtherOpt
}
switch [tcl::prefix match $MY_OPTIONS $string] {
-help {
puts "I have -help"
}
-someOtherOpt {
puts "I have -someOtherOpt"
}
}
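If you would rather handle a failed lookup yourself than let the error propagate, tcl::prefix match accepts -error {} to return an empty string instead. A minimal sketch, assuming Tcl 8.6 or later (the sample arguments and the variable name resolved are mine):

```tcl
set MY_OPTIONS {-help -someOtherOpt}
set resolved {}
foreach arg {-h -some -x} {
    # -error {} suppresses the error; an empty result means the
    # argument was unknown or ambiguous
    set opt [tcl::prefix match -error {} $MY_OPTIONS $arg]
    if {$opt eq ""} {
        puts "$arg: not a unique prefix of any option"
    } else {
        puts "$arg: resolved to $opt"
    }
    lappend resolved $opt
}
```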
I have tried to use split but it still failed.
set strdata "34a64323R6662w0332665323020346t534r66662v43037333444533053534a64323R6662w0332665323020346t534r66662v430373334445330535"
puts [split $strdata "3334445330535"] ;#<---- this command does not work
The result needed as below:
{34a64323R6662w0332665323020346t534r66662v43037} {34a64323R6662w0332665323020346t534r66662v43037}
The split command's optional second argument is interpreted as a set of characters to split on, so it really isn't going to do what you want. However, there are other approaches. One of the simpler methods of doing what you want is to use string map to convert the character sequence into a character that isn't in the input data (Unicode is full of those!) and then split on that:
set strdata "34a64323R6662w0332665323020346t534r66662v43037333444533053534a64323R6662w0332665323020346t534r66662v430373334445330535"
set splitterm "3334445330535"
set items [split [string map [list $splitterm "\uFFFF"] $strdata] "\uFFFF"]
foreach i $items {
puts "==> $i"
}
# ==> 34a64323R6662w0332665323020346t534r66662v43037
# ==> 34a64323R6662w0332665323020346t534r66662v43037
# ==>
Note that there is an empty-string list element at the end (it shows up as {} if you print the list itself) because that's the string that came after the last split element. If you don't want that, add a string trimright between the string map and the split:
# Doing this in steps because the line is a bit long otherwise
set mapped [string map [list $splitterm "\uFFFF"] $strdata]
set trimmed [string trimright $mapped "\uFFFF"]
set items [split $trimmed "\uFFFF"]
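If you cannot be sure that a sentinel character such as \uFFFF never occurs in the data, one alternative is to walk the string with string first. This is a sketch only; the proc name splitOnString is hypothetical and is not part of Tcl:

```tcl
# Hypothetical helper: split $str on every occurrence of the
# multi-character separator $sep, without needing a sentinel character.
proc splitOnString {str sep} {
    set sepLen [string length $sep]
    if {$sepLen == 0} {
        # Nothing to split on; return the input as a one-element list
        return [list $str]
    }
    set result {}
    set start 0
    while {[set idx [string first $sep $str $start]] >= 0} {
        lappend result [string range $str $start [expr {$idx - 1}]]
        set start [expr {$idx + $sepLen}]
    }
    # Whatever follows the last separator (possibly an empty string)
    lappend result [string range $str $start end]
    return $result
}

puts [splitOnString "a--b--" "--"]
```

Like the string map version, this keeps the empty element after a trailing separator, so filter it out afterwards if you don't want it.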
The split command doesn't work like that; see the documentation.
Try making the data string into a list like this:
regsub -all 3334445330535 $strdata " "
i.e. replacing the delimiter with a space.
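Note that regsub returns the modified string when no result variable is given, so you need to capture it. A short sketch (the variable name items is mine); it relies on the data containing no whitespace or list-special characters such as braces, quotes or backslashes, which holds for the sample in the question:

```tcl
set strdata "34a64323R6662w0332665323020346t534r66662v43037333444533053534a64323R6662w0332665323020346t534r66662v430373334445330535"

# Replace every delimiter with a space and treat the result as a list
set items [regsub -all {3334445330535} $strdata " "]

puts "[llength $items] items"
foreach i $items {
    puts "==> $i"
}
```

The trailing space left by the final delimiter is ignored when the string is read as a list, so no empty element appears here.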
Documentation: regsub, split
I need to remove all double quotes from the string below but keep the first and last double quote. How can I do this?
"0 "ifx" "blrcom" "media" "00-00-00-01-01-00" "server" "10.10.10.1" "10.10.10.10" "255.255.255.0" "11.11.11.1" "192.168.1.1" 0 "14.14.14.1"";
The simplest way is probably to remove all double quotes (with string map) and then put the outer ones back on afterwards (with string concatenation).
set str {"0 "ifx" "blrcom" "media" "00-00-00-01-01-00" "server" "10.10.10.1" "10.10.10.10" "255.255.255.0" "11.11.11.1" "192.168.1.1" 0 "14.14.14.1""}
set stripped [string map {\" {}} $str]
set str \"$stripped\"
If you've possibly got that semicolon at the end as well, handle it first/last. string match and string trimright are the right tools.
set gotSemi [string match "*;" $str]
set stripped [string map {\" {}} [string trimright $str ";"]]
set str \"$stripped\"
if {$gotSemi} {
append str ";"
}
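If you need this in more than one place, the steps could be bundled into a proc along these lines (a sketch; the name stripInnerQuotes and the shortened sample string are mine):

```tcl
# Hypothetical helper: drop every double quote, then put one back at
# each end, preserving a trailing semicolon if there was one.
proc stripInnerQuotes {str} {
    set gotSemi [string match "*;" $str]
    set stripped [string map {\" {}} [string trimright $str ";"]]
    set result "\"$stripped\""
    if {$gotSemi} {
        append result ";"
    }
    return $result
}

puts [stripInnerQuotes {"0 "ifx" "media" 0 "14.14.14.1"";}]
puts [stripInnerQuotes {"0 "ifx" "media""}]
```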
How do I check that a string is a single word?
Is this the right way to do that?
set st "some string"
if { [llength $st] != 1 } {
puts "error"
}
According to one possible definition, you check if a string is one word by using:
catch {set oneWord 0;set oneWord [expr {[llength $string] == 1}]}
That's the Tcl language definition of a word.
On the other hand, if your preferred definition is “is alphanumeric” then you have other possibilities, such as:
# -strict excludes the empty string (normally included for historic reasons)
set oneWord [string is alnum -strict $string]
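To make the difference between the two definitions concrete, here is a sketch that runs both checks over a few sample strings (the proc name isTclWord and the samples are mine). The catch matters because llength raises an error for strings that are not well-formed lists, such as one with an unmatched brace:

```tcl
# Hypothetical helper: 1 if $s parses as a list with exactly one element
proc isTclWord {s} {
    expr {![catch {llength $s} n] && $n == 1}
}

set samples [list hello "some string" "{a b}" "\{oops" ""]
foreach s $samples {
    puts [format "%-14s tclword=%d alnum=%d" \
        "'$s'" [isTclWord $s] [string is alnum -strict $s]]
}
```

The braced string {a b} counts as one word under the Tcl definition even though it contains a space, while it is not alphanumeric.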
My answer is based on the assumption that a word contains only alphabet characters.
If you don't mind using some regexp, you can use this:
set st "some string"
if { ![regexp {^[A-Za-z]+$} $st] } {
puts "error"
}
[regexp expression string] returns 0 if there is no match and 1 if there is a match.
The expression I used is ^[A-Za-z]+$, which means the string must consist entirely of one or more letters, from start to end. If you want to allow a dash inside (e.g. co-operate is one word), you add it to the character class:
^[A-Za-z-]+$
If you are now worried about trailing spaces, I would suggest trimming it first before passing it to the regexp:
set st " some string "
if { ![regexp {^[A-Za-z]+$} [string trim $st]] } {
puts "error"
}
or if you want to directly use the regexp...
set st " some string "
if { ![regexp {^\s*[A-Za-z]+\s*$} $st] } {
puts "error"
}
EDIT: If a word is considered as a string of characters except space, you can do something else: check if the string contains a space.
set st "some strings"
if { [regexp { } $st] } {
puts "error"
}
If it finds a space, regexp will return 1.
regexp provides a straightforward way to match a word with \w and \W. \w matches a word character, while \W matches any character except a word character.
set st "some string"
if { [regexp {\W} $st] } {
puts "error"
}
However, \w matches only digits, letters and _ (in any combination). If your word contains other special characters, this will not work.
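As a quick comparison of the regexp-based definitions discussed in these answers, here is a sketch (the proc name classify and the sample strings are mine) that reports, in order: letters only, word characters only, and no whitespace at all:

```tcl
# Hypothetical helper returning three flags for the string $st:
#   letters only / \w characters only / contains no whitespace
proc classify {st} {
    list [regexp {^[A-Za-z]+$} $st] \
         [regexp {^\w+$} $st] \
         [expr {![regexp {\s} $st]}]
}

foreach st [list word "two words" co-operate snake_case] {
    puts [format "%-12s %s" $st [classify $st]]
}
```

Which of the three is right depends on what you count as a word: co-operate only passes the no-whitespace test, and snake_case fails the letters-only test.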
I was using the command 'string trimright' to trim my string, but I found that it trims more than required.
My expression is "dcssss.dcsss". If I use the string trimright command to trim the last few characters ".dcsss", it trims the entire string. How can I deal with this?
Command:
set a [string trimright "dcssss.dcsss" ".dcsss"]
puts $a
Intended output:
dcssss
Actual output
""
The string trimright command treats its (optional) last argument as a set of characters to remove, so .dcsss is the same as sdc. to it. string trim and string trimleft work the same way; indeed, string trim is just like using both string trimright and string trimleft in succession. This makes it unsuitable for what you are trying to do. To remove a suffix if it is present, you can use several techniques:
# It looks like we're stripping a filename extension...
puts [file rootname "dcssss.dcsss"]
# Can use a regular expression if we're careful...
puts [regsub {\.dcsss$} "dcssss.dcsss" {}]
# Do everything by hand...
set str "dcssss.dcsss"
if {[string match "*.dcsss" $str]} {
set str [string range $str 0 end-6]
}
puts $str
If what you're doing really is filename manipulation, as it appears to be, do use the first of these options. The file command has some really useful subcommands for working with filenames in a cross-platform manner.
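For suffixes that are not filename extensions, the by-hand approach generalizes to any suffix by computing the length instead of hard-coding it. A sketch (the proc name removeSuffix is hypothetical), shown next to the set-based behaviour of string trimright for contrast:

```tcl
# Hypothetical helper: remove $suffix from the end of $str if present
proc removeSuffix {str suffix} {
    set n [string length $suffix]
    if {$n > 0 && [string range $str end-[expr {$n - 1}] end] eq $suffix} {
        return [string range $str 0 end-$n]
    }
    return $str
}

puts [removeSuffix "dcssss.dcsss" ".dcsss"]
puts [removeSuffix "dcssss.txt" ".dcsss"]

# string trimright removes any trailing characters from the set,
# which here consumes the whole string
puts "trimright gives: '[string trimright "dcssss.dcsss" ".dcsss"]'"
```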