Split a string into words which are enclosed in single quotes - tcl

I want to split a string into separate words which are enclosed in single quotes, like below:
For example:
set str {'Name' 'Karna Mayer' ''}
I want to split this into 3 separate words. How can this be done in Tcl?

For this sort of task, I'd use regexp -all -inline and lmap (to drop the whole-match elements from its results and keep only the captured text).
set input "'Name' 'Karna Mayer' ''"
set output [lmap {- bit} [regexp -all -inline {'([^'']*)'} $input] {set bit}]
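To see why lmap is given two loop variables, it helps to look at what regexp -all -inline returns on the same input: each match contributes the whole match followed by the captured group, so the list alternates between the quoted and unquoted forms.

```tcl
set input "'Name' 'Karna Mayer' ''"

# With -all -inline, every match contributes two elements:
# the whole quoted match, then the text captured between the quotes.
set raw [regexp -all -inline {'([^']*)'} $input]
puts [llength $raw]    ;# prints 6

# lmap takes two elements per iteration: "-" gets the whole match (ignored),
# "bit" gets the captured text, which becomes the lmap result.
set output [lmap {- bit} $raw {set bit}]
puts [llength $output] ;# prints 3
```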
The advantage of this approach is that if your input has a way of escaping a single quote (say, with a backslash), you can use a more complex regular expression to match that too, and then strip the escapes:
set output [lmap {- bit} [regexp -all -inline {'((?:\\.|[^'\\])*)'} $input] {
    regsub -all {\\(.)} $bit {\1}
}]

You can use string map to convert the single quotes to double quotes and escape any existing double quotes:
set str [string map {{"} {\"} ' {"}} $str]
# "Name" "Karna Mayer" ""
You can then use list and argument expansion to convert it to a list:
set l [list {*}$str]
# Name {Karna Mayer} {}
Full program:
set str {'Name' 'Karna Mayer' ''}
set str [string map {{"} {\"} ' {"}} $str]
set l [list {*}$str]
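Escaping existing double quotes first matters when a value itself contains one. A quick check with a made-up input (the string below is an assumption, not from the question):

```tcl
# A hypothetical input whose first value contains double quotes
set str {'say "hi"' 'x'}

# Escape existing double quotes, then turn single quotes into double quotes.
# The result is the string: "say \"hi\"" "x"
set str [string map {{"} {\"} ' {"}} $str]

# List parsing removes the backslashes inside the quoted words
set l [list {*}$str]
puts [llength $l]    ;# prints 2
puts [lindex $l 0]   ;# prints: say "hi"
```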

If you use the single quote as a separator, then you want every second element:
% set input "'Name' 'Karna Mayer' ''"
'Name' 'Karna Mayer' ''
% split $input {'}
{} Name { } {Karna Mayer} { } {} {}
We see: the empty string before the first quote; the first field; the space between the 1st and 2nd; the 2nd field; the next space; the (empty) 3rd field; and then the empty string after the last quote. We want to ignore this last element.
% set fields [lmap {_ field} [lrange [split $input {'}] 0 end-1] {set field}]
Name {Karna Mayer} {}


Is there a simple way to parse a line of Tcl into its command and its arguments (not just splitting by whitespace)?

Suppose I have a string which is also a Tcl command.
set line {lsort -unique [list a b c a]}
How can I convert this string into a list equivalent to this?
{
{lsort}
{-unique}
{[list a b c a]}
}
Because of whitespace inside the square brackets, I can't just use lindex.
For example:
> lindex $line 2
--> [list
The reason I'm asking is because I have a large Tcl script that I want to parse and re-write. I would like certain lines in the re-written script to have swapped argument order or some numerical arguments scaled by a factor.
I know I could parse the string character by character, keeping track of {}, [], and " characters, but this feels like re-inventing something that might already exist. I've been looking at the info and interp commands but couldn't find anything there.
I used info complete successfully in this proc.
proc command_to_list {command} {
    # trim leading/trailing whitespace so the words and spaces lists stay aligned
    set command [string trim $command]
    # split by whitespace
    set words [regexp -all -inline {\S+} $command]
    set spaces [regexp -all -inline {\s+} $command]
    set output_list [list]
    set buffer ""
    foreach word $words space $spaces {
        append buffer $word
        if {[info complete $buffer]} {
            lappend output_list $buffer
            set buffer ""
        } else {
            append buffer $space
        }
    }
    return $output_list
}
This proc will group whitespace separated 'words' until they have no unmatched curlies, double quotes, or square brackets. Whitespace is preserved inside of matching pairs of curlies, double quotes or square brackets.
> set command {foreach {k v} [list k1 v1 k2 v2] {puts "$k $v"}}
> foreach word [command_to_list $command] {puts $word}
foreach
{k v}
[list k1 v1 k2 v2]
{puts "$k $v"}
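The proc relies entirely on what info complete reports for a partial string: it returns 1 only when the braces, double quotes and square brackets in the text are balanced. A few sample fragments (chosen here for illustration) show the behaviour the grouping loop depends on:

```tcl
# Fragments as they might accumulate in the buffer of command_to_list.
# The fourth one is an unmatched open brace followed by puts; it is
# written with a backslash so that this script itself stays balanced.
set samples [list lsort {[list} {[list a b c a]} "\{puts" {{puts "$k $v"}}]

foreach s $samples {
    # 1 means the fragment is complete and can be emitted as one word
    puts "[info complete $s] : $s"
}
```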

Tcl: replace string in a specific column

I have the below line:
^ 1 0.02199 0.03188 0.03667 0.00136 0.04155 0.00000 1.07223 1.07223 -0.47462 0.00335 -0.46457 buf_63733/Z DCKBD1BWP240H11P57PDULVT -
I want to replace column 3 with a different value and to keep the entire line with spaces as is.
I tried lreplace, but the spaces were lost.
string map can only replace by matching text; I didn't find a way to replace a specific column.
Can someone advise?
Assuming the columns are separated by at least 2 spaces, you could use something like:
set indices [regexp -all -indices -inline {\S+(?:\s\S+)?\s{2,}} $line]
set colCount 1
set newValue 0.01234
foreach pair $indices {
    if {$colCount == 3} {
        lassign $pair start end
        set column [string range $line $start $end]
        set value [string trimright $column]
        set valueEnd [expr {$end-[string length $column]+[string length $value]}]
        set newLine [string replace $line $start $valueEnd $newValue]
    } elseif {$colCount > 3} {
        break
    }
    incr colCount
}
Set newValue to whatever value you need, or assign the result to line instead of newLine if you don't need to keep the original line.
Another method uses regsub to inject a command into the replacement string, and then subst to evaluate it. This is like Perl's s/pattern/code/e. Because subst runs over the whole line, this assumes the line contains no $, [ or \ characters.
set newline [subst [regsub {^(\s*\S+\s+\S+)(\s+\S+)} $line \
    {\1[format "%*s" [string length "\2"] $newvalue]}]]
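If every column is a single run of non-space characters, as in the sample line, a variant that avoids subst (and so cannot accidentally evaluate $ or [ in the data) is to locate the Nth field with regexp -indices and splice in the new value with string replace. replace_column is a hypothetical helper name. Unlike the first answer, it treats every whitespace-separated token as a column, and the line keeps its alignment only if the new value is as wide as the old one.

```tcl
# Hypothetical helper: replace column n (1-based) of $line with $newValue,
# leaving every other character, including runs of spaces, unchanged.
proc replace_column {line n newValue} {
    # One index pair per run of non-space characters
    set fields [regexp -all -indices -inline {\S+} $line]
    if {$n < 1 || $n > [llength $fields]} {
        error "line has no column $n"
    }
    lassign [lindex $fields [expr {$n - 1}]] start end
    return [string replace $line $start $end $newValue]
}

set line {^  1  0.02199  0.03188  0.03667}
puts [replace_column $line 3 0.01234]
```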

Replace identical strings with a different substitution for each occurrence?

To manipulate strings in Tcl, we use the string command.
If you need to replace the decimal point with a comma:
set value { 10.00 }
puts [string map -nocase { . , } $value]
# Returns: 10,00
We can replace several strings:
set text "This is a replacement test text"
puts [string map -nocase { e E s S a A } $text]
# Returns: ThiS iS A rEplAcEmEnt tESt tExt
Of course, we can replace words:
set text "This is a replacement test text"
puts [string map -nocase {test TEST { a } { the second }} $text]
# Returns: This is the second replacement TEST text
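Keep in mind that string map matches substrings anywhere in the text, not whole words, so a short key such as a also hits the a inside "replacement". Padding the key with spaces is one simple way to limit it to a standalone word (it will then miss a word at the very start or end of the string):

```tcl
set text "This is a replacement test text"

# The key a matches every lowercase a, including the one inside replacement
puts [string map {a A} $text]

# The key " a " only matches the word when it is surrounded by spaces
puts [string map {{ a } { A }} $text]
```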
So far so good!
But one question remains: how do I replace several identical occurrences in a string, giving a DIFFERENT substitution for each of them?
For example:
set time {10:02:12}
puts [string map -nocase { { : +} {: =} } $time]
I would like this result: 10 + 02 = 12
proc seqmap {str match args} {
    set rc $str
    foreach l [lreverse [regexp -all -indices -inline ***=$match $str]] \
            replacement [lreverse $args] {
        set rc [string replace $rc {*}$l $replacement]
    }
    return $rc
}
seqmap 10:02:12 : { + } { = }
=> 10 + 02 = 12
I'm using lreverse in case the replacement has a different length than the string it replaces. The indices would be off if the replacements were done from left to right.
The ***= prefix makes regexp treat the rest of the pattern as a literal string, so regular-expression metacharacters in the match string get no special treatment.
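The effect of the ***= prefix is easy to see with a separator that happens to be a regular-expression metacharacter, such as a dot:

```tcl
set str a.b.c

# Without the prefix, "." is a regex wildcard and matches every character
puts [llength [regexp -all -indices -inline {.} $str]]   ;# prints 5

# With ***= the rest of the pattern is literal, so only the two dots match
puts [regexp -all -indices -inline {***=.} $str]
```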
Of course, things get a lot more complicated if you want to handle the case where the number of occurrences doesn't match the number of provided substitutions. And even more if you want to replace several different strings.
This version handles the complications mentioned above:
proc seqmap {map str} {
    # Transform the map into a dict with each key containing a list of replacements
    set mapdict {}
    foreach {s r} $map {dict lappend mapdict $s $r}
    # Build a map where each key maps to a unique tag
    # At the same time build a dict that maps our tags to the replacements
    # First map the chosen tag character in case it is present in the string
    set newmap {# #00}
    set mapdict [dict map {s r} $mapdict {
        lappend newmap $s [set s [format #%02d [incr num]]]
        set r
    }]
    # Add the tag character to the dict so it can be mapped back
    dict set mapdict #00 #
    # Map the tags into the string
    set rc [string map $newmap $str]
    # Locate the positions where the tags ended up
    set match [regexp -all -indices -inline {#\d\d} $rc]
    # Create a list of replacements matching the tags
    set replace [lmap l $match {
        # Extract the tag
        set t [string range $rc {*}$l]
        # Obtain a replacement for this tag
        set s [lassign [dict get $mapdict $t] r]
        # Return the used replacement to the end of the list
        dict set mapdict $t [linsert $s end $r]
        # Add the replacement to the list
        set r
    }]
    # Walk the two lists in reverse order, replacing the tags with the selected replacements
    foreach l [lreverse $match] r [lreverse $replace] {
        set rc [string replace $rc {*}$l $r]
    }
    # Done
    return $rc
}
You call it just like you would string map, so with a key-value mapping and the string to perform the replacements on. Any duplicated keys specify the subsequent values to be substituted for each occurrence of the key. When the list is exhausted it starts over from the beginning.
So puts [seqmap {: + : = : *} 10:02:12] => 10+02=12
And puts [seqmap {: + : =} 10:02:12:04:16] => 10+02=12+04=16
As presented, the command can handle up to 99 unique keys. But it can easily be updated if more are needed.
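For the original time example, where the number of separators is fixed, a general mapping command may be more than you need: you can split the string and rebuild it with format. A minimal sketch, assuming exactly three colon-separated fields:

```tcl
set time 10:02:12

# split yields the three fields 10, 02 and 12; {*} passes them to format
set result [format "%s + %s = %s" {*}[split $time :]]
puts $result
```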

Return string after specific character

I have a question about getting the part of a string after a specific character in Tcl.
What I mean is:
Input:
abcdefgh = hgfedcba
Output:
hgfedcba
(return everything after "=", without any surrounding whitespace)
This is what I was using:
regexp {abcdefgh=\s+"(.*)"} $text_var all variable
It works in some cases (with spaces), but when there is no whitespace it does not work.
Assuming
% set s {abcdefgh = hgfedcba}
# => abcdefgh = hgfedcba
(or the same thing without one or both of the blanks) you could do one of these:
% scan $s {%*[^=]= %s}
# => hgfedcba
(Scan the string for a substring not containing "=", then advance past the equals sign and any optional whitespace, then return the next run of non-whitespace characters.)
string trim [lindex [split $s =] 1]
(Split the string at the equals sign, return the (whitespace-trimmed) second resulting element.)
string trim [string range $s [string first = $s]+1 end]
(Return the (whitespace-trimmed) substring starting after the equals sign.)
string trim [lindex [regexp -inline {[^=]+$} $s] 0]
(Return the (whitespace-trimmed) first match of one or more characters, not including the equals sign, anchored on the end of the string.)
lindex [regexp -inline -all {[a-h]+} $s] 1
(Return the second match of consecutive characters from the set "a" to "h".)
string trimleft [string trimleft $s {abcdefgh }] {= }
(Remove all characters from the start of the string that occur in the set "a" to "h" and blank, then remove from start of the resulting string any characters that are equals sign or blank.)
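These one-liners can differ when the value after the equals sign itself contains "=". With a made-up input such as key = a=b, splitting breaks at every equals sign, while string first only looks at the first one:

```tcl
set s {key = a=b}

# split cuts at every "=", so the second element is only " a"
set viaSplit [string trim [lindex [split $s =] 1]]

# string first locates only the first "=", so the rest is kept intact
set viaFirst [string trim [string range $s [string first = $s]+1 end]]

puts $viaSplit   ;# prints a
puts $viaFirst   ;# prints a=b
```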
% regexp {abcdefgh\s*=\s*(\S+)} "abcdefgh = hgfedcba" all variable
1
% set variable
hgfedcba
% regexp {abcdefgh\s*=\s*(\S+)} "abcdefgh=hgfedcba" all variable
1
% set variable
hgfedcba
%

How to find ',' in a string in TCL

I am new to Tcl and would like to know how to search for "," in a string and get the parts of the string before and after it.
Example: tampa,florida
It has to search for ",", and if the string contains one, it should return tampa and florida. I could use string replace, but that will not work in my case, because I need to map tampa and florida to different variables. I also don't know in advance what the input will look like, so I can't use string range with fixed positions.
Thanks,
Arya
Unless there is some further condition, you could do it this way:
split tampa,florida ,
This command gives as result a list containing the two strings "tampa" and "florida".
Documentation: split
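Since the goal is to put the two parts into separate variables, the list from split can be unpacked directly with lassign (city and state are just example variable names):

```tcl
set input tampa,florida

# split returns a two-element list; lassign stores each element in its own variable
lassign [split $input ,] city state

puts $city    ;# prints tampa
puts $state   ;# prints florida
```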
Another short way to do this is with regular expressions:
if {[regexp {(.+),(.+)} $string a b c]} {
    # $a is the complete match. But we don't care
    # about that so we ignore it
    puts $b; #tampa
    puts $c; #florida
}
The regular expression (.+),(.+) means:
(
. any character
+ one or more of the above
) save it in a capture group
, comma character
(
. any character
+ one or more of the above
) save it in a capture group
See the documentation of regular expression syntax in tcl for more about regular expressions: https://www.tcl.tk/man/tcl8.6/TclCmd/re_syntax.htm
But if you're not familiar with regular expressions and want an alternative way of doing this you can use the various string commands. This is one way to do it:
set comma_location [string first "," $string]
if {$comma_location > -1} {
    set a [string range $string 0 [expr {$comma_location - 1}]]
    set b [string range $string [expr {$comma_location + 1}] end]
    puts $a; #tampa
    puts $b; #florida
}
A variant of slebetman's last answer.
proc before_after {value find {start 0}} {
    set index [string first $find $value $start]
    set left_side [string range $value $start [expr {$index - 1}]]
    # skip the whole search string, not just one character
    set right_side [string range $value [expr {$index + [string length $find]}] end]
    return [list $left_side $right_side]
}
puts [before_after "tampa,fl" ","]
output:
tampa fl