I have the following string
"TCL is known as "tool command language", TCL is known as "tool
command language", TCL is known as "tool command language""
From the above input I want an output like the one below:
"TCL is known as tool command language, TCL is known as tool command
language, TCL is known as tool command language"
i.e. only the first and last double quotes should appear in the output, and all the others should be deleted. Could someone let me know the different methods to accomplish this?
There can be many ways. I have tried with regsub
set str {"TCL is known as "tool command language", TCL is known as "tool command language", TCL is known as "tool command language""}
puts "Input : $str"
regsub -all {(.)"} $str {\1} output
puts "Output : $output"
which will produce the following
Input : "TCL is known as "tool command language", TCL is known as "tool command language", TCL is known as "tool command language""
Output : "TCL is known as tool command language, TCL is known as tool command language, TCL is known as tool command language"
The pattern I have used is (.)". In regular expressions, the atom . matches any single character (the parentheses are explained below), and it is followed here by a literal double quote. So the pattern matches any single character that has a double quote immediately after it.
This gives a total of 6 matches. Take the 2nd match, which is e". Our intention is to remove only the quote, but the match covers 2 characters. This is why the first character is grouped with parentheses.
In Tcl, we can access the 1st subgroup with \1, the 2nd subgroup with \2, and so on. Each 2-character match is replaced with a single character, namely the one before the quote, i.e. e" is replaced with e. The opening quote survives because no character precedes it, and the closing quote survives because the character before it (the preceding quote) was already consumed by the previous match.
Notice the -all flag at the beginning, which makes regsub replace every occurrence of the pattern rather than only the first.
Note: \1 should be wrapped in braces, as in {\1}. If you want to use it without braces, you have to write \\1.
Reference : Non-capturing subpatterns
You could remove all quotes and re-add the outer ones. One way:
set new [format {"%s"} [string map {{"} {}} $str]]
You seek to remove all " characters from the string that have at least one character on each side of them. That leads to this regular expression substitution:
set transformedString [regsub -all {(.)[""]+(.)} $inputString {\1\2}]
The " is doubled up and in [brackets] just to make the highlighting here work. You could use {(.)"+(.)} instead.
Related
Should or should I not wrap quotes around variables in a shell script?
For example, is the following correct:
xdg-open $URL
[ $? -eq 2 ]
or
xdg-open "$URL"
[ "$?" -eq "2" ]
And if so, why?
General rule: quote it if it can either be empty or contain spaces (or any whitespace really) or special characters (wildcards). Not quoting strings with spaces often leads to the shell breaking apart a single argument into many.
$? doesn't need quotes since it's a numeric value. Whether $URL needs it depends on what you allow in there and whether you still want an argument if it's empty.
I tend to always quote strings just out of habit since it's safer that way.
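A minimal sketch of the two failure modes named in the general rule: a value with a space, and an empty value. The count_args helper and both values are made up for illustration.

```shell
# count_args is a hypothetical helper that prints how many arguments it received.
count_args() {
    echo "$#"
}

url='http://example.com/some page'   # illustrative value containing a space
empty=''

split=$(count_args $url)          # word splitting: two arguments
whole=$(count_args "$url")        # quoted: one argument
dropped=$(count_args $empty)      # unquoted empty value: no argument at all
kept=$(count_args "$empty")       # quoted empty value: one empty argument

echo "split=$split whole=$whole dropped=$dropped kept=$kept"
# prints: split=2 whole=1 dropped=0 kept=1
```

Whether the "dropped" behaviour is a bug or a feature depends on whether the command should still receive an argument when the variable is empty.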
In short, quote everything where you do not require the shell to perform word splitting and wildcard expansion.
Single quotes protect the text between them verbatim. It is the proper tool when you need to ensure that the shell does not touch the string at all. Typically, it is the quoting mechanism of choice when you do not require variable interpolation.
$ echo 'Nothing \t in here $will change'
Nothing \t in here $will change
$ grep -F '#&$*!!' file /dev/null
file:I can't get this #&$*!! quoting right.
Double quotes are suitable when variable interpolation is required. With suitable adaptations, it is also a good workaround when you need single quotes in the string. (There is no straightforward way to escape a single quote between single quotes, because there is no escape mechanism inside single quotes -- if there was, they would not quote completely verbatim.)
$ echo "There is no place like '$HOME'"
There is no place like '/home/me'
No quotes are suitable when you specifically require the shell to perform word splitting and/or wildcard expansion.
Word splitting (aka token splitting):
$ words="foo bar baz"
$ for word in $words; do
> echo "$word"
> done
foo
bar
baz
By contrast:
$ for word in "$words"; do echo "$word"; done
foo bar baz
(The loop only runs once, over the single, quoted string.)
$ for word in '$words'; do echo "$word"; done
$words
(The loop only runs once, over the literal single-quoted string.)
Wildcard expansion:
$ pattern='file*.txt'
$ ls $pattern
file1.txt file_other.txt
By contrast:
$ ls "$pattern"
ls: cannot access file*.txt: No such file or directory
(There is no file named literally file*.txt.)
$ ls '$pattern'
ls: cannot access $pattern: No such file or directory
(There is no file named $pattern, either!)
In more concrete terms, anything containing a filename should usually be quoted (because filenames can contain whitespace and other shell metacharacters). Anything containing a URL should usually be quoted (because many URLs contain shell metacharacters like ? and &). Anything containing a regex should usually be quoted (ditto ditto). Anything containing significant whitespace other than single spaces between non-whitespace characters needs to be quoted (because otherwise, the shell will munge the whitespace into, effectively, single spaces, and trim any leading or trailing whitespace).
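The point about significant whitespace can be checked with a short sketch. The value is made up for illustration, and the brackets in the output only make the surrounding spaces visible.

```shell
value='  two   spaces  '     # leading, repeated and trailing spaces

munged=$(echo $value)        # unquoted: split into words, rejoined with single spaces
kept=$(echo "$value")        # quoted: passed through as one unmodified argument

printf '[%s]\n' "$munged" "$kept"
# prints:
# [two spaces]
# [  two   spaces  ]
```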
When you know that a variable can only contain a value which contains no shell metacharacters, quoting is optional. Thus, an unquoted $? is basically fine, because this variable can only ever contain a single number. However, "$?" is also correct, and recommended for general consistency and correctness (though this is my personal recommendation, not a widely recognized policy).
Values which are not variables basically follow the same rules, though you could then also escape any metacharacters instead of quoting them. For a common example, a URL with a & in it will be parsed by the shell as a background command unless the metacharacter is escaped or quoted:
$ wget http://example.com/q&uack
[1] wget http://example.com/q
-bash: uack: command not found
(Of course, this also happens if the URL is in an unquoted variable.) For a static string, single quotes make the most sense, although any form of quoting or escaping works here.
wget 'http://example.com/q&uack' # Single quotes preferred for a static string
wget "http://example.com/q&uack" # Double quotes work here, too (no $ or ` in the value)
wget http://example.com/q\&uack # Backslash escape
wget http://example.com/q'&'uack # Only the metacharacter really needs quoting
The last example also suggests another useful concept, which I like to call "seesaw quoting". If you need to mix single and double quotes, you can use them adjacent to each other. For example, the following quoted strings
'$HOME '
"isn't"
' where `<3'
"' is."
can be pasted together back to back, forming a single long string after tokenization and quote removal.
$ echo '$HOME '"isn't"' where `<3'"' is."
$HOME isn't where `<3' is.
This isn't awfully legible, but it's a common technique and thus good to know.
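A frequent use of this technique is getting a single quote into otherwise single-quoted text. The sketch below shows two equivalent spellings; the variable names and the text are only illustrative.

```shell
# Close the single quotes, add a backslash-escaped quote, reopen the single quotes.
a='It isn'\''t $HOME'

# Same result, switching to double quotes just for the apostrophe.
b='It isn'"'"'t $HOME'

printf '%s\n' "$a" "$b"
# prints the same line twice: It isn't $HOME
```

In both cases $HOME stays literal because it only ever appears inside single quotes.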
As an aside, scripts should usually not use ls for anything. To expand a wildcard, just ... use it.
$ printf '%s\n' $pattern # not ``ls -1 $pattern''
file1.txt
file_other.txt
$ for file in $pattern; do # definitely, definitely not ``for file in $(ls $pattern)''
> printf 'Found file: %s\n' "$file"
> done
Found file: file1.txt
Found file: file_other.txt
(The loop is completely superfluous in the latter example; printf specifically works fine with multiple arguments. stat too. But looping over a wildcard match is a common problem, and frequently done incorrectly.)
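A sketch of such a loop, run in a throwaway directory so that it is self-contained. The file names are invented, and the [ -e ... ] guard is an addition not discussed above: by default, a pattern that matches nothing is passed through literally, so the guard skips that case.

```shell
workdir=$(mktemp -d)                                   # scratch directory for the demo
touch "$workdir/file1.txt" "$workdir/file other.txt"   # one name contains a space

found=0
for file in "$workdir"/file*.txt; do    # quote the directory, leave the wildcard bare
    [ -e "$file" ] || continue          # skip the literal pattern if nothing matched
    found=$((found + 1))
    printf 'Found file: %s\n' "${file##*/}"
done

echo "found=$found"
rm -r "$workdir"
```

The name with a space is handled as one item because the wildcard is expanded directly into the loop, without going through ls and word splitting.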
A variable containing a list of tokens to loop over or a wildcard to expand is less frequently seen, so we sometimes abbreviate to "quote everything unless you know precisely what you are doing".
Here is a three-point formula for quotes in general:
Double quotes
In contexts where we want to suppress word splitting and globbing. Also in contexts where we want the literal to be treated as a string, not a regex.
Single quotes
In string literals where we want to suppress interpolation and special treatment of backslashes. In other words, situations where using double quotes would be inappropriate.
No quotes
In contexts where we are absolutely sure that there are no word splitting or globbing issues or we do want word splitting and globbing.
Examples
Double quotes
literal strings with whitespace ("StackOverflow rocks!", "Steve's Apple")
variable expansions ("$var", "${arr[@]}")
command substitutions ("$(ls)", "`ls`")
globs where directory path or file name part includes spaces ("/my dir/"*)
to protect single quotes ("single'quote'delimited'string")
Bash parameter expansion ("${filename##*/}")
Single quotes
command names and arguments that have whitespace in them
literal strings that need interpolation to be suppressed ( 'Really costs $$!', 'just a backslash followed by a t: \t')
to protect double quotes ('The "crux"')
regex literals that need interpolation to be suppressed
ANSI-C quoting for literals involving special characters ($'\n\t')
ANSI-C quoting where we need to protect several single and double quotes ($'{"table": "users", "where": "first_name"=\'Steve\'}')
No quotes
around standard numeric variables ($$, $?, $# etc.)
in arithmetic contexts like ((count++)), "${arr[idx]}", "${string:start:length}"
inside [[ ]] expression which is free from word splitting and globbing issues (this is a matter of style and opinions can vary widely)
where we want word splitting (for word in $words)
where we want globbing (for txtfile in *.txt; do ...)
where we want ~ to be interpreted as $HOME (~/"some dir" but not "~/some dir")
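The difference between quoted and unquoted list expansion in these examples can be checked with a small sketch. It uses the positional parameters and "$@" so that it runs in a plain POSIX shell; "${arr[@]}" behaves the same way for Bash arrays. The item values are made up for illustration.

```shell
# Three illustrative arguments, two of which contain a space.
set -- 'first item' 'second' 'third item'

quoted=0
for arg in "$@"; do        # each original argument stays intact
    quoted=$((quoted + 1))
done

unquoted=0
for arg in $@; do          # word splitting breaks the arguments apart
    unquoted=$((unquoted + 1))
done

echo "quoted: $quoted, unquoted: $unquoted"
# prints: quoted: 3, unquoted: 5
```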
See also:
Difference between single and double quotes in Bash
What are the special dollar sign shell variables?
Quotes and escaping - Bash Hackers' Wiki
When is double quoting necessary?
I generally quote, as in "$var", to be safe, unless I am sure that $var does not contain spaces.
I do use $var as a simple way to join lines:
lines="`cat multi-lines-text-file.txt`"
echo "$lines" ## multiple lines
echo $lines ## runs of whitespace (including newlines) collapse to single spaces
Whenever the https://www.shellcheck.net/ plugin for your editor tells you to.
Hi, I'm trying to add a defined text area (%-74s) using sed and printf in a Tcl script I have, but I'm not sure how to add the printf info to this line of code:
puts $f "sed -i "s/XXXTLEXXX/\$1/\" /$file";
Any help would be greatly appreciated.
I've tried a few combinations, but they all produce errors.
Your problem is that you need to print a string with limited substitutions in it, yet that string contains $, " and \ characters. Those special characters mean that using a normal double-quoted word in Tcl is very awkward; you could use lots of backslashes to quote the Tcl metacharacters, but that's horrible when most of the string is in another language (shell/sed in your case). Here is a better option with string map and a brace-quoted word (which is free of substitutions):
set str {sed -i "s/XXXTLEXXX/$1/" /%FILE%}
puts $f [string map [list "%FILE%" $file] $str]
Note that you can do multiple substitutions in one string map, and that it does each substitution wherever it can. You can use a multi-line literal too. (%FILE% was chosen to be a literal that didn't otherwise occur in the string. Pick your own as you need them, but putting the name in helps with readability.)
I'm currently coding a CSV validator using awk. Here's an example of the code:
awk 'BEGIN{FS=OFS=","} NF!=17{print "not enough fields"; exit}
!($1~/[[:alnum:]]$/) {print "1st field invalid"; exit}' npp_test.cs
However the alnum section won't accept both alphabetic and numerical characters.
So if the data is "t" the program will exit, and if the data is "1" the same thing. However if it is "t1" it won't recognise it as valid.
How would I go about getting the code to accept a mix of alpha and numeric data.
Also, the top line isn't really relevant, as it's just a field count :)
If your environment does not support POSIX character classes in awk you may use explicit character ranges in bracket expressions:
!($1 ~ /^[A-Z0-9]{1,25}$/)
Here,
^ - matches the start of the string (here, the field $1)
[A-Z0-9]{1,25} - matches 1 to 25 uppercase letters or digits
$ - end of string.
NOTE: To avoid any issues with collations, you may add LANG=C before the awk command.
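A small sketch of the anchored-range check, wrapped in a shell function so it can be tried on sample input. Several things here are assumptions rather than part of the answer above: the validate_first_field name, the sample rows, accepting lowercase letters as well, and using + instead of {1,25} (some awk implementations do not support interval expressions). LC_ALL is used rather than LANG because it takes precedence over the other locale variables.

```shell
# validate_first_field is a hypothetical helper: it reads CSV rows on stdin and
# prints "valid", or a message for the first row whose 1st field is not alphanumeric.
validate_first_field() {
    LC_ALL=C awk 'BEGIN { FS = "," }
        !($1 ~ /^[A-Za-z0-9]+$/) { print "1st field invalid"; bad = 1; exit }
        END { if (!bad) print "valid" }'
}

printf 't1,x\n'  | validate_first_field    # prints: valid
printf 't-1,x\n' | validate_first_field    # prints: 1st field invalid
```

The ^ and $ anchors matter: without ^, a pattern ending in $ only constrains the last character of the field.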
I want to pass a dict value to another shell (in my application it passes through a few 'shell' levels), and the dict contains characters (space, double quotes, etc) that cause issues.
I can use something like ::base64::encode -wrapchar $dict and the corresponding ::base64::decode $str and it works as expected but the result is, of course, pretty much unreadable.
However, for debugging & presentation reasons I would prefer an encoded/sanitised string that resembled the original dict value inasmuch as reasonable and used a character set that avoids spaces, quotes, etc.
So, I am looking for something like ::base64 mapping procs but with a lighter touch.
Any suggestions would be appreciated.
You can make lighter-touch quoting schemes using either string map or regsub to do the main work.
Here's an example of string map:
set input "O'Donnell's Bait Shop"
set quoted '[string map {' {'\''}} $input]' ; #'# This comment just because of stupid Stack Overflow syntax highlighter
puts $quoted
# ==> 'O'\''Donnell'\''s Bait Shop'
Here's an example of regsub:
set input "This uses a hypothetical quoting of some letters"
set quoted <[regsub -all {[pqr]} $input {«&»}]>
puts $quoted
# ==> <This uses a hy«p»othetical «q»uoting of some lette«r»s>
You'll need to decide what sort of quoting you really want to use. For myself, if I was going through several shells, I'd be wanting to avoid quoting at all (because it is difficult to get right) and instead find ways to send the data in some other way, perhaps over a pipeline or in a temporary file. At a pinch, I'd use an environment variable, as shells tend to not mess around with those nearly as much as arguments.
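The environment-variable route can be sketched from the shell side as follows. This is only an illustration under assumptions: PAYLOAD is a made-up variable name, the value is an invented dict-like string, and two nested sh -c calls stand in for the "few shell levels". From Tcl, the variable could be set through the global env array before the first shell is launched.

```shell
# Illustrative dict-like value with spaces, quotes and braces.
payload="name \"O'Donnell\" size {10 20}"

# PAYLOAD is a made-up name; exported variables are inherited by child processes.
export PAYLOAD="$payload"

# Two nested shell levels; only the variable name appears on the command lines,
# so the value itself is never re-parsed by any of the shells.
received=$(sh -c "sh -c 'printf %s \"\$PAYLOAD\"'")

[ "$received" = "$payload" ] && echo "value arrived unchanged"
```

The value stays readable for debugging (printing the variable in any of the shells shows it as is), which is the property the base64 approach lacks.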
I have a Tcl program where I often find expressions of the following kind:
proc func {} {...}
...
lappend arr([set v [func]]) $v
The intended meaning of the last line is
set v [func]
lappend arr($v) $v
It obviously works. What I would like to know: Does it work "by accident", or does Tcl guarantee, that the first parameter passed to lappend is evaluated before the second?
Tcl always performs substitutions from left to right, as you can read in the documentation. I quote the relevant part:
Substitutions take place from left to right, and each substitution is evaluated completely before attempting to evaluate the next. Thus, a sequence like:
set y [set x 0][incr x][incr x]
will always set the variable y to the value, 012.
I agree with Jerry. Here is some more detail.
Tcl commands are evaluated in two steps : parsing & execution.
First the Tcl interpreter parses the command string into words, performing substitutions along the way.
Then a command procedure processes the words to produce a result string. Each command has a separate command procedure.
Let us consider the following code.
% set input "The cat in the hat"
The cat in the hat
% string match "*at in*" $input
1
In the parsing step the Tcl interpreter applies the rules described in this chapter to divide the command up into words and perform substitutions.
Parsing is done in exactly the same way for every command. During the parsing step the Tcl interpreter does not apply any meaning to the values of the words. Tcl just performs a set of simple string operations such as replacing the characters $a with the string stored in variable a. Tcl does not know or care whether a or the resulting word is a number or the name of a widget or anything else.
In the execution step meaning is applied to the words of the command. Tcl treats the first word as a command name, checking to see if the command is defined and locating a command procedure to carry out its function. If the command is defined then the Tcl interpreter invokes its command procedure, passing all of the words of the command to the command procedure. The command procedure is free to interpret the words in any way that it pleases, and different commands apply very different meanings to their arguments.
Major rule to remember here
Tcl parses a command and makes substitutions in a single pass from left to right. Each character is scanned exactly once.
At most a single layer of substitution occurs for each character; the result of one substitution is not scanned for further substitutions.
Reference : Tcl and the Tk Toolkit