Grep the word inside double quote - tcl

How can I extract a word inside a double quote inside a file?
e.g.
variable "xxx"

Reading a text file into Tcl is just this:
set fd [open $filename]
set data [read $fd] ;# Now $data is the entire contents of the file
close $fd
To get the first quoted string (under some assumptions, notably a lack of backslashed double-quote characters inside the double quotes), use this:
if {[regexp {"([^""]*)"} $data -> substring]} {
# We found one, it's now in $substring
}
(Doubling up the quote in the brackets is totally unnecessary — only one is needed — but it does mean that the highlighter does the right thing here.)
The simplest method of finding all the quoted strings is this:
foreach {- substring} [regexp -inline -all {"([^""]*)"} $data] {
# One of the substrings is $substring at this point
}
Notice that I'm using the same regular expression in each case. Indeed, it's actually good practice to factor such REs (especially if repeatedly used) into a variable of their own so that you can “name” them.
Combining all that stuff above:
set FindQuoted {"([^""]*)"}
set fd [open $filename]
foreach {- substring} [regexp -inline -all $FindQuoted [read $fd]] {
puts "I have found $substring for you"
}
close $fd
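If the assumption about backslashed quotes does not hold, the pattern can be widened. The following is only a sketch: the name FindQuotedEsc and the sample data are made up for illustration, and it assumes that a backslash inside the quotes always escapes the single character that follows it.

```tcl
# Hypothetical variant of the RE that tolerates backslash-escaped characters:
# inside the quotes, accept either a character that is neither a quote nor a
# backslash, or a backslash followed by any one character.
set FindQuotedEsc {"((?:[^"\\]|\\.)*)"}

# Sample data (made up for this sketch); braces keep the backslashes literal
set data {say "a \"b\" c" and "d"}

set found {}
foreach {- substring} [regexp -inline -all $FindQuotedEsc $data] {
    lappend found $substring
}
puts [llength $found]   ;# prints 2
puts [lindex $found 1]  ;# prints d
```

The captured substrings still contain the backslashes; removing them is a separate step.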

Internal Matching
If you're just looking for a regular expression, then you can use Tcl's capture groups. For example:
set string {variable "xxx"}
regexp {"(.*)"} $string match group1
puts $group1
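This will print xxx, discarding the quotes. Note that .* is greedy, so on a string with several quoted parts it matches from the first quote to the last one; use [^"]* instead to stop at the nearest closing quote.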
External Matching
If you want to match data in a file without having to handle reading the file into Tcl directly, you can do that too. For example:
set match [exec sed {s/^variable "\(...\)"/\1/} /tmp/foo]
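This will call sed to find just the parts of the match you want, and assign them to a Tcl variable for further processing. In this example, the match variable is set to xxx as above, but the command operates on an external file rather than a stored string. Be aware that ... matches exactly three characters and that sed passes non-matching lines through unchanged, so this gives just xxx only when the file contains nothing but that line.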

When you just want to find with grep all words in quotes in a file and do something with the words, you do something like this (in a shell):
grep -o '"[^"]*"' | while read word
do
# do something with $word
echo extracted: $word
done

Related

change a number in txt file

I have the next expression: jj_ftfll h\\h\ -0.8898:0.006656 0.998:0.99999 h&j\hhh in a txt file,
and I need to add 0.005 to the 0.006656 number. I want to use Tcl and I can't think of any good idea.
There are several aspects that are tricky:
1. The file needs to be edited in place even though the addition might change the length of the line. (Such an addition could either lengthen or shorten the line.)
2. There needs to be a way of robustly recognising that that is the line to modify, and not some other line in the file. (This is actually the hardest of these problems in reality; it's extremely application-specific.)
3. The number needs to be extracted from the line, modified, and written back.
4. The values you are dealing with are potentially (well, actually) not represented precisely in IEEE binary floating point, which is what Tcl will use to do the calculations.
Bearing all that in mind, we are talking about these sorts of solutions, one for each point above:
1. We'll read the whole file in, split it into a list of strings, one string per line (henceforth referred to as the lines), update the lines of interest, and then write the whole lot back.
2. We'll use regexp to decide if a line is of interest. That's by far the most common command for this sort of task.
3. This one is messy in Tcl 8.6 and before. It's got a much better solution in Tcl 8.7.
4. There's really not all that much you can do about this. If you know the range of the numbers, you can use format to help… but it's messy. But maybe you'll get lucky.
set filename "foobar.txt"
# Get the lines of the file; this is GREAT if the file isn't too large
set f [open $filename]
set lines [split [read $f] "\n"]
close $f
# Now THAT'S what I call a horrible regular expression!
set RE {^(jj_ftfll\s+h\\\\h\\\s+-?[\d.]+:)(-?[\d.]+)(\s+-?[\d.]+:-?[\d.]+\s+h&j\\hhh)$}
set newLines {}
foreach line $lines {
if {[regexp $RE $line -> prefix number suffix]} {
set line $prefix[expr {$number + 0.005}]$suffix
}
lappend newLines $line
}
# Write back over the file; the -nonewline prevents the number of lines from growing
set f [open $filename w]
puts -nonewline $f [join $newLines "\n"]
close $f
The trick with the regexp is that I am matching three pieces: the bit of the line before the part to replace (saved in the variable prefix), the number to replace itself (number), and the bit after the part to replace (suffix); the regexp command returns the number of times it matches (1 if the RE is found, 0 if it isn't). It's a scary RE mostly because it has -?[\d.]+ to match those floating point numbers, and I've changed spaces to \s+ (i.e., “at least one whitespace character”).
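For the floating-point concern, one possible use of format is sketched below. The helper names decimals and addFixed are hypothetical, and the sketch assumes the numbers are written in plain decimal notation (no exponents), so that the result can be rounded to the larger number of decimal places found in the two operands.

```tcl
# Count the digits after the decimal point in a plainly written number
proc decimals {num} {
    set dot [string first . $num]
    if {$dot < 0} {
        return 0
    }
    return [expr {[string length $num] - $dot - 1}]
}

# Add delta to number and round the result to the larger number of
# decimal places used by the two operands
proc addFixed {number delta} {
    set places [expr {max([decimals $number], [decimals $delta])}]
    return [format %.${places}f [expr {$number + $delta}]]
}

puts [addFixed 0.006656 0.005]  ;# prints 0.011656
```

In the loop above, [addFixed $number 0.005] could then take the place of the bare expr call.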
The version for 8.7 is this:
set filename "foobar.txt"
# Get the lines of the file; this is GREAT if the file isn't too large
set f [open $filename]
set lines [split [read $f] "\n"]
close $f
# Now THAT'S what I call a horrible regular expression!
set RE {^(jj_ftfll\s+h\\\\h\\\s+-?[\d.]+:)(-?[\d.]+)(\s+-?[\d.]+:-?[\d.]+\s+h&j\\hhh)$}
# regsub -command appends the whole match and then each captured group
proc addDeltaInLine {delta all prefix number suffix} {
set number [expr {$number + $delta}]
return [string cat $prefix $number $suffix]
}
set newLines [lmap line $lines {
regsub -command $RE $line {addDeltaInLine 0.005}
}]
# Write back over the file; the -nonewline prevents the number of lines from growing
set f [open $filename w]
puts -nonewline $f [join $newLines "\n"]
close $f
The combination of lmap and regsub -command cleans things up quite a bit. The RE is still scary though…

Can I convert a string with space using totitle?

The Tcl documentation is clear on how to use string totitle:
Returns a value equal to string except that the first character in
string is converted to its Unicode title case variant (or upper case
if there is no title case variant) and the rest of the string is
converted to lower case.
Is there a workaround or method that will convert a string with spaces (the first letter of each word would be upper case)?
For example in Python:
intro : str = "hello world".title()
print(intro) # Will print Hello World, notice the capital H and W.
In Tcl 8.7, the absolutely most canonical way of doing this is to use regsub with the -command option to apply string totitle to the substrings you want to alter:
set str "hello world"
# Very simple RE: (greedy) sequence of word characters
set tcstr [regsub -all -command {\w+} $str {string totitle}]
puts $tcstr
In earlier versions of Tcl, you don't have that option so you need a two stage transformation:
set tcstr [subst [regsub -all {\w+} $str {[string totitle &]}]]
The problem with this is that it will blow up if the input string has certain Tcl metacharacters in it; it is possible to fix this, but it's horrible to do. I added the -command option to regsub precisely because I was fed up with having to do a multi-stage substitution just to make a string I could feed through subst. Here's the safe version (the input stage could also be done with string map):
set tcstr [subst [regsub -all {\w+} [regsub -all {[][$\\]} $str {\\&}] {[string totitle &]}]]
It gets really complicated (well, at least quite non-obvious) when you want to do the replacement on substrings that have themselves been transformed. That is why it is now possible to circumvent all that mess with regsub -command, which is careful with word boundaries when running the replacement command (because the Tcl C API is actually good at that).
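On older Tcl versions, another way to avoid subst entirely is to locate the words by index and replace them in place. This is a sketch rather than a canonical solution: the proc name titleCaseWords is made up, and it assumes that title-casing a word does not change its length (true for ASCII text), since the indices are computed once up front.

```tcl
# Title-case every run of word characters without passing the text to subst
proc titleCaseWords {str} {
    # Each element is a {first last} index pair for one word
    foreach range [regexp -all -inline -indices {\w+} $str] {
        lassign $range first last
        set word [string range $str $first $last]
        set str [string replace $str $first $last [string totitle $word]]
    }
    return $str
}

puts [titleCaseWords "hello world"]        ;# prints Hello World
puts [titleCaseWords {hello [world] $x}]   ;# prints Hello [World] $X
```

Because the input is never evaluated, brackets, dollar signs and backslashes in it are left alone.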
Donal gave you an answer, but there is also a package that does what you want: textutil::string from Tcllib.
package require textutil::string
puts [::textutil::string::capEachWord "hello world"]
> Hello World

To find line index and word index by reading a text file

I have just started learning Tcl, can someone help me how to find line index and word index for a particular word by reading a text file using Tcl.
Thank you
As mentioned in the comments, there are a lot of basic commands you might use to solve your problem. To read a file into a list of lines you could use the open, split, read and close commands as follows:
set file_name "x.txt"
# Open a file in a read mode
set handle [open $file_name r]
# Create a list of lines
set lines [split [read $handle] "\n"]
close $handle
Finding a certain word in a list of lines might be achieved by using a for loop, incr and a set of list-related commands like llength, lindex and lsearch. Most plain-text lines can be interpreted and processed as a list, although a line with unbalanced braces or quotes is not a valid list and makes the list commands raise an error. The implementation might look like this:
# Searching for a word "word"
set needle "word"
set w -1
# For each line (you can use `foreach` command here)
for {set l 0} {$l < [llength $lines]} {incr l} {
# Treat a line as a list and search for a word
# (-exact stops lsearch from treating the word as a glob pattern)
if {[set w [lsearch -exact [lindex $lines $l] $needle]] != -1} {
# Exit the loop if you found the word
break
}
}
if {$w != -1} {
puts "Word '$needle' found. Line index is $l. Word index is $w."
} else {
puts "Word '$needle' not found."
}
Here, the script iterates over the lines and searches each one for a given word as if it were a list. Executing a list command on a string splits it on whitespace by default. The loop stops when the word is found in a line (when lsearch returns a non-negative index).
Also note that the list commands treat multiple spaces as a single separator. In this case that seems to be the desired behavior. Using the split command on a string with a double space would effectively create a "zero length word", which might yield an incorrect word index.
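If the file may contain lines that are not valid Tcl lists, the words can be pulled out with a regular expression instead. The sketch below is one way to do that; the proc name findWord and the sample lines are made up for illustration.

```tcl
# Return {lineIndex wordIndex} for the first occurrence of needle,
# or an empty list when the word does not occur
proc findWord {lines needle} {
    set l 0
    foreach line $lines {
        # Split on runs of whitespace; safe even for unbalanced braces
        set words [regexp -all -inline {\S+} $line]
        set w [lsearch -exact $words $needle]
        if {$w >= 0} {
            return [list $l $w]
        }
        incr l
    }
    return {}
}

# Sample data (hypothetical); the second line is not a valid Tcl list
set lines [list "alpha  beta" "it's \{an unbalanced line" "some word here"]
puts [findWord $lines word]  ;# prints 2 1
```

Runs of spaces still count as a single separator here, so the word indices agree with the list-based approach.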

how to get specific parameters in a square bracket and store it in to a specific variable in tcl

set_dont_use [get_lib_cells */*CKGT*0P*] -power
set_dont_use [get_lib_cells */*CKTT*0P*] -setup
The above is a text file.
I want to store */*CKGT*0P* and */*CKTT*0P* in a variable. This is the program that someone helped me with:
set f [open theScript.tcl]
# Even with 10 million lines, modern computers will chew through it rapidly
set lines [split [read $f] "\n"]
close $f
# This RE will match the sample lines you've told us about; it might need tuning
# for other inputs (and knowing what's best is part of the art of RE writing)
set RE {^set_dont_use \[get_lib_cells ([\w*/]+)\] -\w+$}
foreach line $lines {
if {[regexp $RE $line -> term]} {
# At this point, the part you want is assigned to $term
puts "FOUND: $term"
}
}
My question is: what if there is more than one cell, for example
set_dont_use [get_lib_cells */*CKGT*0P* */*CKOU*TR* /*....] -power
set_dont_use [get_lib_cells */*CKGT*WP* */*CKOU*LR* /*....] -setup
Then the above script doesn't store these "n" cells in the variable term.
Could anyone help me?
Thanks in advance.
I would go with
proc get_lib_cells args {
global term
lappend term {*}$args
}
proc unknown args {}
and then just
source theScript.tcl
in a shell that doesn't have the module you are using loaded, and thus doesn't know any of these non-standard commands.
By setting unknown to do nothing, other commands in the script will just be passed over.
Note that redefining unknown impairs Tcl's ability to automatically load some commands, so don't keep using that interpreter after this.
Documentation: global, lappend, proc, unknown, {*} (syntax)
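One way to keep the redefined unknown away from your main interpreter is to run the script in a throwaway child interpreter. This is a sketch under assumptions: the script text is embedded as a string so the example is self-contained (with a real file you would have the child source theScript.tcl instead), and the cell patterns are taken from the question.

```tcl
# Stand-in for the contents of theScript.tcl
set script {
set_dont_use [get_lib_cells */*CKGT*0P* */*CKOU*TR*] -power
set_dont_use [get_lib_cells */*CKTT*0P*] -setup
}

# Define the stubs in a child interpreter only
set child [interp create]
$child eval {
    set term {}
    proc get_lib_cells args {
        global term
        lappend term {*}$args
    }
    # Swallow every command the child does not know (e.g. set_dont_use)
    proc unknown args {}
}
$child eval $script

# Copy the collected patterns out, then discard the child
set term [$child eval {set term}]
interp delete $child
puts $term  ;# prints */*CKGT*0P* */*CKOU*TR* */*CKTT*0P*
```

The main interpreter keeps its normal unknown handler, so it stays usable afterwards.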
Your code looks like Synopsys syntax, which means it shouldn't work the way you wrote it; I'd expect curly braces:
set_dont_use [get_lib_cells {*/*CKGT*0P* */*CKOU*TR* /*....}] -power
Moreover, \w on its own doesn't match * or /.
If I were you, I'd go for set RE {^set_dont_use \[get_lib_cells \{?([^\]\}]+)\}?\] -\w+$} and treat the captured text as a list.
Edit:
For example:
% regexp {^set_dont_use \[get_lib_cells \{?(\S+) ?\}?\]} $line -> match
1
% echo $match
*/*CKGT*0P*
If you have more than one item in your line, add another parenthesized group inside the curly braces:
regexp {^set_dont_use \[get_lib_cells \{?(\S+) ?(\S+)?\}?\]} $l -> m1 m2
etc.
Another edit:
In case you want multiple matches with the same single pattern, then instead of \S+ you should try something that looks like this: [A-Za-z\/\*]
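An alternative to adding one group per item is to capture everything between the brackets once and then split that text into words. The following sketch assumes the line shape shown in the question; the variable names inner and term are only illustrative.

```tcl
set line {set_dont_use [get_lib_cells */*CKGT*0P* */*CKOU*TR*] -power}

# Capture whatever sits between the brackets (optionally wrapped in braces)
set RE {^set_dont_use \[get_lib_cells \{?([^\]\}]+)\}?\] -\w+$}

set term {}
if {[regexp $RE $line -> inner]} {
    # Split the captured text into its whitespace-separated patterns
    set term [regexp -all -inline {\S+} $inner]
}
puts $term  ;# prints */*CKGT*0P* */*CKOU*TR*
```

This works for any number of cells on the line, with or without the curly braces.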

non-case-sensitive version of file exists command

Well, not sure what to do in this regard. A little while ago I modified a logging script for an eggdrop bot.. but now an issue unfolds that for some reason, it is logging actions/text in separate files because of an issue of character case. #channel.html exists, as does #Channel.html, though the former is written to because of the current state of the channel name(it can change if all users leave and one rejoins with different case).
I've narrowed this problem down to what I believe is the issue. file exists 'filename_here'. I've looked through tcl's documentation, and I've read through the wiki regarding mixed case file names(it treats them as different files of course), but I have yet to find such an option(or user made proc) that would allow me to disable this behavior.
Is there a way around/to do this?
It really depends on the file system (i.e., the OS) as file exists is just a thin wrapper around the OS's basic file existence test. Classic Unix filesystems are mostly case-sensitive, whereas Windows filesystems are usually case-insensitive. This means that it is usually best to write your code to be careful with handling the case of things; you probably ought to consider using string tolower to get a channel name in an expected case (since I think IRC channel names are case-insensitive).
But if you can't do that, the best you can do is to get the list of filenames that match case-insensitively and check if that's a single value. Alas, this is a messy operation as glob doesn't have a -nocase option (it's rare that people want such a thing), so we need to use string match -nocase to help out:
set files [lmap f [glob *.html] {
expr {[string match -nocase ${channel}.html $f] ? $f : [continue]}
}]
if {[llength $files] == 1} {
set channel_file [lindex $files 0]
} else {
# Oh no! Ambiguity!
}
That uses lmap from Tcl 8.6; earlier versions of Tcl should use this instead:
set files {}
foreach f [glob *.html] {
if {[string match -nocase ${channel}.html $f]} {
lappend files $f
}
}
if {[llength $files] == 1} {
set channel_file [lindex $files 0]
} else {
# Oh no! Ambiguity!
}
Pick a filename case (#channel.html, #Channel.html or #CHANNEL.HTML) and use string tolower, string totitle or string toupper respectively on filename_here. Then use that value for all file operations.
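A minimal sketch of that normalisation, assuming lower case is the chosen convention; the proc name logFileFor is hypothetical and not part of eggdrop:

```tcl
# Map a channel name to its log file name, always in lower case
proc logFileFor {channel} {
    return "[string tolower $channel].html"
}

puts [logFileFor "#Channel"]  ;# prints #channel.html
puts [logFileFor "#channel"]  ;# prints #channel.html
```

Routing every file operation through one such helper keeps both spellings of the channel name pointing at the same file.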
An lsearch filter on glob can be used to perform a case-insensitive search for a particular file name, e.g.
% lsearch -nocase -all -inline -glob [glob ./*] {*/myfile.txt}
./myFile.txt ./Myfile.txt ./MYFILE.txt
A sanity check using llength on the lsearch result above can be used to flag an error in case more than one file name is returned.