In Tcl, how to edit a string in an open file? - tcl

let's say that I have opened a file using:
set in [open "test.txt" r]
I intend to revise a string on a certain line, like:
style="fill:#ff00ff;fill-opacity:1"
and this line number is: 20469
I want to change the value ff00ff to another string value, like ff0000.
What are the proper ways to do this? Thanks in advance!

You need to open the file in read-write mode; the r+ mode is probably suitable.
In most cases with files up to a reasonable number of megabytes long, you can read the whole file into a string, process that with a command like regsub to perform the change in memory, and then write the whole thing back after seeking to the start of the file. Since you're not changing the size of the file, this will work well. (Shortening the file requires explicit truncation.)
set f [open "test.txt" r+]
set data [read $f]
# The closing quote is inside the second group so that it is preserved
regsub {(style="fill:#)ff00ff(;fill-opacity:1")} $data {\1ff0000\2} data
seek $f 0
puts -nonewline $f $data
# If you need it, add this here by uncommenting:
#chan truncate $f
close $f
There are other ways to do the replacement; the choice depends on the details of what you're doing.
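One such alternative, since the line number is known, is to change only that line. The following is a minimal sketch, not a definitive implementation: the helper name replaceOnLine, its 1-based line argument, and the scratch file name are assumptions made for illustration. It uses string map, which does a literal (non-regex) substitution.

```tcl
# Hypothetical helper: replace the literal text $old with $new
# on one line (1-based) of the file at $path.
proc replaceOnLine {path lineNo old new} {
    set f [open $path r+]
    set lines [split [read $f] "\n"]
    set idx [expr {$lineNo - 1}]
    if {$idx < 0 || $idx >= [llength $lines]} {
        close $f
        error "line $lineNo is out of range"
    }
    # Only the selected line is touched; all other lines are written back as read
    lset lines $idx [string map [list $old $new] [lindex $lines $idx]]
    seek $f 0
    puts -nonewline $f [join $lines "\n"]
    # Needed only if the replacement is shorter than the original text
    chan truncate $f
    close $f
}

# Demo on a scratch file (the file name is illustrative)
set path "replace-demo.txt"
set f [open $path w]
puts $f "header"
puts $f {style="fill:#ff00ff;fill-opacity:1"}
puts $f "footer"
close $f

replaceOnLine $path 2 ff00ff ff0000
```

For the file in the question, the call would be replaceOnLine test.txt 20469 ff00ff ff0000.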

Related

how to read the binary section of script currently being evaluated?

How do I read the section after end-of-stream (^Z) in a Tcl-script being sourced?
So far I got info script returning the filename of the currently sourced script which I could open just like any file and put the read position to after end-of-stream by just parsing the file.
In theory the content of the file could change between the invocation of source and subsequent info script and open, possibly causing temporal inconsistency between read script and binary data.
Is there a magic command for this that I've missed? Or do we rely on users/administrators making sure such inconsistencies can't happen?
Suggestion
Provide your own custom source that extracts the trailer in the same I/O step as sourcing the contained script. For example:
interp hide {} source source
proc ::source {fp} {
set size [file size $fp]
set chan [open $fp r]
info script $fp
try {
chan configure $chan -eofchar {\u001a {}}
set script [read $chan]
uplevel 1 [list eval $script]
set scriptOffset [chan tell $chan]
if {$scriptOffset < $size} {
chan seek $chan 1 current; # move cursor beyond eof
chan configure $chan -translation binary
set trailer [read $chan]
# do whatever you want to do with the trailer
}
} finally {
close $chan
}
}
Some remarks
The trick is to employ the same machinery as Tcl's source does internally: configure -eofchar.
Once it has been determined that there is a trailer (i.e., content beyond the eof char), seek is used to move the cursor just past the eof char that ends the script.
A second read will then get you the trailer.
From this point onwards, you must be careful to keep the trailer value in its byte-array form.
Disclaimer: Tcl wizards like Donal might have better ways of doing so. Also, single-file distribution mechanisms like starkits might have helpers for dealing with script trailers.

How to map a file handle to the file it is accessing

I am trying to map a file handle like file6
(the result of a previous set fh [open somefile.txt w]) to the file it was accessing.
So I'd like a mapping between file6 --> somefile.txt
I tried file channels, but this only lists the channel names - not the actual file name.
As you've noticed, Tcl doesn't keep this information for you, but you can easily keep track of it for yourself by doing something like this:
set fh [open somefile.txt w]
set filenames($fh) somefile.txt
Then when you want to know the associated file name, you can
puts $filenames($fh)
You can automate this by e.g.:
proc myOpen {name args} {
global filenames
set fh [open $name {*}$args]
set filenames($fh) $name
return $fh
}
This is of course a quick-and-dirty semi-solution that leaves some important things open, like for instance how the association needs to be similarly removed if the channel is closed. It is possible, but a bit complicated, to create a more comprehensive solution.
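As a partial step towards that, a matching close wrapper can remove the association. This is only a sketch under the same assumptions as above: myClose is a hypothetical name, and channels closed by a plain close (or at interpreter exit) are still not tracked.

```tcl
# myOpen is repeated here so that the example is self-contained
proc myOpen {name args} {
    global filenames
    set fh [open $name {*}$args]
    set filenames($fh) $name
    return $fh
}

# Hypothetical counterpart: close the channel and forget its file name
proc myClose {fh} {
    global filenames
    close $fh
    unset -nocomplain filenames($fh)
}

# Demo with a scratch file (the file name is illustrative)
set fh [myOpen handle-demo.txt w]
set nameWhileOpen $filenames($fh)
myClose $fh
set stillTracked [info exists filenames($fh)]
```

After this runs, nameWhileOpen holds handle-demo.txt and stillTracked is 0.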
Documentation:
global,
open,
proc,
puts,
return,
set,
{*} (syntax)

collecting set of files using tcl script

I am looking to generate a tcl script, which reads each line of a file, say abc.txt; each line of abc.txt is a specific location of set of files which need to be picked except the ones commented.
For example abc.txt has
./pvr.vhd
./pvr1.vhd
// ./pvr2.vhd
So I need to read each line of abc.txt, pick the file from the location it mentions, and store it in a separate file, except for the ones that start with "//".
Any hint or script will be deeply appreciated.
The usual way of doing this is to put a filter at the start of the line-processing loop so that commented lines are skipped. You can use string match to detect whether a line is to be filtered out.
set f [open "abc.txt"]
set lines [split [read $f] "\n"]
close $f
foreach line $lines {
if {[string match "//*" $line]} {
continue
}
# ... do your existing processing here ...
}
This also works just as well when used with a streaming loop (while {[gets $f line] >= 0} {…}).
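A sketch of that streaming variant follows. The proc name readFileList and the scratch file name are assumptions for illustration; the sketch also trims whitespace and skips blank lines, which the question does not ask for but which is usually wanted.

```tcl
# Hypothetical helper: return the uncommented paths listed in $listFile
proc readFileList {listFile} {
    set paths {}
    set f [open $listFile]
    while {[gets $f line] >= 0} {
        set line [string trim $line]
        # Skip blank lines and lines commented out with //
        if {$line eq "" || [string match "//*" $line]} {
            continue
        }
        lappend paths $line
    }
    close $f
    return $paths
}

# Demo on a scratch list file with the same shape as abc.txt
set f [open "filelist-demo.txt" w]
puts $f "./pvr.vhd"
puts $f "./pvr1.vhd"
puts $f "// ./pvr2.vhd"
close $f

set picked [readFileList "filelist-demo.txt"]
```

Here picked ends up containing ./pvr.vhd and ./pvr1.vhd; each entry could then be passed to file copy or read and appended to the output file, depending on what "store" should mean.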

How to mask the sensitive information contained in a file using tcl?

I'm trying to implement a tcl script which reads a text file, masks all the sensitive information (such as passwords, ip addresses etc) contained in it, and writes the output to another file.
As of now I'm just substituting this data with ** or ##### and searching the entire file with regexp to find the stuff which I need to mask. But since my text file can be 100K lines of text or more, this is turning out to be incredibly inefficient.
Are there any built in tcl functions/commands I can make use of to do this faster? Do any of the add on packages provide extra options which can help get this done?
Note: I'm using tcl 8.4 (But if there are ways to do this in newer versions of tcl, please do point me to them)
Generally speaking, you should put your code in a procedure to get best performance out of Tcl. (You have got a few more related options in 8.5 and 8.6, such as lambda terms and class methods, but they're closely related to procedures.) You should also be careful with a number of other things:
Put your expressions in braces (expr {$a + $b} instead of expr $a + $b) as that enables a much more efficient compilation strategy.
Pick your channel encodings carefully. (If you do fconfigure $chan -translation binary, that channel will transfer bytes and not characters. However, gets is not very efficient on byte-oriented channels in 8.4. Using -encoding iso8859-1 -translation lf will give most of the benefits there.)
Tcl does channel buffering quite well.
It might be worth benchmarking your code with different versions of Tcl to see which works best. Try using a tclkit build for testing if you don't want to go to the (minor) hassle of having multiple Tcl interpreters installed just for testing.
The idiomatic way to do line-oriented transformations would be:
proc transformFile {sourceFile targetFile RE replacement} {
# Open for reading
set fin [open $sourceFile]
fconfigure $fin -encoding iso8859-1 -translation lf
# Open for writing
set fout [open $targetFile w]
fconfigure $fout -encoding iso8859-1 -translation lf
# Iterate over the lines, applying the replacement
while {[gets $fin line] >= 0} {
regsub -- $RE $line $replacement line
puts $fout $line
}
# All done
close $fin
close $fout
}
If the file is small enough that it can all fit in memory easily, this is more efficient because the entire match-replace loop is hoisted into the C level:
proc transformFile {sourceFile targetFile RE replacement} {
# Open for reading
set fin [open $sourceFile]
fconfigure $fin -encoding iso8859-1 -translation lf
# Open for writing
set fout [open $targetFile w]
fconfigure $fout -encoding iso8859-1 -translation lf
# Apply the replacement over all lines
regsub -all -line -- $RE [read $fin] $replacement outputlines
puts $fout $outputlines
# All done
close $fin
close $fout
}
Finally, regular expressions aren't necessarily the fastest way to do matching of strings (for example, string match is much faster, but accepts a far more restricted type of pattern). Transforming one style of replacement code to another and getting it to go really fast is not 100% trivial (REs are really flexible).
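When the sensitive values are known literal strings rather than patterns, string map is one non-regex option. The sketch below assumes exactly that; the sample line and the secrets list are made up for illustration, and this approach cannot mask values that are only describable by a pattern.

```tcl
# Illustrative input line and a map of literal secrets to their masks
set line "login admin password hunter2 from 10.0.0.1"
set secrets [list hunter2 ***** 10.0.0.1 *****]

# string map replaces every listed literal in one pass, without regex matching
set masked [string map $secrets $line]
puts $masked
```

The same call can be applied to the whole file contents at once, in place of the regsub -all in the in-memory version above.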
Especially for very large files, as mentioned, reading the whole file into a variable is not the best approach. Once your system runs out of memory, you can't prevent your app from crashing. For data that is separated by line breaks, the easiest solution is to buffer one line at a time and process it.
Just to give you an example:
# Open old and new file
set old [open "input.txt" r]
set new [open "output.txt" w]
# Configure input channel to provide data separated by line breaks
fconfigure $old -buffering line
# Until the end of the file is reached:
while {[gets $old ln] != -1} {
# Mask sensitive information on variable ln
...
# Write back line to new file
puts $new $ln
}
# Close channels
close $old
close $new
I can't think of any better way to process large files in Tcl - please feel free to tell me any better solution. But Tcl was not made to process large data files. For real performance you may want to use a compiled language instead of a scripting language.
Edit: Replaced ![eof $old] in while loop.
A file with 100K lines is not that much (unless every line is 1K chars long :) so I'd suggest you read the entire file into a var and make the substitution on that var:
set fd [open file r+]
set buf [read $fd]
# passwdPattern is a placeholder for your own password-matching regular expression
set buf [regsub -all -- $passwdPattern $buf ****]
# write it back
seek $fd 0; # This is not safe! See potrzebie's comment for details.
puts -nonewline $fd $buf
close $fd

Tcl seek and write in a file opened with 'a+'

I need to store some logs in a file that can grow with every execution. A logical way would be to use a+ option when opening because using w+ would truncate the file. However, with the a+ option (Tcl 8.4) I cannot write anywhere in the file. seek works fine. I can verify that the pointer was moved using tell. But the output is always done at the tail end of the file.
Is there any way to resolve this? I.e. having the ability to seek and write in any place and also preserve the old file at the open.
In Tcl 8.5, the behavior of Tcl on Unix was changed so that the O_APPEND flag is passed to the open() system call. This causes the OS to always append the data to the file, and is inherited when the FD is passed to subprocesses; for logs, it is exactly the right thing. (In 8.4 and before, and in all versions on Windows, the behavior is simulated inside Tcl's file channel implementation, which will internally seek() to the end immediately before the write(); that obviously is subject to potential problems with race conditions when there are multiple processes logging to the same file and is definitely unsafe when the FD is passed to subprocesses.) You can manage truncation of the opened file with chan truncate (new in 8.5), which works just fine on a+-opened files.
If you do not want the seek-to-end behavior, you should not use a+ (or a). Try r+ or some combination of flags, like this:
set f [open $filename {RDWR CREAT}]
For comparison, the a+ option is now exactly the same as the flags RDWR CREAT APPEND, and not all combinations of longer flags can be described by short form flag specifiers. If you're not specifying APPEND, you'll need to do the seek $f 0 end yourself (and watch out for problems with multiple processes if you're appending to logs; that's when APPEND becomes required and exceptionally hard to correctly simulate any other way).
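A small sketch of that flag combination, assuming a single process and an illustrative scratch file name: the file is kept across opens, appending is done with an explicit seek to the end, and a later seek still allows overwriting earlier content.

```tcl
# Scratch log file (the name is illustrative); start from a clean state
set filename "rdwr-demo.log"
file delete $filename

# First run: the file is created because of CREAT
set f [open $filename {RDWR CREAT}]
puts $f "first"
close $f

# Second run: the old contents are preserved (no TRUNC flag)
set f [open $filename {RDWR CREAT}]
seek $f 0 end              ;# append by hand, since APPEND is not set
puts $f "second"
seek $f 0 start            ;# without APPEND, writes go where we seek
puts -nonewline $f "FIRST"
close $f

set f [open $filename r]
set contents [read $f]
close $f
```

Because "FIRST" has the same length as "first", the overwrite leaves the rest of the file intact.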
Open with r+ - it opens in read mode (thus not truncating the file) but allows writing as well.
See the documentation of open for more info: http://www.tcl.tk/man/tcl8.5/TclCmd/open.htm
I have verified that using the a+ option allows me to read/write anywhere in the file. However, by writing in the middle (or at the beginning) of a file, I overwrite the data there rather than inserting. The following code illustrates that point:
#!/usr/bin/env tclsh
# Open the file, with truncation
set f [open foo w]
puts $f "one"
puts $f "two"
close $f
# Open again, with a+ ==> read/write/append
set f [open foo a+]
puts $f "three" ;# This goes to the end of the file
seek $f 4 ;# Seek to the beginning of the word "two"
puts $f "2.0" ;# Overwrite the word "two"
close $f
# Open and verify the contents
set f [open foo r]
puts [read $f]
close $f
Output:
one
2.0
three
If you are looking to insert in the middle of the file, you might want to look at the fileutil package, which contains the ::fileutil::insertIntoFile command.
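For small files, insertion can also be sketched in plain Tcl by reading the contents, splicing the new text in, and writing everything back. This is not the fileutil implementation; insertAt is a hypothetical helper, and its offset is assumed to be a character index into the contents as read.

```tcl
# Hypothetical helper: insert $text before character index $offset in $path
proc insertAt {path offset text} {
    set f [open $path r+]
    set data [read $f]
    set head [string range $data 0 [expr {$offset - 1}]]
    set tail [string range $data $offset end]
    seek $f 0
    # The file only grows, so no truncation is needed
    puts -nonewline $f $head$text$tail
    close $f
}

# Demo on a scratch file (the name is illustrative)
set f [open "insert-demo.txt" w]
puts $f "one"
puts $f "two"
close $f

insertAt "insert-demo.txt" 4 "1.5\n"

set f [open "insert-demo.txt" r]
set result [read $f]
close $f
```

Unlike the overwrite shown above, the word "two" survives and now follows the inserted line.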