Time-resolved memory footprint of Tcl exec

What is the high-resolution, time-resolved memory behavior of Tcl 'exec'?
I understand that the 'fork' system call will be used, which first creates a copy of the process's memory image and then proceeds.
Here's the motivation for my question:
A user gave me the following observation. A 64 GB machine has a Tcl-based tool interface running with 60 GB of memory used (let's assume swap is small). At the Tcl prompt he types 'exec ls' and the process crashes with a memory error.
Your insight is much appreciated.
Thanks,
Gert

The exec command calls the fork() system call internally. This is usually OK, but it might run out of memory when the OS is configured not to swap and the originating Tcl process is very large (or if there is very little slop room; it depends on the actual situation, of course).
The ideas I have for reducing memory usage are either to use vfork() (by patching tclUnixPipe.c; you can define USE_VFORK in the makefile to enable that, and I don't know why that isn't used more widely) or to create a helper process early on (before lots of memory is used) that will do the execs on your main process's behalf. Here's how to do that latter option:
# This is setup done at the start
set forkerProcess [open "|tclsh" r+]
fconfigure $forkerProcess -buffering line -blocking 0
puts $forkerProcess {
    fconfigure stdout -buffering none
    set tcl_prompt1 ""
    set tcl_prompt2 ""
    set tcl_interactive 0
    proc exechelper args {
        catch {exec {*}$args} value options
        puts [list [list $value $options]]
    }
}
# TRICKY BIT: Yield and drain anything unwanted
after 25
read $forkerProcess
# Call this, just like exec, to run programs without memory hazards
proc do-exec args {
    global forkerProcess
    fconfigure $forkerProcess -blocking 1
    puts $forkerProcess [list exechelper {*}$args]
    # The reply spans several lines when the output contains newlines,
    # so keep reading whole lines (blocking) until it is a complete list
    set result [gets $forkerProcess]
    while {![info complete $result]} {
        if {[eof $forkerProcess]} {
            error "helper process has gone away"
        }
        append result \n [gets $forkerProcess]
    }
    fconfigure $forkerProcess -blocking 0
    lassign [lindex $result 0] value options
    return -options $options $value
}
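A variant of the same helper-process idea, sketched with a different framing technique swapped in: the helper prefixes each reply with its character count, so the parent can read exactly that much instead of looping on info complete. It assumes Tcl 8.5+ and starts the helper from [info nameofexecutable] (the same tclsh binary); the names helper and exechelper are only illustrative. As with the original, a command that reads its standard input would consume the helper's command stream, so give such commands their own input (for example with <<).

```tcl
# Start the helper early, while this process is still small.
set helper [open |[list [info nameofexecutable]] r+]
# lf translation on both sides keeps the character counts exact
fconfigure $helper -translation lf -buffering line

puts $helper {
    fconfigure stdout -translation lf -buffering full
    proc exechelper args {
        # Capture the result and return options of the real exec
        catch {exec {*}$args} value options
        set reply [list $value $options]
        # Send the length on its own line, then the payload itself
        puts stdout [string length $reply]
        puts -nonewline stdout $reply
        flush stdout
    }
}

# Call this just like exec
proc do-exec args {
    global helper
    puts $helper [list exechelper {*}$args]
    set len [gets $helper]
    if {$len eq ""} {
        error "helper process has gone away"
    }
    # read with a count blocks until exactly $len characters arrive
    lassign [read $helper $len] value options
    return -options $options $value
}
```

Errors raised by the child (non-zero exit, output on stderr) come back with their original -errorcode, because the whole options dictionary travels with the result.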

Related

How is a channel's output-buffer content deleted without writing it to the channel?

I don't know much about PHP or Tcl, but I am trying to learn both concurrently.
In PHP, I read that every script should start with ob_start and, therefore, have been using the following.
ob_start(NULL, 0, PHP_OUTPUT_HANDLER_STDFLAGS);
header('Content-Length: '.ob_get_length());
ob_end_flush();
ob_end_clean();
For Tcl channels, I see that the -buffering full and -buffersize options take care of what ob_start() does, chan flush is analogous to ob_end_flush(), and chan pending output returns the number of bytes written to the output buffer but not yet written out.
I've been looking at my two texts on Tcl and the Tcl manual web page for channels and I can't find a method of just clearing the channel output buffer without writing it.
If data is being written to a channel set to -buffering full and an error is caught/trapped is it possible to empty the buffer and not write it to the channel?
I thought perhaps I could use chan seek to set the position back to the start, similar to setting a pointer back to the beginning of a segment of RAM, but the pipe in my example doesn't appear to create a channel that supports seeking.
lassign [chan pipe] rchan wchan
chan configure $rchan -buffering line -blocking 0 -translation crlf
chan configure $wchan -buffering full -blocking 0 -translation crlf
chan puts $wchan "This is the full message which I shall attempt to truncate."
chan puts stdout "wchan pending: [chan pending output $wchan]"
chan puts stdout "wchan tell: [chan tell $wchan]"
# => -1 Thus, channel does not support seeking.
#chan seek $wchan 5 start
# => Errors invalid seek
chan flush $wchan
chan puts stdout [chan gets $rchan]
Thank you.
It sounds like you want to output text written to a channel only if no error happens in the middle of writing?
One way is to use a variable channel from tcllib; everything written to the channel is stored in a variable, which can then be written out to the real target on successful completion of whatever you're trying to do.
Example:
#!/usr/bin/env tclsh
package require tcl::chan::variable
proc main {} {
    variable output
    set output ""
    set outputchan [::tcl::chan::variable output]
    try {
        puts $outputchan "Some text"
        error "This is an error"
        # Won't get written if an error is raised
        chan flush $outputchan
        puts -nonewline $output
    } on error {errMsg errOptions} {
        # Report error if you want
    } finally {
        chan close $outputchan
    }
}
main
I don't think Tcl provides the functionality you are looking for. It's assumed that if you send something to a channel then it should always be written out.
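If you'd rather not depend on tcllib, chan create (Tcl 8.5+) lets you build a comparable in-memory channel yourself and decide afterwards whether to keep or discard what was written to it. A minimal sketch; the ::discardbuf namespace and its create, take and discard procs are hypothetical names, not an existing package:

```tcl
namespace eval ::discardbuf {
    # Bytes written to each buffer channel, keyed by channel name
    variable data

    # Reflected-channel handler (see the chan create documentation)
    proc handler {cmd chan args} {
        variable data
        switch -- $cmd {
            initialize {
                set data($chan) ""
                return {initialize finalize watch write}
            }
            finalize {
                unset -nocomplain data($chan)
            }
            watch {
                # No events to generate for an in-memory buffer
            }
            write {
                # args holds the bytes, already encoded by the channel
                set bytes [lindex $args 0]
                append data($chan) $bytes
                return [string length $bytes]
            }
        }
    }

    # Create a new, empty buffer channel
    proc create {} {
        set chan [chan create write [namespace code handler]]
        chan configure $chan -encoding utf-8 -translation lf
        return $chan
    }

    # Return everything written so far as text, then close the buffer
    proc take {chan} {
        variable data
        chan flush $chan
        set text [encoding convertfrom utf-8 $data($chan)]
        chan close $chan
        return $text
    }

    # Close the buffer; its contents are dropped in finalize
    proc discard {chan} {
        chan close $chan
    }
}
```

On success you would copy the text to the real target with puts -nonewline $realChan [::discardbuf::take $buf]; in an error handler you would call ::discardbuf::discard $buf instead, and nothing reaches the real channel.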

Using Spawn-Expect mechanism in TCL-8.5

set pipeline [open "|Certify.exe args" "r"]
fconfigure $pipeline -blocking false
fconfigure $pipeline -buffering none
fileevent $pipeline readable [list handlePipeReadable $pipeline]
proc handlePipeReadable {pipe} {
    if {[gets $pipe line] >= 0} {
        # Managed to actually read a line; stored in $line now
    } elseif {[eof $pipe]} {
        # Pipeline was closed; get exit code, etc.
        if {[catch {close $pipe} msg opt]} {
            set exitinfo [dict get $opt -errorcode]
        } else {
            # Successful termination
            set exitinfo ""
        }
        # Stop the waiting in [vwait], below
        set ::donepipe $pipe
    } else {
        puts ""
        # Partial read; things will be properly buffered up for now...
    }
}
vwait ::donepipe
I have tried using a pipe in my Tcl code, but for certain reasons I want to convert this to the spawn/expect mechanism. I'm struggling with it and running into issues when doing so. Can anyone please help me out?
Expect makes the pattern of usage very different and it uses a different way of interacting with the wrapped program that's much more like how interactive usage works (which stops a whole class of buffering-related bugs, which I suspect may be what you're hitting). Because of that, converting things over is not a drop-in change. Here's the basic pattern of use in a simple case:
package require Expect
# Note: different words become different arguments here
spawn Certify.exe args
expect "some sort of prompt string"
send "your input\r"; # \r is *CARRIAGE RETURN*
expect "something else"
send "something else\r"
expect eof
close
The real complexity comes when you set up timeouts, wait for multiple things at once, wait for patterns as well as literal strings, etc. But doing the same from ordinary Tcl (even ignoring the buffering problems) is much more work. It's also more work in virtually every other language.
Note that Expect doesn't do GUI automation. Just command-line programs. GUI automation is a much more complex topic.
It's not possible to give generic descriptions of what might be done as it depends so much on what the Certify.exe program actually does, and how you work with it interactively.
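To give a flavor of the timeout and multiple-pattern handling mentioned above, here is a sketch that drives an interactive tclsh as a stand-in for Certify.exe (whose prompts we don't know). It assumes the Expect package is installed; "% " is simply tclsh's default interactive prompt, and the regular expression is written for the echoed command followed by its answer.

```tcl
package require Expect

set timeout 10                       ;# seconds to wait in each expect
spawn [info nameofexecutable]        ;# stand-in for Certify.exe

expect "% "
send "puts \[expr {6*7}\]\r"

# Wait for one of several outcomes at once
expect {
    -re {\n(\d+)\r\n} {
        set answer $expect_out(1,string)
    }
    timeout {
        error "timed out waiting for the answer"
    }
    eof {
        error "program exited unexpectedly"
    }
}

send "exit\r"
expect eof
wait                                 ;# reap the child process
puts "answer: $answer"
```

With the real program, you would replace the prompt and the regular expression with whatever Certify.exe actually prints when used interactively.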

Write to stdout, but save tail -n 1 to a file

Is there any way to run a process in the background while showing the real-time updates on stdout and only saving the last line (tail -n 1 savefile) to a file? There can be anywhere between 1 and 15 tests running at the same time, and I need to be able to see that the tests are running, but I do not want to save the entire text output.
I should mention that, since the tests are running in the background, I am using a checkpid loop to wait for the tests to finish.
Also, if it helps, this is how my script is running the tests:
set runtest [exec -ignorestderr bsub -I -q lin_i make $testvar SEED=1 VPDDUMP=on |tail -n 1 >> $path0/runtestfile &]
I have found that if I use | tee it causes the checkpid loop to skip but if I do |tee it does not display output.
It's going to be better to use a simpler pipeline with explicit management of the output handling in Tcl, instead of using tail -n (and tee) to simulate it.
set pipeline($testvar) [open |[list bsub -I -q lin_i make $testvar SEED=1 VPDDUMP=on]]
fileevent $pipeline($testvar) readable [list handleInput $testvar]
fconfigure $pipeline($testvar) -blocking 0
# The callback for when something is available to be read
proc handleInput {testvar} {
    upvar ::pipeline($testvar) chan ::status($testvar) status
    if {[gets $chan line] >= 0} {
        # OK, we've got an update to the current status; stash in a variable
        set status $line
        # Echo to stdout
        puts $line
        return
    } elseif {[eof $chan]} {
        # Close the channel (not the line!) to collect the exit status
        if {[catch {close $chan}]} {
            puts "Error from pipeline for '$testvar'"
        }
        unset chan
        # I don't know if you want to do anything else on termination
        return
    }
    # Nothing to do otherwise; don't need to care about very long lines here
}
This code, plus a little vwait to enable event-based processing (assuming you're not also using Tk), will let you read from the pipeline while not preventing you from doing other things. You can even fire off multiple pipelines at once; Tcl will cope just fine. What's more, setting a write trace on the ::status array will let you monitor for changes across all of the pipelines at once.
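To make the event-loop part concrete, here is a self-contained sketch that runs three stand-in jobs at once and uses vwait until all of them finish. Each job is just a child tclsh printing three progress lines in place of the real bsub/make command; the jobScript helper, the running counter and the statusChanged trace are illustrative assumptions, and apply needs Tcl 8.5.

```tcl
# Stand-in for "bsub ... make $testvar": prints a few progress lines
proc jobScript {name steps} {
    list apply {{name steps} {
        for {set i 1} {$i <= $steps} {incr i} {
            puts "$name step $i"
        }
    }} $name $steps
}

proc handleInput {job} {
    upvar #0 pipeline($job) chan
    if {[gets $chan line] >= 0} {
        set ::status($job) $line   ;# remember the latest line
        puts $line                 ;# echo to stdout
    } elseif {[eof $chan]} {
        if {[catch {close $chan}]} {
            puts "Error from pipeline for '$job'"
        }
        unset chan
        incr ::running -1          ;# wakes up the vwait below
    }
}

# A write trace on the whole array sees updates from every pipeline
array set status {}
set updates 0
proc statusChanged {name1 job op} { incr ::updates }
trace add variable status write statusChanged

set running 0
foreach job {a b c} {
    set script [jobScript $job 3]
    set pipeline($job) [open |[list [info nameofexecutable] << $script]]
    fconfigure $pipeline($job) -blocking 0
    fileevent $pipeline($job) readable [list handleInput $job]
    incr running
}
while {$running > 0} {
    vwait running
}
```

After the loop, status holds the last line of each job, which is the value you would write to your save file.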

How to mask the sensitive information contained in a file using tcl?

I'm trying to implement a tcl script which reads a text file, masks all the sensitive information (such as passwords, ip addresses etc) contained in it and writes the output to another file.
As of now I'm just substituting this data with ** or ##### and searching the entire file with regexp to find the stuff which I need to mask. But since my text file can be 100K lines of text or more, this is turning out to be incredibly inefficient.
Are there any built in tcl functions/commands I can make use of to do this faster? Do any of the add on packages provide extra options which can help get this done?
Note: I'm using tcl 8.4 (But if there are ways to do this in newer versions of tcl, please do point me to them)
Generally speaking, you should put your code in a procedure to get best performance out of Tcl. (You have got a few more related options in 8.5 and 8.6, such as lambda terms and class methods, but they're closely related to procedures.) You should also be careful with a number of other things:
Put your expressions in braces (expr {$a + $b} instead of expr $a + $b) as that enables a much more efficient compilation strategy.
Pick your channel encodings carefully. (If you do fconfigure $chan -translation binary, that channel will transfer bytes and not characters. However, gets is not very efficient on byte-oriented channels in 8.4. Using -encoding iso8859-1 -translation lf will give most of the benefits there.)
Tcl does channel buffering quite well.
It might be worth benchmarking your code with different versions of Tcl to see which works best. Try using a tclkit build for testing if you don't want to go to the (minor) hassle of having multiple Tcl interpreters installed just for testing.
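A quick way to do that kind of benchmarking is Tcl's built-in time command. This sketch compares braced and unbraced expr inside procedures; the procedure names are made up for the example, and the actual numbers will depend on your machine and Tcl version.

```tcl
# Two procs that differ only in whether the expression is braced
proc braced {a b} {
    expr {$a + $b}
}
proc unbraced {a b} {
    expr $a + $b
}

# time runs the script N times and reports the mean per iteration
set iterations 100000
puts "braced:   [time {braced 3 4} $iterations]"
puts "unbraced: [time {unbraced 3 4} $iterations]"
```

The same pattern works for timing a whole transformation run on a sample input file under different Tcl builds.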
The idiomatic way to do line-oriented transformations would be:
proc transformFile {sourceFile targetFile RE replacement} {
# Open for reading
set fin [open $sourceFile]
fconfigure $fin -encoding iso8859-1 -translation lf
# Open for writing
set fout [open $targetFile w]
fconfigure $fout -encoding iso8859-1 -translation lf
# Iterate over the lines, applying the replacement
while {[gets $fin line] >= 0} {
regsub -- $RE $line $replacement line
puts $fout $line
}
# All done
close $fin
close $fout
}
If the file is small enough that it can all fit in memory easily, this is more efficient because the entire match-replace loop is hoisted into the C level:
proc transformFile {sourceFile targetFile RE replacement} {
    # Open for reading
    set fin [open $sourceFile]
    fconfigure $fin -encoding iso8859-1 -translation lf
    # Open for writing
    set fout [open $targetFile w]
    fconfigure $fout -encoding iso8859-1 -translation lf
    # Apply the replacement over all lines
    regsub -all -line -- $RE [read $fin] $replacement outputlines
    # The data read already ends with its own newline, so don't add another
    puts -nonewline $fout $outputlines
    # All done
    close $fin
    close $fout
}
Finally, regular expressions aren't necessarily the fastest way to do matching of strings (for example, string match is much faster, but accepts a far more restricted type of pattern). Transforming one style of replacement code to another and getting it to go really fast is not 100% trivial (REs are really flexible).
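One way to get some of that speed without giving up regular expressions is to use string match as a cheap pre-filter and run regsub only on the lines that might need masking. A sketch, assuming a hypothetical password=... format for the secrets:

```tcl
# Hypothetical masking rule: hide whatever follows "password="
proc maskLine {line} {
    # Cheap glob test first; most lines won't contain the marker
    if {[string match *password=* $line]} {
        # Only now pay for the more flexible regular expression
        regsub -all {password=\S+} $line {password=****} line
    }
    return $line
}
```

This only pays off when most lines contain none of the markers; if nearly every line needs masking, the extra string match is just overhead.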
Especially for very large files, as mentioned, it's not a good idea to read the whole file into a variable: as soon as your system runs out of memory, you can't prevent your app from crashing. For data that is separated by line breaks, the easiest solution is to read and process one line at a time.
Just to give you an example:
# Open old and new file
set old [open "input.txt" r]
set new [open "output.txt" w]
# No special configuration is needed: gets already returns one line at a time
# (-buffering controls when output is flushed, so it isn't needed here)
# Until the end of the file is reached:
while {[gets $old ln] != -1} {
    # Mask sensitive information on variable ln
    # ...
    # Write back line to new file
    puts $new $ln
}
# Close channels
close $old
close $new
I can't think of any better way to process large files in Tcl; please feel free to suggest a better solution. But Tcl was not made to process large data files. For real performance you may want to use a compiled rather than a scripted programming language.
Edit: Replaced ![eof $old] in while loop.
A file with 100K lines is not that much (unless every line is 1K chars long :) so I'd suggest you read the entire file into a var and make the substitution on that var:
set fd [open file r+]
set buf [read $fd]
set buf [regsub -all $(the-passwd-pattern) $buf ****]
# write it back
seek $fd 0; # This is not safe! See potrzebie's comment for details.
puts -nonewline $fd $buf
close $fd
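Writing back over the same channel is risky: among other things, a shorter result leaves stale bytes at the end of the file. A common alternative is to write the masked text to a separate file and then rename it over the original. A minimal sketch, assuming Tcl 8.5+ (for the result-returning form of regsub); the proc name and the .tmp suffix are illustrative choices:

```tcl
proc maskFile {path pattern replacement} {
    # Read the whole file (fine for files of modest size)
    set fd [open $path r]
    set buf [read $fd]
    close $fd

    set masked [regsub -all -- $pattern $buf $replacement]

    # Write to a sibling file, then replace the original with it
    set tmp $path.tmp
    set fd [open $tmp w]
    puts -nonewline $fd $masked
    close $fd
    file rename -force $tmp $path
}
```

If anything fails before the rename, the original file is left untouched.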

TCL: Two way communication between threads in Windows

I need two-way communication between threads in Tcl, and all I can get is one-way, with parameters passed in as my only master->helper communication channel. Here is what I have:
proc ExecProgram { command } {
    if { [catch {open "| $command" RDWR} fd ] } {
        #
        # Failed, return error indication
        #
        error "$fd"
    }
    # Hand the pipeline channel back to the caller
    return $fd
}
To call tclsh83, for example: ExecProgram "tclsh83 testCases.tcl TestCase_01"
Within the testCases.tcl file I can use that passed in information. For example:
set myTestCase [lindex $argv 0]
Within testCases.tcl I can puts out to the pipe:
puts "$myTestCase"
flush stdout
And receive that puts within the master thread by using the process ID:
gets $app line
...within a loop.
Which is not very good. And not two-way.
Anyone know of an easy 2-way communication method for tcl in Windows between 2 threads?
Here is a small example that shows how two processes can communicate. First off the child process (save this as child.tcl):
gets stdin line
puts [string toupper $line]
and then the parent process that starts the child and communicates with it:
set fd [open "| tclsh child.tcl" r+]
puts $fd "This is a test"
flush $fd
gets $fd line
puts $line
The parent uses the value returned by open to send and receive data to/from the child process; the r+ parameter to open opens the pipeline for both read and write.
The flush is required because of the buffering on the pipeline; it is possible to change this to line buffering using the fconfigure command.
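Here is a sketch of that line-buffered variant, exchanging several lines over the same pipe. To keep it self-contained, the child's loop is sent to a plain tclsh through the pipe itself instead of living in child.tcl; that trick and the use of [info nameofexecutable] are assumptions of this example.

```tcl
set fd [open |[list [info nameofexecutable]] r+]
# Flush automatically at the end of every line we write
fconfigure $fd -buffering line

# The child: echo each line back in upper case until stdin closes
puts $fd {while {[gets stdin line] >= 0} {puts [string toupper $line]; flush stdout}}

set replies {}
foreach msg {"This is a test" "second line"} {
    puts $fd $msg
    gets $fd reply
    puts $reply
    lappend replies $reply
}
close $fd
```

The child still flushes its own stdout explicitly, because its end of the pipe is fully buffered by default.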
One other point: looking at your code, you aren't using threads here; you are starting a child process. Tcl has a threading extension which does allow proper inter-thread communication.
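For genuine threads, the Thread extension provides two-way messaging: thread::send evaluates a script in another thread and returns its result, either directly or into a variable. A minimal sketch, assuming the Thread package is available; the upper command is just an example:

```tcl
package require Thread

# The worker thread defines a command and then waits for requests
set tid [thread::create {
    proc upper {s} {
        return [string toupper $s]
    }
    thread::wait
}]

# Synchronous: send a script and get its result back directly
set reply [thread::send $tid [list upper "This is a test"]]
puts $reply

# Asynchronous: the result lands in a variable via the event loop
thread::send -async $tid [list upper "second message"] asyncReply
vwait asyncReply
puts $asyncReply

thread::release $tid
```

The worker can send results or notifications back the same way with thread::send to the main thread's id (from thread::id), which gives communication in both directions.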