I have the following expression in a txt file: jj_ftfll h\\h\ -0.8898:0.006656 0.998:0.99999 h&j\hhh
and I need to add 0.005 to the number 0.006656. I want to use Tcl, but I can't think of a good way to do it.
There are several aspects that are tricky.
1. The file needs to be edited in place even though the addition might change the length of the line. (An addition can either lengthen or shorten the line.)
2. There needs to be a way of robustly recognising that that is the line to modify, and not some other line in the file. (This is actually the hardest of these problems in reality; it's extremely application-specific.)
3. The number needs to be extracted from the line, modified, and written back.
4. The values you are dealing with are potentially (well, actually) not represented precisely in IEEE binary floating point, which is what Tcl will use to do the calculations.
Bearing all that in mind, these are the corresponding solutions:
1. We'll read the whole file in, split it into a list of strings, one string per line (henceforth referred to as the lines), update the lines of interest, and then write the whole lot back.
2. We'll use regexp to decide if a line is of interest. That's by far the most common command for this sort of task.
3. Extracting, modifying and writing back the number is messy in Tcl 8.6 and before. It has a much better solution in Tcl 8.7.
4. There's really not all that much you can do about the floating-point issue. If you know the range of the numbers, you can use format to help… but it's messy. But maybe you'll get lucky.
set filename "foobar.txt"
# Get the lines of the file; this is GREAT if the file isn't too large
set f [open $filename]
set lines [split [read $f] "\n"]
close $f
# Now THAT'S what I call a horrible regular expression!
# (The backslash before hhh is written as \\ so that it matches a literal backslash.)
set RE {^(jj_ftfll\s+h\\\\h\\\s+-?[\d.]+:)(-?[\d.]+)(\s+-?[\d.]+:-?[\d.]+\s+h&j\\hhh)$}
set newLines {}
foreach line $lines {
    if {[regexp $RE $line -> prefix number suffix]} {
        set line $prefix[expr {$number + 0.005}]$suffix
    }
    lappend newLines $line
}
# Write back over the file; the -nonewline prevents the number of lines from growing
set f [open $filename w]
puts -nonewline $f [join $newLines "\n"]
close $f
The trick with the regexp is that I am matching three pieces: the bit of the line before the part to replace (saved in the variable prefix), the number to replace itself (number), and the bit after the part to replace (suffix); the regexp command returns the number of times it matches (1 if the RE is found, 0 if it isn't). It's a scary RE mostly because it has -?[\d.]+ to match those floating point numbers, and I've changed spaces to \s+ (i.e., “at least one whitespace character”).
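To see the three captured pieces on the sample line from the question, here is a minimal sketch. It assumes the line is exactly as quoted in the question, writes the backslash before hhh as \\ so that it matches a literal backslash, and uses format with six decimal places (an assumed precision, not something the question specifies) to keep binary floating-point noise out of the result.

```tcl
# Sample line from the question; braces keep the backslashes literal
set line {jj_ftfll h\\h\ -0.8898:0.006656 0.998:0.99999 h&j\hhh}
set RE {^(jj_ftfll\s+h\\\\h\\\s+-?[\d.]+:)(-?[\d.]+)(\s+-?[\d.]+:-?[\d.]+\s+h&j\\hhh)$}

if {[regexp $RE $line -> prefix number suffix]} {
    # %.6f is an assumed precision; it rounds away binary representation noise
    set updated $prefix[format %.6f [expr {$number + 0.005}]]$suffix
    puts $updated
}
```

With the sample line, the second number on the printed line becomes 0.011656 and everything else is unchanged.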
The version for 8.7 is this:
set filename "foobar.txt"
# Get the lines of the file; this is GREAT if the file isn't too large
set f [open $filename]
set lines [split [read $f] "\n"]
close $f
# Now THAT'S what I call a horrible regular expression!
# (The backslash before hhh is written as \\ so that it matches a literal backslash.)
set RE {^(jj_ftfll\s+h\\\\h\\\s+-?[\d.]+:)(-?[\d.]+)(\s+-?[\d.]+:-?[\d.]+\s+h&j\\hhh)$}
# regsub -command appends the whole match first, then each captured group
proc addDeltaInLine {delta match prefix number suffix} {
    set number [expr {$number + $delta}]
    return [string cat $prefix $number $suffix]
}
set newLines [lmap line $lines {
    regsub -command $RE $line {addDeltaInLine 0.005}
}]
# Write back over the file; the -nonewline prevents the number of lines from growing
set f [open $filename w]
puts -nonewline $f [join $newLines "\n"]
close $f
The combination of lmap and regsub -command cleans things up quite a bit. The RE is still scary though…
Related
I am new to Tcl programming. I am reading a CSV file, but the rows can contain a space, and Tcl splits on spaces. How do I avoid that default behavior?
My CSV is:
1,fname,lname 1
2,fname,lname 2
The split works, but when I output [lindex ${line} 2] I expect to get lname 1. Since Tcl splits on spaces, how do I overcome that issue?
foreach row $data {
    set line [split ${row} ","]
    puts [lindex ${line} 0]
}
You almost have the answer right there. When doing simple CSV reading, you first split by newline to get the records, and then split by comma to get the fields in a record.
foreach row [split $data "\n"] {
    set line [split $row ","]
    puts [lindex $line 0]
}
In the complex case (once you start having fields with embedded commas and newlines and so on) you use the csv package from Tcllib, as that handles the nuances for you. In particular csv::read2matrix is helpful.
And if you can, use a character other than comma to separate fields. Tabs are a commonly recommended choice; that gives you tab-separated files, which are very widely supported and usually interoperate without trouble.
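As a rough illustration of the simple and the complex case, the sketch below first splits one of the question's rows on commas only, then uses csv::split from Tcllib's csv package on a row with a quoted field. It assumes Tcllib is installed; the second sample row is made up for the example.

```tcl
package require csv   ;# from Tcllib; assumed to be installed

# Plain split: only commas separate fields, so the space inside a field survives
set row "1,fname,lname 1"
puts [lindex [split $row ","] 2]          ;# prints: lname 1

# csv::split also copes with quoted fields that contain commas
# (this row is a made-up example, not from the question)
set tricky {2,"Smith, John",lname 2}
puts [lindex [csv::split $tricky] 1]      ;# prints: Smith, John
```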
I want to write a Tcl script that reads each line of a file, say abc.txt. Each line of abc.txt is the location of a file that needs to be picked up, except for the lines that are commented out.
For example abc.txt has
./pvr.vhd
./pvr1.vhd
// ./pvr2.vhd
So I need to read each line of abc.txt, pick up the file from the location it mentions, and store it in a separate file, except for the lines which start with "//".
Any hint or script will be deeply appreciated.
The usual way of doing this is to put a filter at the start of the line-processing loop so that commented lines are skipped. You can use string match to detect whether a line should be filtered out.
set f [open "abc.txt"]
set lines [split [read $f] "\n"]
close $f
foreach line $lines {
    if {[string match "//*" $line]} {
        continue
    }
    # ... do your existing processing here ...
}
This also works just as well when used with a streaming loop (while {[gets $f line] >= 0} {…}).
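A minimal sketch of the streaming variant follows. The procedure name activeEntries is made up for illustration, and trimming whitespace and skipping blank lines are assumptions that go slightly beyond what the question asks for.

```tcl
# Return the non-comment entries of a list file, reading it line by line.
# activeEntries is a hypothetical helper name.
proc activeEntries {listFile} {
    set entries {}
    set f [open $listFile]
    while {[gets $f line] >= 0} {
        # Assumption: surrounding whitespace and blank lines are not significant
        set line [string trim $line]
        if {$line eq "" || [string match "//*" $line]} {
            continue
        }
        lappend entries $line
    }
    close $f
    return $entries
}
```

Calling activeEntries abc.txt on the example file from the question would give a list containing ./pvr.vhd and ./pvr1.vhd, which you can then loop over to copy or collect the files.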
I'm trying to implement a Tcl script which reads a text file, masks all the sensitive information (such as passwords, IP addresses etc.) contained in it, and writes the output to another file.
As of now I'm just substituting this data with ** or ##### and searching the entire file with regexp to find the stuff which I need to mask. But since my text file can be 100K lines of text or more, this is turning out to be incredibly inefficient.
Are there any built in tcl functions/commands I can make use of to do this faster? Do any of the add on packages provide extra options which can help get this done?
Note: I'm using tcl 8.4 (But if there are ways to do this in newer versions of tcl, please do point me to them)
Generally speaking, you should put your code in a procedure to get best performance out of Tcl. (You have got a few more related options in 8.5 and 8.6, such as lambda terms and class methods, but they're closely related to procedures.) You should also be careful with a number of other things:
Put your expressions in braces (expr {$a + $b} instead of expr $a + $b) as that enables a much more efficient compilation strategy.
Pick your channel encodings carefully. (If you do fconfigure $chan -translation binary, that channel will transfer bytes and not characters. However, gets is not very efficient on byte-oriented channels in 8.4. Using -encoding iso8859-1 -translation lf will give most of the benefits there.)
Tcl does channel buffering quite well.
It might be worth benchmarking your code with different versions of Tcl to see which works best. Try using a tclkit build for testing if you don't want to go to the (minor) hassle of having multiple Tcl interpreters installed just for testing.
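One way to check the advice about braced expressions on your own interpreter is the built-in time command. This is only a sketch: the two procedure names and the iteration count are arbitrary choices, and the timings will vary between Tcl versions and machines.

```tcl
# Two otherwise identical procedures; only the bracing of expr differs.
# bracedSum and unbracedSum are hypothetical names for this comparison.
proc bracedSum {a b} {
    return [expr {$a + $b}]
}
proc unbracedSum {a b} {
    return [expr $a + $b]
}

# time reports the average number of microseconds per iteration
puts "braced:   [time {bracedSum 1.5 2.5} 100000]"
puts "unbraced: [time {unbracedSum 1.5 2.5} 100000]"
```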
The idiomatic way to do line-oriented transformations would be:
proc transformFile {sourceFile targetFile RE replacement} {
    # Open for reading
    set fin [open $sourceFile]
    fconfigure $fin -encoding iso8859-1 -translation lf
    # Open for writing
    set fout [open $targetFile w]
    fconfigure $fout -encoding iso8859-1 -translation lf
    # Iterate over the lines, applying the replacement to every match on the line
    while {[gets $fin line] >= 0} {
        regsub -all -- $RE $line $replacement line
        puts $fout $line
    }
    # All done
    close $fin
    close $fout
}
If the file is small enough that it can all fit in memory easily, this is more efficient because the entire match-replace loop is hoisted into the C level:
proc transformFile {sourceFile targetFile RE replacement} {
    # Open for reading
    set fin [open $sourceFile]
    fconfigure $fin -encoding iso8859-1 -translation lf
    # Open for writing
    set fout [open $targetFile w]
    fconfigure $fout -encoding iso8859-1 -translation lf
    # Apply the replacement over all lines
    regsub -all -line -- $RE [read $fin] $replacement outputlines
    # The data already ends with the file's own newline, so don't add another
    puts -nonewline $fout $outputlines
    # All done
    close $fin
    close $fout
}
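To give an idea of what the RE and replacement arguments might look like, here is a small sketch that runs the same regsub -all -line call on an in-memory string. The pattern is an assumption for illustration: it loosely matches dotted quads and does not check that each number is in the range 0 to 255.

```tcl
# Loose IPv4-like pattern: four groups of 1-3 digits separated by dots.
# \m and \M anchor the match at word boundaries. This is illustrative only.
set ipRE {\m\d{1,3}(?:\.\d{1,3}){3}\M}

set text "login from 10.0.0.7 ok\nno address here\nrelay 192.168.1.20 -> 10.0.0.7\n"

# Same call shape as in the procedure, applied to a string instead of a file
regsub -all -line -- $ipRE $text {#.#.#.#} masked
puts -nonewline $masked
```

Every address in the sample text is replaced by #.#.#.#, including both addresses on the last line.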
Finally, regular expressions aren't necessarily the fastest way to do matching of strings (for example, string match is much faster, but accepts a far more restricted type of pattern). Transforming one style of replacement code to another and getting it to go really fast is not 100% trivial (REs are really flexible).
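One way to combine the two, sketched below, is to use string match as a cheap pre-filter so that only lines that can possibly contain a match are handed to regsub. The password= key and the procedure name are assumptions for the example; whether this actually helps depends on your data, so it is worth timing.

```tcl
# maskPasswords is a hypothetical helper; "password=" is an assumed key name.
proc maskPasswords {line} {
    # Cheap glob test first; only lines that pass go on to the regexp engine
    if {[string match "*password=*" $line]} {
        regsub -all -- {password=\S+} $line {password=****} line
    }
    return $line
}

puts [maskPasswords "user=bob password=hunter2 ok"]   ;# prints: user=bob password=**** ok
puts [maskPasswords "nothing sensitive here"]         ;# prints: nothing sensitive here
```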
Especially for very large files, as mentioned, reading the whole file into a variable is not the best approach. Once your system runs out of memory, you can't prevent your app from crashing. For data that is separated by line breaks, the easiest solution is to buffer one line at a time and process it.
Just to give you an example:
# Open old and new file
set old [open "input.txt" r]
set new [open "output.txt" w]
# Configure input channel to provide data separated by line breaks
fconfigure $old -buffering line
# Until the end of the file is reached:
while {[gets $old ln] != -1} {
    # Mask sensitive information on variable ln
    ...
    # Write back line to new file
    puts $new $ln
}
# Close channels
close $old
close $new
I can't think of any better way to process large files in Tcl; please feel free to tell me about a better solution. But Tcl was not made to process large data files. For real performance you may want to use a compiled language instead of a scripting language.
Edit: Replaced ![eof $old] in while loop.
A file with 100K lines is not that much (unless every line is 1K chars long :) so I'd suggest you read the entire file into a var and make the substitution on that var:
set fd [open file r+]
set buf [read $fd]
# $passwdPattern is a placeholder; set it to your own password-matching RE
set buf [regsub -all -- $passwdPattern $buf ****]
# write it back
seek $fd 0; # This is not safe! See potrzebie's comment for details.
puts -nonewline $fd $buf
close $fd
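A more cautious variant, sketched here under the assumption that you are allowed to create a temporary file next to the original, writes the result to a new file and then renames it over the old one. That way the original is never left half-written, and a shorter result cannot leave stale data at the end. The procedure name and the .tmp suffix are made up for the example.

```tcl
# rewriteFile is a hypothetical helper; the ".tmp" suffix is an assumption.
proc rewriteFile {path RE replacement} {
    # Read the whole file
    set fd [open $path r]
    set buf [read $fd]
    close $fd

    # Substitute in memory
    regsub -all -- $RE $buf $replacement buf

    # Write to a temporary file, then replace the original in one step
    set tmp $path.tmp
    set fd [open $tmp w]
    puts -nonewline $fd $buf
    close $fd
    file rename -force -- $tmp $path
}
```

A call such as rewriteFile secrets.txt {pw=\S+} {pw=****} would mask every pw= value in a file named secrets.txt (again, a placeholder name).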
How can I extract a word inside a double quote inside a file?
e.g.
variable "xxx"
Reading a text file into Tcl is just this:
set fd [open $filename]
set data [read $fd] ;# Now $data is the entire contents of the file
close $fd
To get the first quoted string (under some assumptions, notably a lack of backslashed double-quote characters inside the double quotes), use this:
if {[regexp {"([^""]*)"} $data -> substring]} {
    # We found one, it's now in $substring
}
(Doubling up the quote in the brackets is totally unnecessary — only one is needed — but it does mean that the highlighter does the right thing here.)
The simplest method of finding all the quoted strings is this:
foreach {- substring} [regexp -inline -all {"([^""]*)"} $data] {
    # One of the substrings is $substring at this point
}
Notice that I'm using the same regular expression in each case. Indeed, it's actually good practice to factor such REs (especially if repeatedly used) into a variable of their own so that you can “name” them.
Combining all that stuff above:
set FindQuoted {"([^""]*)"}
set fd [open $filename]
foreach {- substring} [regexp -inline -all $FindQuoted [read $fd]] {
    puts "I have found $substring for you"
}
close $fd
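The same loop can be tried on an in-memory string, as in the sketch below. The sample data is made up. The second pattern is an extension that is not part of the answer above: it is one possible way to lift the "no backslashed quotes" assumption by allowing backslash-escaped characters inside the quotes.

```tcl
# Single-quote form of the pattern (the doubled quote is not needed)
set FindQuoted {"([^"]*)"}
set data "variable \"xxx\"\nother \"yyy\" and \"zzz\"\n"

set found {}
foreach {- substring} [regexp -inline -all $FindQuoted $data] {
    lappend found $substring
}
puts $found   ;# prints: xxx yyy zzz

# Assumed extension: accept either a non-quote, non-backslash character
# or a backslash followed by any character inside the quotes
set FindQuotedEsc {"((?:[^"\\]|\\.)*)"}
set escaped {say "a \"b\" c" end}
regexp $FindQuotedEsc $escaped -> inner
puts $inner   ;# prints: a \"b\" c
```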
Internal Matching
If you're just looking for a regular expression, then you can use Tcl's capture groups. For example:
set string {variable "xxx"}
regexp {"(.*)"} $string match group1
puts $group1
This will print xxx, discarding the quotes.
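One caveat with .* is that it is greedy: if a line holds more than one quoted string, the capture runs from the first quote to the last. The sketch below, using a made-up sample string, contrasts it with a negated character class that stops at the next quote.

```tcl
# Made-up sample with two quoted strings on one line
set string {a "x" b "y"}

# Greedy: captures everything between the first and the last quote
regexp {"(.*)"} $string -> greedy
puts $greedy      ;# prints: x" b "y

# Negated class: stops at the first closing quote
regexp {"([^"]*)"} $string -> first
puts $first       ;# prints: x
```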
External Matching
If you want to match data in a file without having to handle reading the file into Tcl directly, you can do that too. For example:
set match [exec sed {s/^variable "\(...\)"/\1/} /tmp/foo]
This will call sed to find just the parts of the match you want, and assign them to a Tcl variable for further processing. In this example, the match variable is set to xxx as above, but the command operates on an external file rather than a stored string.
When you just want to use grep to find all the quoted words in a file and do something with them, you can do something like this (in a shell):
# file.txt is a placeholder for your input file
grep -o '"[^"]*"' file.txt | while read -r word
do
    # do something with $word
    echo "extracted: $word"
done
Hello, I was wondering if it's possible to read the last line of a real-time logfile with eggdrop and a .tcl script. I'm able to read the first part of the logfile, but that's it; it doesn't read any more of it.
Is it possible to put an upper bound on the length of a line of the logfile? If so, it's pretty easy to get the last line:
# A nice fat upper bound!
set upperBoundLength 1024
# Open the log file
set f [open $logfile r]
# Go to some distance from the end; catch because don't care about errors here
catch {seek $f -$upperBoundLength end}
# Read to end, stripping trailing newline
set data [read -nonewline $f]
# Hygiene: close the logfile
close $f
# Get the last line
set lastline [lindex [split $data "\n"] end]
Note that it's not really necessary to do the seek; it just saves you from having to read the vast majority of the file which you presumably don't want.
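If the goal is to keep picking up whatever has been appended since the last check, one possible approach (not part of the answer above) is to remember the file offset between calls with tell and seek back to it next time. The sketch below assumes the writer always appends complete lines; the procedure name and the offset variable are made up for the example.

```tcl
# readNewLines is a hypothetical helper. offsetVar names a variable in the
# caller that holds the position reached by the previous call.
proc readNewLines {logfile offsetVar} {
    upvar 1 $offsetVar offset
    if {![info exists offset]} {
        set offset 0
    }
    # If the file shrank (for example after log rotation), start over
    if {[file size $logfile] < $offset} {
        set offset 0
    }
    set f [open $logfile r]
    seek $f $offset start
    set data [read $f]
    set offset [tell $f]
    close $f
    # Assumption: the data ends on a line boundary
    return [split [string trimright $data "\n"] "\n"]
}
```

Called repeatedly with the same offset variable (for instance from a timer), each call returns only the lines added since the previous call, or an empty list if nothing was appended.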