Tcl: Check for same content between two files

I have a program, written in Vimscript, which checks whether two files are the same. It makes a system call to diff to verify whether they differ.
I need something similar in Tcl, but without resorting to external commands or system calls. I don't need the actual differences or a detailed comparison between the files; just return 1 if both files have the same content, or 0 if the contents are different.

proc comp_file {file1 file2} {
    # Optimization: if the sizes differ, the contents can't be equal
    set equal 0
    if {[file size $file1] == [file size $file2]} {
        set fh1 [open $file1 r]
        set fh2 [open $file2 r]
        set equal [string equal [read $fh1] [read $fh2]]
        close $fh1
        close $fh2
    }
    return $equal
}
if {[comp_file /tmp/foo /tmp/bar]} {
    puts "files are equal"
}

For a straight binary comparison, you can just work a chunk at a time. (4kB is probably quite enough per chunk though you can pick larger values; I/O overhead will dominate in any case.) The simplest way to express this is with a loop inside a try…finally (requires Tcl 8.6):
proc sameContent {file1 file2} {
    set f1 [open $file1 "rb"]
    set f2 [open $file2 "rb"]
    try {
        while 1 {
            if {[read $f1 4096] ne [read $f2 4096]} {
                return 0
            } elseif {[eof $f1]} {
                # The same if we got to EOF at the same time
                return [eof $f2]
            } elseif {[eof $f2]} {
                return 0
            }
        }
    } finally {
        close $f1
        close $f2
    }
}
Otherwise, we can take advantage of being able to check whether a variable has been set; that keeps the logic fairly simple (though the result is quite a lot less clear) and makes code that works in older versions of Tcl:
proc sameContent {file1 file2} {
    set f1 [open $file1]
    fconfigure $f1 -translation binary
    set f2 [open $file2]
    fconfigure $f2 -translation binary
    while {![info exists same]} {
        if {[read $f1 4096] ne [read $f2 4096]} {
            set same 0
        } elseif {[eof $f1]} {
            # The same if we got to EOF at the same time
            set same [eof $f2]
        } elseif {[eof $f2]} {
            set same 0
        }
    }
    close $f1
    close $f2
    return $same
}
Both are invoked in the same way:
if {[sameContent "./foo.txt" "some/dir/bar.txt"]} {
    puts "They're the same contents, byte-for-byte"
} else {
    puts "A difference was found"
}
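The two ideas compose naturally, if you want that. Here is a sketch (the name sameContentFast is mine, not from the answers above; it assumes Tcl 8.6 for try) that rejects on size first and only reads when the sizes match:
proc sameContentFast {file1 file2} {
    # Files of different sizes can never have equal content
    if {[file size $file1] != [file size $file2]} {
        return 0
    }
    set f1 [open $file1 "rb"]
    set f2 [open $file2 "rb"]
    try {
        while 1 {
            if {[read $f1 4096] ne [read $f2 4096]} {
                return 0
            } elseif {[eof $f1]} {
                # Sizes are equal, so EOF on one implies EOF on the other
                return 1
            }
        }
    } finally {
        close $f1
        close $f2
    }
}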

Related

Reading Hebrew in JSON via native-messaging API and writing to file but Hebrew looks wrong in the file?

I'm using this code to read messages received from a browser extension via the native-messaging API. Fortunately, it works and reads all the data. Unfortunately, I'm doing something wrong or failing to do something right (if there is a difference there) and when the data is written to the file, it appears that the multi-byte characters (Hebrew, at the moment) are being written as individual bytes.
Would you please tell me what needs to be done before writing [dict get $msg doc] to the file? $msg is a big string of HTML. Am I configuring stdin incorrectly or does the data being written to file need to be encoded?
Instead of seeing Hebrew, the file shows items like this ×”Ö¸×ָֽרֶץ×.
According to the MDN documentation, the incoming message format is:
On the application side, you use standard input to receive messages
and standard output to send them.
Each message is serialized using JSON, UTF-8 encoded and is preceded
with an unsigned 32-bit value containing the message length in native
byte order.
Thank you.
#!/usr/bin/env tclsh
package require json

proc coread {reqBytes} {
    set ::extMsg {}
    set remBytes $reqBytes
    while {![chan eof stdin]} {
        yield
        append ::extMsg [set data [read stdin $remBytes]]
        set remBytes [expr {$remBytes - [string length $data]}]
        if {$remBytes == 0} {
            return $reqBytes
        }
    }
    throw {COREAD EOF} "Unexpected EOF"
}

proc ExtRead {} {
    chan event stdin readable coro_stdin
    while {1} {
        set ::extMsg {}
        chan puts $::fp_log "starting coread 4"
        if {[coread 4] != 4 || [binary scan $::extMsg iu len] != 1} {
            exit
        }
        if {[coread $len] != $len} {
            exit
        }
        set msg [::json::json2dict $::extMsg]
        set book [dict get $msg book]
        set path "../${book}.html"
        set fp [open $path w]
        chan puts $fp [dict get $msg doc]
        close $fp
    }
}

proc CloseConnect {} {
    set ::forever 0
}

proc Listen {} {
    # Listening on stdin
    set ::forever 1
    chan configure stdin -buffering full -blocking 0 -encoding iso8859-1 -translation crlf
    coroutine coro_stdin ExtRead
    vwait forever
}

set extMsg {}
Listen
I don't really want to answer my own question, but this is too much for a comment; I'll be happy to delete it if someone provides a better answer.
As @Abra pointed out, I had the encoding configured wrong on stdin. I thought all that needed doing then was to use [string bytelength $data] instead of [string length $data], but that only worked for a few iterations. I do not understand why [coread 4] ended up reading more than 4 bytes, leaving $remBytes negative; it resulted in misreading the data, and the coroutine waited for far more data than there was. So I moved the configuring of stdin into proc ExtRead and switched the encoding for each part of the read: binary for the 4 bytes holding the message length, and utf-8 for the message itself. I also had to use [string length $data] for the binary read of 4 bytes and [string bytelength $data] for reading the message. With that, $remBytes always equals zero on completing both the length and the message, so if {$remBytes == 0} works, rather than needing the change mentioned earlier in my comment, if {$remBytes <= 0}.
It works and the Hebrew now appears correctly. But it bothers me that I don't understand why [string bytelength $data] would not be 4 when reading 4 bytes. I wrote the values out to a log/debugging file at various steps to verify this.
I should add here that I had a similar question in the past, which @Schelte Bron answered, but I don't think I followed his advice correctly, or I unknowingly reverted to an old version of my code. He recommended:
The solution is to configure stdin to binary and then apply the utf-8
decoding later:
I don't know why I didn't start with that but he probably provided the answer three months ago.
#!/usr/bin/env tclsh
package require json

proc coread {reqBytes readType} {
    set ::extMsg {}
    set remBytes $reqBytes
    while {![chan eof stdin]} {
        yield
        append ::extMsg [set data [read stdin $remBytes]]
        if {$readType} {
            set remBytes [expr {$remBytes - [string bytelength $data]}]
        } else {
            set remBytes [expr {$remBytes - [string length $data]}]
        }
        if {$remBytes == 0} {
            return $reqBytes
        }
    }
    throw {COREAD EOF} "Unexpected EOF"
}

proc ExtRead {} {
    chan event stdin readable coro_stdin
    while {1} {
        chan configure stdin -buffering full -blocking 0 -encoding binary
        set ::extMsg {}
        if {[coread 4 0] != 4 || [binary scan $::extMsg iu len] != 1} {
            exit
        }
        chan configure stdin -buffering full -blocking 0 -encoding utf-8 -translation crlf
        if {[coread $len 1] != $len} {
            exit
        }
        set msg [::json::json2dict $::extMsg]
        set book [dict get $msg book]
        set path "../${book}.html"
        set fp [open $path w]
        chan puts $fp [dict get $msg doc]
        close $fp
    }
}

proc CloseConnect {} {
    set ::forever 0
}

proc Listen {} {
    # Listening on stdin
    set ::forever 1
    coroutine coro_stdin ExtRead
    vwait forever
}

set extMsg {}
Listen
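For completeness, here is a minimal sketch of the approach Schelte Bron described: keep stdin binary the whole time and decode only the message payload with encoding convertfrom. (This is a blocking illustration of my own, not the asker's event-driven code; the proc name is made up. On a binary channel, [string length] counts bytes, which is why the length bookkeeping becomes simple.)
chan configure stdin -translation binary

proc readMessage {} {
    # 4-byte unsigned length prefix, little-endian as in the code above
    set lenBytes [read stdin 4]
    if {[string length $lenBytes] != 4} {
        throw {COREAD EOF} "Unexpected EOF"
    }
    binary scan $lenBytes iu len
    # Read exactly $len *bytes*, then decode them as UTF-8
    set payload [read stdin $len]
    return [encoding convertfrom utf-8 $payload]
}

set msg [::json::json2dict [readMessage]]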

Unable to match string in an if loop in Tcl

I am trying to check for a sentence in a file. I have this so far, but it always prints "one"; the expected output is "zero". Is there a problem with my regex?
File contents:
This is the header.
Test is a pass.
This is the footer.
Code is below:
set file [open "test.report" r]
while {[gets $file line] != -1} {
    if {[regexp {\s+Test is a pass} $line]} {
        puts "zero"
    } else {
        puts "one"
    }
}
close $file
If you just want to check whether a string is present, and you're not searching through too large a file (100MB isn't “too large” in this context, but 1GB is getting close), then you can just load the whole file in at once and use a line-aware regexp-matching mode. (Incidentally, your per-line version always prints "one" because gets strips the trailing newline, so there is no whitespace left on the line for \s+ to match before "Test"; when the whole file is read at once, \s+ can match the preceding newline.)
set f [open "thefile.report"]
set data [read $f]
close $f

if {[regexp -line {\s+Test is a pass} $data]} {
    puts "The test passed"
} else {
    puts "The test did not pass"
}
You'll still need to think carefully about what to actually search for. In particular, if the file contains ANSI colour codes then your test needs to match them too.
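For instance, here is a rough sketch of stripping colour codes before matching; the pattern is an assumption of mine and covers only the usual SGR sequences (ESC [ ... m):
# Remove ANSI SGR escape sequences so they can't break up the match
set data [regsub -all {\x1b\[[0-9;]*m} $data ""]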
For large files, processing a line at a time is right, but then you need to get the semantics right. It's easiest to write a helper procedure for this, and try…finally… helps a lot too:
proc isMatching {filename regular_expression} {
    set f [open $filename]
    try {
        while {[gets $f line] >= 0} {
            if {[regexp -- $regular_expression $line]} {
                return true
            }
        }
        return false
    } finally {
        close $f
    }
}

if {[isMatching "thefile.report" {\s+Test is a pass}]} {
    puts "The test passed"
} else {
    puts "The test did not pass"
}
If you have files with single lines over 1GB long, then you've got something truly horrible. It's possible to build processing systems to cope with this, but it's nasty and requires trickier techniques. Ask again if you're unlucky enough to be stuck with this…
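If you are stuck with it, here is a rough sketch of one such technique: read fixed-size chunks and carry an overlap between them so a match can't be lost at a chunk boundary. (The proc name, chunk size, and overlap length are all my assumptions; the overlap must be at least one less than the longest possible match.)
proc chunkSearch {filename pattern {chunkSize 1048576} {overlap 256}} {
    set f [open $filename]
    fconfigure $f -translation binary
    set carry ""
    try {
        while {![eof $f]} {
            # Prepend the tail of the previous chunk before searching
            set chunk $carry[read $f $chunkSize]
            if {[regexp -- $pattern $chunk]} {
                return true
            }
            set carry [string range $chunk end-[expr {$overlap - 1}] end]
        }
        return false
    } finally {
        close $f
    }
}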

How to print newly updated lines in tcl

I need to print the lines that are newly added to a file.
My code looks as follows:
proc dd {} {
    global line_number
    set line_number 0
    set a [open "pkg.v" r]
    #global count
    while {[gets $a line] >= 0} {
        incr line_number
        global count
        set count [.fr.lst2 size]
        puts "enter $count"
        if {[eof $a]} {
            #.fr.lst2 insert end "$line"
            # set count [.fr.lst2 size]
            close $a
        } elseif {$count > 0} {
            .fr.lst2 delete 0 end
            if {$count+1} {
                .fr.lst2 insert end "$line"
                puts "i am $count"
            }
        } else {
            .fr.lst2 insert end "$line"
            puts "i am not"
        }
    }
    puts "$count"
}
Assuming we're talking about lines written to the end of a log file on any Unix system (Linux, OSX, etc.) then it's trivially done with the help of tail:
# Make the pipeline to read from 'tail -f'; easy stuff!
set mypipe [open |[list tail -f $theLogfile]]

# Make the pipe non-blocking; usually a good idea for anything advanced
fconfigure $mypipe -blocking 0

# Handle data being available by calling a procedure which will read it.
# The procedure takes two arguments, and we use [list] to build the callback
# script itself (Good Practice in Tcl coding)
fileevent $mypipe readable [list processLine $mypipe .fr.lst2]

proc processLine {pipeline widget} {
    if {[gets $pipeline line] >= 0} {
        # This is probably too simplistic for you; adapt as necessary
        $widget insert end $line
    } elseif {[eof $pipeline]} { # Check for EOF *after* [gets] fails!
        close $pipeline
    }
}

What is a good approach for continuous searching for a fresh term of a pattern in a large file?

The AUT creates logs for a particular function run and appends them to a central file.
The line to search for in this file is:
LatestTimeStamp>MyFunction SomeStep timeLapsed SOME_TIME_VALUE
Every time the log is generated by the AUT, multiple fresh logs of a similar pattern are appended as above, and it's required to extract these fresh logs.
The simple approach I am using is:
class structure
itcl::class clsLogs {
    variable _oldTimeStamp ""
    variable _logRec
    variable _runCtr 0

    method _extractInfoForRun {runType} {
        #read log
        catch {close $fp}
        set log [read [set fp [open [file join [file normalize $env(APPDATA)] Logs Action.log]]]]
        #garbage everything before old time stamp and collect all fresh log
        if {[info exists _oldTimeStamp] && $_oldTimeStamp != ""} {
            regsub [subst -nobackslashes -nocommands {.*$_oldTimeStamp[^\n]*\n}] [set freshLog $log] "" freshLog
        }
        #increment run counter for this run
        incr _runCtr
        #get all fresh entry lines for reporting timeLapsed for different steps of MyFunction in this run
        set freshEntries [regexp -inline -all [subst -nocommands -nobackslashes {[^\n]*MyFunction[^\n]*timeLapsed[^\n]*}] $freshLog]
        #iterate and collect time lapsed info for each step of MyFunction for this run
        foreach ent $freshEntries {
            regexp {(.*?)>.*>>MyFunction\s+(.*)\s+timeLapsed\s+(.*)$} $ent -> timeStamp runStep lapsedTime
            puts ************runTyp>$runTyp***********\n\t$ent\n\ttimeStamp->$timeStamp\nlapsedTime->$lapsedTime
            set _logRec(MyFunction_Run-$_runCtr:$runStep,lapsedTime) $lapsedTime
        }
        #reset old time stamp variable for next run
        set _oldTimeStamp $timeStamp
    }
}
But this file could be huge, and storing everything in one read output variable could exhaust memory:
set log [read [set fp [open [file join [file normalize $env(APPDATA)] Logs Action.log]]]]
Is it somehow possible to get the current position of the file pointer, use it to seek to the last read position, and then start each read from there?
What are the Tcl command options for this?
So this does it:
seek [set fp [open $file]] $_fOffset ;# jump to where the last read stopped
set txt [read $fp]                   ;# read everything appended since
set _fOffset [tell $fp]              ;# remember the new end position
In context:
::itcl::class clsLogs {
    private {
        variable _fOffset 0
    }
    public {
        method _fFreshRead {file args} {
            set options(-resetOffSet) false
            array set options $args
            if {$options(-resetOffSet)} {
                set _fOffset 0
            }
            seek [set fp [open $file]] $_fOffset
            set txt [read $fp]
            set _fOffset [tell $fp]
            close $fp
            return $txt
        }
    }
}
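A hypothetical usage sketch (the object and file names are illustrative, and it assumes package require Itcl has already been done):
clsLogs logs
# First read: start from the beginning of the file
set fresh [logs _fFreshRead Action.log -resetOffSet true]
# Later reads pick up only what was appended since
set fresh [logs _fFreshRead Action.log]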

Splitting a huge file into smaller files

How can I split a huge file into n number of smaller files using Tcl? The file name to split and number of files to be created have to be given through command line. Here is what I have so far:
proc splitter {file no} {
    set lnum 0
    set file_open [open $file r]
    while {[gets $file_open line] >= 0} {
        incr lnum
    }
    puts "$lnum"
    set num [expr {$lnum / $no}]
    close $file_open
}
Here is one way to split text files, which has the advantage of not holding much in memory at once. (You can also split binary files, but then you need to use read instead of gets, and also to consider whether there are record boundaries in the data; text is mostly simpler.)
#!/usr/bin/env tclsh8.5
proc splitter {filename fileCount} {
    set targetFileSize [expr {[file size $filename] / $fileCount}]
    set n 0
    set fin [open $filename]
    while {[gets $fin line] >= 0} {
        if {![info exists fout]} {
            set fout [open $filename.split_[incr n] w]
        }
        puts $fout $line
        if {[tell $fout] > $targetFileSize} {
            close $fout
            unset fout
        }
    }
    if {[info exists fout]} {
        close $fout
    }
    close $fin
}
splitter {*}$argv; # Connect to outside command line
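As an aside, if you really did need to split a binary file, here is a rough sketch; it ignores record boundaries entirely, which is usually wrong for structured data, and the proc name is mine:
proc splitBinary {filename fileCount} {
    # Round up so integer division doesn't create an extra file
    set targetSize [expr {[file size $filename] / $fileCount + 1}]
    set fin [open $filename rb]
    set n 0
    while {![eof $fin]} {
        set data [read $fin $targetSize]
        if {$data eq ""} break
        set fout [open $filename.split_[incr n] wb]
        puts -nonewline $fout $data
        close $fout
    }
    close $fin
}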
Other suggestions:
- Use the global argv variable to access the command-line parameters.
- After you read the file to count the lines, instead of closing and reopening the file handle, you can seek back to the top of the file (see the sketch below).
- If you're on *nix, have you considered using exec to call out to split?
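A minimal sketch of that seek-back idea (variable names follow the question):
set file_open [open $file r]
set lnum 0
while {[gets $file_open line] >= 0} {
    incr lnum
}
# Rewind to the start instead of closing and reopening
seek $file_open 0 start
# ... second pass: write roughly $lnum / $no lines to each output file ...
close $file_open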