I have just started learning tcl and I have a problem with reading a big file.
I have a data file which looks like the following:
420
360 360 360 3434.01913
P 6.9022 0.781399 -0.86106
C 4.36397 -0.627479 3.83363
P 6.90481 5.42772 3.08491
....
and ends like this:
P -7.21325 1.71285 -0.127325
C -4.14243 0.41123 4.67585
420
360 360 360 3210.69667
so C is the last line of one section and 420 is the start of the next section. So every 420 lines make up one section of the whole file.
How can I read each section of this file and store it as, say, "frame1", and keep doing this until the end of the file (giving frame2, frame3, and so on)?
I have come up with a simple script that just reads the whole file line by line, but I do not know how to do the rest. Thanks
The answer to your question "how to read every section of a file using tcl?" is quite simply "keep reading until you run out of lines".
The answer to the question "how to count sections and skip header lines" is something like this:
while { ...condition... } {
    if {[gets $fp line] < 0} break
    lassign $line name x y z
    if {$name eq "420"} {
        incr section_counter
    } elseif {$name in {P C}} {
        # do something with the data
    }
}
The condition for your while loop will be tested once for each line read. The first if command breaks out of the loop once the entire file has been read. You don't need to split the line you read unless you expect one of the lines to contain a string that isn't a proper list. Once you have assigned the fields of the line into the variables, you can look inside name to see what kind of line you got. The second if command says that if $name is the string "420", the section counter is increased. If, on the other hand, $name contains "P" or "C", you have a data line to process. If neither of these conditions are fulfilled, you have the line after a "420" line, which is simply ignored.
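Putting that together, here is a minimal sketch of how you might accumulate each section into a list of frames. The channel variable fp, the list names frames and frame, and the file name data.txt are assumptions for illustration, not tested against your real data:
set fp [open "data.txt" r]
set frames {}   ;# frame 1 ends up as [lindex $frames 0]
set frame {}
while {[gets $fp line] >= 0} {
    lassign $line name x y z
    if {$name eq "420"} {
        # a new section starts; stash the previous one if any
        if {[llength $frame]} {
            lappend frames $frame
            set frame {}
        }
    } elseif {$name in {P C}} {
        lappend frame [list $name $x $y $z]
    }
    # anything else (the header after a "420" line) is ignored
}
if {[llength $frame]} {lappend frames $frame}
close $fp
puts "read [llength $frames] frames"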
Documentation: break, gets, if, incr, lassign, while
I wrote 2 scripts to do something like this:
#script1, to dump info:
proc script1 {} {
    puts $file "set a 123"
    puts $file "set b 456"
    .....
}
(The file size I dump is 8GB)
#And use script2 to source it and do data category:
while { [gets $file_wrtie_out_by_script1 line] != 1 } {
    eval $line
}
close $file_wrtie_out_by_script1
Do the job....
return
In this case, the script hangs at the return. How do I solve this issue... stuck for 3+ days, thanks
Update:
Thanks to Colin, I now use source instead of eval, but even with the "Do the job..." part removed, keeping just the return, it still hangs.
The gets command will return the number of characters in the line that it just read from the file channel.
When all the lines of the file have been read, then gets will return -1.
Your problem is that you have a while loop that never ends. As written, your while loop terminates only when gets returns 1, that is, after reading a one-character line, which never happens at end of file. You need to change the comparison to -1 for the while loop to terminate.
I agree with the comment from Colin that you should just use source instead of eval for each line. Using eval line-by-line will fail if you have a multi-line command (but that might not be the case in your example).
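As a point of comparison, here is a minimal sketch of the corrected line-by-line loop; the file name dump.tcl is made up for illustration, and, as noted, source is the simpler and more robust option:
set fp [open "dump.tcl" r]
while {[gets $fp line] != -1} {
    eval $line   ;# breaks on commands spanning multiple lines
}
close $fp

# The simpler alternative:
source dump.tcl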
I was writing a Tcl program which looked something like this:
#!/usr/bin/tclsh
set fInp [open file1.txt r]
while {[gets $fInp line] >= 0} {
    statement 1
    statement 2
}
statement 3
statement 4
while {[gets $fInp line] >= 0} {
    statement 5
    statement 6
}
close $fInp
I was expecting this to work fine, but to my surprise, the second while loop was not executed at all.
I came to the conclusion that we cannot read a file twice in Tcl using the same file descriptor (or channel).
So I closed fInp, opened the file again as fInp2, and it worked!
What is the reason behind this behavior, and is there another way of doing it?
Thanks
This is normal behavior for reading from files in every programming language and OS I'm familiar with. Once you read to the end of the file in the first loop, there's nothing left to read. You can reset and adjust the internal offset into the file's contents using the seek command, though.
seek $fInp 0 start
after the first loop will reset it to the beginning of the file so you can read it again in the second loop.
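As a minimal, self-contained illustration (assuming file1.txt exists), this reads the same file twice over one channel by rewinding in between:
set fInp [open file1.txt r]
while {[gets $fInp line] >= 0} {
    puts "first pass: $line"
}
seek $fInp 0 start   ;# rewind to the beginning of the file
while {[gets $fInp line] >= 0} {
    puts "second pass: $line"
}
close $fInp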
I am new to the Tcl language and wish to know how I can do the following. Assume I have a program that creates one text file per run and should be run 10000 times. Every single run creates a text file called "OUT.out". All I am interested in is a single number in a specific column of that OUT.out file from each run.
Ideal case for a single run should be as following:
Start the main run (assumed to be repeated 10000 times)
Run Case 1
Finish the Case 1
Open the text file, OUT.out.
Find the maximum absolute value in the 4th column of the text file.
Save the max value in a separate text file in row 1.
delete the OUT.out file
Run Case 2
Finish the Case 2 of the main loop
Open the text file, OUT.out.
Find the maximum absolute value in the 4th column of the text file.
Save the max value in a separate text file in row 2.
delete the OUT.out file
Run Case 3
Finish the Case 3 of the main loop
Open the text file, OUT.out.
Find the maximum absolute value in the 4th column of the text file.
Save the max value in a separate text file in row 3.
delete the OUT.out file
Run Case 4
.
.
.
I presume the code should be shorter than my note. Thanks in advance for your help.
Depending on what the separator is, you might do:
# Read in the data and list-ify it; REAL data is often messier though
set f [open OUT.out]
# -nonewline avoids a trailing empty row from the file's final newline
set table [lmap row [split [read -nonewline $f] "\n"] {split $row}]
close $f
# Kill that unwanted file
file delete OUT.out
# Tcl indexes start at 0
set col4abs [lmap row $table {
    expr { abs([lindex $row 3]) }
}]
# Get the maximum of a list of values
set maxAbs [tcl::mathfunc::max {*}$col4abs]
# You don't say what file to accumulate maximums in
set f [open accumulate.out "a"]; # IMPORTANT: a == append mode
puts $f $maxAbs
close $f
and then repeat that after each run. I'm sure you can figure out how to do that bit.
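For completeness, a minimal sketch of that outer loop; the proc name recordMax and the command ./run_case standing in for one run of your program are made up for illustration:
proc recordMax {} {
    set f [open OUT.out]
    set table [lmap row [split [read -nonewline $f] "\n"] {split $row}]
    close $f
    file delete OUT.out
    set col4abs [lmap row $table {expr {abs([lindex $row 3])}}]
    set f [open accumulate.out "a"]
    puts $f [tcl::mathfunc::max {*}$col4abs]
    close $f
}

for {set i 1} {$i <= 10000} {incr i} {
    exec ./run_case   ;# hypothetical: however you launch one run
    recordMax
}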
But if you're doing this a lot, you probably should look into storing the results in a database instead; they're much better suited for this sort of thing than a pile of ordinary files. (I can thoroughly recommend SQLite; we moved our bulk result data management into it and greatly improved our ability to manage things, and that's keeping lots of quite big binary blobs as well as various chunks of analysable metadata.)
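If you do go the database route, a minimal sketch using the sqlite3 Tcl package (the database file, table, and column names are invented for illustration):
package require sqlite3
sqlite3 db results.db
db eval {CREATE TABLE IF NOT EXISTS maxima (run INTEGER PRIMARY KEY, maxabs REAL)}
db eval {INSERT INTO maxima (maxabs) VALUES ($maxAbs)}  ;# $maxAbs is bound safely by the sqlite3 package
db close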
I'm trying to use an expect script to access a remote device via telnet, read/save the remote "EVENTLOG" locally, and then extract specific lines (serial numbers) from the log file. The problem is that the log files are constantly changing, so I need a way to search for specific strings. The remote device is Linux-based but doesn't have things like grep, vi, less, etc., as it's QNX Neutrino, hence having to do the processing locally.
I've got the telnet login, reading the file, and saving it locally under control, but it's when I get to "reading" the saved file that I have issues. Currently I'm just trying to get it to print what it found, but the script just exits without reporting anything except some extra braces??
#!/usr/bin/expect -f
set timeout -1
log_user 1
spawn telnet $IP
match_max 100000
expect "login:"
send -- "$USER\r"
expect "Password:"
send -- "$PW\r"
expect "# "
send -- "\r"
#at this point logged into device
#send command to generate the "dallaslog"
set dallaslog [open dallaslog.txt w]
expect "#"
send -- "cat `ls -rt /LOG/event*`\r"
expect "(cat) exited status=0"
set logout $expect_out(buffer)
puts $dallaslog "$logout"
close $dallaslog
unset expect_out(buffer)
set dallasread [open dallaslog.txt r]
set lines [split [read $dallasread] "\r"]
close $dallasread
puts "${green}$lines{$normal}"
#a debug line to print $dallasread in green so I can verify it works up to here
foreach line $lines {
    if {[regexp {.*Dallas ID: 0.*\n} $lines match]} {
        if {$match == 1} {
            puts $line ;# Prints whole line which has 1 at end
        }
    }
}
expect "# "
send -- "exit\r"
interact
What I'm (eventually) looking for is for the script to catch any line starting with "Dallas ID:" and then save that information to a variable, so I can use the scan command to parse the line and extract information.
What I get is:
(the results from $lines being "puts" in green)
"...
<ENTRY TIME="01/01/1970 00:48:07" PROC="syncd" FILE="mips.cc" LINE="208" NUM="10000">
UTC step from 01/01/1970 00:48:08 to 01/01/1970 00:48:07
</ENTRY>
Process 3174431 (cat) exited status=0
}{}
# exit
Process 3162142 (sh) exited status=0.
Connection closed by foreign host."
Thank you in advance for all the help. I'm a newbie to TCL/expect (been toying with it since last July) but I'm finding it to be a pretty powerful tool, just hard for me to debug!
EDIT: Added more information per @meuh's response.
Example: there can be up to 4 Dallas IDs, but generally I only have 0 and 1. The goal is to get the SN, BC, and CN for each Dallas ID saved as variables to put in a separate text file.
<ENTRY TIME="01/01/1970 00:00:06" PROC="sys" FILE="PlatformUtils.cpp" LINE="1227" NUM="10044">
Dallas ID: 1 SN:00000622393A BC: J4AD945 CN: IS200BPPBH2BMD R0: 001C
</ENTRY>
The foreach loop I used was an example from an old Stack Overflow question that I tried, unsuccessfully, to adapt for use here.
EDIT: I should also probably mention that this event log is approximately 800 lines long every time it is read, which is why I haven't posted an excerpt from it.
This regexp line is probably not doing what you want:
if {[regexp {.*Dallas ID: 0.*\n} $lines match]} {
    if {$match == 1} {
        puts $line
You are passing the list $lines instead of, presumably, the single line $line. The variable match will be set to the string that matched, which must therefore include the words "Dallas" and so on, so it can never be 1.
Your code comment says Prints whole line which has 1 at end, but I'm not sure what you are looking for as you do not have any example data that fits the regexp.
If you build your regexp pattern using grouping, you can capture parts of the line and perhaps avoid a further scan. E.g.
regexp {PROC="([a-z]*)"} $line match submatch
would set variable submatch to syncd in your above example.
You may also have a fundamental problem caused by Tcl's handling of \r\n on input from a file. The lines you got from $expect_out(buffer) do indeed have those 2 characters as end-of-line delimiters. However, when using read, by default I believe it will translate the same sequence to a normalised \n. So your split will not do anything, and you need to split on \n rather than \r. You can check the size of the list of lines you have with
puts [llength $lines]
If it is 1, then your split is not working. Replace it with
set lines [split [read $dallasread] "\n"]
This should help your loop, where for example you can try
foreach line $lines {
    if {[regexp {.*Dallas ID: (\d+) SN:([^ ]+)} $line match idnum SN]} {
        puts $line
        puts "$idnum, $SN"
    }
}
You must remove the \n at the end of your regexp, as it is no longer present after the split. I've extended the regexp example with (\d+) to match the id number (\d matches a digit), and ([^ ]+) to match any number of non-space characters after the text SN:.
These values are captured by the use of () grouping, and are placed in the variables idnum and SN, which you should be able to see output by the second puts command.
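Since the stated goal is to capture the SN, BC, and CN for each Dallas ID, the same grouping idea extends naturally. This is a sketch against the single sample line shown above, so the exact spacing in the pattern is an assumption:
foreach line $lines {
    if {[regexp {Dallas ID: (\d+) SN:(\S+) BC: (\S+) CN: (\S+)} $line -> idnum SN BC CN]} {
        puts "ID $idnum: SN=$SN BC=$BC CN=$CN"
    }
}
The -> here is just an ordinary variable name, used by convention to soak up the full-match string.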
I have a requirement to split a file into multiple files before FTP (since FTP has a limitation of 1 GB). I am using the split command to do so.
split --bytes=$SPLIT_FILE_SIZE $FILE -d $FILE"_"
SPLIT_FILE_SIZE=900M
Now I am noticing that it is splitting records as well.
Also, my data in a record does not have any newline character in it.
For e.g.
My original file have
a|b|c|d|e|f
a1|b1|c1|d1|e1|f1
a2|b2|c2|d2|e2|f2
a3|b3|c3|d3|e3|f3
a4|b4|c4|d4|e4|f4
So my split file is
First file content :
a|b|c|d|e|f
a1|b1|c1|d1|e1|f1
a2|b2|c2|
Second file Content :
d2|e2|f2
a3|b3|c3|d3|e3|f3
a4|b4|c4|d4|e4|f4
Appreciate any suggestions.
This can be added to as you need, but in the most basic form, as long as you're dealing with text input, you may be able to use something like this:
#!/usr/bin/awk -f
BEGIN {
    inc = 1
}
s > 900*1024*1024 { # 900MB, per your question
    inc++
    s = 0
}
{
    s += length($0) + 1        # +1 for the newline
    print > ("outfile." inc)   # parentheses keep the redirection target unambiguous
}
This walks through the file, line by line, adding the length to a variable, then resetting the variable and incrementing a counter to be used as an output filename.
Upgrades might include, perhaps, taking the size from a command line option (ARGV[]), or including some sort of status/debugging output as the script runs.
Since you are asking it to split by counting bytes, it doesn't care whether the split point lands in the middle of a line. Instead, get the average number of bytes per line, add some safety margin, and split by line count:
split --lines=$SPLIT_FILE_LINE -d $FILE $FILE"_"
You can count the number of lines in the file using wc -l $FILENAME. Note that the Mac OS X and FreeBSD versions of split don't have the -d option.
Here is how I did it
SPLIT_FILE_SIZE=900
avg_length_of_line=$(awk '{ total += length($0); count++ } END { print total/count }' $FILE)
r_avg_length_of_line=$(printf "%.0f\n" "$avg_length_of_line")
max_limit_of_file=$(expr $SPLIT_FILE_SIZE \* 1024 \* 1024)
max_line_count=$((max_limit_of_file / r_avg_length_of_line))
split -l $max_line_count -d $FILE $FILE"_"