I am new to the Tcl language and wish to know how I can do the following. Assume I have a program that creates one text file per run and should be run 10000 times. Every single run creates a text file called "OUT.out". All I am interested in is a single number in a specific column of that OUT.out file for each run.
The ideal flow for a single run should be as follows:
Start the main run (to be repeated 10000 times, say)
Run Case 1
Finish the Case 1
Open the text file, OUT.out.
Find the maximum absolute value in the 4th column of the text file.
Save the max value in a separate text file in row 1.
delete the OUT.out file
Run Case 2
Finish the Case 2 of the main loop
Open the text file, OUT.out.
Find the maximum absolute value in the 4th column of the text file.
Save the max value in a separate text file in row 2.
delete the OUT.out file
Run Case 3
Finish the Case 3 of the main loop
Open the text file, OUT.out.
Find the maximum absolute value in the 4th column of the text file.
Save the max value in a separate text file in row 3.
delete the OUT.out file
Run Case 4
.
.
.
I presume the code will be shorter than my notes. Thanks in advance for your help.
Depending on what the separator is, you might do:
# Read in the data and list-ify it; REAL data is often messier though
set f [open OUT.out]
set table [lmap row [split [read $f] "\n"] {split $row}]
close $f
# Kill that unwanted file
file delete OUT.out
# Tcl indexes start at 0
set col4abs [lmap row $table {
    # Guard against blank/short rows (e.g., the empty element from a trailing newline)
    if {[llength $row] < 4} continue
    expr { abs([lindex $row 3]) }
}]
# Get the maximum of a list of values
set maxAbs [tcl::mathfunc::max {*}$col4abs]
# You don't say what file to accumulate maximums in
set f [open accumulate.out "a"]; # IMPORTANT: a == append mode
puts $f $maxAbs
close $f
and then repeat that after each run. I'm sure you can figure out how to do that bit.
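A minimal sketch of that outer loop, assuming each case is launched by some external command (./runCase below is purely a placeholder for however you actually start a run):
for {set run 1} {$run <= 10000} {incr run} {
    exec ./runCase $run     ;# placeholder for "Run Case $run"
    # ... put the OUT.out processing shown above here,
    #     or wrap it in a proc and call it from this loop ...
}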
But if you're doing this a lot, you probably should look into storing the results in a database instead; they're much better suited for this sort of thing than a pile of ordinary files. (I can thoroughly recommend SQLite; we moved our bulk result data management into it and greatly improved our ability to manage things, and that's keeping lots of quite big binary blobs as well as various chunks of analysable metadata.)
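For illustration, a minimal sketch of the SQLite route using the Tcl sqlite3 package; the database file name and table name are made up:
package require sqlite3

sqlite3 db results.db                   ;# placeholder database file
db eval {CREATE TABLE IF NOT EXISTS maxima (run INTEGER, maxAbs REAL)}

# after each run, instead of appending to accumulate.out:
db eval {INSERT INTO maxima (run, maxAbs) VALUES ($run, $maxAbs)}

# once all the runs are done:
db close
The Tcl binding substitutes the values of the Tcl variables $run and $maxAbs directly into the SQL, so no quoting is needed.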
I have code to calculate the mean of the first five values of each column of a file, so that I can then use these values as a reference point for the whole set. The problem is that now I need to do the same for many files, so I will need to obtain the mean for each file and then use these values again with the original files. I have tried it this way but I get an error. Thanks.
%%% - Loading the file of each experiment
myfiles = dir('*.lvm'); % To load every file of .lvm
for i = 1:length(myfiles) % Loop with the number of files
files=myfiles(i).name;
mydata(i).files = files;
mydata(i).T = fileread(files);
arraymean(i) = mean(mydata(i));
end
The files that I need to process look more or less like this:
Delta_X 3.000000 3.000000 3.000000
***End_of_Header***
X_Value C_P1N1 C_P1N2 C_P1N3
0.000000 -0.044945 -0.045145 -0.045705
0.000000 -0.044939 -0.045135 -0.045711
3.000000 -0.044939 -0.045132 -0.045706
6.000000 -0.044938 -0.045135 -0.045702
Your first line results in 'myfiles' being a structure array with components that you will find defined when you type 'help dir'. In particular, the names of all the files are contained in the structure element myfiles(i).name. To display all the file names, type myfiles.name. So far so good. In the for loop you use 'fileread', but fileread (see help fileread) returns the character string rather than the actual values. I have named your prototype .lvm file DinaF.lvm and written a very simple function to read the data in that file, by skipping the first three lines and then storing the following matrix, assumed to have 4 columns, in an array called T inside the function and arrayT in the main program.
Here is a modified script, where a function read_lvm has been included to read your 'model' lvm file.
The '1' in the first line tells Octave that there is more to the script than just the following function: the main program has to be interpreted as well.
1;
function T=read_lvm(filename)
  fid = fopen (filename, "r");
  %% Skip the first three header lines
  for lhead=1:3
    junk=fgetl(fid);
  endfor
  %% Read nrow lines of data, quit when the file is empty
  nrow=0;
  while (! feof (fid) )
    nrow=nrow + 1;
    thisline=fscanf(fid,'%f',4);
    T(nrow,1:4)=transpose(thisline);
  endwhile
  fclose (fid);
endfunction
## main program
myfiles = dir('*.lvm'); % To load every file of .lvm
for i = 1:length(myfiles) % Loop with the number of files
  files=myfiles(i).name;
  arrayT(i,:,:) = read_lvm(files);
  columnmean(i,1:4)=mean(arrayT(i,:,:))
end
Now the tabular values associated with each .lvm file are in the array arrayT, and the mean for that data set is in columnmean(i,1:4). If i>1 then columnmean will be an array, with each row containing the column means for the corresponding .lvm file.
This discussion is getting to be too distant from the initial question. I am happy to continue to help. If you want more help, close this discussion by accepting my answer (click the swish), then ask a new question with a heading like 'How to read .lvm files in Octave'. That way you will get the insights from many more people.
I have several huge (>2GB) JSON files that end in ,\n]. Here is my test file example, which is the last 25 characters of a 2 GB JSON file:
test.json
":{"value":false}}}}}},
]
I need to delete the ,\n and add back the ] in the last three characters of the file. The entire file is on three lines: the opening and closing brackets are each on their own line, and all the contents of the JSON array are on the second line.
I can't load the entire stream into memory to do something like:
string[0..-2]
because the file is way too large. I tried several approaches, including Ruby's:
chomp!(",\n]")
and UNIX's:
sed
both of which made no change to my JSON file. I viewed the last 25 characters by doing:
tail -c 25 filename.json
and also did:
ls -l
to verify that the byte size of the new and the old file versions were the same.
Can anyone help me understand why none of these approaches is working?
It's not necessary to read in the whole file if you're looking to make a surgical operation like this. Instead you can just overwrite the last few bytes in the file:
file = 'huge.json'
# Overwrite the trailing bytes in place (adjust the offset and string to your file's exact tail)
IO.write(file, "\n]\n", File.stat(file).size - 5)
The key here is to write as many bytes as you back-track from the end; otherwise you'll need to trim the file length, which you can also do with truncate if necessary.
I have a requirement to split a file into multiple files before FTP (since FTP has a limitation of 1 GB). I am using the split command to do so.
split --bytes=$SPLIT_FILE_SIZE $FILE -d $FILE"_"
where SPLIT_FILE_SIZE=900M.
Now I am noticing that it is splitting records as well.
Also, the data within a record does not contain any newline characters.
For e.g.
My original file have
a|b|c|d|e|f
a1|b1|c1|d1|e1|f1
a2|b2|c2|d2|e2|f2
a3|b3|c3|d3|e3|f3
a4|b4|c4|d4|e4|f4
So my split file is
First file content:
a|b|c|d|e|f
a1|b1|c1|d1|e1|f1
a2|b2|c2|
Second file content:
d2|e2|f2
a3|b3|c3|d3|e3|f3
a4|b4|c4|d4|e4|f4
Appreciate any suggestions.
This can be added to as you need, but in the most basic form, as long as you're dealing with text input, you may be able to use something like this:
#!/usr/bin/awk -f
BEGIN {
    inc=1
}
s > 900*1024*1024 {    # 900MB, per your question
    inc++
    s=0
}
{
    s+=length($0)+1    # +1 accounts for the newline stripped from $0
    print > ("outfile." inc)
}
This walks through the file, line by line, adding the length to a variable, then resetting the variable and incrementing a counter to be used as an output filename.
Upgrades might include, perhaps, taking the size from a command line option (ARGV[]), or including some sort of status/debugging output as the script runs.
Since you are asking it to split by counting bytes, it doesn't care whether the split point falls in the middle of a line. Instead, work out the average number of bytes per line, add some safety margin, and split by lines:
split -l $SPLIT_FILE_LINE -d $FILE $FILE"_"
You can count the number of lines in the file using wc -l $FILENAME. Note that the Mac OS X and FreeBSD versions of split don't have the -d option.
Here is how I did it:
SPLIT_FILE_SIZE=900
avg_length_of_line=$(awk '{ total += length($0); count++ } END { print total/count }' $FILE)
r_avg_length_of_line=$(printf "%.0f\n" "$avg_length_of_line")
max_limit_of_file=$(expr $SPLIT_FILE_SIZE \* 1024 \* 1024)
max_line_count=$((max_limit_of_file / r_avg_length_of_line))
split -l $max_line_count -d $FILE $FILE"_"
I have just started learning Tcl and I have a problem reading a big file.
I have a data file which looks like the following:
420
360 360 360 3434.01913
P 6.9022 0.781399 -0.86106
C 4.36397 -0.627479 3.83363
P 6.90481 5.42772 3.08491
....
and ends like this:
P -7.21325 1.71285 -0.127325
C -4.14243 0.41123 4.67585
420
360 360 360 3210.69667
so C is the last line of one section and 420 is the start of the next section, so every 420 lines make up a section of the whole file.
How can I read each section of this file and store it as, say, "frame1", and continue until the end of the file (giving frame2, frame3, and so on)?
I have come up with a simple script that just reads the whole file line by line, but I do not know how to do the rest. Thanks.
The answer to your question "how to read every section of a file using tcl?" is quite simply "keep reading until you run out of lines".
The answer to the question "how to count sections and skip header lines" is something like this:
while { ...condition... } {
    if {[gets $fp line] < 0} break
    lassign $line name x y z
    if {$name eq "420"} {
        incr section_counter
    } elseif {$name in {P C}} {
        # do something with the data
    }
}
The condition for your while loop will be tested once for each line read. The first if command breaks out of the loop once the entire file has been read. You don't need to split the line you read unless you expect one of the lines to contain a string that isn't a proper list. Once you have assigned the fields of the line into the variables, you can look inside name to see what kind of line you got. The second if command says that if $name is the string "420", the section counter is increased. If, on the other hand, $name contains "P" or "C", you have a data line to process. If neither of these conditions are fulfilled, you have the line after a "420" line, which is simply ignored.
Documentation: break, gets, if, incr, lassign, while
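Putting those pieces together, a minimal sketch that collects each section into frame1, frame2, and so on; the file name data.xyz and the frame array name are only illustrative:
set fp [open data.xyz]                  ;# placeholder file name
set section_counter 0
while {1} {
    if {[gets $fp line] < 0} break
    lassign $line name x y z
    if {$name eq "420"} {
        incr section_counter
        set frame($section_counter) {}
    } elseif {$name in {P C}} {
        # collect the atom lines of the current section
        lappend frame($section_counter) [list $name $x $y $z]
    }
}
close $fp
# $frame(1), $frame(2), ... now hold the data lines of each section.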
I am dealing with "large" measurement data, approximately 30K key-value pairs. The measurement has a number of iterations, and after each iteration a datafile (non-CSV) with 30K key-value pairs is created. I want to somehow create a CSV file of the form:
Key1,value of iteration1,value of iteration2,...
Key2,value of iteration1,value of iteration2,...
Key3,value of iteration1,value of iteration2,...
...
Now, I was wondering about an efficient way of adding each iteration's measurement data as a column to the CSV file in Tcl. So far it seems that in either case I will need to load the whole CSV file into some variable (array/list) and work on each element, adding the new measurement data. This seems somewhat inefficient. Is there another way, perhaps?
Because CSV files are fundamentally text files, you have to load all the data in and write it out again. There's no other way to expand the number of columns since the data is fundamentally row-major. The easiest way to do what you say you want (after all, 30k-pairs isn't that much) is to use the csv package to do the parsing work. This code might do what you're looking for…
package require csv
package require struct::matrix
# Load the file into a matrix
struct::matrix data
set f [open mydata.csv]
csv::read2matrix $f data , auto
close $f
# Add your data
set newResults {}
foreach key [data get column 0] {
    lappend newResults [computeFrom $key]; # This is your bit!
}
data add column $newResults
# Write back out again
set f [open mydata.csv w]
csv::writematrix data $f
close $f
You would probably be better off using a database though. Both metakit and sqlite3 work very well with Tcl, and handle this sort of task well.
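As an illustration only (the database file, table, and column names below are made up), the iteration data maps naturally onto a single three-column table in SQLite, which avoids growing a column per iteration:
package require sqlite3

sqlite3 db measurements.db              ;# placeholder database file
db eval {
    CREATE TABLE IF NOT EXISTS results (
        key_name  TEXT,
        iteration INTEGER,
        value     REAL
    )
}

# after parsing one iteration's datafile into a dict ($pairs), with the
# current iteration number in $iter:
db transaction {
    dict for {k v} $pairs {
        db eval {INSERT INTO results (key_name, iteration, value)
                 VALUES ($k, $iter, $v)}
    }
}
Wrapping the 30K inserts for one iteration in a single transaction keeps the batch fast, and you can always export back to CSV later if you still need that format.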