gnuplot ignores divisor of csv-column values

I am using gnuplot to draw a histogram of a series of RAM measurements I performed.
However, I want it to display the values that are stored in Bytes in CSV files in KB.
I divided the respective columns by 1024, but gnuplot simply ignores that.
Below you see the template that is changed by a script to have meaningful file names for CSVFILE and PSFILE and then fed into gnuplot.
set style data histogram
set style histogram errorbars gap 1
set xlabel "nodes"
set ylabel "memory (KB)"
set key left box
set datafile separator ","
set terminal postscript landscape
set output 'PSFILE'
plot 'CSVFILE' using ($2/1024):($3/1024):xtic(1) ti col lt -1 fs pattern 1,\
'' using ($4/1024):($5/1024):xtic(1) ti col lt -1 fs pattern 2,\
'' using ($6/1024):($7/1024):xtic(1) ti col lt -1 fs pattern 4,\
'' using ($8/1024):($9/1024):xtic(1) ti col lt -1 fs pattern 6,\
'' using ($10/1024):($11/1024):xtic(1) ti col lt -1 fs pattern 5,\
'' using ($12/1024):($13/1024):xtic(1) ti col lt -1 fs pattern 7,\
'' using ($14/1024):($15/1024):xtic(1) ti col lt -1 fs pattern 3
So what does not work is the /1024. Any ideas how to do that?
Changing the CSV files instead did cross my mind, but there are a lot of them, and I would have to write a script to change every cell, which I definitely do not fancy doing.

Okay, the solution was trivial. I just had to enclose the $2 values in extra parentheses, like ($2)/1024.
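For completeness, the preprocessing route mentioned in the question (rewriting the CSV cells) does not need much code either. The following is only a minimal sketch in Python, not part of the original solution. The function name bytes_to_kb and the assumption that the first column holds the x-axis label are illustrative.

```python
def bytes_to_kb(rows, label_col=0):
    """Return a copy of rows with every numeric cell divided by 1024.

    Cells in label_col and cells that are not numbers (for example the
    header row) are copied unchanged.
    """
    converted = []
    for row in rows:
        new_row = []
        for index, cell in enumerate(row):
            if index == label_col:
                new_row.append(cell)  # keep the x-axis label as it is
                continue
            try:
                new_row.append(str(float(cell) / 1024))
            except ValueError:
                new_row.append(cell)  # non-numeric text, e.g. a column title
        converted.append(new_row)
    return converted


if __name__ == "__main__":
    sample = [["nodes", "avg", "err"], ["4", "2048", "512"]]
    for line in bytes_to_kb(sample):
        print(",".join(line))
```

The rows can come from csv.reader and be written back with csv.writer, so the same function could be applied to each measurement file in a loop.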

Related

Gnuplot one-liner to generate a titled line for each row in a CSV file

I've been trying to figure out gnuplot but haven't been getting anywhere for seemingly 2 reasons. My lack of understanding gnuplot set commands, and the layout of my data file. I've decided the best option is to ask for help.
Getting this gnuplot command into a one-liner is the hope.
Example rows from my CSV data file (MyData.csv):
> _TitleRow1_,15.21,15.21,...could be more, could be less
> _TitleRow2_,16.27,16.27,101,55.12,...could be more, could be less
> _TitleRow3_,16.19,16.19,20.8,...could be more, could be less
...(over 100 rows)
Contents of MyData.csv rows will always be a string as the first column for title, followed by an undetermined amount of decimal values. (Each row gets appended to periodically, so specifying an open ended amount of columns to include is needed)
What I'd like to happen is to generate a line graph showing a line for each row in the csv, using the first column as a row title, and the following numbers generating the actual line.
This is the command I'm trying:
gnuplot -e 'set datafile separator ","; set key autotitle columnhead; plot "MyData.csv"'
Which results in:
set datafile separator ","; set key autotitle columnhead; plot "MyData.csv"
^
line 0: Bad data on line 2 of file MyData.csv
This looks like an amazing tool and I'm looking forward to learning more about it. Thanks in advance for any hints/assistance!
Your datafile format is very unfortunate for gnuplot which prefers data in columns.
Although, you can also plot rows (which is not straightforward in gnuplot, but see an example here). This requires a strict matrix, but the problem with your data is that you have a variable column count.
Actually, your CSV is not a "correct" CSV, because a CSV should have the same number of columns for all rows, i.e. if one row has less data than the row with maximum data the line should be filled with ,,, as many as needed. That's basically what the script below is doing.
With this you can plot rows with the option matrix (check help matrix). However, you will get some warnings warning: matrix contains missing or undefined values which you can ignore.
Alternatively, you could transpose your data (with variable column count maybe not straightforward). Maybe there are external tools which can do it easily. With gnuplot-only it will be a bit cumbersome (and first you would have to fill your shorter rows as in the example below).
Maybe there is a simpler and better gnuplot-only solution which I am currently not aware of.
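On the external-tool route mentioned above, padding the short rows and transposing the file takes only a few lines in a general-purpose language. The following is a minimal sketch in Python, not a gnuplot feature. The function names pad_rows and transpose and the choice of "NaN" as the fill value are assumptions made for illustration.

```python
from itertools import zip_longest


def pad_rows(rows, fill="NaN"):
    """Extend every row with fill values to the length of the longest row."""
    width = max((len(row) for row in rows), default=0)
    return [row + [fill] * (width - len(row)) for row in rows]


def transpose(rows, fill="NaN"):
    """Turn rows into columns, filling gaps left by shorter rows."""
    return [list(column) for column in zip_longest(*rows, fillvalue=fill)]


if __name__ == "__main__":
    rows = [
        ["_TitleRow1_", "1.2", "1.3"],
        ["_TitleRow2_", "2.2", "2.3", "2.4", "2.5"],
        ["_TitleRow3_", "3.2", "3.3", "3.4"],
    ]
    for line in transpose(rows):
        print(",".join(line))
```

After transposing, each original row becomes a column with its title in the first line, which is the column-oriented layout described above as the one gnuplot prefers.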
Data: SO73099645.dat
_TitleRow1_, 1.2, 1.3
_TitleRow2_, 2.2, 2.3, 2.4, 2.5
_TitleRow3_, 3.2, 3.3, 3.4
Script:
### plotting rows with variable columns
reset session
FILE = "SO73099645.dat"
getColumns(s) = (sum [i=1:strlen(s)] (s[i:i] eq ',') ? 1 : 0) + 1
set datafile separator "\t"
colCount = 0
myNaNs = myHeaders = ''
stats FILE u (rowCount=$0+1, c=getColumns(strcol(1)), c>colCount ? colCount=c : 0) nooutput
do for [i=1:colCount] {myNaNs=myNaNs.',NaN' }
set table $Data
plot FILE u (s=strcol(1),c=getColumns(s),s.myNaNs[1:(colCount-c)*4]) w table
unset table
set datafile separator ","
stats FILE u (myHeaders=sprintf('%s "%s"',myHeaders,strcol(1))) nooutput
myHeader(n) = word(myHeaders,n)
set key noenhanced
plot for [row=0:rowCount-1] $Data matrix u 1:3 every ::1:row::row w lp pt 7 ti myHeader(row+1)
### end of script
As "one-liner":
FILE = "SO73099645.dat"; getColumns(s) = (sum [i=1:strlen(s)] (s[i:i] eq ',') ? 1 : 0) + 1; set datafile separator "\t"; colCount = 0; myNaNs = myHeaders = ''; stats FILE u (rowCount=$0+1, c=getColumns(strcol(1)), c>colCount ? colCount=c : 0) nooutput; do for [i=1:colCount] {myNaNs=myNaNs.',NaN' }; set table $Data; plot FILE u (s=strcol(1),c=getColumns(s),s.myNaNs[1:(colCount-c)*4]) w table; unset table; set datafile separator ","; stats FILE u (myHeaders=sprintf('%s "%s"',myHeaders,strcol(1))) nooutput; myHeader(n) = word(myHeaders,n); set key noenhanced; plot for [row=0:rowCount-1] $Data matrix u 1:3 every ::1:row::row w lp pt 7 ti myHeader(row+1)
Result:

How can I set the numbering of the x-axis of an Octave plot to engineering notation?

I made a very simple Octave script
a = [10e6, 11e6, 12e6];
b = [10, 11, 12];
plot(a, b, 'rd-')
which outputs the following graph.
Is it possible to set the numbering on the x-axis to engineering notation, rather than scientific, and have it display "10.5e+6, 11e+6, 11.5e+6" instead of "1.05e+7, 1.1e+7, 1.15e+7"?
While octave provides a 'short eng' formatting option, which does what you're asking for in terms of printing to the terminal, it does not appear to provide this functionality in plots or when formatting strings via sprintf.
Therefore you'll have to find a way to do this by yourself, with some creative string processing of the initial xticks, and substituting the plot's ticklabels accordingly. Thankfully it's not that hard :)
Using your example:
a = [10e6, 11e6, 12e6];
b = [10, 11, 12];
plot(a, b, 'rd-')
format short eng % display stdout in engineering format
TickLabels = disp( xticks ) % collect string as it would be displayed on the stdout
TickLabels = strsplit( TickLabels ) % tokenize at spaces
TickLabels = TickLabels( 2 : end - 1 ) % discard start and end empty tokens
TickLabels = regexprep( TickLabels, '\.0+e', 'e' ) % remove purely zero decimals using a regular expression
TickLabels = regexprep( TickLabels, '(\.[1-9]*)0+e', '$1e' ) % remove non-significant zeros in non-zero decimals using a regular expression
xticklabels( TickLabels ) % set the new ticklabels to the plot
format % reset short eng format back to default, if necessary
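The string processing above relies on Octave's own 'short eng' display. The underlying rule (pick an exponent that is a multiple of three, then drop non-significant zeros from the mantissa) can also be written out directly. The following is a language-neutral illustration in Python, not Octave code. The function name eng_label and the limit of three decimals are assumptions.

```python
import math


def eng_label(value, decimals=3):
    """Format value with an exponent that is a multiple of three."""
    if value == 0:
        return "0"
    exponent = math.floor(math.log10(abs(value)))
    exponent3 = (exponent // 3) * 3          # round down to a multiple of 3
    mantissa = value / 10 ** exponent3
    text = f"{mantissa:.{decimals}f}"
    text = text.rstrip("0").rstrip(".")      # drop non-significant zeros
    return f"{text}e{exponent3:+d}"


if __name__ == "__main__":
    for tick in (10e6, 10.5e6, 11e6, 11.5e6, 12e6):
        print(eng_label(tick))
```

This sketch does not handle mantissas that round up to 1000, so it is meant to show the idea rather than to replace a tested formatter.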

Dimension problem when converting a MATLAB .m script into an Octave compatible syntax

I want to run a MATLAB script M-file to reconstruct a point cloud in Octave. Therefore I had to rewrite some parts of the code to make it compatible with Octave. Actually the M-file works fine in Octave (I don't get any errors) and also the plotted point cloud looks good at first glance, but it seems that the variables are only half the size of the original MATLAB variables. In the attached screenshots you can see what I mean.
Octave:
MATLAB:
You can see that the dimension of e.g. M in Octave is 1311114x3 but in MATLAB it is 2622227x3. The actual number of rows in my raw file is 2622227 as well.
Here you can see an extract of the raw file (original data) that I use.
Rotation angle Measured distance
-0,090 26,295
-0,342 26,294
-0,594 26,294
-0,846 26,295
-1,098 26,294
-1,368 26,296
-1,620 26,296
-1,872 26,296
In MATLAB I created my output variable as follows.
data = table;
data.Rotationangle = cell2mat(raw(:, 1));
data.Measureddistance = cell2mat(raw(:, 2));
As there is no table function in Octave I wrote
data = cellfun(@(x)str2num(x), strrep(raw, ',', '.'))
instead.
Octave also has no struct2array function, so I had to replace it as well.
In MATLAB I wrote.
data = table2array(data);
In Octave this was a bit more difficult to do. I had to create a struct2array function, which I did by means of this bug report.
%% Create a struct2array function
function retval = struct2array (input_struct)
%input check
if (~isstruct (input_struct) || (nargin ~= 1))
print_usage;
endif
%convert to cell array and flatten/concatenate output.
retval = [ (struct2cell (input_struct)){:}];
endfunction
clear b;
b.a = data;
data = struct2array(b);
Did I make a mistake somewhere and could someone help me to solve this problem?
edit:
Here's the part of my script where I'm using raw.
delimiter = '\t';
startRow = 5;
formatSpec = '%s%s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'HeaderLines' ,startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
fclose(fileID);
%% Convert the contents of columns containing numeric text to numbers.
% Replace non-numeric text with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
raw(1:length(dataArray{col}),col) = mat2cell(dataArray{col}, ones(length(dataArray{col}), 1));
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[1,2]
% Converts text in the input cell array to numbers. Replaced non-numeric
% text with NaN.
rawData = dataArray{col};
for row=1:size(rawData, 1)
% Create a regular expression to detect and remove non-numeric prefixes and
% suffixes.
regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\.]*)+[\,]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\.]*)*[\,]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
try
result = regexp(rawData(row), regexstr, 'names');
numbers = result.numbers;
% Detected commas in non-thousand locations.
invalidThousandsSeparator = false;
if numbers.contains('.')
thousandsRegExp = '^\d+?(\.\d{3})*\,{0,1}\d*$';
if isempty(regexp(numbers, thousandsRegExp, 'once'))
numbers = NaN;
invalidThousandsSeparator = true;
end
end
% Convert numeric text to numbers.
if ~invalidThousandsSeparator
numbers = strrep(numbers, '.', '');
numbers = strrep(numbers, ',', '.');
numbers = textscan(char(numbers), '%f');
numericData(row, col) = numbers{1};
raw{row, col} = numbers{1};
end
catch
raw{row, col} = rawData{row};
end
end
end
You don't see any raw in my workspaces because I clear all temporary variables before I reconstruct my point cloud.
Also my original data in row 1311114 and 1311115 look normal.
edit 2:
As suggested here is a small example table to clarify what I want and what MATLAB does with the table2array function in my case.
data =
-0.0900 26.2950
-0.3420 26.2940
-0.5940 26.2940
-0.8460 26.2950
-1.0980 26.2940
-1.3680 26.2960
-1.6200 26.2960
-1.8720 26.2960
With the struct2array function I used in Octave I get the following array.
data =
-0.090000 26.295000
-0.594000 26.294000
-1.098000 26.294000
-1.620000 26.296000
-2.124000 26.295000
-2.646000 26.293000
-3.150000 26.294000
-3.654000 26.294000
If you compare the Octave array with my original data, you can see that every second row is skipped. This seems to be the reason for 1311114 instead of 2622227 rows.
edit 3:
I tried to solve my problem with the suggestions of @Tasos Papastylianou, which unfortunately was not successful.
First I did the variant with a struct.
data = struct();
data.Rotationangle = [raw(:,1)];
data.Measureddistance = [raw(:,2)];
data = cell2mat( struct2cell (data ).' )
But this leads to the following structure in my script. (Unfortunately the result is not what I would like to have as shown in edit 2. Don't be surprised, I only used a small part of my raw file to accelerate the run of my script, so here are only 769 lines.)
[766,1] = -357,966
[767,1] = -358,506
[768,1] = -359,010
[769,1] = -359,514
[1,2] = 26,295
[2,2] = 26,294
[3,2] = 26,294
[4,2] = 26,296
Furthermore I get the following error.
error: unary operator '-' not implemented for 'cell' operands
error: called from
Cloud_reconstruction at line 137 column 11
Also the approach with the dataframe octave package didn't work. When I run the following code it leads to the error you can see below.
dataframe2array = @(df) cell2mat( struct(df).x_data );
pkg load dataframe;
data = dataframe();
data.Rotationangle = [raw(:, 1)];
data.Measureddistance = [raw(:, 2)];
dataframe2array(data)
error:
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 147 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 106 column 20
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 176 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 106 column 20
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 147 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 176 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
error: RHS(_,2): but RHS has size 768x1
error: called from
df_matassign at line 179 column 11
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
Both error messages refer to the following part of my script where I'm doing the reconstruction of the point cloud in cylindrical coordinates.
distLaserCenter = 47; % Distance between the pipe centerline and the blind zone in mm
m = size(data,1); % Find the length of the first dimension of data
zincr = 0.4/360; % z increment in mm per deg
data(:,1) = -data(:,1);
for i = 1:m
data(i,2) = data(i,2) + distLaserCenter;
if i == 1
data(i,3) = 0;
elseif abs(data(i,1)-data(i-1)) < 100
data(i,3) = data(i-1,3) + zincr*(data(i,1)-data(i-1));
else abs(data(i,1)-data(i-1)) > 100;
data(i,3) = data(i-1,3) + zincr*(data(i,1)-(data(i-1)-360));
end
end
To give some background for a better understanding: the script is used to reconstruct a pipe as a point cloud. The surface of the pipe was scanned from the inside with a laser, and the laser measured several points (distance from the laser to the inner wall of the pipe) at each degree of rotation. I hope this helps to understand what I want to do with my script.
Not sure exactly what you're trying to do, but here's a toy example of how a struct could be used in an equivalent manner to a table:
matlab:
data = table;
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
table2array(data)
octave:
data = struct();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
cell2mat( struct2cell (data ).' )
Note the transposition operation (.') before passing the result to cell2mat, since in a table, the 'fieldnames' are arranged horizontally in columns, whereas the struct2cell ends up arranging what used to be the 'fieldnames' as rows.
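The role of the transposition can be illustrated outside Octave as well. The following is a small sketch in Python, used only as an analogy: a dictionary of equally long columns stands in for the struct, and the hypothetical helper columns_to_rows places the fields side by side so that each record becomes one row.

```python
def columns_to_rows(columns):
    """Place named columns side by side, giving one list per record."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*...) pairs up the i-th entry of every column, which plays the
    # role of the transposition followed by the horizontal concatenation.
    return [list(record) for record in zip(*columns.values())]


if __name__ == "__main__":
    data = {"A": [1, 2, 3, 4, 5], "B": [10, 20, 30, 40, 50]}
    for row in columns_to_rows(data):
        print(row)
```

The length check mirrors the requirement that all fields hold the same number of entries before they can form one rectangular array.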
You might also be interested in the dataframe octave package, which performs similar functions to matlab's table (or in fact, R's dataframe object): https://octave.sourceforge.io/dataframe/ (you can install this by typing pkg install -forge dataframe in your console)
Unfortunately, the way to display the data as an array is still not ideal (see: https://stackoverflow.com/a/55417141/4183191), but you can easily convert that into a tiny function, e.g.
dataframe2array = @(df) cell2mat( struct(df).x_data );
Your code can then become:
pkg load dataframe;
data = dataframe();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
dataframe2array(data)

Use column from CSV as a category label for plotting column chart using gnuplot

I have a CSV file looking like:
frameNo dataSeg paritySeg frameType
0 17 3 k
1 2 1 d
2 3 1 d
3 3 1 d
4 3 1 d
5 2 1 d
6 3 1 d
7 3 1 d
8 4 1 d
I'm able to plot stacked column diagram showing number of data and parity segments per frame. Looks like this:
What I'd like to add to it, however, is paint differently those columns (both data and parity) which have "k" marker in the last column. Basically, distinguish between two categories - "d" and "k".
Is that possible using gnuplot?
Here's the script I'm using:
set style histogram rowstacked;
set style data histograms;
set style fill solid;
set datafile separator "\t";
set terminal png size 2500,1500 enhanced font ",30";
set title "";
set tics font ",25";
set xlabel "Frame #" font ",25";
set ylabel "# of segments" font ",25";
set key outside;
set xrange [0:];
plot "segments.csv" using 2 t "Data", "" using 3 t "Parity";
You could impose a custom condition on the columns being plotted and supply an invalid value (signaling to skip the particular data point) if this condition is not met:
set terminal pngcairo size 1200,600 enhanced font ",30";
set output 'test.png'
set style histogram rowstacked;
set style data histograms;
set style fill solid;
#set datafile separator "\t";
set title "";
set tics font ",25";
set xlabel "Frame #" font ",25";
set ylabel "# of segments" font ",25";
set key outside;
set xrange [0:];
fName = 'segments.csv'
plot \
fName using (strcol(4) eq 'd'?$2:1/0) t "Data d" lc rgb '#666666', \
fName using (strcol(4) eq 'd'?$3:1/0) t "Parity d" lc rgb '#ff0000', \
fName using (strcol(4) eq 'k'?$2:1/0) t "Data k" lc rgb '#000000', \
fName using (strcol(4) eq 'k'?$3:1/0) t "Parity k" lc rgb '#990000'
this would give (using the sample data in your question):

Gnuplot Function

How can I plot a function with x being a value from my datafile? Something like this:
set encoding utf8
set term postscript eps enhanced color font "Helvetica, 20"
set output 'kernel.eps'
# Mean & Standard Deviation
load "mean_sd.dat"
# Bandwidth
h = 1.6*sd*n**(-0.2)
# Kernel Function
K(x) = exp(-x*x/2.0)/(sqrt(2.0*pi))
# PLOT --> THIS DOES NOT WORK
# EACH VALUE IN $2 MUST BE USED FOR A SINGLE K(X)
plot for [i=1:n] 'probability.dat' using 0:(K((x - $2)/h))
My data file 'probability.dat':
366.000000 3.153012
366.000000 4.211409
366.000000 3.845248
366.000000 4.131654
366.000000 3.956508
Thank you in advance.
I am not sure that I understood your question correctly, but if you want to plot the kernel function for all values from the second column, then one could proceed for example as follows:
set encoding utf8
set term postscript eps enhanced color font "Helvetica, 20"
set output 'kernel.eps'
# Mean & Standard Deviation
sd=1
n=1
# Bandwidth
h = 1.6*sd*n**(-0.2)
# Kernel Function
K(x) = exp(-x*x/2.0)/(sqrt(2.0*pi))
# PLOT: one K(x) curve for each value in column 2 of the data file
fname = 'probability.dat'
N = system(sprintf("wc -l %s | gawk '{print $1}'", fname))
cmd(i) = system(sprintf("gawk 'NR==%d{print $2;exit}' %s", i, fname))
set key left top reverse
set xr [-10:10]
plot for [i=1:N] K((x - cmd(i))/h) title sprintf("%.3f", real(cmd(i))) lw 2
Here, the "strategy" is to:
find the total number of records in the input file (alternatively, one could use the stats command):
N = system(sprintf("wc -l %s | gawk '{print $1}'", fname))
define a function which extracts the ith value from the input file
cmd(i) = system(sprintf("gawk 'NR==%d{print $2;exit}' %s", i, fname))
The output is then a plot with one kernel curve per value in the second column.
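To check the numbers independently of gnuplot, the same kernel and bandwidth formulas can be evaluated directly. The following is a minimal sketch in Python that reuses the formulas from the script. The function names and the sample grid are assumptions, and sd=1, n=1 are the same placeholder values used in the script above.

```python
import math


def gaussian_kernel(u):
    """K(u) = exp(-u*u/2) / sqrt(2*pi), as defined in the gnuplot script."""
    return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)


def bandwidth(sd, n):
    """h = 1.6 * sd * n**(-0.2), as defined in the gnuplot script."""
    return 1.6 * sd * n ** (-0.2)


def kernel_curves(values, h, xs):
    """Return one list of K((x - v)/h) samples per data value v."""
    return [[gaussian_kernel((x - v) / h) for x in xs] for v in values]


if __name__ == "__main__":
    values = [3.153012, 4.211409, 3.845248, 4.131654, 3.956508]
    h = bandwidth(sd=1.0, n=1)            # placeholder sd and n
    xs = [i / 2.0 for i in range(-20, 21)]  # grid over [-10, 10]
    for v, curve in zip(values, kernel_curves(values, h, xs)):
        print(f"{v:.3f}: peak sample {max(curve):.4f}")
```

Each inner list corresponds to one curve of the gnuplot plot, centred on the data value it was built from.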