Use column from CSV as a category label for plotting column chart using gnuplot

I have a CSV file looking like:
frameNo dataSeg paritySeg frameType
0 17 3 k
1 2 1 d
2 3 1 d
3 3 1 d
4 3 1 d
5 2 1 d
6 3 1 d
7 3 1 d
8 4 1 d
I'm able to plot a stacked column diagram showing the number of data and parity segments per frame. It looks like this:
What I'd like to add, however, is a different color for those columns (both data and parity) that have the "k" marker in the last column. Basically, I want to distinguish between two categories, "d" and "k".
Is that possible using gnuplot?
Here's the script I'm using:
set style histogram rowstacked;
set style data histograms;
set style fill solid;
set datafile separator "\t";
set terminal png size 2500,1500 enhanced font ",30";
set title "";
set tics font ",25";
set xlabel "Frame #" font ",25";
set ylabel "# of segments" font ",25";
set key outside;
set xrange [0:];
plot "segments.csv" using 2 t "Data", "" using 3 t "Parity";

You could impose a custom condition on the columns being plotted and supply an invalid value (here 1/0, which gnuplot treats as undefined and therefore skips for that data point) if the condition is not met:
set terminal pngcairo size 1200,600 enhanced font ",30";
set output 'test.png'
set style histogram rowstacked;
set style data histograms;
set style fill solid;
#set datafile separator "\t";
set title "";
set tics font ",25";
set xlabel "Frame #" font ",25";
set ylabel "# of segments" font ",25";
set key outside;
set xrange [0:];
fName = 'segments.csv'
plot \
fName using (strcol(4) eq 'd'?$2:1/0) t "Data d" lc rgb '#666666', \
fName using (strcol(4) eq 'd'?$3:1/0) t "Parity d" lc rgb '#ff0000', \
fName using (strcol(4) eq 'k'?$2:1/0) t "Data k" lc rgb '#000000', \
fName using (strcol(4) eq 'k'?$3:1/0) t "Parity k" lc rgb '#990000'
This gives the following (using the sample data in your question): (image)
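For anyone who wants to try the idea without the original file, here is a minimal self-contained sketch. It assumes gnuplot 5 or later (for inline datablocks). The datablock name $Segments, the helper pick() and the output file name are made up for illustration, and the rows are just the first few from the question.

```gnuplot
# Sketch only: $Segments, pick() and the output name are illustrative,
# they are not part of the original answer.
$Segments << EOD
0 17 3 k
1 2 1 d
2 3 1 d
3 3 1 d
EOD

# pick(col, tag): value of column `col` when column 4 equals `tag`,
# otherwise 1/0 (undefined), so that row contributes nothing to this series.
pick(col, tag) = (strcol(4) eq tag) ? column(col) : 1/0

set terminal pngcairo size 800,400
set output 'segments_sketch.png'
set style data histograms
set style histogram rowstacked
set style fill solid
set key outside

plot $Segments using (pick(2, 'd')) title "Data d"   lc rgb '#666666', \
     $Segments using (pick(3, 'd')) title "Parity d" lc rgb '#ff0000', \
     $Segments using (pick(2, 'k')) title "Data k"   lc rgb '#000000', \
     $Segments using (pick(3, 'k')) title "Parity k" lc rgb '#990000'
set output
```

Wrapping the condition in a small function keeps the plot command readable if more categories than "d" and "k" show up later.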

Related

Gnuplot timefmt with different lengths

I am trying to plot data from two .csv files.
The first uses the time format %Y-%m-%d %H:%M:%S and the second uses %H:%M:%S.
I cannot get both graphs to show at once.
When I comment out the set timefmt / set xrange lines that use "%Y-%m-%d %H:%M:%S", only the other graph is shown, and vice versa.
Does anybody have an idea what I can do?
File1:
1;2022-11-24 17:21:34;0;+3.311;+0.004;+0.003;+0.001;+0.000;+0.000;+0.000;+0.000;+0.001;-0.001;+0.001;+0.000;-0.001;+0.000;-0.001;+0.000;+0.000;-0.001;+0.002;-0.001;LLLLLLLLLL;LLLLLLLLLL;LLLLLLLLL
2;2022-11-24 17:21:34;200;+3.311;+0.007;+0.002;+0.001;-0.001;+0.000;+0.000;-0.001;+0.001;-0.001;+0.001;+0.000;-0.002;+0.001;-0.001;+0.000;+0.001;-0.001;+0.001;-0.001;LLLLLLLLLL;LLLLLLLLLL;LLLLLLLLL
...
File2:
17:22:28;3.446;1.398;0.007;4.817508;0.025
17:22:29;3.447;1.398;0.008;4.818906;0.027
17:22:30;3.448;1.398;0.008;4.820303999999999;0.029
...
My code:
set grid
set datafile separator ";"
set title 'xxx'
set title font ",12"
set ylabel 'U/V' font ",12"
#set format x "%H:%M:%S"
set key box font ",12"
#myformat = "%Y-%m-%d %H:%M:%S"
#set key at strptime(myformat,"2022-11-24 18:02:55"), 3.005
set xtics time
set xlabel 'time' font ",12"
set yrange [3:3.7]
set ytics font ",10"
set y2tics font ",10"
set border 11
set border lw 2
set xtics font ",8"
set tics nomirror
set term wxt size 1200, 460
set xdata time
#set timefmt "%Y-%m-%d %H:%M:%S"
set timefmt "%H:%M:%S"
#set xrange ["2022-11-24 17:22:00":"2022-11-24 18:47:00"]
set xrange ["17:20:00":"18:47:00"]
plot 'xxx.CSV' using (timecolumn(2, "%Y-%m-%d %H:%M:%S")):4 every ::43::25741 title "aaa" lt 7 lc 7 with lines, \
'yyy.csv' using (timecolumn(1, "%H:%M:%S")):2 title "bbb" lt 3 lc 6 with lines
I think the problem is that only one of your data files gives a specific date.
If you read in time data using format "%H:%M:%S" (no year/date given) then the times are assumed to be relative to the epoch date 1-Jan-1970. So those data points come out 52 years off from the 2022 data points.
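The offset can be checked directly at the gnuplot prompt. The following is only a sketch, assuming gnuplot 5; the two timestamps are taken from the sample rows in the question, and the variable names t_short and t_full are made up for illustration.

```gnuplot
# Sketch only: t_short and t_full are illustrative names.
t_short = strptime("%H:%M:%S", "17:22:28")                       # no date in the string
t_full  = strptime("%Y-%m-%d %H:%M:%S", "2022-11-24 17:21:34")  # date in the string

# Show where each value lands on the calendar.
print strftime("%Y-%m-%d %H:%M:%S", t_short)
print strftime("%Y-%m-%d %H:%M:%S", t_full)

# Gap between the two values, expressed in years of 365.25 days.
print (t_full - t_short) / (365.25 * 24 * 3600)
```

If the explanation above holds, the first line printed should fall on the epoch date and the gap should come out at roughly 52 years, which is why the two curves cannot share one x range.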
Option 1:
If all the data points in both files are from the same day, then I suggest the easiest thing to do is skip the date information in the file where it is present. I.e.
set timefmt "%H:%M:%S"
set xrange ["17:20:00":"18:47:00"]
plot 'xxx.csv' using (timecolumn(2, "2022-11-24 %H:%M:%S")):4 title "aaa", \
'yyy.csv' using (timecolumn(1, "%H:%M:%S")):2 title "bbb"
The string "2022-11-24" must match on input for the first file but it doesn't actually contribute to the date calculation.
Option 2:
If you really do care about the date, or if the first file spans multiple dates so that a constant string cannot match, then you could instead add a date component to the time string in the second file by concatenating a string constant containing the date.
myfmt = "%Y-%m-%d %H:%M:%S"
set timefmt "%Y-%m-%d %H:%M:%S"
set xrange ["2022-11-24 17:20:00":"2022-11-24 18:47:00"]
plot 'xxx.csv' using (timecolumn(2, myfmt)):4 title "aaa" lt 7 lc 7 with lp, \
'yyy.csv' using (strptime(myfmt,"2022-11-24 ".strcol(1))):2 title "bbb" lt 3 lc 6 with lp

Grouped bar plot with multiple labels in x-axis

I am trying to replicate something close to the following graph in gnuplot, as I need to use it in a LaTeX paper. I have tried a lot, but I cannot produce the two-line labels at the bottom. Could you please guide me? Also, how is it possible to have the % character as part of a label on the x-axis? LaTeX complains about it.
The data are in the following format (example). Each different color corresponds to different method. Blue is method 1 (m1), orange is method 2 (m2), and brown is method 3 (m3)
#% system1-m1 system1-m2 system1-m3 system2-m1 ...
0.5% 16 8 15 6
1% 15 17 16 8
2% 12 10 20 15
Thanks
Edit
My code so far is as follows:
set rmargin 0
set key outside tmargin center top horizontal width 3
set border
set grid
set boxwidth 0.8
set style fill solid 1.00
set xtics nomirror rotate by 0
set format y '%1.f'
set yrange [0 to 22]
set ylabel 'Gain (\%)'
set ytics 0, 5
set style data histograms
set label 1 at -0.3, -4 '|---------System 1------------|'
set label 2 at 2.7, -4 '|---------System 2------------|'
plot "./data/metrics.dat" using 2:xtic(1) title 'Method 1' ,\
"" using 3 title 'Method 2', \
"" using 4 title 'Method 3',
And I have modified the .dat file as
0.5 16 8 15
1.0 15 17 16
2.0 12 10 20
0.5 13 6 4
1.0 11 13 13
2.0 14 12 14
because I cannot make it print the % character. The output graph is
As you can see it is not scalable. I have to put labels by hand (trial and error) and also the labels below the x-axis do not contain the % character.
You were close: set format x '%.1f\%%'. The following works for me with the cairolatex terminal (check help cairolatex).
Code:
### percent sign for tic label in TeX
reset session
set term cairolatex
set output 'SO70029830.tex'
set title 'Some \TeX\ or \LaTeX\ title: $a^2 + b^2 = c^2$'
set format x '%.1f\%%'
plot x
set output
### end of code
Result: (screenshot)
Addition:
Sorry, I forgot the second part of your question: the labels.
Furthermore, in your graph you are using xtic(1) as tic labels, i.e. text format, so the command set format x '%.1f\%%' from my answer above will not help here. One possible solution would be to create and use your special TeX label like this:
myTic(col) = sprintf('%.1f\%%',column(col))
plot $Data using 2:xtic(myTic(1))
For the labels, I would use arrows and labels. Each histogram is placed at integer numbers starting from 0. So, the arrows have to go from x-values -0.5 to 2.5 and from 2.5 to 5.5. The labels are placed at x-value 1 and 4. There is certainly room for improvements.
Code:
### tic labels with % for TeX and lines/labels
reset session
set term cairolatex
set output 'SO70029830.tex'
$Data <<EOD
0.5 16 8 15
1.0 15 17 16
2.0 12 10 20
0.5 13 6 4
1.0 11 13 13
2.0 14 12 14
EOD
set rmargin 0
set key outside center top horizontal width 3
set border
set grid
set boxwidth 0.8
set style fill solid 1.00
set xtics nomirror rotate by 0
set format y '%1.f'
set yrange [0 to 22]
set ylabel 'Gain (\%)'
set ytics 0, 5
set style data histograms
set bmargin 4
set arrow 1 from -0.5, screen 0.05 to 2.5, screen 0.05 heads size 0.05,90
set label 1 at 1, screen 0.05 'System 1' center offset 0,-0.7
set arrow 2 from 2.5, screen 0.05 to 5.5, screen 0.05 heads size 0.05,90
set label 2 at 4, screen 0.05 'System 2' center offset 0,-0.7
myTic(col) = sprintf('%.1f\%%',column(col))
plot $Data using 2:xtic(myTic(1)) title 'Method 1' ,\
"" using 3 title 'Method 2', \
"" using 4 title 'Method 3',
set output
### end of code
Result: (screenshot from LaTeX document)
As an alternative to the answer of @theozh, there is already a built-in plot option called newhistogram that directly allows placing labels below the x-axis.
While working on an answer that involves newhistogram I discovered a bug with horizontal key layout, which is now fixed thanks to Ethan. So, with the newest development version of gnuplot at hand, I am able to offer a solution that allows for more fine-tuning, such as changing the inter-group spacing.
set terminal cairolatex standalone colour header '\usepackage{siunitx}' size 25cm, 7cm
# generate some random data in your format
N = 7
set print $MYDATA
do for [i=1:N] {
print sprintf('0.5 %f %f %f', rand(0)*20, rand(0)*20, rand(0)*20)
print sprintf('1.0 %f %f %f', rand(0)*20, rand(0)*20, rand(0)*20)
print sprintf("2.0 %f %f %f", rand(0)*20, rand(0)*20, rand(0)*20)
}
unset print
# define the look
set style data histograms
set style fill solid 1.00
set boxwidth 0.8
set key horizontal outside t c width 1
set xr [-1:27]
set xtics nomirror
set ytics out 5 nomirror
set grid y # I don't think vertical grid lines are needed here
set ylabel 'Gain/\%'
set rmargin 0.01
set bmargin 3
As for the tic marks, I adapted @theozh's answer a bit: since you are using LaTeX already, you might as well pass the numbers through siunitx, which will ensure correct spacing between numbers and the unit:
myTic(col) = sprintf('\SI{%.1f}{\%%}',column(col))
The vertical separation marks like in the screenshot you provided can be created iteratively:
do for [i=1:N+1] {set arrow i from first -1+(i-1)*4, graph 0 to first -1+(i-1)*4, screen 0 lw 2 nohead}
Now for the actual plot command:
plot newhistogram "System 1" offset 0,-0.5 lt 1, for [i=1:3] $MYDATA using (column(i+1)):xtic(myTic(1)) every ::0::2 title sprintf('Method %.0f',i), \
newhistogram "System 2" offset 0,-0.5 lt 1 at 4, for [i=1:3] $MYDATA using (column(i+1)):xtic(myTic(1)) every ::3::5 not, \
newhistogram "System 3" offset 0,-0.5 lt 1 at 8, for [i=1:3] $MYDATA using (column(i+1)):xtic(myTic(1)) every ::6::8 not, \
newhistogram "System 4" offset 0,-0.5 lt 1 at 12, for [i=1:3] $MYDATA using (column(i+1)):xtic(myTic(1)) every ::9::11 not, \
newhistogram "System 5" offset 0,-0.5 lt 1 at 16, for [i=1:3] $MYDATA using (column(i+1)):xtic(myTic(1)) every ::12::14 not, \
newhistogram "System 6" offset 0,-0.5 lt 1 at 20, for [i=1:3] $MYDATA using (column(i+1)):xtic(myTic(1)) every ::15::17 not, \
newhistogram "System 7" offset 0,-0.5 lt 1 at 24, for [i=1:3] $MYDATA using (column(i+1)):xtic(myTic(1)) every ::18::20 not
That looks very nasty, what's going on here?
newhistogram creates a new group of histogram boxes, its first argument is a string that is put below the x axis. It is also told to reset the linetype counter to 1.
Then the three data columns are plotted iteratively. Not all lines are plotted at once, only the first three, and these carry the key entries.
Then another newhistogram is created and it is told to start at the x value 4 (which would be the default anyway). Now the next three lines are plotted, and so on.
Now, every time newhistogram is called an empty line is added to the key, which causes trouble with the key placement. Therefore the new keyword introduced by Ethan is
set style histogram nokeyseparators
which will disable this behaviour.
As you see, the spaces between the groups are larger than inside. You might want to change the numbers in newhistogram at ... and adjust the calculation of vertical line positions accordingly.
The plot command is of course highly repetitive, and it would be nice to make it an iterative call. Unfortunately, iterations that span multiple objects are not possible within a plot call. However, it is possible to iteratively put the plot command string together (excessively using string concatenation .) and then plot it.
A = 'newhistogram "System '
B = '" offset 0,-0.5 lt 1'
C = 'for [i=1:3] $MYDATA using (column(i+1)):xtic(myTic(1)) every ::'
myplotstring = A.'1'.B.', '.C."0::2 title sprintf('Method %.0f',i),"
do for [i=2:N] {myplotstring = myplotstring.A.i.B.' at '.(4*(i-1)).', '.C.(3*i-3).'::'.(3*i-1).' not, '}
plot @myplotstring
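The string-building technique is easier to see on a toy case. This is only a sketch with made-up names (cmd, and sin curves standing in for the histogram groups); it assumes gnuplot 5, where macro expansion with @ is enabled by default (older versions need set macros first).

```gnuplot
# Sketch only: `cmd` and the sin(n*x) curves are illustrative stand-ins.
cmd = ''
do for [i=1:3] {
    # append one plot element per iteration, comma-separated
    cmd = cmd . sprintf("sin(%d*x) title 'n=%d'", i, i) . (i < 3 ? ', ' : '')
}

# Inspect the assembled command text before handing it to plot.
print cmd

# @cmd is replaced by the contents of the string when the line is parsed.
plot @cmd
```

Printing the string first is a cheap way to catch a missing space or comma in the concatenation before gnuplot reports a syntax error on the expanded plot line.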

GNUPlot if statement on plot

I've a csv data file like this:
Sensor1;value;iteration
Sensor2;value;iteration
Sensor2;value;iteration
Sensor1;value;iteration
Sensor2;value;iteration
Can I plot two different lines based on the value in the first column, one for Sensor1 and another for Sensor2, in the same plot?
Right now I plot all data as follows:
set terminal jpeg
set output 'testimage.jpeg'
set autoscale # scale axes automatically
unset log # remove any log-scaling
unset label # remove any previous labels
set xtic auto # set xtics automatically
set ytic auto # set ytics automatically
set datafile separator ";"
set xrange [1:10000]
set yrange [3000:5000]
plot "result_test_day_1.csv" using 5:3:(stringcolumn(1) eq "Sensor1"? $2:1/0) title "a" lc rgb "blue" with lines
plot "result_test_day_1.csv" using 5:3:(stringcolumn(1) eq "Sensor2"? $2:1/0) title "b" lc rgb "red" with lines

Using different colors in Gnuplot based on a CSV file column value

I have a CSV file with the following structure:
X,Y,Z
where X and Y are coordinates on a square plot and Z can be 0/1. I want to plot points with different color, depending on the value in the Z column.
Is that possible?
So far I have a file which just displays all the data on the square chart and colors them with only 1 color:
filename='test.csv'
set datafile separator ","
set title filename
set size square
plot filename using 0:1 linecolor rgb "yellow"
It's all in the documentation, check help rgbcolor variable :
rgb(r,g,b) = 65536 * int(r) + 256 * int(g) + int(b)
color1=rgb(255,0,0); color2=rgb(0,255,0)
plot filename using 1:2:($3==0?color1:color2) w p lc rgb variable
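A self-contained variant may help when experimenting. This is a sketch, assuming gnuplot 5 for the inline datablock. The four sample points and the names $Points, color0 and color1 are invented for illustration and do not come from the question's file.

```gnuplot
# Sketch only: $Points and its values are made up, they are not real data.
$Points << EOD
0.1,0.2,0
0.4,0.8,1
0.7,0.3,0
0.9,0.9,1
EOD
set datafile separator ","

# Pack three 0..255 components into one 24-bit integer (0xRRGGBB).
rgb(r,g,b) = 65536 * int(r) + 256 * int(g) + int(b)
color0 = rgb(255,0,0)   # used where Z == 0
color1 = rgb(0,255,0)   # used where Z == 1

# Show the packed values in hexadecimal as a sanity check.
print sprintf("color0 = 0x%06X, color1 = 0x%06X", color0, color1)

set size square
# The third using entry supplies the per-point color for `lc rgb variable`.
plot $Points using 1:2:($3 == 0 ? color0 : color1) with points pt 7 lc rgb variable notitle
```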

Gnuplot stats does not work as expected: max value not right

Assume the following four datasets:
a.csv
1,1
2,3
3,5
5,6
6,9
7,9
8,10
9,12
10,13
b.csv
1,1
2,5
3,10
5,15
6,20
7,25
8,30
9,35
10,40
c.csv
1,1
2,10
3,100
5,1000
6,2000
7,5000
8,10000
9,20000
10,50000
d.csv
1,1
2,20
3,300
5,5000
6,9000
7,10000
8,15000
9,30000
10,100000
In Gnuplot I've tried to run the command stats on each of them to get the maximum value for x and y (i.e., columns 1 and 2) and to set the corresponding xrange & yrange. Unfortunately, the result is not the one I've expected.
Here is the full script:
#!/usr/bin/env gnuplot
set terminal latex
set term pngcairo enhanced size 1500,800
set output 'plot.png'
set multiplot layout 2,2
set xlabel 't' font ',16'
set ylabel '#pkt' font ',16'
set grid xtics lt 0 lw 1 lc rgb "#333333"
set grid ytics lt 0 lw 1 lc rgb "#333333"
set xtics font ',14'
set ytics font ',14'
set key font ',12'
set title font ',20'
set datafile separator ','
###
set title '(a)'
stats "a.csv" using 1:2 name "a"
set xrange [0:a_max_x]
set yrange [0:a_max_y+a_max_y*0.5]
plot "a.csv" using 1:2 title 'v1' with lines linewidth 3 linecolor rgb 'blue'
###
set title '(b)'
stats "b.csv" using 1:2 name "b"
set xrange [0:b_max_x]
set yrange [0:b_max_y+b_max_y*0.5]
plot "b.csv" using 1:2 title 'v1' with lines linewidth 3 linecolor rgb 'blue'
###
set title '(c)'
stats "c.csv" using 1:2 name "c"
set xrange [0:c_max_x]
set yrange [0:c_max_y+c_max_y*0.5]
plot "c.csv" using 1:2 title 'v1' with lines linewidth 3 linecolor rgb 'blue'
###
set title '(d)'
stats "d.csv" using 1:2 name "d"
set xrange [0:d_max_x]
set yrange [0:d_max_y+d_max_y*0.5]
plot "d.csv" using 1:2 title 'v1' with lines linewidth 3 linecolor rgb 'blue'
###
unset multiplot
and the result:
As you can see, maximum values in the plots b, c and d are not correct. Indeed, the verbose output of stats returns:
[...]
Maximum: 10.0000 [8] 13.0000 [8]
[...]
Maximum: 5.0000 [3] 15.0000 [3]
[...]
Maximum: 2.0000 [1] 10.0000 [1]
[...]
Maximum: 1.0000 [0] 1.0000 [0]
[...]
Apparently, only stats for the plot a is right. Is there anything wrong in my script?
You need to reinitialize xrange and yrange after each plot, because otherwise stats finds some of your points outside the range you set previously and does not take them into account. It's the last line below:
set title '(a)'
stats "a.csv" using 1:2 name "a"
set xrange [0:a_max_x]
set yrange [0:a_max_y+a_max_y*0.5]
plot "a.csv" using 1:2 title 'v1' with lines linewidth 3 linecolor rgb 'blue'
set xrange [*:*] ; set yrange [*:*] # <--- This line after each plot will fix your issue
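The effect can be reproduced in isolation. The following is a sketch, assuming gnuplot 5; the datablock $B holds the first rows of b.csv from the question, and the prefixes "clipped" and "full" as well as the range [0:3] are made up for illustration.

```gnuplot
# Sketch only: $B is a shortened copy of b.csv; the prefixes are illustrative.
$B << EOD
1,1
2,5
3,10
5,15
6,20
EOD
set datafile separator ','

# Simulate a range left over from an earlier plot.
set xrange [0:3]
stats $B using 1:2 name "clipped" nooutput
print "with stale xrange: max_x = ", clipped_max_x, "  records = ", clipped_records

# Return both axes to autoscaling, then analyze the same data again.
set xrange [*:*]; set yrange [*:*]
stats $B using 1:2 name "full" nooutput
print "after reset:       max_x = ", full_max_x, "  records = ", full_records
```

If stats drops out-of-range points as described, the first line should report fewer records and a smaller maximum than the second.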
In your case there is no need to use stats in order to set the ranges.
Your requirements are:
Use tight limits for the xrange and the yrange. You get this with set autoscale fix.
Extend the maximum of the yrange by 50%. That is achieved with set offsets 0,0,graph 0.5,0:
#!/usr/bin/env gnuplot
set term pngcairo enhanced size 1500,800
set output 'plot.png'
set multiplot layout 2,2
set xlabel 't' font ',16'
set ylabel '#pkt' font ',16'
set grid xtics ytics lt 0 lw 1 lc rgb "#333333"
set tics font ',14'
set key font ',12'
set title font ',20'
set datafile separator ','
set style data lines
set style line 1 linewidth 3 linecolor rgb 'blue'
###
set title '(a)'
set autoscale fix
set offset 0,0,graph 0.5,0
plot "a.csv" using 1:2 title 'v1' linestyle 1
###
set title '(b)'
plot "b.csv" using 1:2 title 'v1' linestyle 1
###
set title '(c)'
plot "c.csv" using 1:2 title 'v1' linestyle 1
###
set title '(d)'
plot "d.csv" using 1:2 title 'v1' linestyle 1
###
unset multiplot
One further comment: If you're going to use a LaTeX-based terminal for your actual image, don't use latex, but rather epslatex, cairolatex, context or lua tikz, which are all much better regarding the supported features and quality.