How to handle outliers in this octave boxplot for improved readability - octave

Outliers between 1,5 - 3 times the interquantile range is marked with an "+" and above 3 times the IQR with an "o". But due to this data set with multiple outliers the below boxplot is very hard to read since the "+" and "o" symbols are plotted on top of each other creating what appears to be a thick red line.
I need to plot all data so removing them is not an option but I would be fine to display "longer" boxes, i.e. stretch the q1 and q4 to reach the true min/max values and skip the "+" and "o" outlier symbols. I would also be fine if just the min and max outliers was displayed.
I'm totally in the dark here and the octave boxplot documentation found here did not include any helpful examples on how to handle outliers. A search here at stackoverflow didn't get me closer to a solution either. So any help or directions is very appreciated!
How can I modify the below code to create a boxplot based on the same data set that is readable (i.e. doesn't plot outliers on top of each other creating a thick red line)?
I'm using Octave 4.2.1 64-bits on a Windows 10 machine with qt as the graphics_toolkit and with GDAL_TRANSLATE called from within Octave to handle the tif-files.
It's not an option to switch graphics_toolkit to gnuplot etc. since I haven't been able to "rotate" the plot (horizontal boxes instead of vertical). And it's in the .pdf file the results must have an effect, not only in octaves viewer.
Please forgive my totally "newbie-style" coding-work-around to get a proper high resolution pdf-exported:
pkg load statistics
clear all;
fns = glob ("*.tif");
for k=1:numel (fns)
ofn = tmpnam;
cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
[s, out] = system (cmd);
if (s != 0)
error ('calling gdal_translate failed with "%s"', out);
endif
fid = fopen (ofn, "r");
# read 6 headerlines
hdr = [];
for i=1:6
s = strsplit (fgetl (fid), " ");
hdr.(s{1}) = str2double (s{2});
endfor
d = dlmread (fid);
# check size against header
assert (size (d), [hdr.nrows hdr.ncols])
# set nodata to NA
d (d == hdr.NODATA_value) = NA;
raw{k} = d;
# create copy with existing values
raw_v{k} = d(! isna (d));
fclose (fid);
endfor
## generate plot
boxplot (raw_v)
set (gca, "xtick", 1:numel(fns),
"xticklabel", strrep (fns, ".tif", ""));
ylabel ("Plats kvar (meter)");
set (gca, "ytick", 0:50:600);
set (gca, "ygrid", "on");
set (gca, "gridlinestyle", "--");
set (gcf, "paperunit", "centimeters", "papersize", [35, 60], "paperposition", [0 0 60 30], "paperorientation", "landscape")
zoom (0.95)
view ([90 90])
print ("loudden_box_dotted.pdf", "-F:14")

I would just delete the outliers. This is easy because the handles are returned. I've also included some caching algorithm so you don't have to reload all tifs if you are playing with plots. Splitting the conversion, processing and plotting in different scripts is always a good idea (but not for stackoverflow where minimalistic examples are prefered). Here we go:
pkg load statistics
cache_fn = "input.raw";
# only process tif if not already done
if (! exist (cache_fn, "file"))
fns = glob ("*.tif");
for k=1:numel (fns)
ofn = tmpnam;
cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
printf ("calling '%s'...\n", cmd);
fflush (stdout);
[s, out] = system (cmd);
if (s != 0)
error ('calling gdal_translate failed with "%s"', out);
endif
fid = fopen (ofn, "r");
# read 6 headerlines
hdr = [];
for i=1:6
s = strsplit (fgetl (fid), " ");
hdr.(s{1}) = str2double (s{2});
endfor
d = dlmread (fid);
# check size against header
assert (size (d), [hdr.nrows hdr.ncols])
# set nodata to NA
d (d == hdr.NODATA_value) = NA;
raw{k} = d;
# create copy with existing values
raw_v{k} = d(! isna (d));
fclose (fid);
endfor
# save result
save (cache_fn, "raw_v", "fns");
else
load (cache_fn)
endif
## generate plot
[s, h] = boxplot (raw_v);
## in h you'll find now box, whisker, median, outliers and outliers2
## delete them
delete (h.outliers)
delete (h.outliers2)
set (gca, "xtick", 1:numel(fns),
"xticklabel", strrep (fns, ".tif", ""));
ylabel ("Plats kvar (meter)");
set (gca, "ytick", 0:50:600);
set (gca, "ygrid", "on");
set (gca, "gridlinestyle", "--");
set (gcf, "paperunit", "centimeters", "papersize", [35, 60], "paperposition", [0 0 60 30], "paperorientation", "landscape")
zoom (0.95)
view ([90 90])
print ("loudden_box_dotted.pdf", "-F:14")
gives

Related

Plotly Express: Prevent bars from stacking when Y-axis catgories have the same name

I'm new to plotly.
Working with:
Ubuntu 20.04
Python 3.8.10
plotly==5.10.0
I'm doing a comparative graph using a horizontal bar chart. Different instruments measuring the same chemical compounds. I want to be able to do an at-a-glance, head-to-head comparison if the measured value amongst all machines.
The problem is; if the compound has the same name amongst the different instruments - Plotly stacks the data bars into a single bar with segment markers. I very much want each bar to appear individually. Is there a way to prevent Plotly Express from automatically stacking the common bars??
Examples:
CODE
gobardata = []
for blended_name in _df[:20].blended_name: # should always be unique
##################################
# Unaltered compound names
compound_names = [str(c) for c in _df[_df.blended_name == blended_name]["injcompound_name"].tolist()]
# Random number added to end of compound_names to make every string unique
# compound_names = ["{} ({})".format(str(c),random.randint(0, 1000)) for c in _df[_df.blended_name == blended_name]["injcompound_name"].tolist()]
##################################
deltas = _df[_df.blended_name == blended_name]["delta_rettime"].to_list()
gobardata.append(
go.Bar(
name = blended_name,
x = deltas,
y = compound_names,
orientation='h',
))
fig = go.Figure(data = gobardata)
fig.update_traces(width=1)
fig.update_layout(
bargap=1,
bargroupgap=.1,
xaxis_title="Delta Retention Time (Expected - actual)",
yaxis_title="Instrument name(Injection ID)"
)
fig.show()
What I'm getting (Using actual, but repeated, compound names)
What I want (Adding random text to each compound name to make it unique)
OK. Figured it out. This is probably pretty klugy, but it consistently works.
Basically...
Use go.FigureWidget...
...with make_subplots having a common x-axis...
...controlling the height of each subplot based on number of bars.
Every bar in each subplot is added as an individual trace...
...using a dictionary matching bar name to a common color.
The y-axis labels for each subplot is a list containing the machine name as [0], and then blank placeholders ('') so the length of the y-axis list matches the number of bars.
And manually manipulating the legend so each bar name appears only once.
# Get lists of total data
all_compounds = list(_df.injcompound_name.unique())
blended_names = list(_df.blended_name.unique())
#################################################################
# The heights of each subplot have to be set when fig is created.
# fig has to be created before adding traces.
# So, create a list of dfs, and use these to calculate the subplot heights
dfs = []
subplot_height_multiplier = 20
subplot_heights = []
for blended_name in blended_names:
df = _df[(_df.blended_name == blended_name)]#[["delta_rettime", "injcompound_name"]]
dfs.append(df)
subplot_heights.append(df.shape[0] * subplot_height_multiplier)
chart_height = sum(subplot_heights) # Prep for the height of the overall chart.
chart_width = 1000
# Make the figure
fig = make_subplots(
rows=len(blended_names),
cols=1,
row_heights = subplot_heights,
shared_xaxes=True,
)
# Create the color dictionary to match a color to each compound
_CSS_color = CSS_chart_color_list()
colors = {}
for compound in all_compounds:
try: colors[compound] = _CSS_color.pop()
except IndexError:
# Probably ran out of colors, so just reuse
_CSS_color = CSS_color.copy()
colors[compound] = _CSS_color.pop()
rowcount = 1
for df in dfs:
# Add bars individually to each subplot
bars = []
for label, labeldf in df.groupby('injcompound_name'):
fig.add_trace(
go.Bar(x = labeldf.delta_rettime,
y = [labeldf.blended_name.iloc[0]]+[""]*(len(labeldf.delta_rettime)-1),
name = label,
marker = {'color': colors[label]},
orientation = 'h',
),
row=rowcount,
col=1,
)
rowcount += 1
# Set figure to FigureWidget
fig = go.FigureWidget(fig)
# Adding individual traces creates redundancies in the legend.
# This removes redundancies from the legend
names = set()
fig.for_each_trace(
lambda trace:
trace.update(showlegend=False)
if (trace.name in names) else names.add(trace.name))
fig.update_layout(
height=chart_height,
width=chart_width,
title_text="∆ of observed RT to expected RT",
showlegend = True,
)
fig.show()

Running timeseries graphing function in Rmd producing cluttered x-axis labels (not present in test code)

I have a folder of xx .csv timeseries that I want to graph and knit into a clean HTML document. I have a ggplot code that produces the plot that I want using a single timeseries.csv. However, when I try to put the bones of that ggplot code in a function inside of a for loop to run each of the timeseries.csv files through the function I get a some plots with pretty different formatting.
Plot generated with my test ggplot code:
Plot generated with function and for loop:
Changes I'm trying to make to the ugly Rmd plot:
Nicely space the x-axis tick marks to whole mins (i.e. "11:14:00", "11:15:00")
Connect the data points (solved with subbing geom_line() with geom_path())
Example Rmd Code Below. Please Note that the graphs produced still have nice formatting, I'm not sure how to reproduce this problem sort of posting a 500 row dataframe. I also don't know how to post my rmd code without SO using the formatting commands in this post, so I threw in at 3 of " around my header formatting and at the end of the code to disable it.
Edits and Updates
I am getting a persistent error geom_path: Each group consists of only one observation. Do you need to adjust the group
aesthetic?.
As suggested by the commenters I tried removing plot() and using the the createChlDiffPlot() directly and replacing plot() with print(). Both produce the same ugly plots as before.
Replaced geom_line() with geom_path(). The points are now connected! x-axis cluttering is still there.
Time variable is reading as hms num
Many thanks for any help on this!
```
---
title: "Chl Filtration"
output:
flexdashboard::flex_dashboard:
theme: yeti
orientation: rows
editor_options:
chunk_output_type: console
---
```{r setup}
library(flexdashboard)
library(dplyr)
library(ggplot2)
library(hms)
library(ggthemes)
library(readr)
library(data.table)
#### Example Data
df1 <- data.frame(Time = as_hms(c("11:22:33","11:22:34","11:22:35","11:22:38","11:23:00","11:23:01","11:23:02")),
Chl_ug_L_Up = c(0.2,0.1,0.25,-0.2,-0.3,-0.15,0.1),
Chl_ug_L_Down = c(0.5,0.4,0.3,0.2,0.1,0,-0.1))
df2 <- data.frame(Time = as_hms(c("08:02:33","08:02:34","08:02:35","08:02:40","08:02:42","08:02:43","08:02:49")),
Chl_ug_L_Up = c(-0.2,-0.1,-0.25,0.2,0.3,0.15,-0.1),
Chl_ug_L_Down = c(-0.1,0,0.1,0.2,0.3,0.4,0.1))
data_directory = "./" # data folder in R project folder in the real deal
output_directory = "./" # output graph directory in R project folder
write_csv(df1, file.path(data_directory, "SO_example_df1.csv"))
write_csv(df2, file.path(data_directory, "SO_example_df2.csv"))
#### Function to create graphs
createChlDiffPlot = function(aTimeSeriesFile, aFileName, aGraphOutputDirectory, aType)
{
aFile_Mod = aTimeSeriesFile %<>%
select(Time, Chl_ug_L_Up, Chl_ug_L_Down) %>%
mutate(Chl_diff = Chl_ug_L_Up - Chl_ug_L_Down)
one_plot = ggplot(data = aFile_Mod, aes(x = Time, y = Chl_diff)) + # tried adding 'group = 1' in aes to connect points
geom_path(size = 1, color = "green") +
geom_point(color = "green") +
theme_gdocs() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.title = element_blank()) +
labs(x = "", y = "Chl Difference", title = paste0(aFileName, " - ", "Filtration"))
one_graph_name = paste0(gsub(".csv", "", aFileName), "_", aType, ".pdf")
ggsave(one_graph_name, one_plot, dpi = 600, width = 7, height = 5, units = "in", device = "pdf", aGraphOutputDirectory)
return(one_plot)
}
"``` ### remove the quotes when running example
Plots - After Velocity Adjustment
=====================================" ### remove quotes when running example
```{r, fig.width=13.5, fig.height=5}
all_files_Filtration = list.files(data_directory, pattern = ".csv")
# Loop to plot function
for(file in 1 : length(all_files_Filtration))
{
file_name = all_files_Filtration[file]
one_file = fread(file.path(data_directory, file_name))
# plot the time series agains
plot(createChlDiffPlot(one_file, file_name, output_directory, "Velocity_Paired"))
}
"``` #remove quotes when running example
```
I finally figured it out.
1) Replacing geom_line() with geom_path() connected the data points when rendered in Rmd.
2) df1$Time was formatted as a difftime object. When I looked at the dataframe in the global environment, Time :hmsnum 11:11:09 .... This made me think my format was ok, but when I ran class(df1$Time) I got [1] "hms" "difftime". With a quick google I found out difftime objects are not quite the same as hms, and my original time was generated by subtracting times. I added a conversion into my mutate function:
select(Time, Chl_ug_L_Up, Chl_ug_L_Down) %>%
mutate(Chl_diff = Chl_ug_L_Up - Chl_ug_L_Down,
Time = as_hms(Time)) # convert difftime objecct to hms
ggplot I think has some auto-formatting for hms variables, which is why difftime variable was producing ugly crowded x- axes.

How to automatically crop an .OBJ 3D model to a bounding box?

In the now obsoleted Autodesk ReCap API it was possible to specify a "bounding box" around the scene to be generated from images.
In the resulting models, any vertices outside the bounding box were discarded, and any volumes that extended beyond the bounding box were truncated to have faces at the box boundaries.
I am now using Autodesk's Forge Reality Capture API which replaced ReCap. Apparently, This new API does not allow the user to specify a bounding box.
So I am now searching for a program that takes an .OBJ file and a specified bounding box as input, and outputs a file of just the vertices and faces within this bounding box.
Given that there is no way to specify the bounding box in Reality Capture API, I created this python program. It is crude, in that it only discards faces that have vertices that are outside the bounding box. And it actually does discards nondestructively, only by commenting them out in the output OBJ file. This allows you to uncomment them and then use a different bounding box.
This may not be what you need if you truly want to remove all relevant v, vn, vt, vp and f lines that are outside the bounding box, because the OBJ file size remains mostly unchanged. But for my particular needs, keeping all the records and just using comments was preferable.
# obj3Dcrop.py
# (c) Scott L. McGregor, Dec 2019
# License: free for all non commercial uses. Contact author for any other uses.
# Changes and Enhancements must be shared with author, and be subject to same use terms
# TL;DR: This program uses a bounding box, and "crops" faces and vertices from a
# Wavefront .OBJ format file, created by Autodesk Forge Reality Capture API
# if one of the vertices in a face is not within the bounds of the box.
#
# METHOD
# 1) All lines other than "v" vertex definitions and "f" faces definitions
# are copied UNCHANGED from the input .OBJ file to an output .OBJ file.
# 2) All "v" vertex definition lines have their (x, y, z) positions tested to see if:
# minX < x < maxX and minY < y < maxY and minZ < z < maxZ ?
# If TRUE, we want to keep this vertex in the new OBJ, so we
# store its IMPLICIT ORDINAL position in the file in a dictionary called v_keepers.
# If FALSE, we will use its absence from the v_keepers file as a way to identify
# faces that contain it and drop them. All "v" lines are also copied unchanged to the
# output file.
# 3) All "f" lines (face definitions) are inspected to verify that all 3 vertices in the face
# are in the v_keepers list. If they are, the f line is output unchanged.
# 4) Any "f" line that refers to a vertex that was cropped, is prefixed by "# CROPPED: "
# in the output file. Lines beginning # are treated as comments, and ignored in future
# processing.
# KNOWN LIMITATIONS: This program generates models in which the outside of bound faces
# have been removed. The vertices that were found outside the bounding box, are still in the
# OBJ file, but they are now disconnected and therefore ignored in later processing.
# The "f" lines for faces with vertices outside the bounding box are also still in the
# output file, but now commented out, so they don't process. Because this is non-destructive.
# we can easily change our bounding box later, uncomment cropped lines and reprocess.
#
# This might be an incomplete solution for some potential users. For such users
# a more complete program would delete unneeded v, vn, vt and vp lines when the v vertex
# that they refer to is dropped. But note that this requires renumbering all references to these
# vertice definitions in the "f" face definition lines. Such a more complete solution would also
# DISCARD all 'f' lines with any vertices that are out of bounds, instead of making them copies.
# Such a rewritten .OBJ file would be var more compact, but changing the bounding box would require
# saving the pre-cropped original.
# QUIRK: The OBJ file format defines v, vn, vt, vp and f elements by their
# IMPLICIT ordinal occurrence in the file, with each element type maintaining
# its OWN separate sequence. It then references those definitions EXPLICITLY in
# f face definitions. So deleting (or commenting out) element references requires
# appropriate rewriting of all the"f"" lines tracking all the new implicit positions.
# Such rewriting is not particularly hard to do, but it is one more place to make
# a mistake, and could make the algorithm more complicated to understand.
# This program doesn't bother, because all further processing of the output
# OBJ file ignores unreferenced v, vn, vt and vp elements.
#
# Saving all lines rather than deleting them to save space is a tradeoff involving considerations of
# Undo capability, compute cycles, compute space (unreferenced lines) and maintenance complexity choice.
# It is left to the motivated programmer to add this complexity if needed.
import sys
#bounding_box = sys.argv[1] # should be in the only string passsed (maxX, maxY, maxZ, minX, minY, minZ)
bounding_box = [10, 10, 10, -10, -10, 1]
maxX = bounding_box[0]
maxY = bounding_box[1]
maxZ = bounding_box[2]
minX = bounding_box[3]
minY = bounding_box[4]
minZ = bounding_box[5]
v_keepers = dict() # keeps track of which vertices are within the bounding box
kept_vertices = 0
discarded_vertices = 0
kept_faces = 0
discarded_faces = 0
discarded_lines = 0
kept_lines = 0
obj_file = open('sample.obj','r')
new_obj_file = open('cropped.obj','w')
# the number of the next "v" vertex lines to process.
original_v_number = 1 # the number of the next "v" vertex lines to process.
new_v_number = 1 # the new ordinal position of this vertex if out of bounds vertices were discarded.
for line in obj_file:
line_elements = line.split()
# Python doesn't have a SWITCH statement, but we only have three cases, so we'll just use cascading if stmts
if line_elements[0] != "f": # if it isn't an "f" type line (face definition)
if line_elements[0] != "v": # and it isn't an "v" type line either (vertex definition)
# ************************ PROCESS ALL NON V AND NON F LINE TYPES ******************
# then we just copy it unchanged from the input OBJ to the output OBJ
new_obj_file.write(line)
kept_lines = kept_lines + 1
else: # then line_elements[0] == "v":
# ************************ PROCESS VERTICES ****************************************
# a "v" line looks like this:
# f x y z ...
x = float(line_elements[1])
y = float(line_elements[2])
z = float(line_elements[3])
if minX < x < maxX and minY < y < maxY and minZ < z < maxZ:
# if vertex is within the bounding box, we include it in the new OBJ file
new_obj_file.write(line)
v_keepers[str(original_v_number)] = str(new_v_number)
new_v_number = new_v_number + 1
kept_vertices = kept_vertices +1
kept_lines = kept_lines + 1
else: # if vertex is NOT in the bounding box
new_obj_file.write(line)
discarded_vertices = discarded_vertices +1
discarded_lines = discarded_lines + 1
original_v_number = original_v_number + 1
else: # line_elements[0] == "f":
# ************************ PROCESS FACES ****************************************
# a "f" line looks like this:
# f v1/vt1/vn1 v2/vt2/vn2 v3/vt3/vn3 ...
# We need to delete any face lines where ANY of the 3 vertices v1, v2 or v3 are NOT in v_keepers.
v = ["", "", ""]
# Note that v1, v2 and v3 are the first "/" separated elements within each line element.
for i in range(0,3):
v[i] = line_elements[i+1].split('/')[0]
# now we can check if EACH of these 3 vertices are in v_keepers.
# for each f line, we need to determine if all 3 vertices are in the v_keepers list
if v[0] in v_keepers and v[1] in v_keepers and v[2] in v_keepers:
new_obj_file.write(line)
kept_lines = kept_lines + 1
kept_faces = kept_faces +1
else: # at least one of the vertices in this face has been deleted, so we need to delete the face too.
discarded_lines = discarded_lines + 1
discarded_faces = discarded_faces +1
new_obj_file.write("# CROPPED "+line)
# end of line processing loop
obj_file.close()
new_obj_file.close()
print ("kept vertices: ", kept_vertices ,"discarded vertices: ", discarded_vertices)
print ("kept faces: ", kept_faces, "discarded faces: ", discarded_faces)
print ("kept lines: ", kept_lines, "discarded lines: ", discarded_lines)
Unfortunately, (at least for now) there is no way to specify the bounding box in Reality Capture API.

Octave: Plotyy log files from geothermal heat pump (import/plot datetime)

I'm trying to plot values from my geothermal heat pump log files to analyse it's performance. I tried with excel but it was to slow and not possible to get the plot type I wanted so I'm trying Octave instead. I have absolutely no experience with octave so please forgive my incompetence!
I've processed the .log files with open office calc to get into a decent delimited format. The first column is datetime with the format MM/DD/YY HH:MM:SS, in total there is 21 columns (but I only need 5) and one header line with a label, coma delimiter is '.' and delimiter is ','. The file can be downloaded here and the first 7 columns look like this:
02/19/2018 23:07:00,-0.7,47.5,42,47.3,52.1,1.5
I'm currently trying to plot this with demonstration 3 plotyy from here. Column 2, 3, 5 and 8 imports correctly so I'm figuring it's a problem with the datetime column 1. How can I get Octave to import column 1 correctly and use it as x axis in this plot?:
data=csvread('heatpump.csv');
clf;
hold on
t=data(:,1);
x=data(:,3);
y=data(:,5);
z=data(:,2);
o=data(:,8);
[hax, h1, h2] = plotyy (t, x, t, y);
[~, h3, h4] = plotyy (t, z, t, o);
set ([h3, h4], "linestyle", "--");
xlabel (hax(1), "Time");
title (hax(2), 'Heat pump analysis');
ylabel (hax(1), "Radiator and hot water temp");
ylabel (hax(2), "Outdoor temp and brine out");
There are many, many ways. Here I show you how to read the csv using csv2cell from the io package. I've tried to modify your existing code as less as sane. The first columns is used verbatim (well, I inserted a linebreak) to the plot. There is also a commented version which actually does the conversion and you could then use datetick. Btw, If you add google drive links it would be cool if you add direct links so someone can easily grab the csv or insert the url in the code as I've done, see below.
set (0, "defaultlinelinewidth", 2);
url = "https://drive.google.com/uc?export=download&id=1K_czefz-Wz4HPdvc7YqIqIupPwMi8a7r";
fn = "heatpump.csv";
if (! exist (fn, "file"))
urlwrite (url, fn);
endif
pkg load io
d = csv2cell (fn);
# convert to serial date
# (but you don't have if you want to keep the old format)
#t = datenum (d(2:end,1), "mm/dd/yyyy HH:MM:SS");
data = cell2mat (d(2:end,2:end));
clf;
hold on
t = 1:rows (data);
# Attention: the date/time column time was removed above, so the indizes are shifted
x = data(:,2);
y = data(:,4);
z = data(:,1);
o = data(:,7);
[hax, h1, h2] = plotyy (t, x, t, y);
[hax2, h3, h4] = plotyy (t, z, t, o);
grid on
#set ([h3, h4], "linestyle", "--");
xlabel (hax(1), "Time");
title (hax(2), 'Heat pump analysis');
ylabel (hax(1), "Radiator and hot water temp");
ylabel (hax(2), "Outdoor temp and brine out");
# use date as xtick
# extract them
date_time = d (get(hax2(1), "xtick"), 1);
# break them after the date part
date_time = strrep (date_time, " ", "\n");
# feed them back
set (hax, "xticklabel", date_time)
set (hax2, "xticklabel", date_time)
print ("-S1200,1000", "-F:10", "out.png")

Gnuplot Function

How can I plot a function with x being a value from my datafile? Something like that:
set encoding utf8
set term postscript eps enhanced color font "Helvetica, 20"
set output 'kernel.eps'
# Mean & Standard Deviation
load "mean_sd.dat"
# Bandwidth
h = 1.6*sd*n**(-0.2)
# Kernel Function
K(x) = exp(-x*x/2.0)/(sqrt(2.0*pi))
# PLOT --> THIS DOES NOT WORK
# EACH VALUE IN $2 MUST BE USED FOR A SINGLE K(X)
plot for [i=1:n] 'probability.dat' using 0:(K((x - $2)/h))
My data file 'probability.dat':
366.000000 3.153012
366.000000 4.211409
366.000000 3.845248
366.000000 4.131654
366.000000 3.956508
Thank you in advance.
I am not sure that I understood your question correctly, but if you want to plot the kernel function for all values from the second column, then one could proceed for example as follows:
set encoding utf8
set term postscript eps enhanced color font "Helvetica, 20"
set output 'kernel.eps'
# Mean & Standard Deviation
sd=1
n=1
# Bandwidth
h = 1.6*sd*n**(-0.2)
# Kernel Function
K(x) = exp(-x*x/2.0)/(sqrt(2.0*pi))
# PLOT --> THIS DOES NOT WORK
# EACH VALUE IN $2 MUST BE USED FOR A SINGLE K(X)
fname = 'probability.txt'
N = system(sprintf("wc -l %s | gawk '{print $1}'", fname))
cmd(i) = system(sprintf("gawk 'NR==%d{print $2;exit}' %s", i, fname))
set key left top reverse
set xr [-10:10]
plot for [i=1:N] K((x - cmd(i))/h) title sprintf("%.3f", real(cmd(i))) lw 2
Here, the "strategy" is to:
find the total number or records in the input file with (alternatively, one could use the stats command)
N = system(sprintf("wc -l %s | gawk '{print $1}'", fname))
define a function which extracts the ith value from the input file
cmd(i) = sprintf("gawk 'NR==%d{print $2;exit}' %s", i, fname)
The output is then: