Octave: Plotyy log files from geothermal heat pump (import/plot datetime) - csv

I'm trying to plot values from my geothermal heat pump log files to analyse it's performance. I tried with excel but it was to slow and not possible to get the plot type I wanted so I'm trying Octave instead. I have absolutely no experience with octave so please forgive my incompetence!
I've processed the .log files with open office calc to get into a decent delimited format. The first column is datetime with the format MM/DD/YY HH:MM:SS, in total there is 21 columns (but I only need 5) and one header line with a label, coma delimiter is '.' and delimiter is ','. The file can be downloaded here and the first 7 columns look like this:
02/19/2018 23:07:00,-0.7,47.5,42,47.3,52.1,1.5
I'm currently trying to plot this with demonstration 3 plotyy from here. Column 2, 3, 5 and 8 imports correctly so I'm figuring it's a problem with the datetime column 1. How can I get Octave to import column 1 correctly and use it as x axis in this plot?:
data=csvread('heatpump.csv');
clf;
hold on
t=data(:,1);
x=data(:,3);
y=data(:,5);
z=data(:,2);
o=data(:,8);
[hax, h1, h2] = plotyy (t, x, t, y);
[~, h3, h4] = plotyy (t, z, t, o);
set ([h3, h4], "linestyle", "--");
xlabel (hax(1), "Time");
title (hax(2), 'Heat pump analysis');
ylabel (hax(1), "Radiator and hot water temp");
ylabel (hax(2), "Outdoor temp and brine out");

There are many, many ways. Here I show you how to read the csv using csv2cell from the io package. I've tried to modify your existing code as less as sane. The first columns is used verbatim (well, I inserted a linebreak) to the plot. There is also a commented version which actually does the conversion and you could then use datetick. Btw, If you add google drive links it would be cool if you add direct links so someone can easily grab the csv or insert the url in the code as I've done, see below.
set (0, "defaultlinelinewidth", 2);
url = "https://drive.google.com/uc?export=download&id=1K_czefz-Wz4HPdvc7YqIqIupPwMi8a7r";
fn = "heatpump.csv";
if (! exist (fn, "file"))
urlwrite (url, fn);
endif
pkg load io
d = csv2cell (fn);
# convert to serial date
# (but you don't have if you want to keep the old format)
#t = datenum (d(2:end,1), "mm/dd/yyyy HH:MM:SS");
data = cell2mat (d(2:end,2:end));
clf;
hold on
t = 1:rows (data);
# Attention: the date/time column time was removed above, so the indizes are shifted
x = data(:,2);
y = data(:,4);
z = data(:,1);
o = data(:,7);
[hax, h1, h2] = plotyy (t, x, t, y);
[hax2, h3, h4] = plotyy (t, z, t, o);
grid on
#set ([h3, h4], "linestyle", "--");
xlabel (hax(1), "Time");
title (hax(2), 'Heat pump analysis');
ylabel (hax(1), "Radiator and hot water temp");
ylabel (hax(2), "Outdoor temp and brine out");
# use date as xtick
# extract them
date_time = d (get(hax2(1), "xtick"), 1);
# break them after the date part
date_time = strrep (date_time, " ", "\n");
# feed them back
set (hax, "xticklabel", date_time)
set (hax2, "xticklabel", date_time)
print ("-S1200,1000", "-F:10", "out.png")

Related

How can I set the numbering of the x-axis of an Octave plot to engineering notation?

I made a very simple Octave script
a = [10e6, 11e6, 12e6];
b = [10, 11, 12];
plot(a, b, 'rd-')
which outputs the following graph.
Graph
Is it possible to set the numbering on the x-axis to engineering notation, rather than scientific, and have it display "10.5e+6, 11e+6, 11.5e+6" instead of "1.05e+7, 1.1e+7, 1.15+e7"?
While octave provides a 'short eng' formatting option, which does what you're asking for in terms of printing to the terminal, it does not appear to provide this functionality in plots or when formatting strings via sprintf.
Therefore you'll have to find a way to do this by yourself, with some creative string processing of the initial xticks, and substituting the plot's ticklabels accordingly. Thankfully it's not that hard :)
Using your example:
a = [10e6, 11e6, 12e6];
b = [10, 11, 12];
plot(a, b, 'rd-')
format short eng % display stdout in engineering format
TickLabels = disp( xticks ) % collect string as it would be displayed on the stdout
TickLabels = strsplit( TickLabels ) % tokenize at spaces
TickLabels = TickLabels( 2 : end - 1 ) % discard start and end empty tokens
TickLabels = regexprep( TickLabels, '\.0+e', 'e' ) % remove purely zero decimals using a regular expression
TickLabels = regexprep( TickLabels, '(\.[1-9]*)0+e', '$1e' ) % remove non-significant zeros in non-zero decimals using a regular expression
xticklabels( TickLabels ) % set the new ticklabels to the plot
format % reset short eng format back to default, if necessary

Dimension problem when converting a MATLAB .m script into an Octave compatible syntax

I want to run a MATLAB script M-file to reconstruct a point cloud in Octave. Therefore I had to rewrite some parts of the code to make it compatible with Octave. Actually the M-file works fine in Octave (I don't get any errors) and also the plotted point cloud looks good at first glance, but it seems that the variables are only half the size of the original MATLAB variables. In the attached screenshots you can see what I mean.
Octave:
MATLAB:
You can see that the dimension of e.g. M in Octave is 1311114x3 but in MATLAB it is 2622227x3. The actual number of rows in my raw file is 2622227 as well.
Here you can see an extract of the raw file (original data) that I use.
Rotation angle Measured distance
-0,090 26,295
-0,342 26,294
-0,594 26,294
-0,846 26,295
-1,098 26,294
-1,368 26,296
-1,620 26,296
-1,872 26,296
In MATLAB I created my output variable as follows.
data = table;
data.Rotationangle = cell2mat(raw(:, 1));
data.Measureddistance = cell2mat(raw(:, 2));
As there is no table function in Octave I wrote
data = cellfun(#(x)str2num(x), strrep(raw, ',', '.'))
instead.
Octave also has no struct2array function, so I had to replace it as well.
In MATLAB I wrote.
data = table2array(data);
In Octave this was a bit more difficult to do. I had to create a struct2array function, which I did by means of this bug report.
%% Create a struct2array function
function retval = struct2array (input_struct)
%input check
if (~isstruct (input_struct) || (nargin ~= 1))
print_usage;
endif
%convert to cell array and flatten/concatenate output.
retval = [ (struct2cell (input_struct)){:}];
endfunction
clear b;
b.a = data;
data = struct2array(b);
Did I make a mistake somewhere and could someone help me to solve this problem?
edit:
Here's the part of my script where I'm using raw.
delimiter = '\t';
startRow = 5;
formatSpec = '%s%s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'HeaderLines' ,startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
fclose(fileID);
%% Convert the contents of columns containing numeric text to numbers.
% Replace non-numeric text with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
raw(1:length(dataArray{col}),col) = mat2cell(dataArray{col}, ones(length(dataArray{col}), 1));
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[1,2]
% Converts text in the input cell array to numbers. Replaced non-numeric
% text with NaN.
rawData = dataArray{col};
for row=1:size(rawData, 1)
% Create a regular expression to detect and remove non-numeric prefixes and
% suffixes.
regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\.]*)+[\,]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\.]*)*[\,]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
try
result = regexp(rawData(row), regexstr, 'names');
numbers = result.numbers;
% Detected commas in non-thousand locations.
invalidThousandsSeparator = false;
if numbers.contains('.')
thousandsRegExp = '^\d+?(\.\d{3})*\,{0,1}\d*$';
if isempty(regexp(numbers, thousandsRegExp, 'once'))
numbers = NaN;
invalidThousandsSeparator = true;
end
end
% Convert numeric text to numbers.
if ~invalidThousandsSeparator
numbers = strrep(numbers, '.', '');
numbers = strrep(numbers, ',', '.');
numbers = textscan(char(numbers), '%f');
numericData(row, col) = numbers{1};
raw{row, col} = numbers{1};
end
catch
raw{row, col} = rawData{row};
end
end
end
You don't see any raw in my workspaces because I clear all temporary variables before I reconstruct my point cloud.
Also my original data in row 1311114 and 1311115 look normal.
edit 2:
As suggested here is a small example table to clarify what I want and what MATLAB does with the table2array function in my case.
data =
-0.0900 26.2950
-0.3420 26.2940
-0.5940 26.2940
-0.8460 26.2950
-1.0980 26.2940
-1.3680 26.2960
-1.6200 26.2960
-1.8720 26.2960
With the struct2array function I used in Octave I get the following array.
data =
-0.090000 26.295000
-0.594000 26.294000
-1.098000 26.294000
-1.620000 26.296000
-2.124000 26.295000
-2.646000 26.293000
-3.150000 26.294000
-3.654000 26.294000
If you compare the Octave array with my original data, you can see that every second row is skipped. This seems to be the reason for 1311114 instead of 2622227 rows.
edit 3:
I tried to solve my problem with the suggestions of #Tasos Papastylianou, which unfortunately was not successful.
First I did the variant with a struct.
data = struct();
data.Rotationangle = [raw(:,1)];
data.Measureddistance = [raw(:,2)];
data = cell2mat( struct2cell (data ).' )
But this leads to the following structure in my script. (Unfortunately the result is not what I would like to have as shown in edit 2. Don't be surprised, I only used a small part of my raw file to accelerate the run of my script, so here are only 769 lines.)
[766,1] = -357,966
[767,1] = -358,506
[768,1] = -359,010
[769,1] = -359,514
[1,2] = 26,295
[2,2] = 26,294
[3,2] = 26,294
[4,2] = 26,296
Furthermore I get the following error.
error: unary operator '-' not implemented for 'cell' operands
error: called from
Cloud_reconstruction at line 137 column 11
Also the approach with the dataframe octave package didn't work. When I run the following code it leads to the error you can see below.
dataframe2array = #(df) cell2mat( struct(df).x_data );
pkg load dataframe;
data = dataframe();
data.Rotationangle = [raw(:, 1)];
data.Measureddistance = [raw(:, 2)];
dataframe2array(data)
error:
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 147 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 106 column 20
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 176 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 106 column 20
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 147 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 176 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
error: RHS(_,2): but RHS has size 768x1
error: called from
df_matassign at line 179 column 11
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
Both error messages refer to the following part of my script where I'm doing the reconstruction of the point cloud in cylindrical coordinates.
distLaserCenter = 47; % Distance between the pipe centerline and the blind zone in mm
m = size(data,1); % Find the length of the first dimension of data
zincr = 0.4/360; % z increment in mm per deg
data(:,1) = -data(:,1);
for i = 1:m
data(i,2) = data(i,2) + distLaserCenter;
if i == 1
data(i,3) = 0;
elseif abs(data(i,1)-data(i-1)) < 100
data(i,3) = data(i-1,3) + zincr*(data(i,1)-data(i-1));
else abs(data(i,1)-data(i-1)) > 100;
data(i,3) = data(i-1,3) + zincr*(data(i,1)-(data(i-1)-360));
end
end
To give some background information for a better understanding. The script is used to reconstruct a pipe as a point cloud. The surface of the pipe was scanned from inside with a laser and the laser measured several points (distance from laser to the inner wall of the pipe) at each deg of rotation. I hope this helps to understand what I want to do with my script.
Not sure exactly what you're trying to do, but here's a toy example of how a struct could be used in an equivalent manner to a table:
matlab:
data = table;
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
table2array(data)
octave:
data = struct();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
cell2mat( struct2cell (data ).' )
Note the transposition operation (.') before passing the result to cell2mat, since in a table, the 'fieldnames' are arranged horizontally in columns, whereas the struct2cell ends up arranging what used to be the 'fieldnames' as rows.
You might also be interested in the dataframe octave package, which performs similar functions to matlab's table (or in fact, R's dataframe object): https://octave.sourceforge.io/dataframe/ (you can install this by typing pkg install -forge dataframe in your console)
Unfortunately, the way to display the data as an array is still not ideal (see: https://stackoverflow.com/a/55417141/4183191), but you can easily convert that into a tiny function, e.g.
dataframe2array = #(df) cell2mat( struct(df).x_data );
Your code can then become:
pkg load dataframe;
data = dataframe();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
dataframe2array(data)

How to handle outliers in this octave boxplot for improved readability

Outliers between 1,5 - 3 times the interquantile range is marked with an "+" and above 3 times the IQR with an "o". But due to this data set with multiple outliers the below boxplot is very hard to read since the "+" and "o" symbols are plotted on top of each other creating what appears to be a thick red line.
I need to plot all data so removing them is not an option but I would be fine to display "longer" boxes, i.e. stretch the q1 and q4 to reach the true min/max values and skip the "+" and "o" outlier symbols. I would also be fine if just the min and max outliers was displayed.
I'm totally in the dark here and the octave boxplot documentation found here did not include any helpful examples on how to handle outliers. A search here at stackoverflow didn't get me closer to a solution either. So any help or directions is very appreciated!
How can I modify the below code to create a boxplot based on the same data set that is readable (i.e. doesn't plot outliers on top of each other creating a thick red line)?
I'm using Octave 4.2.1 64-bits on a Windows 10 machine with qt as the graphics_toolkit and with GDAL_TRANSLATE called from within Octave to handle the tif-files.
It's not an option to switch graphics_toolkit to gnuplot etc. since I haven't been able to "rotate" the plot (horizontal boxes instead of vertical). And it's in the .pdf file the results must have an effect, not only in octaves viewer.
Please forgive my totally "newbie-style" coding-work-around to get a proper high resolution pdf-exported:
pkg load statistics
clear all;
fns = glob ("*.tif");
for k=1:numel (fns)
ofn = tmpnam;
cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
[s, out] = system (cmd);
if (s != 0)
error ('calling gdal_translate failed with "%s"', out);
endif
fid = fopen (ofn, "r");
# read 6 headerlines
hdr = [];
for i=1:6
s = strsplit (fgetl (fid), " ");
hdr.(s{1}) = str2double (s{2});
endfor
d = dlmread (fid);
# check size against header
assert (size (d), [hdr.nrows hdr.ncols])
# set nodata to NA
d (d == hdr.NODATA_value) = NA;
raw{k} = d;
# create copy with existing values
raw_v{k} = d(! isna (d));
fclose (fid);
endfor
## generate plot
boxplot (raw_v)
set (gca, "xtick", 1:numel(fns),
"xticklabel", strrep (fns, ".tif", ""));
ylabel ("Plats kvar (meter)");
set (gca, "ytick", 0:50:600);
set (gca, "ygrid", "on");
set (gca, "gridlinestyle", "--");
set (gcf, "paperunit", "centimeters", "papersize", [35, 60], "paperposition", [0 0 60 30], "paperorientation", "landscape")
zoom (0.95)
view ([90 90])
print ("loudden_box_dotted.pdf", "-F:14")
I would just delete the outliers. This is easy because the handles are returned. I've also included some caching algorithm so you don't have to reload all tifs if you are playing with plots. Splitting the conversion, processing and plotting in different scripts is always a good idea (but not for stackoverflow where minimalistic examples are prefered). Here we go:
pkg load statistics
cache_fn = "input.raw";
# only process tif if not already done
if (! exist (cache_fn, "file"))
fns = glob ("*.tif");
for k=1:numel (fns)
ofn = tmpnam;
cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
printf ("calling '%s'...\n", cmd);
fflush (stdout);
[s, out] = system (cmd);
if (s != 0)
error ('calling gdal_translate failed with "%s"', out);
endif
fid = fopen (ofn, "r");
# read 6 headerlines
hdr = [];
for i=1:6
s = strsplit (fgetl (fid), " ");
hdr.(s{1}) = str2double (s{2});
endfor
d = dlmread (fid);
# check size against header
assert (size (d), [hdr.nrows hdr.ncols])
# set nodata to NA
d (d == hdr.NODATA_value) = NA;
raw{k} = d;
# create copy with existing values
raw_v{k} = d(! isna (d));
fclose (fid);
endfor
# save result
save (cache_fn, "raw_v", "fns");
else
load (cache_fn)
endif
## generate plot
[s, h] = boxplot (raw_v);
## in h you'll find now box, whisker, median, outliers and outliers2
## delete them
delete (h.outliers)
delete (h.outliers2)
set (gca, "xtick", 1:numel(fns),
"xticklabel", strrep (fns, ".tif", ""));
ylabel ("Plats kvar (meter)");
set (gca, "ytick", 0:50:600);
set (gca, "ygrid", "on");
set (gca, "gridlinestyle", "--");
set (gcf, "paperunit", "centimeters", "papersize", [35, 60], "paperposition", [0 0 60 30], "paperorientation", "landscape")
zoom (0.95)
view ([90 90])
print ("loudden_box_dotted.pdf", "-F:14")
gives

Shapefile with overlapping polygons: calculate average values

I have a very big polygon shapefile with hundreds of features, often overlapping each other. Each of these features has a value stored in the attribute table. I simply need to calculate the average values in the areas where they overlap.
I can imagine that this task requires several intricate steps: I was wondering if there is a straightforward methodology.
I’m open to every kind of suggestion, I can use ArcMap, QGis, arcpy scripts, PostGis, GDAL… I just need ideas. Thanks!
You should use the Union tool from ArcGIS. It will create new polygons where the polygons overlap. In order to keep the attributes from both polygons, add your polygon shapefile twice as input and use ALL as join_attributes parameter.This creates also polygons intersecting with themselves, you can select and delete them easily as they have the same FIDs. Then just add a new field to the attribute table and calculate it based on the two original value fields from the input polygons.
This can be done in a script or directly with the toolbox's tools.
After few attempts, I found a solution by rasterising all the features singularly and then performing cell statistics in order to calculate the average.
See below the script I wrote, please do not hesitate to comment and improve it!
Thanks!
#This script processes a shapefile of snow persistence (area of interest: Afghanistan).
#the input shapefile represents a month of snow cover and contains several features.
#each feature represents a particular day and a particular snow persistence (low,medium,high,nodata)
#these features are polygons multiparts, often overlapping.
#a feature of a particular day can overlap a feature of another one, but features of the same day and with
#different snow persistence can not overlap each other.
#(potentially, each shapefile contains 31*4 feature).
#the script takes the features singularly and exports each feature in a temporary shapefile
#which contains only one feature.
#Then, each feature is converted to raster, and after
#a logical conditional expression gives a value to the pixel according the intensity (high=3,medium=2,low=1,nodata=skipped).
#Finally, all these rasters are summed and divided by the number of days, in order to
#calculate an average value.
#The result is a raster with the average snow persistence in a particular month.
#This output raster ranges from 0 (no snow) to 3 (persistent snow for the whole month)
#and values outside this range should be considered as small errors in pixel overlapping.
#This script needs a particular folder structure. The folder C:\TEMP\Afgh_snow_cover contains 3 subfolders
#input, temp and outputs. The script takes care automatically of the cleaning of temporary data
import arcpy, numpy, os
from arcpy.sa import *
from arcpy import env
#function for finding unique values of a field in a FC
def unique_values_in_table(table, field):
data = arcpy.da.TableToNumPyArray(table, [field])
return numpy.unique(data[field])
#check extensions
try:
if arcpy.CheckExtension("Spatial") == "Available":
arcpy.CheckOutExtension("Spatial")
else:
# Raise a custom exception
#
raise LicenseError
except LicenseError:
print "spatial Analyst license is unavailable"
except:
print arcpy.GetMessages(2)
finally:
# Check in the 3D Analyst extension
#
arcpy.CheckInExtension("Spatial")
# parameters and environment
temp_folder = r"C:\TEMP\Afgh_snow_cover\temp_rasters"
output_folder = r"C:\TEMP\Afgh_snow_cover\output_rasters"
env.workspace = temp_folder
unique_field = "FID"
field_Date = "DATE"
field_Type = "Type"
cellSize = 0.02
fc = r"C:\TEMP\Afgh_snow_cover\input_shapefiles\snow_cover_Dec2007.shp"
stat_output_name = fc[-11:-4] + ".tif"
#print stat_output_name
arcpy.env.extent = "MAXOF"
#find all the uniquesID of the FC
uniqueIDs = unique_values_in_table(fc, "FID")
#make layer for selecting
arcpy.MakeFeatureLayer_management (fc, "lyr")
#uniqueIDs = uniqueIDs[-5:]
totFeatures = len(uniqueIDs)
#for each feature, get the date and the type of snow persistence(type can be high, medium, low and nodata)
for i in uniqueIDs:
SC = arcpy.SearchCursor(fc)
for row in SC:
if row.getValue(unique_field) == i:
datestring = row.getValue(field_Date)
typestring = row.getValue(field_Type)
month = str(datestring.month)
day = str(datestring.day)
year = str(datestring.year)
#format month and year string
if len(month) == 1:
month = '0' + month
if len(day) == 1:
day = '0' + day
#convert snow persistence to numerical value
if typestring == 'high':
typestring2 = 3
if typestring == 'medium':
typestring2 = 2
if typestring == 'low':
typestring2 = 1
if typestring == 'nodata':
typestring2 = 0
#skip the NoData features, and repeat the following for each feature (a feature is a day and a persistence value)
if typestring2 > 0:
#create expression for selecting the feature
expression = ' "FID" = ' + str(i) + ' '
#select the feature
arcpy.SelectLayerByAttribute_management("lyr", "NEW_SELECTION", expression)
#create
#outFeatureClass = os.path.join(temp_folder, ("M_Y_" + str(i)))
#create faeture class name, writing the snow persistence value at the end of the name
outFeatureClass = "Afg_" + str(year) + str(month) + str(day) + "_" + str(typestring2) + '.shp'
#export the feature
arcpy.FeatureClassToFeatureClass_conversion("lyr", temp_folder, outFeatureClass)
print "exported FID " + str(i) + " \ " + str(totFeatures)
#create name of the raster and convert the newly created feature to raster
outRaster = outFeatureClass[4:-4] + ".tif"
arcpy.FeatureToRaster_conversion(outFeatureClass, field_Type, outRaster, cellSize)
#remove the temporary fc
arcpy.Delete_management(outFeatureClass)
del SC, row
#now many rasters are created, representing the snow persistence types of each day.
#list all the rasters created
rasterList = arcpy.ListRasters("*", "All")
print rasterList
#now the rasters have values 1 and 0. the following loop will
#perform CON expressions in order to assign the value of snow persistence
for i in rasterList:
print i + ":"
inRaster = Raster(i)
#set the value of snow persistence, stored in the raster name
value_to_set = i[-5]
inTrueRaster = int(value_to_set)
inFalseConstant = 0
whereClause = "Value > 0"
# Check out the ArcGIS Spatial Analyst extension license
arcpy.CheckOutExtension("Spatial")
print 'Executing CON expression and deleting input'
# Execute Con , in order to assign to each pixel the value of snow persistence
print str(inTrueRaster)
try:
outCon = Con(inRaster, inTrueRaster, inFalseConstant, whereClause)
except:
print 'CON expression failed (probably empty raster!)'
nameoutput = i[:-4] + "_c.tif"
outCon.save(nameoutput)
#delete the temp rasters with values 0 and 1
arcpy.Delete_management(i)
#list the raster with values of snow persistence
rasterList = arcpy.ListRasters("*_c.tif", "All")
#sum the rasters
print "Caclulating SUM"
outCellStats = CellStatistics(rasterList, "SUM", "DATA")
#calculate the number of days (num of rasters/3)
print "Calculating day ratio"
num_of_rasters = len(rasterList)
print 'Num of rasters : ' + str(num_of_rasters)
num_of_days = num_of_rasters / 3
print 'Num of days : ' + str(num_of_days)
#in order to store decimal values, multiplicate the raster by 1000 before dividing
outCellStats = outCellStats * 1000 / num_of_days
#save the output raster
print "saving output " + stat_output_name
stat_output_name = os.path.join(output_folder,stat_output_name)
outCellStats.save(stat_output_name)
#delete the remaining temporary rasters
print "deleting CON rasters"
for i in rasterList:
print "deleting " + i
arcpy.Delete_management(i)
arcpy.Delete_management("lyr")
Could you rasterize your polygons into multiple layers, each pixel could contain your attribute value. Then merge the layers by averaging the attribute values?

How do I feed in my own data into PyAlgoTrade?

I'm trying to use PyAlogoTrade's event profiler
However I don't want to use data from yahoo!finance, I want to use my own but can't figure out how to parse in the CSV, it is in the format:
Timestamp Low Open Close High BTC_vol USD_vol [8] [9]
2013-11-23 00 800 860 847.666666 886.876543 853.833333 6195.334452 5248330 0
2013-11-24 00 745 847.5 815.01 860 831.255 10785.94131 8680720 0
The complete CSV is here
I want to do something like:
def main(plot):
instruments = ["AA", "AES", "AIG"]
feed = yahoofinance.build_feed(instruments, 2008, 2009, ".")
Then replace yahoofinance.build_feed(instruments, 2008, 2009, ".") with my CSV
I tried:
import csv
with open( 'FinexBTCDaily.csv', 'rb' ) as csvfile:
data = csv.reader( csvfile )
def main( plot ):
feed = data
But it throws an attribute error. Any ideas how to do this?
I suggest to create your own Rowparser and Feed, which is much easier than it sounds, have a look here: yahoofeed
This also allows you to work with intraday data and cleanup the data if needed, like your timestamp.
Another possibility, of course, would be to parse your file and save it, so it looks like a yahoo feed. In your case, you would have to adapt the columns and the Timestamp.
Step A: follow PyAlgoTrade doc on GenericBarFeed class
On this link see the addBarsFromCSV() in CSV section of the BarFeed class in v0.16
On this link see the addBarsFromCSV() in CSV section of the BarFeed class in v0.17
Note
- The CSV file must have the column names in the first row.
- It is ok if the Adj Close column is empty.
- When working with multiple instruments:
--- If all the instruments loaded are in the same timezone, then the timezone parameter may not be specified.
--- If any of the instruments loaded are in different timezones, then the timezone parameter should be set.
addBarsFromCSV( instrument, path, timezone = None )
Loads bars for a given instrument from a CSV formatted file. The instrument gets registered in the bar feed.
Parameters:
(string) instrument – Instrument identifier.
(string) path – The path to the CSV file.
(pytz) timezone – The timezone to use to localize bars.Check pyalgotrade.marketsession.
Next:
A BarFeed loads bars from CSV files that have the following format:
Date Time, Open, High, Low, Close, Volume, Adj Close
2013-01-01 13:59:00,13.51001,13.56,13.51,13.56789,273.88014126,13.51001
Step B: implement a documented CSV-file pre-formatting
Your CSV data will need a bit of sanity ( before will be able to be used in PyAlgoTrade methods ),however it is doable and you can create an easy transformator either by hand or with a use of a powerful numpy.genfromtxt() lambda-based converters facilities.
This sample code is intended for an illustration purpose, to see immediately the powers of converters for your own transformations, as CSV-structure differs.
with open( getCsvFileNAME( ... ), "r" ) as aFH:
numpy.genfromtxt( aFH,
skip_header = 1, # Ref. pyalgotrade
delimiter = ",",
# v v v v v v
# 2011.08.30,12:00,1791.20,1792.60,1787.60,1789.60,835
# 2011.08.30,13:00,1789.70,1794.30,1788.70,1792.60,550
# 2011.08.30,14:00,1792.70,1816.70,1790.20,1812.10,1222
# 2011.08.30,15:00,1812.20,1831.50,1811.90,1824.70,2373
# 2011.08.30,16:00,1824.80,1828.10,1813.70,1817.90,2215
converters = { 0: lambda aString: mPlotDATEs.date2num( datetime.datetime.strptime( aString, "%Y.%m.%d" ) ), #_______________________________________asFloat ( 1.0, +++ )
1: lambda aString: ( ( int( aString[0:2] ) * 60 + int( aString[3:] ) ) / 60. / 24. ) # ( 15*60 + 00 ) / 60. / 24.__asFloat < 0.0, 1.0 )
# HH: :MM HH MM
}
)
You can use pyalgotrade.barfeed.addBarsFromSequence with list comprehension to feed in data from CSV row by row/bar by bar. Basically you create a bar from each row, pass OHLCV as init parameters and extra columns with additional data in a dictionary. You can try something like this (with all the required imports):
data = pd.DataFrame(index=pd.date_range(start='2021-11-01', end='2021-11-05'), columns=['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume', 'ExtraCol1', 'ExtraCol3', 'ExtraCol4', 'ExtraCol5'], data=np.random.rand(5, 10))
feed = yahoofeed.Feed()
feed.addBarsFromSequence('instrumentID', data.index.map(lambda i:
BasicBar(
i,
data.loc[i, 'Open'],
data.loc[i, 'High'],
data.loc[i, 'Low'],
data.loc[i, 'Close'],
data.loc[i, 'Volume'],
data.loc[i, 'Adj Close'],
Frequency.DAY,
data.loc[i, 'ExtraCol1':].to_dict())
).values)
The input data frame was created with random values to make this example easier to reproduce, but the part where the bars are added to the feed should work the same for data frames from CSVs given that the valid column names are used.