Shapefile with overlapping polygons: calculate average values

I have a very big polygon shapefile with hundreds of features, often overlapping each other. Each of these features has a value stored in the attribute table. I simply need to calculate the average of the values in the areas where they overlap.
I imagine this task requires several intricate steps, so I was wondering if there is a straightforward methodology.
I'm open to any kind of suggestion; I can use ArcMap, QGIS, arcpy scripts, PostGIS, GDAL... I just need ideas. Thanks!

You should use the Union tool from ArcGIS. It will create new polygons where the polygons overlap. In order to keep the attributes from both polygons, add your polygon shapefile twice as input and use ALL as the join_attributes parameter. This also creates polygons intersecting with themselves; you can select and delete them easily as they have the same FIDs. Then just add a new field to the attribute table and calculate it based on the two original value fields from the input polygons.
This can be done in a script or directly with the toolbox's tools.
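For illustration, a rough arcpy sketch of that workflow; the paths and the field names VALUE and VALUE_1 are assumptions about your data, not taken from the question:
import arcpy

in_fc = r"C:\data\polygons.shp"           # hypothetical input shapefile
union_fc = r"C:\data\polygons_union.shp"  # output of the self-union

# Union the shapefile with itself so every overlap becomes its own polygon,
# keeping the attributes from both copies (join_attributes="ALL").
arcpy.Union_analysis([in_fc, in_fc], union_fc, "ALL")

# Add a field and compute the average of the two original value fields
# (assumed to be named VALUE and VALUE_1 after the union).
arcpy.AddField_management(union_fc, "AVG_VAL", "DOUBLE")
arcpy.CalculateField_management(union_fc, "AVG_VAL",
                                "(!VALUE! + !VALUE_1!) / 2.0", "PYTHON")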

After a few attempts, I found a solution: rasterise each feature individually and then run Cell Statistics in order to calculate the average.
See below the script I wrote; please do not hesitate to comment on it and improve it!
Thanks!
#This script processes a shapefile of snow persistence (area of interest: Afghanistan).
#The input shapefile represents a month of snow cover and contains several features.
#Each feature represents a particular day and a particular snow persistence (low, medium, high, nodata).
#These features are multipart polygons, often overlapping.
#A feature of a particular day can overlap a feature of another day, but features of the same day with
#different snow persistence cannot overlap each other.
#(Potentially, each shapefile contains 31*4 features.)
#The script takes the features one at a time and exports each feature to a temporary shapefile
#which contains only that feature.
#Then, each feature is converted to raster, and
#a logical conditional expression gives a value to the pixels according to the intensity (high=3, medium=2, low=1, nodata=skipped).
#Finally, all these rasters are summed and divided by the number of days, in order to
#calculate an average value.
#The result is a raster with the average snow persistence for a particular month.
#This output raster ranges from 0 (no snow) to 3 (persistent snow for the whole month),
#and values outside this range should be considered small errors in pixel overlapping.
#This script needs a particular folder structure: the folder C:\TEMP\Afgh_snow_cover contains 3 subfolders
#(input, temp and outputs). The script automatically takes care of cleaning up the temporary data.
import arcpy, numpy, os
from arcpy.sa import *
from arcpy import env

#custom exception raised when the Spatial Analyst license is unavailable
class LicenseError(Exception):
    pass

#function for finding unique values of a field in a FC
def unique_values_in_table(table, field):
    data = arcpy.da.TableToNumPyArray(table, [field])
    return numpy.unique(data[field])

#check extensions
try:
    if arcpy.CheckExtension("Spatial") == "Available":
        arcpy.CheckOutExtension("Spatial")
    else:
        # Raise a custom exception
        raise LicenseError
except LicenseError:
    print "Spatial Analyst license is unavailable"
except:
    print arcpy.GetMessages(2)
finally:
    # Check the Spatial Analyst extension back in
    arcpy.CheckInExtension("Spatial")

# parameters and environment
temp_folder = r"C:\TEMP\Afgh_snow_cover\temp_rasters"
output_folder = r"C:\TEMP\Afgh_snow_cover\output_rasters"
env.workspace = temp_folder
unique_field = "FID"
field_Date = "DATE"
field_Type = "Type"
cellSize = 0.02
fc = r"C:\TEMP\Afgh_snow_cover\input_shapefiles\snow_cover_Dec2007.shp"
stat_output_name = fc[-11:-4] + ".tif"
#print stat_output_name
arcpy.env.extent = "MAXOF"

#find all the unique IDs of the FC
uniqueIDs = unique_values_in_table(fc, "FID")
#make layer for selecting
arcpy.MakeFeatureLayer_management(fc, "lyr")
#uniqueIDs = uniqueIDs[-5:]
totFeatures = len(uniqueIDs)

#for each feature, get the date and the type of snow persistence (type can be high, medium, low and nodata)
for i in uniqueIDs:
    SC = arcpy.SearchCursor(fc)
    for row in SC:
        if row.getValue(unique_field) == i:
            datestring = row.getValue(field_Date)
            typestring = row.getValue(field_Type)
    month = str(datestring.month)
    day = str(datestring.day)
    year = str(datestring.year)
    #format month and day strings
    if len(month) == 1:
        month = '0' + month
    if len(day) == 1:
        day = '0' + day
    #convert snow persistence to a numerical value
    if typestring == 'high':
        typestring2 = 3
    if typestring == 'medium':
        typestring2 = 2
    if typestring == 'low':
        typestring2 = 1
    if typestring == 'nodata':
        typestring2 = 0
    #skip the NoData features, and repeat the following for each feature (a feature is a day and a persistence value)
    if typestring2 > 0:
        #create expression for selecting the feature
        expression = ' "FID" = ' + str(i) + ' '
        #select the feature
        arcpy.SelectLayerByAttribute_management("lyr", "NEW_SELECTION", expression)
        #outFeatureClass = os.path.join(temp_folder, ("M_Y_" + str(i)))
        #create feature class name, writing the snow persistence value at the end of the name
        outFeatureClass = "Afg_" + str(year) + str(month) + str(day) + "_" + str(typestring2) + '.shp'
        #export the feature
        arcpy.FeatureClassToFeatureClass_conversion("lyr", temp_folder, outFeatureClass)
        print "exported FID " + str(i) + " / " + str(totFeatures)
        #create the name of the raster and convert the newly created feature to raster
        outRaster = outFeatureClass[4:-4] + ".tif"
        arcpy.FeatureToRaster_conversion(outFeatureClass, field_Type, outRaster, cellSize)
        #remove the temporary fc
        arcpy.Delete_management(outFeatureClass)
    del SC, row

#now many rasters have been created, representing the snow persistence types of each day.
#list all the rasters created
rasterList = arcpy.ListRasters("*", "All")
print rasterList
#the rasters currently have values 1 and 0. the following loop performs
#CON expressions in order to assign the value of snow persistence
for i in rasterList:
    print i + ":"
    inRaster = Raster(i)
    #set the value of snow persistence, stored in the raster name
    value_to_set = i[-5]
    inTrueRaster = int(value_to_set)
    inFalseConstant = 0
    whereClause = "Value > 0"
    # Check out the ArcGIS Spatial Analyst extension license
    arcpy.CheckOutExtension("Spatial")
    print 'Executing CON expression and deleting input'
    # Execute Con, in order to assign to each pixel the value of snow persistence
    print str(inTrueRaster)
    try:
        outCon = Con(inRaster, inTrueRaster, inFalseConstant, whereClause)
    except:
        print 'CON expression failed (probably empty raster!)'
    nameoutput = i[:-4] + "_c.tif"
    outCon.save(nameoutput)
    #delete the temp rasters with values 0 and 1
    arcpy.Delete_management(i)

#list the rasters with values of snow persistence
rasterList = arcpy.ListRasters("*_c.tif", "All")
#sum the rasters
print "Calculating SUM"
outCellStats = CellStatistics(rasterList, "SUM", "DATA")
#calculate the number of days (number of rasters / 3)
print "Calculating day ratio"
num_of_rasters = len(rasterList)
print 'Num of rasters : ' + str(num_of_rasters)
num_of_days = num_of_rasters / 3
print 'Num of days : ' + str(num_of_days)
#in order to store decimal values, multiply the raster by 1000 before dividing
outCellStats = outCellStats * 1000 / num_of_days
#save the output raster
print "saving output " + stat_output_name
stat_output_name = os.path.join(output_folder, stat_output_name)
outCellStats.save(stat_output_name)
#delete the remaining temporary rasters
print "deleting CON rasters"
for i in rasterList:
    print "deleting " + i
    arcpy.Delete_management(i)
arcpy.Delete_management("lyr")

Could you rasterize your polygons into multiple layers, so that each pixel contains your attribute value, and then merge the layers by averaging the attribute values?
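If you want to script that idea, here is a rough GDAL/OGR + numpy sketch; it assumes a shapefile named polygons.shp with a numeric attribute field VALUE, so treat it as an illustration rather than a tested implementation:
import numpy as np
from osgeo import gdal, ogr

ds = ogr.Open("polygons.shp")           # assumed input shapefile
layer = ds.GetLayer()
x_min, x_max, y_min, y_max = layer.GetExtent()
cell = 0.02
cols = int((x_max - x_min) / cell)
rows = int((y_max - y_min) / cell)

value_sum = np.zeros((rows, cols))
value_cnt = np.zeros((rows, cols))

for fid in range(layer.GetFeatureCount()):
    # rasterize one feature at a time into an in-memory raster,
    # burning the (assumed) numeric field VALUE into the band
    layer.SetAttributeFilter("FID = {}".format(fid))
    target = gdal.GetDriverByName("MEM").Create("", cols, rows, 1, gdal.GDT_Float32)
    target.SetGeoTransform((x_min, cell, 0, y_max, 0, -cell))
    gdal.RasterizeLayer(target, [1], layer, options=["ATTRIBUTE=VALUE"])
    band = target.GetRasterBand(1).ReadAsArray()
    value_sum += band
    # note: a polygon whose VALUE is exactly 0 is treated as "no coverage" here
    value_cnt += (band != 0)

# average only where at least one polygon covers the pixel
average = np.divide(value_sum, value_cnt,
                    out=np.zeros_like(value_sum), where=value_cnt > 0)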


Plotly Express: Prevent bars from stacking when Y-axis categories have the same name

I'm new to plotly.
Working with:
Ubuntu 20.04
Python 3.8.10
plotly==5.10.0
I'm doing a comparative graph using a horizontal bar chart: different instruments measuring the same chemical compounds. I want to be able to do an at-a-glance, head-to-head comparison of the measured value amongst all machines.
The problem is: if the compound has the same name amongst the different instruments, Plotly stacks the data bars into a single bar with segment markers. I very much want each bar to appear individually. Is there a way to prevent Plotly Express from automatically stacking the common bars?
Examples:
Code:
import plotly.graph_objects as go

gobardata = []
for blended_name in _df[:20].blended_name: # should always be unique
    ##################################
    # Unaltered compound names
    compound_names = [str(c) for c in _df[_df.blended_name == blended_name]["injcompound_name"].tolist()]
    # Random number added to end of compound_names to make every string unique
    # compound_names = ["{} ({})".format(str(c),random.randint(0, 1000)) for c in _df[_df.blended_name == blended_name]["injcompound_name"].tolist()]
    ##################################
    deltas = _df[_df.blended_name == blended_name]["delta_rettime"].to_list()
    gobardata.append(
        go.Bar(
            name = blended_name,
            x = deltas,
            y = compound_names,
            orientation = 'h',
        ))

fig = go.Figure(data = gobardata)
fig.update_traces(width=1)
fig.update_layout(
    bargap=1,
    bargroupgap=.1,
    xaxis_title="Delta Retention Time (Expected - actual)",
    yaxis_title="Instrument name (Injection ID)"
)
fig.show()
What I'm getting (Using actual, but repeated, compound names)
What I want (Adding random text to each compound name to make it unique)
OK. Figured it out. This is probably pretty kludgy, but it consistently works.
Basically:
- Use go.FigureWidget...
- ...with make_subplots having a common x-axis...
- ...controlling the height of each subplot based on the number of bars.
- Every bar in each subplot is added as an individual trace...
- ...using a dictionary matching each bar name to a common color.
- The y-axis labels for each subplot are a list containing the machine name as element [0], followed by blank placeholders ('') so the length of the y-axis list matches the number of bars.
- The legend is manipulated manually so each bar name appears only once.
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# Get lists of total data
all_compounds = list(_df.injcompound_name.unique())
blended_names = list(_df.blended_name.unique())

#################################################################
# The heights of each subplot have to be set when fig is created.
# fig has to be created before adding traces.
# So, create a list of dfs, and use these to calculate the subplot heights
dfs = []
subplot_height_multiplier = 20
subplot_heights = []
for blended_name in blended_names:
    df = _df[(_df.blended_name == blended_name)]#[["delta_rettime", "injcompound_name"]]
    dfs.append(df)
    subplot_heights.append(df.shape[0] * subplot_height_multiplier)
chart_height = sum(subplot_heights) # Prep for the height of the overall chart.
chart_width = 1000

# Make the figure
fig = make_subplots(
    rows=len(blended_names),
    cols=1,
    row_heights = subplot_heights,
    shared_xaxes=True,
)

# Create the color dictionary to match a color to each compound
_CSS_color = CSS_chart_color_list()
colors = {}
for compound in all_compounds:
    try:
        colors[compound] = _CSS_color.pop()
    except IndexError:
        # Probably ran out of colors, so just regenerate the list and reuse
        _CSS_color = CSS_chart_color_list()
        colors[compound] = _CSS_color.pop()

rowcount = 1
for df in dfs:
    # Add bars individually to each subplot
    for label, labeldf in df.groupby('injcompound_name'):
        fig.add_trace(
            go.Bar(x = labeldf.delta_rettime,
                   y = [labeldf.blended_name.iloc[0]] + [""]*(len(labeldf.delta_rettime)-1),
                   name = label,
                   marker = {'color': colors[label]},
                   orientation = 'h',
                   ),
            row=rowcount,
            col=1,
        )
    rowcount += 1

# Set figure to FigureWidget
fig = go.FigureWidget(fig)

# Adding individual traces creates redundancies in the legend.
# This removes redundancies from the legend
names = set()
fig.for_each_trace(
    lambda trace:
        trace.update(showlegend=False)
        if (trace.name in names) else names.add(trace.name))

fig.update_layout(
    height=chart_height,
    width=chart_width,
    title_text="∆ of observed RT to expected RT",
    showlegend = True,
)
fig.show()

For loop that stores variable in progressive order

I am trying to work on a MATLAB script that calculates a 1x1854 matrix called N2. This routine has to be performed 1000 times, because at each iteration the input data files are different. I am trying to store the matrix N2 in progressive order for each iteration, like N2_1, N2_2, etc. How should I implement that?
for ii=1:1000
    file1 = load(['/Users/gianmarcobroilo/Desktop/1000shifts/delays/GRV_JUGR_2021158_1648X35X35001KV03.NEWFES_delay_' num2str(ii) '.TXT']);
    file2 = load(['/Users/gianmarcobroilo/Desktop/1000shifts/delays/GRV_JUGR_2021158_1648X35K35001KV03.NEWFES_delay_' num2str(ii) '.TXT']);
    %%calculations...
    [N,bind] = elecdensity(omega_new,closestapproach);
    %
    N2_num2str(ii) = N./1e6;
end
To generate those variables, change the code line
N2_num2str(ii) = N./1e6;
to
eval(['N2_' num2str(ii) '= ' 'N./1e6']);
This might be computationally too expensive. Another approach, which avoids the use of the "eval" command, is to save the matrices in a structure, where each field of the structure is one matrix (named N2_NUMBER). The code would then be:
% Generate the struct object
myValues = struct;
% Start the for loop
for ii=1:1000
    file1 = load(['/Users/gianmarcobroilo/Desktop/1000shifts/delays/GRV_JUGR_2021158_1648X35X35001KV03.NEWFES_delay_' num2str(ii) '.TXT']);
    file2 = load(['/Users/gianmarcobroilo/Desktop/1000shifts/delays/GRV_JUGR_2021158_1648X35K35001KV03.NEWFES_delay_' num2str(ii) '.TXT']);
    %%calculations...
    [N,bind] = elecdensity(omega_new,closestapproach);
    %
    fieldName = ['N2_' num2str(ii)];
    myValues.(fieldName) = N./1e6;
end
% Print matrix number 54
myValues.N2_54

Custom environment Gym for step function processing with DDPG Agent

I'm new to reinforcement learning, and I would like to process an audio signal using this technique. I built a basic step signal that I wish to flatten, in order to get my hands on OpenAI Gym and reinforcement learning in general.
To do so, I am using the GoalEnv provided by OpenAI, since I know what the target is: the flat signal.
This image shows the input and desired signals:
The step function calls _set_action, which performs achieved_signal = convolution(input_signal, low_pass_filter) - offset; low_pass_filter also takes a cutoff frequency as input.
Cutoff frequency and offset are the parameters that act on the observation to get the output signal.
The designed reward function returns the frame-to-frame L2 norm between the input signal and the desired signal, negated, to penalize a large norm.
Following is the environment I created:
# (imports assumed by this snippet)
import gym
import numpy as np
from gym import spaces
from scipy import signal

def butter_lowpass(cutoff, nyq_freq, order=4):
    normal_cutoff = float(cutoff) / nyq_freq
    b, a = signal.butter(order, normal_cutoff, btype='lowpass')
    return b, a

def butter_lowpass_filter(data, cutoff_freq, nyq_freq, order=4):
    b, a = butter_lowpass(cutoff_freq, nyq_freq, order=order)
    y = signal.filtfilt(b, a, data)
    return y

class StepSignal(gym.GoalEnv):

    def __init__(self, input_signal, sample_rate, desired_signal):
        super(StepSignal, self).__init__()
        self.initial_signal = input_signal
        self.signal = self.initial_signal.copy()
        self.sample_rate = sample_rate
        self.desired_signal = desired_signal
        self.distance_threshold = 10e-1

        max_offset = abs(max(max(self.desired_signal), max(self.signal))
                         - min(min(self.desired_signal), min(self.signal)))

        self.action_space = spaces.Box(low=np.array([10e-4, -max_offset]),
                                       high=np.array([self.sample_rate/2-0.1, max_offset]),
                                       dtype=np.float16)

        obs = self._get_obs()
        self.observation_space = spaces.Dict(dict(
            desired_goal=spaces.Box(-np.inf, np.inf, shape=obs['achieved_goal'].shape, dtype='float32'),
            achieved_goal=spaces.Box(-np.inf, np.inf, shape=obs['achieved_goal'].shape, dtype='float32'),
            observation=spaces.Box(-np.inf, np.inf, shape=obs['observation'].shape, dtype='float32'),
        ))

    def step(self, action):
        range = self.action_space.high - self.action_space.low
        action = range / 2 * (action + 1)
        self._set_action(action)
        obs = self._get_obs()
        done = False
        info = {
            'is_success': self._is_success(obs['achieved_goal'], self.desired_signal),
        }
        reward = -self.compute_reward(obs['achieved_goal'], self.desired_signal)
        return obs, reward, done, info

    def reset(self):
        self.signal = self.initial_signal.copy()
        return self._get_obs()

    def _set_action(self, actions):
        actions = np.clip(actions, a_max=self.action_space.high, a_min=self.action_space.low)
        cutoff = actions[0]
        offset = actions[1]
        print(cutoff, offset)
        self.signal = butter_lowpass_filter(self.signal, cutoff, self.sample_rate/2) - offset

    def _get_obs(self):
        obs = self.signal
        achieved_goal = self.signal
        return {
            'observation': obs.copy(),
            'achieved_goal': achieved_goal.copy(),
            'desired_goal': self.desired_signal.copy(),
        }

    def compute_reward(self, goal_achieved, goal_desired):
        d = np.linalg.norm(goal_desired - goal_achieved)
        return d

    def _is_success(self, achieved_goal, desired_goal):
        d = self.compute_reward(achieved_goal, desired_goal)
        return (d < self.distance_threshold).astype(np.float32)
The environment can then be instantiated into a variable, and flattened through the FlattenDictWrapper as advised here https://openai.com/blog/ingredients-for-robotics-research/ (end of the page).
length = 20
sample_rate = 30 # 30 Hz
in_signal_length = 20*sample_rate # 20sec signal
x = np.linspace(0, length, in_signal_length)
# Desired output
y = 3*np.ones(in_signal_length)
# Step signal
in_signal = 0.5*(np.sign(x-5)+9)
env = gym.make('stepsignal-v0', input_signal=in_signal, sample_rate=sample_rate, desired_signal=y)
env = gym.wrappers.FlattenDictWrapper(env, dict_keys=['observation','desired_goal'])
env.reset()
The agent is a DDPG Agent from keras-rl, since the actions can take any values in the continuous action_space described in the environment.
I wonder why the actor and critic nets need an input with an additional dimension, in input_shape=(1,) + env.observation_space.shape
# imports assumed from the standard keras-rl DDPG example
from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Flatten, Input, Concatenate
from keras.optimizers import Adam
from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.policy import BoltzmannQPolicy
from rl.random import OrnsteinUhlenbeckProcess

nb_actions = env.action_space.shape[0]
# Building Actor agent (Policy-net)
actor = Sequential()
actor.add(Flatten(input_shape=(1,) + env.observation_space.shape, name='flatten'))
actor.add(Dense(128))
actor.add(Activation('relu'))
actor.add(Dense(64))
actor.add(Activation('relu'))
actor.add(Dense(nb_actions))
actor.add(Activation('linear'))
actor.summary()
# Building Critic net (Q-net)
action_input = Input(shape=(nb_actions,), name='action_input')
observation_input = Input(shape=(1,) + env.observation_space.shape, name='observation_input')
flattened_observation = Flatten()(observation_input)
x = Concatenate()([action_input, flattened_observation])
x = Dense(128)(x)
x = Activation('relu')(x)
x = Dense(64)(x)
x = Activation('relu')(x)
x = Dense(1)(x)
x = Activation('linear')(x)
critic = Model(inputs=[action_input, observation_input], outputs=x)
critic.summary()
# Building Keras agent
memory = SequentialMemory(limit=2000, window_length=1)
policy = BoltzmannQPolicy()
random_process = OrnsteinUhlenbeckProcess(size=nb_actions, theta=0.6, mu=0, sigma=0.3)
agent = DDPGAgent(nb_actions=nb_actions, actor=actor, critic=critic, critic_action_input=action_input,
                  memory=memory, nb_steps_warmup_critic=2000, nb_steps_warmup_actor=10000,
                  random_process=random_process, gamma=.99, target_model_update=1e-3)
agent.compile(Adam(lr=1e-3, clipnorm=1.), metrics=['mae'])
Finally, the agent is trained:
import pickle

filename = 'mem20k_heaviside_flattening'
hist = agent.fit(env, nb_steps=10, visualize=False, verbose=2, nb_max_episode_steps=5)
with open('./history_dqn_test_' + filename + '.pickle', 'wb') as handle:
    pickle.dump(hist.history, handle, protocol=pickle.HIGHEST_PROTOCOL)
agent.save_weights('h5f_files/dqn_{}_weights.h5f'.format(filename), overwrite=True)
Now here is the catch: the agent seems to always be stuck in the same neighborhood of output values across all episodes for the same instance of my env:
The cumulative reward is negative since I only allow the agent to get negative rewards. I took this from https://github.com/openai/gym/blob/master/gym/envs/robotics/fetch_env.py, which is part of OpenAI's code, as an example.
Across one episode, I should get varying sets of actions converging towards a (cutoff_final, offset_final) that would get my input step signal close to my output flat signal, which is clearly not the case. In addition, I thought that, for successive episodes, I should get different actions.
I wonder why the actor and critic nets need an input with an additional dimension, in input_shape=(1,) + env.observation_space.shape
I think the GoalEnv is designed with HER (Hindsight Experience Replay) in mind, since it will use the "sub-spaces" inside the observation_space to learn from sparse reward signals (there is a paper on the OpenAI website that explains how HER works). I haven't looked at the implementation, but my guess is that there needs to be an additional input since HER also processes the "goal" parameter.
Since it seems you are not using HER (it works with any off-policy algorithm, including DQN, DDPG, etc.), you should handcraft an informative reward function (i.e. rewards are not binary, e.g. 1 if the objective is achieved, 0 otherwise) and use the base Env class. The reward should be calculated inside the step method; since rewards in MDPs are functions like r(s, a, s'), you will probably have all the information you need there. Hope it helps.
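To illustrate that suggestion, here is a minimal sketch of a plain gym.Env with a shaped (non-binary) reward computed inside step(); the class name and the placeholder action handling are made up for the example, not the poster's actual processing:
import gym
import numpy as np
from gym import spaces

class SimpleStepSignal(gym.Env):
    def __init__(self, input_signal, sample_rate, desired_signal):
        super(SimpleStepSignal, self).__init__()
        self.initial_signal = input_signal
        self.signal = input_signal.copy()
        self.sample_rate = sample_rate
        self.desired_signal = desired_signal
        # normalized 2-D action (e.g. cutoff and offset), rescaled inside step()
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Box(-np.inf, np.inf,
                                            shape=input_signal.shape, dtype=np.float32)

    def step(self, action):
        # ... apply the rescaled action (filter + offset) to self.signal here ...
        obs = self.signal.copy()
        # shaped reward: negative distance between current and desired signal
        reward = -np.linalg.norm(self.desired_signal - self.signal)
        done = False
        return obs, reward, done, {}

    def reset(self):
        self.signal = self.initial_signal.copy()
        return self.signal.copy()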

Octave: Plotyy log files from geothermal heat pump (import/plot datetime)

I'm trying to plot values from my geothermal heat pump log files to analyse its performance. I tried with Excel but it was too slow and it was not possible to get the plot type I wanted, so I'm trying Octave instead. I have absolutely no experience with Octave, so please forgive my incompetence!
I've processed the .log files with OpenOffice Calc to get them into a decent delimited format. The first column is datetime in the format MM/DD/YYYY HH:MM:SS; in total there are 21 columns (but I only need 5) and one header line with labels. The decimal mark is '.' and the delimiter is ','. The file can be downloaded here and the first 7 columns look like this:
02/19/2018 23:07:00,-0.7,47.5,42,47.3,52.1,1.5
I'm currently trying to plot this with demonstration 3 of plotyy from here. Columns 2, 3, 5 and 8 import correctly, so I'm figuring it's a problem with the datetime column 1. How can I get Octave to import column 1 correctly and use it as the x axis in this plot?
data=csvread('heatpump.csv');
clf;
hold on
t=data(:,1);
x=data(:,3);
y=data(:,5);
z=data(:,2);
o=data(:,8);
[hax, h1, h2] = plotyy (t, x, t, y);
[~, h3, h4] = plotyy (t, z, t, o);
set ([h3, h4], "linestyle", "--");
xlabel (hax(1), "Time");
title (hax(2), 'Heat pump analysis');
ylabel (hax(1), "Radiator and hot water temp");
ylabel (hax(2), "Outdoor temp and brine out");
There are many, many ways. Here I show you how to read the csv using csv2cell from the io package. I've tried to modify your existing code as little as is sane. The first column is used verbatim (well, I inserted a line break) in the plot. There is also a commented version which actually does the conversion, and you could then use datetick. Btw, if you add Google Drive links it would be cool to add direct links so someone can easily grab the csv or insert the URL in the code, as I've done below.
set (0, "defaultlinelinewidth", 2);
url = "https://drive.google.com/uc?export=download&id=1K_czefz-Wz4HPdvc7YqIqIupPwMi8a7r";
fn = "heatpump.csv";
if (! exist (fn, "file"))
  urlwrite (url, fn);
endif
pkg load io
d = csv2cell (fn);
# convert to serial date
# (but you don't have if you want to keep the old format)
#t = datenum (d(2:end,1), "mm/dd/yyyy HH:MM:SS");
data = cell2mat (d(2:end,2:end));
clf;
hold on
t = 1:rows (data);
# Attention: the date/time column was removed above, so the indices are shifted
x = data(:,2);
y = data(:,4);
z = data(:,1);
o = data(:,7);
[hax, h1, h2] = plotyy (t, x, t, y);
[hax2, h3, h4] = plotyy (t, z, t, o);
grid on
#set ([h3, h4], "linestyle", "--");
xlabel (hax(1), "Time");
title (hax(2), 'Heat pump analysis');
ylabel (hax(1), "Radiator and hot water temp");
ylabel (hax(2), "Outdoor temp and brine out");
# use date as xtick
# extract them
date_time = d (get(hax2(1), "xtick"), 1);
# break them after the date part
date_time = strrep (date_time, " ", "\n");
# feed them back
set (hax, "xticklabel", date_time)
set (hax2, "xticklabel", date_time)
print ("-S1200,1000", "-F:10", "out.png")

How do I feed in my own data into PyAlgoTrade?

I'm trying to use PyAlgoTrade's event profiler.
However, I don't want to use data from Yahoo! Finance; I want to use my own, but I can't figure out how to parse in the CSV. It is in the format:
Timestamp Low Open Close High BTC_vol USD_vol [8] [9]
2013-11-23 00 800 860 847.666666 886.876543 853.833333 6195.334452 5248330 0
2013-11-24 00 745 847.5 815.01 860 831.255 10785.94131 8680720 0
The complete CSV is here
I want to do something like:
def main(plot):
    instruments = ["AA", "AES", "AIG"]
    feed = yahoofinance.build_feed(instruments, 2008, 2009, ".")
Then replace yahoofinance.build_feed(instruments, 2008, 2009, ".") with my CSV
I tried:
import csv

with open( 'FinexBTCDaily.csv', 'rb' ) as csvfile:
    data = csv.reader( csvfile )

def main( plot ):
    feed = data
But it throws an attribute error. Any ideas how to do this?
I suggest creating your own RowParser and Feed, which is much easier than it sounds; have a look here: yahoofeed
This also allows you to work with intraday data and to clean up the data if needed, like your timestamp.
Another possibility, of course, would be to parse your file and save it so that it looks like a Yahoo! feed. In your case, you would have to adapt the columns and the timestamp.
Step A: follow PyAlgoTrade doc on GenericBarFeed class
On this link see the addBarsFromCSV() in CSV section of the BarFeed class in v0.16
On this link see the addBarsFromCSV() in CSV section of the BarFeed class in v0.17
Note
- The CSV file must have the column names in the first row.
- It is ok if the Adj Close column is empty.
- When working with multiple instruments:
--- If all the instruments loaded are in the same timezone, then the timezone parameter may not be specified.
--- If any of the instruments loaded are in different timezones, then the timezone parameter should be set.
addBarsFromCSV( instrument, path, timezone = None )
Loads bars for a given instrument from a CSV formatted file. The instrument gets registered in the bar feed.
Parameters:
(string) instrument – Instrument identifier.
(string) path – The path to the CSV file.
(pytz) timezone – The timezone to use to localize bars. Check pyalgotrade.marketsession.
Next:
A BarFeed loads bars from CSV files that have the following format:
Date Time, Open, High, Low, Close, Volume, Adj Close
2013-01-01 13:59:00,13.51001,13.56,13.51,13.56789,273.88014126,13.51001
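For example, a minimal sketch of loading a CSV that already follows this format with the GenericBarFeed; the instrument name and the file path are placeholders:
from pyalgotrade.bar import Frequency
from pyalgotrade.barfeed import csvfeed

# build a daily bar feed and register the instrument from the pre-formatted CSV
feed = csvfeed.GenericBarFeed(Frequency.DAY)
feed.addBarsFromCSV("BTC", "FinexBTCDaily_formatted.csv")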
Step B: implement a documented CSV-file pre-formatting
Your CSV data will need a bit of sanitising (before it will be able to be used in PyAlgoTrade methods); however, it is doable, and you can create an easy transformer either by hand or with the powerful lambda-based converters of numpy.genfromtxt().
This sample code is intended for illustration purposes, so you can immediately see the power of the converters for your own transformations, as CSV structures differ.
with open( getCsvFileNAME( ... ), "r" ) as aFH:
    numpy.genfromtxt( aFH,
                      skip_header = 1,     # Ref. pyalgotrade
                      delimiter   = ",",
                      # date       time  open    high    low     close   volume
                      # 2011.08.30,12:00,1791.20,1792.60,1787.60,1789.60,835
                      # 2011.08.30,13:00,1789.70,1794.30,1788.70,1792.60,550
                      # 2011.08.30,14:00,1792.70,1816.70,1790.20,1812.10,1222
                      # 2011.08.30,15:00,1812.20,1831.50,1811.90,1824.70,2373
                      # 2011.08.30,16:00,1824.80,1828.10,1813.70,1817.90,2215
                      converters  = { 0: lambda aString: mPlotDATEs.date2num( datetime.datetime.strptime( aString, "%Y.%m.%d" ) ),  # date as float ( 1.0, +++ )
                                      1: lambda aString: ( ( int( aString[0:2] ) * 60 + int( aString[3:] ) ) / 60. / 24. )          # HH:MM, e.g. ( 15*60 + 00 ) / 60. / 24., as float in < 0.0, 1.0 )
                                      }
                      )
You can use pyalgotrade.barfeed's addBarsFromSequence with a list comprehension to feed in data from a CSV row by row / bar by bar. Basically you create a bar from each row, pass OHLCV as init parameters and extra columns with additional data in a dictionary. You can try something like this (with all the required imports):
import numpy as np
import pandas as pd
from pyalgotrade.bar import BasicBar, Frequency
from pyalgotrade.barfeed import yahoofeed

data = pd.DataFrame(index=pd.date_range(start='2021-11-01', end='2021-11-05'),
                    columns=['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume',
                             'ExtraCol1', 'ExtraCol3', 'ExtraCol4', 'ExtraCol5'],
                    data=np.random.rand(5, 10))

feed = yahoofeed.Feed()
feed.addBarsFromSequence('instrumentID', data.index.map(lambda i:
    BasicBar(
        i,
        data.loc[i, 'Open'],
        data.loc[i, 'High'],
        data.loc[i, 'Low'],
        data.loc[i, 'Close'],
        data.loc[i, 'Volume'],
        data.loc[i, 'Adj Close'],
        Frequency.DAY,
        data.loc[i, 'ExtraCol1':].to_dict())
    ).values)
The input data frame was created with random values to make this example easier to reproduce, but the part where the bars are added to the feed should work the same for data frames from CSVs given that the valid column names are used.