Plot timestamp on x-axis from CSV file - octave

I'm trying to make a plot with datetimes on the x-axis in Octave. Datetime is a timestamp in "yyyy-mm-dd HH:MM" format. I used scan2cell function to read from CSV file, data are printed in array but cannot be plotted - It shows error: plot: no data to plot
%code:
filename = 'Test_12.csv';
data = csv2cell(filename);
time = data (:,2);
y = data (:,3);
plot(time,y)
Note: CSVread function prints datetime in imaginary format!
2021 - 3i
2021 - 3i
2021 - 3i

Given th following Test_12.csv file:
No,Time,timeToFix
1,2021-03-09 20:02,51
2,2021-03-09 20:14,87
3,2021-03-09 20:16,82
4,2000-01-01 0:03,44
5,2021-03-09 20:21,34
6,2021-03-09 20:31,34
You can do the following:
pkg load io % provides the `csv2cell` function
Filename = 'Test_12.csv';
CsvContents = csv2cell( Filename );
Headers = CsvContents(1 , :); % First row contains headers
Data = CsvContents(2:end, :); % Followed by data
% Convert values in cells to desired array format in each case
Time = cellfun( #(s) datenum( s, 'yyyy-mm-dd HH:MM' ), Data(:,2) );
TimeToFix = cell2mat( Data(:,3) );
% Plot
P = plot( Time, TimeToFix, 'o' );
% Touch up plot a tiny bit to make it look nicer
set( P, 'markerfacecolor', [1,0.5,0.5], 'markeredgecolor', [0.5,0.25,0.25], 'linewidth', 2, 'markersize', 10 )
grid on
datetick( 'yyyy-mm-dd' ) % Convert 'datenum' values back to appropriate datestrings
set( gca, 'color', [0.8,0.8,0.8], 'gridcolor', 'w', 'gridlinestyle', ':', 'gridalpha', 1 )

Related

How can I set the numbering of the x-axis of an Octave plot to engineering notation?

I made a very simple Octave script
a = [10e6, 11e6, 12e6];
b = [10, 11, 12];
plot(a, b, 'rd-')
which outputs the following graph.
Graph
Is it possible to set the numbering on the x-axis to engineering notation, rather than scientific, and have it display "10.5e+6, 11e+6, 11.5e+6" instead of "1.05e+7, 1.1e+7, 1.15+e7"?
While octave provides a 'short eng' formatting option, which does what you're asking for in terms of printing to the terminal, it does not appear to provide this functionality in plots or when formatting strings via sprintf.
Therefore you'll have to find a way to do this by yourself, with some creative string processing of the initial xticks, and substituting the plot's ticklabels accordingly. Thankfully it's not that hard :)
Using your example:
a = [10e6, 11e6, 12e6];
b = [10, 11, 12];
plot(a, b, 'rd-')
format short eng % display stdout in engineering format
TickLabels = disp( xticks ) % collect string as it would be displayed on the stdout
TickLabels = strsplit( TickLabels ) % tokenize at spaces
TickLabels = TickLabels( 2 : end - 1 ) % discard start and end empty tokens
TickLabels = regexprep( TickLabels, '\.0+e', 'e' ) % remove purely zero decimals using a regular expression
TickLabels = regexprep( TickLabels, '(\.[1-9]*)0+e', '$1e' ) % remove non-significant zeros in non-zero decimals using a regular expression
xticklabels( TickLabels ) % set the new ticklabels to the plot
format % reset short eng format back to default, if necessary

Is there a way to take a list of strings and create a JSON file, where both the key and value are list items?

I am creating a python script that can read scanned, and tabular .pdfs and extract some important data and insert it into a JSON to later be implemented into a SQL database (I will also be developing the DB as a project for learning MongoDB).
Basically, my issue is I have never worked with any JSON files before but that was the format I was recommended to output to. The scraping script works, the pre-processing could be a lot cleaner, but for now it works. The issue I run into is the keys, and values are in the same list, and some of the values because they had a decimal point are two different list items. Not really sure where to even start.
I don't really know where to start, I suppose since I know what the indexes of the list are I can easily assign keys and values, but then it may not be applicable to any .pdf, that is the script cannot be coded explicitly.
import PyPDF2 as pdf2
import textract
with "TestSpec.pdf" as filename:
pdfFileObj = open(filename, 'rb')
pdfReader = pdf2.pdfFileReader(pdfFileObj)
num_pages = pdfReader.numpages
count = 0
text = ""
while count < num_pages:
pageObj = pdfReader.getPage(0)
count += 1
text += pageObj.extractText()
if text != "":
text = text
else:
text = textract.process(filename, method="tesseract", language="eng")
def cleanText(x):
'''
This function takes the byte data extracted from scanned PDFs, and cleans it of all
unnessary data.
Requires re
'''
stringedText = str(x)
cleanText = stringedText.replace('\n','')
splitText = re.split(r'\W+', cleanText)
caseingText = [word.lower() for word in splitText]
cleanOne = [word for word in caseingText if word != 'n']
dexStop = cleanOne.index("od260")
dexStart = cleanOne.index("sheet")
clean = cleanOne[dexStart + 1:dexStop]
return clean
cleanText = cleanText(text)
This is the current output
['n21', 'feb', '2019', 'nsequence', 'lacz', 'rp', 'n5', 'gat', 'ctc', 'tac', 'cat', 'ggc', 'gca', 'cat', 'ttc', 'ccc', 'gaa', 'aag', 'tgc', '3', 'norder', 'no', '15775199', 'nref', 'no', '207335463', 'n25', 'nmole', 'dna', 'oligo', '36', 'bases', 'nproperties', 'amount', 'of', 'oligo', 'shipped', 'to', 'ntm', '50mm', 'nacl', '66', '8', 'xc2', 'xb0c', '11', '0', '32', '6', 'david', 'cook', 'ngc', 'content', '52', '8', 'd260', 'mmoles', 'kansas', 'state', 'university', 'biotechno', 'nmolecular', 'weight', '10', '965', '1', 'nnmoles']
and we want the output as a JSON setup like
{"Date | 21feb2019", "Sequence ID: | lacz-rp", "Sequence 5'-3' | gat..."}
and so on. Just not sure how to do that.
here is a screenshot of the data from my sample pdf
So, i have figured out some of this. I am still having issues with grabbing the last 3rd of the data i need without explicitly programming it in. but here is what i have so far. Once i have everything working then i will worry about optimizing it and condensing.
# for PDF reading
import PyPDF2 as pdf2
import textract
# for data preprocessing
import re
from dateutil.parser import parse
# For generating the JSON file array
import json
# This finds and opens the pdf file, reads the data, and extracts the data.
filename = "*.pdf"
pdfFileObj = open(filename, 'rb')
pdfReader = pdf2.PdfFileReader(pdfFileObj)
text = ""
pageObj = pdfReader.getPage(0)
text += pageObj.extractText()
# checks if extracted data is in string form or picture, if picture textract reads data.
# it then closes the pdf file
if text != "":
text = text
else:
text = textract.process(filename, method="tesseract", language="eng")
pdfFileObj.close()
# Converts text to string from byte data for preprocessing
stringedText = str(text)
# Removed escaped lines and replaced them with actual new lines.
formattedText = stringedText.replace('\\n', '\n').lower()
# Slices the long string into a workable piece (only contains useful data)
slice1 = formattedText[(formattedText.index("sheet") + 10): (formattedText.index("secondary") - 2)]
clean = re.sub('\n', " ", slice1)
clean2 = re.sub(' +', ' ', clean)
# Creating the PrimerData dictionary
with open("PrimerData.json",'w') as file:
primerDataSlice = clean[clean.index("molecular"): -1]
primerData = re.split(": |\n", primerDataSlice)
primerKeys = primerData[0::2]
primerValues = primerData[1::2]
primerDict = {"Primer Data": dict(zip(primerKeys,primerValues))}
# Generatring the JSON array "Primer Data"
primerJSON = json.dumps(primerDict, ensure_ascii=False)
file.write(primerJSON)
# Grabbing the date (this has just the date, so json will have to add date.)
date = re.findall('(\d{2}[\/\- ](\d{2}|january|jan|february|feb|march|mar|april|apr|may|may|june|jun|july|jul|august|aug|september|sep|october|oct|november|nov|december|dec)[\/\- ]\d{2,4})', clean2)
Without input data it is difficult to give you working code. A minimal working example with input would help. As for JSON handling, python dictionaries can dump to json easily. See examples here.
https://docs.python-guide.org/scenarios/json/
Get a json string from a dictionary and write to a file. Figure out how to parse the text into a dictionary.
import json
d = {"Date" : "21feb2019", "Sequence ID" : "lacz-rp", "Sequence 5'-3'" : "gat"}
json_data = json.dumps(d)
print(json_data)
# Write that data to a file
So, I did figure this out, the problem was really just that because of the way my pre-processing was pulling all the data into a single list wasn't really that great of an idea considering that the keys for the dictionary never changed.
Here is the semi-finished result for making the Dictionary and JSON file.
# Collect the sequence name
name = clean2[clean2.index("Sequence") + 11: clean2.index("Sequence") + 19]
# Collecting Shipment info
ordered = input("Who placed this order? ")
received = input("Who is receiving this order? ")
dateOrder = re.findall(
r"(\d{2}[/\- ](\d{2}|January|Jan|February|Feb|March|Mar|April|Apr|May|June|Jun|July|Jul|August|Aug|September|Sep|October|Oct|November|Nov|December|Dec)[/\- ]\d{2,4})",
clean2)
dateReceived = date.today()
refNo = clean2[clean2.index("ref.No. ") + 8: clean2.index("ref.No.") + 17]
orderNo = clean2[clean2.index("Order No.") +
10: clean2.index("Order No.") + 18]
# Finding and grabbing the sequence data. Storing it and then finding the
# GC content and melting temp or TM
bases = int(clean2[clean2.index("bases") - 3:clean2.index("bases") - 1])
seqList = [line for line in clean2 if re.match(r'^[AGCT]+$', line)]
sequence = "".join(i for i in seqList[:bases])
def gc_content(x):
count = 0
for i in x:
if i == 'G' or i == 'C':
count += 1
else:
count = count
return round((count / bases) * 100, 1)
gc = gc_content(sequence)
tm = mt.Tm_GC(sequence, Na=50)
moleWeight = round(mw(Seq(sequence, generic_dna)), 2)
dilWeight = float(clean2[clean2.index("ug/OD260:") +
10: clean2.index("ug/OD260:") + 14])
dilution = dilWeight * 10
primerDict = {"Primer Data": {
"Sequence": sequence,
"Bases": bases,
"TM (50mM NaCl)": tm,
"% GC content": gc,
"Molecular weight": moleWeight,
"ug/0D260": dilWeight,
"Dilution volume (uL)": dilution
},
"Shipment Info": {
"Ref. No.": refNo,
"Order No.": orderNo,
"Ordered by": ordered,
"Date of Order": dateOrder,
"Received By": received,
"Date Received": str(dateReceived.strftime("%d-%b-%Y"))
}}
# Generating the JSON array "Primer Data"
with open("".join(name) + ".json", 'w') as file:
primerJSON = json.dumps(primerDict, ensure_ascii=False)
file.write(primerJSON)

Octave: Plotyy log files from geothermal heat pump (import/plot datetime)

I'm trying to plot values from my geothermal heat pump log files to analyse it's performance. I tried with excel but it was to slow and not possible to get the plot type I wanted so I'm trying Octave instead. I have absolutely no experience with octave so please forgive my incompetence!
I've processed the .log files with open office calc to get into a decent delimited format. The first column is datetime with the format MM/DD/YY HH:MM:SS, in total there is 21 columns (but I only need 5) and one header line with a label, coma delimiter is '.' and delimiter is ','. The file can be downloaded here and the first 7 columns look like this:
02/19/2018 23:07:00,-0.7,47.5,42,47.3,52.1,1.5
I'm currently trying to plot this with demonstration 3 plotyy from here. Column 2, 3, 5 and 8 imports correctly so I'm figuring it's a problem with the datetime column 1. How can I get Octave to import column 1 correctly and use it as x axis in this plot?:
data=csvread('heatpump.csv');
clf;
hold on
t=data(:,1);
x=data(:,3);
y=data(:,5);
z=data(:,2);
o=data(:,8);
[hax, h1, h2] = plotyy (t, x, t, y);
[~, h3, h4] = plotyy (t, z, t, o);
set ([h3, h4], "linestyle", "--");
xlabel (hax(1), "Time");
title (hax(2), 'Heat pump analysis');
ylabel (hax(1), "Radiator and hot water temp");
ylabel (hax(2), "Outdoor temp and brine out");
There are many, many ways. Here I show you how to read the csv using csv2cell from the io package. I've tried to modify your existing code as less as sane. The first columns is used verbatim (well, I inserted a linebreak) to the plot. There is also a commented version which actually does the conversion and you could then use datetick. Btw, If you add google drive links it would be cool if you add direct links so someone can easily grab the csv or insert the url in the code as I've done, see below.
set (0, "defaultlinelinewidth", 2);
url = "https://drive.google.com/uc?export=download&id=1K_czefz-Wz4HPdvc7YqIqIupPwMi8a7r";
fn = "heatpump.csv";
if (! exist (fn, "file"))
urlwrite (url, fn);
endif
pkg load io
d = csv2cell (fn);
# convert to serial date
# (but you don't have if you want to keep the old format)
#t = datenum (d(2:end,1), "mm/dd/yyyy HH:MM:SS");
data = cell2mat (d(2:end,2:end));
clf;
hold on
t = 1:rows (data);
# Attention: the date/time column time was removed above, so the indizes are shifted
x = data(:,2);
y = data(:,4);
z = data(:,1);
o = data(:,7);
[hax, h1, h2] = plotyy (t, x, t, y);
[hax2, h3, h4] = plotyy (t, z, t, o);
grid on
#set ([h3, h4], "linestyle", "--");
xlabel (hax(1), "Time");
title (hax(2), 'Heat pump analysis');
ylabel (hax(1), "Radiator and hot water temp");
ylabel (hax(2), "Outdoor temp and brine out");
# use date as xtick
# extract them
date_time = d (get(hax2(1), "xtick"), 1);
# break them after the date part
date_time = strrep (date_time, " ", "\n");
# feed them back
set (hax, "xticklabel", date_time)
set (hax2, "xticklabel", date_time)
print ("-S1200,1000", "-F:10", "out.png")

Shapefile with overlapping polygons: calculate average values

I have a very big polygon shapefile with hundreds of features, often overlapping each other. Each of these features has a value stored in the attribute table. I simply need to calculate the average values in the areas where they overlap.
I can imagine that this task requires several intricate steps: I was wondering if there is a straightforward methodology.
I’m open to every kind of suggestion, I can use ArcMap, QGis, arcpy scripts, PostGis, GDAL… I just need ideas. Thanks!
You should use the Union tool from ArcGIS. It will create new polygons where the polygons overlap. In order to keep the attributes from both polygons, add your polygon shapefile twice as input and use ALL as join_attributes parameter.This creates also polygons intersecting with themselves, you can select and delete them easily as they have the same FIDs. Then just add a new field to the attribute table and calculate it based on the two original value fields from the input polygons.
This can be done in a script or directly with the toolbox's tools.
After few attempts, I found a solution by rasterising all the features singularly and then performing cell statistics in order to calculate the average.
See below the script I wrote, please do not hesitate to comment and improve it!
Thanks!
#This script processes a shapefile of snow persistence (area of interest: Afghanistan).
#the input shapefile represents a month of snow cover and contains several features.
#each feature represents a particular day and a particular snow persistence (low,medium,high,nodata)
#these features are polygons multiparts, often overlapping.
#a feature of a particular day can overlap a feature of another one, but features of the same day and with
#different snow persistence can not overlap each other.
#(potentially, each shapefile contains 31*4 feature).
#the script takes the features singularly and exports each feature in a temporary shapefile
#which contains only one feature.
#Then, each feature is converted to raster, and after
#a logical conditional expression gives a value to the pixel according the intensity (high=3,medium=2,low=1,nodata=skipped).
#Finally, all these rasters are summed and divided by the number of days, in order to
#calculate an average value.
#The result is a raster with the average snow persistence in a particular month.
#This output raster ranges from 0 (no snow) to 3 (persistent snow for the whole month)
#and values outside this range should be considered as small errors in pixel overlapping.
#This script needs a particular folder structure. The folder C:\TEMP\Afgh_snow_cover contains 3 subfolders
#input, temp and outputs. The script takes care automatically of the cleaning of temporary data
import arcpy, numpy, os
from arcpy.sa import *
from arcpy import env
#function for finding unique values of a field in a FC
def unique_values_in_table(table, field):
data = arcpy.da.TableToNumPyArray(table, [field])
return numpy.unique(data[field])
#check extensions
try:
if arcpy.CheckExtension("Spatial") == "Available":
arcpy.CheckOutExtension("Spatial")
else:
# Raise a custom exception
#
raise LicenseError
except LicenseError:
print "spatial Analyst license is unavailable"
except:
print arcpy.GetMessages(2)
finally:
# Check in the 3D Analyst extension
#
arcpy.CheckInExtension("Spatial")
# parameters and environment
temp_folder = r"C:\TEMP\Afgh_snow_cover\temp_rasters"
output_folder = r"C:\TEMP\Afgh_snow_cover\output_rasters"
env.workspace = temp_folder
unique_field = "FID"
field_Date = "DATE"
field_Type = "Type"
cellSize = 0.02
fc = r"C:\TEMP\Afgh_snow_cover\input_shapefiles\snow_cover_Dec2007.shp"
stat_output_name = fc[-11:-4] + ".tif"
#print stat_output_name
arcpy.env.extent = "MAXOF"
#find all the uniquesID of the FC
uniqueIDs = unique_values_in_table(fc, "FID")
#make layer for selecting
arcpy.MakeFeatureLayer_management (fc, "lyr")
#uniqueIDs = uniqueIDs[-5:]
totFeatures = len(uniqueIDs)
#for each feature, get the date and the type of snow persistence(type can be high, medium, low and nodata)
for i in uniqueIDs:
SC = arcpy.SearchCursor(fc)
for row in SC:
if row.getValue(unique_field) == i:
datestring = row.getValue(field_Date)
typestring = row.getValue(field_Type)
month = str(datestring.month)
day = str(datestring.day)
year = str(datestring.year)
#format month and year string
if len(month) == 1:
month = '0' + month
if len(day) == 1:
day = '0' + day
#convert snow persistence to numerical value
if typestring == 'high':
typestring2 = 3
if typestring == 'medium':
typestring2 = 2
if typestring == 'low':
typestring2 = 1
if typestring == 'nodata':
typestring2 = 0
#skip the NoData features, and repeat the following for each feature (a feature is a day and a persistence value)
if typestring2 > 0:
#create expression for selecting the feature
expression = ' "FID" = ' + str(i) + ' '
#select the feature
arcpy.SelectLayerByAttribute_management("lyr", "NEW_SELECTION", expression)
#create
#outFeatureClass = os.path.join(temp_folder, ("M_Y_" + str(i)))
#create faeture class name, writing the snow persistence value at the end of the name
outFeatureClass = "Afg_" + str(year) + str(month) + str(day) + "_" + str(typestring2) + '.shp'
#export the feature
arcpy.FeatureClassToFeatureClass_conversion("lyr", temp_folder, outFeatureClass)
print "exported FID " + str(i) + " \ " + str(totFeatures)
#create name of the raster and convert the newly created feature to raster
outRaster = outFeatureClass[4:-4] + ".tif"
arcpy.FeatureToRaster_conversion(outFeatureClass, field_Type, outRaster, cellSize)
#remove the temporary fc
arcpy.Delete_management(outFeatureClass)
del SC, row
#now many rasters are created, representing the snow persistence types of each day.
#list all the rasters created
rasterList = arcpy.ListRasters("*", "All")
print rasterList
#now the rasters have values 1 and 0. the following loop will
#perform CON expressions in order to assign the value of snow persistence
for i in rasterList:
print i + ":"
inRaster = Raster(i)
#set the value of snow persistence, stored in the raster name
value_to_set = i[-5]
inTrueRaster = int(value_to_set)
inFalseConstant = 0
whereClause = "Value > 0"
# Check out the ArcGIS Spatial Analyst extension license
arcpy.CheckOutExtension("Spatial")
print 'Executing CON expression and deleting input'
# Execute Con , in order to assign to each pixel the value of snow persistence
print str(inTrueRaster)
try:
outCon = Con(inRaster, inTrueRaster, inFalseConstant, whereClause)
except:
print 'CON expression failed (probably empty raster!)'
nameoutput = i[:-4] + "_c.tif"
outCon.save(nameoutput)
#delete the temp rasters with values 0 and 1
arcpy.Delete_management(i)
#list the raster with values of snow persistence
rasterList = arcpy.ListRasters("*_c.tif", "All")
#sum the rasters
print "Caclulating SUM"
outCellStats = CellStatistics(rasterList, "SUM", "DATA")
#calculate the number of days (num of rasters/3)
print "Calculating day ratio"
num_of_rasters = len(rasterList)
print 'Num of rasters : ' + str(num_of_rasters)
num_of_days = num_of_rasters / 3
print 'Num of days : ' + str(num_of_days)
#in order to store decimal values, multiplicate the raster by 1000 before dividing
outCellStats = outCellStats * 1000 / num_of_days
#save the output raster
print "saving output " + stat_output_name
stat_output_name = os.path.join(output_folder,stat_output_name)
outCellStats.save(stat_output_name)
#delete the remaining temporary rasters
print "deleting CON rasters"
for i in rasterList:
print "deleting " + i
arcpy.Delete_management(i)
arcpy.Delete_management("lyr")
Could you rasterize your polygons into multiple layers, each pixel could contain your attribute value. Then merge the layers by averaging the attribute values?

How do I feed in my own data into PyAlgoTrade?

I'm trying to use PyAlogoTrade's event profiler
However I don't want to use data from yahoo!finance, I want to use my own but can't figure out how to parse in the CSV, it is in the format:
Timestamp Low Open Close High BTC_vol USD_vol [8] [9]
2013-11-23 00 800 860 847.666666 886.876543 853.833333 6195.334452 5248330 0
2013-11-24 00 745 847.5 815.01 860 831.255 10785.94131 8680720 0
The complete CSV is here
I want to do something like:
def main(plot):
instruments = ["AA", "AES", "AIG"]
feed = yahoofinance.build_feed(instruments, 2008, 2009, ".")
Then replace yahoofinance.build_feed(instruments, 2008, 2009, ".") with my CSV
I tried:
import csv
with open( 'FinexBTCDaily.csv', 'rb' ) as csvfile:
data = csv.reader( csvfile )
def main( plot ):
feed = data
But it throws an attribute error. Any ideas how to do this?
I suggest to create your own Rowparser and Feed, which is much easier than it sounds, have a look here: yahoofeed
This also allows you to work with intraday data and cleanup the data if needed, like your timestamp.
Another possibility, of course, would be to parse your file and save it, so it looks like a yahoo feed. In your case, you would have to adapt the columns and the Timestamp.
Step A: follow PyAlgoTrade doc on GenericBarFeed class
On this link see the addBarsFromCSV() in CSV section of the BarFeed class in v0.16
On this link see the addBarsFromCSV() in CSV section of the BarFeed class in v0.17
Note
- The CSV file must have the column names in the first row.
- It is ok if the Adj Close column is empty.
- When working with multiple instruments:
--- If all the instruments loaded are in the same timezone, then the timezone parameter may not be specified.
--- If any of the instruments loaded are in different timezones, then the timezone parameter should be set.
addBarsFromCSV( instrument, path, timezone = None )
Loads bars for a given instrument from a CSV formatted file. The instrument gets registered in the bar feed.
Parameters:
(string) instrument – Instrument identifier.
(string) path – The path to the CSV file.
(pytz) timezone – The timezone to use to localize bars.Check pyalgotrade.marketsession.
Next:
A BarFeed loads bars from CSV files that have the following format:
Date Time, Open, High, Low, Close, Volume, Adj Close
2013-01-01 13:59:00,13.51001,13.56,13.51,13.56789,273.88014126,13.51001
Step B: implement a documented CSV-file pre-formatting
Your CSV data will need a bit of sanity ( before will be able to be used in PyAlgoTrade methods ),however it is doable and you can create an easy transformator either by hand or with a use of a powerful numpy.genfromtxt() lambda-based converters facilities.
This sample code is intended for an illustration purpose, to see immediately the powers of converters for your own transformations, as CSV-structure differs.
with open( getCsvFileNAME( ... ), "r" ) as aFH:
numpy.genfromtxt( aFH,
skip_header = 1, # Ref. pyalgotrade
delimiter = ",",
# v v v v v v
# 2011.08.30,12:00,1791.20,1792.60,1787.60,1789.60,835
# 2011.08.30,13:00,1789.70,1794.30,1788.70,1792.60,550
# 2011.08.30,14:00,1792.70,1816.70,1790.20,1812.10,1222
# 2011.08.30,15:00,1812.20,1831.50,1811.90,1824.70,2373
# 2011.08.30,16:00,1824.80,1828.10,1813.70,1817.90,2215
converters = { 0: lambda aString: mPlotDATEs.date2num( datetime.datetime.strptime( aString, "%Y.%m.%d" ) ), #_______________________________________asFloat ( 1.0, +++ )
1: lambda aString: ( ( int( aString[0:2] ) * 60 + int( aString[3:] ) ) / 60. / 24. ) # ( 15*60 + 00 ) / 60. / 24.__asFloat < 0.0, 1.0 )
# HH: :MM HH MM
}
)
You can use pyalgotrade.barfeed.addBarsFromSequence with list comprehension to feed in data from CSV row by row/bar by bar. Basically you create a bar from each row, pass OHLCV as init parameters and extra columns with additional data in a dictionary. You can try something like this (with all the required imports):
data = pd.DataFrame(index=pd.date_range(start='2021-11-01', end='2021-11-05'), columns=['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume', 'ExtraCol1', 'ExtraCol3', 'ExtraCol4', 'ExtraCol5'], data=np.random.rand(5, 10))
feed = yahoofeed.Feed()
feed.addBarsFromSequence('instrumentID', data.index.map(lambda i:
BasicBar(
i,
data.loc[i, 'Open'],
data.loc[i, 'High'],
data.loc[i, 'Low'],
data.loc[i, 'Close'],
data.loc[i, 'Volume'],
data.loc[i, 'Adj Close'],
Frequency.DAY,
data.loc[i, 'ExtraCol1':].to_dict())
).values)
The input data frame was created with random values to make this example easier to reproduce, but the part where the bars are added to the feed should work the same for data frames from CSVs given that the valid column names are used.