Running timeseries graphing function in Rmd producing cluttered x-axis labels (not present in test code) - function

I have a folder of xx .csv timeseries that I want to graph and knit into a clean HTML document. I have a ggplot code that produces the plot that I want using a single timeseries.csv. However, when I try to put the bones of that ggplot code in a function inside of a for loop to run each of the timeseries.csv files through the function I get a some plots with pretty different formatting.
Plot generated with my test ggplot code:
Plot generated with function and for loop:
Changes I'm trying to make to the ugly Rmd plot:
Nicely space the x-axis tick marks to whole mins (i.e. "11:14:00", "11:15:00")
Connect the data points (solved with subbing geom_line() with geom_path())
Example Rmd Code Below. Please Note that the graphs produced still have nice formatting, I'm not sure how to reproduce this problem sort of posting a 500 row dataframe. I also don't know how to post my rmd code without SO using the formatting commands in this post, so I threw in at 3 of " around my header formatting and at the end of the code to disable it.
Edits and Updates
I am getting a persistent error geom_path: Each group consists of only one observation. Do you need to adjust the group
aesthetic?.
As suggested by the commenters I tried removing plot() and using the the createChlDiffPlot() directly and replacing plot() with print(). Both produce the same ugly plots as before.
Replaced geom_line() with geom_path(). The points are now connected! x-axis cluttering is still there.
Time variable is reading as hms num
Many thanks for any help on this!
```
---
title: "Chl Filtration"
output:
flexdashboard::flex_dashboard:
theme: yeti
orientation: rows
editor_options:
chunk_output_type: console
---
```{r setup}
library(flexdashboard)
library(dplyr)
library(ggplot2)
library(hms)
library(ggthemes)
library(readr)
library(data.table)
#### Example Data
df1 <- data.frame(Time = as_hms(c("11:22:33","11:22:34","11:22:35","11:22:38","11:23:00","11:23:01","11:23:02")),
Chl_ug_L_Up = c(0.2,0.1,0.25,-0.2,-0.3,-0.15,0.1),
Chl_ug_L_Down = c(0.5,0.4,0.3,0.2,0.1,0,-0.1))
df2 <- data.frame(Time = as_hms(c("08:02:33","08:02:34","08:02:35","08:02:40","08:02:42","08:02:43","08:02:49")),
Chl_ug_L_Up = c(-0.2,-0.1,-0.25,0.2,0.3,0.15,-0.1),
Chl_ug_L_Down = c(-0.1,0,0.1,0.2,0.3,0.4,0.1))
data_directory = "./" # data folder in R project folder in the real deal
output_directory = "./" # output graph directory in R project folder
write_csv(df1, file.path(data_directory, "SO_example_df1.csv"))
write_csv(df2, file.path(data_directory, "SO_example_df2.csv"))
#### Function to create graphs
createChlDiffPlot = function(aTimeSeriesFile, aFileName, aGraphOutputDirectory, aType)
{
aFile_Mod = aTimeSeriesFile %<>%
select(Time, Chl_ug_L_Up, Chl_ug_L_Down) %>%
mutate(Chl_diff = Chl_ug_L_Up - Chl_ug_L_Down)
one_plot = ggplot(data = aFile_Mod, aes(x = Time, y = Chl_diff)) + # tried adding 'group = 1' in aes to connect points
geom_path(size = 1, color = "green") +
geom_point(color = "green") +
theme_gdocs() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.title = element_blank()) +
labs(x = "", y = "Chl Difference", title = paste0(aFileName, " - ", "Filtration"))
one_graph_name = paste0(gsub(".csv", "", aFileName), "_", aType, ".pdf")
ggsave(one_graph_name, one_plot, dpi = 600, width = 7, height = 5, units = "in", device = "pdf", aGraphOutputDirectory)
return(one_plot)
}
"``` ### remove the quotes when running example
Plots - After Velocity Adjustment
=====================================" ### remove quotes when running example
```{r, fig.width=13.5, fig.height=5}
all_files_Filtration = list.files(data_directory, pattern = ".csv")
# Loop to plot function
for(file in 1 : length(all_files_Filtration))
{
file_name = all_files_Filtration[file]
one_file = fread(file.path(data_directory, file_name))
# plot the time series agains
plot(createChlDiffPlot(one_file, file_name, output_directory, "Velocity_Paired"))
}
"``` #remove quotes when running example
```

I finally figured it out.
1) Replacing geom_line() with geom_path() connected the data points when rendered in Rmd.
2) df1$Time was formatted as a difftime object. When I looked at the dataframe in the global environment, Time :hmsnum 11:11:09 .... This made me think my format was ok, but when I ran class(df1$Time) I got [1] "hms" "difftime". With a quick google I found out difftime objects are not quite the same as hms, and my original time was generated by subtracting times. I added a conversion into my mutate function:
select(Time, Chl_ug_L_Up, Chl_ug_L_Down) %>%
mutate(Chl_diff = Chl_ug_L_Up - Chl_ug_L_Down,
Time = as_hms(Time)) # convert difftime objecct to hms
ggplot I think has some auto-formatting for hms variables, which is why difftime variable was producing ugly crowded x- axes.

Related

Plotly Express: Prevent bars from stacking when Y-axis catgories have the same name

I'm new to plotly.
Working with:
Ubuntu 20.04
Python 3.8.10
plotly==5.10.0
I'm doing a comparative graph using a horizontal bar chart. Different instruments measuring the same chemical compounds. I want to be able to do an at-a-glance, head-to-head comparison if the measured value amongst all machines.
The problem is; if the compound has the same name amongst the different instruments - Plotly stacks the data bars into a single bar with segment markers. I very much want each bar to appear individually. Is there a way to prevent Plotly Express from automatically stacking the common bars??
Examples:
CODE
gobardata = []
for blended_name in _df[:20].blended_name: # should always be unique
##################################
# Unaltered compound names
compound_names = [str(c) for c in _df[_df.blended_name == blended_name]["injcompound_name"].tolist()]
# Random number added to end of compound_names to make every string unique
# compound_names = ["{} ({})".format(str(c),random.randint(0, 1000)) for c in _df[_df.blended_name == blended_name]["injcompound_name"].tolist()]
##################################
deltas = _df[_df.blended_name == blended_name]["delta_rettime"].to_list()
gobardata.append(
go.Bar(
name = blended_name,
x = deltas,
y = compound_names,
orientation='h',
))
fig = go.Figure(data = gobardata)
fig.update_traces(width=1)
fig.update_layout(
bargap=1,
bargroupgap=.1,
xaxis_title="Delta Retention Time (Expected - actual)",
yaxis_title="Instrument name(Injection ID)"
)
fig.show()
What I'm getting (Using actual, but repeated, compound names)
What I want (Adding random text to each compound name to make it unique)
OK. Figured it out. This is probably pretty klugy, but it consistently works.
Basically...
Use go.FigureWidget...
...with make_subplots having a common x-axis...
...controlling the height of each subplot based on number of bars.
Every bar in each subplot is added as an individual trace...
...using a dictionary matching bar name to a common color.
The y-axis labels for each subplot is a list containing the machine name as [0], and then blank placeholders ('') so the length of the y-axis list matches the number of bars.
And manually manipulating the legend so each bar name appears only once.
# Get lists of total data
all_compounds = list(_df.injcompound_name.unique())
blended_names = list(_df.blended_name.unique())
#################################################################
# The heights of each subplot have to be set when fig is created.
# fig has to be created before adding traces.
# So, create a list of dfs, and use these to calculate the subplot heights
dfs = []
subplot_height_multiplier = 20
subplot_heights = []
for blended_name in blended_names:
df = _df[(_df.blended_name == blended_name)]#[["delta_rettime", "injcompound_name"]]
dfs.append(df)
subplot_heights.append(df.shape[0] * subplot_height_multiplier)
chart_height = sum(subplot_heights) # Prep for the height of the overall chart.
chart_width = 1000
# Make the figure
fig = make_subplots(
rows=len(blended_names),
cols=1,
row_heights = subplot_heights,
shared_xaxes=True,
)
# Create the color dictionary to match a color to each compound
_CSS_color = CSS_chart_color_list()
colors = {}
for compound in all_compounds:
try: colors[compound] = _CSS_color.pop()
except IndexError:
# Probably ran out of colors, so just reuse
_CSS_color = CSS_color.copy()
colors[compound] = _CSS_color.pop()
rowcount = 1
for df in dfs:
# Add bars individually to each subplot
bars = []
for label, labeldf in df.groupby('injcompound_name'):
fig.add_trace(
go.Bar(x = labeldf.delta_rettime,
y = [labeldf.blended_name.iloc[0]]+[""]*(len(labeldf.delta_rettime)-1),
name = label,
marker = {'color': colors[label]},
orientation = 'h',
),
row=rowcount,
col=1,
)
rowcount += 1
# Set figure to FigureWidget
fig = go.FigureWidget(fig)
# Adding individual traces creates redundancies in the legend.
# This removes redundancies from the legend
names = set()
fig.for_each_trace(
lambda trace:
trace.update(showlegend=False)
if (trace.name in names) else names.add(trace.name))
fig.update_layout(
height=chart_height,
width=chart_width,
title_text="∆ of observed RT to expected RT",
showlegend = True,
)
fig.show()

Not saving html interactive file with R

I am trying to design a circos plot using BioCircos R package. BioCircos allows to save the plots as .html interactive files. However, when I run the package using RScript the saved .html file is empty. To save the .html file I used saveWidget option from htmlwidgets package. Is it something wrong with saveWidget option? The code I used follows:
#!/usr/bin/Rscript
######R script for BioCircos test
library(htmlwidgets)
library(BioCircos)
genomes <- list("chra1" = 217471166, "chra2" = 181034961, "chra3" = 153873357, "chra4" = 153961319, "chra5" = 164033575,
"chra6" = 154486312, "chra7" = 133565930, "chra8" = 147241510, "chra9" = 91218944, "chra10" = 52432566, "chrb1" = 843366180, "chrb2" = 842558404, "chrb3" = 707956555, "chrb4" = 635713434, "chrb5" = 567300182,
"chrb6" = 439630435, "chrb7" = 236595445, "chrb8" = 231667822, "chrb9" = 230778867, "chrb10" = 151572763, "chrb11" = 103205957) # custom genome
links_chromosomes_01 <- c("chra1", "chra2", "chra3", "chra4", "chra4", "chra5", "chra6", "chra7", "chra7", "chra8", "chra8", "chra9", "chra10") # Chromosomes on which the links should start
links_chromosomes_02 <- c("chrb2", "chrb3", "chrb1", "chrb9", "chrb10", "chrb4", "chrb5", "chrb6", "chrb1", "chrb8", "chrb3", "chrb7", "chrb6") # Chromosomes on which the links should end
links_pos_01 <- c(115060347, 102611974, 14761160, 128700431, 128681496, 42116205, 58890582, 40356090,
146935315, 136481944, 157464876, 39323393, 84752508, 136164354,
99573657, 102580613,
111139346, 120764772, 90748238, 122164776,
44933176, 18823342,
48771409, 128288229, 150613881, 18509106, 123913217, 51237349,
34237851, 53357604, 78270031,
25306417, 25320614,
94266153,
41447919, 28810876, 2802465,
45583472,
81968637, 27858237, 17263637,
30569409) ### links chra chromosomes
links_pos_02 <- c(410543481, 463189512, 825903588, 353914638, 354135472, 717707494, 643107332, 724899652,
583713545, 558756961, 642015290, 154999098, 340216235, 557731577,
643350872, 655077847,
85356666, 157889318, 226411560, 161566470,
109857786, 25338955,
473876792, 124495704, 46258030, 572314729, 141584107, 426419779,
531245660, 220131772, 353941099,
62422773, 62387030,
116923325,
76544045, 33452274, 7942164,
642047816,
215981114, 39278129, 23302654,
418922633) ### links chrb chromosomes
links_labels <- c("aldh1a3", "amh", "cyp26b1", "dmrt1", "dmrt3", "fgf20", "hhip", "srd5a3",
"amhr2", "dhh", "fgf9", "nr0b1", "rspo1", "wnt1",
"aldh1a2", "cyp19a1",
"lhx9", "pdgfb", "ptch2", "sox10",
"cbln1", "wt1",
"esr1", "foxl2", "gata4", "lrpprc", "serpine2", "srd5a2",
"asns", "ctnnb1", "srd5a1",
"cyp26a1", "cyp26c1",
"wnt4",
"ar", "nr5a1", "ptgds",
"fgf16",
"cxcr4", "pdgfa", "sox8",
"sox9")
tracklist <- BioCircosLinkTrack('myLinkTrack', links_chromosomes_01, links_pos_01,
links_pos_01, links_chromosomes_02, links_pos_02, links_pos_02,
maxRadius = 0.55, labels = links_labels)
#plotting results
plot_chra_chrb <- BioCircos(tracklist, genome = chra_chrb_genomes, genomeFillColor = "RdBu", chrPad = 0.02, displayGenomeBorder = FALSE, genomeLabelTextSize = "10pt", genomeTicksScale = 4e+3,
elementId = "chra_chrb_comp_plot_test.html")
saveWidget(plot_chra_chrb, "chra_chrb_comp_plot_test.html", selfcontained = F, libdir = "lib")
The command line to run this script:
Rscript /path_to/Circle_plot_test.r
I tried to use this script in RStudio (without saveWidget() command), however it took too long to run in my personnel computer and the results was not displayed. However, this could be due to memory usage setup because when I took off some data, the script easily generates the plot in RStudio and I am able to save it. Is there other way to save the .hmtl interactive files in R or am I doing something wrong using htmlwidgets package in my script?
Thanks all in advance for any help and comments.
When you said it took too long to run, that was a sign that something was wrong! You weren't getting anything when you used saveWidget, because there is nothing returned from BioCiros.
I found two things that are a problem. The first one will result in a blank output—you can't use a '.' in the element ID. This ID will be used in the HTML coding.
You were getting huge delays due to the scale you set for genomeTickScale. That scaling value is for a tick mark attribute. I'm not sure why you set it to .004. However, when I comment out that line, it renders immediately. I have no issues with saving the widget, either.
--One other thing, you had chra_chrb_genomes as the object name assigned to the parameter genome in the function BioCircos. I assumed it was the object genome from your question since it was the only unused object.
The only things I changed were in the BioCircos function:
(plot_chra_chrb <- BioCircos(tracklist, genome = genomes, #chra_chrb_genomes,
genomeFillColor = "RdBu",
chrPad = 0.02,
displayGenomeBorder = FALSE,
genomeLabelTextSize = "10pt",
# genomeTicksScale = 4e+3, # problematic
elementId = "chra_chrb_comp_plot_test" # no periods
))

Iteratively read a fixed number of lines into R

I have a josn file I'm working with that contains multiple json objects in a single file. R is unable to read the file as a whole. But since each object occurs at regular intervals, I would like to iteratively read a fixed number of lines into R.
There are a number of SO questions on reading single lines into R but I have been unable to extend these solutions to a fixed number of lines. For my problem I need to read 16 lines into R at a time (eg 1-16, 17-32 etc)
I have tried using a loop but can't seem to get the syntax right:
## File
file <- "results.json"
## Create connection
con <- file(description=file, open="r")
## Loop over a file connection
for(i in 1:1000) {
tmp <- scan(file=con, nlines=16, quiet=TRUE)
data[i] <- fromJSON(tmp)
}
The file contains over 1000 objects of this form:
{
"object": [
[
"a",
0
],
[
"b",
2
],
[
"c",
2
]
]
}
With #tomtom inspiration I was able to find a solution.
## File
file <- "results.json"
## Loop over a file
for(i in 1:1000) {
tmp <- paste(scan(file=file, what="character", sep="\n", nlines=16, skip=(i-1)*16, quiet=TRUE),collapse=" ")
assign(x = paste("data", i, sep = "_"), value = fromJSON(tmp))
}
I couldn't create a connection as each time I tried the connection would close before the file had been completely read. So I got rid of that step.
I had to include the what="character" variable as scan() seems to expect a number by default.
I included sep="\n", paste() and collapse=" " to create a single string rather than the vector of characters that scan() creates by default.
Finally I just changed the final assignment operator to have a bit more control over the names of the output.
This might help:
EDITED to make it use a list and Reduce into one file
## Loop over a file connection
data <- NULL
for(i in 1:1000) {
tmp <- scan(file=con, nlines=16, skip=(i-1)*16, quiet=TRUE)
data[[i]] <- fromJSON(tmp)
}
df <- Reduce(function(x, y) {paste(x, y, collapse = " ")})
You would have to make sure that you don't reach further than the end of the file though ;-)

Wrapper to FOR loops with progress bar

I like to use a progress bar while running slow for loops. This could be done easily with several helpers, but I do like the tkProgressBar from tcltk package.
A small example:
pb <- tkProgressBar(title = "Working hard:", min = 0, max = length(urls), width = 300)
for (i in 1:300) {
# DO SOMETHING
Sys.sleep(0.5)
setTkProgressBar(pb, i, label=paste( round(i/length(urls)*100, 0), "% ready!"))
}
close(pb)
And I would like to set up a small function to store in my .Rprofile named to forp (as: for loop with progressbar), to call just like for but with auto added progress bar - but unfortunately have no idea how to implement and grab the expr part of the loop function. I had some experiments with do.call but without success :(
Imaginary working example (which acts like a for loop but creates a TkProgressBar and auto updates it in each iteration):
forp (i in 1:10) {
#do something
}
UPDATE: I think the core of the question is how to write a function which not only has parameters in the parentheses after the function (like: foo(bar)), but also can handle expr specified after the closing parentheses, like: foo(bar) expr.
BOUNTY OFFER: would go to any answer that could modify my suggested function to work like the syntax of basic for loops. E.g. instead of
> forp(1:1000, {
+ a<-i
+ })
> a
[1] 1000
it could be called like:
> forp(1:1000) {
+ a<-i
+ }
> a
[1] 1000
Just to clarify the task again: how could we grab the { expression } part of a function call? I am afraid that this is not possible, but will leave on the bounty for a few days for the pros :)
Given the other answers supplied, I suspect that it is impossible tough to do in exactly the way you specify.
However, I believe there is a way of getting very close, if you use the plyr package creatively. The trick is to use l_ply which takes a list as input and creates no output.
The only real differences between this solution and your specification is that in a for loop you can directly modify variables in the same environment. Using l_ply you need to send a function, so you will have to be more careful if you want to modify stuff in the parent environment.
Try the following:
library(plyr)
forp <- function(i, .fun){
l_ply(i, .fun, .progress="tk")
}
a <- 0
forp(1:100, function(i){
Sys.sleep(0.01)
a<<-a+i
})
print(a)
[1] 5050
This creates a progress bar and modifies the value of a in the global environment.
EDIT.
For the avoidance of doubt: The argument .fun will always be a function with a single argument, e.g. .fun=function(i){...}.
For example:
for(i in 1:10){expr} is equivalent to forp(1:10, function(i){expr})
In other words:
i is the looping parameter of the loop
.fun is a function with a single argument i
My solution is very similar to Andrie's except it uses base R, and I second his comments on the need to wrap what you want to do in a function and the subsequent need to use <<- to modify stuff in a higher environment.
Here's a function that does nothing, and does it slowly:
myfun <- function(x, text) {
Sys.sleep(0.2)
cat("running ",x, " with text of '", text, "'\n", sep="")
x
}
Here's my forp function. Note that regardless of what we're actually looping over, it instead loops over the sequence 1:n instead and get the right term of what we actually want within the loop. plyr does this automatically.
library(tcltk)
forp <- function(x, FUN, ...) {
n <- length(x)
pb <- tkProgressBar(title = "Working hard:", min = 0, max = n, width = 300)
out <- vector("list", n)
for (i in seq_len(n)) {
out[[i]] <- FUN(x[i], ...)
setTkProgressBar(pb, i, label=paste( round(i/n*100, 0), "% ready!"))
}
close(pb)
invisible(out)
}
And here's how both for and forp might be used, if all we want to do is call myfun:
x <- LETTERS[1:5]
for(xi in x) myfun(xi, "hi")
forp(x, myfun, text="hi")
And here's how they might be used if we want to modify something along the way.
out <- "result:"
for(xi in x) {
out <- paste(out, myfun(xi, "hi"))
}
out <- "result:"
forp(x, function(xi) {
out <<- paste(out, myfun(xi, "hi"))
})
For both versions the result is
> out
[1] "result: A B C D E"
EDIT: After seeing your (daroczig's) solution, I have another idea that might not be quite so unwieldy, which is to evaluate the expression in the parent frame. This makes it easier to allow for values other than i (now specified with the index argument), though as of right now I don't think it handles a function as the expression, though just to drop in instead a for loop that shouldn't matter.
forp2 <- function(index, x, expr) {
expr <- substitute(expr)
n <- length(x)
pb <- tkProgressBar(title = "Working hard:", min = 0, max = n, width = 300)
for (i in seq_len(n)) {
assign(index, x[i], envir=parent.frame())
eval(expr, envir=parent.frame())
setTkProgressBar(pb, i, label=paste( round(i/n*100, 0), "% ready!"))
}
close(pb)
}
The code to run my example from above would be
out <- "result:"
forp2("xi", LETTERS[1:5], {
out <- paste(out, myfun(xi, "hi"))
})
and the result is the same.
ANOTHER EDIT, based on the additional information in your bounty offer:
The syntax forX(1:1000) %doX$ { expression } is possible; that's what the foreach package does. I'm too lazy right now to build it off of your solution, but building off mine, it could look like this:
`%doX%` <- function(index, expr) {
x <- index[[1]]
index <- names(index)
expr <- substitute(expr)
n <- length(x)
pb <- tkProgressBar(title = "Working hard:", min = 0, max = n, width = 300)
for (i in seq_len(n)) {
assign(index, x[i], envir=parent.frame())
eval(expr, envir=parent.frame())
setTkProgressBar(pb, i, label=paste( round(i/n*100, 0), "% ready!"))
}
close(pb)
invisible(out)
}
forX <- function(...) {
a <- list(...)
if(length(a)!=1) {
stop("index must have only one element")
}
a
}
Then the use syntax is this, and the result is the same as above.
out <- "result:"
forX(xi=LETTERS[1:5]) %doX% {
out <- paste(out, myfun(xi, "hi"))
}
out
If you use the plyr family of commands instead of a for loop (generally a good idea if possible), you get as an added bonus a whole system of progress bars.
R.utils also has some progress bars built into it, and there exist instructions for using them in for loops.
R's syntax doesn't let you do exactly what you want, ie:
forp (i in 1:10) {
#do something
}
But what you can do is create some kind of iterator object and loop using while():
while(nextStep(m)){sleep.milli(20)}
Now you have the problem of what m is and how you make nextStep(m) have side effects on m in order to make it return FALSE at the end of your loop. I've written simple iterators that do this, as well as MCMC iterators that let you define and test for a burnin and thinning period within your loop.
Recently at the R User conference I saw someone define a 'do' function that then worked as an operator, something like:
do(100) %*% foo()
but I'm not sure that was the exact syntax and I'm not sure how to implement it or who it was put that up... Perhaps someone else can remember!
What you're hoping for, I think would be something that looks like
body(for)<- as.call(c(as.name('{'),expression([your_updatebar], body(for))))
And yep, the problem is that "for" is not a function, or at least not one whose "body" is accessible. You could, I suppose, create a "forp" function that takes as arguments 1) a string to be turned into the loop counter, e.g., " ( i in seq(1,101,5) )" , and 2) the body of your intended loop, e.g., y[i]<- foo[i]^2 ; points(foo[i],y[i], and then jump thru some getcallparse magic to execute the actual for loop.
Then , in pseudocode (not close to actual R code, but I think you see what should happen)
forp<-function(indexer,loopbody) {
pseudoparse( c("for (", indexer, ") {" ,loopbody,"}")
}
The problem is that the for-loop in R is treated special. A normal function is not allowed to look like that. Some small tweaks can make it loop pretty close though. And as #Aaron mentioned, the foreach package's %dopar% paradigm seems like the best fit. Here's my version of how it could work:
`%doprogress%` <- function(forExpr, bodyExpr) {
forExpr <- substitute(forExpr)
bodyExpr <- substitute(bodyExpr)
idxName <- names(forExpr)[[2]]
vals <- eval(forExpr[[2]])
e <- new.env(parent=parent.frame())
pb <- tkProgressBar(title = "Working hard:", min = 0, max = length(vals), width = 300)
for (i in seq_along(vals)) {
e[[idxName]] <- vals[[i]]
eval(bodyExpr, e)
setTkProgressBar(pb, i, label=paste( round(i/length(vals)*100, 0), "% ready!"))
}
}
# Example usage:
foreach(x = runif(10)) %doprogress% {
# do something
if (x < 0.5) cat("small\n") else cat("big")
}
As you can see, you have to type x = 1:10 instead of x in 1:10, and the infix operator %<whatever>% is needed to get hold of the looping construct and the loop body. I currently don't do any error checking (to avoid muddling the code). You should check the name of the function ("foreach"), the number of arguments to it (1) and that you actually get a valid loop variable ("x") and not an empty string.
I propose hereby two solutions that use the standard for syntax, both are using the great package progress from Gábor Csárdi and Rich FitzJohn
1) we can override temporarily or locally the for function to wrap around base::for and support progress bars.
2) we can define the unused for<-, and wrap around base::for using syntax pb -> for(it in seq) {exp} where pb is progress bar built with progress::progress_bar$new().
Both solutions behave as standard for calls :
The values changed at the previous iteration are available
on error the modified variables will have the value they had just before the error
I packaged my solution and will demo them below then will go through the code
Usage
#devtools::install_github("moodymudskipper/pbfor")
library(pbfor)
Using pb_for()
By default pb_for() will override the for function for one run only.
pb_for()
for (i in 1:10) {
# DO SOMETHING
Sys.sleep(0.5)
}
Using parameters from progress::progress_bar$new() :
pb_for(format = "Working hard: [:bar] :percent :elapsed",
callback = function(x) message("Were'd done!"))
for (i in 1:10) {
# DO SOMETHING
Sys.sleep(0.5)
}
Using for<-
The only restriction compared to a standard for call is that the first argument must exist and can't be NULL.
i <- NA
progress_bar$new() -> for (i in 1:10) {
# DO SOMETHING
Sys.sleep(0.5)
}
We can define a custom progress bar, and maybe define it conveniently in an initialisation script or in one's R profile.
pb <- progress_bar$new(format = "Working hard: [:bar] :percent :elapsed",
callback = function(x) ("Were'd done!"))
pb -> for (i in 1:10) {
# DO SOMETHING
Sys.sleep(0.5)
}
For nested progress bars we can use the following trick :
pbi <- progress_bar$new(format = "i: [:bar] :percent\n\n")
pbj <- progress_bar$new(format = "j: [:bar] :percent ")
i <- NA
j <- NA
pbi -> for (i in 1:10) {
pbj -> for (j in 1:10) {
# DO SOMETHING
Sys.sleep(0.1)
}
}
note that due to operator precedence the only way to call for<- and benefit from the syntax of for calls is to use the left to right arrow ´->´.
how they work
pb_for()
pb_for() creates a for function object in its parent environment, then the new for :
sets up a progress bar
modifies the loop content
adds a `*pb*`$tick() at the end of the loop content expression
feeds it back to base::`for` in a clean environment
assigns on exit all modified or created variables to the parent environment.
removes itself if once is TRUE (the default)
It's generally sensitive to override an operator, but it cleans after itself and won't affect the global environment if used in a function so I think it's safe enough to use.
for<-
This approach :
doesn't override for
allows the use of progress bar templates
has an arguably more intuitive api
It has a few drawbacks however:
its first argument must exist, which is the case for all assignment functions (fun<-).
it does some memory magic to find the name of its first argument as it's not easily done with assignment functions, this might have a performance cost, and I'm not 100% sure about the robustness
we need the package pryr
What it does :
find the name of the first argument, using a helper function
clone the progress bar input
edit it to account for the number of iterations of the loop (the length of the second argument of for<-
After this it's similar to what is described for pb_for() in the section above.
The code
pb_for()
pb_for <-
function(
# all args of progress::progress_bar$new() except `total` which needs to be
# infered from the 2nd argument of the `for` call, and `stream` which is
# deprecated
format = "[:bar] :percent",
width = options("width")[[1]] - 2,
complete = "=",
incomplete = "-",
current =">",
callback = invisible, # doc doesn't give default but this seems to work ok
clear = TRUE,
show_after = .2,
force = FALSE,
# The only arg not forwarded to progress::progress_bar$new()
# By default `for` will self detruct after being called
once = TRUE) {
# create the function that will replace `for`
f <- function(it, seq, expr){
# to avoid notes at CMD check
`*pb*` <- IT <- SEQ <- EXPR <- NULL
# forward all arguments to progress::progress_bar$new() and add
# a `total` argument computed from `seq` argument
pb <- progress::progress_bar$new(
format = format, width = width, complete = complete,
incomplete = incomplete, current = current,
callback = callback,
clear = clear, show_after = show_after, force = force,
total = length(seq))
# using on.exit allows us to self destruct `for` if relevant even if
# the call fails.
# It also allows us to send to the local environment the changed/created
# variables in their last state, even if the call fails (like standard for)
on.exit({
vars <- setdiff(ls(env), c("*pb*"))
list2env(mget(vars,envir = env), envir = parent.frame())
if(once) rm(`for`,envir = parent.frame())
})
# we build a regular `for` loop call with an updated loop code including
# progress bar.
# it is executed in a dedicated environment and the progress bar is given
# a name unlikely to conflict
env <- new.env(parent = parent.frame())
env$`*pb*` <- pb
eval(substitute(
env = list(IT = substitute(it), SEQ = substitute(seq), EXPR = substitute(expr)),
base::`for`(IT, SEQ,{
EXPR
`*pb*`$tick()
})), envir = env)
}
# override `for` in the parent frame
assign("for", value = f,envir = parent.frame())
}
for<- (and fetch_name())
`for<-` <-
function(it, seq, expr, value){
# to avoid notes at CMD check
`*pb*` <- IT <- SEQ <- EXPR <- NULL
# the symbol fed to `it` is unknown, R uses `*tmp*` for assignment functions
# so we go get it by inspecting the memory addresses
it_chr <- fetch_name(it)
it_sym <-as.symbol(it_chr)
# complete the progress bar with the `total` parameter
# we need to clone it because progress bars are environments and updated
# by reference
pb <- value$clone()
pb$.__enclos_env__$private$total <- length(seq)
# when the script ends, even with a bug, the values that have been changed
# are written to the parent frame
on.exit({
vars <- setdiff(ls(env), c("*pb*"))
list2env(mget(vars, env),envir = parent.frame())
})
# computations are operated in a separate environment so we don't pollute it
# with it, seq, expr, value, we need the progress bar so we name it `*pb*`
# unlikely to conflict by accident
env <- new.env(parent = parent.frame())
env$`*pb*` <- pb
eval(substitute(
env = list(IT = it_sym, SEQ = substitute(seq), EXPR = substitute(expr)),
base::`for`(IT, SEQ,{
EXPR
`*pb*`$tick()
})), envir = env)
# because of the `fun<-` syntax we need to return the modified first argument
invisible(get(it_chr,envir = env))
}
helpers:
fetch_name <- function(x,env = parent.frame(2)) {
all_addresses <- sapply(ls(env), address2, env)
all_addresses <- all_addresses[names(all_addresses) != "*tmp*"]
all_addresses_short <- gsub("(^|<)[0x]*(.*?)(>|$)","\\2",all_addresses)
x_address <- tracemem(x)
untracemem(x)
x_address_short <- tolower(gsub("(^|<)[0x]*(.*?)(>|$)","\\2",x_address))
ind <- match(x_address_short, all_addresses_short)
x_name <- names(all_addresses)[ind]
x_name
}
address2 <- getFromNamespace("address2", "pryr")
Thanks for everyone for your kind answers! As none of those fit my wacky needs, I started to steal some pieces of the given answers and made up a quite customized version:
forp <- function(iis, .fun) {
.fun <- paste(deparse(substitute(.fun)), collapse='\n')
.fun <- gsub(' <- ', ' <<- ', .fun, fixed=TRUE)
.fun <- paste(.fun, 'index.current <- 1 + index.current; setTkProgressBar(pb, index.current, label=paste( round(index.current/index.max*100, 0), "% ready!"))', sep='\n')
ifelse(is.numeric(iis), index.max <- max(iis), index.max <- length(iis))
index.current <- 1
pb <- tkProgressBar(title = "Working hard:", min = 0, max = index.max, width = 300)
for (i in iis) eval(parse(text=paste(.fun)))
close(pb)
}
This is quite lengthy for a simple function like this, but depends only on base (anf of course: tcltk) and has some nice features:
can be used on expressions, not just functions,
you do not have to use <<- in your expressions to update global environment, <- are replaced to <<- in the given expr. Well,this might be annoying for someone.
can be used with non-numeric indexes (see below). That is why the code become so long :)
Usage is similar to for except for you do not have to specify the i in part and you have to use i as index in the loop. Other drawback is that I did not find a way to grab the {...} part specified after a function, so this must be included in the parameters.
Example #1: Basic use
> forp(1:1000, {
+ a<-i
+ })
> a
[1] 1000
Try it to see the neat progress bar on your computer! :)
Example #2: Looping through some characters
> m <- 0
> forp (names(mtcars), {
+ m <- m + mean(mtcars[,i])
+ })
> m
[1] 435.69

How do I feed in my own data into PyAlgoTrade?

I'm trying to use PyAlogoTrade's event profiler
However I don't want to use data from yahoo!finance, I want to use my own but can't figure out how to parse in the CSV, it is in the format:
Timestamp Low Open Close High BTC_vol USD_vol [8] [9]
2013-11-23 00 800 860 847.666666 886.876543 853.833333 6195.334452 5248330 0
2013-11-24 00 745 847.5 815.01 860 831.255 10785.94131 8680720 0
The complete CSV is here
I want to do something like:
def main(plot):
instruments = ["AA", "AES", "AIG"]
feed = yahoofinance.build_feed(instruments, 2008, 2009, ".")
Then replace yahoofinance.build_feed(instruments, 2008, 2009, ".") with my CSV
I tried:
import csv
with open( 'FinexBTCDaily.csv', 'rb' ) as csvfile:
data = csv.reader( csvfile )
def main( plot ):
feed = data
But it throws an attribute error. Any ideas how to do this?
I suggest to create your own Rowparser and Feed, which is much easier than it sounds, have a look here: yahoofeed
This also allows you to work with intraday data and cleanup the data if needed, like your timestamp.
Another possibility, of course, would be to parse your file and save it, so it looks like a yahoo feed. In your case, you would have to adapt the columns and the Timestamp.
Step A: follow PyAlgoTrade doc on GenericBarFeed class
On this link see the addBarsFromCSV() in CSV section of the BarFeed class in v0.16
On this link see the addBarsFromCSV() in CSV section of the BarFeed class in v0.17
Note
- The CSV file must have the column names in the first row.
- It is ok if the Adj Close column is empty.
- When working with multiple instruments:
--- If all the instruments loaded are in the same timezone, then the timezone parameter may not be specified.
--- If any of the instruments loaded are in different timezones, then the timezone parameter should be set.
addBarsFromCSV( instrument, path, timezone = None )
Loads bars for a given instrument from a CSV formatted file. The instrument gets registered in the bar feed.
Parameters:
(string) instrument – Instrument identifier.
(string) path – The path to the CSV file.
(pytz) timezone – The timezone to use to localize bars.Check pyalgotrade.marketsession.
Next:
A BarFeed loads bars from CSV files that have the following format:
Date Time, Open, High, Low, Close, Volume, Adj Close
2013-01-01 13:59:00,13.51001,13.56,13.51,13.56789,273.88014126,13.51001
Step B: implement a documented CSV-file pre-formatting
Your CSV data will need a bit of sanity ( before will be able to be used in PyAlgoTrade methods ),however it is doable and you can create an easy transformator either by hand or with a use of a powerful numpy.genfromtxt() lambda-based converters facilities.
This sample code is intended for an illustration purpose, to see immediately the powers of converters for your own transformations, as CSV-structure differs.
with open( getCsvFileNAME( ... ), "r" ) as aFH:
numpy.genfromtxt( aFH,
skip_header = 1, # Ref. pyalgotrade
delimiter = ",",
# v v v v v v
# 2011.08.30,12:00,1791.20,1792.60,1787.60,1789.60,835
# 2011.08.30,13:00,1789.70,1794.30,1788.70,1792.60,550
# 2011.08.30,14:00,1792.70,1816.70,1790.20,1812.10,1222
# 2011.08.30,15:00,1812.20,1831.50,1811.90,1824.70,2373
# 2011.08.30,16:00,1824.80,1828.10,1813.70,1817.90,2215
converters = { 0: lambda aString: mPlotDATEs.date2num( datetime.datetime.strptime( aString, "%Y.%m.%d" ) ), #_______________________________________asFloat ( 1.0, +++ )
1: lambda aString: ( ( int( aString[0:2] ) * 60 + int( aString[3:] ) ) / 60. / 24. ) # ( 15*60 + 00 ) / 60. / 24.__asFloat < 0.0, 1.0 )
# HH: :MM HH MM
}
)
You can use pyalgotrade.barfeed.addBarsFromSequence with list comprehension to feed in data from CSV row by row/bar by bar. Basically you create a bar from each row, pass OHLCV as init parameters and extra columns with additional data in a dictionary. You can try something like this (with all the required imports):
data = pd.DataFrame(index=pd.date_range(start='2021-11-01', end='2021-11-05'), columns=['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume', 'ExtraCol1', 'ExtraCol3', 'ExtraCol4', 'ExtraCol5'], data=np.random.rand(5, 10))
feed = yahoofeed.Feed()
feed.addBarsFromSequence('instrumentID', data.index.map(lambda i:
BasicBar(
i,
data.loc[i, 'Open'],
data.loc[i, 'High'],
data.loc[i, 'Low'],
data.loc[i, 'Close'],
data.loc[i, 'Volume'],
data.loc[i, 'Adj Close'],
Frequency.DAY,
data.loc[i, 'ExtraCol1':].to_dict())
).values)
The input data frame was created with random values to make this example easier to reproduce, but the part where the bars are added to the feed should work the same for data frames from CSVs given that the valid column names are used.