for loop using ggplot for longitudinal data

I am trying to visualize my longitudinal data with graphs, using ggplot inside a for loop.
for (i in colnames(dat_longC)[c(3, 5:10, 14, 17:19, 30:39)]) {
  print(ggplot(dat_longC, aes(x = exam, y = i, group = zz_nr)) +
    geom_point() +
    geom_line() +
    xlab("Examination") +
    ylab(i))
}
When I use this ggplot command in a for loop, I only get a single line extending between examination times. If I use the same command on a single variable directly, it works and gives me trajectory graphs. What do you think could be the problem?

Your problem is that i is just a character string naming the column, and a string does not work as a variable name in aes(): ggplot2 maps the literal string as a constant, so every observation gets the same y value and you see a single line. Instead, you will likely need !!sym(i) to turn the string into a symbol that ggplot2 can evaluate. I can't really test without your data, but here is some example code to help guide you.
library(tidyverse)

map(colnames(mtcars)[2:4],
    \(x) ggplot(mtcars, aes(!!sym(x), mpg)) +
      geom_point() +
      ggtitle(x))
#> (a list of three plots is printed, one per column)

for (i in colnames(dat_longC)[c(3, 5:10, 14, 17:19, 30:39)]) {
  print(ggplot(dat_longC, aes_string(x = "exam", y = i, group = "zz_nr")) +
    geom_point() +
    geom_line() +
    xlab("Examination") +
    ylab(i))
}
Thank you very much for your reply!
I just used aes_string() and added quotes to the other variable names, and it worked out.
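Note that aes_string() has since been deprecated in ggplot2; the .data pronoun is now the recommended way to map columns by name. A minimal sketch of the same loop (untested against the original data):
library(ggplot2)

# .data[[i]] looks up the string i as a column of dat_longC
for (i in colnames(dat_longC)[c(3, 5:10, 14, 17:19, 30:39)]) {
  print(ggplot(dat_longC, aes(x = exam, y = .data[[i]], group = zz_nr)) +
    geom_point() +
    geom_line() +
    xlab("Examination") +
    ylab(i))
}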

Related

Deleting commas in R Markdown html output

I am using R Markdown to create an html file for regression results tables, which are produced by stargazer and lfe in a code chunk.
library(lfe); library(stargazer)
data <- data.frame(x = 1:10, y = rnorm(10), z = rnorm(10))
result <- stargazer(felm(y ~ x + z, data = data), type = 'html')
I create the html file with the inline code `r result` after the chunk above. However, a bunch of commas appears at the top of the table.
When I check the html code, I see almost every </tr> is followed by a comma.
How can I delete these commas?
Maybe not exactly what you are looking for, but I am a huge fan of modelsummary. I knit to HTML to see how it looks and then usually knit to pdf. The modelsummary equivalent would look something like this:
library(lfe)
library(modelsummary)
data = data.frame(x = 1:10, y = rnorm(10), z = rnorm(10))
results = felm(y ~ x + z, data = data)
modelsummary(results)
There are a lot of ways to customize it through kableExtra and other packages. The documentation is really good. Here is kind of a silly example
library(kableExtra)

modelsummary(results,
             coef_map = c("x" = "Cool Treatment",
                          "z" = "Confounder",
                          "(Intercept)" = "(Intercept)")) %>%
  row_spec(1, background = "#F5ABEA")
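As an aside on the commas themselves: stargazer() returns a character vector of html lines, and knitr collapses vectors printed inline with commas, which is almost certainly where they come from. A minimal fix (a sketch, not tested against your document) is to print the vector from a chunk with results='asis' instead of inline:
# in a chunk with the option results='asis':
# cat() writes the raw html lines, avoiding the comma-separated inline collapse
cat(result, sep = "\n")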

How to add beta parameter to F1 Score for fit_resample in TidyModel

I am using the fit_resamples() function in TidyModels to get the F1 metrics as below.
I would like to know how to pass the beta parameter whose default is set at 1 at the moment.
glm_workflow %>%
  fit_resamples(resamples = trainDatFolds,
                metrics = metric_set(roc_auc, pr_auc,
                                     accuracy, f_meas),
                control = control_resamples(save_pred = TRUE)) %>%
  collect_metrics()
Thanks a lot!
Zarni
You'll need a simple wrapper for the metric. See ?metric_set. The examples include one where the ccc() function is used with an argument.
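Following the pattern from ?metric_set, a wrapper that fixes beta might look something like this (an untested sketch; f_meas_beta2 is just an illustrative name):
library(yardstick)

# Wrap f_meas() with beta fixed at 2
f_meas_beta2 <- function(data, truth, estimate, na_rm = TRUE, ...) {
  f_meas(data,
         truth = !!rlang::enquo(truth),
         estimate = !!rlang::enquo(estimate),
         beta = 2,
         na_rm = na_rm,
         ...)
}

# Register the wrapper as a class metric so metric_set() accepts it
f_meas_beta2 <- new_class_metric(f_meas_beta2, direction = "maximize")

metric_set(roc_auc, pr_auc, accuracy, f_meas_beta2)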

How to get descriptive table for both continuous and categorical variables?

I want to get a descriptive table in html format for all variables in the data frame. For continuous variables I need the mean and standard deviation; for categorical variables, the frequency (absolute count) and percentage of each category. I also need the count of missing values to be included.
Let's use this data:
data("ToothGrowth")
df<-ToothGrowth
df$len[2]<-NA
df$supp[5]<-NA
I want to get a table in html format that will look like this:
----------------------------------------------------------------------
Variables      N (missing)     Mean (SD) / %
----------------------------------------------------------------------
len            59 (1)          18.9 (7.65)
supp
  OJ           30              50%
  VC           29              48.33%
  NA           1               1.67%
dose           60              1.17 (0.629)
I also need to set the number of digits shown after the decimal point.
If you know a better way to display this information in html, please provide your solution.
Here's a programmatic way to create separate summary tables for the numeric and factor columns. Note that this doesn't flag NAs in the table as you requested, but it does ignore NAs when calculating the summary stats, as you did. It's a starting point, anyway. From here you could combine the tables and format the headers however you want.
If you knit this code within an RMarkdown document with HTML output, kable will automatically generate the html table, and the CSS will format it nicely with horizontal rules. Note that there's also a booktabs option to kable that makes prettier tables, like the LaTeX booktabs package. Otherwise, see the documentation for knitr::kable for options.
library(dplyr)
library(tidyr)
library(knitr)

data("ToothGrowth")
df <- ToothGrowth
df$len[2] <- NA
df$supp[5] <- NA

numeric_cols <- dplyr::select_if(df, is.numeric) %>%
  gather(key = "variable", value = "value") %>%
  group_by(variable) %>%
  summarize(count = n(),
            mean = mean(value, na.rm = TRUE),
            sd = sd(value, na.rm = TRUE))

factor_cols <- dplyr::select_if(df, is.factor) %>%
  gather(key = "variable", value = "value") %>%
  group_by(variable, value) %>%
  summarize(count = n()) %>%
  mutate(p = count / sum(count, na.rm = TRUE))

knitr::kable(numeric_cols)
knitr::kable(factor_cols)
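As a rough sketch of combining the two summaries into one table (the layout, labels, and sprintf() digit formats here are just illustrative), which also shows one way to control the decimal places:
# Harmonize both summaries into a common layout, then render once
numeric_fmt <- numeric_cols %>%
  transmute(variable,
            N = count,
            `Mean (SD) / %` = sprintf("%.1f (%.2f)", mean, sd))

factor_fmt <- factor_cols %>%
  ungroup() %>%
  transmute(variable = paste(variable, value, sep = ": "),
            N = count,
            `Mean (SD) / %` = sprintf("%.2f%%", 100 * p))

bind_rows(numeric_fmt, factor_fmt) %>%
  knitr::kable()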
I found the R package table1, which does what I want. Here is the code:
library(table1)

data("ToothGrowth")
df <- ToothGrowth
df$len[2] <- NA
df$supp[5] <- NA

table1(reformulate(colnames(df)), data = df)

How to give different colors to different sections of a route on a leaflet map? [R Studio]

I have a JSON file of a long route. The file contains the lat and long coordinates of this route.
I'm trying to mark different sections of this route based on a set of criteria (which I've compiled in a dataframe). However, I'm facing two problems:
1) How do I break up this long set of lats and longs into segments? (I can't do this manually because I have many route variations.)
2) How do I assign a variable color to each segment?
I intend to use a leaflet map (for its interactivity), but I'm open to better suggestions.
When working with spatial data, it helps to know the spatial classes! I am assuming you know how to read your JSON file into R as a data frame.
Here's a reproducible example:
library(mapview)
library(sp)

### create some random data with coordinates from (data("breweries91", package = "mapview"))
set.seed(123)
dat <- data.frame(val = as.integer(rnorm(32, 10, 2)),
                  lon = coordinates(breweries91)[, 1],
                  lat = coordinates(breweries91)[, 2])

### state condition for creation of lines
cond <- c(8, 9, 10)

### loop through conditions and create a SpatialLines object for each condition
lns <- lapply(seq(cond), function(i) {
  ind <- dat$val == cond[i]
  sub_dat <- dat[ind, ]
  coords <- cbind(sub_dat$lon, sub_dat$lat)
  ln <- coords2Lines(coords, ID = as.character(cond[i]))
  proj4string(ln) <- "+init=epsg:4326"
  return(ln)
})

### view lines with mapview
mapview(lns[[1]], col = "darkred") +
  mapview(lns[[2]], col = "forestgreen") +
  mapview(lns[[3]], col = "cornflowerblue")
Essentially, what we are doing here is creating a valid sp::SpatialLines object for each condition we specify. Then we plot those using mapview, given that you mentioned interactivity. Plotting of spatial objects can be achieved in many ways (base, lattice, ggplot2, leaflet, ...), so there are many options to choose from. Have a look at the sp Gallery for a nice tutorial.
Note: This answer is only valid for non-projected geographic coordinates (i.e. latitude/longitude)!
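Since the question mentioned leaflet specifically, here is a minimal sketch of the same idea using leaflet::addPolylines() (assuming the lns list created above):
library(leaflet)

cols <- c("darkred", "forestgreen", "cornflowerblue")

# addPolylines() accepts the SpatialLines objects directly via `data`
m <- leaflet() %>% addTiles()
for (i in seq_along(lns)) {
  m <- addPolylines(m, data = lns[[i]], color = cols[i])
}
m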

Performance problem transforming JSON data

I've got some data in JSON format that I want to do some visualization on. The data (approximately 10MB of JSON) loads pretty fast, but reshaping it into a usable form takes a couple of minutes for just under 100,000 rows. I have something that works, but I think it can be done much better.
It may be easiest to understand by starting with my sample data.
Assuming you run the following command in /tmp:
curl http://public.west.spy.net/so/time-series.json.gz \
| gzip -dc - > time-series.json
You should be able to see my desired output (after a while) here:
require(rjson)

trades <- fromJSON(file = "/tmp/time-series.json")$rows

data <- do.call(rbind,
                lapply(trades,
                       function(row)
                         data.frame(date = strptime(unlist(row$key)[2], "%FT%X"),
                                    price = unlist(row$value)[1],
                                    volume = unlist(row$value)[2])))

someColors <- colorRampPalette(c("#000099", "blue", "orange", "red"),
                               space = "Lab")

# draw the scatter, then add a custom date axis
smoothScatter(data, colramp = someColors, xaxt = "n")
days <- seq(min(data$date), max(data$date), by = 'month')
axis(1, at = days,
     labels = strftime(days, "%F"),
     tick = FALSE)
You can get a 40x speedup by using plyr. Here is the code and the benchmarking comparison. The conversion to date can be done once you have the data frame and hence I have removed it from the code to facilitate apples-to-apples comparison. I am sure a faster solution exists.
f_ramnath = function(n) plyr::ldply(trades[1:n], unlist)[, -c(1, 2)]

f_dustin = function(n) do.call(rbind, lapply(trades[1:n],
  function(row) data.frame(
    date = unlist(row$key)[2],
    price = unlist(row$value)[1],
    volume = unlist(row$value)[2]))
)

f_mrflick = function(n) as.data.frame(do.call(rbind, lapply(trades[1:n],
  function(x) {
    list(date = x$key[2], price = x$value[1], volume = x$value[2])
  })))

f_mbq = function(n) data.frame(
  t(sapply(trades[1:n], '[[', 'key')),
  t(sapply(trades[1:n], '[[', 'value')))

rbenchmark::benchmark(f_ramnath(100), f_dustin(100), f_mrflick(100), f_mbq(100),
                      replications = 50)
            test elapsed   relative
  f_ramnath(100)   0.144   3.692308
   f_dustin(100)   6.244 160.102564
  f_mrflick(100)   0.039   1.000000
      f_mbq(100)   0.074   1.897436
EDIT. MrFlick's solution leads to an additional 3.5x speedup. I have updated my tests.
I received another transformation from MrFlick on IRC that was significantly faster and is worth mentioning here:
data <- as.data.frame(do.call(rbind,
                              lapply(trades,
                                     function(x) {list(date = x$key[2],
                                                       price = x$value[1],
                                                       volume = x$value[2])})))
It seems to be significantly faster because it does not build a data frame for each row: data.frame() is a comparatively expensive call, so constructing and rbind-ing ~100,000 one-row data frames dominates the runtime, while rbind-ing plain lists avoids that overhead.
You are doing vectorized operations on single elements, which is very inefficient. Price and volume can be extracted like this:
t(sapply(trades,'[[','value'))
And dates like this (the c(F, T) index recycles to keep every second element of the flattened keys, which is where the timestamp sits):
strptime(sapply(trades, '[[', 'key')[c(F, T)], '%FT%X')
Now, with a bit of syntactic sugar, the complete code looks like this:
data <- data.frame(
  strptime(sapply(trades, '[[', 'key')[c(F, T)], '%FT%X'),
  t(sapply(trades, '[[', 'value')))
names(data) <- c('date', 'price', 'volume')
On my notebook, the whole set gets converted in about 0.7 s, while the first 10k rows (10%) take circa 8 s with the original algorithm.
Is batching an option? Perhaps process 1000 rows at a time, depending on how deep your JSON is. Do you really need to transform all the data? I am not sure about R or what exactly you are dealing with, but I am thinking of a generic approach.
Also, do take a look at this: http://jackson.codehaus.org/ : a high-performance JSON processor.
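For what it's worth, a rough sketch of that batching idea in R, reusing the list-based transform from above (the chunk size is arbitrary and this is untested against the real data):
chunk_size <- 1000

# Split the rows into chunks, transform each chunk, then bind once at the end
chunks <- split(trades, ceiling(seq_along(trades) / chunk_size))
pieces <- lapply(chunks, function(chunk)
  as.data.frame(do.call(rbind, lapply(chunk, function(x)
    list(date = x$key[2], price = x$value[1], volume = x$value[2])))))
data <- do.call(rbind, pieces)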