R Shiny: How can we store values computed in an output, to be reused in an other output? - output

I'm developing at the moment my first Shiny app and I have a little issue:
In the "server part" of the app, I have 3 output objects (renderTables) named:
output$results
output$results2
output$results3
In the two first outputs, I compute some parameters using information comming from inputs. These computations are done in a conditionnal loop (if/if else) and for some of them stock in a variable "Resultats".
What I want to do is to reuse these parameters in the third output ("output$results3").
Is there a way to store these parameters when computing them and then to reuse them ?
As an example: I want to be able to use the value of "SP" (computed in out$results2) in the output$results3.
If I don't store them, the output$results3 just can't find the parameters...
I have already tried reactivevalues() function but it didn't work.
Do you have an idea how to do it?
# Define server logic to summarize and view selected dataset
server <- function(input, output) {
# 1. Calculs et formattage des résultats
#Demande
output$results <- renderTable({
if (input$shape == "QofPD") {
a <- input$a
b <- input$b
}
if (input$shape == "PofQD") {
a <- input$a/input$b
b <- 1/input$b
}
Q0 <- max(0,a - b*input$P0)
epd <- -b*input$P0/Q0
SC <- (a/b-input$P0)*Q0/2
DT0 <- input$P0*Q0
Variables <- c("Demande:", "Demande inverse:", "Prix (P*) = ","Quantité (Q*) = ",paste0("Elasticité-prix (",intToUtf8(949),") = "), "Dépense totale (DT*) = ", "Surplus du consommateur (SC) = ")
Resultats <- c(paste0("Q = ",round(a,2)," - ",round(b,2),"P"),paste0("P = ",round(a/b,2)," - ",round(1/b,2),"Q"),input$P0,round(Q0,2),round(epd,2),round(DT0,2),round(SC,2))
df.results <- data.frame(Variables, Resultats)
df.results
},
striped=TRUE, colnames=FALSE)
# Offre
output$results2 <- renderTable({
if (input$shape2 == "QofPO") {
a2 <- input$a2
b2 <- input$b2
inter <- -a2/b2
slope <- 1/b2
Q02 <- max(0,a2+b2*input$P0)
epo <- b2*input$P0/Q02
SP <- (Q02+a2)/2*input$P0
RT0 <- input$P0*Q02
Variables <- c("Offre:", "Offre inverse:", "Prix (P*) = ","Quantité (Q*) = ",paste0("Elasticité-prix (",intToUtf8(949),") = "), "Recette totale (RT*) = ", "Surplus du producteur (SP) = ")
Resultats <- c(paste0("Q = ",a2," + ",b2,"P"),paste0("P = ",round(slope,2),"Q"," ",round(inter,2)),input$P0,round(Q02,2),round(epo,2),round(RT0,2),round(SP,2))
df.results <- data.frame(Variables, Resultats)
df.results
}
else if (input$shape2 == "PofQO") {
a2 <- input$a2
b2 <- input$b2
inter <- a2
slope <- b2
Q02 <- max(0,(input$P0-a2)/b2)
epo <- (1/b2)*input$P0/Q02
SP <- (input$P0-inter)*Q02/2
RT0 <- input$P0*Q02
Variables <- c("Offre:", "Offre inverse:", "Prix (P*) = ","Quantité (Q*) = ",paste0("Elasticité-prix (",intToUtf8(949),") = "), "Recette totale (RT*) = ", "Surplus du producteur (SP) = ")
Resultats <- c(paste0("Q = ",round(-a2/b2,2)," + ",round(1/b2,2),"P"),paste0("P = ",round(inter,2)," + ",round(slope,2),"Q"),input$P0,round(Q02,2),round(epo,2),round(RT0,2),round(SP,2))
df.results <- data.frame(Variables, Resultats)
df.results
}
},
striped=TRUE, colnames=FALSE)
#Equilibre
output$results3 <- renderTable({
# Here I want to enter the parameters SP / SC...etc to be able to make new computation.
},
striped=TRUE, colnames=FALSE)
}
shinyApp(ui=ui, server=server)
I hope I was clear enough.
Thanks for your help.
And I stay available for more information if needed
Valentin

Related

Group by multiple columns in a function in dplyr

I want to create a function that takes an externally defined variable and uses it in a group by using dplyr. Here is what I have so far:
data(mtcars)
my_grp_col <- 'gear'
calculate_mean <- function(data, grouping_column, target){
data %>%
group_by(cyl, am, {{my_grp_col}}, target) %>%
summarize(mean(target, na.rm = T))
}
calculate_mean(data = mtcars, grouping_column = my_grp_col, target = mpg)
Essentially, I want to group by cyl, am, gear (which I have defined externally) and then calculate the mean of target (mpg).
The following would work (note that you need also {{...}} around target in this case):
data(mtcars)
my_grp_col <- 'gear'
calculate_mean <- function(data, grouping_column, target){
data %>%
group_by(cyl, am, !!sym(grouping_column), {{target}}) %>%
summarize(mean(target, na.rm = T))
}
calculate_mean(data = mtcars, grouping_column = my_grp_col, target = mpg)
However, it would look much nicer if you also directly give grouping_column without defining it as string before:
calculate_mean <- function(data, grouping_column, target){
data %>%
group_by(cyl, am, {{grouping_column}}, {{target}}) %>%
summarize(mean(target, na.rm = T))
}
calculate_mean(data = mtcars, grouping_column = gear, target = mpg)

Tips for Increasing the Effectiveness of this R Code

I am writing a loop (in R) to webscrape Reddit posts - using Reddit's API ("Pushshift").
Essentially, I would like to get every comment that contains the word "Trump" between now and until 20,000 hours ago at an hourly basis. The API stores the comments in a JSON frame - I wrote the following code in R to obtain these comments (note - I made it so that the results are saved after every 200 iterations in case of a crash):
library(jsonlite)
part1 = "https://api.pushshift.io/reddit/search/comment/?q=trump&after="
part2 = "h&before="
part3 = "h&size=500"
results = list()
for (i in 1:20000)
{tryCatch({
{
url_i<- paste0(part1, i+1, part2, i, part3)
r_i <- data.frame(fromJSON(url_i))
results[[i]] <- r_i
myvec_i <- sapply(results, NROW)
print(c(i, sum(myvec_i)))
ifelse(i %% 200 == 0, saveRDS(results, "results_index.RDS"), "" )
}
}, error = function(e){})
}
final = do.call(rbind.data.frame, results)
saveRDS(final, "final.RDS")
The code runs - but I am looking for tips to increase the speed and efficiency of this code. For example, I have noticed that:
Sometimes this code seems to take a really long time on certain iterations
I also have a feeling that as the "list" grows in size and the global environment with R becomes more full, things are also slowing down.
Sometimes, the webscraping stops collecting new results (i.e. I added a statement which shows the cumulative number of results that have been collected at each iteration - sometimes, this number stops updating)
I used "tryCatch()" to skip errors to prevent the loop from crashing - but perhaps there might have been some way around this that could have potentially resulted in more Reddit comments being scraped?
Could someone please recommend some tips on how to optimize and speed this code up? Perhaps someone could try running this code and let me know what they think?
Thank you!
There are two things you can do : 1) save the data.frame into a ".RData file" at each iteration. You need less memory when you do this because you do not store data in the RAM 2) use parallel calculations. Here is an example :
library(parallel)
library(doParallel)
library(RSelenium)
fn_Par <- function(core_Id, all_Index, list_remDr, nb_Core)
{
library(jsonlite)
library(RSelenium)
remDr <- list_remDr[[core_Id]]
remDr$open()
setwd("D:\\")
part1 <- "https://api.pushshift.io/reddit/search/comment/?q=trump&after="
part2 <- "h&before="
part3 <- "h&size=500"
nb_Index_All <- length(all_Index)
nb_Id_Per_Core <- floor(nb_Index_All / nb_Core)
index_To_Extract <- all_Index[(1 + (core_Id - 1) * nb_Id_Per_Core) : min((core_Id * nb_Id_Per_Core), nb_Index_All)]
for(i in index_To_Extract)
{
url_i <- paste0(part1, i + 1, part2, i, part3)
remDr$navigate(url_i)
Sys.sleep(0.5)
web_Obj <- remDr$findElement("css selector", 'body > pre')
r_i <- tryCatch(data.frame(fromJSON(web_Obj$getElementText()[[1]])), error = function(e) NA)
if(is.null(dim(r_i)) == FALSE)
{
Sys.sleep(10)
remDr$navigate(url_i)
web_Obj <- remDr$findElement("css selector", 'body > pre')
r_i <- tryCatch(data.frame(fromJSON(web_Obj$getElementText()[[1]])), error = function(e) NA)
}
save(r_i, file = paste0(i, "_core_Id_", core_Id, ".RData"))
Sys.sleep(0.5)
}
}
nb_CPU <- 4
cluster <- parallel::makeCluster(nb_CPU)
doParallel::registerDoParallel(cl = cluster)
list_remDr <- list()
list_rd <- list()
for(i in 1 : nb_CPU)
{
print(i)
port <- as.integer(4444L + rpois(lambda = 1000, 1))
list_rd[[i]] <- rsDriver(chromever = "105.0.5195.52", browser = "chrome", port = port)
list_remDr[[i]] <- list_rd[[i]]$client
}
parLapply(cluster, X = 1 : nb_CPU, fun = fn_Par, all_Index = 1 : 2000, list_remDr = list_remDr, nb_Core = nb_CPU)
Here is another approach that can be considered :
library(parallel)
library(doParallel)
fn_Par <- function(core_Id, all_Index, nb_Core)
{
library(jsonlite)
setwd("D:\\")
part1 <- "https://api.pushshift.io/reddit/search/comment/?q=trump&after="
part2 <- "h&before="
part3 <- "h&size=500"
nb_Index_All <- length(all_Index)
nb_Id_Per_Core <- floor(nb_Index_All / nb_Core)
index_To_Extract <- all_Index[(1 + (core_Id - 1) * nb_Id_Per_Core) : min((core_Id * nb_Id_Per_Core), nb_Index_All)]
for(i in index_To_Extract)
{
url_i <- paste0(part1, i + 1, part2, i, part3)
r_i <- tryCatch(data.frame(fromJSON(url_i)), error = function(e) NA)
if(is.null(dim(r_i)) == TRUE)
{
Sys.sleep(5)
r_i <- tryCatch(data.frame(fromJSON(url_i)), error = function(e) NA)
}
if(is.null(dim(r_i)) == TRUE)
{
Sys.sleep(5)
r_i <- tryCatch(data.frame(fromJSON(url_i)), error = function(e) NA)
}
if(is.null(dim(r_i)) == TRUE)
{
Sys.sleep(5)
r_i <- tryCatch(data.frame(fromJSON(url_i)), error = function(e) NA)
}
save(r_i, file = paste0(i, "_core_Id_", core_Id, ".RData"))
}
}
nb_CPU <- 4
cluster <- parallel::makeCluster(nb_CPU)
doParallel::registerDoParallel(cl = cluster)
parLapply(cluster, X = 1 : nb_CPU, fun = fn_Par, all_Index = 1 : 2000, nb_Core = nb_CPU)

Scrape page content after option tag is selected

I'd like to scrape the content of a page once the province (and the commune) are selected.
The following code correctly outputs the provinces and their values.
library(rvest)
page <- read_html(x = "https://www.solferinoesanmartino.it/progetto-torelli/progetto-torelli-risultati/")
text <- page %>% html_nodes(xpath='//select[#name="provincia"]/option')%>% html_text()
values <- page %>% html_nodes(xpath='//select[#name="provincia"]/option')%>% html_attr("value")
Res <- data.frame(text = text, values = values, stringsAsFactors = FALSE)
Res
Now, I'd like to access the page for each value, e.g. this might be helpful for getting access to value=19.
text <- page %>% html_nodes(xpath="//*/option[#value = '19']")%>% html_text()
text
The source code is the following
<div class="row results_form_search">
<form role="search" method="POST" class="search-form" action="/progetto-torelli/progetto-torelli-risultati/" id="search_location">
<input type="hidden" name="comune_from" value="" />
<div class="form-row">
<input type="text" name="cognome" placeholder="Cognome" autocomplete="off" value="">
<select name="provincia">
<option value="0" selected>Seleziona Provincia</option>
<option value="74"
>-
</option>
<option value="75"
>AGRIGENTO
</option>
<option value="19"
>ALESSANDRIA
This is where the content that I want to scrape might be.
<div class="row">
<ul class="listing_search">
</ul>
</div>
Thank you so much for your advice!
RSelenium may end up being the way to go. However, if you can insert some judicious waits, or chunk your requests, so server isn't swamped with requests, you can use rvest and make the same requests the page does.
You first need to generate all the combinations of province and comune (filtering out unwanted values); this can be done by making xmlhttp requests, using the value attribute for the options within the select for province, to gather back the comune dropdown options and their associated values.
You then make further requests, for each combination pair, to get the page content, which you would get when making selections from each of those dropdowns manually and pushing CERCA.
Pauses are needed as there are 10,389 valid combinations, by my reckoning, and, if you attempt to make all those requests one after the other, following the initial requests as well, the server will cut-off the connection.
Another option would be to chunk up combined into smaller dataframes and make requests for those at timed intervals and then combine the results.
library(rvest)
library(dplyr)
library(purrr)
get_provincias <- function(link) {
nodes <- read_html(link) %>%
html_nodes('[name="provincia"] > option:not([selected]):not(:contains("-")):not(:contains("\u0085"))')
df <- data.frame(
Provincia = nodes %>% html_text(trim = T),
id0 = nodes %>% html_attr("value")
)
return(df)
}
get_comunes <- function(id) {
link <- sprintf(
"https://www.solferinoesanmartino.it/db-torelli/_get_comuni.php?id0=%s&id1=0&_=%i",
id,
as.numeric(as.POSIXct(Sys.Date(), format = "%Y-%m-%d"))
)
# print(link)
nodes <- read_html(link) %>% html_nodes('option:not([value="0"])')
df <- data.frame(
id0 = id, # id1
Comune = nodes %>% html_text(trim = T),
id3 = nodes %>% html_attr("value")
)
return(df)
}
get_page <- function(prov_id, com_id) {
link <- sprintf(
"https://www.solferinoesanmartino.it/db-torelli/_get_soldati.php?id0=1&id1=&id2=%s&id3=%s&_=%i",
prov_id,
com_id,
as.numeric(as.POSIXct(Sys.Date(), format = "%Y-%m-%d"))
)
page <- read_html(link)
# print(page %>% html_node(".listing_name") %>% html_text(trim = T))
# print(tibble(id3 = com_id, page = page))
return(tibble(id3 = com_id, page = page))
}
provincias <- get_provincias("https://www.solferinoesanmartino.it/progetto-torelli/progetto-torelli-risultati")
comunes <- map_df(provincias$id0, get_comunes) %>% filter(Comune != "-")
combined <- dplyr::right_join(provincias, comunes, by = "id0")
# length(combined$Comune) -> 10389
results <- map2_dfr(combined$id0, combined$id3, .f = get_page)
final <- dplyr::inner_join(combined, results, by = "id3")
Below is a longer version, with the additional info you requested, where I played around with adding pauses. I still found that I could run up to, and including
combined <- dplyr::right_join(provincias, comunes, by = "id0")
in one go. But after that I needed to chunk requests into about 2000 requests batches with 20-30 minutes in between. You can try tweaking the timings below. I ended up using the commented out section to run each batch and then left a pause of 30 mins in between.
Some things to consider:
It seems that you can have comunes values like ... which still return listings. With that in mind you may wish to remove the :not parts of this:
html_nodes('[name="provincia"] > option:not([selected]):not(:contains("-")):not(:contains("\u0085"))')
as I assumed that was filtering out invalid results.
Next, you might consider writing a helper function with httr and retry,
to make the requests with backoff/retry, rather than use pauses.
Such a function might look like this:
httr::RETRY(
"GET",
<request url>,
times = 3,
pause_min = 20*60,
pause_base = 20*60)
Anyway, those are some ideas. Even without the server cutting the connection, via uses of waits, I still found it started to throttle requests, meaning some requests took quite a long time to complete. Optimizing this could potentially take a lot of time and effort. I spent a good few days playing around looking at chunk size and waits.
library(rvest)
library(dplyr)
library(purrr)
get_provincias <- function(link) {
nodes <- read_html(link) %>%
html_nodes('[name="provincia"] > option:not([selected]):not(:contains("-")):not(:contains("\u0085"))')
df <- data.frame(
Provincia = nodes %>% html_text(trim = T),
id0 = nodes %>% html_attr("value")
)
return(df)
}
get_comunes <- function(id) {
link <- sprintf(
"https://www.solferinoesanmartino.it/db-torelli/_get_comuni.php?id0=%s&id1=0&_=%i",
id,
as.numeric(as.POSIXct(Sys.Date(), format = "%Y-%m-%d"))
)
# print(link)
nodes <- read_html(link) %>% html_nodes('option:not([value="0"])')
df <- data.frame(
id0 = id, # id1
Comune = nodes %>% html_text(trim = T),
id3 = nodes %>% html_attr("value")
)
return(df)
}
get_data <- function(prov_id, com_id) {
link <- sprintf(
"https://www.solferinoesanmartino.it/db-torelli/_get_soldati.php?id0=1&id1=&id2=%s&id3=%s&_=%i",
prov_id,
com_id,
as.numeric(as.POSIXct(Sys.Date(), format = "%Y-%m-%d"))
)
# print(link)
page <- read_html(link)
df <- data.frame(
cognome = page %>% html_nodes(".listing_name") %>% html_text(trim = T),
livello = page %>% html_nodes(".listing_level") %>% html_text(trim = T),
id3 = com_id,# for later join back on comune
id0 = prov_id
)
Sys.sleep(.25) # pause for . sec
return(df)
}
get_chunks <- function(df, chunk_size) { # adapted from #BenBolker https://stackoverflow.com/a/7060331
n <- nrow(df)
r <- rep(1:ceiling(n / chunk_size), each = chunk_size)[1:n]
d <- split(df, r)
return(d)
}
write_rows <- function(df, filename) {
flag <- file.exists(filename)
df2 <- purrr::map2_dfr(df$id0, df$id3, .f = get_data)
write.table(df2,
file = filename, sep = ",",
append = flag,
quote = F, col.names = !flag,
row.names = F
)
Sys.sleep(60*10)
}
provincias <- get_provincias("https://www.solferinoesanmartino.it/progetto-torelli/progetto-torelli-risultati")
Sys.sleep(60*5)
comunes <- map_df(provincias$id0, get_comunes) %>% filter(Comune != "-")
Sys.sleep(60*10)
combined <- dplyr::right_join(provincias, comunes, by = "id0")
Sys.sleep(60*10)
chunked <- get_chunks(combined, 2000) # https://stackoverflow.com/questions/7060272/split-up-a-dataframe-by-number-of-rows
filename <- "prov_com_cog_liv.csv"
map(chunked, ~ write_rows(.x, filename))
## #### test case #####################
# df <- chunked[[6]]
#
# flag <- file.exists(filename)
#
# df2 <- map2_dfr(df$id0, df$id3, .f = get_data)
#
# write.table(df2,
# file = filename, sep = ",",
# append = flag,
# quote = F, col.names = !flag,
# row.names = F
# )
####################################
results <- read.csv(filename)
final <- dplyr::right_join(combined, results, by = "id3")

How can I get the data of the second web page?

I am trying to get the data from a web in R using rvest package: https://etfdb.com/stock/AAPL/
But no matter how I tried, I can only get the table of the first page. Could anybody help me do this? Thank you so much.
See code below. tb1 and tb2 are the same!! That's wired.
url1 <- "https://etfdb.com/stock/AAPL/#etfs&sort_name=weighting&sort_order=desc&page=1"
url2 <- "https://etfdb.com/stock/AAPL/#etfs&sort_name=weighting&sort_order=desc&page=2"
tbs1 <- rvest::html_nodes(xml2::read_html(url1), "table")
tbs2 <- rvest::html_nodes(xml2::read_html(url2), "table")
tb1 <- rvest::html_table(tbs1[1])[[1]]
tb2 <- rvest::html_table(tbs2[1])[[1]]
This website post GET requests to update JSON data to the table. After some attempts, this is the code I came up with to deal with JSON data: (not a beautiful code but it works)
library(rjson)
library(rvest)
library(writexl)
lastpage <- 9;
df <- data.frame();
for (i in 1:lastpage){
x <- fromJSON(file = paste("https://etfdb.com/data_set/?tm=40274&cond={%22by_stock%22:25}&no_null_sort=&count_by_id=true&limit=25&sort=weighting&order=desc&limit=25&offset=", 25 * (i-1), sep = ""));
x <- x[2][[1]];
pg_df <- data.frame(matrix(unlist(x), nrow=length(x), byrow=T),stringsAsFactors=FALSE);
df <- rbind(df, pg_df);
}
for (i in 1:nrow(df)){
df$X1[i] <- read_html(df$X1[i]) %>% html_text(trim = TRUE);
df$X3[i] <- read_html(df$X3[i]) %>% html_text(trim = TRUE);
df$X5[i] <- read_html(df$X5[i]) %>% html_text(trim = TRUE);
}
df <- data.frame(df$X1, df$X3, df$X5, df$X7, df$X9);
colnames(df) <- c("Ticker", "ETF", "ETFdb.com Category", "Expense Ratio", "Weighting");
write_xlsx(
df,
path = "stock.xlsx",
col_names = TRUE,
format_headers = TRUE,
use_zip64 = FALSE
)
Update:
You can see the data source at the attribute data-url of the table here:
I'll update the code that makes it easier for you:
library(rjson)
library(rvest)
library(writexl)
stock_ticket <- "AAPL";
url <- paste("https://etfdb.com/stock/", stock_ticket, sep = "");
lastpage <- 9;
df <- data.frame();
data_url <- read_html(url) %>% html_node(xpath = "//table[#id='etfs']") %>% html_attr("data-url");
for (i in 1:lastpage){
x <- fromJSON(file = paste("https://etfdb.com", data_url, "&offset=", 25 * (i-1), sep = ""));
x <- x[2][[1]];
pg_df <- data.frame(matrix(unlist(x), nrow=length(x), byrow=T),stringsAsFactors=FALSE);
df <- rbind(df, pg_df);
}
for (i in 1:nrow(df)){
df$X1[i] <- read_html(df$X1[i]) %>% html_text(trim = TRUE);
df$X3[i] <- read_html(df$X3[i]) %>% html_text(trim = TRUE);
df$X5[i] <- read_html(df$X5[i]) %>% html_text(trim = TRUE);
}
df <- data.frame(df$X1, df$X3, df$X5, df$X7, df$X9);
colnames(df) <- c("Ticker", "ETF", "ETFdb.com Category", "Expense Ratio", "Weighting");
write_xlsx(
df,
path = "stock.xlsx",
col_names = TRUE,
format_headers = TRUE,
use_zip64 = FALSE
)

Web scrape request does not work for long date range input in R

The code here works for web scraping by sending repeated request based on the input date range (startDate and endDate). Then data will be saved in csv file. I have used this code before for different xPath # html_node() argument, and it works fine. Now with different xPath, seems like it cannot works for longer date range. With this one, I also can't detect which data went missing because the code fails to work when the heading element applied as attr(). I've tried increase the range slowly and it only works until [1:29]. After that, no matter what the range used, it keep showing the old return. In some cases, I can see str(new_df) complete the request, but it keep saving the old return, as if the bind_rows fail. Sometimes (by using date of different year), I was able to extract the range desired by slowly increase the range (desired range is [1:92]). It makes me excited, but when I change the date input to get other different year it return the last record. sometimes with error, and sometimes the error not appear. I include the lengthy code here so anyone can reproduce it. I wonder if the website burdened by repeated request or my pc getting muzzy. Kindly help.
get_sounding_data <- function(region = c("naconf", "samer", "pac", "nz", "ant",
"np", "europe", "africa", "seasia", "mideast"),
date,
from_hr = c("00", "12", "all"),
to_hr = c("00", "12", "all"),
station_number = 48615) {
# we use these pkgs (I removed the readr and dplyr dependencies)
suppressPackageStartupMessages({
require("xml2", quietly = TRUE)
require("httr", quietly = TRUE)
require("rvest", quietly = TRUE)
})
# validate region
region <- match.arg(
arg = region,
choices = c(
"naconf", "samer", "pac", "nz", "ant",
"np", "europe", "africa", "seasia", "mideast"
)
)
# this actually validates the date for us if it's a character string
date <- as.Date(date)
# get year and month
year <- as.integer(format(date, "%Y"))
stopifnot(year %in% 1973:as.integer(format(Sys.Date(), "%Y")))
year <- as.character(year)
month <- format(date, "%m")
# we need these to translate day & *_hr to the param the app needs
c(
"0100", "0112", "0200", "0212", "0300", "0312", "0400", "0412",
"0500", "0512", "0600", "0612", "0700", "0712", "0800", "0812",
"0900", "0912", "1000", "1012", "1100", "1112", "1200", "1212",
"1300", "1312", "1400", "1412", "1500", "1512", "1600", "1612",
"1700", "1712", "1800", "1812", "1900", "1912", "2000", "2012",
"2100", "2112", "2200", "2212", "2300", "2312", "2400", "2412",
"2500", "2512", "2600", "2612", "2700", "2712", "2800", "2812",
"2900", "2912", "3000", "3012", "3100", "3112"
) -> hr_vals
c(
"01/00Z", "01/12Z", "02/00Z", "02/12Z", "03/00Z", "03/12Z", "04/00Z",
"04/12Z", "05/00Z", "05/12Z", "06/00Z", "06/12Z", "07/00Z", "07/12Z",
"08/00Z", "08/12Z", "09/00Z", "09/12Z", "10/00Z", "10/12Z", "11/00Z",
"11/12Z", "12/00Z", "12/12Z", "13/00Z", "13/12Z", "14/00Z", "14/12Z",
"15/00Z", "15/12Z", "16/00Z", "16/12Z", "17/00Z", "17/12Z", "18/00Z",
"18/12Z", "19/00Z", "19/12Z", "20/00Z", "20/12Z", "21/00Z", "21/12Z",
"22/00Z", "22/12Z", "23/00Z", "23/12Z", "24/00Z", "24/12Z", "25/00Z",
"25/12Z", "26/00Z", "26/12Z", "27/00Z", "27/12Z", "28/00Z", "28/12Z",
"29/00Z", "29/12Z", "30/00Z", "30/12Z", "31/00Z", "31/12Z"
) -> hr_inputs
hr_trans <- stats::setNames(hr_vals, hr_inputs)
o_from_hr <- from_hr <- as.character(tolower(from_hr))
o_to_hr <- to_hr <- as.character(tolower(to_hr))
if ((from_hr == "all") || (to_hr == "all")) {
from_hr <- to_hr <- "all"
} else {
from_hr <- hr_trans[sprintf("%s/%02dZ", format(date, "%d"), as.integer(from_hr))]
match.arg(from_hr, hr_vals)
to_hr <- hr_trans[sprintf("%s/%02dZ", format(date, "%d"), as.integer(to_hr))]
match.arg(to_hr, hr_vals)
}
# clean up the station number if it was entered as a double
station_number <- as.character(as.integer(station_number))
# execute the API call
httr::GET(
url = "http://weather.uwyo.edu/cgi-bin/sounding",
query = list(
region = region,
TYPE = "TEXT:LIST",
YEAR = year,
MONTH = sprintf("%02d", as.integer(month)),
FROM = from_hr,
TO = to_hr,
STNM = station_number
)
) -> res
# check for super bad errors (that we can't handle nicely)
httr::stop_for_status(res)
# get the page content
doc <- httr::content(res, as="text")
# if the site reports no data, issue a warning and return an empty data frame
if (grepl("Can't get", doc)) {
doc <- xml2::read_html(doc)
msg <- rvest::html_nodes(doc, "body")
msg <- rvest::html_text(msg, trim=TRUE)
msg <- gsub("\n\n+.*$", "", msg)
warning(msg)
return(data.frame(stringsAsFactors=FALSE))
}
# turn it into something we can parse
doc <- xml2::read_html(doc)
# get the metadata
#meta <- rvest::html_node(doc, "h2")
#meta <- rvest::html_text(meta, trim=TRUE)
#attr(doc, "meta") <- meta
raw_dat <- doc %>%
html_nodes("pre + h3") %>%
html_text()
indices <- doc %>%
str_split(pattern = "\n", simplify = T) %>%
map_chr(str_squish) %>%
tibble(x = .) %>%
separate(x, into = c("Station", "Value"), sep = ": ") %>%
filter(!is.na(Value))
data <- tidyr::spread(indices, Station, Value)
data
}
startDate <- as.Date("01-11-1984", format="%d-%m-%Y")
endDate <- as.Date("04-11-1984",format="%d-%m-%Y")
#startDate <- as.Date("01-11-1984", format="%d-%m-%Y")
#endDate <- as.Date("31-01-1985",format="%d-%m-%Y")
days <- seq(startDate, endDate, "day")
#wanted to have [1:92], but its not working
lapply(days[1:4], function(day) {
get_sounding_data(
region = "seasia",
date = day,
from_hr = "00",
to_hr = "00",
station_number = "48615"
)
}) -> soundings_48615
warnings()
new_df <- map(soundings_48615, . %>% mutate_all(parse_guess))
#str(new_df)
library(tidyr)
library(tidyverse)
library(dplyr)
dat <- bind_rows(new_df)
dat <- dat %>% separate(col =`Observation time`, into = c('Date', 'time'), sep = '/')
dat$Date <- as.Date(dat$Date, format = "%y%m%d")
#save in text file
library(xlsx)
write.csv(dat, 'c:/Users/Hp/Documents/1984.csv')
get_sounding_data <- NULL
error
Error in bind_rows_(x, .id) :
Column `1000 hPa to 500 hPa thickness` can't be converted from numeric to character
dat <- dat %>% separate(col =`Observation time`, into = c('Date', 'time'), sep = '/')
Error in eval_tidy(enquo(var), var_env) :
object 'Observation time' not found
I've install different R version, but this error keep come out. So I ignore it.
Error: package or namespace load failed for ‘xlsx’:
.onLoad failed in loadNamespace() for 'rJava', details:
call: fun(libname, pkgname)
error: No CurrentVersion entry in Software/JavaSoft registry! Try re-
installing Java and make sure R and Java have matching architectures.