"attempt to select less than one element" with R and json - json

I use the smappR package for Twitter harvesting and analysis. When I use the function getTimeline (https://github.com/SMAPPNYU/smappR/blob/master/R/get-timeline.R) in this loop:
#import excel file with twitter usernames
riksdagsledamoter <- read.xlsx("lista_riksdagsledamoter_twitter.xlsx", sheetIndex = 1)
#delete MPs without a twitter account
twitterusers <- riksdagsledamoter[!(is.na(riksdagsledamoter$twitterusername)), ]
row.names(twitterusers) <- seq(nrow(twitterusers))
names(twitterusers) <- c("twitterusername", "name", "parti")
#convert the username column to a vector
twitterusersvector <- as.vector(twitterusers$twitterusername)
for (i in 1:length(twitterusersvector)) {
  user <- twitterusersvector[i]
  # user <- twitterusers$twitterusername[i]
  # rov <- as.vector(user)
  getTimeline(screen_name = user,
              filename = "pa324.json", # where tweets will be stored
              n = 3200,                # number of tweets to download (max is 3,200)
              oauth_folder = "~/Dropbox/Privat/R/credentials",
              sleep = 10)
  if (file.info("pa324.json")$size > 120000000) {
    tweetsToMongo(file.name = "pa324.json", ns = "xxxx.yyyy",
                  host = "192.168.x.x", username = "", password = "")
    file.remove("pa324.json")
  }
}
I get this error:
Error in json.data[[tweets]] : attempt to select less than one element
I have googled the error and tried:
row.names(twitterusers) <- seq(nrow(twitterusers))
but it did not help. What could be wrong?
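A likely culprit (an assumption, since the error is raised inside getTimeline()): json.data[[tweets]] fails with "attempt to select less than one element" when the index is empty or zero, which is what happens when an account's timeline comes back with no tweets (protected, suspended, or simply empty). A sketch of a workaround along those lines, wrapping each call in tryCatch so one problematic account does not abort the whole loop:
for (i in 1:length(twitterusersvector)) {
  user <- twitterusersvector[i]
  tryCatch(
    getTimeline(screen_name = user,
                filename = "pa324.json",
                n = 3200,
                oauth_folder = "~/Dropbox/Privat/R/credentials",
                sleep = 10),
    # skip accounts whose timelines cannot be downloaded instead of stopping
    error = function(e) message("Skipping ", user, ": ", conditionMessage(e))
  )
  if (file.exists("pa324.json") && file.info("pa324.json")$size > 120000000) {
    tweetsToMongo(file.name = "pa324.json", ns = "xxxx.yyyy",
                  host = "192.168.x.x", username = "", password = "")
    file.remove("pa324.json")
  }
}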


How to pass multiple values as IDs to `rvest`

I want to extract a set of numbers from the website https://www.bcassessment.ca/ using PIDs (unique IDs). Sample PIDs are shown below:
PID <- c("012-215-023", "024-521-647", "025-891-669")
For these values, I opened the website manually, chose PID from the list of available search options, and searched for these numbers. The search redirected me to the following URLs:
URL <- c("https://www.bcassessment.ca//Property/Info/QTAwMDAwM1hIUA==",
"https://www.bcassessment.ca//Property/Info/QTAwMDAwNEJKMA==",
"https://www.bcassessment.ca//Property/Info/QTAwMDAwMUc5OA==")
Then, for each of these URLs, I ran the code shown below to extract the total value of the property:
library(rvest)
out <- c()
for (i in 1:length(URL)) {
  url <- URL[i]
  out[i] <- url %>%
    read_html() %>%
    html_nodes('span#lblTotalAssessedValue') %>%
    html_text()
}
which gives me the final result
[1] "$543,000" "$957,000" "$487,000"
The problem is that I have a list of more than 50,000 PIDs, and I cannot manually search each of them on the website to find the actual link and then run rvest to scrape it. How do you recommend automating this process so that I only provide the PIDs and get the output prices?
Summary: for a list of known PIDs, I want to open https://www.bcassessment.ca/ and extract the most up-to-date price of each property, and I want this to be done automatically.
Test_PID
I added a list of PIDs so you can check that the code is working:
structure(list(P.I.D.. = c("004-050-541", "016-658-540", "016-657-861",
"016-657-764", "019-048-386", "025-528-360", "800-058-036", "025-728-954",
"028-445-783", "027-178-048", "028-445-571", "025-205-145", "015-752-798",
"026-041-308", "024-521-698", "027-541-631", "024-360-651", "028-445-040",
"025-851-411", "025-529-293", "024-138-436", "023-893-796", "018-496-768",
"025-758-721", "024-219-665", "024-359-866", "018-511-015", "026-724-979",
"023-894-253", "006-331-505", "025-961-012", "024-219-690", "027-309-878",
"028-445-716", "025-759-060", "017-692-733", "025-728-237", "028-447-221",
"023-894-202", "028-446-020", "026-827-611", "028-058-798", "017-574-412",
"023-893-591", "018-511-457", "025-960-199", "027-178-714", "027-674-941",
"027-874-826", "025-110-390", "028-071-336", "018-257-984", "023-923-393",
"026-367-203", "027-601-854", "003-773-922", "025-902-989", "018-060-641",
"025-530-003", "018-060-722", "025-960-423", "016-160-126", "009-301-461",
"025-960-580", "019-090-315", "023-464-283", "028-445-503", "006-395-708",
"028-446-674", "018-258-549", "023-247-398", "029-321-166", "024-519-871",
"023-154-161", "003-904-547", "004-640-357", "006-314-864", "025-960-521",
"013-326-783", "003-430-049", "027-490-084", "024-360-392", "028-054-474",
"026-076-179", "005-309-689", "024-613-509", "025-978-551", "012-215-066",
"024-034-002", "025-847-244", "024-222-038", "003-912-019", "024-845-264",
"006-186-254", "026-826-691", "026-826-712", "024-575-569", "028-572-581",
"026-197-774", "009-695-958", "016-089-120", "025-703-811", "024-576-671",
"026-460-751", "026-460-149", "003-794-181", "018-378-684", "023-916-745",
"003-497-721", "003-397-599", "024-982-211", "018-060-129", "018-061-231",
"017-765-714", "027-303-799", "028-565-312", "018-061-010", "006-338-232",
"023-680-024", "028-983-971", "028-092-490", "006-293-239", "018-061-257",
"028-092-376", "018-060-137", "004-302-664", "016-988-060", "003-371-166",
"027-325-342", "011-475-480", "018-060-200")), row.names = c(NA,
-131L), class = c("tbl_df", "tbl", "data.frame"))
P.S. The website I mentioned is public and anyone can open it and enter an address to find the estimated price of a property, so I don't think there is any problem with scraping it, as it's a public database.
When you submit the PID through the form, it triggers the following call:
GET https://www.bcassessment.ca/Property/Search/GetByPid/012215023?PID=012215023&_=1619713418473
The call above has the following parameters:
012215023 is the PID from your input without the dashes; it is used both as a path parameter and as a query parameter
1619713418473 is the current timestamp in milliseconds since 1970 (a Unix timestamp)
The result of the call above is a json response like this:
{
"sEcho": 1,
"aaData": [
["XXXXXXX", "XXXXXXXX", "XXXXXXXXXXXX", "200-027-615-115-48-0004", "QTAwMDAwM1hIUA=="]
]
}
The call above returns the response with a text/plain content type rather than application/json, so we have to parse it with jsonlite. We then pick the last item of the aaData array, which in this case is QTAwMDAwM1hIUA==, and build the resulting URL like the one in your post.
The following code takes a list of PIDs and extracts the assessed value for each of them:
library(rvest)
library(httr)
getValueForPID <- function(pid) {
  pidNum <- gsub("-", "", pid)
  time <- as.numeric(as.POSIXct(Sys.time())) * 1000
  output <- content(GET(paste0("https://www.bcassessment.ca/Property/Search/GetByPid/", pidNum),
                        query = list(
                          "PID" = pidNum,
                          "_" = format(time, digits = 13)
                        )), "text", encoding = "UTF-8")
  if (output == "found_no_results") {
    return("")
  }
  data <- jsonlite::fromJSON(output)
  id <- data$aaData[5]
  text <- paste0("https://www.bcassessment.ca/Property/Info/", id) %>%
    read_html() %>%
    html_nodes('span#lblTotalAssessedValue') %>%
    html_text()
  return(text)
}
PID <- c("004-050-541", "016-658-540", "016-657-861", "016-657-764", "019-048-386", "025-528-360", "800-058-036")
out <- c()
count <- 1
for (i in PID) {
print(i)
out[count] <- getValueForPID(i)
count <- count + 1
}
print(out)
sample output:
[1] "$543,000" "$957,000" "$487,000"
kaggle link: https://www.kaggle.com/bertrandmartel/bcassesment-pid
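Since the question mentions more than 50,000 PIDs, it is probably worth pausing between requests and guarding against occasional failures when running the loop at that scale. A sketch (the one-second pause is an arbitrary choice, not a documented limit of the site):
out <- character(length(PID))
for (i in seq_along(PID)) {
  Sys.sleep(1) # arbitrary pause so the site is not hit with tens of thousands of back-to-back requests
  val <- tryCatch(getValueForPID(PID[i]), error = function(e) NA_character_)
  out[i] <- if (length(val) == 1) val else NA_character_  # keep alignment even if a lookup returns nothing
}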

Inconsistent error in RStudio web-scraping script: "arguments imply differing number of rows: 31, 30"

(edited to make question and problem clearer)
I am running an R script to scrape Goodreads reviews. Recently, I have been getting the following error message for some of the pages I'm trying to scrape:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 31, 30
The numbers at the end may change, e.g. 30, 33.
What seems strange to me is that the error is not constant: it only occurs for some of the pages I'm trying to scrape, although the script itself stays the same. For example, scraping the reviews of The Handmaid's Tale (https://www.goodreads.com/book/show/38447.The_Handmaid_s_Tale?ac=1&from_search=true&qid=ZGrzc7AfLN&rank=1) causes an error (32, 30), while scraping the reviews of Typhoon Kingdom (https://www.goodreads.com/book/show/52391186-typhoon-kingdom) causes no problems.
Full script:
library(rJava)
library(data.table)
library(dplyr)
library(magrittr)
library(rvest)
library(RSelenium)
library(lubridate)
library(stringr)
library(purrr)
options(stringsAsFactors = F) #needed to prevent errors when merging data frames
#Paste the Goodreads Url
url <- "https://www.goodreads.com/book/show/6101138-wolf-hall"
englishOnly = F #If FALSE, all languages are chosen
#Set your browser settings
#Do NOT use Firefox!
rD <- rsDriver(browser = "chrome", chromever = "83.0.4103.39")
remDr <- rD[["client"]]
remDr$setTimeout(type = "implicit", 2000)
remDr$navigate(url)
bookTitle = unlist(remDr$getTitle())
finalData = data.frame()
# Main loop going through the website pages
morePages = T
pageNumber = 1
while(morePages){
#Select reviews in correct language
selectLanguage = if(englishOnly){
selectLanguage = remDr$findElement("xpath", "//select[@id='language_code']/option[@value='']")
} else {
selectLanguage = remDr$findElement("xpath", "//select[@id='language_code']/option[1]")
}
selectLanguage$clickElement()
Sys.sleep(3)
#Expand all reviews
expandMore <- remDr$findElements("link text", "...more")
sapply(expandMore, function(x) x$clickElement())
#Extracting the reviews from the page
reviews <- remDr$findElements("css selector", "#bookReviews .stacked")
reviews.html <- lapply(reviews, function(x){x$getElementAttribute("outerHTML")[[1]]})
reviews.list <- lapply(reviews.html, function(x){read_html(x) %>% html_text()} )
reviews.text <- unlist(reviews.list)
#Get the review ID's from all the links
reviewId = reviews.html %>% str_extract("/review/show/\\d+")
reviewId = reviewId[!is.na(reviewId)] %>% str_extract("\\d+")
#Some reviews have only rating and no text, so we process them separately
onlyRating = unlist(map(1:length(reviews.text), function(i) str_detect(reviews.text[i], "^\\\n\\\n")))
#Full reviews
if(sum(!onlyRating) > 0){
filterData = reviews.text[!onlyRating]
fullReviews = purrr::map_df(seq(1, length(filterData), by=2), function(i){
review = unlist(strsplit(filterData[i], "\n"))
data.frame(
date = mdy(review[2]), #date
username = str_trim(review[5]), #user
rating = str_trim(review[9]), #overall
comment = str_trim(review[12]) #comment
)
})
#Add review text to full reviews
fullReviews$review = unlist(purrr::map(seq(2, length(filterData), by=2), function(i){
str_trim(str_remove(filterData[i], "\\s*\\n\\s*\\(less\\)"))
}))
} else {
fullReviews = data.frame()
}
#partial reviews (only rating)
if(sum(onlyRating) > 0){
filterData = reviews.text[onlyRating]
partialReviews = purrr::map_df(1:length(filterData), function(i){
review = unlist(strsplit(filterData[i], "\n"))
data.frame(
date = mdy(review[9]), #date
username = str_trim(review[4]), #user
rating = str_trim(review[8]), #overall
comment = "",
review = ""
)
})
} else {
partialReviews = data.frame()
}
finalData = rbind(finalData, cbind(reviewId, rbind(fullReviews, partialReviews)))
#Go to next page if possible
nextPage = remDr$findElements("xpath", "//a[@class='next_page']")
if(length(nextPage) > 0){
message(paste("PAGE", pageNumber, "Processed - Going to next"))
nextPage[[1]]$clickElement()
pageNumber = pageNumber + 1
Sys.sleep(2)
} else {
message(paste("PAGE", pageNumber, "Processed - Last page"))
morePages = FALSE
}
}
#end of the main loop
#Replace missing ratings by 'not rated'
finalData$rating = ifelse(finalData$rating == "", "not rated", finalData$rating)
#Stop server
rD[["server"]]$stop()
#Write results
write.csv(finalData, paste0(bookTitle, ".csv"), row.names = F)
message("FINISHED!")
I've removed parts of the code to find out where the problem comes from, and it seems to be caused by this piece of code that extracts the review IDs:
#Get the review ID's from all the links
reviewId = reviews.html %>% str_extract("/review/show/\\d+")
reviewId = reviewId[!is.na(reviewId)] %>% str_extract("\\d+")
When I remove this piece of code and change finalData = rbind(finalData, cbind(reviewId, rbind(fullReviews, partialReviews))) to finalData = rbind(finalData, fullReviews, partialReviews), the script runs without problems and without any errors. However, I really need to extract these review IDs to properly anonymise my data, so leaving them out is not an option.
I've tried to replace that part of the code with the following, since it should also be able to extract the review ID (but please correct me if I'm wrong):
#Get the review ID's from all the links
reviewId = reviews.html %>% str_extract("review_\\d+")
reviewId = reviewId[!is.na(reviewId)] %>% str_extract("\\d+")
This did not solve the problem and caused the same error, though with some differences: 1. the error has completely different numbers: Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 30, and 2. the error now occurs for every single URL instead of only some, so I've actually managed to make it worse.
As I don't have much experience working with R (or scripts in general), this is about as far as my knowledge and problem-solving skills stretch. I'm especially confused because the error only occurs for some URLs and not others. If you want to try it, you can simply run the full script as it is (it contains a URL for a book that caused an error). There is no need to change anything for a test run, except perhaps your chromever.
Does anyone know what causes this error and how it might be solved? Concrete steps would be very appreciated. Thank you!
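A note on the error message itself: cbind() with a data frame dispatches to data.frame(), which requires all arguments to have the same number of rows (or to recycle evenly), so the 31 and 30 are simply the lengths of the two objects being bound together. A minimal reproduction (the numbers are illustrative, not taken from the Goodreads page):
reviewId <- as.character(1:31)         # 31 IDs extracted from the page HTML
reviews  <- data.frame(rating = 1:30)  # but only 30 parsed review rows
cbind(reviewId, reviews)
# Error in data.frame(..., check.names = FALSE) :
#   arguments imply differing number of rows: 31, 30
So the likely cause is that, on some pages, the number of /review/show/ links found in reviews.html does not match the number of rows that end up in rbind(fullReviews, partialReviews); the "0, 30" variant then suggests the review_\d+ pattern matched nothing at all on those pages.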

Web scrape request does not work for long date range input in R

The code below scrapes data by sending repeated requests based on an input date range (startDate and endDate), and the results are then saved to a csv file. I have used this code before with a different xPath in the html_node() argument and it worked fine. With this new xPath it seems unable to handle longer date ranges, and I also can't detect which data went missing because the code fails when the heading element is applied as attr(). I've tried increasing the range slowly, and it only works up to [1:29]; after that, no matter what range I use, it keeps returning the old result. In some cases I can see from str(new_df) that the requests completed, but the old result is still saved, as if bind_rows had failed. Sometimes (using dates from a different year) I was able to extract the desired range ([1:92]) by slowly increasing the range, which made me hopeful, but when I change the date input to another year it returns only the last record, sometimes with an error and sometimes without. I include the lengthy code here so anyone can reproduce it. I wonder whether the website is being burdened by the repeated requests or my PC is acting up. Kindly help.
get_sounding_data <- function(region = c("naconf", "samer", "pac", "nz", "ant",
"np", "europe", "africa", "seasia", "mideast"),
date,
from_hr = c("00", "12", "all"),
to_hr = c("00", "12", "all"),
station_number = 48615) {
# we use these pkgs (stringr, purrr, dplyr, tidyr and tibble are needed by
# the parsing code at the end of the function)
suppressPackageStartupMessages({
require("xml2", quietly = TRUE)
require("httr", quietly = TRUE)
require("rvest", quietly = TRUE)
require("stringr", quietly = TRUE)
require("purrr", quietly = TRUE)
require("dplyr", quietly = TRUE)
require("tidyr", quietly = TRUE)
require("tibble", quietly = TRUE)
})
# validate region
region <- match.arg(
arg = region,
choices = c(
"naconf", "samer", "pac", "nz", "ant",
"np", "europe", "africa", "seasia", "mideast"
)
)
# this actually validates the date for us if it's a character string
date <- as.Date(date)
# get year and month
year <- as.integer(format(date, "%Y"))
stopifnot(year %in% 1973:as.integer(format(Sys.Date(), "%Y")))
year <- as.character(year)
month <- format(date, "%m")
# we need these to translate day & *_hr to the param the app needs
c(
"0100", "0112", "0200", "0212", "0300", "0312", "0400", "0412",
"0500", "0512", "0600", "0612", "0700", "0712", "0800", "0812",
"0900", "0912", "1000", "1012", "1100", "1112", "1200", "1212",
"1300", "1312", "1400", "1412", "1500", "1512", "1600", "1612",
"1700", "1712", "1800", "1812", "1900", "1912", "2000", "2012",
"2100", "2112", "2200", "2212", "2300", "2312", "2400", "2412",
"2500", "2512", "2600", "2612", "2700", "2712", "2800", "2812",
"2900", "2912", "3000", "3012", "3100", "3112"
) -> hr_vals
c(
"01/00Z", "01/12Z", "02/00Z", "02/12Z", "03/00Z", "03/12Z", "04/00Z",
"04/12Z", "05/00Z", "05/12Z", "06/00Z", "06/12Z", "07/00Z", "07/12Z",
"08/00Z", "08/12Z", "09/00Z", "09/12Z", "10/00Z", "10/12Z", "11/00Z",
"11/12Z", "12/00Z", "12/12Z", "13/00Z", "13/12Z", "14/00Z", "14/12Z",
"15/00Z", "15/12Z", "16/00Z", "16/12Z", "17/00Z", "17/12Z", "18/00Z",
"18/12Z", "19/00Z", "19/12Z", "20/00Z", "20/12Z", "21/00Z", "21/12Z",
"22/00Z", "22/12Z", "23/00Z", "23/12Z", "24/00Z", "24/12Z", "25/00Z",
"25/12Z", "26/00Z", "26/12Z", "27/00Z", "27/12Z", "28/00Z", "28/12Z",
"29/00Z", "29/12Z", "30/00Z", "30/12Z", "31/00Z", "31/12Z"
) -> hr_inputs
hr_trans <- stats::setNames(hr_vals, hr_inputs)
o_from_hr <- from_hr <- as.character(tolower(from_hr))
o_to_hr <- to_hr <- as.character(tolower(to_hr))
if ((from_hr == "all") || (to_hr == "all")) {
from_hr <- to_hr <- "all"
} else {
from_hr <- hr_trans[sprintf("%s/%02dZ", format(date, "%d"), as.integer(from_hr))]
match.arg(from_hr, hr_vals)
to_hr <- hr_trans[sprintf("%s/%02dZ", format(date, "%d"), as.integer(to_hr))]
match.arg(to_hr, hr_vals)
}
# clean up the station number if it was entered as a double
station_number <- as.character(as.integer(station_number))
# execute the API call
httr::GET(
url = "http://weather.uwyo.edu/cgi-bin/sounding",
query = list(
region = region,
TYPE = "TEXT:LIST",
YEAR = year,
MONTH = sprintf("%02d", as.integer(month)),
FROM = from_hr,
TO = to_hr,
STNM = station_number
)
) -> res
# check for super bad errors (that we can't handle nicely)
httr::stop_for_status(res)
# get the page content
doc <- httr::content(res, as="text")
# if the site reports no data, issue a warning and return an empty data frame
if (grepl("Can't get", doc)) {
doc <- xml2::read_html(doc)
msg <- rvest::html_nodes(doc, "body")
msg <- rvest::html_text(msg, trim=TRUE)
msg <- gsub("\n\n+.*$", "", msg)
warning(msg)
return(data.frame(stringsAsFactors=FALSE))
}
# turn it into something we can parse
doc <- xml2::read_html(doc)
# get the metadata
#meta <- rvest::html_node(doc, "h2")
#meta <- rvest::html_text(meta, trim=TRUE)
#attr(doc, "meta") <- meta
raw_dat <- doc %>%
html_nodes("pre + h3") %>%
html_text()
indices <- doc %>%
str_split(pattern = "\n", simplify = T) %>%
map_chr(str_squish) %>%
tibble(x = .) %>%
separate(x, into = c("Station", "Value"), sep = ": ") %>%
filter(!is.na(Value))
data <- tidyr::spread(indices, Station, Value)
data
}
startDate <- as.Date("01-11-1984", format="%d-%m-%Y")
endDate <- as.Date("04-11-1984",format="%d-%m-%Y")
#startDate <- as.Date("01-11-1984", format="%d-%m-%Y")
#endDate <- as.Date("31-01-1985",format="%d-%m-%Y")
days <- seq(startDate, endDate, "day")
# wanted to use [1:92], but it's not working
lapply(days[1:4], function(day) {
get_sounding_data(
region = "seasia",
date = day,
from_hr = "00",
to_hr = "00",
station_number = "48615"
)
}) -> soundings_48615
warnings()
library(tidyverse) # loads dplyr, tidyr, purrr and readr (needed for map, mutate_all and parse_guess)
new_df <- map(soundings_48615, . %>% mutate_all(parse_guess))
#str(new_df)
dat <- bind_rows(new_df)
dat <- dat %>% separate(col =`Observation time`, into = c('Date', 'time'), sep = '/')
dat$Date <- as.Date(dat$Date, format = "%y%m%d")
#save results to a csv file (write.csv is base R; the xlsx package is not actually needed here)
library(xlsx)
write.csv(dat, 'c:/Users/Hp/Documents/1984.csv')
get_sounding_data <- NULL
Errors:
Error in bind_rows_(x, .id) :
Column `1000 hPa to 500 hPa thickness` can't be converted from numeric to character
dat <- dat %>% separate(col =`Observation time`, into = c('Date', 'time'), sep = '/')
Error in eval_tidy(enquo(var), var_env) :
object 'Observation time' not found
I've installed different R versions, but this error keeps coming up, so I ignore it:
Error: package or namespace load failed for ‘xlsx’:
.onLoad failed in loadNamespace() for 'rJava', details:
call: fun(libname, pkgname)
error: No CurrentVersion entry in Software/JavaSoft registry! Try re-
installing Java and make sure R and Java have matching architectures.
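On the bind_rows error specifically: bind_rows() refuses to combine a column that is numeric in one data frame and character in another, and since each day's data frame is parsed separately with parse_guess(), a day where the "1000 hPa to 500 hPa thickness" column happens to parse as a number will clash with a day where it does not. One way around this (a sketch, not the only fix) is to keep everything as character until after the bind, then guess the column types once on the combined data:
library(tidyverse) # dplyr, purrr, readr
# keep all columns as character so bind_rows() never sees conflicting types
new_df <- map(soundings_48615, ~ mutate_all(.x, as.character))
dat <- bind_rows(new_df)
# now guess the column types once, on the combined data
dat <- mutate_all(dat, parse_guess)
That would also explain why the old result keeps being saved: when bind_rows() errors, dat simply keeps whatever value it had from the previous run, and the later separate() call then fails with "object 'Observation time' not found" if that leftover dat has already had Observation time split into Date and time.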

For loop doesn't work properly

I'm trying to use the Indeed API to search for specific jobs, and I ran into a problem where the for loop doesn't go through all of the iterations.
Here is an example of the code I used:
original_url_1 <- "http://api.indeed.com/ads/apisearch?publisher=750330686195873&format=json&q="
original_url_2 <- "&l=Canada&sort=date&radius=10&st=&jt=&start=0&limit=25&fromage=3&filter=&latlong=1&co=ca&chnl=&userip=69.46.99.196&useragent=Mozilla/%2F4.0%28Firefox%29&v=2"
keywords <- c("data+scientist", "data+analyst")
data <- data.frame()
for (i in keywords) {
  url <- paste0(original_url_1, i, original_url_2)
  x <- as.data.frame(jsonlite::fromJSON(httr::content(httr::GET(url),
                                                      as = "text", encoding = "UTF-8")))
  data <- rbind(data, x)
}
The URL points to a JSON file, and adding one of the keywords to the URL changes the JSON returned. So I'm trying to repeat this for all keywords and store the results in a data frame. However, when I use more keywords I only get results for the first few of them.
original_url_1 <- "http://api.indeed.com/ads/apisearch?publisher=750330686195873&format=json&q="
original_url_2 <-"&l=Canada&sort=date&radius=10&st=&jt=&start=0&limit=25&fromage=3&filter=&latlong=1&co=ca&chnl=&userip=69.46.99.196&useragent=Mozilla/%2F4.0%28Firefox%29&v=2"
keywords <- c("data_scientist", "data+analyst")
data <- data.table(NULL) # initialization of object
for (i in keywords) {
  url <- paste0(original_url_1, i, original_url_2)
  x <- as.data.frame(jsonlite::fromJSON(httr::content(httr::GET(url), as = "text", encoding = "UTF-8")))
  data <- rbind(data, x)
}
> dim(data)
[1] 39 31
Here is the correct code:
original_url_1 <- "http://api.indeed.com/ads/apisearch?publisher=750330686195873&format=json&q="
original_url_2 <-"&l=Canada&sort=date&radius=10&st=&jt=&start=0&limit=25&fromage=3&filter=&latlong=1&co=ca&chnl=&userip=69.46.99.196&useragent=Mozilla/%2F4.0%28Firefox%29&v=2"
keywords <- c("data+scientist", "data+analyst")
data <- data.frame()
for (i in keywords) {
tryCatch({url <- paste0(original_url_1,i,original_url_2)
x <- as.data.frame(jsonlite::fromJSON(httr::content(httr::GET(url),
as = "text", encoding = "UTF-8")))
data <- rbind(data, x)
}, error = function(t){})
}
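One caveat with the corrected version: error = function(t){} silently skips any keyword whose request fails. If you also want to know which keywords were skipped so they can be retried, a small variation (a sketch, not the original answer's code) records them:
data <- data.frame()
failed <- character(0)
for (i in keywords) {
  tryCatch({
    url <- paste0(original_url_1, i, original_url_2)
    x <- as.data.frame(jsonlite::fromJSON(httr::content(httr::GET(url),
                                                        as = "text", encoding = "UTF-8")))
    data <- rbind(data, x)
  }, error = function(e) {
    # report and remember the failing keyword instead of discarding it silently
    message("Keyword '", i, "' failed: ", conditionMessage(e))
    failed <<- c(failed, i)
  })
}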

How to handle HTTP error 503 when making API calls to process JSON files in R with the jsonlite package?

I'm having problems using the jsonlite package in R to collect Dota 2 match data via the Steam API. I am not an experienced developer and would really appreciate any help. Thanks!
I have created a script in R. When I check the API call in a web browser it correctly returns the JSON contents, but when I execute the very same API call in R (either in a for loop or as a single call) using the fromJSON() function, I get the following errors:
Error in open.connection(con, "rb") : HTTP error 503.
In addition: Warning message:
closing unused connection 3 (https://api.steampowered.com/IDOTA2Match_570/GetMatchDetails/V001/?match_id=2170111273&key=XXXXXXXXXXPLACEHOLDERXXXXXXXXXXX)
This is the R script I created to collect multiple JSON responses with jsonlite's fromJSON():
# Load required libraries
library(rvest)
library(stringr)
library(magrittr)
library(plyr)
library(dplyr)
library(tidyr)
library(knitr)
library(XML)
library(data.table)
library(foreign)
library(pbapply)
library(jsonlite)
## Set base url components
base.url_0 = "https://api.steampowered.com/IDOTA2Match_570/GetMatchDetails/V001/?match_id="
base.url_0.1 = "&key="
steamAPIkey = "XXXXXXXXXXPLACEHOLDERXXXXXXXXXXX" # Steam API Key
### Create for loop where each "i" is a DOTA2 match ID
for(i in seq(1:length(targets$match_id))) {
base.url = paste0(
base.url_0,
targets$match_id[i],
base.url_0.1,
steamAPIkey)
message("Retrieving page ", targets$match_id[i])
## Get JSON response and store into data.frame
ifelse(
tmp_json <- fromJSON(
txt = base.url, flatten = T), # if the json file exists
as.data.frame(tmp_errors_1$matches) <- base.url # if the json file does not exist
) # close ifelse statement
tmp_json <- try_default(
expr =
as.data.frame(tmp_json), # convert the json file into a data frame
default =
as.data.frame(tmp_errors_2$matches) <- base.url, quiet = T) # if error, add the match id to a data frame
## Rbindlist
l = list(results, tmp_json)
results <- rbindlist(l,fill = T)
## Sleep for x seconds
Sys.sleep(runif(1, 2, 3))
## End of loop
}
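A 503 usually means the Steam API is refusing the request at that moment (rate limiting or temporary unavailability), and running fromJSON() directly on the URL gives you no chance to back off and retry. A sketch of an alternative, assuming the same base URL pieces and a data frame targets with a match_id column as in the script above: fetch each page with httr::RETRY(), which retries with increasing pauses on transient failures such as 503, and only parse the body once the request succeeds. The ifelse()/try_default() error handling is also replaced with tryCatch, since ifelse() is meant for vectorised tests rather than control flow.
library(httr)
library(jsonlite)
library(data.table)
results <- data.table()
failed_ids <- c()
for (i in seq_along(targets$match_id)) {
  url <- paste0(base.url_0, targets$match_id[i], base.url_0.1, steamAPIkey)
  message("Retrieving match ", targets$match_id[i])
  tmp_json <- tryCatch({
    # retry up to 5 times with exponential back-off on transient errors (e.g. 503)
    res <- RETRY("GET", url, times = 5, pause_base = 2, pause_cap = 60)
    stop_for_status(res)
    fromJSON(content(res, as = "text", encoding = "UTF-8"), flatten = TRUE)
  }, error = function(e) {
    message("Failed: ", conditionMessage(e))
    NULL
  })
  if (is.null(tmp_json)) {
    failed_ids <- c(failed_ids, targets$match_id[i])   # keep the match id for a later retry
  } else {
    results <- rbindlist(list(results, as.data.frame(tmp_json)), fill = TRUE)
  }
  Sys.sleep(runif(1, 2, 3)) # pause between calls, as in the original script
}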