How to pass multiple values as IDs to `rvest` - html

I want to extract a set of numbers from the website https://www.bcassessment.ca/ using PIDs (unique IDs). A list of sample PIDs is shown below:
PID <- c("012-215-023", "024-521-647", "025-891-669")
For these values, I opened the website manually, chose PID from the list of available search options, and searched for each number. The search redirected me to the following URLs:
URL <- c("https://www.bcassessment.ca//Property/Info/QTAwMDAwM1hIUA==",
"https://www.bcassessment.ca//Property/Info/QTAwMDAwNEJKMA==",
"https://www.bcassessment.ca//Property/Info/QTAwMDAwMUc5OA==")
Then, for each of these URLs, I ran the code below to extract the total assessed value of the property:
library(rvest)

out <- character(length(URL))
for (i in seq_along(URL)) {
  out[i] <- URL[i] %>%
    read_html() %>%
    html_nodes("span#lblTotalAssessedValue") %>%
    html_text()
  # no manual i <- i + 1 needed; the for loop advances i on its own
}
which gives me the final result
[1] "$543,000" "$957,000" "$487,000"
The problem is that I have a list of more than 50,000 PIDs, and I cannot manually search each of them on the website to find the actual link and then run rvest to scrape it. How do you recommend automating this process so that I only have to provide the PIDs and get the price as output?
Summary: for a list of known PIDs, I want to open https://www.bcassessment.ca/ and extract the most up-to-date price of each property, automatically.
Test_PID
I added a list of PIDs so you can check whether the code is working:
structure(list(P.I.D.. = c("004-050-541", "016-658-540", "016-657-861",
"016-657-764", "019-048-386", "025-528-360", "800-058-036", "025-728-954",
"028-445-783", "027-178-048", "028-445-571", "025-205-145", "015-752-798",
"026-041-308", "024-521-698", "027-541-631", "024-360-651", "028-445-040",
"025-851-411", "025-529-293", "024-138-436", "023-893-796", "018-496-768",
"025-758-721", "024-219-665", "024-359-866", "018-511-015", "026-724-979",
"023-894-253", "006-331-505", "025-961-012", "024-219-690", "027-309-878",
"028-445-716", "025-759-060", "017-692-733", "025-728-237", "028-447-221",
"023-894-202", "028-446-020", "026-827-611", "028-058-798", "017-574-412",
"023-893-591", "018-511-457", "025-960-199", "027-178-714", "027-674-941",
"027-874-826", "025-110-390", "028-071-336", "018-257-984", "023-923-393",
"026-367-203", "027-601-854", "003-773-922", "025-902-989", "018-060-641",
"025-530-003", "018-060-722", "025-960-423", "016-160-126", "009-301-461",
"025-960-580", "019-090-315", "023-464-283", "028-445-503", "006-395-708",
"028-446-674", "018-258-549", "023-247-398", "029-321-166", "024-519-871",
"023-154-161", "003-904-547", "004-640-357", "006-314-864", "025-960-521",
"013-326-783", "003-430-049", "027-490-084", "024-360-392", "028-054-474",
"026-076-179", "005-309-689", "024-613-509", "025-978-551", "012-215-066",
"024-034-002", "025-847-244", "024-222-038", "003-912-019", "024-845-264",
"006-186-254", "026-826-691", "026-826-712", "024-575-569", "028-572-581",
"026-197-774", "009-695-958", "016-089-120", "025-703-811", "024-576-671",
"026-460-751", "026-460-149", "003-794-181", "018-378-684", "023-916-745",
"003-497-721", "003-397-599", "024-982-211", "018-060-129", "018-061-231",
"017-765-714", "027-303-799", "028-565-312", "018-061-010", "006-338-232",
"023-680-024", "028-983-971", "028-092-490", "006-293-239", "018-061-257",
"028-092-376", "018-060-137", "004-302-664", "016-988-060", "003-371-166",
"027-325-342", "011-475-480", "018-060-200")), row.names = c(NA,
-131L), class = c("tbl_df", "tbl", "data.frame"))
P.S. The website I mentioned is public; anyone can open it and enter an address to find the estimated price of a property, so I don't think there is any problem with scraping it, as it is a public database.

When you submit the PID through the form, it triggers the following call:
GET https://www.bcassessment.ca/Property/Search/GetByPid/012215023?PID=012215023&_=1619713418473
The call above has the following parameters:
012215023 is the PID from your input with the dashes removed. It is used both as a path parameter and as a query parameter.
1619713418473 is the current Unix timestamp in milliseconds since 1970.
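For illustration, both parameters are easy to reproduce in R (the same idea is used in the function further below):
pid    <- "012-215-023"
pidNum <- gsub("-", "", pid)                                  # "012215023", used as path and query parameter
ts_ms  <- format(as.numeric(Sys.time()) * 1000, digits = 13)  # Unix timestamp in milliseconds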
The result of the call above is a json response like this:
{
"sEcho": 1,
"aaData": [
["XXXXXXX", "XXXXXXXX", "XXXXXXXXXXXX", "200-027-615-115-48-0004", "QTAwMDAwM1hIUA=="]
]
}
The call above returns the response with a text/plain content type rather than application/json, so we have to parse it with jsonlite. We then pick the last item of the aaData array, which in this case is QTAwMDAwM1hIUA==, and build the resulting URL like the one in your post.
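As a minimal, self-contained sketch of just that parsing step (the output string below is the sample response from above, with the masked fields shortened):
library(jsonlite)
output <- '{"sEcho": 1, "aaData": [["X", "X", "X", "200-027-615-115-48-0004", "QTAwMDAwM1hIUA=="]]}'
data <- fromJSON(output)                 # aaData becomes a 1 x 5 character matrix
id <- data$aaData[, ncol(data$aaData)]   # last column: "QTAwMDAwM1hIUA=="
paste0("https://www.bcassessment.ca/Property/Info/", id)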
The following code takes a list of PIDs and extracts the assessed dollar value for each one:
library(rvest)
library(httr)   # for GET() and content()

getValueForPID <- function(pid) {
  pidNum <- gsub("-", "", pid)                       # PID without dashes
  time <- as.numeric(as.POSIXct(Sys.time())) * 1000  # Unix timestamp in milliseconds
  output <- content(GET(
    paste0("https://www.bcassessment.ca/Property/Search/GetByPid/", pidNum),
    query = list(
      "PID" = pidNum,
      "_" = format(time, digits = 13)
    )
  ), "text", encoding = "UTF-8")
  if (output == "found_no_results") {
    return("")
  }
  data <- jsonlite::fromJSON(output)
  id <- data$aaData[5]                               # last element of aaData holds the property id
  text <- paste0("https://www.bcassessment.ca/Property/Info/", id) %>%
    read_html() %>%
    html_nodes("span#lblTotalAssessedValue") %>%
    html_text()
  return(text)
}
PID <- c("004-050-541", "016-658-540", "016-657-861", "016-657-764", "019-048-386", "025-528-360", "800-058-036")
out <- c()
count <- 1
for (i in PID) {
  print(i)
  out[count] <- getValueForPID(i)
  count <- count + 1
}
print(out)
Sample output:
[1] "$543,000" "$957,000" "$487,000"
kaggle link: https://www.kaggle.com/bertrandmartel/bcassesment-pid
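If it helps, here is one way to run the function above over your full PID list (a rough sketch: Test_PID stands for the tibble posted in the question, the half-second pause is just a polite throttle, and tryCatch keeps a single failing PID from stopping the whole run):
values <- sapply(Test_PID$P.I.D.., function(pid) {
  Sys.sleep(0.5)  # small delay between requests
  tryCatch(getValueForPID(pid), error = function(e) NA_character_)
})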

Related

(R) Webscraping Error : arguments imply differing number of rows: 1, 0

I am working with the R programming language.
In a previous question (R: Webscraping Pizza Shops - "read_html" not working?), I learned how to scrape the names and addresses of pizza stores from YellowPages (e.g. https://www.yellowpages.ca/search/si/2/pizza/Canada). Here is the code for scraping a single page:
library(tidyverse)
library(rvest)
scraper <- function(url) {
  page <- url %>%
    read_html()

  tibble(
    name = page %>%
      html_elements(".jsListingName") %>%
      html_text2(),
    address = page %>%
      html_elements(".listing__address--full") %>%
      html_text2()
  )
}
I then tried to make a LOOP that will repeat this for all 391 pages:
a = "https://www.yellowpages.ca/search/si/"
b = "/pizza/Canada"
list_results = list()
for (i in 1:391)
{
url_i = paste0(a,i,b)
s_i = data.frame(scraper(url_i))
ss_i = data.frame(i,s_i)
print(ss_i)
list_results[[i]] <- ss_i
}
final = do.call(rbind.data.frame, list_results)
My Problem: I noticed that after the 60th page, I get the following error:
Error in data.frame(i, s_i) :
arguments imply differing number of rows: 1, 0
In addition: Warning message:
In for (i in seq_along(specs)) { :
closing unused connection
To investigate, I went to the 60th page (https://www.yellowpages.ca/search/si/60/pizza/Canada) and noticed that you cannot click beyond this page.
My Question: Is there something that I can do differently to try and move past the 60th page, or is there some internal limitation within YellowPages that is preventing me from scraping further?
Thanks!
This is a limit on the Yellow Pages site that prevents you from continuing to the next page. A solution is to assign the return value of scraper and check its number of rows: if it is 0, break out of the for loop.
a = "https://www.yellowpages.ca/search/si/"
b = "/pizza/Canada"
list_results <- list()
for (i in 1:391) {
url_i = paste0(a,i,b)
s <- scraper(url_i, i)
message(paste("page number:", i, "\trows:", nrow(s)))
if(nrow(s) > 0L) {
s_i <- as.data.frame(s)
ss_i <- data.frame(i, s_i)
} else {
message("empty page, bailing out...")
break
}
list_results[[i]] <- ss_i
}
final <- do.call(rbind.data.frame, list_results)
dim(final)
# [1] 2100 3

Issue loading HTML Table into R

I want to load the table at the bottom of the following webpage into R, either as a dataframe or table: https://www.lawschooldata.org/school/Yale%20University/18. My first instinct was to use the readHTMLTable function in the XML package
library(XML)
url <- "https://www.lawschooldata.org/school/Yale%20University/18"
##warning message after next line
table <- readHTMLTable(url)
table
However, this returns an empty list and gives me the following warning:
Warning message:XML content does not seem to be XML: ''
I also tried adapting code I found here: Scraping html tables into R data frames using the XML package. This worked for 5 of the 6 tables on the page, but for the 6th table, which is the one I am interested in, it returned just the header row plus one row whose values repeat the header row. Code below:
library(XML)
library(RCurl)
library(rlist)
theurl <- getURL("https://www.lawschooldata.org/school/Yale%20University/18",.opts = list(ssl.verifypeer = FALSE) )
tables <- readHTMLTable(theurl)
##generates a list of the 6 tables on the page
tables <- list.clean(tables, fun = is.null, recursive = FALSE)
##takes the 6th table, which is the one I am interested in
applicanttable <- tables[[6]]
##the problem is that this 6th table returns just the header row and one row of values
##equal to those the header row
head(applicanttable)
Any insights would be greatly appreciated! For reference, I have also consulted the following posts that appear to have similar goals, but could not find a solution there:
Scraping html tables into R data frames using the XML package
Extracting html table from a website in R
The data is dynamically pulled from a nested JavaScript array, within a script tag when JavaScript runs in the browser. This doesn't happen when you use rvest to retrieve the non-rendered content (as seen in view-source).
You can regex out the appropriate nested array and then re-construct the table by splitting out the rows, adding the appropriate headers and performing some data manipulations on various columns, e.g. some columns contain HTML which needs to be parsed to obtain the desired value.
As some columns, e.g. Name, contain values which could be interpreted as file paths when passed to read_html, I use htmltidy to ensure they are handled as valid HTML.
N.B. If you use RSelenium then the page will render and you can just grab the table directly, without reconstructing it; a sketch of that route follows.
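A rough sketch of that RSelenium route (it assumes a working local Selenium/chromedriver setup; the #applicants-table id is the same one used in the selectors of the main code below):
library(RSelenium)
library(rvest)

rd <- rsDriver(browser = "chrome", verbose = FALSE)
remDr <- rd$client
remDr$navigate("https://www.lawschooldata.org/school/Yale%20University/18")
Sys.sleep(5)  # give the JavaScript time to build the table

applicants <- remDr$getPageSource()[[1]] %>%
  read_html() %>%
  html_node("#applicants-table") %>%
  html_table()

remDr$close()
rd$server$stop()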
TODO:
There are still some data type manipulations you could choose to apply to a few columns.
There is some more logic to be applied to ensure only the name is returned in the Name column. Take the case of df$Name[10]: this returns "Character and fitness issues" instead of Anxiousboy, because the required value actually sits in element.nextSibling.nextSibling of the p tag that gets selected. These infrequent edge cases need some additional logic built in. In this case, you might test for a particular string being returned and then fall back to re-parsing with an XPath expression, as sketched below.
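One possible shape for that fallback, built on top of the get_value function from the main code (the trigger string and the XPath are illustrative assumptions, not values verified against every row of the page; rvest and htmltidy are assumed to be loaded as in the code below):
get_value_safe <- function(input) {
  value <- tidy_html(input) %>%
    read_html() %>%
    html_node("a, p, span") %>%
    html_text(trim = T)
  # If we grabbed a note such as "Character and fitness issues" instead of the
  # name, retry with an XPath that takes the first non-empty text node after the p.
  if (!is.na(value) && grepl("Character and fitness", value, fixed = TRUE)) {
    value <- tidy_html(input) %>%
      read_html() %>%
      html_node(xpath = "//p/following-sibling::text()[normalize-space()][1]") %>%
      html_text(trim = T)
  }
  ifelse(is.na(value), input, value)
}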
R:
library(rvest)
#> Loading required package: xml2
#> Warning: package 'xml2' was built under R version 4.0.3
library(stringr)
library(htmltidy)
#> Warning: package 'htmltidy' was built under R version 4.0.3
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
get_value <- function(input) {
  value <- tidy_html(input) %>%
    read_html() %>%
    html_node("a, p, span") %>%
    html_text(trim = T)
  result <- ifelse(is.na(value), input, value)
  return(result)
}
tidy_result <- function(result) {
  return(gsub("<.*", "", result))
}

page <- read_html("https://www.lawschooldata.org/school/Yale%20University/18")
s <- page %>% toString()

headers <- page %>%
  html_nodes("#applicants-table th") %>%
  html_text(trim = T)

s <- stringr::str_extract(s, regex("DataTable\\(\\{\n\\s+data:(.*\\n\\]\\n\\])", dotall = T)) %>%
  gsub("\n", "", .)

rows <- stringr::str_extract_all(s, regex("(\\[.*?\\])", dotall = T))[[1]] %>% as.list()

df <- sapply(rows, function(x) {
  stringr::str_match_all(x, "'(.*?)'")[[1]][, 2]
}) %>%
  t() %>%
  as_tibble(.name_repair = "unique")
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> * `` -> ...3
#> * `` -> ...4
#> * `` -> ...5
#> * ...
names(df) <- headers
df <- df %>%
  rowwise() %>%
  mutate(across(c("Name", "GRE", "URM", "$$$$"), .f = get_value)) %>%
  mutate_at(c("Result"), tidy_result)
write.csv(df, "Yale Applications.csv")
Created on 2021-06-23 by the reprex package (v0.3.0)
Sample output:

web-scraping: web-scraped object doesn't match information on the website and crashes RStudio [closed]

I collected a series of URLs similar to this one. For each URL, I am using the rvest package to web-scrape information related to the address of every practitioner listed in each box of the webpage. By inspecting the HTML structure of the webpage, I noticed that the information I am trying to retrieve is inside the HTML division called unit size1of2 (which appears, when hovering with the cursor, as div.unit.size1of2). I then used the following code to extract the information I need:
library(rvest)
library(xml2)

webpage <- read_html(x = "myURL")

webpage_name <- webpage %>%
  html_nodes("div.unit.size1of2") %>%
  html_text(trim = T)
However, when I extract the information, the result I get is very messy. First of all, there is information I didn't want to scrape, some of which doesn't even seem to be present on the website. In addition, my RStudio IDE freezes for a while every time I try to print the result, and it then doesn't work properly with any command. Finally, the result is not the one I was looking for.
Do you think this is due to some kind of protection present on the website?
Thank you for your help!
You can start by iterating over the rows, which can be selected with div.search-result .line, and then:
get the name using div:first-child h3
get the ordinal using div:first-child p
get the locations by iterating over div:nth-child(2) p, since there can be multiple locations (one practitioner has 5 locations on your page), and store them in a list
It's necessary to remove the tabs and newlines using gsub("[\t\n]", "", x) for the name and ordinal. For the addresses, you can get the text, remove duplicate newlines, strip the first and last newline, and then split on \n to get a list like:
[1] "CABINET VÉTÉRINAIRE DV FEYS JEAN-MARC"
[2] "Cabinet Veterinaire"
[3] "ZA de Kercadiou"
[4] "XXXXX"
[5] "LANVOLLON"
[6] "Tél : 0X.XX.XX.XX.XX"
The following code also converts the list of vectors to a dataframe with all the data on that page:
library(rvest)
library(plyr)
url = "https://www.veterinaire.fr/annuaires/trouver-un-veterinaire-pour-soigner-mon-animal.html?tx_siteveterinaire_general%5B__referrer%5D%5B%40extension%5D=SiteVeterinaire&tx_siteveterinaire_general%5B__referrer%5D%5B%40vendor%5D=SiteVeterinaire&tx_siteveterinaire_general%5B__referrer%5D%5B%40controller%5D=FrontendUser&tx_siteveterinaire_general%5B__referrer%5D%5B%40action%5D=search&tx_siteveterinaire_general%5B__referrer%5D%5Barguments%5D=YToxOntzOjY6InNlYXJjaCI7YTo1OntzOjM6Im5vbSI7czowOiIiO3M6NjoicmVnaW9uIjtzOjA6IiI7czoxMToiZGVwYXJ0ZW1lbnQiO3M6MDoiIjtzOjU6InZpbGxlIjtzOjA6IiI7czoxMjoiaXRlbXNQZXJQYWdlIjtzOjI6IjEwIjt9fQ%3D%3D21a1899f9a133814dfc1eb4e01b3b47913bd9925&tx_siteveterinaire_general%5B__referrer%5D%5B%40request%5D=a%3A4%3A%7Bs%3A10%3A%22%40extension%22%3Bs%3A15%3A%22SiteVeterinaire%22%3Bs%3A11%3A%22%40controller%22%3Bs%3A12%3A%22FrontendUser%22%3Bs%3A7%3A%22%40action%22%3Bs%3A6%3A%22search%22%3Bs%3A7%3A%22%40vendor%22%3Bs%3A15%3A%22SiteVeterinaire%22%3B%7D7cd75ca141359a98763248c24da8103293a53d08&tx_siteveterinaire_general%5B__trustedProperties%5D=a%3A1%3A%7Bs%3A6%3A%22search%22%3Ba%3A5%3A%7Bs%3A3%3A%22nom%22%3Bi%3A1%3Bs%3A6%3A%22region%22%3Bi%3A1%3Bs%3A11%3A%22departement%22%3Bi%3A1%3Bs%3A5%3A%22ville%22%3Bi%3A1%3Bs%3A12%3A%22itemsPerPage%22%3Bi%3A1%3B%7D%7D86c9510d17c093c44d053714ab20567929a45f9d&tx_siteveterinaire_general%5Bsearch%5D%5Bnom%5D=&tx_siteveterinaire_general%5Bsearch%5D%5Bregion%5D=&tx_siteveterinaire_general%5Bsearch%5D%5Bdepartement%5D=&tx_siteveterinaire_general%5Bsearch%5D%5Bville%5D=&tx_siteveterinaire_general%5Bsearch%5D%5BitemsPerPage%5D=100&tx_siteveterinaire_general%5B%40widget_0%5D%5BcurrentPage%5D=127&cHash=8d8dc78e004b4b9d0ecfdf9b884f54ca"
rows <- read_html(url) %>%
  html_nodes("div.search-result .line")

strip <- function(x) gsub("[\t\n]", "", x)

i <- 1
data <- list()

for (r in rows) {
  addresses <- list()
  j <- 1
  locations <- r %>% html_nodes("div:nth-child(2) p")
  for (loc in locations) {
    addresses[[j]] <- loc %>% html_text() %>%
      gsub("[\t]", "", .) %>%                   # remove tabs
      gsub('([\n])\\1+', '\\1', .) %>%          # remove duplicate \n
      gsub('^\n|\n$', '', .) %>%                # remove first and last \n
      strsplit(., split = '\n', fixed = TRUE)   # split by \n
    j <- j + 1
  }
  data[[i]] <- c(
    name = r %>% html_nodes("div:first-child h3") %>% html_text() %>% strip(.),
    ordinal = r %>% html_nodes("div:first-child p") %>% html_text() %>% strip(.),
    addresses = addresses
  )
  i <- i + 1
}

df <- rbind.fill(lapply(data, function(y) as.data.frame(t(y), stringsAsFactors = FALSE)))

# show data
print(df)

for (i in 1:3) {
  print(paste("name", df[i, "name"]))
  print(paste("ordinal", df[i, "ordinal"]))
  print(paste("addresses", df[i, "addresses"]))
  print(paste("addresses1", df[i, "addresses1"]))
  print(paste("addresses2", df[i, "addresses2"]))
  print(paste("addresses3", df[i, "addresses3"]))
}

R - Issue with the DOM of the danish parliament (webscraping)

I've been working on a webscraping project for the political science department at my university.
The Danish parliament is very transparent about its democratic process and uploads all the legislative documents on its website. I've been crawling over all pages starting in 2008. Right now I'm parsing the information into a dataframe, and I'm having an issue that I have not been able to resolve so far.
If we look at the DOM, we can see that they named most of the objects div.tingdok-normal. The number of objects varies between 16 and 19. To parse the information correctly for my dataframe, I tried to grep out the necessary parts according to patterns. However, the issue is that sometimes my patterns match more than once, and I don't know how to tell R that I only want the first match.
For the sake of an example, I include some code:
library(RCurl)   # for getURL()
library(rvest)   # for read_html(), html_nodes(), html_text()

final.url <- "https://www.ft.dk/samling/20161/lovforslag/l154/index.htm"
to.save <- getURL(final.url)
p <- read_html(to.save)

normal <- p %>%
  html_nodes("div.tingdok-normal > span") %>%
  html_text(trim = TRUE)

tomatch <- c("Forkastet regeringsforslag", "Forkastet privat forslag", "Vedtaget regeringsforslag", "Vedtaget privat forslag")
type <- unique(grep(paste(tomatch, collapse = "|"), normal, value = TRUE))
Maybe you can help me with that.
My understanding is that you want to extract the text of the webpage, because the "tingdok-normal" objects are related to the text. I was able to get the text of the webpage with the following code. The code also identifies the position of the first regex hit for each of the patterns to match.
library(pagedown)
library(pdftools)
library(stringr)
pagedown::chrome_print("https://www.ft.dk/samling/20161/lovforslag/l154/index.htm",
                       "C:/.../danish.pdf")
text <- pdftools::pdf_text("C:/.../danish.pdf")

tomatch <- c("(A|a)ftalen", "(O|o)pholdskravet")
nb_Tomatch <- length(tomatch)

list_Position <- list()
list_Text <- list()

for (i in 1:nb_Tomatch) {
  # Locates the first hit of the regex
  # To locate all regex hits, use stringr::str_locate_all
  list_Position[[i]] <- stringr::str_locate(text, pattern = tomatch[i])
  list_Text[[i]] <- stringr::str_sub(string = text,
                                     start = list_Position[[i]][1, 1],
                                     end = list_Position[[i]][1, 2])
}
Here is another approach :
library(RDCOMClient)
library(stringr)
library(rvest)
url <- "https://www.ft.dk/samling/20161/lovforslag/l154/index.htm"
IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
IEApp$Navigate(url)
Sys.sleep(5)
doc <- IEApp$Document()
html_Content <- doc$documentElement()$innerText()
tomatch <- c("(A|a)ftalen", "(O|o)pholdskravet")
nb_Tomatch <- length(tomatch)
list_Position <- list()
list_Text <- list()
for (i in 1:nb_Tomatch) {
  # Locates the first hit of the regex
  # To locate all regex hits, use stringr::str_locate_all
  list_Position[[i]] <- stringr::str_locate(html_Content, pattern = tomatch[i])
  list_Text[[i]] <- stringr::str_sub(string = html_Content,
                                     start = list_Position[[i]][1, 1],
                                     end = list_Position[[i]][1, 2])
}

rbind fromJSON page: duplicate rowname error

I was trying to rbind some JSON data scraped from an API:
library(jsonlite)
pop_dat <- data.frame()
for (i in 1:3) {
  # Generate the url for each page
  url <- paste0('http://api.worldbank.org/v2/countries/all/indicators/SP.POP.TOTL?format=json&page=', i)
  # Get the json data from each page and transform it into a dataframe
  dat <- as.data.frame(fromJSON(url)[2], flatten = TRUE, row.names = NULL)
  pop_dat <- rbind(pop_dat, dat)
}
However, it returns the following error:
Error in row.names<-.data.frame(*tmp*, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1’, ‘10’, ‘11’, ‘12’, ‘13’, ‘14’, ‘15’, ‘16’, ‘17’, ‘18’, ‘19’, ‘2’, ‘20’, ‘21’, ‘22’, ‘23’, ‘24’, ‘25’, ‘26’, ‘27’, ‘28’, ‘29’, ‘3’, ‘30’, ‘31’, ‘32’, ‘33’, ‘34’, ‘35’, ‘36’, ‘37’, ‘38’, ‘39’, ‘4’, ‘40’, ‘41’, ‘42’, ‘43’, ‘44’, ‘45’, ‘46’, ‘47’, ‘48’, ‘49’, ‘5’, ‘50’, ‘6’, ‘7’, ‘8’, ‘9’
Changing row.names to NULL doesn't work. I heard from someone that it is due to the fact that some data are stored as lists here, which I don't quite understand.
I understand that there is an alternative package, WDI, to access this data, and it works well, but I want to know how to resolve the duplicate row-names problem here in general, so that I can deal with similar situations where no alternative package is available.
I heard from someone it is due to the fact that some data are stored as lists...
This is correct. The solution is fairly simple, but I find it really easy to get tripped up by this. Right now you're using:
dat <- as.data.frame(fromJSON(url)[2],flatten = TRUE, row.names = NULL)
The problem comes from fromJSON(url)[2]. This should be fromJSON(url)[[2]] instead. According to the documentation, the key difference between [ and [[ is that a single bracket can select multiple elements, whereas [[ selects only one.
You can see how this works with some fake data.
foo <- list(
  a = rnorm(100),
  b = rnorm(100),
  c = rnorm(100)
)
With [, you can select multiple values inside this list.
foo[c("a", "b")]
length(foo["a"]) # Result is 1 not 100 like you might expect.
With [[ the results are different.
foo[[c("a", "b")]] # Raises a subscript error.
foo[["a"]] #This works.
length(foo[["a"]]) # Result is 100.
So, your answer will depend on which subset operator you're using. For your problem, you'll want to use [[ to select a single data.frame inside of the list. Then, you should be able to use rbind correctly.
final <- data.frame()
for (i in 1:10) {
  url <- paste0(
    'http://api.worldbank.org/v2/countries/all/indicators/SP.POP.TOTL?format=json&page=',
    i
  )
  res <- jsonlite::fromJSON(url, flatten = TRUE)[[2]]
  final <- rbind(final, res)
}
Alternative solution with lapply:
urls <- sprintf(
  'http://api.worldbank.org/v2/countries/all/indicators/SP.POP.TOTL?format=json&page=%s',
  1:10
)
resl <- lapply(urls, jsonlite::fromJSON, flatten = TRUE)
resl <- lapply(resl, "[[", 2)  # Use lapply to select the 2nd element from each list element.
resl <- do.call(rbind, resl)   # This takes all the elements of the list and uses them as the arguments for rbind.