How to extract strings from rows which have .json like format?


I have imported a .json file using library(jsonlite) and stream_in(file(".json")).
However, one of the columns still contains raw JSON strings.
I'm not sure how to proceed in order to extract the ID and email fields from that JSON column.
My example:
date <- as.Date(c("2015-02-13", "2015-02-14", "2015-02-14"))
ID <- c(1,2,3)
name <- c("John","Michael","Thomas")
drinks <- c("Beer","Coffee","Tee")
consumed <- c(2,5,3)
john<- "{\"employeID\":\"1\",\"other_details\":{\"email\":\"john#gmx.com\"},\"computer\":\"yes\"}"
michael<- "{\"employeID\":\"2\",\"other_details\":{\"email\":\"michael#yahoo.com\"},\"computer\":\"yes\"}"
thomas<- "{\"employeID\":\"3\",\"other_details\":{\"email\":\"thomas#gmail.com\"},\"computer\":\"yes\"}"
json <- c(john,michael,thomas)
df <- data.frame(date,ID,name,drinks,consumed,json)
The data.frame has the columns date, ID, name, drinks, consumed and json, where json still holds the raw JSON string for each row.
I would like to get the following format:
#        date ID    name drinks consumed             email computer
#1 2015-02-13  1    John   Beer        2      john#gmx.com      yes
#2 2015-02-14  2 Michael Coffee        5 michael#yahoo.com      yes
#3 2015-02-14  3  Thomas    Tee        3  thomas#gmail.com      yes
What I tried first was to use library(jsonlite) again in different variations, but it always results in:
fromJSON(df$json[1])
Error: Argument 'txt' must be a JSON string, URL or file.
How can I extract these fields properly?

df$json is a factor vector (data.frame() converts strings to factors by default in R versions before 4.0.0), while fromJSON only accepts a JSON string, URL or file. You can try
fromJSON(as.character(df$json[1]))
or add stringsAsFactors = FALSE when you create df.
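As a minimal sketch of the two fixes just described (the one-row data frames df_factor and df_chr are made up for illustration; only the JSON string comes from the question), both routes should give the same parsed list:

```r
library(jsonlite)

# One JSON string in the same shape as the question's json column
raw <- "{\"employeID\":\"1\",\"other_details\":{\"email\":\"john#gmx.com\"},\"computer\":\"yes\"}"

# Reproduce the factor column explicitly, whatever the R version's default is
df_factor <- data.frame(ID = 1, json = raw, stringsAsFactors = TRUE)

# Fix 1: convert the factor to character before parsing
parsed <- fromJSON(as.character(df_factor$json[1]))

# Fix 2: keep the column as character when building the data frame
df_chr <- data.frame(ID = 1, json = raw, stringsAsFactors = FALSE)
parsed_chr <- fromJSON(df_chr$json[1])

# The nested field is reached through the other_details element
parsed$other_details$email
```

Either fix is enough on its own; the second one avoids repeating as.character() in every later call.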
To do your task, you can try:
library(tidyverse)
df %>%
  filter(json != "{}") %>% # Drop rows with json == "{}"
  rowwise() %>%
  # as.character() guards against json being a factor
  do(data.frame(ID = .$ID, jsonlite::fromJSON(as.character(.$json)), stringsAsFactors = FALSE)) %>%
  merge(df %>% select(-json), by = "ID", all.y = TRUE)
Output:
ID employeID email computer date name drinks consumed
1 1 1 john#gmx.com yes 2015-02-13 John Beer 2
2 2 2 michael#yahoo.com yes 2015-02-14 Michael Coffee 5
3 3 3 thomas#gmail.com yes 2015-02-14 Thomas Tee 3
It can also handle rows with "{}" in the json column:
df2 <- df %>%
  rbind(data.frame(date = as.Date("2015-02-14"), ID = 4, name = "Kitman",
                   drinks = "Chocolate", consumed = 1, json = "{}"))
df2 %>%
  filter(json != "{}") %>%
  rowwise() %>%
  do(data.frame(ID = .$ID, jsonlite::fromJSON(as.character(.$json)), stringsAsFactors = FALSE)) %>%
  merge(df2 %>% select(-json), by = "ID", all.y = TRUE)
Output:
ID employeID email computer date name drinks consumed
1 1 1 john#gmx.com yes 2015-02-13 John Beer 2
2 2 2 michael#yahoo.com yes 2015-02-14 Michael Coffee 5
3 3 3 thomas#gmail.com yes 2015-02-14 Thomas Tee 3
4 4 <NA> <NA> <NA> 2015-02-14 Kitman Chocolate 1
Outdated:
cbind(
  df %>% select(-json),
  df$json %>%
    map(~as.data.frame(jsonlite::fromJSON(as.character(.)))) %>%
    do.call("rbind", .)
)
Output:
date ID name drinks consumed employeID email computer
1 2015-02-13 1 John Beer 2 1 john#gmx.com yes
2 2015-02-14 2 Michael Coffee 5 2 michael#yahoo.com yes
3 2015-02-14 3 Thomas Tee 3 3 thomas#gmail.com yes

First, try:
ndjson::stream_in("filename.json")
The ndjson package is faster than jsonlite and was built for flattening (it's very task-specific and not as swiss-army-knife-ish as the highly useful jsonlite pkg).
Or, we can keep the tidyverse idioms all the way through:
library(tidyverse)
map_df(df$json, ~jsonlite::fromJSON(as.character(.))) %>%
  bind_cols(select(df, -json)) %>%
  mutate_if(is.factor, as.character) %>%
  mutate_if(is.list, as.character) %>%
  select(ID, name, drinks, consumed, everything())
## # A tibble: 3 × 8
## ID name drinks consumed computer employeID other_details.email date
## <dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <date>
## 1 1 John Beer 2 yes 1 john#gmx.com 2015-02-13
## 2 2 Michael Coffee 5 yes 2 michael#yahoo.com 2015-02-14
## 3 3 Thomas Tee 3 yes 3 thomas#gmail.com 2015-02-14
And, you get your character columns.
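If only a couple of fields are needed, a dependency-light alternative is to parse each string once and pull the fields out by name. This is a rough sketch, not the approach of the answers above: the two-row data frame and the helper pick() are invented for illustration, and the second row uses "{}" to show how a missing field turns into NA.

```r
library(jsonlite)

# Two rows shaped like the question's data; the second has an empty JSON object
df <- data.frame(
  ID = c(1, 2),
  name = c("John", "Kitman"),
  json = c(
    "{\"employeID\":\"1\",\"other_details\":{\"email\":\"john#gmx.com\"},\"computer\":\"yes\"}",
    "{}"
  ),
  stringsAsFactors = FALSE
)

# Parse every JSON string once; as.character() also covers factor columns
parsed <- lapply(as.character(df$json), fromJSON)

# Hypothetical helper: return NA when a field is absent
pick <- function(x) if (is.null(x)) NA_character_ else as.character(x)

# Extract the two wanted fields and drop the raw JSON column
df$email    <- vapply(parsed, function(p) pick(p$other_details$email), character(1))
df$computer <- vapply(parsed, function(p) pick(p$computer), character(1))
df$json     <- NULL
```

Because vapply() is told to expect character(1), a row whose field is unexpectedly a vector or a nested object fails loudly instead of silently producing a list column.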

Related

Scraping Website with Unchanging URL in R

I would like to scrape a series of tables from a website whose URL does not change when I click through the tables in my browser. Each table corresponds to a unique date. The default table is that which corresponds to today's date. I can scroll through past dates in my browser, but can't seem to find a way to do so in R.
Using library(rvest) this bit of code will reliably download the table that corresponds to today's date (I'm only interested in the first of the three tables).
webad <- "https://official.nba.com/referee-assignments/"
off <- webad %>%
  read_html() %>%
  html_table()
off <- off[[1]]
How can I download the table that corresponds to, say "2022-10-04", to "2022-10-06", or to yesterday?
I've tried to work through it by identifying the node under which the table lies, in the hopes that I could manipulate it to reflect a prior date. However, the following reproduces the same table as above:
webad <- "https://official.nba.com/referee-assignments/"
off <- webad %>%
  read_html() %>%
  html_nodes("#main > div > section:nth-child(1) > article > div > div.dayContent > div > table") %>%
  html_table()
off <- off[[1]]
Scrolling through past dates in my browser, I've identified various places in the html that reference the prior date; but I can't seem to change it from R, let alone get the table I download to reflect a change:
webad %>%
  read_html() %>%
  html_nodes("#main > div > section:nth-child(1) > article > header > div")
I've messed around some with html_form(), follow_link(), and set_values() also, but to no avail.
Is there a good way to navigate this particular URL in R?

You can consider the following approach:
library(RSelenium)
library(rvest)
port <- as.integer(4444L + rpois(lambda = 1000, 1))
rd <- rsDriver(chromever = "105.0.5195.52", browser = "chrome", port = port)
remDr <- rd$client
remDr$open()
url <- "https://official.nba.com/referee-assignments/"
remDr$navigate(url)
web_Obj_Date <- remDr$findElement("css selector", "#ref-filters-menu > li > div > button")
web_Obj_Date$clickElement()
web_Obj_Date_Input <- remDr$findElement("id", 'ref-date')
web_Obj_Date_Input$clearElement()
web_Obj_Date_Input$sendKeysToElement(list("2022-10-05"))
web_Obj_Date_Input$doubleclick()
web_Obj_Date <- remDr$findElement("css selector", "#ref-filters-menu > li > div > button")
web_Obj_Date$clickElement()
web_Obj_Go_Button <- remDr$findElement("css selector", "#date-filter")
web_Obj_Go_Button$submitElement()
html_Content <- remDr$getPageSource()[[1]]
read_html(html_Content) %>% html_table()
[[1]]
# A tibble: 5 x 5
Game `Official 1` `Official 2` `Official 3` Alternate
<chr> <chr> <chr> <chr> <lgl>
1 Indiana # Charlotte John Goble (#10) Lauren Holtkamp (#7) Phenizee Ransom (#70) NA
2 Cleveland # Philadelphia Marc Davis (#8) Jacyn Goble (#68) Tyler Mirkovich (#97) NA
3 Toronto # Boston Josh Tiven (#58) Matt Boland (#18) Intae hwang (#96) NA
4 Dallas # Oklahoma City Courtney Kirkland (#61) Mitchell Ervin (#27) Cheryl Flores (#91) NA
5 Phoenix # L.A. Lakers Bill Kennedy (#55) Rodney Mott (#71) Jenna Reneau (#93) NA
[[2]]
# A tibble: 0 x 5
# ... with 5 variables: Game <lgl>, Official 1 <lgl>, Official 2 <lgl>, Official 3 <lgl>, Alternate <lgl>
# i Use `colnames()` to see all variable names
[[3]]
# A tibble: 0 x 5
# ... with 5 variables: Game <lgl>, Official 1 <lgl>, Official 2 <lgl>, Official 3 <lgl>, Alternate <lgl>
# i Use `colnames()` to see all variable names
[[4]]
# A tibble: 6 x 7
S M T W T F S
<int> <int> <int> <int> <int> <int> <int>
1 NA NA NA NA NA NA 1
2 2 3 4 5 6 7 8
3 9 10 11 12 13 14 15
4 16 17 18 19 20 21 22
5 23 24 25 26 27 28 29
6 30 31 NA NA NA NA NA
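The code above hard-codes "2022-10-05". For the range and the "yesterday" case asked about in the question, the date strings can be generated in base R and then typed into the date field one at a time. A small sketch, assuming the field accepts the same YYYY-MM-DD format used above:

```r
# Every day from 2022-10-04 to 2022-10-06, formatted like the string typed above
dates <- seq(as.Date("2022-10-04"), as.Date("2022-10-06"), by = "day")
date_strings <- format(dates, "%Y-%m-%d")

# Yesterday, relative to the day the script runs
yesterday <- format(Sys.Date() - 1, "%Y-%m-%d")

# Each element of date_strings could then replace the literal date in
# sendKeysToElement(list(...)) inside a loop over the steps shown above.
date_strings
```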

Here is another approach that can be considered:
library(RDCOMClient)
library(rvest)
url <- "https://official.nba.com/referee-assignments/"
IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
IEApp$Navigate(url)
Sys.sleep(5)
doc <- IEApp$Document()
clickEvent <- doc$createEvent("MouseEvent")
clickEvent$initEvent("click", TRUE, FALSE)
web_Obj_Date <- doc$querySelector("#ref-filters-menu > li > div > button")
web_Obj_Date$dispatchEvent(clickEvent)
web_Obj_Date_Input <- doc$GetElementById('ref-date')
web_Obj_Date_Input[["Value"]] <- "2022-10-05"
web_Obj_Go_Button <- doc$querySelector("#date-filter")
web_Obj_Go_Button$dispatchEvent(clickEvent)
html_Content <- doc$Body()$innerHTML()
read_html(html_Content) %>% html_table()
[[1]]
# A tibble: 5 x 5
Game `Official 1` `Official 2` `Official 3` Alternate
<chr> <chr> <chr> <chr> <lgl>
1 Indiana # Charlotte John Goble (#10) Lauren Holtkamp (#7) Phenizee Ransom (#70) NA
2 Cleveland # Philadelphia Marc Davis (#8) Jacyn Goble (#68) Tyler Mirkovich (#97) NA
3 Toronto # Boston Josh Tiven (#58) Matt Boland (#18) Intae hwang (#96) NA
4 Dallas # Oklahoma City Courtney Kirkland (#61) Mitchell Ervin (#27) Cheryl Flores (#91) NA
5 Phoenix # L.A. Lakers Bill Kennedy (#55) Rodney Mott (#71) Jenna Reneau (#93) NA
[[2]]
# A tibble: 8 x 7
Game `Official 1` `Official 2` `Official 3` Alternate `` ``
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 "Game" "Official 1" "Official 2" "Official 3" "Alternate" NA NA
2 "S" "M" "T" "W" "T" "F" "S"
3 "" "" "" "" "" "" "1"
4 "2" "3" "4" "5" "6" "7" "8"
5 "9" "10" "11" "12" "13" "14" "15"
6 "16" "17" "18" "19" "20" "21" "22"
7 "23" "24" "25" "26" "27" "28" "29"
8 "30" "31" "" "" "" "" ""
[[3]]
# A tibble: 7 x 7
Game `Official 1` `Official 2` `Official 3` Alternate `` ``
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 "S" "M" "T" "W" "T" "F" "S"
2 "" "" "" "" "" "" "1"
3 "2" "3" "4" "5" "6" "7" "8"
4 "9" "10" "11" "12" "13" "14" "15"
5 "16" "17" "18" "19" "20" "21" "22"
6 "23" "24" "25" "26" "27" "28" "29"
7 "30" "31" "" "" "" "" ""
[[4]]
# A tibble: 6 x 7
S M T W T F S
<int> <int> <int> <int> <int> <int> <int>
1 NA NA NA NA NA NA 1
2 2 3 4 5 6 7 8
3 9 10 11 12 13 14 15
4 16 17 18 19 20 21 22
5 23 24 25 26 27 28 29
6 30 31 NA NA NA NA NA

If you install Docker (see https://docs.docker.com/engine/install/), you can consider the following approach with Firefox:
library(RSelenium)
library(rvest)
shell('docker run -d -p 4445:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
url <- "https://official.nba.com/referee-assignments/"
remDr$navigate(url)
web_Obj_Date <- remDr$findElement("css selector", "#ref-filters-menu > li > div > button")
web_Obj_Date$clickElement()
web_Obj_Date_Input <- remDr$findElement("id", 'ref-date')
web_Obj_Date_Input$clearElement()
web_Obj_Date_Input$sendKeysToElement(list("2022-10-05"))
web_Obj_Date_Input$doubleclick()
web_Obj_Date <- remDr$findElement("css selector", "#ref-filters-menu > li > div > button")
web_Obj_Date$clickElement()
web_Obj_Go_Button <- remDr$findElement("css selector", "#date-filter")
web_Obj_Go_Button$submitElement()
html_Content <- remDr$getPageSource()[[1]]
read_html(html_Content) %>% html_table()
[[1]]
# A tibble: 5 x 5
Game `Official 1` `Official 2` `Official 3` Alternate
<chr> <chr> <chr> <chr> <lgl>
1 Indiana # Charlotte John Goble (#10) Lauren Holtkamp (#7) Phenizee Ransom (#70) NA
2 Cleveland # Philadelphia Marc Davis (#8) Jacyn Goble (#68) Tyler Mirkovich (#97) NA
3 Toronto # Boston Josh Tiven (#58) Matt Boland (#18) Intae hwang (#96) NA
4 Dallas # Oklahoma City Courtney Kirkland (#61) Mitchell Ervin (#27) Cheryl Flores (#91) NA
5 Phoenix # L.A. Lakers Bill Kennedy (#55) Rodney Mott (#71) Jenna Reneau (#93) NA
[[2]]
# A tibble: 0 x 5
# ... with 5 variables: Game <lgl>, Official 1 <lgl>, Official 2 <lgl>, Official 3 <lgl>, Alternate <lgl>
# i Use `colnames()` to see all variable names
[[3]]
# A tibble: 0 x 5
# ... with 5 variables: Game <lgl>, Official 1 <lgl>, Official 2 <lgl>, Official 3 <lgl>, Alternate <lgl>
# i Use `colnames()` to see all variable names
[[4]]
# A tibble: 6 x 7
S M T W T F S
<int> <int> <int> <int> <int> <int> <int>
1 NA NA NA NA NA NA 1
2 2 3 4 5 6 7 8
3 9 10 11 12 13 14 15
4 16 17 18 19 20 21 22
5 23 24 25 26 27 28 29
6 30 31 NA NA NA NA NA
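All of the outputs above contain every table on the page, including the calendar widget. Since the question only needs the assignments table, one option is to pick it by its column names rather than by position. The sketch below runs on a small hand-written HTML string (a stand-in for html_Content, not the real page source) so that it works offline:

```r
library(rvest)

# Stand-in for the page source: one assignments table and one calendar table
html_Content <- '<html><body>
<table><tr><th>Game</th><th>Official 1</th></tr>
<tr><td>Indiana # Charlotte</td><td>John Goble (#10)</td></tr></table>
<table><tr><th>S</th><th>M</th></tr><tr><td>1</td><td>2</td></tr></table>
</body></html>'

# html_table() on a whole document returns one table per <table> element
tables <- html_table(read_html(html_Content))

# Keep only non-empty tables that have a "Game" column, then take the first
is_game_table <- vapply(
  tables,
  function(tb) "Game" %in% names(tb) && nrow(tb) > 0,
  logical(1)
)
games <- tables[is_game_table][[1]]
```

Selecting by column name keeps working if the site adds or reorders widgets, whereas tables[[1]] depends on the table order staying the same.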

R: how to toggle html page selection in web scraping

library(XML)
library(RCurl)
library(rlist)
theurl <- getURL("http://legacy.baseballprospectus.com/sortable/index.php?cid=2022181",.opts = list(ssl.verifypeer = FALSE) )
tables <- readHTMLTable(theurl)
I'm trying to scrape the 2016 table data from the above webpage. If I change the Year to 2010, the url changes to http://legacy.baseballprospectus.com/sortable/index.php?cid=1966487.
I want to automate my algorithm so that it can obtain the table for different years, but I'm not sure how I can obtain the unique identifiers (e.g. 1966487) for each page automatically. Is there a way to find the list of these?
I've tried looking at the html source code, but no luck.

With rvest, you can set the value in the form and submit it. Wrapping this in purrr::map_dfr iterates over the years and row-binds the results into a data frame:
library(rvest)
# Note: in rvest >= 1.0.0 these functions are named session(),
# html_form_set() and session_submit()
sess <- html_session("http://legacy.baseballprospectus.com/sortable/index.php?cid=2022181")
baseball <- purrr::map_dfr(
  2017:2015,
  function(y){
    Sys.sleep(10 + runif(1)) # be polite
    form <- sess %>%
      html_node(xpath = '//form[@action="index.php"]') %>%
      html_form() %>%
      set_values(year = y)
    sess <- submit_form(sess, form)
    sess %>%
      read_html() %>%
      html_node('#TTdata') %>%
      html_table(header = TRUE)
  }
)
tibble::as_tibble(baseball) # for printing
#> # A tibble: 4,036 x 38
#> `#` NAME TEAM LG YEAR AGE G PA AB R
#> <dbl> <chr> <chr> <chr> <int> <int> <int> <int> <int> <int>
#> 1 1 Giancarlo Stanton MIA NL 2017 27 159 692 597 123
#> 2 2 Joey Votto CIN NL 2017 33 162 707 559 106
#> 3 3 Charlie Blackmon COL NL 2017 30 159 725 644 137
#> 4 4 Aaron Judge NYA AL 2017 25 155 678 542 128
#> 5 5 Nolan Arenado COL NL 2017 26 159 680 606 100
#> 6 6 Kris Bryant CHN NL 2017 25 151 665 549 111
#> 7 7 Mike Trout ANA AL 2017 25 114 507 402 92
#> 8 8 Jose Altuve HOU AL 2017 27 153 662 590 112
#> 9 9 Paul Goldschmidt ARI NL 2017 29 155 665 558 117
#> 10 10 Jose Ramirez CLE AL 2017 24 152 645 585 107
#> # ... with 4,026 more rows, and 28 more variables: H <int>, `1B` <int>,
#> # `2B` <int>, `3B` <int>, HR <int>, TB <int>, BB <int>, IBB <int>,
#> # SO <int>, HBP <int>, SF <int>, SH <int>, RBI <int>, DP <int>,
#> # NETDP <dbl>, SB <int>, CS <int>, AVG <dbl>, OBP <dbl>, SLG <dbl>,
#> # OPS <dbl>, ISO <dbl>, BPF <int>, oppOPS <dbl>, TAv <dbl>, VORP <dbl>,
#> # FRAA <dbl>, BWARP <dbl>
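To see what purrr::map_dfr contributes here without hitting the website, the scraping step can be swapped for a stand-in. In this sketch fetch_year() is a hypothetical placeholder that fabricates two rows per year; in the answer above its role is played by the form submission and html_table() call.

```r
library(purrr)

# Hypothetical stand-in for the scraping function: one small data frame per year
fetch_year <- function(y) {
  data.frame(
    YEAR = y,
    NAME = paste("player", 1:2),
    stringsAsFactors = FALSE
  )
}

# map_dfr() calls fetch_year() once per year and row-binds the results in order
all_years <- map_dfr(2017:2015, fetch_year)
```

The result is a single data frame with the years stacked in the order they were requested, which is why the real output above starts with the 2017 rows.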

Json list not 'flattening' properly

I have the list below in a column of a data frame.
As you can see, the variables differ between items: the column affiliations is not always present.
I have been trying to flatten the list to a data frame or to a list of 3, but I am getting a single column with all elements of every column.
Is there a way I can tell R that each element has 3 columns, that the first one is not always present, and to fill it with, let's say, NULL?
[[1]]
NULL
[[2]]
affiliations author_id author_name
1 Punjabi University 780E3459 munish puri
2 Punjabi University 48D92C79 rajesh dhaliwal
3 Punjabi University 7D9BD37C r s singh
[[3]]
author_id author_name
1 7FF872BC barbara eileen ryan
[[4]]
author_id author_name
1 0299B8E9 fraser j harbutt
[[5]]
author_id author_name
1 7DAB7B72 richard m freeland
[[6]]
NULL
This is what I'm getting when I try and flatten it.
authors
1 Punjabi University
2 Punjabi University
3 Punjabi University
4 780E3459
5 48D92C79
6 7D9BD37C
7 munish puri
8 rajesh dhaliwal
9 r s singh
10 7FF872BC
But what I really need would be:
[[1]] NULL
[[2]]affiliations author_id author_name
1 Punjabi University 780E3459 munish puri
2 Punjabi University 48D92C79 rajesh dhaliwal
3 Punjabi University 7D9BD37C r s singh
[[3]] NULL author_id author_name
1 NULL 7FF872BC barbara eileen ryan

If I understand you correctly, you have data as follows:
require(tidyverse)
list(
  NULL,
  tibble(a=c(2, 2), b=c(2, 2), c=c(2, 2)),
  tibble(b=3, c=3)
)
So:
[[1]]
NULL
[[2]]
# A tibble: 2 x 3
a b c
<dbl> <dbl> <dbl>
1 2 2 2
2 2 2 2
[[3]]
# A tibble: 1 x 2
b c
<dbl> <dbl>
1 3 3
Using bind_rows results in:
bind_rows(list(
  NULL,
  tibble(a=c(2, 2), b=c(2, 2), c=c(2, 2)),
  tibble(b=3, c=3)
))
# A tibble: 3 x 3
a b c
<dbl> <dbl> <dbl>
1 2 2 2
2 2 2 2
3 NA 3 3
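If the list structure itself should be kept, as in the desired output of the question, each element can instead be padded to the same three columns. The following base R sketch rebuilds a shortened version of the question's list by hand; the helper fill_cols() is hypothetical, and NA is used as the filler because a data frame cell cannot hold NULL.

```r
# Shortened stand-in for the question's list column
authors <- list(
  NULL,
  data.frame(
    affiliations = c("Punjabi University", "Punjabi University"),
    author_id = c("780E3459", "48D92C79"),
    author_name = c("munish puri", "rajesh dhaliwal"),
    stringsAsFactors = FALSE
  ),
  data.frame(
    author_id = "7FF872BC",
    author_name = "barbara eileen ryan",
    stringsAsFactors = FALSE
  )
)

cols <- c("affiliations", "author_id", "author_name")

# Hypothetical helper: add any missing column as NA and fix the column order
fill_cols <- function(d) {
  if (is.null(d)) return(NULL)
  for (missing_col in setdiff(cols, names(d))) d[[missing_col]] <- NA_character_
  d[cols]
}

# Same list shape as before, but every data frame now has the same 3 columns
padded <- lapply(authors, fill_cols)

# Optional: stack everything; rbind() ignores the NULL entries
combined <- do.call(rbind, padded)
```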

Convert JSON to data.frame with more than 2 columns

I am trying to properly convert a JSON to a data.frame with 3 columns.
This is a simplification of my data
# simplification of my real data
my_data <- '{"Bag 1": [["bananas", 1], ["oranges", 2]],"Bag 2": [["bananas", 3], ["oranges", 4], ["apples", 5]]}'
library(jsonlite)
my_data <- fromJSON(my_data)
> my_data
$`Bag 1`
[,1] [,2]
[1,] "bananas" "1"
[2,] "oranges" "2"
$`Bag 2`
[,1] [,2]
[1,] "bananas" "3"
[2,] "oranges" "4"
[3,] "apples" "5"
I try to convert that to a data.frame
# this return an error about "arguments imply differing number of rows: 2, 3"
my_data <- as.data.frame(my_data)
> my_data <- as.data.frame(my_data)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 2, 3
This is my solution to create the data.frame
# my solution
my_data <- data.frame(fruit = do.call(c, my_data),
bag_number = rep(1:length(my_data),
sapply(my_data, length)))
# how it looks
my_data
> my_data
fruit bag_number
Bag 11 bananas 1
Bag 12 oranges 1
Bag 13 1 1
Bag 14 2 1
Bag 21 bananas 2
Bag 22 oranges 2
Bag 23 apples 2
Bag 24 3 2
Bag 25 4 2
Bag 26 5 2
But my idea is to obtain something like this to avoid problems like doing my_data[a:b,1] when I want to use ggplot2 and others.
fruit | quantity | bag_number
oranges | 2 | 1
bananas | 1 | 1
oranges | 4 | 2
bananas | 3 | 2
apples | 5 | 2

library(plyr)
# import data (note that the rjson package does this differently than the jsonlite package)
data.import <- jsonlite::fromJSON(my_data)
# combine all data using plyr
df <- ldply(data.import, rbind)
# clean up column names
colnames(df) <- c('bag_number', 'fruit', 'quantity')
bag_number fruit quantity
1 Bag 1 bananas 1
2 Bag 1 oranges 2
3 Bag 2 bananas 3
4 Bag 2 oranges 4
5 Bag 2 apples 5

purrr / tidyverse version. With this you also get proper column types and get rid of the "Bag " prefix:
library(jsonlite)
library(purrr)
library(readr)
fromJSON(my_data, flatten=TRUE) %>%
  map_df(~as.data.frame(., stringsAsFactors=FALSE), .id="bag") %>%
  type_convert() %>%
  setNames(c("bag_number", "fruit", "quantity")) -> df
df$bag_number <- gsub("Bag ", "", df$bag_number)
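For comparison, the same reshaping can be sketched with jsonlite and base R only. This assumes, as the question's printed output shows, that fromJSON() simplifies each bag into a two-column character matrix; the names bags, rows and fruit_df are chosen for this example.

```r
library(jsonlite)

json_txt <- '{"Bag 1": [["bananas", 1], ["oranges", 2]],"Bag 2": [["bananas", 3], ["oranges", 4], ["apples", 5]]}'
bags <- fromJSON(json_txt)

# Build one data frame per bag: column 1 is the fruit, column 2 the quantity
rows <- lapply(names(bags), function(b) {
  m <- bags[[b]]
  data.frame(
    fruit = m[, 1],
    quantity = as.numeric(m[, 2]),
    bag_number = as.integer(sub("Bag ", "", b, fixed = TRUE)),
    stringsAsFactors = FALSE
  )
})

# Stack the per-bag data frames into the long format asked for
fruit_df <- do.call(rbind, rows)
```

Converting quantity with as.numeric() at this point means the column can be used directly on a ggplot2 axis without further cleaning.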

Rvest R not getting inner table

I'm trying to retrieve the Medals Table inside Wikipedia for Olympics 2012.
library(rvest)
library(magrittr)
url <- "https://en.wikipedia.org/wiki/United_States_at_the_2012_Summer_Olympics"
xpath0 <- '//*[@id="mw-content-text"]/table[1]'
xpath1 <- '//*[@id="mw-content-text"]/table[2]'
xpath2 <- '//*[@id="mw-content-text"]/table[2]/tbody/tr/td[1]'
xpath3 <- '//*[@id="mw-content-text"]/table[2]/tbody/tr/td[1]/table'
tb <- url %>%
html() %>%
html_nodes(xpath=xpath0) %>%
html_nodes("") %>%
html_table()
xpath0 or xpath1 return an error
Error in parse_simple_selector(stream) :
Expected selector, got <EOF at 1>
xpath2 and xpath3 return empty lists.
At same time I tried to use Selectorgadget (https://cran.r-project.org/web/packages/rvest/vignettes/selectorgadget.html) to point to the exact element. I got
//td[(((count(preceding-sibling::*) + 1) = 1) and parent::*)] |
//*[contains(concat( " ", @class, " " ), concat( " ",
"headerSortDown", " " ))]
and the Error
Error in parse_simple_selector(stream) :
Expected selector, got
I really appreciate any help.
Joa

The first table with the names has a complicated structure and seems to be very difficult to convert into a standard format. At least I didn't succeed.
A summary of the number of medals by sport and the total medals can be obtained with
library(rvest) #v.0.2.0.9000
url <- "https://en.wikipedia.org/wiki/United_States_at_the_2012_Summer_Olympics"
tb <- read_html(url) %>% html_node("table.wikitable:nth-child(2)") %>% html_table(fill=TRUE)
#> head(tb)
# Medals by sport NA NA NA NA NA NA
#1 Sport 01 ! 02 ! 03 ! Total NA NA
#2 Swimming 16 9 6 31 NA NA
#3 Track & field 9 12 7 28 NA NA
#4 Gymnastics 3 1 2 6 NA NA
#5 Shooting 3 0 1 4 NA NA
#6 Tennis 3 0 1 4 NA NA
Then there is another table summarizing all competitors that you can get with
tb2 <- read_html(url) %>% html_node("table.wikitable:nth-child(20)") %>% html_table()
#> head(tb2)
# Sport Men Women Total
#1 Archery 3 3 6
#2 Athletics (track and field) 63 62 125
#3 Badminton 2 1 3
#4 Basketball 12 12 24
#5 Boxing 9 3 12
#6 Canoeing 5 2 7
And this is the table of multiple medalists:
tb3 <- read_html(url) %>% html_node("table.wikitable:nth-child(8)") %>% html_table(fill=TRUE)
#> head(tb3)
# Multiple medalists NA NA NA NA NA NA
#1 Name Sport 01 ! 02 ! 03 ! Total NA
#2 Michael Phelps Swimming 4 2 0 6 NA
#3 Missy Franklin Swimming 4 0 1 5 NA
#4 Allison Schmitt Swimming 3 1 1 5 NA
#5 Ryan Lochte Swimming 2 2 1 5 NA
#6 Allyson Felix Track & field 3 0 0 3 NA
It really depends on which table you want to have, as pointed out by @Metrics. There are many tables on that page.
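In the outputs above, the real column names end up in the first data row and there are trailing all-NA columns. A possible clean-up step in base R is sketched below on a small hand-built data frame (the values in tb are a stand-in shaped like the scraped output, not the scraped table itself):

```r
# Stand-in for a scraped table: header text in row 1 and an empty trailing column
tb <- data.frame(
  V1 = c("Sport", "Swimming", "Gymnastics"),
  V2 = c("Gold", "16", "3"),
  V3 = c("Total", "31", "6"),
  V4 = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# Drop columns that contain nothing but NA
tb <- tb[, colSums(!is.na(tb)) > 0, drop = FALSE]

# Promote the first row to column names, then remove it
names(tb) <- as.character(unlist(tb[1, ]))
tb <- tb[-1, , drop = FALSE]
rownames(tb) <- NULL

# Every column except the first holds counts, so convert them to numbers
tb[-1] <- lapply(tb[-1], as.numeric)
```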
