Processing JSON using rjson

I'm trying to process some data in JSON format. rjson::fromJSON imports the data successfully, but places it in quite an unwieldy list.
library(rjson)
y <- fromJSON(file="http://api.lmiforall.org.uk/api/v1/wf/predict/breakdown/region?soc=6145&minYear=2014&maxYear=2020")
str(y)
List of 3
$ soc : num 6145
$ breakdown : chr "region"
$ predictedEmployment:List of 7
..$ :List of 2
.. ..$ year : num 2014
.. ..$ breakdown:List of 12
.. .. ..$ :List of 3
.. .. .. ..$ code : num 1
.. .. .. ..$ name : chr "London"
.. .. .. ..$ employment: num 74910
.. .. ..$ :List of 3
.. .. .. ..$ code : num 7
.. .. .. ..$ name : chr "Yorkshire and the Humber"
.. .. .. ..$ employment: num 61132
...
However, as this is essentially tabular data, I would like it in a succinct data.frame. After much trial and error I have the result:
y.p <- do.call(rbind, lapply(y[[3]], function(p)
  cbind(p$year,
        do.call(rbind, lapply(p$breakdown, function(q)
          data.frame(q$name, q$employment, stringsAsFactors = FALSE))))))
head(y.p)
p$year q.name q.employment
1 2014 London 74909.59
2 2014 Yorkshire and the Humber 61131.62
3 2014 South West (England) 65833.57
4 2014 Wales 33002.64
5 2014 West Midlands (England) 68695.34
6 2014 South East (England) 98407.36
But the command seems overly fiddly and complex. Is there a simpler way of doing this?

Here I recover the geometry of the list
ni <- seq_along(y[[3]])
nj <- seq_along(y[[c(3, 1, 2)]])
nij <- as.matrix(expand.grid(3, ni=ni, 2, nj=nj))
then extract the relevant variable information using the rows of nij as an index into the nested list
data <- apply(nij, 1, function(ij) y[[ij]])
year <- apply(cbind(nij[,1:2], 1), 1, function(ij) y[[ij]])
and make it into a more friendly structure
data.frame(year, do.call(rbind, data))
year code name employment
1 2014 1 London 74909.59
2 2015 5 West Midlands (England) 69132.34
3 2016 12 Northern Ireland 24313.94
4 2017 5 West Midlands (England) 71723.4
5 2018 9 North East (England) 27199.99
6 2019 4 South West (England) 71219.51

I am not sure it is simpler, but the result is more complete and I think easier to read. My idea, using Map, is: for each (year, breakdown) pair, aggregate the breakdown data into a single table and then combine it with the year.
dat <- y[[3]]
res <- Map(function(x, y) data.frame(year = y,
                                     do.call(rbind, lapply(x, as.data.frame))),
           lapply(dat, '[[', 'breakdown'),
           lapply(dat, '[[', 'year'))
## transform the list to a big data.frame
do.call(rbind,res)
year code name employment
1 2014 1 London 74909.59
2 2014 7 Yorkshire and the Humber 61131.62
3 2014 4 South West (England) 65833.57
4 2014 10 Wales 33002.64
5 2014 5 West Midlands (England) 68695.34
6 2014 2 South East (England) 98407.36
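For comparison, a sketch of the same extraction with jsonlite instead of rjson (an assumption: the API still returns the structure shown above). jsonlite simplifies arrays of objects into data frames, so only one list-column is left to unnest:
library(jsonlite)
library(tidyr)
y2 <- jsonlite::fromJSON("http://api.lmiforall.org.uk/api/v1/wf/predict/breakdown/region?soc=6145&minYear=2014&maxYear=2020")
## y2$predictedEmployment should come back as a data frame with a 'year' column
## and a 'breakdown' list-column of data frames; unnest() expands that list-column
## into one row per region and year
unnest(y2$predictedEmployment, breakdown)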

Scraping html text into table with delimiters that do not have a clear pattern using R (rvest)

I'm just learning how to use R to scrape data from webpages, and I'm running into a couple of issues.
For reference, the website that I am practicing on is here: http://www.rsssf.com/tables/34q.html
As far as I know, the page I am scraping is not an HTML table, so I can't scrape the information directly into a table. Here is the code I wrote to just get all of the text:
library(rvest)
wcq_1934_html <- read_html("http://www.rsssf.com/tables/34q.html")
wcq_1934_node <- html_nodes(wcq_1934_html, "pre")
wcq_1934_text <- html_text(wcq_1934_node, trim = TRUE)
This results in a very long text file with all of the information that I need, just not formatted in an ideal way.
So I am next attempting to substring this text in order to get an output that looks something like this.
Country A - Country A Score - Country B - Country B Score
It doesn't have to be exactly like this, I just basically need for each game the country and how many goals they scored and ideally it should be comparable with the other country from the same game so I can know who won or lost! I do not need any of the other information like where the game was played, etc.
So I've tried three different ways to get this:
First test: split text by dashes:
test <- strsplit(wcq_1934_text, "-")
df_test <- data.frame(test)
This gives me the information I need in a table but the rows don't match the exact scores that I need (i.e. Lithuania 0, and Sweden 2 are in separate rows)
Second test: split text by spaces:
test2 <- strsplit(wcq_1934_text, " ")
df_test2 <- data.frame(test2)
This is helpful because it gives me the scores in one row (0-2 for the first game), but the countries are unevenly spaced out across rows.
Third test: split text by "tabs"
test3 <- strsplit(wcq_1934_text, "\t")
df_test3 <- data.frame(test3)
This has a similar issue to the first test.
Any suggestions would be much appreciated. This is my first ever Stack Overflow post, although I've lurked around and this website has been helpful to me for a very long time. Thank you in advance!
Here's a solution that gets you most of what you need, though as MrFlick commented, it is a little fragile because it is specific to this page. I'll stay with rvest, though as biomiha suggested, it isn't really buying you a lot here (though it does cleanly break out the <pre> block).
Starting with your wcq_1934_text: it's a single long string, so let's break it up by newlines (CRLF in this case):
wcq_1934_text <- strsplit(wcq_1934_text, "[\r\n]+")[[1]]
str(wcq_1934_text)
# chr [1:51] "Hosts: Italy (not automatically qualified)" "Holders: Uruguay (did not enter)" "Group 1 [Sweden]" ...
I'll use the magrittr package merely because it helps break out each step of the process using the %>% pipe; you can convert it to non-magrittr code by changing (say) func1() %>% func2() %>% func3() to func3(func2(func1())) (yuck) or by intermediate assignment of return values, ret1 <- func1(); ret2 <- func2(ret1); ....
library(magrittr)
dat <- Filter(function(a) grepl("^[0-9][0-9]", a), wcq_1934_text) %>%
  paste(., collapse = "\n") %>%
  textConnection() %>%
  read.fwf(file = ., widths = c(10, 16, 17, 4, 99), stringsAsFactors = FALSE) %>%
  lapply(trimws) %>%
  as.data.frame(stringsAsFactors = FALSE)
The widths are fragile and unique to this page. If other reporting pages have slightly different column layouts, you'll need to use a different function, perhaps one that can automatically determine the breaks.
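If that happens, one option (a sketch; readr is an assumption, not used elsewhere in this answer) is to let readr's fwf_empty() guess the breaks from the columns of blanks; depending on your readr version you may need to wrap the literal string in I():
library(readr)
txt <- paste(Filter(function(a) grepl("^[0-9][0-9]", a), wcq_1934_text),
             collapse = "\n")
dat2 <- read_fwf(txt, col_positions = fwf_empty(txt))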
head(dat)
# V1 V2 V3 V4 V5
# 1 11.06.33 Stockholm Sweden 6-2 Estonia
# 2 29.06.33 Kaunas Lithuania 0-2 Sweden
# 3 11.03.34 Madrid Spain 9-0 Portugal
# 4 18.03.34 Lisboa Portugal 1-2 Spain
# 5 25.03.34 Milano Italy 4-0 Greece
# 6 25.03.34 Sofia Bulgaria 1-4 Hungary
From here, it's up to you which columns you want to use.
For instance, handling of the date, you might want:
dat$V1 <- as.POSIXct(gsub("([0-9]+)$", "19\\1", dat$V1), format = "%d.%m.%Y")
dat$V1
# [1] "1933-06-11 PST" "1933-06-29 PST" "1934-03-11 PST" "1934-03-18 PST" "1934-03-25 PST" "1934-03-25 PST" "1934-04-25 PST" "1934-04-29 PST"
# [9] "1933-10-15 PST" "1934-03-15 PST" "1933-09-24 PST" "1933-10-29 PST" "1934-04-29 PST" "1934-02-25 PST" "1934-04-08 PST" "1934-04-29 PST"
# [17] "1934-03-11 PST" "1934-04-15 PST" "1934-01-28 PST" "1934-02-01 PST" "1934-02-04 PST" "1934-03-04 PST" "1934-03-11 PST" "1934-03-18 PST"
# [25] "1934-05-24 PST" "1934-03-16 PST" "1934-04-06 PST"
The gsub stuff is because as.POSIXct (via %y) assumes 2-digit years 00-68 are in the 2000s and 69-99 are in the 1900s.
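A quick illustration of that rule (and why the gsub prefixes "19"):
as.Date("25.03.34", format = "%d.%m.%y")
# [1] "2034-03-25"   (34 < 69, so it is read as 2034)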
It's easy enough to use either strsplit on the scores, but you could also do:
library(tidyr)
dat %>%
  separate(V4, c("score1", "score2"), sep = "-") %>%
  head()
# Warning: Too few values at 1 locations: 10
# V1 V2 V3 score1 score2 V5
# 1 1933-06-11 Stockholm Sweden 6 2 Estonia
# 2 1933-06-29 Kaunas Lithuania 0 2 Sweden
# 3 1934-03-11 Madrid Spain 9 0 Portugal
# 4 1934-03-18 Lisboa Portugal 1 2 Spain
# 5 1934-03-25 Milano Italy 4 0 Greece
# 6 1934-03-25 Sofia Bulgaria 1 4 Hungary
(The warning is expected, since one game was not played so has "n/p" for a score. You might want to handle non-score values in V4 before trying the split, perhaps replacing anything not numeric-dash-numeric with NA.)
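For instance, a small sketch of that cleanup, blanking out anything in V4 that isn't digits-dash-digits before calling separate():
dat$V4[!grepl("^[0-9]+-[0-9]+$", dat$V4)] <- NA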
Equally specific to this particular site but may be easier to generalize:
library(rvest)
library(purrr)
library(dplyr)
library(tidyr)    # for separate()
library(stringi)
pg <- read_html("http://www.rsssf.com/tables/34q.html")
Target the <pre> and strip out some things that aren't part of "tables":
html_nodes(pg, "pre") %>%
html_text() %>%
stri_split_lines() %>%
flatten_chr() %>%
discard(stri_detect_regex, "^(NB| )") -> lines
Now, we get the start and end lines indexes of each "group":
starts <- which(grepl("^Group", lines))
ends <- c(starts[-1], length(lines))
We iterate over those starts and ends and:
- extract the group info
- clean up the table
- discard any "empty" tables
- turn the tabular data into a data frame, doing some munging along the way
I can annotate the following more if needed:
map2_df(starts, ends, ~{
  grp_info <- stri_match_all_regex(lines[.x], "Group ([[:digit:]]+) \\[(.*)]")[[1]][,2:3]
  lines[(.x+1):.y] %>%
    discard(stri_detect_regex, "(^[^[:digit:]]| round)") %>%
    discard(`==`, "") -> grp
  if (length(grp) == 0) return(NULL)
  stri_split_regex(grp, "  +") %>%   # split on runs of two or more spaces
    map_df(~{
      .x[1:4] %>%
        as.list() %>%
        set_names(c("date", "team_a", "team_b", "score_team")) %>%
        flatten_df() %>%
        separate(score_team, c("score", "team_c"), sep = " ") %>%
        mutate(group_num = grp_info[1], group_info = grp_info[2]) %>%
        separate(date, c("d", "m", "y")) %>%
        mutate(date = as.Date(sprintf("19%s-%s-%s", y, m, d))) %>%
        select(-d, -m, -y)
    })
})
## # A tibble: 27 x 7
## team_a team_b score team_c group_num group_info date
## <chr> <chr> <chr> <chr> <chr> <chr> <date>
## 1 Stockholm Sweden 6-2 Estonia 1 Sweden 1933-06-11
## 2 Kaunas Lithuania 0-2 Sweden 1 Sweden 1933-06-29
## 3 Madrid Spain 9-0 Portugal 2 Spain 1934-03-11
## 4 Lisboa Portugal 1-2 Spain 2 Spain 1934-03-18
## 5 Milano Italy 4-0 Greece 3 Italy 1934-03-25
## 6 Sofia Bulgaria 1-4 Hungary 4 Hungary, Austria 1934-03-25
## 7 Wien Austria 6-1 Bulgaria 4 Hungary, Austria 1934-04-25
## 8 Budapest Hungary 4-1 Bulgaria 4 Hungary, Austria 1934-04-29
## 9 Warszawa Poland 1-2 Czechoslovakia 5 Czechoslovakia 1933-10-15
## 10 Praha Czechoslovakia n/p Poland 5 Czechoslovakia 1934-03-15
## 11 Beograd Yugoslavia 2-2 Switzerland 6 Romania, Switzerland 1933-09-24
## 12 Bern Switzerland 2-2 Romania 6 Romania, Switzerland 1933-10-29
## 13 Bucuresti Romania 2-1 Yugoslavia 6 Romania, Switzerland 1934-04-29
## 14 Dublin Ireland 4-4 Belgium 7 Netherlands, Belgium 1934-02-25
## 15 Amsterdam Netherlands 5-2 Ireland 7 Netherlands, Belgium 1934-04-08
## 16 Antwerpen Belgium 2-4 Netherlands 7 Netherlands, Belgium 1934-04-29
## 17 Luxembourg Luxembourg 1-9 Germany 8 Germany, France 1934-03-11
## 18 Luxembourg Luxembourg 1-6 France 8 Germany, France 1934-04-15
## 19 Port-au-Prince Haiti 1-3 Cuba 11 USA 1934-01-28
## 20 Port-au-Prince Haiti 1-1 Cuba 11 USA 1934-02-01
## 21 Port-au-Prince Haiti 0-6 Cuba 11 USA 1934-02-04
## 22 Cd. de Mexico Mexico 3-2 Cuba 11 USA 1934-03-04
## 23 Cd. de Mexico Mexico 5-0 Cuba 11 USA 1934-03-11
## 24 Cd. de Mexico Mexico 4-1 Cuba 11 USA 1934-03-18
## 25 Roma USA 4-2 Mexico 11 USA 1934-05-24
## 26 Cairo Egypt 7-1 Palestina 12 Egypt 1934-03-16
## 27 Tel Aviv Palestina 1-4 Egypt 12 Egypt 1934-04-06

Filter in Nested Data Frame

I am playing around with the Yelp data set and want to filter the business set according to the category.
I imported the JSON file into R with
library(jsonlite)
yelp_business <- stream_in(file("yelp_academic_dataset_business.json"))
which results then in the following data frame:
'data.frame': 77445 obs. of 15 variables:
$ business_id : chr "5UmKMjUEUNdYWqANhGckJw" "UsFtqoBl7naz8AVUBZMjQQ" "3eu6MEFlq2Dg7bQh8QbdOg" "cE27W9VPgO88Qxe4ol6y_g" ...
$ full_address : chr "4734 Lebanon Church Rd\nDravosburg, PA 15034" "202 McClure St\nDravosburg, PA 15034" "1 Ravine St\nDravosburg, PA 15034" "1530 Hamilton Rd\nBethel Park, PA 15234" ...
$ hours :'data.frame': 77445 obs. of 7 variables:
..$ Friday :'data.frame': 77445 obs. of 2 variables:
.. ..$ close: chr "21:00" NA NA NA ...
.. ..$ open : chr "11:00" NA NA NA ...
..$ Tuesday :'data.frame': 77445 obs. of 2 variables:
.. ..$ close: chr "21:00" NA NA NA ...
.. ..$ open : chr "11:00" NA NA NA ...
..$ Thursday :'data.frame': 77445 obs. of 2 variables:
.. ..$ close: chr "21:00" NA NA NA ...
.. ..$ open : chr "11:00" NA NA NA ...
..$ Wednesday:'data.frame': 77445 obs. of 2 variables:
.. ..$ close: chr "21:00" NA NA NA ...
.. ..$ open : chr "11:00" NA NA NA ...
..$ Monday :'data.frame': 77445 obs. of 2 variables:
.. ..$ close: chr "21:00" NA NA NA ...
.. ..$ open : chr "11:00" NA NA NA ...
..$ Sunday :'data.frame': 77445 obs. of 2 variables:
.. ..$ close: chr NA NA NA NA ...
.. ..$ open : chr NA NA NA NA ...
..$ Saturday :'data.frame': 77445 obs. of 2 variables:
.. ..$ close: chr NA NA NA NA ...
.. ..$ open : chr NA NA NA NA ...
$ open : logi TRUE TRUE TRUE FALSE TRUE TRUE ...
$ categories :List of 77445
..$ : chr "Fast Food" "Restaurants"
..$ : chr "Nightlife"
..$ : chr "Auto Repair" "Automotive"
..$ : chr "Active Life" "Mini Golf" "Golf"
..$ : chr "Shopping" "Home Services" "Internet Service Providers" "Mobile Phones" ...
..$ : chr "Bars" "American (New)" "Nightlife" "Lounges" ...
..$ : chr "Active Life" "Trainers" "Fitness & Instruction"
..$ : chr "Bars" "American (Traditional)" "Nightlife" "Restaurants"
..$ : chr "Auto Repair" "Automotive" "Tires"
..$ : chr "Active Life" "Mini Golf"
..$ : chr "Home Services" "Contractors"
..$ : chr "Veterinarians" "Pets"
..$ : chr "Libraries" "Public Services & Government"
..$ : chr "Automotive" "Auto Parts & Supplies"
I now want to filter the rows by business category, keeping every business whose category list contains "food".
However, if I just try it that way:
input ="food"
engage = filter(yelp_business, grepl(input, categories))
I receive the following error code:
Error: data_frames can only contain 1d atomic vectors and lists
I first suspected the nested structure to be the reason. However, using tidyjson does not help either, as categories is a list and not a data frame within the main data frame.
Does anyone have an idea how to solve this? I just need a list of all food businesses' business_ids so that I can then filter the review JSON file from Yelp and extract the written reviews.
Any help with this is really appreciated! Thanks a lot!
tidyjson does not yet support ndjson, and I am not quite sure how to nicely work with stream_in().
However, it is possible to read the file directly and process naturally with tidyjson. I am using the development version from devtools::install_github('jeremystan/tidyjson').
document.id gives a nice identification of objects, so I find the document.ids that have "food" in one of the "categories." From that point, we filter and do whatever additional data analysis is desired.
library(dplyr)
library(stringr)
library(tidyjson)
j <- readLines("yelp_academic_dataset_business.json")
raw <- j %>% as.tbl_json()
## pull out the categories for filtering
prep <- raw %>% enter_object("categories") %>%
  gather_array() %>% append_values_string()
## filter to 'food' categories (use document.id to identify json objects)
keepids <- prep[str_detect(str_to_lower(prep$string), "food"), ]$document.id %>%
  unique()
## filter and do any further data analysis you want to do
raw %>% filter(document.id %in% keepids) %>%
  spread_values(
    name = json_chr(name),
    city = json_chr(city),
    state = json_chr(state),
    stars = json_chr(stars))
#> # A tbl_json: 21 x 5 tibble with a "JSON" attribute
#> `attr(., "JSON")` document.id name city
#> <chr> <int> <chr> <chr>
#> 1 "{\"business_id\":..." 2 Cut and Taste Las Vegas
#> 2 "{\"business_id\":..." 8 Taco Bell Scottsdale
#> 3 "{\"business_id\":..." 10 Sehne Backwaren Stuttgart
#> 4 "{\"business_id\":..." 20 Graceful Cake Creations Mesa
#> 5 "{\"business_id\":..." 26 Chipotle Mexican Grill Toronto
#> 6 "{\"business_id\":..." 30 Carrabba's Italian Grill Glendale
#> 7 "{\"business_id\":..." 32 I Deal Coffee Toronto
#> 8 "{\"business_id\":..." 34 Lo-Lo's Chicken & Waffles Phoenix
#> 9 "{\"business_id\":..." 38 Kabob Palace Las Vegas
#> 10 "{\"business_id\":..." 43 Tea Shop 168 Markham
#> # ... with 11 more rows, and 2 more variables: state <chr>, stars <chr>
NOTE - I only processed the first 100 records of the yelp_academic_dataset_business.json file.
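If you have already loaded the data with stream_in() as in the question, a minimal base-R sketch that filters on the list column directly (using the categories structure shown in the question) would be:
## keep businesses whose category list contains "food" (case-insensitive)
keep <- vapply(yelp_business$categories,
               function(x) any(grepl("food", x, ignore.case = TRUE)),
               logical(1))
food_ids <- yelp_business$business_id[keep]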

Extracting data from list in R

library(RCurl)
library(rjson)
json <- getURL('https://extraction.import.io/query/runtime/17d882b5-c118-4f27-8ce1-90085ec0b116?_apikey=d5a8a01e20174e95887dc0f385e4e3f6d7ef5ca1428d5a029f2aa352509948ade8e5d7fb0dc941f4769a32b541ca6b38a7cd6578dfd81b357fbc4f2e008f5154f1dbfcff31878798fa887b70b1ff59dd&url=http%3A%2F%2Fwww.numbeo.com%2Fcost-of-living%2Fcompare_cities.jsp%3Fcountry1%3DSingapore%26country2%3DAustralia%26city1%3DSingapore%26city2%3DMelbourne')
obj <- fromJSON(json)
I would like to get the data into nice columns of data, but many steps in the list are "nameless". Any idea of how to organise the data?
Check out this difference, and let me know what you think. This is what your object looks like:
library(RCurl)
library(rjson)
json <- getURL('https://extraction.import.io/query/runtime/17d882b5-c118-4f27-8ce1-90085ec0b116?_apikey=d5a8a01e20174e95887dc0f385e4e3f6d7ef5ca1428d5a029f2aa352509948ade8e5d7fb0dc941f4769a32b541ca6b38a7cd6578dfd81b357fbc4f2e008f5154f1dbfcff31878798fa887b70b1ff59dd&url=http%3A%2F%2Fwww.numbeo.com%2Fcost-of-living%2Fcompare_cities.jsp%3Fcountry1%3DSingapore%26country2%3DAustralia%26city1%3DSingapore%26city2%3DMelbourne')
obj <- rjson::fromJSON(json)
str(obj)
List of 2
$ extractorData:List of 3
..$ url : chr "http://www.numbeo.com/cost-of-living/compare_cities.jsp?country1=Singapore&country2=Australia&city1=Singapore&city2=Melbourne"
..$ resourceId: chr "b1250747011ee774e7c881617c86a5a9"
..$ data :List of 1
.. ..$ :List of 1
.. .. ..$ group:List of 52
.. .. .. ..$ :List of 6
.. .. .. .. ..$ COL VALUE :List of 1
.. .. .. .. .. ..$ :List of 1
.. .. .. .. .. .. ..$ text: chr "Meal, Inexpensive Restaurant"
There are indeed a lot of lists in there that you don't need. Now try the jsonlite package's fromJSON function:
library(jsonlite)
obj2 <- jsonlite::fromJSON(json)
str(obj2)
List of 2
$ extractorData:List of 3
..$ url : chr "http://www.numbeo.com/cost-of-living/compare_cities.jsp?country1=Singapore&country2=Australia&city1=Singapore&city2=Melbourne"
..$ resourceId: chr "b1250747011ee774e7c881617c86a5a9"
..$ data :'data.frame': 1 obs. of 1 variable:
.. ..$ group:List of 1
.. .. ..$ :'data.frame': 52 obs. of 6 variables:
.. .. .. ..$ COL VALUE :List of 52
.. .. .. .. ..$ :'data.frame': 1 obs. of 1 variable:
.. .. .. .. .. ..$ text: chr "Meal, Inexpensive Restaurant"
.. .. .. .. ..$ :'data.frame': 1 obs. of 1 variable:
.. .. .. .. .. ..$ text: chr "Meal for 2 People, Mid-range Restaurant, Three-course"
.. .. .. .. ..$ :'data.frame': 1 obs. of 1 variable:
Still, this JSON just isn't pretty; we'll need to fix that.
I take it you want that data frame in there. So start with
df <- obj2$extractorData$data$group[[1]]
and there's your data frame. One problem though: every single cell here is wrapped in a list, including NULL values, and you can't simply unlist those; the NULLs would disappear and the columns they were in would come up short...
Edit: Here's how to handle the columns with list(NULL) values.
df[sapply(df[,2],is.null),2] <- NA
df[sapply(df[,3],is.null),3] <- NA
df[sapply(df[,4],is.null),4] <- NA
df[sapply(df[,5],is.null),5] <- NA
library(magrittr)   # for the %>% pipe
df2 <- sapply(df, unlist) %>% as.data.frame
It can be written more elegantly for sure, but this'll get you going and it's understandable.
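One way the NULL handling might be tightened up (a sketch applying the same idea to every list column of df):
df[] <- lapply(df, function(col) {
  if (is.list(col)) {
    col[vapply(col, is.null, logical(1))] <- NA   # keep the column length intact
  }
  col
})
df2 <- as.data.frame(sapply(df, unlist), stringsAsFactors = FALSE)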

Transform list cell in data frame into rows

I'm sorry there is no code to reproduce this; I can only provide a picture (see below).
A data frame of Facebook insights data prepared from JSON contains a column "values" whose cells are lists. For the next manipulation I need only one value per cell in that column, so row 3 in the picture should be transformed into two rows (with the list content or the values directly):
post_story_adds_by_action_type_unique lifetime list(like = 38)
post_story_adds_by_action_type_unique lifetime list(share = 11)
If a list cell contains 3 or more values, it should produce 3 or more single-value rows.
Do you know how to do it?
I use this code to get the json and data frame:
i <- fromJSON(post.request.url)
i <- as.data.frame(i$insights$data)
Edit:
There will be no deeper nesting, just this one level.
The list is not needed in the result, I need just the values and their names.
Let's assume you're starting with something that looks like this:
mydf <- data.frame(a = c("A", "B", "C", "D"), period = "lifetime")
mydf$values <- list(list(value = 42), list(value = 5),
                    list(value = list(like = 38, share = 11)),
                    list(value = list(like = 38, share = 13)))
str(mydf)
## 'data.frame': 4 obs. of 3 variables:
## $ a : Factor w/ 4 levels "A","B","C","D": 1 2 3 4
## $ period: Factor w/ 1 level "lifetime": 1 1 1 1
## $ values:List of 4
## ..$ :List of 1
## .. ..$ value: num 42
## ..$ :List of 1
## .. ..$ value: num 5
## ..$ :List of 1
## .. ..$ value:List of 2
## .. .. ..$ like : num 38
## .. .. ..$ share: num 11
## ..$ :List of 1
## .. ..$ value:List of 2
## .. .. ..$ like : num 38
## .. .. ..$ share: num 13
## NULL
Instead of retaining lists in your output, I would suggest flattening out the data, perhaps using a function like this:
library(data.table)

myFun <- function(indt, col) {
  if (!is.data.table(indt)) indt <- as.data.table(indt)
  other_names <- setdiff(names(indt), col)
  list_col <- indt[[col]]
  rep_out <- sapply(list_col, function(x) length(unlist(x, use.names = FALSE)))
  flat <- {
    if (is.null(names(list_col))) names(list_col) <- seq_along(list_col)
    setDT(tstrsplit(names(unlist(list_col)), ".", fixed = TRUE))[
      , val := unlist(list_col, use.names = FALSE)][]
  }
  cbind(indt[rep(1:nrow(indt), rep_out)][, (col) := NULL], flat)
}
Here's what it does with the "mydf" I shared:
myFun(mydf, "values")
## a period V1 V2 V3 val
## 1: A lifetime 1 value NA 42
## 2: B lifetime 2 value NA 5
## 3: C lifetime 3 value like 38
## 4: C lifetime 3 value share 11
## 5: D lifetime 4 value like 38
## 6: D lifetime 4 value share 13
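A possible tidyverse alternative for the same flattening (a sketch, not the data.table approach above; it assumes the "mydf" constructed earlier):
library(dplyr)
library(tidyr)
mydf %>%
  mutate(values = lapply(values, unlist)) %>%    # one named numeric vector per row
  unnest_longer(values, indices_to = "name")     # one row per named value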

Get only specific object within json in a data frame

I would like to import a single object from a JSON file into an R data frame. Normally I use fromJSON() from the jsonlite package. However, now I want to load this JSON into a data frame and keep only the object called plays.
If I use:
library(jsonlite)
df <- fromJSON("http://live.nhl.com/GameData/20132014/2013020555/PlayByPlay.json")
It gives a data frame containing all the objects. Is there a way to only load the plays object in the data frame? Or should I just load the complete json and restructure this within R?
That does return a data frame, although it's kind of a mangled gemisch of list and data frame. If you use a different package, it is just a list. Using str(df) (warning ... long output)
library(RJSONIO)
## df here is the same JSON re-parsed with RJSONIO::fromJSON (not the jsonlite result above)
str(df)
#------------
List of 1
$ data:List of 2
..$ refreshInterval: num 0
..$ game :List of 7
.. ..$ awayteamid : num 24
.. ..$ awayteamname: chr "Anaheim Ducks"
.. ..$ hometeamname: chr "Washington Capitals"
.. ..$ plays :List of 1
.. .. ..$ play:List of 102
.. .. .. ..$ :List of 28
-----------Output truncated----------------
...which shows that the plays portion can be obtained with:
plays_out <- df$data$game$plays
I do not see that there is any advantage in trying to parse this yourself. Most of the "volume" of data is in the plays component.
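Should you still want a flat table, a rough sketch (an assumption, not part of the answer above) that keeps only the scalar fields of each play, assuming every play exposes the same scalar fields:
plays_df <- do.call(rbind, lapply(plays_out$play, function(p) {
  # drop list/vector-valued fields such as aoi; keep length-1 scalars only
  scalars <- p[vapply(p, function(x) !is.list(x) && length(x) == 1, logical(1))]
  as.data.frame(scalars, stringsAsFactors = FALSE)
}))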
When I use jsonlite::fromJSON I get a slightly different structure, which is sufficiently different that I now need to use a different call to get the plays items:
> str(df )
'data.frame': 1 obs. of 2 variables:
$ refreshInterval:List of 1
..$ data: num 0
$ game :'data.frame': 1 obs. of 7 variables:
..$ awayteamid :List of 1
.. ..$ data: num 24
..$ awayteamname:List of 1
.. ..$ data: chr "Anaheim Ducks"
..$ hometeamname:List of 1
.. ..$ data: chr "Washington Capitals"
..$ plays :'data.frame': 1 obs. of 1 variable:
.. ..$ play:List of 1
.. .. ..$ data:'data.frame': 102 obs. of 29 variables:
.. .. .. ..$ aoi :List of 102
.. .. .. .. ..$ : num 8470612 8470621 8473933 8473972 8475151 ...
.. .. .. .. ..$ : num 8459442 8467332 8467400 8471476 8471699 ...
.. .. .. .. ..$ : num 8459442 8467332 8467400 8471476 8471699 ...
.. .. .. .. ..$ : num 8459442 8467332 8467400 8471476 8471699 ...
#------snipped output------------
> length(df$game$plays)
[1] 1
> length(df$game$plays$play)
[1] 1
> length(df$game$plays$play$data)
[1] 29
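Based on the lengths above, the equivalent extraction from the jsonlite result would be something like:
plays_out2 <- df$game$plays$play$data   # the 102 x 29 plays data frame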
I think I prefer the result from RJSONIO::fromJSON, since it doesn't add the complexity of dataframe coercion.