Importing data from a JSON file into R [duplicate] - json

Is there a way to import data from a JSON file into R? More specifically, the file is an array of JSON objects with string fields, objects, and arrays. The rjson package documentation isn't very clear on how to deal with this: http://cran.r-project.org/web/packages/rjson/rjson.pdf

First install the rjson package:
install.packages("rjson")
Then:
library("rjson")
json_file <- "http://api.worldbank.org/country?per_page=10&region=OED&lendingtype=LNX&format=json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))
Update: since version 0.2.1 the file can be read directly:
json_data <- fromJSON(file=json_file)

jsonlite will import the JSON into a data frame. It can optionally flatten nested objects. Nested arrays become data frames.
> library(jsonlite)
> winners <- fromJSON("winners.json", flatten=TRUE)
> colnames(winners)
[1] "winner" "votes" "startPrice" "lastVote.timestamp" "lastVote.user.name" "lastVote.user.user_id"
> winners[,c("winner","startPrice","lastVote.user.name")]
winner startPrice lastVote.user.name
1 68694999 0 Lamur
> winners[,c("votes")]
[[1]]
ts user.name user.user_id
1 Thu Mar 25 03:13:01 UTC 2010 Lamur 68694999
2 Thu Mar 25 03:13:08 UTC 2010 Lamur 68694999

An alternative package is RJSONIO. To convert a nested list, lapply can help:
library(RJSONIO)
l <- fromJSON('[{"winner":"68694999", "votes":[
{"ts":"Thu Mar 25 03:13:01 UTC 2010", "user":{"name":"Lamur","user_id":"68694999"}},
{"ts":"Thu Mar 25 03:13:08 UTC 2010", "user":{"name":"Lamur","user_id":"68694999"}}],
"lastVote":{"timestamp":1269486788526,"user":
{"name":"Lamur","user_id":"68694999"}},"startPrice":0}]'
)
m <- lapply(
l[[1]]$votes,
function(x) c(x$user['name'], x$user['user_id'], x['ts'])
)
m <- do.call(rbind, m)
This gives the information on the votes in your example.
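The lapply plus do.call(rbind, ...) pattern does not depend on RJSONIO itself. As a minimal sketch, assuming the parsed JSON is a list of votes shaped like the example (the list below is built by hand for illustration, and the names votes, rows and votes_df are made up), the same flattening in base R could look like this:

```r
# Hand-built stand-in for l[[1]]$votes; in practice this comes from fromJSON()
votes <- list(
  list(ts = "Thu Mar 25 03:13:01 UTC 2010",
       user = list(name = "Lamur", user_id = "68694999")),
  list(ts = "Thu Mar 25 03:13:08 UTC 2010",
       user = list(name = "Lamur", user_id = "68694999"))
)

# Turn each vote into a one-row data frame, then bind the rows together
rows <- lapply(votes, function(v) {
  data.frame(name = v$user$name,
             user_id = v$user$user_id,
             ts = v$ts,
             stringsAsFactors = FALSE)
})
votes_df <- do.call(rbind, rows)
```

Building one-row data frames keeps the column names, whereas binding plain vectors gives a character matrix.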

If the URL uses https, as Amazon S3 URLs do, fetch it first with getURL from the RCurl package:
json <- fromJSON(getURL('https://s3.amazonaws.com/bucket/my.json'))

First install the RJSONIO and RCurl packages:
install.packages("RJSONIO")
install.packages("RCurl")
Then try the code below, which uses RJSONIO, in the console:
library(RJSONIO)
library(RCurl)
json_file = getURL("https://raw.githubusercontent.com/isrini/SI_IS607/master/books.json")
json_file2 = RJSONIO::fromJSON(json_file)
head(json_file2)

Load the packages:
library(httr)
library(jsonlite)
I have had issues converting JSON to a data frame or CSV. In my case I did:
Token <- "245432532532"
source <- "http://......."
header_type <- "application/json"
full_token <- paste0("Bearer ", Token)
response <- GET(source, add_headers(Authorization = full_token, Accept = header_type), timeout(120), verbose())
text_json <- content(response, as = "text", encoding = "UTF-8")
jfile <- fromJSON(text_json)
df <- as.data.frame(jfile)
Then write df to CSV.
In this format it should be easy to split it into multiple .csv files if needed.
The important part is that content() returns the body as text (as = "text").

Load the httr package:
library(httr)
Get the URL:
url <- "http://www.omdbapi.com/?apikey=72bc447a&t=Annie+Hall&y=&plot=short&r=json"
resp <- GET(url)
Print the content of resp as text:
content(resp, as = "text")
Print the parsed content of resp:
content(resp)
When content() is called without a second argument, httr works out from the response's content type that it is dealing with JSON and converts it to a named R list.
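What the parsed form looks like can be tried without a network call. This is a minimal sketch, assuming the jsonlite package is installed; the JSON string is made up and only stands in for the text an API might return.

```r
library(jsonlite)

# Made-up stand-in for the text returned by content(resp, as = "text")
txt <- '{"Title": "Example", "Year": "2000", "Ratings": [{"Source": "A", "Value": "8/10"}]}'

# simplifyVector = FALSE keeps nested lists instead of building data frames,
# which is similar to the named list described above
parsed <- fromJSON(txt, simplifyVector = FALSE)

parsed$Title                 # top-level fields are reached by name
parsed$Ratings[[1]]$Value    # nested fields are reached by index, then name
```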

Related

Make a dataframe from a json list

I would like to convert data from JSON into a data frame in R. I tried the data.tree package, but I only get a data.frame full of NA values...
library(dplyr)
library(jsonlite)
library(data.tree)
library(magrittr)
data<-fromJSON("http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/nama_gdp_c?precision=1&geo=EU28&unit=EUR_HAB&time=2010&time=2011&indic_na=B1GM")
repos<-as.Node(data)
repos %>% ToDataFrameTable(valeur=function(x) x$repos$value,annee= function(x) x$repos$dimension$time$category$label)
I tried this too:
repos %>% ToDataFrameTable(valeur=function(x) x$value,annee= function(x) x$dimension$time$category$label)
But this only gives an empty two-column data frame.
I also tried this directly:
as.data.frame(valeur=data$value,annee=data$dimension$time$category$label)
but I get this:
"Error in as.data.frame(valeur = data$value, annee = data$dimension$time$category$label) : argument "x" is missing, with no default"
If someone knows a solution...
How about this?
library(rjson)
js <- fromJSON(file="http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/nama_gdp_c?precision=1&geo=EU28&unit=EUR_HAB&time=2010&time=2011&indic_na=B1GM")
df <- data.frame(years=stack(js$dimension$time$category$label)$value,
value=stack(js$value)$value,
country=stack(js$dimension$geo$category$label)$value)
df
Output is:
years value country
1 2010 24400 European Union (28 countries)
2 2011 25100 European Union (28 countries)
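The answer relies on stack(), which turns a named list into a data frame with the columns values and ind. A small base R sketch shows this step in isolation; it assumes js$value and the time labels are named lists, and the two lists below are hand-built stand-ins for them.

```r
# Hand-built stand-ins for js$dimension$time$category$label and js$value
labels <- list("2010" = "2010", "2011" = "2011")
values <- list("0" = 24400, "1" = 25100)

# stack() returns a data frame with the columns `values` and `ind`
stacked <- stack(values)

# Combine the stacked pieces into one data frame, as in the answer above
df <- data.frame(years = stack(labels)$values,
                 value = stacked$values,
                 stringsAsFactors = FALSE)
```

The answer writes $value rather than $values, which works because $ on a data frame matches column names partially.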

How to handle HTTP error 503 when making API calls to process JSON files in R with the jsonlite package?

I'm having problems using the jsonlite package in R to collect Dota2 match data using the Steam API. I am not an experienced developer and really appreciate any help. Thanks!
I have created a script in R. When I check the API call using a web browser it correctly returns the JSON contents, but when I execute the very same API call in R (either in a for loop or as a single call) using the fromJSON() function, I get the following errors:
Error in open.connection(con, "rb") : HTTP error 503.
In addition: Warning message:
closing unused connection 3 (https://api.steampowered.com/IDOTA2Match_570/GetMatchDetails/V001/?match_id=2170111273&key=XXXXXXXXXXPLACEHOLDERXXXXXXXXXXX)
This is the R script I have created to collect multiple JSON responses using the fromJSON command and jsonlite:
# Load required libraries
library(rvest)
library(stringr)
library(magrittr)
library(plyr)
library(dplyr)
library(tidyr)
library(knitr)
library(XML)
library(data.table)
library(foreign)
library(pbapply)
library(jsonlite)
## Set base url components
base.url_0 = "https://api.steampowered.com/IDOTA2Match_570/GetMatchDetails/V001/?match_id="
base.url_0.1 = "&key="
steamAPIkey = "XXXXXXXXXXPLACEHOLDERXXXXXXXXXXX" # Steam API Key
### Create for loop where each "i" is a DOTA2 match ID
for(i in seq(1:length(targets$match_id))) {
base.url = paste0(
base.url_0,
targets$match_id[i],
base.url_0.1,
steamAPIkey)
message("Retrieving page ", targets$match_id[i])
## Get JSON response and store into data.frame
ifelse(
tmp_json <- fromJSON(
txt = base.url,flatten = T), # if the json file exists
as.data.frame(tmp_errors_1$matches) <- base.url # if the json file does not exists
) # close ifelse statement
tmp_json <- try_default(
expr =
as.data.frame(tmp_json), # convert json file into a data frame
default =
as.data.frame(tmp_errors_2$matches) <- base.url, quiet = T) # if error, add match id to a dataframe
## Rbindlist
l = list(results, tmp_json)
results <- rbindlist(l,fill = T)
## Sleep for x seconds
Sys.sleep(runif(1, 2, 3))
## End of loop
}
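HTTP 503 means the server reported the service as unavailable, which is often temporary, so one option is to catch the error and retry instead of letting the loop stop. The following is a sketch only: fetch_with_retry, max_tries, wait and failed_ids are hypothetical names, and whether retrying helps with the Steam API is an assumption.

```r
# Call `fetch(url)` and retry on error, waiting `wait` seconds between attempts.
# Returns NULL if every attempt fails, so a loop can record the failure and move on.
fetch_with_retry <- function(url, fetch, max_tries = 3, wait = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fetch(url), error = function(e) {
      message("Attempt ", attempt, " failed for ", url, ": ", conditionMessage(e))
      NULL
    })
    if (!is.null(result)) return(result)
    if (attempt < max_tries) Sys.sleep(wait)
  }
  NULL
}

# Possible use inside the loop (not run here, it needs a valid API key;
# failed_ids is a hypothetical vector for match IDs that could not be fetched):
# tmp_json <- fetch_with_retry(base.url, function(u) jsonlite::fromJSON(u, flatten = TRUE))
# if (is.null(tmp_json)) failed_ids <- c(failed_ids, targets$match_id[i])
```

Unlike ifelse(), which only tests a logical vector, tryCatch() intercepts the error raised by fromJSON() so the rest of the loop can continue.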

How to read nested JSON structure?

I have some JSON that looks like this:
"total_rows":141,"offset":0,"rows":[
{"id":"1","key":"a","value":{"SP$Sale_Price":"240000","CONTRACTDATE$Contract_Date":"2006-10-26T05:00:00"}},
{"id":"2","key":"b","value":{"SP$Sale_Price":"2000000","CONTRACTDATE$Contract_Date":"2006-08-22T05:00:00"}},
{"id":"3","key":"c","value":{"SP$Sale_Price":"780000","CONTRACTDATE$Contract_Date":"2007-01-18T06:00:00"}},
...
In R, what would be the easiest way to produce a scatter-plot of SP$Sale_Price versus CONTRACTDATE$Contract_Date?
I got this far:
install.packages("rjson")
library("rjson")
json_file <- "http://localhost:5984/testdb/_design/sold/_view/sold?limit=100"
json_data <- fromJSON(file=json_file)
install.packages("plyr")
library(plyr)
asFrame <- do.call("rbind.fill", lapply(json_data, as.data.frame))
but now I'm stuck...
> plot(CONTRACTDATE$Contract_Date, SP$Sale_Price)
Error in plot(CONTRACTDATE$Contract_Date, SP$Sale_Price) :
object 'CONTRACTDATE' not found
How to make this work?
Suppose you have the following JSON-file:
txt <- '{"total_rows":141,"offset":0,"rows":[
{"id":"1","key":"a","value":{"SP$Sale_Price":"240000","CONTRACTDATE$Contract_Date":"2006-10-26T05:00:00"}},
{"id":"2","key":"b","value":{"SP$Sale_Price":"2000000","CONTRACTDATE$Contract_Date":"2006-08-22T05:00:00"}},
{"id":"3","key":"c","value":{"SP$Sale_Price":"780000","CONTRACTDATE$Contract_Date":"2007-01-18T06:00:00"}}]}'
Then you can read it as follows with the jsonlite package:
library(jsonlite)
json_data <- fromJSON(txt, flatten = TRUE)
# get the needed dataframe
dat <- json_data$rows
# set convenient names for the columns
# this step is optional, it just gives you nicer columnnames
names(dat) <- c("id","key","sale_price","contract_date")
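# convert the 'sale_price' column from character to numeric
dat$sale_price <- as.numeric(dat$sale_price)
# convert the 'contract_date' column to a datetime format
dat$contract_date <- strptime(dat$contract_date, format="%Y-%m-%dT%H:%M:%S", tz="GMT")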
Now you can plot:
plot(dat$contract_date, dat$sale_price)
This gives a scatter plot of sale price against contract date.
If you choose not to flatten the JSON, you can do:
json_data <- fromJSON(txt)
dat <- json_data$rows$value
sp <- strtoi(dat$`SP$Sale_Price`)
cd <- strptime(dat$`CONTRACTDATE$Contract_Date`, format="%Y-%m-%dT%H:%M:%S", tz="GMT")
plot(cd,sp)
This gives the same plot.
I found a way that doesn't discard the field names:
install.packages("jsonlite")
install.packages("curl")
library(jsonlite)
json <- fromJSON(json_file)
r <- json$rows
At this point r looks like this:
> class(r)
[1] "data.frame"
> colnames(r)
[1] "id" "key" "value"
After some more Googling and trial-and-error I landed on this:
f <- r$value
sp <- strtoi(f[["SP$Sale_Price"]])
cd <- strptime(f[["CONTRACTDATE$Contract_Date"]], format="%Y-%m-%dT%H:%M:%S", tz="GMT")
plot(cd,sp)
This also works on my full data set.

R: How to cache scraped websites (XML package) for later processing

I have the following function to webscrape websites:
library(XML)
dl_url <- function(link_url) {
con <- url(link_url)
raw_data <- readLines(con)
close(con)
parsed_data <- htmlTreeParse(raw_data, useInternalNodes = TRUE)
parsed_data
}
When I use:
URLs <- lapply(list_urls, dl_url)
I get the expected list of parsed websites,
str(URLs):
List of x
$ :Classes 'HTMLInternalDocument', 'HTMLInternalDocument', 'XMLInternalDocument', 'XMLAbstractDocument' <externalptr>
$ :Classes 'HTMLInternalDocument', 'HTMLInternalDocument', 'XMLInternalDocument', 'XMLAbstractDocument' <externalptr>
....
However, I am unable to store the data. dput(URLs) only yields a 1 kb file with text in it.
What is the best way to locally cache (parsed) html websites in R?
Thank you very much!
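The parsed documents are external pointers to C-level objects, which is why dput() has little to write out. One possible approach, sketched here on the assumption that caching the raw HTML text and re-parsing it later is acceptable, is to save the output of readLines() with saveRDS(). The function name cache_page, its reader argument and the cache directory are hypothetical.

```r
# Download a page once, keep the raw lines on disk, and reuse them afterwards.
# `reader` defaults to readLines(); it is a parameter so the download step
# can be swapped out.
cache_page <- function(link_url, cache_dir,
                       reader = function(u) readLines(u, warn = FALSE)) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  # Build a file name from the URL by replacing characters unsafe in file names
  cache_file <- file.path(cache_dir,
                          paste0(gsub("[^A-Za-z0-9]", "_", link_url), ".rds"))
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  raw_data <- reader(link_url)
  saveRDS(raw_data, cache_file)
  raw_data
}

# Later, parse from the cache instead of the network (requires the XML package):
# parsed <- XML::htmlTreeParse(cache_page(link_url, "cache"), useInternalNodes = TRUE)
```

Re-parsing from cached text is usually much cheaper than downloading again, and the cached files survive between R sessions.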

R: Extract JSON Variable Info

I'm trying to download NBA player information from Numberfire and then put that information into a data frame. However, I seem to be running into a few issues.
The following snippet downloads the information just fine:
require(RCurl)
require(stringr)
require(rjson)
#download data from numberfire
nf <- "https://www.numberfire.com/nba/fantasy/fantasy-basketball-projections"
html <- getURL(nf)
Then there is what I assume to be a JSON data structure
#extract json variable (?)
pat <- "NF_DATA.*}}}"
jsn <- str_extract(html, pat)
jsn <- str_split(jsn, "NF_DATA = ")
parse <- newJSONParser()
parse$addData(jsn)
It seems to add the data OK, as it doesn't throw any errors, but I can't tell whether there is data in that object, and I can't get it out!
I'd paste in the jsn variable, but it's way over the character limit. Any hints as to where I'm going wrong would be much appreciated.
Adding the final line below gives a list that you can transform to a data.frame:
require(RCurl); require(stringr); require(rjson)
#download data from numberfire
nf <- "https://www.numberfire.com/nba/fantasy/fantasy-basketball-projections"
html <- getURL(nf)
#extract json variable (?)
pat <- "NF_DATA.*}}}"
jsn <- str_extract(html, pat)
jsn <- str_split(jsn, "NF_DATA = ")
fromJSON(jsn[[1]][[2]])
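The extraction steps can be tried without hitting the site by running the same idea on a small HTML string. This is a sketch using base R regular expressions and jsonlite (an assumption; the answer above uses stringr and rjson), and the page content below is invented for illustration.

```r
library(jsonlite)

# Invented stand-in for the downloaded page source
html <- '<script>var NF_DATA = {"players":{"1":{"name":"A","pts":10}}};</script>'

# Capture everything between "NF_DATA = " and the closing "};"
m <- regmatches(html, regexec("NF_DATA = (\\{.*\\});", html))
jsn <- m[[1]][2]

# Parse the captured text; nested JSON objects become named lists
nf_data <- fromJSON(jsn)
nf_data$players[["1"]]$name
```

Capturing the JSON with a group avoids the separate split step, since the variable name never becomes part of the extracted text.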