I have the following code, which extracts data from a JSON file.
library(jsonlite)
file_path <- 'C:/some/file/path.json'
df <- jsonlite::fromJSON(txt = file_path ,
simplifyVector = FALSE,
simplifyDataFrame = TRUE,
simplifyMatrix = FALSE,
flatten = FALSE)
The data structure is highly nested. My approach extracts 99% of it just fine, but in one particular part of the data I came across a phenomenon that I would describe as an "embedded" data frame:
df <- structure(
list(
ID = c(1L, 2L, 3L, 4L, 5L),
var1 = c('a', 'b', 'c', 'd', 'e'),
var2 = structure(
list(
var2a = c('v', 'w', 'x', 'y', 'z'),
var2b = c('vv', 'ww', 'xx', 'yy', 'zz')),
.Names = c('var2a', 'var2b'),
row.names = c(NA, 5L),
class = 'data.frame'),
var3 = c('aa', 'bb', 'cc', 'dd', 'ee')),
.Names = c('ID', 'var1', 'var2', 'var3'),
row.names = c(NA, 5L),
class = 'data.frame')
# Looks like this:
# ID var1 var2.var2a var2.var2b var3
# 1 1 a v vv aa
# 2 2 b w ww bb
# 3 3 c x xx cc
# 4 4 d y yy dd
# 5 5 e z zz ee
This looks like a normal data frame, and it behaves like that for the most part.
class(df)
# [1] "data.frame"
df[1,]
# ID var1 var2.var2a var2.var2b var3
# 1 a v vv aa
dim(df)
# [1] 5 4
# One less than expected due to embedded data frame
lapply(df, class)
# $ID
# [1] "integer"
#
# $var1
# [1] "character"
#
# $var2
# [1] "data.frame"
#
# $var3
# [1] "character"
str(df)
# 'data.frame': 5 obs. of 4 variables:
# $ ID : int 1 2 3 4 5
# $ var1: chr "a" "b" "c" "d" ...
# $ var2:'data.frame': 5 obs. of 2 variables:
# ..$ var2a: chr "v" "w" "x" "y" ...
# ..$ var2b: chr "vv" "ww" "xx" "yy" ...
# $ var3: chr "aa" "bb" "cc" "dd" ...
What is going on here? Why is jsonlite creating this odd structure instead of a simple data.frame? Can I avoid this behaviour, and if not, how can I most elegantly rectify it? I've used the approach below, but it feels very hacky at best.
library(magrittr)  # provides %>%
# Any columns with embedded data frame?
newX <- df[,-which(lapply(df, class) == 'data.frame')] %>%
# Append them to the end
cbind(df[,which(lapply(df, class) == 'data.frame')])
Update
The suggested workaround solves my issue, but I still feel like I don't understand the strange embedded data.frame structure. I would have thought that such a structure would be illegal by R data format conventions, or at least behave differently in terms of subsetting using [. I have opened a separate question on that.
I think you want to flatten your df object:
json <- toJSON(df)
flat_df <- fromJSON(json, flatten = TRUE)
str(flat_df)
'data.frame': 5 obs. of 5 variables:
$ ID : int 1 2 3 4 5
$ var1 : chr "a" "b" "c" "d" ...
$ var3 : chr "aa" "bb" "cc" "dd" ...
$ var2.var2a: chr "v" "w" "x" "y" ...
$ var2.var2b: chr "vv" "ww" "xx" "yy" ...
Is that closer to what you're looking for?
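As far as I can tell, the embedded data frame comes from nested JSON objects: when every record holds an object under the same key, fromJSON() keeps that object as a data frame column unless flatten = TRUE is passed. Here is a minimal sketch of that, assuming a made-up two-record JSON string (the names json, nested, flat_a and flat_b are only for illustration). It also uses jsonlite::flatten(), which works on an existing data frame and so avoids the round trip through toJSON():

```r
library(jsonlite)

# Hypothetical input: each record carries a nested object under "var2"
json <- '[{"ID":1,"var2":{"var2a":"v","var2b":"vv"}},
          {"ID":2,"var2":{"var2a":"w","var2b":"ww"}}]'

# Default parsing keeps the nested object as a data frame column
nested <- fromJSON(json)
is.data.frame(nested$var2)

# Option 1: flatten while parsing
flat_a <- fromJSON(json, flatten = TRUE)

# Option 2: flatten an already parsed data frame
flat_b <- jsonlite::flatten(nested)

names(flat_b)
```

The explicit jsonlite:: prefix is there because other packages (purrr, for example) export a function that is also called flatten().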
Related
I'm sorry that I have no code to reproduce this; I can only provide a picture. Please see it below.
A data frame with Facebook insights data prepared from JSON contains a column "values" holding list values. For the next manipulation I need to have only one value per cell in that column. So row 3 in the picture should be transformed into two rows (with the list content or the value directly):
post_story_adds_by_action_type_unique lifetime list(like = 38)
post_story_adds_by_action_type_unique lifetime list(share = 11)
If there are 3 or more values in a data frame list cell, it should produce 3 or more single-value rows.
Do you know how to do it?
I use this code to get the json and data frame:
i <- fromJSON(post.request.url)
i <- as.data.frame(i$insights$data)
Edit:
There will be no deeper nesting, just this one level.
The list is not needed in the result, I need just the values and their names.
Let's assume you're starting with something that looks like this:
mydf <- data.frame(a = c("A", "B", "C", "D"), period = "lifetime")
mydf$values <- list(list(value = 42), list(value = 5),
list(value = list(like = 38, share = 11)),
list(value = list(like = 38, share = 13)))
str(mydf)
## 'data.frame': 4 obs. of 3 variables:
## $ a : Factor w/ 4 levels "A","B","C","D": 1 2 3 4
## $ period: Factor w/ 1 level "lifetime": 1 1 1 1
## $ values:List of 4
## ..$ :List of 1
## .. ..$ value: num 42
## ..$ :List of 1
## .. ..$ value: num 5
## ..$ :List of 1
## .. ..$ value:List of 2
## .. .. ..$ like : num 38
## .. .. ..$ share: num 11
## ..$ :List of 1
## .. ..$ value:List of 2
## .. .. ..$ like : num 38
## .. .. ..$ share: num 13
## NULL
Instead of retaining lists in your output, I would suggest flattening out the data, perhaps using a function like this:
library(data.table)  # is.data.table(), as.data.table(), setDT(), tstrsplit()

myFun <- function(indt, col) {
if (!is.data.table(indt)) indt <- as.data.table(indt)
other_names <- setdiff(names(indt), col)
list_col <- indt[[col]]
rep_out <- sapply(list_col, function(x) length(unlist(x, use.names = FALSE)))
flat <- {
if (is.null(names(list_col))) names(list_col) <- seq_along(list_col)
setDT(tstrsplit(names(unlist(list_col)), ".", fixed = TRUE))[
, val := unlist(list_col, use.names = FALSE)][]
}
cbind(indt[rep(1:nrow(indt), rep_out)][, (col) := NULL], flat)
}
Here's what it does with the "mydf" I shared:
myFun(mydf, "values")
## a period V1 V2 V3 val
## 1: A lifetime 1 value NA 42
## 2: B lifetime 2 value NA 5
## 3: C lifetime 3 value like 38
## 4: C lifetime 3 value share 11
## 5: D lifetime 4 value like 38
## 6: D lifetime 4 value share 13
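If you would rather not depend on data.table, the same idea can be sketched in base R. This is only an illustration under a few assumptions: the list column has a single level of nesting (as stated in the question's edit), and the sample data and the names flat_cells, n_per_row and long are made up here. It relies on unlist() joining nested names with a ".":

```r
# Small stand-in for the data in the question
mydf <- data.frame(a = c("A", "B", "C"), period = "lifetime",
                   stringsAsFactors = FALSE)
mydf$values <- list(list(value = 42),
                    list(value = list(like = 38, share = 11)),
                    list(value = 5))

# unlist() each cell; nested names become "value.like", "value.share", ...
flat_cells <- lapply(mydf$values, unlist)

# How many output rows each input row expands to
n_per_row <- lengths(flat_cells)

# Repeat the other columns and attach one name/value pair per row
long <- data.frame(
  mydf[rep(seq_len(nrow(mydf)), n_per_row), c("a", "period")],
  name  = unlist(lapply(flat_cells, names)),
  value = unlist(flat_cells, use.names = FALSE),
  row.names = NULL,
  stringsAsFactors = FALSE
)
long
```

Row "B" ends up as two rows (value.like and value.share), while rows with a single value stay as one row.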
I have a problem converting lists to a data.frame.
First, I downloaded a dataset in JSON format from the Data API:
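library(httr)      # POST(), add_headers(), content()
library(jsonlite)  # fromJSON(), toJSON()

request1 <- POST(url = "https://api.data-api.io/v1/subjekti",
                 add_headers('x-dataapi-key' = "xxxxxxx", 'content-type' = "application/json"),
                 body = list(oib = oibreq),  # oibreq is not shown in the question
                 encode = "json")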
json1 <- content(request1, type = "application/json")
json2 <- fromJSON(toJSON(json1, null = "null"), flatten = TRUE)
The problem is that the data are stored as list elements. For example:
> json2[['oib']]
[[1]]
[1] "00045103869"
[[2]]
[1] "18527887472"
[[3]]
[1] "92680516748"
All column names:
> colnames(json2)
[1] "oib" "mb" "mbs" "mbo" "rno" "naziv"
[7] "adresa" "grad" "posta" "zupanija" "nkd2007" "puo"
[13] "godinaOsnivanja" "status" "temeljniKapital" "isActive" "datumBrisanja" "predmetPoslovanja"
How can I convert these lists to a data.frame?
Sorry, this was my first question on Stack Overflow. Here is my dataset:
> data <- dput(json3)
structure(list(oib = list("00045103869", "18527887472", "92680516748"),
mb = list("01699032", "03858731", "02591596"), mbs = list(
"080451345", "060060881", "040260786"), mbo = c(NA, NA,
NA), rno = c(NA, NA, NA), naziv = list("INTERIJER DIZAJN d.o.o.",
"M - Đ COMMERCE d.o.o.", "HIP REKLAME d.o.o. u stečaju"),
adresa = list("Savska cesta 179", "Put Piketa 0", "Sadska 2"),
grad = list("Zagreb", "Sinj", "Rijeka"), posta = list("10000",
"21230", "51000"), zupanija = list("Grad Zagreb", "Splitsko-dalmatinska",
"Primorsko-goranska"), nkd2007 = list("1623", "4719",
"4711"), puo = list(92L, 92L, 92L), godinaOsnivanja = list(
"2003", "1995", "2009"), status = list("bez postupka",
"bez postupka", "stečaj"), temeljniKapital = list("20.000,00 kn",
"509.100,00 kn", "20.000,00 kn"), isActive = list(TRUE,
TRUE, FALSE), datumBrisanja = list(NULL, NULL, "2015-12-24T00:00:00+01:00")), .Names = c("oib",
"mb", "mbs", "mbo", "rno", "naziv", "adresa", "grad", "posta",
"zupanija", "nkd2007", "puo", "godinaOsnivanja", "status", "temeljniKapital",
"isActive", "datumBrisanja"), class = "data.frame", row.names = c(NA,
3L))
A quick and dirty way would be to replace the NULL values with, for example, NAs, like this:
f <- function(lst) lapply(lst, function(x) if (is.list(x)) f(x) else if (is.null(x)) NA_character_ else x)
df <- as.data.frame(lapply(f(json2), unlist))
str(df)
# 'data.frame': 3 obs. of 17 variables:
# $ oib : Factor w/ 3 levels "00045103869",..: 1 2 3
# $ mb : Factor w/ 3 levels "01699032","02591596",..: 1 3 2
# $ mbs : Factor w/ 3 levels "040260786","060060881",..: 3 2 1
# $ mbo : logi NA NA NA
# $ rno : logi NA NA NA
# $ naziv : Factor w/ 3 levels "HIP REKLAME d.o.o. u stecaju",..: 2 3 1
# $ adresa : Factor w/ 3 levels "Put Piketa 0",..: 3 1 2
# $ grad : Factor w/ 3 levels "Rijeka","Sinj",..: 3 2 1
# $ posta : Factor w/ 3 levels "10000","21230",..: 1 2 3
# $ zupanija : Factor w/ 3 levels "Grad Zagreb",..: 1 3 2
# $ nkd2007 : Factor w/ 3 levels "1623","4711",..: 1 3 2
# $ puo : int 92 92 92
# $ godinaOsnivanja: Factor w/ 3 levels "1995","2003",..: 2 1 3
# $ status : Factor w/ 2 levels "bez postupka",..: 1 1 2
# $ temeljniKapital: Factor w/ 2 levels "20.000,00 kn",..: 1 2 1
# $ isActive : logi TRUE TRUE FALSE
# $ datumBrisanja : Factor w/ 1 level "2015-12-24T00:00:00+01:00": NA NA 1
But there may be better options.
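One possible variant, sketched here with made-up sample data (json3_small) and a hypothetical helper name (null_to_na), handles each list column separately and passes stringsAsFactors = FALSE so that text stays character instead of becoming a factor. It assumes every cell holds at most one value:

```r
# Hypothetical helper: turn a list column into an atomic vector,
# replacing NULL cells with NA so the length is preserved
null_to_na <- function(col) {
  if (!is.list(col)) return(col)           # atomic columns pass through
  col[vapply(col, is.null, logical(1))] <- NA
  unlist(col)
}

# Small stand-in for the data in the question
json3_small <- data.frame(mbo = c(NA, NA, NA))
json3_small$oib <- list("00045103869", "18527887472", "92680516748")
json3_small$datumBrisanja <- list(NULL, NULL, "2015-12-24T00:00:00+01:00")

out <- as.data.frame(lapply(json3_small, null_to_na),
                     stringsAsFactors = FALSE)
str(out)
```

Without the NULL replacement, unlist() would silently drop the NULL cells and the column would come out shorter than the data frame.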
I'm using the R programming language (and R Studio) and am having trouble organizing some data that I'm pulling via an API so that it can be written to a table. I'm using the StubHub API to get a JSON response that contains all ticket listings for a particular event. I can successfully make the call to StubHub and I get a successful response. Here's the code I am using to grab the response:
# get the content part of the response
msgContent = content(response)
# format to JSON object
jsonContent = jsonlite::fromJSON(toJSON(msgContent),flatten=TRUE,simplifyVector=TRUE)
This JSON object has a node called “listing” and that’s what I’m most interested in, so I set a variable to that part of the object:
friListings = jsonContent$listing
Checking the class of “friListings” I see I have a data.frame:
> class(friListings)
[1] "data.frame"
When I click on this variable in R Studio (View(friListings)), it opens in a new tab and looks nicely formatted. There are 21 variables (columns) and 609 observations (rows). I see null values for certain cells, which is expected.
I would like to write this data.frame out as a table in a file on my computer. When I try to do that, I get this error.
> write.table(friListings,file="data",row.names=FALSE)
Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) :
missing value where TRUE/FALSE needed
Looking at other postings, it appears this is happening because my data.frame is actually not “flat” and is a list of lists with different classes and nesting. I validate this by str() on each of the columns in friListings….
> str(friListings[1])
'data.frame': 609 obs. of 1 variable:
$ listingId:List of 609
..$ : int 1138579989
..$ : int 1138969061
..$ : int 1138958138
(this is just the first couple of lines, there are hundreds)
Another example:
> str(friListings[6])
'data.frame': 609 obs. of 1 variable:
$ sellerSectionName:List of 609
..$ : chr "Upper 354 - No View"
..$ : chr "Club 303 - Obstructed/No View"
..$ : chr "Middle 254 - Obstructed/No View"
(this is just the first couple of lines, there are hundreds)
Here is the head of friListings that I am attempting to share using dput from the reproducible example post:
> dput(head(friListings,4))
structure(list(listingId = list(1138579989L, 1138969061L, 1138958138L,
1139003985L), sectionId = list(1552295L, 1552172L, 1552220L,
1552289L), row = list("16", "6", "22", "26"), quantity = list(
1L, 2L, 4L, 1L), sellerSectionName = list("Upper 354 - No View",
"Club 303 - Obstructed/No View", "Middle 254 - Obstructed/No View",
"353"), sectionName = list("Upper 354 - Obstructed/No View",
"Club 303 - Obstructed/No View", "Middle 254 - Obstructed/No View",
"Upper 353 - Obstructed/No View"), seatNumbers = list("21",
"7,8", "13,14,15,16", "General Admission"), zoneId = list(
232917L, 232909L, 232914L, 232917L), zoneName = list("Upper",
"Club", "Middle", "Upper"), listingAttributeList = list(structure(c(204L,
201L), .Dim = c(2L, 1L)), structure(c(4369L, 5370L), .Dim = c(2L,
1L)), structure(c(4369L, 5989L), .Dim = c(2L, 1L)), structure(c(204L,
4369L), .Dim = c(2L, 1L))), listingAttributeCategoryList = list(
structure(1L, .Dim = c(1L, 1L)), structure(1L, .Dim = c(1L,
1L)), structure(1L, .Dim = c(1L, 1L)), structure(1L, .Dim = c(1L,
1L))), deliveryTypeList = list(structure(5L, .Dim = c(1L,
1L)), structure(5L, .Dim = c(1L, 1L)), structure(5L, .Dim = c(1L,
1L)), structure(5L, .Dim = c(1L, 1L))), dirtyTicketInd = list(
FALSE, FALSE, FALSE, FALSE), splitOption = list("0", "0",
"1", "1"), ticketSplit = list("1", "2", "2", "1"), splitVector = list(
structure(1L, .Dim = c(1L, 1L)), structure(2L, .Dim = c(1L,
1L)), structure(c(2L, 4L), .Dim = c(2L, 1L)), structure(1L, .Dim = c(1L,
1L))), sellerOwnInd = list(0L, 0L, 0L, 0L), currentPrice.amount = list(
468.99, 475L, 475L, 550.45), currentPrice.currency = list(
"USD", "USD", "USD", "USD"), faceValue.amount = list(NULL,
NULL, NULL, NULL), faceValue.currency = list(NULL, NULL,
NULL, NULL)), .Names = c("listingId", "sectionId", "row",
"quantity", "sellerSectionName", "sectionName", "seatNumbers",
"zoneId", "zoneName", "listingAttributeList", "listingAttributeCategoryList",
"deliveryTypeList", "dirtyTicketInd", "splitOption", "ticketSplit",
"splitVector", "sellerOwnInd", "currentPrice.amount", "currentPrice.currency",
"faceValue.amount", "faceValue.currency"), row.names = c(NA,
4L), class = "data.frame")
I tried to get around this by going through each column in friListings, unlisting that node, saving to a vector and then doing a cbind to stitch them all together. But, when I do that, I get vectors of different lengths because of the nulls. I took this approach one step further and tried to class each column to force NAs to preserve the nulls, but that’s not working. And, regardless, there’s gotta be a better approach than this. Here's some output to illustrate what happens when I attempt this approach.
# Take the column zoneId and casting it as numeric to force NA
friListings$zoneId<-lapply(friListings$zoneId, as.numeric)
# check the length
> length(friListings$zoneId)
[1] 609
# unlist and check the length... and I lost 11 items
> zoneid <- unlist(friListings$zoneId, use.names=FALSE)
> length(zoneid)
[1] 598
# here's the tail of the column... (because I happen to know that's where the empty values that are being dropped are)
> tail(friListings$zoneId)
[[1]]
numeric(0)
[[2]]
numeric(0)
[[3]]
numeric(0)
[[4]]
numeric(0)
[[5]]
numeric(0)
[[6]]
numeric(0)
I know people work with JSON and R all the time (I'm obviously not one of those people!), so maybe I’m missing something obvious. But I’ve spent 5 hours trying different ways to clean this data and searching the internet for answers. I read the JSON package documentation, too.
I really just want to "flatten" this object so that it’s pretty and structured in the same way the R Studio renders it when I do View(friListings). I'm already passing "flatten=TRUE" in my "fromJSON" call above and it doesn't seem to be doing what I expect. Same with the "simplifyVector=TRUE" (which is TRUE by default according to the docs, but added it for clarity).
Thanks for any insight or guidance you may be able to offer!!!
You might want to try and adapt this approach:
f <- function(x)
if(is.list(x)) {
unlist(lapply(x, f))
} else {
x[which(is.null(x))] <- NA
paste(x, collapse = ",")
}
df <- as.data.frame(do.call(cbind, lapply(friListings, f)))
write.table(df, tf <- tempfile(fileext = ".csv"))
df <- read.table(tf)
str(df)
# 'data.frame': 4 obs. of 21 variables:
# $ listingId : int 1138579989 1138969061 1138958138 1139003985
# $ sectionId : int 1552295 1552172 1552220 1552289
# $ row : int 16 6 22 26
# $ quantity : int 1 2 4 1
# $ sellerSectionName : Factor w/ 4 levels "353","Club 303 - Obstructed/No View",..: 4 2 3 1
# $ sectionName : Factor w/ 4 levels "Club 303 - Obstructed/No View",..: 4 1 2 3
# $ seatNumbers : Factor w/ 4 levels "13,14,15,16",..: 2 3 1 4
# $ zoneId : int 232917 232909 232914 232917
# $ zoneName : Factor w/ 3 levels "Club","Middle",..: 3 1 2 3
# $ listingAttributeList : Factor w/ 4 levels "204,201","204,4369",..: 1 3 4 2
# $ listingAttributeCategoryList: int 1 1 1 1
# $ deliveryTypeList : int 5 5 5 5
# $ dirtyTicketInd : logi FALSE FALSE FALSE FALSE
# $ splitOption : int 0 0 1 1
# $ ticketSplit : int 1 2 2 1
# $ splitVector : Factor w/ 3 levels "1","2","2,4": 1 2 3 1
# $ sellerOwnInd : int 0 0 0 0
# $ currentPrice.amount : num 469 475 475 550
# $ currentPrice.currency : Factor w/ 1 level "USD": 1 1 1 1
# $ faceValue.amount : logi NA NA NA NA
# $ faceValue.currency : logi NA NA NA NA
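The length mismatch described in the question (609 versus 598) comes from unlist() dropping zero-length cells. A minimal sketch of one way around that is below, using a tiny made-up data frame (listings) and a hypothetical helper (collapse_cell). It turns every list cell into a single string, so empty cells become NA and multi-value cells are comma-joined. Note that all list columns come out as character here; the answer above recovers numeric types by writing and re-reading the table.

```r
# Hypothetical helper: one string per cell, NA for NULL or zero-length cells
collapse_cell <- function(cell) {
  if (length(cell) == 0) return(NA_character_)
  paste(cell, collapse = ",")
}

# Small stand-in for friListings
listings <- data.frame(listingId = 1:3)
listings$zoneId <- list(232917L, NULL, 232909L)
listings$splitVector <- list(matrix(1L),
                             matrix(c(2L, 4L), ncol = 1),
                             integer(0))

# unlist() alone silently drops the empty cell
length(unlist(listings$zoneId))

# Replace every list column by a plain character vector of the same length
is_list_col <- vapply(listings, is.list, logical(1))
listings[is_list_col] <- lapply(listings[is_list_col], function(col) {
  vapply(col, collapse_cell, character(1))
})

str(listings)
```

After this step the data frame contains only atomic columns, so write.table(listings, ...) should no longer hit the error from the question.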
In general, I feel there is a need to build JSON objects by folding multiple columns. As far as I know, there is no direct way to do this. Please point it out if there is one.
I have data of this form:
A B C
1 a x
1 a y
1 c z
2 d p
2 f q
2 f r
How do I write JSON that looks like this:
{'query':'1', 'type':[{'name':'a', 'values':[{'value':'x'}, {'value':'y'}]}, {'name':'c', 'values':[{'value':'z'}]}]}
and similarly for 'query':'2'
I am looking to write them out in the one-JSON-document-per-line format used by mongoimport/mongoexport.
Any pointers are appreciated.
You've got a slightly unusual structure, with "value" appearing as a key more than once (this is legal JSON, since each "value" sits in its own object inside an array), as you can see here:
(js <- jsonlite::fromJSON('{"query":"1", "type":[{"name":"a", "values":[{"value":"x"}, {"value":"y"}]}, {"name":"c", "values":[{"value":"z"}]}]}'))
## $query
## [1] "1"
##
## $type
## name values
## 1 a x, y
## 2 c z
... with a data.frame cell containing a list of data.frames:
js$type$values[[1]]
## value
## 1 x
## 2 y
class(js$type$values[[1]])
## [1] "data.frame"
If you can accept your "type" variable containing a vector instead of a named-list, then perhaps the following code will suffice:
jsonlite::toJSON(lapply(unique(dat[, 'A']), function(a1) {
list(query = a1,
type = lapply(unique(dat[dat$A == a1, 'B']), function(b2) {
list(name = b2,
values = dat[(dat$A == a1) & (dat$B == b2), 'C'])
}))
}))
## [{"query":[1],"type":[{"name":["a"],"values":["x","y"]},{"name":["c"],"values":["z"]}]},{"query":[2],"type":[{"name":["d"],"values":["p"]},{"name":["f"],"values":["q","r"]}]}]
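If you do need the exact shape from the question (each entry of "values" being an object with a "value" key, and one document per line), here is a rough sketch. It assumes the sample data is in a data frame called dat with columns A, B and C, and the name json_lines is made up. It leans on two jsonlite behaviours: a data frame is serialised as an array of row objects, and auto_unbox = TRUE writes length-one vectors as scalars.

```r
library(jsonlite)

# The sample data from the question
dat <- data.frame(A = c(1, 1, 1, 2, 2, 2),
                  B = c("a", "a", "c", "d", "f", "f"),
                  C = c("x", "y", "z", "p", "q", "r"),
                  stringsAsFactors = FALSE)

# One JSON document per value of A
json_lines <- vapply(split(dat, dat$A), function(grp) {
  doc <- list(
    query = as.character(grp$A[1]),
    type  = lapply(unique(grp$B), function(b) {
      list(name   = b,
           # a one-column data frame becomes [{"value": ...}, ...]
           values = data.frame(value = grp$C[grp$B == b],
                               stringsAsFactors = FALSE))
    })
  )
  as.character(toJSON(doc, auto_unbox = TRUE))
}, character(1))

# Print one document per line; writeLines(json_lines, <file>) would save them
cat(json_lines, sep = "\n")
```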
How do I write a json array from R that has a sequence of lat and long?
I would like to write:
[[[1,2],[3,4],[5,6]]]
The best I can do is:
toJSON(matrix(1:6, ncol = 2, byrow = TRUE))
#"[ [ 1, 2 ],\n[ 3, 4 ],\n[ 5, 6 ] ]"
How can I wrap the thing in another array (the json kind)?
This is important to me so I can write files into a geojson format as a LineString.
I usually use fromJSON to get the target object:
ll <- fromJSON('[[[1,2],[3,4],[5,6]]]')
str(ll)
List of 1
$ :List of 3
..$ : num [1:2] 1 2
..$ : num [1:2] 3 4
..$ : num [1:2] 5 6
So we should create a list holding one unnamed list, whose elements each contain 2 values:
xx <- list(setNames(split(1:6,rep(1:3,each=2)),NULL))
identical(toJSON(xx),'[[[1,2],[3,4],[5,6]]]')
[1] TRUE
If you have a matrix
m1 <- matrix(1:6, ncol = 2, byrow = TRUE)
maybe this helps:
library(rjson)
paste0("[",toJSON(setNames(split(m1, row(m1)),NULL)),"]")
#[1] "[[[1,2],[3,4],[5,6]]]"
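The answers above use toJSON() from other JSON packages (rjson is loaded explicitly in the last one). If jsonlite is an option, a shorter route may be to wrap the matrix in an unnamed list, since jsonlite writes a matrix as an array of row arrays and an unnamed list as an array. This is only a sketch under that assumption; the names m1 and js are illustrative.

```r
library(jsonlite)

# One coordinate pair per row
m1 <- matrix(1:6, ncol = 2, byrow = TRUE)

# The unnamed list adds the extra outer array
js <- jsonlite::toJSON(list(m1))
js
```

The jsonlite:: prefix matters if rjson or RJSONIO is also attached, because all three packages export a function called toJSON().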