R list(structure(list())) to data frame - json

I have a JSON data source providing a list of hashes:
[
{ "a": "foo",
"b": "sdfshk"
},
{ "a": "foo",
"b": "ihlkyhul"
}
]
I use fromJSON() in the rjson package to convert that to an R data structure. It returns:
list(
structure(list(a = "foo", b = "sdfshk"), .Names = c("a", "b")),
structure(list(a = "foo", b = "ihlkyhul"), .Names = c("a", "b"))
)
I need to get this into an R data frame, but data.frame() turns that into a single-row data frame with four columns instead of a 2x2 data frame as expected. I lack the R-fu to do the transform from one to the other, though it looks like it should be straightforward.
Bonus points:
The actual problem is a bit more complex, because the JSON data source isn't as regular as I show above. The objects it returns vary in type. That is, the field set in each can be one of a few different types:
[
{ "a": "foo",
"b": "asdfhalsdhfla"
},
{ "a": "bar",
"c": "akjdhflakjhsdlfkah",
"d": "jfhglskhfglskd",
},
{ "a": "foo",
"b": "dfhlkhldsfg"
}
]
As you can see, the "a" field in each object is a type tag, indicating which other fields the object will have.
I'm not too particular how the solution copes with this.
It wouldn't be horrible if the two object types were just mooshed together, so you get columns a, b, c, and d, and the rows simply have N/A or NULL values where the JSON source object doesn't have a value for a given field. I believe I can clean the resulting data frame with subset(df, a == "foo"). I'll end up with some empty columns that way, but it won't matter to my program.
It would be better if the solution provides a way to select which JSON source rows go into the data frame and which get rejected, so the result has only the columns and rows actually required.

If you have a jagged list you want converted to a data.frame, you could use Hadley's plyr's rbind.fill. Saved my neck on a couple of occasions. Let me know if this is what you're looking for. Notice that I modified your first example to include only "b" in the third element to make it jagged.
> x <- list(
+ structure(list(a = "foo", b = "sdfshk"), .Names = c("a", "b")),
+ structure(list(a = "foo", b = "ihlkyhul"), .Names = c("a", "b")),
+ structure(list(b = "asdf"), .Names = "b")
+ )
>
> library(plyr)
> do.call("rbind.fill", lapply(x, as.data.frame))
a b
1 foo sdfshk
2 foo ihlkyhul
3 <NA> asdf

Related

Splitting Json array values to k/v and sequencing

So I managed to split the following data to k/v pairs
"tags": [
"category--Cola",
"sugar--3.000000",
"barcode--cola001",
"barcode--cola001_1",
"language--en",
"sku--cola_classic",
"sku--cola_cherry",
],
like so...
t = product['tags']
t_filtered = [k for k in t if '--' in k]
product['tags'] = dict(s.split('--') for s in t_filtered)
I want the output to be something like this
{
"category": [Cola],
"sugar":[3.0],
"barcode":[cola001,cola001_1],
"language":[en],
"sku": [cola_classic, cola_cherry],
}
so I tried this... (ref: https://docs.python.org/3/library/collections.html#collections.defaultdict)
product['tags'] = dict(s.split('--') for s in t_filtered)
s = product['tags']
d = {}
for k, v in s:
d.setdefault(k, []).append(v)
print(d)
but getting this error:
ValueError: too many values to unpack (expected 2)
Also, just to verify s is a <classic 'dict'> so I can't figure out the issue.

Write/read recursive structure S4 objects

I have a recursive structure of S4 objects , that can be presented ( this is a simple version) by theses 2 classes:
cl2 <-
setClass("cl2",
representation(
id = "numeric",
date="Date"),
prototype = list(
date=Sys.Date(),
id=sample(1:100,1)
)
)
cl1 <-
setClass("cl1",
representation(
date="Date",
cl2 = "cl2"
),
prototype = list(
date=Sys.Date()
)
)
I would like to save/load objects of type cl1. I opt to use json format(suitable for unstructured objects). The problem is with dates. Dates are coerced to numeric? Is there an option/solution to get dates in the right format when I serialize the object? Note that the objects can contains other objects ( recursive structure) so I would like that all dates are in the good format.
cat(RJSONIO::toJSON(cl1(),pretty=TRUE))
{
"date" : 16861,
"cl2" : {
"id" : 90,
"date" : 16861
}
}
A solution can be to replace dates by character. But I will loose the validation mechanism of S4 object and I should implement the date validation for all objects. Thanks in advance for any help.
An expected output should be like :
{
"date" :"2016-03-01",
"cl2" : {
"id" : 76,
"date" : "2016-03-01"
}
}
Reading the documentation of toJSON I found an interesting parameter:
force unclass/skip objects of classes with no defined JSON mapping
So I tried and I think this would match you need as you can simply ignore the class entry:
> s <- jsonlite::toJSON(cl1(),force=TRUE,auto_unbox=TRUE,pretty=TRUE)
> s
{
"date": "2016-03-01",
"cl2": {
"date": "2016-03-01",
"id": 67,
"class": "cl2"
},
"class": "cl1"
}
Drawback: This is still no loadable "as-is" to s4 objects with fromJSON as it will give a named list back, analyzing the list recursively to recreate S4 objects is doable, but you'll have to create the necessary as implementation to turn a named list to your classes, for your example:
setAs('list', 'cl2',
function(from, to) {
new(to, id=from[['id']], date=as.Date(from[['date']]))
})
setAs('list','cl1',
function(from, to) {
new(to,date=as.Date(from[['date']],cl2=as(from[['cl2']],'cl2')))
})
With a dummy input from previous output:
input <- '
{
"date": "2016-03-05",
"cl2": {
"date": "2016-02-01",
"id": 83,
"class": "cl2"
},
"class": "cl1"
}'
This gives:
> as(fromJSON(input),'cl1')
An object of class "cl1"
Slot "date":
[1] "2016-03-05"
Slot "cl2":
An object of class "cl2"
Slot "id":
[1] 67
Slot "date":
[1] "2016-03-01"
I let you adapt this to your real use case, probably using fromJSON(input,FALSE) to get a 'pure' list to coerce with lapply for example if you have multiples instances of your cl1 class in the json input.
One option is to use the jsonlite package to serialize. Indeed jsonlite::tojson respects date and serilze them in well formated form. The problem is jsonlite::toJSON is not defined for S4 objects. My solution is to coerce the object to a list and then seralize it:
## S4 method to coerce any S4 object to a list
setMethod("as.list",signature(x="ANY"),
function(x) {
Map(
function(y) if (isS4(slot(x,y))) as.list(slot(x,y)) else slot(x,y)
,slotNames(class(x)))
})
## coercion
jsonlite::toJSON(as.list(cl1()),pretty=TRUE,auto_unbox=TRUE)
{
"date": "2016-03-01",
"cl2": {
"id": 24,
"date": "2016-03-01"
}
}
udpdate
in as.list I replace lapply by Map to create a named list.
For the recursive reading of S4 classes from JSON you can use a similar approach:
library(RJSONIO)
createParser <- function(className) {
setAs("list", className, function(from, to) {
to <- new(to)
for (n in names(from)) {
if (isS4(slot(to, n))) {
c <- class(slot(to, n))[[1]]
o <- as(from[[n]], c)
slot(to, n) = o
} else {
slot(to, n) = from[[n]]
}
}
to
})
}
Name <- setClass("Name", slots=c("first"="character", "last"="character"))
createParser("Name")
Customer <- setClass("Customer", slots=c("name"="Name", "age"="numeric"))
createParser("Customer")
Case <- setClass("Case", slots=c("customer"="Customer"))
createParser("Case")
c1 <- Case(customer=Customer(name=Name(first="Mika", last="R"), age=100))
j <- RJSONIO::toJSON(c1)
l <- RJSONIO::fromJSON(j, simplify = FALSE)
as(l, "Case")

Dataframe in R to be converted to sequence of JSON objects

I had asked the same question after editing 2 times of a previous question I had posted. I am sorry for the bad usage of this website. I have flagged it for deletion and I am posting a proper new question on the same here. Please look into this.
I am basically working on a recommender system code. The output has to be converted to sequence of JSON objects. I have a matrix that has a look up table for every item ID, with the list of the closest items it is related to and the the similarity scores associated with their combinations.
Let me explain through a example.
Suppose I have a matrix
In the below example, Item 1 is similar to Items 22 and 23 with similarity scores 0.8 and 0.5 respectively. And the remaining rows follow the same structure.
X1 X2 X3 X4 X5
1 22 23 0.8 0.5
34 4 87 0.4 0.4
23 7 92 0.6 0.5
I want a JSON structure for every item (every X1 for every row) along with the recommended items and the similarity scores for each combination as a separate JSON entity and this being done in sequence. I don't want an entire JSON object containing these individual ones.
Assume there is one more entity called "coid" that will be given as input to the code. I assume it is XYZ and it is same for all the rows.
{ "_id" : { "coid" : "XYZ", "iid" : "1"}, "items" : [ { "item" : "22", "score" : 0.8},{ "item": "23", "score" : 0.5}] }
{ "_id" : { "coid" : "XYZ", "iid" : "34"},"items" : [ { "item" : "4", "score" : 0.4},{ "item": "87", "score" : 0.4}] }
{ "_id" : { "coid" : "XYZ", "iid" : "23"},"items" : [ { "item" : "7", "score" : 0.6},{ "item": "92", "score" : 0.5}] }
As in the above, each entity is a valid JSON structure/object but they are not put together into a separate JSON object as a whole.
I appreciate all the help done for the previous question but somehow I feel this new alteration I have here is not related to them because in the end, if you do a toJSON(some entity), then it converts the entire thing to one JSON object. I don't want that.
I want individual ones like these to be written to a file.
I am very sorry for my ignorance and inconvenience. Please help.
Thanks.
library(rjson)
## Your matrix
mat <- matrix(c(1,34,23,
22, 4, 7,
23,87,92,
0.8, 0.4, 0.6,
0.5, 0.4, 0.5), byrow=FALSE, nrow=3)
I use a function (not very interesting name makejson) that takes a row of the matrix and returns a JSON object. It makes two list objects, _id and items, and combines them to a JSON object
makejson <- function(x, coid="ABC") {
`_id` <- list(coid = coid, iid=x[1])
nitem <- (length(x) - 1) / 2 # Number of items
items <- list()
for(i in seq(1, nitem)) {
items[[i]] <- list(item = x[i + 1], score = x[i + 1 + nitem])
}
toJSON(list(`_id`=`_id`, items=items))
}
Then using apply (or a for loop) I use the function for each row of the matrix.
res <- apply(mat, 1, makejson, coid="XYZ")
cat(res, sep = "\n")
## {"_id":{"coid":"XYZ","iid":1},"items":[{"item":22,"score":0.8},{"item":23,"score":0.5}]}
## {"_id":{"coid":"XYZ","iid":34},"items":[{"item":4,"score":0.4},{"item":87,"score":0.4}]}
## {"_id":{"coid":"XYZ","iid":23},"items":[{"item":7,"score":0.6},{"item":92,"score":0.5}]}
The result can be saved to a file with cat by specifying the file argument.
## cat(res, sep="\n", file="out.json")
There is a small difference in your output and mine, the numbers are in quotes ("). If you want to have it like that, mat has to be character.
## mat <- matrix(as.character(c(1,34,23, ...
Hope it helps,
alex

Ragged list or data frame to JSON

I am trying to create a ragged list in R that corresponds to the D3 tree structure of flare.json. My data is in a data.frame:
path <- data.frame(P1=c("direct","direct","organic","direct"),
P2=c("direct","direct","end","end"),
P3=c("direct","organic","",""),
P4=c("end","end","",""), size=c(5,12,23,45))
path
P1 P2 P3 P4 size
1 direct direct direct end 5
2 direct direct organic end 12
3 organic end 23
4 direct end 45
but it could also be a list or reshaped if necessary:
path <- list()
path[[1]] <- list(name=c("direct","direct","direct","end"),size=5)
path[[2]] <- list(name=c("direct","direct","organic","end"), size=12)
path[[3]] <- list(name=c("organic", "end"), size=23)
path[[4]] <- list(name=c("direct", "end"), size=45)
The desired output is:
rl <- list()
rl <- list(name="root", children=list())
rl$children[1] <- list(list(name="direct", children=list()))
rl$children[[1]]$children[1] <- list(list(name="direct", children=list()))
rl$children[[1]]$children[[1]]$children[1] <- list(list(name="direct", children=list()))
rl$children[[1]]$children[[1]]$children[[1]]$children[1] <- list(list(name="end", size=5))
rl$children[[1]]$children[[1]]$children[2] <- list(list(name="organic", children=list()))
rl$children[[1]]$children[[1]]$children[[2]]$children[1] <- list(list(name="end", size=12))
rl$children[[1]]$children[2] <- list(list(name="end", size=23))
rl$children[2] = list(list(name="organic", children=list()))
rl$children[[2]]$children[1] <- list(list(name="end", size=45))
So when I print to json it's:
require(RJSONIO)
cat(toJSON(rl, pretty=T))
{
"name" : "root",
"children" : [
{
"name" : "direct",
"children" : [
{
"name" : "direct",
"children" : [
{
"name" : "direct",
"children" : [
{
"name" : "end",
"size" : 5
}
]
},
{
"name" : "organic",
"children" : [
{
"name" : "end",
"size" : 12
}
]
}
]
},
{
"name" : "end",
"size" : 23
}
]
},
{
"name" : "organic",
"children" : [
{
"name" : "end",
"size" : 45
}
]
}
]
}
I am having a hard time wrapping my head around the recursive steps that are necessary to create this list structure in R. In JS I can pretty easily move around the nodes and at each node determine whether to add a new node or keep moving down the tree by using push as needed, eg: new = {"name": node, "children": []}; or new = {"name": node, "size": size}; as in this example. I tried to split the data.frame as in this example:
makeList<-function(x){
if(ncol(x)>2){
listSplit<-split(x,x[1],drop=T)
lapply(names(listSplit),function(y){list(name=y,children=makeList(listSplit[[y]]))})
} else {
lapply(seq(nrow(x[1])),function(y){list(name=x[,1][y],size=x[,2][y])})
}
}
jsonOut<-toJSON(list(name="root",children=makeList(path)))
but it gives me an error
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
The function given in the linked Q&A is essentially what you need, however it was failing on your data set because of the null values for some rows in the later columns. Instead of just blindly repeating the recursion until you run out of columns, you need to check for your "end" value, and use that to switch to making leaves:
makeList<-function(x){
listSplit<-split(x[-1],x[1], drop=TRUE);
lapply(names(listSplit),function(y){
if (y == "end") {
l <- list();
rows = listSplit[[y]];
for(i in 1:nrow(rows) ) {
l <- c(l, list(name=y, size=rows[i,"size"] ) );
}
l;
}
else {
list(name=y,children=makeList(listSplit[[y]]))
}
});
}
I believe this does what you want, though it has some limitations. In particular, it is assumed that every branch in your network is unique (i.e. there can't be two rows in your data frame that are equal for every column other than size):
df.split <- function(p.df) {
p.lst.tmp <- unname(split(p.df, p.df[, 1]))
p.lst <- lapply(
p.lst.tmp,
function(x) {
if(ncol(x) == 2L && nrow(x) == 1L) {
return(list(name=x[1, 1], size=unname(x[, 2])))
} else if (isTRUE(is.na(unname(x[ ,2])))) {
return(list(name=x[1, 1], size=unname(x[, ncol(x)])))
}
list(name=x[1, 1], children=df.split(x[, -1, drop=F]))
}
)
p.lst
}
all.equal(rl, df.split(path)[[1]])
# [1] TRUE
Though note you had the organic size switched, so I had to fix your rl to get this result (rl has it as 45, but your path as 23). Also, I modified your path data.frame slightly:
path <- data.frame(
root=rep("root", 4),
P1=c("direct","direct","organic","direct"),
P2=c("direct","direct","end","end"),
P3=c("direct","organic",NA,NA),
P4=c("end","end",NA,NA),
size=c(5,12,23,45),
stringsAsFactors=F
)
WARNING: I haven't tested this with other structures, so it's possible it will hit corner cases that you'll need to debug.

Using RJSONIO and AsIs class

I am writing some helper functions to convert my R variables to JSON. I've come across this problem: I would like my values to be represented as JSON arrays, this can be done using the AsIs class according to the RJSONIO documentation.
x = "HELLO"
toJSON(list(x = I(x)), collapse="")
"{ \"x\": [ \"HELLO\" ] }"
But say we have a list
y = list(a = "HELLO", b = "WORLD")
toJSON(list(y = I(y)), collapse="")
"{ \"y\": {\n \"a\": \"HELLO\",\n\"b\": \"WORLD\" \n} }"
The value found in y -> a is NOT represented as an array. Ideally I would have
"{ \"y\": [{\n \"a\": \"HELLO\",\n\"b\": \"WORLD\" \n}] }"
Note the square brackets. Also I would like to get rid of all "\n"s, but collapse does not eliminate the line breaks in nested JSON. Any ideas?
try writing as
y = list(list(a = "HELLO", b = "WORLD"))
test<-toJSON(list(y = I(y)), collapse="")
when you write to file it appears as:
{ "y": [
{
"a": "HELLO",
"b": "WORLD"
}
] }
I guess you could remove the \n as
test<-gsub("\n","",test)
or use RJSON package
> rjson::toJSON(list(y = I(y)))
[1] "{\"y\":[{\"a\":\"HELLO\",\"b\":\"WORLD\"}]}"
The reason
> names(list(a = "HELLO", b = "WORLD"))
[1] "a" "b"
> names(list(list(a = "HELLO", b = "WORLD")))
NULL
examining the rjson::toJSON you will find this snippet of code
if (!is.null(names(x)))
return(toJSON(as.list(x)))
str = "["
so it would appear to need an unnamed list to treat it as a JSON array. Maybe RJSONIO is similar.