Using RJSONIO and AsIs class - json

I am writing some helper functions to convert my R variables to JSON, and I've come across this problem: I would like my values to be represented as JSON arrays. According to the RJSONIO documentation, this can be done using the AsIs class.
x = "HELLO"
toJSON(list(x = I(x)), collapse="")
"{ \"x\": [ \"HELLO\" ] }"
But say we have a list
y = list(a = "HELLO", b = "WORLD")
toJSON(list(y = I(y)), collapse="")
"{ \"y\": {\n \"a\": \"HELLO\",\n\"b\": \"WORLD\" \n} }"
The value found in y -> a is NOT represented as an array. Ideally I would have
"{ \"y\": [{\n \"a\": \"HELLO\",\n\"b\": \"WORLD\" \n}] }"
Note the square brackets. Also I would like to get rid of all "\n"s, but collapse does not eliminate the line breaks in nested JSON. Any ideas?

Try writing it as
y = list(list(a = "HELLO", b = "WORLD"))
test<-toJSON(list(y = I(y)), collapse="")
When you write it to a file, it appears as:
{ "y": [
{
"a": "HELLO",
"b": "WORLD"
}
] }
I guess you could remove the \n with
test<-gsub("\n","",test)
or use the rjson package:
> rjson::toJSON(list(y = I(y)))
[1] "{\"y\":[{\"a\":\"HELLO\",\"b\":\"WORLD\"}]}"
The reason:
> names(list(a = "HELLO", b = "WORLD"))
[1] "a" "b"
> names(list(list(a = "HELLO", b = "WORLD")))
NULL
Examining rjson::toJSON, you will find this snippet of code:
if (!is.null(names(x)))
return(toJSON(as.list(x)))
str = "["
So it would appear to need an unnamed list to treat it as a JSON array. Maybe RJSONIO is similar.

Splitting Json array values to k/v and sequencing

So I managed to split the following data to k/v pairs
"tags": [
"category--Cola",
"sugar--3.000000",
"barcode--cola001",
"barcode--cola001_1",
"language--en",
"sku--cola_classic",
"sku--cola_cherry",
],
like so...
t = product['tags']
t_filtered = [k for k in t if '--' in k]
product['tags'] = dict(s.split('--') for s in t_filtered)
I want the output to be something like this
{
"category": [Cola],
"sugar":[3.0],
"barcode":[cola001,cola001_1],
"language":[en],
"sku": [cola_classic, cola_cherry],
}
so I tried this... (ref: https://docs.python.org/3/library/collections.html#collections.defaultdict)
product['tags'] = dict(s.split('--') for s in t_filtered)
s = product['tags']
d = {}
for k, v in s:
d.setdefault(k, []).append(v)
print(d)
but getting this error:
ValueError: too many values to unpack (expected 2)
Also, just to verify, s is a <class 'dict'>, so I can't figure out the issue.
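The error comes from the loop header: iterating over a dict yields only its keys, so `for k, v in s` tries to unpack each key string character by character. Building the dict first also keeps just the last value for repeated keys such as barcode and sku. A minimal sketch that groups straight from the tag list, assuming every kept tag contains the "--" separator (the sample `product` is reconstructed from the question, and `grouped` is a name introduced here for illustration):

```python
# Sample data reconstructed from the question (assumed shape of `product`).
product = {
    "tags": [
        "category--Cola",
        "sugar--3.000000",
        "barcode--cola001",
        "barcode--cola001_1",
        "language--en",
        "sku--cola_classic",
        "sku--cola_cherry",
    ]
}

grouped = {}
for tag in product["tags"]:
    if "--" not in tag:
        continue  # skip tags that are not key--value pairs
    # Split on the first "--" only, so a value may itself contain "--".
    key, value = tag.split("--", 1)
    # Collect every value for a key instead of overwriting earlier ones.
    grouped.setdefault(key, []).append(value)

product["tags"] = grouped
print(grouped)
```

The values stay strings here ("3.000000" rather than 3.0); converting numeric fields would be a separate step.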

Look for JSON example with all allowed combinations of structure in max depth 2 or 3

I've written a program which processes JSON objects. Now I want to verify whether I've missed something.
Is there a JSON example of all allowed JSON structure combinations? Something like this:
{
"key1" : "value",
"key2" : 1,
"key3" : {"key1" : "value"},
"key4" : [
[
"string1",
"string2"
],
[
1,
2
],
...
],
"key5" : true,
"key6" : false,
"key7" : null,
...
}
As you can see on the right-hand side of http://json.org/, the grammar of JSON isn't very difficult, but I've hit several exceptions because I forgot to handle some of the possible structure combinations. E.g. an array can contain "string, number, object, array, true, false, null", but my program couldn't handle arrays inside an array. Everything was fine until I received a valid JSON object with arrays inside an array and ran into an exception.
I want to test my program with such a JSON object (which is what I'm looking for). After this test I want to feel certain that my program handles every possible valid JSON structure without an exception.
I don't need nesting to depth 5 or so. I only need nesting to depth 2, or 3 at most, where every base type that can contain other values contains all the allowed base types.
Have you thought of escaped characters and objects within an object?
{
"key1" : {
"key1" : "value",
"key2" : [
"String1",
"String2"
]
},
"key2" : "\"This is a quote\"",
"key3" : "This contains an escaped slash: \\",
"key4" : "This contains accent characters: \u00eb \u00ef"
}
Note: \u00eb and \u00ef are the characters ë and ï, respectively.
Choose a programming language that supports JSON.
Try to load your JSON; on failure, the exception's message is descriptive.
Example:
Python:
import json, sys
# Parse the file given as the first command-line argument.
with open(sys.argv[1]) as f:
    json.load(f)
Generate:
import random, json, string

def json_null(depth = 0):
    return None

def json_int(depth = 0):
    return random.randint(-999, 999)

def json_float(depth = 0):
    return random.uniform(-999, 999)

def json_string(depth = 0):
    return ''.join(random.sample(string.printable, random.randrange(10, 40)))

def json_bool(depth = 0):
    return random.randint(0, 1) == 1

def json_list(depth):
    lst = []
    if depth:
        for i in range(random.randrange(8)):
            lst.append(gen_json(random.randrange(depth)))
    return lst

def json_object(depth):
    obj = {}
    if depth:
        for i in range(random.randrange(8)):
            obj[json_string()] = gen_json(random.randrange(depth))
    return obj

def gen_json(depth = 8):
    if depth:
        return random.choice([json_list, json_object])(depth)
    else:
        return random.choice([json_null, json_int, json_float, json_string, json_bool])(depth)

print(json.dumps(gen_json(), indent = 2))
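A random generator may miss a combination on any given run. As a complement, here is a minimal deterministic sketch that places every value type inside both container types, down to a fixed depth. The names `SCALARS` and `all_values` and the `root` key are made up for this illustration, and the scalar list is one reasonable choice of representatives rather than an exhaustive one:

```python
import json

# One representative of each scalar JSON type, including escape sequences.
SCALARS = [None, True, False, 0, -1.5, 1e10, "", "text",
           "quote \" backslash \\ accent \u00eb"]

def all_values(depth):
    """Return every scalar plus, if depth > 0, empty and filled containers.

    Each filled container holds every value of the next-lower depth, so
    arrays and objects end up nested inside both arrays and objects.
    """
    values = list(SCALARS)
    if depth > 0:
        inner = all_values(depth - 1)
        values.append([])            # empty array
        values.append({})            # empty object
        values.append(list(inner))   # array holding every inner value
        values.append({f"key{i}": v for i, v in enumerate(inner)})
    return values

document = {"root": all_values(2)}
text = json.dumps(document, indent=2)

# Sanity check: the generated text parses back to the same structure.
assert json.loads(text) == document
print(text)
```

Feeding the printed text to the program under test exercises array-in-array, object-in-array, array-in-object and object-in-object in one input.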

Write/read recursive structure S4 objects

I have a recursive structure of S4 objects that can be represented (in a simplified version) by these 2 classes:
cl2 <-
setClass("cl2",
representation(
id = "numeric",
date="Date"),
prototype = list(
date=Sys.Date(),
id=sample(1:100,1)
)
)
cl1 <-
setClass("cl1",
representation(
date="Date",
cl2 = "cl2"
),
prototype = list(
date=Sys.Date()
)
)
I would like to save/load objects of type cl1. I opted for the JSON format (suitable for unstructured objects). The problem is with dates: they are coerced to numeric. Is there an option/solution to get dates in the right format when I serialize the object? Note that the objects can contain other objects (recursive structure), so I would like all dates to be in the right format.
cat(RJSONIO::toJSON(cl1(),pretty=TRUE))
{
"date" : 16861,
"cl2" : {
"id" : 90,
"date" : 16861
}
}
A solution could be to replace the dates with character strings. But I would lose the validation mechanism of S4 objects and would have to implement date validation for all objects. Thanks in advance for any help.
The expected output would look like:
{
"date" :"2016-03-01",
"cl2" : {
"id" : 76,
"date" : "2016-03-01"
}
}
Reading the documentation of toJSON I found an interesting parameter:
force unclass/skip objects of classes with no defined JSON mapping
So I tried it, and I think this would match your need, as you can simply ignore the class entry:
> s <- jsonlite::toJSON(cl1(),force=TRUE,auto_unbox=TRUE,pretty=TRUE)
> s
{
"date": "2016-03-01",
"cl2": {
"date": "2016-03-01",
"id": 67,
"class": "cl2"
},
"class": "cl1"
}
Drawback: this is still not loadable as-is into S4 objects with fromJSON, since it gives back a named list. Analyzing the list recursively to recreate the S4 objects is doable, but you'll have to write the necessary as() implementations to turn a named list into your classes. For your example:
setAs('list', 'cl2',
function(from, to) {
new(to, id=from[['id']], date=as.Date(from[['date']]))
})
setAs('list','cl1',
function(from, to) {
new(to, date=as.Date(from[['date']]), cl2=as(from[['cl2']],'cl2'))
})
With a dummy input modeled on the previous output:
input <- '
{
"date": "2016-03-05",
"cl2": {
"date": "2016-02-01",
"id": 83,
"class": "cl2"
},
"class": "cl1"
}'
This gives:
> as(fromJSON(input),'cl1')
An object of class "cl1"
Slot "date":
[1] "2016-03-05"
Slot "cl2":
An object of class "cl2"
Slot "id":
[1] 83
Slot "date":
[1] "2016-02-01"
I'll let you adapt this to your real use case, probably using fromJSON(input, FALSE) to get a 'pure' list that you can coerce with lapply, for example, if you have multiple instances of your cl1 class in the JSON input.
One option is to use the jsonlite package to serialize. Indeed, jsonlite::toJSON respects dates and serializes them in a well-formatted form. The problem is that jsonlite::toJSON is not defined for S4 objects. My solution is to coerce the object to a list and then serialize it:
## S4 method to coerce any S4 object to a list
setMethod("as.list",signature(x="ANY"),
function(x) {
Map(
function(y) if (isS4(slot(x,y))) as.list(slot(x,y)) else slot(x,y)
,slotNames(class(x)))
})
## coercion
jsonlite::toJSON(as.list(cl1()),pretty=TRUE,auto_unbox=TRUE)
{
"date": "2016-03-01",
"cl2": {
"id": 24,
"date": "2016-03-01"
}
}
Update
In as.list I replaced lapply with Map to create a named list.
For the recursive reading of S4 classes from JSON you can use a similar approach:
library(RJSONIO)
createParser <- function(className) {
setAs("list", className, function(from, to) {
to <- new(to)
for (n in names(from)) {
if (isS4(slot(to, n))) {
c <- class(slot(to, n))[[1]]
o <- as(from[[n]], c)
slot(to, n) = o
} else {
slot(to, n) = from[[n]]
}
}
to
})
}
Name <- setClass("Name", slots=c("first"="character", "last"="character"))
createParser("Name")
Customer <- setClass("Customer", slots=c("name"="Name", "age"="numeric"))
createParser("Customer")
Case <- setClass("Case", slots=c("customer"="Customer"))
createParser("Case")
c1 <- Case(customer=Customer(name=Name(first="Mika", last="R"), age=100))
j <- RJSONIO::toJSON(c1)
l <- RJSONIO::fromJSON(j, simplify = FALSE)
as(l, "Case")

Scala 2.10: Array + JSON arrays to hashmap

After reading a JSON result from a web service response:
val jsonResult: JsValue = Json.parse(response.body)
Containing content something like:
{
result: [
["Name 1", "Row1 Val1", "Row1 Val2"],
["Name 2", "Row2 Val1", "Row2 Val2"]
]
}
How can I efficiently map the contents of the result array in the JSON with a list (or something similar) like:
val keys = List("Name", "Val1", "Val2")
To get an array of hashmaps?
Something like this?
This solution is functional and handles None/Failure cases "properly" (by returning a None):
val j = JSON.parseFull( json ).asInstanceOf[ Option[ Map[ String, List[ List[ String ] ] ] ] ]
val res = j.map { m ⇒
val r = m get "result"
r.map { ll ⇒
ll.foldRight( List(): List[ Map[ String, String ] ] ) { ( l, acc ) ⇒
Map( ( "Name" -> l( 0 ) ), ( "Val1" -> l( 1 ) ), ( "Val2" -> l( 2 ) ) ) :: acc
}
}.getOrElse(None)
}.getOrElse(None)
Note 1: I had to put double quotes around result in the JSON String to get the JSON parser to work
Note 2: the code could look nicer using more "monadic" sugar such as for comprehensions or using applicative functors

R list(structure(list())) to data frame

I have a JSON data source providing a list of hashes:
[
{ "a": "foo",
"b": "sdfshk"
},
{ "a": "foo",
"b": "ihlkyhul"
}
]
I use fromJSON() in the rjson package to convert that to an R data structure. It returns:
list(
structure(list(a = "foo", b = "sdfshk"), .Names = c("a", "b")),
structure(list(a = "foo", b = "ihlkyhul"), .Names = c("a", "b"))
)
I need to get this into an R data frame, but data.frame() turns that into a single-row data frame with four columns instead of a 2x2 data frame as expected. I lack the R-fu to do the transform from one to the other, though it looks like it should be straightforward.
Bonus points:
The actual problem is a bit more complex, because the JSON data source isn't as regular as I show above. The objects it returns vary in type. That is, the field set in each can be one of a few different types:
[
{ "a": "foo",
"b": "asdfhalsdhfla"
},
{ "a": "bar",
"c": "akjdhflakjhsdlfkah",
"d": "jfhglskhfglskd",
},
{ "a": "foo",
"b": "dfhlkhldsfg"
}
]
As you can see, the "a" field in each object is a type tag, indicating which other fields the object will have.
I'm not too particular how the solution copes with this.
It wouldn't be horrible if the two object types were just mooshed together, so you get columns a, b, c, and d, and the rows simply have N/A or NULL values where the JSON source object doesn't have a value for a given field. I believe I can clean the resulting data frame with subset(df, a == "foo"). I'll end up with some empty columns that way, but it won't matter to my program.
It would be better if the solution provides a way to select which JSON source rows go into the data frame and which get rejected, so the result has only the columns and rows actually required.
If you have a jagged list you want converted to a data.frame, you could use Hadley's plyr's rbind.fill. Saved my neck on a couple of occasions. Let me know if this is what you're looking for. Notice that I modified your first example to include only "b" in the third element to make it jagged.
> x <- list(
+ structure(list(a = "foo", b = "sdfshk"), .Names = c("a", "b")),
+ structure(list(a = "foo", b = "ihlkyhul"), .Names = c("a", "b")),
+ structure(list(b = "asdf"), .Names = "b")
+ )
>
> library(plyr)
> do.call("rbind.fill", lapply(x, as.data.frame))
a b
1 foo sdfshk
2 foo ihlkyhul
3 <NA> asdf