R: jsonlite - export key:value pairs from a list of lists

I have a list of lists which are of variable length. The first value of each nested list is the key, and the rest of the values in the list will be the array entry. It looks something like this:
[[1]]
[1] "Bob" "Apple"
[[2]]
[1] "Cindy" "Apple" "Banana" "Orange" "Pear" "Raspberry"
[[3]]
[1] "Mary" "Orange" "Strawberry"
[[4]]
[1] "George" "Banana"
I've extracted the keys and entries as follows:
keys <- lapply(x, '[', 1)
entries <- lapply(x, '[', -1)
Now that I have these, I don't know how to associate them as key:value pairs in R without creating a matrix first. That seems silly, since my data don't fit in a rectangle anyway (every example I've seen uses the column names of a matrix as the keys).
This is my crappy method: build a matrix, assign column names, and then use jsonlite to export to JSON.
#Create a matrix from entries, without recycling
#I found this function on StackOverflow which seems to work...
cbind.fill <- function(...){
nm <- list(...)
nm <- lapply(nm, as.matrix)
n <- max(sapply(nm, nrow))
do.call(cbind, lapply(nm, function (x)
rbind(x, matrix(, n-nrow(x), ncol(x)))))
}
#Call said function
matrix <- cbind.fill(entries)
#Transpose the thing
matrix <- t(matrix)
#Set column names
colnames(matrix) <- keys
#Export to json
json<-toJSON(matrix)
The result is good, but the implementation sucks. Result:
[{"Bob":["Apple"],"Cindy":["Apple","Banana","Orange","Pear","Raspberry"],"Mary":["Orange","Strawberry"],"George":["Banana"]}]
Please let me know of better ways that might exist to accomplish this.

How about:
names(entries) <- unlist(keys)
toJSON(entries)
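A self-contained sketch of the same idea, assuming the input is a list of character vectors like the one in the question (the name `named_entries` is only illustrative). It names the entries with their keys and lets jsonlite serialize the named list as a JSON object:

```r
library(jsonlite)

# Sample input shaped like the question's data: first element is the key
x <- list(
  c("Bob", "Apple"),
  c("Cindy", "Apple", "Banana", "Orange", "Pear", "Raspberry"),
  c("Mary", "Orange", "Strawberry"),
  c("George", "Banana")
)

keys <- vapply(x, `[`, character(1), 1)  # first element of each vector
entries <- lapply(x, `[`, -1)            # everything except the first element

# A named list is serialized as a JSON object of arrays; no matrix is needed
named_entries <- setNames(entries, keys)
json <- toJSON(named_entries)
json
```

Note that this gives a bare object. If the outer `[...]` from the question's output is required, wrapping the list once more, as in `toJSON(list(named_entries))`, should produce it.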

Consider the following lapply() approach:
library(jsonlite)
entries <- list(c('Bob', 'Apple'),
c('Cindy', 'Apple', 'Banana', 'Orange','Pear','Raspberry'),
c('Mary', 'Orange', 'Strawberry'),
c('George', 'Banana'))
# KEEP ALL CONTENTS EXCEPT THE FIRST ELEMENT (THE KEY)
nestlist <- lapply(entries, function(i) i[-1])
# NAME EACH ELEMENT WITH ITS FIRST ELEMENT
names(nestlist) <- sapply(entries, function(i) i[1])
#$Bob
#[1] "Apple"
#$Cindy
#[1] "Apple" "Banana" "Orange" "Pear" "Raspberry"
#$Mary
#[1] "Orange" "Strawberry"
#$George
#[1] "Banana"
x <- toJSON(list(nestlist), pretty=TRUE)
x
#[
# {
# "Bob": ["Apple"],
# "Cindy": ["Apple", "Banana", "Orange", "Pear", "Raspberry"],
# "Mary": ["Orange", "Strawberry"],
# "George": ["Banana"]
# }
#]

I think this has already been sufficiently answered but here is a method using purrr and jsonlite.
library(purrr)
library(jsonlite)
sample_data <- list(
list("Bob","Apple"),
list("Cindy","Apple","Banana","Orange","Pear","Raspberry"),
list("Mary","Orange","Strawberry"),
list("George","Banana")
)
sample_data %>%
map(~set_names(list(.x[-1]),.x[1])) %>%
toJSON(auto_unbox=TRUE, pretty=TRUE)

Related

R: Selecting certain from a JSON file

I've imported a JSON file into R from ( http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json ) and I'm trying to select only counties in Kansas.
Right now I have all the data into one variable and I'm trying to make subdata of this that is just counties of Kansas. I'm not sure how to go about this.
What you have there is GeoJSON, which can be read directly by library(sf) to give you an sf object, which is also a data.frame. Then you can use the usual data.frame subsetting operations:
library(sf)
sf <- sf::read_sf("http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json")
sf[sf$NAME == "Kansas", ]
# Simple feature collection with 1 feature and 5 fields
# geometry type: MULTIPOLYGON
# dimension: XY
# bbox: xmin: -102.0517 ymin: 36.99308 xmax: -94.58993 ymax: 40.00316
# epsg (SRID): 4326
# proj4string: +proj=longlat +datum=WGS84 +no_defs
# GEO_ID STATE NAME LSAD CENSUSAREA geometry
# 30 0400000US20 20 Kansas 81758.72 MULTIPOLYGON(((-99.541116 3...
And seeing as you want the individual counties, you need to use the counties data set
sf_counties <- sf::read_sf("http://eric.clst.org/wupl/Stuff/gz_2010_us_050_00_500k.json")
sf_counties[sf_counties$STATE == 20, ]
To stay with a JSON workflow, you can try jqr:
library(jqr)
url <- 'http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json'
download.file(url, (f <- tempfile(fileext = ".json")))
res <- paste0(readLines(f), collapse = " ")
out <- jq(res, '.features[] | select(.properties.NAME == "Kansas")')
You can then map the result easily, like this:
library(leaflet)
leaflet() %>%
addTiles() %>%
addGeoJSON(out) %>%
setView(-98, 38, 6)
library(rjson)
lst=fromJSON(file = 'http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json')
index = which(sapply(lapply(lst$features,"[[",'properties'),'[[','NAME')=='Kansas')
subdata = lst$features[[index]]
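The list-based selection can also be tried without downloading anything. The sketch below uses a tiny hand-written FeatureCollection (hypothetical data, with geometries set to null) and jsonlite rather than rjson, and keeps only the features whose `NAME` property matches:

```r
library(jsonlite)

# Hypothetical stand-in for the downloaded file: two features, no geometry
geo <- '{"type":"FeatureCollection","features":[
  {"type":"Feature","properties":{"NAME":"Kansas","STATE":"20"},"geometry":null},
  {"type":"Feature","properties":{"NAME":"Ohio","STATE":"39"},"geometry":null}
]}'

# simplifyVector = FALSE keeps the nested list structure of the JSON
lst <- fromJSON(geo, simplifyVector = FALSE)

# Keep only the features whose NAME property is "Kansas"
kansas <- Filter(function(f) identical(f$properties$NAME, "Kansas"), lst$features)
length(kansas)
```

`Filter()` returns a list, so this also works when several features match, which is the case when selecting counties by their `STATE` code.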

Trouble spreading values using tidyjson

I am trying to convert the following multi-document JSON file into a data.frame.
x = '[
{"name": "Bob","groupIds": ["kwt6x61", "yiahf43"]},
{"name": "Sally","groupIds": "yiahf43"}
]'
I'm almost there by using
y = x %>% gather_array() %>%
spread_values(
name = jstring("name"),
groupIds = jstring("groupIds")
)
print(y)
Which returns:
document.id array.index name groupIds
1 1 1 Bob list("kwt6x61", "yiahf43")
2 1 2 Sally yiahf43
Can someone help spread the groupIds into additional rows?
This is an interesting problem. The issue stems from the fact that an array of 1 is stored as a string. Otherwise, enter_object('groupIds') %>% gather_array %>% append_values_string would work nicely. tidyjson does not seem to handle this situation nicely. I wonder whether this would even be considered valid JSON, since in one case groupIds is a string, and in another it is an array.
In any case, although this is not an ideal solution, you can use json_types() to illustrate the difference and then conditionally treat each. I converted to a tbl_df (i.e. dropped JSON component) for future processing when done parsing.
library(tidyjson)
library(dplyr)
library(tidyr)
x = '[
{"name": "Bob","groupIds": ["kwt6x61", "yiahf43"]},
{"name": "Sally","groupIds": "yiahf43"}
]'
## Show the different types
z <- x %>% gather_array() %>% spread_values(
name=jstring('name')
) %>% enter_object('groupIds') %>% json_types()
## Conditionally treat each
final <- bind_rows(
z[z$type=='array',] %>% gather_array('id') %>% append_values_string('groupId')
, z[z$type=='string',] %>% append_values_string('groupId') %>% mutate(id=1)
) %>% tbl_df
## Spread them out, maybe? Depends on what you're looking for
final %>% spread('id','groupId')
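As an alternative sketch that sidesteps tidyjson, the mixed string/array field can be normalized with jsonlite and base R. This is not the tidyjson approach above, only a different route to one row per name and group id; the column name `groupId` is chosen for illustration:

```r
library(jsonlite)

x <- '[
{"name": "Bob","groupIds": ["kwt6x61", "yiahf43"]},
{"name": "Sally","groupIds": "yiahf43"}
]'

# Keep the raw nested structure so each document can be handled separately
docs <- fromJSON(x, simplifyVector = FALSE)

# unlist() turns both a single string and a list of strings into a
# character vector; data.frame() then recycles the name across its rows
rows <- lapply(docs, function(d) {
  data.frame(name = d$name, groupId = unlist(d$groupIds),
             stringsAsFactors = FALSE)
})
result <- do.call(rbind, rows)
result
```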

Flatten deep nested json in R

I am trying to use R to convert a nested JSON file into a two dimensional dataframe.
My JSON file has a nested structure. But, the names and properties are the same across levels.
{"name":"A", "value":"1", "c":
[{"name":"a1", "value":"11", "c":
[{"name":"a11", "value":"111"},
{"name":"a12", "value":"112"}]
},
{"name":"a2", "value":"12"}]
}
The desired dataset would look like this. Although the exact column names can be different.
name value c_name c_value c_c_name c_c_value
A 1 a1 11 a11 111
A 1 a1 11 a12 112
A 1 a2 12
The code I have so far flattens the data, but it only seems to work for the first level (see the screenshot of the output).
library(jsonlite)
json_file <- ' {"name":"A", "value":"1", "c":
[{"name":"a1", "value":"11", "c":
[{"name":"a11", "value":"111"},
{"name":"a12", "value":"112"}]
},
{"name":"a2", "value":"12"}]
}'
data <- fromJSON(json_file, flatten = TRUE)
View(data)
I tried multiple packages, including jsonlite and RJSONIO, and I spent the last 5 hours debugging this and trying various online tutorials, but without success. Thanks for your help!
Firstly, that is some ugly JSON; if you have a way of avoiding it, do so. Consequently, what follows is also pretty ugly—to the degree that I normally wouldn't post it, but I am doing so now in the hope that some of the approaches may be of use. If it offends your eyes, let me know and I'll delete it.
library(jsonlite) # for fromJSON
library(reshape2) # for melt
library(dplyr) # for inner_join, select
jlist <- fromJSON(json_file)
jdf <- as.data.frame(jlist)
jdf$c.value <- as.numeric(jdf$c.value) # fix type
jdf$L1 <- as.integer(factor(jdf$c.name)) # for use as a key with an artifact of melt later *urg, sorry*
ccdf <- melt(jdf$c.c) # get nested list into usable form
names(ccdf)[1:2] <- c('c.c.name', 'c.c.value') # fix names so they won't cause problems with the join
df3 <- inner_join(jdf[, -5], ccdf) # join, take out nested column
df3$c.c.value <- as.numeric(df3$c.c.value) # fix type
df3 <- df3 %>% select(-L1, -c) # get rid of useless columns
which leaves you with
> df3
name value c.name c.value c.c.name c.c.value
1 A 1 a1 11 a11 111
2 A 1 a1 11 a12 112
3 A 1 a2 12 <NA> NA
with reasonably sensible types. The packages used are avoidable, if you like.
Is this scalable? Well, not really, without more of the same mess. If anybody else has a less nasty and more scalable approach for dealing with nasty JSON, please post it; I'd be as grateful as the OP.
I think I figured out a way to do this. It seems to work with larger trees. The idea is to unlist the JSON and use the names attribute of the unlisted elements. In this example, if a node has one parent, its name attribute will start with "c."; if it has a parent and a "grandparent", it will start with "c.c.", and so on. The code below uses this structure to find the level of nesting and place each node in the appropriate columns. The rest of the code copies the attributes of the parent nodes down to their children and deletes the extra rows generated. I know it is not elegant, but I thought it might be useful for others.
library(stringr)
library(jsonlite)
json_file <- ' {"name":"A", "value":"1", "c":
[{"name":"a1", "value":"11", "c":
[{"name":"a11", "value":"111"},
{"name":"a12", "value":"112"}]
},
{"name":"a2", "value":"12"}]
}'
nestedjson <- fromJSON(json_file, simplifyVector = F) #read the json
nAttrPerNode <- 2 #number of attributes per node
strChild <- "c." #determines level of nesting
unnestedjson <- unlist(nestedjson) #convert JSON to unlist
unnestednames <- attr(unnestedjson, "names") #get the names of the cells
depthTree <- (max(str_count(unnestednames, strChild)) + 1) * nAttrPerNode #maximum tree depth
htTree <- length(unnestednames) / nAttrPerNode #maximum tree height (number of branches)
X <- array("", c(htTree, depthTree))
for (nodeht in 1:htTree){ #iterate through the branches and place the nodes based on the count of strChild in the name attribute
nodeIndex <- nodeht * nAttrPerNode
nodedepth <- str_count(unnestednames[nodeIndex], strChild) + 1
X[nodeht, nodedepth * nAttrPerNode - 1] <- unnestedjson[nodeIndex - 1]
X[nodeht, nodedepth * nAttrPerNode] <- unnestedjson[nodeIndex]
}
for (nodeht in 2:htTree){ #repeat the parent node attributes for the children
nodedepth <- 0
repeat{
nodedepth <- nodedepth + 1
startcol <- nodedepth * nAttrPerNode - 1
endcol <- startcol + nAttrPerNode - 1
if (X[nodeht, startcol] == "" & nodedepth < depthTree/nAttrPerNode){
X[nodeht, startcol:endcol] <- X[nodeht-1, startcol:endcol]
} else {
break()
}
}
}
deleteRows <- NULL #Finally delete the rows that only have the parent attributes for nodes that have children
strBranches <- apply(X, 1, paste, collapse="")
for (nodeht in 1:(htTree-1)){
branch2sub <- substr(strBranches[nodeht+1], 1, nchar(strBranches[nodeht]))
if (strBranches[nodeht]==branch2sub){
deleteRows <- c(deleteRows, nodeht)
}
}
deleteRows
X <- X[-deleteRows,]
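For comparison, here is a minimal recursive sketch of the same flattening. It assumes every node has exactly the fields `name` and `value`, with children under `c`, as in the question; the helper `flatten_tree` is hypothetical and not part of any package. It collects one path per leaf and pads shorter paths with NA:

```r
library(jsonlite)

json_file <- '{"name":"A", "value":"1", "c":
  [{"name":"a1", "value":"11", "c":
     [{"name":"a11", "value":"111"},
      {"name":"a12", "value":"112"}]
   },
   {"name":"a2", "value":"12"}]
}'

# Return one path (list of name/value pairs, root to leaf) per leaf node
flatten_tree <- function(node, path = list()) {
  here <- c(path, list(c(name = node$name, value = node$value)))
  if (length(node$c) == 0) return(list(here))       # leaf: path is complete
  do.call(c, lapply(node$c, flatten_tree, path = here))
}

tree <- fromJSON(json_file, simplifyVector = FALSE)
paths <- flatten_tree(tree)
depth <- max(lengths(paths))                         # deepest nesting level

# Turn each path into a fixed-width row, padding short paths with NA
rows <- lapply(paths, function(p) {
  vals <- unlist(p, use.names = FALSE)
  c(vals, rep(NA_character_, 2 * depth - length(vals)))
})
flat <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(flat) <- paste0(strrep("c_", rep(seq_len(depth) - 1, each = 2)),
                      c("name", "value"))
flat
```

All columns come out as character here; converting the value columns with `as.numeric()` would be a separate step.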

Saving a JSON object to file.JSON

I've created a JSON object, and I need to be able to share it as a file via email with other collaborators. However, although there are plenty of topics available on handling JSON objects in the R workspace, there are virtually no resources discussing how to actually export a JSON object to a .JSON file.
Here's a simple example:
list1 <- vector(mode="list", length=2)
list1[[1]] <- c("a", "b", "c")
list1[[2]] <- c(1, 2, 3)
exportJson <- toJSON(list1)
## Save the JSON to file
save(exportJson, file="export.JSON")
## Attempt to read in the JSON
library("rjson")
json_data <- fromJSON(file="export.JSON")
The final line, attempting to read in the JSON file, results in an error: "Error in fromJSON(file = "export.JSON") : unexpected character 'R'"
Obviously the save() function is not the way to go, but after extensive googling, I have found nothing that says how to export the JSON to a file. Any help would be greatly appreciated.
You can use write:
library(RJSONIO)
list1 <- vector(mode="list", length=2)
list1[[1]] <- c("a", "b", "c")
list1[[2]] <- c(1, 2, 3)
exportJson <- toJSON(list1)
> exportJson
[1] "[\n [ \"a\", \"b\", \"c\" ],\n[ 1, 2, 3 ] \n]"
write(exportJson, "test.json")
library("rjson")
json_data <- fromJSON(file="test.json")
> json_data
[[1]]
[1] "a" "b" "c"
[[2]]
[1] 1 2 3
There is also the jsonlite package:
library(jsonlite)
exportJSON <- toJSON(list1)
write(exportJSON, "test.json")
list2 <- fromJSON("test.json")
identical(list1, list2)
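jsonlite also provides `write_json()` and `read_json()`, which combine the conversion and the file I/O. The sketch below writes to a temporary file (the path is arbitrary) and compares values rather than calling `identical()`, because the round trip may not reproduce the exact R types; for example, the numbers may come back as integers inside nested lists:

```r
library(jsonlite)

list1 <- list(c("a", "b", "c"), c(1, 2, 3))

# Serialize straight to a file; any writable path would do
path <- tempfile(fileext = ".json")
write_json(list1, path)

# read_json() does not simplify by default, so each array is a nested list
list2 <- read_json(path)

# Compare the values after flattening each element back to a vector
unlist(list2[[1]])
unlist(list2[[2]])
```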

Substring in Data Frame R

I have data from a GPS log like this (each row of one data frame column holds a string like the following):
{"mAccuracy":20.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":21677339000000,"mExtras":{"networkLocationSource":"cached","networkLocationType":"wifi","noGPSLocation":{"mAccuracy":20.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":21677339000000,"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811956,"mLongitude":126.9104909,"mProvider":"network","mSpeed":0.0,"mTime":1402801381486},"travelState":"stationary"},"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811956,"mLongitude":126.9104909,"mProvider":"network","mSpeed":0.0,"mTime":1402801381486,"timestamp":1402801665.512}
The problem is that I only need the latitude and longitude values, so I thought I could use substring with sapply to apply it to all the rows in the data frame.
But I am not sure this is a good approach: with something like substr("abcdef", 2, 4), I would need to count how many characters there are from the beginning up to "mLatitude". Can anybody suggest a fast way to process this?
Thank you to @mnel for answering the question. It works, but I still have a problem.
From mnel's answer I've created a function like this:
fgps <- function(x) {
out <- fromJSON(x)
c(out$mExtras$noGPSLocation$mLatitude,
out$mExtras$noGPSLocation$mLongitude)
}
and then this is my data :
gpsdata <- head(dfallgps[,4],2)
[1] "{\"mAccuracy\":23.128,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":76437488000000,\"mExtras\":{\"networkLocationSource\":\"cached\",\"networkLocationType\":\"wifi\",\"noGPSLocation\":{\"mAccuracy\":23.128,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":76437488000000,\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1779956,\"mLongitude\":126.9089661,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894224187},\"travelState\":\"stationary\"},\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1779956,\"mLongitude\":126.9089661,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894224187,\"timestamp\":1402894517.425}"
[2] "{\"mAccuracy\":1625.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":77069916000000,\"mExtras\":{\"networkLocationSource\":\"cached\",\"networkLocationType\":\"cell\",\"noGPSLocation\":{\"mAccuracy\":1625.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":77069916000000,\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811881,\"mLongitude\":126.9084072,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894857416},\"travelState\":\"stationary\"},\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811881,\"mLongitude\":126.9084072,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894857416,\"timestamp\":1402894857.519}"
When I run sapply, why does the result still show the input data and not just the resulting values?
sapply(gpsdata, function(gpsdata) fgps(gpsdata))
{"mAccuracy":23.128,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":76437488000000,"mExtras":{"networkLocationSource":"cached","networkLocationType":"wifi","noGPSLocation":{"mAccuracy":23.128,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":76437488000000,"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1779956,"mLongitude":126.9089661,"mProvider":"network","mSpeed":0.0,"mTime":1402894224187},"travelState":"stationary"},"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1779956,"mLongitude":126.9089661,"mProvider":"network","mSpeed":0.0,"mTime":1402894224187,"timestamp":1402894517.425}
[1,] 35.178
[2,] 126.909
{"mAccuracy":1625.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":77069916000000,"mExtras":{"networkLocationSource":"cached","networkLocationType":"cell","noGPSLocation":{"mAccuracy":1625.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":77069916000000,"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811881,"mLongitude":126.9084072,"mProvider":"network","mSpeed":0.0,"mTime":1402894857416},"travelState":"stationary"},"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811881,"mLongitude":126.9084072,"mProvider":"network","mSpeed":0.0,"mTime":1402894857416,"timestamp":1402894857.519}
[1,] 35.18119
[2,] 126.90841
I want the result to look like:
[1] 35.178 126.909
[2] 35.18119 126.90841
Thank you
It would appear that your data is in JSON format. Therefore, use RJSONIO::fromJSON to parse it.
E.g.:
txt <- "{\"mAccuracy\":20.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":21677339000000,\"mExtras\":{\"networkLocationSource\":\"cached\",\"networkLocationType\":\"wifi\",\"noGPSLocation\":{\"mAccuracy\":20.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":21677339000000,\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811956,\"mLongitude\":126.9104909,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402801381486},\"travelState\":\"stationary\"},\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811956,\"mLongitude\":126.9104909,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402801381486,\"timestamp\":1402801665.512}"
Then process:
library(RJSONIO)
out <- fromJSON(txt)
out$mLongitude
#[1] 126.9105
out$mLatitude
#[1] 35.1812
# to process multiple values
tt <- rep(txt,2)
myData <- lapply(tt, fromJSON)
latlong <- do.call(rbind,lapply(myData, `[` ,c('mLatitude','mLongitude')))
# or using rbind list
library(data.table)
latlong <- rbindlist(lapply(myData, `[` ,c('mLatitude','mLongitude')))
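On the follow-up question about `sapply()`: when its input is a character vector, `sapply()` uses the input strings as names by default, which is why the full JSON text appears as column headers. A minimal sketch of one way around this, using jsonlite and two shortened, hypothetical records that keep only the two fields of interest:

```r
library(jsonlite)

# Shortened, hypothetical records; the real strings carry many more fields
gpsdata <- c(
  '{"mLatitude":35.1779956,"mLongitude":126.9089661}',
  '{"mLatitude":35.1811881,"mLongitude":126.9084072}'
)

# Parse one record and return its coordinates as a named numeric vector
fgps <- function(x) {
  out <- fromJSON(x)
  c(lat = out$mLatitude, lon = out$mLongitude)
}

# USE.NAMES = FALSE stops sapply() from labelling columns with the JSON
# strings; t() turns the 2 x n result into one row per record
latlong <- t(sapply(gpsdata, fgps, USE.NAMES = FALSE))
latlong
```

The same pattern should apply to the nested `mExtras$noGPSLocation` fields used in the question's `fgps()`.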