I have a periodic process in R that yields a data.frame.
I want to use this data.frame to create a dropdown selector with AngularJS.
My final data.frame will look more or less as follows (my real example may have a deeper hierarchical structure):
DF <- data.frame(hie1 = c(rep("Cl1", 2), "Cl2"),
                 hie2 = c("Cl1op1", "Cl1op2", "Cl2op1"),
                 hie3 = c("/first.html", "/second.html", "/third.html"))
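For reference, this prints as:
  hie1   hie2         hie3
1  Cl1 Cl1op1  /first.html
2  Cl1 Cl1op2 /second.html
3  Cl2 Cl2op1  /third.html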
I need to convert that data.frame into JSON with the following structure:
{
  "Cl1": {"Cl1op1": "/first.html", "Cl1op2": "/second.html"},
  "Cl2": {"Cl2op1": "/third.html"}
}
So far, I have tried the toJSON functions of the rjson and RJSONIO packages on the data.frame, both with and without column names:
library(rjson)
# library(RJSONIO)
DF2 <- DF
colnames(DF2) <- NULL
cat(toJSON(DF))
cat(toJSON(DF2))
I thought about using reshape2's dcast function before using toJSON, but I do not know what kind of structure I need to achieve my goal.
I also tried the toJSON2 and toJSONArray functions from the rCharts package, with no success.
Is there an appropriate transformation in R to get the output I am looking for?
P.S. (I do not mind having [] instead of {})
EDIT:
I have created a couple of functions (included below) to fulfil my needs.
However, they are not very clean, and I believe there must be a better way to perform this transformation in R.
I am keeping this question open in the hope of a better solution.
# paste each pair of values as 'key':'value'
linktwo <- function(V) {
  paste0(sapply(V, function(x) paste0("'", toString(x), "'")), collapse = ":")
}

# recursively paste a hierarchy of columns into a JSON-like string
pastehier <- function(DF) {
  if (ncol(DF) == 2) {
    return(paste0(apply(DF, 1, linktwo), collapse = ","))
  } else {
    u <- unique(DF[, 1])
    output <- character()
    for (i in u) {
      output <- append(output, paste0(paste0("'", i, "'"), ":{",
                                      pastehier(DF[DF[, 1] == i, -1]), "}"))
    }
    return(paste0(output, collapse = ","))
  }
}
pastehier(DF)
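For comparison, here is a more compact sketch for the two-level case, assuming the jsonlite package (a deeper hierarchy would need a recursive version of the same split):
library(jsonlite)
nested <- lapply(split(DF, DF$hie1),
                 function(d) as.list(setNames(as.character(d$hie3), as.character(d$hie2))))
cat(toJSON(nested, auto_unbox = TRUE))
# {"Cl1":{"Cl1op1":"/first.html","Cl1op2":"/second.html"},"Cl2":{"Cl2op1":"/third.html"}}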
I do not fully understand your request, and maybe my solution is useless, but here is an attempt:
library(reshape2)
library(jsonlite)  # assumed here for toJSON(pretty = TRUE); RJSONIO's toJSON also accepts a pretty argument
prova <- dcast(DF, hie1 ~ ...)
toJSON(prova, pretty = TRUE)
[
{
"hie1": "Cl1",
"Cl1op1": "/first.html",
"Cl1op2": "/second.html"
},
{
"hie1": "Cl2",
"Clop1": "/third.html"
}
]
where:
> prova
  hie1      Cl1op1       Cl1op2      Cl2op1
1  Cl1 /first.html /second.html        <NA>
2  Cl2        <NA>         <NA> /third.html
I am trying to get a JSON response from an API:
library(httr)  # GET(), add_headers(), and content() are from httr
test <- GET(url, add_headers(`api_key` = key))
content(test, 'parsed')
When I run content(test, 'parsed'), I get the following error:
# Error: lexical error: invalid string in json text. .Note: Final passage of the "fiscal cliff bill" on January 1
I think this is because of the double quotations. How can I replace the double quotes, or, if that is not the problem, how can I fix this issue?
Thanks!
So I had run into a similar problem before, and I had intended to write a quick function to use Jeroen's fix to try to repair the JSON. Since I intended to do it anyway, here's a quick hack attempt.
NB: repairing a structured format like this is speculative at best and most certainly prone to errors. The good news is that I tried to keep this specific enough so that it will not produce false results: it'll either fix what it knows it can, or fail. The "unit-testing" really needs to check other corner-cases. If you find something that this does not fix (and should) or that this breaks (gasp!), please comment!
fix_json_quotes <- function(s) {
  if (length(s) != 1) {
    warning("the argument has length > 1 and only the first element will be used")
    s <- s[[1]]
  }
  stopifnot(is.character(s))
  val <- jsonlite::validate(s)
  while (!val) {
    # validate() reports the offset of the first invalid character
    ind <- attr(val, "offset") - 1
    # escape the offending quote just before the failure point
    snew <- gsub("(.*)(['\"])([[:space:],]*)$", "\\1\\\\\\2\\3", substr(s, 1, ind))
    if (snew != substr(s, 1, ind)) {
      s <- paste0(snew, substr(s, ind + 1, nchar(s)))
    } else {
      break  # nothing fixable found; give up
    }
    val <- jsonlite::validate(s)
  }
  if (!val) {
    # still not validating
    stop("unable to fix quotes")
  }
  return(s)
}
Some sample data, unit-testing if you will (testthat is not required for use of the function):
library(testthat)
library(jsonlite)  # toJSON() and validate() below come from jsonlite
lst <- list(a = "final \"cliff bill\" on")
json <- as.character(toJSON(lst))
json
# [1] "{\"a\":[\"final \\\"cliff bill\\\" on\"]}"
Okay, there should be no change:
expect_equal(json, fix_json_quotes(json))
Some bad data:
# un-escape the double quotes
badlst <- "{\"a\":[\"final \"cliff bill\" on\"]}"
expect_error(jsonlite::fromJSON(badlst))
expect_equal(json, fix_json_quotes(badlst))
PS: this looks specifically for quotes, nothing more. However, I believe there are related errors that this might also be able to fix. I "left room" for this in the second group within the regex ((['\"])); as written, that group already allows a single quote as well as a double quote, in case single quotes could also cause a problem. I don't know if it's useful or even necessary.
I have a big JSON file containing 18 fields, some of which contain further subfields. I read the file into R in the following way:
library(jsonlite)  # assumed; rjson and RJSONIO also provide fromJSON()
json_file <- "daily_profiles_Bnzai_20150914_20150915_20150914.json"
data <- fromJSON(sprintf("[%s]", paste(readLines(json_file), collapse=",")))
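(As an aside, wrapping the lines in [%s] treats the file as newline-delimited JSON; if the file really is one JSON document per line, jsonlite's stream_in() reads that format directly:)
# hedged alternative, assuming one JSON document per line
data_df <- jsonlite::stream_in(file(json_file))  # binds the documents into a data frame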
This gives me a giant list with all the fields contained in the JSON file. I want to turn it into a data.frame and do some operations along the way. For example, if I do:
doc_length <- data.frame(t(apply(as.data.frame(data$doc_lenght_map), 1, unlist)))
os <- data.frame(t(apply(as.data.frame(data$operating_system), 1, unlist)))
navigation <- as.data.frame(data$navigation)
monday <- data.frame(t(apply(navigation[,grep("Monday",names(data$navigation))],1,unlist)))
Monday <- data.frame(apply(monday, 1, sum))
it works fine: I get what I want, with all the right subfields, and I then want to join the results in a final data.frame that I will use for other operations.
Now, I'd like to do something like that for the subset of fields on which I don't need to do operations, so that, for example, the days of the week contained in navigation are not included. I'd like to have something like this (suppose I have a data.frame df):
for(name in names(data))
{
  df <- cbind(df, data.frame(t(apply(as.data.frame(data$name), 1, unlist))))
}
The above loop gives me errors. What I want is a way to access all the fields of the list automatically, as in the loop, where the iterator "name" takes on each field of the list in turn, without my having to refer to the fields individually before doing operations with them. I even tried
for(name in names(data))
{
  df <- cbind(df, data.frame(t(apply(as.data.frame(data[name]), 1, unlist))))
}
but it doesn't take all of the subfields. I also tried with
data[, name]
but it doesn't work either. So I think I need to use the "$" operator.
Is it possible to do something like that?
Thanks a lot!
Davide
Like the other commenters, I am confused, but I will throw this out to see if it might point you in the right direction.
# make mtcars a list, as an example of a field list
data <- lapply(mtcars, identity)

# column-bind one data.frame per field, iterating over the names
do.call(
  cbind,
  lapply(
    names(data),
    function(name) {
      data.frame(data[name])
    }
  )
)
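As a side note on the failing loop in the question: data$name looks for a field literally called "name", whereas data[[name]] looks up the field whose name is currently stored in the loop variable. A minimal variant of the question's loop under that assumption (df must already exist):
for (name in names(data)) {
  df <- cbind(df, data.frame(t(apply(as.data.frame(data[[name]]), 1, unlist))))
}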
I'm trying to get LaTeX or HTML output of the regression results of a VGAM model (in the example below it's a generalized ordinal logit), but the packages I know for this purpose do not work with a vglm object.
Here is a little toy example with the error messages I'm getting:
library(VGAM)
n <- 1000
x <- rnorm(n)
y <- ordered(rbinom(n, 3, prob = 0.5))
ologit <- vglm(y ~ x,
               family = cumulative(parallel = FALSE, reverse = TRUE),
               model = TRUE)
library(stargazer)
stargazer(ologit)
Error in objects[[i]]$zelig.call : $ operator not defined for this S4 class
library(texreg)
htmlreg(ologit)
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘extract’ for signature ‘"vglm"’
library(memisc)
mtable(ologit)
Error in UseMethod("getSummary") : no applicable method for 'getSummary' applied to an object of class "c('vglm', 'vlm', 'vlmsmall')"
I just had the same problem. My first workaround is to run the ordinal logit regression with the polr function of the MASS package. The resulting objects are easily visualized/summarized by the usual packages (I recommend sjPlot's tab_model function for the table output!).
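A minimal sketch of that first workaround, reusing y and x from the toy example above (stargazer supports polr objects out of the box; the output file name is illustrative):
library(MASS)
ologit_polr <- polr(y ~ x, Hess = TRUE)  # Hess = TRUE keeps the Hessian so SEs can be computed
library(stargazer)
stargazer(ologit_polr, type = "html", out = "ologit.html")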
The second option is to craft your own table, which you can then turn into a neat HTML object via stargazer.
For this you need to know that S4 objects are not subsettable in the same manner as conventional objects (http://adv-r.had.co.nz/Subsetting.html). The most straightforward solution is to extract the relevant slots with an @ instead of a $ symbol:
sumobject <- summaryvglm(yourvglmobject)
stargazer(sumobject@coef3, type = "html", out = "RegDoc.doc")
A little cumbersome, but it did the trick for me. Hope this helps!
I have a JSON file, "adjFloatTest.data". In R, I read the field "Volume" from that file using the following code:
library(rjson)  # assumed; RJSONIO and jsonlite also provide fromJSON()
json <- fromJSON("adjFloatTest.data")
volume <- json$volume
The value of volume is as follows:
> volume
$AAPL
$AAPL[[1]]
1980-12-12
16751200
$AAPL[[2]]
1980-12-15
100424081
$AAPL[[3]]
1980-12-16
0.1177374
$AAPL[[4]]
1980-12-17
7164476
$AAPL[[5]]
1980-12-18
5364366
Each element corresponds to a company, a date, and a value. I want to store the dates in a list. How is that possible?
This will give you the list of dates:
sapply(volume,names)
The following should work:
sapply(volume, function(x) lapply(x, "[[", 1))
but a reproducible example that can be copied and pasted would be helpful.
If the above doesn't work, please use something like dput(volume[1:2]) to offer some workable sample data.
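Assuming the printed structure above (each element of volume$AAPL is a length-one vector whose name is the date), the dates per company can also be pulled out of the names:
# assumes each element is a one-element vector named by its date
dates <- lapply(volume, function(x) vapply(x, names, character(1)))
dates$AAPL
# [1] "1980-12-12" "1980-12-15" "1980-12-16" "1980-12-17" "1980-12-18"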
This question is about a generic mechanism for converting any collection of non-cyclical homogeneous or heterogeneous data structures into a dataframe. This can be particularly useful when dealing with the ingestion of many JSON documents or with a large JSON document that is an array of dictionaries.
There are several SO questions that deal with manipulating deeply nested JSON structures and turning them into dataframes using functionality such as plyr, lapply, etc. All the questions and answers I have found are about specific cases as opposed to offering a general approach for dealing with collections of complex JSON data structures.
In Python and Ruby I've been well-served by implementing a generic data structure flattening utility that uses the path to a leaf node in a data structure as the name of the value at that node in the flattened data structure. For example, the value my_data[['x']][[2]][['y']] would appear as result[['x.2.y']].
If one has a collection of these data structures that may not be entirely homogeneous the key to doing a successful flattening to a dataframe would be to discover the names of all possible dataframe columns, e.g., by taking the union of all keys/names of the values in the individually flattened data structures.
This seems like a common pattern and so I'm wondering whether someone has already built this for R. If not, I'll build it but, given R's unique promise-based data structures, I'd appreciate advice on an implementation approach that minimizes heap thrashing.
Hi @Sim, I had cause to reflect on your problem yesterday. Define:
flatten <- function(x) {
  # build the dotted path names up front, then collapse the list level by level
  dumnames <- unlist(getnames(x, T))
  dumnames <- gsub("(*.)\\.1", "\\1", dumnames)
  repeat {
    x <- do.call(.Primitive("c"), x)
    if (!any(vapply(x, is.list, logical(1)))) {
      names(x) <- dumnames
      return(x)
    }
  }
}
getnames <- function(x, recursive) {
  nametree <- function(x, parent_name, depth) {
    if (length(x) == 0)
      return(character(0))
    x_names <- names(x)
    if (is.null(x_names)) {
      # unnamed elements are named by their position
      x_names <- seq_along(x)
      x_names <- paste(parent_name, x_names, sep = "")
    } else {
      x_names[x_names == ""] <- seq_along(x)[x_names == ""]
      x_names <- paste(parent_name, x_names, sep = "")
    }
    if (!is.list(x) || (!recursive && depth >= 1L))
      return(x_names)
    x_names <- paste(x_names, ".", sep = "")
    lapply(seq_len(length(x)), function(i)
      nametree(x[[i]], x_names[i], depth + 1L))
  }
  nametree(x, "", 0L)
}
(getnames is adapted from AnnotationDbi:::make.name.tree)
(flatten is adapted from discussion here How to flatten a list to a list without coercion?)
As a simple example:
my_data<-list(x=list(1,list(1,2,y='e'),3))
> my_data[['x']][[2]][['y']]
[1] "e"
> out<-flatten(my_data)
> out
$x.1
[1] 1
$x.2.1
[1] 1
$x.2.2
[1] 2
$x.2.y
[1] "e"
$x.3
[1] 3
> out[['x.2.y']]
[1] "e"
So the result is a flattened list with roughly the naming structure you suggest. Coercion is also avoided, which is a plus.
A more complicated example:
library(RJSONIO)
library(RCurl)
json.data<-getURL("http://www.reddit.com/r/leagueoflegends/.json")
dumdata<-fromJSON(json.data)
out<-flatten(dumdata)
UPDATE
A naive way to remove a trailing .1:
my_data <- list(x = list(1, list(1, 2, y = 'e'), 3))
> gsub("(*.)\\.1", "\\1", unlist(getnames(my_data, T)))
[1] "x.1"   "x.2.1" "x.2.2" "x.2.y" "x.3"
R has two packages for dealing with JSON input: rjson and RJSONIO. If I understand correctly what you mean by "collection of non-cyclical homogeneous or heterogeneous data structures", I think either of these packages will import that sort of structure as a list.
You can then flatten that list (into a vector) using the unlist function.
If the list is suitably structured (a non-nested list where each element is the same length), then as.data.frame provides an alternative for converting the list to a data frame.
An example:
(my_data <- list(x = list('1' = 1, '2' = list(y = 2))))
unlist(my_data)
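For reference, the names of the unlisted vector are built from the nested paths:
# unlist(my_data) yields
#   x.1 x.2.y
#     1     2
and a non-nested list of equal-length elements converts directly:
as.data.frame(list(a = 1:2, b = 3:4))
#   a b
# 1 1 3
# 2 2 4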
The jsonlite package is a fork of RJSONIO specifically designed to make conversion between JSON and data frames easier. You don't provide any example JSON data, but I think this might be what you are looking for. Have a look at this blog post or the vignette.
Great answer with the flatten and getnames functions. It took a few minutes to figure out all the options needed to get from a vector of JSON strings to a data.frame, so I thought I'd record that here. Suppose jsonvec is a vector of JSON strings. The following builds a data.frame (a data.table, in fact) with one row per string, where each column corresponds to a different possible leaf node of the JSON tree. Any string missing a particular leaf node is filled with NA.
library(data.table)
library(jsonlite)
parsed <- lapply(jsonvec, fromJSON, simplifyVector = FALSE)
flattened <- lapply(parsed, flatten)  # using flatten from the accepted answer
d <- rbindlist(flattened, fill = TRUE)
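For instance, with two heterogeneous documents (jsonvec here is made up for illustration), d ends up with one column per distinct leaf path, NA-filled where a document lacks that leaf:
jsonvec <- c('{"a": 1, "b": {"c": 2}}', '{"a": 3, "d": 4}')
# running the three lines above then gives a two-row data.table whose
# columns follow flatten's path naming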
I'm now a big fan of simply:
library(jsonlite)
library(tidyverse)
fromJSON("file_path.json") %>%
unlist() %>%
enframe()
And then potentially, depending on your data, piping that into
%>%
pivot_wider()
Once it's in a flat table shape, the tidyverse (and other R libraries more generally) offers a load of tools for wrangling things around, e.g. for dealing with columns that share a prefix (which the above pipeline will produce, since the parent name of a nested JSON chunk is prefixed onto each child's name).
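A tiny worked illustration of that pipeline (the inline JSON string is made up for the demonstration):
library(jsonlite)
library(tidyverse)
'{"x": {"a": 1, "b": 2}}' %>%
  fromJSON() %>%
  unlist() %>%
  enframe() %>%
  pivot_wider()
# a 1-row tibble with columns x.a and x.b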