Split and transform an R character string into a numerical vector - json

I want to convert the following JSON and put the values into a data frame. It almost works, but as.data.frame() puts everything into one row.
require(rjson)
require(RCurl)
y = getURI(url1)
y
[1] "[{\"close\":5.45836392962902,\"highest\":5.45837200714172,\"lowest\":5.45836392962902,\"open\":5.45837200714172,\"start_time\":\"2012-01-29T18:29:24-08:00\"},{\"close\":5.45837200714172,\"highest\":5.45837200714172,\"lowest\":5.45834791002201,\"open\":5.45835598753471,\"start_time\":\"2012-01-29T18:28:24-08:00\"}]"
x = fromJSON(y)
> str(x)
List of 2
$ :List of 5
..$ close : num 5.46
..$ highest : num 5.46
..$ lowest : num 5.46
..$ open : num 5.46
..$ start_time: chr "2012-01-29T18:29:24-08:00"
$ :List of 5
..$ close : num 5.46
..$ highest : num 5.46
..$ lowest : num 5.46
..$ open : num 5.46
..$ start_time: chr "2012-01-29T18:28:24-08:00"
as.data.frame(x)
close highest lowest open start_time close.1 highest.1 lowest.1 open.1 start_time.1
1 5.458364 5.458372 5.458364 5.458372 2012-01-29T18:29:24-08:00 5.458372 5.458372 5.458348 5.458356 2012-01-29T18:28:24-08:00
Instead of one row, I want them in two rows:
close highest lowest open start_time
1 5.458364 5.458372 5.458364 5.458372 2012-01-29T18:29:24-08:00
2 5.458372 5.458372 5.458348 5.458356 2012-01-29T18:28:24-08:00
Is there something I can specify in as.data.frame() for this to work?
EDIT:
do.call(rbind,lapply(x,as.data.frame))
The above coerces it into a data frame, but the timestamp column is a factor with two levels. This next part has its own separate question.
x = do.call(rbind,lapply(x,as.data.frame))
str(x)
'data.frame': 2 obs. of 5 variables:
$ close : num 5.46 5.46
$ highest : num 5.47 5.46
$ lowest : num 5.46 5.46
$ open : num 5.46 5.46
$ start_time: Factor w/ 2 levels "2012-01-29T21:48:24-05:00",..: 1 2
If I try to convert it to POSIXct, I get
x$start_time = as.POSIXct(x$start_time)
x$start_time
[1] "2012-01-29 CST" "2012-01-29 CST"
But it loses the time data.

You might try:
do.call(rbind,lapply(x,as.data.frame))
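A base-R sketch of both steps (rows, then timestamps), assuming the two records from the question are typed in by hand so that no rjson or network call is needed, and assuming UTC is an acceptable time zone for the result; the names `df` and `iso` are illustrative. The time of day is lost because `as.POSIXct()` without a `format` tries a list of default formats, none of which expects the `T` separator, and so falls back to a date-only format. Passing the format explicitly avoids that; the colon is removed from the offset first because `%z` is documented for offsets written like `-0800`.

```r
# Stand-in for x <- fromJSON(y): the two records from the question,
# typed in by hand so the sketch runs without rjson or network access.
x <- list(
  list(close = 5.45836392962902, highest = 5.45837200714172,
       lowest = 5.45836392962902, open = 5.45837200714172,
       start_time = "2012-01-29T18:29:24-08:00"),
  list(close = 5.45837200714172, highest = 5.45837200714172,
       lowest = 5.45834791002201, open = 5.45835598753471,
       start_time = "2012-01-29T18:28:24-08:00")
)

# One data frame row per record; keep the timestamps as character, not factor
df <- do.call(rbind, lapply(x, as.data.frame, stringsAsFactors = FALSE))

# Turn the offset "-08:00" into "-0800" so that %z can read it
iso <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", df$start_time)

# Parse date, time and offset explicitly; results are expressed in UTC
df$start_time <- as.POSIXct(iso, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")

str(df)
```

Passing a different `tz` to `as.POSIXct()` (or to `format()` when printing) displays the same instants in local time instead.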

Related

Convergence issue using a LMM with random intercepts and slopes per patient, and with a continuous AR1 correlation structure

I can't provide a reproducible example (the example I built converged, but my data do not). My data are in long format and include 104 patients, on whom 304 measurements were taken. Most patients have only 2 measurements (n=68), followed by 5 (n=32), 4 (n=3), 3 (n=1), and 6 measurements (n=1). Some repeated measurements were taken fairly shortly after one another (for example, for 74 of the 304 measurements the interval was ~0.5 years), whereas for some patients with only two measurements the interval is 5 years. The average follow-up time is about 3.5 years.
So far I've fitted a random-intercept, random-slope model:
library(nlme)
ris <-
lme(fixed=outcome~ 1 + fu_time + age*fu_time + sex*fu_time + smoking*fu_time +
obesity*fu_time + diab*fu_time + hypt*fu_time + hyperchol*fu_time +
ckd*fu_time,
random=~1 + fu_time|patid,
data=data,
na.action="na.omit",
method="ML")
To additionally account for the varying time between successive measurements while keeping the variance increasing over time, I tried a model with (1) a random intercept, (2) a continuous AR1 correlation structure, and (3) heterogeneous variances:
ri_cAR1_hetvar <-
update(ris,
random=~1|patid,
correlation=corCAR1(form=~1|patid),
weights=varIdent(form=~1|fu_time),
control=lmeControl(opt="optim")) # does not converge.
This model does not converge. Following Ben Bolker's earlier advice, here is some output from running debug(nlme:::logLik.reStruct), followed by these commands:
### Running str(object)
List of 1
$ patid: 'pdLogChol' num 0.0752
..- attr(*, "formula")=Class 'formula' language ~1
.. .. ..- attr(*, ".Environment")=<environment: 0x000000001ec31628>
..- attr(*, "Dimnames")=List of 2
.. ..$ : chr "(Intercept)"
.. ..$ : chr "(Intercept)"
- attr(*, "settings")= int [1:4] 0 1 0 4
- attr(*, "class")= chr "reStruct"
- attr(*, "plen")= Named int 1
..- attr(*, "names")= chr "patid"
### Running str(conLin)
List of 5
$ Xy : num [1:314, 1:22] 1 0.816 0.816 0.816 0.816 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:314] "1" "2" "3" "4" ...
.. ..$ : chr [1:22] "(Intercept)" "(Intercept)" "fu_time" "age" ...
$ dims :List of 15
..$ N : int 314
..$ ZXrows : int 314
..$ ZXcols : num 22
..$ Q : int 1
..$ StrRows: num 125
..$ qvec : Named num [1:3] 1 0 0
.. ..- attr(*, "names")= chr [1:3] "patid" "" ""
..$ ngrps : Named int [1:3] 104 1 1
.. ..- attr(*, "names")= chr [1:3] "patid" "X" "y"
..$ DmOff : Named num [1:3] 0 1 401
.. ..- attr(*, "names")= chr [1:3] "" "patid" ""
..$ ncol : Named num [1:3] 1 20 1
.. ..- attr(*, "names")= chr [1:3] "patid" "" ""
..$ nrot : Named num [1:3] 21 1 0
.. ..- attr(*, "names")= chr [1:3] "" "" ""
..$ ZXoff :List of 3
.. ..$ patid: num [1:104] 0 6 8 13 15 18 23 28 30 32 ...
.. ..$ X : Named num 314
.. .. ..- attr(*, "names")= chr "patid"
.. ..$ y : Named num 6594
.. .. ..- attr(*, "names")= chr ""
..$ ZXlen :List of 3
.. ..$ patid: num [1:104] 6 2 5 2 3 5 5 2 2 2 ...
.. ..$ X : num 314
.. ..$ y : num 314
..$ SToff :List of 3
.. ..$ patid: num [1:104] 0 1 2 3 4 5 6 7 8 9 ...
.. ..$ X : Named num 229
.. .. ..- attr(*, "names")= chr "patid"
.. ..$ y : Named num 2749
.. .. ..- attr(*, "names")= chr ""
..$ DecOff :List of 3
.. ..$ PXE_nr: num [1:104] 0 1 2 3 4 5 6 7 8 9 ...
.. ..$ X : Named num 125
.. .. ..- attr(*, "names")= chr "patid"
.. ..$ y : Named num 2625
.. .. ..- attr(*, "names")= chr ""
..$ DecLen :List of 3
.. ..$ PXE_nr: num [1:104] 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ X : num 125
.. ..$ y : num 125
$ logLik :Class 'logLik' : 4.3 (df=178)
$ sigma : num 0
$ auxSigma: num 0
### Running dput(object)
structure(list(patid= structure(-1.29387589179439, formula = ~1, Dimnames = list(
"(Intercept)", "(Intercept)"), class = c("pdLogChol", "pdSymm",
"pdMat"))), settings = c(0L, 1L, 0L, 4L), class = "reStruct", plen = c(patid= 1L))
I understand this question might be difficult to answer without a reproducible example, but posting any more of my data is not an option, and the model works on the subsets of my data that I prepared for this question.
If anyone has any ideas how I could further diagnose this problem, the help would be very much appreciated!
Update
I've noticed that the problem arises when I specify different variances per week by adding weight=varIdent(form=~1|fu_time), not from adding the continuous AR1 structure itself (that works fine). Could the problem be the number of parameters estimated in the model with heterogeneous variances? I assumed that model still estimates only phi, but now I suspect it estimates a separate parameter for every time point, and in my case fu_time is a continuous variable with a lot of distinct values...
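To check the suspicion in the update, here is a small base-R sketch that counts how many variance parameters each specification implies. The follow-up times and break points are made up for illustration. In nlme, `varIdent(form = ~ 1 | fu_time)` allows a different variance for each level of the stratification variable, so with a continuous `fu_time` it estimates roughly one variance ratio per distinct time value, not a single extra parameter. A single-parameter variance function of the covariate, such as `varExp(form = ~ fu_time)`, or a `varIdent` on a binned version of time are possible alternatives; whether either suits these data is a modelling judgement.

```r
# Hypothetical follow-up times in years; in the question fu_time is
# continuous, so nearly every measurement has its own value.
fu_time <- c(0, 0.52, 3.1, 0, 4.9, 0, 0.48, 1.03, 2.2, 0, 5.0)

# varIdent(form = ~ 1 | fu_time) treats each distinct fu_time value as a
# stratum and estimates one variance ratio per stratum, relative to a
# reference stratum.
n_strata <- length(unique(fu_time))
n_varident_params <- n_strata - 1

# Coarser strata keep that count small, e.g. binned follow-up time
# (these break points are illustrative only).
fu_bin <- cut(fu_time, breaks = c(-Inf, 0.25, 1.5, 3.5, Inf))
n_binned_params <- nlevels(fu_bin) - 1

# A variance function such as varExp(form = ~ fu_time) would instead add
# one parameter, e.g. update(ris, ..., weights = varExp(form = ~ fu_time)).

cat("varIdent on raw fu_time:", n_varident_params, "variance parameters\n")
cat("varIdent on binned time:", n_binned_params, "variance parameters\n")
```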

Extracting data from list in R

library(RCurl)
library(rjson)
json <- getURL('https://extraction.import.io/query/runtime/17d882b5-c118-4f27-8ce1-90085ec0b116?_apikey=d5a8a01e20174e95887dc0f385e4e3f6d7ef5ca1428d5a029f2aa352509948ade8e5d7fb0dc941f4769a32b541ca6b38a7cd6578dfd81b357fbc4f2e008f5154f1dbfcff31878798fa887b70b1ff59dd&url=http%3A%2F%2Fwww.numbeo.com%2Fcost-of-living%2Fcompare_cities.jsp%3Fcountry1%3DSingapore%26country2%3DAustralia%26city1%3DSingapore%26city2%3DMelbourne')
obj <- fromJSON(json)
I would like to get the data into nice columns, but many levels of the list are unnamed. Any idea how to organise the data?
Check out this difference, and let me know what you think. This is what your object looks like:
library(RCurl)
library(rjson)
json <- getURL('https://extraction.import.io/query/runtime/17d882b5-c118-4f27-8ce1-90085ec0b116?_apikey=d5a8a01e20174e95887dc0f385e4e3f6d7ef5ca1428d5a029f2aa352509948ade8e5d7fb0dc941f4769a32b541ca6b38a7cd6578dfd81b357fbc4f2e008f5154f1dbfcff31878798fa887b70b1ff59dd&url=http%3A%2F%2Fwww.numbeo.com%2Fcost-of-living%2Fcompare_cities.jsp%3Fcountry1%3DSingapore%26country2%3DAustralia%26city1%3DSingapore%26city2%3DMelbourne')
obj <- rjson::fromJSON(json)
str(obj)
List of 2
$ extractorData:List of 3
..$ url : chr "http://www.numbeo.com/cost-of-living/compare_cities.jsp?country1=Singapore&country2=Australia&city1=Singapore&city2=Melbourne"
..$ resourceId: chr "b1250747011ee774e7c881617c86a5a9"
..$ data :List of 1
.. ..$ :List of 1
.. .. ..$ group:List of 52
.. .. .. ..$ :List of 6
.. .. .. .. ..$ COL VALUE :List of 1
.. .. .. .. .. ..$ :List of 1
.. .. .. .. .. .. ..$ text: chr "Meal, Inexpensive Restaurant"
Indeed, there are a lot of intermediate lists in there that you don't need. Now try the fromJSON function from the jsonlite package:
library(jsonlite)
obj2 <- jsonlite::fromJSON(json)
str(obj2)
List of 2
$ extractorData:List of 3
..$ url : chr "http://www.numbeo.com/cost-of-living/compare_cities.jsp?country1=Singapore&country2=Australia&city1=Singapore&city2=Melbourne"
..$ resourceId: chr "b1250747011ee774e7c881617c86a5a9"
..$ data :'data.frame': 1 obs. of 1 variable:
.. ..$ group:List of 1
.. .. ..$ :'data.frame': 52 obs. of 6 variables:
.. .. .. ..$ COL VALUE :List of 52
.. .. .. .. ..$ :'data.frame': 1 obs. of 1 variable:
.. .. .. .. .. ..$ text: chr "Meal, Inexpensive Restaurant"
.. .. .. .. ..$ :'data.frame': 1 obs. of 1 variable:
.. .. .. .. .. ..$ text: chr "Meal for 2 People, Mid-range Restaurant, Three-course"
.. .. .. .. ..$ :'data.frame': 1 obs. of 1 variable:
Still, this JSON just isn't pretty; we'll need to fix that.
I take it you want that data frame in there. So start with
df <- obj2$extractorData$data$group[[1]]
and there's your data frame. The problem is that every single cell here is wrapped in a list, including the NULL values, and you can't simply unlist() those: the NULLs disappear, and the columns that contained them end up shorter than the others.
Edit: Here's how to handle the columns with list(NULL) values.
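# Replace list(NULL) cells with NA so that unlist() keeps one value per row
df[sapply(df[,2],is.null),2] <- NA
df[sapply(df[,3],is.null),3] <- NA
df[sapply(df[,4],is.null),4] <- NA
df[sapply(df[,5],is.null),5] <- NA
# Flatten every list column and rebuild a data frame
df2 <- as.data.frame(sapply(df, unlist))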
It can be written more elegantly for sure, but this'll get you going and it's understandable.
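One possible generalisation, as a base-R sketch: instead of fixing columns 2 to 5 by hand, map NULL cells to NA in every list column and then flatten. The toy data frame below only imitates the structure described above; its column names and the second and third item labels are hypothetical, and the helper assumes each non-NULL cell holds exactly one value (a one-row data frame with a `text` field, as in the real data, also flattens to one value).

```r
# Toy stand-in for obj2$extractorData$data$group[[1]]: list columns with
# some NULL cells. Column names and items 2-3 are hypothetical.
df <- data.frame(id = 1:3)
df$item  <- list("Meal, Inexpensive Restaurant", "item 2", "item 3")
df$price <- list(12.5, NULL, 3.2)

# Turn one list column into an atomic vector, mapping NULL cells to NA so
# the column keeps exactly one entry per row. Non-list columns pass through.
null_to_na <- function(col) {
  if (!is.list(col)) return(col)
  unlist(lapply(col, function(cell) if (is.null(cell)) NA else cell),
         use.names = FALSE)
}

flat <- as.data.frame(lapply(df, null_to_na), stringsAsFactors = FALSE)
str(flat)
```

Working column by column keeps each column's own type (numbers stay numeric), whereas `sapply(df, unlist)` first builds a matrix, which forces all columns to one common type.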

Transform list cell in data frame into rows

I'm sorry I have no code to reproduce this; I can only provide a picture. Please see it below.
A data frame with Facebook Insights data prepared from JSON contains a column "values" whose cells are lists. For the next step I need only one value per cell in that column. So row 3 in the picture should be transformed into two rows (holding either the list content or the value directly):
post_story_adds_by_action_type_unique lifetime list(like = 38)
post_story_adds_by_action_type_unique lifetime list(share = 11)
If a list cell in the data frame holds three or more values, it should become three or more single-value rows.
Do you know how to do it?
I use this code to get the json and data frame:
i <- fromJSON(post.request.url)
i <- as.data.frame(i$insights$data)
Edit:
There will be no deeper nesting, just this one level.
The list is not needed in the result, I need just the values and their names.
Let's assume you're starting with something that looks like this:
mydf <- data.frame(a = c("A", "B", "C", "D"), period = "lifetime")
mydf$values <- list(list(value = 42), list(value = 5),
list(value = list(like = 38, share = 11)),
list(value = list(like = 38, share = 13)))
str(mydf)
## 'data.frame': 4 obs. of 3 variables:
## $ a : Factor w/ 4 levels "A","B","C","D": 1 2 3 4
## $ period: Factor w/ 1 level "lifetime": 1 1 1 1
## $ values:List of 4
## ..$ :List of 1
## .. ..$ value: num 42
## ..$ :List of 1
## .. ..$ value: num 5
## ..$ :List of 1
## .. ..$ value:List of 2
## .. .. ..$ like : num 38
## .. .. ..$ share: num 11
## ..$ :List of 1
## .. ..$ value:List of 2
## .. .. ..$ like : num 38
## .. .. ..$ share: num 13
## NULL
Instead of retaining lists in your output, I would suggest flattening out the data, perhaps using a function like this:
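library(data.table)  # myFun relies on data.table (setDT, tstrsplit, :=)

myFun <- function(indt, col) {
  if (!is.data.table(indt)) indt <- as.data.table(indt)
  other_names <- setdiff(names(indt), col)
  list_col <- indt[[col]]
  # number of leaf values per list cell = number of output rows per input row
  rep_out <- sapply(list_col, function(x) length(unlist(x, use.names = FALSE)))
  flat <- {
    if (is.null(names(list_col))) names(list_col) <- seq_along(list_col)
    # split the unlisted names (e.g. "3.value.like") into columns V1, V2, ...
    # and attach the leaf values as column "val"
    setDT(tstrsplit(names(unlist(list_col)), ".", fixed = TRUE))[
      , val := unlist(list_col, use.names = FALSE)][]
  }
  # repeat each original row rep_out times, drop the list column, add the flat part
  cbind(indt[rep(1:nrow(indt), rep_out)][, (col) := NULL], flat)
}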
Here's what it does with the "mydf" I shared:
myFun(mydf, "values")
## a period V1 V2 V3 val
## 1: A lifetime 1 value NA 42
## 2: B lifetime 2 value NA 5
## 3: C lifetime 3 value like 38
## 4: C lifetime 3 value share 11
## 5: D lifetime 4 value like 38
## 6: D lifetime 4 value share 13
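If data.table is not an option, a base-R sketch of the same idea (one output row per leaf value, keeping the value's name) could look like the following. The helper `expand_list_col` and the output columns `key` and `val` are hypothetical names; the helper assumes every list cell contains at least one value.

```r
# Same toy input as "mydf" above
mydf <- data.frame(a = c("A", "B", "C", "D"), period = "lifetime",
                   stringsAsFactors = FALSE)
mydf$values <- list(list(value = 42), list(value = 5),
                    list(value = list(like = 38, share = 11)),
                    list(value = list(like = 38, share = 13)))

expand_list_col <- function(df, col) {
  keep <- setdiff(names(df), col)
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    # unlist() flattens the cell and joins nested names with ".",
    # e.g. c(value.like = 38, value.share = 11)
    vals <- unlist(df[[col]][[i]])
    # repeat row i once per value and attach the names and the values
    data.frame(df[rep(i, length(vals)), keep, drop = FALSE],
               key = names(vals), val = unname(vals),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

long <- expand_list_col(mydf, "values")
print(long)
```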

"NA" in JSON file translates to NA logical

I have json files with data for countries. One of the files has the following data:
"[{\"count\":1,\"subject\":{\"name\":\"Namibia\",\"alpha2\":\"NA\"}}]"
I use the following code to convert the JSON into a data.frame with the jsonlite package:
df = as.data.frame(fromJSON(jsonfile, flatten=TRUE))
I was expecting a data.frame with numbers and strings:
  count subject.name subject.alpha2
1     1      Namibia           "NA"
Instead, the alpha2 code "NA" is automatically converted into a logical NA, and this is what I get:
str(df)
$ count : int 1
$ subject.name : chr "Namibia"
$ subject.alpha2: logi NA
I want alpha2 to be a string, not logical. How do I fix this?
That particular implementation of fromJSON (three different packages have a function with that name) has a simplifyVector argument, which appears to prevent the coercion:
require(jsonlite)
# test holds the JSON string from the question
> as.data.frame( fromJSON(test, simplifyVector=FALSE ) )
count subject.name subject.alpha2
1 1 Namibia NA
> str( as.data.frame( fromJSON(test, simplifyVector=FALSE ) ) )
'data.frame': 1 obs. of 3 variables:
$ count : int 1
$ subject.name : Factor w/ 1 level "Namibia": 1
$ subject.alpha2: Factor w/ 1 level "NA": 1
> str( as.data.frame( fromJSON(test, simplifyVector=FALSE ) ,stringsAsFactors=FALSE) )
'data.frame': 1 obs. of 3 variables:
$ count : int 1
$ subject.name : chr "Namibia"
$ subject.alpha2: chr "NA"
I tried seeing if that option worked well with the flatten argument, but was disappointed:
> str( fromJSON(test, simplifyVector=FALSE, flatten=TRUE) )
List of 1
$ :List of 2
..$ count : int 1
..$ subject:List of 2
.. ..$ name : chr "Namibia"
.. ..$ alpha2: chr "NA"
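Since `simplifyVector = FALSE` keeps "NA" as a string, one option is to build the data frame from that nested list yourself. A base-R sketch, assuming the list looks like the `str()` output just above (it is typed in by hand here, so jsonlite is not needed to run it); `records` is a hypothetical name. `vapply()` fixes each column's type explicitly, so the string "NA" goes in as character. If `count` came back as a double rather than an integer, `numeric(1)` would be needed instead of `integer(1)`.

```r
# Hand-built stand-in for fromJSON(test, simplifyVector = FALSE):
# a list of records, each with a nested "subject" list.
records <- list(list(count = 1L,
                     subject = list(name = "Namibia", alpha2 = "NA")))

# Pull each field out explicitly; vapply() checks the type of every value,
# so alpha2 stays the character string "NA".
df <- data.frame(
  count          = vapply(records, function(r) r$count, integer(1)),
  subject.name   = vapply(records, function(r) r$subject$name, character(1)),
  subject.alpha2 = vapply(records, function(r) r$subject$alpha2, character(1)),
  stringsAsFactors = FALSE
)
str(df)
```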
The accepted answer did not solve my use case.
However, rjson::fromJSON keeps "NA" as a string without any extra arguments and, to my surprise, is 10 times faster on my data.

Get only specific object within json in a data frame

I would like to import a single object from a JSON file into an R data frame. Normally I use fromJSON() from the jsonlite package. However, now I want to load only the object called plays into a data frame.
If I use:
library(jsonlite)
df <- fromJSON("http://live.nhl.com/GameData/20132014/2013020555/PlayByPlay.json")
It gives a data frame containing all the objects. Is there a way to only load the plays object in the data frame? Or should I just load the complete json and restructure this within R?
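That does return a data frame, although it's a somewhat mangled mixture of list and data frame. If you use a different package, such as RJSONIO, it is just a list. Using str(df) (warning: long output):
library(RJSONIO)
df <- fromJSON(paste(readLines("http://live.nhl.com/GameData/20132014/2013020555/PlayByPlay.json", warn = FALSE), collapse = ""))
str(df)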
#------------
List of 1
$ data:List of 2
..$ refreshInterval: num 0
..$ game :List of 7
.. ..$ awayteamid : num 24
.. ..$ awayteamname: chr "Anaheim Ducks"
.. ..$ hometeamname: chr "Washington Capitals"
.. ..$ plays :List of 1
.. .. ..$ play:List of 102
.. .. .. ..$ :List of 28
-----------Output truncated----------------
This shows that the plays portion can be obtained with:
plays_out <- df$data$game$plays
I do not see that there is any advantage in trying to parse this yourself. Most of the "volume" of data is in the plays component.
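A base-R sketch of pulling the plays out of such a nested list and turning them into a data frame. The miniature object below is hypothetical: it keeps only two plays, and the fields `period` and `type` are made up (the real plays have many more fields). It assumes every play has the same scalar fields. `[[` with a character vector indexes recursively, which is handy when the path is stored in a variable.

```r
# Hypothetical miniature of the RJSONIO result: a plain nested list
df <- list(data = list(
  refreshInterval = 0,
  game = list(
    awayteamname = "Anaheim Ducks",
    hometeamname = "Washington Capitals",
    plays = list(play = list(
      list(period = 1, type = "Faceoff"),   # hypothetical fields
      list(period = 1, type = "Shot")
    ))
  )
))

# Recursive indexing: same as df$data$game$plays$play
path  <- c("data", "game", "plays", "play")
plays <- df[[path]]

# One row per play, assuming every play has the same scalar fields
plays_df <- do.call(rbind, lapply(plays, as.data.frame,
                                  stringsAsFactors = FALSE))
print(plays_df)
```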
When I use jsonlite::fromJSON, I get a structure that is different enough that I need a different call to get at the plays items:
> str(df )
'data.frame': 1 obs. of 2 variables:
$ refreshInterval:List of 1
..$ data: num 0
$ game :'data.frame': 1 obs. of 7 variables:
..$ awayteamid :List of 1
.. ..$ data: num 24
..$ awayteamname:List of 1
.. ..$ data: chr "Anaheim Ducks"
..$ hometeamname:List of 1
.. ..$ data: chr "Washington Capitals"
..$ plays :'data.frame': 1 obs. of 1 variable:
.. ..$ play:List of 1
.. .. ..$ data:'data.frame': 102 obs. of 29 variables:
.. .. .. ..$ aoi :List of 102
.. .. .. .. ..$ : num 8470612 8470621 8473933 8473972 8475151 ...
.. .. .. .. ..$ : num 8459442 8467332 8467400 8471476 8471699 ...
.. .. .. .. ..$ : num 8459442 8467332 8467400 8471476 8471699 ...
.. .. .. .. ..$ : num 8459442 8467332 8467400 8471476 8471699 ...
#------snipped output------------
> length(df$game$plays)
[1] 1
> length(df$game$plays$play)
[1] 1
> length(df$game$plays$play$data)
[1] 29
I think I prefer the result from RJSONIO::fromJSON, since it doesn't add the complexity of dataframe coercion.