"NA" in JSON file translates to NA logical - json

I have json files with data for countries. One of the files has the following data:
"[{\"count\":1,\"subject\":{\"name\":\"Namibia\",\"alpha2\":\"NA\"}}]"
I use the following code to convert the JSON into a data.frame with the jsonlite package:
df <- as.data.frame(fromJSON(jsonfile, flatten = TRUE))
I was expecting a data.frame with numbers and strings:
count subject.name subject.alpha2
1 Namibia "NA"
Instead, the alpha2 code "NA" is automatically converted into a logical NA, and this is what I get:
str(df)
$ count : int 1
$ subject.name : chr "Namibia"
$ subject.alpha2: logi NA
I want alpha2 to be a string, not logical. How do I fix this?

That particular implementation of fromJSON (three different packages provide a function with that name) has a simplifyVector argument that appears to prevent the coercion:
require(jsonlite)
> as.data.frame( fromJSON(test, simplifyVector=FALSE ) )
count subject.name subject.alpha2
1 1 Namibia NA
> str( as.data.frame( fromJSON(test, simplifyVector=FALSE ) ) )
'data.frame': 1 obs. of 3 variables:
$ count : int 1
$ subject.name : Factor w/ 1 level "Namibia": 1
$ subject.alpha2: Factor w/ 1 level "NA": 1
> str( as.data.frame( fromJSON(test, simplifyVector=FALSE ) ,stringsAsFactors=FALSE) )
'data.frame': 1 obs. of 3 variables:
$ count : int 1
$ subject.name : chr "Namibia"
$ subject.alpha2: chr "NA"
I tried seeing if that option worked well with the flatten argument, but was disappointed:
> str( fromJSON(test, simplifyVector=FALSE, flatten=TRUE) )
List of 1
$ :List of 2
..$ count : int 1
..$ subject:List of 2
.. ..$ name : chr "Namibia"
.. ..$ alpha2: chr "NA"
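Because simplifyVector = FALSE never builds a data frame, flatten has nothing to act on. One workaround is to flatten the records by hand. A minimal sketch, assuming every record has exactly the count, subject.name and subject.alpha2 fields from the question (the column names are hard-coded for this data):

```r
library(jsonlite)

# The record from the question, as a plain JSON string
txt <- '[{"count":1,"subject":{"name":"Namibia","alpha2":"NA"}}]'

# Without simplification, fromJSON returns nested lists and leaves "NA" as a string
recs <- fromJSON(txt, simplifyVector = FALSE)

# Build one data-frame row per record; field names are specific to this data
df <- do.call(rbind, lapply(recs, function(r) {
  data.frame(count            = r$count,
             subject.name     = r$subject$name,
             subject.alpha2   = r$subject$alpha2,
             stringsAsFactors = FALSE)
}))
str(df)
```

The trade-off is that every field must be listed explicitly, so this only scales to records with a small, known shape.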

The accepted answer did not solve my use case.
However, rjson::fromJSON does this naturally and, to my surprise, was 10 times faster on my data.

Related

Transform list cell in data frame into rows

Sorry, I have no reproducible code; I can only provide a picture of the data.
A data frame with Facebook Insights data, prepared from JSON, contains a column "values" whose cells are lists. For the next step I need only one value per cell, so row 3 in the picture should be split into two rows (holding either the list content or the value directly):
post_story_adds_by_action_type_unique lifetime list(like = 38)
post_story_adds_by_action_type_unique lifetime list(share = 11)
If a list cell holds three or more values, it should produce that many single-value rows.
Do you know how to do it?
I use this code to get the json and data frame:
i <- fromJSON(post.request.url)
i <- as.data.frame(i$insights$data)
Edit:
There will be no deeper nesting, just this one level.
The list is not needed in the result; I just need the values and their names.
Let's assume you're starting with something that looks like this:
mydf <- data.frame(a = c("A", "B", "C", "D"), period = "lifetime")
mydf$values <- list(list(value = 42), list(value = 5),
list(value = list(like = 38, share = 11)),
list(value = list(like = 38, share = 13)))
str(mydf)
## 'data.frame': 4 obs. of 3 variables:
## $ a : Factor w/ 4 levels "A","B","C","D": 1 2 3 4
## $ period: Factor w/ 1 level "lifetime": 1 1 1 1
## $ values:List of 4
## ..$ :List of 1
## .. ..$ value: num 42
## ..$ :List of 1
## .. ..$ value: num 5
## ..$ :List of 1
## .. ..$ value:List of 2
## .. .. ..$ like : num 38
## .. .. ..$ share: num 11
## ..$ :List of 1
## .. ..$ value:List of 2
## .. .. ..$ like : num 38
## .. .. ..$ share: num 13
## NULL
Instead of retaining lists in your output, I would suggest flattening out the data, perhaps using a function like this:
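library(data.table)

# Expand a list column into one row per leaf value
myFun <- function(indt, col) {
  if (!is.data.table(indt)) indt <- as.data.table(indt)
  other_names <- setdiff(names(indt), col)
  list_col <- indt[[col]]
  # Number of leaf values per cell = number of output rows per input row
  rep_out <- sapply(list_col, function(x) length(unlist(x, use.names = FALSE)))
  flat <- {
    # Name the cells so the row index becomes the first name component
    if (is.null(names(list_col))) names(list_col) <- seq_along(list_col)
    # Split unlisted names such as "3.value.like" into columns V1, V2, V3
    setDT(tstrsplit(names(unlist(list_col)), ".", fixed = TRUE))[
      , val := unlist(list_col, use.names = FALSE)][]
  }
  # Repeat each original row, drop the list column, attach the flat values
  cbind(indt[rep(1:nrow(indt), rep_out)][, (col) := NULL], flat)
}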
Here's what it does with the "mydf" I shared:
myFun(mydf, "values")
## a period V1 V2 V3 val
## 1: A lifetime 1 value NA 42
## 2: B lifetime 2 value NA 5
## 3: C lifetime 3 value like 38
## 4: C lifetime 3 value share 11
## 5: D lifetime 4 value like 38
## 6: D lifetime 4 value share 13
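If you would rather avoid data.table and, as the question's edit asks, keep only the values and their names, a base-R sketch along the same lines could look like this. It assumes the same mydf with one level of nesting; the name and val column names are arbitrary choices:

```r
# Rebuild the example data from the answer above
mydf <- data.frame(a = c("A", "B", "C", "D"), period = "lifetime")
mydf$values <- list(list(value = 42), list(value = 5),
                    list(value = list(like = 38, share = 11)),
                    list(value = list(like = 38, share = 13)))

# unlist() each cell into a named numeric vector, e.g. c(value.like = 38, ...)
flat <- lapply(mydf$values, unlist)
n <- lengths(flat)  # how many rows each original row expands to

out <- data.frame(
  mydf[rep(seq_len(nrow(mydf)), n), c("a", "period")],
  name = unlist(lapply(flat, names), use.names = FALSE),
  val  = unlist(flat, use.names = FALSE),
  row.names = NULL,
  stringsAsFactors = FALSE
)
out
```

Here the nested names stay joined ("value.like") instead of being split into V1 to V3 columns.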

Disappearing values when data from json file got imported into R

The problem:
I have a JSON file with 20000 lines, which are basically web logs, each representing a specific user's activity. I want to create a data frame in R to work with this data. Here is an example of one JSON line (picked at random):
{"_type":"verifiedProductDetail","ts":1431820984214,"did":"7cd80696-4ede-49e4-a267-b887e684de32","profileId":"33021589-c159-4ec6-8c22-c0e5d9b600d9","preferenceIds":[],"price":115.0,"itemId":"10645","category":"/Binnenverlichting/Wandlampen","currency":1,"language":1,"name":"Wandlamp Linea 60 aluminium","url":"http://www.shop1.be/pagea/wandlampen.html_be","imageUrl":"http://vhetnevnejk.cloudfont.net/media/catalog/product/cache/7/thumbnail/450x/9df78eab33525dcdehl6e5fb8d27136e95/i/m/image_14583/Wandlamp.jpg","id":"871d275a-c856-4280-9cbd-f163b9f749e7","product":{"_id":"625363f4-0d80-3ff5-b091-174de3f9c9b2","domainId":"7cd80696-4ede-49e4-a267-b887e684de32","created":1427806290512,"updated":1436870460905,"itemId":"10645","prices":{"4":299.99,"1":69.99,"2":69.99,"5":299.99},"ratings":{"4":{"rate":1.0,"count":1,"created":1433447796660,"lan":4},"1":{"rate":0.9,"count":2,"created":1434355924529,"lan":1}},"categories":[{"language":3,"text":" Destockage","created":1427820384334},{"language":2,"text":" Outlet","created":1427883890399},{"language":1,"text":"/Binnenverlichting/Wandlampen","created":1431545171151},{"language":6,"text":" Outlet","created":1427876074772},{"language":4,"text":" Outlet","created":1427901573250},{"language":4,"text":" Beleuchtung nach Raum","created":1427827783211},{"language":11,"text":" Outlet","created":1427809161244}],"names":[{"language":3,"text":"Applique murale Linea 60cm en aluminium","created":1427820384334},{"language":2,"text":"Wall Lamp Linea 60 Aluminium","created":1427826729309},{"language":1,"text":"Wandlamp Linea 60 aluminium","created":1435695901730},{"language":6,"text":"Aplique de pared LINEA 60 aluminio ","created":1427819228360},{"language":11,"text":"Kinkiet Linea 60 aluminium","created":1427806290512},{"language":4,"text":"Wandleuchte Linea 60 Aluminium","created":1436870460905}],"imageUrl":"hhttp://vhetnevnejk.cloudfont.net/media/catalog/product/cache/7/thumbnail/450x/9df78eab335evwnrf5fb8d27136e95/i/m/image_14083/LineaWandlamp.jpg","url":"http://www.lampyiswiatlo.pl/kinkiet-linea.html","overwritePrinciples":{},"sku":"10645","stock":-1},"preferences":[]}
Here is what I did in R:
install.packages("rjson")
library("rjson")
SampleFile <- "filesample.json"
json_data <- fromJSON(paste(readLines(SampleFile), collapse=""))
str(json_data)
summary(json_data)
This reads the file into R and shows the extracted variables:
> str(json_data)
List of 18
$ _type : chr "verifiedProductDetail"
$ ts : num 1.43e+12
$ did : chr "7cd80696-4ede-49e4-a267-b887e684de32"
$ profileId : chr "8be1a552-9124-453d-a0aa-7124c99b56c6"
$ preferenceIds: list()
$ price : num 26.9
$ itemId : chr "9858"
$ category : chr ""
$ currency : num 1
$ language : num 6
$ name : chr "up Weiss"
$ profile :List of 13
..$ _id : chr "8be1a552-9124-453d-a0aa-7124c99b56c6"
..$ created : num 1.43e+12
..$ updated : num 1.43e+12
[and others]
My issue: as you can see, every variable has length 1, meaning that each variable holds only one value (from the first entry in the JSON file). All the other values have disappeared. This is easier to see with the summary() function:
> summary(json_data)
Length Class Mode
_type 1 -none- character
ts 1 -none- numeric
did 1 -none- character
profileId 1 -none- character
preferenceIds 0 -none- list
price 1 -none- numeric
itemId 1 -none- character
category 1 -none- character
currency 1 -none- numeric
language 1 -none- numeric
name 1 -none- character
url 1 -none- character
imageUrl 1 -none- character
id 1 -none- character
profile 13 -none- list
product 14 -none- list
group 10 -none- list
preferences 0 -none- list
Summary: what is wrong with my code that makes it keep only the first value of each variable, while all the others disappear?
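A likely explanation, assuming each of the 20000 lines is a separate JSON object (newline-delimited JSON): paste(..., collapse = "") glues the objects into one string that is not a single valid JSON document, and rjson's fromJSON appears to stop after the first complete object. Parsing line by line avoids this. A sketch with jsonlite, using two shortened, hypothetical lines in place of filesample.json:

```r
library(jsonlite)

# Two shortened, hypothetical log lines; the real file has one object per line
lines <- c(
  '{"_type":"verifiedProductDetail","ts":1431820984214,"price":115.0,"itemId":"10645"}',
  '{"_type":"verifiedProductDetail","ts":1431820990000,"price":26.9,"itemId":"9858"}'
)

# Option 1: parse each line on its own instead of pasting them together
records <- lapply(lines, fromJSON)
length(records)

# Option 2: stream_in() reads newline-delimited JSON into a data frame
df <- stream_in(textConnection(lines), verbose = FALSE)
df[, c("itemId", "price")]
```

For the real file, something like stream_in(file("filesample.json")) should read every line; nested objects such as product will likely come back as nested columns, which jsonlite::flatten() can expand.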

How do I write a json array from R that has a sequence of lat and long?

I would like to write:
[[[1,2],[3,4],[5,6]]]
the best I can do is:
toJSON(matrix(1:6, ncol = 2, byrow = T))
#"[ [ 1, 2 ],\n[ 3, 4 ],\n[ 5, 6 ] ]"
How can I wrap the thing in another array (the json kind)?
I need this so I can write files in GeoJSON format as a LineString.
I usually start by parsing the target JSON with fromJSON to see which R object produces it:
ll <- fromJSON('[[[1,2],[3,4],[5,6]]]')
str(ll)
List of 1
$ :List of 3
..$ : num [1:2] 1 2
..$ : num [1:2] 3 4
..$ : num [1:2] 5 6
So we need to create a list containing one unnamed list whose elements are 2-element vectors:
xx <- list(setNames(split(1:6,rep(1:3,each=2)),NULL))
identical(toJSON(xx),'[[[1,2],[3,4],[5,6]]]')
[1] TRUE
If you have a matrix
m1 <- matrix(1:6, ncol=2, byrow=TRUE)
maybe this helps:
library(rjson)
paste0("[",toJSON(setNames(split(m1, row(m1)),NULL)),"]")
#[1] "[[[1,2],[3,4],[5,6]]]"
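For comparison, a jsonlite sketch: jsonlite writes matrices row by row by default, so wrapping m1 in an unnamed list is enough to get the extra outer array. Note that in the GeoJSON specification (RFC 7946) a LineString's coordinates are an array of positions, i.e. only two levels of nesting, with each position ordered as [longitude, latitude]; the three-level form corresponds to a Polygon or MultiLineString. The geometry object below is an illustration under that reading:

```r
library(jsonlite)

m1 <- matrix(1:6, ncol = 2, byrow = TRUE)  # one row per point

# Rows become [x, y] arrays; the unnamed list adds the outer array
nested <- toJSON(list(m1))
nested
# [[[1,2],[3,4],[5,6]]]

# A GeoJSON geometry; auto_unbox writes "LineString" as a scalar, not an array
geo <- toJSON(list(type = "LineString", coordinates = m1), auto_unbox = TRUE)
geo
```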

Get only specific object within json in a data frame

I would like to import a single object from a JSON file into an R data frame. Normally I use fromJSON() from the jsonlite package, but now I want to load only the object called plays into a data frame.
If I use:
library(jsonlite)
df <- fromJSON("http://live.nhl.com/GameData/20132014/2013020555/PlayByPlay.json")
It gives a data frame containing all the objects. Is there a way to load only the plays object into the data frame, or should I load the complete JSON and restructure it within R?
That does return a data frame, although it's a mangled mixture of lists and data frames. If you use a different package, such as RJSONIO, it is just a list. Using str(df) (warning: long output):
library(RJSONIO)
df <- fromJSON("http://live.nhl.com/GameData/20132014/2013020555/PlayByPlay.json")
str(df)
#------------
List of 1
$ data:List of 2
..$ refreshInterval: num 0
..$ game :List of 7
.. ..$ awayteamid : num 24
.. ..$ awayteamname: chr "Anaheim Ducks"
.. ..$ hometeamname: chr "Washington Capitals"
.. ..$ plays :List of 1
.. .. ..$ play:List of 102
.. .. .. ..$ :List of 28
-----------Output truncated----------------
This shows that the plays portion can be obtained with:
plays_out <- df$data$game$plays
I do not see that there is any advantage in trying to parse this yourself. Most of the "volume" of data is in the plays component.
When I use jsonlite::fromJSON, I get a structure that is different enough that I need a different call to get the plays items:
> str(df )
'data.frame': 1 obs. of 2 variables:
$ refreshInterval:List of 1
..$ data: num 0
$ game :'data.frame': 1 obs. of 7 variables:
..$ awayteamid :List of 1
.. ..$ data: num 24
..$ awayteamname:List of 1
.. ..$ data: chr "Anaheim Ducks"
..$ hometeamname:List of 1
.. ..$ data: chr "Washington Capitals"
..$ plays :'data.frame': 1 obs. of 1 variable:
.. ..$ play:List of 1
.. .. ..$ data:'data.frame': 102 obs. of 29 variables:
.. .. .. ..$ aoi :List of 102
.. .. .. .. ..$ : num 8470612 8470621 8473933 8473972 8475151 ...
.. .. .. .. ..$ : num 8459442 8467332 8467400 8471476 8471699 ...
.. .. .. .. ..$ : num 8459442 8467332 8467400 8471476 8471699 ...
.. .. .. .. ..$ : num 8459442 8467332 8467400 8471476 8471699 ...
#------snipped output------------
> length(df$game$plays)
[1] 1
> length(df$game$plays$play)
[1] 1
> length(df$game$plays$play$data)
[1] 29
I think I prefer the result from RJSONIO::fromJSON, since it doesn't add the complexity of dataframe coercion.
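Loading everything and then keeping only the subtree is straightforward. A sketch using a hypothetical miniature of the file's data → game → plays → play path (the original URL may no longer resolve); with a current jsonlite, the play array of objects comes back as a plain data frame, which may differ from the older output shown above:

```r
library(jsonlite)

# Hypothetical miniature keeping only the nesting seen in the str() output
txt <- '{"data":{"refreshInterval":0,"game":{"awayteamid":24,
  "plays":{"play":[{"id":1,"type":"Faceoff"},{"id":2,"type":"Shot"}]}}}}'

parsed <- fromJSON(txt)                # the whole document is parsed first
plays  <- parsed$data$game$plays$play  # then only the plays subtree is kept
str(plays)
```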

Split and transform an R character string into numerical vector

I want to convert the following JSON and put the values into a data frame. It almost works, but as.data.frame() puts everything into one row.
require(rjson)
require(RCurl)
y = getURI(url1)
y
[1] "[{\"close\":5.45836392962902,\"highest\":5.45837200714172,\"lowest\":5.45836392962902,\"open\":5.45837200714172,\"start_time\":\"2012-01-29T18:29:24-08:00\"},{\"close\":5.45837200714172,\"highest\":5.45837200714172,\"lowest\":5.45834791002201,\"open\":5.45835598753471,\"start_time\":\"2012-01-29T18:28:24-08:00\"}]"
x = fromJSON(y)
> str(x)
List of 2
$ :List of 5
..$ close : num 5.46
..$ highest : num 5.46
..$ lowest : num 5.46
..$ open : num 5.46
..$ start_time: chr "2012-01-29T18:29:24-08:00"
$ :List of 5
..$ close : num 5.46
..$ highest : num 5.46
..$ lowest : num 5.46
..$ open : num 5.46
..$ start_time: chr "2012-01-29T18:28:24-08:00"
as.data.frame(x)
close highest lowest open start_time close.1 highest.1 lowest.1 open.1 start_time.1
1 5.458364 5.458372 5.458364 5.458372 2012-01-29T18:29:24-08:00 5.458372 5.458372 5.458348 5.458356 2012-01-29T18:28:24-08:00
Instead of one row, I want two rows:
close highest lowest open start_time
1 5.458364 5.458372 5.458364 5.458372 2012-01-29T18:29:24-08:00
2 5.458372 5.458372 5.458348 5.458356 2012-01-29T18:28:24-08:00
Is there something I can specify in as.data.frame for this to work?
EDIT:
do.call(rbind,lapply(x,as.data.frame))
The above coerces it into a data frame, but the timestamp column is a factor with two levels. This next part has its own question.
x = do.call(rbind,lapply(x,as.data.frame))
str(x)
'data.frame': 2 obs. of 5 variables:
$ close : num 5.46 5.46
$ highest : num 5.47 5.46
$ lowest : num 5.46 5.46
$ open : num 5.46 5.46
$ start_time: Factor w/ 2 levels "2012-01-29T21:48:24-05:00",..: 1 2
If I try to convert it to POSIXct, I get:
x$start_time = as.POSIXct(x$start_time)
x$start_time
[1] "2012-01-29 CST" "2012-01-29 CST"
But the time of day is lost.
You might try:
do.call(rbind,lapply(x,as.data.frame))
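Two things seem to be going on: stringsAsFactors turns start_time into a factor, and as.POSIXct() without a format most likely fails to match the "T" separator and falls back to a date-only format, which drops the time. A base-R sketch, rebuilding x from the two records in the question:

```r
# The two records from the question, in the list shape rjson returns
x <- list(
  list(close = 5.45836392962902, highest = 5.45837200714172,
       lowest = 5.45836392962902, open = 5.45837200714172,
       start_time = "2012-01-29T18:29:24-08:00"),
  list(close = 5.45837200714172, highest = 5.45837200714172,
       lowest = 5.45834791002201, open = 5.45835598753471,
       start_time = "2012-01-29T18:28:24-08:00")
)

# One row per record; stringsAsFactors = FALSE keeps start_time as character
df <- do.call(rbind, lapply(x, as.data.frame, stringsAsFactors = FALSE))

# %z expects the offset without a colon, so rewrite "-08:00" as "-0800",
# then parse with an explicit format that includes the "T" separator
iso <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", df$start_time)
df$start_time <- as.POSIXct(iso, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
df
```

The times are stored as UTC instants; pass a different tz if you want them displayed in local time.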