I have a large data file filled up of key value pairs. The key is an ID and the value is a huge json object. I have been trying to convert this data file to a df in R, by importing the data as a 2 column table and then converting the value to a data frame.
I keep getting this error, even after I validated my json.
Error: lexical error: invalid string in json text.
[{ f: { SEQNUM: [ 455043, 455044,
(right here) ------^
below is my code
part00013 <- read.table(PatientData, sep = '\t', header = F, as.is = T)
colnames(part00013) <- c('k','v')
make_indexDateLists <- function(x) {
# x['v'] <- lapply(x['v'], function(y) as.character(y))
# x['v'] <- lapply(x['v'], function(y) substr(y,1, nchar(y)-1 ))
# x['v'] <- lapply(x['v'], function(y) substr(y,2,nchar(y)))
x["v"] <- lapply(as.character(x["v"]), function(y) jsonlite::fromJSON(y,simplifyVector = T))
#do assignpatienttocohorts
x["v"] <- lapply(x["v"], function(y) RJSONIO::toJSON(y))
cbind(x$k, x$v)
}
make_indexDateLists(part00013)
and here is a sample file https://drive.google.com/open?id=0B6hKduYaYwdJQ3BwbUpNSW9EZk0
It's invalid JSON, but you can turn it into valid JSON:
library(stringi)
library(jsonlite)
library(tidyverse)
tmp <- readLines("oneline_part00013")
parts <- stri_split_fixed(tmp, "\t", 2)[[1]]
fromJSON(parts[2], flatten = FALSE) %>%
glimpse()
## Observations: 1
## Variables: 7
## $ f <data.frame> 455043, 455044, 455045, 455046, 455047, 455048, 45504...
## $ s <data.frame> 246549, 246550, 246551, 246552, 246553, 246554, 24655...
## $ i <data.frame> 8224, 8788, 770102, 30, 10, 30, 3301, 3301, 3301, 192...
## $ d <data.frame> 1114386, 1114387, 1114388, 1114389, 1114390, 1114391,...
## $ o <data.frame> 162072527, 162072528, 162072529, 162072530, 162072531...
## $ t <data.frame> 408352, 408353, 408354, 408355, 408356, 408357, 40835...
## $ a <data.frame> 36527, 42259, 35562, 42458, 39119, 30, 10, 30, 20, 30...
flatten = TRUE will un-nest all the data.frame columns (you'll end up with over 450 columns that way)
Related
I've imported a JSON file into R from ( http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json ) and I'm trying to select only counties in Kansas.
Right now I have all the data into one variable and I'm trying to make subdata of this that is just counties of Kansas. I'm not sure how to go about this.
What you have there is geoJson, which can be read directly by library(sf), to give you an sf object, which is also data.frame. Then you can use the usual data.frame subsetting operations
library(sf)
sf <- sf::read_sf("http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json")
sf[sf$NAME == "Kansas", ]
# Simple feature collection with 1 feature and 5 fields
# geometry type: MULTIPOLYGON
# dimension: XY
# bbox: xmin: -102.0517 ymin: 36.99308 xmax: -94.58993 ymax: 40.00316
# epsg (SRID): 4326
# proj4string: +proj=longlat +datum=WGS84 +no_defs
# GEO_ID STATE NAME LSAD CENSUSAREA geometry
# 30 0400000US20 20 Kansas 81758.72 MULTIPOLYGON(((-99.541116 3...
And seeing as you want the individual counties, you need to use the counties data set
sf_counties <- sf::read_sf("http://eric.clst.org/wupl/Stuff/gz_2010_us_050_00_500k.json")
sf_counties[sf_counties$STATE == 20, ]
To stay with a JSON workflow, can try jqr
library(jqr)
url <- 'http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json'
download.file(url, (f <- tempfile(fileext = ".json")))
res <- paste0(readLines(f), collapse = " ")
out <- jq(res, '.features[] | select(.properties.NAME == "Kansas")')
can map easily like
library(leaflet)
leaflet() %>%
addTiles() %>%
addGeoJSON(out) %>%
setView(-98, 38, 6)
library(rjson)
lst=fromJSON(file = 'http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json')
index = which(sapply(lapply(lst$features,"[[",'properties'),'[[','NAME')=='Kansas')
subdata = lst$features[[index]]
Consider the following simple R data frame:
shape1 <- c('red','triangle')
shape2 <- c('blue','circle')
df <- data.frame(shape1,shape2)
rownames(df) <- c('Prop1','Prop2')
I would like to convert it to the following JSON:
{
"shape1": {"Prop1":"red","Prop2":"triangle"},
"shape2": {"Prop1":"blue","Prop2":"circle"}
}
Any ideas how to go about this?
In these situations it's sometimes easier to work backwards from the JSON and find out what R structure is required, to then give you back that JSON.
## start with the desired json
js <- '{"shape1": {"Prop1":"red","Prop2":"triangle"},
"shape2": {"Prop1":"blue","Prop2":"circle"}}'
## convert it into an R object (in this case, it's a list)
lst <- jsonlite::fromJSON(js)
lst
# $shape1
# $shape1$Prop1
# [1] "red"
#
# $shape1$Prop2
# [1] "triangle"
#
#
# $shape2
# $shape2$Prop1
# [1] "blue"
#
# $shape2$Prop2
# [1] "circle"
So now lst is the structure you require to get to your desired JSON
jsonlite::toJSON(lst, pretty = T)
# {
# "shape1": {
# "Prop1": ["red"],
# "Prop2": ["triangle"]
# },
# "shape2": {
# "Prop1": ["blue"],
# "Prop2": ["circle"]
# }
# }
I have a vector of JSONs (of the same structure) and transform it to a data.frame. Following example does exactly what I want.
require(jsonlite) # fromJSON()
require(magrittr) # for the pipeline only
require(data.table) # rbindlist()
jsons <- c('{"num":1,"char":"a","list":{"x":1,"y":2}}',
'{"num":2,"char":"b","list":{"x":1,"y":2}}',
'{"num":3,"char":"c","list":{"x":1,"y":2}}')
df <- jsons %>%
lapply(fromJSON) %>%
lapply(as.data.frame.list, stringsAsFactors = F) %>%
rbindlist(fill = T)
Some elements of the JSON are objects, i.e. if I transform it fromJSON() some elements of the list will be lists as well. I cannot use unlist() to each list because I have different variable types so I am using as.data.frame.list() function. This is however too slow to do for each JSON individually. Is there a way how can I do it more effectively?
json <- '{"$schema":"http://json-schema.org/draft-04/schema#","title":"Product set","type":"array","items":{"title":"Product","type":"object","properties":{"id":{"description":"The unique identifier for a product","type":"number"},"name":{"type":"string"},"price":{"type":"number","minimum":0,"exclusiveMinimum":true},"tags":{"type":"array","items":{"type":"string"},"minItems":1,"uniqueItems":true},"dimensions":{"type":"object","properties":{"length":{"type":"number"},"width":{"type":"number"},"height":{"type":"number"}},"required":["length","width","height"]},"warehouseLocation":{"description":"Coordinates of the warehouse with the product","$ref":"http://json-schema.org/geo"}},"required":["id","name","price"]}}'
system.time(
df <- json %>% rep(1000) %>%
lapply(fromJSON) %>%
lapply(as.data.frame.list, stringsAsFactors = F) %>%
rbindlist(fill = T)
) # 2.72
I know that there are plenty of similar questions but most of the answers I saw was about using as.data.frame() or data.frame(). Nobody mentioned the speed. Maybe there is no better solution to this.
I finally found the answer. It will be on CRAN soon.
devtools::install_github("jeremystan/tidyjson")
tidyjson::spread_all()
This function is about 10-times faster than my example above.
Try to collapse all JSONs in the one string. Let's show example of the solution:
require(jsonlite)
require(data.table)
json <- '{"$schema":"http://json-schema.org/draft-04/schema#","title":"Product set","type":"array","items":{"title":"Product","type":"object","properties":{"id":{"description":"The unique identifier for a product","type":"number"},"name":{"type":"string"},"price":{"type":"number","minimum":0,"exclusiveMinimum":true},"tags":{"type":"array","items":{"type":"string"},"minItems":1,"uniqueItems":true},"dimensions":{"type":"object","properties":{"length":{"type":"number"},"width":{"type":"number"},"height":{"type":"number"}},"required":["length","width","height"]},"warehouseLocation":{"description":"Coordinates of the warehouse with the product","$ref":"http://json-schema.org/geo"}},"required":["id","name","price"]}}'
n <- 1000
ex <- rep(json, 1000)
f1 <- function(x) {
res <- lapply(x, fromJSON)
res <- lapply(res, as.data.frame.list, stringsAsFactors = FALSE)
res <- rbindlist(res, fill = TRUE)
return(res)
}
f2 <- function(x) {
res <- fromJSON(paste0("[", paste(x, collapse = ","), "]"), flatten = TRUE)
lst <- sapply(res, is.list)
res[lst] <- lapply(res[lst], function(x) as.data.table(transpose(x)))
res <- flatten(res)
return(res)
}
bench::mark(
f1(ex), f2(ex), min_iterations = 100, check = FALSE
)
#> # A tibble: 2 x 14
#> expression min mean median max `itr/sec` mem_alloc n_gc n_itr #> total_time result memory time
#> <chr> <bch:t> <bch:t> <bch:t> <bch:tm> <dbl> <bch:byt> <dbl> <int> #> <bch:tm> <list> <list> <lis>
#> 1 f1(ex) 2.27s 2.35s 2.32s 2.49s 0.425 0B 5397 100 #> 3.92m <data… <Rpro… <bch…
#> 2 f2(ex) 48.85ms 63.78ms 57.88ms 116.19ms 15.7 0B 143 100 #> 6.38s <data… <Rpro… <bch…
#> # … with 1 more variable: gc <list>
I have some JSON that looks like this:
"total_rows":141,"offset":0,"rows":[
{"id":"1","key":"a","value":{"SP$Sale_Price":"240000","CONTRACTDATE$Contract_Date":"2006-10-26T05:00:00"}},
{"id":"2","key":"b","value":{"SP$Sale_Price":"2000000","CONTRACTDATE$Contract_Date":"2006-08-22T05:00:00"}},
{"id":"3","key":"c","value":{"SP$Sale_Price":"780000","CONTRACTDATE$Contract_Date":"2007-01-18T06:00:00"}},
...
In R, what would be the easiest way to produce a scatter-plot of SP$Sale_Price versus CONTRACTDATE$Contract_Date?
I got this far:
install.packages("rjson")
library("rjson")
json_file <- "http://localhost:5984/testdb/_design/sold/_view/sold?limit=100"
json_data <- fromJSON(file=json_file)
install.packages("plyr")
library(plyr)
asFrame <- do.call("rbind.fill", lapply(json_data, as.data.frame))
but now I'm stuck...
> plot(CONTRACTDATE$Contract_Date, SP$Sale_Price)
Error in plot(CONTRACTDATE$Contract_Date, SP$Sale_Price) :
object 'CONTRACTDATE' not found
How to make this work?
Suppose you have the following JSON-file:
txt <- '{"total_rows":141,"offset":0,"rows":[
{"id":"1","key":"a","value":{"SP$Sale_Price":"240000","CONTRACTDATE$Contract_Date":"2006-10-26T05:00:00"}},
{"id":"2","key":"b","value":{"SP$Sale_Price":"2000000","CONTRACTDATE$Contract_Date":"2006-08-22T05:00:00"}},
{"id":"3","key":"c","value":{"SP$Sale_Price":"780000","CONTRACTDATE$Contract_Date":"2007-01-18T06:00:00"}}]}'
Then you can read it as follows with the jsonlite package:
library(jsonlite)
json_data <- fromJSON(txt, flatten = TRUE)
# get the needed dataframe
dat <- json_data$rows
# set convenient names for the columns
# this step is optional, it just gives you nicer columnnames
names(dat) <- c("id","key","sale_price","contract_date")
# convert the 'contract_date' column to a datetime format
dat$contract_date <- strptime(dat$contract_date, format="%Y-%m-%dT%H:%M:%S", tz="GMT")
Now you can plot:
plot(dat$contract_date, dat$sale_price)
Which gives:
If you choose not to flatten the JSON, you can do:
json_data <- fromJSON(txt)
dat <- json_data$rows$value
sp <- strtoi(dat$`SP$Sale_Price`)
cd <- strptime(dat$`CONTRACTDATE$Contract_Date`, format="%Y-%m-%dT%H:%M:%S", tz="GMT")
plot(cd,sp)
Which gives the same plot:
I found a way that doesn't discard the field names:
install.packages("jsonlite")
install.packages("curl")
json <- fromJSON(json_file)
r <- json$rows
At this point r looks like this:
> class(r)
[1] "data.frame"
> colnames(r)
[1] "id" "key" "value"
After some more Googling and trial-and-error I landed on this:
f <- r$value
sp <- strtoi(f[["SP$Sale_Price"]])
cd <- strptime(f[["CONTRACTDATE$Contract_Date"]], format="%Y-%m-%dT%H:%M:%S", tz="GMT")
plot(cd,sp)
And the result on my full data-set...
I've create a JSON file, and I need to be able to share the file via email with other collaborators. However, although there are plenty of topics available on handling JSON objects in the R workspace, there are virtually no resources discussing how to actually export a JSON object to a .JSON file.
Here's a simple example:
list1 <- vector(mode="list", length=2)
list1[[1]] <- c("a", "b", "c")
list1[[2]] <- c(1, 2, 3)
exportJson <- toJSON(list1)
## Save the JSON to file
save(exportJson, file="export.JSON")
## Attempt to read in the JSON
library("rjson")
json_data <- fromJSON(file="export.JSON")
The final line, attempting to read in the JSON file, results in an error: "Error in fromJSON(file = "export.JSON") : unexpected character 'R'"
Obviously the save() function is not the way to go, but after extensive googling, I have found nothing that says how to export the JSON to a file. Any help would be greatly appreciated.
You can use write:
library(RJSONIO)
list1 <- vector(mode="list", length=2)
list1[[1]] <- c("a", "b", "c")
list1[[2]] <- c(1, 2, 3)
exportJson <- toJSON(list1)
> exportJson
[1] "[\n [ \"a\", \"b\", \"c\" ],\n[ 1, 2, 3 ] \n]"
write(exportJson, "test.json")
library("rjson")
json_data <- fromJSON(file="test.json")
> json_data
[[1]]
[1] "a" "b" "c"
[[2]]
[1] 1 2 3
There is also the jsonlite package:
library(jsonlite)
exportJSON <- toJSON(list1)
write(exportJSON, "test.json")
list2 <- fromJSON("test.json")
identical(list1, list2)