get information from dataset (json format) in r - json

I created a datatable from mongodb collection. Data in this datatable is in JSON format but I cant get to extract the information from it..
{"place":{"bounding_box":{
"type":"Polygon",
"coordinates":[
[
[
-119.932568,
36.648905
],
[
-119.632419,
36.648905
]
]
]
}}}
I need the first two values of the coordinates: lat = 36.648905 and lon = -119.932568
But cant seems to extract that info:
my_lon <- myBigDF$place.bounding_box.coordinates[1[1[1]]]
I have tried few combination but I'm always getting NULL.
Thank you for any help..
--EDIT-- Including the code on how I'm connecting to db and creating dataframe from it..
mongo <- mongo.create(host="localhost" , db="mydb")
library(plyr)
## create the empty data frame
myDF = data.frame(stringsAsFactors = FALSE)
## create the cursor we will iterate over, basically a select * in SQL
cursor = mongo.find(mongo, namespace)
## create the counter
i = 1
## iterate over the cursor
while (mongo.cursor.next(cursor)) {
# iterate and grab the next record
tmp = mongo.bson.to.list(mongo.cursor.value(cursor))
# make it a dataframe
tmp.df = as.data.frame(t(unlist(tmp)), stringsAsFactors = F)
# bind to the master dataframe
myDF = rbind.fill(myDF, tmp.df)
}

It's hard to tell exactly how you are going from the JSON string to an R object. There are different libraries that parse thing differently. If I assume for a moment use "rjson", then you would have something like
x <- rjson::fromJSON('{"place":{"bounding_box":{ "type":"Polygon", "coordinates":[ [ [ -119.932568, 36.648905 ], [ -119.632419, 36.648905 ] ] ] }}}')
And because your data seems to have an excessive number of square brackets, things are a bit messy. You can get to the coordinates section with
x$place$bounding_box$coordinates
# [1]]
# [[1]][[1]]
# [1] -119.9326 36.6489
#
# [[1]][[2]]
# [1] -119.6324 36.6489
which is a list of lists of vectors. To make a nice matrix of lat/long coordinates you can do
do.call(rbind, x$place$bounding_box$coordinates[[1]])

Related

How Do I Consume an Array of JSON Objects using Plumber in R

I have been experimenting with Plumber in R recently, and am having success when I pass the following data using a POST request;
{"Gender": "F", "State": "AZ"}
This allows me to write a function like the following to return the data.
#* #post /score
score <- function(Gender, State){
data <- list(
Gender = as.factor(Gender)
, State = as.factor(State))
return(data)
}
However, when I try to POST an array of JSON objects, I can't seem to access the data through the function
[{"Gender":"F","State":"AZ"},{"Gender":"F","State":"NY"},{"Gender":"M","State":"DC"}]
I get the following error
{
"error": [
"500 - Internal server error"
],
"message": [
"Error in is.factor(x): argument \"Gender\" is missing, with no default\n"
]
}
Does anyone have an idea of how Plumber parses JSON? I'm not sure how to access and assign the fields to vectors to score the data.
Thanks in advance
I see two possible solutions here. The first would be a command line based approach which I assume you were attempting. I tested this on a Windows OS and used column based data.frame encoding which I prefer due to shorter JSON string lengths. Make sure to escape quotation marks correctly to avoid 'argument "..." is missing, with no default' errors:
curl -H "Content-Type: application/json" --data "{\"Gender\":[\"F\",\"F\",\"M\"],\"State\":[\"AZ\",\"NY\",\"DC\"]}" http://localhost:8000/score
# [["F","F","M"],["AZ","NY","DC"]]
The second approach is R native and has the advantage of having everything in one place:
library(jsonlite)
library(httr)
## sample data
lst = list(
Gender = c("F", "F", "M")
, State = c("AZ", "NY", "DC")
)
## jsonify
jsn = lapply(
lst
, toJSON
)
## query
request = POST(
url = "http://localhost:8000/score?"
, query = jsn # values must be length 1
)
response = content(
request
, as = "text"
, encoding = "UTF-8"
)
fromJSON(
response
)
# [,1]
# [1,] "[\"F\",\"F\",\"M\"]"
# [2,] "[\"AZ\",\"NY\",\"DC\"]"
Be aware that httr::POST() expects a list of length-1 values as query input, so the array data should be jsonified beforehand. If you want to avoid the additional package imports altogether, some system(), sprintf(), etc. magic should do the trick.
Finally, here is my plumber endpoint (living in R/plumber.R and condensed a little bit):
#* #post /score
score = function(Gender, State){
lapply(
list(Gender, State)
, as.factor
)
}
and code to fire up the API:
pr = plumber::plumb("R/plumber.R")
pr$run(port = 8000)

parsing nested structures in R

I have a json-like string that represents a nested structure. it is not a real json in that the names and values are not quoted. I want to parse it to a nested structure, e.g. list of lists.
#example:
x_string = "{a=1, b=2, c=[1,2,3], d={e=something}}"
and the result should be like this:
x_list = list(a=1,b=2,c=c(1,2,3),d=list(e="something"))
is there any convenient function that I don't know that does this kind of parsing?
Thanks.
If all of your data is consistent, there is a simple solution involving regex and jsonlite package. The code is:
if(!require(jsonlite, quiet=TRUE)){
#if library is not installed: installs it and loads it into the R session for use.
install.packages("jsonlite",repos="https://ftp.heanet.ie/mirrors/cran.r-project.org")
library(jsonlite)
}
x_string = "{a=1, b=2, c=[1,2,3], d={e=something}}"
json_x_string = "{\"a\":1, \"b\":2, \"c\":[1,2,3], \"d\":{\"e\":\"something\"}}"
fromJSON(json_x_string)
s <- gsub( "([A-Za-z]+)", "\"\\1\"", gsub( "([A-Za-z]*)=", "\\1:", x_string ) )
fromJSON( s )
The first section checks if the package is installed. If it is it loads it, otherwise it installs it and then loads it. I usually include this in any R code I'm writing to make it simpler to transfer between pcs/people.
Your string is x_string, we want it to look like json_x_string which gives the desired output when we call fromJSON().
The regex is split into two parts because it's been a while - I'm pretty sure this could be made more elegant. Then again, this depends on if your data is consistent so I'll leave it like this for now. First it changes "=" to ":", then it adds quotation marks around all groups of letters. Calling fromJSON(s) gives the output:
fromJSON(s)
$a
[1] 1
$b
[1] 2
$c
[1] 1 2 3
$d
$d$e
[1] "something"
I would rather avoid using JSON's parsing for the lack of extendibility and flexibility, and stick to a solution of regex + recursion.
And here is an extendable base code that parses your input string as desired
The main recursion function:
# Parse string
parse.string = function(.string){
regex = "^((.*)=)??\\{(.*)\\}"
# Recursion termination: element parsing
if(iselement(.string)){
return(parse.element(.string))
}
# Extract components
elements.str = gsub(regex, "\\3", .string)
elements.vector = get.subelements(elements.str)
# Recursively parse each element
parsed.elements = list(sapply(elements.vector, parse.string, USE.NAMES = F))
# Extract list's name and return
name = gsub(regex, "\\2", .string)
names(parsed.elements) = name
return(parsed.elements)
}
.
Helping functions:
library(stringr)
# Test if the string is a base element
iselement = function(.string){
grepl("^[^[:punct:]]+=[^\\{\\}]+$", .string)
}
# Parse element
parse.element = function(element.string){
splits = strsplit(element.string, "=")[[1]]
element = splits[2]
# Parse numeric elements
if(!is.na(as.numeric(element))){
element = as.numeric(element)
}
# TODO: Extend here to include vectors
# Reformat and return
element = list(element)
names(element) = splits[1]
return(element)
}
# Get subelements from a string
get.subelements = function(.string){
# Regex of allowed elements - Extend here to include more types
elements.regex = c("[^, ]+?=\\{.+?\\}", #Sublist
"[^, ]+?=\\[.+?\\]", #Vector
"[^, ]+?=[^=,]+") #Base element
str_extract_all(.string, pattern = paste(elements.regex, collapse = "|"))[[1]]
}
.
Parsing results:
string = "{a=1, b=2, c=[1,2,3], d={e=something}}"
string_2 = "{a=1, b=2, c=[1,2,3], d=somthing}"
named_string = "xyz={a=1, b=2, c=[1,2,3], d={e=something, f=22}}"
named_string_2 = "xyz={d={e=something, f=22}}"
parse.string(string)
# [[1]]
# [[1]]$a
# [1] 1
#
# [[1]]$b
# [1] 2
#
# [[1]]$c
# [1] "[1,2,3]"
#
# [[1]]$d
# [[1]]$d$e
# [1] "something"

How to create a list from json key:values in python3

I'm looking to create a python3 list of the locations from the json file city.list.json downloaded from OpenWeatherMaps http://bulk.openweathermap.org/sample/city.list.json.gz. The file passes http://json-validator.com/ but I can not figure out how to correctly open the file and create a list of values of key 'name'. I keep hitting json.loads errors about io.TextIOWrapper etc.
I created a short test file
[
{
"id": 707860,
"name": "Hurzuf",
"country": "UA",
"coord": {
"lon": 34.283333,
"lat": 44.549999
}
}
,
{
"id": 519188,
"name": "Novinki",
"country": "RU",
"coord": {
"lon": 37.666668,
"lat": 55.683334
}
}
]
Is there a way to parse this and create a list ["Hurzuf", "Novinki"] ?
You should use json.load() instead of json.loads(). I named my test file file.json and here is the code:
import json
with open('file.json', mode='r') as f:
# At first, read the JSON file and store its content in an Python variable
# By using json.load() function
json_data = json.load(f)
# So now json_data contains list of dictionaries
# (because every JSON is a valid Python dictionary)
# Then we create a result list, in which we will store our names
result_list = []
# We start to iterate over each dictionary in our list
for json_dict in json_data:
# We append each name value to our result list
result_list.append(json_dict['name'])
print(result_list) # ['Hurzuf', 'Novinki']
# Shorter solution by using list comprehension
result_list = [json_dict['name'] for json_dict in json_data]
print(result_list) # ['Hurzuf', 'Novinki']
You just simply iterate over elements in your list and check whether the key is equal to name.

converting json file into R/text file without execution

Rstudio has crashed while working and the unsaved files were not able to be loaded into the session. But the files are available in the JSON format. An example,
{
"contents" : "library(hgu133a.db)\nx <- hgu133aENSEMBL\nx\nlength(x)\ncount.mappedkeys(x)\nx[1:3]\nlinks(x[1:3])\n\n## Keep only the mapped keys\nkeys(x) <- mappedkeys(x)\nlength(x)\ncount.mappedkeys(x)\nx # now it is a submap\n\n## The above subsetting can also be achieved with\nx <- hgu133aENSEMBL[mappedkeys(hgu133aENSEMBL)]\n\n",
"created" : 1463131195093.000,
"dirty" : true,
"encoding" : "",
"folds" : "",
"hash" : "1482602869",
"id" : "737C178C",
"lastKnownWriteTime" : 0,
"path" : null,
"project_path" : null,
"properties" : {
"tempName" : "Untitled3"
},
"source_on_save" : false,
"type" : "r_source"
}
The JSON format files can be read using the jsonlite::fromJSON and the required information was stored in contents variable. When tried to read the commands using the readLines() or scan() the commands were being executed instead of converting them into a simple file. How to convert this into a r file ?
output(?):command in a r script/text file.
library(hgu133a.db)
x <- hgu133aENSEMBL
x
length(x)
count.mappedkeys(x)
x[1:3]
links(x[1:3])
## Keep only the mapped keys
keys(x) <- mappedkeys(x)
length(x)
count.mappedkeys(x)
x
# now it is a submap
## The above subsetting can also be achieved with
x <- hgu133aENSEMBL[mappedkeys(hgu133aENSEMBL)]
If anyone looking for answer to this question, the command suggested by #Kevin worked.
writeLines(json$contents, con = "/path/to/file.R")
Output:
library(hgu133a.db)
x <- hgu133aENSEMBL
x
length(x)
count.mappedkeys(x)
x[1:3]
links(x[1:3])
## Keep only the mapped keys
keys(x) <- mappedkeys(x)
length(x)
count.mappedkeys(x)
x # now it is a submap
## The above subsetting can also be achieved with
x <- hgu133aENSEMBL[mappedkeys(hgu133aENSEMBL)]

Iteratively read a fixed number of lines into R

I have a josn file I'm working with that contains multiple json objects in a single file. R is unable to read the file as a whole. But since each object occurs at regular intervals, I would like to iteratively read a fixed number of lines into R.
There are a number of SO questions on reading single lines into R but I have been unable to extend these solutions to a fixed number of lines. For my problem I need to read 16 lines into R at a time (eg 1-16, 17-32 etc)
I have tried using a loop but can't seem to get the syntax right:
## File
file <- "results.json"
## Create connection
con <- file(description=file, open="r")
## Loop over a file connection
for(i in 1:1000) {
tmp <- scan(file=con, nlines=16, quiet=TRUE)
data[i] <- fromJSON(tmp)
}
The file contains over 1000 objects of this form:
{
"object": [
[
"a",
0
],
[
"b",
2
],
[
"c",
2
]
]
}
With #tomtom inspiration I was able to find a solution.
## File
file <- "results.json"
## Loop over a file
for(i in 1:1000) {
tmp <- paste(scan(file=file, what="character", sep="\n", nlines=16, skip=(i-1)*16, quiet=TRUE),collapse=" ")
assign(x = paste("data", i, sep = "_"), value = fromJSON(tmp))
}
I couldn't create a connection as each time I tried the connection would close before the file had been completely read. So I got rid of that step.
I had to include the what="character" variable as scan() seems to expect a number by default.
I included sep="\n", paste() and collapse=" " to create a single string rather than the vector of characters that scan() creates by default.
Finally I just changed the final assignment operator to have a bit more control over the names of the output.
This might help:
EDITED to make it use a list and Reduce into one file
## Loop over a file connection
data <- NULL
for(i in 1:1000) {
tmp <- scan(file=con, nlines=16, skip=(i-1)*16, quiet=TRUE)
data[[i]] <- fromJSON(tmp)
}
df <- Reduce(function(x, y) {paste(x, y, collapse = " ")})
You would have to make sure that you don't reach further than the end of the file though ;-)