How Do I Consume an Array of JSON Objects using Plumber in R - json

I have been experimenting with Plumber in R recently, and am having success when I pass the following data using a POST request;
{"Gender": "F", "State": "AZ"}
This allows me to write a function like the following to return the data.
#* #post /score
score <- function(Gender, State){
data <- list(
Gender = as.factor(Gender)
, State = as.factor(State))
return(data)
}
However, when I try to POST an array of JSON objects, I can't seem to access the data through the function
[{"Gender":"F","State":"AZ"},{"Gender":"F","State":"NY"},{"Gender":"M","State":"DC"}]
I get the following error
{
"error": [
"500 - Internal server error"
],
"message": [
"Error in is.factor(x): argument \"Gender\" is missing, with no default\n"
]
}
Does anyone have an idea of how Plumber parses JSON? I'm not sure how to access and assign the fields to vectors to score the data.
Thanks in advance

I see two possible solutions here. The first would be a command line based approach which I assume you were attempting. I tested this on a Windows OS and used column based data.frame encoding which I prefer due to shorter JSON string lengths. Make sure to escape quotation marks correctly to avoid 'argument "..." is missing, with no default' errors:
curl -H "Content-Type: application/json" --data "{\"Gender\":[\"F\",\"F\",\"M\"],\"State\":[\"AZ\",\"NY\",\"DC\"]}" http://localhost:8000/score
# [["F","F","M"],["AZ","NY","DC"]]
The second approach is R native and has the advantage of having everything in one place:
library(jsonlite)
library(httr)
## sample data
lst = list(
Gender = c("F", "F", "M")
, State = c("AZ", "NY", "DC")
)
## jsonify
jsn = lapply(
lst
, toJSON
)
## query
request = POST(
url = "http://localhost:8000/score?"
, query = jsn # values must be length 1
)
response = content(
request
, as = "text"
, encoding = "UTF-8"
)
fromJSON(
response
)
# [,1]
# [1,] "[\"F\",\"F\",\"M\"]"
# [2,] "[\"AZ\",\"NY\",\"DC\"]"
Be aware that httr::POST() expects a list of length-1 values as query input, so the array data should be jsonified beforehand. If you want to avoid the additional package imports altogether, some system(), sprintf(), etc. magic should do the trick.
Finally, here is my plumber endpoint (living in R/plumber.R and condensed a little bit):
#* #post /score
score = function(Gender, State){
lapply(
list(Gender, State)
, as.factor
)
}
and code to fire up the API:
pr = plumber::plumb("R/plumber.R")
pr$run(port = 8000)

Related

JSON no longer parsable after being sent through TCP in Ruby

I've created a basic client and server that pass a string, which I've changed to JSON instead. But the JSON string is only parsable before it gets sent through TCP. After it's sent, the string version is identical (after a chomp), but on the server side it no longer processes the JSON correctly. Here is some of my code (with other bits trimmed)
Some of the client code
require 'json'
require 'socket'
foo = {'a' => 1, 'b' => 2, 'c' => 3}
puts foo.to_s + "......."
foo.to_json
puts foo['b'] # => outputs the correct '2' answer
client = TCPSocket.open('localhost', 2000)
client.puts json
client.close;
Some of the server code
require 'socket'
require 'json'
server = TCPServer.open(2000)
while true
client = server.accept # Accept client
response = client.gets
print response
response = response.chomp
response.to_json
puts response['b'] # => outputs 'b'
end
The output 'b' should be '2' instead. How do I fix this?
Thanks
In your server you wrote response.to_json. This turns a string to JSON, then throws it away. And I don't like the .chomp, either.
Try
response = client.gets
hash = JSON.parse(response)
Now hash is a Ruby Hash object with your data in it, and hash['b'] should work correctly.
The problem is that .to_json does not parse JSON inside a string and replace itself with the result. It is used to convert the string into a format that is an acceptable JSON value.
require 'json'
string = "abc"
puts string
puts string.to_json
This will output:
abc
"abc"
The method is added to the String class by the JSON generator and it uses it internally to generate the JSON document.
But why does your response['b'] return "b"?
Because Ruby strings have a method called [] that can be used to:
Return a substring: "abc"[0,2] => "ab"
Return a single character from index: "abc"[1] => "b"
Return a substring if the string contains it: "abc"["bc"] => "bc", "abc"["fg"] => nil
Return a regexp match: "abc"[/^a([a-z])c/, 1] => "b"
and possibly some other ways I can't think of right now.
So this happens because your response is a string that has the character "b" in it:
response = "something with a b"
puts response["b"]
# outputs b
puts response["x"]
# outputs a blank line because response does not contain "x".
Instead of .to_json your code has to call JSON.parse or JSON.load:
data = JSON.parse(response)
puts data['b']

parsing nested structures in R

I have a json-like string that represents a nested structure. it is not a real json in that the names and values are not quoted. I want to parse it to a nested structure, e.g. list of lists.
#example:
x_string = "{a=1, b=2, c=[1,2,3], d={e=something}}"
and the result should be like this:
x_list = list(a=1,b=2,c=c(1,2,3),d=list(e="something"))
is there any convenient function that I don't know that does this kind of parsing?
Thanks.
If all of your data is consistent, there is a simple solution involving regex and jsonlite package. The code is:
if(!require(jsonlite, quiet=TRUE)){
#if library is not installed: installs it and loads it into the R session for use.
install.packages("jsonlite",repos="https://ftp.heanet.ie/mirrors/cran.r-project.org")
library(jsonlite)
}
x_string = "{a=1, b=2, c=[1,2,3], d={e=something}}"
json_x_string = "{\"a\":1, \"b\":2, \"c\":[1,2,3], \"d\":{\"e\":\"something\"}}"
fromJSON(json_x_string)
s <- gsub( "([A-Za-z]+)", "\"\\1\"", gsub( "([A-Za-z]*)=", "\\1:", x_string ) )
fromJSON( s )
The first section checks if the package is installed. If it is it loads it, otherwise it installs it and then loads it. I usually include this in any R code I'm writing to make it simpler to transfer between pcs/people.
Your string is x_string, we want it to look like json_x_string which gives the desired output when we call fromJSON().
The regex is split into two parts because it's been a while - I'm pretty sure this could be made more elegant. Then again, this depends on if your data is consistent so I'll leave it like this for now. First it changes "=" to ":", then it adds quotation marks around all groups of letters. Calling fromJSON(s) gives the output:
fromJSON(s)
$a
[1] 1
$b
[1] 2
$c
[1] 1 2 3
$d
$d$e
[1] "something"
I would rather avoid using JSON's parsing for the lack of extendibility and flexibility, and stick to a solution of regex + recursion.
And here is an extendable base code that parses your input string as desired
The main recursion function:
# Parse string
parse.string = function(.string){
regex = "^((.*)=)??\\{(.*)\\}"
# Recursion termination: element parsing
if(iselement(.string)){
return(parse.element(.string))
}
# Extract components
elements.str = gsub(regex, "\\3", .string)
elements.vector = get.subelements(elements.str)
# Recursively parse each element
parsed.elements = list(sapply(elements.vector, parse.string, USE.NAMES = F))
# Extract list's name and return
name = gsub(regex, "\\2", .string)
names(parsed.elements) = name
return(parsed.elements)
}
.
Helping functions:
library(stringr)
# Test if the string is a base element
iselement = function(.string){
grepl("^[^[:punct:]]+=[^\\{\\}]+$", .string)
}
# Parse element
parse.element = function(element.string){
splits = strsplit(element.string, "=")[[1]]
element = splits[2]
# Parse numeric elements
if(!is.na(as.numeric(element))){
element = as.numeric(element)
}
# TODO: Extend here to include vectors
# Reformat and return
element = list(element)
names(element) = splits[1]
return(element)
}
# Get subelements from a string
get.subelements = function(.string){
# Regex of allowed elements - Extend here to include more types
elements.regex = c("[^, ]+?=\\{.+?\\}", #Sublist
"[^, ]+?=\\[.+?\\]", #Vector
"[^, ]+?=[^=,]+") #Base element
str_extract_all(.string, pattern = paste(elements.regex, collapse = "|"))[[1]]
}
.
Parsing results:
string = "{a=1, b=2, c=[1,2,3], d={e=something}}"
string_2 = "{a=1, b=2, c=[1,2,3], d=somthing}"
named_string = "xyz={a=1, b=2, c=[1,2,3], d={e=something, f=22}}"
named_string_2 = "xyz={d={e=something, f=22}}"
parse.string(string)
# [[1]]
# [[1]]$a
# [1] 1
#
# [[1]]$b
# [1] 2
#
# [[1]]$c
# [1] "[1,2,3]"
#
# [[1]]$d
# [[1]]$d$e
# [1] "something"

Livy Server: return a dataframe as JSON?

I am executing a statement in Livy Server using HTTP POST call to localhost:8998/sessions/0/statements, with the following body
{
"code": "spark.sql(\"select * from test_table limit 10\")"
}
I would like an answer in the following format
(...)
"data": {
"application/json": "[
{"id": "123", "init_date": 1481649345, ...},
{"id": "133", "init_date": 1481649333, ...},
{"id": "155", "init_date": 1481642153, ...},
]"
}
(...)
but what I'm getting is
(...)
"data": {
"text/plain": "res0: org.apache.spark.sql.DataFrame = [id: string, init_date: timestamp ... 64 more fields]"
}
(...)
Which is the toString() version of the dataframe.
Is there some way to return a dataframe as JSON using the Livy Server?
EDIT
Found a JIRA issue that addresses the problem: https://issues.cloudera.org/browse/LIVY-72
By the comments one can say that Livy does not and will not support such feature?
I recommend using the built-in (albeit hard to find documentation for) magics %json and %table:
%json
session_url = host + "/sessions/1"
statements_url = session_url + '/statements'
data = {
'code': textwrap.dedent("""\
val d = spark.sql("SELECT COUNT(DISTINCT food_item) FROM food_item_tbl")
val e = d.collect
%json e
""")}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print r.json()
%table
session_url = host + "/sessions/21"
statements_url = session_url + '/statements'
data = {
'code': textwrap.dedent("""\
val x = List((1, "a", 0.12), (3, "b", 0.63))
%table x
""")}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print r.json()
Related: Apache Livy: query Spark SQL via REST: possible?
I don't have a lot of experience with Livy, but as far as I know this endpoint is used as an interactive shell and the output will be a string with the actual result that would be shown by a shell. So, with that in mind, I can think of a way to emulate the result you want, but It may not be the best way to do it:
{
"code": "println(spark.sql(\"select * from test_table limit 10\").toJSON.collect.mkString(\"[\", \",\", \"]\"))"
}
Then, you will have a JSON wrapped in a string, so your client could parse it.
I think in general your best bet is to write your output to a database of some kind. If you write to a randomly named table, you could have your code read it after the script is done.

httr GET operation unable to access JSON response

I am trying to access the JSON response from an API call in my R script. The API call is succesful, and I can view the JSON response in the console. However, I am unable to access any data from it.
A sample code segment is:
require(httr)
target <- '#trump'
sentence<- 'Donald trump has a wonderful toupe, it really is quite stunning that a man can be so refined and elegant'
query <- url_encode(sentence)
target <- gsub('#', '', target)
endpoint <- "https://alchemy.p.mashape.com/text/TextGetTargetedSentiment?outputMode=json&target="
apiCall <- paste(endpoint, target, '&text=', query, sep = '')
resp <-GET(apiCall, add_headers("X-Mashape-Key" = sentimentKey, "Accept" = "application/json"))
stop_for_status(resp)
headers(resp)
str(content(resp))
content(resp, "text")
I followed examples in the httr quickstart guide from CRAN (here) as well as this stack.
Unfortunately, I keep getting either "unused parameters 'text' in content()" or "no definition exists for content() accepting a class of 'response.' Does anyone have any advice? PS the headers will print, and resp$content will print the raw bitstream
Expanding on the comment, you need to set the content type explicitly in the call to content(...). Since your code is not reproducible, here is an example using the Census Bureau's geocoder (which returns a json response).
library(httr)
url <- "http://geocoding.geo.census.gov/geocoder/locations/onelineaddress"
resp <-GET(url, query=list(address="1600 Pennsylvania Avenue, Washington DC",
benchmark=9,
format="json"))
json <- content(resp, type="application/json")
json$result$addressMatches[[1]]$coordinates
# $x
# [1] -77.038025
#
# $y
# [1] 38.898735
Assuming your are actually getting a json response, and that it is well-formed, simply using content(resp, type="application/json") should work.

get information from dataset (json format) in r

I created a datatable from mongodb collection. Data in this datatable is in JSON format but I cant get to extract the information from it..
{"place":{"bounding_box":{
"type":"Polygon",
"coordinates":[
[
[
-119.932568,
36.648905
],
[
-119.632419,
36.648905
]
]
]
}}}
I need the first two values of the coordinates: lat = 36.648905 and lon = -119.932568
But cant seems to extract that info:
my_lon <- myBigDF$place.bounding_box.coordinates[1[1[1]]]
I have tried few combination but I'm always getting NULL.
Thank you for any help..
--EDIT-- Including the code on how I'm connecting to db and creating dataframe from it..
mongo <- mongo.create(host="localhost" , db="mydb")
library(plyr)
## create the empty data frame
myDF = data.frame(stringsAsFactors = FALSE)
## create the cursor we will iterate over, basically a select * in SQL
cursor = mongo.find(mongo, namespace)
## create the counter
i = 1
## iterate over the cursor
while (mongo.cursor.next(cursor)) {
# iterate and grab the next record
tmp = mongo.bson.to.list(mongo.cursor.value(cursor))
# make it a dataframe
tmp.df = as.data.frame(t(unlist(tmp)), stringsAsFactors = F)
# bind to the master dataframe
myDF = rbind.fill(myDF, tmp.df)
}
It's hard to tell exactly how you are going from the JSON string to an R object. There are different libraries that parse thing differently. If I assume for a moment use "rjson", then you would have something like
x <- rjson::fromJSON('{"place":{"bounding_box":{ "type":"Polygon", "coordinates":[ [ [ -119.932568, 36.648905 ], [ -119.632419, 36.648905 ] ] ] }}}')
And because your data seems to have an excessive number of square brackets, things are a bit messy. You can get to the coordinates section with
x$place$bounding_box$coordinates
# [1]]
# [[1]][[1]]
# [1] -119.9326 36.6489
#
# [[1]][[2]]
# [1] -119.6324 36.6489
which is a list of lists of vectors. To make a nice matrix of lat/long coordinates you can do
do.call(rbind, x$place$bounding_box$coordinates[[1]])