How to hand over an R list object to an R transform in Code Workbook - palantir-foundry

I estimate a polr model with R in a Foundry Code Workbook. After estimation I want to hand the model object over to another R transform. The estimation works fine; the handover does not.
Example:
Global code:
library(MASS)
Transform code:
model <- function(modeldata) {
    df <- modeldata
    model <- polr(answer ~ eos_ratio,
                  data = df,
                  Hess = TRUE)
    return(model)
}
Error message:
Function should return a data frame or FoundryObject or NULL.
Thanks

When you return something from a Code Workbook transform in Foundry, you are expected to return a data frame, a FoundryObject, or a raw file written to a dataset. For saving models, users generally use Foundry Machine Learning (FoundryML), but it currently doesn't support R officially. Please contact support for the specifics, and reach out if you face issues, since I realize there might be a lot of pain here.
My R is a bit rusty, but there are a couple of (hacky) alternatives:
From the documentation https://www.palantir.com/docs/foundry/code-workbook/transforms-unstructured/#unstructured-files-in-r , here is how you save down a .rds file:
write_rds_file <- function(r_dataframe) {
    output <- new.output()
    output_fs <- output$fileSystem()
    saveRDS(r_dataframe, output_fs$get_path("my_RDS_file.rds", 'w'))
}
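To complete the round trip, the downstream transform can read that file back. Here is a minimal, untested sketch: it assumes the dataset written above is wired in as this transform's input, and that the read side of the file-system API mirrors the write side shown in the docs (get_path with 'r'); check the Code Workbook documentation for the exact signature:
read_rds_file <- function(rds_dataset) {
    # Locate the .rds file on the input dataset's file system
    fs <- rds_dataset$fileSystem()
    path <- fs$get_path("my_RDS_file.rds", 'r')
    # Restore the polr model object saved by the upstream transform
    model <- readRDS(path)
    # ... use the model here, then return a data frame, FoundryObject, or NULL
    NULL
}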
You can do something similar with .rda: Reusing a Model Built in R
The other option is to store the model parameters in a dataframe:
How do I store lm object in a data frame in R
You can then recreate the model using those parameters.
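For the polr model in the question, a minimal sketch of that approach could look like this (the column layout is just one choice; the parameter accessors are standard polr components):
model_params <- function(modeldata) {
    library(MASS)
    fit <- polr(answer ~ eos_ratio, data = modeldata, Hess = TRUE)
    # polr stores the slope coefficients in coef(fit) and the
    # intercepts (cutpoints) in fit$zeta
    params <- c(coef(fit), fit$zeta)
    # One row per parameter; this is a plain data frame, so Code Workbook accepts it
    data.frame(parameter = names(params), value = unname(params))
}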

Related

How do I parse a GeoJSON shapefile into a dataset with one row per Feature?

I'm working on a project and need to parse a GeoJSON shapefile of flight route airspace in the US. The data is coming from the FAA open data portal: https://adds-faa.opendata.arcgis.com/datasets/faa::route-airspace/about
There seems to be some relevant documentation at /workspace/documentation/product/geospatial-docs/vector_data_in_transforms where it mentions:
A typical Foundry Ontology pipeline for geospatial vector data may include the following steps:
- Convert into rows with GeoJSON representation of the shape for each feature.
However, there isn't actually any guidance on how to go about doing this when the source is a single GeoJSON file containing a FeatureCollection and the desired output is a dataset with one row per Feature in the collection.
Anyone have a code snippet for accomplishing this? Seems like a pretty generic task in Foundry.
I typically do something like this:
import json

with open('Route_Airspace.geojson', 'r') as f:
    data = json.load(f)

rows = []
for feature in data['features']:
    row = {
        'geometry': json.dumps(feature['geometry']),
        'properties': json.dumps(feature['properties']),
        'id': feature['properties']['OBJECTID']
    }
    rows.append(row)
Note you can leave out the properties, but I like to keep them at this step in case I need them later. Also note this is a good place to set each row's primary key (the Features in this dataset have an OBJECTID property, but this may vary).
This rows list can then be used to initialise a Pandas dataframe:
import pandas as pd
df = pd.DataFrame(rows)
or a Spark dataframe (assuming you're doing this within a transform):
df = ctx.spark_session.createDataFrame(rows)
The resulting dataframes will have one row per Feature, where that feature's shape is contained within the geometry column.
Full example within transform:
from transforms.api import transform, Input, Output
import json

@transform(
    out=Output('Path/to/output'),
    source_df=Input('Path/to/source'),
)
def compute(source_df, out, ctx):
    with source_df.filesystem().open('Route_Airspace.geojson', 'r') as f:
        data = json.load(f)

    rows = []
    for feature in data['features']:
        row = {
            'geometry': json.dumps(feature['geometry']),
            'properties': json.dumps(feature['properties']),
            'id': feature['properties']['OBJECTID']
        }
        rows.append(row)

    df = ctx.spark_session.createDataFrame(rows)
    out.write_dataframe(df)
Note that for this to work your GeoJSON file needs to be uploaded into a "dataset without a schema" so the raw file becomes accessible via the FileSystem API.

R: How can I use the package Rcrawler to do JSON parsing in parallel?

I just came across this powerful R package, but unfortunately I haven't been able to find out how to parse a list of URLs in parallel where the response is in JSON.
As a simple example, suppose I have a list of cities (in Switzerland):
list_cities <- c("Winterthur", "Bern", "Basel", "Lausanne", "Lugano")
As a next step, I'd like to find public transport connections to the city of Zurich for each of the listed cities. I can use the following transport API to query public timetable data:
https://transport.opendata.ch
Using the httr package, I can make a request for each city as follows:
library(httr)
for (city in list_cities) {
    r <- GET(paste0("http://transport.opendata.ch/v1/connections?from=", city, "&to=Zurich&limit=1&fields[]=connections/duration"))
    cont <- content(r, as = "parsed", type = "application/json", encoding = "UTF-8")
}
to get the duration of the individual journeys. However, I have a much longer list and more destinations. That's why I am looking for a way to make multiple requests in parallel.
Note I did not test this, but first you would initialize your parallel workers:
library(parallel)
cl <- makeCluster(detectCores() - 1)
# Load required packages onto each parallel worker; the function below uses
# httr's GET() and content(), so httr is what the workers need
clusterEvalQ(cl, { library(httr) })
Make a function with your relevant commands:
custom_parse_json <- function(city) {
    r <- GET(paste0("http://transport.opendata.ch/v1/connections?from=", city, "&to=Zurich&limit=1&fields[]=connections/duration"))
    cont <- content(r, as = "parsed", type = "application/json", encoding = "UTF-8")
    return(cont)
}
Export the function to each parallel worker:
clusterExport(cl, c("custom_parse_json"))
Loop through the list of cities:
parLapply(cl, list_cities, function(i) custom_parse_json(i))
This should return a list of your JSON content.
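When the loop is done, shut the workers down so they don't linger (stopCluster() is part of the same parallel package):
stopCluster(cl)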

How to read a JSON object that I created in R into sparkR

I would like to take a dataframe I've created in R, and turn that into a JSON Object, and then read that JSON object into sparkR. With my current project, I can't just pass a dataframe into SparkR, and have to do this roundabout method to get my project to work. I also can't make a local JSON file first to read into sparkR, and so I am trying to make a JSON object to hold my data, and then read that into sparkR.
In other posts I read, Scala Spark has a function
sqlContext.read.json(anotherPeopleRDD)
That seems to do what I am trying to accomplish. Is there something similar for SparkR?
Here is the code I am working with right now:
.libPaths(c(.libPaths(), '/root/Spark1.6.2/spark-1.6.2-bin-hadoop2.6/R/lib'))
Sys.setenv(SPARK_HOME = '/root/Spark1.6.2/spark-1.6.2-bin-hadoop2.6')
Sys.setenv(R_HOME = '/root/R-3.4.1')
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
Sys.setenv("spark.r.command" = '/usr/bin')
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf.cloudera.yarn")
Sys.setenv(PATH = paste(Sys.getenv(c('PATH')), '/root/Spark1.6.2/spark-1.6.2-bin-hadoop2.6/bin', sep=':'))
library(SparkR)
sparkR.stop()
sc <- sparkR.init(sparkEnvir = list(spark.shuffle.service.enabled=TRUE,spark.dynamicAllocation.enabled=TRUE, spark.dynamicAllocation.initialExecutors="2"), master = "yarn-client", appName = "SparkR")
sqlContext <- sparkRSQL.init(sc)
options(warn=-1)
n = 1000
x = data.frame(id = 1:n, val = rnorm(n))
library(RJSONIO)
exportJson <- toJSON(x)
testJsonData = read.json(sqlContext, exportJson) #fails
collect(testJsonData)
remove(sc)
remove(sqlContext)
sparkR.stop()
options(warn=0)
With the error message I'm getting for read.json:
17/08/03 12:25:35 ERROR r.RBackendHandler: json on 2 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: {
The solution to this problem was that the JSON I was working with was not supported by the Spark read.json function, due to how it was formatted. Instead I had to use another R library, jsonlite, to create my JSON, and now it works as intended.
This is how it looks when I create the file now:
library(jsonlite)
exportJson <- toJSON(x)
testJsonData = read.json(sqlContext, exportJson) # works with jsonlite's formatting
collect(testJsonData)
I hope that helps someone!

Accessing JSON Objects in R

I've been trying to access an API, which appears in the following JSON format:
{"notes":"data is an array of arrays (rows), column names for rows in
row_headers","row_headers":["Date","Time Spent (seconds)","Number of
People","Activity","Category","Productivity"],"rows":[["2014-05-28T09:00:00",538,1,"Gmail","Email",-1],["2014-05-28T09:00:00",450,1,"MS
Outlook","Email",1],["2014-05-28T09:00:00",374,1,"communicator","General
Communication \u0026 Scheduling",1],["2014-05-28T09:00:00",315,1,"MS
Terminal Services Client","General Software
Development",2],["2014-05-28T09:00:00",306,1,"fivethirtyeight.com","General
News \u0026 Opinion",-
I was able to import this JSON dictionary with Python and cast it into a dataframe using the following code:
data = json.load(urllib2.urlopen(url))
dataframe = pandas.DataFrame(data['rows'],columns=data['row_headers'])
However, when I try to do a similar thing in R with the following code, I get an error saying that R "cannot open file 'NA': No such file or directory."
json_file <- 'url of API key'
raw_json <- fromJSON(file=json_file['raw'])
If anyone has any insight into how you can go about loading a JSON object like I show above into an R dataframe, I would greatly appreciate it.
Here is an example that works using RJSONIO::fromJSON
library(RJSONIO)
raw_json <- fromJSON(content = "https://www.rescuetime.com/anapi/data?rtapi_key=B63NUgr1wbXQS_I6tT8ON0LpvyPPcXNOd1mXfrG9&perspective=interval&format=json&resolution_time=hour&restrict_kind=activity&restrict_begin=2013-01-01&restrict_end=2014-08-28%22")
You can then use data.table::rbindlist to coerce to a data.table.
library(data.table)
final_data <- rbindlist(lapply(raw_json[['rows']], setattr, 'names', raw_json[['row_headers']]))

Importing JSONP data from HTML page then exporting to CSV

I have some JSON data, of which this is a snippet:
{"sweater":"15", "localtime":"7:14 PM", "xcoord":-61,
"desc":"John Smith SHOT on Jack Jones", "teamid":10,"strength":701,
"pid":8465200,"formalEventId":"TOR8", "period":1, "type":"Shot", "p3name":"",
"eventid":8, "p2name":"Jack Jones", "ycoord":21, "pid3":"", "time":"00:38",
"playername":"John Smith", "p1name":"John Smith",
"video":"2_26_ott_tor_0910_TOR8_save_800K_16x9.flv", "pid2":8469461, "pid1":8465200}
I would like to grab this info from an HTML URL with this format:
http://foo.com/data/20092010/20090xxxxx/PxP.jsonp
where xxxxx is a 5 digit game code which I would like to have inserted from a list (via loop).
The data I need most is: sweater, xcoord, teamid, strength, period, type, ycoord, time, playername AND to have the game code (xxxxx) inserted as a column as well.
So it would be:
Gamecode, sweater, xcoord, teamid, strength, period, type, ycoord, time, playername
Then, have it export all the info into one (1) CSV file.
Can anyone help with pointing me in right direction?
EDIT:
I tried to import the json file as a local file, using the following code:
#libraries
library(RCurl)
library(rjson)
library(bitops)
#fetch data
j <- getURL("file:///Desktop/test.jsonp")
#grab JSON
j.list <- fromJSON(j)
#get each data item
j.df <- data.frame(playername = sapply(j.list, function(x) x$sweater))
j.df <- data.frame(xcoord = sapply(j.list, function(x) x$xcoord))
j.df <- data.frame(ycoord = sapply(j.list, function(x) x$ycoord))
j.df <- data.frame(type = sapply(j.list, function(x) x$type))
write.csv(j.df, file="fooPxP.csv")
and get an empty CSV file. Any ideas what I am doing wrong?
Here is some of the actual data file from the beginning:
loadPlayByPlay({"data":{"refreshInterval":0,"game":{"awayteamid":9,"awayteamname":"Ottawa Senators","hometeamname":"Toronto Maple Leafs","plays":{"play":[{"sweater":"11","localtime":"7:14 PM","xcoord":76,"desc":"Daniel Alfredsson HIT on Tomas Kaberle","teamid":9,"strength":701,"pid":8460621,"formalEventId":"TOR51","period":1,"type":"Hit","p3name":"","eventid":51,"p2name":"Tomas Kaberle","ycoord":-40,"pid3":"","time":"00:16","playername":"Daniel Alfredsson","p1name":"Daniel Alfredsson","pid2":8465200,"pid1":8460621},{"sweater":"15","localtime":"7:14 PM","xcoord":-61,"desc":"Tomas Kaberle SHOT on Pascal Leclaire","teamid":10,"strength":701,"pid":8465200,"formalEventId":"TOR8","period":1,"type":"Shot","p3name":"","eventid":8,"p2name":"Pascal Leclaire","ycoord":21,"pid3":"","time":"00:38","playername":"Tomas Kaberle","p1name":"Tomas Kaberle","video":"2_26_ott_tor_0910_TOR8_save_800K_16x9.flv","pid2":8469461,"pid1":8465200}}})
Thanks in advance!
I wrote an article on fetching JSON from a URL and converting to a data frame, which might help you to get started.
You can fetch the data using getURL() in the RCurl library, like this:
library(RCurl)
j <- getURL("http://foo.com/data/20092010/20090xxxxx/PxP.jsonp")
Next, fromJSON() in the rjson package should convert it to a list:
library(rjson)
j.list <- fromJSON(j)
You can then construct a data frame from the list. For example, to get a column named "sweater", try:
j.df <- data.frame(sweater = sapply(j.list, function(x) x$sweater))
Just add more columns as arguments to data.frame() using the other JSON keys.
To add the "xxxxx", you'll need to parse the URL using something like grep().
Once you have your data frame, you can write to CSV using either write.table() or write.csv(). For many URLs, you'll have to figure out how to combine the lists generated by fromJSON() into one data frame.
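To put those pieces together for your case, here is an untested sketch. The game codes are hypothetical placeholders, and it assumes the response is wrapped as loadPlayByPlay(...) like your sample, so the wrapper has to be stripped before fromJSON() can parse it:
library(RCurl)
library(rjson)

game_codes <- c("20001", "20002")  # hypothetical game codes; substitute your own list
keep <- c("sweater", "xcoord", "teamid", "strength", "period",
          "type", "ycoord", "time", "playername")

all_games <- lapply(game_codes, function(code) {
    j <- getURL(paste0("http://foo.com/data/20092010/20090", code, "/PxP.jsonp"))
    # Strip the JSONP wrapper so fromJSON() sees plain JSON
    j <- sub("^loadPlayByPlay\\(", "", j)
    j <- sub("\\)[[:space:]]*$", "", j)
    # Path into the plays follows the structure of the sample above
    plays <- fromJSON(j)$data$game$plays$play
    # One row per play, with the game code inserted as the first column
    do.call(rbind, lapply(plays, function(p) {
        data.frame(Gamecode = code, as.list(p[keep]), stringsAsFactors = FALSE)
    }))
})
write.csv(do.call(rbind, all_games), file = "PxP_all_games.csv", row.names = FALSE)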
There are R functions for reading anything from a URL (see help(download.file)), and also the rjson package on CRAN for handling JSON data. Some tweaking may be needed if it's really JSONP.
For a similar example, check out my geonames package - it reads JSON data from geonames.org and constructs data frames.
If it's not on CRAN then it's on R-Forge; I forget which.
Writing a file
Writing a file on the client is problematic in most browsers due to security restrictions.
In Internet Explorer only, you can write a file using execCommand; there is an example at http://4umi.com/web/javascript/filewrite.php.
Translating JSON to CSV
I ran into this converter from JSON to CSV:
http://skysanders.net/subtext/archive/2010/09/19/json-to-csv.aspx
Alternative
Generate the conversion on the server and download it to the browser as straight text (MIME type text/plain).