In DAML scenarios, how do I reuse getParty? - daml

I'm writing multiple scenarios, with a similar setup:
test0 = scenario do
  bank <- getParty "Bank"
  alice <- getParty "Alice"
  -- ....
  assert True

test1 = scenario do
  bank <- getParty "Bank"
  alice <- getParty "Alice"
  -- ...
  assert True
The linter is suggesting I reduce duplication:
/Foo.daml:5:3: Suggestion: Reduce duplication
Found:
bank <- getParty "Bank"
alice <- getParty "Alice"
assert True
Perhaps:
Combine with /Users/shaynefletcher/Foo.daml:11:3
How can I extract the setup from the scenario?

Got an answer from Shayne F:
parties = do
  bank <- getParty "Bank"
  alice <- getParty "Alice"
  return (bank, alice)

test0 = scenario do
  (bank, alice) <- parties
  -- ....
  assert True

test1 = scenario do
  (bank, alice) <- parties
  -- ...
  assert True
For clarity, the type signature for parties is:
parties : Scenario (Party, Party)

Related

R - Issue with the DOM of the Danish parliament (web scraping)

I've been working on a webscraping project for the political science department at my university.
The Danish parliament is very transparent about their democratic process and they upload all the legislative documents to their website. I've been crawling over all pages starting in 2008. Right now I'm parsing the information into a dataframe, and I'm having an issue that I have not been able to resolve so far.
If we look at the DOM we can see that they named most of the objects div.tingdok-normal. The number of objects varies between 16 and 19. To parse the information correctly for my dataframe I tried to grep out the necessary parts according to patterns. However, the issue is that sometimes my patterns match more than once, and I don't know how to tell R that I only want the first match.
For the sake of an example, I include some code:
# packages needed for this snippet
library(RCurl)  # getURL()
library(rvest)  # read_html(), html_nodes(), html_text(), %>%

final.url <- "https://www.ft.dk/samling/20161/lovforslag/l154/index.htm"
to.save <- getURL(final.url)
p <- read_html(to.save)
normal <- p %>% html_nodes("div.tingdok-normal > span") %>% html_text(trim = TRUE)
tomatch <- c("Forkastet regeringsforslag", "Forkastet privat forslag", "Vedtaget regeringsforslag", "Vedtaget privat forslag")
type <- unique(grep(paste(tomatch, collapse = "|"), results, value = TRUE))
Maybe you can help me with that.
My understanding is that you want to extract the text of the webpage, because the "tingdok-normal" objects relate to the text. I was able to get the text of the webpage with the following code. The code also identifies the position of the first regex hit for each of the patterns to match.
library(pagedown)
library(pdftools)
library(stringr)

pagedown::chrome_print("https://www.ft.dk/samling/20161/lovforslag/l154/index.htm",
                       "C:/.../danish.pdf")
text <- pdftools::pdf_text("C:/.../danish.pdf")

tomatch <- c("(A|a)ftalen", "(O|o)pholdskravet")
nb_Tomatch <- length(tomatch)

list_Position <- list()
list_Text <- list()

for(i in 1 : nb_Tomatch)
{
  # Locates the first hit of the regex
  # To locate all regex hits, use stringr::str_locate_all
  list_Position[[i]] <- stringr::str_locate(text, pattern = tomatch[i])
  list_Text[[i]] <- stringr::str_sub(string = text,
                                     start = list_Position[[i]][1, 1],
                                     end = list_Position[[i]][1, 2])
}
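As a quick aside on that comment, here is a minimal, self-contained illustration (on a toy string I made up, not the parliament page) of the difference between the first hit and all hits:
sample_text <- "aftalen ... aftalen ... opholdskravet"
stringr::str_locate(sample_text, "(A|a)ftalen")      # matrix with the first match only
stringr::str_locate_all(sample_text, "(A|a)ftalen")  # list with every match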
Here is another approach:
library(RDCOMClient)
library(stringr)
library(rvest)

url <- "https://www.ft.dk/samling/20161/lovforslag/l154/index.htm"
IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
IEApp$Navigate(url)
Sys.sleep(5)
doc <- IEApp$Document()
html_Content <- doc$documentElement()$innerText()

tomatch <- c("(A|a)ftalen", "(O|o)pholdskravet")
nb_Tomatch <- length(tomatch)

list_Position <- list()
list_Text <- list()

for(i in 1 : nb_Tomatch)
{
  # Locates the first hit of the regex
  # To locate all regex hits, use stringr::str_locate_all
  list_Position[[i]] <- stringr::str_locate(html_Content, pattern = tomatch[i])
  list_Text[[i]] <- stringr::str_sub(string = html_Content,
                                     start = list_Position[[i]][1, 1],
                                     end = list_Position[[i]][1, 2])
}

Scraping dynamic table in R

I am stuck on a simple web scrape.
My goal is to scrape Morningstar.com to retrieve the education of the managers associated with a fund name.
First off, let me say that I am not familiar at all with this operation. However, I did my best to provide some code.
For example, consider the following webpage
http://financials.morningstar.com/fund/management.html?t=AALGX&region=usa&culture=en_US
The problem is that the page dynamically loads the section I am targeting, so it doesn't actually get pulled in by read_html().
So what I did was access the data loaded in my section of interest.
Specifically, I did:
# edit: added packages required
library(xml2)
library(rvest)
library(stringi)
# original code
tmp_url <- "http://financials.morningstar.com/fund/management.html?t=AALGX&region=usa&culture=en_US"
pg <- read_html(tmp_url)
tmp <- length(html_nodes(pg, xpath=".//script[contains(., 'function loadManagerInfo()')]"))
html_nodes(pg, xpath=".//script[contains(., 'function loadManagerInfo()')]") %>%
  html_text() %>%
  stri_split_lines() %>%
  .[[1]] -> js_lines
idx <- which(stri_detect_fixed(js_lines, '\t\t\"//financials.morningstar.com/oprn/c-managers.action?&t='))
start <- nchar("\t\t\"//financials.morningstar.com/oprn/c-managers.action?&t=")+1
id <- substr(js_lines[idx],start, start+9)
tab <- read_html(paste0("http://financials.morningstar.com/oprn/c-managers.action?&t=",id,"&region=usa&culture=en-US&cur=&callback=jsonp1523529017966&_=1523529019244"), options = "HUGE")
The object tab contains the information I need.
What I need to do now is to create a dataframe associating to each manager name, his or her manager education.
I could try to do this by transforming my object into a string, then extracting the characters following the word "Education". However, this looks extremely inefficient.
I was wondering if anyone can provide some guidance.
This thing really is a mess - nice work getting the links and downloading the info.
After poking around a lot and taking various detours, this is the best I could come up with:
Clean Up
First there is some cleanup to do. Instead of directly downloading and parsing the document in one step we will:
download the document as text
clean up the text a little to get the JSON
parse the JSON
extract the HTML item
do some further cleaning
finally parse the HTML
library(rvest)    # read_html(), html_nodes(), html_text(), %>%
library(stringi)  # stri_* helpers used further below

# 'id' is the fund-specific id extracted in the question's code above
url <-
  paste0(
    "http://financials.morningstar.com/oprn/c-managers.action?&t=",
    id,
    "&region=usa&culture=en-US&cur=&callback=jsonp1523529017966&_=1523529019244"
  )

txt <-
  readLines(url, warn = FALSE)

json <-
  txt %>%
  gsub("^jsonp\\d+\\(", "", .) %>%
  gsub("\\)$", "", .)

json_parsed <-
  jsonlite::fromJSON(json)

html_clean <-
  json_parsed$html %>%
  gsub("\t", "", .)

html_parsed <-
  read_html(html_clean)
First Round of Node Extraction
Next we use some black magic node extraction trickery. Basically the trick goes like this: If we have a node set (the thing you get when using html_nodes) we can use further XPath queries to drill down.
The first node set (cvs) captures the basic path to the CV entries in the table.
The second node set (info_tmp) drills down a little further to get those parts of the CV entries where further information ("Other Assets Managed", "Education", etc.) is stored.
cvs <-
  html_parsed %>%
  html_nodes(xpath = "/html/body/table/tbody/tr[not(@align='left')]")

info_tmp <-
  cvs %>%
  html_nodes(xpath = "td/table/tbody")
Building up Data.Frame 1
There is a little problem with the table. Each CV entry lives in its own table row. For name, from, to and description there is always exactly one item per CV entry, but for "Other Assets Managed", "Education", etc. this is not true.
Therefore, information extraction is done in two parts.
df <-
  cvs %>%
  lapply(
    FUN =
      function(x){
        tmp <-
          x %>%
          html_nodes(xpath = "th") %>%
          html_text() %>%
          gsub(" +", "", .)
        data.frame(
          name = stri_extract(tmp, regex = "[. \\w]+"),
          from = stri_extract(tmp, regex = "\\d{2}/\\d{2}/\\d{4}"),
          to   = stri_extract(tmp, regex = "\\d{2}/\\d{2}/\\d{4}")
        )
      }
  ) %>%
  do.call(rbind, .)

df$description <-
  info_tmp %>%
  html_nodes(xpath = "tr[1]/td[1]") %>%
  html_text()

df$cv_id <- seq_len(nrow(df))
Building Up Data.Frame 2
Now some more html nodes trickery ... If we use html_nodes() on the result set of html_nodes(), we get all of the matching nodes and none of the non-matching nodes. This is a problem since we might get 1, 0 or multiple nodes per node-set node, basically destroying any information about where those newly selected nodes came from.
There is a solution however: we can use lapply to query each element of a node set independently from the others and thereby preserve information about the original structure.
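If the difference is not obvious, here is a tiny self-contained illustration (using a made-up HTML snippet, not the Morningstar page):
library(rvest)
toy  <- read_html("<div><p>a</p></div><div></div><div><p>b</p><p>c</p></div>")
divs <- html_nodes(toy, "div")

# Querying the whole node set flattens the result: three <p> nodes,
# with no record of which <div> each one came from.
length(html_nodes(divs, "p"))

# Querying each node separately preserves the structure (lengths 1, 0, 2),
# so the empty <div> still shows up as an empty result.
lapply(divs, function(d) html_text(html_nodes(d, "p")))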
extract_key_value_pairs <-
  function(i, info_tmp){
    cv_id <-
      seq_along(info_tmp)
    key <-
      lapply(
        info_tmp,
        function(x){
          tmp <-
            x %>%
            html_nodes(xpath = paste0("tr[", i, "]/td[1]")) %>%
            html_text()
          if ( length(tmp) == 0 ) {
            return("")
          } else {
            return(tmp)
          }
        }
      )
    value <-
      lapply(
        info_tmp,
        function(x){
          tmp <-
            x %>%
            html_nodes(xpath = paste0("tr[", i, "]/td[2]")) %>%
            html_text() %>%
            stri_trim_both() %>%
            stri_split(fixed = "\n") %>%
            lapply(X = ., stri_trim_both)
          if ( length(tmp) == 0 ) {
            return("")
          } else {
            return(unlist(tmp))
          }
        }
      )
    df <-
      mapply(
        cv_id = cv_id,
        key = key,
        value = value,
        FUN =
          function(cv_id, key, value){
            data.frame(
              cv_id = cv_id,
              key = key,
              value = value
            )
          },
        SIMPLIFY = FALSE
      ) %>%
      do.call(rbind, .)
    df[df$key != "", ]
  }

df2 <-
  lapply(
    X = c(3, 5, 7),
    FUN = extract_key_value_pairs,
    info_tmp = info_tmp
  ) %>%
  do.call(rbind, .)
Results
df
## name from to description cv_id
## 1 Kurt J. Lauber 03/20/2013 03/20/2013 Mr. Lauber ... 1
## 2 Noah J. Monsen 02/28/2018 02/28/2018 Mr. Monsen ... 2
## 3 Lauri Brunner 09/30/2018 09/30/2018 Ms. Brunne ... 3
## 4 Darren M. Bagwell 02/29/2016 02/29/2016 Darren M. ... 4
## 5 David C. Francis 10/07/2011 10/07/2011 Francis is ... 5
## 6 Michael A. Binger 04/14/2010 04/14/2010 Binger has ... 6
## 7 David E. Heupel 04/14/2010 04/14/2010 Mr. Heupel ... 7
## 8 Matthew D. Finn 03/30/2007 03/30/2007 Mr. Finn h ... 8
## 9 Scott Vergin 03/30/2007 03/30/2007 Vergin has ... 9
## 10 Frederick L. Plautz 11/01/1995 11/01/1995 Plautz has ... 10
## 11 Clyde E. Bartter 01/01/1994 01/01/1994 Bartter is ... 11
## 12 Wayne C. Stevens 01/01/1994 01/01/1994 Stevens is ... 12
## 13 Julian C. Ball 07/16/1987 07/16/1987 Ball is a ... 13
df2
## cv_id key value
## 1 Other Assets Managed
## 2 Other Assets Managed
## 3 Other Assets Managed
## 4 Certification CFA
## 4 Other Assets Managed
## 5 Certification CFA
## 5 Education M.B.A. University of Pittsburgh, 1978
## 5 Education B.A. University of Pittsburgh, 1977
## 5 Other Assets Managed
## 6 Certification CFA
## 6 Education M.B.A. University of Minnesota, 1991
## 6 Education B.S. University of Minnesota, 1987
## 6 Other Assets Managed
## 7 Other Assets Managed
## 8 Certification CFA
## 8 Education B.A. University of Pennsylvania, 1984
## 8 Education M.B.A. University of Michigan, 1990
## 8 Other Assets Managed
## 9 Certification CFA
## 9 Education M.B.A. University of Minnesota, 1980
## 9 Education B.A. St. Olaf College, 1976
## 9 Other Assets Managed
## 10 Education M.S. University of Wisconsin, 1981
## 10 Education B.B.A. University of Wisconsin, 1979
## 10 Other Assets Managed
## 11 Certification CFA
## 11 Education M.B.A. Western Reserve University, 1964
## 11 Education B.A. Baldwin-Wallace College, 1953
## 11 Other Assets Managed
## 12 Certification CFA
## 12 Education M.B.A. University of Wisconsin,
## 12 Education B.B.A. University of Miami,
## 12 Other Assets Managed
## 13 Certification CFA
## 13 Education B.A. Kent State University, 1974
## 13 Education J.D. Cleveland State University, 1984
## 13 Other Assets Managed
I don't have a solution, as this is not an area I have worked with before. However, with brute force you can probably get the table, assuming you have a list of rules that can parse the text to a data frame.
Thought I'd share what I have, though:
# get the text ('tab' comes from the code in the question)
library(xml2)  # xml_text()
f <- xml_text(tab)
# split up, this bit is tricky..
split_f <- strsplit(f, split="\\\\t", perl=TRUE)[[1]]
split_f <- strsplit(split_f, split="\\\\n", perl=TRUE)
split_f <- unlist(split_f)
split_f <- trimws(split_f)
# find ones to remove
sort(table(split_f), decreasing = T)[1:5]
split_f <- split_f[split_f!="—"]
split_f <- split_f[split_f!=""]
# manually found where to split
keep <- split_f[2:108]
# text looks ok, but would need rules to extract the rows in to a data.frame
View(keep)

R: Selecting certain counties from a JSON file

I've imported a JSON file into R from ( http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json ) and I'm trying to select only counties in Kansas.
Right now I have all the data in one variable, and I'm trying to make a subset of it that contains just the counties of Kansas. I'm not sure how to go about this.
What you have there is GeoJSON, which can be read directly by library(sf) to give you an sf object, which is also a data.frame. Then you can use the usual data.frame subsetting operations:
library(sf)
sf <- sf::read_sf("http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json")
sf[sf$NAME == "Kansas", ]
# Simple feature collection with 1 feature and 5 fields
# geometry type: MULTIPOLYGON
# dimension: XY
# bbox: xmin: -102.0517 ymin: 36.99308 xmax: -94.58993 ymax: 40.00316
# epsg (SRID): 4326
# proj4string: +proj=longlat +datum=WGS84 +no_defs
# GEO_ID STATE NAME LSAD CENSUSAREA geometry
# 30 0400000US20 20 Kansas 81758.72 MULTIPOLYGON(((-99.541116 3...
And seeing as you want the individual counties, you need to use the counties data set
sf_counties <- sf::read_sf("http://eric.clst.org/wupl/Stuff/gz_2010_us_050_00_500k.json")
sf_counties[sf_counties$STATE == 20, ]
To stay with a JSON workflow, you can try jqr:
library(jqr)
url <- 'http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json'
download.file(url, (f <- tempfile(fileext = ".json")))
res <- paste0(readLines(f), collapse = " ")
out <- jq(res, '.features[] | select(.properties.NAME == "Kansas")')
You can map it easily like this:
library(leaflet)
leaflet() %>%
addTiles() %>%
addGeoJSON(out) %>%
setView(-98, 38, 6)
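If you also want the selected feature back as an R object rather than as a JSON string, one small follow-up (my assumption, using jsonlite alongside jqr) could be:
library(jsonlite)
kansas <- jsonlite::fromJSON(out)  # 'out' is the jq() selection above
kansas$properties$NAME
# [1] "Kansas"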
library(rjson)
lst=fromJSON(file = 'http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json')
index = which(sapply(lapply(lst$features,"[[",'properties'),'[[','NAME')=='Kansas')
subdata = lst$features[[index]]
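The rjson code above pulls out the state outline; if what you actually need is the individual counties, a hedged variant of the same idea against the county-level file (assuming its STATE property is stored as the string "20", as in the sf answer above) might look like:
lst_counties <- fromJSON(file = 'http://eric.clst.org/wupl/Stuff/gz_2010_us_050_00_500k.json')
kansas_counties <- Filter(function(f) f$properties$STATE == "20", lst_counties$features)
length(kansas_counties)  # number of Kansas county features kept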

For Loop reading API response with existing Data Frame

I have a dataframe:
df
NAME ARTISTNAME COL3
1 Everything_Now (continued) Arcade Fire Everything_Now%20(continued)%20Arcade%20Fire
2 Everything Now Arcade Fire Everything%20Now%20Arcade%20Fire
3 Signs of Life Arcade Fire Signs%20of%20Life%20Arcade%20Fire
4 Creature Comfort Arcade Fire Creature%20Comfort%20Arcade%20Fire
5 Peter Pan Arcade Fire Peter%20Pan%20Arcade%20Fire
6 Chemistry Arcade Fire Chemistry%20Arcade%20Fire
My goal is to loop over this with the Genius Lyrics API to get the lyric URL for each value in COL3.
If I were to not loop this and just do it for each song individually, then my output for one would look like this:
library(httr)      # GET(), add_headers()
library(jsonlite)  # fromJSON(), toJSON()

# HeaderValue holds my API authorization header (defined elsewhere)
genius_url <- "https://api.genius.com/search?q=Everything_Now%20(continued)%20Arcade%20Fire"
getgeniuslyrics <- GET(genius_url, add_headers(Authorization = HeaderValue))
geniuslyrics <- jsonlite::fromJSON(toJSON(content(getgeniuslyrics)))
answer <- data.frame(geniuslyrics$response$hits$result$url[1])
answer
X.https...genius.com.Arcade.fire.everything.now.continued.lyrics.
1 https://genius.com/Arcade-fire-everything-now-continued-lyrics
str(answer)
'data.frame': 1 obs. of 1 variable:
$ X.https...genius.com.Arcade.fire.everything.now.continued.lyrics.: Factor w/ 1 level "https://genius.com/Arcade-fire-everything-now-continued-lyrics": 1
This was my attempt at the for-loop so far but I am getting an error:
for(i in 1:length(df[,3])) {
genius_url <- paste("https://api.genius.com/search?q=",
df3[i,3],
sep="")
getgeniuslyrics <- GET(genius_url, add_headers(Authorization = HeaderValue))
geniuslyrics <- jsonlite::fromJSON(toJSON(content(getgeniuslyrics)))
answer <- data.frame(geniuslyrics$response$hits$result$url[1])
df[i,4] <- answer[1,]
}
The error message I am getting is:
Error in x[...] <- m : replacement has length zero
In addition: There were 26 warnings (use warnings() to see them)
Hope this makes sense. Any help would be great, thanks.
Does your dataframe already have column three, or do you have to create it from columns 1 and 2? I assumed you have to create the third column from the first and the second.
Try rewriting the single-song trial as a function, like this:
funfun <- function(...){
  x <- unlist(list(...))
  A <- paste(unlist(lapply(x, strsplit, " ")), collapse = "%20")
  genius_url <- paste0("https://api.genius.com/search?q=", A)
  getgeniuslyrics <- GET(genius_url, add_headers(Authorization = HeaderValue))
  geniuslyrics <- jsonlite::fromJSON(toJSON(content(getgeniuslyrics)))
  answer <- data.frame(geniuslyrics$response$hits$result$url[1])
  answer
}
Now maybe from here you can loop or use apply functions:
apply(df[,1:2],1,funfun)
In case you already have the third column, your life is easier:
funfun_1 <- function(x){
  genius_url <- paste0("https://api.genius.com/search?q=", x)
  getgeniuslyrics <- GET(genius_url, add_headers(Authorization = HeaderValue))
  geniuslyrics <- jsonlite::fromJSON(toJSON(content(getgeniuslyrics)))
  answer <- data.frame(geniuslyrics$response$hits$result$url[1])
  answer
}
sapply(df[,3],funfun_1)
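To get the results back into the data frame (the df[i, 4] assignment the question was aiming for), one rough sketch, assuming HeaderValue is defined and every query returns at least one hit, could be:
# Collect one URL per row of df and store it in a new column.
df$LYRICURL <- sapply(df[[3]], function(q) {
  hit <- funfun_1(q)            # 1 x 1 data frame holding the first url
  as.character(unlist(hit))[1]  # flatten it to a plain character string
})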

R: jsonlite - export key:value pairs from a list of lists

I have a list of lists which are of variable length. The first value of each nested list is the key, and the rest of the values in the list will be the array entry. It looks something like this:
[[1]]
[1] "Bob" "Apple"
[[2]]
[1] "Cindy" "Apple" "Banana" "Orange" "Pear" "Raspberry"
[[3]]
[1] "Mary" "Orange" "Strawberry"
[[4]]
[1] "George" "Banana"
I've extracted the keys and entries as follows:
keys <- lapply(x, '[', 1)
entries <- lapply(x, '[', -1)
but now that I have these, I don't know how I can associate a key:value pair in R without creating a matrix first, and that seems silly since my data don't fit in a rectangle anyway (every example I've seen uses the column names from a matrix as the key values).
This is my crappy method using a matrix, assigning column names, and then using jsonlite to export to JSON.
# Create a matrix from entries, without recycling
# I found this function on StackOverflow which seems to work...
cbind.fill <- function(...){
  nm <- list(...)
  nm <- lapply(nm, as.matrix)
  n <- max(sapply(nm, nrow))
  do.call(cbind, lapply(nm, function (x)
    rbind(x, matrix(, n - nrow(x), ncol(x)))))
}

# Call said function
matrix <- cbind.fill(entries)

# Transpose the thing
matrix <- t(matrix)

# Set column names
colnames(matrix) <- keys

# Export to json
json <- toJSON(matrix)
The result is good, but the implementation sucks. Result:
[{"Bob":["Apple"],"Cindy":["Apple","Banana","Orange","Pear","Raspberry"],"Mary":["Orange","Strawberry"],"George":["Banana"]}]
Please let me know of better ways that might exist to accomplish this.
How about:
names(entries) <- unlist(keys)
toJSON(entries)
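For completeness, a self-contained run of that idea on the sample data from the question (the object names x, keys and entries are taken from the question's own code):
library(jsonlite)

x <- list(c("Bob", "Apple"),
          c("Cindy", "Apple", "Banana", "Orange", "Pear", "Raspberry"),
          c("Mary", "Orange", "Strawberry"),
          c("George", "Banana"))

keys    <- lapply(x, '[', 1)   # first element of each nested list
entries <- lapply(x, '[', -1)  # everything after the first element

names(entries) <- unlist(keys)
toJSON(entries)
# {"Bob":["Apple"],"Cindy":["Apple","Banana","Orange","Pear","Raspberry"],"Mary":["Orange","Strawberry"],"George":["Banana"]}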
Consider the following lapply() approach:
library(jsonlite)

entries <- list(c('Bob', 'Apple'),
                c('Cindy', 'Apple', 'Banana', 'Orange', 'Pear', 'Raspberry'),
                c('Mary', 'Orange', 'Strawberry'),
                c('George', 'Banana'))

# ITERATE ALL CONTENTS EXCEPT FIRST
inner <- list()
nestlist <- lapply(entries,
                   function(i) {
                     inner <- i[2:length(i)]
                     return(inner)
                   })

# NAME EACH ELEMENT WITH FIRST ELEMENT
names(nestlist) <- lapply(entries, function(i) i[1])

#$Bob
#[1] "Apple"
#$Cindy
#[1] "Apple" "Banana" "Orange" "Pear" "Raspberry"
#$Mary
#[1] "Orange" "Strawberry"
#$George
#[1] "Banana"

x <- toJSON(list(nestlist), pretty=TRUE)
x
#[
#  {
#    "Bob": ["Apple"],
#    "Cindy": ["Apple", "Banana", "Orange", "Pear", "Raspberry"],
#    "Mary": ["Orange", "Strawberry"],
#    "George": ["Banana"]
#  }
#]
I think this has already been sufficiently answered but here is a method using purrr and jsonlite.
library(purrr)
library(jsonlite)

sample_data <- list(
  list("Bob", "Apple"),
  list("Cindy", "Apple", "Banana", "Orange", "Pear", "Raspberry"),
  list("Mary", "Orange", "Strawberry"),
  list("George", "Banana")
)

sample_data %>%
  map(~set_names(list(.x[-1]), .x[1])) %>%
  toJSON(auto_unbox = TRUE, pretty = TRUE)