In a DAML contract, how do I extract the name of a party from a Party field?
Currently, toText p gives me Party(Alice). I'd like to only keep the name of the party.
That you care about the precise formatting of the resulting string suggests that you are implementing a codec in DAML. As a general principle DAML excels as a modelling/contract language, but consequently has limited features to support the sort of IO-oriented work this question implies. You are generally better off returning DAML values, and implementing codecs in Java/Scala/C#/Haskell/etc interfacing with the DAML via the Ledger API.
Still, once you have a Text value you also have access to the standard List manipulation functions via unpack, so converting "Party(Alice)" to "Alice" is not too difficult:
daml 1.0 module PartyExtract where
import Base.List
def pack (cs: List Char) : Text =
foldl (fun (acc: Text) (c: Char) -> acc <> singleton c) "" cs;
def partyToText (p: Party): Text =
pack $ reverse $ drop 2 $ reverse $ drop 7 $ unpack $ toText p
test foo : Scenario {} = scenario
let p = 'Alice'
assert $ "Alice" == partyToText p
In DAML 1.2 the standard library has been expanded, so the code above can be simplified:
daml 1.2
module PartyExtract2
where
import DA.Text
traceDebug : (Show a, Show b) => b -> a -> a
traceDebug b a = trace (show b <> show a) $ a
partyToText : Party -> Text
partyToText p = dropPrefix "'" $ dropSuffix "'" $ traceDebug "show party: " $ show p
foo : Scenario ()
foo = do
p <- getParty "Alice"
assert $ "Alice" == (traceDebug "partyToText party: " $ partyToText p)
NOTE: I have left the definition and calls to traceDebug so you can see the exact strings being generated in the scenario trace output.
Related
I have a simple csv created as part of a workflow, like below:
sample,value
A,1
B,0.5
Separately, I have another channel with file names matching the sample names. I'd like to be able to use the values associated with each sample name within a new process.
I've tried splitting the CSV using .splitCsv but (unsurprisingly) sometimes the incorrect value gets used with a sample, although it does run the correct number of times. I've also tried just using awk within the script to pull out the corresponding value and save it to a variable, and this causes the correct value to be used, but it consumes the CSV file and so only one sample gets processed.
Super simplified nextflow (DSL2) script:
#!/usr/bin/env nextflow
nextflow.enable.dsl=2
process foo {
input:
path input_file
output:
path 'file.csv', emit csv
"""
script that creates csv
"""
}
process bar {
input:
path input_file2
output:
path 'file.bam', emit bam
"""
script that creates bam files
"""
}
process help_me {
input:
path csv
path bam
output:
path 'result'
"""
script that uses value from csv on associated bam file
"""
}
workflow {
foo(params.input)
bar(params.input2)
help_me(foo.out.csv, bar.out.bam)
}
Thanks!!
Edit: In essence, is there a way to synchronize two channels such that I can use a csv's individual rows with associated files?
If you have a value channel, you can reuse a file (like a CSV) an unlimited number of times without consuming the channel. For example:
workflow {
input1 = file( params.input1 )
input2 = file( params.input2 )
foo( input1 )
bar( input2 )
help_me(foo.out.csv, bar.out.bam)
}
Here, both input1 and input2 are value channels. Also, (emphasis mine):
A value channel is implicitly created by a process when an input
specifies a simple value in the from clause. Moreover, a value channel
is also implicitly created as output for a process whose inputs are
only value channels.
Means that both foo.out.csv and bar.out.bam are also value channels. Additionally, help_me.out is also a value channel. If input2 was instead a queue channel, you can see that input1 can be re-used an unlimited number of times:
$ mkdir -p ./path/to/bams
$ touch ./path/to/bams/{A,B,C}.bam
$ touch ./foo.txt
params.input1 = './foo.txt'
params.input2 = './path/to/bams/*.bam'
workflow {
input1 = file( params.input1 )
input2 = Channel.fromPath( params.input2 )
foo( input1 )
bar( input2 )
help_me(foo.out.csv, bar.out.bam)
}
Results:
$ nextflow run script.nf
N E X T F L O W ~ version 22.04.0
Launching `script.nf` [trusting_allen] DSL2 - revision: 75209e4c85
executor > local (7)
[24/d459f7] process > foo [100%] 1 of 1 ✔
[04/a903e4] process > bar (2) [100%] 3 of 3 ✔
[24/7a9a1d] process > help_me (3) [100%] 3 of 3 ✔
Note that bar.out.bam and help_me.out are now queue channels.
If instead, you have one CSV per sample (or similar configuration), you will need some way to join these channels prior and adjust your new process' input declaration accordingly. What you want to avoid is declaring two (or more) queue channels in your input block. This part of docs is well worth the time investment: Understand how multiple input channels work, and would explain why you saw the incorrect value being associated with a particular sample when consuming the splitCsv output. To join these channels, you can use the join operator. For example, given your simple CSV (as 'foo.csv') and the test bams created previously:
nextflow.enable.dsl=2
params.input1 = './foo.csv'
params.input2 = './path/to/bams/*.bam'
process help_me {
debug true
input:
tuple val(sample), val(myval), path(bam)
output:
path 'result'
"""
echo -n "sample: ${sample}, myval: ${myval}, bam: ${bam}"
touch result
"""
}
workflow {
Channel.fromPath( params.input1 ) \
| splitCsv( header:true ) \
| map { row -> tuple( row.sample, row.value ) } \
| set { rows_ch }
Channel.fromPath( params.input2 ) \
| map { bam -> tuple( bam.baseName, bam ) } \
| join( rows_ch ) \
| map { sample, bam, myval -> tuple( sample, myval, bam ) } \
| help_me
}
Results:
$ nextflow run script.nf
N E X T F L O W ~ version 22.04.0
Launching `script.nf` [lethal_mayer] DSL2 - revision: 395732babc
executor > local (2)
[c5/e96085] process > help_me (1) [100%] 2 of 2 ✔
sample: B, myval: 0.5, bam: B.bam
sample: A, myval: 1, bam: A.bam
If your CSV has more than one value for a particalar sample and these are specified on seperate lines, you probably want instead the combine operator. For example, if your 'foo.csv' contains:
sample,value
A,1
B,0.5
B,2
And replace, join( rows_ch ) with combine( rows_ch, by:0 ) in the above example. Results:
nextflow run script.nf
N E X T F L O W ~ version 22.04.0
Launching `script.nf` [festering_miescher] DSL2 - revision: f8de1e0d20
executor > local (3)
[ee/8af543] process > help_me (3) [100%] 3 of 3 ✔
sample: A, myval: 1, bam: A.bam
sample: B, myval: 0.5, bam: B.bam
sample: B, myval: 2, bam: B.bam
I am trying to create a robust glob pattern that will match most of the different naming conventions used for fastq files we receive. However, the version of nextflow I am using (20.10.0) on the HPC doesn't seem to accept what I've written.
Here are some examples of file names:
19_S8_R1_001.fastq.gz 19_S8_R2_001.fastq.gz
F1HD1_S28_R1.fastq.gz F1HD1_S28_R2.fastq.gz
SRR3137747_1.fastq SRR3137747_2.fastq
The pattern I originally wrote to go with the fromFilePairs operator was *_?(R){1,2}?(_001).f?(ast)q?(.gz). Which I tested in a bash environment. Here is the output from testing in the directory with the top two example files:
-bash-4.2$ shopt -s extglob
-bash-4.2$ ls -1 *_?(R){1,2}?(_001).f?(ast)q?(.gz)
19_S8_R1_001.fastq.gz
19_S8_R2_001.fastq.gz
But when I tried to run this with nextflow, it just gave me the error message I put into the ifEmpty operator.
I've eventually got it working, but using this pattern: *_{R1,R2,1,2}{.fastq.gz,.fq.gz,.fastq,.fq,_001.fastq.gz,_001.fq.gz,_001.fastq,_001.fq}, which isn't particularly robust.
Unless I've missed it in the nextflow documentation (and the information I've found about glob), I don't see alternatives to match-zero-or-more operators in nextflow. Any alternative solutions?
Thanks in advance.
The following glob pattern seems to match some of the more common FASTQ filenames:
Channel
.fromFilePairs( '*_{,R}{1,2}{,_001}.{fq,fastq}{,.gz}' )
.view()
Or with a parameterized directory prefix:
Channel
.fromFilePairs( "${params.input_dir}/*_{,R}{1,2}{,_001}.{fq,fastq}{,.gz}" )
.view()
Results:
N E X T F L O W ~ version 21.04.3
Launching `script.nf` [serene_austin] - revision: 5527b9b3c0
[SRR3137747, [/path/to/fasta/SRR3137747_1.fastq, /path/to/fasta/SRR3137747_2.fastq]]
[19_S8, [/path/to/fasta/19_S8_R1_001.fastq.gz, /path/to/fasta/19_S8_R2_001.fastq.gz]]
[F1HD1_S28, [/path/to/fasta/F1HD1_S28_R1.fastq.gz, /path/to/fasta/F1HD1_S28_R2.fastq.gz]]
Another option, which might be more robust (and readable), is to make use of the fact that you can specify more than one glob pattern using a list as argument, and build your list of glob patterns dynamically:
nextflow.enable.dsl=2
params.input_dir = '/path/to/fasta'
def cartesian_product(A, B) {
A.collectMany{ a -> B.collect { b -> [a, b] } }
}
def extensions = [
'.fastq.gz',
'.fastq',
'.fq.gz',
'.fq',
]
def suffixes = [
'*_R{1,2}_001',
'*_R{1,2}',
'*_{1,2}',
]
workflow {
def patterns = cartesian_product(suffixes, extensions).collect {
"${params.input_dir}/${it.join()}"
}
Channel.fromFilePairs( patterns ).view()
}
Results:
N E X T F L O W ~ version 21.04.3
Launching `script.nf` [deadly_payne] - revision: 6d2472ef23
[19_S8, [/path/to/fasta/19_S8_R1_001.fastq.gz, /path/to/fasta/19_S8_R2_001.fastq.gz]]
[F1HD1_S28, [/path/to/fasta/F1HD1_S28_R1.fastq.gz, /path/to/fasta/F1HD1_S28_R2.fastq.gz]]
[SRR3137747, [/path/to/fasta/SRR3137747_1.fastq, /path/to/fasta/SRR3137747_2.fastq]]
I'm really new to Haskell and am trying to learn some simple functions.
I made some option choices through functions like so:
displayOptions :: Int -> String
displayOptions option
| option == 0 = "1 - Shop\n2 - Fight Monsters\n3 - Inn\n4 - Monk"
| otherwise = "invalid"
I then get the user input with getLine
choice <- getLine
And then I display a second option box for example,
playerChoice :: String -> String
playerChoice option
| option == "1" = "Sword - 50gp\nShield - 100gp"
| option == "2" = "You go fight some monsters outside town."
| option == "3" = "You go to the town Inn."
| option == "4" = "You go see the holy monk."
| otherwise = "You entered invalid information...\n" ++ displayOptions(0)
What I'm confused about is how I can get the user's choice again within a function. I want my otherwise = to say invalid information, display the options, get the input again and then display the choice they made.
So my main program would look something like this:
main = do
putStrLn "The king has sent you on the journey to become a master."
putStrLn $ "What would you like to do?"
putStrLn $ displayOptions(0)
choice <- getLine
putStrLn $ playerChoice(choice)
You would have to change the return type to be IO String instead of String.
However, probably you want to return Either String String so the function indicates that it's returned the game progression text Right "You do something" or a failure with an explanation of the failure Left "Not an option".
Then in the caller you loop until you get a Right value and each time you get a Left value you print the text and ask again.
I'm sure there's a slightly better way but here's some quickly fixed up code:
module Main where
playerChoice :: String -> Either String String
playerChoice option
| option == "1" = Right "Sword - 50gp\nShield - 100gp"
| option == "2" = Right "You go fight some monsters outside town."
| option == "3" = Right "You go to the town Inn."
| option == "4" = Right "You go see the holy monk."
| otherwise = Left "You entered invalid information..."
displayOptions :: Int -> String
displayOptions option
| option == 0 = "1 - Shop\n2 - Fight Monsters\n3 - Inn\n4 - Monk\n"
| otherwise = "invalid"
main = do
let progress whathappens = do
putStrLn whathappens
let tryAsk prompt = do
putStrLn prompt
choice <- getLine
either tryAsk progress $ playerChoice(choice)
tryAsk $ displayOptions(0) ++ "What would you like to do?"
progress "The king has sent you on the journey to become a master."
if you import Data.Function then you can also write it like the following - which in this case probably isn't better but it's a nice shallow step into a fascinating part of haskell:
fix (\moreProgress whathappens -> do
putStrLn whathappens
fix (\askAgain prompt -> do
putStrLn prompt
choice <- getLine
either askAgain moreProgress $ playerChoice(choice))
$ displayOptions(0) ++ "What would you like to do?")
$ "The king has sent you on the journey to become a master."
You should change the return type to Either String String in order to provide an error message. For more details, look at the docs for the Either type.
In Haskell, we don't have traditional looping structures like for or while instead we use recursive calls. See While loop in Haskell with a condition for an example.
Looking to extract records from a table in a very well formed HTMl table using HXT. I've reviewed a couple of examples on SO and the HXT documentation, such as:
Extracting Values from a Subtree
http://adit.io/posts/2012-04-14-working_with_HTML_in_haskell.html
https://www.schoolofhaskell.com/school/advanced-haskell/xml-parsing-with-validation
Running Haskell HXT outside of IO?
extract multiples html tables with hxt
Parsing html in haskell
http://neilbartlett.name/blog/2007/08/01/haskell-explaining-arrows-through-xml-transformationa/
https://wiki.haskell.org/HXT/Practical/Simple2
https://wiki.haskell.org/HXT/Practical/Simple1
Group html table rows with HXT in Haskell
Parsing multiple child nodes in Haskell with HXT
My problem is:
I want to identify a table uniquely by a known id, and then for each
tr within that table, create a record object and return this as a list
of records.
Here's my HTML
<!DOCTYPE html>
<head>
<title>FakeHTML</title>
</head>
<body>
<table id="fakeout-dont-get-me">
<thead><tr><td>Null</td></tr></thead>
<tbody><tr><td>Junk!</td></tr></tbody>
</table>
<table id="Greatest-Table">
<thead>
<tr><td>Name</td><td>Favorite Rock</td></tr>
</thead>
<tbody>
<tr id="rock1">
<td>Fred</td>
<td>Igneous</td>
</tr>
<tr id="rock2">
<td>Bill</td>
<td>Sedimentary</td>
</tr>
</tbody>
</table>
</body>
</html>
Here's the code I'm trying, along with 2 different approaches to parsing this. First, imports ...
{-# LANGUAGE Arrows, OverloadedStrings, DeriveDataTypeable, FlexibleContexts #-}
import Text.XML.HXT.Core
import Text.HandsomeSoup
import Text.XML.HXT.XPath.XPathEval
import Data.Tree.NTree.TypeDefs
import Text.XML.HXT.XPath.Arrows
What I want is a list of Rockrecs, eg from...
recs = [("rock1", "Name", "Fred", "Favorite Rock", "Igneous"),
("rock2", "Name", "Bill", "Favorite Rock", "Sedimentary")]
data Rockrec = Rockrec { rockID:: String,
rockName :: String,
rockFav :: String} deriving Show
rocks = [(\(a,_,b,_,c) -> Rockrec a b c ) r | r <- recs]
-- [Rockrec {rockID = "rock1", rockName = "Fred", rockFav = "Igneous"},
-- Rockrec {rockID = "rock2", rockName = "Bill", rockFav = "Sedimentary"}]
Here's my first way, which uses a bind on runLA after I return a bunch of [XMLTree]. That is, I do a first parse just to get the right table, then I process the tree rows after that first grab.
Attempt 1
getTab = do
dt <- Prelude.readFile "fake.html"
let html = parseHtml dt
tab <- runX $ html //> hasAttrValue "id" (== "Greatest-Table")
return tab
-- hmm, now this gets tricky...
-- table <- getTab
node tag = multi (hasName tag)
-- a la https://stackoverflow.com/questions/3901492/running-haskell-hxt-outside-of-io?rq=1
getIt :: ArrowXml cat => cat (Data.Tree.NTree.TypeDefs.NTree XNode) (String, String)
getIt = (node "tr" >>>
(getAttrValue "id" &&& (node "td" //> getText)))
This kinda works. I need to massage a bit, but can get it to run...
-- table >>= runLA getIt
-- [("","Name"),("","Favorite Rock"),("rock1","Fred"),("rock1","Igneous"),("rock2","Bill"),("rock2","Sedimentary")]
This is a second approach, inspired by https://wiki.haskell.org/HXT/Practical/Simple1. Here, I think I'm relying on something in {-# LANGUAGE Arrows -} (which coincidentally breaks my list comprehension for rec above), to use the proc function to do this in a more readable do block. That said, I can't even get a minimal version of this to compile:
Attempt 2
getR :: ArrowXml cat => cat XmlTree Rockrec
getR = (hasAttrValue "id" (== "Greatest-Table")) >>>
proc x -> do
rockId <- getText -< x
rockName <- getText -< x
rockFav <- getText -< x
returnA -< Rockrec rockId rockName rockFav
EDIT
Trouble with the types, in response to the comment below from Alec
λ> getR [table]
<interactive>:56:1-12: error:
• Couldn't match type ‘NTree XNode’ with ‘[[XmlTree]]’
Expected type: [[XmlTree]] -> Rockrec
Actual type: XmlTree -> Rockrec
• The function ‘getR’ is applied to one argument,
its type is ‘cat0 XmlTree Rockrec’,
it is specialized to ‘XmlTree -> Rockrec’
In the expression: getR [table]
In an equation for ‘it’: it = getR [table]
λ> getR table
<interactive>:57:1-10: error:
• Couldn't match type ‘NTree XNode’ with ‘[XmlTree]’
Expected type: [XmlTree] -> Rockrec
Actual type: XmlTree -> Rockrec
• The function ‘getR’ is applied to one argument,
its type is ‘cat0 XmlTree Rockrec’,
it is specialized to ‘XmlTree -> Rockrec’
In the expression: getR table
In an equation for ‘it’: it = getR table
END EDIT
Even if I'm not selecting elements, I can't get the above to run. I'm also a little puzzled at how I should do something like put the first td in rockName and the second td in rockFav, how to include an iterator on these (supposing I have a lot of td fields, instead of just 2.)
Any further general tips on how to do this more painlessly appreciated.
From HXT/Practical/Google1 I think I am able to piece together a solution.
{-# LANGUAGE Arrows #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Hanzo where
import Text.HandsomeSoup
import Text.XML.HXT.Cor
atTag tag =
deep (isElem >>> hasName tag)
text =
deep isText >>> getText
data Rock = Rock String String String deriving Show
rocks =
atTag "tbody" //> atTag "tr"
>>> proc x -> do
rowID <- x >- getAttrValue "id"
name <- x >- atTag "td" >. (!! 0) >>> text
kind <- x >- atTag "td" >. (!! 1) >>> text
returnA -< Rock rowID name kind
main = do
dt <- readFile "html.html"
result <- runX $ parseHtml dt
//> hasAttrValue "id" (== "Greatest-Table")
>>> rocks
print result
The key takeways are these:
Your arrows work on streams of elements, but not individual elements. This is the ArrowList constraint. Thus, calling getText three times will produce surprising behavior because getText represents all the different possible text values you could get in the course of streaming <table> elements through your proc x -> do {...}.
What we can do instead is focus on the stream we want: a stream of <tr>s inside the <tbody>. For each table row, we grab the ID attribute value and the text of the first two <td>s.
This does not seem the most elegant solution, but one way we can index into a stream is to filter it down with the (>.) :: ArrowList cat => cat a b -> ([b] -> c) -> cat a c combinator.
One last trick, one that I noticed in the practical wiki examples: we can use deep and isElem/isText to focus on just the nodes we want. XML trees are noisy!
I have the following SML program in a file named testexc.sml:
structure TestExc : sig
val main : (string * string list -> int)
end =
struct
exception OhNoes;
fun main(prog_name, args) = (
raise OhNoes
)
end
I build it with smlnj-110.74 like this:
ml-build sources.cm TestExc.main testimg
Where sources.cm contains:
Group is
csx.sml
I invoke the program like so (on Mac OS 10.8):
sml #SMLload testimg.x86-darwin
I expect to see something when I invoke the program, but the only thing I get is a return code of 1:
$ sml #SMLload testimg.x86-darwin
$ echo $?
1
What gives? Why would SML fail silently on this unhandled exception? Is this behavior normal? Is there some generic handler I can put on main that will print the error that occurred? I realize I can match exception OhNoes, but what about larger programs with exceptions I might not know about?
The answer is to handle the exception, call it e, and print the data using a couple functions available in the system:
$ sml
Standard ML of New Jersey v110.74 [built: Tue Jan 31 16:23:10 2012]
- exnName;
val it = fn : exn -> string
- exnMessage;
val it = fn : exn -> string
-
Now, we have our modified program, were we have the generic handler tacked on to main():
structure TestExc : sig
val main : (string * string list -> int)
end =
struct
exception OhNoes;
open List;
fun exnToString(e) =
List.foldr (op ^) "" ["[",
exnName e,
" ",
exnMessage e,
"]"]
fun main(prog_name, args) = (
raise OhNoes
)
handle e => (
print("Grasshopper disassemble: " ^ exnToString(e));
42)
end
I used lists for generating the message, so to make this program build, you'll need a reference to the basis library in sources.cm:
Group is
$/basis.cm
sources.cm
And here's what it looks like when we run it:
$ sml #SMLload testimg.x86-darwin
Grasshopper disassemble: [OhNoes OhNoes (more info unavailable: ExnInfoHook not initialized)]
$ echo $?
42
I don't know what ExnInfoHook is, but I see OhNoes, at least. It's too bad the SML compiler didn't add a basic handler for us, so as to print something when there was an unhandled exception in the compiled program. I suspect ml-build would be responsible for that task.