How to iterate over all config entries in Groovy?

I have a Groovy script which plays a config role in my app. Its structure is like this:
a {
    b = val1
    c {
        d = val2
    }
}
e {
    f = val3
}
How can I iterate over the entries in this config so that I can separate the settings of one root from the settings of another? I mean a way of iterating where I can determine the root positions, something like this:
a (root)
b
c (subroot)
d
e (root)
f
And the config nesting is not limited to two levels, so iterating with simple nested 'for' loops is not suitable, because I don't know how many levels there will be at compile time.

You mean like this?
Given a configuration:
def cfg = '''
a {
    b = 'val1'
    c {
        d = 'val2'
    }
}
e {
    f = 'val3'
}'''
You can define a recursive walk method like so:
def walk( map, root=true ) {
    map.each { key, value ->
        if( value instanceof Map ) {
            println "$key (${root ? 'root' : 'subroot'})"
            walk( value, false )
        }
        else {
            println "$key"
        }
    }
}
Then call the function, passing in the slurped config:
walk( new ConfigSlurper().parse( cfg ) )
This prints:
a (root)
b
c (subroot)
d
e (root)
f
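Since the nesting isn't limited to two levels, a small variation of the same idea can track the actual depth instead of a root/subroot flag; a minimal sketch (the indentation is only for readability):
def walkDepth( map, depth = 0 ) {
    map.each { key, value ->
        def indent = '    ' * depth
        if( value instanceof Map ) {
            println "$indent$key (${depth == 0 ? 'root' : 'subroot'})"
            walkDepth( value, depth + 1 )
        }
        else {
            println "$indent$key"
        }
    }
}
walkDepth( new ConfigSlurper().parse( cfg ) )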
You can also have your config in a file (in this example, Config.groovy).
Then, you can change the walk call to:
walk( new ConfigSlurper().parse( Config ) )
It will produce the same output.
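If Config.groovy isn't compiled onto the classpath, ConfigSlurper can also parse the script text or a URL directly; a hedged sketch (the file path is illustrative):
// parse the raw text of the config file
walk( new ConfigSlurper().parse( new File( 'Config.groovy' ).text ) )

// or parse it via its URL
walk( new ConfigSlurper().parse( new File( 'Config.groovy' ).toURI().toURL() ) )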


Nextflow rename barcodes and concatenate reads within barcodes

My current working directory has the following sub-directories
My Bash script
Hi there
I have put together the above Bash script to do the following tasks:
rename the sub-directories (barcode01-12) using information from the metadata.csv
concatenate the individual reads within a sub-directory and move them up into the $PWD
then I use these concatenated reads (one per barcode) for my Nextflow script below:
Query:
How can I add the above pre-processing tasks (renaming and concatenating), or the Bash script itself, at the beginning of my Nextflow script below?
In my experience, FASTQ files can get quite large. Without knowing too much of the specifics, my recommendation would be to move the concatenation (and renaming) to a separate process. In this way, all of the 'work' can be done inside Nextflow's working directory. Here's a solution that uses the new DSL 2. It uses the splitCsv operator to parse the metadata and identify the FASTQ files. The collection can then be passed into our 'concat_reads' process. To handle optionally gzipped files, you could try the following:
params.metadata = './metadata.csv'
params.outdir = './results'

process concat_reads {

    tag { sample_name }

    publishDir "${params.outdir}/concat_reads", mode: 'copy'

    input:
    tuple val(sample_name), path(fastq_files)

    output:
    tuple val(sample_name), path("${sample_name}.${extn}")

    script:
    if( fastq_files.every { it.name.endsWith('.fastq.gz') } )
        extn = 'fastq.gz'
    else if( fastq_files.every { it.name.endsWith('.fastq') } )
        extn = 'fastq'
    else
        error "Concatenation of mixed filetypes is unsupported"

    """
    cat ${fastq_files} > "${sample_name}.${extn}"
    """
}
process pomoxis {

    tag { sample_name }

    publishDir "${params.outdir}/pomoxis", mode: 'copy'

    cpus 18

    input:
    tuple val(sample_name), path(fastq)

    """
    mini_assemble \\
        -t ${task.cpus} \\
        -i "${fastq}" \\
        -o results \\
        -p "${sample_name}"
    """
}
workflow {

    fastq_extns = [ '.fastq', '.fastq.gz' ]

    Channel.fromPath( params.metadata )
        | splitCsv()
        | map { dir, sample_name ->
            all_files = file(dir).listFiles()
            fastq_files = all_files.findAll { fn ->
                fastq_extns.find { fn.name.endsWith( it ) }
            }
            tuple( sample_name, fastq_files )
        }
        | concat_reads
        | pomoxis
}
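For reference, the splitCsv()/map step above expects a header-less metadata.csv whose first column is the barcode directory and whose second column is the sample name; hypothetical rows might look like:
/path/to/fastq_pass/barcode01,sample01
/path/to/fastq_pass/barcode02,sample02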

Is there a way to lookup a value from a CSV in nextflow? Or, alternately, reuse a CSV?

I have a simple csv created as part of a workflow, like below:
sample,value
A,1
B,0.5
Separately, I have another channel with file names matching the sample names. I'd like to be able to use the values associated with each sample name within a new process.
I've tried splitting the CSV using .splitCsv but (unsurprisingly) sometimes the incorrect value gets used with a sample, although it does run the correct number of times. I've also tried just using awk within the script to pull out the corresponding value and save it to a variable, and this causes the correct value to be used, but it consumes the CSV file and so only one sample gets processed.
Super simplified nextflow (DSL2) script:
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process foo {
    input:
    path input_file

    output:
    path 'file.csv', emit: csv

    """
    script that creates csv
    """
}

process bar {
    input:
    path input_file2

    output:
    path 'file.bam', emit: bam

    """
    script that creates bam files
    """
}

process help_me {
    input:
    path csv
    path bam

    output:
    path 'result'

    """
    script that uses value from csv on associated bam file
    """
}

workflow {
    foo(params.input)
    bar(params.input2)
    help_me(foo.out.csv, bar.out.bam)
}
Thanks!!
Edit: In essence, is there a way to synchronize two channels such that I can use a csv's individual rows with associated files?
If you have a value channel, you can reuse a file (like a CSV) an unlimited number of times without consuming the channel. For example:
workflow {
    input1 = file( params.input1 )
    input2 = file( params.input2 )

    foo( input1 )
    bar( input2 )

    help_me(foo.out.csv, bar.out.bam)
}
Here, both input1 and input2 are value channels. Also, (emphasis mine):
A value channel is implicitly created by a process when an input
specifies a simple value in the from clause. Moreover, a value channel
is also implicitly created as output for a process whose inputs are
only value channels.
This means that both foo.out.csv and bar.out.bam are also value channels, and that help_me.out is also a value channel. If input2 were instead a queue channel, you can see that input1 can be re-used an unlimited number of times:
$ mkdir -p ./path/to/bams
$ touch ./path/to/bams/{A,B,C}.bam
$ touch ./foo.txt
params.input1 = './foo.txt'
params.input2 = './path/to/bams/*.bam'

workflow {
    input1 = file( params.input1 )
    input2 = Channel.fromPath( params.input2 )

    foo( input1 )
    bar( input2 )

    help_me(foo.out.csv, bar.out.bam)
}
Results:
$ nextflow run script.nf
N E X T F L O W ~ version 22.04.0
Launching `script.nf` [trusting_allen] DSL2 - revision: 75209e4c85
executor > local (7)
[24/d459f7] process > foo [100%] 1 of 1 ✔
[04/a903e4] process > bar (2) [100%] 3 of 3 ✔
[24/7a9a1d] process > help_me (3) [100%] 3 of 3 ✔
Note that bar.out.bam and help_me.out are now queue channels.
If instead you have one CSV per sample (or a similar configuration), you will need some way to join these channels beforehand and adjust your new process' input declaration accordingly. What you want to avoid is declaring two (or more) queue channels in your input block. This part of the docs is well worth the time investment: Understand how multiple input channels work. It explains why you saw the incorrect value being associated with a particular sample when consuming the splitCsv output. To join these channels, you can use the join operator. For example, given your simple CSV (as 'foo.csv') and the test bams created previously:
nextflow.enable.dsl=2

params.input1 = './foo.csv'
params.input2 = './path/to/bams/*.bam'

process help_me {

    debug true

    input:
    tuple val(sample), val(myval), path(bam)

    output:
    path 'result'

    """
    echo -n "sample: ${sample}, myval: ${myval}, bam: ${bam}"
    touch result
    """
}
workflow {

    Channel.fromPath( params.input1 ) \
        | splitCsv( header:true ) \
        | map { row -> tuple( row.sample, row.value ) } \
        | set { rows_ch }

    Channel.fromPath( params.input2 ) \
        | map { bam -> tuple( bam.baseName, bam ) } \
        | join( rows_ch ) \
        | map { sample, bam, myval -> tuple( sample, myval, bam ) } \
        | help_me
}
Results:
$ nextflow run script.nf
N E X T F L O W ~ version 22.04.0
Launching `script.nf` [lethal_mayer] DSL2 - revision: 395732babc
executor > local (2)
[c5/e96085] process > help_me (1) [100%] 2 of 2 ✔
sample: B, myval: 0.5, bam: B.bam
sample: A, myval: 1, bam: A.bam
If your CSV has more than one value for a particular sample, and these are specified on separate lines, you probably want the combine operator instead. For example, if your 'foo.csv' contains:
sample,value
A,1
B,0.5
B,2
And replace join( rows_ch ) with combine( rows_ch, by:0 ) in the above example.
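A minimal sketch of that change, assuming the same rows_ch and BAM channel as in the join example (only the combine line differs):
Channel.fromPath( params.input2 ) \
    | map { bam -> tuple( bam.baseName, bam ) } \
    | combine( rows_ch, by:0 ) \
    | map { sample, bam, myval -> tuple( sample, myval, bam ) } \
    | help_me
Results: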
$ nextflow run script.nf
N E X T F L O W ~ version 22.04.0
Launching `script.nf` [festering_miescher] DSL2 - revision: f8de1e0d20
executor > local (3)
[ee/8af543] process > help_me (3) [100%] 3 of 3 ✔
sample: A, myval: 1, bam: A.bam
sample: B, myval: 0.5, bam: B.bam
sample: B, myval: 2, bam: B.bam

Chain http request and merge json response in ELM

I've succeeded in triggering a simple HTTP request in Elm and decoding the JSON response into an Elm value - https://stackoverflow.com/questions/43139316/decode-nested-variable-length-json-in-elm
The problem I'm facing now:
How do I chain (concurrency preferred) two HTTP requests and merge the JSON into my new (updated) model? Note - please see the updated Commands.elm below.
Package used to access remote data - krisajenkins/remotedata http://package.elm-lang.org/packages/krisajenkins/remotedata/4.3.0/RemoteData
Github repo of my code - https://github.com/areai51/my-india-elm
Previous Working Code -
Models.elm
type alias Model =
    { leaders : WebData (List Leader)
    }

initialModel : Model
initialModel =
    { leaders = RemoteData.Loading
    }
Main.elm
init : ( Model, Cmd Msg )
init =
    ( initialModel, fetchLeaders )
Commands.elm
fetchLeaders : Cmd Msg
fetchLeaders =
    Http.get fetchLeadersUrl leadersDecoder
        |> RemoteData.sendRequest
        |> Cmd.map Msgs.OnFetchLeaders

fetchLeadersUrl : String
fetchLeadersUrl =
    "https://data.gov.in/node/85987/datastore/export/json"
Msgs.elm
type Msg
    = OnFetchLeaders (WebData (List Leader))
Update.elm
update msg model =
    case msg of
        Msgs.OnFetchLeaders response ->
            ( { model | leaders = response }, Cmd.none )
Updated Code - (need help with Commands.elm)
Models.elm
type alias Model =
    { lsLeaders : WebData (List Leader)
    , rsLeaders : WebData (List Leader) -- <------------- Updated Model
    }

initialModel : Model
initialModel =
    { lsLeaders = RemoteData.Loading
    , rsLeaders = RemoteData.Loading
    }
Main.elm
init : ( Model, Cmd Msg )
init =
    ( initialModel, fetchLeaders )
Commands.elm
fetchLeaders : Cmd Msg
fetchLeaders = -- <-------- How do I call both requests here and fire separate msgs?
    Http.get fetchLSLeadersUrl lsLeadersDecoder -- <----- There will be a different decoder named rsLeadersDecoder
        |> RemoteData.sendRequest
        |> Cmd.map Msgs.OnFetchLSLeaders

fetchLSLeadersUrl : String
fetchLSLeadersUrl =
    "https://data.gov.in/node/85987/datastore/export/json"

fetchRSLeadersUrl : String -- <------------------ New data source
fetchRSLeadersUrl =
    "https://data.gov.in/node/982241/datastore/export/json"
Msgs.elm
type Msg
    = OnFetchLSLeaders (WebData (List Leader))
    | OnFetchRSLeaders (WebData (List Leader)) -- <-------- New message
Update.elm
update msg model =
    case msg of
        Msgs.OnFetchLSLeaders response ->
            ( { model | lsLeaders = response }, Cmd.none )

        Msgs.OnFetchRSLeaders response -> -- <--------- New handler
            ( { model | rsLeaders = response }, Cmd.none )
The way to fire off two concurrent requests is to use Cmd.batch:
init : ( Model, Cmd Msg )
init =
    ( initialModel, Cmd.batch [ fetchLSLeaders, fetchRSLeaders ] )
There is no guarantee on which request will return first and there is no guarantee that they will both be successful. One could fail while the other succeeds, for example.
You mention that you want to merge the results, but you didn't say how the merge would work, so I'll just assume you want to append the lists of leaders together into one list, and that it will be useful for your application to deal with a single RemoteData value rather than several.
You can merge multiple RemoteData values together with a custom function using map and andMap.
mergeLeaders : WebData (List Leader) -> WebData (List Leader) -> WebData (List Leader)
mergeLeaders a b =
    RemoteData.map List.append a
        |> RemoteData.andMap b
Notice that I'm using List.append there. That can really be any function that takes two lists and merges them however you please.
If you prefer an applicative style of programming, the above could be translated to the following infix version:
import RemoteData.Infix exposing (..)

mergeLeaders2 : WebData (List Leader) -> WebData (List Leader) -> WebData (List Leader)
mergeLeaders2 a b =
    List.append <$> a <*> b
According to the documentation on andMap (which uses a result tuple rather than an appended list in its example):
The final tuple succeeds only if all its children succeeded. It is still Loading if any of its children are still Loading. And if any child fails, the error is the leftmost Failure value.

Chisel language how to best use Queues?

I'm new to Chisel. Can someone explain the role of:
1- Queue
2- DecoupledIO
3- Decoupled
4- ValidIO
5- Valid
Is this piece of Chisel code correct?
...
val a = Decoupled()
val b = Decoupled()
val c = Decoupled()
...
val Reg_a = Reg(UInt())
val Reg_b = Reg(UInt())
...
when(io.a.valid && io.a.ready && io.b.valid && io.b.ready && io.c.valid && io.c.ready) {
  Reg_a := io.a.bits.data
  Reg_b := io.b.bits.data
}
io.c.bits := Reg_a & Reg_b
...
Module.io.a <> Queue(Module_1.io.a_1)
Module.io.b <> Queue(Module_1.io.b_1)
Module_1.io.c_1 <> Queue(Module.io.c)
...
regards!
Queue is a hardware module that implements a first in, first out queue with DecoupledIO inputs and outputs
DecoupledIO is a ready/valid interface type with members ready, valid, and bits
Decoupled is a helper to construct DecoupledIO from some other type
ValidIO is similar to DecoupledIO except that it only has valid and bits
Valid is similar to Decoupled for constructing ValidIOs
I can't tell what the code is trying to do, but here's an example of a Module that has 2 DecoupledIO inputs and 1 DecoupledIO output. It buffers the inputs with queues and then connects the output to the sum of the inputs:
import chisel3._
import chisel3.util._

class QueueModule extends Module {
  val io = IO(new Bundle {
    val a = Flipped(Decoupled(UInt(32.W))) // valid and bits are inputs
    val b = Flipped(Decoupled(UInt(32.W)))
    val z = Decoupled(UInt(32.W))          // valid and bits are outputs
  })
  // Note that a, b, and z are all of type DecoupledIO

  // Buffer the inputs with queues
  val qa = Queue(io.a) // io.a is the input to the FIFO
                       // qa is the DecoupledIO output from the FIFO
  val qb = Queue(io.b)

  // We only dequeue when io.z is ready
  qa.nodeq() // equivalent to qa.ready := false.B
  qb.nodeq()
  io.z.noenq() // default: no output this cycle (keeps io.z fully initialized)

  // When qa and qb have valid inputs and io.z is ready for an output
  when (qa.valid && qb.valid && io.z.ready) {
    io.z.enq(qa.deq() + qb.deq())
    /* The above is short for
       io.z.valid := true.B
       io.z.bits := qa.bits + qb.bits
       qa.ready := true.B
       qb.ready := true.B
     */
  }
}
Hope this helps!

Logstash indexing JSON arrays

Logstash is awesome. I can send it JSON like this (multi-lined for readability):
{
    "a": "one",
    "b": {
        "alpha": "awesome"
    }
}
And then query for that line in kibana using the search term b.alpha:awesome. Nice.
However I now have a JSON log line like this:
{
    "different": [
        {
            "this": "one",
            "that": "uno"
        },
        {
            "this": "two"
        }
    ]
}
And I'd like to be able to find this line with a search like different.this:two (or different.this:one, or different.that:uno)
If I was using Lucene directly I'd iterate through the different array, and generate a new search index for each hash within it, but Logstash currently seems to ingest that line like this:
different: {this: one, that: uno}, {this: two}
Which isn't going to help me searching for log lines using different.this or different.that.
Has anyone got any thoughts as to a codec, filter, or code change I can make to enable this?
You can write your own filter (copy & paste, rename the class name and the config_name, and rewrite the filter(event) method), or modify the current JSON filter (source on GitHub).
You can find the JSON filter (Ruby class) source code under logstash-1.x.x\lib\logstash\filters in a file named json.rb. The JSON filter parses the content as JSON as follows:
begin
  # TODO(sissel): Note, this will not successfully handle json lists
  # like your text is '[ 1,2,3 ]' JSON.parse gives you an array (correctly)
  # which won't merge into a hash. If someone needs this, we can fix it
  # later.
  dest.merge!(JSON.parse(source))

  # If no target, we target the root of the event object. This can allow
  # you to overwrite @timestamp. If so, let's parse it as a timestamp!
  if !@target && event[TIMESTAMP].is_a?(String)
    # This is a hack to help folks who are mucking with @timestamp during
    # their json filter. You aren't supposed to do anything with
    # "@timestamp" outside of the date filter, but nobody listens... ;)
    event[TIMESTAMP] = Time.parse(event[TIMESTAMP]).utc
  end

  filter_matched(event)
rescue => e
  event.tag("_jsonparsefailure")
  @logger.warn("Trouble parsing json", :source => @source,
               :raw => event[@source], :exception => e)
  return
end
You can change the parsing procedure to modify the original JSON:
json = JSON.parse(source)
if json.is_a?(Hash)
  json.each do |key, value|
    if value.is_a?(Array)
      value.each_with_index do |object, index|
        # modify as you need
        object["index"] = index
      end
    end
  end
end
# save modified json
# ......
dest.merge!(json)
Then you can modify your config file to use your new/modified JSON filter, and place it in \logstash-1.x.x\lib\logstash\config.
This is my elastic_with_json.conf with a modified json.rb filter:
input {
  stdin {
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout {
  }
}
If you want to use your new filter, you can configure it with the config_name:
class LogStash::Filters::Json_index < LogStash::Filters::Base
  config_name "json_index"
  milestone 2
  ....
end
and configure it
input {
  stdin {
  }
}
filter {
  json_index {
    source => "message"
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout {
  }
}
Hope this helps.
For a quick and dirty hack, I used the Ruby filter and the code below; no need to use the out-of-the-box 'json' filter anymore.
input {
  stdin {}
}
filter {
  grok {
    match => ["message","(?<json_raw>.*)"]
  }
  ruby {
    init => "
      def parse_json obj, pname=nil, event
        obj = JSON.parse(obj) unless obj.is_a? Hash
        obj = obj.to_hash unless obj.is_a? Hash
        obj.each {|k,v|
          p = pname.nil? ? k : pname
          if v.is_a? Array
            v.each_with_index {|oo,ii|
              parse_json_array(oo,ii,p,event)
            }
          elsif v.is_a? Hash
            parse_json(v,p,event)
          else
            p = pname.nil? ? k : [pname,k].join('.')
            event[p] = v
          end
        }
      end

      def parse_json_array obj, i, pname, event
        obj = JSON.parse(obj) unless obj.is_a? Hash
        pname_ = pname
        if obj.is_a? Hash
          obj.each {|k,v|
            p = [pname_,i,k].join('.')
            if v.is_a? Array
              v.each_with_index {|oo,ii|
                parse_json_array(oo,ii,p,event)
              }
            elsif v.is_a? Hash
              parse_json(v,p,event)
            else
              event[p] = v
            end
          }
        else
          n = [pname_,i].join('.')
          event[n] = obj
        end
      end
    "
    code => "parse_json(event['json_raw'].to_s,nil,event) if event['json_raw'].to_s.include? ':'"
  }
}
output {
  stdout { codec => rubydebug }
}
Test JSON structure:
{"id":123, "members":[{"i":1, "arr":[{"ii":11},{"ii":22}]},{"i":2}], "im_json":{"id":234, "members":[{"i":3},{"i":4}]}}
and this is what's output:
{
"message" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
"#version" => "1",
"#timestamp" => "2014-07-25T00:06:00.814Z",
"host" => "Leis-MacBook-Pro.local",
"json_raw" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
"id" => 123,
"members.0.i" => 1,
"members.0.arr.0.ii" => 11,
"members.0.arr.1.ii" => 22,
"members.1.i" => 2,
"im_json" => 234,
"im_json.0.i" => 3,
"im_json.1.i" => 4
}
The solution I liked is the Ruby filter, because it doesn't require us to write another filter. However, that solution creates fields at the "root" of the JSON, and it's hard to keep track of how the original document looked.
I came up with something similar that's easier to follow and is a recursive solution, so it's cleaner.
ruby {
  init => "
    def arrays_to_hash(h)
      h.each do |k,v|
        # If v is nil, an array is being iterated and the value is k.
        # If v is not nil, a hash is being iterated and the value is v.
        value = v || k
        if value.is_a?(Array)
          # 'value' is replaced with 'value_hash' later.
          value_hash = {}
          value.each_with_index do |v, i|
            value_hash[i.to_s] = v
          end
          h[k] = value_hash
        end
        if value.is_a?(Hash) || value.is_a?(Array)
          arrays_to_hash(value)
        end
      end
    end
  "
  code => "arrays_to_hash(event.to_hash)"
}
It converts arrays to hashes, with each key being the index number. More details: http://blog.abhijeetr.com/2016/11/logstashelasticsearch-best-way-to.html
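To illustrate what that transformation does to the data shape, independent of Logstash, here is a minimal Groovy sketch of the same idea (arrays become maps keyed by the element index; the helper name is mine):
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Recursively replace every list with a map keyed by the element's index,
// mirroring the arrays_to_hash logic above.
def arraysToMaps(node) {
    if (node instanceof List) {
        def asMap = [:]
        node.eachWithIndex { v, i -> asMap[i.toString()] = arraysToMaps(v) }
        return asMap
    }
    if (node instanceof Map) {
        return node.collectEntries { k, v -> [(k): arraysToMaps(v)] }
    }
    node
}

def doc = new JsonSlurper().parseText('{"different":[{"this":"one","that":"uno"},{"this":"two"}]}')
println JsonOutput.toJson(arraysToMaps(doc))
// => {"different":{"0":{"this":"one","that":"uno"},"1":{"this":"two"}}}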