List of all instances created by a module - google-compute-engine

I have a number of module invocations that look similar to this
module "gcpue4a1" {
  source = "../../../modules/pods"

}
where the module is creating instances, DNS records, etc.
locals {
  gateway_name = "gateway-${var.network_zone}-${var.environment}-1"
}

resource "google_compute_instance" "gateway" {
  name                      = "${local.gateway_name}"
  machine_type              = "n1-standard-8"
  zone                      = "${var.zone}"
  allow_stopping_for_update = true
}
How can I iterate over a list of all instances that have been created through this module? Can I do it with instance tags or labels?
In the end, what I want is to be able to iterate over a list to export to an Ansible inventory file. But I'm just not sure how to do this when my resources are encapsulated in modules.
With terraform show I can clearly see the structure of the variables.
➜ gcp-us-east4 git:(integration) ✗ terraform show | grep google_compute_instance.gateway -n1
640- zone = us-east4-a
641:module.screencast-gcp-pod-gcpue4a1-food.google_compute_instance.gateway:
642- id = gateway-gcpue4a1-food-1
--
--
991- zone = us-east4-a
992:module.screencast-gcp-pod-gcpue4a2-food.google_compute_instance.gateway:
993- id = gateway-gcpue4a2-food-1
--
--
1342- zone = us-east4-a
1343:module.screencast-gcp-pod-gcpue4a3-food.google_compute_instance.gateway:
1344- id = gateway-gcpue4a3-food-1
--
--
1693- zone = us-east4-a
1694:module.screencast-gcp-pod-gcpue4a4-food.google_compute_instance.gateway:
1695- id = gateway-gcpue4a4-food-1
The etcd inventory piece works just fine when I explicitly say which node I want. The overall inventory piece below it does not and I'm not sure how to fix it.
##Create ETCD Inventory
provisioner "local-exec" {
  command = "echo \"\n[etcd]\n${google_compute_instance.k8s-master.name} ansible_ssh_host=${google_compute_instance.k8s-master.network_interface.0.address}\" >> kubespray-inventory"
}

##Create Nodes Inventory
provisioner "local-exec" {
  command = "echo \"\n[kube-node]\" >> kubespray-inventory"
}
# provisioner "local-exec" {
#   command = "echo \"${join("\n", formatlist("%s ansible_ssh_host=%s", google_compute_instance.gateway.*.name, google_compute_instance.gateway.*.network_interface.0.address))}\" >> kubespray-inventory"
# }
➜ gcp-us-east4 git:(integration) ✗ terraform apply
Error: resource 'null_resource.ansible-provision' provisioner local-exec (#4): unknown resource 'google_compute_instance.gateway' referenced in variable google_compute_instance.gateway.*.id

You can make sure each module adds a label that matches the module, and you can then use gcloud compute instances list with a filter to only show the instances that carry that specific label.
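For example, a minimal sketch (the label key "pod" and the variable var.pod_name are hypothetical; use whatever uniquely identifies each module invocation):

resource "google_compute_instance" "gateway" {
  # ... existing arguments as above ...

  # assumed: label every instance created by this module invocation
  labels = {
    pod = "${var.pod_name}"
  }
}

gcloud compute instances list --filter="labels.pod=gcpue4a1" --format="value(name,networkInterfaces[0].networkIP)"

The filtered gcloud output can then be turned into Ansible inventory lines in place of the commented-out join/formatlist interpolation.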


Nextflow rename barcodes and concatenate reads within barcodes

My current working directory has the following sub-directories
My Bash script
Hi there
I have written the above Bash script to do the following tasks:
rename the sub-directories (barcode01-12) taking information from the metadata.csv
concatenate the individual reads within a sub-directory and move them up in the $PWD
then I use these concatenated reads (one per barcode) for my Nextflow script below:
Query:
How can I add the above pre-processing tasks (renaming and concatenating), or the Bash script itself, to the beginning of my Nextflow script below?
In my experience, FASTQ files can get quite large. Without knowing too much of the specifics, my recommendation would be to move the concatenation (and renaming) to a separate process. In this way, all of the 'work' can be done inside Nextflow's working directory. Here's a solution that uses the new DSL 2. It uses the splitCsv operator to parse the metadata and identify the FASTQ files. The collection can then be passed into our 'concat_reads' process. To handle optionally gzipped files, you could try the following:
params.metadata = './metadata.csv'
params.outdir = './results'

process concat_reads {

    tag { sample_name }

    publishDir "${params.outdir}/concat_reads", mode: 'copy'

    input:
    tuple val(sample_name), path(fastq_files)

    output:
    tuple val(sample_name), path("${sample_name}.${extn}")

    script:
    if( fastq_files.every { it.name.endsWith('.fastq.gz') } )
        extn = 'fastq.gz'
    else if( fastq_files.every { it.name.endsWith('.fastq') } )
        extn = 'fastq'
    else
        error "Concatenation of mixed filetypes is unsupported"

    """
    cat ${fastq_files} > "${sample_name}.${extn}"
    """
}

process pomoxis {

    tag { sample_name }

    publishDir "${params.outdir}/pomoxis", mode: 'copy'

    cpus 18

    input:
    tuple val(sample_name), path(fastq)

    """
    mini_assemble \\
        -t ${task.cpus} \\
        -i "${fastq}" \\
        -o results \\
        -p "${sample_name}"
    """
}

workflow {

    fastq_extns = [ '.fastq', '.fastq.gz' ]

    Channel.fromPath( params.metadata )
        | splitCsv()
        | map { dir, sample_name ->
            all_files = file(dir).listFiles()
            fastq_files = all_files.findAll { fn ->
                fastq_extns.find { fn.name.endsWith( it ) }
            }
            tuple( sample_name, fastq_files )
        }
        | concat_reads
        | pomoxis
}
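For reference, the workflow above assumes a headerless metadata.csv in which the first column is the barcode directory and the second column is the sample name, something like (hypothetical paths and names):

/path/to/fastq_pass/barcode01,sample01
/path/to/fastq_pass/barcode02,sample02

You could then run it with something like nextflow run main.nf --metadata ./metadata.csv --outdir ./results, where main.nf is just a placeholder for whatever the script is saved as.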

Is there a way to lookup a value from a CSV in nextflow? Or, alternately, reuse a CSV?

I have a simple csv created as part of a workflow, like below:
sample,value
A,1
B,0.5
Separately, I have another channel with file names matching the sample names. I'd like to be able to use the values associated with each sample name within a new process.
I've tried splitting the CSV using .splitCsv but (unsurprisingly) sometimes the incorrect value gets used with a sample, although it does run the correct number of times. I've also tried just using awk within the script to pull out the corresponding value and save it to a variable, and this causes the correct value to be used, but it consumes the CSV file and so only one sample gets processed.
Super simplified nextflow (DSL2) script:
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process foo {
    input:
    path input_file

    output:
    path 'file.csv', emit: csv

    """
    script that creates csv
    """
}

process bar {
    input:
    path input_file2

    output:
    path 'file.bam', emit: bam

    """
    script that creates bam files
    """
}

process help_me {
    input:
    path csv
    path bam

    output:
    path 'result'

    """
    script that uses value from csv on associated bam file
    """
}

workflow {
    foo(params.input)
    bar(params.input2)
    help_me(foo.out.csv, bar.out.bam)
}
Thanks!!
Edit: In essence, is there a way to synchronize two channels such that I can use a csv's individual rows with associated files?
If you have a value channel, you can reuse a file (like a CSV) an unlimited number of times without consuming the channel. For example:
workflow {
    input1 = file( params.input1 )
    input2 = file( params.input2 )

    foo( input1 )
    bar( input2 )

    help_me(foo.out.csv, bar.out.bam)
}
Here, both input1 and input2 are value channels. Also, from the documentation (emphasis mine):
A value channel is implicitly created by a process when an input
specifies a simple value in the from clause. Moreover, a value channel
is also implicitly created as output for a process whose inputs are
only value channels.
This means that both foo.out.csv and bar.out.bam are also value channels. Additionally, help_me.out is also a value channel. If input2 were instead a queue channel, you can see that input1 can be re-used an unlimited number of times:
$ mkdir -p ./path/to/bams
$ touch ./path/to/bams/{A,B,C}.bam
$ touch ./foo.txt
params.input1 = './foo.txt'
params.input2 = './path/to/bams/*.bam'
workflow {
    input1 = file( params.input1 )
    input2 = Channel.fromPath( params.input2 )

    foo( input1 )
    bar( input2 )

    help_me(foo.out.csv, bar.out.bam)
}
Results:
$ nextflow run script.nf
N E X T F L O W ~ version 22.04.0
Launching `script.nf` [trusting_allen] DSL2 - revision: 75209e4c85
executor > local (7)
[24/d459f7] process > foo [100%] 1 of 1 ✔
[04/a903e4] process > bar (2) [100%] 3 of 3 ✔
[24/7a9a1d] process > help_me (3) [100%] 3 of 3 ✔
Note that bar.out.bam and help_me.out are now queue channels.
If instead you have one CSV per sample (or a similar configuration), you will need some way to join these channels beforehand and adjust your new process's input declaration accordingly. What you want to avoid is declaring two (or more) queue channels in your input block. This part of the docs is well worth the time investment: Understand how multiple input channels work. It explains why you saw the incorrect value being associated with a particular sample when consuming the splitCsv output. To join these channels, you can use the join operator. For example, given your simple CSV (as 'foo.csv') and the test bams created previously:
nextflow.enable.dsl=2

params.input1 = './foo.csv'
params.input2 = './path/to/bams/*.bam'

process help_me {
    debug true

    input:
    tuple val(sample), val(myval), path(bam)

    output:
    path 'result'

    """
    echo -n "sample: ${sample}, myval: ${myval}, bam: ${bam}"
    touch result
    """
}

workflow {
    Channel.fromPath( params.input1 ) \
        | splitCsv( header:true ) \
        | map { row -> tuple( row.sample, row.value ) } \
        | set { rows_ch }

    Channel.fromPath( params.input2 ) \
        | map { bam -> tuple( bam.baseName, bam ) } \
        | join( rows_ch ) \
        | map { sample, bam, myval -> tuple( sample, myval, bam ) } \
        | help_me
}
Results:
$ nextflow run script.nf
N E X T F L O W ~ version 22.04.0
Launching `script.nf` [lethal_mayer] DSL2 - revision: 395732babc
executor > local (2)
[c5/e96085] process > help_me (1) [100%] 2 of 2 ✔
sample: B, myval: 0.5, bam: B.bam
sample: A, myval: 1, bam: A.bam
If your CSV has more than one value for a particular sample and these are specified on separate lines, you probably want the combine operator instead. For example, if your 'foo.csv' contains:
sample,value
A,1
B,0.5
B,2
Then replace join( rows_ch ) with combine( rows_ch, by:0 ) in the above example. Results:
$ nextflow run script.nf
N E X T F L O W ~ version 22.04.0
Launching `script.nf` [festering_miescher] DSL2 - revision: f8de1e0d20
executor > local (3)
[ee/8af543] process > help_me (3) [100%] 3 of 3 ✔
sample: A, myval: 1, bam: A.bam
sample: B, myval: 0.5, bam: B.bam
sample: B, myval: 2, bam: B.bam

Creating 20+ Azure Resource Group with Locks

I need to create 20+ Azure resource groups with locks in two US locations (West and East). I cannot find a JSON template or CLI template that would let me create them through a user prompt in the terminal or through a JSON parameter in the console. I can't create them one by one for both regions using
New-AzureRmResourceGroup -Name $rgName -Location $locName
The closest I saw on the MS site is the below:
variables
$labPrefix = "Mlab"
$labnumber = "2017"
$labsubnet = "55"
$rgName = $labPrefix + $labnumber # New resource group name
$locName = "West Europe" # Location of new resource group
$saName = $rgName.Replace("-","").tolower()
$saType = "Standard_LRS" # Storage account type
If I were creating a single RG such as Mlab2017, this would work, but mine would have 4 different labPrefix values and 4 different labnumber values. I can't seem to find a better solution for this. Any help on creating a JSON array or shell script array to pass in and create the RGs with locks would be highly appreciated.
You could use a template to create the resource groups first; then you could use PowerShell to lock the resource groups in the specific locations. For example:
$location1 = "eastus"
$location2 = "westus"
$rg = Get-AzureRmResourceGroup | Where-Object { ($_.Location -eq $location1) -or ($_.Location -eq $location2) }
$rgnames = $rg.ResourceGroupName
foreach ($rgname in $rgnames)
{
    $lockname = $rgname + "lock"
    New-AzureRmResourceLock -LockName $lockname -LockLevel CanNotDelete -ResourceGroupName $rgname
}
You could also check this link.
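If you would rather drive both the creation and the locking from PowerShell arrays instead of a template, a rough sketch along these lines should work (the prefix, number, and location values are hypothetical placeholders for your own naming scheme; -Force on New-AzureRmResourceLock skips the per-lock confirmation prompt):

$labPrefixes = @("Mlab", "Nlab", "Olab", "Plab")   # hypothetical: your 4 prefixes
$labNumbers  = @("2017", "2018", "2019", "2020")   # hypothetical: your 4 lab numbers
$locations   = @("East US", "West US")

foreach ($prefix in $labPrefixes)
{
    foreach ($number in $labNumbers)
    {
        foreach ($loc in $locations)
        {
            # e.g. Mlab2017-eastus
            $rgName = $prefix + $number + "-" + $loc.Replace(" ", "").ToLower()
            New-AzureRmResourceGroup -Name $rgName -Location $loc
            New-AzureRmResourceLock -LockName ($rgName + "lock") -LockLevel CanNotDelete -ResourceGroupName $rgName -Force
        }
    }
}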

How can we provide multiple values for a single argument in either services.conf or commands.conf

Here I am trying to use a plugin to check whether the service is running or not, whether any warning or critical action is required, and at the same time to read the performance parameters.
We have used the plugin below to check if a server is alive or not and to read its performance data JSON:
https://github.com/drewkerrigan/nagios-http-json
I am trying to read a JSON file, shown below, which is hosted at http://localhost:8080/sample.json.
The plugin works perfectly on the command line; it shows me all the metrics available.
$:/usr/lib/nagios/plugins$ ./check_http_json.py -H localhost:8080 -p sample.json -m metrics.etp_count metrics.atc_count
OK: Status OK.|'metrics.etp_count'=101 'metrics.atc_count'=0
But when I try the same in the Icinga2 configuration, it doesn't show me these performance metrics; it doesn't give any error, but at the same time it doesn't show any value.
Find the JSON, commands.conf and services.conf below.
{
    "metrics": {
        "etp_count": "0",
        "atc_count": "101",
        "mean_time": -1.0
    }
}
Below are my commands.conf and services.conf
commands.conf
/* Json Read Command */
object CheckCommand "json_check" {
    import "plugin-check-command"
    command = [PluginDir + "/check_http_json.py"]
    arguments = {
        "-H" = "$server_port$"
        "-p" = "$json_path$"
        "-w" = "$warning_value$"
        "-c" = "$critical_value$"
        "-m" = "$Metrics1$,$Metrics2$"
    }
}
services.conf
apply Service "json" {
    import "generic-service"
    check_command = "json_check"
    vars.server_port = "localhost:8080"
    vars.json_path = "sample.json"
    vars.warning_value = "metrics.etp_count,1:100"
    vars.critical_value = "metrics.etp_count,101:1000"
    vars.Metrics1 = "metrics.etp_count"
    vars.Metrics2 = "metrics.atc_count"
    assign where host.name == NodeName
}
Does anyone have any idea how we can pass multiple values in commands.conf and services.conf?
I have resolved the issue.
I had to change the following code in the plugin file "check_http_json.py":
def checkMetrics(self):
    """Return a Nagios specific performance metrics string given keys and parameter definitions"""
    metrics = ''
    warning = ''
    critical = ''
    if self.rules.metric_list != None:
        for metric in self.rules.metric_list:
It was replaced with:
def checkMetrics(self):
    """Return a Nagios specific performance metrics string given keys and parameter definitions"""
    metrics = ''
    warning = ''
    critical = ''
    if self.rules.metric_list != None:
        for metric in self.rules.metric_list[0].split():
Actually, the issue was that the list was not handled properly: the plugin could not iterate through the items in the list because it was treating them as a single string, due to the way the value comes in from the services.conf file. The Metrics string had to be split further to get the individual items.
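A quick illustration of the difference in plain Python, assuming Icinga2 hands the plugin the -m value as one string inside a list:

metric_list = ['metrics.etp_count metrics.atc_count']  # one string, as received from the service definition

# original loop: iterates once and treats the whole string as a single metric key
for metric in metric_list:
    print(metric)   # -> 'metrics.etp_count metrics.atc_count'

# patched loop: split the first element so each metric key is handled separately
for metric in metric_list[0].split():
    print(metric)   # -> 'metrics.etp_count', then 'metrics.atc_count'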

django and celery beat scheduler no database entries

My problem is that the beat scheduler doesn't store entries in the 'tasks' and 'workers' tables. I use Django and Celery. In my database (MySQL) I have added a periodic task "Estimate Region" with an interval of 120 seconds.
This is how I start my worker:
python manage.py celery worker -n worker.node1 -B --loglevel=info &
After I start the worker I can see in the terminal that it runs and the beat scheduler picks up the periodic task from the database and executes it.
This is how my task is defined:
@celery.task(name='fv.tasks.estimateRegion',
             ignore_result=True,
             max_retries=3)
def estimateRegion(region):
The terminal shows this:
WARNING ModelEntry: Estimate Region fv.tasks.estimateRegion(*['ASIA'], **{}) {<freq: 2.00 minutes>}
[2013-05-23 10:48:19,166: WARNING/MainProcess] <ModelEntry: Estimate Region fv.tasks.estimateRegion(*['ASIA'], **{}) {<freq: 2.00 minutes>}>
INFO Calculating estimators for exchange:Bombay Stock Exchange
The task "Estimate Region" returns a results.csv file, so I can see that the worker and the beat scheduler work. But after that I have no database entries in "tasks" or "workers" in my Django admin panel.
Here are my Celery settings in settings.py:
CELERY_DISABLE_RATE_LIMITS = True
CELERY_TASK_SERIALIZER = 'pickle'
CELERY_RESULT_SERIALIZER = 'pickle'
CELERY_IMPORTS = ('fv.tasks')
CELERY_RESULT_PERSISTENT = True

# amqp settings
BROKER_URL = 'amqp://fv:password@localhost'
#BROKER_URL = 'amqp://fv:password@192.168.99.31'
CELERY_RESULT_BACKEND = 'amqp'
CELERY_TASK_RESULT_EXPIRES = 18000
CELERY_ROUTES = (fv.routers.TaskRouter(), )

_estimatorExchange = Exchange('estimator')
CELERY_QUEUES = (
    Queue('celery', Exchange('celery'), routing_key='celery'),
    Queue('estimator', _estimatorExchange, routing_key='estimator'),
)

# beat scheduler settings
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"

# development settings
CELERY_RESULT_PERSISTENT = False
CELERY_DEFAULT_DELIVERY_MODE = 'transient'
I hope someone can help me :)
Have you started celerycam?
python manage.py celerycam
It will take a snapshot (every 1 second by default) of the current state of tasks.
You can read more about it in the Celery documentation.
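A minimal sketch of running both side by side (the --frequency value here is just an illustration of how often snapshots are written):

# start the worker with the embedded beat scheduler, as before
python manage.py celery worker -n worker.node1 -B --loglevel=info &

# start the snapshot camera so task and worker state is written to the djcelery tables
python manage.py celerycam --frequency=10.0 &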