Declare variable-length array as a config value with Snakemake CLI

I have a Snakemake workflow where one of the top-level config entries is an array of variable size (in this particular example, a sibling may or may not be included in the analysis). Currently I'm using the following config file.
{
"case": "/scratch/standage/12175/BAMs/12175.proband.bam",
"controls": [
"/scratch/standage/12175/BAMs/12175.mother.bam",
"/scratch/standage/12175/BAMs/12175.father.bam"
]
}
I know snakemake allows one to specify config options on the command line with the --config flag. Since the case value is a single string, this is trivial to do on the command line. But what about the controls value(s)? Is it possible to pass an array/list of values as one of the config options on the command line?

Is it possible to pass an array/list of values as one of the config options on the command line
I doubt that is directly possible, but you could pass a quoted string of space (or comma or whatever) separated values that you split to list inside the Snakefile:
snakemake -C controls='control1 control2 ...'
Then inside the Snakefile:
controls= config['controls'].split(' ')

An alternative solution would be to pass variables on the command line like so...
snakemake --config case=proband.bam control1=mother.bam control2=father.bam
...and then to parse the configuration settings dynamically in the Snakefile. For example, any config key matching the regular expression control\d+ corresponds to a control sample.
So it's possible, but a bit of a stretch, and the config file is probably the better/cleaner option.
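A minimal sketch of the second approach, written as plain Python so it can be run outside Snakemake. The config dict below is a stand-in for the one Snakemake fills in from --config, and the helper name collect_controls is hypothetical, not part of Snakemake.

```python
import re

def collect_controls(config):
    """Return the values of all keys named control<N>, ordered by N."""
    pattern = re.compile(r"control(\d+)")
    numbered = []
    for key, value in config.items():
        match = pattern.fullmatch(key)
        if match:
            numbered.append((int(match.group(1)), value))
    # Sort numerically so that control10 comes after control2.
    return [value for _, value in sorted(numbered)]

# Stand-in for the config produced by:
#   snakemake --config case=proband.bam control1=mother.bam control2=father.bam
config = {"case": "proband.bam", "control1": "mother.bam", "control2": "father.bam"}
controls = collect_controls(config)
print(controls)
```

Using fullmatch keeps unrelated keys such as case (or a key literally named controls) out of the result, and sorting on the numeric suffix keeps the order stable however many controls are passed.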


Glob as the argument of a shell function

I'm writing a reusable function, so I need the argument to be as flexible as possible.
Consider a simple example:
function testf(){
print ./*.$1
}
This works. For example, with testf mp3 it lists all the files ending with .mp3 in an array, making possible the use of for loops. But this way it only allows me to work with the extension name.
Therefore, I tried:
function testf(){
print ./$1
}
However, it doesn't work. Using testf *.mp3, unlike using print *.mp3 in the terminal, it will only pass the first matching string instead of the whole array.
Any suggestion?
"lists all the files ending with .mp3 in an array": there is no array involved in your question.
But to your problem: First, you want to pass to your function a wildcard pattern, but this is not what you are actually doing. testf *.mp3 expands the pattern before the function is invoked (this process is called filename generation), and your testf gets just a list of files as parameters. You can pass a pattern, but you have to ask the shell not to expand it:
testf '*.mp3'
In this case, your $1 indeed will contain the string *.mp3. However, your print ./$1 will still not work. The reason is that filename generation occurs before parameter expansion (which is the process where $1 is replaced by the string it contains). Again, you have to ask the shell to do it the other way round:
print ./${~1}
The shell performs several types of expansions before launching the command. When you enter
testf *.mp3
the shell will expand the glob first and pass each filename as a separate argument to the function.
Your function could look like this:
function testf(){
printf './%s\n' "$@"
}

Replace value of object property in multiple JSON files

I'm working with multiple JSON files that are located in the same folder.
Files contain objects with the same properties and they are such as:
{
"identifier": "cameraA",
"alias": "a",
"rtsp": "192.168.1.1"
}
I want to replace a property for all the objects in the JSON files at the same time for a certain condition.
For example, let's say that I want to replace all the rtsp values of the objects with identifier equal to "cameraA".
I've been trying with something like:
jq 'if .identifier == \"cameraA" then .rtsp=\"cameraX" else . end' -c *.json
But it isn't working.
Is there a simple way to replace the property of an object among multiple JSON files?
jq can only write to stdout and stderr, not back to its input files, so the simplest approach is to process one file at a time, e.g. by putting your jq program inside a shell loop. sponge is often used with this approach.
However, there is an alternative that has the advantage of efficiency. It requires only one invocation of jq, the output of which would include the filename information (obtained from input_filename). This output would then be the input of an auxiliary process, e.g. awk.
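If jq is not a hard requirement, the same one-file-at-a-time loop can be sketched in Python. This is a swapped-in technique, not a jq solution. The function name replace_rtsp and the replacement value are assumptions for illustration, and each file is assumed to hold a single JSON object, as in the question.

```python
import json
from pathlib import Path

def replace_rtsp(folder, identifier, new_rtsp):
    """Set "rtsp" to new_rtsp in every *.json file whose "identifier" matches.

    Returns the names of the files that were rewritten.
    """
    changed = []
    for path in sorted(Path(folder).glob("*.json")):
        data = json.loads(path.read_text())
        # Only touch files holding an object with the requested identifier.
        if isinstance(data, dict) and data.get("identifier") == identifier:
            data["rtsp"] = new_rtsp
            path.write_text(json.dumps(data, indent=2) + "\n")
            changed.append(path.name)
    return changed
```

A call such as replace_rtsp(".", "cameraA", "cameraX") would rewrite only the matching files in the current directory and leave the others untouched.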

How to copy or move multiple files with same extension?

So I am trying to move a bunch of files with similar extensions from /home/ to /root/
Code I tried is
file copy /home/*.abc.xyz /root/
Also tried
set infile [glob -nocomplain /home/*.abc.xyz ]
if { [llength $infile] > 0 } {
file copy $infile /root/
}
No success.
Your two attempts fail for different reasons:
There is no wildcard expansion in arguments to file copy, or any Tcl command, for that matter: file copy /home/*.abc.xyz /root/. This will look for a single source with a literal * in its filename.
glob -nocomplain /home/*.abc.xyz is fine for collecting the sources, but glob returns a list of sources. file copy requires each source to be passed as a separate argument, not all of them as a single one. To expand a single list of source files into multiple separate arguments, use the Tcl expansion operator {*}.
Therefore:
set infiles [glob -nocomplain *.tcl]
if {[llength $infiles]} {
file copy {*}$infiles /tmp/tgt/
}
For a 1-line answer:
file copy {*}[glob /home/*.abc.xyz] /root/.
The file copy (and file rename) commands have two forms (hence the reference to the manual page in the comment). The first form copies a single file to a new target. The second form copies all the file name arguments to a new directory and this form of the command insists that the directory name be the last argument and you may have an arbitrary number of source file names preceding. Also, file copy does not do glob expansion on its arguments, so as you rightly surmised, you also need to use the glob command to obtain a list of the files to copy. The problem is that the glob command returns a list of file names and you passed that list as a single argument, i.e.
file copy $infile /root/
passes the list as a single argument and so the file copy command thinks it is dealing with the first form and attempts to find a file whose name matches that of the entire list. This file probably doesn't exist. Placing the error message in your question would have helped us to know for sure.
So what you want to do is take the list of files contained in the infile variable and expand it into separate argument words. Since this is a common situation, Tcl has some syntax to help (assuming you are not using some ancient version of Tcl). Try using the command:
file copy {*}$infile /root/
in place of your first attempt and see if that helps the situation.
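For readers more familiar with Python, the difference between passing a list as one argument and expanding it into separate arguments can be sketched as follows. This is an analogy only, not Tcl: Python's * unpacking plays a role loosely similar to Tcl's {*}, and count_args is a hypothetical helper.

```python
def count_args(*sources):
    """Return how many separate arguments were received."""
    return len(sources)

infiles = ["a.abc.xyz", "b.abc.xyz", "c.abc.xyz"]

# Passing the list itself: the callee sees one argument (the whole list),
# comparable to `file copy $infile /root/`.
print(count_args(infiles))

# Unpacking the list: the callee sees three separate arguments,
# comparable to `file copy {*}$infile /root/`.
print(count_args(*infiles))
```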

Using arrays in a for loop, in bash [duplicate]

I am currently working on a bash script where I must download files from our MySQL database, host them somewhere different, then update the database with the new location for the image. The last portion is my problem area: creating the array full of filenames and iterating through them, replacing the file names in the database as we go.
For whatever reason I keep getting these kinds of errors:
not found/X2b6qZP.png: 1: /xxx/images/X2b6qZP.png: ?PNG /xxx/images/X2b6qZP.png: 2: /xxx/images/X2b6qZP.png: : not found
/xxx/images/X2b6qZP.png: 1: /xxx/images/X2b6qZP.png: Syntax error: word unexpected (expecting ")")
files=$($DOWNLOADDIRECTORY/*)
files=$(${files[@]##*/})
# Iterate through the file names in the download directory, and assign the new values to the detail table.
for file in "${files[@]}"
do
mysql -h ${HOST} -u ${USER} -p${PASSWORD} ${DBNAME} "UPDATE crm_category_detail SET detail_value = 'http://xxx.xxx.x.xxx/img/$file' WHERE detail_value LIKE '%imgur.com/$file'"
done
You are trying to execute a glob as a command. The syntax to use arrays is array=(tokens):
files=("$DOWNLOADDIRECTORY"/*)
files=("${files[@]##*/}")
You are also trying to run your script with sh instead of bash.
Do not run sh file or use #!/bin/sh. Arrays are not supported in sh.
Instead use bash file or #!/bin/bash.
What's going on here?
files=$($DOWNLOADDIRECTORY/*)
I don't think this is doing what you think it is doing.
According to this answer, you want to omit the first $ to get an array of files.
files=($DOWNLOADDIRECTORY/*)
I just wrote a sample script
#!/bin/bash
alist=(/*)
printf '%s\n' "${alist[@]}"
Output
/bin
/boot
/data
/dev
/dist
/etc
/home
/lib
....
Your assignments are not creating arrays. You need arrayname=( values for array ) as the notation. Hence:
files=( "$DOWNLOADDIRECTORY"/* )
files=( "${files[@]##*/}" )
The first line will give you all the names in the directory specified by $DOWNLOADDIRECTORY. The second carefully removes the directory prefix.
I've used spaces after ( and before ) for clarity; the shell neither requires nor objects to them. I used double quotes around the variable name and expansions to keep things sane when names contain spaces etc.
Although it isn't immediately obvious why you might do this, its advantage over many alternatives is that it preserves spaces etc in file names.
You could just loop directly over the files:
for file in "$DOWNLOADDIRECTORY"/*; do
file="${file##*/}" # or file=$(basename "$file")
# MySQL stuff
done
Some quoting added in case of spaces in paths.

Converting Tcl to C++

I am trying to convert some tcl script into a C++ program. I don't have much experience with tcl and am hoping someone could explain what some of the following things are actually doing in the tcl script:
1) set rtn [true_test_sfm $run_dir]
2) cd [glob $run_dir]
3) set pwd [pwd]
Is the first one just checking if true_test_sfm directory exists in run_dir?
Also, I am programming on a windows machine. Would the system function be the equivalent to exec statements in tcl? And if so how would I print the result of the system function call to stdout?
In Tcl, square brackets indicate "evaluate the code between the square brackets". The result of that evaluation is substituted for the entire square-bracketed expression. So, the first line invokes the function true_test_sfm with a single argument $run_dir; the result of that function call is then assigned to the variable rtn. Unfortunately, true_test_sfm is not a built-in Tcl function, which means it's user-defined, which means there's no way we can tell you what the effect of that function call will be based on the information you've provided here.
glob is a built-in Tcl function which takes a file pattern as an argument and then lists files that match that pattern. For example, if a directory contains files "foo", "bar" and "baz", glob b* would return a list of two files, "bar" and "baz". Therefore the second line is looking for any files that match the pattern given by $run_dir, then using the cd command (another Tcl built-in) to change to the directory found by glob. Probably $run_dir is not actually a file pattern, but an explicit file name (i.e., no globbing characters like * or ? in the string); otherwise this code may break unexpectedly. On Windows, some combination of FindFirstFile/FindNextFile in C++ could be used as a substitute for glob in Tcl, and SetCurrentDirectory could substitute for cd.
pwd is another built-in Tcl function which returns the process's current working directory as an absolute path. So the last line is querying the current working directory and saving the result in a variable named pwd. Here you could use GetCurrentDirectory as a substitute for pwd.
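To make the behaviour of lines 2 and 3 concrete before porting them, here is a rough sketch of the same steps in Python rather than C++. It is not a translation of the original script: the function name enter_run_dir is hypothetical, and the check for exactly one match is an added assumption that makes the failure case explicit.

```python
import glob
import os

def enter_run_dir(run_dir):
    """Mimic `cd [glob $run_dir]` followed by `set pwd [pwd]`."""
    matches = glob.glob(run_dir)
    if len(matches) != 1:
        # In Tcl, glob raises an error when nothing matches, and a
        # multi-element result would not name a valid directory for cd.
        raise ValueError(
            f"expected exactly one match for {run_dir!r}, got {len(matches)}"
        )
    os.chdir(matches[0])
    # Like Tcl's pwd, getcwd returns the working directory as an absolute path.
    return os.getcwd()
```

A C++ port would follow the same three steps: resolve the pattern to one directory, change into it, then read back the current directory.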