I've been working through the D3 "Let's Make a Map" tutorial and I'm SO close, but something went wrong when merging the two JSON files, because the final uk.json doesn't have the three-letter country codes, rendering my map useless since I can't assign a class to the subunits.
I read a note from Mike Bostock saying that topojson changed and to do this instead when creating the file:
topojson \
--id-property su_a3 \
-p name=NAME \
-p name \
-o topo/uk.json \
topo/subunits.json \
topo/places.json
which I ran in the Terminal, but I get the same output in the uk.json file. Any ideas? Do I need to make a subfolder within my directory called "topo"?
1. Working code: At a glance, I see some differences between your code and mine. Try this:
topojson \
--id-property su_a3 \
-p name=name \
-p name=NAME \
-o topo/uk.json \
-- topo/subunits.json \
topo/places.json
I haven't tested it, however. The topo/... path is also a difference from my code.
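One more thing: since the command writes to topo/uk.json, make sure the topo/ folder already exists before running it; as far as I know the CLI won't create missing parent directories for you. A quick way to do that from your project root:
# Create the output folder if it isn't there yet (harmless if it already exists).
mkdir -p topo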
2. Missing property: It's possible that you lost this property earlier in your workflow. The GIS file's attribute name may have changed, etc.
3. Case sensitivity: Check that the keys you reference in your TopoJSON command match the keys in your GIS / GeoJSON file; they are case sensitive. To check inside the .shp file: QuantumGIS* > load the .shp file > right-click the layer > Open Attribute Table > look at the column titles.
*: or any other GIS software
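If you have GDAL installed, you can also check the attribute names from the command line instead of opening a GUI. A sketch, assuming the tutorial's subunits shapefile (adjust the filename to whatever you downloaded):
# ogrinfo prints the layer summary, including field names such as SU_A3 and NAME.
# -al lists all layers, -so keeps it to a summary (no per-feature dump).
ogrinfo -so -al ne_10m_admin_0_map_subunits.shp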
I had the same issue, but after a while I realized that the docs say
-p, --properties feature properties to preserve; no name preserves all properties
So if you use -p without anything else, something like
topojson --id-property SU_A3 -p -o yourjson.json -- subunits.json places.json
you will get all the properties and will be able to retrieve whatever field you want. I don't know how that works if you only want to keep some of the attributes (I was having the same issue).
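Either way, you can quickly verify whether the ids actually made it into the generated file, for example with jq (this assumes the object inside uk.json is named subunits, after the input file name):
# Should print a three-letter code such as "SCT" if --id-property worked.
jq '.objects.subunits.geometries[0].id' topo/uk.json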
Anyway, hope this helps.
I have a Snakefile as follows:
SAMPLES, = glob_wildcards("data/{sample}_R1.fq.gz")

rule all:
    input:
        expand("samtools_sorted_out/{sample}.raw.snps.indels.g.vcf", sample=SAMPLES),
        expand("samtools_sorted_out/combined_gvcf")

rule combine_gvcf:
    input: "samtools_sorted_out/{sample}.raw.snps.indels.g.vcf"
    output: directory("samtools_sorted_out/combined_gvcf")
    params:
        gvcf_file_list="gvcf_files.list",
        gatk4="/storage/anaconda3/envs/exome/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar"
    shell: """
        java -DGATK_STACKTRACE_ON_USER_EXCEPTION=true \
        -jar {params.gatk4} GenomicsDBImport \
        -V {params.gvcf_file_list} \
        --genomicsdb-workspace-path {output}
        """
When I test it with a dry run, I get this error:
RuleException in line 335 of /data/yifangt/exomecapture/Snakefile:
Wildcards in input, params, log or benchmark file of rule combine_gvcf cannot be determined from output files:
'sample'
There are two places where I need some help:
The {output} is a folder that will be created by the shell part;
The {output} folder is hard-coded manually, as required by the command line (and its contents are unknown ahead of time).
The problem seems to be that {output} is not expanded, whereas {input} is.
How should I handle this situation? Thanks a lot!
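For what it's worth, the usual way around this particular error is to aggregate all per-sample GVCFs in the rule's input with expand, so that no {sample} wildcard is left that the output can't determine. A minimal, untested sketch of what that might look like here:
rule combine_gvcf:
    input:
        # aggregate every per-sample GVCF so no unresolved {sample} wildcard remains
        expand("samtools_sorted_out/{sample}.raw.snps.indels.g.vcf", sample=SAMPLES)
    output:
        directory("samtools_sorted_out/combined_gvcf")
    params:
        gvcf_file_list="gvcf_files.list",
        gatk4="/storage/anaconda3/envs/exome/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar"
    shell:
        """
        java -DGATK_STACKTRACE_ON_USER_EXCEPTION=true \
        -jar {params.gatk4} GenomicsDBImport \
        -V {params.gvcf_file_list} \
        --genomicsdb-workspace-path {output}
        """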
I have a problem figuring out how to make the input directive select only the files belonging to each {samples} wildcard in the rule below.
rule MarkDup:
    input:
        expand("Outputs/MergeBamAlignment/{samples}_{lanes}_{flowcells}.merged.bam", zip,
               samples=samples['sample'],
               lanes=samples['lane'],
               flowcells=samples['flowcell']),
    output:
        bam = "Outputs/MarkDuplicates/{samples}_markedDuplicates.bam",
        metrics = "Outputs/MarkDuplicates/{samples}_markedDuplicates.metrics",
    shell:
        "gatk --java-options -Djava.io.tempdir=`pwd`/tmp \
        MarkDuplicates \
        $(echo ' {input}' | sed 's/ / --INPUT /g') \
        -O {output.bam} \
        --VALIDATION_STRINGENCY LENIENT \
        --METRICS_FILE {output.metrics} \
        --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 200000 \
        --CREATE_INDEX true \
        --TMP_DIR Outputs/MarkDuplicates/tmp"
Currently it creates correctly named output files, but it selects all files that match the pattern across all wildcards, so I'm perhaps halfway there. I tried changing {samples} to {{samples}} in the input directive, like this:
expand("Outputs/MergeBamAlignment/{{samples}}_{lanes}_{flowcells}.merged.bam", zip,
lanes=samples['lane'],
flowcells=samples['flowcell']),`
but this broke the previous rule somehow. So the solution is something like
input:
"{sample}_*.bam"
But clearly this doesn't work.
Is it possible to collect all files that match {sample}_*.bam with a function and use that as input? And if so, will the function still work with $(echo ' {input}' etc...) in the shell directive?
If you just want all the files in the directory, you can use a lambda function:
from glob import glob

rule MarkDup:
    input:
        lambda wcs: glob('Outputs/MergeBamAlignment/%s*.bam' % wcs.samples)
    output:
        bam="Outputs/MarkDuplicates/{samples}_markedDuplicates.bam",
        metrics="Outputs/MarkDuplicates/{samples}_markedDuplicates.metrics"
    shell:
        ...
Just be aware that this approach can't do any checking for missing files, since it will always report that the files needed are the files that are present. If you do need confirmation that the upstream rule has been executed, you can have the previous rule touch a flag, which you then require as input to this rule (though you don't actually use the file for anything other than enforcing execution order).
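A minimal sketch of that flag idea (the rule name and paths below are illustrative, not taken from the original Snakefile): the upstream rule touches an empty marker file per output, and the downstream rule then lists those markers as extra input purely to enforce ordering.
rule MergeBamAlignment:
    input:
        "Outputs/Unmerged/{samples}_{lanes}_{flowcells}.bam"  # illustrative upstream input
    output:
        bam="Outputs/MergeBamAlignment/{samples}_{lanes}_{flowcells}.merged.bam",
        # touch() creates an empty flag file once the rule has finished successfully
        flag=touch("Outputs/MergeBamAlignment/{samples}_{lanes}_{flowcells}.done")
    shell:
        "..."
The MarkDup rule can then glob the .done files the same way it globs the .bam files, without ever reading them.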
If I understand correctly, zip needs to be applied only to {lanes} and {flowcells} and not to {samples}. In that case, using two expand instances can achieve that.
input:
    expand(expand("Outputs/MergeBamAlignment/{{samples}}_{lanes}_{flowcells}.merged.bam",
                  zip, lanes=samples['lane'], flowcells=samples['flowcell']),
           samples=samples['sample'])
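If it helps to see what the nested expand produces, here is a tiny standalone illustration with made-up lane/flowcell values (requires snakemake to be installed):
from snakemake.io import expand

# Inner expand: {{samples}} is kept as the literal wildcard {samples}, while zip
# pairs each lane with its flowcell instead of taking the cross product.
inner = expand("Outputs/MergeBamAlignment/{{samples}}_{lanes}_{flowcells}.merged.bam",
               zip, lanes=["L001", "L002"], flowcells=["HABC", "HDEF"])
# inner == ['Outputs/MergeBamAlignment/{samples}_L001_HABC.merged.bam',
#           'Outputs/MergeBamAlignment/{samples}_L002_HDEF.merged.bam']

# Outer expand then fills in the remaining {samples} wildcard.
print(expand(inner, samples=["S1"]))
# ['Outputs/MergeBamAlignment/S1_L001_HABC.merged.bam',
#  'Outputs/MergeBamAlignment/S1_L002_HDEF.merged.bam']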
PS: output.tmp file uses {sample} instead of {samples}. Typo?
I've read plenty of posts about this issue on here, but I still can't seem to get around it. I've been trying to use neo4j-import on some large genome data CSVs I have, but it doesn't seem to recognise the files. My command-line input is as follows:
user#LenovoPC ~/.config/Neo4j Desktop/Application/neo4jDatabases/database-2f182948-e170-45b1-b9f4-19d236ff5d43/installation-3.5.1 $ \
bin/neo4j-import --into data/databases/graph.db --id-type string \
--nodes:Allele variants.csv --nodes:Chromosome chromosome.csv --nodes:Phenotype phenotypes.csv \
--nodes:Sample samples.csv --relationships:BELONGS_TO variant_chromosomes.csv \
--relationships: sample_phenotypes.csv --relationships:ALTERNATIVE_TO variant_variants.csv \
--relationships:HAS sample_variants50-99.csv.gz
But I'm getting the following error:
WARNING: neo4j-import is deprecated and support for it will be removed in a future version of Neo4j; please use neo4j-admin import instead.
Input error: Expected '--nodes' to have at least 1 valid item, but had 0 []
Caused by:Expected '--nodes' to have at least 1 valid item, but had 0 []
java.lang.IllegalArgumentException: Expected '--nodes' to have at least 1 valid item, but had 0 []
at org.neo4j.kernel.impl.util.Validators.lambda$atLeast$6(Validators.java:144)
at org.neo4j.helpers.Args.validated(Args.java:670)
at org.neo4j.helpers.Args.interpretOptionsWithMetadata(Args.java:637)
at org.neo4j.tooling.ImportTool.extractInputFiles(ImportTool.java:623)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:445)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:380)
I included the file path because I'm using Neo4j Desktop and am not sure whether it has a different file structure. My CSV files are stored in the import folder (but I also have copies in the current folder and the graph.db folder, just in case).
The import directory is as follows:
user#LenovoPC ~/.config/Neo4j Desktop/Application/neo4jDatabases/database-2f182948-e170-45b1-b9f4-19d236ff5d43/installation-3.5.1/import $ dir
chromosomes.csv samples.csv variants.csv
phenotypes.csv sample_variants50-99.csv.gz variants.csv.gz
sample_phenotypes.csv variant_chromosomes.csv
variant_variants.csv
I can only assume that it's my filepath, but I've tried quite a few alternatives and had no luck at all. If anyone could shed some light on what the issue is, I would really appreciate it!
Best is to cd into that installation directory and place the CSV files into the import folder.
Then you can do:
cd "$HOME/.config/Neo4j Desktop/Application/neo4jDatabases/database-2f182948-e170-45b1-b9f4-19d236ff5d43/installation-3.5.1"
bin/neo4j-import --into data/databases/graph.db --id-type string \
--nodes:Allele import/variants.csv \
--nodes:Chromosome import/chromosome.csv \
--nodes:Phenotype import/phenotypes.csv \
--nodes:Sample import/samples.csv \
--relationships:BELONGS_TO import/variant_chromosomes.csv \
--relationships import/sample_phenotypes.csv \
--relationships:ALTERNATIVE_TO import/variant_variants.csv \
--relationships:HAS import/sample_variants50-99.csv.gz
Some more notes:
HAS is a pretty generic relationship type.
I left off the colon here: --relationships import/sample_phenotypes.csv, since I'm not sure whether you have the relationship type in the file.
Is this a single file? --relationships:HAS import/sample_variants50-99.csv.gz
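Also, given the deprecation warning, you may want to switch to neo4j-admin import at some point. I haven't run this against your data, so treat it as a rough sketch and double-check the flags against the import documentation for your exact version, but for a 3.5 installation it should look roughly like this:
bin/neo4j-admin import --mode=csv --database=graph.db --id-type=STRING \
    --nodes:Allele=import/variants.csv \
    --nodes:Chromosome=import/chromosome.csv \
    --nodes:Phenotype=import/phenotypes.csv \
    --nodes:Sample=import/samples.csv \
    --relationships:BELONGS_TO=import/variant_chromosomes.csv \
    --relationships=import/sample_phenotypes.csv \
    --relationships:ALTERNATIVE_TO=import/variant_variants.csv \
    --relationships:HAS=import/sample_variants50-99.csv.gz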
I just need to dump specific tables from my database, so that only these specific tables (3 tables, to be exact, out of 200) are implemented by DBIx::Class::Schema.
Here is the command from the docs (https://metacpan.org/pod/dbicdump):
dbicdump -o dump_directory=./lib -o components='["InflateColumn::DateTime"]' -o preserve_case=1 MyApp::Schema dbi:mysql:database=database_name user pass;
I tried appending the table name after the database_name, but no luck; it still dumps all the tables in the specified database. I can't find anything in the docs. Need help.
Also, an off-topic question:
What do these mean: -o components='["InflateColumn::DateTime"]' and -o preserve_case=1? I also can't find an explanation for them in the docs.
Thanks
You can pass the constraint option to the underlying DBIx::Class::Schema::Loader instance so that it only selects certain tables. The documentation is a bit vague on this:
These can be specified either as a regex (preferably on the qr// form), or as an arrayref of arrayrefs. Regexes are matched against the (unqualified) table name, while arrayrefs are matched according to "moniker_parts".
For example:
db_schema => [qw(some_schema other_schema)],
moniker_parts => [qw(schema name)],
constraint => [
[ qr/\Asome_schema\z/ => qr/\A(?:foo|bar)\z/ ],
[ qr/\Aother_schema\z/ => qr/\Abaz\z/ ],
],
In this case only the tables foo and bar in some_schema and baz in
other_schema will be dumped.
So what you need to pass to dbicdump would look something like this:
dbicdump \
-o dump_directory=./lib \
-o components='["InflateColumn::DateTime"]' \
-o preserve_case=1 \
-o constraint='qr/^(?:foo|bar|baz)$/' \
MyApp::Schema dbi:mysql:database=database_name user pass;
That will give you only the tables foo, bar and baz. You need the quoted regular expression without arrayrefs if there is only one schema and you don't want to use manually preset monikers (which are the names used for the tables in the generated Schema class).
I can get the latest commit from the GitHub API using:
$ curl 'https://api.github.com/repos/dwkns/test/commits?per_page=1'
However, the resulting JSON doesn't contain any reference to the tag I created when I made that commit.
I can get a list of tags using:
$ curl 'https://api.github.com/repos/dwkns/test/tags'
However, the resulting JSON, while it contains the names of the tags I want, is not in the order in which they were created; there is no way of telling which tag is the latest one.
EDIT: The latest tag created was LatestLatestLatest.
My question, then, is: what API call(s) do I need to make to get the name of the latest tag in my repository?
Semantic Versioning Example
NOTE: If you're in a hurry and don't need all the fine details explained, just jump down to "The Solution" and execute the command.
This solution uses curl and grep to match the LATEST semantically versioned release number. An example will be demonstrated using my own GitHub repo "pi-ap" (a pile of bash scripts that automates configuring a Raspberry Pi as a wireless AP).
You can test the example I give you on the CLI, and after you're satisfied it works as intended, you can tweak it to your own use case.
Versioning Format Construction:
Since we're using grep to match the version number, I need to explain its construction. It is 3 pairs of integers separated by 2 dots and prefaced by a "v":
vXX.XX.XX
 ^  ^  ^
 |  |  |
 |  |  Patch
 |  Minor
 Major
NOTE: If a field only has a single digit, I'll pad it with a zero to ensure the resulting format is predictable: always 3 pairs of integers separated by 2 dots.
The Solution:
Github Username: F1Linux
Github Repo Name: pi-ap (NOTE: exclude the ".git" suffix)
curl -s 'https://github.com/f1linux/pi-ap/tags/' | grep -Eo 'v[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{1,2}' | sort -r | head -n1
Validate That the Result Is Correct:
In your browser, go to:
https://github.com/f1linux/pi-ap/tags
And validate that the latest tag was returned from the command.
The above is fairly extensible for most use cases. Just change the user and repo names, and remove/replace the "v" if you don't use that convention when tagging your repos.
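For instance, a hypothetical adaptation for a repo whose tags are zero-padded numbers without the "v" prefix could look like this (SOME_USER/SOME_REPO is a placeholder):
# Same idea, minus the "v"; still relies on zero-padded fields so the plain
# reverse sort puts the highest version first.
curl -s 'https://github.com/SOME_USER/SOME_REPO/tags/' \
  | grep -Eo '[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{1,2}' \
  | sort -r | head -n1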
Using jq in combination with curl, you can have a pretty straightforward command:
curl -s \
-H "Accept: application/vnd.github.v3+json" \
https://api.github.com/repos/dwkns/test/tags \
| jq -r '.[0].name'
Output (as of today):
v56
Explanation of the jq command:
-r is for "raw"; it avoids JSON quotes in jq's output
.[0] selects the first (latest) tag object in the JSON array we got from GitHub
.name selects the name property in this latest JSON object
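If you also want the commit the latest tag points at, the same response carries it; a small variation (assuming the tags endpoint keeps its current shape):
curl -s \
  -H "Accept: application/vnd.github.v3+json" \
  https://api.github.com/repos/dwkns/test/tags \
  | jq -r '.[0] | "\(.name) \(.commit.sha)"'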
#!/bin/sh
# Scrape the HTML tags page and print the first (i.e. latest) tag name.
curl -s https://github.com/dwkns/test/tags |
awk '/tag-name/{print $3;exit}' FS='[<>]'
Or
#!/bin/awk -f
BEGIN {
    FS = "[<>]"
    while ("curl -s https://github.com/dwkns/test/tags" | getline) {
        if (/tag-name/) { print $3; exit }
    }
}