How to use dbicdump to dump only specific tables - mysql

I need to dump only specific tables from my database (3 tables, to be exact, out of 200) so that just those tables are implemented by DBIx::Class::Schema.
Here is the command from the docs (https://metacpan.org/pod/dbicdump):
dbicdump -o dump_directory=./lib -o components='["InflateColumn::DateTime"]' -o preserve_case=1 MyApp::Schema dbi:mysql:database=database_name user pass;
I tried appending the table name after the database_name, but no luck; it still dumps all the tables in the specified database. I can't find anything in the docs.
Also, an off-topic question:
What do these mean? -o components='["InflateColumn::DateTime"]' -o preserve_case=1 I also can't find their explanation in the docs.
Thanks

You can pass the constraint option to the underlying DBIx::Class::Schema::Loader instance so that it only selects certain tables. The documentation is a bit vague on this:
These can be specified either as a regex (preferably in qr// form), or as an arrayref of arrayrefs. Regexes are matched against the (unqualified) table name, while arrayrefs are matched according to "moniker_parts".
For example:
db_schema => [qw(some_schema other_schema)],
moniker_parts => [qw(schema name)],
constraint => [
    [ qr/\Asome_schema\z/ => qr/\A(?:foo|bar)\z/ ],
    [ qr/\Aother_schema\z/ => qr/\Abaz\z/ ],
],
In this case only the tables foo and bar in some_schema and baz in
other_schema will be dumped.
So what you need to pass to dbicdump would look something like this:
dbicdump \
-o dump_directory=./lib \
-o components='["InflateColumn::DateTime"]' \
-o preserve_case=1 \
-o constraint='qr/^(?:foo|bar|baz)$/' \
MyApp::Schema dbi:mysql:database=database_name user pass;
That will give you only the tables foo, bar and baz. The quoted regular expression without arrayrefs is enough when there is a single schema and you don't want manually preset monikers (the names used for the tables in the generated Schema classes).
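To confirm that only those tables were dumped, you can check which Result classes were generated. A quick sketch, assuming the dump_directory=./lib and schema name from the command above; the file names shown are just what the default moniker rules would produce for tables foo, bar and baz:
ls lib/MyApp/Schema/Result/
# expected: Bar.pm  Baz.pm  Foo.pm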


How to write a correct mongodb query for mongodump?

I'm trying to back up 3 articles from my database. I have their IDs, but when I try to use mongodump I just can't seem to write the proper JSON query. I get either a JSON error message or a cryptic "cannot decode objectID into a slice" message.
Here's the command that I'm trying to run at the moment:
mongodump -d 'data' -c 'articles' -q '{"$oid": "5fa0bd32f7d5870029c7d421" }'
This returns the "cannot decode objectID into a slice" error, which I don't really understand. I also tried with ObjectId, like this:
mongodump -d 'data' -c 'articles' -q '{"_id": ObjectId("5fa0bd32f7d5870029c7d421") }'
But this one gives me an invalid JSON error.
I've tried all forms of escaping, escaping the double quotes, escaping the dollar, but nothing NOTHING seems to work. I'm desperate, and I hate mongodb. The closest I've been able to get to a working solution was this:
mongodump -d 'nikkei' -c 'articles' -q '{"_id": "ObjectId(5fa0bd32f7d5870029c7d421)" }'
And I say closest because this didn't fail: the command ran, but it returned done dumping data.articles (0 documents), which means, if I understood correctly, that no articles were saved.
What would be the correct format for the query? I'm using mongodump version r4.2.2 by the way.
I have a collection with these 4 documents:
> db.test.find()
{ "_id" : ObjectId("5fab80615397db06f00503c3") }
{ "_id" : ObjectId("5fab80635397db06f00503c4") }
{ "_id" : ObjectId("5fab80645397db06f00503c5") }
{ "_id" : ObjectId("5fab80645397db06f00503c6") }
I make the binary export using mongodump. This is MongoDB v4.2 on Windows.
>> mongodump --db=test --collection=test --query="{ \"_id\": { \"$eq\" : { \"$oid\": \"5fab80615397db06f00503c3\" } } }"
2020-11-11T11:42:13.705+0530 writing test.test to dump\test\test.bson
2020-11-11T11:42:13.737+0530 done dumping test.test (1 document)
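If you need several specific documents at once (the original question has three article IDs), the same extended-JSON form works with an $in list. A sketch for a POSIX shell, where the second and third IDs are hypothetical placeholders:
mongodump --db=data --collection=articles \
  --query='{ "_id": { "$in": [ { "$oid": "5fa0bd32f7d5870029c7d421" }, { "$oid": "<second-id>" }, { "$oid": "<third-id>" } ] } }'
The outer single quotes keep the shell from touching $in and $oid; on Windows cmd, escape the inner double quotes as in the example above.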
Here's an answer for those using Python:
Note: you must have the MongoDB Database Tools installed on your system.
import json
import os
# insert your query here; the filter must be a document, for example a match on _id
query = {"_id": {"$oid": "5fa0bd32f7d5870029c7d421"}}
# cast the query to an extended-JSON string
query = json.dumps(query)
# run mongodump (the single quotes around the query work in POSIX shells)
command = f"mongodump --db my_database --collection my_collection --query '{query}'"
os.system(command)
If your query is for JSON then try this format.
mongodump -d=nikkei -c=articles -q '{"_id": "ObjectId(5fa0bd32f7d5870029c7d421)" }'
Is there nothing else you could query, though, like a title? It might make things a little simpler.
I pulled this from the MongoDB docs. It was pretty far down the page, but here is the link.
https://docs.mongodb.com/database-tools/mongodump/#usage-in-backup-strategy

How do I make this into a function for input files? [duplicate]

I have a problem figuring out how to make the input directive select only the files belonging to each {samples} wildcard in the rule below.
rule MarkDup:
    input:
        expand("Outputs/MergeBamAlignment/{samples}_{lanes}_{flowcells}.merged.bam", zip,
               samples=samples['sample'],
               lanes=samples['lane'],
               flowcells=samples['flowcell']),
    output:
        bam = "Outputs/MarkDuplicates/{samples}_markedDuplicates.bam",
        metrics = "Outputs/MarkDuplicates/{samples}_markedDuplicates.metrics",
    shell:
        "gatk --java-options -Djava.io.tempdir=`pwd`/tmp \
        MarkDuplicates \
        $(echo ' {input}' | sed 's/ / --INPUT /g') \
        -O {output.bam} \
        --VALIDATION_STRINGENCY LENIENT \
        --METRICS_FILE {output.metrics} \
        --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 200000 \
        --CREATE_INDEX true \
        --TMP_DIR Outputs/MarkDuplicates/tmp"
Currently it will create correctly named output files, but it selects all files that match the pattern based on all wildcards. So I'm perhaps halfway there. I tried changing {samples} to {{samples}} in the input directive as such:
expand("Outputs/MergeBamAlignment/{{samples}}_{lanes}_{flowcells}.merged.bam", zip,
lanes=samples['lane'],
flowcells=samples['flowcell']),`
but this broke the previous rule somehow. So the solution is something like
input:
    "{sample}_*.bam"
But clearly this doesn't work.
Is it possible to collect all files that match {sample}_*.bam with a function and use that as input? And if so, will the function still work with $(echo ' {input}' etc...) in the shell directive?
If you just want all the files in the directory, you can use a lambda function
from glob import glob

rule MarkDup:
    input:
        lambda wcs: glob('Outputs/MergeBamAlignment/%s*.bam' % wcs.samples)
    output:
        bam="Outputs/MarkDuplicates/{samples}_markedDuplicates.bam",
        metrics="Outputs/MarkDuplicates/{samples}_markedDuplicates.metrics"
    shell:
        ...
Just be aware that this approach can't do any checking for missing files, since it will always report that the files needed are the files that are present. If you do need confirmation that the upstream rule has been executed, you can have the previous rule touch a flag, which you then require as input to this rule (though you don't actually use the file for anything other than enforcing execution order).
If I understand correctly, zip needs to be applied only to {lanes} and {flowcells} and not to {samples}. In that case, nesting two expand calls can achieve that.
input:
    expand(expand("Outputs/MergeBamAlignment/{{samples}}_{lanes}_{flowcells}.merged.bam",
                  zip, lanes=samples['lane'], flowcells=samples['flowcell']),
           samples=samples['sample'])
PS: output.tmp file uses {sample} instead of {samples}. Typo?

pt-query-digest filter queries with no key or bad key

How to filter all queries with either no key or a "bad" key?
pt-query-digest /var/lib/mysql/mysql-slow.log --filter '($event->{No_index_used} eq "Yes" || $event->{No_good_index_used} eq "Yes")'
This syntax returns an evaluation error.
I think pt-index-usage is a better tool for what you need. The downside is that it needs to run against an active MySQL instance, which can cause considerable overhead. If you have a slave you can use, or you can restore a backup and run it there, it's better to do that.
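For reference, a minimal pt-index-usage invocation might look like the sketch below, reading the same slow log and connecting to the server to EXPLAIN the queries; the connection options here are assumptions to adapt for your setup:
pt-index-usage /var/lib/mysql/mysql-slow.log --host 127.0.0.1 --user root --ask-pass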
With pt-query-digest, you can only filter by a certain set of attributes that are suitable for your input. To see the list of attributes that can be used to filter, you can run something like this:
pt-query-digest \
slowlog \
--filter 'print Dumper $event' \
--no-report \
--sample 1
This will print a list of pairs such as Lock_time => '0.000026', which means you can use $event->{Lock_time} to filter pt-query-digest results.
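Once you know which attributes your log actually contains, you can plug one into --filter, which takes a Perl expression evaluated per event. A sketch that keeps only events whose Lock_time exceeds one millisecond (the threshold is an arbitrary illustration):
pt-query-digest /var/lib/mysql/mysql-slow.log \
  --filter '$event->{Lock_time} && $event->{Lock_time} > 0.001'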

How do I get the latest tag value from the GitHub API for a given repository

I can get the latest commit from the GitHub API using:
$ curl 'https://api.github.com/repos/dwkns/test/commits?per_page=1'
However the resulting JSON doesn't contain any reference to the tag I created when I did that commit.
I can get a list of tags using:
$ curl 'https://api.github.com/repos/dwkns/test/tags'
However the resulting JSON, while it contains the names of tags I want, is not in the order in which they were created - there is no way of telling which tag is the latest one.
EDIT : The latest tag created was LatestLatestLatest
My question then is what API call(s) do I need to do to get the name of the latest tag in my repository?
Semantic Versioning Example
NOTE: If you're in a hurry and don't need all the fine details explained, just jump down to "The Solution" and execute the command.
This solution uses curl and grep to match the LATEST semantically versioned release number. An example will be demonstrated using my own Github repo "pi-ap" (a pile of bash scripts which automates config of a Raspberry Pi into a wireless AP).
You can test the example I give you on the CLI and after you're satisfied it works as intended, you can tweak it to your own use-case.
Versioning Format Construction:
Since we're using grep to match the version number, I need to explain its construction. Three integer fields (each padded to two digits), separated by 2 dots and prefaced by a "v":
vXX.XX.XX
 ^  ^  ^
 |  |  |
 |  |  Patch
 |  Minor
 Major
NOTE: If a field only has a single digit, I'll pad it with a zero to ensure the resulting format is predictable: always three two-digit fields separated by 2 dots.
The Solution:
Github Username: F1Linux
Github Repo Name: pi-ap (NOTE: exclude the ".git" suffix)
curl -s 'https://github.com/f1linux/pi-ap/tags/' | grep -Eo "v[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{1,2}" | sort -r | head -n1
Validate the Result Is Correct:
In your browser, go to:
https://github.com/f1linux/pi-ap/tags
And validate that the latest tag was returned from the command.
The above is fairly extensible for most use-cases. You just need to change the user and repo names, and remove or replace the "v" if you don't use this convention when tagging your repos.
Using jq in combination with curl you can have a pretty straightforward command:
curl -s \
-H "Accept: application/vnd.github.v3+json" \
https://api.github.com/repos/dwkns/test/tags \
| jq -r '.[0].name'
Output (as of today):
v56
Explanation of the jq command:
-r is for "raw": it avoids JSON quotes on jq's output
.[0] selects the first (latest) tag object in the JSON array we got from GitHub
.name selects the name property in this latest JSON object
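If you want to reuse the value in a script, you can capture it in a shell variable; a small sketch built on the same command:
latest_tag=$(curl -s -H "Accept: application/vnd.github.v3+json" \
  https://api.github.com/repos/dwkns/test/tags | jq -r '.[0].name')
echo "$latest_tag"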
#!/bin/sh
curl -s https://github.com/dwkns/test/tags |
awk '/tag-name/{print $3;exit}' FS='[<>]'
Or
#!/bin/awk -f
BEGIN {
    FS = "[<>]"
    while ("curl -s https://github.com/dwkns/test/tags" | getline) {
        if (/tag-name/) { print $3; exit }
    }
}

topojson makefile has no country data

I've been doing the d3 "Let's Make a Map" tutorial and I'M SO CLOSE, but something happened in merging the two JSON files, because the final uk.json doesn't have the three-letter country codes -- rendering my map useless because I can't assign a class to the subunits.
I read this from Mike Bostock saying that topojson changed and to do this instead when creating the file:
topojson \
--id-property su_a3 \
-p name=NAME \
-p name \
-o topo/uk.json \
topo/subunits.json \
topo/places.json
which I ran in the Terminal, but I get the same output in the uk.json file. Any ideas? Do I need to make a subfolder within my directory called "topo"?
1. Working code: At a quick glance, I see some differences between your code and mine. Try this:
topojson \
--id-property su_a3 \
-p name=name \
-p name=NAME \
-o topo/uk.json \
-- topo/subunits.json \
topo/places.json
I haven't tested it, however. The topo/... path is also a difference from my code.
2. Missing: A possibility is that you lost this property earlier in your workflow. The GIS file's data attribute name may have changed, etc.
3. Case sensitive: Check that the keys you call in your TopoJSON match the keys within your GIS / GeoJSON file. This is case sensitive. To check within the .shp file: QuantumGIS* > load the .shp file > right-click on the layer > Open attribute table > look at the column titles.
*: or other GIS software
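To check whether the id property actually made it into the generated topology, you can inspect the output file directly. A quick sketch, assuming the object is named subunits after the input file name and that jq is installed:
jq '.objects.subunits.geometries[0].id' topo/uk.json
If this prints null instead of a three-letter code such as "ENG", the --id-property didn't match a field in your source data.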
I had the same issue, although after a while I realized that the doc says
-p, --properties feature properties to preserve; no name preserves all properties
So if you use -p without anything else, something like
topojson --id-property SU_A3 -p -o yourjson.json -- subunits.json places.json
you will get all the features and you will be able to retrieve whatever field you want. I don't know how that works if you only want to map some of the attributes (I was having the same issue).
Anyway, hope this helps.