Recursively download resources from RESTful web service - json

I'd like to recursively download JSON resources from a RESTful HTTP endpoint and store these in a local directory structure, following links to related resources in the form of JSON strings containing HTTP URLs. Wget would seem to be a likely tool for the job, though its recursive download is apparently limited to HTML hyperlinks and CSS url() references.
The resources in question are Swagger documentation files similar to this one, though in my case all of the URLs are absolute. The Swagger schema is fairly complicated, but it would be sufficient to follow any string that looks like an absolute HTTP(S) URL. Even better would be to follow absolute or relative paths specified in 'path' properties.
Can anyone suggest a general purpose recursive crawler that would do what I want here, or a lightweight way of scripting wget or similar to achieve it?

I ended up writing a shell script to solve the problem:
API_ROOT_URL="http://petstore.swagger.wordnik.com/api/api-docs"
OUT_DIR=$(pwd)

# Fetch one resource and pretty-print it into $OUT_DIR.
download_json() {
  echo "Downloading $1 to $OUT_DIR$2.json"
  curl -sS "$1" | jq . > "$OUT_DIR$2.json"
}

download_json "$API_ROOT_URL" /api-index

# Follow each API path listed in the index file.
jq -r '.apis[].path' "$OUT_DIR/api-index.json" | while read -r API_PATH; do
  API_PATH=${API_PATH#"$API_ROOT_URL"}
  download_json "$API_ROOT_URL$API_PATH" "$API_PATH"
done
This uses jq to extract the API paths from the index file, and also to pretty-print the JSON as it is downloaded. As webron mentions, this will probably only be of interest to people still using the 1.x Swagger schema, though I can see myself adapting this script for other problems in the future.
One problem I've found with this for Swagger is that the order of entries in our API docs is apparently not stable. Running the script several times in a row against our API docs (generated by swagger-springmvc) results in minor changes to property order. This can be partly fixed by sorting the JSON objects' property keys with jq's --sort-keys option, but that doesn't cover all cases, e.g. a model schema's required property, which is a plain array of string property names.
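To make the output more diff-friendly, the jq invocation in download_json above can sort object keys as it pretty-prints (a minimal tweak; arrays such as required are still left in their original order):

curl -sS "$1" | jq --sort-keys . > "$OUT_DIR$2.json"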

Related

Gateway rest API resource can't find the file I provide

resource "aws_api_gateway_rest_api" "api" {
body = "${file("apigateway/json-resolved/swagger.json")}"
name = "api"
}
---------------------------------------------------------------------------------
Invalid value for "path" parameter: no file exists at apigateway/json-resolved/swagger.json;
this function works only with files that are distributed as
part of the configuration source code,
so if this file will be created by a resource in this configuration you must
instead obtain this result from an attribute of that resource.
This is what Terraform throws when I try to deploy my API by providing the actual path to the API JSON. The file is there, and I have tried different paths, from relative to absolute, etc. It works when I paste the entire JSON into the body, but not when I provide a file. Why is that?
Since Terraform is not aware of the location of the file, you should specify it explicitly:
1. If the file is in the same directory, use ./apigateway/json-resolved/swagger.json.
2. If the file is one directory up from the directory you are running Terraform from, use ../apigateway/json-resolved/swagger.json.
3. Alternatively, it is a good idea to use Terraform's built-in path references: path.cwd, path.module, or path.root. A more detailed explanation of what these three represent can be found in [1]; see the sketch after this list.
4. Provide a full path to the file by running pwd in the directory where the file is located (this works on Linux and macOS) and paste the result into the file function's input.
Additionally, any combination of points 2 and 3 could also work, but you should be careful.
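For example, a sketch of point 3 using path.module (the directory layout is taken from the question; adjust the relative part to wherever swagger.json actually lives):

resource "aws_api_gateway_rest_api" "api" {
  # path.module is the directory containing this configuration file
  body = file("${path.module}/apigateway/json-resolved/swagger.json")
  name = "api"
}

Note that the first-class file(...) syntax shown here requires Terraform >= 0.12; on older versions, keep the "${file(...)}" interpolation form from the question.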
There is also another great answer to a similar question [2].
NOTE: in some cases the path.* values might not give expected results on Windows. As per this comment [3] from GitHub, if the path separators are used consistently (i.e., all / or all \), Windows should also be able to work with path.*, but only for versions of Terraform >= 0.12. Based on the code snippet from the question, it seems an older version is used in this case.
[1] https://www.terraform.io/language/expressions/references#filesystem-and-workspace-info
[2] Invalid value for "path" parameter: no file exists at
[3] https://github.com/hashicorp/terraform/issues/14986#issuecomment-448756885

How to use local JSON assets to simulate API in Scala.js

I'm new to Scala and Scala.js and I want to experiment with handling JSON data. I'd like to simulate a server response by returning the content of a JSON file local to my Scala.js project, parse it and work with the data. What would be the best way to do so? Where should I place these files in my project tree, and how would I get their content?
Say that I have a file called myJSON.json containing something like
[
  {
    "ress": "AR",
    "lastDate": "2017-10-27 09:19:18"
  },
  {
    "ress": "JIM",
    "lastDate": "2017-10-27 06:57:15"
  },
  {
    "ress": "JOE",
    "lastDate": "2017-09-29 11:57:39"
  }
]
Can I place this file somewhere in my project so that I can read it and then parse its content to use it somehow (it could be displayed in the browser, logged to the console, etc.)? I guess I could use a tool such as scala-js or something similar for parsing, but accessing the file content in the first place is what I'm trying to figure out.
Note that I'm using scala-js.
Thanks in advance!
Like others said above, JavaScript that runs in the browser can't, in general, access the local filesystem. There are some exceptions:
The File API lets you access files that the user has selected in the UI using <input type="file" /> or drag-and-dropped into the browser window.
The Filesystem API lets you access files the way you seem to want, but it is non-standard and is not supported in most browsers. It also seems that Scala.js has no typings for it, but I'm not sure.
scala-js-dom has typings for the File API that you can use – search for the File and FileList types in its source. Its API mirrors the JavaScript API, so you will need to look up how exactly to do this in JS. Then translating it into Scala.js will be easy (or at least a different question).
If the File API does not work for your use case, another option is to use something like json-server to easily serve your JSON files on localhost via HTTP.
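For the json-server route, a minimal sketch (assuming Node.js is installed; note that json-server expects the top level of its data file to be a JSON object whose keys become routes, so the array from the question needs to be wrapped, e.g. as { "resources": [ ... ] } in a db.json):

npx json-server --watch db.json --port 3000

The Scala.js app can then fetch http://localhost:3000/resources with an ordinary Ajax call, exactly as it would against the real API.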

Custom dynamic inventory scripts/plugins in Ansible

Ansible allows devs to write programs (in any language) that will return JSON describing the dynamic “snapshot” of current hosts. I’m using vSphere, which is currently not supported by Ansible OSS, and so I need to write such a "custom inventory plugin".
I can handle the querying of vSphere for a list of hosts, as well as constructing the JSON that is compatible with what Ansible is expecting.
Where the documentation completely (seemingly) falls flat is:
How do I “connect” Ansible with my inventory app? That is, say my inventory app is a simple bash script (inventory.sh). How do I configure Ansible to call bash inventory.sh and obtain JSON from it? In reality the app will likely be a Java executable (inventory.jar), but I figure that if I can get it working with bash, I can extrapolate to Java; and
How does Ansible actually capture/fetch the JSON back from the app? STDOUT? Is this all supposed to happen over an HTTP connection? Examples? How does inventory.sh or inventory.jar communicate that JSON back to Ansible?
The inventory script has to be located on the same machine where Ansible runs. It does not communicate through HTTP; Ansible will simply parse the STDOUT of your program. The location does not matter at all; you just pass the path to Ansible when you call it:
ansible-playbook ... -i /path/to/your/inventory.sh
To avoid passing the inventory location every time, you could add this to the [defaults] section of your ansible.cfg:
inventory = /path/to/your/inventory.sh
You could also copy the script to /etc/ansible/hosts, which is the default location where Ansible looks for inventory files/scripts, but I prefer to keep things together, so I suggest placing it close to your playbooks/roles etc.
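For example, a minimal inventory.sh (a static sketch with made-up host names; a real script would query vSphere instead). Ansible invokes inventory scripts with --list, and including a "_meta" section spares it from calling the script again with --host for every host:

#!/bin/sh
# Print the whole inventory as JSON on stdout.
cat <<'EOF'
{
  "vsphere_vms": { "hosts": ["vm1.example.com", "vm2.example.com"] },
  "_meta": { "hostvars": {} }
}
EOF

Make it executable (chmod +x inventory.sh) and pass it with -i as shown above.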
And (3) Is any of this documented, anywhere? Don't see anything in the Ansible docs...
It is not mentioned on the page Developing Dynamic Inventory Sources, but it can be seen in some examples on the page Dynamic Inventory. The docs are community managed and at times a little unstructured and lacking important information.
BTW, there is a VMware inventory script included. Looking at the source, I have seen that it imports some vSphere stuff. I have little experience with VMware, so I can't judge whether it is actually what you need, in which case you wouldn't need to write your own.
This is completely user defined. Typically you would write your dynamic inventory in Python and emit a JSON dump of the output to create the inventory.
Here is an example for the use case you mentioned (vSphere): https://github.com/RaymiiOrg/ansible-vmware/blob/master/query.py
In a nutshell, you create it like a normal Python script, define the options (as that example does in main), and selectively execute functions based on which options are passed. These make REST calls and return the output in the form of a JSON dump, which Ansible can parse for use as inventory. A minimal sketch of that structure follows.
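The group and host names below are made up; a real version would replace get_inventory with actual vSphere queries:

#!/usr/bin/env python
# Sketch of the --list/--host protocol Ansible expects from an
# inventory script; including "_meta" in the --list output lets
# Ansible skip per-host --host calls.
import argparse
import json

def get_inventory():
    # Hypothetical static data standing in for real vSphere REST calls.
    return {
        "vsphere_vms": {"hosts": ["vm1.example.com", "vm2.example.com"]},
        "_meta": {"hostvars": {"vm1.example.com": {"ansible_user": "admin"}}},
    }

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args = parser.parse_args()

    if args.list:
        print(json.dumps(get_inventory()))
    elif args.host:
        # Host variables already live in _meta above, so return an empty dict.
        print(json.dumps({}))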

Convert JSON data to BSON on the command line

I'm on an Ubuntu system, and I'm trying to write a testing framework that has to (among other things) compare the output of a mongodump command. This command generates a bunch of BSON files, which I can compare. However, for human readability, I'd like to convert these to nicely formatted JSON instead, which I can do using the provided bsondump command. The issue is that this appears to be a one-way conversion.
While I can work around this if I absolutely need to, it would be a lot easier if there were a way to convert back from JSON to BSON on the command line. Does anyone know of a command line tool to do this? Google seems to have come up dry.
I haven't used them, but bsontools can convert from JSON, XML, or CSV.
As @WiredPrarie points out, the conversion from BSON to JSON is lossy, and it makes no sense to want to go back the other way. Workarounds include using mongoimport instead of mongorestore, or just using the original BSON. See the comments for more details. (Adding this answer mainly so I can close the question.)
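That said, if a rough JSON-to-BSON conversion is still useful for testing, here is a small sketch in Python, assuming pymongo >= 3.9 is installed (which bundles the bson module; older versions use bson.BSON.encode) and one JSON document per input line, as bsondump and mongoexport emit. Note that Extended JSON fields such as $oid come back as plain nested documents rather than real ObjectIds, which is exactly the lossiness mentioned above:

# json2bson.py: read one JSON document per line, write concatenated BSON.
import json
import sys

import bson  # ships with pymongo

with open(sys.argv[1]) as src, open(sys.argv[2], "wb") as dst:
    for line in src:
        if line.strip():
            dst.write(bson.encode(json.loads(line)))

Usage: python json2bson.py dump.json out.bson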
You can try beesn; it converts data both ways. For your variant (JSON -> BSON) use the -x switch.
Example:
$ beesn -x -i test-data/01.json -o my.bson
Disclaimer: I am an author of this tool.

How do I use the Perl Text-MediawikiFormat to convert mediawiki to xhtml?

On an Ubuntu platform, I installed the nice little perl script
libtext-mediawikiformat-perl - Convert Mediawiki markup into other text formats
which is available on cpan. I'm not familiar with perl and have no idea how to go about using this library to write a perl script that would convert a mediawiki file to an html file. e.g. I'd like to just have a script I can run such as
./my_convert_script input.wiki > output.html
(perhaps also specifying the base url, etc), but have no idea where to start. Any suggestions?
I believe @amon is correct that the Perl library I referenced in the question is not the right tool for the task I proposed.
I ended up using the MediaWiki API with action="parse" to convert to HTML using the MediaWiki engine, which turned out to be much more reliable than any of the alternative parsers proposed on the list that I tried. (I then used pandoc to convert my HTML to markdown.) The MediaWiki API handles extraction of categories and other metadata too, and I just had to append the base URL to internal image and page links.
Given the page title and base url, I ended up writing this as an R function.
wiki_parse <- function(page, baseurl, format="json", ...){
  require(httr)
  action = "parse"
  addr <- paste(baseurl, "/api.php?format=", format, "&action=", action, "&page=", page, sep="")
  config <- c(add_headers("User-Agent" = "rwiki"), ...)
  out <- GET(addr, config=config)
  parsed_content(out)
}
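A hypothetical call, assuming the target wiki exposes api.php under baseurl (the URL here is illustrative):

resp <- wiki_parse("Main_Page", "https://www.mediawiki.org/w")

The parsed JSON contains the rendered HTML under the parse element, along with the categories and other metadata mentioned above.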
The Perl library Text::MediawikiFormat isn't really intended for stand-alone use but rather as a formatting engine inside a larger application.
The documentation at CPAN does show one way to use this library, and notes that other modules might provide better support for one-off conversions.
You could try this (untested) one-liner
perl -MText::MediawikiFormat -e'$/=undef; print Text::MediawikiFormat::format(<>)' input.wiki >output.html
although that defies the whole point (and customization abilities) of this module.
I am sure that someone has already come up with a better way to convert single MediaWiki files, so here is a list of alternative MediaWiki processors on the mediawiki site. This SO question could also be of help.
Other markup languages, such as Markdown, provide better support for single-file conversions. Markdown is especially well suited for technical documents and mirrors email conventions. (Also, it is used on this site.)
The libfoo-bar-perl packages in the Ubuntu repositories are precompiled Perl modules. Usually, these would be installed via cpan or cpanm. While some of these libraries do include scripts, most don't, and aren't meant as stand-alone applications.