I'm on an Ubuntu system, and I'm trying to write a testing framework that has to (among other things) compare the output of a mongodump command. This command generates a bunch of BSON files, which I can compare. However, for human readability, I'd like to convert these to nicely formatted JSON instead, which I can do using the provided bsondump command. The issue is that this appears to be a one-way conversion.
While I can work around this if I absolutely need to, it would be a lot easier if there were a way to convert back from JSON to BSON on the command line. Does anyone know of a command-line tool to do this? Google seems to have come up dry.
I haven't used them, but bsontools can convert from JSON, XML, or CSV.
As @WiredPrarie points out, the conversion from BSON to JSON is lossy, so it makes no sense to want to go back the other way. Workarounds include using mongoimport instead of mongorestore, or just using the original BSON. See the comments for more details. (Adding this answer mainly so I can close the question.)
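If the mongoimport route fits your workflow, the round trip could look roughly like this (just a sketch; testdb and mycoll are placeholder names, and the dump path depends on how mongodump was invoked):
mongorestore --db testdb dump/testdb
mongoexport --db testdb --collection mycoll --out mycoll.json
mongoimport --db testdb --collection mycoll --drop --file mycoll.json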
You can try beesn; it converts data both ways. For your variant (JSON -> BSON), use the -x switch.
Example:
$ beesn -x -i test-data/01.json -o my.bson
Disclaimer: I am the author of this tool.
Hi, I'm trying to parse any of the files from the link below. I've tried reaching out to the owner of the data dumps, but nothing we try manages to parse the files as proper JSON. None of the programs we use (Power BI, Jupyter, Excel) will recognise the files as JSON, and we can't figure out why. I was wondering if anyone could help figure out what the issue is, as this dataset is very interesting to me and my fellow students. I hope I'm using the word 'parsing' correctly.
The data dumps are linked below:
https://files.pushshift.io/reddit/comments/
The file I downloaded (I just tried one at random) was handled just fine by jq, my preferred command-line tool for processing JSON files.
jq accepts an input consisting of a sequence of JSON objects, which is what I found when I decompressed the test file. This format is commonly known as JSON Lines, and many tools can handle it. The Wikipedia article on JSON streaming contains more information and a (possibly outdated) list of tools.
If your tools can't handle more than one JSON object in an input, you could turn the files into something you can handle by adding a comma to the end of every line except the last one (since each JSON object is on a single line) and then wrapping the whole input in a pair of brackets to turn the sequence into a JSON list. Since JSON does not actually care about newlines, it is sufficient to add a line containing [ at the beginning and a line containing ] at the end. I don't know which command-line tools you have available and are comfortable with, but the task shouldn't be too difficult.
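For example, assuming the dump has already been decompressed to a file called comments.json (a hypothetical name), either of these should work:
$ jq -s '.' comments.json > comments-array.json
$ { echo '['; sed '$!s/$/,/' comments.json; echo ']'; } > comments-array.json
The first uses jq's slurp mode to read the whole stream into a single JSON array; the second adds the commas and brackets by hand, relying on each object sitting on its own line.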
I just installed jq on Ubuntu 18.04, and the output of my curl calls is really nicely formatted. I am wondering if there is a way to always have my terminal(s) output JSON data nicely formatted without having to append | jq . to the end of the command? I am familiar with setting up aliases in my .zshrc file, but there is no single command I would use this for, so I'm not sure how to make this happen.
Dummy command I ran:
curl -X GET localhost:3000/products | jq .
I want the jq formatting to always be in effect when JSON is returned on the CLI.
Thanks!
While I won't go so far as to say this cannot be done, I'm not aware of any straightforward way of doing it, and I think it would be extremely challenging for the following reasons:
It's hard to reliably identify that a command is outputting JSON. Many things might look like JSON without being intended as JSON, and a command might output something that is largely JSON-like but not entirely JSON.
It's impossible to know whether a long-running command is outputting valid JSON until it has finished. This creates a dilemma about what to do in the interim: wait for it to finish, potentially indefinitely (especially if it's waiting for your input!), or find some way to roll back the output if it turns out we guessed wrong.
It's hard to predict up-front if a command is going to return JSON. As in your example, you can't tell what the remote server is going to return from a curl command.
It's unclear what the "right" thing to do is for non-trivial commands, such as invoking shell functions or pipelines or applying redirections, or for programs such as text editors that use advanced control codes for redrawing the screen.
For these reasons, I think if this is possible at all it would require quite an advanced shell that captures the output of all commands the user executes, analyses and formats that output continuously, and has the capability to roll back and reformat (or unformat) the output if it realises the output is incompatible with its method of display. I think that would be pretty interesting to see, but I'm not aware of such a shell.
You could start with a simple script, e.g.
$ cat curljq
#!/bin/bash
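# pass all command-line arguments through to curl, then pretty-print the response with jq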
curl "$#" | jq .
$ chmod +x curljq
$ ./curljq -Ss -H "Accept: application/json" https://reqbin.com/echo/get/json
{
  "success": "true"
}
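Alternatively, since you mention .zshrc: zsh supports global aliases, which expand anywhere on the command line, so you could define something like the following (the alias name J is just an example) and tack it onto any command instead of typing the full pipe:
alias -g J='| jq .'
curl -X GET localhost:3000/products J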
I use a PowerShell script to get JSON information from a web interface. Currently I store the JSON information from there in a file before I use jq 1.5 (under Windows 10) to transform the JSON into a format that I can upload into a database. But since I use jq in the same PowerShell environment, I think I can avoid that redirection and work directly with the JSON text (via a variable, or with the JSON text given directly on the jq command line). I checked the manual but found no clear answer to that question (for me). I found the --argjson option in the manual, which looks like what I need, but the manual is not clear on how to define the variable under Windows/PowerShell.
Regards
Timo
Sorry if the question is confusing. I found a way to work with variables directly on the command line, e.g.:
$variable | C:\jq.exe [Filter]
There was no explanation in the jq manual of how to pass JSON text directly to jq on the command shell, but I found it. Thanks for your help.
Regards
Timo
You could use the ConvertFrom-Json command to convert any piece of JSON directly into a PowerShell object.
I have a bunch of pcap files, created with tcpdump. I would like to store these in a database, for easier querying, indexing, etc. I thought mongodb might be a good choice, because storing packets the way Wireshark/TShark presents them, as JSON documents, seems natural.
It should be possible to create PDML files with tshark, parse these and insert them into mongodb, but I am curious if someone knows of an existing/other solution.
On the command line (Linux, Windows or MacOS), you can use tshark.
e.g.
tshark -r input.pcap -T json >output.json
or with a filter:
tshark -2 -R "your filter" -r input.pcap -T json >output.json
Considering you mentioned a set of pcap files, you can also pre-merge the pcap files into a single pcap and then export that in one go, if preferred:
mergecap -w output.pcap input1.pcap input2.pcap..
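If the end goal is to get the packets into MongoDB, one possible (untested) sketch is to pipe tshark's JSON output straight into mongoimport, since tshark -T json emits a single JSON array and mongoimport reads standard input when no --file is given (the database and collection names here are placeholders, and --jsonArray may be limited to fairly small imports in some mongoimport versions):
tshark -r input.pcap -T json | mongoimport --db captures --collection packets --jsonArray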
Wireshark has a feature to export its capture files to JSON:
File->Export Packet Dissections->As JSON
You could use pcaphar. More info about HAR here.
Does anyone know of a simple utility for editing a simple BSON database/file?
Did you try this: http://docs.mongodb.org/manual/reference/bsondump/ ?
The installation package that includes mongodump ('mongo-tools' on Ubuntu) should also include bsondump, for which the manpage says:
bsondump - examine BSON files in a human-readable form
You can convert BSON to JSON with the following:
bsondump --pretty <your_file.bson
As a data interchange format, BSON may not be suitable for editing directly. To manipulate a BSON dataset you could, of course, upload it to MongoDB and work with it there, or you could open the bsondump-decoded JSON in an editor. But the BSON Wikipedia article indicates that compatible libraries exist in several languages, which suggests you should also be able to decode it programmatically to an internal map representation and edit that representation in code.