Force mongodb to output strict JSON

I want to consume the raw output of some MongoDB commands in other programs that speak JSON. When I run commands in the mongo shell, they print Extended JSON in "shell mode", with special wrapper types like NumberLong, Date, and Timestamp. I see references in the documentation to "strict mode", but I see no way to turn it on for the shell, or a way to run commands like db.serverStatus() in things that do output strict JSON, like mongodump. How can I force Mongo to output standards-compliant JSON?
There are several other questions on this topic, but I don't find any of their answers particularly satisfactory.

The MongoDB shell speaks JavaScript, so the answer is simple: use JSON.stringify(). If your command is db.serverStatus(), then you can simply do this:
JSON.stringify(db.serverStatus())
This won't output the proper "strict mode" representation of each of the fields ({ "floatApprox": <number> } instead of { "$numberLong": "<number>" }), but if what you care about is getting standards-compliant JSON out, this'll do the trick.
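If you need this from outside the shell, a minimal sketch (assuming the legacy mongo shell and jq on your PATH; the database name is a placeholder) is to evaluate the expression non-interactively and pipe it onward:
# print strict-ish JSON from a script; jq here just validates and pretty-prints
mongo mydb --quiet --eval 'print(JSON.stringify(db.serverStatus()))' | jq .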

I have not found a way to do this in the mongo shell, but as a workaround, mongoexport can run queries and its output uses strict mode and can be piped into other commands that expect JSON input (such as json_pp or jq). For example, suppose you have the following mongo shell command to run a query, and you want to create a pipeline using that data:
db.myItemsCollection.find({creationDate: {$gte: ISODate("2016-09-29")}}).pretty()
Convert that mongo shell command into this shell command, piping for the sake of example to json_pp:
mongoexport --jsonArray -d myDbName -c myItemsCollection -q '{"creationDate": {"$gte": {"$date": "2016-09-29T00:00Z"}}}' | json_pp
You will need to convert the query into strict mode format, and pass the database name and collection name as arguments, as well as quote properly for your shell, as shown here.
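The same pipeline works with jq, which can also reshape the strict-mode output, e.g. to pull out the first document (a sketch using the same placeholder names):
mongoexport --jsonArray -d myDbName -c myItemsCollection -q '{"creationDate": {"$gte": {"$date": "2016-09-29T00:00Z"}}}' | jq '.[0]'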

In the case of findOne:
JSON.stringify(db.Bill.findOne({'a': '123'}))
In the case of a cursor:
db.Bill.find({'a': '123'}).forEach(r => print(JSON.stringify(r)))
or:
print('[');
db.Bill.find().limit(2).forEach(r => print(JSON.stringify(r) + ','));
print(']');
will output something like
[
{"a":"123"},
{"a":"234"},
]
Note that the last item is followed by a ',', which you must remove to get valid JSON.
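A trailing-comma-free alternative (a sketch, assuming the same collection; note toArray() loads the whole result into memory) is to serialize the cursor as an array in one step, shown here as a one-liner with the legacy mongo shell:
mongo mydb --quiet --eval 'print(JSON.stringify(db.Bill.find().limit(2).toArray()))'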

To build on the answer from @jbyler, you can strip out the numberLongs using sed after you get your data - that is, if you're using Linux.
mongoexport --jsonArray -d dbName -c collection -q '{fieldName: {$regex: ".*turkey.*"}}' | sed -r 's/\{ "[$]numberLong" : "([0-9]+)" }/"\1"/g' | json_pp
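For illustration, the sed substitution turns a hypothetical exported fragment like
"total" : { "$numberLong" : "1234567890" }
into
"total" : "1234567890"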

EDIT: This transforms a single document, but will not work across a list of documents; the example was changed from find to findOne.
Converting the _id to a string before printing will output valid JSON. Note that findOne() returns a plain document, which has no forEach method, so the conversion is done directly.
Example:
var doc = db.users.findOne();
doc._id = doc._id.toString();
printjson(doc);
Since cursors (unlike findOne results) do have a forEach method, the same conversion works across a whole collection:
db.users.find().forEach(function (doc) {
    doc._id = doc._id.toString();
    printjson(doc);
})
Source: https://www.mydbaworld.com/mongodb-shell-output-valid-json/

Related

How to write a correct mongodb query for mongodump?

I'm trying to backup 3 articles from my database, I have their IDs but when I try to use mongodump I just can't seem to be able to write the proper json query. I get either a JSON error message, or some cryptic cannot decode objectID into a slice message.
Here's the command that I'm trying to run at the moment:
mongodump -d 'data' -c 'articles' -q '{"$oid": "5fa0bd32f7d5870029c7d421" }'
This is returning the ObjectID into a slice error, which I don't really understand. I also tried with ObjectId, like this:
mongodump -d 'data' -c 'articles' -q '{"_id": ObjectId("5fa0bd32f7d5870029c7d421") }'
But this one gives me a invalid JSON error.
I've tried all forms of escaping, escaping the double quotes, escaping the dollar, but nothing NOTHING seems to work. I'm desperate, and I hate mongodb. The closest I've been able to get to a working solution was this:
mongodump -d 'nikkei' -c 'articles' -q '{"_id": "ObjectId(5fa0bd32f7d5870029c7d421)" }'
And I say closest because this didn't fail, the command ran but it returned done dumping data.articles (0 documents) which means, if I understood correctly, that no articles were saved.
What would be the correct format for the query? I'm using mongodump version r4.2.2 by the way.
I have a collection with these 4 documents:
> db.test.find()
{ "_id" : ObjectId("5fab80615397db06f00503c3") }
{ "_id" : ObjectId("5fab80635397db06f00503c4") }
{ "_id" : ObjectId("5fab80645397db06f00503c5") }
{ "_id" : ObjectId("5fab80645397db06f00503c6") }
I make the binary export using mongodump, running MongoDB v4.2 on Windows:
>> mongodump --db=test --collection=test --query="{ \"_id\": { \"$eq\" : { \"$oid\": \"5fab80615397db06f00503c3\" } } }"
2020-11-11T11:42:13.705+0530 writing test.test to dump\test\test.bson
2020-11-11T11:42:13.737+0530 done dumping test.test (1 document)
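On Linux or macOS, single quotes make the same query easier to write (a sketch of the equivalent command):
mongodump --db=test --collection=test --query='{ "_id": { "$eq": { "$oid": "5fab80615397db06f00503c3" } } }'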
Here's an answer for those using Python:
Note: you must have mongo database tools installed on your system
import json
import os

# insert your query here (note: the ObjectId must be wrapped in an "_id" key)
query = {"_id": {"$oid": "5fa0bd32f7d5870029c7d421"}}

# serialize the query to a JSON string
query = json.dumps(query)

# run the mongodump
command = f"mongodump --db my_database --collection my_collection --query '{query}'"
os.system(command)
If your query is for JSON then try this format (the ObjectId must be expressed in Extended JSON $oid form, not as a quoted string).
mongodump -d=nikkei -c=articles -q='{"_id": {"$oid": "5fa0bd32f7d5870029c7d421"}}'
Is there nothing else you could query though, like a title? Might make things a little more simple.
I pulled this from mongoDB docs. It was pretty far down the page but here is the link.
https://docs.mongodb.com/database-tools/mongodump/#usage-in-backup-strategy

Oracle SQLcl: Spool to json, only include content in items array?

I'm making a query via Oracle SQLcl. I am spooling into a .json file.
The correct data is presented from the query, but the format is strange.
Starting off as:
SET ENCODING UTF-8
SET SQLFORMAT JSON
SPOOL content.json
Followed by a query, this produces a JSON file as requested.
However, how do I remove the outer structure, meaning this part:
{"results":[{"columns":[{"name":"ID","type":"NUMBER"},
{"name":"LANGUAGE","type":"VARCHAR2"},{"name":"LOCATION","type":"VARCHAR2"},{"name":"NAME","type":"VARCHAR2"}],"items": [
// Here is the actual data I want to see in the file exclusively
]
I only want to spool everything in the items array, not including that key itself.
Is it possible to set this as a parameter before querying? Reading the Oracle docs has not yielded any answers, hence asking here.
That's how I handle it.
After spooling to a file, I use the jq command to recreate the file with only the items:
cat file.json | jq --compact-output --raw-output '.results[0].items' > items.json
Using this tool: https://stedolan.github.io/jq/

Is there a `jq` command line tool or wrapper which lets you interactively explore `jq` similar to `jmespath.terminal`

jq is a lightweight and flexible command-line JSON processor.
https://stedolan.github.io/jq/
Is there a jq command line tool or wrapper which lets you pipe output into it and interactively explore jq, with the JSON input in one pane and your interactively updating result in another pane, similar to jmespath.terminal ?
I'm looking for something similar to the JMESPath Terminal jpterm
"JMESPath exploration tool in the terminal"
https://github.com/jmespath/jmespath.terminal
I found this project jqsh but it's not maintained and it appears to produce a lot of errors when I use it.
https://github.com/bmatsuo/jqsh
I've used https://jqplay.org/ and it's a great web based jq learning tool. However, I want to be able to, in the shell, pipe the json output of a command into an interactive jq which allows me to explore and experiment with jq commands.
Thanks in advance!
I've been using jiq and I'm pretty happy with it.
https://github.com/fiatjaf/jiq
It's jid with jq.
You can drill down interactively by using jq filtering queries.
jiq uses jq internally, and it requires you to have jq in your PATH.
Using the aws cli
aws ec2 describe-regions --region-names us-east-1 us-west-1 | jiq
jiq output
[Filter]> .Regions
{
"Regions": [
{
"Endpoint": "ec2.us-east-1.amazonaws.com",
"RegionName": "us-east-1"
},
{
"Endpoint": "ec2.us-west-1.amazonaws.com",
"RegionName": "us-west-1"
}
]
}
jid itself lives at https://github.com/simeji/jid
n.b. I'm not clear how strictly jid follows the jq syntax and feature set.
You may have to roll your own.
Of course, jq itself is interactive in the sense that if you invoke it without specifying any JSON input, it will process STDIN interactively.
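For instance, in this hypothetical session jq is started with no input file, a JSON value is typed at it, and the result is printed as soon as a complete value has been read (Ctrl-D ends the session):
$ jq '.name'
{"name": "Holden"}
"Holden"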
If you want to feed the same data to multiple programs, you could easily write your own wrapper. Over at github, there's a bash script named jqplay that has a few bells and whistles. For example, if the input command begins with | then the most recent result is used as input. (A minimal roll-your-own sketch appears after Example 2 below.)
Example 1
./jqplay -c spark.json
Enter a jq filter (possibly beginning with "|"), or blank line to terminate:
.[0]
{"name":"Paddington","lovesPandas":null,"knows":{"friends":["holden","Sparky"]}}
.[1]
{"name":"Holden"}
| .name
"Holden"
| .[0:1]
"H"
| length
1
.[1].name
"Holden"
Bye.
Example 2
./jqplay -n
Enter a jq filter (possibly beginning and/or ending with "|"), or blank line to terminate:
?
An initial | signifies the filter should be applied to the previous jq
output.
A terminating | causes the next line that does not trigger a special
action to be appended to the current line.
Special action triggers:
:exit # exit this script, also triggered by a blank line
:help # print this help
:input PATHNAME ...
:options OPTIONS
:save PN # save the most recent output in the named file provided
it does not exist
:save! PN # save the most recent output in the named file
:save # save to the file most recently specified by a :save command
:show # print the OPTIONS and PATHNAMEs currently in effect
:! PN # equivalent to the sequence of commands
:save! PN
:input PN
? # print this help
# # ignore this line
1+2
3
:exit
Bye.
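For comparison, here is a minimal roll-your-own sketch of the same idea (assuming bash and jq; "jqloop" is a hypothetical name). It simply re-runs jq over one file with each filter you type, without jqplay's bells and whistles:
#!/bin/bash
# usage: ./jqloop FILE
file="$1"
while IFS= read -r -p '[filter]> ' filter; do
  [ -z "$filter" ] && break    # a blank line terminates, like jqplay
  jq "$filter" "$file"
done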
If you're using Emacs (or willing to) then JQ-mode allows you to run JQ filters interactively on the current JSON document buffer:
https://github.com/ljos/jq-mode
There is a new one: https://github.com/PaulJuliusMartinez/jless
JLess is a command-line JSON viewer designed for reading, exploring, and searching through JSON data.
JLess will pretty print your JSON and apply syntax highlighting.
Expand and collapse Objects and Arrays to grasp the high- and low-level structure of a JSON document. JLess has a large suite of vim-inspired commands that make exploring data a breeze.
JLess supports full text regular-expression based search. Quickly find the data you're looking for in long String values, or jump between values for the same Object key.
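Typical usage is pointing it at a file or reading from a pipe (a sketch, assuming jless is installed; the file name is a placeholder):
jless data.json
# or
cat data.json | jless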

mongoexport without _id field

I am using mongoexport to export some data into a .json formatted file, however the output has a large size overhead introduced by the _id:IDVALUE pairs.
I found a similar post Is there a way to retrieve data from MongoDB without the _id field? on how to omit the _id field when retrieving data from mongo, but not when exporting. It suggests using: .Exclude("_id"). I tried to rewrite the --query parameter of mongoexport to somehow include the .Exclude("_id") parameter, but all of my attempts have failed so far.
Please suggest what is the proper way of doing this, or should I revert to using some post-export techniques?
Thanks
There appears to be no way to exclude a field (such as _id) using mongoexport.
Here's an alternative that has worked for me on moderate sized databases:
mongo myserver/mydb --quiet --eval "db.mycoll.find({}, {_id:0}).forEach(printjson);" > out.txt
On a large database (many millions of records) it can take a while, and running it will affect other operations on the system.
This works:
mongoexport --db db_name --collection collection_name | sed '/"_id":/s/"_id":[^,]*,//' > file_name.json
Pipe the output of mongoexport into jq and remove the _id field there.
mongoexport --uri=mongodb://localhost/mydb --collection=my_collection \
| jq 'del(._id)'
I know you specified that you wanted to export JSON, but if you can substitute CSV data, the native mongoexport CSV mode will work, and will be a lot faster than the above solutions:
mongoexport --db <dbName> --collection <collectionName> --csv --fields "<fieldOne>,<fieldTwo>,<fieldThree>" > mongoex.csv
mongoexport doesn't seem to have such an option.
With ramda-cli stripping the _id would look like:
mongoexport --db mydb --collection mycoll -f name,age | ramda 'omit ["_id"]'
I applied quux00's solution, but forEach(printjson) prints MongoDB Extended JSON notation in the output (for instance "last_update" : NumberLong("1384715001000")).
It is better to use the following instead:
db.mycoll.find({}, {_id:0}).forEach(function (doc) {
    print(JSON.stringify(doc));
});
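The same approach as a one-liner from the operating-system shell (a sketch, assuming the legacy mongo shell; server, database, and collection names are placeholders):
mongo myserver/mydb --quiet --eval 'db.mycoll.find({}, {_id:0}).forEach(function (doc) { print(JSON.stringify(doc)); })' > out.json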
mongo <server>/<database> --quiet --eval "db.<collection>.find({}, {_id:0,<field>:1}).forEach(printjson);" > out.txt
If you have a query to execute, change the outer double quotes to single quotes and write your condition inside find() with double quotes, like find({"age": 13}).
The simplest way to exclude the sub-document information such as the "_id" is to export it as a csv, then use a tool to convert the csv into json.
mongoexport can not omit "_id"
sed is a powerful command to do it:
mongoexport --db mydb --collection mycoll -f name,age | sed '/"_id":/s/"_id":[^,]*,//'
The original answer is from Exclude _id field using MongoExport command
Just use the --type=csv option in the mongoexport command.
mongoexport --db=<db_name> --collection=<collection_name> --type=csv --fields=<fields> --out=<Outfilename>.csv
For MongoDB version 3.4+, you can also use the --noHeaderLine option in mongoexport to exclude the field header from the CSV export.
For Detail: https://docs.mongodb.com/manual/reference/program/mongoexport/
Export into a file, then remove the _id values with a regular-expression find and replace. In my case the exported field looked like
"_id": "f5dc48e1-ed04-4ef9-943b-b1194a088b95",
and I used the pattern "_id": "(\w|-)*", to replace it with nothing.
With jq this can be achieved easily:
mongoexport -d database -c collection --jsonArray | jq 'del(.[]._id)'
Have you tried specifying your fields with the --fields flag? All fields that are not mentioned are excluded from the export.
For maintainability you can also write your fields into a separate file and use --fieldFile.

mongoexport JSON parsing error

Trying to use a query with mongoexport results in an error. But the same query is evaluated by the mongo-client without an error.
In mongo-client:
db.listing.find({"created_at":new Date(1221029382*1000)})
with mongoexport:
mongoexport -d event -c listing -q '{"created_at":new Date(1221029382*1000)}'
The generated error:
Fri Nov 11 17:44:08 Assertion: 10340:Failure parsing JSON string near:
$and: [ {
0x584102 0x528454 0x5287ce 0xa94ad1 0xa8e2ed 0xa92282 0x7fbd056a61c4
0x4fca29
mongoexport(_ZN5mongo11msgassertedEiPKc+0x112) [0x584102]
mongoexport(_ZN5mongo8fromjsonEPKcPi+0x444) [0x528454]
mongoexport(_ZN5mongo8fromjsonERKSs+0xe) [0x5287ce]
mongoexport(_ZN6Export3runEv+0x7b1) [0xa94ad1]
mongoexport(_ZN5mongo4Tool4mainEiPPc+0x169d) [0xa8e2ed]
mongoexport(main+0x32) [0xa92282]
/lib/libc.so.6(__libc_start_main+0xf4) [0x7fbd056a61c4]
mongoexport(__gxx_personality_v0+0x3d9) [0x4fca29]
assertion: 10340 Failure parsing JSON string near: $and: [ {
But doing the multiplication beforehand, so that Date receives a literal number, in mongoexport:
mongoexport -d event -c listing -q '{"created_at":new Date(1221029382000)}'
works!
Why is mongo evaluating the queries differently in these two contexts?
The mongoexport command-line utility supports passing a query in JSON format, but you are trying to evaluate JavaScript in your query.
The JSON format was originally derived from JavaScript's object notation, but the contents of a JSON document can be parsed without eval()ing it in a JavaScript interpreter.
You should consider JSON as representing "structured data" and JavaScript as "executable code". So there are, in fact, two different contexts for the queries you are running.
The mongo command-line utility is an interactive JavaScript shell which includes a JavaScript interpreter as well as some helper functions for working with MongoDB. While the JavaScript object format looks similar to JSON, you can also use JavaScript objects, function calls, and operators.
Your example of 1221029382*1000 is the result of a math operation that would be executed by the JavaScript interpreter if you ran that in the mongo shell; in JSON it's an invalid value for a new Date so mongoexport is exiting with a "Failure parsing JSON string" error.
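As an aside, a version-dependent sketch of expressing that date in pure JSON for mongoexport, using the Extended JSON $date form instead of JavaScript (1221029382000 ms corresponds to 2008-09-10T06:49:42Z):
mongoexport -d event -c listing -q '{"created_at": {"$date": "2008-09-10T06:49:42Z"}}'
# older tool versions accepted epoch milliseconds instead:
mongoexport -d event -c listing -q '{"created_at": {"$date": 1221029382000}}'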
I also got this error doing a mongoexport, but for a different reason. I'll share my solution here though since I ended up on this SO page while trying to solve my issue.
I know it has little to do with this question, but the title of this post brought it up in Google, so since I was getting the exact same error I'll add an answer. Hopefully it helps someone.
I was trying to do a MongoId _id query in the Windows console. The problem was that I needed to wrap the JSON query in double quotes, and the ObjectId also had to be in double quotes (not single!). So I had to escape the ObjectId quotes.
mongoexport -u USERNAME -pPASSWORD -d DATABASE -c COLLECTION
--query "{_id : ObjectId(\"5148894d98981be01e000011\")}"
If I wrap the JSON query in single quote on Windows, I get this error:
ERROR: too many positional options
And if I use single quotes around the ObjectId, I get this error:
Assertion: 10340:Failure parsing JSON string near: _id
So, yeah. Good luck.