How to perform mongoexport without "numberLong" objects

I did a mongoexport dump of a database:
$ mongoexport -d my_db -c articles
and I'm seeing some of our ids are wrapped in "$numberLong" objects. Unfortunately, it isn't consistent. Some filemakerIds are just plain ints:
{"_id":{"$oid":"52126317036480948dc2abf2"},"filemakerId":4129,
and some are not:
{"_id":{"$oid":"52126317036480948dc2abf1"},"filemakerId":{"$numberLong":"4073"},
These ids will always be a 3- or 4-digit number. It would be easier for me if the dump consistently displayed them as a plain int (e.g. "filemakerId":4129). Can mongoexport force this?

An alternative to the mongoexport tool is MongoDB Compass, where you can export your data as JSON or CSV.
First, go to the aggregation console and create an aggregation pipeline.
In the $project stage, convert your data to double or int (I used double).
Add this:
{$project:{"filemakerId":{$toDouble:"$filemakerId"}}}
By converting your data to double/int you will no longer get "filemakerId":{"$numberLong":"4073"};
what you will get is "filemakerId":4073.
Finally, click on Export and choose the output format, JSON or CSV.
Hope you'll get your expected result.
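If you would rather script the same conversion than click through Compass, here is a minimal pymongo sketch of the idea (the connection string is an assumption; the my_db/articles names come from the question; $toInt requires MongoDB 4.0+):

import json
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
pipeline = [{"$project": {"filemakerId": {"$toInt": "$filemakerId"}}}]
with open("articles.json", "w") as out:
    for doc in client.my_db.articles.aggregate(pipeline):
        doc["_id"] = str(doc["_id"])  # plain hex string instead of an ObjectId
        out.write(json.dumps(doc) + "\n")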

Related

Facing issue with Mongoexport json file "_id" column

I am exporting a mongo collection to JSON format and then loading that data into a BigQuery table using the bq load command.
mongoexport --uri mongo_uri --collection coll_1 --type json --fields id,createdAt,updatedAt --out data1.csv
A JSON row looks like this:
{"_id":{"$oid":"6234234345345234234sdfsf"},"id":1,"createdAt":"2021-05-11 04:15:15","updatedAt":null}
but when I run the bq load command in BigQuery it gives the below error:
Invalid field name "$oid". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 300 characters long.
I think if the mongoexport JSON contained {"_id": ObjectId(6234234345345234234sdfsf)}, my issue would be solved.
Is there any way to export JSON like this?
Or any other way to achieve this?
Note: I can't use CSV format because the mongo documents contain commas.
By default _id holds an ObjectId value, and as you said, your problem would be solved if the JSON contained {"_id": ObjectId(6234234345345234234sdfsf)} instead of "_id":{"$oid":"6234234345345234234sdfsf"}. mongoexport cannot emit that form, though, since ObjectId(...) is not valid JSON, so post-process the file instead.
Replace $oid with oid. I'm using Python, so the code below worked:
import fileinput

# rewrite the export file in place, renaming the "$oid" key that BigQuery rejects
with fileinput.FileInput("mongoexport_json.txt", inplace=True, encoding="utf8") as file:
    for line in file:
        print(line.replace('"$oid":', '"oid":'), end='')
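If you would rather parse the Extended JSON properly than do raw string replacement, pymongo's bson package can handle it; a sketch along those lines (file names are just examples):

import json
from bson import json_util

with open("mongoexport_json.txt") as src, open("bq_ready.json", "w") as dst:
    for line in src:
        doc = json_util.loads(line)   # turns {"$oid": ...} into a real ObjectId
        doc["_id"] = str(doc["_id"])  # flatten to a plain hex string for BigQuery
        dst.write(json.dumps(doc, default=str) + "\n")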

How to write a correct mongodb query for mongodump?

I'm trying to back up 3 articles from my database. I have their IDs, but when I try to use mongodump I just can't seem to write the proper JSON query. I get either a JSON error message, or a cryptic "cannot decode objectID into a slice" message.
Here's the command that I'm trying to run at the moment:
mongodump -d 'data' -c 'articles' -q '{"$oid": "5fa0bd32f7d5870029c7d421" }'
This returns the "cannot decode objectID into a slice" error, which I don't really understand. I also tried with ObjectId, like this:
mongodump -d 'data' -c 'articles' -q '{"_id": ObjectId("5fa0bd32f7d5870029c7d421") }'
But this one gives me an invalid JSON error.
I've tried all forms of escaping, escaping the double quotes, escaping the dollar, but nothing NOTHING seems to work. I'm desperate, and I hate mongodb. The closest I've been able to get to a working solution was this:
mongodump -d 'nikkei' -c 'articles' -q '{"_id": "ObjectId(5fa0bd32f7d5870029c7d421)" }'
And I say closest because this didn't fail, the command ran but it returned done dumping data.articles (0 documents) which means, if I understood correctly, that no articles were saved.
What would be the correct format for the query? I'm using mongodump version r4.2.2 by the way.
I have a collection with these 4 documents:
> db.test.find()
{ "_id" : ObjectId("5fab80615397db06f00503c3") }
{ "_id" : ObjectId("5fab80635397db06f00503c4") }
{ "_id" : ObjectId("5fab80645397db06f00503c5") }
{ "_id" : ObjectId("5fab80645397db06f00503c6") }
I make the binary export using mongodump. This is MongoDB v4.2 on Windows:
>> mongodump --db=test --collection=test --query="{ \"_id\": { \"$eq\" : { \"$oid\": \"5fab80615397db06f00503c3\" } } }"
2020-11-11T11:42:13.705+0530 writing test.test to dump\test\test.bson
2020-11-11T11:42:13.737+0530 done dumping test.test (1 document)
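On a POSIX shell (Linux/macOS) the same query is easier to quote with single quotes, e.g.:
mongodump --db=test --collection=test --query='{"_id": {"$eq": {"$oid": "5fab80615397db06f00503c3"}}}'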
Here's an answer for those using Python:
Note: you must have the MongoDB Database Tools installed on your system.
import json
import os

# build the query in Extended JSON: match the document by its ObjectId
query = {"_id": {"$oid": "5fa0bd32f7d5870029c7d421"}}
# cast the query to a string
query = json.dumps(query)
# run the mongodump
command = f"mongodump --db my_database --collection my_collection --query '{query}'"
os.system(command)
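A variant of the same idea that sidesteps shell quoting altogether is to pass the arguments to subprocess as a list:

import json
import subprocess

query = json.dumps({"_id": {"$oid": "5fa0bd32f7d5870029c7d421"}})
# no shell involved, so no quoting or escaping issues across platforms
subprocess.run(["mongodump", "--db", "my_database",
                "--collection", "my_collection", "--query", query])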
If you are passing the query as JSON, use the Extended JSON form:
mongodump -d=nikkei -c=articles -q='{"_id": {"$oid": "5fa0bd32f7d5870029c7d421"}}'
Is there nothing else you could query though, like a title? Might make things a little more simple.
I pulled this from the MongoDB docs. It was pretty far down the page, but here is the link:
https://docs.mongodb.com/database-tools/mongodump/#usage-in-backup-strategy

Force mongodb to output strict JSON

I want to consume the raw output of some MongoDB commands in other programs that speak JSON. When I run commands in the mongo shell, the output is Extended JSON in "shell mode", with special fields like NumberLong, Date, and Timestamp. I see references in the documentation to "strict mode", but I see no way to turn it on for the shell, or a way to run commands like db.serverStatus() in tools that do output strict JSON, like mongodump. How can I force Mongo to output standards-compliant JSON?
There are several other questions on this topic, but I don't find any of their answers particularly satisfactory.
The MongoDB shell speaks Javascript, so the answer is simple: use JSON.stringify(). If your command is db.serverStatus(), then you can simply do this:
JSON.stringify(db.serverStatus())
This won't output the proper "strict mode" representation of each of the fields ({ "floatApprox": <number> } instead of { "$numberLong": "<number>" }), but if what you care about is getting standards-compliant JSON out, this'll do the trick.
I have not found a way to do this in the mongo shell, but as a workaround, mongoexport can run queries and its output uses strict mode and can be piped into other commands that expect JSON input (such as json_pp or jq). For example, suppose you have the following mongo shell command to run a query, and you want to create a pipeline using that data:
db.myItemsCollection.find({creationDate: {$gte: ISODate("2016-09-29")}}).pretty()
Convert that mongo shell command into this shell command, piping, for the sake of example, to json_pp:
mongoexport --jsonArray -d myDbName -c myItemsCollection -q '{"creationDate": {"$gte": {"$date": "2016-09-29T00:00Z"}}}' | json_pp
You will need to convert the query into strict mode format, and pass the database name and collection name as arguments, as well as quote properly for your shell, as shown here.
In the case of findOne:
JSON.stringify(db.Bill.findOne({'a': '123'}))
In the case of a cursor:
db.Bill.find({'a': '123'}).forEach(r=>print(JSON.stringify(r)))
or
print('[') + db.Bill.find().limit(2).forEach(r=>print(JSON.stringify(r) + ',')) + print(']')
will output
[{"a":"123"},{"a":"234"},]
Note the ',' after the last item; remove it to get valid JSON.
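If you'd rather avoid the trailing comma entirely, you can stringify the whole result array at once, e.g. print(JSON.stringify(db.Bill.find({'a': '123'}).toArray())); this is fine for small result sets, since toArray() buffers everything in memory.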
To build on the answer from @jbyler, you can strip out the numberLongs using sed after you get your data, that is, if you're using Linux.
mongoexport --jsonArray -d dbName -c collection -q '{fieldName: {$regex: ".*turkey.*"}}' | sed -r 's/\{ "[$]numberLong" : "([0-9]+)" }/"\1"/g' | json_pp
EDIT: This will transform a given document, but will not work on a list of documents. Changed find to findOne.
Adding
.forEach(function(results){results._id=results._id.toString();printjson(results)})
to a findOne() will output valid JSON.
Example:
db
  .users
  .findOne()
  .forEach(function (results) {
    results._id = results._id.toString();
    printjson(results);
  })
Source: https://www.mydbaworld.com/mongodb-shell-output-valid-json/

mongoexport without _id field

I am using mongoexport to export some data into a .json formatted file, however the documents have a large size overhead introduced by the _id:IDVALUE tuples.
I found a similar post, Is there a way to retrieve data from MongoDB without the _id field?, on how to omit the _id field when retrieving data from mongo, but not when exporting. It is suggested to use .Exclude("_id"). I tried to rewrite the --query parameter of mongoexport to somehow include the .Exclude("_id") parameter, but all of my attempts have failed so far.
Please suggest what is the proper way of doing this, or should I revert to using some post-export techniques?
Thanks
There appears to be no way to exclude a field (such as _id) using mongoexport.
Here's an alternative that has worked for me on moderate sized databases:
mongo myserver/mydb --quiet --eval "db.mycoll.find({}, {_id:0}).forEach(printjson);" > out.txt
On a large database (many millions of records) it can take a while, and running it will affect other operations people are trying to do on the system.
This works:
mongoexport --db db_name --collection collection_name | sed '/"_id":/s/"_id":[^,]*,//' > file_name.json
Pipe the output of mongoexport into jq and remove the _id field there.
mongoexport --uri=mongodb://localhost/mydb --collection=my_collection \
| jq 'del(._id)'
I know you specified that you wanted to export JSON, but if you can substitute CSV, the native mongoexport will work and will be a lot faster than the above solutions:
mongoexport --db <dbName> --collection <collectionName> --csv --fields "<fieldOne>,<fieldTwo>,<fieldThree>" > mongoex.csv
mongoexport doesn't seem to have such an option.
With ramda-cli stripping the _id would look like:
mongoexport --db mydb --collection mycoll -f name,age | ramda 'omit ["_id"]'
I applied quux00's solution, but forEach(printjson) prints MongoDB Extended JSON notation in the output (for instance "last_update" : NumberLong("1384715001000")).
It will be better to use the following line instead:
db.mycoll.find({}, {_id:0}).forEach(function (doc) {
    print(JSON.stringify(doc));
});
mongo <server>/<database> --quiet --eval "db.<collection>.find({}, {_id:0,<field>:1}).forEach(printjson);" > out.txt
If you have a query to execute, change the outer "" to '' and write your condition inside find with "", like find({"age":13}).
The simplest way to exclude sub-document information such as "_id" is to export it as CSV, then use a tool to convert the CSV into JSON.
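For the CSV-to-JSON step, a few lines of Python are enough (file names are placeholders; note that csv hands every value back as a string):

import csv
import json

with open("mongoex.csv") as src, open("mongoex.json", "w") as dst:
    for row in csv.DictReader(src):    # uses the CSV header line as keys
        dst.write(json.dumps(row) + "\n")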
mongoexport cannot omit "_id".
sed is a powerful command for doing it:
mongoexport --db mydb --collection mycoll -f name,age | sed '/"_id":/s/"_id":[^,]*,//'
The original answer is from Exclude _id field using MongoExport command
Just use the --type=csv option in the mongoexport command.
mongoexport --db=<db_name> --collection=<collection_name> --type=csv --fields=<fields> --out=<Outfilename>.csv
For MongoDB version 3.4+, you can use the --noHeaderLine option in the mongoexport command to exclude the field header from the CSV export too.
For Detail: https://docs.mongodb.com/manual/reference/program/mongoexport/
Export into a file, then remove the field with a regular expression find-and-replace (replacing matches with an empty string). In my case the field looked like
"_id": "f5dc48e1-ed04-4ef9-943b-b1194a088b95"
so I used the regex "_id": "(\w|-)*",
With jq this can be achieved easily:
mongoexport -d database -c collection --jsonArray | jq 'del(.[]._id)'
Have you tried specifying your fields with the --fields flag? All fields that are not mentioned are excluded from the export.
For maintainability you can also write your fields into a separate file and use --fieldFile.
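If you want a scripted equivalent of the projection-based answers above, a small pymongo sketch (server, database, and collection names are placeholders) does the same exclusion:

import json
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
with open("out.json", "w") as out:
    for doc in client.mydb.mycoll.find({}, {"_id": 0}):  # projection drops _id server-side
        out.write(json.dumps(doc, default=str) + "\n")   # default=str hedges dates etc.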

Proper way to import json file to mongo

I've been trying to use mongo with some imported data, but I'm not able to query it properly given my document structure.
This is an example of the .json I import using mongoimport: https://gist.github.com/2917854
mongoimport -d test -c example data.json
I noticed that the whole document is imported as a single object instead of creating one object for each shop.
That's why when I try to find a shop or anything I want to query, all the document is returned.
db.example.find({"shops.name":"x"})
I want to be able to query the db to obtain products by the id using dot notation something similar to:
db.example.find({"shops.name":"x","categories.type":"shirts","clothes.id":"1"}
The problem is that all the document is imported like a single object. The question is: How
do I need to import the object to obtain my desired result?
Docs note that:
This utility takes a single file that contains 1 JSON/CSV/TSV string per line and inserts it.
In the structure you are using, assuming the errors on the gist are fixed, you are essentially importing one document with only a shops field.
After breaking the data into separate shop docs, import using something like the following (shops being the collection name, which makes more sense than example):
mongoimport -d test -c shops data.json
and then you can query like:
db.shops.find({"name":x,"categories.type":"shirts"})
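Breaking the data into one document per line can itself be scripted; here is a short Python sketch, assuming the file holds a single object with a top-level "shops" array as in the question's gist:

import json

with open("data.json") as src:
    shops = json.load(src)["shops"]
with open("shops.json", "w") as dst:
    for shop in shops:
        dst.write(json.dumps(shop) + "\n")  # one shop document per line

Then import the result with mongoimport -d test -c shops shops.json.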
There is a parameter --jsonArray:
Accept import of data expressed with multiple MongoDB documents within a single JSON array
Using this option you can feed it an array, so you only need to strip the outer object syntax, i.e. everything at the beginning up to and including "shops":, and the } at the end.
I myself use a little tool called jq that can extract the array from the command line:
./jq '.shops' shops.json
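Since mongoimport reads from stdin when no --file is given, you can pipe the extracted array straight in, e.g.:
./jq '.shops' shops.json | mongoimport -d test -c shops --jsonArray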
IMPORT FROM JSON
mongoimport --db "databaseName" --collection "collectionName" --type json --file "fileName.json" --jsonArray
The JSON should be in this format (an array of objects):
[
  { "name": "Name1", "msg": "This is msg 1" },
  { "name": "Name2", "msg": "This is msg 2" },
  { "name": "Name3", "msg": "This is msg 3" }
]
IMPORT FROM CSV
mongoimport --db "databaseName" --collection "collectionName" --type csv --file "fileName.csv" --headerline
More Info
https://docs.mongodb.com/getting-started/shell/import-data/
Importing a JSON
The mongoimport command lets us import human-readable JSON into a specific database and collection. To import JSON data into a specific database and collection, type:
mongoimport -d databaseName -c collectionName jsonFileName.json