Proper way to import a JSON file to mongo

I've been trying to use mongo with some imported data, but I'm not able to query it properly because of how my documents are structured.
This is an example of the .json I import using mongoimport: https://gist.github.com/2917854
mongoimport -d test -c example data.json
I noticed that my whole dataset is imported as a single object, instead of one object per shop.
That's why when I try to find a shop, or anything else I query for, the entire document is returned.
db.example.find({"shops.name":"x"})
I want to be able to query the db to obtain products by id using dot notation, something similar to:
db.example.find({"shops.name":"x","categories.type":"shirts","clothes.id":"1"})
The problem is that the whole document is imported as a single object. The question is: how
do I need to import the object to obtain my desired result?

Docs note that:
This utility takes a single file that contains 1 JSON/CSV/TSV string per line and inserts it.
In the structure you are using (assuming the errors in the gist are fixed) you are essentially importing one document with only a shops field.
After breaking the data into separate shop docs, import using something like the following (shops being the collection name makes more sense than example):
mongoimport -d test -c shops data.json
and then you can query like:
db.shops.find({"name":"x","categories.type":"shirts"})
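For reference, the reshaped data.json would then hold one shop document per line, something like this (a sketch based on the fields used in the queries above; the actual gist structure may differ):
{"name":"x","categories":[{"type":"shirts","clothes":[{"id":"1"}]}]}
{"name":"y","categories":[{"type":"pants","clothes":[{"id":"2"}]}]}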

There is a parameter --jsonArray:
Accept import of data expressed with multiple MongoDB documents within a single JSON array
Using this option you can feed it an array, so you only need to strip the outer object syntax, i.e. everything at the beginning up to and including "shops" :, and the closing } at the end.
Myself, I use a little tool called jq that can extract the array from the command line:
./jq '.shops' shops.json
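Since mongoimport reads from stdin when no --file is given, the two steps can be combined into one pipeline (a sketch, assuming the top-level key in shops.json is shops):
./jq '.shops' shops.json | mongoimport -d test -c shops --jsonArray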

IMPORT FROM JSON
mongoimport --db "databaseName" --collection "collectionName" --type json --file "fileName.json" --jsonArray
The JSON should be in the following format (an array of objects). Note that mongoimport expects valid JSON, so the keys must be quoted:
[
  { "name": "Name1", "msg": "This is msg 1" },
  { "name": "Name2", "msg": "This is msg 2" },
  { "name": "Name3", "msg": "This is msg 3" }
]
IMPORT FROM CSV
mongoimport --db "databaseName" --collection "collectionName" --type csv --file "fileName.csv" --headerline
More Info
https://docs.mongodb.com/getting-started/shell/import-data/

Importing a JSON
The mongoimport command allows us to import human-readable JSON into a specific database and collection. To import JSON data into a specific database and collection, type:
mongoimport -d databaseName -c collectionName jsonFileName.json

Related

Facing issue with Mongoexport json file "_id" column

I am exporting a mongo collection to JSON format and then loading that data into a BigQuery table using the bq load command.
mongoexport --uri mongo_uri --collection coll_1 --type json --fields id,createdAt,updatedAt --out data1.json
The json row looks like below:
{"_id":{"$oid":"6234234345345234234sdfsf"},"id":1,"createdAt":"2021-05-11 04:15:15","updatedAt":null}
but when I run the bq load command in BigQuery, it gives the error below:
Invalid field name "$oid". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 300 characters long.
I think if the mongoexport JSON contained {"_id": ObjectId("6234234345345234234sdfsf")}, my issue would be solved.
Is there any way to export JSON like this?
Or any other way to achieve this?
Note: I can't use the CSV format because the mongo documents contain commas.
By default, _id holds an ObjectId value, so it's better to store data in the {"_id": ObjectId("6234234345345234234sdfsf")} format instead of storing it as "_id":{"$oid":"6234234345345234234sdfsf"}.
As you mentioned, if the JSON contains {"_id": ObjectId("6234234345345234234sdfsf")}, your problem will be solved.
Replace $oid with oid. I'm using Python, so the code below worked:
import fileinput

# Rewrite the export file in place, renaming the "$oid" key to "oid"
with fileinput.FileInput("mongoexport_json.txt", inplace=True, encoding="utf8") as file:
    for line in file:
        print(line.replace('"$oid":', '"oid":'), end='')
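If Python isn't handy, a one-liner with GNU sed does the same in-place replacement (a sketch, assuming the export file is named mongoexport_json.txt as above):
sed -i 's/"$oid":/"oid":/g' mongoexport_json.txt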

Add a json document with mlab (mongodb in heroku) with multiple objects

I am trying to import a JSON file into my MongoDB on mLab (MongoDB in Heroku) by using the 'Add document' button. When I insert only one object, everything works accordingly. However, if I try to add multiple objects in the same JSON, the site returns to the homepage without any result. The JSON looks like this:
[
{"flightNo":"t010118CND11111112","STD": {"$date": "2018-01-01T06:00:00.000Z"}},
{"flightNo":"t010118CND11121112","STD": {"$date": "2018-01-01T14:00:00.000Z"}}
]
Isn't it possible to import a large file containing multiple objects? If not, is there any other easy way to achieve this?
You can use mongoimport to import JSON:
mongoimport -h ds123.mlab.com:123 -d mydb -c mycoll -u myuser -p "my password" --file "C:\Users\me\file.json" --jsonArray
The JSON files that MongoDB works with are usually formatted like this:
{"flightNo":"t010118CND11111112","STD": {"$date": "2018-01-01T06:00:00.000Z"}}
{"flightNo":"t010118CND11121112","STD": {"$date": "2018-01-01T14:00:00.000Z"}}
Note the lack of [] and commas. The --jsonArray parameter allows you to use ordinary JSON arrays:
[
{"flightNo":"t010118CND11111112","STD": {"$date": "2018-01-01T06:00:00.000Z"}},
{"flightNo":"t010118CND11121112","STD": {"$date": "2018-01-01T14:00:00.000Z"}}
]

Using mongoimport with a single document containing a list of documents

Given the following document in a json file:
{
"session_date" : "03/03/2017",
"data" : [
{"user": "jack", "views": 10}
]
}
The JSON is valid: if I copy it into the insert window of Robomongo, it results in one inserted document containing a list of documents (a list of 1 document in this simple example).
Nevertheless, I am unable to do this with mongoimport:
> mongoimport --db mydb --jsonArray --collection mycollection --file data\test.json
> connected to: localhost
> Failed: error reading separator after document #1: bad JSON array format - found no opening bracket '[' in input source
> imported 0 documents
Since it is a document and not an array of documents, I cannot use --jsonArray option.
Any help importing this?
mongoimport --db mydb --collection mycollection --file data\test.json
should work for you, if that's the import file you're planning to use anyway.
Just to add info: this will create one document containing the above JSON data in mycollection under mydb.
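To double-check the result, a quick look in the mongo shell (database and collection names as above):
use mydb
db.mycollection.findOne()
This should print the single imported document, with its data array intact.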

Error handling importing json to mongo

I have a json file which looks like this:
{"_id":"id1","cat":["C","D","R","P"],"desc":"some desc","name":"some_name","category":"categ","languages":["en"],"type":"swf","start_date":1.39958850681748E9},
{"_id":"toy-driver","cat":["C","D","R","P"],"desc":"some desc","name":"name2","category":"Rac","languages":["en"],"type":"swf","start_date":1.399588506820609E9},
............
It consists of about 900 rows. When I import just 2 or 3 similar rows using the command:
mongoimport -d test -c test --jsonArray file.json
it imports perfectly without any errors, whereas importing the entire file causes an error:
imported 0 objects
encountered 1 error
If I'm not mistaken, there are 2 problems here.
--jsonArray expects 1 JSON document where all of your 900 records are part of a single array.
You're supplying 900 standalone JSON rows in a file, which doesn't match the --jsonArray format. (Also, drop the comma at the end of each line; mongoimport expects one document per line. A one-line fix for this is shown after the example below.)
If you want to import data from a file, you have to specify it with the --file flag, e.g. --file file.json.
I believe that if you remove the --jsonArray argument and add the --file flag, this will be fine, e.g.:
mongoimport -d test -c test --file file.json
Where file.json is similar to this:
{"_id":"id1","cat":["C","D","R"], ... "type":"swf","start_date":1.39958850681748E9}
{"_id":"id2","cat":["C","D","R"], ... "type":"swf","start_date":1.39958850681748E9}
{"_id":"id3","cat":["C","D","R"], ... "type":"swf","start_date":1.39958850681748E9}
Hope this helped!

mongoexport without _id field

I am using mongoexport to export some data into a .json formatted file; however, the document has a large size overhead introduced by _id:IDVALUE tuples.
I found a similar post, Is there a way to retrieve data from MongoDB without the _id field?, on how to omit the _id field when retrieving data from mongo, but not when exporting. It suggests using .Exclude("_id"). I tried to rewrite the --query parameter of mongoexport to somehow include the .Exclude("_id") parameter, but all of my attempts have failed so far.
Please suggest the proper way of doing this, or should I revert to using some post-export techniques?
Thanks
There appears to be no way to exclude a field (such as _id) using mongoexport.
Here's an alternative that has worked for me on moderate sized databases:
mongo myserver/mydb --quiet --eval "db.mycoll.find({}, {_id:0}).forEach(printjson);" > out.txt
On a large database (many millions of records) it can take a while, and running it will affect other operations on the system.
This works:
mongoexport --db db_name --collection collection_name | sed '/"_id":/s/"_id":[^,]*,//' > file_name.json
Pipe the output of mongoexport into jq and remove the _id field there.
mongoexport --uri=mongodb://localhost/mydb --collection=my_collection \
| jq 'del(._id)'
Update: jq is available at https://stedolan.github.io/jq/.
I know you specified that you wanted to export in JSON, but if you can substitute CSV data, the native mongo export will work, and it will be a lot faster than the above solutions:
mongoexport --db <dbName> --collection <collectionName> --csv --fields "<fieldOne>,<fieldTwo>,<fieldThree>" > mongoex.csv
mongoexport doesn't seem to have such an option.
With ramda-cli stripping the _id would look like:
mongoexport --db mydb --collection mycoll -f name,age | ramda 'omit ["_id"]'
I applied quux00's solution, but forEach(printjson) prints MongoDB Extended JSON notation in the output (for instance "last_update" : NumberLong("1384715001000")).
It is better to use the following instead:
db.mycoll.find({}, {_id:0}).forEach(function (doc) {
    print( JSON.stringify(doc) );
});
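Combined with the earlier --eval approach, the whole export becomes (a sketch; substitute your own server, database and collection names):
mongo myserver/mydb --quiet --eval "db.mycoll.find({}, {_id:0}).forEach(function(doc){ print(JSON.stringify(doc)); });" > out.json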
mongo <server>/<database> --quiet --eval "db.<collection>.find({}, {_id:0,<field>:1}).forEach(printjson);" > out.txt
If you have some query to execute, change the outer "" to '' and write your condition in find() with double quotes, like find({"age":13}).
The simplest way to exclude sub-document information such as "_id" is to export it as CSV, then use a tool to convert the CSV into JSON.
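A minimal conversion sketch in Python (the file names mongoex.csv and mongoex.json are placeholders):
import csv
import json

# Read the CSV export; DictReader keys each row by the header line
with open("mongoex.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Write the rows back out as a JSON array
with open("mongoex.json", "w") as f:
    json.dump(rows, f, indent=2)
Note that every value comes back as a string; re-typing numbers and dates is up to you.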
mongoexport cannot omit "_id".
sed is a powerful command to do it:
mongoexport --db mydb --collection mycoll -f name,age | sed '/"_id":/s/"_id":[^,]*,//'
The original answer is from Exclude _id field using MongoExport command
Just use the --type=csv option in the mongoexport command.
mongoexport --db=<db_name> --collection=<collection_name> --type=csv --fields=<fields> --out=<Outfilename>.csv
For MongoDB version 3.4 and later, you can use the --noHeaderLine option in the mongoexport command to exclude the field header from the CSV export too.
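For example (a sketch with placeholder database, collection and field names):
mongoexport --db=mydb --collection=mycoll --type=csv --fields=name,age --noHeaderLine --out=out.csv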
For Detail: https://docs.mongodb.com/manual/reference/program/mongoexport/
Export into a file, then remove the field with a regular-expression find and replace. In my case the documents contained
"_id": "f5dc48e1-ed04-4ef9-943b-b1194a088b95"
and I used the pattern "_id": "(\w|-)*", to match it.
With jq this can be achieved easily:
mongoexport -d database -c collection --jsonArray | jq 'del(.[]._id)'
Have you tried specifying your fields with the --fields flag? All fields that are not mentioned are excluded from the export.
For maintainability you can also write your fields into a separate file and use --fieldFile.
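A quick sketch of that (field and file names are placeholders; --fieldFile expects one field per line):
printf "name\nage\n" > fields.txt
mongoexport --db mydb --collection mycoll --fieldFile fields.txt --out out.json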