I have a json file which looks like this:
{"_id":"id1","cat":["C","D","R","P"],"desc":"some desc","name":"some_name","category":"categ","languages":["en"],"type":"swf","start_date":1.39958850681748E9},
{"_id":"toy-driver","cat":["C","D","R","P"],"desc":"some desc","name":"name2","category":"Rac","languages":["en"],"type":"swf","start_date":1.399588506820609E9},
............
It consists of about 900 rows. When I import only 2 or 3 of these rows using the command:
mongoimport -d test -c test --jsonArray file.json
it imports perfectly without any errors, whereas importing the entire file produces an error:
imported 0 objects
encountered 1 error
If I'm not mistaken, there are 2 problems here.
--jsonArray expects a single JSON document in which all of your 900 records are elements of one array.
You're supplying 900 standalone JSON rows in the file, which doesn't match the --jsonArray format (also, drop the comma at the end of each line; mongoimport expects newline-delimited documents).
If you want to import data from a file, you have to pass it with the --file option, e.g. --file file.json.
I believe that if you remove the --jsonArray argument and add the --file option, it will work fine, e.g.:
mongoimport -d test -c test --file file.json
Where file.json is similar to this:
{"_id":"id1","cat":["C","D","R"], ... "type":"swf","start_date":1.39958850681748E9}
{"_id":"id2","cat":["C","D","R"], ... "type":"swf","start_date":1.39958850681748E9}
{"_id":"id3","cat":["C","D","R"], ... "type":"swf","start_date":1.39958850681748E9}
Hope this helped!
I am using json2csv to convert multiple json files structured like
{
"address": "0xe9f6191596bca549e20431978ee09d3f8db959a9",
"copyright": "None",
"created_at": "None"
...
}
The problem is that I need to put multiple json files into one csv file.
In my code I iterate through a hash file, call curl with each hash, and write the response to a JSON file. Then I use json2csv to convert each JSON file to CSV.
mkdir -p curl_outs
{ cat hashes.hash; echo; } | while read h; do
echo "Downloading $h"
curl -L https://main.net955305.contentfabric.io/s/main/q/$h/meta/public/nft > curl_outs/$h.json;
node index.js $h;
json2csv -i curl_outs/$h.json -o main.csv;
done
I use -o to output the JSON as CSV; however, it just overwrites the previous data, so I end up with only one row.
I have used >>, and this does append to the csv file.
json2csv -i "curl_outs/${h}.json" >> main.csv
But for some reason it appends the data's keys to the end of the csv file
I've also tried
cat csv_outs/*.csv > main.csv
However I get the same output.
How do I append multiple json files to one main csv file?
It's not entirely clear from the image and your description what's wrong with >>, but it looks like maybe the CSV file doesn't have a trailing line break, so appending the next file (>>) starts writing directly at the end of the last row and column (cell) of the previous file's data.
I deal with CSVs almost daily and love the GoCSV tool. Its stack subcommand will do just what the name implies: stack multiple CSVs, one on top of the other.
In your case, you could download each JSON and convert it to an individual (intermediate) CSV. Then, at the end, stack all the intermediate CSVs, then delete all the intermediate CSVs.
mkdir -p curl_outs
{ cat hashes.hash; echo; } | while read h; do
echo "Downloading $h"
curl -L https://main.net955305.contentfabric.io/s/main/q/$h/meta/public/nft > curl_outs/$h.json;
node index.js $h;
json2csv -i curl_outs/$h.json -o curl_outs/$h.csv;
done
gocsv stack curl_outs/*.csv > main.csv;
# I suggested deleting the intermediate CSVs
# rm curl_outs/*.csv
# ...
I changed the last line of your loop to json2csv -i curl_outs/$h.json -o curl_outs/$h.csv; to create those intermediate CSVs I mentioned before. Now, gocsv's stack subcommand can take a list of those intermediate CSVs and give you main.csv.
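If you'd rather avoid installing another tool, a minimal Python sketch that does the same stacking (assuming every intermediate CSV in curl_outs/ has the same header row) might look like this:

import csv
import glob

# stack curl_outs/*.csv into main.csv, writing the header only once
with open("main.csv", "w", newline="", encoding="utf8") as out:
    writer = csv.writer(out)
    header_written = False
    for path in sorted(glob.glob("curl_outs/*.csv")):
        with open(path, newline="", encoding="utf8") as f:
            reader = csv.reader(f)
            header = next(reader, None)   # first row of each file is its header
            if header is None:            # skip empty files
                continue
            if not header_written:
                writer.writerow(header)
                header_written = True
            writer.writerows(reader)      # append only the data rows

Because the csv writer terminates every row it writes, this also sidesteps the missing trailing line break that was making >> glue rows together.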
I am exporting a mongo collection to JSON format and then loading that data into a BigQuery table using the bq load command.
mongoexport --uri mongo_uri --collection coll_1 --type json --fields id,createdAt,updatedAt --out data1.csv
The JSON rows look like this:
{"_id":{"$oid":"6234234345345234234sdfsf"},"id":1,"createdAt":"2021-05-11 04:15:15","updatedAt":null}
but when I run the bq load command in BigQuery it gives the error below:
Invalid field name "$oid". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 300 characters long.
I think if the mongoexport JSON contained {"_id": ObjectId(6234234345345234234sdfsf)}, my issue would be solved.
Is there any way to export JSON like this?
Or any other way to achieve this?
Note: I can't use CSV format because the mongo documents contain commas.
By default, _id holds an ObjectId value, so it would be better if the data were stored in the {"_id": ObjectId(6234234345345234234sdfsf)} form instead of "_id":{"$oid":"6234234345345234234sdfsf"}.
As you mentioned, if the JSON contained {"_id": ObjectId(6234234345345234234sdfsf)}, your problem would be solved.
Replace $oid with oid. I'm using Python, so the code below worked:
import fileinput

with fileinput.FileInput("mongoexport_json.txt", inplace=True, encoding="utf8") as file:
    for line in file:
        print(line.replace('"$oid":', '"oid":'), end='')
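Another option, sketched below under the assumption that the export is newline-delimited JSON (one document per line) and with placeholder file names, is to flatten the {"$oid": ...} wrapper into a plain string so BigQuery just sees a string _id field:

import json

# rewrite {"_id": {"$oid": "..."}} as {"_id": "..."} before running bq load
with open("data1.json", encoding="utf8") as src, \
     open("data1_bq.json", "w", encoding="utf8") as dst:
    for line in src:
        if not line.strip():
            continue
        doc = json.loads(line)
        if isinstance(doc.get("_id"), dict) and "$oid" in doc["_id"]:
            doc["_id"] = doc["_id"]["$oid"]
        dst.write(json.dumps(doc) + "\n")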
Given the following document in a json file:
{
"session_date" : "03/03/2017",
"data" : [
{"user": "jack", "views": 10}
]
}
The JSON is valid if I copy it to the insert window of Robomongo, and results in inserting one document which contains a list of documents (a list of 1 document in this simple example).
Nevertheless, I am unable to do this with mongoimport:
> mongoimport --db mydb --jsonArray --collection mycollection --file data\test.json
> connected to: localhost
> Failed: error reading separator after document #1: bad JSON array format - found no opening bracket '[' in input source
> imported 0 documents
Since it is a document and not an array of documents, I cannot use --jsonArray option.
Any help importing this?
mongoimport --db mydb --collection mycollection --file data\test.json
should work for you if that's the simple import file you're planning to import anyway.
Just to add info: this will create a document with the above JSON data in the mycollection collection under mydb.
I am using mongoexport to export some data into a .json formatted file; however, the export has a large size overhead introduced by the _id:IDVALUE pairs.
I found a similar post, Is there a way to retrieve data from MongoDB without the _id field?, on how to omit the _id field when retrieving data from mongo, but not when exporting. It suggests using .Exclude("_id"). I tried to rewrite the --query parameter of mongoexport to somehow include the .Exclude("_id") call, but all of my attempts have failed so far.
Please suggest what is the proper way of doing this, or should I revert to using some post-export techniques?
Thanks
There appears to be no way to exclude a field (such as _id) using mongoexport.
Here's an alternative that has worked for me on moderate sized databases:
mongo myserver/mydb --quiet --eval "db.mycoll.find({}, {_id:0}).forEach(printjson);" > out.txt
On a large database (many millions of records) it can take a while, and running it will affect other operations people are trying to do on the system.
This works:
mongoexport --db db_name --collection collection_name | sed '/"_id":/s/"_id":[^,]*,//' > file_name.json
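The sed pattern assumes the _id value itself never contains a comma. If you'd rather parse the JSON instead of pattern-matching it, a small Python sketch over the newline-delimited export (placeholder file names) does the same job:

import json

# remove the _id field from every document in a newline-delimited export
with open("export.json", encoding="utf8") as src, \
     open("export_noid.json", "w", encoding="utf8") as dst:
    for line in src:
        if not line.strip():
            continue
        doc = json.loads(line)
        doc.pop("_id", None)        # drop _id if present
        dst.write(json.dumps(doc) + "\n")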
Pipe the output of mongoexport into jq and remove the _id field there.
mongoexport --uri=mongodb://localhost/mydb --collection=my_collection \
| jq 'del(._id)'
I know you specified you wanted to export JSON, but if you can substitute CSV, the native mongoexport will work and will be a lot faster than the above solutions:
mongoexport --db <dbName> --collection <collectionName> --csv --fields "<fieldOne>,<fieldTwo>,<fieldThree>" > mongoex.csv
mongoexport doesn't seem to have such an option.
With ramda-cli stripping the _id would look like:
mongoexport --db mydb --collection mycoll -f name,age | ramda 'omit ["_id"]'
I applied quux00's solution, but forEach(printjson) prints MongoDB Extended JSON notation in the output (for instance "last_update" : NumberLong("1384715001000")).
It will be better to use the following line instead:
db.mycoll.find({}, {_id:0}).forEach(function (doc) {
    print(JSON.stringify(doc));
});
mongo <server>/<database> --quiet --eval "db.<collection>.find({}, {_id:0,<field>:1}).forEach(printjson);" > out.txt
If you have a query to execute, change the outer "" to '' and write your condition inside find() with "", like find({"age": 13}).
The simplest way to exclude the sub-document information such as the "_id" is to export it as a csv, then use a tool to convert the csv into json.
mongoexport cannot omit "_id".
sed is a powerful command to do it:
mongoexport --db mydb --collection mycoll -f name,age | sed '/"_id":/s/"_id":[^,]*,//'
The original answer is from Exclude _id field using MongoExport command
Just use --type=csv option in mongoexport command.
mongoexport --db=<db_name> --collection=<collection_name> --type=csv --fields=<fields> --out=<Outfilename>.csv
For MongoDb version 3.4, you can use --noHeaderLine option in mongoexport command to exclude the field header in csv export too.
For Detail: https://docs.mongodb.com/manual/reference/program/mongoexport/
Export into a file and then remove the field with a regular-expression replace. In my case the value looked like
"_id": "f5dc48e1-ed04-4ef9-943b-b1194a088b95"
and I used the regular expression "_id": "(\w|-)*", to match and strip it.
With jq this can be achieved easily:
mongoexport -d database -c collection --jsonArray | jq 'del(.[]._id)'
Have you tried specifying your fields with the --fields flag? All fields that are not mentioned are excluded from the export.
For maintainability you can also write your fields into a separate file (one field name per line) and use --fieldFile.
I've been trying to use mongo with some imported data, but I'm not able to query it properly given my document structure.
This is an example of the .json I import using mongoimport: https://gist.github.com/2917854
mongoimport -d test -c example data.json
I noticed that my whole document is imported as a single object, instead of creating one object for each shop.
That's why, when I try to find a shop or anything else I want to query, the whole document is returned.
db.example.find({"shops.name":"x"})
I want to be able to query the db to obtain products by id using dot notation, something similar to:
db.example.find({"shops.name":"x","categories.type":"shirts","clothes.id":"1"})
The problem is that the whole document is imported as a single object. The question is: how do I need to import the object to obtain my desired result?
Docs note that:
This utility takes a single file that contains 1 JSON/CSV/TSV string per line and inserts it.
In the structure you are using (assuming the errors in the gist are fixed) you are essentially importing one document with only a shops field.
After breaking the data into separate shop docs (a small sketch for this step follows the query example below), import using something like (shops being the collection name, which makes more sense than example):
mongoimport -d test -c shops data.json
and then you can query like:
db.shops.find({"name":"x","categories.type":"shirts"})
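A minimal sketch of that "breaking the data into separate shop docs" step, assuming the structure from the gist (a single wrapping document with a top-level shops array) and placeholder file names:

import json

# read the single wrapping document and write one shop per line
with open("data.json", encoding="utf8") as src:
    doc = json.load(src)

with open("shops_ndjson.json", "w", encoding="utf8") as dst:
    for shop in doc["shops"]:
        dst.write(json.dumps(shop) + "\n")

shops_ndjson.json can then be imported with the mongoimport command above.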
There is a parameter --jsonArray:
Accept import of data expressed with multiple MongoDB documents within a single JSON array
Using this option you can feed it an array, so you only need to strip the outer object syntax, i.e. everything from the beginning up to and including "shops" :, and the } at the end.
I myself use a little tool called jq that can extract the array from the command line:
./jq '.shops' shops.json
IMPORT FROM JSON
mongoimport --db "databaseName" --collection "collectionName" --type json --file "fileName.json" --jsonArray
The JSON should be in this format (an array of objects):
[
    { "name": "Name1", "msg": "This is msg 1" },
    { "name": "Name2", "msg": "This is msg 2" },
    { "name": "Name3", "msg": "This is msg 3" }
]
IMPORT FROM CSV
mongoimport --db "databaseName" --collection "collectionName" --type csv --file "fileName.csv" --headerline
More Info
https://docs.mongodb.com/getting-started/shell/import-data/
Importing JSON
The command mongoimport allows us to import human-readable JSON into a specific database and collection. To import JSON data into a specific database and collection, type:
mongoimport -d databaseName -c collectionName jsonFileName.json