Add a JSON document with multiple objects to mLab (MongoDB on Heroku)

I am trying to import a JSON file into my MongoDB (mLab on Heroku) by using the 'Add document' button. When I insert only one object, everything works as expected. However, if I try to add multiple objects in the same JSON, the site returns to the homepage without any result. The JSON looks like this:
[
{"flightNo":"t010118CND11111112","STD": {"$date": "2018-01-01T06:00:00.000Z"}},
{"flightNo":"t010118CND11121112","STD": {"$date": "2018-01-01T14:00:00.000Z"}}
]
Isn't it possible to import a large file containing multiple objects? If not, is there any other easy way to achieve this?

You can use mongoimport to import JSON:
mongoimport -h ds123.mlab.com:123 -d mydb -c mycoll -u myuser -p "my password" --file "C:\Users\me\file.json" --jsonArray
The JSON files that MongoDB works with are usually formatted like this:
{"flightNo":"t010118CND11111112","STD": {"$date": "2018-01-01T06:00:00.000Z"}}
{"flightNo":"t010118CND11121112","STD": {"$date": "2018-01-01T14:00:00.000Z"}}
Note the lack of the surrounding [] and the commas between documents; this newline-delimited format is what mongoimport expects by default. The --jsonArray parameter allows you to use an ordinary JSON array instead:
[
{"flightNo":"t010118CND11111112","STD": {"$date": "2018-01-01T06:00:00.000Z"}},
{"flightNo":"t010118CND11121112","STD": {"$date": "2018-01-01T14:00:00.000Z"}}
]
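If you prefer to stick with the newline-delimited format (for example to add documents one at a time in the mLab UI), you can flatten the array yourself before importing. A minimal Python sketch, assuming the input is a JSON array like the one above (file names are placeholders):
import json

# read a JSON array and write one document per line (newline-delimited JSON),
# which mongoimport accepts without the --jsonArray flag
with open("file.json", "r", encoding="utf-8") as src:
    docs = json.load(src)  # expects a top-level JSON array

with open("file.ndjson", "w", encoding="utf-8") as dst:
    for doc in docs:
        dst.write(json.dumps(doc) + "\n")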

How to write a correct mongodb query for mongodump?

I'm trying to back up 3 articles from my database. I have their IDs, but when I try to use mongodump I just can't seem to write the proper JSON query. I get either a JSON error message or a cryptic 'cannot decode objectID into a slice' message.
Here's the command that I'm trying to run at the moment:
mongodump -d 'data' -c 'articles' -q '{"$oid": "5fa0bd32f7d5870029c7d421" }'
This one returns the 'cannot decode objectID into a slice' error, which I don't really understand. I also tried with ObjectId, like this:
mongodump -d 'data' -c 'articles' -q '{"_id": ObjectId("5fa0bd32f7d5870029c7d421") }'
But this one gives me an invalid JSON error.
I've tried all forms of escaping (escaping the double quotes, escaping the dollar sign), but nothing seems to work. I'm desperate, and I hate MongoDB. The closest I've been able to get to a working solution was this:
mongodump -d 'nikkei' -c 'articles' -q '{"_id": "ObjectId(5fa0bd32f7d5870029c7d421)" }'
And I say closest because this one didn't fail: the command ran, but it returned done dumping data.articles (0 documents), which means, if I understood correctly, that no articles were saved.
What would be the correct format for the query? I'm using mongodump version r4.2.2 by the way.
I have a collection with these 4 documents:
> db.test.find()
{ "_id" : ObjectId("5fab80615397db06f00503c3") }
{ "_id" : ObjectId("5fab80635397db06f00503c4") }
{ "_id" : ObjectId("5fab80645397db06f00503c5") }
{ "_id" : ObjectId("5fab80645397db06f00503c6") }
I make the binary export using mongodump. This uses MongoDB v4.2 on Windows:
>> mongodump --db=test --collection=test --query="{ \"_id\": { \"$eq\" : { \"$oid\": \"5fab80615397db06f00503c3\" } } }"
2020-11-11T11:42:13.705+0530 writing test.test to dump\test\test.bson
2020-11-11T11:42:13.737+0530 done dumping test.test (1 document)
Here's an answer for those using Python:
Note: you must have mongo database tools installed on your system
import json
import os

# insert your query here; to match a document by its ObjectId,
# use MongoDB extended JSON ($oid)
query = {"_id": {"$oid": "5fa0bd32f7d5870029c7d421"}}
# cast the query to a string
query = json.dumps(query)
# run mongodump (the single quotes around the query work in Unix-like shells)
command = f"mongodump --db my_database --collection my_collection --query '{query}'"
os.system(command)
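Since the original question was about backing up three articles by ID, the same approach extends to multiple IDs with an $in filter. A minimal sketch along those lines, using subprocess to avoid shell-quoting issues (the database, collection, and IDs are placeholders):
import json
import subprocess

# extended JSON filter that matches several ObjectIds at once
ids = ["5fa0bd32f7d5870029c7d421", "5fab80615397db06f00503c3"]
query = json.dumps({"_id": {"$in": [{"$oid": i} for i in ids]}})

# passing the arguments as a list sidesteps shell quoting entirely
subprocess.run(["mongodump", "--db", "my_database",
                "--collection", "my_collection", "--query", query], check=True)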
If you are passing the query as JSON, try this format:
mongodump -d=nikkei -c=articles -q='{"_id": {"$oid": "5fa0bd32f7d5870029c7d421"}}'
Is there nothing else you could query on though, like a title? It might make things a little simpler.
I pulled this from the MongoDB docs. It was pretty far down the page, but here is the link:
https://docs.mongodb.com/database-tools/mongodump/#usage-in-backup-strategy

Using mongoimport with a single document containing a list of documents

Given the following document in a json file:
{
"session_date" : "03/03/2017",
"data" : [
{"user": "jack", "views": 10}
]
}
The JSON is valid if I copy it to the insert window of Robomongo, and results in inserting one document which contains a list of documents (a list of 1 document in this simple example).
Nevertheless, I am unable to do this with mongoimport:
> mongoimport --db mydb --jsonArray --collection mycollection --file data\test.json
> connected to: localhost
> Failed: error reading separator after document #1: bad JSON array format - found no opening bracket '[' in input source
> imported 0 documents
Since it is a document and not an array of documents, I cannot use the --jsonArray option.
Any help importing this?
mongoimport --db mydb --collection mycollection --file data\test.json
should work for you if that's the simple import file you're planning to import anyway.
Just to add some info: this will create a single document containing the above JSON data in mycollection under mydb.
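If you are importing from a script anyway, another option is to skip mongoimport and insert the document directly with PyMongo. A minimal sketch, assuming a local mongod and the file shown above (the database, collection, and file names are placeholders):
import json
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

with open("data/test.json", "r", encoding="utf-8") as f:
    doc = json.load(f)  # a single document, not an array

# insert_one stores the whole document, including the nested "data" list
client["mydb"]["mycollection"].insert_one(doc)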

Is there any way in Elasticsearch to get results as CSV file in curl API?

I am using Elasticsearch.
I need the results from Elasticsearch as a CSV file.
Is there any curl URL or any plugin to achieve this?
I've done just this using cURL and jq ("like sed, but for JSON"). For example, you can do the following to get CSV output for the top 20 values of a given facet:
$ curl -X GET 'http://localhost:9200/myindex/item/_search?from=0&size=0' -d '
{"from": 0,
"size": 0,
"facets": {
"sourceResource.subject.name": {
"global": true,
"terms": {
"order": "count",
"size": 20,
"all_terms": true,
"field": "sourceResource.subject.name.not_analyzed"
}
}
},
"sort": [
{
"_score": "desc"
}
],
"query": {
"filtered": {
"query": {
"match_all": {}
}
}
}
}' | jq -r '.facets["sourceResource.subject.name"].terms[] | [.term, .count] | @csv'
"United States",33755
"Charities--Massachusetts",8304
"Almshouses--Massachusetts--Tewksbury",8304
"Shields",4232
"Coat of arms",4214
"Springfield College",3422
"Men",3136
"Trees",3086
"Session Laws--Massachusetts",2668
"Baseball players",2543
"Animals",2527
"Books",2119
"Women",2004
"Landscape",1940
"Floral",1821
"Architecture, Domestic--Lowell (Mass)--History",1785
"Parks",1745
"Buildings",1730
"Houses",1611
"Snow",1579
I've used Python successfully, and the scripting approach is intuitive and concise. The ES client for python makes life easy. First grab the latest Elasticsearch client for Python here:
http://www.elasticsearch.org/blog/unleash-the-clients-ruby-python-php-perl/#python
Then your Python script can include calls like:
import elasticsearch
import unicodedata
import csv
es = elasticsearch.Elasticsearch(["10.1.1.1:9200"])
# this returns up to 500 rows, adjust to your needs
res = es.search(index="YourIndexName", body={"query": {"match": {"title": "elasticsearch"}}}, size=500)
sample = res['hits']['hits']
# then open a csv file, and loop through the results, writing to the csv
with open('outputfile.tsv', 'w', newline='') as csvfile:
    # we use TAB delimited, to handle cases where freeform text may have a comma
    filewriter = csv.writer(csvfile, delimiter='\t', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    # create column header row
    filewriter.writerow(["column1", "column2", "column3"])  # change the column labels here
    for hit in sample:
        # fill columns 1, 2, 3 with your data
        col1 = hit["_source"]["some"]["deeply"]["nested"]["field"]  # replace these nested key names with your own
        col1 = col1.replace('\n', ' ')
        # col2 = ..., col3 = ..., etc.
        filewriter.writerow([col1, col2, col3])
You may want to wrap the nested key lookups in try/except error handling, since documents are unstructured and may not have the field from time to time (it depends on your index).
I have a complete Python sample script using the latest ES python client available here:
https://github.com/jeffsteinmetz/pyes2csv
You can use the elasticsearch-head plugin.
Once you have the plugin installed, it is available at http://localhost:9200/_plugin/head/.
Navigate to the structured query tab, provide the query details, and select 'csv' from the 'Output Results' dropdown.
I don't think there is a plugin that will give you CSV results directly from the search engine, so you will have to query ElasticSearch to retrieve results and then write them to a CSV file.
Command line
If you're on a Unix-like OS, then you might be able to make some headway with es2unix which will give you search results back in raw text format on the command line and so should be scriptable.
You could then dump those results to a text file or pipe them to awk or similar to format as CSV. There is a -o flag available, but it only gives 'raw' format at the moment.
Java
I found an example using Java - but haven't tested it.
Python
You could query ElasticSearch with something like pyes and write the results set to a file with the standard csv writer library.
Perl
Using Perl, you could use Clinton Gormley's gist linked by Rakesh: https://gist.github.com/clintongormley/2049562
Shameless plug: I wrote estab, a command-line program to export Elasticsearch documents to tab-separated values.
Example:
$ export MYINDEX=localhost:9200/test/default/
$ curl -XPOST $MYINDEX -d '{"name": "Tim", "color": {"fav": "red"}}'
$ curl -XPOST $MYINDEX -d '{"name": "Alice", "color": {"fav": "yellow"}}'
$ curl -XPOST $MYINDEX -d '{"name": "Brian", "color": {"fav": "green"}}'
$ estab -indices "test" -f "name color.fav"
Brian green
Tim red
Alice yellow
estab can handle exports from multiple indices, custom queries, missing values, lists of values, and nested fields, and it's reasonably fast.
If you are using Kibana (app/discover in general), you can build your query in the UI, then save it and use Share -> CSV Reports. This creates a CSV with one line per record and comma-separated columns.
I have been using stash-query (https://github.com/robbydyer/stash-query) for this.
I find it quite convenient and it works well, though I struggle with the install every time I redo it (this is due to me not being very fluent with gems and Ruby).
On Ubuntu 16.04 though, what seemed to work was:
apt install ruby
sudo apt-get install libcurl3 libcurl3-gnutls libcurl4-openssl-dev
gem install stash-query
and then you should be good to go
Installs Ruby
Installs the curl dependencies for Ruby, because stash-query works via the Elasticsearch REST API
Installs stash-query
This blog post describes how to build it as well:
https://robbydyer.wordpress.com/2014/08/25/exporting-from-kibana/
You can use elasticsearch2csv, a small and effective Python 3 script that uses the Elasticsearch scroll API and handles big query responses.
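If you'd rather not pull in another tool, the same scroll-based approach is short with the official elasticsearch Python client. A minimal sketch, assuming a local cluster and placeholder index and field names:
import csv
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://localhost:9200"])

# helpers.scan wraps the scroll API and streams every matching hit,
# so large result sets don't have to fit into a single response
hits = helpers.scan(es, index="myindex", query={"query": {"match_all": {}}})

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["field1", "field2"])  # placeholder column names
    for hit in hits:
        src = hit["_source"]
        writer.writerow([src.get("field1"), src.get("field2")])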
You can use the gist. It's simple, it's in Perl, and you can get some help from it. Please download it and see the usage on GitHub; here is the link: https://gist.github.com/clintongormley/2049562
Or, if you want it in Java, go for elasticsearch-river-csv.

mongoexport without _id field

I am using mongoexport to export some data into a .json formatted file; however, the documents have a large size overhead introduced by the _id:IDVALUE tuples.
I found a similar post, Is there a way to retrieve data from MongoDB without the _id field?, on how to omit the _id field when retrieving data from Mongo, but not when exporting. It suggests using .Exclude("_id"). I tried to rewrite the --query parameter of mongoexport to somehow include the .Exclude("_id") parameter, but all of my attempts have failed so far.
Please suggest the proper way of doing this, or should I resort to some post-export technique?
Thanks
There appears to be no way to exclude a field (such as _id) using mongoexport.
Here's an alternative that has worked for me on moderate sized databases:
mongo myserver/mydb --quiet --eval "db.mycoll.find({}, {_id:0}).forEach(printjson);" > out.txt
On a large database (many millions of records) it can take a while, and running it will affect other operations people try to do on the system.
This works:
mongoexport --db db_name --collection collection_name | sed '/"_id":/s/"_id":[^,]*,//' > file_name.json
Pipe the output of mongoexport into jq and remove the _id field there.
mongoexport --uri=mongodb://localhost/mydb --collection=my_collection \
| jq 'del(._id)'
I know you specified that you wanted to export JSON, but if you can substitute CSV data, the native mongoexport CSV mode will work and will be a lot faster than the above solutions:
mongoexport --db <dbName> --collection <collectionName> --csv --fields "<fieldOne>,<fieldTwo>,<fieldThree>" > mongoex.csv
mongoexport doesn't seem to have such an option.
With ramda-cli, stripping the _id would look like this:
mongoexport --db mydb --collection mycoll -f name,age | ramda 'omit ["_id"]'
I applied quux00's solution, but forEach(printjson) prints MongoDB Extended JSON notation in the output (for instance "last_update" : NumberLong("1384715001000")).
It is better to use the following instead:
db.mycoll.find({}, {_id:0}).forEach(function (doc) {
print( JSON.stringify(doc) );
});
mongo <server>/<database> --quiet --eval "db.<collection>.find({}, {_id:0,<field>:1}).forEach(printjson);" > out.txt
If you have some query to execute, change the outer "" to '' and write your condition inside find() with "", like find({"age": 13}).
One simple way to exclude sub-document information such as "_id" is to export the data as CSV, then use a tool to convert the CSV into JSON.
mongoexport cannot omit "_id", but sed is a powerful command to do it:
mongoexport --db mydb --collection mycoll -f name,age | sed '/"_id":/s/"_id":[^,]*,//'
The original answer is from Exclude _id field using MongoExport command
Just use the --type=csv option in the mongoexport command:
mongoexport --db=<db_name> --collection=<collection_name> --type=csv --fields=<fields> --out=<Outfilename>.csv
For MongoDB version 3.4 and later, you can use the --noHeaderLine option in the mongoexport command to exclude the field header from the CSV export too.
For details: https://docs.mongodb.com/manual/reference/program/mongoexport/
Export into a file, then remove the _id values with a regular-expression find and replace. In my case the exported documents contained entries like
"_id": "f5dc48e1-ed04-4ef9-943b-b1194a088b95"
and I matched them with the regex "_id": "(\w|-)*",
With jq this can be achieved easily:
mongoexport -d database -c collection --jsonArray | jq 'del(.[]._id)'
Have you tried specifying your fields with the --fields flag? All fields that are not mentioned are excluded from the export.
For maintainability you can also write your fields into a separate file and use --fieldFile.
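If a post-export step is acceptable, stripping _id from the newline-delimited JSON that mongoexport emits is also easy to script. A minimal Python sketch (file names are placeholders):
import json

# mongoexport writes one JSON document per line by default;
# drop the _id key from each document and write the rest back out
with open("export.json", "r", encoding="utf-8") as src, \
     open("export_noid.json", "w", encoding="utf-8") as dst:
    for line in src:
        doc = json.loads(line)
        doc.pop("_id", None)
        dst.write(json.dumps(doc) + "\n")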

Proper way to import json file to mongo

I've been trying to use Mongo with some imported data, but I'm not able to query it properly given my document structure.
This is an example of the .json I import using mongoimport: https://gist.github.com/2917854
mongoimport -d test -c example data.json
I noticed that my whole file is imported as a single object instead of creating one object for each shop.
That's why, when I try to find a shop or anything else I want to query, the whole document is returned.
db.example.find({"shops.name":"x"})
I want to be able to query the db to obtain products by the id using dot notation something similar to:
db.example.find({"shops.name":"x","categories.type":"shirts","clothes.id":"1"}
The problem is that the whole document is imported as a single object. The question is: how do I need to import the data to obtain my desired result?
Docs note that:
This utility takes a single file that contains 1 JSON/CSV/TSV string per line and inserts it.
In the structure you are using (assuming the errors in the gist are fixed), you are essentially importing one document with only a shops field.
After breaking the data into separate shop docs, import using something like the following (shops being the collection name, which makes more sense than using example):
mongoimport -d test -c shops data.json
and then you can query like:
db.shops.find({"name":x,"categories.type":"shirts"})
There is a parameter --jsonArray:
Accept import of data expressed with multiple MongoDB documents within a single JSON array
Using this option you can feed it an array, so you only need to strip the outer object syntax, i.e. everything at the beginning up to and including "shops" :, and the } at the end.
I myself use a little tool called jq that can extract the array from the command line:
./jq '.shops' shops.json
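If you'd rather do the same extraction in a script, here is a minimal Python sketch along the same lines (file names are placeholders); it pulls the nested shops array out of the wrapper document and writes one shop per line, ready for mongoimport without --jsonArray:
import json

with open("shops.json", "r", encoding="utf-8") as src:
    wrapper = json.load(src)  # the single wrapper document from the gist

with open("shops_flat.json", "w", encoding="utf-8") as dst:
    for shop in wrapper["shops"]:
        dst.write(json.dumps(shop) + "\n")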
IMPORT FROM JSON
mongoimport --db "databaseName" --collection "collectionName" --type json --file "fileName.json" --jsonArray
The JSON should be in this format (an array of objects):
[
{ name: "Name1", msg: "This is msg 1" },
{ name: "Name2", msg: "This is msg 2" },
{ name: "Name3", msg: "This is msg 3" }
]
IMPORT FROM CSV
mongoimport --db "databaseName" --collection "collectionName" --type csv --file "fileName.csv" --headerline
More Info
https://docs.mongodb.com/getting-started/shell/import-data/
Importing JSON
The mongoimport command allows us to import human-readable JSON into a specific database and collection. To import JSON data into a specific database and collection, type: mongoimport -d databaseName -c collectionName jsonFileName.json
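A recurring source of confusion across these imports is whether a file is a single JSON array (which needs --jsonArray) or newline-delimited documents (mongoimport's default). A minimal Python sketch of a quick check, with a placeholder file name:
def needs_json_array(path):
    # guess whether mongoimport needs --jsonArray for this file:
    # a leading '[' means a JSON array; otherwise assume one document per line
    with open(path, "r", encoding="utf-8") as f:
        first = f.read(1024).lstrip()
    return first.startswith("[")

print(needs_json_array("data.json"))  # placeholder file name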