I'm new to MongoDB on Windows 7, and I tried to import a JSON file and a CSV file into MongoDB.
1. First I tried importing the JSON file using the command
"C:\>mongodb\bin\mongoimport --host localhost:27017 --db mydb --collection docs"
and it showed this error
"exception:BSON representation of supplied JSON is too large: code FailedToParse: FailedToParse: Expecting '{': offset:0"
2. When I imported the CSV file I used the command
"C:\mongodb\bin>mongoimport --db mynewdb --collection message --type csv --fields
form,Iname,fistname,lastname --file d:\new folder\csv1.csv"
and I got this error message:
"ERROR: multiple occurrences
Import CSV, TSV or JSON data into MongoDB.
When importing JSON documents, each document must be a separate line of the input file"
I downloaded the bulk JSON and CSV files at random while browsing. I want to know whether unorganized data can be imported, or whether the data has to be organized first. If so, where can I get complete bulk JSON and CSV files that are ready to import?
Try adding the --jsonArray flag at the end of your command so it looks like
C:\>mongodb\bin\mongoimport --host localhost:27017 --db mydb --collection docs --jsonArray
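Whether --jsonArray is needed depends on how the file is laid out: a single top-level array, or one document per line. A rough way to check this before importing is sketched below in Python. The function name detect_json_layout is made up for illustration, and the check is deliberately strict (it only accepts objects, one per line), so treat it as a sanity check rather than a description of what mongoimport itself accepts.

```python
import json


def detect_json_layout(text):
    """Return 'array', 'lines', or 'invalid' for the given JSON text.

    'array' -> a single top-level JSON array (import with --jsonArray)
    'lines' -> one JSON object per non-empty line (import without the flag)
    """
    # Ignore a leading BOM and surrounding whitespace.
    stripped = text.lstrip("\ufeff").strip()
    if not stripped:
        return "invalid"

    if stripped.startswith("["):
        try:
            parsed = json.loads(stripped)
        except json.JSONDecodeError:
            return "invalid"
        return "array" if isinstance(parsed, list) else "invalid"

    # Otherwise expect exactly one JSON object on each non-empty line.
    for line in stripped.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            return "invalid"
        if not isinstance(doc, dict):
            return "invalid"
    return "lines"
```

For a real file you would read it first, for example with open(path, encoding="utf-8").read(), and pass the text in.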
I am trying to load data into Redshift using a Firehose delivery stream.
I am using a jsonpaths file uploaded to S3 at the following location.
s3://my_bucket/jsonpaths.json
This file contains the following jsonpaths config
{
"jsonpaths": [
"$['col_1']",
"$['col_2']",
"$['col_3']",
"$['col_4']"
]
}
To me this config looks ok, but the Firehose Redshift logs keep showing the following error.
"The provided jsonpaths file is not in a supported JSON format."
A similar error is seen even if I run the following copy command directly on the Redshift cluster.
reshift_db=# COPY my_schema.my_table
FROM 's3://my_bucket/data.json'
FORMAT JSON 's3://my_bucket/jsonpaths.json'
CREDENTIALS 'aws_iam_role=<role_arn>'
;
ERROR: Manifest file is not in correct json format
DETAIL:
-----------------------------------------------
error: Manifest file is not in correct json format
code: 8001
context: Manifest file location = s3://my_bucket/jsonpaths.json
query: yyyyy
location: s3_utility.cpp:338
process: padbmaster [pid=xxxxx]
-----------------------------------------------
Can someone help with what is going wrong here?
The problem in my case was a BOM (Byte Order Mark) at the beginning of the jsonpaths file. Some editors save files with a BOM, which is not displayed as a visible character in the editor. Apparently Redshift does not accept a BOM at the beginning of the jsonpaths file.
For those of you who want to check if this is the case for your jsonpaths file, you can open the file in a hex editor. For the S3 file this can be done as follows.
# aws s3 cp s3://my_bucket/jsonpaths.json - | hexdump -C
If the output starts with the bytes ef bb bf, the file has a UTF-8 BOM. To remove the BOM from the file you can do the following.
# aws s3 cp s3://my_bucket/jsonpaths.json - | dos2unix | aws s3 cp - s3://my_bucket/jsonpaths.json
After almost 2 days of trying, raising an AWS Support ticket, and posting this question, it struck me that I should check the file in a hex editor.
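If a hex editor or dos2unix is not at hand, the same check can be done in a few lines of Python. This is only a sketch working on raw bytes: the helper names has_utf8_bom and strip_utf8_bom are made up for illustration, and it only handles the UTF-8 BOM, not the UTF-16 or UTF-32 variants.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

To use it on a local copy of the jsonpaths file, read the file in binary mode (open(path, "rb")), pass the bytes through strip_utf8_bom, write the result back in binary mode, and upload it to S3 again.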
When I run the following command:
mongoimport -v -d ntsb -c data xml_results.json --jsonArray
I get this error:
2020-07-15T22:51:41.267-0400 using write concern: &{majority false 0}
2020-07-15T22:51:41.270-0400 filesize: 68564556 bytes
2020-07-15T22:51:41.270-0400 using fields:
2020-07-15T22:51:41.270-0400 connected to: mongodb://localhost/
2020-07-15T22:51:41.270-0400 ns: ntsb.data
2020-07-15T22:51:41.271-0400 connected to node type: standalone
2020-07-15T22:51:41.271-0400 Failed: error processing document #1: invalid character '}' looking for beginning of object key string
2020-07-15T22:51:41.271-0400 0 document(s) imported successfully. 0 document(s) failed to import.
I have tried all the solutions in this file and nothing worked. My JSON file is about 60 MB in size, so it would be really hard to go through it and find the bracket issue. I suspect it might be a problem with the UTF-8 encoding. I took an XML file I downloaded from the internet and converted it into JSON with a Python script. When I try the --jsonArray flag, it gives the same error. Any ideas? Thanks!
It turns out that within this massive file there were a few unnecessary commas. I was able to use Python's built-in JSON parsing to jump to the lines with errors and remove the commas manually. As far as I can tell, the error was not caused by the } itself but by the comma before it, which made the parser expect another key before the closing brace.
After solving this, I was still unable to import successfully because now the file was too large. The workaround was to surround all the JSON objects with array brackets [] and use the following command:
mongoimport -v -d ntsb -c data xml_results.json --batchSize 1 --jsonArray
After a few seconds the data imported successfully into Mongo.
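For anyone who wants to locate errors the same way, here is a minimal sketch of the Python approach described above. It relies on json.JSONDecodeError, which carries the line and column of the first syntax error. The function name find_json_error is made up for illustration, and the exact message and column depend on the Python version.

```python
import json


def find_json_error(text):
    """Return (line, column, message) for the first syntax error, or None.

    Only the first error is reported, so fix it and run the check again
    until the function returns None.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.lineno, exc.colno, exc.msg
    return None


# A trailing comma on line 2, similar to the problem in my file.
bad = '[\n  {"a": 1,},\n  {"b": 2}\n]'
print(find_json_error(bad))
```

For a large file you would pass in open(path, encoding="utf-8").read(); a 60 MB file fits in memory without trouble on most machines.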
I am trying to import a JSON file into MongoDB using mongoimport. It throws the following error: Failed: error processing document #1: read C:\Users\mbryant2\Documents\primer-dataset.json: The handle is invalid.
Here is my cmd:
$ mongoimport --db tempTestDb --collection restaurants --drop --file C:/Users/mbryant2/Documents/primer-dataset.json
and response:
2018-09-14T12:17:36.337-0600 connected to: localhost
2018-09-14T12:17:36.338-0600 dropping: tempTestDb.restaurants
2018-09-14T12:17:36.339-0600 Failed: error processing document #1: read C:\Users\mbryant2\Documents\primer-dataset.json: The handle is invalid.
2018-09-14T12:17:36.339-0600 imported 0 documents
Does anyone have any idea what I am missing? Does it need login credentials or something like that?
If the data is represented as a JSON array, rather than individual lines of JSON text, you will need to add the --jsonArray parameter to mongoimport.
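If you would rather keep the default import mode, another option is to convert the array into one document per line first. A minimal sketch in Python, assuming the whole file fits in memory; the function name array_to_json_lines is made up for illustration.

```python
import json


def array_to_json_lines(array_text):
    """Convert a top-level JSON array into one JSON document per line."""
    docs = json.loads(array_text)
    if not isinstance(docs, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps never emits raw newlines, so each document stays on one line.
    return "".join(json.dumps(doc) + "\n" for doc in docs)
```

The converted text can be written to a new file and imported without the --jsonArray parameter.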
The mongoimport command reports the correct number of documents and adds a new collection, but when I query my db there is nothing. I am using a JSON array to store my data, but I am not sure why this isn't working.
C:\Program Files\MongoDB\Server\3.2\bin>mongoimport --db playerList --collection data --jsonArray --file ../../../../../nodeProjects/public/data.json
2016-07-20T09:30:05.807-0700 connected to: localhost
2016-07-20T09:30:05.813-0700 imported 1 document
C:\Program Files\MongoDB\Server\3.2\bin>mongo
MongoDB shell version: 3.2.7
connecting to: test
> use playerList
switched to db playerList
> db.playerList.find().pretty()
> db.getCollectionNames()
[ "data" ]
and my data.json file is:
[{"name":"A.J. Green","team":"CIN","pos":"WR","weeklyPts":[{"week":1,"pts":6.3},{"week":2,"pts":10.5},{"week":3,"pts":34.7}]}]
Your collection is data, not playerList, as shown by the last line, i.e. db.getCollectionNames(). Change db.playerList.find().pretty() to db.data.find().pretty() and it will work.
The collection name in your find() is wrong: you are doing a find on the playerList collection, but you imported the data into a collection called "data". So try:
db.data.find().pretty()
I am trying to import a JSON document into MongoDB, but it shows me "unexpected identifier". My JSON document looks something like the following:
[
{
"Cancer Sites":"Female Breast",
"State":"Alabama",
"Year":2000,
"Sex":"Female",
"Count":550
},
{
"Cancer Sites":"Female Breast",
"State":"Alabama",
"Year":2000,
"Sex":"Female",
"Count":2340
},
{
"Cancer Sites":"Female Breast",
"State":"Alabama",
"Year":2000,
"Sex":"Female",
"Count":45
}
]
I tried the following command from my mongo shell, but it doesn't work:
mongoimport -d treatment -c stats --file news.json
I am executing it from the mongo shell in the Windows command prompt. My mongo shell is in the C:\mongodb\bin path and my file is in the same path. Can anyone tell me where I am going wrong?
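Since the file contains a JSON array, use the --jsonArray flag. Also note that mongoimport is a separate command-line program, so run it from the Windows command prompt rather than from inside the mongo shell, which tries to parse the command as JavaScript and reports "unexpected identifier":
mongoimport -d treatment -c stats --jsonArray --file news.json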