mongodb import csv to json with subobjects

I have file.csv with similar to this structure
loremipsum; machine, metal
As I understand it, a successful import would look like
{
text: "loremipsum", << string
tags: ["machine","metal"] << array with two elements
}
The best result I get is
{
text: "loremipsum", << string
tags: "machine, metal" << string
}
If I understand it correctly, please tell me how to do a successful import. Thanks.
Edit: the "tags" field will actually contain ~16 URLs, so please tell me how they should be stored correctly.

Ideally, the command below should be used to import a CSV file into MongoDB (maybe you are already using the same):
mongoimport --db users --type csv --headerline --file /filePath/fileName.csv
I think your problem is with the array-type data (if I understood correctly).
In that case, you first need to add one document to the collection and export it as a CSV file. That will give you the format the above command expects; then arrange your data to match the exported CSV file.
The linked answer explains this very well.
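For illustration, such an exported CSV typically uses indexed column headers for array fields. A sketch based on the data in the question (the tags.0/tags.1 headers and the collection name items are assumptions, not from the original post):
text,tags.0,tags.1
loremipsum,machine,metal
Importing this back with the --useArrayIndexFields flag (available in newer versions of mongoimport) recreates tags as a real array:
mongoimport --db users --collection items --type csv --headerline --useArrayIndexFields --file /filePath/file.csv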

I had this data in Excel.
I wanted it like this in MongoDB:
{
"name":"John",
"age":30,
"cars":[ "Ford", "BMW", "Fiat" ]
}
I replaced the headings with cars.0, cars.1, and cars.2, one column per array element.
I used the mongoimport tool and ran this command:
mongoimport.exe --uri "mongodb+srv://localhost/<MYDBName>" --username dbuser --collection <collectionName> --drop --type csv --useArrayIndexFields --headerline --file 1.csv
Here my CSV is 1.csv.
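For reference, based on the headings described above, the reconstructed 1.csv would look roughly like this:
name,age,cars.0,cars.1,cars.2
John,30,Ford,BMW,Fiat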


How to add commas in between JSON objects using Linux Shell and SnowSQL?

While there are several posts about this topic on Stack Overflow, none match my exact use case. I am using a Linux shell script to run SnowSQL to generate a json file.
My json file needs to have a comma between json objects.
This:
{
"CAMPAIGN": "Welcome_New",
"UUID": "fe881781-bdc2-41b2-95f2-e0e8c19dc597"
}
{
"CAMPAIGN": "Welcome_Existing",
"UUID": "77a41c02-beb9-48bf-ada4-b2074c1a78cb"
}
...needs to look like this:
{
"CAMPAIGN": "Welcome_New",
"UUID": "fe881781-bdc2-41b2-95f2-e0e8c19dc597"
},
{
"CAMPAIGN": "Welcome_Existing",
"UUID": "77a41c02-beb9-48bf-ada4-b2074c1a78cb"
}
Here is my complete ksh script:
#!/usr/bin/ksh
. /appl/.snf_logon
export SNOW_PKEY_FILE=$(mktemp ./pkey-XXXXXX)
trap "rm -f ${SNOW_PKEY_FILE}" EXIT
LibGetSnowCred
{
outFile=JSON_FILE_TYPE_TEST.json
inDir=/testing
# Snowflake stage path (stage references use the @ prefix)
outFileNm=@my_db.my_schema.my_file_stage/${outFile}
snowsql \
--private-key-path $SNOW_PKEY_FILE \
-o exit_on_error=true \
-o friendly=false \
-o timing=false \
-o log_level=ERROR \
-o echo=true <<!
COPY INTO ${outFileNm}
FROM (SELECT object_construct(
'UUID',UUID
,'CAMPAIGN',CAMPAIGN)
FROM my_db.my_schema.JSON_Test_Table
LIMIT 2)
FILE_FORMAT=(
TYPE=JSON
COMPRESSION=NONE
)
OVERWRITE=True
HEADER=False
SINGLE=True
MAX_FILE_SIZE=4900000000
;
get ${outFileNm} file://${inDir}/;
rm ${outFileNm};
!
if [ $? -eq 0 ]; then
echo "Export successful"
else
echo "ERROR in export"
fi
}
Is it best practice to add the comma during the SELECT, or after the file is generated, and how?
With or without that comma, the text is still not valid JSON, just text that looks like JSON. You are exporting several rows, each row as an independent object. You need to gather all these objects into an array to produce valid JSON.
A JSON that encodes an array of rows looks like this:
[
{
"CAMPAIGN": "Welcome_New",
"UUID": "fe881781-bdc2-41b2-95f2-e0e8c19dc597"
},
{
"CAMPAIGN": "Welcome_Existing",
"UUID": "77a41c02-beb9-48bf-ada4-b2074c1a78cb"
}
]
The easiest way to produce this output is to ask the database to do it, if it supports this option (that is, to wrap all the records into a list before generating the JSON, instead of exporting each record as a separate JSON object).
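Snowflake does support this: aggregating the per-row objects with ARRAY_AGG produces a single array, so COPY INTO writes one valid JSON document. A sketch reusing the table and column names from the script above (ARRAY_AGG is an aggregate, so the original LIMIT 2 would have to move into a subquery):
COPY INTO ${outFileNm}
FROM (SELECT ARRAY_AGG(OBJECT_CONSTRUCT(
'UUID',UUID
,'CAMPAIGN',CAMPAIGN))
FROM my_db.my_schema.JSON_Test_Table)
FILE_FORMAT=(TYPE=JSON COMPRESSION=NONE)
OVERWRITE=True
SINGLE=True
;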
If this is not possible, you are left with a file that contains multiple JSON documents. You can use jq to convert these individual documents into a single JSON like the one described above (encoding an array of objects).
It is as simple as this:
jq --slurp '.' input_file > output_file
The --slurp option tells jq to read all the JSON documents from input_file into memory, parse them, and put them into an array. That is the program input.
'.' is the jq program. It says "dump the current object" and does no processing of the input data. The current object is the array.
After executing the program (which, in this case, doesn't do anything), jq dumps the resulting value (as JSON, of course) to standard output (by default, the screen).
The > output_file part redirects this output to a file (named output_file) instead of showing it on screen.
You can see how it works on the jq playground.

json file items inserted as single item in mongodb

This is my first time using mongodb and I have a products.json file in the format of
{"products":[ {} , {} , {} ] }
and when I inserted it into MongoDB with the command:
mongoimport --db databaseName --collection collectioname --file products.json
I got the whole products.json in my collection as a single document with a single ObjectId, not a document with an id for every {} object in my array. Obviously my JSON format is incorrect and I would really appreciate your help with this since I am a beginner.
From what you described, you need to import with --collection products and have the products in the JSON file formatted as follows:
{}
{}
{}
Or you can modify the file content as follows:
[{},{},{}]
and add --jsonArray to the mongoimport command to load the array objects as separate documents.
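If you would rather not edit the file by hand, and assuming jq is available, you can strip the products wrapper and then import it (the name products_array.json is just a placeholder):
jq '.products' products.json > products_array.json
mongoimport --db databaseName --collection products --jsonArray --file products_array.json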

Inserting document to MongoDB from mongodump JSON with pymongo

I have a JSON file (converted from mongodump BSON) which I would like to insert into MongoDB using pymongo.
The approach I am using is something like:
with open('duplicate_docs.json') as f:
    lines = f.readlines()

for line in lines:
    record = json.loads(line)
    db.insert_one(record)
However, the JSON is in the form:
{ "_id" : ObjectId( "54ccc3f469702d45ca450200"), \"id\":\"54713efd69702d78d1420500\",\"name\":\"response"}
As you can see there are escape characters (\) before the JSON keys and I am not able to load this as JSON.
What is the best way to fix a JSON string like this so it can be inserted into MongoDB?
Thank you.
As an alternative approach, if you take the actual output of mongodump you can insert it straight in with the bson.json_util loads() function, which understands MongoDB Extended JSON constructs such as {"$oid": ...} that the standard json module rejects.
from pymongo import MongoClient
from bson.json_util import loads

db = MongoClient()['mydatabase']

# Write a one-line Extended JSON document to demonstrate the round trip.
with open('c:/temp/duplicate_docs.json', mode='w') as f:
    f.write('{"_id":{"$oid":"54ccc3f469702d45ca450200"},"id":"54713efd69702d78d1420500","name":"response"}')

# loads() parses Extended JSON, turning {"$oid": ...} into a real ObjectId.
with open('c:/temp/duplicate_docs.json') as f:
    lines = f.readlines()

for line in lines:
    record = loads(line)
    db.docs.insert_one(record)
Why not use mongoexport to dump to JSON instead of BSON:
mongoexport --port 27017 --db <database> --collection <collection> --out output.json
and then use
mongoimport --port 27017 --db <database> --collection <collection> --file output.json

Removing the index of a collection in mongoDB?

I want to remove the _id of my document in MongoDB.
I will put the document below:
{
"_id" : ObjectId("54f2324671eb13650e8b4569"),
"nome" : "Pr.Ademar Lourenço",
"tweet" : "Jornal Águas Lindas: Prefeito Hildo se reúne com governador Rollemberg e prefeitos do Entorno http://t.co/PtVWIENdO4"
}
I want it to be like this:
{
"nome" : "Pr.Ademar Lourenço",
"tweet" : "Jornal Águas Lindas: Prefeito Hildo se reúne com governador Rollemberg e prefeitos do Entorno http://t.co/PtVWIENdO4"
}
The reason is that I'll export this document to a .txt file and process it outside of MongoDB.
You can use mongoexport to export data in either JSON or CSV format. There is a --fields option to this utility that lets you define which specific fields to export:
--fields nome,tweet
Adjusting the examples in the reference documentation (http://docs.mongodb.org/v2.2/reference/mongoexport/) to your case:
For JSON
mongoexport --db sales --collection contacts --fields nome,tweet --out contacts.json
For CSV
mongoexport --db users --collection contacts --fields nome,tweet --csv --out contacts.csv
Hopefully this gives you enough to export your data into the form you want.
You can make a query and exclude the _id field via the projection parameter, for example:
db.tweets.find({}, {_id:0});
This will omit the _id field from the response.
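If the end goal is the .txt export mentioned in the question, one option (a sketch assuming the legacy mongo shell and the tweets collection from the example above) is to run the same projection through --eval and redirect the output:
mongo --quiet <database> --eval 'db.tweets.find({}, {_id:0}).forEach(function(d) { printjson(d); })' > tweets.txt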

Dump Mongo Collection into JSON format

Is there any way to dump a mongo collection into JSON format, either on the shell or using the Java driver? I am looking for the one with the best performance.
Mongo includes a mongoexport utility (see docs) which can dump a collection. This utility uses the native libmongoclient and is likely the fastest method.
mongoexport -d <database> -c <collection_name>
Also helpful:
-o: write the output to file, otherwise standard output is used (docs)
--jsonArray: generates a valid json document, instead of one json object per line (docs)
--pretty: outputs formatted json (docs)
Use mongoexport/mongoimport to dump/restore a collection:
Export JSON File:
mongoexport --db <database-name> --collection <collection-name> --out output.json
Import JSON File:
mongoimport --db <database-name> --collection <collection-name> --file input.json
WARNING
mongoimport and mongoexport do not reliably preserve all rich BSON data types because JSON can only represent a subset of the types supported by BSON. As a result, data exported or imported with these tools may lose some measure of fidelity.
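If type fidelity matters, recent versions of the tools let mongoexport emit canonical Extended JSON, which encodes BSON types explicitly (verify that your installed version supports the --jsonFormat option):
mongoexport --db <database-name> --collection <collection-name> --jsonFormat=canonical --out output.json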
Also, from http://bsonspec.org/:
BSON is designed to be fast to encode and decode. For example, integers are stored as 32 (or 64) bit integers, so they don't need to be parsed to and from text. This uses more space than JSON for small integers, but is much faster to parse.
In addition to compactness, BSON adds additional data types unavailable in JSON, notably the BinData and Date data types.
Here's my command for reference:
mongoexport --db AppDB --collection files --pretty --out output.json
On Windows 7 (MongoDB 3.4), you have to run cmd from the directory where mongod.exe and mongo.exe reside (C:\MongoDB\Server\3.4\bin); otherwise it fails saying it does not recognize the mongoexport command.
From the Mongo documentation:
The mongoexport utility takes a collection and exports to either JSON or CSV. You can specify a filter for the query, or a list of fields to output
Read more here: http://www.mongodb.org/display/DOCS/mongoexport
Here is a little node script that I wrote to dump all collections in a specific database to the specified output directory...
#!/usr/bin/env node
import { MongoClient } from 'mongodb';
import { spawnSync } from 'child_process';
import fs from 'fs';

const DB_URI = 'mongodb://0.0.0.0:27017';
const DB_NAME = 'database-name';
const OUTPUT_DIR = 'output-directory';

const client = new MongoClient(DB_URI);

async function run() {
  try {
    await client.connect();
    const db = client.db(DB_NAME);
    const collections = await db.collections();
    if (!fs.existsSync(OUTPUT_DIR)) {
      fs.mkdirSync(OUTPUT_DIR);
    }
    // Run mongoexport synchronously per collection; spawn() returns a
    // ChildProcess, not a Promise, so awaiting it would not actually wait
    // for the export to finish.
    for (const c of collections) {
      const name = c.collectionName;
      spawnSync('mongoexport', [
        '--db', DB_NAME,
        '--collection', name,
        '--jsonArray',
        '--pretty',
        `--out=./${OUTPUT_DIR}/${name}.json`,
      ]);
    }
  } finally {
    await client.close();
    console.log(`DB Data for ${DB_NAME} has been written to ./${OUTPUT_DIR}/`);
  }
}

run().catch(console.dir);
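Note that the script uses ES module imports, so save it with an .mjs extension or set "type": "module" in package.json, and make sure mongoexport is available on your PATH.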