Import large JSON file into MongoDB

Due to some migrations and a change of server, I have to move my Mongo database from the old server to a new one. The data also needs to be transferred, but there is a lot of it now: each file is almost 4GB, and in total I have almost 20 files.
My problem is that when I import into the new collections I get a "toString" error. From what I've read, MongoDB has a 16MB limit, which seems to be what I'm hitting when importing a file.
How can I import these JSON files into MongoDB? Thanks in advance.

If you read the documentation for mongoexport it says:
Avoid using mongoimport and mongoexport for full instance production
backups. They do not reliably preserve all rich BSON data types,
because JSON can only represent a subset of the types supported by
BSON. Use mongodump and mongorestore as described in MongoDB Backup
Methods for this kind of functionality.
Rather than using mongoexport to create a JSON file and then mongoimport to re-import it, you should use mongodump and mongorestore.
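A minimal sketch, with placeholder host and database names (not taken from the question):
mongodump --host OLD_HOST --db DATABASE_NAME --out /backup/dump
mongorestore --host NEW_HOST --db DATABASE_NAME /backup/dump/DATABASE_NAME
mongodump writes BSON plus metadata rather than JSON, so the rich BSON types mentioned in the documentation are preserved across the transfer.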

Related

Backup core data, one entity only

My application requires some kind of data backup and some kind of data exchange between users, so what I want to achieve is the ability to export an entity but not the entire database.
I have found some help, but only for the full database, like this post:
Backup core data locally, and restore from backup - Swift
This applies to the entire database.
I tried exporting a JSON file; this might work, except that the entity I'm trying to export contains images as binary data.
So I'm stuck.
Any help with exporting just one entity rather than the full database, or with writing JSON that includes binary data, would be appreciated.
Take a look at protobuf. Apple has an official Swift library for it:
https://github.com/apple/swift-protobuf
Protobuf is an alternative encoding to JSON that has direct support for serializing binary data. There are client libraries for any language you might need to read the data in, and command-line tools if you want to examine the files manually.

What is the best practice for converting and storing a nested JSON file as a MongoDB document?

I have a huge (approx. 1.8GB) JSON file whose data I aim to use in an Express app. Loading the JSON file directly is infeasible, as Node limits imports to about 512MB. I am new to MongoDB (and databases in general) and have thus avoided using them so far; however, this issue looks like it has to be solved with a database.
I am wondering if there is a method by which I can convert my JSON file (which includes data nested up to 5 levels) into a MongoDB document that I can query from my server using Mongoose.
I managed to resolve this issue through further searching. After converting the largest object into an array (creating an array of 2000 smaller objects, each under the 16MB per-document limit hard-coded into MongoDB), I used the mongoimport command to convert the JSON into documents.
mongoimport -d DATABASE_NAME -c COLLECTION_NAME --file JSON_FILE --jsonArray
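If the file is one huge top-level object or array, a streaming tool such as jq (an assumption; it is not mentioned in the question) can break it into one document per line, which mongoimport accepts even without --jsonArray:
jq -c '.items[]' huge.json > documents.ndjson
mongoimport -d DATABASE_NAME -c COLLECTION_NAME --file documents.ndjson
Here '.items[]' is a placeholder for whichever path holds the nested records; each emitted line must stay under MongoDB's 16MB per-document limit.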

Storing plain JSON in HDFS to be used in MongoDB

I am fetching JSON data from different APIs. I want to store it in HDFS and then use it in MongoDB.
Do I need to convert them to avro, sequence file, parquet, etc., or can I simply store them as plain JSON and load them to the database later?
I know that if I convert them to another format they will be distributed and compressed better, but how would I then be able to upload an Avro file to MongoDB? MongoDB only accepts JSON. Would I need another step to read the Avro files and convert them back to JSON?
How large is the data you're fetching? If it's less than 128MB (with or without compression) per file, it really shouldn't be in HDFS.
To answer the question: the format doesn't really matter. You can use Spark SQL to read any Hadoop format (or plain JSON) and load it into Mongo (and vice versa).
Or you can write the data first to Kafka, then use a process such as Kafka Connect to write to both HDFS and Mongo at the same time.

How to import from sql dump to MongoDB?

I am trying to import data from a MySQL dump (.sql file) into MongoDB, but I could not find any mechanism for RDBMS-to-NoSQL data migration.
I have tried converting the data into JSON and CSV, but it is not giving me the desired output in MongoDB.
I thought about trying Apache Sqoop, but it is mostly for moving data between SQL/NoSQL systems and Hadoop.
I cannot work out how to migrate data from MySQL to MongoDB.
Is there any approach apart from what I have tried so far?
I'm hoping to hear about a better and faster solution for this type of migration.
I suggest you dump the MySQL data to a CSV file. You can also try other file formats, but make sure the format is easy to import into MongoDB; both MongoDB and MySQL support CSV very well.
You can use mysqldump or the SELECT ... INTO OUTFILE statement to dump MySQL databases for backup. mysqldump may take a long time, so have a look at "How can I optimize a mysqldump of a large database?".
Then use the mongoimport tool to import the data.
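A rough sketch of that flow, with placeholder database, table, and column names (INTO OUTFILE writes no header row, hence --fields instead of --headerline):
mysql mydb -e "SELECT * FROM mytable INTO OUTFILE '/tmp/mytable.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n'"
mongoimport --db mydb --collection mytable --type csv --fields col1,col2,col3 --file /tmp/mytable.csv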
As far as I know, there are three ways to optimize this import:
mongoimport --numInsertionWorkers N starts several insertion workers; N can be the number of cores.
mongod --nojournal Most of the continuous disk usage comes from the journal, so disabling journaling during the import might be a good optimization.
Split up your file and start parallel import jobs, as sketched below.
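A minimal sketch of the third option, assuming a headerless CSV dump (as produced by INTO OUTFILE) and the same placeholder names as above:
split -l 1000000 /tmp/mytable.csv part_
for f in part_*; do mongoimport --db mydb --collection mytable --type csv --fields col1,col2,col3 --file "$f" & done; wait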
Actually, in my opinion, importing and exporting the data isn't the hard part. Your dataset is large, so if you don't design your document structure, the result will still be slow: doing an automatic migration from a relational database to MongoDB is not recommended, because the resulting database performance might not be good.
So it's worth designing your data structure; you can check out the Data models section of the MongoDB documentation.
Hope this helps.
You can use Mongify, which helps you move/migrate data from SQL-based systems to MongoDB. It supports MySQL, PostgreSQL, SQLite, Oracle, SQL Server, and DB2.
It requires Ruby and RubyGems as prerequisites. Refer to the documentation to install and configure Mongify.
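Roughly, the Mongify workflow looks like this (the database.config file is a small Ruby DSL describing the SQL and MongoDB connections; exact command names may differ between versions, so treat this as a sketch and check the docs):
gem install mongify
mongify check database.config
mongify translation database.config > translation.rb
mongify process database.config translation.rb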

Quick implementation for very large indexed text search?

I have a single text file that is about 500GB (i.e. a very large log file) and would like to build an implementation to search it quickly.
So far I have created my own inverted index with a SQLite database, but this doesn't scale well enough.
Can anyone suggest a fairly simple implementation that would allow quick searching of this massive document?
I have looked at Solr and Lucene, but they look too complicated for a quick solution. I'm thinking a database with built-in full-text indexing (MySQL, Raven, Mongo, etc.) may be the simplest solution, but I have no experience with this.
Since you are looking at text processing for log files, I'd take a close look at the Elasticsearch, Logstash, Kibana (ELK) stack. Elasticsearch provides the Lucene-based text search. Logstash parses and loads the log file into Elasticsearch. And Kibana provides a visualization and query tool for searching and analyzing the data.
This is a good webinar on the ELK stack by one of their trainers: http://www.elasticsearch.org/webinars/elk-stack-devops-environment/
As an experienced MongoDB, Solr and Elasticsearch user, I was impressed by how easy it was to get all three components up and running to analyze log data. It also has a robust user community, both here on Stack Overflow and elsewhere.
You can download it here: http://www.elasticsearch.org/overview/elkdownloads/
Convert the log file to CSV, then import the CSV into MySQL, MongoDB, etc.
MongoDB:
For help:
mongoimport --help
JSON file:
mongoimport --db db --collection collection --file collection.json
CSV file:
mongoimport --db db --collection collection --type csv --headerline --file collection.csv
Use the --ignoreBlanks option to ignore blank fields. For CSV and TSV imports, this option provides the desired functionality in most cases: it avoids inserting blank fields into MongoDB documents.
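For example, combined with the CSV import above:
mongoimport --db db --collection collection --type csv --headerline --ignoreBlanks --file collection.csv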
Guide links: mongoimport, mongoimport v2.2
Then define an index on the collection and enjoy :-)
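For example, to index a field you query on often (the field name here is only a placeholder):
mongo db --eval 'db.collection.createIndex({ user_id: 1 })'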