Using MongoDB to store a single but complex JSON object

I want to store a single, big and complex JSON object in MongoDB, and I want to be able to retrieve and modify specific parts of it. A simple solution would be to store it in a single document, but I'm not sure how that would play with multiple write requests. Another option would be to keep every node of the JSON in a different document, like a pattern explained in the MongoDB documentation. That way I could retrieve only parts of the whole object and work on them separately.
My question is: do I actually gain anything from the latter approach? I'm fairly new to MongoDB, but from what I've read it takes a database-level lock on writes, so it would seem that taking my JSON apart like this achieves nothing when it comes to scaling.

If you are considering storing data larger than 16 MB, you should definitely split it up in some way (for example across multiple documents, or via GridFS), as MongoDB has a 16 MB size limit on its documents.
From MongoDB Limits and Thresholds:
The maximum BSON document size is 16 megabytes.
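For the single-document approach the asker describes, here is a minimal sketch (Python with pymongo; the collection name and field paths are invented for illustration) of retrieving and modifying one branch of a large document with dot notation, rather than rewriting the whole thing:

    # Sketch only: partial reads and writes on one large document.
    # "config" and "sections.pricing" are hypothetical names.
    from pymongo import MongoClient

    coll = MongoClient("mongodb://localhost:27017")["mydb"]["config"]

    # Fetch only one branch of the big object instead of the whole document.
    doc = coll.find_one({"_id": "main"}, {"sections.pricing": 1})

    # Update just that branch; a $set on a specific path leaves the rest of
    # the document untouched, and each update_one is atomic per document.
    coll.update_one(
        {"_id": "main"},
        {"$set": {"sections.pricing.currency": "EUR"}},
    )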

Related

What's stopping me from using a standalone JSON file instead of a local db?

I need to store data for a native mobile app I'm writing, and I was wondering: why do I need to bother with DB setup when I can just read/write a JSON file? All the interactions are basic and could most likely be handled as JSON objects rather than queries.
What are the advantages?
Databases are intended for standardized data or large data sets. If you know there are only a few properties to read and they don't change, JSON may be easier; but if you have a list of items, a DB can optimize queries with indexes and ensure consistency across multiple tables.
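As a rough illustration of that trade-off (sketched in Python here; the file name and table layout are made up): a JSON file is read and written as a whole, while even a small embedded database gives you indexed lookups over many rows:

    import json
    import sqlite3

    # Option 1: a plain JSON file - trivial, but you load and save everything at once.
    with open("app_data.json", "w") as f:
        json.dump({"theme": "dark", "items": [{"id": 1, "name": "foo"}]}, f)
    with open("app_data.json") as f:
        data = json.load(f)

    # Option 2: a small SQLite database - indexed lookups, per-row updates.
    db = sqlite3.connect("app_data.db")
    db.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE INDEX IF NOT EXISTS idx_items_name ON items(name)")
    db.execute("INSERT OR REPLACE INTO items VALUES (?, ?)", (1, "foo"))
    db.commit()
    row = db.execute("SELECT * FROM items WHERE name = ?", ("foo",)).fetchone()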

Should I use JSONField or FileField to store JSON datas?

I am wondering how I should store my JSON data to get the best performance and scalability.
I have two options:
The first one would be to use JSONField, which would probably give me an advantage in simplicity and performance when handling the data, since I don't have to read it out of a file each time.
My second option would be to store my JSON data in FileFields as JSON files. This seems like the better option for scalability, since the large amount of JSON wouldn't be stored in the database (only the location of the file), but maybe not for user-facing performance, since the file has to be read each time before it is displayed in the template.
I would like to know whether I'm thinking about this reasonably: which of the two is the better way to store JSON data so it can be reused as fast as possible, without complicating the database or hurting scalability?
A JSON field will obviously perform well because it can be indexed. A very good feature is native data access, which means you don't have to load and parse the JSON and then query it; you can query directly on the model field. Since you have a huge amount of JSON data it may seem that a file is the better option, but a file's only advantage is storage.
Quoting from a random article found via a Google search:
A Postgres JSON field takes almost 11% more space than the same JSON as a file on your file system; in one test, a 233 MB formatted JSON file took about 268 MB when stored in a JSON field.
Storing in a file has some cons, which include reading the file, parsing the JSON and then querying it, all of which is time-consuming because these are disk-based operations. Scalability will not be an issue with a JSON field, although your DB size will be larger, so moving the data around might become harder for you.
So unless you have a shortage of database space, you should choose JSONField.
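To make the JSONField side concrete, here is a minimal Django sketch (the Report model and its field names are invented for illustration). JSONField is available as models.JSONField since Django 3.1, and it lets you filter on keys inside the stored JSON without loading and parsing it yourself:

    # models.py - sketch only; "Report" and its fields are made-up names.
    from django.db import models

    class Report(models.Model):
        name = models.CharField(max_length=100)
        payload = models.JSONField(default=dict)  # JSON stored natively in the DB

    # Querying keys inside the JSON directly on the model field:
    #   Report.objects.filter(payload__status="done")
    #   Report.objects.filter(payload__metrics__score__gte=80)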

Storing a JSON file in Redis and retrieving it

I am storing the info contained in a JSON file in Redis, using the Node.js Redis driver. Do you think I'm losing anything by using a hash to store the info?
The info is basically a large array (several thousand elements, each with several fields, sometimes up to 50) under data, plus a small set of properties under meta.
I understand that you're storing those JSON strings as follows:
hset some-key some-sub-key <the json>
Actually there's another valid approach which involves using the global key space directly:
set some-key:sub-key <the json>
If you're just storing those JSON strings, I would say that creating keys in the global key space is the simplest and most effective approach in your case.
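For reference, here is what both layouts look like from Python, assuming the redis-py client (the key names are just examples):

    import json
    import redis

    r = redis.Redis()
    element = {"id": 42, "name": "foo"}

    # Approach 1: one hash, one field per element.
    r.hset("elements", "42", json.dumps(element))
    loaded = json.loads(r.hget("elements", "42"))

    # Approach 2: one key per element in the global key space.
    r.set("elements:42", json.dumps(element))
    loaded = json.loads(r.get("elements:42"))

One practical difference: EXPIRE/TTL applies to whole keys, so the per-key layout lets you expire individual elements, while fields inside a single hash share the hash's lifetime.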
What do you mean by losing something? Storing values (JSON) and retrieving them in Redis can be really fast. Plus, Redis comes with some very handy APIs like TTL, FLUSHALL, etc.
Personally, I'm using Redis for my profile page. I store my image uploads in Redis and have never had an issue.
My profile page: http://fanjin.computer
Github repo: https://github.com/bfwg/relay-gallery
Although this question has been answered, for future reference some might be asking the same question but looking for a different answer (like me).
If that's the case, I would suggest looking into RedisJSON, which adds a native JSON type to Redis.
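A quick sketch of what that looks like, assuming the RedisJSON module is loaded on the server (for example via Redis Stack) and a redis-py version that bundles the JSON commands; the key and field names are made up:

    import redis

    r = redis.Redis()

    # JSON.SET / JSON.GET require the RedisJSON module on the server.
    r.json().set("doc:1", "$", {"meta": {"count": 2},
                                "items": [{"id": 1, "active": True}]})

    # Read or update a sub-path without fetching the whole document.
    count = r.json().get("doc:1", "$.meta.count")
    r.json().set("doc:1", "$.items[0].active", False)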

Accessing objects from json files on disk

I have ~500 JSON files on my disk that represent hotels all over the world, each around 30 MB, and all objects have the same structure.
At certain points in my Spring server I need to get the information for a single hotel, say by its code (which is a field inside the JSON object).
The data is read-only, but I might get updates from the hotel providers at certain times, such as extra JSON files or delta changes.
Now I don't want to migrate my JSON files to a relational database, that's for sure, so I've been investigating the best solution to achieve what I want.
I tried Apache Drill, because querying straight from the JSON files seemed like it would mean fewer headaches dealing with the data. I did a directory query using Drill, something like:
SELECT * FROM dfs.'C:\hotels\' WHERE code='1b3474';
but this is obviously not the most efficient way for me, as it takes around 10 seconds to fetch a single hotel.
At the moment I'm trying out CouchDB, but I'm still learning it. Should I migrate all the hotels into a single document (makes a bit of sense to me)? Or should I treat each hotel as a document?
I'm just looking for pointers on what a good solution for this would be, so I'd like to hear your opinion.
The main issue here is that json files do not have indexes associated with them, and Drill does not create indexes for them. So whenever you do a query like SELECT * FROM dfs.'C:\hotels\' WHERE code='1b3474'; Drill has no choice but to read each json file and parse and process all the data in each file. The more files and data you have, the longer this query will take. If you need to do lookups like this often, I would suggest not using Drill for this use case. Some alternatives are:
A relational database where you have an index built for the code column.
A key value store where code is the key.
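For example, a one-off import into something indexed is a small script. Here is a Python sketch along the lines of the first alternative (the directory path and field names are assumptions, and it handles either one hotel per file or an array of hotels per file):

    import glob
    import json
    import sqlite3

    db = sqlite3.connect("hotels.db")
    db.execute("CREATE TABLE IF NOT EXISTS hotels (code TEXT PRIMARY KEY, doc TEXT)")

    # One-off import of the JSON files; re-run it when providers send updates.
    for path in glob.glob(r"C:\hotels\*.json"):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        for hotel in (data if isinstance(data, list) else [data]):
            db.execute("INSERT OR REPLACE INTO hotels VALUES (?, ?)",
                       (hotel["code"], json.dumps(hotel)))
    db.commit()

    # Indexed lookup by code instead of scanning every file.
    row = db.execute("SELECT doc FROM hotels WHERE code = ?", ("1b3474",)).fetchone()
    hotel = json.loads(row[0]) if row else None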

Which database suits my application, MySQL or MongoDB? Using Node.js, Backbone, Now.js

I want to make an application like docs.google.com (without its API, completely on my own server) using
frontend: Backbone
backend: Node
Which database do you think is better, MySQL or MongoDB? It should support good scalability.
I am familiar with MySQL (with PHP), and I'll be happy if the answer is MySQL.
But many of the tutorials I saw used MongoDB. Why did they use MongoDB rather than MySQL?
What should I use?
Can anyone give me a link to a sample application (with source) built using Backbone, Node, and MySQL (or Mongo)? Or at least an app with Node and MySQL.
Thanks
With MongoDB, you can just store JSON objects and retrieve them fully-formed, so you don't really need an ORM layer and you spend less CPU time translating your data back-and-forth. The developers behind MongoDB have also made horizontally scaling the database a higher priority and let you run arbitrary Javascript code to pre-process data on the DB side (allowing map-reduce style filtering of data).
But you lose some for these gains: You can't join records. Actually, the JSON structure you store could only be done via joins in SQL, but in MongoDB you only have that one structure to your data, while in SQL you can query differently and get your data represented in alternate ways much easier, so if you need to do a lot of analytics on your database, MongoDB will make that harder.
The query language in MongoDB is "rougher", in my opinion, than SQL's, partly because it's less familiar, and partly because the querying features "feel" haphazardly put together, partially to make it valid JSON, and partially because there are literally a couple of ways of doing the same thing, and some are older ways that aren't as useful or regularly-formatted as the others. And there's the added complexity of the array and sub-object types over SQL's simple row-based design, so the syntax has to be able to handle querying for arrays that contain some of the values you defined, contain all of the values you defined, contain only the values you defined, and contain none of the values you defined. The same distinctions apply to object keys and their values, and this makes the query syntax harder to grasp. (And while I can see the need for edge-cases, the $where query parameter, which takes a javascript function that is run on every record of the data and returns a boolean, is a Siren song because you can easily define what objects you want to return or not, but it has to run on every record in the database, no indexes can be used.)
So, it depends on what you want to do, but since you say it's for a Google Docs clone, you probably don't care about any representation but the document representation, itself, and you're probably only going to query based on document ID, document name, or the owner's ID/name, nothing too complex in the querying.
Then, I'd say being able to take the JSON representation of the document your user is editing, throw it straight into the database, and add indexes on those few important fields is worth the price of learning a new database.
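Concretely, that might look like the sketch below (pymongo; the collection name and fields such as owner_id are assumptions): store whatever JSON the editor produces as-is, and create indexes on the few fields you query by (MongoDB indexes only _id automatically, the rest is one line each):

    from pymongo import MongoClient

    docs = MongoClient()["docsclone"]["documents"]

    # Index the handful of fields the app actually queries on.
    docs.create_index("name")
    docs.create_index("owner_id")

    # Save the editor's JSON structure as-is, creating or replacing the document.
    docs.replace_one(
        {"_id": "doc-123"},
        {"_id": "doc-123", "name": "Q3 plan", "owner_id": "u42",
         "body": {"blocks": [{"type": "p", "text": "hello"}]}},
        upsert=True,
    )
    owned = list(docs.find({"owner_id": "u42"}))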
I was also struggling with this choice looking at the hype created by using MongoDB for tasks it was not built for. So my 2 cents are:
Storing and retrieving hierarchical objects, which your documents probably are, is easier in MongoDB, as David says. It becomes more complicated if you want to store documents that are bigger than 16 MB, though; MongoDB's answer to that is GridFS.
Organising documents in folders and groups, and keeping track of which user owns which documents and who they have granted access to, is definitely easier with MySQL: you have the advantage of powerful SQL queries with joins etc., built-in EXPLAIN optimization, triggers, functions, stored procedures, and so on. MongoDB is nowhere near.
So what prevents you from using both MySQL to organize the documents and MongoDB to store one collection of documents identified by id (or several collections - one for each document type)? It seems to me the best choice and using two databases in one application is not a problem, really.
MySQL will store users, groups, folders, permissions - whatever you fancy - and for each document it will store a reference to the collection and the document id (MongoDB has a special format for it - DBRefs). MongoDB will store documents themselves in collections, if they are all less than 16MB, or the previews and metadata of documents in collections and the whole documents in GridFS.
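A rough sketch of that hybrid layout (Python plus plain SQL; every table, collection, and column name here is made up): the document body lives in MongoDB, and the MySQL row keeps only a reference to it:

    from pymongo import MongoClient

    # 1. Store the document body itself in MongoDB.
    mongo_docs = MongoClient()["docsclone"]["documents"]
    doc_id = mongo_docs.insert_one({"title": "Q3 plan", "body": {"blocks": []}}).inserted_id

    # 2. Store ownership/folders/permissions in MySQL, keeping only a reference
    #    to the Mongo collection and document id (run these with whatever MySQL
    #    client you use; the schema is purely illustrative).
    create_sql = """
    CREATE TABLE documents (
        id         INT PRIMARY KEY AUTO_INCREMENT,
        owner_id   INT NOT NULL,
        folder_id  INT,
        mongo_coll VARCHAR(64) NOT NULL,
        mongo_id   CHAR(24) NOT NULL
    )"""
    insert_sql = ("INSERT INTO documents (owner_id, folder_id, mongo_coll, mongo_id) "
                  "VALUES (%s, %s, %s, %s)")
    insert_params = (42, 7, "documents", str(doc_id))
    # e.g. cursor.execute(insert_sql, insert_params) with a MySQL connection.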
David provided a good answer. A few things to add to it.
MongoDB's flexible nature permits for easy agile / iterative development.
MongoDB, like Node.js, is asynchronous in nature and works very well within asynchronous environments.
Mongoose is a good ODM (object document mapper) that makes working with MongoDB with Node.js feel very natural. Unlike ORMs this is a very thin layer.
For Google Doc like functionality, the flexibility & very rich data structure provided by MongoDB feels like a much better fit.
You can find some good example posts by searching for mongoose, node and MongoDB.
Here's one that also uses backbone.js and looks good http://mattkopala.com/blog/2012/02/12/getting-started-with-nodejs/