Accessing objects from json files on disk - json

I have ~500 JSON files on disk that represent hotels all over the world, each around 30 MB; all objects have the same structure.
At certain points in my Spring server I need to get the information for a single hotel, let's say by its code (which is inside the JSON object).
The data is read-only, but I might get updates from the hotel providers at certain times, such as extra JSON files or delta changes.
I definitely don't want to migrate my JSON files to a relational database, so I've been investigating the best way to achieve what I want.
I tried Apache Drill because I thought querying straight from the JSON files would mean fewer headaches when dealing with the data. I ran a directory query using Drill, something like:
SELECT * FROM dfs.`C:\hotels\` WHERE code='1b3474';
but this is clearly not the most efficient approach for me, as it takes around 10 seconds to fetch a single hotel.
At the moment I'm trying out CouchDB, but I'm still learning it. Should I migrate all the hotels into a single document (which makes some sense to me)? Or should I treat each hotel as a document?
I'm just looking for pointers on a good solution for this, so I'm here for your opinions.

The main issue here is that JSON files do not have indexes associated with them, and Drill does not create indexes for them. So whenever you run a query like SELECT * FROM dfs.`C:\hotels\` WHERE code='1b3474'; Drill has no choice but to read every JSON file and parse and process all the data in it. The more files and data you have, the longer the query will take. If you need to do lookups like this often, I would suggest not using Drill for this use case. Some alternatives are:
A relational database where you have an index built for the code column.
A key value store where code is the key.
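To make the "code is the key" idea concrete, here is a minimal sketch in Python. It assumes each hotel is a JSON object with a "code" field, as the question describes; the sample hotels, the table name and the helper names build_index and get_hotel are made up for illustration. SQLite from the standard library is used only as a convenient on-disk key-value store with an index on the key, and any real key-value store would play the same role.

```python
import json
import sqlite3


def build_index(conn, hotels):
    """Store each hotel as a JSON string keyed by its code."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hotels (code TEXT PRIMARY KEY, doc TEXT NOT NULL)"
    )
    # INSERT OR REPLACE lets a later delta update overwrite an existing hotel.
    conn.executemany(
        "INSERT OR REPLACE INTO hotels (code, doc) VALUES (?, ?)",
        ((h["code"], json.dumps(h)) for h in hotels),
    )
    conn.commit()


def get_hotel(conn, code):
    """Look up one hotel by code via the primary-key index; None if absent."""
    row = conn.execute("SELECT doc FROM hotels WHERE code = ?", (code,)).fetchone()
    return json.loads(row[0]) if row else None


# Tiny made-up data set standing in for the parsed contents of the JSON files.
sample_hotels = [
    {"code": "1b3474", "name": "Hotel A", "city": "Lisbon"},
    {"code": "9f0021", "name": "Hotel B", "city": "Osaka"},
]

conn = sqlite3.connect(":memory:")  # a file path would make the index persistent
build_index(conn, sample_hotels)
print(get_hotel(conn, "1b3474"))
```

The point is that the files are parsed once when the index is built, and each lookup afterwards touches only one row instead of every file.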

Related

Best way to store/edit GEOJSON working on Wordpress [duplicate]

I want to create a website that will have an AJAX search. It will fetch the data either from a JSON file or from a database. I do not know which technology to use to store the data: a JSON file or MySQL. Based on some quick research, there are going to be about 60,000 entries, so the file size if I use JSON will be around 30-50 MB, and if I use MySQL the table will have 60,000 rows. What are the limitations and benefits of each technique?
Thank you
I can't seem to comment since I need 50 rep. for commenting, so I will give it as an answer:
MySQL will be preferable for many reasons, not the least of which being you do not want your web server process to have write access to the filesystem (except for possibly logging) because that is an easy way to get exploited.
Also, the MySQL team has put a lot of engineering effort into things such as replication, concurrent access to data, ACID compliance, and data integrity.
Imagine, for instance, that you add a new required field to whatever data structure you are storing. If you store it in JSON files, you will need some process that opens each file, adds the field, and then saves it. Compare that with simply using ALTER TABLE with a DEFAULT value for the field. (A bit of a contrived example, but how many hacks do you want to leave in your codebase for dealing with old data?) To be really blunt about it, MySQL is a database while JSON is not, so the correct answer is MySQL, without hesitation. JSON is just a language, and barely even that. JSON was never designed to handle anything like concurrent connections or any sort of data manipulation, since its only function is to represent data, not to manage it.
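As a rough illustration of the ALTER TABLE point, here is a small Python sketch. SQLite from the standard library stands in for MySQL so that it runs on its own, and the table and column names are invented for the example; the MySQL statement would look much the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany("INSERT INTO entries (title) VALUES (?)", [("first",), ("second",)])

# One statement adds the new required field; existing rows pick up the default.
conn.execute("ALTER TABLE entries ADD COLUMN status TEXT NOT NULL DEFAULT 'active'")

rows = conn.execute("SELECT id, title, status FROM entries ORDER BY id").fetchall()
print(rows)
```

With JSON files, the equivalent change would be a script that opens, edits and rewrites every file.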
So go with MySQL for storing the data. Then you should use some programming language to read that database, and send that information as JSON, rather than actually storing anything in JSON.
If you store the data in files, whether in JSON format or anything else, you will have all sorts of problems that people stopped worrying about once databases started being used for the same thing: size limitations, locks, you name it. It's good enough when you have one user, but the moment you add more of them, you'll start solving so many problems that you would probably end up writing an entire database engine just to handle the files for you, while all along you could have simply used an actual database. Do note! Don't take my word for granted; I am not an expert in this field, so let others post their answers and then judge by that. I think enough people here on Stack Overflow have more experience than I do, haha. These are NOT entirely my words, but I have kept the parts that were true from what I knew and know and added some of my own knowledge :) Have a great time making your website.
For MySQL: you can select specific rows or specific columns using queries, filter data based on a key, and order alphabetically.
Downside: you need a REST API to fetch the data because it can't be accessed directly, so you have to write backend code in PHP, Python, or whatever programming language you use.
For a JSON file: benefit: no backend code is needed; the file is accessed directly with an HTTP GET request.
Downside: no filtering, ordering, or any other queries; you have to do all of that manually.
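To show what "let the database filter and order, then send the result as JSON" could look like, here is a hedged Python sketch. SQLite stands in for MySQL so the example is self-contained; the entries table, the sample names and the search_entries helper are all hypothetical, and a real backend would wrap this in whatever web framework it uses.

```python
import json
import sqlite3


def search_entries(conn, prefix, limit=10):
    """Return matching rows as a JSON string, filtered and ordered by the database."""
    cur = conn.execute(
        "SELECT id, name FROM entries WHERE name LIKE ? ORDER BY name LIMIT ?",
        (prefix + "%", limit),
    )
    return json.dumps([{"id": row[0], "name": row[1]} for row in cur])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO entries (name) VALUES (?)",
    [("Berlin",), ("Bergen",), ("Boston",), ("Athens",)],
)

print(search_entries(conn, "Ber"))
```

Note that this sketch does not escape % or _ in the search text, which a real search endpoint would need to handle.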

On Node MySQL vs JSON

I'm currently using MySQL to store JSON data, which I fetch from MySQL and parse in my application. I would like to get rid of MySQL, but first I would like to know: is that wise?
Would it be efficient to switch to storing the data in a data folder containing .json files with the data I need? It will hold my app's coordinate data for each user who wants to track themselves on a map. Will that cause any issues? I don't need to "query", but what about large amounts of data, for example 50K lines? The same amount would be in MySQL too. The amount doesn't change, but will any problems appear when moving from reading from SQL to reading from JSON files?
It's difficult to answer all these questions in one, but I'll address some of them:
There are dedicated NoSQL databases that are very good at the type of data storage you're talking about: MongoDB, CouchDB etc. It might be worth checking these out. They are very good at dealing with JSON data. Querying and parsing are very simple in Node.js.
You can store JSON in MySQL (or another RDBMS); I've done it in several projects with good results. As of MySQL 5.7.8 there is a dedicated JSON type. Queries can actually work surprisingly well; I know I've queried tables with tens of millions of JSON entries pretty quickly.
Make sure you consider backup and restore scenarios: what happens in the event of data loss? Using MySQL or a NoSQL database will simplify this for you. Either way, make sure you have this covered!
I wouldn't call 50K lines big data! I have dealt with databases with tens of millions of rows, and even that wouldn't be called big data.
I would probably not recommend storing your data in files. I've worked in telematics before, where we stored millions of JSON blobs in relational databases with very few problems. Later on we planned to move to a NoSQL database for these, but the relational database worked surprisingly well, especially because you can adopt a hybrid approach of using relational queries and including JSON data in the results (to be parsed by clients).
You might not need the ability to query, but it's very useful to get for example "Give me all JSON for user id 100". An RDBMS or NoSQL system would make this very easy.
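Here is a minimal sketch of that hybrid approach in Python: a relational table with an indexed user_id column and the JSON kept as an opaque text payload. SQLite is used only so the example runs without a server, and the table, column and function names, as well as the coordinates, are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " payload TEXT NOT NULL)"  # the JSON blob, stored as text
)
conn.execute("CREATE INDEX idx_positions_user ON positions (user_id)")


def add_position(conn, user_id, point):
    """Insert one JSON blob for a user."""
    conn.execute(
        "INSERT INTO positions (user_id, payload) VALUES (?, ?)",
        (user_id, json.dumps(point)),
    )


def positions_for_user(conn, user_id):
    """'Give me all JSON for user id N', in insertion order."""
    cur = conn.execute(
        "SELECT payload FROM positions WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [json.loads(row[0]) for row in cur]


add_position(conn, 100, {"lat": 60.17, "lon": 24.94})
add_position(conn, 101, {"lat": 48.85, "lon": 2.35})
add_position(conn, 100, {"lat": 60.18, "lon": 24.95})

print(positions_for_user(conn, 100))
```

The relational part handles the "which rows" question through the index, and the client still receives plain JSON to parse.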

Grails App with Huge Tables

I'm trying to create a database from existing CSV files that are about 20,000 columns wide and 700 rows deep. In Grails I would like the 20,000-column domain to belong to (belongsTo) another, simpler domain (about 200 columns). But upon compilation I get:
java.lang.RuntimeException: Class file too large!
Which is understandable because it's way too much data. My question is, what is the best approach to handle this problem in grails? Should I simply break up the big table into separate domains? Look for a different table format?
I'm specifically worried about:
1) Search time, parsing search methods then delegating to sub domains.
2) Importing the data from the huge csv file into the domains.
When you crash into a JVM size limit like this, take it as a big hint that your approach is way off. As I mentioned in another question earlier this week, we shouldn't even know what these limits are, much less be anywhere near hitting them.
I don't see much benefit in using something like GORM or even an O-O approach in general to this much data. It's not an object in any realistic, usable sense - it's a massive bunch of data. You'll need to programmatically access everything anyway even if it did work, since hand-managing the code for that would be crazy amounts of code. Do you really plan on creating one or more instances of these beasts and passing them around as method args?
You'll need to look at this from a big data perspective, not an ORM perspective.
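One way to read "treat it as data, not as an object" is to store the wide CSV in a long (row, column, value) layout and access it programmatically. The answer does not name this technique, so take the following as one possible sketch rather than the recommended design. It is written in Python with SQLite for brevity, not Grails, and the table name, helper names and three-column sample are all invented.

```python
import csv
import io
import sqlite3


def load_wide_csv(conn, csv_file):
    """Store a wide CSV as (row_id, col, value) triples instead of one huge row type."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cells ("
        " row_id INTEGER NOT NULL,"
        " col TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (row_id, col))"
    )
    reader = csv.DictReader(csv_file)
    for row_id, row in enumerate(reader):
        conn.executemany(
            "INSERT INTO cells (row_id, col, value) VALUES (?, ?, ?)",
            ((row_id, col, value) for col, value in row.items()),
        )
    conn.commit()


def get_cell(conn, row_id, col):
    """Fetch a single value by row number and column name; None if absent."""
    row = conn.execute(
        "SELECT value FROM cells WHERE row_id = ? AND col = ?", (row_id, col)
    ).fetchone()
    return row[0] if row else None


# Three columns stand in for the 20,000 in the question.
wide_csv = io.StringIO("c1,c2,c3\n1,2,3\n4,5,6\n")
conn = sqlite3.connect(":memory:")
load_wide_csv(conn, wide_csv)
print(get_cell(conn, 1, "c2"))
```

No class ever needs 20,000 fields this way, because the column name is just data.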

Which database suits my application, MySQL or MongoDB? Using Node.js, Backbone, Now.js

I want to make an application like docs.google.com (without its API, completely on my own server) using
frontend: Backbone
backend: Node
Which database do you think is better, MySQL or MongoDB? It should scale well.
I am familiar with MySQL with PHP, and I will be happy if the answer is MySQL.
But many tutorials I saw used MongoDB. Why did they use MongoDB rather than MySQL?
What should I use?
Can anyone give me a link to a sample application (with source) built using Backbone, Node, and MySQL (or Mongo), or at least an app with Node and MySQL?
Thanks
With MongoDB, you can just store JSON objects and retrieve them fully formed, so you don't really need an ORM layer and you spend less CPU time translating your data back and forth. The developers behind MongoDB have also made horizontal scaling of the database a higher priority, and they let you run arbitrary JavaScript code to pre-process data on the DB side (allowing map-reduce style filtering of data).
But you give up some things for these gains: you can't join records. Actually, the JSON structure you store could only be produced via joins in SQL, but in MongoDB you only have that one structure for your data, while in SQL you can query differently and get your data represented in alternate ways much more easily. So if you need to do a lot of analytics on your database, MongoDB will make that harder.
The query language in MongoDB is "rougher", in my opinion, than SQL's, partly because it's less familiar, and partly because the querying features "feel" haphazardly put together: partly to keep the queries valid JSON, and partly because there are literally a couple of ways of doing the same thing, and some are older ways that aren't as useful or regularly formatted as the others. There's also the added complexity of the array and sub-object types compared with SQL's simple row-based design, so the syntax has to be able to handle querying for arrays that contain some of the values you defined, contain all of the values you defined, contain only the values you defined, or contain none of the values you defined. The same distinctions apply to object keys and their values, and this makes the query syntax harder to grasp. (And while I can see the need for edge cases, the $where query parameter, which takes a JavaScript function that is run on every record of the data and returns a boolean, is a siren song: you can easily define which objects you want returned, but it has to run on every record in the database, and no indexes can be used.)
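The four array cases mentioned above (some, all, only, none) can be spelled out in plain Python to see why the query syntax needs several operators. This is only an illustration of the semantics, not MongoDB code. The function names and sample tags are made up, and the operator names in the docstrings are the MongoDB operators these cases roughly correspond to, as far as I know.

```python
def contains_some(stored, wanted):
    """At least one wanted value is in the stored array (compare MongoDB's $in)."""
    return bool(set(stored) & set(wanted))


def contains_all(stored, wanted):
    """Every wanted value is in the stored array (compare MongoDB's $all)."""
    return set(wanted) <= set(stored)


def contains_only(stored, wanted):
    """The stored array has no value outside the wanted ones."""
    return set(stored) <= set(wanted)


def contains_none(stored, wanted):
    """No wanted value is in the stored array (compare MongoDB's $nin)."""
    return not (set(stored) & set(wanted))


tags = ["draft", "shared", "spreadsheet"]
print(contains_some(tags, ["shared", "archived"]))
print(contains_all(tags, ["shared", "archived"]))
print(contains_only(tags, ["draft", "shared", "spreadsheet", "archived"]))
print(contains_none(tags, ["archived"]))
```

A row in SQL holds one value per column, so none of these distinctions come up there, which is part of why SQL's syntax feels simpler.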
So, it depends on what you want to do, but since you say it's for a Google Docs clone, you probably don't care about any representation but the document representation, itself, and you're probably only going to query based on document ID, document name, or the owner's ID/name, nothing too complex in the querying.
Then, I'd say being able to take the JSON representation of the document your user is editing, and just throw it into the database and have it automatically index these important fields, is worth the price of learning a new database.
I was also struggling with this choice looking at the hype created by using MongoDB for tasks it was not built for. So my 2 cents are:
Storing and retrieving hierarchical objects, which your documents probably are, is easier in MongoDB, as David says. It becomes more complicated if you want to store documents that are bigger than 16 MB though - MongoDB's answer is GridFS.
Organising documents in folders and groups, and keeping track of which user owns which documents and who they have given access to, is definitely easier with MySQL - you have the advantage of powerful SQL queries with joins etc., built-in EXPLAIN for query optimization, triggers, functions, stored procedures, etc. MongoDB is nowhere near.
So what prevents you from using both MySQL to organize the documents and MongoDB to store one collection of documents identified by id (or several collections - one for each document type)? It seems to me the best choice and using two databases in one application is not a problem, really.
MySQL will store users, groups, folders, permissions - whatever you fancy - and for each document it will store a reference to the collection and the document id (MongoDB has a special format for it - DBRefs). MongoDB will store documents themselves in collections, if they are all less than 16MB, or the previews and metadata of documents in collections and the whole documents in GridFS.
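A rough sketch of that split, in Python, might look like the following. To keep it runnable without any servers, SQLite stands in for MySQL and a plain dict stands in for the MongoDB collection, so none of this is real MongoDB or MySQL client code. The table layout, function names and sample documents are assumptions, and the size check uses the JSON-encoded length only as an approximation of the stored document size.

```python
import json
import sqlite3
import uuid

MAX_DOC_BYTES = 16 * 1024 * 1024  # the 16 MB document limit mentioned above

document_store = {}  # stand-in for a document collection: doc_id -> JSON text

meta = sqlite3.connect(":memory:")  # stand-in for the relational side
meta.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " title TEXT NOT NULL,"
    " collection TEXT NOT NULL,"
    " doc_id TEXT NOT NULL)"
)


def save_document(owner, title, body, collection="docs"):
    """Put the body in the document store and a reference row in the relational table."""
    encoded = json.dumps(body)
    if len(encoded.encode("utf-8")) > MAX_DOC_BYTES:
        raise ValueError("too large for a single document; would need GridFS-style storage")
    doc_id = uuid.uuid4().hex
    document_store[doc_id] = encoded
    meta.execute(
        "INSERT INTO documents (owner, title, collection, doc_id) VALUES (?, ?, ?, ?)",
        (owner, title, collection, doc_id),
    )
    return doc_id


def documents_owned_by(owner):
    """Relational query for ownership, then fetch each body by its stored id."""
    cur = meta.execute(
        "SELECT title, doc_id FROM documents WHERE owner = ? ORDER BY title", (owner,)
    )
    return [(title, json.loads(document_store[doc_id])) for title, doc_id in cur]


save_document("alice", "Notes", {"paragraphs": ["hello"]})
save_document("bob", "Budget", {"cells": [[1, 2], [3, 4]]})
print(documents_owned_by("alice"))
```

Ownership, folders and permissions stay in tables where joins are easy, while the document bodies are only ever fetched by id.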
David provided a good answer. A few things to add to it.
MongoDB's flexible nature allows for easy agile / iterative development.
MongoDB, like Node.js, is asynchronous in nature and works very well within asynchronous environments.
Mongoose is a good ODM (object document mapper) that makes working with MongoDB with Node.js feel very natural. Unlike ORMs this is a very thin layer.
For Google Doc like functionality, the flexibility & very rich data structure provided by MongoDB feels like a much better fit.
You can find some good example posts by searching for mongoose, node and MongoDB.
Here's one that also uses backbone.js and looks good http://mattkopala.com/blog/2012/02/12/getting-started-with-nodejs/