This is a performance question: I created a web app (in Node.js) that loads a JSON file with around 10,000 records and then displays that data to the user. I'm wondering whether it would be faster to use, for example, MongoDB (or another NoSQL database such as CouchDB) instead. And how much faster would it be?
If you are looking for speed, JSON is quite specifically "not-fast". JSON involves sending the Keys along with the Values and it requires some heavy parsing on the receiving end. Reading the data from file can be slower than reading from the DB. I wouldn't like to say which is better, so you'll have to test it.
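Since the answer comes down to "test it", here is a minimal timing sketch in Python rather than Node.js. The record shape and the file name `records.json` are made up for illustration. It compares reading and parsing a 10,000-record JSON file against looking up a record in a copy already held in memory. It says nothing about how a particular database would perform.

```python
import json
import os
import tempfile
import time

# Hypothetical data: 10,000 small records, similar in count to the question.
records = [{"id": i, "name": f"item-{i}", "price": i * 0.5} for i in range(10_000)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)

    # Cost of reading and parsing the whole file (what happens if you do it per request).
    start = time.perf_counter()
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)
    parse_seconds = time.perf_counter() - start

# Cost of a lookup once the data is already in memory (loaded once at startup).
by_id = {r["id"]: r for r in loaded}
start = time.perf_counter()
hit = by_id[1234]
lookup_seconds = time.perf_counter() - start

print(f"read + parse:     {parse_seconds * 1000:.2f} ms")
print(f"in-memory lookup: {lookup_seconds * 1000:.4f} ms")
```

If the file is parsed once at startup and kept in memory, the per-request cost is the lookup, not the parse, which is often what matters for a data set of this size.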
I'm currently using MySQL to store JSON information, which I fetch from MySQL and parse in my application. I would like to get rid of MySQL, but first I would like to know whether that is wise.
Would it be efficient to store the data in a data folder of .json files instead? The files would hold my app's coordinate data for each user who wants to track themselves on a map. Will that cause any issues? I don't need to query the data, but what about larger volumes, for example 50K lines? The same amount would be in MySQL too. The amount doesn't change, but will any problems appear when moving from reading from SQL to reading from JSON files?
It's difficult to answer all these questions in one, but I'll address some of them:
There are dedicated NoSQL databases that are very good at the type of data storage you're talking about: MongoDB, CouchDB etc. It might be worth checking these out. They are very good at dealing with JSON data. Querying and parsing are very simple in Node.js.
You can store JSON in MySQL (or other RDBMS systems); I've done it in several projects with good results. As of MySQL 5.7.8 there is a dedicated JSON type. Queries can work surprisingly well: I've queried tables with tens of millions of JSON entries pretty quickly.
Make sure you consider backup and restore scenarios, what happens in the event of a data loss. Using MySQL or a NoSQL database will simplify this for you. Either way make sure you have this covered!
I wouldn't call 50K lines big data! I have dealt with databases with tens of millions of rows, and even that wouldn't be called big data.
I would probably not recommend storing your data in files. I've worked in telematics before, where we stored millions of JSON blobs in relational databases with very few problems. Later on we planned to move to a NoSQL database for these, but the relational database worked surprisingly well, especially because you can adopt a hybrid approach of using relational queries and including JSON data in the results (to be parsed by clients).
You might not need the ability to query, but it's very useful to get for example "Give me all JSON for user id 100". An RDBMS or NoSQL system would make this very easy.
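As a rough illustration of the hybrid approach (relational columns plus a JSON blob), here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table name `positions`, its columns, and the sample coordinates are all hypothetical.

```python
import json
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " payload TEXT NOT NULL)"  # the JSON blob, stored as text
)
# The index is what makes "all JSON for user 100" cheap as the table grows.
conn.execute("CREATE INDEX idx_positions_user ON positions (user_id)")

rows = [
    (100, json.dumps({"lat": 60.17, "lon": 24.94})),
    (100, json.dumps({"lat": 60.18, "lon": 24.95})),
    (101, json.dumps({"lat": 51.50, "lon": -0.12})),
]
conn.executemany("INSERT INTO positions (user_id, payload) VALUES (?, ?)", rows)


def json_for_user(user_id):
    """Return every stored JSON document for one user, in insertion order."""
    cur = conn.execute(
        "SELECT payload FROM positions WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [json.loads(payload) for (payload,) in cur]


user_100 = json_for_user(100)
print(user_100)
```

The relational part (`user_id`) handles filtering, while the JSON part stays opaque to the database and is parsed only by the client.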
I am in the process of building my first live Node.js web app. It contains a form that accepts data regarding my client's current stock. When the form is submitted, an object is created and saved to an array of current stock. This stock is then permanently displayed on their website until the entry is modified or deleted.
It is unlikely that there will ever be more than 20 objects stored at any time and these will only be updated perhaps once a week. I am not sure if it is necessary to use MongoDB to store these, or whether there could be a simpler more appropriate alternative. Perhaps the objects could be stored to a JSON file instead? Or would this have too big an implication on page load times?
You could potentially store in a JSON file or even in a cache of sorts such as Redis but I still think MongoDB would be your best bet for a live site.
Storing something in a JSON file is not scalable, so if you end up storing a lot more data than originally planned (this often happens) you may find you run out of storage on your server's hard drive. Also, if you end up scaling and putting your app behind a load balancer, you will need to make sure there are matching copies of that JSON file on each server. Furthermore, it is easy to run into race conditions when updating a JSON file. If two processes try to update the file at the same time, you can lose data. Technically speaking, a JSON file would work, but it's not recommended.
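If you do go with a JSON file, one part of the update problem can be reduced by writing to a temporary file and renaming it over the original, so readers never see a half-written file. The sketch below is in Python, and the helper name `write_json_atomically` and the file name `stock.json` are hypothetical. It does not stop two writers from overwriting each other's changes; that still needs a lock or a database.

```python
import json
import os
import tempfile


def write_json_atomically(path, data):
    """Write data to path so that readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # swap the new file into place in one step
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file if anything failed
        raise


# Usage with a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    stock_path = os.path.join(tmp, "stock.json")
    write_json_atomically(stock_path, [{"sku": "A1", "qty": 3}])
    with open(stock_path, encoding="utf-8") as f:
        stock = json.load(f)
    leftovers = [name for name in os.listdir(tmp) if name.endswith(".tmp")]

print(stock)
```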
Storing in memory (e.g. in Redis) has similar implications, in that the data is only available on that one server. Also, data held only in memory is not persistent, so if your server restarted for whatever reason, you'd lose what was stored there.
For all intents and purposes, MongoDB is your best bet.
The only way to know for sure is to test it with a load test. But as you probably read HTML and JS files from the file system when serving web pages anyway, the extra load of reading a few JSON files shouldn't be a problem.
If you want to go the simpler way, i.e. a JSON file, use NeDB, which is plenty fast as well.
I am wondering how I should store my JSON data to get the best performance and scalability.
I have two options:
The first one would be to use JSONField, which will probably give me an advantage in simplicity, performance and data handling, since I don't have to read the data out of a file each time.
My second option would be to store my JSON data in FileFields as JSON files. This seems the best option, since the huge quantity of JSON wouldn't be stored in the database (only the location of the file). In my opinion it's the best option for scalability, but maybe not for user-facing performance, since the file has to be read each time before the data is displayed in the template.
I would like to know whether my reasoning is sound: what is the best way to store JSON data so that it can be reused as fast as possible, without complicating the database or hurting scalability?
JSONField will generally perform well because it can be indexed. A very good feature is native data access, which means that you don't have to load and parse the JSON before querying; you can query directly on the model field. Since you have a huge amount of JSON data, a file may look like a better option than a model field, but a file only has an advantage in storage.
Quoting from some random article from google search:
Postgres json field takes almost 11% extra data than the json file on your file system so test of 268mb file in json field is 233 mb (formatted json file)
Storing in a file has some cons: reading the file, parsing the JSON and then querying it are time-consuming because they are disk-based operations. Scalability will not be an issue with JSONField, although your database will grow large, so moving the data might become tough for you.
So unless you have a shortage of database space, you should choose JSONField.
I have a question relating to machine learning applications in the real world. It might sound stupid, lol.
I've been self-studying machine learning for a while, and most of the exercises used CSV files as the data source (both processed and raw). I would like to ask: are there any methods other than importing a CSV file to supply data for machine learning?
Example: streaming data from a live Facebook or Twitter feed for machine learning in real time, rather than collecting old data and storing it in a CSV file.
The data source can be anything. Usually, it's provided as a CSV or JSON file. But in the real world, say you have a website such as Twitter, as you mention, you'd be storing your data in a relational DB such as a SQL database, and for some data you'd be putting it in an in-memory cache.
You can basically use both of these to retrieve your data and process it. The catch is that when you have too much data to fit in memory, you can't just query everything and process it. In that case, you'll use algorithms that process the data in chunks.
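As a small illustration of chunked processing, the sketch below computes a mean over a CSV source without holding all rows in memory at once. The helper `iter_chunks`, the column name `value`, and the in-memory CSV text (a stand-in for a large file or a database cursor) are assumptions made for this example.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any iterable of rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Stand-in for a large file: one column holding the values 1..1000.
csv_text = "value\n" + "\n".join(str(i) for i in range(1, 1001)) + "\n"
reader = csv.DictReader(io.StringIO(csv_text))

total = 0.0
count = 0
for chunk in iter_chunks(reader, chunk_size=100):
    # Only one chunk of rows is held in memory at a time.
    total += sum(float(row["value"]) for row in chunk)
    count += len(chunk)

mean = total / count
print(count, mean)
```

The same pattern applies to a streaming feed: keep running totals (or update a model incrementally) per chunk instead of collecting everything first.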
A good thing about SQL databases is that they provide a set of functions that you can invoke right in your SQL query to calculate values efficiently. For example, you can get the sum of a column across the whole table using the SQL SUM() function, which allows for efficient and easy data manipulation.
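A minimal sketch of letting the database do the aggregation, using Python's built-in `sqlite3` as the SQL engine. The `orders` table and its rows are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.5), ("alice", 4.5)],
)

# SUM() over the whole table: the database does the arithmetic, not Python.
(grand_total,) = db.execute("SELECT SUM(amount) FROM orders").fetchone()

# SUM() per group: one row comes back per customer.
per_customer = dict(
    db.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)

print(grand_total, per_customer)
```

Only the aggregated result crosses into the application, which is the point when the underlying table is too large to load.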
I have ~500 JSON files on my disk that represent hotels all over the world, each around 30 MB, and all objects have the same structure.
At certain points in my Spring server I need to get the information for a single hotel, let's say via its code (which is inside the JSON object).
The data is read-only, but I might get updates from the hotel providers at certain times, such as extra JSON files or delta changes.
I definitely don't want to migrate my JSON files to a relational database, so I've been investigating the best solution to achieve what I want.
I tried Apache Drill because I thought querying straight from the JSON files would mean fewer headaches in dealing with the data. I ran a directory query using Drill, something like:
SELECT * FROM dfs.`C:\hotels\` WHERE code='1b3474';
but this is clearly not the most efficient way for me, as it takes around 10 seconds to fetch a single hotel.
At the moment I'm trying out CouchDB, but I'm still learning it. Should I migrate all the hotels to a single document (which makes a bit of sense to me)? Or should I treat each hotel as a document?
I'm just looking for pointers on a good solution to achieve what I want, so I'm here to get your opinion.
The main issue here is that JSON files do not have indexes associated with them, and Drill does not create indexes for them. So whenever you run a query like SELECT * FROM dfs.`C:\hotels\` WHERE code='1b3474'; Drill has no choice but to read each JSON file and parse and process all the data in each file. The more files and data you have, the longer this query will take. If you need to do lookups like this often, I would suggest not using Drill for this use case. Some alternatives are:
A relational database where you have an index built for the code column.
A key value store where code is the key.
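To make the key-value idea concrete, here is a minimal sketch in Python (not the Spring server from the question). It assumes, purely for illustration, that each file holds a single hotel object with a top-level `code` field; the helpers `build_code_index` and `get_hotel` and the sample hotels are hypothetical. The directory is scanned once to build an index from code to file path, so a later lookup reads one file instead of all of them.

```python
import json
import os
import tempfile


def build_code_index(directory):
    """Scan every .json file once and map each hotel code to its file path."""
    index = {}
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".json"):
            continue
        path = os.path.join(directory, name)
        with open(path, encoding="utf-8") as f:
            hotel = json.load(f)  # assumption: one hotel object per file
        index[hotel["code"]] = path
    return index


def get_hotel(index, code):
    """Load a single hotel by code, touching only the one file that holds it."""
    path = index.get(code)
    if path is None:
        return None
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Usage with three tiny made-up hotel files.
with tempfile.TemporaryDirectory() as tmp:
    for i, code in enumerate(["1b3474", "9f0021", "77ac10"]):
        with open(os.path.join(tmp, f"hotel_{i}.json"), "w", encoding="utf-8") as f:
            json.dump({"code": code, "name": f"Hotel {i}"}, f)

    index = build_code_index(tmp)
    found = get_hotel(index, "1b3474")
    missing = get_hotel(index, "does-not-exist")

print(found, missing)
```

A real key-value store or an indexed database column does the same job but keeps the index persistent and up to date as new files or delta changes arrive.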