Load huge JSON data in browser

From an API I load 15,000 JSON objects, and that's around 100 MB.
Please suggest the best way to handle that data. I have a very good machine (12 GB of RAM and an i7) and my Chrome never crashes with that data, but I suppose some of my clients use older machines with just 4 GB of RAM.
Is it a problem for older browsers to handle 100 MB of JSON?
It is a designer app and sometimes the properties of all 15,000 objects need to be updated in real time. Of course I do not show all 15,000 in the DOM - I split them into chunks of arrays.
Please suggest the best way to handle a large amount of JSON data. Should it live on the backend - MongoDB or some in-memory database?
Thanks in advance!
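
For reference, the chunking described above might look roughly like the following sketch; the chunk size of 500 and the renderChunk callback are arbitrary placeholders, not part of the original question:

    // Split a large array into fixed-size chunks so only one chunk is
    // rendered into the DOM at a time. The chunk size of 500 is arbitrary.
    function chunkArray(items, size) {
      const chunks = [];
      for (let i = 0; i < items.length; i += size) {
        chunks.push(items.slice(i, i + size));
      }
      return chunks;
    }

    // Hypothetical usage: render one chunk per animation frame so the UI
    // thread never has to lay out all 15,000 objects in a single pass.
    function renderInChunks(objects, renderChunk, size = 500) {
      const chunks = chunkArray(objects, size);
      let index = 0;
      function next() {
        if (index >= chunks.length) return;
        renderChunk(chunks[index]);
        index += 1;
        requestAnimationFrame(next);
      }
      next();
    }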

Related

Parse large JSON responses from a REST API

I am facing the problem of parsing large JSON results from a REST endpoint (Elasticsearch).
Aside from the fact that the design of the system has its flaws, I am wondering whether there is another way to do the parsing.
The REST response contains 10k objects in a JSON array. I am using the native JSON mapper of Elasticsearch and Jsoniter. Both lack performance and slow down the application. The request duration rises to 10-15 seconds.
I will push for a change to the interface, but the big result list will remain for the next 6 months.
Could anyone give me advice on how to speed up performance with Elasticsearch?
Profile everything.
Is Elasticsearch slow in generating the response?
If you perform the query with curl, redirect the output to a file, and time it, what fraction of your app's total time does that account for?
Are you running it locally? If not, you might be dropping packets or being throttled by low bandwidth over the network.
Is the performance hit purely in decoding the response?
How long does it take to decode the same blob of JSON using Jsoniter once loaded into memory from a static file?
Have you considered chunking your query?
What about spinning it off as a separate process and immediately returning to the event loop?
There are lots of options and not enough detail in your question to be able to give solid advice.
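
As one way to answer the "where does the time go" questions above, here is a rough Node.js sketch that times the HTTP round trip separately from JSON decoding. The URL and the hits.hits response shape are assumptions, and the asker's stack is Java, so this only illustrates the measuring approach:

    // Rough profiling sketch: measure the HTTP round trip and the JSON
    // decode separately. The URL and the hits.hits response shape are
    // placeholders for whatever the real endpoint returns.
    const https = require('https');

    function fetchRaw(url) {
      return new Promise((resolve, reject) => {
        https.get(url, (res) => {
          res.setEncoding('utf8');
          let body = '';
          res.on('data', (chunk) => { body += chunk; });
          res.on('end', () => resolve(body));
        }).on('error', reject);
      });
    }

    async function profile(url) {
      const t0 = Date.now();
      const body = await fetchRaw(url);       // network + server time
      const t1 = Date.now();
      const data = JSON.parse(body);          // pure decode time
      const t2 = Date.now();
      const count = data.hits && data.hits.hits ? data.hits.hits.length : 'n/a';
      console.log('network:', t1 - t0, 'ms, parse:', t2 - t1, 'ms, items:', count);
    }

    profile('https://example.com:9200/my-index/_search?size=10000');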

How can I store a 2-3 GB tree in memory and have it accessible to Node.js?

I have a large tree of data, and I want to be able to efficiently access leaves and efficiently serialize large chunks of it (10-20 MB at a time) into JSON.
Right now I'm storing it as JavaScript objects, but I'm seeing garbage collection times of 4-5 seconds, which is not okay.
I tried using an embedded database (both SQLite and LMDB), but the performance overhead of going from rows to trees when I access data is pretty high -- it takes me 6 seconds to serialize 5 MB into JSON.
Ideally I'd want to be able to tell v8 "please don't try to garbage collect that tree!" (I tried turning GC off on the whole process, but I'm running a lightweight tcp server in front of it and that quickly started to run out of memory).
Or, maybe there's an embedded (or not embedded?) database that handles this natively that I don't know about. (I do know about MongoDB -- it has a 16 MB limit on max object size though).
I'm thinking of maybe trying to pack the tree into a Node Buffer object (i.e., basically simulate the V8 heap myself), but before I get that desperate I thought I'd ask Stack Overflow :-)
Storing large objects in a garbage-collected language is bad practice. It is a problem in the Java world as well.
There are 2 solutions to this:
Use an in-memory DB like Redis. See if you can leverage the data structure primitives Redis provides to your advantage (a sketch of this idea follows the answer).
Go native - Node.js provides a (comparatively) simple FFI, as half of its standard library is written in native code. See the addons documentation on how to proceed.
If you are deploying on a server, then you have a third option as well. Instead of linking native code directly with Node, you can write it as a service and tie it together using a message broker like Beanstalk / ZeroMQ / RabbitMQ.
This allows for ease of deployment, as suitable server resources can be provisioned for the app. In your case, the frontend TCP server can sit on its own cheap instance, while the Tree wrangling program can have a large memory instance to work with.
Also, MongoDB is horrible for relational data, which makes it a bad choice for storing trees. Graph databases might work for you depending on your use case.
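
As a sketch of the Redis suggestion (not a drop-in design): each tree node can be stored as a Redis hash keyed by id, so the tree lives outside the V8 heap and only the subtree being serialized is materialized. The key scheme and the node-redis v4 client calls here are assumptions:

    // Sketch: each tree node becomes a Redis hash ("node:<id>") holding its
    // JSON-encoded payload and the list of child ids, so the whole tree never
    // sits on the V8 heap. Uses the node-redis v4 client API.
    const { createClient } = require('redis');

    async function saveNode(client, node) {
      await client.hSet('node:' + node.id, {
        data: JSON.stringify(node.data),
        children: JSON.stringify(node.children.map((child) => child.id)),
      });
      for (const child of node.children) {
        await saveNode(client, child);
      }
    }

    // Materialize only the subtree that actually needs to be serialized
    // (e.g. a 10-20 MB chunk), instead of holding 2-3 GB in process memory.
    async function loadSubtree(client, id) {
      const row = await client.hGetAll('node:' + id);
      const childIds = JSON.parse(row.children || '[]');
      const children = [];
      for (const childId of childIds) {
        children.push(await loadSubtree(client, childId));
      }
      return { id, data: JSON.parse(row.data), children };
    }

    // Usage: const client = createClient(); await client.connect();
    // const chunk = JSON.stringify(await loadSubtree(client, rootId));
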
Perhaps you can look into graph databases? Neo4j seems to be a popular one these days and they have node.js client libraries.

Caching the last entries of stream data

I am going to work on a distributed application. The data is going to be streamed and analyzed, and the end users need access to the most recently streamed data as quickly as possible. I also need to keep a backup of the data, both the raw data and the processed results.
My initial idea is as follows:
1) Keep redis as a cache to hold the last entries.
2) MySQL - storing data
3) Hadoop/Hbase - convenient way of storing data to analyze it.
What do you think of such a setup? Would you recommend anything else?
Thanks!
I think a combination of Spark and Cassandra would be an excellent way to go. Cassandra can easily handle the data throughput and storage. Spark provides lightning quick analytics.
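
To illustrate point 1 of the setup proposed in the question (Redis holding only the latest entries), a capped list is one common pattern. This sketch assumes the node-redis v4 client and an arbitrary cap of 1,000 entries:

    // Sketch: keep only the newest entries in Redis with LPUSH + LTRIM so
    // end users can read the latest data quickly. The key name and the cap
    // of 1,000 entries are arbitrary assumptions.
    const { createClient } = require('redis');

    const LAST_ENTRIES_KEY = 'stream:last-entries';
    const MAX_ENTRIES = 1000;

    async function cacheEntry(client, entry) {
      await client.lPush(LAST_ENTRIES_KEY, JSON.stringify(entry));
      await client.lTrim(LAST_ENTRIES_KEY, 0, MAX_ENTRIES - 1);
    }

    async function latestEntries(client, count = 100) {
      const raw = await client.lRange(LAST_ENTRIES_KEY, 0, count - 1);
      return raw.map((item) => JSON.parse(item));
    }

    // Usage: const client = createClient(); await client.connect();
    // await cacheEntry(client, { sensor: 42, value: 3.14, ts: Date.now() });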

How to handle storage limitations when fetching data from a large data store?

I am going to be fetching a large amount of data from a data store over a slow internet connection. After fetching the data I have to parse the CSV file, fix the errors, and store the results in a DB. I don't need to keep this data forever; it is only needed when we create reports based on it. But keeping it in the system means a faster response, and we don't have to parse/clean/fix errors in the CSV files each time.
The problem is that our system has much smaller storage space, so I can't keep all the parsed/cleaned data on our system. At some point I have to delete this data, and when a request for it comes again we have to fetch, parse and clean it all over again. I want a policy for deleting old data. When should I delete the cleaned data? Can somebody give a suggestion for this problem?
You are describing a classic caching problem where you have a large but slow storage medium and a small but fast storage medium that can't hold all the data.
Ideally, you throw out the data that won't be used much in the future. However, it is often hard to predict future access patterns, so people use heuristics to make informed guesses.
One heuristic is least recently used. This assumes that if I haven't used a data item recently, I won't use it much in the future. To do this you throw out the data which has the oldest access time.
Another method is to throw out the data that has been least frequently used.
For further information you can look at articles on browser caching, and OS disk caching.
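
For example, a minimal least-recently-used cache can be built on a JavaScript Map, whose iteration order follows insertion order; the capacity value is a stand-in for whatever your storage limit allows:

    // Minimal LRU sketch: a Map keeps insertion order, so re-inserting a key
    // on every access moves it to the "most recently used" end, and the first
    // key in iteration order is the least recently used one.
    class LruCache {
      constructor(capacity) {
        this.capacity = capacity; // stand-in for your storage limit
        this.map = new Map();
      }

      get(key) {
        if (!this.map.has(key)) return undefined;
        const value = this.map.get(key);
        this.map.delete(key);       // refresh recency
        this.map.set(key, value);
        return value;
      }

      set(key, value) {
        if (this.map.has(key)) this.map.delete(key);
        this.map.set(key, value);
        if (this.map.size > this.capacity) {
          const oldest = this.map.keys().next().value;
          this.map.delete(oldest);  // evict; re-fetch and re-clean on next request
        }
      }
    }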

Live chat application using Node.js, Socket.IO and a JSON file

I am developing a live chat application using Node.js, Socket.IO and a JSON file. I am using the JSON file to read and write the chat data. Now I am stuck on one issue: when I do stress testing, i.e. push continuous messages into the JSON file, the JSON format becomes invalid and my application crashes. Although I am using forever.js, which should keep the application up, it still crashes.
Does anybody have idea on this?
Thanks in advance for any help.
It is highly recommended that you reconsider your approach to persisting data to disk.
Among other things, one really big issue is that you will likely experience data loss. If we both read the file at the same time - {"foo":"bar"} - and we both make a change, whoever saves last overwrites the other's change, because each of us started from the same original content and never re-read the file after the other saved.
What you are possibly seeing now in an append-only approach is that we're both adding bits and pieces without regard to valid JSON structure (IE: {"fo"bao":r":"ba"for"o"} from {"foo":"bar"} x 2).
Disk I/O is actually pretty slow. Even with an SSD hard drive. Memory is where it's at.
As recommended, you may want to consider MongoDB, MySQL, or otherwise. This may be a decent use case for Couchbase which is an in-memory key/value store based on memcache that persists things to disk ASAP. It is extremely JSON friendly (it is actually mostly based on JSON), offers great map/reduce support to query data, is super easy to scale to multiple servers, and has a node.js module.
This would allow you to very easily migrate your existing data storage routine into a database. It also provides CAS support, which will protect you from data loss in the scenarios outlined earlier.
At minimum, though, you could simply modify an in-memory object that you save to disk every so often to prevent permanent data loss. However, this only works well with one server, and then you're likely back to needing a database.
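
A rough sketch of that last suggestion, keeping messages in memory and flushing the whole array to disk every so often: writing to a temporary file and then renaming it keeps the file on disk valid JSON even if a write is interrupted. The interval and file names are arbitrary, and this still does not remove the case for a real database:

    // Sketch: mutate an in-memory array on every message and periodically
    // write the whole thing to disk atomically (temp file + rename), so the
    // file on disk is always valid JSON. Interval and paths are arbitrary.
    const fs = require('fs');
    const path = require('path');

    const DATA_FILE = path.join(__dirname, 'chat.json');
    const TMP_FILE = DATA_FILE + '.tmp';
    const messages = [];

    function addMessage(message) {
      messages.push(message); // memory only; disk catches up on the next flush
    }

    function flushToDisk() {
      fs.writeFile(TMP_FILE, JSON.stringify(messages), (writeErr) => {
        if (writeErr) return console.error('flush failed:', writeErr);
        fs.rename(TMP_FILE, DATA_FILE, (renameErr) => {
          if (renameErr) console.error('rename failed:', renameErr);
        });
      });
    }

    setInterval(flushToDisk, 5000); // every 5 seconds (arbitrary)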