Handling Very Large JSON Dataset? - json

I have a scenario wherein frontend app makes a call to backend DB (APP -> API Gateway -> SpringBoot -> DB) with a JSON request. Backend returns a very large dataset (>50000 rows) in response sizing ~10 MB.
My frontend app is highly responsive and mission critical, we are seeing performance issues; frontend where app is not responding or timing-out. What can be best design to resolve this issue condering
DB query cant be normalized any further.
SpringBoot code has has cache builtin.
No data can be left behind due to intrinsic nature
No multiple calls can be made as data is needed is first call itself
Can any cache be built in-between frontend and backend?
Thanks.

Sounds like this is a generated report from a search. If this data needs to be associated with each other, I'd assign the search an id and restore the results on the server. Then pull the data for this id as needed on the frontend. You should never have to send 50,000 rows to the client in one go... Paginate the data and pull as needed if you have to. If you don't want to paginate, how much data can they display on a single screen? You can pull more data from the server based on where they scroll on the page. You should only need to return the count of the rows to the frontend, and maybe 100 rows of data. This would allow you to show a scrollbar with the right height. When they scroll to a certain position within the data, you can pull the corresponding offset from the server for that particular search id. Even if you could return all 50,000+ rows in one go, it doesn't sound very friendly to the end user's device to have to load that kind of memory for a functional page.
This is a sign of a flawed frontend that should be redone.

10mb is huge and can be inconsiderate to your users especially if there's a high probability of mobile use.
If possible, it would be best to collect this data on the backend, probably put it onto disk, and then provide only the necessary data to the frontend as it's needed. As the map needs more data, you would make further calls to the backend.
If this isn't possible, you could load this data with the client-side bundle. If the data doesn't update too frequently, you can even cache it on the frontend. This would at least prevent the user from needing to fetch it repeatedly.

Related

Sending back hundreds of rows from a node and express REST API

I'm working on a RESTful API using node and express (with a MySQL database).
I have a table of products, and a page on my front end which is supposed to display those products. The problem is, there's hundreds (sometimes thousands) of them, I can't just send all of them back in a response.
What is the best way (in terms of performance and speed) for transferring all those rows back to the client?
I thought of streaming the data so that the client starts displaying them while the streaming is still happening. But I have no idea on how to do that, or whether that's the best way.

Implementation of pagination over Elastic Search data (more than 10000 records) created from denormalized data of sql?

I am currently working on a project which has large amount of data (around 10K-20K entries per day). We have stored data into our primary database MySQL and entire de-normalized data into Elastic Search. We are using Elastic Search database for representing dashboards & download reports.
Now problem arises when we have to paginate data in dashboard.
We can use size and from parameters to display by default up to 10000 records to your users. If we want to change this limit, we change index.max_result_window but that may lead to internal memory issues. Other possible solution might be Scroll API but that has some constraints over time frame. What is best possible way to perform pagination in large data set. Also the functionality must include pagination where user can move to all possible page number displayed onto client side dashboard.

Meteor performance comparison of publishing static data vs. getting data via HTTP Get request

I am building an app that receives a bunch of static data that is read only. The user does not change the data, or send any data to the server. The app just gets the data and presents it to the user in various views.
Like for example a parts list, with part numbers and prices. This data is currently stored in mongoDB.
I have few options for getting the data to the client. I could just use meteor's publication system, and have the client subscribe to the data it needs.
Or I could map all the data the client needs into one JSON file, save the JSON file to Amazon S3, and have the client make simple GET request to grab the data.
If we wanted this app to scale to many, many users, would not using meteor publication be the best? Or would either method be similar in terms of performance? Using meteor publication system would be the easiest, but I am worried that going down this route would lead to performance issues if a lot of clients request the data. If the performance between publishing and get request is about the same, I would just stick with the publication as its the easiest.
In this case Meteor will provide better performance. If your data is mostly server to client driven then clients do not have to worry about polling the server and the server will not have to worry about handling the request.
Also Meteor requires very little resources to send data to the client because the connection is persistent. Take an app like code fights which is built on Meteor constantly has thousands of connections to and from it, its performance runs great.
As a side note, if you are ready to serve your static data as a JSON file in a separate server (AWS S3), then it means you do not expect that data to be that big, so that it can be handled in a single file and entirely loaded in client's memory.
In that case, you might even want to reconsider the need to perform any separate request (whether HTTP or Meteor Pub/Sub).
For instance, simply embedding the data in your app, or served through SSR / Fast Render package.
Then if you are really concerned about your scalability, you might even reconsider the need to use Meteor, since you do not seem to need any client-server interactivity (no real need for Pub/Sub, no reactivity…). After your prototype is ready, you could rework it as a separate and static SPA, so that you do not even need to serve it through Node / Meteor.

Technology stack for a multiple queue system

I'll describe the application I'm trying to build and the technology stack I'm thinking at the moment to know your opinion.
Users should be able to work in a list of task. These tasks are coming from an API with all the information about it: id, image urls, description, etc. The API is only available in one datacenter and in order to avoid the delay, for example in China, the tasks are stored in a queue.
So you'll have different queues depending of your country and once that you finish with your task it will be send to another queue which will write this information later on in the original datacenter
The list of task is quite huge that's why there is an API call to get the tasks(~10k rows), store it in a queue and users can work on them depending on the queue the country they are.
For this system, where you can have around 100 queues, I was thinking on redis to manage the list of tasks request(ex: get me 5k rows for China queue, write 500 rows in the write queue, etc).
The API response are coming as a list of json objects. These 10k rows for example need to be stored somewhere. Due to you need to be able to filter in this queue, MySQL isn't an option at least that I store every field of the json object as a new row. First think is a NoSQL DB but I wasn't too happy with MongoDB in the past and an API response doesn't change too much. Like I need relation tables too for other thing, I was thinking on PostgreSQL. It's a relation database and you have the ability to store json and filter by them.
What do you think? Ask me is something isn't clear
You can use HStore extension from PostgreSQL to store JSON, or dynamic columns from MariaDB (MySQL clone).
If you can move your persistence stack to java, then many interesting options are available: mapdb (but it requires memory and its api is changing rapidly), persistit, or mvstore (the engine behind H2).
All these would allow to store json with decent performances. I suggest you use a full text search engine like lucene to avoid searching json content in a slow way.

Find function in j2ee web application

i'm developing a little market in a web application and i have to implement the search function. Now, i know i can use MATCH function in mysql or i can add some libraries (like apache lucene) but that's not the point of my doubt. I'm thinking about managing the set of results i get from the search function (a servlet will do this), cause not all the results should be send to client at one time, so i would like to separate them in some pages. I want to know what is more efficient to do, if i should prefer to do the search in db for every page the client calls or if i should save the result set in a managed bean and access them while the client request a new page of results. Thx (i hope my english is enough understandable)
The question you should be asking is "how many results can you store in memory"? If you have a small dataset, by all means, sure but you will have to define what "small dataset means". This will help as you call the database once and filter on your result in memory (which is faster).
Alternative approach, for larger/huge dataset, you will want to request to the database on every user page request. The problem here is that you call the database on each call, so you will have to have an optimised search query that will bring results in small chunks (SQL LIMIT clause). If you only want to hit the database once and filter the result in "memory", you will have to slot in a caching layer in between your application and your database. That way, the results are cached and you filter on the cached result. The cache will sit on a different JVM as not to share your memory heap space.
There is no silver bullet here. You can only answer this based on your non-functional requirements.
I hope this helps.