Node.js: Rendering a large amount of JSON data from the server

I have a view with a for loop that inserts rows into a table. The table is very big, already consisting of a couple of thousand rows.
When I run it, the server throws an out-of-memory exception.
I would like to add an infinite scrolling feature so I won't have to load all the data at once.
Right now the data is sent with a regular res.render('index.ejs', data) call (data is JSON).
I can figure out the infinite scrolling part, but how do I get the JSON data in chunks from the server?
I am using Node.js with Express and EJS as the template engine.
I am open to using any framework that will aid me through the process (I was particularly looking at Angular.js).
Thanks

Firstly, there is an Angular component for infinite scroll: http://ngmodules.org/modules/ngInfiniteScroll
Then, you have to change your backend query to look something like:
http://my.domain.com/items?before=1392382383&count=50
This essentially tells your server to fetch items created/published/changed/whatever before the given timestamp and return only 50 of them. That means your back-end entities (be they blog entries, products, etc.) need to have some sort of natural ordering in a continuous space (a publication date timestamp is almost continuous). This is very important, because even if you use a timestamp, you may end up with extreme heisenbugs where items are rendered twice (if you use <= that's definite), entries are lost (if you use < and 2 entries on the edges of your result sets share the same timestamp), or the same items are loaded again and again (more than 50 items on the same timestamp). You have to take care of such cases by filtering duplicates.
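One hedge against the duplicate case (<= on a shared timestamp) is for the client to filter each new chunk against the ids it has already rendered. A minimal sketch - the `id` field and the `seen` set are assumptions, not from the question:

```javascript
// Track ids we have already rendered; drop duplicates from each new chunk.
const seen = new Set();

function dedupe(chunk) {
  return chunk.filter(item => {
    if (seen.has(item.id)) return false; // already rendered in a previous page
    seen.add(item.id);
    return true;
  });
}

const page1 = dedupe([{ id: 1 }, { id: 2 }]);
// id 2 straddled the timestamp edge and came back again in the next chunk:
const page2 = dedupe([{ id: 2 }, { id: 3 }]); // only id 3 survives
```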
Your server-side code translates this into a query like (DB2 SQL of course):
SELECT * FROM ITEMS
WHERE PUBLICATION_DATE <= 1392382383
ORDER BY PUBLICATION_DATE DESC
FETCH FIRST 50 ROWS ONLY
When infinite scroll reaches the end of the page and calls your registered callback, you create this $http.get request by taking into account the last item of your already loaded items. For the first query, you can use the current timestamp.
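A sketch of what the handler behind that URL might do. The store here is an in-memory array standing in for the SQL above, and the item fields are illustrative:

```javascript
// In-memory stand-in for the ITEMS table; in production the filter/sort/slice
// below is the WHERE / ORDER BY / FETCH FIRST of the SQL shown above.
const items = [
  { id: 1, publicationDate: 100 },
  { id: 2, publicationDate: 200 },
  { id: 3, publicationDate: 300 },
];

// GET /items?before=<timestamp>&count=<n>
function getItemsBefore(before, count) {
  return items
    .filter(item => item.publicationDate <= before) // keyset condition
    .sort((a, b) => b.publicationDate - a.publicationDate)
    .slice(0, count);
}

// First query: use the current timestamp.
const firstPage = getItemsBefore(Date.now() / 1000, 2); // ids 3, 2
// Next query: anchor on the last already-loaded item.
const lastLoaded = firstPage[firstPage.length - 1];
const nextPage = getItemsBefore(lastLoaded.publicationDate, 2); // ids 2, 1
// Note how id 2 appears twice: exactly the <= duplicate case warned about above.
```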
Another approach is to simply send the id of the last item, like:
http://my.domain.com/items?after_item=1232&count=50
and let the server decide what to do. I suppose you can use NoSQL storage like Redis to answer this kind of query very fast and without side-effects.
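The id-anchored variant, again over an illustrative in-memory array; since ids are unique, it sidesteps the shared-timestamp edge cases of the `before` approach:

```javascript
// Illustrative store; ids are unique, so the anchor is unambiguous.
const catalog = [
  { id: 10, createdAt: 1 },
  { id: 11, createdAt: 2 },
  { id: 12, createdAt: 3 },
  { id: 13, createdAt: 4 },
];

// GET /items?after_item=<id>&count=<n>
function getItemsAfter(afterId, count) {
  const sorted = [...catalog].sort((a, b) => b.createdAt - a.createdAt);
  const start = sorted.findIndex(item => item.id === afterId);
  return sorted.slice(start + 1, start + 1 + count);
}

const page = getItemsAfter(13, 2); // the two items published right before id 13
```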
That's the general idea. I hope it helps.

Related

What is the Optimized way to Paginate Active Record Objects with Filter?

I want to display the users list with pagination in my Rails API. However, I have a few constraints: before displaying the users, I want to check which users have access to the view files. Here is the code:
def verified_client
  conditions = {}
  conditions[:user_name] = fetch_verified_users_with_api_call # returns [user_1, user_2, ...]
  @users = User.where(conditions).where('access NOT LIKE ?', 'admin_%').ordered
  will_paginate(@users, params[:page])
end
Q1) Is there a way where I don't have to make a SQL call when users try to fetch subsequent pages (page 2, page 3, ... page n)?
Q2) What would happen when the verified_users list returns millions of items? I suspect the SQL will fail.
I could have used LIMIT and OFFSET with the query, but then I would not know the total result count and page count; to get those I would have to fire one more SQL call for the count and write my own logic to compute the number of pages.
Generated SQL:
SELECT *
FROM users
WHERE user_name IN ('user_1', 'user_2', ..., 'user_10000')
  AND access NOT LIKE 'admin_%'
That query is hard to optimize. It probably does essentially all the work for each page, and there is no good way to prevent this scan. Adding these may help:
INDEX(access)
INDEX(user_name, access)
I have seen 70K items in an IN list, but I have not heard of 1M. What is going on? Would it be shorter to say which users are not included? Could there be another table with the user list? (Sometimes a JOIN works better than IN, especially if you have already run a SELECT to get the list.)
Could the admins be filtered out of the IN list before building this query? Then
INDEX(user_name)
is likely to be quite beneficial.
Is there at most one row per user? If so, then pagination can be revised to be very efficient. This is done by "remembering where you left off" instead of using OFFSET. More: http://mysql.rjweb.org/doc.php/pagination
Q1) Is there a way where I don't have to make a SQL call when users try to fetch subsequent pages (page 2, page 3, ... page n)?
The whole idea of pagination is that you make the query faster by returning a small subset of the total number of records. In most cases the number of requests for the first page will vastly outnumber those for the other pages, so this could very well be a case of premature optimization that does more harm than good.
If it actually is a problem, it's better addressed with SQL caching, ETags or other caching mechanisms - not by loading a bunch of pages at once.
Q2) What would happen when the verified_users list returns millions of items? I suspect the SQL will fail.
Your database or application will very likely slow to a halt and then crash when it runs out of memory. Exactly what happens depends on your architecture and how grumpy your boss is on that given day.
Q1) Is there a way where I don't have to make a SQL call when users try to fetch subsequent pages (page 2, page 3, ... page n)?
You can get the whole result set and store it in your app. As far as the database is concerned this is not slow or non-optimal. Then performance including memory is your app's problem.
Q2) What would happen when the verified_users list returns millions of items? I suspect the SQL will fail.
What will happen is all those entries will be concatenated in the SQL string. There is likely a maximum SQL string size and a million entries would be too much.
A possible solution is if you have a way to identify the verified users in the database and do a join with that table.
What is the Optimized way to Paginate Active Record Objects with Filter?
The three things which are not premature optimization with databases are: (1) use indexed queries, not table scans; (2) avoid correlated sub-queries; and (3) reduce network turns.
Make sure you have an index the query can use, in particular for the ORDER BY. So make sure you know what order you are asking for.
If, instead of the access field starting with a prefix, you had a field to indicate an admin user, you could make an index with that admin field first and the field you are ordering by second. This lets the database sort the records efficiently, which is especially important when paging with OFFSET and LIMIT.
As for network turns, you might want to use paging and not worry about them. One idea is to prefetch the next page if possible: after getting the results of page 1, query for page 2. Hold the page 2 results until viewed, and when they are viewed, fetch the results for page 3.

Load balancing KEYs using GET via Redis

My application, currently using MySQL, makes phone calls, fetching information about the dialed numbers and the caller ID from the DB. I want to have a group with a list of caller IDs defined in Redis - say 10 caller IDs. For each dialing, I want to SELECT/GET a caller ID from the Redis server, not just a random number. Is that possible with Redis? It's like load balancing across the list of keys in Redis, to make sure all keys are given a fair chance to be used.
An example of the data set would be a phonebook as the key, with say 10 phone numbers in that phonebook. I want to use those numbers for every unique dialing so all numbers in the phonebook are used evenly.
I can do that in MySQL by setting up an update field in the table, but that's going to increase the UPDATEs on MySQL. Is this something that can easily be done with Redis? I can't seem to come up with the logic for it.
Thanks.
There are two ways to do it in Redis:
ZSET
You can track the usage frequency with the score of a zset entry. Whenever you fetch the entry with the lowest score out of Redis, you increase its score by one.
The side benefit is you can easily see exactly how many times each element has been used.
LIST
If you're not bothered about tracking the usage counts, you can also do it with a Redis list. Just use RPOPLPUSH source destination with the same key as source and destination to achieve a round-robin load balancing effect. Basically it takes an element from the bottom of the list, puts it back on top, and returns you the value of the rotated element.
The benefit is there is only one command to run and the operation is atomic.
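The rotation can be sketched like this - an in-memory imitation of RPOPLPUSH key key, since with a real Redis client the whole rotation is that single atomic command (the phone numbers are illustrative):

```javascript
// In-memory sketch of RPOPLPUSH key key: pop from the tail, push onto the
// head, return the rotated element. In real Redis this is one atomic command;
// this sketch only shows the round-robin behaviour.
const phonebook = ['111-0001', '111-0002', '111-0003']; // head ... tail

function nextCallerId(list) {
  const value = list.pop(); // RPOP: take the tail element
  list.unshift(value);      // LPUSH: put it back at the head
  return value;
}

const first = nextCallerId(phonebook);  // '111-0003'
const second = nextCallerId(phonebook); // '111-0002'
const third = nextCallerId(phonebook);  // '111-0001'
const fourth = nextCallerId(phonebook); // cycles back to '111-0003'
```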

Using a Mysql result from NodeJS rather than slurping rows

TL;DR: How do I SELECT from a MySQL table with Node.js without slurping the results? (If you say SELECT COUNT(id), then use OFFSET and LIMIT - no, that's silly. I want to HOLD the query result object and poke at it afterwards, not hammer the database to death re-SELECTing every few seconds. Some of these queries will take several seconds to run!)
What: I want to run a MySQL query from a Node.js service and basically attach the MYSQL_RES to the client's session context.
Why: The queries in question may yield tens of thousands of results. I'll only have 1 demanding client at a time, and the UI will only display ~30 results at a time in a scrollable/flickable list view.
How: In Qt it's standard practice to use a QSqlModel in circumstances like this. If a list view has this type of model attached, it only "reads" the results pertinent to the visible area; as the view is scrolled down, it populates with more results.
But: The Node.js asynchronous method is lovely, but I have yet to find a way to get ONLY the result object (sans rows), together with a method for picking out result rows, or a span thereof, arbitrarily.
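The QSqlModel-style interaction described above can be sketched as a cursor object the session holds. `fetchSpan` below is a stand-in for whatever actually reads rows (a streaming query, a server-side cursor); only the session-side shape is shown, and all names are illustrative:

```javascript
// All the UI session keeps is this cursor; rows are pulled in spans on demand.
// fetchSpan stands in for the real row source; here it reads from an
// in-memory result set so the shape of the interaction is visible.
const resultSet = Array.from({ length: 100 }, (_, i) => ({ id: i }));

function fetchSpan(offset, limit) {
  return resultSet.slice(offset, offset + limit);
}

function makeCursor(pageSize) {
  let offset = 0;
  return {
    next() {
      const span = fetchSpan(offset, pageSize);
      offset += span.length;
      return span; // only the rows the visible area needs
    },
  };
}

const cursor = makeCursor(30);
const visible = cursor.next(); // rows 0..29 for the list view
```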

Would using Redis with Rails provide any performance benefit for this specific kind of queries

I don't know if this is the right place to ask question like this, but here it goes:
I have an intranet-like Rails 3 application managing about 20k users, which are arranged in a nested set (preordered tree - http://en.wikipedia.org/wiki/Nested_set_model).
Those users enter stats (data - just plain numeric values). Entered stats are assigned to a category (we call it a Pointer) and a week number.
That data is further processed and computed into Results.
Some are computed from user activity plus results from some other category... etc.
What a user enters isn't always the same as what he sees in reports.
Those computations can be very tricky; some categories have very specific formulae.
But the rest is just "give me the sum of all entered values for this category, for this user, for this week/month/year".
The problem is that those stats also need to be summed for the subset of users under a selected user (so it basically returns the sum of all values for all users under that user, including himself).
This app has been in production for 2 years and is doing its job pretty well... but with more and more users it's also getting pretty slow when it comes to server-expensive reports, like "give me the list of all users under myself and their statistics - one line summed by their sub-group and one line for their personal stats". Of course, users want (and need) their reports to be as current as possible; 5 minutes to reflect newly entered data is too much for them. And this specific report is their favorite :/
To stay realtime, we cannot run the high-intensity SQL directly... that would kill the server. So I compute the results only once via a background process, and the frontend just reads them.
Those SQLs were hard to optimize, and I'm glad I've moved away from that approach... (caching is not an option; see below.)
The current app goes like this:
frontend: when a user enters new data, it is saved to a simple MySQL table, like [user_id, pointer_id, date, value], and there is also an insert into a queue.
backend: a calc_daemon process checks the queue every 5 seconds for new "recompute requests". We pop the requests and determine what else needs to be recomputed along with them (pointers have dependencies... the simplest case is: when you change week stats, we must recompute the month and year stats too). It does this recomputation the easy way: we select the data via customized, per-pointer SQL generated by their classes.
the computed results are then written back to MySQL, but into partitioned tables (one table per year). One line in these tables looks like [user_id, pointer_id, month_value, w1_value, w2_value, w3_value, w4_value]. This way the tables have ~500k records (I've basically reduced the number of records 5x).
when the frontend needs those results, it does simple sums over the partitioned data, with 2 joins (because of the nested set conditions).
The problem is that those simple SQLs with sums, GROUP BY and join-on-the-subtree can take ~200ms each... just for a few records... and we need to run a lot of them. I think they are optimized as well as they can be, according to EXPLAIN, but they are just too hard for MySQL.
So... The QUESTION:
Can I rewrite this to use Redis (or another fast key-value store), and would I see any benefit from it when using Ruby and Rails? As I see it, if I rewrite it to use Redis, I'll have to run many more queries against it than I do against MySQL, and then perform the sums manually in Ruby... so performance could suffer considerably. I'm not really sure I could even express all the queries I have now in Redis. Loading the users in Rails and then asking "redis, give me the sum for users 1,2,3,4,5..." doesn't seem like the right idea... but maybe there is some feature in Redis that could make this simpler?
Also, the tree structure needs to stay a nested set; I can't have one entry in Redis with a list of all child ids for some user (something like children_for_user_10: [1,2,3]) because the tree structure changes frequently. That's also the reason I can't keep those sums in the partitioned tables: when the tree changes, I would have to recompute everything. That's why I perform those sums in realtime.
Or would you suggest rewriting this app in a different language (Java?) and computing the results in memory instead? :) (I've tried to do it SOA-style, but it failed because I end up, one way or another, with XXX megabytes of data in Ruby, especially when generating the reports, and the GC just kills it. A side effect is that one report generation blocks the whole Rails app :/)
Suggestions are welcome.
Redis would be faster - it is an in-memory database - but can you fit all of that data in memory? Iterating over Redis keys is not recommended, as noted in the comments, so I wouldn't use it to store the raw data. However, Redis is often used for storing the results of sums (e.g. logging counts of events); for example, it has a fast INCR command.
I'm guessing that you would get a sufficient speed improvement by using a stored procedure, or a faster language than Ruby (e.g. inline C or Go), to do the recalculation. Are you doing GROUP BY in the recalculation? If so, is it possible to replace the GROUP BY with code that orders the result set and then manually checks when the 'group' changes? For example, if you are looping by user and grouping by week inside the loop, change that to ordering by user and week, and keep variables for the current and previous values of user and week, as well as variables for the sums.
This assumes the bottleneck is the recalculation - you don't really say which part is too slow.
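The order-then-detect-group-change idea can be sketched like this (field names are assumptions):

```javascript
// Replace GROUP BY user, week with an ordered scan that emits a sum each
// time the (user, week) pair changes. Rows must arrive sorted by user, week.
function sumByUserWeek(rows) {
  const sums = [];
  let prevUser = null, prevWeek = null, total = 0;

  for (const row of rows) {
    if (row.user !== prevUser || row.week !== prevWeek) {
      if (prevUser !== null) sums.push({ user: prevUser, week: prevWeek, total });
      prevUser = row.user;
      prevWeek = row.week;
      total = 0; // new group starts
    }
    total += row.value;
  }
  if (prevUser !== null) sums.push({ user: prevUser, week: prevWeek, total });
  return sums;
}

const result = sumByUserWeek([
  { user: 1, week: 1, value: 2 },
  { user: 1, week: 1, value: 3 },
  { user: 1, week: 2, value: 4 },
  { user: 2, week: 1, value: 5 },
]);
// result: one summed line per (user, week) group: totals 5, 4, 5
```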

Sorting/Displaying a large data set within a browser - how much JSON is too much?

This is a technical question regarding browser limitations for parsing and sorting JSON.
We are looking at running a clustering algorithm on large data sets (potentially 50k rows, potentially 10 fields per row) that are returned from a query and displayed to users in a table, 25 rows per page, sortable on all fields. The clustering takes place server side, which then sends the clustered results back to the client as JSON.
Currently, the clustered result data does not exist in any database table. This creates some issues for sorting and paging, and for back-button support too.
Instead of rerunning the query for "next page" and "resort", I'm wondering if I could send all the data back at once as one potentially very large JSON payload, and then display only 25 records at a time to implement paging. But what about when a user wants to resort? Could a browser handle resorting 50k+ rows and still maintain the paging feature?
Would it be better to just create a temp table for the user's query results?
You might get a faster result with JSONP. I don't know of any limits on size other than practical ones, since you basically make the browser treat the JSON as a script. You would have to process the result into some sort of data structure to support your paging, but that shouldn't be difficult.
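Whether a browser can resort 50k rows is easy to measure directly: a plain Array.prototype.sort over 50k small objects, plus a slice for the visible page, is the whole client-side operation. A sketch, with illustrative field names:

```javascript
// 50k-row data set with a deterministic pseudo-random sortable field.
const rows = Array.from({ length: 50000 }, (_, i) => ({
  id: i,
  name: `row-${i}`,
  score: (i * 7919) % 50000, // 7919 is prime, so scores are a permutation of 0..49999
}));

const PAGE_SIZE = 25;

// Resort the full set on one field, then slice out a single page to render.
function sortedPage(data, field, pageNumber) {
  const sorted = [...data].sort((a, b) =>
    a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0
  );
  return sorted.slice(pageNumber * PAGE_SIZE, (pageNumber + 1) * PAGE_SIZE);
}

const page0 = sortedPage(rows, 'score', 0); // the 25 lowest scores: 0..24
```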