What is the most efficient way to display lots of data on a website? - json

I have an optimization question.
Let's say that I'm making a website, and it has a JSON file with 5,000 pairs (about 582 KB). Through the combination of 3 sliders and some select tags it is possible to display every value, so the time between displaying one pair and the next is in microseconds.
My question is: if the website is also meant to run on mobile browsers, where is it more efficient to keep the 5,000 pairs of data - in a JSON file or in a database? And why?

I am building a photo site with similar requirements, and I can say after months of investigation and experimentation that there is no easy answer to that question. But I will try to give you some hints:
Try to divide the data into chunks. For example, if your sliders select values between 1 and 100, instead of delivering exactly what the client selected, widen the range a bit (say ±10 or more); that way you can continue filtering on the client side without a server round trip. Keep all the data in client memory before querying.
Don't render more than what is visible on the screen. JSON storage and filtering are fast, but the DOM is very slow, so minimize the number of visible elements.
Use 304 caching: whenever the client requests the same data twice, send a proper 304 response with an ETag. A good rule of thumb is to base the ETag on something cheap to check, like the max ID in the database, to see whether any data has been updated since the last call. If not, just send a 304 and the client will reuse whatever it had the last time (see the sketch after these tips).
Use absolute positioning. Don't even try to use CSS floats or anything like that; it will not work. Just calculate the position of each element yourself. This also helps with tip 2 (by filtering out all elements that fall outside the visible screen). You can still use CSS transitions, which give nice animations when the sliders change.
You could experiment with IndexedDB to help with the client-side querying, but unfortunately browser support is still not good enough and you can hit storage limits; it is better to use the ordinary HTTP cache with proper headers.
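To illustrate the 304/ETag tip, here is a minimal sketch of the server side, assuming a Node.js/Express endpoint; getMaxId and loadAllPairs are hypothetical helpers standing in for your own data-access code.

const express = require('express');
const app = express();

app.get('/api/pairs', async (req, res) => {
  // Hypothetical helper: a cheap query for the highest ID in the table.
  const maxId = await getMaxId();
  const etag = `"pairs-${maxId}"`;

  // If the client already has this version, answer with 304 and no body.
  if (req.headers['if-none-match'] === etag) {
    return res.status(304).end();
  }

  // Otherwise send the full data set together with the ETag.
  const pairs = await loadAllPairs(); // hypothetical: reads the 5,000 pairs
  res.set('ETag', etag);
  res.json(pairs);
});

app.listen(3000);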
Good Luck!

A database like MongoDB would be good for this. It stores documents in a JSON-like format, so you can use the values from the JSON file directly. Querying is very fast too, and you wouldn't have to parse the JSON file and keep it in an object before using it.
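As a rough illustration (not part of the original answer), importing the file and querying it with the Node.js MongoDB driver could look like the sketch below; the database, collection, and field names are made up for the example.

const { MongoClient } = require('mongodb');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const pairs = client.db('mysite').collection('pairs');

  // One-time import of the existing JSON file (assumed to be an array of pair objects).
  const data = require('./pairs.json');
  await pairs.insertMany(data);

  // Example query driven by the slider values (the field name is hypothetical).
  const results = await pairs.find({ value: { $gte: 10, $lte: 20 } }).toArray();
  console.log(results.length);

  await client.close();
}

main().catch(console.error);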

Given the size of the data (just 582 KB), I would opt for the JSON file.
The drawback is that you pay a penalty when starting the app and loading the data into memory, but after that all queries run very fast in memory, which is a big advantage.
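For instance, a minimal sketch of the "load once, then query in memory" approach on the client; the file name, field names, and render function are only placeholders.

let pairs = [];

// Pay the loading cost once, up front.
fetch('/data/pairs.json')
  .then(response => response.json())
  .then(data => { pairs = data; });

// Afterwards every slider change filters the in-memory array with no server round trip.
function onSliderChange(min, max) {
  const visible = pairs.filter(p => p.value >= min && p.value <= max);
  render(visible); // hypothetical rendering function
}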
You need to weigh how many accesses (how many queries) your app would make against the database versus loading the file just once. Also think about whether your main target is mobile browsers or PCs.
For this volume of data I wouldn't use a database (another process consuming resources); just measure how many resources (time, memory) are needed to load the JSON file.
If the data is going to grow, then you will need to rethink this, or maybe split your JSON file according to some criteria.

Related

Is there any better way to render fetched feed data on a webpage with infinite scrolling?

I am creating a webpage in ReactJS for a post feed (with text, images, and videos), just like Reddit, with infinite scrolling. I have created a single post component which is provided with the required data. I am fetching the posts from MySQL with axios. I have also implemented a redux store in my project.
I have also added post voting. Currently, I am storing all the posts from the DB in the redux store. If a user upvotes or downvotes, that change is made in the redux store as well as in the database, and the page re-renders the element with ease.
Is it feasible to use the redux store for this, given that the data will grow soon, maybe to millions of posts or more?
I previously used the useState hook to store all the data, but with that I had issues with dynamic re-rendering, as I had to set state every time a user voted.
If anyone has a more efficient approach, please help out.
It seems that this question goes far beyond just one topic. Let's break it down into the main pieces:
Client state. You say that you are currently using redux to store posts and update the number of upvotes as it changes. The thing is, this is not actually state in your case (or at least most of it is not). It is a common misconception to treat whatever data comes from the API as state. In most cases it is not state, it is a cache, and you need a tool that makes working with a cache easier. I would suggest trying something like react-query or swr (see the sketch after this list). This way you will avoid a lot of boilerplate code and hand off server data cache management to a library.
Infinite scrolling. There are a few things to consider here. First, you need to figure out how you are going to detect when to preload more posts. You can do it by using the IntersectionObserver, or you can use some fancy library from NPM that does it for you (the sketch below uses IntersectionObserver directly). Second, if you aim for millions of records, you need to think about virtualization. In a nutshell, it removes elements that are outside of the viewport from the DOM so browsers don't eat up all memory and die after some time of doomscrolling (that would be a nice feature, though). This article is a good starting point: https://levelup.gitconnected.com/how-to-render-your-lists-faster-with-react-virtualization-5e327588c910.
Data source. You say that you are storing all posts in a database but don't mention any API layer. If you are shooting for millions and this is not just a project for practicing your skills, I would suggest having an API between the client app and the database. Here are some good questions where you can find out why it is not the best idea to connect to the database directly from the client: one, two.
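To make the first two points more concrete, here is a rough sketch that combines react-query's useInfiniteQuery (v3-style API) with an IntersectionObserver sentinel. The /api/posts endpoint, its nextCursor field, and the Post component are assumptions based on the question, not a definitive implementation.

import { useEffect, useRef } from 'react';
import { useInfiniteQuery } from 'react-query';
import axios from 'axios';
import Post from './Post'; // the single post component from the question

export default function Feed() {
  // react-query caches the pages and exposes helpers for loading the next one.
  const { data, fetchNextPage, hasNextPage } = useInfiniteQuery(
    'posts',
    ({ pageParam = 0 }) => axios.get(`/api/posts?cursor=${pageParam}`).then(r => r.data),
    { getNextPageParam: lastPage => lastPage.nextCursor } // hypothetical cursor field
  );

  // An invisible sentinel div at the bottom triggers loading when it scrolls into view.
  const sentinelRef = useRef(null);
  useEffect(() => {
    const observer = new IntersectionObserver(entries => {
      if (entries[0].isIntersecting && hasNextPage) fetchNextPage();
    });
    if (sentinelRef.current) observer.observe(sentinelRef.current);
    return () => observer.disconnect();
  }, [hasNextPage, fetchNextPage]);

  return (
    <div>
      {data?.pages.flatMap(page => page.posts).map(post => (
        <Post key={post.id} post={post} />
      ))}
      <div ref={sentinelRef} />
    </div>
  );
}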

(Vis.js Network) Load nodes from database

I have a hard time understanding how to use vis.js Network with a large amount of dynamically generated data. From what I read in the documentation, there are only two easy ways to import data: from Gephi or in the DOT language, right? Isn't that a bit restrictive?
I have no knowledge of Gephi or the DOT language, so I decided to use my MySQL database, which I am used to working with.
So I query my data with PHP and generate JavaScript to build the nodes and edges for the network.
But so far I only have about 200 nodes and edges (which is about 1/5 of the data I'll have in the end), and it's already very slow to load. It seems to take a lot of resources to display the network (my MacBook Pro gets really loud any time I open the network page), even though vis.js is supposed to be quick and lightweight.
Is that because all the nodes and edges are "written" into the code of the page? Or is it because I use PHP to query the MySQL data?
I don't reject the idea of working with a JSON file or the DOT language, I just have no idea how to do that... but if it can get me better performance, I'd like to learn how. Can anyone explain in detail how it all works? And with either of these methods, can I get different sizes and colors for the nodes and edges according to the data I need to show (right now I do that in PHP after querying the data from the database)?
The format required by Vis Network can be serialized and deserialized using const object = JSON.parse(string); and const string = JSON.stringify(object);. There's no need to use Gephi or DOT just to store data in the database.
Nodes have a size property to change their size, and both nodes and edges have a color property to change their color. Edges can also inherit their color from connected nodes. For more details see the docs for nodes at https://visjs.github.io/vis-network/docs/network/nodes.html and edges at https://visjs.github.io/vis-network/docs/network/edges.html.
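For example, a minimal sketch of fetching JSON (say, from a PHP endpoint backed by MySQL) and feeding it into Vis Network with per-node size and color; the endpoint name and field names are made up for illustration.

// Assumes the vis-network browser bundle is already loaded on the page.
fetch('/get-graph.php') // hypothetical endpoint returning { nodes: [...], edges: [...] }
  .then(response => response.json())
  .then(graph => {
    const nodes = new vis.DataSet(
      graph.nodes.map(n => ({
        id: n.id,
        label: n.name,
        shape: 'dot',
        size: n.weight, // per-node size taken from your data
        color: n.category === 'a' ? '#97c2fc' : '#fb7e81'
      }))
    );
    const edges = new vis.DataSet(
      graph.edges.map(e => ({ from: e.source, to: e.target, color: { color: '#848484' } }))
    );
    const container = document.getElementById('network');
    new vis.Network(container, { nodes, edges }, {});
  });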
Regarding performance, there is not much I can tell you without some sample code and data to play with. I tried putting more than 200 nodes into https://thomaash.github.io/me/#/canvas, which was built with Vis Network. As I expected it loads instantly and works just fine, but I have no idea how fast or slow a MacBook Pro is compared to my machine.

Asp.NET Core 2 Images

I have a couple of questions about images, since I don't know what is better for my purposes. Also, this might be helpful for other people, because I couldn't find this info in other questions.
Well, although this is an ASP.NET Core 2.0 application, the first question is really a general question about images.
QUESTION 1
When I have images that I want to reload every time, I usually add a query string so that browsers like Chrome or IE don't use the cached image they have. In my case I add the time ticks to the URL of the image; this way it loads the image every time, since the query string is always different:
filePath += "?" + DateTime.Now.Ticks;
But in my case I have a panel where the administrators of the page can change a lot of images. The problem is that when they change those images, if there is no query string, the users are going to see the old image stored in their browser cache.
The question is: if I add the query string to many images, is that bad for performance? Is there any other solution for this?
QUESTION 2
I also have photos of the users and other images stored on the site. When I show an image, all visitors of the site can see the path (for example: www.site.com/user_files/user_001/photo001.jpg).
Is there a way to hide those paths or transform them into something else in ASP.NET Core 2.0?
Thanks a lot.
Using something like ticks will get the job done, but in a very naive way. You're going to put more stress both on your server and the clients, since effectively the image will have to be refetched every single time, regardless of whether it has changed or not. If you will have any mobile users, the situation is far worse for them, as they'll be forced to redownload all these resources over and over, usually over limited (and costly) data plans.
A far better approach is to use a cryptographic digest, often called a "hash". Essentially, the same data hashed in the same way will return the same hash. It's usually used to detect tampering with transmitted data, but since each message will (generally) have a unique hash and that hash will be the same each time for the same piece of data, you can also use this to generate a cache-busting query string that only changes when the image data itself changes.
Now, to be thorough, there's technically no guarantee that two messages won't result in the same hash. Instances where that occurs are called "collisions" and they can happen. However, if you use a sufficiently strong algorithm like SHA256, the likelihood of collisions is greatly reduced. Regardless, it is not a real concern for this particular use case of cache-busting images.
Simplistically, to create the hash, you simply do something like:
string hash;
using (var sha256 = SHA256.Create())
{
    hash = Convert.ToBase64String(sha256.ComputeHash(imageBytes));
}
The value of hash then will be something like z1JZs/EwmDGW97RuXtRDjlt277kH+11EEBHtkbVsUhE=.
However, ASP.NET Core has an ImageTagHelper built-in that will handle this for you. Essentially, you just need to do:
<img src="/path/to/image.jpg" asp-append-version="true" />
As for your second question, about hiding or obfuscating the image path, that's not strictly possible, but can be worked around. The URL you use to reference the image uniquely identifies that resource. If you change it in any way, it's effectively not the same resource any more, and thus, would not locate the actual image you wanted to display. So, in a strict sense, no, you cannot change the URL. However, you can proxy the request through a different URL, effectively obfuscating the URL for the original image.
Simply, you'd just have an action on some controller that takes an image path (as part of the query string), loads it from the filesystem, and returns it as a response. Care should be taken to limit the scope of files that can be returned like this, both by directory (only allow your image directory, for example, not C:\Windows\, etc.) and by file type (only allow images to be returned, not random text files, config files, etc.). That portion is straightforward enough, and you can find many examples online if you need them.
Ultimately, this doesn't really solve anything, though, because now your image path is simply in the query string instead. However, now that you've set this part up, you can encrypt that part of the query string using the Data Protection API. There's some basic getting started information available in the docs. Essentially, you're just going to encrypt the image path when creating the URL, and then in your action that returns the image, you decrypt the path first before running the rest of the code. For the encryption part, you can create a tag helper to do this for you without having to have a ton of logic in your views.

Why do people always encourage a single JS file for a website?

I have read some website development materials on the Web, and every time someone asks how to organize a website's JS, CSS, HTML and PHP files, people suggest a single JS file for the whole website. And the argument is speed.
I clearly understand that the fewer requests there are, the faster the page responds. But I have never understood the single-JS argument. Suppose you have 10 webpages and each webpage needs a JS function to manipulate the DOM objects on it. If you put all 10 functions in a single JS file and execute that file on every webpage, 9 out of 10 functions are doing useless work. There is CPU time wasted searching for non-existent DOM objects.
I know that CPU time on an individual client machine is trivial compared to bandwidth on a single server machine. I am not saying that you should have many JS files on a single webpage. But I don't see anything wrong if every webpage refers to 1 to 3 JS files and those JS files are cached on the client machine. There are many good ways to do caching; for example, you can use an expiry date, or you can include a version number in the JS file name. Compared to mixing the functionality for all the needs of many webpages into one big JS file, I much prefer splitting the JS code into smaller files.
Any criticism/agreement on my argument? Am I wrong? Thank you for your suggestions.
A function does zero work unless it is called. So 9 uncalled functions do zero work; they just take up a little extra space.
A client only has to make 1 request to download 1 big JS file, which is then cached on every other page load. That is less work than making a small request on every single page.
I'll give you the answer I always give: it depends.
Combining everything into one file has many great benefits, including:
less network traffic - you might be retrieving one file, but you're sending/receiving multiple packets, and each transaction has a series of SYN, SYN-ACK, and ACK messages sent across TCP. A large majority of the transfer time is spent establishing the session, and there is a lot of overhead in the packet headers.
one location/manageability - although you may only have a few files, it's easy for functions (and class objects) to grow between versions. With the multiple-file approach, sometimes functions from one file call functions/objects from another file (e.g. ajax in one file, then arithmetic functions in another - your arithmetic functions might grow to need to call the ajax code and have a certain variable type returned). What ends up happening is that your set of files needs to be treated as one version, rather than each file being its own version. Things get hairy down the road if you don't have good management in place, and it's easy to fall out of sync with JavaScript files, which are always changing. Having one file makes it easy to manage the version across each of your pages and your (1 to many) websites.
Other topics to consider:
dormant code - you might think that the uncalled functions are potentially reducing performance by taking up space in memory, and you'd be right, but this cost is so minuscule that it doesn't matter. Functions are indexed in memory, and while the index table may grow, it's trivially small for small projects, especially given today's hardware.
memory leaks - this is probably the biggest reason why you wouldn't want to combine all the code, but it is a small issue given the amount of memory in systems today and the better garbage collection browsers now have. Also, this is something that you, as a programmer, have the ability to control. Quality code leads to fewer problems like this.
Why it depends?
While it's easy to say throw all your code into one file, that would be wrong. It depends on how large your code is, how many functions there are, who maintains it, etc. Surely you wouldn't pack your locally written functions into the jQuery package, and you may have different programmers maintaining different blocks of code - it depends on your setup.
It also depends on size. Some programmers embed images encoded as ASCII text in their files to reduce the number of files sent. This can bloat files. Surely you don't want to package everything into one 50 MB file, especially if there are core functions that are needed for the page to load.
So, to bring my response to a close, we'd need more information about your setup, because it depends. Surely 3 files is acceptable regardless of size, combining where you see fit. It probably wouldn't really hurt network traffic, but 50 files is unreasonable. I use the hand rule (no more than 5), and surely you'll see a benefit combining five 1 KB files into one 5 KB file.
Two reasons that I can think of:
Less network latency. Each .js requires another request/response to the server it's downloaded from.
More bytes on the wire and more memory use otherwise: if it's a single file, you can strip out unnecessary characters and minify the whole thing.
The JavaScript should be designed so that the extra functions don't execute at all unless they're needed.
For example, you can define a set of functions in your script but only call them in (very short) inline <script> blocks in the pages themselves.
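A minimal sketch of that pattern (the script path, function name, and element ID are only for illustration):

<!-- shared.js is included on every page; it only defines functions and runs nothing by itself -->
<script src="/js/shared.js"></script>

<!-- each page then calls just what it needs in a tiny inline block -->
<script>
  initCommentForm(document.getElementById('comments')); // hypothetical function defined in shared.js
</script>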
My line of thought is that you have fewer requests. When you make a request in the header of the page, it stalls the output of the rest of the page; the user agent cannot render the rest of the page until the JavaScript files have been obtained. Also, JavaScript files download synchronously; they queue up instead of being pulled all at once (at least that is the theory).

Does anyone know how I can store large binary values in Riak?

For now, they don't recommend storing files larger than 50MB in size without splitting them. See: FAQ - Riak Wiki
If your files are smaller than 50 MB, then proceed as you would with storing non-binary data in Riak.
Another reason one might pick Riak is for flexibility in modeling your data. Riak will store any data you tell it to in a content-agnostic way — it does not enforce tables, columns, or referential integrity. This means you can store binary files right alongside more programmer-transparent formats like JSON or XML. Using Riak as a sort of "document database" (semi-structured, mostly de-normalized data) and "attachment storage" will have different needs than the key/value-style scheme — namely, the need for efficient online queries, conflict resolution, increased internal semantics, and robust expressions of relationships. (Schema Design in Riak - Introduction)
Brian Mansell's answer is on the right track - you don't really want to store large binary values (over 50 MB) as a single object in Riak (the cluster becomes unusably slow after a while).
You have 2 options, instead:
1) If a binary object is small enough, store it directly. If it's over a certain threshold (50 MB is a decent arbitrary value to start with, but really, run some performance tests to see what the average object size is for your cluster, beyond which it starts to crawl), break up the file into several chunks and store the chunks separately. (In fact, most people I've seen go this route use chunks of 1 MB in size.)
This means, of course, that you have to keep track of a "manifest" -- which chunks got stored where, and in what order. Then, to retrieve the file, you would first fetch the object tracking the chunks, then fetch the individual file chunks and reassemble them into the original file (a rough sketch follows after option 2). Take a look at a project like https://github.com/podados/python-riakfs to see how they did it.
2) Alternatively, you can just use Riak CS (Riak Cloud Storage), to do all of the above, but the code is written for you. That's exactly how RiakCS works -- it breaks an incoming file into chunks, stores and tracks them individually in plain Riak, and reassembles them when it comes time to fetch it back. And provides an Amazon S3 API for file storage, for your convenience. I highly recommend this route (so as not to reinvent the wheel -- chunking and tracking files is hard enough). Yes, CS is a paid product, but check out the free Developer Trial, if you're curious.
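Here is a rough sketch of option 1 in JavaScript; storeObject and fetchObject stand in for whatever Riak client calls you actually use (they are hypothetical helpers, not a real client API).

const CHUNK_SIZE = 1024 * 1024; // 1 MB chunks, as mentioned above

// Split the file, store each chunk under its own key, then store a manifest.
async function storeLargeFile(bucket, key, buffer) {
  const chunkKeys = [];
  for (let offset = 0, i = 0; offset < buffer.length; offset += CHUNK_SIZE, i++) {
    const chunkKey = `${key}-chunk-${i}`;
    await storeObject(bucket, chunkKey, buffer.slice(offset, offset + CHUNK_SIZE));
    chunkKeys.push(chunkKey);
  }
  // The manifest records which chunks belong to the file and in what order.
  await storeObject(bucket, key, JSON.stringify({ size: buffer.length, chunks: chunkKeys }));
}

// Fetch the manifest first, then the chunks, and reassemble them in order.
async function fetchLargeFile(bucket, key) {
  const manifest = JSON.parse(await fetchObject(bucket, key));
  const chunks = [];
  for (const chunkKey of manifest.chunks) {
    chunks.push(await fetchObject(bucket, chunkKey));
  }
  return Buffer.concat(chunks, manifest.size);
}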
Just like every other value. Why would it be different?
Use either the Erlang interface ( http://hg.basho.com/riak/src/461421125af9/doc/basic-client.txt ) or the "raw" HTTP interface ( http://hg.basho.com/riak/src/tip/doc/raw-http-howto.txt ). It should "just work."
Also, you'll generally find a better response on the riak-users mailing list than you will here. http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com (No offense to z8000, who seems to also have answers.)