I have a very large array (20 million numbers, output of a sql query) in my MVC application and I need to send it to the client browser (it will be visualized on a map using webGL and the user is supposed to play with the data locally). What is the best approach to send the data? (Please just do not suggest this is a bad idea! I am looking for an answer to this specific question, not alternative suggestions)
This is my current code (called using ajax), but when array size goes above 3 millions I receive outofmemory exception. It seems the serialization (stringbuilder?) fails.
List<double> results = DomainModel.GetPoints();
JsonResult result = Json(results, JsonRequestBehavior.AllowGet);
result.MaxJsonLength = Int32.MaxValue;
return result;
I do not have much experience with web programming/javascript/MVC. I have been researching for the past 24 hours but did not get anywhere, so I need a hint/sample code to continue my research.
NO, NO, NO, you do not send that much information to the browser:
it results in a huge memory usage that will most likely crash the web-browser (and in fact in your case it does)
it takes a large amount of time to retrieve it, not everyone has a good internet connection, and even good connections can fluctuate over time
If you're building a map tool, then I'd recommend splitting the map into tiles and sending only the data corresponding to the portion of the map the user is currently working on. Also for larger zooms you can filter out data, as surely you can't place it all on the map.
Edit. A somewhat another alternative would be to ask your users to use machines with at least 16GB of RAM, or whatever RAM size is needed to deal with your huge data).
Related
I am performance testing a map based web application, where a query gets fired onto the DB and in return a tabular data and map appears. Sort of like Google Maps. The problem i suppose is with rendering. While the time taken to actually "see" a table on the browser is around 1 minute, the same transaction takes around 3 minutes to complete in Vugen.
While advance tracing the logs, it shows a JSON response (of the above mentioned table) is getting downloaded. This response is of 6 Mb and is delaying the transaction to complete. I have been careful that no asynchronous calls are going along with this GET call and is actually covered with the lr_start_transaction and lr_end_transaction which is causing this high response time.
I understand that we might get better response over client side activities using TruClient protocol or others, but there is a restriction over it and Web HTTP protocol needs to be used.
I am using HP LR 12.02 version, over WinInet capture level.
My question is, is there any way i can actually emulate that "1 minute" time that a user would actually need to "see" the tabular data, rather than the 3 minutes it is taking. its okay if i disregard this JSON response and not download this 6 Mb data, if it makes any difference.
Any suggestion would be much appreciated. Thanks!
I've been trying to resolve this problem for a while now, I even gave it a shot to rewrite the entire program but without success. The application is running on VueJS 2.3.3 and is supposed to be running on Chromium in combination with a Raspberry Pi (irrelevant information, for now).
We're working with several components which are being included in a single file, later on this file will be compiled using either gulp or npm run dev. When the instance of VueJS initializes, a request will be send using Vue Resource's $http option. This'll receive a json response with a size of around 30mb. This'll be saved in the data array, as seen here:
this.$http.get('<url>' + this.token)
.then((response) => {
this.properties = response.properties;
});
This data will later on be used for further actions, another thing that is worth mentioning is that the data is being refreshed every once in a while. Which is where I think the problem occurs, if I'm not refreshing the data every 5 minutes (can be longer too, really depends on the way I'm testing) the program runs fine. It's just that I want to refresh the data every once in a while to retrieve new information. The way of setting a timeout which I'm using is as following:
this.dataTimeout = setTimeout(this.refreshData, 300000);
Each (so called) property has at least 6 base64 images saved in it's JSON, which are later used to present to the user. Besides that, there is a name, address, and some other tiny bits of data. It doesn't sound all that wrong but I'm getting the feeling that each response makes the memory grow so intense that even a desktop is getting trouble running it.
Each 10 seconds a new property will be presented on the user's screen including the images, street, location, etc. I'm not sure if there is a memory leak in my code or if I'm forgetting something. A few questions pop up in my head:
Do I need to reset the response I'm getting from the server back to
null or undefined?
Could there be a leak in one of the plugins I'm using (Just VueResources)?
For how long can a VueJS instance stay alive, is there any recommended time to reload the entire program?
What thinks should I take in consideration in order to achieve this at all?
I've taken out the data renewal and put a demo project online, this can be seen right here. The main problem I'm having is that the browser just runs out of memory and shows us the amazing Aw snap! page from Chrome. I tried taking snapshots from the memory usage but it all seems fine, it just explodes randomly after a while.
Well, I don't know what really does your app, but are your 30Mb of data really useful / essential ? In JSON moreover ?
Maybe you don't need all this data, and you could just adapt the data to your needs. For example, keep your JSON store data, and retrieve your Base64 images by another way.
I don't understand why you store in memory images. Images are just useful for display purpose in my opinion.
So I think 30Mb is really huge. But maybe I'm wrong ?
By the way, I've tested with Firefox Nightly, no problem here. Doesn't seem to crash. Maybe I don't encounter the refresh call ?
I am writing a web application for mapping Real-time GPS coordinates on Google maps coming from a GPS device, for fleet managment.
Since the flow of data is very fast from the GPS device to web application for database it becomes very heavy and the database is being queried every 5 seconds(via AJAX from web browser running the website) it becomes more heavy.
Keeping the updates in real-time is becoming very difficult a lagging of 30 seconds to 60 seconds is created between the actually update and its visibility on the website.
I am using Django + Apache + MySQL on CentOS 6.4 64 bit.
Any advice in what direction i should move to make the processing/visibility of data in more real-time would be helpful.
I would suggest you to use NoSql database like MongoDB. It would really help you to achieve real time application performance.
Have a look at Django-With-MonoDB.
And if possible try to replace default python interpreter to PyPy.
I think these two are enough to give you best performance. :)
Understanding Django-using-PyPy
Also for front-end you should use KnockoutJS or AngularJS.
Some tipps:
Avoid xml, especially a DOM based xml parser (this blows up data by a factor of 100). (A lat long coordinate without time, needs 8 bytes, not more)
favor a binary represenattion of the coordinates, and parse them by hand, instead using an slow generated parsing code taht probaly uses reflection to parse.
try to minimize the use of databases, especially relational ones.
raise the intervall that clients are sending: e.g evry 20min instead
evry 5.
if you use a db, minimize the transactions, try to do all processing
in one transaction.
I have an optimization question.
Lets say that I'm making a website, and it has a JSON file with 5,000 pairs (about 582 kb) and through the combination of 3 sliders and some select tags it is possible to display every value. So the time to appear between every pair is in microseconds.
My question is: If the website is also made to run on mobile browsers, where is it more efficient to have the 5000 pairs of data - in a JSON file or in the data base? and why?
I am building a photo site with similar requirements and I can say after months of investigations and experimenting that there are no easy answer to that question. But I will try to give you some hints:
Try to divide the data in chunks, for example - if your sliders are selecting values between 1 through 100, instead of delivering exactly what the client selected, round up a bit maybe +-10 or maybe more, that way you can continue filtering on the client side without a server roundtrip. Save all data in client memory before querying.
Don't render more than what is visible on the screen, JSON storage and filtering is fast but DOM is very slow, minimize the visible elements.
Use 304 caching - meaning - whenever the client is requesting the same data twice; send a proper 304 response with etag. For example - a good rule of thumb here is to use something you know very easily, like the max ID in the database or so to see if any new data has been updated since the last call. If not, just send 304 and the client will use whatever he had the last time.
Use absolute positioning. Don't even try to use the CSS float or something like that, it will not work. Just calculate each position of each element. This will also help you to achieve tip nr 2 (by filtering out all elements that are outside of the visible screen). You can still use CSS transitions which gives nice animations when they change sliders.
You could experiment with IndexedDB to help with the client side querying but unfortunately the support in different browsers are still not good enough plus you hit the roof on storage, better to use the ordinary cache and with proper headings.
Good Luck!
A database like MongoDB would be good for this. It still uses the JSON syntax for storage so you can use the values from the JSON file. The querying is very fast too and you wouldn't have to parse the JSON file and store it in an object before using it.
Given the size of the data (just 582Kb) I will opt for the Json file.
The drawback is you will have a penalty starting the app and loading the data in memory, but then all queries will run very fast in memory as a good advantage.
You need to think about how much acceses will your app do to the database (how many queries) against load the file just once. And think if your main objective are mobile browsers or pcs.
For this volume of data I wouldn't try a database (another process consuming resources), just try how much resources (time, memory) are needed to load the JSON file.
If the data is going to grow... then you will need to rethink this, or maybe split your json file following some criteria.
Lets say that the API is well documented and every possible response field is described.
Should web application's server API exclude null fields in a JSON response to lower the amount of traffic? Is this a good idea at all?
I was trying to calculate the amount of traffic reduced for a large app like Twitter, and the numbers are actually quite convincing.
For example: if you exclude a single response field, "someGenericProperty":null, which is 26 bytes, from every single API response, while Twitter is reportedly having 13 billion API requests per day, the amount of traffic reduction will be >300 Gb.
More than 300 Gb less traffic every day is quite a money saver, isn't it? That's probably the most naive and simplistic calculation ever, but still.
In general, no. The more public the API and and the more potential consumers of the API, the more invariant the API should be.
Developers getting started with the API are confused when a field shows up some times, but not other times. This leads to frustration and ultimately wastes the API owner's time in the form of support requests.
There is no way to know exactly how downstream consumers are using an API. Often, they are not using it just as the API developer imagines. Elements that appear or disappear based on the context can break applications that consume the API. The API developer usually has no way to know when a downstream application has been broken, short of complaints from downstream developers.
When data elements appear or disappear, uncertainty is introduced. Was the data element not sent because the API considered it to be irrelevant? Or has the API itself changed? Or is some bug in the consumer's code not parsing the response correctly? If the consumer expects a fields and it isn't there, how does that get debugged?
On the server side, extra code is needed to strip out those fields from the response. What if the logic to strip out data the wrong? It's a chance to inject defects and it means there is more code that must be maintained.
In many applications, network latency is the dominating factor, not bandwidth. For performance reasons, many API developers will favor a few large request/responses over many small request/responses. At my last company, the sales and billing systems would routinely exchange messages of 100 KB, 200 KB or more. Sometimes only a few KB of the data was needed. But overall system performance was better than fetching some data, discovering more was needed then sending additional request for that data.
For most applications some inconsistency is more dangerous than superfluous data is wasteful.
As always, there are a million exceptions. I once interviewed for a job at a torpedo maintenance facility. They had underwater sensors on their firing range to track torpedoes. All sensor data were relayed via acoustic modems to a central underwater data collector. Acoustic underwater modems? Yes. At 300 baud, every byte counts.
There are battery-powered embedded applications where every bytes counts, as well as low-frequency RF communication systems.
Another exception is sparse data. For example, imagine a matrix with 4,000,000 rows and 10,000 columns where 99.99% of the values of the matrix are zero. The matrix should be represented with a sparse data structure that does not include the zeros.
It's definitely dependent from the service and the amount of data it provides; it should be evaluate the ratio about null / not null data and set a threshold over than it worth to exclude that elements.
Thanks for sharing, it's an interesting point as for me.
The question is on a wrong side - JSON is not the best format to compress or reduce traffic, but something like google protobuffers or bson is.
I am carefully re-evaluating nullables in the API scheme right now. We use swagger (Open API) and json scheme does not really have something like nullable type and I think there is a good reason for this.
If you have a JSON response that maps a DB integer field which is suddenly NULL (or can be according to DB scheme), well it is indeed ok for relational DB but not at all healthy for your API.
I suggest to adopt and follow a much more elegant approach, and that would be to make better use of "required" also for the response.
If the field is optional in the response API scheme and it has null value in the DB do not return this field.
We have enabled strict scheme checks also for the API responses, and this gives us a much better control of our data and force us not to rely on states in the API.
For the API client that of course means doing checks like:
if ("key" in response) {
console.log("Optional key value:" + response[key]);
} else {
console.log("Optional key not found");
}