How to do real-time/fast processing of GPS data in a web application? - mysql

I am writing a web application that maps real-time GPS coordinates coming from a GPS device onto Google Maps, for fleet management.
Since the flow of data from the GPS device into the database is very fast, the load becomes heavy, and because the database is queried every 5 seconds (via AJAX from the web browser running the website) it becomes heavier still.
Keeping the updates in real time is becoming very difficult: a lag of 30 to 60 seconds builds up between the actual update and its visibility on the website.
I am using Django + Apache + MySQL on CentOS 6.4 64 bit.
Any advice on what direction I should take to make the processing and visibility of the data closer to real time would be helpful.

I would suggest using a NoSQL database like MongoDB. It would really help you achieve real-time application performance (a rough sketch of this is shown below this answer).
Have a look at Django-with-MongoDB.
And if possible, try replacing the default Python interpreter with PyPy.
I think these two changes are enough to give you the best performance. :)
Understanding Django-using-PyPy
Also, for the front end you should use KnockoutJS or AngularJS.
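As a hedged illustration of the MongoDB suggestion (my own sketch, not part of the original answer; the database, collection, and field names are assumptions), GPS fixes could be written and read with pymongo like this:

from datetime import datetime, timezone
from pymongo import MongoClient, DESCENDING

client = MongoClient("mongodb://localhost:27017")        # assumed local MongoDB instance
positions = client["fleet"]["positions"]                 # hypothetical database/collection names
positions.create_index([("vehicle_id", 1), ("ts", DESCENDING)])

def store_fix(vehicle_id, lat, lon):
    # Insert one GPS fix as it arrives from the device.
    positions.insert_one({
        "vehicle_id": vehicle_id,
        "lat": lat,
        "lon": lon,
        "ts": datetime.now(timezone.utc),
    })

def latest_fix(vehicle_id):
    # Return the most recent fix for a vehicle; this is what the 5-second AJAX poll would read.
    return positions.find_one({"vehicle_id": vehicle_id}, sort=[("ts", DESCENDING)])

Writes stay cheap because each fix is a single small document, and the compound index keeps the latest-fix lookup fast.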

Some tips:
Avoid XML, especially a DOM-based XML parser (it blows up the data by a factor of 100); a lat/long coordinate without a timestamp needs 8 bytes, not more.
Favor a binary representation of the coordinates and parse it by hand, instead of using slow generated parsing code that probably relies on reflection (see the sketch after this list).
Try to minimize the use of databases, especially relational ones.
Raise the interval at which clients send data: e.g. every 20 minutes instead of every 5.
If you use a database, minimize the number of transactions; try to do all processing in one transaction.
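As an illustration of the binary-representation tip (my own sketch, not part of the original answer; the extended record layout is an assumption), Python's struct module can pack a fix into a few bytes:

import struct

# The bare coordinate, as the tip says: two float32 values = 8 bytes.
COORD = struct.Struct("<ff")

# An assumed fuller record (vehicle id + unix time + lat + lon) is still only 16 bytes.
FIX = struct.Struct("<IIff")

def pack_coord(lat, lon):
    return COORD.pack(lat, lon)

def unpack_coord(buf):
    return COORD.unpack(buf)                   # returns (lat, lon)

print(len(pack_coord(52.5200, 13.4050)))       # 8 bytes, versus hundreds of bytes of XML

Parsing such records by hand is a couple of struct.unpack calls, with no parser generator or reflection involved.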

Real-time data web page

I am new to programming and am working on pushing real-time data from a PLC to a web page, either by deploying HTML5 on the WAGO or by using a Modbus driver wrapper. I honestly have tried to research this but don't know where to start. It will be a closed private network with little to no influence from the outside web. I am simply looking to display a single piece of live information as a proof of concept. Basically, I'm trying to custom-design a Groov program.
You might want to look into using OPC. Kepware & SoftwareToolbox are just 2 of many vendors that offer tools to help you get your data the way you want it.
There is an existing tool that does what you want, but I am under the impression you have to write your own from scratch. The existing tool is http://www.softwaretoolbox.com/cogentdatahub/ if you are interested in looking at it for ideas.
I've been able to interface with a PLC using Python and Modbus TCP, with a Raspberry Pi as the web server. Python is a quick and easy-to-learn language, and WebSockets are the HTML5 component best suited to real-time data.
Simple connection code (after you install everything, e.g. pymodbus):
from pymodbus.client.sync import ModbusTcpClient as ModbusClient
from time import sleep

client = ModbusClient('ip_address_of_modbus_IO')              # address of the Modbus I/O module
if client.connect():
    print(client.read_discrete_inputs(200, 1).bits[0])        # read one discrete input at address 200
    client.write_coil(0, True)                                # switch coil 0 on
    sleep(100)
    client.write_coil(2, True)                                # switch coil 2 on
found here:
http://simplyautomationized.blogspot.com/2013/09/home-automation-project-2-rpi-light.html
You can create a WebSocket broadcast server using the example here:
http://simplyautomationized.blogspot.com/2015/09/raspberry-pi-create-websocket-api.html
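As a rough illustration (my own sketch, not taken from the linked post; the port, polling interval, and message format are assumptions), a minimal broadcast server can be built with recent versions of the third-party websockets package:

import asyncio
import json
import websockets

CLIENTS = set()

async def handler(ws):
    # Register each browser connection and keep it open until the client disconnects.
    CLIENTS.add(ws)
    try:
        await ws.wait_closed()
    finally:
        CLIENTS.discard(ws)

async def broadcast_loop(read_input):
    # read_input is whatever PLC read you use, e.g. the pymodbus call shown above.
    while True:
        message = json.dumps({"input_200": read_input()})
        websockets.broadcast(CLIENTS, message)    # push the latest value to every connected browser
        await asyncio.sleep(1)

async def main(read_input):
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await broadcast_loop(read_input)

# asyncio.run(main(lambda: client.read_discrete_inputs(200, 1).bits[0]))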
Ordinarily you cannot push data to a browser over plain HTTP (WebSockets and server-sent events aside).
The Internet would become an even greater mess if you could do so freely.
To solve this, have your web page contain a timer, written in JavaScript.
Every second or so it makes an AJAX request (e.g. using jQuery's implementation) to the server, which then delivers (almost) real-time data.
The web page then displays that in some DOM element, e.g. an empty DIV.
So it's the browser polling your server.
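To show the server side of such polling (my own illustration, not part of the original answer; the framework, route name, and read_input helper are assumptions), a minimal Flask endpoint could return the latest reading as JSON:

from flask import Flask, jsonify

app = Flask(__name__)

def read_input():
    # Placeholder for the real PLC read, e.g. the pymodbus call shown earlier.
    return True

@app.route("/api/latest")
def latest():
    # The browser's JavaScript timer polls this URL every second or so.
    return jsonify(value=read_input())

# app.run(host="0.0.0.0", port=5000)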
#BlueDog
The data is "almost" realtime because sampling once a second gives a delay of at least one second. In the ideal case, as soon as data changes, it would be pushed to the browser. Unfortunately the browser has no way of knowing that anything changed, so the best it can do is frequently "ask" for updates (polling).
How large the delay is depends on your polling frequency. If you poll once per second, you have to add the delays for transmitting the page request and the server's reply. The transmission time depends on your network (which may be the Internet, with all the uncertainty that involves). If the backbones involved have enough capacity, I expect the overall delay to be between 1 and 1.5 seconds. With a dedicated network and even more frequent polling, 0.5 seconds should be possible. These are, however, estimated averages. If I request a page over the Internet and my provider (again) has a problem, it may be hours before I receive what I want. Things like virus scanners and OS updates may also spoil your game.
So, practically: with a good broadband connection, a stable browser and the right process priorities it should be possible to get below 1 second overall delay (incl. poll time interval) for 95% of the time. Be prepared to reboot the client every few days. Most browsers leak memory and most OS'es do so too.

How to send a very large array to the client browser?

I have a very large array (20 million numbers, the output of a SQL query) in my MVC application, and I need to send it to the client browser (it will be visualized on a map using WebGL, and the user is supposed to play with the data locally). What is the best approach to sending the data? (Please do not just suggest that this is a bad idea! I am looking for an answer to this specific question, not alternative suggestions.)
This is my current code (called via AJAX), but when the array size goes above 3 million I get an out-of-memory exception. It seems the serialization (StringBuilder?) fails.
List<double> results = DomainModel.GetPoints();
JsonResult result = Json(results, JsonRequestBehavior.AllowGet);
result.MaxJsonLength = Int32.MaxValue;
return result;
I do not have much experience with web programming, JavaScript, or MVC. I have been researching this for the past 24 hours but did not get anywhere, so I need a hint or sample code to continue my research.
NO, NO, NO, you do not send that much information to the browser:
it results in huge memory usage that will most likely crash the web browser (and in your case it already does);
it takes a long time to retrieve, not everyone has a good internet connection, and even good connections can fluctuate over time.
If you're building a map tool, then I'd recommend splitting the map into tiles and sending only the data corresponding to the portion of the map the user is currently working on (a sketch of that idea follows). Also, for larger zooms you can filter out data, as you surely can't place it all on the map.
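As a hedged sketch of that tiling/filtering idea (the question uses ASP.NET MVC; this Python version is purely illustrative, and the thinning heuristic and zoom threshold are assumptions), the server would return only the points inside the requested viewport:

def points_in_viewport(points, min_lat, max_lat, min_lon, max_lon, zoom):
    # points: iterable of (lat, lon) pairs already loaded from the SQL query.
    visible = [
        (lat, lon)
        for lat, lon in points
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    ]
    # When zoomed out, thin the data by keeping every n-th point (assumed heuristic).
    step = 1 if zoom >= 12 else 2 ** (12 - zoom)
    return visible[::step]

Each pan or zoom then triggers a small request for just the visible slice instead of one 20-million-number payload.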
Edit: A somewhat different alternative would be to ask your users to use machines with at least 16 GB of RAM, or whatever RAM size is needed to deal with your huge data.

What is the most efficient way to display lots of data on a website?

I have an optimization question.
Let's say that I'm making a website that has a JSON file with 5,000 pairs (about 582 KB), and through the combination of 3 sliders and some select tags it is possible to display every value, so the time between displaying one pair and the next is in the microsecond range.
My question is: if the website is also meant to run on mobile browsers, where is it more efficient to keep the 5,000 pairs of data, in a JSON file or in the database? And why?
I am building a photo site with similar requirements, and I can say after months of investigation and experimentation that there is no easy answer to that question. But I will try to give you some hints:
Try to divide the data into chunks. For example, if your sliders select values between 1 and 100, instead of delivering exactly what the client selected, round up a bit, maybe +-10 or more; that way you can continue filtering on the client side without a server round trip. Save all the data in client memory before querying.
Don't render more than what is visible on the screen; JSON storage and filtering are fast, but the DOM is very slow, so minimize the number of visible elements.
Use 304 caching: whenever the client requests the same data twice, send a proper 304 response with an ETag. A good rule of thumb is to use something you can check very cheaply, like the max ID in the database, to see whether any new data has been added since the last call. If not, just send 304 and the client will reuse whatever it had the last time (see the sketch after this list).
Use absolute positioning. Don't even try to use CSS float or anything like that; it will not work. Just calculate the position of each element yourself. This will also help you with tip no. 2 (by filtering out all elements that are outside the visible screen). You can still use CSS transitions, which give nice animations when the sliders change.
You could experiment with IndexedDB to help with the client-side querying, but unfortunately support across browsers is still not good enough, plus you hit a storage ceiling; it's better to use the ordinary cache with proper headers.
Good luck!
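As a hedged illustration of the 304/ETag tip (my own sketch; the answer does not include code, and the framework, route, and max-ID check are assumptions), a server-side view could look roughly like this:

from flask import Flask, jsonify, request

app = Flask(__name__)

def current_max_id():
    # Placeholder: in practice something like SELECT MAX(id) FROM pairs, possibly cached.
    return 5000

def load_pairs():
    # Placeholder for the 5,000 key/value pairs (read from the JSON file or the database).
    return [{"id": i, "value": i * i} for i in range(1, 5001)]

@app.route("/api/pairs")
def pairs():
    etag = str(current_max_id())
    # If the client already sent a matching ETag, answer with 304 and no body.
    if etag in request.if_none_match:
        return "", 304
    response = jsonify(load_pairs())
    response.set_etag(etag)
    return response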
A database like MongoDB would be good for this. It still uses JSON syntax for storage, so you can use the values from the JSON file. Querying is very fast too, and you wouldn't have to parse the JSON file and store it in an object before using it.
Given the size of the data (just 582 KB), I would opt for the JSON file.
The drawback is that you pay a penalty when starting the app and loading the data into memory, but after that all queries run very fast in memory, which is a big advantage.
You need to weigh how many accesses (queries) your app would make to the database against loading the file just once, and think about whether your main target is mobile browsers or PCs.
For this volume of data I wouldn't try a database (another process consuming resources); just measure how many resources (time, memory) are needed to load the JSON file.
If the data is going to grow, then you will need to rethink this, or maybe split your JSON file following some criteria.

How do I insert/update/delete individuals with Jena SDB keeping maximum performance?

Recently I switched from the OWL API to Jena in the hope that the performance of inserting and querying data would increase.
So I started by loading my OWL ontology into a MySQL-based triple store using Jena SDB. For that I used
model.read("owl-concepts.turtle")
Jena creates about 1500 nodes within the triple store (in the MySQL table). Initially I was a little surprised by the high number of nodes, but this seems reasonable, as the OWL ontology contains approximately 80 OWL classes with several data and object properties.
To read data (individuals) from the ontology I used the Jena SDB interface. I retrieved a model and, based on that model, an OntModel. I used that OntModel to modify individuals, for instance:
ontModel.createIndividual(...);
ontModel.getIndividual(....);
individual.remove();
For the OntModel I used OWL_MEM; according to the documentation this should mean that no reasoning is applied.
I realized that, with the described approach, modifying individuals is not as fast as I expected. On average the insertion of a simple individual takes between 2 and 30 seconds.
So I started asking: is using the model interface in Jena the recommended way of modifying data, or does this approach have low performance, and should SPARQL be used for modifying data instead? My original plan was to use SPARQL only for the querying part...
I would be thankful for any expert opinion or your experience with Jena.
Using a persistent triple store -- particularly SDB -- with a reasoner is not a good idea. Reasoners often perform a large number of random accesses on the database, each of which carries a little overhead. Once you add them up, things get slow.
Similarly, use SPARQL rather than the Model or Ontology API, since the latter again generates lots of small accesses.
Given the size of your data, it probably fits in memory comfortably, so do that. You can always move the data en masse from and to the SDB store to persist it.
For just loading,
store.getLoader().startBulkUpdate();
...
store.getLoader().flushTriples();
(where store is the SDB store object)
but if you are adding and deleting, it's difficult to speed up.
One approach is to read all the data into memory, work there, and put it all back. You can do this with the data served by Fuseki, using the Graph Store Protocol part of SPARQL. You can use any storage backend.
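To make the read/modify/write-back idea concrete (my own sketch, not part of the answer; the Fuseki URL and dataset name are assumptions, and the added triple is just an example), the SPARQL Graph Store Protocol can be driven with plain HTTP:

import requests

# Assumed local Fuseki instance; /data with ?default addresses the default graph.
GSP = "http://localhost:3030/ds/data?default"

# 1. Read the whole graph into memory as Turtle.
graph = requests.get(GSP, headers={"Accept": "text/turtle"}).text

# 2. Work on the data locally (parse and edit it with your toolkit of choice).
graph += '\n<http://example.org/ind42> a <http://example.org/Concept> .\n'

# 3. Put the modified graph back in one request, replacing the stored copy.
requests.put(GSP, data=graph.encode("utf-8"), headers={"Content-Type": "text/turtle"})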

JSON or HTML: Which output can perform better?

I am thinking of improving website performance by moving rendering to the client side. The current stack is (router, Sphinx, DB) + HTML; I am thinking of changing this to (router, Sphinx, DB) + JSON.
All of the clients run i7 processors and don't care much about client-side rendering performance. We also have a client-side app that is ready to connect to a RESTful JSON API (this is not meant to start a discussion about client- vs. server-side rendering).
1) Rendering on the server takes about 20% of the time (the other 80% goes to routing, Sphinx, and the DB). I heard that outputting JSON takes about half the time it takes to output HTML, so I think it would be a 10% improvement, and that 10% could go into data processing. Am I right about that?
2) I believe that a 10% improvement for one server means that, to get the same amount of performance in a large-scale app with 100 physical servers, we would need about 10% fewer servers: in this case 90 instead of 100. Is this correct?
3) What is the best way in Ruby to get maximum performance when outputting JSON instead of any other format?
4) Taking everyday scenarios, what difference could it make performance-wise if we output JSON instead of HTML?
1, 2) Probably yes, but there could be unaccounted-for factors that make the performance increase smaller than you expect. For example, if the bottleneck is I/O, then since HTML creation is probably CPU-bound, reducing the CPU load will only let the CPUs idle more. The only way to really find out is to run reliable benchmarks with parallel request handling and get hard numbers.
Furthermore, spending the hours to develop client-side rendering might be more expensive than just paying for more server capacity... Moore's law is still holding, and doing that kind of optimization for such a small improvement is probably not worth the development cost. It is probably better to concentrate those dev resources on something that would increase revenue instead of chasing small savings.
3) JSON generation probably uses a native library, while HTML generation happens in Ruby template code, and native code is typically 1-2 orders of magnitude faster than interpreted (and not JIT-compiled) code at low-level operations. The higher-level the operation, the narrower the gap: if "generate JSON" is itself the high-level operation, then it is roughly equally fast whether you call it from Ruby or from compiled code.
4) Well, I'm not sure I understand the question, but see the answer to 1 and 2...
See http://openmymind.net/2012/5/30/Client-Side-vs-Server-Side-Rendering/ ; maybe that will help you.
The best way to find out for your particular case is to implement it and test.
You can use New Relic and Google Analytics (and maybe others as well) to measure client performance, rendering times, and user experience.