I am thinking of improving website performance by moving rendering to the client side. The current stack is (router, sphinx, db) + HTML, and I am considering changing it to (router, sphinx, db) + JSON.
All of the clients are running i7 processors and they don't care much about client-side rendering performance. We also have a client-side app which is ready to connect to a RESTful JSON API (this is not meant to start a discussion about client- vs server-side rendering).
1) Rendering on the server takes about 20% of the time (the other 80% goes to routing, sphinx, db). I have heard that outputting JSON takes about half the time of outputting HTML, so I figure that is a 10% overall improvement, and that 10% could go into data processing. Am I right about that?
2) I believe that a 10% improvement for one server means that, to get the same performance from a large-scale app with 100 physical servers, we would need 10% fewer servers: in this case 90 instead of 100. Is this correct?
3) How is it possible that, in Ruby, outputting JSON gives the best performance compared to any other format?
4) In everyday scenarios, what difference could outputting JSON instead of HTML make performance-wise?
1, 2) Probably yes, but there could be unaccounted-for factors which make the performance increase smaller than you expect. For example, if the bottleneck is IO, and HTML creation is probably CPU-bound, then reducing the CPU load will only let the CPUs idle more. The only way to find out for real is to run reliable benchmarks with parallel request handling and get hard numbers.
Further, spending the hours to develop client-side rendering might be more expensive than just paying for more server capacity... Moore's law is still holding, and doing that kind of optimization for such a small improvement is probably not worth the development cost... It is probably better to concentrate those dev resources on something that would increase revenue instead of trying to make small savings.
3) JSON generation probably uses a native library, while HTML generation happens in Ruby script code, and native code is typically 1-2 orders of magnitude faster than interpreted (and not JIT-compiled) code at low-level operations. The higher-level the operation, the narrower the gap, so if "generate JSON" is the high-level operation, then it is roughly equally fast whether you call it from Ruby or from compiled-language code.
4) Well, I'm not sure I understand the question, but see the answer to 1, 2...
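To make point 3 concrete, here is a minimal sketch of the kind of measurement meant above. The question is about a Ruby stack; Python is used here purely to illustrate the approach of timing serializer output against template output on the same data, and all names and numbers in it are made up.

import json
import timeit
from string import Template

rows = [{'id': i, 'title': f'item {i}', 'price': i * 1.5} for i in range(1000)]
row_tpl = Template('<tr><td>$id</td><td>$title</td><td>$price</td></tr>')

def render_json():
    # native JSON serializer doing the whole job in C
    return json.dumps(rows)

def render_html():
    # per-row template substitution in interpreted code
    return '<table>' + ''.join(row_tpl.substitute(r) for r in rows) + '</table>'

print('JSON:', timeit.timeit(render_json, number=100))
print('HTML:', timeit.timeit(render_html, number=100))

Whatever the result on your machine, only a benchmark of your own templates and data, under parallel request handling, will give the hard numbers mentioned in 1, 2).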
See http://openmymind.net/2012/5/30/Client-Side-vs-Server-Side-Rendering/ ; maybe that will help you.
The best way to find out for your particular case is to implement it and test.
You can use New Relic and Google Analytics (and maybe others as well) to see client performance, rendering times, and experience.
Related
There's a similar question about streaming large results but the answer just points at docs and no clear answer emerges.
I believe that merely treating a full result set as a stream still takes a lot of memory on the JDBC driver side.
I am wondering if there's any clear-cut pattern, or best practice, for making it work, especially on the JDBC driver side.
In particular, I am not sure why setFetchSize(Integer.MIN_VALUE) is a very good idea, as it seems far from optimal if it means each row is sent on its own on the wire.
I believe libraries like jOOQ and Slick already take care of that... and I am curious how to accomplish it with and without them.
Thanks!
I am wondering if there's any clear-cut pattern, or best practice, for making it work, especially on the JDBC driver side.
The best practice is not to do synchronous streaming but rather to fetch in moderately sized chunks; however, avoid using OFFSET. If you're doing a batch process, this can be facilitated by first selecting and pushing the data into a temporary table (i.e. turn the original results you want into a table first and then select chunks from that table... databases are really fast at copying data internally).
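The thread is about JDBC, but the chunking pattern itself is language-agnostic. Here is a minimal keyset-pagination sketch in Python, fetching in chunks without OFFSET; sqlite3 is only used to keep it self-contained, and the table and column names are made up.

import sqlite3

def fetch_in_chunks(conn, chunk_size=1000):
    # keyset pagination: remember the last primary key seen and ask for the
    # next chunk strictly after it, instead of paging with OFFSET
    last_id = 0
    while True:
        rows = conn.execute(
            'SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?',
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            break
        for row in rows:
            yield row
        last_id = rows[-1][0]

# tiny in-memory demo so the sketch runs as-is
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)')
conn.executemany('INSERT INTO events (payload) VALUES (?)',
                 [(f'row {i}',) for i in range(10_000)])

for event_id, payload in fetch_in_chunks(conn, chunk_size=500):
    pass  # process each row here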
Synchronous streaming (i.e. handing back an iterator) in general does not scale. It does not scale well for batch processing and it certainly does not scale for handling lots of clients. This is why the drivers vary and do so many different things: it is fairly difficult to figure out how many resources to load, because it is a pull model. Async streaming (a push model) would probably help, but unfortunately the JDBC standard does not support async streaming.
You might notice that this is one of the reasons why many of the wrappers around JDBC, such as Spring JDBC, do not return Iterators (along with the fact that the resource also needs to be cleaned up manually). Some of the wrappers provide iterators, but really they just turn the results into a list.
Your link to the Scala version is rather disturbing in that it's upvoted, given the stateful nature of managing a ResultSet... it's very un-Scala-like... I'm not sure those folks know they have to consume the iterator or close the connection/ResultSet properly, which requires a fair amount of imperative programming.
While it may seem inefficient to let the database decide how much to buffer, just remember that most database connections are extremely heavy memory-wise (at least on Postgres they are). So if you take a long time streaming and have many clients, you're going to have to create more connections and put a serious burden on the database. Not to mention that the default buffers have probably been highly optimized (i.e. the result-set size that the client ends up with).
Finally, for batch processing, chunks can be processed in parallel, which is obviously more efficient than a synchronous pipeline, and they can be restarted (without having to rework already-processed data) if a problem occurs.
I am new to programming and am working on pushing real-time data from a PLC to a web page, either by deploying HTML5 on the WAGO or via a Modbus driver wrapper. I honestly have tried to research this but don't know where to start. It will be a closed private network with little to no influence from the outside web. I am simply looking to display a single piece of live information as a proof of concept. Basically I'm trying to custom-design a Groov program.
You might want to look into using OPC. Kepware & SoftwareToolbox are just 2 of many vendors that offer tools to help you get your data the way you want it.
There is an existing tool that does what you want, but I am under the impression you want to write one from scratch. The existing tool is http://www.softwaretoolbox.com/cogentdatahub/ if you are interested in looking at it for ideas.
I've been able to interface with a PLC using Python and Modbus TCP, with a Raspberry Pi as the web server. Python is a quick and easy-to-learn language, and WebSockets are the HTML5 component best suited to real-time data.
Simple connect code (after you install everything):
from pymodbus.client.sync import ModbusTcpClient as ModbusClient
from time import sleep

client = ModbusClient('ip_address_of_modbus_IO')  # replace with your device's IP

if client.connect():
    # read one discrete input starting at address 200 and print its state
    print(client.read_discrete_inputs(200, 1).bits[0])
    client.write_coil(0, True)   # switch coil 0 on
    sleep(100)                   # wait 100 seconds
    client.write_coil(2, True)   # switch coil 2 on
    client.close()
found here:
http://simplyautomationized.blogspot.com/2013/09/home-automation-project-2-rpi-light.html
You can create a WebSocket broadcast server using the example here:
http://simplyautomationized.blogspot.com/2015/09/raspberry-pi-create-websocket-api.html
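This is not the code from the linked post, just a minimal sketch of the idea, assuming the third-party "websockets" package (a recent version, where the handler takes a single argument) and the same pymodbus client as above; the input address 200 and the 1-second poll interval are placeholders.

import asyncio
import websockets
from pymodbus.client.sync import ModbusTcpClient as ModbusClient

plc = ModbusClient('ip_address_of_modbus_IO')
clients = set()

async def handler(ws):
    # remember each connected browser so the poller can broadcast to it
    clients.add(ws)
    try:
        await ws.wait_closed()
    finally:
        clients.discard(ws)

async def poll_and_broadcast():
    while True:
        # blocking pymodbus call; fine for a sketch, not for heavy load
        state = plc.read_discrete_inputs(200, 1).bits[0]
        websockets.broadcast(clients, str(state))
        await asyncio.sleep(1)

async def main():
    async with websockets.serve(handler, '0.0.0.0', 8765):
        await poll_and_broadcast()

if plc.connect():
    asyncio.run(main())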
Fortunately you can not push data to a browser.
The Internet would become an even greater mess if you could.
To solve this, have your webpage contain a timer, written in JavaScript.
Every second or so, it makes an AJAX request (e.g. using jQuery's implementation) to the server, which then delivers the (almost) real-time data.
The webpage then displays that in some DOM element, e.g. an empty DIV.
So it's the browser polling your server.
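On the server side (staying with the Python used earlier in this thread), the endpoint that the JavaScript timer polls can be very small. A hypothetical Flask sketch; read_current_value() is a made-up stand-in for however you actually read the live value.

from flask import Flask, jsonify

app = Flask(__name__)

def read_current_value():
    # placeholder: read from the PLC, a cache, a file, etc.
    return 42

@app.route('/value')
def value():
    # the browser's timer requests this every second and updates the DIV
    return jsonify(value=read_current_value())

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)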
#BlueDog
The data is "almost" realtime because sampling once a second gives a delay of at least one second. In the ideal case, as soon as data changes, it would be pushed to the browser. Unfortunately the browser has no way of knowing that anything changed, so the best it can do is frequently "ask" for updates (polling).
How much the delay is depends on your poll frequency. If it's once per second one has to add the delays for transmission of the page request and the reply of the server. The transmission time depends on your network (which may be the Internet with all uncertainty involved). If the backbones involved have enough capacity I expect overall delay to be between 1 and 1.5 seconds. With a dedicated network and even more frequent polling, I expect that 0.5 seconds should be possible. These are however estimated averages. If I request a page over the Internet and my provider (again) has a problem, it may be hours before I receive what I want. Also things like virus scanners and OS updates may spoil your game.
So, practically: with a good broadband connection, a stable browser and the right process priorities it should be possible to get below 1 second overall delay (incl. poll time interval) for 95% of the time. Be prepared to reboot the client every few days. Most browsers leak memory and most OS'es do so too.
I am writing a web application for mapping real-time GPS coordinates, coming from a GPS device, on Google Maps, for fleet management.
Since the flow of data from the GPS device to the web application is very fast, the load on the database becomes heavy, and because the database is also being queried every 5 seconds (via AJAX from the web browser running the website) it becomes heavier still.
Keeping the updates real-time is becoming very difficult; a lag of 30 to 60 seconds builds up between the actual update and its visibility on the website.
I am using Django + Apache + MySQL on CentOS 6.4 64 bit.
Any advice on what direction I should move in to make the processing/visibility of the data more real-time would be helpful.
I would suggest you use a NoSQL database like MongoDB. It would really help you achieve real-time application performance.
Have a look at Django-with-MongoDB.
And if possible, try replacing the default Python interpreter with PyPy.
I think these two are enough to give you the best performance. :)
Understanding Django-using-PyPy
Also, for the front end you should use KnockoutJS or AngularJS.
Some tips:
Avoid XML, especially a DOM-based XML parser (this blows up the data by a factor of 100). A lat/long coordinate without time needs 8 bytes, not more.
Favor a binary representation of the coordinates and parse it by hand, instead of using slow generated parsing code that probably uses reflection (see the sketch after this list).
Try to minimize the use of databases, especially relational ones.
Raise the interval at which clients send data: e.g. every 20 minutes instead of every 5.
If you use a DB, minimize the transactions; try to do all the processing in one transaction.
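To make the binary-representation point concrete, a minimal sketch (my own illustration, not from the answer above): pack a lat/long pair into 8 bytes as two 32-bit floats and unpack it by hand on the other end. 32-bit floats give you roughly metre-level precision; use doubles (16 bytes) if you need more.

import struct

def pack_coordinate(lat, lon):
    # '!ff' = network byte order, two 4-byte floats -> 8 bytes total
    return struct.pack('!ff', lat, lon)

def unpack_coordinate(payload):
    lat, lon = struct.unpack('!ff', payload)
    return lat, lon

packed = pack_coordinate(52.3702, 4.8952)
print(len(packed))               # 8
print(unpack_coordinate(packed))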
Possible Duplicate:
Very large HTTP request vs many small requests
I need a 2D array (as JSON) to be sent from server to client. It would be around 400x400 in size, with each entry around 4 characters of text, so that makes it around 640KB of data.
Which of the following extreme approaches is better?
1) I make one large HTTP request for all the data in one go.
2) I make 400 requests, each asking for a single row (around 1.6 KB each).
I believe the optimal approach would be somewhere in the middle. Could anyone give me an idea of what the optimal single request size might be for this data?
Thanks,
When making a request you always have to deal with some overhead (like the DNS lookup and opening and closing the connection), so it might be wiser to make one big request.
You will also likely get better gzip/deflate compression with one big request.
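As a rough illustration of the compression point (my own sketch, not measurements from a real app): compare the gzipped size of the whole 400x400 grid sent as one JSON body against the summed gzipped sizes of 400 per-row bodies. The dummy data here is far more repetitive than real text, so treat the absolute numbers as illustrative only.

import gzip
import json

grid = [['abcd'] * 400 for _ in range(400)]   # ~640 KB of 4-character entries

one_request = len(gzip.compress(json.dumps(grid).encode()))
per_row = sum(len(gzip.compress(json.dumps(row).encode())) for row in grid)

print('one big request :', one_request, 'compressed bytes')
print('400 row requests:', per_row, 'compressed bytes (plus 400x HTTP overhead)')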
Depends on the application and the effect you wish to achieve. Here are two scenarios:
If you are dealing with a GUI, then perhaps chunking is a good idea: a small chunk updates the visuals, giving the human an illusion of 'speed'. Here you want to chunk the data logically, according to the GUI's update requirements. You can apply the same concept to prioritizing in any other pseudo-real-time scenario.
If, on the other hand, you are just dumping this data, then don't chunk, since 100 six-byte requests are overall significantly more time-consuming than one 600-byte request.
Generally speaking, however, network packet (TCP) chunking and delivery is FAR more optimized than whatever you could come up with at the application layer (HTTP). Multiple requests/chunks mean multiple fragments.
It is generally a futile effort to try to do transport-layer optimizations using an application-layer protocol, and IMHO it defeats the purpose of both :-)
If you have real time requirements for whatever reason, then you should take control of the transport itself and do optimization there (but that does not seem to be the case).
Happy coding
Definitely go with one request, and if you enable gzip compression on the server you won't be sending anything near 640KB.
All the page-speed optimisation tools (e.g. YSlow, Google PageSpeed, etc.) recommend reducing the number of requests to speed up page load times.
A small number of HTTP requests would be better, so make one request.
I've recently begun to unveil and slowly roll out a homemade CMS. The site allows a lot of customization, with movement towards internationalization and customization to a level that doesn't require touching source code. This is a personal project, and the entire intent was to see how far I can push my own programming limits (the question of distribution of a CMS that handles a blog, a webcomic, and a small forum isn't one that I'm willing to consider, not until I clean it up and work on it some more; as well, seeing as it's an amateur project, I doubt it has any gravity compared to other, more refined projects... but those are not topics that concern the question at hand).
I've added some instrumentation code that lets me see how fast each page is generated and how many queries are run; on average I'm seeing 9-13 (typically around 12) MySQL queries performed per page. The average time to generate a page is somewhere between 10-20 ms. Now, not having any experience with professional design, what is the optimum I should be striving for?
What are ways to reduce generation time (or, with an average of 15 ms/page, is this not even a concern), or tactics for reducing the number of queries on a page where most of the content, including things like menu items, is loaded from a MySQL database?
Mind you, this is a very broad question; it isn't my intent to ask a general question or spark conversation, but to find out ways of reducing the load (if any) that such a system could create on a server.
Using a PHP opcode cache will dramatically cut down the time taken to load and compile PHP scripts, because the compiled bytecode is cached and the parse/compile step is skipped on subsequent requests.
Turning on the MySQL query cache is generally (though not always) a good idea.
Rather than focusing on the number of queries, focus on reducing the time those queries take by optimising your queries. It is often much more efficient to have a larger number of small, optimised queries than to try and reduce the number of queries.
Use a profiler such as the one built into XDebug. Together with a visualiser like KCacheGrind or WinCacheGrind, optimising code really helps when you know what to focus on. It's not worth optimising something that contributes only a negligible amount to your total execution time, and it's worth getting to know what everything in *CacheGrind means.
My PHP content management system usually loads a page in about the same amount of time (down to a minimum of 8ms when everything is a cache hit). But very occasionally, when it does something complex, it may take over 500ms. When you're concerned about user experience, the typical time matters more than the outliers; but when you're concerned about server load, the average time matters more, so those 500ms outliers are suddenly quite important.
If you are mainly developing these sites for small companies, or otherwise don't predict or imagine high traffic (i.e. nothing like Digg/Facebook/etc.), then an average of 15ms should be fine.
May I ask what the 12 queries are for? I imagine they are for getting menu items, getting page content and the like. There are various methods of combining/optimising queries, so if you post a few of them, I (and other stackers) may be able to help you optimise them.
It depends... as it always does with performance questions. If the system currently meets your performance requirements, then don't worry too much.
Generally, if your page generation time is 15ms it will only be a fraction of the total click-to-glass time that the user experiences; see the Yahoo exceptional performance pages. There will be other things to look at in order to get the fastest possible page load time.
On the server, the chances are that the DB is caching the results of nearly all, if not all, of the queries you are running, hence the very fast page timings. If you haven't already done so, you might want to load up a larger data set to test the app; you might find that performance degrades with the size of the data set.