Reducing Ruby on Rails traffic to the database - mysql

The networking team has flagged our Ruby on Rails application as one of the top producers of network traffic on our network, specifically from packet traffic between the app server and the database server (mysql).
What are the recommended best practices to reduce traffic between a Rails app and the database? Persistent database connections?

Is it an actual problem, or do they ding the top 3 db consumers no matter what? Check your logs or have them supply you with a log of queries that they think are problematic.
Beyond that, check to see if you're doing bad things like making model calls from your views in loops. Your logs should tell you what's going on here: if you see each partial paired with a query every time it's rendered, that's a strong sign that your logic should be pulled back into the models and controllers.
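To make the "queries in a loop" pattern concrete, here is a minimal sketch in Python, with an in-memory SQLite database standing in for MySQL. The `posts`/`comments` tables and the function names are hypothetical and only illustrate the shape of the problem. In Rails, the batched variant roughly corresponds to eager loading with `includes`.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database (SQLite stands in for MySQL here)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
    """)
    db.executemany("INSERT INTO posts VALUES (?, ?)",
                   [(i, f"post {i}") for i in range(1, 6)])
    db.executemany("INSERT INTO comments (post_id, body) VALUES (?, ?)",
                   [(i, f"comment on {i}") for i in range(1, 6) for _ in range(2)])
    return db


def comments_n_plus_one(db):
    """One query for the posts, then one more query per post (N+1 round trips)."""
    queries = 1
    posts = db.execute("SELECT id FROM posts ORDER BY id").fetchall()
    result = {}
    for (post_id,) in posts:
        rows = db.execute(
            "SELECT body FROM comments WHERE post_id = ? ORDER BY id", (post_id,)
        ).fetchall()
        queries += 1
        result[post_id] = [body for (body,) in rows]
    return result, queries


def comments_batched(db):
    """Two queries in total, no matter how many posts are on the page."""
    post_ids = [pid for (pid,) in db.execute("SELECT id FROM posts ORDER BY id")]
    placeholders = ",".join("?" * len(post_ids))
    rows = db.execute(
        f"SELECT post_id, body FROM comments "
        f"WHERE post_id IN ({placeholders}) ORDER BY id",
        post_ids,
    ).fetchall()
    result = {pid: [] for pid in post_ids}
    for post_id, body in rows:
        result[post_id].append(body)
    return result, 2
```

Both functions return the same data; the difference is only in how many round trips to the database server they make, which is what shows up as network traffic.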

Fire up Wireshark or another network scanner and look for the biggest packets or small packets that are too frequent - to identify the specific, troublesome queries.
Then, before even considering caching, check if that query can really be cached or if it just pulls too much data you are not using.
At this point, there are too many different possible causes, each with its own recommended practices.

Related

Does Symfony3 / Doctrine open one MySQL connection per visitor?

So I have developed this website with Symfony3 and Doctrine. I have one major concern about performance with MySQL, and more specifically the number of simultaneous open connections.
For the moment, one to five users are online on the website. What happens if, let's say, 1,500 users connect within one minute? Does Symfony3 or Doctrine handle this kind of situation? How can I be sure the website doesn't go down with the MySQL "Too many connections" error?
And if I go up to 5,000? And 10,000? The server has 4 GB of RAM and a 2.40 GHz single-core processor, but I wouldn't worry about the hardware as I'm more concerned about MySQL.
These situations already happened in the past, but I was running the website with WordPress and the W3 Total Cache plugin. Should I consider using a cache manager such as memcached or something similar?
In short, I'm concerned about the website becoming unavailable in case of sudden high traffic (I thought of the MySQL "Too many connections" error first, but I might be missing something even more important).
Thanks for enlightening me on this one, as I'm not fully familiar with performance issues in Symfony.
I believe it does open one connection per visitor. Regardless of whether it does or not, however, neither Symfony nor Doctrine has a magic bullet to handle every load/connection scenario.
Why don't you use a load testing tool (there are many) and see how it actually pans out? In my experience predicting a bottleneck is useless, as they will always crop up where you least expect it.
For example, the MySQL connection limit is only one part of the optimisation puzzle. It's no good just worrying about connection limits; you need to respond to web requests as quickly and efficiently as possible to free up MySQL connections (and the other resources your app is using). If your server is slow, you will run out of connections (or some other resource) almost immediately under significant load, regardless of MySQL connection limits.
That said, those server specifications seem a little low for 5-10k users per minute. I wouldn't expect a machine like that to handle that kind of load without some serious optimisation/caching/etc.
The symfony performance page is a good starter, and there is also a good article on caching - there's a ton of available material on the subject. Good luck! :)
If you use php-fpm it depends on pm.max_children in fpm/pool.d/www.conf.
pm.max_children refers to the maximum number of concurrent PHP-FPM processes allowed to exist in such a pool. If the volume of incoming requests requires the creation of more PHP-FPM processes than the number allowed by the max_children limit, those additional requests are backlogged in a queue to await service.
So when pm.max_children > max_connections (my.cnf) and active users > max_connections you will get "Too many connections".
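The arithmetic behind that rule can be sketched as a small check. This is only an illustration, assuming each PHP-FPM worker holds at most one MySQL connection at a time; the function names and the `web_servers` and `reserved` parameters are hypothetical, and the numbers in the usage note are example values.

```python
def worst_case_connections(max_children, web_servers=1, connections_per_worker=1):
    """Upper bound on simultaneous MySQL connections opened by the PHP-FPM pools."""
    return max_children * web_servers * connections_per_worker


def can_exhaust_mysql(max_children, max_connections, web_servers=1, reserved=0):
    """True if a fully busy pool could hit "Too many connections".

    `reserved` is a hypothetical allowance for connections used by other
    clients (cron jobs, admin tools, replication).
    """
    available = max_connections - reserved
    return worst_case_connections(max_children, web_servers) > available
```

For example, with `pm.max_children = 200` and `max_connections = 151`, `can_exhaust_mysql(200, 151)` is `True`, while a pool of 100 workers against the same limit cannot exceed it on its own.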

Implementing dynamically updating upvote/downvote

How can I implement a dynamically updating vote count similar to Quora's? Whenever a user upvotes an answer, the change is reflected automatically for everyone who is viewing that page.
I am looking for an answer that addresses the following:
Do we have to keep polling for upvote counts for every answer? If yes, then how do we manage the server load arising from so many users polling for upvotes?
Or should we use WebSockets/push notifications? How scalable are these?
How should the upvote/downvote counts be stored in a database or in memory to support this? How is the number of reads/writes controlled? My backend database is MySQL.
The answer I am looking for may not be exactly how Quora is doing it, but rather how this can be done using available open-source technologies.
It's not the back-end system details that you need to worry about but the front end. Keeping connections open all the time is impractical at any real scale. Instead you want the opposite: to be able to serve and close connections from the back end as fast as you can.
WebSockets is a sexy technology, but again, in the real world there are issues with proxies; if you are developing something that should work on a variety of screens (desktop, tablet, mobile), this might become a concern for you. Even good old long polling might not work through firewalls and proxies.
Here is the good news: I think
"keep polling for upvote counts for every answer"
is a totally good solution in this case. Consider the following:
your use-case does not need any real real-time updates. There is little harm to see the counter updated a bit later
for very popular topics you would like to squash multiple up-votes/down-votes into one anyway
most of the topics will see no up-vote/down-vote traffic at all for days/weeks, so keeping a connection open, waiting for an event that never comes is a waste
most users will never up-vote/down-vote; they just came to read a topic, so the read/write ratio of your topic stats will be heavily skewed toward reads
network latencies vary hugely across clients, and you will see horrible transfer rates even for 100 B HTTP responses; while a sluggish client is fetching its response byte by byte, your precious server connection and, more importantly, a thread on a back-end server are busy
Here is what I'd start with:
have browsers periodically poll for a new topic stat, after the main page loads
keep your MySQL, keep counters there. Every time there is an up/down vote update the DB
put Memcached in front of the DB as a write-through cache, i.e. every time there is an up/down vote, update the cache, then update the DB. Set an explicit expire time for a counter there of 10-15 minutes. Every time a counter is updated, its expire time is extended automatically.
design these polling HTTP calls to be cacheable by HTTP proxies; set the expire and TTL HTTP headers to 60 sec
put a reverse proxy (Varnish, nginx) in front of your front-end servers and have this proxy cache the said polling calls. This takes care of the second-level cache and helps free up back-end server threads quicker; see the network latency concern above
set up your reverse proxy component to talk to the memcached servers directly without making a call to the back-end server; yes, you can do this with both Varnish and nginx.
there is no fancy schema for storing such data, it's a simple inc()/dec() operation in memcached, note that it's safe from the race condition point of view. It's also a safe atomic operation in MySQL UPDATE table SET field = field + 1 WHERE [...]
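The write-through idea from the list above could be sketched like this. It is a rough illustration, not a production design: `TTLCache` is a hypothetical in-process stand-in for Memcached, SQLite stands in for MySQL, and the `votes` table and class names are made up. With a real Memcached you would use its atomic increment/decrement instead of the read-modify-write shown in `vote`.

```python
import sqlite3
import time


class TTLCache:
    """Tiny in-process stand-in for Memcached: entries expire after `ttl` seconds."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


class VoteCounter:
    COUNTER_TTL = 15 * 60  # seconds; the answer suggests 10-15 minutes

    def __init__(self, db, cache):
        self.db = db
        self.cache = cache
        self.db_reads = 0  # instrumentation for this example only

    def get(self, answer_id):
        """Serve from the cache; fall back to the DB only on a miss."""
        cached = self.cache.get(answer_id)
        if cached is not None:
            return cached
        self.db_reads += 1
        row = self.db.execute(
            "SELECT score FROM votes WHERE answer_id = ?", (answer_id,)
        ).fetchone()
        score = row[0] if row else 0
        self.cache.set(answer_id, score, self.COUNTER_TTL)
        return score

    def vote(self, answer_id, delta):
        """Write-through: update the cache (which renews its TTL), then the DB."""
        new_score = self.get(answer_id) + delta  # not atomic in this sketch
        self.cache.set(answer_id, new_score, self.COUNTER_TTL)
        # SQLite syntax; MySQL would use INSERT ... ON DUPLICATE KEY UPDATE.
        self.db.execute(
            "INSERT OR IGNORE INTO votes (answer_id, score) VALUES (?, 0)",
            (answer_id,),
        )
        # The increment itself is atomic inside the database.
        self.db.execute(
            "UPDATE votes SET score = score + ? WHERE answer_id = ?",
            (delta, answer_id),
        )
        self.db.commit()
        return new_score


def make_vote_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE votes (answer_id INTEGER PRIMARY KEY, score INTEGER NOT NULL)"
    )
    return db
```

After the first vote on an answer, reads are served from the cache until the counter has gone `COUNTER_TTL` seconds without an update, so polling traffic for active answers does not reach the database.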
Aggressive multi level caching covers your read path: in Memcached and in all http caches along the way, note that these http poll requests will be cached on the edges as well.
To take care of the long tail of unpopular topics, make the HTTP TTL for such responses inversely proportional to popularity.
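One possible way to pick such a TTL is sketched below. The popularity measure (`votes_last_hour`), the bounds, and the function names are all assumptions for illustration; the 60-second floor matches the TTL suggested for the polling calls.

```python
def poll_ttl_seconds(votes_last_hour, base_ttl=60, max_ttl=3600):
    """Cache lifetime for a poll response, shrinking as vote activity grows.

    A topic with no recent votes gets max_ttl; very active topics are
    clamped at base_ttl so the proxies still absorb most of the polling.
    """
    ttl = max_ttl / (1 + votes_last_hour)
    return int(min(max_ttl, max(base_ttl, ttl)))


def cache_headers(votes_last_hour):
    """HTTP headers that let browsers and shared proxies cache the response."""
    ttl = poll_ttl_seconds(votes_last_hour)
    return {"Cache-Control": f"public, max-age={ttl}"}
```

With these example bounds, a topic with no votes in the last hour is cached for an hour, one vote halves that, and anything with roughly 59 or more votes per hour sits at the 60-second floor.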
A read request will only infrequently get to the front-end server: when the HTTP cache has expired and memcached does not have it either. If that is still a problem, add memcached servers and increase the expire time in memcached across the board.
After you are done with that, you have all the reads taken care of. The only problem you might still have, depending on the scale, is a high rate of writes, i.e. the flow of up/down votes. This is where your single MySQL instance might start showing some lag. Fear not: proceed along the old beaten path of sharding your instances, or add NoSQL storage just for counters.
Do not use any messaging system unless absolutely necessary or you want an excuse to play with it.
Websockets, Server Sent Events (I think that's what you meant by 'push notifications') and AJAX long polling have the same drawback - they keep underlying TCP connection open for a long time.
So the question is how many open TCP connections can a server handle.
Basically, it depends on its OS, the number of file descriptors (a config parameter) and available memory (each open connection reserves read/write buffers).
Here's more on that.
We once tested the possibility of keeping 1 million WebSocket connections open on a single server (Windows 7 x64 with 16 GB of RAM, JVM 1.7 with 8 GB of heap, using Undertow beta to serve Web requests).
Surprisingly, the hardest part was generating the load on the server.
It managed to hold 1M. But again, the server didn't do anything useful; it just received requests, went through the protocol upgrade and kept those connections open.
There was also some number of lost connections, for whatever reason. We didn't investigate. But in production you would also have to ping the server and handle reconnection.
Apart from that, WebSockets seem like overkill here, and SSE still isn't widely adopted.
So I would go with good old AJAX polling, but optimize it as much as possible.
Works everywhere, simple to implement and tweak, no reliance on an external system (I had bad experience with that several times), possibilities for optimization.
For instance, you could group updates for all open articles in a single browser, or adjust update interval according to how popular the article is.
After all it doesn't seem like you need real-time notifications here.
Sounds like you might be able to use a messaging system like Kafka, RabbitMQ, or ActiveMQ. Your front end would send votes to a message channel and receive them with a listener, and you could have a server-side piece persist the votes to the DB periodically.
You could also accomplish your task by polling your database, and by incrementing/decrementing a number related to a post via a stored procedure. There are a bunch of options here, and it depends on how much concurrency you may be facing.

Nginx vs Apache to solve load issue on website

We have a web application that has 10-12 pages with many POST/GET DB calls. We usually have an Apache crash or other problem when site traffic reaches 1,000 or so concurrent users, which is a very small number, even though we have upgraded the server with plenty of RAM and resources. Our system admin did load testing with blitz and other custom scripts and is suggesting we move away from Apache. Some things do not make sense to me; for example, Apache should not be too bad at handling a few thousand concurrent users, considering we have Cloudflare for caching. Here is what he suggested:
1. Replacement of Apache+mod_fcgi with Nginx+php-fpm, which can make the server handle many more users, and then test it.
or
2. For testing: we need 10-20 servers to run a scenario from. Basically, what is needed is a more complex blitz.io analogue. Create one server, which takes all those hours, then just clone it in the cloud and pay for about 1 hour of testing multiplied by the number of servers needed.
Once again, there are many DB calls and heavy .htaccess use. Also, what makes Nginx better than Apache in this case?
I would check this comparison first. Basically, nginx is event based, so it's able to handle more requests concurrently. However, as the MySQL DB seems to be the choke point here, it's very possible that nginx wouldn't solve all your problems. Perhaps moving to a NoSQL kind of database, that's better at scaling horizontally, would help (if that's feasible).

Prevent 'too many connections' (ConnectionPool is not the answer, looking for MySQL server-side solution)

A few weeks ago, I posted a question about queuing database access requests to prevent the 'too many connections' error when massive concurrent DB requests happen. People told me a ConnectionPool is the right way to go, which I agreed with at the time. However, I finally realized this is not the solution, especially when there are a lot of different clients accessing the MySQL server over the network: because a connection pool lives on the client side, it cannot prevent the sum of connections from all clients from exceeding the max connection number of the MySQL server.
I think there should be some middleware on the MySQL server working as a queue or pool. Is anybody familiar with this? Thank you.
I know this question is widely asked, and I am surprised that there seems to be no complete solution for it.
HAProxy should perform TCP-level queueing for your purpose. It might be better, though, to build an application server in the middle, to handle the incoming flow at a more application-aware level than TCP. This could require rewriting both server and clients, but could give you more control over what's happening.
What you ask is actually a pretty complicated problem.
First of all you need to decide whether mis-alignments in data are acceptable, for example: if you store in the database the number of Likes received, and you ask this number at 12:00:00, and the number in the DB is 500, and someone posts a LIKE at 12:00:01, and you query it again at 12:00:02; is it OK to receive "500" again, even if the correct number should be 501, provided that in a little time the answer "501" does come out?
If this is acceptable (the infamous "301 bug" in YouTube), then you might start caching some SELECT responses.
You might even cache them in middleware, i.e. have a special process running continuously and hogging ONE connection to MySQL, and answering requests in a queue. You might run it internally in the server as a Web server on port 8001 and have an Apache ReverseProxy, HAproxy, pound, or NginX location to proxy it outside.
You can do the same for special UPDATE/DELETE queries even if it's trickier.
It would be best to cache queries running asynchronously through AJAX first, if any, because serializing queries with a proxy is liable to perceptibly slow down the application.
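As a rough illustration of the "one process hogging ONE connection and answering requests in a queue" idea, here is a minimal in-process sketch in Python. A worker thread stands in for the separate middleware process and SQLite stands in for MySQL; the class name, the `likes` table, and the use of `concurrent.futures.Future` to hand results back are all choices made for this example.

```python
import queue
import sqlite3
import threading
from concurrent.futures import Future


class SingleConnectionWorker:
    """Serializes all queries through one database connection."""

    def __init__(self, setup_sql=""):
        self._requests = queue.Queue()
        self._setup_sql = setup_sql
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # The single connection is created and used only by this thread.
        conn = sqlite3.connect(":memory:")
        conn.executescript(self._setup_sql)
        while True:
            item = self._requests.get()
            if item is None:  # shutdown signal
                break
            sql, params, future = item
            try:
                rows = conn.execute(sql, params).fetchall()
                conn.commit()
                future.set_result(rows)
            except Exception as exc:
                future.set_exception(exc)
        conn.close()

    def submit(self, sql, params=()):
        """Queue a query; callers wait on the returned Future for the rows."""
        future = Future()
        self._requests.put((sql, params, future))
        return future

    def close(self):
        self._requests.put(None)
        self._thread.join()
```

However many callers submit work at once, the database only ever sees one connection. The cost is the one mentioned above: queries are serialized, so a slow query delays everything queued behind it.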
You have a threefold target:
run queries on MySQL as fast as possible (look into indexing and MySQL caching) in order to free the ConnectionPool and keep it as lightly loaded as possible.
refactor the application in order to extract all information from queries (e.g., the number of rows with a certain property AND those rows as data are often retrieved using TWO queries, but with proper management you need only one and a SQLNumRows() call. Also, quite often similar queries returning different information are run, when a single query might have returned all the information in one go: typically, one query to check user/password, another to fetch the complete user profile).
divert as many calls as possible to something not bound to MySQL at all (NginX, middleware) or only lightly bound to it (queuing process); in the latter case, use a known number of connections in order to run predictably.
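The "two queries where one would do" point from the list above can be sketched as follows, again with SQLite standing in for MySQL and a hypothetical `users` table. It assumes the page needs all the matching rows anyway; for paginated listings a separate count can still be justified.

```python
import sqlite3


def make_users_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)")
    db.executemany(
        "INSERT INTO users (name, active) VALUES (?, ?)",
        [("ann", 1), ("bob", 0), ("cid", 1), ("dee", 1)],
    )
    return db


def active_users_two_queries(db):
    """Typical pattern: one round trip for the count, another for the rows."""
    count = db.execute("SELECT COUNT(*) FROM users WHERE active = 1").fetchone()[0]
    rows = db.execute(
        "SELECT id, name FROM users WHERE active = 1 ORDER BY id"
    ).fetchall()
    return count, rows


def active_users_one_query(db):
    """Same information from a single round trip: count the rows already fetched."""
    rows = db.execute(
        "SELECT id, name FROM users WHERE active = 1 ORDER BY id"
    ).fetchall()
    return len(rows), rows
```

Both functions return the same pair, but the second halves the number of queries sent for that page.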
Unfortunately there's no easy "magic bullet" to solve this problem (except of course increasing the number of connections, maybe replicating the DB on several hosts running as master-slave. While not really a magic bullet, it is easier to design and implement).

economical way of scaling a php+mysql website

My partner and I are trying to start a website hosted in cloud. It has pretty heavy ajax traffic and the backend handles money transactions so we need ACID in some of the DB tables.
Currently everything is running off a single server. Some of the AJAX traffic is cached in text files.
Question:
What's the best way to scale the database server? I thought about moving MySQL to separate instances and doing master-master replication. However, this seems tough, and I heard I might lose ACID properties even with InnoDB. Is Amazon RDS a good solution?
The web server is relatively stateless except for some custom log files and the AJAX cache files. What's a good way to scale to multiple web servers? I guess the custom log files can be moved to a reliable shared file system or DB, but I'm not sure what to do about the AJAX cache file coherency across multiple servers. (I don't care about losing /var/log/* if a web server dies.)
For performance it might be cheaper to go with larger instance with more cores and memory but eventually I would need redundancy so wondering what's the best way to do this cheaply.
thanks
take a look at this post. there are plenty of presentations on the net discussing scalability. a few things i suggest keeping in mind:
plan early for data sharding [even if you are not going to do it immediately]
try using mechanisms like memcached to limit the number of queries sent to the database
prepare to serve static content from another domain; in the longer run, from an nginx-like server and later a CDN
redundancy - depends on your needs. is 'read-only' mode acceptable for your site? if so - go with mysql replication + rsync of static files and in case of failover have your site work in that mode till you recover the master node. if you need high availability - then take a look either at drbd replication [at least for mysql] or setup with automated promotion of slave server to become master node.
you might find following interesting:
http://yoshinorimatsunobu.blogspot.com/2011/08/mysql-mha-support-for-multi-master.html
http://mysqlperformanceblog.com
http://highscalability.com
http://google.com - search for scalability, lamp, failover... there are tons of case studies and horror stories from the trench lines :-]
Another option is using a scalable platform such as Amazon Web Services. You can start out with a micro instance and configure load balancing to fire up more instances as needed.
Once you determine average resource requirements you can then resize your image to larger or smaller depending on your needs.
http://aws.amazon.com
http://tuts.pinehead.tv/2011/06/26/creating-an-amazon-ec2-instance-with-linux-lamp-stack/
http://tuts.pinehead.tv/2011/09/11/how-to-use-amazon-rds-relation-database-service-to-host-mysql/
Amazon allows you to either load balance or change instance size based off demand.