So we have a web application with 10-12 pages and many POST/GET DB calls. We usually get an Apache crash or other problems when site traffic reaches around 1000 concurrent users, which is a very small number, and we have already upgraded the server with more RAM and resources. Our system admin did load testing with blitz and some custom scripts and is suggesting we move away from Apache. Some things do not make sense to me: Apache should not be too bad at handling a few thousand concurrent users, considering we have Cloudflare for caching. Here is what he suggested:
1. Replace Apache+mod_fcgi with Nginx+php-fpm, which can make the server handle many more users, and then test it.
or
2. For testing: we need 10-20 servers to run a scenario from. Basically, what is needed is a more complex blitz.io analogue: create one server (which takes all those hours), then clone it in the cloud and pay for about 1 hour of testing multiplied by the number of servers needed.
Once again, there are many DB calls and .htaccess rules. Also, what makes Nginx better than Apache in this case?
I would check this comparison first. Basically, nginx is event-based, so it can handle more concurrent connections with fewer processes and threads. However, as the MySQL DB seems to be the choke point here, it's very possible that nginx wouldn't solve all your problems. Perhaps moving to a NoSQL kind of database that's better at scaling horizontally would help (if that's feasible).
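To make "event-based" concrete, here is a minimal Python asyncio sketch. It is an analogy only, not how nginx itself is implemented: one thread and one event loop keep hundreds of simulated requests in flight at once, because a request that is waiting on I/O does not hold a thread. The handler and the 50 ms I/O delay are hypothetical. This only helps while requests are waiting; if every request queues on the same MySQL server, the database is still the limit.

```python
import asyncio
import time
from typing import Tuple


async def handle_request(request_id: int, io_delay: float) -> int:
    # Simulated request that spends its time waiting on I/O (database,
    # upstream, disk). While it awaits, the loop serves other requests.
    await asyncio.sleep(io_delay)
    return request_id


async def serve_concurrently(n_requests: int, io_delay: float) -> Tuple[int, float]:
    # All requests share one thread and one event loop.
    start = time.perf_counter()
    results = await asyncio.gather(
        *(handle_request(i, io_delay) for i in range(n_requests))
    )
    return len(results), time.perf_counter() - start


if __name__ == "__main__":
    done, elapsed = asyncio.run(serve_concurrently(500, 0.05))
    # Handled one at a time, 500 x 50 ms would take about 25 s.
    print(f"{done} requests finished in {elapsed:.2f} s on one thread")
```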
Related
I wrote a web application using python and Flask framework, and set it up on Apache with mod_wsgi.
Today I use JMeter to perform some load testing on this application.
For one web URL:
when I set only 1 thread to send request, the response time is 200ms
when I set 20 concurrent threads to send requests, the response time increases to more than 4000 ms (4 s). THIS IS UNACCEPTABLE!
I am trying to find the problem, so I recorded the time in Flask's before_request and teardown_request hooks. It turns out the time taken to process the request is just over 10 ms.
In this URL handler, the app just performs some SQL queries (about 10) in a MySQL database, nothing special.
To test whether the problem is with the web server or the framework configuration, I wrote another view, Hello, in the same Flask application, which just returns a string. It performs perfectly under load: the response time is 13 ms with 20-thread concurrency.
And while running the load test, I ran 'top' on my server: there are about 10 Apache threads, but the CPU is mostly idle.
I am at my wit's end now. Even if the requests were performed serially, the performance should not drop so drastically... My guess is that there is some queuing somewhere that I am unaware of, and there must be overhead besides handling the request.
If you have experience in tuning performance of web applications, please help!
EDIT
About the Apache configuration, I use the worker MPM:
<IfModule mpm_worker_module>
StartServers 4
MinSpareThreads 25
MaxSpareThreads 75
ThreadLimit 64
ThreadsPerChild 50
MaxClients 200
MaxRequestsPerChild 0
</IfModule>
As for mod_wsgi, I tried turning WSGIDaemonProcess on and off (by commenting out the following line), and the performance looks the same.
# WSGIDaemonProcess tqt processes=3 threads=15 display-name=TQTSERVER
Congratulations! You found the performance problem - not your users!
Analysing performance problems on web applications is usually hard, because there are so many moving parts, and it's hard to see inside the application while it's running.
The behaviour you describe is usually associated with a bottleneck resource - this happens when there's a particular resource that can't keep up, so queues requests, which tends to lead to a "hockey stick" curve with response times - once you hit the point where this resource can't keep up, the response time goes up very quickly.
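A quick back-of-the-envelope check against the numbers in the question: if the 200 ms measured with one thread were spent on a single resource that can only serve one request at a time, then 20 clients with no think time would each wait behind the other 19, giving roughly 20 × 200 ms = 4000 ms, which is what JMeter reported. That does not prove serialization, but it is consistent with it. The sketch below encodes that closed-system estimate, plus the textbook M/M/1 formula R = S / (1 - ρ) that produces the hockey stick; both are simplified models, and the 10 ms service time in the demo is hypothetical.

```python
def serialized_response_time(concurrent_clients: int, service_time_s: float) -> float:
    # Closed system, zero think time, one resource that handles requests
    # strictly one at a time: each client waits for all the others.
    return concurrent_clients * service_time_s


def mm1_response_time(service_time_s: float, utilization: float) -> float:
    # Open M/M/1 queue: mean response time R = S / (1 - rho).
    # R stays close to S at low load and grows without bound as rho -> 1.
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1.0 - utilization)


if __name__ == "__main__":
    print("20 clients, 200 ms serialized:", serialized_response_time(20, 0.200), "s")
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        print(f"utilization {rho:.2f}: {mm1_response_time(0.010, rho) * 1000:.0f} ms")
```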
20 concurrent threads seems low for that to happen, unless you're doing a lot of very heavy lifting on the page.
The first place to start is top: while CPU is low, what are memory, disk access, etc. doing? Is your database running on the same machine? If not, what does top say on the database server?
Assuming it's not some silly hardware thing, the next most likely problem is the database access on that page. It may be that one query is returning literally the entire database when all you want is one record (this is a fairly common anti-pattern with ORM solutions); that could lead to the behaviour you describe. I would use the Flask logging framework to record your database calls (start, end, number of records returned), and look for anomalies there.
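Flask's app.logger is a standard Python logging.Logger, so one way to record each database call is a small wrapper that logs start time, duration and row count. A minimal sketch, assuming a DB-API style connection; it uses an in-memory sqlite3 database as a stand-in for MySQL so it runs on its own (MySQL drivers typically use %s placeholders instead of ?), and timed_query is a hypothetical helper name. A query that should return one row but logs thousands is exactly the anomaly to look for.

```python
import logging
import sqlite3
import time

logger = logging.getLogger("db_timing")


def timed_query(conn, sql, params=()):
    # Run one query, then log when it started, how long it took and how
    # many rows came back. Returns (rows, elapsed_ms).
    started_at = time.time()
    t0 = time.perf_counter()
    cur = conn.cursor()
    cur.execute(sql, params)
    rows = cur.fetchall()
    elapsed_ms = (time.perf_counter() - t0) * 1000.0
    logger.info(
        "start=%.3f elapsed_ms=%.2f rows=%d sql=%r",
        started_at, elapsed_ms, len(rows), sql,
    )
    return rows, elapsed_ms


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [(f"u{i}",) for i in range(5000)])
    timed_query(conn, "SELECT * FROM users")                      # whole table
    timed_query(conn, "SELECT * FROM users WHERE id = ?", (42,))  # one record
```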
If the database is performing well under load, it's either the framework or the application code. Again, use logging statements in the code to trace the execution time of individual blocks of code, and keep hunting...
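For timing individual blocks inside a view, a small context manager keeps the instrumentation to one line per block. A sketch using only the standard library; timed and the block labels are hypothetical, and in a Flask app you might log the collected numbers from a teardown_request hook.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, sink):
    # Append the duration (seconds) of the enclosed block to sink[label],
    # even if the block raises.
    start = time.perf_counter()
    try:
        yield
    finally:
        sink.setdefault(label, []).append(time.perf_counter() - start)


if __name__ == "__main__":
    timings = {}
    with timed("load_posts", timings):
        time.sleep(0.02)   # stand-in for a block of view code
    with timed("render", timings):
        time.sleep(0.005)
    for label, samples in timings.items():
        print(f"{label}: {max(samples) * 1000:.1f} ms")
```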
It's not glamorous, and can be really tedious - but it's a lot better that you found this before going live!
Look at using New Relic to identify where the bottleneck is. See an overview of it and a discussion of identifying bottlenecks in my talk:
http://lanyrd.com/2012/pycon/spcdg/
Also, edit your original question and add the mod_wsgi configuration you are using, plus whether you are using the Apache prefork or worker MPM, as you could be doing something non-optimal there.
This is not a typical question, but I'm out of ideas and don't know where else to go. If there are better places to ask this, just point me there in the comments. Thanks.
Situation
We have this web application that uses Zend Framework, so runs in PHP on an Apache web server. We use MySQL for data storage and memcached for object caching.
The application has a very particular usage and load pattern. It is a mobile web application where, every full hour, a cron job looks through the database for users that have some information waiting or an action to do, and sends this information to an (external) notification server that pushes these notifications to them. After the users get these notifications, they go to the app and use it, mostly for a very short time. An hour later, the same thing happens.
Problem
In the last few weeks usage of the application really started to grow. In the last few days we encountered very high load and doubling of application response times during and after the sending of these notifications (so basically every hour). The server doesn't crash or stop responding to requests, it just gets slower and slower and often takes 20 minutes to recover - until the same thing starts again at the full hour.
We have extensive monitoring in place (New Relic, collectd), but I can't figure out what's wrong; I can't find the bottleneck. That's where you come in:
Can you help me figure out what's wrong and maybe how to fix it?
Additional information
The server is a 16-core Intel Xeon (8 cores with hyperthreading, I think) with 12GB RAM running Ubuntu 10.04 (Linux 3.2.4-20120307 x86_64). Apache is 2.2.x and PHP is version 5.3.2-1ubuntu4.11.
If any configuration information would help analyze the problem, just comment and I will add it.
Graphs (images not reproduced here):
info: phpinfo(), APC status, memcache status
collectd: processes, CPU, Apache, load, MySQL, vmem, disk
New Relic: application performance, server overview, processes, network, disks
(Sorry, the graphs are GIFs and not from the same time period, but I think the most important info is in there.)
The problem is almost certainly MySQL-based. If you look at the final graph, mysql/mysql_threads, you can see the number of threads hits 200 (which I assume is your max_connections setting) at 20:00. Once max_connections has been hit, things do tend to take a while to recover.
Using mtop to monitor MySQL just before the hour will really help you figure out what is going on, but if you cannot install it you could just use SHOW PROCESSLIST;. You will need to establish your connection to MySQL before the problem hits. You will probably see lots of processes queued with only 1 process currently executing. That one will be the most likely culprit.
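If you would rather script this than watch it by hand, the rows returned by SHOW PROCESSLIST (columns Id, User, Host, db, Command, Time, State, Info) can be summarized in a few lines. A sketch: fetching the rows is left to whatever MySQL driver you use (cursor.execute("SHOW PROCESSLIST") followed by cursor.fetchall()), summarize_processlist is a hypothetical helper, and the sample rows are made up for illustration.

```python
from collections import Counter

# Column order of MySQL's SHOW PROCESSLIST result set.
COLUMNS = ("Id", "User", "Host", "db", "Command", "Time", "State", "Info")


def summarize_processlist(rows):
    # rows: tuples as returned by cursor.fetchall() for SHOW PROCESSLIST.
    procs = [dict(zip(COLUMNS, row)) for row in rows]
    # Idle connections show Command = "Sleep"; everything else is doing work
    # or waiting for something.
    active = [p for p in procs if p["Command"] != "Sleep"]
    return {
        "total": len(procs),
        "active": len(active),
        "by_state": Counter(p["State"] or "" for p in active),
        # The longest-running active statement (Time is in seconds) is the first suspect.
        "longest": max(active, key=lambda p: p["Time"], default=None),
    }


if __name__ == "__main__":
    sample = [  # made-up rows for illustration
        (11, "app", "web1:5001", "app", "Query", 41, "Sending data", "SELECT ... FROM notifications ..."),
        (12, "app", "web1:5002", "app", "Query", 38, "Locked", "UPDATE users SET ..."),
        (13, "app", "web1:5003", "app", "Query", 37, "Locked", "UPDATE users SET ..."),
        (14, "app", "web1:5004", "app", "Sleep", 3, "", None),
    ]
    summary = summarize_processlist(sample)
    print(summary["total"], summary["active"], summary["by_state"].most_common())
    print("longest:", summary["longest"]["Info"])
```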
Having identified the query causing the problems you can attack your code. Without understanding how your application is actually working my best guess would be that using an explicit transaction around the problem query(ies) will probably solve the problem.
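With autocommit on (the MySQL server default), every statement is its own transaction; an explicit transaction groups related statements so they commit once. A minimal sketch of the pattern, with sqlite3 standing in for MySQL (in MySQL the statements would be START TRANSACTION, COMMIT and ROLLBACK); the users table, the notified column and mark_notified are hypothetical. Whether this fixes the hourly pile-up depends on which query the process list points to.

```python
import sqlite3


def mark_notified(conn, user_ids):
    # One explicit transaction around all the updates instead of one
    # implicit transaction (and one commit) per statement.
    conn.execute("BEGIN")
    try:
        for uid in user_ids:
            conn.execute("UPDATE users SET notified = 1 WHERE id = ?", (uid,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    # isolation_level=None: the sqlite3 module issues no implicit BEGINs,
    # so the transaction boundaries above are the only ones.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, notified INTEGER DEFAULT 0)")
    conn.executemany("INSERT INTO users (id) VALUES (?)", [(i,) for i in range(1, 101)])
    mark_notified(conn, range(1, 51))
    print(conn.execute("SELECT COUNT(*) FROM users WHERE notified = 1").fetchone()[0])
```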
Good luck!
The networking team has flagged our Ruby on Rails application as one of the top producers of network traffic on our network, specifically from packet traffic between the app server and the database server (mysql).
What are the recommended best practices to reduce traffic between a Rails app and the database? Persistent database connections?
Is it an actual problem, or do they ding the top 3 db consumers no matter what? Check your logs or have them supply you with a log of queries that they think are problematic.
Beyond that, check whether you're doing bad things like making model calls from your views in loops. Your logs should tell you what's going on here: if you see each partial paired with a query every time it's rendered, that's a big sign that your logic should be pulled back into the models and controllers.
Fire up Wireshark or another network scanner and look for the biggest packets, or small packets that are too frequent, to identify the specific troublesome queries.
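The "model call inside a view loop" pattern is usually called the N+1 query problem: one query for the list, then one more per item, each a separate round trip to the database. The sketch counts the statements each approach issues, using Python's sqlite3 as a stand-in for the Rails app and MySQL; the posts/comments schema and the function names are hypothetical. In Rails the usual fix is eager loading (for example includes) or moving the aggregate into a single query in the model.

```python
import sqlite3


def make_demo_db():
    # Five posts; posts 1 and 3 have comments.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);"
        "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);"
    )
    conn.executemany("INSERT INTO posts (id, title) VALUES (?, ?)", [(i, f"post {i}") for i in range(1, 6)])
    conn.executemany("INSERT INTO comments (post_id, body) VALUES (?, ?)", [(1, "a"), (1, "b"), (3, "c")])
    conn.commit()
    return conn


def comment_counts_n_plus_1(conn):
    # One query for the posts, then one more query per post.
    counts = {}
    for (post_id,) in conn.execute("SELECT id FROM posts").fetchall():
        counts[post_id] = conn.execute(
            "SELECT COUNT(*) FROM comments WHERE post_id = ?", (post_id,)
        ).fetchone()[0]
    return counts


def comment_counts_batched(conn):
    # The same data in a single round trip using a join and GROUP BY.
    rows = conn.execute(
        "SELECT p.id, COUNT(c.id) FROM posts p "
        "LEFT JOIN comments c ON c.post_id = p.id GROUP BY p.id"
    ).fetchall()
    return dict(rows)


def count_statements(conn, fn):
    # Counts the SQL statements issued while fn(conn) runs.
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)


if __name__ == "__main__":
    conn = make_demo_db()
    print(count_statements(conn, comment_counts_n_plus_1))
    print(count_statements(conn, comment_counts_batched))
```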
Then, before even considering caching, check if that query can really be cached or if it just pulls too much data you are not using.
At this point, there are too many different possible causes, each with its own recommended practices.
My partner and I are trying to start a website hosted in the cloud. It has pretty heavy AJAX traffic, and the backend handles money transactions, so we need ACID in some of the DB tables.
Currently everything runs on a single server. Some of the AJAX traffic is cached in text files.
Question:
What's the best way to scale the database server? I thought about moving MySQL to separate instances and doing master-master replication. However, this seems tough, and I heard I might lose ACID properties even with InnoDB. Is Amazon RDS a good solution?
The web server is relatively stateless except for some custom log files and the AJAX cache files. What's a good way to scale to multiple web servers? I guess the custom log files can be moved to a reliable shared file system or DB, but I'm not sure what to do about AJAX cache file coherency across multiple servers. (I don't care about losing /var/log/* if a web server dies.)
For performance, it might be cheaper to go with a larger instance with more cores and memory, but eventually I will need redundancy, so I'm wondering what's the best way to do this cheaply.
thanks
Take a look at this post. There are plenty of presentations on the net discussing scalability. A few things I suggest keeping in mind:
Plan early for data sharding (even if you are not going to do it immediately).
Try using mechanisms like memcached to limit the number of queries sent to the database.
Prepare to serve static content from another domain; in the longer run, from an nginx-like server and later a CDN.
Redundancy depends on your needs. Is 'read-only' mode acceptable for your site? If so, go with MySQL replication plus rsync of static files, and in case of failover have your site work in that mode until you recover the master node. If you need high availability, take a look either at DRBD replication (at least for MySQL) or at a setup with automated promotion of a slave server to master.
You might find the following interesting:
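On the sharding point, planning early mostly means routing every query through one place that decides which database holds a given key, so the mapping can change later without touching every call site. A minimal sketch, assuming hash-based sharding by a string key; shard_for, host_for_user and the host names are hypothetical. Plain modulo moves most keys when the shard count changes, which is one reason to plan this up front (consistent hashing or a lookup table are the usual alternatives).

```python
import hashlib

# Hypothetical shard hosts; in practice this list would come from config.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(key: str, n_shards: int) -> int:
    # Use a stable hash: Python's built-in hash() of a str is randomized per
    # process, so two web servers could disagree about where a key lives.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def host_for_user(user_id: int) -> str:
    return SHARDS[shard_for(f"user:{user_id}", len(SHARDS))]


if __name__ == "__main__":
    for uid in (1, 2, 3, 42):
        print(uid, host_for_user(uid))
```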
http://yoshinorimatsunobu.blogspot.com/2011/08/mysql-mha-support-for-multi-master.html
http://mysqlperformanceblog.com
http://highscalability.com
http://google.com - search for scalability, LAMP, failover... there are tons of case studies and horror stories from the trenches :-]
Another option is using a scalable platform such as Amazon Web Services. You can start out with a micro instance and configure load balancing to fire up more instances as needed.
Once you determine average resource requirements you can then resize your image to larger or smaller depending on your needs.
http://aws.amazon.com
http://tuts.pinehead.tv/2011/06/26/creating-an-amazon-ec2-instance-with-linux-lamp-stack/
http://tuts.pinehead.tv/2011/09/11/how-to-use-amazon-rds-relation-database-service-to-host-mysql/
Amazon allows you to either load balance or change instance size based on demand.
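As a rough illustration of scaling "based on demand", here is a toy proportional rule: if the fleet's average CPU is twice the target, ask for twice the instances, clamped to a minimum and maximum. It is in the spirit of a target-tracking policy but is not AWS's actual algorithm; the function name, the 50% target and the limits are all assumptions.

```python
import math


def desired_instances(current: int, avg_cpu_percent: float,
                      target_cpu_percent: float = 50.0,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    # Scale the fleet in proportion to how far average CPU is from target,
    # rounding up so the fleet is never sized below the estimate.
    if current < 1 or target_cpu_percent <= 0:
        raise ValueError("need at least one instance and a positive target")
    wanted = math.ceil(current * avg_cpu_percent / target_cpu_percent)
    return max(min_instances, min(max_instances, wanted))


if __name__ == "__main__":
    print(desired_instances(2, 90.0))   # busy: scale out
    print(desired_instances(4, 20.0))   # idle: scale in
```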
I was benchmarking my production server (it's in beta) and the results were poor, to say the least. On pages without any dynamic content, 1000 requests with a concurrency of 1 returned 73 requests/sec.
When I start to add MySQL queries to the equation, things quickly spiral out of control. The same 1000 requests on my homepage produce the following results:
CPU spikes to 50%
Load spikes to 3.7 (though that doesn't always happen)
Complete requests: 1000
Failed requests: 0
Write errors: 0
Requests/sec: 2.44
Transfer rate: 113.26 [Kbytes/sec]
90% of requests are served within 142ms.
95% of requests are served within 3531ms (it just keeps getting worse after that).
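Assuming the homepage run also used a concurrency of 1, ab sends requests one after another, so requests/sec ≈ 1 / mean time per request: 2.44 req/s works out to a mean of roughly 410 ms per request, even though 90% finished within 142 ms. A small tail of multi-second requests dominates the average. The sketch shows the effect on synthetic latencies (900 requests at 100 ms and 100 at 3.5 s, made up for illustration); it uses a nearest-rank percentile, which may differ slightly from how ab computes its table.

```python
import math


def percentile(sorted_times, pct):
    # Nearest-rank percentile of an ascending list.
    rank = math.ceil(pct * len(sorted_times) / 100)
    return sorted_times[max(rank, 1) - 1]


def summarize(latencies_s, concurrency=1):
    times = sorted(latencies_s)
    mean = math.fsum(times) / len(times)
    return {
        "p90": percentile(times, 90),
        "p95": percentile(times, 95),
        "mean": mean,
        # Closed loop with `concurrency` clients always busy:
        # throughput ~= concurrency / mean latency.
        "req_per_s": concurrency / mean,
    }


if __name__ == "__main__":
    synthetic = [0.100] * 900 + [3.5] * 100
    s = summarize(synthetic)
    print(f"p90={s['p90'] * 1000:.0f} ms  p95={s['p95'] * 1000:.0f} ms  "
          f"mean={s['mean'] * 1000:.0f} ms  throughput={s['req_per_s']:.2f} req/s")
```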
Taking a look at top while I run the benchmark:
mysqld is consuming roughly 7% of memory and 2.5% CPU
Apache seems to spawn 7 concurrent processes at times
At other points, Apache does not show up in Top
I'm running preforked Apache on a Micro AWS instance (ubuntu) and I'll upgrade to a higher instance, but I worry that there is an underlying problem here with the code or my Apache setup.
I am deploying Django with mod_wsgi, and I set KeepAliveTimeout to 3 just in case a couple of slow processes were screwing me up.
My code for the homepage is seemingly straightforward, though it requires joins:
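from django.shortcuts import render
from .models import Post, Open_House  # assumed location of the app's models

def index(request):
    posts = Post.objects.filter(photo__isnull=False).order_by('date').distinct()[0:7]
    ohouses = Open_House.objects.filter(post__photo__isnull=False).order_by('day').distinct()[0:4]
    # render() supersedes render_to_response(..., context_instance=RequestContext(request))
    return render(request, "index.html", {'posts': posts, 'ohouses': ohouses})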
I have left the default configuration in place for MySQL.
Could this all be attributable to running a Micro Instance? Could my instance be somewhat corrupted? Any other plausible explanations?
There's a ton that goes into quick response times. Django is pretty optimized for what it is, but relying on a framework alone will never get you where you want to be.
If you're going to use Apache, use the prefork MPM, and even then disable all modules you don't absolutely need. Apache can be made to run fast, but it's not the fastest horse out there. You'll do better with something like Nginx or (cringe) Cherokee. Cherokee is a good web server, but its usability is close to zero.
Any static resources should be served directly by your webserver or better yet, off a CDN.
Assuming you've optimized your own code to not make inefficient use of queries, Django's built-in, automatic query caching will help reduce the overall number of queries sent to the database. After that, you need to employ something like memcached.
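The memcached step usually follows the cache-aside pattern: look in the cache, and only on a miss run the query and store the result with an expiry. A minimal sketch with an in-process dictionary standing in for memcached so it runs on its own; TTLCache, cached and the key name are hypothetical (in Django you would more likely use the cache framework in django.core.cache configured with a memcached backend).

```python
import time


class TTLCache:
    # In-process stand-in for memcached: values expire `ttl` seconds after set().
    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


def cached(cache, key, ttl, load):
    # Cache-aside: serve from cache, otherwise call load() (the DB query)
    # and remember the result. A None result looks like a miss every time.
    value = cache.get(key)
    if value is None:
        value = load()
        cache.set(key, value, ttl)
    return value


if __name__ == "__main__":
    cache = TTLCache()

    def load_home_posts():
        print("hitting the database")
        return ["post 1", "post 2"]

    for _ in range(3):
        cached(cache, "home:posts", 60, load_home_posts)  # DB hit only on the first call
```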
Then, there's the server itself. Depending on the size of your site, you may not need much RAM and CPU, but it's always better to have too much than not enough. It might be beneficial to put some artificial load on your server (automated testing, spidering your site, etc), and see how your system resources hold up. If you get anywhere near capping out (I'd say over 50% with simple tests like that), you need to add some more into your instance's pool.
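For a first pass at artificial load, a thread pool that issues GET requests with a fixed number in flight is enough to see whether latency climbs as concurrency rises. A sketch using only the standard library; run_load, fetch and the bundled demo server are hypothetical, the opener deliberately ignores proxy environment variables, and one client machine like this cannot generate the thousands of concurrent users that ab, JMeter or a distributed setup can.

```python
import http.server
import threading
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Opener that ignores proxy environment variables and talks to the target directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, timeout=10.0):
    # Returns (HTTP status or None on connection failure, latency in seconds).
    start = time.perf_counter()
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as exc:   # 4xx/5xx still count as responses
        status = exc.code
    except OSError:                         # URLError, timeouts, refused connections
        status = None
    return status, time.perf_counter() - start


def run_load(url, total_requests, concurrency):
    # At most `concurrency` requests are in flight at any moment.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: fetch(url), range(total_requests)))
    wall = time.perf_counter() - start
    latencies = sorted(t for _, t in results)
    return {
        "ok": sum(1 for status, _ in results if status == 200),
        "req_per_s": total_requests / wall,
        "median_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
    }


class _OkHandler(http.server.BaseHTTPRequestHandler):
    # Trivial local endpoint so the sketch can run without a real site.
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_demo_server():
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), _OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"


if __name__ == "__main__":
    server, url = start_demo_server()
    for c in (1, 5, 20):
        print(c, run_load(url, total_requests=100, concurrency=c))
    server.shutdown()
```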
Search online for articles on how to optimize MySQL. Out of the box, it tends to use a lot more resources than it actually needs; there's lots of room for improvement there. And, if it's not already on its own server, strongly consider offloading it to its own server. If you're anticipating a lot of traffic, the same server both responding to web requests and fetching data from a database will quickly become a bottleneck.
Could this all be attributable to running a Micro Instance?
Micro instances burst to 2 CPUs for a short period of time, after which they are severely capped for several minutes. I wouldn't trust any benchmarks done on a Micro EC2 instance for that reason.