We are looking to increase the injection speed to SMTP in a desktop .net application. Currently we are not able to scale more than 30 mails per minute. It runs on a single thread. Can anyone sugest any DLL or any other thing to follow...
Related
I've an api, notifyCustomers() implemented on my batch server which gets called from my application server. It can send notification via three channels SMS, Push & Email. I've separate helper classes for each of them and they all execute in async mode.
I've got around 30k users out of which I usually send notification to the particular set of users ranging from 3k to 20k. The issue that I face is whenever I call that api, mysql performance just goes for a toss, particularly CPU. CPU utilisation goes around 100% for a very long period of around 30 mins
I've figured out workaround by doing following things and it's helping me in keeping things under control:
Using projection instead of domain object
Getting data in batch of 500 in each call
Implemented indexing based on the criteria that I need
No database calls from async methods of SMS, Email and Push
Thread.sleep(10 mins) between each subsequent fetch operation of data batches <== This is the dirty hack that's bothering me a lot
If I remove Thread.sleep() then everything goes haywire because batch server just calls async methods and then fires up db call to fetch next batch of 500 users in very quick successions till the time db server stops responding.
I need help with what I shall be doing in order to get rid of 5th point while keeping things under control? I'm running mysql on RDS with 300 IOPS and 4 GB RAM (db.t3.medium)
I wrote a web application using python and Flask framework, and set it up on Apache with mod_wsgi.
Today I use JMeter to perform some load testing on this application.
For one web URL:
when I set only 1 thread to send request, the response time is 200ms
when I set 20 concurrent threads to send requests, the response time increases to more than 4000ms(4s). THIS IS UNACCEPTABLE!
I am trying to find the problem, so I recorded the time in before_request and teardown_request methods of flask. And it turns out the time taken to process the request is just over 10ms.
In this URL handler, the app just performs some SQL queries (about 10) in Mysql database, nothing special.
To test if the problem is with web server or framework configuration, I wrote another method Hello in the same flask application, which just returns a string. It performs perfectly under load, the response time is 13ms with 20-thread concurrency.
And when doing the load test, I execute 'top' on my server, there are about 10 apache threads, but the CPU is mostly idle.
I am at my wit's end now. Even if the request are performed serially, the performance should not drop so drastically... My guess is that there is some queuing somewhere that I am unaware of, and there must be overhead besides handling the request.
If you have experience in tuning performance of web applications, please help!
EDIT
About apache configuration, I used MPM worker mode, the configuration:
<IfModule mpm_worker_module>
StartServers 4
MinSpareThreads 25
MaxSpareThreads 75
ThreadLimit 64
ThreadsPerChild 50
MaxClients 200
MaxRequestsPerChild 0
</IfModule>
As for mod_wsgi, I tried turning WSGIDaemonProcess on and off (by commenting the following line out), the performance looks the same.
# WSGIDaemonProcess tqt processes=3 threads=15 display-name=TQTSERVER
Congratulations! You found the performance problem - not your users!
Analysing performance problems on web applications is usually hard, because there are so many moving parts, and it's hard to see inside the application while it's running.
The behaviour you describe is usually associated with a bottleneck resource - this happens when there's a particular resource that can't keep up, so queues requests, which tends to lead to a "hockey stick" curve with response times - once you hit the point where this resource can't keep up, the response time goes up very quickly.
20 concurrent threads seems low for that to happen, unless you're doing a lot of very heavy lifting on the page.
First place to start is TOP - while CPU is low, what's memory, disk access etc. doing? Is your database running on the same machine? If not, what does TOP say on the database server?
Assuming it's not some silly hardware thing, the next most likely problem is the database access on that page. It may be that one query is returning literally the entire database when all you want is one record (this is a fairly common anti pattern with ORM solutions); that could lead to the behaviour you describe. I would use the Flask logging framework to record your database calls (start, end, number of records returned), and look for anomalies there.
If the database is performing well under load, it's either the framework or the application code. Again, use logging statements in the code to trace the execution time of individual blocks of code, and keep hunting...
It's not glamorous, and can be really tedious - but it's a lot better that you found this before going live!
Look at using New Relic to identify where the bottleneck is. See overview of it and discussion of identifying bottlenecks in my talk:
http://lanyrd.com/2012/pycon/spcdg/
Also edit your original question and add the mod_wsgi configuration you are using, plus whether you are using Apache prefork or worker MPM as you could be doing something non optimal there.
This is not the typical question, but I'm out of ideas and don't know where else to go. If there are better places to ask this, just point me there in the comments. Thanks.
Situation
We have this web application that uses Zend Framework, so runs in PHP on an Apache web server. We use MySQL for data storage and memcached for object caching.
The application has a very unique usage and load pattern. It is a mobile web application where every full hour a cronjob looks through the database for users that have some information waiting or action to do and sends this information to a (external) notification server, that pushes these notifications to them. After the users get these notifications, the go to the app and use it, mostly for a very short time. An hour later, same thing happens.
Problem
In the last few weeks usage of the application really started to grow. In the last few days we encountered very high load and doubling of application response times during and after the sending of these notifications (so basically every hour). The server doesn't crash or stop responding to requests, it just gets slower and slower and often takes 20 minutes to recover - until the same thing starts again at the full hour.
We have extensive monitoring in place (New Relic, collectd) but I can't figure out what's wrong; I can't find the bottlekneck. That's where you come in:
Can you help me figure out what's wrong and maybe how to fix it?
Additional information
The server is a 16 core Intel Xeon (8 cores with hyperthreading, I think) and 12GB RAM running Ubuntu 10.04 (Linux 3.2.4-20120307 x86_64). Apache is 2.2.x and PHP is Version 5.3.2-1ubuntu4.11.
If any configuration information would help analyze the problem, just comment and I will add it.
Graphs
info
phpinfo()
apc status
memcache status
collectd
Processes
CPU
Apache
Load
MySQL
Vmem
Disk
New Relic
Application performance
Server overview
Processes
Network
Disks
(Sorry the graphs are gifs and not the same time period, but I think the most important info is in there)
The problem is almost certainly MySQL based. If you look at the final graph mysql/mysql_threads you can see the number of threads hits 200 (which I assume is your setting for max_connections) at 20:00. Once the max_connections has been hit things do tend to take a while to recover.
Using mtop to monitor MySQL just before the hour will really help you figure out what is going on but if you cannot install this you could just using SHOW PROCESSLIST;. You will need to establish your connection to mysql before the problem hits. You will probably see lots of processes queued with only 1 process currently executing. This will be the most likely culprit.
Having identified the query causing the problems you can attack your code. Without understanding how your application is actually working my best guess would be that using an explicit transaction around the problem query(ies) will probably solve the problem.
Good luck!
This question asks about the disadvantages of 'cgi-bin based' services. As far as I can ascertain, apart from perhaps the naming convention, nothing much has changed over the years as far as web based client/server interaction is concerned. There is of course now the option to use AJAX clients but ultimately they are still stateless and code on the server, whatever language it is written in, still waits for input to be sent via 'GET' or 'POST' methods.
Having been out of the loop as far as web programming is concerned for quite a while, am I missing something obvious?
To clarify my question: The question I referred to suggests that 'cgi-bin' based systems are no longer in use, what is the new alternative?
#sarnold. Thank you for your answer. Just so I am 100% certain about this, even if a system is developed using the 'latest and greatest' server platform (I guess this would be a .net based system or Linux equivalent) it is still, ultimately, just a program, or programs, running (if using fast cgi) or waiting to be started on a server, so there really hasn't been any change over the years. If that is the case what alternative is Brian referring to in his question?
The largest changes have been in tools like mod_php that execute the code directly in the address space of the web server and FastCGI which implement something very nearly identical to the CGI protocol, but with a handful of long-lived processes, rather than fork(2) + execve(2) of a new interpreter for every single request.
Of course, both approaches have problems: executing the interpreter directly in the address space of the web server is potentially horrible for reliability and security: the server (typically) runs with the same privileges all the time, so separating users is (typically) impossible. Further, flaws in the interpreter can be quite common, so it isn't a good solution for shared hosting environments, because any user could run arbitrary code with the privileges required to access all the data of all the other users on the system.
The FastCGI approach almost keeps the same speed; it does sacrifice some speed for copying data around between processes, but this isn't a real big deal for anyone except huge volume sites. But, you can run multiple FastCGI systems as different user accounts attached to different locations of the single 'web server' (e.g., http://example.com/public/ runs under account www-public and http://example.com/private/ runs under account www-private), and the FastCGI systems don't need to run with the same privileges as the web server.
Of course, there are also servlet systems where the server calls into compiled callbacks (frequently, compiled to bytecode) code that is linked into the server process. Much less "scripting"-feel.
We have a MySQL driven site that will occasionally get 100K users in the space of 48 hours, all logging into the site and making purchases.
We are attempting to simulate this kind of load using tools like Apache Bench and Siege.
While the key metric seems to me number of concurrent users, and we've got our report results, we still feel like we're in the dark.
What I want to ask is: What kinds of things should we be testing to anticipate this kind of traffic?
50 concurrent users 1000 Times? 500 concurrent users 10 times?
We're looking at DB errors, apache timeouts, and response times. What else should we be looking at?
This is a vague question and I know there is no "right" answer, we're just looking for some general thoughts on how to determine what our infrastructure can realistically handle.
Thanks in advance!
Simultaneous users is certainly one of the key factors - especially as that applies to DB connection pools, etc. But you will also want to verify that the page rate (pages/sec) of your tests is also in the range you expect. If the the think-time in your testcases is off by much, you can accidentally simulate a much higher (or lower) page rate than your real-world traffic. Think time is the amount of time the user spends between page requests - reading the page, filling out a form, etc.
Depending on what other information you have on hand, this might help you calculate the number of simultaneous users to simulate:
Virtual User Calculators
The complete page load time seen by the end-user is usually the most important metric to evaluate system performance. You'll also want to look for failure rates on all transactions. You should also be on the lookout for transactions that never complete. Some testing tools do not report these very well, allowing simulated users to hang indefinitely when the server doesn't respond...and not reporting this condition. Look for tools that report the number of users waiting on a given page or transaction and the average amount of time those users are waiting.
As for the server-side metrics to look for, what other technologies is your app built on? You'll want to look at different things for a .NET app vs. a PHP app.
Lastly, we have found it very valuable to look at how the system responds to increasing load, rather than looking at just a single level of load. This article goes into more detail.
Ideally you are going to want to model your usage to the user, but creating simulated concurrent sessions for 100k users is usually not easily accomplished.
The best source would be to check out your logs for the busiest hour and try and figure out a way to model that load level.
The database is usually a critical piece of infrastructure, so I would look at recording the number and length of lock waits as well as the number and duration of db statements.
Another key item to look at is disk queue lengths.
Mostly the process is to look for slow responses either in across the whole site or for specific pages and then hone in on the cause.
The biggest problem for load testing is that is quite hard to test your network and if you have (as most public sites do) a limited bandwidth through your ISP, that may create a performance issue that is not reflected in the load tests.