How To Interpret Siege and/or Apache Bench Results - stress-testing

We have a MySQL driven site that will occasionally get 100K users in the space of 48 hours, all logging into the site and making purchases.
We are attempting to simulate this kind of load using tools like Apache Bench and Siege.
While the key metric seems to me to be the number of concurrent users, and we've got our report results, we still feel like we're in the dark.
What I want to ask is: What kinds of things should we be testing to anticipate this kind of traffic?
50 concurrent users 1000 Times? 500 concurrent users 10 times?
We're looking at DB errors, apache timeouts, and response times. What else should we be looking at?
This is a vague question and I know there is no "right" answer, we're just looking for some general thoughts on how to determine what our infrastructure can realistically handle.
Thanks in advance!

Simultaneous users is certainly one of the key factors - especially as that applies to DB connection pools, etc. But you will also want to verify that the page rate (pages/sec) of your tests is in the range you expect. If the think-time in your test cases is off by much, you can accidentally simulate a much higher (or lower) page rate than your real-world traffic. Think time is the amount of time the user spends between page requests - reading the page, filling out a form, etc.
Depending on what other information you have on hand, this might help you calculate the number of simultaneous users to simulate:
Virtual User Calculators
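As a rough illustration of the arithmetic such calculators do (this is just Little's Law applied to page requests; all numbers below are made-up examples, not measurements):

<?php
// Rough sketch of the arithmetic behind a "virtual user calculator".
// All numbers are made-up examples.
$pagesPerHour   = 36000;                       // expected real-world page rate
$pagesPerSecond = $pagesPerHour / 3600;        // = 10 pages/sec
$responseTime   = 2.0;                         // seconds the server takes per page
$thinkTime      = 28.0;                        // seconds a user reads/types between pages

// Little's Law: concurrent users = throughput * time each user occupies the system
$concurrentUsers = $pagesPerSecond * ($responseTime + $thinkTime);

echo "Simulate roughly {$concurrentUsers} concurrent virtual users\n";   // ~300

Note how sensitive the result is to think time: the same 10 pages/sec with only 5 seconds of think time would need far fewer virtual users, which is exactly why an unrealistic think time skews the simulated page rate.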
The complete page load time seen by the end-user is usually the most important metric to evaluate system performance. You'll also want to look for failure rates on all transactions. You should also be on the lookout for transactions that never complete. Some testing tools do not report these very well, allowing simulated users to hang indefinitely when the server doesn't respond...and not reporting this condition. Look for tools that report the number of users waiting on a given page or transaction and the average amount of time those users are waiting.
As for the server-side metrics to look for, what other technologies is your app built on? You'll want to look at different things for a .NET app vs. a PHP app.
Lastly, we have found it very valuable to look at how the system responds to increasing load, rather than looking at just a single level of load. This article goes into more detail.

Ideally you are going to want to model your usage to the user, but creating simulated concurrent sessions for 100k users is usually not easily accomplished.
The best source would be to check your logs for the busiest hour and try to figure out a way to model that load level.
The database is usually a critical piece of infrastructure, so I would look at recording the number and length of lock waits as well as the number and duration of db statements.
Another key item to look at is disk queue lengths.
Mostly the process is to look for slow responses, either across the whole site or on specific pages, and then home in on the cause.
The biggest problem with load testing is that it is quite hard to test your network, and if you have (as most public sites do) limited bandwidth through your ISP, that may create a performance issue that is not reflected in the load tests.

Related

MySQL Scaling on GCP

I created an 8-core MySQL instance on GCP, with a simple database in it. When I run a load of 40,000+ concurrent users (1,500 req/sec), the response times come out very high (10 seconds+). However, I can see hardware CPU utilization at only 15% or so.
What can I do to get the response time in msec?
Cheers!
Deepak
Imagine cramming 40000 shoppers in a grocery store. How many hours would it take for a shopper to buy just one carton of milk?
Seriously, there is a limit to how many connections can be made to any device. Database computers will top out at a few hundred. After that, latency will suffer severely as all the connections are waiting for their turn at various shared resources.
Another approach
Let's say these are achievable:
10ms to connect, fetch info for a page, and disconnect.
1500 pages built per second. (By the way, make sure the web server can achieve this.)
15 concurrent connections, each running for 10 ms. That equals 1500 pages per second.
1500 pages per second = 90000 pages per minute.
So, let's specify "40000 pages delivered to different (or same) users in one minute". I suggest that will be easy. And it won't require much more than 15 concurrent users. (Traffic is never smooth [except in a benchmark], so 50 concurrent connections may happen.)
[thousands of database servers is] where I would like to go eventually... however I need to solve a basic problem of mine, which I have posted above!
Right, you have to expand the number of database servers now if you are serving 40,000 concurrent queries. Not eventually.
But let's be clear about what comprises concurrent users. Here's an example:
mysql> show global status like 'threads%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 1266  |
| Threads_running   | 9     |
+-------------------+-------+
I've analyzed high-scale production sites for dozens of internet companies. It's typical to see hundreds or thousands of concurrent connections, but few of these are executing an SQL query at any given moment. When a given thread is between queries, you can see it in SHOW PROCESSLIST, but it is only doing "Sleep".
This is fine, and it's normal.
I give the analogy to an ssh session: you may be connected to a shell on a linux server, but if you're doing nothing, just sitting at a shell prompt, you aren't taxing the server resources much. You could have hundreds of users connected with ssh to the same server at once. But if they all begin running applications at the same time, you're in trouble. The server might not handle that load well. At least, all of the users will experience slow performance.
It's the same with a MySQL database. If you need a server that can support 40,000 Threads_running, then you need to spread that load over many MySQL servers. There isn't any single server that exists today that can handle that.
But you might mean something different when you say 40,000 concurrent users. It might be that you have 40,000 users who are looking at some page on your website at the same time. But that's not resulting in continuous SQL queries in 40,000 database sessions all at the same time. Each person spends some time reading the web page they just loaded, scrolling up and down, and perhaps typing into a form. While they are doing that, the website is waiting for their next request, and the web server and database server are not doing any work for that user while they wait. They can do work for other users.
In this way, a single database server can support 40,000 (or more) users who are by some definition using the site, even though only a handful are invoking any code to run SQL queries at any given moment.
This is normal and most websites can handle that traffic with no problems.
If that's the reality of your application, and you still have problems scaling it, then you might have inefficient application code or unoptimized SQL queries. That is, the website could serve the requests easily if you wrote the code to be more efficient.
Inefficient code cannot be fixed by changing your server. The cost of inefficient code scales up faster than you can hope to handle it by upgrading the server. So you must solve performance problems by writing better code.
This is the point of an old tweet of mine.
The subject of scalable internet architecture is very complex. You need to do a lot of study and a lot of testing to grow a website and make it scalable.
You can start by reading. My favorite is Theo Schlossnagle's book Scalable Internet Architectures. Here is a video of Theo speaking about the same subject: https://www.youtube.com/watch?v=2WuT2rdLK5A
The book is from quite a few years ago. Perhaps the scale websites need to support is greater than it was back then, but the methods of achieving scalability are the same today.
Test
Identify bottlenecks
Rearchitect your web app code to relieve those bottlenecks
Test again

how to implement saved-searches scenario

What is a saved search?
A saved search is the mechanism for users who don't find their desired results in advanced search: they push the "Save My Search Criteria" button, we save the search criteria, and when matching data is later posted to the website we inform the user: "Hey, the item(s) you were looking for exist now - come and take a look."
Saved Searches is useful for sites with complex search options, or sites where users may want to revisit or share dynamic sets of search results.
We already have advanced search and don't need to implement a new search; what we need is a high-performance approach to the saved-search mechanism.
We have a website where users create about 120,000 posts per day, and we are going to implement a SAVED SEARCH scenario (something like what https://www.gumtree.com/ does): users run an advanced search, don't find the content they want, and just want to save the search criteria; if matching results later appear on the website, we inform them with a notification.
We are using Elasticsearch and MySQL on our website. We haven't implemented anything yet and are still thinking it through to find a good solution that can handle a high rate of data. The problem is the scale of the work: we have a lot of posts per day and we expect users to use this feature a lot, so we are looking for an approach that can handle this scale of work easily and with high performance.
Suggested solutions (but probably not the best)
One quick solution is to save the saved searches in a saved-search index in Elasticsearch, then run a cron job that, for every saved search, fetches results from the posts index in Elasticsearch and, if there are any results, pushes a record into RabbitMQ to notify the corresponding user.
Alternatively, when a user posts an item to the website, we check it against the existing saved searches in the saved-search index in Elasticsearch and, if it matches, put a record into RabbitMQ. (The main problem with this method is that every post inserted into the website could match a huge number of saved searches.)
My big concern is scale and performance; I'd appreciate you sharing your experiences and ideas about this problem with me.
My estimate of the scale
Saved searches expire after three months
At least 200,000 saved searches per day
So we have around 9,000,000 active records
I'd appreciate it if you'd share your thoughts with me
Just FYI:
- we also have RabbitMQ for our queue jobs
- our ES servers are good enough, with 64GB RAM
Cron job - No. Continual job - yes.
Why? As things scale, or as activity spikes, cron jobs become problematical. If the cron job for 09:00 runs too long, it will compete for resources with the 10:00 instance; this can cascade into a disaster.
On the other side, if a cron job finishes 'early', then the activity oscillates between "busy" (the cron job is doing stuff) and "not busy" (cron has finished, and it is not yet time for the next invocation).
So, instead, I suggest a job that continually runs through all the "stored queries", doing them one at a time. When it finishes the list, it simply starts over. This completely eliminates my complaints about cron, and provides an automatic "elasticity" to handle busy/not-busy times -- the scan will slow down or speed up accordingly.
In other words, it runs 'forever'. (You could use a simple cron job as a 'keep-alive' monitor that restarts it if it crashes.)
OK, "one job" re-searching "one at a time" is probably not best. But I disagree with using a queuing mechanism. Instead, I would have a small number of processes, each acting on some chunk of the stored queries. There are many ways: grab-and-lock; gimme a hundred to work on; modulo N; etc. Each has pros and cons.
Because you are already using Elasticsearch and you have confirmed that you are creating something like Google Alerts, the most straightforward solution would be Elasticsearch Percolator.
From the official documentation, Percolator is useful when:
You run a price alerting platform which allows price-savvy customers to specify a rule like "I am interested in buying a specific electronic gadget and I want to be notified if the price of gadget falls below $X from any vendor within the next month". In this case you can scrape vendor prices, push them into Elasticsearch and use its reverse-search (Percolator) capability to match price movements against customer queries and eventually push the alerts out to the customer once matches are found.
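For illustration, here is a rough sketch of registering a saved search and percolating a new post with the elasticsearch-php client (7.x style); the index name, field names, and queries are made-up examples, and the percolator mapping itself is omitted:

<?php
// Assumes an index whose mapping has a field of type "percolator" named "query",
// plus the regular post fields (e.g. "title"). All names here are hypothetical.
require 'vendor/autoload.php';

$client = Elasticsearch\ClientBuilder::create()->build();

// 1) Register a saved search as a percolator document.
$client->index([
    'index' => 'saved-searches',
    'id'    => 'user-42-search-1',
    'body'  => [
        'query' => ['match' => ['title' => 'iphone 12']],
    ],
]);

// 2) When a new post arrives, percolate it to find matching saved searches.
$response = $client->search([
    'index' => 'saved-searches',
    'body'  => [
        'query' => [
            'percolate' => [
                'field'    => 'query',
                'document' => ['title' => 'iPhone 12 64GB for sale'],
            ],
        ],
    ],
]);

foreach ($response['hits']['hits'] as $hit) {
    // placeholder: each hit is a saved search that matches the new post;
    // push $hit['_id'] onto your notification path (e.g. RabbitMQ)
}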
I can't say much when it comes to performance, because you did not provide any example of your queries but mostly because my findings are inconsistent.
According to this post (https://www.elastic.co/blog/elasticsearch-queries-or-term-queries-are-really-fast), Elasticsearch queries should be capable of reaching 30,000 queries/second.
However, this unanswered question (Elasticsearch percolate performance) reported a painfully slow 200 queries/second on a 16 CPU server.
With no additional information I can only guess that the cause is configuration problems, so I think you'll have to try a bunch of different configurations to get the best possible performance.
Good luck!
This answer was written without a true understanding of the implications of a "saved search". I leave it here as discussion of a related problem, but not as a "saved search" solution. -- Rick James
If you are saving only the "query", I don't see a problem. I will assume you are saving both the query and the "resultset"...
One "saved search" per second? 2.4M rows? Simply rerun the search when needed. The system should be able to handle that small a load.
Since the data is changing, the resultset will become outdated soon - how soon? That is, a saved resultset would need to be purged rather quickly. Surely the data is not so static that you can wait a month. Maybe an hour?
Actually saving the resultset and being able to replay it involves (1) complexity in your code, (2) overhead in caching, I/O, etc, etc.
What is the average number of times that the user will look at the same search? Because of the overhead I just mentioned, I suspect the average number of times needs to be more than 2 to justify the overhead.
Bottom line... This smells like "premature optimization". I recommend:
Build the site without saving resultsets.
Stress test it to see when it will break.
Work on optimizing the slow parts.
As for RabbitMQ -- "Don't queue it, just do it". The cost of queuing and dequeuing is (1) increased latency for the user and (2) increased overhead on system. The benefit (at your medium scale) is minimal.
If you do hit scaling problems, consider
Move clients off to another server -- away from the database. This will give you some scaling, but not 2x. To go farther...
Use replication: One Master + many readonly Slaves -- and do the queries on the Slaves. This gives you virtually unlimited scaling in the database.
Have multiple web servers -- virtually unlimited scaling in this part.
I don't understand why you want to use saved search... First: you should optimize the service so that saved search is needed as little as possible.
Have you done anything with the ES server? (What can you afford?) So:
Have you optimized the Elasticsearch server? By default it uses 1GB of RAM. The best solution is to give it half the machine's RAM, but no more than 16GB (if I remember correctly; check the docs).
How powerful is the ES machine? Elasticsearch prefers more cores over higher clock speeds (MHz).
How many ES nodes do you have? You can always add another machine to get results faster.
In my case (ES 2.4), the server slows down after a few days, so I restart it once a day.
And next:
Why do you want to fire up tasks every half hour? If you already use cron, fire it every minute and mark which queries are currently running. The other post here has a better solution and an explanation.
Why do you separate the result from the query?
Remember to normalize the queries, so that a change in parameter order does not force a new query.
Why do you want to use MySQL to store results? A document-oriented database, like Elasticsearch, is the better fit. xD
I propose you:
Optimize the ES structure - choose the right tokenizers for your fields.
Use asynchronous loading of results - e.g. WebSocket + Node.js

Implementing dynamically updating upvote/downvote

How do I implement a dynamically updating vote count similar to Quora's, where whenever a user upvotes an answer it is reflected automatically for everyone who is viewing that page?
I am looking for an answer that addresses the following:
Do we have to keep polling for upvote counts for every answer? If yes, how do we manage the server load arising from so many users polling for upvotes?
Or should we use WebSockets/push notifications? How scalable are these?
How should we store the upvote/downvote counts in the database or in memory to support this, and how do such systems control the number of reads/writes? My backend database is MySQL.
The answer I am looking for may not be exactly how Quora does it, but rather how this can be done using available open-source technologies.
It's not the back-end system details that you need to worry about but the front end. Having connections open all the time is impractical at any real scale. Instead you want the opposite - to be able to serve and close connections from the back end as fast as you can.
WebSockets are a sexy technology, but again, in the real world there are issues with proxies; if you are developing something that should work on a variety of screens (desktop, tablet, mobile), that may become a concern for you. Even good old long polling might not work through firewalls and proxies.
Here is the good news: I think
"keep polling for upvote counts for every answer"
is a totally good solution in this case. Consider the following:
your use-case does not need true real-time updates. There is little harm in seeing the counter updated a bit later
for very popular topics you would like to squash multiple up-votes/down-votes into one anyway
most topics will see no up-vote/down-vote traffic at all for days/weeks, so keeping a connection open, waiting for an event that never comes, is a waste
most users who just came to read a topic will never up-vote/down-vote, so your read/write ratio for topic stats will be greatly skewed toward reads
network latency varies hugely across clients; you will see horrible transfer rates even for a 100-byte HTTP response, and while that sluggish client fetches its response byte by byte, your precious server connection - and, more importantly, a thread on a back-end server - is busy
Here is what I'd start with:
have browsers periodically poll for a new topic stat, after the main page loads
keep your MySQL and keep the counters there. Every time there is an up/down vote, update the DB
put Memcached in front of the DB as a write-through cache, i.e. every time there is an up/down vote, update the cache, then update the DB. Set an explicit expiry time of 10-15 minutes for each counter there; every time the counter is updated, the expiry time is extended automatically
design these polling HTTP calls to be cacheable by HTTP proxies; set the Expires/TTL HTTP headers to 60 seconds
put a reverse proxy (Varnish, nginx) in front of your front-end servers and have this proxy cache the said polling calls. This takes care of the second-level cache and helps free up back-end server threads quicker; see the network latency concern above
set up your reverse proxy to talk to the Memcached servers directly without making a call to the backend server; yes, you can do this with both Varnish and nginx
there is no fancy schema for storing such data; it's a simple increment/decrement operation in Memcached, and note that it's safe from a race-condition point of view. It's also a safe atomic operation in MySQL: UPDATE table SET field = field + 1 WHERE [...] (see the sketch after this list)
Aggressive multi-level caching covers your read path: in Memcached and in all HTTP caches along the way; note that these HTTP poll requests will be cached at the edges as well.
To take care of the long tail of unpopular topics, make the HTTP TTL for such responses inversely proportional to popularity.
A read request will only infrequently get to the front-end server - when the HTTP cache has expired and Memcached does not have the value either. If that is still a problem, add Memcached servers and increase the expiry times in Memcached across the board.
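To make the write-through counter and the cacheable polling endpoint concrete, here is a minimal PHP sketch; the Memcached key scheme, table name, column names, and TTLs are made-up examples, not a definitive implementation:

<?php
// Write path: update Memcached first, then MySQL (write-through).
function castVote(Memcached $mc, PDO $pdo, int $answerId, int $delta): void
{
    $key = "votes:$answerId";                       // hypothetical key scheme
    if ($mc->get($key) === false) {
        // seed the counter from the DB if it isn't cached yet
        $stmt = $pdo->prepare('SELECT votes FROM answers WHERE id = ?');
        $stmt->execute([$answerId]);
        $mc->add($key, (int)$stmt->fetchColumn(), 900);   // 15-minute expiry
    }
    $delta > 0 ? $mc->increment($key) : $mc->decrement($key);
    $mc->touch($key, 900);                          // prolong the expiry on every update

    // atomic update in MySQL, safe against races
    $pdo->prepare('UPDATE answers SET votes = votes + ? WHERE id = ?')
        ->execute([$delta, $answerId]);
}

// Read path: the polling endpoint, cacheable by proxies for 60 seconds.
function voteCount(Memcached $mc, PDO $pdo, int $answerId): void
{
    header('Cache-Control: public, max-age=60');    // lets Varnish/nginx cache the response
    $votes = $mc->get("votes:$answerId");
    if ($votes === false) {
        $stmt = $pdo->prepare('SELECT votes FROM answers WHERE id = ?');
        $stmt->execute([$answerId]);
        $votes = (int)$stmt->fetchColumn();
        $mc->set("votes:$answerId", $votes, 900);
    }
    echo json_encode(['votes' => (int)$votes]);
}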
After you are done with that, you have all the reads taken care of. The only problem you might still have, depending on the scale, is a high rate of writes, i.e. the flow of up/down votes. This is where your single MySQL instance might start showing some lag. Fear not - proceed along the well-beaten path of sharding your instances, or add a NoSQL store just for the counters.
Do not use any messaging system unless absolutely necessary or you want an excuse to play with it.
Websockets, Server Sent Events (I think that's what you meant by 'push notifications') and AJAX long polling have the same drawback - they keep underlying TCP connection open for a long time.
So the question is how many open TCP connections can a server handle.
Basically, it depends on the OS, the number of file descriptors (a config parameter) and the available memory (each open connection reserves read/write buffers).
Here's more on that.
We once tested the possibility of keeping 1 million WebSocket connections open on a single server (Windows 7 x64 with 16GB of RAM, JVM 1.7 with 8GB of heap, using an Undertow beta to serve web requests).
Surprisingly, the hardest part was generating the load on the server :)
It managed to hold 1M. But again, the server didn't do anything useful; it just received requests, went through the protocol upgrade and kept those connections open.
There was also some number of lost connections, for whatever reason. We didn't investigate. But in production you would also have to ping the server and handle reconnection.
Apart from that, WebSockets seem like overkill here, and SSE still isn't widely adopted.
So I would go with good old AJAX polling, but optimize it as much as possible.
It works everywhere, is simple to implement and tweak, has no reliance on an external system (I've had bad experiences with that several times), and leaves room for optimization.
For instance, you could group updates for all open articles in a single browser, or adjust update interval according to how popular the article is.
After all it doesn't seem like you need real-time notifications here.
Sounds like you might be able to use a messaging system like Kafka, RabbitMQ, or ActiveMQ. Your front end would send votes to a message channel and receive them with a listener, and you could have a server-side piece persist the votes to the DB periodically.
You could also accomplish your task by polling your database and incrementing/decrementing a number related to a post via a stored procedure... there are a bunch of options here and it depends on how much concurrency you may be facing.

Does Caching always enhance performance?

I have a number of sites with PHP and MySQL, especially running MediaWiki, and I need to enhance the performance. However, I have only a limited percentage of CPU that I'm allowed to use.
The best thing I can think about to improve performance is to enable caching. However, I'm confused: Does that really enhance performance overall or just enhance speed?
What I'm thinking is: if caching uses files, then it takes extra processing to get the content of those files. If it uses SQL tables, then it takes extra processing to query those tables as well; perhaps the time will be shorter, but the CPU usage will be higher.
Is that correct or not? Does caching consume more CPU to give speedier results, or does it improve performance overall?
At the most basic level, caching should be used to store the result of CPU-intensive processes. For example, if you have a server-side image handler that creates an image on the fly (say a thumbnail and larger preview), then you don't want this operation to occur on every request - you'd want to run this process once and store the result; then every other request gets the saved result.
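As a rough illustration of that idea (not a definitive implementation), here is a PHP sketch using GD; the cache directory, size, and quality setting are made-up, and error handling is omitted:

<?php
// Serve a cached thumbnail if it exists; otherwise generate it once and store it.
function thumbnail(string $source, int $width = 200): string
{
    $cached = '/var/cache/thumbs/' . md5($source . $width) . '.jpg';  // hypothetical cache dir
    if (!file_exists($cached)) {
        [$w, $h] = getimagesize($source);
        $height  = (int)($h * $width / $w);          // keep the aspect ratio
        $src = imagecreatefromjpeg($source);
        $dst = imagecreatetruecolor($width, $height);
        imagecopyresampled($dst, $src, 0, 0, 0, 0, $width, $height, $w, $h);
        imagejpeg($dst, $cached, 85);                // the expensive work happens only once
    }
    return $cached;                                  // every later request reuses this file
}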
This is obviously a hugely over-simplified description of basic caching, and the use of an image is fine in this case as you don't have to worry about stale data, i.e. how often will the actual image change? In your case, databases are hugely different. If you cache data, then how can you guarantee that there won't be an instant mismatch between your real data and your cached data? Also, querying a database is not always a CPU-intensive task (granted, you have to consider how the database is designed in terms of indexing, table size, etc.), but in most cases querying a well-designed database is far more intensive on disk I/O than it is on CPU cycles.
First, you need to look at your database design and secondly your queries. For example: are you normalizing your database correctly? Are your queries trawling through huge amounts of data when you could just archive? Are you joining tables on non-indexed fields? Are your WHERE clauses querying fields that could be indexed (IN is particularly bad in these cases)?
I recommend you get hold of a query analyzer and spend some time optimizing your table structure and queries to find that bottleneck before looking into more drastic changes.
Reference : http://msdn.microsoft.com/en-us/library/ee817646.aspx
Performance : Caching techniques are commonly used to improve application performance by storing relevant data as close as possible to the data consumer, thus avoiding repetitive data creation, processing, and transportation.
For example, storing data that does not change, such as a list of countries, in a cache can improve performance by minimizing data access operations and eliminating the need to recreate the same data for each request.
Scalability : The same data, business functionality, and user interface fragments are often required by many users and processes in an application. If this information is processed for each request, valuable resources are wasted recreating the same output. Instead, you can store the results in a cache and reuse them for each request. This improves the scalability of your application because as the user base increases, the demand for server resources for these tasks remains constant.
For example, in a Web application the Web server is required to render the user interface for each user request. You can cache the rendered page in the ASP.NET output cache to be used for future requests, freeing resources to be used for other purposes.
Caching data can also help scale the resources of your database server. By storing frequently used data in a cache, fewer database requests are made, meaning that more users can be served.
Availability : Occasionally the services that provide information to your application may be unavailable. By storing that data in another place, your application may be able to survive system failures such as network latency, Web service problems, or hardware failures.
For example, each time a user requests information from your data store, you can return the information and also cache the results, updating the cache on each request. If the data store then becomes unavailable, you can still service requests using the cached data until the data store comes back online.
You need to profile your system and find out where the bottlenecking is happening. Caching gives you the best type of page load - one that doesn't make the server do the work at all. You can build a very simple caching system that only regenerates the information every 15 minutes. So, if the page was cached in the last 15 minutes, it gives visitors the pre-rendered page. The page is rendered once and a temp file is created; every 15 minutes you create a new one (if someone loads that page).
Caching only stores a file that the server has already done the work for. The work to create the file is already done and you're simply storing it.
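Here is a minimal sketch of that 15-minute temp-file cache in PHP; the cache location and TTL are just example values:

<?php
// Serve a pre-rendered copy of the page if it is less than 15 minutes old;
// otherwise rebuild it, store it, and serve the fresh copy.
$cacheFile = '/tmp/cache_' . md5($_SERVER['REQUEST_URI']) . '.html';  // hypothetical location
$ttl       = 15 * 60;

if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
    readfile($cacheFile);        // cache hit: no queries, no rendering
    exit;
}

ob_start();
// ... render the page as usual (queries, templates, etc.) ...
$html = ob_get_clean();

file_put_contents($cacheFile, $html);
echo $html;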
You use the terms 'performance' and 'speed'. I'll assume 'performance' relates to CPU cycles on your web server and that 'speed' relates to the time it takes to serve the page to the user. You want to maximize web server 'performance' ( by lowering the total number of CPU cycles needed to serve pages ) whilst maximizing 'speed' ( lowering the time it takes to serve a web page ).
The good news for you is that Caching can improve both of these metrics at the same time. By caching content you create an output page that is stored in the cache and can be served repeatedly to users directly without having to re-execute PHP code that originally created this output page ( thus lowering CPU cycles ). Fetching a cached page from cache consumes less CPU cycles than re-executing PHP code.
Caching is particularly good for web pages that are generally the same for all users who request the page - for example in a wiki, and for pages that generally do not change all too often - again, a wiki.
"Enhance performance" sounds like some of the email I get...
There are two interrelated things that happen here. One is "how long does it take to serve a given request?", and the other is "how many requests can I serve concurrently given my limited resources?". People tend to use either or both of those concepts when talking about performance.
Caching can help with both those things.
The most effective caching strategy uses resources outside your machines to cache your stuff - the most obvious examples are the user's browser, or a CDN. I'll assume you can't use a CDN, but by spending a bit of effort on setting the HTTP cache headers, you can reduce the number of requests to your server for static or slow-changing resources quite dramatically.
For dynamic content - usually the web page you generate by querying your database - the next most effective caching strategy is to cache the HTML generated by (parts of) your page. For instance, if you have a "most popular items" box on your homepage, this will usually run a couple of moderately complex database queries, and then some "turn data to HTML" back-end code. If you can cache the HTML, you save both the database queries and the CPU effort of turning the data into HTML.
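As an illustration of caching just that rendered fragment, here is a rough PHP sketch using APCu; the cache key, TTL, query, and markup are all made-up examples:

<?php
// Cache the rendered "most popular items" HTML for 5 minutes.
function popularItemsBox(PDO $pdo): string
{
    $html = apcu_fetch('box:popular-items', $ok);    // hypothetical cache key
    if ($ok) {
        return $html;                                // no query, no templating
    }

    $rows = $pdo->query(
        'SELECT title FROM items ORDER BY sales DESC LIMIT 5'
    )->fetchAll(PDO::FETCH_ASSOC);

    $html = '<ul>';
    foreach ($rows as $row) {
        $html .= '<li>' . htmlspecialchars($row['title']) . '</li>';
    }
    $html .= '</ul>';

    apcu_store('box:popular-items', $html, 300);     // rebuild at most once per 5 minutes
    return $html;
}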
If that's not possible, you may be able to cache the result of some database queries. That helps in reducing the database load, and usually also reduces the load on your web server - the code required to run the database query and deal with the results is usually more onerous than retrieving the item from cache. Because it's faster, it allows your request to be handled more quickly, which frees up resources sooner. This reduces the load on your servers for an individual request, and thus allows you to serve more concurrent requests.

MySQL scale up or scale out?

I have been tasked with investigating reasons why our internal web application is hitting performance problems.
The web application itself is part written in PHP and part written in Perl, and we have a MySQL database which is where I believe the source of performance hit is occurring.
We have about 400 users of the system, of which, most are spread across different timezones, so generally there are only ever a max of 30 users online at any one time. The performance problems have crept up on us, particularly over the past year as the database keeps growing.
The system is running on one single 32-bit debian server - 6GB of RAM, with 8 x 2.4GHz intel CPU. This is probably not hefty enough for the job in-hand. However, even at times where I am the only user online, page loading time can still be slow.
I'm trying to determine whether we need to scale up or scale out. Firstly, I'd like to know is how well our hardware is coping with the demands placed upon it. And secondly, whether it might be worth scaling out and creating some replication slaves to balance the load.
There are a lot of tools available on the internet - probably a bit too many to investigate. Can anyone recommend any tools that can provide some profiling/performance monitoring that may help me on my quest.
Many thanks,
ns
Your slow-down seems to be related to the data and not to the number of concurrent users.
Properly indexed queries tend to scale logarithmically with the amount of data - i.e. doubling the data increases the query time by some constant C, doubling the data again adds the same C, doubling again adds the same C, etc. Before you know it, you have humongous amounts of data, yet your queries are only a little slower.
If the slow-down wasn't as gradual in your case (i.e. it was linear to the amount of data, or worse), this might be an indication of badly optimized queries. Throwing more iron at the problem will postpone it, but unless you have unlimited budget, you'll have to actually solve the root cause at some point:
Measure the query performance on the actual data to identify slow queries.
Examine the execution plans for possible improvements.
If necessary, learn about indexing, clustering, covering and other performance techniques.
And finally, apply that knowledge onto queries you have identified in steps (1) and (2).
If nothing else helps, think about your data model. Sometimes, a "perfectly" normalized model is not the best performing one, so a little judicial denormalization might be warranted.
The easy (lazy) way if you have budget is just to throw some more iron at it.
A better way, before deciding where or how to scale, would be to identify the bottlenecks. Is it every page load that is slow? Or just particular pages? If it is just a few pages, then invest in a profiler (for PHP, both Xdebug and the Zend Debugger can do profiling). I would also (if you haven't) invest in a test system that is as similar as possible to the live system to run diagnostics on.
You could also look at gathering some stats, either at the server level with a program such as sar (from the sysstat package) or at the DB level (have you got the slow query log running?).
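If the slow query log isn't enabled yet, here is a hedged sketch of switching it on at runtime and examining a suspect query's plan via PDO; the threshold and the example query are placeholders, and the settings can also be made permanent in my.cnf:

<?php
// Enable the slow query log at runtime (typically needs SUPER or equivalent privilege),
// then look at the execution plan of a query you suspect is slow.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$pdo->exec("SET GLOBAL slow_query_log = 'ON'");
$pdo->exec('SET GLOBAL long_query_time = 1');    // log anything slower than 1 second

// Example: check whether a suspect query is using an index.
$plan = $pdo->query('EXPLAIN SELECT * FROM orders WHERE customer_id = 42')
            ->fetchAll(PDO::FETCH_ASSOC);
print_r($plan);   // look at the "key" and "rows" columns for missing indexes or large scans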