So I have a bit of a performance problem. I have made a java program that constructs a database. The problem is when loading in the data. I am loading in 5,000 files into a sql Database. When the program starts off, it can process about 10% of the files in 10 minutes however it gets much slower as it progresses. Currently at 28% it is going to finish in 16 hours at its current rate. However that rate is slowing down considerably.
My question is why does the program get progressively slower as it runs and how to fix that.
EDIT: I have two versions. One is threaded (capped at 5 threads) and one is not. The difference between the two is negligible. I can post the code again if any one likes, but I took it out because I am now fairly certain that the bottle neck is the MySQL (Also appropriately re tagged). I went ahead and used batch inserts. This did cause an initial increase in speed but once again after processing about 30% of the data it does drop of quickly.
So SQL Points
My Engine for all 64 tables is InnoDB version 10.
The table have about 300k rows at this point (~30% of the data)
All tables have one "joint" primary key. A id and a date.
Looking at MySQL WorkBench I see that there is a query per thread (5 queries)
I am not sure the unit of time (Just reading from MySQL Administrator), but the queries to check if a file is already inserted are taking `300. (This query should be fast as it is a SELECT MyIndex from MyTable Limit 1 to 1 where Date = date.) As I have been starting and stopping the program I built in this check to see if the file was already inserted. That way I am able to start it after each change and see what if any improvement there is without starting the process again.
I am fairly certain that the degradation of preformance is related to the tables' sizes. (I can stop and start the program now and the process remains slow. It is only when the tables are small that the process is going at an acceptable speed.)
Please, please ask and I will post what ever information you need.
DONE! Well I just let it run for the 4 Days it needed to. Thank you all for the help.
Cheers,
--Orlan
Q1: Why does the program get progressively slower?
In your problem space, you have 2 systems interacting: a producer that reads from the file system and produces data, and a consumer that transforms that data into records and stores them in the db. Your code is currently hard linking these two processes and your system works at the slowest speed of the two.
In your program you have a fixed arrival rate (1/sec - the wait when you've more than 10 threads running). If you have indexes in the tables being filled, as the table grows bigger, inserts will take longer. That means that while your arrival rate is fixed at 1/sec, your exit rate is continuosly increasing. Therefore, you will be creating more and more threads that share the same CPU/IO resources and getting less things done per unit of time. Creating threads is also a very expensive operation.
Q2: Could it have to do with how I am constructing the queries from Strings?
Only partially. Your string manipulation is a fixed cost in the system. It increases the cost it takes to service one request. But string operations are CPU bounded and your problem is I/O bounded, meaning that improving the string handling (that you should) will only marginally improve the performance of the system. (See Amdahl's Law).
Q3: how to fix that (performance issue)
Separate the file reader process from the db insert process. See the Consumer-Producer pattern. See also Completion Service for an implementation built-in the JDK:
(FileReaderProducer) --> queue --> (DBBulkInsertConsumer)
Don't create new Threads. Use the facilities provided by the java.util.concurrent package, like the executor service or the Completion service mentioned above. For a "bare" threadpool, use the Executors factory.
For this specific proble, having 2 separate thread pools, (one for the consumer, one for the producer) will allow you to tune your system for best performance. File reading improves with parallelization (up to your I/O bound), but db inserts are not (I/O + indexes + relational consistency checks), so you might need to limit the amount of file reading threads (3-5) to match the insertion rate (2-3). You can monitor the queue size to evaluate your system performance.
Use JDBC bulk inserts: http://viralpatel.net/blogs/batch-insert-in-java-jdbc/
Use StringBuilder instead of String concatenation. Strings in Java are immutable. That means that every time you do: myString += ","; you are creating a new String and making the old String elegible for garbage collection. In turn, this increases garbage collection performance penalties.
You can use direct insert from file to database (read here). It works faster. When I do same for postgres I get 20 times performance increase.
And also dounload Your kit profiler and profile your application for performance. Than you will see what takes your time.
There's a number of things in your code that could contribute to the speed problems and you are correct in suspecting that the Strings play a role.
Take for example this code:
String rowsString = "";
// - an extra 1 to not have a comma at the end
for (int i = 0; i <= numberOfRows - 3; i++) {
rowsString += "(DATA), \n";
}
rowsString += "(DATA)";
Depending on how many rows there are, this is a potential bottle-neck and memory hog. I think it's best if you use a StringBuilder here. I see a lot of String manipulation that are better suited to StringBuilders. Might I suggest you read up on String handling a bit and optimise these, especially where you += Strings?
Then the next question is how is your table designed? There could be things making your inserts slow, like incorrect default lengths for varchars, no indexes or too many indexes etc.
Most databases load data more efficiently if,
you load in batches of data,
you load in a relatively small numebr of threads e.g. one or two.
As you add more threads you add more overhead, so you expect it to be slower.
Try using an ExecutorService with a fixed size pool e.g. 2-4 and try loading the data in batches of say 100 at a time in a transaction.
You have several good tried and tested options for speeding up database access.
Use an ExecutorService for your threads. This may not help speed-wise but it will help you implement the following.
Hold a ThreadLocal Connection instead of making a new connection for every file. Also, obviously, don't close it.
Create a single PreparedStatement instead of making a new one every time around.
Batch up your statement executions.
Related
For example, if you were to have a webpage that writes 10 times (5ms each query so relatively inexpensive) to a MySQL database every time you visited the page. Would it be the same thing as a page load that writes only one query that is 50ms instead? What I'm basically asking is does queries per second really matter? Will it bottleneck my database faster?
If I had a pretty large database, is there any differences of writing 10000 inexpensive queries per second versus 1000 more expensive ones?
Besides the actual time spent executing a query, there is an overhead for your Rails application to establish a connection and close it.
Take your example of comparing one write of a certain amount of data versus 10 writes, each one-tenth the amount of data.
In the former case, only one database connection is needed, putting little stress on the connection pool. The write completes in 50ms, and during that time 9 other writes could also come in and execute. On the other hand, in the latter case, 10 connections from the pool must be used for the same data alone. And likely the amount of time needed for each of the 10 writes would be greater than 5ms, due to the overhead of establishing the connection.
Databases, like anything else, can benefit from economy of scale. Of course, there are instances when you simply need so many connections happening. But, all other factors being the same, if your business logic can be done with one large write rather than ten smaller ones, you might want to go for the single write.
I've got a web application, it has the normal feature, user settings etc these are all stored in MYSQL with the user etc.....
A particular part of the application a is a table of data for the user to edit.
I would like to make this table real time, across multiple users. Ie multiple users can open the page edit the data and see changes in real time done by other users editing the table.
My thinking is to cache the data for the table in Redis, then preform all the actions in redis like keeping all the clients up to date.
Once all the connection have closed for a particular table save the data back to mysql for persistence, I know Redis can be used as a persistent NoSQL database but as RAM is limited and all my other data is stored in MYSQL, mysql seems a better option.
Is this a correct use case for redis? Is my thinking correct?
It depends on the scalability. The number of records you are going to deal with and the structure you are going to use for saving it.
I will discuss about pros and crons of using redis. The decision is up to you.
Advantages of using redis:
1) It can handle heavy writes and reads in comparison with MYSQL
2) It has flexible structures (hashmap, sorted set etc) which can
localise your writes instead of blocking the whole table.
3) Read queries will be much faster as it is served from cache.
Disadvantages of using redis:
1) Maintaining transactions. What happens if both users try to access a
particular cell at a time? Do you have a right data structure in redis to
handle this case?
2) What if the data is huge? It will exceed the memory limit.
3) What happens if there is a outage?
4) If you plan for persistence of redis. Say using RDB or AOF. Will you
handle those 5-10 seconds of downtime?
Things to be focussed:
1) How much data you are going to deal with? Assume for a table of 10000 rows wit 10 columns in redis takes 1 GB of memory (Just an assumption actual memory will be very much less). If your redis is 10GB cluster then you can handle only 10 such tables. Do a math of about how many rows * column * live tables you are going to work with and the memory it consumes.
2) Redis uses compression for data within a range http://redis.io/topics/memory-optimization. Let us say you decide to save the table with a hashmap, you have two options, for each column you can have a hashmap or for each row you can have a hashmap. Second option will be the optimal one. because storing 1000 (hashmaps -> rows) * 20 (records in each hash map -> columns) will take 10 time less memory than storing in the other way. Also in this way if a cell is changed you can localize in hashmap of within 20 values.
3) Loading the data back in your MYSQL. how often will this going to happen? If your work load is high then MYSQL begins to perform worse for other operations.
4) How are you going to deal with multiple clients on notifying the changes? Will you load the whole table or the part which is changes? Loading the changed part will be the optimal one. In this case, where will you maintain the list of cells which have been altered?
Evaluate your system with these questions and you will find whether it is feasible or not.
ABSTRACT
Talking with some colleagues we came accross the "extract random row from a big database table" issue. It's a classic one and we know the naive approach (also on SO) is usually something like:
SELECT * FROM mytable ORDER BY RAND() LIMIT 1
THE PROBLEM
We also know a query like that is utterly inefficient and actually usable only with very few rows. There are some approaches that could be taken to attain better efficiency, like these ones still on SO, but they won't work with arbitrary primary keys and the randomness will be skewed as soon as you have holes in your numeric primary keys. An answer to the last cited question links to this article which has a good explanation and some bright solutions involving an additional "equal distribution" table that must be maintained whenever the "master data" table changes. But then again if you have frequent DELETEs on a big table you'll probably be screwed up by the constant updating of the added table. Also note that many solutions rely on COUNT(*) which is ridiculously fast on MyISAM but "just fast" on InnoDB (I don't know how it performs on other platforms but I suspect the InnoDB case could be representative of other transactional database systems).
In addition to that, even the best solutions I was able to find are fast but not Ludicrous Speed fast.
THE IDEA
A separate service could be responsible to generate, buffer and distribute random row ids or even entire random rows:
it could choose the best method to extract random row ids depending on how the original PKs are structured. An ordered list of keys could be maintained in ram by the service (shouldn't take too many bytes per row in addition to the actual size of the PK, it's probably ok up to 100~1000M rows with standard PCs and up to 1~10 billion rows with a beefy server)
once the keys are in memory you have an implicit "row number" for each key and no holes in it so it's just a matter of choosing a random number and directly fetch the corresponding key
a buffer of random keys ready to be consumed could be maintained to quickly respond to spikes in the incoming requests
consumers of the service will connect and request N random rows from the buffer
rows are returned as simple keys or the service could maintain a (pool of) db connection(s) to fetch entire rows
if the buffer is empty the request could block or return EOF-like
if data is added to the master table the service must be signaled to add the same data to its copy too, flush the buffer of random picks and go on from that
if data is deleted from the master table the service must be signaled to remove that data too from both the "all keys" list and "random picks" buffer
if data is updated in the master table the service must be signaled to update corresponding rows in the key list and in the random picks
WHY WE THINK IT'S COOL
does not touch disks other than the initial load of keys at startup or when signaled to do so
works with any kind of primary key, numerical or not
if you know you're going to update a large batch of data you can just signal it when you're done (i.e. not at every single insert/update/delete on the original data), it's basically like having a fine grained lock that only blocks requests for random rows
really fast on updates of any kind in the original data
offloads some work from the relational db to another, memory only process: helps scalability
responds really fast from its buffers without waiting for any querying, scanning, sorting
could easily be extended to similar use cases beyond the SQL one
WHY WE THINK IT COULD BE A STUPID IDEA
because we had the idea without help from any third party
because nobody (we heard of) has ever bothered to do something similar
because it adds complexity in the mix to keep it updated whenever original data changes
AND THE QUESTION IS...
Does anything similar already exists? If not, would it be feasible? If not, why?
The biggest risk with your "cache of eligible primary keys" concept is keeping the cache up to date, when the origin data is changing continually. It could be just as costly to keep the cache in sync as it is to run the random queries against the original data.
How do you expect to signal the cache that a value has been added/deleted/updated? If you do it with triggers, keep in mind that a trigger can fire even if the transaction that spawned it is rolled back. This is a general problem with notifying external systems from triggers.
If you notify the cache from the application after the change has been committed in the database, then you have to worry about other apps that make changes without being fitted with the signaling code. Or ad hoc queries. Or queries from apps or tools for which you can't change the code.
In general, the added complexity is probably not worth it. Most apps can tolerate some compromise and they don't need an absolutely random selection all the time.
For example, the inequality lookup may be acceptable for some needs, even with the known weakness that numbers following gaps are chosen more often.
Or you could pre-select a small number of random values (e.g. 30) and cache them. Let app requests choose from these. Every 60 seconds or so, refresh the cache with another set of randomly chosen values.
Or choose a random value evenly distributed between MIN(id) and MAX(id). Try a lookup by equality, not inequality. If the value corresponds to a gap in the primary key, just loop and try again with a different random value. You can terminate the loop if it's not successful after a few tries. Then try another method instead. On average, the improved simplicity and speed of an equality lookup may make up for the occasional retries.
It appears you are basically addressing a performance issue here. Most DB performance experts recommend you have as much RAM as your DB size, then disk is no longer a bottleneck - your DB lives in RAM and flushes to disk as required.
You're basically proposing a custom developed in-RAM CDC Hashing system.
You could just build this as a standard database only application and lock your mapping table in RAM, if your DB supports this.
I guess I am saying that you can address performance issues without developing custom applications, just use already existing performance tuning methods.
Lets say I query a table with 500K rows. I would like to begin viewing any rows in the fetch buffer, which holds the result set, even though the query has not yet completed. I would like to scroll thru the fetch buffer. If I scroll too far ahead, I want to display a message like: "REACHED LAST ROW IN FETCH BUFFER.. QUERY HAS NOT YET COMPLETED".
Could this be accomplished using fgets() to read the fetch buffer while the query continues building the result set? Doing this implies multi-threading*
Can a feature like this, other than the FIRST ROWS hint directive, be provided in Oracle, Informix, MySQL, or other RDBMS?
The whole idea is to have the ability to start viewing rows before a long query completes, while displaying a counter of how many rows are available for immediate viewing.
EDIT: What I'm suggesting may require a fundamental change in a DB server's architecture, as to the way they handle their internal fetch buffers, e.g. locking up the result set until the query has completed, etc. A feature like the one I am suggesting would be very useful, especially for queries which take a long time to complete. Why have to wait until the whole query completes, when you could start viewing some of the results while the query continues to gather more results!
Paraphrasing:
I have a table with 500K rows. An ad-hoc query without a good index to support it requires a full table scan. I would like to immediately view the first rows returned while the full table scan continues. Then I want to scroll through the next results.
It seems that what you would like is some sort of system where there can be two (or more) threads at work. One thread would be busy synchronously fetching the data from the database, and reporting its progress to the rest of the program. The other thread would be dealing with the display.
In the meantime, I would like to display the progress of the table scan, example: "Searching...found 23 of 500,000 rows so far".
It isn't clear that your query will return 500,000 rows (indeed, let us hope it does not), though it may have to scan all 500,000 rows (and may well have only found 23 rows that match so far). Determining the number of rows to be returned is hard; determining the number of rows to be scanned is easier; determining the number of rows already scanned is very difficult.
If I scroll too far ahead, I want to display a message like: "Reached last row in look-ahead buffer...query has not completed yet".
So, the user has scrolled past the 23rd row, but the query is not yet completed.
Can this be done? Maybe like: spawn/exec, declare scroll cursor, open, fetch, etc.?
There are a couple of issues here. The DBMS (true of most databases, and certainly of IDS) remains tied up as far as the current connection on processing the one statement. Obtaining feedback on how a query has progressed is difficult. You could look at the estimated rows returned when the query was started (information in the SQLCA structure), but those values are apt to be wrong. You'd have to decide what to do when you reach row 200 of 23, or you only get to row 23 of 5,697. It is better than nothing, but it is not reliable. Determining how far a query has progressed is very difficult. And some queries require an actual sort operation, which means that it is very hard to predict how long it will take because no data is available until the sort is done (and once the sort is done, there is only the time taken to communicate between the DBMS and the application to hold up the delivery of the data).
Informix 4GL has many virtues, but thread support is not one of them. The language was not designed with thread safety in mind, and there is no easy way to retrofit it into the product.
I do think that what you are seeking would be most easily supported by two threads. In a single-threaded program like an I4GL program, there isn't an easy way to go off and fetch rows while waiting for the user to type some more input (such as 'scroll down the next page full of data').
The FIRST ROWS optimization is a hint to the DBMS; it may or may not give a significant benefit to the perceived performance. Overall, it typically means that the query is processed less optimally from the DBMS perspective, but getting results to the user quickly can be more important than the workload on the DBMS.
Somewhere down below in a much down-voted answer, Frank shouted (but please don't SHOUT):
That's exactly what I want to do, spawn a new process to begin displaying first_rows and scroll through them even though the query has not completed.
OK. The difficulty here is organizing the IPC between the two client-side processes. If both are connected to the DBMS, they have separate connections, and therefore the temporary tables and cursors of one session are not available to the other.
When a query is executed, a temporary table is created to hold the query results for the current list. Does the IDS engine place an exclusive lock on this temp table until the query completes?
Not all queries result in a temporary table, though the result set for a scroll cursor usually does have something approximately equivalent to a temporary table. IDS does not need to place a lock on the temporary table backing a scroll cursor because only IDS can access the table. If it was a regular temp table, there'd still not be a need to lock it because it cannot be accessed except by the session that created it.
What I meant with the 500k rows, is nrows in the queried table, not how many expected results will be returned.
Maybe a more accurate status message would be:
Searching 500,000 rows...found 23 matching rows so far
I understand that an accurate count of nrows can be obtained in sysmaster:sysactptnhdr.nrows?
Probably; you can also get a fast and accurate count with 'SELECT COUNT(*) FROM TheTable'; this does not scan anything but simply accesses the control data - probably effectively the same data as in the nrows column of the SMI table sysmaster:sysactptnhdr.
So, spawning a new process is not clearly a recipe for success; you have to transfer the query results from the spawned process to the original process. As I stated, a multithreaded solution with separate display and database access threads would work after a fashion, but there are issues with doing this using I4GL because it is not thread-aware. You'd still have to decide how the client-side code is going store the information for display.
There are three basic limiting factors:
The execution plan of the query. If the execution plan has a blocking operation at the end (such as a sort or an eager spool), the engine cannot return rows early in the query execution. It must wait until all rows are fully processed, after which it will return the data as fast as possible to the client. The time for this may itself be appreciable, so this part could be applicable to what you're talking about. In general, though, you cannot guarantee that a query will have much available very soon.
The database connection library. When returning recordsets from a database, the driver can use server-side paging or client-side paging. Which is used can and does affect which rows will be returned and when. Client-side paging forces the entire query to be returned at once, reducing the opportunity for displaying any data before it is all in. Careful use of the proper paging method is crucial to any chance to display data early in a query's lifetime.
The client program's use of synchronous or asynchronous methods. If you simply copy and paste some web example code for executing a query, you will be a bit less likely to be working with early results while the query is still running—instead the method will block and you will get nothing until it is all in. Of course, server-side paging (see point #2) can alleviate this, however in any case your application will be blocked for at least a short time if you do not specifically use an asynchronous method. For anyone reading this who is using .Net, you may want to check out Asynchronous Operations in .Net Framework.
If you get all of these right, and use the FAST FIRSTROW technique, you may be able to do some of what you're looking for. But there is no guarantee.
It can be done, with an analytic function, but Oracle has to full scan the table to determine the count no matter what you do if there's no index. An analytic could simplify your query:
SELECT x,y,z, count(*) over () the_count
FROM your_table
WHERE ...
Each row returned will have the total count of rows returned by the query in the_count. As I said, however, Oracle will have to finish the query to determine the count before anything is returned.
Depending on how you're processing the query (e.g., a PL/SQL block in a form), you could use the above query to open a cursor, then loop through the cursor and display sets of records and give the user the chance to cancel.
I'm not sure how you would accomplish this, since the query has to complete prior to the results being known. No RDBMS (that I know of) offers any means of determining how many results to a query have been found prior to the query completing.
I can't speak factually for how expensive such a feature would be in Oracle because I have never seen the source code. From the outside in, however, I think it would be rather costly and could double (if not more) the length of time a query took to complete. It would mean updating an atomic counter after each result, which isn't cheap when you're talking millions of possible rows.
So I am putting up my comments into this answer-
In terms of Oracle.
Oracle maintains its own buffer cache inside the system global area (SGA) for each instance. The hit ratio on the buffer cache depends on the sizing and reaches 90% most of the time, which means 9 out of 10 hits will be satisfied without reaching the disk.
Considering the above, even if there is a "way" (so to speak) to access the buffer chache for a query you run, the results would highly depend on the cache sizing factor. If a buffer cache is too small, the cache hit ratio will be small and more physical disk I/O will result, which will render the buffer cache un-reliable in terms of temp-data content. If a buffer cache is too big, then parts of the buffer cache will be under-utilized and memory resources will be wasted, which in terms would render too much un-necessary processing trying to access the buffer cache while in order to peek in it for the data you want.
Also depending on your cache sizing and SGA memory it would be upto the ODBC driver / optimizer to determine when and how much to use what (cache buffering or Direct Disk I/O).
In terms of trying to access the "buffer cache" to find "the row" you are looking for, there might be a way (or in near future) to do it, but there would be no way to know if what you are looking for ("The row") is there or not after all.
Also, full table scans of large tables usually result in physical disk reads and a lower buffer cache hit ratio.You can get an idea of full table scan activity at the data file level by querying v$filestat and joining to SYS.dba_data_files. Following is a query you can use and sample results:
SELECT A.file_name, B.phyrds, B.phyblkrd
FROM SYS.dba_data_files A, v$filestat B
WHERE B.file# = A.file_id
ORDER BY A.file_id;
Since this whole ordeal is highly based on multiple parameters and statistics, the results of what you are looking for may remain a probability driven off of those facotrs.
By Upper tens of thousands requests/second I want to see 60,000 -> +90,000 requests/second.
My Setup consists of the following:
user ---> web app --> message queue --> parser --> database?
I should mention that the parser currently can parse/stuff around 18750 records/second using COPY so we are limited on that end until we start adding more parsers -- this isn't a huge concern for me now.
I have a system that requires the ability to bulk upload as fast as I can as many records as I can. This same system (or it can be different depending on how you would approach it) should be able to respond to analytical type queries such as this:
wonq = "select sum(amount) from actions where player = '#player' and " +
"(type = 'award' or type = 'return') and hand = hand_num"
lostq = "select sum(amount) from actions where player = 'player' and " +
"type != 'award' and type != 'return' and hand = hand_num"
.....10-15 thousand times (PER USER) since they are keyed off to another table. Needless to say we paginate these results at 10/page for now.
I've looked at the following: (assuming these are all on the same server)
mysql (reg. run of the mill rdbms) -- was able to get into the 15-20 thousand requests/second range; under current conditions if we try to scale this out we need a seperate host/database everytime we need to scale -- this is not doable
couchdb (document oriented db) -- didn't break 700 requests/second; I was really hoping this was going to save our ass -- not a chance!
vertica (columnar oriented db) -- was hitting 60000 request/second, closed source, very pricey; this is still an option but I personally did not like it at all
tokyocabinet (hash based db) -- is currently weighing in at 45,000 inserts/second and 66,000 selects/second; yesterday when I wrote this I was using a FFI based adapater that was performing at around 5555 requests/second; this is by-far THE fastest most awesome database I've seen yet!!
terracotta -- (vm cluster) currently evaluating this along with jmaglev (can't wait until maglev itself comes out) -- this is THE SLOWEST!
maybe I'm just approaching this problem wrong but I've ALWAYS heard that RDBMS were slow as all hell -- so where are these super fast systems that I've heard about?
Testing Conditions::
Just so ppl know my specs on my dev box are:
dual 3.2ghz intel, 1 gig ram
Mysql mysql.cnf edits were:
key_buffer = 400M # was 16M
innodb_log_file_size = 100M # non existent before
innodb_buffer_pool_size = 200M # non existent before
UPDATE::
It turns out that terracotta might have a place in our application structure but it flat out WILL NOT be replacing our database anytime soon as it's speeds are terrible and it's heap utilization sucks.
On the other hand, I was very happy to see that tokyocabinet's NON-FFI ruby library (meaning tyrant/cabinet) is super fast and right now that is first place.
For crazy-big scalability, you'll want to focus on two things:
Sharding: Split your data set into groups that don't overlap. Have an easy, fast way to map from a request to a server. (Player starting with a-f, server 1; g-q, server 2... etc...)
Caching: Use Memcache to remember the output of some really common select queries, so you don't have to go to disk as often.
Well the big player in the game is Oracle but thats big bucks.
If you want to go cheap then you will have to pay the price in a different terms:
by partioning the DB across multiple instances and distributing the load.
Potentially caching results so actual DB access is reduced.
user ---> web app --> message queue --> parser --> database?
What do you need the message queue for?
Those are a big performance issue normally.
Sharding and caching as ojrac said.
Another option is to take a step back and figure out to do the work with less queries! From the little information you gave I can't help but think "there must be a better way". From the examples you gave some summary tables (with optional caching) might be an easy win.
Hypertable etc gives better performance for some data access patterns, but yours sound very suited for the typical databases.
And yeah, CouchDB is disappointingly slow.
have you tried postgresql? it should be faster than mysql. but anyhow, you would need to balance the load over multiple servers (split database). you can have multiple databases (e.g. for each client) and then one centralized one that will sync with those small ones...
Have you tried redis? They promise the speed of 110000 SETs/second, 81000 GETs/second. It's an advanced key-value db with support for lists and sets.
I doubt any system will give you the out-of-the-box performance that you need. You are probably going to start hitting hard limits on the machine you are on (with just about any write-intensive db you will hit I/O limits pretty fast). Some analysis might be required, but the disk is nearly always the bottleneck. More RAM will help, as will using Solid State Disks.
However, you will probably need clustering of some kind regardless of which actual db you use. You can shard the data itself, or with MySQL, setting up read-slaves will spread the load across nodes and should give you the throughput you are looking for.
Also: MongoDB is awesome. Might be worth a look.
The typical way to quickly store data durably in a write-heavy app is to use an append-only log. If properly deployed s.t. the log file is on its own spinning disk, the disk seek time is minimized per write/append operation.
One can update metadata to know the offset for some primary key after each write.
There is a mysql storage engine that does this is you want to use mysql. Another option is one of the new nosql databases like fleetdb.
Have you tried using a SSD as well?
There are lots of options to solve this problem but they are likely going to require some manual labor.