Understanding odbc_pool_size in ejabberd.yml - MySQL

In ejabberd.yml we have the following lines:
##
## Number of connections to open to the database for each virtual host
##
## odbc_pool_size: 10
We are running a MySQL-enabled ejabberd server. The MySQL server's connection limit is 300.
After doing some research online (on the very limited documentation available), it seems that increasing odbc_pool_size from the default of 10 mainly affects (decreases) the time it takes clients to connect to the server. We have an average of ~1500 users online at any given instant.
My question: what exact purpose does the odbc_pool_size variable serve, and how will increasing the pool size affect connect time / latency?
UPDATE
Ejabberd server stats:
8 GB RAM
Dual core
~2000 users (peak hours)
Average CPU utilization: 13.5%
MySQL server stats:
Max supported simultaneous connections: 300
Write IOPS (QPS): 23.1/sec
Read IOPS: 1/sec
Memory usage: 2.5/15 GB
Given the above configuration, what would be a good odbc_pool_size? (I was thinking of something around 50.)

Like any pool, its size decides the number of requests that can be processed in parallel. If your pool size is 10, only 10 requests can be processed in parallel; the others are queued. That means if 100 users try to connect at the same time, the last one to be processed has to wait for 10 batches of queries to finish, which increases latency.
Increasing the pool size can help with latency, up to the point where the database cannot cope with more parallelism and global performance starts to decrease. A good value depends on your database sizing, your use case and your overall architecture.
You need to benchmark and experiment to adapt the sizing to your own case, as it really depends on your actual traffic patterns.
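As a rough back-of-envelope illustration of the queueing effect (the 5 ms query time and the 100-request burst below are assumed numbers, not measurements):

# Worst-case wait for the last request in a burst, served pool_size at a time.
# query_ms and burst are illustrative assumptions.
import math

def worst_case_wait_ms(burst, pool_size, query_ms=5.0):
    batches = math.ceil(burst / pool_size)  # queued requests drain in batches
    return batches * query_ms

for pool in (10, 25, 50):
    print(pool, worst_case_wait_ms(burst=100, pool_size=pool))
# pool=10 -> 50 ms, pool=25 -> 20 ms, pool=50 -> 10 ms

Past the point where the database saturates, the per-query time itself starts growing, which is why a bigger pool eventually makes things worse rather than better.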

Related

Are sysbench threads similar to thread_pool_size?

I tested a MySQL cluster using sysbench to figure out a sweet spot for the maximum number of threads. In my endeavours I came across the threads option in sysbench.
--threads=N
I also came across thread_pool_size in MySQL Thread Pool Operation.
thread_pool_size: The number of thread groups in the thread pool. This is the most important parameter controlling thread pool performance.
So the question that plagues me is: are the threads for sysbench similar to the thread_pool_size for MySQL?
Here is an example of a command that I used.
sysbench oltp_read_write.lua --threads=26 --time=30 --mysql-user='root' --mysql-password='password' --table-size=10000 --mysql-host=10.100.100.64 --mysql-port=6033 run
This is an image showing my current configuration:
[image: CNF files]
OUCH!
thread_cache_size is the number of "threads" to hang onto. It is simpleminded pooling. It is a count, not bytes!! 10 is a reasonable number. Anything more than max_connections is unnecessary.
max_connections refers to "concurrent" connections, not the total over time. The default of 151 is fine for most systems. 1000 is "high" but warranted for some systems; 10K is too high.
Check these:
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
SHOW GLOBAL STATUS LIKE 'Threads_running';
The former is a high-water mark (since startup). If it is close to max_connections, then maybe max_connections should be increased.
The latter says how many of the current connections are actually doing anything. If it is over 100, the connections are stumbling over each other. We will need more details to discuss what to do next. (1 is common; a 'busy' system might say no more than 10, and change rapidly.)
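If you would rather script those checks, here is a minimal sketch; the hostname and credentials are placeholders, and it assumes the mysql-connector-python package:

import mysql.connector  # assumed driver; any DB-API driver works similarly

conn = mysql.connector.connect(host="db.example.com", user="root", password="secret")
cur = conn.cursor()

cur.execute("SHOW GLOBAL STATUS LIKE 'Max_used_connections'")
max_used = int(cur.fetchone()[1])
cur.execute("SHOW GLOBAL STATUS LIKE 'Threads_running'")
running = int(cur.fetchone()[1])
cur.execute("SHOW GLOBAL VARIABLES LIKE 'max_connections'")
limit = int(cur.fetchone()[1])

print("high-water mark: %d/%d, currently running: %d" % (max_used, limit, running))
if max_used > 0.8 * limit:
    print("Max_used_connections is close to max_connections; consider raising it.")
conn.close()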
Sysbench is a client of MySQL. It can start a number of threads, one per connection.
When not using a thread pool in MySQL Server, every client connection starts its own thread. So there's a one-to-one correspondence between sysbench threads and MySQL Server threads.
It's typical that a client connection is not running a query every second. Normally a client application runs other code in between waiting for queries. So on the MySQL Server side, some threads exist, but they aren't doing anything. This appears as "Sleep" in the processlist.
It's pretty common to have hundreds of client connections open, but only one or two dozen of these connections doing any query at any given moment. The others are all sleeping.
As a metaphor, I would compare this to customers in a bank, where they approach a teller's window and do transactions. The customer blocks others from using the same teller, even if the customer is signing a form or something else that is not talking directly to the teller.
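To see the one-to-one threads and the sleeping connections for yourself, a small sketch (credentials are placeholders; assumes mysql-connector-python):

import mysql.connector  # assumed driver

params = dict(host="db.example.com", user="root", password="secret")
idle = [mysql.connector.connect(**params) for _ in range(20)]  # 20 idle clients

probe = mysql.connector.connect(**params)
cur = probe.cursor()
cur.execute("SHOW GLOBAL STATUS LIKE 'Threads_connected'")
print(cur.fetchone())  # grew by ~20: one server thread per connection
cur.execute("SHOW GLOBAL STATUS LIKE 'Threads_running'")
print(cur.fetchone())  # stays tiny: the 20 idle connections show as "Sleep"

for c in idle:
    c.close()
probe.close()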
When using a thread pool, threads are handled differently in MySQL Server. The thread pool feature exists so that a smaller number of threads in the MySQL Server can be shared by a greater number of client connections. The threads in MySQL Server are no longer corresponding one-to-one with client connections. They switch when a client connection requests to execute an SQL query. This is done to reduce resource usage when your clients open a large number of connections.
A metaphor for this is a restaurant where a single server can handle a whole section with customers. The customers only need attention from time to time, and the server can therefore keep track of multiple tables of customers.
In the case of sysbench, this is probably not a typical workload. The client threads are running SQL queries more rapidly than a typical application. If you try to use a thread pool in this case, you might have more client requests than the number of threads in the thread pool, and in this case the client requests might queue up.
In the restaurant metaphor, this would be the infrequent times when more than one table wants something at the same time. Then all but one of the tables must wait, but hopefully not for long since most customer requests are brief.
Using the thread pool in MySQL Server while testing with sysbench might not be the best way to measure the maximum throughput of queries.

Couchbase: 160x faster with only one node: why?

On the Couchbase website, one can see that Couchbase can easily reach 100,000 requests per second.
As my application basically needs only a key/value store, I gave Couchbase a try.
So I tried to build a small cluster at my provider.
I use the Python client and Couchbase Server 2.2.0 Community Edition.
With a single node in the "cluster", I can do 16,000 requests per second: nice!
But when there are 2 nodes in the cluster, I get only 100 requests per second for set(key, val), and the same for get(key) (I used the default bucket).
This is with a very small number of keys: 10,000 keys, each only 10 bytes long!
Looking at the stats, there seems to be no bottleneck (CPU/disk/RAM).
My hardware:
Core i5 (3.4 GHz)
32 GB RAM
Disk: 120 GB SSD
Network: Gigabit, bandwidth limited to 200 Mbps
The only point I see is that I have a 10 ms latency between the 2 nodes.
What would be a "good" latency between nodes?
What performance can I expect with a gigabit connection?
I used the default bucket; should I use another one with specific parameters?
10 ms latency is pretty high if you're running both your client and server in the same datacenter, so the first thing I would do is try to figure out why your network is giving you such high latencies.
As you mentioned, you are doing about 100 ops/sec, and this makes sense if your network latency is 10 ms. It also means you are likely doing synchronous IO over the network: you wait for one request to make a round trip before sending the next. The Python client should have async APIs that allow you to send multiple requests without waiting for each response to come back. This will vastly improve the number of ops/sec you can do.
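For example, here is a sketch using the multi-key operations of the 1.x-era Couchbase Python SDK (method names may differ in your SDK version; the host is a placeholder):

from couchbase import Couchbase  # 1.x-era SDK; newer SDKs use different imports

cb = Couchbase.connect(bucket='default', host='10.0.0.1')
keys = ['key%d' % i for i in range(10000)]

# One synchronous round-trip per key: ~100 ops/sec at 10 ms latency.
# for k in keys:
#     cb.get(k)

# Batched: the client pipelines requests instead of waiting one by one.
cb.set_multi({k: 'x' * 10 for k in keys})
results = cb.get_multi(keys)

Batching pays the 10 ms round trip once per batch rather than once per key, which is exactly why a single-key synchronous loop collapses to ~100 ops/sec.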
I know the website mentions that Couchbase can do 100k ops/sec on a single node, but I've gotten up to almost 250k ops/sec. The only things that will really slow you down are the network (which I maxed out in this case) and how many of the items you request are resident in memory, since having to go to disk lowers performance, especially if you only have a few connections to the database.
Here are some answers to the questions you posted.
Nodes should be in the same datacenter if they are part of the same cluster. (Use the cross datacenter replication feature if they are in different data centers)
Expect to be able to max out the network connection and that the server will not be the bottleneck when all of your data is resident in memory.
There are no specific parameters that you need to tune in order to get performance from Couchbase.
[EDIT] There is no reason why 1 node would perform better than 2 nodes. In fact, having more nodes should give you more throughput.

What does max_connections really mean?

I'm trying to work out the ideal performance setup for MySQL and the resources needed on shared hosting.
My question is, what does max_connections really mean?
Is it the number of unique concurrent requests made to the server?
So if there are two users, one with 1 tab open and the other with 4 tabs open, and both reload all their tabs at the same time, will 5 connections be made to the MySQL DB? Consequently, if we bump this scenario up to 10 people with 2 tabs and 31 people with one tab, all pressing refresh at the same time, with max_connections at 50, will everyone get locked out?
The reason I ask is that I want to shoot for a low max_connections to be conservative with memory resources, since I consistently see the site going into CPU throttling mode.
Thank you for your help
Yes, there is a separate connection opened for each page. However, assuming you're not doing anything database-intensive, the connection will be short-lived and close itself once the page has been served to the client.
If you do exceed the maximum number of connections, any subsequent connection attempt will fail.
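In code, the per-page pattern looks roughly like this; the credentials and the pages table are hypothetical, and it assumes mysql-connector-python:

import mysql.connector  # assumed driver

def handle_request(page_id):
    # Each page request opens a connection, runs its queries, and
    # releases the connection slot as soon as the page is served.
    conn = mysql.connector.connect(host="localhost", user="app",
                                   password="secret", database="site")
    try:
        cur = conn.cursor()
        cur.execute("SELECT title FROM pages WHERE id = %s", (page_id,))
        return cur.fetchone()
    finally:
        conn.close()  # frees the slot for the next request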
The number of connections permitted is controlled by the max_connections system variable. The default value is 151 to improve performance when MySQL is used with the Apache Web server. (Previously, the default was 100.) If you need to support more connections, you should set a larger value for this variable.
mysqld actually permits max_connections+1 clients to connect. The extra connection is reserved for use by accounts that have the SUPER privilege. By granting the SUPER privilege to administrators and not to normal users (who should not need it), an administrator can connect to the server and use SHOW PROCESSLIST to diagnose problems even if the maximum number of unprivileged clients are connected. See Section 12.7.5.30, “SHOW PROCESSLIST Syntax”.
The maximum number of connections MySQL can support depends on the quality of the thread library on a given platform, the amount of RAM available, how much RAM is used for each connection, the workload from each connection, and the desired response time. Linux or Solaris should be able to support 500 to 1,000 simultaneous connections routinely and as many as 10,000 connections if you have many gigabytes of RAM available and the workload from each is low or the response time target undemanding. Windows is limited to (open tables × 2 + open connections) < 2048 due to the Posix compatibility layer used on that platform.
Increasing open-files-limit may be necessary. Also see Section 2.5, “Installing MySQL on Linux”, for how to raise the operating system limit on how many handles can be used by MySQL.
From: MySQL 5.6 Reference Manual:: C.5.2.7 Too many connections
max_connections is a global variable that can have a minimum value of 1 and a maximum value of 100000. However, it has long been common knowledge that setting max_connections to an insanely high value is not good for performance. Generations of system administrators have followed this rule.
When it comes to performance, the right max_connections value is bounded by your server specs; connection slots that are never used do not by themselves cause performance issues.
Please use this for more information.
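To illustrate why the right value is bounded by server specs: each connection can allocate its own per-session buffers. A back-of-envelope sketch (the variable names are real MySQL settings, but the sizes are assumed sample values, not defaults for any particular version):

kib = 1024
per_connection = (
    256 * kib    # sort_buffer_size
    + 256 * kib  # join_buffer_size
    + 128 * kib  # read_buffer_size
    + 256 * kib  # read_rnd_buffer_size
    + 256 * kib  # thread_stack
)
for max_connections in (151, 1000, 10000):
    worst = per_connection * max_connections // (1024 * 1024)
    print(max_connections, "connections ->", worst, "MiB worst case")
# 151 -> ~170 MiB, 1000 -> ~1125 MiB, 10000 -> ~11250 MiB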

Database concurrent connections in regard to web (http) requests and scalability

One database connection corresponds to one web request (assuming, of course, that your client reads the database on each request). With a connection pool these connections are pre-created, but they are still used one per request.
Now to some numbers - if you google for "Tomcat concurrent connections" or "Apache concurrent connections", you'll see that they support without any problem 16000-20000 concurrent connections.
On the other hand, the MySQL administrator best practices say that the maximum number of concurrent database connections is 4096.
On a quick search, I could not find any information about PostgreSQL.
Q1: Is there a software limit on concurrent connections in PostgreSQL, and is MySQL's limit indeed 4096?
Q2: Am I missing something, or will MySQL (or any DB imposing a maximum concurrent connection count) become a bottleneck, provided the hardware and the OS allow a large number of concurrent connections?
Update: Q3: How exactly does a higher connection count hurt performance?
Q2: You can have far more users on your web site than connections to your database because each user doesn't hold a connection open. Users only need a connection every so often, and then only for a short time. Your web app's connection pool will generally stay far below the 4096 limit.
Think of a restaurant analogy. A restaurant may have 100 customers (users) but only 5 waiters (connections). It works because customers only require a waiter for a short time every so often.
The time when it goes wrong is when all 100 customers put their hand up and say 'check please', or when all 16,000 users hit the 'submit order' button at the same time.
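To make the pool metaphor concrete, here is a minimal sketch using SQLAlchemy's built-in pool (an assumed library choice; the URL and the users table are placeholders):

from sqlalchemy import create_engine, text

engine = create_engine(
    "mysql+mysqlconnector://app:secret@localhost/site",
    pool_size=10,    # ten "waiters" ...
    max_overflow=5,  # ... plus five extra under burst load
)

def handle_request(user_id):
    # Thousands of users share these ~10 connections; each request
    # borrows one only for the few milliseconds its query takes.
    with engine.connect() as conn:
        row = conn.execute(text("SELECT name FROM users WHERE id = :id"),
                           {"id": user_id}).fetchone()
    return row  # the connection went back to the pool on 'with' exit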
Q1: you set a configuration parameter called max_connections. It can be set well above 4096, but you are definitely advised to keep it much lower than that for performance reasons.
Q2: you usually don't need that many connections, and things will be much faster if you limit the number of concurrent queries on your database. You can use something like pgbouncer in transaction mode to interleave many transactions over fewer connections.
The Wikipedia case study:
30,000 HTTP requests/s during peak time
3 Gbit/s of data traffic
3 data centers: Tampa, Amsterdam, Seoul
350 servers, ranging from 1x P4 to 2x Xeon Quad-Core, 0.5-16 GB of memory
...managed by ~6 people
This is a little bit off-topic from your questions, but I think you could find it useful: you don't always have to hit the DB for each request. A correct caching strategy is almost always the best performance improvement you can apply to your web app; a lot of static content can stay in the cache until it explicitly changes. This is how Wikipedia does it.
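A minimal sketch of such a cache; fetch_article and the TTL are hypothetical stand-ins for your real query and expiry policy:

import time

_cache = {}
TTL = 300  # seconds; tune to how often the content actually changes

def fetch_article(article_id):
    return {"id": article_id, "body": "..."}  # placeholder for the real DB read

def get_article(article_id):
    key = ("article", article_id)
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < TTL:
        return hit[1]                    # served from cache, no DB round trip
    value = fetch_article(article_id)    # only on a miss or after expiry
    _cache[key] = (time.time(), value)
    return value

def invalidate_article(article_id):
    _cache.pop(("article", article_id), None)  # drop when the content changes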
From the link you provided to "MySQL administrator best practices"
"Note: connections take memory and your OS might not be able to handle a lot of connections. MySQL binaries for Linux/x86 allow you to have up to 4096 concurrent connections, but self compiled binaries often have less of a limit."
So 4096 seems like the current maximum. Bear in mind that the limit is per server and you can have multiple slave servers that can be used to serve queries.
http://dev.mysql.com/doc/refman/5.0/en/replication-solutions-scaleout.html

Does MySQL packet size cause slowdown?

I have written a program which uses a MySQL database, and the communication between the database server (a very powerful one) and the clients happens over an ADSL connection (1 Mbit/s).
But I have a very, very slow connection between each client and the server: only approximately 3-4 KB/s is sent to the server. Neither the server nor the clients use the Internet for anything else; only my program uses it.
I can't figure out why. Is the reason the MySQL server packet size?
Any suggestions?
Try using mytop to identify the cause of the server's low performance.
Another possibility: you may be using SELECT COUNT(*) FROM .. on large InnoDB tables, which causes a table scan.
Also, can you test with some other service whether the data rate between the machines is OK? Even though the upstream bandwidth is lower for ADSL users, 3-4 KB/s is low enough that the link alone might not explain the poor performance.
The effective transfer rate is often heavily limited by the number of roundtrips between client and server. Without seeing your code it is sort of difficult to tell, but you should check the number of requests happening.
If you have a single request that results in many records being returned, you should see a better usage of bandwidth than with a higher number of requests which only deliver a few rows each.
In the latter case the actual result transfer is probably quite fast, but the latencies involved in the "control communications" (i. e. the statements themselves, login requests etc.) will add up, effectively lowering overall throughput.
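A rough model of that effect (the 50 ms round trip and 200-byte rows are assumptions chosen to match the symptom, not measured values):

latency_s = 0.050      # assumed round-trip time on the ADSL link
bytes_per_row = 200    # assumed size of one result row

# One row per query: every row pays a full round trip.
qps = 1 / latency_s                          # ~20 queries/sec
print(qps * bytes_per_row / 1024.0)          # ~3.9 KB/s -- matches the symptom

# One query returning 1000 rows: the round trip is paid once.
print(1000 * bytes_per_row / 1024.0)         # ~195 KB moved per round trip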
As for the packet size: when it is very small, there is more overhead in the communications, increasing the aforementioned effect. The server's default max_allowed_packet size is 1 MB, if memory serves, but that should be fine for your connection.
You first have to debug both connections.
What is your upload speed if you upload a file to the MySQL server with WinSCP or an equivalent? It should be near 90 KB/s with ADSL at 1 Mbit/s.