As per my understanding, a database connection pool usually works this way:
1. create n connections during app initialization and put them into a cache (e.g. a list)
2. a thread requests a connection to perform some operation on the db and returns the connection when it has finished
3. when there is no available connection in the cache, the thread in step 2 waits until a connection is pushed back to the cache
My question is:
Can we execute multiple db operations through one connection after we acquire it from the pool, instead of doing one db operation and then putting it back? It seems more efficient, because it saves the time spent acquiring and returning the connection. (With multiple threads, there must be some locking cost when adding to and getting from the connection pool.)
Can anyone help? Thanks!
Yes, the database connection can be used for multiple operations each time it is acquired from the pool, and this behavior is typical for database applications that use pooling. For example, a connection might be acquired once and reused for several operations during the handling of a request to a REST service. This lifecycle also often involves managing those operations as a single transaction in the database.
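To make that concrete, here is a minimal sketch in Python. It assumes a hypothetical pool built from a thread-safe queue and uses in-memory SQLite connections as stand-ins for real database connections; the accounts table and the transfer() helper are invented for illustration. The connection is acquired once, used for several operations (two of them grouped into one transaction), and returned once.

```python
import queue
import sqlite3

# Hypothetical minimal pool: a thread-safe queue of pre-opened connections.
POOL_SIZE = 2
pool = queue.Queue(maxsize=POOL_SIZE)
for _ in range(POOL_SIZE):
    # check_same_thread=False lets whichever thread borrows a connection use it.
    pool.put(sqlite3.connect(":memory:", check_same_thread=False))


def transfer(conn, amount):
    """Run two statements on one borrowed connection as a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = 2", (amount,))


conn = pool.get()  # acquire once
try:
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
    conn.commit()
    transfer(conn, 30)  # more operations on the same connection
    balances = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
finally:
    pool.put(conn)  # return once, even if something above failed
```

The try/finally matters more than the number of operations in between: a connection that is never returned is lost to every other thread.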
Related
I am creating a REST API that uses MySQL as its database. My confusion is this: should I connect to the database on every request and release the connection at the end of the operation? Or should I connect to the database at server startup, make the connection globally available, and forget about releasing it?
I would caution that neither option is quite wise.
The advantage of creating one connection for each request is that those connections can interact with your database in parallel, which is great when you have a lot of requests coming through.
The disadvantage (and the reason you might just create one connection on startup and share it) is obviously the setup cost of establishing a new connection each time.
One option to look into is connection pooling https://en.wikipedia.org/wiki/Connection_pool.
At a high level you can establish a pool of open connections on startup. When you need to make a request remove one of those connections from the pool, use it, and return it when done.
There are a number of useful Node packages that implement this abstraction; you should be able to find one if you look.
I am connecting to a remote MySQL database in my node.js server code for a web app. Is there any advantage to using a connection pool when I only have a single instance of a node.js application server running?
Connection pools are per application instance. When you connect to the db, you do so from that particular instance, so the pool lives in the scope of that instance. The advantage of creating a pool is that you don't create and close connections very often, which is, in general, a very expensive process. Instead, you keep a set of connections open and idle, ready to be used when needed.
Update
In Node, the async library's async.parallel() construct lets you launch a set of tasks asynchronously. Imagine that each of those tasks represents a single query. If you have only a single connection, every task has to use that same one, and it will quickly become a bottleneck. If you have a pool of available connections instead, each task can use a separate connection until the pool is fully in use. Check this for more detailed reference.
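The same idea can be sketched outside Node. The Python example below is only an illustration under stated assumptions: the pool is a plain thread-safe queue, the connections are in-memory SQLite stand-ins, and POOL_SIZE, TASKS, and run_query() are hypothetical names. Nine tasks run in parallel, but at most three of them hold a connection at any moment; the rest wait for one to be returned.

```python
import queue
import sqlite3
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 3  # assumed pool size
TASKS = 9      # assumed number of concurrent query tasks

pool = queue.Queue()
for _ in range(POOL_SIZE):
    pool.put(sqlite3.connect(":memory:", check_same_thread=False))

in_use = 0
peak_in_use = 0
counter_lock = threading.Lock()


def run_query(n):
    """Borrow a connection, run one query, and return the connection."""
    global in_use, peak_in_use
    conn = pool.get()  # blocks while every connection is busy
    with counter_lock:
        in_use += 1
        peak_in_use = max(peak_in_use, in_use)
    try:
        time.sleep(0.01)  # stand-in for network and query latency
        return conn.execute("SELECT ? * 2", (n,)).fetchone()[0]
    finally:
        with counter_lock:
            in_use -= 1
        pool.put(conn)  # hand the connection to the next waiting task


with ThreadPoolExecutor(max_workers=TASKS) as executor:
    results = list(executor.map(run_query, range(TASKS)))
```

With POOL_SIZE set to 1 this degenerates into the single shared connection described above, and the tasks run one after another.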
A typical Elixir web application will usually have a postgresql backend, with Ecto queries coupled with the API logic.
However since cowboy creates a child GenServer process (containing the app logic) per request, will this have the effect of producing n psql threads for n concurrent requests, even with the pooling cowboy/poolboy provides?
Then, moving to a scenario where multiple instances of the application exist (for example a docker container cluster), will this not add an extra factor to the total number of existing database threads?
Cowboy does create a new Erlang process for each request, but executing an Ecto query from that process will not result in a new database connection. Ecto keeps a pool of connections to the database (using db_connection/poolboy). The size of this pool is set using the pool_size option in the configuration of the Repo. When you initiate a query, a connection from this pool is borrowed and used to execute the query. The connection is returned to the pool after the execution is complete. Ecto will never create a new connection for each query. If a connection is not available in the pool, it'll wait for one to be available or eventually time out if no connection is checked in within the configured timeout (defaults to 30 seconds).
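The borrow, return, and wait-or-time-out behaviour can be illustrated with a small Python sketch. This is not how Ecto or poolboy is implemented; TinyPool, PoolTimeout, checkout(), and checkin() are hypothetical names, and the "connections" are placeholder strings so the example runs without a database.

```python
import queue


class PoolTimeout(Exception):
    """Raised when no connection is checked in within the timeout."""


class TinyPool:
    """Hypothetical fixed-size pool; it never opens extra connections."""

    def __init__(self, connections, timeout):
        self._idle = queue.Queue()
        for conn in connections:
            self._idle.put(conn)
        self._timeout = timeout  # seconds to wait for a free connection

    def checkout(self):
        try:
            # Wait for a connection to be checked in, up to the timeout.
            return self._idle.get(timeout=self._timeout)
        except queue.Empty:
            raise PoolTimeout("no connection became available") from None

    def checkin(self, conn):
        self._idle.put(conn)


pool = TinyPool(["conn-1"], timeout=0.05)  # pool of size 1, short timeout

first = pool.checkout()      # borrows the only connection
try:
    pool.checkout()          # nothing is free, so this waits and then fails
    timed_out = False
except PoolTimeout:
    timed_out = True

pool.checkin(first)          # once returned, the connection can be borrowed again
again = pool.checkout()
```

Because the pool is fixed in size, n concurrent requests never translate into n database connections; requests beyond the pool size queue up or time out.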
I'm experiencing a problem with Rails and my MySQL RDS Instance. I have my rails app connected to it through our database.yml file with a pool of 10 (now 5) connections. The other day another user of the database tried running a stored procedure but it would not execute. It was stuck just hanging around waiting to execute. The user looked at the processes and noticed that our rails user had around 30 idle processes so they killed some of those. The stored procedure kicked off then and ran without issue.
We are on an r3.xlarge instance and had ~100 total processes at the time of problem. This doesn't seem alarmingly high to me and I'm not sure why the procedure wouldn't execute without freeing up some of the processes. I guess my question is, is there a way to tell my rails app to release some of these idle connections after x seconds, or a way to control these connections better? I can write a cron which frees them up, but I'd love to do it the rails/best way.
Thanks for any help!
It seems to me that you may have hit the maximum connections limit on the MySQL instance. You can run SELECT @@max_connections on your MySQL instance to find out the limit.
I don't know of a way to force Rails to close its allocated db connections. Each server process may use up to the pool size (i.e. 10 or 5 in your case) connections to the db for its threads. The distinction between threads and processes is important: if, for example, you have multiple workers serving your Rails app as separate processes (e.g. puma can be configured like that), then each of those processes may allocate up to 5 or 10 connections. If you use background processes (sidekiq etc.), each of them may also use up to this number of connections.
The ConnectionPool also provides a reaper that can be used to free allocated db connections from dead threads but unless your app is having some larger troubles, this usually will not help (your threads are more probably idle than dead).
So, my general advice is to estimate the maximum number of connections that all your Rails processes might need. If it is near or above the MySQL connection limit, either lower the connection pool size or decrease the number of Rails processes (workers) that may run.
If you need more help, please specify which application server you use to run your Rails app and how it is configured, and the same for any background job workers.
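As a rough worked example of that estimate, the Python sketch below multiplies the number of processes by the pool size for each group of processes and compares the total with the server limit. All the numbers are hypothetical (4 web workers, 2 background job processes, a pool of 5, a limit of 100); substitute your own process counts and the value reported by @@max_connections.

```python
def max_connections_needed(process_groups):
    """Sum process_count * pool_size over every group of processes."""
    return sum(count * pool_size for count, pool_size in process_groups)


# Hypothetical deployment: (number of processes, pool size per process)
web_workers = (4, 5)  # e.g. 4 web worker processes, pool of 5 each
job_workers = (2, 5)  # e.g. 2 background job processes, pool of 5 each

needed = max_connections_needed([web_workers, job_workers])

mysql_limit = 100     # hypothetical value of @@max_connections
headroom = mysql_limit - needed

print(f"worst case: {needed} connections, headroom: {headroom}")
```

Other clients of the same database (other applications, consoles, monitoring) draw on the same limit, so the headroom has to cover them too.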
Try setting reaping_frequency in your database.yml file:
reaping_frequency: frequency in seconds to periodically run the Reaper,
which attempts to find and recover connections from dead threads,
which can occur if a programmer forgets to close a connection at the
end of a thread or a thread dies unexpectedly. Regardless of this
setting, the Reaper will be invoked before every blocking wait.
(Default nil, which means don't schedule the Reaper)
Above documentation from: http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html
I'm running a Rails 2.3.5 application, which lets me pool MySQL connections to my database. But I remember reading that my mongrel servers are single threaded. What's the point of having a connection pool for a single-threaded application? Is there a way to multi-thread my app?
Also, do connection pools understand that Ruby 1.8 has "green" threads?
Cheers!
Manage Connections
The major benefit of connection pooling for a single-thread server like Mongrel/Passenger/etc is that the connection is established/maintained in a Rack handler outside the main Rails request processing. This allows for a connection to be established once vs. many times as it's used in different ways. The goal is to re-use the established connection and minimize the number of connections. This should prevent having to reconnect within a given request processing cycle and possibly even between requests (if I recall correctly).
Multiple Concurrent Connections
Although most use cases (Mongrel/Passenger) are single threaded and can only use a single connection at a time, there are JRuby and other environments/app servers that have full multi-threaded support. Rails has been thread safe since 2.2.
TL;DR:
The pool establishes connections automatically. Some people do use multiple concurrent db connections from the pool.