Is Apache HttpComponents HttpClient's IdleConnectionMonitorThread needed? - apache-httpclient-4.x

The Apache HttpClient docs provide an implementation of IdleConnectionMonitorThread that periodically closes expired connections. They state:
The only feasible solution that does not involve a one thread per socket model for idle connections is a dedicated monitor thread used to evict connections that are considered expired due to a long period of inactivity
Internally, PoolingHttpClientConnectionManager.closeExpiredConnections() calls entry.isExpired(). This is the same call that's made when acquiring a connection from the pool.
Other than latency in requesting connections, why is IdleConnectionMonitorThread recommended if both implementations of HttpClientConnectionManager (PoolingHttpClientConnectionManager and BasicHttpClientConnectionManager) have logic for closing old connections?

There are enough reasons to avoid a build-up of stale connections in a half-closed state.
Usually one needs to evict idle / expired connections from a separate monitor thread only if the system tends to have very long periods of communication inactivity. Having said that, I personally see no good reason not to have a low-priority monitor thread that cleans up the connection pool every other minute or so.
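For reference, a rough sketch of such a monitor thread, adapted to Scala from the Java example in the HttpClient tutorial; the 30-second sweep interval and 60-second idle timeout are illustrative values, not recommendations:

import java.util.concurrent.TimeUnit
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager

class IdleConnectionMonitorThread(connMgr: PoolingHttpClientConnectionManager) extends Thread {
  @volatile private var shutdown = false
  setDaemon(true)
  setPriority(Thread.MIN_PRIORITY) // low-priority housekeeping, as suggested above

  override def run(): Unit =
    try {
      while (!shutdown) {
        synchronized { wait(30000) }                        // sweep every 30 seconds
        connMgr.closeExpiredConnections()                   // same expiry check the pool makes on lease
        connMgr.closeIdleConnections(60, TimeUnit.SECONDS)  // also drop connections idle for too long
      }
    } catch {
      case _: InterruptedException => // terminate
    }

  def shutdownMonitor(): Unit = {
    shutdown = true
    synchronized { notifyAll() }
  }
}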

Related

Is there a way to create a Cloud SQL postgres connection in a Google Cloud function (Scala) that does not use HikariCP?

I would like to create a Cloud Function to call a Postgres Cloud SQL DB. Currently I have followed the documentation and created a Hikari-based connection...
import javax.sql.DataSource
import com.zaxxer.hikari.{HikariConfig, HikariDataSource}

val config = new HikariConfig
config.setJdbcUrl(jdbcURL)
config.setDataSourceProperties(connProps)
config.setMaximumPoolSize(10)
config.setMinimumIdle(4)
config.addDataSourceProperty("ipTypes", "PUBLIC,PRIVATE") // TODO: make configurable
println("Config created")
val pool: DataSource = new HikariDataSource(config) // Do we really need Hikari here if it doesn't need pooling?
println("Returning the datasource")
Some(pool)
This works, but it causes a 25-second delay due to "cold starts". I would like to try using the PG driver directly to see if that is faster, but I don't think that is possible because of the UNIX socket / Cloud SQL proxy setup described in the documentation.
Is there a way to connect to Cloud SQL from a Cloud function using a basic PG Driver connection and not the Hikari stuff?
As mentioned in the thread:
With all "serverless" compute providers, there is always going to be
some form of cold start cost that you can't eliminate. Even if you are
able to keep a single instance alive by pinging it, the system may
spin up any number of other instances to handle current load. Those
new instances will have a cold start cost. Then, when load decreases,
the unnecessary instances will be shut down.
you can now specify a minimum number of instances to keep active. This
can help reduce (but not eliminate) cold starts. Read the Google
Cloud blog and the documentation.
If you absolutely demand hot servers to handle requests 24/7, then you
need to manage your own servers that run 24/7 (and pay the cost of
those servers running 24/7). As you can see, the benefit of serverless
is that you don't manage or scale your own servers, and you only pay
for what you use, but you have unpredictable cold start costs
associated with your project. That's the tradeoff.
For more information related to dependencies you can refer to the link provided by guillaume blaquiere.
To answer your exact question:
Can I connect without using HikariCP?
The answer is sure; you can use any number of connection pooling libraries available in Java. The examples often show HikariCP because it is far and away the most popular and highest performing.
So it's unlikely that switching connection pools will improve your performance. A slightly different question implied by your first question might be:
Can I connect without using a connection pool?
And again the answer is sure, you could use the driver directly -- but you probably shouldn't. Connection creation and management is expensive (and hard), and using a connection pool is a best practice. I wouldn't consider code "production quality" without one. While it might save you boot time, it's likely to introduce more overhead and latency into the request itself, costing you more overall. Additionally, it'll remove helpful error handling and retries around connections that you'll now have to deal with yourself.
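For illustration, using the driver directly is just a plain JDBC call; the URL and credentials below are placeholders, and on Cloud SQL you would still need the Cloud SQL connector / socket-factory setup from the documentation:

import java.sql.DriverManager

// Hypothetical placeholders; not a Cloud SQL-specific URL.
val conn = DriverManager.getConnection("jdbc:postgresql://localhost/postgres", "app_user", "secret")
try {
  val rs = conn.createStatement().executeQuery("SELECT 1")
  while (rs.next()) println(rs.getInt(1))
} finally conn.close()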
So it seems your question really might be:
How can I reduce my cold start time?
Well with a start time of 25 seconds, the problem likely isn't limited to just Hikari. I would check out this GCP doc page on performance, and look into other articles on how to improve start up time for JVMs or your specific frameworks.
However, one way that HikariCP might be impacting your start-up time is that HikariCP blocks on connection creation until initialization is complete. There are a few things you can do to improve this (but they will likely only help, not eliminate, the 25-second cold start):
1. You can lower your number of connections to 1. Cloud Function instances only handle one request at a time, so specifying a min-idle of 4 and a max of 10 connections is likely leading to wasted connections.
2. You can move the initialization of Hikari outside of your start-up. The GCP docs page I mentioned above shows how to use lazy initialization, so expensive objects aren't created until you need them. This moves the cost of initializing Hikari out of your function's start-up. It could make the first request that triggers it more expensive; if that is a concern, I would suggest combining lazy initialization with triggering that initialization asynchronously on start-up, so the pool is created in the background without blocking start-up (see the sketch after this list).
3. As an alternative to #2, you could lower min-idle connections to 0, i.e., initialize the Hikari pool with 0 connections in it. While this might be easier to implement, it means that requests without a warmed-up connection will have to wait for a new connection to be established (which makes #2 more optimal in terms of performance).
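A minimal sketch of what options 1-3 might look like together, assuming Scala on Cloud Functions; the JDBC URL, the Properties, and the Db/warmUp names are illustrative placeholders, not an official GCP example:

import javax.sql.DataSource
import java.util.Properties
import com.zaxxer.hikari.{HikariConfig, HikariDataSource}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object Db {
  // Hypothetical placeholders standing in for the question's jdbcURL / connProps.
  private val jdbcURL = sys.env.getOrElse("JDBC_URL", "jdbc:postgresql:///postgres")
  private val connProps = new Properties()

  // `lazy val` defers pool creation until first use (option 2) and is thread-safe in Scala.
  lazy val pool: DataSource = {
    val config = new HikariConfig
    config.setJdbcUrl(jdbcURL)
    config.setDataSourceProperties(connProps)
    config.setMaximumPoolSize(1) // option 1: one request at a time needs only one connection
    config.setMinimumIdle(0)     // option 3: don't open connections eagerly
    new HikariDataSource(config)
  }

  // Optionally force the lazy initialization in the background at start-up,
  // so the first request doesn't pay the full pool-creation cost.
  def warmUp(): Unit = Future { pool }
}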

Should I open a MySQL Connection every request or always?

Assuming I'll execute a query every 2 seconds, should I open the connection on every request, or should I keep the connection alive until the application (server) stops?
In my experience, establishing connections is unlikely to be a bottleneck for a MySQL server (connection overhead is fairly low in MySQL). That said, reusing existing connections is often an appropriate approach, but it requires some careful consideration:
if the database server is temporarily unavailable, the code must reconnect
if the server is replaced, it must reconnect (MySQL deployments tend towards failover solutions rather than true high availability)
if the application uses multiple connections to MySQL, you must be sure not to cross your connections between users or sessions (active database, timezone, charset, and so on are session variables, essentially tied to a connection)
If you're not up to the task of making your reusable connection reliable in these and other edge cases, creating a new connection every 2 seconds may give you that reliability for free.
In short, there can be less-than-obvious benefits to short-lived connections. I would not bother to add intelligence around maintaining a persistent connection unless you have reason to believe it will actually make a meaningful difference in your case (e.g. benchmarks).
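A minimal sketch of the "reuse, but reconnect when needed" approach with plain JDBC; the URL, credentials, and the Db object are hypothetical placeholders for illustration:

import java.sql.{Connection, DriverManager}

object Db {
  private var conn: Connection = null

  // Return the cached connection, re-establishing it if it was never opened,
  // was closed, or no longer answers a 2-second validity check.
  def connection(): Connection = synchronized {
    if (conn == null || conn.isClosed || !conn.isValid(2)) {
      conn = DriverManager.getConnection("jdbc:mysql://db.example.com/app", "app_user", "secret")
    }
    conn
  }
}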

Producer Consumer setup: How to handle Database Connections?

I'm building my first single-producer/single-consumer app in which the consumer takes items off the queue and stores them in a MySQL database.
Previously, when it was a single-threaded app, I would open a connection to the DB, send the query, close the connection, and repeat every time new info came in.
With a producer-consumer setup, what is the better way to handle the DB connection? Should I open it once before starting the consumer loop (I can't see a problem with this, but I'm sure that one of you fine folks will point it out if there is one)? Or should I open and close the DB connection on each iteration of the loop (seems like a waste of time and resources)?
This software runs on approximately 30 small linux computers and all of them talk to the same database. I don't see 30 simultaneous connections being an issue, but I'd love to hear your thoughts.
Apologies if this has been covered, I couldn't find it anywhere. If it has, a link would be fantastic. Thanks!
EDIT FOR CLARITY
My main focus here is the speed of the consumer thread. The whole reason for switching from single- to multi-threaded was because the single-threaded version was missing incoming information because it was busy trying to connect to the database. Given that the producer thread is expected to start dumping info into the buffer at quite a high rate, and given that the buffer will be limited in size, it is very important that the consumer work through the buffer as quickly as possible while remaining stable.
Your MySQL shouldn't have any problems handling connections in the hundreds, if not thousands.
On each of your consumers you should set up a connection pool and use that from your consumer. If you consume the messages in a single thread (per application), the pool only needs one connection, but it is also fine to consume and start parallel threads that each use one connection.
The reason for using a connection pool is that it will handle reconnection and keep-alive for you. Just ask it for a connection and it promises that it will work (it does this by running a small query against the database). If you don't use a connection for a while and it gets terminated, the pool will just create a new one.
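A hedged sketch of that setup using HikariCP (any pooling library with validation/reconnect behaviour would do); the JDBC URL, credentials, and the items table are placeholders:

import com.zaxxer.hikari.{HikariConfig, HikariDataSource}

val config = new HikariConfig
config.setJdbcUrl("jdbc:mysql://db.example.com/app") // hypothetical placeholders
config.setUsername("consumer")
config.setPassword("secret")
config.setMaximumPoolSize(1) // a single consumer thread only needs one connection
val ds = new HikariDataSource(config)

// Consumer loop body: borrow a connection per item; the pool validates it and
// transparently replaces it if the old one has been terminated.
def store(payload: String): Unit = {
  val conn = ds.getConnection()
  try {
    val ps = conn.prepareStatement("INSERT INTO items(payload) VALUES (?)")
    ps.setString(1, payload)
    ps.executeUpdate()
  } finally conn.close() // returns the connection to the pool; it stays open
}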

c3p0 seems to close active connections

I set unreturnedConnectionTimeout to release stale connections. I assumed that this was only going to close connections without any activity but it looks like this just closes every connection after the specified time.
Is this a bug or is this 'as designed'?
The manual states:
unreturnedConnectionTimeout defines a limit (in seconds) to how long a Connection may remain checked out. If set to a nonzero value, unreturned, checked-out Connections that exceed this limit will be summarily destroyed, and then replaced in the pool. Obviously, you must take care to set this parameter to a value large enough that all intended operations on checked out Connections have time to complete. You can use this parameter to merely workaround unreliable client apps that fail to close() Connections.
From this I conclude that activity is not influencing the throwing away of connections. To me that sounds strange. Why throw away active connections?
Thanks,
Milo
I'm the author of c3p0, and of the paragraph you quote.
unreturnedConnectionTimeout is exactly what its name and documentation state: a timeout for unreturned Connections. It was implemented reluctantly, in response to user feedback, because it would never be necessary or useful if clients reliably checked in the Connections they check out. When it was implemented, I added a second unsolicited config param, debugUnreturnedConnectionStackTraces, to encourage developers to fix client applications rather than rely lazily on unreturnedConnectionTimeout.
There is nothing strange about the definition of unreturnedConnectionTimeout. Generally, applications that use a Connection pool do not keep Connections checked out for long periods of time. Doing so defies the purpose of a Connection pool, which is to allow applications to acquire Connections on an as-needed basis without a large performance penalty. The alternative to a Connection pool is for an application to check out Connections and retain them for long periods of time, so they are always available for use. But maintaining long-lived Connections turns out to be complicated, so most applications delegate this to a pooling library like c3p0.
I understand that you have a preexisting application that maintains long-lived Connections and that you cannot easily modify. You would like a hybrid architecture between applications that maintain long-lived Connections directly and applications that delegate to a pool. In particular, what you want is a library that helps you maintain the long-lived Connections that your application is already designed to retain.
c3p0 is not that library, unfortunately. c3p0 (like most Connection pooling libraries) considers checked-out Connections to be the client's property, and does no maintenance work on them until they are checked back in. There are two exceptions to this: unreturnedConnectionTimeout will close() Connections out from underneath clients if they have been checked out for too long, and c3p0 will invisibly test checked-out Connections when Exceptions occur, in order to determine whether Connections that have experienced Exceptions are suitable for return to the pool or must be destroyed on check-in.
unreturnedConnectionTimeout is not the parameter you want. You would like something that automatically closes Connections when they are inactive for a period of time, but that permits Connections to be checked out indefinitely. Such a parameter might be called inactiveConnectionTimeout, and it is a feature that could conceivably be added to c3p0, but it has not been. It probably will not be, because few applications hold checked-out Connections for long periods, and c3p0 is full of features that help you observe failures once Connections are checked in, or when Connections transition between checked-out and checked-in.
In your (pretty unusual) case, this means there is a feature you would like that simply is not provided by the library. I am sorry about that!
Unreturned connections can still be active; it depends on how long it takes to execute, e.g., the query on the database. You should set the timeout to a value bigger than the longest operation you can expect in your application. If you know the value should be large enough and c3p0 is still closing active connections, it usually means a connection leaked somewhere (maybe it was not closed properly).
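For reference, a hedged sketch of setting the two parameters discussed above on a ComboPooledDataSource; the 300-second limit and the connection details are illustrative placeholders only:

import com.mchange.v2.c3p0.ComboPooledDataSource

val ds = new ComboPooledDataSource()
ds.setJdbcUrl("jdbc:mysql://db.example.com/app") // hypothetical placeholders
ds.setUser("app_user")
ds.setPassword("secret")
ds.setUnreturnedConnectionTimeout(300)           // must exceed your longest expected operation
ds.setDebugUnreturnedConnectionStackTraces(true) // log where a leaked Connection was checked out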

Persistent vs non-Persistent - Which should I use?

My site has always used persistent connections; based on my understanding of them, there's no reason not to. Why close the connection when it can be reused? I have a site that in total accesses about 7 databases. It's not a huge-traffic site, but it's big enough. What's your take on persistent connections, should I use them?
With persistent connections:
You cannot build transaction processing effectively
user sessions on the same connection are impossible
the app is not scalable; with time you may need to extend it, and that will require management/tracking of persistent connections
if a script, for whatever reason, cannot release a lock on a table, then any following scripts on the same connection will block indefinitely and one has to restart the DB server; likewise, with transactions, an open transaction block will carry over to the next script (using the same connection) if script execution ends before the transaction block completes, etc.
Persistent connections do not let you do anything that you cannot do with non-persistent connections.
Then why use them at all?
The only possible reason is performance: use them when the overhead of creating a link to your SQL server is high. And this depends on many factors, like:
database type
whether the MySQL server is on the same machine and, if not, how far away it is (it might even be outside your local network/domain)
how heavily the machine on which MySQL sits is loaded by other processes
One can always replace persistent connections with non-persistent connections. It might change the performance of the script, but not its behavior!
A commercial RDBMS might be licensed by the number of concurrently open connections, and here persistent connections can do you a disservice.
My knowledge of the area is kinda limited, so I can't give you many details on the subject, but as far as I know the process of creating connections and handing them to a thread really costs resources, so I would avoid it if I were you. Anyhow, I think that most of these decisions can't be generalized and depend on the business.
If, for instance, your application communicates continuously with the Database and will only stop when the application is closed, then perhaps persistent connections are the way to go, for you avoid the process mentioned before.
However, if your application only communicates with the Database sporadically to get minor information then closing the connection might be more sane, for you won't waste resources on opened connections that are not being used.
Also there is a technique called "Connection Pooling", in which you create a series of connections a priori and keep them there for other applications to consume. In this case connections are persistent to the database but non-persistent to the applications.
Note: Connections in MSSQL are always persistent to the database because connection pooling is the default behavior.