I'm working on a akka-http/slick web service, and I need to do the following in a transaction:
Insert a row in a table
Call some external web service
Commit the transaction
The web service I need to call is sometimes really slow to respond (let's say ~2 seconds).
I'm worried that this might keep the SQL connection open for too longer, and that'll exhaust Slick's connection pool and affect other independent requests.
Is this a possibility? Or does Slick do something to make sure this "idle" mid-transaction connection does not starve the pool?
If it is something I should be worried about - is there anything I can do to remedy this?
If it matters, I'm using MySQL with TokuDB.
The slick documentation seems to say that this will be a problem.
The use of a transaction always implies a pinned session.
And
You can use withPinnedSession to force the use of a single session, keeping the existing session open even when waiting for non-database computations.
From: http://slick.lightbend.com/doc/3.2.0/dbio.html#transactions-and-pinned-sessions
Related
I would like to create a Cloud function to call a Postgres Cloud SQL DB. Currently I followed the documentation and create a Hikari based connection...
val config = new HikariConfig
config.setJdbcUrl(jdbcURL)
config.setDataSourceProperties(connProps)
config.setMaximumPoolSize(10)
config.setMinimumIdle(4)
config.addDataSourceProperty("ipTypes", "PUBLIC,PRIVATE") // TODO: Make configureable
println("Config created")
val pool : DataSource = new HikariDataSource(config) // Do we really need Hikari here if it doesn't need pooling?
println("Returning the datasource")
Some(pool)
This works but it causes a 25 sec delay due to "cold start"s. I would like to try using PG driver directly and see if that is faster but I think that isn't possible thanks the the UNIX socket/SQL Cloud proxy stuff based on the documentation.
Is there a way to connect to Cloud SQL from a Cloud function using a basic PG Driver connection and not the Hikari stuff?
As mentioned in the thread:
With all "serverless" compute providers, there is always going to be
some form of cold start cost that you can't eliminate. Even if you are
able to keep a single instance alive by pinging it, the system may
spin up any number of other instances to handle current load. Those
new instances will have a cold start cost. Then, when load decreases,
the unnecessary instances will be shut down.
you can now specify a minimum number of instances to keep active. This
can help reduce (but not eliminate) cold starts. Read the Google
Cloud blog and the documentation.
If you absolutely demand hot servers to handle requests 24/7, then you
need to manage your own servers that run 24/7 (and pay the cost of
those servers running 24/7). As you can see, the benefit of serverless
is that you don't manage or scale your own servers, and you only pay
for what you use, but you have unpredictable cold start costs
associated with your project. That's the tradeoff.
For more information related to dependencies you can refer to the link provided by guillaume blaquiere.
To answer your exact question:
Can I connect without using HikariCP?
The answer is sure; you can use any number of connection pooling libraries avaible in Java. The examples often show HikariCP because it is far and away the most popular and highest performing.
So it's unlikely that switching connection pools will improve your performance. A slightly different question implied by your first question might be:
Can I connect without using a connection pool?
And again the answer is sure, you could use the driver directly -- but you probably shouldn't. Connection creation and management is expensive (and hard), and using a connection pool is a best practice. I wouldn't consider code "production quality" without one. While it might save you boot time, it's likely to introduce more overhead and latency into the request itself, costing you more overall. Additionally, it'll remove helpful error handling and retries around connections that you'll now have to deal with yourself.
So it seems you question really might be:
How can I reduce my cold start time?
Well with a start time of 25 seconds, the problem likely isn't limited to just Hikari. I would check out this GCP doc page on performance, and look into other articles on how to improve start up time for JVMs or your specific frameworks.
However, one way that HikariCP might be impacting your start up time is that HikariCP blocks on the connection creation until the initialization is complete. There are a few things you can do to improve this (but likely will only help, not eliminate the 25s cold start)
You can lower your number of connections to 1. Cloud function instances only handle requests one at a time, so specifying a min-idle of 4 and a max connection to 10 is likely leading to wasted connections.
You can move the initialization of Hikari to happen outside of your start up. The GCP docs page I mentioned above shows how to use lazy initialization, so expensive object's aren't created until you need them. This will move the cost of initializing Hikari out of your functions start up. This could make the first request that calls it more expensive -- if that is a concern, I would suggest combining lazy initialization along with triggering that initialization in async way on start up. This way the pool is created in the background, without blocking startup.
As an alternative to #2, you could also lower min-idle connections to 0 - e.i., initialize the Hikari Pool with 0 connections in it. While this might be easier to implement, it will mean that requests without a warmed up connection will have to wait for a new connection to be established. (which makes #2 more optimal in terms of performance).
So this is more of a generic question but an important one for me and perhaps future googlers.
Since one can create one connection and keep it alive as long as whatever process is connected to it is alive too, and librarys can keep it healthy (upon failing etc) why would one use a pool.
I can not understand where the performance enhancement comes into to play.
The querys are just getting queued the same way they would with one connection.
There is no 'parallel' processing.
Also assuming the process and the DB are in the same server there is no time lost sending the request over the network. In addition no time is lost connecting, and ending the connection with either option.
I can only see the demerits such as making sure the data selected are not currently getting updated by another connection, thus receiving old data etc.
I wanted to boost the performance of my MySQL DB and make it more scalable as I stumbled upon the pool vs 1 permanent connection argument without being sure if I should change or not.
A few weeks ago, I post a question about queuing database access request to prevent 'too many connection' error when massive concurrent db requests happen. People told me ConnectionPool is the right way to go which I agreed at that time. However, I finally realized this is not the solution especially when there are a lot of different clients accessing mysql server through network, because connection pool is at client side it can not prevent the sum of connections of all clients from exceeding the max connection number of mysql server.
I think there should be some middleware on the mysql server working as a queue or pool, is anybody familiar with this? Thank you.
I know this question is widely asked, I am also surprised as if there is no total solution for it.
HAProxy should perform TCP-level queueing for you purpose. Though, would it be better to build an application server in the middle, to handle incoming flow at more conscious level than TCP. This could require rewriting of both server and clients, but could give you more control over what's happening.
What you ask is actually a pretty complicated problem.
First of all you need to decide whether mis-alignments in data are acceptable, for example: if you store in the database the number of Likes received, and you ask this number at 12:00:00, and the number in the DB is 500, and someone posts a LIKE at 12:00:01, and you query it again at 12:00:02; is it OK to receive "500" again, even if the correct number should be 501, provided that in a little time the answer "501" does come out?
If this is acceptable (the infamous "301 bug" in YouTube), then you might start caching some SELECT responses.
You might even cache them in middleware, i.e. have a special process running continuously and hogging ONE connection to MySQL, and answering requests in a queue. You might run it internally in the server as a Web server on port 8001 and have an Apache ReverseProxy, HAproxy, pound, or NginX location to proxy it outside.
You can do the same for special UPDATE/DELETE queries even if it's trickier.
It would be best to cache queries running asynchronously through AJAX first, if any, because serializing queries with a proxy is liable to perceptibly slow down the application.
You have a threefold target:
run queries on MySQL as fast as possible (look into indexing and MySQL caching) in order to free the ConnectionPool and keep it as lightly loaded as possible.
refactor the application in order to extract all information from queries (e.g., the number of rows with a certain property AND those rows as data are often retrieved using TWO queries, but with proper management you need only one and a SQLNumRows() call. Also, quite often similar queries with different informations are run, when a single query might have returned all information at one go: typically, one query to check user/password, another to fetch the complete user profile).
divert the most calls possible to something not at all (NginX, middleware) or lightly (queuing process) bound to MySQL; in the latter case, using a known number of connections in order to run predictably.
Unfortunately there's no easy "magic bullet" to solve this problem (except of course increasing the number of connections, maybe replicating the DB on several hosts running as master-slave. While not really a magic bullet, it is easier to design and implement).
I'm doing my first foray with mysql and I have a doubt about how to handle the connection(s) my applications has.
What I am doing now is opening a connection and keeping it alive until I terminate my program. I do a mysql_ping() every now and then and the connection is started with MYSQL_OPT_RECONNECT.
The other option (I can think of), would be to start a new connection before doing anything that requires my connection to the database and closing it after I'm done with it.
What are the pros and cons of these two approaches?
what are the "side effects" of a long connection?
What is the most used method of handling this?
Cheers ;)
Some extra details
At this point I am keeping the connection alive and I ping it every now and again to now it's status and reconnect if needed.
In spite of this, when there is some consistent concurrency with queries happening in quick succession, I get a "Server has gone away" message and after a while the connection is re-established.
I'm left wondering if this is a side effect of a prolonged connection or if this is just a case of bad mysql server configuration.
Any ideas?
In general there is quite some amount of overhead incurred when opening a connection. Depending on how often you expect this to happen it might be ok, but if you are writing any kind of application that executes more than just a very few commands per program run, I would recommend a connection pool (for server type apps) or at least a single or very few connections from your standalone app to be kept open for some time and reused for multiple transactions.
That way you have better control over how many connections get opened at the application level, even before the database server gets involved. This is a service an application server offers you, but it can also be rolled up rather easily if you want to keep it smaller.
Apart from performance reasons a pool is also a good idea to be prepared for peaks in demand. When a lot of requests come in and each of them tries to open a separate connection to the database - or as you suggested even more (per transaction) - you are quickly going to run out of resources. Keep in mind that every connection consumes memory inside MySQL!
Also you want to make sure to use a non-root user to connect, because if you don't (I think it is tied to the MySQL SUPER privilege), you might find yourself locked out. MySQL reserves at least one connection for an administrator for problem fixing, but if your app connects with that privilege, all connections would already be used up when you try to put out the fire manually.
Unless you are worried about having too many connections open (i.e. over 1,000), you she leave the connection open. There is overhead in connecting/reconnecting that will only slow things down. If you know you are going to need the connection to stay open for a while, run this query instead of pinging periodically:
SET SESSION wait_timeout=#
Where # is the number of seconds to leave an idle connection open.
What kind of application are you writing? If it's a webscript: keep it open. If it's an executable, pool your connections (if necessary, most of the times a singleton will do).
My site has always used persistent connections, based on my understanding of them there's no reason not to. Why close the connection when it can be reused? I have a site that in total accesses about 7 databases. It's not a huge traffic site, but it's big enough. What's your take on persistent, should I use them?
With persistent connections:
You cannot build transaction processing effectively
impossible user sessions on the same connection
app are not scalable. With time you may need to extend it and it will require management/tracking of persistent connections
if the script, for whatever reason, could not release the lock on the table, then any following scripts will block indefinitely and one should restart the db server. Using transactions, transaction block will also pass to the next script (using the same connection) if script execution ends before the transaction block completes, etc.
Persistent connections do not bring anything you can do with non-persistent connections.
Then, why to use them, at all?
The only possible reason is performance, to use them when overhead of creating a link to your SQL Server is high. And this depends on many factors like:
database type
whether MySQl server is on the same machine and, if not, how far? might be out of your local network /domain?
how much overloaded by other processes the machine on which MySQL sits
One always can replace persistent connections with non-persistent connections. It might change the performance of the script, but not its behavior!
Commercial RDMS might be licensed by the number of concurrent opened connections and here the persistent connections can misserve
My knowledge on the area is kinda limited so I can't give you many details on the subject but, as far as I know, the process of creating connections and handing them to a thread really costs resources, so I would avoid it if I were you. Anyhow I think that most of this decisions can't be generalized and depend on the business.
If, for instance, your application communicates continuously with the Database and will only stop when the application is closed, then perhaps persistent connections are the way to go, for you avoid the process mentioned before.
However, if your application only communicates with the Database sporadically to get minor information then closing the connection might be more sane, for you won't waste resources on opened connections that are not being used.
Also there is a technique called "Connection Pooling", in which you create a series of connections a priori and keep them there for other applications to consume. In this case connections are persistent to the database but non-persistent to the applications.
Note: Connections in MSSQL are always persistent to the database because connection pooling is the default behavior.