How to connect to an arbitrary database using FaaS? - google-cloud-functions

I just did some reading about serverless computing and FaaS. If using FaaS to access an arbitrary database, we need to establish and close a database connection on each invocation. In, let's say, a Node application, we would usually establish the connection once and reuse it for multiple requests.
Correct?
I have a MongoDB database hosted at mLab and thought about implementing a REST API with Google's Cloud Functions service. I don't know how to handle the database connection efficiently.
For sure, things will get clearer while coding and testing, but I would like to know my chances of success before spending a lot of time on it.
Thanks
Stefan

Serverless platforms reuse the underlying containers between distinct function invocations whenever possible. Hence you can set up a database connection pool in the global function scope and reuse it for subsequent invocations - as long as the container stays warm. GCP has a guide here using MySQL but I imagine the same applies to MongoDB.
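For illustration, here is a minimal sketch of that pattern with the Node.js MongoDB driver - the MONGODB_URI environment variable and the database/collection names are placeholders I've made up, not anything from the question:

const { MongoClient } = require('mongodb');

let client; // lives in global scope, reused while the container stays warm

async function getClient() {
  if (!client) {
    client = new MongoClient(process.env.MONGODB_URI); // e.g. your mLab connection string
    await client.connect(); // only paid on a cold start
  }
  return client;
}

// HTTP-triggered Cloud Function (Express-style req/res)
exports.api = async (req, res) => {
  const db = (await getClient()).db('mydb');
  const docs = await db.collection('items').find({}).limit(10).toArray();
  res.json(docs);
};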

Related

Is there a way to create a Cloud SQL postgres connection in a Google Cloud function (Scala) that does not use HikariCP?

I would like to create a Cloud Function to call a Postgres Cloud SQL DB. I followed the documentation and created a Hikari-based connection...
import com.zaxxer.hikari.{HikariConfig, HikariDataSource}
import javax.sql.DataSource

val config = new HikariConfig
config.setJdbcUrl(jdbcURL)
config.setDataSourceProperties(connProps)
config.setMaximumPoolSize(10)
config.setMinimumIdle(4)
config.addDataSourceProperty("ipTypes", "PUBLIC,PRIVATE") // TODO: Make configurable
println("Config created")
val pool: DataSource = new HikariDataSource(config) // Do we really need Hikari here if it doesn't need pooling?
println("Returning the datasource")
Some(pool)
This works, but it causes a 25 sec delay due to "cold starts". I would like to try using the PG driver directly and see if that is faster, but I think that isn't possible thanks to the UNIX socket/Cloud SQL proxy setup, based on the documentation.
Is there a way to connect to Cloud SQL from a Cloud function using a basic PG Driver connection and not the Hikari stuff?
As mentioned in the thread:
With all "serverless" compute providers, there is always going to be some form of cold start cost that you can't eliminate. Even if you are able to keep a single instance alive by pinging it, the system may spin up any number of other instances to handle current load. Those new instances will have a cold start cost. Then, when load decreases, the unnecessary instances will be shut down.
You can now specify a minimum number of instances to keep active. This can help reduce (but not eliminate) cold starts. Read the Google Cloud blog and the documentation.
If you absolutely demand hot servers to handle requests 24/7, then you need to manage your own servers that run 24/7 (and pay the cost of those servers running 24/7). As you can see, the benefit of serverless is that you don't manage or scale your own servers, and you only pay for what you use, but you have unpredictable cold start costs associated with your project. That's the tradeoff.
For more information related to dependencies you can refer to the link provided by guillaume blaquiere.
To answer your exact question:
Can I connect without using HikariCP?
The answer is sure; you can use any number of connection pooling libraries available in Java. The examples often show HikariCP because it is far and away the most popular and highest performing.
So it's unlikely that switching connection pools will improve your performance. A slightly different question implied by your first question might be:
Can I connect without using a connection pool?
And again the answer is sure, you could use the driver directly -- but you probably shouldn't. Connection creation and management is expensive (and hard), and using a connection pool is a best practice. I wouldn't consider code "production quality" without one. While it might save you boot time, it's likely to introduce more overhead and latency into the request itself, costing you more overall. Additionally, it'll remove helpful error handling and retries around connections that you'll now have to deal with yourself.
So it seems your question really might be:
How can I reduce my cold start time?
Well with a start time of 25 seconds, the problem likely isn't limited to just Hikari. I would check out this GCP doc page on performance, and look into other articles on how to improve start up time for JVMs or your specific frameworks.
However, one way that HikariCP might be impacting your start up time is that HikariCP blocks on connection creation until the initialization is complete. There are a few things you can do to improve this (but they will likely only help, not eliminate, the 25s cold start):
1. You can lower your number of connections to 1. Cloud Function instances only handle one request at a time, so specifying a min-idle of 4 and a max of 10 connections is likely leading to wasted connections.
2. You can move the initialization of Hikari so it happens outside of your start up. The GCP docs page I mentioned above shows how to use lazy initialization, so expensive objects aren't created until you need them. This will move the cost of initializing Hikari out of your function's start up. This could make the first request that calls it more expensive -- if that is a concern, I would suggest combining lazy initialization with triggering that initialization asynchronously on start up (see the sketch after this list). This way the pool is created in the background, without blocking startup.
3. As an alternative to #2, you could also lower min-idle connections to 0 - i.e., initialize the Hikari pool with 0 connections in it. While this might be easier to implement, it will mean that requests without a warmed-up connection will have to wait for a new connection to be established (which makes #2 the better option in terms of performance).
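Purely to illustrate the lazy-initialization idea from those GCP docs (shown here in Node.js with the node-postgres driver for brevity - it is the pattern, not the code, that carries over to the Scala/Hikari case; connection details are assumed to come from environment variables):

const { Pool } = require('pg');

let pool; // intentionally not created at cold start

function getPool() {
  if (!pool) {
    // Created on first use instead of during start up
    pool = new Pool({ max: 1 }); // one connection per function instance is enough
  }
  return pool;
}

exports.handler = async (req, res) => {
  const { rows } = await getPool().query('SELECT NOW()');
  res.json(rows[0]);
};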

Best technique to make node mysql run fastest?

I am using this
var mysql = require('mysql');
in my node.js app. I am interested in making my app perform as fast as possible. I have many functions that connect to SQL. There are 2 approaches I am familiar with:
1. For every request, I make a new connection, execute the query, and then close the connection.
2. Open the connection and make it a global variable, and then never close it. Then every request that comes in just uses the connection saved globally.
Which is generally better to use? Also, for number 2, if the server closes unexpectedly then the SQL connection doesn't close. Is that bad?
Thanks
Approach 2 is faster, but to avoid the potential problem of connections dropping unexpectedly, you'll have to implement a testing mechanism for every segment that queries the database (e.g. count the number of returned rows).
To take this approach further, you can define a connection bank or pool, where you deal with connection testing and distribution. The basic idea is to have many connections to the database and inject only healthy connections into consumers (functions or objects that query the database). As Andrew mentions in the comments, you can check this question: node.js + mysql connection pooling
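As a rough sketch of that idea with the same mysql package (the host, credentials, and table below are placeholders):

const mysql = require('mysql');

// Create the pool once, at startup, and reuse it for every request.
const pool = mysql.createPool({
  connectionLimit: 10,   // tune to your workload
  host: 'localhost',
  user: 'app',
  password: 'secret',
  database: 'mydb',
});

// pool.query checks out a healthy connection, runs the query, and releases it.
function getUser(id, callback) {
  pool.query('SELECT * FROM users WHERE id = ?', [id], (err, rows) => {
    if (err) return callback(err);
    callback(null, rows[0]);
  });
}

The pool also replaces connections that the server has dropped, which largely covers the "server closes unexpectedly" concern from the question.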
Since the database is an essential asset to a project, if this is not a homework or learning project, it might not be a bad idea to explore 3rd-party libraries, where a lot of the connection and security details are covered and automated.

Microservices centralized database model

Currently we have some microservices; they each have their own database model and migrations, provided by the GORM Golang package. We have a big old MySQL database, which is against the microservices rules, but we can't replace it. I'm afraid that when the number of microservices starts to grow, we will get lost in the many database models. When I add a new column in a microservice, I just type service migrate in the terminal (because there is a CLI for run and migrate commands), and it refreshes the database.
What is the best practice to manage this? For example, if I have 1000 microservices, no one will type service migrate whenever someone refreshes the models. I am thinking about a centralized database service, where we just add a new column and it stores all the models with all the migrations. The only problem is: how will the services get to know about database model changes? This is how we store, for example, a user in a service:
import "database/sql"

type User struct {
    ID       uint           `gorm:"column:id;not null" sql:"AUTO_INCREMENT"`
    Name     string         `gorm:"column:name;not null" sql:"type:varchar(100)"`
    Username sql.NullString `gorm:"column:username;not null" sql:"type:varchar(255)"`
}

// TableName tells GORM which table this model maps to.
func (u *User) TableName() string {
    return "users"
}
Depending on your use cases, MySQL Cluster might be an option. Two phase commits used by MySQL Cluster make frequent writes impractical, but if write performance isn't a big issue then I would expect MySQL Cluster would work out better than connection pooling or queuing hacks. Certainly worth considering.
If I'm understanding your question correctly, you're trying to still use one MySQL instance but with many microservices.
There are a few ways to make an SQL system work:
1. You could create a microservice type that handles data inserts/reads from the database and takes advantage of connection pooling, and have the rest of your services do all their data reads/writes through these services. This will definitely add a bit of extra latency to all your writes/reads and will likely be problematic at scale.
2. You could look for a multi-master SQL solution (e.g. CitusDB) that scales easily, use a central schema for your database, and just make sure to handle edge cases for data insertion (de-duping, etc.).
3. You can use data-streaming architectures like Kafka or AWS Kinesis to transfer your data to your microservices and make sure they only deal with data through these streams. This way, you can decouple your database from your data.
The best way to approach it in my opinion is #3. This way, you won't have to think about your storage at the computation layer of your microservice architecture.
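To make option #3 a little more concrete, here is a hedged sketch of a producer and consumer using the kafkajs client in Node.js - the broker address, topic, and group names are invented for illustration, and your services may well use a different language or client:

const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'user-service', brokers: ['broker-1:9092'] });

// The service that owns user data publishes every change as an event.
async function publishUserChange(user) {
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: 'user-events',
    messages: [{ key: String(user.ID), value: JSON.stringify(user) }],
  });
  await producer.disconnect(); // a long-lived service would keep this connected
}

// Other services consume the stream instead of querying the shared database.
async function consumeUserEvents() {
  const consumer = kafka.consumer({ groupId: 'billing-service' });
  await consumer.connect();
  await consumer.subscribe({ topics: ['user-events'], fromBeginning: false });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const user = JSON.parse(message.value.toString());
      // update this service's own read model / local store here
    },
  });
}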
Not sure what service you're using for your microservices, but StdLib enforces a few conventions (e.g. only transferring data through HTTP) that help folks wrap their heads around it all. AWS Lambda also works very well with Kinesis as a source to launch the function, which could help with the #3 approach.
Disclaimer: I'm the founder of StdLib.
If I understand your question correctly, it seems to me that there may be multiple ways to achieve this.
One solution is to have a schema version somewhere in the database that your microservices periodically check. When your database schema changes, you increase the schema version. If a service then notices that the database schema version is higher than the service's current schema version, it can migrate the schema in code, which GORM allows.
Other options could depend on how you run your microservices. For example, if you run them on some orchestration platform (e.g. Kubernetes), you could put the migration code somewhere to run when your service initializes. Then, once you update the schema, you can force a rolling refresh of your containers, which would in turn trigger the migration.

Sqlalchemy sessions and autobahn

I'm using the Autobahn server in Twisted to provide an RPC API. Some calls require queries to the database, and multiple clients may be connected to the server via WebSocket.
I am using the SqlAlchemy ORM to access the database.
What are the pros and cons of the two following approaches for dealing with SQLAlchemy sessions?
1. Create and destroy a session for every RPC call
2. Create a single session when the server starts and use it in every RPC call
Which would you recommend and why? (I'm leaning towards 2)
The recommended way of doing SQL-based database access from Twisted (and Autobahn) with databases like PostgreSQL, Oracle or SQLite would be twisted.enterprise.adbapi.
twisted.enterprise.adbapi will run queries on a background thread pool, which is required, since most database drivers are blocking.
Sidenote: for PostgreSQL, there is also a natively asynchronous, non-blocking driver: txpostgres.
Now, if you put an ORM like SQLAlchemy on top of the native SQL driver, I'm not sure how this will work together (if at all) with twisted.enterprise.adbapi.
So, from the options you mention:
1. is a no-go, since most drivers are blocking (and Autobahn's RPCs run on the main thread = Twisted reactor thread - and you MUST not block that).
2. With this, you need to put the database session(s) in background threads (again, to not block).
Also see here.
If you're using SQLAlchemy and Twisted together, consider using Alchimia rather than the built-in adbapi.

Creating a .NET MVC web app with a Mirrored Database for HA

I am writing my first .NET MVC application using the Code-First approach. I have recently learned how to configure two SQL Server installations for High Availability using a mirror database and a witness (not to be confused with failover clusters) to handle the failover process. I think this would be a great time to practice both things by backing my web app with a highly available DB.
Now, from what I've learned (correct me if I'm wrong), in the mirror configuration the witness fails over to the secondary DB if the first one goes down... but your application will also need to change its connection string to reference the secondary server.
What is the best approach for keeping both addresses in the Web.config (or somewhere else) and choosing the right connection string?
I have zero experience with connecting to mirrored databases, so this is all hearsay! :)
The short of it is that you may not have to do anything special, as long as you pass along the FailoverPartner attribute in your connection string. The long of it is that you may need additional error handling to attempt a new connection so the data provider will actually use the FailoverPartner name in the new connection.
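For illustration only (remember, hearsay!): with the classic SqlClient provider, the Web.config entry might look roughly like this, where the server and database names are placeholders:

<connectionStrings>
  <!-- PrimaryServer, MirrorServer, and MyAppDb are made-up names -->
  <add name="DefaultConnection"
       connectionString="Data Source=PrimaryServer;Failover Partner=MirrorServer;Initial Catalog=MyAppDb;Integrated Security=True"
       providerName="System.Data.SqlClient" />
</connectionStrings>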
There seems to be some good information with Connecting Clients to a Database Mirroring Session to get started. Have you had a chance to check that out?
If not, it's there under Making the Initial Connection, where they introduce the FailoverPartner attribute of the ConnectionString property.
Reconnecting to a Database Mirroring Session suggests that on any client disconnect due to failover, the client will need to trap this exception and be prepared to reconnect:
The application must become aware of the error. Then, the application needs to close the failed connection and open a new connection using the same connection string attributes.
If the FailoverPartner attribute is available, this process should be relatively transparent to the client.
If the above doesn't work, then you might need to actually introduce some logic at the application tier to track who is the primary node, the failover node, and connection strings for each, and be prepared to persist that information somewhere - much like the data access provider should be doing for us (eyes wide open).
There is also this ServerFault post on database mirroring with SQL Server that might be of interest from an operational viewpoint and has additional reference information.
Hopefully someone with actual experience will back up any of this!
This may be totally off base, but what if you had a load balancer between your web server and the database servers?
The load balancer would have both databases in its pool, using basic health check techniques (e.g. ping).
Your configuration would then only need to point to the IP of the Load Balancer, and wouldn't need to change.
This is what these network devices are good for. It's not the job of the programming framework (ASP.NET) to make decisions on the health of servers.