AWS Lambda - MySQL caching - mysql

I have Lambda that uses RDS. I wanted to improve it and use the Lambda connection caching. I have found several articles, and implemented it on my side, best to my knowledge. But now, I am not sure it is this the rigth way to go.
I have Lambda (running Node 8), which has several files used with require. I will start from the main function, until I reach the MySQL initializer, which is exact path. All will be super simple, showing only to flow of the code that runs MySQL:
Main Lambda:
const jobLoader = require('./Helpers/JobLoader');
exports.handler = async (event, context) => {
const emarsysPayload = event.Records[0];
let validationSchema;
const body = jobLoader.loadJob('JobName');
...
return;
...//
Job Code:
const MySQLQueryBuilder = require('../Helpers/MySqlQueryBuilder');
exports.runJob = async (params) => {
const data = await MySQLQueryBuilder.getBasicUserData(userId);
MySQLBuilder:
const mySqlConnector = require('../Storage/MySqlConnector');
class MySqlQueryBuilder {
async getBasicUserData (id) {
let query = `
SELECT * from sometable WHERE id= ${id}
`;
return mySqlConnector.runQuery(query);
}
}
And Finally the connector itself:
const mySqlConnector = require('promise-mysql');
const pool = mySqlConnector.createPool({
host: process.env.MY_SQL_HOST,
user: process.env.MY_SQL_USER,
password: process.env.MY_SQL_PASSWORD,
database: process.env.MY_SQL_DATABASE,
port: 3306
});
exports.runQuery = async query => {
const con = await pool.getConnection();
const result = con.query(query);
con.release();
return result;
};
I know that measuring performance will show the actual results, but today is Friday, and I will not be able to run this on Lambda until the late next week... And really, it would be awesome start of the weekend knowing I am in right direction... or not.
Thank for the inputs.

First thing would be to understand how require works in NodeJS. I do recommend you go through this article if you're interested in knowing more about it.
Now, once you have required your connection, you have it for good and it won't be required again. This matches what you're looking for as you don't want to overwhelm your database by creating a new connection every time.
But, there is a problem...
Lambda Cold Starts
Whenever you invoke a Lambda function for the first time, it will spin up a container with your function inside it and keep it alive for approximately 5 mins. It's very likely (although not guaranteed) that you will hit the same container every time as long as you are making 1 request at a time. But what happens if you have 2 requests at the same time? Then another container will be spun up in parallel with the previous, already warmed up container. You have just created another connection on your database and now you have 2 containers. Now, guess what happens if you have 3 concurrent requests? Yes! One more container, which equals one more DB connection.
As long as there are new requests to your Lambda functions, by default, they will scale out to meet demand (you can configure it in the console to limit the execution to as many concurrent executions as you want - respecting your Account limits)
You cannot safely make sure you have a fixed amount of connections to your Database by simply requiring your code upon a Function's invocation. The good thing is that this is not your fault. This is just how Lambda functions behave.
...one other approach is
to cache the data you want in a real caching system, like ElasticCache, for example. You could then have one Lambda function be triggered by a CloudWatch Event that runs in a certain frequency of time. This function would then query your DB and store the results in your external cache. This way you make sure your DB connection is only opened by one Lambda at a time, because it will respect the CloudWatch Event, which turns out to run only once per trigger.
EDIT: after the OP sent a link in the comment sections, I have decided to add a few more info to clarify what the mentioned article wants to say
From the article:
"Simple. You ARE able to store variables outside the scope of our
handler function. This means that you are able to create your DB
connection pool outside of the handler function, which can then be
shared with each future invocation of that function. This allows for
pooling to occur."
And this is exactly what you're doing. And this works! But the problem is if you have N connections (Lambda Requests) at the same time. If you don't set any limits, by default, up to 1000 Lambda functions can be spun up concurrently. Now, if you then make another 1000 requests simultaneously in the next 5 minutes, it's very likely you won't be opening any new connections, because they have already been opened on previous invocations and the containers are still alive.

Adding to the answer above by Thales Minussi but for a Python Lambda. I am using PyMySQL and to create a connection pool I added the connection code above the handler in a Lambda that fetches data. Once I did this, I was not getting any new data that was added to the DB after an instance of the Lambda was executed. I found bugs reported here and here that are related to this issue.
The solution that worked for me was to add a conn.commit() after the SELECT query execution in the Lambda.
According to the PyMySQL documentation, conn.commit() is supposed to commit any changes, but a SELECT does not make changes to the DB. So I am not sure exactly why this works.

Related

How to fix Cloud SQL (MySQL) & Cloud functions slow queries

I have an application that, through the Firebase Cloud Functions, connects to a Cloud SQL database (MySQL).
The SQL CLOUD machine I am using is the free and lowest level one. (db-f1-micro, shared core, 1vCPU 0.614 GB)
I report below what is my architecture of use for the execution of a simple query.
I have a file called "database.js" which exports my connection (pool) to the db.
const mysqlPromise = require('promise-mysql');
const cf = require('./config');
const connectionOptions = {
connectionLimit: cf.config.connection_limit, // 250
host: cf.config.app_host,
port: cf.config.app_port,
user: cf.config.app_user,
password: cf.config.app_password,
database: cf.config.app_database,
socketPath: cf.config.app_socket_path
};
if(!connectionOptions.host && !connectionOptions.port){
delete connectionOptions.host;
delete connectionOptions.port;
}
const connection = mysqlPromise.createPool(connectionOptions)
exports.connection = connection
Here instead is how I use the connection to execute the query within a "callable cloud function"
Note that the tables are light (no more than 2K records)
// import connection
const db = require("../Config/database");
// define callable function
exports.getProdottiNegozio = functions
.region("europe-west1")
.https.onCall(async (data, context) => {
const { id } = data;
try {
const pool = await db.connection;
const prodotti = await pool.query(`SELECT * FROM products WHERE shop_id=? ORDER BY name`, [id]);
return prodotti;
} catch (error) {
throw new functions.https.HttpsError("failed-precondition", error);
}
});
Everything works correctly, in the sense that the query is executed and returns the expected results, but there is a performance.
Query execution is sometimes very slow. (up to 10 seconds !!!).
I have noticed that some times in the morning they are quite fast (about 1 second), but sometimes they are very slow and make my application very slow.
Checking the logs inside the GCP console I noticed that this message appears.
severity: "INFO"
textPayload: "2021-07-30T07:44:04.743495Z 119622 [Note] Aborted connection 119622 to db: 'XXX' user: 'YYY' host: 'cloudsqlproxy~XXX.XXX.XXX.XXX' (Got an error reading communication packets)"
At the end of all this I would like some help to understand how to improve the performance of the application.
Is it just a SQL CLOUD machine problem? Would it be enough to increase resources to have decent query execution?
Or am I wrong about the architecture of the code and how I organize the functions and the calls to the db?
Thanks in advance to everyone :)
Don't connect directly to your database with an auto scaling solution:
You shouldn't use an auto scaling web service (Firebase Functions) to connect to a database directly. Imagine you get 400 requests, that means 400 connections opened to your database if each function tries to connect on startup. Your database will start rejecting (or queuing) new connections. You should ideally host a service that is online permanently and let Firebase Function tell that service what to query with an existing connection.
Firebase functions takes its sweet time to start up:
Firebase Functions takes 100~300ms to start (cold start) for each function called. So add that to your wait time. More so if your function relies on a connection to something else before it can respond.
Functions have a short lifespan:
You should also know that Firebase Functions don't live very long. They are meant to be single task microservices. Their lifespan is 90 seconds if I recall correctly. Make sure your query doesn't take longer than that
Specific to your issue:
If your database gets slow during the day it might be because the usage increases.
You are using a shared core, which means you share resources on the lowest tier with the the other lower tier databases in that region/zone. You might need to increase resources, like move to a dedicated core, or optimize your query(ies). I'd recommend bumping up your CPU. The cost is really low for small CPU options

How to debug knex.js? Without having to pollute my db

does anyone know anything about debugging with knexjs and mysql? I'm trying to do a lot of things and test out stuff and I keep polluting my test database with random data. ideally, I'd just like to do things and see what the output query would be instead of running it against the actual database to see if it actually worked.
I can't find anything too helpful in their docs. they mention passing {debug: true} as one of the options in your initialize settings but it doesn't really explain what it does.
I am a junior developer, so maybe some of this is not meant to be understood by juniors but at the end of the day Is just not clear at all what steps I should take to be able to just see what queries would have been ran instead of running the real queries and polluting my db.
const result = await db().transaction(trx =>
trx.insert(mapToSnakeCase(address), 'id').into('addresses')
.then(addressId =>
trx.insert({ addresses_id: addressId, display_name: displayName }, 'id')
.into('chains')).toString();
You can build a knex query, but until you attach a .then() or awiat() (or run . asCallback((error,cb)=>{})), the query is just an object.
So you could do
let localVar = 8
let query = knex('table-a').select().where('id', localVar)
console.log(query.toString())
// outputs a string 'select * from table-a where id = 8'
This does not hit the database, and is synchronous. Make as many of these as you want!
As soon as you do await query or query.then(rows => {}) or query.asCallback( (err,rows)=>{} ) you are awaiting the db results, starting the promise chain or defining the callback. That is when the database is hit.
Turning on debug: true when initializing just writes the results of query.toSQL() to the console as they run against the actual DB. Sometimes an app might make a lot of queries and if one goes wrong this is a way to see why a DB call failed (but is VERY verbose so typically is not on all the time).
In our app's tests, we do actually test against the database because unit testing this type of stuff is a mess. We use knex's migrations on a test database that is brought down and up every time the tests run. So it always starts clean (or with known seed data), and if a test fails the DB is in the same state to be manually inspected. While we create a lot of test data in a testing run, it's cleaned up before the next test.

How to implement using MYSQL jdbc in an Akka Actor

Hey I read this jdbc docs
https://www.playframework.com/documentation/2.1.0/ScalaDatabase
and this question
Is it good to put jdbc operations in actors?
Now I have an ActorClass for my mysql transaction, and this actor instantiated several times, whenever request comes. So each request would instantiate new actor. Is it safe for connection pool?
Can I use
val connection = DB.getConnection()
is connection object could handle async transaction?
So I could just a singleton to handle mysql connection and used it in all instantiated actors. Also if I want to use anorm, how do I make an implicit connection object?
Thanks
Your DB.getConnection() should be a promise[Connection] or a future[Connection] if you don't want to block the actor. (caveats at the end of the answer)
If your DB.getConnection() is synchronous (returning only connection without wrapping type) your actor will hang until it actually gets a connection from the pool while processing the actual message. It doesn't matter your DB being singleton or not, in the end it will hit the connection pool.
That being said, you can create actors to handle the messaging and other actors to handle persistence in the database, put them in different thread dispatchers giving more thread to database intensive. This is suggested also in the PlayFramework.
Caveats:
If you run futures inside the actor you are not ensure of the thread/timing it will run, I'm assuming you did something in the line of these (read the comments)
def receive = {
case aMessage =>
val aFuture = future(db.getConnection)
aFuture.map { theConn => //from previous line when you acquire the conn and when you execute the next line
//it could pass a long time they run in different threads/time
//that's why you should better create an actor that handles this sync and let
//akka do the async part
theConn.prepareStatemnt(someSQL)
//omitted code...
}
}
so my suggestion would be
//actor A receives,
//actor B proccess db (and have multiple instances of this one due to slowness from db)
class ActorA(routerOfB : ActorRef) extends Actor {
def recieve = {
case aMessage =>
routerOfB ! aMessage
}
}
class ActorB(db : DB) extends Actor {
def receive = {
case receive = {
val conn = db.getConnection //this blocks but we have multiple instances
//and enforces to run in same thread
val ps = conn.prepareStatement(someSQL)
}
}
}
You will need routing: http://doc.akka.io/docs/akka/2.4.1/scala/routing.html
As I know you couldn't run multiple concurrent query on a single connection for RDBMS ( I've not seen any resource/reference for async/non-blocking call for mysql even in C-API; ). To run your queries concurrently, you most have multiple connection instances.
DB.getConnection isn't expensive while you have multiple instances of connection. The most expensive area for working with DB is running sql query and waiting for its response.
To being async with your DB-calls, you should run them in other threads (not in the main thread-pool of Akka or Play); Slick does it for you. It manages a thread-pool and run your DB-calls on them, then your main threads will be free for processing income requests. Then you dont need to wrap your DB-calls in actors to being async.
I think you should take a connection from pool and return when it is done. If we go by single connection per actor what if that connection gets disconnect, you may need to reinitialize it.
For transactions you may want to try
DB.withTransaction { conn => // do whatever you need with the connection}
For more functional way of database access I would recommend to look into slick which has a nice API that can be integrated with plain actors and going further with streams.

all pooled connections were in use and max pool size was reached

I am writing a .NET 4.0 console app that
Opens up a connection Uses a Data Reader to cursor through a list of keys
For each key read, calls a web service
Stores the result of a web service in the database
I then spawn multiple threads of this process in order to improve the maximum number of records that I can process per second.
When I up the process beyond about 30 or so threads, I get the following error:
System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
Is there an Server or client side option to tweak to allow me to obtain more connections fromn the connection pool?
I am calling a sql 2008 r2 DATABASE.
tHx
This sounds like a design issue. What's your total record count from the database? Iterating through the reader will be really fast. Even if you have hundreds of thousands of rows, going through that reader will be quick. Here's a different approach you could take:
Iterate through the reader and store the data in a list of objects. Then iterate through your list of objects at a number of your choice (e.g. two at a time, three at a time, etc) and spawn that number of threads to make calls to your web service in parallel.
This way you won't be opening multiple connections to the database, and you're dealing with what is likely the true bottleneck (the HTTP call to the web service) in parallel.
Here's an example:
List<SomeObject> yourObjects = new List<SomeObject>();
if (yourReader.HasRows) {
while (yourReader.Read()) {
SomeObject foo = new SomeObject();
foo.SomeProperty = myReader.GetInt32(0);
yourObjects.Add(foo);
}
}
for (int i = 0; i < yourObjects.Count; i = i + 2) {
//Kick off your web service calls in parallel. You will likely want to do something with the result.
Task[] tasks = new Task[2] {
Task.Factory.StartNew(() => yourService.MethodName(yourObjects[i].SomeProperty)),
Task.Factory.StartNew(() => yourService.MethodName(yourObjects[i+1].SomeProperty)),
};
Task.WaitAll(tasks);
}
//Now do your database INSERT.
Opening up a new connection for all your requests is incredibly inefficient. If you simply want to use the same connection to keep requesting things, that is more than possible. You can open a connection, and then run as many SqlCommand commands through that one connection. Simply keep the ONE connection around, and dispose of it after all your threading is done.
Please restart the IIS you will be able to connect

mysql_connect "bool $new_link = true" is very slow

I'm using latest version of Xampp on 64bit Win7.
The problem is that, when I use mysql_connect with "bool $new_link" set to true like so:
mysql_connect('localhost', 'root', 'my_password', TRUE);
script execution time increases dramatically (about 0,5 seconds per connection, and when I have 4 diffirent objects using different connections, it takes ~2 seconds).
Is setting "bool $new_link" to true, generally a bad idea or could it just be some problem with my software configuration.
Thank you.
//Edit:
I'm using new link, because I have multiple objects, that use mysql connections (new objects can be created inside already existing objects and so on). In the end, when it comes to unsetting objects (I have mysql_close inside my __destruct() functions), I figured, that the only way to correctly clean up loose ends would be that all objects have their own connection variables.
I just formated my PC so configuration should be default conf.
Don't open a new connection unless you have a need for one (for instance, accessing multiple databases simultaneously).
Also you don't have to explicitly call mysql_close. I usually just include a function to quickly retrieve an existing db link (or a new one if none exists yet).
function &getDBConn() {
global $DBConn;
if(!$DBConn) $DBConn = mysql_connect(...);
return $DBConn;
}
// now you can just call $dbconn = getDBConn(); whenever you need it
Use "127.0.0.1" instead of "localhost". It improved my performance with mysql_connect from ~1 sek to a couple of milliseconds.
Article about php/mysql_connect and IPv6 on windows: http://www.bluetopazgames.com/uncategorized/php-mysql_connect-is-slow-1-second-for-localhost-windows-7/