Building a scalable API with Node and MySQL/MariaDB

What is considered best practice for handling and managing connections when building an API or web application with Node.js that depends on MySQL (or in my case, MariaDB)?
Per the documentation for node-mysql, there seem to be two methods to use:
var connection = mysql.createConnection({...});
app.get("/", function(req, res) {
connection.query("SELECT * FROM ....", function(error, result) {
res.json(result);
});
});
-- or --
var pool = mysql.createPool({...});
app.get("/", function(req, res) {
pool.getConnection(function(error, connection) {
if (error) {
console.log("Error getting new connection from pool");
} else {
connection.query("SELECT * FROM ....", function(error, result) {
connection.release();
res.json(result);
});
}
});
});
To me, it makes the most sense to use the second option, as it should use as many connections as are needed rather than relying on a single connection. However, I have experienced problems using a pool with multiple routes, i.e. each route gets a new connection from the pool, executes a query, and releases it back into the pool. Each time I get a connection from the pool, use it, and release it, it seems a process remains in MySQL waiting for another request. Eventually, these processes build up in MySQL (visible by running SHOW PROCESSLIST) and the application is no longer able to retrieve a connection from the pool.
I have resorted to using the first method because it works and my application doesn't crash, but it doesn't seem like a robust solution. node-mariasql looks promising, but I can't tell whether it would be any better than what I am currently using.
My question is: what is the best way to handle/structure MySQL connections when building an API or web application that relies heavily on SQL queries on almost every request?

Changing connection.release() to connection.destroy() solved my issue. I'm not sure what the former is supposed to do, but the latter behaves as expected and actually removes the connection. This means that once a connection is no longer in use, it kills the MySQL process and creates another when needed. This also means that many queries can hit the API simultaneously, and slow queries will not block new ones.

Better late than never.
connection.destroy() means that every request opens a new connection to MySQL instead of grabbing an idle connection and querying on that, which would have less overhead. Basically, you are no longer using the pool.
It's possible that your MySQL user was limited in the number of connections it could open, or that your queries were completing more slowly than requests were arriving at your server.
You can try raising the connectionLimit parameter so your server can handle more connections simultaneously.
var pool = mysql.createPool({
connectionLimit : 10,
host : 'example.org',
user : 'bob',
password : 'secret'
});
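Connections can also pile up when some code path never calls release(), for example an early return on an error. Below is a minimal sketch of a helper that releases on every path. It assumes pool is whatever mysql.createPool() returned; the name queryAndRelease is made up for illustration and is not part of node-mysql.

```javascript
// Hypothetical helper (not part of node-mysql): run one query on a pooled
// connection and always hand the connection back, success or failure.
// `pool` is assumed to expose getConnection(callback), as node-mysql pools do.
function queryAndRelease(pool, sql, params, callback) {
  pool.getConnection(function (err, connection) {
    if (err) {
      // No connection was handed out, so there is nothing to release.
      return callback(err);
    }
    connection.query(sql, params, function (queryErr, rows) {
      // Release before calling back so a throwing callback cannot leak it.
      connection.release();
      callback(queryErr, rows);
    });
  });
}
```

For one-off queries, node-mysql also offers pool.query(sql, params, callback), which performs the same acquire, query, release sequence internally.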

Related

Is making every function async a bad practice with MySQL and knex?

I use knex.js (with separate master and slave connections) in Node to connect to MySQL.
My code works fine for 150 users, but as the number of concurrent users increases, the heap usage reported by PM2 reaches 100% or more and the server stops responding or responds very slowly.
CPU usage on the AWS instance (a 5x large EC2 instance with an 8-core CPU and 32 GB RAM) is 12-20%.
The application responds within milliseconds, with JWT authentication and PM2.
Many queries depend on the result of a previous query, so I made all functions async.
Problematic point:
Every hour, a slot opens for all users (approx. 150,000) to edit their content (approx. 3k concurrent live users).
Temporary solution:
Running in cluster mode resolved my problem, but killing the cluster does not close the DB connections.
My question:
If the application works fine with cluster, why does it not work without cluster in the same configuration?
async function authorize(req, res) {
//decode aes-128 request
let body = await helper.get_decoded_request(req);
if (!body) {
return res.status(200).json(await helper.error("Error occurs while decoding."));
}
const schema = joi.object({
"client_id": joi.string().min(1).required(),
"client_secret": joi.string().min(1).required()
});
try {
await schema.validateAsync(body);
}
catch (err) {
return res.status(200).json(await helper.error("Please send the valid request"));
}
try {
// database update and insert functions, defined in another file
await user.update_user(data);
await match.add_transaction(data);
} catch (err) {
return res.status(200).json(await helper.error("Please send the valid request"));
}
return res.status(200).json(await helper.success("Token added successfully.", jsontoken));
}
Model file code:
const { read_db, write_db } = require("./database");
async function add_transaction(data) {
return await write_db.insert(data).table("users");
}
Database file:
var knex = require("knex");
const read_db = knex({
client: "mysql",
connection: {
database: process.env.SLAVE_MYSQL_DB,
host: process.env.SLAVE_MYSQL_HOST,
port: process.env.SLAVE_MYSQL_PORT,
user: process.env.SLAVE_MYSQL_USER,
password: process.env.MYSQL_PASSWORD
},
pool: { min: 1, max: 10 }
});
const write_db = knex({
client: "mysql",
connection: {
database: process.env.MYSQL_DB,
host: process.env.MYSQL_HOST,
port: process.env.MYSQL_PORT,
user: process.env.MYSQL_USER,
password: process.env.MYSQL_PASSWORD
},
pool: { min: 1, max: 10 }
});
module.exports = { read_db, write_db };

There is not enough information to identify the root cause, but these general steps for starting to track the problem down may help.
The first and most important step is to reproduce the slowdown locally and run your app under a profiler to find out which parts of the code are actually using all that CPU. Running your app with node --inspect lets you profile its execution in the Chrome DevTools profiler.
It also sounds like you are running just a single Node process, so it is pretty strange that you would be able to use 100% CPU on all 8 cores with a fairly low user count (fewer than 1000?)... One thing that comes to mind: if you check, for example, a password hash on every user request, that can start to use a lot of CPU very quickly.
If that is indeed the problem, you could add JSON Web Token support to your app, so that the password is only used to obtain an access token, which is much faster to verify than a password.
Once the basic app is working fine, you can start thinking about how to run multiple server processes on a single instance to make better use of all the CPU cores.
EDIT:
It sounds like that 12-20% of an 8-core instance is about 100% of a single core (one core out of eight is 12.5%). So your app seems to be saturating the one core it runs on.
About the heap usage: if you profile the application, can you see whether it is garbage collection that starts to stall the application?
One reason could be that you are creating so much garbage that at some point it starts to hurt performance. You could also check whether increasing the heap size limit from the default 1.7GB to 8GB with node --max-old-space-size=8192 helps a bit.
In addition to garbage collection, if you are, for example, returning tens of thousands of rows from the DB, just parsing them into JavaScript objects can saturate the CPU. So profiling is still the right way to find out what is going on. Parsing dates in particular is really slow in knex by default (when they are converted to Date objects).
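Before reaching for a full profiler, it can help to log how close the process is to its heap limit. The following is a rough sketch using Node's built-in v8 module; the function name heapReport and the shape of its return value are made up for illustration.

```javascript
const v8 = require("v8");

// Report how much of V8's heap limit the process is currently using.
// heapReport is a hypothetical helper name, not a Node or PM2 API.
function heapReport() {
  const stats = v8.getHeapStatistics();
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024);
  return {
    usedMB: toMB(stats.used_heap_size),
    limitMB: toMB(stats.heap_size_limit), // changes with --max-old-space-size
    usedRatio: stats.used_heap_size / stats.heap_size_limit,
  };
}

console.log(heapReport());
```

Calling this periodically under load (for instance from a setInterval) shows whether usedRatio creeps toward 1 before the slowdown starts, which would point at memory pressure rather than raw CPU work.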

Using Node and Mysql

I am working on an Angular app with PHP as the backend that handles data processing against MySQL. Recently I found the node-mysql module for Node.js, which communicates with MySQL through JS.
After reading the documentation, I have a question and would appreciate some enlightenment.
According to the documentation, we have to declare the connection in a JS file like this:
var mysql = require('mysql');
var connection = mysql.createConnection({
host : 'localhost',
user : 'me',
password : 'secret',
database : 'my_db'
});
connection.connect();
connection.query('SELECT 1 + 1 AS solution', function(err, rows, fields) {
if (err) throw err;
console.log('The solution is: ', rows[0].solution);
});
connection.end();
Does putting all the sensitive database login data in JS create a big security hole? If yes, how can I prevent it?
And the query is done through the JS file as well:
var mysql = require('mysql');
var pool = mysql.createPool(...);
pool.getConnection(function(err, connection) {
// Use the connection
connection.query( 'SELECT something FROM sometable', function(err, rows) {
// And done with the connection.
connection.release();
// Don't use the connection here, it has been returned to the pool.
});
});
Does that mean an attacker can easily find out which queries we run against the database? This is unlike a server-side language such as PHP, where we just call the PHP file with the params.
Is it safe to use this driver with Node.js?
Sorry for such newbie questions.

Node.js is server-side too. The fact that Node.js code is written in JavaScript does not mean it is exposed to your clients' browsers. It is only visible on the server, where it acts as the backend that responds to browser requests.
As a simple explanation, think of Node.js as a PHP server written in JavaScript that does not need an Apache server. Of course, the two behave differently and have many different features. You had better read a tutorial on how Node.js works and try it yourself before moving on to advanced tutorials.
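Even on the server, it is common to keep credentials out of the source file itself. Here is a minimal sketch that reads them from environment variables instead; the helper name dbConfigFromEnv and the variable names (DB_HOST and so on) are assumptions chosen for this example, not anything node-mysql requires.

```javascript
// Hypothetical helper: build the connection settings from environment
// variables so that no password is hard-coded in the JS file.
// The variable names (DB_HOST, DB_USER, ...) are assumptions.
function dbConfigFromEnv(env) {
  const required = ["DB_HOST", "DB_USER", "DB_PASSWORD", "DB_NAME"];
  // Only undefined counts as missing; an empty password is allowed.
  const missing = required.filter((name) => env[name] === undefined);
  if (missing.length > 0) {
    throw new Error("Missing environment variables: " + missing.join(", "));
  }
  return {
    host: env.DB_HOST,
    user: env.DB_USER,
    password: env.DB_PASSWORD,
    database: env.DB_NAME,
  };
}
```

The result has the same shape as the object in the question, so it could be passed as mysql.createConnection(dbConfigFromEnv(process.env)).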

Pattern for handling MySQL database connections within an express application

I am using express 4.x, and the latest MySQL package for node.
The pattern for a PHP application (which I am most familiar with) is to have some sort of database connection common file that gets included and the connection is automatically closed upon the completion of the script. When implementing it in an express app, it might look something like this:
// includes and such
// ...
var db = require('./lib/db');
app.use(db({
host: 'localhost',
user: 'root',
pass: '',
dbname: 'testdb'
}));
app.get('/', function (req, res) {
req.db.query('SELECT * FROM users', function (err, users) {
res.render('home', {
users: users
});
});
});
Excuse the lack of error handling; this is a primitive example. In any case, my db() function returns middleware that connects to the database and stores the connection object in req.db, effectively giving a new connection to each request. There are a few problems with this method:
This does not scale at all; database connections (which are expensive) are going to scale linearly with fairly inexpensive requests.
Database connections are not closed automatically and will kill the application if an uncaught error trickles up. You have to either catch it and reconnect (feels like an antipattern) or write more middleware that EVERYTHING must call prior to output to ensure the connection is closed (anti-DRY, arguably).
The next pattern I've seen is to simply open one connection as the app starts.
var mysql = require('mysql');
var connection = mysql.createConnection(config);
connection.on('connect', function () {
// start app.js here
});
Problems with this:
Still does not scale. One connection will easily get clogged with more than just 10-20 requests on my production boxes (1-2 GB RAM, 3.0 GHz quad-core CPU).
Connections will still time out after a while, so I have to provide an error handler to catch that and reconnect, which is very kludgy.
My question is: what kind of approach should be taken to handling database connections in an express app? It needs to scale (not infinitely, just within reason), I should not have to manually close connections in the route or include extra middleware for every path, and I would (preferably) not want to catch timeout errors and reopen connections.

Since you're talking about MySQL in Node.js, I have to point you to KnexJS! You'll find writing queries is much more fun. The other thing it uses is connection pooling, which should solve your problem. It uses a little package called generic-pool-redux, which manages things like DB connections.
The idea is that there is one place in your code through which your express app accesses the DB. That code, as it turns out, uses a connection pool to share the load among connections. I initialize mine something like this:
var Knex = require('knex');
Knex.knex = Knex({...}); //set options for DB
In other files
var knex = require('knex').knex;
Now all files that could access the DB are using the same connection pool (set up once at start).
I'm sure there are other connection pool packages out there for Node and MySQL, but I personally recommend KnexJS if you're doing any dynamic or complex SQL queries. Good luck!
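If you want to keep the req.db style from the question, the middleware can hand out the one shared pool rather than opening a connection per request. A minimal sketch follows; attachDb is a made-up name, and db stands for whatever shared object you created at startup (a knex instance or a node-mysql pool).

```javascript
// Hypothetical middleware factory: expose one shared pool (or knex
// instance) on every request instead of connecting once per request.
function attachDb(db) {
  return function (req, res, next) {
    req.db = db; // the same object for every request; nothing is opened here
    next();
  };
}
```

It would be registered once with app.use(attachDb(db)). Since the pool owns the connections, route handlers do not need to close anything themselves.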

When to use createPool or createConnection in felixge/node-mysql

I use https://github.com/felixge/node-mysql for my application
When and why should I use
db_pool = mysql.createConnection(db);
or
db_pool = mysql.createPool(db);
What are the differences, and when should each be used?

A single connection is blocking. While executing one query, it cannot execute others. Hence, your DB throughput may be reduced.
A pool manages many lazily-created (in felixge's module) connections. While one connection is busy running a query, others can be used to execute subsequent queries. This can result in an increase in application performance as it allows multiple queries to be run in parallel.
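The throughput difference can be illustrated with a toy model (this is not the driver, just arithmetic): each query occupies one connection for its whole duration, and a new query goes to whichever connection frees up first. The function name totalTime and the millisecond durations are invented for the example.

```javascript
// Toy model, not node-mysql code: estimate how long a batch of queries
// takes when each connection can run only one query at a time.
function totalTime(durations, connections) {
  // busyUntil[i] is the time at which connection i becomes free again.
  const busyUntil = new Array(connections).fill(0);
  for (const duration of durations) {
    // Pick the connection that frees up first.
    let earliest = 0;
    for (let i = 1; i < busyUntil.length; i++) {
      if (busyUntil[i] < busyUntil[earliest]) earliest = i;
    }
    busyUntil[earliest] += duration;
  }
  return Math.max(...busyUntil);
}

console.log(totalTime([100, 100, 100, 100], 1)); // 400: queries run back to back
console.log(totalTime([100, 100, 100, 100], 4)); // 100: queries run in parallel
```

The model ignores the database's own limits, so the real gain is usually smaller, but it shows why a single connection serializes work that a pool can overlap.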

Connection pooling allows you to reuse existing database connections instead of opening a new connection for every request to your Node application.
Many PHP and .Net folks are accustomed to connection pooling, since the standard data access layers in these platforms pool connections automatically (depending on how you access the database.)
Opening a new database connection takes time and server resources. Using a connection that is already there is much faster, and overall, your application should need to maintain fewer open connections at any one time if you use connection pooling.
The connection pooling functionality of node-mysql works very well and is easy to use. I keep the pool in a global variable and just pass that to any modules that need to access the database.
For example, here the env_settings variable in the app server holds global settings, including the active connection pool:
var http = require("http");
var connect = require("connect");
var mysql = require('mysql');
var env_settings = {
dbConnSettings: {
host: "localhost",
database: "yourDBname",
user: "yourDBuser",
password: "yourDBuserPassword"
},
port: 80
};
// Create connection pool
env_settings.connection_pool = mysql.createPool(env_settings.dbConnSettings);
// site is the module that exports ajaxHandlers (shown below)
var app = connect()
.use(site.ajaxHandlers(env_settings));
http.createServer(app).listen(env_settings.port);
And here is the ajaxHandlers module that uses the connection pool:
ajaxHandlers = function (env_settings) {
return function ajaxHandlers(req, res, next) {
var sql;
env_settings.connection_pool.getConnection(function(err, connection) {
if (err) {
// No connection was obtained, so there is nothing to release
return next(err);
}
sql = "SELECT some_data FROM some_table";
connection.query(sql, function(err, rows, fields) {
if (err) {
connection.release();
// Handle data access error here
return;
}
if (rows) {
for (var i = 0; i < rows.length; i++) {
// Process rows[i].some_data
}
}
connection.release();
res.end('Process Complete');
return;
});
});
}
}
/* Expose public functions ------ */
exports.ajaxHandlers = ajaxHandlers;
The connection_pool.getConnection method is asynchronous, so when the existing open connection is returned from the pool, or a new connection is opened if need be, then the callback function is called and you can use the connection. Also note the use of connection.release() instead of ending the connection as normal. The release just allows the pool to take back the connection so it can be reused.
Here is a good way to think about the difference. Take the example of a very simple app that takes requests and returns a data set containing the results. Without connection pooling, every time a request is made, a new connection is opened to the database, the results are returned, and then the connection is closed. If the app gets more requests per second than it can fulfill, then the number of concurrently open connections increases, since more than one connection is active at any time. Also, each request takes longer because it has to open a new connection to the database server, which is a relatively big step.
With connection pooling, the app only opens new connections when none are available in the pool. So the pool opens a bunch of new connections during the first few requests and leaves them open. When a new request is made, the pool hands out a connection that is already open and was used before instead of opening a new one. This is faster, and there will be fewer active connections to the database under heavy load. Of course, there will be more "waiting" connections open when no one is hitting the server, since they are held in the pool. But that is not usually an issue, because the server has plenty of resources available in that case anyway.
So database connection pooling can be used to make your app faster and more scalable. If you have very little traffic, it is not as important, unless you want to return results as quickly as possible. Connection pooling is often part of an overall strategy to decrease latency and improve overall performance.
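To make the reuse behaviour described above concrete, here is a toy pool written for illustration only. It is not how node-mysql implements its pool, and the class name TinyPool and its methods are invented. It opens connections lazily up to a limit, reuses idle ones, and queues callers when everything is busy.

```javascript
// Toy pool for illustration only; real pools (such as node-mysql's) also
// deal with errors, timeouts, and dead connections.
class TinyPool {
  constructor(createConnection, limit) {
    this.createConnection = createConnection; // factory for new connections
    this.limit = limit; // maximum number of connections to open
    this.idle = [];     // open connections waiting to be reused
    this.open = 0;      // connections opened so far (this toy never closes them)
    this.waiting = [];  // callbacks queued while every connection is busy
  }

  acquire(callback) {
    if (this.idle.length > 0) {
      return callback(this.idle.pop()); // reuse an already open connection
    }
    if (this.open < this.limit) {
      this.open++;
      return callback(this.createConnection()); // lazily open a new one
    }
    this.waiting.push(callback); // at the limit: wait for a release
  }

  release(connection) {
    const next = this.waiting.shift();
    if (next) {
      next(connection); // hand it straight to the longest-waiting caller
    } else {
      this.idle.push(connection); // keep it open for a later request
    }
  }
}
```

With a limit of 2, a third acquire() call does not open a third connection; its callback runs only once one of the first two connections is released.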

Connecting NodeJS to MySQL without a module

I'm aware of the popularity of a module like node-mysql for connecting to a database from an application, but I can't find any info on the connecting process without using a module like this.
Obviously I could go fishing around the modules themselves for the answer, but is there really no use case for simple connections with simple queries without a module dependency and bloated functionality?
I find it strange given the very simple I/O of a process like MySQL.

This has less to do with Node.js and more to do with knowing how to implement the MySQL client/server protocol. You simply need to create a TCP connection to the server and send data in the correct format and sequence per the protocol. node-mysql has done the difficult part: abstracting the protocol into something much easier to use.
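As a small taste of what "per the protocol" involves, every MySQL packet starts with a 4-byte header: a 3-byte little-endian payload length followed by a 1-byte sequence id. The sketch below parses only that header; parsePacketHeader is a made-up name, and the handshake, authentication, and result-set decoding that a driver also has to implement are not shown.

```javascript
// Parse the 4-byte header that precedes every MySQL protocol packet:
// a 3-byte little-endian payload length followed by a 1-byte sequence id.
// parsePacketHeader is a hypothetical helper name.
function parsePacketHeader(buffer) {
  if (buffer.length < 4) {
    throw new RangeError("Need at least 4 bytes for a packet header");
  }
  return {
    payloadLength: buffer.readUIntLE(0, 3),
    sequenceId: buffer.readUInt8(3),
  };
}

console.log(parsePacketHeader(Buffer.from([0x4a, 0x00, 0x00, 0x00])));
// { payloadLength: 74, sequenceId: 0 }
```

In a real client, the buffer would come from the data events of a net.Socket connected to the server, and packets can arrive split across several chunks, which is part of what the existing modules handle for you.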

This is subjective, but the example in https://github.com/felixge/node-mysql
looks to me like a simple connection and a simple query:
var mysql = require('mysql');
var connection = mysql.createConnection({
host : 'localhost',
user : 'me',
password : 'secret',
});
connection.connect();
connection.query('SELECT 1 + 1 AS solution', function(err, rows, fields) {
if (err) throw err;
console.log('The solution is: ', rows[0].solution);
});
connection.end();
If you have a look at the source code, you'll see what it takes to implement the MySQL client protocol; I would say that it is not that simple:
https://github.com/felixge/node-mysql/blob/master/lib/Connection.js
https://github.com/felixge/node-mysql/tree/master/lib/protocol
But again, this is subjective. IMHO, I don't think there is a simpler way to query MySQL.
