I use https://github.com/felixge/node-mysql for my application.
When and why should I use
db_pool = mysql.createConnection(db);
or
db_pool = mysql.createPool(db);
What are the differences, and when should I use each?
A single connection executes one query at a time. While one query is running, any others sent on that connection wait in a queue behind it. Hence, your DB throughput may be reduced.
A pool manages many lazily-created (in felixge's module) connections. While one connection is busy running a query, others can be used to execute subsequent queries. This can result in an increase in application performance as it allows multiple queries to be run in parallel.
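As a rough illustration of that difference, here is a back-of-the-envelope model. It is only a sketch: it assumes every query takes the same time and that the database server can run the pooled queries fully in parallel, which real workloads rarely do. The function name `estimateTotalMs` is made up for this example.

```javascript
// Estimate the wall-clock time to run `queryCount` queries of `queryMs` each
// when at most `connections` queries can be in flight at once.
function estimateTotalMs(queryCount, queryMs, connections) {
  // Queries run in "waves" of up to `connections` at a time.
  const waves = Math.ceil(queryCount / connections);
  return waves * queryMs;
}

console.log(estimateTotalMs(10, 50, 1)); // 500 (single connection: one after another)
console.log(estimateTotalMs(10, 50, 5)); // 100 (pool of 5: two waves of 5)
```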
Connection pooling allows you to reuse existing database connections instead of opening a new connection for every request to your Node application.
Many PHP and .NET folks are accustomed to connection pooling, since the standard data access layers in these platforms pool connections automatically (depending on how you access the database).
Opening a new database connection takes time and server resources. Using a connection that is already there is much faster, and overall, your application should need to maintain fewer total open connections at any one time if you use connection pooling.
The connection pooling functionality of node-mysql works very well and is easy to use. I keep the pool in a global variable and just pass that to any modules that need to access the database.
For example, here the env_settings variable in the app server holds global settings, including the active connection pool:
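var http = require("http");
var connect = require("connect");
var mysql = require('mysql');
// Assumed path: the module that exports ajaxHandlers
var site = require("./site");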
var env_settings = {
dbConnSettings: {
host: "localhost",
database: "yourDBname",
user: "yourDBuser",
password: "yourDBuserPassword"
},
port: 80
};
// Create connection pool
env_settings.connection_pool = mysql.createPool(env_settings.dbConnSettings);
var app = connect()
.use(site.ajaxHandlers(env_settings));
http.createServer(app).listen(env_settings.port);
And here is the ajaxHandlers module that uses the connection pool:
var ajaxHandlers = function (env_settings) {
  return function ajaxHandlers(req, res, next) {
    env_settings.connection_pool.getConnection(function (err, connection) {
      if (err) {
        // No connection could be obtained from the pool
        return next(err);
      }
      var sql = "SELECT some_data FROM some_table";
      connection.query(sql, function (err, rows, fields) {
        if (err) {
          connection.release();
          // Handle data access error here
          return;
        }
        if (rows) {
          for (var i = 0; i < rows.length; i++) {
            // Process rows[i].some_data
          }
        }
        connection.release();
        res.end('Process Complete');
      });
    });
  };
};
/* Expose public functions ------ */
exports.ajaxHandlers = ajaxHandlers;
The connection_pool.getConnection method is asynchronous, so when the existing open connection is returned from the pool, or a new connection is opened if need be, then the callback function is called and you can use the connection. Also note the use of connection.release() instead of ending the connection as normal. The release just allows the pool to take back the connection so it can be reused.
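To avoid forgetting a release on one of the code paths, the get/use/release steps can be wrapped in a small helper. The sketch below is not part of node-mysql: `withConnection` is a hypothetical helper, and `stubPool` is a stand-in with the same `getConnection(callback)` / `release()` shape so the example runs without a database.

```javascript
// Run `work` with a pooled connection and always release it afterwards.
// `pool` is any object with a getConnection(callback) method.
function withConnection(pool, work, done) {
  pool.getConnection(function (err, connection) {
    if (err) {
      return done(err); // nothing to release: no connection was obtained
    }
    work(connection, function (workErr, result) {
      connection.release(); // hand the connection back on success and on error
      done(workErr, result);
    });
  });
}

// Stub pool standing in for mysql.createPool(...), so the sketch runs alone.
const stubPool = {
  released: 0,
  getConnection(callback) {
    const self = this;
    callback(null, {
      query(sql, cb) { cb(null, [{ some_data: 1 }]); },
      release() { self.released += 1; },
    });
  },
};

let outcome;
withConnection(
  stubPool,
  (connection, cb) => connection.query("SELECT some_data FROM some_table", cb),
  (err, rows) => { outcome = { err, rows }; }
);
console.log(stubPool.released, outcome.rows.length); // 1 1
```

With a real pool, the same helper would be called with `env_settings.connection_pool` in place of `stubPool`.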
Here is a good way to think about the difference. Take the example of a very simple app that takes requests and returns a data set containing the results. Without connection pooling, every time a request is made, a new connection is opened to the database, the results are returned, and then the connection is closed. If the app gets more requests per second than it can fulfill, then the number of concurrent open transactions increases, since there is more than one connection active at any time. Also, each transaction will take longer because it has to open a new connection to the data server, which is a relatively big step.
With connection pooling, the app will only open new connections when none are available in the pool. So the pool will open a bunch of new connections upon the first few requests, and leave them open. Now when a new request is made, the connection pooling process will grab a connection that is already open and was used before instead of opening a new connection. This will be faster, and there will be fewer active connections to the database under heavy load. Of course, there will be more "waiting" connections open when no one is hitting the server, since they are held in the pool. But that is not usually an issue because the server has plenty of resources available in that case anyway.
So database connection pooling can be used to make your app faster and more scalable. If you have very little traffic, it is not as important, unless you want to return results as quickly as possible. Connection pooling is often part of an overall strategy to decrease latency and improve overall performance.
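The reuse behaviour described above can be sketched with a toy pool. This is purely illustrative and is not how node-mysql implements its pool; `TinyPool` and its fields are invented for the example, and the "connections" are plain objects.

```javascript
// Toy connection pool: illustrative only, not node-mysql's implementation.
class TinyPool {
  constructor(createConnection, limit) {
    this.createConnection = createConnection; // factory for new connections
    this.limit = limit;                       // maximum connections ever open
    this.idle = [];                           // open connections not in use
    this.waiting = [];                        // callbacks waiting for a connection
    this.opened = 0;                          // connections created so far
  }

  // Hand a connection to `callback`, reusing an idle one when possible.
  acquire(callback) {
    if (this.idle.length > 0) {
      callback(this.idle.pop());
    } else if (this.opened < this.limit) {
      this.opened += 1;
      callback(this.createConnection());
    } else {
      this.waiting.push(callback); // pool exhausted: wait for a release
    }
  }

  // Return a connection: pass it to a waiter or keep it idle for reuse.
  release(connection) {
    const next = this.waiting.shift();
    if (next) {
      next(connection);
    } else {
      this.idle.push(connection);
    }
  }
}

let nextId = 0;
const pool = new TinyPool(() => ({ id: ++nextId }), 2);

let first, second, third;
pool.acquire((c) => { first = c; });   // opens connection 1
pool.acquire((c) => { second = c; });  // opens connection 2
pool.acquire((c) => { third = c; });   // limit reached: queued
pool.release(first);                   // queued request gets connection 1
console.log(pool.opened, third.id);    // 2 1
```

Three requests were served with only two connections ever opened, which is the saving the pool is there to provide.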
TL;DR: Vertical or Horizontal scaling for this system design?
I have NGINX running as a load balancer for my application. It distributes across 4 EC2 (t2.micro's cuz I'm cheap) to route traffic and those are all currently hitting one server for my MySQL database (also a t2.micro, totalling 6 separate EC2 instances for the whole system).
I'm thinking about horizontally scaling my database via Source/Replica distribution, and my thought is that I should route all read queries/GET requests (the highest traffic volume I'll get) to the Replicas and all write queries/POST requests to the Source db.
I know that I'll have to programmatically choose which DB my servers point to based on request method, but I'm unsure of how best to approach that or if I'm better off vertically scaling my DB at that point and investing in a larger EC2 instance.
Currently I'm connecting to the Source DB using an express server and it's handling everything. I haven't implemented the Source/Replica configuration just yet because I want to get my server-side planned out first.
Here's the current static connection setup:
const mysql = require('mysql2');
const Promise = require('bluebird');
const connection = mysql.createConnection({
host: '****',
port: 3306,
user: '****',
password: '*****',
database: 'qandapi',
});
const db = Promise.promisifyAll(connection, { multiArgs: true });
db.connectAsync().then(() =>
console.log(`Connected to QandApi as ID ${db.threadId}`)
);
module.exports = db;
What I want to happen is I want to either:
set up an express middleware function that looks at the request method and connects to the appropriate database by creating 2 configuration templates to put into the createConnection function (I'm unsure of how I would make sure it doesn't try to reconnect if a connection already exists, though)
if possible just open two connections simultaneously and route which database takes which method (I'm hopeful this option will work so that I can make things simpler)
Is this feasible? Am I going to see worse performance doing this than if I just vertically scaled my EC2 to something with more vCPUs?
Please let me know if any additional info is needed.
Simultaneous MySQL Database Connection
I would be hesitant to use any client input to connect to a server, but I understand how this could be something you would need to do in some scenarios. The simplest and quickest way around this issue would be to create a second database connection file. To make this dynamic, you can require the module based on conditions in your code, so it is loaded and its promise awaited only at certain points, when certain conditions are met. This approach can be risky and means requiring modules in the middle of your code, so it isn't ideal, but it can get the job done. Example:
const dbConnection = require("../utils/dbConnection");
// Inside an async function; useControlledDb is a placeholder for your own condition
if (useControlledDb) {
  const controlledDBConnection = require("../utils/controlledDBConnection");
  const [rows] = await controlledDBConnection.execute("SELECT * FROM `foo`;");
}
Using more files could take a little more space and could slow the code down slightly while it waits for a new promise, but the overall effect will be minimal. controlledDBConnection.js would just be a near duplicate of dbConnection.js with slightly different parameters depending on your needs.
Another path you can take if you want to avoid using multiple files is to export a module with a dynamically set variable from your controller file, and then import it into a standard connection file. This would allow you to change up your connection without rewriting a duplicate, but you will need diligent error checks and a default.
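For the read/write split in the question, one possible shape is a selector that picks a connection from the request method. This is a sketch under assumptions: `createDbSelector` is a hypothetical helper, the connection objects are stubs, and treating GET/HEAD as the only read methods is a simplification. Replicas can also lag behind the source, so a read issued right after a write may return stale data.

```javascript
// Methods that only read data can go to a replica; everything else
// must go to the source database.
const READ_METHODS = new Set(["GET", "HEAD"]);

// Build a selector over one source connection and a list of replicas.
// The connection objects are opaque here; in a real app they would be
// created once at startup and reused.
function createDbSelector(source, replicas) {
  let next = 0; // round-robin cursor over the replicas
  return function selectDb(method) {
    const isRead = READ_METHODS.has(String(method).toUpperCase());
    if (!isRead || replicas.length === 0) {
      return source; // writes, or no replica configured: use the default
    }
    const chosen = replicas[next % replicas.length];
    next += 1;
    return chosen;
  };
}

// Stub objects stand in for real connections.
const selectDb = createDbSelector({ name: "source" }, [
  { name: "replica-1" },
  { name: "replica-2" },
]);

console.log(selectDb("GET").name);  // replica-1
console.log(selectDb("POST").name); // source
console.log(selectDb("get").name);  // replica-2
```

In an express app, a middleware could call `selectDb(req.method)` and attach the result to the request, so both connections stay open and nothing is reconnected per request.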
Info on modules in JS : https://javascript.info/import-export
Some other points
Use environment variables for your database information (host, etc.), since this lets you change the database settings all in one place, while also allowing you to include your .env file in .gitignore if you are using GitHub
Here is another great Stack Overflow question/answer that might help with setting up a dynamic connection file : How to create dynamically database connection in Node.js?
How to set up .env files : https://nodejs.dev/learn/how-to-read-environment-variables-from-nodejs
How to set up .gitignore : https://stackabuse.com/git-ignore-files-with-gitignore/
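The environment-variable suggestion above could look roughly like this. The variable names (`DB_HOST`, `DB_PORT`, and so on) are illustrative choices for this sketch, not names required by any library, and `dbConfigFromEnv` is a hypothetical helper.

```javascript
// Build connection settings from environment variables, with local defaults
// for the values that are safe to default.
function dbConfigFromEnv(env) {
  return {
    host: env.DB_HOST || "localhost",
    port: Number(env.DB_PORT || 3306),
    user: env.DB_USER,         // no default: must be provided
    password: env.DB_PASSWORD, // no default: must be provided
    database: env.DB_NAME,
  };
}

const config = dbConfigFromEnv(process.env);
console.log(config.host, config.port);
```

The resulting object has the same fields as the hard-coded settings in the question, so it could be passed to createConnection in their place.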
I have used knex.js (with slave and master connections) in Node to connect to MySQL.
My code works fine for 150 users, but when the number of concurrent users increases, the heap usage in PM2 reaches 100% or above and the server stops responding or responds very slowly.
The AWS CPU usage (5x large AWS EC2 instance with an 8-core CPU and 32GB RAM) is 12-20%.
The application responds in milliseconds with JWT authentication and PM2.
There are a lot of queries that depend on the result of a previous query, so I made all functions async.
Problematic point:
Every hour, a slot opens for all users (approx. 150,000) to edit their content (approx. 3k live concurrent users).
Temporary solution that i did:
Implementing cluster mode resolved my problem, but when a cluster worker is killed, its DB connection is not closed.
My doubt:
When the application works fine with cluster, why does it not work without cluster in the same configuration?
async function authorize(req, res) {
//decode aes-128 request
let body = await helper.get_decoded_request(req);
if (!body) {
return res.status(200).json(await helper.error("Error occurs while decoding."));
}
const schema = joi.object({
"client_id": joi.string().min(1).required(),
"client_secret": joi.string().min(1).required()
});
try {
await schema.validateAsync(body);
}
catch (err) {
return res.status(200).json(await helper.error("Please send the valid request"));
}
try {
// database update and add functions that are written in another file
await user.update_user(data);
await match.add_transaction(data);
} catch (err) {
return res.status(200).json(await helper.error("Please send the valid request"));
}
return res.status(200).json(await helper.success("Token added successfully.", jsontoken));
}
Model file code:
const { read_db, write_db } = require("./database");
async function add_transaction(data) {
return await write_db.insert(data).table("users");
}
Database file:
var knex = require("knex");
const read_db = knex({
client: "mysql",
connection: {
database: process.env.SLAVE_MYSQL_DB,
host: process.env.SLAVE_MYSQL_HOST,
port: process.env.SLAVE_MYSQL_PORT,
user: process.env.SLAVE_MYSQL_USER,
password: process.env.MYSQL_PASSWORD
},
pool: { min: 1, max: 10 }
});
const write_db = knex({
client: "mysql",
connection: {
database: process.env.MYSQL_DB,
host: process.env.MYSQL_HOST,
port: process.env.MYSQL_PORT,
user: process.env.MYSQL_USER,
password: process.env.MYSQL_PASSWORD
},
pool: { min: 1, max: 10 }
});
module.exports = { read_db, write_db };
There is not enough information to give any answers about the root cause of the problem, but maybe these general steps on how to start solving it will help.
The first and most important step is to reproduce the slowdown locally and run your app under a profiler to find out which parts of the code are taking all that CPU. Running your app with node --inspect lets you profile its execution in the Chrome browser's profiler.
It also sounds like you are running just a single node process, so it is pretty strange that you are able to use 100% CPU from all 8 cores with a quite low user count (less than 1000?)... One thing that comes to mind: if you check, for example, a password hash on every user request, that can start to use lots of CPU very fast.
If that indeed is the problem, you could for example add JSON Web Token support to your app, where the password is only used to get the access token, which is then really fast to verify in comparison to password authentication.
Once your basic app is working fine, you should start thinking about how to run multiple server processes on a single instance to utilize all the CPU cores better.
EDIT:
Sounds like that 12-20% of an 8-core instance is about 100% of a single core. So your app seems to be saturating one CPU core.
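The arithmetic behind that estimate, taking the 8 cores stated in the question: a single Node process runs its JavaScript on one thread, so one fully busy core shows up as only a fraction of the instance-wide CPU figure. `singleCoreShare` is a name made up for this sketch.

```javascript
// Share of total CPU (in percent) that one fully busy core represents.
function singleCoreShare(coreCount) {
  return 100 / coreCount;
}

console.log(singleCoreShare(8)); // 12.5
```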
As for the heap usage: if you profile the application, you can see whether it is garbage collection that starts to stall it.
One reason could be that you are creating too much garbage, which at some point starts to hurt performance. You could also check whether increasing the default heap size from 1.7GB to 8GB (node --max-old-space-size=8192) helps a bit.
In addition to garbage collection, if you are returning, for example, tens of thousands of rows from the DB, just parsing them into JavaScript objects can saturate the CPU. So profiling is still the right way to find out what is going on. Parsing dates in particular is really slow in knex by default (when they are converted to Date objects).
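If profiling does point at date parsing, one thing to try, assuming the underlying `mysql` driver, is its `dateStrings` connection option, which returns date columns as strings. The fragment below shows only the settings object that would go under knex's `connection` key; the environment variable names are the ones from the question. Whether this helps in this particular case is something only the profiler can confirm.

```javascript
// Connection settings for the mysql driver (passed as knex's `connection`).
const connectionSettings = {
  database: process.env.MYSQL_DB,
  host: process.env.MYSQL_HOST,
  port: process.env.MYSQL_PORT,
  user: process.env.MYSQL_USER,
  password: process.env.MYSQL_PASSWORD,
  // mysql driver option: return DATE/DATETIME/TIMESTAMP columns as strings
  // instead of converting each value to a JavaScript Date object.
  dateStrings: true,
};

console.log(connectionSettings.dateStrings); // true
```

Code that relies on receiving Date objects would then need to parse the strings itself where it actually needs them.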
I have a Sparkjava app which I have deployed on a Tomcat server. It uses SQL2O to interface with the MySQL-database. After some time I start to have trouble connecting to the database. I've tried connecting directly from SQL2O, connecting through HikariCP and connecting through JNDI. They all work for about a day, before I start getting Communications link failure. This app gets hit a handful of times a day at best, so performance is a complete non issue. I want to configure the app to use one database connection per request. How do I go about that?
The app doesn't come online again afterwards until I redeploy it (overwrite ROOT.war again). Restarting tomcat or the entire server does nothing.
Currently every request creates a new Sql2o object and executes the query using withConnection. I'd be highly surprised if I was leaking any connections.
Here's some example code (simplified).
public class UserRepository {
static {
try {
Class.forName("com.mysql.jdbc.Driver");
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
}
protected Sql2o sql2o = new Sql2o("jdbc:mysql://mysql.server.name/dbname?serverTimezone=UTC", "username", "password");
public List<User> getUsers() {
return sql2o.withConnection((c, o) -> {
return c.createQuery(
"SELECT\n" +
" id,\n" +
" name\n" +
"FROM users"
)
.executeAndFetch(User.class);
});
}
}
public class Main {
public static void main(String[] args) {
val gson = new Gson();
port(8080);
get("/users", (req, res) -> {
return new UserRepository().getUsers();
}, gson::toJson);
}
}
If you rely on Tomcat to provide the connection to you: It's coming from a pool. Just go with plain old JDBC and open that connection yourself (and make sure to close it as well) if you don't like that.
So much for the answer to your question, to the letter. Now for the spirit: There's nothing wrong with connections coming from a pool. In all cases, it's your responsibility to handle it properly: Get access to a connection and free it up (close) when you're done with it. It doesn't make a difference if the connection is coming from a pool or has been created manually.
As you say performance is not an issue: Note that the creation of a connection may take some time, so even if the computer is largely idle, creating a new connection per request may have a notable effect on the performance. Your server won't overheat, but it might add a second or two to the request turnaround time.
Check configurations for your pool - e.g. validationQuery (to detect communication failures) or limits for use per connection. And make sure that you don't run into those issues because of bugs in your code. You'll need to handle communication errors anyways. And, again, that handling doesn't differ whether you use pools or not.
Edit: And finally: Are you extra extra sure that there indeed is no communication link failure? Like: Database or router unplugged every night to connect the vacuum cleaner? (no pun intended), Firewall dropping/resetting connections etc?
I am using express 4.x, and the latest MySQL package for node.
The pattern for a PHP application (which I am most familiar with) is to have some sort of database connection common file that gets included and the connection is automatically closed upon the completion of the script. When implementing it in an express app, it might look something like this:
// includes and such
// ...
var db = require('./lib/db');
app.use(db({
host: 'localhost',
user: 'root',
pass: '',
dbname: 'testdb'
}));
app.get('/', function (req, res) {
req.db.query('SELECT * FROM users', function (err, users) {
res.render('home', {
users: users
});
});
});
Excuse the lack of error handling; this is a primitive example. In any case, my db() function returns middleware that will connect to the database and store the connection object on req.db, effectively giving a new connection to each request. There are a few problems with this method:
This does not scale at all; database connections (which are expensive) are going to scale linearly with fairly inexpensive requests.
Database connections are not closed automatically and will kill the application if an uncaught error trickles up. You have to either catch it and reconnect (feels like an antipattern) or write more middleware that EVERYTHING must call prior to output to ensure the connection is closed (anti-DRY, arguably)
The next pattern I've seen is to simply open one connection as the app starts.
var mysql = require('mysql');
var connection = mysql.createConnection(config);
connection.on('connect', function () {
// start app.js here
});
Problems with this:
Still does not scale. One connection will easily get clogged with more than just 10-20 requests on my production boxes (1gb-2gb RAM, 3.0ghz quad CPU).
Connections will still time out after a while, so I have to provide an error handler to catch that and reconnect - very kludgy.
My question is, what kind of approach should be taken to handling database connections in an express app? It needs to scale (not infinitely, just within reason), I should not have to manually close connections in the route or include extra middleware for every path, and I would (preferably) not have to catch timeout errors and reopen connections.
Since you're talking about MySQL in NodeJS, I have to point you to KnexJS! You'll find writing queries is much more fun. The other thing they use is connection pooling, which should solve your problem. It's using a little package called generic-pool-redux which manages things like DB connections.
The idea is that your express app accesses the DB through one place in the code. That code, as it turns out, is using a connection pool to share the load among connections. I initialize mine something like this:
var Knex = require('knex');
Knex.knex = Knex({...}); //set options for DB
In other files
var knex = require('knex').knex;
Now all files that could access the DB are using the same connection pool (set up once at start).
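The "set up once, share everywhere" idea can be made explicit with a lazy singleton. This is a generic sketch, not KnexJS code: `getSharedPool` is a hypothetical helper and `fakeCreatePool` stands in for whatever actually builds the pool.

```javascript
// Holds the single pool instance for the whole process.
let sharedPool = null;

// Create the pool on first use, then hand back the same instance every time.
function getSharedPool(createPool) {
  if (sharedPool === null) {
    sharedPool = createPool();
  }
  return sharedPool;
}

// Stub factory that counts how many pools were actually created.
let created = 0;
const fakeCreatePool = () => ({ id: ++created });

const a = getSharedPool(fakeCreatePool);
const b = getSharedPool(fakeCreatePool);
console.log(a === b, created); // true 1
```

Node's module cache gives the same effect when the pool is created at the top level of a module and exported, since the module body runs only once per process.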
I'm sure there are other connection pool packages out there for Node and MySQL, but I personally recommend KnexJS if you're doing any dynamic or complex SQL queries. Good luck!
What is considered best practice for handling and managing connections when building an API or web application with Node.js that depends on MySQL (or in my case, MariaDB)?
Per the documentation for node-mysql, there seem to be two methods to use:
var connection = mysql.createConnection({...});
app.get("/", function(req, res) {
connection.query("SELECT * FROM ....", function(error, result) {
res.json(result);
});
});
-- or --
var pool = mysql.createPool({...});
app.get("/", function(req, res) {
pool.getConnection(function(error, connection) {
if (error) {
console.log("Error getting new connection from pool");
} else {
connection.query("SELECT * FROM ....", function(error, result) {
connection.release();
res.json(result);
});
}
});
});
To me, it makes the most sense to use the second option, as it should use as many connections as are needed, as opposed to relying on a single connection. However, I have experienced problems using a pool with multiple routes, i.e., each route gets a new connection from the pool, executes a query, and releases it back into the pool. Each time I get a connection from the pool, use it, and release it, it seems there is still a process in MySQL waiting for another request. Eventually, these processes build up in MySQL (visible by running SHOW PROCESSLIST) and the application is no longer able to retrieve a connection from the pool.
I have resorted to using the first method because it works and my application doesn't crash, but it doesn't seem like a robust solution. However, node-mariasql looks promising, but I can't tell if that will be any better than what I am currently using.
My question is: what is the best way to handle/structure MySQL connections when building an API or web application that relies heavily on SQL queries on almost every request?
Changing connection.release() to connection.destroy() solved my issue. I'm not sure what the former is supposed to do, but the latter behaves as expected and actually removes the connection. This means once a connection is done being used, it kills the MySQL process and creates another when needed. This also means that many queries can hit the API simultaneously, and slow queries will not block new ones.
Better late than never.
connection.destroy() would mean that on each impression you are making a new connection to mySQL, instead of just grabbing an idle connection and querying on that which would have less overhead. Basically you are not using the pool anymore.
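The cost difference can be seen in a small simulation that counts how many connections are opened for a run of requests handled one after another. It is a simplified model, not node-mysql code: `connectionsOpened` and the two strategy names are invented for this sketch.

```javascript
// Count how many connections get opened for `requestCount` sequential
// requests when each request ends with "release" or with "destroy".
function connectionsOpened(requestCount, strategy) {
  let opened = 0; // connections created in total
  let idle = 0;   // connections sitting in the pool, ready for reuse
  for (let i = 0; i < requestCount; i++) {
    if (idle > 0) {
      idle -= 1;   // reuse an idle connection
    } else {
      opened += 1; // nothing idle: open a new connection
    }
    if (strategy === "release") {
      idle += 1;   // released connections return to the pool
    }
    // "destroy": the connection is closed, so nothing returns to the pool
  }
  return opened;
}

console.log(connectionsOpened(100, "release")); // 1
console.log(connectionsOpened(100, "destroy")); // 100
```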
It's possible your MySQL user was limited in the number of connections it could open, or that your queries were completing more slowly than impressions were coming into your server.
You can try tweaking the connectionLimit parameter to something higher, so your server can handle more connections simultaneously.
var pool = mysql.createPool({
connectionLimit : 10,
host : 'example.org',
user : 'bob',
password : 'secret'
});