What is a good configuration for a distributed Spring Boot system with 36 downloaders through SSH tunnels (MySQL)?

I've created a Java Spring Boot application that launches 36 downloader droplets on DigitalOcean, each of which opens an SSH tunnel to a CPU-Optimized database droplet and downloads from an API into the database.
I've configured HikariCP as follows, towards fewer pooled connections, on the assumption that the database might struggle with too many and that they might not be needed.
spring.datasource.hikari.maximumPoolSize=5
spring.datasource.hikari.connectionTimeout=200000
spring.datasource.hikari.maxLifetime=1800000
spring.datasource.hikari.validationTimeout=100000
I'm wondering whether those settings are recommended or not, and why. I've reduced maximumPoolSize to 5, but I haven't found much information on whether that is considered too small for a Spring Boot application to run effectively.
Given that each downloader stores data in the database sequentially, do I need more than a few pooled connections on each downloader?
I've set max_connections in MySQL to 250 and the maximum SSH connections on the database server to 200. I note that 114 sshd processes are created on the server. Can a server handle that many SSH tunnel connections?
Do you foresee any problems with this kind of distributed setup with Spring Boot? One thing I had to do before adjusting to these settings was to place connection-retry code around each database operation to prevent disconnection errors; a rough sketch of the kind of retry wrapper I mean is below.
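This is a simplified sketch only, assuming Spring Retry (spring-retry) is on the classpath and @EnableRetry is configured elsewhere; the table and method names are placeholders:

import org.springframework.dao.DataAccessResourceFailureException;
import org.springframework.dao.TransientDataAccessException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;

@Service
public class DownloadWriter {

    private final JdbcTemplate jdbcTemplate;

    public DownloadWriter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Retry transient connection problems (e.g. a dropped SSH tunnel) a few
    // times with a short backoff instead of failing the whole download batch.
    @Retryable(
            value = {TransientDataAccessException.class, DataAccessResourceFailureException.class},
            maxAttempts = 3,
            backoff = @Backoff(delay = 2000))
    public void saveRecord(String id, String payload) {
        jdbcTemplate.update("INSERT INTO downloads (id, payload) VALUES (?, ?)", id, payload);
    }
}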
Thanks
Conteh

Related

Does RDS Proxy affect current application-side pooling?

I have a SaaS application on AWS ECS and databases on AWS RDS. We are planning to implement AWS RDS Proxy for connection pooling. The RDS Proxy documentation says we don't need to make any changes to the application code. Currently we are using application-side connection pooling. When we put RDS Proxy in front of the database, does the existing pooling have any impact?
Do we need to remove the application-side pooling to work with RDS Proxy effectively?
My main concern is: if I allow 100% pooling in RDS Proxy and limit the application-side pool to, say, 100 max connections, will that be a bottleneck?
TLDR: keep the connection pool in your application, and size it to the number of connections required by that one instance of your application (e.g. the ECS task or EKS pod).
With a database proxy in the middle, there are two separate legs to a "connection":
First, there is a connection from the application to the proxy. What you called the "application side pooling" is this type of connection. Since there's still overhead associated with creating a new instance of this type of connection, continuing to use a connection pool in your application probably is a good idea.
Second, there is a connection from the proxy to the database. These connections are managed by the proxy. The number of connections of this type is controlled by a proxy configuration. If you set this configuration to 100%, then you're allowing the proxy to use up to the database's max_connections value, and other clients may be starved for connections.
So, when your application wants to use a connection, it needs to get a connection from its local pool. Then, the proxy needs to pair that with a connection to the database. The proxy will reuse connections to the database where possible (this technique also is called multiplexing). Or, quoting the official docs: "You can open many simultaneous connections to the proxy, and the proxy keeps a smaller number of connections open to the DB instance or cluster. Doing so further minimizes the memory overhead for connections on the database server. This technique also reduces the chance of "too many connections" errors."
As your container orchestrator (e.g. ECS or EKS) scales your application horizontally, your application will open/close connections to the proxy, but the proxy will prevent your database from becoming overwhelmed by these changes.
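As a rough illustration of that sizing advice (the proxy endpoint, credentials, and pool sizes below are placeholders, assuming HikariCP as the application-side pool):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public final class ProxyPooledDataSource {

    public static HikariDataSource create() {
        HikariConfig config = new HikariConfig();
        // Point the application at the RDS Proxy endpoint, not at the DB instance itself.
        config.setJdbcUrl("jdbc:mysql://my-app-proxy.proxy-abc123.us-east-1.rds.amazonaws.com:3306/mydb");
        config.setUsername("app_user");
        config.setPassword(System.getenv("DB_PASSWORD"));
        // Size the pool for ONE ECS task / EKS pod; the proxy multiplexes these
        // client connections onto a smaller set of real database connections.
        config.setMaximumPoolSize(10);
        config.setMinimumIdle(2);
        return new HikariDataSource(config);
    }
}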

Are an AWS RDS MySQL database username and password sufficient for commercial security?

I'm new to cloud computing so this might be an obvious question. I have a desktop Java application that will connect to an AWS RDS MySQL database using JDBC. Is using the endpoint, username and password for the database the preferred commercial way of connecting to the database?
To encrypt communication I plan to use SSL.
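Roughly what I have in mind is something like this (a sketch only; the endpoint, schema, and credentials are placeholders, and sslMode=VERIFY_IDENTITY assumes MySQL Connector/J 8.x with the RDS CA certificate in the truststore):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public final class RdsSslConnect {

    public static Connection open() throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "app_user");
        props.setProperty("password", System.getenv("DB_PASSWORD"));
        // Require TLS and verify the server certificate and hostname.
        props.setProperty("sslMode", "VERIFY_IDENTITY");

        return DriverManager.getConnection(
                "jdbc:mysql://mydb.abc123xyz.us-east-1.rds.amazonaws.com:3306/mydb", props);
    }
}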
You could open your database instance to the outside, using regular credentials. But, a safer way to proceed might be to create an endpoint in AWS, possibly running in Java, which would expose one or more APIs which in turn would hit the MySQL database running in RDS. That is, you would not expose the RDS instance to the outside world directly, but only internally to this API, also running in AWS. Then, your desktop Java application would talk to this intermediary application when it needs to access the database.
The advantage of this suggestion is that it lessens the risk of your RDS instance being attacked via something like a DoS. Of course, the API you create on top of the database could also be attacked. But Java web applications running in a container (and similar applications in other languages) were designed to be exposed to the outside world; database instances much less so.
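A very rough sketch of such an intermediary, assuming a Spring Boot service inside AWS (the endpoint path, query, and table are made up for illustration); the desktop client would call this API over HTTPS instead of connecting to RDS directly:

import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class OrdersApi {

    private final JdbcTemplate jdbcTemplate;

    public OrdersApi(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Only this API is exposed to the outside; the RDS instance stays
    // reachable from inside the VPC only.
    @GetMapping("/customers/{id}/orders")
    public List<Map<String, Object>> orders(@PathVariable long id) {
        return jdbcTemplate.queryForList(
                "SELECT order_id, total, created_at FROM orders WHERE customer_id = ?", id);
    }
}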

How to secure a MySQL connection over the network?

I'm running Tomcat 7/MySQL 5.6 on CentOS 6. It's time to move the database to a separate server. What is the best approach to securing the connection between Tomcat and the backend MySQL server? The environment is virtualized and I don't want to run the connection in the open over a shared network.
I'm thinking of tunneling through SSH. SSL seems like a lot of work. But what's the "recommended" approach?
You're right to be careful about sending traffic over an open network. The MySQL protocol by default is not encrypted at all, so if someone can capture packets on your network, then they can see all your data.
I prefer using either an SSH tunnel or a VPN connection; I just find them easier to configure.
My colleague Ernie Souhrada at Percona posted a couple of really good blog articles about the efficiency of using an SSH tunnel versus using MySQL client options to connect via SSL and bearing the overhead of handshaking on every connection.
http://www.mysqlperformanceblog.com/2013/10/10/mysql-ssl-performance-overhead/
http://www.mysqlperformanceblog.com/2013/11/18/mysql-encryption-performance-revisited/
The performance impact of the SSL handshake that Ernie reports won't be quite as much of an issue in a Tomcat environment, since you would typically have a connection pool, and therefore new connections would be made less frequently.
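If you go the SSH tunnel route, here is a minimal sketch of opening the tunnel from Java with the JSch library (hostnames, ports, paths, and credentials are placeholders; it assumes key-based auth and a populated known_hosts file), after which JDBC talks to the local end of the tunnel:

import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;
import java.sql.Connection;
import java.sql.DriverManager;

public final class TunneledMySql {

    public static Connection open() throws Exception {
        JSch jsch = new JSch();
        jsch.addIdentity("/home/tomcat/.ssh/id_rsa");
        jsch.setKnownHosts("/home/tomcat/.ssh/known_hosts");

        Session session = jsch.getSession("tunneluser", "db.internal.example.com", 22);
        session.connect();

        // Forward local port 3307 to MySQL (3306) on the database host.
        session.setPortForwardingL(3307, "127.0.0.1", 3306);

        // JDBC now connects through the encrypted tunnel.
        return DriverManager.getConnection(
                "jdbc:mysql://127.0.0.1:3307/mydb", "app_user", System.getenv("DB_PASSWORD"));
    }
}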

Pooling connections to a MySQL database server from a Django application server

I am designing the backend of my iOS application. The backend has a separate database and application server, running MySQL and Django respectively on different machines. So far I have connected my application server to my database server in a simple way: I changed the database host in the application server settings to point to the remote database server, and created a new remote host entry in the database server configuration allowing the application server to access the database. Everything works fine, and I had decided to go with this setup for production. Then, while reading the Instagram engineering blog, I saw them mention 'PgBouncer' for pooling connections to their PostgreSQL database server. What is the need for something like this? Is it only about performance, or is it a production-friendly approach for communication between the database and the application server? Is my general approach too amateur?
Your approach is not amateur at all. The purpose of a bouncer in your case would be to eliminate the connection time incurred on each request Django handles. For example, on Heroku, which is hosted on AWS servers, this could eat up 40-50 ms of each request.
Also, if you had a master/slave setup or something like that, a connection pooler could provide you with failover functionality as well (just as an example).

Apache HttpClient doesn't allow more than 1500 reusable connections

I'm using Apache HttpClient (4.2.2) / Java 7 to open many reusable connections to a Tomcat 7 server (to simulate many users repeatedly hitting the service). Both client and server run Ubuntu 12 (on different machines). I made sure that sysctl.conf and limits.conf allow this scenario.
This works well up to about 1500 simulated users / connections. The connections get reused as expected. Somewhere between 1500 and 1600 simulated users, however, connections are no longer reused and are closed and re-opened all the time. Why might this be the case?
I don't think the problem is on the server side as when I start multiple simulation clients on different machines against the same server, the server has no problems reusing the connections as long as each client doesn't go beyond 1500 connections.
There can be various reasons why connections are no longer being re-used, depending on the configuration of the connection manager or the server-side configuration. The easiest way to find out is to run HttpClient with context logging turned on, as described in the 'context logging for connection management / request execution' example in the Logging Guide.
You might need to increase the number of available workers; at the very least, check whether there are free workers when you run out of connections by looking at server-status.
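On the client side, if the connection manager's own limits turn out to be the cap, they can be raised explicitly. A sketch assuming HttpClient 4.2.x (2000 is just an example value):

import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.PoolingClientConnectionManager;

public final class LoadTestClientFactory {

    public static HttpClient create() {
        PoolingClientConnectionManager cm = new PoolingClientConnectionManager();
        // Allow more pooled connections than the number of simulated users,
        // both in total and for the single route to the Tomcat server.
        cm.setMaxTotal(2000);
        cm.setDefaultMaxPerRoute(2000);
        return new DefaultHttpClient(cm);
    }
}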