I am currently connecting my EC2 server to RDS via the following:
self.conn = MySQLdb.connect(
    host=settings.DATABASES['default']['HOST'],
    port=3306,
    user=settings.DATABASES['default']['USER'],
    passwd=settings.DATABASES['default']['PASSWORD'],
    db=settings.DATABASES['default']['NAME'],
)
This connects via TCP and is much, much slower for me than connecting to MySQL through a socket on my own machine. How would I connect an EC2 instance to an RDS database via a socket connection, so that it is much faster than TCP/IP for long-running scripts? (The difference for me is that an update script takes 10 hours instead of one.)
Short answer: You can't.
Aside: all connections to MySQL on a Linux server use "sockets," of course, whether they are Internet (TCP) sockets, or IPC/Unix Domain sockets. But in this question, as in common MySQL parlance, "socket" refers to an IPC socket connection, using a special file, such as /tmp/mysql.sock, though the specific path to the socket file varies by Linux distribution.
A Unix domain socket or IPC socket (inter-process communication socket) is a data communications endpoint for exchanging data between processes executing within the same host operating system.
https://en.m.wikipedia.org/wiki/Unix_domain_socket
So, you can't use the MySQL "socket" connection mechanism, because the RDS server is not on the same machine. The same holds true, of course, any time the MySQL server is on a different machine.
On a local machine, the performance difference between an IPC socket connection and a TCP socket connection (from/to the same machine) is negligible. There is no disagreement that TCP connections have more overhead than IPC, simply because of the TCP/IP wrapper and checksums, the three-way handshake, etc., but again, these are tiny fractions of a millisecond of difference that will be entirely lost on the casual observer.
To conclude that TCP connections are "slower" than IPC connections, and particularly by a factor of 10, is not correct. The quotes around "slower" reflect my conclusion that you have not yet defined "slower" with sufficient precision: Slow to connect? Slow to transfer large amounts of data (bandwidth/throughput issue)? Slower to return from each query?
Take note of the Fallacies of Distributed Computing, particularly this one:
Latency is zero.
I suspect your primary performance issue is going to be found in the fact that your code is not optimal for non-zero latency. The latency between systems in EC2 (including RDS) within a region should be under 1 millisecond, but that's still many hundreds of times the round-trip latency on a local machine (which is not technically zero but could easily be just a handful of microseconds).
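To see how sub-millisecond latency can still dominate a long-running script, here is a back-of-the-envelope sketch. The statement count, the two latency figures and the batch size are made-up illustrative numbers, not measurements from the question, and the function name is hypothetical.

```python
def round_trip_hours(statements, latency_seconds):
    """Hours spent purely waiting on the network, at one round trip per statement."""
    return statements * latency_seconds / 3600.0


# Hypothetical workload: 20 million single-row statements.
ROWS = 20_000_000

local_ipc = round_trip_hours(ROWS, 0.000005)      # assumed ~5 microseconds per round trip
ec2_to_rds = round_trip_hours(ROWS, 0.0005)       # assumed ~0.5 ms per round trip
batched = round_trip_hours(ROWS // 1000, 0.0005)  # same rows, 1000 rows per statement

print(f"local: {local_ipc:.3f} h, remote: {ec2_to_rds:.2f} h, batched: {batched:.4f} h")
```

Under these assumptions the waiting time grows from about 100 seconds locally to about 10,000 seconds (roughly 2.8 hours) remotely, while sending the same rows in batches of 1000 brings it back down to about 10 seconds. The server does the same work in every case; only the number of round trips changes.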
Testing your code locally, using a TCP connection (using the host 127.0.0.1 and port 3306) instead of the IPC socket should illustrate whether there's really a significant difference or whether the problem is somewhere else... possibly inefficient use of the connections, or unnecessarily repeated disconnect/reconnect, though it's difficult to speculate further without a clearer understanding of what you mean by "slow."
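One way to run that comparison is to time the same trivial query over each kind of connection. The helper below is only a sketch: mean_latency is a hypothetical name, and the no-op lambda is a stand-in so the snippet runs without a database. In a real test you would pass a callable that executes something like SELECT 1 on a cursor, once for a connection opened with host 127.0.0.1 and port 3306, and once for a connection opened through the socket file.

```python
import time


def mean_latency(run_query, repetitions=100):
    """Return the average wall-clock seconds per call of run_query()."""
    start = time.perf_counter()
    for _ in range(repetitions):
        run_query()
    return (time.perf_counter() - start) / repetitions


# Stand-in callable so the sketch is runnable without a database;
# replace it with a function that runs one query on an open connection.
per_call = mean_latency(lambda: None, repetitions=1000)
print(f"{per_call * 1e6:.2f} microseconds per call")
```

If the TCP and socket figures are close on the local machine, the tenfold slowdown is more likely down to network latency multiplied by the number of round trips than to TCP itself.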
Related
I have an RDS instance hosting a MySQL database. The instance size is db.t2.micro.
I also have an ExpressJS backend connecting to the MySQL RDS instance via a connection pool:
Additionally, I have a mobile app, the client, feeding off the ExpressJS API.
The issue I'm facing is that, either via the mobile app or via Postman, there are times when I get a 'Too many connections' error and therefore several requests fail:
On the RDS instance, under current activity, I sometimes see 65 connections, which shows it is reaching the limit. What I need clarity on is:
When 200 mobile app instances connect to the API, and through it to the RDS instance, does that register as 200 connections or as 1 connection from ExpressJS?
Is it normal to be reaching the RDS instance's 65-connection limit?
Is this just a matter of me using the db.t2.micro instance size, which is not recommended for prod? Will upgrading the instance size resolve this issue?
Is there something I'm doing wrong with my requests?
Thank you and your feedback is appreciated.
If your app creates a connection pool of 100, that's the number of database connections it will try to open. It must be lower than your MySQL connection limit.
Typically connection pools open all the connections for the pool, so they are ready when a client calls the http API. The connections might normally be running no SQL queries, if there are not many clients using the API at a given moment. The database connections are nevertheless connected.
Sort of like when you ssh to a remote linux server but you just sit there at a shell prompt for a while before running any command. You're still connected.
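The arithmetic behind "the pool must be lower than the connection limit" can be sketched as follows. The function name and the idea of reserving a few connections for admin and monitoring sessions are assumptions added for illustration; the 65-connection limit is the figure from the question.

```python
def max_pool_size(max_connections, app_processes, reserved=5):
    """Largest per-process pool size that keeps the total under the server limit.

    `reserved` is an assumed safety margin for admin or monitoring sessions.
    """
    usable = max_connections - reserved
    if usable < app_processes:
        raise ValueError("not enough connections for even one per process")
    return usable // app_processes


# With the 65-connection limit from the question and two API processes:
print(max_pool_size(65, app_processes=2))
```

Each API process opens its own pool, so the number that matters to the database is pool size times the number of processes, not the number of mobile clients.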
You asked if a db.t2.micro instance was not recommended for production. Yes, I would agree with that. It's tempting to use the smallest instance possible to save money, but a db.t2.micro is too small for anything but light testing, in my opinion.
In fact, I would not use any t2 instance for production, regardless of size. The t2 type uses "burstable" performance. This means it can provide only brief periods of good performance. Once the instance depletes its performance credits, they recharge slowly, and while they recharge, the performance of that instance is very low. This is okay for testing, but not for production, if you expect to provide consistent performance at any time.
I have a database that thousands of users need to connect to (via ODBC) for very brief periods. It's a subscription licensing database for a win32 desktop app: they connect, get their approval to run, and disconnect.
max_connections is set to 1000, but I am not seeing the re-use I would expect server side. That is, the server currently has about 800 processes/connections sleeping (and another 200 connected to real data in other databases on the same server), yet a new attempt by a client app was rejected with 'too many connections'.
What am I missing?
I have increased max_connections to 1500 for now, but if that just means another 500 sleeping connections, it's not a long-term solution. I'm pretty sure clients are disconnecting properly, but I am adding some diagnostics to the win32 app just in case.
MariaDB 10.3.11
with MySQL ODBC 5.3 ANSI Driver
It's normal to see a lot of sessions "Sleeping". That means the client is connected, but not executing a query at this moment. The client is likely doing other tasks, before or after running an SQL query. Just like if you are logged into a server with ssh, most of the time you're just sitting at the shell prompt not running any program.
It's up to you to design your clients to wait to connect until they need data, then disconnect promptly after getting their data. It's pretty common in apps that they connect to the database at startup, and remain connected. It's also pretty common in some frameworks to make multiple connections at startup, and treat them as a pool that can be used by multiple threads of the client app. It's your app, so you should configure this as needed.
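As a rough sketch of the "connect late, disconnect promptly" pattern, the snippet below opens a connection only for the duration of one lookup. It uses Python's built-in sqlite3 module purely as a runnable stand-in for an ODBC or MySQL driver; the function name, the query and the list of approved IDs are hypothetical placeholders for a real licence lookup, and other drivers may use a different placeholder style than "?".

```python
import sqlite3
from contextlib import closing


def check_license(connect, user_id):
    """Open a connection, run one query, and close it before returning."""
    with closing(connect()) as conn:
        cur = conn.cursor()
        # Placeholder query; a real app would look the user up in a table.
        cur.execute("SELECT ? IN (1, 2, 3)", (user_id,))
        (approved,) = cur.fetchone()
    # The connection is already closed here, so nothing is left sleeping.
    return bool(approved)


print(check_license(lambda: sqlite3.connect(":memory:"), 2))
```

The point of the design is that the connection's lifetime is tied to the block, so a client that crashes or idles after the check cannot hold a server-side session open.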
Another thing to try is to enable the thread pool in the MariaDB server. See https://mariadb.com/kb/en/thread-pool-in-mariadb/
This is different from a client-side connection pool. The thread pool allows many thousands of clients to think they're connected, without allocating a full-blown thread in the MariaDB server for every single connection. When a client has something to query, at that time it is given one of the threads. When that client is done, it may continue to maintain a connection, but the thread in the MariaDB server is reallocated to a different client's request.
This is good for "bursty" workloads by many clients, and it sounds like your case might be a good candidate.
I am currently connecting to an RDS instance (MariaDB) without an issue from within the configured VPC.
I am also connecting to the RDS instance from local clients (external to the VPC) with no connectivity issues, but I have serious issues with SQL execution speeds. It can take up to 20 times longer to execute a query remotely than locally (from an EC2 instance in the same VPC).
I have the Security Group for the RDS instance set up to allow the external IPs as inbound rules, and the RDS instance is listening on a non-default port (not 3306).
I cannot think of anything I should be doing differently on the network side of things and I have set skip-name-resolve=1, yet the speed is ridiculous.
It makes no difference what the SQL query is (SELECT, UPDATE, DELETE); they all execute slowly.
The RDS server is MariaDB 10.1.19 on a db.t2.medium instance.
Client connection is via MySQL .NET Connector and connection string:
Server=<ip>;Port=<port>;Database=<dbname>;User ID=<dbuser>;Pooling=true;CharSet=utf8;Password=<dbpass>
The client has no connectivity or speed issues when the DB is not an RDS instance (local MySQL).
I have seen various network related issues popping up now and then (connection stream dropped) but nothing serious apart from that, just very slow.
Any pointers on how to at least determine where the problem is?
The scenario I am trying to achieve (with acceptable speeds) is described here (albeit vague in their instructions):
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.Scenarios.html#USER_VPC.Scenario4
I know this is very basic, but I want to clear up some concepts about MySQL connections. I have the following scenario.
The db server and the web servers are in different locations.
The web server is running an article-based web site.
Article data is stored on the db server.
The web server is delivering 100 articles/pages per second.
My questions are as follows:
Can a single connection between the web server and the db server handle it?
How many connections would be created by default?
If I think of connections as pipes, what is the I/O capacity of each connection?
Is there any relationship between the db server's RAM, processor, OS and the number of connections?
Thanks in advance
Can a single connection between the web server and the db server handle it?
A single connection can pass several requests, but not at the same time, and only for a single client process. So the best scaling approach might be one connection pool per web server process, where threads can obtain an established connection to perform their request, and return the connection to the pool once they are done.
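A minimal sketch of such a per-process pool is shown below. It is not any particular connector's pooling API: ConnectionPool and its connection() method are hypothetical names, connect stands for whatever function opens a database connection, and the sketch does no health checking or reconnecting.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """A fixed-size pool; all connections are opened eagerly at startup."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every connection is in use; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the request failed.
            self._idle.put(conn)


# Stand-in "connections" so the sketch runs without a database.
pool = ConnectionPool(connect=object, size=2)
with pool.connection() as conn:
    print("borrowed", conn)
```

Because queue.Queue is thread-safe, several worker threads can share one pool, and no connection is ever used by two threads at the same time.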
How many connections would be created by default?
Depends on the client language, the MySQL connector implementation, the frameworks you use, the web server configuration, and probably a bunch of other details like this. So your best bet is to simply look at the list of network connections to the MySQL service, e.g. using lsof or netstat.
If I think of connections as pipes, what is the I/O capacity of each connection?
The limiting factor will probably be shared resources, like the network bandwidth or processing capabilities at either end. So the number of connections shouldn't have a large impact on the data transfer rate. The overhead to establish a connection is significant, though, which is why I suggested reducing the number of connections using pooling.
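Is there any relationship between the db server's RAM, processor, OS and the number of connections?
Possibly, if some application makes choices based on these parameters, but in general I'd consider this rather unlikely. One caveat: every open connection consumes some memory on the database server, so the available RAM does bound how many connections it can sustain.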
I'm trying to figure out why MySQL uses a Unix socket (/tmp/mysql.sock) by default, instead of normal TCP/IP sockets.
It doesn't seem like a security thing, as you can listen only on 127.0.0.1, which should be equally safe (the socket file is world-writable, so you don't get Unix-account-based protection).
And surely all operating systems rely on high-performance TCP/IP so much that it cannot be significantly slower than Unix sockets. Linux does all sorts of zero-copy tricks even for network traffic, so it surely must be fast for loopback.
So is there any legitimate reason for using Unix sockets here, or is it just some weird historical accident?
While you don't hit the entire IP stack when going over localhost, you still hit a big part of it.
A unix socket is essentially just a 2-way pipe. It's faster and lighter.
Unix sockets also allow you to control access without managing firewall rules, as access can be given through filesystem permissions.
One other feature unix sockets provide is the ability to pass a file descriptor from one process to another.
There is less overhead in using Unix sockets instead of TCP/IP, as this is basically a byte stream without the extra network bookkeeping. See Wikipedia for a little bit more info.
Unix domain connections appear as byte streams, much like network connections, but all data remains within the local computer. UNIX domain sockets use the file system as address name space, i.e. they are referenced by processes as inodes in the file system. This allows two distinct processes to open the same socket in order to communicate. However, the actual communication (the data exchange) does not use the file system, but buffers in kernel memory.
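The "2-way pipe" description can be seen with a small sketch using Python's standard socket module. socket.socketpair() returns two already-connected sockets, which on Unix-like systems are Unix domain sockets by default; the message contents here are arbitrary.

```python
import socket

# Two connected endpoints; no port, no IP address, no listening step.
left, right = socket.socketpair()
try:
    left.sendall(b"ping")      # data goes into a kernel buffer...
    request = right.recv(4)    # ...and comes out at the other end
    right.sendall(b"pong")     # the same pair works in the other direction
    reply = left.recv(4)
finally:
    left.close()
    right.close()

print(request, reply)
```

A MySQL server uses a named socket bound to a path such as /tmp/mysql.sock rather than an anonymous pair like this one, but the data exchange works the same way.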