2 MySQL clusters in HAProxy

We use HAProxy (1.5) to proxy MySQL to 4 Galera nodes. We use roundrobin, and it works well for high availability and load balancing.
See /etc/haproxy/haproxy.cfg:
global
    user haproxy
    group haproxy
defaults
    mode http
    log global
    retries 2
    timeout connect 3000ms
    timeout server 10h
    timeout client 10h
listen stats
    bind *:8404
    stats enable
    stats hide-version
    stats uri /stats
listen mysql-cluster
    bind 127.0.0.1:3306
    mode tcp
    option mysql-check user haproxy_check
    balance roundrobin
    server dbcl_01_dc1 xx.xx.xx.xx:3306 check
    server dbcl_03_dc6 1xx.xx.xx.xx:3306 check
    server dbcl_04_do xx.xx.xx.xx:3306 check
    server dbcl_05_dc4 xx.xx.xx.xx:3306 check
This works great, but we are afraid the cluster will fail us some day, and we would like HAProxy to fail over to another MySQL server should none of the 4 Galera nodes above be available. We would only want this last server used in a doomsday scenario, as its data is one hour behind the production cluster and, more importantly, it is a different dataset. The idea is that we automatically fail over to our non-clustered MySQL data from one hour behind and keep our customers operating.
Does anybody know if this is possible with HAProxy? That is, the first 4 servers in roundrobin and, if none of them is available, the non-clustered single database server as a last resort.

You can use the backup keyword on a server line to configure failover:
listen mysql-cluster
    bind 127.0.0.1:3306
    mode tcp
    option mysql-check user haproxy_check
    balance roundrobin
    server dbcl_01_dc1 xx.xx.xx.xx:3306 check
    server dbcl_03_dc6 xx.xx.xx.xx:3306 check
    server dbcl_04_dc2 xx.xx.xx.xx:3306 check
    server dbcl_05_dc4 xx.xx.xx.xx:3306 check
    # Solution
    server dbbk_01_dc1 xx.xx.xx.xx:3306 check backup
In this case, if all 4 servers in the cluster go down, traffic is routed to the backup server.
You can also configure multiple backup servers:
listen mysql-cluster
    bind 127.0.0.1:3306
    mode tcp
    option mysql-check user haproxy_check
    balance roundrobin
    server dbcl_01_dc1 xx.xx.xx.xx:3306 check
    server dbcl_03_dc6 xx.xx.xx.xx:3306 check
    server dbcl_04_dc2 xx.xx.xx.xx:3306 check
    server dbcl_05_dc4 xx.xx.xx.xx:3306 check
    # Solution
    server dbbk_01_dc1 xx.xx.xx.xx:3306 check backup
    server dbbk_02_dc2 xx.xx.xx.xx:3306 check backup
In the above configuration, HAProxy sends traffic to the first backup server only, and fails over to the second backup server if the first one goes down.
If there is a huge traffic surge and you want multiple backups to share the load, you can also set option allbackups, which balances traffic across all the backup servers.
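As a minimal sketch of that last variant, assuming the same placeholder server names and masked addresses used in the configurations above, option allbackups can go in the listen section next to the balance directive:

```haproxy
listen mysql-cluster
    bind 127.0.0.1:3306
    mode tcp
    option mysql-check user haproxy_check
    # When every non-backup server is down, balance across all
    # backup servers instead of using only the first one.
    option allbackups
    balance roundrobin
    server dbcl_01_dc1 xx.xx.xx.xx:3306 check
    server dbcl_03_dc6 xx.xx.xx.xx:3306 check
    server dbcl_04_dc2 xx.xx.xx.xx:3306 check
    server dbcl_05_dc4 xx.xx.xx.xx:3306 check
    server dbbk_01_dc1 xx.xx.xx.xx:3306 check backup
    server dbbk_02_dc2 xx.xx.xx.xx:3306 check backup
```

The backup servers still receive no traffic while at least one of the four cluster nodes passes its health check. For the scenario in the question, where the fallback is a single server with a different dataset, the plain backup keyword without this option is probably the better fit.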
There is official documentation with much more complex settings.

Related

Asterisk Realtime Crashing on load when using HAProxy to Galera Cluster

It works fine under light load on our test bench, but once we put it into production the whole thing crashes and we are unable to get Asterisk to function correctly. It is almost as if there is a lag or delay in accessing the MariaDB cluster.
Our architecture and configs are below:
Asterisk 13 Realtime with HAProxy (1.5.18) --> 6 x MariaDB (10.4.11) in independent datacentres with Galera syncing them (1 only as backup)
Galera sync is working fine, and other services are able to read and write via HAProxy without any problems.
It only seems to become an issue when we add load, reload the dialplan, restart Asterisk, etc.
[haproxy.cfg]
global
    user haproxy
    group haproxy
defaults
    mode http
    log global
    retries 2
    timeout connect 3000ms
    timeout server 10h
    timeout client 10h
listen stats
    bind *:8404
    stats enable
    stats hide-version
    stats uri /stats
listen mysql-cluster
    bind 127.0.0.1:3306
    mode tcp
    option mysql-check user haproxy_check
    balance roundrobin
    server mysql_server1 10.0.0.1:3306 check
    server mysql_server2 10.0.0.2:3306 check
    server mysql_server3 10.0.0.3:3306 check
    server mysql_server4 10.0.0.4:3306 check
    server mysql_server5 10.0.0.5:3306 check
    server mysql_server6 10.0.0.6:3306 check backup
What we would really like to know is, first, whether Asterisk 13 Realtime will work via HAProxy and, if so, whether there are config changes we need to make to get it working.
We can provide more info if required.
Try using Realtime -> ODBC -> HAProxy.
If that does not help, use debugging, for example gdb traces.
There is no way to determine what issue you have from this alone; more logs and configs are needed.

MySQL stuck or network issue?

We have a mysql-server (5.5.47) hosted on a physical server. It listens on an external internet interface (with restricted user access) and is used intensively from different places (we use different libraries to communicate with MySQL). But sometimes the whole MySQL server (or the network) gets stuck and stops accepting connections, and clients fail with ETIMEDOUT (connect) or timeout (recv). Even a direct connection from the server itself with the mysql CLI does not work: it hangs without any response, apparently while trying to establish the connection.
Our first thought was that this was related to the TCP backlog, so the MySQL backlog was increased, but this did not help at all.
The issue cannot be reproduced on demand, so the last time it happened we sniffed the traffic. This is what we got:
http://grab.by/STwq (screenshot)
*.*.27.65 is the client
*.*.20.80 is the MySQL server
From the session we can assume that the TCP connection was established, but the server retransmits SYN/ACK to the client (the dump shows that the server received the ACK, so why retransmit?). In the normal case, MySQL should generate its initial packet and send it to the client once the connection is established.
This is a screenshot of only one session, but all the other sessions look mostly the same: SYN -> SYN/ACK -> ACK, and then the server retransmits SYN/ACK up to retries_count.
After restarting MySQL, everything returns to normal immediately. So we are not sure whether this is related to the network or to MySQL.
Any thoughts would be appreciated.
Thank you!

AWS RDS Aborted Connection Haproxy

I created 1 master and 2 replicas in AWS RDS, and 1 EC2 instance running HAProxy:
listen rds-cluster
    bind 172.30.0.xxx:3306
    mode tcp
    option mysql-check user ha_check
    balance roundrobin
    server mysql-1 replica1.xxxx.ap-southeast-1.rds.amazonaws.com:3306 check weight 1 fall 2 fastinter 1000
    server mysql-2 replica2.xxxx.ap-southeast-1.rds.amazonaws.com:3306 check weight 1 fall 2 fastinter 1000
I can connect directly using a replica's endpoint,
but if I go through HAProxy:
$ mysql -h172.30.0.xxx -uha_read -ppassword -e "show variables like 'server_id'"
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0
I get that error.
I have already increased connect_timeout.
If I check
SHOW GLOBAL STATUS LIKE 'Aborted_connects';
the value keeps increasing.
===============
This article solved my problem:
CUSTOM CONFIGURATION OF AMAZON RDS INSTANCES
By default, if you did not change the security group settings when launching RDS, only your IP is authorized to reach your databases. In your case, you need to authorize your HAProxy node to reach your databases as well.
Go to RDS, select your instance, then its security group, edit it, and add a new rule that allows either the security group of your HAProxy instance (best practice) or the HAProxy IP (still good enough if it is an Elastic IP) to access the database on port 3306.
Hope this is clear enough :)
EDIT: I understand that you solved your issue, but for people reading later (or even for you, if you want to improve security) I am adding a little information about what I said:
The RDS hostname resolves to a private IP when the DNS query is made from an instance in the same VPC to the Amazon-provided DNS server in that VPC. In that case, your security group has to allow either the subnet of your HAProxy instance or its private IP (not the public one).

mysql farm with haproxy

I want to configure a DB farm on a single node with containers. My idea is to access each of these DBs through a subdomain, for example mysql1.example.com:3306, mysql2.example.com:3306, mysql3.example.com:3306.
I'm trying to implement this model with HAProxy. The first time I connect to one of the databases through HAProxy, it seems to work. When I reconnect, I get:
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0
The template I use in HAProxy is:
global
    maxconn 256
    debug
defaults
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
listen www
    bind *:3306
    mode tcp
    acl host_mysql hdr(host) -i mysql1.example.com
    server mysql_db_1 172.31.20.75:3307
    acl host_mysql hdr(host) -i mysql2.example.com
    server mysql_db_2 172.31.20.75:3308
    acl host_mysql hdr(host) -i mysql3.example.com
    server mysql_db_3 172.31.20.75:3309
Answering my own question: this implementation is not possible, because MySQL speaks its own protocol directly over TCP, so the connection carries no HTTP-style Host header containing the hostname. For this reason, HAProxy can't route the connection to the correct server.
I'm thinking of implementing this environment using virtual IPs assigned to each database. Another option would be running all the databases on the same server on different ports.
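A rough sketch of the virtual IP variant, assuming each subdomain resolves to its own address on the HAProxy host. The 10.0.0.11 to 10.0.0.13 addresses are hypothetical placeholders and would have to be assigned to that host for the bind lines to work. The backend addresses and ports are the ones from the template above:

```haproxy
defaults
    mode tcp
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

# mysql1.example.com -> 10.0.0.11 (hypothetical virtual IP)
listen mysql1
    bind 10.0.0.11:3306
    server mysql_db_1 172.31.20.75:3307

# mysql2.example.com -> 10.0.0.12 (hypothetical virtual IP)
listen mysql2
    bind 10.0.0.12:3306
    server mysql_db_2 172.31.20.75:3308

# mysql3.example.com -> 10.0.0.13 (hypothetical virtual IP)
listen mysql3
    bind 10.0.0.13:3306
    server mysql_db_3 172.31.20.75:3309
```

Here the routing decision is made by the destination address the client connected to, which is visible at the TCP level, rather than by anything inside the MySQL traffic. The different-ports variant would look the same, with one bind port per listen section instead of one address.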

unable to connect to AWS VPC RDS instance (mysql or postgres)

(I'm posting this question after the fact because of the time it took to find the root cause and solution. There's also a good chance other people will run into the same problem)
I have an RDS instance (in a VPC) that I'm trying to connect to from an application running on a classic EC2 instance, connected via ClassicLink. Security groups and DNS aren't an issue.
I am able to establish socket connections to the RDS instance, but cannot connect with CLI tools (psql, mysql, etc.) or DB GUI tools like toad or mysql workbench.
Direct socket connections with telnet or nc result in TCP connections in the "ESTABLISHED" state (output from netstat).
Connections from DB CLI, GUI tools, or applications result in timeouts and TCP connections that are stuck in the "SYN" state.
UPDATE: The root cause in my case was a problem with MTU size and EC2 ClassicLink. I've posted some general troubleshooting information below in an answer in case other people run into similar RDS connectivity issues.
Additional information for people who might run into similar issues trying to connect to RDS or RedShift:
1) Check security groups
Verify the security group for the RDS instance allows access from the security group your source server belongs to (or its IP added directly if external to AWS). The security group you should be looking at is the one specified in the RDS instance attributes from the RDS console UI (named "security group").
NOTE: Database security groups might be different from AWS EC2 security groups. If your RDS instance is in classic/public EC2, you should check in the "database security group" section of the RDS UI. For VPC users, the security group will be a normal VPC security group (the name sg-xxx will be listed in the RDS instance's attributes).
2) Confirm DNS isn't an issue.
Amazon uses split DNS, so a DNS lookup external to AWS will return the public IP while a lookup internal to AWS will return a private IP. If you suspect it is a DNS issue, have you confirmed different IPs are returned from different availability zones? If different AZs get different IPs, you will need to contact AWS support.
3) Confirm network connectivity by establishing a socket connection.
Tools like tracepath and traceroute likely won't help since RDS currently drops ICMP traffic.
Test port connectivity by trying to establish a socket connection to the RDS instance on port 3306 (mysql, or 5432 for postgres). Start by finding the IP of the RDS instance and using either telnet or nc (be sure to use the internal/private IP if connecting from within AWS):
telnet x.x.x.x 3306
nc -vz x.x.x.x 3306
a) If your connection attempt isn't successful and immediately fails, the port is likely blocked or the remote host isn't running a service on that port. You may need to engage AWS support to troubleshoot further. If connecting from outside of AWS, try to connect from another instance inside AWS first (as your firewall might be blocking those connections).
b) If your connection isn't successful and you get a timeout, packets are probably being dropped/ignored by a firewall or packets are returning on a different network path. You can confirm this by running netstat -an | grep SYN (from a different ssh session while waiting for the telnet/nc command to time out).
Connections in the SYN state mean that you've sent a connection request, but haven't received anything back (SYN_ACK or reject/block). Usually this means a firewall or security group is ignoring or dropping packets.
It can also be a problem with NAT routing or multiple paths from multiple interfaces. Check to make sure you're not using iptables or a NAT gateway between your host and the RDS instance. If you're in a VPC, also make sure you allow egress/outbound traffic from the source host.
c) If your socket connection test was successful, but you can't connect with a mysql client (CLI, workbench, app, etc.), take a look at the output of netstat to see what state the connection is in (replace x.x.x.x with the actual IP address of the RDS instance):
netstat -an | grep x.x.x.x
If you were getting an established connection when using telnet or nc, but you see the 'SYN' state when using a mysql client, you might be running into an MTU issue.
RDS, at the time this is written, may not support ICMP packets used for PMTUD (https://en.wikipedia.org/wiki/Path_MTU_Discovery#Problems_with_PMTUD). This can be a problem if you're trying to access RDS or RedShift that's in a VPC from a classic ec2 instance via ClassicLink. Try lowering the MTU with the following, then testing again:
sudo ip link show
# take note of the current MTU (likely 1500 or 9001)
sudo ip link set dev eth0 mtu 1400
If the lower MTU worked, be sure to follow up with AWS customer support for help and mention that you are seeing an MTU issue while trying to connect to your RDS instance. This can happen if TCP packets are wrapped with encapsulation for tunneling, resulting in a lower usable MTU for packet data / payload. Lowering the MTU on the source server allows the wrapped packets to still fit under the MTU limit while passing through the tunneling gateway.
If it didn't work, set your MTU back to its default and engage AWS support for further troubleshooting.