Google Compute Engine instances cannot ping each other on internal addresses - google-compute-engine

I have a few instances running in Google Compute engine:
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
inst-dns2 xx-xxxx-x f1-micro 10.240.0.4 x.x.x.x RUNNING
inst-st0 xx-xxxx-x f1-micro 10.240.0.2 x.x.x.x RUNNING
inst-www0 xx-xxxx-x n1-standard-2 10.240.0.3 x.x.x.x RUNNING
inst-dns3 xx-xxxx-x f1-micro 10.240.0.5 x.x.x.x RUNNING
As you can see, they all have IPs in the same subnet. I have been using the built-in default network since I started.
NAME MODE IPV4_RANGE GATEWAY_IPV4
default legacy 10.240.0.0/16 10.240.0.1
I still have the default-allow-internal firewall rule:
NAME NETWORK SRC_RANGES RULES SRC_TAGS TARGET_TAGS
default-allow-internal default 10.240.0.0/16 tcp:1-65535,udp:1-65535,icmp
Yet, none of my instances can ping one another.
The docs state, "In the default network, by default, all virtual machine instances within a network can communicate with each other, because of the default firewall rules described above."
So, I am confused. I thought I could do this without having to create custom networks and all that.
Any input would be much appreciated.

I resolved this by using official FreeBSD instance images instead of my own pre-built ones. Beware: the official images have their own issues, such as problems with the system date/time. Do your research and test before going into production.
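For anyone else hitting this symptom, before rebuilding your images it may also be worth double-checking that the rule really covers ICMP and that the guest OS firewall isn't dropping it, for example:
gcloud compute firewall-rules describe default-allow-internal
sudo ipfw list # on FreeBSD guests using ipfw; on Linux guests, sudo iptables -L -n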

Recover From Changing Host IP#

I was trying to add an IP# to my Google Compute Engine (RHEL7) instance, but I typed the invocation wrong:
sudo ifconfig eth0 1.2.3.4
The existing IP# on eth0 was 1.2.3.3, so that invocation changed my existing IP# to one that isn't known to anything else. And so I lost all connections (ssh, http, even ping) to the instance.
How do I recover from this mistake? Is there a gcloud or GCP Console method I can use, since I can't connect directly to the instance anymore?
Since the ifconfig change was made from a shell and not persisted in any startup scripts (or anywhere else), simply resetting the instance will reboot it and cause it to configure eth0 according to its startup scripts:
$ gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
<instance-name> <instance-zone> <machine-type> <preemptible> <bad-internal-ip#> <external-ip#>
$ gcloud compute instances reset <instance-name>
For the following instance:
- [<instance-name>]
choose a zone:
[1] asia-east1-a
[2] asia-east1-b
[...]
Please enter your numeric choice: <N-of-instance-zone>
Updated [https://www.googleapis.com/compute/v1/projects/<project-name>/zones/<instance-zone>/instances/<instance-name>].
$ gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
<instance-name> <instance-zone> <machine-type> <preemptible> <default-internal-ip#> <external-ip#> RUNNING
After you enter your numeric zone choice, it can take several seconds or longer (but probably not more than 5 minutes) for the instance to restart.
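If you already know the zone, you can skip the interactive prompt by passing it explicitly:
$ gcloud compute instances reset <instance-name> --zone <instance-zone>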
Look around in the Cloud Platform Console. You can usually change the external IP and go the long way around, provided the instance still exists.

Amazon EC2 and RDS across AWS Accounts [duplicate]

From EC2 instance i-78a8df00, I'm trying to connect to RDS instance mysql.************.us-east-1.rds.amazonaws.com. They are both in the U.S. East region. I added the security group of EC2 instance (sg-********) to the RDS security group, but that didn't help. It appears to be a firewall/DNS issue as it is timing out when running this command:
ubuntu@ip-10-195-189-237:~$ mysql -h mysql.************.us-east-1.rds.amazonaws.com
ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql.************.us-east-1.rds.amazonaws.com' (110)
I can connect to RDS instance fine from my local machine using the same line as above. I tried various forum solutions but those don't help.
I had a similar problem when I spun up a new EC2 instance but didn't change the setting in the RDS security group for the inbound IP addresses allowed to connect to port 3306 of my RDS instance.
The confusing bit was an option in the RDS dashboard called Security Groups. You don't need it to solve the problem.
What you really need is:
Go to the list of RDS instances
Click on the instance you are trying to connect to
Click the Security group rules section
This should open a new browser tab or window with the details of the security group.
Locate the tabs in the bottom part, select the Inbound rules tab and click the Edit button.
Change the value to the IP address of your EC2 instance or an IPv4 CIDR block, e.g.
174.33.0.0/16
To get this value, you can either ssh into your instance and run ip addr, or open the EC2 console in your browser and locate the Private IPs value in your instance details.
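For example, from the instance itself (the interface name may vary):
ip addr show eth0
or, if you have the AWS CLI configured (the instance ID is a placeholder):
aws ec2 describe-instances --instance-ids i-xxxxxxxx --query 'Reservations[].Instances[].PrivateIpAddress' --output text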
Additional information for people who might run into similar issues trying to connect to RDS or RedShift:
1) Check security groups
Verify the security group for the RDS instance allows access from the security group your source server belongs to (or its IP added directly if external to AWS). The security group you should be looking at is the one specified in the RDS instance attributes from the RDS console UI (named "security group").
NOTE: Database security groups might be different from AWS EC2 security groups. If your RDS instance is in classic/public EC2, you should check in the "database security group" section of the RDS UI. For VPC users, the security group will be a normal VPC security group (the name sg-xxx will be listed in the RDS instance's attributes).
2) Confirm DNS isn't an issue.
Amazon uses split DNS, so a DNS lookup external to AWS will return the public IP while a lookup internal to AWS will return a private IP. If you suspect it is a DNS issue, have you confirmed different IPs are returned from different availability zones? If different AZs get different IPs, you will need to contact AWS support.
3) Confirm network connectivity by establishing a socket connection.
Tools like tracepath and traceroute likely won't help since RDS currently drops ICMP traffic.
Test port connectivity by trying to establish a socket connection to the RDS instance on port 3306 (mysql, or 5432 for postgres). Start by finding the IP of the RDS instance and using either telnet or nc:
telnet x.x.x.x 3306
nc -vz x.x.x.x 3306
a) If your connection attempt isn't successful and immediately fails, the port is likely blocked or the remote host isn't running a service on that port. You may need to engage AWS support to troubleshoot further. If connecting from outside of AWS, try to connect from another instance inside AWS first (as your firewall might be blocking those connections).
b) If your connection isn't successful and you get a timeout, packets are probably being dropped/ignored by a firewall or packets are returning on a different network path. You can confirm this by running netstat -an | grep SYN (from a different CLI window/session while running and waiting for the telnet/nc command to timeout). Connections in the SYN state mean that you've sent a connection request, but haven't received anything back (SYN_ACK or reject/block). Usually this means a firewall or security group is ignoring or dropping packets.
Check to make sure you're not using iptables or a NAT gateway between your host and the RDS instance. If you're in a VPC, also make sure you allow egress/outbound traffic from the source host.
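A couple of quick checks on the source host can help rule those out (x.x.x.x is a placeholder for the RDS IP):
sudo iptables -L -n -v
ip route get x.x.x.x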
c) If your socket connection test was successful, but you can't connect with a mysql client (CLI, workbench, app, etc.), take a look at the output of netstat to see what state the connection is in (replace x.x.x.x with the actual IP address of the RDS instance):
netstat -an | grep x.x.x.x
If you were getting a connection established when using telnet or NC, but you see the 'SYN' state when using a mysql client, you might be running into an MTU issue.
RDS, at the time this is written, may not support ICMP packets used for PMTUD (https://en.wikipedia.org/wiki/Path_MTU_Discovery#Problems_with_PMTUD). This can be a problem if you're trying to access RDS or RedShift that's in a VPC from a classic ec2 instance via ClassicLink. Try lowering the MTU with the following, then testing again:
sudo ip link show
# take note of the current MTU (likely 1500 or 9001)
sudo ip link set dev eth0 mtu 1400
If the lower MTU worked, be sure to follow up with AWS customer support for help and mention that you are seeing an MTU issue while trying to connect to your RDS instance. This can happen if TCP packets are wrapped with encapsulation for tunneling, resulting in a lower usable MTU for packet data / payload. Lowering the MTU on the source server allows the wrapped packets to still fit under the limit.
If it didn't work, set your MTU back to its default and engage AWS support for further troubleshooting.
While Mark's problem seemed to have something to do with multi-AZ routing & EC2 classic, I ran into this exact same problem today.
To fix it, I modified the Security Group that was created automatically with my RDS instance by adding the two private IP addresses from my EC2 instance.
This was a fairly obvious problem, but I'm new to AWS in general, so hopefully this is useful for others like myself.
Apparently, multi-AZ screws everything up. Since the default multi-AZ config placed my database in availability zone us-east-1d and my EC2 instance was in us-east-1a, the DNS was not routing correctly. I re-created the RDS instance as non-multi-AZ, made it live in us-east-1a, and all is happy.
If there are any super geniuses out there in regards to DNS routing on AWS with RDS, ELB, and multi-AZ capabilities, it would be pretty awesome to know how to do this, since this isn't documented anywhere in Amazon Web Service's documentation.
After struggling for 3 days, I finally found out why mine was not connecting.
Add an outbound rule on your EC2 instance's security group for port 3306 and an inbound rule on your RDS server's security group on port 3306. The inbound source should be the security group of the EC2 instance.
Example:
Your EC2 security group is - sg.ec2
And RDS security group is - sg.rds
So edit the outbound rules of sg.ec2 and add a Custom TCP rule on port 3306 with destination 0.0.0.0/0.
Then edit the inbound rules of sg.rds and add a rule on port 3306 with source sg.ec2.
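If you prefer the AWS CLI, the equivalent inbound rule looks roughly like this (the group IDs are placeholders for sg.rds and sg.ec2):
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 3306 --source-group sg-yyyyyyyy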
I had a similar problem today when my EC2 instance suddenly lost access to the RDS instance and WordPress stopped working. The security groups were correct and I could even connect to MySQL from the console on the EC2 instance, but not from PHP. For some reason, restarting the EC2 server helped me.
Looks like sometime between the last posting and this posting, Amazon fixed the DNS routing issue, because everything works fine now for multi-AZ rds...
Solution: I had to adjust the inbound rule of the RDS instance's security group to allow connections from the IP address of my EC2 instance.
Longer explanation: When creating an AWS RDS instance, I selected the option to be able to connect to it via public IP address. In doing so I ASSUMED that I would then be able to connect to it from ANY IP address, but that was not the case. Rather, when the instance got created/configured, it took the public IP address of my laptop and set the inbound rule of the RDS security group to only accept connections from that IP address. So I had to manually add a rule to allow incoming MySQL/Aurora connections from the IP address of the EC2 instance.
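With the AWS CLI, adding that rule by address would look roughly like this (the group ID and the EC2 address are placeholders):
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 3306 --cidr 203.0.113.10/32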
I ran into the same issue: I was not able to connect from an EC2 instance to RDS, with both resources in the same VPC and a multi-AZ setup.
The multi-AZ setup was not a problem for me.
I was only missing the OUTBOUND rule for port 3306 from my EC2 instance's security group to my RDS security group.
According to the Amazon doc, we should use:
PROMPT> mysql -h <endpoint> -P 3306 -u <mymasteruser> -p
where endpoint and mymasteruser (username) come from your RDS instance.
I solved that problem by using the public IP address of the endpoint directly instead of the endpoint name (****.us-east-1.rds.amazonaws.com). You can get the public IP address using the ping command (ping ****.us-east-1.rds.amazonaws.com).

unable to connect to AWS VPC RDS instance (mysql or postgres)

(I'm posting this question after the fact because of the time it took to find the root cause and solution. There's also a good chance other people will run into the same problem)
I have an RDS instance (in a VPC) that I'm trying to connect to from an application running on a classic EC2 instance, connected via ClassicLink. Security groups and DNS aren't an issue.
I am able to establish socket connections to the RDS instance, but cannot connect with CLI tools (psql, mysql, etc.) or DB GUI tools like toad or mysql workbench.
Direct socket connections with telnet or nc result in TCP connections in the "ESTABLISHED" state (output from netstat).
Connections from DB CLI, GUI tools, or applications result in timeouts and TCP connections that are stuck in the "SYN" state.
UPDATE: The root cause in my case was a problem with MTU size and EC2 ClassicLink. I've posted some general troubleshooting information below in an answer in case other people run into similar RDS connectivity issues.
Additional information for people who might run into similar issues trying to connect to RDS or RedShift:
1) Check security groups
Verify the security group for the RDS instance allows access from the security group your source server belongs to (or its IP added directly if external to AWS). The security group you should be looking at is the one specified in the RDS instance attributes from the RDS console UI (named "security group").
NOTE: Database security groups might be different from AWS EC2 security groups. If your RDS instance is in classic/public EC2, you should check in the "database security group" section of the RDS UI. For VPC users, the security group will be a normal VPC security group (the name sg-xxx will be listed in the RDS instance's attributes).
2) Confirm DNS isn't an issue.
Amazon uses split DNS, so a DNS lookup external to AWS will return the public IP while a lookup internal to AWS will return a private IP. If you suspect it is a DNS issue, have you confirmed different IPs are returned from different availability zones? If different AZs get different IPs, you will need to contact AWS support.
3) Confirm network connectivity by establishing a socket connection.
Tools like tracepath and traceroute likely won't help since RDS currently drops ICMP traffic.
Test port connectivity by trying to establish a socket connection to the RDS instance on port 3306 (mysql, or 5432 for postgres). Start by finding the IP of the RDS instance and using either telnet or nc (be sure to use the internal/private IP if connecting from within AWS):
telnet x.x.x.x 3306
nc -vz x.x.x.x 3306
a) If your connection attempt isn't successful and immediately fails, the port is likely blocked or the remote host isn't running a service on that port. You may need to engage AWS support to troubleshoot further. If connecting from outside of AWS, try to connect from another instance inside AWS first (as your firewall might be blocking those connections).
b) If your connection isn't successful and you get a timeout, packets are probably being dropped/ignored by a firewall or packets are returning on a different network path. You can confirm this by running netstat -an | grep SYN (from a different ssh session while waiting for the telnet/nc command to timeout).
Connections in the SYN state mean that you've sent a connection request, but haven't received anything back (SYN_ACK or reject/block). Usually this means a firewall or security group is ignoring or dropping packets.
It can also be a problem with NAT routing or multiple paths from multiple interfaces. Check to make sure you're not using iptables or a NAT gateway between your host and the RDS instance. If you're in a VPC, also make sure you allow egress/outbound traffic from the source host.
c) If your socket connection test was successful, but you can't connect with a mysql client (CLI, workbench, app, etc.), take a look at the output of netstat to see what state the connection is in (replace x.x.x.x with the actual IP address of the RDS instance):
netstat -an | grep x.x.x.x
If you were getting a connection established when using telnet or NC, but you see the 'SYN' state when using a mysql client, you might be running into an MTU issue.
RDS, at the time this is written, may not support ICMP packets used for PMTUD (https://en.wikipedia.org/wiki/Path_MTU_Discovery#Problems_with_PMTUD). This can be a problem if you're trying to access RDS or RedShift that's in a VPC from a classic ec2 instance via ClassicLink. Try lowering the MTU with the following, then testing again:
sudo ip link show
# take note of the current MTU (likely 1500 or 9001)
sudo ip link set dev eth0 mtu 1400
If the lower MTU worked, be sure to follow up with AWS customer support for help and mention that you are seeing an MTU issue while trying to connect to your RDS instance. This can happen if TCP packets are wrapped with encapsulation for tunneling, resulting in a lower usable MTU for packet data / payload. Lowering the MTU on the source server allows the wrapped packets to still fit under the MTU limit while passing through the tunneling gateway.
If it didn't work, set your MTU back to its default and engage AWS support for further troubleshooting.

Google Compute Engine: Internal DNS server and issues with resolving

Since Google Compute Engine does not provide internal DNS, I created two CentOS BIND machines which do the resolving for the machines on GCE and forward the queries over VPN to my private cloud, and vice versa.
As the Google Cloud help docs suggest, you can have this kind of scenario and edit resolv.conf on each instance to do the resolving.
What I did was edit ifcfg-eth0 to disable PEERDNS, and in /etc/resolv.conf I added the search domain and my two nameserver instances at the top.
Now, after one instance gets rebooted, it won't start again because it is searching for the metadata.google.internal domain:
Jul 8 10:17:14 instance-1 google: Waiting for metadata server, attempt 412
What is the best practice in this kind of scenario?
Thank you.
Also, I need the internal DNS to do poor man's round-robin failover, since GCE does not provide internal load balancers.
As mentioned at https://cloud.google.com/compute/docs/networking:
Each instance's metadata server acts as a DNS server. It stores the DNS entries for all network IP addresses in the local network and calls Google's public DNS server for entries outside the network. You cannot configure this DNS server, but you can set up your own DNS server if you like and configure your instances to use that server instead by editing the /etc/resolv.conf file.
So you should be able to just use 169.254.169.254 for your DNS server. If you need to define external DNS entries, you might like Cloud DNS. If you set up a domain with Cloud DNS, or any other DNS provider, the 169.254.169.254 resolver should find it.
If you need something more complex, such as custom internal DNS names, then your own BIND server might be the best solution. Just make sure that metadata.google.internal. resolves to 169.254.169.254.
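One way to satisfy that last requirement, assuming your BIND servers handle all resolution for the instances, is to forward any queries they cannot answer themselves to the metadata resolver, along these lines in named.conf (a sketch; adjust to your zones and forwarding setup):
options {
    # anything our own zones don't answer goes to the GCE metadata resolver,
    # which serves metadata.google.internal and the instance names
    forwarders { 169.254.169.254; };
    forward first;
};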
OK, I just ran into this, but unfortunately there was no timeout after 30 minutes that got it working. Fortunately nelasx had correctly diagnosed it and given the fix. I'm adding this to give the steps I had to take based on his excellent question and commented answer. I've just pulled the info together in one place to get to a solution.
Symptoms: on startup of the Google instance, connections are refused.
After inspecting the serial console output, you will see:
Jul 8 10:17:14 instance-1 google: Waiting for metadata server, attempt 412
You could try waiting; that didn't work for me, and inspection of https://github.com/GoogleCloudPlatform/compute-image-packages/blob/master/google-startup-scripts/usr/share/google/onboot
# Failed to resolve host or connect to host. Retry indefinitely.
6|7) sleep 1.0
log "Waiting for metadata server, attempt ${count}"
led me to believe that waiting will not work.
So the solution was to fiddle with the disk and apply nelasx's fix:
"edit ifcfg-eth0 and change PEERDNS=no; edit /etc/resolv.conf and put your nameservers + search domain on top; edit /etc/hosts and add: 169.254.169.254 metadata.google.internal"
To do this,
Best to create a snapshot backup before you start in case it goes awry
Uncheck "Delete boot disk when instance is deleted" for your instance
Delete the instance
Create a micro instance
Mount the disk
sudo ls -l /dev/disk/by-id/* # this will list the attached disks by name
sudo mkdir /mnt/new
sudo mount /dev/disk/by-id/scsi-0Google_PersistentDisk_instance-1-part1 /mnt/new
where instance-1 should be changed as per your setup
Go in and edit as per nelasx's solution. Idiot trap I fell for: edit the files under the mount point, don't just sudo vi /etc/hosts, use /mnt/new/etc/hosts. That cost me 15 more minutes as I went through the got-depressed, scratched-head, kicked-myself cycle.
Delete the debug instance, ensuring your attached disk delete option is unchecked
Create a new instance matching your original with the edited disk as your boot disk and fire it up.
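If you prefer to do the above disk shuffle with gcloud instead of the console, a rough equivalent is (names, zones and disk names are placeholders):
gcloud compute disks snapshot <boot-disk> --snapshot-names <backup-name> --zone <zone>
gcloud compute instances delete <instance-name> --zone <zone> --keep-disks boot
gcloud compute instances create rescue --machine-type f1-micro --zone <zone>
gcloud compute instances attach-disk rescue --disk <boot-disk> --zone <zone>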

Site to site OpenSWAN VPN tunnel issues with AWS

We have a VPN tunnel with Openswan between two AWS regions and our colo facility (we used AWS's guide: http://aws.amazon.com/articles/5472675506466066). Regular usage works OK (ssh, etc.), but we are having some MySQL issues over the tunnel between all areas. Using the mysql command-line client on a Linux server, or trying to connect using MySQL Connector/J, it basically stalls; it seems to open the connection, but then gets stuck. It doesn't get denied or anything, it just hangs there.
After initial research thought this was an MTU issue, but I've messed with that a lot and no luck.
Connection to the server works fine, and we can choose a database to use and such, but using the Java connector it appears that the Java client isn't receiving any network traffic after the query is made.
When running a select in the MySQL client on Linux, we can get a max of 2 or 3 rows before it goes dead.
With this said, I also have a separate Openswan VPN on the AWS side for client (Mac and iOS) VPN connections. Everything works fantastically through the client VPN and it seems more stable in general. The main difference I've noticed is that the static connection is using "tunnel" as the type and the client is using "transport", but when switching the static tunnel connection to transport it reports around 30 open connections and doesn't work.
I'm very new to OpenSWAN, so hoping someone can help to point me in the right direction of getting the static tunnel working as well as the client VPN.
As always, here's my config files:
ipsec.conf for BOTH static tunnel servers:
# basic configuration
config setup
# Debug-logging controls: "none" for (almost) none, "all" for lots.
# klipsdebug=none
# plutodebug="control parsing"
# For Red Hat Enterprise Linux and Fedora, leave protostack=netkey
protostack=netkey
nat_traversal=yes
virtual_private=
oe=off
# Enable this if you see "failed to find any available worker"
# nhelpers=0
#You may put your configuration (.conf) file in the "/etc/ipsec.d/" and uncomment this.
include /etc/ipsec.d/*.conf
VPC1-to-colo tunnel conf
conn vpc1-to-DT
type=tunnel
authby=secret
left=%defaultroute
leftid=54.213.24.xxx
leftnexthop=%defaultroute
leftsubnet=10.1.4.0/24
right=72.26.103.xxx
rightsubnet=10.1.2.0/23
pfs=yes
auto=start
colo-to-VPC1 tunnel conf
conn DT-to-vpc1
type=tunnel
authby=secret
left=%defaultroute
leftid=72.26.103.xxx
leftnexthop=%defaultroute
leftsubnet=10.1.2.0/23
right=54.213.24.xxx
rightsubnet=10.1.4.0/24
pfs=yes
auto=start
Client point VPN ipsec.conf
# basic configuration
config setup
interfaces=%defaultroute
klipsdebug=none
nat_traversal=yes
nhelpers=0
oe=off
plutodebug=none
plutostderrlog=/var/log/pluto.log
protostack=netkey
virtual_private=%v4:10.1.4.0/24
conn L2TP-PSK
authby=secret
pfs=no
auto=add
keyingtries=3
rekey=no
type=transport
forceencaps=yes
right=%any
rightsubnet=vhost:%any,%priv
rightprotoport=17/0
# Using the magic port of "0" means "any one single port". This is
# a work around required for Apple OSX clients that use a randomly
# high port, but propose "0" instead of their port.
left=%defaultroute
leftprotoport=17/1701
# Apple iOS doesn't send delete notify so we need dead peer detection
# to detect vanishing clients
dpddelay=10
dpdtimeout=90
dpdaction=clear
Found the solution. Needed to add the following iptables rule on both ends:
iptables -t mangle -I POSTROUTING -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
This, along with an MTU of 1400, and we're looking very solid.
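For reference, an MTU of 1400 can be set with something like (substitute the interface that carries the tunnel traffic):
sudo ip link set dev eth0 mtu 1400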
We had the same issue with a server connecting from the EU region to an RDS instance in the US. This appears to be a known issue with the RDS instances not responding to ICMP which is needed to auto-discover the MTU settings. As a workaround, you'll need to configure a smaller MTU on the instance that is performing the query.
On the server that is making the connection to the RDS instance (not the VPN tunnel instances), run the following command to get an MTU setting of 1422 (which worked for us):
sudo ifconfig eth0 mtu 1422
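Note that an ifconfig change like this does not survive a reboot. On RHEL-style instances, one common way to persist it (assuming eth0 and the standard network-scripts layout) is to add this line to /etc/sysconfig/network-scripts/ifcfg-eth0:
MTU=1422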