Load balancing many services on few GCE nodes

I have 2 GCE nodes, each running the same N services. For each service, I use the GCE network load balancer to distribute requests to the 2 nodes. I therefore created the following setup:
Since I want the load balancer to check the health of each service separately, I have a health check for each of the N services (each health check probes a different port for an HTTP response).
Since each service has its own health check, I have N target pools, all of them containing just nodes 1 and 2, but each with a different health check.
Since I have N target pools, I also have N forwarding rules.
Since I want each of these load-balanced services to be available externally (actually, from within GAE), I assign each of the forwarding rules a static IP address.
The problem is that I have more than 7 services to run, and GCE's regional quota allows only 7 static IP addresses. This makes me suspect I'm doing something wrong and that there's a better design for what I'm doing.
The root of my problem seems to be that I want a health check per service (instead of per node), which I can only seem to get by duplicating the entire chain up to the forwarding rule in the GCE network load balancer.
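Concretely, the chain I create for each service looks roughly like this (made-up names, region, and port; one such chain per service; newer gcloud accepts the reserved address by name, older versions want the literal IP):

    # One chain per service; "svc1" on port 8081 is a made-up example.
    gcloud compute http-health-checks create svc1-hc --port 8081
    gcloud compute target-pools create svc1-pool --region us-central1 \
        --http-health-check svc1-hc
    gcloud compute target-pools add-instances svc1-pool \
        --instances node-1,node-2 --instances-zone us-central1-a
    # The static IP that eats into the regional quota:
    gcloud compute addresses create svc1-ip --region us-central1
    gcloud compute forwarding-rules create svc1-rule --region us-central1 \
        --address svc1-ip --target-pool svc1-pool --port-range 8081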

Your configuration looks reasonable, given that each service has its own dedicated health check.
Note that if you need more than the default resource quotas and your project is not in the Free Trial stage, you can request more quota using the quota change request form.

Related

Automatically add a range of IPs to a security group in AWS

My RDS instance is configured to accept connections only from an EC2 security group. I connect my SQL client via SSH.
This is OK, but now I have an external service that also needs to connect to the DB.
This service tells me it will use the IP ranges published here: https://ip-ranges.amazonaws.com/ip-ranges.json
So I must whitelist them in my RDS security group.
My question: how can I apply this JSON to my SG automatically?
Thanks
There is no built-in way to apply that set of IP ranges automatically. You'll need to parse the file and apply the ranges yourself, using your favourite tool (bash, Python, C#, or even by hand).
However, the JSON file they gave you covers the IP ranges for all of AWS in all regions.
If your external service can tell you which regions they use, you can reduce that list significantly.
For example, if you can reduce it to just the Virginia region (us-east-1), there are 187 IP blocks to apply.
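A minimal sketch of that parsing step, assuming jq and the AWS CLI are available, with a made-up security-group ID and port:

    # Keep only us-east-1 prefixes and authorize each one on the SG
    # (sg-12345678 and port 3306 are placeholders).
    curl -s https://ip-ranges.amazonaws.com/ip-ranges.json \
      | jq -r '.prefixes[] | select(.region == "us-east-1") | .ip_prefix' \
      | while read cidr; do
          aws ec2 authorize-security-group-ingress \
            --group-id sg-12345678 --protocol tcp --port 3306 --cidr "$cidr"
        done

Even trimmed to one region, 187 blocks exceed the default 50-rule limit discussed below, so they would need to be spread across several groups.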
By default, security groups have a limit of 50 rules, and there's a limit of 5 security groups per network interface, so you're effectively looking at a hard limit of 250 rules.
If you want, you can contact AWS support and have them raise the rules-per-security-group limit to 250 by decreasing the security-groups-per-network-interface limit to 1. Or you can spread up to 250 rules over 5 security groups.
Source: Amazon VPC Limits
If you need more than 250 rules, you'll need to set up a proxy with 2+ public IP addresses to accommodate the extra security groups required.
Additional Note:
Applying all of these IP ranges would allow anyone to connect to your RDS instance from an AWS instance. This may be too wide a security hole to open.
You can set up a Lambda function to do this for you. Here is an example Python script from AWS Labs that does it for an ELB security group and the CloudFront IP address range.
https://github.com/awslabs/aws-cloudfront-samples/tree/master/update_security_groups_lambda

Solution for one-GCP-network-to-many-GCP-networks VPN topologies that addresses internal IP ambiguity

I have a problem: our firm has many GCP projects, and I need to expose services in my project to these distinct GCP projects. Firewalling in individual IPs isn't really sustainable, as we dynamically spin up and tear down hundreds of GCE VMs a day.
I've successfully joined a network in my project to another project via GCP's VPN, but I'm not sure what the best practice is for joining multiple networks to my single network, especially since most of the firm uses the same default internal address range for each project's default network. I understand that doing it the way I am will probably work (though it's unclear whether traffic will actually reach the right network), but it creates a huge ambiguity in terms of IP collisions, where two VMs in separate networks could potentially have the same internal IP.
I've read that outside of the cloud, most VPNs support NAT remapping, which seems to let you remap the internal IP space of the remote peer's subnet (say, 10.240.* to 11.240.*), so that you never have ambiguity on the side doing the remapping.
I also know that Cloud Router may be an option, but it seems to solve a very specific problem that doesn't fully encompass this one: dynamically adding and removing subnets from the VPN.
Thanks.
I think you will need to use custom subnet mode (non-default) networks and specify non-overlapping IP ranges for the networks to avoid collisions. See "Creating a new network with custom subnet ranges" in this doc: https://cloud.google.com/compute/docs/subnetworks#networks_and_subnetworks
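A minimal sketch of what that looks like with gcloud (network name, region, and range are made up; pick a different, non-overlapping range per project):

    # Custom-mode network: no automatic subnets are created.
    gcloud compute networks create my-net --subnet-mode custom
    # Choose a range that doesn't collide with the peer networks.
    gcloud compute networks subnets create my-subnet \
        --network my-net --region us-central1 --range 10.130.0.0/20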

AWS Elastic Load Balancing: Seeing extremely long initial connection time

For a couple of days, we have often seen an extremely long initial connection time (15 s to 1.3 minutes) to our ELBs when making any request via SSL.
Oddly, I was only able to observe this in Google Chrome (not in Safari, Firefox, or curl).
It does not occur on every single request, but on around 50% of them, and always on the first request (an OPTIONS call).
Our setup is the following:
A cross-zone ELB that connects to a node.js backend (currently in 2 AZs in eu-west-1). All instances are healthy, and once a request comes through, it is processed normally. Currently there is basically no load on the system. CloudWatch for the ELB does not report any backend connection errors, nor any surge queue (value 0) or spillover count. The ELB metrics show low latency (< 100 ms).
We have Route 53 configured to route to the ELB (we don't see any DNS trouble).
We have different REST APIs that all share this setup. It occurs on all of the ELBs (each of them connecting to an independent node.js backend). All of these ELBs are set up the same way via our CloudFormation template.
The ELBs also do our SSL termination.
What could lead to such behavior? Is it possible that the ELBs are not configured properly? And why would it only appear in Google Chrome?
I think it is possibly an ELB misconfiguration. I had the same problem when I attached private subnets to the ELB. I fixed it by changing the private subnets to public ones. See https://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-manage-subnets.html
Just to follow up on @Nikita Ogurtsov's excellent answer: I had the same problem, except that just one of my subnets happened to be private and the rest public.
Even if you think your subnets are public, I recommend you double-check the route tables to ensure that they all have an Internet Gateway.
You can use a single route table that has an Internet Gateway for all your LB subnets, if that makes sense for you:
VPC/Subnets/(select subnet)/Route Table/Edit
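If you prefer the CLI, a quick check along these lines (the subnet ID is a placeholder) lists the routes attached to a subnet; you want a 0.0.0.0/0 route whose target is an igw-* gateway:

    # Show the routes of the route table associated with one subnet.
    aws ec2 describe-route-tables \
      --filters Name=association.subnet-id,Values=subnet-0abc1234 \
      --query 'RouteTables[].Routes[]'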
For me the issue was that I had an unused Availability Zone in my Classic Load Balancer. Once I removed the unhealthy and unused Availability Zone, the consistent 20- or 21-second delay in "Initial Connection" dropped to under 50 ms.
Note: you may need to give it time to update. I had my DNS TTL set to 60 seconds, so I saw the fix within a minute of removing the unused Availability Zone.
This can be a problem with Amazon's ELB itself. The ELB scales its number of instances with the number of requests.
You should see a peak of requests at those times.
Amazon adds instances in order to fit the load.
The new instances are not fully reachable during the launch process, so your clients hit those timeouts. It's essentially random, so you should:
resolve the ELB in order to get all the IPs used
use mtr on every IP found (see the sketch below)
keep an eye on CloudWatch
look for clues
Solution: if your DNS is configured to point directly at the ELB, you should reduce the TTL of the (IP, DNS) association. The ELB's IPs can change at any time, so stale records can seriously damage your traffic.
Clients keep some of the ELB's IPs in cache, so you can run into this kind of trouble.
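Something along these lines covers the first two steps (the ELB hostname is a placeholder; dig is used to enumerate the IPs):

    # Resolve the ELB's current IPs, then trace each one with mtr.
    for ip in $(dig +short my-elb-123456789.eu-west-1.elb.amazonaws.com); do
        echo "=== $ip ==="
        mtr --report --report-cycles 10 "$ip"
    done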
Scaling Elastic Load Balancers
Once you create an elastic load balancer, you must configure it to accept incoming traffic and route requests to your EC2 instances. These configuration parameters are stored by the controller, and the controller ensures that all of the load balancers are operating with the correct configuration. The controller will also monitor the load balancers and manage the capacity that is used to handle the client requests. It increases capacity by utilizing either larger resources (resources with higher performance characteristics) or more individual resources. The Elastic Load Balancing service will update the Domain Name System (DNS) record of the load balancer when it scales so that the new resources have their respective IP addresses registered in DNS. The DNS record that is created includes a Time-to-Live (TTL) setting of 60 seconds, with the expectation that clients will re-lookup the DNS at least every 60 seconds. By default, Elastic Load Balancing will return multiple IP addresses when clients perform a DNS resolution, with the records being randomly ordered on each DNS resolution request. As the traffic profile changes, the controller service will scale the load balancers to handle more requests, scaling equally in all Availability Zones.
Source: Best Practices ELB on AWS
An ALB load balancer needs 2 Availability Zones. If you use a private/public/NAT VPC setup, then all public subnets must have a connection to the Internet.
For me the issue was that the ALB was pointing to an Nginx instance, which had a misconfigured DNS resolver. This meant that Nginx tried to use the resolver, timed out, and then actually started working a bit later.
Not really connected with the load balancer itself, but maybe it helps someone figure out the issue in their own setup.
Check the security groups too. That was the issue in my case.
I see a similar problem in my Chrome logs (a 1.3-minute lag). It happens on an OPTIONS request, and from Wireshark I don't even see the request leaving the PC in the first place. Any suggestions as to what Chrome might be doing are welcome.
We recently encountered Chrome taking 1.3 minutes to load pages, but the cause was slightly different. Just popping it here in case it helps someone.
1.3 minutes seems to be how long Chrome will wait when trying to connect to a specific IP. Our domain name has multiple IP addresses in its A record (similar to a CNAME setup), and one of those IPs belonged to a server that had crashed. So sometimes the browser would connect quickly because it picked a valid IP, and sometimes we would get the long wait as it tried to connect to the dead IP, timed out, and then retried with a valid one.
So it is worth checking that all the IPs listed when you dig your domain are responding correctly.
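One way to test each A record individually (the domain is a placeholder) is to pin curl to each IP and compare connect times:

    # A dead IP shows a long or failed connect; healthy ones are fast.
    for ip in $(dig +short www.example.com); do
        curl --resolve "www.example.com:443:$ip" -o /dev/null -s \
             -w "$ip connect: %{time_connect}s\n" https://www.example.com/
    done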

Easy way to NAT compute engine instances

I assume support for NAT is already available with the routing and networking in Compute Engine? I'm looking for some easy-to-read documentation and commands to set up a situation where one instance acts as a router and other instances use it to access the public internet. Another scenario I'm looking at is how to let instances with no external IP address access the internet. Is there a gcutil-friendly way of scripting this up?
It sounds like you're looking for the Routes Collection. For your first case, the examples should show you how one instance can act as a gateway for other instances by setting a route for the internal nodes to use the gateway as a "next hop" for their traffic.
For your second scenario, there is a caveat listed that "Currently, any packets sent to the Internet must be sent by an instance that has an external IP address. If you create a route that sends packets to the Internet from a particular instance, that instance must also have an external IP. If you create a route that sends packets to the Internet gateway, but the source instance doesn't have an external IP address, the packet will be dropped."
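In other words, the gateway instance itself must have an external IP. A rough sketch of the gateway setup with today's gcloud tooling (names, zone, and tag are made up; gcutil took equivalent arguments):

    # The gateway must be allowed to forward packets it didn't originate.
    gcloud compute instances create nat-gateway --zone us-central1-a \
        --can-ip-forward

    # Send internet-bound traffic from instances tagged "no-ip"
    # through the gateway instance instead of the default gateway.
    gcloud compute routes create no-ip-internet-route --network default \
        --destination-range 0.0.0.0/0 \
        --next-hop-instance nat-gateway \
        --next-hop-instance-zone us-central1-a \
        --priority 800 --tags no-ip

    # On the gateway itself, enable forwarding and masquerading:
    #   sudo sysctl -w net.ipv4.ip_forward=1
    #   sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE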

How do you add additional NICs to a Compute Engine VM?

How do I add a NIC to a Compute Engine instance? I need more than one NIC so I can build out an environment... I've looked all over and there is nothing on how to do it...
I know it's probably some API call through the SDK, but I have no idea, and I can't find anything on it.
EDIT:
It's the RHEL 6 image; figured I should clarify.
This question is old and a lot has changed since. It's now definitely possible to add more NICs to an instance, but only at creation time (you can find a networking tab on the create-instance page in the console; a corresponding REST API exists too). Each NIC has to connect to a different network, so you need to create the additional networks before creating the instance (if you don't already have them).
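With the current gcloud CLI, that looks roughly like this (network and subnet names are made up; the machine type must be large enough to support the extra NICs):

    # All NICs must be declared at creation time, one network per NIC.
    gcloud compute instances create multi-nic-vm --zone us-central1-a \
        --network-interface network=net-0,subnet=sub-0 \
        --network-interface network=net-1,subnet=sub-1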
Do you need an external address or an internal address? If external, you can use gcutil to add an IP address to an existing instance. If internal, you can configure a static network address on the instance, and add a route entry to send traffic for that address to that instance.
I was looking for a similar thing (to have a VM which runs Apache and nginx simultaneously on different IPs), but it seems that although you can have multiple networks (up to 5) in a project and each network can belong to multiple instances, you cannot have more than one network per instance. From the documentation:
A project can contain multiple networks and each network can have multiple instances attached to it. [...] A network belongs to only one project and each instance can only belong to one network.