Monitoring unhealthy hosts on google cloud - google-compute-engine

I am using an external monitoring service (not Stackdriver).
I want to monitor the number of unhealthy hosts behind my load balancer.
It seems the Google Cloud API doesn't expose this metric,
so I implemented a custom script that gets the load balancer's instance groups, fetches each instance's data (DNS), and performs the health check itself.
This is pretty cumbersome. Is there a simpler way to do it?

You can use the command 'gcloud compute backend-services get-health' to get the status of each instance in your backend service. This command will provide the current status of each instance, HEALTHY or UNHEALTHY, that is part of your backend service.
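To feed that result into an external monitoring service, the command's JSON output can be reduced to a single number. Below is a minimal Python sketch, assuming the backend service is global and that `--format=json` yields a list of backends, each with a `status.healthStatus` list whose entries carry a `healthState` field. The function names and the service name `my-backend-service` are placeholders, not anything from the question.

```python
import json
import subprocess


def count_unhealthy(health_report):
    """Count instances whose healthState is anything other than HEALTHY."""
    unhealthy = 0
    for backend in health_report:
        # Assumption: each backend entry has status.healthStatus,
        # a list with one result per instance.
        for instance in backend.get("status", {}).get("healthStatus", []):
            if instance.get("healthState") != "HEALTHY":
                unhealthy += 1
    return unhealthy


def fetch_health_report(backend_service):
    """Run gcloud and return its parsed JSON output (assumes a global backend service)."""
    output = subprocess.check_output([
        "gcloud", "compute", "backend-services", "get-health",
        backend_service, "--global", "--format=json",
    ])
    return json.loads(output)


# Example usage (needs an authenticated gcloud on the PATH):
#   print(count_unhealthy(fetch_health_report("my-backend-service")))
```

For a regional backend service, `--global` would need to be swapped for `--region=...`.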

Related

Share state across Google Cloud Functions

I have a Google Cloud Function that I want to return the same value to all clients calling it. The value is set by another Google Cloud Function. I have this working using Firestore, but I want something that can store the value in memory or push the value change into an event queue.
If you are looking for low-latency, in-memory data storage, have a look at the Memorystore service. It is based on Redis and serves data by key-value access at low latency.
Memorystore is only available through a private IP in your VPC. To reach it, attach a serverless VPC Connector to your functions (both the one that writes and the one that reads) so that they can access your VPC and therefore the Memorystore service.
Take care to create your functions, your serverless VPC Connector, and your Memorystore instance in the same region to keep latency low.
If it doesn't work, check your firewall rules and allow traffic on the Redis port (6379).
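As a rough illustration of the writer and reader functions sharing one value, here is a minimal Python sketch. It assumes the third-party `redis` package is installed and that the Memorystore host is reachable through the connector. The key name `shared-value` and all function names are hypothetical.

```python
SHARED_KEY = "shared-value"  # hypothetical key name


def make_client(host, port=6379):
    """Create a Redis client for the Memorystore private IP."""
    # Imported lazily so the helpers below work with any object
    # that offers compatible get/set methods.
    import redis
    return redis.Redis(host=host, port=port)


def write_value(client, value):
    """Called by the function that sets the value."""
    client.set(SHARED_KEY, value)


def read_value(client, default=None):
    """Called by the function that serves the value to clients."""
    raw = client.get(SHARED_KEY)  # bytes, or None when the key is missing
    return default if raw is None else raw.decode("utf-8")
```

Creating the client once at module level, rather than on every invocation, usually lets warm function instances reuse the connection.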

Are custom metadata values for GCE instance stored securely?

I was wondering if custom metadata for google compute engine VM instances was an appropriate place to store sensitive information for configuring apps that run on the instance.
So we use container-optimised OS images to run microservices. We configure the containers with environment variables for things like creds for db connections and other systems we integrate with.
The VMs are treated as ephemeral for each CD deployment and the best I have come up with so far is to create an instance template with config values loaded via a file I keep on my local machine into the VM custom metadata, which is then made available to a systemctl unit when the VM starts up (cloud-config).
In essence, the environment variable values (some containing creds), which don't change very often, are uploaded by me and are then pulled from the VM instance metadata server when a new VM is fired up. So I'm just wondering if there are any significant security concerns with this approach...
Many thanks for your help
According to the Compute Engine documentation:
Is metadata information secure?
"When you make a request to get information from the metadata server, your request and the subsequent metadata response never leaves the physical host running the virtual machine instance."
Since the request and response do not leave the physical host, you will not be able to access the metadata from another VM or from outside Google Cloud Platform. However, any user with access to the VM will be able to query the metadata server and retrieve the information.
Based on the information you provided, storing credentials for a test or staging environment in this manner would be acceptable. However, if this is a production system holding customer data or information important to the business, I would keep the credentials in a secure store that tracks access. The data in the metadata server is not encrypted, and accesses are not logged.
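To make the exposure concrete: any process on the VM can read a custom metadata value with a plain HTTP request, as long as it sends the `Metadata-Flavor: Google` header. A minimal Python sketch follows. The function names are illustrative, and the key passed in would be whatever attribute name you chose.

```python
import urllib.request

# Metadata server path for custom instance attributes.
METADATA_BASE = (
    "http://metadata.google.internal/computeMetadata/v1/instance/attributes/"
)


def build_metadata_request(key):
    """Build the request; the Metadata-Flavor header is required by the server."""
    return urllib.request.Request(
        METADATA_BASE + key, headers={"Metadata-Flavor": "Google"}
    )


def read_metadata(key, timeout=2):
    """Fetch one custom metadata value. Only works from inside the VM."""
    with urllib.request.urlopen(build_metadata_request(key), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

No credentials are involved in this call, which is why anyone who can run code on the instance can read the values.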

AWS SQS to receive message from outside of AWS

My company has a messaging system that sends real-time messages in JSON format. It is not built on AWS and will not have any VPN connection to AWS.
Our team is trying to use AWS SQS to receive these messages, then have DynamoDB process the JSON messages to TSV and load them into RDS.
However, as per the FAQ, SQS can only receive messages from within AWS.
https://aws.amazon.com/sqs/faqs/
Q: Who can perform operations on a message queue?
Only an AWS account owner (or an AWS account that the account owner has delegated rights to) can perform operations on an Amazon SQS message queue.
In order to use SQS, one way I can think of is to create a public-facing EC2 instance, which receives messages and passes over to SQS.
My questions here are:
Is my idea correct?
If it's correct, can you share any details on how to build an application on this EC2 instance to achieve this functionality? (I have no experience in application development, so your insights are really appreciated!)
Are there any easier/better options in AWS to receive messages in my use case?
is my idea correct?
No, it isn't.
You're misinterpreting the (admittedly somewhat unclear) information in the FAQ.
SQS is accessible and usable from anywhere on the Internet. Its only exposed interface is HTTP(S). In fact, from inside EC2, SQS is not accessible unless the EC2 instance actually has outbound access to the Internet.
The point being made in the documentation is not that you need to be "inside" AWS to use queues, but rather that you need to be in possession of an authorized set of AWS credentials in order to work with queues.¹
If you have an AWS account, you have credentials, and you can use SQS. There is no requirement that you access the queue from "inside" AWS.
Choose the endpoint closest to your servers (for lowest latency) and you should find it open and accessible, from anywhere.
¹Queues can be configured to allow anonymous access after they are created. (Don't do it, I'm just saying it is possible.) This section of the FAQ seems to be referring to a subset of operations, such as creating queues.
I was not able to write to SQS from an external service. I found some partial explanations but got stuck at the role creation.
The alternative I found is using AWS services Lambda + API Gateway to write to SQS.
This tutorial was extremely helpful, explaining all the steps in great detail:
https://startupnextdoor.com/adding-to-sqs-queue-using-aws-lambda-and-a-serverless-api-endpoint/
You can access SQS from anywhere once you have the proper permissions, through an access key and secret key or an IAM role.
SQS is not specific to a VPC.
It is clear that you are trying to do this:
Take a message from your company's messaging system and send it to SQS.
Your method (using EC2 as a bridge) is not wrong. However, you don't need EC2 to connect to SQS.
All AWS services can be accessed through the AWS API (e.g. Python boto3) from the Internet, as long as you provide the correct credentials. So you can put your "middleware" anywhere, as long as it can establish a connection to those services.
So there are many more options available to you, e.g. triggering from your messaging system, using AWS Lambda, etc.
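For example, a small piece of middleware running outside AWS could forward each message with boto3. This is a minimal sketch, assuming the third-party `boto3` package is installed and AWS credentials are available in the environment. The function names are illustrative, and the queue URL and region would be your own.

```python
import json


def make_sqs_client(region_name):
    """Create an SQS client; credentials come from the environment or config files."""
    import boto3  # third-party AWS SDK, imported lazily
    return boto3.client("sqs", region_name=region_name)


def forward_message(client, queue_url, message):
    """Serialize one message as JSON, send it to the queue, and return its MessageId."""
    response = client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(message),
    )
    return response["MessageId"]
```

The IAM user or role behind the credentials needs permission for `sqs:SendMessage` on the target queue.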
Thanks for sharing the information and your insights with me!
I have tested the solution below, which works for my use case:
Created an endpoint in AWS API Gateway that receives messages from the company messaging system, a system that does not carry AWS credentials.
Created a Lambda function triggered by API Gateway, so once a message arrives, Lambda parses the JSON message, converts it to TSV, and loads it into RDS.
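The Lambda side of this setup might look roughly like the Python sketch below. It assumes API Gateway's proxy integration, which passes the request body as a string in `event["body"]`. The field names in `FIELDS` are hypothetical, and the RDS load is left out because it depends on the database engine and driver.

```python
import json

# Hypothetical column order for the TSV output.
FIELDS = ["id", "timestamp", "payload"]


def json_to_tsv(message, fields=FIELDS):
    """Turn one JSON object into a single tab-separated row."""
    # Tabs inside values are replaced so they cannot break the column layout.
    return "\t".join(str(message.get(f, "")).replace("\t", " ") for f in fields)


def lambda_handler(event, context):
    """Entry point invoked by API Gateway for each incoming message."""
    message = json.loads(event["body"])
    row = json_to_tsv(message)
    # Loading `row` into RDS would happen here (omitted in this sketch).
    return {"statusCode": 200, "body": json.dumps({"tsv": row})}
```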

Google Load-Balancing CDN

I am using the Google Load-Balancer with the CDN option enabled.
When I set up the Backend Configuration for the load-balancer, I set up a backend with instances in US-Central, US-West and US-East.
Everything is working great, except all traffic is being routed only to the US-West backend service.
Does the load-balancer option route traffic to the closest backend service?
I see that there is an advanced menu in the load balancer for creating forwarding rules, target proxies and more.
Is there something I need to do to make my load-balancer load closest to client?
If a client is in Florida and the CDN does not have the file, do they get routed to the US-East VM instance?
If that is not possible, it seems like having only a US-Central server would be better than having US-Central, US-East and US-West. That way East Coast misses would not go to the West Coast to get the file. Everything would pull from the central location.
Unless there is a way to route traffic from the load-balancer to the closest VM instance, it seems as if the only solution would be to create different load balancers with the CDN enabled and use DNS routing to point to the CDN pool that is closest.
That setup would use 3 different CDN IP addresses, 3 Compute Engine IP addresses, and DNS latency or location routing. If a client is in Florida, route them to the Google Load Balancer CDN on the East Coast.
I'm not sure that would be a good solution on top of the Anycast ip routing. It seems like overkill.
Thank you for listening and any help or guidance would be appreciated.
"By default, to distribute traffic to instances, Google Compute Engine picks an instance based on a hash of the source IP and port and the destination IP and port."
Similar question: "Google compute engine load balancing not routing properly", except that in my case all traffic in a live environment is going to the same VM instance.
I am using the Google CDN Frontend Anycast ip address.
I think Elving is right and there may be a mis-configuration. Here is a screen shot of the VM instances in the Google Cloud. It says the two instances aren't in use.
Here is another picture of the Instances Groups. I don't see a clear way to make the instances attached to the instance groups.
The load balancer will automatically route traffic to the nearest instance group with capacity. You don't need to do anything other than configure your backend service to use multiple instance groups.
There's more information at https://cloud.google.com/compute/docs/load-balancing/http/.

Hadoop cluster on Google Compute Engine: Accessing master node via REST

I have deployed a hadoop cluster on google compute engine. I then run a machine learning algorithm (Cloudera's Oryx) on the master node of the hadoop cluster. The output of this algorithm is accessed via an HTTP REST API. Thus I need to access the output either by a web browser, or via REST commands. However, I cannot resolve the address for the output of the master node which takes the form http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091.
I have allowed http traffic and allowed access to ports 80 and 8091 on the network. But I cannot resolve the address given. Note this http address is NOT the IP address of the master node instance.
I have followed along with examples for accessing IP addresses of compute instances. However, I cannot find examples of accessing a single node of a hadoop cluster on GCE, that follows this form http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091. Any help would be appreciated. Thank you.
The reason you're seeing this is that the "HOSTNAME.c.PROJECT.internal" name is only resolvable from within the GCE network of that same instance itself; these domain names are not globally visible. So, if you were to SSH into your master node first, and then try to curl http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091 then you should successfully retrieve the contents, whereas trying to access from your personal browser will just fail to resolve that hostname into any IP address.
So unfortunately, the quickest way for you to retrieve those contents is indeed to use the external IP address of your GCE instance. If you've already opened port 8091 on the network, simply use gcutil getinstance CLUSTER_NAME-m and look for the entry specifying external IP address; then plug that in as your URL: http://[external ip address]:8091.
If you turned up the cluster using bdutil, a more involved but nicer way to access your cluster is by running the bdutil socksproxy command. This opens a dynamic-port-forwarding SSH tunnel to your master node as a SOCKS5 proxy. You can then configure your browser to use localhost:1080 as its proxy server, make sure remote DNS resolution is enabled, and visit the normal http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091 URL in your browser.