Hadoop cluster on Google Compute Engine: Accessing master node via REST - google-compute-engine

I have deployed a hadoop cluster on google compute engine. I then run a machine learning algorithm (Cloudera's Oryx) on the master node of the hadoop cluster. The output of this algorithm is accessed via an HTTP REST API. Thus I need to access the output either by a web browser, or via REST commands. However, I cannot resolve the address for the output of the master node which takes the form http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091.
I have allowed http traffic and allowed access to ports 80 and 8091 on the network. But I cannot resolve the address given. Note this http address is NOT the IP address of the master node instance.
I have followed along with examples for accessing IP addresses of compute instances. However, I cannot find examples of accessing a single node of a hadoop cluster on GCE, that follows this form http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091. Any help would be appreciated. Thank you.

The reason you're seeing this is that the "HOSTNAME.c.PROJECT.internal" name is only resolvable from within the GCE network of that same instance itself; these domain names are not globally visible. So, if you were to SSH into your master node first, and then try to curl http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091 then you should successfully retrieve the contents, whereas trying to access from your personal browser will just fail to resolve that hostname into any IP address.
So unfortunately, the quickest way for you to retrieve those contents is indeed to use the external IP address of your GCE instance. If you've already opened port 8091 on the network, simply use gcutil getinstance CLUSTER_NAME-m and look for the entry specifying external IP address; then plug that in as your URL: http://[external ip address]:8091.
If you turned up the cluster using bdutil, a more involved but nicer way to access your cluster is by running the bdutil socksproxy command. This opens a dynamic-port-forwarding SSH tunnel to your master node as a SOCKS5 proxy, so that you can then configure your browser to use localhost:1080 as your proxy server, make sure to enable remote DNS resolution, and then visit your browser using the normal http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091 URL.

Related

Confusion about kubernetes Networking with external world

I have a confusion. When i try to access the services like mysql that are externally hosted or outside the cluster, what will be the source address of the packet that are sent to mysql. To make it simple while creating user in mysql for the api to access it, how do i create it?
For example:
CREATE USER 'newuser'#'IP or HOSTNAME' IDENTIFIED BY 'user_password';
What should be the IP? the pod IP or the host IP?
Is there any way through which if the pod is spawn in any node but it can authenticate against mysql?
Thank You
When accessing services outside the kubernetes cluster the source IP will be the regular IP of the node, the application is running on.
So there is no difference if you run the application directly on the node ("metal") or inside a container.
Kubernetes will select an appropriate node to schedule the container (pod) to, so this might change during the lifetime.
If you want to increase the security you should investigate TLS with mutual authentication in addition to the password. The source IP is not the best course of action for dynamic environments like cloud or kubernetes.

Which URL/IP to use, when accessing Kubernetes Nodes on Rancher?

I am trying to expose services to the world outside the rancher clusters.
Api1.mydomain.com, api2.mydomain.com, and so on should be accessible.
Inside rancher we have several clusters. I try to use one cluster specifically. It's spanning 3 nodes node1cluster1, node2cluster1 and node2cluster1.
I have added ingress inside the rancher cluster, to forward service requests for api1.mydomain.com to a specific workload.
On our DNS I entered the api1.mydomain.com to be forwarded, but it didn't work yet.
Which IP URL should I use to enter in the DNS? Should it be rancher.mydomain.com, where the web gui of rancher runs? Should it be a single node of the cluster that had the ingress (Node1cluster1)?
Both these options seem not ideal. What is the correct way to do this?
I am looking for a solution that exposes a full url to the outside world. (Exposing ports is not an option as the companies dns cant forward to them.)
Simple answer based on the inputs provided: Create a DNS entry with the IP address of Node1cluster1.
I am not sure how you had installed the ingress controller, but by default, it's deployed as "DaemonSet". So you can either use any one of the IP addresses of the cluster nodes or all the IP addresses of the cluster nodes. (Don't expect DNS to load balance though).
The other option is to have a load balancer in front with all the node IP addresses configured to actually distribute the traffic.
Another strategy that I have seen is to have a handful of nodes dedicated to run Ingress by use of taints/tolerations and not use them for scheduling regular workloads.

How to find the external IP?

I have a Python application which has been deployed to openshift.
I am using an external REST service in my application. In order to use this service, the developers of the REST service have to whitelist my IP because a Firewall blocks unauthorized IP addresses.
How can I find the external IP of my application? How can I find it in openshift? I tried a few OC commands, but I am not sure if I have to get the IP of the pod or the service.
Out of the box the traffic from internal cluster components will appear to external infrastructure like they are coming from whichever OpenShift compute host their pods are currently scheduled on.
Information on internal cluster networking and how traffic traverses from a process running inside a pod to the external network can be found at SDN: Packet Flow.
In your case you could have the external application whitelist all of the ip addresses of the compute hosts that are expected to run your application pods.
Alternately you could set up an EgressIP. This will cause all traffic originating from a specific OpenShift project to appear as if it is originating from a single ip address. You could then have your external application whitelist the EgressIP address.
Documentation for configuring EgressIP can be found in the official documentation under Enabling Static IPs for External Project Traffic
What you are searching for is the external IP of the Service. A Service acts as a load balancer for your pods but by default it only has a cluster-wide IP address. If you need a URL to access it from the outside, you can create a Route. For your purpose where you need an actual external IP address, you can assign the Service an external IP manually. Information on how to do this can be found in the official OpenShift Docs.

Cannot access Google Cloud Compute Instance External IP

I have set up an Google Cloud Compute Instance:
Machine type
n1-standard-1 (1 vCPU, 3.75 GB memory)
CPU platform
Intel Haswell
Zone
us-east1-c
I can ssh in using the external address.
I have installed the vncserver and can access it on port 5901 from localhost as well as the internal IP.
I am trying to access it from the static, external IP address but it is not working.
I have configured the firewall to open to port to 0.0.0.0/0, but it is not reachable.
Can anyone help?
------after further investigation from the tips from the two answers (thanks, both!), I have a partial answer:
The Google Cloud Compute instance was set, by default, to not allow
HTTP traffic. I reset the configuration to allow HTTP traffic. I
then tried the troubleshooting tip to run a small HTTP service in
python. I was able to get a ressponse from the service over the
internet.
The summary of the current situation is as follows:
The external IP address can be reached
It is enabled and working for SSH
It is enabled and working for HTTP
It does not seem to allow traffic from vncserver
Any idea how to configure the compute instance to allow for vncserver traffic?
If you already verified that Google Firewall or your VM are not blocking packets, you must make sure that VNC service is configured to listen on the external IP address.
You can always use a utility like nmap outside Google project to reveal information on the port status.
enable http/https traffic form the firewall as per the need. it will work!!
The Google Cloud Compute instance was set, by default, to not allow HTTP traffic. I reset the configuration to allow HTTP traffic. I then tried the troubleshooting tip to run a small HTTP service in python. I was able to get a response from the service over the internet.
As such, the original question is answered, I can access Google Cloud Compute Instance External IP. My wider issue is still not solved, but I will post a new, more specific question about this issue
TLDR: make sure you are requesting http not https
In my case i was following the link from my CE instance's External Ip property which takes you directly to the https version and i didn't set up https, so that was causing the 'site not found' error.
Create an entry in your local ssh config file as below with mentioned local forward port. In my case its an example of yarn's IP, which I want to access in browser.
Host hadoop
HostName <External-IP>
User <Local-machine-username>
IdentityFile ~/.ssh/<private-key-for-above-user>
LocalForward 8089 <Internal-IP>:8088
In addition to having the firewall rules to allow HTTP traffic in both Google Cloud Platform and within the OS of the instance, make sure you install a web server such as Apache or Nginx.
After installing the web server, you connect to the instance using SSH and verify you do not get a failed connection with the following command:
$ sudo wget http://localhost
If the connection is positive, it means that you can access your external URL:
http://<IP-EXTERNAL-VM>
Usually there are two main things to check.
1. Port
By default, only port 80, 443 and ICMP are exposed. If your server is running on a different port, create a record for the same.
2. Firewall
Make sure you are allowing http and https traffic based on your need.
oua re
For me the problem was that I set up the traffic for the firewall rule to be 'Egress' instead of 'Ingress'.
If anyone already initiated 'https'
just disable it and check again.

Access external client IP from behind Google Compute Engine network load balancer

I am running a Ruby on Rails app (using Passenger in Nginx mode) on Google Container Engine. These pods are sitting behind a GCE network load balancer. My question is how to access the external Client IP from inside the Rails app.
The Github issue here seems to present a solution, but I ran the suggested:
for node in $(kubectl get nodes -o name | cut -f2 -d/); do
kubectl annotate node $node \
net.beta.kubernetes.io/proxy-mode=iptables;
gcloud compute ssh --zone=us-central1-b $node \
--command="sudo /etc/init.d/kube-proxy restart";
done
but I am still getting a REMOTE_ADDR header of 10.140.0.1.
On ideas on how I could get access to the real Client IP (for geolocation purposes)?
Edit: To be more clear, I am aware of the ways of accessing the client IP from inside Rails, however all of these solutions are getting me the internal Kubernetes IP, I believe the GCE network load balancer is not configured (or perhaps unable) to send the real client IP.
A Googler's answer to another version of my question verifies what I am trying to do is not currently possible with the Google Container Engine Network Load Balancer currently.
EDIT (May 31, 2017): as of Kubernetes v1.5 and up this is possible on GKE with the beta annotation service.beta.kubernetes.io/external-traffic. This was answered on SO here. Please note when I added the annotation the health checks were not created on the existing nodes. Recreating the LB and restarting the nodes solved the issue.
It seems as though this is not a rails problem at all, but one of GCE. You can try the first part of
request.env["HTTP_X_FORWARDED_FOR"]
Explanation
Getting Orgin IP From Load Balancer advises that https://cloud.google.com/compute/docs/load-balancing/http/ has the text
The proxies set HTTP request/response headers as follows:
Via: 1.1 google (requests and responses)
X-Forwarded-Proto: [http | https] (requests only)
X-Forwarded-For: <client IP(s)>, <global forwarding rule external IP> (requests only)
Can be a comma-separated list of IP addresses depending on the X-Forwarded-For entries appended by the intermediaries the client is
traveling through. The first element in the section
shows the origin address.
X-Cloud-Trace-Context: <trace-id>/<span-id>;<trace-options> (requests only)
Parameters for Stackdriver Trace.