IPFS 504 Gateway Time-out with public gateways from local node

I installed Kubo on my local PC and ran it with the ipfs daemon command. After that, I added some text content; it works fine and I can access it from a public IPFS gateway. But when I add an image or any other file type and try to access it from a public IPFS gateway, it shows "504 Gateway Time-Out (Openresty)". It also takes a long time to load any content; it's too slow. Is it possible to make it faster? How can I fix the "504 Gateway Time-Out" issue?

A gateway is just an IPFS node with an HTTP server bolted onto it (more or less). If the gateway cannot find the data you've requested, it will time out.
If the only node containing the data is your home node, you must ensure your node is discoverable so the gateway node can find and access your data. Personally I like to put my data on web3.storage if I need the popular public gateways to find it.
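For example, you can check that your node is reachable and re-announce the content yourself. A minimal sketch, assuming a default Kubo install (<CID> is a placeholder for your content's CID):
ipfs id                      # confirm the daemon is running and note its addresses
ipfs swarm peers | head      # verify you are actually connected to other peers
ipfs routing provide <CID>   # re-announce the CID to the DHT (ipfs dht provide on older Kubo)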
For more info check out my answer on the related question Why is it so hard for web browsers to open IPFS links?.

Related

Do all IPFS requests go through a public gateway?

I added a local directory with ipfs add -r . and was able to access it through the https://ipfs.io/ipfs gateway using the hash.
I was also able to get the files from another node using ipfs get -o <file-name> <hash>
Is the file served through the ipfs.io gateway or through local gateways of other decentralized participating nodes?
TLDR: No
The go/js-ipfs CLIs will not make any HTTP-related requests, to public gateways or otherwise, when you perform an ipfs get.
Gateways, public or local, are just a convenient way of bridging the IPFS protocol stack with the standard experience of performing an HTTP request for some data. A local gateway lets you use standard HTTP-based applications (e.g. web browsers, curl, etc.) while still utilizing your locally running IPFS daemon under the hood. Public gateways, on the other hand, let you use standard HTTP-based applications while using someone else's (i.e. publicly run infrastructure's) IPFS daemon under the hood.
The main utility of the public gateways is making content that peers have in the public IPFS network available over HTTP to people and applications that are not able to run IPFS.
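As an illustration, the same content can be fetched three ways (<CID> is a placeholder, and 8080 is Kubo's default local gateway port):
curl http://127.0.0.1:8080/ipfs/<CID>   # local gateway: your own daemon serves the bytes
curl https://ipfs.io/ipfs/<CID>         # public gateway: ipfs.io's daemon serves them
ipfs get <CID>                          # pure IPFS: no HTTP gateway involved at all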

Access external client IP from behind Google Compute Engine network load balancer

I am running a Ruby on Rails app (using Passenger in Nginx mode) on Google Container Engine. These pods sit behind a GCE network load balancer. My question is how to access the external client IP from inside the Rails app.
The GitHub issue here seems to present a solution, so I ran the suggested:
for node in $(kubectl get nodes -o name | cut -f2 -d/); do
  kubectl annotate node $node \
    net.beta.kubernetes.io/proxy-mode=iptables;
  gcloud compute ssh --zone=us-central1-b $node \
    --command="sudo /etc/init.d/kube-proxy restart";
done
but I am still getting a REMOTE_ADDR header of 10.140.0.1.
Any ideas on how I could get access to the real client IP (for geolocation purposes)?
Edit: To be clear, I am aware of the ways of accessing the client IP from inside Rails; however, all of these solutions give me the internal Kubernetes IP. I believe the GCE network load balancer is not configured (or perhaps is unable) to forward the real client IP.
A Googler's answer to another version of my question confirms that what I am trying to do is not currently possible with the Google Container Engine network load balancer.
EDIT (May 31, 2017): as of Kubernetes v1.5 and up this is possible on GKE with the beta annotation service.beta.kubernetes.io/external-traffic. This was answered on SO here. Please note when I added the annotation the health checks were not created on the existing nodes. Recreating the LB and restarting the nodes solved the issue.
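For reference, that beta annotation was applied per Service; a sketch, assuming a Service named my-service:
kubectl annotate service my-service service.beta.kubernetes.io/external-traffic=OnlyLocal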
It seems as though this is not a Rails problem at all, but a GCE one. You can try the first element of
request.env["HTTP_X_FORWARDED_FOR"]
Explanation
Getting Origin IP From Load Balancer notes that https://cloud.google.com/compute/docs/load-balancing/http/ contains the following text:
The proxies set HTTP request/response headers as follows:
Via: 1.1 google (requests and responses)
X-Forwarded-Proto: [http | https] (requests only)
X-Forwarded-For: <client IP(s)>, <global forwarding rule external IP> (requests only)
Can be a comma-separated list of IP addresses, depending on the X-Forwarded-For entries appended by the intermediaries the client is traveling through. The first element in the <client IP(s)> section shows the origin address.
X-Cloud-Trace-Context: <trace-id>/<span-id>;<trace-options> (requests only)
Parameters for Stackdriver Trace.
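So the client address is the first comma-separated element of that header. A quick shell sketch of the extraction (the header value here is made up):
XFF="203.0.113.7, 130.211.1.166"
echo "$XFF" | cut -d',' -f1   # -> 203.0.113.7, the origin client IP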

Google Compute Engine HTTP Load Balancing - Log Files Available?

We are having a weird issue with HTTP Load Balancing. Is there a way to view log files, to troubleshoot why a request would be failing with a (502) Bad Gateway?
Traffic from the load balancer to your instances has an IP address in the range of 130.211.0.0/22. When viewing logs on your load balanced instances, you will not see the source address of the original client. Instead, you will see source addresses from this range.
The load balancing configuration automatically creates firewall rules if the instance operating system is a Compute Engine image. If not, you have to create the firewall rules manually by adding the following to your GCE firewall:
130.211.0.0/22 tcp:1-5000 Apply to all targets
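If you do need to create the rule manually, something like the following gcloud command should work (the rule name here is arbitrary):
gcloud compute firewall-rules create allow-lb-traffic \
    --source-ranges=130.211.0.0/22 \
    --allow=tcp:1-5000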
A 502 error can also be caused by an unhealthy instance. Make sure that your instance is healthy; you can narrow down the issue by curling your instance's IP address behind the load balancer to check whether it returns the expected output.
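For example, from a VM in the same network (the instance IP is a placeholder):
curl -v http://10.240.0.2/   # a healthy backend should return your app's normal response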

Server Sent Events in Google Compute Engine

I'm trying to get an app that uses Server-Sent Events working on Google Compute Engine. When SSH'd into the box I can view them, but not externally via the ephemeral IP, i.e.
curl 0.0.0.0/route
works from inside the box but
curl xx.xx.xx.xx/route
just hangs. Looking at the headers from other routes, there seems to be some sort of caching proxy between the box and the outside world that is preventing Server-Sent Events from getting out because the connection hasn't completed. There is a similar issue with nginx until you set proxy_cache off, but as far as I can tell there is no documentation for configuring the proxy that Compute Engine uses.
Is it possible to do Server-Sent Events from Google Compute Engine, and if so, what do you have to do to get it to work?
Edit:
Request is created with the browser EventSource object, so it has the default headers which look to be Accept:text/event-stream, Cache-Control:no-cache, plus Referer and User-Agent.
The headers I add are Content-Type:text/event-stream, Cache-Control:no-cache, and Connection:keep-alive.
On AWS everything works fine when I run it behind nginx, assuming I modify the config appropriately.
On Google Compute Engine other pages load fine, but the route with Server-Sent Events just hangs, never even receiving headers. The reason I suspect Google is sticking a proxy between the GCE box and the outside world is the addition of Via: HTTP/1.1 proxy10205 headers.
There may be magic on the lower network layers, but there is no proxy (transparent or otherwise) between your VM and the internet on GCE for the external IP. I'm not sure where the Via header comes from; doesn't the browser/client have a proxy configured?
External IPs are not configured in the most straightforward way on GCE, though, which might be tripping up something in the stack. For external IPs, the external IP itself does not appear anywhere in the VM config; it's translated to the VM's internal IP by 1-1 NAT. Load-balanced IPs do end up on the host with the external IP visible, though (even these are configured in an unusual way).
Even though I don't think anything should really care about the server IP for SSE, maybe try setting up a load-balanced IP pointing to just that one instance and see if it works any better?
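A rough sketch of that setup with gcloud (the instance, resource names, and zone are hypothetical; check the flags against the gcloud docs for your version):
gcloud compute target-instances create sse-ti --instance=my-sse-vm --zone=us-central1-b
gcloud compute forwarding-rules create sse-fr --region=us-central1 --ports=80 \
    --target-instance=sse-ti --target-instance-zone=us-central1-b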
"Via:HTTP/1.1 proxy10205" in your HTTP response is not from Google Compute Engine.
The GCE does not strip out the Server-Sent-Events headers. I list the simple steps below which can help you to configure a demo Server-Sent Events on an GCE VM instance:
1. Create a GCE instance using a CentOS image.
2. Install the Apache web server and PHP:
$ sudo yum install httpd php
3. Create an index.html file with the HTML content from this page:
$ sudo vi /var/www/html/index.html
4. Create a PHP file called demo_sse.php in the www root directory ($ sudo vi /var/www/html/demo_sse.php) with the following content:
<?php
// SSE responses must be served with the text/event-stream MIME type
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');
$time = date('r');
// Each SSE message is a "data:" line terminated by a blank line
echo "data: The server time is: {$time}\n\n";
flush();
?>
Now visit the webpage. You can also verify the headers using the curl command:
$ curl -H "Accept:text/event-stream" --verbose http://<YOUR-GCE-IP-ADDRESS>/demo_sse.php

Hadoop cluster on Google Compute Engine: Accessing master node via REST

I have deployed a Hadoop cluster on Google Compute Engine. I then run a machine-learning algorithm (Cloudera's Oryx) on the master node of the cluster. The output of this algorithm is accessed via an HTTP REST API, so I need to access the output either with a web browser or via REST commands. However, I cannot resolve the address for the output of the master node, which takes the form http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091.
I have allowed HTTP traffic and allowed access to ports 80 and 8091 on the network, but I still cannot resolve the address given. Note that this HTTP address is NOT the IP address of the master node instance.
I have followed along with examples for accessing IP addresses of compute instances. However, I cannot find examples of accessing a single node of a Hadoop cluster on GCE that follows this form: http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091. Any help would be appreciated. Thank you.
The reason you're seeing this is that the "HOSTNAME.c.PROJECT.internal" name is only resolvable from within that project's GCE network; these domain names are not globally visible. So if you were to SSH into your master node first and then curl http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091, you should successfully retrieve the contents, whereas trying to access it from your personal browser will simply fail to resolve that hostname into an IP address.
So unfortunately, the quickest way for you to retrieve those contents is indeed to use the external IP address of your GCE instance. If you've already opened port 8091 on the network, simply use gcutil getinstance CLUSTER_NAME-m and look for the entry specifying the external IP address; then plug that in as your URL: http://[external ip address]:8091.
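gcutil has since been deprecated; a roughly equivalent lookup with gcloud (the zone is a placeholder) would be:
gcloud compute instances describe CLUSTER_NAME-m --zone=us-central1-b \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'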
If you turned up the cluster using bdutil, a more involved but nicer way to access your cluster is to run the bdutil socksproxy command. This opens a dynamic port-forwarding SSH tunnel to your master node, acting as a SOCKS5 proxy; configure your browser to use localhost:1080 as its proxy server, make sure to enable remote DNS resolution, and then visit the normal http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091 URL.
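Under the hood this is roughly equivalent to opening a plain SSH dynamic forward yourself, e.g. (the zone is a placeholder):
gcloud compute ssh CLUSTER_NAME-m --zone=us-central1-b -- -D 1080 -N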