GCE container logs not showing up in Cloud Logging

I have a fairly recent Kubernetes cluster running on GCE and am trying to get my application to log to Cloud Logging / Stackdriver.
I can see all of the Kubernetes cluster logs there, but no container output ever materializes.
When I follow this guide, http://kubernetes.io/docs/getting-started-guides/logging/, I can see the pod's output locally:
kubectl logs counter
2163: Wed Aug 31 15:02:52 UTC 2016
This output never makes it to the Logging interface, and the pod does not show up in the logs selector.
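For reference, the counter pod from the guide just prints an incrementing number and the date once per second; a rough equivalent (the image and kubectl run flags here are my own approximation, not the guide's exact spec) is:
kubectl run counter --restart=Never --image=busybox -- \
  sh -c 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done'
Anything the container writes to stdout/stderr should be picked up by the node-local fluentd agent and shipped to Cloud Logging.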
The fluentd-cloud-logging pods produce no log output:
kubectl logs --namespace=kube-system fluentd-cloud-logging-staging-minion-group-20hk
The /var/log/google-fluentd/google-fluentd.log file looks happy
...
2016-08-31 14:07:16 +0000 [info]: following tail of /var/log/containers/node-problem-detector-v0.1-hgtcr_kube-system_POD-07e5b134c9f8ff48f73f1df41473a84a07738ac750840f09938d604694c4bd6e.log
2016-08-31 14:07:16 +0000 [info]: following tail of /var/log/containers/rails-2607986313-s7r5e_default_POD-9f1dd02f23de552a40297f761d09c03b50e5a2cd9789ef498139d24602d9847e.log
2016-08-31 14:07:16 +0000 [info]: following tail of /var/log/salt/minion
2016-08-31 14:07:16 +0000 [info]: following tail of /var/log/startupscript.log
2016-08-31 14:07:16 +0000 [info]: following tail of /var/log/docker.log
2016-08-31 14:07:16 +0000 [info]: following tail of /var/log/kubelet.log
2016-08-31 14:07:22 +0000 [info]: Successfully sent to Google Cloud Logging API.
2016-08-31 14:07:22 +0000 [info]: Successfully sent to Google Cloud Logging API.
The Kubernetes version is:
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.5", GitCommit:"b0deb2eb8f4037421077f77cb163dbb4c0a2a9f5", GitTreeState:"clean", BuildDate:"2016-08-11T20:29:08Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.5", GitCommit:"b0deb2eb8f4037421077f77cb163dbb4c0a2a9f5", GitTreeState:"clean", BuildDate:"2016-08-11T20:21:58Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
The cluster was started with:
export KUBE_GCE_ZONE=europe-west1-d
export NODE_SIZE=n1-standard-2
export NUM_NODES=2
export KUBE_GCE_INSTANCE_PREFIX=staging
export ENABLE_CLUSTER_AUTOSCALER=true
export KUBE_ENABLE_CLUSTER_MONITORING=true
export KUBE_ENABLE_CLUSTER_MONITORING=google
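For reference, node logging on kube-up.sh GCE clusters is normally controlled by the following variables (names and defaults recalled from the v1.3-era cluster/gce config, so treat this as a sketch and check your own checkout):
export KUBE_ENABLE_NODE_LOGGING=true     # run a fluentd agent on every node
export KUBE_LOGGING_DESTINATION=gcp      # ship logs to Cloud Logging rather than Elasticsearch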
Any ideas what I might be doing wrong? To my understanding this should work out of the box, right?

Bit of a long shot, but have you enabled the logging API?
"You can do so from the Developers Console, here. Try going there, clicking the Enable API button, and seeing whether the errors keep coming."
https://github.com/kubernetes/kubernetes/issues/20516
(See also: Google Cloud Logging + google-fluentd Dropping Messages, below.)

OK, this is rather silly:
If you run a Kubernetes cluster on GCE, the container application logs show up under the Google Container Engine logs in the Logs Viewer.
I never bothered checking there because, well, I am not using Container Engine.
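For anyone else looking, the same entries can also be read from the CLI instead of the Logs Viewer; a rough sketch (the resource type name and whether a beta command is needed depend on your gcloud SDK version, so treat the filter as illustrative):
gcloud logging read 'resource.type="container"' --limit 10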

Related

Stackdriver Monitoring with full access scope not authorized

After deploying a brand-new Google Compute Engine instance with full API access and installing the Stackdriver agent, Monitoring is not showing any metrics from the agent.
According to the Install Agent manual, no further settings (such as manually configuring an API key) should be required.
The agent service status also shows the following error:
$ systemctl status stackdriver-agent
Jul 13 10:14:00 host stackdriver-agent[21203]: [ OK ]
Jul 13 10:14:00 host systemd[1]: Started LSB: start and stop Stackdriver Agent.
Jul 13 10:14:00 host collectd[21226]: Initialization complete, entering read-loop.
Jul 13 10:14:00 host collectd[21226]: match_throttle_metadata_keys: 1 history entries, 1 distinct keys, 46 bytes server memory.
Jul 13 10:14:00 host collectd[21226]: tcpconns plugin: Reading from netlink succeeded. Will use the netlink method from now on.
Jul 13 10:14:00 host collectd[21226]: write_gcm: Asking metadata server for auth token
Jul 13 10:14:01 host collectd[21226]: write_gcm: Unsuccessful HTTP request 403: {
"error": {
"code": 403,...
Jul 13 10:14:01 host collectd[21226]: write_gcm: Error talking to the endpoint.
Jul 13 10:14:01 host collectd[21226]: write_gcm: wg_transmit_unique_segment failed.
Jul 13 10:14:01 host collectd[21226]: write_gcm: wg_transmit_unique_segments failed. Flushing.
Google Cloud Console shows the instance having:
Cloud API access scopes
This instance has full API access to all Google Cloud services.
and running the following command inside the instance shows:
$ curl --silent -f -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/scopes
https://www.googleapis.com/auth/cloud-platform
Any thoughts on what is going wrong?
I figured it out:
You have to enable the Google Monitoring API in the API Manager; it is not enabled by default. There is no need to specify an API key; the default application credentials are picked up.
Interestingly, I have two projects that have also been using Stackdriver Monitoring since early this year, and those do not require the Google Monitoring API to be enabled.
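If you prefer the CLI over the API Manager UI, something like the following should work (the command form is an assumption based on current gcloud releases; older SDKs exposed this under service-management instead):
gcloud services enable monitoring.googleapis.com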

GCE Instance Not Found

I'm trying to set up a Kubernetes cluster on GCE using CoreOS as the base OS, but I'm running into the following issue when trying to make the cluster a multizone cluster by setting the --cloud-provider and --cloud-config flags.
Below is the output from the kubelet (kubelet-wrapper) on the master node:
Jun 15 09:22:09 cos-000-pub-pvt-master.c.project-id.internal kubelet-wrapper[1098]: E0615 09:22:09.790068 1098 gce.go:2380] Failed to retrieve instance: "10.0.0.2"
Jun 15 09:22:09 cos-000-pub-pvt-master.c.project-id.internal kubelet-wrapper[1098]: E0615 09:22:09.790125 1098 gce.go:2414] getInstanceByName/multiple-zones: failed to get instance 10.0.0.2; err: instance not found
Jun 15 09:22:09 cos-000-pub-pvt-master.c.project-id.internal kubelet-wrapper[1098]: E0615 09:22:09.790151 1098 kubelet.go:1131] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: instance not found
Running kubectl get nodes returns no output, but kubectl --namespace kube-system get pods shows the API server, controller manager, scheduler, and a proxy for each of the nodes. Although I can see them, they are restarted every 45-60 seconds.
The GCE config file is as follows:
[GLOBAL]
multizone=true
If I've left out anything that would help, let me know.
It turns out the --hostname-override flag was causing this issue. I removed it, and the master is now able to find the node via the GCE API.
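For context, a sketch of the flags involved (paths are illustrative, not my exact unit file): with the GCE cloud provider, the name the kubelet registers under must match the GCE instance name, so overriding it with an IP such as 10.0.0.2 makes the instance lookup fail with "instance not found".
kubelet \
  --cloud-provider=gce \
  --cloud-config=/etc/kubernetes/gce.conf
# i.e. do not also pass: --hostname-override=10.0.0.2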

Google Cloud Logging + google-fluentd Dropping Messages

I have a rather small (1-2 node) Kubernetes cluster running in GKE with roughly 40 pods. The problem at hand is that it's not logging to the GCE console properly. I see lots of messages from the fluentd container(s) in the following format:
$ kubectl logs fluentd-cloud-logging-gke-xxxxxxxx-node-xxxx
2016-02-02 23:30:09 +0000 [warn]: Dropping 10 log message(s) error_class="Google::APIClient::ClientError" error="Project has not enabled the API. Please use Google Developers Console to activate the 'logging' API for your project."
2016-02-02 23:30:09 +0000 [warn]: Dropping 1 log message(s) error_class="Google::APIClient::ClientError" error="Project has not enabled the API. Please use Google Developers Console to activate the 'logging' API for your project."
2016-02-02 23:30:09 +0000 [warn]: Dropping 3 log message(s) error_class="Google::APIClient::ClientError" error="Project has not enabled the API. Please use Google Developers Console to activate the 'logging' API for your project."
2016-02-02 23:30:09 +0000 [warn]: Dropping 41 log message(s) error_class="Google::APIClient::ClientError" error="Project has not enabled the API. Please use Google Developers Console to activate the 'logging' API for your project."
2016-02-02 23:30:09 +0000 [warn]: Dropping 5 log message(s) error_class="Google::APIClient::ClientError" error="Project has not enabled the API. Please use Google Developers Console to activate the 'logging' API for your project."
...and so on. I'm seeing ~5 of these messages per second, so I know things are producing logs. However, in the Compute Engine console only a fraction of those entries actually show up.
So somewhere in between I'm obviously losing lots of messages. Strangely, though, I'm not losing all of them!
The cluster is configured with Logging.write and Monitoring.all privileges, as suggested in GH issue #15727.
It's definitely confusing that some of the logs are showing up. Given that error message, I'd expect none of your logs to be showing up in the viewer, since it sounds like the logging API hasn't been enabled for your project yet.
You can do so from the Developers Console, here. Try going there, clicking the Enable API button, and seeing whether the errors keep coming.
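If you'd rather work from the CLI, you can check and enable the API there too (command names are an assumption based on current gcloud; older SDKs exposed these under service-management):
gcloud services list --enabled | grep -i logging    # is the Logging API already on?
gcloud services enable logging.googleapis.com       # if not, turn it on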

Message Archive Management Plugin (Prosody) can't open archive

I'm trying to get Message Archive Management (mam) working on a Prosody server.
I tried it with SQLite3, MySQL, and PostgreSQL.
I always get this log output:
Oct 20 14:56:21 general info Hello and welcome to Prosody version 0.9.7
Oct 20 14:56:21 general info Prosody is using the epoll backend for connecti$
Oct 20 14:56:22 localhost:mam error Could not open archive storage
The archive exists in /var/lib/prosody/.

Google Cloud SDK 0.9.39 still fails to do setup-managed-vms

I just updated my gcloud components on my OS X Mavericks laptop, so now I have:
$ gcloud version
Google Cloud SDK 0.9.39
$ boot2docker version
Boot2Docker-cli version: v1.3.2
Git commit: e41a9ae
I was hoping the managed VMs setup would work, but alas:
$ gcloud preview app setup-managed-vms
Select the runtime to download the base image for:
[1] Go
[2] Java
[3] Python27
[4] All
Please enter your numeric choice (4): 2
Pulling base images for runtimes [java] from Google Cloud Storage
Pulling image: google/appengine-java
Traceback (most recent call last):
File "/Users/hussein/google-cloud-sdk/./lib/googlecloudsdk/gcloud/gcloud.py", line 175, in <module>
main()
File "/Users/hussein/google-cloud-sdk/./lib/googlecloudsdk/gcloud/gcloud.py", line 171, in main
_cli.Execute()
File "/Users/hussein/google-cloud-sdk/./lib/googlecloudsdk/calliope/cli.py", line 385, in Execute
post_run_hooks=self.__post_run_hooks, kwargs=kwargs)
File "/Users/hussein/google-cloud-sdk/./lib/googlecloudsdk/calliope/frontend.py", line 274, in _Execute
pre_run_hooks=pre_run_hooks, post_run_hooks=post_run_hooks)
File "/Users/hussein/google-cloud-sdk/./lib/googlecloudsdk/calliope/backend.py", line 928, in Run
result = command_instance.Run(args)
File "/Users/hussein/google-cloud-sdk/lib/googlecloudsdk/appengine/app_commands/setup_managed_vms.py", line 39, in Run
args.image_version)
File "/Users/hussein/google-cloud-sdk/./lib/googlecloudsdk/appengine/lib/images/pull.py", line 54, in PullBaseDockerImages
util.PullSpecifiedImages(docker_client, image_names, version, bucket)
File "/Users/hussein/google-cloud-sdk/./lib/googlecloudsdk/appengine/lib/images/util.py", line 232, in PullSpecifiedImages
'Error pulling {image}: {e}'.format(image=image_name, e=e))
googlecloudsdk.appengine.lib.images.util.DockerPullError: Error pulling google/appengine-java: 500 Server Error: Internal Server Error ("Invalid registry endpoint "http://localhost:49153/v1/". HTTPS attempt: Get https://localhost:49153/v1/_ping: read tcp 127.0.0.1:49153: connection reset by peer. HTTP attempt: Get http://localhost:49153/v1/_ping: read tcp 127.0.0.1:49153: connection reset by peer")
I'm very new to Docker and managed VMs, and I'm wondering if the issue is due to my boot2docker port-forwarding setup. My environment is set up correctly, I think:
$ env | grep DOCKER
DOCKER_HOST=tcp://192.168.59.103:2376
DOCKER_TLS_VERIFY=1
DOCKER_CERT_PATH=/Users/h/.boot2docker/certs/boot2docker-vm
With the docker host IP being:
$ boot2docker ip
docker#localhost's password:
The VM's Host only interface IP address is: 192.168.59.103
Finally, the containers on my system so far are:
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
73c6d317a631 google/docker-registry:latest "./run.sh" 2 minutes ago Exited (-1) About a minute ago goofy_archimedes
40b709d6fa00 gcloud-credentials-image:latest "/true" 2 minutes ago Exited (0) 2 minutes ago gcloud-credentials-1417828737.2
a3073bc56ff2 google/docker-registry:latest "./run.sh" 47 hours ago Exited (-1) 47 hours ago distracted_bell
1b6fe130af45 11cd171d89b3 "/true" 47 hours ago Exited (0) 47 hours ago gcloud-credentials-1417707423.48
28c181e66b11 google/docker-registry:latest "./run.sh" 2 days ago Exited (0) 4 minutes ago 0.0.0.0:5000->5000/tcp elegant_darwin
So why is the gcloud Python script trying to access the registry on localhost? Someone, please show me the light!
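As a sanity check (illustrative, not a confirmed fix): the traceback points at a registry on localhost:49153, while the only google/docker-registry container publishing a port (elegant_darwin, 5000/tcp) has exited, so it may be worth confirming a registry container is actually running and reachable before retrying:
docker ps | grep docker-registry                      # any registry container still up?
curl -s http://192.168.59.103:5000/v1/_ping; echo     # reachable on the boot2docker VM IP?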