Stackdriver Monitoring with full access scope not authorized - google-compute-engine

After deploying a brand new Google Compute Engine instance with full API access and installing the Stackdriver agent, Stackdriver Monitoring does not show any metrics from the agent.
According to the Install Agent manual, no further settings (such as manually configuring an API key) should be required.
The agent service status also shows the following error:
$ systemctl status stackdriver-agent
Jul 13 10:14:00 host stackdriver-agent[21203]: [ OK ]
Jul 13 10:14:00 host systemd[1]: Started LSB: start and stop Stackdriver Agent.
Jul 13 10:14:00 host collectd[21226]: Initialization complete, entering read-loop.
Jul 13 10:14:00 host collectd[21226]: match_throttle_metadata_keys: 1 history entries, 1 distinct keys, 46 bytes server memory.
Jul 13 10:14:00 host collectd[21226]: tcpconns plugin: Reading from netlink succeeded. Will use the netlink method from now on.
Jul 13 10:14:00 host collectd[21226]: write_gcm: Asking metadata server for auth token
Jul 13 10:14:01 host collectd[21226]: write_gcm: Unsuccessful HTTP request 403: {
"error": {
"code": 403,...
Jul 13 10:14:01 host collectd[21226]: write_gcm: Error talking to the endpoint.
Jul 13 10:14:01 host collectd[21226]: write_gcm: wg_transmit_unique_segment failed.
Jul 13 10:14:01 host collectd[21226]: write_gcm: wg_transmit_unique_segments failed. Flushing.
Google Cloud Console shows the instance having:
Cloud API access scopes
This instance has full API access to all Google Cloud services.
and running the following command inside the instance shows:
$ curl --silent -f -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/scopes
https://www.googleapis.com/auth/cloud-platform
Any thoughts on what is going wrong?

I figured it out:
You have to enable the Google Monitoring API in the API Manager; it is not enabled by default. There is no need to specify an API key, because the application default credentials are picked up automatically.
Interestingly, I have two projects that have also used Stackdriver Monitoring since early this year, and those do not require the Google Monitoring API to be enabled.
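The check from the question and the fix from the answer can be sketched in shell. This is only a minimal sketch under stated assumptions: the function names `has_monitoring_scope`, `fetch_scopes`, and `enable_monitoring_api` are hypothetical helpers, and `gcloud services enable monitoring.googleapis.com` is assumed here as a command-line equivalent of clicking Enable in the API Manager (the answer itself used the console, and the gcloud CLI must be installed and authenticated).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not part of any Google tool.

# Reads a scope list (one URL per line) on stdin and succeeds if it contains
# a scope that permits writing metrics.
has_monitoring_scope() {
  grep -Eq '^https://www\.googleapis\.com/auth/(cloud-platform|monitoring|monitoring\.write)$'
}

# Fetches the instance's scopes from the metadata server (same URL as in the
# question). Only works from inside a Compute Engine instance.
fetch_scopes() {
  curl --silent -f -H "Metadata-Flavor: Google" \
    "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/scopes"
}

# Enables the Monitoring API for the project ID passed as $1.
# Assumption: the gcloud CLI is installed and authenticated.
enable_monitoring_api() {
  gcloud services enable monitoring.googleapis.com --project "$1"
}
```

A possible use, with `my-project` as a placeholder project ID: run `fetch_scopes | has_monitoring_scope && enable_monitoring_api my-project` and then restart the agent with `sudo systemctl restart stackdriver-agent`. If the scope check fails, the 403 more likely comes from the instance's scopes than from a disabled API.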

Related

SendGrid misconfiguration on Google Cloud (535 Authentication failed)

I've installed SendGrid on Google Compute Engine with a CentOS base image, following Google's documented instructions:
https://cloud.google.com/compute/docs/tutorials/sending-mail/using-sendgrid#before-you-begin
Using the test from the command line (various accounts):
echo 'MESSAGE' | mail -s 'SUBJECT' GJ******#gmail.com
/var/log/maillog then shows lines like the following, with 50 or so attempts in one second:
postfix/error[32324]: A293210062D7: to=<GJ********#gmail.com>, relay=none, delay=145998, delays=145997/1.2/0/0, dsn=4.0.0, status=deferred (delivery temporarily suspended: SASL authentication failed; server smtp.sendgrid.net[167.89.115.53] said: 535 Authentication failed: The provided authorization grant is invalid, expired, or revoked)
The message is queued and retried every few hours. While experimenting, I changed the port setting from 2525 to one of the regular ports that isn't blocked by Google, and then the email bounces right away to the user account used in the mail test.
I made sure to use the generated API key. The SendGrid system says no attempts have been made and nothing has bounced.
There were other errors in the maillog too, pages of them since it retries every second. They stopped after I changed the permissions on that directory, but maybe they give a clue as to how it's misconfigured?
Oct 31 19:04:14 beadc postfix/pickup[15119]: fatal: chdir("/var/spool/postfix"): Permission denied
Oct 31 19:04:15 beadc postfix/master[1264]: warning: process /usr/libexec/postfix/qmgr pid 15118 exit status 1
Oct 31 19:04:15 beadc postfix/master[1264]: warning: /usr/libexec/postfix/qmgr: bad command startup -- throttling
Oct 31 19:04:15 beadc postfix/master[1264]: warning: process /usr/libexec/postfix/pickup pid 15119 exit status 1
Oct 31 19:04:15 beadc postfix/master[1264]: warning: /usr/libexec/postfix/pickup: bad command startup -- throttling
The only info I can find searching about the error is that it means a SendGrid misconfiguration.
Any ideas as to what the misconfiguration might be?
I've determined that the 535 error was a port/firewall issue, which means that the 550 error I had on the other port still exists.
If you get the 535 error, check your firewall settings:
https://cloud.google.com/compute/docs/tutorials/sending-mail/

SUM authentication issue with saphostctrl – Authentication Required

We are trying to start the Software Update Manager (SUM) 1.0 SP20 PL4 on a NetWeaver 7.02 sandbox with Red Hat Enterprise Linux 7 and DB2 (DB6).
We extracted the SUM package to /usr/sap//SUM and started the tool with one of the following commands (as root):
./STARTUP confighostagent QHR &
or
./STARTUP &
When we open the URL http://localhost:1128/lmsl/sumabap/QHR/doc/sluigui, an authentication box appears and we enter the sidadm credentials. After we confirm them, the box reappears within about a second. It makes no difference whether the credentials are correct (sidadm with the correct password) or not (any login with any password); the authentication box always reappears (see attached screenshot).
This is what we have already checked:
Restart of the SUM
Restart of SAP Host Agent
Installation of latest SAP Host Agent version
Restart of complete virtual machine
Tried Internet Explorer, Firefox, Chrome in normal mode and in private browsing mode
Re-download / re-extract of SUM to /usr/sap//SUM
Check of file authorizations of SUM
Notes we checked:
927637 - Web service authentication in sapstartsrv as of Release 7.00
1563660 - sapcontrol, user authorization issues (SUM)
2284028 - SUM SL Common UI : Troubleshooting problems with the new SUM UI
2426160 - DB6: Add. Info - Software Update Manager 1.0 SP20
We changed the saphostctrl tracelevel to 3 and found an error in the /usr/sap/hostctrl/work/sapstartsrv.log after trying to authenticate again:
[Thr 140134583793408] Authenticate check on cache failed
Tue Jul 11 17:21:34 2017
pam_authenticate_user -> service( sapstartsrv ) user ( qhradm )
*** ERROR => pam_authenticate ( qhradm ) failed : Authentication failure [usercheckux. 243]
[Thr 140134583793408] helper exit with return code 251
Tue Jul 11 17:21:34 2017
pam_authenticate_user -> service( login ) user ( qhradm )
Tue Jul 11 17:21:36 2017
*** ERROR => pam_authenticate ( qhradm ) failed : Authentication failure [usercheckux. 243]
[Thr 140134583793408] Tue Jul 11 17:21:36 2017
[Thr 140134583793408] helper exit with return code 251
[Thr 140134583793408] *** ERROR => soap_check_permission authentication: ( qhradm, ExecutOperation ) FAILED [DefaultOpera 163]
[Thr 140134583793408] Authenticate clear cache
[Thr 140134583793408] Unauthorized (user authentication required)
[Thr 140134583793408] *** ERROR => Authentication is required [HTTPProxyHan 258]
[Thr 140134583793408] HTTPResponse::SendError HTTP 401: 'Unauthorized: User authentication required' send as 'Unauthorized'
SAP note 927637 says the following:
[…]
If the user/password check fails, the system generates an "Invalid Credentials" SOAP exception.
[…]
Unfortunately, the note gives no hint about what to do with the above error message.
Do you have any idea what we can do to find or solve the problem?
regards,
Umar Abdullah

Stackdriver unable to determine collectd endpoint

All my hosts stopped reporting stats to the Google collectd gateway. This is due to some internal change on Google's side.
In the log files I see this:
Jan 13 08:52:36 ign-rpt01 systemd[1]: Stopping LSB: start and stop Stackdriver Agent...
Jan 13 08:52:36 ign-rpt01 stackdriver-agent[10768]: mesg: ttyname failed: Inappropriate ioctl for device
Jan 13 08:52:36 ign-rpt01 stackdriver-agent[10768]: * Stopping Stackdriver metrics collection agent stackdriver-agent
Jan 13 08:52:37 ign-rpt01 stackdriver-agent[10768]: ...done.
Jan 13 08:52:37 ign-rpt01 systemd[1]: Stopped LSB: start and stop Stackdriver Agent.
Jan 13 08:52:37 ign-rpt01 systemd[1]: Starting LSB: start and stop Stackdriver Agent...
Jan 13 08:52:37 ign-rpt01 stackdriver-agent[10794]: mesg: ttyname failed: Inappropriate ioctl for device
Jan 13 08:52:37 ign-rpt01 stackdriver-agent[10794]: * Starting Stackdriver metrics collection agent stackdriver-agent
Jan 13 08:52:38 ign-rpt01 stackdriver-agent[10794]: Unable to determine collectd endpoint!
Jan 13 08:52:38 ign-rpt01 stackdriver-agent[10794]: * not starting, configuration error
Jan 13 08:52:38 ign-rpt01 stackdriver-agent[10794]: ...fail!
Jan 13 08:52:38 ign-rpt01 systemd[1]: Started LSB: start and stop Stackdriver Agent.
Jan 13 08:53:16 ign-rpt01 extractd[10869]: Error sending processes data: Stackdriver gateway replied with a 401: <html><title>HTTP 401: Unauthorized (Invalid API key)</title><body>HTTP 401: Unauthorized (Invalid API key)</body></html>
Jan 13 08:54:16 ign-rpt01 extractd[10903]: Error sending processes data: Stackdriver gateway replied with a 401: <html><title>HTTP 401: Unauthorized (Invalid API key)</title><body>HTTP 401: Unauthorized (Invalid API key)</body></html>
Jan 13 08:55:16 ign-rpt01 extractd[10947]: Error sending processes data: Stackdriver gateway replied with a 401: <html><title>HTTP 401: Unauthorized (Invalid API key)</title><body>HTTP 401: Unauthorized (Invalid API key)</body></html>
When I go to the Stackdriver account settings, I see this message:
The following instances are using a deprecated configuration of the monitoring agent. Alerting policies referencing metrics from these agents do not work as intended and are currently unsupported. Dashboards using metrics from these agents are also unsupported and will soon stop working.
Please update your monitoring agent.
Okay, it turns out that only --write-gcm is supported now.
TL;DR version
Just run this:
curl -O "https://repo.stackdriver.com/stack-install.sh"
sudo bash stack-install.sh --write-gcm
And hey, my stats are starting to come in again.
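Since the problem affected all hosts, it may help to detect which ones still show the symptoms before reinstalling. The following is a minimal sketch: `needs_write_gcm_reinstall` is a hypothetical helper name, and it only matches the two error strings that appear in the log excerpt above, so other failure modes will not be caught.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the name is illustrative, not part of the Stackdriver agent.

# Reads log text on stdin and succeeds if it contains either symptom from the
# log excerpt above: the missing collectd endpoint or the rejected API key.
needs_write_gcm_reinstall() {
  grep -Eq 'Unable to determine collectd endpoint|Unauthorized \(Invalid API key\)'
}
```

On a systemd host, one way to use it would be `journalctl --since today | needs_write_gcm_reinstall && echo "reinstall with --write-gcm"`.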

gce container logs not showing up in cloud logging

I have a fairly recent Kubernetes cluster running on GCE. I am trying to get my application to log to Cloud Logging / Stackdriver.
I can see all the kubernetes cluster logs there but no container output ever materializes.
When I follow this guide: http://kubernetes.io/docs/getting-started-guides/logging/, I can see the output of the pod:
kubectl logs counter
2163: Wed Aug 31 15:02:52 UTC 2016
This never makes it to the Logging interface, and the pod does not show up in the selector.
The fluentd-cloud-logging pods give no logging output:
kubectl logs --namespace=kube-system fluentd-cloud-logging-staging-minion-group-20hk
The /var/log/google-fluentd/google-fluentd.log file looks happy:
...
2016-08-31 14:07:16 +0000 [info]: following tail of /var/log/containers/node-problem-detector-v0.1-hgtcr_kube-system_POD-07e5b134c9f8ff48f73f1df41473a84a07738ac750840f09938d604694c4bd6e.log
2016-08-31 14:07:16 +0000 [info]: following tail of /var/log/containers/rails-2607986313-s7r5e_default_POD-9f1dd02f23de552a40297f761d09c03b50e5a2cd9789ef498139d24602d9847e.log
2016-08-31 14:07:16 +0000 [info]: following tail of /var/log/salt/minion
2016-08-31 14:07:16 +0000 [info]: following tail of /var/log/startupscript.log
2016-08-31 14:07:16 +0000 [info]: following tail of /var/log/docker.log
2016-08-31 14:07:16 +0000 [info]: following tail of /var/log/kubelet.log
2016-08-31 14:07:22 +0000 [info]: Successfully sent to Google Cloud Logging API.
2016-08-31 14:07:22 +0000 [info]: Successfully sent to Google Cloud Logging API.
The Kubernetes version is:
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.5", GitCommit:"b0deb2eb8f4037421077f77cb163dbb4c0a2a9f5", GitTreeState:"clean", BuildDate:"2016-08-11T20:29:08Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.5", GitCommit:"b0deb2eb8f4037421077f77cb163dbb4c0a2a9f5", GitTreeState:"clean", BuildDate:"2016-08-11T20:21:58Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
The cluster was started with:
export KUBE_GCE_ZONE=europe-west1-d
export NODE_SIZE=n1-standard-2
export NUM_NODES=2
export KUBE_GCE_INSTANCE_PREFIX=staging
export ENABLE_CLUSTER_AUTOSCALER=true
export KUBE_ENABLE_CLUSTER_MONITORING=true
export KUBE_ENABLE_CLUSTER_MONITORING=google
Any ideas what I might be doing wrong? To my understanding this should work out of the box, right?
Bit of a long shot, but have you enabled the logging API?
"You can do so from the Developers Console, here. Try going there, clicking the Enable API button, and seeing whether the errors keep coming."
https://github.com/kubernetes/kubernetes/issues/20516
Google Cloud Logging + google-fluentd Dropping Messages
OK, this is rather silly:
If you run a Kubernetes cluster on GCE, the container application logs appear under the Google Container Engine logs.
I never bothered checking there because, well, I am not using Container Engine.

Message Archive Management Plugin (Prosody) can't open archive

I'm trying to get Message Archive Management (mam) working on a Prosody server.
I tried it with SQLite3, MySQL and PostgreSQL.
I always get this log:
Oct 20 14:56:21 general info Hello and welcome to Prosody version 0.9.7
Oct 20 14:56:21 general info Prosody is using the epoll backend for connecti$
Oct 20 14:56:22 localhost:mam error Could not open archive storage
The archive exists in /var/lib/prosody/.