How to find out who or what turned off a pod? - OpenShift

We are currently trying to debug an issue with a pod and discovered that 6 other (unrelated) pods were turned off. We would like to figure out when that happened and who or what turned them off, to see whether it is related to the first issue.
Is it possible to get this kind of information with OpenShift?

These operations are typically recorded in the audit logs (if you have enabled those): https://docs.openshift.com/container-platform/4.7/security/audit-log-view.html
You can then filter for certain actions, for example excluding GET requests:
oc adm node-logs node-1.example.com --path=oauth-apiserver/audit.log \
| jq 'select(.verb != "get")'
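To narrow this down to the pods in question, a similar filter can select delete operations on pods. This is only a sketch, assuming the standard kube-apiserver audit log path and the usual Kubernetes audit event fields (verb, objectRef, user, requestReceivedTimestamp); adjust the node name to your cluster:
oc adm node-logs node-1.example.com --path=kube-apiserver/audit.log \
| jq 'select(.verb == "delete" and .objectRef.resource == "pods")
      | {time: .requestReceivedTimestamp, pod: .objectRef.name, user: .user.username}'
The user field should then tell you whether a person or an automated component (such as a controller's service account) removed the pods.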


Can PM2 take an action upon a process being marked "Errored"?

PM2 will mark a process status "Errored" if it restarts more than "max_restarts" times, with each restart lasting less than "min_uptime". Perhaps it happens in other circumstances as well.
I'd like to take an action when such a string of fatal errors occurs. In my case, I'd like to reboot the whole machine, since it means something horrible has happened. Is this possible?
Note: I now see that it's possible to do this when PM2 is being used programmatically (see the answer below). Is there a way to do it automatically through the CLI instead? Something similar to a git hook that runs automatically when the "errored" status is raised.
If PM2 is being used programmatically, this function can be used:
pm2.describe(process, errback)
It returns 'processDescription', which includes 'pm2_env', which includes 'status', which would show 'errored'.
This may answer the question for someone else, but it does not answer the question for me, as I would like to use PM2 via CLI call, and not from within another node script.
The question is quite old, but I had the same problem and nowadays, there is a CLI solution:
You can use pm2 jlist to get the current process list as JSON and parse it, for example with jq. To search for all processes managed by PM2 in status "errored", you could call something like:
pm2 jlist | jq '.[] | {"name": .name, "status": .pm2_env.status} | select(.status=="errored")'
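Building on that, here is a minimal watchdog sketch for the "reboot the whole machine" case from the question. It is not a PM2 feature, just a one-liner you could run from cron or a systemd timer; jq, the schedule, and the reboot command are assumptions to adapt to your environment:
# reboot only if at least one pm2-managed process is currently errored
pm2 jlist | jq -e '.[] | select(.pm2_env.status == "errored")' > /dev/null && sudo reboot
jq -e exits with a non-zero status when the filter produces no output, so the reboot runs only when an errored process is actually found.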

How to tag gunicorn metrics with proc_name?

I'm pushing gunicorn metrics from multiple applications on the same host into Datadog; however, I cannot find a way to group the statsd metrics using either a tag or proc_name.
Datadog gunicorn integration
https://app.datadoghq.com/account/settings#integrations/gunicorn
The Datadog agent check is updated automatically with the app:proc_name tag. I can use this to group and select the data for a specific service.
https://github.com/DataDog/dd-agent/blob/5.2.x/checks.d/gunicorn.py#L53
For the statsd metrics, however, I do not see how to assign a tag or proc_name. This is not done automatically, nor do I see a way to specify one.
https://github.com/benoitc/gunicorn/blob/19.6.0/gunicorn/instrument/statsd.py#L90
Datadog config:
cat /etc/dd-agent/datadog.conf
[Main]
dd_url: https://app.datadoghq.com
api_key: <KEY>
bind_host: 0.0.0.0
log_level: INFO
statsd_metric_namespace: my_namespace
tags: role:[service, test]
Gunicorn config:
# cat /etc/dd-agent/conf.d/gunicorn.yaml
init_config:

instances:
  - proc_name: service
  - proc_name: another_service
Any ideas on how this might be achieved?
Examples using notebooks:
In this example, I am able to select app:service in either the 'from' or 'avg by' drop-downs.
Timeseries - `gunicorn.workers` - from `app:service`
For the metrics with the my_namespace prefix, I am unable to reference the same application name; only host- and environment-related tags are available.
Timeseries - `my_namespace.gunicorn.workers` - from "Not available"
Timeseries - `my_namespace.gunicorn.requests` - from "Not available"
I spoke with Datadog support. They were very helpful, but the short answer is that there is currently no option in the individual gunicorn.yaml file to add tags that identify the specific proc_name.
As a workaround to enable grouping, we gave each application a unique metric prefix, but the trade-off is that the metrics no longer share the same namespace.
I've submitted a feature request on the GitHub project, which will hopefully be considered.
https://github.com/DataDog/integrations-core/issues/1062
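As a rough sketch of that prefix workaround (the statsd address, service names, and WSGI entry points below are placeholders; --statsd-prefix is a standard gunicorn setting that gets prepended to the emitted metric names):
gunicorn --statsd-host localhost:8125 --statsd-prefix service service.wsgi:application
gunicorn --statsd-host localhost:8125 --statsd-prefix another_service another_service.wsgi:application
Each application's metrics then arrive as service.gunicorn.* and another_service.gunicorn.*, which can be graphed separately even though they no longer share a single namespace.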

How to ping from Zabbix agent?

Is it possible to ping from the Zabbix agent and pass that data to the Zabbix server? I would like to be able to get the response time from the agent.
I read that it is possible by using fping; it would be great if someone could guide me to the correct path.
Thank you,
Rijath Mohammed
While that is not currently available out of the box, you can implement such functionality using a feature called "user parameters". This forum thread has a simple example:
UserParameter=myping[*],/etc/zabbix/fping -q $1;echo $?
Although for you the path to fping is likely to be /usr/sbin/fping or /usr/bin/fping.
You can read more about user parameters in the official manual: https://www.zabbix.com/documentation/3.0/manual/config/items/userparameters .
While I haven't ever configured that, it would be similar on Windows - see this forum thread for some inspiration.
And if you would like to see this feature implemented out of the box, make sure to vote on this feature request.
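If you want the actual response time rather than just an exit code, a user parameter along these lines could return the average round-trip time in milliseconds. This is a sketch assuming fping is at /usr/sbin/fping and that its -c1 -q summary keeps the usual min/avg/max format; the key name ping.rtt is made up for this example, and the doubled $$ is needed so Zabbix does not treat $8 as a positional reference:
UserParameter=ping.rtt[*],/usr/sbin/fping -c1 -q $1 2>&1 | awk -F'/' '{print $$8}'
The item on the server side would then use a key like ping.rtt[google.com].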
Got it working using the below PowerShell script. :)
$Test = test-connection google.com -count 1
$Test.responsetime
This will just return the response time for Google.com and that value is passed to Zabbix using the below user parameter:
UnsafeUserParameters=1
UserParameter=ping.google,C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe C:\zabbix\pinggoogle.ps1
I am calling this parameter from Zabbix using the key "ping.google"

Couchbase - Can the N1QL DP4 release handle the stale option being set on a query?

I have been running some tests using the DP4 release of N1QL.
It seems that if I write to the database (save a document) I can access it by key straight away, but if I run a query to find it by the document type and another matching value it doesn't come back in the results for between 1 and 10 seconds.
After this time has passed, the query returns the expected result.
I have seen the issue raised here: https://issues.couchbase.com/browse/MB-10944
The issue says it is resolved in DP4, but there is no confirmation of this or documentation on how to use the new feature.
Has anybody figured out how to do this or could one of the Couchbase developers lend a hand?
Yes, but that feature is currently not available via the N1QL shell; you will need to use the HTTP REST API directly to pass those parameters.
e.g.
curl -v http://localhost:8093/query/service -d 'statement=select * from default&scan_consistency=REQUEST_PLUS'
By setting the scan_consistency parameter to 'REQUEST_PLUS', N1QL will set stale=false internally for the view scan.
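For the original scenario (querying by document type immediately after a write), the same parameter can be combined with a filtered statement. In this sketch, the bucket name default and the type field and value are placeholders:
curl -v http://localhost:8093/query/service \
  -d 'statement=select * from default where type = "order"&scan_consistency=REQUEST_PLUS'
With REQUEST_PLUS, the query should wait for the index to catch up with the write instead of returning stale results for a few seconds.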

How to access Hudson job1 artifacts from another job2?

We have a production job and a nightly job for a project in Hudson. The production job needs to pull some artifacts from a specific nightly build # (which is provided as a parameter). Can anyone give us a hint on how to achieve this?
The Copy Artifact plugin seems to be capable of doing this.
Another approach could be to fetch the artifact via
http://server/job/job1/[build #]/artifact/
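For example, a hypothetical shell build step in the production job could fetch it directly; the server name, job name, artifact path, and the NIGHTLY_BUILD parameter are all placeholders:
wget "http://server/job/job1/${NIGHTLY_BUILD}/artifact/path/to/artifact.jar"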
You can use "Build Environment" configuration tools in the job's configuration page. Tick the Configure M2 Extra Build Steps box and add an Execute Shell which grep things from the desired artifact.
We have similar need and use the following system groovy:
import hudson.model.*

// the build that is currently running this system Groovy script
def currentBuild = Thread.currentThread().executable;
// default LAST_BUILD_STATUS to FAILURE; it is overwritten below on success
currentBuild.addAction(new ParametersAction(new StringParameterValue('LAST_BUILD_STATUS', 'FAILURE')));
def buildJob = Hudson.instance.getJob("ArtifactJobName");
def artifacts = buildJob.getLastBuild().getArtifacts();
// only expose the artifact name if the last build succeeded and actually produced artifacts
if (buildJob.getLastBuild().getResult() == Result.SUCCESS && artifacts != null && artifacts.size() > 0) {
    currentBuild.addAction(new ParametersAction(new StringParameterValue('VARIABLE_NAME', artifacts[0].getFileName())));
    currentBuild.addAction(new ParametersAction(new StringParameterValue('LAST_BUILD_STATUS', 'SUCCESS')));
}
This creates a VARIABLE_NAME parameter containing the artifact name from ArtifactJobName, which we use since the artifacts are all stored in a specific folder. I am not sure what will happen if you have multiple artifacts, but it seems you could get them from the artifacts array.
You could use getLastSuccessfulBuild instead to avoid problems when another ArtifactJobName build is running at that moment and you get an array containing null.