How to tag gunicorn metrics with proc_name? - gunicorn

I'm pushing gunicorn metrics from multiple applications into Datadog from the same host; however, I cannot find a way to group the statsd metrics using either a tag or proc_name.
Datadog gunicorn integration
https://app.datadoghq.com/account/settings#integrations/gunicorn
Datadog agent checks are being updated automatically with the app:proc_name tag. I can use this to group and select the data for a specific service.
https://github.com/DataDog/dd-agent/blob/5.2.x/checks.d/gunicorn.py#L53
For the statsd metrics however, I do not see how to assign a tag or proc_name. This is not being done automatically nor do I see a way to specify a tag.
https://github.com/benoitc/gunicorn/blob/19.6.0/gunicorn/instrument/statsd.py#L90
Datadog config:
cat /etc/dd-agent/datadog.conf
[Main]
dd_url: https://app.datadoghq.com
api_key: <KEY>
bind_host: 0.0.0.0
log_level: INFO
statsd_metric_namespace: my_namespace
tags: role:[service, test]
Gunicorn config:
# cat /etc/dd-agent/conf.d/gunicorn.yaml
init_config:
instances:
- proc_name: service
- proc_name: another_service
Any ideas on how this might be achieved?
Examples using notebooks:
In this example, I am able to select app:service in either the 'from' or 'avg by' drop downs.
Timeseries - `gunicorn.workers` - from `app:service`
For the metrics with the my_namespace prefix I am unable to reference the same application name. Only host and environment related tags are available.
Timeseries - `my_namespace.gunicorn.workers` - from "Not available"
Timeseries - `my_namespace.gunicorn.requests` - from "Not available"

Spoke with Datadog support. They were very helpful, but the short answer is that there is currently no option in the individual gunicorn.yaml file to add additional tags for a specific proc_name.
As a workaround to enable grouping, we enabled a unique prefix for each application; the trade-off is that the metrics no longer share the same namespace.
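For reference, if the prefixes are set on the gunicorn side (an assumption on my part; adjust to however you configure your services), each application can be started with its own --statsd-prefix, which is prepended to every metric it emits:
# Hypothetical service names and WSGI modules; the first app then emits e.g. service.gunicorn.requests
gunicorn --statsd-host=localhost:8125 --statsd-prefix=service app1.wsgi:application
gunicorn --statsd-host=localhost:8125 --statsd-prefix=another_service app2.wsgi:application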
I've submitted a new feature request on the Github project which will hopefully be considered.
https://github.com/DataDog/integrations-core/issues/1062

Related

How to get who or what turned off a pod?

We are currently trying to debug an issue with a pod and noticed that 6 other pods (not related) were turned off. We would like to figure out when that happened and who or what turned them off (to see whether it is related to the first issue).
Is it possible to get this kind of information with OpenShift?
These operations are typically recorded in the audit logs (if you have enabled those): https://docs.openshift.com/container-platform/4.7/security/audit-log-view.html
You can then filter the events, for example to drop the read-only get requests and keep only modifying actions:
oc adm node-logs node-1.example.com --path=oauth-apiserver/audit.log \
| jq 'select(.verb != "get")'
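To narrow this down to the pods themselves, a filter along these lines may help. This is only a sketch: the node name is a placeholder, pod deletions are served by the kube-apiserver (hence that log path), and the field names follow the standard Kubernetes audit event schema:
oc adm node-logs node-1.example.com --path=kube-apiserver/audit.log \
  | jq 'select(.objectRef.resource == "pods" and .verb == "delete")
        | {time: .requestReceivedTimestamp, user: .user.username,
           namespace: .objectRef.namespace, pod: .objectRef.name}'
Note that pods removed through the eviction API show up as a create on the pods/eviction subresource rather than a delete, so it can be worth checking for those as well.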

Using Ansible to list only available NICs from a pool of 10 NICs in Azure

Problem statement:
List only available NICs (not attached to any VM) from a pool of 10 NICs in Azure cloud.
Condition:
Do not use Azure resource tags to determine the NIC state (available or not).
The code snippet below solves the problem using tags, which fails to satisfy the above condition.
- hosts: localhost
  tasks:
    - name: Get available NICs from NIC Pool
      azure_rm_networkinterface_facts:
        resource_group: '{{NIC_rg_name}}'
        tags:
          - available:yes
      register: NicDetails

    - name: List available NICs
      debug:
        msg: '{{NicDetails.ansible_facts.azure_networkinterfaces}}'
How can I achieve the same result without using Azure resource tags?
I believe the code below would return all the network interfaces within a resource group:
- name: Get network interfaces within a resource group
  azure_rm_networkinterface_facts:
    resource_group: Testing
This should do what you are looking for.
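If it helps, once you have all the interfaces you could filter out the attached ones in a follow-up task. This is only a sketch: it assumes each returned item exposes the attached VM under properties.virtualMachine, so verify the exact key in the facts your module version actually returns.
- name: Get all network interfaces within a resource group
  azure_rm_networkinterface_facts:
    resource_group: Testing
  register: nic_facts

# Keep only the interfaces that have no attached virtual machine
- name: List NICs that are not attached to any VM
  debug:
    msg: "{{ nic_facts.ansible_facts.azure_networkinterfaces
             | rejectattr('properties.virtualMachine', 'defined')
             | list }}"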
Also, if we want to use tags, we can use the code below:
- name: Get network interfaces by tag
  azure_rm_networkinterface_facts:
    resource_group: Testing
    tags:
      - testing
      - foo:bar
You can find the common return value details here.
Prerequisites to run the module:
python >= 2.7
azure >= 2.0.0

How to use option Arbitration=WaitExternal in MySQL Cluster?

I'm currently reading the MySQL Reference Manual and noticed that there is an NDB config option, Arbitration=WaitExternal. The question is how to use this option and how to implement an external cluster manager.
The Arbitration parameter also makes it possible to configure arbitration in such a way that the cluster waits until after the time determined by ArbitrationTimeout has passed for an external cluster manager application to perform arbitration instead of handling arbitration internally. This can be done by setting Arbitration = WaitExternal in the [ndbd default] section of the config.ini file. For best results with the WaitExternal setting, it is recommended that ArbitrationTimeout be 2 times as long as the interval required by the external cluster manager to perform arbitration.
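Putting that advice into a config fragment would look roughly like this (the timeout value is only an illustration, assuming the external manager checks every 5 seconds):
[ndbd default]
# Wait for an external cluster manager instead of arbitrating internally
Arbitration=WaitExternal
# In milliseconds; roughly 2x the external manager's check interval (5 s assumed here)
ArbitrationTimeout=10000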
A bit of git annotate and some searching of original design docs says the following:
When a node is about to send an arbitration message to the arbitrator, it will instead issue the following log message:
case ArbitCode::WinWaitExternal: {
  char buf[8*4*2+1];
  sd->mask.getText(buf);
  BaseString::snprintf(m_text, m_text_len,
                       "Continuing after wait for external arbitration, "
                       "nodes: %s", buf);
  break;
}
So e.g.
Continuing after wait for external arbitration, nodes: 1,2
The external clusterware should check for this message at the same interval as the ArbitrationTimeout. When it discovers the message, it should kill the data node(s) that it decides should lose the arbitration. This kill will be noticed by the remaining NDB data nodes and settles which nodes survive.
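A very rough sketch of what such an external loop could look like is below. Everything specific is a placeholder (cluster log path, node ID, host names, poll interval), the decision of which half of the cluster must die is deliberately left out, and a production manager would track how far it has read in the log instead of re-scanning the tail:
#!/bin/sh
# Poll the cluster log on the management node for the WaitExternal message.
CLUSTER_LOG=/var/lib/mysql-cluster/ndb_1_cluster.log   # placeholder path
while true; do
  if tail -n 200 "$CLUSTER_LOG" | grep -q "Continuing after wait for external arbitration"; then
    # Placeholder decision: a real manager chooses which node(s) lose the
    # arbitration, then kills those data node processes so the survivors proceed.
    ssh data-node-2 'pkill -9 ndbmtd'
  fi
  sleep 5   # keep this well under ArbitrationTimeout
done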

Messages stuck or lost in ActiveMQ cluster

I've set up a small ActiveMQ Network of Brokers to increase reliability. It consists of 3 nodes with the following properties (full config template file is available here):
ActiveMQ Version 5.13.3 (latest as of July 2016)
Local LevelDB persistence adapter
NetworkConnector uri="static:(tcp://${OTHER_NODE1}:61616,tcp://${OTHER_NODE2}:61616)" with the two variables set, e.g. for node2, to node1 and node3 (uni-directional connections between all nodes).
Clients connect with failover:(tcp://node1:61616,tcp://node2:61616,tcp://node3:61616), send and retrieve messages as needed.
The failover protocol randomizes the target machine, so messages might be sent back and forth inside the cluster.
There are two (failing) scenarios:
As it is described now, some messages are not delivered because they are not allowed to go "back". This is done to avoid loops and is described in this blog post.
Activating the replayWhenNoConsumers flag as described in the blog and in NoB: Stuck Messages causes those messages to be recognized as duplicates (the policy configuration in question is sketched below). With enableAudit enabled, I get `cursor got duplicate send ID`; disabling it gives me `<MSG> paged in, is cursor audit disabled? Removing from store and redirecting to dlq`.
Maybe this is trivial to fix - does anybody have an idea?
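For reference, this combination is usually applied as a destination policy in activemq.xml; the snippet below is only a sketch (the catch-all queue selector and the surrounding elements are placeholders, adjust to your own broker template):
<destinationPolicy>
  <policyMap>
    <policyEntries>
      <!-- ">" matches all queues; enableAudit="false" is the variant described above -->
      <policyEntry queue=">" enableAudit="false">
        <networkBridgeFilterFactory>
          <!-- allow messages to be replayed over the network bridge when no local consumers exist -->
          <conditionalNetworkBridgeFilterFactory replayWhenNoConsumers="true"/>
        </networkBridgeFilterFactory>
      </policyEntry>
    </policyEntries>
  </policyMap>
</destinationPolicy>
The same conditionalNetworkBridgeFilterFactory element also accepts a replayDelay attribute, which may be worth checking against your ActiveMQ version's documentation.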

How to add an SSD disk to a Google Compute Engine instance with Ansible?

Ansible has the gce_pd module: http://docs.ansible.com/gce_pd_module.html. According to the documentation you can specify the size and mode (READ, READ-WRITE) but not the type (SSD vs. standard). Is it possible to use the gce_pd module to create an SSD disk?
As of right now, https://github.com/ansible/ansible-modules-core/blob/devel/cloud/google/gce_pd.py has no mention of SSD at all, so it seems like it's not supported. If this is something that you really need, consider submitting a feature request.
This is now available in Ansible.
According to the updated official docs, disk_type was added in Ansible 1.9
disk_type can have these possible values:
pd-standard
pd-ssd
Here's an example:
# Simple attachment action to an existing instance
- local_action:
    module: gce_pd
    name: mongodata
    instance_name: www1
    size_gb: 30
    disk_type: pd-ssd