config: |-
  region: us-west-2
  period_seconds: 240
  metrics:
The region in the CloudWatch metrics config is hardcoded. I need to make it a variable so that the metrics are pulled from the specific region where we launch the resources.
example: "arn:${aws_partition}:iam::${aws_account_id}:role/${cluster_id}-cloudwatch-exporter"
region: us-west-2
I want to write the ARN for the CloudWatch exporter, but I am unable to find the exact format for it. Can someone help me write the ARN?
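For the ARN itself, an IAM role ARN follows the fixed pattern arn:&lt;partition&gt;:iam::&lt;account-id&gt;:role/&lt;role-name&gt;, which is what the example line above expands to once the template variables are filled in. For the hardcoded region, one option is to pass it through the same templating layer instead of literalizing it; a minimal sketch, assuming the config is rendered by whatever tool already substitutes ${aws_partition} and friends (the variable name ${aws_region} and the example metric are placeholders, not existing values):

config: |-
  region: ${aws_region}               # supplied per environment, not hardcoded
  period_seconds: 240
  metrics:
    - aws_namespace: AWS/ELB          # placeholder namespace
      aws_metric_name: RequestCount   # placeholder metric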
I have a Spring Boot application running on AWS Elastic Beanstalk. There are multiple application instances running, and the number of instances may dynamically scale between 2 and 10.
I have set up Prometheus, but scraping the metrics has a critical limitation: it is only able to scrape the Elastic Beanstalk load balancer. This means that every scrape will return a different instance (round robin), so the metrics fluctuate wildly.
# prometheus.yml
scrape_configs:
  - job_name: "my-backend"
    metrics_path: "/metrics/prometheus"
    scrape_interval: 5s
    static_configs:
      - targets: [ "dev.my-app.website.com" ] # this is a load balancer
        labels:
          application: "my-backend"
          env: "dev"
(I am pursuing a correct setup where Prometheus can scrape the instances directly, but because of business limitations this is not possible - so I would like a workaround.)
As a workaround I have added a random UUID label to each application instance using RandomValuePropertySource:
# application.yml
management:
  endpoints:
    enabled-by-default: false
    web:
      exposure:
        include: "*"
  endpoint:
    prometheus.enabled: true
  metrics:
    tags:
      instance_id: "${random.uuid}" # used to differentiate instances behind load-balancer
This means that the metrics can be uniquely identified, so on one refresh I might get
process_uptime_seconds{instance_id="6fb3de0f-7fef-4de2-aca9-46bc80a6ed27",} 81344.727
While on the next I could get
process_uptime_seconds{instance_id="2ef5faad-6e9e-4fc0-b766-c24103e111a9",} 81231.112
Generally this is fine, and helps for most metrics, but it is clear that Prometheus gets confused and doesn't store the two results separately. This is a particular problem for 'counters', as they are supposed to only increase, but because the different instances handle different requests, the counter might increase or decrease. Graphs end up jagged and disconnected.
I've tried relabelling the instance label (I think that's how Prometheus decides how to store the data separately?), but this doesn't seem to have any effect:
# prometheus.yml
scrape_configs:
  - job_name: "my-backend"
    metrics_path: "/metrics/prometheus"
    scrape_interval: 5s
    static_configs:
      - targets: [ "dev.my-app.website.com" ] # this is a load balancer
        labels:
          application: "my-backend"
          env: "dev"
    metric_relabel_configs:
      - target_label: instance
        source_labels: [__address__, instance_id]
        separator: ";"
        action: replace
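If the goal is for each backend process to be stored as its own series, one variant worth trying is to overwrite the instance label purely from the scraped instance_id label. This is a hedged sketch, untested against this setup; __address__ is dropped on the assumption that the per-process UUID alone is enough to keep the series apart:

metric_relabel_configs:
  - source_labels: [instance_id]   # metric label added by the Spring Boot config above
    regex: "(.+)"                  # only rewrite when instance_id is present
    target_label: instance
    replacement: "$1"
    action: replace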
To reiterate: I know this is not ideal, and the correct solution is to connect directly - that is in motion and will happen eventually. For now, I'm just trying to improve my workaround so I can get something working sooner.
I'm trying to create a GCE instance "with a container" (as supported by gcloud CLI) by using POST https://www.googleapis.com/compute/v1/projects/{project}/zones/{zone}/instances.
How can I pass the container image url in the request payload?
If you are trying to set the "Machine Type", then you can specify the URL following the syntax mentioned in this document.
In the document:
Full or partial URL of the machine type resource to use for this instance, in the format: zones/zone/machineTypes/machine-type. This is provided by the client when the instance is created. For example, the following is a valid partial url to a predefined machine type:
zones/us-central1-f/machineTypes/n1-standard-1
To create a custom machine type, provide a URL to a machine type in the following format, where CPUS is 1 or an even number up to 32 (2, 4, 6, ... 24, etc), and MEMORY is the total memory for this instance. Memory must be a multiple of 256 MB and must be supplied in MB (e.g. 5 GB of memory is 5120 MB):
zones/zone/machineTypes/custom-CPUS-MEMORY
For example: zones/us-central1-f/machineTypes/custom-4-5120 For a full list of restrictions, read the Specifications for custom machine types.
If, instead, you want to create a container cluster, then the API method mentioned in this link can help you.
It doesn't seem that there's an equivalent REST API for gcloud compute instances create-with-container ...; however, as suggested by #user10880591's comment, Terraform can help. Specifically, the container-vm module deals with the generation of the metadata required for this kind of action.
Usage example can be found here.
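For reference, what create-with-container appears to do under the hood is boot a Container-Optimized OS image and attach a gce-container-declaration metadata item describing the container. A rough sketch of that metadata value (field names recalled from what gcloud generates; the image and name are placeholders), which in principle can also be supplied under metadata.items in an instances.insert payload:

# value of the instance metadata key "gce-container-declaration"
spec:
  containers:
    - name: my-app                                 # placeholder container name
      image: gcr.io/my-project/my-image:latest     # placeholder image URL
      stdin: false
      tty: false
  restartPolicy: Always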
Problem statement:
List only available NICs (not attached to any VM) from a pool of 10 NICs in Azure cloud.
Condition:
Do not use Azure resource tags to get NIC state information (whether a NIC is available or not).
The code snippet below solves the problem using tags, which fails to satisfy the above condition.
- hosts: localhost
  tasks:
    - name: Get available NICs from NIC Pool
      azure_rm_networkinterface_facts:
        resource_group: '{{NIC_rg_name}}'
        tags:
          - available:yes
      register: NicDetails

    - name: List available NICs
      debug:
        msg: '{{NicDetails.ansible_facts.azure_networkinterfaces}}'
How can I achieve the same result without using Azure resource tags?
I believe the below code would return all the network interfaces within a resource group:
- name: Get network interfaces within a resource group
  azure_rm_networkinterface_facts:
    resource_group: Testing
This should do what you are looking for.
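To then narrow the result down to NICs that are not attached to any VM, one option is to filter the returned facts on the virtual machine reference; a hedged sketch (the properties.virtualMachine path is an assumption about the raw fact structure and may differ between module versions):

- name: Get all NICs in the resource group
  azure_rm_networkinterface_facts:
    resource_group: '{{NIC_rg_name}}'
  register: NicDetails

- name: Keep only NICs that have no attached VM
  set_fact:
    available_nics: "{{ NicDetails.ansible_facts.azure_networkinterfaces
                        | selectattr('properties.virtualMachine', 'undefined')
                        | list }}"

- name: List available NICs
  debug:
    msg: '{{ available_nics }}'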
Also, if we want to use tags, we can use the below code:
- name: Get network interfaces by tag
  azure_rm_networkinterface_facts:
    resource_group: Testing
    tags:
      - testing
      - foo:bar
You can find the common return value details here.
Prerequisites to run the module:
python >= 2.7
azure >= 2.0.0
I'm pushing gunicorn metrics from multiple applications into Datadog from the same host; however, I cannot find a way to group the statsd metrics using either a tag or proc_name.
Datadog gunicorn integration
https://app.datadoghq.com/account/settings#integrations/gunicorn
Datadog agent checks are being updated automatically with the app:proc_name tag. I can use this to group and select the data for a specific service.
https://github.com/DataDog/dd-agent/blob/5.2.x/checks.d/gunicorn.py#L53
For the statsd metrics however, I do not see how to assign a tag or proc_name. This is not being done automatically nor do I see a way to specify a tag.
https://github.com/benoitc/gunicorn/blob/19.6.0/gunicorn/instrument/statsd.py#L90
Datadog config:
cat /etc/dd-agent/datadog.conf
[Main]
dd_url: https://app.datadoghq.com
api_key: <KEY>
bind_host: 0.0.0.0
log_level: INFO
statsd_metric_namespace: my_namespace
tags: role:[service, test]
Gunicorn config:
# cat /etc/dd-agent/conf.d/gunicorn.yaml
init_config:
instances:
  - proc_name: service
  - proc_name: another_service
Any ideas on how this might be achieved?
Examples using notebooks:
In this example, I am able to select app:service in either the 'from' or 'avg by' drop downs.
Timeseries - `gunicorn.workers` - from `app:service`
For the metrics with the my_namespace prefix I am unable to reference the same application name. Only host and environment related tags are available.
Timeseries - `my_namespace.gunicorn.workers` - from "Not available"
Timeseries - `my_namespace.gunicorn.requests` - from "Not available"
Spoke with Datadog support. They were very helpful, but the short answer is that there is currently no option to add additional tags to specify the proc_name in the individual gunicorn.yaml file.
As a workaround to enable grouping, we enabled unique prefixes for each application, but the trade-off is that the metrics no longer share the same namespace.
I've submitted a new feature request on the Github project which will hopefully be considered.
https://github.com/DataDog/integrations-core/issues/1062
The OpenShift documentation has a feature, Exposing Object Fields, that I am struggling to comprehend. When I load my secret I am exposing it as per the documentation. Yet it is unclear from the language of the documentation what the actual mechanism is for binding to the exposed variables. The docs state:
An example response to a bind operation given the above partial
template follows:
{ "credentials": {
"username": "foo",
"password": "YmFy",
"service_ip_port": "172.30.12.34:8080",
"uri": "http://route-test.router.default.svc.cluster.local/mypath" } }
Yet that example isn't helpful, as it's not clear what was bound and how it was bound to actually pick up the exposed variables. What I am hoping is that the exposed values become ambient, so that when I run some other templates into the same project (?) they will automatically resolve (bind) the variables. Then I can decouple secret creation (happening at product creation time) from secret usage (happening when developers populate their project). Am I correct that this feature creates ambient properties that are picked up by any template? Are there any examples of using this feature to decouple secret creation from secret usage (i.e. using this feature for segregation of duties)?
I am running Red Hat OCP:
OpenShift Master: v3.5.5.31.24
Kubernetes Master: v1.5.2+43a9be4
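For reference, the exposure side of the feature is driven by annotations on the templated object; a rough sketch of what that looks like on a secret (annotation keys as described in the 3.x Exposing Object Fields docs; the secret and key names are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: my-secret
  annotations:
    # each annotation exposes one field via a JSONPath into this object
    template.openshift.io/expose-username: "{.data['username']}"
    template.openshift.io/expose-password: "{.data['password']}"
stringData:
  username: foo
  password: bar

The values exposed this way are what show up under credentials in the bind response quoted above; as far as I can tell they are returned to whatever calls the broker's bind operation, rather than becoming ambient project-wide variables.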