Can I split a single scrape target (a load balancer) into multiple targets, based on a label value? - amazon-elastic-beanstalk

I have a Spring Boot application running on AWS Elastic Beanstalk. There are multiple application instances running. The number of running applications might dynamically increase and decrease from 2 to 10 instances.
I have set up Prometheus, but scraping the metrics has a critical limitation: it is only able to scrape the Elastic Beanstalk load balancer. This means that every scrape will return a different instance (round robin), so the metrics fluctuate wildly.
# prometheus.yml
scrape_configs:
  - job_name: "my-backend"
    metrics_path: "/metrics/prometheus"
    scrape_interval: 5s
    static_configs:
      - targets: [ "dev.my-app.website.com" ] # this is a load balancer
        labels:
          application: "my-backend"
          env: "dev"
(I am pursuing a correct setup, where Prometheus can scrape the instances directly, but because of business limitations this is not possible yet - so I would like a workaround.)
As a workaround, I have added a random UUID label to each application instance's metrics using RandomValuePropertySource:
# application.yml
management:
  endpoints:
    enabled-by-default: false
    web:
      exposure:
        include: "*"
  endpoint:
    prometheus.enabled: true
  metrics:
    tags:
      instance_id: "${random.uuid}" # used to differentiate instances behind load-balancer
This means that the metrics can be uniquely identified, so on one refresh I might get
process_uptime_seconds{instance_id="6fb3de0f-7fef-4de2-aca9-46bc80a6ed27",} 81344.727
While on the next I could get
process_uptime_seconds{instance_id="2ef5faad-6e9e-4fc0-b766-c24103e111a9",} 81231.112
Generally this is fine, and helps for most metrics, but it is clear that Prometheus gets confused and doesn't store the two results separately. This is a particular problem for 'counters', as they are supposed to only increase, but because the different instances handle different requests, the counter might increase or decrease. Graphs end up jagged and disconnected.
I've tried relabelling the instance label (I think that's how Prometheus decides how to store the data separately?), but this doesn't seem to have any effect
# prometheus.yml
scrape_configs:
  - job_name: "my-backend"
    metrics_path: "/metrics/prometheus"
    scrape_interval: 5s
    static_configs:
      - targets: [ "dev.my-app.website.com" ] # this is a load balancer
        labels:
          application: "my-backend"
          env: "dev"
    metric_relabel_configs:
      - target_label: instance
        source_labels: [__address__, instance_id]
        separator: ";"
        action: replace
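For reference, a relabel rule that explicitly captures the exposed instance_id and writes it into instance might look like the sketch below. This is only a sketch: __address__ is generally no longer available at metric-relabel time (only the target's final labels such as instance are), and I have not verified that it fixes the jagged graphs.
    metric_relabel_configs:
      # sketch: copy the per-process instance_id tag into the instance label,
      # so Prometheus stores each backend process as its own series
      - source_labels: [instance_id]
        regex: "(.+)"
        target_label: instance
        replacement: "$1"
        action: replace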
To re-iterate: I know this is not ideal, and the correct solution is to directly connect - that is in motion and will happen eventually. For now, I'm just trying to improve my workaround, so I can get something working sooner.

Related

Red Hat Service Mesh: custom header transformed into x-b3-traceId is lost

I am trying to integrate a legacy system with a microservice hosted on the Red Hat OpenShift platform. The service is a Java app behind an ingress gateway.
The legacy app passes a unique operation identifier as a custom header, uniqueId. The microservice leverages OpenShift Service Mesh support for Jaeger, so I can pass tracing headers such as x-b3-traceid and see the request trace in the Jaeger UI. Unfortunately, the legacy app cannot be modified and won't send Jaeger headers, but uniqueId conforms to the Jaeger rules and seems suitable for tracing.
I am trying to transform uniqueId into x-b3-traceid in an EnvoyFilter. The problem is that I can copy it to any other header, but I cannot modify the x-b3-* headers. Istio keeps generating a new set of x-b3-* headers no matter what I do in the Envoy filter. See the filter code below.
I tried different filter positions (on the ingress gateway, on the pod sidecar, before envoy.router, etc.). Nothing seems to work. Can anyone recommend how I can pass a custom header as the trace ID for the service mesh's Jaeger? I could create a custom proxy service that replaces one header with another, but that looks redundant. Is it possible to achieve this with the service mesh only?
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: call-id-filter
  namespace: mynamespace
spec:
  filters:
    - filterConfig:
        inlineCode: |
          function envoy_on_request(request_handle)
            headers = request_handle:headers()
            uniqueId = headers:get("uniqueId")
            if (uniqueId ~= nil) then
              request_handle:headers():add("x-b3-traceid", uniqueId) -- istio overwrites these values
              request_handle:headers():add("x-b3-spanid", "myspan")
              request_handle:headers():add("x-b3-sampled", "1")
              request_handle:headers():add("my-custom-unique-id", uniqueId) -- works fine
              request_handle:logCritical("envoy filter setting x-b3-traceid with" .. uniqueId)
            end
          end
      filterName: envoy.lua
      filterType: HTTP
      insertPosition:
        index: FIRST
      listenerMatch:
        listenerType: GATEWAY
        portNumber: 9011
  workloadLabels:
    app: istio-ingressgateway

Using Ansible to list only available NICs from a pool of 10 NICs in Azure

Problem statement:
List only available NICs (not attached to any VM) from a pool of 10 NICs in Azure cloud.
Condition:
Do not use Azure resource tags to get the NIC state (available or not).
The code snippet below solves the problem using tags, which fails to satisfy the above condition.
- hosts: localhost
  tasks:
    - name: Get available NICs from NIC Pool
      azure_rm_networkinterface_facts:
        resource_group: '{{NIC_rg_name}}'
        tags:
          - available:yes
      register: NicDetails

    - name: List available NICs
      debug:
        msg: '{{NicDetails.ansible_facts.azure_networkinterfaces}}'
How can I achieve the same result without using Azure resource tags?
I believe the code below would return all the network interfaces within a resource group:
- name: Get network interfaces within a resource group
  azure_rm_networkinterface_facts:
    resource_group: Testing
This returns every NIC in the resource group; you can then filter the registered result down to the NICs that are not attached to a VM, as sketched below.
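For example, a follow-up filter task along these lines might work - a sketch only, assuming the returned facts expose an attached NIC through a properties.virtualMachine field (check the exact return structure for your module version):
- name: Get all NICs in the resource group
  azure_rm_networkinterface_facts:
    resource_group: '{{ NIC_rg_name }}'
  register: NicDetails

- name: Keep only NICs that are not attached to a VM
  set_fact:
    available_nics: >-
      {{ NicDetails.ansible_facts.azure_networkinterfaces
         | selectattr('properties.virtualMachine', 'undefined')
         | list }}

- name: List available NICs
  debug:
    msg: '{{ available_nics }}'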
Also, if we want to use tags, we can use the code below:
- name: Get network interfaces by tag
  azure_rm_networkinterface_facts:
    resource_group: Testing
    tags:
      - testing
      - foo:bar
You can find the common return value details here.
Prerequisites to run the module:
python >= 2.7
azure >= 2.0.0

Nginx ingress controller rate limiting not working

annotations:
  kubernetes.io/ingress.class: "nginx"
  nginx.ingress.kubernetes.io/limit-connection: "1"
  nginx.ingress.kubernetes.io/limit-rpm: "20"
and the container image version I am using:
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.22.0
I am trying to send 200 requests within a ten-minute window (about 20 requests per minute from a single IP address), and after that it should refuse the requests.
Which nginx ingress version are you using? Please use quay.io/aledbf/nginx-ingress-controller:0.415 and then check. Also, please look at this link - https://github.com/kubernetes/ingress-nginx/issues/1839
Try changing limit-connection: to limit-connections:
For more info, check this.
If that doesn't help, please post your commands or describe how you are testing your connection limits.
I changed it to limit-connections. I am setting the annotations in the ingress YAML file and applying it, and I can see the following in the nginx conf:
worker_rlimit_nofile 15360;
limit_req_status 503;
limit_conn_status 503;
# Ratelimit test_nginx
# Ratelimit test_nginx
map $whitelist_xxxxxxxxxxxx $limit_xxxxxxxxxx {
limit_req_zone $limit_xxxxxxxx zone=test_nginx_rpm:5m rate=20r/m;
limit_req zone=test_nginx_rpm burst=100 nodelay;
limit_req zone=test_nginx_rpm burst=100 nodelay;
limit_req zone=test_nginx_rpm burst=100 nodelay;
When I kept these annotations:
nginx.ingress.kubernetes.io/limit-connections: "1"
nginx.ingress.kubernetes.io/limit-rpm: "20"
I can see the above burst and other settings in the nginx conf file. Can you please tell me whether these make any difference?
There are two things that could be making you experience rate-limits higher than configured: burst and nginx replicas.
Burst
As you have already noted in https://stackoverflow.com/a/54426317/3477266, nginx-ingress adds a burst configuration to the final config it creates for the rate-limiting.
The burst value is always 5x your rate-limit value (it doesn't matter if it's a limit-rpm or limit-rps setting.)
That's why you got a burst=100 from a limit-rpm=20.
You can read here the effect this burst have in Nginx behavior: https://www.nginx.com/blog/rate-limiting-nginx/#bursts
But basically, it's possible that Nginx will not return 429 for all the requests you would expect, because of the burst.
The total number of requests routed in a given period will be total = rate_limit * period + burst
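With the numbers from the question, that is roughly total = 20 r/m * 10 min + 100 = 300 requests admitted over the ten-minute test window, which is why a 200-request test never gets rejected.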
Nginx replicas
Usually nginx-ingress is deployed with Horizontal Pod AutoScaler enabled, to scale based on demand. Or it's explicitly configured to run with more than 1 replica.
In any case, if you have more than 1 replica of Nginx running, each one will handle rate-limiting individually.
This basically means that your rate-limit configuration will be multiplied by the number of replicas, and you could end up with rate-limits a lot higher than you expected.
There is a way to use a memcached instance to make them share the rate-limiting count, as described in: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#global-rate-limiting
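If you go that route, the annotations look roughly like the sketch below. This is only a sketch - the exact annotation names and the memcached settings (which live in the controller ConfigMap) depend on your ingress-nginx version, so check the linked docs:
annotations:
  # shared (cross-replica) rate limit backed by memcached;
  # the memcached host/port are set in the controller ConfigMap
  # (global-rate-limit-memcached-host / global-rate-limit-memcached-port)
  nginx.ingress.kubernetes.io/global-rate-limit: "20"
  nginx.ingress.kubernetes.io/global-rate-limit-window: "1m"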

How to tag gunicorn metrics with proc_name?

I'm pushing Gunicorn metrics from multiple applications into Datadog from the same host; however, I cannot find a way to group the StatsD metrics using either a tag or proc_name.
Datadog gunicorn integration
https://app.datadoghq.com/account/settings#integrations/gunicorn
Datadog agent checks are being updated automatically with the app:proc_name tag. I can use this to group and select the data for a specific service.
https://github.com/DataDog/dd-agent/blob/5.2.x/checks.d/gunicorn.py#L53
For the StatsD metrics, however, I do not see how to assign a tag or proc_name. This is not being done automatically, nor do I see a way to specify a tag.
https://github.com/benoitc/gunicorn/blob/19.6.0/gunicorn/instrument/statsd.py#L90
Datadog config:
cat /etc/dd-agent/datadog.conf
[Main]
dd_url: https://app.datadoghq.com
api_key: <KEY>
bind_host: 0.0.0.0
log_level: INFO
statsd_metric_namespace: my_namespace
tags: role:[service, test]
Gunicorn config:
# cat /etc/dd-agent/conf.d/gunicorn.yaml
init_config:

instances:
  - proc_name: service
  - proc_name: another_service
Any ideas on how this might be achieved?
Examples using notebooks:
In this example, I am able to select app:service in either the 'from' or 'avg by' drop downs.
Timeseries - `gunicorn.workers` - from `app:service`
For the metrics with the my_namespace prefix I am unable to reference the same application name. Only host and environment related tags are available.
Timeseries - `my_namespace.gunicorn.workers` - from "Not available"
Timeseries - `my_namespace.gunicorn.requests` - from "Not available"
I spoke with Datadog support. They were very helpful, but the short answer is that there is currently no option to add tags that identify the specific proc_name in the individual gunicorn.yaml file.
As a workaround to enable grouping, we set a unique StatsD prefix for each application (sketched below), but the trade-off is that the metrics no longer share the same namespace.
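A minimal sketch of that workaround, using Gunicorn's built-in StatsD instrumentation (the service names and the app:application entry point here are hypothetical):
# start each application with its own StatsD prefix so the metrics can be grouped
gunicorn --statsd-host localhost:8125 --statsd-prefix service --name service app:application
gunicorn --statsd-host localhost:8125 --statsd-prefix another_service --name another_service app:application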
I've submitted a new feature request on the Github project which will hopefully be considered.
https://github.com/DataDog/integrations-core/issues/1062

How to add an SSD disk to a Google Compute Engine instance with Ansible?

Ansible has the gce_pd module: http://docs.ansible.com/gce_pd_module.html. According to the documentation, you can specify the size and mode (READ, READ-WRITE) but not the type (SSD vs. Standard). Is it possible to use the gce_pd module to create an SSD disk?
As of right now, https://github.com/ansible/ansible-modules-core/blob/devel/cloud/google/gce_pd.py has no mention of SSD at all, so it seems like it's not supported. If this is something that you really need, consider submitting a feature request.
This is now available in Ansible.
According to the updated official docs, disk_type was added in Ansible 1.9
disk_type can have these possible values:
pd-standard
pd-ssd
Here's an example:
# Simple attachment action to an existing instance
- local_action:
    module: gce_pd
    name: mongodata
    instance_name: www1
    size_gb: 30
    disk_type: pd-ssd