How to enable additional configs for prometheus operator - configuration

According to the Prometheus Operator documentation, we should be able to supply additional configuration easily via a Secret. Has anybody actually succeeded with this step? I have several questions:
where will this configuration appear in the Prometheus pod?
should this configuration be a full Prometheus configuration file, or just a list of additional scrape entries?
can we supply additional files (JSON configs) via file_sd_configs, and if so, how do we supply those files to the Prometheus manifest?
Regardless of those questions, I am having a hard time adding the additional configuration. I basically followed the exact steps from the documentation; here are my observations:
Here is my new configuration:
cat prometheus-additional.yaml
- job_name: "prometheus-custom"
  static_configs:
  - targets: ["localhost:9090"]
Add the new Secret:
kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml
Create a prometheus.yaml file with the additional configuration:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: prometheus-additional.yaml
Deploy prometheus.yaml:
kubectl apply -f prometheus.yaml
Check the logs; there is no indication of my new configuration:
kubectl logs prometheus-prometheus-0 -c prometheus
level=info ts=2019-12-05T18:07:30.217852541Z caller=main.go:302 msg="Starting Prometheus" version=" (version=2.7.1, branch=HEAD, revision=62e591f928ddf6b3468308b7ac1de1c63aa7fcf3)"
level=info ts=2019-12-05T18:07:30.217916972Z caller=main.go:303 build_context="(go=go1.11.5, user=root@f9f82868fc43, date=20190131-11:16:59)"
level=info ts=2019-12-05T18:07:30.217971648Z caller=main.go:304 host_details="(Linux 4.19.3-300.fc29.x86_64 #1 SMP Wed Nov 21 15:27:25 UTC 2018 x86_64 prometheus-prometheus-0 (none))"
level=info ts=2019-12-05T18:07:30.217994128Z caller=main.go:305 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-12-05T18:07:30.218236509Z caller=main.go:306 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-12-05T18:07:30.219359123Z caller=main.go:620 msg="Starting TSDB ..."
level=info ts=2019-12-05T18:07:30.219487263Z caller=web.go:416 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-12-05T18:07:30.230944675Z caller=main.go:635 msg="TSDB started"
level=info ts=2019-12-05T18:07:30.231037536Z caller=main.go:695 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-12-05T18:07:30.23125837Z caller=main.go:722 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-12-05T18:07:30.231294106Z caller=main.go:589 msg="Server is ready to receive web requests."
level=info ts=2019-12-05T18:07:33.568068248Z caller=main.go:695 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-12-05T18:07:33.568305994Z caller=main.go:722 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
And when I log into the Prometheus pod I don't see any additional configuration either; when I check the Prometheus web console I don't see any of my configurations.

It turns out the prometheus-operator still relies on the serviceMonitorSelector: {} part of the manifest, according to this ticket. Therefore, in order to add additional configuration, we need the following manifest:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  additionalScrapeConfigs:
    name: prometheus-config
    key: prometheus-config.yaml
  serviceMonitorSelector: {}
where prometheus-config.yaml contains the Prometheus scrape rules and is deployed via a Secret to the Prometheus cluster. I also found empirically that the current prometheus-operator does not support file_sd_configs in the Prometheus configuration (sad), so one needs to write the full rules in the prometheus-config.yaml file.
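For illustration, a minimal prometheus-config.yaml and the command to package it into the Secret referenced above might look like this (the job name and target are placeholders, not from the original answer):
cat prometheus-config.yaml
- job_name: "my-extra-job"
  static_configs:
  - targets: ["my-service.my-namespace.svc:8080"]

kubectl create secret generic prometheus-config --from-file=prometheus-config.yaml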

The documentation is somewhat misleading: a current operator does not need serviceMonitorSelector, but using the above configuration throws a cannot unmarshal !!map into []yaml.MapSlice error.
The correct reference is additionalScrapeConfigsSecret:
additionalScrapeConfigsSecret:
  enabled: true
  name: additional-scrape-configs
  key: prometheus-additional.yaml
Otherwise you get the error cannot unmarshal !!map into []yaml.MapSlice.
Here is better documentation:
https://github.com/prometheus-community/helm-charts/blob/8b45bdbdabd9b54766c4beb3c562b766b268a034/charts/kube-prometheus-stack/values.yaml#L2691
According to this, you can add scrape configs without packaging them into a secret, like this:
additionalScrapeConfigs: |
  - job_name: "prometheus"
    static_configs:
    - targets: ["localhost:9090"]
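For completeness, a minimal sketch of where these values sit when deploying the kube-prometheus-stack Helm chart; the release name, namespace, and values file name below are assumptions, not from the original answer:
# values.yaml (sketch)
prometheus:
  prometheusSpec:
    additionalScrapeConfigs: |
      - job_name: "prometheus"
        static_configs:
        - targets: ["localhost:9090"]

# apply the values
helm upgrade --install monitoring prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml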

Related

Setting vm.max_map_count with tuned

I am trying to set vm.max_map_count with the Node Tuning Operator and the OpenShift ClusterLogging operator. The OpenShift version is 4.9.17; the cluster logging and Elasticsearch operators are the latest.
This is my tuned configuration:
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: common-services-es
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Optimize systems running ES on OpenShift nodes
      include=openshift-node
      [sysctl]
      vm.max_map_count=262144
    name: common-services-es
  recommend:
  - match:
    - label: component
      type: pod
      value: elasticsearch
    priority: 5
    profile: common-services-es
My ClusterLogging configuration is the operator default, and I can verify the label component=elasticsearch on the pod.
Getting the tuned pod logs with the following command
for p in `oc get pods -n openshift-cluster-node-tuning-operator -l openshift-app=tuned -o=jsonpath='{range .items[*]}{.metadata.name} {end}'`; do printf "\n*** $p ***\n" ; oc logs pod/$p -n openshift-cluster-node-tuning-operator | grep applied; done
returns tuned.daemon.daemon: static tuning from profile 'common-services-es' applied on all 3 of my ES nodes, but the Elasticsearch pod still fails to start with the error max virtual memory areas vm.max_map_count [253832] is too low, increase to at least [262144], and running sysctl vm.max_map_count on the nodes confirms the value is 253832.
It turns out that IBM Cloud OpenShift doesn't use MachineConfigs, and tuned relies on MachineConfigs.
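As a possible workaround (my own assumption, not part of the original answer), clusters that cannot apply the sysctl through tuned/MachineConfigs sometimes set it from a privileged DaemonSet; a minimal sketch, where the names and images are placeholders and, on OpenShift, the service account would also need a privileged SCC:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: set-max-map-count        # hypothetical name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: set-max-map-count
  template:
    metadata:
      labels:
        app: set-max-map-count
    spec:
      # privileged init container applies the sysctl on every node, then the pod idles
      initContainers:
      - name: sysctl
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9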

Install fluentd on OpenShift without Elasticsearch?

On OpenShift 4.3, I want to configure fluentd to forward logs to an external syslog. Can I install only fluentd, without installing Elasticsearch etc.?
Thanks
Weiren
Yes, you can install only fluentd by specifying only the collection part when deploying the ClusterLogging CRD:
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  collection:
    logs:
      fluentd: {}
      type: fluentd
  managementState: Managed
Note that later versions of OpenShift even allow you to specify LogForwarding. More information on how to deploy the ClusterLogging can be found in the documentation: https://docs.openshift.com/container-platform/4.3/logging/cluster-logging-deploying.html#cluster-logging-deploy-clo-cli_cluster-logging-deploying
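For illustration only: on newer OpenShift versions (4.6+), forwarding to an external syslog is configured with a ClusterLogForwarder resource; the output name and syslog endpoint below are placeholders, not from the original answer:
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
  - name: remote-syslog                   # placeholder name
    type: syslog
    syslog:
      rfc: RFC5424
    url: udp://syslog.example.com:514     # placeholder endpoint
  pipelines:
  - name: app-to-syslog
    inputRefs:
    - application
    outputRefs:
    - remote-syslog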

KNative serving is not showing Ready after installing on Openshift

I followed the link https://docs.openshift.com/container-platform/4.1/serverless/installing-openshift-serverless.html to install KNative Serving on top of OpenShift v4.1. After installing all the OpenShift operators, control plane, member roll, etc. as given in the link, I expect to see that the serving component is running by executing:
C:\Knative installation>oc get knativeserving/knative-serving -n knative-serving --template='{{range .status.conditions}}{{printf "%s=%s\n" .type .status}}{{end}}'
But the above returns nothing; it just returns the prompt.
Below is the output of the get commands for the serving component:
C:\Knative installation>oc get knativeserving/knative-serving -n knative-serving
NAME VERSION READY REASON
knative-serving
C:\Knative installation>oc get knativeserving/knative-serving -n knative-serving -o yaml
apiVersion: serving.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"serving.knative.dev/v1alpha1","kind":"KnativeServing","metadata":{"annotations":{},"name":"knative-serving","namespace":"knative-serving"}}
  creationTimestamp: "2020-01-12T10:53:42Z"
  generation: 1
  name: knative-serving
  namespace: knative-serving
  resourceVersion: "63660251"
  selfLink: /apis/serving.knative.dev/v1alpha1/namespaces/knative-serving/knativeservings/knative-serving
  uid: cc4b330f-3529-11ea-83ef-0272cb600f74
What could be wrong? I believe KNative Serving did not install correctly, but I am not sure how to debug it. I uninstalled and reinstalled several times, but it didn't help.
Also, I tried to proceed and install a service using KNative Serving (ref link https://docs.openshift.com/container-platform/4.1/serverless/getting-started-knative-services.html), but applying the very first resource shows a problem.
service.yaml
apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    spec:
      containers:
      - image: gcr.io/knative-samples/helloworld-go
        env:
        - name: TARGET
          value: "Go Sample v1"
Applying service.yaml returns an error:
C:\start Knative service> oc apply --filename service.yaml
error: unable to recognize "service.yaml": no matches for kind "Service" in version "serving.knative.dev/v1alpha1"
Any help is appreciated. Thanks.

Kubernetes installation error in flannel step

I am installing Kubernetes using kubeadm on a GCP CentOS VM, and I am getting the following error while running the flannel step.
Error:
[root@master ~]# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/a70459be0084506e4ec919aa1c114638878db11b/Documentation/kube-flannel.yml
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/a70459be0084506e4ec919aa1c114638878db11b/Documentation/kube-flannel.yml": no matches for kind "DaemonSet" in version "extensions/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/a70459be0084506e4ec919aa1c114638878db11b/Documentation/kube-flannel.yml": no matches for kind "DaemonSet" in version "extensions/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/a70459be0084506e4ec919aa1c114638878db11b/Documentation/kube-flannel.yml": no matches for kind "DaemonSet" in version "extensions/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/a70459be0084506e4ec919aa1c114638878db11b/Documentation/kube-flannel.yml": no matches for kind "DaemonSet" in version "extensions/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/a70459be0084506e4ec919aa1c114638878db11b/Documentation/kube-flannel.yml": no matches for kind "DaemonSet" in version "extensions/v1beta1"
What changes shall I make in order to fix this?
Use the flannel YAML from the official documentation:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel configured
clusterrolebinding.rbac.authorization.k8s.io/flannel unchanged
serviceaccount/flannel unchanged
configmap/kube-flannel-cfg configured
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
As @suren correctly mentions, the issue is in apiVersion: extensions/v1beta1.
In the latest YAML it looks like this:
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-amd64
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
...
That's a versioning issue between the DaemonSet and your Kubernetes cluster. You are using extensions/v1beta1, but DaemonSets have been promoted to apps/v1.
If you already have the api-server running, try kubectl explain daemonset, and it will tell you what the apiVersion for DaemonSets should be.
If not, just download the flannel file, edit it, replace apiVersion: extensions/v1beta1 with apiVersion: apps/v1, and it should work.
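As a rough sketch of that manual edit (the local file name is arbitrary, and note that apps/v1 also requires spec.selector.matchLabels, which the newer upstream manifest already includes):
# check which API group/version serves DaemonSet on this cluster
kubectl explain daemonset | head -n 3

# download the manifest, switch the DaemonSet apiVersion, and apply it
curl -sLo kube-flannel.yml https://raw.githubusercontent.com/coreos/flannel/a70459be0084506e4ec919aa1c114638878db11b/Documentation/kube-flannel.yml
sed -i 's|extensions/v1beta1|apps/v1|g' kube-flannel.yml
kubectl apply -f kube-flannel.yml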

route to application stopped working in OpenShift Online 3.9

I have an application running in OpenShift Online Starter which has worked for the last 5 months: a single pod behind a service, with a route defined that does edge TLS termination.
Since Saturday, when trying to access the application, I get the error message
Application is not available
The application is currently not serving requests at this endpoint. It may not have been started or is still starting.
Possible reasons you are seeing this page:
The host doesn't exist. Make sure the hostname was typed correctly and that a route matching this hostname exists.
The host exists, but doesn't have a matching path. Check if the URL path was typed correctly and that the route was created using the desired path.
Route and path matches, but all pods are down. Make sure that the resources exposed by this route (pods, services, deployment configs, etc) have at least one pod running.
The pod is running, I can exec into it and check this, I can port-forward to it and access it.
Checking the different components with oc:
$ oc get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE
taboo3-23-jt8l8 1/1 Running 0 1h 10.128.37.90 ip-172-31-30-113.ca-central-1.compute.internal
$ oc get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
taboo3 172.30.238.44 <none> 8080/TCP 151d
$ oc describe svc taboo3
Name: taboo3
Namespace: sothawo
Labels: app=taboo3
Annotations: openshift.io/generated-by=OpenShiftWebConsole
Selector: deploymentconfig=taboo3
Type: ClusterIP
IP: 172.30.238.44
Port: 8080-tcp 8080/TCP
Endpoints: 10.128.37.90:8080
Session Affinity: None
Events: <none>
$ oc get route
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
taboo3 taboo3-sothawo.193b.starter-ca-central-1.openshiftapps.com taboo3 8080-tcp edge/Redirect None
I tried to add a new route as well (with or without TLS), but I am getting the same error.
Does anybody have an idea what might be causing this and how to fix it?
Addition, April 17, 2018: I got an email from OpenShift Online support:
It looks like you may be affected by this bug.
So I am waiting for it to be resolved.
The problem has been resolved by OpenShift Online; the application is working again.