[Practice] Scraping Data from External Clusters with Prometheus

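We usually deploy Prometheus in the current cluster to scrape and collect that cluster's monitoring data. When we run several public-cloud clusters and none of them is large, a single Prometheus can instead aggregate the data from all of them in one place.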

Preface

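Alibaba Cloud currently appears to offer two ways of scraping data: one based on cloud-native Prometheus, and one based on Prometheus-operator for multi-cluster scraping, with no deployment needed in the cluster itself.
Tencent Cloud's solution is a deeply customized build on top of Thanos. It adds custom hash rules based on Kubernetes service discovery, which address uneven hash distribution as well as the reloader problem after a configuration refresh.
This post describes how to scrape cluster B's monitoring data from cluster A: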

Procedure

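Take the kubelet's cAdvisor endpoint as an example. One way to scrape it is to use RBAC permissions and send every request through the apiserver, which returns the data on the kubelet's behalf; this quietly puts extra load on the apiserver. The other way, used in this post, is to scrape each node's kubelet directly.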

Creating the RBAC resources

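First, deploy the RBAC resources in both cluster A and cluster B, so that Prometheus has permission to scrape and collect data whether the target is the local cluster or an external one.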
RBAC.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
  namespace: kube-system
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics
  - nodes/metrics/cadvisor
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system

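Note: the permissions on nodes/metrics and nodes/metrics/cadvisor are added so that Prometheus can scrape each kubelet directly in a distributed fashion, bypassing the apiserver and reducing the load on it.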

Obtaining the RBAC token

After the RBAC resources have been created, use the following command to see which secret stores the token:

kubectl get sa prometheus -n kube-system -o yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"name":"prometheus","namespace":"kube-system"}}
  creationTimestamp: 2020-02-19T03:56:52Z
  name: prometheus
  namespace: kube-system
  resourceVersion: "289515514"
  selfLink: /api/v1/namespaces/kube-system/serviceaccounts/prometheus
  uid: dcc501be-52cb-11ea-97c1-246e964bd6e0
secrets:
- name: prometheus-token-z6hzl

Get the token (the secret also contains the CA certificate, which you can use if needed):

kubectl get secrets prometheus-token-z6hzl -n kube-system -ojsonpath='{.data.token}' | base64 -d
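
Before copying the token to another cluster, it can help to confirm that it really belongs to the expected ServiceAccount. The token is a JWT, so its claims can be read without any cluster access. The following is a minimal sketch in Python; the helper names (`decode_jwt_payload`, `_b64url`) and the locally built sample token are illustrative assumptions, and the signature is not verified.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the (unverified) claims of a JWT such as a ServiceAccount token."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload = parts[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(obj: dict) -> str:
    """Encode a dict the way a JWT segment is encoded (helper for the demo only)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical token built locally for illustration; a real one comes from the Secret.
sample_claims = {
    "iss": "kubernetes/serviceaccount",
    "kubernetes.io/serviceaccount/namespace": "kube-system",
    "kubernetes.io/serviceaccount/service-account.name": "prometheus",
}
sample_token = ".".join([_b64url({"alg": "RS256"}), _b64url(sample_claims), "signature"])

claims = decode_jwt_payload(sample_token)
print(claims["kubernetes.io/serviceaccount/namespace"],
      claims["kubernetes.io/serviceaccount/service-account.name"])
```

With a real token, pass the output of the kubectl command above to `decode_jwt_payload` and check that the namespace and ServiceAccount name are the ones created by RBAC.yaml.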

Verifying the token's permissions

Run the following on a physical node:

TOKEN=$(kubectl get secrets prometheus-token-z6hzl -n kube-system -ojsonpath='{.data.token}' | base64 -d)

Verify the URL with the bearer token:

curl -k --header "Authorization: Bearer $TOKEN" https://<any-K8s-node-IP>:10250/metrics/cadvisor
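
The same check can be scripted, for example to run it against every node. The sketch below mirrors the curl call using only the Python standard library; the function names and the sample IP and token are assumptions for illustration, and certificate verification is disabled to match `curl -k`.

```python
import ssl
import urllib.request


def build_cadvisor_request(node_ip: str, token: str) -> urllib.request.Request:
    """Build the kubelet cAdvisor request with the bearer token attached."""
    url = f"https://{node_ip}:10250/metrics/cadvisor"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def insecure_context() -> ssl.SSLContext:
    """TLS context that skips certificate verification, like `curl -k`."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode is relaxed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_cadvisor(node_ip: str, token: str, timeout: float = 5.0) -> str:
    """Fetch the metrics text; a 401/403 surfaces as urllib.error.HTTPError."""
    req = build_cadvisor_request(node_ip, token)
    with urllib.request.urlopen(req, context=insecure_context(), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Hypothetical values; no network call is made here.
req = build_cadvisor_request("10.0.0.1", "dummy-token")
print(req.full_url)
```

Calling `fetch_cadvisor("<node-IP>", token)` with a real node IP and token performs the actual request.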

If the response looks like the following, the token has the required permissions:

container_threads{container="xxx",image="xxxx"}   666
container_threads{container="xxx",image="xxxx"}   666
container_threads{container="xxx",image="xxxx"}   666
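
To check a response programmatically rather than by eye, each line can be parsed as a metric sample. This is a simplified sketch, not a full parser for the Prometheus text format: it ignores comment lines and does not unescape label values. The names `SAMPLE_RE`, `LABEL_RE`, and `parse_sample` are illustrative.

```python
import re

# metric_name{labels} value [timestamp]
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str):
    """Split one sample line into (metric name, label dict, float value)."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a metric sample: {line!r}")
    name, raw_labels, value = m.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(value)


name, labels, value = parse_sample('container_threads{container="xxx",image="xxxx"}   666')
print(name, labels, value)
```

An error page or a Forbidden message does not match this shape, so `parse_sample` raises `ValueError` on it.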

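If the response contains 403, 401, or Forbidden, the permissions are insufficient; go through the steps again from the beginning to find what went wrong.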

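  • Update:
    If the RBAC resources have been created and the path has been confirmed correct, but you still see
    Forbidden (user=system:anonymous, verb=get, resource=nodes, subresource=metrics)
    
    you can try creating a clusterrolebinding. Note that this grants cluster-admin to unauthenticated requests, so it is only suitable for a test environment:
    kubectl create clusterrolebinding system:anonymous   --clusterrole=cluster-admin   --user=system:anonymous -n kube-system
    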

Operations on the main (scraping) cluster

Prometheus-operator

Create the Secret that holds the token

apiVersion: v1
kind: Secret
metadata:
  name: others-prometheus-token
type: Opaque
stringData:
  k8s.token: |-
    eyJhbGciOiJSUzI1NiIsImtpZCI6IjZ1b3JCM0tQNnNuUmNtRTBRNllLanItc1UxWEtSbjJsSGJhNXphRUVOTmMifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWdxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Create the additional Prometheus scrape configuration

kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring

Using cAdvisor as an example

Official example:

    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

Modified version to mount:

- job_name: 'kubernetes-cadvisor-stag'
  kubernetes_sd_configs:
  - api_server: https://apiserverIP:6443
    role: node
    bearer_token_file: /etc/prometheus/secrets/others-prometheus-token/k8s.token
    tls_config:
      insecure_skip_verify: true
  scheme: https
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /etc/prometheus/secrets/others-prometheus-token/k8s.token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_address_InternalIP]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:10250
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /metrics/cadvisor
  metric_relabel_configs:
  - source_labels: [container]
    regex: (.+)
    target_label: container_name
    replacement: $1
    action: replace
  - source_labels: [pod]
    regex: (.+)
    target_label: pod_name
    replacement: $1
    action: replace
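
To see what these rules do to a discovered node, the `replace` action can be simulated in a few lines. The sketch below is a simplified model of Prometheus relabeling, not its actual implementation: it handles only `action: replace`, and the node name, IP, and sample labels are made-up values. It shows that the target rules point the scrape at the kubelet directly, and that the metric rules copy `container` and `pod` into `container_name` and `pod_name`, presumably so that dashboards and rules written against the older label names keep working.

```python
import re


def relabel_replace(labels, source_labels, regex, target_label, replacement, separator=";"):
    """Apply one `action: replace` rule and return a new label dict."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # Prometheus anchors the regex at both ends
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Translate Prometheus-style ${1} / $1 references to Python's \g<1>.
    template = re.sub(r"\$\{(\w+)\}|\$(\w+)",
                      lambda m: rf"\g<{m.group(1) or m.group(2)}>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


# Hypothetical node as returned by kubernetes_sd_configs with role: node.
target = {
    "__meta_kubernetes_node_name": "node-1",
    "__meta_kubernetes_node_address_InternalIP": "10.0.0.1",
}
target = relabel_replace(target, ["__meta_kubernetes_node_address_InternalIP"],
                         "(.+)", "__address__", "${1}:10250")
target = relabel_replace(target, ["__meta_kubernetes_node_name"],
                         "(.+)", "__metrics_path__", "/metrics/cadvisor")
scrape_url = f"https://{target['__address__']}{target['__metrics_path__']}"
print(scrape_url)

# metric_relabel_configs: applied to each scraped sample's labels.
sample = relabel_replace({"container": "nginx", "pod": "web-0"},
                         ["container"], "(.+)", "container_name", "$1")
sample = relabel_replace(sample, ["pod"], "(.+)", "pod_name", "$1")
print(sample)
```

By contrast, the official example rewrites `__address__` to kubernetes.default.svc:443 and the path to the apiserver's node proxy, so every scrape goes through the apiserver.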

Changes to the Prometheus deployment manifest (operator)

Add the following configuration:
  secrets:
    - others-prometheus-token
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml

Once this is deployed, the setup is complete.

Cloud-native kubernetes_sd_configs

Cloud-native configuration file:

- job_name: 'kubernetes-cadvisor-stag'
  kubernetes_sd_configs:
  - api_server: https://apiserverIP:6443
    role: node
    bearer_token_file: /etc/prometheus/k8s.token
    tls_config:
      insecure_skip_verify: true
  scheme: https
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /etc/prometheus/k8s.token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_address_InternalIP]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:10250
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /metrics/cadvisor
  metric_relabel_configs:
  - source_labels: [container]
    regex: (.+)
    target_label: container_name
    replacement: $1
    action: replace
  - source_labels: [pod]
    regex: (.+)
    target_label: pod_name
    replacement: $1
    action: replace

  k8s.token: |
    eyJhbGciOiJSUzI1NiIsImtpZCI6IndOWmlUbEU5Z2F1UVltbDZZeEg5SnhzSmRzUERfcUNmMXFXQjRRV09fX0EifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJtb25pdG9yaW5nIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6InByb21ldGhldXMtdG9rZW4teHdtZGYiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50Lxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Deployment manifest

Mount both the scrape configuration and k8s.token together through a ConfigMap.
