[Original] Binary deployment of k8s 1.18.3 (17)

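Whether the k8s control-plane components are deployed from binaries or with a recent version of kubeadm, the Targets page under Status in Prometheus shows 0/0 targets for both kube-controller-manager and kube-scheduler. The cause is that a ServiceMonitor selects Services by label. If you inspect the corresponding ServiceMonitors, you can see that they select within the kube-system namespace.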
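To see which labels and namespaces a ServiceMonitor actually selects, something like the following may help. It is a minimal sketch that assumes the kube-prometheus defaults: the ServiceMonitors are named kube-scheduler and kube-controller-manager and live in the monitoring namespace. The helper name show_sm_selector is made up for illustration.

```shell
# Print the label selector and the namespace selector of a ServiceMonitor.
# Assumption: the ServiceMonitor lives in "monitoring" unless a second
# argument says otherwise.
show_sm_selector() {
  sm_name="$1"
  sm_ns="${2:-monitoring}"
  kubectl get servicemonitor "$sm_name" -n "$sm_ns" \
    -o jsonpath='{.spec.selector}{"\n"}{.spec.namespaceSelector}{"\n"}'
}

# Usage (needs access to the cluster):
#   show_sm_selector kube-scheduler
#   show_sm_selector kube-controller-manager
```

A Service in kube-system has to carry the labels printed here before Prometheus will pick it up.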

Solution:

Check the endpoints. Both components show <none>:

[root@uk8s-a kube-prometheus]# kubectl get endpoints -n kube-system
NAME                      ENDPOINTS                                                               AGE
kube-controller-manager   <none>                                                                  7m35s
kube-dns                  10.244.43.2:53,10.244.62.2:53,10.244.43.2:9153 + 3 more...              4m10s
kube-scheduler            <none>                                                                  7m31s
kubelet                   192.168.0.170:4194,192.168.0.183:4194,192.168.0.236:4194 + 12 more...   22s

Check the ports the two components listen on:

[root@uk8s-a kube-prometheus]# ss -tnlp| grep scheduler
LISTEN 0 32768 :::10251 :::* users:(("kube-scheduler",pid=60128,fd=5))
LISTEN 0 32768 :::10259 :::* users:(("kube-scheduler",pid=60128,fd=7))
[root@uk8s-a kube-prometheus]# ss -tnlp| grep contro
LISTEN 0 32768 :::10252 :::* users:(("kube-controller",pid=59695,fd=6))
LISTEN 0 32768 :::10257 :::* users:(("kube-controller",pid=59695,fd=7))
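Before wiring these ports into Prometheus, it can be worth confirming that they really serve metrics. The sketch below assumes that 10251 and 10252 are the plain-HTTP ports (as they are by default in 1.18) and that curl is installed on the master node. The helper name probe_metrics is made up for illustration.

```shell
# Print the HTTP status code returned by a component's /metrics endpoint.
# Assumption: the port speaks plain HTTP (10251 for kube-scheduler,
# 10252 for kube-controller-manager in this setup).
probe_metrics() {
  pm_host="$1"
  pm_port="$2"
  curl -s -m 5 -o /dev/null -w '%{http_code}\n' \
    "http://${pm_host}:${pm_port}/metrics"
}

# Usage on a master node:
#   probe_metrics 127.0.0.1 10251
#   probe_metrics 127.0.0.1 10252
```

A 200 status suggests that the endpoint can be scraped over HTTP. The HTTPS ports (10259 and 10257) normally require authentication and are not covered by this sketch.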

Create the following file and apply it:
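Applying the manifest and checking the result could look roughly like this. It is a sketch, not the exact commands used in the original session: the file name is the one shown in the cat command that follows, and the helper name apply_and_check is made up for illustration.

```shell
# Apply the Service/Endpoints manifest, then list the two Endpoints objects
# so you can see whether they now carry addresses instead of <none>.
apply_and_check() {
  ac_manifest="${1:-schedulerandcontroller-ep-svc.yaml}"
  kubectl apply -f "$ac_manifest" &&
    kubectl get endpoints -n kube-system kube-scheduler kube-controller-manager
}

# Usage (needs access to the cluster):
#   apply_and_check
#   apply_and_check /path/to/other-manifest.yaml   # hypothetical path
```

If the Endpoints now list the master IPs, the targets should appear on the Prometheus Targets page after the next service-discovery refresh.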

[root@uk8s-a yaml]# cat schedulerandcontroller-ep-svc.yaml
# cat kube-scheduer-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10259
    protocol: TCP
    targetPort: 10259
  - name: http-metrics
    port: 10251
    protocol: TCP
    targetPort: 10251
  type: ClusterIP
---
# cat kube-controller-manager-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10257
    protocol: TCP
    targetPort: 10257
  - name: http-metrics
    port: 10252
    protocol: TCP
    targetPort: 10252
  type: ClusterIP
---
# cat ep-controller-manager.yaml
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
  annotations:
    prometheus.io/scrape: 'true'
subsets:
- addresses:
  - ip: 192.168.0.236
    targetRef:
      kind: Node
      name: 192.168.0.236
  - ip: 192.168.0.170
    targetRef:
      kind: Node
      name: 192.168.0.170
  - ip: 192.168.0.243
    targetRef:
      kind: Node
      name: 192.168.0.243
  ports:
  - name: http-metrics
    port: 10252
    protocol: TCP
  - name: https-metrics
    port: 10257
    protocol: TCP
---
# cat ep-scheduler.yaml
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
  annotations:
    prometheus.io/scrape: 'true'
subsets:
- addresses:
  - ip: 192.168.0.236
    targetRef:
      kind: Node
      name: 192.168.0.236
  - ip: 192.168.0.170
    targetRef:
      kind: Node
      name: 192.168.0.170
  - ip: 192.168.0.243
    targetRef:
      kind: Node
      name: 192.168.0.243
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP
  - name: https-metrics
    port: 10259
    protocol: TCP

2) The node-exporter target shows (3/5)

Two nodes are affected. kubectl top node shows the same problem: no metrics are reported for those nodes.

[root@uk8s-a kube-prometheus]# kubectl top nodes
NAME            CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
192.168.0.170   110m         5%          1360Mi          36%
192.168.0.236   114m         5%          1569Mi          42%
192.168.0.243   101m         5%          1342Mi          36%
192.168.0.183   <unknown>    <unknown>   <unknown>       <unknown>
192.168.0.86    <unknown>    <unknown>   <unknown>       <unknown>

Solution:

Find the Pods running on the affected nodes:

[root@uk8s-a kube-prometheus]# kubectl get pods -o custom-columns='NAME:metadata.name,NODE:spec.nodeName' -n monitoring |grep node
node-exporter-2fqt5   192.168.0.243
node-exporter-fxqxb   192.168.0.170
node-exporter-pbq28   192.168.0.183
node-exporter-tvw5j   192.168.0.236
node-exporter-znp6k   192.168.0.86

Check the logs:

[root@uk8s-a kube-prometheus]# kubectl logs -f node-exporter-znp6k -n monitoring -c kube-rbac-proxy
I0627 02:58:01.947861 53400 main.go:213] Generating self signed cert as no cert is provided
I0627 02:58:44.246733 53400 main.go:243] Starting TCP socket on [192.168.0.86]:9100
I0627 02:58:44.346251 53400 main.go:250] Listening securely on [192.168.0.86]:9100
E0627 02:59:27.246742 53400 webhook.go:106] Failed to make webhook authenticator request: Post https://10.96.0.1:443/apis/authentication.k8s.io/v1beta1/tokenreviews: dial tcp 10.96.0.1:443: i/o timeout
E0627 02:59:27.247585 53400 proxy.go:67] Unable to authenticate the request due to an error: Post https://10.96.0.1:443/apis/authentication.k8s.io/v1beta1/tokenreviews: dial tcp 10.96.0.1:443: i/o timeout
E0627 02:59:42.160199 53400 webhook.go:106] Failed to make webhook authenticator request: Post https://10.96.0.1:443/apis/authentication.k8s.io/v1beta1/tokenreviews: dial tcp 10.96.0.1:443: i/o timeout

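The log keeps reporting timeouts when connecting to 10.96.0.1:443. It looks as if the connection can never be established because the reply packets from kubernetes do not make it back.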
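The timeout can be reproduced by hand on an affected node, which also gives a quick way to verify a fix later. The sketch below assumes that curl is installed and that 10.96.0.1 is the address from the log above. It connects from the host's network namespace, so it is only a rough approximation of what kube-rbac-proxy does. The helper name check_service_vip is made up for illustration.

```shell
# Try to reach the API server through its service address.
# Any HTTP response (even 401 or 403) counts as reachable, because only
# connectivity is being tested; a timeout or refused connection does not.
check_service_vip() {
  vip="${1:-10.96.0.1}"
  if curl -k -s -m 5 -o /dev/null "https://${vip}:443/version"; then
    echo "reachable: ${vip}:443"
  else
    echo "unreachable: ${vip}:443"
    return 1
  fi
}

# Usage on an affected node (for example 192.168.0.86):
#   check_service_vip
```

Running the same check on a healthy node gives a baseline to compare against.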

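There are two ways to fix this:

Add a firewall rule on the affected nodes (not recommended)

iptables -t nat -I POSTROUTING -s 10.96.0.0/12 -j MASQUERADE

Edit the kube-proxy configuration file and set the correct cluster-CIDR (recommended)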
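To find the value kube-proxy is currently using, you can search its configuration for the setting. This is a minimal sketch. The configuration path depends on how the binaries were installed, so it is passed in as an argument. The helper name show_cluster_cidr and the systemd unit name in the comment are assumptions.

```shell
# Show the lines that set the cluster CIDR, whether the file uses the
# KubeProxyConfiguration field (clusterCIDR) or the command-line flag
# (--cluster-cidr).
show_cluster_cidr() {
  grep -nE 'clusterCIDR|cluster-cidr' "$1"
}

# Usage on an affected node:
#   show_cluster_cidr /path/to/kube-proxy-config   # hypothetical path
# After correcting the value, restart kube-proxy, for example:
#   systemctl restart kube-proxy   # assumes a systemd unit with this name
```

The pod IPs in the kube-dns endpoints shown earlier are in 10.244.x.x, so the correct value presumably has to cover that range. Confirm it against the pod network actually configured for the cluster.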
