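I. Kubernetes Logging Overview
Logs are the key to troubleshooting in Kubernetes. The main log types are:
- Container logs (stdout/stderr)
- Host (node) logs
- Application logs
- Kubernetes component logs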
II. ELK Architecture
1. Architecture diagram
```
┌─────────┐     ┌─────────┐     ┌─────────┐     ┌─────────┐
│ Fluentd │     │ Fluentd │     │ Fluentd │     │ Fluentd │
│  Node1  │     │  Node2  │     │  Node3  │     │  NodeN  │
└────┬────┘     └────┬────┘     └────┬────┘     └────┬────┘
     │               │               │               │
     └───────────────┴───────┬───────┴───────────────┘
                             │
                     ┌───────┴───────┐
                     │ Elasticsearch │
                     │    Cluster    │
                     └───────┬───────┘
                             │
            ┌────────────────┼────────────────┐
            │                │                │
     ┌──────┴──────┐      ┌──┴──┐      ┌──────┴──────┐
     │   Kibana    │      │ API │      │  Logstash   │
     └─────────────┘      └─────┘      └─────────────┘
```
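2. Fluentd deployment
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: logging  # same namespace as the DaemonSet that mounts it
data:
  fluent.conf: |
    # Tail the container log files on each node.
    <source>
      @type tail
      @id input_tail
      @label @mainstream
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      # JSON matches the Docker json-file log driver; containerd and CRI-O
      # nodes write a different line format and need a CRI parser instead.
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>
    # Events routed to a label are only seen by blocks inside that label.
    <label @mainstream>
      # Enrich each record with pod, namespace, and label metadata.
      <filter kubernetes.**>
        @type kubernetes_metadata
        @id filter_kube_metadata
      </filter>
      # Ship to Elasticsearch as daily kubernetes-YYYY.MM.DD indices.
      <match **>
        @type elasticsearch
        host elasticsearch.logging.svc
        port 9200
        logstash_format true
        logstash_prefix kubernetes
      </match>
    </label>
```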
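With `logstash_format true` and `logstash_prefix kubernetes`, the Elasticsearch output writes to one index per day rather than a single fixed index. The following Python sketch shows how such an index name is derived from an event timestamp. It assumes the plugin defaults (a `-` separator, the `%Y.%m.%d` date format, UTC dates); `daily_index` is a hypothetical helper for illustration, not part of Fluentd.

```python
from datetime import datetime, timezone


def daily_index(prefix: str, event_time: datetime) -> str:
    """Return the daily index name an event would land in.

    Assumes the default naming scheme: <prefix>-<YYYY.MM.DD>, dated in UTC.
    """
    utc_time = event_time.astimezone(timezone.utc)
    return f"{prefix}-{utc_time:%Y.%m.%d}"


if __name__ == "__main__":
    ts = datetime(2024, 1, 15, 10, 0, 0, tzinfo=timezone.utc)
    print(daily_index("kubernetes", ts))  # kubernetes-2024.01.15
```

This is why a Kibana data view for these logs is usually defined on the pattern `kubernetes-*`, and why old logs can be dropped by deleting whole daily indices.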
3. DaemonSet deployment
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      # The account needs RBAC permission to read pod metadata
      # for the kubernetes_metadata filter.
      serviceAccountName: fluentd
      containers:
        - name: fluentd
          # The base image does not bundle the elasticsearch or
          # kubernetes_metadata plugins; use an image that includes them.
          image: fluent/fluentd:v1.16-debian-1
          volumeMounts:
            - name: config
              mountPath: /fluentd/etc  # where this image looks for fluent.conf
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: fluentd-config
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
```
III. Loki Architecture
1. Architecture diagram
```
┌─────────┐     ┌─────────┐     ┌─────────┐
│ Promtail│     │ Promtail│     │ Promtail│
│  Node1  │     │  Node2  │     │  Node3  │
└────┬────┘     └────┬────┘     └────┬────┘
     │               │               │
     └───────────────┼───────────────┘
                     │
              ┌──────┴──────┐
              │    Loki     │
              │ Distributor │
              └──────┬──────┘
                     │
              ┌──────┴──────┐
              │  Ingester   │
              └──────┬──────┘
                     │
              ┌──────┴──────┐
              │    Chunk    │
              │   Storage   │
              └──────┬──────┘
                     │
       ┌─────────────┼─────────────┐
       │             │             │
┌──────┴──────┐   ┌──┴──┐   ┌──────┴──────┐
│   Grafana   │   │ API │   │    LogQL    │
└─────────────┘   └─────┘   └─────────────┘
```
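2. Loki deployment
```yaml
# NOTE: check every field against the LokiStack CRD shipped with your
# Loki Operator version. Published schemas expect object storage via
# spec.storage.secret plus spec.storageClassName, so storage.type,
# services, and tenants.mode: single below may be rejected as written.
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: loki-stack
  namespace: monitoring
spec:
  size: 1x.small
  storage:
    type: filesystem
  services:
    - name: read
      replicas: 1
    - name: write
      replicas: 1
  tenants:
    mode: single
```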
3. Promtail configuration
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
data:
  promtail.yaml: |
    server:
      http_listen_port: 3100
      grpc_listen_port: 9096
    positions:
      # /tmp is lost when the pod restarts; use a hostPath-backed file
      # if positions must survive restarts.
      filename: /tmp/positions.yaml
    clients:
      - url: http://loki:3100/loki/api/v1/push
        # Attached to every stream this Promtail instance sends.
        external_labels:
          cluster: "k8s-prod"
    scrape_configs:
      # Static file targets. For per-pod labels, use
      # kubernetes_sd_configs (role: pod) instead.
      - job_name: kubernetes
        static_configs:
          - targets:
              - localhost
            labels:
              job: varlogs
              __path__: /var/log/*.log
          - targets:
              - localhost
            labels:
              job: containers
              __path__: /var/log/containers/*.log
```
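IV. ELK vs Loki Comparison
| Feature | ELK (Elasticsearch) | Loki |
|---|---|---|
| Storage | Elasticsearch | Object storage |
| Indexing | Full-text index over log content | Label index only; log content is not indexed |
| Resource usage | High | Low |
| Query language | Lucene query syntax | LogQL |
| Cost | High | Low |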
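The indexing row explains most of the other rows in the table. The toy Python sketch below contrasts the two approaches on three in-memory log lines: a full-text index keeps an entry for every token, while a label index keeps one entry per label set and scans the matching lines at query time. It is a conceptual illustration only; the sample data and all function names are made up, and neither Elasticsearch nor Loki is implemented this way internally.

```python
from collections import defaultdict

# Hypothetical sample data: (labels, log line) pairs.
LOGS = [
    ({"job": "myapp", "namespace": "production"}, "ERROR order creation failed"),
    ({"job": "myapp", "namespace": "production"}, "INFO order created"),
    ({"job": "billing", "namespace": "staging"}, "ERROR database connection timeout"),
]


def build_fulltext_index(logs):
    """Full-text style: every token of every line points at its line id."""
    index = defaultdict(set)
    for line_id, (_, line) in enumerate(logs):
        for token in line.lower().split():
            index[token].add(line_id)
    return index


def build_label_index(logs):
    """Label style: only label sets are indexed; lines are stored as-is."""
    index = defaultdict(list)
    for labels, line in logs:
        index[frozenset(labels.items())].append(line)
    return index


def label_search(label_index, selector, needle):
    """Select streams by label, then scan their lines for a substring."""
    wanted = set(selector.items())
    hits = []
    for stream_labels, lines in label_index.items():
        if wanted <= stream_labels:
            hits.extend(line for line in lines if needle in line)
    return hits


if __name__ == "__main__":
    fulltext = build_fulltext_index(LOGS)
    labels = build_label_index(LOGS)
    print("full-text index entries:", len(fulltext))
    print("label index entries:", len(labels))
    print(label_search(labels, {"job": "myapp"}, "ERROR"))
```

The full-text index grows with the vocabulary of the logs, which costs storage and memory but makes arbitrary term lookups fast. The label index stays small, at the price of scanning log content for every text filter.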
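V. Log Queries
1. Kibana queries
```
# Search for logs containing "error"
error
# Search within a specific namespace
kubernetes.namespace_name: production
# Combined query (the wildcard covers generated pod name suffixes)
kubernetes.pod_name: myapp* AND error
```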
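The same searches can be run outside Kibana through the Elasticsearch search API. The Python sketch below builds a request body that wraps a search-bar style query in a `query_string` clause and sorts by the `@timestamp` field. It makes no network call; the `kubernetes-*` index pattern is an assumption based on the `logstash_prefix` used earlier, and `build_search_body` is a hypothetical helper.

```python
import json

# Assumed index pattern, derived from logstash_prefix "kubernetes".
INDEX_PATTERN = "kubernetes-*"


def build_search_body(query: str, size: int = 50) -> dict:
    """Build an Elasticsearch _search body for a Lucene-syntax query."""
    return {
        "size": size,
        # Newest entries first.
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {"query_string": {"query": query}},
    }


if __name__ == "__main__":
    body = build_search_body("kubernetes.namespace_name: production AND error")
    print(f"POST /{INDEX_PATTERN}/_search")
    print(json.dumps(body, indent=2))
```

The printed body can be sent with any HTTP client to the Elasticsearch service, for example the `elasticsearch.logging.svc:9200` endpoint from the Fluentd configuration.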
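2. LogQL queries
```logql
# Query all logs of a job
{job="myapp"}
# Keep only lines containing "ERROR"
{job="myapp"} |= "ERROR"
# Count log lines over the last 5 minutes
count_over_time({job="myapp"}[5m])
# Parse JSON logs and filter on the extracted level field
{job="myapp"} | json | level="error"
```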
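LogQL queries like these can also be sent straight to Loki's HTTP API, which is what Grafana does behind the scenes. The Python sketch below builds a URL for the `/loki/api/v1/query_range` endpoint with the query properly URL-encoded. It only constructs the URL and makes no request; the base URL reuses the `http://loki:3100` address from the Promtail configuration, and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed to match the Loki address used in the Promtail client URL.
LOKI_BASE_URL = "http://loki:3100"


def to_unix_nanos(moment: datetime) -> int:
    """Convert a datetime to Unix time in nanoseconds (second precision)."""
    return int(moment.timestamp()) * 1_000_000_000


def query_range_url(base_url: str, logql: str, start: datetime,
                    end: datetime, limit: int = 100) -> str:
    """Build a query_range URL for a LogQL query over [start, end]."""
    params = {
        "query": logql,
        "start": to_unix_nanos(start),
        "end": to_unix_nanos(end),
        "limit": limit,
    }
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    start = datetime(2024, 1, 15, 10, 0, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 15, 10, 5, 0, tzinfo=timezone.utc)
    print(query_range_url(LOKI_BASE_URL, '{job="myapp"} |= "ERROR"', start, end))
```

Encoding matters here because LogQL selectors contain braces, quotes, and pipes, none of which are safe in a raw query string.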
VI. Best Practices
1. Log format
```json
{
  "timestamp": "2024-01-15T10:00:00Z",
  "level": "ERROR",
  "service": "order-service",
  "trace_id": "abc123",
  "message": "Order creation failed",
  "error": "Database connection timeout"
}
```
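One way to emit logs in this shape is a custom formatter on top of Python's standard `logging` module, writing one JSON object per line to stdout so that the node-level collector picks it up. The sketch below is a minimal version under that assumption; `JsonFormatter` and `build_logger` are illustrative names, and `trace_id` and `error` are passed through the standard `extra` argument.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": self.service,
            # trace_id and error arrive via logging's `extra` argument.
            "trace_id": getattr(record, "trace_id", None),
            "message": record.getMessage(),
        }
        error = getattr(record, "error", None)
        if error is not None:
            entry["error"] = error
        return json.dumps(entry)


def build_logger(service: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stdout."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter(service))
    logger = logging.getLogger(service)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("order-service")
    log.error(
        "Order creation failed",
        extra={"trace_id": "abc123", "error": "Database connection timeout"},
    )
```

Because every line is valid JSON, both the Fluentd `json` parser and the LogQL `json` stage can extract `level`, `service`, and `trace_id` as fields without custom regular expressions.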
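2. Log collection strategy
- Use structured logs (JSON)
- Standardize log levels across services
- Include a trace_id in every entry
- Mask sensitive information before it is written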
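Masking is easiest to enforce at the point where the message is written, before it ever reaches the collector. The Python sketch below redacts two kinds of sensitive values with regular expressions: `key=value` credentials and email addresses. The patterns and the `redact` helper are illustrative assumptions; a real deployment would need rules matched to its own data.

```python
import re

# Hypothetical redaction rules: (pattern, replacement) pairs.
_RULES = [
    # key=value credentials such as password=..., token=..., secret=...
    (re.compile(r"(?i)\b(password|token|secret)=(\S+)"), r"\1=***"),
    # Email addresses.
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "<email>"),
]


def redact(message: str) -> str:
    """Return the message with sensitive values replaced by placeholders."""
    for pattern, replacement in _RULES:
        message = pattern.sub(replacement, message)
    return message


if __name__ == "__main__":
    print(redact("login failed password=hunter2 user=a@b.com"))
    # login failed password=*** user=<email>
```

Calling `redact` inside a logging formatter or filter applies the rules to every message, so individual call sites cannot forget them.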
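VII. Summary
Choosing a log collection stack:
- ELK: powerful, but resource-hungry
- Loki: lightweight and low-cost
- Recommendation: use Loki for small and medium deployments, and ELK for large-scale ones
This is a personal opinion, offered for reference only.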