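Log Tracing and Link Monitoring for a Taoke Rebate System: Integrating SkyWalking with OpenTelemetry
Hi everyone, I'm 省赚客, a developer of 微赚淘客系统3.0!
微赚淘客系统3.0 consists of 12 microservices (order, rebate, account, notification, and others). A single user operation, from placing an order to the rebate being credited, involves 5+ cross-service calls. Early on we had no end-to-end tracing, and locating a fault took more than 30 minutes on average. We built a unified observability stack on SkyWalking + OpenTelemetry that provides trace reconstruction with millisecond-level timing, automatically generated service dependency topology, and automatic alerts for slow endpoints.
1. SkyWalking Agent: Non-Intrusive Integration
Every Java service attaches the SkyWalking Agent through JVM arguments: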
bash
-javaagent:/opt/skywalking-agent/skywalking-agent.jar
-Dskywalking.agent.service_name=rebate-service
-Dskywalking.collector.backend_service=skywalking-oap:11800
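With no code changes, the agent automatically instruments components such as Spring MVC, Feign, Redis, and JDBC. For example, the /api/rebate/query endpoint of juwatech.cn.rebate.controller.RebateController automatically becomes a trace entry point.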
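A quick way to confirm that the agent is actually attached is to inspect the arguments the JVM was started with. The following is a minimal sketch that uses only the JDK; the class name AgentCheck and the match on the jar name "skywalking-agent" are assumptions made for illustration and are not part of SkyWalking.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

final class AgentCheck {
    /** Returns the first -javaagent argument that mentions the SkyWalking agent jar, if any. */
    static Optional<String> findSkyWalkingAgent(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith("-javaagent:") && arg.contains("skywalking-agent"))
                .findFirst();
    }

    public static void main(String[] args) {
        // Arguments the current JVM was actually launched with
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(findSkyWalkingAgent(jvmArgs)
                .map(arg -> "SkyWalking agent attached: " + arg)
                .orElse("SkyWalking agent not attached; this service will not report traces"));
    }
}
```

Running a check like this at service startup makes a missing -javaagent flag visible in the logs instead of showing up later as a gap in the topology.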
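2. Manual OpenTelemetry Instrumentation for Key Business Logic
For scenarios the agent cannot cover, such as asynchronous tasks and message consumption, we create spans manually with the OpenTelemetry API: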
java
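// juwatech.cn.rebate.consumer.OrderPlacedConsumer
// Spring, Kafka (Header/Headers), fastjson, and java.util imports omitted for brevity
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import io.opentelemetry.context.propagation.TextMapGetter;

@Component
public class OrderPlacedConsumer {
    private static final Tracer tracer = GlobalOpenTelemetry.getTracer("rebate-service");

    // Lets the propagator read trace headers from Kafka record headers
    private static final TextMapGetter<Headers> KAFKA_GETTER = new TextMapGetter<Headers>() {
        @Override
        public Iterable<String> keys(Headers headers) {
            List<String> keys = new ArrayList<>();
            headers.forEach(h -> keys.add(h.key()));
            return keys;
        }
        @Override
        public String get(Headers headers, String key) {
            Header h = headers == null ? null : headers.lastHeader(key);
            return h == null || h.value() == null ? null : new String(h.value(), StandardCharsets.UTF_8);
        }
    };

    // rebateCalculator and rebateRecordMapper are injected collaborators (declarations omitted)
    @KafkaListener(topics = "order.placed")
    public void handle(String eventJson, ConsumerRecord<String, String> record) {
        // Restore the upstream context from the Kafka headers
        Context parentContext = GlobalOpenTelemetry.getPropagators().getTextMapPropagator()
                .extract(Context.current(), record.headers(), KAFKA_GETTER);
        Span span = tracer.spanBuilder("process.rebate.calculation")
                .setParent(parentContext)
                .startSpan();
        try (Scope ignored = span.makeCurrent()) {
            OrderPlacedEvent event = JSON.parseObject(eventJson, OrderPlacedEvent.class);
            // Add business attributes
            span.setAttribute("order.id", event.getOrderId());
            span.setAttribute("user.id", event.getUserId().toString());
            // Core calculation logic
            BigDecimal amount = rebateCalculator.calculate(event.getTrackId());
            // Record a key event
            span.addEvent("rebate.calculated", Attributes.of(
                    AttributeKey.stringKey("amount"), amount.toPlainString()
            ));
            rebateRecordMapper.insert(new RebateRecord(event.getOrderId(), amount));
        } catch (Exception e) {
            span.setStatus(StatusCode.ERROR, e.getMessage());
            span.recordException(e);
            throw e;
        } finally {
            span.end();
        }
    }
}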
3. Cross-Service Context Propagation
Trace headers are injected automatically into Feign calls:
java
// juwatech.cn.common.feign.TbkClient
@FeignClient(name = "tbk-service", url = "${tbk.api.url}")
public interface TbkClient {
@PostMapping("/v1/relate")
RelateResponse getRelate(@RequestBody RelateRequest request);
}
The SkyWalking Agent intercepts Feign requests automatically and adds an sw8 field to the HTTP headers, so the downstream service is linked to the same trace.
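To make the sw8 header less opaque when debugging, the sketch below decodes one by hand. It assumes the v3 layout described in SkyWalking's cross-process propagation protocol: eight fields separated by "-", with the string fields Base64-encoded. The Sw8Header class, its field names, and the sample values are hypothetical; the agent does all of this for you.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Sw8Header {
    final boolean sampled;
    final String traceId;
    final String parentSegmentId;
    final int parentSpanId;
    final String parentService;
    final String parentInstance;
    final String parentEndpoint;
    final String targetAddress;

    private Sw8Header(String[] f) {
        sampled = "1".equals(f[0]);
        traceId = decode(f[1]);
        parentSegmentId = decode(f[2]);
        parentSpanId = Integer.parseInt(f[3]); // plain integer, not Base64
        parentService = decode(f[4]);
        parentInstance = decode(f[5]);
        parentEndpoint = decode(f[6]);
        targetAddress = decode(f[7]);
    }

    /** Splits the header value into its eight fields and decodes the Base64 ones. */
    static Sw8Header parse(String headerValue) {
        String[] fields = headerValue.split("-");
        if (fields.length != 8) {
            throw new IllegalArgumentException("expected 8 fields, got " + fields.length);
        }
        return new Sw8Header(fields);
    }

    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    private static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hand-built sample value; every ID and name below is made up
        String value = String.join("-", "1", encode("trace-0001"), encode("segment-0001"), "3",
                encode("order-service"), encode("order-instance-1"),
                encode("/api/order/place"), encode("rebate-service:8080"));
        Sw8Header h = parse(value);
        System.out.println(h.parentService + " -> " + h.targetAddress + " (trace " + h.traceId + ")");
    }
}
```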
For RabbitMQ/Kafka, the context has to be injected manually:
java
// juwatech.cn.order.service.OrderService
public void publishEvent(OrderPlacedEvent event) {
    // Serialize the current trace context into a carrier map (W3C traceparent by default)
    Map<String, String> headers = new HashMap<>();
    GlobalOpenTelemetry.getPropagators().getTextMapPropagator()
            .inject(Context.current(), headers, Map::put);
    // Spring AMQP message (org.springframework.amqp.core.MessageBuilder)
    Message message = MessageBuilder
            .withBody(JSON.toJSONString(event).getBytes(StandardCharsets.UTF_8))
            .setHeader("traceparent", headers.get("traceparent"))
            .build();
    rabbitTemplate.send("order.exchange", "placed", message);
}
The consumer side extracts it in reverse:
java
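public void onMessage(Message message) {
    String traceparent = message.getMessageProperties().getHeader("traceparent");
    Map<String, String> carrier = traceparent == null
            ? Collections.emptyMap()
            : Collections.singletonMap("traceparent", traceparent);
    // Minimal TextMapGetter over a Map carrier
    TextMapGetter<Map<String, String>> getter = new TextMapGetter<Map<String, String>>() {
        @Override
        public Iterable<String> keys(Map<String, String> c) { return c.keySet(); }
        @Override
        public String get(Map<String, String> c, String key) { return c == null ? null : c.get(key); }
    };
    Context extracted = GlobalOpenTelemetry.getPropagators().getTextMapPropagator()
            .extract(Context.current(), carrier, getter);
    try (Scope scope = extracted.makeCurrent()) {
        // Process the message...
    }
}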
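For reference, the traceparent value carried in the message header follows the W3C Trace Context format: version, trace ID, parent span ID, and flags, separated by "-". The sketch below parses and re-formats such a value using only the JDK. It handles version 00 only, and the TraceParent class is a hypothetical helper for illustration, not part of the OpenTelemetry API (the propagator does this internally).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceParent {
    // Version 00: "00" - 32 hex trace-id - 16 hex parent-id - 2 hex flags
    private static final Pattern V00 =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    final String traceId;
    final String parentSpanId;
    final boolean sampled;

    private TraceParent(String traceId, String parentSpanId, boolean sampled) {
        this.traceId = traceId;
        this.parentSpanId = parentSpanId;
        this.sampled = sampled;
    }

    /** Returns the parsed header, or empty if the value is missing or malformed. */
    static Optional<TraceParent> parse(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = V00.matcher(header.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String traceId = m.group(1);
        String spanId = m.group(2);
        // All-zero IDs are invalid under the W3C specification
        if (isAllZero(traceId) || isAllZero(spanId)) {
            return Optional.empty();
        }
        // The lowest flag bit marks the trace as sampled
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) == 1;
        return Optional.of(new TraceParent(traceId, spanId, sampled));
    }

    /** Re-serializes the header; only the sampled flag bit is preserved. */
    String format() {
        return "00-" + traceId + "-" + parentSpanId + "-" + (sampled ? "01" : "00");
    }

    private static boolean isAllZero(String hex) {
        return hex.chars().allMatch(c -> c == '0');
    }

    public static void main(String[] args) {
        // Example value taken from the W3C Trace Context specification
        String header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
        parse(header).ifPresent(tp ->
                System.out.println("traceId=" + tp.traceId + " sampled=" + tp.sampled));
    }
}
```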
4. Correlating Logs with Trace IDs
The trace ID is injected into logs through MDC:
java
// juwatech.cn.common.logging.TraceIdFilter
@Component
public class TraceIdFilter implements Filter {
    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Trace ID of the active OpenTelemetry span (all zeros when no span is active)
        String traceId = Span.current().getSpanContext().getTraceId();
        MDC.put("traceId", traceId);
        try {
            chain.doFilter(request, response);
        } finally {
            // Remove only our own key so other MDC entries survive
            MDC.remove("traceId");
        }
    }
}
Logback configuration:
xml
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} [traceId=%X{traceId}] - %msg%n</pattern>
</encoder>
</appender>
Sample log line:
14:23:01.123 [http-nio-8080-exec-5] INFO juwatech.cn.rebate.service.RebateService [traceId=4a7b8c9d0e1f2a3b] - Calculating rebate for order O123456
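Once every line carries a [traceId=...] marker, collecting the log lines that belong to one request becomes a simple filter. The following is a minimal sketch using only the JDK; the TraceIdLogs class and its method names are hypothetical, and in practice a log platform query would usually do this job.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class TraceIdLogs {
    // Matches the [traceId=...] marker produced by the Logback pattern above
    private static final Pattern TRACE_ID = Pattern.compile("\\[traceId=([^\\]]*)\\]");

    /** Extracts the trace ID from a log line, or empty if the marker is absent or blank. */
    static Optional<String> traceIdOf(String logLine) {
        Matcher m = TRACE_ID.matcher(logLine);
        if (m.find() && !m.group(1).isEmpty()) {
            return Optional.of(m.group(1));
        }
        return Optional.empty();
    }

    /** Keeps only the lines that belong to the given trace. */
    static List<String> linesForTrace(List<String> lines, String traceId) {
        return lines.stream()
                .filter(line -> traceIdOf(line).map(traceId::equals).orElse(false))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String line = "14:23:01.123 [http-nio-8080-exec-5] INFO juwatech.cn.rebate.service.RebateService "
                + "[traceId=4a7b8c9d0e1f2a3b] - Calculating rebate for order O123456";
        System.out.println(traceIdOf(line).orElse("(no trace id)"));
    }
}
```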
5. Custom Metrics and Alerting
Business metrics are reported on critical paths:
java
// juwatech.cn.rebate.service.RebateService
private static final Meter meter = GlobalOpenTelemetry.getMeter("rebate-meter");
// ofDoubles() is required because the amount is recorded as a double
private static final DoubleCounter rebateCounter = meter.counterBuilder("rebate.total")
        .ofDoubles()
        .setDescription("Total rebate amount")
        .setUnit("CNY")
        .build();

public void creditRebate(Long userId, BigDecimal amount) {
    rebateCounter.add(amount.doubleValue(), Attributes.of(
            AttributeKey.stringKey("status"), "success"
    ));
}
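After the SkyWalking OAP aggregates the metrics, alarm rules (which SkyWalking reads from the OAP's alarm-settings.yml) trigger notifications:
- When service_sla < 99.9% for 5 consecutive minutes → WeCom (企业微信) alert
- When endpoint_avg_response_time > 2000ms → email notification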
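The "for 5 consecutive minutes" condition in the first rule can be illustrated with a small consecutive-breach counter. This is only a sketch of the rule's semantics under the assumption of one SLA sample per minute; SlaAlarmRule is a hypothetical class and is not how the OAP implements alarms.

```java
final class SlaAlarmRule {
    private final double thresholdPercent;
    private final int requiredMinutes;
    private int consecutiveBreaches = 0;

    SlaAlarmRule(double thresholdPercent, int requiredMinutes) {
        this.thresholdPercent = thresholdPercent;
        this.requiredMinutes = requiredMinutes;
    }

    /**
     * Feeds one per-minute SLA sample (in percent).
     * Returns true while the SLA has been below the threshold
     * for at least requiredMinutes samples in a row.
     */
    boolean onMinute(double slaPercent) {
        if (slaPercent < thresholdPercent) {
            consecutiveBreaches++;
        } else {
            consecutiveBreaches = 0; // any healthy minute resets the streak
        }
        return consecutiveBreaches >= requiredMinutes;
    }

    public static void main(String[] args) {
        SlaAlarmRule rule = new SlaAlarmRule(99.9, 5);
        double[] samples = {99.95, 99.5, 99.4, 99.6, 99.2, 99.1, 99.95};
        for (int minute = 0; minute < samples.length; minute++) {
            boolean firing = rule.onMinute(samples[minute]);
            System.out.println("minute " + minute + " sla=" + samples[minute] + " firing=" + firing);
        }
    }
}
```

Requiring a streak rather than a single bad sample keeps a brief blip from paging anyone, at the cost of a few minutes of detection delay.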
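6. Deployment Architecture
- OAP Server: a 3-node cluster that receives agent data and stores it in Elasticsearch
- UI: provides the topology map, trace queries, and metrics dashboards
- OpenTelemetry Collector (optional): receives OTLP data in one place and forwards it to SkyWalking
On startup, each service registers itself in the SkyWalking topology map, which clearly shows the order-service → rebate-service → account-service call chain.
Copyright of this article belongs to the 微赚淘客系统3.0 development team. Please credit the source when reposting!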