Gaps between span timings when tracing PHP with opentelemetry-auto-laravel

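Straight to the point

  • After adding OpenTelemetry to a PHP project, we noticed that some spans in a trace were not contiguous in time: there were unexplained gaps between them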

Setup and configuration

Setup

Configuration

env
OTEL_PHP_AUTOLOAD_ENABLED=true
OTEL_SERVICE_NAME=test
OTEL_TRACES_EXPORTER=otlp
OTEL_METRICS_EXPORTER=none
OTEL_LOGS_EXPORTER=none
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_EXPORTER_OTLP_ENDPOINT=http://
OTEL_EXPORTER_OTLP_HEADERS=Authentication=xxx
OTEL_EXPORTER_OTLP_TIMEOUT=1000
OTEL_EXPORTER_OTLP_TRACES_TIMEOUT=1000
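
A misspelled or missing variable is easy to overlook, because the SDK reads the OTEL_-prefixed names from the environment and does nothing when they are absent. The following is a minimal sketch of a sanity check, not part of the package: the helper name missingVars and the choice of which variables count as required are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Returns the names from $required that are missing or empty in $env.
 * Hypothetical helper, written for this example only.
 */
function missingVars(array $env, array $required): array
{
    return array_values(array_filter(
        $required,
        fn (string $name): bool => !isset($env[$name]) || $env[$name] === ''
    ));
}

// Assumption: these four are the ones we want to insist on.
$required = [
    'OTEL_PHP_AUTOLOAD_ENABLED',
    'OTEL_SERVICE_NAME',
    'OTEL_TRACES_EXPORTER',
    'OTEL_EXPORTER_OTLP_ENDPOINT',
];

// getenv() without arguments returns all environment variables as an array.
$missing = missingVars(getenv(), $required);
echo $missing === [] ? "all set\n" : 'missing: ' . implode(', ', $missing) . "\n";
```

Running this with the same user and environment as the web worker shows whether the variables actually reach the PHP process.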

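How it runs

  • The open-telemetry/opentelemetry-auto-laravel package hooks into Laravel automatically: the files entry in its composer.json makes Composer load _register.php on every request. See github.com/open-teleme...
json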
{
  "autoload": {
    "files": [
      "_register.php"
    ]
  }
}
php
public static function autoload(): bool
{
    if (!self::isEnabled() || self::isExcludedUrl()) {
        return false;
    }
    Globals::registerInitializer(function (Configurator $configurator) {
        $propagator = (new PropagatorFactory())->create();
        if (Sdk::isDisabled()) {
            //@see https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/configuration/sdk-environment-variables.md#general-sdk-configuration
            return $configurator->withPropagator($propagator);
        }
        $emitMetrics = Configuration::getBoolean(Variables::OTEL_PHP_INTERNAL_METRICS_ENABLED);

        $resource = ResourceInfoFactory::defaultResource();
        $exporter = (new ExporterFactory())->create();
        $meterProvider = (new MeterProviderFactory())->create($resource);
        // This is the line to focus on: it creates the BatchSpanProcessor
        $spanProcessor = (new SpanProcessorFactory())->create($exporter, $emitMetrics ? $meterProvider : null);
        $tracerProvider = (new TracerProviderBuilder())
            ->addSpanProcessor($spanProcessor)
            ->setResource($resource)
            ->setSampler((new SamplerFactory())->create())
            ->build();

        $loggerProvider = (new LoggerProviderFactory())->create($emitMetrics ? $meterProvider : null, $resource);

        ShutdownHandler::register($tracerProvider->shutdown(...));
        ShutdownHandler::register($meterProvider->shutdown(...));
        ShutdownHandler::register($loggerProvider->shutdown(...));

        return $configurator
            ->withTracerProvider($tracerProvider)
            ->withMeterProvider($meterProvider)
            ->withLoggerProvider($loggerProvider)
            ->withPropagator($propagator)
        ;
    });

    return true;
}

Example code

  • Define a custom class
php
<?php

namespace App\Service\Tracing;

use OpenTelemetry\API\Globals;
use OpenTelemetry\API\Trace\SpanInterface;
use OpenTelemetry\API\Trace\TracerInterface;

class Tracer
{
    protected TracerInterface $tracer;
    protected ?SpanInterface $lastSpan = null;
    protected ?SpanInterface $rootSpan = null;
    /**
     * @var array<string, SpanInterface> spans indexed by name
     */
    protected array $spanMap = [];

    public function __construct()
    {
        $this->tracer = Globals::tracerProvider()
            ->getTracer('io.opentelemetry.contrib.php.laravel');
    }

    /**
     * Registered as scoped in AppServiceProvider, so it also works under Octane.
     * @return Tracer
     */
    public static function getInstance(): Tracer
    {
        return app(Tracer::class);
    }

    public function startRootSpan($name): void
    {
        $span = $this->startSpan($name);
        $this->rootSpan = $span;
    }

    public function startAndEndLastSpan($name): SpanInterface
    {
        $this->endLastSpan();
        return $this->startSpan($name);
    }

    public function startSpan($name): SpanInterface
    {
        $span = $this->tracer->spanBuilder($name)->startSpan();
        $this->spanMap[$name] = $span;
        $this->lastSpan = $span;
        return $span;
    }

    public function endRootSpan(): void
    {
        $this->endSpan($this->rootSpan);
    }

    /**
     * Convenience helper that ends the previous span.
     * @return void
     */
    public function endLastSpan(): void
    {
        $this->endSpan($this->lastSpan);
    }

    public function endSpan(?SpanInterface $span): void
    {
        if (is_null($span)) {
            return;
        }

        $span->end();
    }
}
  • Register the Tracer class in the service provider app/Providers/AppServiceProvider.php
php
<?php

namespace App\Providers;

use App\Service\Tracing\Tracer;
use Illuminate\Support\ServiceProvider;

class AppServiceProvider extends ServiceProvider
{
    /**
     * Register any application services.
     *
     * @return void
     */
    public function register()
    {
        // scoped(): instantiated only once per request lifecycle
        $this->app->scoped(Tracer::class, function () {
            return new Tracer();
        });
    }

    /**
     * Bootstrap any application services.
     *
     * @return void
     */
    public function boot()
    {
        //
    }
}
  • Use it in a controller
php
<?php

namespace App\Http\Controllers;

use App\Service\Tracing\Tracer;

class IndexAlbumsController extends Controller
{
    public function index()
    {
        $tracer = Tracer::getInstance();
        // Step 0: start the root span
        $tracer->startRootSpan('xxxx');

        // Step 1
        $tracer->startSpan('s1');
        // business code xxx
        // End step 1 and start step 2
        $tracer->startAndEndLastSpan('s2');
        // business code xxx
        // End step 2
        $tracer->endLastSpan();

        // End the root span
        $tracer->endRootSpan();
    }
}

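The problem

  • The code is simple: trace a few functions and see how long they take. Nothing should go wrong, and yet something did
  • In production, $span->end() occasionally took several hundred milliseconds, which was baffling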
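
One way to confirm that the time is really spent inside a single call is to time that call in isolation. Below is a minimal sketch under stated assumptions: timeCall is a hypothetical helper, the 10 ms threshold is arbitrary, and a usleep stands in for `$span->end()` so the snippet runs without the SDK.

```php
<?php
declare(strict_types=1);

/**
 * Runs $fn and reports it through $onSlow when it takes longer than $thresholdMs.
 * Returns the elapsed time in milliseconds. Hypothetical helper for illustration.
 */
function timeCall(callable $fn, float $thresholdMs, callable $onSlow): float
{
    $start = hrtime(true);                       // monotonic clock, nanoseconds
    $fn();
    $elapsedMs = (hrtime(true) - $start) / 1e6;  // nanoseconds to milliseconds

    if ($elapsedMs > $thresholdMs) {
        $onSlow($elapsedMs);
    }

    return $elapsedMs;
}

// Stand-in for `$span->end()`: a 30 ms sleep plays the slow call.
$slowCalls = [];
$elapsed = timeCall(
    fn () => usleep(30_000),
    10.0,
    function (float $ms) use (&$slowCalls): void {
        $slowCalls[] = $ms;                      // in a real app this could be a log line
    }
);
printf("call took %.1f ms, slow calls recorded: %d\n", $elapsed, count($slowCalls));
```

In the Tracer class above, the same wrapper could surround the `$span->end()` call inside endSpan to record which spans trigger the stall.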

Looking at the implementation of end()

  • The call actually ends up in the onEnd method of the BatchSpanProcessor class
php
class BatchSpanProcessor {

    public function onEnd(ReadableSpanInterface $span): void
    {
        if ($this->closed) {
            return;
        }
        if (!$span->getContext()->isSampled()) {
            return;
        }

        if ($this->queueSize === $this->maxQueueSize) {
            $this->dropped++;

            return;
        }

        $this->queueSize++;
        $this->batch[] = $span->toSpanData();
        $this->nextScheduledRun ??= $this->clock->now() + $this->scheduledDelayNanos;

        if (count($this->batch) === $this->maxExportBatchSize) {
            $this->enqueueBatch();
        }
        if ($this->autoFlush) {
            // autoFlush is enabled: export right away, in the current process
            $this->flush();
        }
    }
}
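  • So the culprit is the flush method: depending on configuration, it exports the queued spans once a certain number has accumulated or a certain amount of time has passed
  • Because PHP normally runs without threads, exporting the spans in flush blocks the current process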
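
The effect can be reproduced without the SDK. The sketch below is a toy model, not the library's code: SlowExporter and ToyBatchProcessor are made-up classes, the 50 ms latency and batch size of 4 are arbitrary, and the export is simulated with usleep. It only illustrates that when the export happens inline, whichever call fills the batch pays the full network cost.

```php
<?php
declare(strict_types=1);

// Toy stand-in for an OTLP exporter: every export costs a fixed delay.
final class SlowExporter
{
    public int $exported = 0;

    public function __construct(private int $latencyMicros) {}

    public function export(array $batch): void
    {
        usleep($this->latencyMicros);   // simulated synchronous network round trip
        $this->exported += count($batch);
    }
}

// Toy batch processor: queues span names and exports inline once the batch is full.
final class ToyBatchProcessor
{
    private array $batch = [];

    public function __construct(
        private SlowExporter $exporter,
        private int $maxBatchSize
    ) {}

    public function onEnd(string $spanName): void
    {
        $this->batch[] = $spanName;
        if (count($this->batch) >= $this->maxBatchSize) {
            // Same process, no threads: the caller of onEnd() waits here.
            $this->exporter->export($this->batch);
            $this->batch = [];
        }
    }
}

$exporter = new SlowExporter(50_000);               // 50 ms per export
$processor = new ToyBatchProcessor($exporter, 4);   // export every 4 spans

$durations = [];
for ($i = 0; $i < 8; $i++) {
    $start = hrtime(true);
    $processor->onEnd("span-$i");
    $durations[] = (hrtime(true) - $start) / 1e6;   // milliseconds
}

foreach ($durations as $i => $ms) {
    printf("end #%d took %.2f ms\n", $i + 1, $ms);
}
```

In this model the fourth and eighth calls absorb the export delay while the others return almost immediately, which matches the occasional slow `$span->end()` seen in production.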

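Solutions

  1. Make the flush method multi-threaded. This is not realistic in the short term, since probably 99% of PHP projects do not use threads
  2. Put an OpenTelemetry Collector in front as a proxy, so the PHP process only talks to the Collector: opentelemetry.io/docs/collec...
  3. Pretend you never saw it!!!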