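OceanBase Cluster Diagnostic Tool: obdiag

Installing and deploying obdiag
obdiag configuration files
- System configuration file
- User-side configuration file
obdiag use case: one-click cluster inspection
obdiag use case: one-click diagnostic analysis
- Log analysis
- Full-link trace log analysis
obdiag use case: one-click root cause analysis
obdiag use case: one-click information gathering
- General diagnostic information gathering
- Scenario-based diagnostic information gathering
obdiag use case: one-click cluster insight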

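Latest obdiag version: V3.7.1

OceanBase Diagnostic Tool (obdiag) is an open-source, agile, command-line diagnostic tool. For an OceanBase cluster it provides one-click cluster inspection, information gathering, diagnostic analysis, root cause analysis, and cluster insight.

Installing and deploying obdiag

Offline deployment requires downloading the RPM package from the official website in advance.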

Standalone offline deployment:

yum localinstall -y oceanbase-diagnostic-tool*.rpm
sh /opt/oceanbase-diagnostic-tool/init.sh   # Preferably run as the user that deployed OceanBase

Offline deployment through obd (upgrading obd to V2.5.0 or later is recommended):

obd mirror clone oceanbase-diagnostic-tool-xxxxxxxx.rpm
obd obdiag deploy

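obdiag configuration files

obdiag uses two configuration files:

- The system configuration file used by obdiag itself, located at /opt/oceanbase-diagnostic-tool/conf/inner_config.yml. It normally does not need to be modified.
- The user-side configuration file, located at ~/.obdiag/config.yml by default. A custom path is supported, and separate configuration files can be generated for multiple clusters.

System configuration file

📖 The inner_config.yml system configuration file used by obdiag has the following format: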

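obdiag:
  basic:
    config_path: ~/.obdiag/config.yml # Path of the user-side configuration file
    config_backup_dir: ~/.obdiag/backup_conf # Directory where the old configuration file is backed up when obdiag config runs
    file_number_limit: 50 # Maximum number of files returned from a single remote host per gather command
    file_size_limit: 5G # Maximum size of files returned from a single remote host per gather command
    dis_rsa_algorithms: 0 # Disable RSA algorithms, mainly to work around SSH connection compatibility issues; default 0
    strict_host_key_checking: 0 # Host key checking policy for SSH connections, used to verify the identity of the remote server
  logger:
    log_dir: ~/.obdiag/log  # Directory for obdiag's own execution logs
    log_filename: obdiag.log # File name of obdiag's own execution log
    file_handler_log_level: DEBUG # Minimum level written to obdiag's own execution log
    log_level: INFO # Log level of obdiag's own execution log
    mode: obdiag
    stdout_handler_log_level: INFO # Minimum log level printed to the screen
    error_stream: sys.stdout # Output stream for error logs; default sys.stdout
    silent: false # Whether to run in silent mode
  ssh_client:
    remote_client_sudo: 0 # Whether to run remote commands with sudo; default 0
    cmd_exec_timeout: 180 # Timeout for remote command execution; default 180 seconds
analyze:
  thread_nums: 3
check: # Settings for inspection; usually no changes are needed
  ignore_version: false # Whether to ignore the OceanBase version
  work_path: "~/.obdiag/check" # Working directory for inspection
  report:
    report_path: "./check_report/" # Output path of the inspection report
    export_type: table # Output format of the inspection report
  tasks_base_path: "~/.obdiag/check/tasks/" # Base directory of inspection tasks
gather:
  scenes_base_path: "~/.obdiag/gather/tasks"  # Directory of gather scenes
  redact_processing_num: 3
  thread_nums: 3
rca:
  result_path: "./obdiag_rca/" # Path where rca results are stored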

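User-side configuration file

⭐️ If obdiag is deployed through obd commands, there is no need to configure ~/.obdiag/config.yml; obd generates the obdiag configuration automatically.

The user-side configuration file can be generated quickly with the obdiag config command or edited directly. Running obdiag config starts an interactive prompt: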

obdiag config -h11.22.33.44 -uroot@sys -p****** -P2881
obdiag version: 3.7.1
Please enter the following configuration !!!
Enter your oceanbase host ssh username (default:''): admin   # SSH username for the OceanBase hosts
Enter your use password or key file (0:use password; 1:use key file) default: 0: 0
Enter your oceanbase host ssh password (default:''): ********   # Password of that SSH user on each host
Enter your oceanbase host ssh_port (default:'22'):    # SSH port, 22 by default
Enter your oceanbase install home_path (default:'/root/observer'): /oceanbase/app/4.2   # OceanBase installation directory
Enter your need config obproxy [y/N] (default:'N'):    # Whether to include obproxy for the cluster being diagnosed
Node information has been rewritten to the configuration file /home/admin/.obdiag/config.yml, and you can enjoy the journey !
Trace ID: b33e9fb2-b3a7-11f0-ae0e-525400991228
If you want to view detailed obdiag logs, please run: obdiag display-trace b33e9fb2-b3a7-11f0-ae0e-525400991228

If obproxy is deployed in the cluster, obdiag also needs the obproxy information:

...
Enter your need config obproxy [y/N] (default:'N'): y # Whether to include obproxy for the cluster being diagnosed
Enter your obproxy server eg:'192.168.1.1;192.168.1.2;192.168.1.3' (default:''): xx.xx.xx.xx # IP addresses of the obproxy nodes
Enter your obproxy host ssh username (default:''): test # SSH user on the obproxy nodes
Enter your obproxy host ssh password (default:''): ********* # Password of the SSH user on the obproxy nodes
Enter your obproxy host ssh port (default:'22'): 22  # SSH port of the obproxy nodes, 22 by default
Enter your obproxy install home_path (default:'/root/obproxy'): /home/admin/obproxy # obproxy installation directory

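When the command finishes, a new configuration is written to ~/.obdiag/config.yml. If ~/.obdiag/config.yml already had content, the old configuration is backed up to the ~/.obdiag/backup_conf directory.

📖 The complete ~/.obdiag/config.yml configuration file has the following format: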

# Part 1: OCP settings
ocp:
  login:
    url: http://xx.xx.xx.xxx:xx
    user: ****
    password: ******

# Part 2: obcluster settings
obcluster:
  ob_cluster_name: test # Cluster name
  db_host: xx.xx.xx.1 # Connection address of the cluster
  db_port: 2881 # default 2881
  tenant_sys: # sys tenant settings; root@sys is recommended to avoid permission problems
    user: root@sys # default root@sys
    password: ""
  servers:
    nodes:
      - ip: xx.xx.xx.1
      - ip: xx.xx.xx.2
      - ip: xx.xx.xx.3
    global:
      ssh_username: **** # Login user; preferably the user that deployed observer
      ssh_password: **** # Set to "" if password login is not used or there is no password
      # ssh_port: 22 # SSH port, default 22
      # ssh_key_file: "" # Path of the SSH key file; use either this or ssh_password
      # ssh_type: remote # Deployment mode of observer; remote and docker are currently supported (kube is not); default remote
      # container_name: xxx # Required when ssh_type is docker; the name of the observer container

      # Installation directory of observer. For example, if the observer executable is /root/observer/bin/observer,
      # home_path should be /root/observer
      home_path: /root/observer
      data_dir: /root/observer/store # Data disk path of observer, usually ${home_path}/store; same meaning as the obd setting of the same name
      redo_dir: /root/observer/store # Log disk path of observer, usually ${home_path}/store; same meaning as the obd setting of the same name

# Part 3: obproxy settings
obproxy:
  obproxy_cluster_name: obproxy
  servers:
    nodes:
      - ip: xx.xx.xx.4
      - ip: xx.xx.xx.5
      - ip: xx.xx.xx.6
    global:
      ssh_username: **** # Login user; preferably the user that deployed obproxy
      ssh_password: **** # Set to "" if password login is not used or there is no password
      # ssh_port: 22 # SSH port, default 22
      # ssh_key_file: "" # Path of the SSH key file; use either this or ssh_password
      # ssh_type: remote # Deployment mode of obproxy; remote and docker are currently supported (kube is not); default remote
      # container_name: xxx # Required when ssh_type is docker; the name of the obproxy container

      # Installation directory of obproxy. For example, if the obproxy executable is /root/obproxy/bin/obproxy,
      # home_path should be /root/obproxy
      home_path: /root/obproxy

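The generated configuration file can be edited directly. Settings under global have lower priority than settings on a specific node: if a value is set on a node, the node-level value is used; if the node does not set it, the value under global applies.

The following example configures different SSH connection details and OceanBase directories for each node: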

obcluster:
  ob_cluster_name: test
  db_host: xx.xx.xx.1
  db_port: 2881 # default 2881
  tenant_sys:
    user: root@sys # default root@sys
    password: ""
  servers:
    nodes:
      - ip: xx.xx.xx.1
        ssh_username: ****
        ssh_password: ****1
        home_path: /root/observer1
        data_dir: /root/observer/store1
        redo_dir: /root/observer/store1
      - ip: xx.xx.xx.2
        ssh_username: ****2
        ssh_password: ****2
        home_path: /root/observer2
        data_dir: /root/observer/store2
        redo_dir: /root/observer/store2
      - ip: xx.xx.xx.3
        ssh_username: ****3
        ssh_password: ****3
        home_path: /root/observer3
        data_dir: /root/observer/store3
        redo_dir: /root/observer/store3
    global:
      ssh_port: 22
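
The precedence rule between node-level and global settings can be illustrated with a small shell sketch. It only mirrors the rule as described in this article and is not obdiag's actual implementation; the function name resolve_setting is hypothetical.

```shell
#!/bin/sh
# Illustration only: mirrors the documented rule "node-level value wins,
# otherwise fall back to global". resolve_setting is a hypothetical helper,
# not part of obdiag.
#
# Usage: resolve_setting NODE_VALUE GLOBAL_VALUE
resolve_setting() {
    node_value="$1"
    global_value="$2"
    if [ -n "$node_value" ]; then
        printf '%s\n' "$node_value"     # the node sets the key: use it
    else
        printf '%s\n' "$global_value"   # the node omits the key: use global
    fi
}

# Node xx.xx.xx.1 sets home_path itself, so the node-level value is used.
resolve_setting "/root/observer1" "/root/observer"   # prints /root/observer1

# No node sets ssh_port, so the value under global applies.
resolve_setting "" "22"                              # prints 22
```

In the example configuration above, every node therefore connects on port 22 from global while using its own ssh_username, ssh_password, and directories.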

obdiag use case: one-click cluster inspection

The obdiag check run command inspects the running state of an OceanBase database cluster.

List all obdiag inspection packages:

[admin@observer01 ~]$ obdiag check list
obdiag version: 3.7.1

[check cases about obproxy]:
---------------------------------------------------------------------------------------------------
command                                  info_en                                 info_cn            
---------------------------------------------------------------------------------------------------
obdiag check run                         default check all task without filter   默认执行除filter组里的所有巡检项
obdiag check run --obproxy_cases=proxy   obproxy version check                   obproxy 版本检查       
---------------------------------------------------------------------------------------------------

[check cases about observer]:
---------------------------------------------------------------------------------------------------------------------------------------------------------------
command                                       info_en                                                                           info_cn                         
---------------------------------------------------------------------------------------------------------------------------------------------------------------
obdiag check run                              default check all task without filter                                             默认执行除filter组里的所有巡检项             
obdiag check run --cases=ad                   Test and inspection tasks                                                         测试巡检任务                          
obdiag check run --cases=column_storage_poc   column storage poc                                                                列存POC检查                         
obdiag check run --cases=build_before         Deployment environment check                                                      部署环境检查                          
obdiag check run --cases=sysbench_run         Collection of inspection tasks when executing sysbench                            执行sysbench时的巡检任务集合              
obdiag check run --cases=sysbench_free        Collection of inspection tasks before executing sysbench                          执行sysbench前的巡检任务集合              
obdiag check run --cases=k8s_basic            Collection of basic inspection tasks for OceanBase deployed on Kubernetes         Kubernetes 中部署的 OceanBase 集群基础巡检
obdiag check run --cases=k8s_performance      Collection of performance inspection tasks for OceanBase deployed on Kubernetes   Kubernetes 中部署的 OceanBase 集群性能巡检
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Run a full cluster inspection with a specified configuration file:

obdiag check run   # Uses the default configuration file ~/.obdiag/config.yml
obdiag check run -c /home/admin/.obdiag/cluster1_config.yml
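
Because each cluster can have its own configuration file, inspecting several clusters amounts to repeating the -c form of the command. The sketch below assumes the files follow a *_config.yml naming pattern (modelled on cluster1_config.yml above); the function name check_all_clusters and the OBDIAG_BIN override are hypothetical and exist only to allow a dry run.

```shell
#!/bin/sh
# Sketch: run "obdiag check run -c <file>" once per cluster configuration file.
# Assumptions (not from obdiag itself): files are named *_config.yml, and
# OBDIAG_BIN is a hypothetical override for dry runs (defaults to obdiag).
#
# Usage: check_all_clusters CONFIG_DIR
check_all_clusters() {
    config_dir="$1"
    for cfg in "$config_dir"/*_config.yml; do
        [ -e "$cfg" ] || continue        # the glob matched nothing
        echo "== inspecting with $cfg"
        "${OBDIAG_BIN:-obdiag}" check run -c "$cfg"
    done
}

# Dry run that only prints the arguments instead of calling obdiag:
#   OBDIAG_BIN=echo check_all_clusters /home/admin/.obdiag
```

The loop continues with the next file even if one inspection fails, so a single unreachable cluster does not block the others.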

Run a one-click full inspection without a configuration file:

obdiag check run \
    --config db_host=xx.xx.xx.xx \
    --config db_port=xxxx \
    --config tenant_sys.user=root@sys \
    --config tenant_sys.password=*** \
    --config obcluster.servers.global.ssh_username=test \
    --config obcluster.servers.global.ssh_password=****** \
    --config obcluster.servers.global.home_path=/home/admin/oceanbase \
    --config obcluster.servers.nodes[1].data_dir=/home/admin/oceanbase/store \
    --config obcluster.servers.nodes[1].redo_dir=/home/admin/oceanbase/store \
    --config obproxy.servers.nodes[0].ip=xx.xx.xx.1 \
    --config obproxy.servers.nodes[1].ip=xx.xx.xx.2 \
    --config obproxy.servers.global.ssh_username=test \
    --config obproxy.servers.global.ssh_password=****** \
    --config obproxy.servers.global.home_path=/home/admin/obproxy

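obdiag use case: one-click diagnostic analysis

One-click diagnostic analysis in obdiag covers several scenarios, including log analysis, full-link trace log analysis, parameter and variable analysis, index space analysis, memory analysis, and queue backlog analysis.

Log analysis

The obdiag analyze log command analyzes OceanBase logs and reports the errors that have occurred.

Online log analysis: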

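# By default, analyze the last 30 minutes of logs on all nodes, using the configuration file ~/.obdiag/config.yml
obdiag analyze log

# Analyze the last 10 minutes of logs on all nodes
obdiag analyze log --since 10m

# Analyze logs within a specific time range
obdiag analyze log --from "2025-10-25 10:00:00" --to "2025-10-25 10:30:00"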
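
When only the approximate time of an incident is known, the --from and --to values can be derived from it. The following is a minimal sketch that assumes GNU date is available; the function name time_window is hypothetical and not part of obdiag.

```shell
#!/bin/sh
# Sketch: compute a symmetric time window around an incident time.
# time_window is a hypothetical helper; it relies on GNU date (-d option).
#
# Usage: time_window "YYYY-MM-DD HH:MM:SS" MINUTES
# Prints "FROM|TO", both formatted as "YYYY-MM-DD HH:MM:SS".
time_window() {
    center="$1"
    minutes="$2"
    epoch=$(date -d "$center" +%s) || return 1   # incident time as epoch seconds
    offset=$((minutes * 60))
    from=$(date -d "@$((epoch - offset))" '+%Y-%m-%d %H:%M:%S')
    to=$(date -d "@$((epoch + offset))" '+%Y-%m-%d %H:%M:%S')
    printf '%s|%s\n' "$from" "$to"
}

# Example (not executed here): analyze 15 minutes on either side of 10:15.
#   window=$(time_window "2025-10-25 10:15:00" 15)
#   obdiag analyze log --from "${window%|*}" --to "${window#*|}"
```

The timestamps are interpreted in the local time zone of the machine running the script, which should match the time zone used in the logs being analyzed.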

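The --files option enables offline log analysis; pass it a local OceanBase log file or a directory containing log files.

# Analyze all log files under the given path
obdiag analyze log --files /oceanbase/app/4.2/log/

# Analyze a single log file
obdiag analyze log --files /oceanbase/app/4.2/log/observer.log.20251028120258846

A Status of PASS in the analysis result means that no errors were found.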

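Full-link trace log analysis

🐬 Full-link diagnosis locates problems across every component on the request path.

OceanBase is a distributed database, so its call paths are complex. When a timeout occurs, it is often hard to tell quickly whether the cause lies in an OceanBase internal component or in the network, and operators have to rely on experience and the observer logs. Starting with version 4.0, the OceanBase kernel writes a trace.log file that can be used for full-link diagnosis.

A request can take one of two paths:

- The application sends the request through a client (JDBC, OCI, and so on) to ODP (the proxy server), which accesses OBServer, and the result is returned to the application.
- The application accesses OBServer directly through a client (JDBC, OCI, and so on), and the result is returned to the application.

The obdiag analyze flt_trace command analyzes OceanBase full-link trace logs and produces a full-link diagnosis report.

Usage example

Look up the flt_trace_id of the SQL statement in the gv$ob_sql_audit view.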

SQL> select query_sql, flt_trace_id from oceanbase.gv$ob_sql_audit where query_sql like 'select @@version_comment limit 1';
+----------------------------------+--------------------------------------+
| query_sql                        | flt_trace_id                         |
+----------------------------------+--------------------------------------+
| select @@version_comment limit 1 | 00060aa3-d607-f5f2-328b-388e17f687cb |
+----------------------------------+--------------------------------------+

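⚠️ Why the flt_trace_id column returned by gv$ob_sql_audit may be empty:

Every SQL statement recorded in gv$ob_sql_audit after running on an observer has a trace_id, but it does not necessarily have a flt_trace_id; whether one is recorded is usually decided by OceanBase's sampling.

If every component on the client session's path has full-link diagnosis enabled, all SQL statements issued by that session get a flt_trace_id. This costs some performance, so it is not the default behavior.

This rule may change in the future.

Alternatively, the flt_trace_id can be found in the trace.log of OBProxy or the OceanBase database: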

less trace.log

[2023-12-07 22:20:07.242229] [489640][T1_L0_G0][T1][YF2A0BA2DA7E-00060BEC28627BEF-0-0] {"trace_id":"00060bec-275e-9832-e730-7c129f2182ac","name":"close_das_task","id":"00060bec-2a20-bf9e-56c9-724cb467f859","start_ts":1701958807240606,"end_ts":1701958807240607,"parent_id":"00060bec-2a20-bb5f-e03a-5da01aa3308b","is_follow":false}

Here, 00060bec-275e-9832-e730-7c129f2182ac is the flt_trace_id.
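
Rather than paging through trace.log by hand, the trace_id values can be pulled out with standard text tools. This is a minimal sketch that assumes every record carries a "trace_id":"..." JSON field as in the sample line above; the function name extract_flt_trace_ids is hypothetical.

```shell
#!/bin/sh
# Sketch: list the distinct flt_trace_id values found in a trace.log file.
# Assumes each record contains a JSON field of the form "trace_id":"<value>",
# as in the sample line shown above. extract_flt_trace_ids is a hypothetical
# helper, not an obdiag command.
#
# Usage: extract_flt_trace_ids FILE
extract_flt_trace_ids() {
    # grep -o keeps only the matching field; splitting on double quotes,
    # the 4th field is the value; sort -u removes duplicates.
    grep -o '"trace_id":"[^"]*"' "$1" | cut -d '"' -f 4 | sort -u
}

# Example (not executed here):
#   extract_flt_trace_ids trace.log
```

Each printed value can then be passed to the full-link diagnosis command via --flt_trace_id.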

Run the full-link diagnosis command.

obdiag analyze flt_trace --flt_trace_id 00060aa3-d607-f5f2-328b-388e17f687cb

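obdiag use case: one-click root cause analysis

The obdiag rca command analyzes diagnostic information from an OceanBase database. It currently supports analyzing abnormal OceanBase scenarios to find the likely cause of a problem.

List all scenarios supported by obdiag root cause analysis: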

[admin@observer01 ~]$ obdiag rca list
obdiag version: 3.7.1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
command                                                                                       info_en                                                                                                 info_cn                                                
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
obdiag rca run --scene=clog_disk_full                                                         Identify the issue of clog disk space being full.                                                       clog日志磁盘空间满的问题                                         
obdiag rca run --scene=ddl_disk_full                                                          Insufficient disk space reported during DDL process.                                                    DDL过程中报磁盘空间不足的问题                                       
obdiag rca run --scene=ddl_failure                                                            diagnose ddl failure                                                                                    诊断ddl失败                                                
obdiag rca run --scene=delete_server_error --env svr_ip=xxx.xxx.xxx.xxx --env svr_port=2881   Diagnose issues during observer node removal in the cluster                                             排查删除 observer 节点时遇到的问题                                 
obdiag rca run --scene=disconnection                                                          root cause analysis of disconnection                                                                    针对断链接场景的根因分析                                           
obdiag rca run --scene=index_ddl_error                                                        Troubleshooting errors in indexing execution.                                                           建索引执行报错问题排查                                            
obdiag rca run --scene=lock_conflict                                                          root cause analysis of lock conflict                                                                    针对锁冲突的根因分析                                             
obdiag rca run --scene=log_error                                                              Troubleshooting log related issues. Currently supported scenes: no_leader.                              日志相关问题排查。目前支持：无主场景。                                    
obdiag rca run --scene=major_hold                                                             root cause analysis of major hold                                                                       针对卡合并场景的根因分析                                           
obdiag rca run --scene=memory_full                                                            [beta] memory full. e.g. error_code_4013 .                                                              [beta] 内存爆问题排查                                         
obdiag rca run --scene=oms_full_trans                                                         OMS full connector error                                                                                oms全量迁移报错                                              
obdiag rca run --scene=oms_obcdc                                                              OMS obcdc log                                                                                           oms obcdc 组件问题分析                                       
obdiag rca run --scene=replay_hold                                                            [beta] replay hold                                                                                      [beta] 回放卡问题排查                                         
obdiag rca run --scene=suspend_transaction                                                    root cause analysis of suspend transaction                                                              悬挂事务                                                   
obdiag rca run --scene=transaction_disconnection                                              root cause analysis of transaction disconnection                                                        针对事务断连场景的根因分析                                          
obdiag rca run --scene=transaction_execute_timeout                                            transaction execute timeout error, error_code like -4012. Need input err_msg                            事务执行超时报错                                               
obdiag rca run --scene=transaction_not_ending                                                 transaction wait timeout error (beta), error_code like -4012                                            事务不结束场景（测试版），目前使用较为复杂                                  
obdiag rca run --scene=transaction_other_error                                                transaction other error, error_code like -4030，-4121，-4122，-4124，-4019                                  事务其他错误，除了目前已经列出的错误，比如错误码为：-4030，-4121，-4122，-4124，-4019
obdiag rca run --scene=transaction_rollback                                                   transaction rollback error. error_code like -6002                                                       事务回滚报错                                                 
obdiag rca run --scene=transaction_wait_timeout                                               transaction wait timeout error, error_msg like 'Shared lock conflict' or 'Lock wait timeout exceeded'   事务等待超时报错                                               
obdiag rca run --scene=unit_gc                                                                [beta] unit gc 问题排查.                                                                                    [beta] clog日志磁盘空间满的问题                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

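obdiag use case: one-click information gathering

The obdiag gather command collects diagnostic information about an OceanBase database.

General diagnostic information gathering

The obdiag gather [type] command collects general diagnostic information from an OceanBase database.

The gather types are:

- log: collect the logs of the OceanBase cluster.
- sysstat: collect host information of the OceanBase cluster.
- clog: collect the clog files of the OceanBase cluster.
- slog: collect the slog files of the OceanBase cluster.
- plan_monitor: collect execution details of the SQL statement with the specified trace_id in the OceanBase cluster.
- stack: collect stack information of the OceanBase cluster.
- perf: collect perf information of the OceanBase cluster (flame graph / Bianque graph).
- ash: collect an ASH report.
- tabledump: collect table information (including table structure and data distribution).
- parameter: collect cluster parameters and save them as a CSV file.
- core: collect core files of the OceanBase cluster.
- obproxy_log: collect the logs of the ODP that the OceanBase cluster depends on.
- all: collect all diagnostic information of the OceanBase cluster in one go.

Collect logs: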

# Collect logs for a specific time range
obdiag gather log --from "2025-06-30 17:30:00" --to "2025-06-30 18:30:00"

# Collect logs from the most recent period
obdiag gather log --since 1h

# Filter by keyword
obdiag gather log --from "2025-06-30 17:30:00" --to "2025-06-30 18:30:00" --grep "AAAAA" --grep "BBBBB"

Collect SQL execution details:

# In a SQL client: get the trace_id right after executing the SQL statement
SELECT last_trace_id();

# In a SQL client: or get the trace_id from the system view
select query_sql,trace_id from oceanbase.GV$OB_SQL_AUDIT where query_sql like 'xxx%' order by REQUEST_TIME desc limit 5;

# In the shell: collect the SQL execution information
obdiag gather plan_monitor --trace_id YB420BA2D99B-0005EBBFC45D5A00-0-0 --env "{db_connect='-hxx -Pxx -uxx -pxx -Dxx'}"

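Scenario-based diagnostic information gathering

The obdiag gather scene command collects scenario-based diagnostic information from an OceanBase database.

List the currently supported gather scenes: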

[admin@observer01 ~]$ obdiag gather scene list
obdiag version: 3.7.1

[Observer Problem Gather Scenes]:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
command                                                                                                                                   info_en                                       info_cn            
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
obdiag gather scene run --scene=observer.backup --env "{tenant_id=xxx}"                                                                   [backup problem]                              [数据备份问题]           
obdiag gather scene run --scene=observer.backup_clean --env "{tenant_id=xxx}"                                                             [backup clean]                                [备份清理问题]           
obdiag gather scene run --scene=observer.base                                                                                             [cluster base info]                           [集群基础信息]           
obdiag gather scene run --scene=observer.clog_disk_full                                                                                   [clog disk full]                              [clog盘满]           
obdiag gather scene run --scene=observer.cluster_down                                                                                     [cluster down]                                [集群无法连接]           
obdiag gather scene run --scene=observer.compaction                                                                                       [compaction]                                  [合并问题]             
obdiag gather scene run --scene=observer.cpu_high --env "{perf_count=100000000}"                                                          [High CPU]                                    [CPU高]             
obdiag gather scene run --scene=observer.delay_of_primary_and_backup                                                                      [delay of primary and backup]                 [主备库延迟]            
obdiag gather scene run --scene=observer.io                                                                                               [io problem]                                  [io问题]             
obdiag gather scene run --scene=observer.log_archive                                                                                      [log archive]                                 [日志归档问题]           
obdiag gather scene run --scene=observer.long_transaction                                                                                 [long transaction]                            [长事务]              
obdiag gather scene run --scene=observer.memory                                                                                           [memory problem]                              [内存问题]             
obdiag gather scene run --scene=observer.perf_sql --env "{db_connect='-h127.0.0.1 -P2881 -utest@test -p****** -Dtest', trace_id='Yxx'}"   [SQL performance problem]                     [SQL性能问题]          
obdiag gather scene run --scene=observer.px_collect_log --env "{trace_id='Yxx', estimated_time='2025-10-29 10:56:39'}"                    [Collect error source node logs for SQL PX]   [SQL PX 收集报错源节点日志] 
obdiag gather scene run --scene=observer.recovery                                                                                         [recovery]                                    [数据恢复问题]           
obdiag gather scene run --scene=observer.restart                                                                                          [restart]                                     [observer无故重启]     
obdiag gather scene run --scene=observer.rootservice_switch                                                                               [rootservice switch]                          [有主改选或者无主选举的切主]    
obdiag gather scene run --scene=observer.sql_err --env "{db_connect='-h127.0.0.1 -P2881 -utest@test -p****** -Dtest', trace_id='Yxx'}"    [SQL execution error]                         [SQL 执行出错]         
obdiag gather scene run --scene=observer.suspend_transaction                                                                              [suspend transaction]                         [悬挂事务]             
obdiag gather scene run --scene=observer.topsql                                                                                           [topsql info]                                 [集群 topsql]        
obdiag gather scene run --scene=observer.unit_data_imbalance                                                                              [unit data imbalance]                         [unit迁移/缩小 副本不均衡问题]
obdiag gather scene run --scene=observer.unknown                                                                                          [unknown problem]                             [未能明确问题的场景]        
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

obdiag use case: one-click cluster insight

List the currently supported scenes:

[admin@observer01 ~]$ obdiag display scene list
obdiag version: 3.7.1

[Observer Problem Display Scenes]:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
command                                                                                                                                                                 info_en                                                                                         info_cn                               
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
obdiag display scene run --scene=observer.all_tenant                                                                                                                    [all_tenant]                                                                                    [所有tenant基本信息]                        
obdiag display scene run --scene=observer.cluster_info                                                                                                                  [cluster info]                                                                                  [集群信息展示]                              
obdiag display scene run --scene=observer.cpu                                                                                                                           [cpu]                                                                                           [tenant的cpu信息]                        
obdiag display scene run --scene=observer.database_datasize --env tenant_id=1 --env database_name=test                                                                  [database data_size]                                                                            [查看库占用磁盘大小]                           
obdiag display scene run --scene=observer.event --env tenant_name=test                                                                                                  [event]                                                                                         [event信息]                             
obdiag display scene run --scene=observer.index --env database_name=test --env table_name=test                                                                          [index]                                                                                         [查询表上的index信息]                        
obdiag display scene run --scene=observer.inner_table --env tablename=test                                                                                              [inner_table]                                                                                   [内部表信息模糊匹配]                           
obdiag display scene run --scene=observer.leader --env level=('all' or tenant_id or table_name)                                                                         [leader]                                                                                        [ob集群的leader信息]                       
obdiag display scene run --scene=observer.lock_table --env tablename=test                                                                                               [lock table]                                                                                    [某张表上持有锁的信息]                          
obdiag display scene run --scene=observer.lockholder                                                                                                                    [lock holder info]                                                                              [查看锁等待]                               
obdiag display scene run --scene=observer.long_transaction --env wait_time=waittime(s)                                                                                  [long_transaction]                                                                              [集群的长事务信息]                            
obdiag display scene run --scene=observer.memory                                                                                                                        [memory]                                                                                        [所有租户的 memory 信息]                     
obdiag display scene run --scene=observer.plan  --env tenant_name=test --env sqlid=test                                                                                 [plan, display sql's plan statistics, sqlid is the SQL ID corresponding to the cached object]   [查看sql的执行计划的统计信息，sqlid指缓存对象对应的 SQL ID]
obdiag display scene run --scene=observer.plan_explain  --env svr_ip=test --env svr_port=2882 --env tenant_id=test --env plan_id=test                                   [plan_explain]                                                                                  [实际执行计划算子信息]                          
obdiag display scene run --scene=observer.processlist --env tenant_name=test                                                                                            [processlist]                                                                                   [查看 processlist]                      
obdiag display scene run --scene=observer.processlist_stat                                                                                                              [processlist_stat]                                                                              [processlist 实时会话信息汇总]                
obdiag display scene run --scene=observer.rs                                                                                                                            [rs]                                                                                            [查看 rootservice 信息]                   
obdiag display scene run --scene=observer.server_info                                                                                                                   [server info]                                                                                   [server 信息展示]                         
obdiag display scene run --scene=observer.slowsql --env tenant_name=test --env mtime=10                                                                                 [slowsql，mtime is query time, unit minute]                                                      [查看慢sql，mtime为查询时间，单位分钟]              
obdiag display scene run --scene=observer.storage_method --env tenant_name=test --env database_name=test                                                                [query table/index uses storage method]                                                         [查看表/索引是行存/列存/行列冗余的存储方式]              
obdiag display scene run --scene=observer.table_datasize --env tenant_id=1 --env database_name=test --env table_name=test                                               [table data_size]                                                                               [查看表占用磁盘大小]                           
obdiag display scene run --scene=observer.table_info --env db_connect='-h127.0.0.1 -P2881 -utest@test -p****** -Dtest' --env database_name=test --env table_name=test   [table info]                                                                                    [表信息展示]                               
obdiag display scene run --scene=observer.table_ndv --env database_name=test --env table_name=test                                                                      [table_ndv]                                                                                     [查询表 ndv 信息]                          
obdiag display scene run --scene=observer.tenant_info --env tenant_name=test                                                                                            [tenant info]                                                                                   [租户信息展示]                              
obdiag display scene run --scene=observer.topsql --env tenant_name=test --env mtime=10                                                                                  [topsql info]                                                                                   [查看topsql]                            
obdiag display scene run --scene=observer.unit_info                                                                                                                     [unit info]                                                                                     [unit 信息展示]                           
obdiag display scene run --scene=observer.zone_info                                                                                                                     [zone info]                                                                                     [zone 信息展示]                           
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Examples:

bash

# Cluster information overview
obdiag display scene run --scene=observer.cluster_info 

# Show all tenants in the cluster
obdiag display scene run --scene=observer.all_tenant 

# View the sessions that hold locks
obdiag display scene run --scene=observer.lockholder 
obdiag display scene run --scene=observer.lockholder --env db_connect='-h127.0.0.1 -P2881 -utest@test -p****** -Dtest'

# View lock information for a specific table
obdiag display scene run --scene=observer.lock_table --env tablename=test

# View table information
obdiag display scene run --scene=observer.table_info --env db_connect='-h127.0.0.1 -P2881 -utest@test -p****** -Dtest' --env database_name=test --env table_name=test

# View transactions that have been running for more than 100 s
obdiag display scene run --scene=observer.long_transaction --env wait_time=100

# View the execution plan statistics of a specific SQL (sqlid is the SQL ID of the cached object)
obdiag display scene run --scene=observer.plan --env tenant_name=test --env sqlid=test

# View Top SQL with mtime=10 (presumably minutes, as documented for slowsql)
obdiag display scene run --scene=observer.topsql --env tenant_name=test --env mtime=10

# View slow SQL for a query time of 10 minutes (mtime is in minutes, per the scene list)
obdiag display scene run --scene=observer.slowsql --env tenant_name=test --env mtime=10
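
Several scenes in the list take no `--env` arguments, so they can be run together for a quick overview of the cluster. The following is a minimal sketch of that idea, not an obdiag feature: `DRY_RUN`, `OUT_DIR`, `scene_cmd`, and `run_scenes` are hypothetical helper names introduced here, and the script assumes that `obdiag` is on the `PATH` and prints each scene's result to standard output. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: run several parameter-free display scenes in one pass.
# DRY_RUN and OUT_DIR are hypothetical helper variables, not obdiag options.
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the commands, 0 = execute them
OUT_DIR="${OUT_DIR:-./display_out}"  # directory for the per-scene output files

# Scenes from the list above that need no --env arguments.
SCENES="observer.cluster_info observer.all_tenant observer.server_info observer.zone_info observer.unit_info"

# Print the command line for one scene.
scene_cmd() {
  printf 'obdiag display scene run --scene=%s\n' "$1"
}

# Print (dry run) or execute every scene in SCENES.
run_scenes() {
  for scene in $SCENES; do
    if [ "$DRY_RUN" = "1" ]; then
      scene_cmd "$scene"
    else
      mkdir -p "$OUT_DIR"
      # Save each scene's output to its own file; report failures but keep going.
      obdiag display scene run --scene="$scene" > "$OUT_DIR/$scene.txt" 2>&1 \
        || echo "scene $scene failed, see $OUT_DIR/$scene.txt" >&2
    fi
  done
}

run_scenes
```

Setting `DRY_RUN=0` runs the scenes and writes one file per scene under `OUT_DIR`. Scenes that need parameters, such as `observer.slowsql`, are left out on purpose because their `--env` values depend on the tenant being examined.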
