1. 前言
obdiag是OceanBase的敏捷诊断工具。1.2版本中,obdiag支持快速收集诊断信息,但仅有收集能力是不够的,还需要有分析能力。因此在obdiag的1.3.0版本中,我们加入了OB集群的日志分析功能。用户可以一键进行集群的OB日志的分析,以便发现可能存在的异常情况。
2. obdiag 日志分析设计
2.1 架构设计
主体架构还是依托于obdiag的集中式采集模式,当用户发起obdiag 的分析的时候需要去各个节点上进行采集,将采集回来的数据集中进行分析处理。
2.2 obdiag执行在线日志分析的时序图
-
用户设置配置文件,配置文件的路径在obdiag安装目录的config/config.yml中,主要是设置所要分析的OceanBase集群的ssh登陆信息,因为obdiag需要通过ssh方式去集群拉取日志到obdiag的节点上进行分析
-
执行obdiag analyze log <option> 命令
-
obdiag 接收到用户的analyze命令后会去解析<option> 内的参数
-
obdiag解析完analyze参数后会启动日志拉取的环节,拉取的节点是步骤一中用户配置的,拉取的日志的时间范围、过滤条件等都是步骤三<option>设定的
-
obdiag 发送远程主机的执行指令
-
远程执行日志的grep或者cp命令来获取日志
-
符合条件的日志会统一放到临时文件中,便于后续的回传
-
下载远程主机上筛选出来的符合条件的日志
-
下载完毕后,发送临时文件清理指令
-
远程主机临时文件会被清理
-
obdiag 对远程主机拉取回来的日志文件进行分析,对于日志分析,主要规则是针对日志中的retcode进行分析,统计各retcode出现的次数、最早开始时间、最晚出现的时间以及其对应的trace_id的等信息
-
obdiag分析完日志后会在黑屏上打印出总览的日志分析信息
-
obdiag分析日志的详细信息会输出到文件中
-
用户可以通过obdiag 输出的文件地址查看详细的日志分析报告
3. obdiag日志分析实践
obdiag analyze <analyze type> [options]
analyze type
包含如下:
- log:一键分析 OceanBase 的日志。
3.1 obdiag analyze log
使用该命令可以一键在线分析 OceanBase 集群的日志,或者通过 --files
开启离线分析模式。
- 本文所指的在线分析指的是 OceanBase 集群在线运行状态,日志分布在各个 OBServer 节点上。
- 本文所指的离线分析模式是
--files
参数传递下,可以分析已经收集到机 obdiag 部署机器上的 OBServer 节点日志。 - 需要确保已经在 obdiag 配置文件
config.yml
中配置好需要收集节点的登录信息。相关的详细配置介绍,参见 obdiag 配置。
例子:
obdiag analyze log --scope observer --from 2023-10-08 10:25:00 --to 2023-10-08 11:30:00
...
FileListInfo:
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node | LogList |
+================+=======================================================================================================================================================================================================================+
| xx.xx.xx.xx | ['observer.log.20231008104204260', 'observer.log.20231008111305072', 'observer.log.20231008114410668', 'observer.log.wf.20231008104204260', 'observer.log.wf.20231008111305072', 'observer.log.wf.20231008114410668'] |
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
...
Analyze OceanBase Online Log Summary:
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| Node | Status | FileName | ErrorCode | Message | Count |
+================+===========+==============================================================================+=============+===============================================================================================================================+=========+
| xx.xx.xx.xx | Completed | analyze_pack_20231008171201/xx_xx_xx_xx/observer.log.20231008104204260 | -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use | 2 |
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| xx.xx.xx.xx | Completed | analyze_pack_20231008171201/xx_xx_xx_xx/observer.log.20231008111305072 | -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use | 8 |
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| xx.xx.xx.xx | Completed | analyze_pack_20231008171201/xx_xx_xx_xx/observer.log.20231008114410668 | -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use | 10 |
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| xx.xx.xx.xx | Completed | analyze_pack_20231008171201/xx_xx_xx_xx/observer.log.20231008114410668 | -4009 | IO error | 20 |
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
For more details, please run cmd 'cat analyze_pack_20231008171201/result_details.txt'
快捷分析最近一段时间的日志:
在线分析最近一小时的日志,该指令执行的时候会从远程主机上拉取最近一小时的日志进行分析,诊断出出现过的错误
obdiag gather log --scope observer --since 1h
# 在线分析最近 30 分钟的日志,该指令执行的时候会从远程主机上拉取最近30分钟的日志进行分析,诊断出出现过的错误
obdiag analyze log --scope observer --since 30m
离线分析日志:
ls -lh test/
-rw-r--r-- 1 admin staff 256M Oct 8 17:24 observer.log.20231008104204260
-rw-r--r-- 1 admin staff 256M Oct 8 17:24 observer.log.20231008111305072
-rw-r--r-- 1 admin staff 256M Oct 8 17:24 observer.log.20231008114410668
-rw-r--r-- 1 admin staff 18K Oct 8 17:24 observer.log.wf.20231008104204260
-rw-r--r-- 1 admin staff 19K Oct 8 17:24 observer.log.wf.20231008111305072
-rw-r--r-- 1 admin staff 18K Oct 8 17:24 observer.log.wf.20231008114410668
obdiag analyze log --files test/
Analyze OceanBase Offline Log Summary:
+-----------+-----------+-----------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| Node | Status | FileName | ErrorCode | Message | Count |
+===========+===========+=======================================================================+=============+===============================================================================================================================+=========+
| 127.0.0.1 | Completed | analyze_pack_20231008172144/127_0_0_1_/observer.log.20231008104204260 | -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use | 2 |
+-----------+-----------+-----------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| 127.0.0.1 | Completed | analyze_pack_20231008172144/127_0_0_1_/observer.log.20231008111305072 | -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use | 8 |
+-----------+-----------+-----------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| 127.0.0.1 | Completed | analyze_pack_20231008172144/127_0_0_1_/observer.log.20231008114410668 | -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use | 10 |
+-----------+-----------+-----------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| 127.0.0.1 | Completed | analyze_pack_20231008172144/127_0_0_1_/observer.log.20231008114410668 | -4009 | IO error | 20 |
+-----------+-----------+-----------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
For more details, please run cmd 'cat analyze_pack_20231008172144/result_details.txt'
《OceanBase诊断系列》分享持续更新,也欢迎大家贡献自己的诊断OceanBase的方法。
|-----|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 第一篇 | 如何修炼成"神医"------《OceanBase诊断系列》之一 |
| 第二篇 | 走进SQL审计视图------《OceanBase诊断系列》之二 |
| 第三篇 | 快速收集诊断信息,敏捷诊断工具obdiag应用实践------《OceanBase诊断系列》之三 |
| 第四篇 | 如何快速分析OB集群日志,敏捷诊断工具obdiag分析能力实践------《OceanBase诊断系列》之四 |