[Exclusive] OceanBase container fails to start after a restart with an "obshell failed" error: how to fix it

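An OceanBase container that had been running normally would not come back up after being restarted, and rebooting the server did not help either. It reported an "obshell failed" error and could not start. This article records how the problem was resolved.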

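I. Symptoms

1. An OceanBase container that had been running normally fails to start after a restart.

2. Running docker logs oceanbase to check the log shows the error output below.

The core error comes down to these two lines: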

[ERROR] 127.0.0.1 obshell failed
[ERROR] oceanbase-ce start failed

The output also suggests running "obd display-trace 3d1c71c4-f80a-11ee-947f-0242ac110002" to inspect the obd log.

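II. Analysis

1. Locating the problem

At this point the container can no longer start, so it is not possible to enter it and run obd display-trace. Fortunately, the data directory is mounted from the host directory /app/dockerdata/oceanbase/obd, so the log file can be read directly on the host.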

[root@localhost ~]# cat /app/dockerdata/oceanbase/obd/log/obd 
....
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- exited code 2, error output:
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] ls: cannot access '/proc/118': No such file or directory
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] 
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- root@127.0.0.1 set env OB_ROOT_PASSWORD to ''
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- start obshell: cd /root/ob; /root/ob/bin/obshell admin start --ip 127.0.0.1 --port 2886
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- local execute: cd /root/ob; /root/ob/bin/obshell admin start --ip 127.0.0.1 --port 2886 
[2024-04-11 13:48:57.414] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- exited code 29, error output:
[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] open /root/ob/run/daemon.pid: file exists
[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] 
[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [ERROR] 127.0.0.1 obshell failed
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - sub start ref count to 0
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - export start
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [ERROR] oceanbase-ce start failed
[2024-04-11 13:48:57.420] [3d1c71c4-f80a-11ee-947f-0242ac110002] [INFO] See https://www.oceanbase.com/product/ob-deployer/error-codes .
[2024-04-11 13:48:57.420] [3d1c71c4-f80a-11ee-947f-0242ac110002] [INFO] Trace ID: 3d1c71c4-f80a-11ee-947f-0242ac110002
[2024-04-11 13:48:57.420] [3d1c71c4-f80a-11ee-947f-0242ac110002] [INFO] If you want to view detailed obd logs, please run: obd display-trace 3d1c71c4-f80a-11ee-947f-0242ac110002
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 1
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 0
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - unlock /root/.obd/lock/mirror_and_repo
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - exclusive lock /root/.obd/lock/deploy_obcluster release, count 0
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - unlock /root/.obd/lock/deploy_obcluster
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - share lock /root/.obd/lock/global release, count 0
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - unlock /root/.obd/lock/global

The key error messages are:

[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] open /root/ob/run/daemon.pid: file exists
[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [ERROR] 127.0.0.1 obshell failed
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - sub start ref count to 0
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - export start
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [ERROR] oceanbase-ce start failed

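In other words, when the container starts OceanBase, obshell finds that /root/ob/run/daemon.pid already exists, assumes the daemon is still running, and exits (exit code 29 in the log). Because obshell fails to start, the oceanbase-ce start fails as well.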
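To avoid reading the whole log by eye, the lines belonging to one trace can be filtered out on the host. The following is only a minimal sketch and not part of obd: the function name obd_trace_errors is made up for illustration, and it assumes the log keeps the "[timestamp] [trace id] [level]" layout seen in this incident.

```shell
# Hypothetical helper (not an obd command): print the [ERROR] lines and the
# "file exists" line that belong to a single trace ID.
# Usage: obd_trace_errors <log_file> <trace_id>
obd_trace_errors() {
    local log_file="$1"
    local trace_id="$2"
    # -F matches the bracketed trace ID as a literal string, not a regex.
    # The second grep keeps ERROR entries plus the PID-file complaint.
    grep -F "[$trace_id]" "$log_file" | grep -E '\[ERROR\]|file exists'
}
```

With the paths from this article, it would be called as: obd_trace_errors /app/dockerdata/oceanbase/obd/log/obd 3d1c71c4-f80a-11ee-947f-0242ac110002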

III. Solution

/root/ob/run/daemon.pid inside the container corresponds to /app/dockerdata/oceanbase/ob/run/daemon.pid on the host. Check the file contents:

[root@localhost ~]# cat /app/dockerdata/oceanbase/ob/run/daemon.pid
98
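If it is unclear which host directory backs a path inside the container, Docker's mount list can be used to work it out, and this also works for a stopped container. The sketch below is an illustration only: container_to_host_path is a made-up helper name, it expects "host_source container_destination" pairs on standard input, and it assumes the paths contain no spaces.

```shell
# Hypothetical helper: translate a container path into its host path.
# Reads "host_source container_destination" pairs, one per line, on stdin.
# Usage: ... | container_to_host_path <path_inside_container>
container_to_host_path() {
    local target="$1"
    local src dst
    while read -r src dst; do
        # Skip blank or incomplete lines so an empty destination never matches.
        [ -n "$dst" ] || continue
        case "$target" in
            "$dst"/*)
                # Swap the container-side prefix for the host-side one.
                printf '%s\n' "$src${target#"$dst"}"
                return 0
                ;;
        esac
    done
    return 1
}
```

The pairs can come from docker inspect, for example: docker inspect -f '{{range .Mounts}}{{println .Source .Destination}}{{end}}' oceanbase | container_to_host_path /root/ob/run/daemon.pid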

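The value in the file is the PID of the daemon process from the container's previous run. Delete the file and restart the container: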

[root@localhost ~]# rm /app/dockerdata/oceanbase/ob/run/daemon.pid
rm: remove regular file '/app/dockerdata/oceanbase/ob/run/daemon.pid'? y
[root@localhost ~]# docker restart oceanbase
oceanbase
[root@localhost ~]# docker ps -a
CONTAINER ID        IMAGE                    COMMAND              CREATED             STATUS              PORTS                    NAMES
e2f1998af148        oceanbase/oceanbase-ce   "/bin/sh -c _boot"   38 minutes ago      Up 6 seconds        0.0.0.0:3306->2881/tcp   oceanbase

The container is back to normal. Try logging in:

[root@localhost ~]# mysql -h127.0.0.1 -uroot -p -P3306
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3221487687
Server version: 5.7.25 OceanBase_CE 4.3.0.1 (r100000242024032211-0193a343bc60b4699ec47792c3fc4ce166a182f9) (Built Mar 22 2024 13:19:48)

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| LBACSYS            |
| mysql              |
| oceanbase          |
| ocs                |
| ORAAUDITOR         |
| SYS                |
| test               |
+--------------------+
8 rows in set (0.02 sec)

mysql> exit
Bye
[root@localhost ~]# 

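As shown, the service has recovered.

On investigation, this appears to be a runtime bug in the OceanBase container: after docker restart oceanbase (where oceanbase is the name of the running container), the container always fails to come back up, and the PID file has to be deleted before it can start normally again :-(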
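Until the image handles this itself, the manual steps can be wrapped so that a restart always clears the leftover file. This is a rough sketch under stated assumptions, not an official workaround: restart_oceanbase is a made-up function name, the default container name and host path are the ones used in this article, and it assumes the PID file is only ever stale once the container has stopped.

```shell
# Hypothetical wrapper: stop the container, remove the leftover PID file,
# then start the container again.
# Usage: restart_oceanbase [container_name] [host_pid_file]
restart_oceanbase() {
    local container="${1:-oceanbase}"
    local pid_file="${2:-/app/dockerdata/oceanbase/ob/run/daemon.pid}"

    # Stop first so the file is never removed underneath a live daemon.
    docker stop "$container" >/dev/null || return 1
    # -f: no confirmation prompt, and no error if the file is already gone.
    rm -f -- "$pid_file"
    docker start "$container"
}
```

Calling restart_oceanbase with no arguments uses the defaults above. Stopping before deleting is a deliberate choice: removing the file while the daemon is still alive could let a second instance start alongside it.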
