[Exclusive] Fixing an OceanBase container that fails to start after a restart with an "obshell failed" error

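An OceanBase container that had been running normally could not be started again after a restart, and rebooting the server did not help either. Startup failed with an "obshell failed" error. This post records how the problem was diagnosed and fixed.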

I. Symptoms

1. An OceanBase container that had been running normally failed to start after a restart.

2. Checking the logs with docker logs oceanbase shows the errors. The core errors are these two lines:

[ERROR] 127.0.0.1 obshell failed
[ERROR] oceanbase-ce start failed

The output also suggests running "obd display-trace 3d1c71c4-f80a-11ee-947f-0242ac110002" to view the obd logs.

II. Analysis

1. Locating the problem

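At this point the container could not start, so there was no way to enter it and run obd display-trace. Fortunately, the data directory is mounted from the host directory /app/dockerdata/oceanbase/obd, so the corresponding log file can be read directly on the host.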
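Because the container will not start, `docker exec` is not an option, but `docker inspect` still works on a stopped container and reports its bind mounts. The sketch below maps a container path to the host path that backs it by picking the most specific mount. The helper name `host_path_for` is hypothetical, and it assumes mount paths contain no spaces.

```shell
#!/bin/sh
# host_path_for CONTAINER_PATH
# Reads "HOST_SOURCE CONTAINER_DEST" pairs (one mount per line) on stdin and
# prints the host path backing CONTAINER_PATH. Returns 1 if no mount covers it.
# Assumption (hypothetical helper): mount paths contain no spaces.
host_path_for() {
  local target="$1" best_src="" best_dst="" src dst
  while read -r src dst; do
    [ -n "$dst" ] || continue
    case "$target" in
      "$dst"|"$dst"/*)
        # Prefer the most specific (longest) mount destination.
        if [ "${#dst}" -gt "${#best_dst}" ]; then
          best_src="$src"
          best_dst="$dst"
        fi
        ;;
    esac
  done
  [ -n "$best_dst" ] || return 1
  # Swap the container-side prefix for the host-side source directory.
  printf '%s%s\n' "$best_src" "${target#"$best_dst"}"
}
```

For example, piping `docker inspect -f '{{range .Mounts}}{{.Source}} {{.Destination}}{{"\n"}}{{end}}' oceanbase` into `host_path_for /root/ob/run/daemon.pid` should print the host-side location of that file.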

[root@localhost ~]# cat /app/dockerdata/oceanbase/obd/log/obd 
....
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- exited code 2, error output:
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] ls: cannot access '/proc/118': No such file or directory
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] 
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- root@127.0.0.1 set env OB_ROOT_PASSWORD to ''
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- start obshell: cd /root/ob; /root/ob/bin/obshell admin start --ip 127.0.0.1 --port 2886
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- local execute: cd /root/ob; /root/ob/bin/obshell admin start --ip 127.0.0.1 --port 2886 
[2024-04-11 13:48:57.414] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- exited code 29, error output:
[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] open /root/ob/run/daemon.pid: file exists
[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] 
[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [ERROR] 127.0.0.1 obshell failed
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - sub start ref count to 0
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - export start
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [ERROR] oceanbase-ce start failed
[2024-04-11 13:48:57.420] [3d1c71c4-f80a-11ee-947f-0242ac110002] [INFO] See https://www.oceanbase.com/product/ob-deployer/error-codes .
[2024-04-11 13:48:57.420] [3d1c71c4-f80a-11ee-947f-0242ac110002] [INFO] Trace ID: 3d1c71c4-f80a-11ee-947f-0242ac110002
[2024-04-11 13:48:57.420] [3d1c71c4-f80a-11ee-947f-0242ac110002] [INFO] If you want to view detailed obd logs, please run: obd display-trace 3d1c71c4-f80a-11ee-947f-0242ac110002
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 1
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 0
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - unlock /root/.obd/lock/mirror_and_repo
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - exclusive lock /root/.obd/lock/deploy_obcluster release, count 0
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - unlock /root/.obd/lock/deploy_obcluster
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - share lock /root/.obd/lock/global release, count 0
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - unlock /root/.obd/lock/global
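The obd log file collects output from every obd run, so it can get long. A minimal sketch for pulling out just one trace, assuming the `[timestamp] [trace-id] [LEVEL]` line layout shown above (the function name `trace_errors` is hypothetical): it prints each `[ERROR]` line of the trace plus the line right after every `error output:` marker, which is where the failing command's stderr starts.

```shell
#!/bin/sh
# trace_errors LOGFILE TRACE_ID  (hypothetical helper)
# Prints the [ERROR] lines of one obd trace, plus each "error output:" marker
# and the first line of the failing command's stderr that follows it.
trace_errors() {
  awk -v t="[$2]" '
    index($0, t) == 0 { next }                 # skip lines from other traces
    grab { print; grab = 0; next }             # first stderr line after a marker
    /error output:/ { print; grab = 1; next }  # command failure marker
    /\[ERROR\]/ { print }                      # obd-level error lines
  ' "$1"
}
```

Usage: `trace_errors /app/dockerdata/oceanbase/obd/log/obd 3d1c71c4-f80a-11ee-947f-0242ac110002`. Only the first stderr line after each marker is shown, which is enough to spot the `daemon.pid: file exists` message here.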

The key error messages are:

[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] open /root/ob/run/daemon.pid: file exists
[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG]
[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [ERROR] 127.0.0.1 obshell failed
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - sub start ref count to 0
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - export start
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [ERROR] oceanbase-ce start failed

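In other words, when the container started OB, obshell found that /root/ob/run/daemon.pid already existed, assumed the daemon was still running, and exited (exit code 29). Because obshell failed to start, oceanbase-ce failed to start as well.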
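The log suggests obshell gives up simply because the file exists, without checking whether the recorded process is still alive. A minimal sketch of the liveness check that would tell a stale pid file apart, similar in spirit to the `ls /proc/118` probe visible earlier in the obd log. The function name `pidfile_is_stale` is hypothetical, and it is Linux-only because it relies on `/proc`.

```shell
#!/bin/sh
# pidfile_is_stale PIDFILE  (hypothetical helper, Linux only)
# Returns 0 when PIDFILE exists but does not name a live process in the
# current PID namespace; returns 1 when there is no file or the PID is alive.
pidfile_is_stale() {
  local file="$1" pid=""
  [ -f "$file" ] || return 1          # no pid file: nothing is stale
  read -r pid < "$file"               # first line holds the PID
  case "$pid" in
    ''|*[!0-9]*) return 0 ;;          # empty or non-numeric: cannot be live
  esac
  [ ! -d "/proc/$pid" ]               # /proc/<pid> exists only while alive
}
```

This is only meaningful inside the PID namespace that wrote the file, i.e. inside the container; on the host, `/proc/98` would refer to an unrelated host process. It also cannot detect PID reuse, where an unrelated process happens to receive the same PID.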

III. Solution

/root/ob/run/daemon.pid inside the container corresponds to /app/dockerdata/oceanbase/ob/run/daemon.pid on the host. Check the file's contents:

[root@localhost ~]# cat /app/dockerdata/oceanbase/ob/run/daemon.pid
98

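The value is the PID of the daemon process from the container's previous run. A restarted container runs a fresh set of processes, so that PID no longer belongs to the old daemon and the file is stale. Delete the file and restart the container: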

[root@localhost ~]# rm /app/dockerdata/oceanbase/ob/run/daemon.pid
rm: remove regular file '/app/dockerdata/oceanbase/ob/run/daemon.pid'? y
[root@localhost ~]# docker restart oceanbase
oceanbase
[root@localhost ~]# docker ps -a
CONTAINER ID        IMAGE                    COMMAND              CREATED             STATUS              PORTS                    NAMES
e2f1998af148        oceanbase/oceanbase-ce   "/bin/sh -c _boot"   38 minutes ago      Up 6 seconds        0.0.0.0:3306->2881/tcp   oceanbase

The container is running normally again. Try logging in:

[root@localhost ~]# mysql -h127.0.0.1 -uroot -p -P3306
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3221487687
Server version: 5.7.25 OceanBase_CE 4.3.0.1 (r100000242024032211-0193a343bc60b4699ec47792c3fc4ce166a182f9) (Built Mar 22 2024 13:19:48)

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| LBACSYS            |
| mysql              |
| oceanbase          |
| ocs                |
| ORAAUDITOR         |
| SYS                |
| test               |
+--------------------+
8 rows in set (0.02 sec)

mysql> exit
Bye
[root@localhost ~]# 

The service has been restored.

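After investigation, this turns out to be a runtime bug in the OceanBase container: restarting it with docker restart oceanbase (where oceanbase is the name of the running container) reliably leaves it unable to start, and it only starts normally again after the pid file is deleted. :-(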
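Until the image is fixed, one way to avoid the manual cleanup is to replace a bare `docker restart oceanbase` with a wrapper that performs the steps above in a safe order: stop the container, confirm it is no longer running, delete the stale pid file on the host, then start it again. A minimal sketch; the function name `restart_oceanbase` is hypothetical, and the defaults for `CONTAINER` and `PIDFILE` assume the container name and host mount used in this post.

```shell
#!/bin/sh
# Hypothetical wrapper to use instead of a bare `docker restart oceanbase`.
# Defaults assume the container name and host mount path used in this post.
CONTAINER="${CONTAINER:-oceanbase}"
PIDFILE="${PIDFILE:-/app/dockerdata/oceanbase/ob/run/daemon.pid}"

restart_oceanbase() {
  # docker stop succeeds even if the container has already exited.
  docker stop "$CONTAINER" >/dev/null || return 1
  # Refuse to touch the pid file unless the container is really stopped.
  if [ "$(docker inspect -f '{{.State.Running}}' "$CONTAINER")" != "false" ]; then
    echo "container $CONTAINER is still running; not removing $PIDFILE" >&2
    return 1
  fi
  # Remove the stale pid file left behind by the previous run, if any.
  rm -f -- "$PIDFILE"
  docker start "$CONTAINER" >/dev/null
}
```

Stopping first matters: deleting the pid file while the daemon is actually running would remove the only guard against starting a second copy.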
