【全网独家】oceanbase容器重启时报obshell failed错误,无法正常启动的问题处理

正常运行的oceanbase容器,重新启动该容器却启动不了,重启服务器也无法恢复,报obshell failed错误,无法正常启动,本文记录了问题处理过程。

一、问题现象

1、正常运行的oceanbase容器,重启却启动不了

2、运行docker logs oceanbase检查日志,出错信息如下

核心错误为以下两句

[ERROR] 127.0.0.1 obshell failed
[ERROR] oceanbase-ce start failed

并提示运行 "obd display-trace 3d1c71c4-f80a-11ee-947f-0242ac110002"来检查obd的日志信息。

二、问题分析

1、定位问题

此时容器已无法启动,无法进入容器运行obd display-trace命令,但还好数据目录是挂载的主机目录 /app/dockerdata/oceanbase/obd,相应日志文件在主机侧可以直接查看。

bash 复制代码
[root@localhost ~]# cat /app/dockerdata/oceanbase/obd/log/obd 
....
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- exited code 2, error output:
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] ls: cannot access '/proc/118': No such file or directory
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] 
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- root@127.0.0.1 set env OB_ROOT_PASSWORD to ''
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- start obshell: cd /root/ob; /root/ob/bin/obshell admin start --ip 127.0.0.1 --port 2886
[2024-04-11 13:48:56.356] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- local execute: cd /root/ob; /root/ob/bin/obshell admin start --ip 127.0.0.1 --port 2886 
[2024-04-11 13:48:57.414] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] -- exited code 29, error output:
[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] open /root/ob/run/daemon.pid: file exists
[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] 
[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [ERROR] 127.0.0.1 obshell failed
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - sub start ref count to 0
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - export start
[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [ERROR] oceanbase-ce start failed
[2024-04-11 13:48:57.420] [3d1c71c4-f80a-11ee-947f-0242ac110002] [INFO] See https://www.oceanbase.com/product/ob-deployer/error-codes .
[2024-04-11 13:48:57.420] [3d1c71c4-f80a-11ee-947f-0242ac110002] [INFO] Trace ID: 3d1c71c4-f80a-11ee-947f-0242ac110002
[2024-04-11 13:48:57.420] [3d1c71c4-f80a-11ee-947f-0242ac110002] [INFO] If you want to view detailed obd logs, please run: obd display-trace 3d1c71c4-f80a-11ee-947f-0242ac110002
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 1
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 0
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - unlock /root/.obd/lock/mirror_and_repo
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - exclusive lock /root/.obd/lock/deploy_obcluster release, count 0
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - unlock /root/.obd/lock/deploy_obcluster
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - share lock /root/.obd/lock/global release, count 0
[2024-04-11 13:48:57.421] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - unlock /root/.obd/lock/global

可以看到关键的出错信息为:

[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] open /root/ob/run/daemon.pid: file exists

[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG]

[2024-04-11 13:48:57.415] [3d1c71c4-f80a-11ee-947f-0242ac110002] [ERROR] 127.0.0.1 obshell failed

[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - sub start ref count to 0

[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [DEBUG] - export start

[2024-04-11 13:48:57.416] [3d1c71c4-f80a-11ee-947f-0242ac110002] [ERROR] oceanbase-ce start failed

即容器在启动ob时发现/root/ob/run/daemon.pid存在,认为程序仍在运行退出,随即obshell 启动失败,导致最后oceanbase-ce启动失败。

三、解决办法

容器内的/root/ob/run/daemon.pid对应主机/app/dockerdata/oceanbase/ob/run/daemon.pid,察看文件内容

bash 复制代码
[root@localhost ~]# cat /app/dockerdata/oceanbase/ob/run/daemon.pid
98

里面的值为上次容器运行时守护进程的pid,删除该文件,重启容器

bash 复制代码
[root@localhost ~]# rm /app/dockerdata/oceanbase/ob/run/daemon.pid
rm: remove regular file '/app/dockerdata/oceanbase/ob/run/daemon.pid'? y
[root@localhost ~]# docker restart oceanbase
oceanbase
[root@localhost ~]# docker ps -a
CONTAINER ID        IMAGE                    COMMAND              CREATED             STATUS              PORTS                    NAMES
e2f1998af148        oceanbase/oceanbase-ce   "/bin/sh -c _boot"   38 minutes ago      Up 6 seconds        0.0.0.0:3306->2881/tcp   oceanbase

容器恢复正常 ,尝试登录:

bash 复制代码
[root@localhost ~]# mysql -h127.0.0.1 -uroot -p -P3306
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3221487687
Server version: 5.7.25 OceanBase_CE 4.3.0.1 (r100000242024032211-0193a343bc60b4699ec47792c3fc4ce166a182f9) (Built Mar 22 2024 13:19:48)

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| LBACSYS            |
| mysql              |
| oceanbase          |
| ocs                |
| ORAAUDITOR         |
| SYS                |
| test               |
+--------------------+
8 rows in set (0.02 sec)

mysql> exit
Bye
[root@localhost ~]# 

可见业务已经恢复。

经查,这是oceanbase容器的一个运行BUG,通过docker restart oceanbase(oceanbase为运行的容器名)就必然会启不来了,要删掉pid文件才能重新正常启动,:-(。

相关推荐
大G哥1 小时前
记一次K8S 环境应用nginx stable-alpine 解析内部域名失败排查思路
运维·nginx·云原生·容器·kubernetes
醉颜凉1 小时前
银河麒麟桌面操作系统修改默认Shell为Bash
运维·服务器·开发语言·bash·kylin·国产化·银河麒麟操作系统
cyt涛2 小时前
MyBatis 学习总结
数据库·sql·学习·mysql·mybatis·jdbc·lombok
苦逼IT运维2 小时前
YUM 源与 APT 源的详解及使用指南
linux·运维·ubuntu·centos·devops
Rookie也要加油2 小时前
01_SQLite
数据库·sqlite
仍有未知等待探索2 小时前
Linux 传输层UDP
linux·运维·udp
liuxin334455662 小时前
教育技术革新:SpringBoot在线教育系统开发
数据库·spring boot·后端
zeruns8022 小时前
如何搭建自己的域名邮箱服务器?Poste.io邮箱服务器搭建教程,Linux+Docker搭建邮件服务器的教程
linux·运维·服务器·docker·网站
北城青2 小时前
WebRTC Connection Negotiate解决
运维·服务器·webrtc
疯狂的大狗3 小时前
docker进入正在运行的容器,exit后的比较
运维·docker·容器