Contents
- 1 Restoring MySQL nodes
  - 1.1 Restoring with Ansible
    - 1.1 Install Ansible
    - 1.2 Configure the Ansible configuration file
    - 1.3 Build the automation management channel (Bootstrapping)
    - 1.4 Modify the configuration file
    - 1.5 Reset the MySQL cluster (running the playbook)
  - 1.2 Manual restore
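1 Restoring MySQL nodes
Before restoring, adjust the MySQL configuration file to match the features you want; adjusting the playbook instead also works.
1.1 Restoring with Ansible
The principle is the same as a manual cleanup, except that the steps are run from a playbook, which you can think of as a script.
1.1 Install Ansible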
bash
# Configure the EPEL repository
[root@MySQL-Router ~]# sh /usr/bin/my_script/swhouse_set.sh
==============================Start wshouse_set.sh script==============================
Enter [function name] to option function,script support local,network,custom.
Local,Aliyun,Docker,Epel,Customize):Epel
Updating Subscription Management repositories.
Unable to read consumer identity
This system is not registered with an entitlement server. You can use "rhc" or "subscription-manager" to register.
repo id repo name
AliyunAppStream Aliyun.appstream
AliyunBaseOS Aliyun.baseos
Epel Epel
Sucess modify!!!
==============================End wshouse_set.sh script================================
[root@MySQL-Router ~]# grep baseurl /etc/yum.repos.d/Epel.repo
baseurl=https://mirrors.aliyun.com/epel/9/Everything/x86_64/
# Install Ansible
[root@MySQL-Router ~]# dnf install ansible -yq
Installed:
ansible-1:7.7.0-1.el9.noarch ansible-core-1:2.14.14-1.el9.x86_64
git-core-2.43.5-1.el9_4.x86_64 python3-cffi-1.14.5-5.el9.x86_64
python3-cryptography-36.0.1-4.el9.x86_64 python3-packaging-20.9-5.el9.noarch
python3-ply-3.11-14.el9.0.1.noarch python3-pycparser-2.20-6.el9.noarch
python3-resolvelib-0.5.4-5.el9.noarch
[root@MySQL-Router ~]# ansible --version
ansible [core 2.14.14]
config file = /etc/ansible/ansible.cfg
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.9/site-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/bin/ansible
python version = 3.9.18 (main, Jan 24 2024, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] (/usr/bin/python3)
jinja version = 3.1.2
libyaml = True
# Add a user for Ansible
[root@MySQL-Router ~]# useradd devops
[root@MySQL-Router ~]# echo 123 | passwd --stdin devops
Changing password for user devops.
passwd: all authentication tokens updated successfully.
1.2 Configure the Ansible configuration file
bash
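# Configure Ansible
# inventory: the host list, i.e. which machines to manage
# remote_user: the user for remote connections
# host_key_checking=false: do not prompt to confirm host fingerprints (no yes/no)
# become=False: no privilege escalation on the remote host; not a sound setup, adjusted later
# The notes stay outside the file because ansible.cfg only accepts ";" for comments placed after a value
[root@MySQL-Router ~]# su - devops
[devops@MySQL-Router ~]$ mkdir ansible
[devops@MySQL-Router ~]$ cat >ansible/ansible.cfg <<EOF
> [defaults]
> inventory=./inventory
> remote_user=root
> host_key_checking=false
> [privilege_escalation]
> become=False
> EOF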
[devops@MySQL-Router ~]$ cat ansible/ansible.cfg
[defaults]
inventory=./inventory
remote_user=root
host_key_checking=false
[privilege_escalation]
become=False
# Write the inventory
[devops@MySQL-Router ~]$ cat > ansible/inventory << EOF
> [mysql]
> 172.25.254.10
> 172.25.254.20
> 172.25.254.30
> EOF
[devops@MySQL-Router ~]$ cat ansible/inventory
[mysql]
172.25.254.10
172.25.254.20
172.25.254.30
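1.3 Build the automation management channel (Bootstrapping)
The goal of this part is to let the devops user control all nodes without a password and run tasks with root privileges.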
bash
# Use Ansible to prepare the cluster environment and reset the MySQL service
[devops@MySQL-Router ~]$ cd ansible/
# Create the user
[devops@MySQL-Router ansible]$ ansible mysql -m user -a 'name=devops'
172.25.254.30 | CHANGED => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": true,
"comment": "",
"create_home": true,
"group": 1000,
"home": "/home/devops",
"name": "devops",
"shell": "/bin/bash",
"state": "present",
"system": false,
"uid": 1000
}
..................
# Set the password
[devops@MySQL-Router ansible]$ ansible mysql -m shell -a 'echo 123 | passwd --stdin devops'
172.25.254.30 | CHANGED | rc=0 >>
Changing password for user devops.
passwd: all authentication tokens updated successfully.
..................
# Grant passwordless sudo
[devops@MySQL-Router ansible]$ ansible mysql -m shell -a 'echo "devops ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers'
172.25.254.30 | CHANGED | rc=0 >>
........................
# Create the SSH directory
[devops@MySQL-Router ansible]$ ansible all -m file -a 'path=/home/devops/.ssh owner=devops group=devops mode="0700" state=directory'
172.25.254.30 | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": false,
"gid": 1000,
"group": "devops",
"mode": "0700",
"owner": "devops",
"path": "/home/devops/.ssh",
"size": 103,
"state": "directory",
"uid": 1000
}
..................
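# Distribute the public key for passwordless login (the source authorized_keys on the control node must already hold the devops public key)
[devops@MySQL-Router ansible]$ ansible all -m copy -a 'src=/home/devops/.ssh/authorized_keys dest=/home/devops/.ssh/authorized_keys owner=devops group=devops mode="0600"'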
172.25.254.30 | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": false,
"checksum": "68279959b69128a45451bad8fbf5da74944432fd",
"dest": "/home/devops/.ssh/authorized_keys",
"gid": 1000,
"group": "devops",
"mode": "0600",
"owner": "devops",
"path": "/home/devops/.ssh/authorized_keys",
"size": 580,
"state": "file",
"uid": 1000
}
..................
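Appending to /etc/sudoers with `>>` adds a duplicate line on every run. The following is a minimal sketch of an alternative, assuming the nodes include the /etc/sudoers.d drop-in directory (the usual default on RHEL-family systems). The function name `install_sudoers_dropin` and its second argument are hypothetical helpers for illustration, not part of sudo or Ansible.

```shell
#!/usr/bin/env bash
# Sketch: write the NOPASSWD rule as a sudoers drop-in instead of appending to /etc/sudoers.
# install_sudoers_dropin is an illustrative name; the target directory is an assumption.

install_sudoers_dropin() {
    local user="$1"
    local dir="${2:-/etc/sudoers.d}"   # assumption: sudoers includes this directory
    local tmp
    tmp="$(mktemp)" || return 1
    # Same rule as in the article, written to a temporary file first.
    printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$user" > "$tmp"
    # Overwriting one fixed file keeps repeated runs idempotent; 0440 is the usual sudoers mode.
    install -m 0440 "$tmp" "$dir/$user"
    local rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

On a node, this would be run as root as `install_sudoers_dropin devops`, and the result can be checked with `visudo -cf /etc/sudoers.d/devops` before relying on it.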
1.4 Modify the configuration file
bash
# Connections currently use root (set in ansible/ansible.cfg), which is insecure; switch to the devops user
[devops@MySQL-Router ansible]$ ansible all -m shell -a 'whoami'
172.25.254.30 | CHANGED | rc=0 >>
root # with remote_user=root from the earlier config file, tasks run as root on the managed node
..................
# After setting remote_user=devops in ansible.cfg, verify
[devops@MySQL-Router ansible]$ egrep "devops|become" ansible.cfg
remote_user=devops
become=False
[devops@MySQL-Router ansible]$ ansible all -m shell -a 'whoami'
172.25.254.30 | CHANGED | rc=0 >>
devops
..................
# Enable privilege escalation
[devops@MySQL-Router ansible]$ vim ansible.cfg
[devops@MySQL-Router ansible]$ cat ansible.cfg
[defaults]
inventory=./inventory
remote_user=devops
host_key_checking=false
[privilege_escalation]
become=True
# NOPASSWD was configured earlier, so no sudo password prompt is needed
become_ask_pass=False
# escalate to root through sudo
become_method=sudo
become_user=root
[devops@MySQL-Router ansible]$ ansible all -m shell -a 'whoami'
172.25.254.30 | CHANGED | rc=0 >>
root
..................
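1.5 Reset the MySQL cluster (running the playbook)
1.5.1 YAML playbook (clear_mysql.yml)
A YAML playbook (clear_mysql.yml) forcibly resets the database state; this is typically used when rebuilding the cluster or clearing dirty data.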
yaml
[devops@MySQL-Router ansible]$ vim clear_mysql.yml
[devops@MySQL-Router ansible]$ cat clear_mysql.yml
- name: reset mysql
  hosts: mysql
  tasks:
    - name: stop mysql
      shell: '/usr/local/mysql/bin/mysqld stop'
      ignore_errors: yes
    - name: delete mysql data
      file:
        path: /data/mysql
        state: absent
    - name: create data directory
      file:
        path: /data/mysql
        state: directory
        owner: mysql
        group: mysql
    - name: initialize mysql
      shell: '/usr/local/mysql/bin/mysqld --initialize --user=mysql'
    - name: start mysql
1.5.2 Run the playbook
bash
[devops@MySQL-Router ansible]$ ansible-playbook clear_mysql.yml -vv | grep password
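# No output at all
# It failed; check the log for the error
[devops@MySQL-Router ansible]$ ansible-playbook clear_mysql.yml -vv | less
/ERROR
# The rpl_semi_sync_* variables are unknown to the server here (MySQL 8.3; presumably the semi-sync plugin is not loaded during --initialize); commenting them out fixes it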
[ERROR] [MY-000067] [Server] unknown variable 'rpl_semi_sync_slave_enabled=1'.\n2026-03-23T11:08:22.845206Z 0 [ERROR] [MY-013236] [Server] The designated data directory /data/mysql/ is unusable. You can remove all files that the server added to it.\n2026-03-23T11:08:22.845235Z 0
[devops@MySQL-Router ansible]$ ansible mysql -m shell -a "sed -i 's/^rpl_semi_sync_master_enabled/#rpl_semi_sync_master_enabled/g' /etc/my.cnf"
172.25.254.30 | CHANGED | rc=0 >>
..................
[devops@MySQL-Router ansible]$ ansible mysql -m shell -a "sed -i 's/^rpl_semi_sync_slave_enabled/#rpl_semi_sync_slave_enabled/g' /etc/my.cnf"
172.25.254.30 | CHANGED | rc=0 >>
..................
[devops@MySQL-Router ansible]$ ansible mysql -m shell -a "grep -n 'rpl_semi_sync' /etc/my.cnf"
172.25.254.30 | CHANGED | rc=0 >>
9:#rpl_semi_sync_slave_enabled=1
172.25.254.20 | CHANGED | rc=0 >>
12:#rpl_semi_sync_slave_enabled=1
172.25.254.10 | CHANGED | rc=0 >>
9:#rpl_semi_sync_master_enabled=1
# Run it again
[devops@MySQL-Router ansible]$ ansible-playbook clear_mysql.yml -vv | grep password
changed: [172.25.254.30] => {
..................
A temporary password is generated for root@localhost: n/dqdsFfc6#/
..................}
changed: [172.25.254.20] => {
.....................
A temporary password is generated for root@localhost: H2iHth3kHl>Q
.....................}
changed: [172.25.254.10] => {
..................
A temporary password is generated for root@localhost: Yjh>P?Fiq5pN
..................}
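Grepping for "password" returns whole log lines. As a minimal sketch, the password itself can be cut out of a raw `mysqld --initialize` log line with `sed`; the helper name `extract_temp_password` is hypothetical, and the marker text is copied from the log lines shown in this article.

```shell
#!/usr/bin/env bash
# Sketch: extract the temporary root password from mysqld --initialize log text.
# extract_temp_password is an illustrative helper name.

extract_temp_password() {
    # Reads log text on stdin and prints whatever follows the fixed marker
    # on each matching line; non-matching lines produce no output.
    sed -n 's/.*A temporary password is generated for root@localhost: //p'
}
```

This works on the plain error log or on the output of a manual `mysqld --initialize`. In `ansible-playbook -vv` output the line is embedded in a JSON string, so the password may be followed by an escaped newline and more text that would need trimming.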
1.5.3 Problem
Symptom
bash
[root@mysql-node1 ~]# mysql -p'Yjh>P?Fiq5pN'
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/data/mysql/mysql.sock' (2)
[root@mysql-node1 ~]# ps -aux | grep mysqld
root 10535 0.0 0.0 5736 1920 pts/0 T 19:47 0:00 cat /usr/local/mysql/bin/mysqld
root 10538 0.0 0.0 5736 1920 pts/0 T 19:47 0:00 cat /usr/local/mysql/bin/mysqld
mysql 12407 0.3 11.7 2736548 436940 ? Ssl 20:24 0:05 /usr/local/mysql/bin/mysqld
root 13416 0.0 0.0 6416 2304 pts/0 S+ 20:50 0:00 grep --color=auto mysqld
[root@mysql-node1 ~]# systemctl is-active mysqld.service
active
[root@mysql-node1 ~]# find / -name "mysql.sock" 2>/dev/null
# Nothing found
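The generated temporary passwords contain shell metacharacters such as `>`, `?` and `&`. Left unquoted, `>` starts an output redirection and `&` sends the command to the background, so only part of the password reaches `mysql`. A minimal sketch of the quoted form follows; `show_args` is a hypothetical helper that just prints the arguments a command would receive.

```shell
#!/usr/bin/env bash
# Sketch: single quotes keep every character of the password literal.
# show_args is an illustrative helper, standing in for the mysql client.

show_args() {
    # Print each received argument on its own line, wrapped in < >.
    printf '<%s>\n' "$@"
}

pw='Yjh>P?Fiq5pN'                 # single quotes: > and ? are not interpreted
quoted="$(show_args "-p$pw")"     # double quotes keep the value as one argument
echo "$quoted"                    # prints <-pYjh>P?Fiq5pN>
```

Running `mysql -p` with no value makes the client prompt for the password instead, which avoids both the quoting problem and the command-line warning.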
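Analysis
Before the playbook ran, the MySQL service may already have been running (or a process left over from the previous initialization was still alive).
When the playbook runs:
- The stop mysql task uses /usr/local/mysql/bin/mysqld stop, but this command may not have killed the old process (or the process was killed and immediately restarted by systemd, since systemctl is-active reports active).
- The delete mysql data task removes /data/mysql. Note: if a MySQL process is still running and holding the data directory, Linux still allows the files to be deleted, but the process keeps its open file handles and new files cannot be written correctly.
- When the initialize mysql task runs, stale references from the old process (.pid, .sock) may leave the environment dirty even though the password is printed (printing it is the last step of initialization), or the socket path produced may differ from what the client expects.
- Most importantly: --initialize only prepares the data files. It does not start a long-running server that listens on the socket, and the initialization process exits when it finishes. The mysqld that is running now is the old leftover process, still tied to the old data directory, which does not match the freshly initialized one, so no socket file exists under /data/mysql.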
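Because initialization and server startup are separate steps, a client should connect only after the server has created its socket. A minimal sketch of such a wait loop is below; `wait_for_path` is a hypothetical helper, and it checks plain existence with `-e` (use `-S` to insist on a socket).

```shell
#!/usr/bin/env bash
# Sketch: poll until a path (for example /data/mysql/mysql.sock) exists.
# wait_for_path is an illustrative helper name.

wait_for_path() {
    local path="$1"
    local timeout="${2:-30}"   # seconds to wait before giving up
    local waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1           # path never appeared
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Typical use would be `systemctl start mysqld.service && wait_for_path /data/mysql/mysql.sock 30` before calling the `mysql` client.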
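Solution
bash
# Kill the current process, clean the dirty environment (the data under /data/mysql/), start MySQL, and run the secure initialization; in other words, restore the node by hand, or stop the mysqld service and run the playbook again.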
[root@mysql-node1 ~]# ps -aux | grep mysqld
root 10535 0.0 0.0 5736 1920 pts/0 T 19:47 0:00 cat /usr/local/mysql/bin/mysqld
root 10538 0.0 0.0 5736 1920 pts/0 T 19:47 0:00 cat /usr/local/mysql/bin/mysqld
mysql 13477 1.3 11.0 2736548 412268 ? Ssl 21:10 0:00 /usr/local/mysql/bin/mysqld
root 13561 0.0 0.0 6416 2304 pts/0 S+ 21:10 0:00 grep --color=auto mysqld
[root@mysql-node1 ~]# pkill -9 mysqld
[root@mysql-node1 ~]# ps -aux | grep mysqld
root 10535 0.0 0.0 5736 1920 pts/0 T 19:47 0:00 cat /usr/local/mysql/bin/mysqld
root 10538 0.0 0.0 5736 1920 pts/0 T 19:47 0:00 cat /usr/local/mysql/bin/mysqld
mysql 13588 34.0 10.8 2604476 402800 ? Ssl 21:11 0:00 /usr/local/mysql/bin/mysqld
root 13637 0.0 0.0 6416 2304 pts/0 S+ 21:11 0:00 grep --color=auto mysqld
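# pkill matches process names, not PIDs, so the next four commands have no effect (kill -9 <PID> is the PID form)
[root@mysql-node1 ~]# pkill -9 10535
[root@mysql-node1 ~]# pkill -9 10538
[root@mysql-node1 ~]# pkill -9 13588
[root@mysql-node1 ~]# pkill -9 13637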
[root@mysql-node1 ~]# ps -aux | grep mysqld
root 10535 0.0 0.0 5736 1920 pts/0 T 19:47 0:00 cat /usr/local/mysql/bin/mysqld
root 10538 0.0 0.0 5736 1920 pts/0 T 19:47 0:00 cat /usr/local/mysql/bin/mysqld
mysql 13588 0.5 10.9 2670012 407280 ? Ssl 21:11 0:00 /usr/local/mysql/bin/mysqld
root 13647 0.0 0.0 6416 2176 pts/0 S+ 21:13 0:00 grep --color=auto mysqld
[root@mysql-node1 ~]# pkill -9 -f "cat /usr/local/mysql/bin/mysqld"
[1]- Killed cat /usr/local/mysql/bin/mysqld
[2]+ Killed cat /usr/local/mysql/bin/mysqld
[root@mysql-node1 ~]# ps -aux | grep mysqld
mysql 13588 0.5 10.9 2670012 407280 ? Ssl 21:11 0:01 /usr/local/mysql/bin/mysqld
root 13650 0.0 0.0 6416 2304 pts/0 S+ 21:14 0:00 grep --color=auto mysqld
[root@mysql-node1 ~]# killall -9 -u mysql /usr/local/mysql/bin/mysqld
[root@mysql-node1 ~]# ps -aux | grep mysqld
mysql 13677 18.5 10.8 2604532 404544 ? Ssl 21:16 0:00 /usr/local/mysql/bin/mysqld
root 13726 0.0 0.0 6416 2304 pts/0 S+ 21:16 0:00 grep --color=auto mysqld
# No need for all that: the server keeps coming back under a new PID, so just stop the service
[root@mysql-node1 ~]# systemctl stop mysqld.service
[root@mysql-node1 ~]# ps -aux | grep mysqld
root 13836 0.0 0.0 6416 2304 pts/0 S+ 21:19 0:00 grep --color=auto mysqld
[root@mysql-node1 ~]# rm -rf /data/mysql/*
[root@mysql-node1 ~]# ls /data/mysql/
# Initialize the database to create the base MySQL data
[root@mysql-node1 ~]# mysqld --initialize --user=mysql
2026-03-24T11:34:28.727297Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: Ja%zWXa=D8dN
[root@mysql-node1 ~]# systemctl start mysqld.service
[root@mysql-node1 ~]# mysql -pJa%zWXa=D8dN
mysql> exit
# Run the playbook again
[root@mysql-node2 ~]# systemctl start mysqld.service
[root@mysql-node2 ~]# mysql -pZ4Mtw0=KcR5f
mysql>
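In the transcript above, `pkill -9 <number>` changes nothing because `pkill` treats its argument as a process-name pattern, while `kill` takes PIDs. A minimal sketch with a throwaway `sleep` process (a stand-in, not a MySQL process) shows the PID form:

```shell
#!/usr/bin/env bash
# Sketch: terminate a process by PID with kill, then confirm it is gone.
# The sleep process is only a stand-in for illustration.

sleep 300 &
pid=$!

kill -9 "$pid"                      # kill addresses the process by PID
wait "$pid" 2>/dev/null || true     # reap the child; its exit status is non-zero

# kill -0 sends no signal; it only reports whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
    state="running"
else
    state="gone"
fi
echo "$state"
```

For a server managed by a service unit, stopping the unit (as done above with `systemctl stop mysqld.service`) remains the more reliable route, since a killed server can be started again by the service manager.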
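Solution 2
While working through the solution above, I found that the playbook never actually stopped the mysqld service. Stop it through the service manager instead, or simply kill all the processes by force.
pkill -9 mysqld works (force-killing is acceptable in a lab environment). Running /usr/local/mysql/bin/mysqld stop does not stop the process in my environment: the log below shows mysqld treating the command as a new server start and aborting because it is run as root.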
bash
[root@mysql-node1 ~]# /usr/local/mysql/bin/mysqld --user=mysql &
[1] 6199
[root@mysql-node1 ~]# 2026-03-24T11:59:29.455234Z 0 [System] [MY-015015] [Server] MySQL Server - start.
2026-03-24T11:59:29.634950Z 0 [Warning] [MY-011070] [Server] 'Disabling symbolic links using --skip-symbolic-links (or equivalent) is the default. Consider not using this option as it' is deprecated and will be removed in a future release.
2026-03-24T11:59:29.635016Z 0 [System] [MY-010116] [Server] /usr/local/mysql/bin/mysqld (mysqld 8.3.0) starting as process 6199
2026-03-24T11:59:29.641422Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2026-03-24T11:59:29.702045Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2026-03-24T11:59:29.793906Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2026-03-24T11:59:29.793951Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2026-03-24T11:59:29.806440Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /data/mysql/mysqlx.sock
2026-03-24T11:59:29.806575Z 0 [System] [MY-010931] [Server] /usr/local/mysql/bin/mysqld: ready for connections. Version: '8.3.0' socket: '/data/mysql/mysql.sock' port: 3306 Source distribution.
[root@mysql-node1 ~]# mysql -p*vw:qB&gY36T
[2] 6267
-bash: gY36T: command not found
[root@mysql-node1 ~]# mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)
^C
[2]+ Exit 1 mysql -p*vw:qB
[root@mysql-node1 ~]# mysql -p'*vw:qB&gY36T'
mysql>
[root@mysql-node1 ~]# /usr/local/mysql/bin/mysqld stop
2026-03-24T12:04:52.606606Z 0 [System] [MY-015015] [Server] MySQL Server - start.
2026-03-24T12:04:52.795164Z 0 [Warning] [MY-011070] [Server] 'Disabling symbolic links using --skip-symbolic-links (or equivalent) is the default. Consider not using this option as it' is deprecated and will be removed in a future release.
2026-03-24T12:04:52.795229Z 0 [System] [MY-010116] [Server] /usr/local/mysql/bin/mysqld (mysqld 8.3.0) starting as process 6271
2026-03-24T12:04:52.797342Z 0 [ERROR] [MY-010123] [Server] Fatal error: Please read "Security" section of the manual to find out how to run mysqld as root!
2026-03-24T12:04:52.797475Z 0 [ERROR] [MY-010119] [Server] Aborting
2026-03-24T12:04:52.797862Z 0 [System] [MY-010910] [Server] /usr/local/mysql/bin/mysqld: Shutdown complete (mysqld 8.3.0) Source distribution.
2026-03-24T12:04:52.797866Z 0 [System] [MY-015016] [Server] MySQL Server - end.
# Not stopped
[root@mysql-node1 ~]# ps -aux | grep mysqld
mysql 6199 0.4 11.3 2674188 422184 pts/0 Sl 19:59 0:01 /usr/local/mysql/bin/mysqld --user=mysql
root 6277 0.0 0.0 6416 2304 pts/0 S+ 20:05 0:00 grep --color=auto mysqld
# Change the stop step in the playbook to pkill -9 mysqld; I tested that both the service-manager stop and pkill stop the server
[root@mysql-node1 ~]# ps -aux | grep mysqld
mysql 6322 1.0 10.8 2673156 403700 pts/0 Sl 20:07 0:00 /usr/local/mysql/bin/mysqld --user=mysql
root 6372 0.0 0.0 6416 2304 pts/0 S+ 20:08 0:00 grep --color=auto mysqld
[root@mysql-node1 ~]# pkill -9 mysqld
[root@mysql-node1 ~]# ps -aux | grep mysqld
root 6375 0.0 0.0 6416 2304 pts/0 S+ 20:09 0:00 grep --color=auto mysqld
[1]+ Killed /usr/local/mysql/bin/mysqld --user=mysql
# Try the playbook once more, now using pkill
# The service is running on all three nodes
[devops@MySQL-Router ansible]$ ansible-playbook clear_mysql.yml -vv | grep password
172.25.254.30 root@localhost: Mo:FHoIs_4w9
172.25.254.10 root@localhost: xcaR*JtSc11d
172.25.254.20 root@localhost: g!,yC*g(4uu8
[root@mysql-node3 ~]# systemctl start mysqld
[root@mysql-node3 ~]# mysql -p'Mo:FHoIs_4w9'
mysql>
[root@mysql-node2 ~]# systemctl start mysqld
[root@mysql-node2 ~]# mysql -p'g!,yC*g(4uu8'
mysql>
[root@mysql-node1 ~]# systemctl start mysqld
[root@mysql-node1 ~]# mysql -p'xcaR*JtSc11d'
mysql>
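The order that finally worked (stop, wipe, recreate, initialize, start) can be sketched as a shell function. This is an illustration under assumptions, not the author's playbook: `reset_mysql_node`, `run` and `DRY_RUN` are hypothetical names, the paths and unit name are the ones used in this article, and the default is a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: the reset sequence as one function, dry-run by default.
# reset_mysql_node, run and DRY_RUN are illustrative names.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"     # dry run: print the command instead of executing it
    else
        "$@"
    fi
}

reset_mysql_node() {
    local datadir="${1:-/data/mysql}"
    run systemctl stop mysqld.service                  # stop through the service manager first
    run rm -rf "$datadir"                              # remove the old data directory
    run install -d -o mysql -g mysql "$datadir"        # recreate it, owned by mysql
    run /usr/local/mysql/bin/mysqld --initialize --user=mysql
    run systemctl start mysqld.service                 # --initialize does not start the server
}
```

Calling `reset_mysql_node` prints the five commands for review; `DRY_RUN=0 reset_mysql_node` as root would execute them on the node.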
1.2 Manual restore
Delete the data under /data/mysql/, then go through the normal database startup once more.
bash
# Initialize the data on all nodes
[root@mysql-node1 ~]# /etc/init.d/mysqld stop
[root@mysql-node1 ~]# rm -rf /data/mysql/*
[root@mysql-node1 ~]# cat > /etc/my.cnf <<EOF
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
symbolic-links=0
# server-id must differ per host: 10 on node1, 20 on node2, 30 on node3
server-id=10
log-bin=mysql-bin
gtid_mode=ON
enforce-gtid-consistency=ON
default_authentication_plugin=mysql_native_password
log_slave_updates=ON
binlog_format=ROW
binlog_checksum=NONE
disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"
EOF
[root@mysql-node1 ~]# mysqld --user=mysql --initialize
..................
# Start the service
# Secure the installation --> mysql_secure_installation
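Since only server-id differs between the nodes, the option file can be rendered from a single template. The following is a minimal sketch; `render_my_cnf` is a hypothetical helper, and every option value is copied from the listing above.

```shell
#!/usr/bin/env bash
# Sketch: print a my.cnf for one node, parameterized by server-id.
# render_my_cnf is an illustrative helper name.

render_my_cnf() {
    local server_id="$1"
    # Unquoted heredoc: ${server_id} is expanded, everything else is literal.
    cat <<EOF
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
symbolic-links=0
server-id=${server_id}
log-bin=mysql-bin
gtid_mode=ON
enforce-gtid-consistency=ON
default_authentication_plugin=mysql_native_password
log_slave_updates=ON
binlog_format=ROW
binlog_checksum=NONE
disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"
EOF
}
```

On node1 this would be used as `render_my_cnf 10 > /etc/my.cnf`, with 20 and 30 on the other two nodes.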

