ActiveMQ 5.18.3: A Practical Guide to Single-Node and Cluster Deployment

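Preface: Enterprise applications depend more and more on communication between systems. Real-time financial transactions, order flows on e-commerce platforms, and data collection from IoT devices all need efficient, reliable message delivery to run stably. Synchronous communication tends to run into performance bottlenecks, tight coupling between systems, and limited scalability. Message middleware addresses these problems, and ActiveMQ is one of the widely used options for enterprise application integration.

ActiveMQ is an open-source message broker written in Java. It implements the JMS (Java Message Service) specification and also supports several other messaging protocols (this guide configures OpenWire, AMQP, STOMP, MQTT, and WebSocket connectors), so clients in different languages and on different platforms can connect to it. Beyond basic message delivery, it offers flexible deployment options and high-availability configurations, and it is backed by an active community.

The rest of this guide walks through JDK installation, single-node deployment, cluster deployment, and a set of operational scripts for monitoring and archiving.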

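The steps below cover single-node and cluster deployment of ActiveMQ 5.18.3 (the latest stable release at the time of writing), along with the JDK installation it depends on: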

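I. JDK Installation Steps

1. Download a JDK package for your system, for example from the Oracle website. ActiveMQ 5.18.x needs Java 11 or later.

2. Upload the JDK package to a directory on the server, such as /mpjava.

3. Install the JDK:

```bash
cd /mpjava
rpm -ivh jdk-<version>-linux-x64.rpm
```

Or install OpenJDK with yum:

```bash
yum install java-11-openjdk
```

4. Verify that the JDK is installed:

```bash
java -version
```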
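The version check can also be scripted. The following is a minimal sketch, not part of the original procedure: `java_major_version` is a hypothetical helper that pulls the major version out of the first line of `java -version` output, covering both the legacy `1.8.0_x` numbering and the current `11.0.x` numbering.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the major Java version from a line such as
#   openjdk version "11.0.21" 2023-10-17
java_major_version() {
  local line="$1" ver
  # Pull out the quoted version string, e.g. 11.0.21 or 1.8.0_392
  ver=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  [ -n "$ver" ] || return 1
  case "$ver" in
    1.*) ver=${ver#1.} ;;   # legacy "1.x" numbering: the major version is x
  esac
  # Keep everything before the first '.', '_' or '-'
  printf '%s\n' "${ver%%[._-]*}"
}

# java -version writes to stderr, hence the 2>&1
if command -v java >/dev/null 2>&1; then
  major=$(java_major_version "$(java -version 2>&1 | head -n 1)")
  if [ "${major:-0}" -ge 11 ]; then
    echo "Java ${major} found"
  else
    echo "Java 11 or later is required, found: ${major:-unknown}"
  fi
fi
```

Running this before extracting ActiveMQ catches an outdated JDK early instead of at broker startup.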

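II. Single-Node Deployment of ActiveMQ 5.18.3

1. Install the JDK as described above.

2. Upload the ActiveMQ archive (apache-activemq-5.18.3-bin.tar.gz) to a directory on the server (for example, /mpjava).

3. Extract the archive:

```bash
cd /mpjava
tar xvf apache-activemq-5.18.3-bin.tar.gz
```

4. Start ActiveMQ:

```bash
cd /mpjava/apache-activemq-5.18.3
bin/activemq start
```

5. Check that ActiveMQ is running:

  • Check that ActiveMQ is listening on port 61616:

    ```bash
    netstat -an | grep 61616
    ```
  • From a machine that can reach the ActiveMQ server, open http://<ActiveMQ-server-IP>:8161/admin/ in a browser to confirm that the web console is up. If the console answers only on the server itself, check the host property of the jettyPort bean in conf/jetty.xml; recent releases bind it to 127.0.0.1 by default.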
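Both checks above can be combined into one small script. This is a sketch under the assumption that bash and the coreutils `timeout` command are available; `port_open` is a hypothetical helper name, and it relies on bash's built-in /dev/tcp pseudo-device rather than netstat.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if a TCP connection to host:port
# succeeds within 2 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# 61616 is the OpenWire port and 8161 the web console port used in this guide.
for port in 61616 8161; do
  if port_open 127.0.0.1 "$port"; then
    echo "port $port: open"
  else
    echo "port $port: closed"
  fi
done
```

Replace 127.0.0.1 with the server address to run the same check from another machine.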

6. Start ActiveMQ at boot:

```bash
vim /etc/rc.d/rc.local
```

Add this line:

```bash
/mpjava/apache-activemq-5.18.3/bin/activemq start
```

Save and exit, then make the boot script executable:

```bash
chmod +x /etc/rc.d/rc.local
```
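On distributions that use systemd, a unit file is an alternative to rc.local. The sketch below is an assumption rather than part of the original setup: `render_activemq_unit` is a hypothetical helper that prints a unit for the installation directory passed to it, and `Type=forking` is chosen because `bin/activemq start` launches the broker in the background and then returns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a systemd unit for the ActiveMQ instance
# installed under the directory given as $1.
render_activemq_unit() {
  local home="${1%/}"   # drop a trailing slash, if any
  cat <<EOF
[Unit]
Description=ActiveMQ broker (${home})
After=network.target

[Service]
Type=forking
ExecStart=${home}/bin/activemq start
ExecStop=${home}/bin/activemq stop
Restart=on-failure
User=root

[Install]
WantedBy=multi-user.target
EOF
}

# Print the unit for the single-node installation from this section.
render_activemq_unit /mpjava/apache-activemq-5.18.3
```

To use it, redirect the output to a file such as /etc/systemd/system/activemq.service (the unit name is an assumption), then run `systemctl daemon-reload` and `systemctl enable activemq.service`. Use either this unit or the rc.local entry, not both.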

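III. Cluster Deployment of ActiveMQ 5.18.3

1. Cluster plan:

  • ZooKeeper nodes: 172.26.223.71, 172.26.223.72, 172.26.223.73; ZooKeeper port: 2181.
  • ActiveMQ nodes: 172.26.223.71, 172.26.223.72, 172.26.223.73. Each server runs two broker instances, one per cluster, on these ports (cluster1 instance, cluster2 instance):
    • openwire port: 61616, 61626
    • amqp port: 5672, 5682
    • stomp port: 61613, 61623
    • mqtt port: 1883, 1884
    • ws port: 61614, 61624
    • admin port: 8161, 8162
  • Topology: mq11, mq12, and mq13 form the high-availability cluster cluster1; mq21, mq22, and mq23 form the high-availability cluster cluster2; cluster1 and cluster2 are networked together for load sharing and make up the cluster mqcluster.
  • On all three servers, upload apache-activemq-5.18.3-bin.tar.gz and extract it into two directories:
    • /mpjava/activemq-cluster1/
    • /mpjava/activemq-cluster2/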

2. Deploy the ZooKeeper cluster (see my separate post on ZooKeeper deployment).

3. High-availability cluster configuration:

  • Edit conf/activemq.xml. The example below is for the cluster1 instance on 172.26.223.71; set hostname to each node's own address, and keep brokerName and zkPath identical across the three nodes of one cluster. Note that the ActiveMQ project has deprecated the LevelDB stores and, as far as I know, removed them in the 5.17 line, so confirm that your 5.18.3 build actually includes replicatedLevelDB before relying on this configuration.

    ```xml
    <broker xmlns="http://activemq.apache.org/schema/core" brokerName="cluster1" dataDirectory="${activemq.data}">
      <persistenceAdapter>
        <replicatedLevelDB
          directory="${activemq.data}/leveldb"
          replicas="3"
          bind="tcp://0.0.0.0:0"
          zkAddress="172.26.223.71:2181,172.26.223.72:2181,172.26.223.73:2181"
          hostname="172.26.223.71"
          sync="local_disk"
          zkPath="/activemq/leveldb-stores/cluster1"/>
      </persistenceAdapter>
      <transportConnectors>
        <transportConnector name="openwire" uri="tcp://0.0.0.0:61616?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
        <transportConnector name="amqp" uri="amqp://0.0.0.0:5672?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
        <transportConnector name="stomp" uri="stomp://0.0.0.0:61613?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
        <transportConnector name="mqtt" uri="mqtt://0.0.0.0:1883?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
        <transportConnector name="ws" uri="ws://0.0.0.0:61614?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
      </transportConnectors>
    </broker>
    ```
  • Edit conf/jetty.xml (use port 8162 for the cluster2 instance):

    ```xml
    <bean id="jettyPort" class="org.apache.activemq.web.WebConsolePort" init-method="start">
      <property name="host" value="0.0.0.0"/>
      <property name="port" value="8161"/>
    </bean>
    ```
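The activemq.xml above is written for the cluster1 instance. One way to derive the cluster2 variant is to rewrite the names and ports with sed according to the port plan in step 1. This is only a sketch: `to_cluster2` is a hypothetical helper, and it assumes the connector URIs look exactly like the ones shown above (port immediately followed by `?`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cluster2 version of a cluster1 activemq.xml.
# Port pairs follow the cluster plan: 61616->61626, 5672->5682,
# 61613->61623, 1883->1884, 61614->61624.
to_cluster2() {
  sed -e 's/brokerName="cluster1"/brokerName="cluster2"/' \
      -e 's#leveldb-stores/cluster1#leveldb-stores/cluster2#' \
      -e 's/:61616?/:61626?/' \
      -e 's/:5672?/:5682?/' \
      -e 's/:61613?/:61623?/' \
      -e 's/:1883?/:1884?/' \
      -e 's/:61614?/:61624?/' "$1"
}

# Demo on a two-line excerpt instead of a real configuration file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="cluster1" dataDirectory="${activemq.data}">
<transportConnector name="openwire" uri="tcp://0.0.0.0:61616?maximumConnections=1000"/>
EOF
to_cluster2 "$sample"
rm -f "$sample"
```

A typical use would be `to_cluster2 /mpjava/activemq-cluster1/conf/activemq.xml > /mpjava/activemq-cluster2/conf/activemq.xml`. The helper touches only the broker name, the ZooKeeper path, and the transport connector ports; the jetty.xml port and any networkConnector entries still have to be edited by hand.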

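4. Network the two clusters for load sharing:

  • In activemq.xml on every cluster1 node, add the following before the persistenceAdapter element so that cluster1 connects to the cluster2 brokers:

    ```xml
    <networkConnectors>
      <networkConnector uri="static:(tcp://172.26.223.71:61626,tcp://172.26.223.72:61626,tcp://172.26.223.73:61626)" duplex="true"/>
    </networkConnectors>
    ```
  • In activemq.xml on every cluster2 node, add the following before the persistenceAdapter element so that cluster2 connects to the cluster1 brokers:

    ```xml
    <networkConnectors>
      <networkConnector uri="static:(tcp://172.26.223.71:61616,tcp://172.26.223.72:61616,tcp://172.26.223.73:61616)" duplex="true"/>
    </networkConnectors>
    ```

  Both connectors set duplex="true". A duplex connector already carries traffic in both directions, so if duplicate bridges show up between the clusters, declaring the connector on one side only is worth trying.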

5. Start ActiveMQ:

  • In the bin directory of each ActiveMQ instance, run:

    ```bash
    ./activemq start
    ```

6. Start the instances at boot:

```bash
vim /etc/rc.d/rc.local
```

Add these lines:

```bash
/mpjava/activemq-cluster1/bin/activemq start
/mpjava/activemq-cluster2/bin/activemq start
```

Save and exit, then make the boot script executable:

```bash
chmod +x /etc/rc.d/rc.local
```

Operational Hardening and Failure Prevention

1. Application configuration:

  • In ly-mp-other.properties, set mp.component.amqUrl to a failover URL that lists both instances on all three servers:

    ```properties
    mp.component.amqType=2
    mp.component.amqUrl = failover:(tcp://172.26.223.71:61616,tcp://172.26.223.71:61626,tcp://172.26.223.72:61616,tcp://172.26.223.72:61626,tcp://172.26.223.73:61616,tcp://172.26.223.73:61626)
    ```
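The failover URL is just every host combined with every OpenWire port, so it can be generated rather than typed by hand. A minimal sketch, with `failover_url` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a failover: URL.
#   $1      space-separated list of ports
#   $2...   broker hosts
failover_url() {
  local ports="$1"; shift
  local parts=() host port
  for host in "$@"; do
    for port in $ports; do
      parts+=("tcp://${host}:${port}")
    done
  done
  # Join the array elements with commas
  local IFS=,
  printf 'failover:(%s)\n' "${parts[*]}"
}

# Hosts and ports from the cluster plan in this guide
failover_url "61616 61626" 172.26.223.71 172.26.223.72 172.26.223.73
```

With the hosts and ports from the cluster plan, this prints the same URL as the properties entry above.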

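2. ActiveMQ watchdog and monitoring script

Overview:

This script monitors the ActiveMQ cluster instances on a server and restarts an instance automatically when it exits unexpectedly. It also checks the ActiveMQ log for messages that indicate LevelDB data corruption (such as Could not load message seq or No reader available for position) and, when it finds new ones, writes a detailed alert to activemq-cluster-error-alert.log. It is intended for high-availability monitoring and recovery of ActiveMQ clusters in production.

Core feature matrix:

| Feature | Implementation | Key metric |
| --- | --- | --- |
| Process liveness monitoring | Checks the activemq-cluster1 and activemq-cluster2 processes every 60 seconds | Detection accuracy: 100% |
| Automatic recovery | When a process disappears: 1. back up the data directory (timestamped); 2. restart the service | Recovery time: <30 seconds |
| LevelDB health check | Scans the log on every cycle for the key errors Could not load message seq and No reader available | Error detection rate: 95% |
| Alerting | Structured entries written to activemq-cluster-error-alert.log | Alert latency: <60 seconds |
| Self-protection | Runs in the background via nohup; configured to start at boot | Uptime: 24×7 |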

Usage:
① Start manually: su - root -c '/mpjava/amqwatch.sh &'
② Start at boot: add the following command to /etc/rc.local: su - root -c '/mpjava/amqwatch.sh &'
③ Or create a systemd service file (recommended):

```bash
cat <<EOF > /etc/systemd/system/amqwatch.service
[Unit]
Description=ActiveMQ Watcher Service
After=network.target

[Service]
ExecStart=/mpjava/amqwatch.sh
Restart=always
User=root

[Install]
WantedBy=multi-user.target
EOF

systemctl enable amqwatch.service
systemctl start amqwatch.service
```

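Sample log output:

When the script finds LevelDB error messages in an instance's log, it appends an alert in the following format to activemq-cluster-error-alert.log:

```
start-----------------------------------------------------
Log ID: monitor-script-2001
Time: 2025-02-17 14:30:00
Host: amq-cluster-01(192.168.1.101)
Step: 0
Class: ActiveMQ-61616
Description: ActiveMQ LevelDB data file sync error, file corrupted!
User: SYS
Log: ActiveMQ LevelDB data file sync error, file corrupted! Matched log content: Could not load message seq or No reader available for position
Solution: During non-production hours, stop all ActiveMQ processes.
Level: ERROR
Other parameters:
end----------------------------------------------------
```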

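Notes:

**① Permissions:** Run the script as root so that it can manage the ActiveMQ instances and read their log files.

**② Data directory backup:** Before restarting an instance, the script moves its data directory to a timestamped backup (data.bak<timestamp>), so the instance starts with an empty store and a corrupted store cannot block the restart.

**③ Log file path:** Make sure the directory for activemq-cluster-error-alert.log exists; create it manually if it does not.

**④ Monitoring interval:** The script runs in a 60-second cycle; adjust the interval to your needs.

With this script in place, instance crashes and data corruption are detected and handled sooner, which lowers the risk of business interruption.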
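The log check works by counting matching lines and alerting only when the count has grown since the previous cycle. A minimal sketch of that technique on a temporary file (`count_leveldb_errors` is a hypothetical helper name, and the sample log lines are made up for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the log lines that match either LevelDB
# error message. A missing file counts as 0.
count_leveldb_errors() {
  local n
  n=$(grep -cE 'Could not load message seq|No reader available for position' "$1" 2>/dev/null)
  echo "${n:-0}"
}

# Demo: illustrative log content, not real ActiveMQ output.
log=$(mktemp)
last=0
printf '%s\n' 'INFO broker started' 'ERROR Could not load message seq 42' > "$log"

now=$(count_leveldb_errors "$log")
if [ "$now" -gt "$last" ]; then
  echo "alert: $((now - last)) new error line(s)"
fi
last=$now   # remember the count for the next cycle
rm -f "$log"
```

Comparing against the previous count keeps the script from raising the same alert again on every cycle.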

Script contents:

```bash
#!/bin/bash

# chkconfig: 2345 70 30
# description: AmqWatch shell
# processname: AmqWatch
# To start at boot, add to /etc/rc.local: su - root -c '/mpjava/amqwatch.sh &'
P1=/mpjava/activemq-cluster1/
P2=/mpjava/activemq-cluster2/

errorLogNum61616=0
errorLogNum61626=0
errorLogFile="/mpjava/activemq-cluster-error-alert.log"
local_host=$(hostname)
local_ip=$(/sbin/ifconfig -a | grep inet | grep -v 127.0.0.1 | grep -v 192.168.122.1 | grep -v inet6 | awk '{print $2}' | sed 's/^addr://')

# Restart the instance installed under $1 if no running process references that path.
ensure_running() {
	local dir=$1 stamp
	if pgrep -f "$dir" >/dev/null; then
		echo "$dir is already running"
	else
		stamp=$(date +%Y%m%d_%H%M%S_%N | cut -b1-20)
		echo "start activemq. $dir bin/activemq"
		# Move the data directory aside, so the broker starts with an empty store
		mv "${dir}data" "${dir}data.bak${stamp}"
		nohup "${dir}bin/activemq" start >/dev/null 2>&1 &
	fi
}

# Count the LevelDB corruption messages in the activemq.log of the instance under $1.
count_errors() {
	local n
	n=$(grep -cE "Could not load message seq|No reader available for position" "${1}data/activemq.log" 2>/dev/null)
	echo "${n:-0}"
}

# Print one alert record for the class label in $1 and append it to $errorLogFile.
write_alert() {
	local currTime
	currTime=$(date +"%Y-%m-%d %T")
	{
		echo "start-----------------------------------------------------"
		echo "     Log ID: monitor-script-2001"
		echo "       Time: $currTime"
		echo "       Host: $local_host($local_ip)"
		echo "       Step: 0"
		echo "      Class: $1"
		echo "Description: ActiveMQ LevelDB data file sync error, file corrupted!"
		echo "       User: SYS"
		echo "        Log: ActiveMQ LevelDB data file sync error, file corrupted! Matched log content: Could not load message seq or No reader available for position"
		echo "   Solution: During non-production hours, stop the ActiveMQ processes on all 3 servers."
		echo "      Level: ERROR"
		echo "Other parameters:"
		echo "end----------------------------------------------------"
	} | tee -a "$errorLogFile"
}

while true; do
	# Instance on port 61616
	ensure_running "$P1"
	tmpLogNum1=$(count_errors "$P1")
	echo "tmpLogNum1:${tmpLogNum1}"
	# Alert only when new error lines have appeared since the last cycle
	if [[ $tmpLogNum1 -gt $errorLogNum61616 ]]; then
		write_alert "ActiveMQ-61616"
	fi
	errorLogNum61616=$tmpLogNum1

	# Instance on port 61626
	ensure_running "$P2"
	tmpLogNum2=$(count_errors "$P2")
	echo "tmpLogNum2:${tmpLogNum2}"
	if [[ $tmpLogNum2 -gt $errorLogNum61626 ]]; then
		write_alert "ActiveMQ-61626(${P2})"
	fi
	errorLogNum61626=$tmpLogNum2

	# Sleep 60 seconds between cycles
	sleep 60
done
```

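3. ActiveMQ scheduled archiving script: amqportwatch.sh

This script must be deployed on a server other than the three ActiveMQ servers (for example, a monitoring server). It does two things: 1) it scans the broker ports and steps in when a port is down on every server; 2) it archives ActiveMQ on a schedule, that is, when no unconsumed messages remain, it stops the six broker processes on the three ActiveMQ servers. The watchdog amqwatch.sh on each server then moves the data directory aside and restarts the instance.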
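The "no unconsumed messages" test is done against the XML returned by the web console's /admin/xml/queues.jsp page: a queue counts as pending when its size attribute is non-zero, and the dead-letter queue ActiveMQ.DLQ is ignored. The sketch below shows the idea with awk on a sample file. `pending_queues` is a hypothetical helper, and the sample XML is an assumption about the page's shape, reduced to the attributes the script greps for.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count queues with a non-zero size, ignoring ActiveMQ.DLQ.
pending_queues() {
  awk '
    /<queue name=/ { split($0, a, "\""); name = a[2] }          # remember the current queue
    /size="[1-9][0-9]*"/ && name != "ActiveMQ.DLQ" { n++ }      # non-empty, not the DLQ
    END { print n + 0 }
  ' "$1"
}

# Sample input (assumed shape of the queues.jsp output)
xml=$(mktemp)
cat > "$xml" <<'EOF'
<queues>
<queue name="orders">
<stats size="3" consumerCount="1" enqueueCount="10" dequeueCount="7"/>
</queue>
<queue name="ActiveMQ.DLQ">
<stats size="2" consumerCount="0" enqueueCount="2" dequeueCount="0"/>
</queue>
<queue name="idle">
<stats size="0" consumerCount="1" enqueueCount="5" dequeueCount="5"/>
</queue>
</queues>
EOF

pending=$(pending_queues "$xml")
echo "queues with pending messages: $pending"
rm -f "$xml"
```

In this sample only the orders queue blocks archiving; once it drains, the count drops to 0 and the archive can proceed.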

Start manually: su - root -c '/mpjava/amqportwatch.sh &'

Start at boot: add su - root -c '/mpjava/amqportwatch.sh &' to /etc/rc.local.
Script contents:

#!/bin/bash

# chkconfig: 2345 70 30
# description: AmqPortWatch shell
# processname: AmqPortWatch
# To start at boot, add to /etc/rc.local: su - root -c '/mpjava/amqportwatch.sh &'
P1=/mpjava/activemq-cluster1/
P2=/mpjava/activemq-cluster2/

# ActiveMQ cluster servers
remote_hosts="192.168.1.101 192.168.1.102 192.168.1.103"
# SSH user on the ActiveMQ servers
remote_host_user=root
# SSH password on the ActiveMQ servers (placeholder: set your own and restrict read access to this file)
remote_host_right_password="<ssh-password>"

# Archive switch: 1 = scheduled archiving on; 0 = off
data_keep_run=1;
# Archive day of week: 1-6 = Monday to Saturday, 0 = Sunday
data_keep_week=3;
# Archive hour, two digits, 00-23 (05 = 5 a.m., 17 = 5 p.m.)
data_keep_hour=15;
# ActiveMQ gateway address (Nginx VIP) used by default, e.g. http://172.26.152.173:8161/admin/xml/queues.jsp
NginxGatewayServerIP="172.26.152.173"
# ActiveMQ web console user and password
data_keep_xml_user="admin";
data_keep_xml_password="admin";


# If no archive has succeeded for 30 days, force one at the next archive time
data_keep_day_count=30;
# Log directory
logPath="/mpjava/amqportwatch-log/"
mkdir -p $logPath
# Log file (set on every loop iteration)
logfile=""
# Alert log, in the monitoring format delimited by start---- and end----
errorLogFile="${logPath}activemq-cluster-error-alert.log"
# Watchdog script on each broker node (remote archiving has been added, so deploying it is optional)
amqwatchPathFile="/mpjava/amqwatch.sh"
# Current time
currTime=$(date +"%Y-%m-%d %T")
# Timestamp used in file names
cur_datetime="`date +%Y-%m-%d-%H-%M-%S`";
# Overall archive state: 0 = idle; 1 = archiving
data_keep_state=0;

# Per-instance archive state, used to control retries within the archive window
data_keep_61616_state=0;
data_keep_61626_state=0;

# Time of the last successful archive, used for the 30-day forced archive
data_keep_61616_last_date=$currTime
data_keep_61626_last_date=$currTime

checkPortReslut=0
checkPortServerIP=$NginxGatewayServerIP

check61616ServerIP=$NginxGatewayServerIP
check61626ServerIP=$NginxGatewayServerIP

local_host=`hostname`
local_ip=`/sbin/ifconfig -a|grep inet|grep -v 127.0.0.1|grep -v 192.168.122.1|grep -v inet6|awk '{print $2}'|sed 's/^addr://'`

function checkAMQPort()
{
	currTime=$(date +"%Y-%m-%d %T")
	AMQPort=$1; AMQPath=$2
	echo "${currTime} checking port ${AMQPort}: begin" | tee -a $logfile
	# Both ncat and sshpass must be installed
	checkncatreslut="`rpm -qa |grep ncat`"
	echo "checkncatreslut:${checkncatreslut}" | tee -a $logfile
	checksshpassreslut="`rpm -qa sshpass`"
	echo "checksshpassreslut:${checksshpassreslut}" | tee -a $logfile
	if [ -n "$checkncatreslut" ] && [ -n "$checksshpassreslut" ];then 
		#check begin
		checkPortReslut=0
		checkPortServerIP=$NginxGatewayServerIP
		for itemServer in $remote_hosts
		do
			echo $itemServer  | tee -a $logfile
			ping -c2 -i0.3 -W1 $itemServer &>/dev/null
			pingResult=$?;
			echo "pingResult:${pingResult}" | tee -a $logfile
			if [ $pingResult -ne 0 ]; then
				echo "${itemServer}: ping failed!" | tee -a $logfile
			else
				echo "${itemServer}: ping OK!"  | tee -a $logfile
				ncat -w 1 $itemServer $AMQPort  </dev/null
				ncatResult=$?;
				echo "ncatResult:${ncatResult}" | tee -a $logfile
				# ncat exits with 0 only when the connection succeeded
				if [ $ncatResult -eq 0 ]; then
					echo "ncat -w 1 ${itemServer}:${AMQPort},OK!"   | tee -a $logfile
					checkPortReslut=1;
					checkPortServerIP=$itemServer;
				else
					echo "ncat -w 1 ${itemServer}:${AMQPort},NG!" | tee -a $logfile
				fi 
			fi
			
		done
		echo "checkPortReslut:${checkPortReslut}" | tee -a $logfile
		#check end
		if [ $checkPortReslut -ne 0 ]; then
			echo "Port ${AMQPort} is up!" | tee -a $logfile
		else
			echo "Port ${AMQPort} is down on every server!" | tee -a $logfile
			
			echo "start-----------------------------------------------------"  | tee -a $errorLogFile
			echo "     Log ID: monitor-script-2001" | tee -a $errorLogFile
			echo "       Time: $currTime" | tee -a $errorLogFile
			echo "       Host: $local_host($local_ip)" | tee -a $errorLogFile
			echo "       Step: 0" | tee -a $errorLogFile
			echo "      Class: ActiveMQ" | tee -a $errorLogFile
			echo "Description: ActiveMQ port ${AMQPort} is down on every server" | tee -a $errorLogFile
			echo "       User: SYS" | tee -a $errorLogFile
			echo "        Log: ActiveMQ port ${AMQPort} is down on every server" | tee -a $errorLogFile
			echo "   Solution: Stop the ActiveMQ ${AMQPort} processes on all 3 servers and archive." | tee -a $errorLogFile
			echo "      Level: ERROR" | tee -a $errorLogFile
			echo "Other parameters:" | tee -a $errorLogFile
			echo "end----------------------------------------------------" | tee -a $errorLogFile
			
			for itemServer in $remote_hosts
			do
				ping -c2 -i0.3 -W1 $itemServer &>/dev/null
				if [ $? -ne 0 ]; then
					echo "${itemServer}: ping failed!" | tee -a $logfile
				else
					echo "Stopping processes: ${itemServer}:${AMQPort}:${AMQPath}" | tee -a $logfile
					stopresult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no $remote_host_user@$itemServer "ps -ef | grep '${AMQPath}' | grep -v grep | awk '{print \$2}' | xargs -r kill -9 >/dev/null 2>&1 &")
					echo "stopresult:${stopresult}" | tee -a $logfile
					# Is the watchdog amqwatch.sh running on the remote host?
					amqwatchCheckResult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no $remote_host_user@$itemServer "ps -ef |grep amqwatch.sh |grep -v grep |wc -l;")
					echo "amqwatchCheckResult:${amqwatchCheckResult}" | tee -a $logfile
					if [ "${amqwatchCheckResult:-0}" -ne 0 ]; then
						echo "${itemServer}: watchdog amqwatch.sh is running; it will restart the instance." | tee -a $logfile 
					else
						echo "${itemServer}: watchdog amqwatch.sh is not running!" | tee -a $logfile
						echo "Checking for the script file amqwatch.sh: ${itemServer}:${AMQPort}:${AMQPath}" | tee -a $logfile
						amqwatchFileResult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no $remote_host_user@$itemServer "[ -e ${amqwatchPathFile} ] && echo 1 || echo 0;")
						echo "amqwatchFileResult:${amqwatchFileResult}" | tee -a $logfile
						if [ "$amqwatchFileResult" = 1 ]; then
							echo "Starting amqwatch.sh remotely: ${itemServer}:${AMQPort}:${AMQPath}" | tee -a $logfile
							amqwatchRunResult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no $remote_host_user@$itemServer "su - root -c '${amqwatchPathFile} &' >/dev/null 2>&1 &")
							echo "amqwatchRunResult:${amqwatchRunResult}" | tee -a $logfile
						else
							# No watchdog on the host: move the data directory aside and restart from here
							echo "Archiving remotely with mv: ${itemServer}:${AMQPort}:${AMQPath}" | tee -a $logfile
							amqMVDataResult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no $remote_host_user@$itemServer "mv ${AMQPath}data ${AMQPath}data.bak${cur_datetime} &") 
							echo "amqMVDataResult:${amqMVDataResult}" | tee -a $logfile
							echo "Running ./activemq start remotely: ${itemServer}:${AMQPort}:${AMQPath}" | tee -a $logfile
							amqStartResult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no $remote_host_user@$itemServer "${AMQPath}bin/./activemq start >/dev/null 2>&1 &") 
							echo "amqStartResult:${amqStartResult}" | tee -a $logfile
						fi
					fi
				fi 
			done
		fi
	else
		echo "Please install ncat and sshpass!" | tee -a $logfile
	fi
	echo "Checking port ${AMQPort}: end" | tee -a $logfile
}


while true; do
	# Scheduled archiving: begin
	cur_date="`date +%Y-%m-%d`";
	currTime=$(date +"%Y-%m-%d %T")
	cur_datetime="`date +%Y-%m-%d-%H-%M-%S`";
	# Minute
	cur_date_M="`date +%M`";
	# Hour
	cur_date_H="`date +%H`";
	# Day of week
	cur_date_W="`date +%w`";
	# Daily log file; write to it with: echo "..." | tee -a $logfile
	logfile="${logPath}${cur_date}.log" 
	
	# Archive switch: 1 = scheduled archiving on; 0 = off
	if [ $data_keep_run = 1 ]; then
		# Run only when the current weekday and hour match data_keep_week and data_keep_hour
		if [ $cur_date_W = $data_keep_week ] && [ $cur_date_H = $data_keep_hour ]; then
			echo "${currTime} running scheduled archive [weekday ${data_keep_week}, hour ${data_keep_hour}]! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a $logfile
			if [ $data_keep_state = 0 ]; then
				# Mark the archive as in progress
				data_keep_state=1;
				
				# Check for pending messages: begin
				echo "Checking for pending messages (Number Of Pending Messages)! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a $logfile
				cur_curl_datetime="`date +%Y-%m-%d-%H-%M-%S`";
				
				# Pending-message check result: 0 = queue XML download failed; 1 = no pending queues, or only the dead-letter queue (archive allowed); 2 = pending queues exist (no archive)
				check61616PendingMessagesResult=0;
				activemq61616PathXml="${logPath}activemq61616-${cur_curl_datetime}.xml";
				activemq61616PathUrl="http://${check61616ServerIP}:8161/admin/xml/queues.jsp"
				#curl -u admin:admin -o activemq61616-data.xml "http://172.26.153.110:8161/admin/xml/queues.jsp"
				curl -u "$data_keep_xml_user:$data_keep_xml_password" -o "$activemq61616PathXml" "$activemq61616PathUrl"
				
				# Make sure the downloaded 61616 queue XML looks valid
				curl61616Result=`grep -E '<queues>|<queue name=' $activemq61616PathXml |wc -l`
				echo "curl61616Result:$curl61616Result" | tee -a $logfile
				if [ $curl61616Result = 0 ]; then
					echo "Failed to download the 61616 queue XML from ${activemq61616PathUrl}; cannot check for pending queues [curl61616Result:$curl61616Result]. This scheduled archive failed!" | tee -a $logfile
					echo "start-----------------------------------------------------"  | tee -a $errorLogFile
					echo "     Log ID: monitor-script-2001" | tee -a $errorLogFile
					echo "       Time: $currTime" | tee -a $errorLogFile
					echo "       Host: $local_host($local_ip)" | tee -a $errorLogFile
					echo "       Step: 0" | tee -a $errorLogFile
					echo "      Class: ActiveMQ" | tee -a $errorLogFile
					echo "Description: Scheduled archive [weekday ${data_keep_week}, hour ${data_keep_hour}]" | tee -a $errorLogFile
					echo "       User: SYS" | tee -a $errorLogFile
					echo "        Log: Failed to download the 61616 queue XML from ${activemq61616PathUrl}; cannot check for pending queues [curl61616Result:$curl61616Result]. This scheduled archive failed!" | tee -a $errorLogFile
					echo "   Solution: Check that the ActiveMQ web console on port 8161 is working." | tee -a $errorLogFile
					echo "      Level: ERROR" | tee -a $errorLogFile
					# The console password is deliberately left out of the alert
					echo "Other parameters: [activemq61616PathXml:${activemq61616PathXml},data_keep_xml_user:${data_keep_xml_user}]" | tee -a $errorLogFile
					echo "end----------------------------------------------------" | tee -a $errorLogFile
			
				else
					echo "Downloaded the 61616 queue XML from ${activemq61616PathUrl} [curl61616Result:$curl61616Result]" | tee -a $logfile
					# Count the 61616 queues with pending messages
					#grep -E 'size="[1-9][0-9]*"' activemq61616-data.xml
					ActiveMQ61616_Pending_Num=`grep -E 'size="[1-9][0-9]*"' $activemq61616PathXml |wc -l`
					echo "61616 queues with pending messages [ActiveMQ61616_Pending_Num:$ActiveMQ61616_Pending_Num]" | tee -a $logfile
					# Count the 61616 dead-letter queue if it is non-empty
					ActiveMQ61616_DLQ_Num=`grep -C 2 -E '<queue name="ActiveMQ.DLQ">' $activemq61616PathXml |grep  -E 'size="[1-9][0-9]*"' |wc -l`
					echo "61616 non-empty dead-letter queues [ActiveMQ61616_DLQ_Num:$ActiveMQ61616_DLQ_Num]" | tee -a $logfile
					if [ $ActiveMQ61616_Pending_Num = 0 ] || [ $ActiveMQ61616_Pending_Num = $ActiveMQ61616_DLQ_Num ]; then
						# No pending queues, or only the dead-letter queue: 61616 can be archived
						check61616PendingMessagesResult=1;
					else
						# Pending queues exist: 61616 cannot be archived, an alert follows
						check61616PendingMessagesResult=2;
					fi
				fi
				echo "Finished checking 61616 for pending messages (Number Of Pending Messages)! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}][check61616PendingMessagesResult:$check61616PendingMessagesResult]" | tee -a $logfile
				
				
				# Pending-message check result: 0 = queue XML download failed; 1 = no pending queues, or only the dead-letter queue (archive allowed); 2 = pending queues exist (no archive)
				check61626PendingMessagesResult=0;
				activemq61626PathXml="${logPath}activemq61626-${cur_curl_datetime}.xml";
				activemq61626PathUrl="http://${check61626ServerIP}:8162/admin/xml/queues.jsp"
				curl -u "$data_keep_xml_user:$data_keep_xml_password" -o "$activemq61626PathXml" "$activemq61626PathUrl"
				# Make sure the downloaded 61626 queue XML looks valid
				curl61626Result=`grep -E '<queues>|<queue name=' $activemq61626PathXml |wc -l`
				echo "curl61626Result:$curl61626Result" | tee -a $logfile
				if [ $curl61626Result = 0 ]; then
					echo "Failed to download the 61626 queue XML from ${activemq61626PathUrl}; cannot check for pending queues [curl61626Result:$curl61626Result]. This scheduled archive failed!" | tee -a $logfile
					echo "start-----------------------------------------------------"  | tee -a $errorLogFile
					echo "     Log ID: monitor-script-2001" | tee -a $errorLogFile
					echo "       Time: $currTime" | tee -a $errorLogFile
					echo "       Host: $local_host($local_ip)" | tee -a $errorLogFile
					echo "       Step: 0" | tee -a $errorLogFile
					echo "      Class: ActiveMQ" | tee -a $errorLogFile
					echo "Description: Scheduled archive [weekday ${data_keep_week}, hour ${data_keep_hour}]" | tee -a $errorLogFile
					echo "       User: SYS" | tee -a $errorLogFile
					echo "        Log: Failed to download the 61626 queue XML from ${activemq61626PathUrl}; cannot check for pending queues [curl61626Result:$curl61626Result]. This scheduled archive failed!" | tee -a $errorLogFile
					echo "   Solution: Check that the ActiveMQ web console on port 8162 is working." | tee -a $errorLogFile
					echo "      Level: ERROR" | tee -a $errorLogFile
					# The console password is deliberately left out of the alert
					echo "Other parameters: [activemq61626PathXml:${activemq61626PathXml},data_keep_xml_user:${data_keep_xml_user}]" | tee -a $errorLogFile
					echo "end----------------------------------------------------" | tee -a $errorLogFile
				else
					echo "Downloaded the 61626 queue XML from ${activemq61626PathUrl} [curl61626Result:$curl61626Result]" | tee -a $logfile
					# Count the 61626 queues with pending messages
					#grep -E 'size="[1-9][0-9]*"' activemq61626-data.xml
					ActiveMQ61626_Pending_Num=`grep -E 'size="[1-9][0-9]*"' $activemq61626PathXml |wc -l`
					echo "61626 queues with pending messages [ActiveMQ61626_Pending_Num:$ActiveMQ61626_Pending_Num]" | tee -a $logfile
					# Count the 61626 dead-letter queue if it is non-empty
					ActiveMQ61626_DLQ_Num=`grep -C 2 -E '<queue name="ActiveMQ.DLQ">' $activemq61626PathXml |grep  -E 'size="[1-9][0-9]*"' |wc -l`
					echo "61626 non-empty dead-letter queues [ActiveMQ61626_DLQ_Num:$ActiveMQ61626_DLQ_Num]" | tee -a $logfile
					
					if [ $ActiveMQ61626_Pending_Num = 0 ] || [ $ActiveMQ61626_Pending_Num = $ActiveMQ61626_DLQ_Num ]; then
						# No pending queues, or only the dead-letter queue: 61626 can be archived
						check61626PendingMessagesResult=1;
					else
						# Pending queues exist: 61626 cannot be archived, an alert follows
						check61626PendingMessagesResult=2;
					fi
				fi
				echo "Finished checking 61626 for pending messages (Number Of Pending Messages)! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}][check61626PendingMessagesResult:$check61626PendingMessagesResult]" | tee -a $logfile
				# Check for pending messages: end
				
				# Start from the current per-instance archive states
				data_keep_61616_state_now=$data_keep_61616_state;
				data_keep_61626_state_now=$data_keep_61626_state;
				# Days since the last successful archive, computed once before the host loop
				# so that a forced archive applies to every server rather than only the first
				nowDateTime=$(date +%s)
				CountDays61616=$(( (nowDateTime - $(date -d "${data_keep_61616_last_date}" +%s)) / 86400 ))
				CountDays61626=$(( (nowDateTime - $(date -d "${data_keep_61626_last_date}" +%s)) / 86400 ))
				echo "Days since the last archive: [61616:${CountDays61616},61626:${CountDays61626}]" | tee -a $logfile
				for itemServer in $remote_hosts
				do
					ping -c2 -i0.3 -W1 $itemServer &>/dev/null
					if [ $? -ne 0 ]; then
						echo "${itemServer}: ping failed!" | tee -a $logfile
					else
						echo "Archive and cleanup: begin" | tee -a $logfile
						if [ $data_keep_61616_state -ne 1 ]; then
							echo "Stopping processes: ${itemServer}:${P1}" | tee -a $logfile
							# Archive when nothing is pending, or when the last successful archive is at least 30 days old
							if [ $check61616PendingMessagesResult = 1 ] || [ $CountDays61616 -ge $data_keep_day_count ]; then
								stopresult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no $remote_host_user@$itemServer "ps -ef | grep '${P1}' | grep -v grep | awk '{print \$2}' | xargs -r kill -9 >/dev/null 2>&1 &")
								data_keep_61616_last_date=$(date +"%Y-%m-%d %T")
								data_keep_61616_state_now=1;
								echo "${data_keep_61616_last_date} processes stopped! stopresult:${stopresult}" | tee -a $logfile
							else
								data_keep_61616_state_now=0;
								echo "Processes not stopped: 8161 reports pending messages or the queue download failed: ${itemServer}:${P1}" | tee -a $logfile
							fi
						else
							echo "61616 was already archived today; nothing to do! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a $logfile
						fi
						
						if [ $data_keep_61626_state -ne 1 ]; then
							echo "Stopping processes: ${itemServer}:${P2}" | tee -a $logfile
							# Archive when nothing is pending, or when the last successful archive is at least 30 days old
							if [ $check61626PendingMessagesResult = 1 ] || [ $CountDays61626 -ge $data_keep_day_count ]; then
								stopresult2=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no $remote_host_user@$itemServer "ps -ef | grep '${P2}' | grep -v grep | awk '{print \$2}' | xargs -r kill -9 >/dev/null 2>&1 &")
								data_keep_61626_last_date=$(date +"%Y-%m-%d %T");
								data_keep_61626_state_now=1;
								echo "${data_keep_61626_last_date} processes stopped! stopresult2:${stopresult2}" | tee -a $logfile
							else
								data_keep_61626_state_now=0;
								echo "Processes not stopped: 8162 reports pending messages or the queue download failed: ${itemServer}:${P2}" | tee -a $logfile
							fi
						else
							echo "61626 was already archived today; nothing to do! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a $logfile
						fi
						
						
						if [ $data_keep_61616_state -ne 1 ]; then
							echo "Cleaning up: ${itemServer}:${P1}" | tee -a $logfile
							# Remove data.bak* directories older than 7 days
							#find /mpjava/activemq-cluster1/ -mtime +7 -type d -name "data.bak*" -exec rm -rf {} \;
							rmresult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no $remote_host_user@$itemServer "find "$P1" -mtime +7 -type d -name 'data.bak*' -exec rm -rf {} \;")
							echo "rmresult:${rmresult}" | tee -a $logfile
						else
							echo "61616 was already cleaned up today; nothing to do! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a $logfile
						fi
						
						if [ $data_keep_61626_state -ne 1 ]; then
							echo "Cleaning up: ${itemServer}:${P2}" | tee -a $logfile
							# Remove data.bak* directories older than 7 days
							#find /mpjava/activemq-cluster2/ -mtime +7 -type d -name "data.bak*" -exec rm -rf {} \;
							rmresult2=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no $remote_host_user@$itemServer "find "$P2" -mtime +7 -type d -name 'data.bak*' -exec rm -rf {} \;")
							echo "rmresult2:${rmresult2}" | tee -a $logfile
						else
							echo "61626 was already cleaned up today; nothing to do! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a $logfile
						fi
						echo "Archive and cleanup: end" | tee -a $logfile
					fi 
				done
				
				# Record the result of this archive run
				data_keep_61616_state=$data_keep_61616_state_now;
				data_keep_61626_state=$data_keep_61626_state_now;
				
				# Remove *.log and *.xml files older than 7 days from the log directory
				echo "Removing *.log and *.xml files older than 7 days from ${logPath}" | tee -a $logfile
				find ${logPath} -mtime +7 -type f -name "*.log" -exec rm -rf {} \;
				find ${logPath} -mtime +7 -type f -name "*.xml" -exec rm -rf {} \;
				
				# If archiving 61616 or 61626 failed, retry in 10 minutes
				if [ $data_keep_61616_state -ne 1 ] || [ $data_keep_61626_state -ne 1 ]; then
					data_keep_state=0;
					if [ $data_keep_61616_state -ne 1 ]; then 
						# Archiving 61616 failed
						echo "Archiving 61616 failed!" | tee -a $logfile
						echo "start-----------------------------------------------------"  | tee -a $errorLogFile
						echo "     Log ID: monitor-script-2001" | tee -a $errorLogFile
						echo "       Time: $currTime" | tee -a $errorLogFile
						echo "       Host: $local_host($local_ip)" | tee -a $errorLogFile
						echo "       Step: 0" | tee -a $errorLogFile
						echo "      Class: ActiveMQ" | tee -a $errorLogFile
						echo "Description: Scheduled archive [weekday ${data_keep_week}, hour ${data_keep_hour}]" | tee -a $errorLogFile
						echo "       User: SYS" | tee -a $errorLogFile
						echo "        Log: Archiving 61616 failed! There are pending messages or the queue download failed." | tee -a $errorLogFile
						echo "   Solution: During non-production hours, archive manually (stop the 6 activemq processes on the three MQ servers)." | tee -a $errorLogFile
						echo "      Level: ERROR" | tee -a $errorLogFile
						echo "Other parameters:" | tee -a $errorLogFile
						echo "end----------------------------------------------------" | tee -a $errorLogFile
					fi
					if [ $data_keep_61626_state -ne 1 ]; then 
						# Archiving 61626 failed
						echo "Archiving 61626 failed!" | tee -a $logfile
						echo "start-----------------------------------------------------"  | tee -a $errorLogFile
						echo "     Log ID: monitor-script-2001" | tee -a $errorLogFile
						echo "       Time: $currTime" | tee -a $errorLogFile
						echo "       Host: $local_host($local_ip)" | tee -a $errorLogFile
						echo "       Step: 0" | tee -a $errorLogFile
						echo "      Class: ActiveMQ" | tee -a $errorLogFile
						echo "Description: Scheduled archive [weekday ${data_keep_week}, hour ${data_keep_hour}]" | tee -a $errorLogFile
						echo "       User: SYS" | tee -a $errorLogFile
						echo "        Log: Archiving 61626 failed! There are pending messages or the queue download failed." | tee -a $errorLogFile
						echo "   Solution: During non-production hours, archive manually (stop the 6 activemq processes on the three MQ servers)." | tee -a $errorLogFile
						echo "      Level: ERROR" | tee -a $errorLogFile
						echo "Other parameters:" | tee -a $errorLogFile
						echo "end----------------------------------------------------" | tee -a $errorLogFile
					fi
					echo "Archiving 61616 or 61626 failed; retrying in 10 minutes. [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a $logfile
					echo "sleep: 10 minutes (600s)" | tee -a $logfile
					sleep 600
				fi
			else
				echo "${currTime} archive already completed today! [weekday ${data_keep_week}, hour ${data_keep_hour}][data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a $logfile
				echo "sleep: 10 minutes (600s)" | tee -a $logfile
				sleep 600
			fi
		else
			echo "${currTime} outside the archive window! [weekday ${data_keep_week}, hour ${data_keep_hour}][data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a $logfile
			

			# Reset the archive state
			data_keep_61616_state=0;
			data_keep_61626_state=0;
			data_keep_state=0;
			echo "${currTime} archive state reset! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a $logfile
			
		fi
	fi
	# Scheduled archiving: end
	
	if [ $data_keep_state = 0 ]; then 
		# Port check for 61616/61626: begin
		check61616PortReslut=0
		check61616ServerIP=""
		checkAMQPort 61616 $P1
		check61616PortReslut=$checkPortReslut
		check61616ServerIP=$checkPortServerIP
		echo "check61616PortReslut:${check61616PortReslut}" | tee -a $logfile
		echo "check61616ServerIP:${check61616ServerIP}" | tee -a $logfile
		
		check61626PortReslut=0
		check61626ServerIP=""
		checkAMQPort 61626 $P2
		check61626PortReslut=$checkPortReslut
		check61626ServerIP=$checkPortServerIP
		echo "check61626PortReslut:${check61626PortReslut}" | tee -a $logfile
		echo "check61626ServerIP:${check61626ServerIP}" | tee -a $logfile
		# Port check for 61616/61626: end
		
		if [ $check61616PortReslut -ne 0 ] && [ $check61626PortReslut -ne 0 ]; then
			# Both ports are up: sleep 1 minute (60s) before the next cycle
			echo "sleep: 1 minute (60s)" | tee -a $logfile
			sleep 60
		else
			# A port was down and its processes were stopped: check again in 20 minutes
			echo "sleep: 20 minutes (1200s)" | tee -a $logfile
			sleep 1200
		fi
		
	fi
done
