AWS EC2: Setting Up Monitoring for Advanced Metrics Such as Memory and Disk Usage

This article walks through installing the CloudWatch agent on AWS EC2 to collect additional metrics such as memory usage.

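When we launch an EC2 instance in the usual way, CloudWatch only shows CPU, EBS I/O, network I/O, and similar metrics. Memory usage and EBS storage usage are not visible by default; collecting them requires installing the CloudWatch agent. CloudWatch does not monitor total or used memory inside an EC2 instance by default because memory is information inside the guest operating system, and in AWS's product design everything inside the operating system is the user's private property and information.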
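To make "memory usage" concrete, here is a minimal sketch of how a used-memory percentage can be read from inside the instance. It assumes the Linux /proc/meminfo format and uses MemTotal and MemAvailable; the function name mem_used_percent is made up for this example, and the formula is not necessarily the one the CloudWatch agent applies to its own metric of the same name.

```shell
#!/usr/bin/env bash
# Print the used-memory percentage from a meminfo-style file.
# Defaults to /proc/meminfo; a different file can be passed for testing.
mem_used_percent() {
  local file="${1:-/proc/meminfo}"
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      # Fail if MemTotal was not found, to avoid dividing by zero.
      if (total <= 0) exit 1
      printf "%.1f\n", (total - avail) * 100 / total
    }
  ' "$file"
}
```

Running mem_used_percent with no argument on the instance gives a quick local reading to compare against what later shows up in CloudWatch.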

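Typical scenarios that call for collecting additional, more sensitive metrics such as memory usage and EBS storage usage:

  1. After a traffic peak, a service runs for a while and then slows down or stops responding, and it does not recover even after the peak has passed.
  2. An EC2 instance runs critical workloads, but after some time the program's performance degrades or some services behave abnormally.
  3. Under severe memory pressure the SSH process hangs and stops responding, so operators cannot easily SSH into the instance.
  4. Self-managed databases and analytics data warehouses need constant monitoring of the remaining EBS space.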

Installation procedure

Install the CloudWatch agent and collectd

This walkthrough uses Amazon Linux 2023.

First install the CloudWatch agent, then collectd:

sudo dnf install amazon-cloudwatch-agent
sudo dnf install collectd

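Grant permissions to this EC2 instance

Following the AWS guide "Create IAM roles and users for use with CloudWatch agent", first create a role and grant it CloudWatchAgentServerPolicy, AWSXRayDaemonWriteAccess, and logs:PutRetentionPolicy. The permission details are shown in the figure below.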
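The same role can also be created from the command line. The sketch below assumes the AWS CLI is installed and configured with IAM permissions. The role name CloudWatchAgentDemoRole, the inline policy name AllowPutRetentionPolicy, and the function names are hypothetical, chosen only for this example; the two managed policies and the logs:PutRetentionPolicy action are the ones named above.

```shell
#!/usr/bin/env bash
# ROLE_NAME is a hypothetical name; override it in the environment if needed.
ROLE_NAME="${ROLE_NAME:-CloudWatchAgentDemoRole}"

# Write a trust policy that lets EC2 instances assume the role.
write_trust_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

create_agent_role() {
  local trust_file
  trust_file="$(mktemp)"
  write_trust_policy "$trust_file"

  aws iam create-role --role-name "$ROLE_NAME" \
    --assume-role-policy-document "file://$trust_file"

  # The two AWS managed policies named in the text.
  aws iam attach-role-policy --role-name "$ROLE_NAME" \
    --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
  aws iam attach-role-policy --role-name "$ROLE_NAME" \
    --policy-arn arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess

  # logs:PutRetentionPolicy is a single action, so it goes into an inline policy.
  aws iam put-role-policy --role-name "$ROLE_NAME" \
    --policy-name AllowPutRetentionPolicy \
    --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"logs:PutRetentionPolicy","Resource":"*"}]}'

  # EC2 uses roles through an instance profile; here it shares the role's name.
  aws iam create-instance-profile --instance-profile-name "$ROLE_NAME"
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$ROLE_NAME" --role-name "$ROLE_NAME"

  rm -f "$trust_file"
}
```

Nothing runs until create_agent_role is called, so the definitions can be reviewed first. The console flow shown in the article creates the instance profile automatically; with the CLI it has to be created explicitly.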

Then attach this role to the EC2 instance. Initially the instance has no role.

After attaching it, the role appears on the instance.
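Attaching the role and confirming it from inside the instance can be sketched as follows. This assumes the AWS CLI for the attach step and IMDSv2 (the instance metadata service at 169.254.169.254) for the check; the function names are invented for this example, and the instance ID and profile name are placeholders supplied by the caller.

```shell
#!/usr/bin/env bash
# Run from a machine with AWS CLI credentials.
# $1 = instance ID, $2 = instance profile name (both placeholders).
attach_profile() {
  aws ec2 associate-iam-instance-profile \
    --instance-id "$1" \
    --iam-instance-profile "Name=$2"
}

# Run on the instance itself: print the name of the role that the
# metadata service exposes, using an IMDSv2 session token.
instance_role_name() {
  local token
  token="$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")" || return 1
  curl -s -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
}
```

If instance_role_name prints the role created earlier, the agent should be able to pick up credentials without any access keys stored on the instance.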

Configure CloudWatch agent monitoring

We can run sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard to walk through the configuration wizard. Below is my own session as an example.

[ec2-user@ip-172-50-2-23 ~]$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
================================================================
= Welcome to the Amazon CloudWatch Agent Configuration Manager =
=                                                              =
= CloudWatch Agent allows you to collect metrics and logs from =
= your host and send them to CloudWatch. Additional CloudWatch =
= charges may apply.                                           =
================================================================
On which OS are you planning to use the agent?
1. linux
2. windows
3. darwin
default choice: [1]:
1
Trying to fetch the default region based on ec2 metadata...
2023/12/14 06:02:51 I! imds retry client will retry 1 times
Are you using EC2 or On-Premises hosts?
1. EC2
2. On-Premises
default choice: [1]:
1
Which user are you planning to run the agent?
1. root
2. cwagent
3. others
default choice: [1]:
1
Do you want to turn on StatsD daemon?
1. yes
2. no
default choice: [1]:
1
Which port do you want StatsD daemon to listen to?
default choice: [8125]

What is the collect interval for StatsD daemon?
1. 10s
2. 30s
3. 60s
default choice: [1]:
3
What is the aggregation interval for metrics collected by StatsD daemon?
1. Do not aggregate
2. 10s
3. 30s
4. 60s
default choice: [4]:
4
Do you want to monitor metrics from CollectD? WARNING: CollectD must be installed or the Agent will fail to start
1. yes
2. no
default choice: [1]:
1
Do you want to monitor any host metrics? e.g. CPU, memory, etc.
1. yes
2. no
default choice: [1]:
1
Do you want to monitor cpu metrics per core?
1. yes
2. no
default choice: [1]:
2
Do you want to add ec2 dimensions (ImageId, InstanceId, InstanceType, AutoScalingGroupName) into all of your metrics if the info is available?
1. yes
2. no
default choice: [1]:
1
Do you want to aggregate ec2 dimensions (InstanceId)?
1. yes
2. no
default choice: [1]:
1
Would you like to collect your metrics at high resolution (sub-minute resolution)? This enables sub-minute resolution for all metrics, but you can customize for specific metrics in the output json file.
1. 1s
2. 10s
3. 30s
4. 60s
default choice: [4]:
4
Which default metrics config do you want?
1. Basic
2. Standard
3. Advanced
4. None
default choice: [1]:
2
Current config as follows:
{
        "agent": {
                "metrics_collection_interval": 60,
                "run_as_user": "root"
        },
        "metrics": {
                "aggregation_dimensions": [
                        [
                                "InstanceId"
                        ]
                ],
                "append_dimensions": {
                        "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
                        "ImageId": "${aws:ImageId}",
                        "InstanceId": "${aws:InstanceId}",
                        "InstanceType": "${aws:InstanceType}"
                },
                "metrics_collected": {
                        "collectd": {
                                "metrics_aggregation_interval": 60
                        },
                        "cpu": {
                                "measurement": [
                                        "cpu_usage_idle",
                                        "cpu_usage_iowait",
                                        "cpu_usage_user",
                                        "cpu_usage_system"
                                ],
                                "metrics_collection_interval": 60,
                                "totalcpu": false
                        },
                        "disk": {
                                "measurement": [
                                        "used_percent",
                                        "inodes_free"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "diskio": {
                                "measurement": [
                                        "io_time"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "mem": {
                                "measurement": [
                                        "mem_used_percent"
                                ],
                                "metrics_collection_interval": 60
                        },
                        "statsd": {
                                "metrics_aggregation_interval": 60,
                                "metrics_collection_interval": 60,
                                "service_address": ":8125"
                        },
                        "swap": {
                                "measurement": [
                                        "swap_used_percent"
                                ],
                                "metrics_collection_interval": 60
                        }
                }
        }
}
Are you satisfied with the above config? Note: it can be manually customized after the wizard completes to add additional items.
1. yes
2. no
default choice: [1]:
1
Do you have any existing CloudWatch Log Agent (http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html) configuration file to import for migration?
1. yes
2. no
default choice: [2]:
2
Do you want to monitor any log files?
1. yes
2. no
default choice: [1]:
2
Do you want the CloudWatch agent to also retrieve X-ray traces?
1. yes
2. no
default choice: [1]:
2
Existing config JSON identified and copied to:  /opt/aws/amazon-cloudwatch-agent/etc/backup-configs
Saved config file to /opt/aws/amazon-cloudwatch-agent/bin/config.json successfully.
Current config as follows:
{
        "agent": {
                "metrics_collection_interval": 60,
                "run_as_user": "root"
        },
        "metrics": {
                "aggregation_dimensions": [
                        [
                                "InstanceId"
                        ]
                ],
                "append_dimensions": {
                        "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
                        "ImageId": "${aws:ImageId}",
                        "InstanceId": "${aws:InstanceId}",
                        "InstanceType": "${aws:InstanceType}"
                },
                "metrics_collected": {
                        "collectd": {
                                "metrics_aggregation_interval": 60
                        },
                        "cpu": {
                                "measurement": [
                                        "cpu_usage_idle",
                                        "cpu_usage_iowait",
                                        "cpu_usage_user",
                                        "cpu_usage_system"
                                ],
                                "metrics_collection_interval": 60,
                                "totalcpu": false
                        },
                        "disk": {
                                "measurement": [
                                        "used_percent",
                                        "inodes_free"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "diskio": {
                                "measurement": [
                                        "io_time"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "mem": {
                                "measurement": [
                                        "mem_used_percent"
                                ],
                                "metrics_collection_interval": 60
                        },
                        "statsd": {
                                "metrics_aggregation_interval": 60,
                                "metrics_collection_interval": 60,
                                "service_address": ":8125"
                        },
                        "swap": {
                                "measurement": [
                                        "swap_used_percent"
                                ],
                                "metrics_collection_interval": 60
                        }
                }
        }
}
Please check the above content of the config.
The config file is also located at /opt/aws/amazon-cloudwatch-agent/bin/config.json.
Edit it manually if needed.
Do you want to store the config in the SSM parameter store?
1. yes
2. no
default choice: [1]:
2
Program exits now.

When the wizard finishes, it writes a monitoring configuration file. The path appears in the session above and is normally /opt/aws/amazon-cloudwatch-agent/bin/config.json.
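If only memory and disk usage are needed, the wizard output can be trimmed by hand. The sketch below writes a reduced configuration that keeps just the mem and disk sections from the session above. Limiting disk to the "/" mount point, the function name write_minimal_config, and the output path passed to it are assumptions made for this example, not something the wizard produces.

```shell
#!/usr/bin/env bash
# Write a reduced agent configuration to the path given as $1.
# The heredoc is quoted so that ${aws:InstanceId} stays literal.
write_minimal_config() {
  cat > "$1" <<'EOF'
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      },
      "disk": {
        "measurement": ["used_percent"],
        "metrics_collection_interval": 60,
        "resources": ["/"]
      }
    }
  }
}
EOF
}

# Example: write_minimal_config /tmp/cwagent-minimal.json
```

A smaller configuration publishes fewer custom metrics, which also keeps the additional CloudWatch charges mentioned by the wizard lower.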

Start the CloudWatch agent

Start the CloudWatch agent with this configuration (/opt/aws/amazon-cloudwatch-agent/bin/config.json):

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
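Because the wizard session enabled the StatsD listener on port 8125, a running agent should also accept custom application metrics. The sketch below formats a StatsD datagram and sends it over UDP to the local agent. The metric name in the usage comment and both function names are hypothetical, and the send step relies on bash's /dev/udp redirection, so it needs bash rather than plain sh.

```shell
#!/usr/bin/env bash
# Format one StatsD datagram as <name>:<value>|<type>,
# where the type is for example c (counter) or g (gauge).
statsd_line() {
  printf '%s:%s|%s' "$1" "$2" "$3"
}

# Send the datagram to the agent's StatsD listener on localhost.
# /dev/udp/<host>/<port> is handled by bash itself, not by the filesystem.
send_statsd() {
  statsd_line "$1" "$2" "$3" > /dev/udp/127.0.0.1/8125
}

# Example: send_statsd demo.requests 1 c
```

With the 60-second aggregation interval chosen in the wizard, a value sent this way may take a minute or more to show up in CloudWatch.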

Check the CloudWatch agent service status and enable start on boot

# Enable amazon-cloudwatch-agent to start on boot
sudo systemctl enable amazon-cloudwatch-agent
# Start amazon-cloudwatch-agent
sudo systemctl start amazon-cloudwatch-agent
# Check the status of amazon-cloudwatch-agent
sudo systemctl status amazon-cloudwatch-agent
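Besides systemctl, the agent's own control script can report its state, which is handy in scripts. The sketch below assumes that amazon-cloudwatch-agent-ctl -a status prints a small JSON document containing a "status" field with the value "running" when the agent is up; the helper names agent_status and agent_is_running are made up for this example.

```shell
#!/usr/bin/env bash
CTL=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl

# Ask the control script for the agent status (needs root).
agent_status() {
  sudo "$CTL" -m ec2 -a status
}

# Read status JSON on stdin; succeed only if it reports "running".
agent_is_running() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"running"'
}

# Example on the instance:
#   agent_status | agent_is_running && echo "agent is running"
```

Matching the quoted key "status" keeps the check from being fooled by other fields whose names merely end in "status".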

View the metrics in CloudWatch

The metrics can now be viewed directly in CloudWatch. CWAgent is the namespace that the CloudWatch agent creates in CloudWatch, dedicated to the metrics the agent uploads.
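The same metrics can be read with the AWS CLI. The sketch below assumes configured CLI credentials with CloudWatch read access; the function names are hypothetical and the instance ID is a placeholder supplied by the caller. Querying by InstanceId alone relies on the aggregation_dimensions setting from the wizard session, which publishes each metric rolled up to that single dimension.

```shell
#!/usr/bin/env bash
# List everything the agent has published under the CWAgent namespace.
list_agent_metrics() {
  aws cloudwatch list-metrics --namespace CWAgent
}

# Average mem_used_percent for one instance in 60-second periods.
# $1 = instance ID; $2 and $3 = optional ISO 8601 start and end times.
# The defaults cover the last hour and use GNU date syntax.
mem_used_last_hour() {
  local start end
  start="${2:-$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)}"
  end="${3:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  aws cloudwatch get-metric-statistics \
    --namespace CWAgent \
    --metric-name mem_used_percent \
    --dimensions "Name=InstanceId,Value=$1" \
    --start-time "$start" --end-time "$end" \
    --period 60 --statistics Average
}
```

An empty Datapoints list right after setup usually just means the agent has not completed its first collection interval yet.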
