Monitoring Advanced Metrics Such as Memory and Disk Usage on AWS EC2

This article walks through installing the CloudWatch Agent on AWS EC2 so that additional metrics such as memory usage can be collected.

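When we launch an EC2 instance in the ordinary way, CloudWatch only shows CPU, EBS I/O, network I/O and similar metrics. Memory usage and EBS storage space usage are not visible by default; seeing them requires installing the CloudWatch Agent. CloudWatch does not monitor total memory or memory usage inside an EC2 instance by default because memory is information that lives inside the user's operating system, and in AWS's product design everything inside the operating system is treated as the user's private property and information.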
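To make the point about in-OS information concrete, here is a minimal sketch of how memory usage can be read from inside a Linux instance. The function name `mem_used_percent` is chosen here for illustration, and the formula (based on `MemAvailable`) is an assumption; the value the CloudWatch Agent reports under the same name may be computed differently.

```shell
#!/usr/bin/env bash
# Sketch only: approximate memory usage from a meminfo-style file.
# Assumption: "used" is MemTotal minus MemAvailable; the agent's own
# calculation may differ.
mem_used_percent() {
  local meminfo="${1:-/proc/meminfo}"
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total <= 0) { print "MemTotal not found" > "/dev/stderr"; exit 1 }
      printf "%.1f\n", (total - avail) * 100 / total
    }
  ' "$meminfo"
}

# Only read the live value on systems that expose /proc/meminfo.
if [ -r /proc/meminfo ]; then
  mem_used_percent
fi
```

This number is only available to something running inside the operating system, which is why an agent has to be installed to publish it.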

Typical scenarios that call for collecting additional, highly sensitive metrics such as memory usage and EBS storage space:

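  1. A service degrades or stops responding some time after a traffic peak arrives, and does not recover even after the peak has passed.
  2. An EC2 instance runs critical workloads, but after a while application performance drops or some services start behaving abnormally.
  3. Under severe memory pressure the SSH process hangs and stops responding, so operators cannot easily SSH into the instance.
  4. Self-managed databases and analytics data warehouses need constant monitoring of the remaining EBS space.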
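For the disk-space scenario, a quick manual check from inside the instance might look like the sketch below. The function names `parse_used_percent` and `disk_used_percent` are illustrative, and the parsing assumes the POSIX `df -P` column layout with no spaces in the filesystem name.

```shell
#!/usr/bin/env bash
# Sketch only: read the used-space percentage of a mount point.

# Extract the Capacity column (5th field of the first data row)
# from `df -P` output supplied on stdin, dropping the trailing "%".
parse_used_percent() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Used-space percentage for a mount point (defaults to /).
disk_used_percent() {
  df -P "${1:-/}" | parse_used_percent
}

disk_used_percent /
```

A one-off check like this does not help when nobody is logged in, which is the gap the agent-based monitoring in this article is meant to close.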

Installation

Install the CloudWatch Agent and collectd

The environment used for this walkthrough is Amazon Linux 2023.

First install the CloudWatch Agent, then collectd:

sudo dnf install amazon-cloudwatch-agent
sudo dnf install collectd

Grant permissions to the EC2 instance

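Following the AWS documentation page "Create IAM roles and users for use with CloudWatch agent", first create a role and grant it CloudWatchAgentServerPolicy, AWSXRayDaemonWriteAccess, and logs:PutRetentionPolicy. The permission details are shown in the figure below.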

Then attach this role to the EC2 instance. Initially the instance has no role.

Now it has one.
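The console steps above can also be sketched with the AWS CLI. This is an assumption-laden outline, not the procedure the article followed: the role name `CloudWatchAgentServerRole` and the instance ID are placeholders, the script only prints the commands unless `DRY_RUN=0` is set, and the `logs:PutRetentionPolicy` permission is left out because it would need a separate inline policy.

```shell
#!/usr/bin/env bash
# Sketch only: create a role for the agent and attach it to an instance.
# ROLE_NAME and INSTANCE_ID are hypothetical placeholders.
DRY_RUN="${DRY_RUN:-1}"
ROLE_NAME="${ROLE_NAME:-CloudWatchAgentServerRole}"
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"
TRUST_POLICY="$(mktemp)"

# Trust policy that lets EC2 assume the role.
cat > "$TRUST_POLICY" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_agent_role() {
  run aws iam create-role --role-name "$ROLE_NAME" \
    --assume-role-policy-document "file://$TRUST_POLICY"
  # Managed policies named in the article.
  for policy in CloudWatchAgentServerPolicy AWSXRayDaemonWriteAccess; do
    run aws iam attach-role-policy --role-name "$ROLE_NAME" \
      --policy-arn "arn:aws:iam::aws:policy/$policy"
  done
  # EC2 uses an instance profile as the container for the role.
  run aws iam create-instance-profile --instance-profile-name "$ROLE_NAME"
  run aws iam add-role-to-instance-profile \
    --instance-profile-name "$ROLE_NAME" --role-name "$ROLE_NAME"
  run aws ec2 associate-iam-instance-profile --instance-id "$INSTANCE_ID" \
    --iam-instance-profile "Name=$ROLE_NAME"
}

setup_agent_role
```

Running it as-is only lists the commands for review; running with `DRY_RUN=0` requires the AWS CLI and credentials that are allowed to manage IAM.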

Configure CloudWatch Agent monitoring

We can run sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard, which walks through the configuration interactively. My own session is shown below as an example.

[ec2-user@ip-172-50-2-23 ~]$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
================================================================
= Welcome to the Amazon CloudWatch Agent Configuration Manager =
=                                                              =
= CloudWatch Agent allows you to collect metrics and logs from =
= your host and send them to CloudWatch. Additional CloudWatch =
= charges may apply.                                           =
================================================================
On which OS are you planning to use the agent?
1. linux
2. windows
3. darwin
default choice: [1]:
1
Trying to fetch the default region based on ec2 metadata...
2023/12/14 06:02:51 I! imds retry client will retry 1 times
Are you using EC2 or On-Premises hosts?
1. EC2
2. On-Premises
default choice: [1]:
1
Which user are you planning to run the agent?
1. root
2. cwagent
3. others
default choice: [1]:
1
Do you want to turn on StatsD daemon?
1. yes
2. no
default choice: [1]:
1
Which port do you want StatsD daemon to listen to?
default choice: [8125]

What is the collect interval for StatsD daemon?
1. 10s
2. 30s
3. 60s
default choice: [1]:
3
What is the aggregation interval for metrics collected by StatsD daemon?
1. Do not aggregate
2. 10s
3. 30s
4. 60s
default choice: [4]:
4
Do you want to monitor metrics from CollectD? WARNING: CollectD must be installed or the Agent will fail to start
1. yes
2. no
default choice: [1]:
1
Do you want to monitor any host metrics? e.g. CPU, memory, etc.
1. yes
2. no
default choice: [1]:
1
Do you want to monitor cpu metrics per core?
1. yes
2. no
default choice: [1]:
2
Do you want to add ec2 dimensions (ImageId, InstanceId, InstanceType, AutoScalingGroupName) into all of your metrics if the info is available?
1. yes
2. no
default choice: [1]:
1
Do you want to aggregate ec2 dimensions (InstanceId)?
1. yes
2. no
default choice: [1]:
1
Would you like to collect your metrics at high resolution (sub-minute resolution)? This enables sub-minute resolution for all metrics, but you can customize for specific metrics in the output json file.
1. 1s
2. 10s
3. 30s
4. 60s
default choice: [4]:
4
Which default metrics config do you want?
1. Basic
2. Standard
3. Advanced
4. None
default choice: [1]:
2
Current config as follows:
{
        "agent": {
                "metrics_collection_interval": 60,
                "run_as_user": "root"
        },
        "metrics": {
                "aggregation_dimensions": [
                        [
                                "InstanceId"
                        ]
                ],
                "append_dimensions": {
                        "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
                        "ImageId": "${aws:ImageId}",
                        "InstanceId": "${aws:InstanceId}",
                        "InstanceType": "${aws:InstanceType}"
                },
                "metrics_collected": {
                        "collectd": {
                                "metrics_aggregation_interval": 60
                        },
                        "cpu": {
                                "measurement": [
                                        "cpu_usage_idle",
                                        "cpu_usage_iowait",
                                        "cpu_usage_user",
                                        "cpu_usage_system"
                                ],
                                "metrics_collection_interval": 60,
                                "totalcpu": false
                        },
                        "disk": {
                                "measurement": [
                                        "used_percent",
                                        "inodes_free"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "diskio": {
                                "measurement": [
                                        "io_time"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "mem": {
                                "measurement": [
                                        "mem_used_percent"
                                ],
                                "metrics_collection_interval": 60
                        },
                        "statsd": {
                                "metrics_aggregation_interval": 60,
                                "metrics_collection_interval": 60,
                                "service_address": ":8125"
                        },
                        "swap": {
                                "measurement": [
                                        "swap_used_percent"
                                ],
                                "metrics_collection_interval": 60
                        }
                }
        }
}
Are you satisfied with the above config? Note: it can be manually customized after the wizard completes to add additional items.
1. yes
2. no
default choice: [1]:
1
Do you have any existing CloudWatch Log Agent (http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html) configuration file to import for migration?
1. yes
2. no
default choice: [2]:
2
Do you want to monitor any log files?
1. yes
2. no
default choice: [1]:
2
Do you want the CloudWatch agent to also retrieve X-ray traces?
1. yes
2. no
default choice: [1]:
2
Existing config JSON identified and copied to:  /opt/aws/amazon-cloudwatch-agent/etc/backup-configs
Saved config file to /opt/aws/amazon-cloudwatch-agent/bin/config.json successfully.
Current config as follows:
{
        "agent": {
                "metrics_collection_interval": 60,
                "run_as_user": "root"
        },
        "metrics": {
                "aggregation_dimensions": [
                        [
                                "InstanceId"
                        ]
                ],
                "append_dimensions": {
                        "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
                        "ImageId": "${aws:ImageId}",
                        "InstanceId": "${aws:InstanceId}",
                        "InstanceType": "${aws:InstanceType}"
                },
                "metrics_collected": {
                        "collectd": {
                                "metrics_aggregation_interval": 60
                        },
                        "cpu": {
                                "measurement": [
                                        "cpu_usage_idle",
                                        "cpu_usage_iowait",
                                        "cpu_usage_user",
                                        "cpu_usage_system"
                                ],
                                "metrics_collection_interval": 60,
                                "totalcpu": false
                        },
                        "disk": {
                                "measurement": [
                                        "used_percent",
                                        "inodes_free"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "diskio": {
                                "measurement": [
                                        "io_time"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "mem": {
                                "measurement": [
                                        "mem_used_percent"
                                ],
                                "metrics_collection_interval": 60
                        },
                        "statsd": {
                                "metrics_aggregation_interval": 60,
                                "metrics_collection_interval": 60,
                                "service_address": ":8125"
                        },
                        "swap": {
                                "measurement": [
                                        "swap_used_percent"
                                ],
                                "metrics_collection_interval": 60
                        }
                }
        }
}
Please check the above content of the config.
The config file is also located at /opt/aws/amazon-cloudwatch-agent/bin/config.json.
Edit it manually if needed.
Do you want to store the config in the SSM parameter store?
1. yes
2. no
default choice: [1]:
2
Program exits now.

When the wizard finishes, it writes a monitoring configuration file. The path appears in the session above and is typically /opt/aws/amazon-cloudwatch-agent/bin/config.json.
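If only memory and disk usage are of interest, the generated file can be trimmed considerably. The sketch below writes a reduced configuration to a scratch path; the path `/tmp/cwagent-minimal.json` and the choice to watch only the `/` mount are assumptions made for illustration, while the keys themselves are taken from the wizard output above.

```shell
#!/usr/bin/env bash
# Sketch only: a reduced agent config with just memory and disk usage.
# CONFIG_FILE is a hypothetical scratch path, not the agent's default.
CONFIG_FILE="${CONFIG_FILE:-/tmp/cwagent-minimal.json}"

# The quoted heredoc delimiter keeps ${aws:InstanceId} literal.
cat > "$CONFIG_FILE" <<'EOF'
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"]
      },
      "disk": {
        "measurement": ["used_percent"],
        "resources": ["/"]
      }
    }
  }
}
EOF

# Optional syntax check when python3 happens to be installed.
if command -v python3 >/dev/null 2>&1; then
  python3 -m json.tool "$CONFIG_FILE" >/dev/null && echo "valid JSON: $CONFIG_FILE"
fi
```

A file like this would be passed to the agent the same way as the wizard's output, by pointing the `-c file:` argument of the fetch-config command at it.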

Start the CloudWatch Agent

We need to start the CloudWatch Agent with this configuration (/opt/aws/amazon-cloudwatch-agent/bin/config.json):

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json

Check the CloudWatch Agent service status and enable it at boot

# Enable amazon-cloudwatch-agent to start at boot
sudo systemctl enable amazon-cloudwatch-agent
# Start amazon-cloudwatch-agent
sudo systemctl start amazon-cloudwatch-agent
# Check the status of amazon-cloudwatch-agent
sudo systemctl status amazon-cloudwatch-agent

View the metrics in CloudWatch

The metrics can now be viewed directly in CloudWatch. CWAgent is the namespace that the CloudWatch Agent creates in CloudWatch, dedicated to the metrics the agent uploads.
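The same check can be sketched from the command line. The example below builds an `aws cloudwatch list-metrics` call against the CWAgent namespace; the instance ID is a placeholder, the function name `list_agent_metrics` is illustrative, and by default the script only prints the command rather than calling AWS.

```shell
#!/usr/bin/env bash
# Sketch only: list memory metrics the agent has published for one instance.
# INSTANCE_ID is a hypothetical placeholder.
DRY_RUN="${DRY_RUN:-1}"
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

list_agent_metrics() {
  run aws cloudwatch list-metrics \
    --namespace CWAgent \
    --metric-name mem_used_percent \
    --dimensions "Name=InstanceId,Value=${INSTANCE_ID}"
}

list_agent_metrics
```

With `DRY_RUN=0`, an empty result a few minutes after starting the agent would suggest checking the instance role and the agent's service status first.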
