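Monitoring Advanced Metrics Such as Memory and Disk Usage on AWS EC2

This article walks through installing the CloudWatch Agent on AWS EC2 so that additional metrics such as memory usage can be collected.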

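When you launch an EC2 instance the usual way, CloudWatch shows only metrics such as CPU, EBS I/O, and network I/O. Memory usage and EBS storage usage are not visible by default; to collect them you need to install the CloudWatch Agent. CloudWatch does not monitor total or used memory inside an EC2 instance by default because memory is information inside the guest operating system, and in AWS's product design everything inside the operating system is treated as the customer's private property and data.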
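
To make concrete why this data has to be gathered from inside the instance, here is a minimal sketch that derives a used-memory percentage from /proc/meminfo, which is only readable within the guest OS. The function name mem_used_percent and the formula (MemTotal minus MemAvailable, divided by MemTotal) are assumptions for illustration; the value the CloudWatch Agent reports as mem_used_percent may be computed differently.

```bash
#!/usr/bin/env bash
# Print used memory as a percentage, derived from a meminfo-style file.
# Illustrative formula (an assumption, not necessarily the agent's own):
#   used% = (MemTotal - MemAvailable) / MemTotal * 100
mem_used_percent() {
  local meminfo="${1:-/proc/meminfo}"   # defaults to the live kernel data
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total > 0) printf "%.1f\n", (total - avail) * 100 / total
      else exit 1
    }
  ' "$meminfo"
}
```

Running mem_used_percent with no argument on the instance prints the current figure; passing a file path lets you try it against a saved copy of /proc/meminfo.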

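Typical scenarios that call for collecting additional, more sensitive metrics such as memory usage and EBS storage usage:

  1. After a traffic peak arrives, a service degrades or stops responding after running for a while, and it does not recover even once the peak has passed.
  2. An EC2 instance runs critical workloads, but after some time the application slows down or some services behave abnormally.
  3. Under severe memory pressure the SSH process can hang and stop responding, so operators cannot easily SSH into the instance.
  4. Self-managed databases and analytics data warehouses need continuous monitoring of the remaining EBS space.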

Installation steps

Install the CloudWatch Agent and collectd

This walkthrough uses Amazon Linux 2023.

First, install the CloudWatch Agent:

bash
sudo dnf install amazon-cloudwatch-agent

Then install collectd:

bash
sudo dnf install collectd
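
Before moving on, it can help to confirm that both packages are actually installed, since the wizard warns that the agent fails to start when CollectD monitoring is enabled but collectd is missing. The sketch below assumes an RPM-based system such as Amazon Linux 2023; the helper name missing_packages is hypothetical.

```bash
#!/usr/bin/env bash
# Print the name of every given package that rpm does not report as installed.
# Prints nothing when all packages are present.
missing_packages() {
  local pkg
  for pkg in "$@"; do
    rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
  done
}
```

For example, missing_packages amazon-cloudwatch-agent collectd should print nothing once both installs have succeeded.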

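Grant permissions to the EC2 instance

Following the AWS guide "Create IAM roles and users for use with CloudWatch agent", first create a role and grant it the managed policies CloudWatchAgentServerPolicy and AWSXRayDaemonWriteAccess, plus the logs:PutRetentionPolicy permission. The figure below shows the permission details.

Then attach this role to the EC2 instance. Initially the instance has no role:

Now it does: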
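
The same role setup can also be sketched with the AWS CLI instead of the console. This is a minimal sketch under stated assumptions: the role and instance profile name CloudWatchAgentServerRole, the inline policy name, and the helper function names are all hypothetical, and the instance ID is whatever you pass in. It assumes the AWS CLI is installed and configured with credentials that are allowed to manage IAM.

```bash
#!/usr/bin/env bash
ROLE_NAME="CloudWatchAgentServerRole"   # hypothetical name; pick your own

# Write a trust policy that lets EC2 assume the role.
write_trust_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Create the role, grant the permissions named in the article,
# and associate the role with the given instance.
create_agent_role() {
  local instance_id="$1" policy_file status
  policy_file="$(mktemp)"
  write_trust_policy "$policy_file"
  aws iam create-role --role-name "$ROLE_NAME" \
    --assume-role-policy-document "file://$policy_file" &&
  aws iam attach-role-policy --role-name "$ROLE_NAME" \
    --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy &&
  aws iam attach-role-policy --role-name "$ROLE_NAME" \
    --policy-arn arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess &&
  aws iam put-role-policy --role-name "$ROLE_NAME" \
    --policy-name AllowPutRetentionPolicy \
    --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"logs:PutRetentionPolicy","Resource":"*"}]}' &&
  aws iam create-instance-profile --instance-profile-name "$ROLE_NAME" &&
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$ROLE_NAME" --role-name "$ROLE_NAME" &&
  aws ec2 associate-iam-instance-profile --instance-id "$instance_id" \
    --iam-instance-profile "Name=$ROLE_NAME"
  status=$?
  rm -f "$policy_file"
  return "$status"
}
```

You would call it as create_agent_role followed by your own instance ID. A freshly created instance profile can take a few seconds to become usable, so the final association step may need a retry.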

Configure CloudWatch Agent monitoring

You can run sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard to start the configuration wizard. My own session is shown below as an example:

bash
[ec2-user@ip-172-50-2-23 ~]$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
================================================================
= Welcome to the Amazon CloudWatch Agent Configuration Manager =
=                                                              =
= CloudWatch Agent allows you to collect metrics and logs from =
= your host and send them to CloudWatch. Additional CloudWatch =
= charges may apply.                                           =
================================================================
On which OS are you planning to use the agent?
1. linux
2. windows
3. darwin
default choice: [1]:
1
Trying to fetch the default region based on ec2 metadata...
2023/12/14 06:02:51 I! imds retry client will retry 1 times
Are you using EC2 or On-Premises hosts?
1. EC2
2. On-Premises
default choice: [1]:
1
Which user are you planning to run the agent?
1. root
2. cwagent
3. others
default choice: [1]:
1
Do you want to turn on StatsD daemon?
1. yes
2. no
default choice: [1]:
1
Which port do you want StatsD daemon to listen to?
default choice: [8125]

What is the collect interval for StatsD daemon?
1. 10s
2. 30s
3. 60s
default choice: [1]:
3
What is the aggregation interval for metrics collected by StatsD daemon?
1. Do not aggregate
2. 10s
3. 30s
4. 60s
default choice: [4]:
4
Do you want to monitor metrics from CollectD? WARNING: CollectD must be installed or the Agent will fail to start
1. yes
2. no
default choice: [1]:
1
Do you want to monitor any host metrics? e.g. CPU, memory, etc.
1. yes
2. no
default choice: [1]:
1
Do you want to monitor cpu metrics per core?
1. yes
2. no
default choice: [1]:
2
Do you want to add ec2 dimensions (ImageId, InstanceId, InstanceType, AutoScalingGroupName) into all of your metrics if the info is available?
1. yes
2. no
default choice: [1]:
1
Do you want to aggregate ec2 dimensions (InstanceId)?
1. yes
2. no
default choice: [1]:
1
Would you like to collect your metrics at high resolution (sub-minute resolution)? This enables sub-minute resolution for all metrics, but you can customize for specific metrics in the output json file.
1. 1s
2. 10s
3. 30s
4. 60s
default choice: [4]:
4
Which default metrics config do you want?
1. Basic
2. Standard
3. Advanced
4. None
default choice: [1]:
2
Current config as follows:
{
        "agent": {
                "metrics_collection_interval": 60,
                "run_as_user": "root"
        },
        "metrics": {
                "aggregation_dimensions": [
                        [
                                "InstanceId"
                        ]
                ],
                "append_dimensions": {
                        "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
                        "ImageId": "${aws:ImageId}",
                        "InstanceId": "${aws:InstanceId}",
                        "InstanceType": "${aws:InstanceType}"
                },
                "metrics_collected": {
                        "collectd": {
                                "metrics_aggregation_interval": 60
                        },
                        "cpu": {
                                "measurement": [
                                        "cpu_usage_idle",
                                        "cpu_usage_iowait",
                                        "cpu_usage_user",
                                        "cpu_usage_system"
                                ],
                                "metrics_collection_interval": 60,
                                "totalcpu": false
                        },
                        "disk": {
                                "measurement": [
                                        "used_percent",
                                        "inodes_free"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "diskio": {
                                "measurement": [
                                        "io_time"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "mem": {
                                "measurement": [
                                        "mem_used_percent"
                                ],
                                "metrics_collection_interval": 60
                        },
                        "statsd": {
                                "metrics_aggregation_interval": 60,
                                "metrics_collection_interval": 60,
                                "service_address": ":8125"
                        },
                        "swap": {
                                "measurement": [
                                        "swap_used_percent"
                                ],
                                "metrics_collection_interval": 60
                        }
                }
        }
}
Are you satisfied with the above config? Note: it can be manually customized after the wizard completes to add additional items.
1. yes
2. no
default choice: [1]:
1
Do you have any existing CloudWatch Log Agent (http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html) configuration file to import for migration?
1. yes
2. no
default choice: [2]:
2
Do you want to monitor any log files?
1. yes
2. no
default choice: [1]:
2
Do you want the CloudWatch agent to also retrieve X-ray traces?
1. yes
2. no
default choice: [1]:
2
Existing config JSON identified and copied to:  /opt/aws/amazon-cloudwatch-agent/etc/backup-configs
Saved config file to /opt/aws/amazon-cloudwatch-agent/bin/config.json successfully.
Current config as follows:
{
        "agent": {
                "metrics_collection_interval": 60,
                "run_as_user": "root"
        },
        "metrics": {
                "aggregation_dimensions": [
                        [
                                "InstanceId"
                        ]
                ],
                "append_dimensions": {
                        "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
                        "ImageId": "${aws:ImageId}",
                        "InstanceId": "${aws:InstanceId}",
                        "InstanceType": "${aws:InstanceType}"
                },
                "metrics_collected": {
                        "collectd": {
                                "metrics_aggregation_interval": 60
                        },
                        "cpu": {
                                "measurement": [
                                        "cpu_usage_idle",
                                        "cpu_usage_iowait",
                                        "cpu_usage_user",
                                        "cpu_usage_system"
                                ],
                                "metrics_collection_interval": 60,
                                "totalcpu": false
                        },
                        "disk": {
                                "measurement": [
                                        "used_percent",
                                        "inodes_free"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "diskio": {
                                "measurement": [
                                        "io_time"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "mem": {
                                "measurement": [
                                        "mem_used_percent"
                                ],
                                "metrics_collection_interval": 60
                        },
                        "statsd": {
                                "metrics_aggregation_interval": 60,
                                "metrics_collection_interval": 60,
                                "service_address": ":8125"
                        },
                        "swap": {
                                "measurement": [
                                        "swap_used_percent"
                                ],
                                "metrics_collection_interval": 60
                        }
                }
        }
}
Please check the above content of the config.
The config file is also located at /opt/aws/amazon-cloudwatch-agent/bin/config.json.
Edit it manually if needed.
Do you want to store the config in the SSM parameter store?
1. yes
2. no
default choice: [1]:
2
Program exits now.

When the wizard finishes it writes a monitoring configuration file. The path appears in the session above and is usually /opt/aws/amazon-cloudwatch-agent/bin/config.json.
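
Since the wizard notes that the file can be customized by hand, a trimmed-down variant that keeps only the memory and disk-usage metrics might look like the sketch below. It reuses only keys that appear in the wizard output above; the function name write_minimal_config and the target path you pass to it are hypothetical, and whether this subset suits your workload is an assumption to verify.

```bash
#!/usr/bin/env bash
# Write a reduced agent config (memory + disk usage only) to the given path.
# The heredoc delimiter is quoted so the shell leaves ${aws:InstanceId}
# untouched; the agent substitutes that placeholder itself.
write_minimal_config() {
  cat > "$1" <<'EOF'
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "disk": {
        "measurement": ["used_percent"],
        "metrics_collection_interval": 60,
        "resources": ["*"]
      },
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      }
    }
  }
}
EOF
}
```

Writing to a scratch path first, for example write_minimal_config /tmp/config.json, lets you review the file before copying it over the wizard's output.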

Start the CloudWatch Agent

Start the CloudWatch Agent with this configuration (/opt/aws/amazon-cloudwatch-agent/bin/config.json). The control script needs root privileges, so run it with sudo:

bash
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json

Check the CloudWatch Agent service status and enable it at boot

bash
# Enable amazon-cloudwatch-agent at boot
sudo systemctl enable amazon-cloudwatch-agent
# Start amazon-cloudwatch-agent
sudo systemctl start amazon-cloudwatch-agent
# Check the status of amazon-cloudwatch-agent
sudo systemctl status amazon-cloudwatch-agent
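
Besides systemctl, the agent's own control script has a status action, which is convenient in scripts. The sketch below assumes that action prints JSON containing a "status" field whose value is "running" when the agent is up; the helper name agent_is_running is hypothetical.

```bash
#!/usr/bin/env bash
AGENT_CTL="/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl"

# Read the agent's status JSON on stdin; succeed only if it reports "running".
agent_is_running() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"running"'
}

# Example (run on the instance):
#   sudo "$AGENT_CTL" -m ec2 -a status | agent_is_running && echo "agent is up"
```

If the check fails, the agent log is the first place to look; on a default install it is expected at /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log.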

View the metrics in CloudWatch

The metrics are now visible directly in CloudWatch. CWAgent is the namespace that the CloudWatch Agent creates in CloudWatch specifically to hold the metrics it uploads.
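
The same data can be read from the command line. The following sketch queries the last ten minutes of mem_used_percent from the CWAgent namespace with the AWS CLI. It assumes GNU date (as shipped on Amazon Linux), credentials that permit cloudwatch:GetMetricStatistics, and that the metric is available with InstanceId as its only dimension, which is what the aggregation_dimensions setting in the wizard output is meant to produce. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Print a UTC ISO 8601 timestamp for N minutes ago (requires GNU date).
iso_minutes_ago() {
  date -u -d "$1 minutes ago" +%Y-%m-%dT%H:%M:%SZ
}

# Fetch per-minute averages of mem_used_percent for one instance
# over the last 10 minutes.
recent_mem_used_percent() {
  local instance_id="$1"
  aws cloudwatch get-metric-statistics \
    --namespace CWAgent \
    --metric-name mem_used_percent \
    --dimensions "Name=InstanceId,Value=$instance_id" \
    --start-time "$(iso_minutes_ago 10)" \
    --end-time "$(iso_minutes_ago 0)" \
    --period 60 \
    --statistics Average
}
```

If the query returns no datapoints, aws cloudwatch list-metrics --namespace CWAgent shows which metric names and dimension combinations the agent has actually published.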
