本文介绍AWS EC2安装Cloudwatch Agent操作流程,实现内存用量等额外指标的采集。
当我们以普通的方式来开启EC2 instance,我们只能在CloudWatch里面看到CPU、EBS IO、网络IO等信息,但是如果要涉及到内存用量,EBS存储空间用量等信息时,在默认没有到情况下无法直接看到,需要安装额外的Cloudwatch Agent
。默认情况下,AWS的监控服务Cloudwatch并没有对EC2内的内存总量和使用情况进行监控,因为内存属于用户操作系统内的信息,在AWS的产品设计中,所有系统内的信息都属于用户的私有财产和信息。
需要进行内存用量,EBS存储空间等额外高敏感指标收集的典型场景:
- 某个服务在某次高峰流量到达后,运行一段时间发现性能下降或者失去响应,当流量峰值过去后依然无法恢复
- 一般是某台EC2运行了一些关键业务,但是过一段时间后程序性能下降或者某些服务运行异常
- 在内存严重紧张的情况下,SSH进程会僵死失去响应,运维人员无法轻松的SSH登陆EC2
- 某些自建的数据库和数据分析数据仓库,需要时刻监控EBS剩余空间
安装流程
安装Cloudwatch Agent和collectd
本文环境基于Amazon Linux 2023进行演示
我们先进行Cloudwatch Agent
的安装
bash
sudo dnf install amazon-cloudwatch-agent
bash
sudo dnf install collectd
为本台EC2设置权限
我们需要参考Create IAM roles and users for use with CloudWatch agent,先建立一个role,并未role赋予CloudWatchAgentServerPolicy
、AWSXRayDaemonWriteAccess
和logs:PutRetentionPolicy
,具体的权限细节见下图
然后我们为EC2赋予这个role,起初我们看到EC2是没有role的
现在有了
配置Cloudwatch Agent监控
我们可以直接运行sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
帮助我们进行设置向导,下面是我自己的演示为例子
bash
[ec2-user@ip-172-50-2-23 ~]$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
================================================================
= Welcome to the Amazon CloudWatch Agent Configuration Manager =
= =
= CloudWatch Agent allows you to collect metrics and logs from =
= your host and send them to CloudWatch. Additional CloudWatch =
= charges may apply. =
================================================================
On which OS are you planning to use the agent?
1. linux
2. windows
3. darwin
default choice: [1]:
1
Trying to fetch the default region based on ec2 metadata...
2023/12/14 06:02:51 I! imds retry client will retry 1 times
Are you using EC2 or On-Premises hosts?
1. EC2
2. On-Premises
default choice: [1]:
1
Which user are you planning to run the agent?
1. root
2. cwagent
3. others
default choice: [1]:
1
Do you want to turn on StatsD daemon?
1. yes
2. no
default choice: [1]:
1
Which port do you want StatsD daemon to listen to?
default choice: [8125]
What is the collect interval for StatsD daemon?
1. 10s
2. 30s
3. 60s
default choice: [1]:
3
What is the aggregation interval for metrics collected by StatsD daemon?
1. Do not aggregate
2. 10s
3. 30s
4. 60s
default choice: [4]:
4
Do you want to monitor metrics from CollectD? WARNING: CollectD must be installed or the Agent will fail to start
1. yes
2. no
default choice: [1]:
1
Do you want to monitor any host metrics? e.g. CPU, memory, etc.
1. yes
2. no
default choice: [1]:
1
Do you want to monitor cpu metrics per core?
1. yes
2. no
default choice: [1]:
2
Do you want to add ec2 dimensions (ImageId, InstanceId, InstanceType, AutoScalingGroupName) into all of your metrics if the info is available?
1. yes
2. no
default choice: [1]:
1
Do you want to aggregate ec2 dimensions (InstanceId)?
1. yes
2. no
default choice: [1]:
1
Would you like to collect your metrics at high resolution (sub-minute resolution)? This enables sub-minute resolution for all metrics, but you can customize for specific metrics in the output json file.
1. 1s
2. 10s
3. 30s
4. 60s
default choice: [4]:
4
Which default metrics config do you want?
1. Basic
2. Standard
3. Advanced
4. None
default choice: [1]:
2
Current config as follows:
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"metrics": {
"aggregation_dimensions": [
[
"InstanceId"
]
],
"append_dimensions": {
"AutoScalingGroupName": "${aws:AutoScalingGroupName}",
"ImageId": "${aws:ImageId}",
"InstanceId": "${aws:InstanceId}",
"InstanceType": "${aws:InstanceType}"
},
"metrics_collected": {
"collectd": {
"metrics_aggregation_interval": 60
},
"cpu": {
"measurement": [
"cpu_usage_idle",
"cpu_usage_iowait",
"cpu_usage_user",
"cpu_usage_system"
],
"metrics_collection_interval": 60,
"totalcpu": false
},
"disk": {
"measurement": [
"used_percent",
"inodes_free"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"diskio": {
"measurement": [
"io_time"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
},
"statsd": {
"metrics_aggregation_interval": 60,
"metrics_collection_interval": 60,
"service_address": ":8125"
},
"swap": {
"measurement": [
"swap_used_percent"
],
"metrics_collection_interval": 60
}
}
}
}
Are you satisfied with the above config? Note: it can be manually customized after the wizard completes to add additional items.
1. yes
2. no
default choice: [1]:
1
Do you have any existing CloudWatch Log Agent (http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html) configuration file to import for migration?
1. yes
2. no
default choice: [2]:
2
Do you want to monitor any log files?
1. yes
2. no
default choice: [1]:
2
Do you want the CloudWatch agent to also retrieve X-ray traces?
1. yes
2. no
default choice: [1]:
2
Existing config JSON identified and copied to: /opt/aws/amazon-cloudwatch-agent/etc/backup-configs
Saved config file to /opt/aws/amazon-cloudwatch-agent/bin/config.json successfully.
Current config as follows:
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"metrics": {
"aggregation_dimensions": [
[
"InstanceId"
]
],
"append_dimensions": {
"AutoScalingGroupName": "${aws:AutoScalingGroupName}",
"ImageId": "${aws:ImageId}",
"InstanceId": "${aws:InstanceId}",
"InstanceType": "${aws:InstanceType}"
},
"metrics_collected": {
"collectd": {
"metrics_aggregation_interval": 60
},
"cpu": {
"measurement": [
"cpu_usage_idle",
"cpu_usage_iowait",
"cpu_usage_user",
"cpu_usage_system"
],
"metrics_collection_interval": 60,
"totalcpu": false
},
"disk": {
"measurement": [
"used_percent",
"inodes_free"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"diskio": {
"measurement": [
"io_time"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
},
"statsd": {
"metrics_aggregation_interval": 60,
"metrics_collection_interval": 60,
"service_address": ":8125"
},
"swap": {
"measurement": [
"swap_used_percent"
],
"metrics_collection_interval": 60
}
}
}
}
Please check the above content of the config.
The config file is also located at /opt/aws/amazon-cloudwatch-agent/bin/config.json.
Edit it manually if needed.
Do you want to store the config in the SSM parameter store?
1. yes
2. no
default choice: [1]:
2
Program exits now.
向导设置完成后,一般会输出一个监控配置文件,文件路径可以在上面的对话中找到,一般为/opt/aws/amazon-cloudwatch-agent/bin/config.json
启动Cloudwatch Agent
我们需要用这份配置(/opt/aws/amazon-cloudwatch-agent/bin/config.json
)启动Cloudwatch Agent
bash
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
查看Cloudwatch Agent服务状态并设置开启启动
bash
# 设置开机自启动amazon-cloudwatch-agent
sudo systemctl enable amazon-cloudwatch-agent
# 启动amazon-cloudwatch-agent
sudo systemctl start amazon-cloudwatch-agent
# 查看amazon-cloudwatch-agent状态
sudo systemctl status amazon-cloudwatch-agent
在Cloudwatch上查看监控指标
我们直接可以在Cloudwatch里面查看,其中CWAgent
是Cloudwatch Agent
在Cloudwatch中创建的namespace,专门用于存放Cloudwatch Agent
上传的指标。