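Cloud providers in China usually preinstall an in-OS monitoring agent on customers' servers and use it to provide CPU, disk, and memory metrics. AWS does not do this by default. User privacy and information security carry great weight in overseas markets, and security is AWS's first priority, so AWS will not install a metrics-collection agent inside a server's operating system, or proactively collect metrics from it, unless the user has explicitly agreed.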
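By default, Amazon CloudWatch, the AWS monitoring service, does not monitor the total or used memory of an EC2 instance. Memory is information inside the guest operating system, and in AWS's product design everything inside the OS is treated as the user's private property and data. CloudWatch therefore does not collect it unless you opt in by installing the CloudWatch agent inside the server's operating system.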
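In real projects, system-level and application-level monitoring, with memory monitoring as the typical example, is an essential part of monitoring. AWS therefore provides the CloudWatch agent to help users publish system-level information from EC2 instances to CloudWatch. The agent collects more than memory: it can also gather other system-level metrics such as CPU active/idle time, disk I/O time, and network packet counts, so it offers more detailed and varied monitoring than the default EC2 metrics in CloudWatch.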
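As a rough illustration of those extra metrics, the sketch below builds an agent configuration that collects CPU, disk I/O, and network measurements next to memory. The function name `build_agent_config` is hypothetical, and the measurement names reflect my reading of the CloudWatch agent configuration reference, so check them against the agent version you run.

```python
import json


def build_agent_config(interval_seconds: int = 60) -> dict:
    """Build a CloudWatch agent config covering memory, CPU, disk I/O and network."""
    return {
        "agent": {
            "metrics_collection_interval": interval_seconds,
            "run_as_user": "cwagent",
        },
        "metrics": {
            # Tag every metric with the instance ID so it can be filtered per instance.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "cpu": {
                    "measurement": ["cpu_time_active", "cpu_time_idle"],
                    "resources": ["*"],
                    "totalcpu": True,
                },
                "diskio": {"measurement": ["io_time"], "resources": ["*"]},
                "net": {
                    "measurement": ["packets_sent", "packets_recv"],
                    "resources": ["*"],
                },
            },
        },
    }


if __name__ == "__main__":
    # Print the JSON document that would be stored as the agent configuration.
    print(json.dumps(build_agent_config(), indent=2))
```

The printed JSON has the same shape as the memory-only configuration used later in this post, so it could be stored in Parameter Store in the same way.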
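To monitor a single server, we can install the metrics-collection agent on it by following Monitor memory and disk metrics for Amazon EC2 Linux instances. That procedure involves quite a few steps, and with many servers it has to be repeated on each one, which is tedious. So how do we automate these steps for our servers? The answer is to use AWS Systems Manager.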
AWS Systems Manager setup
First download the CloudFormation template. Either download mem-metrics-cfn-temp.yaml in a browser, or fetch it with wget: wget https://d2908q01vomqb2.cloudfront.net/artifacts/MTBlog/cloudops-1223/mem-metrics-cfn-temp.yaml
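Note: I recommend a small change to the template so that it resolves the EC2 instance's role more accurately, because with containers and in some other scenarios the instance profile name is not always the same as the role name. For example, on my EKS nodes the role and the instance profile have different names.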
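To make the note above concrete, here is a minimal sketch of the lookup: take the instance profile name from its ARN, then ask IAM which role the profile contains instead of assuming the two names match. `FakeIamClient` and the example names are hypothetical stand-ins so the snippet runs without AWS credentials; with real credentials you would pass `boto3.client('iam')` instead.

```python
def instance_profile_name_from_arn(profile_arn: str) -> str:
    """Return the profile name, which is the last path segment of the ARN."""
    return profile_arn.split("/")[-1]


def resolve_role_name(iam_client, profile_arn: str) -> str:
    """Look up the role inside an instance profile rather than guessing it."""
    profile_name = instance_profile_name_from_arn(profile_arn)
    response = iam_client.get_instance_profile(InstanceProfileName=profile_name)
    roles = response["InstanceProfile"]["Roles"]
    if not roles:
        raise LookupError(f"instance profile {profile_name} has no role attached")
    return roles[0]["RoleName"]


class FakeIamClient:
    """Hypothetical stand-in for boto3.client('iam'), for illustration only."""

    def get_instance_profile(self, InstanceProfileName):
        # EKS-style example data: the profile name differs from the role name.
        return {
            "InstanceProfile": {
                "InstanceProfileName": InstanceProfileName,
                "Roles": [{"RoleName": "example-eks-node-role"}],
            }
        }


if __name__ == "__main__":
    arn = "arn:aws:iam::123456789012:instance-profile/eks-example-profile"
    print(instance_profile_name_from_arn(arn))
    print(resolve_role_name(FakeIamClient(), arn))
```

Reusing the profile name as the role name would attach the policy to a role that may not exist, which is why the lookup goes through `get_instance_profile`.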
The complete template is as follows:
```yaml
---
AWSTemplateFormatVersion: '2010-09-09'
Description: A sample template to create a AWS Systems Manager Automation Document that installs Amazon CloudWatch agent, sets up necessary permissions and configures CloudWatch agent to publish memory metrics to CloudWatch
Resources:
  SsmMemMetricsAutomationRole:
    Type: AWS::IAM::Role
    Properties:
      Description: AWS IAM role for AWS Systems Manager to execute automation document
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ssm.amazonaws.com
            Action: sts:AssumeRole
            Condition:
              StringEquals:
                aws:SourceAccount: !Sub ${AWS::AccountId}
              ArnLike:
                aws:SourceArn: !Sub arn:${AWS::Partition}:ssm:*:${AWS::AccountId}:automation-execution/*
      ManagedPolicyArns:
        - !Sub arn:${AWS::Partition}:iam::aws:policy/service-role/AmazonSSMAutomationRole
      Path: /
      Policies:
        - PolicyName: SsmMemMetricIamPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - iam:GetRole
                  - iam:GetInstanceProfile
                  - iam:GetPolicy
                  - iam:AttachRolePolicy
                  - iam:ListInstanceProfiles
                  - ec2:DescribeInstances
                Resource: '*'
  CloudWatchAgentConfigFile:
    Type: AWS::SSM::Parameter
    Properties:
      Name: CloudwatchAgentConfigForMemoryMetricsLinux.json
      Description: Store CloudWatch Agent configuration file as AWS Systems Manager Parameter
      Type: String
      Value: |
        {
          "agent": {
            "metrics_collection_interval": 60,
            "run_as_user": "cwagent"
          },
          "metrics": {
            "append_dimensions": {
              "InstanceId": "${aws:InstanceId}"
            },
            "metrics_collected": {
              "mem": {
                "measurement": [
                  "mem_used_percent"
                ],
                "metrics_collection_interval": 60
              }
            }
          }
        }
  MemoryMetricsRunbook:
    Type: AWS::SSM::Document
    Properties:
      DocumentFormat: YAML
      DocumentType: Automation
      Name: ConfigureMemoryMetricsOnEC2Linux
      Content:
        description: Install CloudWatch Agent, Add permissions to target instances and configure CloudWatch agent to publish metrics
        schemaVersion: '0.3'
        assumeRole: '{{AutomationAssumeRole}}'
        parameters:
          InstanceId:
            type: String
            description: Select instances
          AutomationAssumeRole:
            type: String
            description: (Optional) The ARN of the role that allows Automation to perform the actions on your behalf.
            default: ''
            allowedPattern: ^arn:aws(-cn|-us-gov)?:iam::\d{12}:role\/[\w+=,.@_\/-]+|^$
        mainSteps:
          - name: AttachCloudWatchAgentServerPolicy
            action: aws:executeScript
            onFailure: Abort
            isCritical: true
            timeoutSeconds: 600
            description: |
              ## Find the attached role, attach CloudWatchAgentServer managed policy to the role
            inputs:
              Runtime: python3.8
              Handler: attach_cloudwatch_agent_managed_policy
              InputPayload:
                InstanceIds: '{{InstanceId}}'
              Script: |
                import boto3
                ec2_client = boto3.client('ec2')
                iam_client = boto3.client('iam')
                current_session = boto3.session.Session()
                current_region = current_session.region_name
                partition = current_session.get_partition_for_region(current_region)
                cloudwatchagent_policy_arn = f'arn:{partition}:iam::aws:policy/CloudWatchAgentServerPolicy'
                # instances_id = event['InstanceIds']
                def attach_cloudwatch_agent_managed_policy(event,context):
                    # Define the instance ID for which you want to find the IAM role
                    instance_id = event['InstanceIds']
                    # Use the describe_instances() method to get information about the instance
                    response = ec2_client.describe_instances(InstanceIds=[instance_id])
                    # Get the IAM role from the response
                    # The part below is the modified section: it resolves the EC2 role more accurately
                    # Get the ARN of the instance's IAM instance profile
                    iam_instance_profile_arn = response['Reservations'][0]['Instances'][0]['IamInstanceProfile']['Arn']
                    # Extract the instance profile name from the ARN
                    ec2_instance_profile_name = iam_instance_profile_arn.split('/')[-1]
                    iam_client = boto3.client('iam')
                    # Fetch the instance profile details by name
                    iam_instance_profile = iam_client.get_instance_profile(
                        InstanceProfileName=ec2_instance_profile_name
                    )
                    # Read the EC2 role name from the instance profile details
                    ec2_iam_role_name = iam_instance_profile['InstanceProfile']['Roles'][0]['RoleName']
                    iam_client.attach_role_policy(RoleName=ec2_iam_role_name, PolicyArn=cloudwatchagent_policy_arn)
          - name: installCWAgent
            action: aws:runCommand
            onFailure: Abort
            inputs:
              Parameters:
                action:
                  - Install
                installationType:
                  - Uninstall and reinstall
                name:
                  - AmazonCloudWatchAgent
              DocumentName: AWS-ConfigureAWSPackage
              InstanceIds:
                - '{{InstanceId}}'
          - name: configureCWAgent
            action: aws:runCommand
            inputs:
              DocumentName: AmazonCloudWatch-ManageAgent
              InstanceIds:
                - '{{InstanceId}}'
              Parameters:
                action: configure
                mode: ec2
                optionalConfigurationSource: ssm
                optionalConfigurationLocation: CloudwatchAgentConfigForMemoryMetricsLinux.json
                optionalRestart: 'yes'
Outputs:
  SsmMemMetricsAutomationRoleName:
    Description: Name of the SSM Automation IAM Role
    Value: !Ref SsmMemMetricsAutomationRole
```
Create the CloudFormation stack from the modified YAML file:

```bash
aws cloudformation create-stack --stack-name MemoryMetricsAutomation --template-body file://mem-metrics-cfn-temp.yaml --capabilities CAPABILITY_NAMED_IAM
```
Wait for the CloudFormation stack to finish creating:

```bash
aws cloudformation wait stack-create-complete --stack-name MemoryMetricsAutomation
```
Get the SsmMemMetricsAutomationRoleName output, which the AWS Systems Manager automation needs:

```bash
aws cloudformation describe-stacks --stack-name MemoryMetricsAutomation --query 'Stacks[0].Outputs[?OutputKey==`SsmMemMetricsAutomationRoleName`].OutputValue'
```
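If you would rather script these three CLI calls, the following is one possible boto3 sketch. The helper names (`find_output`, `deploy_stack`, `run_with_boto3`) are hypothetical, and the code assumes the template file sits in the working directory and that credentials and a region are already configured.

```python
def find_output(outputs: list, key: str) -> str:
    """Return the value of one stack output, mirroring the --query filter."""
    for output in outputs:
        if output["OutputKey"] == key:
            return output["OutputValue"]
    raise KeyError(key)


def deploy_stack(cfn_client, stack_name: str, template_body: str) -> str:
    """Create the stack, wait for completion, and return the automation role name."""
    cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    # Same behavior as `aws cloudformation wait stack-create-complete`.
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    stacks = cfn_client.describe_stacks(StackName=stack_name)["Stacks"]
    return find_output(stacks[0].get("Outputs", []), "SsmMemMetricsAutomationRoleName")


def run_with_boto3() -> None:
    """Run the deployment against a real account (not called automatically)."""
    import boto3  # imported here so the helpers above work without boto3 installed

    with open("mem-metrics-cfn-temp.yaml", encoding="utf-8") as template_file:
        body = template_file.read()
    print(deploy_stack(boto3.client("cloudformation"), "MemoryMetricsAutomation", body))
```

Calling `run_with_boto3()` prints the role name that the automation step needs.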
Install the agent with AWS Systems Manager
Open AWS Systems Manager -> Change Management -> Automation.
Next, click Execute automation and select the runbook we created earlier. Here I choose Owned by me -> ConfigureMemoryMetricsOnEC2Linux.
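Then choose Simple execution, select the EC2 instance that needs monitoring, and finally select the role from the SsmMemMetricsAutomationRoleName output created earlier (its name usually looks like MemoryMetricsAutomation-SsmMemMetricsAutomationRole-xxxx). Then submit.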
After you submit, the automated installation steps run. Wait for the execution to complete.
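The same execution can also be started and polled from code. The sketch below is one way to do it with boto3, under a few assumptions: the helper names are hypothetical, the set of statuses treated as terminal is a simplification, and the runbook's AutomationAssumeRole parameter is given the role ARN (looked up from the role name), since its allowed pattern only accepts ARNs.

```python
import time

# Statuses this sketch treats as final; the service defines more than these.
TERMINAL_STATUSES = {"Success", "Failed", "Cancelled", "TimedOut"}


def start_memory_metrics_runbook(ssm_client, instance_id: str, role_arn: str) -> str:
    """Start ConfigureMemoryMetricsOnEC2Linux for one instance; return the execution ID."""
    response = ssm_client.start_automation_execution(
        DocumentName="ConfigureMemoryMetricsOnEC2Linux",
        Parameters={
            "InstanceId": [instance_id],
            "AutomationAssumeRole": [role_arn],
        },
    )
    return response["AutomationExecutionId"]


def wait_for_execution(ssm_client, execution_id: str,
                       poll_seconds: float = 15.0, max_polls: int = 40) -> str:
    """Poll the execution until it reaches a terminal status or polling runs out."""
    for _ in range(max_polls):
        execution = ssm_client.get_automation_execution(
            AutomationExecutionId=execution_id
        )
        status = execution["AutomationExecution"]["AutomationExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"execution {execution_id} did not finish in time")


def run_with_boto3(instance_id: str, role_name: str) -> str:
    """Run against a real account (not called automatically)."""
    import boto3  # imported here so the helpers above work without boto3 installed

    role_arn = boto3.client("iam").get_role(RoleName=role_name)["Role"]["Arn"]
    ssm_client = boto3.client("ssm")
    execution_id = start_memory_metrics_runbook(ssm_client, instance_id, role_arn)
    return wait_for_execution(ssm_client, execution_id)
```

Looping over a list of instance IDs with `start_memory_metrics_runbook` is a simple way to cover several servers without clicking through the console for each one.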
After a few minutes the memory metrics are reported to CloudWatch, where we can finally see the memory usage.
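To check the result without opening the console, a small query sketch follows. It assumes the agent publishes to its default `CWAgent` namespace and that `InstanceId` is the only dimension on `mem_used_percent` with this configuration; both are assumptions to verify in your account, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_memory_query(instance_id: str, minutes: int = 30, now=None) -> dict:
    """Build get_metric_statistics arguments for the last `minutes` of memory data."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "CWAgent",  # assumed default namespace of the agent
        "MetricName": "mem_used_percent",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,
        "Statistics": ["Average"],
    }


def latest_average(datapoints: list):
    """Return the Average of the newest datapoint, or None if there is no data yet."""
    if not datapoints:
        return None
    newest = max(datapoints, key=lambda point: point["Timestamp"])
    return newest["Average"]


def fetch_memory_used_percent(cloudwatch_client, instance_id: str):
    """Query CloudWatch and return the most recent memory usage percentage."""
    response = cloudwatch_client.get_metric_statistics(**build_memory_query(instance_id))
    return latest_average(response["Datapoints"])
```

With credentials configured, `fetch_memory_used_percent(boto3.client("cloudwatch"), instance_id)` returns `None` until the first datapoints arrive.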
Summary
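- An EC2 instance must already have an IAM role attached before memory monitoring is installed, so that the CloudWatch agent has the permissions it needs to send metrics to CloudWatch.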
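- To add monitoring to new EC2 instances later, we can select them in bulk under AWS Systems Manager -> Change Management -> Automation to simplify installation.
- Newly created EC2 instances generally do not have the CloudWatch agent installed by default. Besides AWS Systems Manager, we can also run a custom script automatically when the instance starts, using instance user data. For example, put the script in the user data of a launch template so that instances in an EC2 Auto Scaling group install the agent at launch.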
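As a sketch of the user data approach from the last point, the snippet below generates a startup script and base64-encodes it, which is the form a launch template expects. It assumes an Amazon Linux image where `yum` can install the agent package, and an instance role that may read the Parameter Store entry and publish metrics. The helper names are hypothetical.

```python
import base64

# Control script installed by the CloudWatch agent package.
AGENT_CTL = "/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl"


def build_user_data(
    parameter_name: str = "CloudwatchAgentConfigForMemoryMetricsLinux.json",
) -> str:
    """Return a shell script that installs and configures the agent at first boot."""
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        "# Assumes an Amazon Linux image with the agent in the default repositories.",
        "yum install -y amazon-cloudwatch-agent",
        "# Load the configuration from Parameter Store and start the agent.",
        f"{AGENT_CTL} -a fetch-config -m ec2 -s -c ssm:{parameter_name}",
        "",
    ]
    return "\n".join(lines)


def encode_user_data(script: str) -> str:
    """Base64-encode the script for use as launch template user data."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    print(build_user_data())
```

The encoded string would go into the `UserData` field of the launch template data; the instance role still needs to be attached through the launch template's instance profile.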