Simplifying EC2 Memory Monitoring Setup with AWS Systems Manager

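Cloud providers in China usually install an in-OS monitoring agent on customers' servers and use it to provide CPU, disk, and memory metrics. On AWS this is not the default. Outside China, user privacy and information security are taken very seriously, and security is AWS's top priority. AWS therefore does not install a metrics-collection agent inside a server's operating system, or proactively collect such metrics, without the user's explicit permission.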

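By default, Amazon CloudWatch, AWS's monitoring service, does not monitor total memory or memory usage inside an EC2 instance. Memory is information from inside the user's operating system, and in AWS's product design everything inside the OS is the user's private property and information. So CloudWatch does not collect it by default. You have to set this up yourself by installing the CloudWatch agent on the server.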

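In real projects, OS- and application-level monitoring, with memory monitoring as the typical example, is a very important part of system monitoring. That is why AWS provides the CloudWatch agent, which publishes system-level information from EC2 instances to CloudWatch. The agent can collect much more than memory, for example CPU active/idle time, disk I/O time, and network packet counts. Compared with the default EC2 metrics in CloudWatch, it provides more detailed and more varied monitoring.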
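To make the list of extra metrics concrete, here is a minimal sketch that builds a CloudWatch agent configuration in Python. It extends the memory-only configuration used in the template below with CPU, disk I/O, and network metrics. The helper name `build_agent_config` is hypothetical. The measurement names (`time_active`, `time_idle`, `io_time`, `packets_sent`, `packets_recv`) are my reading of the CloudWatch agent configuration reference, so treat them as assumptions and check them against the docs for your agent version.

```python
import json


def build_agent_config(interval_seconds: int = 60) -> dict:
    """Build a CloudWatch agent config (hypothetical helper, for illustration)."""
    return {
        "agent": {
            "metrics_collection_interval": interval_seconds,
            "run_as_user": "cwagent",
        },
        "metrics": {
            # The agent replaces ${aws:InstanceId} with the real instance ID at runtime.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                # Memory, as in the template's SSM parameter.
                "mem": {"measurement": ["mem_used_percent"]},
                # CPU active/idle time (assumed measurement names).
                "cpu": {"measurement": ["time_active", "time_idle"]},
                # Disk I/O time for all devices (assumed measurement name).
                "diskio": {"measurement": ["io_time"], "resources": ["*"]},
                # Network packet counts for all interfaces (assumed measurement names).
                "net": {"measurement": ["packets_sent", "packets_recv"], "resources": ["*"]},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_agent_config(), indent=2))
```

The printed JSON has the same shape as the value stored in the template's SSM parameter. The `${aws:InstanceId}` placeholder is deliberately left as a literal string for the agent to resolve.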

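Normally, to monitor a server we install a metrics-collection agent on it. You can follow the AWS guide Monitor memory and disk metrics for Amazon EC2 Linux instances, but it involves quite a few steps. If you have many servers to monitor, you must configure each one individually, which is tedious. So how can these steps be applied to our servers automatically? The answer is to automate them with AWS Systems Manager.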

AWS Systems Manager setup

First download the CloudFormation template, mem-metrics-cfn-temp.yaml, either in a browser or with wget: `wget https://d2908q01vomqb2.cloudfront.net/artifacts/MTBlog/cloudops-1223/mem-metrics-cfn-temp.yaml`

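Note: I recommend modifying the template slightly so that it resolves the EC2 instance's role more reliably. With containers and in some other scenarios, the instance profile name is not always the same as the role name.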

For example, on my EKS nodes the role and the instance profile have different names.
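The lookup described in the note can be isolated into a small, testable function. It reads the instance profile from `describe_instances`, then asks IAM which role that profile contains, instead of assuming the two names match. The clients are passed in and are expected to behave like `boto3.client("ec2")` and `boto3.client("iam")`. `FakeEC2`, `FakeIAM`, and the profile and role names are hypothetical stand-ins that mimic an EKS-style mismatch.

```python
def role_name_for_instance(ec2_client, iam_client, instance_id: str) -> str:
    """Resolve the IAM role attached to an EC2 instance via its instance profile."""
    response = ec2_client.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]
    profile = instance.get("IamInstanceProfile")
    if profile is None:
        raise ValueError(f"{instance_id} has no IAM instance profile attached")
    # The profile name is the last segment of the instance profile ARN.
    profile_name = profile["Arn"].split("/")[-1]
    details = iam_client.get_instance_profile(InstanceProfileName=profile_name)
    # Do not assume role name == profile name; read the role from the profile.
    return details["InstanceProfile"]["Roles"][0]["RoleName"]


class FakeEC2:
    """Hypothetical stand-in for boto3's EC2 client."""

    def describe_instances(self, InstanceIds):
        return {"Reservations": [{"Instances": [{
            "InstanceId": InstanceIds[0],
            "IamInstanceProfile": {
                "Arn": "arn:aws:iam::123456789012:instance-profile/eks-node-profile-abc"
            },
        }]}]}


class FakeIAM:
    """Hypothetical stand-in for boto3's IAM client: profile and role names differ."""

    def get_instance_profile(self, InstanceProfileName):
        roles = {"eks-node-profile-abc": "my-eks-node-role"}
        return {"InstanceProfile": {
            "InstanceProfileName": InstanceProfileName,
            "Roles": [{"RoleName": roles[InstanceProfileName]}],
        }}


if __name__ == "__main__":
    print(role_name_for_instance(FakeEC2(), FakeIAM(), "i-0123456789abcdef0"))
    # -> my-eks-node-role
```

Passing the clients in keeps the lookup logic separate from AWS credentials. The same code should work with real boto3 clients.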

The complete modified template is as follows:

```yaml
---
AWSTemplateFormatVersion: '2010-09-09'
Description: A sample template to create an AWS Systems Manager Automation Document that installs Amazon CloudWatch agent, sets up necessary permissions and configures CloudWatch agent to publish memory metrics
  to CloudWatch
Resources:
  SsmMemMetricsAutomationRole:
    Type: AWS::IAM::Role
    Properties:
      Description: AWS IAM role for AWS Systems Manager to execute automation document
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ssm.amazonaws.com
            Action: sts:AssumeRole
            Condition:
              StringEquals:
                aws:SourceAccount: !Sub ${AWS::AccountId}
              ArnLike:
                aws:SourceArn: !Sub arn:${AWS::Partition}:ssm:*:${AWS::AccountId}:automation-execution/*
      ManagedPolicyArns:
        - !Sub arn:${AWS::Partition}:iam::aws:policy/service-role/AmazonSSMAutomationRole
      Path: /
      Policies:
        - PolicyName: SsmMemMetricIamPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - iam:GetRole
                  - iam:GetInstanceProfile
                  - iam:GetPolicy
                  - iam:AttachRolePolicy
                  - iam:ListInstanceProfiles
                  - ec2:DescribeInstances
                Resource: '*'
  CloudWatchAgentConfigFile:
    Type: AWS::SSM::Parameter
    Properties:
      Name: CloudwatchAgentConfigForMemoryMetricsLinux.json
      Description: Store CloudWatch Agent configuration file as AWS Systems Manager Parameter
      Type: String
      Value: |
        {
          "agent": {
            "metrics_collection_interval": 60,
            "run_as_user": "cwagent"
          },
          "metrics": {
            "append_dimensions": {
              "InstanceId": "${aws:InstanceId}"
            },
            "metrics_collected": {
              "mem": {
                "measurement": [
                  "mem_used_percent"
                ],
                "metrics_collection_interval": 60
              }
            }
          }
        }
  MemoryMetricsRunbook:
    Type: AWS::SSM::Document
    Properties:
      DocumentFormat: YAML
      DocumentType: Automation
      Name: ConfigureMemoryMetricsOnEC2Linux
      Content:
        description: Install CloudWatch Agent, Add permissions to target instances and configure CloudWatch agent to publish metrics
        schemaVersion: '0.3'
        assumeRole: '{{AutomationAssumeRole}}'
        parameters:
          InstanceId:
            type: String
            description: Select instances
          AutomationAssumeRole:
            type: String
            description: (Optional) The ARN of the role that allows Automation to perform the actions on your behalf.
            default: ''
            allowedPattern: ^arn:aws(-cn|-us-gov)?:iam::\d{12}:role\/[\w+=,.@_\/-]+|^$
        mainSteps:
          - name: AttachCloudWatchAgentServerPolicy
            action: aws:executeScript
            onFailure: Abort
            isCritical: true
            timeoutSeconds: 600
            description: |
              ## Find the attached role, attach CloudWatchAgentServer managed policy to the role
            inputs:
              Runtime: python3.8
              Handler: attach_cloudwatch_agent_managed_policy
              InputPayload:
                InstanceIds: '{{InstanceId}}'
              Script: |
                import boto3
                ec2_client = boto3.client('ec2')
                iam_client = boto3.client('iam')
                current_session = boto3.session.Session()
                current_region = current_session.region_name
                partition = current_session.get_partition_for_region(current_region)
                cloudwatchagent_policy_arn = f'arn:{partition}:iam::aws:policy/CloudWatchAgentServerPolicy'

                def attach_cloudwatch_agent_managed_policy(event, context):
                  # ID of the instance whose IAM role we need to find
                  instance_id = event['InstanceIds']

                  # describe_instances() returns the instance profile attached to the instance
                  response = ec2_client.describe_instances(InstanceIds=[instance_id])
                  instance = response['Reservations'][0]['Instances'][0]

                  # Modified from the original template: resolve the role through the
                  # instance profile, because the profile name and role name can differ
                  if 'IamInstanceProfile' not in instance:
                    raise ValueError(f'Instance {instance_id} has no IAM instance profile attached')
                  iam_instance_profile_arn = instance['IamInstanceProfile']['Arn']
                  # The instance profile name is the last segment of its ARN
                  ec2_instance_profile_name = iam_instance_profile_arn.split('/')[-1]
                  # Fetch the instance profile details, which list the role it contains
                  iam_instance_profile = iam_client.get_instance_profile(
                      InstanceProfileName=ec2_instance_profile_name
                  )
                  # Attach CloudWatchAgentServerPolicy to the EC2 role
                  ec2_iam_role_name = iam_instance_profile['InstanceProfile']['Roles'][0]['RoleName']
                  iam_client.attach_role_policy(RoleName=ec2_iam_role_name, PolicyArn=cloudwatchagent_policy_arn)
          - name: installCWAgent
            action: aws:runCommand
            onFailure: Abort
            inputs:
              Parameters:
                action:
                  - Install
                installationType:
                  - Uninstall and reinstall
                name:
                  - AmazonCloudWatchAgent
              DocumentName: AWS-ConfigureAWSPackage
              InstanceIds:
                - '{{InstanceId}}'
          - name: configureCWAgent
            action: aws:runCommand
            inputs:
              DocumentName: AmazonCloudWatch-ManageAgent
              InstanceIds:
                - '{{InstanceId}}'
              Parameters:
                action: configure
                mode: ec2
                optionalConfigurationSource: ssm
                optionalConfigurationLocation: CloudwatchAgentConfigForMemoryMetricsLinux.json
                optionalRestart: 'yes'
Outputs:
  SsmMemMetricsAutomationRoleName:
    Description: Name of the SSM Automation IAM Role
    Value: !Ref SsmMemMetricsAutomationRole
```

Create a CloudFormation stack from the modified YAML file:

```bash
aws cloudformation create-stack --stack-name MemoryMetricsAutomation --template-body file://mem-metrics-cfn-temp.yaml --capabilities CAPABILITY_NAMED_IAM
```

Wait for the CloudFormation stack to finish creating:

```bash
aws cloudformation wait stack-create-complete --stack-name MemoryMetricsAutomation
```
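If you prefer scripting over the CLI, the two commands above correspond to boto3's `create_stack` call and its `stack_create_complete` waiter. This is a minimal sketch, assuming the client passed in behaves like `boto3.client("cloudformation")`. The function name `create_stack_and_wait` is hypothetical.

```python
from pathlib import Path

STACK_NAME = "MemoryMetricsAutomation"


def create_stack_and_wait(cfn_client, template_path: str, stack_name: str = STACK_NAME) -> str:
    """Create the stack from a local template and block until creation completes.

    cfn_client is expected to behave like boto3.client("cloudformation").
    """
    template_body = Path(template_path).read_text(encoding="utf-8")
    response = cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        # Same capability as the CLI command: the template creates an IAM role.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    # Equivalent of `aws cloudformation wait stack-create-complete`;
    # the waiter raises botocore's WaiterError if creation fails.
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   create_stack_and_wait(boto3.client("cloudformation"), "mem-metrics-cfn-temp.yaml")
```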

Get SsmMemMetricsAutomationRoleName, the role that the AWS Systems Manager automation needs in order to run:

```bash
aws cloudformation describe-stacks --stack-name MemoryMetricsAutomation --query 'Stacks[0].Outputs[?OutputKey==`SsmMemMetricsAutomationRoleName`].OutputValue' --output text
```
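The stack output is the role *name*, because `!Ref` on an `AWS::IAM::Role` returns its name. The runbook's `AutomationAssumeRole` parameter, however, expects an ARN matching its `allowedPattern`. The console's role picker fills this in for you; when calling the API yourself, you need to build the ARN. A minimal sketch, assuming the role keeps the template's `Path: /`. The helper names are hypothetical, and the account ID would come from something like `boto3.client("sts").get_caller_identity()["Account"]`.

```python
import re

ROLE_OUTPUT_KEY = "SsmMemMetricsAutomationRoleName"
# allowedPattern of the runbook's AutomationAssumeRole parameter, copied from the template.
ASSUME_ROLE_PATTERN = r"^arn:aws(-cn|-us-gov)?:iam::\d{12}:role\/[\w+=,.@_\/-]+|^$"


def role_name_from_outputs(describe_stacks_response: dict, key: str = ROLE_OUTPUT_KEY) -> str:
    """Pick one output value from a describe_stacks() response."""
    for output in describe_stacks_response["Stacks"][0].get("Outputs", []):
        if output["OutputKey"] == key:
            return output["OutputValue"]
    raise KeyError(f"stack has no output named {key!r}")


def role_arn(role_name: str, account_id: str, partition: str = "aws") -> str:
    """Build the role ARN (assumes the role was created with Path '/')."""
    arn = f"arn:{partition}:iam::{account_id}:role/{role_name}"
    # Fail early instead of letting the automation reject the parameter.
    if not re.match(ASSUME_ROLE_PATTERN, arn):
        raise ValueError(f"{arn} does not match the runbook's allowedPattern")
    return arn
```

Use `partition="aws-cn"` for the China regions, which the runbook's pattern also accepts.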

The detailed steps are as follows:

Installing the agent with AWS Systems Manager

Open AWS Systems Manager -> Change Management -> Automation.

Then click Execute automation and select the runbook created earlier. Here I choose Owned by me -> ConfigureMemoryMetricsOnEC2Linux.

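Next, choose Simple execution and select the EC2 instance to be monitored. For the automation role, select the role created earlier: the one returned as SsmMemMetricsAutomationRoleName, usually named MemoryMetricsAutomation-SsmMemMetricsAutomationRole-xxxx. Then submit.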
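The same simple execution can be started through the API with `start_automation_execution`. In SSM Automation, each parameter value is passed as a list of strings. This is a sketch, assuming a client that behaves like `boto3.client("ssm")` and a role ARN (not just the name) for `AutomationAssumeRole`. The function name is hypothetical.

```python
DOCUMENT_NAME = "ConfigureMemoryMetricsOnEC2Linux"


def start_memory_metrics_automation(ssm_client, instance_id: str, role_arn: str) -> str:
    """Start the runbook for one instance and return the execution ID.

    ssm_client is expected to behave like boto3.client("ssm").
    """
    response = ssm_client.start_automation_execution(
        DocumentName=DOCUMENT_NAME,
        # Automation parameter values are passed as lists of strings.
        Parameters={
            "InstanceId": [instance_id],
            "AutomationAssumeRole": [role_arn],
        },
    )
    return response["AutomationExecutionId"]
```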

After you submit, the installation runs automatically through the runbook's steps.

Wait for the execution to complete.
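When the execution is started from a script, waiting for it means polling `get_automation_execution` until the status settles. In this sketch I treat `Success`, `Failed`, `TimedOut`, and `Cancelled` as final. The API defines further statuses (for approvals, change calendars, and so on), so `max_polls` bounds the loop rather than trusting that set to be complete. The function name and polling defaults are assumptions.

```python
import time

# Statuses treated as final in this sketch; the API defines more.
TERMINAL_STATUSES = {"Success", "Failed", "TimedOut", "Cancelled"}


def wait_for_automation(ssm_client, execution_id: str,
                        poll_seconds: float = 10, max_polls: int = 90) -> str:
    """Poll an automation execution until it reaches a terminal status."""
    for _ in range(max_polls):
        response = ssm_client.get_automation_execution(AutomationExecutionId=execution_id)
        status = response["AutomationExecution"]["AutomationExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"automation {execution_id} still running after {max_polls} polls")
```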

After a few minutes, the memory metrics are reported to CloudWatch (by default under the CWAgent namespace), and you can view the memory usage there.
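To check from code that data is arriving, you can query `mem_used_percent` with `get_metric_statistics`. CloudWatch matches dimensions exactly. This sketch assumes the metric carries only the `InstanceId` dimension set by `append_dimensions` in the template's agent config. If your metrics carry extra dimensions, you would need to list them all. The function name and the 30-minute window are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone


def recent_memory_usage(cw_client, instance_id: str, minutes: int = 30) -> list:
    """Return (timestamp, average mem_used_percent) pairs, oldest first.

    cw_client is expected to behave like boto3.client("cloudwatch").
    """
    end = datetime.now(timezone.utc)
    response = cw_client.get_metric_statistics(
        Namespace="CWAgent",  # default namespace of the CloudWatch agent
        MetricName="mem_used_percent",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,  # matches the 60-second collection interval in the config
        Statistics=["Average"],
    )
    # Datapoints are not guaranteed to be ordered, so sort by timestamp.
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"]) for p in points]
```

An empty list a few minutes after the automation succeeds usually points to the agent configuration or the role's permissions.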

Summary

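  1. An EC2 instance must already have an IAM role attached (through an instance profile) before memory monitoring is installed, so that the CloudWatch agent gets the permissions it needs to send metrics to CloudWatch.
  2. To add monitoring to new EC2 instances later, you can go straight to AWS Systems Manager -> Change Management -> Automation and select those instances in bulk, which simplifies installation.
  3. Newly created EC2 instances usually do not have the CloudWatch agent installed. Besides AWS Systems Manager, you can run a custom script automatically at instance launch to perform similar steps, using instance user data. For example, put the script in a launch template's user data so that EC2 instances in an EC2 Auto Scaling group install the agent at launch.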
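The first two points can be combined in a small sketch. It first skips instances without an instance profile, which would fail in the AttachCloudWatchAgentServerPolicy step. It then starts one rate-controlled execution for the rest, similar in spirit to the console's rate-control option. `TargetParameterName`, `Targets` with the `ParameterValues` key, `MaxConcurrency`, and `MaxErrors` are parameters of `start_automation_execution`. The function names and default thresholds are hypothetical. The sketch also assumes a list of instance IDs short enough for a single `describe_instances` call without pagination.

```python
DOCUMENT_NAME = "ConfigureMemoryMetricsOnEC2Linux"


def split_by_instance_profile(ec2_client, instance_ids):
    """Return (ready, missing): instances with and without an IAM instance profile."""
    response = ec2_client.describe_instances(InstanceIds=list(instance_ids))
    ready, missing = [], []
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            if "IamInstanceProfile" in instance:
                ready.append(instance["InstanceId"])
            else:
                missing.append(instance["InstanceId"])
    return ready, missing


def start_rate_controlled_automation(ssm_client, instance_ids, role_arn,
                                     max_concurrency="5", max_errors="1"):
    """Fan the runbook out over several instances with rate control."""
    response = ssm_client.start_automation_execution(
        DocumentName=DOCUMENT_NAME,
        # Shared parameters; InstanceId is filled in per target below.
        Parameters={"AutomationAssumeRole": [role_arn]},
        # One child execution per value, each substituted into InstanceId.
        TargetParameterName="InstanceId",
        Targets=[{"Key": "ParameterValues", "Values": list(instance_ids)}],
        MaxConcurrency=max_concurrency,
        MaxErrors=max_errors,
    )
    return response["AutomationExecutionId"]
```

Instances in `missing` need an instance profile attached first; after that, they can be passed through the same function.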

