Simplifying EC2 Memory Monitoring Setup with AWS Systems Manager

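Cloud providers in China generally install a monitoring agent inside the operating system of each customer server and use it to report CPU, disk, and memory metrics. AWS does not do this by default. Outside China, user privacy and information security are taken very seriously, and security is AWS's top priority, so AWS will not install a metrics-collection agent inside a server's OS or collect metrics from it without the user's explicit permission.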

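By default, CloudWatch, the AWS monitoring service, does not track total or used memory on EC2 instances. Memory usage is information inside the guest operating system, and in AWS's product design everything inside the OS is treated as the user's private property. CloudWatch therefore collects none of it unless you opt in by installing the CloudWatch agent inside the instance.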

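In real projects, system- and application-level monitoring, with memory monitoring as the typical example, is an essential part of overall monitoring. AWS therefore provides the CloudWatch agent to publish system-level information from EC2 instances to CloudWatch. The agent is not limited to memory: it can also collect other system-level metrics such as CPU active/idle time, disk I/O time, and network packet counts, giving more detailed and more varied monitoring than the default EC2 metrics in CloudWatch.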
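To make the list of metrics concrete, here is a minimal sketch of an agent configuration that collects memory, CPU, disk I/O, and network packet metrics. The function name `build_agent_config` is hypothetical, and the choice of measurements beyond `mem_used_percent` is my own assumption about what is useful, not something this article's setup requires.

```python
import json


def build_agent_config(interval_seconds: int = 60) -> dict:
    """Build a CloudWatch agent config covering memory, CPU, disk I/O and network."""
    return {
        "agent": {
            "metrics_collection_interval": interval_seconds,
            "run_as_user": "cwagent",
        },
        "metrics": {
            # Tag every metric with the instance ID it came from.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                # totalcpu=True reports one aggregate series instead of per-core series.
                "cpu": {
                    "measurement": ["cpu_usage_active", "cpu_usage_idle"],
                    "totalcpu": True,
                },
                "diskio": {"measurement": ["io_time"], "resources": ["*"]},
                "net": {
                    "measurement": ["packets_sent", "packets_recv"],
                    "resources": ["*"],
                },
            },
        },
    }


# JSON text that could be stored wherever the agent reads its configuration from.
config_json = json.dumps(build_agent_config(), indent=4)
```

Each extra measurement is published as a custom metric, so it is worth enabling only the ones you intend to watch.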

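To monitor a single server, we can install the metrics agent on it by hand, following "Monitor memory and disk metrics for Amazon EC2 Linux instances". That guide involves quite a few steps, and with many servers to monitor, configuring each one individually is tedious. How can these steps be automated across our servers? The answer is to use AWS Systems Manager.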

AWS Systems Manager Setup

First download the CloudFormation template mem-metrics-cfn-temp.yaml, either in a browser or with wget: wget https://d2908q01vomqb2.cloudfront.net/artifacts/MTBlog/cloudops-1223/mem-metrics-cfn-temp.yaml

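Note: I recommend a small change to the template so that it resolves the EC2 instance's role more reliably, because with containers and in some other scenarios the instance profile name is not always the same as the role name.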

For example, on my EKS nodes the role name and the instance profile name differ.
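The lookup behind that change can be sketched on its own. The version below is a hedged illustration, not the template's exact script: the function names are hypothetical, the two clients are assumed to be boto3 `ec2` and `iam` clients passed in by the caller, and the explicit error handling is my addition.

```python
def profile_name_from_arn(arn: str) -> str:
    """Return the instance profile name, the last path segment of its ARN."""
    return arn.rsplit("/", 1)[-1]


def resolve_role_name(ec2_client, iam_client, instance_id: str) -> str:
    """Find the role behind an instance's profile instead of assuming equal names."""
    response = ec2_client.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]

    # Instances launched without a profile have no IamInstanceProfile key at all.
    profile = instance.get("IamInstanceProfile")
    if profile is None:
        raise ValueError(f"{instance_id} has no instance profile attached")

    details = iam_client.get_instance_profile(
        InstanceProfileName=profile_name_from_arn(profile["Arn"])
    )
    roles = details["InstanceProfile"]["Roles"]
    if not roles:
        raise ValueError(f"the instance profile of {instance_id} contains no role")
    return roles[0]["RoleName"]
```

Passing the clients in as arguments keeps the function easy to exercise with stand-in objects before pointing it at a real account.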

The complete template is as follows:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: A sample template to create an AWS Systems Manager Automation Document that installs Amazon CloudWatch agent, sets up necessary permissions and configures CloudWatch agent to publish memory metrics
  to CloudWatch
Resources:
  SsmMemMetricsAutomationRole:
    Type: AWS::IAM::Role
    Properties:
      Description: AWS IAM role for AWS Systems Manager to execute automation document
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ssm.amazonaws.com
            Action: sts:AssumeRole
            Condition:
              StringEquals:
                aws:SourceAccount: !Sub ${AWS::AccountId}
              ArnLike:
                aws:SourceArn: !Sub arn:${AWS::Partition}:ssm:*:${AWS::AccountId}:automation-execution/*
      ManagedPolicyArns:
        - !Sub arn:${AWS::Partition}:iam::aws:policy/service-role/AmazonSSMAutomationRole
      Path: /
      Policies:
        - PolicyName: SsmMemMetricIamPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - iam:GetRole
                  - iam:GetInstanceProfile
                  - iam:GetPolicy
                  - iam:AttachRolePolicy
                  - iam:ListInstanceProfiles
                  - ec2:DescribeInstances
                Resource: '*'
  CloudWatchAgentConfigFile:
    Type: AWS::SSM::Parameter
    Properties:
      Name: CloudwatchAgentConfigForMemoryMetricsLinux.json
      Description: Store CloudWatch Agent configuration file as AWS Systems Manager Parameter
      Type: String
      Value: |
        {
            "agent": {
                    "metrics_collection_interval": 60,
                    "run_as_user": "cwagent"
            },
            "metrics": {
                    "append_dimensions": {
                        "InstanceId": "${aws:InstanceId}"
                    },
                    "metrics_collected": {
                    "mem": {
                            "measurement": [
                                "mem_used_percent"
                            ],
                            "metrics_collection_interval": 60
                    }
                }
            }
        }
  MemoryMetricsRunbook:
    Type: AWS::SSM::Document
    Properties:
      DocumentFormat: YAML
      DocumentType: Automation
      Name: ConfigureMemoryMetricsOnEC2Linux
      Content:
        description: Install CloudWatch Agent, Add permissions to target instances and configure CloudWatch agent to publish metrics
        schemaVersion: '0.3'
        assumeRole: '{{AutomationAssumeRole}}'
        parameters:
          InstanceId:
            type: String
            description: Select instances
          AutomationAssumeRole:
            type: String
            description: (Optional) The ARN of the role that allows Automation to perform the actions on your behalf.
            default: ''
            allowedPattern: ^arn:aws(-cn|-us-gov)?:iam::\d{12}:role\/[\w+=,.@_\/-]+|^$
        mainSteps:
          - name: AttachCloudWatchAgentServerPolicy
            action: aws:executeScript
            onFailure: Abort
            isCritical: true
            timeoutSeconds: 600
            description: |
              ## Find the attached role, attach CloudWatchAgentServer managed policy to the role
            inputs:
              Runtime: python3.8
              Handler: attach_cloudwatch_agent_managed_policy
              InputPayload:
                InstanceIds: '{{InstanceId}}'
              Script: |
                import boto3
                ec2_client = boto3.client('ec2')
                iam_client = boto3.client('iam')
                current_session = boto3.session.Session()
                current_region = current_session.region_name
                partition = current_session.get_partition_for_region(current_region)
                cloudwatchagent_policy_arn = f'arn:{partition}:iam::aws:policy/CloudWatchAgentServerPolicy'
                def attach_cloudwatch_agent_managed_policy(event, context):
                  # ID of the instance whose IAM role we need to find
                  instance_id = event['InstanceIds']

                  # Look up the instance with describe_instances()
                  response = ec2_client.describe_instances(InstanceIds=[instance_id])

                  # Modified part: resolve the role through the instance profile
                  # instead of assuming the role name equals the profile name.

                  # ARN of the IAM instance profile attached to the instance
                  iam_instance_profile_arn = response['Reservations'][0]['Instances'][0]['IamInstanceProfile']['Arn']
                  # The instance profile name is the last segment of the ARN
                  ec2_instance_profile_name = iam_instance_profile_arn.split('/')[-1]
                  # Fetch the instance profile details by name
                  iam_instance_profile = iam_client.get_instance_profile(
                      InstanceProfileName=ec2_instance_profile_name
                  )
                  # Read the EC2 role name from the instance profile details
                  ec2_iam_role_name = iam_instance_profile['InstanceProfile']['Roles'][0]['RoleName']
                  iam_client.attach_role_policy(RoleName=ec2_iam_role_name, PolicyArn=cloudwatchagent_policy_arn)
          - name: installCWAgent
            action: aws:runCommand
            onFailure: Abort
            inputs:
              Parameters:
                action:
                  - Install
                installationType:
                  - Uninstall and reinstall
                name:
                  - AmazonCloudWatchAgent
              DocumentName: AWS-ConfigureAWSPackage
              InstanceIds:
                - '{{InstanceId}}'
          - name: configureCWAgent
            action: aws:runCommand
            inputs:
              DocumentName: AmazonCloudWatch-ManageAgent
              InstanceIds:
                - '{{InstanceId}}'
              Parameters:
                action: configure
                mode: ec2
                optionalConfigurationSource: ssm
                optionalConfigurationLocation: CloudwatchAgentConfigForMemoryMetricsLinux.json
                optionalRestart: 'yes'
Outputs:
  SsmMemMetricsAutomationRoleName:
    Description: Name of the SSM Automation IAM Role
    Value: !Ref SsmMemMetricsAutomationRole

Create the CloudFormation stack from the modified YAML file:

aws cloudformation create-stack --stack-name MemoryMetricsAutomation --template-body file://mem-metrics-cfn-temp.yaml --capabilities CAPABILITY_NAMED_IAM

Wait for the CloudFormation stack to finish creating:

aws cloudformation wait stack-create-complete --stack-name MemoryMetricsAutomation

Retrieve the SsmMemMetricsAutomationRoleName output, which the AWS Systems Manager automation needs:

aws cloudformation describe-stacks --stack-name MemoryMetricsAutomation --query 'Stacks[0].Outputs[?OutputKey==`SsmMemMetricsAutomationRoleName`].OutputValue'
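For readers who script deployments in Python, the three CLI commands above map onto boto3 roughly as follows. This is a sketch under assumptions: `deploy_stack` and `find_output` are hypothetical helper names, and `cfn_client` is assumed to be a boto3 CloudFormation client created by the caller.

```python
from pathlib import Path

STACK_NAME = "MemoryMetricsAutomation"
OUTPUT_KEY = "SsmMemMetricsAutomationRoleName"


def find_output(describe_stacks_response: dict, key: str) -> str:
    """Pick one output value out of a describe_stacks response."""
    outputs = describe_stacks_response["Stacks"][0].get("Outputs", [])
    for output in outputs:
        if output["OutputKey"] == key:
            return output["OutputValue"]
    raise KeyError(key)


def deploy_stack(cfn_client, template_path: str, stack_name: str = STACK_NAME) -> str:
    """Create the stack, wait for completion, and return the automation role name."""
    cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=Path(template_path).read_text(encoding="utf-8"),
        # The template creates an IAM role, so this capability must be acknowledged.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    # Blocks until the stack reaches CREATE_COMPLETE; raises if creation fails.
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return find_output(cfn_client.describe_stacks(StackName=stack_name), OUTPUT_KEY)
```

With boto3 installed and credentials configured, calling `deploy_stack(boto3.client("cloudformation"), "mem-metrics-cfn-temp.yaml")` should return the generated role name.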


Installing the Agent with AWS Systems Manager

Open AWS Systems Manager -> Change Management -> Automation.

Click Execute automation and choose the runbook created earlier; here I select Owned by me -> ConfigureMemoryMetricsOnEC2Linux.

Then choose Simple execution, select the EC2 instance to be monitored, pick the SsmMemMetricsAutomationRoleName role created earlier (its name usually looks like MemoryMetricsAutomation-SsmMemMetricsAutomationRole-xxxx), and submit.

After submission, the automation runs the installation steps in sequence.

Wait for the execution to complete.
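The same console steps can be driven from code. The sketch below assumes `ssm_client` is a boto3 SSM client supplied by the caller; the helper names, the polling interval, and the set of statuses treated as still running are my own choices. Note that the API expects the role's ARN rather than its name, so a small helper builds one from a hypothetical account ID.

```python
import time

RUNBOOK_NAME = "ConfigureMemoryMetricsOnEC2Linux"
# Statuses that mean the execution has not finished yet.
IN_PROGRESS_STATUSES = {"Pending", "InProgress", "Waiting", "Cancelling"}


def role_arn(account_id: str, role_name: str, partition: str = "aws") -> str:
    """Build the ARN of a role created with Path '/', as in the template."""
    return f"arn:{partition}:iam::{account_id}:role/{role_name}"


def build_parameters(instance_id: str, automation_role_arn: str) -> dict:
    """Automation parameter values are always lists of strings."""
    return {
        "InstanceId": [instance_id],
        "AutomationAssumeRole": [automation_role_arn],
    }


def run_runbook(ssm_client, instance_id, automation_role_arn,
                poll_seconds=10.0, max_polls=90) -> str:
    """Start the runbook for one instance and poll until it leaves the running states."""
    started = ssm_client.start_automation_execution(
        DocumentName=RUNBOOK_NAME,
        Parameters=build_parameters(instance_id, automation_role_arn),
    )
    execution_id = started["AutomationExecutionId"]
    for _ in range(max_polls):
        execution = ssm_client.get_automation_execution(
            AutomationExecutionId=execution_id
        )
        status = execution["AutomationExecution"]["AutomationExecutionStatus"]
        if status not in IN_PROGRESS_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"execution {execution_id} still running after {max_polls} polls")
```

A returned status other than `Success` means the step details in the Automation console are worth checking before retrying.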

After a few minutes the memory metrics are reported to CloudWatch, where we can view memory usage.
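To confirm from code that data is arriving, the metric can be queried directly. This is a sketch that assumes the agent publishes to its default `CWAgent` namespace and that `InstanceId` is the only dimension on the metric, which is what the configuration in this article should produce; `cloudwatch_client` is assumed to be a boto3 CloudWatch client, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def memory_query(instance_id: str, minutes: int = 30, now=None) -> dict:
    """Build get_metric_statistics arguments for mem_used_percent on one instance."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "CWAgent",
        "MetricName": "mem_used_percent",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # matches the 60-second collection interval
        "Statistics": ["Average"],
    }


def latest_memory_percent(cloudwatch_client, instance_id: str):
    """Return the most recent average memory usage, or None if nothing was reported."""
    response = cloudwatch_client.get_metric_statistics(**memory_query(instance_id))
    datapoints = response["Datapoints"]
    if not datapoints:
        return None
    # Datapoints are not guaranteed to be ordered, so pick the newest explicitly.
    return max(datapoints, key=lambda point: point["Timestamp"])["Average"]
```

A `None` result a few minutes after a successful run usually points at the dimensions or namespace not matching what the agent actually publishes.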

Summary

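  1. An EC2 instance must already have an IAM role attached before memory monitoring is installed, so that the CloudWatch agent has the permissions it needs to send metrics to CloudWatch.
  2. To add monitoring to new EC2 instances later, we can select them in bulk under AWS Systems Manager -> Change Management -> Automation, which simplifies the installation.
  3. Newly created EC2 instances generally do not come with the CloudWatch agent installed. Besides AWS Systems Manager, we can run a custom script at instance startup to perform similar steps, using instance user data; for example, putting the script in the user data of a launch template installs the agent at launch on every instance in an EC2 Auto Scaling group.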
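For the bulk case in point 2, one option outside the console is a rate-controlled execution, where Systems Manager runs the runbook once per instance. The sketch below only builds the request; `batch_request` is a hypothetical helper, and the concurrency and error limits are placeholder values to adjust.

```python
def batch_request(instance_ids, automation_role_arn,
                  max_concurrency="5", max_errors="1") -> dict:
    """Build start_automation_execution arguments that fan out over many instances."""
    return {
        "DocumentName": "ConfigureMemoryMetricsOnEC2Linux",
        # The role is shared by every child execution.
        "Parameters": {"AutomationAssumeRole": [automation_role_arn]},
        # Each value in Targets is fed into the runbook's InstanceId parameter.
        "TargetParameterName": "InstanceId",
        "Targets": [{"Key": "ParameterValues", "Values": list(instance_ids)}],
        # boto3 expects these two limits as strings.
        "MaxConcurrency": max_concurrency,
        "MaxErrors": max_errors,
    }
```

The result would be passed to a boto3 SSM client as `ssm_client.start_automation_execution(**batch_request(ids, role_arn))`.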
