OpenStack Nova Scheduler: How Compute Nodes Are Selected

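The core job of the Nova Scheduler is to answer the question "on which compute node should this instance start?". It decides based on the resources the user requests through the flavor (such as CPU, memory, and disk). The default scheduler is the Filter Scheduler, whose workflow has two main phases: filtering and weighing.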
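
The two phases can be illustrated with a minimal, self-contained sketch. It does not use Nova's classes: `Host` and `schedule` are illustrative names, and the only weighing criterion assumed here is free RAM.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_ram_mb: int
    free_vcpus: int


def schedule(hosts, ram_mb, vcpus):
    """Return the name of the best host, or None if every host is filtered out."""
    # Phase 1 (filtering): drop hosts that cannot fit the request.
    candidates = [h for h in hosts
                  if h.free_ram_mb >= ram_mb and h.free_vcpus >= vcpus]
    if not candidates:
        return None
    # Phase 2 (weighing): rank the survivors; here, more free RAM is better.
    best = max(candidates, key=lambda h: h.free_ram_mb)
    return best.name


hosts = [Host("node1", 2048, 8), Host("node2", 8192, 2), Host("node3", 4096, 4)]
print(schedule(hosts, ram_mb=1024, vcpus=4))  # node3
```

node2 has the most free RAM but is removed in the filtering phase because it has too few vCPUs, so weighing only ever sees node1 and node3.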

1. Overall flow

1.1 RPC entry point and coordinator

scheduler/manager.py is where a scheduling request enters and where the overall flow is coordinated.

# scheduler/manager.py (simplified; "..." marks elided code)
class SchedulerManager(manager.Manager):
    """Scheduler manager."""

    def __init__(self, *args, **kwargs):
        super(SchedulerManager, self).__init__(service_name='scheduler', *args, **kwargs)
        self.client = scheduler_client.SchedulerClient()  # client used to call Placement
        self.host_manager = host_manager.HostManager()    # host manager

    @messaging.expected_exceptions(exception.NoValidHost)
    def select_destinations(self, context, request_spec, filter_properties,
                           spec_obj=None, ...):
        """Select target hosts for instances."""
        # ... argument checks and preparation ...

        # 1. Query the Placement API through the client to get the allocation
        #    candidates (alloc_reqs) and the provider summaries (provider_summaries)
        alloc_reqs, provider_summaries = self.client.get_allocation_candidates(
            context, spec_obj)

        # 2. With no candidates, raise immediately and skip the remaining work
        if not alloc_reqs:
            raise exception.NoValidHost(reason="")

        # 3. Ask the HostManager to build host states from what Placement returned
        host_states = self.host_manager.get_host_states_from_provider_summaries(
            context, provider_summaries, ...)

        # 4. Ask the HostManager to filter and weigh; this is the core decision logic
        selections = self.host_manager.select_destinations(
            context, spec_obj, host_states, ...)

        # 5. Claim the resources against the Placement API through the client
        self.client.claim_resources(context, spec_obj, selections, alloc_reqs)

        return selections
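
The five steps can be exercised end to end with a small stand-alone sketch. Everything in it is an assumption made for illustration: `FakePlacement` is a hypothetical in-memory stand-in for the Placement service, `NoValidHost` mimics the exception of the same name, and steps 3 and 4 are collapsed into "pick the candidate with the most free RAM".

```python
class NoValidHost(Exception):
    """Stand-in for nova.exception.NoValidHost."""


class FakePlacement:
    """Hypothetical in-memory stand-in for the Placement service."""

    def __init__(self, free_ram_by_host):
        self.free_ram = dict(free_ram_by_host)

    def get_allocation_candidates(self, ram_mb):
        # Hosts that could satisfy the request, with their free RAM.
        return {h: free for h, free in self.free_ram.items() if free >= ram_mb}

    def claim_resources(self, host, ram_mb):
        # Record the allocation so later requests see reduced capacity.
        self.free_ram[host] -= ram_mb


def select_destination(placement, ram_mb):
    candidates = placement.get_allocation_candidates(ram_mb)   # step 1
    if not candidates:                                          # step 2
        raise NoValidHost("no allocation candidates")
    # Steps 3-4, collapsed: choose the candidate with the most free RAM.
    host = max(candidates, key=candidates.get)
    placement.claim_resources(host, ram_mb)                     # step 5
    return host


placement = FakePlacement({"node1": 1024, "node2": 4096})
print(select_destination(placement, 2048))  # node2
print(select_destination(placement, 2048))  # node2 again; it now has 0 MB free
```

A third 2048 MB request would raise `NoValidHost`, because the claims made in step 5 have already reduced the capacity that step 1 reports.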

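1.2 Host state management and policy execution

scheduler/host_manager.py is the real core of the scheduling logic: it manages host state and runs the filtering and weighing steps.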

# scheduler/host_manager.py (simplified; "..." marks elided code)
class HostManager(object):
    """Manage HostStates and implements scheduling logic."""

    def __init__(self):
        self.filter_handler = filters.HostFilterHandler()    # filter loader
        self.weight_handler = weights.HostWeightHandler()    # weigher loader
        self.filter_classes = self.filter_handler.get_matching_classes(
            CONF.filter_scheduler.enabled_filters)           # load the configured filter classes
        self.weight_classes = self.weight_handler.get_matching_classes(
            CONF.filter_scheduler.weight_classes)            # load the configured weigher classes

    def get_filtered_hosts(self, host_states, spec_obj, ...):
        """Filter hosts based on specified filters."""
        filtered_hosts = []
        for host_state in host_states:
            # Run every filter, in order, against each host
            if self.host_passes(host_state, spec_obj):
                filtered_hosts.append(host_state)
        return filtered_hosts

    def host_passes(self, host_state, spec_obj):
        """Check if a host passes all filters."""
        for filter_cls in self.filter_classes:
            filter_obj = filter_cls()
            # Return False as soon as any filter rejects the host
            if not filter_obj.host_passes(host_state, spec_obj):
                LOG.debug("Host %(host)s failed filter %(filter)s",
                          {'host': host_state.host, 'filter': filter_cls.__name__})
                return False
        return True  # True only if every filter passed

    def get_weighed_hosts(self, hosts, spec_obj, ...):
        """Weigh the hosts based on specified weighers."""
        weighed_hosts = []
        for host in hosts:
            weight = 0
            for weigher_cls in self.weight_classes:
                weigher_obj = weigher_cls()
                # Multiply each weigher's score by its multiplier and accumulate.
                # (Simplified: upstream normalizes raw scores before multiplying.)
                weight += (weigher_obj._weigh_object(host, spec_obj) *
                           weigher_obj.weight_multiplier)
            weighed_hosts.append(weights.WeighedHost(host, weight))
        # Sort by final weight, highest first
        weighed_hosts.sort(key=lambda x: x.weight, reverse=True)
        return weighed_hosts

    def select_destinations(self, context, spec_obj, host_states, ...):
        """The main scheduling process."""
        # Run the methods above to complete scheduling
        filtered_hosts = self.get_filtered_hosts(host_states, spec_obj)
        if not filtered_hosts:
            raise exception.NoValidHost(reason="")
        weighed_hosts = self.get_weighed_hosts(filtered_hosts, spec_obj)
        # Pick the best host
        return [weighed_hosts[0]]

1.3 Filter example

# scheduler/filters/ram_filter.py
class RamFilter(filters.BaseHostFilter):
    """Filter out hosts with insufficient RAM."""

    def host_passes(self, host_state, spec_obj):
        """Return True if host has sufficient RAM."""
        # Memory requested in the request spec
        requested_ram = spec_obj.memory_mb
        # Free memory reported for the host. This simplified check ignores the
        # overcommit ratio; the fuller RamFilter version applies ram_allocation_ratio.
        free_ram_mb = host_state.free_ram_mb

        # Core logic: pass if free memory >= requested memory, otherwise reject
        return free_ram_mb >= requested_ram

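1.4 Weigher example

# scheduler/weights/ram.py (simplified)
class RAMWeigher(weights.BaseHostWeigher):
    @property
    def weight_multiplier(self):
        # Multiplier read from ram_weight_multiplier in nova.conf.
        # (Upstream Nova exposes this as a method rather than a property.)
        return CONF.filter_scheduler.ram_weight_multiplier

    def _weigh_object(self, host_state, spec_obj):
        """Calculate weight based on free RAM."""
        # Core logic: return the host's free memory as the raw weight.
        # Final weight = free_ram_mb * ram_weight_multiplier
        # Positive multiplier: more free memory -> higher weight -> preferred (spreading)
        # Negative multiplier: less free memory -> higher weight -> preferred (stacking)
        return host_state.free_ram_mb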
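
The effect of the multiplier's sign is easy to check in isolation. The sketch below assumes the final weight is simply free RAM times the multiplier; `rank_hosts` is an illustrative helper, not a Nova function.

```python
def rank_hosts(free_ram_by_host, ram_weight_multiplier):
    """Return host names ordered from most to least preferred."""
    weights = {host: free * ram_weight_multiplier
               for host, free in free_ram_by_host.items()}
    return sorted(weights, key=weights.get, reverse=True)


free_ram = {"node1": 1024, "node2": 8192, "node3": 4096}
print(rank_hosts(free_ram, 1.0))   # ['node2', 'node3', 'node1']  (spreading)
print(rank_hosts(free_ram, -1.0))  # ['node1', 'node3', 'node2']  (stacking)
```

With a positive multiplier the emptiest host comes first, which spreads instances out; with a negative multiplier the fullest host comes first, which packs instances together.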

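2. Filters

All filters inherit from BaseHostFilter and implement host_passes, which decides whether a host meets a given condition. During scheduling the filters run in the order configured in enabled_filters, and only hosts that pass every filter move on to the weighing phase.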
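
Because a host is dropped as soon as one filter rejects it, the order of the filters determines how much work the later ones do. The following sketch assumes the filters are applied one at a time to the hosts that are still in play; `run_filters` and the dictionary-based hosts are illustrative, not Nova's data structures.

```python
def run_filters(hosts, filters):
    """Apply filters in order; return the survivors and a per-filter trace."""
    trace = []
    remaining = list(hosts)
    for name, predicate in filters:
        remaining = [h for h in remaining if predicate(h)]
        trace.append((name, len(remaining)))
        if not remaining:
            break  # nothing left, so later filters are skipped
    return remaining, trace


hosts = [
    {"name": "node1", "up": True, "free_ram_mb": 512},
    {"name": "node2", "up": False, "free_ram_mb": 8192},
    {"name": "node3", "up": True, "free_ram_mb": 4096},
]
filters = [
    ("ComputeFilter", lambda h: h["up"]),
    ("RamFilter", lambda h: h["free_ram_mb"] >= 2048),
]
survivors, trace = run_filters(hosts, filters)
print([h["name"] for h in survivors])  # ['node3']
print(trace)  # [('ComputeFilter', 2), ('RamFilter', 1)]
```

The trace shows how many hosts remain after each filter, which is a convenient way to see which filter eliminated the hosts when scheduling fails. Putting cheap, highly selective filters first keeps the later ones from examining hosts that were never viable.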

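Configuration

Filters run in the order given by enabled_filters in nova.conf. RamFilter, CoreFilter, and DiskFilter exist only in releases before Train; from Train onward they are removed and Placement performs the equivalent checks.

[filter_scheduler]
enabled_filters = ComputeFilter,RamFilter,CoreFilter,DiskFilter,AvailabilityZoneFilter,ServerGroupAntiAffinityFilter

[DEFAULT]
# CPU overcommit ratio (historical default 16.0)
cpu_allocation_ratio = 16.0
# Memory overcommit ratio (historical default 1.5)
ram_allocation_ratio = 1.5
# Disk overcommit ratio (historical default 1.0)
disk_allocation_ratio = 1.0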
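
To see what these ratios mean in practice, the arithmetic used by the resource filters (total capacity times the ratio, minus what is already used) can be worked through for one host. The host sizes and usage figures below are made up for illustration, and `free_after_overcommit` is a hypothetical helper.

```python
def free_after_overcommit(total, used, allocation_ratio):
    """Capacity still schedulable: total * ratio - used."""
    return int(total * allocation_ratio) - used


# A host with 32 physical cores, 64 GiB of RAM, and 1000 GB of local disk.
print(free_after_overcommit(32, 100, 16.0))      # 412 vCPUs still schedulable
print(free_after_overcommit(65536, 60000, 1.5))  # 38304 MB of RAM still schedulable
print(free_after_overcommit(1000, 900, 1.0))     # 100 GB of disk still schedulable
```

With a CPU ratio of 16.0 the host accepts far more vCPUs than it has cores, whereas a disk ratio of 1.0 means disk is never overcommitted.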

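2.1 RamFilter: memory filtering

Filters out compute nodes without enough memory, taking the memory overcommit ratio (ram_allocation_ratio, which is adjustable) into account.

# nova/scheduler/filters/ram_filter.py (simplified)
class RamFilter(filters.BaseHostFilter):
    def host_passes(self, host_state, spec_obj):
        # Memory requested by the instance
        requested_ram = spec_obj.memory_mb
        # Memory overcommit ratio reported for this host (1.5 in the configuration above)
        ram_ratio = host_state.ram_allocation_ratio
        # Usable memory = total memory * overcommit ratio - used memory
        total_ram_mb = host_state.total_usable_ram_mb
        used_ram_mb = total_ram_mb - host_state.free_ram_mb
        usable_ram = int(total_ram_mb * ram_ratio) - used_ram_mb
        return usable_ram >= requested_ram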

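2.2 CoreFilter: CPU core filtering

Filters out compute nodes without enough vCPUs, taking the CPU overcommit ratio (cpu_allocation_ratio, which is adjustable) into account.

# nova/scheduler/filters/core_filter.py (simplified)
class CoreFilter(filters.BaseHostFilter):
    def host_passes(self, host_state, spec_obj):
        # vCPUs requested by the instance
        instance_vcpus = spec_obj.vcpus
        # vCPU limit = physical cores * CPU overcommit ratio
        vcpus_limit = host_state.vcpus_total * host_state.cpu_allocation_ratio
        # Free vCPUs = vCPU limit - vCPUs already allocated
        free_vcpus = vcpus_limit - host_state.vcpus_used
        return free_vcpus >= instance_vcpus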

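2.3 ComputeFilter: compute service state filtering

Filters out compute nodes whose nova-compute service is not running.

# nova/scheduler/filters/compute_filter.py (simplified)
class ComputeFilter(filters.BaseHostFilter):
    def host_passes(self, host_state, spec_obj):
        # Pass only if the host's nova-compute service is "up".
        # service_up is a simplified stand-in for the real service checks.
        return host_state.service_up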

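2.4 AvailabilityZoneFilter: availability zone filtering

Filters out compute nodes outside the requested availability zone. The zone can be specified at instance creation with --availability-zone.

# nova/scheduler/filters/availability_zone_filter.py (simplified)
class AvailabilityZoneFilter(filters.BaseHostFilter):
    def host_passes(self, host_state, spec_obj):
        # Availability zone requested for the instance, if any
        requested_az = spec_obj.availability_zone
        if not requested_az:
            return True  # no zone requested: every host passes
        # host_state.availability_zone is a simplification; upstream derives
        # the host's zone from host aggregate metadata.
        return host_state.availability_zone == requested_az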

2.5 ServerGroupAffinityFilter: server group affinity filtering

Ensures that instances in the same server group are scheduled onto the same node (affinity policy). Specify --policy affinity when creating the server group.

# nova/scheduler/filters/affinity_filter.py (simplified)
class ServerGroupAffinityFilter(filters.BaseHostFilter):
    def host_passes(self, host_state, spec_obj):
        group = spec_obj.instance_group
        if not group or group.policy != 'affinity':
            return True
        # The first member of a group may land anywhere; after that, only
        # hosts that already run a member of the group pass.
        if not group.hosts:
            return True
        return host_state.host in group.hosts

2.6 ServerGroupAntiAffinityFilter: server group anti-affinity filtering

Ensures that instances in the same server group are scheduled onto different nodes (anti-affinity policy).

Specify --policy anti-affinity when creating the server group.

# nova/scheduler/filters/affinity_filter.py (simplified)

class ServerGroupAntiAffinityFilter(filters.BaseHostFilter):
    def host_passes(self, host_state, spec_obj):
        group = spec_obj.instance_group
        if not group or group.policy != 'anti-affinity':
            return True
        # Pass only hosts that do not yet run a member of the group
        return host_state.host not in (group.hosts or [])
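
The two policies are mirror images of each other, which a small simulation makes concrete. This is a minimal sketch under two assumptions: the hosts already used by the group are tracked in a plain set, and the first member of an affinity group may land on any host. The function names are illustrative.

```python
def passes_affinity(host, group_hosts):
    # The first member of the group may land anywhere.
    return not group_hosts or host in group_hosts


def passes_anti_affinity(host, group_hosts):
    return host not in group_hosts


def place(all_hosts, group_hosts, policy_check):
    """Pick the first host that satisfies the policy, or None."""
    for host in all_hosts:
        if policy_check(host, group_hosts):
            group_hosts.add(host)
            return host
    return None


all_hosts = ["node1", "node2", "node3"]

anti = set()
print([place(all_hosts, anti, passes_anti_affinity) for _ in range(4)])
# ['node1', 'node2', 'node3', None]

aff = set()
print([place(all_hosts, aff, passes_affinity) for _ in range(3)])
# ['node1', 'node1', 'node1']
```

Under anti-affinity the fourth instance cannot be placed because the group already occupies every host. Under affinity every instance follows the first one, which is why the first-member case needs the explicit "empty group passes" rule.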

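2.7 DiskFilter: disk space filtering

Filters out compute nodes without enough local disk space, taking the disk overcommit ratio (disk_allocation_ratio, which is adjustable) into account.

# nova/scheduler/filters/disk_filter.py (simplified)

class DiskFilter(filters.BaseHostFilter):
    def host_passes(self, host_state, spec_obj):
        # Disk requested by the instance, in MB: root + ephemeral (GB) plus swap (MB)
        requested_disk_mb = (1024 * (spec_obj.root_gb + spec_obj.ephemeral_gb) +
                             spec_obj.swap)
        # Disk overcommit ratio reported for this host (1.0 in the configuration above)
        disk_ratio = host_state.disk_allocation_ratio
        # Usable disk = total disk * overcommit ratio - used disk
        total_disk_mb = host_state.total_usable_disk_gb * 1024
        used_disk_mb = total_disk_mb - host_state.free_disk_mb
        usable_disk_mb = total_disk_mb * disk_ratio - used_disk_mb
        return usable_disk_mb >= requested_disk_mb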

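2.8 PciPassthroughFilter: PCI device filtering

Filters out compute nodes that cannot supply the PCI devices the instance requests (for example GPUs or FPGAs).

Once the filter is enabled, the PCI devices must also be described in nova.conf. The alias below names the device so that a flavor can request it; the compute-node whitelist entry that exposes the device is not shown.

[filter_scheduler]
enabled_filters = PciPassthroughFilter,...
[pci]
alias = { "name": "nvidia_gpu", "product_id": "1eb8", "vendor_id": "10de", "device_type": "type-PCI" }

# nova/scheduler/filters/pci_passthrough_filter.py (simplified)

class PciPassthroughFilter(filters.BaseHostFilter):
    def host_passes(self, host_state, spec_obj):
        # PCI device requests attached to the instance
        pci_requests = spec_obj.pci_requests
        if not pci_requests or not pci_requests.requests:
            return True
        # A host that reports no PCI stats cannot satisfy any request
        if not host_state.pci_stats:
            return False
        # Check whether the host has PCI devices that satisfy the requests
        return host_state.pci_stats.support_requests(pci_requests.requests)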
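
What "supporting the requests" means can be sketched with plain dictionaries. This is not Nova's PCI stats implementation: the pool and request layouts are assumptions made for illustration, and the matching is a simple greedy pass that takes devices from the first matching pool.

```python
import copy


def support_requests(pools, requests):
    """Return True if the pools can satisfy every request at the same time."""
    pools = copy.deepcopy(pools)  # work on a copy; never mutate the caller's data
    for request in requests:
        needed = request["count"]
        for pool in pools:
            matches = all(pool.get(k) == v for k, v in request["spec"].items())
            if matches and needed > 0:
                taken = min(needed, pool["count"])
                pool["count"] -= taken
                needed -= taken
        if needed > 0:
            return False
    return True


pools = [{"vendor_id": "10de", "product_id": "1eb8", "count": 2}]
gpu = {"vendor_id": "10de", "product_id": "1eb8"}
print(support_requests(pools, [{"count": 1, "spec": gpu}]))  # True
print(support_requests(pools, [{"count": 3, "spec": gpu}]))  # False
```

Devices consumed by one request are subtracted before the next request is checked, so two requests cannot both be satisfied by the same physical device.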

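3. Weighers

Weighers score and sort the hosts that survived filtering; the host with the highest score is chosen first. Combining per-weigher scores with configurable multipliers gives a flexible scheduling policy, so host priority can be tuned to the workload (for example, preferring free memory or low CPU load).

All weighers inherit from nova.scheduler.weights.BaseHostWeigher and implement _weigh_object(), which computes that weigher's score for one host. The key attribute is weight_multiplier, the weight multiplier (default 1.0), which is configurable: the larger its absolute value, the more that weigher influences the total score.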
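
Different weighers return values in different units (megabytes, cores, and so on), so upstream Nova normalizes each weigher's raw values to the range [0, 1] before applying the multiplier and summing. The sketch below illustrates that idea with plain dictionaries; `normalize` and `weigh_hosts` are illustrative helpers rather than Nova's API.

```python
def normalize(values):
    """Scale values linearly to [0, 1]; all zeros if they are all equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weigh_hosts(hosts, weighers):
    """weighers is a list of (multiplier, raw_score_function) pairs."""
    totals = [0.0] * len(hosts)
    for multiplier, raw_fn in weighers:
        normalized = normalize([raw_fn(h) for h in hosts])
        totals = [t + multiplier * n for t, n in zip(totals, normalized)]
    ranked = sorted(zip(hosts, totals), key=lambda pair: pair[1], reverse=True)
    return [(h["name"], round(score, 3)) for h, score in ranked]


hosts = [
    {"name": "node1", "free_ram_mb": 2048, "free_vcpus": 16},
    {"name": "node2", "free_ram_mb": 8192, "free_vcpus": 4},
    {"name": "node3", "free_ram_mb": 5120, "free_vcpus": 10},
]
weighers = [
    (2.0, lambda h: h["free_ram_mb"]),  # RAM counts double
    (1.0, lambda h: h["free_vcpus"]),
]
print(weigh_hosts(hosts, weighers))
# [('node2', 2.0), ('node3', 1.5), ('node1', 1.0)]
```

Without normalization the RAM values (thousands of MB) would swamp the vCPU values (tens of cores) regardless of the multipliers. After normalization the multipliers alone decide how much each criterion counts.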

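Configuration example

[filter_scheduler]
# 1. Enable weighers (order does not affect the score, only the order in which the code runs).
#    weight_classes takes full class paths. LoadWeigher is not shipped with upstream Nova;
#    its module path here assumes a custom weigher installed at that location.
weight_classes = nova.scheduler.weights.ram.RAMWeigher,nova.scheduler.weights.cpu.CPUWeigher,nova.scheduler.weights.load.LoadWeigher

# 2. Weight multipliers (tune to the workload)
# Memory-intensive instances: raise the RAMWeigher multiplier
ram_weight_multiplier = 1.8
# CPU-intensive instances: raise the CPU weigher multiplier
cpu_weight_multiplier = 1.2
# Load balancing: moderately raise the LoadWeigher multiplier.
# This option is not an upstream one; the custom weigher would have to register it.
load_weight_multiplier = 0.8

# 3. Optional tuning related to weighing
# Host state cache lifetime in seconds (reduces repeated computation).
# NOTE: verify this option against your Nova release; it is not listed among
# the upstream scheduler options.
scheduler_cache_expiry = 60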

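3.1 RAMWeigher: memory weigher (enabled by default)

  • Scores hosts by their proportion of free memory: the more free memory, the higher the score, so hosts with plenty of memory are preferred.
  • Suited to memory-intensive instances (such as big-data or caching services).

# nova/scheduler/weights/ram.py (simplified)

class RAMWeigher(weights.BaseHostWeigher):
    # Default weight multiplier (overridable through ram_weight_multiplier)
    weight_multiplier = 1.0

    def _weigh_object(self, host_state, spec_obj):
        """Memory score: the proportion of the host's memory that is free."""
        # Return 0 when total memory is 0 (avoids division by zero)
        if host_state.total_usable_ram_mb == 0:
            return 0.0
        # Free-memory ratio = free memory / total memory.
        # The weight handler multiplies this score by weight_multiplier.
        # (Upstream returns free_ram_mb directly; a ratio is used here to
        # illustrate proportional scoring.)
        return host_state.free_ram_mb / host_state.total_usable_ram_mb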

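3.2 CoreWeigher (upstream name: CPUWeigher): CPU weigher

  • Scores hosts by their proportion of free CPU cores: the more free CPU, the higher the score, so hosts with idle CPU are preferred.
  • Suited to CPU-intensive instances (such as compute or rendering services).

# nova/scheduler/weights/cpu.py (simplified)

class CPUWeigher(weights.BaseHostWeigher):
    # Default weight multiplier (overridable through cpu_weight_multiplier)
    weight_multiplier = 1.0

    def _weigh_object(self, host_state, spec_obj):
        """CPU score: the proportion of the host's vCPUs that is free."""
        if host_state.vcpus_total == 0:
            return 0.0
        # Free-CPU ratio = (total vCPUs - used vCPUs) / total vCPUs.
        # The weight handler multiplies this score by weight_multiplier.
        return (host_state.vcpus_total - host_state.vcpus_used) / host_state.vcpus_total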

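3.3 LoadWeigher: load weigher

  • Scores hosts by overall load (CPU plus memory utilization): the lower the load, the higher the score, so lightly loaded hosts are preferred.
  • Suited to deployments that want balanced load across hosts and want to avoid overloading a single host.
  • LoadWeigher is not among the weighers shipped with upstream Nova; the code below is a sketch of a custom weigher.

# nova/scheduler/weights/load.py (hypothetical path for a custom weigher)

class LoadWeigher(weights.BaseHostWeigher):
    weight_multiplier = 1.0

    def _weigh_object(self, host_state, spec_obj):
        """Load score: 1 - average load."""
        if host_state.vcpus_total == 0 or host_state.total_usable_ram_mb == 0:
            return 0.0
        # CPU utilization = used vCPUs / total vCPUs
        cpu_usage = host_state.vcpus_used / host_state.vcpus_total
        # Memory utilization = (total memory - free memory) / total memory
        total_ram_mb = host_state.total_usable_ram_mb
        mem_usage = (total_ram_mb - host_state.free_ram_mb) / total_ram_mb
        # Average load = (CPU utilization + memory utilization) / 2
        avg_load = (cpu_usage + mem_usage) / 2
        # The lower the load, the higher the score; the weight handler
        # multiplies this score by weight_multiplier.
        return 1 - avg_load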
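
The load formula can be checked by hand with a few hosts. The sketch below reimplements it as a plain function so it runs without Nova; `load_score` and the sample figures are illustrative assumptions.

```python
def load_score(vcpus_used, vcpus_total, used_ram_mb, total_ram_mb, multiplier=1.0):
    """(1 - average of CPU and memory utilization) * multiplier."""
    if vcpus_total == 0 or total_ram_mb == 0:
        return 0.0
    cpu_usage = vcpus_used / vcpus_total
    mem_usage = used_ram_mb / total_ram_mb
    avg_load = (cpu_usage + mem_usage) / 2
    return (1 - avg_load) * multiplier


print(load_score(8, 32, 16384, 65536))   # 0.75  (lightly loaded host)
print(load_score(24, 32, 49152, 65536))  # 0.25  (heavily loaded host)
print(load_score(64, 32, 65536, 65536))  # -0.5  (vCPUs overcommitted 2x, RAM full)
```

The last case shows a property worth keeping in mind: with CPU overcommit, used vCPUs can exceed the physical total, so utilization can rise above 1.0 and the score can become negative. The ordering of hosts is still correct, but the score is no longer confined to [0, 1].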