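Learning Hadoop Architecture for Industry, Series Article 14: Hadoop Cluster Deployment - End-to-End Practice from Planning to Go-Live

Part 14: Hadoop Cluster Deployment - End-to-End Practice from Planning to Go-Live

Introduction: Deploying the cluster behind an industrial big data platform is a systems-engineering task. It has to balance hardware selection, network planning, software architecture, and disaster recovery and backup. Starting from enterprise-grade Hadoop cluster planning, this part walks through bare-metal deployment, containerized deployment (Kubernetes), and automated operations tooling, to help you build a production-grade Hadoop cluster.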

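14.1 Industrial Hadoop Cluster Planning

14.1.1 Hardware Selection and Capacity Planning

Hardware selection guide for an industrial big data platform:

┌───────────────────────────────────────────────────────────────────────────────────────────┐
│                                 Hardware Selection Matrix                                 │
├─────────────────┬───────────────────────────┬───────────────────────┬─────────────────────┤
│ Component       │ CPU                       │ Memory                │ Disk                │
├─────────────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│ NameNode        │ 32+ cores                 │ 256 GB+               │ SSD 1 TB+           │
│                 │ (high clock, low latency) │ (metadata cache)      │ (RAID 10)           │
├─────────────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│ DataNode        │ 32-64 cores               │ 128-256 GB            │ HDD 12 TB+          │
│                 │ (parallel processing)     │ (block cache)         │ (JBOD)              │
├─────────────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│ ResourceManager │ 32 cores                  │ 128 GB                │ SSD 512 GB          │
│                 │                           │                       │ (local logs)        │
├─────────────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│ NodeManager     │ 32-64 cores               │ 64-128 GB             │ HDD 4-8 TB          │
│                 │ (container resources)     │ (task execution)      │ (intermediate data) │
├─────────────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│ ZooKeeper       │ 16 cores                  │ 32 GB                 │ SSD 200 GB          │
│                 │ (low latency)             │ (in-memory data tree) │ (transaction log)   │
├─────────────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│ Kafka Broker    │ 32 cores                  │ 64-128 GB             │ SSD 2 TB+           │
│                 │                           │ (page cache)          │ (high throughput)   │
└─────────────────┴───────────────────────────┴───────────────────────┴─────────────────────┘

Disk configuration strategy:
- Data disks: use JBOD (Just a Bunch Of Disks) and avoid RAID overhead, since HDFS replication already provides redundancy
- Log disks: RAID 1 for high availability
- SSD: for NameNode metadata, Kafka logs, and write-ahead logs (WAL)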

14.1.2 Capacity Planning Formulas

Hadoop cluster capacity planning model:

┌──────────────────────────────────────────────────────────────────────┐
│                      Capacity Planning Formulas                      │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  1. Storage capacity                                                 │
│     ┌─────────────────────────────────────────────────────────────┐  │
│     │ RawCapacity = Σ(DiskSize × DiskCount × NodeCount)           │  │
│     │                                                             │  │
│     │ UsableCapacity = RawCapacity ÷ ReplicationFactor ×          │  │
│     │                  (1 - Overhead) × (1 - Reserved)            │  │
│     │                                                             │  │
│     │ Parameters:                                                 │  │
│     │ - ReplicationFactor: default 3 (recommended for industry)   │  │
│     │ - Overhead: 10% (HDFS internal overhead)                    │  │
│     │ - Reserved: 5% (reserved space)                             │  │
│     └─────────────────────────────────────────────────────────────┘  │
│                                                                      │
│  2. Memory planning                                                  │
│     ┌─────────────────────────────────────────────────────────────┐  │
│     │ YarnMemory = TotalMemory × (1 - SystemOverhead) ×           │  │
│     │              (1 - HBaseOverhead) × YARNAllocationRatio      │  │
│     │                                                             │  │
│     │ ContainerCount = Floor(YarnMemory / ContainerSize)          │  │
│     │                                                             │  │
│     │ Recommended settings:                                       │  │
│     │ - System overhead: 10%                                      │  │
│     │ - HBase overhead: 20-30% (if co-located on the node)        │  │
│     │ - YARN allocation ratio: 80%                                │  │
│     │ - Container size: 4-8 GB each                               │  │
│     └─────────────────────────────────────────────────────────────┘  │
│                                                                      │
│  3. CPU core planning                                                │
│     ┌─────────────────────────────────────────────────────────────┐  │
│     │ VirtualCores = PhysicalCores × CPUAllocationRatio           │  │
│     │                                                             │  │
│     │ Recommended for industrial workloads:                       │  │
│     │ - CPU allocation ratio: 0.8-1.0                             │  │
│     │ - Container count ≈ 2 × VirtualCores / 3                    │  │
│     └─────────────────────────────────────────────────────────────┘  │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
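
To make the formulas concrete, here is a minimal bash sketch that plugs in example numbers. The node count, disk layout, and RAM size are illustrative assumptions rather than figures from a real cluster, and the function names are made up for this example. It works in integer percentages because bash has no floating-point arithmetic, and it divides raw capacity by the replication factor, since every block is stored that many times.

```bash
#!/usr/bin/env bash
# Capacity-planning sketch (integer arithmetic, percentages instead of fractions).
# All function names and sample inputs are illustrative assumptions.

# usable_capacity_tb DISK_TB DISKS_PER_NODE NODES [REPLICATION] [OVERHEAD_PCT] [RESERVED_PCT]
usable_capacity_tb() {
    local disk_tb=$1 disks=$2 nodes=$3
    local replication=${4:-3} overhead_pct=${5:-10} reserved_pct=${6:-5}
    local raw=$(( disk_tb * disks * nodes ))
    # Divide by the replication factor, then subtract overhead and reserved space.
    echo $(( raw * (100 - overhead_pct) * (100 - reserved_pct) / (replication * 10000) ))
}

# yarn_memory_gb TOTAL_GB [SYSTEM_PCT] [HBASE_PCT] [YARN_ALLOC_PCT]
yarn_memory_gb() {
    local total_gb=$1 system_pct=${2:-10} hbase_pct=${3:-0} alloc_pct=${4:-80}
    echo $(( total_gb * (100 - system_pct) * (100 - hbase_pct) * alloc_pct / 1000000 ))
}

# container_count YARN_MEMORY_GB CONTAINER_GB
container_count() {
    echo $(( $1 / $2 ))
}

# cpu_container_estimate PHYSICAL_CORES [CPU_ALLOC_PCT]
# Applies the "2 x VirtualCores / 3" rule of thumb.
cpu_container_estimate() {
    local vcores=$(( $1 * ${2:-100} / 100 ))
    echo $(( 2 * vcores / 3 ))
}

# Example: 10 DataNodes with 12 x 12 TB disks, 256 GB RAM, HBase co-located (20%).
echo "usable TB:          $(usable_capacity_tb 12 12 10)"
mem=$(yarn_memory_gb 256 10 20 80)
echo "YARN memory GB:     $mem"
echo "containers (8 GB):  $(container_count "$mem" 8)"
echo "containers by CPU:  $(cpu_container_estimate 32 80)"
```

With these inputs the script reports 410 TB usable out of 1440 TB raw, 147 GB for YARN, 18 containers of 8 GB each, and about 16 containers by the CPU rule of thumb for a 32-core node at a 0.8 allocation ratio.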

14.1.3 Cluster Topology Design

Cluster topology (core network: 40GE Spine-Leaf with 40GE L3 switches):

- Rack 1 - Master nodes: NameNode-1, ResourceManager-1, HBase Master, ZooKeeper-1
- Rack 2 - Core nodes: NameNode-2, ResourceManager-2, Hive Metastore, ZooKeeper-2
- Rack 3 - Worker nodes: DataNode-1, NodeManager-1, Kafka Broker-1
- Rack 4 - Worker nodes: DataNode-2, NodeManager-2, Kafka Broker-2, ZooKeeper-3
- Rack 5 - Edge nodes: Gateway/Jumphost, management node, monitoring node
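
HDFS can only spread replicas across racks if it knows which rack each host sits in. Hadoop learns this from a script configured through net.topology.script.file.name in core-site.xml. Below is a minimal sketch of such a script for the five-rack layout above. The host name prefixes (master1, core1, worker1, worker2, edge1) and the rack paths are placeholders, because the layout lists roles rather than host names.

```bash
#!/usr/bin/env bash
# rack_topology.sh - sketch of a rack-awareness script.
# Hadoop calls it with one or more host names or IP addresses as arguments
# and expects one rack path per argument on stdout.
# The host name prefixes below are hypothetical placeholders.

resolve_rack() {
    case "$1" in
        master1*) echo "/rack1" ;;
        core1*)   echo "/rack2" ;;
        worker1*) echo "/rack3" ;;
        worker2*) echo "/rack4" ;;
        edge1*)   echo "/rack5" ;;
        *)        echo "/default-rack" ;;   # Hadoop's name for "rack unknown"
    esac
}

# Print one rack per argument, in the order the arguments were given.
for node in "$@"; do
    resolve_rack "$node"
done
```

Hadoop frequently passes IP addresses rather than host names, so a production mapping would normally cover both, for example by reading a lookup file instead of hard-coding a case statement.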

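14.2 Automated Deployment with Ambari/Rancher

14.2.1 Installing the Cluster with Ambari

bash

#!/bin/bash
# ambari_install.sh - automated Ambari installation script.
# Expected to run as root on the Ambari Server host, with passwordless SSH to every agent.

set -e

# Environment variables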
AMBARI_VERSION="2.7.6"
CLUSTER_NAME="industrial-hadoop"
AMBARI_SERVER="node1.industrial.com"
AMBARI_AGENTS=("node2.industrial.com" "node3.industrial.com" "node4.industrial.com")

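# Prepare the base environment
setup_base_env() {
    echo "[INFO] Setting hostnames and /etc/hosts entries..."

    local host short ip
    for host in "${AMBARI_AGENTS[@]}"; do
        short="${host%%.*}"
        # An /etc/hosts entry needs the IP address first, so resolve it locally.
        ip="$(getent hosts "$host" | awk '{print $1; exit}')"
        [ -n "$ip" ] || { echo "[ERROR] Cannot resolve $host" >&2; exit 1; }
        # Disabling the firewall and SELinux is only reasonable on an isolated cluster network.
        ssh "root@$host" "
            hostnamectl set-hostname '$short'
            grep -qF '$host' /etc/hosts || echo '$ip $host $short' >> /etc/hosts
            systemctl disable firewalld
            systemctl stop firewalld
            setenforce 0 || true
            sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
        "
    done

    # Configure time synchronization with chrony
    for host in "$AMBARI_SERVER" "${AMBARI_AGENTS[@]}"; do
        ssh "root@$host" "
            yum install -y chrony
            systemctl enable chronyd
            systemctl start chronyd
        "
    done
}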

# Install Ambari Server (all steps run locally, so execute this script on $AMBARI_SERVER)
install_ambari_server() {
    echo "[INFO] Installing Ambari Server..."

    # Install the JDK; its path is passed to ambari-server setup via --java-home below
    yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel

    # Download and install Ambari.
    # NOTE: verify this URL before use. The Apache archive generally hosts source
    # releases, so prebuilt RPMs usually come from your own build or an internal mirror.
    cd /opt
    local base_url="https://archive.apache.org/dist/ambari/ambari-$AMBARI_VERSION/hdp/$AMBARI_VERSION"
    wget -q "$base_url/ambari-server.rpm"
    wget -q "$base_url/ambari-agent.rpm"

    yum install -y ./ambari-server.rpm
    yum install -y ./ambari-agent.rpm

    # Configure Ambari in silent mode
    ambari-server setup -s \
        --java-home=/usr/lib/jvm/java-1.8.0-openjdk

    ambari-server start
}

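# Install Ambari Agent on every agent node
install_ambari_agents() {
    echo "[INFO] Installing Ambari Agent..."

    local host
    for host in "${AMBARI_AGENTS[@]}"; do
        # The RPM was downloaded to /opt on this host, so copy it to the agent first.
        scp /opt/ambari-agent.rpm "root@$host:/tmp/ambari-agent.rpm"
        ssh "root@$host" "
            yum install -y /tmp/ambari-agent.rpm
            # Point the agent at the Ambari Server
            sed -i 's/^hostname=.*/hostname=$AMBARI_SERVER/' /etc/ambari-agent/conf/ambari-agent.ini
            ambari-agent start
        "
    done
}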

# Entry point
main() {
    setup_base_env
    install_ambari_server
    install_ambari_agents

    echo "[SUCCESS] Ambari installation complete. Open http://$AMBARI_SERVER:8080"
}

main "$@"

14.2.2 Blueprint Definition (Automated Deployment)

json (hadoop-blueprint.json - Ambari Blueprint)

{
  "configurations": [
    {
      "hadoop-env": {
        "namenode_heapsize": "4096m",
        "dtnode_heapsize": "2048m"
      }
    },
    {
      "core-site": {
        "fs.defaultFS": "hdfs://industrial-cluster",
        "ha.zookeeper.quorum": "node1:2181,node2:2181,node3:2181"
      }
    },
    {
      "hdfs-site": {
        "dfs.nameservices": "industrial-cluster",
        "dfs.ha.namenodes.industrial-cluster": "nn1,nn2",
        "dfs.namenode.http-address.industrial-cluster.nn1": "node1:50070",
        "dfs.namenode.http-address.industrial-cluster.nn2": "node2:50070",
        "dfs.namenode.rpc-address.industrial-cluster.nn1": "node1:8020",
        "dfs.namenode.rpc-address.industrial-cluster.nn2": "node2:8020"
      }
    },
    {
      "yarn-site": {
        "yarn.resourcemanager.ha.enabled": "true",
        "yarn.resourcemanager.cluster-id": "rm-cluster",
        "yarn.resourcemanager.ha.rm-ids": "rm1,rm2"
      }
    }
  ],
  "host_groups": [
    {
      "name": "master_hosts",
      "components": [
        {"name": "ZOOKEEPER_SERVER"},
        {"name": "NAMENODE"},
        {"name": "RESOURCEMANAGER"},
        {"name": "HIVE_METASTORE"},
        {"name": "SPARK2_JOBHISTORYSERVER"}
      ]
    },
    {
      "name": "worker_hosts",
      "components": [
        {"name": "DATANODE"},
        {"name": "NODEMANAGER"},
        {"name": "SPARK2_CLIENT"}
      ]
    }
  ],
  "Blueprints": {
    "stack_name": "HDP",
    "stack_version": "3.1"
  }
}
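
A blueprint only describes host groups. To create a cluster, Ambari also needs a cluster creation template that maps real hosts onto those groups. The sketch below generates such a template for the master_hosts and worker_hosts groups. The blueprint name industrial-bp, the admin credentials, and the helper function names are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Sketch: build an Ambari cluster creation template for the blueprint's host groups.
# Function names and the sample blueprint name are illustrative assumptions.

# hosts_json "a,b" -> {"fqdn": "a"}, {"fqdn": "b"}
hosts_json() {
    local IFS=',' out="" h
    for h in $1; do                       # unquoted on purpose: split on commas
        out+="${out:+, }{\"fqdn\": \"$h\"}"
    done
    echo "$out"
}

# make_cluster_template BLUEPRINT_NAME MASTERS_CSV WORKERS_CSV
make_cluster_template() {
    local blueprint=$1 masters=$2 workers=$3
    cat <<EOF
{
  "blueprint": "$blueprint",
  "host_groups": [
    {"name": "master_hosts", "hosts": [$(hosts_json "$masters")]},
    {"name": "worker_hosts", "hosts": [$(hosts_json "$workers")]}
  ]
}
EOF
}

# Usage sketch (PASSWORD and industrial-bp are placeholders):
#   curl -u admin:PASSWORD -H 'X-Requested-By: ambari' -X POST \
#        -d @hadoop-blueprint.json \
#        http://node1.industrial.com:8080/api/v1/blueprints/industrial-bp
#   make_cluster_template industrial-bp \
#        "node1.industrial.com,node2.industrial.com" \
#        "node3.industrial.com,node4.industrial.com" > cluster-template.json
#   curl -u admin:PASSWORD -H 'X-Requested-By: ambari' -X POST \
#        -d @cluster-template.json \
#        http://node1.industrial.com:8080/api/v1/clusters/industrial-hadoop
```

The first request registers the blueprint, and the second creates the cluster from it. Ambari rejects write requests that lack the X-Requested-By header.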

14.3 Cloud-Native Deployment on Kubernetes

14.3.1 Hadoop Operator CRD Definition

yaml

# hadoop-cluster-crd.yaml - Kubernetes CRD definition
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: hadoopclusters.hadoop.apache.org
spec:
  group: hadoop.apache.org
  names:
    kind: HadoopCluster
    plural: hadoopclusters
    shortNames: [hc]
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                clusterType:
                  type: string
                  enum: ["hadoop", "hbase", "kafka", "full"]
                version:
                  type: string
                  default: "3.3.6"
                replicas:
                  type: integer
                  minimum: 1
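                hdfs:
                  type: object
                  properties:
                    nameNodes:
                      type: integer
                      minimum: 1
                      maximum: 2
                    dataNodes:
                      type: integer
                    storagePerNode:
                      type: string
                    storageClass:
                      type: string
                yarn:
                  type: object
                  properties:
                    resourceManagers:
                      type: integer
                    nodeManagers:
                      type: integer
                    memoryPerNode:
                      type: string
                    coresPerNode:
                      type: integer
                security:
                  type: object
                  properties:
                    kerberosEnabled:
                      type: boolean
                    tlsEnabled:
                      type: boolean
                monitoring:
                  type: object
                  properties:
                    prometheusEnabled:
                      type: boolean
                    grafanaDashboardEnabled:
                      type: boolean
                image:
                  type: string
            status:
              type: object
              x-kubernetes-preserve-unknown-fields: true
      # The status subresource lets the controller patch status separately from spec
      subresources:
        status: {}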

14.3.2 Implementing the Hadoop Kubernetes Operator

java

// HadoopClusterReconciler.java - Kubernetes Operator controller.
// Assumes Java Operator SDK 4.x and the fabric8 Kubernetes client.
package com.industrial.hadoop.operator;

import io.fabric8.kubernetes.client.KubernetesClient;
import io.javaoperatorsdk.operator.api.reconciler.Cleaner;
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.api.reconciler.ControllerConfiguration;
import io.javaoperatorsdk.operator.api.reconciler.DeleteControl;
import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@ControllerConfiguration
public class HadoopClusterReconciler
        implements Reconciler<HadoopCluster>, Cleaner<HadoopCluster> {

    private static final Logger LOG =
            LoggerFactory.getLogger(HadoopClusterReconciler.class);

    private final KubernetesClient client;
    private final HadoopDeploymentManager deployer;

    public HadoopClusterReconciler(KubernetesClient client) {
        this.client = client;
        this.deployer = new HadoopDeploymentManager(client);
    }

    @Override
    public UpdateControl<HadoopCluster> reconcile(
            HadoopCluster hadoopCluster,
            Context<HadoopCluster> context) {

        String name = hadoopCluster.getMetadata().getName();
        String namespace = hadoopCluster.getMetadata().getNamespace();

        LOG.info("Reconciling HadoopCluster: {}/{}", namespace, name);

        // Read the desired state
        HadoopClusterSpec spec = hadoopCluster.getSpec();

        // 1. Ensure the ConfigMap exists
        deployer.reconcileConfigMap(hadoopCluster);

        // 2. Ensure the Kerberos Secrets exist
        if (spec.getSecurity() != null
            && Boolean.TRUE.equals(spec.getSecurity().getKerberosEnabled())) {
            deployer.reconcileKerberosSecrets(hadoopCluster);
        }

        // 3. Deploy the NameNodes (StatefulSet)
        deployer.deployNameNode(hadoopCluster);

        // 4. Deploy the DataNodes (DaemonSet)
        deployer.deployDataNode(hadoopCluster);

        // 5. Deploy the ResourceManagers (Deployment)
        deployer.deployResourceManager(hadoopCluster);

        // 6. Deploy the NodeManagers (DaemonSet)
        deployer.deployNodeManager(hadoopCluster);

        // 7. Update the status
        return UpdateControl.patchStatus(hadoopCluster);
    }

    // Implementing Cleaner makes the SDK add a finalizer and call this on deletion.
    @Override
    public DeleteControl cleanup(
            HadoopCluster hadoopCluster,
            Context<HadoopCluster> context) {

        LOG.info("Cleaning up HadoopCluster: {}/{}",
            hadoopCluster.getMetadata().getNamespace(),
            hadoopCluster.getMetadata().getName());

        // Cascade-delete all resources owned by this cluster
        deployer.deleteAllResources(hadoopCluster);

        return DeleteControl.defaultDelete();
    }
}

14.3.3 Kubernetes Deployment Example

yaml

# hadoop-cluster.yaml - Hadoop cluster deployment manifest
apiVersion: hadoop.apache.org/v1
kind: HadoopCluster
metadata:
  name: industrial-hadoop
  namespace: bigdata
spec:
  version: "3.3.6"
  clusterType: "full"

  # HDFS settings
  hdfs:
    nameNodes: 2           # HA pair
    dataNodes: 6           # number of DataNodes
    storagePerNode: "2Ti"  # storage per DataNode
    storageClass: "hadoop-storage"

  # YARN settings
  yarn:
    resourceManagers: 2    # HA pair
    nodeManagers: 6
    memoryPerNode: "32Gi"
    coresPerNode: 16

  # Security settings
  security:
    kerberosEnabled: true
    tlsEnabled: true

  # Monitoring settings
  monitoring:
    prometheusEnabled: true
    grafanaDashboardEnabled: true

---
# StatefulSet generated by the operator for the HDFS NameNodes
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: industrial-hadoop-namenode
  namespace: bigdata
spec:
  serviceName: namenode
  replicas: 2
  selector:
    matchLabels:
      app: hadoop
      component: namenode
  template:
    metadata:
      labels:              # must match spec.selector.matchLabels
        app: hadoop
        component: namenode
    spec:
      containers:
        - name: namenode
          image: industrial/hadoop:3.3.6
          command:
            - /opt/hadoop/bin/hdfs
            - namenode
          env:
            - name: HADOOP_CONF_DIR
              value: /opt/hadoop/etc/hadoop
          ports:
            - containerPort: 8020
            - containerPort: 9870
          volumeMounts:
            - name: hadoop-conf
              mountPath: /opt/hadoop/etc/hadoop
            - name: namenode-data
              mountPath: /opt/hadoop/data/namenode
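      volumes:
        - name: hadoop-conf
          configMap:
            name: industrial-hadoop-config
  # One PVC per replica, so the two NameNodes never share a metadata volume
  volumeClaimTemplates:
    - metadata:
        name: namenode-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: hadoop-storage
        resources:
          requests:
            storage: 100Gi   # example size, adjust to the expected metadata volume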
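
Once the manifest is applied, it helps to wait for the NameNode StatefulSet to become ready before moving on. The sketch below wraps the kubectl calls in a function. The function name, the 600s timeout, and the KUBECTL override are assumptions for this example, while the namespace, StatefulSet name, and labels come from the manifest above.

```bash
#!/usr/bin/env bash
# Sketch: apply the cluster manifest and wait for the NameNode StatefulSet.
# KUBECTL can be overridden, e.g. KUBECTL="echo kubectl" for a dry run.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="bigdata"
CLUSTER="industrial-hadoop"
TIMEOUT="600s"   # assumed value, tune to how long your images take to start

# apply_and_wait MANIFEST_FILE
apply_and_wait() {
    local manifest="$1"
    # $KUBECTL is unquoted on purpose so an override like "echo kubectl" splits into words.
    $KUBECTL apply -f "$manifest" &&
    $KUBECTL -n "$NAMESPACE" rollout status "statefulset/${CLUSTER}-namenode" --timeout="$TIMEOUT" &&
    $KUBECTL -n "$NAMESPACE" get pods -l app=hadoop,component=namenode -o wide
}

# Usage sketch:
#   apply_and_wait hadoop-cluster.yaml
```

The function stops at the first failing step, so a rollout that times out returns a non-zero status that a CI job or a wrapper script can act on.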

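14.4 Deployment Verification and Go-Live

14.4.1 Deployment Verification Checklist

bash

#!/bin/bash
# deploy_verify.sh - cluster deployment verification script.
# The jps-based checks only see processes on the host where the script runs.

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

PASS=0
FAIL=0

# check NAME COMMAND: evaluate COMMAND and record a pass or a failure
check() {
    local name="$1"
    local cmd="$2"

    echo -n "Checking: $name ... "
    if eval "$cmd" > /dev/null 2>&1; then
        echo -e "${GREEN}PASS${NC}"
        PASS=$((PASS + 1))
    else
        echo -e "${RED}FAIL${NC}"
        FAIL=$((FAIL + 1))
    fi
}

echo "=============================================="
echo "    Hadoop Cluster Deployment Verification"
echo "=============================================="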

# HDFS checks
echo -e "\n${YELLOW}[1] HDFS checks${NC}"
check "NameNode running" "[ \$(jps | grep -cw NameNode) -ge 1 ]"
check "DataNode running" "[ \$(jps | grep -cw DataNode) -ge 1 ]"
check "HDFS write test" "echo 'test' | hdfs dfs -put -f - /tmp/test_hdfs.txt && hdfs dfs -rm -skipTrash /tmp/test_hdfs.txt"
check "HDFS out of safe mode" "hdfs dfsadmin -safemode get | grep -q 'OFF'"
check "HDFS fsck health" "hdfs fsck / | grep -q 'is HEALTHY'"

# YARN checks
echo -e "\n${YELLOW}[2] YARN checks${NC}"
check "ResourceManager running" "[ \$(jps | grep -cw ResourceManager) -ge 1 ]"
check "NodeManager running" "[ \$(jps | grep -cw NodeManager) -ge 1 ]"
check "YARN node status" "[ \$(yarn node -list 2>/dev/null | grep -c 'RUNNING') -ge 1 ]"
check "Submit test job" "yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 2"

# Hive checks
echo -e "\n${YELLOW}[3] Hive checks${NC}"
check "Hive Metastore port (9083)" "nc -z localhost 9083"
check "Hive connection test" "beeline -u 'jdbc:hive2://localhost:10000' -e 'SELECT 1;'"

# ZooKeeper checks (one QuorumPeerMain process per host)
echo -e "\n${YELLOW}[4] ZooKeeper checks${NC}"
check "ZooKeeper process" "[ \$(jps | grep -cw QuorumPeerMain) -ge 1 ]"
check "ZooKeeper status" "echo 'srvr' | nc localhost 2181 | grep -q 'Mode'"

# Kafka checks
echo -e "\n${YELLOW}[5] Kafka checks${NC}"
check "Kafka process" "[ \$(jps | grep -cw Kafka) -ge 1 ]"
check "Kafka topic creation" "kafka-topics.sh --create --if-not-exists --topic test --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1"

echo "=============================================="
echo "    Verification Summary"
echo "=============================================="
echo -e "${GREEN}Passed: $PASS${NC}"
echo -e "${RED}Failed: $FAIL${NC}"

if [ $FAIL -eq 0 ]; then
    echo -e "\n${GREEN}✓ All checks passed, the cluster is ready to go live!${NC}"
    exit 0
else
    echo -e "\n${RED}✗ Some checks failed, fix them and run the verification again${NC}"
    exit 1
fi
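
Right after deployment, services often need a little time before they answer, so a check can fail simply because it ran too early. One way to handle this, sketched below under the assumption that a short wait is acceptable, is a small retry helper. The name retry and its argument order are made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: retry a command a fixed number of times with a pause in between.

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    local attempts=$1 delay=$2 i
    shift 2
    for (( i = 1; i <= attempts; i++ )); do
        if "$@" > /dev/null 2>&1; then
            return 0                      # the command succeeded
        fi
        if (( i < attempts )); then
            sleep "$delay"                # pause before the next attempt
        fi
    done
    return 1                              # every attempt failed
}

# Usage sketch: give the NameNode up to two minutes to leave safe mode.
#   retry 12 10 sh -c "hdfs dfsadmin -safemode get | grep -q OFF"
```

Wrapping only the slow-starting checks keeps a genuine failure visible while avoiding false alarms during startup.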

14.5 Knowledge Map Summary

Hadoop cluster deployment:
- Hardware planning: capacity calculation, hardware selection, network topology
- Deployment options: bare metal, Ambari, Kubernetes
- Automation tools: Ansible, Terraform, Operator
- Verification and go-live: functional verification, performance testing, SLA verification

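┌───────────────────┬──────────────────────────┬───────────────────────────────────────┬──────────────────────────────────────────┐
│ Deployment option │ Best suited for          │ Advantages                            │ Disadvantages                            │
├───────────────────┼──────────────────────────┼───────────────────────────────────────┼──────────────────────────────────────────┤
│ Bare metal        │ Very large clusters      │ Best performance, dedicated resources │ Inflexible, complex operations           │
│ Ambari            │ Medium to large clusters │ Visual management, mature ecosystem   │ Intrusive, version lock-in               │
│ Kubernetes        │ Cloud-native scenarios   │ Elastic scaling, fast deployment      │ Performance overhead, complex networking │
│ Hybrid            │ Transition periods       │ Flexible migration                    │ High complexity                          │
└───────────────────┴──────────────────────────┴───────────────────────────────────────┴──────────────────────────────────────────┘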

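Coming Up Next

Part 15 takes a closer look at "Integrating Machine Learning with Big Data", covering how to use Spark MLlib and FlinkML for industrial big data analysis and machine learning modeling. Stay tuned!

Author: a researcher in intelligent blast-furnace ironmaking technology, focusing on the intersection of ferrous metallurgy and artificial intelligence.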

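Copyright belongs to the author. Do not copy, adapt, or use this article commercially (or for any other profit-making purpose) without permission.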