Hadoop Deployment

I. Software Installation

1. Install Docker

Install the yum-config-manager utility:
yum -y install yum-utils

Add the Docker CE yum repository (within China the Aliyun mirror is recommended):
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

Install docker-ce:
yum install -y docker-ce

Start Docker, enable it at boot, and verify:
systemctl enable --now docker
docker --version

2. Install docker-compose

curl -SL https://github.com/docker/compose/releases/download/v2.16.0/docker-compose-linux-x86_64 -o /usr/local/bin/docker-compose

chmod +x /usr/local/bin/docker-compose
docker-compose --version

II. docker-compose

1. docker-compose deploy

1) Setting the replica count: deploy-test/replicas_test.yaml

version: '3'
services:
  replicas_test:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    restart: always
    command: ["sh","-c","sleep 36000"]
    deploy:
      replicas: 2
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

docker-compose -f replicas_test.yaml up -d
docker-compose -f replicas_test.yaml ps

As the ps output shows, deploy.replicas controls how many containers are created for a service. It does not fit every scenario, though: some of the Hadoop components below require fixed hostnames and container names, and in those cases this parameter is not a suitable way to adjust the container count. (Where it does apply, the --scale flag shown below achieves the same thing from the command line.)
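
A minimal sketch of command-line scaling, reusing the same replicas_test.yaml; the --scale flag overrides deploy.replicas for this run:

docker-compose -f replicas_test.yaml up -d --scale replicas_test=3
docker-compose -f replicas_test.yaml ps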

2) Resource limits: deploy-test/resources_test.yaml

Resource limits in docker-compose work much like those in Kubernetes (limits/requests); an example follows:

version: '3'
services:
  resources_test:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    restart: always
    command: ["sh","-c","sleep 36000"]
    deploy:
      replicas: 2
      resources:
        # Upper bound on resources; the container can use at most this much
        limits:
          cpus: '1'
          memory: 100M
        # Minimum resources reserved for the container, analogous to requests in Kubernetes
        reservations:
          cpus: '0.5'
          memory: 50M
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

docker-compose -f resources_test.yaml up -d
docker-compose -f resources_test.yaml ps

Check resource usage:
docker stats deploy-test-resources_test-1
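
To confirm the limits actually took effect, they can also be read back from the container metadata; a quick check (the container name assumes the project directory is deploy-test, as above):

# Memory limit in bytes and CPU quota in nano-CPUs, as recorded by the Docker daemon
docker inspect -f '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' deploy-test-resources_test-1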

2. docker-compose network

Networking is one of the most important topics for containers, so this section walks through examples of how services in different docker-compose projects can reach each other by name. By default, each docker-compose deployment is a project: compose files in different directories are different projects, while multiple compose files in the same directory belong to one project, and each project gets its own default network. Note that, by default, name-based access only works between containers on the same network. How, then, can containers in different projects reach each other by name? The examples below answer that.

1) Test without an explicit network

network-test/test1/test1.yaml

version: '3'
services:
  test1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    container_name: c_test1
    hostname: h_test1
    restart: always
    command: ["sh","-c","sleep 36000"]
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

network-test/test2/test2.yaml

version: '3'
services:
  test2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    container_name: c_test2
    hostname: h_test2
    restart: always
    command: ["sh","-c","sleep 36000"]
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

docker-compose -f test1/test1.yaml up -d
docker-compose -f test2/test2.yaml up -d

List the networks:
docker network ls

Listing the networks shows that two were generated. If the two yaml files lived in the same directory, only one network would be created, both services would belong to it, and they could reach each other by name. Because they are in different directories here, two networks are generated, and by default different networks are isolated from each other, so name-based access fails. The directory containing the yaml file becomes the project name; the project name can also be set explicitly with a flag, covered in detail below.
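
To see exactly which containers landed on which network, inspect the two project networks; a sketch assuming the default network names of the form <project>_default:

docker network inspect test1_default -f '{{range .Containers}}{{.Name}} {{end}}'
docker network inspect test2_default -f '{{range .Containers}}{{.Name}} {{end}}'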

Ping each other (these attempts fail, since the two projects are on separate networks):

docker exec -it c_test1 ping c_test2  
docker exec -it c_test1 ping h_test2  
docker exec -it c_test2 ping c_test1  
docker exec -it c_test2 ping h_test1

Tear down:

docker-compose -f test1/test1.yaml down  
docker-compose -f test2/test2.yaml down 

2) Test with an explicit network

Define and create a new network in test1/network_test1.yaml, then reference that network from test2/network_test2.yaml; the two projects then share one network. Mind the order of execution: test1 must come up first, so the network exists before test2 references it.

network-test/test1/network_test1.yaml

version: '3'
services:
  network_test1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    container_name: c_network_test1
    hostname: h_network_test1
    restart: always
    command: ["sh","-c","sleep 36000"]
    # Attach to the network
    networks:
      - test1_network
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

# Define (create) a new network
networks:
  test1_network:
    driver: bridge

network-test/test2/network_test2.yaml

version: '3'
services:
  network_test2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    container_name: c_network_test2
    hostname: h_network_test2
    restart: always
    networks:
      - test1_test1_network
    command: ["sh","-c","sleep 36000"]
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

# Reference the network created by the test1 project
networks:
  test1_test1_network:
    external: true

docker-compose -f test1/network_test1.yaml up -d
docker-compose -f test2/network_test2.yaml up -d

List the networks:
docker network ls

Ping each other (these now succeed):

docker exec -it c_network_test1 ping -c3 c_network_test2

docker exec -it c_network_test1 ping -c3 h_network_test2

docker exec -it c_network_test2 ping -c3 c_network_test1

docker exec -it c_network_test2 ping -c3 h_network_test1

Tear down; note the order: bring down the project that references the network (test2) first, otherwise the network is still in use and cannot be deleted.
docker-compose -f test2/network_test2.yaml down
docker-compose -f test1/network_test1.yaml down

As the experiment shows, multiple projects can reach each other by hostname or container name only when they are on the same network.
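
As an aside, a running container can also be attached to another project's network after the fact, without editing any compose file; a hypothetical sketch reusing the names from the tests above:

# Attach the plain c_test2 container from test 1) to the network created in test 2)
docker network connect test1_test1_network c_test2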

3) Default and custom network names

network-test/test.yaml

version: '3'
services:
  test:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    restart: always
    command: ["sh","-c","sleep 36000"]
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

First run without any flags and check the network:
docker-compose -f test.yaml up -d
docker network ls

docker-compose -f test.yaml down

Use a flag to set a custom project name (-p / --project-name); there are four equivalent spellings:

docker-compose -p=p001 -f test.yaml up -d

docker-compose -p p002 -f test.yaml up -d

docker-compose --project-name=p003 -f test.yaml up -d

docker-compose --project-name p004 -f test.yaml up -d
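
A fifth spelling worth knowing: compose also reads the COMPOSE_PROJECT_NAME environment variable when no -p flag is given:

COMPOSE_PROJECT_NAME=p005 docker-compose -f test.yaml up -d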

List the networks:
docker network ls

List all compose projects:
docker-compose ls

III. Hadoop Deployment (non-HA)

The final directory structure:
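
A sketch reassembled from the files referenced in this section (not an authoritative listing):

.
├── Dockerfile
├── bootstrap.sh
├── docker-compose.yaml
├── .env
├── config/
│   └── hadoop-config/        # core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, ...
├── jdk-8u212-linux-x64.tar.gz
├── hadoop-3.3.5.tar.gz
└── apache-hive-3.1.3-bin.tar.gz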

1) Install the JDK

Download: www.oracle.com/at/java/tec...

tar -zxvf jdk-8u212-linux-x64.tar.gz

Append the following to /etc/profile:

echo "export JAVA_HOME=`pwd`/jdk1.8.0_212" >> /etc/profile echo "export PATH=JAVA_HOME/bin:PATH" >> /etc/profile echo "export CLASSPATH=.:JAVA_HOME/lib/dt.jar:JAVA_HOME/lib/tools.jar" >> /etc/profile

Apply the changes: source /etc/profile
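
Verify the JDK is on the PATH:

java -version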

2) Download Hadoop-related software

1. Hadoop

Download: dlcdn.apache.org/hadoop/comm...

wget dlcdn.apache.org/hadoop/comm... --no-check-certificate

wget mirrors.tuna.tsinghua.edu.cn/apache/hado... --no-check-certificate

2. Hive

Download: archive.apache.org/dist/hive

wget archive.apache.org/dist/hive/h...

wget mirrors.tuna.tsinghua.edu.cn/apache/hive...
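
The URLs above are truncated by the source page. Assuming the standard Apache archive layout and the versions used in the Dockerfile below (Hadoop 3.3.5, Hive 3.1.3), the full commands would be:

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz --no-check-certificate
wget https://archive.apache.org/dist/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz --no-check-certificate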

3) Dockerfile

FROM centos:7
 
RUN rm -f /etc/localtime && \
    ln -sv /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    echo "Asia/Shanghai" > /etc/timezone
 
# Persist the locale via ENV (a plain "RUN export" does not survive the build layer)
ENV LANG zh_CN.UTF-8
 
# Create the hadoop user and group, matching user: 10000:10000 in the compose file
RUN groupadd --system --gid=10000 hadoop && useradd --system --home-dir /home/hadoop --uid=10000 --gid=hadoop hadoop
 
# Install sudo and common tools
RUN yum -y install sudo net-tools telnet wget nc curl ; chmod 640 /etc/sudoers
 
# Grant passwordless sudo to the hadoop user
RUN echo "hadoop ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
 
RUN mkdir /opt/apache/
 
# Install the JDK
ADD jdk-8u212-linux-x64.tar.gz /opt/apache/
ENV JAVA_HOME /opt/apache/jdk1.8.0_212
ENV PATH $JAVA_HOME/bin:$PATH

# Install and configure Hadoop
ENV HADOOP_VERSION 3.3.5
ADD hadoop-${HADOOP_VERSION}.tar.gz /opt/apache/
ENV HADOOP_HOME /opt/apache/hadoop
RUN ln -s /opt/apache/hadoop-${HADOOP_VERSION} $HADOOP_HOME
 
ENV HADOOP_COMMON_HOME=${HADOOP_HOME} \
    HADOOP_HDFS_HOME=${HADOOP_HOME} \
    HADOOP_MAPRED_HOME=${HADOOP_HOME} \
    HADOOP_YARN_HOME=${HADOOP_HOME} \
    HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop \
    PATH=${PATH}:${HADOOP_HOME}/bin

# Install and configure Hive
ENV HIVE_VERSION 3.1.3
ADD apache-hive-${HIVE_VERSION}-bin.tar.gz /opt/apache/
ENV HIVE_HOME=/opt/apache/hive
ENV PATH=$HIVE_HOME/bin:$PATH
RUN ln -s /opt/apache/apache-hive-${HIVE_VERSION}-bin ${HIVE_HOME}

# Create NameNode/DataNode storage directories and YARN local/log/app dirs
RUN mkdir -p /opt/apache/hadoop/data/{hdfs,yarn} /opt/apache/hadoop/data/hdfs/namenode /opt/apache/hadoop/data/hdfs/datanode/data{1..3} /opt/apache/hadoop/data/yarn/{local-dirs,log-dirs,apps}

COPY bootstrap.sh /opt/apache/
 
RUN chmod +x /opt/apache/bootstrap.sh

COPY config/hadoop-config/* ${HADOOP_HOME}/etc/hadoop/
 
RUN chown -R hadoop:hadoop /opt/apache
 
ENV ll "ls -l"
 
WORKDIR /opt/apache

4) Configuration

1. Hadoop configuration

The main files are:

core-site.xml, dfs.hosts, dfs.hosts.exclude, hdfs-site.xml, mapred-site.xml, yarn-hosts-exclude, yarn-hosts-include, yarn-site.xml
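
The full contents are site-specific and not reproduced in the source, but the setting everything else hangs off is fs.defaultFS in core-site.xml, which must point at the NameNode service name used in docker-compose.yaml below. A minimal sketch, assuming the default NameNode RPC port 8020:

cat > config/hadoop-config/core-site.xml << 'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-hdfs-nn:8020</value>
  </property>
</configuration>
EOF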

2. .env

cat > .env << EOF
HADOOP_HDFS_NN_PORT=9870
HADOOP_HDFS_DN_PORT=9864
HADOOP_YARN_RM_PORT=8088
HADOOP_YARN_NM_PORT=8042
HADOOP_YARN_PROXYSERVER_PORT=9111
HADOOP_MR_HISTORYSERVER_PORT=19888
EOF

5) The bootstrap.sh script

#!/usr/bin/env sh

# Seconds to wait between retries when polling a dependency (default: 2)
SLEEP_SECOND=${SLEEP_SECOND:-2}

wait_for() {
    echo "Waiting for $1 to listen on $2..."
    while ! nc -z $1 $2; do echo waiting...; sleep $SLEEP_SECOND; done
}

start_hdfs_namenode() {
    # Format HDFS only on first start; the marker file records that formatting ran
    if [ ! -f /tmp/namenode-formated ]; then
        ${HADOOP_HOME}/bin/hdfs namenode -format >/tmp/namenode-formated
    fi

    ${HADOOP_HOME}/bin/hdfs --loglevel INFO --daemon start namenode

    # Keep the container in the foreground by following the daemon log
    tail -f ${HADOOP_HOME}/logs/*namenode*.log
}

start_hdfs_datanode() {
    wait_for $1 $2

    ${HADOOP_HOME}/bin/hdfs --loglevel INFO --daemon start datanode

    tail -f ${HADOOP_HOME}/logs/*datanode*.log
}

start_yarn_resourcemanager() {
    ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start resourcemanager

    tail -f ${HADOOP_HOME}/logs/*resourcemanager*.log
}

start_yarn_nodemanager() {
    wait_for $1 $2

    ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start nodemanager

    tail -f ${HADOOP_HOME}/logs/*nodemanager*.log
}

start_yarn_proxyserver() {
    wait_for $1 $2

    ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start proxyserver

    tail -f ${HADOOP_HOME}/logs/*proxyserver*.log
}

start_mr_historyserver() {
    wait_for $1 $2

    ${HADOOP_HOME}/bin/mapred --loglevel INFO --daemon start historyserver

    tail -f ${HADOOP_HOME}/logs/*historyserver*.log
}

case $1 in
    hadoop-hdfs-nn)
        start_hdfs_namenode
        ;;
    hadoop-hdfs-dn)
        start_hdfs_datanode $2 $3
        ;;
    hadoop-yarn-rm)
        start_yarn_resourcemanager
        ;;
    hadoop-yarn-nm)
        start_yarn_nodemanager $2 $3
        ;;
    hadoop-yarn-proxyserver)
        start_yarn_proxyserver $2 $3
        ;;
    hadoop-mr-historyserver)
        start_mr_historyserver $2 $3
        ;;
    *)
        echo "Please pass a valid service name to start."
        ;;
esac
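
Usage follows the pattern bootstrap.sh <service> [wait_host wait_port]; for example, starting a DataNode that first waits on the NameNode web port (the same invocation the compose file below uses, with the .env value 9870):

/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn 9870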

6) The docker-compose.yaml orchestration

version: '3'
services:
  hadoop-hdfs-nn:
    image: hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-nn
    hostname: hadoop-hdfs-nn
    restart: always
    env_file:
      - .env
    ports:
      - "30070:${HADOOP_HDFS_NN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-nn"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_NN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-hdfs-dn-0:
    image: hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-dn-0
    hostname: hadoop-hdfs-dn-0
    restart: always
    depends_on:
      - hadoop-hdfs-nn
    env_file:
      - .env
    ports:
      - "30864:${HADOOP_HDFS_DN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  # hadoop-hdfs-dn-1:
  #   image: hadoop:v1
  #   user: "hadoop:hadoop"
  #   container_name: hadoop-hdfs-dn-1
  #   hostname: hadoop-hdfs-dn-1
  #   restart: always
  #   depends_on:
  #     - hadoop-hdfs-nn
  #   env_file:
  #     - .env
  #   ports:
  #     - "30865:${HADOOP_HDFS_DN_PORT}"
  #   command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
  #   networks:
  #     - hadoop_network
  #   healthcheck:
  #     test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
  #     interval: 10s
  #     timeout: 5s
  #     retries: 3
  # hadoop-hdfs-dn-2:
  #   image: hadoop:v1
  #   user: "hadoop:hadoop"
  #   container_name: hadoop-hdfs-dn-2
  #   hostname: hadoop-hdfs-dn-2
  #   restart: always
  #   depends_on:
  #     - hadoop-hdfs-nn
  #   env_file:
  #     - .env
  #   ports:
  #     - "30866:${HADOOP_HDFS_DN_PORT}"
  #   command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
  #   networks:
  #     - hadoop_network
  #   healthcheck:
  #     test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
  #     interval: 10s
  #     timeout: 5s
  #     retries: 3
  hadoop-yarn-rm:
    image: hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-rm
    hostname: hadoop-yarn-rm
    restart: always
    env_file:
      - .env
    ports:
      - "30888:${HADOOP_YARN_RM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-rm"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_YARN_RM_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-yarn-nm-0:
    image: hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-nm-0
    hostname: hadoop-yarn-nm-0
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30042:${HADOOP_YARN_NM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  # hadoop-yarn-nm-1:
  #   image: hadoop:v1
  #   user: "hadoop:hadoop"
  #   container_name: hadoop-yarn-nm-1
  #   hostname: hadoop-yarn-nm-1
  #   restart: always
  #   depends_on:
  #     - hadoop-yarn-rm
  #   env_file:
  #     - .env
  #   ports:
  #     - "30043:${HADOOP_YARN_NM_PORT}"
  #   command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
  #   networks:
  #     - hadoop_network
  #   healthcheck:
  #     test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
  #     interval: 10s
  #     timeout: 5s
  #     retries: 3
  # hadoop-yarn-nm-2:
  #   image: hadoop:v1
  #   user: "hadoop:hadoop"
  #   container_name: hadoop-yarn-nm-2
  #   hostname: hadoop-yarn-nm-2
  #   restart: always
  #   depends_on:
  #     - hadoop-yarn-rm
  #   env_file:
  #     - .env
  #   ports:
  #     - "30044:${HADOOP_YARN_NM_PORT}"
  #   command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
  #   networks:
  #     - hadoop_network
  #   healthcheck:
  #     test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
  #     interval: 10s
  #     timeout: 5s
  #     retries: 3
  hadoop-yarn-proxyserver:
    image: hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-proxyserver
    hostname: hadoop-yarn-proxyserver
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30911:${HADOOP_YARN_PROXYSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-proxyserver hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_YARN_PROXYSERVER_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-mr-historyserver:
    image: hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-mr-historyserver
    hostname: hadoop-mr-historyserver
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "31988:${HADOOP_MR_HISTORYSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-mr-historyserver hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_MR_HISTORYSERVER_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3

networks:
  hadoop_network:
    driver: bridge

Containers created from different compose files cannot reach each other by hostname unless they are placed on the same network. Also, depends_on only orders container startup; it says nothing about when the service inside a container is actually ready, so on its own it is of limited use. That is why the bootstrap.sh script above adds a wait_for function to enforce the real service startup order.
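
Since every service above defines a healthcheck, readiness can also be watched directly from the host; a quick check:

docker inspect -f '{{.State.Health.Status}}' hadoop-hdfs-nn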

7) Build the image

docker build -t hadoop:v1 . --no-cache

Flag reference:

-t: name (and optional tag) for the built image

.: the build context; Docker looks for the Dockerfile in this directory by default

-f: explicit path to a Dockerfile (not needed when it sits in the context root)

--no-cache: build without reusing cached layers

8) Start the services

docker-compose -f docker-compose.yaml up -d

9) Other useful commands

docker rm -f $(docker ps -aq)            # remove ALL containers on the host (destructive)
docker logs -f hadoop-mr-historyserver   # follow one service's log

10) Verify the deployment

HDFS web UI: http://<host-ip>:30070

HDFS file system directory view (via the UI's Utilities menu)

YARN web UI: http://<host-ip>:30888
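
Beyond the web UIs, a quick functional smoke test can be run from inside the NameNode container (the hadoop binaries are on the PATH per the Dockerfile):

docker exec -it hadoop-hdfs-nn hdfs dfs -mkdir -p /tmp/smoke
docker exec -it hadoop-hdfs-nn hdfs dfs -ls /tmp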

Reference: baijiahao.baidu.com/s?id=176201...
