Hadoop Deployment

I. Software Installation

1. Install Docker

Install the yum-config-manager utility:
yum -y install yum-utils

Add a Docker CE yum repo (the Aliyun mirror is recommended):
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

Install docker-ce:
yum install -y docker-ce

Start Docker and enable it at boot:
systemctl enable --now docker
docker --version

2. Install docker-compose

curl -SL https://github.com/docker/compose/releases/download/v2.16.0/docker-compose-linux-x86_64 -o /usr/local/bin/docker-compose

chmod +x /usr/local/bin/docker-compose
docker-compose --version
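The download above hard-codes the x86_64 binary. A hedged sketch (my addition, not from the original): build the URL from the machine architecture, assuming the release assets follow the `docker-compose-linux-<arch>` naming used by the v2.16.0 release:

```shell
# Pick the compose binary matching this machine (uname -m gives e.g.
# x86_64 or aarch64), instead of hard-coding x86_64.
ARCH=$(uname -m)
URL="https://github.com/docker/compose/releases/download/v2.16.0/docker-compose-linux-${ARCH}"
echo "$URL"
# curl -SL "$URL" -o /usr/local/bin/docker-compose   # then chmod +x as above
```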

II. docker-compose

1. docker-compose deploy

1) Setting the replica count: deploy-test/replicas_test.yaml

version: '3'
services:
  replicas_test:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    restart: always
    command: ["sh","-c","sleep 36000"]
    deploy:
      replicas: 2
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

docker-compose -f replicas_test.yaml up -d
docker-compose -f replicas_test.yaml ps

As the ps output shows, deploy.replicas controls how many containers a service gets. It does not fit every scenario, though: some of the Hadoop components below require fixed hostnames and container names, and for those this parameter is not a suitable way to scale the container count.
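A related option worth noting (a sketch of my own, not from the article): the same effect can be had at "up" time with the --scale flag, without editing deploy.replicas. Like deploy.replicas, --scale cannot be combined with a fixed container_name.

```shell
# Build the scaling command for a compose service.
scale_cmd() {
  # $1: compose file, $2: service name, $3: desired replica count
  echo "docker-compose -f $1 up -d --scale $2=$3"
}
scale_cmd replicas_test.yaml replicas_test 3
# prints: docker-compose -f replicas_test.yaml up -d --scale replicas_test=3
```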

2) Resource limits: deploy-test/resources_test.yaml

Resource limits in docker-compose work much like requests/limits in Kubernetes. Example:

version: '3'
services:
  resources_test:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    restart: always
    command: ["sh","-c","sleep 36000"]
    deploy:
      replicas: 2
      resources:
        # Hard limit: the most resources the container may use
        limits:
          cpus: '1'
          memory: 100M
        # Minimum guaranteed resources, like "requests" in Kubernetes
        reservations:
          cpus: '0.5'
          memory: 50M
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

docker-compose -f resources_test.yaml up -d
docker-compose -f resources_test.yaml ps

Check the container's resource usage:
docker stats deploy-test-resources_test-1
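docker stats reports usage from the host. A hedged sketch (my addition) of confirming the cap from inside the container instead, by reading the cgroup files directly; the paths differ between cgroup v1 and v2:

```shell
# Print the container's memory limit in bytes, or "unknown" if neither
# cgroup layout is readable.
mem_limit() {
  if [ -r /sys/fs/cgroup/memory.max ]; then                       # cgroup v2
    cat /sys/fs/cgroup/memory.max
  elif [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then   # cgroup v1
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes
  else
    echo unknown
  fi
}
mem_limit
```

Run it inside a limited container, e.g. `docker exec deploy-test-resources_test-1 sh -c '...'`; it should report the configured 100M limit (104857600 bytes).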

2. docker-compose networks

Networking is a key topic for containers, so this section works through examples of how services in different docker-compose projects can reach each other by name. By default, each docker-compose file is its own project (files in different directories are different projects; several compose files in the same directory belong to one project), and each project gets its own default network. Note that containers can resolve each other by name only within the same network. So how do containers in different projects reach each other by name? The examples below show how.

1) Test without an explicit network

network-test/test1/test1.yaml

version: '3'
services:
  test1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    container_name: c_test1
    hostname: h_test1
    restart: always
    command: ["sh","-c","sleep 36000"]
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

network-test/test2/test2.yaml

version: '3'
services:
  test2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    container_name: c_test2
    hostname: h_test2
    restart: always
    command: ["sh","-c","sleep 36000"]
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

docker-compose -f test1/test1.yaml up -d
docker-compose -f test2/test2.yaml up -d

List the networks:
docker network ls

Listing the networks shows two of them, one per project. If the two yaml files were in the same directory, only one network would be created; both services would share it and could reach each other by name. Here the files live in different directories, so two networks are created, and by default different networks are isolated: name resolution across them fails. The network is named after the project, which defaults to the directory containing the yaml file; it can also be set explicitly with a flag, covered below.
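The naming rule can be sketched as a tiny helper (my addition; it only reproduces the convention, it does not query Docker): Compose names the default network `<project>_default`, where the project name defaults to the compose file's directory name unless overridden with -p/--project-name.

```shell
# Derive the default network name Compose will create for a project.
default_network() {
  dir=$1                              # directory containing the yaml file
  project=${2:-$(basename "$dir")}    # optional explicit project name
  echo "${project}_default"
}
default_network /root/network-test/test1        # test1_default
default_network /root/network-test/test1 p001   # p001_default
```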

Ping across the two projects (expected to fail, since they are on different networks):

docker exec -it c_test1 ping c_test2  
docker exec -it c_test1 ping h_test2  
docker exec -it c_test2 ping c_test1  
docker exec -it c_test2 ping h_test1

Tear down:

docker-compose -f test1/test1.yaml down  
docker-compose -f test2/test2.yaml down 

2) Test with an explicit network

test1/network_test1.yaml defines a new network, and test2/network_test2.yaml below references the network test1 created, which puts the two projects on the same network. Mind the startup order: test1 must come up first.

network-test/test1/network_test1.yaml

version: '3'
services:
  network_test1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    container_name: c_network_test1
    hostname: h_network_test1
    restart: always
    command: ["sh","-c","sleep 36000"]
    # Attach to the network defined below
    networks:
      - test1_network
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

# Define a new network
networks:
  test1_network:
    driver: bridge

network-test/test2/network_test2.yaml

version: '3'
services:
  network_test2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    container_name: c_network_test2
    hostname: h_network_test2
    restart: always
    networks:
      - test1_test1_network
    command: ["sh","-c","sleep 36000"]
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

# Reference the network created by the test1 project
networks:
  test1_test1_network:
    external: true

docker-compose -f test1/network_test1.yaml up -d
docker-compose -f test2/network_test2.yaml up -d

List the networks:
docker network ls
Ping across the two projects (these should now succeed):

docker exec -it c_network_test1 ping -c3 c_network_test2

docker exec -it c_network_test1 ping -c3 h_network_test2

docker exec -it c_network_test2 ping -c3 c_network_test1

docker exec -it c_network_test2 ping -c3 h_network_test1

Tear down. Order matters here: bring down test2 (the consumer) first, because a network that is still in use cannot be removed.
docker-compose -f test2/network_test2.yaml down
docker-compose -f test1/network_test1.yaml down

This experiment shows that containers from different projects can reach each other by hostname or container name only when the projects share a network.
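This also explains the name test1_test1_network used under external: Compose prefixes networks it creates with the project name, so the test1 project's test1_network key becomes test1_test1_network on the Docker side. A sketch of the rule (my addition, naming convention only):

```shell
# Name of a network created by a compose project: "<project>_<network key>".
compose_network_name() {
  echo "${1}_${2}"   # $1: project name, $2: the networks: key in the yaml
}
compose_network_name test1 test1_network   # test1_test1_network
```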

3) Default and custom network names

network-test/test.yaml

version: '3'
services:
  test:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    restart: always
    command: ["sh","-c","sleep 36000"]
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

First bring it up without a project flag and check the network:
docker-compose -f test.yaml up -d
docker network ls

docker-compose -f test.yaml down

Set a custom project name with -p / --project-name; all four spellings are equivalent:

docker-compose -p=p001 -f test.yaml up -d

docker-compose -p p002 -f test.yaml up -d

docker-compose --project-name=p003 -f test.yaml up -d

docker-compose --project-name p004 -f test.yaml up -d

Check the networks:
docker network ls

List all compose projects:
docker-compose ls

III. Hadoop Deployment (non-HA)

The final directory layout:

1) Install the JDK

Download: www.oracle.com/at/java/tec...

tar -zxvf jdk-8u212-linux-x64.tar.gz

Append the following to /etc/profile:

echo "export JAVA_HOME=`pwd`/jdk1.8.0_212" >> /etc/profile
echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> /etc/profile
echo "export CLASSPATH=.:\$JAVA_HOME/lib/dt.jar:\$JAVA_HOME/lib/tools.jar" >> /etc/profile

Reload to take effect: source /etc/profile
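A small hedged check (my addition, not in the original) that the profile edits took effect:

```shell
# Verify JAVA_HOME points at a usable JDK after `source /etc/profile`.
check_java() {
  [ -n "$JAVA_HOME" ] && [ -x "$JAVA_HOME/bin/java" ]
}
if check_java; then
  "$JAVA_HOME/bin/java" -version
else
  echo "JAVA_HOME is not set or java is not executable"
fi
```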

2) Download Hadoop-related software

1. Hadoop

Download: dlcdn.apache.org/hadoop/comm...

wget dlcdn.apache.org/hadoop/comm... --no-check-certificate

wget mirrors.tuna.tsinghua.edu.cn/apache/hado... --no-check-certificate

2. Hive

Download: archive.apache.org/dist/hive

wget archive.apache.org/dist/hive/h...

wget mirrors.tuna.tsinghua.edu.cn/apache/hive...

3) Dockerfile

FROM centos:7
 
RUN rm -f /etc/localtime && \
    ln -sv /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    echo "Asia/Shanghai" > /etc/timezone
 
# RUN export does not persist across layers; set the locale via ENV instead
ENV LANG zh_CN.UTF-8
 
# Create the user and group, matching "user: 10000:10000" in the compose file
RUN groupadd --system --gid=10000 hadoop && useradd --system --home-dir /home/hadoop --uid=10000 --gid=hadoop hadoop
 
# Install sudo and common tools
RUN yum -y install sudo net-tools telnet wget nc curl ; chmod 640 /etc/sudoers
 
# Give the hadoop user passwordless sudo
RUN echo "hadoop ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
 
RUN mkdir /opt/apache/
 
# Install the JDK
ADD jdk-8u212-linux-x64.tar.gz /opt/apache/
ENV JAVA_HOME /opt/apache/jdk1.8.0_212
ENV PATH $JAVA_HOME/bin:$PATH

# Configure Hadoop
ENV HADOOP_VERSION 3.3.5
ADD hadoop-${HADOOP_VERSION}.tar.gz /opt/apache/
ENV HADOOP_HOME /opt/apache/hadoop
RUN ln -s /opt/apache/hadoop-${HADOOP_VERSION} $HADOOP_HOME
 
ENV HADOOP_COMMON_HOME=${HADOOP_HOME} \
    HADOOP_HDFS_HOME=${HADOOP_HOME} \
    HADOOP_MAPRED_HOME=${HADOOP_HOME} \
    HADOOP_YARN_HOME=${HADOOP_HOME} \
    HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop \
    PATH=${PATH}:${HADOOP_HOME}/bin

# Configure Hive
ENV HIVE_VERSION 3.1.3
ADD apache-hive-${HIVE_VERSION}-bin.tar.gz /opt/apache/
ENV HIVE_HOME=/opt/apache/hive
ENV PATH=$HIVE_HOME/bin:$PATH
RUN ln -s /opt/apache/apache-hive-${HIVE_VERSION}-bin ${HIVE_HOME}

# Create the namenode/datanode storage directories
RUN mkdir -p /opt/apache/hadoop/data/{hdfs,yarn} /opt/apache/hadoop/data/hdfs/namenode /opt/apache/hadoop/data/hdfs/datanode/data{1..3} /opt/apache/hadoop/data/yarn/{local-dirs,log-dirs,apps}

COPY bootstrap.sh /opt/apache/
 
RUN chmod +x /opt/apache/bootstrap.sh

COPY config/hadoop-config/* ${HADOOP_HOME}/etc/hadoop/
 
RUN chown -R hadoop:hadoop /opt/apache
 
# Note: ENV sets an environment variable named "ll"; it does not create a shell alias
ENV ll "ls -l"
 
WORKDIR /opt/apache
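Before building, it is worth checking the build context. A hedged sketch (my addition): the files below are exactly those the Dockerfile ADDs/COPYs, so they must sit next to the Dockerfile.

```shell
# Report which build-context files are present or missing.
check_context() {
  for f in jdk-8u212-linux-x64.tar.gz hadoop-3.3.5.tar.gz \
           apache-hive-3.1.3-bin.tar.gz bootstrap.sh config/hadoop-config; do
    if [ -e "$f" ]; then echo "OK       $f"; else echo "MISSING  $f"; fi
  done
}
check_context
```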

4) Configuration

1. Hadoop configuration

The main files are:

core-site.xml、dfs.hosts、dfs.hosts.exclude、hdfs-site.xml、mapred-site.xml、yarn-hosts-exclude、yarn-hosts-include、yarn-site.xml

2. The .env file

cat > .env << EOF
HADOOP_HDFS_NN_PORT=9870
HADOOP_HDFS_DN_PORT=9864
HADOOP_YARN_RM_PORT=8088
HADOOP_YARN_NM_PORT=8042
HADOOP_YARN_PROXYSERVER_PORT=9111
HADOOP_MR_HISTORYSERVER_PORT=19888
EOF

5) The bootstrap.sh script

#!/usr/bin/env sh

# Default polling interval for wait_for (seconds); was previously unset
SLEEP_SECOND=${SLEEP_SECOND:-3}

wait_for() {
    echo "Waiting for $1 to listen on $2..."
    while ! nc -z "$1" "$2"; do echo "waiting..."; sleep "$SLEEP_SECOND"; done
}

start_hdfs_namenode() {
    # Format the namenode only on first start
    if [ ! -f /tmp/namenode-formated ]; then
        ${HADOOP_HOME}/bin/hdfs namenode -format >/tmp/namenode-formated
    fi

    ${HADOOP_HOME}/bin/hdfs --loglevel INFO --daemon start namenode

    tail -f ${HADOOP_HOME}/logs/*namenode*.log
}

start_hdfs_datanode() {
    wait_for $1 $2

    ${HADOOP_HOME}/bin/hdfs --loglevel INFO --daemon start datanode

    tail -f ${HADOOP_HOME}/logs/*datanode*.log
}

start_yarn_resourcemanager() {
    ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start resourcemanager

    tail -f ${HADOOP_HOME}/logs/*resourcemanager*.log
}

start_yarn_nodemanager() {
    wait_for $1 $2

    ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start nodemanager

    tail -f ${HADOOP_HOME}/logs/*nodemanager*.log
}

start_yarn_proxyserver() {
    wait_for $1 $2

    ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start proxyserver

    tail -f ${HADOOP_HOME}/logs/*proxyserver*.log
}

start_mr_historyserver() {
    wait_for $1 $2

    ${HADOOP_HOME}/bin/mapred --loglevel INFO --daemon start historyserver

    tail -f ${HADOOP_HOME}/logs/*historyserver*.log
}

case $1 in
    hadoop-hdfs-nn)
        start_hdfs_namenode
        ;;
    hadoop-hdfs-dn)
        start_hdfs_datanode $2 $3
        ;;
    hadoop-yarn-rm)
        start_yarn_resourcemanager
        ;;
    hadoop-yarn-nm)
        start_yarn_nodemanager $2 $3
        ;;
    hadoop-yarn-proxyserver)
        start_yarn_proxyserver $2 $3
        ;;
    hadoop-mr-historyserver)
        start_mr_historyserver $2 $3
        ;;
    *)
        echo "Please pass a valid service name to start."
        ;;
esac
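One caveat: the wait_for above loops forever if its dependency never comes up. A hedged variant (my addition, not from the article) with a bounded retry count:

```shell
# Like the script's wait_for, but gives up after a configurable number of tries.
SLEEP_SECOND=${SLEEP_SECOND:-3}
wait_for_bounded() {
  host=$1; port=$2; tries=${3:-30}
  while ! nc -z "$host" "$port" 2>/dev/null; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
      echo "timed out waiting for $host:$port"
      return 1
    fi
    echo "waiting..."
    sleep "$SLEEP_SECOND"
  done
}
```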

6) The docker-compose.yaml file

version: '3'
services:
  hadoop-hdfs-nn:
    image: hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-nn
    hostname: hadoop-hdfs-nn
    restart: always
    env_file:
      - .env
    ports:
      - "30070:${HADOOP_HDFS_NN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-nn"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_NN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-hdfs-dn-0:
    image: hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-dn-0
    hostname: hadoop-hdfs-dn-0
    restart: always
    depends_on:
      - hadoop-hdfs-nn
    env_file:
      - .env
    ports:
      - "30864:${HADOOP_HDFS_DN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  # hadoop-hdfs-dn-1:
  #   image: hadoop:v1
  #   user: "hadoop:hadoop"
  #   container_name: hadoop-hdfs-dn-1
  #   hostname: hadoop-hdfs-dn-1
  #   restart: always
  #   depends_on:
  #     - hadoop-hdfs-nn
  #   env_file:
  #     - .env
  #   ports:
  #     - "30865:${HADOOP_HDFS_DN_PORT}"
  #   command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
  #   networks:
  #     - hadoop_network
  #   healthcheck:
  #     test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
  #     interval: 10s
  #     timeout: 5s
  #     retries: 3
  # hadoop-hdfs-dn-2:
  #   image: hadoop:v1
  #   user: "hadoop:hadoop"
  #   container_name: hadoop-hdfs-dn-2
  #   hostname: hadoop-hdfs-dn-2
  #   restart: always
  #   depends_on:
  #     - hadoop-hdfs-nn
  #   env_file:
  #     - .env
  #   ports:
  #     - "30866:${HADOOP_HDFS_DN_PORT}"
  #   command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
  #   networks:
  #     - hadoop_network
  #   healthcheck:
  #     test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
  #     interval: 10s
  #     timeout: 5s
  #     retries: 3
  hadoop-yarn-rm:
    image: hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-rm
    hostname: hadoop-yarn-rm
    restart: always
    env_file:
      - .env
    ports:
      - "30888:${HADOOP_YARN_RM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-rm"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_YARN_RM_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-yarn-nm-0:
    image: hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-nm-0
    hostname: hadoop-yarn-nm-0
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30042:${HADOOP_YARN_NM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  # hadoop-yarn-nm-1:
  #   image: hadoop:v1
  #   user: "hadoop:hadoop"
  #   container_name: hadoop-yarn-nm-1
  #   hostname: hadoop-yarn-nm-1
  #   restart: always
  #   depends_on:
  #     - hadoop-yarn-rm
  #   env_file:
  #     - .env
  #   ports:
  #     - "30043:${HADOOP_YARN_NM_PORT}"
  #   command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
  #   networks:
  #     - hadoop_network
  #   healthcheck:
  #     test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
  #     interval: 10s
  #     timeout: 5s
  #     retries: 3
  # hadoop-yarn-nm-2:
  #   image: hadoop:v1
  #   user: "hadoop:hadoop"
  #   container_name: hadoop-yarn-nm-2
  #   hostname: hadoop-yarn-nm-2
  #   restart: always
  #   depends_on:
  #     - hadoop-yarn-rm
  #   env_file:
  #     - .env
  #   ports:
  #     - "30044:${HADOOP_YARN_NM_PORT}"
  #   command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
  #   networks:
  #     - hadoop_network
  #   healthcheck:
  #     test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
  #     interval: 10s
  #     timeout: 5s
  #     retries: 3
  hadoop-yarn-proxyserver:
    image: hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-proxyserver
    hostname: hadoop-yarn-proxyserver
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30911:${HADOOP_YARN_PROXYSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-proxyserver hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_YARN_PROXYSERVER_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-mr-historyserver:
    image: hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-mr-historyserver
    hostname: hadoop-mr-historyserver
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "31988:${HADOOP_MR_HISTORYSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-mr-historyserver hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_MR_HISTORYSERVER_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3

networks:
  hadoop_network:
    driver: bridge

Containers created by different compose files cannot reach each other by hostname unless they are put on the same network. Also note that depends_on only controls the order in which containers start; it says nothing about when the services inside them are ready, so on its own it is of limited use. That is why the bootstrap.sh script above adds a wait_for function to enforce the real service startup order.
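Newer Compose versions can get part of the way there declaratively: the long form of depends_on can wait on a dependency's healthcheck. A hedged sketch (my addition; requires a Compose version that supports condition: service_healthy, and the wait_for approach above remains the more portable option):

```yaml
services:
  hadoop-hdfs-dn-0:
    depends_on:
      hadoop-hdfs-nn:
        # Start this container only after the namenode's healthcheck passes
        condition: service_healthy
```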

7) Build the image

docker build -t hadoop:v1 . --no-cache

Flag reference:

-t: name (and tag) for the image

. : build context; the Dockerfile is looked up in the current directory

-f: explicit path to a Dockerfile (not needed here)

--no-cache: build without using the cache

8) Start the services

docker-compose -f docker-compose.yaml up -d

9) Other useful commands

docker rm -f $(docker ps -aq)            # remove all containers (destructive!)
docker logs -f hadoop-mr-historyserver   # follow a service's logs

10) Verify the deployment

HDFS web UI: ip:30070

The HDFS file system browser:

YARN web UI: ip:30888
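A small hedged helper (my addition) to smoke-test the published ports from the host; 30070 and 30888 are the host ports mapped in the compose file above:

```shell
# Return the HTTP status code for a UI endpoint, or 000 if unreachable.
check_ui() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://$1:$2" 2>/dev/null) || code=000
  echo "$code"
}
# check_ui <host-ip> 30070   # HDFS NameNode UI, expect 200
# check_ui <host-ip> 30888   # YARN ResourceManager UI, expect 200
```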

