Hadoop Distributed File System (Part 2)

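[1. Introduction](#1. Introduction)
[1. Hadoop File Operation Commands](#1. Hadoop File Operation Commands)
[2. Commonly Used Hadoop FS Shell Commands](#2. Commonly Used Hadoop FS Shell Commands)
- [2.1 ls: List Files](#2.1 ls: List Files)
- [2.2 mkdir: Create Directories](#2.2 mkdir: Create Directories)
- [2.3 put: Upload Files](#2.3 put: Upload Files)
- [2.4 cat: View Files](#2.4 cat: View Files)
- [2.5 get: Copy Files to Local](#2.5 get: Copy Files to Local)
- [2.6 rm: Delete Files](#2.6 rm: Delete Files)
[3. Hadoop System Administration Commands](#3. Hadoop System Administration Commands)
[4. HDFS Java API Example](#4. HDFS Java API Example)
References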

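1. Introduction

Most HDFS shell commands behave like their Unix shell counterparts. The main difference is that HDFS shell commands operate on files stored on the remote Hadoop servers, whereas Unix shell commands operate on local files.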

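1. Hadoop File Operation Commands

Hadoop file system (FS) shell commands are invoked as `hadoop fs -cmd <args>` (this requires adding the `bin` directory of the Hadoop installation to the system `PATH` environment variable). All FS commands take URI paths as arguments, in the format `scheme://authority/path`. For HDFS the scheme is `hdfs`; for the local file system the scheme is `file`. Both the scheme and the authority are optional; when they are omitted, the default from the configuration is used (in the image I configured, the default is `hdfs://172.17.0.2:9000`, set by the `fs.defaultFS` property in `etc/hadoop/core-site.xml` under the Hadoop installation directory).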

All Hadoop shell commands are documented in the Hadoop 3.3.6 FS Shell guide.
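How an omitted scheme and authority fall back to the configured default can be illustrated with plain `java.net.URI`. This is only a rough sketch, not Hadoop's actual path resolution: the class `FsUriDemo`, the method `qualify`, and the constant `DEFAULT_FS` are hypothetical names, the default address is the one from my image, and only absolute paths are handled.

```java
import java.net.URI;

class FsUriDemo {
    // Assumed default file system, mirroring the value from core-site.xml.
    static final URI DEFAULT_FS = URI.create("hdfs://172.17.0.2:9000");

    /**
     * Returns a fully qualified URI. Inputs that already carry a scheme are
     * returned unchanged; absolute paths without a scheme inherit the scheme
     * and authority of DEFAULT_FS. Relative paths are out of scope here.
     */
    static URI qualify(String pathOrUri) {
        URI u = URI.create(pathOrUri);
        if (u.getScheme() != null) {
            return u; // e.g. file:///tmp/text1 or hdfs://host:port/path
        }
        return DEFAULT_FS.resolve(u);
    }

    public static void main(String[] args) {
        String[] inputs = {"/test1/text1", "hdfs://172.17.0.2:9000/test1", "file:///tmp/text1"};
        for (String s : inputs) {
            URI u = qualify(s);
            System.out.println(s + " -> scheme=" + u.getScheme()
                    + ", authority=" + u.getAuthority()
                    + ", path=" + u.getPath());
        }
    }
}
```

With this model, `hadoop fs -ls /test1` and `hadoop fs -ls hdfs://172.17.0.2:9000/test1` name the same directory, while a `file://` URI points at the local file system instead.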

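2. Commonly Used Hadoop FS Shell Commands

If you have used Linux shell commands, these are easy to pick up. Run `hadoop fs -help` to see usage help for the `hadoop fs` commands, as shown in the figure below (the help text is long, so only part of it is shown).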

2.1 ls: List Files

Command: `hadoop fs -ls <args>`.

Lists information about the specified directory or file.

hadoop fs -ls /

2.2 mkdir: Create Directories

Command: `hadoop fs -mkdir <paths>`.

Creates directories at the URIs given as path arguments.

hadoop fs -mkdir /test1 /test2 && hadoop fs -ls /

2.3 put: Upload Files

Command: `hadoop fs -put <localsrc> ... <dst>`.

Copies one or more files from the local file system to the destination file system.

touch text1 text2 && hadoop fs -put text1 text2 /test1 && hadoop fs -ls /test1

2.4 cat: View Files

Command: `hadoop fs -cat URI [URI ...]`.

Prints the contents of the specified files.

echo "hello" | tee text3 && hadoop fs -put text3 /test2 && hadoop fs -cat /test2/text3

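The first `hello` is the output of `echo`, which `tee` prints to the terminal while also writing it to `text3`; the second `hello` is the actual content of `/test2/text3`.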

2.5 get: Copy Files to Local

Command: `hadoop fs -get <src> <localdst>`.

Copies files to the local file system.

hadoop fs -get /test2/text3 text4 && ls -l .

2.6 rm: Delete Files

Command: `hadoop fs -rm URI [URI ...]`.

Deletes the specified files.

hadoop fs -rm -r /test1 /test2 && hadoop fs -ls /

With `-rm -r`, deletion is recursive: each directory is removed together with everything in it, whether or not it is empty.

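3. Hadoop System Administration Commands

| Command | Purpose |
| --- | --- |
| `hadoop version` | Show the Hadoop version |
| `hdfs namenode -format` | Format a new distributed file system |
| `start-dfs.sh` | Start HDFS |
| `stop-dfs.sh` | Stop HDFS |
| `start-yarn.sh` | Start YARN |
| `stop-yarn.sh` | Stop YARN |
| `hdfs dfsadmin -safemode enter` | Enter safe mode manually |
| `hdfs dfsadmin -safemode leave` | Leave safe mode |
| `hdfs dfsadmin -safemode get` | Check whether safe mode is on |
| `hdfs dfsadmin -help` | List the commands supported by `hdfs dfsadmin` |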

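The NameNode enters safe mode when it starts. Safe mode is a NameNode state in which the file system accepts no modifications. Its purpose is to let the NameNode check the validity of the data blocks on each DataNode at startup, based on the block reports the DataNodes send. Once a sufficient proportion of blocks meets the minimum replica count, the NameNode leaves safe mode automatically and then replicates or deletes blocks as its policy requires. While HDFS is in safe mode, Hive and HBase may fail to start properly.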
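The exit condition for safe mode can be modeled in a few lines. The sketch below is a simplified illustration rather than the NameNode's real code: `SafeModeSketch` and `canLeaveSafeMode` are hypothetical names, and the values 1 and 0.999 are what I understand to be the defaults of `dfs.namenode.replication.min` and `dfs.namenode.safemode.threshold-pct`, so check them against your own configuration.

```java
class SafeModeSketch {
    /**
     * Simplified model: safe mode can end once the fraction of blocks that
     * have at least minReplicas reported replicas reaches thresholdPct.
     *
     * @param reportedReplicas replica count reported so far for each block
     * @param minReplicas      minimum replicas for a block to count as safe
     * @param thresholdPct     required fraction of safe blocks, from 0.0 to 1.0
     */
    static boolean canLeaveSafeMode(int[] reportedReplicas, int minReplicas, double thresholdPct) {
        if (reportedReplicas.length == 0) {
            return true; // no blocks at all, nothing to wait for
        }
        int safeBlocks = 0;
        for (int replicas : reportedReplicas) {
            if (replicas >= minReplicas) {
                safeBlocks++;
            }
        }
        return (double) safeBlocks / reportedReplicas.length >= thresholdPct;
    }

    public static void main(String[] args) {
        // One of four blocks has no reported replica yet: stay in safe mode.
        System.out.println(canLeaveSafeMode(new int[] {1, 1, 0, 1}, 1, 0.999));
        // Every block has at least one replica: safe mode can end.
        System.out.println(canLeaveSafeMode(new int[] {3, 2, 1}, 1, 0.999));
    }
}
```

This is why a cluster whose DataNodes are slow to report, or have lost blocks, can stay in safe mode until an administrator runs `hdfs dfsadmin -safemode leave`.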

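4. HDFS Java API Example

The image I configured has code-server installed. As long as you map its port to the host when you start the container, you can write code from a browser on the host (at `localhost:<mapped host port>`). I recommend using Maven to manage the Java project and installing the Java extensions in code-server.

Next, press Ctrl+Shift+P to open the code-server command palette, type java, click Create Java Project, choose Maven, then choose maven-archetype-quickstart, and follow the prompts until the Java project is generated (Java experts can do this however they like; I am new to Maven).

The generated Java project contains a `pom.xml` file, which holds project information and the Java dependencies. Add the following dependencies between `<dependencies>` and `</dependencies>`.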

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>3.3.6</version>
      <exclusions>
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>3.3.6</version>
      <exclusions>
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>3.3.6</version>
    </dependency>

Find the App.java file under the project's src directory and enter the following code.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.io.IOUtils;

public class App {

    public static void main(String[] args) throws Exception {
        uploadFile("/root/hadoop-3.3.6/logs/hadoop-root-namenode-0aa49d1e16ae.log", "/");
        createFile("Hello Hadoop root@0aa49d1e16ae\n", "/testcreate");
        createDir("/test");
        fileRename("/hadoop-root-namenode-0aa49d1e16ae.log", "/hadoop-root-namenode.log");
        deleteFile("/test");
        readFile("/testcreate");
        isFileExists("/testcreate");
        fileLastModify("/testcreate");
        fileLocation("/testcreate");
        nodeList();
    }

    // Connects to the NameNode; the address is the fs.defaultFS value from core-site.xml.
    static FileSystem getFileSystem() throws Exception {
        URI uri = new URI("hdfs://172.17.0.2:9000/");
        return FileSystem.get(uri, new Configuration());
    }

    public static void uploadFile(String inputPath,String outputPath) throws Exception {
        FileSystem hdfs = getFileSystem();
        Path src = new Path(inputPath);
        Path dst = new Path(outputPath);
        hdfs.copyFromLocalFile(src, dst);
    }

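    // Creates (or overwrites) a file on HDFS and writes the given string to it.
    public static void createFile(String string, String outputPath) throws Exception {
        // Encode explicitly as UTF-8 instead of relying on the platform default charset.
        byte[] buff = string.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        FileSystem hdfs = getFileSystem();
        Path dfs = new Path(outputPath);
        // try-with-resources closes the stream even if the write fails.
        try (FSDataOutputStream outputStream = hdfs.create(dfs)) {
            outputStream.write(buff, 0, buff.length);
        }
    }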

    public static void createDir(String dirPath) throws Exception {
        FileSystem hdfs = getFileSystem();
        Path dfs = new Path(dirPath);
        hdfs.mkdirs(dfs);
    }

    public static void fileRename(String oldName, String newName) throws Exception {
        FileSystem hdfs = getFileSystem();
        Path frPath = new Path(oldName);
        Path toPath = new Path(newName);
        boolean isRenamed = hdfs.rename(frPath, toPath);
        String result = isRenamed ? "success" : "failure";
        System.out.println("The result of renaming the file: " + result);
    }

    public static void deleteFile(String path) throws Exception {
        FileSystem hdfs = getFileSystem();
        Path delef = new Path(path);
        boolean isDeleted = hdfs.delete(delef, true);
        System.out.println("Delete? " + isDeleted);
    }

    // Streams the contents of an HDFS file to standard output.
    public static void readFile(String readPath) throws Exception {
        FileSystem hdfs = getFileSystem();
        // try-with-resources closes the input stream; System.out is left open.
        try (FSDataInputStream openStream = hdfs.open(new Path(readPath))) {
            IOUtils.copyBytes(openStream, System.out, 1024, false);
        }
    }

    public static void isFileExists(String path) throws Exception {
        FileSystem hdfs = getFileSystem();
        Path findf = new Path(path);
        boolean isExisted = hdfs.exists(findf);
        System.out.println("Exist? " + isExisted);
    }

    // Prints the file's last modification time (milliseconds since the Unix epoch).
    public static void fileLastModify(String path) throws Exception {
        FileSystem hdfs = getFileSystem();
        Path fPath = new Path(path);
        FileStatus fileStatus = hdfs.getFileStatus(fPath);
        long modiTime = fileStatus.getModificationTime();
        System.out.println("The modification time of " + path + " is " + modiTime);
    }

    public static void fileLocation(String path) throws Exception {
        FileSystem hdfs = getFileSystem();
        Path fPath = new Path(path);
        FileStatus fileStatus = hdfs.getFileStatus(fPath);
        BlockLocation[] blockLocations = hdfs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
        int blockLen = blockLocations.length;
        for (int i = 0; i < blockLen; i++) {
            String[] hosts = blockLocations[i].getHosts();
            System.out.println("block_" + i + "_location: " + hosts[0]);
        }
    }

    public static void nodeList() throws Exception {
        FileSystem fs = getFileSystem();
        DistributedFileSystem hdfs = (DistributedFileSystem) fs;
        DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();
        for (int i = 0; i < dataNodeStats.length; i++) {
            System.out.println("DataNode_" + i + "_Name: " + dataNodeStats[i].getHostName());
        }
    }
}
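One note on `fileLastModify` in the listing above: `getModificationTime()` returns milliseconds since the Unix epoch (UTC), so the printed value is a raw number. A small helper along the following lines could make it readable; `ModificationTimeDemo` and `formatUtc` are hypothetical names used only for this sketch, and it relies solely on the standard `java.time` API.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

class ModificationTimeDemo {
    /** Formats epoch milliseconds as an ISO-8601 timestamp in UTC. */
    static String formatUtc(long epochMillis) {
        return DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // Sample value standing in for what getModificationTime() might return.
        long sampleMillis = 1700000000000L;
        System.out.println(formatUtc(sampleMillis)); // prints 2023-11-14T22:13:20Z
    }
}
```

In `fileLastModify`, the same conversion could be applied to `modiTime` before it is printed.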

Click the button in the upper-right corner of App.java and choose Run Java to get the output shown below.

References

Wu Zhangyong, Yang Qiang. 大数据Hadoop3.X分布式处理实战 (Big Data: Hadoop 3.X Distributed Processing in Practice).