Apache Hadoop: A First Look at File Upload, Download, and a Distributed Computing Example

Previous post: A Pitfall-Free Guide to Setting Up a Fully Distributed Apache Hadoop Cluster (CSDN blog)

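In the previous post we built a complete Hadoop cluster. In this post we use that cluster to upload and download a file, and we run the distributed WordCount example as a test. Later posts will take a closer look at distributed computing and distributed storage.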

Upload and download test

Upload a file from the local Linux file system to HDFS and download it again to verify that the HDFS cluster is working properly.

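# Create the directory
hdfs dfs -mkdir -p /test/input

# Create a file in the local home directory and put any content in it
cd /root
vim test.txt

# Upload the Linux file to HDFS
hdfs dfs -put /root/test.txt /test/input

# Download the file from HDFS to the local Linux file system (you can run this on another node to test)
hdfs dfs -get /test/input/test.txt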
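
To confirm that the copy in HDFS is byte-for-byte identical to the local file, the round trip can be checked with a small shell helper. This is a minimal sketch and not part of the original steps: `roundtrip_ok` is a hypothetical helper name, and it assumes `hdfs` is on the `PATH` and that the paths are the ones used above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the HDFS file has the same bytes as the local file.
roundtrip_ok() {
    rt_local=$1
    rt_remote=$2
    rt_tmp=$(mktemp) || return 1
    # Stream the HDFS file into a temp file, then compare it with the local original.
    if hdfs dfs -cat "$rt_remote" > "$rt_tmp" && cmp -s "$rt_local" "$rt_tmp"; then
        rt_status=0
    else
        rt_status=1
    fi
    rm -f "$rt_tmp"
    return "$rt_status"
}

# Run the check only where the hdfs client and the sample file exist.
if command -v hdfs >/dev/null 2>&1 && [ -f /root/test.txt ]; then
    if roundtrip_ok /root/test.txt /test/input/test.txt; then
        echo "round trip OK"
    else
        echo "round trip FAILED"
    fi
fi
```

Comparing content rather than just listing the directory catches a truncated or corrupted transfer, which a plain `hdfs dfs -ls` would not reveal.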

Distributed computing test

Create a wcinput directory under the root of the HDFS file system:

[root@hadoop01 hadoop-2.9.2]# hdfs dfs -mkdir /wcinput

Create a file named wc.txt with the following content:

hadoop mapreduce yarn
hdfs hadoop mapreduce
mapreduce yarn kmning
kmning
kmning

Upload wc.txt to the HDFS directory /wcinput:

hdfs dfs -put wc.txt /wcinput

Run the MapReduce job (from the Hadoop installation directory, hadoop-2.9.2, since the jar path is relative):

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /wcinput/ /wcoutput

The output is as follows:

24/07/03 20:44:26 INFO client.RMProxy: Connecting to ResourceManager at hadoop03/192.168.43.103:8032
24/07/03 20:44:28 INFO input.FileInputFormat: Total input files to process : 1
24/07/03 20:44:28 INFO mapreduce.JobSubmitter: number of splits:1
24/07/03 20:44:28 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
24/07/03 20:44:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1720006717389_0001
24/07/03 20:44:29 INFO impl.YarnClientImpl: Submitted application application_1720006717389_0001
24/07/03 20:44:29 INFO mapreduce.Job: The url to track the job: http://hadoop03:8088/proxy/application_1720006717389_0001/
24/07/03 20:44:29 INFO mapreduce.Job: Running job: job_1720006717389_0001
24/07/03 20:44:45 INFO mapreduce.Job: Job job_1720006717389_0001 running in uber mode : false
24/07/03 20:44:45 INFO mapreduce.Job:  map 0% reduce 0%
24/07/03 20:44:57 INFO mapreduce.Job:  map 100% reduce 0%
24/07/03 20:45:13 INFO mapreduce.Job:  map 100% reduce 100%
24/07/03 20:45:14 INFO mapreduce.Job: Job job_1720006717389_0001 completed successfully
24/07/03 20:45:14 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=70
                FILE: Number of bytes written=396911
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=180
                HDFS: Number of bytes written=44
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=9440
                Total time spent by all reduces in occupied slots (ms)=11870
                Total time spent by all map tasks (ms)=9440
                Total time spent by all reduce tasks (ms)=11870
                Total vcore-milliseconds taken by all map tasks=9440
                Total vcore-milliseconds taken by all reduce tasks=11870
                Total megabyte-milliseconds taken by all map tasks=9666560
                Total megabyte-milliseconds taken by all reduce tasks=12154880
        Map-Reduce Framework
                Map input records=5
                Map output records=11
                Map output bytes=124
                Map output materialized bytes=70
                Input split bytes=100
                Combine input records=11
                Combine output records=5
                Reduce input groups=5
                Reduce shuffle bytes=70
                Reduce input records=5
                Reduce output records=5
                Spilled Records=10
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=498
                CPU time spent (ms)=3050
                Physical memory (bytes) snapshot=374968320
                Virtual memory (bytes) snapshot=4262629376
                Total committed heap usage (bytes)=219676672
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=80
        File Output Format Counters
                Bytes Written=44
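
Several counters in this log can be checked by hand against the five-line input. The sketch below recreates the file locally and counts its lines, words, and bytes with `wc`. It assumes wc.txt uses single spaces between words and ends with a newline, which is consistent with the reported `Bytes Read=80`; the scratch directory and variable names are illustrative only.

```shell
# Recreate the sample input in a scratch directory
# (assumed exact content: single spaces, trailing newline).
wc_dir=$(mktemp -d)
printf '%s\n' \
    'hadoop mapreduce yarn' \
    'hdfs hadoop mapreduce' \
    'mapreduce yarn kmning' \
    'kmning' \
    'kmning' > "$wc_dir/wc.txt"

# $(( ... )) strips the padding that some wc implementations add.
in_lines=$(( $(wc -l < "$wc_dir/wc.txt") ))   # compare with "Map input records"
in_words=$(( $(wc -w < "$wc_dir/wc.txt") ))   # compare with "Map output records"
in_bytes=$(( $(wc -c < "$wc_dir/wc.txt") ))   # compare with "Bytes Read"

echo "lines=$in_lines words=$in_words bytes=$in_bytes"
```

Each input line becomes one map input record and each word one map output record, so the printed values (5, 11, and 80) line up with `Map input records=5`, `Map output records=11`, and `Bytes Read=80` in the log.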

View the result:

[root@hadoop01 hadoop-2.9.2]# hdfs dfs -cat /wcoutput/part-r-00000
hadoop  2
hdfs    1
kmning  3
mapreduce       3
yarn    2
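
For an input this small, the same counts can be reproduced without the cluster as a cross-check. The pipeline below is only a local analogy of the job (split into words, sort so equal words are adjacent, count per word), not how Hadoop executes it; the scratch file names are hypothetical.

```shell
# Recreate the sample input locally (the same five lines as wc.txt).
lc_dir=$(mktemp -d)
printf '%s\n' \
    'hadoop mapreduce yarn' \
    'hdfs hadoop mapreduce' \
    'mapreduce yarn kmning' \
    'kmning' \
    'kmning' > "$lc_dir/wc.txt"

# One word per line, sorted, counted, then printed as "word<TAB>count".
tr -s ' ' '\n' < "$lc_dir/wc.txt" \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ printf "%s\t%d\n", $2, $1 }' > "$lc_dir/result.txt"

cat "$lc_dir/result.txt"
```

The five lines it prints match the contents of part-r-00000, where each word and its count are separated by a tab.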

As shown, the program used MapReduce distributed computation to count how many times each word occurs.
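
One thing to keep in mind when repeating the test: a MapReduce job fails at submission if its output directory already exists, so /wcoutput has to be removed before the same command is run again. A minimal sketch of that cleanup follows; `reset_output_dir` is a hypothetical helper name, and it assumes `hdfs` is on the `PATH`.

```shell
# Hypothetical helper: remove an HDFS output directory if it exists,
# so that the job can be submitted again with the same output path.
reset_output_dir() {
    out_dir=$1
    # "hdfs dfs -test -d" exits with 0 only when the path is an existing directory.
    if hdfs dfs -test -d "$out_dir"; then
        hdfs dfs -rm -r "$out_dir"
    fi
}

# Usage before resubmitting the wordcount job (uncomment on the cluster):
# reset_output_dir /wcoutput
```

The call is left commented out on purpose, because it deletes the previous results.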
