Apache Hadoop: A First Hands-On with File Upload, Download, and a Distributed Computing Example

Previous post: A Pitfall-Free Guide to Setting Up a Fully Distributed Apache Hadoop Cluster (CSDN blog)

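In the previous post we set up a complete Hadoop cluster. In this post we use that cluster to upload and download a file, and we run the distributed WordCount example. Later posts will look more closely at distributed computing and distributed storage.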

Upload and download test

Upload a file from the local Linux file system to HDFS and download it again to verify that the HDFS cluster is working properly.

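# Create the directory
hdfs dfs -mkdir -p /test/input

# Create a file in the local home directory and write any content into it
cd /root
vim test.txt

# Upload the Linux file to HDFS
hdfs dfs -put /root/test.txt /test/input

# Download the file from HDFS to the local Linux file system (you can run this on a different node as a test)
hdfs dfs -get /test/input/test.txt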
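
To confirm that the round trip really preserved the file, the downloaded copy can be compared byte for byte with the original. The following is a minimal sketch and not part of the original steps: the helper name `same_content` and the temporary download directory are hypothetical, and the HDFS part only runs when the `hdfs` client and `/root/test.txt` are present.

```shell
# Compare two local files byte for byte.
# Prints a verdict and returns 0 on a match, 1 otherwise.
same_content() {
    if cmp -s "$1" "$2"; then
        echo "OK: contents match"
    else
        echo "MISMATCH: contents differ"
        return 1
    fi
}

# Only touch HDFS when the client is installed and the uploaded source file exists.
# The paths are the ones used in the steps above.
if command -v hdfs >/dev/null 2>&1 && [ -f /root/test.txt ]; then
    download_dir=$(mktemp -d)   # hypothetical scratch directory for the downloaded copy
    hdfs dfs -get /test/input/test.txt "$download_dir/test.txt"
    same_content /root/test.txt "$download_dir/test.txt"
fi
```

Downloading into a fresh directory avoids the case where `-get` refuses to overwrite an existing local file with the same name.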

Distributed computing test

Create a wcinput directory under the root of the HDFS file system.

[root@hadoop01 hadoop-2.9.2]# hdfs dfs -mkdir /wcinput

Create a wc.txt file with the following content:

hadoop mapreduce yarn
hdfs hadoop mapreduce
mapreduce yarn kmning
kmning
kmning
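
If you would rather not open an editor, the same file can be written with a here-document. This is only a convenience sketch; it assumes you want wc.txt in the current directory, and the two `wc` calls are an optional sanity check.

```shell
# Write the sample input to wc.txt in the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the block.
cat > wc.txt <<'EOF'
hadoop mapreduce yarn
hdfs hadoop mapreduce
mapreduce yarn kmning
kmning
kmning
EOF

# Sanity check: number of lines and number of whitespace-separated words.
wc -l < wc.txt
wc -w < wc.txt
```

The file has 5 lines and 11 words, which corresponds to the `Map input records=5` and `Map output records=11` counters in the job output shown later.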

Upload wc.txt to the HDFS directory /wcinput.

hdfs dfs -put wc.txt /wcinput

Run the MapReduce job from the Hadoop installation directory (the jar path below is relative to it).

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /wcinput/ /wcoutput

The job prints the following output:

24/07/03 20:44:26 INFO client.RMProxy: Connecting to ResourceManager at hadoop03/192.168.43.103:8032
24/07/03 20:44:28 INFO input.FileInputFormat: Total input files to process : 1
24/07/03 20:44:28 INFO mapreduce.JobSubmitter: number of splits:1
24/07/03 20:44:28 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
24/07/03 20:44:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1720006717389_0001
24/07/03 20:44:29 INFO impl.YarnClientImpl: Submitted application application_1720006717389_0001
24/07/03 20:44:29 INFO mapreduce.Job: The url to track the job: http://hadoop03:8088/proxy/application_1720006717389_0001/
24/07/03 20:44:29 INFO mapreduce.Job: Running job: job_1720006717389_0001
24/07/03 20:44:45 INFO mapreduce.Job: Job job_1720006717389_0001 running in uber mode : false
24/07/03 20:44:45 INFO mapreduce.Job:  map 0% reduce 0%
24/07/03 20:44:57 INFO mapreduce.Job:  map 100% reduce 0%
24/07/03 20:45:13 INFO mapreduce.Job:  map 100% reduce 100%
24/07/03 20:45:14 INFO mapreduce.Job: Job job_1720006717389_0001 completed successfully
24/07/03 20:45:14 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=70
                FILE: Number of bytes written=396911
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=180
                HDFS: Number of bytes written=44
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=9440
                Total time spent by all reduces in occupied slots (ms)=11870
                Total time spent by all map tasks (ms)=9440
                Total time spent by all reduce tasks (ms)=11870
                Total vcore-milliseconds taken by all map tasks=9440
                Total vcore-milliseconds taken by all reduce tasks=11870
                Total megabyte-milliseconds taken by all map tasks=9666560
                Total megabyte-milliseconds taken by all reduce tasks=12154880
        Map-Reduce Framework
                Map input records=5
                Map output records=11
                Map output bytes=124
                Map output materialized bytes=70
                Input split bytes=100
                Combine input records=11
                Combine output records=5
                Reduce input groups=5
                Reduce shuffle bytes=70
                Reduce input records=5
                Reduce output records=5
                Spilled Records=10
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=498
                CPU time spent (ms)=3050
                Physical memory (bytes) snapshot=374968320
                Virtual memory (bytes) snapshot=4262629376
                Total committed heap usage (bytes)=219676672
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=80
        File Output Format Counters
                Bytes Written=44

View the result

[root@hadoop01 hadoop-2.9.2]# hdfs dfs -cat /wcoutput/part-r-00000
hadoop  2
hdfs    1
kmning  3
mapreduce       3
yarn    2

As the output shows, the program used MapReduce distributed computation to count how many times each word occurs.
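
For a file this small, the result can be cross-checked locally with standard Unix tools. The sketch below is not how Hadoop computes the counts; it is a single-machine equivalent, assuming the example splits words on whitespace. The function name `local_wordcount` is hypothetical.

```shell
# Count whitespace-separated words in a local file and print "word<TAB>count",
# sorted by word in byte order.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" |   # one word per line
        sed '/^$/d' |                 # drop any empty line left by leading whitespace
        LC_ALL=C sort |               # group identical words together
        uniq -c |                     # prefix each distinct word with its count
        awk '{ print $2 "\t" $1 }'    # reorder to word, tab, count
}

# Run against the local copy of the input, if it is in the current directory.
if [ -f wc.txt ]; then
    local_wordcount wc.txt
fi
```

Run against the wc.txt used above, this prints the same five word and count pairs as part-r-00000.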
