1. Upload the competition data to HDFS and check its format
![](https://file.jishuzhan.net/article/1783164874889629697/139f15e666c2b166513b7ab43ac0da99.webp)
Open HDFS in a browser and look at the first part of the file
![](https://file.jishuzhan.net/article/1783164874889629697/e6c184f4243656bce1ad1eddea7812f7.webp)
The fields in each record are separated by commas. The timestamp is the fifth field, and we extract the year, month, and day from it.
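As a minimal sketch of this extraction step, the Java helper below splits a record on commas and keeps the date part of the fifth field. The sample record and the timestamp layout (`yyyy-MM-dd HH:mm:ss`, with the date before the first space) are assumptions for illustration, since the real format is only visible in the screenshot. `DateFieldExtractor` and `extractDate` are hypothetical names.

```java
public class DateFieldExtractor {

    /**
     * Returns the date part of the fifth comma-separated field,
     * or null if the record has fewer than five fields.
     * Assumes the field looks like "2016-03-01 10:15:30" (hypothetical layout).
     */
    public static String extractDate(String line) {
        String[] fields = line.split(",");
        if (fields.length < 5) {
            return null; // malformed record: the caller can skip it
        }
        String timestamp = fields[4].trim();
        int space = timestamp.indexOf(' ');
        // Keep everything before the first space; if there is none, keep the whole field.
        return space < 0 ? timestamp : timestamp.substring(0, space);
    }

    public static void main(String[] args) {
        // Hypothetical record; only the position of the fifth field matters here.
        String sample = "1,user01,192.168.1.10,/index.html,2016-03-01 10:15:30,200";
        System.out.println(extractDate(sample));
    }
}
```

If the real timestamps use a different layout, only the lines that cut the date out of `timestamp` would need to change.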
2. Create the project mr-raceData in IDEA and apply the basic configuration
![](https://file.jishuzhan.net/article/1783164874889629697/06c97236085f035d6782baeb9c1e77f4.webp)
Edit pom.xml and add the dependencies:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.4</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.30</version>
    </dependency>
</dependencies>
![](https://file.jishuzhan.net/article/1783164874889629697/0511bfda0ac9685983fa71ae05d1aee0.webp)
In the resources directory, create log4j.properties:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=D:\\visitcount.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
![](https://file.jishuzhan.net/article/1783164874889629697/722e0b789e08d7e91619eed850c3103d.webp)
Once the code is written it has to be packaged as a JAR, which requires the following build section in pom.xml:
<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <!-- an execution needs a goal to run; "single" builds the assembly -->
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
The plugins used for packaging:
![](https://file.jishuzhan.net/article/1783164874889629697/434f99506ed2b7447a94574929bb95b7.webp)
Set the packaging type to jar
![](https://file.jishuzhan.net/article/1783164874889629697/c23a80ef45f51988cc349c2d009eb95b.webp)
Write the source code:
Mapper module:
![](https://file.jishuzhan.net/article/1783164874889629697/e25c92340187adbb78cbcd709244703b.webp)
Reducer module:
![](https://file.jishuzhan.net/article/1783164874889629697/cab3016109ff3f95fa96b773a06e54ac.webp)
Driver module:
![](https://file.jishuzhan.net/article/1783164874889629697/732042bb843af88b46405bd94c00482f.webp)
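The actual Hadoop classes are shown only in the screenshots above. As a library-free sketch of the same map, group, and reduce flow, the plain Java program below can be run locally without a cluster to check the counting logic. It is not the code from the screenshots: `LocalDailyCount`, `mapToDate`, and `countByDate` are hypothetical names, and the sample records and the `yyyy-MM-dd HH:mm:ss` timestamp layout are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalDailyCount {

    /** "Map" step: turn one record into a date key, or null for a malformed record. */
    static String mapToDate(String line) {
        String[] fields = line.split(",");
        if (fields.length < 5) {
            return null;
        }
        // Assumed layout "yyyy-MM-dd HH:mm:ss": the date is the part before the space.
        return fields[4].trim().split(" ")[0];
    }

    /** "Shuffle and reduce" step: group records by date and add 1 per record. */
    public static Map<String, Integer> countByDate(List<String> lines) {
        // TreeMap keeps keys sorted, similar to how a reducer receives its keys in order.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String date = mapToDate(line);
            if (date != null) {
                counts.merge(date, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** "Driver" step: feed in some input and print date<TAB>count lines. */
    public static void main(String[] args) {
        // Hypothetical records; only the fifth field is used.
        List<String> lines = Arrays.asList(
                "1,user01,192.168.1.10,/index.html,2016-03-01 10:15:30",
                "2,user02,192.168.1.11,/list.html,2016-03-01 11:02:07",
                "3,user01,192.168.1.10,/index.html,2016-03-02 09:40:12");
        countByDate(lines).forEach((date, n) -> System.out.println(date + "\t" + n));
    }
}
```

In the real job, the body of `mapToDate` would correspond to the Mapper's per-record logic and the summing would happen in the Reducer, with Hadoop doing the grouping between them.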
Finally, package the project as a JAR with Maven by running these four steps in order: clean --> validate --> compile --> package
![](https://file.jishuzhan.net/article/1783164874889629697/ce097767d5d860bf9f43d073454f9601.webp)
Find the packaged JAR file in the project's target directory
![](https://file.jishuzhan.net/article/1783164874889629697/9da3636c379e08f1ad5e004e2744722d.webp)
Copy the JAR file to the desktop, then upload it to the current user's home directory on master
![](https://file.jishuzhan.net/article/1783164874889629697/3ce96f3f311e53fbc50e2728e463fcba.webp)
Upload a portion of the competition log data to HDFS
[yt@master ~]$ hdfs dfs -put access_log.txt /bigdata/
![](https://file.jishuzhan.net/article/1783164874889629697/474133278ac29709d11cdac09dab6ba4.webp)
Run the JAR file to count the number of visits per day
[yt@master ~]$ hadoop jar visitcount-1.0-SNAPSHOT.jar com.maidu.visitcount.DailyAccessCount /bigdata/access_log.txt /output11/
![](https://file.jishuzhan.net/article/1783164874889629697/780e0fe32c6b5e950947de046cf2131c.webp)
When the job finishes, open the output file to see the final counts:
![](https://file.jishuzhan.net/article/1783164874889629697/5fdbe1195add647855312320dad687b4.webp)
The count completed successfully.