First, upload the January order data to HDFS. Each order record consists of two fields: ID and Goods.
Save the order data in order.txt (remember to start the cluster before uploading).
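As a concrete sketch (the sample records and the /bigdata target directory are illustrative assumptions, not part of the original data set), the data file can be created and uploaded like this:

```shell
# Sample order.txt: each line is "ID Goods", separated by a single space
cat > order.txt <<'EOF'
1 apple
2 banana
3 apple
EOF

# Upload to HDFS once the cluster is running
# (guarded so the snippet is harmless on a machine without Hadoop)
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -mkdir -p /bigdata
    hdfs dfs -put -f order.txt /bigdata/
fi
```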
Open IDEA and create a new project.
Edit pom.xml and add the dependencies:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.4</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.30</version>
    </dependency>
</dependencies>
Specify the packaging type: jar.
Plugin configuration for packaging:
<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <!-- Without binding the single goal, the assembly never runs during package -->
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
Create a new log4j configuration file named log4j.properties under the resources directory:
log4j.rootLogger=INFO, stdout, logfile
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=D:\\ordercount.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
Create a new class ShoppingOrderCount in the com.maidu.ordercount package and write the following modules.
1. Writing the Mapper module
Define an inner class MyMap inside ShoppingOrderCount:
public static class MyMap extends Mapper<Object, Text, Text, IntWritable> {
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        // A line such as "3 apple": field 0 is the user ID, field 1 is the goods name.
        // Emit (goods, 1) - the 1 counts one order; the 3 is an ID, not a quantity.
        String[] arr = line.split(" ");
        if (arr.length == 2) {
            context.write(new Text(arr[1]), new IntWritable(1));
        }
    }
}
2. Writing the Reducer module
Define an inner class MyReduce inside ShoppingOrderCount:
public static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Each value is a 1 emitted by the mapper, so counting the values counts the orders.
        int count = 0;
        for (IntWritable val : values) {
            count++;
        }
        context.write(key, new IntWritable(count));
    }
}
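Before running on the cluster, the Mapper/Reducer logic above can be sanity-checked locally with standard Unix tools (the sample lines are made up): cut plays the mapper (extract the goods field), sort plays the shuffle, and uniq -c plays the reducer.

```shell
cat > order_sample.txt <<'EOF'
1 apple
2 banana
3 apple
EOF
# map: emit the goods name; shuffle: sort; reduce: count per key
cut -d' ' -f2 order_sample.txt | sort | uniq -c | awk '{print $2, $1}'
# prints:
#   apple 2
#   banana 1
```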
3. Writing the Driver module
Write the main method in the ShoppingOrderCount class:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
        System.err.println("Usage: at least one input path and one output path are required");
        System.exit(2);
    }
    Job job = Job.getInstance(conf, "order count");
    job.setJarByClass(ShoppingOrderCount.class);
    job.setMapperClass(MyMap.class);
    job.setReducerClass(MyReduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Add the input paths (every argument except the last)
    for (int i = 0; i < otherArgs.length - 1; i++) {
        FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    // Set the output path (the last argument)
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
    // Submit the job and wait for completion
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
4. Compile and package the project with Maven into a jar
Work through the lifecycle steps from top to bottom in the Maven panel; the jar file will be generated under target.
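Equivalently, from the command line (assuming Maven is installed and the command is run from the project root):

```shell
# Build command; the assembly plugin also produces a
# *-jar-with-dependencies.jar variant during the package phase
BUILD_CMD="mvn clean package"
echo "$BUILD_CMD"
# Only run the build where Maven is actually available
if command -v mvn >/dev/null 2>&1; then
    $BUILD_CMD
fi
```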
5. Copy orderCount-1.0-SNAPSHOT.jar to the master host.
6. Run the jar
[yt@master ~]$ hadoop jar orderCount-1.0-SNAPSHOT.jar com.maidu.ordercount.ShoppingOrderCount /bigdata/order.txt /output-2301-02/
7. Check the results after the job finishes
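For example (the output directory matches the one passed on the command line above; the part-r-NNNNN file names are the standard MapReduce output convention):

```shell
# Each reducer writes a part-r-NNNNN file; _SUCCESS marks a completed job
OUT_DIR=/output-2301-02
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -ls "$OUT_DIR"
    hdfs dfs -cat "$OUT_DIR"/part-r-00000
fi
```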
Note: if the job fails with insufficient virtual memory (an error like "is running 261401088B beyond the 'VIRTUAL' memory limit. Current usage: 171.0 MB of 1 GB physical"), see the related CSDN blog post; the usual remedy is to raise the container memory limits or disable the virtual-memory check (yarn.nodemanager.vmem-check-enabled) in yarn-site.xml.