4. MapReduce Serialization

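Overview

Serialization is an important part of distributed computing. A good serialization format can greatly reduce the amount of data sent over the network during a distributed computation.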

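Serialization

Serialization

Object --> byte sequence: for storage on disk or transmission over the network

MR, Spark, Flink: these are distributed execution frameworks, so network transfer is unavoidable

Serialization in Java: Serializable

Characteristics of Hadoop serialization: compact, fast, extensible, interoperable

Spark can use a different serialization framework, Kryo

Deserialization

Byte sequence --> object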

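Two approaches built into Java

Serializable

This is Java's built-in serialization mechanism. It is simple and convenient, but the output is bulky, which makes it a poor fit for sending large volumes of data over the network.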

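java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class JavaSerDemo {

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Person person = new Person(1, "张三", 33);

        // Serialize the object to a file; the "download" directory must already exist.
        // try-with-resources flushes and closes the stream before the file is read back.
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("download/person.obj"))) {
            out.writeObject(person);
        }

        // Deserialize the object from the same file.
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("download/person.obj"))) {
            Object o = in.readObject();
            System.out.println(o);
        }
    }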


    static class Person implements Serializable {
        private int id;
        private String name;
        private int age;

        public Person(int id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }

        @Override
        public String toString() {
            return "Person{" +
                    "id=" + id +
                    ", name='" + name + '\'' +
                    ", age=" + age +
                    '}';
        }

        public int getId() {
            return id;
        }

        public void setId(int id) {
            this.id = id;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public int getAge() {
            return age;
        }

        public void setAge(int age) {
            this.age = age;
        }
    }
}

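Without Serializable

java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class DataSerDemo {

    public static void main(String[] args) throws IOException {

        Person person = new Person(1, "张三", 33);
        // Write the fields one by one; only id and name are written in this demo.
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream("download/person2.obj"))) {
            out.writeInt(person.getId());
            out.writeUTF(person.getName());
        }

        try (DataInputStream in = new DataInputStream(new FileInputStream("download/person2.obj"))) {
            // Note: the fields must be read in exactly the order in which they were written.
            int id = in.readInt();
            String name = in.readUTF();
            System.out.println("id:" + id + " name:" + name);
        }
    }

    /**
     * Note: this class does not need to implement Serializable.
     */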
    static class Person {
        private int id;
        private String name;
        private int age;

        public Person(int id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }

        @Override
        public String toString() {
            return "Person{" +
                    "id=" + id +
                    ", name='" + name + '\'' +
                    ", age=" + age +
                    '}';
        }

        public int getId() {
            return id;
        }

        public void setId(int id) {
            this.id = id;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public int getAge() {
            return age;
        }

        public void setAge(int age) {
            this.age = age;
        }
    }
}
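
To make the size difference between the two approaches concrete, here is a minimal sketch that writes the same object both ways into an in-memory buffer and reports the byte counts. The class name `SerSizeComparison` and its nested `Person` are stand-ins invented for this illustration, and the exact size of the `Serializable` output depends on the class and field names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerSizeComparison {

    // Hypothetical stand-in for the Person class used in the demos above.
    public static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        public final int id;
        public final String name;
        public final int age;

        public Person(int id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }
    }

    // Size in bytes when written with Java's built-in object serialization,
    // which also stores the stream header and a description of the class.
    public static int javaSerializedSize(Person p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        return bytes.size();
    }

    // Size in bytes when only the field values are written, in a fixed order.
    public static int fieldByFieldSize(Person p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.id);      // 4 bytes
            out.writeUTF(p.name);    // 2-byte length prefix + encoded characters
            out.writeInt(p.age);     // 4 bytes
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        Person p = new Person(1, "张三", 33);
        System.out.println("Serializable:     " + javaSerializedSize(p) + " bytes");
        System.out.println("DataOutputStream: " + fieldByFieldSize(p) + " bytes");
    }
}
```

The field-by-field form is smaller because it carries no class metadata. The price is that the reader has to know the field order and types in advance, which is the trade-off Hadoop's Writable makes as well.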

Hadoop serialization

Official documentation (Hadoop MapReduce Tutorial):

The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.

Note: Writable declares two methods, write and readFields.

java
@InterfaceAudience.Public
@InterfaceStability.Stable
public interface Writable {

  void write(DataOutput out) throws IOException;

  void readFields(DataInput in) throws IOException;
}
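
As a rough illustration of how these two methods work together, the sketch below round-trips an object through a byte array. So that it compiles without Hadoop on the classpath, it declares its own `SimpleWritable` interface with the same two method signatures; `SimpleWritable`, `Purchase` and the helper methods are names made up for this example, not part of Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableRoundTrip {

    // Stand-in with the same two methods as org.apache.hadoop.io.Writable,
    // declared here only so the sketch runs without Hadoop.
    public interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical record type used for illustration.
    public static class Purchase implements SimpleWritable {
        public int id;
        public String name;
        public int amount;

        // No-argument constructor: an empty instance is created first,
        // then filled in by readFields.
        public Purchase() {
        }

        public Purchase(int id, String name, int amount) {
            this.id = id;
            this.name = name;
            this.amount = amount;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(id);
            out.writeUTF(name);
            out.writeInt(amount);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Must read in exactly the order used by write().
            id = in.readInt();
            name = in.readUTF();
            amount = in.readInt();
        }
    }

    // Object --> byte sequence
    public static byte[] serialize(SimpleWritable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    // Byte sequence --> object (fills an existing instance)
    public static void deserialize(SimpleWritable target, byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            target.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        Purchase original = new Purchase(1, "zhangsan", 10);
        byte[] data = serialize(original);

        Purchase copy = new Purchase();
        deserialize(copy, data);
        System.out.println(copy.id + " " + copy.name + " " + copy.amount + " (" + data.length + " bytes)");
    }
}
```

With the real interface, the same pattern applies: the framework calls write on the sending side and readFields on an empty instance on the receiving side.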

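Practice

java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class PersonWritable implements Writable {

    private int id;
    private String name;
    private int age;
    // Amount of a single purchase
    private int consumption;
    // Total amount spent
    private long consumptions;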


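    // Hadoop creates instances reflectively before calling readFields,
    // so a no-argument constructor is required.
    public PersonWritable() {
    }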

    public PersonWritable(int id, String name, int age, int consumption) {
        this.id = id;
        this.name = name;
        this.age = age;
        this.consumption = consumption;
    }

    public PersonWritable(int id, String name, int age, int consumption, long consumptions) {
        this.id = id;
        this.name = name;
        this.age = age;
        this.consumption = consumption;
        this.consumptions = consumptions;
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public int getConsumption() {
        return consumption;
    }

    public void setConsumption(int consumption) {
        this.consumption = consumption;
    }

    public long getConsumptions() {
        return consumptions;
    }

    public void setConsumptions(long consumptions) {
        this.consumptions = consumptions;
    }

    @Override
    public String toString() {
        // Only the string field is quoted; numeric fields are printed as-is.
        return "id=" + id +
                ", name='" + name + '\'' +
                ", age=" + age +
                ", consumption=" + consumption +
                ", consumptions=" + consumptions;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name);
        out.writeInt(age);
        out.writeInt(consumption);
        out.writeLong(consumptions);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        id = in.readInt();
        name = in.readUTF();
        age = in.readInt();
        consumption = in.readInt();
        consumptions = in.readLong();
    }
}
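
java
import java.io.IOException;
import java.util.Objects;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Computes the total consumption per person.
 */
public class PersonStatistics {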

    static class PersonStatisticsMapper extends Mapper<LongWritable, Text, IntWritable, PersonWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] split = value.toString().split(",");
            int id = Integer.parseInt(split[0]);
            String name = split[1];
            int age = Integer.parseInt(split[2]);
            int consumption = Integer.parseInt(split[3]);
            PersonWritable writable = new PersonWritable(id, name, age, consumption, 0);
            context.write(new IntWritable(id), writable);
        }
    }

    static class PersonStatisticsReducer extends Reducer<IntWritable, PersonWritable, NullWritable, PersonWritable> {
        @Override
        protected void reduce(IntWritable key, Iterable<PersonWritable> values, Context context) throws IOException, InterruptedException {
            long total = 0L;
            PersonWritable person = null;
            for (PersonWritable data : values) {
                if (Objects.isNull(person)) {
                    // Hadoop reuses the same value instance across iterations,
                    // so copy the fields instead of keeping a reference to it.
                    person = new PersonWritable(data.getId(), data.getName(), data.getAge(), data.getConsumption());
                }
                total = total + data.getConsumption();
            }
            // reduce() is only called for keys that have at least one value, so person is set here.
            person.setConsumptions(total);

            context.write(NullWritable.get(), person);
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration configuration = new Configuration();

        String sourcePath = "data/person.data";
        String distPath = "downloadOut/person-out.data";

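        // FileUtil here appears to be the author's own helper (not shown in this post);
        // it removes an existing output directory so the job can be rerun.
        FileUtil.deleteIfExist(configuration, distPath);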

        Job job = Job.getInstance(configuration, "person statistics");
        job.setJarByClass(PersonStatistics.class);
        //job.setCombinerClass(PersonStatistics.PersonStatisticsReducer.class);
        job.setMapperClass(PersonStatisticsMapper.class);
        job.setReducerClass(PersonStatisticsReducer.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(PersonWritable.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(PersonWritable.class);

        FileInputFormat.addInputPath(job, new Path(sourcePath));
        FileOutputFormat.setOutputPath(job, new Path(distPath));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
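
One thing worth knowing when holding on to a value from the reducer's `Iterable`: Hadoop reuses the same value object while iterating and overwrites its content on each step. The sketch below imitates that behaviour in plain Java, without Hadoop; `Holder` and `reusingIterable` are hypothetical names made up for the illustration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class ValueReuseSketch {

    // Hypothetical mutable value, standing in for a Writable.
    public static class Holder {
        public int value;
    }

    // Hands out the SAME Holder on every step and overwrites its content,
    // imitating how reduce-side values are reused.
    public static Iterable<Holder> reusingIterable(int... numbers) {
        final Holder shared = new Holder();
        return () -> new Iterator<Holder>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < numbers.length;
            }

            @Override
            public Holder next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.value = numbers[index++];
                return shared;
            }
        };
    }

    // Keeps a reference to the first element: ends up seeing the LAST value.
    public static int firstByReference(Iterable<Holder> values) {
        Holder first = null;
        for (Holder h : values) {
            if (first == null) {
                first = h;
            }
        }
        return first.value;
    }

    // Copies the first element's content: keeps the real first value.
    public static int firstByCopy(Iterable<Holder> values) {
        boolean seen = false;
        int first = 0;
        for (Holder h : values) {
            if (!seen) {
                first = h.value;
                seen = true;
            }
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(firstByReference(reusingIterable(10, 20))); // prints 20
        System.out.println(firstByCopy(reusingIterable(10, 20)));      // prints 10
    }
}
```

In a real reducer the safe pattern is the second one: copy the fields you need (or build a new object) rather than storing the iterated instance.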

bash
# person.data
1,张三,30,10
1,张三,30,20
2,李四,25,5
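
The per-person totals that the job should compute for this sample data can be checked without Hadoop. The following is a plain Java sketch of the same grouping and summing, not the job itself; the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PersonTotals {

    // Groups CSV lines of the form id,name,age,consumption by id
    // and sums the consumption column for each id.
    public static Map<Integer, Long> totalsById(List<String> lines) {
        Map<Integer, Long> totals = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            int id = Integer.parseInt(fields[0]);
            long consumption = Long.parseLong(fields[3]);
            totals.merge(id, consumption, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1,张三,30,10",
                "1,张三,30,20",
                "2,李四,25,5");
        System.out.println(totalsById(lines)); // prints {1=30, 2=5}
    }
}
```

In the MapReduce version, the mapper emits the id as the key, the shuffle does the grouping, and the reducer does the summing.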

The result of running the job above is as follows:

Splits / InputFormat & InputSplit

Official documentation

java
org.apache.hadoop.mapreduce.InputFormat
org.apache.hadoop.mapreduce.InputSplit

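Logs

Run the serialization test program and look for the following log lines.

bash
# One input file is loaded in total, and it is divided into one split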
2024-01-06 09:19:42,363 [main] [org.apache.hadoop.mapreduce.lib.input.FileInputFormat] [INFO] - Total input files to process : 1
2024-01-06 09:19:42,487 [main] [org.apache.hadoop.mapreduce.JobSubmitter] [INFO] - number of splits:1
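
Why one small file ends up as one split can be sketched with the arithmetic that file-based input formats are generally described as using: the split size is derived from the block size and the configured minimum and maximum, and a file is cut into chunks of that size, with a small tolerance for the last chunk. The code below is a simplified sketch under those assumptions (a splittable file, and the 128 MB figure chosen only for illustration), not a copy of Hadoop's implementation.

```java
public class SplitCountSketch {

    // A trailing chunk up to 10% larger than the split size is kept as one split.
    private static final double SPLIT_SLOP = 1.1;

    // Split size is the block size, clamped between the configured minimum and maximum.
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits for a single splittable file of the given length.
    public static int countSplits(long fileLength, long splitSize) {
        if (fileLength == 0) {
            // Assumption: an empty file still yields a single empty split.
            return 1;
        }
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the last (possibly slightly oversized) chunk
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(128 * mb, 1L, Long.MAX_VALUE);

        System.out.println(countSplits(40L, splitSize));      // a tiny file like person.data
        System.out.println(countSplits(140 * mb, splitSize)); // within the 10% tolerance
        System.out.println(countSplits(300 * mb, splitSize)); // 128 + 128 + 44
    }
}
```

Under these assumptions, any file much smaller than the split size produces exactly one split, which matches the `number of splits:1` line in the log.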

Conclusion

That wraps up MapReduce serialization. If you have any questions, feel free to leave a comment.
