大数据技术之Flume 企业开发案例——自定义 Sink（10）

[自定义 Sink](#自定义 Sink)

自定义 Sink

1）介绍

Sink 不断地轮询 Channel 中的事件并批量地移除它们，随后将这些事件批量写入到存储或索引系统，或者发送到另一个 Flume Agent。

Sink 是完全事务性的。在从 Channel 批量删除数据之前，每个 Sink 会用 Channel 启动一个事务。批量事件一旦成功写入到存储系统或下一个 Flume Agent，Sink 就利用 Channel 提交事务。事务一旦被提交，该 Channel 从自己的内部缓冲区删除事件。

Sink 组件的目的地包括 hdfs、logger、avro、thrift、ipc、file、null、HBase、solr、自定义等。虽然官方提供的 Sink 类型已经很多，但在实际开发中可能仍不能满足需求。此时，可以根据实际需求来自定义 Sink。

官方提供了自定义 Sink 的接口：Flume Developer Guidehttps://flume.apache.org/FlumeDeveloperGuide.html#sink。自定义 MySink 需要继承 AbstractSink 类并实现 Configurable 接口。

主要实现的方法包括：

configure(Context context) ------ 初始化 context（读取配置文件内容）
process() ------ 从 Channel 读取获取数据（event），这个方法将被循环调用。

使用场景：例如读取 Channel 数据写入 MySQL 或其他文件系统。

2）需求

使用 Flume 接收数据，并在 Sink 端给每条数据添加前缀和后缀，输出到控制台。前后缀可以从 Flume 任务配置文件中配置。

流程分析：

MySink
process()：从 Channel 中取数据，添加前后缀，写入日志。
输出示例：hello:lzl:hello
lzl

数据流：

source
channel
sink

步骤：

编码
- AbstractSink
打包到集群并编写任务配置文件
- Configurable
- configure()：读取任务配置文件中的配置信息。

3）编码

java 复制代码

package com.lzl;

import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MySink extends AbstractSink implements Configurable {

  // 创建 Logger 对象
  private static final Logger LOG = LoggerFactory.getLogger(AbstractSink.class);

  private String prefix;
  private String suffix;

  @Override
  public Status process() throws EventDeliveryException {

    // 声明返回值状态信息
    Status status;

    // 获取当前 Sink 绑定的 Channel
    Channel ch = getChannel();

    // 获取事务
    Transaction txn = ch.getTransaction();

    // 声明事件
    Event event;

    // 开启事务
    txn.begin();

    // 读取 Channel 中的事件，直到读取到事件结束循环
    while (true) {
      event = ch.take();
      if (event != null) {
        break;
      }
    }

    try {

      // 处理事件（打印）
      LOG.info(prefix + new String(event.getBody()) + suffix);

      // 事务提交
      txn.commit();
      status = Status.READY;

    } catch (Exception e) {

      // 遇到异常，事务回滚
      txn.rollback();
      status = Status.BACKOFF;

    } finally {

      // 关闭事务
      txn.close();

    }

    return status;
  }

  @Override
  public void configure(Context context) {

    // 读取配置文件内容，有默认值
    prefix = context.getString("prefix", "hello:");

    // 读取配置文件内容，无默认值
    suffix = context.getString("suffix");

  }
}

4）测试

（1）打包将写好的代码打包，并放到 Flume 的 lib 目录（例如 /opt/module/flume）下。

（2）配置文件

java 复制代码

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = com.lzl.MySink
#a1.sinks.k1.prefix = lzl:
a1.sinks.k1.suffix = :lzl

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

（3）开启任务

java 复制代码

[lzl@hadoop12  flume]$ bin/flume-ng agent -c conf/ -f job/mysink.conf -n a1 -Dflume.root.logger=INFO,console
[lzl@hadoop12  ~]$ nc localhost 44444
hello
OK
lzl
OK

（4）查看结果