数据采集工具之Flume

本文主要实现数据到datahub的采集过程

1、下载

Index of /dist/flume/1.11.0

datahub插件下载

https://aliyun-datahub.oss-cn-hangzhou.aliyuncs.com/tools/aliyun-flume-datahub-sink-2.0.9.tar.gz

2、安装

复制代码
$ tar aliyun-flume-datahub-sink-x.x.x.tar.gz
$ cd aliyun-flume-datahub-sink-x.x.x
$ mkdir ${FLUME_HOME}/plugins.d
$ mv aliyun-flume-datahub-sink ${FLUME_HOME}/plugins.d

3、编写配置文件

复制代码
# A single-node Flume configuration for DataHub
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /soft/data/test.csv
# Describe the sink
a1.sinks.k1.type = com.aliyun.datahub.flume.sink.DatahubSink
a1.sinks.k1.datahub.accessId = 2Z8tAOpDPBm5LEkA
a1.sinks.k1.datahub.accessKey = Tlupsw2G0PdKGCRyPLucHjeESqoCla
a1.sinks.k1.datahub.endPoint = https://datahub.cn-beijing-tbdg-d01.dh.res.bigdata.tbea.com
a1.sinks.k1.datahub.project = bigdata
a1.sinks.k1.datahub.topic = txt_flume
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = ,
a1.sinks.k1.serializer.fieldnames = id,name,gender,salary,my_time,decimal
a1.sinks.k1.serializer.charset = UTF-8
a1.sinks.k1.datahub.retryTimes = 5
a1.sinks.k1.datahub.retryInterval = 5
a1.sinks.k1.datahub.batchSize = 100
a1.sinks.k1.datahub.batchTimeout = 5
a1.sinks.k1.datahub.enablePb = true
a1.sinks.k1.datahub.compressType = DEFLATE
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

4、启动

复制代码
flume-ng agent -n a1 -c conf -f ./conf/flume-txt2datahub.conf -Dflume.root.logger=INFO,console

Q:启动报错

复制代码
[root@hadoop2 apache-flume-1.11.0-bin]# flume-ng agent -n a1 -c conf -f ./conf/flume-txt2datahub.conf -Dflume.root.logger=INFO,console
Info: Including Hive libraries found via () for Hive access
+ exec /soft/jdk1.8.0_421/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/soft/apache-flume-1.11.0-bin/conf:/soft/apache-flume-1.11.0-bin/lib/*:/soft/apache-flume-1.11.0-bin/plugins.d/aliyun-flume-datahub-sink/lib/*:/soft/apache-flume-1.11.0-bin/plugins.d/aliyun-flume-datahub-sink/libext/*:/lib/*' -Djava.library.path= org.apache.flume.node.Application -n a1 -f ./conf/flume-txt2datahub.conf
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/apache-flume-1.11.0-bin/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/apache-flume-1.11.0-bin/plugins.d/aliyun-flume-datahub-sink/libext/slf4j-log4j12-1.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkNotNull(Ljava/lang/Object;Ljava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
        at com.aliyun.datahub.flume.sink.DatahubSink.configure(DatahubSink.java:59)
        at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
        at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:456)
        at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:109)
        at org.apache.flume.node.Application.main(Application.java:491)

A:删除Flume lib文件夹中的guava jar包文件,重新启动

相关推荐
阿里云大数据AI技术21 小时前
StarRocks x Fluss x Paimon湖流一体方案:构建秒级响应、湖流一体的实时数据引擎
大数据·人工智能
Databend21 小时前
Agent 轨迹分析与归因的数据工程实践
大数据·数据库·agent
喵个咪1 天前
Go Wind UBA 拆解系列 - 架构总览:三服务、数据流与契约优先
大数据·后端·go
喵个咪1 天前
Go Wind UBA 拆解系列 - 多租户与安全:两套隔离机制的边界
大数据·后端·go
喵个咪1 天前
Go Wind UBA 拆解系列 - OLAP 与 SQL 硬核:25 个分析模型怎么落地
大数据·后端·go
喵个咪1 天前
Go Wind UBA 拆解系列 - SDK 与采集层:从浏览器到 Kafka
大数据·后端·go
QCC产品中心1 天前
MiniMax Agent 接入实测:企业查询、股权穿透与 UBO 识别(附 Prompt 模板)
大数据·mcp·金融/非金融
SelectDB2 天前
Apache Doris Python UDF:让 SQL 直接调用 Python 生态,支撑 Agent 时代复杂业务逻辑
大数据·数据库·python
ApacheSeaTunnel2 天前
当多表数据涌入,Apache SeaTunnel 如何巧妙化解主键冲突?
大数据·开源·数据集成·seatunnel·技术分享·数据同步
大大大大晴天5 天前
Hudi Metadata Table 与 Hive Sync (HMS)怎么选?
大数据