Data Collection Tools: Flume

This article walks through collecting data into DataHub with Flume.

1. Download

Flume 1.11.0 download (Apache distribution directory): Index of /dist/flume/1.11.0

DataHub plugin download:

https://aliyun-datahub.oss-cn-hangzhou.aliyuncs.com/tools/aliyun-flume-datahub-sink-2.0.9.tar.gz

2. Installation

$ tar -zxvf aliyun-flume-datahub-sink-x.x.x.tar.gz
$ cd aliyun-flume-datahub-sink-x.x.x
$ mkdir -p ${FLUME_HOME}/plugins.d
$ mv aliyun-flume-datahub-sink ${FLUME_HOME}/plugins.d
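
After the move, it can help to confirm that the plugin sits where Flume scans for it. The sketch below checks for the lib and libext subdirectories that appear on the classpath in the startup log later in this article. check_plugin_layout is a hypothetical helper name, not part of Flume or the DataHub plugin.

```shell
#!/usr/bin/env bash
# Sketch: confirm the DataHub sink landed under plugins.d.
# check_plugin_layout is a hypothetical helper, not shipped with Flume.
check_plugin_layout() {
  local flume_home="$1"
  local plugin_dir="${flume_home}/plugins.d/aliyun-flume-datahub-sink"
  local sub
  for sub in lib libext; do
    if [ ! -d "${plugin_dir}/${sub}" ]; then
      echo "missing: ${plugin_dir}/${sub}" >&2
      return 1
    fi
  done
  echo "plugin layout ok: ${plugin_dir}"
}

# Usage (assumes FLUME_HOME is set):
#   check_plugin_layout "${FLUME_HOME}"
```

A non-zero return usually means the archive was extracted into a different directory name or moved one level too deep.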

3. Write the configuration file

# A single-node Flume configuration for DataHub
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /soft/data/test.csv
# Describe the sink
a1.sinks.k1.type = com.aliyun.datahub.flume.sink.DatahubSink
a1.sinks.k1.datahub.accessId = 2Z8tAOpDPBm5LEkA
a1.sinks.k1.datahub.accessKey = Tlupsw2G0PdKGCRyPLucHjeESqoCla
a1.sinks.k1.datahub.endPoint = https://datahub.cn-beijing-tbdg-d01.dh.res.bigdata.tbea.com
a1.sinks.k1.datahub.project = bigdata
a1.sinks.k1.datahub.topic = txt_flume
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = ,
a1.sinks.k1.serializer.fieldnames = id,name,gender,salary,my_time,decimal
a1.sinks.k1.serializer.charset = UTF-8
a1.sinks.k1.datahub.retryTimes = 5
a1.sinks.k1.datahub.retryInterval = 5
a1.sinks.k1.datahub.batchSize = 100
a1.sinks.k1.datahub.batchTimeout = 5
a1.sinks.k1.datahub.enablePb = true
a1.sinks.k1.datahub.compressType = DEFLATE
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
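
The serializer above splits each line on "," and maps the values to the six names in serializer.fieldnames, so a row with a different number of columns will not line up with the topic. A minimal sketch of a pre-check on the source file follows. check_field_count is a hypothetical helper, and the split is naive: it does not understand quoted fields that contain commas.

```shell
#!/usr/bin/env bash
# Sketch: report lines whose comma-separated field count differs from
# the number of names in serializer.fieldnames (6 in this config).
# check_field_count is a hypothetical helper; quoting is not handled.
check_field_count() {
  local file="$1" expected="$2"
  awk -F',' -v n="${expected}" '
    NF != n { printf "line %d: %d fields, expected %d\n", NR, NF, n; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "${file}"
}

# Usage:
#   check_field_count /soft/data/test.csv 6
```

The function prints one line per mismatching row and returns 1 if any were found, so it can gate a deployment script.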

4. Start the agent

flume-ng agent -n a1 -c conf -f ./conf/flume-txt2datahub.conf -Dflume.root.logger=INFO,console
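
The flume-ng launcher defaults to a very small JVM heap (the startup log for this setup shows -Xmx20m), which can be tight for a memory channel with capacity 10000. A minimal sketch of raising it through conf/flume-env.sh, which flume-ng reads from the directory passed with -c. If the file does not exist yet, it can be created from conf/flume-env.sh.template. The heap sizes are illustrative assumptions, not tuned values.

```shell
# conf/flume-env.sh (sourced by flume-ng from the -c conf directory)
# Heap sizes below are assumed example values; adjust them for your host.
export JAVA_OPTS="-Xms256m -Xmx512m"
```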

Q: The agent fails at startup with the following error

[root@hadoop2 apache-flume-1.11.0-bin]# flume-ng agent -n a1 -c conf -f ./conf/flume-txt2datahub.conf -Dflume.root.logger=INFO,console
Info: Including Hive libraries found via () for Hive access
+ exec /soft/jdk1.8.0_421/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/soft/apache-flume-1.11.0-bin/conf:/soft/apache-flume-1.11.0-bin/lib/*:/soft/apache-flume-1.11.0-bin/plugins.d/aliyun-flume-datahub-sink/lib/*:/soft/apache-flume-1.11.0-bin/plugins.d/aliyun-flume-datahub-sink/libext/*:/lib/*' -Djava.library.path= org.apache.flume.node.Application -n a1 -f ./conf/flume-txt2datahub.conf
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/apache-flume-1.11.0-bin/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/apache-flume-1.11.0-bin/plugins.d/aliyun-flume-datahub-sink/libext/slf4j-log4j12-1.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkNotNull(Ljava/lang/Object;Ljava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
        at com.aliyun.datahub.flume.sink.DatahubSink.configure(DatahubSink.java:59)
        at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
        at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:456)
        at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:109)
        at org.apache.flume.node.Application.main(Application.java:491)

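A: Delete the guava jar from Flume's lib directory, then restart the agent. The NoSuchMethodError on com.google.common.base.Preconditions.checkNotNull points to a Guava version conflict: Flume's lib directory comes first on the classpath, so its Guava jar is most likely loaded ahead of the version the DataHub sink was built against.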
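
Rather than deleting the jar outright, it can be moved out of lib so the change is easy to undo. The sketch below assumes the file matches the pattern guava-*.jar and that the plugin's libext directory carries a Guava version the sink can use. disable_flume_guava and the lib-disabled directory are hypothetical names chosen for this example.

```shell
#!/usr/bin/env bash
# Sketch: move Flume's bundled Guava jar(s) out of lib instead of deleting.
# Assumes the jar is named guava-*.jar; disable_flume_guava and the
# lib-disabled directory are hypothetical names for this example.
disable_flume_guava() {
  local flume_home="$1"
  local backup_dir="${flume_home}/lib-disabled"
  local jar moved=0
  mkdir -p "${backup_dir}"
  for jar in "${flume_home}"/lib/guava-*.jar; do
    [ -e "${jar}" ] || continue   # the glob matched nothing
    mv "${jar}" "${backup_dir}/"
    echo "moved $(basename "${jar}") to ${backup_dir}"
    moved=$((moved + 1))
  done
  # Return non-zero when no Guava jar was found in lib.
  [ "${moved}" -gt 0 ]
}

# Usage (assumes FLUME_HOME is set), then restart the agent:
#   disable_flume_guava "${FLUME_HOME}"
```

Moving the jar to a directory outside lib keeps it off the lib/* classpath wildcard, and moving it back restores the original installation.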
