How to write to Elasticsearch from Flink

Preface

Simple examples of a Flink sink writing a data stream to Elasticsearch 5 and Elasticsearch 7.


1. Writing to Elasticsearch 5

  • Maven dependency (pom.xml)
xml
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-elasticsearch5_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
  • Sample code:
java
import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.streaming.api.datastream.DataStreamSink;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.apache.flink.streaming.connectors.elasticsearch5.ElasticsearchSink;
import org.apache.flink.types.Row;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Requests;

import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Es5SinkDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Row row=Row.of("张三","001",getTimestamp("2016-10-24 21:59:06"));
        Row row2=Row.of("张三","002",getTimestamp("2016-10-24 21:50:06"));
        Row row3=Row.of("张三","002",getTimestamp("2016-10-24 21:51:06"));
        Row row4=Row.of("李四","003",getTimestamp("2016-10-24 21:50:56"));
        Row row5=Row.of("李四","004",getTimestamp("2016-10-24 00:48:36"));
        Row row6=Row.of("王五","005",getTimestamp("2016-10-24 00:48:36"));
        DataStreamSource<Row> source =env.fromElements(row,row2,row3,row4,row5,row6);

        Map<String, String> config = new HashMap<>();
//        config.put("cluster.name", "my-cluster-name");
//        config.put("bulk.flush.max.actions", "1");

        List<InetSocketAddress> transportAddresses = new ArrayList<>();

        transportAddresses.add(new InetSocketAddress(InetAddress.getByName("10.68.8.60"), 9300));
        //sink operation
        DataStreamSink<Row> rowDataStreamSink = source.addSink(new ElasticsearchSink<>(config, transportAddresses, new ElasticsearchSinkFunction<Row>() {
            public IndexRequest createIndexRequest(Row element) {
                Map<String, Object> json = new HashMap<>();
                json.put("name22", element.getField(0).toString());
                json.put("no22", element.getField(1));
                json.put("age", 34);
                json.put("create_time", element.getField(2));

                return Requests.indexRequest()
                        .index("cc")
                        .type("mtype")
                        .id(element.getField(1).toString())
                        .source(json);
            }

            @Override
            public void process(Row element, RuntimeContext ctx, RequestIndexer indexer) {
                //use the RequestIndexer to submit the request and write the data
                indexer.add(createIndexRequest(element));
            }
        }));
        env.execute("es demo");
    }
    private static java.sql.Timestamp getTimestamp(String str) throws Exception {
//		String string = "2016-10-24 21:59:06";
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        java.util.Date date=sdf.parse(str);
        java.sql.Timestamp s = new java.sql.Timestamp(date.getTime());
        return s;
    }
}
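The commented-out `config.put` lines above hint at the ES5 sink's user config. A minimal sketch of the commonly used keys, with illustrative values (the cluster name and the interval key are assumptions here; key names follow the Flink Elasticsearch connector's bulk-flush options):

```java
import java.util.HashMap;
import java.util.Map;

public class Es5SinkConfigSketch {
    // Builds the user-config map passed as the first argument of the
    // ES5 ElasticsearchSink constructor.
    static Map<String, String> buildConfig() {
        Map<String, String> config = new HashMap<>();
        config.put("cluster.name", "my-cluster-name");   // must match the ES cluster's name
        config.put("bulk.flush.max.actions", "1");       // flush after every element
        config.put("bulk.flush.interval.ms", "5000");    // or flush on a timer
        return config;
    }

    public static void main(String[] args) {
        System.out.println(buildConfig().get("bulk.flush.max.actions")); // prints 1
    }
}
```

With `bulk.flush.max.actions` set to 1 every element is sent immediately, which is handy for demos but inefficient in production; larger batches or an interval-based flush are usually preferred.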

2. Writing to Elasticsearch 7

  • Maven dependency (pom.xml)
xml
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-elasticsearch7_2.11</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>
  • Sample code:
java
import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.apache.flink.streaming.connectors.elasticsearch7.ElasticsearchSink;
import org.apache.flink.types.Row;
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Requests;

import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class EsSinkDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Row row=Row.of("张三","001",getTimestamp("2016-10-24 21:59:06"));
        Row row2=Row.of("张三","002",getTimestamp("2016-10-24 21:50:06"));
        Row row3=Row.of("张三","002",getTimestamp("2016-10-24 21:51:06"));
        Row row4=Row.of("李四","003",getTimestamp("2016-10-24 21:50:56"));
        Row row5=Row.of("李四","004",getTimestamp("2016-10-24 00:48:36"));
        Row row6=Row.of("王五","005",getTimestamp("2016-10-24 00:48:36"));
        DataStreamSource<Row> source =env.fromElements(row,row2,row3,row4,row5,row6);

        Map<String, String> config = new HashMap<>();
//        config.put("cluster.name", "my-cluster-name");
// This instructs the sink to emit after every element, otherwise they would be buffered
//        config.put("bulk.flush.max.actions", "1");

        List<HttpHost> hosts = new ArrayList<>();
        hosts.add(new HttpHost("10.68.8.69",9200,"http"));

        ElasticsearchSink.Builder<Row> esSinkBuilder = new ElasticsearchSink.Builder<Row>(hosts,new ElasticsearchSinkFunction<Row>() {
            public IndexRequest createIndexRequest(Row element) {
                Map<String, Object> json = new HashMap<>();
                json.put("name22", element.getField(0).toString());
                json.put("no22", element.getField(1));
                json.put("age", 34);
//                json.put("create_time", element.getField(2));

                return Requests.indexRequest()
                        .index("cc")
                        .id(element.getField(1).toString())
                        .source(json);
            }

            @Override
            public void process(Row element, RuntimeContext ctx, RequestIndexer indexer) {
                //use the RequestIndexer to submit the request and write the data
                indexer.add(createIndexRequest(element));
            }
        });
        esSinkBuilder.setBulkFlushMaxActions(100);
        //sink operation
        source.addSink(esSinkBuilder.build());

        env.execute("es demo");
    }

    private static java.sql.Timestamp getTimestamp(String str) throws Exception {
//		String string = "2016-10-24 21:59:06";
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        java.util.Date date=sdf.parse(str);
        java.sql.Timestamp s = new java.sql.Timestamp(date.getTime());
        return s;
    }
}

Summary

The main difference between writing to ES5 and ES7 from Flink is which flink-connector-elasticsearch artifact you depend on. ES7 dropped the concept of mapping types, so the index request no longer calls `.type(...)`; the ES7 connector also builds the sink through `ElasticsearchSink.Builder` and connects over HTTP (`HttpHost`, port 9200) instead of the ES5 transport protocol (`InetSocketAddress`, port 9300).
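One pitfall in both demos: `create_time` is put into the document as a raw `java.sql.Timestamp` (and is commented out in the ES7 version). Elasticsearch date fields expect a parseable string or epoch millis, so explicitly formatting the timestamp before indexing is often safer. A minimal sketch, reusing the demos' date pattern:

```java
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class TimestampFormatSketch {
    private static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Formats a Timestamp as a plain date string that Elasticsearch can
    // map to a date field.
    static String toEsDate(Timestamp ts) {
        return new SimpleDateFormat(PATTERN).format(ts);
    }

    // Parses a demo input string and formats it back, wrapping the
    // checked ParseException for convenience.
    static String roundTrip(String s) {
        try {
            Timestamp ts = new Timestamp(new SimpleDateFormat(PATTERN).parse(s).getTime());
            return toEsDate(ts);
        } catch (ParseException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("2016-10-24 21:59:06")); // prints 2016-10-24 21:59:06
    }
}
```

In the sink function this would be used as, e.g., `json.put("create_time", toEsDate((java.sql.Timestamp) element.getField(2)));`.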
