Filtering Kafka data with PyFlink

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Input columns, output columns, and the filter condition
# (columns_in must include every field referenced in filter_condition)
columns_in = [
    ...
]

columns_out = [
    ...
]
filter_condition = "name = '蒋介石' and sex = '男'"

# Create the streaming execution environment and register the Kafka SQL connector jar
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
t_env.get_config().get_configuration().set_string(
    "pipeline.jars", "file:///work/flink-sql-connector-kafka-3.2.0-1.19.jar"
)

source_topic = "foo"
sink_topic = "baa"
kafka_servers = "kafka:9092"
kafka_consumer_group_id = "flink consumer"

# Source table: every input column is declared as VARCHAR
columnstr = ", ".join(f"`{col}` VARCHAR" for col in columns_in)
source_ddl = f"""
CREATE TABLE kafka_source ({columnstr}) WITH (
  'connector' = 'kafka',
  'topic' = '{source_topic}',
  'properties.bootstrap.servers' = '{kafka_servers}',
  'properties.group.id' = '{kafka_consumer_group_id}',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
)
"""

# Sink table: it only writes, so it needs no consumer group or scan startup mode
columnstr2 = ", ".join(f"`{col}` VARCHAR" for col in columns_out)
sink_ddl = f"""
CREATE TABLE kafka_sink ({columnstr2}) WITH (
  'connector' = 'kafka',
  'topic' = '{sink_topic}',
  'properties.bootstrap.servers' = '{kafka_servers}',
  'format' = 'json'
)
"""

# Filter step: keep the output columns of rows that match filter_condition
select_cols = ", ".join(f"`{col}`" for col in columns_out)
filtersql = f"""
INSERT INTO kafka_sink
SELECT {select_cols}
FROM kafka_source
WHERE {filter_condition}
"""

# Both tables must be registered before the INSERT can reference them
t_env.execute_sql(source_ddl)
t_env.execute_sql(sink_ddl)
# execute_sql submits the INSERT job asynchronously; wait() blocks until the
# job ends, which keeps a locally run script from exiting and killing the job
t_env.execute_sql(filtersql).wait()
```
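The key point in the script is statement order: both `CREATE TABLE` statements have to run before the `INSERT`. Below is a minimal sketch, assuming you want to generate all three statements in one place and check them without a Flink cluster. The function `build_filter_job_sql` and its parameters are hypothetical helpers, not part of PyFlink; the function only builds SQL strings and returns them in execution order. It also checks that every output column exists in the source table, and, to keep the sketch simple, it rejects column names containing a backtick instead of trying to escape them.

```python
from typing import List, Sequence


def _quote(col: str) -> str:
    # Backtick-quote a column name; names containing a backtick are rejected
    # rather than escaped (a simplifying assumption of this sketch)
    if "`" in col:
        raise ValueError(f"unsupported column name: {col!r}")
    return f"`{col}`"


def build_filter_job_sql(
    columns_in: Sequence[str],
    columns_out: Sequence[str],
    filter_condition: str,
    source_topic: str,
    sink_topic: str,
    kafka_servers: str,
    group_id: str,
) -> List[str]:
    """Return [source_ddl, sink_ddl, insert_sql] in the order they must run."""
    missing = [c for c in columns_out if c not in columns_in]
    if missing:
        # The SELECT reads from kafka_source, so every output column must exist there
        raise ValueError(f"output columns not in source table: {missing}")

    source_cols = ", ".join(f"{_quote(c)} VARCHAR" for c in columns_in)
    sink_cols = ", ".join(f"{_quote(c)} VARCHAR" for c in columns_out)
    select_cols = ", ".join(_quote(c) for c in columns_out)

    source_ddl = f"""CREATE TABLE kafka_source ({source_cols}) WITH (
  'connector' = 'kafka',
  'topic' = '{source_topic}',
  'properties.bootstrap.servers' = '{kafka_servers}',
  'properties.group.id' = '{group_id}',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
)"""
    # The sink only writes, so it gets no consumer group or startup mode
    sink_ddl = f"""CREATE TABLE kafka_sink ({sink_cols}) WITH (
  'connector' = 'kafka',
  'topic' = '{sink_topic}',
  'properties.bootstrap.servers' = '{kafka_servers}',
  'format' = 'json'
)"""
    insert_sql = (
        f"INSERT INTO kafka_sink SELECT {select_cols} "
        f"FROM kafka_source WHERE {filter_condition}"
    )
    # DDL first: the INSERT fails if either table is not yet registered
    return [source_ddl, sink_ddl, insert_sql]
```

With this helper, the execution step would pass the first two strings to `t_env.execute_sql` and then call `t_env.execute_sql(stmts[2]).wait()` on the last one. Keeping the order inside the returned list makes it hard to submit the `INSERT` before its tables exist.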