Consuming Pulsar from Flink


Pulsar

Deployment

https://pulsar.apache.org/docs/4.1.x/getting-started-standalone/

```bash
wget https://mirrors.aliyun.com/apache/pulsar/pulsar-4.0.7/apache-pulsar-4.0.7-bin.tar.gz

bin/pulsar standalone
```

Topics

```bash
# Create a topic
bin/pulsar-admin topics create persistent://public/default/my-topic
# Produce a message
bin/pulsar-client produce my-topic --messages 'Hello Pulsar!'
# Consume messages
bin/pulsar-client consume my-topic -s 'my-subscription' -p Earliest -n 0
```

Commands

1. Topic management (`pulsar-admin topics`)

| Function | Command |
| --- | --- |
| Create a topic | `bin/pulsar-admin topics create persistent://public/default/my-topic` |
| Delete a topic | `bin/pulsar-admin topics delete persistent://public/default/my-topic` |
| List topics | `bin/pulsar-admin topics list public/default` |
| Topic stats | `bin/pulsar-admin topics stats persistent://public/default/my-topic` |
| Internal stats | `bin/pulsar-admin topics stats-internal persistent://public/default/my-topic` |
| List subscriptions | `bin/pulsar-admin topics subscriptions persistent://public/default/my-topic` |
| Remove a subscription | `bin/pulsar-admin topics unsubscribe persistent://public/default/my-topic -s my-sub` |
| Skip messages on a subscription (here: 100) | `bin/pulsar-admin topics skip persistent://public/default/my-topic -s my-sub -n 100` |
2. Producing messages (`pulsar-client produce`)

| Function | Command |
| --- | --- |
| Send one message | `bin/pulsar-client produce my-topic -m "hello"` |
| Send multiple (pipe) | `echo -e "a\nb\nc" \| bin/pulsar-client produce my-topic -n 3` |
| Set a key | `bin/pulsar-client produce my-topic -m "msg" -k "key1"` |
| Set properties | `bin/pulsar-client produce my-topic -m "msg" -p "k1=v1,k2=v2"` |
| Send JSON | `bin/pulsar-client produce my-topic -m '{"name":"tom","age":18}'` |
3. Consuming messages (`pulsar-client consume`)

| Function | Command |
| --- | --- |
| From earliest (runs forever) | `bin/pulsar-client consume my-topic -s sub1 -p Earliest -n 0` |
| From latest | `bin/pulsar-client consume my-topic -s sub2 -p Latest -n 0` |
| Exit after N messages | `bin/pulsar-client consume my-topic -s sub3 -n 5` |
| Print message keys | `bin/pulsar-client consume my-topic -s sub4 -n 0 --show-key` |
| Print timestamps | `bin/pulsar-client consume my-topic -s sub5 -n 0 --show-timestamp` |

4. Cluster and tenant management (`pulsar-admin`)

| Function | Command |
| --- | --- |
| Create a tenant | `bin/pulsar-admin tenants create my-tenant --allowed-clusters standalone` |
| Create a namespace | `bin/pulsar-admin namespaces create public/my-ns` |
| Set retention policy | `bin/pulsar-admin namespaces set-retention public/default -s 1G -t 7d` |
| List clusters | `bin/pulsar-admin clusters list` |

Enabling transactions

```properties
# Enable the transaction coordinator
transactionCoordinatorEnabled=true
# Transaction timeout (default 300s; 60s is convenient for testing)
transactionTimeoutMillis=60000
# Transaction log store (defaults to BookKeeper; no change needed)
transactionLogStore=org.apache.pulsar.broker.transaction.pendingack.impl.BookKeeperPendingAckStore
```

The same edits, scripted:

```bash
# broker.conf
sed -i.bak '/^#*transactionCoordinatorEnabled/c\transactionCoordinatorEnabled=true' broker.conf
grep -q '^transactionTimeoutMillis' broker.conf || echo 'transactionTimeoutMillis=60000' >> broker.conf
```
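The sed/grep one-liners above can be folded into a single idempotent helper that rewrites an existing (possibly commented-out) key or appends it when missing. A minimal sketch; `set_conf` is a hypothetical helper, and the `-i.bak` form is used so it works with both GNU and BSD sed:

```shell
# set_conf FILE KEY VALUE — set KEY=VALUE in a conf file:
# rewrite the line if KEY exists (even commented out), append otherwise.
set_conf() {
  local file=$1 key=$2 value=$3
  if grep -qE "^#*${key}=" "$file"; then
    sed -i.bak "s|^#*${key}=.*|${key}=${value}|" "$file"
  else
    echo "${key}=${value}" >> "$file"
  fi
}

# Example against broker.conf:
# set_conf broker.conf transactionCoordinatorEnabled true
# set_conf broker.conf transactionTimeoutMillis 60000
```

Re-running it is safe: a key that already has the desired value is simply rewritten in place, not duplicated.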

Flink

Deployment

Official releases: https://archive.apache.org/dist/flink/

Mirror (China): https://mirrors.aliyun.com/apache/flink/

Maven: https://mvnrepository.com/artifact/org.apache.flink/flink-clients

```bash
# Download
wget https://mirrors.aliyun.com/apache/flink/flink-1.18.1/flink-1.18.1-bin-scala_2.12.tgz

# Start / restart the cluster
bin/stop-cluster.sh
bin/start-cluster.sh

# Verify it is up
ps -ef | grep flink
netstat -tulpn | grep 8081
```

Commands

```bash
echo "=== List Flink jobs ==="
/home/liuhaitao1/flink/flink-1.17.2/bin/flink list

# Watch consumed messages (task-executor stdout)
tail -f /home/liuhaitao1/flink/flink-1.17.2/log/flink-*-taskexecutor-*.out
tail -f /home/liuhaitao1/flink/flink-2.1.1/log/flink-*-taskexecutor-*.out
tail -f /home/liuhaitao1/flink/flink-1.14.2/log/flink-*-taskexecutor-*.out

# Submit a job
/home/liuhaitao1/flink/flink-1.17.2/bin/flink run \
    -c com.example.PulsarConsumer \
    target/flink-pulsar-consumer-with-deps.jar
# Cancel a job
/home/liuhaitao1/flink/flink-1.17.2/bin/flink cancel 6a6858a0d11fc412ac682afdb03f80c1
# Parallelism of 4
flink run -p 4 -c ...
```
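To cancel jobs without copying IDs by hand, the 32-character hex job IDs can be scraped out of `flink list` output. A sketch, assuming the usual CLI line format (`date : jobid : name (STATE)`); `extract_job_ids` is a hypothetical helper:

```shell
# Pull 32-hex job IDs out of `flink list` output
extract_job_ids() {
  grep -oE '[0-9a-f]{32}'
}

# Cancel every running job (uncomment to actually run):
# /home/liuhaitao1/flink/flink-1.17.2/bin/flink list -r \
#   | extract_job_ids \
#   | xargs -r -n1 /home/liuhaitao1/flink/flink-1.17.2/bin/flink cancel
```

`flink list -r` restricts the listing to running jobs, so the pipe does not try to cancel finished ones.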

Connector

Prebuilt

The connector ships as a jar; just add it as a Maven dependency.

Then package the Java project and run the jar on the Flink server.

Jar source:

https://mvnrepository.com/artifact/org.apache.flink/flink-connector-pulsar

It works well, but very few versions are published there.

Building the connector yourself

https://github.com/apache/flink-connector-pulsar

```bash
# Install the locally built connector into the local Maven repository
mvn install:install-file \
  -Dfile=target/flink-connector-pulsar-4.2-SNAPSHOT.jar \
  -DgroupId=org.apache.flink \
  -DartifactId=flink-connector-pulsar \
  -Dversion=4.2.0-latest \
  -Dpackaging=jar \
  -DgeneratePom=true
```

Testing

Commands

```bash
/home/liuhaitao1/flink/flink-1.17.2/bin/flink run \
    -c com.example.PulsarConsumer \
    target/flink-pulsar-consumer-with-deps.jar

/home/liuhaitao1/flink/flink-1.14.2/bin/flink run \
    -c com.example.PulsarConsumer \
    target/flink-pulsar-consumer-with-deps.jar

/home/liuhaitao1/flink/flink-1.11.6/bin/flink run \
    -c com.example.PulsarConsumer \
    target/flink-pulsar-consumer-with-deps.jar
```

Miscellaneous

```bash
# Produce a message
bin/pulsar-client produce my-input-topic -m "hello + $(date +'%a %b %d %H:%M:%S CST %Y')" -n 1

# Check the output topic
bin/pulsar-client consume persistent://public/default/output-trans -s check -n 0

# Produce a batch of 20 messages
for i in {1..20}; do
  bin/pulsar-client produce persistent://public/default/input-trans -m "msg-$i"
done
```
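The batch loop above opens a fresh client connection per message. The produce table earlier shows a piped multi-message form; generating all payloads first and piping them through one `produce` call should be faster. `gen_msgs` is a hypothetical helper that just prints the payloads; whether your `pulsar-client` version accepts piped input this way is worth verifying:

```shell
# gen_msgs N — print msg-1 … msg-N, one payload per line (hypothetical helper)
gen_msgs() {
  local n=$1
  for i in $(seq 1 "$n"); do
    echo "msg-$i"
  done
}

# One client connection for the whole batch (mirrors the piped row in the
# produce table; not verified here):
# gen_msgs 20 | bin/pulsar-client produce persistent://public/default/input-trans -n 20
```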

Java connector code

1.17

```java
package org.example;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.pulsar.source.PulsarSource;
import org.apache.flink.connector.pulsar.source.enumerator.cursor.StartCursor;
import org.apache.flink.connector.pulsar.sink.PulsarSink;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PulsarConsumer {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // Checkpointing is required; externalized checkpoints can be
        // restored even after the job is cancelled
        env.enableCheckpointing(10_000L);
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // ==================== Source ====================
        PulsarSource<String> source = PulsarSource.builder()
                .setServiceUrl("pulsar://127.0.0.1:6650")
                .setAdminUrl("http://127.0.0.1:8080")
                .setTopics("persistent://public/default/input-trans")
                .setSubscriptionName("exactly-once-final-" + System.currentTimeMillis()) // fresh subscription each run
                .setStartCursor(StartCursor.earliest())
                .setDeserializationSchema(new SimpleStringSchema())
                .build();

        // ==================== Sink (exactly-once) ====================
        PulsarSink<String> sink = PulsarSink.builder()
                .setServiceUrl("pulsar://127.0.0.1:6650")
                .setAdminUrl("http://127.0.0.1:8080")
                .setTopics("persistent://public/default/output-trans")
                .setSerializationSchema(new SimpleStringSchema())
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)   // transactional writes
                .build();

        // ==================== Pipeline ====================
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "pulsar-source")
                .map(msg -> {
                    System.out.println("processing message → " + msg);
                    return "OUT-" + msg;
                })
                .sinkTo(sink);

        env.execute("Pulsar 4.0.7 + Flink Exactly-Once");
    }
}
```

pom.xml

1.17

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.example</groupId>
  <artifactId>flink-demo1</artifactId>
  <version>1.0.0</version>
  <packaging>jar</packaging>

  <properties>
    <flink.version>1.17.2</flink.version>

    <pulsar.connector.version>4.1.0-1.17</pulsar.connector.version>

    <maven.compiler.source>11</maven.compiler.source>
    <maven.compiler.target>11</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients</artifactId>
      <version>${flink.version}</version>
    </dependency>

    <!-- the locally built 4.2 connector -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-pulsar</artifactId>
      <version>${pulsar.connector.version}</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-base -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-base</artifactId>
      <version>1.17.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.pulsar</groupId>
      <artifactId>pulsar-client-all</artifactId>
      <version>4.0.7</version>
    </dependency>
    <!-- Logging (optional) -->
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-slf4j-impl</artifactId>
      <version>2.17.1</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Shade plugin: build a fat jar -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.5.0</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals><goal>shade</goal></goals>
            <configuration>
              <createDependencyReducedPom>false</createDependencyReducedPom>
              <finalName>flink-pulsar-consumer-with-deps</finalName>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <!-- change to your own main class -->
                  <mainClass>com.example.PulsarConsumer</mainClass>
                </transformer>
              </transformers>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```

Maven

Upgrading Maven

```bash
# 1. Install sdkman (if not already present)
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"

# 2. Install Maven 3.9.6
sdk install maven 3.9.6

# 3. Make it the default
sdk default maven 3.9.6

mvn -version
```

Packaging the Flink job with Maven

```bash
# Clean old artifacts
rm -rf target/

# Rebuild
mvn clean package -DskipTests

ls -l target/
```