Pulsar
Deployment
https://pulsar.apache.org/docs/4.1.x/getting-started-standalone/
wget https://mirrors.aliyun.com/apache/pulsar/pulsar-4.0.7/apache-pulsar-4.0.7-bin.tar.gz
tar xzf apache-pulsar-4.0.7-bin.tar.gz
cd apache-pulsar-4.0.7
bin/pulsar standalone
topic
Create
bin/pulsar-admin topics create persistent://public/default/my-topic
Send a message
bin/pulsar-client produce my-topic --messages 'Hello Pulsar!'
Receive messages
bin/pulsar-client consume my-topic -s 'my-subscription' -p Earliest -n 0
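The create command above uses the full name `persistent://public/default/my-topic`, while produce and consume use the short name `my-topic`. Both refer to the same topic, because Pulsar expands a short name with the default tenant `public` and the default namespace `default`. The sketch below only illustrates that expansion; `full_topic_name` is a hypothetical helper, not part of the Pulsar CLI, and it ignores legacy name formats.

```shell
# Illustration only: mimic how a short topic name is expanded.
full_topic_name() {
  case "$1" in
    # Already a full name: keep it unchanged.
    persistent://*|non-persistent://*) printf '%s\n' "$1" ;;
    # tenant/namespace/topic: only the scheme is missing.
    */*/*) printf 'persistent://%s\n' "$1" ;;
    # Bare topic name: add scheme, default tenant and default namespace.
    *) printf 'persistent://public/default/%s\n' "$1" ;;
  esac
}
```

For example, `full_topic_name my-topic` prints the same full name that was passed to `topics create`.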
Commands
- Topic management (pulsar-admin topics)
| Task | Command |
|---|---|
| Create a topic | bin/pulsar-admin topics create persistent://public/default/my-topic |
| Delete a topic | bin/pulsar-admin topics delete persistent://public/default/my-topic |
| List topics in a namespace | bin/pulsar-admin topics list public/default |
| Show topic stats | bin/pulsar-admin topics stats persistent://public/default/my-topic |
| Show internal stats | bin/pulsar-admin topics stats-internal persistent://public/default/my-topic |
| List subscriptions | bin/pulsar-admin topics subscriptions persistent://public/default/my-topic |
| Delete a subscription | bin/pulsar-admin topics unsubscribe persistent://public/default/my-topic -s my-sub |
| Clear a subscription's backlog | bin/pulsar-admin topics clear-backlog persistent://public/default/my-topic -s my-sub |
| Skip the next 100 messages | bin/pulsar-admin topics skip persistent://public/default/my-topic -s my-sub -n 100 |
- Sending messages (pulsar-client produce)
| Task | Command |
|---|---|
| Send one message | bin/pulsar-client produce my-topic -m "hello" |
| Send several messages (comma-separated) | bin/pulsar-client produce my-topic -m "a,b,c" |
| Set a message key | bin/pulsar-client produce my-topic -m "msg" -k "key1" |
| Set message properties | bin/pulsar-client produce my-topic -m "msg" -p "k1=v1,k2=v2" |
| Send JSON (change the separator so the commas are not split) | bin/pulsar-client produce my-topic -s '#' -m '{"name":"tom","age":18}' |
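`pulsar-client produce` splits the `-m` value on a separator, which is a comma unless `-s` sets another one, so a JSON payload that contains commas is sent as several messages. The sketch below only simulates that splitting to show how many messages a given value would yield; `count_messages` is a hypothetical helper and does not call Pulsar.

```shell
# Illustration only: count the pieces a -m value is split into.
count_messages() {
  # $1 = value passed to -m, $2 = separator (comma when omitted)
  printf '%s\n' "$1" | awk -v sep="${2:-,}" '{ print split($0, parts, sep) }'
}

count_messages 'a,b,c'
count_messages '{"name":"tom","age":18}'
count_messages '{"name":"tom","age":18}' '#'
```

The first two calls report more than one message, while the last reports a single message because `#` does not occur in the payload.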
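- Consuming messages (pulsar-client consume)
| Task | Command |
|---|---|
| Consume from the earliest message (unlimited) | bin/pulsar-client consume my-topic -s sub1 -p Earliest -n 0 |
| Consume from the latest message | bin/pulsar-client consume my-topic -s sub2 -p Latest -n 0 |
| Exit after N messages | bin/pulsar-client consume my-topic -s sub3 -n 5 |
| Consume and print the key (confirm this flag with `bin/pulsar-client consume --help` for your release) | bin/pulsar-client consume my-topic -s sub4 -n 0 --show-key |
| Consume and print the timestamp (confirm this flag with `bin/pulsar-client consume --help` for your release) | bin/pulsar-client consume my-topic -s sub5 -n 0 --show-timestamp |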
- Cluster and tenant management (pulsar-admin)
| Task | Command |
|---|---|
| Create a tenant | bin/pulsar-admin tenants create my-tenant --allowed-clusters standalone |
| Create a namespace | bin/pulsar-admin namespaces create public/my-ns |
| Set a retention policy (1 GB or 7 days) | bin/pulsar-admin namespaces set-retention public/default -s 1G -t 7d |
| List clusters | bin/pulsar-admin clusters list |
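Enabling transactions
# Enable the transaction coordinator
transactionCoordinatorEnabled=true
# Transaction timeout (default 300 s; can be shortened to 60 s for testing)
transactionTimeoutMillis=60000
# Transaction log store (defaults to BookKeeper; no change needed)
transactionLogStore=org.apache.pulsar.broker.transaction.pendingack.impl.BookKeeperPendingAckStore
bash
# Run inside conf/ against broker.conf; an instance started with bin/pulsar standalone reads standalone.conf instead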
sed -i.bak '/^#*transactionCoordinatorEnabled/c\transactionCoordinatorEnabled=true' broker.conf
grep -q '^transactionTimeoutMillis' broker.conf || echo 'transactionTimeoutMillis=60000' >> broker.conf
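The two commands above handle "replace if present" and "append if missing" separately for each key. A minimal sketch that does both for any key, so it can be re-run safely, is shown below. `set_conf_key` is a hypothetical helper, and it assumes keys and values contain no `|` or `&` characters.

```shell
# Set KEY=VALUE in a properties-style config file, idempotently.
set_conf_key() {
  # $1 = config file, $2 = key, $3 = value
  if grep -Eq "^#?[[:space:]]*$2=" "$1"; then
    # Key exists (possibly commented out): rewrite it in place, keeping a .bak copy.
    sed -i.bak -E "s|^#?[[:space:]]*$2=.*|$2=$3|" "$1"
  else
    # Key is absent: append it.
    printf '%s=%s\n' "$2" "$3" >> "$1"
  fi
}
```

Usage, with the file name taken from the commands above: `set_conf_key broker.conf transactionCoordinatorEnabled true`. Restart the broker afterwards so the change takes effect.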
Flink
Deployment
Official archive: https://archive.apache.org/dist/flink/
China mirror: https://mirrors.aliyun.com/apache/flink/
Maven: https://mvnrepository.com/artifact/org.apache.flink/flink-clients
bash
# Download and extract
wget https://mirrors.aliyun.com/apache/flink/flink-1.18.1/flink-1.18.1-bin-scala_2.12.tgz
tar xzf flink-1.18.1-bin-scala_2.12.tgz
cd flink-1.18.1
# Restart the cluster (stop, then start)
bin/stop-cluster.sh
bin/start-cluster.sh
# Verify that it started
ps -ef | grep flink
netstat -tulpn | grep 8081
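Checking the process list and port 8081 shows that something is listening, but not that the JobManager is answering requests. As an alternative sketch, the function below polls the Flink REST endpoint `/overview` until it responds. `wait_for_flink` is a hypothetical helper; the default URL assumes a local cluster on the port used above.

```shell
# Poll the Flink REST API until it answers or the attempts run out.
wait_for_flink() {
  # $1 = REST URL (default: local JobManager), $2 = max attempts (default: 30)
  url=${1:-http://localhost:8081/overview}
  tries=${2:-30}
  i=1
  while [ "$i" -le "$tries" ]; do
    if curl -sf "$url" > /dev/null; then
      echo "Flink REST endpoint is up after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Flink REST endpoint did not respond at $url" >&2
  return 1
}
```

Calling `wait_for_flink` right after `bin/start-cluster.sh` makes a script wait for the cluster before it submits a job.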
Commands
bash
echo "=== List Flink jobs ==="
/home/liuhaitao1/flink/flink-1.17.2/bin/flink list
# View the consumed messages (task executor stdout); pick the line for the Flink version in use
tail -f /home/liuhaitao1/flink/flink-1.17.2/log/flink-*-taskexecutor-*.out
tail -f /home/liuhaitao1/flink/flink-2.1.1/log/flink-*-taskexecutor-*.out
tail -f /home/liuhaitao1/flink/flink-1.14.2/log/flink-*-taskexecutor-*.out
# Run a job
/home/liuhaitao1/flink/flink-1.17.2/bin/flink run \
-c com.example.PulsarConsumer \
target/flink-pulsar-consumer-with-deps.jar
# Cancel a job by its job ID
/home/liuhaitao1/flink/flink-1.17.2/bin/flink cancel 6a6858a0d11fc412ac682afdb03f80c1
# Run with parallelism 4
flink run -p 4 -c ...
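The commands above hard-code the installation path for each Flink version. A small wrapper that takes the path from `FLINK_HOME` makes it easier to switch versions; this is a sketch, and `submit_job` and the `DRY_RUN` switch are hypothetical names introduced here for illustration.

```shell
# Installation to use; override FLINK_HOME to point at another version.
FLINK_HOME=${FLINK_HOME:-/home/liuhaitao1/flink/flink-1.17.2}

submit_job() {
  # $1 = main class, $2 = jar path, $3 = parallelism (default: 1)
  set -- "$FLINK_HOME/bin/flink" run -p "${3:-1}" -c "$1" "$2"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"
  fi
}
```

Usage: `submit_job com.example.PulsarConsumer target/flink-pulsar-consumer-with-deps.jar 4`. Setting `DRY_RUN=1` first prints the command so it can be checked before submitting.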
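Connector
Prebuilt
Add the connector jar as a Maven dependency,
then package the Java project and run it on the Flink server.
Jar source:
https://mvnrepository.com/artifact/org.apache.flink/flink-connector-pulsar
Easy to use, but only a few versions are published, which is frustrating.
Building the connector yourself
https://github.com/apache/flink-connector-pulsar
bash
# Install the built jar into the local Maven repository first; -Dversion must match the connector version declared in the project's pom.xml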
mvn install:install-file \
-Dfile=target/flink-connector-pulsar-4.2-SNAPSHOT.jar \
-DgroupId=org.apache.flink \
-DartifactId=flink-connector-pulsar \
-Dversion=4.2.0-latest \
-Dpackaging=jar \
-DgeneratePom=true
Testing
Commands
bash
/home/liuhaitao1/flink/flink-1.17.2/bin/flink run \
-c com.example.PulsarConsumer \
target/flink-pulsar-consumer-with-deps.jar
/home/liuhaitao1/flink/flink-1.14.2/bin/flink run \
-c com.example.PulsarConsumer \
target/flink-pulsar-consumer-with-deps.jar
/home/liuhaitao1/flink/flink-1.11.6/bin/flink run \
-c com.example.PulsarConsumer \
target/flink-pulsar-consumer-with-deps.jar
Other
bash
# Send messages
bin/pulsar-client produce my-input-topic -m "hello + $(date +'%a %b %d %H:%M:%S CST %Y')" -n 1
bin/pulsar-client consume persistent://public/default/output-trans -s check -n 0
for i in {1..20}; do
bin/pulsar-client produce persistent://public/default/input-trans -m "msg-$i"
done
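After sending the 20 test messages, one way to check the exactly-once claim is to look for payloads that show up more than once on the output topic. The filter below is a sketch that assumes `pulsar-client consume` prints each message on a line containing `content:<payload>`; the exact output format differs between Pulsar releases, so adjust the pattern if needed. `find_duplicates` is a hypothetical helper.

```shell
# Read consumer output on stdin and print every payload seen more than once.
find_duplicates() {
  # Keep only the text after "content:", then report repeated payloads.
  sed -n 's/.*content://p' | sort | uniq -d
}
```

Usage: `bin/pulsar-client consume persistent://public/default/output-trans -s check -p Earliest -n 20 | find_duplicates`. Empty output means no payload was delivered twice among the messages read.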
Java connection code
1.17
package com.example;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.pulsar.sink.PulsarSink;
import org.apache.flink.connector.pulsar.source.PulsarSource;
import org.apache.flink.connector.pulsar.source.enumerator.cursor.StartCursor;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// The package must match the class passed to "flink run -c" and the mainClass in pom.xml.
public class PulsarConsumer {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Checkpointing is required; externalized checkpoints are retained so the job can be restored after a cancel.
        env.enableCheckpointing(10_000L);
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // ==================== Source ====================
        PulsarSource<String> source = PulsarSource.builder()
                .setServiceUrl("pulsar://127.0.0.1:6650")
                .setAdminUrl("http://127.0.0.1:8080")
                .setTopics("persistent://public/default/input-trans")
                // A new subscription on every run, so each fresh start reads the topic from the beginning.
                .setSubscriptionName("exactly-once-final-" + System.currentTimeMillis())
                .setStartCursor(StartCursor.earliest())
                .setDeserializationSchema(new SimpleStringSchema())
                .build();

        // ==================== Sink (exactly-once) ====================
        PulsarSink<String> sink = PulsarSink.builder()
                .setServiceUrl("pulsar://127.0.0.1:6650")
                .setAdminUrl("http://127.0.0.1:8080")
                .setTopics("persistent://public/default/output-trans")
                .setSerializationSchema(new SimpleStringSchema())
                // Uses Pulsar transactions, so transactions must be enabled on the broker.
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .build();

        // ==================== Pipeline ====================
        // fromSource needs a non-null watermark strategy; no watermarks are needed here.
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "pulsar-source")
                .map(msg -> {
                    System.out.println("Processing message -> " + msg);
                    return "OUT-" + msg;
                })
                .sinkTo(sink);

        env.execute("Pulsar 4.0.7 + Flink Exactly-Once");
    }
}
pom.xml
1.17
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.example</groupId>
<artifactId>flink-demo1</artifactId>
<version>1.0.0</version>
<packaging>jar</packaging>
<properties>
<flink.version>1.17.2</flink.version>
<pulsar.connector.version>4.1.0-1.17</pulsar.connector.version>
<maven.compiler.source>11</maven.compiler.source>
<maven.compiler.target>11</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients</artifactId>
<version>${flink.version}</version>
</dependency>
<!-- Pulsar connector; when using a self-built jar installed locally, set pulsar.connector.version to the version you installed -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-pulsar</artifactId>
<version>${pulsar.connector.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-base -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-base</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.pulsar</groupId>
<artifactId>pulsar-client-all</artifactId>
<version>4.0.7</version>
</dependency>
<!-- Logging (optional) -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>2.17.1</version>
</dependency>
</dependencies>
<build>
<plugins>
<!-- Build a fat jar with the Shade plugin -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.5.0</version>
<executions>
<execution>
<phase>package</phase>
<goals><goal>shade</goal></goals>
<configuration>
<createDependencyReducedPom>false</createDependencyReducedPom>
<finalName>flink-pulsar-consumer-with-deps</finalName>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<!-- Change this to your own main class -->
<mainClass>com.example.PulsarConsumer</mainClass>
</transformer>
</transformers>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Maven
Upgrading Maven
# 1. Install SDKMAN (if it is not installed yet)
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"
# 2. Install Maven 3.9.6
sdk install maven 3.9.6
# 3. Make it the default
sdk default maven 3.9.6
mvn -version
Packaging the Flink job with Maven
# Remove old build output
rm -rf target/
# Rebuild the jar
mvn clean package -DskipTests
ls -l target/