1. Overview
| Component | Role |
|---|---|
| PostgreSQL | Data source; provides the update_time column used for incremental pulls |
| Logstash | Pulls data via the jdbc input plugin and writes it to the topic via the kafka output plugin |
| Kafka | Consumed downstream; supports SASL/PLAIN username/password authentication |
Architecture:

```
PostgreSQL
    ↓ JDBC (incremental)
Logstash pipeline
    ↓ SASL/PLAIN
Kafka topic
```
2. Prerequisites (one-time)
- Download the JDBC driver and place it on the Logstash host (or in the container):
  `/usr/share/logstash/postgresql-42.3.6.jar`
- On the PostgreSQL side, create an index on the incremental column (to avoid full table scans):
  ```sql
  CREATE INDEX idx_news_update_time ON news(update_time);
  ```
- On the Kafka side, create the topic:
  ```bash
  kafka-topics.sh --create --topic pg-news --partitions 6 --replication-factor 2 \
    --bootstrap-server kafka-broker-1:9092
  ```
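
The incremental query in the next section assumes a news table with id, title, content, and update_time columns. A minimal sketch of such a table (column types are assumptions; adapt to your real schema):

```sql
-- Hypothetical schema matching the columns pulled by the Logstash statement
CREATE TABLE public.news (
    id          BIGSERIAL PRIMARY KEY,
    title       TEXT NOT NULL,
    content     TEXT,
    update_time TIMESTAMP NOT NULL DEFAULT now()  -- incremental tracking column
);
```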
3. Core configuration: pg2kafka.conf
```ruby
input {
  jdbc {
    # === Connection ===
    jdbc_connection_string => "jdbc:postgresql://postgresql-server:5432/mydb"
    jdbc_user              => "postgres"
    jdbc_password          => "${PG_PASSWORD}"   # injected via a K8s Secret
    jdbc_driver_class      => "org.postgresql.Driver"
    jdbc_driver_library    => "/usr/share/logstash/postgresql-42.3.6.jar"

    # === Incremental SQL ===
    statement => "
      SELECT id, title, content, update_time
      FROM public.news
      WHERE update_time > :sql_last_value
      ORDER BY update_time
      LIMIT 5000"
    use_column_value       => true
    tracking_column        => "update_time"
    tracking_column_type   => "timestamp"
    last_run_metadata_path => "/usr/share/logstash/data/pg_news_last_value"

    # Poll every 30 minutes; for a 30-second interval, rufus-scheduler also
    # accepts a 6-field cron with a leading seconds field, e.g. "*/30 * * * * *"
    schedule => "*/30 * * * *"
  }
}

filter {
  # Normalize the timestamp
  mutate {
    copy         => { "update_time" => "@timestamp" }
    remove_field => ["@version", "update_time"]
  }
}

output {
  kafka {
    bootstrap_servers => "kafka-broker-1:9092,kafka-broker-2:9092,kafka-broker-3:9092"
    topic_id          => "pg-news"
    codec             => json_lines
    compression_type  => "lz4"
    acks              => "all"
    client_id         => "logstash-pg2k-news"

    # === SASL/PLAIN username/password ===
    security_protocol => "SASL_PLAINTEXT"
    sasl_mechanism    => "PLAIN"
    sasl_jaas_config  => 'org.apache.kafka.common.security.plain.PlainLoginModule required
                          username="${KAFKA_USER}"
                          password="${KAFKA_PASS}";'
  }
}
```
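
The config reads PG_PASSWORD, KAFKA_USER, and KAFKA_PASS from the environment. A minimal sketch of injecting them from a K8s Secret into the Logstash container (the Secret name "pg2kafka-credentials" and its keys are placeholders):

```yaml
# Container spec fragment: map Secret keys to the environment variables
# referenced by ${...} in pg2kafka.conf
env:
  - name: PG_PASSWORD
    valueFrom:
      secretKeyRef:
        name: pg2kafka-credentials   # hypothetical Secret name
        key: pg-password
  - name: KAFKA_USER
    valueFrom:
      secretKeyRef:
        name: pg2kafka-credentials
        key: kafka-user
  - name: KAFKA_PASS
    valueFrom:
      secretKeyRef:
        name: pg2kafka-credentials
        key: kafka-pass
```

Outside K8s, the same variables can simply be exported in the shell or stored in the Logstash keystore.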
4. Startup and verification
- Syntax check:
  ```bash
  bin/logstash --config.test_and_exit -f pg2kafka.conf
  ```
- Start:
  ```bash
  nohup bin/logstash -f pg2kafka.conf \
    --path.logs /var/log/logstash \
    > /dev/null 2>&1 &
  ```
- Verify consumption (see the SASL note below):
  ```bash
  kafka-console-consumer.sh --bootstrap-server kafka-broker-1:9092 \
    --topic pg-news --from-beginning
  ```
  If JSON records show up in real time, the pipeline is working.
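
If the broker listener enforces SASL/PLAIN for all clients, the console consumer also needs matching credentials. A sketch, assuming a throwaway properties file and placeholder credentials:

```bash
# Client properties mirroring the SASL/PLAIN settings used by Logstash
cat > /tmp/client-sasl.properties <<'EOF'
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="logstash" \
  password="changeme";
EOF

kafka-console-consumer.sh --bootstrap-server kafka-broker-1:9092 \
  --topic pg-news --from-beginning \
  --consumer.config /tmp/client-sasl.properties
```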
5. Multiple tables in parallel (just copy)
| Table | Topic | last_value file |
|---|---|---|
| news | pg-news | /data/pg_news_last_value |
| user | pg-user | /data/pg_user_last_value |
Copy pg2kafka.conf and change three things (see the sketch below):
- statement: the table name (and tracking_column, if it differs)
- topic_id
- last_run_metadata_path: must be different for every pipeline
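
For example, a pg-user.conf would only differ in these options (the columns of the user table are assumptions; everything else stays identical to pg2kafka.conf):

```ruby
# pg-user.conf — only the options that change are shown
input {
  jdbc {
    # "user" is a reserved word in PostgreSQL, hence the quoting
    statement => '
      SELECT id, name, email, update_time
      FROM public."user"
      WHERE update_time > :sql_last_value
      ORDER BY update_time
      LIMIT 5000'
    tracking_column        => "update_time"
    last_run_metadata_path => "/usr/share/logstash/data/pg_user_last_value"
    # ...remaining jdbc options unchanged...
  }
}

output {
  kafka {
    topic_id => "pg-user"
    # ...remaining kafka options unchanged...
  }
}
```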
Finally, add an entry for it in pipelines.yml:
```yaml
- pipeline.id: pg-user
  path.config: "/etc/logstash/pipeline.d/pg-user.conf"
  queue.type: persisted
```
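
Note that pipelines.yml is only read when Logstash starts without -f/-e, and every pipeline (pg-news included) needs its own entry there. A minimal sketch of running in multi-pipeline mode:

```bash
# No -f here: Logstash loads all pipelines defined in pipelines.yml
nohup bin/logstash --path.logs /var/log/logstash > /dev/null 2>&1 &
```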