一、kafka.zookeeper介绍
Kafka
简介: Apache Kafka 是一个开源的分布式流处理平台和消息队列系统。它最初由LinkedIn开发,并于2011年成为Apache软件基金会的顶级项目。
特点:
高吞吐量: Kafka 能够处理大规模的消息流,并具有很高的吞吐量。
持久性: 它将消息持久化到磁盘上,因此即使消费者不在线,也能保证消息不会丢失。
可伸缩性: Kafka 可以很容易地水平扩展以处理大量数据。
实时性: Kafka 可以提供几乎实时的消息传递,适用于大多数实时数据处理需求。
用途:
日志收集: Kafka 可以用作集中式的日志收集系统,收集来自不同源头的日志数据。
消息队列: Kafka 可以用作分布式应用程序之间的消息队列,用于解耦和异步通信。
流处理: Kafka 可以与流处理框架(如Apache Spark、Apache Flink等)结合使用,用于实时数据处理和分析。
ZooKeeper
简介: Apache ZooKeeper 是一个开源的分布式协调服务,最初也是由Yahoo开发的,并于2010年成为Apache软件基金会的顶级项目。
特点:
分布式协调: ZooKeeper 提供了分布式应用程序的协调服务,包括配置管理、命名服务、分布式锁等。
高可用性: ZooKeeper 通过在集群中保持多个节点的复制来实现高可用性和容错性。
一致性: ZooKeeper 提供了严格的一致性,确保所有的客户端在同一时间看到相同的数据视图。
用途:
配置管理: ZooKeeper 可以用于分布式系统的配置管理,例如动态配置更新。
命名服务: ZooKeeper 可以提供命名服务,帮助分布式系统中的节点发现和通信。
分布式锁: ZooKeeper 可以用于实现分布式锁,确保在分布式系统中对共享资源的互斥访问。
Kafka 和 ZooKeeper 的关系
在 Kafka 中,ZooKeeper 主要用于管理集群的元数据(如主题、分区、副本分配等)、领导者选举以及生产者和消费者的协调。Kafka 依赖于 ZooKeeper 来确保分布式系统的稳定运行。通常情况下,Kafka 和 ZooKeeper 会一起部署,但它们是两个独立的项目,各自提供不同的功能。
二、创建存储卷
三、搭建Kafka集群
# 操作系统
# CentOS Linux release 7.9.2009 (Core)
lsb_release -a
# 内核版本
# 3.10.0-1160.90.1.el7.x86_64
uname -a
# k8s 版本 1.21
# zookeeper 版本 3.4.10 kafka镜像版本0.11(嫌低可以自己换)
kafka需要依赖zookeeper
kafka的生产者与消费者需要在zookeeper中注册,不然消费者怎么知道生产者是否存活之类的哈哈。废话不多说,直接上干货!
本文用的是statefulset和动态存储部署zookeeper和kafka集群。
zookeeper.yaml
apiVersion: v1
kind: Service
metadata:
name: zk-hs
labels:
app: zk
spec:
ports:
- port: 2888
name: server
- port: 3888
name: leader-election
clusterIP: None
selector:
app: zk
---
apiVersion: v1
kind: Service
metadata:
name: zk-cs
labels:
app: zk
spec:
ports:
- port: 2181
targetPort: 2181
name: client
selector:
app: zk
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
name: zk-pdb
spec:
selector:
matchLabels:
app: zk
maxUnavailable: 2
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: zk
spec:
selector:
matchLabels:
app: zk
serviceName: zk-hs
replicas: 3
updateStrategy:
type: RollingUpdate
podManagementPolicy: OrderedReady
template:
metadata:
labels:
app: zk
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
- zk
topologyKey: "kubernetes.io/hostname"
containers:
- name: kubernetes-zookeeper
imagePullPolicy: IfNotPresent
image: "zhaoguanghui6/kubernetes-zookeeper:1.0-3.4.10"
resources:
requests:
memory: "0.5Gi"
cpu: "0.5"
ports:
- containerPort: 2181
name: client
- containerPort: 2888
name: server
- containerPort: 3888
name: leader-election
command:
- sh
- -c
- "start-zookeeper \
--servers=3 \
--data_dir=/var/lib/zookeeper/data \
--data_log_dir=/var/lib/zookeeper/data/log \
--conf_dir=/opt/zookeeper/conf \
--client_port=2181 \
--election_port=3888 \
--server_port=2888 \
--tick_time=2000 \
--init_limit=10 \
--sync_limit=5 \
--heap=512M \
--max_client_cnxns=60 \
--snap_retain_count=3 \
--purge_interval=12 \
--max_session_timeout=40000 \
--min_session_timeout=4000 \
--log_level=INFO"
readinessProbe:
exec:
command:
- sh
- -c
- "zookeeper-ready 2181"
initialDelaySeconds: 10
timeoutSeconds: 5
livenessProbe:
exec:
command:
- sh
- -c
- "zookeeper-ready 2181"
initialDelaySeconds: 10
timeoutSeconds: 5
volumeMounts:
- name: datadir
mountPath: /var/lib/zookeeper
securityContext:
runAsUser: 1000
fsGroup: 1000
volumeClaimTemplates:
- metadata:
name: datadir
spec:
storageClassName: nfs-client
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
for i in 0 1 2; do kubectl exec zk-$i -- hostname -f; done
zk-0.zk-headless.default.svc.cluster.local
zk-1.zk-headless.default.svc.cluster.local
zk-2.zk-headless.default.svc.cluster.local
kafka.yaml
---
apiVersion: v1
kind: Service
metadata:
name: kafka-hs
labels:
app: kafka
spec:
ports:
- port: 9092
name: server
clusterIP: None
selector:
app: kafka
---
apiVersion: v1
kind: Service
metadata:
name: kafka-cs
labels:
app: kafka
spec:
selector:
app: kafka
type: NodePort
ports:
- name: client
port: 9092
nodePort: 30092
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
name: kafka-pdb
spec:
selector:
matchLabels:
app: kafka
minAvailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: kafka
spec:
serviceName: kafka-hs
replicas: 3
selector:
matchLabels:
app: kafka
template:
metadata:
labels:
app: kafka
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
- kafka
topologyKey: "kubernetes.io/hostname"
podAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
podAffinityTerm:
labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
- zk
topologyKey: "kubernetes.io/hostname"
terminationGracePeriodSeconds: 300
containers:
- name: kafka
imagePullPolicy: IfNotPresent
image: registry.cn-hangzhou.aliyuncs.com/jaxzhai/k8skafka:v1
resources:
requests:
memory: "1Gi"
cpu: 500m
ports:
- containerPort: 9092
name: server
command:
- sh
- -c
- "exec kafka-server-start.sh /opt/kafka/config/server.properties --override broker.id=${HOSTNAME##*-}
--override listeners=PLAINTEXT://:9092
--override zookeeper.connect=zk-0.zk-hs.default.svc.cluster.local:2181,zk-0.zk-hs.default.svc.cluster.local:2181,zk-0.zk-hs.default.svc.cluster.local:2181
--override log.dir=/var/lib/kafka
--override auto.create.topics.enable=true
--override auto.leader.rebalance.enable=true
--override background.threads=10
--override compression.type=producer
--override delete.topic.enable=true
--override leader.imbalance.check.interval.seconds=300
--override leader.imbalance.per.broker.percentage=10
--override log.flush.interval.messages=9223372036854775807
--override log.flush.offset.checkpoint.interval.ms=60000
--override log.flush.scheduler.interval.ms=9223372036854775807
--override log.retention.bytes=-1
--override log.retention.hours=168
--override log.roll.hours=168
--override log.roll.jitter.hours=0
--override log.segment.bytes=1073741824
--override log.segment.delete.delay.ms=60000
--override message.max.bytes=1000012
--override min.insync.replicas=1
--override num.io.threads=8
--override num.network.threads=3
--override num.recovery.threads.per.data.dir=1
--override num.replica.fetchers=1
--override offset.metadata.max.bytes=4096
--override offsets.commit.required.acks=-1
--override offsets.commit.timeout.ms=5000
--override offsets.load.buffer.size=5242880
--override offsets.retention.check.interval.ms=600000
--override offsets.retention.minutes=1440
--override offsets.topic.compression.codec=0
--override offsets.topic.num.partitions=50
--override offsets.topic.replication.factor=3
--override offsets.topic.segment.bytes=104857600
--override queued.max.requests=500
--override quota.consumer.default=9223372036854775807
--override quota.producer.default=9223372036854775807
--override replica.fetch.min.bytes=1
--override replica.fetch.wait.max.ms=500
--override replica.high.watermark.checkpoint.interval.ms=5000
--override replica.lag.time.max.ms=10000
--override replica.socket.receive.buffer.bytes=65536
--override replica.socket.timeout.ms=30000
--override request.timeout.ms=30000
--override socket.receive.buffer.bytes=102400
--override socket.request.max.bytes=104857600
--override socket.send.buffer.bytes=102400
--override unclean.leader.election.enable=true
--override zookeeper.session.timeout.ms=6000
--override zookeeper.set.acl=false
--override broker.id.generation.enable=true
--override connections.max.idle.ms=600000
--override controlled.shutdown.enable=true
--override controlled.shutdown.max.retries=3
--override controlled.shutdown.retry.backoff.ms=5000
--override controller.socket.timeout.ms=30000
--override default.replication.factor=1
--override fetch.purgatory.purge.interval.requests=1000
--override group.max.session.timeout.ms=300000
--override group.min.session.timeout.ms=6000
--override inter.broker.protocol.version=0.10.2-IV0
--override log.cleaner.backoff.ms=15000
--override log.cleaner.dedupe.buffer.size=134217728
--override log.cleaner.delete.retention.ms=86400000
--override log.cleaner.enable=true
--override log.cleaner.io.buffer.load.factor=0.9
--override log.cleaner.io.buffer.size=524288
--override log.cleaner.io.max.bytes.per.second=1.7976931348623157E308
--override log.cleaner.min.cleanable.ratio=0.5
--override log.cleaner.min.compaction.lag.ms=0
--override log.cleaner.threads=1
--override log.cleanup.policy=delete
--override log.index.interval.bytes=4096
--override log.index.size.max.bytes=10485760
--override log.message.timestamp.difference.max.ms=9223372036854775807
--override log.message.timestamp.type=CreateTime
--override log.preallocate=false
--override log.retention.check.interval.ms=300000
--override max.connections.per.ip=2147483647
--override num.partitions=1
--override producer.purgatory.purge.interval.requests=1000
--override replica.fetch.backoff.ms=1000
--override replica.fetch.max.bytes=1048576
--override replica.fetch.response.max.bytes=10485760
--override reserved.broker.max.id=1000 "
env:
- name: KAFKA_HEAP_OPTS
value : "-Xmx1G -Xms1G"
- name: KAFKA_OPTS
value: "-Dlogging.level=INFO"
volumeMounts:
- name: datadir
mountPath: /var/lib/kafka
readinessProbe:
exec:
command:
- sh
- -c
- "/opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server=localhost:9092"
timeoutSeconds: 5
periodSeconds: 5
initialDelaySeconds: 70
securityContext:
runAsUser: 1000
fsGroup: 1000
volumeClaimTemplates:
- metadata:
name: datadir
annotations:
volume.beta.kubernetes.io/storage-class: "nfs-client"
spec:
accessModes: [ "ReadWriteMany" ]
resources:
requests:
storage: 5Gi
四、验证集群
验证kafka是否可用:
1、进入kafka-0命令: kubectl exec -it kafka-0 bash
进入容器目录:cd /opt/kafka/config
2、创建一个名为aaa的topc命令:kafka-topics.sh --create --topic aaa --zookeeper zk-0.zk-headless.default.svc.cluster.local:2181,zk-1.zk-headless.default.svc.cluster.local:2181,zk-2.zk-headless.default.svc.cluster.local:2181 --partitions 3 --replication-factor 2
结果为:
Created topic "aaa".
3、进入topic为aaa的生产者消息中心:kafka-console-consumer.sh --topic aaa --bootstrap-server localhost:9092
4、复制新的会话,进入另一个容器kafka-1:kubectl exec -it kafka-1 bash
进入消费者,输入命令:kafka-console-producer.sh --topic aaa --broker-list localhost:9092
输入:
hello
i lovle you
回车后,可在生产者消息中心看到消息