kafka下载安装部署

Apache kafka 是一个分布式的基于push-subscribe的消息系统,它具备快速、可扩展、可持久化的特点。它现在是Apache旗下的一个开源系统,作为hadoop生态系统的一部分,被各种商业公司广泛应用。它的最大的特性就是可以实时的处理大量数据以满足各种需求场景:比如基于hadoop的批处理系统、低延迟的实时系统、storm/spark流式处理引擎。

kafka的特性:

1.高吞吐量、低延迟:kafka每秒可以处理几十万条消息,它的延迟最低只有几毫秒

2.可扩展性:kafka集群支持热扩展

3.持久性、可靠性:消息被持久化到本地磁盘,并且支持数据备份防止数据丢失

4.容错性:允许集群中节点失败(若副本数量为n,则允许n-1个节点失败)

5.高并发:支持数千个客户端同时读写

kafka是一个分布式消息系统,由linkedin使用scala编写,用作LinkedIn的活动流(Activity Stream)和运营数据处理管道(Pipeline)的基础。具有高水平扩展和高吞吐量。

Kafka 和其他主流分布式消息系统的对比

1、下载安装kafka

Kafka需要依赖JAVA环境运行,需要先下载安装JDK。Kafka支持内置的Zookeeper和引用外部的Zookeeper,如果使用外部的zookeeper,需要提前下载安装zookeeper (zookeeper下载安装部署)。

在安装jdk之前,先卸载Linux系统自带的jdk。

通过 rpm -qa | grep jdk 命令查看系统自带的jdk,并通过 rpm -e --nodeps命令逐个卸载。

Jdk8下载地址:Java Downloads | Oracle

下载后上传到Linux系统的某个目录下,解压并移动到/usr/local目录下。

bash 复制代码
tar -zxvf jdk-8u391-linux-x64.tar.gz
mv jdk1.8.0_391 /usr/local/jdk1.8/jdk1.8.0_391

配置环境变量,修改 /etc/profile 文件,添加如下jdk的配置。

bash 复制代码
#set java environment
export JAVA_HOME=/usr/local/jdk1.8/jdk1.8.0_391
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH:$HOME/bin

然后执行 source /etc/profile 命令使得修改立即生效。

kafka下载地址:Apache Kafka

在/usr/local/目录下创建kafka目录,并在kafka目录下通过wget命令下载kafka压缩包,或者将在Windows系统中下载好的kafka压缩包通过Xftp传到kafka目录中,然后解压。

bash 复制代码
tar -zxvf kafka_2.12-3.6.1.tgz

最后使用root用户修改/etc/profile文件,添加kafka启动bin目录,以便在任何目录下都可以通过cd $KAFKA_HOME命令进入到kafka安装目录。最后通过source /etc/profile 命令使得修改生效。

配置环境变量,修改 /etc/profile 文件,在最后加上如下配置:

bash 复制代码
export KAFKA_HOME=/usr/local/kafka/kafka_2.12-3.6.1
export PATH=$KAFKA_HOME/bin:$PATH

然后执行 source /etc/profile 命令使得修改立即生效。

2、单机部署

2.1、修改配置文件

在 /usr/local/kafka/kafka_2.12-3.6.1 目录下创建一个用于存放日志的目录mylogs,在server.properties配置文件中会使用到这个目录。

bash 复制代码
mkdir -p /usr/local/kafka/kafka_2.12-3.6.1/mylogs

修改 kafka 的配置文件: server.properties

bash 复制代码
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

port=9092
host.name=192.168.10.188

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://192.168.10.188:9092

# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
#log.dirs=/tmp/kafka-logs
#log.dirs=D:/MySoftware/Install/tools/kafka/logs
log.dirs=/usr/local/kafka/kafka_2.12-3.6.1/mylogs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
# zookeeper.connect=localhost:2181
zookeeper.connect=192.168.10.188:2181/kafka

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=18000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0


#Delete topic
delete.topic.enable=true

server.properties 中只添加了 port和 host.name ,并修改了 log.dirs zookeeper.connect 属性,其余都是默认。 另外,还要开启 listeners 属性,不然在后面启动 consumer 接受消息时看不到消息。

配置文件中参数详解:

复制代码
broker.id=0  #当前机器在集群中的唯一标识,和zookeeper的myid性质一样
复制代码
port=9092 #当前kafka对外提供服务的端口默认是9092
复制代码
host.name=192.168.7.100 #这个参数默认是关闭的,在0.8.1有个bug,DNS解析问题,失败率的问题。改成自己centos的ip地址。
复制代码
num.network.threads=3 #这个是borker进行网络处理的线程数
复制代码
num.io.threads=8 #这个是borker进行I/O处理的线程数
复制代码
log.dirs=/opt/kafka/kafkalogs/ #消息存放的目录,这个目录可以配置为","逗号分割的表达式,上面的num.io.threads要大于这个目录的个数这个目录,如果配置多个目录,新创建的topic他把消息持久化的地方是,当前以逗号分割的目录中,那个分区数最少就放那一个
复制代码
socket.send.buffer.bytes=102400 #发送缓冲区buffer大小,数据不是一下子就发送的,先回存储到缓冲区了到达一定的大小后在发送,能提高性能
复制代码
socket.receive.buffer.bytes=102400 #kafka接收缓冲区大小,当数据到达一定大小后在序列化到磁盘
复制代码
socket.request.max.bytes=104857600 #这个参数是向kafka请求消息或者向kafka发送消息的请请求的最大数,这个值不能超过java的堆栈大小
复制代码
num.partitions=1 #默认的分区数,一个topic默认1个分区数
复制代码
log.retention.hours=168 #默认消息的最大持久化时间,168小时,7天
复制代码
message.max.byte=5242880  #消息保存的最大值5M
复制代码
default.replication.factor=2  #kafka保存消息的副本数,如果一个副本失效了,另一个还可以继续提供服务
复制代码
replica.fetch.max.bytes=5242880  #取消息的最大直接数
复制代码
log.segment.bytes=1073741824 #这个参数是:因为kafka的消息是以追加的形式落地到文件,当超过这个值的时候,kafka会新起一个文件
复制代码
log.retention.check.interval.ms=300000 #每隔300000毫秒去检查上面配置的log失效时间(log.retention.hours=168 ),到目录查看是否有过期的消息如果有,删除
复制代码
log.cleaner.enable=false #是否启用log压缩,一般不用启用,启用的话可以提高性能
复制代码
zookeeper.connect=192.168.7.100:2181,192.168.7.101:2181,192.168.7.107:2181/kafka #设置zookeeper的连接端口,在集群配置时,要把所有机器的ip地址都要写上,这里以三个机器为例。如果是单机部署,只需要写一个ip地址就行了。

**注意:**在zookeeper.connect的最后加上/kafka是因为kafka需要依赖zookeeper,在kafka启动之后默认会在zookeeper服务所在节点的根目录下创建很多与kafka有关的目录,这样就会导致zookeeper服务所在节点的根目录下的文件很多很乱。另外,如果多个kafka共用一个zookeeper,就会导致zookeeper服务的根目录下各个kafka文件更加混乱。所以在zookeeper.connect的最后加上/kafka是为了在kafka启动时将创建的文件都放到zookeeper节点根目录下的/kafka子目录下。多个kafka共用一个zookeeper时可以分别配置自己的子目录以示区分。

启动zookeeper和kafka之后,会自动在zookeeper节点上创建/kafka目录。

2.2、配置和启动zookeeper

方式一:使用kafka自带的zookeeper,修改zookeeper.properties配置文件:

bash 复制代码
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
# 
#    http://www.apache.org/licenses/LICENSE-2.0
# 
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# the directory where the snapshot is stored.
# dataDir=D:/MySoftware/Install/tools/kafka/tmp/zookeeper
dataDir=/usr/local/kafka/kafka_2.12-3.6.1/tmp/zk/data
dataLogDir=/usr/local/kafka/kafka_2.12-3.6.1/tmp/zk/logs
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0
# Disable the adminserver by default to avoid port conflicts.
# Set the port to something non-conflicting if choosing to enable this
admin.enableServer=false
# admin.serverPort=8080

进入到kafka安装目录下,执行如下命令启动zookeeper:

bin/zookeeper-server-start.sh config/zookeeper.properties

bash 复制代码
bin/zookeeper-server-start.sh config/zookeeper.properties

方式二:使用外部安装的zookeeper。

这里使用外部安装的zookeeper。由于这里kafka是单机部署,所以zookeeper也要使用单机部署,具体步骤见 zookeeper下载安装部署 中的单机部署zookeeper部分。因为之前已经安装并配置了zookeeper,所以这里不在配置了,直接启动就行了。

进入到zookeeper安装目录下的bin目录中,执行如下命令启动zookeeper服务端。

./zkServer.sh start

bash 复制代码
./zkServer.sh start

2.3、启动kafka

切记:启动kafka之前必须先启动zookeeper。

复制代码
进入到kafka的bin目录下,启动kafka。参数-daemon的含义是指启动的服务进程是作为后台进程(守护进程模式)启动,不加就是作为前端线程来启动。Kafka在启动一段时间后,如果出现服务自动关闭情况,可在启动kafka的时使用守护进程模式启动,即在原启动命令中加 -daemon。启动之后用jps命令检查是否启动。启动命令:./kafka-server-start.sh -daemon ../config/server.properties
bash 复制代码
# 进入到 bin 目录
./kafka-server-start.sh -daemon ../config/server.properties

# 或者进入到kafka安装目录,执行如下命令
bin/kafka-server-start.sh -daemon config/server.properties

2.4、创建、查看、删除topic

创建topic:

创建一个名字为testKafka的topic,只有一个副本,一个分区。

进入到kafka安装目录的bin目录下,执行kafka-topics.sh脚本。

--zookeeper参数表示要指定zookeeper服务的安装节点,多个节点可以用逗号分隔。并且最后加上在server.properties配置文件中zookeeper.connect属性设置的kafka启动时存储信息的路径,上面配置文件中zookeeper.connect属性配置的路径是/kafka。

命令:./kafka-topics.sh --zookeeper 192.168.10.188:2181 /kafka --create --replication-factor 1 --partitions 1 --topic testKafka

bash 复制代码
./kafka-topics.sh --zookeeper 192.168.10.188:2181/kafka --create --replication-factor 1 --partitions 1 --topic testKafka
复制代码
#参数解释
复制代码
--replication-factor 1   #副本因子是1
复制代码
--partitions 1   #创建1个分区
复制代码
--topic testKafka   #主题为testKafka

192.168.10.188:2181是在server.properties文件中配置的zookeeper.connect,这个是zk的连接端口。

查看topic及topic状态:

查看topic的命令:./kafka-topics.sh -zookeeper 192.168.10.188:2181 -list

或者:./kafka-topics.sh --zookeeper 192.168.10.188:2181 --list

bash 复制代码
./kafka-topics.sh --zookeeper 192.168.10.188:2181 --list

查看topic状态的命令:./kafka-topics.sh --zookeeper 192.168.10.188:2181 --describe --topic testKafka

bash 复制代码
./kafka-topics.sh --zookeeper 192.168.10.188:2181 --describe --topic testKafka

leader:负责处理消息的读和写,leader是从所有节点中随机选择的.
replicas:列出了所有的副本节点,不管节点是否在服务中.
isr:是正在服务中的节点.

此处是单机部署kafka,只有一个broker,在server.properties文件中broker.id=0,所以此处leader是节点为0的broker。

删除topic:

命令:./kafka-topics.sh --zookeeper 192.168.10.188:2181 --delete --topic testKafka

bash 复制代码
./kafka-topics.sh --zookeeper 192.168.10.188:2181 --delete --topic testKafka
bash 复制代码
#Delete topic
delete.topic.enable=true

2.5、启动producer和consumer

在介绍启动producer和consumer的命令之前,先简单了解一下broker-list、bootstrap-servers和zookeeper几个参数。

1.broker:kafka服务端,可以是一个服务器也可以是一个集群。producer和consumer都相当于这个服务端的客户端。

2.broker-list:指定kafka集群中的一个或多个服务器,一般在使用kafka-console-producer.sh的时候,这个参数是必备参数,另外一个必备的参数是topic。

3.bootstrap-servers指的是kafka目标集群的服务器地址,这和broker-list功能一样,不过在启动producer时要求用broker-list,在启动consumer时用bootstrap-servers。

  1. zookeeper指的是zk服务器或zk集群的地址。旧版本(0.9以前)的kafka,消费的进度(offset)是写在zk中的,所以启动consumer需要知道zk的地址。后来的版本都统一由broker管理,所以在启动consumer时就用bootstrap-server。

启动producer并发送消息,发送消息之后用Ctrl+C结束。

命令:./kafka-console-producer.sh --broker-list 192.168.10.188:9092 --topic testKafka

bash 复制代码
./kafka-console-producer.sh --broker-list 192.168.10.188:9092 --topic testKafka

启动consumer并接受消息。按Ctrl+C结束。

命令:./kafka-console-consumer.sh --zookeeper 192.168.10.188:2181 --topic testKafka --from-beginning (参数 zookeeper bootstrap-server 代替了)

或者:./kafka-console-consumer.sh --bootstrap-server 192.168.10.188:9092 --topic testKafka --from-beginning

bash 复制代码
./kafka-console-consumer.sh --bootstrap-server 192.168.10.188:9092 --topic testKafka --from-beginning

参数**--zookeeper 192.168.10.188:2181** 中的ip和port是zookeeper节点的ip和zookeeper的port,参数**--bootstrap-server 192.168.10.188:9092**中的ip和port是kafka节点的ip和kafka的port。

2.6、查看消费者组以及消息是否积压

查看消费者组的命令:

bin/kafka-consumer-groups.sh --bootstrap-server 192.168.10.188:9092 --list

bash 复制代码
bin/kafka-consumer-groups.sh --bootstrap-server 192.168.10.188:9092 --list

查看消息是否有积压的命令:

bin/kafka-consumer-groups.sh --bootstrap-server 192.168.10.188:9092 --describe --group consumer-group-01

bash 复制代码
bin/kafka-consumer-groups.sh --bootstrap-server 192.168.10.188:9092 --describe --group consumer-group-01

上图是在windows系统中执行kafka命令的截图,与Linux系统命令类似。上图中GROUP表示消费者组,TOPIC表示消息主题,PARTITION表示分区,CURRENT-OFFSET表示当前消费的消息条数,LOG-END-OFFSET表示kafka中生产的消息条数,LAG表示kafka中有多少条消息还未消费,也就是有多少条积压的消息。

在kafka中,消费者是按批次拉取数据的,每一批次拉取的数据条数是0-n条,每个消费者可以拉取多个分区的数据,但是一个分区的数据只能被同一个消费者组中的一个消费者拉取。如果一个消费者拉取多个分区的数据,那么拉取的这一批次的数据就包含多个分区的数据。消费者处理完这批数据之后,会将offset提交到__consumer_offsets这个topic中,__consumer_offsets(是一个topic)就是用于维护消费者消费到哪条数据offset的,是按照分区粒度维护的,各个分区的offset是互不影响的。例如一个consumer拉取两个分区(p0、p1)的数据,如果p0分区的数据处理完并将offset提交到__consumer_offsets中,而p1分区的数据还未处理完,p1分区的offset还未提交到__consumer_offsets中,此时consumer异常重启,consumer不会再拉取p0分区上次已消费的数据,但是会重新拉取p1分区上次消费但未提交的数据。

__consumer_offsets 这个 topic kafka 自动创建的,当 consumer 消费数据之后, consumer 就会把 offset 提交到 __consumer_offsets 中。

2.7、关闭zookeeper和kafka

关闭kafka的命令:./kafka-server-stop.sh (必须进到kafka的bin目录下才能执行该命令)

关闭zk的命令:./zkServer.sh stop (必须进到zookeeper的bin目录下才能执行该命令)

3、集群部署

集群部署的步骤与单机部署几乎是一样的,主要的区别在于kafka的配置文件。

Kafka集群是把状态保存在Zookeeper中的,首先要搭建Zookeeper集群。

Zookeeper的集群部署具体步骤见 zookeeper下载安装部署 中的集群部署zookeeper部分。这里与zookeeper集群部署一样,仍然使用三台计算机构成kafka集群。下面先在一台计算机上部署kafka,另外两台计算机的配置与这一台完全一样,只需修改配置文件中对应节点的ip和broker.id。假设三台计算机的ip地址分别是192.168.1.128192.168.1.129192.168.1.130

3.1、修改配置文件

在 /usr/local/kafka/kafka_2.12-3.6.1 目录下创建一个用于存放日志的目录mylogs,在server.properties配置文件中会使用到这个目录。

bash 复制代码
mkdir -p /usr/local/kafka/kafka_2.12-3.6.1/mylogs

修改 kafka 的配置文件: server.properties

bash 复制代码
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1

port=9092
host.name=192.168.1.128

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://192.168.1.128:9092

# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
#log.dirs=/tmp/kafka-logs
#log.dirs=D:/MySoftware/Install/tools/kafka/logs
log.dirs=/usr/local/kafka/kafka_2.12-3.6.1/mylogs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
# zookeeper.connect=localhost:2181
zookeeper.connect=192.168.1.128:2181,192.168.1.129:2181,192.168.1.130:2181/kafka

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=18000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0


#Delete topic
delete.topic.enable=true

在server.properties文件中主要配置的就是broker.id port host.name listeners log.dirs zookeeper.connect 这六个属性,其他的都是默认值。

在zookeeper.connect的最后加上/kafka是因为kafka需要依赖zookeeper,在kafka启动之后默认会在zookeeper服务所在节点的根目录下创建很多与kafka有关的目录,这样就会导致zookeeper服务所在节点的根目录下的文件很多很乱。另外,如果多个kafka共用一个zookeeper,就会导致zookeeper服务的根目录下各个kafka文件更加混乱。所以在zookeeper.connect的最后加上/kafka是为了在kafka启动时将创建的文件都放到zookeeper节点根目录下的/kafka子目录下。多个kafka共用一个zookeeper时可以分别配置自己的子目录以示区分。

启动zookeeper和kafka之后,会自动在zookeeper节点上创建/kafka目录。

3.2、配置和启动zookeeper

Zookeeper的集群部署具体步骤见 zookeeper下载安装部署 中的集群部署zookeeper部分。

3.3、启动kafka

复制代码
三个机器都要启动kafka。进入到kafka的bin目录下,启动kafka。启动之后用jps命令检查是否启动。

启动命令./kafka-server-start.sh -daemon ../config/server.properties

bash 复制代码
# 进入到 bin 目录
./kafka-server-start.sh -daemon ../config/server.properties

# 或者进入到kafka安装目录,执行如下命令
bin/kafka-server-start.sh -daemon config/server.properties

3.4、创建topic

创建一个名字为testKafka的topic,有两个副本,两个分区。

--zookeeper参数表示要指定zookeeper服务的安装节点,多个节点可以用逗号分隔。并且最后加上在server.properties配置文件中zookeeper.connect属性设置的kafka启动时存储信息的路径,上面配置文件中zookeeper.connect属性配置的路径是/kafka。

命令:./kafka-topics.sh --zookeeper 192.168.1.128:2181/kafka --create --replication-factor 2 --partitions 2 --topic testKafka

bash 复制代码
./kafka-topics.sh --zookeeper 192.168.1.128:2181/kafka --create --replication-factor 2 --partitions 2 --topic testKafka

3.5、启动producer和consumer

启动producer并发送消息,发送消息之后用Ctrl+C结束。

命令:./kafka-console-producer.sh --broker-list 192.168.1.128:9092 --topic testKafka

bash 复制代码
./kafka-console-producer.sh --broker-list 192.168.1.128:9092 --topic testKafka

启动consumer并接受消息。按Ctrl+C结束。

命令:./kafka-console-consumer.sh --bootstrap-server 192.168.1.128:9092 --topic testKafka --from-beginning

bash 复制代码
./kafka-console-consumer.sh --bootstrap-server 192.168.1.128:9092 --topic testKafka --from-beginning
相关推荐
不吃饭的猪15 分钟前
记一次运行spark报错
大数据·分布式·spark
qq_4639448620 分钟前
【Spark征服之路-2.1-安装部署Spark(一)】
大数据·分布式·spark
昭阳~1 小时前
Kafka深度技术解析:架构、原理与最佳实践
分布式·架构·kafka
ikun·2 小时前
Kafka 消息队列
分布式·kafka
ghie90902 小时前
Spring Boot使用Redis实现分布式锁
spring boot·redis·分布式
风麒麟2 小时前
13. springCloud AlibabaSeata处理分布式事务
分布式·spring·spring cloud
纪元A梦3 小时前
分布式流处理与消息传递——向量时钟 (Vector Clocks) 算法详解
java·分布式·算法
Wo3Shi4七4 小时前
怎么在Kafka上支持延迟消息?
后端·kafka·消息队列
后端码匠5 小时前
Kafka 单机部署启动教程(适用于 Spark + Hadoop 环境)
hadoop·spark·kafka