clickhouse集群+Zk优化-解决只读模式,主节点磁盘增长快

问题1:数据库进入只读模式

最近在项目中使用clickhouse的时候,遇到了一个批量插入后报错的问题。报错的内容是数据库进入了只读模式,导致数据写不进去。发现有大量的批量写入报错日志信息。(关键异常信息:DB::Exception: Table is in readonly mode)

部署环境:使用的clickhouse的版本是20.8,比较早的版本了。微服务协调组件使用的是zookeeper。三个节点的集群。

主要原因:

  1. 数据量很大时,ClickHouse中的表可能会进入只读模式,这主要是因为元数据(metadata)的增长超过了协调服务(如ZooKeeper)能够有效处理的范围。

  2. ZooKeeper需要在多个节点之间保持元数据的一致性。随着元数据的增加,同步这些数据的时间也会增加,从而可能导致同步延迟。

  3. ZooKeeper非常强大且可靠,但它也有其自身的性能限制。当元数据的更新过于频繁时,ZooKeeper可能无法及时处理所有的更新请求。

总结:数据量过大导致Zookeeper负载过高,主从同步的时间过长,导致主节点没有收到从节点的完成响应。主节点在等待从节点完成的期间,继而引发ClickHouse进入只读模式。

数据库报错信息:

bash 复制代码
2024.08.12 09:40:15.396881 [ 13043 ] {2f16eacd-1c5f-4560-bc5f-c121280643d7} <Error> DynamicQueryHandler: Code: 242, e.displayText() = DB::Exception: Table is in readonly mode (zookeeper path: /clickhouse/aiot_tb/1/ts_kv_cluster), Stack trace (when copying this message, always include the lines below):

0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x13cd24bc in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0xa4346c9 in /usr/bin/clickhouse
2. ? @ 0x9efb0cf in /usr/bin/clickhouse
3. DB::StorageReplicatedMergeTree::write(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::Context const&) @ 0x10fa0404 in /usr/bin/clickhouse
4. DB::PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(std::__1::shared_ptr<DB::IStorage> const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::Context const&, std::__1::shared_ptr<DB::IAST> const&, bool) @ 0x114eea16 in /usr/bin/clickhouse
5. DB::InterpreterInsertQuery::execute() @ 0x10ceed1e in /usr/bin/clickhouse
6. ? @ 0x10e0e178 in /usr/bin/clickhouse
7. DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) @ 0x10e0fe08 in /usr/bin/clickhouse
8. DB::HTTPHandler::processQuery(DB::Context&, Poco::Net::HTTPServerRequest&, HTMLForm&, Poco::Net::HTTPServerResponse&, DB::HTTPHandler::Output&) @ 0x117e3a2e in /usr/bin/clickhouse
9. DB::HTTPHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&) @ 0x117e7260 in /usr/bin/clickhouse
10. Poco::Net::HTTPServerConnection::run() @ 0x13c1466e in /usr/bin/clickhouse
11. Poco::Net::TCPServerConnection::start() @ 0x13c11197 in /usr/bin/clickhouse
12. Poco::Net::TCPServerDispatcher::run() @ 0x13c1158b in /usr/bin/clickhouse
13. Poco::PooledThread::run() @ 0x13d481ae in /usr/bin/clickhouse
14. Poco::ThreadImpl::runnableEntry(void*) @ 0x13d44ea9 in /usr/bin/clickhouse
15. start_thread @ 0x7ea5 in /usr/lib64/libpthread-2.17.so
16. __clone @ 0xfdf3d in /usr/lib64/libc-2.17.so
 (version 20.8.3.18)
 
2024.08.12 09:40:15.397685 [ 12358 ] {17419ab6-1261-439d-b3df-6e80cfcab4b3} <Error> executeQuery: Code: 242, e.displayText() = DB::Exception: Table is in readonly mode (zookeeper path: /clickhouse/aiot_tb/1/ts_kv_cluster) (version 20.8.3.18) (from 192.168.249.17:55888) (in query: insert into ts_kv_cluster (entity_id,key,ts,bool_v,str_v,long_v,dbl_v,json_v) FORMAT RowBinary ), Stack trace (when copying this message, always include the lines below):

0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x13cd24bc in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0xa4346c9 in /usr/bin/clickhouse
2. ? @ 0x9efb0cf in /usr/bin/clickhouse
3. DB::StorageReplicatedMergeTree::write(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::Context const&) @ 0x10fa0404 in /usr/bin/clickhouse
4. DB::PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(std::__1::shared_ptr<DB::IStorage> const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::Context const&, std::__1::shared_ptr<DB::IAST> const&, bool) @ 0x114eea16 in /usr/bin/clickhouse
5. DB::InterpreterInsertQuery::execute() @ 0x10ceed1e in /usr/bin/clickhouse
6. ? @ 0x10e0e178 in /usr/bin/clickhouse
7. DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) @ 0x10e0fe08 in /usr/bin/clickhouse
8. DB::HTTPHandler::processQuery(DB::Context&, Poco::Net::HTTPServerRequest&, HTMLForm&, Poco::Net::HTTPServerResponse&, DB::HTTPHandler::Output&) @ 0x117e3a2e in /usr/bin/clickhouse
9. DB::HTTPHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&) @ 0x117e7260 in /usr/bin/clickhouse
10. Poco::Net::HTTPServerConnection::run() @ 0x13c1466e in /usr/bin/clickhouse
11. Poco::Net::TCPServerConnection::start() @ 0x13c11197 in /usr/bin/clickhouse
12. Poco::Net::TCPServerDispatcher::run() @ 0x13c1158b in /usr/bin/clickhouse
13. Poco::PooledThread::run() @ 0x13d481ae in /usr/bin/clickhouse
14. Poco::ThreadImpl::runnableEntry(void*) @ 0x13d44ea9 in /usr/bin/clickhouse
15. start_thread @ 0x7ea5 in /usr/lib64/libpthread-2.17.so
16. __clone @ 0xfdf3d in /usr/lib64/libc-2.17.so

zookeeper报错信息

关键异常总结:

Unexpected exception in LearnerHandler: java.net.SocketException: Socket closed

Ignoring unexpected exception java.lang.InterruptedException

Unexpected exception in LearnerHandler: java.net.SocketException: Connection reset

Exception while sending packets in LearnerHandler java.net.SocketException: Broken pipe (Write failed)

will be dropped if server is in r-o mode

bash 复制代码
2024-08-12 12:15:01,957 [myid:1] - WARN  [SyncThread:1:FileTxnLog@393] - fsync-ing the write ahead log in SyncThread:1 took 14266ms which will adversely effect operation latency.File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2024-08-12 12:15:09,968 [myid:1] - WARN  [SyncThread:1:FileTxnLog@393] - fsync-ing the write ahead log in SyncThread:1 took 8010ms which will adversely effect operation latency.File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2024-08-12 12:15:09,974 [myid:1] - INFO  [CommitProcessor:1:LeaderSessionTracker@104] - Committing global session 0x20404e1f22a0002
2024-08-12 12:15:09,974 [myid:1] - INFO  [CommitProcessor:1:LeaderSessionTracker@104] - Committing global session 0x20404e1f22a0003
2024-08-12 12:15:09,974 [myid:1] - INFO  [CommitProcessor:1:LeaderSessionTracker@104] - Committing global session 0x30404dd5e2c0004
2024-08-12 12:15:11,475 [myid:1] - WARN  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):LearnerHandler@1077] - Closing connection to peer due to transaction timeout.
2024-08-12 12:15:11,475 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):LearnerHandler@1158] - Synchronously closing socket to learner 3.
2024-08-12 12:15:11,476 [myid:1] - ERROR [LearnerHandler-/192.168.249.20:52462:LearnerHandler@718] - Unexpected exception in LearnerHandler: 
java.net.SocketException: Socket closed
        at java.net.SocketInputStream.read(SocketInputStream.java:204)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:655)
2024-08-12 12:15:11,476 [myid:1] - WARN  [LearnerHandler-/192.168.249.20:52462:LearnerHandler@734] - ******* GOODBYE /192.168.249.20:52462 ********
2024-08-12 12:15:11,476 [myid:1] - WARN  [LearnerHandler-/192.168.249.20:52462:LearnerHandler@1048] - Ignoring unexpected exception
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
        at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
        at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
        at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:1046)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:736)
2024-08-12 12:15:11,482 [myid:1] - WARN  [NIOWorkerThread-15:ZooKeeperServer@1417] - Connection request from old client /192.168.249.19:47342; will be dropped if server is in r-o mode
2024-08-12 12:15:14,149 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@390] - Notification: my state:LEADING; n.sid:3, n.state:LOOKING, n.leader:3, n.round:0x34e, n.peerEpoch:0x34c, n.zxid:0x34c0002996f, message format version:0x2, n.config version:0x0
2024-08-12 12:15:14,154 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52948:LearnerHandler@509] - Follower sid: 3 : info : bigdata03:2888:3888:participant
2024-08-12 12:15:14,155 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52948:ZKDatabase@346] - On disk txn sync enabled with snapshotSizeFactor 0.33
2024-08-12 12:15:14,155 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52948:LearnerHandler@804] - Synchronizing with Learner sid: 3 maxCommittedLog=0x34c0002996f minCommittedLog=0x34c0002977b lastProcessedZxid=0x34c0002996f peerLastZxid=0x34c0002996f
2024-08-12 12:15:14,155 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52948:LearnerHandler@849] - Sending DIFF zxid=0x34c0002996f for peer sid: 3
2024-08-12 12:15:15,968 [myid:1] - WARN  [SyncThread:1:FileTxnLog@393] - fsync-ing the write ahead log in SyncThread:1 took 5987ms which will adversely effect operation latency.File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2024-08-12 12:15:17,187 [myid:1] - WARN  [SyncThread:1:FileTxnLog@393] - fsync-ing the write ahead log in SyncThread:1 took 1218ms which will adversely effect operation latency.File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2024-08-12 12:15:17,188 [myid:1] - INFO  [CommitProcessor:1:LeaderSessionTracker@104] - Committing global session 0x10327dc03f20009
2024-08-12 12:15:18,500 [myid:1] - WARN  [SyncThread:1:FileTxnLog@393] - fsync-ing the write ahead log in SyncThread:1 took 1310ms which will adversely effect operation latency.File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2024-08-12 12:15:19,987 [myid:1] - WARN  [SyncThread:1:FileTxnLog@393] - fsync-ing the write ahead log in SyncThread:1 took 1486ms which will adversely effect operation latency.File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2024-08-12 12:15:22,481 [myid:1] - ERROR [LearnerHandler-/192.168.249.20:52948:LearnerHandler@718] - Unexpected exception in LearnerHandler: 
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:210)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:655)
2024-08-12 12:15:22,481 [myid:1] - ERROR [Sender-/192.168.249.20:52948:LearnerHandler@371] - Exception while sending packets in LearnerHandler
java.net.SocketException: Broken pipe (Write failed)
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at org.apache.zookeeper.server.quorum.LearnerHandler.sendPackets(LearnerHandler.java:335)
        at org.apache.zookeeper.server.quorum.LearnerHandler.access$200(LearnerHandler.java:65)
        at org.apache.zookeeper.server.quorum.LearnerHandler$1.run(LearnerHandler.java:750)
2024-08-12 12:15:22,481 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52948:LearnerHandler@1158] - Synchronously closing socket to learner 3.
2024-08-12 12:15:22,481 [myid:1] - WARN  [LearnerHandler-/192.168.249.20:52948:LearnerHandler@734] - ******* GOODBYE /192.168.249.20:52948 ********
2024-08-12 12:15:22,482 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@390] - Notification: my state:LEADING; n.sid:3, n.state:LOOKING, n.leader:3, n.round:0x34e, n.peerEpoch:0x34c, n.zxid:0x34c0002996f, message format version:0x2, n.config version:0x0
2024-08-12 12:15:22,976 [myid:1] - WARN  [SyncThread:1:FileTxnLog@393] - fsync-ing the write ahead log in SyncThread:1 took 1494ms which will adversely effect operation latency.File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2024-08-12 12:15:23,835 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52950:LearnerHandler@509] - Follower sid: 3 : info : bigdata03:2888:3888:participant
2024-08-12 12:15:23,836 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52950:ZKDatabase@346] - On disk txn sync enabled with snapshotSizeFactor 0.33
2024-08-12 12:15:23,836 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52950:LearnerHandler@804] - Synchronizing with Learner sid: 3 maxCommittedLog=0x34c0002997f minCommittedLog=0x34c0002978b lastProcessedZxid=0x34c0002997f peerLastZxid=0x34c0002996f
2024-08-12 12:15:23,836 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52950:LearnerHandler@868] - Using committedLog for peer sid: 3

解决方式1

优化zookeeper的配置

这个方式目前来看并不能完全解决,但是可以有效降低zookeeper的问题时间。

修改zookeeper安装目录下面的配置zoo.cfg

核心的点:

bash 复制代码
autopurge.snapRetainCount=3
syncLimit=10
forceSync=no

配置含义解析:

autopurge.snapRetainCount

保留autopurge.snapRetainCount个最近的数据快照(dataDir)和对应的事务日志文件(dataLogDir),其余的将会删除掉。默认值是3,最小值也是3。

syncLimit

集群中的 Leader 节点和 Follower 节点之间请求和应答时能容忍的最多心跳数

forceSync

最关键的一点,该参数用于配置ZooKeeper服务器是否在事务提交的时候,将日志写入操作强制刷入磁盘,默认情况下是"yes",即每次事务日志写入操作都会实时刷入磁盘。forceSync=no,这个对写请求的性能提升很有帮助,是指每次写请求的数据都要从pagecache中固化到磁盘上,才算是写成功返回。当写请求数量到达一定程度的时候,后续写请求会等待前面写请求的forceSync操作,造成一定延时。如果追求低延时的写请求,配置forceSync=no,数据写到pagecache后就返回。但是机器断电的时候,pagecache中的数据有可能丢失。

我用这个方法优化后,报错减少了非常多,偶发性的还有少量报错。也是因为zookeeper同步的时候会有。但不是只读模式的报错,而是请求数据库超时导致的。

参考:ZooKeeper调优 (qq.com)

解决方式2

更换服务器协调组件,Zookeeper组件切换ClickHouseKeeper组件。

这个我还没有验证。这个更换需要ck数据库的版本也比较高,太低的换不了。所以数据库版本低的也需要同步升级。

问题2:主节点磁盘占用增长快

我在主从节点的服务器发现,主节点的数据量是从节点的好几倍不止。感觉有问题,以为像mysql那种bin日志等。后期查看发现原来是部分系统日志没有自动删除导致的。

主节点的store目录通常用于存储集群元数据信息,比如分片和副本的配置信息,集群的元数据和日志文件等。

主节点的日志大小-131G

从节点的日志大小-38G

验证方法

先查看各个日志表的大小,确认是不是该问题导致

sql 复制代码
SELECT 
    sum(rows) AS `总行数`,
    formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
    formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
    round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`,
    `table` AS `表名`
FROM system.parts where database = 'system' group by `table`

查看发现这几个日志表确实很大:

解决方法

设置日志表的 TTL 来自动清理数据。例如保存 30 天的日志可以使用以下 SQL:

sql 复制代码
ALTERTABLE query_log MODIFY TTL event_date + toIntervalDay(30);

参考:Clickhouse:日志表占用大量磁盘空间怎么办?_Clickhouse_Ken_InfoQ写作社区

项目架构

数据库部署架构

clickhouse集群:三个节点(clickhouse的版本是20.8)

zookeeper集群:三个节点(每个节点与ck用的一台服务器)

物联网项目整体架构

ClickHouse集群-Zk优化

相关推荐
爱上口袋的天空7 小时前
09 - Clickhouse的SQL操作
数据库·sql·clickhouse
谭震鸿9 小时前
Zookeeper集群搭建Centos环境下
分布式·zookeeper·centos
求积分不加C2 天前
zookeeper is not a recognized option--解决方案
分布式·zookeeper·云原生
爱上口袋的天空3 天前
06 - Clickhouse的表引擎
数据库·clickhouse
Karoku0663 天前
【企业级分布式系统】ZooKeeper集群
linux·运维·数据库·分布式·zookeeper·云原生
fengqing55783 天前
zookeeper安装教程
分布式·zookeeper·云原生
吹老师个人app编程教学4 天前
ClickHouse的介绍、安装、数据类型
数据库·clickhouse·oracle
有被蠢哭到4 天前
Python连接Mysql、Postgre、ClickHouse、Redis常用库及封装方法
redis·python·mysql·clickhouse·postgresql
genghongsheng4 天前
执行flink sql连接clickhouse库
数据库·clickhouse·flink
zhcf4 天前
【Zookeeper】一、Zookeeper的使命
分布式·zookeeper