clickhouse集群+Zk优化-解决只读模式,主节点磁盘增长快

问题1:数据库进入只读模式

最近在项目中使用clickhouse的时候,遇到了一个批量插入后报错的问题。报错的内容是数据库进入了只读模式,导致数据写不进去。发现有大量的批量写入报错日志信息。(关键异常信息:DB::Exception: Table is in readonly mode)

部署环境:使用的clickhouse的版本是20.8,比较早的版本了。微服务协调组件使用的是zookeeper。三个节点的集群。

主要原因:

  1. 数据量很大时,ClickHouse中的表可能会进入只读模式,这主要是因为元数据(metadata)的增长超过了协调服务(如ZooKeeper)能够有效处理的范围。

  2. ZooKeeper需要在多个节点之间保持元数据的一致性。随着元数据的增加,同步这些数据的时间也会增加,从而可能导致同步延迟。

  3. ZooKeeper非常强大且可靠,但它也有其自身的性能限制。当元数据的更新过于频繁时,ZooKeeper可能无法及时处理所有的更新请求。

总结:数据量过大导致Zookeeper负载过高,主从同步的时间过长,导致主节点没有收到从节点的完成响应。主节点在等待从节点完成的期间,继而引发ClickHouse进入只读模式。

数据库报错信息:

bash 复制代码
2024.08.12 09:40:15.396881 [ 13043 ] {2f16eacd-1c5f-4560-bc5f-c121280643d7} <Error> DynamicQueryHandler: Code: 242, e.displayText() = DB::Exception: Table is in readonly mode (zookeeper path: /clickhouse/aiot_tb/1/ts_kv_cluster), Stack trace (when copying this message, always include the lines below):

0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x13cd24bc in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0xa4346c9 in /usr/bin/clickhouse
2. ? @ 0x9efb0cf in /usr/bin/clickhouse
3. DB::StorageReplicatedMergeTree::write(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::Context const&) @ 0x10fa0404 in /usr/bin/clickhouse
4. DB::PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(std::__1::shared_ptr<DB::IStorage> const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::Context const&, std::__1::shared_ptr<DB::IAST> const&, bool) @ 0x114eea16 in /usr/bin/clickhouse
5. DB::InterpreterInsertQuery::execute() @ 0x10ceed1e in /usr/bin/clickhouse
6. ? @ 0x10e0e178 in /usr/bin/clickhouse
7. DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) @ 0x10e0fe08 in /usr/bin/clickhouse
8. DB::HTTPHandler::processQuery(DB::Context&, Poco::Net::HTTPServerRequest&, HTMLForm&, Poco::Net::HTTPServerResponse&, DB::HTTPHandler::Output&) @ 0x117e3a2e in /usr/bin/clickhouse
9. DB::HTTPHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&) @ 0x117e7260 in /usr/bin/clickhouse
10. Poco::Net::HTTPServerConnection::run() @ 0x13c1466e in /usr/bin/clickhouse
11. Poco::Net::TCPServerConnection::start() @ 0x13c11197 in /usr/bin/clickhouse
12. Poco::Net::TCPServerDispatcher::run() @ 0x13c1158b in /usr/bin/clickhouse
13. Poco::PooledThread::run() @ 0x13d481ae in /usr/bin/clickhouse
14. Poco::ThreadImpl::runnableEntry(void*) @ 0x13d44ea9 in /usr/bin/clickhouse
15. start_thread @ 0x7ea5 in /usr/lib64/libpthread-2.17.so
16. __clone @ 0xfdf3d in /usr/lib64/libc-2.17.so
 (version 20.8.3.18)
 
2024.08.12 09:40:15.397685 [ 12358 ] {17419ab6-1261-439d-b3df-6e80cfcab4b3} <Error> executeQuery: Code: 242, e.displayText() = DB::Exception: Table is in readonly mode (zookeeper path: /clickhouse/aiot_tb/1/ts_kv_cluster) (version 20.8.3.18) (from 192.168.249.17:55888) (in query: insert into ts_kv_cluster (entity_id,key,ts,bool_v,str_v,long_v,dbl_v,json_v) FORMAT RowBinary ), Stack trace (when copying this message, always include the lines below):

0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x13cd24bc in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0xa4346c9 in /usr/bin/clickhouse
2. ? @ 0x9efb0cf in /usr/bin/clickhouse
3. DB::StorageReplicatedMergeTree::write(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::Context const&) @ 0x10fa0404 in /usr/bin/clickhouse
4. DB::PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(std::__1::shared_ptr<DB::IStorage> const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::Context const&, std::__1::shared_ptr<DB::IAST> const&, bool) @ 0x114eea16 in /usr/bin/clickhouse
5. DB::InterpreterInsertQuery::execute() @ 0x10ceed1e in /usr/bin/clickhouse
6. ? @ 0x10e0e178 in /usr/bin/clickhouse
7. DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) @ 0x10e0fe08 in /usr/bin/clickhouse
8. DB::HTTPHandler::processQuery(DB::Context&, Poco::Net::HTTPServerRequest&, HTMLForm&, Poco::Net::HTTPServerResponse&, DB::HTTPHandler::Output&) @ 0x117e3a2e in /usr/bin/clickhouse
9. DB::HTTPHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&) @ 0x117e7260 in /usr/bin/clickhouse
10. Poco::Net::HTTPServerConnection::run() @ 0x13c1466e in /usr/bin/clickhouse
11. Poco::Net::TCPServerConnection::start() @ 0x13c11197 in /usr/bin/clickhouse
12. Poco::Net::TCPServerDispatcher::run() @ 0x13c1158b in /usr/bin/clickhouse
13. Poco::PooledThread::run() @ 0x13d481ae in /usr/bin/clickhouse
14. Poco::ThreadImpl::runnableEntry(void*) @ 0x13d44ea9 in /usr/bin/clickhouse
15. start_thread @ 0x7ea5 in /usr/lib64/libpthread-2.17.so
16. __clone @ 0xfdf3d in /usr/lib64/libc-2.17.so

zookeeper报错信息

关键异常总结:

Unexpected exception in LearnerHandler: java.net.SocketException: Socket closed

Ignoring unexpected exception java.lang.InterruptedException

Unexpected exception in LearnerHandler: java.net.SocketException: Connection reset

Exception while sending packets in LearnerHandler java.net.SocketException: Broken pipe (Write failed)

will be dropped if server is in r-o mode

bash 复制代码
2024-08-12 12:15:01,957 [myid:1] - WARN  [SyncThread:1:FileTxnLog@393] - fsync-ing the write ahead log in SyncThread:1 took 14266ms which will adversely effect operation latency.File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2024-08-12 12:15:09,968 [myid:1] - WARN  [SyncThread:1:FileTxnLog@393] - fsync-ing the write ahead log in SyncThread:1 took 8010ms which will adversely effect operation latency.File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2024-08-12 12:15:09,974 [myid:1] - INFO  [CommitProcessor:1:LeaderSessionTracker@104] - Committing global session 0x20404e1f22a0002
2024-08-12 12:15:09,974 [myid:1] - INFO  [CommitProcessor:1:LeaderSessionTracker@104] - Committing global session 0x20404e1f22a0003
2024-08-12 12:15:09,974 [myid:1] - INFO  [CommitProcessor:1:LeaderSessionTracker@104] - Committing global session 0x30404dd5e2c0004
2024-08-12 12:15:11,475 [myid:1] - WARN  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):LearnerHandler@1077] - Closing connection to peer due to transaction timeout.
2024-08-12 12:15:11,475 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):LearnerHandler@1158] - Synchronously closing socket to learner 3.
2024-08-12 12:15:11,476 [myid:1] - ERROR [LearnerHandler-/192.168.249.20:52462:LearnerHandler@718] - Unexpected exception in LearnerHandler: 
java.net.SocketException: Socket closed
        at java.net.SocketInputStream.read(SocketInputStream.java:204)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:655)
2024-08-12 12:15:11,476 [myid:1] - WARN  [LearnerHandler-/192.168.249.20:52462:LearnerHandler@734] - ******* GOODBYE /192.168.249.20:52462 ********
2024-08-12 12:15:11,476 [myid:1] - WARN  [LearnerHandler-/192.168.249.20:52462:LearnerHandler@1048] - Ignoring unexpected exception
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
        at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
        at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
        at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:1046)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:736)
2024-08-12 12:15:11,482 [myid:1] - WARN  [NIOWorkerThread-15:ZooKeeperServer@1417] - Connection request from old client /192.168.249.19:47342; will be dropped if server is in r-o mode
2024-08-12 12:15:14,149 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@390] - Notification: my state:LEADING; n.sid:3, n.state:LOOKING, n.leader:3, n.round:0x34e, n.peerEpoch:0x34c, n.zxid:0x34c0002996f, message format version:0x2, n.config version:0x0
2024-08-12 12:15:14,154 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52948:LearnerHandler@509] - Follower sid: 3 : info : bigdata03:2888:3888:participant
2024-08-12 12:15:14,155 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52948:ZKDatabase@346] - On disk txn sync enabled with snapshotSizeFactor 0.33
2024-08-12 12:15:14,155 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52948:LearnerHandler@804] - Synchronizing with Learner sid: 3 maxCommittedLog=0x34c0002996f minCommittedLog=0x34c0002977b lastProcessedZxid=0x34c0002996f peerLastZxid=0x34c0002996f
2024-08-12 12:15:14,155 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52948:LearnerHandler@849] - Sending DIFF zxid=0x34c0002996f for peer sid: 3
2024-08-12 12:15:15,968 [myid:1] - WARN  [SyncThread:1:FileTxnLog@393] - fsync-ing the write ahead log in SyncThread:1 took 5987ms which will adversely effect operation latency.File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2024-08-12 12:15:17,187 [myid:1] - WARN  [SyncThread:1:FileTxnLog@393] - fsync-ing the write ahead log in SyncThread:1 took 1218ms which will adversely effect operation latency.File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2024-08-12 12:15:17,188 [myid:1] - INFO  [CommitProcessor:1:LeaderSessionTracker@104] - Committing global session 0x10327dc03f20009
2024-08-12 12:15:18,500 [myid:1] - WARN  [SyncThread:1:FileTxnLog@393] - fsync-ing the write ahead log in SyncThread:1 took 1310ms which will adversely effect operation latency.File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2024-08-12 12:15:19,987 [myid:1] - WARN  [SyncThread:1:FileTxnLog@393] - fsync-ing the write ahead log in SyncThread:1 took 1486ms which will adversely effect operation latency.File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2024-08-12 12:15:22,481 [myid:1] - ERROR [LearnerHandler-/192.168.249.20:52948:LearnerHandler@718] - Unexpected exception in LearnerHandler: 
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:210)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:655)
2024-08-12 12:15:22,481 [myid:1] - ERROR [Sender-/192.168.249.20:52948:LearnerHandler@371] - Exception while sending packets in LearnerHandler
java.net.SocketException: Broken pipe (Write failed)
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at org.apache.zookeeper.server.quorum.LearnerHandler.sendPackets(LearnerHandler.java:335)
        at org.apache.zookeeper.server.quorum.LearnerHandler.access$200(LearnerHandler.java:65)
        at org.apache.zookeeper.server.quorum.LearnerHandler$1.run(LearnerHandler.java:750)
2024-08-12 12:15:22,481 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52948:LearnerHandler@1158] - Synchronously closing socket to learner 3.
2024-08-12 12:15:22,481 [myid:1] - WARN  [LearnerHandler-/192.168.249.20:52948:LearnerHandler@734] - ******* GOODBYE /192.168.249.20:52948 ********
2024-08-12 12:15:22,482 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@390] - Notification: my state:LEADING; n.sid:3, n.state:LOOKING, n.leader:3, n.round:0x34e, n.peerEpoch:0x34c, n.zxid:0x34c0002996f, message format version:0x2, n.config version:0x0
2024-08-12 12:15:22,976 [myid:1] - WARN  [SyncThread:1:FileTxnLog@393] - fsync-ing the write ahead log in SyncThread:1 took 1494ms which will adversely effect operation latency.File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2024-08-12 12:15:23,835 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52950:LearnerHandler@509] - Follower sid: 3 : info : bigdata03:2888:3888:participant
2024-08-12 12:15:23,836 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52950:ZKDatabase@346] - On disk txn sync enabled with snapshotSizeFactor 0.33
2024-08-12 12:15:23,836 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52950:LearnerHandler@804] - Synchronizing with Learner sid: 3 maxCommittedLog=0x34c0002997f minCommittedLog=0x34c0002978b lastProcessedZxid=0x34c0002997f peerLastZxid=0x34c0002996f
2024-08-12 12:15:23,836 [myid:1] - INFO  [LearnerHandler-/192.168.249.20:52950:LearnerHandler@868] - Using committedLog for peer sid: 3

解决方式1

优化zookeeper的配置

这个方式目前来看并不能完全解决,但是可以有效降低zookeeper的问题时间。

修改zookeeper安装目录下面的配置zoo.cfg

核心的点:

bash 复制代码
autopurge.snapRetainCount=3
syncLimit=10
forceSync=no

配置含义解析:

autopurge.snapRetainCount

保留autopurge.snapRetainCount个最近的数据快照(dataDir)和对应的事务日志文件(dataLogDir),其余的将会删除掉。默认值是3,最小值也是3。

syncLimit

集群中的 Leader 节点和 Follower 节点之间请求和应答时能容忍的最多心跳数

forceSync

最关键的一点,该参数用于配置ZooKeeper服务器是否在事务提交的时候,将日志写入操作强制刷入磁盘,默认情况下是"yes",即每次事务日志写入操作都会实时刷入磁盘。forceSync=no,这个对写请求的性能提升很有帮助,是指每次写请求的数据都要从pagecache中固化到磁盘上,才算是写成功返回。当写请求数量到达一定程度的时候,后续写请求会等待前面写请求的forceSync操作,造成一定延时。如果追求低延时的写请求,配置forceSync=no,数据写到pagecache后就返回。但是机器断电的时候,pagecache中的数据有可能丢失。

我用这个方法优化后,报错减少了非常多,偶发性的还有少量报错。也是因为zookeeper同步的时候会有。但不是只读模式的报错,而是请求数据库超时导致的。

参考:ZooKeeper调优 (qq.com)

解决方式2

更换服务器协调组件,Zookeeper组件切换ClickHouseKeeper组件。

这个我还没有验证。这个更换需要ck数据库的版本也比较高,太低的换不了。所以数据库版本低的也需要同步升级。

问题2:主节点磁盘占用增长快

我在主从节点的服务器发现,主节点的数据量是从节点的好几倍不止。感觉有问题,以为像mysql那种bin日志等。后期查看发现原来是部分系统日志没有自动删除导致的。

主节点的store目录通常用于存储集群元数据信息,比如分片和副本的配置信息,集群的元数据和日志文件等。

主节点的日志大小-131G

从节点的日志大小-38G

验证方法

先查看各个日志表的大小,确认是不是该问题导致

sql 复制代码
SELECT 
    sum(rows) AS `总行数`,
    formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
    formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
    round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`,
    `table` AS `表名`
FROM system.parts where database = 'system' group by `table`

查看发现这几个日志表确实很大:

解决方法

设置日志表的 TTL 来自动清理数据。例如保存 30 天的日志可以使用以下 SQL:

sql 复制代码
ALTERTABLE query_log MODIFY TTL event_date + toIntervalDay(30);

参考:Clickhouse:日志表占用大量磁盘空间怎么办?_Clickhouse_Ken_InfoQ写作社区

项目架构

数据库部署架构

clickhouse集群:三个节点(clickhouse的版本是20.8)

zookeeper集群:三个节点(每个节点与ck用的一台服务器)

物联网项目整体架构

ClickHouse集群-Zk优化

相关推荐
Sais_Z9 小时前
ClickHouse的学习与了解
数据库·clickhouse
风中凌乱3 天前
ClickHouse-Backup的安装与部署
clickhouse
风中凌乱3 天前
clickhouse集群的安装与部署
clickhouse
白眼黑刺猬3 天前
ClickHouse从入门到企业级实战全解析课程简介
clickhouse
Rookie小强5 天前
ZooKeeper和Reids做分布式锁的区别?
分布式·zookeeper·云原生
蓝眸少年CY5 天前
(第三篇)spring cloud之Zookeeper注册中心
spring·spring cloud·zookeeper
斯普信专业组5 天前
zookeeper因jute.maxbuffer启动异常问题排查处理
分布式·zookeeper·云原生
chenglin0167 天前
ClickHouse、Doris、OpenSearch、Splunk、Solr系统化分析
clickhouse·solr·lucene
慕y2747 天前
Java学习第一百一十七部分——ClickHouse
java·学习·clickhouse