1. Problem description
SQL 错误 [40] [07000]: Code: 40. DB::Exception: Checksum doesn't match: corrupted data. Reference: 50e8c1efa78de2881b725d44b04be1fe. Actual: 161c99eb681ec36b83540ecdd65ad8c9. Size of compressed block: 32846. The mismatch is caused by single bit flip in data block at byte 10059, bit 6. This is most likely due to hardware failure. If you receive broken data over network and the error does not repeat every time, this can be caused by bad RAM on network interface controller or bad controller itself or bad RAM on network switches or bad CPU on network switches (look at the logs on related network switches; note that TCP checksums don't help) or bad RAM on host (look at dmesg or kern.log for enormous amount of EDAC errors, ECC-related reports, Machine Check Exceptions, mcelog; note that ECC memory can fail if the number of errors is huge) or bad CPU on host. If you read data from disk, this can be caused by disk bit rot. This exception protects ClickHouse from data corruption due to hardware failures.: (while reading column id): (while reading from part /data/clickhouse/store/979/9795066e-6ea5-4550-8361-6c35d8ed9dca/7-20230727_1282473_1283947_9/ from mark 2232 with max_rows_to_read = 33022): While executing MergeTreeThread. (CHECKSUM_DOESNT_MATCH) (version 23.3.1.2823 (official build))
, server ClickHouseNode [uri={socket_timeout=300000,use_server_time_zone=false,use_time_zone=false}]@-1484630932
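The message says one flipped bit (byte 10059, bit 6) is enough to make the block checksum fail. Below is a minimal sketch of how such a flip can be found by brute force: flip each bit in turn, recompute the checksum, and stop when it matches the stored reference. It only illustrates the idea and is not clickhouse-bitflip's actual code. ClickHouse checksums blocks with CityHash128, which is not in the Python standard library, so MD5 (also 16 bytes) stands in for it here. The function names are hypothetical.

```python
import hashlib
from typing import Optional, Tuple


def checksum(block: bytes) -> bytes:
    # Stand-in for ClickHouse's CityHash128; MD5 is used only because it is
    # in the standard library and also produces a 16-byte digest.
    return hashlib.md5(block).digest()


def find_single_bit_flip(block: bytes, expected: bytes) -> Optional[Tuple[int, int]]:
    """Return (byte_index, bit_index) whose flip makes checksum(block) == expected.

    bit_index counts from the least significant bit (0) to the most
    significant bit (7); whether ClickHouse numbers bits the same way is an
    assumption. Returns None if no single-bit flip restores the checksum.
    """
    data = bytearray(block)
    for byte_index in range(len(data)):
        for bit_index in range(8):
            data[byte_index] ^= 1 << bit_index  # flip one bit
            if checksum(bytes(data)) == expected:
                return byte_index, bit_index
            data[byte_index] ^= 1 << bit_index  # undo the flip before trying the next one
    return None


# Simulate a block whose byte 100 had bit 6 flipped after its checksum was stored.
original = bytes(range(256)) * 4
reference = checksum(original)
corrupted = bytearray(original)
corrupted[100] ^= 1 << 6

print(find_single_bit_flip(bytes(corrupted), reference))  # (100, 6)
```

The search costs at most 8 checksum computations per byte, so a block of 32846 bytes like the one in the error needs at most about 263,000 hashes, which is cheap.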
2. Solution
https://github.com/marliotto/clickhouse-bitflip
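Use clickhouse-bitflip to repair the corrupted ClickHouse data: download the source code, build it, and then run it against the damaged file. In the error above, the column being read was id and the part directory is given after "while reading from part", so the file to repair is that column's .bin file inside the part directory: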
/data/clickhouse/store/979/9795066e-6ea5-4550-8361-6c35d8ed9dca/7-20230727_1282473_1283947_9/id.bin
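The path can also be derived mechanically from the message: take the directory after "while reading from part" and append the column name from "while reading column" plus ".bin". A minimal sketch, assuming a Wide-format part (one .bin file per column) and a column name that is stored on disk without escaping; Compact parts keep all columns in a single data.bin, which this sketch does not handle. bin_file_from_error is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath


def bin_file_from_error(message: str) -> PurePosixPath:
    """Return <part directory>/<column>.bin for a CHECKSUM_DOESNT_MATCH message.

    Assumes a Wide-format part with one <column>.bin per column and a column
    name that is stored on disk without escaping.
    """
    column = re.search(r"while reading column (\S+?)\)", message)
    part = re.search(r"while reading from part (\S+?)/? from mark", message)
    if column is None or part is None:
        raise ValueError("column name or part directory not found in message")
    return PurePosixPath(part.group(1)) / f"{column.group(1)}.bin"


# Relevant fragment of the error message above.
error = (
    "(while reading column id): (while reading from part "
    "/data/clickhouse/store/979/9795066e-6ea5-4550-8361-6c35d8ed9dca/"
    "7-20230727_1282473_1283947_9/ from mark 2232 with max_rows_to_read = 33022)"
)
print(bin_file_from_error(error))
```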
Repair command:
./clickhouse-bitflip /data/clickhouse/store/979/9795066e-6ea5-4550-8361-6c35d8ed9dca/7-20230727_1282473_1283947_9/id.bin
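Because the command is pointed directly at a file inside the ClickHouse data directory, it may be worth keeping a copy first. The wrapper below is a hypothetical sketch: it copies the file into a backup directory of your choosing (kept outside the part directory, in a subfolder named after the part) and then runs exactly the command shown above, raising an error if the tool exits with a non-zero status. The function name, the backup layout, and the tool_cmd parameter are assumptions, not part of clickhouse-bitflip.

```python
import shutil
import subprocess
from pathlib import Path


def repair_with_backup(bin_path, backup_dir, tool_cmd=("./clickhouse-bitflip",)):
    """Copy bin_path into backup_dir/<part name>/, then run the repair tool on bin_path.

    tool_cmd defaults to the command from the steps above; the wrapper itself,
    its parameters and the backup layout are hypothetical.
    Returns the path of the backup copy.
    """
    src = Path(bin_path)
    backup = Path(backup_dir) / src.parent.name / src.name
    backup.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, backup)                          # untouched copy, metadata preserved
    subprocess.run([*tool_cmd, str(src)], check=True)  # CalledProcessError on non-zero exit
    return backup


# Example (the backup directory is illustrative):
# repair_with_backup(
#     "/data/clickhouse/store/979/9795066e-6ea5-4550-8361-6c35d8ed9dca/"
#     "7-20230727_1282473_1283947_9/id.bin",
#     "/root/clickhouse-bitflip-backup",
# )
```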