Exploring storage and computing separation for ClickHouse - JuiceFS Blog
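ClickHouse storage-compute separation retrofit: Xiaohongshu's self-developed cloud-native data warehouse in practice
After switching to ClickHouse, Vipshop achieved self-service analysis over tens of billions of rows (dbaplus community, InfoQ)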
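I have been thinking about how to implement storage-compute separation. It seems it could be done the way JuiceFS does it, using multi-disk storage to isolate resources.
Multi-disk configuration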
XML
<path>/var/lib/clickhouse/</path>
<storage_configuration>
<disks>
<disk_name_1>
<path>/mnt/A123456/data/</path>
</disk_name_1>
</disks>
<policies>
<policy_name_1>
<volumes>
<volume_name_0>
<disk>disk_name_1</disk>
</volume_name_0>
</volumes>
</policy_name_1>
</policies>
</storage_configuration>
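To avoid hand-editing the XML for every new mount point, the fragment above can be generated. This is only a minimal sketch, not an official tool: `render_storage_config` is a hypothetical helper, the `<clickhouse>` root element is an assumption (older releases use `<yandex>`), and the trailing-slash check reflects my understanding that ClickHouse rejects a disk path that does not end with `/`.

```shell
# Hypothetical helper: print a storage_configuration fragment
# for one local disk and one single-volume policy.
render_storage_config() {
    local disk_name="$1" disk_path="$2" policy_name="$3" volume_name="$4"

    # Assumption: the server refuses disk paths without a trailing slash.
    case "$disk_path" in
        */) ;;
        *)  echo "disk path must end with '/': $disk_path" >&2
            return 1 ;;
    esac

    cat <<EOF
<clickhouse>
    <storage_configuration>
        <disks>
            <${disk_name}>
                <path>${disk_path}</path>
            </${disk_name}>
        </disks>
        <policies>
            <${policy_name}>
                <volumes>
                    <${volume_name}>
                        <disk>${disk_name}</disk>
                    </${volume_name}>
                </volumes>
            </${policy_name}>
        </policies>
    </storage_configuration>
</clickhouse>
EOF
}

# Same names as in the configuration above.
render_storage_config disk_name_1 /mnt/A123456/data/ policy_name_1 volume_name_0
```

The output could be redirected into a file under the server's `config.d` directory (the file name is up to you), after which the new policy should be visible once the server reloads its configuration or restarts.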
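There are still open questions, though: how to handle this when ZooKeeper-based multi-replica is configured, and how the data is then read.
Check the storage policies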
select policy_name,volume_name,disks from system.storage_policies
bash
┌─policy_name───┬─volume_name───┬─disks───────────┐
│ default       │ default       │ ['default']     │
│ policy_name_1 │ volume_name_0 │ ['disk_name_1'] │
└───────────────┴───────────────┴─────────────────┘
Create a MergeTree table and write data
sql
CREATE TABLE myFirstReplacingMT
(
`key` Int64,
`someCol` String,
`eventTime` DateTime
)
ENGINE = ReplacingMergeTree
ORDER BY key SETTINGS storage_policy = 'policy_name_1';
INSERT INTO myFirstReplacingMT Values (1, 'first', '2020-01-01 01:01:01');
INSERT INTO myFirstReplacingMT Values (1, 'second', '2020-01-01 00:00:00');
Check where the table is stored
sql
SELECT
name,
data_paths,
metadata_path,
storage_policy
FROM system.tables
WHERE name LIKE 'myFir%'
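`system.tables` reports paths per table. To see which disk each individual part landed on, `system.parts` can be queried as well. A small sketch, assuming `clickhouse-client` is on the PATH and can reach the server with default connection settings; `part_disks_query` and `show_part_disks` are hypothetical helper names.

```shell
# Build the query that lists each active part of a table with its disk and path.
part_disks_query() {
    local table="$1"
    printf "SELECT name, disk_name, path FROM system.parts WHERE table = '%s' AND active\n" "$table"
}

# Run the query through clickhouse-client
# (assumed to be installed and able to connect to the server).
show_part_disks() {
    clickhouse-client --query "$(part_disks_query "$1")" --format PrettyCompact
}

# Only print the query here; on a machine with a running server,
# call: show_part_disks myFirstReplacingMT
part_disks_query myFirstReplacingMT
```

For the table above I would expect `disk_name` to be `disk_name_1`, since `policy_name_1` contains only that one disk.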
Import from files
Switch to a simpler table, test_batch
sql
CREATE TABLE test_batch (a Int64,b Int64)
ENGINE = ReplacingMergeTree() ORDER BY a
Generate the data with clickhouse-local
sudo echo -e "1,2\n2,3" | clickhouse-local --input-format "CSV" -S "a Int64,b Int64" -N "tmp_table" -q "CREATE TABLE test_batch (a Int64,b Int64) ENGINE = ReplacingMergeTree() ORDER BY a;INSERT INTO TABLE test_batch SELECT a,b FROM tmp_table;" --logger.console --path /tmp/test/testlocal/
ls testlocal/data/_local/test_batch/all_1_1_0/
bash
checksums.txt columns.txt count.txt data.bin data.cmrk3 default_compression_codec.txt metadata_version.txt primary.cidx serialization.json
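Before moving the part anywhere, it seems worth checking that the directory is complete. The sketch below only tests for three of the files seen in the listing above; which files the server strictly requires is an assumption on my part, and `check_part_dir` is a hypothetical helper. The single `data.bin` with a `data.cmrk3` marks file suggests the part was written in the compact format.

```shell
# Hypothetical helper: verify that a part directory exists and carries
# the bookkeeping files seen in the listing above.
check_part_dir() {
    local part_dir="$1" f

    if [ ! -d "$part_dir" ]; then
        echo "not a directory: $part_dir" >&2
        return 1
    fi

    for f in checksums.txt columns.txt count.txt; do
        if [ ! -f "$part_dir/$f" ]; then
            echo "missing $f in $part_dir" >&2
            return 1
        fi
    done

    echo "ok: $part_dir"
}

# Path follows the --path used with clickhouse-local above;
# the check is skipped when the directory is not present.
part=/tmp/test/testlocal/data/_local/test_batch/all_1_1_0
if [ -d "$part" ]; then
    check_part_dir "$part"
fi
```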
To copy the part to the server, first look up where the table is stored
sql
SELECT
name,
data_paths,
metadata_path,
storage_policy
FROM system.tables
WHERE name LIKE 'test_batch%'
Copy the part into the detached folder under data_paths
bash
sudo cp -r ./testlocal/data/_local/test_batch/all_1_1_0/ /mnt/xxx/data/store/xxx/detached/
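The copy step can be wrapped so that it refuses to overwrite a part that is already staged. This is a sketch under assumptions: `stage_part` is a hypothetical helper, and the ownership remark in the comments assumes the server runs as a `clickhouse` user, which is the usual package default but may differ on your machine.

```shell
# Hypothetical helper: copy one part directory into a table's detached/ folder.
# Prints the staged path on success.
stage_part() {
    local src="${1%/}" detached_dir="${2%/}" part_name

    part_name="$(basename "$src")"

    if [ ! -d "$src" ]; then
        echo "source part not found: $src" >&2
        return 1
    fi
    if [ ! -d "$detached_dir" ]; then
        echo "detached directory not found: $detached_dir" >&2
        return 1
    fi
    if [ -e "$detached_dir/$part_name" ]; then
        echo "already staged: $detached_dir/$part_name" >&2
        return 1
    fi

    cp -r "$src" "$detached_dir/"
    echo "$detached_dir/$part_name"
}

# Usage, with the same (elided) destination as above:
#   stage_part ./testlocal/data/_local/test_batch/all_1_1_0 /mnt/xxx/data/store/xxx/detached
# When the copy is done with sudo, the files end up owned by root; if the server
# runs as the clickhouse user (assumption), a chown -R on the staged directory
# may be needed before ATTACH.
```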
Then, on the server:
ALTER TABLE test_batch ATTACH PART 'all_1_1_0';
The import worked. Take a look at the parts (I had already repeated the operation 3 times here, so the part was imported 3 times)
sql
SELECT
partition,
name,
active
FROM system.parts
WHERE table = 'test_batch'
Query id: 111
┌─partition─┬─name──────┬─active─┐
│ tuple()   │ all_1_1_0 │      1 │
│ tuple()   │ all_2_2_0 │      1 │
│ tuple()   │ all_3_3_0 │      1 │
└───────────┴───────────┴────────┘
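The same source directory `all_1_1_0` shows up under three different names because, as I understand it, the server gives each attached part a fresh block number. A part name has the form `<partition_id>_<min_block>_<max_block>_<level>`, and the sketch below simply splits it. `parse_part_name` is a hypothetical helper and only handles this four-field form, not names that carry an extra mutation-version suffix.

```shell
# Hypothetical helper: split a MergeTree part name of the form
# <partition_id>_<min_block>_<max_block>_<level>.
# Fields are peeled off from the right, so the partition id is whatever remains.
parse_part_name() {
    local name="$1" rest level max_block min_block partition_id

    level="${name##*_}";     rest="${name%_*}"
    max_block="${rest##*_}"; rest="${rest%_*}"
    min_block="${rest##*_}"
    partition_id="${rest%_*}"

    echo "partition_id=${partition_id} min_block=${min_block} max_block=${max_block} level=${level}"
}

# The three parts from the query result above.
for part in all_1_1_0 all_2_2_0 all_3_3_0; do
    parse_part_name "$part"
done
```

All three parts are at level 0, meaning none of them has been merged yet. Since they carry identical rows and the table is a ReplacingMergeTree ordered by `a`, a later merge would be expected to collapse the duplicates.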