[Big Data] ClickHouse Quick Start

Installation and Deployment

Pull the image

docker pull clickhouse/clickhouse-server

Start an instance

Create the data directories and start the container

#!/bin/bash

# Create the host directories for data and logs
mkdir -p /data/clickhouse/data /data/clickhouse/logs

docker run --restart=always -d -P \
    -e CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1 \
    -e CLICKHOUSE_USER=default \
    -e CLICKHOUSE_PASSWORD=nQczEy2EWJ7w \
    -v "/data/clickhouse/data:/var/lib/clickhouse/" \
    -v "/data/clickhouse/logs:/var/log/clickhouse-server/" \
    --name clickhouse-server --ulimit nofile=262144:262144 clickhouse/clickhouse-server

You may also want to mount:

  • /etc/clickhouse-server/config.d/*.xml - files with server configuration adjustments
  • /etc/clickhouse-server/users.d/*.xml - files with user settings adjustments
  • /docker-entrypoint-initdb.d/ - folder with database initialization scripts (see below).

Check the automatically assigned ports

[root@localhost ~]# docker ps
CONTAINER ID   IMAGE                                                 COMMAND                  CREATED         STATUS                  PORTS                                                                                                                                   NAMES
0bab25db35bf   clickhouse/clickhouse-server                          "/entrypoint.sh"         1 second ago    Up 1 second             0.0.0.0:32773->8123/tcp, :::32773->8123/tcp, 0.0.0.0:32772->9000/tcp, :::32772->9000/tcp, 0.0.0.0:32771->9009/tcp, :::32771->9009/tcp   clickhouse-server

Port notes

The container exposes port 8123 for the HTTP interface and port 9000 for the native client.
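Because `-P` assigns random host ports, every client has to look up which host port maps to 8123 or 9000. As a minimal sketch, the PORTS column from `docker ps` can be parsed in Python; the function name `parse_port_mappings` is hypothetical, and the sample string is copied from the output above.

```python
import re


def parse_port_mappings(ports_field):
    """Return {container_port: host_port} from a `docker ps` PORTS column."""
    mappings = {}
    # Entries look like "0.0.0.0:32773->8123/tcp" or ":::32773->8123/tcp".
    for host_port, container_port in re.findall(r":(\d+)->(\d+)/tcp", ports_field):
        mappings[int(container_port)] = int(host_port)
    return mappings


ports = (
    "0.0.0.0:32773->8123/tcp, :::32773->8123/tcp, "
    "0.0.0.0:32772->9000/tcp, :::32772->9000/tcp, "
    "0.0.0.0:32771->9009/tcp, :::32771->9009/tcp"
)
mapping = parse_port_mappings(ports)
print(mapping[8123], mapping[9000])  # 32773 32772
```

In this run, HTTP clients therefore use host port 32773 and native-protocol clients use 32772. The numbers change each time the container is recreated.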

First steps

Connecting with the CLI client inside the container

The SQL syntax is largely the same as MySQL's.

[root@localhost ~]#  docker exec -it clickhouse-server clickhouse-client
ClickHouse client version 25.8.4.13 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 25.8.4.

Warnings:
 * Linux transparent hugepages are set to "always". Check /sys/kernel/mm/transparent_hugepage/enabled

0bab25db35bf :) SELECT 'Hello, ClickHouse!'

SELECT 'Hello, ClickHouse!'

Query id: 78e7cfaa-5118-44d9-ae45-9104eadf1149

   ┌─'Hello, ClickHouse!'─┐
1. │ Hello, ClickHouse!   │
   └──────────────────────┘

1 row in set. Elapsed: 0.001 sec.

0bab25db35bf :) exit
Bye.
[root@localhost ~]#

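Note: the ClickHouse server is configured with password authentication, but with the default configuration a local connection from inside the container does not require a password.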

Connecting with curl from outside the container

[root@localhost ~]# echo "SELECT 'Hello, ClickHouse!'" | curl 'http://default:nQczEy2EWJ7w@localhost:32773/?query=' -s --data-binary @-
echo "SELECT 'Hello, ClickHouse'" | curl 'http://default:nQczEy2EWJ7w@localhost:32773/?query=' -s --data-binary @-
Hello, ClickHouse
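The same HTTP call can be made from Python with only the standard library. The sketch below mirrors the curl command: the SQL goes in the POST body and the credentials are sent as HTTP Basic authentication. The helper names `build_query_request` and `run_query` are hypothetical, and host port 32773 is the mapping from this particular run.

```python
import base64
import urllib.request


def build_query_request(host, port, user, password, sql):
    """Build a POST request with the SQL in the body and Basic auth credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"http://{host}:{port}/",
        data=sql.encode("utf-8"),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


def run_query(request, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (requires the running container from above):
# req = build_query_request("localhost", 32773, "default", "nQczEy2EWJ7w",
#                           "SELECT 'Hello, ClickHouse'")
# print(run_query(req))
```

Keeping request construction separate from the network call makes the authentication and body easy to inspect without a running server.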

Changing the password

# Open the ClickHouse client
docker exec -it clickhouse-server clickhouse-client

# Run the password change statement
ALTER USER default IDENTIFIED BY 'new_secure_password';
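Rather than typing a password by hand, one option is to generate a random one and print the statement to paste into the client. This is only a sketch: `alter_password_statement` is a hypothetical helper, and it relies on `secrets.token_urlsafe` emitting only letters, digits, `-` and `_`, so the value needs no quoting inside the SQL string literal.

```python
import secrets


def alter_password_statement(user, nbytes=18):
    """Return (password, statement) for an ALTER USER ... IDENTIFIED BY command."""
    # token_urlsafe yields URL-safe Base64 text, so it contains no quote characters.
    password = secrets.token_urlsafe(nbytes)
    statement = f"ALTER USER {user} IDENTIFIED BY '{password}';"
    return password, statement


password, statement = alter_password_statement("default")
print(statement)
```

After changing the password, remember to update any client configuration that still uses the old one, such as the curl URL and the Python script's `CH_PASSWORD`.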

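Simulating a data analysis scenario

We simulate writing every event produced during one wargame run of the game Command: Modern Operations (CMO). A UUID identifies each run, so all events from the same run share one UUID. A run produces a large volume of event logs, for example: a Red F-22 fighter takes off, Red launches a missile of type xx, or a missile of type xx hits Blue target xx and causes xx points of damage. The simulated data should cover a broad range of the weapons and equipment found in CMO. The requirement is a Python program that generates 1 million random events and writes them to ClickHouse in batches of 5,000.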
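The batching arithmetic is simple: 1,000,000 events in groups of 5,000 gives 200 inserts. A minimal sketch of the grouping step follows; the generator name `batched` is a hypothetical helper for illustration and is not taken from the program in this post.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield consecutive lists of up to batch_size items from iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


sizes = [len(batch) for batch in batched(range(1_000_000), 5_000)]
print(len(sizes), sizes[0], sizes[-1])  # 200 5000 5000
```

When the total is not a multiple of the batch size, the last batch is simply shorter, which is the same behavior the `min(batch_size, num_events - batch_num)` expression gives in a loop-based version.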

Create the table

 CREATE TABLE cmo_events
(
    event_id UUID,
    session_uuid UUID,
    event_time DateTime,
    event_type String,
    side Enum8('RED' = 1, 'BLUE' = 2),
    unit_type String,
    unit_name String,
    weapon_system Nullable(String),
    target_type Nullable(String),
    target_name Nullable(String),
    damage_points Nullable(Int32),
    location_lat Float64,
    location_lon Float64,
    description String
) ENGINE = MergeTree()
ORDER BY (session_uuid, event_time);

For reference, the full client session:

[root@localhost ~]# docker exec -it clickhouse-server clickhouse-client
ClickHouse client version 25.8.4.13 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 25.8.4.

Warnings:
 * Linux transparent hugepages are set to "always". Check /sys/kernel/mm/transparent_hugepage/enabled

0bab25db35bf :) CREATE TABLE cmo_events
(
    event_id UUID,
    session_uuid UUID,
    event_time DateTime,
    event_type String,
    side Enum8('RED' = 1, 'BLUE' = 2),
    unit_type String,
    unit_name String,
    weapon_system Nullable(String),
    target_type Nullable(String),
    target_name Nullable(String),
    damage_points Nullable(Int32),
    location_lat Float64,
    location_lon Float64,
    description String
) ENGINE = MergeTree()
ORDER BY (session_uuid, event_time);

CREATE TABLE cmo_events
(
    `event_id` UUID,
    `session_uuid` UUID,
    `event_time` DateTime,
    `event_type` String,
    `side` Enum8('RED' = 1, 'BLUE' = 2),
    `unit_type` String,
    `unit_name` String,
    `weapon_system` Nullable(String),
    `target_type` Nullable(String),
    `target_name` Nullable(String),
    `damage_points` Nullable(Int32),
    `location_lat` Float64,
    `location_lon` Float64,
    `description` String
)
ENGINE = MergeTree
ORDER BY (session_uuid, event_time)

Query id: 3a225705-8067-4803-adfe-1f5916b1cc5f

Ok.

0 rows in set. Elapsed: 0.006 sec.

0bab25db35bf :) show tables;

SHOW TABLES

Query id: 8b2ab058-53ab-4ea4-bc1b-a357be3ee4a2

   ┌─name───────┐
1. │ cmo_events │
   └────────────┘

1 row in set. Elapsed: 0.002 sec.

0bab25db35bf :)

Python program for generating and inserting the data

Install the driver

pip install clickhouse-driver

generate_cmo_events.py

import uuid
import random
import time
from datetime import datetime, timedelta
from clickhouse_driver import Client
from clickhouse_driver.errors import Error
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# ClickHouse connection settings
CH_HOST = 'localhost'
CH_PORT = 32772  # host port mapped to the container's native TCP port 9000 (see docker ps)
CH_USER = 'default'
CH_PASSWORD = 'nQczEy2EWJ7w'  # fill in if a password is set
CH_DATABASE = 'default'
CH_TABLE = 'cmo_events'

# Simulated data configuration
RED_UNITS = [
    "F-22A Raptor", "Su-57 Felon", "J-20 Mighty Dragon", "T-14 Armata",
    "S-400 Triumf", "Borei-class SSBN", "Yasen-class SSN", "Kirov-class Battlecruiser"
]

BLUE_UNITS = [
    "F-35 Lightning II", "F/A-18E/F Super Hornet", "Arleigh Burke-class DDG",
    "Virginia-class SSN", "Zumwalt-class DDG", "M1A2 Abrams", "Patriot SAM",
    "Trident II D5 SLBM"
]

WEAPON_SYSTEMS = [
    "AIM-120D AMRAAM", "R-77 Adder", "PL-15", "AIM-9X Sidewinder",
    "R-73 Archer", "Tomahawk Cruise Missile", "Kalibr Cruise Missile",
    "DF-21D ASBM", "Standard Missile-6", "Torpedo MK-48", "Torpedo 53-65"
]

EVENT_TYPES = [
    "TAKEOFF", "LANDING", "WEAPON_LAUNCH", "WEAPON_HIT", "WEAPON_MISS",
    "DETECTION", "FUEL_STATUS", "DAMAGE_ASSESSMENT", "MISSION_UPDATE",
    "COMMUNICATION", "SENSOR_ACTIVATION", "COUNTERMEASURE_DEPLOYED"
]

TARGET_TYPES = [
    "AIRBASE", "AIRCRAFT", "WARSHIP", "SUBMARINE", "GROUND_VEHICLE",
    "SAM_SITE", "RADAR_SITE", "COMMAND_CENTER", "INFRASTRUCTURE"
]

# Create the ClickHouse client
def create_clickhouse_client():
    try:
        client = Client(
            host=CH_HOST,
            port=CH_PORT,
            user=CH_USER,
            password=CH_PASSWORD,
            database=CH_DATABASE
        )
        logger.info("Successfully connected to ClickHouse")
        return client
    except Error as e:
        logger.error(f"Failed to connect to ClickHouse: {e}")
        raise

# Create the table if it does not exist
def create_table_if_not_exists(client):
    create_table_query = f"""
    CREATE TABLE IF NOT EXISTS {CH_TABLE}
    (
        event_id UUID,
        session_uuid UUID,
        event_time DateTime,
        event_type String,
        side Enum8('RED' = 1, 'BLUE' = 2),
        unit_type String,
        unit_name String,
        weapon_system Nullable(String),
        target_type Nullable(String),
        target_name Nullable(String),
        damage_points Nullable(Int32),
        location_lat Float64,
        location_lon Float64,
        description String
    ) ENGINE = MergeTree()
    ORDER BY (session_uuid, event_time)
    """

    try:
        client.execute(create_table_query)
        logger.info(f"Table {CH_TABLE} created or already exists")
    except Error as e:
        logger.error(f"Failed to create table: {e}")
        raise

# Generate one random event
def generate_event(session_uuid, start_time):
    event_time = start_time + timedelta(seconds=random.randint(0, 3600*6))  # random time within 6 hours
    side = random.choice(["RED", "BLUE"])

    if side == "RED":
        unit_type = random.choice(RED_UNITS)
        unit_name = f"Red {unit_type} #{random.randint(1, 12)}"
    else:
        unit_type = random.choice(BLUE_UNITS)
        unit_name = f"Blue {unit_type} #{random.randint(1, 12)}"

    event_type = random.choice(EVENT_TYPES)

    # Set type-specific fields based on the event type
    weapon_system = random.choice(WEAPON_SYSTEMS) if event_type in ["WEAPON_LAUNCH", "WEAPON_HIT", "WEAPON_MISS"] else None

    if event_type in ["WEAPON_HIT", "WEAPON_MISS", "DAMAGE_ASSESSMENT"]:
        target_side = "BLUE" if side == "RED" else "RED"
        target_type = random.choice(TARGET_TYPES)
        target_name = f"{target_side} {target_type} #{random.randint(1, 20)}"
        damage_points = random.randint(10, 100) if event_type in ["WEAPON_HIT", "DAMAGE_ASSESSMENT"] else None
    else:
        target_type = None
        target_name = None
        damage_points = None

    # Generate location data (a random position within a fixed region)
    location_lat = 30.0 + random.uniform(-5, 5)
    location_lon = 120.0 + random.uniform(-5, 5)

    # Build the description
    if event_type == "TAKEOFF":
        description = f"{unit_name} took off from base"
    elif event_type == "LANDING":
        description = f"{unit_name} landed at base"
    elif event_type == "WEAPON_LAUNCH":
        description = f"{unit_name} launched {weapon_system}"
    elif event_type == "WEAPON_HIT":
        description = f"{weapon_system} from {unit_name} hit {target_name} causing {damage_points} damage"
    elif event_type == "WEAPON_MISS":
        description = f"{weapon_system} from {unit_name} missed {target_name}"
    elif event_type == "DETECTION":
        description = f"{unit_name} detected unknown contact"
    else:
        description = f"{unit_name} {event_type.lower().replace('_', ' ')}"

    return [
        uuid.uuid4(),               # event_id
        session_uuid,               # session_uuid
        event_time,                 # event_time
        event_type,                 # event_type
        side,                       # side
        unit_type,                  # unit_type
        unit_name,                  # unit_name
        weapon_system,              # weapon_system
        target_type,                # target_type
        target_name,                # target_name
        damage_points,              # damage_points
        location_lat,               # location_lat
        location_lon,               # location_lon
        description                 # description
    ]

# Generate and insert data in batches
def generate_and_insert_data(client, num_events=1000000, batch_size=5000):
    session_uuid = uuid.uuid4()
    start_time = datetime.now() - timedelta(hours=6)  # start 6 hours ago

    logger.info(f"Generating {num_events} events for session {session_uuid}")

    total_start_time = time.time()
    generated_events = 0

    # Prepare the insert query
    insert_query = f"""
    INSERT INTO {CH_TABLE} (
        event_id, session_uuid, event_time, event_type, side,
        unit_type, unit_name, weapon_system, target_type, target_name,
        damage_points, location_lat, location_lon, description
    ) VALUES
    """

    # Generate and insert the data batch by batch
    for batch_num in range(0, num_events, batch_size):
        batch_start_time = time.time()
        batch_data = []
        current_batch_size = min(batch_size, num_events - batch_num)

        # Generate the batch
        for i in range(current_batch_size):
            event = generate_event(session_uuid, start_time)
            batch_data.append(event)

        # Insert the batch into ClickHouse
        try:
            client.execute(insert_query, batch_data)
            generated_events += current_batch_size

            batch_time = time.time() - batch_start_time
            logger.info(f"Inserted batch {batch_num//batch_size + 1} with {current_batch_size} events "
                       f"in {batch_time:.2f} seconds "
                       f"({current_batch_size/batch_time:.2f} events/sec)")

        except Error as e:
            logger.error(f"Failed to insert batch {batch_num//batch_size + 1}: {e}")
            # Optionally retry here; as written, the failed batch is skipped and the loop continues

    total_time = time.time() - total_start_time
    logger.info(f"Total: Inserted {generated_events} events in {total_time:.2f} seconds "
               f"({generated_events/total_time:.2f} events/sec)")

    return generated_events, total_time

if __name__ == "__main__":
    # Install the dependency first: pip install clickhouse-driver

    try:
        # Create the ClickHouse client
        client = create_clickhouse_client()

        # Create the table if it does not exist
        create_table_if_not_exists(client)

        # Generate and insert the data
        events_count, total_time = generate_and_insert_data(client, 1000000, 5000)

        logger.info(f"Successfully inserted {events_count} events in {total_time:.2f} seconds")

    except Exception as e:
        logger.error(f"Program failed with error: {e}")
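The insert loop above logs a failed batch and moves on, so a transient error silently drops 5,000 events. One possible alternative is a small retry wrapper with exponential backoff. This is a sketch under assumptions: `insert_with_retry` is a hypothetical helper, it catches the generic `Exception` to stay self-contained (the script itself would more naturally catch `clickhouse_driver.errors.Error`), and it only assumes the client exposes `execute(query, rows)` as used above.

```python
import time


def insert_with_retry(client, query, rows, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Try client.execute(query, rows) up to max_attempts times with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return client.execute(query, rows)
        except Exception:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            # Wait 0.5 s, 1 s, 2 s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Inside the loop, the call `client.execute(insert_query, batch_data)` would become `insert_with_retry(client, insert_query, batch_data)`. Note that a retry can write a batch twice if the first attempt failed only after the server had already accepted the rows, so this trades possible duplicates for fewer dropped batches.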

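Run log (excellent performance)

The log shows strong write throughput: about 59,000 events per second overall, and that figure includes the time Python spends generating each batch.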

[root@localhost clickhouse-sample]# python3 generate_cmo_events.py
INFO:__main__:Successfully connected to ClickHouse
INFO:__main__:Table cmo_events created or already exists
INFO:__main__:Generating 1000000 events for session 09cfee87-3b10-41cb-a995-9bb351944997
INFO:__main__:Inserted batch 1 with 5000 events in 0.10 seconds (52522.95 events/sec)
INFO:__main__:Inserted batch 2 with 5000 events in 0.08 seconds (60039.74 events/sec)
INFO:__main__:Inserted batch 3 with 5000 events in 0.09 seconds (56199.96 events/sec)
INFO:__main__:Inserted batch 4 with 5000 events in 0.08 seconds (59493.17 events/sec)
INFO:__main__:Inserted batch 5 with 5000 events in 0.08 seconds (58916.54 events/sec)
INFO:__main__:Inserted batch 6 with 5000 events in 0.09 seconds (58321.62 events/sec)
INFO:__main__:Inserted batch 7 with 5000 events in 0.09 seconds (58659.95 events/sec)
INFO:__main__:Inserted batch 8 with 5000 events in 0.08 seconds (61833.34 events/sec)
INFO:__main__:Inserted batch 9 with 5000 events in 0.08 seconds (61569.03 events/sec)
INFO:__main__:Inserted batch 10 with 5000 events in 0.08 seconds (60840.98 events/sec)
INFO:__main__:Inserted batch 11 with 5000 events in 0.08 seconds (61788.53 events/sec)
INFO:__main__:Inserted batch 12 with 5000 events in 0.08 seconds (61649.57 events/sec)
INFO:__main__:Inserted batch 13 with 5000 events in 0.10 seconds (49049.30 events/sec)
INFO:__main__:Inserted batch 14 with 5000 events in 0.08 seconds (62063.36 events/sec)
INFO:__main__:Inserted batch 15 with 5000 events in 0.08 seconds (61636.71 events/sec)
INFO:__main__:Inserted batch 16 with 5000 events in 0.08 seconds (59425.57 events/sec)
INFO:__main__:Inserted batch 17 with 5000 events in 0.08 seconds (61817.67 events/sec)
INFO:__main__:Inserted batch 18 with 5000 events in 0.08 seconds (61878.77 events/sec)
INFO:__main__:Inserted batch 19 with 5000 events in 0.08 seconds (61121.50 events/sec)
INFO:__main__:Inserted batch 20 with 5000 events in 0.08 seconds (62007.94 events/sec)
INFO:__main__:Inserted batch 21 with 5000 events in 0.08 seconds (61800.18 events/sec)
INFO:__main__:Inserted batch 22 with 5000 events in 0.08 seconds (61397.97 events/sec)
INFO:__main__:Inserted batch 23 with 5000 events in 0.08 seconds (61920.43 events/sec)
INFO:__main__:Inserted batch 24 with 5000 events in 0.08 seconds (61823.68 events/sec)
INFO:__main__:Inserted batch 25 with 5000 events in 0.08 seconds (61918.23 events/sec)
INFO:__main__:Inserted batch 26 with 5000 events in 0.08 seconds (59881.16 events/sec)
INFO:__main__:Inserted batch 27 with 5000 events in 0.08 seconds (61966.35 events/sec)
INFO:__main__:Inserted batch 28 with 5000 events in 0.08 seconds (62225.78 events/sec)
INFO:__main__:Inserted batch 29 with 5000 events in 0.08 seconds (61433.76 events/sec)
INFO:__main__:Inserted batch 30 with 5000 events in 0.08 seconds (62425.83 events/sec)
INFO:__main__:Inserted batch 31 with 5000 events in 0.08 seconds (61381.26 events/sec)
INFO:__main__:Inserted batch 32 with 5000 events in 0.10 seconds (49932.31 events/sec)
INFO:__main__:Inserted batch 33 with 5000 events in 0.09 seconds (57886.16 events/sec)
INFO:__main__:Inserted batch 34 with 5000 events in 0.08 seconds (59527.11 events/sec)
INFO:__main__:Inserted batch 35 with 5000 events in 0.09 seconds (58397.13 events/sec)
INFO:__main__:Inserted batch 36 with 5000 events in 0.08 seconds (59672.44 events/sec)
INFO:__main__:Inserted batch 37 with 5000 events in 0.08 seconds (59633.92 events/sec)
INFO:__main__:Inserted batch 38 with 5000 events in 0.08 seconds (60850.33 events/sec)
INFO:__main__:Inserted batch 39 with 5000 events in 0.08 seconds (60351.09 events/sec)
INFO:__main__:Inserted batch 40 with 5000 events in 0.08 seconds (60006.58 events/sec)
INFO:__main__:Inserted batch 41 with 5000 events in 0.09 seconds (58819.38 events/sec)
INFO:__main__:Inserted batch 42 with 5000 events in 0.14 seconds (35813.80 events/sec)
INFO:__main__:Inserted batch 43 with 5000 events in 0.12 seconds (42547.30 events/sec)
INFO:__main__:Inserted batch 44 with 5000 events in 0.09 seconds (57668.41 events/sec)
INFO:__main__:Inserted batch 45 with 5000 events in 0.09 seconds (58709.22 events/sec)
INFO:__main__:Inserted batch 46 with 5000 events in 0.08 seconds (60861.28 events/sec)
INFO:__main__:Inserted batch 47 with 5000 events in 0.08 seconds (61517.56 events/sec)
INFO:__main__:Inserted batch 48 with 5000 events in 0.08 seconds (60536.10 events/sec)
INFO:__main__:Inserted batch 49 with 5000 events in 0.08 seconds (61419.19 events/sec)
INFO:__main__:Inserted batch 50 with 5000 events in 0.08 seconds (60751.97 events/sec)
INFO:__main__:Inserted batch 51 with 5000 events in 0.08 seconds (61520.45 events/sec)
INFO:__main__:Inserted batch 52 with 5000 events in 0.08 seconds (60766.41 events/sec)
INFO:__main__:Inserted batch 53 with 5000 events in 0.10 seconds (49802.47 events/sec)
INFO:__main__:Inserted batch 54 with 5000 events in 0.09 seconds (55050.87 events/sec)
INFO:__main__:Inserted batch 55 with 5000 events in 0.08 seconds (60076.37 events/sec)
INFO:__main__:Inserted batch 56 with 5000 events in 0.08 seconds (60173.77 events/sec)
INFO:__main__:Inserted batch 57 with 5000 events in 0.08 seconds (60918.39 events/sec)
INFO:__main__:Inserted batch 58 with 5000 events in 0.08 seconds (60296.26 events/sec)
INFO:__main__:Inserted batch 59 with 5000 events in 0.08 seconds (61255.75 events/sec)
INFO:__main__:Inserted batch 60 with 5000 events in 0.08 seconds (59838.10 events/sec)
INFO:__main__:Inserted batch 61 with 5000 events in 0.08 seconds (60690.09 events/sec)
INFO:__main__:Inserted batch 62 with 5000 events in 0.08 seconds (59796.30 events/sec)
INFO:__main__:Inserted batch 63 with 5000 events in 0.08 seconds (61230.36 events/sec)
INFO:__main__:Inserted batch 64 with 5000 events in 0.09 seconds (58591.78 events/sec)
INFO:__main__:Inserted batch 65 with 5000 events in 0.08 seconds (60133.05 events/sec)
INFO:__main__:Inserted batch 66 with 5000 events in 0.08 seconds (59571.92 events/sec)
INFO:__main__:Inserted batch 67 with 5000 events in 0.08 seconds (60909.54 events/sec)
INFO:__main__:Inserted batch 68 with 5000 events in 0.08 seconds (61372.10 events/sec)
INFO:__main__:Inserted batch 69 with 5000 events in 0.08 seconds (61232.50 events/sec)
INFO:__main__:Inserted batch 70 with 5000 events in 0.08 seconds (62099.38 events/sec)
INFO:__main__:Inserted batch 71 with 5000 events in 0.08 seconds (61392.04 events/sec)
INFO:__main__:Inserted batch 72 with 5000 events in 0.08 seconds (59276.58 events/sec)
INFO:__main__:Inserted batch 73 with 5000 events in 0.09 seconds (57707.76 events/sec)
INFO:__main__:Inserted batch 74 with 5000 events in 0.08 seconds (59384.51 events/sec)
INFO:__main__:Inserted batch 75 with 5000 events in 0.08 seconds (60872.94 events/sec)
INFO:__main__:Inserted batch 76 with 5000 events in 0.08 seconds (61217.13 events/sec)
INFO:__main__:Inserted batch 77 with 5000 events in 0.08 seconds (61129.34 events/sec)
INFO:__main__:Inserted batch 78 with 5000 events in 0.08 seconds (61061.53 events/sec)
INFO:__main__:Inserted batch 79 with 5000 events in 0.08 seconds (60586.46 events/sec)
INFO:__main__:Inserted batch 80 with 5000 events in 0.08 seconds (59071.04 events/sec)
INFO:__main__:Inserted batch 81 with 5000 events in 0.08 seconds (60347.79 events/sec)
INFO:__main__:Inserted batch 82 with 5000 events in 0.08 seconds (59875.01 events/sec)
INFO:__main__:Inserted batch 83 with 5000 events in 0.09 seconds (57740.65 events/sec)
INFO:__main__:Inserted batch 84 with 5000 events in 0.08 seconds (60582.44 events/sec)
INFO:__main__:Inserted batch 85 with 5000 events in 0.08 seconds (60833.92 events/sec)
INFO:__main__:Inserted batch 86 with 5000 events in 0.09 seconds (58281.74 events/sec)
INFO:__main__:Inserted batch 87 with 5000 events in 0.08 seconds (60078.09 events/sec)
INFO:__main__:Inserted batch 88 with 5000 events in 0.09 seconds (58692.13 events/sec)
INFO:__main__:Inserted batch 89 with 5000 events in 0.08 seconds (60910.07 events/sec)
INFO:__main__:Inserted batch 90 with 5000 events in 0.08 seconds (60866.94 events/sec)
INFO:__main__:Inserted batch 91 with 5000 events in 0.08 seconds (60690.26 events/sec)
INFO:__main__:Inserted batch 92 with 5000 events in 0.08 seconds (59360.47 events/sec)
INFO:__main__:Inserted batch 93 with 5000 events in 0.08 seconds (60913.79 events/sec)
INFO:__main__:Inserted batch 94 with 5000 events in 0.10 seconds (47876.36 events/sec)
INFO:__main__:Inserted batch 95 with 5000 events in 0.08 seconds (61115.09 events/sec)
INFO:__main__:Inserted batch 96 with 5000 events in 0.08 seconds (60889.91 events/sec)
INFO:__main__:Inserted batch 97 with 5000 events in 0.08 seconds (59516.30 events/sec)
INFO:__main__:Inserted batch 98 with 5000 events in 0.09 seconds (52657.99 events/sec)
INFO:__main__:Inserted batch 99 with 5000 events in 0.08 seconds (59015.85 events/sec)
INFO:__main__:Inserted batch 100 with 5000 events in 0.08 seconds (59500.09 events/sec)
INFO:__main__:Inserted batch 101 with 5000 events in 0.08 seconds (59718.83 events/sec)
INFO:__main__:Inserted batch 102 with 5000 events in 0.09 seconds (58250.34 events/sec)
INFO:__main__:Inserted batch 103 with 5000 events in 0.08 seconds (61008.41 events/sec)
INFO:__main__:Inserted batch 104 with 5000 events in 0.08 seconds (60988.36 events/sec)
INFO:__main__:Inserted batch 105 with 5000 events in 0.08 seconds (61100.13 events/sec)
INFO:__main__:Inserted batch 106 with 5000 events in 0.08 seconds (60659.19 events/sec)
INFO:__main__:Inserted batch 107 with 5000 events in 0.08 seconds (61433.94 events/sec)
INFO:__main__:Inserted batch 108 with 5000 events in 0.08 seconds (60162.89 events/sec)
INFO:__main__:Inserted batch 109 with 5000 events in 0.08 seconds (60524.57 events/sec)
INFO:__main__:Inserted batch 110 with 5000 events in 0.09 seconds (58618.63 events/sec)
INFO:__main__:Inserted batch 111 with 5000 events in 0.09 seconds (56181.74 events/sec)
INFO:__main__:Inserted batch 112 with 5000 events in 0.08 seconds (60709.06 events/sec)
INFO:__main__:Inserted batch 113 with 5000 events in 0.08 seconds (60753.91 events/sec)
INFO:__main__:Inserted batch 114 with 5000 events in 0.08 seconds (59026.65 events/sec)
INFO:__main__:Inserted batch 115 with 5000 events in 0.09 seconds (58517.39 events/sec)
INFO:__main__:Inserted batch 116 with 5000 events in 0.08 seconds (61356.83 events/sec)
INFO:__main__:Inserted batch 117 with 5000 events in 0.08 seconds (61392.04 events/sec)
INFO:__main__:Inserted batch 118 with 5000 events in 0.08 seconds (61692.01 events/sec)
INFO:__main__:Inserted batch 119 with 5000 events in 0.09 seconds (57121.47 events/sec)
INFO:__main__:Inserted batch 120 with 5000 events in 0.08 seconds (60300.24 events/sec)
INFO:__main__:Inserted batch 121 with 5000 events in 0.08 seconds (59299.37 events/sec)
INFO:__main__:Inserted batch 122 with 5000 events in 0.08 seconds (62718.19 events/sec)
INFO:__main__:Inserted batch 123 with 5000 events in 0.08 seconds (61665.16 events/sec)
INFO:__main__:Inserted batch 124 with 5000 events in 0.08 seconds (60483.02 events/sec)
INFO:__main__:Inserted batch 125 with 5000 events in 0.08 seconds (60815.75 events/sec)
INFO:__main__:Inserted batch 126 with 5000 events in 0.08 seconds (62324.35 events/sec)
INFO:__main__:Inserted batch 127 with 5000 events in 0.08 seconds (62301.57 events/sec)
INFO:__main__:Inserted batch 128 with 5000 events in 0.08 seconds (59990.79 events/sec)
INFO:__main__:Inserted batch 129 with 5000 events in 0.08 seconds (61615.34 events/sec)
INFO:__main__:Inserted batch 130 with 5000 events in 0.09 seconds (58776.68 events/sec)
INFO:__main__:Inserted batch 131 with 5000 events in 0.08 seconds (60542.04 events/sec)
INFO:__main__:Inserted batch 132 with 5000 events in 0.09 seconds (58653.72 events/sec)
INFO:__main__:Inserted batch 133 with 5000 events in 0.08 seconds (60618.51 events/sec)
INFO:__main__:Inserted batch 134 with 5000 events in 0.08 seconds (62221.17 events/sec)
INFO:__main__:Inserted batch 135 with 5000 events in 0.08 seconds (60850.86 events/sec)
INFO:__main__:Inserted batch 136 with 5000 events in 0.08 seconds (60431.26 events/sec)
INFO:__main__:Inserted batch 137 with 5000 events in 0.08 seconds (59659.71 events/sec)
INFO:__main__:Inserted batch 138 with 5000 events in 0.08 seconds (61355.94 events/sec)
INFO:__main__:Inserted batch 139 with 5000 events in 0.08 seconds (60513.56 events/sec)
INFO:__main__:Inserted batch 140 with 5000 events in 0.09 seconds (57111.83 events/sec)
INFO:__main__:Inserted batch 141 with 5000 events in 0.08 seconds (60274.94 events/sec)
INFO:__main__:Inserted batch 142 with 5000 events in 0.08 seconds (59620.70 events/sec)
INFO:__main__:Inserted batch 143 with 5000 events in 0.08 seconds (60325.39 events/sec)
INFO:__main__:Inserted batch 144 with 5000 events in 0.08 seconds (61003.62 events/sec)
INFO:__main__:Inserted batch 145 with 5000 events in 0.09 seconds (58234.97 events/sec)
INFO:__main__:Inserted batch 146 with 5000 events in 0.08 seconds (60941.22 events/sec)
INFO:__main__:Inserted batch 147 with 5000 events in 0.08 seconds (59946.72 events/sec)
INFO:__main__:Inserted batch 148 with 5000 events in 0.08 seconds (59868.68 events/sec)
INFO:__main__:Inserted batch 149 with 5000 events in 0.09 seconds (58106.53 events/sec)
INFO:__main__:Inserted batch 150 with 5000 events in 0.08 seconds (59796.30 events/sec)
INFO:__main__:Inserted batch 151 with 5000 events in 0.08 seconds (60683.41 events/sec)
INFO:__main__:Inserted batch 152 with 5000 events in 0.08 seconds (60710.82 events/sec)
INFO:__main__:Inserted batch 153 with 5000 events in 0.08 seconds (59928.39 events/sec)
INFO:__main__:Inserted batch 154 with 5000 events in 0.09 seconds (56718.57 events/sec)
INFO:__main__:Inserted batch 155 with 5000 events in 0.08 seconds (61275.80 events/sec)
INFO:__main__:Inserted batch 156 with 5000 events in 0.08 seconds (60462.45 events/sec)
INFO:__main__:Inserted batch 157 with 5000 events in 0.08 seconds (59124.00 events/sec)
INFO:__main__:Inserted batch 158 with 5000 events in 0.08 seconds (58993.61 events/sec)
INFO:__main__:Inserted batch 159 with 5000 events in 0.09 seconds (58088.03 events/sec)
INFO:__main__:Inserted batch 160 with 5000 events in 0.09 seconds (55837.10 events/sec)
INFO:__main__:Inserted batch 161 with 5000 events in 0.08 seconds (59123.34 events/sec)
INFO:__main__:Inserted batch 162 with 5000 events in 0.08 seconds (59958.03 events/sec)
INFO:__main__:Inserted batch 163 with 5000 events in 0.08 seconds (59272.56 events/sec)
INFO:__main__:Inserted batch 164 with 5000 events in 0.08 seconds (59924.28 events/sec)
INFO:__main__:Inserted batch 165 with 5000 events in 0.08 seconds (60516.36 events/sec)
INFO:__main__:Inserted batch 166 with 5000 events in 0.08 seconds (61169.81 events/sec)
INFO:__main__:Inserted batch 167 with 5000 events in 0.08 seconds (61027.05 events/sec)
INFO:__main__:Inserted batch 168 with 5000 events in 0.09 seconds (57761.48 events/sec)
INFO:__main__:Inserted batch 169 with 5000 events in 0.08 seconds (59602.06 events/sec)
INFO:__main__:Inserted batch 170 with 5000 events in 0.08 seconds (59260.83 events/sec)
INFO:__main__:Inserted batch 171 with 5000 events in 0.08 seconds (61051.57 events/sec)
INFO:__main__:Inserted batch 172 with 5000 events in 0.08 seconds (60056.42 events/sec)
INFO:__main__:Inserted batch 173 with 5000 events in 0.08 seconds (60164.27 events/sec)
INFO:__main__:Inserted batch 174 with 5000 events in 0.08 seconds (59361.65 events/sec)
INFO:__main__:Inserted batch 175 with 5000 events in 0.09 seconds (57890.96 events/sec)
INFO:__main__:Inserted batch 176 with 5000 events in 0.09 seconds (57899.43 events/sec)
INFO:__main__:Inserted batch 177 with 5000 events in 0.08 seconds (61684.03 events/sec)
INFO:__main__:Inserted batch 178 with 5000 events in 0.09 seconds (56205.23 events/sec)
INFO:__main__:Inserted batch 179 with 5000 events in 0.08 seconds (60313.42 events/sec)
INFO:__main__:Inserted batch 180 with 5000 events in 0.09 seconds (58260.21 events/sec)
INFO:__main__:Inserted batch 181 with 5000 events in 0.08 seconds (60499.42 events/sec)
INFO:__main__:Inserted batch 182 with 5000 events in 0.09 seconds (55690.89 events/sec)
INFO:__main__:Inserted batch 183 with 5000 events in 0.09 seconds (56410.54 events/sec)
INFO:__main__:Inserted batch 184 with 5000 events in 0.09 seconds (57906.14 events/sec)
INFO:__main__:Inserted batch 185 with 5000 events in 0.09 seconds (56736.98 events/sec)
INFO:__main__:Inserted batch 186 with 5000 events in 0.08 seconds (59127.17 events/sec)
INFO:__main__:Inserted batch 187 with 5000 events in 0.09 seconds (56599.32 events/sec)
INFO:__main__:Inserted batch 188 with 5000 events in 0.08 seconds (60208.49 events/sec)
INFO:__main__:Inserted batch 189 with 5000 events in 0.08 seconds (59523.73 events/sec)
INFO:__main__:Inserted batch 190 with 5000 events in 0.09 seconds (58641.42 events/sec)
INFO:__main__:Inserted batch 191 with 5000 events in 0.08 seconds (61436.46 events/sec)
INFO:__main__:Inserted batch 192 with 5000 events in 0.09 seconds (58496.50 events/sec)
INFO:__main__:Inserted batch 193 with 5000 events in 0.09 seconds (56366.72 events/sec)
INFO:__main__:Inserted batch 194 with 5000 events in 0.08 seconds (60061.23 events/sec)
INFO:__main__:Inserted batch 195 with 5000 events in 0.09 seconds (57494.97 events/sec)
INFO:__main__:Inserted batch 196 with 5000 events in 0.09 seconds (55713.23 events/sec)
INFO:__main__:Inserted batch 197 with 5000 events in 0.09 seconds (54448.85 events/sec)
INFO:__main__:Inserted batch 198 with 5000 events in 0.08 seconds (62113.91 events/sec)
INFO:__main__:Inserted batch 199 with 5000 events in 0.09 seconds (58062.45 events/sec)
INFO:__main__:Inserted batch 200 with 5000 events in 0.09 seconds (58331.02 events/sec)
INFO:__main__:Total: Inserted 1000000 events in 16.91 seconds (59148.66 events/sec)
INFO:__main__:Successfully inserted 1000000 events in 16.91 seconds
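With 200 batch lines, the spread is easier to judge from summary numbers than by eye. The sketch below extracts the per-batch rates with a regular expression; `batch_rates` is a hypothetical helper, and the sample text is three lines copied from the log above, so the full log would need to be saved to a file and read in to get real statistics.

```python
import re

BATCH_LINE = re.compile(
    r"Inserted batch (\d+) with (\d+) events in ([\d.]+) seconds \(([\d.]+) events/sec\)"
)


def batch_rates(log_text):
    """Return the events/sec figure of every per-batch log line."""
    return [float(match.group(4)) for match in BATCH_LINE.finditer(log_text)]


sample = """\
INFO:__main__:Inserted batch 1 with 5000 events in 0.10 seconds (52522.95 events/sec)
INFO:__main__:Inserted batch 42 with 5000 events in 0.14 seconds (35813.80 events/sec)
INFO:__main__:Inserted batch 122 with 5000 events in 0.08 seconds (62718.19 events/sec)
INFO:__main__:Total: Inserted 1000000 events in 16.91 seconds (59148.66 events/sec)
"""
rates = batch_rates(sample)
print(len(rates), min(rates), max(rates))  # 3 35813.8 62718.19
```

The final "Total" line is deliberately not matched, so it does not skew the per-batch figures.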

Disk usage before and after the write

Before

[root@localhost ~]# du -sh /data/clickhouse/
70M     /data/clickhouse/
[root@localhost ~]#

After

[root@localhost clickhouse-sample]# du -sh /data/clickhouse/
333M    /data/clickhouse/
[root@localhost clickhouse-sample]#
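The two `du` readings give a rough per-row cost. The arithmetic below is only an estimate: it assumes the `M` suffix means MiB, and the directory also holds server logs and ClickHouse's own system tables, so the true footprint of `cmo_events` is likely somewhat smaller. The function name `bytes_per_row` is hypothetical.

```python
MIB = 1024 * 1024


def bytes_per_row(before_mib, after_mib, rows):
    """Average on-disk growth per inserted row, in bytes."""
    return (after_mib - before_mib) * MIB / rows


growth_mib = 333 - 70
estimate = bytes_per_row(70, 333, 1_000_000)
print(growth_mib, round(estimate))  # 263 276
```

So the 1 million events added about 263 MiB, or roughly 276 bytes per event, measured right after the load.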

Memory usage after the write

CONTAINER ID   NAME                           CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
0bab25db35bf   clickhouse-server              2.89%     851.8MiB / 34.63GiB   2.40%     180MB / 987kB     102MB / 819kB     691

A quick look at the written data

0bab25db35bf :) select * from cmo_events limit 20;

SELECT *
FROM cmo_events
LIMIT 20

Query id: 37fb0dea-dae6-4d27-8d79-62cba659b7e0

    ┌─event_id─────────────────────────────┬─session_uuid─────────────────────────┬──────────event_time─┬─event_type──────────────┬─side─┬─unit_type─────────────────┬─unit_name────────────────────────┬─weapon_system─────────┬─target_type────┬─target_name────────────┬─damage_points─┬───────location_lat─┬───────location_lon─┬─description──────────────────────────────────────────────────────────────────────────────┐
 1. │ 8849c00f-cd82-46b2-ab9d-ec9c9de87890 │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ COMMUNICATION           │ BLUE │ Zumwalt-class DDG         │ Blue Zumwalt-class DDG #10       │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │  28.10526232009602 │ 121.95699352544986 │ Blue Zumwalt-class DDG #10 communication                                                 │
 2. │ a872a0a9-1252-4813-ab8c-4cff1f401f8d │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ LANDING                 │ RED  │ Borei-class SSBN          │ Red Borei-class SSBN #6          │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │ 30.696694625440067 │ 121.29328734609689 │ Red Borei-class SSBN #6 landed at base                                                   │
 3. │ 1ad76e63-006f-4e91-83ab-a700828cd82a │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ DAMAGE_ASSESSMENT       │ BLUE │ Arleigh Burke-class DDG   │ Blue Arleigh Burke-class DDG #1  │ ᴺᵁᴸᴸ                  │ GROUND_VEHICLE │ RED GROUND_VEHICLE #14 │            87 │ 26.261531960506662 │ 121.59999706737347 │ Blue Arleigh Burke-class DDG #1 damage assessment                                        │
 4. │ bdcd36a4-3b36-481c-89e6-b2c422aba5b0 │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ WEAPON_HIT              │ BLUE │ F-35 Lightning II         │ Blue F-35 Lightning II #3        │ Torpedo 53-65         │ INFRASTRUCTURE │ RED INFRASTRUCTURE #3  │            65 │ 33.112945272285366 │ 118.45447685467576 │ Torpedo 53-65 from Blue F-35 Lightning II #3 hit RED INFRASTRUCTURE #3 causing 65 damage │
 5. │ 0e9a9c2f-7bcb-481d-8ae3-e869c934615f │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ SENSOR_ACTIVATION       │ RED  │ Borei-class SSBN          │ Red Borei-class SSBN #5          │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │  33.10888852294174 │ 121.66802049225652 │ Red Borei-class SSBN #5 sensor activation                                                │
 6. │ 7d90b6fa-8775-449e-a9e8-becb05c63af1 │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ DETECTION               │ RED  │ S-400 Triumf              │ Red S-400 Triumf #8              │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │ 31.324694206355506 │ 119.61740383324297 │ Red S-400 Triumf #8 detected unknown contact                                             │
 7. │ f054c911-b909-485b-bf58-ec389c68f279 │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ LANDING                 │ RED  │ F-22A Raptor              │ Red F-22A Raptor #7              │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │ 28.983355606808068 │ 118.87768468760133 │ Red F-22A Raptor #7 landed at base                                                       │
 8. │ 3caa95bf-0ece-4a07-b115-70ba1fc248b5 │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ SENSOR_ACTIVATION       │ RED  │ Yasen-class SSN           │ Red Yasen-class SSN #11          │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │ 27.212144356311303 │ 123.75301138525788 │ Red Yasen-class SSN #11 sensor activation                                                │
 9. │ c5c1e9b6-0eed-4146-b7fa-1fa3378e1b07 │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ WEAPON_MISS             │ BLUE │ Virginia-class SSN        │ Blue Virginia-class SSN #10      │ Kalibr Cruise Missile │ GROUND_VEHICLE │ RED GROUND_VEHICLE #11 │          ᴺᵁᴸᴸ │ 34.917367440747384 │ 118.09356032516219 │ Kalibr Cruise Missile from Blue Virginia-class SSN #10 missed RED GROUND_VEHICLE #11     │
10. │ d9728362-f068-4a40-874c-f73ec4f473cf │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ FUEL_STATUS             │ BLUE │ Arleigh Burke-class DDG   │ Blue Arleigh Burke-class DDG #12 │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │ 27.645216291227896 │ 120.54220071943406 │ Blue Arleigh Burke-class DDG #12 fuel status                                             │
11. │ 4a8aff13-84df-47a8-8f5f-806182f185c1 │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ MISSION_UPDATE          │ RED  │ Su-57 Felon               │ Red Su-57 Felon #7               │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │  33.79360033184239 │ 122.92032627373821 │ Red Su-57 Felon #7 mission update                                                        │
12. │ ea4d9ae7-856c-4e2d-ab9e-a11a2c6acf1f │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ MISSION_UPDATE          │ RED  │ Su-57 Felon               │ Red Su-57 Felon #2               │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │  31.67506881472707 │ 115.28273231630332 │ Red Su-57 Felon #2 mission update                                                        │
13. │ 0fee2c66-1cf9-40e9-b482-89c2b1332a63 │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ SENSOR_ACTIVATION       │ BLUE │ M1A2 Abrams               │ Blue M1A2 Abrams #9              │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │  27.01940220507797 │ 116.35083989530673 │ Blue M1A2 Abrams #9 sensor activation                                                    │
14. │ e3441e5d-a55e-4339-8d2a-19a5cb88c072 │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ WEAPON_LAUNCH           │ RED  │ Kirov-class Battlecruiser │ Red Kirov-class Battlecruiser #2 │ R-73 Archer           │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │  25.57873263183426 │ 116.61851532134524 │ Red Kirov-class Battlecruiser #2 launched R-73 Archer                                    │
15. │ 9599d2ca-3206-45a9-8229-f1213dce2930 │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ DETECTION               │ RED  │ J-20 Mighty Dragon        │ Red J-20 Mighty Dragon #4        │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │   34.4541596455493 │ 117.62987749184556 │ Red J-20 Mighty Dragon #4 detected unknown contact                                       │
16. │ b8a471e5-217b-4ee2-83d6-dda9e652f3af │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ LANDING                 │ BLUE │ M1A2 Abrams               │ Blue M1A2 Abrams #6              │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │ 33.169548654507565 │ 118.58063252886755 │ Blue M1A2 Abrams #6 landed at base                                                       │
17. │ b10edb56-3180-42ae-bd27-5c714819982a │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ DETECTION               │ BLUE │ Patriot SAM               │ Blue Patriot SAM #11             │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │  32.81706047928581 │ 116.25912274827049 │ Blue Patriot SAM #11 detected unknown contact                                            │
18. │ 46795b70-2d03-4b05-828e-41a0a64475ed │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ MISSION_UPDATE          │ RED  │ Su-57 Felon               │ Red Su-57 Felon #1               │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │  30.98647515114664 │ 123.10269312841808 │ Red Su-57 Felon #1 mission update                                                        │
19. │ 458402f7-a738-43e0-b303-7326ff2e900c │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ COUNTERMEASURE_DEPLOYED │ RED  │ Borei-class SSBN          │ Red Borei-class SSBN #1          │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │  29.84279445458373 │ 119.76491994064281 │ Red Borei-class SSBN #1 countermeasure deployed                                          │
20. │ 389aca11-e7b8-431a-ab42-7e54306ecc92 │ 09cfee87-3b10-41cb-a995-9bb351944997 │ 2025-09-20 10:24:30 │ MISSION_UPDATE          │ BLUE │ F-35 Lightning II         │ Blue F-35 Lightning II #2        │ ᴺᵁᴸᴸ                  │ ᴺᵁᴸᴸ           │ ᴺᵁᴸᴸ                   │          ᴺᵁᴸᴸ │  33.58536620950474 │   124.293345407109 │ Blue F-35 Lightning II #2 mission update                                                 │
    └──────────────────────────────────────┴──────────────────────────────────────┴─────────────────────┴─────────────────────────┴──────┴───────────────────────────┴──────────────────────────────────┴───────────────────────┴────────────────┴────────────────────────┴───────────────┴────────────────────┴────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────┘

20 rows in set. Elapsed: 0.018 sec.

Computing ten typical metrics in real time

analyze_cmo_data.py

from clickhouse_driver import Client
import pandas as pd
from tabulate import tabulate
import matplotlib.pyplot as plt
import matplotlib
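matplotlib.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
matplotlib.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly with this font

# ClickHouse connection settings
CH_HOST = 'localhost'
CH_PORT = 32772  # host port that Docker mapped to the container's native TCP port 9000 (see docker ps)
CH_USER = 'default'
CH_PASSWORD = 'nQczEy2EWJ7w'  # password passed as CLICKHOUSE_PASSWORD when the container was started
CH_DATABASE = 'default'
CH_TABLE = 'cmo_events'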

def create_clickhouse_client():
    """Create a ClickHouse client connection."""
    try:
        client = Client(
            host=CH_HOST,
            port=CH_PORT,
            user=CH_USER,
            password=CH_PASSWORD,
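            database=CH_DATABASE
        )
        # clickhouse-driver connects lazily: at this point only the client object exists,
        # and the network connection is opened by the first query.
        print("成功连接到ClickHouse数据库")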
        return client
    except Exception as e:
        print(f"连接ClickHouse失败: {e}")
        return None

def execute_query(client, query):
    """Run a query and return its rows, or None if it fails."""
    try:
        result = client.execute(query)
        return result
    except Exception as e:
        print(f"查询执行失败: {e}")
        return None

def print_chinese_report(title, headers, data):
    """Print a report table with a Chinese title and headers."""
    print(f"\n{'='*60}")
    print(f"{title:^60}")
    print(f"{'='*60}")
    print(tabulate(data, headers=headers, tablefmt='grid', numalign="right"))
    print(f"{'='*60}")

def analyze_cmo_data(client):
    """Run the analysis of the CMO wargame data."""

    # 1. Overall statistics
    print("正在进行CMO推演数据分析...")
    total_events = execute_query(client, f"SELECT COUNT(*) FROM {CH_TABLE}")[0][0]
    session_uuid = execute_query(client, f"SELECT session_uuid FROM {CH_TABLE} LIMIT 1")[0][0]

    print(f"\n推演场次UUID: {session_uuid}")
    print(f"总事件数量: {total_events:,}")

    # 2. Event counts by side (RED vs. BLUE)
    side_stats = execute_query(client, f"""
        SELECT side, COUNT(*) as count,
               COUNT(*) * 100.0 / (SELECT COUNT(*) FROM {CH_TABLE}) as percentage
        FROM {CH_TABLE}
        GROUP BY side
        ORDER BY count DESC
    """)

    print_chinese_report(
        "红蓝双方事件数量对比",
        ["阵营", "事件数量", "占比(%)"],
        side_stats
    )

    # 3. Distribution of event types
    event_type_stats = execute_query(client, f"""
        SELECT event_type, COUNT(*) as count,
               COUNT(*) * 100.0 / (SELECT COUNT(*) FROM {CH_TABLE}) as percentage
        FROM {CH_TABLE}
        GROUP BY event_type
        ORDER BY count DESC
        LIMIT 10
    """)

    print_chinese_report(
        "事件类型分布TOP10",
        ["事件类型", "发生次数", "占比(%)"],
        event_type_stats
    )

    # 4. Most active combat units
    active_units = execute_query(client, f"""
        SELECT unit_name, COUNT(*) as event_count
        FROM {CH_TABLE}
        GROUP BY unit_name
        ORDER BY event_count DESC
        LIMIT 10
    """)

    print_chinese_report(
        "最活跃的作战单位TOP10",
        ["单位名称", "事件数量"],
        active_units
    )

    # 5. Weapon usage
    weapon_stats = execute_query(client, f"""
        SELECT weapon_system, COUNT(*) as usage_count
        FROM {CH_TABLE}
        WHERE weapon_system IS NOT NULL
        GROUP BY weapon_system
        ORDER BY usage_count DESC
        LIMIT 10
    """)

    print_chinese_report(
        "武器使用统计TOP10",
        ["武器系统", "使用次数"],
        weapon_stats
    )

    # 6. Hit and damage statistics
    hit_stats = execute_query(client, f"""
        SELECT
            COUNTIf(event_type = 'WEAPON_HIT') as hit_count,
            COUNTIf(event_type = 'WEAPON_MISS') as miss_count,
            SUMIf(damage_points, event_type = 'WEAPON_HIT') as total_damage,
            AVGIf(damage_points, event_type = 'WEAPON_HIT') as avg_damage
        FROM {CH_TABLE}
    """)

    hit_ratio = hit_stats[0][0] / (hit_stats[0][0] + hit_stats[0][1]) * 100 if (hit_stats[0][0] + hit_stats[0][1]) > 0 else 0

    print_chinese_report(
        "武器命中与伤害统计",
        ["命中次数", "未命中次数", "命中率(%)", "总伤害", "平均伤害"],
        [[hit_stats[0][0], hit_stats[0][1], f"{hit_ratio:.2f}", hit_stats[0][2] or 0, f"{(hit_stats[0][3] or 0):.2f}"]]
    )

    # 7. Timeline analysis: events per minute
    timeline_stats = execute_query(client, f"""
        SELECT
            toStartOfMinute(event_time) as minute,
            COUNT(*) as events_per_minute
        FROM {CH_TABLE}
        GROUP BY minute
        ORDER BY minute
    """)

    # Load the rows into a DataFrame to compute the summary statistics
    timeline_df = pd.DataFrame(timeline_stats, columns=['minute', 'events_per_minute'])

    print_chinese_report(
        "时间线分析 - 事件频率",
        ["统计项", "值"],
        [
            ["推演开始时间", timeline_df['minute'].min()],
            ["推演结束时间", timeline_df['minute'].max()],
            ["推演持续时间(分钟)", (timeline_df['minute'].max() - timeline_df['minute'].min()).total_seconds() / 60],
            ["每分钟平均事件数", timeline_df['events_per_minute'].mean()],
            ["每分钟最大事件数", timeline_df['events_per_minute'].max()],
            ["每分钟最小事件数", timeline_df['events_per_minute'].min()]
        ]
    )

    # 8. Target types
    target_stats = execute_query(client, f"""
        SELECT target_type, COUNT(*) as count
        FROM {CH_TABLE}
        WHERE target_type IS NOT NULL
        GROUP BY target_type
        ORDER BY count DESC
        LIMIT 10
    """)

    print_chinese_report(
        "目标类型分析TOP10",
        ["目标类型", "被攻击次数"],
        target_stats
    )

    # 9. Geographic distribution (0.1-degree grid cells)
    geo_stats = execute_query(client, f"""
        SELECT
            round(location_lat, 1) as lat_zone,
            round(location_lon, 1) as lon_zone,
            COUNT(*) as event_count
        FROM {CH_TABLE}
        GROUP BY lat_zone, lon_zone
        ORDER BY event_count DESC
        LIMIT 10
    """)

    print_chinese_report(
        "热点区域分析TOP10",
        ["纬度区域", "经度区域", "事件数量"],
        geo_stats
    )

    # 10. Analysis by phase (early: first 2 hours, middle: hours 2 to 4, late: the rest).
    # The phase is computed directly on the table. Joining the table to itself on event_time
    # would multiply each row by the number of events sharing its second and inflate every count.
    phase_analysis = execute_query(client, f"""
        SELECT
            CASE
                WHEN event_time < (SELECT MIN(event_time) FROM {CH_TABLE}) + INTERVAL 2 HOUR THEN '初期阶段'
                WHEN event_time < (SELECT MIN(event_time) FROM {CH_TABLE}) + INTERVAL 4 HOUR THEN '中期阶段'
                ELSE '后期阶段'
            END as phase,
            COUNT(*) as event_count,
            COUNTIf(event_type = 'WEAPON_LAUNCH') as weapon_launches,
            COUNTIf(event_type = 'WEAPON_HIT') as weapon_hits
        FROM {CH_TABLE}
        GROUP BY phase
        ORDER BY MIN(event_time)
    """)

    print_chinese_report(
        "推演阶段分析",
        ["阶段", "事件总数", "武器发射次数", "命中次数"],
        phase_analysis
    )

    # Extra analysis: hit rate per weapon system
    weapon_efficiency = execute_query(client, f"""
        SELECT
            weapon_system,
            COUNT(*) as total_uses,
            COUNTIf(event_type = 'WEAPON_HIT') as hits,
            COUNTIf(event_type = 'WEAPON_MISS') as misses,
            CASE
                WHEN COUNTIf(event_type IN ('WEAPON_HIT', 'WEAPON_MISS')) > 0
                THEN COUNTIf(event_type = 'WEAPON_HIT') * 100.0 / COUNTIf(event_type IN ('WEAPON_HIT', 'WEAPON_MISS'))
                ELSE 0
            END as hit_rate
        FROM {CH_TABLE}
        WHERE weapon_system IS NOT NULL
        GROUP BY weapon_system
        HAVING total_uses >= 10
        ORDER BY hit_rate DESC
        LIMIT 10
    """)

    print_chinese_report(
        "武器效率分析TOP10 (至少使用10次)",
        ["武器系统", "使用次数", "命中次数", "未命中次数", "命中率(%)"],
        weapon_efficiency
    )

def main():
    """Entry point."""
    print("CMO推演数据分析程序")
    print("开始连接ClickHouse数据库...")

    client = create_clickhouse_client()
    if client is None:
        return

    try:
        analyze_cmo_data(client)
        print("\n分析完成!")
    except Exception as e:
        print(f"分析过程中出现错误: {e}")
    finally:
        client.disconnect()

if __name__ == "__main__":
    # Install dependencies: pip install clickhouse-driver pandas tabulate matplotlib
    main()

Sample run

[root@localhost clickhouse-sample]# time python3 analyze_cmo_data.py
CMO推演数据分析程序
开始连接ClickHouse数据库...
成功连接到ClickHouse数据库
正在进行CMO推演数据分析...

推演场次UUID: 09cfee87-3b10-41cb-a995-9bb351944997
总事件数量: 1,000,000

============================================================
                         红蓝双方事件数量对比
============================================================
+--------+------------+-----------+
| 阵营   |   事件数量 |   占比(%) |
+========+============+===========+
| RED    |     500045 |   50.0045 |
+--------+------------+-----------+
| BLUE   |     499955 |   49.9955 |
+--------+------------+-----------+
============================================================

============================================================
                        事件类型分布TOP10
============================================================
+-------------------------+------------+-----------+
| 事件类型                |   发生次数 |   占比(%) |
+=========================+============+===========+
| WEAPON_HIT              |      83847 |    8.3847 |
+-------------------------+------------+-----------+
| DETECTION               |      83808 |    8.3808 |
+-------------------------+------------+-----------+
| LANDING                 |      83490 |     8.349 |
+-------------------------+------------+-----------+
| WEAPON_LAUNCH           |      83438 |    8.3438 |
+-------------------------+------------+-----------+
| COUNTERMEASURE_DEPLOYED |      83413 |    8.3413 |
+-------------------------+------------+-----------+
| FUEL_STATUS             |      83361 |    8.3361 |
+-------------------------+------------+-----------+
| WEAPON_MISS             |      83357 |    8.3357 |
+-------------------------+------------+-----------+
| TAKEOFF                 |      83227 |    8.3227 |
+-------------------------+------------+-----------+
| SENSOR_ACTIVATION       |      83173 |    8.3173 |
+-------------------------+------------+-----------+
| DAMAGE_ASSESSMENT       |      83071 |    8.3071 |
+-------------------------+------------+-----------+
============================================================

============================================================
                       最活跃的作战单位TOP10
============================================================
+----------------------------------+------------+
| 单位名称                         |   事件数量 |
+==================================+============+
| Red Kirov-class Battlecruiser #4 |       5406 |
+----------------------------------+------------+
| Red S-400 Triumf #2              |       5379 |
+----------------------------------+------------+
| Blue M1A2 Abrams #6              |       5367 |
+----------------------------------+------------+
| Red J-20 Mighty Dragon #10       |       5350 |
+----------------------------------+------------+
| Red F-22A Raptor #10             |       5348 |
+----------------------------------+------------+
| Blue Trident II D5 SLBM #11      |       5329 |
+----------------------------------+------------+
| Red Kirov-class Battlecruiser #1 |       5325 |
+----------------------------------+------------+
| Blue F-35 Lightning II #7        |       5323 |
+----------------------------------+------------+
| Blue F-35 Lightning II #9        |       5321 |
+----------------------------------+------------+
| Blue Patriot SAM #6              |       5319 |
+----------------------------------+------------+
============================================================

============================================================
                        武器使用统计TOP10
============================================================
+-------------------------+------------+
| 武器系统                |   使用次数 |
+=========================+============+
| R-73 Archer             |      22971 |
+-------------------------+------------+
| AIM-120D AMRAAM         |      22889 |
+-------------------------+------------+
| Torpedo MK-48           |      22856 |
+-------------------------+------------+
| Tomahawk Cruise Missile |      22856 |
+-------------------------+------------+
| Kalibr Cruise Missile   |      22829 |
+-------------------------+------------+
| R-77 Adder              |      22803 |
+-------------------------+------------+
| Torpedo 53-65           |      22779 |
+-------------------------+------------+
| PL-15                   |      22777 |
+-------------------------+------------+
| AIM-9X Sidewinder       |      22739 |
+-------------------------+------------+
| DF-21D ASBM             |      22611 |
+-------------------------+------------+
============================================================

============================================================
                         武器命中与伤害统计
============================================================
+------------+--------------+-------------+----------+------------+
|   命中次数 |   未命中次数 |   命中率(%) |   总伤害 |   平均伤害 |
+============+==============+=============+==========+============+
|      83847 |        83357 |       50.15 |  4606780 |      54.94 |
+------------+--------------+-------------+----------+------------+
============================================================

============================================================
                        时间线分析 - 事件频率
============================================================
+--------------------+---------------------+
| 统计项             | 值                  |
+====================+=====================+
| 推演开始时间       | 2025-09-20 10:24:00 |
+--------------------+---------------------+
| 推演结束时间       | 2025-09-20 16:24:00 |
+--------------------+---------------------+
| 推演持续时间(分钟) | 360.0               |
+--------------------+---------------------+
| 每分钟平均事件数   | 2770.083102493075   |
+--------------------+---------------------+
| 每分钟最大事件数   | 2915                |
+--------------------+---------------------+
| 每分钟最小事件数   | 1421                |
+--------------------+---------------------+
============================================================

============================================================
                        目标类型分析TOP10
============================================================
+----------------+--------------+
| 目标类型       |   被攻击次数 |
+================+==============+
| SUBMAIRNE      |        27956 |
+----------------+--------------+
| AIRCRAFT       |        27882 |
+----------------+--------------+
| RADAR_SITE     |        27872 |
+----------------+--------------+
| AIRBASE        |        27858 |
+----------------+--------------+
| INFRASTRUCTURE |        27856 |
+----------------+--------------+
| GROUND_VEHICLE |        27775 |
+----------------+--------------+
| WARSHIP        |        27759 |
+----------------+--------------+
| COMMAND_CENTER |        27728 |
+----------------+--------------+
| SAM_SITE       |        27589 |
+----------------+--------------+
============================================================

============================================================
                        热点区域分析TOP10
============================================================
+------------+------------+------------+
|   纬度区域 |   经度区域 |   事件数量 |
+============+============+============+
|       33.3 |      118.3 |        144 |
+------------+------------+------------+
|       32.5 |      119.6 |        140 |
+------------+------------+------------+
|       29.9 |      121.1 |        136 |
+------------+------------+------------+
|       26.4 |      121.7 |        136 |
+------------+------------+------------+
|       33.9 |      119.3 |        136 |
+------------+------------+------------+
|       31.1 |      116.7 |        135 |
+------------+------------+------------+
|       27.8 |      115.5 |        135 |
+------------+------------+------------+
|         33 |      122.5 |        134 |
+------------+------------+------------+
|       30.6 |      118.3 |        132 |
+------------+------------+------------+
|       26.1 |      122.2 |        131 |
+------------+------------+------------+
============================================================

============================================================
                           推演阶段分析
============================================================
+----------+------------+----------------+------------+
| 阶段     |   事件总数 |   武器发射次数 |   命中次数 |
+==========+============+================+============+
| 中期阶段 |   15747891 |        1322005 |    1322311 |
+----------+------------+----------------+------------+
| 初期阶段 |   15878750 |        1323617 |    1319577 |
+----------+------------+----------------+------------+
| 后期阶段 |   15673439 |        1304067 |    1323536 |
+----------+------------+----------------+------------+
============================================================

============================================================
                   武器效率分析TOP10 (至少使用10次)
============================================================
+-------------------------+------------+------------+--------------+-------------+
| 武器系统                |   使用次数 |   命中次数 |   未命中次数 |   命中率(%) |
+=========================+============+============+==============+=============+
| Kalibr Cruise Missile   |      22829 |       7817 |         7496 |     51.0481 |
+-------------------------+------------+------------+--------------+-------------+
| R-77 Adder              |      22803 |       7737 |         7518 |     50.7178 |
+-------------------------+------------+------------+--------------+-------------+
| PL-15                   |      22777 |       7648 |         7488 |     50.5285 |
+-------------------------+------------+------------+--------------+-------------+
| Standard Missile-6      |      22532 |       7566 |         7431 |     50.4501 |
+-------------------------+------------+------------+--------------+-------------+
| AIM-120D AMRAAM         |      22889 |       7732 |         7637 |     50.3091 |
+-------------------------+------------+------------+--------------+-------------+
| Torpedo MK-48           |      22856 |       7707 |         7637 |     50.2281 |
+-------------------------+------------+------------+--------------+-------------+
| DF-21D ASBM             |      22611 |       7536 |         7537 |     49.9967 |
+-------------------------+------------+------------+--------------+-------------+
| R-73 Archer             |      22971 |       7586 |         7640 |     49.8227 |
+-------------------------+------------+------------+--------------+-------------+
| Tomahawk Cruise Missile |      22856 |       7572 |         7677 |     49.6557 |
+-------------------------+------------+------------+--------------+-------------+
| Torpedo 53-65           |      22779 |       7561 |         7686 |     49.5901 |
+-------------------------+------------+------------+--------------+-------------+
============================================================

分析完成!

real    0m1.314s
user    0m7.131s
sys     0m0.079s

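As the output shows, the script computes ten statistics over 1,000,000 events in under 2 s (1.3 s of wall-clock time, which also covers Python startup and library imports).

Note that the phase table in this output totals 47,300,080 events, far more than the 1,000,000 rows in the table. The query that produced it joins cmo_events to itself on event_time, so every row is multiplied by the number of events that share its second. Computing the phase directly on the table, without the join, gives counts that add up to the row total.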
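
To illustrate why a self-join on a non-unique column such as event_time inflates counts (the phase table in the sample output totals about 47 million events for a table of 1,000,000 rows), the sketch below counts the rows an inner self-join produces. The timestamps are made up for the example; only the counting rule matters.

```python
from collections import Counter


def self_join_row_count(keys):
    """Rows produced by an inner self-join on `keys`: a key with n rows yields n * n pairs."""
    return sum(n * n for n in Counter(keys).values())


# Hypothetical toy data: 6 events, several of which share the same second.
event_seconds = ["10:24:30", "10:24:30", "10:24:30", "10:24:31", "10:24:31", "10:24:32"]
print(len(event_seconds), "rows before the join,", self_join_row_count(event_seconds), "rows after")
```

If 1,000,000 events are spread at random over a 6-hour run (21,600 seconds, about 46 events per second), the same rule predicts on the order of 47 million joined rows, which is in line with the totals in the phase table.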

References

Install ClickHouse using Docker | ClickHouse Docs

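Appendix 1: Interpreting the output of the Linux time command

The gap between the real, user, and sys times in the run above is normal. The three values measure different things:

What each time means

  1. real (wall-clock time): 0m1.314s

    • The elapsed time from the start of the program to its exit
    • Includes all waiting (I/O, network requests, scheduler delays, and so on)
  2. user (user CPU time): 0m7.131s

    • The total CPU time the program spent executing in user mode
    • For a multithreaded or multiprocess program, this is the sum over all threads and waited-for child processes
  3. sys (system CPU time): 0m0.079s

    • The CPU time the program spent executing in kernel mode
    • Covers system calls and other kernel services performed on the program's behalf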
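
The difference between wall-clock time and CPU time can also be observed from inside Python. The sketch below is a minimal illustration that uses only the standard library: time.perf_counter() plays the role of real, and time.process_time() reports the user plus sys CPU time of the current process. The helper name measure is introduced here for the example.

```python
import time


def measure(func, *args):
    """Return (result, wall_seconds, cpu_seconds) for one call of func."""
    wall_start = time.perf_counter()  # wall-clock time, comparable to `real`
    cpu_start = time.process_time()   # user + sys CPU time of this process
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


# Sleeping consumes wall-clock time but almost no CPU time.
_, wall, cpu = measure(time.sleep, 0.2)
print(f"sleep: wall={wall:.3f}s cpu={cpu:.3f}s")

# A busy loop consumes CPU time close to its wall-clock time.
_, wall, cpu = measure(lambda n: sum(i * i for i in range(n)), 200_000)
print(f"loop:  wall={wall:.3f}s cpu={cpu:.3f}s")
```

In a single-threaded program the CPU time cannot exceed the wall-clock time; it only does so when several threads or processes run on different cores at once.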

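Why is user time larger than real time?

Here user time (7.131s) is far larger than real time (1.314s). That is only possible when several threads or processes run on different CPU cores at the same time. Possible contributors:

  1. Multithreading on a multicore CPU: some component of the process ran several threads in parallel. The script's own code is single-threaded and clickhouse-driver executes queries synchronously, so the native libraries loaded by the imports (NumPy, pulled in by pandas and matplotlib) are the more likely source. This has not been verified here.

  2. Parallelism below the script's own code: although the script looks single-threaded, the libraries underneath it may parallelize some operations.

  3. CPU-bound work: importing large libraries and deserializing and formatting query results all consume CPU, which accumulates user time.

  4. Vectorized operations: pandas and NumPy rely on vectorized routines. Vectorization alone does not use multiple cores, but some numerical backends add their own thread pools.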

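Applied to this script

For the CMO analysis script:

  1. Query execution: the script sends several queries to ClickHouse, and the server may execute each of them in parallel. The server runs in its own container, however, so its CPU time does not appear in the time output of the Python client.

  2. Data processing: pandas is used only to summarize the per-minute timeline (a few hundred rows), so its share of the computation is small.

  3. Network I/O: waiting for query results adds to real time but not to user time, while CPU work in the client adds to both.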

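This is usually harmless

user time well above real time is not a problem in itself and is often a good sign, because it suggests that:

  1. The process made use of several CPU cores
  2. Some of the work ran in parallel
  3. CPU resources were actually used
  4. The run was not dominated by I/O waits

It does not prove that all of the parallel work was useful: thread pools that spin while idle also accumulate user time.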

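How to verify

To check whether parallel execution explains the numbers:

  1. Run the program with thread pools limited to one thread (for example, set the environment variable OMP_NUM_THREADS=1) and compare the user time
  2. Use a profiler such as cProfile to see where the time goes
  3. Check the documentation of the ClickHouse driver to see whether it uses multiple threads by default

In short, the gap between the times is normal and reflects CPU work spread across several cores.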
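
As a starting point for the profiling step, the sketch below wraps a function call in cProfile from the standard library and returns a short text report. The names profile_call and busy_work are hypothetical; in the analysis script, busy_work would be replaced by the function under investigation. Note that cProfile attributes time to Python function calls, so CPU time burned by native thread pools will not show up as separate entries.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=5):
    """Run func under cProfile; return (result, text report of the `top` costliest calls)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    report = io.StringIO()
    pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(top)
    return result, report.getvalue()


def busy_work(n):
    """Hypothetical stand-in for the code being profiled."""
    return sum(i * i for i in range(n))


result, report = profile_call(busy_work, 100_000)
print(report)
```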

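Appendix 2: Can ClickHouse handle writes from 100 concurrent wargame sessions?

Based on the test data above and on ClickHouse's architecture, the answer is yes, provided the following conditions are met:


1. Single-node performance
  • Measured: a single node ingested 1,000,000 events in 16.91 seconds (59,148 events/sec), with 851.8MiB of memory in use and 333MB on disk after the write.
  • Extrapolation
    If every session produces the same volume (1,000,000 events), 100 sessions total 100 million events. Written back to back at the measured rate, a single node would need about 28 minutes (1e8 events ÷ 59,148 events/sec ≈ 1,691 seconds, which equals 100 × 16.91 seconds).
    In practice, concurrent writers compete for resources, which can reduce throughput noticeably.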
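
The estimate can be reproduced with a few lines of arithmetic. The sketch below assumes the best case: the 100 sessions are written one after another at exactly the rate measured in the single-node test, with no slowdown from concurrency.

```python
EVENTS_PER_SESSION = 1_000_000
SESSIONS = 100
MEASURED_RATE = 59_148.66  # events/sec, taken from the single-node insert log


def sequential_load_seconds(events_per_session: int, sessions: int, rate: float) -> float:
    """Seconds needed to ingest all sessions back to back at a constant rate."""
    return events_per_session * sessions / rate


seconds = sequential_load_seconds(EVENTS_PER_SESSION, SESSIONS, MEASURED_RATE)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} min")
```

The result is about 1,691 seconds, or roughly 28 minutes.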

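2. Bottlenecks under concurrent writes
  • Disk I/O: MergeTree writes each insert sequentially, but 100 concurrent sessions can increase random I/O on the disk.
  • CPU: generating each event involves random numbers, UUIDs, and string operations, so the generator on the client side is CPU-bound.
  • Part pressure: MergeTree inserts do not take a table-wide write lock, but every INSERT creates a new data part. Many small concurrent inserts can create parts faster than background merges combine them, and ClickHouse then delays or rejects inserts ("Too many parts").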

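3. Distributed architecture (recommended)

The following architecture is suggested for 100 concurrent writers:
Write clients → Distributed table → MergeTree shard 1, MergeTree shard 2, ..., MergeTree shard N

Key configuration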

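sql
-- Create the Distributed table (the shards must be defined in the cluster configuration file)
-- The sharding key must evaluate to an integer, so the UUID is hashed
CREATE TABLE cmo_events_distributed AS cmo_events
ENGINE = Distributed('cluster_name', default, cmo_events, cityHash64(session_uuid))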

Python code change

python
# Point the insert statement at the Distributed table
insert_query = "INSERT INTO cmo_events_distributed (...) VALUES"

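Advantages

  • Scalability: adding shards raises the total write throughput, roughly linearly as long as clients and the network keep up
  • Load balancing: rows are distributed across nodes by the hash of session_uuid
  • Fault tolerance: with replicas configured for each shard, the failure of a single node does not stop writes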

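II. Does Tencent Cloud ClickHouse support online scale-out?

The service is described as supporting it, with the following characteristics:


1. Online scale-out
  • No downtime: node groups are added through the console or the API while the cluster keeps running
  • Data on new nodes: open-source ClickHouse does not move existing data to new shards automatically, and ReplicatedMergeTree copies data only between replicas of the same shard. Whether the managed service rebalances existing data should be checked in its documentation
  • Cluster topology is updated: new nodes take part in distributed queries and writes once the scale-out completes

2. Scale-out steps (Tencent Cloud console)

Log in to the console → select the cluster → click Scale out → select the node group → set the new node count → confirm the configuration → scale-out completes automatically


3. Expected gains after scale-out
  • Write throughput: grows approximately linearly in the best case (the single-node test above measured about 59,000 events/sec for one node)
  • Query performance: distributed queries run in parallel across shards
  • Storage capacity: total storage grows with the number of nodes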

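III. Suggested production configuration

| Dimension | Single-node setup | Tencent Cloud distributed setup |
|---|---|---|
| Node count | 1 | ≥3 |
| Data layout | Single table | Distributed table + ReplicatedMergeTree |
| Scaling | Manual data migration | Scale out from the console |
| Write load | ≤100,000 events/sec | Scales on demand to millions of events/sec |
| Data consistency | Local MergeTree | Coordinated through ZooKeeper |
| Operational complexity | | Managed by the cloud service |
| Use case | Single scenario / small-scale tests | Concurrent sessions / production |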

IV. Additional performance tuning

  1. Larger insert batches

    python
    # Increase the batch size (try 10,000 to 50,000 rows per batch)
    batch_size = 50000  # adjust to the available memory
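  2. Compression

    sql
    -- LZ4 is already the default codec, so no table setting is needed to enable it.
    -- A different codec can be set per column; the column type shown here is assumed.
    CREATE TABLE cmo_events (
        ...
        description String CODEC(ZSTD(3))
    ) ENGINE = MergeTree()
    ORDER BY ...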
  3. Partitioning

    sql
    -- Partition by session_uuid (suits per-session queries; keep the number of partitions small)
    ENGINE = MergeTree()
    PARTITION BY session_uuid
    ORDER BY (session_uuid, event_time)
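
The batching suggestion in item 1 can be sketched as a small helper that cuts any stream of rows into fixed-size batches and hands each batch to an insert function. This is a minimal sketch: iter_batches and insert_in_batches are names introduced here, and the execute argument is assumed to behave like Client.execute(query, rows) from clickhouse-driver.

```python
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of up to batch_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def insert_in_batches(execute, insert_query, rows, batch_size=50_000):
    """Send rows to execute(insert_query, batch) one batch at a time; return the batch count."""
    batch_count = 0
    for batch in iter_batches(rows, batch_size):
        execute(insert_query, batch)
        batch_count += 1
    return batch_count


# Dry run with a stand-in for Client.execute that only records the batch sizes.
sizes = []
insert_in_batches(lambda query, batch: sizes.append(len(batch)),
                  "INSERT INTO cmo_events (...) VALUES", range(120_000))
print(sizes)
```

Fewer, larger inserts also create fewer data parts, which reduces the merge work the server has to do in the background.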

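V. Summary

  1. Self-hosted ClickHouse: can support 100 concurrent sessions through the Distributed engine, but the cluster must be deployed manually
  2. Tencent Cloud ClickHouse
    • Supports online scale-out (from the console)
    • A Distributed + ReplicatedMergeTree architecture is recommended
    • Sessions are spread across shards by session_uuid

🚀 Sizing suggestion: a single Tencent Cloud ClickHouse node is claimed to ingest 80,000 to 100,000 events/sec (depending on data complexity), so 100 concurrent sessions could be served by a deployment of 5 to 10 nodes. These figures were not measured in the test above.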

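Appendix 3: Sharding

The Distributed table engine is the core of distributed queries in ClickHouse. It is a logical table that stores no data of its own and acts as a proxy layer over the underlying shard tables. It is used to:

  1. Distribute each write across the shard nodes
  2. Send each read to all shard nodes and merge their results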

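I. What a Distributed table is

1. It routes data and does not store it
  • The cmo_events_distributed table itself holds no data

  • It only tells ClickHouse:

    "When a write arrives, forward each row to the cmo_events table on the shard selected by the sharding key"

2. You define the sharding rule
  • The sharding key is an expression of your choice (for example a hash, a random value, or a fixed mapping)
  • Example: distribute by the hash of session_uuid
sql
CREATE TABLE cmo_events_distributed (
    ...
) ENGINE = Distributed('cluster_name', default, cmo_events, cityHash64(session_uuid))
  • The Distributed table uses the hash of session_uuid to send each row to the local cmo_events table on the corresponding node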
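
The routing rule "hash of the sharding key, modulo the number of shards" can be sketched in plain Python. This is an illustration only: it uses BLAKE2b from hashlib as the hash, not ClickHouse's own hash functions, and it ignores shard weights. The function name shard_for is introduced here for the example.

```python
import hashlib
import uuid


def shard_for(session_uuid: uuid.UUID, shard_count: int) -> int:
    """Map a session UUID to a shard index via a 64-bit hash modulo the shard count.

    Illustrative only: BLAKE2b stands in for the hash that ClickHouse would compute.
    """
    digest = hashlib.blake2b(session_uuid.bytes, digest_size=8).digest()
    return int.from_bytes(digest, "big") % shard_count


session = uuid.UUID("09cfee87-3b10-41cb-a995-9bb351944997")
print("session", session, "-> shard", shard_for(session, 4))
```

Because the shard index depends only on session_uuid, all events of one session land on the same shard, while different sessions spread across the shards.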

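II. Distributed table vs. local table

| Aspect | Local table (MergeTree) | Distributed table |
|---|---|---|
| Stores data | Yes (on a single node) | No |
| Writes | Written locally | Routed to the shards automatically |
| Reads | Local data only | Results merged from all shards |
| Fault tolerance | Data is lost if the single node fails | Reads all shards; relies on Replicated tables for redundancy |
| Operational complexity | | Requires cluster configuration (or a cloud service) |
| Typical use | Data storage | Entry point for distributed writes and queries |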

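III. How the tables are created

1. Local table (actual storage)
sql
-- Create on every node
CREATE TABLE cmo_events (
    event_id UUID,
    session_uuid UUID,
    event_time DateTime,
    ...
) ENGINE = MergeTree()
ORDER BY (session_uuid, event_time)
2. Distributed table (logical proxy)
sql
-- Create on any node of the cluster that clients will connect to
-- The sharding key must evaluate to an integer, so the UUID is hashed
CREATE TABLE cmo_events_distributed (
    event_id UUID,
    session_uuid UUID,
    event_time DateTime,
    ...
) ENGINE = Distributed('cluster_name', default, cmo_events, cityHash64(session_uuid))
  • What the Distributed table does:
    • Writes: sends each row to the cmo_events table on the node selected by the hash of session_uuid
    • Queries: pulls data from the cmo_events tables on all nodes and merges it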

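IV. How a Distributed table works

1. Writes are routed automatically
python
# Application code connects to any node and writes to the Distributed table
client.execute("INSERT INTO cmo_events_distributed (...) VALUES", batch_data)
  • ClickHouse then:
    1. Evaluates the sharding key for each row (here, a hash of session_uuid)
    2. Uses the shard layout of cluster_name to decide which shard the row belongs to
    3. Writes the row to the local cmo_events table on that node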
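
The three steps above can be sketched in Python. ClickHouse documents the rule as: take the remainder of the sharding expression divided by the total shard weight, then pick the shard whose half-open interval contains that remainder. The sketch below follows that rule; `stand_in_hash64` is a hypothetical replacement for cityHash64 (which is not in the Python standard library), so the shard numbers it yields will not match a real cluster.

```python
import hashlib
import uuid
from bisect import bisect_right
from itertools import accumulate

def stand_in_hash64(value: uuid.UUID) -> int:
    """Stand-in for cityHash64: any stable 64-bit hash illustrates the idea."""
    digest = hashlib.sha256(value.bytes).digest()
    return int.from_bytes(digest[:8], "big")

def pick_shard(sharding_value: int, weights: list[int]) -> int:
    """Return the 0-based shard index for a sharding-key value.

    With weights [9, 10] the first shard owns remainders [0, 9)
    and the second owns [9, 19).
    """
    bounds = list(accumulate(weights))          # cumulative upper bounds, e.g. [9, 19]
    remainder = sharding_value % bounds[-1]     # bounds[-1] is the total weight
    return bisect_right(bounds, remainder)

session = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(pick_shard(stand_in_hash64(session), [1, 1]))
```

Because the key is derived from session_uuid alone, every event of one session lands on the same shard, which is what keeps per-session queries local to a single node's data.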
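2. Reads are merged automatically
sql
-- Querying the Distributed table makes ClickHouse pull results from every shard
SELECT COUNT(*) FROM cmo_events_distributed WHERE session_uuid = 'xxx'
  • ClickHouse will:
    1. Send the query to every shard node
    2. Merge the partial results on the coordinating (initiator) node
    3. Return the final result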
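
The read path is a scatter-gather: each shard counts its own matching rows and the initiator adds the partial counts. The toy model below uses in-memory lists in place of shards; the data and function names are illustrative only and nothing here talks to ClickHouse.

```python
# Toy stand-in for two shards: each list holds the session id of its local rows.
# Sessions are sharded by id, so all rows of one session sit on a single shard.
SHARDS = [
    ["s1", "s1", "s2"],        # shard 0
    ["s3", "s3", "s3"],        # shard 1
]

def shard_partial_count(rows, session_id):
    """What each shard does: filter and count only its local rows."""
    return sum(1 for sid in rows if sid == session_id)

def distributed_count(shards, session_id):
    """What the initiator does: fan the query out, then merge the partial results."""
    partials = [shard_partial_count(rows, session_id) for rows in shards]
    return sum(partials)

print(distributed_count(SHARDS, "s3"))
```

In this model shard 0 still receives the query even though it holds no rows for "s3". ClickHouse has an `optimize_skip_unused_shards` setting intended to avoid that when the WHERE clause constrains the sharding key; whether it helps for a given schema is worth verifying on the actual cluster.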

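V. What Tencent Cloud ClickHouse wraps for you

1. Automatic shard configuration
  • Tencent Cloud maintains clusters.xml for you; you only need to:
    • Create the cluster in the console
    • Choose a sharding rule (for example, hash sharding by session_uuid)
2. Virtual IP endpoint
  • Tencent Cloud provides a single access point (for example, cluster.clickhouse.tencent.com)
  • You connect to this virtual IP only, and can then:
    • Write to the Distributed table (rows are routed automatically)
    • Query the Distributed table (results are merged automatically)
3. No need to care about the underlying nodes
  • You only need to create the Distributed table on any node
  • Tencent Cloud maintains connections to the shard nodes and handles failover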

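VI. Using a Distributed table: an example

1. Configure the cluster (done automatically on Tencent Cloud)
xml
<!-- config.d/clusters.xml -->
<clickhouse>
    <remote_servers>
        <!-- The element name is the cluster name used in Distributed('cluster_name', ...) -->
        <cluster_name>
            <shard>
                <replica><host>clickhouse-01</host><port>9000</port></replica>
            </shard>
            <shard>
                <replica><host>clickhouse-02</host><port>9000</port></replica>
            </shard>
        </cluster_name>
    </remote_servers>
</clickhouse>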
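
Before loading a cluster file it can help to check that it parses and that the cluster name matches the one used in the Distributed engine. The sketch below does that with the standard library only. It assumes the cluster definition sits under `<remote_servers>`, which is where ClickHouse reads it from; `list_shards` is a hypothetical helper, not a ClickHouse tool.

```python
import xml.etree.ElementTree as ET

CONFIG = """
<clickhouse>
  <remote_servers>
    <cluster_name>
      <shard><replica><host>clickhouse-01</host><port>9000</port></replica></shard>
      <shard><replica><host>clickhouse-02</host><port>9000</port></replica></shard>
    </cluster_name>
  </remote_servers>
</clickhouse>
"""

def list_shards(xml_text: str, cluster: str) -> list[list[str]]:
    """Return one list of 'host:port' replicas per shard of the named cluster."""
    root = ET.fromstring(xml_text.strip())
    node = root.find(f"remote_servers/{cluster}")
    if node is None:
        raise KeyError(f"cluster {cluster!r} not found under <remote_servers>")
    return [
        [f"{r.findtext('host')}:{r.findtext('port', default='9000')}"
         for r in shard.findall("replica")]
        for shard in node.findall("shard")
    ]

for number, replicas in enumerate(list_shards(CONFIG, "cluster_name"), start=1):
    print(number, replicas)
```

On a running server, the `system.clusters` table shows the layout that ClickHouse actually loaded, which is the more authoritative check.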
2. Create the Distributed table
sql
CREATE TABLE cmo_events_distributed (
    event_id UUID,
    session_uuid UUID,
    event_time DateTime,
    ...
) ENGINE = Distributed('cluster_name', default, cmo_events, cityHash64(session_uuid))
3. Writing from application code
python
# Client is assumed to be clickhouse_driver.Client (native protocol, port 9000)
from clickhouse_driver import Client

# Connect to any node and write to the Distributed table
client = Client(host='cluster.clickhouse.tencent.com', port=9000, user='default', password='xxx')
client.execute("INSERT INTO cmo_events_distributed (...) VALUES", batch_data)
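
ClickHouse generally performs better with a few large inserts than with many small ones, so the write call above is usually wrapped in a batching loop. The sketch below shows one way to do that. It assumes a client object with the same `execute(sql, rows)` shape as the snippet above; the three column names come from the table definition, the columns elided with "..." are left out, and `RecordingClient` is a hypothetical test double so the example runs without a server.

```python
from datetime import datetime
from itertools import islice
from uuid import uuid4

INSERT_SQL = (
    "INSERT INTO cmo_events_distributed (event_id, session_uuid, event_time) VALUES"
)

def chunked(rows, size):
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

def insert_events(client, rows, batch_size=10_000):
    """Send rows in batches; return the number of INSERT statements issued."""
    sent = 0
    for batch in chunked(rows, batch_size):
        client.execute(INSERT_SQL, batch)
        sent += 1
    return sent

class RecordingClient:
    """Test double that records calls instead of contacting a server."""
    def __init__(self):
        self.calls = []

    def execute(self, sql, params=None):
        self.calls.append((sql, params))

session = uuid4()
rows = [(uuid4(), session, datetime(2025, 1, 1, 0, 0, i)) for i in range(25)]
demo = RecordingClient()
print(insert_events(demo, rows, batch_size=10), [len(p) for _, p in demo.calls])
```

The default batch size of 10,000 is an arbitrary starting point, not a recommendation from the article; tune it against the actual row width and latency budget.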
4. Querying from application code
python
# Querying the Distributed table merges data from all shards automatically
result = client.execute("SELECT COUNT(*) FROM cmo_events_distributed WHERE session_uuid = 'xxx'")
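
In real code the session id would normally be passed as a query parameter rather than pasted into the SQL string. The sketch below assumes the clickhouse-driver convention of `%(name)s` placeholders with a dict of values; `count_session_events` and `FakeClient` are hypothetical names, and the fake client exists only so the example runs without a server.

```python
from uuid import UUID

COUNT_SQL = (
    "SELECT count() FROM cmo_events_distributed "
    "WHERE session_uuid = %(session_uuid)s"
)

def count_session_events(client, session_uuid: UUID) -> int:
    """Return the number of events stored for one session."""
    # The UUID is sent as a string; the driver quotes and escapes it.
    rows = client.execute(COUNT_SQL, {"session_uuid": str(session_uuid)})
    return rows[0][0]  # a single row with a single column

class FakeClient:
    """Test double that returns a canned result set."""
    def execute(self, sql, params=None):
        self.last_call = (sql, params)
        return [(42,)]

fake = FakeClient()
print(count_session_events(fake, UUID("12345678-1234-5678-1234-567812345678")))
```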

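VII. Distributed table vs. Redis Cluster

| Feature | Redis Cluster | ClickHouse Distributed table |
|---|---|---|
| Data sharding | Automatic (transparent to the client) | Requires explicitly creating a Distributed table |
| Write transparency | Client routes automatically | Writes must go to the Distributed table |
| Query transparency | Client routes automatically | Queries on the Distributed table are merged automatically |
| Fault tolerance | Automatic retries | Needs to be combined with ReplicatedMergeTree |
| Operations | Fully transparent | Sharding rules must be configured |
| Scale-out awareness | Client detects it automatically | Cluster must be reconfigured (or handled by Tencent Cloud) |
| Typical workload | High-concurrency key-value reads and writes | Large-scale OLAP analytics |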

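VIII. Summary

  1. What a Distributed table is

    • It stores no data
    • It is only a proxy layer over the shard-local tables
    • Writes are routed automatically and reads are merged automatically
  2. How to use it

    • Write to the Distributed table (for example, cmo_events_distributed)
    • Query the Distributed table (all shards are merged automatically)
  3. What Tencent Cloud ClickHouse simplifies

    • Sharding rules are maintained automatically
    • A virtual IP access point is provided
    • Application code does not need to know about the underlying nodes

🚀 Key points

  • Connect to the cluster entry point (such as the domain name provided by Tencent Cloud) and write to the Distributed table to get distributed writes
  • Read from the Distributed table to get the complete data set (no manual merging required)
  • The _distributed suffix on the table name is only a naming convention, not a requirement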