StarRocks (2.5.1) vs. ClickHouse (21.7.3.14) Cluster SSB Performance Test

1. Hardware Specifications

| Item | Specification |
|------|-------------------------------------------|
| Machines | 6 servers |
| CPU | Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz |
| Memory | 238 GB |
| Network bandwidth | 10 Gbps |
| Disk | HDD |

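2. Software Environment

StarRocks and ClickHouse were deployed on machines with identical configurations, and each was started and tested separately.

StarRocks deployment: 6 BE nodes and 1 FE node

ClickHouse deployment: 6 nodes, with distributed tables created on top of them

Kernel version: 3.10.0-693.el7.x86_64

Operating system: CentOS 7

Software versions: StarRocks 2.5.1, ClickHouse 21.7.3.14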

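3. Test Data and Results

Test data (the same dataset is loaded into both systems)

| Table | Rows | Description |
|----------------|------|-----------|
| lineorder | 600 million | SSB line order table |
| customer | 3 million | SSB customer table |
| part | 1.4 million | SSB part table |
| supplier | 200 thousand | SSB supplier table |
| dates | 2,556 | Date table |
| lineorder_flat | 600 million | Flattened SSB wide table |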

3.1 Table creation SQL

3.1.1 StarRocks table creation SQL

use ssb;

CREATE TABLE `lineorder` (
  `lo_orderkey` int(11) NOT NULL COMMENT "",
  `lo_linenumber` int(11) NOT NULL COMMENT "",
  `lo_custkey` int(11) NOT NULL COMMENT "",
  `lo_partkey` int(11) NOT NULL COMMENT "",
  `lo_suppkey` int(11) NOT NULL COMMENT "",
  `lo_orderdate` int(11) NOT NULL COMMENT "",
  `lo_orderpriority` varchar(16) NOT NULL COMMENT "",
  `lo_shippriority` int(11) NOT NULL COMMENT "",
  `lo_quantity` int(11) NOT NULL COMMENT "",
  `lo_extendedprice` int(11) NOT NULL COMMENT "",
  `lo_ordtotalprice` int(11) NOT NULL COMMENT "",
  `lo_discount` int(11) NOT NULL COMMENT "",
  `lo_revenue` int(11) NOT NULL COMMENT "",
  `lo_supplycost` int(11) NOT NULL COMMENT "",
  `lo_tax` int(11) NOT NULL COMMENT "",
  `lo_commitdate` int(11) NOT NULL COMMENT "",
  `lo_shipmode` varchar(11) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`lo_orderkey`)
COMMENT "OLAP"
PARTITION BY RANGE(`lo_orderdate`)
(PARTITION p1 VALUES [("-2147483648"), ("19930101")),
PARTITION p2 VALUES [("19930101"), ("19940101")),
PARTITION p3 VALUES [("19940101"), ("19950101")),
PARTITION p4 VALUES [("19950101"), ("19960101")),
PARTITION p5 VALUES [("19960101"), ("19970101")),
PARTITION p6 VALUES [("19970101"), ("19980101")),
PARTITION p7 VALUES [("19980101"), ("19990101")))
DISTRIBUTED BY HASH(`lo_orderkey`) BUCKETS 48
PROPERTIES (
    "replication_num" = "1"
);


CREATE TABLE IF NOT EXISTS `customer` (
  `c_custkey` int(11) NOT NULL COMMENT "",
  `c_name` varchar(26) NOT NULL COMMENT "",
  `c_address` varchar(41) NOT NULL COMMENT "",
  `c_city` varchar(11) NOT NULL COMMENT "",
  `c_nation` varchar(16) NOT NULL COMMENT "",
  `c_region` varchar(13) NOT NULL COMMENT "",
  `c_phone` varchar(16) NOT NULL COMMENT "",
  `c_mktsegment` varchar(11) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`c_custkey`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`c_custkey`) BUCKETS 12
PROPERTIES (
    "replication_num" = "1"
);

CREATE TABLE IF NOT EXISTS `dates` (
  `d_datekey` int(11) NOT NULL COMMENT "",
  `d_date` varchar(20) NOT NULL COMMENT "",
  `d_dayofweek` varchar(10) NOT NULL COMMENT "",
  `d_month` varchar(11) NOT NULL COMMENT "",
  `d_year` int(11) NOT NULL COMMENT "",
  `d_yearmonthnum` int(11) NOT NULL COMMENT "",
  `d_yearmonth` varchar(9) NOT NULL COMMENT "",
  `d_daynuminweek` int(11) NOT NULL COMMENT "",
  `d_daynuminmonth` int(11) NOT NULL COMMENT "",
  `d_daynuminyear` int(11) NOT NULL COMMENT "",
  `d_monthnuminyear` int(11) NOT NULL COMMENT "",
  `d_weeknuminyear` int(11) NOT NULL COMMENT "",
  `d_sellingseason` varchar(14) NOT NULL COMMENT "",
  `d_lastdayinweekfl` int(11) NOT NULL COMMENT "",
  `d_lastdayinmonthfl` int(11) NOT NULL COMMENT "",
  `d_holidayfl` int(11) NOT NULL COMMENT "",
  `d_weekdayfl` int(11) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`d_datekey`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`d_datekey`) BUCKETS 1
PROPERTIES (
    "replication_num" = "1"
);

CREATE TABLE IF NOT EXISTS `supplier` (
  `s_suppkey` int(11) NOT NULL COMMENT "",
  `s_name` varchar(26) NOT NULL COMMENT "",
  `s_address` varchar(26) NOT NULL COMMENT "",
  `s_city` varchar(11) NOT NULL COMMENT "",
  `s_nation` varchar(16) NOT NULL COMMENT "",
  `s_region` varchar(13) NOT NULL COMMENT "",
  `s_phone` varchar(16) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`s_suppkey`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`s_suppkey`) BUCKETS 12
PROPERTIES (
    "replication_num" = "1"
);

CREATE TABLE IF NOT EXISTS `part` (
  `p_partkey` int(11) NOT NULL COMMENT "",
  `p_name` varchar(23) NOT NULL COMMENT "",
  `p_mfgr` varchar(7) NOT NULL COMMENT "",
  `p_category` varchar(8) NOT NULL COMMENT "",
  `p_brand` varchar(10) NOT NULL COMMENT "",
  `p_color` varchar(12) NOT NULL COMMENT "",
  `p_type` varchar(26) NOT NULL COMMENT "",
  `p_size` int(11) NOT NULL COMMENT "",
  `p_container` varchar(11) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`p_partkey`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`p_partkey`) BUCKETS 12
PROPERTIES (
    "replication_num" = "1"
);

CREATE TABLE `lineorder_flat` (
  `LO_ORDERDATE` date NOT NULL COMMENT "",
  `LO_ORDERKEY` int(11) NOT NULL COMMENT "",
  `LO_LINENUMBER` tinyint(4) NOT NULL COMMENT "",
  `LO_CUSTKEY` int(11) NOT NULL COMMENT "",
  `LO_PARTKEY` int(11) NOT NULL COMMENT "",
  `LO_SUPPKEY` int(11) NOT NULL COMMENT "",
  `LO_ORDERPRIORITY` varchar(100) NOT NULL COMMENT "",
  `LO_SHIPPRIORITY` tinyint(4) NOT NULL COMMENT "",
  `LO_QUANTITY` tinyint(4) NOT NULL COMMENT "",
  `LO_EXTENDEDPRICE` int(11) NOT NULL COMMENT "",
  `LO_ORDTOTALPRICE` int(11) NOT NULL COMMENT "",
  `LO_DISCOUNT` tinyint(4) NOT NULL COMMENT "",
  `LO_REVENUE` int(11) NOT NULL COMMENT "",
  `LO_SUPPLYCOST` int(11) NOT NULL COMMENT "",
  `LO_TAX` tinyint(4) NOT NULL COMMENT "",
  `LO_COMMITDATE` date NOT NULL COMMENT "",
  `LO_SHIPMODE` varchar(100) NOT NULL COMMENT "",
  `C_NAME` varchar(100) NOT NULL COMMENT "",
  `C_ADDRESS` varchar(100) NOT NULL COMMENT "",
  `C_CITY` varchar(100) NOT NULL COMMENT "",
  `C_NATION` varchar(100) NOT NULL COMMENT "",
  `C_REGION` varchar(100) NOT NULL COMMENT "",
  `C_PHONE` varchar(100) NOT NULL COMMENT "",
  `C_MKTSEGMENT` varchar(100) NOT NULL COMMENT "",
  `S_NAME` varchar(100) NOT NULL COMMENT "",
  `S_ADDRESS` varchar(100) NOT NULL COMMENT "",
  `S_CITY` varchar(100) NOT NULL COMMENT "",
  `S_NATION` varchar(100) NOT NULL COMMENT "",
  `S_REGION` varchar(100) NOT NULL COMMENT "",
  `S_PHONE` varchar(100) NOT NULL COMMENT "",
  `P_NAME` varchar(100) NOT NULL COMMENT "",
  `P_MFGR` varchar(100) NOT NULL COMMENT "",
  `P_CATEGORY` varchar(100) NOT NULL COMMENT "",
  `P_BRAND` varchar(100) NOT NULL COMMENT "",
  `P_COLOR` varchar(100) NOT NULL COMMENT "",
  `P_TYPE` varchar(100) NOT NULL COMMENT "",
  `P_SIZE` tinyint(4) NOT NULL COMMENT "",
  `P_CONTAINER` varchar(100) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`LO_ORDERDATE`, `LO_ORDERKEY`)
COMMENT "OLAP"
PARTITION BY RANGE(`LO_ORDERDATE`)
(START ("1992-01-01") END ("1999-01-01") EVERY (INTERVAL 1 YEAR))
DISTRIBUTED BY HASH(`LO_ORDERKEY`) BUCKETS 48
PROPERTIES (
    "replication_num" = "1"
);

3.1.2 ClickHouse table creation SQL

create database ssb ON CLUSTER center_cluster;

show databases;
-- customer local table
CREATE  TABLE ssb.customer_local ON CLUSTER center_cluster
(
    CCUSTKEY       UInt32,
    CNAME          String,
    CADDRESS       String,
    CCITY          LowCardinality(String),
    CNATION        LowCardinality(String),
    CREGION        LowCardinality(String),
    CPHONE         String,
    CMKTSEGMENT    LowCardinality(String)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/customer_local/{shard}/replicate', '{replica}')
ORDER BY (CCUSTKEY)
SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192;
-- customer distributed table
CREATE TABLE ssb.customer ON CLUSTER center_cluster AS ssb.customer_local engine = Distributed(center_cluster, ssb, customer_local, rand());

-- lineorder local table
CREATE  TABLE ssb.lineorder_local ON CLUSTER center_cluster
(
    LOORDERKEY             UInt32,
    LOLINENUMBER           UInt8,
    LOCUSTKEY              UInt32,
    LOPARTKEY              UInt32,
    LOSUPPKEY              UInt32,
    LOORDERDATE            Date,
    LOORDERPRIORITY        LowCardinality(String),
    LOSHIPPRIORITY         UInt8,
    LOQUANTITY             UInt8,
    LOEXTENDEDPRICE        UInt32,
    LOORDTOTALPRICE        UInt32,
    LODISCOUNT             UInt8,
    LOREVENUE              UInt32,
    LOSUPPLYCOST           UInt32,
    LOTAX                  UInt8,
    LOCOMMITDATE           Date,
    LOSHIPMODE             LowCardinality(String)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/lineorder_local/{shard}/replicate', '{replica}')
PARTITION BY toYear(LOORDERDATE) ORDER BY (LOORDERDATE, LOORDERKEY)
SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192;

-- lineorder distributed table
CREATE TABLE ssb.lineorder ON CLUSTER center_cluster AS ssb.lineorder_local engine = Distributed(center_cluster, ssb, lineorder_local, rand());

-- part local table
CREATE  TABLE ssb.part_local ON CLUSTER center_cluster
(
    PPARTKEY       UInt32,
    PNAME          String,
    PMFGR          LowCardinality(String),
    PCATEGORY      LowCardinality(String),
    PBRAND         LowCardinality(String),
    PCOLOR         LowCardinality(String),
    PTYPE          LowCardinality(String),
    PSIZE          UInt8,
    PCONTAINER     LowCardinality(String)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/part_local/{shard}/replicate', '{replica}')
ORDER BY PPARTKEY
SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192;

-- part distributed table
CREATE TABLE ssb.part ON CLUSTER center_cluster AS ssb.part_local engine = Distributed(center_cluster, ssb, part_local, rand());

-- supplier local table
CREATE  TABLE ssb.supplier_local ON CLUSTER center_cluster
(
    SSUPPKEY       UInt32,
    SNAME          String,
    SADDRESS       String,
    SCITY          LowCardinality(String),
    SNATION        LowCardinality(String),
    SREGION        LowCardinality(String),
    SPHONE         String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/supplier_local/{shard}/replicate', '{replica}')
ORDER BY SSUPPKEY
SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192;

-- supplier distributed table
CREATE TABLE ssb.supplier ON CLUSTER center_cluster AS ssb.supplier_local engine = Distributed(center_cluster, ssb, supplier_local, rand());

-- Load the data (these are shell commands, run outside the SQL client)
clickhouse-client -h p2hadoop074 --format_csv_delimiter="," --query "INSERT INTO ssb.customer FORMAT CSV" < customer.tbl
clickhouse-client -h p2hadoop074 --format_csv_delimiter="," --query "INSERT INTO ssb.part FORMAT CSV" < part.tbl
clickhouse-client -h p2hadoop074 --format_csv_delimiter="," --query "INSERT INTO ssb.supplier FORMAT CSV" < supplier.tbl
clickhouse-client -h p2hadoop074 --format_csv_delimiter="," --query "INSERT INTO ssb.lineorder FORMAT CSV" < lineorder.tbl

-- Check the row counts
SELECT COUNT(*) FROM ssb.lineorder; -- 600037902
SELECT COUNT(*) FROM ssb.customer;  -- 3000000
SELECT COUNT(*) FROM ssb.part;      -- 1400000
SELECT COUNT(*) FROM ssb.supplier;  -- 200000

-- lineorderflat local table
CREATE TABLE ssb.lineorderflat_local  ON CLUSTER center_cluster(
LOORDERKEY UInt32,
LOLINENUMBER UInt8,
LOCUSTKEY UInt32 ,
LOPARTKEY UInt32 ,
LOSUPPKEY UInt32 ,
LOORDERDATE Date ,
LOORDERPRIORITY String ,
LOSHIPPRIORITY UInt8,
LOQUANTITY UInt8,
LOEXTENDEDPRICE UInt32 ,
LOORDTOTALPRICE UInt32 ,
LODISCOUNT UInt8,
LOREVENUE UInt32 ,
LOSUPPLYCOST UInt32 ,
LOTAX UInt32 ,
LOCOMMITDATE Date ,
LOSHIPMODE String ,
CNAME String ,
CADDRESS String ,
CCITY String ,
CNATION String ,
CREGION String ,
CPHONE String ,
CMKTSEGMENT String ,
SNAME String ,
SADDRESS String ,
SCITY String ,
SNATION String ,
SREGION String ,
SPHONE String ,
PNAME String ,
PMFGR String ,
PCATEGORY String ,
PBRAND String ,
PCOLOR String ,
PTYPE String ,
PSIZE UInt8,
PCONTAINER String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/lineorderflat_local/{shard}/replicate', '{replica}')
PARTITION BY toYear(LOORDERDATE)
ORDER BY (LOORDERDATE, LOORDERKEY)
SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192;

-- lineorderflat distributed table
CREATE TABLE ssb.lineorderflat ON CLUSTER center_cluster AS ssb.lineorderflat_local engine = Distributed(center_cluster, ssb, lineorderflat_local, rand());


-- Populate the wide table
INSERT INTO ssb.lineorderflat SELECT
    l.LOORDERKEY AS LOORDERKEY,
    l.LOLINENUMBER AS LOLINENUMBER,
    l.LOCUSTKEY AS LOCUSTKEY,
    l.LOPARTKEY AS LOPARTKEY,
    l.LOSUPPKEY AS LOSUPPKEY,
    l.LOORDERDATE AS LOORDERDATE,
    l.LOORDERPRIORITY AS LOORDERPRIORITY,
    l.LOSHIPPRIORITY AS LOSHIPPRIORITY,
    l.LOQUANTITY AS LOQUANTITY,
    l.LOEXTENDEDPRICE AS LOEXTENDEDPRICE,
    l.LOORDTOTALPRICE AS LOORDTOTALPRICE,
    l.LODISCOUNT AS LODISCOUNT,
    l.LOREVENUE AS LOREVENUE,
    l.LOSUPPLYCOST AS LOSUPPLYCOST,
    l.LOTAX AS LOTAX,
    l.LOCOMMITDATE AS LOCOMMITDATE,
    l.LOSHIPMODE AS LOSHIPMODE,
    c.CNAME AS CNAME,
    c.CADDRESS AS CADDRESS,
    c.CCITY AS CCITY,
    c.CNATION AS CNATION,
    c.CREGION AS CREGION,
    c.CPHONE AS CPHONE,
    c.CMKTSEGMENT AS CMKTSEGMENT,
    s.SNAME AS SNAME,
    s.SADDRESS AS SADDRESS,
    s.SCITY AS SCITY,
    s.SNATION AS SNATION,
    s.SREGION AS SREGION,
    s.SPHONE AS SPHONE,
    p.PNAME AS PNAME,
    p.PMFGR AS PMFGR,
    p.PCATEGORY AS PCATEGORY,
    p.PBRAND AS PBRAND,
    p.PCOLOR AS PCOLOR,
    p.PTYPE AS PTYPE,
    p.PSIZE AS PSIZE,
    p.PCONTAINER AS PCONTAINER
FROM ssb.lineorder AS l
INNER JOIN ssb.customer AS c ON c.CCUSTKEY = l.LOCUSTKEY
INNER JOIN ssb.supplier AS s ON s.SSUPPKEY = l.LOSUPPKEY
INNER JOIN ssb.part AS p ON p.PPARTKEY = l.LOPARTKEY;

Progress reported by the client during the import:

Progress: 23.84 million rows, 1.39 GB (687.66 thousand rows/s., 40.02 MB/s.)
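
The progress line gives a rough idea of how long the wide-table import takes. Assuming the reported rate of 687.66 thousand rows/s holds for all 600,037,902 rows (in practice it may drift), the estimate is a single division. The helper name `import_eta` is made up for this sketch.

```bash
# Back-of-the-envelope estimate; assumes the import rate stays constant.
# Arguments: total rows, rows per second.
import_eta() {
    awk -v rows="$1" -v rate="$2" 'BEGIN {
        secs = rows / rate
        printf "%.0f seconds (about %.1f minutes)\n", secs, secs / 60
    }'
}

# lineorder row count and the rate taken from the progress line
import_eta 600037902 687660
# prints: 873 seconds (about 14.5 minutes)
```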

3.2 Test datasets

3.2.1 Generating the dataset for StarRocks

# Download the test scripts
wget https://starrocks-public.oss-cn-zhangjiakou.aliyuncs.com/ssb-poc-0.9.3.zip
unzip ssb-poc-0.9.3.zip
cd ssb-poc
make && make install

# Generate 100 GB of data
cd output
bin/gen-ssb.sh 100 data_dir
# Create the tables for the 100 GB test
bin/create_db_table.sh ddl_100

3.2.2 Generating the dataset for ClickHouse

# Download the SSB dbgen tool
git clone https://github.com/vadimtk/ssb-dbgen.git
cd ssb-dbgen
make

# Generate the test data; machine performance and disk space are limited, so use scale factor 100 (-s 100)
./dbgen -s 100 -T c
./dbgen -s 100 -T p
./dbgen -s 100 -T s
./dbgen -s 100 -T l

# List the generated files
ls -l *.tbl
# -rw-r--r-- 1 root root   289529327 4月  26 17:21 customer.tbl
# -rw-r--r-- 1 root root 63289191180 4月  26 17:38 lineorder.tbl
# -rw-r--r-- 1 root root   121042413 4月  26 17:21 part.tbl
# -rw-r--r-- 1 root root    17062852 4月  26 17:21 supplier.tbl

# Count the records
wc -l *.tbl
# 3000000 customer.tbl
# 600037902 lineorder.tbl
# 1400000 part.tbl
# 200000 supplier.tbl
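
The line counts above agree with the row counts that the SSB definition derives from the scale factor (SF): customer = 30,000 × SF, supplier = 2,000 × SF, part = 200,000 × floor(1 + log2 SF), and lineorder ≈ 6,000,000 × SF. The sketch below encodes those formulas; the function name `ssb_expected_rows` is invented for illustration, and the lineorder figure is only approximate because dbgen draws the number of lines per order at random.

```bash
# Hypothetical helper: expected SSB row count for a table at scale factor SF.
# The body runs in a subshell, so its variables stay local.
ssb_expected_rows() (
    table=$1
    sf=$2
    case "$table" in
        customer) echo $(( 30000 * sf )) ;;
        supplier) echo $(( 2000 * sf )) ;;
        part)
            # floor(1 + log2(sf)), computed by repeated integer halving
            n=$sf
            k=1
            while [ "$n" -gt 1 ]; do
                n=$(( n / 2 ))
                k=$(( k + 1 ))
            done
            echo $(( 200000 * k ))
            ;;
        lineorder) echo $(( 6000000 * sf )) ;;   # approximate
        *) echo "unknown table: $table" >&2; exit 1 ;;
    esac
)

for t in customer part supplier lineorder; do
    printf '%-10s %s\n' "$t" "$(ssb_expected_rows "$t" 100)"
done
```

At SF = 100 this gives 3,000,000 customers, 1,400,000 parts and 200,000 suppliers, matching the `wc -l` output exactly, and 600,000,000 line orders against the 600,037,902 actually generated.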

3.3 Single-table performance test SQL

-- Single-table queries
--Q1.1
SELECT sum(lo_extendedprice * lo_discount) AS `revenue`
FROM lineorder_flat
WHERE lo_orderdate >= '1993-01-01' and lo_orderdate <= '1993-12-31' AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;
 
--Q1.2
SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat  
WHERE lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-01-31' AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35;
 
--Q1.3
SELECT sum(lo_extendedprice * lo_discount) AS revenue
FROM lineorder_flat
WHERE weekofyear(lo_orderdate) = 6 AND lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-12-31'
 AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35;
 
 
--Q2.1
SELECT sum(lo_revenue), year(lo_orderdate) AS year,  p_brand
FROM lineorder_flat
WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA'
GROUP BY year,  p_brand
ORDER BY year, p_brand;
 
--Q2.2
SELECT
sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand >= 'MFGR#2221' AND p_brand <= 'MFGR#2228' AND s_region = 'ASIA'
GROUP BY year,  p_brand
ORDER BY year, p_brand;
  
--Q2.3
SELECT sum(lo_revenue),  year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand = 'MFGR#2239' AND s_region = 'EUROPE'
GROUP BY  year,  p_brand
ORDER BY year, p_brand;
 
 
--Q3.1
SELECT c_nation, s_nation,  year(lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat
WHERE c_region = 'ASIA' AND s_region = 'ASIA' AND lo_orderdate  >= '1992-01-01' AND lo_orderdate   <= '1997-12-31'
GROUP BY c_nation, s_nation, year
ORDER BY  year ASC, revenue DESC;
 
--Q3.2
SELECT  c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES' AND lo_orderdate  >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
 
--Q3.3
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ( 'UNITED KI1' ,'UNITED KI5') AND s_city in ( 'UNITED KI1' ,'UNITED KI5') AND lo_orderdate  >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
 
--Q3.4
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ('UNITED KI1', 'UNITED KI5') AND s_city in ( 'UNITED KI1',  'UNITED KI5') AND  lo_orderdate  >= '1997-12-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city,  s_city, year
ORDER BY year ASC, revenue DESC;
 
 
--Q4.1
SELECT year(lo_orderdate) AS year, c_nation,  sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, c_nation
ORDER BY year ASC, c_nation ASC;
 
--Q4.2
SELECT year(lo_orderdate) AS year,
    s_nation, p_category, sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND  p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, s_nation,  p_category
ORDER BY  year ASC, s_nation ASC, p_category ASC;
 
--Q4.3
SELECT year(lo_orderdate) AS year, s_city, p_brand,
    sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE s_nation = 'UNITED STATES' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_category = 'MFGR#14'
GROUP BY  year,  s_city, p_brand
ORDER BY year ASC,  s_city ASC,  p_brand ASC;
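
The post does not say how the timings were collected. One plausible approach is to run each query several times from the command line and keep the fastest wall-clock time. The sketch below does that for an arbitrary command; it assumes GNU `date` (for the `%N` nanosecond format), the helper name `best_of` is made up, and the host names in the usage comments are placeholders. Note that timings taken this way include client start-up and connection overhead, not just query execution.

```bash
# Hypothetical helper: run a command N times and print the fastest
# wall-clock time in seconds. Relies on GNU date's %N extension.
# The body runs in a subshell, so its variables stay local.
best_of() (
    runs=$1
    shift
    best=
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s%N)
        elapsed=$(( (end - start) / 1000000 ))   # milliseconds
        if [ -z "$best" ] || [ "$elapsed" -lt "$best" ]; then
            best=$elapsed
        fi
        i=$(( i + 1 ))
    done
    awk -v ms="$best" 'BEGIN { printf "%.3f\n", ms / 1000 }'
)

# Stand-in command so the sketch runs anywhere:
best_of 3 sleep 0.1

# Against the real systems it might look like this (placeholder hosts):
#   best_of 3 mysql -h <fe_host> -P 9030 -uroot ssb -e "SELECT ..."
#   best_of 3 clickhouse-client -h <ck_host> --query "SELECT ..."
```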

3.4 Single-table low-cardinality aggregation test SQL

--Q1
select count(*),lo_shipmode from lineorder_flat group by lo_shipmode;
--Q2
select count(distinct lo_shipmode) from lineorder_flat;
--Q3
select count(*),lo_shipmode,lo_orderpriority from lineorder_flat group by lo_shipmode,lo_orderpriority;
--Q4
select count(*),lo_shipmode,lo_orderpriority from lineorder_flat group by lo_shipmode,lo_orderpriority,lo_shippriority;
--Q5
select count(*),lo_shipmode,s_city from lineorder_flat group by lo_shipmode,s_city;
--Q6
select count(*) from lineorder_flat group by c_city,s_city;
--Q7
select count(*) from lineorder_flat group by lo_shipmode,lo_orderdate;
--Q8
select count(*) from lineorder_flat group by lo_orderdate,s_nation,s_region;
--Q9
select count(*) from lineorder_flat group by c_city,s_city,c_nation,s_nation;
--Q10
select count(*) from (select count(*) from lineorder_flat group by lo_shipmode,lo_orderpriority,p_category,s_nation,c_nation) t;
--Q11
select count(*) from (select count(*) from lineorder_flat group by lo_shipmode,lo_orderpriority,p_category,s_nation,c_nation,p_mfgr) t;
--Q12
select count(*) from (select count(*) from lineorder_flat group by substr(lo_shipmode,2),lower(lo_orderpriority),p_category,s_nation,c_nation,s_region,p_mfgr) t;

3.5 Test results

3.5.1 Query test results

| Query | StarRocks (s) | ClickHouse single node (s) | ClickHouse cluster (s) | CK single node / SR | CK cluster / SR |
|------|--------------|------------------|-----------------|----------|---------|
| Q1.1 | 0.19 | 0.22 | 0.251 | 1.1 | 1.3 |
| Q1.2 | 0.076 | 0.058 | 0.098 | 0.7 | 1.2 |
| Q1.3 | 0.185 | 0.063 | 0.117 | 0.3 | 0.6 |
| Q2.1 | 0.645 | 2.51 | 0.776 | 4 | 1.2 |
| Q2.2 | 0.583 | 2.23 | 0.673 | 3.8 | 1.1 |
| Q2.3 | 0.426 | 2.15 | 0.6 | 5.0 | 1.4 |
| Q3.1 | 1.21 | 3.46 | 1.57 | 2.8 | 1.29 |
| Q3.2 | 0.598 | 3.43 | 1.49 | 5.7 | 2.5 |
| Q3.3 | 0.439 | 2.58 | 0.765 | 5.9 | 1.7 |
| Q3.4 | 0.091 | 0.152 | 0.121 | 1.68 | 1.32 |
| Q4.1 | 0.684 | 3.82 | 1.59 | 5.6 | 2.3 |
| Q4.2 | 0.295 | 1.38 | 0.519 | 4.7 | 1.75 |
| Q4.3 | 0.292 | 1.77 | 0.507 | 6.06 | 1.7 |
| sum | 5.714 | 23.82 | 9.077 | 4.2 | 1.6 |
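
The two ratio columns are the ClickHouse time divided by the StarRocks time for the same query, so a value above 1 means StarRocks was faster. A minimal sketch for recomputing them, using a made-up helper name `speed_ratio` and rounding to one decimal place:

```bash
# Hypothetical helper: ClickHouse time divided by StarRocks time,
# rounded to one decimal place.
speed_ratio() {
    awk -v ck="$1" -v sr="$2" 'BEGIN { printf "%.1f\n", ck / sr }'
}

# Q3.2: ClickHouse single node 3.43 s, StarRocks 0.598 s
speed_ratio 3.43 0.598     # prints 5.7
# Totals: ClickHouse single node 23.82 s, StarRocks 5.714 s
speed_ratio 23.82 5.714    # prints 4.2
```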

3.5.2 Low-cardinality aggregation test results

| Query | Query type | Result cardinality | StarRocks (s) | ClickHouse single node (s) | ClickHouse cluster (s) | CK single node / SR | CK cluster / SR |
|-----|-----------------------------|--------|--------------|------------------|-----------------|------------|-----------|
| Q1 | group by 1 low-cardinality column (<50) | 7 | 0.574 | 1.27 | 0.32 | 2.2 | 0.6 |
| Q2 | count distinct on 1 low-cardinality column (<50) | 1 | 0.275 | 1.93 | 0.54 | 7 | 1.9 |
| Q3 | group by 2 low-cardinality columns | 35 | 0.601 | 4.42 | 1.54 | 7.3 | 2.56 |
| Q4 | group by 2 low-cardinality columns and 1 int column | 35 | 0.505 | 4.10 | 1.60 | 8 | 3.2 |
| Q5 | group by 2 low-cardinality columns (7*250) | 1750 | 0.986 | 4.41 | 1.35 | 4.5 | 1.4 |
| Q6 | group by 2 low-cardinality columns (250*250) | 62500 | 1.80 | 10.5 | 2.00 | 5.8 | 1.1 |
| Q7 | group by 1 low-cardinality column (<50) and 1 date column | 16842 | 0.628 | 2.18 | 0.58 | 3.5 | 0.92 |
| Q8 | group by 2 low-cardinality columns (<50) and 1 date column | 60150 | 0.68 | 3.65 | 0.91 | 5.4 | 1.3 |
| Q9 | group by 4 low-cardinality columns | 62150 | 1.51 | 15.3 | 3.14 | 10 | 2 |
| Q10 | group by 5 low-cardinality columns (<50) | 54687 | 3.53 | 21.2 | 4.46 | 6 | 1.26 |
| Q11 | group by 6 low-cardinality columns (<50) | 54687 | 3.58 | 23.5 | 4.98 | 6.6 | 1.4 |
| Q12 | group by 7 low-cardinality columns (<50), some wrapped in functions | 469750 | 3.23 | 25.2 | 4.97 | 7.8 | 1.53 |
| sum | | | 17.9 | 117.7 | 26.39 | 6.6 | 1.5 |
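
The `sum` row can be re-derived from the per-query rows. The sketch below reads pipe-delimited rows in the layout used above and totals one column; the helper name `column_total` and its column-numbering convention are assumptions made for the example, and only the first three rows are fed in to keep it short.

```bash
# Hypothetical helper: total the Nth cell after the query name
# (1 = query type, 2 = cardinality, 3 = StarRocks, 4 = ClickHouse single
# node, 5 = ClickHouse cluster) over the Qn rows read from stdin.
column_total() {
    awk -F'|' -v col="$1" '
        $2 ~ /^ *Q[0-9.]+ *$/ { total += $(col + 2) }   # skips the sum row
        END { printf "%.2f\n", total }
    '
}

# StarRocks column for Q1 to Q3: 0.574 + 0.275 + 0.601
column_total 3 <<'EOF'
| Q1 | group by 1 low-cardinality column (<50) | 7 | 0.574 | 1.27 | 0.32 | 2.2 | 0.6 |
| Q2 | count distinct on 1 low-cardinality column (<50) | 1 | 0.275 | 1.93 | 0.54 | 7 | 1.9 |
| Q3 | group by 2 low-cardinality columns | 35 | 0.601 | 4.42 | 1.54 | 7.3 | 2.56 |
| sum | | | 17.9 | 117.7 | 26.39 | 6.6 | 1.5 |
EOF
# prints 1.45
```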

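Summary:

  1. On this blade-server cluster, with 600-million-row tables, StarRocks is roughly 1.5 times as fast as the ClickHouse cluster on both the single-table queries and the aggregation queries.
  2. On a small cluster, ClickHouse achieves good query performance with fewer resources.
  3. On a large distributed cluster, StarRocks outperforms ClickHouse in most query scenarios.
  4. A single ClickHouse node is suitable for simple queries; on complex aggregation queries it is slower than both the ClickHouse cluster and the StarRocks cluster.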