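I have recently been performance-testing query engines on Iceberg. The work splits into three parts: environment setup, test dataset preparation, and running the tests.
This post covers only environment setup and records how it went. The considerations were:
- Catalog: stay as close to production as possible, which means a mainstream catalog type. The tests run in mainland China, so Glue, Snowflake Catalog, and similar services are not usable; the only option is to deploy a catalog service myself.
- Storage: the test machines are in mainland China, so overseas object storage (S3, Azure, GCS) is out. Only domestic services (OSS, COS, OBS) are available, and because the catalog server may not support them natively, access has to go through the S3 protocol.
- Query Engine: the chosen catalog type must be supported by every query engine under test.
Filtering by these conditions gives the following environment: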
| | Type | System |
|---|---|---|
| catalog type | REST catalog | Polaris, Nessie |
| storage scheme | S3 | OSS |
| query engine | | Doris, Trino |
That leaves two candidate combinations, of which one will be used:
- Doris/Trino + Polaris + OSS
- Doris/Trino + Nessie + OSS
Polaris
The conclusion first: the latest Polaris release (1.2.0) does not work with OSS over the S3 protocol. It fails with this error:
2025-12-12 17:13:45,460 INFO [org.apa.pol.ser.exc.IcebergExceptionMapper] [4a2120d6-8520-441d-b502-a090f890b03d_0000000000000000030,POLARIS] [,,,] (executor-thread-1) Handling runtimeException aws-chunked encoding is not supported with the specified x-amz-content-sha256 value. (Service: S3, Status Code: 400, Request ID: 693C4D496D7461373771398C) (SDK Attempt Count: 1)
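Two documents are relevant:
0002-00000427
Accessing OSS with the AWS SDK
Roughly, the request must not carry this x-amz-content-sha256 value (the one sent with aws-chunked encoding), and Polaris has no configuration parameter to control it.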
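To make the error message easier to read, here is a minimal sketch of the values that `x-amz-content-sha256` can take under AWS Signature Version 4, as I understand the AWS documentation. The helper names `content_sha256` and `uses_chunked_upload` are hypothetical and only for illustration; this is not how Polaris, the AWS SDK, or OSS implement anything.

```python
import hashlib

# Sentinel values defined by AWS Signature Version 4 for x-amz-content-sha256.
UNSIGNED_PAYLOAD = "UNSIGNED-PAYLOAD"
STREAMING_PAYLOAD = "STREAMING-AWS4-HMAC-SHA256-PAYLOAD"  # used with aws-chunked


def content_sha256(body: bytes) -> str:
    """Header value for a plain (non-chunked) upload: hex SHA-256 of the body."""
    return hashlib.sha256(body).hexdigest()


def uses_chunked_upload(headers: dict) -> bool:
    """Return True if the header signals a streaming (aws-chunked) upload."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-amz-content-sha256", "").startswith("STREAMING-")


if __name__ == "__main__":
    print(content_sha256(b""))
    print(uses_chunked_upload({"x-amz-content-sha256": STREAMING_PAYLOAD}))
```

Judging from the error text, the request that OSS rejected falls into the `STREAMING-` case, which is why a switch that disables chunked encoding on the catalog side matters.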
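Polaris 1.2.0 added a switch, stsUnavailable, intended to let Polaris work with any object storage that speaks the S3 protocol. Before 1.2.0, Polaris required standard S3 STS authentication, so older versions could not work with OSS at all.
A small detour: the stsUnavailable parameter was misspelled in the release notes, so Polaris kept going through STS authentication and I lost some time on it. The logs eventually showed that the parameter had never been set, because I had copied the misspelled name from the documentation.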
Polaris 1.2.0 release note

Naturally, I submitted a PR to fix it:
https://github.com/apache/polaris/pull/3262
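A misspelled key like the one described above is easy to miss because nothing fails loudly. A minimal sketch of a client-side check before sending the catalog payload is shown here. The key set is copied from the storageConfigInfo payload used in this post and is an assumption, not the complete Polaris schema; the function name `find_unknown_keys` and the misspelling in the usage example are hypothetical.

```python
import difflib

# Keys taken from the storageConfigInfo payload in this post.
# Assumption for illustration only: this is not the full Polaris schema.
KNOWN_STORAGE_KEYS = [
    "storageType",
    "allowedLocations",
    "endpoint",
    "endpointInternal",
    "region",
    "pathStyleAccess",
    "stsUnavailable",
]


def find_unknown_keys(storage_config: dict) -> dict:
    """Map each unrecognized key to its closest known key, or None."""
    unknown = {}
    for key in storage_config:
        if key in KNOWN_STORAGE_KEYS:
            continue
        # get_close_matches returns the best candidates first.
        matches = difflib.get_close_matches(key, KNOWN_STORAGE_KEYS, n=1)
        unknown[key] = matches[0] if matches else None
    return unknown


if __name__ == "__main__":
    # "stsUnavailabe" is a made-up typo, not the one from the release notes.
    config = {"storageType": "S3", "stsUnavailabe": True}
    for bad, suggestion in find_unknown_keys(config).items():
        print(f"unknown key {bad!r}, did you mean {suggestion!r}?")
```

Reading the catalog back through the management API after creating it and checking that the flag is present would be a stronger check; the sketch above only guards against typos before the request is sent.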
Here is the docker compose YAML for Polaris + OSS,
adapted from the quickstart and the Ceph example:
services:
  polaris:
    image: apache/polaris:latest
    ports:
      # API port
      - "8181:8181"
      # Management port (metrics and health checks)
      - "8182:8182"
      # Optional, allows attaching a debugger to the Polaris JVM
      - "5005:5005"
    environment:
      JAVA_DEBUG: "true"
      JAVA_DEBUG_PORT: "*:5005"
      AWS_REGION: cn-beijing
      AWS_ACCESS_KEY_ID: xxxx
      AWS_SECRET_ACCESS_KEY: xxxx
      AWS_ENDPOINT: http://oss-cn-beijing-internal.aliyuncs.com
      POLARIS_BOOTSTRAP_CREDENTIALS: POLARIS,root,s3cr3t
      polaris.realm-context.realms: POLARIS
      quarkus.otel.sdk.disabled: "true"
    healthcheck:
      test: ["CMD", "curl", "http://localhost:8182/q/health"]
      interval: 2s
      timeout: 10s
      retries: 10
      start_period: 10s
  polaris-setup:
    image: alpine/curl
    depends_on:
      polaris:
        condition: service_healthy
    environment:
      - CLIENT_ID=${ROOT_CLIENT_ID:-root}
      - CLIENT_SECRET=${ROOT_CLIENT_SECRET:-s3cr3t}
      - CATALOG_NAME=${CATALOG_NAME:-quickstart_catalog}
      - REALM=${POLARIS_REALM:-POLARIS}
      - BASE_LOCATION=${BASE_LOCATION:-s3://xxx/polaris_warehouse}
      - S3_ENDPOINT=${S3_ENDPOINT:-http://oss-cn-beijing-internal.aliyuncs.com}
    entrypoint: /bin/sh
    command:
      - -c
      - |
        set -ex
        sleep 10
        # Use the Aliyun mirror so apk works from mainland China
        sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories
        apk add --no-cache jq
        echo "Obtaining root access token..."
        TOKEN_RESPONSE=$$(curl -s -X POST http://polaris:8181/api/catalog/v1/oauth/tokens \
          -H 'Content-Type: application/x-www-form-urlencoded' \
          -d "grant_type=client_credentials&client_id=$${CLIENT_ID}&client_secret=$${CLIENT_SECRET}&scope=PRINCIPAL_ROLE:ALL")
        TOKEN=$$(echo $$TOKEN_RESPONSE | jq -r '.access_token')
        echo "Obtained access token"
        echo "Creating catalog '$$CATALOG_NAME' in realm $$REALM..."
        PAYLOAD='{
          "catalog": {
            "name": "'$$CATALOG_NAME'",
            "type": "INTERNAL",
            "readOnly": false,
            "properties": {
              "default-base-location": "'$$BASE_LOCATION'"
            },
            "storageConfigInfo": {
              "storageType": "S3",
              "allowedLocations": ["'$$BASE_LOCATION'", "'$$BASE_LOCATION'/"],
              "endpoint": "'$$S3_ENDPOINT'",
              "region": "cn-beijing",
              "endpointInternal": "'$$S3_ENDPOINT'",
              "pathStyleAccess": false,
              "stsUnavailable": true
            }
          }
        }'
        curl -s -X POST http://polaris:8181/api/management/v1/catalogs \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Accept: application/json" \
          -H "Content-Type: application/json" \
          -H "Polaris-Realm: $$REALM" \
          -d "$$PAYLOAD" > /dev/null
        echo "✅ Catalog created"
        echo ""
        echo "Creating principal 'quickstart_user'..."
        PRINCIPAL_RESPONSE=$$(curl -s -X POST http://polaris:8181/api/management/v1/principals \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principal": {"name": "quickstart_user", "properties": {}}}')
        USER_CLIENT_ID=$$(echo $$PRINCIPAL_RESPONSE | jq -r '.credentials.clientId')
        USER_CLIENT_SECRET=$$(echo $$PRINCIPAL_RESPONSE | jq -r '.credentials.clientSecret')
        echo "✅ Principal created with clientId: $$USER_CLIENT_ID"
        echo "Creating principal role 'quickstart_user_role'..."
        curl -s -X POST http://polaris:8181/api/management/v1/principal-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principalRole": {"name": "quickstart_user_role", "properties": {}}}' > /dev/null
        echo "✅ Principal role created"
        echo "Creating catalog role 'quickstart_catalog_role'..."
        curl -s -X POST http://polaris:8181/api/management/v1/catalogs/$$CATALOG_NAME/catalog-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"catalogRole": {"name": "quickstart_catalog_role", "properties": {}}}' > /dev/null
        echo "✅ Catalog role created"
        echo "Assigning principal role to principal..."
        curl -s -X PUT http://polaris:8181/api/management/v1/principals/quickstart_user/principal-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principalRole": {"name": "quickstart_user_role"}}' > /dev/null
        echo "✅ Principal role assigned"
        echo "Assigning catalog role to principal role..."
        curl -s -X PUT http://polaris:8181/api/management/v1/principal-roles/quickstart_user_role/catalog-roles/$$CATALOG_NAME \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"catalogRole": {"name": "quickstart_catalog_role"}}' > /dev/null
        echo "✅ Catalog role assigned"
        echo "Granting CATALOG_MANAGE_CONTENT privilege..."
        curl -s -X PUT http://polaris:8181/api/management/v1/catalogs/$$CATALOG_NAME/catalog-roles/quickstart_catalog_role/grants \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"type": "catalog", "privilege": "CATALOG_MANAGE_CONTENT"}' > /dev/null
        echo "✅ Privileges granted"
        echo ""
        echo "=========================================="
        echo "🎉 Polaris Quickstart Setup Complete!"
        echo "=========================================="
        echo ""
        echo "Catalog: $$CATALOG_NAME"
        echo "  Storage: S3 protocol (OSS)"
        echo "  Location: $$BASE_LOCATION"
        echo ""
        echo "Root credentials:"
        echo "  Client ID: $$CLIENT_ID"
        echo "  Client Secret: $$CLIENT_SECRET"
        echo ""
        echo "User credentials:"
        echo "  Client ID: $$USER_CLIENT_ID"
        echo "  Client Secret: $$USER_CLIENT_SECRET"
        echo ""
        echo "Polaris main APIs:"
        echo "  - Iceberg REST: http://localhost:8181/api/catalog/v1"
        echo "  - Management: http://localhost:8181/api/management/v1"
        echo "  - Generic Tables: http://localhost:8181/api/polaris/v1"
        echo ""
        echo "Polaris admin APIs:"
        echo "  - Health check: http://localhost:8182/q/health"
        echo "  - Metrics: http://localhost:8182/q/metrics"
        echo ""
        # The Spark hints were written for MinIO in the quickstart; the storage
        # values below are changed to mirror the OSS settings of this catalog.
        echo "To get started with Spark:"
        echo "  spark-sql \\"
        echo "    --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.0,org.apache.iceberg:iceberg-aws-bundle:1.10.0 \\"
        echo "    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \\"
        echo "    --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \\"
        echo "    --conf spark.sql.catalog.polaris.type=rest \\"
        echo "    --conf spark.sql.catalog.polaris.warehouse=$$CATALOG_NAME \\"
        echo "    --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \\"
        echo "    --conf spark.sql.catalog.polaris.credential=$$USER_CLIENT_ID:$$USER_CLIENT_SECRET \\"
        echo "    --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \\"
        echo "    --conf spark.sql.catalog.polaris.s3.endpoint=$$S3_ENDPOINT \\"
        echo "    --conf spark.sql.catalog.polaris.s3.path-style-access=false \\"
        echo "    --conf spark.sql.catalog.polaris.s3.access-key-id=<your-access-key-id> \\"
        echo "    --conf spark.sql.catalog.polaris.s3.secret-access-key=<your-secret-access-key> \\"
        echo "    --conf spark.sql.catalog.polaris.client.region=cn-beijing \\"
        echo "    --conf spark.sql.defaultCatalog=polaris"
        echo ""
        echo "To get started with REST API:"
        echo "  # Get a token"
        echo "  export TOKEN=\$$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \\"
        echo "    -d 'grant_type=client_credentials' \\"
        echo "    -d 'client_id=$$USER_CLIENT_ID' \\"
        echo "    -d 'client_secret=$$USER_CLIENT_SECRET' \\"
        echo "    -d 'scope=PRINCIPAL_ROLE:ALL' \\"
        echo "    | jq -r '.access_token')"
        echo ""
        echo "  # Create a namespace"
        echo "  curl -X POST http://localhost:8181/api/catalog/v1/$$CATALOG_NAME/namespaces \\"
        echo "    -H \"Authorization: Bearer \$$TOKEN\" \\"
        echo "    -H 'Content-Type: application/json' \\"
        echo "    -d '{\"namespace\": [\"my_namespace\"], \"properties\": {}}'"
        echo ""
        echo "  # List namespaces"
        echo "  curl -X GET http://localhost:8181/api/catalog/v1/$$CATALOG_NAME/namespaces \\"
        echo "    -H \"Authorization: Bearer \$$TOKEN\""
        echo ""
        echo "=========================================="
Nessie
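The conclusion first here too: it works.
First, how does Nessie deal with the sha256 header problem?
Nessie has a switch that controls this: chunked-encoding-enabled, which is set to false in the configuration below.

With that, the problem goes away.
Here is the docker compose YAML for Nessie + OSS: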
version: '3'
services:
  nessie:
    image: ghcr.io/projectnessie/nessie
    container_name: nessie
    ports:
      - "19120:19120"
    environment:
      - nessie.catalog.default-warehouse=warehouse
      - nessie.catalog.warehouses.warehouse.location=s3://mybucket/my-lakehouse/
      - nessie.catalog.warehouses.zgx.location=s3://xxxxx/iceberg_warehouse/
      - nessie.catalog.service.s3.default-options.endpoint=http://oss-cn-beijing-internal.aliyuncs.com
      - nessie.catalog.service.s3.default-options.access-key=urn:nessie-secret:quarkus:nessie.catalog.secrets.access-key
      - nessie.catalog.service.s3.default-options.path-style-access=false
      - nessie.catalog.service.s3.default-options.chunked-encoding-enabled=false
      - nessie.catalog.service.s3.default-options.auth-type=STATIC
      - nessie.catalog.secrets.access-key.name=xxx
      - nessie.catalog.secrets.access-key.secret=xxx
      - nessie.catalog.service.s3.default-options.region=cn-beijing
      - nessie.server.authentication.enabled=false
      - nessie.catalog.service.s3.default-options.request-signing-enabled=false
    networks:
      nessie-rest:
networks:
  nessie-rest:
Testing Nessie connectivity from Trino
Following the Nessie documentation at
https://projectnessie.org/nessie-latest/trino/?h=client+temp#starter-configuration
fetch the matching Trino configuration:
NESSIE_BASE_URL="http://127.0.0.1:19120/"
curl "${NESSIE_BASE_URL}/iceberg-ext/v1/client-template/trino?format=static"
Then add s3.aws-access-key and s3.aws-secret-key to the configuration.
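The two steps above (fetch the template, then add the credentials) can be sketched in a few lines. This is a rough sketch that assumes the template comes back as plain `key=value` lines; the function names `fetch_trino_template` and `add_s3_credentials` are hypothetical, and only the endpoint path and the two property names come from this post.

```python
import urllib.request


def fetch_trino_template(nessie_base_url: str) -> str:
    """Download the static Trino catalog template from a running Nessie server."""
    url = nessie_base_url.rstrip("/") + "/iceberg-ext/v1/client-template/trino?format=static"
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def add_s3_credentials(template: str, access_key: str, secret_key: str) -> str:
    """Return the template with the two S3 credential properties set."""
    credential_prefixes = ("s3.aws-access-key=", "s3.aws-secret-key=")
    # Drop any existing credential lines so the result has exactly one of each.
    lines = [line for line in template.splitlines()
             if not line.startswith(credential_prefixes)]
    lines.append(f"s3.aws-access-key={access_key}")
    lines.append(f"s3.aws-secret-key={secret_key}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Requires the Nessie container from this post to be running locally.
    template = fetch_trino_template("http://127.0.0.1:19120/")
    print(add_s3_credentials(template, "xxx", "xxx"))
```

The printed result would then be saved as the catalog properties file for the `nessie` catalog in Trino.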
With that in place, Trino can read the Iceberg tables:
[trino@dec7c1a34cb6 /]$ trino --catalog nessie
trino> use zgx;
USE
trino:zgx> show tables;
Table
----------------------
unpartitioned_table
unpartitioned_table1
unpartitioned_table2
unpartitioned_table3
(4 rows)
Query 20251214_145124_00043_v9qpy, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0.24 [4 rows, 417B] [16 rows/s, 1.72KiB/s]
trino:zgx> select * from unpartitioned_table;
col1 | col2 | col3 | col4 | col5 | col6 | col7 | col8 | col9
------+------+---------------------+--------+------------+------------+-------+------------+----------------------------
true | 101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
true | 101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
true | 101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
true | 101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
true | 101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
(5 rows)
Query 20251214_145128_00044_v9qpy, FINISHED, 1 node
Splits: 5 total, 5 done (100.00%)
0.26 [5 rows, 27.5KiB] [19 rows/s, 107KiB/s]
trino:zgx>
Appendix
https://trino.io/docs/current/connector/iceberg.html
https://projectnessie.org/nessie-latest/configuration
https://projectnessie.org/nessie-latest/trino/