Iceberg REST Catalog + OSS in Practice, Pitfalls: the Polaris x-amz-content-sha256 Error and Nessie Configuration

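Recently I have been running Iceberg performance tests on query engines. The work has three parts: environment preparation, test dataset preparation, and the performance tests themselves.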

This post covers only environment preparation and records how it went. The main considerations:

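  1. Catalog: stay as close to production as possible, i.e. use a mainstream catalog type. The performance tests run in mainland China, so Glue, Snowflake Catalog and similar hosted catalogs are unavailable; the only option is to deploy a catalog service ourselves.
  2. Storage: the test machines are in mainland China, so overseas object stores (S3, Azure, GCS) are out and a domestic one (OSS, COS, OBS) has to be used. Because catalog servers may not support these stores natively, access has to go through the S3 protocol.
  3. Query engine: the chosen catalog type must be supported by every query engine under test.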

Applying these constraints, the environment looks like this:

Component       Type          System
Catalog         REST catalog  Polaris, Nessie
Storage         S3 protocol   OSS
Query engine    -             Doris, Trino

Either of the following integrations is acceptable:

  1. Doris/Trino + Polaris + OSS
  2. Doris/Trino + Nessie + OSS

Polaris

Conclusion first: the latest Polaris release (1.2.0) with OSS (via the S3 protocol) does not work. It fails with the following error:

2025-12-12 17:13:45,460 INFO  [org.apa.pol.ser.exc.IcebergExceptionMapper] [4a2120d6-8520-441d-b502-a090f890b03d_0000000000000000030,POLARIS] [,,,] (executor-thread-1) Handling runtimeException aws-chunked encoding is not supported with the specified x-amz-content-sha256 value. (Service: S3, Status Code: 400, Request ID: 693C4D496D7461373771398C) (SDK Attempt Count: 1)

Two reference documents:
0002-00000427
Using the AWS SDK to access OSS

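In short, OSS does not accept the x-amz-content-sha256 header value the client sends (it rejects aws-chunked encoding combined with that value), and Polaris has no configuration option to control this header.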
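For context, x-amz-content-sha256 is part of AWS Signature Version 4: it carries either the hex SHA-256 of the request body, the literal UNSIGNED-PAYLOAD, or one of the STREAMING-* values that announce an aws-chunked upload. The error above is OSS refusing the aws-chunked case. A minimal Python sketch (stdlib only) that classifies a header value; it only illustrates the header semantics and does not reproduce what Polaris or OSS actually do internally, and `classify`/`payload_hash` are hypothetical helper names.

```python
import hashlib

# Special x-amz-content-sha256 values defined by AWS Signature Version 4.
# Every STREAMING-* value means the body is sent with Content-Encoding: aws-chunked.
STREAMING_VALUES = {
    "STREAMING-AWS4-HMAC-SHA256-PAYLOAD",
    "STREAMING-AWS4-HMAC-SHA256-PAYLOAD-TRAILER",
    "STREAMING-UNSIGNED-PAYLOAD-TRAILER",
    "STREAMING-AWS4-ECDSA-P256-SHA256-PAYLOAD",
    "STREAMING-AWS4-ECDSA-P256-SHA256-PAYLOAD-TRAILER",
}


def payload_hash(body: bytes) -> str:
    """Hex SHA-256 of the whole body: the value sent for a signed, non-chunked request."""
    return hashlib.sha256(body).hexdigest()


def classify(header_value: str) -> str:
    """Classify an x-amz-content-sha256 header value."""
    if header_value in STREAMING_VALUES:
        return "aws-chunked"
    if header_value == "UNSIGNED-PAYLOAD":
        return "unsigned"
    # A plain signed payload: 64 lowercase hex characters.
    if len(header_value) == 64 and all(c in "0123456789abcdef" for c in header_value):
        return "signed-payload"
    return "unknown"


if __name__ == "__main__":
    for value in (payload_hash(b""), "UNSIGNED-PAYLOAD", "STREAMING-UNSIGNED-PAYLOAD-TRAILER"):
        print(value, "->", classify(value))
```

A client that avoids the "aws-chunked" category (for example Nessie's chunked-encoding switch discussed later) sends a whole-body hash or UNSIGNED-PAYLOAD instead.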

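Polaris 1.2.0 added a stsUnavailable switch that lets Polaris work with any object store that speaks the S3 protocol. Before 1.2.0, Polaris always had to go through standard AWS STS authentication, so older versions cannot work with OSS at all. The switch gets past the STS problem, but the x-amz-content-sha256 error above remains.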

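A small detour: the stsUnavailable parameter name is misspelled in the Polaris 1.2.0 release note. I had copied it from there, so the setting never took effect and Polaris kept going through STS authentication, which cost me some time. The logs finally showed that the parameter was not being set.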

So I also opened a PR to fix the typo:
https://github.com/apache/polaris/pull/3262
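Since the misspelled key did not fail loudly and only surfaced in the logs, a cheap pre-flight check on the catalog JSON can help. A minimal Python sketch: the allow-list is only the storageConfigInfo keys used in the setup script of this post (extend it from the Polaris management API spec for your version), `find_suspicious_keys` is a hypothetical helper, and "stsUnavailible" in the usage line is an invented typo, not the actual one from the release note.

```python
import difflib

# Hypothetical allow-list: the storageConfigInfo keys used in this post's setup script.
KNOWN_S3_STORAGE_KEYS = {
    "storageType",
    "allowedLocations",
    "endpoint",
    "endpointInternal",
    "region",
    "pathStyleAccess",
    "stsUnavailable",
}


def find_suspicious_keys(storage_config: dict) -> dict:
    """Map each unknown key to its closest known key, or None if nothing is close."""
    suspicious = {}
    for key in storage_config:
        if key not in KNOWN_S3_STORAGE_KEYS:
            matches = difflib.get_close_matches(key, KNOWN_S3_STORAGE_KEYS, n=1, cutoff=0.8)
            suspicious[key] = matches[0] if matches else None
    return suspicious


if __name__ == "__main__":
    # Invented typo for illustration.
    print(find_suspicious_keys({"storageType": "S3", "stsUnavailible": True}))
```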

Here is the docker compose YAML for Polaris + OSS, adapted from the Polaris quickstart and Ceph examples:

services:

  polaris:
    image: apache/polaris:latest
    ports:
      # API port
      - "8181:8181"
      # Management port (metrics and health checks)
      - "8182:8182"
      # Optional, allows attaching a debugger to the Polaris JVM
      - "5005:5005"
    environment:
      JAVA_DEBUG: true
      JAVA_DEBUG_PORT: "*:5005"
      AWS_REGION: cn-beijing
      AWS_ACCESS_KEY_ID: xxxx
      AWS_SECRET_ACCESS_KEY: xxxx
      AWS_ENDPOINT: http://oss-cn-beijing-internal.aliyuncs.com
      POLARIS_BOOTSTRAP_CREDENTIALS: POLARIS,root,s3cr3t
      polaris.realm-context.realms: POLARIS
      quarkus.otel.sdk.disabled: "true"
    healthcheck:
      test: ["CMD", "curl", "http://localhost:8182/q/health"]
      interval: 2s
      timeout: 10s
      retries: 10
      start_period: 10s

  polaris-setup:
    image: alpine/curl
    depends_on:
      polaris:
        condition: service_healthy
    environment:
      - CLIENT_ID=${ROOT_CLIENT_ID:-root}
      - CLIENT_SECRET=${ROOT_CLIENT_SECRET:-s3cr3t}
      - CATALOG_NAME=${CATALOG_NAME:-quickstart_catalog}
      - REALM=${POLARIS_REALM:-POLARIS}
      - BASE_LOCATION=${BASE_LOCATION:-s3://xxx/polaris_warehouse}
      - S3_ENDPOINT=${S3_ENDPOINT:-http://oss-cn-beijing-internal.aliyuncs.com}

    entrypoint: /bin/sh
    command:
      - -c
      - |
        set -ex
        sleep 10
        sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories
        apk add --no-cache jq
        echo "Obtaining root access token..."
        TOKEN_RESPONSE=$$(curl -s -X POST http://polaris:8181/api/catalog/v1/oauth/tokens \
          -H 'Content-Type: application/x-www-form-urlencoded' \
          -d "grant_type=client_credentials&client_id=$${CLIENT_ID}&client_secret=$${CLIENT_SECRET}&scope=PRINCIPAL_ROLE:ALL")

        TOKEN=$$(echo $$TOKEN_RESPONSE | jq -r '.access_token')
        echo "Obtained access token"

        echo "Creating catalog '$$CATALOG_NAME' in realm $$REALM..."
        PAYLOAD='{
          "catalog": {
            "name": "'$$CATALOG_NAME'",
            "type": "INTERNAL",
            "readOnly": false,
            "properties": {
              "default-base-location": "'$$BASE_LOCATION'"
            },
            "storageConfigInfo": {
              "storageType": "S3",
              "allowedLocations": ["'$$BASE_LOCATION'", "'$$BASE_LOCATION'/"],
              "endpoint": "'$$S3_ENDPOINT'",
              "region": "cn-beijing",
              "endpointInternal": "'$$S3_ENDPOINT'",
              "pathStyleAccess": false,
              "stsUnavailable": true
            }
          }
        }'

        curl -s -X POST http://polaris:8181/api/management/v1/catalogs \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Accept: application/json" \
          -H "Content-Type: application/json" \
          -H "Polaris-Realm: $$REALM" \
          -d "$$PAYLOAD" > /dev/null

        echo "✅ Catalog created"

        echo ""
        echo "Creating principal 'quickstart_user'..."
        PRINCIPAL_RESPONSE=$$(curl -s -X POST http://polaris:8181/api/management/v1/principals \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principal": {"name": "quickstart_user", "properties": {}}}')

        USER_CLIENT_ID=$$(echo $$PRINCIPAL_RESPONSE | jq -r '.credentials.clientId')
        USER_CLIENT_SECRET=$$(echo $$PRINCIPAL_RESPONSE | jq -r '.credentials.clientSecret')

        echo "✅ Principal created with clientId: $$USER_CLIENT_ID"

        echo "Creating principal role 'quickstart_user_role'..."
        curl -s -X POST http://polaris:8181/api/management/v1/principal-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principalRole": {"name": "quickstart_user_role", "properties": {}}}' > /dev/null

        echo "✅ Principal role created"

        echo "Creating catalog role 'quickstart_catalog_role'..."
        curl -s -X POST http://polaris:8181/api/management/v1/catalogs/$$CATALOG_NAME/catalog-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"catalogRole": {"name": "quickstart_catalog_role", "properties": {}}}' > /dev/null

        echo "✅ Catalog role created"

        echo "Assigning principal role to principal..."
        curl -s -X PUT http://polaris:8181/api/management/v1/principals/quickstart_user/principal-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principalRole": {"name": "quickstart_user_role"}}' > /dev/null

        echo "✅ Principal role assigned"

        echo "Assigning catalog role to principal role..."
        curl -s -X PUT http://polaris:8181/api/management/v1/principal-roles/quickstart_user_role/catalog-roles/$$CATALOG_NAME \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"catalogRole": {"name": "quickstart_catalog_role"}}' > /dev/null

        echo "✅ Catalog role assigned"

        echo "Granting CATALOG_MANAGE_CONTENT privilege..."
        curl -s -X PUT http://polaris:8181/api/management/v1/catalogs/$$CATALOG_NAME/catalog-roles/quickstart_catalog_role/grants \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"type": "catalog", "privilege": "CATALOG_MANAGE_CONTENT"}' > /dev/null

        echo "✅ Privileges granted"

        echo ""
        echo "=========================================="
        echo "🎉 Polaris Quickstart Setup Complete!"
        echo "=========================================="
        echo ""
        echo "Catalog: $$CATALOG_NAME"
        echo "  Storage: S3 (OSS)"
        echo "  Location: $$BASE_LOCATION"
        echo ""
        echo "Root credentials:"
        echo "  Client ID:     $$CLIENT_ID"
        echo "  Client Secret: $$CLIENT_SECRET"
        echo ""
        echo "User credentials:"
        echo "  Client ID:     $$USER_CLIENT_ID"
        echo "  Client Secret: $$USER_CLIENT_SECRET"
        echo ""
        echo "Polaris main APIs:"
        echo "  - Iceberg REST:   http://localhost:8181/api/catalog/v1"
        echo "  - Management:     http://localhost:8181/api/management/v1"
        echo "  - Generic Tables: http://localhost:8181/api/polaris/v1"
        echo ""
        echo "Polaris admin APIs:"
        echo "  - Health check:   http://localhost:8182/q/health"
        echo "  - Metrics:        http://localhost:8182/q/metrics"
        echo ""
        echo "To get started with Spark:"
        echo "  spark-sql \\"
        echo "    --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.0,org.apache.iceberg:iceberg-aws-bundle:1.10.0 \\"
        echo "    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \\"
        echo "    --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \\"
        echo "    --conf spark.sql.catalog.polaris.type=rest \\"
        echo "    --conf spark.sql.catalog.polaris.warehouse=$$CATALOG_NAME \\"
        echo "    --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \\"
        echo "    --conf spark.sql.catalog.polaris.credential=$$USER_CLIENT_ID:$$USER_CLIENT_SECRET \\"
        echo "    --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \\"
        echo "    --conf spark.sql.catalog.polaris.s3.endpoint=$$S3_ENDPOINT \\"
        echo "    --conf spark.sql.catalog.polaris.s3.path-style-access=false \\"
        echo "    --conf spark.sql.catalog.polaris.s3.access-key-id=xxxx \\"
        echo "    --conf spark.sql.catalog.polaris.s3.secret-access-key=xxxx \\"
        echo "    --conf spark.sql.catalog.polaris.client.region=cn-beijing \\"
        echo "    --conf spark.sql.defaultCatalog=polaris"
        echo ""
        echo "To get started with REST API:"
        echo "  # Get a token"
        echo "  export TOKEN=\$$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \\"
        echo "    -d 'grant_type=client_credentials' \\"
        echo "    -d 'client_id=$$USER_CLIENT_ID' \\"
        echo "    -d 'client_secret=$$USER_CLIENT_SECRET' \\"
        echo "    -d 'scope=PRINCIPAL_ROLE:ALL' \\"
        echo "    | jq -r '.access_token')"
        echo ""
        echo "  # Create a namespace"
        echo "  curl -X POST http://localhost:8181/api/catalog/v1/$$CATALOG_NAME/namespaces \\"
        echo "    -H \"Authorization: Bearer \$$TOKEN\" \\"
        echo "    -H 'Content-Type: application/json' \\"
        echo "    -d '{\"namespace\": [\"my_namespace\"], \"properties\": {}}'"
        echo ""
        echo "  # List namespaces"
        echo "  curl -X GET http://localhost:8181/api/catalog/v1/$$CATALOG_NAME/namespaces \\"
        echo "    -H \"Authorization: Bearer \$$TOKEN\""
        echo ""
        echo "=========================================="
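The setup script builds PAYLOAD by splicing shell variables into single-quoted JSON, which breaks as soon as a value contains a quote. For readers who prefer to drive the same bootstrap from Python, here is a sketch that builds the same request bodies as dicts (serializable with json) and lists the management calls in the order the script makes them after obtaining the root token. It performs no HTTP itself, and `catalog_payload`/`setup_calls` are hypothetical names.

```python
import json


def catalog_payload(name: str, base_location: str, endpoint: str, region: str = "cn-beijing") -> dict:
    """Same catalog definition as the shell PAYLOAD in the compose file, built as a dict."""
    return {
        "catalog": {
            "name": name,
            "type": "INTERNAL",
            "readOnly": False,
            "properties": {"default-base-location": base_location},
            "storageConfigInfo": {
                "storageType": "S3",
                "allowedLocations": [base_location, base_location + "/"],
                "endpoint": endpoint,
                "region": region,
                "endpointInternal": endpoint,
                "pathStyleAccess": False,  # virtual-hosted-style, as in the compose file
                "stsUnavailable": True,  # the 1.2.0 switch: do not go through STS
            },
        }
    }


def setup_calls(catalog: str, base_location: str, endpoint: str) -> list:
    """Ordered (method, path, body) management calls made by the polaris-setup service."""
    mgmt = "/api/management/v1"
    return [
        ("POST", f"{mgmt}/catalogs", catalog_payload(catalog, base_location, endpoint)),
        ("POST", f"{mgmt}/principals",
         {"principal": {"name": "quickstart_user", "properties": {}}}),
        ("POST", f"{mgmt}/principal-roles",
         {"principalRole": {"name": "quickstart_user_role", "properties": {}}}),
        ("POST", f"{mgmt}/catalogs/{catalog}/catalog-roles",
         {"catalogRole": {"name": "quickstart_catalog_role", "properties": {}}}),
        ("PUT", f"{mgmt}/principals/quickstart_user/principal-roles",
         {"principalRole": {"name": "quickstart_user_role"}}),
        ("PUT", f"{mgmt}/principal-roles/quickstart_user_role/catalog-roles/{catalog}",
         {"catalogRole": {"name": "quickstart_catalog_role"}}),
        ("PUT", f"{mgmt}/catalogs/{catalog}/catalog-roles/quickstart_catalog_role/grants",
         {"type": "catalog", "privilege": "CATALOG_MANAGE_CONTENT"}),
    ]


if __name__ == "__main__":
    calls = setup_calls("quickstart_catalog", "s3://xxx/polaris_warehouse",
                        "http://oss-cn-beijing-internal.aliyuncs.com")
    for method, path, body in calls:
        print(method, path, json.dumps(body))
```

Each body would then be sent with any HTTP client, together with the Authorization: Bearer and Polaris-Realm headers the script uses.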

Nessie
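Conclusion first again: this one works.

First, how the sha256 header problem is handled here.

Nessie has a switch that controls this: nessie.catalog.service.s3.default-options.chunked-encoding-enabled, set to false in the config below.

With that set, the problem goes away.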

Here is the docker compose YAML for Nessie + OSS:

version: '3'

services:
  nessie:
    image: ghcr.io/projectnessie/nessie
    container_name: nessie
    ports:
      - "19120:19120"
    environment:
      - nessie.catalog.default-warehouse=warehouse
      - nessie.catalog.warehouses.warehouse.location=s3://mybucket/my-lakehouse/
      - nessie.catalog.warehouses.zgx.location=s3://xxxxx/iceberg_warehouse/
      - nessie.catalog.service.s3.default-options.endpoint=http://oss-cn-beijing-internal.aliyuncs.com
      - nessie.catalog.service.s3.default-options.access-key=urn:nessie-secret:quarkus:nessie.catalog.secrets.access-key
      - nessie.catalog.service.s3.default-options.path-style-access=false
      - nessie.catalog.service.s3.default-options.chunked-encoding-enabled=false
      - nessie.catalog.service.s3.default-options.auth-type=STATIC
      - nessie.catalog.secrets.access-key.name=xxx
      - nessie.catalog.secrets.access-key.secret=xxx
      - nessie.catalog.service.s3.default-options.region=cn-beijing
      - nessie.server.authentication.enabled=false
      - nessie.catalog.service.s3.default-options.request-signing-enabled=false
    networks:
      nessie-rest:

networks:
  nessie-rest:
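The compose file passes Quarkus properties as environment entries under their dotted names. Quarkus (via MicroProfile Config) looks up the exact name first, so this works as written; the same lookup also accepts the conventional form in which every non-alphanumeric character becomes an underscore and the name is uppercased. If your tooling rejects dots in variable names, here is a small Python sketch of that conversion for the S3 options used above (the dict is just a copy of those entries, and `to_env_var` is a hypothetical helper).

```python
import re


def to_env_var(prop: str) -> str:
    """MicroProfile Config style: replace non-alphanumerics with '_' and uppercase."""
    return re.sub(r"[^A-Za-z0-9]", "_", prop).upper()


# Copied from the Nessie compose file above.
NESSIE_S3_PROPS = {
    "nessie.catalog.service.s3.default-options.endpoint": "http://oss-cn-beijing-internal.aliyuncs.com",
    "nessie.catalog.service.s3.default-options.path-style-access": "false",
    "nessie.catalog.service.s3.default-options.chunked-encoding-enabled": "false",
    "nessie.catalog.service.s3.default-options.region": "cn-beijing",
}

if __name__ == "__main__":
    for prop, value in NESSIE_S3_PROPS.items():
        print(f"{to_env_var(prop)}={value}")
```

Names containing user-chosen segments, such as the zgx warehouse key, may not map back unambiguously, so the dotted form used in the file above is the safer default.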

Testing Nessie connectivity from Trino

Reference:
https://projectnessie.org/nessie-latest/trino/?h=client+temp#starter-configuration

Fetch the corresponding Trino catalog configuration from Nessie:

NESSIE_BASE_URL="http://127.0.0.1:19120"
curl "${NESSIE_BASE_URL}/iceberg-ext/v1/client-template/trino?format=static"

Then add the s3.aws-access-key and s3.aws-secret-key settings to it.
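Adding the two keys is a plain edit of the fetched template. A minimal Python sketch, assuming the template comes back as Java-style key=value properties text: `merge_properties` is a hypothetical helper that ignores other properties syntaxes (key: value, line continuations), and the sample template is invented rather than real Nessie output. The merged text would presumably go into etc/catalog/nessie.properties, since the session below uses --catalog nessie.

```python
def merge_properties(template: str, overrides: dict) -> str:
    """Set or override key=value lines in properties text, keeping comments and order."""
    lines, seen = [], set()
    for raw in template.splitlines():
        stripped = raw.strip()
        # Keep blank lines, comments and anything that is not key=value untouched.
        if not stripped or stripped.startswith(("#", "!")) or "=" not in stripped:
            lines.append(raw)
            continue
        key = stripped.split("=", 1)[0].strip()
        if key in overrides:
            lines.append(f"{key}={overrides[key]}")
            seen.add(key)
        else:
            lines.append(raw)
    # Append keys the template did not contain, in the order given.
    for key, value in overrides.items():
        if key not in seen:
            lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Invented sample; use the real output of the client-template endpoint instead.
    sample = "connector.name=iceberg\niceberg.catalog.type=rest\n"
    print(merge_properties(sample, {"s3.aws-access-key": "xxx", "s3.aws-secret-key": "xxx"}), end="")
```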

With that, Trino can read the Iceberg tables normally:

[trino@dec7c1a34cb6 /]$ trino --catalog nessie
trino> use zgx;
USE
trino:zgx> show tables;
        Table
----------------------
 unpartitioned_table
 unpartitioned_table1
 unpartitioned_table2
 unpartitioned_table3
(4 rows)

Query 20251214_145124_00043_v9qpy, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0.24 [4 rows, 417B] [16 rows/s, 1.72KiB/s]

trino:zgx> select * from unpartitioned_table;
 col1 | col2 |        col3         |  col4  |    col5    |    col6    | col7  |    col8    |            col9
------+------+---------------------+--------+------------+------------+-------+------------+----------------------------
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
(5 rows)

Query 20251214_145128_00044_v9qpy, FINISHED, 1 node
Splits: 5 total, 5 done (100.00%)
0.26 [5 rows, 27.5KiB] [19 rows/s, 107KiB/s]

trino:zgx>

Appendix
https://trino.io/docs/current/connector/iceberg.html
https://projectnessie.org/nessie-latest/configuration
https://projectnessie.org/nessie-latest/trino/
