Iceberg REST Catalog + OSS in practice, a record of pitfalls: the Polaris x-amz-content-sha256 error and Nessie configuration

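I have recently been running Iceberg performance tests for query engines. The work breaks down into environment preparation, test dataset preparation, and running the tests themselves.

This post covers only environment preparation and records how it went. There are three aspects:

  1. Catalog: stay as close to production as possible, which means a mainstream catalog type. The tests run in mainland China, so Glue, Snowflake Catalog and similar hosted services are out; the only option is to deploy a catalog service ourselves.
  2. Storage: the test machines are in mainland China, so overseas object stores (S3, Azure, GCS) cannot be used, only domestic ones (OSS, COS, OBS). Because the catalog server may not support these natively, access has to go through the S3 protocol.
  3. Query engine: every query engine under test must support the chosen catalog type.

Filtering by these constraints gives the following environment: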

Item            Type          System
catalog type    REST catalog  Polaris, Nessie
storage scheme  S3            OSS
query engine    -             Doris, Trino

Either one of the following two integrations is enough:

  1. Doris/Trino + Polaris + OSS
  2. Doris/Trino + Nessie + OSS

Polaris

Conclusion first: the latest Polaris release (1.2.0) with OSS over the S3 protocol does not work. It fails with this error:

2025-12-12 17:13:45,460 INFO  [org.apa.pol.ser.exc.IcebergExceptionMapper] [4a2120d6-8520-441d-b502-a090f890b03d_0000000000000000030,POLARIS] [,,,] (executor-thread-1) Handling runtimeException aws-chunked encoding is not supported with the specified x-amz-content-sha256 value. (Service: S3, Status Code: 400, Request ID: 693C4D496D7461373771398C) (SDK Attempt Count: 1)

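Two documents are relevant:
0002-00000427
Use AWS SDKs to access OSS (使用AWS SDK访问OSS)

Roughly, OSS rejects the request because of the x-amz-content-sha256 header value that comes with aws-chunked encoding, so that value must not be sent. Polaris has no configuration parameter to control this.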
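
As background for the error above: in AWS Signature Version 4, the x-amz-content-sha256 header carries either the hex SHA-256 of the payload, the literal UNSIGNED-PAYLOAD, or one of the STREAMING-... values that accompany aws-chunked uploads. The following is a minimal sketch that classifies a header value along those lines. The function name is hypothetical, and which value the AWS SDK inside Polaris actually sends, or which values OSS accepts, is not stated in this post.

```python
def classify_content_sha256(value: str) -> str:
    """Classify an x-amz-content-sha256 header value (SigV4 conventions)."""
    # STREAMING-* values signal a chunked (aws-chunked) upload.
    if value.startswith("STREAMING-"):
        return "chunked"
    # The payload is deliberately left out of the signature.
    if value == "UNSIGNED-PAYLOAD":
        return "unsigned"
    # A plain lowercase hex SHA-256 digest of the whole payload.
    if len(value) == 64 and all(c in "0123456789abcdef" for c in value):
        return "sha256-hex"
    return "unknown"


if __name__ == "__main__":
    for sample in ("STREAMING-UNSIGNED-PAYLOAD-TRAILER", "UNSIGNED-PAYLOAD"):
        print(sample, "->", classify_content_sha256(sample))
```

A request in the "chunked" class is the kind the error message complains about; the other two classes do not involve aws-chunked encoding.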

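Polaris 1.2.0 added a switch, stsUnavailable, that is meant to let Polaris work with any object store that speaks the S3 protocol. Before 1.2.0, Polaris had to go through standard S3 STS authentication, so older Polaris versions could not use OSS at all. With the switch set, authentication is no longer the blocker, but the x-amz-content-sha256 error above remains.

A small detour here: the release notes misspelled the stsUnavailable parameter, so Polaris kept going through STS authentication and I lost some time on it. The logs eventually showed that the parameter had never been set, because I had copied the misspelled name from the documentation.
Polaris 1.2.0 release note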
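
A misspelled key of this kind can be caught before the payload is sent. The sketch below compares the keys of a storageConfigInfo object against the keys used in this post's catalog payload and suggests the closest match for anything unrecognized. The key list is taken from this post only and is not the full Polaris schema; the function name and the misspelling in the usage example are hypothetical.

```python
import difflib

# Keys that appear in this post's storageConfigInfo payload (not exhaustive).
KNOWN_KEYS = sorted([
    "storageType", "allowedLocations", "endpoint", "region",
    "endpointInternal", "pathStyleAccess", "stsUnavailable",
])


def misspelled_keys(storage_config: dict) -> dict:
    """Map each unrecognized key to its closest known key, or None."""
    report = {}
    for key in storage_config:
        if key not in KNOWN_KEYS:
            match = difflib.get_close_matches(key, KNOWN_KEYS, n=1)
            report[key] = match[0] if match else None
    return report


if __name__ == "__main__":
    # "stsUnavilable" is an invented typo used only for illustration.
    print(misspelled_keys({"storageType": "S3", "stsUnavilable": True}))
```

Running a check like this against the JSON before calling the management API would have flagged the wrong name immediately instead of leaving it to the server logs.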

Naturally, I also submitted a PR to fix it:
https://github.com/apache/polaris/pull/3262

Here is the docker compose YAML for Polaris + OSS, adapted from the quickstart and the Ceph example:

services:

  polaris:
    image: apache/polaris:latest
    ports:
      # API port
      - "8181:8181"
      # Management port (metrics and health checks)
      - "8182:8182"
      # Optional, allows attaching a debugger to the Polaris JVM
      - "5005:5005"
    environment:
      JAVA_DEBUG: true
      JAVA_DEBUG_PORT: "*:5005"
      AWS_REGION: cn-beijing
      AWS_ACCESS_KEY_ID: xxxx
      AWS_SECRET_ACCESS_KEY: xxxx
      AWS_ENDPOINT: http://oss-cn-beijing-internal.aliyuncs.com
      POLARIS_BOOTSTRAP_CREDENTIALS: POLARIS,root,s3cr3t
      polaris.realm-context.realms: POLARIS
      quarkus.otel.sdk.disabled: "true"
    healthcheck:
      test: ["CMD", "curl", "http://localhost:8182/q/health"]
      interval: 2s
      timeout: 10s
      retries: 10
      start_period: 10s

  polaris-setup:
    image: alpine/curl
    depends_on:
      polaris:
        condition: service_healthy
    environment:
      - CLIENT_ID=${ROOT_CLIENT_ID:-root}
      - CLIENT_SECRET=${ROOT_CLIENT_SECRET:-s3cr3t}
      - CATALOG_NAME=${CATALOG_NAME:-quickstart_catalog}
      - REALM=${POLARIS_REALM:-POLARIS}
      - BASE_LOCATION=${BASE_LOCATION:-s3://xxx/polaris_warehouse}
      - S3_ENDPOINT=${S3_ENDPOINT:-http://oss-cn-beijing-internal.aliyuncs.com}

    entrypoint: /bin/sh
    command:
      - -c
      - |
        set -ex
        sleep 10
        sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories
        apk add --no-cache jq
        echo "Obtaining root access token..."
        TOKEN_RESPONSE=$$(curl -s -X POST http://polaris:8181/api/catalog/v1/oauth/tokens \
          -H 'Content-Type: application/x-www-form-urlencoded' \
          -d "grant_type=client_credentials&client_id=$${CLIENT_ID}&client_secret=$${CLIENT_SECRET}&scope=PRINCIPAL_ROLE:ALL")

        TOKEN=$$(echo $$TOKEN_RESPONSE | jq -r '.access_token')
        echo "Obtained access token"

        echo "Creating catalog '$$CATALOG_NAME' in realm $$REALM..."
        PAYLOAD='{
          "catalog": {
            "name": "'$$CATALOG_NAME'",
            "type": "INTERNAL",
            "readOnly": false,
            "properties": {
              "default-base-location": "'$$BASE_LOCATION'"
            },
            "storageConfigInfo": {
              "storageType": "S3",
              "allowedLocations": ["'$$BASE_LOCATION'", "'$$BASE_LOCATION'/"],
              "endpoint": "'$$S3_ENDPOINT'",
              "region": "cn-beijing",
              "endpointInternal": "'$$S3_ENDPOINT'",
              "pathStyleAccess": false,
              "stsUnavailable": true
            }
          }
        }'

        curl -s -X POST http://polaris:8181/api/management/v1/catalogs \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Accept: application/json" \
          -H "Content-Type: application/json" \
          -H "Polaris-Realm: $$REALM" \
          -d "$$PAYLOAD" > /dev/null

        echo "✅ Catalog created"

        echo ""
        echo "Creating principal 'quickstart_user'..."
        PRINCIPAL_RESPONSE=$$(curl -s -X POST http://polaris:8181/api/management/v1/principals \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principal": {"name": "quickstart_user", "properties": {}}}')

        USER_CLIENT_ID=$$(echo $$PRINCIPAL_RESPONSE | jq -r '.credentials.clientId')
        USER_CLIENT_SECRET=$$(echo $$PRINCIPAL_RESPONSE | jq -r '.credentials.clientSecret')

        echo "✅ Principal created with clientId: $$USER_CLIENT_ID"

        echo "Creating principal role 'quickstart_user_role'..."
        curl -s -X POST http://polaris:8181/api/management/v1/principal-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principalRole": {"name": "quickstart_user_role", "properties": {}}}' > /dev/null

        echo "✅ Principal role created"

        echo "Creating catalog role 'quickstart_catalog_role'..."
        curl -s -X POST http://polaris:8181/api/management/v1/catalogs/$$CATALOG_NAME/catalog-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"catalogRole": {"name": "quickstart_catalog_role", "properties": {}}}' > /dev/null

        echo "✅ Catalog role created"

        echo "Assigning principal role to principal..."
        curl -s -X PUT http://polaris:8181/api/management/v1/principals/quickstart_user/principal-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principalRole": {"name": "quickstart_user_role"}}' > /dev/null

        echo "✅ Principal role assigned"

        echo "Assigning catalog role to principal role..."
        curl -s -X PUT http://polaris:8181/api/management/v1/principal-roles/quickstart_user_role/catalog-roles/$$CATALOG_NAME \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"catalogRole": {"name": "quickstart_catalog_role"}}' > /dev/null

        echo "✅ Catalog role assigned"

        echo "Granting CATALOG_MANAGE_CONTENT privilege..."
        curl -s -X PUT http://polaris:8181/api/management/v1/catalogs/$$CATALOG_NAME/catalog-roles/quickstart_catalog_role/grants \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"type": "catalog", "privilege": "CATALOG_MANAGE_CONTENT"}' > /dev/null

        echo "✅ Privileges granted"

        echo ""
        echo "=========================================="
        echo "🎉 Polaris Quickstart Setup Complete!"
        echo "=========================================="
        echo ""
        echo "Catalog: $$CATALOG_NAME"
        echo "  Storage: S3 protocol (OSS)"
        echo "  Location: $$BASE_LOCATION"
        echo ""
        echo "Root credentials:"
        echo "  Client ID:     $$CLIENT_ID"
        echo "  Client Secret: $$CLIENT_SECRET"
        echo ""
        echo "User credentials:"
        echo "  Client ID:     $$USER_CLIENT_ID"
        echo "  Client Secret: $$USER_CLIENT_SECRET"
        echo ""
        echo "Polaris main APIs:"
        echo "  - Iceberg REST:   http://localhost:8181/api/catalog/v1"
        echo "  - Management:     http://localhost:8181/api/management/v1"
        echo "  - Generic Tables: http://localhost:8181/api/polaris/v1"
        echo ""
        echo "Polaris admin APIs:"
        echo "  - Health check:   http://localhost:8182/q/health"
        echo "  - Metrics:        http://localhost:8182/q/metrics"
        echo ""
        echo "To get started with Spark:"
        echo "  spark-sql \\"
        echo "    --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.0,org.apache.iceberg:iceberg-aws-bundle:1.10.0 \\"
        echo "    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \\"
        echo "    --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \\"
        echo "    --conf spark.sql.catalog.polaris.type=rest \\"
        echo "    --conf spark.sql.catalog.polaris.warehouse=$$CATALOG_NAME \\"
        echo "    --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \\"
        echo "    --conf spark.sql.catalog.polaris.credential=$$USER_CLIENT_ID:$$USER_CLIENT_SECRET \\"
        echo "    --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \\"
        echo "    --conf spark.sql.catalog.polaris.s3.endpoint=$$S3_ENDPOINT \\"
        echo "    --conf spark.sql.catalog.polaris.s3.path-style-access=false \\"
        echo "    --conf spark.sql.catalog.polaris.s3.access-key-id=xxxx \\"
        echo "    --conf spark.sql.catalog.polaris.s3.secret-access-key=xxxx \\"
        echo "    --conf spark.sql.catalog.polaris.client.region=cn-beijing \\"
        echo "    --conf spark.sql.defaultCatalog=polaris"
        echo ""
        echo "To get started with REST API:"
        echo "  # Get a token"
        echo "  export TOKEN=\$$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \\"
        echo "    -d 'grant_type=client_credentials' \\"
        echo "    -d 'client_id=$$USER_CLIENT_ID' \\"
        echo "    -d 'client_secret=$$USER_CLIENT_SECRET' \\"
        echo "    -d 'scope=PRINCIPAL_ROLE:ALL' \\"
        echo "    | jq -r '.access_token')"
        echo ""
        echo "  # Create a namespace"
        echo "  curl -X POST http://localhost:8181/api/catalog/v1/$$CATALOG_NAME/namespaces \\"
        echo "    -H \"Authorization: Bearer \$$TOKEN\" \\"
        echo "    -H 'Content-Type: application/json' \\"
        echo "    -d '{\"namespace\": [\"my_namespace\"], \"properties\": {}}'"
        echo ""
        echo "  # List namespaces"
        echo "  curl -X GET http://localhost:8181/api/catalog/v1/$$CATALOG_NAME/namespaces \\"
        echo "    -H \"Authorization: Bearer \$$TOKEN\""
        echo ""
        echo "=========================================="
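
The setup script above assembles the catalog JSON by splicing shell variables between quotes, which is easy to get wrong. As an alternative, here is a minimal sketch that builds the same payload as a Python dict and serializes it with json.dumps. The function name is hypothetical, the field values mirror the YAML above, and sending the request to the management API is left out.

```python
import json


def build_catalog_payload(name, base_location, endpoint, region="cn-beijing"):
    """Build the catalog-creation payload used in the compose file above."""
    return {
        "catalog": {
            "name": name,
            "type": "INTERNAL",
            "readOnly": False,
            "properties": {"default-base-location": base_location},
            "storageConfigInfo": {
                "storageType": "S3",
                # Both forms are allowed, with and without the trailing slash.
                "allowedLocations": [base_location, base_location + "/"],
                "endpoint": endpoint,
                "region": region,
                "endpointInternal": endpoint,
                "pathStyleAccess": False,
                "stsUnavailable": True,
            },
        }
    }


if __name__ == "__main__":
    payload = build_catalog_payload(
        "quickstart_catalog",
        "s3://xxx/polaris_warehouse",
        "http://oss-cn-beijing-internal.aliyuncs.com",
    )
    print(json.dumps(payload, indent=2))
```

Because the serializer handles quoting and booleans, a value containing quotes or spaces cannot break the JSON the way it can in the shell version.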

Nessie
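Conclusion first again: this one works.

First, how is the sha256 header problem handled here?

Nessie has a switch that controls this: the S3 option chunked-encoding-enabled, which the YAML below sets to false (nessie.catalog.service.s3.default-options.chunked-encoding-enabled=false).

With that, the problem goes away.

Here is the docker compose YAML for Nessie + OSS: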

version: '3'

services:
  nessie:
    image: ghcr.io/projectnessie/nessie
    container_name: nessie
    ports:
      - "19120:19120"
    environment:
      - nessie.catalog.default-warehouse=warehouse
      - nessie.catalog.warehouses.warehouse.location=s3://mybucket/my-lakehouse/
      - nessie.catalog.warehouses.zgx.location=s3://xxxxx/iceberg_warehouse/
      - nessie.catalog.service.s3.default-options.endpoint=http://oss-cn-beijing-internal.aliyuncs.com
      - nessie.catalog.service.s3.default-options.access-key=urn:nessie-secret:quarkus:nessie.catalog.secrets.access-key
      - nessie.catalog.service.s3.default-options.path-style-access=false
      - nessie.catalog.service.s3.default-options.chunked-encoding-enabled=false
      - nessie.catalog.service.s3.default-options.auth-type=STATIC
      - nessie.catalog.secrets.access-key.name=xxx
      - nessie.catalog.secrets.access-key.secret=xxx
      - nessie.catalog.service.s3.default-options.region=cn-beijing
      - nessie.server.authentication.enabled=false
      - nessie.catalog.service.s3.default-options.request-signing-enabled=false
    networks:
      nessie-rest:

networks:
  nessie-rest:

Testing Nessie connectivity from Trino

Reference:
https://projectnessie.org/nessie-latest/trino/?h=client+temp#starter-configuration

Fetch the matching Trino configuration from Nessie:

NESSIE_BASE_URL="http://127.0.0.1:19120"
curl "${NESSIE_BASE_URL}/iceberg-ext/v1/client-template/trino?format=static"

Then add the s3.aws-access-key and s3.aws-secret-key properties to that configuration.
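
Adding the two properties can be scripted. The sketch below takes the template text returned by the curl call and appends the two S3 credential properties, replacing them if they are already present. The function name is hypothetical, and the sample template in the usage example is illustrative rather than actual Nessie output.

```python
def add_s3_credentials(template: str, access_key: str, secret_key: str) -> str:
    """Return the properties text with the two S3 credential keys set."""
    overrides = {
        "s3.aws-access-key": access_key,
        "s3.aws-secret-key": secret_key,
    }
    kept = []
    for line in template.splitlines():
        # Drop any existing definition of the keys we are about to set.
        key = line.split("=", 1)[0].strip()
        if key not in overrides:
            kept.append(line)
    kept.extend(f"{k}={v}" for k, v in overrides.items())
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Illustrative template only; use the real output of the curl call.
    sample = "connector.name=iceberg\niceberg.catalog.type=rest\n"
    print(add_s3_credentials(sample, "xxx", "xxx"), end="")
```

The result can be written to the Trino catalog properties file, for example one named after the catalog used in the session below.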

After that, Trino can read the Iceberg tables normally:

[trino@dec7c1a34cb6 /]$ trino --catalog nessie
trino> use zgx;
USE
trino:zgx> show tables;
        Table
----------------------
 unpartitioned_table
 unpartitioned_table1
 unpartitioned_table2
 unpartitioned_table3
(4 rows)

Query 20251214_145124_00043_v9qpy, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0.24 [4 rows, 417B] [16 rows/s, 1.72KiB/s]

trino:zgx> select * from unpartitioned_table;
 col1 | col2 |        col3         |  col4  |    col5    |    col6    | col7  |    col8    |            col9
------+------+---------------------+--------+------------+------------+-------+------------+----------------------------
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
(5 rows)

Query 20251214_145128_00044_v9qpy, FINISHED, 1 node
Splits: 5 total, 5 done (100.00%)
0.26 [5 rows, 27.5KiB] [19 rows/s, 107KiB/s]

trino:zgx>

Appendix
https://trino.io/docs/current/connector/iceberg.html
https://projectnessie.org/nessie-latest/configuration
https://projectnessie.org/nessie-latest/trino/
