A Journey Through the Streaming Data Lake Paimon (19): Developing a Custom REST Catalog Service

Chapter 19: Developing a Custom REST Catalog Service

Introduction: Building a Cross-System Metadata Service

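Earlier chapters covered Paimon's Catalog system. In distributed deployments, however, metadata often has to be managed across clusters and across clouds. REST Catalog is designed for exactly this: it exposes the metadata service over HTTP so that remote systems can access and manage Paimon tables.

This chapter walks through building a production-grade REST Catalog service from scratch.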


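Part 1: REST Catalog Architecture Design

1.1 What the REST Catalog Is For

Traditional Catalog (file system):
Spark job → accesses the HDFS Catalog directly
Flink job → accesses the HDFS Catalog directly
Presto query → accesses the HDFS Catalog directly

Problems:
├─ Heavy network I/O (every operation goes to HDFS)
├─ No cross-cloud or cross-cluster access
├─ No central control over metadata access
└─ No auditing or security controls

REST Catalog solution:
REST Catalog Service (central service)
    ├─ Maintains a metadata cache
    ├─ Centralized authentication and authorization
    ├─ Audit logging
    └─ Performance optimization

Spark → REST API → Service → underlying storage
Flink → REST API → Service → underlying storage
Presto → REST API → Service → underlying storage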

1.2 Core Interfaces of the REST Catalog

The key APIs that this REST Catalog service exposes (creation requests carry the new object's name in the request body, matching the controllers in Part 3):

Core CRUD operations:
├─ Database management: GET /apis/v1/databases, POST /apis/v1/databases
├─ Table management: GET /apis/v1/{db}/tables, POST /apis/v1/{db}/tables
├─ Schema operations: GET /apis/v1/{db}/{table}/schema, POST /apis/v1/{db}/{table}/schema
└─ Statistics: GET /apis/v1/{db}/{table}/stats

Table operations:
├─ Partitions: GET /apis/v1/{db}/{table}/partitions
├─ Snapshots: GET /apis/v1/{db}/{table}/snapshots
└─ Cleanup: DELETE /apis/v1/{db}/{table}/partitions/{pt}
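
To make the route list concrete, here is a minimal client-side sketch that builds request URIs for a few of these endpoints with the JDK's `java.net.http` API (Java 11). The class name `CatalogRoutes` and the base URL are hypothetical; only the path templates come from the list above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds URIs for the routes listed in section 1.2.
final class CatalogRoutes {
    private final String baseUrl;

    CatalogRoutes(String baseUrl) {
        // Drop one trailing slash so paths can be appended uniformly.
        this.baseUrl = baseUrl.endsWith("/")
            ? baseUrl.substring(0, baseUrl.length() - 1)
            : baseUrl;
    }

    // Percent-encode a single path segment. URLEncoder targets form encoding,
    // so its '+' (an encoded space) is rewritten to %20.
    static String segment(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    URI listDatabases() {
        return URI.create(baseUrl + "/apis/v1/databases");
    }

    URI listTables(String database) {
        return URI.create(baseUrl + "/apis/v1/" + segment(database) + "/tables");
    }

    URI schema(String database, String table) {
        return URI.create(baseUrl + "/apis/v1/" + segment(database)
            + "/" + segment(table) + "/schema");
    }

    // A GET request that asks for a JSON response.
    static HttpRequest get(URI uri) {
        return HttpRequest.newBuilder(uri)
            .header("Accept", "application/json")
            .GET()
            .build();
    }
}
```

A caller would pass the result of `CatalogRoutes.get(...)` to `HttpClient.send(...)`. Encoding each segment separately keeps a database or table name with unusual characters from changing the shape of the path.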

Part 2: Building the REST Catalog Service Framework

2.1 Project Dependency Configuration

<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>paimon-rest-catalog</artifactId>
    <version>1.0.0</version>

    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
        <paimon.version>0.9.0</paimon.version>
        <spring-boot.version>2.7.0</spring-boot.version>
    </properties>

    <dependencies>
        <!-- Paimon Core -->
        <dependency>
            <groupId>org.apache.paimon</groupId>
            <artifactId>paimon-core</artifactId>
            <version>${paimon.version}</version>
        </dependency>

        <!-- Spring Boot Web -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
            <version>${spring-boot.version}</version>
        </dependency>

        <!-- Spring Boot Data JPA (used to persist metadata) -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
            <version>${spring-boot.version}</version>
        </dependency>

        <!-- MySQL Driver -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.33</version>
        </dependency>

        <!-- Lombok -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.30</version>
            <scope>provided</scope>
        </dependency>

        <!-- Jackson for JSON -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.15.2</version>
        </dependency>

        <!-- Guava for Caching -->
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>32.0.0-jre</version>
        </dependency>

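        <!-- Logging: SLF4J and Logback arrive transitively through
             spring-boot-starter-web. Spring Boot 2.7 is built against
             SLF4J 1.7.x / Logback 1.2.x, and pinning slf4j-api 2.x or
             logback 1.4.x here breaks its logging setup, so no explicit
             logging dependencies are declared. -->

        <!-- Spring Security (needed by the authentication filter in Part 5) -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-security</artifactId>
            <version>${spring-boot.version}</version>
        </dependency>

        <!-- JJWT (needed by JwtTokenProvider in Part 5). The version is an
             assumption: any 0.11.x release provides Jwts.parserBuilder(). -->
        <dependency>
            <groupId>io.jsonwebtoken</groupId>
            <artifactId>jjwt-api</artifactId>
            <version>0.11.5</version>
        </dependency>
        <dependency>
            <groupId>io.jsonwebtoken</groupId>
            <artifactId>jjwt-impl</artifactId>
            <version>0.11.5</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>io.jsonwebtoken</groupId>
            <artifactId>jjwt-jackson</artifactId>
            <version>0.11.5</version>
            <scope>runtime</scope>
        </dependency>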
    </dependencies>
</project>

2.2 Core Data Model

package com.example.paimon.catalog.model;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import javax.persistence.*;
import java.time.LocalDateTime;

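// Table metadata.
// Each entity in this listing belongs in its own .java file, because Java
// allows only one public top-level class per file.
@Entity
@Table(
    name = "paimon_tables",
    // Indexes are declared on @Table; @Index cannot annotate a field.
    indexes = @Index(name = "idx_db_table", columnList = "db_name, table_name")
)
@Data
@NoArgsConstructor
@AllArgsConstructor
public class TableMetadata {
    
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    
    // DATABASE, TABLE and SCHEMA are reserved words in MySQL, so these
    // columns get explicit, non-reserved names.
    @Column(name = "db_name", nullable = false)
    private String database;
    
    @Column(name = "table_name", nullable = false)
    private String table;
    
    @Column(name = "schema_json", columnDefinition = "LONGTEXT")
    private String schema;  // Schema in JSON format
    
    @Column(columnDefinition = "LONGTEXT")
    private String properties;  // Table properties
    
    @Column(nullable = false)
    private String location;  // HDFS or S3 path
    
    @Column(nullable = false)
    private String owner;
    
    @Column(nullable = false)
    private Long createdAt;
    
    @Column(nullable = false)
    private Long updatedAt;
    
    @Column(length = 50)
    private String status;  // ACTIVE, DELETED, ARCHIVED
}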

// Column metadata
@Entity
@Table(name = "paimon_columns")
@Data
@NoArgsConstructor
@AllArgsConstructor
public class ColumnMetadata {
    
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    
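    // DATABASE and TABLE are reserved words in MySQL, hence the explicit names
    @Column(name = "db_name", nullable = false)
    private String database;
    
    @Column(name = "table_name", nullable = false)
    private String table;
    
    @Column(nullable = false)
    private String columnName;
    
    @Column(nullable = false)
    private String dataType;  // INT, BIGINT, STRING, etc.
    
    @Column(nullable = false)
    private Integer columnIndex;  // Ordinal position of the column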
    
    @Column(columnDefinition = "TEXT")
    private String comment;
    
    private Boolean nullable;
    
    @Column(columnDefinition = "TEXT")
    private String defaultValue;
}

// Audit log of operations
@Entity
@Table(name = "paimon_audit_logs")
@Data
@NoArgsConstructor
@AllArgsConstructor
public class AuditLog {
    
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    
    @Column(nullable = false)
    private String operator;  // Who performed the operation
    
    @Column(nullable = false)
    private String operation;  // CREATE, UPDATE, DELETE, etc.
    
    @Column(nullable = false)
    private String target;  // database.table
    
    @Column(columnDefinition = "LONGTEXT")
    private String content;  // Details of the operation
    
    @Column(nullable = false)
    private Long timestamp;
    
    private String status;  // SUCCESS, FAILED
    
    @Column(columnDefinition = "TEXT")
    private String errorMsg;
}

Part 3: Implementing the REST API

3.1 Database Management Endpoints

package com.example.paimon.catalog.controller;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import com.example.paimon.catalog.service.DatabaseService;
import java.util.Map;

@RestController
@RequestMapping("/apis/v1/databases")
public class DatabaseController {
    
    @Autowired
    private DatabaseService databaseService;
    
    /**
     * List all databases
     * GET /apis/v1/databases
     */
    @GetMapping
    public ResponseEntity<?> listDatabases() {
        try {
            return ResponseEntity.ok(Map.of(
                "databases", databaseService.listDatabases(),
                "count", databaseService.getDatabaseCount()
            ));
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Get the details of a database
     * GET /apis/v1/databases/{database}
     */
    @GetMapping("/{database}")
    public ResponseEntity<?> getDatabase(@PathVariable String database) {
        try {
            var db = databaseService.getDatabase(database);
            if (db == null) {
                return ResponseEntity.status(404)
                    .body(Map.of("error", "Database not found: " + database));
            }
            return ResponseEntity.ok(db);
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Create a database
     * POST /apis/v1/databases
     * 
     * Request body:
     * {
     *     "name": "my_db",
     *     "comment": "My database",
     *     "properties": {"key": "value"}
     * }
     */
    @PostMapping
    public ResponseEntity<?> createDatabase(@RequestBody Map<String, Object> request) {
        try {
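            String database = (String) request.get("name");
            // Reject a missing name up front instead of passing null to the service
            if (database == null || database.isBlank()) {
                return ResponseEntity.status(400)
                    .body(Map.of("error", "Field 'name' is required"));
            }
            String comment = (String) request.getOrDefault("comment", "");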
            
            databaseService.createDatabase(database, comment, 
                (Map<String, String>) request.get("properties"));
            
            return ResponseEntity.ok(Map.of(
                "status", "success",
                "database", database
            ));
        } catch (Exception e) {
            return ResponseEntity.status(400)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Drop a database
     * DELETE /apis/v1/databases/{database}
     */
    @DeleteMapping("/{database}")
    public ResponseEntity<?> dropDatabase(
            @PathVariable String database,
            @RequestParam(defaultValue = "false") boolean cascade) {
        try {
            databaseService.dropDatabase(database, cascade);
            return ResponseEntity.ok(Map.of(
                "status", "success",
                "database", database
            ));
        } catch (Exception e) {
            return ResponseEntity.status(400)
                .body(Map.of("error", e.getMessage()));
        }
    }
}
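
The `createDatabase` handler above casts fields of the untyped request map directly. As a hedged sketch of a stricter alternative, the class below parses the same request body into a small value object and rejects malformed input. The class name `CreateDatabaseRequest` and the naming rule (letters, digits and underscores, not starting with a digit) are assumptions made for illustration, not rules taken from Paimon.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical value object for the POST /apis/v1/databases request body.
final class CreateDatabaseRequest {
    // Assumed naming rule for this sketch only.
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    final String name;
    final String comment;
    final Map<String, String> properties;

    private CreateDatabaseRequest(String name, String comment,
                                  Map<String, String> properties) {
        this.name = name;
        this.comment = comment;
        this.properties = properties;
    }

    static CreateDatabaseRequest parse(Map<String, Object> body) {
        Object rawName = body.get("name");
        if (!(rawName instanceof String) || !NAME.matcher((String) rawName).matches()) {
            throw new IllegalArgumentException(
                "Invalid or missing database name: " + rawName);
        }
        Object rawComment = body.get("comment");
        String comment = rawComment instanceof String ? (String) rawComment : "";

        // Copy entry by entry so that non-string JSON values (numbers,
        // booleans) become strings instead of failing later with a
        // ClassCastException.
        Map<String, String> properties = new LinkedHashMap<>();
        Object rawProperties = body.get("properties");
        if (rawProperties instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) rawProperties).entrySet()) {
                properties.put(String.valueOf(e.getKey()), String.valueOf(e.getValue()));
            }
        }
        return new CreateDatabaseRequest(
            (String) rawName, comment, Collections.unmodifiableMap(properties));
    }
}
```

With this in place the handler could call `CreateDatabaseRequest.parse(request)` and let the `IllegalArgumentException` surface as a 400 response.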

3.2 Table Management Endpoints

package com.example.paimon.catalog.controller;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import com.example.paimon.catalog.service.TableService;
import java.util.Map;

@RestController
@RequestMapping("/apis/v1/{database}/tables")
public class TableController {
    
    @Autowired
    private TableService tableService;
    
    /**
     * List all tables in a database
     * GET /apis/v1/{database}/tables
     */
    @GetMapping
    public ResponseEntity<?> listTables(@PathVariable String database) {
        try {
            return ResponseEntity.ok(Map.of(
                "database", database,
                "tables", tableService.listTables(database),
                "count", tableService.getTableCount(database)
            ));
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Get the details of a table
     * GET /apis/v1/{database}/tables/{table}
     */
    @GetMapping("/{table}")
    public ResponseEntity<?> getTable(
            @PathVariable String database,
            @PathVariable String table) {
        try {
            var tableInfo = tableService.getTable(database, table);
            if (tableInfo == null) {
                return ResponseEntity.status(404)
                    .body(Map.of("error", "Table not found"));
            }
            return ResponseEntity.ok(tableInfo);
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Create a table
     * POST /apis/v1/{database}/tables
     * 
     * Request body:
     * {
     *     "name": "orders",
     *     "schema": {
     *         "fields": [
     *             {"name": "order_id", "type": "BIGINT", "nullable": false},
     *             {"name": "amount", "type": "DECIMAL(10, 2)", "nullable": true},
     *             {"name": "dt", "type": "STRING", "nullable": false}
     *         ],
     *         "primaryKeys": ["order_id", "dt"]
     *     },
     *     "partitionKeys": ["dt"],
     *     "properties": {
     *         "bucket": "16",
     *         "merge-engine": "deduplicate"
     *     }
     * }
     */
    @PostMapping
    public ResponseEntity<?> createTable(
            @PathVariable String database,
            @RequestBody Map<String, Object> request) {
        try {
            String table = (String) request.get("name");
            
            tableService.createTable(database, table, request);
            
            return ResponseEntity.ok(Map.of(
                "status", "success",
                "database", database,
                "table", table
            ));
        } catch (Exception e) {
            return ResponseEntity.status(400)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Update table properties
     * PUT /apis/v1/{database}/tables/{table}
     */
    @PutMapping("/{table}")
    public ResponseEntity<?> alterTable(
            @PathVariable String database,
            @PathVariable String table,
            @RequestBody Map<String, Object> request) {
        try {
            tableService.alterTable(database, table, request);
            
            return ResponseEntity.ok(Map.of(
                "status", "success",
                "message", "Table altered successfully"
            ));
        } catch (Exception e) {
            return ResponseEntity.status(400)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Drop a table
     * DELETE /apis/v1/{database}/tables/{table}
     */
    @DeleteMapping("/{table}")
    public ResponseEntity<?> dropTable(
            @PathVariable String database,
            @PathVariable String table) {
        try {
            tableService.dropTable(database, table);
            
            return ResponseEntity.ok(Map.of(
                "status", "success",
                "database", database,
                "table", table
            ));
        } catch (Exception e) {
            return ResponseEntity.status(400)
                .body(Map.of("error", e.getMessage()));
        }
    }
}
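
Every handler above repeats the same try/catch and answers with a fixed status code (400 for any failed write, 500 for any failed read). Note also that `Map.of("error", e.getMessage())` itself throws a `NullPointerException` when the exception has no message, because `Map.of` rejects null values. A minimal sketch of a shared mapping follows. The class name `ErrorStatusMapper` and the choice of which exception type maps to which status are assumptions of this example.

```java
import java.util.NoSuchElementException;

// Hypothetical helper: one place that turns exceptions into HTTP status codes.
final class ErrorStatusMapper {

    // Assumed convention: validation errors -> 400, missing objects -> 404,
    // conflicting state -> 409, access violations -> 403, anything else -> 500.
    static int statusFor(Throwable error) {
        if (error instanceof IllegalArgumentException) {
            return 400;
        }
        if (error instanceof NoSuchElementException) {
            return 404;
        }
        if (error instanceof IllegalStateException) {
            return 409;
        }
        if (error instanceof SecurityException) {
            return 403;
        }
        return 500;
    }

    // Never returns null, so the result is safe to put into Map.of(...).
    static String messageFor(Throwable error) {
        String message = error.getMessage();
        return message != null ? message : error.getClass().getSimpleName();
    }
}
```

In a Spring application this logic would typically sit in an `@ExceptionHandler` method of a `@RestControllerAdvice` class, which lets the controllers drop their try/catch blocks.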

3.3 Schema Management Endpoints

package com.example.paimon.catalog.controller;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import com.example.paimon.catalog.service.SchemaService;
import java.util.Map;

@RestController
@RequestMapping("/apis/v1/{database}/{table}")
public class SchemaController {
    
    @Autowired
    private SchemaService schemaService;
    
    /**
     * Get the schema of a table
     * GET /apis/v1/{database}/{table}/schema
     */
    @GetMapping("/schema")
    public ResponseEntity<?> getSchema(
            @PathVariable String database,
            @PathVariable String table) {
        try {
            var schema = schemaService.getSchema(database, table);
            return ResponseEntity.ok(Map.of(
                "database", database,
                "table", table,
                "schema", schema
            ));
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Update the schema (add columns)
     * POST /apis/v1/{database}/{table}/schema
     * 
     * Request body:
     * {
     *     "operation": "add",
     *     "columns": [
     *         {"name": "new_col", "type": "STRING"}
     *     ]
     * }
     */
    @PostMapping("/schema")
    public ResponseEntity<?> alterSchema(
            @PathVariable String database,
            @PathVariable String table,
            @RequestBody Map<String, Object> request) {
        try {
            schemaService.alterSchema(database, table, request);
            
            return ResponseEntity.ok(Map.of(
                "status", "success",
                "message", "Schema updated successfully"
            ));
        } catch (Exception e) {
            return ResponseEntity.status(400)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Get table statistics
     * GET /apis/v1/{database}/{table}/stats
     */
    @GetMapping("/stats")
    public ResponseEntity<?> getTableStats(
            @PathVariable String database,
            @PathVariable String table) {
        try {
            var stats = schemaService.getTableStats(database, table);
            return ResponseEntity.ok(stats);
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * List partitions
     * GET /apis/v1/{database}/{table}/partitions
     */
    @GetMapping("/partitions")
    public ResponseEntity<?> getPartitions(
            @PathVariable String database,
            @PathVariable String table,
            @RequestParam(defaultValue = "100") int limit) {
        try {
            var partitions = schemaService.getPartitions(database, table, limit);
            return ResponseEntity.ok(Map.of(
                "database", database,
                "table", table,
                "partitions", partitions
            ));
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * List snapshots
     * GET /apis/v1/{database}/{table}/snapshots
     */
    @GetMapping("/snapshots")
    public ResponseEntity<?> getSnapshots(
            @PathVariable String database,
            @PathVariable String table,
            @RequestParam(defaultValue = "20") int limit) {
        try {
            var snapshots = schemaService.getSnapshots(database, table, limit);
            return ResponseEntity.ok(Map.of(
                "database", database,
                "table", table,
                "snapshots", snapshots
            ));
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
}

Part 4: Implementing the Core Business Logic

4.1 Table Service Layer

package com.example.paimon.catalog.service;

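import com.example.paimon.catalog.model.TableMetadata;
import com.example.paimon.catalog.repository.TableMetadataRepository;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import lombok.extern.slf4j.Slf4j;
import org.apache.paimon.catalog.Catalog;
import org.apache.paimon.catalog.CatalogContext;
import org.apache.paimon.catalog.CatalogFactory;
import org.apache.paimon.catalog.Identifier;
import org.apache.paimon.fs.Path;
import org.apache.paimon.schema.Schema;
import org.apache.paimon.types.DataField;
import org.apache.paimon.types.DataType;
import org.apache.paimon.types.DataTypes;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.time.Duration;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

@Slf4j
@Service
public class TableService {
    
    // Warehouse root path (placeholder; replace with the real location)
    private static final String WAREHOUSE_PATH = "/path/to/paimon";
    
    // Matches type strings such as "DECIMAL(10, 2)"
    private static final Pattern DECIMAL_PATTERN = Pattern.compile(
        "DECIMAL\\s*\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)");
    
    @Autowired
    private TableMetadataRepository tableMetadataRepository;
    
    // Spring Boot auto-configures an ObjectMapper bean
    @Autowired
    private ObjectMapper jacksonObjectMapper;
    
    private final LoadingCache<String, Catalog> catalogCache;
    
    public TableService() {
        this.catalogCache = CacheBuilder.newBuilder()
            .maximumSize(10)
            .expireAfterWrite(Duration.ofHours(1))
            .build(new CacheLoader<String, Catalog>() {
                @Override
                public Catalog load(String path) {
                    return createCatalog(path);
                }
            });
    }
    
    /**
     * List all tables in a database
     */
    public List<String> listTables(String database) throws Exception {
        var catalog = getCatalog();
        return catalog.listTables(database);
    }
    
    /**
     * Get table metadata; returns null for unknown or soft-deleted tables
     */
    public Map<String, Object> getTable(String database, String table) throws Exception {
        var metadata = tableMetadataRepository
            .findByDatabaseAndTable(database, table);
        
        if (metadata == null || "DELETED".equals(metadata.getStatus())) {
            return null;
        }
        
        // LinkedHashMap instead of Map.of: schema and properties may be
        // null, and Map.of rejects null values.
        Map<String, Object> result = new LinkedHashMap<>();
        result.put("database", metadata.getDatabase());
        result.put("table", metadata.getTable());
        result.put("schema", metadata.getSchema());
        result.put("location", metadata.getLocation());
        result.put("owner", metadata.getOwner());
        result.put("createdAt", metadata.getCreatedAt());
        result.put("updatedAt", metadata.getUpdatedAt());
        result.put("properties", metadata.getProperties());
        return result;
    }
    
    /**
     * Create a table
     */
    @SuppressWarnings("unchecked")
    public void createTable(String database, String table, 
                           Map<String, Object> request) throws Exception {
        log.info("Creating table: {}.{}", database, table);
        
        // Validate the request
        validateCreateTableRequest(request);
        
        // Build the schema. "primaryKeys" is nested under "schema" in the
        // request body documented in section 3.2; "partitionKeys" and
        // "properties" are top-level fields.
        Map<String, Object> schemaMap = (Map<String, Object>) request.get("schema");
        List<DataField> fields = buildFields(schemaMap);
        List<String> primaryKeys = (List<String>) 
            schemaMap.getOrDefault("primaryKeys", new ArrayList<>());
        List<String> partitionKeys = (List<String>) 
            request.getOrDefault("partitionKeys", new ArrayList<>());
        Map<String, String> properties = 
            (Map<String, String>) request.getOrDefault("properties", new HashMap<>());
        
        // Schema.Builder takes columns one at a time; table properties are
        // passed as schema options.
        Schema.Builder builder = Schema.newBuilder();
        for (DataField field : fields) {
            builder.column(field.name(), field.type(), field.description());
        }
        Schema schema = builder
            .primaryKey(primaryKeys)
            .partitionKeys(partitionKeys)
            .options(properties)
            .build();
        
        // Get the Paimon catalog
        var catalog = getCatalog();
        
        try {
            // Create the table; fail if it already exists
            catalog.createTable(Identifier.create(database, table), schema, false);
            
            // Save the metadata to MySQL. This is a second, separate write:
            // if it fails, the table already exists in Paimon.
            TableMetadata metadata = new TableMetadata();
            metadata.setDatabase(database);
            metadata.setTable(table);
            // Store the schema as submitted; Paimon's Schema class is not a
            // Jackson-friendly bean.
            metadata.setSchema(jacksonObjectMapper.writeValueAsString(schemaMap));
            metadata.setLocation(request.getOrDefault("location", "").toString());
            metadata.setOwner((String) request.getOrDefault("owner", "admin"));
            metadata.setCreatedAt(System.currentTimeMillis());
            metadata.setUpdatedAt(System.currentTimeMillis());
            metadata.setStatus("ACTIVE");
            metadata.setProperties(jacksonObjectMapper
                .writeValueAsString(properties));
            
            tableMetadataRepository.save(metadata);
            
            log.info("Table created successfully: {}.{}", database, table);
        } catch (Exception e) {
            log.error("Failed to create table: {}.{}", database, table, e);
            throw e;
        }
    }
    
    /**
     * Drop a table
     */
    public void dropTable(String database, String table) throws Exception {
        log.info("Dropping table: {}.{}", database, table);
        
        try {
            var catalog = getCatalog();
            catalog.dropTable(Identifier.create(database, table), false);
            
            // Mark the metadata row as deleted (soft delete)
            var metadata = tableMetadataRepository
                .findByDatabaseAndTable(database, table);
            if (metadata != null) {
                metadata.setStatus("DELETED");
                metadata.setUpdatedAt(System.currentTimeMillis());
                tableMetadataRepository.save(metadata);
            }
            
            log.info("Table dropped successfully: {}.{}", database, table);
        } catch (Exception e) {
            log.error("Failed to drop table: {}.{}", database, table, e);
            throw e;
        }
    }
    
    /**
     * Count the active tables in a database
     */
    public long getTableCount(String database) {
        return tableMetadataRepository.countByDatabaseAndStatus(database, "ACTIVE");
    }
    
    // Helper methods
    
    private void validateCreateTableRequest(Map<String, Object> request) {
        if (!(request.get("name") instanceof String)) {
            throw new IllegalArgumentException("Field 'name' is required");
        }
        Object schema = request.get("schema");
        if (!(schema instanceof Map)
                || !(((Map<?, ?>) schema).get("fields") instanceof List)) {
            throw new IllegalArgumentException("Field 'schema.fields' is required");
        }
    }
    
    @SuppressWarnings("unchecked")
    private List<DataField> buildFields(Map<String, Object> schemaMap) {
        List<DataField> fields = new ArrayList<>();
        List<Map<String, Object>> fieldsList = 
            (List<Map<String, Object>>) schemaMap.get("fields");
        
        int index = 0;
        for (Map<String, Object> field : fieldsList) {
            String name = (String) field.get("name");
            String typeStr = (String) field.get("type");
            // Columns are nullable unless the request says "nullable": false
            boolean nullable = !Boolean.FALSE.equals(field.get("nullable"));
            
            // Apply the requested nullability instead of discarding it
            DataType type = parseDataType(typeStr).copy(nullable);
            fields.add(new DataField(index++, name, type, 
                (String) field.get("comment")));
        }
        
        return fields;
    }
    
    private DataType parseDataType(String typeStr) {
        if (typeStr == null) {
            throw new IllegalArgumentException("Column type is required");
        }
        String normalized = typeStr.trim().toUpperCase(Locale.ROOT);
        switch (normalized) {
            case "INT":
                return DataTypes.INT();
            case "BIGINT":
                return DataTypes.BIGINT();
            case "STRING":
                return DataTypes.STRING();
            case "DOUBLE":
                return DataTypes.DOUBLE();
            case "DECIMAL":
                // No arguments given: keep the example's default of (10, 2)
                return DataTypes.DECIMAL(10, 2);
            default:
                break;
        }
        // Parse "DECIMAL(precision, scale)"
        Matcher decimal = DECIMAL_PATTERN.matcher(normalized);
        if (decimal.matches()) {
            return DataTypes.DECIMAL(
                Integer.parseInt(decimal.group(1)),
                Integer.parseInt(decimal.group(2)));
        }
        // ... handle other types here
        // Fail loudly rather than silently storing an unknown type as a string
        throw new IllegalArgumentException("Unsupported data type: " + typeStr);
    }
    
    private static Catalog createCatalog(String warehousePath) {
        // CatalogFactory exposes static factory methods; it is not a Spring bean
        return CatalogFactory.createCatalog(
            CatalogContext.create(new Path(warehousePath)));
    }
    
    private Catalog getCatalog() throws Exception {
        return catalogCache.getUnchecked(WAREHOUSE_PATH);
    }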
}
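
`createTable` above writes to two systems in sequence: first the Paimon catalog, then the MySQL metadata table. If the second write fails, the table exists in Paimon with no metadata row. One common way to narrow that gap is a compensating action that undoes the first step when the second one fails. The sketch below shows the control flow in isolation. The names `TwoStepCreate` and `Step` are hypothetical, and a compensation like this reduces the inconsistency window but does not make the two writes atomic.

```java
// Hypothetical helper illustrating "create, then persist, else roll back".
final class TwoStepCreate {

    // A step that may fail with a checked exception.
    interface Step {
        void run() throws Exception;
    }

    static void run(Step createInCatalog, Step saveMetadata, Step dropFromCatalog)
            throws Exception {
        // Step 1: if this fails, nothing was created, so nothing to undo.
        createInCatalog.run();
        try {
            // Step 2: persist the bookkeeping row.
            saveMetadata.run();
        } catch (Exception saveError) {
            try {
                // Compensation: undo step 1.
                dropFromCatalog.run();
            } catch (Exception rollbackError) {
                // Keep the original failure as the primary error and
                // attach the rollback failure to it.
                saveError.addSuppressed(rollbackError);
            }
            throw saveError;
        }
    }
}
```

In the service, the three steps would wrap `catalog.createTable(...)`, `tableMetadataRepository.save(...)` and `catalog.dropTable(...)` respectively.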

4.2 Caching and Performance Optimization

package com.example.paimon.catalog.cache;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import org.springframework.stereotype.Component;
import lombok.extern.slf4j.Slf4j;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

@Slf4j
@Component
public class CatalogCacheManager {
    
    /**
     * Schema cache: "database.table" → schema
     */
    private final LoadingCache<String, String> schemaCache = 
        CacheBuilder.newBuilder()
            .maximumSize(1000)
            .expireAfterWrite(1, TimeUnit.HOURS)
            .recordStats()
            .build(new CacheLoader<String, String>() {
                @Override
                public String load(String key) throws Exception {
                    // key format: "database.table"
                    String[] parts = key.split("\\.");
                    return loadSchemaFromPaimon(parts[0], parts[1]);
                }
            });
    
    /**
     * Partition cache: "database.table" → list of partitions
     */
    private final LoadingCache<String, List<String>> partitionCache = 
        CacheBuilder.newBuilder()
            .maximumSize(100)
            .expireAfterWrite(10, TimeUnit.MINUTES)
            .recordStats()
            .build(new CacheLoader<String, List<String>>() {
                @Override
                public List<String> load(String tableKey) throws Exception {
                    String[] parts = tableKey.split("\\.");
                    return loadPartitionsFromPaimon(parts[0], parts[1]);
                }
            });
    
    /**
     * Get a schema through the cache
     */
    public String getSchema(String database, String table) 
            throws ExecutionException {
        String key = database + "." + table;
        return schemaCache.get(key);
    }
    
    /**
     * Invalidate the cached schema
     */
    public void invalidateSchema(String database, String table) {
        String key = database + "." + table;
        schemaCache.invalidate(key);
        log.info("Invalidated schema cache for {}", key);
    }
    
    /**
     * Get the partition list through the cache
     */
    public List<String> getPartitions(String database, String table) 
            throws ExecutionException {
        String key = database + "." + table;
        return partitionCache.get(key);
    }
    
    /**
     * Invalidate the cached partition list (call this after adding a partition)
     */
    public void invalidatePartitions(String database, String table) {
        String key = database + "." + table;
        partitionCache.invalidate(key);
        log.info("Invalidated partition cache for {}", key);
    }
    
    /**
     * Get cache statistics
     */
    public Map<String, Object> getCacheStats() {
        return Map.of(
            "schema", Map.of(
                "hits", schemaCache.stats().hitCount(),
                "misses", schemaCache.stats().missCount(),
                "hitRate", schemaCache.stats().hitRate()
            ),
            "partitions", Map.of(
                "hits", partitionCache.stats().hitCount(),
                "misses", partitionCache.stats().missCount(),
                "hitRate", partitionCache.stats().hitRate()
            )
        );
    }
    
    private String loadSchemaFromPaimon(String database, String table) 
            throws Exception {
        // Read the schema from Paimon here
        log.debug("Loading schema for {}.{} from Paimon", database, table);
        // ... implementation details
        return "";
    }
    
    private List<String> loadPartitionsFromPaimon(String database, String table) 
            throws Exception {
        // Read the partition list from Paimon here
        log.debug("Loading partitions for {}.{} from Paimon", database, table);
        // ... implementation details
        return new ArrayList<>();
    }
}
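
The caches above use a `"database.table"` string as the key and split it on `.` inside the loader. That is compact, but two different pairs can produce the same string if a name ever contains a dot. A hedged alternative is a small value class used as the cache key, so the two parts never need to be re-parsed. `TableKey` is a hypothetical name introduced for this sketch.

```java
import java.util.Objects;

// Hypothetical composite cache key: keeps database and table separate.
final class TableKey {
    final String database;
    final String table;

    TableKey(String database, String table) {
        this.database = Objects.requireNonNull(database, "database");
        this.table = Objects.requireNonNull(table, "table");
    }

    // equals and hashCode compare both parts, which is what hash-based
    // caches and maps rely on for lookups.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof TableKey)) {
            return false;
        }
        TableKey that = (TableKey) other;
        return database.equals(that.database) && table.equals(that.table);
    }

    @Override
    public int hashCode() {
        return Objects.hash(database, table);
    }

    @Override
    public String toString() {
        return database + "." + table;
    }
}
```

With this key type the cache would be declared as `LoadingCache<TableKey, String>`, and the loader would read `key.database` and `key.table` directly instead of calling `split`.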

Part 5: Authentication and Authorization

5.1 JWT Authentication Filter

package com.example.paimon.catalog.security;

import io.jsonwebtoken.Jwts;
import io.jsonwebtoken.SignatureAlgorithm;
import io.jsonwebtoken.security.Keys;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import javax.crypto.SecretKey;
import javax.servlet.http.HttpServletRequest;
import java.nio.charset.StandardCharsets;
import java.util.Date;

@Component
public class JwtTokenProvider {
    
    // HS512 requires a key of at least 64 bytes (512 bits). The placeholder
    // default below is shorter, so jjwt rejects it at signing time: always
    // configure a real jwt.secret.
    @Value("${jwt.secret:your-secret-key-change-in-production}")
    private String jwtSecret;
    
    @Value("${jwt.expiration:86400000}")  // 24 hours
    private long jwtExpirationMs;
    
    // Derive the HMAC key with an explicit charset so the result does not
    // depend on the platform default encoding.
    private SecretKey signingKey() {
        return Keys.hmacShaKeyFor(jwtSecret.getBytes(StandardCharsets.UTF_8));
    }
    
    /**
     * Generate a token
     */
    public String generateToken(String username) {
        return Jwts.builder()
            .setSubject(username)
            .setIssuedAt(new Date())
            .setExpiration(new Date(System.currentTimeMillis() + jwtExpirationMs))
            .signWith(signingKey(), SignatureAlgorithm.HS512)
            .compact();
    }
    
    /**
     * Extract the token from the request
     */
    public String getTokenFromRequest(HttpServletRequest request) {
        String bearerToken = request.getHeader("Authorization");
        if (bearerToken != null && bearerToken.startsWith("Bearer ")) {
            return bearerToken.substring(7);
        }
        return null;
    }
    
    /**
     * Validate the token and return the username, or null if the token is
     * invalid or expired
     */
    public String getUsernameFromToken(String token) {
        try {
            return Jwts.parserBuilder()
                .setSigningKey(signingKey())
                .build()
                .parseClaimsJws(token)
                .getBody()
                .getSubject();
        } catch (Exception e) {
            return null;
        }
    }
}

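// Authentication filter. This class belongs in its own file
// (JwtAuthenticationFilter.java) with the imports below.
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.security.authentication.UsernamePasswordAuthenticationToken;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.web.filter.OncePerRequestFilter;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.util.ArrayList;
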
@Component
public class JwtAuthenticationFilter extends OncePerRequestFilter {
    
    @Autowired
    private JwtTokenProvider jwtTokenProvider;
    
    @Override
    protected void doFilterInternal(HttpServletRequest request, 
                                   HttpServletResponse response,
                                   FilterChain filterChain) 
            throws ServletException, IOException {
        try {
            String token = jwtTokenProvider.getTokenFromRequest(request);
            
            if (token != null) {
                String username = jwtTokenProvider.getUsernameFromToken(token);
                
                if (username != null) {
                    // 设置认证信息
                    SecurityContextHolder.getContext()
                        .setAuthentication(
                            new UsernamePasswordAuthenticationToken(
                                username, null, 
                                new ArrayList<>())
                        );
                }
            }
        } catch (Exception e) {
            logger.error("Cannot set user authentication", e);
        }
        
        filterChain.doFilter(request, response);
    }
}
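
The token provider above signs with HS512, which is an HMAC over the token's header and payload using SHA-512. To show what that involves without any JWT library, the sketch below signs and verifies a string with the JDK's `Mac` class and enforces a minimum key length of 64 bytes, the hash output size that RFC 7518 requires as a minimum for this algorithm. The class name `Hs512Sketch` and its method names are hypothetical. It illustrates the signature step only and is not a replacement for a JWT library.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;

// Hypothetical illustration of the HS512 signature step.
final class Hs512Sketch {
    // HS512 keys must be at least as long as the hash output: 512 bits.
    static final int MIN_KEY_BYTES = 64;

    static byte[] requireStrongSecret(String secret) {
        byte[] key = secret.getBytes(StandardCharsets.UTF_8);
        if (key.length < MIN_KEY_BYTES) {
            throw new IllegalArgumentException(
                "HS512 secret must be at least " + MIN_KEY_BYTES
                    + " bytes, got " + key.length);
        }
        return key;
    }

    // Returns the base64url-encoded (unpadded) HMAC-SHA512 of the input.
    static String sign(String signingInput, byte[] key)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA512");
        mac.init(new SecretKeySpec(key, "HmacSHA512"));
        byte[] digest = mac.doFinal(signingInput.getBytes(StandardCharsets.UTF_8));
        return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
    }

    // Recomputes the signature and compares in constant time.
    static boolean verify(String signingInput, String signature, byte[] key)
            throws GeneralSecurityException {
        byte[] expected = sign(signingInput, key).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = signature.getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(expected, actual);
    }
}
```

Checking the secret length once at startup turns a weak key into an immediate configuration error instead of a failure on the first login.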

5.2 Role-Based Access Control (RBAC)

package com.example.paimon.catalog.security;

import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.stereotype.Service;

@Service
public class AccessControlService {
    
    /**
     * Check whether the user may perform an operation on a table
     */
    public boolean hasTableAccess(String username, String database, 
                                 String table, String operation) {
        // Implement the permission check here
        // operation: CREATE, READ, UPDATE, DELETE
        
        // Example: read the permission configuration from the database
        // var permission = permissionRepository
        //     .findByUsernameAndResourceAndOperation(
        //         username, database + "." + table, operation);
        // return permission != null;
        
        return true;  // Placeholder: allows everything by default
    }
    
    /**
     * 检查用户是否是数据库管理员
     */
    public boolean isDatabaseAdmin(String username, String database) {
        // 实现管理员检查逻辑
        return true;  // 简化实现
    }
}

// Usage in a controller
@RestController
@RequestMapping("/apis/v1/{database}/tables")
public class TableController {
    
    @Autowired
    private AccessControlService accessControlService;
    
    @PostMapping
    public ResponseEntity<?> createTable(
            @PathVariable String database,
            @RequestBody Map<String, Object> request,
            Principal principal) {
        
        // principal is null when the request carried no valid token
        if (principal == null) {
            return ResponseEntity.status(401)
                .body(Map.of("error", "Authentication required"));
        }
        String username = principal.getName();
        
        // Check permission
        if (!accessControlService.hasTableAccess(
                username, database, 
                (String) request.get("name"), "CREATE")) {
            return ResponseEntity.status(403)
                .body(Map.of("error", "Permission denied"));
        }
        
        // ... table creation logic
    }
}
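
The permission lookup that `hasTableAccess` leaves as a commented-out placeholder could be sketched with an in-memory grant table. The class `InMemoryAccessControl`, its `grant` method, and the `database.table:OPERATION` key format are assumptions made for illustration; a real service would back this with the permission repository hinted at in the comments above. Unlike the placeholder, this sketch denies access unless an explicit grant exists.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a permission repository.
final class InMemoryAccessControl {

    // username -> set of "database.table:OPERATION" grants
    private final Map<String, Set<String>> grants = new ConcurrentHashMap<>();

    void grant(String username, String database, String table, String operation) {
        grants.computeIfAbsent(username, u -> ConcurrentHashMap.newKeySet())
              .add(key(database, table, operation));
    }

    // Deny by default: access is allowed only when an explicit grant exists.
    boolean hasTableAccess(String username, String database,
                           String table, String operation) {
        if (username == null || database == null
                || table == null || operation == null) {
            return false;
        }
        Set<String> userGrants = grants.get(username);
        return userGrants != null
            && userGrants.contains(key(database, table, operation));
    }

    private static String key(String database, String table, String operation) {
        return database + "." + table + ":" + operation;
    }
}
```

Deny-by-default is the safer starting point for a catalog service: a missing row in the permission store then results in a 403 rather than silent access.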

Part 6: Auditing and Monitoring

6.1 Operation Auditing

java
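package com.example.paimon.catalog.audit;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import com.example.paimon.catalog.model.AuditLog;
import com.example.paimon.catalog.repository.AuditLogRepository;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.extern.slf4j.Slf4j;
import java.util.Map;

@Slf4j
@Component
public class AuditLogger {
    
    @Autowired
    private AuditLogRepository auditLogRepository;
    
    private final ObjectMapper objectMapper = new ObjectMapper();
    
    /**
     * Record an audit log entry
     */
    public void log(String operator, String operation, String target, 
                   Map<String, Object> content, String status, 
                   String errorMsg) {
        try {
            // Named "entry" rather than "log" so it does not shadow the
            // SLF4J logger field that @Slf4j generates.
            AuditLog entry = new AuditLog();
            entry.setOperator(operator);
            entry.setOperation(operation);  // CREATE, UPDATE, DELETE
            entry.setTarget(target);  // database.table
            entry.setContent(convertToJson(content));
            entry.setTimestamp(System.currentTimeMillis());
            entry.setStatus(status);  // SUCCESS, FAILED
            entry.setErrorMsg(errorMsg);
            
            auditLogRepository.save(entry);
            
            log.info("Audit log recorded: {} {} {}", operator, operation, target);
        } catch (Exception e) {
            // Auditing must never break the business operation itself
            log.error("Failed to record audit log", e);
        }
    }
    
    /**
     * Shorthand: successful operation
     */
    public void logSuccess(String operator, String operation, String target,
                          Map<String, Object> content) {
        log(operator, operation, target, content, "SUCCESS", null);
    }
    
    /**
     * Shorthand: failed operation
     */
    public void logFailure(String operator, String operation, String target,
                          String errorMsg) {
        log(operator, operation, target, null, "FAILED", errorMsg);
    }
    
    /**
     * Serialize the content map to JSON; returns null when there is no content.
     * Assumes AuditLog stores its content as a String.
     */
    private String convertToJson(Map<String, Object> content) 
            throws JsonProcessingException {
        return content == null ? null : objectMapper.writeValueAsString(content);
    }
}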

// Automatic auditing with AspectJ annotations
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.reflect.MethodSignature;
import org.springframework.security.core.Authentication;
import org.springframework.security.core.context.SecurityContextHolder;

@Slf4j
@Aspect
@Component
public class AuditAspect {
    
    @Autowired
    private AuditLogger auditLogger;
    
    @Around("@annotation(com.example.paimon.catalog.audit.Auditable)")
    public Object auditOperation(ProceedingJoinPoint joinPoint) 
            throws Throwable {
        
        MethodSignature signature = (MethodSignature) joinPoint.getSignature();
        Auditable auditable = signature.getMethod()
            .getAnnotation(Auditable.class);
        
        String operator = getOperator();
        String operation = auditable.operation();
        String target = extractTarget(joinPoint);
        
        try {
            Object result = joinPoint.proceed();
            auditLogger.logSuccess(operator, operation, target, null);
            return result;
        } catch (Exception e) {
            auditLogger.logFailure(operator, operation, target, 
                e.getMessage());
            throw e;
        }
    }
    
    private String getOperator() {
        Authentication auth = SecurityContextHolder.getContext()
            .getAuthentication();
        return auth != null ? auth.getName() : "UNKNOWN";
    }
    
    // Assumes the first two arguments of an audited method are the
    // database name and the table name.
    private String extractTarget(ProceedingJoinPoint joinPoint) {
        Object[] args = joinPoint.getArgs();
        if (args.length >= 2) {
            // String.valueOf tolerates null arguments
            return String.valueOf(args[0]) + "." + String.valueOf(args[1]);
        }
        return "UNKNOWN";
    }
}

// Applying the annotation
@Auditable(operation = "CREATE")
public void createTable(...) {
    // ...
}
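
The aspect above reads an `Auditable` annotation that the listing never defines. A minimal sketch of what it might look like follows, assuming it only needs the `operation` attribute that `AuditAspect` reads. In the project it would be a public annotation in `com.example.paimon.catalog.audit`; `AuditableDemo` and `operationOf` are hypothetical helpers that only show the annotation is visible through reflection at runtime.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// RUNTIME retention is required: with the default CLASS retention,
// getAnnotation(Auditable.class) in the aspect would return null.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Auditable {
    String operation();
}

// Hypothetical demo class, not part of the original service.
final class AuditableDemo {

    @Auditable(operation = "CREATE")
    void createTable(String database, String table) {
        // table creation would happen here
    }

    // Reads the operation name the same way AuditAspect does.
    static String operationOf(String methodName) {
        try {
            Method m = AuditableDemo.class.getDeclaredMethod(
                methodName, String.class, String.class);
            Auditable a = m.getAnnotation(Auditable.class);
            return a == null ? null : a.operation();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("No such method: " + methodName, e);
        }
    }
}
```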

6.2 Performance Monitoring

java
package com.example.paimon.catalog.monitor;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;

@RestController
@RequestMapping("/metrics")
public class MetricsController {
    
    // Injected as a field: Spring MVC does not resolve @Autowired on
    // handler-method parameters.
    @Autowired
    private CatalogCacheManager cacheManager;
    
    /**
     * Get the system health status
     * GET /metrics/health
     */
    @GetMapping("/health")
    public ResponseEntity<?> health() {
        return ResponseEntity.ok(Map.of(
            "status", "UP",
            "timestamp", System.currentTimeMillis(),
            "uptime", getUptime(),
            "version", "1.0.0"
        ));
    }
    
    /**
     * Get performance metrics
     * GET /metrics/performance
     */
    @GetMapping("/performance")
    public ResponseEntity<?> getPerformanceMetrics() {
        Runtime runtime = Runtime.getRuntime();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        
        return ResponseEntity.ok(Map.of(
            "memory", Map.of(
                "total", runtime.totalMemory() / 1024 / 1024 + "MB",
                "used", (runtime.totalMemory() - runtime.freeMemory()) 
                    / 1024 / 1024 + "MB",
                "free", runtime.freeMemory() / 1024 / 1024 + "MB"
            ),
            "threads", Map.of(
                "count", threads.getThreadCount(),
                // Peak live thread count since JVM start (the original
                // reported the current count here)
                "peakCount", threads.getPeakThreadCount()
            ),
            "gc", Map.of(
                "collections", getGCCollectionCount(),
                "time", getGCTime() + "ms"
            )
        ));
    }
    
    /**
     * Get cache statistics
     * GET /metrics/cache
     */
    @GetMapping("/cache")
    public ResponseEntity<?> getCacheMetrics() {
        return ResponseEntity.ok(cacheManager.getCacheStats());
    }
    
    private long getUptime() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }
    
    private long getGCCollectionCount() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
            .mapToLong(gc -> gc.getCollectionCount())
            .filter(count -> count >= 0)  // -1 means "undefined" for that collector
            .sum();
    }
    
    private long getGCTime() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
            .mapToLong(gc -> gc.getCollectionTime())
            .filter(time -> time >= 0)  // -1 means "undefined" for that collector
            .sum();
    }
}

Part 7: Production Deployment and Configuration

7.1 Application Configuration File

yaml
# application.yml
spring:
  application:
    name: paimon-rest-catalog
  datasource:
    url: jdbc:mysql://localhost:3306/paimon_catalog?characterEncoding=utf8
    username: root
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver
  
  jpa:
    hibernate:
      ddl-auto: validate
    properties:
      hibernate:
        dialect: org.hibernate.dialect.MySQL8Dialect
        format_sql: true
        show_sql: false
        jdbc:
          batch_size: 50
          fetch_size: 100

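# Paimon configuration
paimon:
  warehouse: hdfs:///warehouse
  default-database: default
  catalog-type: filesystem
  
# Cache configuration
cache:
  schema:
    ttl-hours: 1
    max-size: 1000
  partition:
    ttl-minutes: 10
    max-size: 100

# JWT configuration
# The value after the colon is only a placeholder fallback; set JWT_SECRET in every real deployment
jwt:
  secret: ${JWT_SECRET:your-secret-key-very-long-string-for-production}
  expiration: 86400000  # 24 hours, in milliseconds

# Logging configuration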
logging:
  level:
    root: INFO
    com.example.paimon: DEBUG
  file:
    name: logs/paimon-catalog.log
  pattern:
    file: "%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n"

server:
  port: 8080
  servlet:
    context-path: /
  compression:
    enabled: true
    min-response-size: 1024

7.2 Docker Deployment

dockerfile
# Dockerfile
FROM openjdk:11-jre-slim

WORKDIR /app

# Copy the JAR file
COPY target/paimon-rest-catalog-1.0.0.jar app.jar

# Expose the service port
EXPOSE 8080

# Start the application
ENTRYPOINT ["java", "-Xms1g", "-Xmx2g", "-jar", "app.jar"]

yaml
# docker-compose.yml
version: '3.8'

services:
  mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: password
      MYSQL_DATABASE: paimon_catalog
    ports:
      - "3306:3306"
    volumes:
      - mysql-data:/var/lib/mysql

  paimon-catalog:
    build: .
    ports:
      - "8080:8080"
    environment:
      SPRING_DATASOURCE_URL: jdbc:mysql://mysql:3306/paimon_catalog
      SPRING_DATASOURCE_USERNAME: root
      SPRING_DATASOURCE_PASSWORD: password
      PAIMON_WAREHOUSE: hdfs:///warehouse
      JWT_SECRET: your-production-secret-key
    depends_on:
      - mysql
    volumes:
      - ./logs:/app/logs

volumes:
  mysql-data:

Part 8: Client Integration

8.1 Java Client Example

java
package com.example.paimon.catalog.client;

import okhttp3.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.AllArgsConstructor;
import lombok.Data;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class PaimonCatalogClient {
    
    private final String baseUrl;
    private final String token;
    private final OkHttpClient httpClient;
    private final ObjectMapper objectMapper;
    
    public PaimonCatalogClient(String baseUrl, String token) {
        this.baseUrl = baseUrl;
        this.token = token;
        this.httpClient = new OkHttpClient.Builder()
            .connectTimeout(10, TimeUnit.SECONDS)
            .readTimeout(30, TimeUnit.SECONDS)
            .build();
        this.objectMapper = new ObjectMapper();
    }
    
    /**
     * Create a table
     */
    public void createTable(String database, String table, 
                           TableDefinition definition) throws Exception {
        String url = String.format("%s/apis/v1/%s/tables", 
            baseUrl, database);
        
        String body = objectMapper.writeValueAsString(Map.of(
            "name", table,
            "schema", definition.getSchema(),
            "partitionKeys", definition.getPartitionKeys(),
            "properties", definition.getProperties()
        ));
        
        Request request = new Request.Builder()
            .url(url)
            .addHeader("Authorization", "Bearer " + token)
            .addHeader("Content-Type", "application/json")
            .post(RequestBody.create(body, MediaType.get("application/json")))
            .build();
        
        try (Response response = httpClient.newCall(request).execute()) {
            if (!response.isSuccessful()) {
                throw new RuntimeException(
                    "Failed to create table: " + response.body().string());
            }
        }
    }
    
    /**
     * Get a table's schema
     */
    public Map<String, Object> getTableSchema(String database, 
                                              String table) throws Exception {
        String url = String.format(
            "%s/apis/v1/%s/%s/schema", baseUrl, database, table);
        
        Request request = new Request.Builder()
            .url(url)
            .addHeader("Authorization", "Bearer " + token)
            .get()
            .build();
        
        try (Response response = httpClient.newCall(request).execute()) {
            if (response.isSuccessful()) {
                return objectMapper.readValue(
                    response.body().string(), Map.class);
            } else {
                throw new RuntimeException(
                    "Failed to get schema: " + response.code());
            }
        }
    }
    
    /**
     * List partitions
     */
    public List<String> getPartitions(String database, String table,
                                     int limit) throws Exception {
        String url = String.format(
            "%s/apis/v1/%s/%s/partitions?limit=%d", 
            baseUrl, database, table, limit);
        
        Request request = new Request.Builder()
            .url(url)
            .addHeader("Authorization", "Bearer " + token)
            .get()
            .build();
        
        try (Response response = httpClient.newCall(request).execute()) {
            if (response.isSuccessful()) {
                Map<String, Object> result = objectMapper.readValue(
                    response.body().string(), Map.class);
                return (List<String>) result.get("partitions");
            } else {
                throw new RuntimeException(
                    "Failed to get partitions: " + response.code());
            }
        }
    }
}

// Table definition
@Data
@AllArgsConstructor
public class TableDefinition {
    private Map<String, Object> schema;
    private List<String> partitionKeys;
    private Map<String, String> properties;
}

// Usage example
public class Example {
    public static void main(String[] args) throws Exception {
        PaimonCatalogClient client = 
            new PaimonCatalogClient("http://localhost:8080", 
                "your-jwt-token");
        
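        TableDefinition definition = new TableDefinition(
            Map.of(
                "fields", List.of(
                    Map.of("name", "id", "type", "BIGINT"),
                    Map.of("name", "name", "type", "STRING"),
                    // The partition key must also be declared as a field
                    Map.of("name", "dt", "type", "STRING")
                ),
                // With a fixed bucket count, the primary key should
                // include the partition key
                "primaryKeys", List.of("id", "dt")
            ),
            List.of("dt"),
            Map.of("bucket", "16", "merge-engine", "deduplicate")
        );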
        
        client.createTable("my_db", "my_table", definition);
        
        var schema = client.getTableSchema("my_db", "my_table");
        System.out.println("Schema: " + schema);
        
        var partitions = client.getPartitions("my_db", "my_table", 100);
        System.out.println("Partitions: " + partitions);
    }
}

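8.2 Spark Integration

scala
// Spark has no CREATE CATALOG ... WITH statement (that is Flink SQL); a Paimon
// catalog is registered through Spark configuration instead.
// Assumptions: a Paimon release that ships the REST catalog client
// ('metastore' = 'rest'), and a service that speaks Paimon's official REST
// catalog protocol rather than only the custom endpoints built in this chapter.
// Authentication options depend on the Paimon version and are omitted here.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("paimon-rest-catalog-example")
  .config("spark.sql.catalog.paimon", "org.apache.paimon.spark.SparkCatalog")
  .config("spark.sql.catalog.paimon.metastore", "rest")
  .config("spark.sql.catalog.paimon.uri", "http://localhost:8080")
  .getOrCreate()

// Use the remote catalog
spark.sql("USE paimon.my_db")

val df = spark.sql("SELECT * FROM my_table LIMIT 100")
df.show()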

Part 9: FAQ and Best Practices

Q1: How do you handle concurrent write conflicts?

java
/**
 * Handle concurrent conflicts with optimistic locking
 */
@Entity
@Table(name = "paimon_tables")
public class TableMetadata {
    
    @Version
    private Long version;  // optimistic-lock version
    
    // ... other fields
}

// Conflict detection is automatic: if the version no longer matches, JPA throws an exception
try {
    tableMetadataRepository.save(metadata);
} catch (OptimisticLockingFailureException e) {
    log.warn("Concurrent modification detected, retrying...");
    // Retry logic
}
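
The "retry logic" left open above could be sketched as a small bounded-retry helper. `OptimisticRetry` and `withRetry` are hypothetical names; the helper is written against plain `RuntimeException` types so it stays self-contained, and in the service the conflict type would be `OptimisticLockingFailureException.class`.

```java
import java.util.function.Supplier;

// Hypothetical helper: retries an action a bounded number of times when it
// fails with a conflict-type exception.
final class OptimisticRetry {

    static <T> T withRetry(int maxAttempts,
                           Class<? extends RuntimeException> conflictType,
                           Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // The action should re-read the entity on every attempt so
                // that it works against the latest version.
                return action.get();
            } catch (RuntimeException e) {
                if (!conflictType.isInstance(e)) {
                    throw e;  // not a conflict: do not retry
                }
                last = e;
            }
        }
        throw last;  // every attempt hit a conflict
    }
}
```

The important detail is that the retried action reloads the metadata row, reapplies the change, and saves again; retrying `save` with the same stale object would fail with the same version mismatch each time.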

Q2: How do you optimize partition queries on large tables?

java
/**
 * Page through large partition lists
 */
@GetMapping("/{table}/partitions")
public ResponseEntity<?> getPartitions(
        @PathVariable String database,
        @PathVariable String table,
        @RequestParam(defaultValue = "0") int page,
        @RequestParam(defaultValue = "100") int size) {
    
    Pageable pageable = PageRequest.of(page, size);
    Page<String> partitions = schemaService
        .getPartitionsPaged(database, table, pageable);
    
    return ResponseEntity.ok(Map.of(
        "partitions", partitions.getContent(),
        "totalElements", partitions.getTotalElements(),
        "totalPages", partitions.getTotalPages(),
        "currentPage", page
    ));
}
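
The controller delegates to `getPartitionsPaged`, which is not shown. As a rough illustration of the arithmetic behind the `page`, `size`, and `totalPages` values it returns, here is a sketch that slices an in-memory list. `PageSlice` is a hypothetical helper; a production service would normally push the offset and limit down to the metadata store instead of loading every partition.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper showing zero-based page slicing.
final class PageSlice {

    // Returns the elements of the given zero-based page, or an empty list
    // when the page lies beyond the end of the data.
    static <T> List<T> page(List<T> all, int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size >= 1");
        }
        long from = (long) page * size;  // long avoids int overflow
        if (from >= all.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + size, all.size());
        return all.subList((int) from, to);
    }

    // Ceiling division: number of pages needed for totalElements items.
    static int totalPages(int totalElements, int size) {
        if (totalElements < 0 || size < 1) {
            throw new IllegalArgumentException("invalid totalElements or size");
        }
        return (int) (((long) totalElements + size - 1) / size);
    }
}
```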

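Q3: How do you manage secrets securely?

yaml
# Option 1: use environment variables
spring:
  datasource:
    password: ${DB_PASSWORD}  # read from an environment variable
    
jwt:
  secret: ${JWT_SECRET}  # read from an environment variable

# Option 2 (an alternative configuration, not the same file): integrate with Vault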
spring:
  cloud:
    vault:
      uri: http://localhost:8200
      token: ${VAULT_TOKEN}
      kv:
        backend: secret
        version: 2

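Part 10: Performance Optimization Recommendations

Optimization Checklist

Cache optimization:
- [x] Enable the schema cache (TTL 1 hour)
- [x] Enable the partition cache (TTL 10 minutes)
- [x] Use a CacheLoader for automatic loading
- [x] Monitor the cache hit rate

Database optimization:
- [x] Add a composite index on database + table
- [x] Configure the database connection pool size
- [x] Enable batch operations
- [x] Analyze table structure periodically

API optimization:
- [x] Enable HTTP compression
- [x] Paginate query results
- [x] Add query timeout control
- [x] Process long-running operations asynchronously

Security optimization:
- [x] Enable HTTPS
- [x] Rotate the JWT secret regularly
- [x] Implement rate limiting
- [x] Enable audit logging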
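
Of the checklist items, rate limiting is the only one this chapter has no code for. A token bucket is one common way to implement it; the sketch below is a minimal, hypothetical version (`TokenBucket` is not part of the original service). The clock is passed in as a parameter so the behavior is easy to test; callers would normally pass `System.nanoTime()`.

```java
// Hypothetical token-bucket limiter: allows bursts up to `capacity` and a
// sustained rate of `refillPerSecond` requests per second.
final class TokenBucket {

    private final long capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double refillPerSecond, long nowNanos) {
        if (capacity < 1 || refillPerSecond <= 0) {
            throw new IllegalArgumentException("capacity and rate must be positive");
        }
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;  // start with a full bucket
        this.lastRefillNanos = nowNanos;
    }

    // Returns true and consumes one token if the request is allowed.
    synchronized boolean tryAcquire(long nowNanos) {
        long elapsed = Math.max(0, nowNanos - lastRefillNanos);
        double refill = (elapsed / 1_000_000_000.0) * refillPerSecond;
        tokens = Math.min(capacity, tokens + refill);
        lastRefillNanos = Math.max(lastRefillNanos, nowNanos);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In the service, one bucket per user or client IP could be kept in a `ConcurrentHashMap`, with a servlet filter returning HTTP 429 whenever `tryAcquire` returns false.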

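Summary

This architecture supports:

  • Centralized authentication and authorization
  • Metadata caching and performance optimization
  • Cross-cluster data synchronization
  • Complete auditing and tracing
  • Horizontal scalability

Next chapter: Chapter 20 covers performance testing and benchmarking.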
