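Exploring Paimon, the Streaming Data Lake (19): Developing a Custom REST Catalog Service

Chapter 19: Developing a Custom REST Catalog Service

Introduction: Building a Cross-System Metadata Service

The previous chapters covered Paimon's Catalog system. In distributed environments, however, metadata often has to be managed across clusters and across clouds. REST Catalog is designed for exactly this: it exposes a metadata service over HTTP so that remote systems can access and manage Paimon tables.

This chapter walks through building a production-grade REST Catalog service from scratch.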


Part 1: REST Catalog Architecture Design

1.1 What a REST Catalog Is For

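Traditional catalog (filesystem):
Spark job    → accesses the HDFS catalog directly
Flink job    → accesses the HDFS catalog directly
Presto query → accesses the HDFS catalog directly

Problems:
├─ Heavy network I/O (every operation goes to HDFS)
├─ Cannot span clouds or clusters
├─ No central control over metadata access
└─ No auditing or security controls

REST Catalog approach:
REST Catalog Service (central service)
    ├─ Maintains a metadata cache
    ├─ Centralized authentication and authorization
    ├─ Audit logging
    └─ Performance optimization

Spark  → REST API → Service → underlying storage
Flink  → REST API → Service → underlying storage
Presto → REST API → Service → underlying storage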

1.2 Core REST Catalog APIs

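These are the key APIs the service in this chapter implements. The API design is the chapter's own. Newer Paimon releases define their own REST Catalog API specification, and a server must follow that specification if Paimon's built-in REST client is to talk to it.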

Core CRUD operations:
├─ Databases: GET /apis/v1/databases, POST /apis/v1/databases (name in body), GET|DELETE /apis/v1/databases/{db}
├─ Tables: GET /apis/v1/{db}/tables, POST /apis/v1/{db}/tables (name in body), GET|PUT|DELETE /apis/v1/{db}/tables/{table}
├─ Schema: GET /apis/v1/{db}/{table}/schema, POST /apis/v1/{db}/{table}/schema
└─ Statistics: GET /apis/v1/{db}/{table}/stats

Table-level operations:
├─ Partitions: GET /apis/v1/{db}/{table}/partitions
├─ Snapshots: GET /apis/v1/{db}/{table}/snapshots
└─ Cleanup: DELETE /apis/v1/{db}/{table}/partitions/{pt}
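The endpoints above are plain URL templates. The sketch below shows one way a client could build them consistently, using a hypothetical `CatalogPaths` helper whose paths follow the controllers in Part 3. It assumes database and table names are restricted to letters, digits and underscores. That restriction is an assumption of this example, and it lets names go into the path without URL-encoding.

```java
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical client-side helper that builds the endpoint paths listed above. */
final class CatalogPaths {
    private static final String BASE = "/apis/v1";
    // Assumption: names are plain identifiers, so they can appear in a path without URL-encoding
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    private static final Set<String> TABLE_RESOURCES =
        Set.of("schema", "stats", "partitions", "snapshots");

    private CatalogPaths() {}

    private static String checkName(String name) {
        if (name == null || !IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Invalid identifier: " + name);
        }
        return name;
    }

    /** GET (list) / POST (create) databases */
    static String databases() {
        return BASE + "/databases";
    }

    /** GET / DELETE one database */
    static String database(String db) {
        return databases() + "/" + checkName(db);
    }

    /** GET (list) / POST (create) tables in a database */
    static String tables(String db) {
        return BASE + "/" + checkName(db) + "/tables";
    }

    /** GET / PUT / DELETE one table */
    static String table(String db, String table) {
        return tables(db) + "/" + checkName(table);
    }

    /** Per-table resources: schema, stats, partitions, snapshots */
    static String tableResource(String db, String table, String resource) {
        if (!TABLE_RESOURCES.contains(resource)) {
            throw new IllegalArgumentException("Unknown table resource: " + resource);
        }
        return BASE + "/" + checkName(db) + "/" + checkName(table) + "/" + resource;
    }
}
```

Rejecting unexpected names on the client stops a name such as `a/b` from silently addressing a different resource. The server should apply the same check.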

Part 2: Building the REST Catalog Service Framework

2.1 Project Dependencies

<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>paimon-rest-catalog</artifactId>
    <version>1.0.0</version>

    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
        <paimon.version>0.9.0</paimon.version>
        <spring-boot.version>2.7.0</spring-boot.version>
    </properties>

    <dependencies>
        <!-- Paimon Core -->
        <dependency>
            <groupId>org.apache.paimon</groupId>
            <artifactId>paimon-core</artifactId>
            <version>${paimon.version}</version>
        </dependency>

        <!-- Spring Boot Web -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
            <version>${spring-boot.version}</version>
        </dependency>

        <!-- Spring Boot Data JPA (metadata persistence) -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
            <version>${spring-boot.version}</version>
        </dependency>

        <!-- MySQL Driver -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.33</version>
        </dependency>

        <!-- Lombok -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.30</version>
            <scope>provided</scope>
        </dependency>

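        <!-- Spring Security (SecurityContextHolder, used by the JWT filter in Part 5) -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-security</artifactId>
            <version>${spring-boot.version}</version>
        </dependency>

        <!-- JJWT 0.11.x (Jwts.parserBuilder / Keys.hmacShaKeyFor used in Part 5).
             Jackson itself already comes with spring-boot-starter-web; pinning a
             different jackson-databind version here would mix Jackson versions. -->
        <dependency>
            <groupId>io.jsonwebtoken</groupId>
            <artifactId>jjwt-api</artifactId>
            <version>0.11.5</version>
        </dependency>
        <dependency>
            <groupId>io.jsonwebtoken</groupId>
            <artifactId>jjwt-impl</artifactId>
            <version>0.11.5</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>io.jsonwebtoken</groupId>
            <artifactId>jjwt-jackson</artifactId>
            <version>0.11.5</version>
            <scope>runtime</scope>
        </dependency>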

        <!-- Guava for Caching -->
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>32.0.0-jre</version>
        </dependency>

        <!-- Logging: SLF4J and Logback arrive transitively via spring-boot-starter-web
             (spring-boot-starter-logging). Spring Boot 2.7 is built against Logback 1.2.x
             and SLF4J 1.7, so do not pin logback-classic 1.4.x / slf4j-api 2.x here. -->
    </dependencies>
</project>

2.2 Core Data Model

package com.example.paimon.catalog.model;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import javax.persistence.*;

// Table metadata. (This listing shows three entities together; in a real project each
// public class goes in its own file, e.g. TableMetadata.java.)
@Entity
@Table(name = "paimon_tables",
       // JPA indexes are declared inside @Table, not on a field
       indexes = @Index(name = "idx_db_table", columnList = "db_name, table_name"))
@Data
@NoArgsConstructor
@AllArgsConstructor
public class TableMetadata {
    
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    
    // DATABASE, TABLE and SCHEMA are reserved words in MySQL, so the columns are renamed
    @Column(name = "db_name", nullable = false)
    private String database;
    
    @Column(name = "table_name", nullable = false)
    private String table;
    
    @Column(name = "schema_json", columnDefinition = "LONGTEXT")
    private String schema;  // Schema as JSON
    
    @Column(columnDefinition = "LONGTEXT")
    private String properties;  // Table properties as JSON
    
    @Column(nullable = false)
    private String location;  // HDFS or S3 path
    
    @Column(nullable = false)
    private String owner;
    
    @Column(nullable = false)
    private Long createdAt;
    
    @Column(nullable = false)
    private Long updatedAt;
    
    @Column(length = 50)
    private String status;  // ACTIVE, DELETED, ARCHIVED
}

// Column metadata
@Entity
@Table(name = "paimon_columns")
@Data
@NoArgsConstructor
@AllArgsConstructor
public class ColumnMetadata {
    
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    
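    // DATABASE and TABLE are reserved words in MySQL, so the columns are renamed
    @Column(name = "db_name", nullable = false)
    private String database;
    
    @Column(name = "table_name", nullable = false)
    private String table;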
    
    @Column(nullable = false)
    private String columnName;
    
    @Column(nullable = false)
    private String dataType;  // INT, BIGINT, STRING, etc.
    
    @Column(nullable = false)
    private Integer columnIndex;  // Column position
    
    @Column(columnDefinition = "TEXT")
    private String comment;
    
    private Boolean nullable;
    
    @Column(columnDefinition = "TEXT")
    private String defaultValue;
}

// Audit log entry
@Entity
@Table(name = "paimon_audit_logs")
@Data
@NoArgsConstructor
@AllArgsConstructor
public class AuditLog {
    
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    
    @Column(nullable = false)
    private String operator;  // Who performed the operation
    
    @Column(nullable = false)
    private String operation;  // CREATE, UPDATE, DELETE, ...
    
    @Column(nullable = false)
    private String target;  // database.table
    
    @Column(columnDefinition = "LONGTEXT")
    private String content;  // Operation details (JSON)
    
    @Column(nullable = false)
    private Long timestamp;
    
    private String status;  // SUCCESS, FAILED
    
    @Column(columnDefinition = "TEXT")
    private String errorMsg;
}
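`TableService` later calls `findByDatabaseAndTable` and `countByDatabaseAndStatus` on a `TableMetadataRepository` that this chapter does not show. With Spring Data JPA, this would typically be an interface extending `JpaRepository<TableMetadata, Long>`, and Spring derives the query from each method name. The sketch below is a JDK-only, in-memory stand-in that spells out the intended semantics: one row per database.table, with soft delete via the `status` field. It could also back unit tests. `TableRow` and `InMemoryTableRepository` are hypothetical names.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified stand-in for the TableMetadata entity (only the fields this sketch needs). */
final class TableRow {
    final String database;
    final String table;
    String status;  // ACTIVE, DELETED, ...

    TableRow(String database, String table, String status) {
        this.database = database;
        this.table = table;
        this.status = status;
    }
}

/** Hypothetical in-memory repository mirroring the derived query methods used by TableService. */
final class InMemoryTableRepository {
    // At most one row per database.table
    private final Map<String, TableRow> rows = new ConcurrentHashMap<>();

    private static String key(String database, String table) {
        // NUL separator, assumed never to occur in names, so names containing dots stay unambiguous
        return database + '\0' + table;
    }

    TableRow save(TableRow row) {
        rows.put(key(row.database, row.table), row);
        return row;
    }

    /** Returns null when absent, like a Spring Data method declared to return the entity type. */
    TableRow findByDatabaseAndTable(String database, String table) {
        return rows.get(key(database, table));
    }

    long countByDatabaseAndStatus(String database, String status) {
        return rows.values().stream()
            .filter(r -> r.database.equals(database) && Objects.equals(r.status, status))
            .count();
    }
}
```

After a soft delete, the row no longer counts as ACTIVE but can still be found by name. Any lookup that should ignore dropped tables therefore has to check `status` as well.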

Part 3: Implementing the REST API

3.1 Database Management API

package com.example.paimon.catalog.controller;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import com.example.paimon.catalog.service.DatabaseService;
import java.util.Map;

@RestController
@RequestMapping("/apis/v1/databases")
public class DatabaseController {
    
    @Autowired
    private DatabaseService databaseService;
    
    /**
     * List all databases
     * GET /apis/v1/databases
     */
    @GetMapping
    public ResponseEntity<?> listDatabases() {
        try {
            return ResponseEntity.ok(Map.of(
                "databases", databaseService.listDatabases(),
                "count", databaseService.getDatabaseCount()
            ));
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Get database details
     * GET /apis/v1/databases/{database}
     */
    @GetMapping("/{database}")
    public ResponseEntity<?> getDatabase(@PathVariable String database) {
        try {
            var db = databaseService.getDatabase(database);
            if (db == null) {
                return ResponseEntity.status(404)
                    .body(Map.of("error", "Database not found: " + database));
            }
            return ResponseEntity.ok(db);
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Create a database
     * POST /apis/v1/databases
     * 
     * Request body:
     * {
     *     "name": "my_db",
     *     "comment": "My database",
     *     "properties": {"key": "value"}
     * }
     */
    @PostMapping
    public ResponseEntity<?> createDatabase(@RequestBody Map<String, Object> request) {
        try {
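            String database = (String) request.get("name");
            if (database == null || database.isBlank()) {
                // Map.of rejects null values, so validate before building the response
                return ResponseEntity.badRequest()
                    .body(Map.of("error", "Field 'name' is required"));
            }
            String comment = (String) request.getOrDefault("comment", "");
            @SuppressWarnings("unchecked")
            Map<String, String> properties = (Map<String, String>)
                request.getOrDefault("properties", Map.of());
            
            databaseService.createDatabase(database, comment, properties);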
            
            return ResponseEntity.ok(Map.of(
                "status", "success",
                "database", database
            ));
        } catch (Exception e) {
            return ResponseEntity.status(400)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Drop a database
     * DELETE /apis/v1/databases/{database}
     */
    @DeleteMapping("/{database}")
    public ResponseEntity<?> dropDatabase(
            @PathVariable String database,
            @RequestParam(defaultValue = "false") boolean cascade) {
        try {
            databaseService.dropDatabase(database, cascade);
            return ResponseEntity.ok(Map.of(
                "status", "success",
                "database", database
            ));
        } catch (Exception e) {
            return ResponseEntity.status(400)
                .body(Map.of("error", e.getMessage()));
        }
    }
}
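One pitfall in these controllers: `Map.of` rejects null values, and `Exception#getMessage()` can return null. On the Java 11 target, for example, a JVM-thrown `NullPointerException` has no message, so `Map.of("error", e.getMessage())` can itself throw inside the catch block. Below is a JDK-only sketch of a null-safe error body. In Spring, this logic would usually live in a single `@RestControllerAdvice` class with `@ExceptionHandler` methods rather than in per-method try/catch blocks. `ErrorBodies` and its status mapping are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical helper for JSON error bodies that never contain null values. */
final class ErrorBodies {
    private ErrorBodies() {}

    /** Illustrative mapping from exception type to HTTP status code. */
    static int statusFor(Throwable e) {
        if (e instanceof IllegalArgumentException) {
            return 400;
        }
        if (e instanceof NoSuchElementException) {
            return 404;
        }
        if (e instanceof SecurityException) {
            return 403;
        }
        return 500;
    }

    /** Builds {"status": ..., "error": ...}, falling back to the class name when there is no message. */
    static Map<String, Object> of(Throwable e) {
        String message = e.getMessage() != null ? e.getMessage() : e.getClass().getSimpleName();
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("status", statusFor(e));
        body.put("error", message);
        return body;
    }
}
```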

3.2 Table Management API

package com.example.paimon.catalog.controller;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import com.example.paimon.catalog.service.TableService;
import java.util.Map;

@RestController
@RequestMapping("/apis/v1/{database}/tables")
public class TableController {
    
    @Autowired
    private TableService tableService;
    
    /**
     * List all tables in a database
     * GET /apis/v1/{database}/tables
     */
    @GetMapping
    public ResponseEntity<?> listTables(@PathVariable String database) {
        try {
            return ResponseEntity.ok(Map.of(
                "database", database,
                "tables", tableService.listTables(database),
                "count", tableService.getTableCount(database)
            ));
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Get table details
     * GET /apis/v1/{database}/tables/{table}
     */
    @GetMapping("/{table}")
    public ResponseEntity<?> getTable(
            @PathVariable String database,
            @PathVariable String table) {
        try {
            var tableInfo = tableService.getTable(database, table);
            if (tableInfo == null) {
                return ResponseEntity.status(404)
                    .body(Map.of("error", "Table not found"));
            }
            return ResponseEntity.ok(tableInfo);
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Create a table
     * POST /apis/v1/{database}/tables
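     * 
     * Partition keys must be declared as fields, and for a primary-key table
     * Paimon requires the primary key to include every partition key.
     * 
     * Request body:
     * {
     *     "name": "orders",
     *     "schema": {
     *         "fields": [
     *             {"name": "order_id", "type": "BIGINT", "nullable": false},
     *             {"name": "amount", "type": "DECIMAL(10, 2)", "nullable": true},
     *             {"name": "dt", "type": "STRING", "nullable": false}
     *         ],
     *         "primaryKeys": ["dt", "order_id"]
     *     },
     *     "partitionKeys": ["dt"],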
     *     "properties": {
     *         "bucket": "16",
     *         "merge-engine": "deduplicate"
     *     }
     * }
     */
    @PostMapping
    public ResponseEntity<?> createTable(
            @PathVariable String database,
            @RequestBody Map<String, Object> request) {
        try {
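            String table = (String) request.get("name");
            if (table == null || table.isBlank()) {
                return ResponseEntity.badRequest()
                    .body(Map.of("error", "Field 'name' is required"));
            }
            
            tableService.createTable(database, table, request);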
            
            return ResponseEntity.ok(Map.of(
                "status", "success",
                "database", database,
                "table", table
            ));
        } catch (Exception e) {
            return ResponseEntity.status(400)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Update table properties
     * PUT /apis/v1/{database}/tables/{table}
     */
    @PutMapping("/{table}")
    public ResponseEntity<?> alterTable(
            @PathVariable String database,
            @PathVariable String table,
            @RequestBody Map<String, Object> request) {
        try {
            tableService.alterTable(database, table, request);
            
            return ResponseEntity.ok(Map.of(
                "status", "success",
                "message", "Table altered successfully"
            ));
        } catch (Exception e) {
            return ResponseEntity.status(400)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Drop a table
     * DELETE /apis/v1/{database}/tables/{table}
     */
    @DeleteMapping("/{table}")
    public ResponseEntity<?> dropTable(
            @PathVariable String database,
            @PathVariable String table) {
        try {
            tableService.dropTable(database, table);
            
            return ResponseEntity.ok(Map.of(
                "status", "success",
                "database", database,
                "table", table
            ));
        } catch (Exception e) {
            return ResponseEntity.status(400)
                .body(Map.of("error", e.getMessage()));
        }
    }
}
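Paimon rejects schemas whose partition keys are not declared columns. For primary-key tables, it also requires the primary key to include every partition key. Checking both rules in the service returns a clean 400 instead of surfacing a catalog exception. Below is a JDK-only sketch that validates the request map shown in the `createTable` Javadoc. `CreateTableValidator` is a hypothetical name, and it checks only these rules plus duplicate column names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical pre-flight check for the create-table request body. */
final class CreateTableValidator {
    private CreateTableValidator() {}

    /** Returns the problems found; an empty list means the request passed these checks. */
    @SuppressWarnings("unchecked")
    static List<String> validate(Map<String, Object> request) {
        List<String> errors = new ArrayList<>();
        Object schemaObj = request.get("schema");
        if (!(schemaObj instanceof Map)) {
            errors.add("missing 'schema'");
            return errors;
        }
        Map<String, Object> schema = (Map<String, Object>) schemaObj;

        // Collect declared column names, flagging duplicates
        Set<String> columns = new HashSet<>();
        List<Map<String, Object>> fields = (List<Map<String, Object>>)
            schema.getOrDefault("fields", Collections.emptyList());
        for (Map<String, Object> field : fields) {
            if (!columns.add((String) field.get("name"))) {
                errors.add("duplicate column: " + field.get("name"));
            }
        }

        List<String> primaryKeys = (List<String>)
            schema.getOrDefault("primaryKeys", Collections.emptyList());
        List<String> partitionKeys = (List<String>)
            request.getOrDefault("partitionKeys", Collections.emptyList());
        for (String key : primaryKeys) {
            if (!columns.contains(key)) {
                errors.add("unknown primary key column: " + key);
            }
        }
        for (String key : partitionKeys) {
            if (!columns.contains(key)) {
                errors.add("unknown partition column: " + key);
            }
            // Primary-key tables must include every partition key in the primary key
            if (!primaryKeys.isEmpty() && !primaryKeys.contains(key)) {
                errors.add("primary key must include partition key: " + key);
            }
        }
        return errors;
    }
}
```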

3.3 Schema Management API

package com.example.paimon.catalog.controller;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import com.example.paimon.catalog.service.SchemaService;
import java.util.Map;

@RestController
@RequestMapping("/apis/v1/{database}/{table}")
public class SchemaController {
    
    @Autowired
    private SchemaService schemaService;
    
    /**
     * Get the table schema
     * GET /apis/v1/{database}/{table}/schema
     */
    @GetMapping("/schema")
    public ResponseEntity<?> getSchema(
            @PathVariable String database,
            @PathVariable String table) {
        try {
            var schema = schemaService.getSchema(database, table);
            return ResponseEntity.ok(Map.of(
                "database", database,
                "table", table,
                "schema", schema
            ));
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Alter the schema (add columns)
     * POST /apis/v1/{database}/{table}/schema
     * 
     * Request body:
     * {
     *     "operation": "add",
     *     "columns": [
     *         {"name": "new_col", "type": "STRING"}
     *     ]
     * }
     */
    @PostMapping("/schema")
    public ResponseEntity<?> alterSchema(
            @PathVariable String database,
            @PathVariable String table,
            @RequestBody Map<String, Object> request) {
        try {
            schemaService.alterSchema(database, table, request);
            
            return ResponseEntity.ok(Map.of(
                "status", "success",
                "message", "Schema updated successfully"
            ));
        } catch (Exception e) {
            return ResponseEntity.status(400)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * Get table statistics
     * GET /apis/v1/{database}/{table}/stats
     */
    @GetMapping("/stats")
    public ResponseEntity<?> getTableStats(
            @PathVariable String database,
            @PathVariable String table) {
        try {
            var stats = schemaService.getTableStats(database, table);
            return ResponseEntity.ok(stats);
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * List partitions (at most `limit`)
     * GET /apis/v1/{database}/{table}/partitions
     */
    @GetMapping("/partitions")
    public ResponseEntity<?> getPartitions(
            @PathVariable String database,
            @PathVariable String table,
            @RequestParam(defaultValue = "100") int limit) {
        try {
            var partitions = schemaService.getPartitions(database, table, limit);
            return ResponseEntity.ok(Map.of(
                "database", database,
                "table", table,
                "partitions", partitions
            ));
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
    
    /**
     * List snapshots (at most `limit`)
     * GET /apis/v1/{database}/{table}/snapshots
     */
    @GetMapping("/snapshots")
    public ResponseEntity<?> getSnapshots(
            @PathVariable String database,
            @PathVariable String table,
            @RequestParam(defaultValue = "20") int limit) {
        try {
            var snapshots = schemaService.getSnapshots(database, table, limit);
            return ResponseEntity.ok(Map.of(
                "database", database,
                "table", table,
                "snapshots", snapshots
            ));
        } catch (Exception e) {
            return ResponseEntity.status(500)
                .body(Map.of("error", e.getMessage()));
        }
    }
}
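The `alterSchema` request (`{"operation": "add", "columns": [...]}`) should be checked as a whole batch before anything is applied. That way, one bad column cannot leave the table half-altered. Below is a JDK-only sketch that turns the request into an ordered list of planned additions and rejects names that already exist. In a real service, each planned step would map to a Paimon `SchemaChange.addColumn(...)` passed to `Catalog.alterTable`. `AddColumnsPlanner` is a hypothetical name, and type strings are kept as plain text here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical translation of the alter-schema request into ordered "add column" steps. */
final class AddColumnsPlanner {
    /** One planned change; a real service would turn this into a Paimon SchemaChange. */
    static final class AddColumn {
        final String name;
        final String type;

        AddColumn(String name, String type) {
            this.name = name;
            this.type = type;
        }

        @Override
        public String toString() {
            return "ADD " + name + " " + type;
        }
    }

    private AddColumnsPlanner() {}

    /** Validates the whole batch first; throws without returning a partial plan. */
    @SuppressWarnings("unchecked")
    static List<AddColumn> plan(Set<String> existingColumns, Map<String, Object> request) {
        if (!"add".equals(request.get("operation"))) {
            throw new IllegalArgumentException("Unsupported operation: " + request.get("operation"));
        }
        Object columns = request.get("columns");
        if (!(columns instanceof List)) {
            throw new IllegalArgumentException("'columns' must be a list");
        }
        Set<String> seen = new HashSet<>(existingColumns);
        List<AddColumn> changes = new ArrayList<>();
        for (Map<String, Object> column : (List<Map<String, Object>>) columns) {
            String name = (String) column.get("name");
            // Rejects both existing columns and duplicates within the same request
            if (!seen.add(name)) {
                throw new IllegalArgumentException("Column already exists: " + name);
            }
            changes.add(new AddColumn(name, (String) column.get("type")));
        }
        return changes;
    }
}
```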

Part 4: Core Business Logic

4.1 Table Service Layer

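package com.example.paimon.catalog.service;

import com.example.paimon.catalog.model.TableMetadata;
import com.example.paimon.catalog.repository.TableMetadataRepository;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.extern.slf4j.Slf4j;
import org.apache.paimon.catalog.Catalog;
import org.apache.paimon.catalog.CatalogContext;
import org.apache.paimon.catalog.CatalogFactory;
import org.apache.paimon.catalog.Identifier;
import org.apache.paimon.options.Options;
import org.apache.paimon.schema.Schema;
import org.apache.paimon.schema.SchemaChange;
import org.apache.paimon.types.DataType;
import org.apache.paimon.types.DataTypes;
import org.springframework.beans.factory.DisposableBean;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

@Slf4j
@Service
public class TableService implements DisposableBean {
    
    // Matches type strings such as "DECIMAL(10, 2)"
    private static final Pattern DECIMAL_PATTERN =
        Pattern.compile("DECIMAL\\s*\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)");
    
    private final TableMetadataRepository tableMetadataRepository;
    
    // Jackson ObjectMapper auto-configured by Spring Boot
    private final ObjectMapper objectMapper;
    
    // CatalogFactory is a static factory, not a Spring bean. Keep one long-lived Catalog
    // for the warehouse and close it on shutdown (an expiring Guava cache would drop
    // Catalog instances without closing them).
    private final Catalog catalog;
    
    public TableService(TableMetadataRepository tableMetadataRepository,
                        ObjectMapper objectMapper,
                        // "paimon.warehouse" is an illustrative property name
                        @Value("${paimon.warehouse}") String warehouse) {
        this.tableMetadataRepository = tableMetadataRepository;
        this.objectMapper = objectMapper;
        Options options = new Options();
        options.set("warehouse", warehouse);  // e.g. hdfs:///paimon
        this.catalog = CatalogFactory.createCatalog(CatalogContext.create(options));
    }
    
    @Override
    public void destroy() throws Exception {
        catalog.close();
    }
    
    /**
     * List all tables in a database
     */
    public List<String> listTables(String database) throws Exception {
        return catalog.listTables(database);
    }
    
    /**
     * Get table metadata from MySQL; returns null if the table is unknown or was dropped
     */
    public Map<String, Object> getTable(String database, String table) {
        TableMetadata metadata = tableMetadataRepository
            .findByDatabaseAndTable(database, table);
        
        if (metadata == null || !"ACTIVE".equals(metadata.getStatus())) {
            return null;
        }
        
        // LinkedHashMap rather than Map.of, which throws on null values (e.g. no properties)
        Map<String, Object> result = new LinkedHashMap<>();
        result.put("database", metadata.getDatabase());
        result.put("table", metadata.getTable());
        result.put("schema", metadata.getSchema());
        result.put("location", metadata.getLocation());
        result.put("owner", metadata.getOwner());
        result.put("createdAt", metadata.getCreatedAt());
        result.put("updatedAt", metadata.getUpdatedAt());
        result.put("properties", metadata.getProperties());
        return result;
    }
    
    /**
     * Create a table in Paimon, then record its metadata in MySQL
     */
    @SuppressWarnings("unchecked")
    public void createTable(String database, String table, 
                            Map<String, Object> request) throws Exception {
        log.info("Creating table: {}.{}", database, table);
        
        // Validate the request
        validateCreateTableRequest(request);
        
        // primaryKeys lives inside "schema", matching the request body in 3.2
        Map<String, Object> schemaMap = (Map<String, Object>) request.get("schema");
        List<String> primaryKeys = (List<String>) 
            schemaMap.getOrDefault("primaryKeys", Collections.emptyList());
        List<String> partitionKeys = (List<String>) 
            request.getOrDefault("partitionKeys", Collections.emptyList());
        Map<String, String> properties = (Map<String, String>) 
            request.getOrDefault("properties", Collections.emptyMap());
        
        // Build the Paimon Schema; table properties become table options
        Schema.Builder builder = Schema.newBuilder();
        addColumns(builder, schemaMap);
        Schema schema = builder
            .primaryKey(primaryKeys)
            .partitionKeys(partitionKeys)
            .options(properties)
            .build();
        
        try {
            // Create the table (ignoreIfExists = false: fail if it already exists)
            catalog.createTable(Identifier.create(database, table), schema, false);
            
            // Save metadata to MySQL, reusing a soft-deleted row if the table existed before
            TableMetadata metadata = Optional
                .ofNullable(tableMetadataRepository.findByDatabaseAndTable(database, table))
                .orElseGet(TableMetadata::new);
            long now = System.currentTimeMillis();
            metadata.setDatabase(database);
            metadata.setTable(table);
            // Store the request's schema JSON; Paimon's Schema is not a plain Jackson bean
            metadata.setSchema(objectMapper.writeValueAsString(schemaMap));
            metadata.setLocation(String.valueOf(request.getOrDefault("location", "")));
            metadata.setOwner((String) request.getOrDefault("owner", "admin"));
            metadata.setCreatedAt(now);
            metadata.setUpdatedAt(now);
            metadata.setStatus("ACTIVE");
            metadata.setProperties(objectMapper.writeValueAsString(properties));
            
            tableMetadataRepository.save(metadata);
            
            log.info("Table created successfully: {}.{}", database, table);
        } catch (Exception e) {
            log.error("Failed to create table: {}.{}", database, table, e);
            throw e;
        }
    }
    
    /**
     * Alter table options. Only {"properties": {...}} is handled in this sketch,
     * and the properties copy stored in MySQL is not refreshed here.
     */
    @SuppressWarnings("unchecked")
    public void alterTable(String database, String table, 
                           Map<String, Object> request) throws Exception {
        Map<String, String> properties = (Map<String, String>) 
            request.getOrDefault("properties", Collections.emptyMap());
        List<SchemaChange> changes = new ArrayList<>();
        properties.forEach((key, value) -> changes.add(SchemaChange.setOption(key, value)));
        catalog.alterTable(Identifier.create(database, table), changes, false);
    }
    
    /**
     * Drop a table
     */
    public void dropTable(String database, String table) throws Exception {
        log.info("Dropping table: {}.{}", database, table);
        
        try {
            // ignoreIfNotExists = false: fail if the table does not exist
            catalog.dropTable(Identifier.create(database, table), false);
            
            // Soft-delete the metadata row
            TableMetadata metadata = tableMetadataRepository
                .findByDatabaseAndTable(database, table);
            if (metadata != null) {
                metadata.setStatus("DELETED");
                metadata.setUpdatedAt(System.currentTimeMillis());
                tableMetadataRepository.save(metadata);
            }
            
            log.info("Table dropped successfully: {}.{}", database, table);
        } catch (Exception e) {
            log.error("Failed to drop table: {}.{}", database, table, e);
            throw e;
        }
    }
    
    /**
     * Number of active tables in a database
     */
    public long getTableCount(String database) {
        return tableMetadataRepository.countByDatabaseAndStatus(database, "ACTIVE");
    }
    
    // Helper methods
    
    private void validateCreateTableRequest(Map<String, Object> request) {
        Object schema = request.get("schema");
        if (!(schema instanceof Map) || !(((Map<?, ?>) schema).get("fields") instanceof List)) {
            throw new IllegalArgumentException("Request must contain schema.fields");
        }
    }
    
    @SuppressWarnings("unchecked")
    private void addColumns(Schema.Builder builder, Map<String, Object> schemaMap) {
        List<Map<String, Object>> fieldsList = 
            (List<Map<String, Object>>) schemaMap.get("fields");
        
        for (Map<String, Object> field : fieldsList) {
            String name = (String) field.get("name");
            boolean nullable = (Boolean) field.getOrDefault("nullable", true);
            // Paimon assigns field ids itself; nullability is applied to the type
            DataType type = parseDataType((String) field.get("type")).copy(nullable);
            builder.column(name, type, (String) field.get("comment"));
        }
    }
    
    private DataType parseDataType(String typeStr) {
        String normalized = typeStr.trim().toUpperCase(Locale.ROOT);
        Matcher decimal = DECIMAL_PATTERN.matcher(normalized);
        if (decimal.matches()) {
            return DataTypes.DECIMAL(
                Integer.parseInt(decimal.group(1)), Integer.parseInt(decimal.group(2)));
        }
        switch (normalized) {
            case "INT":     return DataTypes.INT();
            case "BIGINT":  return DataTypes.BIGINT();
            case "STRING":  return DataTypes.STRING();  // new VarCharType() would be VARCHAR(1)
            case "DOUBLE":  return DataTypes.DOUBLE();
            case "BOOLEAN": return DataTypes.BOOLEAN();
            // ... handle other types here
            default:
                // Fail loudly instead of silently turning unknown types into strings
                throw new IllegalArgumentException("Unsupported type: " + typeStr);
        }
    }
}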
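`createTable` updates two stores, the Paimon catalog and MySQL, and they do not share a transaction. If the MySQL write fails, the table exists in Paimon but not in the metadata DB. One common mitigation is a compensating action: undo the first write when the second fails, then rethrow. The sketch below shows that pattern with the JDK only, using hypothetical names (`TwoStepWrite`, `Step`). It narrows the inconsistency window but does not close it, since the process can still crash between the two steps. A periodic job that reconciles both sides is a typical complement.

```java
/** Hypothetical helper: perform two writes and undo the first one if the second fails. */
final class TwoStepWrite {
    /** A write step that may throw, e.g. a catalog call or a repository save. */
    interface Step {
        void run() throws Exception;
    }

    private TwoStepWrite() {}

    static void run(Step primary, Step secondary, Step undoPrimary) throws Exception {
        primary.run();                // e.g. catalog.createTable(...)
        try {
            secondary.run();          // e.g. tableMetadataRepository.save(...)
        } catch (Exception e) {
            try {
                undoPrimary.run();    // e.g. catalog.dropTable(..., true)
            } catch (Exception undoFailure) {
                // Keep the original failure as the primary error and attach the undo failure
                e.addSuppressed(undoFailure);
            }
            throw e;
        }
    }
}
```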

4.2 Caching and Performance Optimization

package com.example.paimon.catalog.cache;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import org.springframework.stereotype.Component;
import lombok.extern.slf4j.Slf4j;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

@Slf4j
@Component
public class CatalogCacheManager {
    
    /**
     * Schema cache: database.table → schema
     */
    private final LoadingCache<String, String> schemaCache = 
        CacheBuilder.newBuilder()
            .maximumSize(1000)
            .expireAfterWrite(1, TimeUnit.HOURS)
            .recordStats()
            .build(new CacheLoader<String, String>() {
                @Override
                public String load(String key) throws Exception {
                    // key format: "database.table"
                    String[] parts = key.split("\\.");
                    return loadSchemaFromPaimon(parts[0], parts[1]);
                }
            });
    
    /**
     * Partition cache: database.table → partition list
     */
    private final LoadingCache<String, List<String>> partitionCache = 
        CacheBuilder.newBuilder()
            .maximumSize(100)
            .expireAfterWrite(10, TimeUnit.MINUTES)
            .recordStats()
            .build(new CacheLoader<String, List<String>>() {
                @Override
                public List<String> load(String tableKey) throws Exception {
                    String[] parts = tableKey.split("\\.");
                    return loadPartitionsFromPaimon(parts[0], parts[1]);
                }
            });
    
    /**
     * Get a schema (loaded on a cache miss)
     */
    public String getSchema(String database, String table) 
            throws ExecutionException {
        String key = database + "." + table;
        return schemaCache.get(key);
    }
    
    /**
     * Invalidate the cached schema (call after a schema change)
     */
    public void invalidateSchema(String database, String table) {
        String key = database + "." + table;
        schemaCache.invalidate(key);
        log.info("Invalidated schema cache for {}", key);
    }
    
    /**
     * Get partitions (loaded on a cache miss)
     */
    public List<String> getPartitions(String database, String table) 
            throws ExecutionException {
        String key = database + "." + table;
        return partitionCache.get(key);
    }
    
    /**
     * Invalidate the cached partitions (call after new partitions are added)
     */
    public void invalidatePartitions(String database, String table) {
        String key = database + "." + table;
        partitionCache.invalidate(key);
        log.info("Invalidated partition cache for {}", key);
    }
    
    /**
     * Cache hit/miss statistics
     */
    public Map<String, Object> getCacheStats() {
        return Map.of(
            "schema", Map.of(
                "hits", schemaCache.stats().hitCount(),
                "misses", schemaCache.stats().missCount(),
                "hitRate", schemaCache.stats().hitRate()
            ),
            "partitions", Map.of(
                "hits", partitionCache.stats().hitCount(),
                "misses", partitionCache.stats().missCount(),
                "hitRate", partitionCache.stats().hitRate()
            )
        );
    }
    
    private String loadSchemaFromPaimon(String database, String table) 
            throws Exception {
        // Load the schema from Paimon
        log.debug("Loading schema for {}.{} from Paimon", database, table);
        // ... implementation details
        return "";
    }
    
    private List<String> loadPartitionsFromPaimon(String database, String table) 
            throws Exception {
        // Load the partition list from Paimon
        log.debug("Loading partitions for {}.{} from Paimon", database, table);
        // ... implementation details
        return new ArrayList<>();
    }
}
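Two details matter with these caches. First, the "database.table" string key is split on `.`, which breaks if a name ever contains a dot. Second, entries must be invalidated whenever the service itself changes a table, or readers see stale data until the TTL expires. Below is a JDK-only sketch of the read-through, invalidate-on-write pattern with a composite key object instead of a joined string. `TableKey` and `ReadThroughCache` are hypothetical names. Guava's `LoadingCache` above gives the same read-through behavior plus size- and time-based eviction, which this sketch omits.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/** Composite cache key, so names are never parsed back out of a "database.table" string. */
final class TableKey {
    final String database;
    final String table;

    TableKey(String database, String table) {
        this.database = Objects.requireNonNull(database);
        this.table = Objects.requireNonNull(table);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TableKey)) {
            return false;
        }
        TableKey other = (TableKey) o;
        return database.equals(other.database) && table.equals(other.table);
    }

    @Override
    public int hashCode() {
        return Objects.hash(database, table);
    }
}

/** Minimal read-through cache with explicit invalidation and hit/miss counters (no eviction). */
final class ReadThroughCache<V> {
    private final Map<TableKey, V> entries = new ConcurrentHashMap<>();
    private final Function<TableKey, V> loader;  // must not return null
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    ReadThroughCache(Function<TableKey, V> loader) {
        this.loader = loader;
    }

    V get(TableKey key) {
        V cached = entries.get(key);
        if (cached != null) {
            hits.incrementAndGet();
            return cached;
        }
        misses.incrementAndGet();
        // computeIfAbsent runs the loader at most once per absent key, even under concurrent misses
        return entries.computeIfAbsent(key, loader);
    }

    /** Call after the service itself changes the table, e.g. after a successful schema change. */
    void invalidate(TableKey key) {
        entries.remove(key);
    }

    long hits() {
        return hits.get();
    }

    long misses() {
        return misses.get();
    }
}
```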

Part 5: Authentication and Authorization

5.1 JWT Authentication Filter

package com.example.paimon.catalog.security;

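import io.jsonwebtoken.Jwts;
import io.jsonwebtoken.SignatureAlgorithm;
import io.jsonwebtoken.security.Keys;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.security.authentication.UsernamePasswordAuthenticationToken;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Date;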

@Component
public class JwtTokenProvider {
    
    // No default value: refuse to start without a configured secret, which must be
    // at least 32 bytes for HS256 (HS512 would need 64)
    @Value("${jwt.secret}")
    private String jwtSecret;
    
    @Value("${jwt.expiration:86400000}")  // 24 hours
    private long jwtExpirationMs;
    
    /**
     * Generate a token
     */
    public String generateToken(String username) {
        return Jwts.builder()
            .setSubject(username)
            .setIssuedAt(new Date())
            .setExpiration(new Date(System.currentTimeMillis() + jwtExpirationMs))
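            // JJWT enforces RFC 7518 minimum key sizes: HS512 needs a 64-byte secret and
            // throws WeakKeyException otherwise, so sign with HS256 (secret >= 32 bytes)
            .signWith(Keys.hmacShaKeyFor(jwtSecret.getBytes()), 
                SignatureAlgorithm.HS256)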
            .compact();
    }
    
    /**
     * Extract the bearer token from the request
     */
    public String getTokenFromRequest(HttpServletRequest request) {
        String bearerToken = request.getHeader("Authorization");
        if (bearerToken != null && bearerToken.startsWith("Bearer ")) {
            return bearerToken.substring(7);
        }
        return null;
    }
    
    /**
     * Validate the token and return its username (null if invalid or expired)
     */
    public String getUsernameFromToken(String token) {
        try {
            return Jwts.parserBuilder()
                .setSigningKey(Keys.hmacShaKeyFor(jwtSecret.getBytes()))
                .build()
                .parseClaimsJws(token)
                .getBody()
                .getSubject();
        } catch (Exception e) {
            return null;
        }
    }
}

// Authentication filter (in a real project, its own file: JwtAuthenticationFilter.java)
@Component
public class JwtAuthenticationFilter extends OncePerRequestFilter {
    
    @Autowired
    private JwtTokenProvider jwtTokenProvider;
    
    @Override
    protected void doFilterInternal(HttpServletRequest request, 
                                   HttpServletResponse response,
                                   FilterChain filterChain) 
            throws ServletException, IOException {
        try {
            String token = jwtTokenProvider.getTokenFromRequest(request);
            
            if (token != null) {
                String username = jwtTokenProvider.getUsernameFromToken(token);
                
                if (username != null) {
                    // Populate the security context for this request
                    SecurityContextHolder.getContext()
                        .setAuthentication(
                            new UsernamePasswordAuthenticationToken(
                                username, null, 
                                new ArrayList<>())
                        );
                }
            }
        } catch (Exception e) {
            logger.error("Cannot set user authentication", e);
        }
        
        filterChain.doFilter(request, response);
    }
}
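What `signWith` and `parseClaimsJws` do for an HMAC-signed token can be sketched with the JDK alone. The token is `base64url(header).base64url(payload).base64url(HMAC-SHA256(secret, header.payload))`. Verification recomputes the MAC and compares it in constant time. The sketch is illustrative only: `MiniJwt` is a hypothetical class, claims are passed as a ready-made JSON string, and there is no `exp` check or JSON parsing. The service itself should keep using the JJWT library.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative HS256 JWT signing and verification with the JDK only. */
final class MiniJwt {
    private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder B64URL_DECODER = Base64.getUrlDecoder();
    private static final String HEADER_JSON = "{\"alg\":\"HS256\",\"typ\":\"JWT\"}";

    private MiniJwt() {}

    private static byte[] hmacSha256(byte[] secret, String signingInput) {
        // RFC 7518 requires an HS256 key of at least 256 bits
        if (secret.length < 32) {
            throw new IllegalArgumentException("HS256 secret must be at least 32 bytes");
        }
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds header.payload.signature, each part base64url-encoded without padding. */
    static String sign(byte[] secret, String payloadJson) {
        String signingInput = B64URL.encodeToString(HEADER_JSON.getBytes(StandardCharsets.UTF_8))
            + "." + B64URL.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        return signingInput + "." + B64URL.encodeToString(hmacSha256(secret, signingInput));
    }

    /** Returns the payload JSON if the signature checks out, otherwise null (no exp check here). */
    static String verify(byte[] secret, String token) {
        String[] parts = token.split("\\.", -1);
        if (parts.length != 3) {
            return null;
        }
        byte[] expected = hmacSha256(secret, parts[0] + "." + parts[1]);
        byte[] actual;
        try {
            actual = B64URL_DECODER.decode(parts[2]);
        } catch (IllegalArgumentException e) {
            return null;  // signature part is not valid base64url
        }
        // Constant-time comparison so timing does not reveal how many bytes matched
        if (!MessageDigest.isEqual(expected, actual)) {
            return null;
        }
        return new String(B64URL_DECODER.decode(parts[1]), StandardCharsets.UTF_8);
    }
}
```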

5.2 Role-Based Access Control (RBAC)

package com.example.paimon.catalog.security;

import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.stereotype.Service;

@Service
public class AccessControlService {
    
    /**
     * Check whether the user may perform the operation on the table
     */
    public boolean hasTableAccess(String username, String database, 
                                 String table, String operation) {
        // Permission check goes here
        // operation: CREATE, READ, UPDATE, DELETE
        
        // Example: load the permission configuration from a database
        // var permission = permissionRepository
        //     .findByUsernameAndResourceAndOperation(
        //         username, database + "." + table, operation);
        // return permission != null;
        
        return true;  // Placeholder that allows everything; deny by default in production
    }
    
    /**
     * 检查用户是否是数据库管理员
     */
    public boolean isDatabaseAdmin(String username, String database) {
        // 实现管理员检查逻辑
        return true;  // 简化实现
    }
}

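// TableController.java: using the access check in a controller
import java.security.Principal;
import java.util.Map;
import com.example.paimon.catalog.security.AccessControlService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/apis/v1/{database}/tables")
public class TableController {
    
    @Autowired
    private AccessControlService accessControlService;
    
    @PostMapping
    public ResponseEntity<?> createTable(
            @PathVariable String database,
            @RequestBody Map<String, Object> request,
            Principal principal) {
        
        // principal is null when the request was not authenticated
        if (principal == null) {
            return ResponseEntity.status(HttpStatus.UNAUTHORIZED)
                .body(Map.of("error", "Authentication required"));
        }
        String username = principal.getName();
        
        // Check permissions
        if (!accessControlService.hasTableAccess(
                username, database, 
                (String) request.get("name"), "CREATE")) {
            return ResponseEntity.status(HttpStatus.FORBIDDEN)
                .body(Map.of("error", "Permission denied"));
        }
        
        // ... create-table logic (omitted); respond 201 Created on success
        return ResponseEntity.status(HttpStatus.CREATED).build();
    }
}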
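`hasTableAccess` above is only a stub that allows everything. The sketch below shows a minimal in-memory version of the role model behind the RBAC heading. It assumes that grants take the form `OPERATION:database.table` and that `*` acts as a wildcard for a whole database. `InMemoryRbac` is a hypothetical class that denies anything not explicitly granted. A real service would load users, roles and grants from its metadata database rather than from `HashMap`s, and would need thread-safe collections if grants change at runtime.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory RBAC: users -> roles -> grants "OPERATION:database.table". */
public class InMemoryRbac {

    private final Map<String, Set<String>> userRoles = new HashMap<>();
    private final Map<String, Set<String>> roleGrants = new HashMap<>();

    public void assignRole(String username, String role) {
        userRoles.computeIfAbsent(username, k -> new HashSet<>()).add(role);
    }

    /** Grants an operation on one table, or on every table of a database when table is "*". */
    public void grant(String role, String operation, String database, String table) {
        roleGrants.computeIfAbsent(role, k -> new HashSet<>())
            .add(operation + ":" + database + "." + table);
    }

    /** Same signature as AccessControlService.hasTableAccess, but deny-by-default. */
    public boolean hasTableAccess(String username, String database,
                                  String table, String operation) {
        String exact = operation + ":" + database + "." + table;
        String wildcard = operation + ":" + database + ".*";
        for (String role : userRoles.getOrDefault(username, Set.of())) {
            Set<String> grants = roleGrants.getOrDefault(role, Set.of());
            if (grants.contains(exact) || grants.contains(wildcard)) {
                return true;
            }
        }
        return false;  // nothing granted: deny
    }

    public static void main(String[] args) {
        InMemoryRbac rbac = new InMemoryRbac();
        rbac.grant("analyst", "READ", "my_db", "*");
        rbac.grant("owner", "CREATE", "my_db", "my_table");
        rbac.assignRole("alice", "analyst");

        System.out.println(rbac.hasTableAccess("alice", "my_db", "orders", "READ"));   // true
        System.out.println(rbac.hasTableAccess("alice", "my_db", "orders", "CREATE")); // false
        System.out.println(rbac.hasTableAccess("bob", "my_db", "orders", "READ"));     // false
    }
}
```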

Part 6: Auditing and Monitoring

6.1 Operation Auditing

java
package com.example.paimon.catalog.audit;

import java.util.Map;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import com.example.paimon.catalog.model.AuditLog;
import com.example.paimon.catalog.repository.AuditLogRepository;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.extern.slf4j.Slf4j;

@Slf4j
@Component
public class AuditLogger {
    
    private final ObjectMapper objectMapper = new ObjectMapper();
    
    @Autowired
    private AuditLogRepository auditLogRepository;
    
    /**
     * Record an operation in the audit log
     */
    public void log(String operator, String operation, String target, 
                   Map<String, Object> content, String status, 
                   String errorMsg) {
        try {
            // Named auditLog so it does not shadow the Lombok-generated "log" logger
            AuditLog auditLog = new AuditLog();
            auditLog.setOperator(operator);
            auditLog.setOperation(operation);  // CREATE, UPDATE, DELETE
            auditLog.setTarget(target);  // database.table
            auditLog.setContent(convertToJson(content));
            auditLog.setTimestamp(System.currentTimeMillis());
            auditLog.setStatus(status);  // SUCCESS, FAILED
            auditLog.setErrorMsg(errorMsg);
            
            auditLogRepository.save(auditLog);
            
            log.info("Audit log recorded: {} {} {}", operator, operation, target);
        } catch (Exception e) {
            // A failure to audit must not break the operation being audited
            log.error("Failed to record audit log", e);
        }
    }
    
    /**
     * Shorthand for a successful operation
     */
    public void logSuccess(String operator, String operation, String target,
                          Map<String, Object> content) {
        log(operator, operation, target, content, "SUCCESS", null);
    }
    
    /**
     * Shorthand for a failed operation
     */
    public void logFailure(String operator, String operation, String target,
                          String errorMsg) {
        log(operator, operation, target, null, "FAILED", errorMsg);
    }
    
    // Serializes the operation payload; a null payload is stored as null
    private String convertToJson(Map<String, Object> content) 
            throws JsonProcessingException {
        return content == null ? null : objectMapper.writeValueAsString(content);
    }
}

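// Auditable.java: marks methods whose calls should be audited
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface Auditable {
    String operation();  // e.g. CREATE, UPDATE, DELETE
}

// AuditAspect.java: automatic auditing with AspectJ annotations
// (requires spring-boot-starter-aop on the classpath)
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.reflect.MethodSignature;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.security.core.Authentication;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class AuditAspect {
    
    @Autowired
    private AuditLogger auditLogger;
    
    @Around("@annotation(com.example.paimon.catalog.audit.Auditable)")
    public Object auditOperation(ProceedingJoinPoint joinPoint) 
            throws Throwable {
        
        MethodSignature signature = (MethodSignature) joinPoint.getSignature();
        Auditable auditable = signature.getMethod()
            .getAnnotation(Auditable.class);
        
        String operator = getOperator();
        String operation = auditable.operation();
        String target = extractTarget(joinPoint);
        
        try {
            Object result = joinPoint.proceed();
            auditLogger.logSuccess(operator, operation, target, null);
            return result;
        } catch (Throwable t) {
            auditLogger.logFailure(operator, operation, target, t.getMessage());
            throw t;  // rethrow so callers still see the original failure
        }
    }
    
    private String getOperator() {
        Authentication auth = SecurityContextHolder.getContext()
            .getAuthentication();
        return auth != null ? auth.getName() : "UNKNOWN";
    }
    
    // Assumes the first two arguments of an audited method are database and table
    private String extractTarget(ProceedingJoinPoint joinPoint) {
        Object[] args = joinPoint.getArgs();
        if (args.length >= 2) {
            return String.valueOf(args[0]) + "." + String.valueOf(args[1]);
        }
        return "UNKNOWN";
    }
}

// Using the annotation on a Spring bean method. Spring AOP proxies intercept only
// calls that go through the bean, not calls made from inside the same class.
@Auditable(operation = "CREATE")
public void createTable(String database, String table, Map<String, Object> request) {
    // ...
}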
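Stripped of Spring, the around-advice in `AuditAspect` is a simple wrapper. It runs the operation, records SUCCESS or FAILED, and lets the original exception reach the caller. The framework-free sketch below shows that control flow. `AuditTrailSketch` and its `Entry` type are hypothetical, and they keep entries in memory instead of calling `AuditLogRepository`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class AuditTrailSketch {

    /** One audit record, using the same status values as AuditLogger. */
    public static final class Entry {
        public final String operator, operation, target, status, errorMsg;

        Entry(String operator, String operation, String target, String status, String errorMsg) {
            this.operator = operator;
            this.operation = operation;
            this.target = target;
            this.status = status;
            this.errorMsg = errorMsg;
        }

        @Override
        public String toString() {
            return operator + " " + operation + " " + target + " " + status
                + (errorMsg == null ? "" : " (" + errorMsg + ")");
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Runs action, records SUCCESS or FAILED, and rethrows failures unchanged. */
    public <T> T audited(String operator, String operation, String target, Supplier<T> action) {
        try {
            T result = action.get();
            entries.add(new Entry(operator, operation, target, "SUCCESS", null));
            return result;
        } catch (RuntimeException e) {
            entries.add(new Entry(operator, operation, target, "FAILED", e.getMessage()));
            throw e;
        }
    }

    public List<Entry> entries() {
        return entries;
    }

    public static void main(String[] args) {
        AuditTrailSketch audit = new AuditTrailSketch();
        audit.audited("alice", "CREATE", "my_db.my_table", () -> "ok");
        try {
            audit.audited("bob", "DELETE", "my_db.my_table", () -> {
                throw new IllegalStateException("table is locked");
            });
        } catch (IllegalStateException expected) {
            // caught here only so the demo can print the trail
        }
        audit.entries().forEach(System.out::println);
        // alice CREATE my_db.my_table SUCCESS
        // bob DELETE my_db.my_table FAILED (table is locked)
    }
}
```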

6.2 Performance Monitoring

java
package com.example.paimon.catalog.monitor;

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/metrics")
public class MetricsController {
    
    // Constructor injection: Spring MVC does not inject beans into
    // handler-method parameters, even when they carry @Autowired
    private final CatalogCacheManager cacheManager;
    
    public MetricsController(CatalogCacheManager cacheManager) {
        this.cacheManager = cacheManager;
    }
    
    /**
     * System health status
     * GET /metrics/health
     */
    @GetMapping("/health")
    public ResponseEntity<?> health() {
        return ResponseEntity.ok(Map.of(
            "status", "UP",
            "timestamp", System.currentTimeMillis(),
            "uptime", getUptime(),  // milliseconds since JVM start
            "version", "1.0.0"
        ));
    }
    
    /**
     * JVM performance metrics
     * GET /metrics/performance
     */
    @GetMapping("/performance")
    public ResponseEntity<?> getPerformanceMetrics() {
        Runtime runtime = Runtime.getRuntime();
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        
        return ResponseEntity.ok(Map.of(
            "memory", Map.of(
                "total", runtime.totalMemory() / 1024 / 1024 + "MB",
                "used", (runtime.totalMemory() - runtime.freeMemory()) 
                    / 1024 / 1024 + "MB",
                "free", runtime.freeMemory() / 1024 / 1024 + "MB"
            ),
            "threads", Map.of(
                "count", threadBean.getThreadCount(),         // live threads now
                "peakCount", threadBean.getPeakThreadCount()  // peak since JVM start
            ),
            "gc", Map.of(
                "collections", getGCCollectionCount(),
                "time", getGCTime() + "ms"
            )
        ));
    }
    
    /**
     * Cache statistics
     * GET /metrics/cache
     */
    @GetMapping("/cache")
    public ResponseEntity<?> getCacheMetrics() {
        return ResponseEntity.ok(cacheManager.getCacheStats());
    }
    
    private long getUptime() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }
    
    // Sums over all collectors; a collector reports -1 when the value is undefined
    private long getGCCollectionCount() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
            .mapToLong(gc -> Math.max(gc.getCollectionCount(), 0))
            .sum();
    }
    
    private long getGCTime() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
            .mapToLong(gc -> Math.max(gc.getCollectionTime(), 0))
            .sum();
    }
}
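The `/metrics/performance` endpoint reports JVM-level numbers but no request latency. The sketch below, built around a hypothetical `LatencyRecorder` class, shows one way to add it. It accumulates per-endpoint counts, totals and maxima with `LongAdder` and `LongAccumulator`, which stay cheap under concurrent updates. In a Spring Boot service, Micrometer timers exposed through Actuator are the more common choice; this sketch only illustrates the bookkeeping.

```java
import java.util.Locale;
import java.util.concurrent.atomic.LongAccumulator;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical latency recorder; keep one instance per endpoint. */
public class LatencyRecorder {
    private final LongAdder count = new LongAdder();
    private final LongAdder totalMillis = new LongAdder();
    private final LongAccumulator maxMillis = new LongAccumulator(Long::max, 0L);

    /** Call once per request (e.g. from a servlet filter) with the elapsed time. */
    public void record(long millis) {
        count.increment();
        totalMillis.add(millis);
        maxMillis.accumulate(millis);
    }

    public long count() {
        return count.sum();
    }

    /** Mean latency; count and total are read separately, so it is approximate under load. */
    public double averageMillis() {
        long n = count.sum();
        return n == 0 ? 0.0 : (double) totalMillis.sum() / n;
    }

    public long maxMillis() {
        return maxMillis.get();
    }

    public static void main(String[] args) {
        LatencyRecorder schemaLookups = new LatencyRecorder();
        for (long ms : new long[] {12, 8, 40}) {
            schemaLookups.record(ms);
        }
        System.out.println(String.format(Locale.ROOT, "count=%d avg=%.1fms max=%dms",
            schemaLookups.count(), schemaLookups.averageMillis(), schemaLookups.maxMillis()));
        // count=3 avg=20.0ms max=40ms
    }
}
```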

Part 7: Production Deployment and Configuration

7.1 Application Configuration File

yaml
# application.yml
spring:
  application:
    name: paimon-rest-catalog
  datasource:
    url: jdbc:mysql://localhost:3306/paimon_catalog?characterEncoding=utf8
    username: root
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver
  
  jpa:
    hibernate:
      ddl-auto: validate
    properties:
      hibernate:
        dialect: org.hibernate.dialect.MySQL8Dialect
        format_sql: true
        show_sql: false
        jdbc:
          batch_size: 50
          fetch_size: 100

# Paimon configuration
paimon:
  warehouse: hdfs:///warehouse
  default-database: default
  catalog-type: filesystem
  
# Cache configuration
cache:
  schema:
    ttl-hours: 1
    max-size: 1000
  partition:
    ttl-minutes: 10
    max-size: 100

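# JWT configuration (the fallback after ":" is for local development only; always set JWT_SECRET in production)
jwt:
  secret: ${JWT_SECRET:your-secret-key-very-long-string-for-production}
  expiration: 86400000  # 24 hours, in milliseconds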

# Logging configuration
logging:
  level:
    root: INFO
    com.example.paimon: DEBUG
  file:
    name: logs/paimon-catalog.log
  pattern:
    file: "%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n"

server:
  port: 8080
  servlet:
    context-path: /
  compression:
    enabled: true
    min-response-size: 1024
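The `cache` section only declares TTL and size limits; the cache implementation from earlier chapters decides how they are enforced. To show what `ttl-hours` and `max-size` mean together, here is a small, hypothetical `TtlLruCache` built on a `LinkedHashMap` kept in access order. Each entry expires a fixed time after it is written, and once the size limit is exceeded the least recently used entry is evicted. The clock is injectable so that expiry can be tested. In production, a library such as Caffeine (`expireAfterWrite`, `maximumSize`) would be the usual choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Hypothetical TTL + LRU cache mirroring cache.schema.ttl-hours / max-size.
 * Thread safety comes from coarse-grained synchronized methods.
 */
public class TtlLruCache<K, V> {

    /** A cached value plus the clock time at which it becomes stale. */
    private static final class Timed<T> {
        final T value;
        final long expiresAt;

        Timed(T value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final long ttlMillis;
    private final LongSupplier clock;  // milliseconds; injectable for tests
    private final LinkedHashMap<K, Timed<V>> map;

    public TtlLruCache(int maxSize, long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // accessOrder = true: get() moves an entry to the end, so the eldest entry is the LRU one
        this.map = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Stores a value that expires ttlMillis after this write (expire-after-write). */
    public synchronized void put(K key, V value) {
        map.put(key, new Timed<>(value, clock.getAsLong() + ttlMillis));
    }

    /** Returns the value, or null if it is absent or expired (expired entries are dropped). */
    public synchronized V get(K key) {
        Timed<V> entry = map.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAt) {
            map.remove(key);
            return null;
        }
        return entry.value;
    }

    public synchronized int size() {
        return map.size();
    }

    public static void main(String[] args) {
        long[] now = {0L};           // fake clock in milliseconds
        long oneHour = 3_600_000L;   // ttl-hours: 1
        TtlLruCache<String, String> schemas = new TtlLruCache<>(2, oneHour, () -> now[0]);
        schemas.put("db.a", "schema-a");
        schemas.put("db.b", "schema-b");
        schemas.get("db.a");                       // db.b is now least recently used
        schemas.put("db.c", "schema-c");           // exceeds max size 2, evicts db.b
        System.out.println(schemas.get("db.b"));   // null
        now[0] += oneHour;
        System.out.println(schemas.get("db.a"));   // null (expired)
    }
}
```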

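7.2 Docker Deployment

dockerfile
# Dockerfile
# The official openjdk images are deprecated; Eclipse Temurin publishes maintained Java 11 JRE images
FROM eclipse-temurin:11-jre

WORKDIR /app

# Copy the application JAR
COPY target/paimon-rest-catalog-1.0.0.jar app.jar

# Expose the HTTP port
EXPOSE 8080

# Start the application
ENTRYPOINT ["java", "-Xms1g", "-Xmx2g", "-jar", "app.jar"]

yaml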
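# docker-compose.yml (Compose Specification format; Docker Compose v2 treats the top-level "version" key as obsolete)

services:
  mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: password
      MYSQL_DATABASE: paimon_catalog
    ports:
      - "3306:3306"
    volumes:
      - mysql-data:/var/lib/mysql
    healthcheck:
      # Probe over TCP so the check passes only once MySQL accepts network connections
      test: ["CMD", "mysqladmin", "ping", "-h", "127.0.0.1"]
      interval: 10s
      timeout: 5s
      retries: 10

  paimon-catalog:
    build: .
    ports:
      - "8080:8080"
    environment:
      SPRING_DATASOURCE_URL: jdbc:mysql://mysql:3306/paimon_catalog
      SPRING_DATASOURCE_USERNAME: root
      SPRING_DATASOURCE_PASSWORD: password
      PAIMON_WAREHOUSE: hdfs:///warehouse
      JWT_SECRET: your-production-secret-key
    depends_on:
      mysql:
        condition: service_healthy  # a plain depends_on only waits for the container to start
    volumes:
      - ./logs:/app/logs

volumes:
  mysql-data: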

Part 8: Client Integration

8.1 Java Client Example

java
package com.example.paimon.catalog.client;

import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import okhttp3.*;  // OkHttp 4.x: RequestBody.create(String, MediaType) argument order
import com.fasterxml.jackson.databind.ObjectMapper;
public class PaimonCatalogClient {
    
    private final String baseUrl;
    private final String token;
    private final OkHttpClient httpClient;
    private final ObjectMapper objectMapper;
    
    public PaimonCatalogClient(String baseUrl, String token) {
        this.baseUrl = baseUrl;
        this.token = token;
        this.httpClient = new OkHttpClient.Builder()
            .connectTimeout(10, TimeUnit.SECONDS)
            .readTimeout(30, TimeUnit.SECONDS)
            .build();
        this.objectMapper = new ObjectMapper();
    }
    
    /**
     * Create a table
     */
    public void createTable(String database, String table, 
                           TableDefinition definition) throws Exception {
        String url = String.format("%s/apis/v1/%s/tables", 
            baseUrl, database);
        
        String body = objectMapper.writeValueAsString(Map.of(
            "name", table,
            "schema", definition.getSchema(),
            "partitionKeys", definition.getPartitionKeys(),
            "properties", definition.getProperties()
        ));
        
        Request request = new Request.Builder()
            .url(url)
            .addHeader("Authorization", "Bearer " + token)
            .addHeader("Content-Type", "application/json")
            .post(RequestBody.create(body, MediaType.get("application/json")))
            .build();
        
        try (Response response = httpClient.newCall(request).execute()) {
            if (!response.isSuccessful()) {
                throw new RuntimeException(
                    "Failed to create table: " + response.body().string());
            }
        }
    }
    
    /**
     * Fetch a table's schema
     */
    public Map<String, Object> getTableSchema(String database, 
                                              String table) throws Exception {
        String url = String.format(
            "%s/apis/v1/%s/%s/schema", baseUrl, database, table);
        
        Request request = new Request.Builder()
            .url(url)
            .addHeader("Authorization", "Bearer " + token)
            .get()
            .build();
        
        try (Response response = httpClient.newCall(request).execute()) {
            if (response.isSuccessful()) {
                return objectMapper.readValue(
                    response.body().string(), Map.class);
            } else {
                throw new RuntimeException(
                    "Failed to get schema: " + response.code());
            }
        }
    }
    
    /**
     * List partitions
     */
    public List<String> getPartitions(String database, String table,
                                     int limit) throws Exception {
        String url = String.format(
            "%s/apis/v1/%s/%s/partitions?limit=%d", 
            baseUrl, database, table, limit);
        
        Request request = new Request.Builder()
            .url(url)
            .addHeader("Authorization", "Bearer " + token)
            .get()
            .build();
        
        try (Response response = httpClient.newCall(request).execute()) {
            if (response.isSuccessful()) {
                Map<String, Object> result = objectMapper.readValue(
                    response.body().string(), Map.class);
                return (List<String>) result.get("partitions");
            } else {
                throw new RuntimeException(
                    "Failed to get partitions: " + response.code());
            }
        }
    }
}

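// TableDefinition.java: table definition (a separate file, because a .java file can hold only one public class)
import java.util.List;
import java.util.Map;
import lombok.AllArgsConstructor;
import lombok.Data;

@Data
@AllArgsConstructor
public class TableDefinition {
    private Map<String, Object> schema;
    private List<String> partitionKeys;
    private Map<String, String> properties;
}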

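// Example.java: usage example
import java.util.List;
import java.util.Map;

public class Example {
    public static void main(String[] args) throws Exception {
        PaimonCatalogClient client = 
            new PaimonCatalogClient("http://localhost:8080", 
                "your-jwt-token");
        
        // The partition column dt must also be a schema field, and Paimon requires
        // the partition keys of a primary-key table to be part of its primary key
        TableDefinition definition = new TableDefinition(
            Map.of(
                "fields", List.of(
                    Map.of("name", "id", "type", "BIGINT"),
                    Map.of("name", "name", "type", "STRING"),
                    Map.of("name", "dt", "type", "STRING")
                ),
                "primaryKeys", List.of("id", "dt")
            ),
            List.of("dt"),
            Map.of("bucket", "16", "merge-engine", "deduplicate")
        );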
        
        client.createTable("my_db", "my_table", definition);
        
        var schema = client.getTableSchema("my_db", "my_table");
        System.out.println("Schema: " + schema);
        
        var partitions = client.getPartitions("my_db", "my_table", 100);
        System.out.println("Partitions: " + partitions);
    }
}
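The client above depends on OkHttp and inserts database and table names into the URL without escaping them. If you prefer to stay on the JDK, Java 11's `java.net.http.HttpClient` can issue the same schema request. The sketch below uses a hypothetical class, `JdkCatalogRequests`, and percent-encodes each path segment so that a name containing a space or a slash cannot change the request path. It assumes the same `/apis/v1/{db}/{table}/schema` endpoint and bearer-token scheme as `PaimonCatalogClient`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

public class JdkCatalogRequests {

    /** Percent-encodes one path segment; URLEncoder targets form encoding, so '+' becomes %20. */
    public static String segment(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Builds GET {baseUrl}/apis/v1/{database}/{table}/schema with a bearer token. */
    public static HttpRequest schemaRequest(String baseUrl, String token,
                                            String database, String table) {
        URI uri = URI.create(baseUrl + "/apis/v1/" + segment(database) + "/"
            + segment(table) + "/schema");
        return HttpRequest.newBuilder(uri)
            .timeout(Duration.ofSeconds(30))  // fail if no response arrives within 30 s
            .header("Authorization", "Bearer " + token)
            .GET()
            .build();
    }

    /** Sends the request and returns the body, failing on any non-2xx status. */
    public static String send(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("HTTP " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }

    public static void main(String[] args) {
        HttpRequest request = schemaRequest(
            "http://localhost:8080", "your-jwt-token", "my_db", "my table");
        System.out.println(request.method() + " " + request.uri());
        // GET http://localhost:8080/apis/v1/my_db/my%20table/schema

        // Against a running service:
        // HttpClient client = HttpClient.newBuilder()
        //     .connectTimeout(Duration.ofSeconds(10)).build();
        // String schemaJson = send(client, request);
    }
}
```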

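8.2 Spark Integration

scala
// Spark has no CREATE CATALOG statement; catalogs are registered through spark.sql.catalog.* settings.
// Assumption: a Paimon release that ships the REST catalog client (metastore = rest, Paimon 1.x)
// and a server implementing Paimon's own REST protocol. The custom /apis/v1 endpoints built in
// this chapter are not automatically compatible with that client; check the option names
// against the documentation for your Paimon version.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("paimon-rest-catalog-demo")
  .config("spark.sql.catalog.paimon", "org.apache.paimon.spark.SparkCatalog")
  .config("spark.sql.catalog.paimon.metastore", "rest")
  .config("spark.sql.catalog.paimon.uri", "http://localhost:8080")
  .config("spark.sql.catalog.paimon.token.provider", "bear")
  .config("spark.sql.catalog.paimon.token", "your-jwt-token")
  .getOrCreate()

// Use the remote catalog
spark.sql("USE paimon.my_db")

val df = spark.sql("SELECT * FROM my_table LIMIT 100")
df.show()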

Part 9: FAQ and Best Practices

Q1: How do you handle concurrent write conflicts?

java
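// Handle concurrent conflicts with optimistic locking
// (Spring Boot 2.7 uses javax.persistence; Spring Boot 3 moved to jakarta.persistence)
import javax.persistence.*;
import org.springframework.dao.OptimisticLockingFailureException;

@Entity
@Table(name = "paimon_tables")
public class TableMetadata {
    
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    
    @Version
    private Long version;  // optimistic-lock version, incremented on every update
    
    // ... other fields
}

// Conflicts are detected automatically: if the row's version changed since it was read,
// the update fails and Spring reports it as OptimisticLockingFailureException
try {
    tableMetadataRepository.save(metadata);
} catch (OptimisticLockingFailureException e) {
    log.warn("Concurrent modification detected, retrying...");
    // Retry: reload the latest row, re-apply the change, and save again
}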
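The `catch` block above leaves the retry logic open, and a bounded retry loop is usually enough. The dependency-free sketch below assumes that each attempt re-reads the row, and therefore its current version, before it re-applies the change. Only conflicts are retried. `ConflictException` stands in for Spring's `OptimisticLockingFailureException`, and `OptimisticRetry` is a hypothetical helper. Spring Retry's `@Retryable` offers the same behaviour declaratively.

```java
import java.util.function.Supplier;

public class OptimisticRetry {

    /** Stand-in for Spring's OptimisticLockingFailureException, so the sketch has no dependencies. */
    public static class ConflictException extends RuntimeException {
        public ConflictException(String message) {
            super(message);
        }
    }

    /**
     * Runs attempt up to maxAttempts times, retrying only on ConflictException.
     * Each attempt must reload the latest row (and version) before re-applying the change.
     */
    public static <T> T retryOnConflict(int maxAttempts, long backoffMillis, Supplier<T> attempt) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int i = 1; ; i++) {
            try {
                return attempt.get();
            } catch (ConflictException e) {
                if (i >= maxAttempts) {
                    throw e;                             // give up: surface the last conflict
                }
                try {
                    Thread.sleep(backoffMillis * i);     // linear backoff between attempts
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();  // keep the interrupt status
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = retryOnConflict(3, 10, () -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new ConflictException("version mismatch");  // simulated concurrent update
            }
            return "saved on attempt " + calls[0];
        });
        System.out.println(result);  // saved on attempt 3
    }
}
```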

Q2: How do you speed up partition queries on large tables?

java
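/**
 * Page through a large number of partitions.
 * Controller method; the class-level mapping supplies {database}, and
 * getPartitionsPaged is a service method not shown here.
 */
@GetMapping("/{table}/partitions")
public ResponseEntity<?> getPartitions(
        @PathVariable String database,
        @PathVariable String table,
        @RequestParam(defaultValue = "0") int page,
        @RequestParam(defaultValue = "100") int size) {
    
    // Clamp inputs: PageRequest.of rejects page < 0 or size < 1, and an unbounded
    // size would defeat the purpose of paging (1000 is an arbitrary upper limit)
    Pageable pageable = PageRequest.of(Math.max(page, 0),
        Math.min(Math.max(size, 1), 1000));
    Page<String> partitions = schemaService
        .getPartitionsPaged(database, table, pageable);
    
    return ResponseEntity.ok(Map.of(
        "partitions", partitions.getContent(),
        "totalElements", partitions.getTotalElements(),
        "totalPages", partitions.getTotalPages(),
        "currentPage", partitions.getNumber()
    ));
}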
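The page arithmetic behind these response fields is easy to get wrong at the edges, such as a partial last page or a page past the end. The sketch below uses a hypothetical `PageSlice` helper to slice an in-memory list with 0-based pages, using ceiling division to compute `totalPages`. Slicing in memory still loads every partition. For genuinely large tables, the limit and offset should be pushed down into the metadata query itself.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical 0-based page of a list, with the same fields as the JSON response. */
public final class PageSlice<T> {
    public final List<T> content;
    public final long totalElements;
    public final int totalPages;
    public final int page;

    private PageSlice(List<T> content, long totalElements, int totalPages, int page) {
        this.content = content;
        this.totalElements = totalElements;
        this.totalPages = totalPages;
        this.page = page;
    }

    public static <T> PageSlice<T> of(List<T> all, int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size >= 1");
        }
        int total = all.size();
        int totalPages = (int) ((total + (long) size - 1) / size);  // ceiling division
        int from = (int) Math.min((long) page * size, total);        // long math avoids overflow
        int to = (int) Math.min((long) from + size, total);
        // copyOf detaches the page from the source list (and rejects null elements)
        return new PageSlice<>(List.copyOf(all.subList(from, to)), total, totalPages, page);
    }

    public static void main(String[] args) {
        List<String> partitions = IntStream.range(0, 250)
            .mapToObj(i -> "dt=" + i)
            .collect(Collectors.toList());
        PageSlice<String> last = PageSlice.of(partitions, 2, 100);
        System.out.println(last.content.size() + " " + last.totalPages + " " + last.content.get(0));
        // 50 3 dt=200
    }
}
```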

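Q3: How do you manage secrets securely?

yaml
# Option 1: read secrets from environment variables (injected by the platform or a secret manager)
spring:
  datasource:
    password: ${DB_PASSWORD}  # read from an environment variable
  # Option 2: load secrets from HashiCorp Vault (needs spring-cloud-starter-vault-config).
  # Both options share one "spring" key because a YAML mapping may not repeat a key.
  config:
    import: "vault://"
  cloud:
    vault:
      uri: http://localhost:8200
      authentication: TOKEN
      token: ${VAULT_TOKEN}
      kv:
        enabled: true
        backend: secret
        backend-version: 2

jwt:
  secret: ${JWT_SECRET}  # read from an environment variable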
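A related pitfall: the configuration in 7.1 uses `${JWT_SECRET:...}`, so if the variable is missing the service silently falls back to a key that is published in the config file. Failing fast at startup is safer. The sketch below uses a hypothetical `SecretLoader` class. It reads `JWT_SECRET` from an environment map and rejects keys shorter than 32 bytes, because RFC 7518 requires HS256 keys of at least 256 bits.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class SecretLoader {

    /** RFC 7518 section 3.2: HS256 keys must be at least 256 bits (32 bytes). */
    public static final int MIN_HS256_KEY_BYTES = 32;

    /** Returns the JWT signing key from the given environment, or throws so startup fails. */
    public static byte[] requireJwtSecret(Map<String, String> env) {
        String secret = env.get("JWT_SECRET");
        if (secret == null || secret.isBlank()) {
            throw new IllegalStateException("JWT_SECRET is not set");
        }
        byte[] key = secret.getBytes(StandardCharsets.UTF_8);
        if (key.length < MIN_HS256_KEY_BYTES) {
            throw new IllegalStateException("JWT_SECRET must be at least "
                + MIN_HS256_KEY_BYTES + " bytes for HS256, got " + key.length);
        }
        return key;
    }

    public static void main(String[] args) {
        try {
            byte[] key = requireJwtSecret(System.getenv());
            System.out.println("JWT secret loaded (" + key.length + " bytes)");
        } catch (IllegalStateException e) {
            System.err.println("Refusing to start: " + e.getMessage());
        }
    }
}
```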

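Part 10: Performance Tuning Recommendations

Optimization Checklist

text
Caching:
- [x] Enable the schema cache (TTL 1 hour)
- [x] Enable the partition cache (TTL 10 minutes)
- [x] Load entries automatically with a CacheLoader
- [x] Monitor the cache hit rate

Database:
- [x] Add a composite index on (database, table)
- [x] Size the database connection pool
- [x] Enable batched operations
- [x] Analyze tables periodically to keep index statistics current

API:
- [x] Enable HTTP compression
- [x] Paginate query results
- [x] Add query timeouts
- [x] Handle long-running operations asynchronously

Security:
- [x] Enable HTTPS
- [x] Rotate the JWT signing key regularly
- [x] Implement rate limiting
- [x] Enable audit logging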
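Of the checklist items, rate limiting is the easiest to sketch in isolation. Below is a minimal token bucket that assumes one bucket per client, for example per username. Each request takes one token, and tokens refill at a fixed rate up to a burst capacity. When the bucket is empty the request should be rejected, typically with HTTP 429. `TokenBucket` is a hypothetical class with an injectable nanosecond clock; libraries such as Bucket4j provide hardened implementations.

```java
import java.util.function.LongSupplier;

/** Hypothetical token bucket: capacity = burst size, refillPerSecond = sustained rate. */
public class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock;  // e.g. System::nanoTime; injectable for tests
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity;           // start full so an initial burst is allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Takes one token if available; false means "reject this request" (e.g. HTTP 429). */
    public synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefillNanos) * refillPerSecond / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + refill);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        TokenBucket bucket = new TokenBucket(2, 1.0, () -> now[0]);  // burst 2, 1 request/s
        System.out.println(bucket.tryAcquire());  // true
        System.out.println(bucket.tryAcquire());  // true
        System.out.println(bucket.tryAcquire());  // false: bucket is empty
        now[0] += 1_000_000_000L;                 // one second later
        System.out.println(bucket.tryAcquire());  // true: one token refilled
    }
}
```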

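Summary

This architecture provides:

  • Centralized authentication and authorization
  • Metadata caching and performance optimization
  • Cross-cluster, cross-cloud metadata access
  • Auditing and tracing of catalog operations
  • Horizontal scalability

Next chapter: Chapter 20 covers performance testing and benchmarking.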
