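Chapter 19: Developing a Custom REST Catalog Service
Introduction: Building a Cross-System Metadata Service
The previous chapters covered Paimon's Catalog hierarchy. In distributed systems, however, metadata often has to be managed across clusters and even across clouds. A REST Catalog addresses this need: it exposes metadata operations through an HTTP interface, so remote systems can access and manage Paimon tables through a single service.
This chapter shows how to build a production-grade REST Catalog service from scratch.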
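Part 1: REST Catalog Architecture Design
1.1 What a REST Catalog Does
text
Traditional catalog (filesystem-based):
  Spark job    → accesses the HDFS catalog directly
  Flink job    → accesses the HDFS catalog directly
  Presto query → accesses the HDFS catalog directly
Problems:
  ├─ Heavy network I/O (every metadata operation goes to HDFS)
  ├─ Cannot be shared across clouds or clusters
  ├─ No central control over metadata access
  └─ No auditing or security controls
REST Catalog approach:
  REST Catalog Service (central service)
  ├─ Maintains a metadata cache
  ├─ Centralized authentication and authorization
  ├─ Audit logging
  └─ Performance optimizations
  Spark  → REST API → Service → underlying storage
  Flink  → REST API → Service → underlying storage
  Presto → REST API → Service → underlying storage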
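To make the difference concrete, here is a minimal in-memory sketch of the central-service path. It assumes a hypothetical `CountingStorage` class that stands in for the filesystem catalog and counts round trips; `CentralCatalogService` is equally hypothetical and is not Paimon code. It only shows how one service can combine an access check, an audit entry, and a metadata cache in front of storage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the filesystem catalog; counts how often it is read
class CountingStorage {
    final Map<String, String> schemas = new HashMap<>();
    int reads = 0;

    String readSchema(String tableKey) {
        reads++; // each call models one round trip to HDFS
        return schemas.get(tableKey);
    }
}

// Sketch of the central service: access check, audit entry, then a cached metadata read
class CentralCatalogService {
    private final CountingStorage storage;
    private final Set<String> allowedUsers;
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    final List<String> auditLog = Collections.synchronizedList(new ArrayList<>());

    CentralCatalogService(CountingStorage storage, Set<String> allowedUsers) {
        this.storage = storage;
        this.allowedUsers = allowedUsers;
    }

    String getSchema(String user, String database, String table) {
        String key = database + "." + table;
        if (!allowedUsers.contains(user)) {
            auditLog.add(user + " READ " + key + " DENIED");
            throw new SecurityException("Permission denied: " + key);
        }
        // Only the first read of a table reaches storage; later reads come from memory
        String schema = cache.computeIfAbsent(key, storage::readSchema);
        auditLog.add(user + " READ " + key + " SUCCESS");
        return schema;
    }
}
```

With the cache inside the service, a Spark read and a Flink read of the same table cost one storage round trip instead of two, and a rejected request still leaves an audit record.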
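1.2 Core REST Catalog Endpoints
The service built in this chapter exposes the following key endpoints. The path layout (everything under /apis/v1) is this chapter's own design:
text
Core CRUD operations:
├─ Databases: GET /apis/v1/databases, POST /apis/v1/databases,
│             GET /apis/v1/databases/{db}, DELETE /apis/v1/databases/{db}
├─ Tables: GET /apis/v1/{db}/tables, POST /apis/v1/{db}/tables,
│          GET|PUT|DELETE /apis/v1/{db}/tables/{table}
├─ Schema: GET /apis/v1/{db}/{table}/schema, POST /apis/v1/{db}/{table}/schema
└─ Statistics: GET /apis/v1/{db}/{table}/stats
Table operations:
├─ Partitions: GET /apis/v1/{db}/{table}/partitions
├─ Snapshots: GET /apis/v1/{db}/{table}/snapshots
└─ Cleanup: DELETE /apis/v1/{db}/{table}/partitions/{pt}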
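The endpoint list is also a routing table. The following plain-Java sketch, with hypothetical `Route` and `CatalogRouter` classes, matches an HTTP method and path against these templates and extracts the path variables; in the actual service, Spring MVC's `@RequestMapping` does this work. The sketch also exposes an ambiguity in the layout: `/apis/v1/{db}/tables/{table}` and `/apis/v1/{db}/{table}/schema` can both match a path such as `/apis/v1/sales/tables/schema`, so here the registration order decides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One endpoint template such as "/apis/v1/{db}/tables/{table}", compiled to a regex
final class Route {
    final String method;
    final String endpoint;
    final Pattern pattern;
    final List<String> varNames = new ArrayList<>();

    Route(String method, String template, String endpoint) {
        this.method = method;
        this.endpoint = endpoint;
        Matcher m = Pattern.compile("\\{(\\w+)\\}").matcher(template);
        StringBuilder regex = new StringBuilder();
        int last = 0;
        while (m.find()) {
            regex.append(Pattern.quote(template.substring(last, m.start())));
            regex.append("([^/]+)"); // a variable matches exactly one path segment
            varNames.add(m.group(1));
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last)));
        this.pattern = Pattern.compile(regex.toString());
    }
}

final class CatalogRouter {
    private final List<Route> routes = new ArrayList<>();

    CatalogRouter() {
        // First match wins, so /apis/v1/databases/{db} is listed before the /{db}/... routes
        add("GET", "/apis/v1/databases", "listDatabases");
        add("POST", "/apis/v1/databases", "createDatabase");
        add("GET", "/apis/v1/databases/{db}", "getDatabase");
        add("DELETE", "/apis/v1/databases/{db}", "dropDatabase");
        add("GET", "/apis/v1/{db}/tables", "listTables");
        add("POST", "/apis/v1/{db}/tables", "createTable");
        add("GET", "/apis/v1/{db}/tables/{table}", "getTable");
        add("PUT", "/apis/v1/{db}/tables/{table}", "alterTable");
        add("DELETE", "/apis/v1/{db}/tables/{table}", "dropTable");
        add("GET", "/apis/v1/{db}/{table}/schema", "getSchema");
        add("POST", "/apis/v1/{db}/{table}/schema", "alterSchema");
        add("GET", "/apis/v1/{db}/{table}/stats", "getTableStats");
        add("GET", "/apis/v1/{db}/{table}/partitions", "getPartitions");
        add("GET", "/apis/v1/{db}/{table}/snapshots", "getSnapshots");
        add("DELETE", "/apis/v1/{db}/{table}/partitions/{pt}", "dropPartition");
    }

    private void add(String method, String template, String endpoint) {
        routes.add(new Route(method, template, endpoint));
    }

    // Returns (endpoint name, path variables) for the first matching route
    Optional<Map.Entry<String, Map<String, String>>> match(String method, String path) {
        for (Route r : routes) {
            Matcher m = r.pattern.matcher(path);
            if (r.method.equals(method) && m.matches()) {
                Map<String, String> vars = new LinkedHashMap<>();
                for (int i = 0; i < r.varNames.size(); i++) {
                    vars.put(r.varNames.get(i), m.group(i + 1));
                }
                return Optional.of(Map.entry(r.endpoint, vars));
            }
        }
        return Optional.empty();
    }
}
```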
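Part 2: Building the REST Catalog Service Framework
2.1 Project Dependencies
xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <!-- Inherit Spring Boot's dependency management so all starters use matching versions -->
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.7.0</version>
        <relativePath/>
    </parent>
    <groupId>com.example</groupId>
    <artifactId>paimon-rest-catalog</artifactId>
    <version>1.0.0</version>
    <properties>
        <!-- The parent turns java.version into the compiler source/target settings -->
        <java.version>11</java.version>
        <paimon.version>0.9.0</paimon.version>
        <jjwt.version>0.11.5</jjwt.version>
    </properties>
    <dependencies>
        <!-- Paimon Core (an HDFS warehouse additionally needs Hadoop client jars) -->
        <dependency>
            <groupId>org.apache.paimon</groupId>
            <artifactId>paimon-core</artifactId>
            <version>${paimon.version}</version>
        </dependency>
        <!-- Spring Boot Web (already brings Jackson, SLF4J and Logback in versions
             compatible with Boot 2.7, so they are not declared separately) -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- Spring Boot Data JPA (metadata persistence) -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <!-- Spring Security (JWT authentication in Part 5) -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-security</artifactId>
        </dependency>
        <!-- Spring AOP (audit aspect in Part 6) -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-aop</artifactId>
        </dependency>
        <!-- MySQL Driver -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.33</version>
            <scope>runtime</scope>
        </dependency>
        <!-- JWT (JJWT 0.11.x API used in Part 5) -->
        <dependency>
            <groupId>io.jsonwebtoken</groupId>
            <artifactId>jjwt-api</artifactId>
            <version>${jjwt.version}</version>
        </dependency>
        <dependency>
            <groupId>io.jsonwebtoken</groupId>
            <artifactId>jjwt-impl</artifactId>
            <version>${jjwt.version}</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>io.jsonwebtoken</groupId>
            <artifactId>jjwt-jackson</artifactId>
            <version>${jjwt.version}</version>
            <scope>runtime</scope>
        </dependency>
        <!-- Lombok -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.30</version>
            <scope>provided</scope>
        </dependency>
        <!-- Guava for Caching -->
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>32.0.0-jre</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <!-- Repackages the build into the executable JAR used by the Dockerfile in Part 7 -->
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>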
2.2 Core Data Model
java
package com.example.paimon.catalog.model;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import javax.persistence.*;
// Each public entity below belongs in its own .java file in this package.
// DATABASE, TABLE and SCHEMA are reserved words in MySQL, so those columns get
// explicit names while the Java field names stay readable.
// Table metadata
@Entity
@Table(name = "paimon_tables",
       // @Index is only valid inside @Table; this index replaces a separate "uniqueKey" column.
       // It is not unique because dropped tables are kept as DELETED rows.
       indexes = @Index(name = "idx_db_table", columnList = "db_name, table_name"))
@Data
@NoArgsConstructor
@AllArgsConstructor
public class TableMetadata {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    @Column(name = "db_name", nullable = false)
    private String database;
    @Column(name = "table_name", nullable = false)
    private String table;
    @Column(name = "schema_json", columnDefinition = "LONGTEXT")
    private String schema;       // schema as JSON
    @Column(columnDefinition = "LONGTEXT")
    private String properties;   // table properties as JSON
    @Column(nullable = false)
    private String location;     // HDFS or S3 path
    @Column(nullable = false)
    private String owner;
    @Column(nullable = false)
    private Long createdAt;      // epoch milliseconds
    @Column(nullable = false)
    private Long updatedAt;      // epoch milliseconds
    @Column(length = 50)
    private String status;       // ACTIVE, DELETED, ARCHIVED
}
// Column metadata
@Entity
@Table(name = "paimon_columns")
@Data
@NoArgsConstructor
@AllArgsConstructor
public class ColumnMetadata {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    @Column(name = "db_name", nullable = false)
    private String database;
    @Column(name = "table_name", nullable = false)
    private String table;
    @Column(nullable = false)
    private String columnName;
    @Column(nullable = false)
    private String dataType;     // INT, BIGINT, STRING, ...
    @Column(nullable = false)
    private Integer columnIndex; // column position
    @Column(columnDefinition = "TEXT")
    private String comment;
    private Boolean nullable;
    @Column(columnDefinition = "TEXT")
    private String defaultValue;
}
// Operation audit log
@Entity
@Table(name = "paimon_audit_logs")
@Data
@NoArgsConstructor
@AllArgsConstructor
public class AuditLog {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    @Column(nullable = false)
    private String operator;     // who performed the operation
    @Column(nullable = false)
    private String operation;    // CREATE, UPDATE, DELETE, ...
    @Column(nullable = false)
    private String target;       // database.table
    @Column(columnDefinition = "LONGTEXT")
    private String content;      // operation details
    @Column(nullable = false)
    private Long timestamp;      // epoch milliseconds
    private String status;       // SUCCESS, FAILED
    @Column(columnDefinition = "TEXT")
    private String errorMsg;
}
Part 3: Implementing the REST API
3.1 Database Management Endpoints
java
package com.example.paimon.catalog.controller;
import com.example.paimon.catalog.service.DatabaseService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import java.util.Map;
@RestController
@RequestMapping("/apis/v1/databases")
public class DatabaseController {
    @Autowired
    private DatabaseService databaseService;
    /**
     * List all databases
     * GET /apis/v1/databases
     */
    @GetMapping
    public ResponseEntity<?> listDatabases() {
        try {
            return ResponseEntity.ok(Map.of(
                    "databases", databaseService.listDatabases(),
                    "count", databaseService.getDatabaseCount()
            ));
        } catch (Exception e) {
            return error(HttpStatus.INTERNAL_SERVER_ERROR, e);
        }
    }
    /**
     * Get database details
     * GET /apis/v1/databases/{database}
     */
    @GetMapping("/{database}")
    public ResponseEntity<?> getDatabase(@PathVariable String database) {
        try {
            var db = databaseService.getDatabase(database);
            if (db == null) {
                return ResponseEntity.status(HttpStatus.NOT_FOUND)
                        .body(Map.of("error", "Database not found: " + database));
            }
            return ResponseEntity.ok(db);
        } catch (Exception e) {
            return error(HttpStatus.INTERNAL_SERVER_ERROR, e);
        }
    }
    /**
     * Create a database
     * POST /apis/v1/databases
     *
     * Request body:
     * {
     *   "name": "my_db",
     *   "comment": "My database",
     *   "properties": {"key": "value"}
     * }
     */
    @PostMapping
    @SuppressWarnings("unchecked")
    public ResponseEntity<?> createDatabase(@RequestBody Map<String, Object> request) {
        String database = (String) request.get("name");
        if (database == null || database.isBlank()) {
            return ResponseEntity.badRequest()
                    .body(Map.of("error", "Field 'name' is required"));
        }
        try {
            String comment = (String) request.getOrDefault("comment", "");
            Map<String, String> properties =
                    (Map<String, String>) request.getOrDefault("properties", Map.of());
            databaseService.createDatabase(database, comment, properties);
            return ResponseEntity.ok(Map.of(
                    "status", "success",
                    "database", database
            ));
        } catch (Exception e) {
            return error(HttpStatus.BAD_REQUEST, e);
        }
    }
    /**
     * Drop a database
     * DELETE /apis/v1/databases/{database}?cascade=false
     */
    @DeleteMapping("/{database}")
    public ResponseEntity<?> dropDatabase(
            @PathVariable String database,
            @RequestParam(defaultValue = "false") boolean cascade) {
        try {
            databaseService.dropDatabase(database, cascade);
            return ResponseEntity.ok(Map.of(
                    "status", "success",
                    "database", database
            ));
        } catch (Exception e) {
            return error(HttpStatus.BAD_REQUEST, e);
        }
    }
    // Map.of rejects null values, and an exception message may be null
    private static ResponseEntity<Map<String, String>> error(HttpStatus status, Exception e) {
        return ResponseEntity.status(status)
                .body(Map.of("error", String.valueOf(e.getMessage())));
    }
}
3.2 Table Management Endpoints
java
package com.example.paimon.catalog.controller;
import com.example.paimon.catalog.service.TableService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import java.util.Map;
@RestController
@RequestMapping("/apis/v1/{database}/tables")
public class TableController {
    @Autowired
    private TableService tableService;
    /**
     * List all tables in a database
     * GET /apis/v1/{database}/tables
     */
    @GetMapping
    public ResponseEntity<?> listTables(@PathVariable String database) {
        try {
            return ResponseEntity.ok(Map.of(
                    "database", database,
                    "tables", tableService.listTables(database),
                    "count", tableService.getTableCount(database)
            ));
        } catch (Exception e) {
            return error(HttpStatus.INTERNAL_SERVER_ERROR, e);
        }
    }
    /**
     * Get table details
     * GET /apis/v1/{database}/tables/{table}
     */
    @GetMapping("/{table}")
    public ResponseEntity<?> getTable(
            @PathVariable String database,
            @PathVariable String table) {
        try {
            var tableInfo = tableService.getTable(database, table);
            if (tableInfo == null) {
                return ResponseEntity.status(HttpStatus.NOT_FOUND)
                        .body(Map.of("error", "Table not found: " + database + "." + table));
            }
            return ResponseEntity.ok(tableInfo);
        } catch (Exception e) {
            return error(HttpStatus.INTERNAL_SERVER_ERROR, e);
        }
    }
    /**
     * Create a table
     * POST /apis/v1/{database}/tables
     *
     * Request body:
     * {
     *   "name": "orders",
     *   "schema": {
     *     "fields": [
     *       {"name": "dt", "type": "STRING", "nullable": false},
     *       {"name": "order_id", "type": "BIGINT", "nullable": false},
     *       {"name": "amount", "type": "DECIMAL(10, 2)", "nullable": true}
     *     ],
     *     "primaryKeys": ["dt", "order_id"]
     *   },
     *   "partitionKeys": ["dt"],
     *   "properties": {
     *     "bucket": "16",
     *     "merge-engine": "deduplicate"
     *   }
     * }
     *
     * Partition keys must be declared in "fields", and Paimon requires the primary
     * key of a partitioned primary-key table to include every partition key.
     */
    @PostMapping
    public ResponseEntity<?> createTable(
            @PathVariable String database,
            @RequestBody Map<String, Object> request) {
        String table = (String) request.get("name");
        if (table == null || table.isBlank()) {
            return ResponseEntity.badRequest()
                    .body(Map.of("error", "Field 'name' is required"));
        }
        try {
            tableService.createTable(database, table, request);
            return ResponseEntity.ok(Map.of(
                    "status", "success",
                    "database", database,
                    "table", table
            ));
        } catch (Exception e) {
            return error(HttpStatus.BAD_REQUEST, e);
        }
    }
    /**
     * Update table properties
     * PUT /apis/v1/{database}/tables/{table}
     */
    @PutMapping("/{table}")
    public ResponseEntity<?> alterTable(
            @PathVariable String database,
            @PathVariable String table,
            @RequestBody Map<String, Object> request) {
        try {
            tableService.alterTable(database, table, request);
            return ResponseEntity.ok(Map.of(
                    "status", "success",
                    "message", "Table altered successfully"
            ));
        } catch (Exception e) {
            return error(HttpStatus.BAD_REQUEST, e);
        }
    }
    /**
     * Drop a table
     * DELETE /apis/v1/{database}/tables/{table}
     */
    @DeleteMapping("/{table}")
    public ResponseEntity<?> dropTable(
            @PathVariable String database,
            @PathVariable String table) {
        try {
            tableService.dropTable(database, table);
            return ResponseEntity.ok(Map.of(
                    "status", "success",
                    "database", database,
                    "table", table
            ));
        } catch (Exception e) {
            return error(HttpStatus.BAD_REQUEST, e);
        }
    }
    // Map.of rejects null values, and an exception message may be null
    private static ResponseEntity<Map<String, String>> error(HttpStatus status, Exception e) {
        return ResponseEntity.status(status)
                .body(Map.of("error", String.valueOf(e.getMessage())));
    }
}
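Several failure modes of table creation (a partition key that is not a declared column, a primary key that omits the partition key) can be caught before the request reaches the catalog. A minimal sketch of such a check, using a hypothetical `CreateTableRequestValidator` over the same request-body shape; the partition-key rule mirrors the constraint Paimon applies to primary-key tables.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical validator for the createTable request body; an empty result means valid
final class CreateTableRequestValidator {
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    @SuppressWarnings("unchecked")
    static List<String> validate(Map<String, Object> request) {
        List<String> errors = new ArrayList<>();
        Object name = request.get("name");
        if (!(name instanceof String) || !IDENTIFIER.matcher((String) name).matches()) {
            errors.add("name must be an identifier");
        }
        Object schemaObj = request.get("schema");
        if (!(schemaObj instanceof Map)) {
            errors.add("schema is required");
            return errors;
        }
        Map<String, Object> schema = (Map<String, Object>) schemaObj;
        List<Map<String, Object>> fields =
                (List<Map<String, Object>>) schema.getOrDefault("fields", List.of());
        if (fields.isEmpty()) {
            errors.add("schema.fields must not be empty");
        }
        Set<String> columns = new HashSet<>();
        for (Map<String, Object> field : fields) {
            Object column = field.get("name");
            if (!(column instanceof String) || !columns.add((String) column)) {
                errors.add("missing or duplicate column name: " + column);
            }
        }
        List<String> primaryKeys = (List<String>) schema.getOrDefault("primaryKeys", List.of());
        List<String> partitionKeys = (List<String>) request.getOrDefault("partitionKeys", List.of());
        for (String key : primaryKeys) {
            if (!columns.contains(key)) errors.add("unknown primary key column: " + key);
        }
        for (String key : partitionKeys) {
            if (!columns.contains(key)) errors.add("unknown partition column: " + key);
        }
        // A primary-key table's primary key must contain every partition column
        if (!primaryKeys.isEmpty() && !primaryKeys.containsAll(partitionKeys)) {
            errors.add("primaryKeys must include all partitionKeys");
        }
        return errors;
    }
}
```

Returning every problem at once lets the controller answer with a single 400 response that lists all of them, instead of surfacing only the first catalog exception.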
3.3 Schema Management Endpoints
java
package com.example.paimon.catalog.controller;
import com.example.paimon.catalog.service.SchemaService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import java.util.Map;
@RestController
@RequestMapping("/apis/v1/{database}/{table}")
public class SchemaController {
    @Autowired
    private SchemaService schemaService;
    /**
     * Get a table's schema
     * GET /apis/v1/{database}/{table}/schema
     */
    @GetMapping("/schema")
    public ResponseEntity<?> getSchema(
            @PathVariable String database,
            @PathVariable String table) {
        try {
            var schema = schemaService.getSchema(database, table);
            if (schema == null) {
                return ResponseEntity.status(HttpStatus.NOT_FOUND)
                        .body(Map.of("error", "Table not found: " + database + "." + table));
            }
            return ResponseEntity.ok(Map.of(
                    "database", database,
                    "table", table,
                    "schema", schema
            ));
        } catch (Exception e) {
            return error(HttpStatus.INTERNAL_SERVER_ERROR, e);
        }
    }
    /**
     * Alter the schema (add columns)
     * POST /apis/v1/{database}/{table}/schema
     *
     * Request body:
     * {
     *   "operation": "add",
     *   "columns": [
     *     {"name": "new_col", "type": "STRING"}
     *   ]
     * }
     */
    @PostMapping("/schema")
    public ResponseEntity<?> alterSchema(
            @PathVariable String database,
            @PathVariable String table,
            @RequestBody Map<String, Object> request) {
        try {
            schemaService.alterSchema(database, table, request);
            return ResponseEntity.ok(Map.of(
                    "status", "success",
                    "message", "Schema updated successfully"
            ));
        } catch (Exception e) {
            return error(HttpStatus.BAD_REQUEST, e);
        }
    }
    /**
     * Get table statistics
     * GET /apis/v1/{database}/{table}/stats
     */
    @GetMapping("/stats")
    public ResponseEntity<?> getTableStats(
            @PathVariable String database,
            @PathVariable String table) {
        try {
            var stats = schemaService.getTableStats(database, table);
            return ResponseEntity.ok(stats);
        } catch (Exception e) {
            return error(HttpStatus.INTERNAL_SERVER_ERROR, e);
        }
    }
    /**
     * Get the partition list
     * GET /apis/v1/{database}/{table}/partitions?limit=100
     */
    @GetMapping("/partitions")
    public ResponseEntity<?> getPartitions(
            @PathVariable String database,
            @PathVariable String table,
            @RequestParam(defaultValue = "100") int limit) {
        try {
            var partitions = schemaService.getPartitions(database, table, limit);
            return ResponseEntity.ok(Map.of(
                    "database", database,
                    "table", table,
                    "partitions", partitions
            ));
        } catch (Exception e) {
            return error(HttpStatus.INTERNAL_SERVER_ERROR, e);
        }
    }
    /**
     * Get the snapshot list
     * GET /apis/v1/{database}/{table}/snapshots?limit=20
     */
    @GetMapping("/snapshots")
    public ResponseEntity<?> getSnapshots(
            @PathVariable String database,
            @PathVariable String table,
            @RequestParam(defaultValue = "20") int limit) {
        try {
            var snapshots = schemaService.getSnapshots(database, table, limit);
            return ResponseEntity.ok(Map.of(
                    "database", database,
                    "table", table,
                    "snapshots", snapshots
            ));
        } catch (Exception e) {
            return error(HttpStatus.INTERNAL_SERVER_ERROR, e);
        }
    }
    // Map.of rejects null values, and an exception message may be null
    private static ResponseEntity<Map<String, String>> error(HttpStatus status, Exception e) {
        return ResponseEntity.status(status)
                .body(Map.of("error", String.valueOf(e.getMessage())));
    }
}
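Part 4: Core Business Logic
4.1 Table Service Layer
java
package com.example.paimon.catalog.service;
import com.example.paimon.catalog.model.TableMetadata;
import com.example.paimon.catalog.repository.TableMetadataRepository;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.extern.slf4j.Slf4j;
import org.apache.paimon.catalog.Catalog;
import org.apache.paimon.catalog.Identifier;
import org.apache.paimon.schema.Schema;
import org.apache.paimon.schema.SchemaChange;
import org.apache.paimon.types.DataType;
import org.apache.paimon.types.DataTypes;
import org.apache.paimon.types.DecimalType;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
@Slf4j
@Service
public class TableService {
    private static final Pattern DECIMAL_TYPE = Pattern.compile(
            "DECIMAL\\s*\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)");
    @Autowired
    private TableMetadataRepository tableMetadataRepository;
    // One long-lived Catalog bean (see PaimonCatalogConfig below). A Catalog holds
    // resources and must be closed, so it is not kept in an expiring cache.
    @Autowired
    private Catalog catalog;
    @Autowired
    private ObjectMapper objectMapper;
    /**
     * List all tables in a database
     */
    public List<String> listTables(String database) throws Exception {
        return catalog.listTables(database);
    }
    /**
     * Get table metadata, or null if no active table exists
     */
    public Map<String, Object> getTable(String database, String table) {
        TableMetadata metadata = tableMetadataRepository
                .findByDatabaseAndTableAndStatus(database, table, "ACTIVE");
        if (metadata == null) {
            return null;
        }
        // LinkedHashMap instead of Map.of, because schema or properties may be null
        Map<String, Object> result = new LinkedHashMap<>();
        result.put("database", metadata.getDatabase());
        result.put("table", metadata.getTable());
        result.put("schema", metadata.getSchema());
        result.put("location", metadata.getLocation());
        result.put("owner", metadata.getOwner());
        result.put("createdAt", metadata.getCreatedAt());
        result.put("updatedAt", metadata.getUpdatedAt());
        result.put("properties", metadata.getProperties());
        return result;
    }
    /**
     * Create a table
     */
    @SuppressWarnings("unchecked")
    public void createTable(String database, String table,
                            Map<String, Object> request) throws Exception {
        log.info("Creating table: {}.{}", database, table);
        // Validate the request
        validateCreateTableRequest(request);
        // Build the schema; "primaryKeys" sits inside "schema", as in the request body in 3.2
        Map<String, Object> schemaMap = (Map<String, Object>) request.get("schema");
        List<String> primaryKeys =
                (List<String>) schemaMap.getOrDefault("primaryKeys", List.of());
        List<String> partitionKeys =
                (List<String>) request.getOrDefault("partitionKeys", List.of());
        // Table properties; JSON values such as 16 are converted to strings
        Map<String, String> properties = new HashMap<>();
        ((Map<String, Object>) request.getOrDefault("properties", Map.of()))
                .forEach((key, value) -> properties.put(key, String.valueOf(value)));
        // Paimon takes table options as part of the Schema, not as a separate argument
        Schema.Builder builder = Schema.newBuilder()
                .primaryKey(primaryKeys)
                .partitionKeys(partitionKeys)
                .options(properties);
        addColumns(builder, schemaMap);
        Schema schema = builder.build();
        try {
            // Create the table in Paimon
            catalog.createTable(Identifier.create(database, table), schema, false);
            // Save the metadata to MySQL. The two writes are not atomic: if this save
            // fails, the Paimon table exists without a MySQL record and must be reconciled.
            long now = System.currentTimeMillis();
            TableMetadata metadata = new TableMetadata();
            metadata.setDatabase(database);
            metadata.setTable(table);
            metadata.setSchema(objectMapper.writeValueAsString(schemaMap));
            metadata.setLocation(String.valueOf(request.getOrDefault("location", "")));
            metadata.setOwner(String.valueOf(request.getOrDefault("owner", "admin")));
            metadata.setCreatedAt(now);
            metadata.setUpdatedAt(now);
            metadata.setStatus("ACTIVE");
            metadata.setProperties(objectMapper.writeValueAsString(properties));
            tableMetadataRepository.save(metadata);
            log.info("Table created successfully: {}.{}", database, table);
        } catch (Exception e) {
            log.error("Failed to create table: {}.{}", database, table, e);
            throw e;
        }
    }
    /**
     * Update table properties; request body: {"properties": {"key": "value"}}
     */
    @SuppressWarnings("unchecked")
    public void alterTable(String database, String table,
                           Map<String, Object> request) throws Exception {
        List<SchemaChange> changes = new ArrayList<>();
        ((Map<String, Object>) request.getOrDefault("properties", Map.of()))
                .forEach((key, value) ->
                        changes.add(SchemaChange.setOption(key, String.valueOf(value))));
        catalog.alterTable(Identifier.create(database, table), changes, false);
        // Note: the copy of the properties stored in MySQL is not refreshed here
    }
    /**
     * Drop a table
     */
    public void dropTable(String database, String table) throws Exception {
        log.info("Dropping table: {}.{}", database, table);
        try {
            catalog.dropTable(Identifier.create(database, table), false);
            // Soft-delete the MySQL record
            TableMetadata metadata = tableMetadataRepository
                    .findByDatabaseAndTableAndStatus(database, table, "ACTIVE");
            if (metadata != null) {
                metadata.setStatus("DELETED");
                metadata.setUpdatedAt(System.currentTimeMillis());
                tableMetadataRepository.save(metadata);
            }
            log.info("Table dropped successfully: {}.{}", database, table);
        } catch (Exception e) {
            log.error("Failed to drop table: {}.{}", database, table, e);
            throw e;
        }
    }
    /**
     * Get the number of active tables
     */
    public long getTableCount(String database) {
        return tableMetadataRepository.countByDatabaseAndStatus(database, "ACTIVE");
    }
    // Helper methods
    private void validateCreateTableRequest(Map<String, Object> request) {
        Object schema = request.get("schema");
        if (!(request.get("name") instanceof String)
                || !(schema instanceof Map)
                || !(((Map<?, ?>) schema).get("fields") instanceof List)) {
            throw new IllegalArgumentException("'name' and 'schema.fields' are required");
        }
    }
    @SuppressWarnings("unchecked")
    private void addColumns(Schema.Builder builder, Map<String, Object> schemaMap) {
        List<Map<String, Object>> fields =
                (List<Map<String, Object>>) schemaMap.get("fields");
        for (Map<String, Object> field : fields) {
            String name = (String) field.get("name");
            boolean nullable = !Boolean.FALSE.equals(field.get("nullable"));
            // copy(nullable) applies the requested nullability to the parsed type
            DataType type = parseDataType((String) field.get("type")).copy(nullable);
            builder.column(name, type, (String) field.get("comment"));
        }
    }
    private DataType parseDataType(String typeStr) {
        String type = typeStr.trim().toUpperCase(Locale.ROOT);
        Matcher decimal = DECIMAL_TYPE.matcher(type);
        if (decimal.matches()) {
            // e.g. "DECIMAL(10, 2)": use the declared precision and scale
            return DataTypes.DECIMAL(Integer.parseInt(decimal.group(1)),
                    Integer.parseInt(decimal.group(2)));
        }
        switch (type) {
            case "INT":
                return DataTypes.INT();
            case "BIGINT":
                return DataTypes.BIGINT();
            case "STRING":
                // new VarCharType() defaults to length 1; STRING is VARCHAR of maximum length
                return DataTypes.STRING();
            case "DOUBLE":
                return DataTypes.DOUBLE();
            case "DECIMAL":
                return new DecimalType(); // Paimon's default precision and scale
            // ... handling for other types
            default:
                // Fail instead of silently falling back to a string column
                throw new IllegalArgumentException("Unsupported type: " + typeStr);
        }
    }
}
// PaimonCatalogConfig.java (package com.example.paimon.catalog.config): a single Catalog
// for the whole service, closed on shutdown. Imports: org.apache.paimon.catalog.CatalogContext,
// org.apache.paimon.catalog.CatalogFactory, org.apache.paimon.options.Options,
// org.springframework.context.annotation.{Bean, Configuration},
// org.springframework.beans.factory.annotation.Value
@Configuration
public class PaimonCatalogConfig {
    @Bean(destroyMethod = "close")
    public Catalog paimonCatalog(@Value("${paimon.warehouse}") String warehouse) {
        Options options = Options.fromMap(Map.of("warehouse", warehouse));
        return CatalogFactory.createCatalog(CatalogContext.create(options));
    }
}
// TableMetadataRepository.java (package com.example.paimon.catalog.repository);
// Spring Data derives both queries from the method names
public interface TableMetadataRepository extends JpaRepository<TableMetadata, Long> {
    TableMetadata findByDatabaseAndTableAndStatus(String database, String table, String status);
    long countByDatabaseAndStatus(String database, String status);
}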
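Type-string parsing is where information is easiest to lose silently: an equality check against "DECIMAL" never matches "DECIMAL(10, 2)". The sketch below isolates the parsing step in plain Java so it can be unit-tested without Paimon on the classpath. `TypeSpec` and its list of supported types are hypothetical; mapping the result onto Paimon's `DataTypes` would still happen as in `parseDataType` above.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parsed form of a type string such as "DECIMAL(10, 2)" or "VARCHAR(64)"
final class TypeSpec {
    // NAME, optionally followed by "(p)" or "(p, s)", with free whitespace
    private static final Pattern SYNTAX = Pattern.compile(
            "\\s*([A-Z]+)\\s*(?:\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\))?\\s*");
    private static final Set<String> SUPPORTED =
            Set.of("INT", "BIGINT", "STRING", "DOUBLE", "DECIMAL", "VARCHAR");
    private static final Set<String> PARAMETERIZED = Set.of("DECIMAL", "VARCHAR");

    final String name;
    final Integer precision; // VARCHAR length or DECIMAL precision; null when omitted
    final Integer scale;     // DECIMAL scale; null when omitted

    private TypeSpec(String name, Integer precision, Integer scale) {
        this.name = name;
        this.precision = precision;
        this.scale = scale;
    }

    static TypeSpec parse(String text) {
        Matcher m = SYNTAX.matcher(text.toUpperCase(Locale.ROOT));
        if (!m.matches() || !SUPPORTED.contains(m.group(1))) {
            // Fail loudly instead of silently falling back to a string column
            throw new IllegalArgumentException("Unsupported type: " + text);
        }
        String name = m.group(1);
        Integer precision = m.group(2) == null ? null : Integer.valueOf(m.group(2));
        Integer scale = m.group(3) == null ? null : Integer.valueOf(m.group(3));
        if (precision != null && !PARAMETERIZED.contains(name)) {
            throw new IllegalArgumentException(name + " takes no parameters: " + text);
        }
        if (scale != null && (!name.equals("DECIMAL") || scale > precision)) {
            throw new IllegalArgumentException("Invalid scale: " + text);
        }
        return new TypeSpec(name, precision, scale);
    }

    @Override
    public String toString() {
        if (precision == null) {
            return name;
        }
        return scale == null ? name + "(" + precision + ")"
                : name + "(" + precision + ", " + scale + ")";
    }
}
```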
4.2 Caching and Performance Optimization
java
package com.example.paimon.catalog.cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
@Slf4j
@Component
public class CatalogCacheManager {
    /**
     * Schema cache: "database.table" → schema
     */
    private final LoadingCache<String, String> schemaCache =
            CacheBuilder.newBuilder()
                    .maximumSize(1000)
                    .expireAfterWrite(1, TimeUnit.HOURS)
                    .recordStats()
                    .build(new CacheLoader<String, String>() {
                        @Override
                        public String load(String key) throws Exception {
                            // key format: "database.table"; limit 2 keeps any further dots in the table part
                            String[] parts = key.split("\\.", 2);
                            return loadSchemaFromPaimon(parts[0], parts[1]);
                        }
                    });
    /**
     * Partition cache: "database.table" → partition list
     */
    private final LoadingCache<String, List<String>> partitionCache =
            CacheBuilder.newBuilder()
                    .maximumSize(100)
                    .expireAfterWrite(10, TimeUnit.MINUTES)
                    .recordStats()
                    .build(new CacheLoader<String, List<String>>() {
                        @Override
                        public List<String> load(String tableKey) throws Exception {
                            String[] parts = tableKey.split("\\.", 2);
                            return loadPartitionsFromPaimon(parts[0], parts[1]);
                        }
                    });
    /**
     * Get a schema (loaded from Paimon on a cache miss)
     */
    public String getSchema(String database, String table)
            throws ExecutionException {
        String key = database + "." + table;
        return schemaCache.get(key);
    }
    /**
     * Invalidate the schema cache (call after every schema change)
     */
    public void invalidateSchema(String database, String table) {
        String key = database + "." + table;
        schemaCache.invalidate(key);
        log.info("Invalidated schema cache for {}", key);
    }
    /**
     * Get partitions (loaded from Paimon on a cache miss)
     */
    public List<String> getPartitions(String database, String table)
            throws ExecutionException {
        String key = database + "." + table;
        return partitionCache.get(key);
    }
    /**
     * Invalidate the partition cache (call after partitions are added)
     */
    public void invalidatePartitions(String database, String table) {
        String key = database + "." + table;
        partitionCache.invalidate(key);
        log.info("Invalidated partition cache for {}", key);
    }
    /**
     * Get cache statistics (available because of recordStats() above)
     */
    public Map<String, Object> getCacheStats() {
        var schemaStats = schemaCache.stats();
        var partitionStats = partitionCache.stats();
        return Map.of(
                "schema", Map.of(
                        "hits", schemaStats.hitCount(),
                        "misses", schemaStats.missCount(),
                        "hitRate", schemaStats.hitRate()
                ),
                "partitions", Map.of(
                        "hits", partitionStats.hitCount(),
                        "misses", partitionStats.missCount(),
                        "hitRate", partitionStats.hitRate()
                )
        );
    }
    private String loadSchemaFromPaimon(String database, String table)
            throws Exception {
        // Read the schema from Paimon
        log.debug("Loading schema for {}.{} from Paimon", database, table);
        // ... implementation details
        return "";
    }
    private List<String> loadPartitionsFromPaimon(String database, String table)
            throws Exception {
        // Read the partition list from Paimon
        log.debug("Loading partitions for {}.{} from Paimon", database, table);
        // ... implementation details
        return new ArrayList<>();
    }
}
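To show what the Guava settings above actually do, here is a dependency-free sketch of the same three behaviors: size-bounded LRU eviction (`maximumSize`), expiry measured from the last write (`expireAfterWrite`), and hit/miss counting (`recordStats`). `TtlLruCache` is hypothetical. It uses a single lock and an injectable clock, trading throughput for simplicity, whereas Guava's cache is internally segmented for concurrency.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch: LRU eviction + expire-after-write + hit/miss statistics
final class TtlLruCache<K, V> {
    private static final class Timestamped<T> {
        final T value;
        final long writtenAt;

        Timestamped(T value, long writtenAt) {
            this.value = value;
            this.writtenAt = writtenAt;
        }
    }

    private final long ttlMillis;
    private final LongSupplier clock; // injectable so tests can move time forward
    private final Function<K, V> loader;
    private final LinkedHashMap<K, Timestamped<V>> map;
    private long hits;
    private long misses;

    TtlLruCache(int maxSize, long ttlMillis, LongSupplier clock, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first
        this.map = new LinkedHashMap<K, Timestamped<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timestamped<V>> eldest) {
                return size() > maxSize;
            }
        };
    }

    synchronized V get(K key) {
        long now = clock.getAsLong();
        Timestamped<V> entry = map.get(key);
        if (entry != null && now - entry.writtenAt < ttlMillis) {
            hits++;
            return entry.value;
        }
        misses++; // absent or expired: reload from the source of truth
        V value = loader.apply(key);
        map.put(key, new Timestamped<>(value, now));
        return value;
    }

    synchronized void invalidate(K key) {
        map.remove(key);
    }

    synchronized long hits() { return hits; }

    synchronized long misses() { return misses; }

    synchronized double hitRate() {
        long total = hits + misses;
        return total == 0 ? 1.0 : (double) hits / total;
    }
}
```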
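Part 5: Authentication and Authorization
5.1 JWT Authentication Filter
java
package com.example.paimon.catalog.security;
import io.jsonwebtoken.JwtException;
import io.jsonwebtoken.Jwts;
import io.jsonwebtoken.SignatureAlgorithm;
import io.jsonwebtoken.security.Keys;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.security.authentication.UsernamePasswordAuthenticationToken;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;
import javax.annotation.PostConstruct;
import javax.crypto.SecretKey;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Date;
// JwtTokenProvider and JwtAuthenticationFilter each belong in their own file in this package.
@Component
public class JwtTokenProvider {
    // HS256 needs a secret of at least 32 bytes; always override the default in production
    @Value("${jwt.secret:your-secret-key-change-in-production}")
    private String jwtSecret;
    @Value("${jwt.expiration:86400000}") // 24 hours
    private long jwtExpirationMs;
    private SecretKey signingKey;
    @PostConstruct
    void init() {
        // Build the key once; hmacShaKeyFor rejects secrets shorter than 256 bits
        signingKey = Keys.hmacShaKeyFor(jwtSecret.getBytes(StandardCharsets.UTF_8));
    }
    /**
     * Generate a token
     */
    public String generateToken(String username) {
        Date now = new Date();
        return Jwts.builder()
                .setSubject(username)
                .setIssuedAt(now)
                .setExpiration(new Date(now.getTime() + jwtExpirationMs))
                // HS512 would require a secret of at least 64 bytes
                .signWith(signingKey, SignatureAlgorithm.HS256)
                .compact();
    }
    /**
     * Extract the token from the Authorization header ("Bearer <token>")
     */
    public String getTokenFromRequest(HttpServletRequest request) {
        String bearerToken = request.getHeader("Authorization");
        if (bearerToken != null && bearerToken.startsWith("Bearer ")) {
            return bearerToken.substring(7);
        }
        return null;
    }
    /**
     * Validate the token and return the username, or null if it is invalid or expired
     */
    public String getUsernameFromToken(String token) {
        try {
            return Jwts.parserBuilder()
                    .setSigningKey(signingKey)
                    .build()
                    .parseClaimsJws(token)
                    .getBody()
                    .getSubject();
        } catch (JwtException | IllegalArgumentException e) {
            return null;
        }
    }
}
// Authentication filter. Add it to the Spring Security filter chain, e.g.
// http.addFilterBefore(jwtAuthenticationFilter, UsernamePasswordAuthenticationFilter.class);
// otherwise Spring Boot registers it as an ordinary servlet filter that runs after
// Spring Security has already made its access decision.
@Component
public class JwtAuthenticationFilter extends OncePerRequestFilter {
    @Autowired
    private JwtTokenProvider jwtTokenProvider;
    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain filterChain)
            throws ServletException, IOException {
        try {
            String token = jwtTokenProvider.getTokenFromRequest(request);
            if (token != null) {
                String username = jwtTokenProvider.getUsernameFromToken(token);
                if (username != null) {
                    // The three-argument constructor creates an authenticated token
                    SecurityContextHolder.getContext().setAuthentication(
                            new UsernamePasswordAuthenticationToken(
                                    username, null, Collections.emptyList()));
                }
            }
        } catch (Exception e) {
            logger.error("Cannot set user authentication", e);
        }
        // Requests without a valid token continue unauthenticated; Spring Security
        // then rejects them on protected endpoints
        filterChain.doFilter(request, response);
    }
}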
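The JJWT calls above hide the token format. As an illustration only, the following sketch builds and checks an HS256 token with the JDK's `javax.crypto.Mac`: a token is base64url(header).base64url(payload).base64url(HMAC-SHA256 over the first two parts), and verification recomputes the signature, compares it in constant time, and checks the `exp` claim. `MiniJwt` is hypothetical, handles only the `sub` and `exp` claims, and does no JSON escaping; the service itself should keep using a JWT library.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical, dependency-free illustration of HS256 JWT signing and verification
final class MiniJwt {
    private static final Base64.Encoder ENC = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder DEC = Base64.getUrlDecoder();
    private static final Pattern SUB = Pattern.compile("\"sub\":\"([^\"]*)\"");
    private static final Pattern EXP = Pattern.compile("\"exp\":(\\d+)");

    private final byte[] secret;

    MiniJwt(String secret) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
        if (this.secret.length < 32) {
            throw new IllegalArgumentException("HS256 secret must be at least 32 bytes");
        }
    }

    // Assumes the subject contains no quotes or backslashes (no JSON escaping here)
    String sign(String subject, long expEpochSeconds) {
        String header = ENC.encodeToString(
                "{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = ENC.encodeToString(("{\"sub\":\"" + subject + "\",\"exp\":"
                + expEpochSeconds + "}").getBytes(StandardCharsets.UTF_8));
        String signingInput = header + "." + payload;
        return signingInput + "." + ENC.encodeToString(hmac(signingInput));
    }

    // Returns the subject, or null for a bad signature, malformed token, or expired token
    String verify(String token, long nowEpochSeconds) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            return null;
        }
        byte[] actual;
        try {
            actual = DEC.decode(parts[2]);
        } catch (IllegalArgumentException e) {
            return null;
        }
        // Constant-time comparison avoids leaking how many signature bytes matched
        if (!MessageDigest.isEqual(hmac(parts[0] + "." + parts[1]), actual)) {
            return null;
        }
        String json = new String(DEC.decode(parts[1]), StandardCharsets.UTF_8);
        Matcher sub = SUB.matcher(json);
        Matcher exp = EXP.matcher(json);
        if (!sub.find() || !exp.find()) {
            return null;
        }
        return nowEpochSeconds < Long.parseLong(exp.group(1)) ? sub.group(1) : null;
    }

    private byte[] hmac(String input) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(input.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the signature covers the encoded header and payload, changing any claim (for example swapping the subject for "admin") invalidates the token unless the attacker also knows the secret.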
5.2 Role-Based Access Control (RBAC)
java
package com.example.paimon.catalog.security;
import org.springframework.stereotype.Service;
@Service
public class AccessControlService {
    /**
     * Check whether a user may perform an operation on a table
     */
    public boolean hasTableAccess(String username, String database,
                                  String table, String operation) {
        // Permission check logic goes here
        // operation: CREATE, READ, UPDATE, DELETE
        // Example: read the permission configuration from the database
        // var permission = permissionRepository
        //         .findByUsernameAndResourceAndOperation(
        //                 username, database + "." + table, operation);
        // return permission != null;
        return true; // placeholder: allows everything; replace before production use
    }
    /**
     * Check whether a user is an administrator of the database
     */
    public boolean isDatabaseAdmin(String username, String database) {
        // Administrator check logic goes here
        return true; // placeholder: simplified implementation
    }
}
// Usage in a controller: these are the additions to TableController from 3.2, not a
// second TableController class (they also need import java.security.Principal)
@RestController
@RequestMapping("/apis/v1/{database}/tables")
public class TableController {
    @Autowired
    private AccessControlService accessControlService;
    @PostMapping
    public ResponseEntity<?> createTable(
            @PathVariable String database,
            @RequestBody Map<String, Object> request,
            Principal principal) {
        // principal is null when the request was not authenticated
        if (principal == null) {
            return ResponseEntity.status(401)
                    .body(Map.of("error", "Authentication required"));
        }
        String username = principal.getName();
        // Check permissions
        if (!accessControlService.hasTableAccess(
                username, database,
                (String) request.get("name"), "CREATE")) {
            return ResponseEntity.status(403)
                    .body(Map.of("error", "Permission denied"));
        }
        // ... table creation logic
    }
}
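The `hasTableAccess` placeholder above returns true for everyone. One possible rule model, sketched with a hypothetical in-memory `PermissionRules` class: grants are stored per user and resource, a resource is either `db.table` or `db.*` (every table in a database), and anything not explicitly granted is denied. A production version would load these grants from a table such as the one behind the commented-out `permissionRepository`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical deny-by-default permission store for hasTableAccess
final class PermissionRules {
    // key: "user|resource", value: granted operations (CREATE, READ, UPDATE, DELETE)
    private final Map<String, Set<String>> grants = new HashMap<>();

    void grant(String user, String resource, String... operations) {
        grants.computeIfAbsent(user + "|" + resource, k -> new HashSet<>())
              .addAll(Arrays.asList(operations));
    }

    boolean hasTableAccess(String user, String database, String table, String operation) {
        // Check the exact table first, then the database-wide wildcard
        for (String resource : List.of(database + "." + table, database + ".*")) {
            if (grants.getOrDefault(user + "|" + resource, Set.of()).contains(operation)) {
                return true;
            }
        }
        return false;
    }
}
```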
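Part 6: Auditing and Monitoring
6.1 Operation Auditing
java
package com.example.paimon.catalog.audit;
import com.example.paimon.catalog.model.AuditLog;
import com.example.paimon.catalog.repository.AuditLogRepository;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.extern.slf4j.Slf4j;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.security.core.Authentication;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.stereotype.Component;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;
// AuditLogger, Auditable and AuditAspect each belong in their own file in this package.
@Slf4j
@Component
public class AuditLogger {
    @Autowired
    private AuditLogRepository auditLogRepository;
    @Autowired
    private ObjectMapper objectMapper;
    /**
     * Record an operation log entry. Failures are logged and swallowed so that
     * auditing never breaks the business operation itself.
     */
    public void log(String operator, String operation, String target,
                    Map<String, Object> content, String status,
                    String errorMsg) {
        try {
            // Named "entry" so it does not shadow Lombok's "log" field
            AuditLog entry = new AuditLog();
            entry.setOperator(operator);
            entry.setOperation(operation); // CREATE, UPDATE, DELETE
            entry.setTarget(target);       // database.table
            entry.setContent(content == null ? null : objectMapper.writeValueAsString(content));
            entry.setTimestamp(System.currentTimeMillis());
            entry.setStatus(status);       // SUCCESS, FAILED
            entry.setErrorMsg(errorMsg);
            auditLogRepository.save(entry);
            log.info("Audit log recorded: {} {} {}", operator, operation, target);
        } catch (Exception e) {
            log.error("Failed to record audit log", e);
        }
    }
    /**
     * Shorthand for a successful operation
     */
    public void logSuccess(String operator, String operation, String target,
                           Map<String, Object> content) {
        log(operator, operation, target, content, "SUCCESS", null);
    }
    /**
     * Shorthand for a failed operation
     */
    public void logFailure(String operator, String operation, String target,
                           String errorMsg) {
        log(operator, operation, target, null, "FAILED", errorMsg);
    }
}
// Marks methods whose calls are audited
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface Auditable {
    String operation();
}
// Automatic auditing with AspectJ annotations (executed by Spring AOP)
@Slf4j
@Aspect
@Component
public class AuditAspect {
    @Autowired
    private AuditLogger auditLogger;
    // Binding the annotation as a parameter avoids looking it up through reflection
    @Around("@annotation(auditable)")
    public Object auditOperation(ProceedingJoinPoint joinPoint, Auditable auditable)
            throws Throwable {
        String operator = getOperator();
        String operation = auditable.operation();
        String target = extractTarget(joinPoint);
        try {
            Object result = joinPoint.proceed();
            auditLogger.logSuccess(operator, operation, target, null);
            return result;
        } catch (Throwable t) {
            auditLogger.logFailure(operator, operation, target, t.getMessage());
            throw t;
        }
    }
    private String getOperator() {
        Authentication auth = SecurityContextHolder.getContext()
                .getAuthentication();
        return auth != null ? auth.getName() : "UNKNOWN";
    }
    // Assumes the first two arguments are the database and table names
    private String extractTarget(ProceedingJoinPoint joinPoint) {
        Object[] args = joinPoint.getArgs();
        if (args.length >= 2) {
            return args[0] + "." + args[1];
        }
        return "UNKNOWN";
    }
}
// Using the annotation, e.g. on TableService.createTable. Spring AOP only intercepts
// calls that go through the Spring proxy, so calls from inside the same bean are not audited.
@Auditable(operation = "CREATE")
public void createTable(String database, String table,
                        Map<String, Object> request) throws Exception {
    // ...
}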
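Spring AOP implements `@Around` advice with proxies. The same idea can be shown with the JDK's `java.lang.reflect.Proxy` and no Spring at all: in the sketch below, the hypothetical `AuditOp` annotation plays the role of `@Auditable`, `TableOps` is a hypothetical service interface, and `AuditProxy` records a SUCCESS or FAILED line per annotated call while rethrowing the original exception. Like any proxy-based interception, it only sees calls made through the proxy.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

// Hypothetical stand-in for @Auditable, read at runtime by the proxy
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface AuditOp {
    String value();
}

interface TableOps {
    @AuditOp("CREATE")
    void createTable(String database, String table);

    @AuditOp("DELETE")
    void dropTable(String database, String table);
}

// Wraps an interface implementation and appends one audit line per annotated call
final class AuditProxy {
    static <T> T wrap(Class<T> iface, T target, String operator, List<String> sink) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface},
                (self, method, args) -> {
                    AuditOp op = method.getAnnotation(AuditOp.class);
                    String where = (args != null && args.length >= 2)
                            ? args[0] + "." + args[1] : "UNKNOWN";
                    try {
                        Object result = method.invoke(target, args);
                        if (op != null) {
                            sink.add(operator + " " + op.value() + " " + where + " SUCCESS");
                        }
                        return result;
                    } catch (InvocationTargetException e) {
                        Throwable cause = e.getCause();
                        if (op != null) {
                            sink.add(operator + " " + op.value() + " " + where
                                    + " FAILED: " + cause.getMessage());
                        }
                        throw cause; // rethrow the original exception, as the @Around advice does
                    }
                });
        return iface.cast(proxy);
    }
}
```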
6.2 Performance Monitoring
java
package com.example.paimon.catalog.monitor;
import com.example.paimon.catalog.cache.CatalogCacheManager;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;
@RestController
@RequestMapping("/metrics")
public class MetricsController {
    // Injected as a field: Spring MVC does not inject beans into handler-method parameters
    @Autowired
    private CatalogCacheManager cacheManager;
    /**
     * System health status
     * GET /metrics/health
     */
    @GetMapping("/health")
    public ResponseEntity<?> health() {
        return ResponseEntity.ok(Map.of(
                "status", "UP",
                "timestamp", System.currentTimeMillis(),
                "uptime", getUptime(),
                "version", "1.0.0"
        ));
    }
    /**
     * Performance metrics
     * GET /metrics/performance
     */
    @GetMapping("/performance")
    public ResponseEntity<?> getPerformanceMetrics() {
        Runtime runtime = Runtime.getRuntime();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return ResponseEntity.ok(Map.of(
                "memory", Map.of(
                        "total", runtime.totalMemory() / 1024 / 1024 + "MB",
                        "used", (runtime.totalMemory() - runtime.freeMemory())
                                / 1024 / 1024 + "MB",
                        "free", runtime.freeMemory() / 1024 / 1024 + "MB"
                ),
                "threads", Map.of(
                        "count", threads.getThreadCount(),
                        "peakCount", threads.getPeakThreadCount()
                ),
                "gc", Map.of(
                        "collections", getGCCollectionCount(),
                        "time", getGCTime() + "ms"
                )
        ));
    }
    /**
     * Cache statistics
     * GET /metrics/cache
     */
    @GetMapping("/cache")
    public ResponseEntity<?> getCacheMetrics() {
        return ResponseEntity.ok(cacheManager.getCacheStats());
    }
    private long getUptime() {
        return ManagementFactory.getRuntimeMXBean().getUptime(); // milliseconds
    }
    private long getGCCollectionCount() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                // getCollectionCount() returns -1 when undefined, so clamp at 0
                .mapToLong(gc -> Math.max(0, gc.getCollectionCount()))
                .sum();
    }
    private long getGCTime() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .mapToLong(gc -> Math.max(0, gc.getCollectionTime()))
                .sum();
    }
}
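The endpoints above report mostly JVM-wide numbers. For a catalog service, per-endpoint latency is often more useful. A minimal sketch, assuming a hypothetical `EndpointMetrics` class fed by a servlet filter or interceptor, keeps request counts, error counts, and average and maximum latency with lock-free `LongAdder` and `LongAccumulator` counters.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAccumulator;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-endpoint request metrics
final class EndpointMetrics {
    private static final class Stats {
        final LongAdder count = new LongAdder();
        final LongAdder errors = new LongAdder();
        final LongAdder totalMillis = new LongAdder();
        final LongAccumulator maxMillis = new LongAccumulator(Math::max, 0);
    }

    private final Map<String, Stats> byEndpoint = new ConcurrentHashMap<>();

    // Record one finished request; safe to call from many request threads at once
    void record(String endpoint, long millis, boolean failed) {
        Stats s = byEndpoint.computeIfAbsent(endpoint, k -> new Stats());
        s.count.increment();
        s.totalMillis.add(millis);
        s.maxMillis.accumulate(millis);
        if (failed) {
            s.errors.increment();
        }
    }

    // Snapshot shaped like the JSON maps the /metrics endpoints return
    Map<String, Map<String, Object>> snapshot() {
        Map<String, Map<String, Object>> out = new TreeMap<>();
        byEndpoint.forEach((endpoint, s) -> {
            long n = s.count.sum();
            out.put(endpoint, Map.of(
                    "count", n,
                    "errors", s.errors.sum(),
                    "avgMillis", n == 0 ? 0.0 : (double) s.totalMillis.sum() / n,
                    "maxMillis", s.maxMillis.get()));
        });
        return out;
    }
}
```

Spring Boot Actuator with Micrometer records comparable per-request timers (the `http.server.requests` metric), which may be preferable to hand-rolled counters once Actuator is on the classpath.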
Part 7: Production Deployment and Configuration
7.1 Application Configuration File
yaml
# application.yml
spring:
  application:
    name: paimon-rest-catalog
  datasource:
    url: jdbc:mysql://localhost:3306/paimon_catalog?characterEncoding=utf8
    # Override with environment variables in production (e.g. SPRING_DATASOURCE_PASSWORD)
    username: root
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver
  jpa:
    hibernate:
      # validate: the tables must already exist (created by DDL scripts or a migration tool)
      ddl-auto: validate
    properties:
      hibernate:
        dialect: org.hibernate.dialect.MySQL8Dialect
        format_sql: true
        show_sql: false
        jdbc:
          batch_size: 50
          fetch_size: 100
# Paimon configuration
paimon:
  warehouse: hdfs:///warehouse
  default-database: default
  catalog-type: filesystem
# Cache configuration (custom keys; CatalogCacheManager in 4.2 still hard-codes these values)
cache:
  schema:
    ttl-hours: 1
    max-size: 1000
  partition:
    ttl-minutes: 10
    max-size: 100
# JWT configuration (the secret must be long enough for the signing algorithm:
# at least 32 bytes for HS256, 64 bytes for HS512)
jwt:
  secret: ${JWT_SECRET:your-secret-key-very-long-string-for-production}
  expiration: 86400000 # 24 hours
# Logging configuration
logging:
  level:
    root: INFO
    com.example.paimon: DEBUG
  file:
    name: logs/paimon-catalog.log
  pattern:
    file: "%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n"
server:
  port: 8080
  servlet:
    context-path: /
  compression:
    enabled: true
    min-response-size: 1024
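Two mechanisms connect this file to the environment variables set in the docker-compose file: `${JWT_SECRET:...}` placeholders (use the variable if it is set, else the text after the colon), and Spring Boot's relaxed binding, under which an environment variable such as SPRING_DATASOURCE_URL overrides `spring.datasource.url` (dots become underscores, dashes are removed, letters are uppercased). The hypothetical `PlaceholderResolver` below models both in plain Java; Spring's real resolver also handles nested placeholders and many more property sources, which this sketch ignores.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of ${NAME:default} resolution and env-var naming
final class PlaceholderResolver {
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    static String resolve(String value, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = env.get(m.group(1));
            if (replacement == null) {
                replacement = m.group(2); // text after ':' is the default
            }
            if (replacement == null) {
                throw new IllegalArgumentException("Unresolved placeholder: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Relaxed binding: spring.datasource.url -> SPRING_DATASOURCE_URL
    static String envNameFor(String propertyName) {
        return propertyName.replace('.', '_').replace("-", "").toUpperCase(Locale.ROOT);
    }
}
```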
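7.2 Docker Deployment
dockerfile
# Dockerfile
# The openjdk images are deprecated; eclipse-temurin provides maintained Java 11 runtimes
FROM eclipse-temurin:11-jre
WORKDIR /app
# Copy the executable JAR produced by mvn package
COPY target/paimon-rest-catalog-1.0.0.jar app.jar
# Expose the service port
EXPOSE 8080
# Start the application
ENTRYPOINT ["java", "-Xms1g", "-Xmx2g", "-jar", "app.jar"]
yaml
# docker-compose.yml
version: '3.8'
services:
  mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: password
      MYSQL_DATABASE: paimon_catalog
    ports:
      - "3306:3306"
    volumes:
      - mysql-data:/var/lib/mysql
  paimon-catalog:
    build: .
    ports:
      - "8080:8080"
    environment:
      SPRING_DATASOURCE_URL: jdbc:mysql://mysql:3306/paimon_catalog
      SPRING_DATASOURCE_USERNAME: root
      SPRING_DATASOURCE_PASSWORD: password
      PAIMON_WAREHOUSE: hdfs:///warehouse
      # Must be long enough for the signing algorithm (>= 32 bytes for HS256, >= 64 for HS512);
      # supply a real random secret in production
      JWT_SECRET: replace-with-a-random-secret-at-least-64-bytes-long-for-hmac-signing
    # depends_on only orders container startup; it does not wait until MySQL accepts connections
    depends_on:
      - mysql
    volumes:
      - ./logs:/app/logs
volumes:
  mysql-data: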
第八部分:客户端集成
8.1 Java客户端示例
java
package com.example.paimon.catalog.client;
import okhttp3.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.AllArgsConstructor;
import lombok.Data;
public class PaimonCatalogClient {
private final String baseUrl;
private final String token;
private final OkHttpClient httpClient;
private final ObjectMapper objectMapper;
public PaimonCatalogClient(String baseUrl, String token) {
this.baseUrl = baseUrl;
this.token = token;
this.httpClient = new OkHttpClient.Builder()
.connectTimeout(10, TimeUnit.SECONDS)
.readTimeout(30, TimeUnit.SECONDS)
.build();
this.objectMapper = new ObjectMapper();
}
/**
* 创建表
*/
public void createTable(String database, String table,
TableDefinition definition) throws Exception {
String url = String.format("%s/apis/v1/%s/tables",
baseUrl, database);
String body = objectMapper.writeValueAsString(Map.of(
"name", table,
"schema", definition.getSchema(),
"partitionKeys", definition.getPartitionKeys(),
"properties", definition.getProperties()
));
Request request = new Request.Builder()
.url(url)
.addHeader("Authorization", "Bearer " + token)
.addHeader("Content-Type", "application/json")
.post(RequestBody.create(body, MediaType.get("application/json")))
.build();
try (Response response = httpClient.newCall(request).execute()) {
if (!response.isSuccessful()) {
throw new RuntimeException(
"Failed to create table: " + response.body().string());
}
}
}
/**
* Get the table schema
*/
public Map<String, Object> getTableSchema(String database,
String table) throws Exception {
String url = String.format(
"%s/apis/v1/%s/%s/schema", baseUrl, database, table);
Request request = new Request.Builder()
.url(url)
.addHeader("Authorization", "Bearer " + token)
.get()
.build();
try (Response response = httpClient.newCall(request).execute()) {
if (response.isSuccessful()) {
return objectMapper.readValue(
response.body().string(), Map.class);
} else {
throw new RuntimeException(
"Failed to get schema: " + response.code());
}
}
}
/**
* List partitions
*/
public List<String> getPartitions(String database, String table,
int limit) throws Exception {
String url = String.format(
"%s/apis/v1/%s/%s/partitions?limit=%d",
baseUrl, database, table, limit);
Request request = new Request.Builder()
.url(url)
.addHeader("Authorization", "Bearer " + token)
.get()
.build();
try (Response response = httpClient.newCall(request).execute()) {
if (response.isSuccessful()) {
Map<String, Object> result = objectMapper.readValue(
response.body().string(), Map.class);
return (List<String>) result.get("partitions");
} else {
throw new RuntimeException(
"Failed to get partitions: " + response.code());
}
}
}
}
// Table definition (package-private so it can share a source file with the public client class)
@Data
@AllArgsConstructor
class TableDefinition {
private Map<String, Object> schema;
private List<String> partitionKeys;
private Map<String, String> properties;
}
// Usage example (package-private for the same reason)
class Example {
public static void main(String[] args) throws Exception {
PaimonCatalogClient client =
new PaimonCatalogClient("http://localhost:8080",
"your-jwt-token");
TableDefinition definition = new TableDefinition(
Map.of(
"fields", List.of(
Map.of("name", "id", "type", "BIGINT"),
Map.of("name", "name", "type", "STRING")
),
"primaryKeys", List.of("id")
),
List.of("dt"),
Map.of("bucket", "16", "merge-engine", "deduplicate")
);
client.createTable("my_db", "my_table", definition);
var schema = client.getTableSchema("my_db", "my_table");
System.out.println("Schema: " + schema);
var partitions = client.getPartitions("my_db", "my_table", 100);
System.out.println("Partitions: " + partitions);
}
}
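The client above builds URLs with `String.format`, which does not escape database or table names. A name containing a space, for example, would produce a broken request. As a hedged alternative, the sketch below uses the multi-argument `java.net.URI` constructor, which percent-encodes characters that are illegal in a path. `CatalogUrls` is a hypothetical helper, and it assumes the base URL has no path of its own. A `/` inside a name is still treated as a path separator and is not escaped.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: builds the schema endpoint URL with proper path escaping.
final class CatalogUrls {
    static String schemaUrl(String baseUrl, String database, String table) {
        URI base = URI.create(baseUrl); // assumes baseUrl has no path component
        String path = "/apis/v1/" + database + "/" + table + "/schema";
        try {
            // The multi-argument constructor quotes characters that are illegal in a path
            return new URI(base.getScheme(), null, base.getHost(), base.getPort(),
                    path, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(
                    "Cannot build URL for " + database + "." + table, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(schemaUrl("http://localhost:8080", "my_db", "my table"));
        // http://localhost:8080/apis/v1/my_db/my%20table/schema
    }
}
```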
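8.2 Spark Integration
scala
import org.apache.spark.sql.SparkSession

// Spark has no CREATE CATALOG statement: catalogs are registered through
// spark.sql.catalog.<name> properties. Option names follow the Paimon REST catalog
// docs (Paimon 1.x) and require a server that speaks Paimon's REST protocol;
// verify them against the version you deploy.
val spark = SparkSession.builder()
  .appName("paimon-rest-client")
  .config("spark.sql.catalog.paimon", "org.apache.paimon.spark.SparkCatalog")
  .config("spark.sql.catalog.paimon.metastore", "rest")
  .config("spark.sql.catalog.paimon.uri", "http://localhost:8080")
  .config("spark.sql.catalog.paimon.token.provider", "bear")
  .config("spark.sql.catalog.paimon.token", "your-jwt-token")
  .getOrCreate()
// Use the remote catalog
spark.sql("USE paimon.my_db")
val df = spark.sql("SELECT * FROM my_table LIMIT 100")
df.show()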
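Part 9: FAQ and Best Practices
Q1: How should concurrent write conflicts be handled?
java
import javax.persistence.Entity;
import javax.persistence.Table;
import javax.persistence.Version;
import org.springframework.dao.OptimisticLockingFailureException;

/**
 * Handle concurrent modifications with optimistic locking
 */
@Entity
@Table(name = "paimon_tables")
public class TableMetadata {
    @Version
    private Long version; // optimistic-lock version, incremented by JPA on each update
    // ... other fields
}

// Conflicts are detected automatically: if the stored version no longer matches,
// the update fails and Spring Data throws OptimisticLockingFailureException
try {
    tableMetadataRepository.save(metadata);
} catch (OptimisticLockingFailureException e) {
    log.warn("Concurrent modification detected, retrying...");
    // Retry logic: reload the latest entity, reapply the change, and save again
}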
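The retry comment above leaves the loop unspecified. The stdlib-only sketch below models what `@Version` enforces. A write succeeds only if the row still carries the version the writer read, conceptually `UPDATE ... WHERE version = ?`. Otherwise the writer reloads and retries. `OptimisticRetry` and its in-memory row are hypothetical stand-ins for the JPA entity and the database.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical in-memory model of optimistic locking with bounded retries.
final class OptimisticRetry {
    // Immutable snapshot of a "row": the version column plus a payload
    static final class Row {
        final long version;
        final String properties;

        Row(long version, String properties) {
            this.version = version;
            this.properties = properties;
        }
    }

    private final AtomicReference<Row> row = new AtomicReference<>(new Row(0, ""));

    // Succeeds only if nobody replaced the row since `expected` was read
    private boolean compareAndSave(Row expected, String newProperties) {
        return row.compareAndSet(expected, new Row(expected.version + 1, newProperties));
    }

    // Read-modify-write loop; returns the number of attempts it took
    int update(UnaryOperator<String> change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Row current = row.get();                      // reload the latest version
            String updated = change.apply(current.properties);
            if (compareAndSave(current, updated)) {
                return attempt;
            }
            // Conflict: another writer bumped the version in between, so retry
        }
        throw new IllegalStateException("Gave up after " + maxAttempts + " attempts");
    }

    Row current() {
        return row.get();
    }
}
```

Bounding `maxAttempts` keeps a heavily contended row from retrying forever. In the Spring service, the equivalent loop would re-read the entity through the repository before each attempt.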
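Q2: How can partition queries on large tables be optimized?
java
/**
* Page through large numbers of partitions instead of returning them all at once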
*/
@GetMapping("/{table}/partitions")
public ResponseEntity<?> getPartitions(
@PathVariable String database,
@PathVariable String table,
@RequestParam(defaultValue = "0") int page,
@RequestParam(defaultValue = "100") int size) {
Pageable pageable = PageRequest.of(page, size);
Page<String> partitions = schemaService
.getPartitionsPaged(database, table, pageable);
return ResponseEntity.ok(Map.of(
"partitions", partitions.getContent(),
"totalElements", partitions.getTotalElements(),
"totalPages", partitions.getTotalPages(),
"currentPage", page
));
}
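For readers unfamiliar with Spring Data paging, the stdlib-only sketch below shows the arithmetic behind `PageRequest.of(page, size)` and the `totalPages` field returned above. It uses a zero-based page index, an offset of `page * size`, and a page count rounded up. `PageMath` is a hypothetical helper and is not how `getPartitionsPaged` is necessarily implemented.

```java
import java.util.List;

// Hypothetical helper: offset/limit paging over an in-memory list.
final class PageMath {
    // Elements on zero-based page `page` with `size` items per page
    static <T> List<T> slice(List<T> all, int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size >= 1");
        }
        long offset = (long) page * size;                    // rows to skip
        int from = (int) Math.min(offset, all.size());
        int to = (int) Math.min((long) from + size, all.size());
        return all.subList(from, to);
    }

    // ceil(totalElements / size) in integer arithmetic; 0 elements -> 0 pages
    static int totalPages(long totalElements, int size) {
        return (int) ((totalElements + size - 1) / size);
    }
}
```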
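Q3: How should secrets be managed securely?
yaml
# Option 1: read secrets from environment variables (or a secrets manager that injects them)
spring:
  datasource:
    password: ${DB_PASSWORD}   # read from an environment variable
jwt:
  secret: ${JWT_SECRET}        # read from an environment variable
---
# Option 2: Vault integration (separate YAML document; requires Spring Cloud Vault)
spring:
  config:
    import: vault://
  cloud:
    vault:
      uri: http://localhost:8200
      token: ${VAULT_TOKEN}
      kv:
        backend: secret
        backend-version: 2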
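Environment variables are often paired with a fail-fast check at startup, so a missing or weak secret stops the service immediately. The sketch below assumes the JWTs are signed with HMAC-SHA256 (HS256). RFC 7518 requires a key of at least 256 bits for that algorithm. `SecretCheck` is a hypothetical helper, not part of Spring or any JWT library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical startup check for the JWT signing secret (assumes HS256).
final class SecretCheck {
    static final int MIN_HS256_KEY_BYTES = 32; // 256 bits

    static byte[] requireJwtSecret(Map<String, String> env) {
        String secret = env.get("JWT_SECRET");
        if (secret == null || secret.isBlank()) {
            throw new IllegalStateException("JWT_SECRET is not set");
        }
        byte[] key = secret.getBytes(StandardCharsets.UTF_8);
        if (key.length < MIN_HS256_KEY_BYTES) {
            throw new IllegalStateException("JWT_SECRET must be at least 256 bits (32 bytes)");
        }
        return key;
    }

    public static void main(String[] args) {
        requireJwtSecret(System.getenv()); // throws at startup if misconfigured
        System.out.println("JWT secret looks usable");
    }
}
```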
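Part 10: Performance Optimization Recommendations
Optimization Checklist
text
Cache optimization:
- [x] Enable schema caching (TTL 1 hour)
- [x] Enable partition caching (TTL 10 minutes)
- [x] Use a CacheLoader for automatic loading
- [x] Monitor the cache hit rate
Database optimization:
- [x] Add a composite index on (database, table)
- [x] Size the database connection pool
- [x] Enable batch operations
- [x] Analyze tables periodically to keep index statistics fresh (e.g. MySQL ANALYZE TABLE)
API optimization:
- [x] Enable HTTP compression
- [x] Paginate query results
- [x] Add query timeouts
- [x] Process long-running operations asynchronously
Security hardening:
- [x] Enable HTTPS
- [x] Rotate JWT signing keys regularly
- [x] Implement rate limiting
- [x] Enable audit logging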
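The first four cache items can be illustrated together. The stdlib-only sketch below is a loading cache with a time-to-live and hit-rate tracking, which could be configured with a 1-hour TTL for schemas and a 10-minute TTL for partitions. `TtlLoadingCache` is hypothetical. A production service would more likely use Caffeine or Guava, whose `CacheLoader` the checklist refers to.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical loading cache with per-entry TTL and hit-rate statistics.
final class TtlLoadingCache<K, V> {
    // Cached value plus the clock reading after which it must be reloaded
    private static final class Entry<T> {
        final T value;
        final long expiresAtNanos;

        Entry(T value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // called on a miss or after expiry
    private final long ttlNanos;
    private final LongSupplier clock;      // e.g. System::nanoTime; injectable for tests
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    TtlLoadingCache(Duration ttl, Function<K, V> loader, LongSupplier clock) {
        this.ttlNanos = ttl.toNanos();
        this.loader = loader;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        // Subtraction keeps the comparison correct even if nanoTime values wrap
        if (entry != null && now - entry.expiresAtNanos < 0) {
            hits.incrementAndGet();
            return entry.value;
        }
        misses.incrementAndGet();
        V value = loader.apply(key); // concurrent misses may load the same key twice
        entries.put(key, new Entry<>(value, now + ttlNanos));
        return value;
    }

    // Fraction of lookups served from the cache: the metric worth monitoring
    double hitRate() {
        long h = hits.get();
        long total = h + misses.get();
        return total == 0 ? 0.0 : (double) h / total;
    }
}
```

A schema cache would be created as `new TtlLoadingCache<>(Duration.ofHours(1), loader, System::nanoTime)`, where `loader` is whatever reads the schema from the backing store.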
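Summary
This architecture supports:
- Centralized authentication and authorization
- Metadata caching and performance optimization
- Cross-cluster, cross-cloud metadata access
- Complete auditing and traceability
- Horizontal scalability
Next chapter: Chapter 20 covers performance testing and benchmarking.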