[Chapter 9 Key Points Summary 3] 9.4 Physical model, 9.5 pgRouting

9.4 Physical model

9.4.1 Storage and data structures

核心目标

针对空间网络,**找到高效的磁盘存储数据结构**,最小化 Find()、Insert()、Delete()、Create()、Get-A-Successor()、Get-Successors() 等操作的 I/O 成本;核心约束是空间网络规模远大于主存,且几何索引(如 R 树)因按邻近性聚类而非边连通性聚类,在边连通性与邻近性无关时性能较差。

关键指标

  • CRR(Connection Retrieval Ratio):边连接的节点对位于同一磁盘扇区的概率,最大化 CRR 是降低 I/O 成本的核心目标。

数据结构分类

1. 主存数据结构
数据结构 定义与特点
邻接矩阵(Adjacency Matrix) 若存在顶点 A 到顶点 B 的边,则M[A,B]=1,直接映射顶点间的连通关系
邻接表(Adjacency List) 将每个顶点映射到其后续顶点列表,高效存储稀疏网络的连通信息
2. 基于磁盘的表结构
表类型 结构特点
规范化表(Normalized Tables) 分两张表存储:一张存储顶点信息(如 id、x 坐标、y 坐标),另一张存储边信息(如 source、dest、distance)
非规范化表(Denormalized Tables) 单张表存储节点信息,包含顶点的后续节点(Successors)和前驱节点(Predecessors)字段,减少表关联操作
3. 基于图的存储方法

核心思路

通过节点分区将图分配到磁盘块,优先选择**"最小割图分区"(Min-cut Graph Partition)**而非 "几何分区",因为前者切割的边更少,能获得更高的 CRR(假设边的查询热度均匀)。

关键步骤

  1. 节点分区:将节点划分到不同磁盘扇区,最大化边连接的节点对在同一扇区的概率;
  2. 二级索引 :使用 R 树或 B 树构建二级索引,支持高效的find()操作;
  3. 实例验证:对明尼阿波利斯主要道路网络的存储测试表明,减少 "切割边"(cut-edges)的分区方案能显著提升 CRR。

存储优化要点

  • 避免依赖几何邻近性聚类,聚焦边连通性聚类;
  • 通过合理分区减少磁盘 I/O,尤其针对 "获取后续节点" 等高频操作。

9.4.2 Algorithms for connectivity query and shortest path

核心查询构建块

空间网络查询的核心构建块包括两类:

  1. 连通性查询(Connectivity (A,B)):判断节点 B 是否可从节点 A 到达;
  2. 最短路径查询(Shortest path (A,B)):找到节点 A 到节点 B 的最小成本路径。

算法分类与特点

1. 主存算法(适用于全图加载到主存的场景)
算法类型 适用场景 核心特点
连通性查询 无向 / 有向图连通性判断 广度优先搜索(BFS)、深度优先搜索(DFS),高效遍历节点验证连通关系
最短路径查询 非负边权图 迪杰斯特拉算法(Dijkstra's algorithm):贪心策略,逐步扩展最小成本路径;A* 算法:引入欧氏距离作为启发函数,减少无效遍历
2. 磁盘级算法(适用于图规模超主存的场景)

分层路由算法(Hierarchical Routing Algorithm)

核心思想

将大图分解为 "岛屿"(Islands)和 "桥梁"(Bridges,跨岛屿的边),通过**"分而治之"** 策略降低计算成本,即**SP(A,B)=SP(A,X)+edge(X,Y)+SP(Y,B)**(X、Y 为桥梁连接的边界节点)。

优势与性能对比
算法 节点扩展数量 核心优势 适用场景
迪杰斯特拉算法 18(示例) 全图遍历,结果精确 主存中的小规模图
A * 算法(欧氏距离启发) 14(示例) 启发函数优化,减少无效遍历 主存中的空间网络(含位置信息)
分层路由算法 11(示例) 分块计算,降低磁盘 I/O 和计算量 磁盘级大规模空间网络
多岛屿与多桥梁挑战
  • 挑战 1:多岛屿会增加计算步骤,需逐步拆解路径;
  • 挑战 2:多桥梁需选择最优边界节点对,即最小化SPC(A,本地边界节点)+SPC(本地边界节点,目标边界节点)+SPC(目标边界节点,B)(SPC 为最短路径成本);
  • 解决方案:预计算边界节点对的最短路径成本,构建边界图(Port graph),优化子问题求解效率。

关键数据结构

  • 边界节点(Port node/Boundary node):连接多个岛屿的节点;
  • 岛屿图(Island graphs/Fragment graphs):分区后的子图;
  • 边界图(Boundary graph):汇总边界节点和跨岛屿路径的简化图。

9.5 pgRouting

核心定位

pgRouting 是 PostgreSQL 的地理空间路由扩展,基于数据库实现空间网络路由功能,支持动态数据更新、多客户端访问和灵活的成本计算。

核心优势

  1. 数据与路由解耦:数据可通过 JDBC、ODBC 或 PL/pgSQL 由多客户端修改,路由引擎实时响应数据变化;
  2. 成本参数灵活:成本值可通过 SQL 动态计算,支持多字段 / 多表联合定义成本;
  3. 兼容 PostGIS:与 PostGIS 扩展无缝集成,支持空间几何数据的存储与处理。

核心操作步骤

1. 环境搭建

  1. 创建数据库:createdb netdatabase
  2. 启用扩展:
    • 启用 PostGIS:psql netdatabase -c "create extension postgis"
    • 启用 pgRouting:psql netdatabase -c "create extension pgrouting"

2. 网络拓扑构建

核心函数:pgr_createTopology
  • 功能:基于边表的几何信息构建网络拓扑,生成顶点表(命名格式:edge_table_vertices_pgr);
  • 参数说明:
    • edge_table:网络边表名(需包含几何字段、主键字段、源节点字段、目标节点字段);
    • tolerance:节点融合阈值(投影单位),距离小于该值的节点将合并;
    • the_geom:几何字段名(默认值:the_geom);
    • id:边表主键字段名(默认值:id);
    • source/target:源节点 / 目标节点字段名(默认值:source/target);
  • 输出结果:
    • 边表的source/target字段更新为顶点表的主键 ID;
    • 生成顶点表,包含顶点 ID、几何坐标、入边数(ein)、出边数(eout)等信息。
拓扑优化函数
函数 功能
pgr_analyzeGraph 分析网络拓扑完整性,检测孤立边、死端、间隙、交点等错误,更新顶点表的cnt(引用计数)和chk(错误标记)字段
pgr_nodeNetwork 按交点分割边表,生成新的分段边表(默认后缀noded),解决边相交导致的拓扑错误
pgr_analyzeOneway 分析单向街道网络,识别方向错误的线段

3. 最短路径查询

核心函数:pgr_dijkstra
  • 功能:基于迪杰斯特拉算法求解最短路径;

  • 语法:

    pgr_dijkstra(
      text edges_sql,  -- 边表查询(需包含id、source、target、cost[、reverse_cost])
      bigint start_vid,-- 起始顶点ID
      bigint end_vid,  -- 目标顶点ID
      boolean directed:=true  -- 是否为有向图(默认true)
    )
  • 输出列(对应上述函数签名;pgRouting 2.0 的旧签名返回 pgr_costResult[],其字段为 seq、id1(节点)、id2(边)、cost):

    字段名 含义
    seq 路径顺序编号
    path_seq 路径内的相对顺序编号
    node 节点 ID
    edge 边 ID
    cost 边的成本
    agg_cost 从起始节点到当前节点的累计成本
其他路由算法

pgRouting 支持多种扩展算法,包括:

  • 双向迪杰斯特拉算法(Bi-directional Dijkstra);
  • A* 算法(Shortest Path A*);
  • 多源最短路径算法(K-Dijkstra);
  • 旅行商问题算法(Traveling Sales Person);
  • 带转向限制的最短路径算法(TRSP)等。

典型应用示例

步骤 1:创建并初始化边表

-- 创建道路表
create table road (
  id serial primary key,
  name text,
  geom geometry(LineString, 4326)
);
-- 插入道路数据
insert into road(name, geom) values
('A', ST_GeomFromText('LineString(0 20, 10 20)', 4326)),
('B', ST_GeomFromText('LineString(10 20, 10 2)', 4326)),
('C', ST_GeomFromText('LineString(10 2, 20 2)', 4326)),
('D', ST_GeomFromText('LineString(0 5, 20 5)', 4326)),
('E', ST_GeomFromText('LineString(20 4.9999, 20 2.0001)', 4326));

-- 创建网络表并添加字段
create table road_network (
  id serial primary key,
  name text,
  source int,
  target int,
  geom geometry(LineString, 4326),
  len float
);
-- 导入数据并构建拓扑
insert into road_network(name, geom) select name, geom from road;
select pgr_createTopology('road_network', 0.00001, 'geom', 'id', 'source', 'target');
select pgr_analyzeGraph('road_network', 0.00001, 'geom', 'id', 'source', 'target');

-- 计算边长度(作为成本)
update road_network set len = ST_LENGTH(geom);

步骤 2:查询最短路径

-- 查询节点5到节点3的最短路径(无向图)
Select seq, node, edge, cost, agg_cost
From pgr_dijkstra(
  'Select id, source, target, len as cost From road_network',
  5, 3, false
);

扩展场景

  1. 动态成本调整 :通过 SQL 条件过滤修改成本(如where id <> 101排除修路道路,where len < 10限制道路长度);
  2. 生态路由(Eco-Routing):支持负成本边(如下坡路段),需避免 "负环" 场景;
  3. 时空网络:支持时间依赖网络、流量网络等动态场景,结合 GPS 历史数据和路径偏好优化。

9.4 Physical Model

9.4.1 Storage and Data Structures

Core Objective

For spatial networks, identify efficient disk-based data structures to minimize the I/O cost of operations such as Find(), Insert(), Delete(), Create(), Get-A-Successor(), and Get-Successors(). The key constraint is that spatial networks are far larger than main memory, and geometric indices (e.g., R-trees) cluster objects by proximity rather than edge connectivity, leading to poor performance when edge connectivity is uncorrelated with proximity.

Key Metric

  • CRR (Connection Retrieval Ratio): The probability that node pairs connected by an edge reside in the same disk sector. Maximizing CRR is critical for reducing I/O costs.

Classification of Data Structures

1. Main Memory Data Structures
| Data Structure | Definition and Characteristics |
| --- | --- |
| Adjacency Matrix | M[A, B] = 1 if there is an edge from vertex A to vertex B; directly maps the connectivity between vertices |
| Adjacency List | Maps each vertex to a list of its successor vertices; stores connectivity information efficiently for sparse networks |
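
To make the two main-memory structures concrete, here is a minimal Python sketch. The four-node network, its node names, and the helper functions are made up for illustration and do not come from the chapter.

```python
# Hypothetical 4-node directed network (names and edges are illustrative).
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

index = {name: i for i, name in enumerate(nodes)}

# Adjacency matrix: matrix[i][j] == 1 iff there is an edge from node i to node j.
matrix = [[0] * len(nodes) for _ in nodes]
for src, dst in edges:
    matrix[index[src]][index[dst]] = 1

# Adjacency list: each node maps to the list of its successors.
adj_list = {name: [] for name in nodes}
for src, dst in edges:
    adj_list[src].append(dst)


def get_successors_matrix(node):
    """Scan one matrix row; the work grows with the number of vertices."""
    row = matrix[index[node]]
    return [nodes[j] for j, flag in enumerate(row) if flag]


def get_successors_list(node):
    """Direct lookup; the work grows with the number of successors."""
    return adj_list[node]


print(get_successors_matrix("A"))
print(get_successors_list("A"))
```

Both calls return the same successors; the matrix needs |V|² cells no matter how few edges exist, which is why the list form suits sparse road networks.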
2. Disk-Based Table Structures
| Table Type | Structural Characteristics |
| --- | --- |
| Normalized Tables | Two separate tables: one for vertex information (e.g., id, x-coordinate, y-coordinate) and one for edge information (e.g., source, dest, distance) |
| Denormalized Tables | A single node table whose records also carry the vertex's successors and predecessors, reducing table join operations |
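
The difference between the two layouts can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample rows below are assumptions chosen to mirror the fields listed above; the denormalized record is modeled as a plain dictionary rather than a real table.

```python
import sqlite3

# Normalized layout: separate node and edge tables (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, x REAL, y REAL);
    CREATE TABLE edge (source INTEGER, dest INTEGER, distance REAL);
    INSERT INTO node VALUES (1, 0, 0), (2, 1, 0), (3, 1, 1);
    INSERT INTO edge VALUES (1, 2, 1.0), (1, 3, 1.4), (2, 3, 1.0);
""")


def get_successors_normalized(node_id):
    """Get-Successors() needs a join between the edge and node tables."""
    rows = conn.execute(
        "SELECT n.id, e.distance "
        "FROM edge AS e JOIN node AS n ON n.id = e.dest "
        "WHERE e.source = ? ORDER BY n.id",
        (node_id,),
    )
    return rows.fetchall()


# Denormalized layout: one record per node that already lists its
# successors and predecessors, so no join is needed.
denormalized = {
    1: {"x": 0, "y": 0, "successors": [(2, 1.0), (3, 1.4)], "predecessors": []},
    2: {"x": 1, "y": 0, "successors": [(3, 1.0)], "predecessors": [1]},
    3: {"x": 1, "y": 1, "successors": [], "predecessors": [1, 2]},
}


def get_successors_denormalized(node_id):
    """Get-Successors() is a single record read."""
    return denormalized[node_id]["successors"]


print(get_successors_normalized(1))
print(get_successors_denormalized(1))
```

The trade-off is the usual one: the denormalized record answers successor queries with one read, but every edge update must touch the records of both endpoints.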
3. Graph-Based Storage Methods
Core Idea

Partition the graph into disk blocks by node division. Prioritize "Min-cut Graph Partition" over "Geometric Partition" because the former cuts fewer edges and achieves a higher CRR (assuming uniform query popularity across edges).

Key Steps

  1. Node Partitioning: Divide nodes into different disk sectors to maximize the probability that node pairs connected by an edge are in the same sector.
  2. Secondary Indexing: Build secondary indices using R-trees or B-trees to support efficient find() operations.
  3. Empirical Validation: Storage tests on the major road network of Minneapolis show that partitioning schemes with fewer "cut-edges" significantly improve CRR.
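
The effect of partitioning on CRR can be illustrated with a small Python sketch. The six-node network (two parallel roads), the sector capacity of three nodes, and the hand-picked connectivity-based assignment are all assumptions; the hand-picked assignment merely stands in for what a min-cut partitioner might return and is not an implementation of one.

```python
# Hypothetical network: two parallel roads ("a" and "b") that run close together.
coords = {
    "a1": (0, 1), "a2": (10, 1), "a3": (20, 1),
    "b1": (0, 0), "b2": (10, 0), "b3": (20, 0),
}
edges = [("a1", "a2"), ("a2", "a3"), ("b1", "b2"), ("b2", "b3")]
SECTOR_CAPACITY = 3  # assumed number of node records per disk sector


def crr(edges, sector_of):
    """Fraction of edges whose two endpoints are stored in the same sector
    (every edge is assumed to be accessed equally often)."""
    same_sector = sum(1 for u, v in edges if sector_of[u] == sector_of[v])
    return same_sector / len(edges)


def geometric_partition(coords, capacity):
    """Fill sectors in coordinate order, ignoring connectivity."""
    ordered = sorted(coords, key=lambda node: coords[node])
    return {node: i // capacity for i, node in enumerate(ordered)}


# Hand-picked stand-in for a min-cut partition: one road per sector.
connectivity_sectors = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "b3": 1}

geometric_sectors = geometric_partition(coords, SECTOR_CAPACITY)
print("geometric CRR:   ", crr(edges, geometric_sectors))
print("connectivity CRR:", crr(edges, connectivity_sectors))
```

On this toy input the coordinate-ordered partition cuts two of the four edges (CRR 0.5), while the connectivity-based assignment cuts none (CRR 1.0).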

Storage Optimization Points

  • Avoid clustering based on geometric proximity; focus on clustering by edge connectivity.
  • Reduce disk I/O through reasonable partitioning, especially for high-frequency operations like "retrieving successor nodes".

9.4.2 Algorithms for Connectivity Query and Shortest Path

Core Query Building Blocks

The core building blocks of spatial network queries include two types:

  1. Connectivity Query (Connectivity(A, B)): Determine whether node B is reachable from node A.
  2. Shortest Path Query (Shortest Path(A, B)): Find the minimum-cost path from node A to node B.
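
As a rough illustration of the connectivity building block, the following Python sketch answers Connectivity(A, B) with a breadth-first search over an adjacency list. The function name and the sample graph are hypothetical.

```python
from collections import deque


def connectivity(graph, a, b):
    """Return True if node b is reachable from node a (breadth-first search)."""
    seen = {a}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for successor in graph.get(node, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return False


# Hypothetical directed graph stored as an adjacency list.
graph = {"A": ["B"], "B": ["C"], "C": [], "D": ["A"]}

print(connectivity(graph, "A", "C"))
print(connectivity(graph, "C", "A"))
```

Because the sample graph is directed, C is reachable from A but A is not reachable from C.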

Algorithm Classification and Characteristics

1. Main Memory Algorithms (Suitable for Scenarios Where the Entire Graph Is Loaded into Main Memory)
| Algorithm Type | Application Scenario | Core Characteristics |
| --- | --- | --- |
| Connectivity Query | Connectivity checks on undirected or directed graphs | Breadth-First Search (BFS) and Depth-First Search (DFS) traverse nodes to verify connectivity |
| Shortest Path Query | Graphs with non-negative edge weights | Dijkstra's Algorithm: uses a greedy strategy to gradually expand the minimum-cost path. A* Algorithm: incorporates Euclidean distance as a heuristic to reduce unnecessary traversal |
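
The relationship between the two shortest-path algorithms can be sketched in one Python function: with the heuristic switched off it behaves as Dijkstra's algorithm, and with the Euclidean heuristic switched on it behaves as A*. The 5×5 grid network, unit edge costs, and all names are assumptions made for this illustration.

```python
import heapq
import math


def shortest_path(graph, coords, start, goal, use_heuristic):
    """Return (path cost, number of expanded nodes).

    use_heuristic=False gives Dijkstra; True gives A* with Euclidean distance.
    """
    def h(node):
        if not use_heuristic:
            return 0.0
        (x1, y1), (x2, y2) = coords[node], coords[goal]
        return math.hypot(x1 - x2, y1 - y2)

    best = {start: 0.0}
    closed = set()
    frontier = [(h(start), start)]
    expanded = 0
    while frontier:
        _, node = heapq.heappop(frontier)
        if node in closed:
            continue  # stale queue entry
        closed.add(node)
        expanded += 1
        if node == goal:
            return best[node], expanded
        for successor, cost in graph[node]:
            new_cost = best[node] + cost
            if new_cost < best.get(successor, math.inf):
                best[successor] = new_cost
                heapq.heappush(frontier, (new_cost + h(successor), successor))
    return math.inf, expanded


# Hypothetical 5x5 grid network with unit-length edges.
SIZE = 5
coords = {(x, y): (x, y) for x in range(SIZE) for y in range(SIZE)}
graph = {node: [] for node in coords}
for (x, y) in coords:
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if (x + dx, y + dy) in coords:
            graph[(x, y)].append(((x + dx, y + dy), 1.0))

dijkstra_cost, dijkstra_expanded = shortest_path(graph, coords, (0, 2), (4, 2), False)
astar_cost, astar_expanded = shortest_path(graph, coords, (0, 2), (4, 2), True)
print("Dijkstra:", dijkstra_cost, dijkstra_expanded)
print("A*:      ", astar_cost, astar_expanded)
```

Both runs return the same path cost; the Euclidean heuristic never overestimates the remaining cost on this grid, so A* keeps the exact answer while expanding no more nodes than Dijkstra.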
2. Disk-Level Algorithms (Suitable for Scenarios Where Graph Size Exceeds Main Memory)

Hierarchical Routing Algorithm

Core Idea

Decompose large graphs into "Islands" and "Bridges" (edges crossing islands). Use a "divide-and-conquer" strategy to reduce computational costs, i.e., SP(A, B) = SP(A, X) + edge(X, Y) + SP(Y, B) (where X and Y are boundary nodes connected by a bridge).

Advantages and Performance Comparison
| Algorithm | Number of Nodes Expanded | Core Advantages | Application Scenario |
| --- | --- | --- | --- |
| Dijkstra's Algorithm | 18 (example) | Full-graph traversal with accurate results | Small-scale graphs in main memory |
| A* Algorithm (Euclidean Heuristic) | 14 (example) | Heuristic optimization reduces unnecessary traversal | Spatial networks in main memory (with location data) |
| Hierarchical Routing Algorithm | 11 (example) | Block-based computation reduces disk I/O and computational load | Large-scale spatial networks on disk |
Challenges with Multiple Islands and Bridges
  • Challenge 1: Multiple islands increase computation steps, requiring gradual path decomposition.
  • Challenge 2: Multiple bridges per island require selecting the optimal boundary node pair, i.e., minimizing SPC(A, local boundary node) + SPC(local boundary node, target boundary node) + SPC(target boundary node, B) (SPC = Shortest Path Cost).
  • Solution: Precompute shortest path costs for boundary node pairs, construct a boundary graph, and optimize the efficiency of solving subproblems.
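
The boundary-pair selection described above can be sketched in Python for the simplest case: two islands joined by two bridges. The island graphs, bridge costs, and function names are hypothetical, and the sketch assumes the best route crosses between the islands exactly once, so the boundary-to-boundary term is just the bridge cost.

```python
import heapq
import math


def dijkstra(graph, source):
    """Shortest path costs from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for successor, cost in graph.get(node, {}).items():
            if d + cost < dist.get(successor, math.inf):
                dist[successor] = d + cost
                heapq.heappush(heap, (d + cost, successor))
    return dist


# Hypothetical undirected islands (fragment graphs) and bridges between them.
island_1 = {"A": {"X1": 2, "X2": 5}, "X1": {"A": 2, "X2": 1}, "X2": {"A": 5, "X1": 1}}
island_2 = {"Y1": {"Y2": 1, "B": 6}, "Y2": {"Y1": 1, "B": 2}, "B": {"Y1": 6, "Y2": 2}}
bridges = {("X1", "Y1"): 4, ("X2", "Y2"): 1}


def hierarchical_cost(a, b):
    """Minimize SPC(a, X) + edge(X, Y) + SPC(Y, b) over all bridges (X, Y)."""
    from_a = dijkstra(island_1, a)  # searches island 1 only
    best_cost, best_bridge = math.inf, None
    for (x, y), bridge_cost in bridges.items():
        total = from_a[x] + bridge_cost + dijkstra(island_2, y)[b]
        if total < best_cost:
            best_cost, best_bridge = total, (x, y)
    return best_cost, best_bridge


# Reference answer: plain Dijkstra on the merged (flat) graph.
flat = {node: dict(nbrs) for g in (island_1, island_2) for node, nbrs in g.items()}
for (x, y), bridge_cost in bridges.items():
    flat[x][y] = bridge_cost
    flat[y][x] = bridge_cost

print(hierarchical_cost("A", "B"))
print(dijkstra(flat, "A")["B"])
```

Each sub-search touches only one island, which is the source of the I/O savings; in a larger system the per-island costs to boundary nodes would be precomputed and stored in the boundary graph rather than recomputed per query.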

Key Data Structures

  • Port Node/Boundary Node: Nodes connecting multiple islands.
  • Island Graphs/Fragment Graphs: Subgraphs obtained through node partitioning.
  • Boundary Graph: A simplified graph summarizing boundary nodes and inter-island paths.

9.5 pgRouting

Core Positioning

pgRouting is a geospatial routing extension for PostgreSQL. It performs spatial network routing inside the database and supports dynamic data updates, multi-client access, and flexible cost calculation.

Core Advantages

  1. Decoupling of Data and Routing: Data can be modified by multiple clients via JDBC, ODBC, or PL/pgSQL, and the routing engine responds to data changes in real time.
  2. Flexible Cost Parameters: Cost values can be dynamically calculated via SQL, supporting cost definition using multiple fields or tables.
  3. PostGIS Compatibility: Seamless integration with the PostGIS extension, enabling storage and processing of spatial geometric data.

Core Operational Steps

1. Environment Setup

  1. Create a database: createdb netdatabase;
  2. Enable extensions:
    • Enable PostGIS: psql netdatabase -c "create extension postgis";
    • Enable pgRouting: psql netdatabase -c "create extension pgrouting".

2. Network Topology Construction

Core Function: pgr_createTopology
  • Functionality: Constructs network topology based on the geometric information of an edge table and generates a vertices table (named in the format: <edge_table>_vertices_pgr);
  • Parameter Description:
    • edge_table: Name of the network edge table (must include geometric field, primary key field, source node field, and target node field);
    • tolerance: Node snapping threshold (in projection units); nodes within this distance are merged;
    • the_geom: Name of the geometric field (default: the_geom);
    • id: Name of the primary key field of the edge table (default: id);
    • source/target: Names of the source node/target node fields (default: source/target);
  • Output Results:
    • The source/target fields of the edge table are updated to the primary key IDs of the vertices table;
    • A vertices table is generated, containing information such as vertex ID, geometric coordinates, number of incoming edges (ein), and number of outgoing edges (eout).
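
The role of the tolerance parameter can be illustrated with a simplified Python sketch of topology construction. This is not pgRouting's implementation: the function name, the linear scan for nearby vertices, and the order in which vertex IDs are assigned are all assumptions. The five segments reuse the coordinates of roads A to E from the road example in this chapter summary.

```python
import math


def create_topology(edges, tolerance):
    """Assign source/target vertex IDs to each edge, merging endpoints that
    lie within `tolerance` of an already known vertex."""
    vertices = []  # vertex ID = position in this list + 1

    def vertex_id(point):
        for i, existing in enumerate(vertices, start=1):
            if math.hypot(point[0] - existing[0], point[1] - existing[1]) <= tolerance:
                return i
        vertices.append(point)
        return len(vertices)

    topology = {}
    for edge_id, line in edges.items():
        # Only the first and last point of a line become network vertices.
        topology[edge_id] = (vertex_id(line[0]), vertex_id(line[-1]))
    return topology, vertices


roads = {
    "A": [(0, 20), (10, 20)],
    "B": [(10, 20), (10, 2)],
    "C": [(10, 2), (20, 2)],
    "D": [(0, 5), (20, 5)],
    "E": [(20, 4.9999), (20, 2.0001)],
}

strict_topology, strict_vertices = create_topology(roads, 0.00001)
loose_topology, loose_vertices = create_topology(roads, 0.001)
print(len(strict_vertices), strict_topology["E"])
print(len(loose_vertices), loose_topology["E"])
```

With the small tolerance, road E stays disconnected because its endpoints are about 0.0001 units away from the ends of roads C and D; with the larger tolerance they snap onto those vertices. Roads B and D cross at (10, 5) without sharing an endpoint, so neither setting connects them, which is the kind of case an intersection-splitting step addresses.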
Topology Optimization Functions
| Function | Functionality |
| --- | --- |
| pgr_analyzeGraph | Analyzes the integrity of the network topology, detects errors such as isolated edges, dead ends, gaps, and intersections, and updates the cnt (reference count) and chk (error flag) fields of the vertices table |
| pgr_nodeNetwork | Splits the edge table at intersection points to generate a new segmented edge table (default suffix: noded), resolving topology errors caused by edge intersections |
| pgr_analyzeOneway | Analyzes one-way street networks and identifies segments with incorrect directions |

3. Shortest Path Query

Core Function: pgr_dijkstra
  • Functionality: Solves the shortest path using Dijkstra's algorithm;

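  • Syntax:

    pgr_dijkstra(
      text edges_sql,          -- edge query; must return id, source, target, cost [, reverse_cost]
      bigint start_vid,        -- start vertex ID
      bigint end_vid,          -- end vertex ID
      boolean directed := true -- treat the graph as directed (default: true)
    )

  • Output columns (for the signature above; the older pgRouting 2.0 signature returned pgr_costResult[] with the fields seq, id1 = node, id2 = edge, cost):

    | Column | Meaning |
    | --- | --- |
    | seq | Path sequence number |
    | path_seq | Relative position within the path |
    | node | Node ID |
    | edge | Edge ID |
    | cost | Edge cost |
    | agg_cost | Cumulative cost from the start node to the current node |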
Other Routing Algorithms

pgRouting supports a variety of extended algorithms, including:

  • Bi-directional Dijkstra Algorithm;
  • A* Algorithm (Shortest Path A*);
  • K-Dijkstra Algorithm (One-to-Many Shortest Path);
  • Traveling Salesman Problem Algorithm;
  • Turn Restriction Shortest Path Algorithm (TRSP), etc.

Typical Application Example

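Step 1: Create and Initialize the Edge Table

    -- Create the road table
    create table road (
      id serial primary key,
      name text,
      geom geometry(LineString, 4326)
    );
    -- Insert the road data
    insert into road(name, geom) values
    ('A', ST_GeomFromText('LineString(0 20, 10 20)', 4326)),
    ('B', ST_GeomFromText('LineString(10 20, 10 2)', 4326)),
    ('C', ST_GeomFromText('LineString(10 2, 20 2)', 4326)),
    ('D', ST_GeomFromText('LineString(0 5, 20 5)', 4326)),
    ('E', ST_GeomFromText('LineString(20 4.9999, 20 2.0001)', 4326));

    -- Create the network table with the topology fields
    create table road_network (
      id serial primary key,
      name text,
      source int,
      target int,
      geom geometry(LineString, 4326),
      len float
    );
    -- Import the data and build the topology
    insert into road_network(name, geom) select name, geom from road;
    select pgr_createTopology('road_network', 0.00001, 'geom', 'id', 'source', 'target');
    select pgr_analyzeGraph('road_network', 0.00001, 'geom', 'id', 'source', 'target');

    -- Compute the edge length (used as the cost)
    update road_network set len = ST_LENGTH(geom);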

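Step 2: Query the Shortest Path

    -- Shortest path from node 5 to node 3 (undirected graph)
    Select seq, node, edge, cost, agg_cost
    From pgr_dijkstra(
      'Select id, source, target, len as cost From road_network',
      5, 3, false
    );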

Extended Scenarios

  1. Dynamic Cost Adjustment: Modify costs via SQL condition filtering (e.g., where id <> 101 to exclude roads under construction, where len < 10 to limit road length);
  2. Eco-Routing: Supports edges with negative costs (e.g., downhill sections), but "negative cycles" must be avoided;
  3. Spatio-Temporal Networks: Supports dynamic scenarios such as time-dependent networks and flow networks, combining GPS historical data and path preferences for optimization.