设想如下数据库场景:两张表通过外键关联,外键字段为 NOT NULL,且约束有效并被严格执行。在执行两表 JOIN 查询时,却返回空结果。该场景看似不可能出现,但实际生产环境中出现了一个未被预期的边界情况。
问题场景
假设以车辆所有权管理场景为例,包含两张表,分别是车主表与车辆表:
sql
db=# CREATE TABLE owner (
id int PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE
db=# CREATE TABLE car (
id int PRIMARY KEY,
owner_id int NOT NULL,
CONSTRAINT car_owner_id_fk FOREIGN KEY (owner_id) REFERENCES owner(id)
);
CREATE TABLE
初始化数据:
sql
db=# INSERT INTO owner (id, name) VALUES
(1, 'haki'),
(2, 'jerry'),
(3, 'george')
RETURNS *;
id │ name
────┼───────
1 │ haki
2 │ jerry
3 | george
(3 rows)
INSERT 0 2
db=# INSERT INTO car (id, owner_id) VALUES(1, 1) RETURNS *;
id │ owner_id
────┼──────────
1 │ 1
(1 row)
INSERT 0 1
当前状态:
- 存在三个 owner:haki、jerry、george
- car(id=1) 的 owner 为 haki
所有权变更流程
标准流程如下:
- 查询 car 及其 owner,并对 car 行加锁;
- 校验当前 owner;
- 更新 owner_id 并提交。
示例:
ini
db=# BEGIN;
BEGIN
db=*# SELECT *
FROM car JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1
FOR NO KEY UPDATE OF car;
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 1
id │ 1
name │ haki
加锁可避免并发修改,校验完成后执行更新:
ini
db=*# UPDATE car SET owner_id = 2 WHERE id = 1 RETURNING *;
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 2
db=*# COMMIT;
COMMIT
并发场景下的权属变更
当前车辆所有权变更流程如下:
- 查询车辆及其当前所有者,并对相关记录加锁。
- 对所有者信息进行必要校验。
- 更新车辆所有权并提交事务。
在实际的车管系统中,此类请求通常会被高并发处理。下面基于相同流程,分析两个并发会话同时尝试更新同一车辆所有权时的执行情况:
ini
-- First session
db=# BEGIN;
BEGIN
db=*# SELECT * FROM car
JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car;
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 2
id │ 2
name │ jerry
db=*# UPDATE car SET owner_id = 3
WHERE id = 1 RETURNING *;
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 3
db=*# COMMIT;
COMMIT
sql
-- Second session
db=# BEGIN;
BEGIN;
db=*# SELECT * FROM car
JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car;
-- Session is blocked
--
--
--
--
--
--
--
--
--
--
--
--
-- Lock is released by first session
(0 rows)
第一会话加锁并更新车辆车主,第二会话阻塞等待锁释放,锁释放后查询无结果。车辆与车主数据均存在,内关联查询却无返回值。
替换为左外关联查询可观察到异常数据:
ini
db=# BEGIN;
BEGIN
db=*# SELECT * FROM car
LEFT JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car;
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 2
id │ 2
name │ jerry
db=*# UPDATE car SET owner_id = 3
WHERE id = 1 RETURNS *;
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 3
db=*# COMMIT;
COMMIT
sql
db=# BEGIN;
BEGIN
db=*# SELECT * FROM car
LEFT JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car;
-- Session is blocked
--
--
--
--
--
--
--
--
--
--
--
--
-- Lock is released by first session
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 3
id │ ¤
name │ ¤
查询返回车辆最新数据,但车主表字段为空。车辆表存在非空约束与外键约束,理论上内关联查询不会出现无结果场景。
并发问题根源
在 Read Committed 隔离级别下,查询只能基于"语句开始时已提交的数据",但在遇到行锁时,会等待并在锁释放后基于最新行版本重新评估条件。
lua
─[ RECORD 1 ]─┬────────
id │ 1 -- car.id
owner_id │ 3 -- car.owner_id
id │ ¤ -- owner.id
name │ ¤ -- owner.name
本例中,第一个会话将 car.owner 从 1 更新为 3 并提交;第二个会话原本读取到 car=1、owner=1,但因锁等待,锁释放后重新校验时,发现 owner 已变为 3,而原匹配条件(owner=1)失效,因此不会重新关联 owner=3,而是直接丢弃该结果(或返回 NULL)。
关键在于:查询并非整体重跑,而是执行过程中被中断并局部重评估(EvalPlanQual),从而导致 car 表为最新值,而 join 表数据缺失。
结论:在 Read Committed 下,涉及 JOIN 且伴随外键更新的并发场景,可能出现"部分结果"或"不一致结果"。
解决方案
在明确问题根因后,可以对不同方案进行对比分析:
锁定关联行(无效方案)
同时锁定 car 与 owner 无法保证正确性。第二个会话获得锁时,car.owner_id 已变化,JOIN 条件失效,本质是锁定了"旧 owner"。
ini
db=# BEGIN;
BEGIN;
db=*# SELECT * FROM car
JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car, owner;
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 2
id │ 2
name │ jerry
db=*# UPDATE car SET owner_id = 3
WHERE id = 1 RETURNING *;
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 3
db=*# COMMIT;
COMMIT
sql
db=# BEGIN;
BEGIN;
db=*# SELECT * FROM car
JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car, owner;
-- Session is locked
--
--
--
--
--
--
--
--
--
--
--
--
-- Lock is released by first session
(0 rows)
sql
db=# BEGIN;
BEGIN
db=*# SELECT * FROM car
LEFT JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car, owner;
ERROR: FOR NO KEY UPDATE cannot be applied to the nullable side of an outer join
提高隔离级别(可行但有代价)
使用 Repeatable Read / Serializable 可避免不一致,但会引入序列化冲突错误,影响并发能力。
ini
db=# BEGIN ISOLATION LEVEL REPEATABLE READ;
BEGIN
db=*# SELECT * FROM car
JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car;
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 2
id │ 2
name │ jerry
db=*# UPDATE car SET owner_id = 3
WHERE id = 1 RETURNING *;
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 3
db=*# COMMIT;
COMMIT
sql
db=# BEGIN ISOLATION LEVEL REPEATABLE READ;
BEGIN
db=*# SELECT * FROM car
JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car;
-- Session is blocked
--
--
--
--
--
--
--
--
--
--
--
--
-- Lock is released by first session
ERROR: could not serialize access due to concurrent update
拆分查询(简单可靠)
先锁 car,再查 owner,保证读取最新 owner_id;逻辑简单,但需要两次数据库往返。
ini
db=# BEGIN;
BEGIN
db=*# SELECT * FROM car WHERE car.id = 1
FOR NO KEY UPDATE;
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 2
db=*# SELECT * FROM owner WHERE id = 2;
─[ RECORD 1 ]─┬────────
id │ 2
name │ jerry
db=*# UPDATE car SET owner_id = 3 WHERE id = 1
RETURNING *;
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 3
db=*# COMMIT;
COMMIT
markdown
db=# BEGIN;
BEGIN
db=*# SELECT * FROM car WHERE car.id = 1
FOR NO KEY UPDATE;
-- Session is blocked
--
--
--
--
--
--
--
--
--
--
--
--
--
--
--
-- Lock is released by first session
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 3
db=*# SELECT * FROM owner WHERE id = 3;
─[ RECORD 1 ]─┬────────
id │ 3
name │ george
子查询 / CTE(推荐方案)
核心思路:将加锁前移到 JOIN 之前,避免读取过期关联关系,同时避免额外往返。
执行计划显示:原查询在 JOIN 后才通过 LockRows 加锁。
ini
db=# BEGIN;
BEGIN
db=*# EXPLAIN SELECT * FROM car
JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car
QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────
LockRows (cost=0.31..16.37 rows=1 width=56)
-> Nested Loop (cost=0.31..16.36 rows=1 width=56)
-> Index Scan using car_pkey on car (cost=0.15..8.17 rows=1 width=14)
Index Cond: (id = 1)
-> Index Scan using owner_pkey on owner (cost=0.15..8.17 rows=1 width=42)
Index Cond: (id = car.owner_id)
通过 CTE,将锁提前到数据源阶段:
ini
db=# BEGIN;
BEGIN;
db=*# EXPLAIN
WITH c AS (
SELECT * FROM car WHERE id = 1
FOR NO KEY UPDATE
)
SELECT * FROM c
JOIN owner ON c.owner_id = owner.id;
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────
Nested Loop (cost=8.33..16.39 rows=1 width=44)
CTE c
-> LockRows (cost=0.15..8.18 rows=1 width=14)
-> Index Scan using car_pkey on car (cost=0.15..8.17 rows=1 width=14)
Index Cond: (id = 1)
-> CTE Scan on c (cost=0.00..0.02 rows=1 width=8)
-> Index Scan using owner_pkey on owner (cost=0.15..8.17 rows=1 width=36)
Index Cond: (id = c.owner_id)
效果:先锁 car,再 JOIN,读取到最新 owner。
ini
db=# BEGIN;
BEGIN
db=*# WITH c AS (
SELECT * FROM car WHERE id = 1
FOR NO KEY UPDATE
)
SELECT * FROM c
JOIN owner ON c.owner_id = owner.id;
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 1
id │ 1
name │ haki
db=*# UPDATE car SET owner_id = 2 WHERE id = 1;
UPDATE 1
db=*# COMMIT;
COMMIT
sql
db=# BEGIN;
BEGIN
db=*# WITH c AS (
SELECT * FROM car WHERE id = 1
FOR NO KEY UPDATE
)
SELECT * FROM c
JOIN owner ON c.owner_id = owner.id;
-- Session is blocked
--
--
--
--
--
--
--
--
--
--
-- Lock is released by first session
─[ RECORD 1 ]─┬────────
id │ 1
owner_id │ 2
id │ 2
name │ jerry
无法使用 WITH 时,可用等价子查询实现:
ini
db=# BEGIN;
BEGIN;
db=*# EXPLAIN
SELECT * FROM (
SELECT * FROM car
WHERE id = 1 FOR NO KEY UPDATE
) c
JOIN owner ON c.owner_id = owner.id;
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────
Nested Loop (cost=0.31..16.38 rows=1 width=44)
-> Subquery Scan on c (cost=0.15..8.19 rows=1 width=8)
-> LockRows (cost=0.15..8.18 rows=1 width=14)
-> Index Scan using car_pkey on car (cost=0.15..8.17 rows=1 width=14)
Index Cond: (id = 1)
-> Index Scan using owner_pkey on owner (cost=0.15..8.17 rows=1 width=36)
Index Cond: (id = c.owner_id)
结论:通过子查询/CTE 前移锁获取时机,是兼顾一致性与性能的最优解。
预防措施
该问题并非理论推演,而是在生产环境中真实发生。触发原因也并非高并发场景,而是用户在 UI 中快速重复点击按钮,短时间内产生了多次相同请求,从而形成并发条件。由于系统中已设置断言机制,未对业务和用户产生实际影响。
在定位问题模式后,开始评估预防手段。此类问题难以通过常规规则检测:不仅依赖具体 SQL 结构,还与执行上下文和并发时序密切相关,对代码理解要求较高。
通过结合大模型辅助分析,对潜在风险点进行了系统性排查与修复。最终统一采用更稳妥的策略:避免在加锁场景中使用 JOIN,改为拆分查询。虽然增加了一次数据库交互,但能够确保结果一致性,属于可接受的权衡。