HighGo Database (瀚高数据库)
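Environment
System platform: Linux x86-64 Red Hat Enterprise Linux 7
Version: 14, 13, 12, 11
Document Purpose
Understand how lightweight locks (LWLocks) are implemented at a low level and, from the perspective of protecting shared memory, how they are used: acquiring, waiting, and releasing. Understand the two LWLock modes, exclusive and shared.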
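Details
1. Requesting registration of an LWLock
Starting from a concrete case, we use pg_stat_statements to observe how a lock is requested, acquired, and released.
The details of the function below, how it gets called, and how the NamedLWLockTrancheRequestArray array and the NamedLWLockTrancheRequests counter are used were already covered in the earlier article "Hook Mechanism and the Implementation of Lightweight Locks (1)" and are not repeated here.
Its job is to let a dynamically loaded library register an LWLock with the server core; tranche_name serves as the name of the wait event.
The comment states that the function may only be called from a shmem_request_hook hook function and fails anywhere else; shmem_request_hook is in turn called during postmaster startup. Note that shmem_request_hook was introduced in PostgreSQL 15, so the listing below reflects that code line; in earlier versions the same request is made from the library's _PG_init.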
cpp
RequestNamedLWLockTranche("pg_stat_statements", 1);

/*
 * RequestNamedLWLockTranche
 *      Request that extra LWLocks be allocated during postmaster
 *      startup.
 *
 * This may only be called via the shmem_request_hook of a library that is
 * loaded into the postmaster via shared_preload_libraries.  Calls from
 * elsewhere will fail.
 *
 * The tranche name will be user-visible as a wait event name, so try to
 * use a name that fits the style for those.
 */
void RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks)
{
    NamedLWLockTrancheRequest *request;

    if (!process_shmem_requests_in_progress)
        elog(FATAL, "cannot request additional LWLocks outside shmem_request_hook");

    if (NamedLWLockTrancheRequestArray == NULL)
    {
        NamedLWLockTrancheRequestsAllocated = 16;
        NamedLWLockTrancheRequestArray = (NamedLWLockTrancheRequest *)
            MemoryContextAlloc(TopMemoryContext,
                               NamedLWLockTrancheRequestsAllocated
                               * sizeof(NamedLWLockTrancheRequest));
    }

    if (NamedLWLockTrancheRequests >= NamedLWLockTrancheRequestsAllocated)
    {
        int         i = pg_nextpower2_32(NamedLWLockTrancheRequests + 1);

        NamedLWLockTrancheRequestArray = (NamedLWLockTrancheRequest *)
            repalloc(NamedLWLockTrancheRequestArray,
                     i * sizeof(NamedLWLockTrancheRequest));
        NamedLWLockTrancheRequestsAllocated = i;
    }

    request = &NamedLWLockTrancheRequestArray[NamedLWLockTrancheRequests];
    Assert(strlen(tranche_name) + 1 <= NAMEDATALEN);
    strlcpy(request->tranche_name, tranche_name, NAMEDATALEN);
    request->num_lwlocks = num_lwlocks;
    NamedLWLockTrancheRequests++;
}
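The bookkeeping done by this function (a request array that starts with 16 slots and grows to the next power of two when full, with the tranche name as the lookup key) can be tried outside the server. The following is a minimal sketch under stated assumptions, not the PostgreSQL implementation: every `toy_` and `Toy` name is hypothetical, `malloc`/`realloc` stand in for `MemoryContextAlloc`/`repalloc`, and `TOY_NAMEDATALEN` is an assumed stand-in for `NAMEDATALEN`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NAMEDATALEN 64      /* assumption: stand-in for NAMEDATALEN */

typedef struct ToyRequest
{
    char        tranche_name[TOY_NAMEDATALEN];
    int         num_lwlocks;
} ToyRequest;

typedef struct ToyRequestArray
{
    ToyRequest *items;          /* NULL until the first request arrives */
    int         used;           /* number of requests recorded */
    int         allocated;      /* number of slots available */
} ToyRequestArray;

/* Smallest power of two >= n, for n >= 1 (stand-in for pg_nextpower2_32). */
unsigned
toy_nextpower2(unsigned n)
{
    unsigned    p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Record one request; returns 0 on success, -1 if memory runs out. */
int
toy_request_tranche(ToyRequestArray *arr, const char *name, int num_lwlocks)
{
    if (arr->items == NULL)
    {
        arr->items = malloc(16 * sizeof(ToyRequest));
        if (arr->items == NULL)
            return -1;
        arr->allocated = 16;
        arr->used = 0;
    }
    if (arr->used >= arr->allocated)
    {
        /* Array is full: grow to the next power of two. */
        int         n = (int) toy_nextpower2((unsigned) arr->used + 1);
        ToyRequest *p = realloc(arr->items, (size_t) n * sizeof(ToyRequest));

        if (p == NULL)
            return -1;
        arr->items = p;
        arr->allocated = n;
    }
    assert(strlen(name) + 1 <= TOY_NAMEDATALEN);
    strncpy(arr->items[arr->used].tranche_name, name, TOY_NAMEDATALEN - 1);
    arr->items[arr->used].tranche_name[TOY_NAMEDATALEN - 1] = '\0';
    arr->items[arr->used].num_lwlocks = num_lwlocks;
    arr->used++;
    return 0;
}

/* Look a request up by tranche name; returns its index or -1 if absent. */
int
toy_find_tranche(const ToyRequestArray *arr, const char *name)
{
    for (int i = 0; i < arr->used; i++)
        if (strcmp(arr->items[i].tranche_name, name) == 0)
            return i;
    return -1;
}
```

Starting from a zero-initialized `ToyRequestArray`, the seventeenth call to `toy_request_tranche` is the first one that triggers the growth step.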
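2. Acquiring and releasing the lock, and the problem with the earlier implementation
During postmaster startup the shmem_startup_hook hook is called, which transfers control to the pgss_shmem_startup function in pg_stat_statements. That function calls LWLockAcquire to take the lock and LWLockRelease to release it; the code in between is the critical section, which protects data in shared memory.
We first analyze the implementation of LWLockAcquire.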
cpp
LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);

pgss = ShmemInitStruct("pg_stat_statements",
                       sizeof(pgssSharedState),
                       &found);

if (!found)
{
    /* First time through ... */
    pgss->lock = &(GetNamedLWLockTranche("pg_stat_statements"))->lock;
    pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
    pgss->mean_query_len = ASSUMED_LENGTH_INIT;
    SpinLockInit(&pgss->mutex);
    pgss->extent = 0;
    pgss->n_writers = 0;
    pgss->gc_count = 0;
    pgss->stats.dealloc = 0;
    pgss->stats.stats_reset = GetCurrentTimestamp();
}

info.keysize = sizeof(pgssHashKey);
info.entrysize = sizeof(pgssEntry);
pgss_hash = ShmemInitHash("pg_stat_statements hash",
                          pgss_max, pgss_max,
                          &info,
                          HASH_ELEM | HASH_BLOBS);

LWLockRelease(AddinShmemInitLock);
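2.1 LWLockAcquire
We walk through the function step by step.
2.1.1 Declaration of LWLockAcquire
Parameter LWLock *: the lock to acquire. A lock such as AddinShmemInitLock has already been initialized by the time execution reaches this point, as described in part 1 of this series.
Parameter LWLockMode: the function supports only two modes, LW_EXCLUSIVE and LW_SHARED.
Return value: true if the lock was available immediately; false if another process held the lock and the caller had to sleep before obtaining it.
Side effect: until the lock is released, cancel and die interrupts are held off. A cancel or termination request that arrives while the lock is held is deferred until the lock has been released.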
cpp
/*
 * LWLockAcquire - acquire a lightweight lock in the specified mode
 *
 * If the lock is not available, sleep until it is.  Returns true if the lock
 * was available immediately, false if we had to sleep.
 *
 * Side effect: cancel/die interrupts are held off until lock release.
 */
bool LWLockAcquire(LWLock *lock, LWLockMode mode);
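2.1.2 Assertions
Assert(!(proc == NULL && IsUnderPostmaster)); rejects the case where the caller is a child process of the postmaster and proc == NULL. That combination means the child process is requesting a lightweight lock before it has finished initializing, which is an illegal call.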
cpp
PGPROC     *proc = MyProc;
bool        result = true;
int         extraWaits = 0;

Assert(mode == LW_SHARED || mode == LW_EXCLUSIVE);

/*
 * We can't wait if we haven't got a PGPROC.  This should only occur
 * during bootstrap or shared memory initialization.  Put an Assert here
 * to catch unsafe coding practices.
 */
Assert(!(proc == NULL && IsUnderPostmaster));

/* Ensure we will have room to remember the lock */
if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
    elog(ERROR, "too many LWLocks taken");
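About MyProc: every child process of the postmaster has a MyProc structure.
For example, when an auxiliary process is started, or when a backend is started to serve an incoming client connection, the child process executes the calls below in order. A lightweight lock can be requested only after MyProc has been fully initialized; before that point only spinlocks (SpinLockAcquire) are used.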
cpp
void InitPostmasterChild(void)
{
    IsUnderPostmaster = true;   /* we are a postmaster subprocess now */
    ......
}

void InitProcessGlobals(void)
{
    MyProcPid = getpid();
    ......
}

void InitProcess(void)
{
    ....
    MyProc->pid = MyProcPid;
    /* backendId, databaseId and roleId will be filled in later */
    MyProc->backendId = InvalidBackendId;
    MyProc->databaseId = InvalidOid;
    MyProc->roleId = InvalidOid;
    .....
}
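In the current context the extension's hook function runs inside the postmaster itself: proc is NULL and IsUnderPostmaster is false, so the condition proc == NULL && IsUnderPostmaster does not hold and the assertion passes.
2.1.3 Holding off cancel and die interrupts
In a multi-process environment several processes may access the same data structures in shared memory concurrently. To keep those structures consistent and intact, operations on shared memory must be synchronized and protected. LWLocks provide this synchronization.
Before entering a code section protected by an LWLock, cancel and die interrupts are held off so that an interrupt cannot interfere with the manipulation of shared-memory data structures. Cancel and termination requests are deferred until the operation on shared memory is complete and the LWLock has been released.
This guarantees that a cancel or termination request cannot leave shared data in an inconsistent or corrupted state while the lock is held.
It is implemented by incrementing the global variable InterruptHoldoffCount; the code that services these events checks this variable before acting.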
cpp
/*
 * Lock out cancel/die interrupts until we exit the code section protected
 * by the LWLock.  This ensures that interrupts will not interfere with
 * manipulations of data structures in shared memory.
 */
HOLD_INTERRUPTS();
cpp
#define HOLD_INTERRUPTS() (InterruptHoldoffCount++)
cpp
volatile uint32 InterruptHoldoffCount = 0;
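How a plain counter can defer an interrupt is easiest to see in isolation. The sketch below is a simplified model, not the backend's code: the `toy_` names are hypothetical, the "interrupt" is just a pending flag that a signal handler would set, and the check function services it only while the holdoff counter is zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the backend's globals. */
volatile unsigned toy_holdoff_count = 0;        /* like InterruptHoldoffCount */
volatile bool     toy_interrupt_pending = false;
int               toy_interrupts_serviced = 0;

/* Enter a protected section: interrupts are deferred from now on. */
void
toy_hold_interrupts(void)
{
    toy_holdoff_count++;
}

/* Leave a protected section; sections may nest, hence a counter. */
void
toy_resume_interrupts(void)
{
    assert(toy_holdoff_count > 0);
    toy_holdoff_count--;
}

/* What a signal handler would do: only record that a request arrived. */
void
toy_request_interrupt(void)
{
    toy_interrupt_pending = true;
}

/* Service a pending request, but only if nobody is holding interrupts off. */
void
toy_check_for_interrupts(void)
{
    if (toy_interrupt_pending && toy_holdoff_count == 0)
    {
        toy_interrupt_pending = false;
        toy_interrupts_serviced++;
    }
}
```

A request that arrives between `toy_hold_interrupts()` and `toy_resume_interrupts()` stays pending; the first `toy_check_for_interrupts()` after the counter drops back to zero services it.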
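2.1.4 Acquiring the lock
In the source, a fairly long comment precedes the code that acquires the lock. We use the early implementation (PostgreSQL 7.2) to explain what this comment means.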
/*
 * Loop here to try to acquire lock after each time we are signaled by
 * LWLockRelease.
 *
 * NOTE: it might seem better to have LWLockRelease actually grant us the
 * lock, rather than retrying and possibly having to go back to sleep. But
 * in practice that is no good because it means a process swap for every
 * lock acquisition when two or more processes are contending for the same
 * lock. Since LWLocks are normally used to protect not-very-long
 * sections of computation, a process needs to be able to acquire and
 * release the same lock many times during a single CPU time slice, even
 * in the presence of contention. The efficiency of being able to do that
 * outweighs the inefficiency of sometimes wasting a process dispatch
 * cycle because the lock is not free when a released waiter finally gets
 * to run. See pgsql-hackers archives for 29-Dec-01.
 */
A lightweight lock is essentially a read-write lock, characterized as follows:
- Any number of threads (processes) can hold a given read-write lock for reading as long as no thread holds the read-write lock for writing.
- A read-write lock can be allocated for writing only if no thread (process) holds the read-write lock for reading or writing.
Therefore lock->exclusive can only be 0 or 1, while lock->shared can range from 0 to MaxBackends.
lock->exclusive > 0 means that a process holds the lock in write (exclusive) mode; lock->shared > 0 means that one or more processes hold it in read (shared) mode.
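The two rules translate directly into checks on the two counters. The following is a minimal single-process sketch of the rules only, assuming a hypothetical `ToyRWLock` with no wait queue and no mutex; it illustrates which requests are compatible and is not a usable lock.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TOY_EXCLUSIVE, TOY_SHARED } ToyMode;

typedef struct ToyRWLock
{
    int         exclusive;      /* number of writers holding the lock: 0 or 1 */
    int         shared;         /* number of readers holding the lock */
} ToyRWLock;

/* Grant the request if the rules allow it; return false if it must wait. */
bool
toy_try_acquire(ToyRWLock *lock, ToyMode mode)
{
    if (mode == TOY_EXCLUSIVE)
    {
        /* A writer needs the lock completely free. */
        if (lock->exclusive == 0 && lock->shared == 0)
        {
            lock->exclusive++;
            return true;
        }
        return false;
    }

    /* A reader only needs the absence of a writer. */
    if (lock->exclusive == 0)
    {
        lock->shared++;
        return true;
    }
    return false;
}

/* Drop one hold: the writer's if there is one, otherwise one reader's. */
void
toy_release(ToyRWLock *lock)
{
    if (lock->exclusive > 0)
        lock->exclusive--;
    else
    {
        assert(lock->shared > 0);
        lock->shared--;
    }
}
```

With this model two readers can hold the lock together, a writer is refused until both have released, and once the writer holds the lock every further request is refused.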
cpp
LWLockAcquire:
/*
 * LWLockAcquire - acquire a lightweight lock in the specified mode
 *
 * If the lock is not available, sleep until it is.
 *
 * Side effect: cancel/die interrupts are held off until lock release.
 */
void
LWLockAcquire(LWLockId lockid, LWLockMode mode)
{
    LWLock     *lock = LWLockArray + lockid;
    bool        mustwait;

    PRINT_LWDEBUG("LWLockAcquire", lockid, lock);

    /*
     * Lock out cancel/die interrupts until we exit the code section
     * protected by the LWLock.  This ensures that interrupts will not
     * interfere with manipulations of data structures in shared memory.
     */
    HOLD_INTERRUPTS();

    /* Acquire mutex.  Time spent holding mutex should be short! */
    SpinLockAcquire_NoHoldoff(&lock->mutex);

    /* If I can get the lock, do so quickly. */
    // For a write-lock request, check whether any process currently holds the
    // write lock or a read lock. If both counters are 0, the write lock can be
    // taken: mustwait = false, and lock->exclusive++ brings the counter to 1.
    // Otherwise mustwait = true and the requesting process will sleep.
    if (mode == LW_EXCLUSIVE)
    {
        if (lock->exclusive == 0 && lock->shared == 0)
        {
            lock->exclusive++;
            mustwait = false;
        }
        else
            mustwait = true;
    }
    // For a read-lock request there is one more case to consider: a stream of
    // readers starving a writer.
    // So besides checking that no writer holds the lock, also check whether the
    // wait queue is empty. If it is not empty, even a reader has to queue up;
    // otherwise, in an extreme (very unlikely) case, shared might never drop
    // back to 0 and a process wanting the write lock would block indefinitely.
    // If some process holds the write lock, mustwait = true.
    else
    {
        /*
         * If there is someone waiting (presumably for exclusive access),
         * queue up behind him even though I could get the lock.  This
         * prevents a stream of read locks from starving a writer.
         */
        if (lock->exclusive == 0 && lock->head == NULL)
        {
            lock->shared++;
            mustwait = false;
        }
        else
            mustwait = true;
    }

    // If the lock was granted above, the state has already been updated and
    // only the else branch below runs, which releases the mutex.
    // If mustwait is true the lock cannot be taken: the current process is
    // appended to the LWLock's wait queue and waits for the holder to wake it
    // when the lock is released.
    if (mustwait)
    {
        /* Add myself to wait queue */
        PROC       *proc = MyProc;
        int         extraWaits = 0;

        /*
         * If we don't have a PROC structure, there's no way to wait.
         * This should never occur, since MyProc should only be null
         * during shared memory initialization.
         */
        if (proc == NULL)
            elog(FATAL, "LWLockAcquire: can't wait without a PROC structure");

        proc->lwWaiting = true;
        proc->lwExclusive = (mode == LW_EXCLUSIVE);
        proc->lwWaitLink = NULL;
        if (lock->head == NULL)
            lock->head = proc;
        else
            lock->tail->lwWaitLink = proc;
        lock->tail = proc;

        /* Can release the mutex now */
        SpinLockRelease_NoHoldoff(&lock->mutex);

        /*
         * Wait until awakened.
         *
         * Since we share the process wait semaphore with the regular lock
         * manager and ProcWaitForSignal, and we may need to acquire an LWLock
         * while one of those is pending, it is possible that we get awakened
         * for a reason other than being granted the LWLock.  If so, loop back
         * and wait again.  Once we've gotten the lock, re-increment the sema
         * by the number of additional signals received, so that the lock
         * manager or signal manager will see the received signal when it
         * next waits.
         */
        // Processes synchronize through System V semaphores. The semaphore
        // identifiers live in the PROC structure in shared memory, so any
        // process can wait on or signal the semaphore.
        // Another subsystem may signal this semaphore while we are waiting for
        // the LWLock, so proc->lwWaiting has to be re-checked after every
        // wake-up.
        // The same holds for threads: pthread_cond_signal and
        // pthread_cond_broadcast also allow spurious wake-ups, so the boolean
        // predicate must be re-checked in a loop there as well.
        for (;;)
        {
            /* "false" means cannot accept cancel/die interrupt here. */
            IpcSemaphoreLock(proc->sem.semId, proc->sem.semNum, false);
            if (!proc->lwWaiting)
                break;
            extraWaits++;
        }

        // Use extraWaits to restore the semaphore count.
        // By the time we are woken, the LWLock state has already been updated
        // for us, so nothing is left to do except fix the semaphore count.
        // OS note: the woken process moves from blocked to ready (considering
        // round-robin time slicing only).
        /*
         * The awakener already updated the lock struct's state, so we
         * don't need to do anything more to it.  Just need to fix the
         * semaphore count.
         */
        while (extraWaits-- > 0)
            IpcSemaphoreUnlock(proc->sem.semId, proc->sem.semNum);
    }
    else
    {
        /* Got the lock without waiting */
        SpinLockRelease_NoHoldoff(&lock->mutex);
    }

    /* Add lock to list of locks held by this backend */
    Assert(num_held_lwlocks < MAX_SIMUL_LWLOCKS);
    held_lwlocks[num_held_lwlocks++] = lockid;
}
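The extraWaits accounting in the wait loop can be checked with a deterministic model. The sketch below is an illustration under stated assumptions: the semaphore is simulated by a plain integer, the wake-ups are replayed from an array instead of arriving from other processes, and all `Toy`/`toy_` names are hypothetical. It shows that every wake-up meant for another subsystem is put back on the semaphore once the lock has been granted.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TOY_WAKE_OTHER, TOY_WAKE_LWLOCK } ToyWakeReason;

typedef struct ToyProc
{
    int         sem;            /* simulated semaphore count */
    bool        lw_waiting;     /* cleared by whoever grants the LWLock */
} ToyProc;

/*
 * Replay `nevents` wake-ups against the waiter's loop. Returns the number of
 * unrelated wake-ups that were absorbed and re-posted, or -1 if the LWLock
 * was never granted (a real process would simply keep sleeping).
 */
int
toy_wait_for_lwlock(ToyProc *proc, const ToyWakeReason *events, int nevents)
{
    int         extra_waits = 0;
    int         next = 0;

    proc->lw_waiting = true;
    for (;;)
    {
        if (next >= nevents)
            return -1;

        /* Some process signals our semaphore ... */
        if (events[next++] == TOY_WAKE_LWLOCK)
            proc->lw_waiting = false;   /* the releaser granted us the lock */
        proc->sem++;

        /* ... and our simulated semaphore wait returns, consuming one count. */
        proc->sem--;

        if (!proc->lw_waiting)
            break;
        extra_waits++;          /* woken for some other reason: wait again */
    }

    /* Give back the wake-ups that were meant for other subsystems. */
    for (int i = 0; i < extra_waits; i++)
        proc->sem++;

    return extra_waits;
}
```

Replaying two unrelated wake-ups followed by the LWLock grant leaves the simulated semaphore at 2, so the lock manager or signal code would still see both signals on its next wait.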
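LWLockRelease:
When a lightweight lock is released:
1. Determine which kind of hold the current process has. Whether it is a read lock or a write lock, the queue of processes waiting for the lock has to be handled.
2. The queue is either empty or not. If it is not empty, it may contain both readers and writers. If it is empty, the function re-enables interrupts and returns.
3. If a write lock is released, lock->exclusive == 0 && lock->shared == 0 both hold afterwards. If the waiter at the head of the queue wants the write lock, lock->exclusive++ is done on its behalf and only that waiter is woken. If the head wants a read lock, all consecutive readers from the head are woken, broadcast style, up to the next writer in the queue.
4. If a read lock is released, lock->exclusive == 0 but lock->shared is not necessarily 0. Take the request sequence "RRRWRRR": the first three readers hold the lock while W and the readers behind it wait in the queue. Only when the last holder releases its read lock is the queue processed, with the same logic as in step 3.
5. Finally the selected waiters are woken one by one. At this point the queue has been split into two parts, the waiters to wake and the waiters that keep waiting, because the code sets proc->lwWaitLink = NULL on the last process to be woken.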
cpp
void
LWLockRelease(LWLockId lockid)
{
    .............................

    /* Acquire mutex.  Time spent holding mutex should be short! */
    SpinLockAcquire_NoHoldoff(&lock->mutex);

    /* Release my hold on lock */
    if (lock->exclusive > 0)
        lock->exclusive--;
    else
    {
        Assert(lock->shared > 0);
        lock->shared--;
    }

    /*
     * See if I need to awaken any waiters.  If I released a non-last shared
     * hold, there cannot be anything to do.
     */
    head = lock->head;
    if (head != NULL)
    {
        if (lock->exclusive == 0 && lock->shared == 0)
        {
            /*
             * Remove the to-be-awakened PROCs from the queue, and update the
             * lock state to show them as holding the lock.
             */
            proc = head;
            if (proc->lwExclusive)
            {
                lock->exclusive++;
            }
            else
            {
                lock->shared++;
                while (proc->lwWaitLink != NULL &&
                       !proc->lwWaitLink->lwExclusive)
                {
                    proc = proc->lwWaitLink;
                    lock->shared++;
                }
            }
            /* proc is now the last PROC to be released */
            lock->head = proc->lwWaitLink;
            proc->lwWaitLink = NULL;
        }
        else
        {
            /* lock is still held, can't awaken anything */
            head = NULL;
        }
    }

    /* We are done updating shared state of the lock itself. */
    SpinLockRelease_NoHoldoff(&lock->mutex);

    /*
     * Awaken any waiters I removed from the queue.
     */
    while (head != NULL)
    {
        proc = head;
        head = proc->lwWaitLink;
        proc->lwWaitLink = NULL;
        proc->lwWaiting = false;
        IpcSemaphoreUnlock(proc->sem.semId, proc->sem.semNum);
    }

    .......................
}
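Which waiters a release wakes depends only on the lock counters and on the front of the queue, so the rule can be modelled without processes or semaphores. The following is a minimal sketch, assuming the wait queue is given as a string of 'R' and 'W' in arrival order; `ToyQueuedLock` and `toy_release_and_wake` are hypothetical names, and the function returns how many waiters would be removed from the head of the queue and woken.

```c
#include <assert.h>

typedef struct ToyQueuedLock
{
    int         exclusive;      /* 0 or 1 */
    int         shared;         /* number of readers holding the lock */
} ToyQueuedLock;

/*
 * Drop the caller's hold, then grant the lock to the waiters at the front of
 * `queue` as the release logic above does. Returns the number of waiters woken.
 */
int
toy_release_and_wake(ToyQueuedLock *lock, const char *queue)
{
    int         woken = 0;

    /* Release my hold on the lock. */
    if (lock->exclusive > 0)
        lock->exclusive--;
    else
    {
        assert(lock->shared > 0);
        lock->shared--;
    }

    /* Nobody is waiting, or other readers still hold the lock. */
    if (queue[0] == '\0')
        return 0;
    if (lock->exclusive != 0 || lock->shared != 0)
        return 0;

    /* A writer at the head gets the lock alone. */
    if (queue[0] == 'W')
    {
        lock->exclusive++;
        return 1;
    }

    /* Otherwise wake every consecutive reader up to the next writer. */
    while (queue[woken] == 'R')
    {
        lock->shared++;
        woken++;
    }
    return woken;
}
```

For example, a writer releasing with queue "RRWR" wakes the first two readers and leaves "WR" waiting, while with three readers holding the lock and queue "WRRR" only the third release wakes anything, namely the writer.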
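The comment quoted earlier makes the point that a process can execute a large number of instructions within a single CPU time slice, and that the implementation above can lead to the following situation:
Pa and Pb both request the write lock.
Pa holds the write lock and releases it. During the release Pa itself updates the lock state on behalf of Pb (lock->exclusive++) and wakes Pb, which becomes ready to run. Pa's time slice has not expired, so Pa keeps running; if Pa then requests the write lock again, it has to block, because lock->exclusive has already been set for Pb.
Because the releasing process changes the lock state on the waiter's behalf, a process context switch becomes unavoidable, and frequent context switches are inefficient.
For this reason later versions let each process request the lock in a loop, instead of having the releaser designate the one process (exclusive) or the several processes (shared) that receive the lock: with retrying, the process that just released the lock can take it again within its own time slice.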
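The difference between the two policies shows up in a very small model. The sketch below is a simplified illustration with hypothetical names (`ToyXLock`, `toy_release_handoff`, `toy_release_retry`); it tracks only an exclusive flag and a waiter count and ignores the mutex, the queue, and the semaphore. It compares what the releasing process sees when it immediately asks for the lock again.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyXLock
{
    int         exclusive;      /* 1 while some process owns the lock */
    int         waiters;        /* processes sleeping on the lock */
} ToyXLock;

/* Take the lock if it is free; false means the caller would have to sleep. */
bool
toy_try_exclusive(ToyXLock *lock)
{
    if (lock->exclusive == 0)
    {
        lock->exclusive = 1;
        return true;
    }
    return false;
}

/*
 * Hand-off policy (the 7.2 code above): the releaser marks the lock as owned
 * by the first waiter before waking it.
 */
void
toy_release_handoff(ToyXLock *lock)
{
    lock->exclusive = 0;
    if (lock->waiters > 0)
    {
        lock->waiters--;
        lock->exclusive = 1;    /* now owned by the woken waiter */
    }
}

/*
 * Retry policy: the releaser only frees the lock. The woken waiter has to
 * compete for it again and remains a waiter until it succeeds.
 */
void
toy_release_retry(ToyXLock *lock)
{
    lock->exclusive = 0;
}
```

With one waiter queued, `toy_try_exclusive` fails right after `toy_release_handoff` (Pa must block and a context switch follows) but succeeds right after `toy_release_retry` (Pa continues within its time slice and Pb may have to go back to sleep).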