崩溃(Crash)简记

本文简单记述了一处崩溃(Crash)及其发生原因

问题

许久之前写过一段代码,大概是这个样子:

C++ 复制代码

// iterator 'TSetContainer'
for (const auto& TSetElement : TSetContainer)
{
    // notify event here
    Event.Broadcast(TSetElement);
}

代码很简单,一眼看上去也比较"平坦",似乎不太会出问题,没想经过了几次代码更新后,竟然偶现崩溃(Crash),位置就在遍历容器的时候:

C++ 复制代码

for (const auto& TSetElement : TSetContainer) // crash on this line ?

一开始曾怀疑会不会是容器元素失效之类的问题,但实际容器中存储的元素基本就是数值类型,是不可能失效的;后来又考虑可不可能是容器内存被某次不安全的内存操作给写坏了,当然这是有可能的,但是概率不高,排查起来也比较困难,所以暂时也没有深入.

最后想到会不会其实是后面代码有问题 ?

C++ 复制代码

Event.Broadcast(TSetElement); // problem here ?

考虑到事件通知可能会触发任意逻辑,包括对(遍历中)容器本身的更改(当然,调用链可能会非常深),所以对代码做了如下改动:

C++ 复制代码

// allocate new container
TSet TSetContainerBuffer;

// iterator 'TSetContainer'
for (const auto& TSetElement : TSetContainer)
{
    // collect set element
    TSetContainerBuffer.Add(TSetElement);
}

// iterator 'TSetContainerBuffer'
for (const auto& TSetElement : TSetContainerBuffer)
{
    // notify event here
    Event.Broadcast(TSetElement);
}

改动后事件通知不再会影响到遍历中的容器,奔溃(Crash)问题也便消除了.

总结

总结来看,对于容器遍历中的各类操作,一般我们应该都会了解一些相关的注意事项(譬如遍历中需要删除元素的话就不能简单的直接调用容器的移除操作,还需要保证后续迭代的正确性),但是如果代码涉及更深层次的调用逻辑,那么往往会让问题隐晦起来,这方面除了加强开发人员的认识以外, AI 代码检测工具应该是个更靠谱的方案 ~

细节

以下是对上述问题的细节描述(基于 UE4),有兴趣的朋友可以看一看,实际开发了解上面的总结即可 ~

为了稳定重现问题,我们对代码做一次改动:

C++ 复制代码

// iterator 'TSetContainer'
for (const auto& TSetElement : TSetContainer)
{
    // re-add 'TSetElement' for triggering crash
    TSetContainer.Add(TSetElement);
}

我们进一步的对 range-based for loop 进行等效改写:

C++ 复制代码

auto BeginIter = std::begin(TSetContainer);
auto EndIter = std::end(TSetContainer);

// iterator 'TSetContainer'
for (; BeginIter != EndIter; ++BeginIter)
{
    // re-add 'TSetElement' for triggering crash
    TSetContainer.Add(TSetElement);
}

对于循环中的容器添加操作,初看上去似乎并不会改变容器数据(因为容器为 Set,默认不支持添加重复元素),但实际上不是的,让我们看下实际的实现代码:

C++ 复制代码

template <typename ArgsType>
FSetElementId Emplace(ArgsType&& Args, bool* bIsAlreadyInSetPtr = nullptr)
{
	// Create a new element.
	FSparseArrayAllocationInfo ElementAllocation = Elements.AddUninitialized();
	SetElementType& Element = *new (ElementAllocation) SetElementType(Forward<ArgsType>(Args));

	uint32 KeyHash = KeyFuncs::GetKeyHash(KeyFuncs::GetSetKey(Element.Value));
	return EmplaceImpl(KeyHash, Element, ElementAllocation.Index, bIsAlreadyInSetPtr);
}

代码细节很多,但我们需要知道的就是 Set 添加元素的时候首先会创建元素,如果元素有重复的话则会再删除元素(但是会保留创建元素时申请的内存),简单来说, Set 容器的内存会发生变化(尽管从外部接口来看没有变化) ...

C++ 复制代码

// TSparseArray
int32 Num() const { return Data.Num() - NumFreeIndices; }

譬如上面的 Num() 函数,尽管返回的数值一致,但是(以 Set 当前有一个元素举例):

调用 Add 之前, Data.Num() 为 1, NumFreeIndices 为 0
调用 Add 之后, Data.Num() 为 2, NumFreeIndices 为 1

进一步的深入到迭代器,我们有:

C++ 复制代码

    // TConstSetBitIterator
	/** Find the first set bit starting with the current bit, inclusive. */
	void FindFirstSetBit()
	{
		const uint32* ArrayData      = Array.GetData();
		
		// **here use 'Data.Num()'**
		const int32   ArrayNum       = Array.Num();
		
		const int32   LastDWORDIndex = (ArrayNum - 1) / NumBitsPerDWORD;

		// Advance to the next non-zero uint32.
		uint32 RemainingBitMask = ArrayData[this->DWORDIndex] & UnvisitedBitMask;
		while (!RemainingBitMask)
		{
			++this->DWORDIndex;
			BaseBitIndex += NumBitsPerDWORD;
			if (this->DWORDIndex > LastDWORDIndex)
			{
				// We've advanced past the end of the array.
				// **update use ArrayNum('Data.Num()')**
				CurrentBitIndex = ArrayNum;
				
				return;
			}

			RemainingBitMask = ArrayData[this->DWORDIndex];
			UnvisitedBitMask = ~0;
		}

		// This operation has the effect of unsetting the lowest set bit of BitMask
		const uint32 NewRemainingBitMask = RemainingBitMask & (RemainingBitMask - 1);

		// This operation XORs the above mask with the original mask, which has the effect
		// of returning only the bits which differ; specifically, the lowest bit
		this->Mask = NewRemainingBitMask ^ RemainingBitMask;

		// If the Nth bit was the lowest set bit of BitMask, then this gives us N
		CurrentBitIndex = BaseBitIndex + NumBitsPerDWORD - 1 - FMath::CountLeadingZeros(this->Mask);

		// If we've accidentally iterated off the end of an array but still within the same DWORD
		// then set the index to the last index of the array
		if (CurrentBitIndex > ArrayNum)
		{
			CurrentBitIndex = ArrayNum;
		}
	}
};

由于迭代器内部关联了上述 Data.Num() 的数据,这导致 BeginIter 的递增操作之后, BeginIter 和 EndIter 不会再相同(因为内部 CurrentBitIndex 数值不一致),于是 BeginIter 会一致迭代下去,直到对 ArrayData 的访问越界,最终触发奔溃(Crash) ~