Effective C++ Item 50: Understand when it makes sense to replace new and delete

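There are seven common reasons to replace them: detecting usage errors, collecting statistics about the use of dynamically allocated memory, speeding up allocation and deallocation, reducing the space overhead of the default memory manager, compensating for suboptimal alignment in the default allocator, clustering related objects together, and obtaining unconventional behavior.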

1. Why replace new and delete?

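The C++ standard library provides default versions of operator new and operator delete, and they suit most scenarios. The default implementation is general-purpose, however: good enough for everyone, but often not the best choice for a specific workload.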

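When we understand a program's memory usage patterns in depth, custom memory management can often deliver a significant performance improvement.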

2. The seven reasons in detail

Reason 1: Detecting usage errors

Memory errors are among the hardest problems to debug in C++:

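Error type	Description	Consequence
Memory leak	new without a matching delete	Memory usage keeps growing
Double free	delete applied more than once to the same memory	Undefined behavior
Overrun	Writing past the end of an allocated block	Corrupts adjacent memory
Underrun	Writing before the start of an allocated block	Corrupts the block's header information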

Implementation: using signatures to detect overruns and underruns


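#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <new>

static const unsigned SIGNATURE = 0xDEADBEEF;
static const std::size_t SIGNATURE_SIZE = sizeof(SIGNATURE);

// Header layout: the requested size at the start, the head signature at the end.
// Sizing the header as alignof(std::max_align_t) keeps the user region aligned;
// offsetting by sizeof(int) alone would hand out misaligned pointers.
static const std::size_t HEADER_SIZE = alignof(std::max_align_t);
static_assert(HEADER_SIZE >= sizeof(std::size_t) + SIGNATURE_SIZE,
              "header too small for size + signature");

// Custom operator new with overrun/underrun detection
void* operator new(std::size_t size) {
    if (size == 0) size = 1;  // handle zero-byte requests
    
    // Allocate extra room for the header and the tail signature
    std::size_t totalSize = HEADER_SIZE + size + SIGNATURE_SIZE;
    
    void* rawMem = std::malloc(totalSize);
    if (!rawMem) throw std::bad_alloc();  // simplified: no new-handler loop
    
    char* user = static_cast<char*>(rawMem) + HEADER_SIZE;
    
    // Record the size, then write the head signature just before the user region
    std::memcpy(rawMem, &size, sizeof(size));
    std::memcpy(user - SIGNATURE_SIZE, &SIGNATURE, SIGNATURE_SIZE);
    
    // Write the tail signature (memcpy, because the tail may be misaligned)
    std::memcpy(user + size, &SIGNATURE, SIGNATURE_SIZE);
    
    // Return the user region (skipping the header)
    return user;
}

// Custom operator delete: verify the signatures
void operator delete(void* p) noexcept {
    if (!p) return;
    
    // Step back to the real start of the block
    char* user = static_cast<char*>(p);
    char* rawMem = user - HEADER_SIZE;
    
    // Check the head signature
    unsigned headSig;
    std::memcpy(&headSig, user - SIGNATURE_SIZE, SIGNATURE_SIZE);
    if (headSig != SIGNATURE) {
        std::cerr << "[error] memory underrun detected!\n";
    } else {
        // The header is intact, so the recorded size can be trusted:
        // use it to locate and check the tail signature
        std::size_t size;
        std::memcpy(&size, rawMem, sizeof(size));
        unsigned tailSig;
        std::memcpy(&tailSig, user + size, SIGNATURE_SIZE);
        if (tailSig != SIGNATURE) {
            std::cerr << "[error] memory overrun detected!\n";
        }
    }
    
    std::free(rawMem);
}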

Reason 2: Collecting usage statistics

Before optimizing memory usage, we first need to understand how memory is actually used:


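#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <new>

// Sizes are counted per power-of-two class: bucket i covers [2^i, 2^(i+1)).
// A fixed array replaces std::map here because a map allocates its nodes with
// operator new, which would re-enter this operator new and deadlock on statsMutex.
static const std::size_t NUM_BUCKETS = 64;

struct AllocationStats {
    std::size_t totalAllocations = 0;
    std::size_t totalDeallocations = 0;
    std::size_t currentBytes = 0;
    std::size_t peakBytes = 0;
    std::size_t sizeDistribution[NUM_BUCKETS] = {};  // size class -> count
};

static AllocationStats stats;
static std::mutex statsMutex;

// Header in front of each block; a multiple of the strictest fundamental
// alignment, so the pointer returned to the caller stays aligned
static const std::size_t HEADER_SIZE = alignof(std::max_align_t);
static_assert(HEADER_SIZE >= sizeof(std::size_t), "header too small");

static std::size_t bucketFor(std::size_t size) {
    std::size_t bucket = 0;
    while (size >>= 1) ++bucket;  // floor(log2(size)); 0 for sizes 0 and 1
    return std::min(bucket, NUM_BUCKETS - 1);
}

void* operator new(std::size_t size) {
    // Record the size in front of the pointer (delete needs it for the statistics)
    void* rawMem = std::malloc(size + HEADER_SIZE);
    if (!rawMem) throw std::bad_alloc();
    *static_cast<std::size_t*>(rawMem) = size;
    
    {
        std::lock_guard<std::mutex> lock(statsMutex);
        stats.totalAllocations++;
        stats.currentBytes += size;
        stats.peakBytes = std::max(stats.peakBytes, stats.currentBytes);
        stats.sizeDistribution[bucketFor(size)]++;
    }
    
    return static_cast<char*>(rawMem) + HEADER_SIZE;
}

void operator delete(void* p) noexcept {
    if (!p) return;
    
    void* rawMem = static_cast<char*>(p) - HEADER_SIZE;
    std::size_t size = *static_cast<std::size_t*>(rawMem);
    
    {
        std::lock_guard<std::mutex> lock(statsMutex);
        stats.totalDeallocations++;
        stats.currentBytes -= size;
    }
    
    std::free(rawMem);
}

// Print the statistics
void printStats() {
    AllocationStats snapshot;
    {
        // Copy under the lock and print outside it, because streaming may allocate
        std::lock_guard<std::mutex> lock(statsMutex);
        snapshot = stats;
    }
    std::cout << "=== Memory allocation statistics ===\n";
    std::cout << "Total allocations: " << snapshot.totalAllocations << "\n";
    std::cout << "Total deallocations: " << snapshot.totalDeallocations << "\n";
    std::cout << "Currently in use: " << snapshot.currentBytes << " bytes\n";
    std::cout << "Peak usage: " << snapshot.peakBytes << " bytes\n";
    std::cout << "Size distribution:\n";
    for (std::size_t i = 0; i < NUM_BUCKETS; ++i) {
        if (snapshot.sizeDistribution[i] == 0) continue;
        std::cout << "  size class starting at " << (std::size_t(1) << i)
                  << " bytes: " << snapshot.sizeDistribution[i] << " allocations\n";
    }
}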

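Reason 3: Speeding up allocation and deallocation

The default allocator has to handle requests of every size and maintain complex data structures to do so. For a specific scenario, we can optimize in a targeted way.

Scenario: fast allocation of fixed-size objects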


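#include <cstddef>
#include <new>

// Note: this pool is not thread-safe
template<std::size_t BlockSize, std::size_t NumBlocks>
class FixedAllocator {
    static_assert(NumBlocks > 0, "the pool must hold at least one block");
    
    union Block {
        char data[BlockSize];
        Block* next;  // doubles as the free-list link while the block is free
    };
    
    Block pool[NumBlocks];
    Block* freeList = nullptr;
    
public:
    FixedAllocator() {
        // Build the free list
        for (std::size_t i = 0; i < NumBlocks - 1; ++i) {
            pool[i].next = &pool[i + 1];
        }
        pool[NumBlocks - 1].next = nullptr;
        freeList = &pool[0];
    }
    
    void* allocate() {
        if (!freeList) {
            throw std::bad_alloc();  // pool exhausted
        }
        
        Block* block = freeList;
        freeList = freeList->next;
        return block;
    }
    
    void deallocate(void* p) {
        if (!p) return;
        
        Block* block = static_cast<Block*>(p);
        block->next = freeList;
        freeList = block;
    }
};

// Use the fixed-size allocator for Widget (each block is 64 bytes)
class Widget {
    static FixedAllocator<64, 1000> allocator;
    char data[56];  // payload; sizeof(Widget) is 56, so it fits in a 64-byte block
    
public:
    static void* operator new(std::size_t size) {
        // A derived class may be larger: hand such requests to the global operator new
        if (size != sizeof(Widget)) return ::operator new(size);
        return allocator.allocate();
    }
    
    static void operator delete(void* p, std::size_t size) {
        if (size != sizeof(Widget)) {
            ::operator delete(p);
            return;
        }
        allocator.deallocate(p);
    }
};

static_assert(sizeof(Widget) <= 64, "Widget must fit in one block");

FixedAllocator<64, 1000> Widget::allocator;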

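Advantage: allocation and deallocation are both O(1), each needing only a couple of linked-list operations. This is often noticeably faster than a general-purpose allocator, although the actual gain depends on the workload and should be measured.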

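Reason 4: Reducing space overhead

General-purpose allocators usually store metadata (such as the block size and flag bits) next to each allocated block, which adds memory overhead.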

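Allocator type	Typical extra overhead	Suitable for
General-purpose allocator	8-16 bytes per block	General use
Fixed-size allocator	0 bytes	Objects of one fixed size
Memory pool	0-4 bytes per block	Large numbers of small objects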

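With large numbers of small objects (a particle system in a game engine, for example), this overhead can reach 20%-30% of total memory: 16 bytes of metadata on a 48-byte object is already 25% of the 64 bytes actually consumed.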
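
The share of memory lost to metadata can be estimated with a little arithmetic. The sketch below assumes a fixed per-block overhead; `overheadShare` is a hypothetical helper written for this illustration, and it ignores the size rounding that real allocators also perform.

```cpp
#include <cassert>
#include <cstddef>

// Fraction of the memory actually consumed that goes to allocator metadata,
// when each block carries `overheadBytes` of bookkeeping next to
// `payloadBytes` of user data. Both inputs are assumptions, not measurements.
double overheadShare(std::size_t payloadBytes, std::size_t overheadBytes) {
    assert(payloadBytes + overheadBytes > 0);  // avoid dividing by zero
    return static_cast<double>(overheadBytes) /
           static_cast<double>(payloadBytes + overheadBytes);
}
```

For example, `overheadShare(48, 16)` evaluates to 16 / 64 = 0.25, while a fixed-size allocator with no per-block metadata gives `overheadShare(64, 0)`, which is 0.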

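Reason 5: Compensating for suboptimal alignment

Many computer architectures require data of particular types to be placed at particular kinds of memory addresses:

Type	Typical alignment requirement	Consequence of misalignment
`int`	4-byte boundary	Slower access
`double`	8-byte boundary	Slower access or a hardware exception
`SIMD` types	16/32-byte boundary	Faulting instructions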


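#include <cstddef>
#include <cstdlib>
#include <memory>  // std::align

// malloc-style helper that returns memory aligned to Alignment
template<std::size_t Alignment>
void* alignedMalloc(std::size_t size) {
    static_assert((Alignment & (Alignment - 1)) == 0 && Alignment >= alignof(void*),
                  "Alignment must be a power of two and at least alignof(void*)");
    
    std::size_t extra = Alignment + sizeof(void*);
    void* rawMem = std::malloc(size + extra);
    if (!rawMem) return nullptr;
    
    // Compute the aligned address, leaving room for one pointer in front of it
    void* alignedMem = static_cast<char*>(rawMem) + sizeof(void*);
    std::size_t space = size + extra - sizeof(void*);
    if (!std::align(Alignment, size, alignedMem, space)) {
        std::free(rawMem);  // cannot happen with the padding above; kept as a guard
        return nullptr;
    }
    
    // Store the original pointer just before the aligned block (needed to free it)
    void** header = static_cast<void**>(alignedMem);
    header[-1] = rawMem;
    
    return alignedMem;
}

void alignedFree(void* p) {
    if (p) {
        void* rawMem = static_cast<void**>(p)[-1];
        std::free(rawMem);
    }
}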

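Since C++17, the standard library provides std::aligned_alloc (in <cstdlib>) as well as operator new overloads that take a std::align_val_t argument, but a custom implementation can still be useful, for example on toolchains that do not ship std::aligned_alloc.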
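
As a point of comparison, here is a minimal sketch of what the C++17 language support looks like in practice. It assumes a C++17 compiler; `Vec8`, `isAligned`, and `makeVec8` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// A type that asks for 32-byte alignment, e.g. to hold AVX-sized data.
struct alignas(32) Vec8 {
    float lanes[8];
};

// True if p sits on an `alignment`-byte boundary.
inline bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// In C++17, a plain new-expression for an over-aligned type selects the
// operator new overload taking std::align_val_t, so no manual padding is
// needed. The matching aligned operator delete runs when the unique_ptr dies.
inline std::unique_ptr<Vec8> makeVec8() {
    std::unique_ptr<Vec8> v(new Vec8());
    assert(isAligned(v.get(), alignof(Vec8)));
    return v;
}
```

Before C++17, the same new-expression was not guaranteed to honor the extended alignment, which is one situation where a helper like alignedMalloc earns its keep.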

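Reason 6: Clustering related objects

If you know that certain data structures are usually accessed together, you can allocate them at adjacent memory locations to reduce CPU cache misses.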


#include <cstddef>
#include <new>

// Scenario: an entity-component system (ECS) in a game
struct Transform {
    float x, y, z;
    float rotation;
};

struct Physics {
    float velocityX, velocityY;
    float mass;
};

// Keep every Transform in one contiguous array
class TransformManager {
    static constexpr std::size_t MAX_ENTITIES = 10000;
    Transform transforms[MAX_ENTITIES] = {};
    std::size_t count = 0;
    
public:
    Transform* allocate() {
        if (count >= MAX_ENTITIES) throw std::bad_alloc();
        return &transforms[count++];
    }
    
    std::size_t size() const { return count; }
    Transform& operator[](std::size_t i) { return transforms[i]; }
};

// Iterating over all Transforms walks memory sequentially, which is cache-friendly
void updateAllTransforms(TransformManager& manager, float dx) {
    for (std::size_t i = 0; i < manager.size(); ++i) {
        manager[i].x += dx;  // neighboring elements share cache lines
    }
}

Reason 7: Obtaining unconventional behavior

Scenario 1: allocating in shared memory


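#include <cstddef>
#include <new>
#include <sys/shm.h>  // System V shared memory (shmat/shmdt)

// Allocate an object in shared memory
void* sharedMemoryNew(std::size_t size, int shmId) {
    (void)size;  // simplified: every call attaches the whole segment
    void* shmAddr = shmat(shmId, nullptr, 0);
    if (shmAddr == reinterpret_cast<void*>(-1)) throw std::bad_alloc();
    
    // Allocate within the shared segment...
    return shmAddr;
}

// Custom operator new for objects that live in shared memory
class SharedObject {
    static int shmId;
    
public:
    static void* operator new(std::size_t size) {
        return sharedMemoryNew(size, shmId);
    }
    
    static void operator delete(void* p) {
        if (p) shmdt(p);  // detach the shared segment
    }
};

int SharedObject::shmId = -1;  // placeholder: set to a valid segment id before use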

Scenario 2: securely erasing memory


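#include <cstddef>
#include <new>

// Secure delete: zero the memory before returning it, so sensitive data does not linger.
// Assumes p came from a plain `new T` that used the global operator new.
template<typename T>
void secureDelete(T* p) {
    if (p) {
        p->~T();  // run the destructor
        
        // Wipe the memory; volatile keeps the compiler from optimizing the writes away
        volatile char* ptr = reinterpret_cast<volatile char*>(p);
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            ptr[i] = 0;
        }
        
        ::operator delete(static_cast<void*>(p));
    }
}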

3. Caveats when writing custom new/delete

3.1 Rules that must be followed


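#include <cstdlib>
#include <new>

// Rule 1: if allocation fails, call the new-handler in a loop
void* operator new(std::size_t size) {
    if (size == 0) size = 1;
    
    while (true) {
        void* p = std::malloc(size);
        if (p) return p;
        
        // Fetch and call the new-handler
        std::new_handler handler = std::get_new_handler();
        if (handler) {
            handler();  // may free memory or throw
        } else {
            throw std::bad_alloc();
        }
    }
}

// Rule 2: handle zero-byte requests
// When size == 0, return a valid pointer that can be passed to delete

// Rule 3: operator new[] can share its implementation with operator new,
// and operator delete[] can likewise share operator delete's

// Rule 4: preserve alignment
// The returned pointer must be suitably aligned for any object with
// fundamental alignment, i.e. alignof(std::max_align_t)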
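
To see the Rule 1 loop in action, here is a small sketch that runs the same retry logic against a simulated allocator. The simulation, the emergency reserve, and the names `releaseReserve`, `simulatedAlloc`, `allocateWithHandler`, and `runDemo` are all assumptions made for this illustration; only `std::set_new_handler` and `std::get_new_handler` come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Emergency reserve that the handler gives back when memory runs out.
static void* g_reserve = nullptr;
static int g_handlerCalls = 0;

// Hypothetical new-handler: release the reserve once, then uninstall itself,
// so that a further failure ends in std::bad_alloc instead of looping forever.
void releaseReserve() {
    ++g_handlerCalls;
    std::free(g_reserve);
    g_reserve = nullptr;
    std::set_new_handler(nullptr);
}

// Stand-in for an allocator under pressure: it "fails" while the reserve is
// still held. This keeps the demo deterministic.
void* simulatedAlloc(std::size_t size) {
    return g_reserve ? nullptr : std::malloc(size);
}

// Same retry loop as Rule 1, written against the simulated allocator.
void* allocateWithHandler(std::size_t size) {
    if (size == 0) size = 1;
    while (true) {
        if (void* p = simulatedAlloc(size)) return p;
        std::new_handler handler = std::get_new_handler();
        if (!handler) throw std::bad_alloc();
        handler();
    }
}

// Runs the scenario and returns how many times the handler was invoked.
int runDemo() {
    g_handlerCalls = 0;
    g_reserve = std::malloc(1024);
    assert(g_reserve != nullptr);
    std::set_new_handler(releaseReserve);
    // First attempt fails, the handler frees the reserve, the retry succeeds.
    void* p = allocateWithHandler(64);
    assert(p != nullptr);
    std::free(p);
    return g_handlerCalls;
}
```

The design point is that a new-handler must make progress: free some memory, install a different handler, uninstall itself, or throw. A handler that does none of these would keep the loop spinning.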

3.2 A more complete example


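#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <new>

// Note: the counters are not synchronized, so this sketch is single-threaded only
class DebugAllocator {
public:
    static void* allocate(std::size_t size, const char* file, int line) {
        if (size == 0) size = 1;
        
        // Extra room for the debug header
        std::size_t totalSize = size + sizeof(Header);
        
        void* rawMem = std::malloc(totalSize);
        while (!rawMem) {
            std::new_handler handler = std::get_new_handler();
            if (!handler) throw std::bad_alloc();
            handler();                        // may free memory or throw
            rawMem = std::malloc(totalSize);  // retry
        }
        
        // Fill in the header
        Header* header = static_cast<Header*>(rawMem);
        header->size = size;
        header->file = file;
        header->line = line;
        header->magic = MAGIC_NUMBER;
        
        totalAllocations++;
        currentBytes += size;
        
        return static_cast<char*>(rawMem) + sizeof(Header);
    }
    
    static void deallocate(void* p) {
        if (!p) return;
        
        void* rawMem = static_cast<char*>(p) - sizeof(Header);
        Header* header = static_cast<Header*>(rawMem);
        
        if (header->magic != MAGIC_NUMBER) {
            std::cerr << "[error] double free or memory corruption detected!\n";
            return;
        }
        
        currentBytes -= header->size;
        header->magic = 0;  // mark as freed
        
        std::free(rawMem);
    }
    
    static void printStats() {
        std::cout << "Total allocations: " << totalAllocations << "\n";
        std::cout << "Currently in use: " << currentBytes << " bytes\n";
    }
    
private:
    // alignas keeps sizeof(Header) a multiple of the strictest fundamental
    // alignment, so the pointer handed to the caller stays aligned
    struct alignas(std::max_align_t) Header {
        std::size_t size;
        const char* file;
        int line;
        unsigned magic;
    };
    
    static constexpr unsigned MAGIC_NUMBER = 0xDEADBEEF;
    static std::size_t totalAllocations;
    static std::size_t currentBytes;
};

std::size_t DebugAllocator::totalAllocations = 0;
std::size_t DebugAllocator::currentBytes = 0;

// Macro that captures the call site
#define DEBUG_NEW new(__FILE__, __LINE__)

// Global operator new overloads that record the call site
void* operator new(std::size_t size, const char* file, int line) {
    return DebugAllocator::allocate(size, file, line);
}

void* operator new[](std::size_t size, const char* file, int line) {
    return DebugAllocator::allocate(size, file, line);
}

// Matching placement deletes: used if a constructor throws after DEBUG_NEW
void operator delete(void* p, const char*, int) noexcept {
    DebugAllocator::deallocate(p);
}

void operator delete[](void* p, const char*, int) noexcept {
    DebugAllocator::deallocate(p);
}

// The plain forms must go through DebugAllocator as well, because the replaced
// operator delete below expects every block to carry a Header
void* operator new(std::size_t size) {
    return DebugAllocator::allocate(size, "<unknown>", 0);
}

void* operator new[](std::size_t size) {
    return DebugAllocator::allocate(size, "<unknown>", 0);
}

void operator delete(void* p) noexcept {
    DebugAllocator::deallocate(p);
}

void operator delete[](void* p) noexcept {
    DebugAllocator::deallocate(p);
}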
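
One detail of the file/line form of operator new is worth illustrating on its own: if a constructor throws after such a placement operator new has succeeded, the runtime releases the memory only through an operator delete that takes the same extra parameters. The sketch below shows this at class scope; `Tracked`, `g_liveBlocks`, and `throwingConstructorDoesNotLeak` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Number of blocks handed out and not yet returned (illustrative counter).
static int g_liveBlocks = 0;

class Tracked {
public:
    explicit Tracked(bool fail) {
        if (fail) throw std::runtime_error("constructor failed");
    }

    // Placement form carrying the call site.
    static void* operator new(std::size_t size, const char* /*file*/, int /*line*/) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        ++g_liveBlocks;
        return p;
    }

    // Matching placement delete: the runtime calls this overload when the
    // constructor throws after the placement new above has succeeded.
    static void operator delete(void* p, const char* /*file*/, int /*line*/) noexcept {
        release(p);
    }

    // Ordinary form, used by a normal delete-expression.
    static void operator delete(void* p) noexcept {
        release(p);
    }

private:
    static void release(void* p) noexcept {
        if (!p) return;
        assert(g_liveBlocks > 0);
        --g_liveBlocks;
        std::free(p);
    }
};

// Returns true if a throwing constructor leaves no block behind.
bool throwingConstructorDoesNotLeak() {
    try {
        Tracked* t = new (__FILE__, __LINE__) Tracked(true);
        delete t;  // not reached
    } catch (const std::runtime_error&) {
        // The placement delete has already run by the time we get here.
    }
    return g_liveBlocks == 0;
}
```

Without the three-parameter operator delete, the block allocated for the failed object would simply never be released.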

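4. When should you not customize new/delete?

Custom memory management is powerful, but it is best avoided in the following situations:

Situation	Reason
No clear performance bottleneck	Premature optimization is the root of all evil
Unfamiliarity with memory alignment details	Wrong alignment leads to crashes or slowdowns
A complex multithreaded environment	A thread-safe allocator is hard to get right
A mature alternative already exists	For example Boost.Pool, jemalloc, tcmalloc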

Mature alternatives


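// Boost.Pool: fast allocation of fixed-size objects
#include <boost/pool/pool.hpp>

boost::pool<> widgetPool(sizeof(Widget));

// malloc() returns raw storage (or a null pointer on failure); no constructor is run
Widget* pw = static_cast<Widget*>(widgetPool.malloc());  // O(1)
widgetPool.free(pw);

// jemalloc / tcmalloc: replace the global allocator
// Link against the library or preload it; no code changes are needed
// LD_PRELOAD=libjemalloc.so ./your_program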

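5. Summary

Reason to replace	Core goal	Implementation complexity
Detecting usage errors	Overrun detection, double-free detection	Medium
Collecting statistics	Allocation size distribution, peak tracking	Low
Faster allocation	Fixed-size allocators, memory pools	Medium
Lower space overhead	Eliminating per-block metadata	Medium
Better alignment	SIMD alignment, cache-line alignment	High
Clustering objects	Higher cache hit rate	Medium
Unconventional behavior	Shared memory, secure erasure	High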

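💡 Best practices:

Before replacing new/delete, use tools such as Valgrind or AddressSanitizer to diagnose the problem

Try mature solutions first, such as the standard library's std::allocator or Boost.Pool

If you must write your own, make sure alignment, the new-handler, and zero-byte requests are handled correctly

Prefer class-level customization over global customization, because its scope is easier to control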

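Custom operator new and operator delete are among the most powerful tools in C++ memory management. Used well, they can improve performance substantially; used badly, they can introduce bugs that are hard to track down. Evaluate the need carefully and apply them with a specific, measured goal in mind.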

Reference: Effective C++, Third Edition, by Scott Meyers