C++ 使用分治减小模板递归深度

起因

C++14 引入 STL 的 make_index_sequence 可以生成一个类型为 std::size_t，0 到 N-1 的编译期序列，我们可以这样使用它：

代码

c 复制代码

//利用函数参数推导提取序列
template<std::size_t... Seq>
void foo(std::index_sequence<Seq...>)
{
    //使用Seq
}

//利用模板特化提取序列
template<typename>
struct extract_sequence;
template<std::size_t... Seq>
struct extract_sequence<std::index_sequence<Seq...>>
{
    //使用Seq
};

foo(std::make_index_sequence<5>{});//foo内部得到 {0, 1, 2, 3, 4}
extract_sequence<std::make_index_sequence<3>>;//extract_sequence内部得到 {0, 1, 2}

可是在我的程序中，我想创建一个等差数列，std::make_index_sequence 无法直接拿来使用（其实可以间接使用，后文会提到）。于是我编写了一个自己的版本：

代码

c 复制代码

//主模板，递归实例化
template<std::size_t Base,std::size_t Step, std::size_t Count, std::size_t... Seq>
struct make_arithmetic_progression : make_arithmetic_progression<Base + Step, Step, Count-1, Seq..., Base>
{

};

//递归终点特化版本
template<std::size_t Base, std::size_t Step, std::size_t... Seq>
struct make_arithmetic_progression<Base, Step, 0, Seq...>
{
    using type = std::index_sequence<Seq...>;
};

template<std::size_t Base, std::size_t Step, std::size_t Count>
using arithmetic_progression = typename make_arithmetic_progression<Base, Step, Count>::type;

foo(arithmetic_progression<1, 2, 3>{});//foo内部得到 {1, 3, 5}
extract_sequence<arithmetic_progression<3, 3, 3>>;//extract_sequence内部得到 {3, 6, 9}

问题

这个实现看起来没什么问题，可以正确地生成任意指定的等差数列。但是在一个偶然的情况下，我在创建一个很长的数列时，遇到以下错误：

](p0-xtjj-private.juejin.cn/tos-cn-i-73...)

模板递归实例化深度超过限制了，查阅了一下资料，获知各家编译器都对这个深度有一个默认限制，GCC 和 clang 都是 1024，MSVC 是 499。

可以使用编译参数来指定这个限制：

代码

ini 复制代码

-ftemplate-depth=N //GCC 和 clang
/Zc:templateDepth=N //MSVC

但这显然是下策，每次编译这个模板都需要在编译时加上额外的指令，非常麻烦。

解决

自定义方式

我们知道有个排序算法叫归并排序，将长列表不断拆分成最小单元后合并。我们的这个数列实现也可以直接借鉴这个思路，拆分数组后合并：

代码

arduino 复制代码

//list拼接类主模板
template<typename,typename>
struct index_list_concat;

//list拼接类特化版本
template<std::size_t... Seq1, std::size_t... Seq2>
struct index_list_concat<std::index_sequence<Seq1...>, std::index_sequence<Seq2...>>
{
    using type = std::index_sequence<Seq1..., Seq2...>;
};

//辅助分割主模板
template<std::size_t Base, std::size_t Step, std::size_t Count>
class make_arithmetic_progression
{
    static constexpr std::size_t HalfCount = Count / 2;
    static constexpr std::size_t HalfBase = Base + HalfCount * Step;

public:
    using type = typename index_list_concat<typename make_arithmetic_progression<Base, Step, HalfCount>::type,
                typename make_arithmetic_progression<HalfBase, Step, Count - HalfCount>::type>::type;
};

//拆分为1个元素特化版本
template<std::size_t Base, std::size_t Step>
class make_arithmetic_progression<Base, Step, 1>
{
public:
    using type = index_list<Base>;
};

//拆分为2个元素特化版本
template<std::size_t Base, std::size_t Step>
class make_arithmetic_progression<Base, Step, 2>
{
public: https://www.hefeilaws.com/
    using type = index_list<Base, Base + Step>;
};

template<std::size_t Base, std::size_t Step, std::size_t Count>
using arithmetic_progression = typename make_arithmetic_progression<Base, Step, Count>::type;

这样对于长度为 Count 的序列，模板递归实例化的深度变为 log2Count，按照 MSVC 的默认最大深度 499 来算，也能支持到 2499 长度的序列，这无论如何都够用了，更别说 GCC 和 clang 的 1024 了。

借用标准库

上文说到可以间接使用标准库来实现这个等差数列模板。在我们初中的时候就知道，可以用公差为 1 的等差数列生成任意的其他等差数列：

为公差，为首项为公差，为首项an=n(n=0,1,2,...)bn=an∗d+k(n=0,1,2,...)//d为公差，k为首项结合 C++17 的折叠表达式，我们可以构建这样的模板：

代码

c 复制代码

//辅助推导函数
template<std::size_t Base, std::size_t Step, std::size_t... Seq>
std::index_sequence<(Base + Step * Seq)...> arithmetic_progression_helper(std::index_sequence<Seq...>);

template<std::size_t Base, std::size_t Step, std::size_t Count>
using arithmetic_progression = decltype(arithmetic_progression_helper<Base, Step>(std::make_index_sequence<Count>{}));

这个模板只是在 STL 的 index_sequence 上做了一次包装，更加简洁。

优劣

在我对上述两种实现进行测试时，发现在实例化超长的数列时，使用 STL 的版本无论在速度上还是内存消耗上都显著优于自定义版本。

比如在我的电脑上使用 clang 编译，实例化一个长度为 100000 的数列。STL 版本的实现编译耗时 1 秒左右，内存仅占用数十 MB；而自定义版本的编译却需要 10 秒左右，内存占用超过 1.2GB。

为什么相差如此悬殊？我于是很好奇 STL 是如何实现 make_index_sequence 的，找到源码后跟进去一瞧，它其实是 __make_integer_seq<std::size_t, N>。大家都知道，前导两个下划线的 STL 模板是内部实现版本，它们要么由编译器支持没有源码，要么有源码但不建议直接使用，这个 __make_integer_seq 属于前者。这些由编译器支持的模板大部分属于用户端不可能实现或者实现极为复杂，而在编译器端实现会相对简单并且高效的。

这提醒我们，对于标准库能够直接或间接支持的模块，应该优先选择使用标准库来实现，它们会帮我们最大化地利用资源。不过在学习过程中，编写自己的版本也是很有价值的，能够让我们深入细节，接触优秀的编码思想和优化策略。

总结

使用分治来降低模板递归实例化深度这一策略，是我以前从没考虑过的，虽然它在本次编码中被最终舍弃，但是日后说不定有用武之地。本篇学习记录梳理了我在逐步实现和改进等差数列模板的思路，总结如下：

学习过程中不断造轮子加验证是获取和掌握知识的有效途径；
在实际项目中，尽量使用经过大众检验的库有助于项目的稳定高效。

对于第二条，如果为了一个模块而引入一整个大型库的依赖，仍须仔细斟酌。这与本篇笔记的内容无关，但在实际工作中，如果每个人都为了实现一个简单模块而肆无忌惮的引入三方依赖，造成团队项目仓库中一大半是三方库内容的情况还是很让人不舒服的。