LLVM Study Notes (62)

4.4.3.3. The X86TargetLowering Subobject

At line 314 of the X86Subtarget constructor, the X86TargetLowering constructor is invoked next to build TLInfo, the subobject of that type inside X86Subtarget.

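This class derives from TargetLowering and is used by the SelectionDAG-based instruction selector to describe how LLVM code is lowered to SelectionDAG operations. Among other things, it exposes:

  • an initial register class to use for each ValueType,
  • which operations the target natively supports,
  • the return type of Setcc operations,
  • the type to use for shift amounts, and
  • various high-level characteristics, such as whether it is profitable to turn division by a constant into a sequence of multiplications.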

4.4.3.3.1. TargetLowering

First, look at the constructor of the base class TargetLowering.

40 TargetLowering::TargetLowering(const TargetMachine &tm)

41 : TargetLoweringBase(tm) {}

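The TargetLoweringBase constructor is defined as follows. It provides baseline settings for every target; each target can then override the relevant parameters in the constructor of its own TargetLowering-derived class.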

532 TargetLoweringBase::TargetLoweringBase(const TargetMachine &tm) : TM(tm) {

533 initActions();

534

535 // Perform these initializations only once.

536 MaxStoresPerMemset = MaxStoresPerMemcpy = MaxStoresPerMemmove =

537 MaxLoadsPerMemcmp = 8;

538 MaxGluedStoresPerMemcpy = 0;

539 MaxStoresPerMemsetOptSize = MaxStoresPerMemcpyOptSize

540 = MaxStoresPerMemmoveOptSize = MaxLoadsPerMemcmpOptSize = 4;

541 UseUnderscoreSetJmp = false;

542 UseUnderscoreLongJmp = false;

543 HasMultipleConditionRegisters = false;

544 HasExtractBitsInsn = false;

545 JumpIsExpensive = JumpIsExpensiveOverride;

546 PredictableSelectIsExpensive = false;

547 EnableExtLdPromotion = false;

548 HasFloatingPointExceptions = true;

549 StackPointerRegisterToSaveRestore = 0;

550 BooleanContents = UndefinedBooleanContent;

551 BooleanFloatContents = UndefinedBooleanContent;

552 BooleanVectorContents = UndefinedBooleanContent;

553 SchedPreferenceInfo = Sched::ILP;

554 JumpBufSize = 0;

555 JumpBufAlignment = 0;

556 MinFunctionAlignment = 0;

557 PrefFunctionAlignment = 0;

558 PrefLoopAlignment = 0;

559 GatherAllAliasesMaxDepth = 18;

560 MinStackArgumentAlignment = 1;

561 // TODO: the default will be switched to 0 in the next commit, along

562 // with the Target-specific changes necessary.

563 MaxAtomicSizeInBitsSupported = 1024;

564

565 MinCmpXchgSizeInBits = 0;

566 SupportsUnalignedAtomics = false;

567

568 std::fill(std::begin(LibcallRoutineNames), std::end(LibcallRoutineNames), nullptr);

569

570 InitLibcalls(TM.getTargetTriple());

571 InitCmpLibcallCCs(CmpLibcallCCs);

572 }
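The "baseline in the base class, overrides in the derived class" arrangement described before the listing can be sketched in a few lines. This is only an illustration: the class names BaseLoweringSketch and DerivedLoweringSketch are hypothetical, and the override values are chosen for the example rather than taken from any real target.

```cpp
#include <cassert>

// Hypothetical stand-in for TargetLoweringBase: the constructor installs
// conservative defaults that apply to every target.
class BaseLoweringSketch {
public:
  BaseLoweringSketch() {
    MaxStoresPerMemset = MaxStoresPerMemcpy = MaxStoresPerMemmove = 8;
    PredictableSelectIsExpensive = false;
  }

  unsigned getMaxStoresPerMemset() const { return MaxStoresPerMemset; }
  bool isPredictableSelectExpensive() const {
    return PredictableSelectIsExpensive;
  }

protected:
  // Protected so that a derived constructor can overwrite the defaults.
  unsigned MaxStoresPerMemset;
  unsigned MaxStoresPerMemcpy;
  unsigned MaxStoresPerMemmove;
  bool PredictableSelectIsExpensive;
};

// Hypothetical target-specific class. The base constructor has already run
// when this body executes, so it only touches what it wants to change.
class DerivedLoweringSketch : public BaseLoweringSketch {
public:
  DerivedLoweringSketch() {
    MaxStoresPerMemset = 16;             // illustrative value only
    PredictableSelectIsExpensive = true; // illustrative value only
  }
};
```

Because base-class constructors run first, anything the derived constructor does not assign keeps its baseline value.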

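Line 533 calls initActions() to initialize the action tables. As the code below shows, these are OpActions, LoadExtActions, TruncStoreActions, IndexedModeActions and CondCodeActions. They are all arrays of integer type whose elements encode LegalizeAction values. This enumeration states whether a given operation is legal on a target and, if it is not, what action should be taken to make it legal: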

43 namespace LegalizeActions {

44 enum LegalizeAction : std::uint8_t {

45 /// The operation is expected to be selectable directly by the target, and

46 /// no transformation is necessary.

47 Legal,

48

49 /// The operation should be synthesized from multiple instructions acting on

50 /// a narrower scalar base-type. For example a 64-bit add might be

51 /// implemented in terms of 32-bit add-with-carry.

52 NarrowScalar,

53

54 /// The operation should be implemented in terms of a wider scalar

55 /// base-type. For example a <2 x s8> add could be implemented as a <2

56 /// x s32> add (ignoring the high bits).

57 WidenScalar,

58

59 /// The (vector) operation should be implemented by splitting it into

60 /// sub-vectors where the operation is legal. For example a <8 x s64> add

61 /// might be implemented as 4 separate <2 x s64> adds.

62 FewerElements,

63

64 /// The (vector) operation should be implemented by widening the input

65 /// vector and ignoring the lanes added by doing so. For example <2 x i8> is

66 /// rarely legal, but you might perform an <8 x i8> and then only look at

67 /// the first two results.

68 MoreElements,

69

70 /// The operation itself must be expressed in terms of simpler actions on

71 /// this target. E.g. a SREM replaced by an SDIV and subtraction.

72 Lower,

73

74 /// The operation should be implemented as a call to some kind of runtime

75 /// support library. For example this usually happens on machines that don't

76 /// support floating-point operations natively.

77 Libcall,

78

79 /// The target wants to do something special with this combination of

80 /// operand and type. A callback will be issued when it is needed.

81 Custom,

82

83 /// This operation is completely unsupported on the target. A programming

84 /// error has occurred.

85 Unsupported,

86

87 /// Sentinel value for when no action was found in the specified table.

88 NotFound,

89

90 /// Fall back onto the old rules.

91 /// TODO: Remove this once we've migrated

92 UseLegacyRules,

93 };

94 } // end namespace LegalizeActions

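The arrays mentioned above are therefore declared as follows. Note that the enumeration just listed is the GlobalISel variant declared in LegalizerInfo.h. The SelectionDAG tables below store TargetLoweringBase::LegalizeAction, whose values are Legal, Promote, Expand, LibCall and Custom, which is why initActions() passes Expand.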

  • LegalizeAction OpActions[MVT::LAST_VALUETYPE][ISD::BUILTIN_OP_END]

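For each operation and each value type, this holds a LegalizeAction that tells instruction selection how to handle the operation. Most operations are legal (natively supported by the target), but operations that are not supported should be described. Note that operations on illegal value types are not covered here.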

  • uint16_t LoadExtActions[MVT::LAST_VALUETYPE][MVT::LAST_VALUETYPE]

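For each load extension type and each value type, this holds a LegalizeAction that tells instruction selection how to handle a load involving the given value type and its extension type. Each action takes 4 bits, with the four load extension types packed together into one uint16_t.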

  • LegalizeAction TruncStoreActions[MVT::LAST_VALUETYPE][MVT::LAST_VALUETYPE]

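For each pair of value types, this holds a LegalizeAction that indicates whether a truncating store from the given value type to the given truncated type is legal.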

  • uint8_t IndexedModeActions[MVT::LAST_VALUETYPE][ISD::LAST_INDEXED_MODE]

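Here ISD::LAST_INDEXED_MODE is the number of memory-address indexed modes. For each indexed mode and each value type, this holds a pair of LegalizeAction values that tell instruction selection how to handle the load and the store. The first dimension is the value type being referenced; the second is the indexed mode.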

  • uint32_t CondCodeActions[ISD::SETCC_INVALID][(MVT::LAST_VALUETYPE + 7) / 8]

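Here ISD::SETCC_INVALID is the number of condition codes. For each condition code (ISD::CondCode) and each value type, this holds a LegalizeAction that tells instruction selection how to handle that condition code. Each action takes 4 bits, so one uint32_t covers eight value types.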

  • In addition, TargetDAGCombineArray is another array. Its type is:

unsigned char TargetDAGCombineArray[(ISD::BUILTIN_OP_END+CHAR_BIT-1)/CHAR_BIT]

It is a bitmap with one bit per ISD opcode. A bit set to 1 means that the target wants its PerformDAGCombine() callback to be invoked for nodes with that opcode.
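The 4-bit packing used by CondCodeActions, where the second dimension is (MVT::LAST_VALUETYPE + 7) / 8, can be illustrated with a small standalone table. This is a sketch under assumptions: the names ActionSketch and CondCodeTableSketch and the table sizes are hypothetical, and plain unsigned integers stand in for ISD::CondCode and MVT.

```cpp
#include <cassert>
#include <cstdint>

// Local copy of the five SelectionDAG actions, for illustration only.
enum ActionSketch : uint8_t { Legal = 0, Promote, Expand, LibCall, Custom };

constexpr unsigned NumCondCodes = 4;   // hypothetical table height
constexpr unsigned NumValueTypes = 20; // hypothetical number of value types

struct CondCodeTableSketch {
  // Eight 4-bit actions fit in one 32-bit word; zero-initialized, so every
  // entry starts out as Legal.
  uint32_t Words[NumCondCodes][(NumValueTypes + 7) / 8] = {};

  void set(unsigned CC, unsigned VT, ActionSketch A) {
    assert(CC < NumCondCodes && VT < NumValueTypes);
    uint32_t Shift = 4 * (VT & 0x7);  // position of the nibble in the word
    uint32_t &W = Words[CC][VT >> 3]; // word that holds this value type
    W &= ~(uint32_t(0xF) << Shift);   // clear the old action
    W |= uint32_t(A) << Shift;        // store the new one
  }

  ActionSketch get(unsigned CC, unsigned VT) const {
    assert(CC < NumCondCodes && VT < NumValueTypes);
    uint32_t Shift = 4 * (VT & 0x7);
    return static_cast<ActionSketch>((Words[CC][VT >> 3] >> Shift) & 0xF);
  }
};
```

Setting value type 9 touches only the second nibble of the second word, and neighbouring entries keep their zero (Legal) value.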

574 void TargetLoweringBase::initActions() {

575 // All operations default to being supported.

576 memset(OpActions, 0, sizeof(OpActions));

577 memset(LoadExtActions, 0, sizeof(LoadExtActions));

578 memset(TruncStoreActions, 0, sizeof(TruncStoreActions));

579 memset(IndexedModeActions, 0, sizeof(IndexedModeActions));

580 memset(CondCodeActions, 0, sizeof(CondCodeActions));

581 std::fill(std::begin(RegClassForVT), std::end(RegClassForVT), nullptr);

582 std::fill(std::begin(TargetDAGCombineArray),

583 std::end(TargetDAGCombineArray), 0);

584

585 // Set default actions for various operations.

586 for (MVT VT : MVT::all_valuetypes()) {

587 // Default all indexed load / store to expand.

588 for (unsigned IM = (unsigned)ISD::PRE_INC;

589 IM != (unsigned)ISD::LAST_INDEXED_MODE; ++IM) {

590 setIndexedLoadAction(IM, VT, Expand);

591 setIndexedStoreAction(IM, VT, Expand);

592 }

593

594 // Most backends expect to see the node which just returns the value loaded.

595 setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, VT, Expand);

596

597 // These operations default to expand.

598 setOperationAction(ISD::FGETSIGN, VT, Expand);

599 setOperationAction(ISD::CONCAT_VECTORS, VT, Expand);

600 setOperationAction(ISD::FMINNUM, VT, Expand);

601 setOperationAction(ISD::FMAXNUM, VT, Expand);

602 setOperationAction(ISD::FMINNAN, VT, Expand);

603 setOperationAction(ISD::FMAXNAN, VT, Expand);

604 setOperationAction(ISD::FMAD, VT, Expand);

605 setOperationAction(ISD::SMIN, VT, Expand);

606 setOperationAction(ISD::SMAX, VT, Expand);

607 setOperationAction(ISD::UMIN, VT, Expand);

608 setOperationAction(ISD::UMAX, VT, Expand);

609 setOperationAction(ISD::ABS, VT, Expand);

610

611 // Overflow operations default to expand

612 setOperationAction(ISD::SADDO, VT, Expand);

613 setOperationAction(ISD::SSUBO, VT, Expand);

614 setOperationAction(ISD::UADDO, VT, Expand);

615 setOperationAction(ISD::USUBO, VT, Expand);

616 setOperationAction(ISD::SMULO, VT, Expand);

617 setOperationAction(ISD::UMULO, VT, Expand);

618

619 // ADDCARRY operations default to expand

620 setOperationAction(ISD::ADDCARRY, VT, Expand);

621 setOperationAction(ISD::SUBCARRY, VT, Expand);

622 setOperationAction(ISD::SETCCCARRY, VT, Expand);

623

624 // ADDC/ADDE/SUBC/SUBE default to expand.

625 setOperationAction(ISD::ADDC, VT, Expand);

626 setOperationAction(ISD::ADDE, VT, Expand);

627 setOperationAction(ISD::SUBC, VT, Expand);

628 setOperationAction(ISD::SUBE, VT, Expand);

629

630 // These default to Expand so they will be expanded to CTLZ/CTTZ by default.

631 setOperationAction(ISD::CTLZ_ZERO_UNDEF, VT, Expand);

632 setOperationAction(ISD::CTTZ_ZERO_UNDEF, VT, Expand);

633

634 setOperationAction(ISD::BITREVERSE, VT, Expand);

635

636 // These library functions default to expand.

637 setOperationAction(ISD::FROUND, VT, Expand);

638 setOperationAction(ISD::FPOWI, VT, Expand);

639

640 // These operations default to expand for vector types.

641 if (VT.isVector()) {

642 setOperationAction(ISD::FCOPYSIGN, VT, Expand);

643 setOperationAction(ISD::ANY_EXTEND_VECTOR_INREG, VT, Expand);

644 setOperationAction(ISD::SIGN_EXTEND_VECTOR_INREG, VT, Expand);

645 setOperationAction(ISD::ZERO_EXTEND_VECTOR_INREG, VT, Expand);

646 }

647

648 // For most targets @llvm.get.dynamic.area.offset just returns 0.

649 setOperationAction(ISD::GET_DYNAMIC_AREA_OFFSET, VT, Expand);

650 }

651

652 // Most targets ignore the @llvm.prefetch intrinsic.

653 setOperationAction(ISD::PREFETCH, MVT::Other, Expand);

654

655 // Most targets also ignore the @llvm.readcyclecounter intrinsic.

656 setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Expand);

657

658 // ConstantFP nodes default to expand. Targets can either change this to

659 // Legal, in which case all fp constants are legal, or use isFPImmLegal()

660 // to optimize expansions for certain constants.

661 setOperationAction(ISD::ConstantFP, MVT::f16, Expand);

662 setOperationAction(ISD::ConstantFP, MVT::f32, Expand);

663 setOperationAction(ISD::ConstantFP, MVT::f64, Expand);

664 setOperationAction(ISD::ConstantFP, MVT::f80, Expand);

665 setOperationAction(ISD::ConstantFP, MVT::f128, Expand);

666

667 // These library functions default to expand.

668 for (MVT VT : {MVT::f32, MVT::f64, MVT::f128}) {

669 setOperationAction(ISD::FLOG , VT, Expand);

670 setOperationAction(ISD::FLOG2, VT, Expand);

671 setOperationAction(ISD::FLOG10, VT, Expand);

672 setOperationAction(ISD::FEXP , VT, Expand);

673 setOperationAction(ISD::FEXP2, VT, Expand);

674 setOperationAction(ISD::FFLOOR, VT, Expand);

675 setOperationAction(ISD::FNEARBYINT, VT, Expand);

676 setOperationAction(ISD::FCEIL, VT, Expand);

677 setOperationAction(ISD::FRINT, VT, Expand);

678 setOperationAction(ISD::FTRUNC, VT, Expand);

679 setOperationAction(ISD::FROUND, VT, Expand);

680 }

681

682 // Default ISD::TRAP to expand (which turns it into abort).

683 setOperationAction(ISD::TRAP, MVT::Other, Expand);

684

685 // On most systems, DEBUGTRAP and TRAP have no difference. The "Expand"

686 // here is to inform DAG Legalizer to replace DEBUGTRAP with TRAP.

687 setOperationAction(ISD::DEBUGTRAP, MVT::Other, Expand);

688 }

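Lines 576 to 580 zero all of these tables, which means every action defaults to Legal, and lines 582 and 583 clear the bitmap so that no operation requests the PerformDAGCombine() callback. The rest of the function then sets selected operations to Expand; as shown later, the X86 derived class overrides these settings further.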
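How a zero-filled bitmap encodes "no operation asks for a combine callback" can be sketched as follows. The struct name CombineBitmapSketch and the constant NumOpcodes are hypothetical stand-ins (the latter for ISD::BUILTIN_OP_END); the member names mirror the idea of a setter and a query and are not presented as the exact LLVM signatures.

```cpp
#include <cassert>
#include <climits>
#include <cstring>

constexpr unsigned NumOpcodes = 100; // hypothetical opcode count

struct CombineBitmapSketch {
  // One bit per opcode, rounded up to whole bytes.
  unsigned char Bits[(NumOpcodes + CHAR_BIT - 1) / CHAR_BIT];

  // Start with every bit cleared: no opcode requests the callback.
  CombineBitmapSketch() { std::memset(Bits, 0, sizeof(Bits)); }

  // Mark an opcode as wanting the target's combine callback.
  void setTargetDAGCombine(unsigned Opc) {
    assert(Opc < NumOpcodes);
    Bits[Opc / CHAR_BIT] |=
        static_cast<unsigned char>(1u << (Opc % CHAR_BIT));
  }

  // Query whether the callback was requested for this opcode.
  bool hasTargetDAGCombine(unsigned Opc) const {
    assert(Opc < NumOpcodes);
    return (Bits[Opc / CHAR_BIT] & (1u << (Opc % CHAR_BIT))) != 0;
  }
};
```

A derived class would set bits only for the opcodes it wants to inspect, so the common case costs a single byte load and mask.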

After initActions() returns, the TargetLoweringBase constructor goes on to initialize the following parameter members.

  • MaxStoresPerMemset
  • MaxLoadsPerMemcmp
  • MaxStoresPerMemcpy
  • MaxStoresPerMemmove

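When lowering @llvm.memset, @llvm.memcpy or @llvm.memmove, these fields give the maximum number of store operations that may be used to replace the memset/memcpy/memmove call; MaxLoadsPerMemcmp is the corresponding limit on loads when expanding memcmp. Targets must set these values based on a cost threshold. It should be assumed that the target will first use as many of the largest store operations as the alignment restrictions allow, and then smaller operations if needed. For example, storing 9 bytes on a 32-bit machine with 16-bit alignment results in four 2-byte stores and one 1-byte store. This only applies to setting a constant array of a constant size.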
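The 9-byte example can be reproduced with a simple greedy split. This is a deliberately simplified sketch: the function names splitIntoStores and fitsStoreLimit are hypothetical, and the real lowering logic in LLVM considers more factors (such as unaligned access support and target type preferences) than the alignment cap modelled here.

```cpp
#include <cassert>
#include <vector>

// Greedily split SizeBytes into stores. The widest usable store is capped by
// both the alignment and the machine's widest store; the width is halved
// whenever the remaining size is smaller than the current width.
std::vector<unsigned> splitIntoStores(unsigned SizeBytes, unsigned AlignBytes,
                                      unsigned WidestStoreBytes) {
  assert(AlignBytes >= 1 && WidestStoreBytes >= 1);
  unsigned Width =
      AlignBytes < WidestStoreBytes ? AlignBytes : WidestStoreBytes;
  std::vector<unsigned> Stores;
  while (SizeBytes != 0) {
    while (Width > SizeBytes)
      Width /= 2; // fall back to a narrower store for the tail
    Stores.push_back(Width);
    SizeBytes -= Width;
  }
  return Stores;
}

// The call is replaced by inline stores only if the count stays within the
// limit, for example a MaxStoresPerMemset value.
bool fitsStoreLimit(const std::vector<unsigned> &Stores, unsigned MaxStores) {
  return Stores.size() <= MaxStores;
}
```

Calling splitIntoStores(9, 2, 4) yields the widths 2, 2, 2, 2, 1, that is five stores, which is within the default limit of 8 set by the constructor but above the OptSize limit of 4.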

  • MaxStoresPerMemcpyOptSize
  • MaxLoadsPerMemcmpOptSize
  • MaxStoresPerMemmoveOptSize

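The maximum number of stores (or loads, for memcmp) used to replace these calls in functions that carry the OptSize attribute. The constructor sets MaxStoresPerMemsetOptSize in the same statement.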

  • UseUnderscoreSetJmp, UseUnderscoreLongJmp

Indicate whether _setjmp or _longjmp should be used to implement llvm.setjmp or llvm.longjmp.

  • MaxGluedStoresPerMemcpy

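When memcpy is inlined under the MaxStoresPerMemcpy limit, this gives the maximum number of store instructions to keep glued together. This helps later pairing and vectorization.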

  • HasMultipleConditionRegisters

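Tells the code generator whether the target has multiple (allocatable) condition registers that can hold comparison results. If it does, the code generator will not aggressively sink comparisons into the basic blocks of their users.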

  • HasExtractBitsInsn

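Tells the code generator whether the target has bit-extract instructions. If it does, the code generator aggressively sinks shifts into the basic blocks of their users whenever those users generate and instructions that can be combined with the shift into a bit-extract instruction.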

  • JumpIsExpensive

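Tells the code generator not to generate extra flow-control instructions, and instead to try to combine flow-control instructions through predication.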

  • PredictableSelectIsExpensive

Tells the code generator that a select is more expensive than a branch when the branch is usually predicted correctly.

  • EnableExtLdPromotion

Indicates whether the target wants the optimization that rewrites ext(promotableInst1(...(promotableInstN(load)))) into promotedInst1(...(promotedInstN(ext(load)))).

  • HasFloatingPointExceptions

Indicates whether the target supports, or cares about preserving, floating-point exception behavior.

  • StackPointerRegisterToSaveRestore

If set to a physical register, this specifies the register that llvm.savestack and llvm.restorestack should save and restore.

  • BooleanContents
  • BooleanFloatContents
  • BooleanVectorContents

All three are of the enumeration type BooleanContent, defined as follows:

140 enum BooleanContent {

141 UndefinedBooleanContent, // Only bit 0 counts, the rest can hold garbage.

142 ZeroOrOneBooleanContent, // All bits zero except for bit 0.

143 ZeroOrNegativeOneBooleanContent // All bits equal to bit 0.

144 };

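They describe the contents of the high bits of a boolean value held in a type wider than i1, for scalar integer, scalar floating-point and vector comparisons respectively.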
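To make the three settings concrete, the sketch below models a boolean held in a 32-bit value. The enumerator names are copied from the listing above; the helper functions isTrue, isCanonical and canonicalTrue are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Local copy of the enumeration shown above.
enum BooleanContent {
  UndefinedBooleanContent,
  ZeroOrOneBooleanContent,
  ZeroOrNegativeOneBooleanContent
};

// Under every setting, bit 0 carries the truth value.
bool isTrue(uint32_t V) { return (V & 1u) != 0; }

// Whether a 32-bit pattern is a well-formed boolean under the given setting.
bool isCanonical(uint32_t V, BooleanContent BC) {
  switch (BC) {
  case UndefinedBooleanContent:
    return true; // the high bits may hold garbage
  case ZeroOrOneBooleanContent:
    return V == 0u || V == 1u; // high bits must be zero
  case ZeroOrNegativeOneBooleanContent:
    return V == 0u || V == ~uint32_t(0); // all bits equal bit 0
  }
  return false;
}

// The pattern a comparison would produce for "true" under the given setting.
uint32_t canonicalTrue(BooleanContent BC) {
  return BC == ZeroOrNegativeOneBooleanContent ? ~uint32_t(0) : 1u;
}
```

The practical consequence is that a consumer may use the whole value as a mask only under ZeroOrNegativeOneBooleanContent, and may compare against 1 only under ZeroOrOneBooleanContent.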

  • SchedPreferenceInfo

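Indicates the target's scheduling preference, typically aimed at either the shortest total cycle count or the lowest register pressure. Its type is Sched::Preference, an enumeration that lists the scheduler kinds LLVM currently supports.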

95 enum Preference {

96 None, // No preference

97 Source, // Follow source order.

98 RegPressure, // Scheduling for lowest register pressure.

99 Hybrid, // Scheduling for both latency and register pressure.

100 ILP, // Scheduling for ILP in low register pressure mode.

101 VLIW // Scheduling for VLIW targets.

102 };

103 }

  • JumpBufSize
  • JumpBufAlignment

The size in bytes and the alignment requirement of the target's jmp_buf buffer.

  • MinFunctionAlignment
  • PrefFunctionAlignment
  • PrefLoopAlignment

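These are, respectively, the minimum function alignment (used when optimizing for size, and to prevent an explicitly provided alignment from leading to incorrect code), the preferred function alignment (used when no alignment is specified and the code is optimized for speed), and the preferred loop alignment.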

  • MinStackArgumentAlignment

The minimum alignment required for any argument on the stack.

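The container LibcallRoutineNames at line 568 is declared as const char *LibcallRoutineNames[RTLIB::UNKNOWN_LIBCALL], where RTLIB::UNKNOWN_LIBCALL is the number of runtime library calls the backend can emit. These library functions are described by the enumeration RTLIB::Libcall. The table is filled from a .def file by the following method: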

118 void TargetLoweringBase::InitLibcalls(const Triple &TT) {

119 #define HANDLE_LIBCALL(code, name) \

120 setLibcallName(RTLIB::code, name);

121 #include "llvm/IR/RuntimeLibcalls.def"

122 #undef HANDLE_LIBCALL

123 // Initialize calling conventions to their default.

124 for (int LC = 0; LC < RTLIB::UNKNOWN_LIBCALL; ++LC)

125 setLibcallCallingConv((RTLIB::Libcall)LC, CallingConv::C);

126

127 // A few names are different on particular architectures or environments.

128 if (TT.isOSDarwin()) {

129 // For f16/f32 conversions, Darwin uses the standard naming scheme, instead

130 // of the gnueabi-style _gnu*_ieee.

131 // FIXME: What about other targets?

132 setLibcallName(RTLIB::FPEXT_F16_F32, "__extendhfsf2");

133 setLibcallName(RTLIB::FPROUND_F32_F16, "__truncsfhf2");

134

135 // Some darwins have an optimized __bzero/bzero function.

136 switch (TT.getArch()) {

137 case Triple::x86:

138 case Triple::x86_64:

139 if (TT.isMacOSX() && !TT.isMacOSXVersionLT(10, 6))

140 setLibcallName(RTLIB::BZERO, "__bzero");

141 break;

142 case Triple::aarch64:

143 setLibcallName(RTLIB::BZERO, "bzero");

144 break;

145 default:

146 break;

147 }

148

149 if (darwinHasSinCos(TT)) {

150 setLibcallName(RTLIB::SINCOS_STRET_F32, "__sincosf_stret");

151 setLibcallName(RTLIB::SINCOS_STRET_F64, "__sincos_stret");

152 if (TT.isWatchABI()) {

153 setLibcallCallingConv(RTLIB::SINCOS_STRET_F32,

154 CallingConv::ARM_AAPCS_VFP);

155 setLibcallCallingConv(RTLIB::SINCOS_STRET_F64,

156 CallingConv::ARM_AAPCS_VFP);

157 }

158 }

159 } else {

160 setLibcallName(RTLIB::FPEXT_F16_F32, "__gnu_h2f_ieee");

161 setLibcallName(RTLIB::FPROUND_F32_F16, "__gnu_f2h_ieee");

162 }

163

164 if (TT.isGNUEnvironment() || TT.isOSFuchsia()) {

165 setLibcallName(RTLIB::SINCOS_F32, "sincosf");

166 setLibcallName(RTLIB::SINCOS_F64, "sincos");

167 setLibcallName(RTLIB::SINCOS_F80, "sincosl");

168 setLibcallName(RTLIB::SINCOS_F128, "sincosl");

169 setLibcallName(RTLIB::SINCOS_PPCF128, "sincosl");

170 }

171

172 if (TT.isOSOpenBSD()) {

173 setLibcallName(RTLIB::STACKPROTECTOR_CHECK_FAIL, nullptr);

174 }

175 }

The file RuntimeLibcalls.def defines the runtime library calls that the backend can emit. Its entries look like this:

HANDLE_LIBCALL(SHL_I16, "__ashlhi3")

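The macro defined at the start of InitLibcalls() expands each entry into a use of an enumerator such as RTLIB::SHL_I16. These enumerators are themselves generated from RuntimeLibcalls.def (in RuntimeLibcalls.h). setLibcallName() therefore records the function name for each library call, indexed by the enumerator.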

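setLibcallCallingConv() likewise records, indexed by the same enumerators, the calling convention each function uses. The default for all of them is LLVM's default calling convention, which is compatible with the C calling convention.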
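The technique of generating both the enumeration and the name table from one list of entries can be sketched without the real .def file. Here a list macro, SKETCH_LIBCALLS, stands in for the #include of RuntimeLibcalls.def; that macro, the sketch namespace and the LibcallTable struct are all hypothetical, and only three entries are shown.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the .def file: one X(code, name) entry per library call.
#define SKETCH_LIBCALLS(X)                                                     \
  X(SHL_I16, "__ashlhi3")                                                      \
  X(SHL_I32, "__ashlsi3")                                                      \
  X(SHL_I64, "__ashldi3")

namespace sketch {

// First expansion: every entry becomes an enumerator. The trailing
// enumerator doubles as the number of entries.
enum Libcall {
#define X(code, name) code,
  SKETCH_LIBCALLS(X)
#undef X
  UNKNOWN_LIBCALL
};

enum CallConv { C = 0 };

struct LibcallTable {
  const char *Names[UNKNOWN_LIBCALL];
  CallConv Convs[UNKNOWN_LIBCALL];

  LibcallTable() {
    for (int LC = 0; LC < UNKNOWN_LIBCALL; ++LC)
      Names[LC] = nullptr;
    // Second expansion: every entry becomes an assignment into the table,
    // indexed by the enumerator generated above.
#define X(code, name) Names[code] = name;
    SKETCH_LIBCALLS(X)
#undef X
    // Every call starts out with the C calling convention.
    for (int LC = 0; LC < UNKNOWN_LIBCALL; ++LC)
      Convs[LC] = C;
  }
};

} // namespace sketch
```

Since the enumerators and the table entries come from the same list, adding an entry keeps the two in step without any further edits.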

CmpLibcallCCs at line 571 is declared as ISD::CondCode CmpLibcallCCs[RTLIB::UNKNOWN_LIBCALL]. InitCmpLibcallCCs() thus uses CmpLibcallCCs to associate each comparison function in RTLIB::Libcall with the ISD::CondCode value that reflects its boolean result.
