Bounds Check Elimination (BCE)

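Go is a memory-safe language. For array/slice/string element indexing and subslice operations, the Go runtime checks whether the indexes involved are out of range. If an index is out of range, a panic is produced to keep the invalid index from doing harm. This is called bounds checking.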
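
As a minimal sketch of what such a check looks like from the program's point of view, the following snippet indexes a slice once in range and once out of range, and recovers from the resulting panic. The helper name `at` is made up for this illustration.

```go
package main

import "fmt"

// at returns s[i]. If i is out of range, the bounds check inserted by the
// compiler panics; the deferred function converts that panic into an error.
// (at is a hypothetical helper, not part of any library.)
func at(s []int, i int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return s[i], nil
}

func main() {
	s := []int{10, 20, 30}
	fmt.Println(at(s, 1)) // in range: the check passes
	fmt.Println(at(s, 5)) // out of range: the check fails and panics
}
```

The second call reports an error instead of reading memory past the end of the slice, which is exactly the protection bounds checking provides.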

Bounds checks let our code run safely, but they also make it run a little slower. This is a trade-off that every memory-safe language has to make.

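Since Go toolchain v1.7, the standard Go compiler has supported BCE (bounds check elimination). BCE avoids some unnecessary bounds checks, so the standard Go compiler can generate more efficient programs. The following examples show cases in which BCE works and cases in which it does not. We can use the `-d=ssa/check_bce` compiler option to show which code lines still need bounds checks.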

Example 1

A simple example:

// example1.go
package main

func f1a(s []struct{}, index int) {
	_ = s[index] // line 5: Found IsInBounds
	_ = s[index]
	_ = s[index:]
	_ = s[:index+1]
}

func f1b(s []byte, index int) {
	s[index-1] = 'a' // line 12: Found IsInBounds
	_ = s[:index]
}

func f1c(a [5]int) {
	_ = a[0]
	_ = a[4]
}

func f1d(s []int) {
	if len(s) > 2 {
		_, _, _ = s[0], s[1], s[2]
	}
}

func f1g(s []int) {
	middle := len(s) / 2
	_ = s[:middle]
	_ = s[middle:]
}

func main() {}

Let's run it with the `-d=ssa/check_bce` compiler option:

$ go run -gcflags="-d=ssa/check_bce" example1.go
./example1.go:5:7: Found IsInBounds
./example1.go:12:3: Found IsInBounds

The output shows that only two code lines in the example above need bounds checks.

Note: Go toolchains older than v1.21 fail to remove the bounds checks in the `f1g` function.

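Also note that, up to now (Go toolchain v1.24.n), the official standard compiler does not check BCE for an operation in a generic function if the operation involves type parameters and the generic function is never instantiated. For example, the command `go build -gcflags=-d=ssa/check_bce bar.go` reports nothing for the following file (`go build` is used here because the file is not a `main` package, so `go run` would refuse to run it).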

// bar.go
package bar

func foo[E any](s []E) {
	_ = s[0] // line 5
	_ = s[1] // line 6
	_ = s[2] // line 7
}

// var _ = foo[bool]

However, if the commented-out variable declaration line is enabled, then the compiler reports:

./bar.go:5:7: Found IsInBounds
./bar.go:6:7: Found IsInBounds
./bar.go:7:7: Found IsInBounds
./bar.go:4:6: Found IsInBounds

Example 2

All the bounds checks in the slice element indexing and subslice operations shown in the following example are eliminated.

// example2.go
package main

func f2a(s []int) {
	for i := range s {
		_ = s[i]
		_ = s[i:len(s)]
		_ = s[:i+1]
	}
}

func f2b(s []int) {
	for i := 0; i < len(s); i++ {
		_ = s[i]
		_ = s[i:len(s)]
		_ = s[:i+1]
	}
}

func f2c(s []int) {
	for i := len(s) - 1; i >= 0; i-- {
		_ = s[i]
		_ = s[i:len(s)]
		_ = s[:i+1]
	}
}

func f2d(s []int) {
	for i := len(s); i > 0; {
		i--
		_ = s[i]
		_ = s[i:len(s)]
		_ = s[:i+1]
	}
}

func f2e(s []int) {
	for i := 0; i < len(s) - 1; i += 2 {
		_ = s[i]
		_ = s[i:len(s)]
		_ = s[:i+1]
	}
}

func main() {}

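Run it and we find that nothing is printed. Yes, the official standard Go compiler is smart enough to see that all bounds checks in the example code above can be removed.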

$ go run -gcflags="-d=ssa/check_bce" example2.go

Note: before v1.24, the standard Go compiler could not remove the bounds checks in the following two loops.

func f2g(s []int) {
	for i := len(s) - 1; i >= 0; i-- {
		_ = s[:i+1]
	}
}

func f2h(s []int) {
	for i := 0; i <= len(s) - 1; i++ {
		_ = s[:i+1]
	}
}

Example 3

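We should try to evaluate the element indexing or subslice operation with the largest index as early as possible, to reduce the number of bounds checks. In the following example, if the expression `s[3]` is evaluated without panicking, then the bounds checks for `s[0]`, `s[1]` and `s[2]` can be eliminated.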

// example3.go
package main

func f3a(s []int32) int32 {
	return s[0] | // Found IsInBounds (line 5)
		s[1] | // Found IsInBounds
		s[2] | // Found IsInBounds
		s[3]   // Found IsInBounds
}

func f3b(s []int32) int32 {
	return s[3] | // Found IsInBounds (line 12)
		s[0] |
		s[1] |
		s[2]
}

func main() {
}

Running it with the `-d=ssa/check_bce` option reports:

./example3.go:5:10: Found IsInBounds
./example3.go:6:4: Found IsInBounds
./example3.go:7:4: Found IsInBounds
./example3.go:8:4: Found IsInBounds
./example3.go:12:10: Found IsInBounds

The output shows that there are four bounds checks in the `f3a` function but only one in the `f3b` function.
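
A typical place to apply this ordering is fixed-size decoding. The sketch below reads a little-endian 32-bit value and touches the largest index first; the standard `encoding/binary` package uses a similar early-index hint. The helper name `le32` is invented for this illustration, and how many checks actually remain should be confirmed with `-d=ssa/check_bce` on your own toolchain.

```go
package main

import "fmt"

// le32 decodes a little-endian uint32 from the first four bytes of b.
// (le32 is a hypothetical helper written for this illustration.)
func le32(b []byte) uint32 {
	_ = b[3] // largest index first: if this succeeds, b[0], b[1], b[2] are in range too
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

func main() {
	fmt.Printf("%#x\n", le32([]byte{0x78, 0x56, 0x34, 0x12})) // prints 0x12345678
}
```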

Example 4

Since Go toolchain v1.19, the bounds check in the `f5a` function has been eliminated successfully.

func f5a(isa []int, isb []int) {
	if len(isa) > 0xFFF {
		for _, n := range isb {
			_ = isa[n & 0xFFF]
		}
	}
}

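However, before Go toolchain v1.19, the check was not eliminated. Compilers older than v1.19 need a hint to remove it, as shown in the `f5b` function. The hints in `f5c` and `f5d` do not work.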

func f5b(isa []int, isb []int) {
	if len(isa) > 0xFFF {
		// A successful hint (for v1.18- compilers)
		isa = isa[:0xFFF+1]
		for _, n := range isb {
			_ = isa[n & 0xFFF] // BCEed!
		}
	}
}

func f5c(isa []int, isb []int) {
	if len(isa) > 0xFFF {
		// A not-workable hint (for v1.18- compilers)
		_ = isa[:0xFFF+1]
		for _, n := range isb {
			_ = isa[n & 0xFFF] // Found IsInBounds
		}
	}
}

func f5d(isa []int, isb []int) {
	if len(isa) > 0xFFF {
		// A not-workable hint (for v1.18- compilers)
		_ = isa[0xFFF]
		for _, n := range isb {
			_ = isa[n & 0xFFF] // Found IsInBounds
		}
	}
}

The next section shows more cases that need compiler hints to avoid some unnecessary bounds checks.

Example 5

Before Go toolchain v1.24, there were some unnecessary bounds checks in the following code:

func fz(s, x, y []byte) {
	n := copy(s, x)
	copy(s[n:], y) // Found IsSliceInBounds (1.24-)
	_ = x[n:]      // Found IsSliceInBounds (1.24-)
}

func fy(a, b []byte) {
	for i := range min(len(a), len(b)) {
		_ = a[i] // Found IsInBounds (1.24-)
		_ = b[i] // Found IsInBounds (1.24-)
	}
}

func fx(a [256]byte) {
	for i := 0; i < 128; i++ {
		_ = a[2*i] // Found IsInBounds (1.24-)
	}
}

func f4a(is []int, bs []byte) {
	if len(is) >= 256 {
		for _, n := range bs {
			_ = is[n] // Found IsInBounds (1.24-)
		}
	}
}

Since v1.24, all of these bounds checks are eliminated.

Before v1.24, we had to add a hint line to remove the bounds check in the `f4a` function:

func f4a(is []int, bs []byte) {
	if len(is) >= 256 {
		is = is[:256] // a successful hint
		for _, n := range bs {
			_ = is[n] // BCEed!
		}
	}
}

Since v1.24, the hint line is no longer needed.

Sometimes, the compiler needs some hints to remove some bounds checks

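The official standard Go compiler is still not smart enough to remove all unnecessary bounds checks. Sometimes we need to give the compiler hints so that it can remove some of them.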

In the following example, adding one redundant `if` code block to the `NumSameBytes_2` function eliminates all the bounds checks in the loop.

type T = string

func NumSameBytes_1(x, y T) int {
	if len(x) > len(y) {
		x, y = y, x
	}
	for i := 0; i < len(x); i++ {
		if x[i] != 
			y[i] { // Found IsInBounds
			return i
		}
	}
	return len(x)
}

func NumSameBytes_2(x, y T) int {
	if len(x) > len(y) {
		x, y = y, x
	}
	
	// a successful hint
	if len(x) > len(y) {
		panic("unreachable")
	}
	
	for i := 0; i < len(x); i++ {
		if x[i] != y[i] { // BCEed!
			return i
		}
	}
	return len(x)
}

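The hint above works whether `T` is a string type or a slice type, whereas each of the following two hints works for only one of the two cases (as of Go toolchain v1.24.n).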

func NumSameBytes_3(x, y T) int {
	if len(x) > len(y) {
		x, y = y, x
	}
	
	y = y[:len(x)] // a hint, only works if T is slice
	for i := 0; i < len(x); i++ {
		if x[i] != y[i] {
			return i
		}
	}
	return len(x)
}

func NumSameBytes_4(x, y T) int {
	if len(x) > len(y) {
		x, y = y, x
	}
	
	_ = y[:len(x)] // a hint, only works if T is string
	for i := 0; i < len(x); i++ {
		if x[i] != y[i] {
			return i
		}
	}
	return len(x)
}

Note that future versions of the official standard Go compiler will become smarter, so that the hints above are no longer needed.
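
Because such a hint is redundant by construction, it should never change what the function returns. The sketch below checks that on a few inputs for the string case; the function names `numSameBytesPlain` and `numSameBytesHinted` are illustrative renamings of the versions without and with the redundant `if` block.

```go
package main

import "fmt"

// numSameBytesPlain is the version without any hint.
func numSameBytesPlain(x, y string) int {
	if len(x) > len(y) {
		x, y = y, x
	}
	for i := 0; i < len(x); i++ {
		if x[i] != y[i] {
			return i
		}
	}
	return len(x)
}

// numSameBytesHinted adds the redundant check that serves as a compiler hint.
func numSameBytesHinted(x, y string) int {
	if len(x) > len(y) {
		x, y = y, x
	}
	if len(x) > len(y) { // never true after the swap above
		panic("unreachable")
	}
	for i := 0; i < len(x); i++ {
		if x[i] != y[i] {
			return i
		}
	}
	return len(x)
}

func main() {
	pairs := [][2]string{{"gopher", "golang"}, {"abc", "abcdef"}, {"", "x"}, {"xyz", "abc"}}
	for _, p := range pairs {
		// Both columns should always agree.
		fmt.Println(numSameBytesPlain(p[0], p[1]), numSameBytesHinted(p[0], p[1]))
	}
}
```

A hint that did change results would be a bug rather than a hint, so a small comparison like this is a cheap safeguard when adding one.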

Write code in BCE-friendly ways

In the following example, the `f7b` and `f7c` functions each perform three fewer bounds checks than the `f7a` function.

func f7a(s []byte, i int) {
	_ = s[i+3] // Found IsInBounds
	_ = s[i+2] // Found IsInBounds
	_ = s[i+1] // Found IsInBounds
	_ = s[i]   // Found IsInBounds
}

func f7b(s []byte, i int) {
	s = s[i:i+4] // Found IsSliceInBounds
	_ = s[3]
	_ = s[2]
	_ = s[1]
	_ = s[0]
}

func f7c(s []byte, i int) {
	s = s[i:i+4:i+4] // Found IsSliceInBounds
	_ = s[3]
	_ = s[2]
	_ = s[1]
	_ = s[0]
}

However, note that other factors may also affect program performance. On my machine (Intel i5-4210U CPU @ 1.70GHz, Linux/amd64), `f7b` is actually the slowest of the three functions.

Benchmark_f7a-4  3861 ns/op
Benchmark_f7b-4  4223 ns/op
Benchmark_f7c-4  3477 ns/op

In practice, the three-index subslice form (`f7c`) is recommended.
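
The benchmark code behind these numbers is not shown. One way to run a comparable measurement is sketched below using `testing.Benchmark` from an ordinary program. The `bench` helper, the 1024-byte buffer and the loop shape are assumptions made for this sketch; calling through a function value also adds call overhead, so absolute numbers will differ from the ones above.

```go
package main

import (
	"fmt"
	"testing"
)

func f7a(s []byte, i int) {
	_ = s[i+3]
	_ = s[i+2]
	_ = s[i+1]
	_ = s[i]
}

func f7b(s []byte, i int) {
	s = s[i : i+4]
	_ = s[3]
	_ = s[2]
	_ = s[1]
	_ = s[0]
}

func f7c(s []byte, i int) {
	s = s[i : i+4 : i+4]
	_ = s[3]
	_ = s[2]
	_ = s[1]
	_ = s[0]
}

// bench times f over every valid 4-byte window of a fixed buffer.
// (bench and the buffer size are assumptions made for this sketch.)
func bench(f func([]byte, int)) testing.BenchmarkResult {
	buf := make([]byte, 1024)
	return testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			for i := 0; i+4 <= len(buf); i++ {
				f(buf, i)
			}
		}
	})
}

func main() {
	fmt.Println("f7a:", bench(f7a))
	fmt.Println("f7b:", bench(f7b))
	fmt.Println("f7c:", bench(f7c))
}
```

Results depend on the CPU and the toolchain version, so it is worth rerunning a measurement like this before relying on any ranking.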

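In the following example, benchmark results show that the `f8z` function is the fastest (as expected), but the `f8y` function performs about the same as the `f8x` function (unexpected).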

func f8x(s []byte) {
	var n = len(s)
	s = s[:n]
	for i := 0; i <= n - 4; i += 4 {
		_ = s[i+3] // Found IsInBounds
		_ = s[i+2] // Found IsInBounds
		_ = s[i+1] // Found IsInBounds
		_ = s[i]
	}
}

func f8y(s []byte) {
	for i := 0; i <= len(s) - 4; i += 4 {
		s2 := s[i:]
		_ = s2[3] // Found IsInBounds
		_ = s2[2]
		_ = s2[1]
		_ = s2[0]
	}
}

func f8z(s []byte) {
	for i := 0; len(s) >= 4; i += 4 {
		_ = s[3]
		_ = s[2]
		_ = s[1]
		_ = s[0]
		s = s[4:]
	}
}

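In fact, benchmark results also show that the following `f8y3` function performs as well as `f8z`, while `f8y2` performs about the same as `f8y`. So in practice the three-index subslice form is recommended for this kind of situation.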

func f8y2(s []byte) {
	for i := 0; i < len(s) - 3; i += 4 {
		s2 := s[i:i+4] // Found IsSliceInBounds
		_ = s2[3]
		_ = s2[2]
		_ = s2[1]
		_ = s2[0]
	}
}

func f8y3(s []byte) {
	for i := 0; i < len(s) - 3; i += 4 {
		s2 := s[i:i+4:i+4] // Found IsSliceInBounds
		_ = s2[3]
		_ = s2[2]
		_ = s2[1]
		_ = s2[0]
	}
}
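
The recommendation above rests on benchmark results only. For readers unfamiliar with the three-index form, the sketch below shows what it means at the language level: the third index caps the capacity of the result. The helper names `window2` and `window3` are invented for this illustration, and nothing here explains why the compiler generates faster code for it.

```go
package main

import "fmt"

// window2 returns a 4-element window using the two-index form.
func window2(s []byte, i int) []byte { return s[i : i+4] }

// window3 returns a 4-element window using the three-index form,
// which also limits the capacity of the result to 4.
func window3(s []byte, i int) []byte { return s[i : i+4 : i+4] }

func main() {
	s := make([]byte, 8)
	copy(s, "abcdefgh")

	two, three := window2(s, 2), window3(s, 2)
	fmt.Println(len(two), cap(two))     // prints 4 6
	fmt.Println(len(three), cap(three)) // prints 4 4

	_ = append(three, 'X') // no spare capacity: a new array is allocated
	fmt.Println(string(s)) // prints abcdefgh
	_ = append(two, 'Y')   // spare capacity: s[6] is overwritten
	fmt.Println(string(s)) // prints abcdefYh
}
```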

In the following example, there are no bounds checks in the `f9b` and `f9c` functions, but there is one in the `f9a` function.

func f9a(n int) []int {
	buf := make([]int, n+1)
	k := 0
	for i := 0; i <= n; i++ {
		buf[i] = k // Found IsInBounds
		k++
	}
	return buf
}


func f9b(n int) []int {
	buf := make([]int, n+1)
	k := 0
	for i := 0; i < len(buf); i++ {
		buf[i] = k
		k++
	}
	return buf
}

func f9c(n int) []int {
	buf := make([]int, n+1)
	k := 0
	for i := 0; i < n+1; i++ {
		buf[i] = k
		k++
	}
	return buf
}

In the following code, the `f6b` function performs better than `f6a`, but both perform much worse than `f6c`.

const N = 3

func f6a(s []byte) {
	for i := 0; i < len(s)-(N-1); i += N {
		_ = s[i+N-1] // Found IsInBounds
	}
}

func f6b(s []byte) {
	for i := N-1; i < len(s); i += N {
		_ = s[i] // Found IsInBounds
	}
}

func f6c(s []byte) {
	for i := uint(N-1); i < uint(len(s)); i += N {
		_ = s[i]
	}
}
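
The text does not say why the `uint` version loses its bounds check. One relevant language fact is that a single unsigned comparison covers both ends of the valid index range, because a negative `int` converts to a very large `uint`. Whether this is the exact reason the compiler succeeds on `f6c` is an assumption; the sketch below only demonstrates the arithmetic, with `inRange` being a hypothetical helper.

```go
package main

import "fmt"

// inRange reports whether i is a valid index for a slice of length n.
// The single unsigned comparison is equivalent to 0 <= i && i < n,
// because a negative i becomes a huge value when converted to uint.
// (inRange is a hypothetical helper for illustration.)
func inRange(i, n int) bool {
	return uint(i) < uint(n)
}

func main() {
	fmt.Println(inRange(0, 3), inRange(2, 3), inRange(3, 3), inRange(-1, 3))
	// prints: true true false false
}
```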

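Global (package-level) slices are often BCE-unfriendly, so we should try to assign them to local slices to eliminate some unnecessary bounds checks. For example, in the following code, the `fa0` function performs one more bounds check than the `fa1` and `fa2` functions, so the calls `fa1()` and `fa2(s)` both perform better than `fa0()`.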

var s = make([]int, 5)

func fa0() {
	for i := range s {
		s[i] = i // Found IsInBounds
	}
}

func fa1() {
	s := s
	for i := range s {
		s[i] = i
	}
}

func fa2(x []int) {
	for i := range x {
		x[i] = i
	}
}

Arrays are often more BCE-friendly than slices. In the following code, the array versions of the functions (`fb2` and `fc2`) need no bounds checks.

var s = make([]int, 256)
var a = [256]int{}

func fb1() int {
	return s[100] // Found IsInBounds
}

func fb2() int {
	return a[100]
}

func fc1(n byte) int {
	return s[n] // Found IsInBounds
}

func fc2(n byte) int {
	return a[n]
}
}
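
When the data has to live in a slice, one possible way to borrow the array behaviour is the slice-to-array-pointer conversion available since Go 1.17. This technique is not mentioned in the text above and is offered only as a sketch: the conversion itself panics if the slice is shorter than the array, and whether the later index check disappears should be confirmed with `-d=ssa/check_bce`. The helper name `lookup` is invented.

```go
package main

import "fmt"

// lookup indexes the first 256 elements of table through an array pointer.
// The conversion (Go 1.17+) panics if len(table) < 256; after it, the index
// has type byte and the array length is the constant 256.
// (lookup is a hypothetical helper for illustration.)
func lookup(table []int, n byte) int {
	a := (*[256]int)(table)
	return a[n]
}

func main() {
	table := make([]int, 256)
	for i := range table {
		table[i] = i * i
	}
	fmt.Println(lookup(table, 12)) // prints 144
}
```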

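Before Go toolchain v1.24, the `f0b` function in the following code performed much better than `f0a`, because `f0a` contained some unnecessary bounds checks. Since Go toolchain v1.24, those unnecessary bounds checks in `f0a` are removed, so its performance has improved a lot (although it is still slightly slower than `f0b`).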

func f0a(x [16]byte) (r [4]byte){
	for i := 0; i < 4; i++ {
		r[i] =
			x[i*4+3] ^
			x[i*4+2] ^
			x[i*4+1] ^
			x[i*4]     
	}
	return
}

func f0b(x [16]byte) (r [4]byte){
	r[0] = x[3] ^ x[2] ^ x[1] ^ x[0]
	r[1] = x[7] ^ x[6] ^ x[5] ^ x[4]
	r[2] = x[11] ^ x[10] ^ x[9] ^ x[8]
	r[3] = x[15] ^ x[14] ^ x[13] ^ x[12]
	return
}

Note that future versions of the official standard Go compiler will become smarter, so more BCE-unfriendly code may become BCE-friendly later.

The current official standard Go compiler fails to eliminate some unnecessary bounds checks

As of Go toolchain v1.24.n, the official standard Go compiler does not eliminate the following unnecessary bounds checks.

func fd(data []int, check func(int) bool) []int {
	var k = 0
	for _, v := range data {
		if check(v) {
			data[k] = v // Found IsInBounds
			k++
		}
	}
	return data[:k] // Found IsSliceInBounds
}


// For the only bound check in the following function,
// * if N == 1, it will be always removed.
// * if N is a power of 2, Go toolchain 1.19+ can remove it.
// * for other cases, Go toolchain fails to remove it.
func fe(s []byte) {
	const N = 3
	if len(s) >= N {
		r := len(s) % N
		_ = s[r] // Found IsInBounds
	}
}

func ff(s []byte) {
	for i := 0; i < len(s); i++ {
		_ = s[i/2] // Found IsInBounds
		_ = s[i/3] // Found IsInBounds
	}
}

func fg(src, dst []byte) {
	dst = dst[:len(src)]
	for len(src) >= 4 {
		dst[1] = // Found IsInBounds
			src[0]
		dst[0] = src[1]
		src = src[4:]
		dst = dst[4:] // Found IsSliceInBounds
	}
}

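Future versions of the official standard Go compiler will become smarter, and the unnecessary bounds checks above will then be eliminated.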