Writing Bazel rules: simple binary rule

The learning curve for extending Bazel is steeper than that of simpler build systems like Make or SCons. Bazel rules are highly structured, and learning this structure takes time. However, this structure helps you avoid introducing unnecessary complication and unexpected dependencies in large, complex builds.

How Bazel works

Starlark

Starlark is Bazel's configuration and extension language. It's essentially Python without some of the advanced features: Starlark has no classes, exceptions, or generators, and the module system is different.

Repositories, packages, rules, labels

To build things in Bazel, you need to write build files (named BUILD or BUILD.bazel). They look like this:

load("@io_bazel_rules_go//go:def.bzl", "go_binary", "go_library", "go_test")

go_library(
    name = "fetch_repo_lib",
    srcs = [
        "fetch_repo.go",
        "module.go",
        "vcs.go",
    ],
    importpath = "github.com/bazelbuild/bazel-gazelle/cmd/fetch_repo",
    visibility = ["//visibility:private"],
    deps = ["@org_golang_x_tools_go_vcs//:vcs"],
)

go_binary(
    name = "fetch_repo",
    embed = [":fetch_repo_lib"],
    visibility = ["//visibility:public"],
)

go_test(
    name = "fetch_repo_test",
    srcs = ["fetch_repo_test.go"],
    embed = [":fetch_repo_lib"],
    deps = ["@org_golang_x_tools_go_vcs//:vcs"],
)

Build files contain a number of targets, written as Starlark function calls. The syntax is declarative: you say what you want to build, not how to build it. In this example, we're defining a Go library ("fetch_repo_lib") with a handful of source files. A binary ("fetch_repo") is built from that library. We also have a test ("fetch_repo_test") built from that library and an additional source file ("fetch_repo_test.go").

Each build file implicitly defines a Bazel package. A package consists of the targets declared in the build file and all of the files in the package's directory and subdirectories, excluding targets and files defined in other packages' subdirectories. Visibility restrictions are usually applied at the package level, and globs (wildcard patterns used to match source files) end at package boundaries. Frequently (not always), you'll have one package per directory.

Targets and files are named using labels, which are strings that look like "@io_bazel_rules_go//go:def.bzl". Labels have three parts: a repository name (io_bazel_rules_go), a package name (go), and a file or target name (def.bzl). The repository name and the package name may be omitted when a label refers to something in the same repository or package.
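
To make the three parts concrete, here is a minimal Python sketch that splits a fully written label into repository, package, and name. The parse_label helper and the Label type are hypothetical names invented for this illustration, not Bazel APIs, and the sketch deliberately ignores Bazel's shorthand label forms.

```python
from typing import NamedTuple


class Label(NamedTuple):
    repo: str     # repository name; "" means the current repository
    package: str  # package path; "" means the repository's root package
    name: str     # file or target name


def parse_label(label: str, current_package: str = "") -> Label:
    """Splits a label such as '@repo//pkg:name' into its parts.

    Hypothetical helper for illustration only. It does not handle
    shorthand forms such as '//foo' (which Bazel reads as '//foo:foo').
    """
    repo = ""
    if label.startswith("@"):
        repo, _, rest = label[1:].partition("//")
        label = "//" + rest
    if label.startswith("//"):
        package, _, name = label[2:].partition(":")
    else:
        # A relative label such as ':fetch_repo_lib' stays in the current package.
        package, name = current_package, label.lstrip(":")
    return Label(repo, package, name)


print(parse_label("@io_bazel_rules_go//go:def.bzl"))
print(parse_label(":fetch_repo_lib", current_package="cmd/fetch_repo"))
```

The second call shows the omission rule: a label with no repository or package part is resolved against wherever it is written (the cmd/fetch_repo package path here is an assumed value).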

Repositories are defined in a file called WORKSPACE, which lives in the root directory of a project. I'll get into repository rules in more detail in a future article. For now, just think of them as git repositories with names.

Loading, analysis, and execution

Bazel builds targets in three phases: loading, analysis, and execution (actually there are more, but these are the phases you need to understand when writing rules).

In the loading phase, Bazel reads and evaluates build files. It builds a graph of targets and dependencies. For example, if you ask to build fetch_repo_test above, Bazel will build a graph with a fetch_repo_test node that depends on fetch_repo_test.go, :fetch_repo_lib, and @org_golang_x_tools_go_vcs//:vcs via srcs, embed, and deps edges, respectively.
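
As a rough mental model, the target graph can be pictured as a dictionary of nodes with labeled edges. The Python sketch below is only an illustration of that idea; the TARGET_GRAPH structure and the reachable function are made up for this example and say nothing about how Bazel stores the graph internally.

```python
# Edges of the target graph from the build file above, keyed by attribute name.
# Only the edges visible in that file are modeled here.
TARGET_GRAPH = {
    ":fetch_repo_test": {
        "srcs": ["fetch_repo_test.go"],
        "embed": [":fetch_repo_lib"],
        "deps": ["@org_golang_x_tools_go_vcs//:vcs"],
    },
    ":fetch_repo_lib": {
        "srcs": ["fetch_repo.go", "module.go", "vcs.go"],
        "deps": ["@org_golang_x_tools_go_vcs//:vcs"],
    },
}


def reachable(graph, root):
    """Returns the set of every node reachable from root, including root."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Follow every edge, whatever attribute it came from.
        for targets in graph.get(node, {}).values():
            stack.extend(targets)
    return seen


print(sorted(reachable(TARGET_GRAPH, ":fetch_repo_test")))
```

Asking for :fetch_repo_test pulls in the library's source files too, even though the test never names them directly.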

In the analysis phase, Bazel evaluates rules in the target graph. Rules declare files and actions that will produce those files. The output of analysis is the file-action graph. Bazel has built-in rules for Java, C++, Python, and a few other things. Other rules are implemented in Starlark. It's important to note that rules cannot directly perform any I/O; they merely tell Bazel how it should execute programs to build targets. This means rules can't make any decisions based on the contents of source files (so no automatic dependency discovery).

In the execution phase, Bazel runs actions in the file-action graph needed to produce files that are out of date. Bazel has several strategies for running actions. Locally, it runs actions within a sandbox that only exposes declared inputs. This makes builds more hermetic, since it's harder to accidentally depend on system files that vary from machine to machine. Bazel may also run actions on remote build servers where this isolation happens automatically.
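
The split between declaring actions and running them can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the Action class, the file names, and the idea of deciding staleness by hashing input contents stand in for Bazel's real (and far more involved) machinery.

```python
import hashlib


class Action:
    """A declared step: a mnemonic plus the files it reads and writes."""

    def __init__(self, mnemonic, inputs, outputs):
        self.mnemonic = mnemonic
        self.inputs = inputs
        self.outputs = outputs


def fingerprint(action, contents):
    """Hashes the names and contents of an action's inputs."""
    h = hashlib.sha256()
    for path in action.inputs:
        h.update(path.encode())
        h.update(contents[path].encode())
    return h.hexdigest()


def execute(actions, contents, cache):
    """Runs stale actions in order and returns the mnemonics that ran."""
    ran = []
    for action in actions:  # assumed to be listed in dependency order
        key = fingerprint(action, contents)
        if cache.get(action.mnemonic) == key:
            continue  # inputs unchanged, so the outputs are up to date
        for out in action.outputs:
            # Pretend to run the tool: the output depends only on the inputs.
            contents[out] = key
        cache[action.mnemonic] = key
        ran.append(action.mnemonic)
    return ran


# "Analysis": declare what would be run, without running anything.
actions = [
    Action("Compile", ["a.go", "b.go"], ["main.a"]),
    Action("Link", ["main.a"], ["program"]),
]
contents = {"a.go": "package main", "b.go": "package main"}
cache = {}

# "Execution": only out-of-date actions run.
print(execute(actions, contents, cache))  # first build
print(execute(actions, contents, cache))  # nothing changed
contents["b.go"] = "package main // edited"
print(execute(actions, contents, cache))  # one source edited
```

The second call runs nothing, and editing one source reruns both steps because the compile output changes. Note that the declarations never read file contents; only the execution step does.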

Setting up the repository

Okay, we've gotten all the theory out of the way for today. Let's dive into the code. We're going to write "rules_go_simple", a simplified version of github.com/bazelbuild/rules_go. Don't worry if you don't know Go: there isn't any Go code in here today, and the implementation for other languages will be mostly the same.

I've created an example repository at github.com/jayconrod/rules_go_simple. For this article, we'll be looking at the v1 branch. In later articles, we'll add features to branches with higher version numbers.

The first thing we need is a WORKSPACE file. Every Bazel project should have one of these in the repository root directory. WORKSPACE configures external dependencies that we need to build and test our project. In our case, we have one dependency, @bazel_skylib, which we use to quote strings in shell commands.

Bazel only evaluates the WORKSPACE file for the current project; WORKSPACE files of dependencies are ignored. We declare all our dependencies inside a function in deps.bzl so that projects that depend on rules_go_simple can share our dependencies.

Here's our WORKSPACE file.

workspace(name = "rules_go_simple")

load("@rules_go_simple//:deps.bzl", "go_rules_dependencies")

go_rules_dependencies()

Here's deps.bzl. Note that the _maybe function is private (since its name starts with _) and cannot be loaded from other files.

load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

def go_rules_dependencies():
    """Declares external repositories that rules_go_simple depends on. This
    function should be loaded and called from WORKSPACE files."""

    # bazel_skylib is a set of libraries that are useful for writing
    # Bazel rules. We use it to handle quoting arguments in shell commands.
    _maybe(
        git_repository,
        name = "bazel_skylib",
        remote = "https://github.com/bazelbuild/bazel-skylib",
        commit = "3fea8cb680f4a53a129f7ebace1a5a4d1e035914",
    )

def _maybe(rule, name, **kwargs):
    """Declares an external repository if it hasn't been declared already."""
    if name not in native.existing_rules():
        rule(name = name, **kwargs)

Note that declaring a repository doesn't automatically download it. Bazel will only download a repository if it needs something inside.
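
The reason for wrapping declarations in _maybe is easier to see in a small simulation. The Python sketch below models the pattern only: the declared dictionary and this git_repository function are stand-ins written for the example (the real ones are Bazel's native.existing_rules() and the git_repository repository rule), and the commit values are placeholders.

```python
# Stand-in for the repositories declared so far while evaluating WORKSPACE.
declared = {}


def git_repository(name, **kwargs):
    """Stand-in for a repository rule: it only records the declaration."""
    declared[name] = kwargs


def maybe(rule, name, **kwargs):
    """Declares a repository only if no repository with that name exists yet."""
    if name not in declared:
        rule(name=name, **kwargs)


# The main project declares the version it wants first...
maybe(git_repository, name="bazel_skylib", commit="aaaa")
# ...so a later declaration from a dependency's deps function is skipped.
maybe(git_repository, name="bazel_skylib", commit="bbbb")

print(declared["bazel_skylib"]["commit"])  # prints "aaaa"
```

Because the first declaration wins, a project that depends on rules_go_simple can declare its own bazel_skylib before calling go_rules_dependencies, and that version is the one kept.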

Declaring the go_binary rule

To define our binary rule, we'll create a new file, internal/rules.bzl. We'll start with a declaration like this:

go_binary = rule(
    implementation = _go_binary_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile for the main package of this binary",
        ),
        "_stdlib": attr.label(
            default = "//internal:stdlib",
        ),
    },
    doc = "Builds an executable program from Go source code",
    executable = True,
)

You may want to refer to the Bazel documentation for rule and attr here. There's a lot here, so let's break it down.

We are defining a new rule named go_binary by assigning the result of the rule function to a variable with that name.

go_binary is implemented in the _go_binary_impl function (passed as the first argument here), which Bazel will call during the analysis phase for each go_binary target that's part of a build. The implementation function will declare output files and actions.

go_binary has an attribute named srcs, which is a label_list. srcs may be a list of files with names ending in ".go".

Edit: There's also an attribute named _stdlib. This is a hidden attribute (its name starts with _) that points to a target that builds the Go standard library //internal:stdlib. This was a late addition to this series due to a change in Go 1.20. Don't worry too much about it unless you want to understand how Go is built.

go_binary must produce an executable file.

Note that all rules support a set of common attributes like name, visibility, and tags. These don't need to be declared explicitly.

Implementing go_binary

Let's look at our implementation function next.

def _go_binary_impl(ctx):
    # Declare an output file for the main package and compile it from srcs. All
    # our output files will start with a prefix to avoid conflicting with
    # other rules.
    main_archive = ctx.actions.declare_file("{name}_/main.a".format(name = ctx.label.name))
    go_compile(
        ctx,
        srcs = ctx.files.srcs,
        stdlib = ctx.files._stdlib,
        out = main_archive,
    )

    # Declare an output file for the executable and link it. Note that output
    # files may not have the same name as the rule, so we still need to use the
    # prefix here.
    executable_path = "{name}_/{name}".format(name = ctx.label.name)
    executable = ctx.actions.declare_file(executable_path)
    go_link(
        ctx,
        main = main_archive,
        stdlib = ctx.files._stdlib,
        out = executable,
    )

    # Return the DefaultInfo provider. This tells Bazel what files should be
    # built when someone asks to build a go_binary rule. It also says which
    # file is executable (in this case, there's only one).
    return [DefaultInfo(
        files = depset([executable]),
        executable = executable,
    )]

Implementation functions take a single argument, a ctx object. This provides an API used to access rule attributes and to declare files and actions. It also exposes lots of useful metadata.

The first thing we do here is compile the main package. (For readers unfamiliar with Go, packages are the compilation unit; multiple .go source files may be compiled into a single .a package file). We declare a main.a output file using ctx.actions.declare_file, which returns a File object. We then call go_compile to declare the compile action (which we'll get to in just a minute).

Next, we'll link our main.a into a standalone executable. We declare our executable file, then call go_link (which we'll also define in just a minute).
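
The naming scheme for the two declared files uses ordinary string formatting, which behaves the same way in Python as in Starlark. As a quick sketch, with output_paths being a hypothetical helper that is not part of the rule:

```python
def output_paths(name):
    """Returns the (archive, executable) paths declared for a target called name."""
    archive = "{name}_/main.a".format(name=name)
    executable = "{name}_/{name}".format(name=name)
    return archive, executable


print(output_paths("hello"))  # ('hello_/main.a', 'hello_/hello')
```

Both files land under a directory named after the target plus an underscore, so two go_binary targets in the same package would not write to the same main.a.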

Finally, we need to tell Bazel what we've done by returning a list of providers. A provider is a struct returned by a rule that contains information needed by other rules and by Bazel itself. DefaultInfo is a special provider that all rules should return. Here, we store two useful pieces of information. files is a depset (more on depsets another time) that lists the files that should be built when another rule depends on our rule or when someone runs bazel build on our rule. No one cares about the main.a file, so we just return the binary file here. And executable points to our executable file. If someone runs bazel run on our rule, this is the file that gets run.
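
One way to picture what the returned provider is used for is the toy model below. ToyDefaultInfo, files_to_build, and file_to_run are invented for this sketch; the real DefaultInfo has more fields, and files is a depset rather than a list.

```python
from typing import List, NamedTuple, Optional


class ToyDefaultInfo(NamedTuple):
    """Toy stand-in for DefaultInfo, modeling only two of its fields."""
    files: List[str]
    executable: Optional[str] = None


def files_to_build(providers):
    """Roughly what 'bazel build' asks for: the default output files."""
    return providers[0].files


def file_to_run(providers):
    """Roughly what 'bazel run' executes; fails for non-executable targets."""
    exe = providers[0].executable
    if exe is None:
        raise ValueError("target is not executable")
    return exe


# main.a is deliberately absent: it is an intermediate file nobody asks for.
providers = [ToyDefaultInfo(files=["hello_/hello"], executable="hello_/hello")]
print(files_to_build(providers))
print(file_to_run(providers))
```

Leaving main.a out of files does not stop it from being built; it is still produced whenever the link action needs it as an input.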

I chose to define the go_compile and go_link actions in separate functions. They could easily have been inlined in the rule above. However, actions are frequently shared by multiple rules. In future articles, when we define go_library and go_test rules, we'll need to compile more packages, and we'll need to link a new kind of binary. We can't call go_binary from those rules, so it makes sense to pull these actions out into functions in actions.bzl.

Here's go_compile:

load("@bazel_skylib//lib:shell.bzl", "shell")

def go_compile(ctx, *, srcs, stdlib, out):
    """Compiles a single Go package from sources.

    Args:
        ctx: analysis context.
        srcs: list of source Files to be compiled.
        stdlib: list containing an importcfg file and a package directory
            for the standard library.
        out: output .a file. Should have the importpath as a suffix,
            for example, library "example.com/foo" should have the path
            "somedir/example.com/foo.a".
    """
    stdlib_importcfg = stdlib[0]
    cmd = "go tool compile -o {out} -importcfg {importcfg} -- {srcs}".format(
        out = shell.quote(out.path),
        importcfg = shell.quote(stdlib_importcfg.path),
        srcs = " ".join([shell.quote(src.path) for src in srcs]),
    )
    ctx.actions.run_shell(
        outputs = [out],
        inputs = srcs + stdlib,
        command = cmd,
        env = {"GOPATH": "/dev/null"},  # suppress warning
        mnemonic = "GoCompile",
        use_default_shell_env = True,
    )

This function builds a Bash command to invoke the compiler, then calls run_shell to declare an action that runs that command. run_shell takes our command, a list of input files that will be made available in the sandbox, and a list of output files that Bazel will expect.
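
The command-building step can be tried outside Bazel. In this Python sketch, shlex.quote from the standard library stands in for shell.quote from bazel_skylib; the two are not identical (shlex.quote leaves already-safe strings unquoted), and the file names are made up for the example.

```python
import shlex


def compile_command(out, importcfg, srcs):
    """Builds the shell command string used to compile one package."""
    return "go tool compile -o {out} -importcfg {importcfg} -- {srcs}".format(
        out=shlex.quote(out),
        importcfg=shlex.quote(importcfg),
        srcs=" ".join(shlex.quote(src) for src in srcs),
    )


cmd = compile_command("hello_/main.a", "stdlib/importcfg", ["hello.go", "my message.go"])
print(cmd)
# go tool compile -o hello_/main.a -importcfg stdlib/importcfg -- hello.go 'my message.go'
```

Without the quoting, the file name containing a space would reach the compiler as two separate arguments.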

Our go_link function is similar.

def go_link(ctx, *, out, stdlib, main):
    """Links a Go executable.

    Args:
        ctx: analysis context.
        out: output executable file.
        stdlib: list containing an importcfg file and a package directory
            for the standard library.
        main: archive file for the main package.
    """
    stdlib_importcfg = stdlib[0]
    cmd = "go tool link -o {out} -importcfg {importcfg} -- {main}".format(
        out = shell.quote(out.path),
        importcfg = shell.quote(stdlib_importcfg.path),
        main = shell.quote(main.path),
    )
    ctx.actions.run_shell(
        outputs = [out],
        inputs = [main] + stdlib,
        command = cmd,
        env = {"GOPATH": "/dev/null"},  # suppress warning
        mnemonic = "GoLink",
        use_default_shell_env = True,
    )

I wanted to keep this article from getting too absurdly long, so I chose to keep things simple instead of doing it the Right Way. In general, I'd caution against using any Bash commands in Bazel actions for several reasons. It's hard to write portable commands (macOS has different versions of most shell commands than Linux, with different flags; and on Windows you'll probably need to rewrite everything in PowerShell). It's hard to get quoting and escaping right (definitely use shell.quote from @bazel_skylib). It's hard to avoid including some implicit dependency. Bazel tries to isolate you from this a bit with the sandbox; I had to use use_default_shell_env = True to be able to find go on PATH. We should generally avoid using tools installed on the user's system since they may differ across systems, but again, we're keeping it simple this time.

Instead of writing Bash commands, it's better to compile tools with Bazel and use those. That lets you write more sophisticated (and reproducible) build logic in your language of choice.
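
Part of the benefit is that a tool invoked directly receives an argument list, so no shell ever re-parses the arguments. The Python sketch below illustrates just that point, using the Python interpreter as a stand-in for a tool built by Bazel; run_tool is a hypothetical helper. In a rule, the analogous call would be ctx.actions.run, which takes an executable and a list of arguments.

```python
import subprocess
import sys


def run_tool(argv):
    """Runs a program with an explicit argument list; no shell is involved."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# The awkward argument arrives unchanged because nothing interprets
# the spaces, the ';' or the '$'.
tricky = "my file.go; echo $HOME"
echoed = run_tool([sys.executable, "-c", "import sys; print(sys.argv[1])", tricky])
print(echoed.strip())
```

With a command string, the same argument would need careful quoting to avoid being split or expanded.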

Exposing a public interface

It's useful to have declarations for all public symbols in one file. This way, you can refactor your rules without requiring users to update load statements in their projects. load statements import a public symbol from another .bzl file into the current file. They also expose that symbol for other files loading the current file. So all we have to do is create one file that loads our public symbols. That's def.bzl.

load("//internal:rules.bzl", _go_binary = "go_binary")

go_binary = _go_binary

Edit: In very old versions of Bazel, simply loading a symbol in a .bzl file would make it available for loading in other files. In newer versions, a symbol must be defined in order for it to be loadable. It's still a good practice to put your public definitions in one file, but it takes a little more work. Above, we load the internal go_binary as _go_binary, then redefine that as go_binary.

Testing the go_binary rule

To test go_binary, we can define a sh_test rule that runs a go_binary rule and checks its output. Here's our build file, tests/BUILD.bazel:

load("//:def.bzl", "go_binary")

sh_test(
    name = "hello_test",
    srcs = ["hello_test.sh"],
    args = ["$(location :hello)"],
    data = [":hello"],
)

go_binary(
    name = "hello",
    srcs = [
        "hello.go",
        "message.go",
    ],
)

Our go_binary rule has two sources, hello.go and message.go. It just prints "Hello, world!". Our test has a data dependency on the hello binary. This means that when the test is run, Bazel will build hello and make it available. To avoid hardcoding the location of the binary in the test, we pass it in as an argument. See "$(location)" substitution for how this works.

Here's our test script:

#!/bin/bash

set -euo pipefail

program="$1"
got=$("$program")
want="Hello, world!"

if [ "$got" != "$want" ]; then
  cat >&2 <<EOF
got:
$got

want:
$want
EOF
  exit 1
fi

You can test this out with bazel test //tests:hello_test.
