Writing Bazel rules: library rule, depsets, providers

In the last article, we built a go_binary rule that compiled and linked a Go executable from a list of sources. This time, we'll define a go_library rule that compiles a Go package that other libraries and binaries can depend on.

This article focuses on rules that communicate with each other to build a dependency graph that can be used by a linker (or a linker-like action). All of the code is from github.com/jayconrod/rules_go_simple on the v2 branch.

Once again, you don't need to know Go to understand this. I'm just using Go as an example because that's what I work on.

Background

Before we jump in, we need to cover three important concepts: structs, providers, and depsets. They are data structures used to pass information between rules, and we'll need them to gather information about dependencies.

Structs

Structs are a basic data structure in Starlark (technically, structs are not part of the Starlark language; they are provided by Bazel). A struct value is essentially a tuple with a name for each value. You can create a struct value by calling the struct function:

my_value = struct(
    foo = 12,
    bar = 34,
)

You can access fields in the struct the same way you would access fields in an object in Python.

print(my_value.foo + my_value.bar)

You can use the dir function to get a list of field names of a struct. getattr and hasattr work the way you'd expect, but you can't modify or delete attributes after they're set because struct values are immutable. There are also to_json and to_proto methods on every struct, which you may find useful.
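
For readers who think in Python, collections.namedtuple is a rough analog of a struct. The sketch below is plain Python, not Bazel code, and MyValue is a name made up for illustration; it only mirrors the behavior described above.

```python
from collections import namedtuple

# Python analog of struct(foo = 12, bar = 34); this is not Bazel's implementation.
MyValue = namedtuple("MyValue", ["foo", "bar"])
my_value = MyValue(foo=12, bar=34)

# Field access, hasattr, and getattr behave like their Starlark counterparts.
total = my_value.foo + my_value.bar
has_foo = hasattr(my_value, "foo")
bar_value = getattr(my_value, "bar")

# Fields cannot be reassigned after construction.
try:
    my_value.foo = 99
    mutated = True
except AttributeError:
    mutated = False

print(total, has_foo, bar_value, mutated)
```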

Providers

A provider is a named struct that contains information about a rule. Rule implementation functions return provider structs when they're evaluated. Providers can be read by anything that depends on the rule. In the last article, our go_binary rule returned a DefaultInfo provider (one of the built-in providers). This time, we'll define a GoLibraryInfo provider that carries metadata about our libraries.

You can define a new provider by calling the provider function.

MyProvider = provider(
    doc = "My custom provider",
    fields = {
        "foo": "A foo value",
        "bar": "A bar value",
    },
)

Depsets

Bazel provides a special purpose data structure called a depset. Like any set, a depset is a set of unique values. Depsets distinguish themselves from other kinds of sets by being fast to merge and having a well-defined iteration order.

Depsets are typically used to accumulate information like sources or header files over potentially large dependency graphs. In this article, we'll use depsets to accumulate information about dependencies. The linker will be able to use this information without needing to explicitly write all transitive dependencies in the go_binary rule.

A depset comprises a list of direct elements, a list of transitive children, and an iteration order.

Constructing a depset is fast because it just involves creating an object with direct and transitive lists. This takes O(D+T) time where D is the number of elements in the direct list and T is the number of transitive children. Bazel deduplicates elements of both lists when constructing sets. Iterating a depset or converting it to a list takes O(n) time where n is the number of elements in the set and all of its children, including duplicates.
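
To make the direct/transitive structure concrete, here is a toy model in plain Python. It is not Bazel's implementation: the Depset class below is hypothetical, and its traversal order (children before direct elements) is an assumption chosen for illustration. Real depsets support other orders as well.

```python
class Depset:
    """Toy model of a depset: direct elements plus references to child sets."""

    def __init__(self, direct=(), transitive=()):
        # Construction only stores references to the two lists,
        # so it costs O(D + T) no matter how large the children are.
        self.direct = tuple(direct)
        self.transitive = tuple(transitive)

    def to_list(self):
        """Flattens the set, visiting children first and dropping duplicates."""
        seen = set()
        result = []

        def visit(node):
            for child in node.transitive:
                visit(child)
            for item in node.direct:
                if item not in seen:
                    seen.add(item)
                    result.append(item)

        visit(self)
        return result


# A diamond: both "left" and "right" share the same child set.
leaf = Depset(direct=["c"])
left = Depset(direct=["a"], transitive=[leaf])
right = Depset(direct=["b"], transitive=[leaf])
top = Depset(direct=["d"], transitive=[left, right])

print(top.to_list())
```

Merging happens in the constructor without copying any elements; the cost of walking the whole graph is only paid when to_list is called.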

Defining go_library

The GoLibraryInfo provider

OK, the theory is out of the way. Let's get to the code.

First, we'll define a new provider. GoLibraryInfo will carry information about each library and its dependencies. We'll define it in a new file, providers.bzl.

GoLibraryInfo = provider(
    doc = "Contains information about a Go library",
    fields = {
        "info": """A struct containing information about this library.
        Has the following fields:
            importpath: Name by which the library may be imported.
            archive: The .a file compiled from the library's sources.
        """,
        "deps": "A depset of info structs for this library's dependencies",
    },
)

Technically, we don't need to list the fields or provide any documentation here, but we may be able to generate HTML documentation from this some day.

The go_library rule

Now we can define the go_library rule. It uses the same go_compile function as go_binary. Here's the new rule declaration in rules.bzl.

go_library = rule(
    _go_library_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile",
        ),
        "deps": attr.label_list(
            providers = [GoLibraryInfo],
            doc = "Direct dependencies of the library",
        ),
        "importpath": attr.string(
            mandatory = True,
            doc = "Name by which the library may be imported",
        ),
    },
    doc = "Compiles a Go archive from Go sources and dependencies",
)

There are three attributes here. srcs is a list of labels that refer to source .go files or rules that generate .go files. deps is a list of labels that refer to other Go library rules. They don't have to be go_library specifically, but they have to return GoLibraryInfo providers to be compatible. importpath is just a string. We'll use that to name the output files such that the Go compiler and linker can find them.

Here's the implementation of the rule.

def _go_library_impl(ctx):
    # Declare an output file for the library package and compile it from srcs.
    archive = declare_archive(ctx, ctx.attr.importpath)
    go_compile(
        ctx,
        srcs = ctx.files.srcs,
        deps = [dep[GoLibraryInfo] for dep in ctx.attr.deps],
        out = archive,
    )

    # Return the output file and metadata about the library.
    return [
        DefaultInfo(files = depset([archive])),
        GoLibraryInfo(
            info = struct(
                importpath = ctx.attr.importpath,
                archive = archive,
            ),
            deps = depset(
                direct = [dep[GoLibraryInfo].info for dep in ctx.attr.deps],
                transitive = [dep[GoLibraryInfo].deps for dep in ctx.attr.deps],
            ),
        ),
    ]

First, we use declare_archive, a new function defined in actions.bzl, to declare our output file. (For curious Go users, an archive with the import path github.com/foo/bar/baz will be named rule_label%/github.com/foo/bar/baz.a. We can pass the directory rule_label% to the compiler and linker with the -I and -L flags, respectively, so that the archives can be found. -importcfg is a better mechanism for this, but I didn't want to complicate this article too much.)
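
The naming scheme can be sketched in a few lines of plain Python. The real declare_archive and _search_dir helpers aren't reproduced in this article, so archive_path and search_dir below are hypothetical stand-ins that only illustrate the path layout, assuming the archive path is the rule label plus "%", followed by the import path and ".a".

```python
def archive_path(rule_label, importpath):
    """Returns the archive path for a library, relative to its output directory."""
    return "{}%/{}.a".format(rule_label, importpath)


def search_dir(path, importpath):
    """Strips the import path suffix to recover the directory for -I or -L."""
    suffix = "/" + importpath + ".a"
    if not path.endswith(suffix):
        raise ValueError("archive path does not end with " + suffix)
    return path[:-len(suffix)]


path = archive_path("mylib", "example.com/foo")
print(path)
print(search_dir(path, "example.com/foo"))
```

Because the import path is a suffix of the archive path, the compiler can resolve an import of example.com/foo by looking for example.com/foo.a under the search directory.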

Next, we compile the library using our go_compile function from before. We access the list of source files through ctx.files.srcs, which is a flat list of files from the srcs attribute. Individual targets in srcs may refer to multiple source files (for example, if we refer to a filegroup or a rule that generates source code), but we just want a flat list. We access dependencies through ctx.attr.deps, which is a list of Targets. Providers can be read from a Target with a subscript expression (dep[GoLibraryInfo] above).
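
One way to picture the subscript expression is to treat a target as a mapping from provider type to the value the rule returned. The following is a loose Python model, not Bazel's API: make_provider and the dictionary standing in for a Target are assumptions made for illustration.

```python
from types import SimpleNamespace


def make_provider(name, fields):
    """Returns a constructor that only accepts the declared field names."""
    allowed = frozenset(fields)

    class Provider:
        def __init__(self, **kwargs):
            unknown = set(kwargs) - allowed
            if unknown:
                raise TypeError("unexpected fields: " + ", ".join(sorted(unknown)))
            self.__dict__.update(kwargs)

    Provider.__name__ = name
    return Provider


GoLibraryInfo = make_provider("GoLibraryInfo", ["info", "deps"])

# A target modeled as a mapping from provider type to the returned instance.
baz_target = {
    GoLibraryInfo: GoLibraryInfo(
        info=SimpleNamespace(
            importpath="example.com/baz",
            archive="baz%/example.com/baz.a",
        ),
        deps=[],
    ),
}

# Equivalent in spirit to dep[GoLibraryInfo].info.importpath in the rule.
print(baz_target[GoLibraryInfo].info.importpath)
print(GoLibraryInfo in baz_target)
```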

Finally, we return a list of two providers, DefaultInfo and GoLibraryInfo. The GoLibraryInfo.info field is a struct with information about the library being compiled. It's important that this struct is immutable and is relatively small, since it will be added to a depset (the GoLibraryInfo.deps field of other libraries) and hashed.

There was an important change to go_compile and go_link. Did you catch it? Both now accept a deps argument, a list of GoLibraryInfo objects for direct dependencies.

go_compile uses this to generate -I flags for the compiler (import search paths). The compiler only needs search paths for compiled direct dependencies.

def go_compile(ctx, srcs, out, deps = []):
    """Compiles a single Go package from sources.

    Args:
        ctx: analysis context.
        srcs: list of source Files to be compiled.
        out: output .a file. Should have the importpath as a suffix,
            for example, library "example.com/foo" should have the path
            "somedir/example.com/foo.a".
        deps: list of GoLibraryInfo objects for direct dependencies.
    """
    dep_import_args = []
    dep_archives = []
    for dep in deps:
        dep_import_args.append("-I " + shell.quote(_search_dir(dep.info)))
        dep_archives.append(dep.info.archive)

    cmd = "go tool compile -o {out} {imports} -- {srcs}".format(
        out = shell.quote(out.path),
        imports = " ".join(dep_import_args),
        srcs = " ".join([shell.quote(src.path) for src in srcs]),
    )
    ctx.actions.run_shell(
        outputs = [out],
        inputs = srcs + dep_archives,
        command = cmd,
        mnemonic = "GoCompile",
        use_default_shell_env = True,
    )
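
To see what the formatted command looks like, here is a plain Python sketch of the same string-building step. It uses Python's shlex.quote in place of the shell.quote helper above, and compile_command and the example paths are made up for illustration.

```python
import shlex


def compile_command(out, search_dirs, srcs):
    """Builds a shell command shaped like the one go_compile formats."""
    imports = " ".join("-I " + shlex.quote(d) for d in search_dirs)
    sources = " ".join(shlex.quote(s) for s in srcs)
    return "go tool compile -o {out} {imports} -- {srcs}".format(
        out=shlex.quote(out),
        imports=imports,
        srcs=sources,
    )


cmd = compile_command(
    out="foo%/example.com/foo.a",
    search_dirs=["bar%", "dir with space/baz%"],
    srcs=["foo.go"],
)
print(cmd)
```

Quoting each path separately matters because the whole command is handed to a shell as one string; a path containing a space would otherwise be split into two arguments.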

go_link uses this to generate -L flags for the linker. The linker needs to know about all transitive dependencies, not just the direct dependencies of the binary. That's why we needed GoLibraryInfo.deps; the linker needs to know about everything.

def go_link(ctx, out, main, deps = []):
    """Links a Go executable.

    Args:
        ctx: analysis context.
        out: output executable file.
        main: archive file for the main package.
        deps: list of GoLibraryInfo objects for direct dependencies.
    """
    deps_set = depset(
        direct = [d.info for d in deps],
        transitive = [d.deps for d in deps],
    )
    dep_lib_args = []
    dep_archives = []
    for dep in deps_set.to_list():
        dep_lib_args.append("-L " + shell.quote(_search_dir(dep)))
        dep_archives.append(dep.archive)

    cmd = "go tool link -o {out} {libs} -- {main}".format(
        out = shell.quote(out.path),
        libs = " ".join(dep_lib_args),
        main = shell.quote(main.path),
    )
    ctx.actions.run_shell(
        outputs = [out],
        inputs = [main] + dep_archives,
        command = cmd,
        mnemonic = "GoLink",
        use_default_shell_env = True,
    )

go_binary now includes a deps attribute and calls go_link with GoLibraryInfo providers from those targets. I won't reproduce the entire source here because it's a very small change from last time.

Exposing a public interface

All our definitions are in an internal directory, and we need to make them available for other people to use. So we load them in def.bzl, which just contains our public definitions. We expose both go_library and GoLibraryInfo. The latter will be needed by anyone who wants to implement compatible rules.

load(
    "//internal:rules.bzl",
    _go_binary = "go_binary",
    _go_library = "go_library",
)
load(
    "//internal:providers.bzl",
    _GoLibraryInfo = "GoLibraryInfo",
)

go_binary = _go_binary
go_library = _go_library
GoLibraryInfo = _GoLibraryInfo

Testing the go_library rule

We'll test go_library the same way we tested go_binary before: with an sh_test that runs a go_binary, this time one that depends on several libraries:

sh_test(
    name = "bin_with_libs_test",
    srcs = ["bin_with_libs_test.sh"],
    args = ["$(location :bin_with_libs)"],
    data = [":bin_with_libs"],
)

go_binary(
    name = "bin_with_libs",
    srcs = ["bin_with_libs.go"],
    deps = [":foo"],
)

go_library(
    name = "foo",
    srcs = ["foo.go"],
    importpath = "rules_go_simple/tests/foo",
    deps = [
        ":bar",
        ":baz",
    ],
)

go_library(
    name = "bar",
    srcs = ["bar.go"],
    importpath = "rules_go_simple/tests/bar",
    deps = [":baz"],
)

go_library(
    name = "baz",
    srcs = ["baz.go"],
    importpath = "rules_go_simple/tests/baz",
)
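
The targets above make a handy example of direct versus transitive dependencies. The plain Python sketch below copies the deps lists from this BUILD file and computes what the linker needs to see for bin_with_libs. The transitive_deps function is hypothetical, and its dependencies-first ordering is an assumption; it models the idea, not Bazel's depset traversal.

```python
# Direct deps of each target, copied from the BUILD file above.
DIRECT_DEPS = {
    "bin_with_libs": ["foo"],
    "foo": ["bar", "baz"],
    "bar": ["baz"],
    "baz": [],
}


def transitive_deps(target):
    """Returns every library reachable from target, dependencies first, each once."""
    ordered = []

    def visit(name):
        for dep in DIRECT_DEPS[name]:
            visit(dep)
            if dep not in ordered:
                ordered.append(dep)

    visit(target)
    return ordered


print(DIRECT_DEPS["bin_with_libs"])
print(transitive_deps("bin_with_libs"))
```

bin_with_libs only lists foo in deps, but the link action needs the archives for bar and baz too. baz is reachable through two paths and still appears once.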