[Hive] The Full Workflow of a Custom Function, from Writing to Use (UDF as an Example)

卜塔 2024-05-03 9:19

1. Write the UDF program

This example is in Java and implements a function that reverses a string. Project dependency setup is omitted.


package com.example;

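import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

// Text shown by DESCRIBE FUNCTION; Hive substitutes the registered name for _FUNC_.
@Description(
    name = "ExampleUDF",
    value = "_FUNC_(str) - Example UDF that reverses the input string"
)
// deterministic = true: the same input always yields the same output.
// stateful = false: no state is carried between rows.
@UDFType(deterministic = true, stateful = false)
public class ExampleUDF extends UDF {

    // Hive locates evaluate() by reflection and matches it to the argument types.
    public String evaluate(String input) {
        // Propagate SQL NULL instead of throwing a NullPointerException.
        if (input == null) {
            return null;
        }
        return new StringBuilder(input).reverse().toString();
    }
}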
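You can check the reversal logic with plain Java before you package anything. The sketch below assumes you only want to exercise the body of `evaluate` and do not have a Hive installation at hand. `ReverseCheck` is a hypothetical class that copies that logic. It does not load `ExampleUDF` itself.

```java
// Hypothetical standalone check with no Hive dependency.
// reverse() duplicates the body of ExampleUDF.evaluate so the logic
// can be run with a plain JDK before the class is packaged.
final class ReverseCheck {

    // Same logic as ExampleUDF.evaluate: NULL in, NULL out; otherwise reverse.
    static String reverse(String input) {
        if (input == null) {
            return null;
        }
        // StringBuilder.reverse treats surrogate pairs as single characters,
        // so characters outside the BMP (e.g. emoji) are not split apart.
        return new StringBuilder(input).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("hive")); // prints "evih"
        System.out.println(reverse(""));     // prints an empty line
        System.out.println(reverse(null));   // prints "null"
    }
}
```

On JDK 11 or later, `java ReverseCheck.java` runs the file directly without a separate compile step.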

2. Compile the program

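Compile the UDF class with a Java compiler such as javac, with Hive's hive-exec JAR on the classpath. Then package the output with the jar tool. Maven works as well.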


javac -cp /path/to/hive/lib/hive-exec.jar -d . ExampleUDF.java
jar -cvf example-udf.jar com/example/ExampleUDF.class
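Hive loads the class by the fully qualified name you give it at registration. The class file must therefore sit at the matching path inside the JAR, which here is `com/example/ExampleUDF.class`. The following is a small sketch for verifying that. `JarClassCheck` is a hypothetical helper, and it only handles top-level classes. Running `jar -tf example-udf.jar` lists the same entries.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

// Hypothetical helper: checks that a JAR contains the class file Hive will
// look up for a fully qualified class name such as "com.example.ExampleUDF".
final class JarClassCheck {

    // "com.example.ExampleUDF" -> "com/example/ExampleUDF.class" (top-level classes only)
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    static boolean containsClass(Path jar, String className) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java JarClassCheck <jar> <fully.qualified.ClassName>");
            return;
        }
        Path jar = Paths.get(args[0]);
        boolean found = containsClass(jar, args[1]);
        System.out.println(args[1] + (found ? " found in " : " NOT found in ") + jar);
    }
}
```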

3. Upload the JAR

Upload the compiled UDF JAR to HDFS so that Hive can access it:


hdfs dfs -put example-udf.jar /path/to/udf/jars

4. Register the UDF with Hive

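In a Hive session, load the UDF's JAR with ADD JAR. Then register the UDF in one of two ways. CREATE TEMPORARY FUNCTION makes it visible only in the current session. CREATE FUNCTION creates a permanent function that is stored in the metastore.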


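ADD JAR hdfs:///path/to/udf/jars/example-udf.jar;

CREATE TEMPORARY FUNCTION example_udf AS 'com.example.ExampleUDF';

-- Alternatively, create a permanent function (Hive 0.13.0 and later). USING JAR
-- records the JAR location in the metastore, so later sessions load it without ADD JAR:
CREATE FUNCTION example_udf AS 'com.example.ExampleUDF'
  USING JAR 'hdfs:///path/to/udf/jars/example-udf.jar';

5. Use the UDF

Once registered, the UDF can be called in Hive queries like a built-in function. A permanent function belongs to the database it was created in. To call it from another database, use db_name.example_udf.

SELECT example_udf(your_column) FROM your_table;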
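Steps 4 and 5 can also be driven from a Java application through HiveServer2's JDBC interface. The sketch below makes several assumptions:

- The hive-jdbc driver is on the runtime classpath. Only `java.sql` is needed to compile.
- A HiveServer2 instance is reachable at the hypothetical URL `jdbc:hive2://localhost:10000/default`.
- The JAR is already in HDFS.

`HiveUdfClient` and `createFunctionSql` are illustrative names and are not part of any Hive API.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch: registers the permanent function (step 4) and queries it (step 5)
// over JDBC. Assumes the hive-jdbc driver is available at run time.
final class HiveUdfClient {

    // Builds a permanent-function statement; USING JAR stores the JAR location
    // in the metastore so later sessions can load the class without ADD JAR.
    static String createFunctionSql(String functionName, String className, String jarUri) {
        return "CREATE FUNCTION " + functionName
                + " AS '" + className + "'"
                + " USING JAR '" + jarUri + "'";
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical connection URL; adjust host, port and database as needed.
        String url = "jdbc:hive2://localhost:10000/default";
        String ddl = createFunctionSql("example_udf", "com.example.ExampleUDF",
                "hdfs:///path/to/udf/jars/example-udf.jar");

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // Statements are sent without a trailing ';'.
            stmt.execute(ddl);
            // your_table / your_column are the same placeholders as in step 5.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT example_udf(your_column) FROM your_table LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```

A second run fails because the permanent function already exists. To re-register it, issue `DROP FUNCTION IF EXISTS example_udf` first.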