[JavaEE] [SpringAI] Image Models and Speech Models

I. Image Models
- 1.1 Overview
- 1.2 Environment Setup
  - 1.2.1 Obtaining an OpenAI API key
  - 1.2.2 Project setup
- 1.3 Image Model API
  - 1.3.1 Overview
  - 1.3.2 ImageModel (image model)
  - 1.3.3 ImagePrompt (image prompt)
  - 1.3.4 ImageMessage (image message)
  - 1.3.5 ImageOptions (image options)
  - 1.3.6 ImageResponse (image response)
  - 1.3.7 ImageGeneration (image generation)
- 1.4 Azure OpenAI
- 1.5 QianFan
  - 1.5.1 Obtaining an API key
  - 1.5.2 Creating the project
  - 1.5.3 Chat model
  - 1.5.4 Image model
II. Speech Models
- 2.1 OpenAI Text-to-Speech
  - 2.1.1 Text to speech
  - 2.1.2 API overview
    - 2.1.2.1 OpenAiAudioSpeechOptions (speech options)
    - 2.1.2.2 SpeechPrompt (speech request)
    - 2.1.2.3 SpeechResponse (speech response)

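I. Image Models

1.1 Overview

An image model (ImageModel) is an AI model focused on processing and understanding visual data. It sits at the core of computer vision and multimodal learning. Image models fall into two main categories:

- Image generation models: synthesize new images from conditional inputs such as text or images.
- Image understanding models: analyze input images to perform recognition tasks such as classification, detection, and segmentation.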

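1.2 Environment Setup

1.2.1 Obtaining an OpenAI API key

Create an API key on the OpenAI platform to access the image models.

Sign up: https://platform.openai.com/signup

Create an API key: https://platform.openai.com/account/api-keys. Save the key as soon as it is created, because it cannot be viewed again later.

Image models are a paid feature: https://platform.openai.com/settings/organization/billing/overview. You will need an international Visa or Mastercard credit or debit card.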

1.2.2 Project setup

Create the parent project.

pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>spring-ai-project2</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>pom</packaging>
    <modules>
        <module>spring-image-demo</module>
    </modules>

    <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <spring-ai.version>1.0.0</spring-ai.version>
    </properties>

    <!-- Dependency setup -->
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.5.3</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>${spring-ai.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
</project>

Create the child module (spring-image-demo).

Dependencies:

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
        </plugin>
    </plugins>
</build>

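Application class:

During local development, your Spring AI application may fail to reach the API even with a proxy configured. This usually happens because the proxy software does not proxy traffic system-wide. In that case, set the JVM proxy properties in the application class:

package com.ai.image;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class ImageApplication {
    public static void main(String[] args) {
        // Set the proxy properties before the application starts so every HTTP client sees them.
        String proxyHost = System.getenv("proxyHost"); // IP of your proxy server, read from the environment
        if (proxyHost != null) {
            System.setProperty("http.proxyHost", proxyHost);
            System.setProperty("https.proxyHost", proxyHost);
            System.setProperty("http.proxyPort", "7890"); // change to your proxy software's port
            System.setProperty("https.proxyPort", "7890"); // same as above
        }
        SpringApplication.run(ImageApplication.class, args);
    }
}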
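To check that the JVM actually picks up proxy settings like these, one option is to ask the default ProxySelector which proxy it would use for an HTTPS request. The sketch below is independent of Spring AI. The class name ProxyCheck, the host 127.0.0.1, and the target URL are placeholders chosen for illustration.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

class ProxyCheck {

    // Sets the JVM-wide proxy properties for both HTTP and HTTPS traffic.
    static void configureProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("https.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
        System.setProperty("https.proxyPort", String.valueOf(port));
    }

    // Returns "host:port" of the first proxy the JVM would use for the URI, or "DIRECT".
    static String proxyFor(String uri) {
        List<Proxy> proxies = ProxySelector.getDefault().select(URI.create(uri));
        Proxy first = proxies.get(0);
        if (first.type() == Proxy.Type.DIRECT) {
            return "DIRECT";
        }
        InetSocketAddress address = (InetSocketAddress) first.address();
        return address.getHostString() + ":" + address.getPort();
    }

    public static void main(String[] args) {
        configureProxy("127.0.0.1", 7890); // placeholder values
        System.out.println(proxyFor("https://api.openai.com/v1/images/generations"));
    }
}
```

If the printed value is DIRECT, the properties were not applied and requests would bypass the proxy.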

Configure the API key:

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
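The ${OPENAI_API_KEY} placeholder is resolved from the environment, so the application only works if that variable is set. A small pre-flight check can make a missing key obvious. This is a sketch that does not use Spring; ApiKeyCheck and requireKey are hypothetical names.

```java
import java.util.Map;

class ApiKeyCheck {

    // Returns the value of the named variable, or throws if it is missing or blank.
    static String requireKey(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Environment variable " + name + " is not set");
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            String key = requireKey(System.getenv(), "OPENAI_API_KEY");
            // Print only the length so the key itself never ends up in logs.
            System.out.println("API key found (" + key.length() + " characters)");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```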

Write the endpoint:

package com.ai.image.controller;

import jakarta.servlet.http.HttpServletResponse;
import org.springframework.ai.image.ImagePrompt;
import org.springframework.ai.image.ImageResponse;
import org.springframework.ai.openai.OpenAiImageModel;
import org.springframework.ai.openai.OpenAiImageOptions;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RequestMapping("/openai")
@RestController
public class OpenAiImageController {
    @Autowired
    private OpenAiImageModel openAiImageModel;

    @GetMapping("/image")
    public void image(String message, HttpServletResponse response) {
        ImageResponse imageResponse = openAiImageModel.call(
                new ImagePrompt("A light cream colored mini golden doodle",
                        OpenAiImageOptions.builder()
                                .quality("hd")
                                .N(1)
                                .height(1024)
                                .width(1024).build())
        );
        String imageUrl = imageResponse.getResult().getOutput().getUrl();
        System.out.println(imageUrl);
    }
}

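1.3 Image Model API

1.3.1 Overview

The Image Model API is the modular interface in the Spring AI framework dedicated to image generation. It provides a unified way to interact with different image-generation AI models. Its design follows Spring's principles of modularity and interchangeability, so developers can switch between image models with minimal code changes. Its advantages:

- A unified API abstraction layer that hides implementation differences between models.
- Input is wrapped in ImagePrompt and output is handled through ImageResponse, which standardizes communication with image models and simplifies API interaction.
- It is built on top of Spring AI's generic Model API and adds image-specific abstractions.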

1.3.2 ImageModel (image model)

ImageModel is the core interface. It defines the basic method for calling an image generation model.

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//

package org.springframework.ai.image;

import org.springframework.ai.model.Model;

@FunctionalInterface
public interface ImageModel extends Model<ImagePrompt, ImageResponse> {
    ImageResponse call(ImagePrompt request);
}
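Because the interface has a single call method, any implementation (or even a lambda) can stand in for another, which is what makes providers interchangeable. The sketch below illustrates the idea with simplified stand-in types. Prompt, Response, and Model are hypothetical and are not the Spring AI classes.

```java
import java.util.List;

class ImageModelSketch {

    // Simplified stand-ins for ImagePrompt and ImageResponse; NOT the real Spring AI types.
    record Prompt(String text) {}
    record Response(List<String> urls) {}

    // Stand-in for the single-method ImageModel interface.
    @FunctionalInterface
    interface Model {
        Response call(Prompt prompt);
    }

    // Caller code depends only on the interface, so the provider can be swapped freely.
    static String firstUrl(Model model, String text) {
        return model.call(new Prompt(text)).urls().get(0);
    }

    public static void main(String[] args) {
        // Two fake providers that build a URL from the prompt length.
        Model providerA = p -> new Response(List.of("https://a.example/" + p.text().length()));
        Model providerB = p -> new Response(List.of("https://b.example/" + p.text().length()));
        System.out.println(firstUrl(providerA, "a golden doodle"));
        System.out.println(firstUrl(providerB, "a golden doodle"));
    }
}
```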

1.3.3 ImagePrompt (image prompt)

ImagePrompt is a ModelRequest that encapsulates a list of ImageMessage objects and optional model request options.

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//

package org.springframework.ai.image;

import java.util.Collections;
import java.util.List;
import java.util.Objects;
import org.springframework.ai.model.ModelRequest;

public class ImagePrompt implements ModelRequest<List<ImageMessage>> {
    private final List<ImageMessage> messages;
    private ImageOptions imageModelOptions;

    public ImagePrompt(List<ImageMessage> messages) {
        this.messages = messages;
    }

    public ImagePrompt(List<ImageMessage> messages, ImageOptions imageModelOptions) {
        this.messages = messages;
        this.imageModelOptions = imageModelOptions;
    }

    public ImagePrompt(ImageMessage imageMessage, ImageOptions imageOptions) {
        this(Collections.singletonList(imageMessage), imageOptions);
    }

    public ImagePrompt(String instructions, ImageOptions imageOptions) {
        this(new ImageMessage(instructions), imageOptions);
    }

    public ImagePrompt(String instructions) {
        this(new ImageMessage(instructions), ImageOptionsBuilder.builder().build());
    }

    public List<ImageMessage> getInstructions() {
        return this.messages;
    }

    public ImageOptions getOptions() {
        return this.imageModelOptions;
    }

    public String toString() {
        String var10000 = String.valueOf(this.messages);
        return "NewImagePrompt{messages=" + var10000 + ", imageModelOptions=" + String.valueOf(this.imageModelOptions) + "}";
    }

    public boolean equals(Object o) {
        if (this == o) {
            return true;
        } else if (!(o instanceof ImagePrompt)) {
            return false;
        } else {
            ImagePrompt that = (ImagePrompt)o;
            return Objects.equals(this.messages, that.messages) && Objects.equals(this.imageModelOptions, that.imageModelOptions);
        }
    }

    public int hashCode() {
        return Objects.hash(new Object[]{this.messages, this.imageModelOptions});
    }
}

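1.3.4 ImageMessage (image message)

The ImageMessage class encapsulates the text to use and the weight that text should have in influencing the generated image. For models that support weights, the weight can be positive or negative.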

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//

package org.springframework.ai.image;

import java.util.Objects;

public class ImageMessage {
    private String text;
    private Float weight;

    public ImageMessage(String text) {
        this.text = text;
    }

    public ImageMessage(String text, Float weight) {
        this.text = text;
        this.weight = weight;
    }

    public String getText() {
        return this.text;
    }

    public Float getWeight() {
        return this.weight;
    }

    public String toString() {
        return "ImageMessage{text='" + this.text + "', weight=" + this.weight + "}";
    }

    public boolean equals(Object o) {
        if (this == o) {
            return true;
        } else if (!(o instanceof ImageMessage)) {
            return false;
        } else {
            ImageMessage that = (ImageMessage)o;
            return Objects.equals(this.text, that.text) && Objects.equals(this.weight, that.weight);
        }
    }

    public int hashCode() {
        return Objects.hash(new Object[]{this.text, this.weight});
    }
}

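1.3.5 ImageOptions (image options)

ImageOptions represents the options that can be passed to an image generation model. The interface extends ModelOptions and defines the small set of portable options that can be passed to an AI model.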

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//

package org.springframework.ai.image;

import org.springframework.ai.model.ModelOptions;
import org.springframework.lang.Nullable;

public interface ImageOptions extends ModelOptions {
    @Nullable
    Integer getN();

    @Nullable
    String getModel();

    @Nullable
    Integer getWidth();

    @Nullable
    Integer getHeight();

    @Nullable
    String getResponseFormat();

    @Nullable
    String getStyle();
}

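Each ImageModel implementation can also have its own options to pass to the AI model. The OpenAI image generation model, for example, has its own options such as quality and style. This lets developers set model-specific options when the application starts.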

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//

package org.springframework.ai.openai;

import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonInclude.Include;
import java.util.Objects;
import org.springframework.ai.image.ImageOptions;

@JsonInclude(Include.NON_NULL)
public class OpenAiImageOptions implements ImageOptions {
    @JsonProperty("n")
    private Integer n;
    @JsonProperty("model")
    private String model;
    @JsonProperty("size_width")
    private Integer width;
    @JsonProperty("size_height")
    private Integer height;
    @JsonProperty("quality")
    private String quality;
    @JsonProperty("response_format")
    private String responseFormat;
    @JsonProperty("size")
    private String size;
    @JsonProperty("style")
    private String style;
    @JsonProperty("user")
    private String user;
    // Getters, setters, and the builder are omitted here.
}

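Option	Configuration property	Description	Default
n	spring.ai.openai.image.options.n	Number of images to generate. Must be between 1 and 10. For dall-e-3, only n=1 is supported.	-
model	spring.ai.openai.image.options.model	The model used for image generation.	OpenAiImageApi.DEFAULT_IMAGE_MODEL
width	spring.ai.openai.image.options.size_width	Width of the generated image. For dall-e-2, must be one of 256, 512, or 1024.	-
height	spring.ai.openai.image.options.size_height	Height of the generated image. For dall-e-2, must be one of 256, 512, or 1024.	-
quality	spring.ai.openai.image.options.quality	Quality of the generated image. "hd" produces images with finer detail and greater consistency across the image. Only supported for dall-e-3.	-
responseFormat	spring.ai.openai.image.options.response_format	Format in which the generated image is returned. Must be one of url or b64_json.	-
size	spring.ai.openai.image.options.size	Size of the generated image. For dall-e-2, must be one of 256x256, 512x512, or 1024x1024. For dall-e-3, must be one of 1024x1024, 1792x1024, or 1024x1792.	-
style	spring.ai.openai.image.options.style	Style of the generated image. Must be one of "vivid" or "natural". Vivid makes the model lean toward hyper-real and dramatic images; natural makes it produce more natural, less hyper-real images. Only supported for dall-e-3.	-
user	spring.ai.openai.image.options.user	A unique identifier representing your end user, which helps OpenAI monitor and detect abuse.	-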
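The size and count constraints in the table can be checked locally before a request is sent. This is a minimal sketch covering only the dall-e-2 and dall-e-3 rules listed above. ImageRequestCheck and its methods are hypothetical helpers, not part of Spring AI.

```java
import java.util.Map;
import java.util.Set;

class ImageRequestCheck {

    // Allowed sizes per model, taken from the options table.
    private static final Map<String, Set<String>> ALLOWED_SIZES = Map.of(
            "dall-e-2", Set.of("256x256", "512x512", "1024x1024"),
            "dall-e-3", Set.of("1024x1024", "1792x1024", "1024x1792"));

    // True if the model is known and supports the given width x height.
    static boolean isSupportedSize(String model, int width, int height) {
        Set<String> sizes = ALLOWED_SIZES.get(model);
        return sizes != null && sizes.contains(width + "x" + height);
    }

    // n must be between 1 and 10; dall-e-3 only supports n = 1.
    static boolean isValidCount(String model, int n) {
        if ("dall-e-3".equals(model)) {
            return n == 1;
        }
        return n >= 1 && n <= 10;
    }

    public static void main(String[] args) {
        System.out.println(isSupportedSize("dall-e-3", 1024, 1024));
        System.out.println(isSupportedSize("dall-e-3", 512, 512));
        System.out.println(isValidCount("dall-e-3", 2));
    }
}
```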

1.3.6 ImageResponse (image response)

ImageResponse encapsulates the results generated by the AI model.

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//

package org.springframework.ai.image;

import java.util.List;
import java.util.Objects;
import org.springframework.ai.model.ModelResponse;
import org.springframework.util.CollectionUtils;

public class ImageResponse implements ModelResponse<ImageGeneration> {
    private final ImageResponseMetadata imageResponseMetadata;
    private final List<ImageGeneration> imageGenerations;

    public ImageResponse(List<ImageGeneration> generations) {
        this(generations, new ImageResponseMetadata());
    }

    public ImageResponse(List<ImageGeneration> generations, ImageResponseMetadata imageResponseMetadata) {
        this.imageResponseMetadata = imageResponseMetadata;
        this.imageGenerations = List.copyOf(generations);
    }

    public List<ImageGeneration> getResults() {
        return this.imageGenerations;
    }

    public ImageGeneration getResult() {
        return CollectionUtils.isEmpty(this.imageGenerations) ? null : (ImageGeneration)this.imageGenerations.get(0);
    }

    public ImageResponseMetadata getMetadata() {
        return this.imageResponseMetadata;
    }

    public String toString() {
        String var10000 = String.valueOf(this.imageResponseMetadata);
        return "ImageResponse [imageResponseMetadata=" + var10000 + ", imageGenerations=" + String.valueOf(this.imageGenerations) + "]";
    }

    public boolean equals(Object o) {
        if (this == o) {
            return true;
        } else if (!(o instanceof ImageResponse)) {
            return false;
        } else {
            ImageResponse that = (ImageResponse)o;
            return Objects.equals(this.imageResponseMetadata, that.imageResponseMetadata) && Objects.equals(this.imageGenerations, that.imageGenerations);
        }
    }

    public int hashCode() {
        return Objects.hash(new Object[]{this.imageResponseMetadata, this.imageGenerations});
    }
}

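1.3.7 ImageGeneration (image generation)

The ImageGeneration class implements ModelResult and represents a single image generation result together with its metadata.

AI image generation models return data in two main ways: (1) a URL that points to the image, or (2) the image data encoded as Base64.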

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//

package org.springframework.ai.image;

import org.springframework.ai.model.ModelResult;

public class ImageGeneration implements ModelResult<Image> {
    private ImageGenerationMetadata imageGenerationMetadata;
    private Image image;

    public ImageGeneration(Image image) {
        this.image = image;
    }

    public ImageGeneration(Image image, ImageGenerationMetadata imageGenerationMetadata) {
        this.image = image;
        this.imageGenerationMetadata = imageGenerationMetadata;
    }

    public Image getOutput() {
        return this.image;
    }

    public ImageGenerationMetadata getMetadata() {
        return this.imageGenerationMetadata;
    }

    public String toString() {
        String var10000 = String.valueOf(this.imageGenerationMetadata);
        return "ImageGeneration{imageGenerationMetadata=" + var10000 + ", image=" + String.valueOf(this.image) + "}";
    }
}
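When the model returns Base64 data instead of a URL, the application has to decode it before saving or serving the image. The sketch below shows that step with the JDK only. In a Spring AI application the string would presumably come from the Image object's getB64Json() accessor. Base64ImageSaver and the sample payload are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

class Base64ImageSaver {

    // Decodes a Base64 payload (as returned with response_format=b64_json) into raw bytes.
    static byte[] decode(String b64Json) {
        return Base64.getDecoder().decode(b64Json);
    }

    // Writes the decoded bytes to the target file and returns the path.
    static Path save(String b64Json, Path target) {
        try {
            return Files.write(target, decode(b64Json));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder payload: the Base64 encoding of the ASCII text "demo", not a real image.
        String payload = Base64.getEncoder().encodeToString("demo".getBytes(StandardCharsets.US_ASCII));
        Path target = Path.of(System.getProperty("java.io.tmpdir"), "b64-image-demo.bin");
        Path file = save(payload, target);
        System.out.println(file + " (" + file.toFile().length() + " bytes)");
    }
}
```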

1.4 Azure OpenAI

Azure OpenAI is an AI service that Microsoft offers on its Azure cloud platform. Under the hood it uses models trained by OpenAI.

Application guide: https://razeen.me/posts/how-to-apply-and-use-azure-openai-api/

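1.5 QianFan

Baidu Qianfan is an AI-native application development platform from Baidu AI Cloud. Built on the ERNIE (Wenxin) large models, it provides full-stack AI capabilities from model development to application deployment.

1.5.1 Obtaining an API key

Qianfan ModelBuilder console: https://console.bce.baidu.com/qianfan/

API key page: https://console.bce.baidu.com/qianfan/ais/console/apiKey

Service activation page: https://console.bce.baidu.com/qianfan/ais/console/presetService

1.5.2 Creating the project

pom.xml: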

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.example</groupId>
        <artifactId>spring-ai-project2</artifactId>
        <version>1.0-SNAPSHOT</version>
    </parent>

    <artifactId>spring-qianfan-demo</artifactId>

    <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-openai</artifactId>
        </dependency>

    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

Application class:

package com.ai.qianfan;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class QianfanApplicationDemo {
    public static void main(String[] args) {
        SpringApplication.run(QianfanApplicationDemo.class, args);
    }
}

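Configuration:

V1 is the earlier standalone interface specification. As of April 30, 2025, the entry point for creating V1 inference services has been fully closed, and new services are created on the V2 interface by default.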

spring:
  ai:
    openai:
      api-key: ${QIANFAN_API_KEY}
      base-url: https://qianfan.baidubce.com
      chat:
        options:
          model: "ernie-x1-turbo-32k"
          temperature: 0.7
        completions-path: /v2/chat/completions
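With this configuration, chat requests presumably go to the base URL followed by the completions path. The sketch below shows that combination, assuming plain concatenation with slash normalization. The actual client may build the URL differently, and EndpointSketch is a hypothetical name.

```java
class EndpointSketch {

    // Joins a base URL and a path with exactly one slash between them.
    static String endpoint(String baseUrl, String path) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        String suffix = path.startsWith("/") ? path : "/" + path;
        return base + suffix;
    }

    public static void main(String[] args) {
        // Values taken from the configuration above.
        System.out.println(endpoint("https://qianfan.baidubce.com", "/v2/chat/completions"));
    }
}
```

Checking the resulting URL this way can help when a request fails with a 404 because of a doubled or missing path segment.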

1.5.3 Chat model

package com.ai.qianfan.controller;

import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/qianfan")
public class ChatController {
    @Autowired
    public OpenAiChatModel openAiChatModel;
    @RequestMapping("/chat")
    public String chat(String message) {
        return openAiChatModel.call(message);
    }
}

1.5.4 Image model

Configuration (this fragment goes under spring.ai.openai in the configuration above):

      image:
        options:
          model: "flux.1-schnell"
        imagesPath: /v2/images/generations

Controller method:

    @Autowired
    private OpenAiImageModel openaiImageModel;
    @GetMapping("/image")
    public void image(String message) {
        ImageResponse imageResponse = openaiImageModel.call(
                new ImagePrompt("A light cream colored mini golden doodle",
                        OpenAiImageOptions.builder()
                                .quality("hd")
                                .N(1)
                                .height(1024)
                                .width(1024).build())
        );
        String imageUrl = imageResponse.getResult().getOutput().getUrl();
        System.out.println(imageUrl);
    }

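II. Speech Models

A speech model (SpeechModel) is a computational model in AI for processing and understanding human speech signals. Speech models are widely used in automatic speech recognition (ASR), speech synthesis (text-to-speech, TTS), and voice assistants such as Siri and XiaoAI.

2.1 OpenAI Text-to-Speech

Speech synthesis, also called text-to-speech (TTS), is the technology for converting text into natural-sounding speech. Spring AI supports OpenAI's text-to-speech API, which lets users:

- Narrate a written blog post.
- Produce spoken audio in multiple languages.
- Provide real-time audio output using streaming.

Spring AI provides high-level abstractions such as SpeechModel and StreamingSpeechModel. A simple method call (such as call) is enough to convert text to speech, which greatly reduces the learning curve and development complexity.

The speech model uses the same dependency and configuration as the image model:

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>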

2.1.1 Text to speech

package com.ai.image.controller;

import org.springframework.ai.openai.OpenAiAudioSpeechModel;
import org.springframework.ai.openai.OpenAiAudioSpeechOptions;
import org.springframework.ai.openai.api.OpenAiAudioApi;
import org.springframework.ai.openai.audio.speech.SpeechPrompt;
import org.springframework.ai.openai.audio.speech.SpeechResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

@RestController
@RequestMapping("/chat")
public class OpenAiSpeechController {
    @Autowired
    private OpenAiAudioSpeechModel openAiAudioSpeechModel;
    @GetMapping("/tts")
    public void tts() throws IOException {
        OpenAiAudioSpeechOptions speechOptions = OpenAiAudioSpeechOptions.builder()
                .model("tts-1")
                .voice(OpenAiAudioApi.SpeechRequest.Voice.ALLOY)
                .responseFormat(OpenAiAudioApi.SpeechRequest.AudioResponseFormat.MP3)
                .speed(1.0f)
                .build();
        SpeechPrompt speechPrompt = new SpeechPrompt("Hello, this is a text-to-speech example.", speechOptions);
        SpeechResponse response = openAiAudioSpeechModel.call(speechPrompt);
        // Write the returned audio bytes to output.mp3 in the working directory.
        File file = new File(System.getProperty("user.dir") + "/output.mp3");
        try (FileOutputStream fos = new FileOutputStream(file)) {
            fos.write(response.getResult().getOutput());
        }
    }
}

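2.1.2 API overview

2.1.2.1 OpenAiAudioSpeechOptions (speech options)

The OpenAiAudioSpeechOptions class provides the options used when making a text-to-speech request. At startup, the options configured under spring.ai.openai.audio.speech are used; they can also be overridden at runtime.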

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//

package org.springframework.ai.openai;

import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonInclude.Include;
import org.springframework.ai.model.ModelOptions;
import org.springframework.ai.openai.api.OpenAiAudioApi;

@JsonInclude(Include.NON_NULL)
public class OpenAiAudioSpeechOptions implements ModelOptions {
    @JsonProperty("model")
    private String model;
    @JsonProperty("input")
    private String input;
    @JsonProperty("voice")
    private String voice;
    @JsonProperty("response_format")
    private OpenAiAudioApi.SpeechRequest.AudioResponseFormat responseFormat;
    @JsonProperty("speed")
    private Float speed;

    public OpenAiAudioSpeechOptions() {
    }

    public static Builder builder() {
        return new Builder();
    }

    public String getModel() {
        return this.model;
    }

    public void setModel(String model) {
        this.model = model;
    }

    public String getInput() {
        return this.input;
    }

    public void setInput(String input) {
        this.input = input;
    }

    public String getVoice() {
        return this.voice;
    }

    public void setVoice(String voice) {
        this.voice = voice;
    }

    public void setVoice(OpenAiAudioApi.SpeechRequest.Voice voice) {
        this.voice = voice.getValue();
    }

    public OpenAiAudioApi.SpeechRequest.AudioResponseFormat getResponseFormat() {
        return this.responseFormat;
    }

    public void setResponseFormat(OpenAiAudioApi.SpeechRequest.AudioResponseFormat responseFormat) {
        this.responseFormat = responseFormat;
    }

    public Float getSpeed() {
        return this.speed;
    }

    public void setSpeed(Float speed) {
        this.speed = speed;
    }

    public int hashCode() {
        final int prime = 31; // the decompiler emitted "int prime = true", which does not compile
        int result = 1;
        result = prime * result + (this.model == null ? 0 : this.model.hashCode());
        result = prime * result + (this.input == null ? 0 : this.input.hashCode());
        result = prime * result + (this.voice == null ? 0 : this.voice.hashCode());
        result = prime * result + (this.responseFormat == null ? 0 : this.responseFormat.hashCode());
        result = prime * result + (this.speed == null ? 0 : this.speed.hashCode());
        return result;
    }

    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        } else if (obj == null) {
            return false;
        } else if (this.getClass() != obj.getClass()) {
            return false;
        } else {
            OpenAiAudioSpeechOptions other = (OpenAiAudioSpeechOptions)obj;
            if (this.model == null) {
                if (other.model != null) {
                    return false;
                }
            } else if (!this.model.equals(other.model)) {
                return false;
            }

            if (this.input == null) {
                if (other.input != null) {
                    return false;
                }
            } else if (!this.input.equals(other.input)) {
                return false;
            }

            if (this.voice == null) {
                if (other.voice != null) {
                    return false;
                }
            } else if (!this.voice.equals(other.voice)) {
                return false;
            }

            if (this.responseFormat == null) {
                if (other.responseFormat != null) {
                    return false;
                }
            } else if (!this.responseFormat.equals(other.responseFormat)) {
                return false;
            }

            if (this.speed == null) {
                return other.speed == null;
            } else {
                return this.speed.equals(other.speed);
            }
        }
    }

    public String toString() {
        String var10000 = this.model;
        return "OpenAiAudioSpeechOptions{model='" + var10000 + "', input='" + this.input + "', voice='" + this.voice + "', responseFormat='" + String.valueOf(this.responseFormat) + "', speed=" + this.speed + "}";
    }

    public static class Builder {
        private final OpenAiAudioSpeechOptions options = new OpenAiAudioSpeechOptions();

        public Builder() {
        }

        public Builder model(String model) {
            this.options.model = model;
            return this;
        }

        public Builder input(String input) {
            this.options.input = input;
            return this;
        }

        public Builder voice(String voice) {
            this.options.voice = voice;
            return this;
        }

        public Builder voice(OpenAiAudioApi.SpeechRequest.Voice voice) {
            this.options.voice = voice.getValue();
            return this;
        }

        public Builder responseFormat(OpenAiAudioApi.SpeechRequest.AudioResponseFormat responseFormat) {
            this.options.responseFormat = responseFormat;
            return this;
        }

        public Builder speed(Float speed) {
            this.options.speed = speed;
            return this;
        }

        public OpenAiAudioSpeechOptions build() {
            return this.options;
        }
    }
}

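Configuration options:

Option	Configuration property	Description
model	spring.ai.openai.audio.speech.options.model	The speech generation model. Default: OpenAiAudioApi.TtsModel.TTS_1.
input	spring.ai.openai.audio.speech.options.input	The text to synthesize. Usually supplied through the prompt instead.
voice	spring.ai.openai.audio.speech.options.voice	The voice to speak with. Default: "alloy". OpenAI offers several voices (see OpenAiAudioApi.SpeechRequest.Voice): ALLOY (clear, neutral, modern); ECHO (steady, warm, trustworthy male voice); FABLE (expressive, vivid, storytelling voice); ONYX (deep, powerful, authoritative male voice); NOVA (clear, bright, energetic, and friendly female voice); SHIMMER (soft, pleasant, calm female voice).
response_format	spring.ai.openai.audio.speech.options.response_format	The output audio format. Default: "mp3". Supports MP3, WAV, AAC, and others.
speed	spring.ai.openai.audio.speech.options.speed	The speed of the generated speech. Default: 1.0, which is normal speed. Values above 1.0 are faster and values below 1.0 are slower.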
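Two of these options are easy to get wrong at runtime: the speed value and the file extension that matches the response format. The sketch below handles both. The 0.25 to 4.0 speed range is an assumption taken from OpenAI's API documentation rather than from this article, and SpeechOptionsSketch is a hypothetical helper.

```java
import java.util.Locale;

class SpeechOptionsSketch {

    // Assumed limits of the OpenAI speed parameter; not stated in the table above.
    static final float MIN_SPEED = 0.25f;
    static final float MAX_SPEED = 4.0f;

    // Clamps a requested speed into the assumed valid range.
    static float clampSpeed(float speed) {
        return Math.max(MIN_SPEED, Math.min(MAX_SPEED, speed));
    }

    // Builds an output file name whose extension matches the response format, e.g. "mp3".
    static String outputFileName(String baseName, String responseFormat) {
        return baseName + "." + responseFormat.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(clampSpeed(10.0f));
        System.out.println(outputFileName("output", "MP3"));
    }
}
```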

2.1.2.2 SpeechPrompt (speech request)

SpeechPrompt bundles the text to convert with the speech options configured above into a complete speech generation request. Put simply, it is content plus configuration.

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//

package org.springframework.ai.openai.audio.speech;

import java.util.Objects;
import org.springframework.ai.model.ModelOptions;
import org.springframework.ai.model.ModelRequest;
import org.springframework.ai.openai.OpenAiAudioSpeechOptions;

public class SpeechPrompt implements ModelRequest<SpeechMessage> {
    private final SpeechMessage message;
    private OpenAiAudioSpeechOptions speechOptions;

    public SpeechPrompt(String instructions) {
        this(new SpeechMessage(instructions), OpenAiAudioSpeechOptions.builder().build());
    }

    public SpeechPrompt(String instructions, OpenAiAudioSpeechOptions speechOptions) {
        this(new SpeechMessage(instructions), speechOptions);
    }

    public SpeechPrompt(SpeechMessage speechMessage) {
        this(speechMessage, OpenAiAudioSpeechOptions.builder().build());
    }

    public SpeechPrompt(SpeechMessage speechMessage, OpenAiAudioSpeechOptions speechOptions) {
        this.message = speechMessage;
        this.speechOptions = speechOptions;
    }

    public SpeechMessage getInstructions() {
        return this.message;
    }

    public ModelOptions getOptions() {
        return this.speechOptions;
    }

    public boolean equals(Object o) {
        if (this == o) {
            return true;
        } else if (!(o instanceof SpeechPrompt)) {
            return false;
        } else {
            SpeechPrompt that = (SpeechPrompt)o;
            return Objects.equals(this.speechOptions, that.speechOptions) && Objects.equals(this.message, that.message);
        }
    }

    public int hashCode() {
        return Objects.hash(new Object[]{this.speechOptions, this.message});
    }
}

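2.1.2.3 SpeechResponse (speech response)

SpeechResponse encapsulates the result generated by the AI model. After the API call, the model's output is returned wrapped in a SpeechResponse object. It typically contains the generated audio data as a byte array (byte[]) or a stream, which the application can save as an audio file (such as .mp3) or play directly.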

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//

package org.springframework.ai.openai.audio.speech;

import java.util.Collections;
import java.util.List;
import java.util.Objects;
import org.springframework.ai.model.ModelResponse;
import org.springframework.ai.openai.metadata.audio.OpenAiAudioSpeechResponseMetadata;

public class SpeechResponse implements ModelResponse<Speech> {
    private final Speech speech;
    private final OpenAiAudioSpeechResponseMetadata speechResponseMetadata;

    public SpeechResponse(Speech speech) {
        this(speech, OpenAiAudioSpeechResponseMetadata.NULL);
    }

    public SpeechResponse(Speech speech, OpenAiAudioSpeechResponseMetadata speechResponseMetadata) {
        this.speech = speech;
        this.speechResponseMetadata = speechResponseMetadata;
    }

    public Speech getResult() {
        return this.speech;
    }

    public List<Speech> getResults() {
        return Collections.singletonList(this.speech);
    }

    public OpenAiAudioSpeechResponseMetadata getMetadata() {
        return this.speechResponseMetadata;
    }

    public boolean equals(Object o) {
        if (this == o) {
            return true;
        } else if (!(o instanceof SpeechResponse)) {
            return false;
        } else {
            SpeechResponse that = (SpeechResponse)o;
            return Objects.equals(this.speech, that.speech) && Objects.equals(this.speechResponseMetadata, that.speechResponseMetadata);
        }
    }

    public int hashCode() {
        return Objects.hash(new Object[]{this.speech, this.speechResponseMetadata});
    }
}
