Converting Audio to Text with AssemblyAI

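The YouTube video "How To Call a REST API In Java - Simple Tutorial" walks through calling AssemblyAI from Java.

AssemblyAI advertises that its speech-to-text and speech-understanding AI models can be tried out with just a few lines of code. First register an account and log in. Your API key is then shown on the https://www.assemblyai.com/dashboard/activation page, as in the screenshot below: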

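This API key matters: our code sends it with the POST and GET requests that ask AssemblyAI to transcribe an audio file into text. As the screenshot shows, the official samples cover Python (SDK), JavaScript (SDK), Python, JavaScript, PHP, Ruby, and C#, but surprisingly not Java.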

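Transcribing an audio file, whether stored locally or on the web, with AssemblyAI generally takes two steps:

    1. Send a POST request carrying the audio_url, the API key, and any other parameters; AssemblyAI responds with a transcript id.
    2. Send GET requests with the transcript id from step 1 and, once the transcription has completed, read the transcribed text from the response.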
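
The two steps above can be sketched in Python as follows. This is a minimal sketch, not AssemblyAI's own client: `transcribe`, `post_json`, and `get_json` are hypothetical names, and the HTTP transport is passed in as plain callables so the flow can be read and exercised without a network connection. The endpoint paths and the `id`, `status`, `text`, and `error` fields are the ones used by the samples later in this post.

```python
import time
from typing import Callable

BASE_URL = "https://api.assemblyai.com"

# Hypothetical transport signatures: each callable returns the decoded JSON body.
PostJson = Callable[[str, dict, dict], dict]  # (url, payload, headers) -> body
GetJson = Callable[[str, dict], dict]         # (url, headers) -> body


def transcribe(audio_url: str, api_key: str, post_json: PostJson, get_json: GetJson,
               poll_seconds: float = 3.0,
               sleep: Callable[[float], None] = time.sleep) -> str:
    headers = {"authorization": api_key}

    # Step 1: POST the audio_url; the response carries the transcript id.
    created = post_json(f"{BASE_URL}/v2/transcript", {"audio_url": audio_url}, headers)
    transcript_id = created["id"]

    # Step 2: GET the transcript by id until it has completed or failed.
    while True:
        result = get_json(f"{BASE_URL}/v2/transcript/{transcript_id}", headers)
        if result["status"] == "completed":
            return result["text"]
        if result["status"] == "error":
            raise RuntimeError(f"Transcription failed: {result['error']}")
        sleep(poll_seconds)
```

With the `requests` package, `post_json` could be `lambda url, payload, headers: requests.post(url, json=payload, headers=headers).json()` and `get_json` could be `lambda url, headers: requests.get(url, headers=headers).json()`.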

AssemblyAI code samples

  • Python (SDK)
# Install the assemblyai package by executing the command "pip install assemblyai"

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # replace with your own API key

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(speech_model=aai.SpeechModel.best)

transcript = aai.Transcriber(config=config).transcribe(audio_file)

if transcript.status == "error":
  raise RuntimeError(f"Transcription failed: {transcript.error}")

print(transcript.text)
  • JavaScript (SDK)
// Install the assemblyai package by executing the command "npm install assemblyai"

import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({
  apiKey: "YOUR_API_KEY", // replace with your own API key
});

// const audioFile = "./local_file.mp3";
const audioFile = 'https://assembly.ai/wildfires.mp3'

const params = {
  audio: audioFile,
  speech_model: "universal",
};

const run = async () => {
  const transcript = await client.transcripts.transcribe(params);

  console.log(transcript.text);
};

run();
  • Python
# Install the requests package by executing the command "pip install requests"

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "YOUR_API_KEY"  # replace with your own API key
}

# You can upload a local file using the following code
# with open("./my-audio.mp3", "rb") as f:
#   response = requests.post(base_url + "/v2/upload",
#                           headers=headers,
#                           data=f)
#
# audio_url = response.json()["upload_url"]

audio_url = "https://assembly.ai/wildfires.mp3"

data = {
    "audio_url": audio_url,
    "speech_model": "universal"
}

url = base_url + "/v2/transcript"
response = requests.post(url, json=data, headers=headers)

transcript_id = response.json()['id']
polling_endpoint = base_url + "/v2/transcript/" + transcript_id

while True:
  transcription_result = requests.get(polling_endpoint, headers=headers).json()
  transcript_text = transcription_result['text']

  if transcription_result['status'] == 'completed':
    print("Transcript Text:", transcript_text)
    break

  elif transcription_result['status'] == 'error':
    raise RuntimeError(f"Transcription failed: {transcription_result['error']}")

  else:
    time.sleep(3)
  • JavaScript

 // Install the axios and fs-extra package by executing the command "npm install axios fs-extra"

import axios from "axios";
import fs from "fs-extra";

const baseUrl = "https://api.assemblyai.com";

const headers = {
  authorization: "YOUR_API_KEY", // replace with your own API key
};

// You can upload a local file using the following code
// const path = "./my-audio.mp3";
// const audioData = await fs.readFile(path);
// const uploadResponse = await axios.post(`${baseUrl}/v2/upload`, audioData, {
//   headers,
// });
// const audioUrl = uploadResponse.data.upload_url;

const audioUrl = "https://assembly.ai/wildfires.mp3";

const data = {
  audio_url: audioUrl,
  speech_model: "universal",
};

const url = `${baseUrl}/v2/transcript`;
const response = await axios.post(url, data, { headers: headers });

const transcriptId = response.data.id;
const pollingEndpoint = `${baseUrl}/v2/transcript/${transcriptId}`;

while (true) {
  const pollingResponse = await axios.get(pollingEndpoint, {
    headers: headers,
  });
  const transcriptionResult = pollingResponse.data;

  if (transcriptionResult.status === "completed") {
    console.log(transcriptionResult.text);
    break;
  } else if (transcriptionResult.status === "error") {
    throw new Error(`Transcription failed: ${transcriptionResult.error}`);
  } else {
    await new Promise((resolve) => setTimeout(resolve, 3000));
  }
}
  • PHP
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// replace with your API key
$YOUR_API_KEY = "YOUR_API_KEY";

// URL of the file to transcribe
$FILE_URL = "https://assembly.ai/wildfires.mp3";
// You can also transcribe a local file by passing in a file path
// $FILE_URL = './path/to/file.mp3';

// AssemblyAI transcript endpoint (where we submit the file)
$transcript_endpoint = "https://api.assemblyai.com/v2/transcript";

// Request parameters 
$data = array(
    "audio_url" => $FILE_URL // You can also use a URL to an audio or video file on the web
);

// HTTP request headers
$headers = array(
    "authorization: " . $YOUR_API_KEY,
    "content-type: application/json"
);

// submit for transcription via HTTP request
$curl = curl_init($transcript_endpoint);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, json_encode($data));
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($curl);
$response = json_decode($response, true);
curl_close($curl);

# polling for transcription completion
$transcript_id = $response['id'];
$polling_endpoint = "https://api.assemblyai.com/v2/transcript/" . $transcript_id;

while (true) {
    $polling_response = curl_init($polling_endpoint);
    curl_setopt($polling_response, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($polling_response, CURLOPT_RETURNTRANSFER, true);
    $transcription_result = json_decode(curl_exec($polling_response), true);
    
    if ($transcription_result['status'] === "completed") {
        echo $transcription_result['text'];
        break;
    } else if ($transcription_result['status'] === "error") {
        throw new Exception("Transcription failed: " . $transcription_result['error']);
    }

    sleep(3);
}
  • Ruby
require 'net/http'
require 'json'

base_url = 'https://api.assemblyai.com'
headers = {
  'authorization' => 'YOUR_API_KEY', # replace with your own API key
  'content-type' => 'application/json'
}

audio_url = 'https://assembly.ai/wildfires.mp3'

data = {
  "audio_url" => audio_url,
  "speech_model" => "universal"
}

uri = URI.parse("#{base_url}/v2/transcript")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
request = Net::HTTP::Post.new(uri.request_uri, headers)
request.body = data.to_json
response = http.request(request)
response_body = JSON.parse(response.body)

unless response.is_a?(Net::HTTPSuccess)
  raise "API request failed with status #{response.code}: #{response.body}"
end

transcript_id = response_body['id']
puts "Transcript ID: #{transcript_id}"

polling_endpoint = URI.parse("#{base_url}/v2/transcript/#{transcript_id}")
while true
  polling_http = Net::HTTP.new(polling_endpoint.host, polling_endpoint.port)
  polling_http.use_ssl = true
  polling_request = Net::HTTP::Get.new(polling_endpoint.request_uri, headers)
  polling_response = polling_http.request(polling_request)
  transcription_result = JSON.parse(polling_response.body)
  
  if transcription_result['status'] == 'completed'
    puts "Transcription text: #{transcription_result['text']}"
    break
  elsif transcription_result['status'] == 'error'
    raise "Transcription failed: #{transcription_result['error']}"
  else
    puts 'Waiting for transcription to complete...'
    sleep(3)
  end
end
  • C#
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;

class Program
{
  static readonly string BaseUrl = "https://api.assemblyai.com";
  static readonly string ApiKey = "YOUR_API_KEY"; // replace with your own API key

  static async Task<string> UploadFileAsync(string filePath, HttpClient httpClient)
  {
    using (var fileStream = File.OpenRead(filePath))
    using (var fileContent = new StreamContent(fileStream))
    {
      fileContent.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
      using (var response = await httpClient.PostAsync("https://api.assemblyai.com/v2/upload", fileContent))
      {
        response.EnsureSuccessStatusCode();
        var jsonDoc = await response.Content.ReadFromJsonAsync<JsonDocument>();
        // Add null check to fix CS8602 warning
        return jsonDoc?.RootElement.GetProperty("upload_url").GetString() ??
               throw new InvalidOperationException("Failed to get upload URL from response");
      }
    }
  }

  static async Task Main(string[] args)
  {
    using var httpClient = new HttpClient();
    httpClient.DefaultRequestHeaders.Add("authorization", ApiKey);

    // var audioUrl = await UploadFileAsync("./my_audio.mp3", httpClient);
    string audioUrl = "https://assembly.ai/wildfires.mp3";

    var requestData = new
    {
      audio_url = audioUrl,
      speech_model = "universal"
    };

    var jsonContent = new StringContent(
        JsonSerializer.Serialize(requestData),
        Encoding.UTF8,
        "application/json");

    using var transcriptResponse = await httpClient.PostAsync($"{BaseUrl}/v2/transcript", jsonContent);
    var transcriptResponseBody = await transcriptResponse.Content.ReadAsStringAsync();
    var transcriptData = JsonSerializer.Deserialize<JsonElement>(transcriptResponseBody);

    if (!transcriptData.TryGetProperty("id", out JsonElement idElement))
    {
      throw new Exception("Failed to get transcript ID");
    }

    string transcriptId = idElement.GetString() ?? throw new Exception("Transcript ID is null");

    string pollingEndpoint = $"{BaseUrl}/v2/transcript/{transcriptId}";

    while (true)
    {
      using var pollingResponse = await httpClient.GetAsync(pollingEndpoint);
      var pollingResponseBody = await pollingResponse.Content.ReadAsStringAsync();
      var transcriptionResult = JsonSerializer.Deserialize<JsonElement>(pollingResponseBody);

      if (!transcriptionResult.TryGetProperty("status", out JsonElement statusElement))
      {
        throw new Exception("Failed to get transcription status");
      }

      string status = statusElement.GetString() ?? throw new Exception("Status is null");

      if (status == "completed")
      {
        if (!transcriptionResult.TryGetProperty("text", out JsonElement textElement))
        {
          throw new Exception("Failed to get transcript text");
        }

        string transcriptText = textElement.GetString() ?? string.Empty;
        Console.WriteLine($"Transcript Text: {transcriptText}");
        break;
      }
      else if (status == "error")
      {
        string errorMessage = transcriptionResult.TryGetProperty("error", out JsonElement errorElement)
            ? errorElement.GetString() ?? "Unknown error"
            : "Unknown error";

        throw new Exception($"Transcription failed: {errorMessage}");
      }
      else
      {
        await Task.Delay(3000);
      }
    }
  }
}

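Note: once the dependencies for your chosen sample are installed, adapt the code above to your needs, substitute your own API key and the audio file you want to transcribe, and you will get the corresponding transcript.

Take the following Python code as an example:
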
# Install the requests package by executing the command "pip install requests"

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "YOUR_API_KEY"  # replace with your own API key
}
# You can upload a local file using the following code
with open("./audio_data/Thirsty.mp4", "rb") as f:
  response = requests.post(base_url + "/v2/upload",
                          headers=headers,
                          data=f)

audio_url = response.json()["upload_url"]
print(f"Audio URL: {audio_url}")

# audio_url = "https://assembly.ai/wildfires.mp3"

data = {
    "audio_url": audio_url,
    "speech_model": "universal"
}

url = base_url + "/v2/transcript"
response = requests.post(url, json=data, headers=headers)

transcript_id = response.json()['id']
polling_endpoint = base_url + "/v2/transcript/" + transcript_id

while True:
  transcription_result = requests.get(polling_endpoint, headers=headers).json()
  transcript_text = transcription_result['text']

  if transcription_result['status'] == 'completed':
    print(f"Transcript Text:", transcript_text)
    break

  elif transcription_result['status'] == 'error':
    raise RuntimeError(f"Transcription failed: {transcription_result['error']}")

  else:
    time.sleep(3)
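
The example above reads `response.json()['id']` directly, so a rejected request (for instance, one sent with a wrong API key) would surface only as an unhelpful `KeyError`. A small guard can make the failure explicit. This is a sketch under assumptions: `extract_transcript_id` is a hypothetical helper, and it assumes that a failed request comes back with a non-2xx status and a JSON body that may contain an `error` field.

```python
def extract_transcript_id(status_code: int, body: dict) -> str:
    """Return the transcript id from a POST /v2/transcript response, or raise."""
    if not 200 <= status_code < 300:
        # Assumption: error responses may carry an "error" message in the body.
        message = body.get("error", "no error message in response")
        raise RuntimeError(f"Request failed with HTTP {status_code}: {message}")

    transcript_id = body.get("id")
    if not transcript_id:
        raise RuntimeError("Response did not contain a transcript id")
    return transcript_id
```

In the example it would replace the direct lookup: `transcript_id = extract_transcript_id(response.status_code, response.json())`.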

Java code sample

Below is a Java sample that uses AssemblyAI to transcribe an audio file hosted on GitHub. Note that it requires the Gson library:

// Transcript.java

package org.example;

public class Transcript {
    private String audio_url;

    private String id;

    private String status;

    private String text;

    private String error;

    public String getError() {
        return error;
    }

    public void setError(String error) {
        this.error = error;
    }

    public String getStatus() {
        return status;
    }

    public void setStatus(String status) {
        this.status = status;
    }

    public String getText() {
        return text;
    }

    public void setText(String text) {
        this.text = text;
    }

    public String getAudio_url() {
        return audio_url;
    }

    public void setAudio_url(String audio_url) {
        this.audio_url = audio_url;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }
}

// Main.java

package org.example;

import com.google.gson.Gson;

import java.io.IOException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Main {
    private static final String API_KEY = "YOUR_API_KEY"; // replace with your own API key

    public static void main(String[] args) throws IOException, InterruptedException {
        Transcript transcript = new Transcript();
        transcript.setAudio_url("https://github.com/johnmarty3/JavaAPITutorial/blob/main/Thirsty.mp4?raw=true");
        Gson gson = new Gson();
        // Use Gson to serialize the Transcript object to a JSON string
        String jsonRequest = gson.toJson(transcript);

        System.out.println("JSON Request: " + jsonRequest);

        // Build the POST request
        HttpRequest postRequest = HttpRequest.newBuilder()
                .uri(java.net.URI.create("https://api.assemblyai.com/v2/transcript"))
                .header("Content-Type", "application/json")
                .header("authorization", API_KEY)
                .POST(HttpRequest.BodyPublishers.ofString(jsonRequest))
                .build();

        // Create the HttpClient instance
        HttpClient httpClient = HttpClient.newHttpClient();

        // Send the POST request
        HttpResponse<String> postResponse = httpClient.send(postRequest, HttpResponse.BodyHandlers.ofString());
        System.out.println("postResponse: " + postResponse.body());

        // Deserialize the response body into a Transcript object
        transcript = gson.fromJson(postResponse.body(), Transcript.class);
        System.out.println("Transcript ID: " + transcript.getId());

        // Build the GET request used to fetch the transcription status
        HttpRequest getRequest = HttpRequest.newBuilder()
                .uri(java.net.URI.create("https://api.assemblyai.com/v2/transcript/" + transcript.getId()))
                .header("authorization", API_KEY)
                .GET()
                .build();

        while (true) {
            // Send the GET request to fetch the transcription status
            HttpResponse<String> getResponse = httpClient.send(getRequest, HttpResponse.BodyHandlers.ofString());
            // Deserialize the response body into a Transcript object
            transcript = gson.fromJson(getResponse.body(), Transcript.class);
//            System.out.println("getResponse: " + getResponse.body());
            System.out.println(transcript.getStatus());
            if ("completed".equals(transcript.getStatus())) {
                break; // transcription finished, leave the loop
            } else if ("error".equals(transcript.getStatus())) {
                System.out.println("Transcription error: " + transcript.getError());
                return; // transcription failed, stop without printing a transcript
            } else {
                Thread.sleep(1000); // not finished yet, wait 1 second and retry
            }
        }

        System.out.println("Transcription completed.");
        // Print the transcribed text
        System.out.println("Transcript text: " + transcript.getText());
    }
}

Note: the Maven project above uses Gson to convert between JSON strings and objects, so the Gson dependency must be added:

<dependency>
  <groupId>com.google.code.gson</groupId>
  <artifactId>gson</artifactId>
  <version>2.13.1</version>
</dependency>
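
The `while (true)` loop in Main.java polls once per second with no upper bound, so a transcript that never leaves `processing` would keep the program running indefinitely. The same idea with a cap on the number of attempts is sketched below in Python. `poll_until_done`, `fetch_status`, and `max_attempts` are hypothetical names chosen for illustration, and the status strings are the ones that appear in this post.

```python
import time
from typing import Callable


def poll_until_done(fetch_status: Callable[[], dict], max_attempts: int = 60,
                    delay_seconds: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status until the status is completed/error or attempts run out."""
    for attempt in range(max_attempts):
        result = fetch_status()
        if result.get("status") in ("completed", "error"):
            return result
        # Still queued or processing: wait before the next request,
        # except after the final attempt.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"Transcript not finished after {max_attempts} attempts")
```

The caller still has to inspect the returned `status`, since both `completed` and `error` end the wait.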

The output looks like this:

JSON Request: {"audio_url":"https://github.com/johnmarty3/JavaAPITutorial/blob/main/Thirsty.mp4?raw\u003dtrue"}
postResponse: {"id": "f3ffd34e-db34-4f89-b3d7-2c3bada97e3f", "language_model": "assemblyai_default", "acoustic_model": "assemblyai_default", "language_code": "en_us", "status": "queued", "audio_url": "https://github.com/johnmarty3/JavaAPITutorial/blob/main/Thirsty.mp4?raw=true", "text": null, "words": null, "utterances": null, "confidence": null, "audio_duration": null, "punctuate": true, "format_text": true, "dual_channel": null, "webhook_url": null, "webhook_status_code": null, "webhook_auth": false, "webhook_auth_header_name": null, "speed_boost": false, "auto_highlights_result": null, "auto_highlights": false, "audio_start_from": null, "audio_end_at": null, "word_boost": [], "boost_param": null, "prompt": null, "keyterms_prompt": null, "filter_profanity": false, "redact_pii": false, "redact_pii_audio": false, "redact_pii_audio_quality": null, "redact_pii_audio_options": null, "redact_pii_policies": null, "redact_pii_sub": null, "speaker_labels": false, "speaker_options": null, "content_safety": false, "iab_categories": false, "content_safety_labels": {}, "iab_categories_result": {}, "language_detection": false, "language_confidence_threshold": null, "language_confidence": null, "custom_spelling": null, "throttled": false, "auto_chapters": false, "summarization": false, "summary_type": null, "summary_model": null, "custom_topics": false, "topics": [], "speech_threshold": null, "speech_model": null, "chapters": null, "disfluencies": false, "entity_detection": false, "sentiment_analysis": false, "sentiment_analysis_results": null, "entities": null, "speakers_expected": null, "summary": null, "custom_topics_results": null, "is_deleted": null, "multichannel": null, "project_id": 671839, "token_id": 678042}
Transcript ID: f3ffd34e-db34-4f89-b3d7-2c3bada97e3f
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
completed
Transcription completed.
Transcript text: These pretzels are making me thirsty.
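
In the first line of the output the URL is printed as `raw\u003dtrue` rather than `raw=true`. Gson's default configuration writes HTML-sensitive characters such as `=` as Unicode escapes. The result is still valid JSON, and the `audio_url` echoed back in `postResponse` shows that it was decoded to `=` on the server side. A quick check in Python, using only the standard `json` module, that both spellings decode to the same value:

```python
import json

# Raw string literals keep the backslash, so this is exactly what Gson printed.
escaped = r'{"audio_url":"https://github.com/johnmarty3/JavaAPITutorial/blob/main/Thirsty.mp4?raw\u003dtrue"}'
plain = '{"audio_url":"https://github.com/johnmarty3/JavaAPITutorial/blob/main/Thirsty.mp4?raw=true"}'

# Both documents parse to the same dict: \u003d is just an escaped "=".
same = json.loads(escaped) == json.loads(plain)
print(same)  # prints True
```

If a literal `=` is preferred in the logged request, Gson can be created with `new GsonBuilder().disableHtmlEscaping().create()` instead of `new Gson()`.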

The Thirsty.mp4 audio file used here is available at: https://github.com/johnmarty3/JavaAPITutorial/blob/main/Thirsty.mp4
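
Note that this address is the GitHub page for the file, while Main.java passes the same address with `?raw=true` appended. As far as I understand GitHub's behavior, that parameter makes the link resolve to the file contents rather than the HTML page, which is what a transcription service needs. A small sketch of that rewrite using the standard library follows; `with_raw_param` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_raw_param(url: str) -> str:
    """Return url with raw=true in its query string, keeping other parameters."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["raw"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Calling it on a URL that already ends in `?raw=true` leaves the URL unchanged.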
