ebook2audiobook开源程序使用动态 AI 模型和语音克隆将电子书转换为带有章节和元数据的有声读物。支持 1,107+ 种语言

​一、软件介绍

文末提供程序和源码下载

ebook2audiobook开源程序使用动态 AI 模型和语音克隆将电子书转换为带有章节和元数据的有声读物。支持 1,107+ 种语言。从电子书到带有章节和元数据的有声读物的 CPU/GPU 转换器,使用 XTTSv2、Bark、Vits、Fairseq、YourTTS 等。支持语音克隆和 +1110 种语言!

二、Features 特征

  • 📚 Splits eBook into chapters for organized audio.
    📚 将电子书拆分为多个章节,以便有组织的音频。
  • 🎙️ High-quality text-to-speech with Coqui XTTSv2 and Fairseq (and more).
    🎙️ 使用 Coqui XTTSv2 和 Fairseq(以及更多)实现高质量的文本转语音。
  • 🗣️ Optional voice cloning with your own voice file.
    🗣️ 使用您自己的语音文件进行可选的语音克隆。
  • 🌍 Supports +1110 languages (English by default). List of Supported languages
    🌍 支持 +1110 种语言(默认为英语)。支持的语言列表
  • 🖥️ Designed to run on 4GB RAM.
    🖥️ 设计为在 4GB RAM 上运行。

Supported Languages 支持的语言

Arabic (ar) 阿拉伯语 (ar) Chinese (zh) 中文 (zh) English (en) 英语 (en) Spanish (es) 英语 (en)
French (fr) 法语 (fr) German (de) 德语 (de) Italian (it) 意大利语 (it) Portuguese (pt) 葡萄牙语 (pt)
Polish (pl) 波兰语 (pl) Turkish (tr) 土耳其语 (tr) Russian (ru) 俄语 (ru) Dutch (nl) 荷兰语 (nl)
Czech (cs) 捷克语 (cs) Japanese (ja) 日语 (ja) Hindi (hi) 印地语 (hi) Bengali (bn) 孟加拉语 (bn)
Hungarian (hu) 匈牙利语 (胡) Korean (ko) 韩语 (ko) Vietnamese (vi) 越南语 (vi) Swedish (sv) 瑞典语 (sv)
Persian (fa) 波斯语 (fa) Yoruba (yo) 约鲁巴语 (yo) Swahili (sw) 斯瓦希里语 (sw) Indonesian (id) 印度尼西亚语 (id)
Slovak (sk) 斯洛伐克语 (sk) Croatian (hr) 克罗地亚语 (hr) Tamil (ta) 泰米尔语 (ta) Danish (da) 丹麦语 (da)

三、Hardware Requirements 硬件要求

  • 4gb RAM minimum, 8GB recommended
    最低 4GB RAM,推荐 8GB
  • Virtualization enabled if running on windows (Docker only)
    如果在 Windows 上运行,则启用虚拟化(仅限 Docker)
  • CPU (intel, AMD, ARM), GPU (Nvidia, AMD*, Intel*) (Recommended), MPS (Apple Silicon CPU) *available very soon
    CPU(英特尔、AMD、ARM)、GPU(Nvidia、AMD*、Intel*)(推荐)、MPS(Apple Silicon CPU)*即将推出

四、安装说明

  1. Clone repo 克隆存储库(下载很慢的话,文末提供网盘下载)
复制代码
git clone https://github.com/DrewThomasson/ebook2audiobook.git
cd ebook2audiobook

Launching Gradio Web Interface

启动 Gradio Web 界面

  1. Run ebook2audiobook: 运行 ebook2audiobook:

    • Linux/MacOS Linux/MacOS作系统

      复制代码
      ./ebook2audiobook.sh  # Run launch script
    • Mac Launcher Mac 启动器

      Double click Mac Ebook2Audiobook Launcher.command 双击 Mac Ebook2Audiobook Launcher.command

    • Windows 窗户

      复制代码
      ebook2audiobook.cmd  # Run launch script or double click on it
    • Windows Launcher Windows 启动器

      Double click ebook2audiobook.cmd 双击 ebook2audiobook.cmd

    • Manual Python Install 手动安装 Python

      复制代码
      REQUIRED_PROGRAMS=("calibre" "ffmpeg" "nodejs" "mecab" "espeak-ng" "rust" "sox")
      REQUIRED_PYTHON_VERSION="3.12"
      pip install -r requirements.txt  # Install Python Requirements
      python app.py  # Run Ebook2Audiobook
  2. Open the Web App : Click the URL provided in the terminal to access the web app and convert eBooks. http://localhost:7860/

    打开 Web 应用程序:单击终端中提供的 URL 以访问 Web 应用程序并转换电子书。 http://localhost:7860/

  3. For Public Link : python app.py --share (all OS) ./ebook2audiobook.sh --share (Linux/MacOS) ebook2audiobook.cmd --share (Windows)

    对于公共链接: python app.py --share (所有作系统) ./ebook2audiobook.sh --share (Linux/MacOS) ebook2audiobook.cmd --share (Windows)

Important 重要

If the script is stopped and run again, you need to refresh your gradio GUI interface
如果脚本已停止并再次运行,则需要刷新 gradio GUI 界面
to let the web page reconnect to the new connection socket.
让网页重新连接到新的连接套接字。

五、Basic Usage 基本用法

  • Linux/MacOS: Linux/MacOS:

    复制代码
    ./ebook2audiobook.sh --headless --ebook <path_to_ebook_file> \
        --voice [path_to_voice_file] --language [language_code]
  • Windows 窗户

    复制代码
    ebook2audiobook.cmd --headless --ebook <path_to_ebook_file>
        --voice [path_to_voice_file] --language [language_code]
  • [--ebook] : Path to your eBook file

    --ebook\]:您的 eBook 文件的路径

    --voice\]:语音克隆文件路径(可选)

    --language\]:ISO-639-3 中的语言代码(即:ita 代表意大利语,eng 代表英语,deu 代表德语...... Default language is eng and --language is optional for default language set in ./lib/lang.py. 默认语言为 eng,对于 ./lib/lang.py 中设置的默认语言,--language 是可选的。 The ISO-639-1 2 letters codes are also supported. 还支持 ISO-639-1 2 字母代码。

自定义模型 zip 上传示例

(must be a .zip file containing the mandatory model files. Example for XTTS: config.json, model.pth, vocab.json and ref.wav)
(必须是包含必需模型文件的 .zip 文件。XTTS 示例:config.json、model.pth、vocab.json 和 ref.wav)

  • Linux/MacOS Linux/MacOS作系统

    复制代码
    ./ebook2audiobook.sh --headless --ebook <ebook_file_path> \
        --voice <target_voice_file_path> --language <language> --custom_model <custom_model_path>
  • Windows 窗户

    复制代码
    ebook2audiobook.cmd --headless --ebook <ebook_file_path> \
        --voice <target_voice_file_path> --language <language> --custom_model <custom_model_path>
  • <custom_model_path> : Path to model_name.zip file, which must contain (according to the tts engine) all the mandatory files
    model_name.zip 文件的路径,必须包含(根据 tts 引擎)所有必需的文件
    (see ./lib/models.py). (参见 ./lib/models.py)。

For Detailed Guide with list of all Parameters to use

有关详细指南,其中包含要使用的所有参数的列表

  • Linux/MacOS Linux/MacOS作系统

    复制代码
    ./ebook2audiobook.sh --help
  • Windows 窗户

    复制代码
    ebook2audiobook.cmd --help
  • Or for all OS python app.py --help
    或适用于所有作系统 python app.py --help

复制代码
usage: app.py [-h] [--script_mode SCRIPT_MODE] [--session SESSION] [--share]
              [--headless] [--ebook EBOOK] [--ebooks_dir EBOOKS_DIR]
              [--language LANGUAGE] [--voice VOICE] [--device {cpu,gpu,mps}]
              [--tts_engine {xtts,bark,vits,fairseq,yourtts}]
              [--custom_model CUSTOM_MODEL] [--fine_tuned FINE_TUNED]
              [--output_format OUTPUT_FORMAT] [--temperature TEMPERATURE]
              [--length_penalty LENGTH_PENALTY] [--num_beams NUM_BEAMS]
              [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P]
              [--speed SPEED] [--enable_text_splitting] [--output_dir OUTPUT_DIR]
              [--version]

Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the Gradio interface or run the script in headless mode for direct conversion.

options:
  -h, --help            show this help message and exit
  --session SESSION     Session to resume the conversion in case of interruption, crash, 
                            or reuse of custom models and custom cloning voices.

**** The following option are for gradio/gui mode only:
  Optional

  --share               Enable a public shareable Gradio link.

**** The following options are for --headless mode only:
  --headless            Run the script in headless mode
  --ebook EBOOK         Path to the ebook file for conversion. Cannot be used when --ebooks_dir is present.
  --ebooks_dir EBOOKS_DIR
                        Relative or absolute path of the directory containing the files to convert. 
                            Cannot be used when --ebook is present.
  --language LANGUAGE   Language of the e-book. Default language is set 
                            in ./lib/lang.py sed as default if not present. All compatible language codes are in ./lib/lang.py

optional parameters:
  --voice VOICE         (Optional) Path to the voice cloning file for TTS engine. 
                            Uses the default voice if not present.
  --device {cpu,gpu,mps}
                        (Optional) Pprocessor unit type for the conversion. 
                            Default is set in ./lib/conf.py if not present. Fall back to CPU if GPU not available.
  --tts_engine {xtts,bark,vits,fairseq,yourtts}
                        (Optional) Preferred TTS engine (available are: ['xtts', 'bark', 'vits', 'fairseq', 'yourtts'].
                            Default depends on the selected language. The tts engine should be compatible with the chosen language
  --custom_model CUSTOM_MODEL
                        (Optional) Path to the custom model zip file cntaining mandatory model files. 
                            Please refer to ./lib/models.py
  --fine_tuned FINE_TUNED
                        (Optional) Fine tuned model path. Default is builtin model.
  --output_format OUTPUT_FORMAT
                        (Optional) Output audio format. Default is set in ./lib/conf.py
  --temperature TEMPERATURE
                        (xtts only, optional) Temperature for the model. 
                            Default to config.json model. Higher temperatures lead to more creative outputs.
  --length_penalty LENGTH_PENALTY
                        (xtts only, optional) A length penalty applied to the autoregressive decoder. 
                            Default to config.json model. Not applied to custom models.
  --num_beams NUM_BEAMS
                        (xtts only, optional) Controls how many alternative sequences the model explores. Must be equal or greater than length penalty. 
                            Default to config.json model.
  --repetition_penalty REPETITION_PENALTY
                        (xtts only, optional) A penalty that prevents the autoregressive decoder from repeating itself. 
                            Default to config.json model.
  --top_k TOP_K         (xtts only, optional) Top-k sampling. 
                            Lower values mean more likely outputs and increased audio generation speed. 
                            Default to config.json model.
  --top_p TOP_P         (xtts only, optional) Top-p sampling. 
                            Lower values mean more likely outputs and increased audio generation speed. Default to 0.85
  --speed SPEED         (xtts only, optional) Speed factor for the speech generation. 
                            Default to config.json model.
  --enable_text_splitting
                        (xtts only, optional) Enable TTS text splitting. This option is known to not be very efficient. 
                            Default to config.json model.
  --output_dir OUTPUT_DIR
                        (Optional) Path to the output directory. Default is set in ./lib/conf.py
  --version             Show the version of the script and exit

Example usage:    
Windows:
    Gradio/GUI:
    ebook2audiobook.cmd
    Headless mode:
    ebook2audiobook.cmd --headless --ebook '/path/to/file'
Linux/Mac:
    Gradio/GUI:
    ./ebook2audiobook.sh
    Headless mode:
    ./ebook2audiobook.sh --headless --ebook '/path/to/file'

NOTE: in gradio/gui mode, to cancel a running conversion, just click on the [X] from the ebook upload component.
注意:在 gradio/gui 模式下,要取消正在运行的转换,只需单击电子书上传组件中的 [X] 即可。

Docker GPU Options Docker GPU 选项

Available tags: latest (CUDA 11), cpu, rocm, cuda11, cuda12, cuda128, xpu (x86 only)
可用标签: latest (CUDA 11)、 cpurocm cuda11 cuda12 cuda128 xpu (仅限 x86)

Running the Docker Container

运行 Docker 容器

To run the Docker container and start the Gradio interface, use the following command:
要运行 Docker 容器并启动 Gradio 接口,请使用以下命令:

-Run with CPU only

  • 仅使用 CPU 运行
复制代码
docker run --pull always --rm -p 7860:7860 athomasson2/ebook2audiobook

-Run with GPU Speedup (NVIDIA compatible only)

  • 使用 GPU 加速运行(仅兼容 NVIDIA)
复制代码
docker run --pull always --rm --gpus all -p 7860:7860 athomasson2/ebook2audiobook
Building the Docker Container

构建 Docker 容器

  • You can build the docker image with the command:
    您可以使用以下命令构建 docker 镜像:
复制代码
docker build -t athomasson2/ebook2audiobook .

This command will start the Gradio interface on port 7860.(localhost:7860)
此命令将在端口 7860 上启动 Gradio 接口。(本地主机:7860)

  • For more options add the parameter --help
    有关更多选项,请添加参数 --help

Docker container file locations

Docker 容器文件位置

All ebook2audiobooks will have the base dir of /app/ For example: tmp = /app/tmp audiobooks = /app/audiobooks
所有 ebook2audiobooks 的基本目录均为 /app/ 例如: tmp = /app/tmp audiobooks = /app/audiobooks

Docker headless guide Docker 无头指南

  • Before you do run this you need to create a dir named "input-folder" in your current dir which will be linked, This is where you can put your input files for the docker image to see
    在运行此命令之前,您需要在当前目录中创建一个名为"input-folder"的目录,该目录将被链接,这是您可以放置输入文件供 docker 映像查看的地方
复制代码
mkdir input-folder && mkdir Audiobooks
  • In the command below swap out YOUR_INPUT_FILE.TXT with the name of your input file
    在下面的命令中,将 YOUR_INPUT_FILE.TXT 替换为输入文件的名称
复制代码
docker run --pull always --rm \
    -v $(pwd)/input-folder:/app/input_folder \
    -v $(pwd)/audiobooks:/app/audiobooks \
    athomasson2/ebook2audiobook \
    --headless --ebook /input_folder/YOUR_EBOOK_FILE
  • The output Audiobooks will be found in the Audiobook folder which will also be located in your local dir you ran this docker command in
    输出有声读物将在 Audiobook 文件夹中找到,该文件夹也位于您运行此 docker 命令的本地目录中

To get the help command for the other parameters this program has you can run this

要获取该程序具有的其他参数的 help 命令,您可以运行此命令

复制代码
docker run --pull always --rm athomasson2/ebook2audiobook --help

That will output this Help command output
这将输出此 Help 命令输出

Docker Compose

This project uses Docker Compose to run locally. You can enable or disable GPU support by setting either *gpu-enabled or *gpu-disabled in docker-compose.yml
此项目使用 Docker Compose 在本地运行。您可以通过在 *gpu-enabled *gpu-disabled docker-compose.yml

Steps to Run 运行步骤
  1. Clone the Repository (if you haven't already):
    克隆存储库(如果尚未克隆):

    复制代码
    git clone https://github.com/DrewThomasson/ebook2audiobook.git
    cd ebook2audiobook
  2. Set GPU Support (disabled by default) To enable GPU support, modify docker-compose.yml and change *gpu-disabled to *gpu-enabled
    设置 GPU 支持(默认处于禁用状态)要启用 GPU 支持,请修改 docker-compose.yml 并更改为 *gpu-disabled *gpu-enabled

  3. Start the service: 启动服务:

    复制代码
    docker-compose up -d
  4. Access the service: The service will be available at http://localhost:7860.
    访问服务:该服务将在 http://localhost:7860 提供。

Common Docker Issues 常见的 Docker 问题

  • python: can't open file '/home/user/app/app.py': [Errno 2] No such file or directory (Just remove all post arguments as I replaced the CMD with ENTRYPOINT in the Dockerfile)
    python: can't open file '/home/user/app/app.py': [Errno 2] No such file or directory (只需删除所有 post 参数,因为我在 Dockerfile 中替换了 CMD with ENTRYPOINT

    • Example: docker run --pull always athomasson2/ebook2audiobook app.py --script_mode full_docker - > corrected - > docker run --pull always athomasson2/ebook2audiobook
      示例: docker run --pull always athomasson2/ebook2audiobook app.py --script_mode full_docker - 已更正> - > docker run --pull always athomasson2/ebook2audiobook
    • Arguments can be easily added like this now docker run --pull always athomasson2/ebook2audiobook --share
      现在可以 docker run --pull always athomasson2/ebook2audiobook --share 像这样轻松添加参数
  • Docker gets stuck downloading Fine-Tuned models. (This does not happen for every computer but some appear to run into this issue) Disabling the progress bar appears to fix the issue, as discussed here in #191 Example of adding this fix in the docker run command

    Docker 在下载 Fine-Tuned 模型时遇到困难。(并非每台计算机都会发生这种情况,但有些计算机似乎会遇到此问题)禁用进度条似乎可以解决问题,如 #191 在 docker run 命令中添加此修复程序的示例中所述

复制代码
docker run --pull always --rm --gpus all -e HF_HUB_DISABLE_PROGRESS_BARS=1 -e HF_HUB_ENABLE_HF_TRANSFER=0 \
    -p 7860:7860 athomasson2/ebook2audiobook

Supported eBook Formats 支持的电子书格式

  • .epub, .pdf, .mobi, .txt, .html, .rtf, .chm, .lit, .pdb, .fb2, .odt, .cbr, .cbz, .prc, .lrf, .pml, .snb, .cbc, .rb, .tcr
  • Best results : .epub or .mobi for automatic chapter detection
    最佳结果: .epub.mobi 用于自动章节检测

Output Formats 输出格式

  • Creates a ['m4b', 'm4a', 'mp4', 'webm', 'mov', 'mp3', 'flac', 'wav', 'ogg', 'aac'] (set in ./lib/conf.py) file with metadata and chapters.
    创建一个 ['m4b', 'm4a', 'mp4', 'webm', 'mov', 'mp3', 'flac', 'wav', 'ogg', 'aac'] 包含元数据和章节的(在 ./lib/conf.py 中设置)文件。

六、软件下载

夸克网盘分享

本文信息来源于GitHub作者地址:https://github.com/DrewThomasson/ebook2audiobook

相关推荐
龙湾开发5 分钟前
轻量级高性能推理引擎MNN 学习笔记 02.MNN主要API
人工智能·笔记·学习·机器学习·mnn
CopyLower28 分钟前
Java与AI技术结合:从机器学习到生成式AI的实践
java·人工智能·机器学习
workflower38 分钟前
使用谱聚类将相似度矩阵分为2类
人工智能·深度学习·算法·机器学习·设计模式·软件工程·软件需求
jndingxin41 分钟前
OpenCV CUDA 模块中在 GPU 上对图像或矩阵进行 翻转(镜像)操作的一个函数 flip()
人工智能·opencv
文人sec44 分钟前
接口自动化测试设计思路--设计实战
python·https·单元测试·自动化·pytest
囚生CY1 小时前
【速写】TRL:Trainer的细节与思考(PPO/DPO+LoRA可行性)
人工智能
杨德兴1 小时前
3.3 阶数的作用
人工智能·学习
望获linux1 小时前
医疗实时操作系统方案:手术机器人的微秒级运动控制
人工智能·机器人·实时操作系统·rtos·嵌入式软件·医疗自动化
仓颉编程语言1 小时前
仓颉Magic亮相GOSIM AI Paris 2025:掀起开源AI框架新热潮
人工智能·华为·开源·鸿蒙·仓颉编程语言
攻城狮7号1 小时前
一文理清人工智能,机器学习,深度学习的概念
人工智能·深度学习·机器学习·ai