Sometimes when a program won't run, it isn't necessarily your fault. It may really be someone else's.
python
# coding=utf-8
import dashscope
from dashscope.audio.tts_v2 import *
# If the API key is not configured as an environment variable, replace your-api-key with your own API key
dashscope.api_key = "your-api-key"
# Model
model = "cosyvoice-v3-flash"
# Voice
voice = "longhuhu_v3"
text = ",".join([
    "今天天气怎么样",
    "多云转晴",
    "我好像出去玩。",
])
synthesizer = SpeechSynthesizer(model=model, voice=voice, format=AudioFormat.OGG_OPUS_8KHZ_MONO_16KBPS)
with open('test.opus', 'wb') as f:
    # Synthesize the text and write the returned audio bytes to disk
    audio = synthesizer.call(text)
    f.write(audio)
print('[Metric] requestId: {}, first package delay: {} ms'.format(synthesizer.get_last_request_id(), synthesizer.get_first_package_delay()))
The code above is an Alibaba Cloud CosyVoice speech synthesis example.
WAV synthesis BUG
First, set the WAV format to 8 kHz, 16-bit:
python
synthesizer = SpeechSynthesizer(model=model, voice=voice, format=AudioFormat.WAV_8000HZ_MONO_16BIT)
The output file is below. I inspected it with ffprobe from the ffmpeg toolkit:
bash
neo@Mac test % ffprobe -i 深圳全国数字中继.wav
ffprobe version 8.0.1 Copyright (c) 2007-2025 the FFmpeg developers
built with Apple clang version 17.0.0 (clang-1700.6.3.2)
configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/8.0.1_1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gpl --enable-libsvtav1 --enable-libopus --enable-libx264 --enable-libmp3lame --enable-libdav1d --enable-libvpx --enable-libx265 --enable-videotoolbox --enable-audiotoolbox --enable-neon
libavutil 60. 8.100 / 60. 8.100
libavcodec 62. 11.100 / 62. 11.100
libavformat 62. 3.100 / 62. 3.100
libavdevice 62. 1.100 / 62. 1.100
libavfilter 11. 4.100 / 11. 4.100
libswscale 9. 1.100 / 9. 1.100
libswresample 6. 1.100 / 6. 1.100
[wav @ 0x724c38000] Packet corrupt (stream = 0, dts = NOPTS).
[wav @ 0x724c38000] Estimating duration from bitrate, this may be inaccurate
Input #0, wav, from '深圳全国数字中继.wav':
Duration: 00:00:02.65, bitrate: 128 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 8000 Hz, 1 channels, s16, 128 kb/s
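Reading the output: 8000 Hz, mono, s16 (16-bit), 128 kb/s. Note that the 128 kb/s is the bitrate (8000 Hz × 16 bit × 1 channel), not a 128-bit sample depth, so the sample format matches what was requested. What does look wrong is the "Packet corrupt" warning and the duration being estimated from the bitrate, which suggests a problem in the header or the data length.
Whatever the cause, our device consistently refuses to recognize this WAV file.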
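To narrow down where a WAV file like this goes wrong, it can help to read the RIFF header directly instead of relying on ffprobe's summary. The following is only a minimal sketch: inspect_wav is a hypothetical helper name, it assumes a plain PCM file whose data chunk is the last chunk, and it has not been run against the CosyVoice output above.

```python
import struct


def inspect_wav(data: bytes) -> dict:
    """Read the RIFF/WAVE header and compare declared sizes with the real ones.

    Assumption: the "data" chunk is the last chunk in the file.
    """
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    # The RIFF size field should equal the file length minus the 8-byte preamble.
    riff_size = struct.unpack_from("<I", data, 4)[0]
    info = {"riff_size_ok": riff_size == len(data) - 8}

    pos = 12
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
        body = pos + 8
        if chunk_id == b"fmt ":
            (_fmt_tag, channels, rate, byte_rate,
             _block_align, bits) = struct.unpack_from("<HHIIHH", data, body)
            info.update(
                channels=channels,
                sample_rate=rate,
                bits_per_sample=bits,
                # Bitrate in kb/s is derived from the byte rate, not the bit depth.
                bitrate_kbps=byte_rate * 8 / 1000,
            )
        elif chunk_id == b"data":
            info["data_size_declared"] = chunk_size
            info["data_size_actual"] = len(data) - body
            break
        # Chunks are padded to an even number of bytes.
        pos = body + chunk_size + (chunk_size & 1)
    return info
```

Calling inspect_wav(open("test.wav", "rb").read()) on a healthy 8 kHz, 16-bit mono file should show bits_per_sample of 16, a bitrate of 128 kb/s, and matching declared and actual data sizes. A mismatch between the two data sizes would be one possible explanation for warnings like the ones ffprobe printed.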
Opus synthesis BUG
Looking at the code, the format I chose is AudioFormat.OGG_OPUS_16KHZ_MONO_16KBPS:
python
# coding=utf-8
import dashscope
from dashscope.audio.tts_v2 import *
# If the API key is not configured as an environment variable, replace your-api-key with your own API key
dashscope.api_key = "your-api-key"
# Model
model = "cosyvoice-v3-flash"
# Voice
voice = "longhuhu_v3"
text = ",".join([
    "今天天气怎么样",
    "多云转晴",
    "我好像出去玩。",
])
synthesizer = SpeechSynthesizer(model=model, voice=voice, format=AudioFormat.OGG_OPUS_16KHZ_MONO_16KBPS)
with open('test.opus', 'wb') as f:
    # Synthesize the text and write the returned audio bytes to disk
    audio = synthesizer.call(text)
    f.write(audio)
print('[Metric] requestId: {}, first package delay: {} ms'.format(synthesizer.get_last_request_id(), synthesizer.get_first_package_delay()))
After synthesis, ffprobe reports the file as 48000 Hz. Damn. Decoding kept failing and I lost a whole day to it.
bash
neo@Mac test % ffprobe -i test.opus
ffprobe version 8.0.1 Copyright (c) 2007-2025 the FFmpeg developers
built with Apple clang version 17.0.0 (clang-1700.6.3.2)
configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/8.0.1_1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gpl --enable-libsvtav1 --enable-libopus --enable-libx264 --enable-libmp3lame --enable-libdav1d --enable-libvpx --enable-libx265 --enable-videotoolbox --enable-audiotoolbox --enable-neon
libavutil 60. 8.100 / 60. 8.100
libavcodec 62. 11.100 / 62. 11.100
libavformat 62. 3.100 / 62. 3.100
libavdevice 62. 1.100 / 62. 1.100
libavfilter 11. 4.100 / 11. 4.100
libswscale 9. 1.100 / 9. 1.100
libswresample 6. 1.100 / 6. 1.100
Input #0, ogg, from 'test.opus':
Duration: 00:00:09.30, start: 0.040000, bitrate: 10 kb/s
Stream #0:0: Audio: opus, 48000 Hz, mono, fltp, start 0.040000
Metadata:
ENCODE : opusenc from opus-tools of Alibaba TongYi 1.3.5
neo@Mac test %
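One caveat when reading this output: ffprobe shows 48000 Hz for Opus streams in general, because Opus is decoded at 48 kHz, while the rate of the audio that went into the encoder is stored separately in the "input sample rate" field of the OpusHead packet (RFC 7845). The sketch below reads that field so the two can be told apart. It is a shortcut under stated assumptions: opus_head_info is a hypothetical helper, and it simply searches for the OpusHead magic instead of parsing Ogg pages properly.

```python
import struct


def opus_head_info(data: bytes) -> dict:
    """Read the fixed fields of the OpusHead packet from an Ogg Opus file.

    Assumption: the first occurrence of b"OpusHead" is the real
    identification header (it normally sits in the first Ogg page).
    """
    idx = data.find(b"OpusHead")
    if idx < 0:
        raise ValueError("OpusHead packet not found")

    # After the 8-byte magic: version (u8), channel count (u8),
    # pre-skip (u16 LE), input sample rate (u32 LE).
    version, channels, pre_skip, input_rate = struct.unpack_from(
        "<BBHI", data, idx + 8
    )
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,
        "input_sample_rate": input_rate,
    }
```

If opus_head_info(open("test.opus", "rb").read()) reports an input sample rate of 16000, the encoder was fed 16 kHz audio as requested, and the 48000 Hz in ffprobe only describes the decoder output. A device limited to 8 kHz or 16 kHz would then need a decoder configured for that output rate, or a resampling step after decoding.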
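Alibaba Cloud is a slapdash operation too. The average big-tech employee is now 27 years old, so what quality of code do you expect?
Switching playback to 48 kHz solved it: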
python
import io
import pyaudio
from pydub import AudioSegment
p = pyaudio.PyAudio()
stream = p.open(
    format=pyaudio.paInt16,  # decoded Opus is 16-bit signed PCM
    channels=1,
    rate=48000,  # must match the rate of the decoded audio
    output=True,
)
# filename = "bi92t-gllpq.opus"
filename = "test.opus"
# Earlier attempt with a raw Opus decoder, kept for reference
# decoder = opuslib_next.Decoder(64000, 1)
# size = os.path.getsize(filename)
# pcm_data = decoder.decode(file.read(), size, False)
# stream.write(pcm_data)
with open(filename, "rb") as file:
    opus_data = io.BytesIO(file.read())
# Let pydub (ffmpeg) unpack the Ogg container and decode Opus to PCM
audio = AudioSegment.from_file(opus_data, codec="opus")
stream.write(audio.raw_data)
stream.stop_stream()
stream.close()
p.terminate()
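Ours is an embedded system that only supports 8 kHz and 16 kHz, so I have reported the bugs above to Alibaba Cloud. Now we wait for them to fix it.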
MP3 synthesis: not sure whether there is a problem.
What I want is 8 kHz at 128 kb/s:
python
synthesizer = SpeechSynthesizer(model=model, voice=voice, format=AudioFormat.MP3_8000HZ_MONO_128KBPS)
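The result is below. I can't make sense of it and don't know whether it is correct. 64 kb/s??? If anyone understands this, please take a look.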
bash
Input #0, mp3, from 'test.mp3':
Metadata:
encoder : LAME 64bits version 3.100 (http://lame.sf.net)
comment : LAME3.100
Duration: 00:00:05.76, start: 0.000000, bitrate: 64 kb/s
Stream #0:0: Audio: mp3 (mp3float), 8000 Hz, mono, fltp, 64 kb/s
Now test AudioFormat.MP3_48000HZ_MONO_256KBPS:
python
synthesizer = SpeechSynthesizer(model=model, voice=voice, format=AudioFormat.MP3_48000HZ_MONO_256KBPS)
The result is Audio: mp3 (mp3float), 48000 Hz, mono, fltp, 256 kb/s:
bash
Input #0, mp3, from 'test.mp3':
Metadata:
encoder : LAME 64bits version 3.100 (http://lame.sf.net)
comment : LAME3.100
Duration: 00:00:05.62, start: 0.000000, bitrate: 256 kb/s
Stream #0:0: Audio: mp3 (mp3float), 48000 Hz, mono, fltp, 256 kb/s
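Compare the two results. The 48 kHz request came back exactly as named, while AudioFormat.MP3_8000HZ_MONO_128KBPS came back as Audio: mp3 (mp3float), 8000 Hz, mono, fltp, 64 kb/s, which does not match. One likely cause, as far as I know, is that LAME caps 8 kHz (MPEG-2.5) MP3 output at 64 kb/s, in which case the 128 kb/s option can never be delivered as named.
So that settles it: MP3 synthesis has a BUG too.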
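Checking every format by eye gets tedious, so here is a rough sketch of how the comparison could be automated. It parses the expected values out of a format name such as MP3_8000HZ_MONO_128KBPS and compares them with what ffprobe reports as JSON. The helper names (parse_format_name, probe, mismatches) are made up for this post, and the name pattern is only inferred from the four format names used above. For Opus the sample rate check is not meaningful, since ffprobe reports 48000 Hz for Opus streams regardless of the input rate.

```python
import json
import re
import subprocess

# Pattern inferred from names like WAV_8000HZ_MONO_16BIT and
# OGG_OPUS_16KHZ_MONO_16KBPS; other names may not follow it.
_FORMAT_RE = re.compile(
    r"[A-Z0-9_]+?_(?P<rate>\d+)(?P<k>K?)HZ_(?P<ch>MONO|STEREO)"
    r"_(?P<val>\d+)(?P<unit>KBPS|BIT)"
)


def parse_format_name(name: str) -> dict:
    """Turn a format name into the values we expect ffprobe to report."""
    m = _FORMAT_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized format name: {name}")
    rate = int(m["rate"]) * (1000 if m["k"] else 1)
    expected = {"sample_rate": rate, "channels": 1 if m["ch"] == "MONO" else 2}
    key = "bitrate_kbps" if m["unit"] == "KBPS" else "bits_per_sample"
    expected[key] = int(m["val"])
    return expected


def probe(path: str) -> dict:
    """Return ffprobe's description of the first audio stream (needs ffprobe on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0",
         "-show_entries",
         "stream=codec_name,sample_rate,channels,bit_rate,bits_per_sample",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["streams"][0]


def mismatches(expected: dict, stream: dict) -> list:
    """List every expected value that differs from what ffprobe reported."""
    actual = {
        "sample_rate": int(stream["sample_rate"]),
        "channels": int(stream["channels"]),
    }
    if "bit_rate" in stream:
        actual["bitrate_kbps"] = round(int(stream["bit_rate"]) / 1000)
    if stream.get("bits_per_sample"):
        actual["bits_per_sample"] = int(stream["bits_per_sample"])
    return [
        f"{key}: expected {value}, got {actual[key]}"
        for key, value in expected.items()
        if key in actual and actual[key] != value
    ]
```

A typical call would be mismatches(parse_format_name("MP3_8000HZ_MONO_128KBPS"), probe("test.mp3")). An empty list means the file matches the requested format on every field that could be compared.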