Cherry-pick to r1.4 branch (#3798)
* [TTS]add Diffsinger with opencpop dataset (#3005)

* Update requirements.txt

* fix vits reduce_sum's input/output dtype, test=tts (#3028)

* [TTS] add opencpop PWGAN example (#3031)

* add opencpop voc, test=tts

* soft link

* Update textnorm_test_cases.txt

* [TTS] add opencpop HIFIGAN example (#3038)

* add opencpop voc, test=tts

* soft link

* add opencpop hifigan, test=tts

* update

* fix dtype diff of last expand_v2 op of VITS (#3041)

* [ASR]add squeezeformer model (#2755)

* add squeezeformer model

* change CodeStyle, test=asr

* change CodeStyle, test=asr

* fix subsample rate error, test=asr

* merge classes as required, test=asr

* change CodeStyle, test=asr

* fix missing code, test=asr

* split code to new file, test=asr

* remove rel_shift, test=asr

* Update README.md

* Update README_cn.md

* Update README.md

* Update README_cn.md

* Update README.md

* fix input dtype of elementwise_mul op from bool to int64 (#3054)

* [TTS] add svs frontend (#3062)

* [TTS]clean starganv2 vc model code and add docstring (#2987)

* clean code

* add docstring

* [Doc] change define asr server config to chunk asr config, test=doc (#3067)

* Update README.md

* Update README_cn.md

* get music score, test=doc (#3070)

* [TTS]fix elementwise_floordiv's fill_constant (#3075)

* fix elementwise_floordiv's fill_constant

* add float converter for min_value in attention

* fix paddle2onnx's install version, install the newest paddle2onnx in run.sh (#3084)

* [TTS] update svs_music_score.md (#3085)

* rm unused dep, test=tts (#3097)

* Update bug-report-tts.md (#3120)

* [TTS]Fix VITS lite infer (#3098)

* [TTS]add starganv2 vc trainer (#3143)

* add starganv2 vc trainer

* fix StarGANv2VCUpdater and losses

* fix StarGANv2VCEvaluator

* add some typehint

* [TTS]【Hackathon No.190】Model reproduction: iSTFTNet (#3006)

* iSTFTNet implementation based on hifigan; does not affect the function and execution of HIFIGAN

* modify the comment in iSTFT.yaml

* add the comments in hifigan

* iSTFTNet implementation based on hifigan; does not affect the function and execution of HIFIGAN

* modify the comment in iSTFT.yaml

* add the comments in hifigan

* add iSTFTNet.md

* modify the format of iSTFTNet.md

* modify iSTFT.yaml and hifigan.py

* Format code using pre-commit

* modify hifigan.py: delete the unused self.istft_layer_id, move self.output_conv behind the else branch, rename conv_post to output_conv

* update iSTFTNet_csmsc_ckpt.zip download link

* modify iSTFTNet.md

* modify hifigan.py and iSTFT.yaml

* modify iSTFTNet.md
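
iSTFTNet keeps most of the HiFiGAN generator but replaces the final upsampling stages with predicted magnitude and phase spectrograms followed by an inverse STFT. A minimal sketch of that last synthesis step; the shapes are purely hypothetical and random arrays stand in for the network outputs:

```python
import numpy as np
from scipy.signal import istft

# Hypothetical network outputs: magnitude and phase with shape
# (freq_bins, frames). In iSTFTNet these come from conv layers;
# random values stand in for them here.
rng = np.random.default_rng(0)
mag = rng.random((33, 20))                    # 33 bins <-> n_fft = 64
phase = rng.uniform(-np.pi, np.pi, (33, 20))

spec = mag * np.exp(1j * phase)               # complex spectrogram
_, wav = istft(spec, nperseg=64)              # inverse STFT -> waveform
print(wav.ndim)                               # a 1-D audio signal
```

Which layer the inverse STFT replaces is what the iSTFT.yaml config mentioned above controls.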

* add function for generating srt file (#3123)

* add function for generating srt file

Based on the original websocket_client.py, add the ability to generate an SRT subtitle file from a wav or mp3 audio file.

* add function for generating srt file

Based on the original websocket_client.py, add the ability to generate an SRT subtitle file from a wav or mp3 audio file.

* keep origin websocket_client.py

Restore the original websocket_client.py file.

* add generating subtitle function into README

* add generate subtitle function into README

* add subtitle generation function

* add subtitle generation function
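
The subtitle feature above turns recognized segments with timestamps into .srt text. A self-contained sketch of that formatting step (the Segment container and helper names are illustrative, not the actual websocket_client.py API):

```python
from dataclasses import dataclass

@dataclass
class Segment:          # illustrative container for one recognized chunk
    start: float        # seconds
    end: float
    text: str

def fmt_time(t: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(t * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Render segments as SRT blocks: index, time range, text."""
    blocks = []
    for i, seg in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{fmt_time(seg.start)} --> {fmt_time(seg.end)}\n{seg.text}\n")
    return "\n".join(blocks)

print(to_srt([Segment(0.0, 1.5, "hello"), Segment(1.5, 3.25, "world")]))
```

Each block is an index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the text, separated by blank lines.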

* fix example/aishell local/train.sh if condition bug, test=asr (#3146)

* fix some preprocess bugs (#3155)

* add amp for U2 conformer.

* fix scaler save

* fix scaler save and load.

* move scaler.unscale_ below grad_clip.

* [TTS]add StarGANv2VC preprocess (#3163)

* [TTS] [Hackathon] Add JETS (#3109)

* Update quick_start.md (#3175)

* [BUG] Fix progress bar unit. (#3177)

* Update quick_start_cn.md (#3176)

* [TTS]StarGANv2 VC fix some trainer bugs, add reset_parameters (#3182)

* VITS learning rate revised, test=tts

* VITS learning rate revised, test=tts

* [s2t] mv dataset into paddlespeech.dataset (#3183)

* mv dataset into paddlespeech.dataset

* add aidatatang

* fix import

* Fix some typos. (#3178)

* [s2t] move s2t data preprocess into paddlespeech.dataset (#3189)

* move s2t data preprocess into paddlespeech.dataset

* avg model, compute wer, format rsl into paddlespeech.dataset

* fix format rsl

* fix avg ckpts

* Update pretrained model in README (#3193)

* [TTS]Fix losses of StarGAN v2 VC   (#3184)

* VITS learning rate revised, test=tts

* VITS learning rate revised, test=tts

* add new aishell model for better CER.

* add readme

* [s2t] fix cli args to config (#3194)

* fix cli args to config

* fix train cli

* Update README.md

* [ASR] Support Hubert, fine-tuned on the librispeech dataset (#3088)

* librispeech hubert, test=asr

* librispeech hubert, test=asr

* hubert decode

* review

* copyright, notes, example related

* hubert cli

* pre-commit format

* fix conflicts

* fix conflicts

* doc related

* doc and train config

* librispeech.py

* support hubert cli

* [ASR] fix asr 0-d tensor. (#3214)

* Update README.md

* Update README.md

* fix: 🐛 Fix the server-side python ASREngine failing to use the conformer_talcs model (#3230)

* fix: 🐛 fix python ASREngine not passing codeswitch

* docs: 📝 Update Docs

* Change the model detection logic

* Adding WavLM implementation

* fix model m5s

* Code clean up according to comments in #3242

* fix error in tts/st

* Changed the path for the uploaded weight

* Update phonecode.py

 # Fix the incorrect landline-number regex.
Reference: https://github.com/speechio/chinese_text_normalization/blob/master/python/cn_tn.py
The corrected landline regex:
 pattern = re.compile(r"\D((0(10|2[1-3]|[3-9]\d{2})-?)?[1-9]\d{6,7})\D")
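
The corrected pattern can be sanity-checked directly; the sample string below is just an illustration:

```python
import re

# Landline regex from the fix above: an optional area code (010,
# 021-023, or 0 plus a 3-digit code starting with 3-9) followed by a
# 7-8 digit subscriber number, bounded by non-digit characters.
pattern = re.compile(r"\D((0(10|2[1-3]|[3-9]\d{2})-?)?[1-9]\d{6,7})\D")

m = pattern.search("联系电话:010-62345678,欢迎来电")
print(m.group(1))  # → 010-62345678
```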

* Adapted wavlmASR model to pretrained weights and CLI

* Changed the MD5 of the pretrained tar file due to bug fixes

* Deleted examples/librispeech/asr5/format_rsl.py

* Update released_model.md

* Code clean up for CIs

* Fixed the transpose usages ignored before

* Update setup.py

* refactor mfa scripts

* Final cleaning; Modified SSL/infer.py and README for wavlm inclusion in model options

* updating readme and readme_cn

* remove tsinghua pypi

* Update setup.py (#3294)

* Update setup.py

* refactor rhy

* fix ckpt

* add dtype param for arange API. (#3302)

* add scripts for tts code switch

* add t2s assets

* more comment on tts frontend

* pin librosa==0.8.1 and numpy==1.23.5 so paddleaudio aligns with these versions

* move ssl into t2s.frontend; fix spk_id for 0-D tensor;

* add ssml unit test

* add en_frontend file

* add mix frontend test

* fix long text oom using ssml; filter comma; update polyphonic

* remove print

* hotfix english G2P

* en frontend unit test

* fix profiler (#3323)

* old grad clip has 0d tensor problem, fix it (#3334)

* update to py3.8

* remove fluid.

* add roformer

* fix bugs

* add roformer result

* support position interpolation for longer attention context window length.

* RoPE with position interpolation

* rope for streaming decoding

* update result

* fix rotary embedding

* Update README.md

* fix weight decay

* fix develop view conflict with model's

* Add XPU support for SpeedySpeech (#3502)

* Add XPU support for SpeedySpeech

* fix typos

* update description of nxpu

* Add XPU support for FastSpeech2 (#3514)

* Add XPU support for FastSpeech2

* optimize

* Update ge2e_clone.py (#3517)

Fix the multiple-space error on Windows

* Fix Readme. (#3527)

* Update README.md

* Update README_cn.md

* Update README_cn.md

* Update README.md

* FIX: Added missing imports

* FIX: Fixed the implementation of a special method

* 【benchmark】add max_mem_reserved for benchmark  (#3604)

* fix profiler

* add max_mem_reserved for benchmark

* fix develop bug: change function view to reshape (#3633)

* 【benchmark】fix gpu_mem unit (#3634)

* fix profiler

* add max_mem_reserved for benchmark

* fix benchmark

* Add file encoding detection when reading files (#3606)

Fixed #3605

* bugfix: audio_len should be 1-D, not 0-D, which would raise a list index out of range error in the following decode process (#3490)

Co-authored-by: Luzhenhui <[email protected]>
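
The failure mode behind that fix is easy to reproduce outside the toolkit; here numpy stands in for the framework tensor (illustrative only):

```python
import numpy as np

# A 0-D array is a bare scalar with no axes, so positional indexing
# fails; the fix keeps audio_len as a 1-D, length-1 array instead.
scalar_len = np.array(120)    # 0-D
vector_len = np.array([120])  # 1-D

assert vector_len.ndim == 1 and vector_len[0] == 120   # fine
try:
    scalar_len[0]             # indexing a 0-D array raises
except IndexError:
    print("0-D audio_len cannot be indexed")
```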

* Update README.md (#3532)

Fixed a typo

* fixed version for paddlepaddle. (#3701)

* fixed version for paddlepaddle.

* fix code style

* 【Fix Speech Issue No.5】issue 3444 transformation import error (#3779)

* fix paddlespeech.s2t.transform.transformation import error

* fix paddlespeech.s2t.transform import error

* 【Fix Speech Issue No.8】issue 3652 merge_yi function has a bug (#3786)

* 【Fix Speech Issue No.8】issue 3652 merge_yi function has a bug

* 【Fix Speech Issue No.8】issue 3652 merge_yi function has a bug

* 【test】add cli test readme (#3784)

* add cli test readme

* fix code style

* 【test】fix test cli bug (#3793)

* add cli test readme

* fix code style

* fix bug

* Update setup.py (#3795)

* adapt view behavior change, fix KeyError. (#3794)

* adapt view behavior change, fix KeyError.

* fix readme demo run error.

* fixed opencc version

---------

Co-authored-by: liangym <[email protected]>
Co-authored-by: TianYuan <[email protected]>
Co-authored-by: 夜雨飘零 <[email protected]>
Co-authored-by: zxcd <[email protected]>
Co-authored-by: longRookie <[email protected]>
Co-authored-by: twoDogy <[email protected]>
Co-authored-by: lemondy <[email protected]>
Co-authored-by: ljhzxc <[email protected]>
Co-authored-by: PiaoYang <[email protected]>
Co-authored-by: WongLaw <[email protected]>
Co-authored-by: Hui Zhang <[email protected]>
Co-authored-by: Shuangchi He <[email protected]>
Co-authored-by: TianHao Zhang <[email protected]>
Co-authored-by: guanyc <[email protected]>
Co-authored-by: jiamingkong <[email protected]>
Co-authored-by: zoooo0820 <[email protected]>
Co-authored-by: shuishu <[email protected]>
Co-authored-by: LixinGuo <[email protected]>
Co-authored-by: gmm <[email protected]>
Co-authored-by: Wang Huan <[email protected]>
Co-authored-by: Kai Song <[email protected]>
Co-authored-by: skyboooox <[email protected]>
Co-authored-by: fazledyn-or <[email protected]>
Co-authored-by: luyao-cv <[email protected]>
Co-authored-by: Color_yr <[email protected]>
Co-authored-by: JeffLu <[email protected]>
Co-authored-by: Luzhenhui <[email protected]>
Co-authored-by: satani99 <[email protected]>
Co-authored-by: mjxs <[email protected]>
Co-authored-by: Mattheliu <[email protected]>
1 parent 9d61b8c commit 7b78036
Showing 478 changed files with 30,928 additions and 3,483 deletions.
2 changes: 1 addition & 1 deletion .github/CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -27,4 +27,4 @@ git commit -m "xxxxxx, test=doc"
1. 虽然跳过了 CI,但是还要先排队排到才能跳过,所以非自己方向看到 pending 不要着急 🤣
2.`git commit --amend` 的时候才加 `test=xxx` 可能不太有效
3. 一个 pr 多次提交 commit 注意每次都要加 `test=xxx`,因为每个 commit 都会触发 CI
4. 删除 python 环境中已经安装好的的 paddlespeech,否则可能会影响 import paddlespeech 的顺序</div>
4. 删除 python 环境中已经安装好的 paddlespeech,否则可能会影响 import paddlespeech 的顺序</div>
1 change: 0 additions & 1 deletion .github/ISSUE_TEMPLATE/bug-report-tts.md
@@ -3,7 +3,6 @@ name: "\U0001F41B TTS Bug Report"
about: Create a report to help us improve
title: "[TTS]XXXX"
labels: Bug, T2S
assignees: yt605155624

---

61 changes: 36 additions & 25 deletions README.md
@@ -178,6 +178,13 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

### Recent Update
- 👑 2023.05.31: Add [WavLM ASR-en](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/librispeech/asr5), WavLM fine-tuning for ASR on LibriSpeech.
- 👑 2023.05.04: Add [HuBERT ASR-en](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/librispeech/asr4), HuBERT fine-tuning for ASR on LibriSpeech.
- ⚡ 2023.04.28: Fix [0-d tensor](https://github.com/PaddlePaddle/PaddleSpeech/pull/3214), with the upgrade of paddlepaddle==2.5, the problem of modifying 0-d tensor has been solved.
- 👑 2023.04.25: Add [AMP for U2 conformer](https://github.com/PaddlePaddle/PaddleSpeech/pull/3167).
- 🔥 2023.04.06: Add [subtitle file (.srt format) generation example](./demos/streaming_asr_server).
- 👑 2023.04.25: Add [AMP for U2 conformer](https://github.com/PaddlePaddle/PaddleSpeech/pull/3167).
- 🔥 2023.03.14: Add SVS(Singing Voice Synthesis) examples with Opencpop dataset, including [DiffSinger](./examples/opencpop/svs1)[PWGAN](./examples/opencpop/voc1) and [HiFiGAN](./examples/opencpop/voc5), the effect is continuously optimized.
- 👑 2023.03.09: Add [Wav2vec2ASR-zh](./examples/aishell/asr3).
- 🎉 2023.03.07: Add [TTS ARM Linux C++ Demo](./demos/TTSArmLinux).
- 🔥 2023.03.03 Add Voice Conversion [StarGANv2-VC synthesize pipeline](./examples/vctk/vc3).
@@ -221,13 +228,13 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision

## Installation

We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.7* and *paddlepaddle>=2.4.1*.
We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.8* and *paddlepaddle<=2.5.1*. Some new versions of Paddle do not have support for adaptation in PaddleSpeech, so currently only versions 2.5.1 and earlier can be supported.

### **Dependency Introduction**

+ gcc >= 4.8.5
+ paddlepaddle >= 2.4.1
+ python >= 3.7
+ paddlepaddle <= 2.5.1
+ python >= 3.8
+ OS support: Linux(recommend), Windows, Mac OSX

PaddleSpeech depends on paddlepaddle. For installation, please refer to the official website of [paddlepaddle](https://www.paddlepaddle.org.cn/en) and choose according to your own machine. Here is an example of the cpu version.
@@ -577,14 +584,14 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
</thead>
<tbody>
<tr>
<td> Text Frontend </td>
<td colspan="2"> &emsp; </td>
<td>
<a href = "./examples/other/tn">tn</a> / <a href = "./examples/other/g2p">g2p</a>
</td>
<td> Text Frontend </td>
<td colspan="2"> &emsp; </td>
<td>
<a href = "./examples/other/tn">tn</a> / <a href = "./examples/other/g2p">g2p</a>
</td>
</tr>
<tr>
<td rowspan="5">Acoustic Model</td>
<td rowspan="6">Acoustic Model</td>
<td>Tacotron2</td>
<td>LJSpeech / CSMSC</td>
<td>
@@ -619,6 +626,13 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
<a href = "./examples/vctk/ernie_sat">ERNIE-SAT-vctk</a> / <a href = "./examples/aishell3/ernie_sat">ERNIE-SAT-aishell3</a> / <a href = "./examples/aishell3_vctk/ernie_sat">ERNIE-SAT-zh_en</a>
</td>
</tr>
<tr>
<td>DiffSinger</td>
<td>Opencpop</td>
<td>
<a href = "./examples/opencpop/svs1">DiffSinger-opencpop</a>
</td>
</tr>
<tr>
<td rowspan="6">Vocoder</td>
<td >WaveFlow</td>
@@ -629,9 +643,9 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
</tr>
<tr>
<td >Parallel WaveGAN</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop</td>
<td>
<a href = "./examples/ljspeech/voc1">PWGAN-ljspeech</a> / <a href = "./examples/vctk/voc1">PWGAN-vctk</a> / <a href = "./examples/csmsc/voc1">PWGAN-csmsc</a> / <a href = "./examples/aishell3/voc1">PWGAN-aishell3</a>
<a href = "./examples/ljspeech/voc1">PWGAN-ljspeech</a> / <a href = "./examples/vctk/voc1">PWGAN-vctk</a> / <a href = "./examples/csmsc/voc1">PWGAN-csmsc</a> / <a href = "./examples/aishell3/voc1">PWGAN-aishell3</a> / <a href = "./examples/opencpop/voc1">PWGAN-opencpop</a>
</td>
</tr>
<tr>
@@ -650,9 +664,9 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
</tr>
<tr>
<td>HiFiGAN</td>
<td>LJSpeech / VCTK / CSMSC / AISHELL-3</td>
<td>LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop</td>
<td>
<a href = "./examples/ljspeech/voc5">HiFiGAN-ljspeech</a> / <a href = "./examples/vctk/voc5">HiFiGAN-vctk</a> / <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a> / <a href = "./examples/aishell3/voc5">HiFiGAN-aishell3</a>
<a href = "./examples/ljspeech/voc5">HiFiGAN-ljspeech</a> / <a href = "./examples/vctk/voc5">HiFiGAN-vctk</a> / <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a> / <a href = "./examples/aishell3/voc5">HiFiGAN-aishell3</a> / <a href = "./examples/opencpop/voc5">HiFiGAN-opencpop</a>
</td>
</tr>
<tr>
@@ -880,15 +894,20 @@ The Text-to-Speech module is originally called [Parakeet](https://github.com/Pad

- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): Use PaddleSpeech TTS and ASR to clone voice from videos.**

<div align="center">
<img src="https://raw.githubusercontent.com/jerryuhoo/VTuberTalk/main/gui/gui.png" width = "500px" />
</div>


## Citation

To cite PaddleSpeech for research, please use the following format.

```text
@inproceedings{zhang2022paddlespeech,
title = {PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit},
author = {Hui Zhang, Tian Yuan, Junkun Chen, Xintong Li, Renjie Zheng, Yuxin Huang, Xiaojie Chen, Enlei Gong, Zeyu Chen, Xiaoguang Hu, dianhai yu, Yanjun Ma, Liang Huang},
booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations},
year = {2022},
publisher = {Association for Computational Linguistics},
}
@InProceedings{pmlr-v162-bai22d,
title = {{A}$^3${T}: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing},
author = {Bai, He and Zheng, Renjie and Chen, Junkun and Ma, Mingbo and Li, Xintong and Huang, Liang},
Expand All @@ -903,14 +922,6 @@ To cite PaddleSpeech for research, please use the following format.
url = {https://proceedings.mlr.press/v162/bai22d.html},
}
@inproceedings{zhang2022paddlespeech,
title = {PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit},
author = {Hui Zhang, Tian Yuan, Junkun Chen, Xintong Li, Renjie Zheng, Yuxin Huang, Xiaojie Chen, Enlei Gong, Zeyu Chen, Xiaoguang Hu, dianhai yu, Yanjun Ma, Liang Huang},
booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations},
year = {2022},
publisher = {Association for Computational Linguistics},
}
@inproceedings{zheng2021fused,
title={Fused acoustic and text encoding for multimodal bilingual pretraining and speech translation},
author={Zheng, Renjie and Chen, Junkun and Ma, Mingbo and Huang, Liang},
55 changes: 35 additions & 20 deletions README_cn.md
@@ -8,7 +8,7 @@
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-red.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleSpeech?color=ffa"></a>
<a href="support os"><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.8+-aff.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleSpeech?color=9ea"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/PaddleSpeech?color=3af"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleSpeech?color=9cc"></a>
@@ -183,6 +183,13 @@
- 🧩 级联模型应用: 作为传统语音任务的扩展,我们结合了自然语言处理、计算机视觉等任务,实现更接近实际需求的产业级应用。

### 近期更新
- 👑 2023.05.31: 新增 [WavLM ASR-en](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/librispeech/asr5), 基于WavLM的英语识别微调,使用LibriSpeech数据集
- 👑 2023.05.04: 新增 [HuBERT ASR-en](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/librispeech/asr4), 基于HuBERT的英语识别微调,使用LibriSpeech数据集
- ⚡ 2023.04.28: 修正 [0-d tensor](https://github.com/PaddlePaddle/PaddleSpeech/pull/3214), 配合PaddlePaddle2.5升级修改了0-d tensor的问题。
- 👑 2023.04.25: 新增 [U2 conformer 的 AMP 训练](https://github.com/PaddlePaddle/PaddleSpeech/pull/3167).
- 👑 2023.04.06: 新增 [srt格式字幕生成功能](./demos/streaming_asr_server)
- 👑 2023.04.25: 新增 [U2 conformer 的 AMP 训练](https://github.com/PaddlePaddle/PaddleSpeech/pull/3167).
- 🔥 2023.03.14: 新增基于 Opencpop 数据集的 SVS (歌唱合成) 示例,包含 [DiffSinger](./examples/opencpop/svs1)[PWGAN](./examples/opencpop/voc1)[HiFiGAN](./examples/opencpop/voc5),效果持续优化中。
- 👑 2023.03.09: 新增 [Wav2vec2ASR-zh](./examples/aishell/asr3)
- 🎉 2023.03.07: 新增 [TTS ARM Linux C++ 部署示例](./demos/TTSArmLinux)
- 🔥 2023.03.03: 新增声音转换模型 [StarGANv2-VC 合成流程](./examples/vctk/vc3)
@@ -231,12 +238,12 @@
<a name="安装"></a>
## 安装

我们强烈建议用户在 **Linux** 环境下,*3.7* 以上版本的 *python* 上安装 PaddleSpeech。
我们强烈建议用户在 **Linux** 环境下,*3.8* 以上版本的 *python* 上安装 PaddleSpeech。同时,有一些Paddle新版本的内容没有在做适配的支持,因此目前只能使用2.5.1及之前的版本

### 相关依赖
+ gcc >= 4.8.5
+ paddlepaddle >= 2.4.1
+ python >= 3.7
+ paddlepaddle <= 2.5.1
+ python >= 3.8
+ linux(推荐), mac, windows

PaddleSpeech 依赖于 paddlepaddle,安装可以参考[ paddlepaddle 官网](https://www.paddlepaddle.org.cn/),根据自己机器的情况进行选择。这里给出 cpu 版本示例,其它版本大家可以根据自己机器的情况进行安装。
@@ -576,43 +583,50 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
<td>
<a href = "./examples/other/tn">tn</a> / <a href = "./examples/other/g2p">g2p</a>
</td>
</tr>
<tr>
<td rowspan="5">声学模型</td>
</tr>
<tr>
<td rowspan="6">声学模型</td>
<td>Tacotron2</td>
<td>LJSpeech / CSMSC</td>
<td>
<a href = "./examples/ljspeech/tts0">tacotron2-ljspeech</a> / <a href = "./examples/csmsc/tts0">tacotron2-csmsc</a>
</td>
</tr>
<tr>
</tr>
<tr>
<td>Transformer TTS</td>
<td>LJSpeech</td>
<td>
<a href = "./examples/ljspeech/tts1">transformer-ljspeech</a>
</td>
</tr>
<tr>
</tr>
<tr>
<td>SpeedySpeech</td>
<td>CSMSC</td>
<td >
<a href = "./examples/csmsc/tts2">speedyspeech-csmsc</a>
</td>
</tr>
<tr>
</tr>
<tr>
<td>FastSpeech2</td>
<td>LJSpeech / VCTK / CSMSC / AISHELL-3 / ZH_EN / finetune</td>
<td>
<a href = "./examples/ljspeech/tts3">fastspeech2-ljspeech</a> / <a href = "./examples/vctk/tts3">fastspeech2-vctk</a> / <a href = "./examples/csmsc/tts3">fastspeech2-csmsc</a> / <a href = "./examples/aishell3/tts3">fastspeech2-aishell3</a> / <a href = "./examples/zh_en_tts/tts3">fastspeech2-zh_en</a> / <a href = "./examples/other/tts_finetune/tts3">fastspeech2-finetune</a>
</td>
</tr>
<tr>
</tr>
<tr>
<td><a href = "https://arxiv.org/abs/2211.03545">ERNIE-SAT</a></td>
<td>VCTK / AISHELL-3 / ZH_EN</td>
<td>
<a href = "./examples/vctk/ernie_sat">ERNIE-SAT-vctk</a> / <a href = "./examples/aishell3/ernie_sat">ERNIE-SAT-aishell3</a> / <a href = "./examples/aishell3_vctk/ernie_sat">ERNIE-SAT-zh_en</a>
</td>
</tr>
</tr>
<tr>
<td>DiffSinger</td>
<td>Opencpop</td>
<td>
<a href = "./examples/opencpop/svs1">DiffSinger-opencpop</a>
</td>
</tr>
<tr>
<td rowspan="6">声码器</td>
<td >WaveFlow</td>
@@ -623,9 +637,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
</tr>
<tr>
<td >Parallel WaveGAN</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop</td>
<td>
<a href = "./examples/ljspeech/voc1">PWGAN-ljspeech</a> / <a href = "./examples/vctk/voc1">PWGAN-vctk</a> / <a href = "./examples/csmsc/voc1">PWGAN-csmsc</a> / <a href = "./examples/aishell3/voc1">PWGAN-aishell3</a>
<a href = "./examples/ljspeech/voc1">PWGAN-ljspeech</a> / <a href = "./examples/vctk/voc1">PWGAN-vctk</a> / <a href = "./examples/csmsc/voc1">PWGAN-csmsc</a> / <a href = "./examples/aishell3/voc1">PWGAN-aishell3</a> / <a href = "./examples/opencpop/voc1">PWGAN-opencpop</a>
</td>
</tr>
<tr>
@@ -644,9 +658,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
</tr>
<tr>
<td >HiFiGAN</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop</td>
<td>
<a href = "./examples/ljspeech/voc5">HiFiGAN-ljspeech</a> / <a href = "./examples/vctk/voc5">HiFiGAN-vctk</a> / <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a> / <a href = "./examples/aishell3/voc5">HiFiGAN-aishell3</a>
<a href = "./examples/ljspeech/voc5">HiFiGAN-ljspeech</a> / <a href = "./examples/vctk/voc5">HiFiGAN-vctk</a> / <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a> / <a href = "./examples/aishell3/voc5">HiFiGAN-aishell3</a> / <a href = "./examples/opencpop/voc5">HiFiGAN-opencpop</a>
</td>
</tr>
<tr>
@@ -703,6 +717,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
</tbody>
</table>


<a name="声音分类模型"></a>
**声音分类**

2 changes: 1 addition & 1 deletion audio/paddleaudio/backends/soundfile_backend.py
@@ -191,7 +191,7 @@ def soundfile_save(y: np.ndarray, sr: int, file: os.PathLike) -> None:

if sr <= 0:
raise ParameterError(
f'Sample rate should be larger than 0, recieved sr = {sr}')
f'Sample rate should be larger than 0, received sr = {sr}')

if y.dtype not in ['int16', 'int8']:
warnings.warn(
6 changes: 4 additions & 2 deletions audio/setup.py
@@ -34,12 +34,14 @@

ROOT_DIR = Path(__file__).parent.resolve()

VERSION = '1.1.0'
VERSION = '1.2.0'
COMMITID = 'none'

base = [
"kaldiio",
# paddleaudio align with librosa==0.8.1, which need numpy==1.23.x
"librosa==0.8.1",
"numpy==1.23.5",
"kaldiio",
"pathos",
"pybind11",
"parameterized",
2 changes: 1 addition & 1 deletion audio/tests/features/base.py
@@ -37,7 +37,7 @@ def initWavInput(self, url=wav_url):
self.waveform, self.sr = load(os.path.abspath(os.path.basename(url)))
self.waveform = self.waveform.astype(
np.float32
) # paddlespeech.s2t.transform.spectrogram only supports float32
) # paddlespeech.audio.transform.spectrogram only supports float32
dim = len(self.waveform.shape)

assert dim in [1, 2]
4 changes: 2 additions & 2 deletions audio/tests/features/test_istft.py
@@ -18,8 +18,8 @@
from paddleaudio.functional.window import get_window

from .base import FeatTest
from paddlespeech.s2t.transform.spectrogram import IStft
from paddlespeech.s2t.transform.spectrogram import Stft
from paddlespeech.audio.transform.spectrogram import IStft
from paddlespeech.audio.transform.spectrogram import Stft


class TestIstft(FeatTest):
2 changes: 1 addition & 1 deletion audio/tests/features/test_log_melspectrogram.py
@@ -18,7 +18,7 @@
import paddleaudio

from .base import FeatTest
from paddlespeech.s2t.transform.spectrogram import LogMelSpectrogram
from paddlespeech.audio.transform.spectrogram import LogMelSpectrogram


class TestLogMelSpectrogram(FeatTest):
2 changes: 1 addition & 1 deletion audio/tests/features/test_spectrogram.py
@@ -18,7 +18,7 @@
import paddleaudio

from .base import FeatTest
from paddlespeech.s2t.transform.spectrogram import Spectrogram
from paddlespeech.audio.transform.spectrogram import Spectrogram


class TestSpectrogram(FeatTest):
Expand Down
2 changes: 1 addition & 1 deletion audio/tests/features/test_stft.py
@@ -18,7 +18,7 @@
from paddleaudio.functional.window import get_window

from .base import FeatTest
from paddlespeech.s2t.transform.spectrogram import Stft
from paddlespeech.audio.transform.spectrogram import Stft


class TestStft(FeatTest):
Expand Down