glados-tts/README.md

83 lines
3.4 KiB
Markdown

# GLaDOS Text-to-speech (TTS) Voice Generator
[![Build Status](https://jenkins.sudo.is/buildStatus/icon?job=ben%2Fglados-tts%2Fmain&style=flat-square)](https://jenkins.sudo.is/job/ben/job/glados-tts/)
[![git](https://git.sudo.is/shieldsio/static/v1?label=git&message=git.sudo.is/ben/glados-tts&logo=gitea&style=flat-square&logoWidth=20&color=darkgreen)](https://git.sudo.is/ben/glados-tts)
[![github](https://git.sudo.is/shieldsio/static/v1?label=github&message=benediktkr/glados-tts&logo=github&style=flat-square&logoWidth=20&color=darkgreen)](https://github.com/benediktkr/glados-tts)
[![MIT](https://git.sudo.is/shieldsio/badge/license-MIT-blue?style=flat-square)](LICENSE)
Neural network based TTS Engine.
## Notes about this fork
Forked by [`ben`](https://git.sudo.is/ben) (:github: [`@benediktkr`](https://github.com/benediktkr)) from
[`github:VRCWizard/glados-tts-voice-wizard`](https://github.com/VRCWizard/glados-tts-voice-wizard),
which in turn was a fork of
[`github:R2D2FISH/glados-tts`](https://github.com/R2D2FISH/glados-tts).
This fork modernizes and improves the Python code in the project and does a bunch of housekeeping.
* `[DONE]`: Gets rid of the `SciPy` dependency (replaced with the more modern and lightwight [`pysoundfile`](https://github.com/gooofy/py-espeak-ng) (since all it was used for was writing a `.wav` file to disk)
* `[DONE]`: Support modern stable Python 3 versions, and update dependencies.
* `[DONE]`: Versioned packages with `poetry` and `pyproject.toml`
* `[DONE]`: Configuration handling with `click`.
* `[DONE]`: Better logging with `loguru`
* `[DONE]`: Python coding style and code quality improvements (proper handling of `file` object, improved logging..)
* `[DONE]`: Switch to using ASGI with `uvicorn` and `fastapi` instead of Flask and WSGI, and support production-capable deployments as default.
* `[DONE]`: Docker support
* `[TODO]`: Support Home Assistant through the [`notify` integration](https://www.home-assistant.io/integrations/notify/)
* `[TODO]`: see if its possible to avoid `espeak-ng` as a system package dependency (python bindings, buliding the C library, etc)
No work on the speech model itself is expected.
![chell](chell.jpg)
## Description
The initial, regular Tacotron model was trained first on LJSpeech, and
then on a heavily modified version of the [Ellen
McClain](https://en.wikipedia.org/wiki/Ellen_McLain) dataset (all
non-Portal 2 voice lines removed, punctuation added).
* The Forward Tacotron model was only trained on about 600 voice lines.
* The HiFiGAN model was generated through transfer learning from the sample.
* All models have been optimized and quantized.
## Install
First you need to [install the `espeak-ng` system
packages](https://github.com/espeak-ng/espeak-ng/blob/master/docs/guide.md).
```shell
# for debian/ubuntu:
sudo apt-get install espeak-ng
# for fedora/amazon:
sudo yum install espeak-ng
```
This can hopefully be improved in the future. There is a Python
bindings for `espeak` (at a glance, found
[`py-espeak-ng`](https://github.com/gooofy/py-espeak-ng)).
Then install the poetry-managed virtualenv
```shell
poetry install
```
## Usage
If you want to just play around with the TTS, works on the shell:
```shell
poetry run gladosctl
```
The TTS engine can also run as a web server:
```shell
poetry run gladosctl restapi
```
A public instance of the http api is running at `http://www.sudo.is/api/glados`, where you can also read [api documentation](https://www.sudo.is/api/glados/docs).