
MultiSpeaker Text-to-Speech

by Raj Kumar

Convolutional sequence-to-sequence model with attention for text-to-speech synthesis


License: MIT

Tags: PyTorch TorchAudio Text-to-Speech Audio Synthesis Sequence-to-Sequence NLP nltk

Model stats and performance

Dataset used: LJSpeech
Framework: PyTorch
OS used: Linux
Inference time is reported in seconds per sample (see Stats below).

TEXT TO SPEECH

WHAT IS IT?

It is a text-to-speech model based on the Deep Voice 3 paper. The Deep Voice 3 architecture consists of three components (a minimal illustrative sketch follows the list):

  • Encoder: A fully-convolutional encoder, which converts textual features to an internal learned representation.
  • Decoder: A fully-convolutional causal decoder, which decodes the learned representation with a multi-hop convolutional attention mechanism into a low-dimensional audio representation (mel-scale spectrograms) in an autoregressive manner.
  • Converter: A fully-convolutional post-processing network, which predicts final vocoder parameters (depending on the vocoder choice) from the decoder hidden states. Unlike the decoder, the converter is non-causal and can thus depend on future context information.
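
As a rough orientation, the sketch below shows how these three components could be wired together in PyTorch. Every class name, layer size, and the simple dot-product attention used here is an illustrative assumption; this is not the code from this repository or the reference Deep Voice 3 implementation.

# Minimal illustrative sketch (assumed names and sizes, not the repository's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    # Fully-convolutional encoder: character IDs -> learned text representation.
    def __init__(self, vocab_size, channels=256, layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, channels)
        self.convs = nn.ModuleList(
            [nn.Conv1d(channels, channels, kernel_size=5, padding=2) for _ in range(layers)]
        )

    def forward(self, text_ids):
        x = self.embed(text_ids).transpose(1, 2)          # (B, C, T_text)
        for conv in self.convs:
            x = x + torch.relu(conv(x))                   # residual conv blocks
        return x.transpose(1, 2)                          # (B, T_text, C)

class Decoder(nn.Module):
    # Causal convolutional decoder with dot-product attention over the encoder output.
    def __init__(self, mel_dim=80, channels=256, layers=4, kernel_size=5):
        super().__init__()
        self.prenet = nn.Linear(mel_dim, channels)
        self.kernel_size = kernel_size
        self.convs = nn.ModuleList(
            [nn.Conv1d(channels, channels, kernel_size) for _ in range(layers)]
        )
        self.attn_proj = nn.Linear(channels, channels)
        self.mel_out = nn.Linear(channels, mel_dim)

    def forward(self, mel_inputs, encoder_out):
        x = self.prenet(mel_inputs).transpose(1, 2)       # (B, C, T_mel)
        for conv in self.convs:
            padded = F.pad(x, (self.kernel_size - 1, 0))  # left-pad only => causal
            h = torch.relu(conv(padded))
            q = self.attn_proj(h.transpose(1, 2))         # queries: (B, T_mel, C)
            scores = torch.softmax(q @ encoder_out.transpose(1, 2), dim=-1)
            context = scores @ encoder_out                # attended text context
            x = x + h + context.transpose(1, 2)
        return self.mel_out(x.transpose(1, 2))            # predicted mel frames

class Converter(nn.Module):
    # Non-causal post-net: decoder output -> vocoder parameters (here, a linear spectrogram).
    def __init__(self, mel_dim=80, linear_dim=513, channels=256, layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(mel_dim if i == 0 else channels, channels, kernel_size=5, padding=2)
             for i in range(layers)]
        )
        self.out = nn.Linear(channels, linear_dim)

    def forward(self, mel):
        x = mel.transpose(1, 2)
        for conv in self.convs:
            x = torch.relu(conv(x))                       # symmetric padding => non-causal
        return self.out(x.transpose(1, 2))

if __name__ == "__main__":
    enc, dec, post = Encoder(vocab_size=100), Decoder(), Converter()
    text = torch.randint(0, 100, (1, 30))                 # dummy character IDs
    mels = torch.zeros(1, 120, 80)                        # teacher-forced mel inputs
    linear = post(dec(mels, enc(text)))
    print(linear.shape)                                   # torch.Size([1, 120, 513])

The structural points carried over from the description above are the causal (left-padded only) convolutions in the decoder and the non-causal (symmetrically padded) convolutions in the converter.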

HOW TO USE?

To run the script, use the sample command:

python run.py 20180505_deepvoice3_checkpoint_step000640000.pth nikl_preprocess/example.txt ./

For help and other options, run python run.py -h. The optional arguments are listed below; an example invocation combining some of them follows the list.

Optional arguments:

--hparams=<params>                Hyper parameters [default: ].
--preset=<json>                   Path of preset parameters (json).
--checkpoint-seq2seq=<path>       Load seq2seq model from checkpoint path.
--checkpoint-postnet=<path>       Load postnet model from checkpoint path.
--file-name-suffix=<s>            File name suffix [default: ].
--max-decoder-steps=<N>           Max decoder steps [default: 500].
--replace_pronunciation_prob=<N>  Replace pronunciation probability [default: 0.0].
--speaker_id=<id>                 Speaker ID (for multi-speaker model).
--output-html                     Output html for blog post.
-h, --help                        Show help message.
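
For instance, a multi-speaker synthesis run that loads a preset file and raises the decoder step limit might look like the command below. The preset path presets/deepvoice3_vctk.json is a hypothetical placeholder used only for illustration; the checkpoint and text-file names are the ones from the sample command above.

python run.py --preset=presets/deepvoice3_vctk.json --speaker_id=0 --max-decoder-steps=1000 20180505_deepvoice3_checkpoint_step000640000.pth nikl_preprocess/example.txt ./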

WHAT ARE THE REQUIREMENTS?

To install all requirements and dependencies, run one of the following commands:

For GPU - pip install -r gpu_requirements.txt
For CPU - pip install -r cpu_requirements.txt

Stats

Inference time (seconds per sample):
CPU - 0.000558
GPU - 0.000312
