ID: 5d878f98586f124c232d6024

Multi-Speaker Text-to-Speech

by Raj Kumar

Convolutional sequence-to-sequence model with attention for text-to-speech synthesis


License: MIT

Tags: PyTorch, TorchAudio, Text-to-Speech, Audio Synthesis, Sequence-to-Sequence, NLP, NLTK

Model stats and performance

Dataset used: LJSpeech
Framework: PyTorch
OS used: Linux
Performance metric: inference time in seconds per sample

TEXT TO SPEECH

WHAT IS IT?

This is a text-to-speech model based on an implementation of the Deep Voice 3 paper. The Deep Voice 3 architecture consists of three components (a schematic sketch follows the list below):

  • Encoder: A fully-convolutional encoder, which converts textual features to an internal learned representation.
  • Decoder: A fully-convolutional causal decoder, which decodes the learned representation with a multi-hop convolutional attention mechanism into a low-dimensional audio representation (mel-scale spectrograms) in an autoregressive manner.
  • Converter: A fully-convolutional post-processing network, which predicts final vocoder parameters (depending on the vocoder choice) from the decoder hidden states. Unlike the decoder, the converter is non-causal and can thus depend on future context information.
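A minimal PyTorch sketch of these three components is given below. It is illustrative only: module names, layer counts, kernel sizes, and feature dimensions are assumptions rather than the repository's actual implementation, and the multi-speaker variant additionally conditions each component on a learned speaker embedding.

# Illustrative Deep Voice 3-style components; sizes and layer counts are assumptions.
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Fully-convolutional text encoder: character IDs -> learned key/value representation."""
    def __init__(self, vocab_size=256, channels=128, layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, channels)
        self.convs = nn.ModuleList(
            [nn.Conv1d(channels, channels, kernel_size=5, padding=2) for _ in range(layers)]
        )

    def forward(self, text_ids):                         # text_ids: (B, T_text)
        x = self.embed(text_ids).transpose(1, 2)         # (B, C, T_text)
        for conv in self.convs:
            x = x + F.relu(conv(x))                      # residual convolution blocks
        return x.transpose(1, 2)                         # (B, T_text, C)


class Decoder(nn.Module):
    """Causal convolutional decoder with attention; predicts mel frames autoregressively."""
    def __init__(self, mel_dim=80, channels=128, layers=3):
        super().__init__()
        self.prenet = nn.Linear(mel_dim, channels)
        self.convs = nn.ModuleList(
            [nn.Conv1d(channels, channels, kernel_size=5, padding=4) for _ in range(layers)]
        )
        self.attn = nn.MultiheadAttention(channels, num_heads=1, batch_first=True)
        self.mel_out = nn.Linear(channels, mel_dim)

    def forward(self, prev_mels, encoder_out):           # prev_mels: (B, T_dec, mel_dim)
        x = self.prenet(prev_mels).transpose(1, 2)       # (B, C, T_dec)
        for conv in self.convs:
            h = conv(x)[:, :, : x.size(2)]               # trim padded output so each step sees only the past
            x = x + F.relu(h)
        q = x.transpose(1, 2)                            # queries: (B, T_dec, C)
        ctx, _ = self.attn(q, encoder_out, encoder_out)  # attention over the text representation
        hidden = q + ctx
        return self.mel_out(hidden), hidden              # predicted mel frames, decoder hidden states


class Converter(nn.Module):
    """Non-causal post-net: decoder hidden states -> vocoder parameters (e.g. linear spectrogram)."""
    def __init__(self, channels=128, out_dim=513, layers=3):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(channels, channels, kernel_size=5, padding=2) for _ in range(layers)]
        )
        self.proj = nn.Linear(channels, out_dim)

    def forward(self, decoder_hidden):                   # (B, T_dec, C)
        x = decoder_hidden.transpose(1, 2)
        for conv in self.convs:
            x = x + F.relu(conv(x))                      # non-causal: may use future context
        return self.proj(x.transpose(1, 2))              # (B, T_dec, out_dim)

At inference time the decoder is unrolled step by step, feeding previously predicted mel frames back in, and the converter then maps the resulting decoder states to the final vocoder parameters.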

HOW TO USE?

To run the script, use the sample command:

python run.py 20180505_deepvoice3_checkpoint_step000640000.pth nikl_preprocess/example.txt ./

For help and other options, run python run.py -h.

Optional arguments:

--hparams=<params>                 Hyperparameters [default: ].
--preset=<json>                    Path of preset parameters (json).
--checkpoint-seq2seq=<path>        Load seq2seq model from checkpoint path.
--checkpoint-postnet=<path>        Load postnet model from checkpoint path.
--file-name-suffix=<s>             File name suffix [default: ].
--max-decoder-steps=<N>            Max decoder steps [default: 500].
--replace_pronunciation_prob=<N>   Pronunciation replacement probability [default: 0.0].
--speaker_id=<id>                  Speaker ID (for multi-speaker model).
--output-html                      Output html for blog post.
-h, --help                         Show help message.
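For example, a multi-speaker checkpoint can be driven with the documented flags like this (the speaker ID and checkpoint name here are placeholders, not files shipped with the model):

python run.py --speaker_id=0 --max-decoder-steps=1000 <multi_speaker_checkpoint.pth> nikl_preprocess/example.txt ./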

WHAT ARE THE REQUIREMENTS?

To install all requirements and dependencies, run one of the following:

For GPU: pip install -r gpu_requirements.txt
For CPU: pip install -r cpu_requirements.txt
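After installing, an optional quick check like the one below (not part of the repository) confirms whether the GPU build of PyTorch can see a CUDA device:

# Optional sanity check after installing gpu_requirements.txt
import torch
print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())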

Stats

Inference time (seconds per sample):
CPU: 0.000558
GPU: 0.000312
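A rough sketch of how such a per-sample figure can be measured is shown below; synthesize here is a placeholder for the model's actual inference call, not a function from this repository:

import time

def seconds_per_sample(synthesize, texts):
    # Average wall-clock time of one synthesis call over a list of input texts.
    start = time.perf_counter()
    for text in texts:
        synthesize(text)
    return (time.perf_counter() - start) / len(texts)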
