ID: 5d878f98586f124c232d6024
TEXT TO SPEECH
WHAT IT IS?
This is a text-to-speech model based on an implementation of the Deep Voice 3 paper. The Deep Voice 3 architecture consists of three components:
- Encoder: A fully-convolutional encoder, which converts textual features to an internal learned representation.
- Decoder: A fully-convolutional causal decoder, which decodes the learned representation with a multi-hop convolutional attention mechanism into a low-dimensional audio representation (mel-scale spectrograms) in an autoregressive manner.
- Converter: A fully-convolutional post-processing network, which predicts final vocoder parameters (depending on the vocoder choice) from the decoder hidden states. Unlike the decoder, the converter is non-causal and can thus depend on future context information.
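The causal versus non-causal distinction between the decoder and the converter can be illustrated with a small pure-Python sketch. This is not the model's code: the function name `conv1d`, the toy kernel, and the zero-padding scheme are assumptions chosen for illustration. A causal layer pads only on the left, so output step t never sees input steps after t. A non-causal layer pads on both sides, so it can look ahead.

```python
def conv1d(signal, kernel, causal):
    """Same-length 1-D convolution (cross-correlation form) with zero padding.

    causal=True:  output[t] depends only on signal[t-k+1 .. t] (decoder-style).
    causal=False: output[t] depends on a window centred on t (converter-style);
                  use an odd kernel length so the window is symmetric.
    """
    k = len(kernel)
    # Causal layers put all padding on the left; non-causal layers split it.
    left = k - 1 if causal else (k - 1) // 2
    right = (k - 1) - left
    padded = [0.0] * left + list(signal) + [0.0] * right
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(signal))
    ]


# A single impulse at step 2 shows which output steps can "see" it.
impulse = [0, 0, 1, 0, 0]
kernel = [1, 1, 1]

causal_out = conv1d(impulse, kernel, causal=True)       # steps 2, 3, 4 respond
noncausal_out = conv1d(impulse, kernel, causal=False)   # steps 1, 2, 3 respond
```

In the non-causal result, step 1 already responds to the impulse at step 2. That is the "future context" the converter may use and the autoregressive decoder may not.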
HOW TO USE?
To run the script:
Sample Command - python run.py 20180505_deepvoice3_checkpoint_step000640000.pth nikl_preprocess/example.txt ./
For Help And Other Options - python run.py -h
Optional Arguments
--hparams=<params>                Hyperparameters [default: ].
--preset=<json>                   Path of preset parameters (json).
--checkpoint-seq2seq=<path>       Load seq2seq model from checkpoint path.
--checkpoint-postnet=<path>       Load postnet model from checkpoint path.
--file-name-suffix=<s>            File name suffix [default: ].
--max-decoder-steps=<N>           Max decoder steps [default: 500].
--replace_pronunciation_prob=<N>  Pronunciation replacement probability [default: 0.0].
--speaker_id=<id>                 Speaker ID (for multi-speaker model).
--output-html                     Output html for blog post.
-h, --help                        Show help message.
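When the script is called from other Python code, it can help to assemble the command as an argument list. The sketch below is hypothetical: `build_run_command` is not part of this project, and it assumes that the three positional arguments are a checkpoint path, a text file, and an output directory (inferred from the sample command), and that options may be placed before them in the `--flag=value` form shown in the help text.

```python
import shlex


def build_run_command(checkpoint, text_file, out_dir, options=None):
    """Return an argv list for run.py (hypothetical helper, not project code).

    options maps exact flag names (e.g. "--max-decoder-steps") to values.
    A value of True marks a switch with no argument, such as "--output-html".
    """
    cmd = ["python", "run.py"]
    for flag, value in (options or {}).items():
        if value is True:
            cmd.append(flag)                 # bare switch
        else:
            cmd.append(f"{flag}={value}")    # --flag=value form
    cmd += [checkpoint, text_file, out_dir]
    return cmd


cmd = build_run_command(
    "20180505_deepvoice3_checkpoint_step000640000.pth",
    "nikl_preprocess/example.txt",
    "./",
    {"--max-decoder-steps": 500, "--output-html": True},
)
# shlex.join (Python 3.8+) renders the list as a shell-safe string for logging.
command_line = shlex.join(cmd)
```

The list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains run.py. Passing a list rather than a single string avoids shell quoting problems with paths.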
WHAT ARE THE REQUIREMENTS?
To install all requirements and dependencies, run the command that matches your hardware:
For GPU - pip install -r gpu_requirements.txt
For CPU - pip install -r cpu_requirements.txt
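The choice between the two files can also be scripted. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and checking for the `nvidia-smi` executable is only a rough heuristic for "an NVIDIA GPU driver is installed", not a guarantee that the GPU requirements will work.

```python
import shutil
import sys


def has_nvidia_gpu():
    """Heuristic (assumption): treat a visible nvidia-smi binary as 'GPU present'."""
    return shutil.which("nvidia-smi") is not None


def pip_install_command(use_gpu):
    """Return the pip argv list for the matching requirements file."""
    requirements = "gpu_requirements.txt" if use_gpu else "cpu_requirements.txt"
    # sys.executable -m pip installs into the interpreter that runs this script.
    return [sys.executable, "-m", "pip", "install", "-r", requirements]


install_cmd = pip_install_command(has_nvidia_gpu())
```

Running `subprocess.run(install_cmd, check=True)` would then perform the install. The sketch only builds the command so that it can be inspected first.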
Stats
CPU - 0.000558 GPU - 0.000312