

by Raj Kumar

Deep Learning Based Conversation Chatbot

License: MIT

Tags: Attention Mechanism, NLP, Chatbot, TensorFlow, NLTK, Cornell, Sequence-to-Sequence Learning

Model stats and performance

Dataset Used: Cornell
Framework: TensorFlow
OS Used: Linux
Inference time (seconds per sample): performance data is not available.




Conversation chatbot built on TensorFlow using a sequence-to-sequence architecture. A sequence-to-sequence model aims to map a fixed-length input to a fixed-length output, where the lengths of the input and output may differ. The model consists of three parts: the encoder, the intermediate (encoder) vector, and the decoder.
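The three parts can be sketched with the Keras functional API. This is a minimal illustration only, not the repository's actual model code; the vocabulary size, embedding size, and hidden size are assumed toy values:

```python
import tensorflow as tf

vocab_size, embed_dim, hidden_dim = 10000, 64, 256  # assumed toy sizes

# Encoder: consume the question and keep only the final LSTM states,
# which together play the role of the intermediate encoder vector.
enc_in = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(enc_in)
_, state_h, state_c = tf.keras.layers.LSTM(hidden_dim, return_state=True)(enc_emb)

# Decoder: start from the encoder states and predict the answer word by word,
# emitting a softmax distribution over the vocabulary at every step.
dec_in = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(dec_in)
dec_out = tf.keras.layers.LSTM(hidden_dim, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
probs = tf.keras.layers.Dense(vocab_size, activation="softmax")(dec_out)

model = tf.keras.Model([enc_in, dec_in], probs)
```

Note that the question and answer inputs have independent (unspecified) lengths, which is exactly what lets the architecture map sequences of different lengths to each other.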


Encoder

  • A stack of several recurrent units (LSTM or GRU cells for better performance) where each accepts a single element of the input sequence, collects information for that element, and propagates it forward.
  • In a question-answering problem, the input sequence is a collection of all words from the question. Each word is represented as x_i, where i is the order of that word.
  • The hidden state h_t is computed using the formula:

h_t = f(W^(hh) · h_(t-1) + W^(hx) · x_t)

This simple formula represents the result of an ordinary recurrent neural network. As you can see, we just apply the appropriate weights to the previous hidden state h_(t-1) and the input vector x_t.

Encoder Vector

  • This is the final hidden state produced from the encoder part of the model. It is calculated using the formula above.
  • This vector aims to encapsulate the information for all input elements in order to help the decoder make accurate predictions.
  • It acts as the initial hidden state of the decoder part of the model.
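The encoder recurrence and the resulting encoder vector can be sketched in a few lines of NumPy. This is a toy illustration with random weights, not the model's actual TensorFlow code; the dimensions and the tanh nonlinearity are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

embed_dim, hidden_dim = 4, 8                             # assumed toy sizes
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1   # recurrent weights W^(hh)
W_hx = rng.normal(size=(hidden_dim, embed_dim)) * 0.1    # input weights W^(hx)

def encode(xs):
    """Run h_t = tanh(W_hh @ h_(t-1) + W_hx @ x_t) over the input
    sequence and return the final hidden state: the encoder vector."""
    h = np.zeros(hidden_dim)
    for x in xs:                    # one word embedding per time step
        h = np.tanh(W_hh @ h + W_hx @ x)
    return h

# A "question" of 5 word embeddings collapses into one fixed-size vector.
question = rng.normal(size=(5, embed_dim))
encoder_vector = encode(question)
print(encoder_vector.shape)         # (8,)
```

Whatever the input length, the encoder vector always has the same fixed size, which is what allows it to serve as the decoder's initial hidden state.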


Decoder

  • A stack of several recurrent units where each predicts an output y_t at a time step t.
  • Each recurrent unit accepts a hidden state from the previous unit and produces an output as well as its own hidden state.
  • In a question-answering problem, the output sequence is a collection of all words from the answer. Each word is represented as y_i, where i is the order of that word.
  • Any hidden state h_t is computed using the formula:

h_t = f(W^(hh) · h_(t-1))

As you can see, we are just using the previous hidden state to compute the next one.

  • The output y_t at time step t is computed using the formula:

y_t = softmax(W^(S) · h_t)

We calculate the outputs using the hidden state at the current time step together with the respective weight W^(S). Softmax is used to create a probability vector which helps us determine the final output (e.g. a word in the question-answering problem). The power of this model lies in the fact that it can map sequences of different lengths to each other. As you can see, the inputs and outputs are not correlated and their lengths can differ. This opens a whole new range of problems which can be solved using such an architecture.
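The decoder recurrence and softmax output above can be sketched in a few lines of NumPy. This is a toy illustration with random weights and assumed dimensions, not the repository's TensorFlow implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

hidden_dim, vocab_size = 8, 12                           # assumed toy sizes
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1   # recurrent weights W^(hh)
W_S = rng.normal(size=(vocab_size, hidden_dim))          # output weights W^(S)

def softmax(z):
    z = z - z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def decode(encoder_vector, steps):
    """Per step: h_t = tanh(W_hh @ h_(t-1)); y_t = softmax(W_S @ h_t)."""
    h, outputs = encoder_vector, []
    for _ in range(steps):
        h = np.tanh(W_hh @ h)
        outputs.append(softmax(W_S @ h))
    return outputs

# The encoder vector acts as the decoder's initial hidden state; note the
# answer length (7 steps) is independent of the question length.
encoder_vector = rng.normal(size=hidden_dim)
answer_probs = decode(encoder_vector, steps=7)
print(len(answer_probs), answer_probs[0].shape)  # 7 (12,)
```

Each y_t is a probability vector over the vocabulary; picking, say, the argmax at each step yields one answer word per time step.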


To run the inference script, use a command of this form: python --modelTag pretrainedv2 --test interactive. Here are some flags which may be useful. For more help and options, use python -h.

  • --modelTag <name>   give a name to the current model, to differentiate between models when testing/training.
  • --keepAll   use this flag when training if, when testing, you want to see the predictions at different steps (it can be interesting to see the program change its name and age as the training progresses). Warning: it can quickly take a lot of storage space if you don't increase the --saveEvery option.
  • --test interactive   use this flag to start an interactive conversation with the bot.
  • --verbose   when testing, print the sentences as they are computed.


To get all the requirements and dependencies installed, run:

  • For GPU: pip install -r gpu_requirements.txt
  • For CPU: pip install -r cpu_requirements.txt

Author: Raj Kumar
New Delhi, India.

