This project is done using TensorFlow and Keras API.
- The dataset can be downloaded from here.
- The architecture for this project follows an Encoder-Decoder approach.
- The Encoder model is a Convolutional Neural Network because the input is an image.
- It's output is then fed into the Decoder model which is a LSTM.
- The image preprocessing is reshaping the image. Transfer Learning is utilized to generate better predictions.
- The text preprocessing consists of cleaning the data by removing digits, converting to lowercase and then generating the mappings for each word to unqiue number and its reverse.
- Vocabulary is then created out of the words by including only the words that fall below the certain threshold. This discarded all the most frequently words like a, an, the.
- Then the word embedding matrix between the vocabulary is calculated by using the glove 6B 50D vector.