Skip to content

Latest commit

 

History

History
16 lines (12 loc) · 1.08 KB

README.md

File metadata and controls

16 lines (12 loc) · 1.08 KB

Image Caption Generator

This project is done using TensorFlow and Keras API.

  • The dataset can be downloaded from here.
  • The architecture for this project follows an Encoder-Decoder approach.
  • The Encoder model is a Convolutional Neural Network because the input is an image.
  • It's output is then fed into the Decoder model which is a LSTM.
  • The image preprocessing is reshaping the image. Transfer Learning is utilized to generate better predictions.
  • The text preprocessing consists of cleaning the data by removing digits, converting to lowercase and then generating the mappings for each word to unqiue number and its reverse.
  • Vocabulary is then created out of the words by including only the words that fall below the certain threshold. This discarded all the most frequently words like a, an, the.
  • Then the word embedding matrix between the vocabulary is calculated by using the glove 6B 50D vector.

Generated Predictions