This project is a classification task from Kaggle: https://www.kaggle.com/competitions/nlp-getting-started
1. Cleaning: Clean the Twitter data in both train.csv and test.csv.
2. Module_building: Vectorize the text with TF-IDF and combine several classification models to make the predictions.
3. Keyword_extraction: To fill in the null values of the keyword column, two keyword extraction algorithms (TextRank and RAKE) are used to extract keywords (implemented by a team member). Unfortunately, this method did not improve the model significantly.
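
To give a rough idea of what the Cleaning step involves, here is a minimal sketch. The exact rules used in this project may differ: the function name `clean_tweet` and the specific choices (lowercasing, dropping URLs, @mentions, and punctuation) are assumptions made for illustration only.

```python
import html
import re

# Patterns are illustrative assumptions, not the project's actual rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Normalize a single tweet (hypothetical helper)."""
    text = html.unescape(text)          # decode entities such as "&amp;"
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = NON_ALNUM_RE.sub(" ", text)  # drop punctuation; "#tag" becomes "tag"
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_tweet("Forest fire near La Ronge Sask. Canada http://t.co/abc #wildfire"))
```

A function like this could be applied to the text column of both train.csv and test.csv (for example with pandas, `df["text"].map(clean_tweet)`) so that the same preprocessing is used before TF-IDF vectorization.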