
tapunict/insTAP

 
 


InsTAP

Purpose of the project

The purpose of the project is to carry out a sentiment analysis on the comments posted by Instagram users, in order to evaluate which famous people are more or less loved on the Internet.
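To illustrate what "sentiment analysis on a comment" means in the simplest possible terms, here is a minimal lexicon-based polarity scorer. This is only a sketch: the project itself performs the prediction with a machine-learning model in Spark, and the word lists below are illustrative.

```python
# Minimal lexicon-based sentiment scorer (illustrative only; the real
# prediction in this project is done by a machine-learning model in Spark).
POSITIVE = {"love", "great", "beautiful", "amazing", "best"}
NEGATIVE = {"hate", "ugly", "worst", "awful", "bad"}

def sentiment(comment: str) -> str:
    """Label a comment as positive, negative, or neutral by word counts."""
    words = comment.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A real model learns these associations from data instead of relying on fixed word lists, which is why the heavy lifting is delegated to Spark in the pipeline below.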

Data Pipeline


The components of the pipeline are listed below:

  • Instap Producer: retrieves data from Instagram using the Instaloader package and sends it to Logstash.

  • Logstash: receives the data from the producer and writes it to Kafka's instap topic.

  • Kafka: message broker; connects Logstash to the Spark processing component.

  • Spark: receives data from Kafka and performs the machine-learning prediction.

  • Elasticsearch: indexes the incoming data.

  • Kibana: UI dedicated to data visualization.
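The producer step above boils down to serializing one comment at a time as a JSON document and shipping it to Logstash. The sketch below shows what such a record could look like; the field names are an assumption for illustration, not the project's actual schema (which is defined in the producer code).

```python
import json
from datetime import datetime, timezone

def build_record(username: str, post_id: str, comment: str) -> str:
    """Serialize one Instagram comment as a JSON line for Logstash.
    Field names here are illustrative, not the project's actual schema."""
    record = {
        "username": username,     # author of the comment
        "post_id": post_id,       # post the comment belongs to
        "comment": comment,       # raw comment text to be scored
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record)
```

Logstash forwards each such line to Kafka's instap topic, from which Spark consumes it for prediction and Elasticsearch indexes the result.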

More technical details can be found in each component's folder; more details on how each component is actually used in this project are in doc.

Requirements

  • Docker (Desktop on Windows)
  • Docker Compose
  • Instagram Account credentials

Usage

  1. Clone the project repository:

git clone https://github.com/rosarioamantia/insTAP

  2. Move to the producer folder and edit the producer.env file with your Instagram account credentials, the users to track, and the number of posts and comments you want to retrieve.

  3. Download spark-3.1.2-bin-hadoop2.7 into the spark/setup folder.

  4. From the repository root (called insTAP), run all the Docker containers:

docker-compose up

  5. The producer will now start generating data.

  6. Go to:

localhost:5601

and import the visualizations located in kibana/export.ndjson via the left hamburger menu > Management > Stack Management > Saved Objects > Import.
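The producer.env file mentioned above is a plain KEY=VALUE file. As a sketch, this is how such a file can be read in Python; the variable names in the test are hypothetical, so check the file shipped in the producer folder for the real keys.

```python
def read_env(path: str) -> dict:
    """Parse a simple KEY=VALUE .env file, skipping blanks and comments.
    Key names are whatever the file defines; none are assumed here."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env
```

Docker Compose applies the same KEY=VALUE convention when it injects these variables into the producer container.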
