System-Stats-By-Keylogger

Project of Technologies for Advanced Programming

Grade: 30 with honors / 30

Antonio Scardace @ Dept of Math and Computer Science, University of Catania

Introduction

The course covers technologies for building end-to-end solutions that analyze, manage, archive, process, and visualize large amounts of data in real time. For instance, we have seen Docker containers and pipelines built with Logstash (data ingestion), Kafka (data streaming), Spark (data processing), Elasticsearch (data storage), and Kibana (data visualization).

This project was created as an exam project, to test and practice the following skills:

  • Knowledge of Docker
  • Knowledge of Data Ingestion via Logstash
  • Knowledge of Data Streaming via Kafka
  • Knowledge of Data Processing via Spark
  • Knowledge of Data Storing via Elasticsearch
  • Knowledge of Data Visualization via Kibana
  • Knowledge of Jupyter Notebook (for the presentation)

Real Use Case

The aim of the project is to compute real-time statistics on how a single user, and users in general, use the system.

It can be useful as:

  1. System Monitor for Operating System owners
  2. System Monitor for Public Offices Computers
  3. System Monitor for Prison Computers
  4. Parental Control
  5. Spyware

Data Source: Windows Keylogger

The data source is a Windows Keylogger which sends a log to the TCP server on each foreground window change OR after 1 minute of user inactivity.

The log has the following pattern:

[UUID] :: [Window Title] :: [Timestamp Start]
Logged Text...
[Timestamp End] :: [IP Address]

Each log is composed of:

  • UUID: Uniquely identifies the PC. It has the following format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.
  • Window Title: The title of the window in which the user typed.
  • Timestamp Start: When the user started typing in that window.
  • Logged Text: The sequence of keys pressed by the user and logged by the keylogger.
  • Timestamp End: When the user finished typing in that window.
  • IP Address: The public IP address. If the PC has no connection, the default value is "Unknown".

For instance:

[154A9DC6-FF4E-4149-B81C-610AE7BBD151] :: [WhatsApp] :: [2022-01-01 12:00:00]
Hi Nicole, happy new year!!
[2022-01-01 12:00:13] :: [1.2.3.4]
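
As an illustration of the pattern above, here is a minimal Python sketch that parses one log into its six fields. It is not the project's own parser: the names `KeyLog` and `parse_log` are hypothetical, and the timestamp format `%Y-%m-%d %H:%M:%S` is an assumption inferred from the example.

```python
import re
from datetime import datetime
from typing import NamedTuple


class KeyLog(NamedTuple):
    """One parsed log (hypothetical container, not part of the project)."""
    uuid: str
    window_title: str
    start: datetime
    text: str
    end: datetime
    ip_address: str


# First line: [UUID] :: [Window Title] :: [Timestamp Start]
HEADER = re.compile(
    r"^\[(?P<uuid>[0-9A-Fa-f-]{36})\] :: \[(?P<title>.*)\] :: \[(?P<start>[^\]]+)\]$"
)
# Last line: [Timestamp End] :: [IP Address]
FOOTER = re.compile(r"^\[(?P<end>[^\]]+)\] :: \[(?P<ip>[^\]]+)\]$")
# Assumed timestamp format, inferred from the example log.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_log(raw: str) -> KeyLog:
    """Split a raw log into header fields, logged text, and footer fields."""
    lines = raw.strip().splitlines()
    if len(lines) < 2:
        raise ValueError("a log needs at least a header and a footer line")
    header = HEADER.match(lines[0])
    footer = FOOTER.match(lines[-1])
    if header is None or footer is None:
        raise ValueError("log does not follow the expected pattern")
    return KeyLog(
        uuid=header["uuid"],
        window_title=header["title"],
        start=datetime.strptime(header["start"], TIME_FORMAT),
        # Everything between the first and last line is the logged text.
        text="\n".join(lines[1:-1]),
        end=datetime.strptime(footer["end"], TIME_FORMAT),
        ip_address=footer["ip"],
    )
```

Calling `parse_log` on the example above yields the UUID, the title `WhatsApp`, the two timestamps, the logged text, and the IP address `1.2.3.4` as separate fields.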

Server System

The server receives logs from multiple clients and passes them through the pipeline: Logstash, Kafka, Spark, Elasticsearch, and Kibana.

The following functions are available for each user (personal stats) and for all users (general stats):

  • For Logged Text:
    • Last 8 logged texts 📄
    • Sentiment analysis 📈
  • For Metadata:
    • Top 10 most used applications 🔖
    • Used windows classification 📊
      • Social
      • Utility
      • Entertainment
      • Web Browsing
      • Office & Study
      • Other
    • Customers Geolocation by IP 🌎
    • Various stats about the time spent typing on the PC 👀

Structure & Demo

Let's see the structure of the project and how I have used all the components.
Each component used in this project has been put inside a Docker Container 🐳

Component Description
I have used it to implement a multi-threaded server that receives real-time logs via TCP on port 8800 from multiple clients. It extracts the features described above from each log and saves them in two CSV files:

metadata.csv = [UUID, Window Title, Timestamp of Begin, Timestamp of End, IP Address]
logs.csv = [UUID, Logged Text]
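
To make the server step more concrete, here is a hedged Python sketch of a threaded TCP server that splits each log into the two CSV rows listed above. It is not the project's actual server. It assumes one log per TCP connection (the client closes the socket after sending), and the names `extract_rows`, `append_rows`, `LogHandler`, `ThreadedLogServer`, `serve`, and the `data` directory are all hypothetical.

```python
import csv
import socketserver
import threading
from pathlib import Path

DATA_DIR = Path("data")        # hypothetical directory shared with Logstash
WRITE_LOCK = threading.Lock()  # serializes CSV appends across handler threads


def extract_rows(raw: str):
    """Turn one raw log into (metadata_row, log_row) for the two CSV files."""
    lines = raw.strip().splitlines()
    if len(lines) < 2:
        raise ValueError("a log needs at least a header and a footer line")
    # Header: [UUID] :: [Window Title] :: [Timestamp Start]
    uuid, rest = lines[0].split(" :: ", 1)
    title, start = rest.rsplit(" :: ", 1)
    # Footer: [Timestamp End] :: [IP Address]
    end, ip_address = lines[-1].rsplit(" :: ", 1)
    text = "\n".join(lines[1:-1])

    def unwrap(field: str) -> str:
        return field.strip()[1:-1]  # drop the surrounding square brackets

    metadata_row = [unwrap(uuid), unwrap(title), unwrap(start),
                    unwrap(end), unwrap(ip_address)]
    log_row = [unwrap(uuid), text]
    return metadata_row, log_row


def append_rows(metadata_row, log_row, data_dir: Path = DATA_DIR) -> None:
    """Append one row to metadata.csv and one row to logs.csv."""
    data_dir.mkdir(parents=True, exist_ok=True)
    with WRITE_LOCK:
        for name, row in (("metadata.csv", metadata_row), ("logs.csv", log_row)):
            with open(data_dir / name, "a", newline="", encoding="utf-8") as handle:
                csv.writer(handle).writerow(row)


class LogHandler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        # Assumption: the client sends one log, then closes the connection.
        raw = self.rfile.read().decode("utf-8", errors="replace")
        try:
            metadata_row, log_row = extract_rows(raw)
        except ValueError:
            return  # ignore malformed logs in this sketch
        append_rows(metadata_row, log_row)


class ThreadedLogServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # one thread per client connection


def serve(host: str = "0.0.0.0", port: int = 8800) -> None:
    """Block forever, handling each client connection in its own thread."""
    with ThreadedLogServer((host, port), LogHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts listening on port 8800. The lock keeps rows from concurrent clients from interleaving inside the CSV files.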

I have used Logstash to create two different data flows: one for the metadata and one for the text logs. Logstash reads its input from two files, metadata.csv and logs.csv, which are shared with the server container via a Docker volume.

Here is an example of what Logstash receives:

I have used Apache Kafka to create a single cluster with two topics: one for logs and one for metadata. It receives the two data flows from Logstash and stores them until Spark pulls them. Kafka Streams has not been used.

I have created two Docker containers, one for each Kafka topic to read from. After processing, each of them saves the documents into the Elasticsearch index keylogger_stats.
In the first, Spark Streaming reads data from the logs topic and adds a small set of features: the VADER dictionary of sentiment scores.
In the second, Spark Streaming reads data from the metadata topic and adds three features: the window type, the difference (in seconds) between the two timestamp fields, and the geolocation coordinates of the public IP address (unless it is set to "Unknown").
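
The per-record logic of the metadata enrichment can be sketched in plain Python, outside Spark. This is only an illustration under stated assumptions: the keyword map, the field names (`window_title`, `start`, `end`, `ip_address`, `window_type`, `seconds`, `needs_geolocation`), and the timestamp format are hypothetical, and the project's real classifier may work differently. The geolocation lookup and the VADER scoring are omitted because they need external libraries or services.

```python
from datetime import datetime

# Assumed timestamp format, inferred from the example log.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical keyword map; the categories come from the list of window types.
WINDOW_KEYWORDS = {
    "Social": ("whatsapp", "telegram", "facebook"),
    "Utility": ("file explorer", "settings", "terminal"),
    "Entertainment": ("netflix", "spotify", "youtube"),
    "Web Browsing": ("chrome", "firefox", "microsoft edge"),
    "Office & Study": ("microsoft word", "excel", "powerpoint"),
}


def classify_window(title: str) -> str:
    """Return the first category whose keyword appears in the window title."""
    lowered = title.lower()
    for category, keywords in WINDOW_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "Other"


def typing_seconds(start: str, end: str) -> int:
    """Difference in seconds between the two timestamp fields."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds())


def enrich(metadata: dict) -> dict:
    """Add the window type and typing duration to one metadata record."""
    enriched = dict(metadata)
    enriched["window_type"] = classify_window(metadata["window_title"])
    enriched["seconds"] = typing_seconds(metadata["start"], metadata["end"])
    # Geolocation only makes sense when a public IP address is known.
    enriched["needs_geolocation"] = metadata["ip_address"] != "Unknown"
    return enriched
```

Substring matching is deliberately naive here; it only shows where each of the three features comes from.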
I have used Elasticsearch to create a cluster with a single node, es001, which hosts the keylogger_stats index. It receives documents from Spark Streaming. Data are saved to a Docker volume so that they persist across container restarts.

Here is an example of what Elasticsearch contains and shows:

I have used Kibana to visualize stats in real time, with the dashboard set to auto-refresh every second. The dashboard analyzes only the data that arrived in the last 24 hours. It shows general data (for all PCs); if the user clicks on a specific UUID, it shows the data for that PC only.
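
The dashboard's two filters (last 24 hours, optional UUID) can be approximated with an Elasticsearch query body. The sketch below only builds that body as a Python dictionary. The function name `build_dashboard_query` and the field names `@timestamp` and `uuid` are assumptions, since the index mapping is not shown here, and the `term` filter assumes the UUID field is mapped as a keyword.

```python
from typing import Optional


def build_dashboard_query(
    uuid: Optional[str] = None,
    time_field: str = "@timestamp",  # assumed name of the timestamp field
    uuid_field: str = "uuid",        # assumed name of the UUID field
) -> dict:
    """Build a query body for documents from the last 24 hours."""
    # "now-24h" is Elasticsearch date math for "24 hours before now".
    filters = [{"range": {time_field: {"gte": "now-24h"}}}]
    if uuid is not None:
        # Narrow the general stats down to a single PC.
        filters.append({"term": {uuid_field: uuid}})
    return {"query": {"bool": {"filter": filters}}}
```

Serialized with `json.dumps`, the result could be sent as the request body to `http://localhost:9200/keylogger_stats/_search`.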

Getting Started

To clone the repository and run the project smoothly, follow the steps below.

Requisites

  • At least 12 GB of RAM.
  • At least 25 GB of free space.
  • Linux, macOS, or Windows with WSL.
  • Docker installed (Docker Desktop is optional).
  • Visual Studio Code is strongly recommended.

Installation and Use

   $ git clone https://github.com/ElephanZ/System-Stats-By-Keylogger.git
   $ cd YOUR_PATH/System-Stats-By-Keylogger/
   $ bash run.sh

Useful Links

| Container | URL | Description |
| --- | --- | --- |
| broker | http://localhost:8080 | UI for Kafka |
| elasticsearch | http://localhost:9200 | Elasticsearch basic URL |
| elasticsearch | http://localhost:9200/keylogger_stats/_search | Elasticsearch index URL |
| elasticsearch | http://localhost:9200/keylogger_stats/_search?... | Elasticsearch URL to get all logs |
| elasticsearch | http://localhost:9200/keylogger_stats/_search?... | Elasticsearch URL to get all metadata |
| kibana | http://localhost:5601 | Kibana basic URL |
| kibana | http://localhost:5601/dashboards/list?... | Kibana Dashboards List |

License ©️

Author: Antonio Scardace.
Distributed under the GNU General Public License v3.0. See LICENSE for more information.

PLEASE USE AND READ IT FOR ACADEMIC PURPOSES ONLY. ‼️
I DISCLAIM ANY LIABILITY FOR ILLEGAL USE. ‼️