This machine learning pipeline tool mainly uses the TensorFlow Extended (TFX)
library to train machine learning models on data from various data stores.
Notebook files are stored in the `/notebooks` folder:
File | Description |
---|---|
IEEE-CIS-Fraud-Detection-preprocessor.ipynb | PySpark preprocessing notebook. |
IEEE-CIS-Fraud-Detection-Train-TF.ipynb | TensorFlow Extended model training and publishing code. |
IEEE-CIS-Fraud-Detection-Score-Spark.ipynb | PySpark scoring notebook. |
The stack is deployed with Docker and Docker Compose, so both must be installed first.
To start the stack in detached mode, run:

`docker-compose -f sml.yml up -d`
This pipeline uses the IEEE-CIS Fraud Detection dataset from Kaggle. In its first iteration it achieved a reasonably good score.
- Features were selected using backward elimination.
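Backward elimination starts from the full feature set and greedily drops one feature at a time for as long as the score does not get worse. The following is a generic, stdlib-only sketch of that idea, not the code used in the notebooks. The `score` callable is an assumption: in practice it would train a model on the given feature subset and return a validation metric (the notebooks may use a different criterion, such as p-values). The feature names and weights in the usage example are made up purely for illustration.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def backward_elimination(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    tol: float = 0.0,
) -> Tuple[List[str], float]:
    """Greedy backward elimination over a set of feature names.

    Repeatedly removes the single feature whose removal yields the highest
    score, as long as that score is not lower than the current score minus
    `tol`. Stops when every removal would cost more than `tol` or when only
    one feature remains. Returns the kept features (sorted) and their score.
    """
    current = frozenset(features)
    best_score = score(current)
    while len(current) > 1:
        # Score every subset that has exactly one feature removed;
        # iterating in sorted order makes tie-breaking deterministic.
        candidates = [(score(current - {f}), f) for f in sorted(current)]
        cand_score, dropped = max(candidates, key=lambda c: c[0])
        if cand_score < best_score - tol:
            break  # every possible removal hurts too much, so stop here
        current = current - {dropped}
        best_score = cand_score
    return sorted(current), best_score


# Hypothetical per-feature contributions; a real score would come from
# training and validating a model on each candidate subset.
WEIGHTS = {
    "signal_1": 30,
    "signal_2": 20,
    "signal_3": 10,
    "noise_1": -5,
    "noise_2": -2,
}


def toy_score(subset: FrozenSet[str]) -> int:
    """Hypothetical additive score: helpful features add, noisy ones subtract."""
    return sum(WEIGHTS[f] for f in subset)


print(backward_elimination(WEIGHTS, toy_score))
# → (['signal_1', 'signal_2', 'signal_3'], 60)
```

Each round evaluates the score once per remaining feature, so with a real model this means retraining that many times per round. A positive `tol` trades a small score drop for a smaller feature set.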
Model improvement is beyond the scope of this repository.