blockchain-predictor

A set of tools and scripts that download and process blockchain and cryptocurrency course (price) data, generate a dataset, train a deep learning neural network to make value predictions, and evaluate the results.

The project is a work in progress and is yet to produce (hopefully) accurate predictions.

Installation

System dependencies

The tools require Ethereum's geth client, Node.js, Python 3 and MongoDB. Please install the appropriate packages for your system.

Node dependencies

Clone the git repository and install the node dependencies:

git clone https://github.com/Zvezdin/blockchain-predictor.git
cd blockchain-predictor
npm install

Python dependencies

Install the required python dependencies via the following script:

chmod u+x pip-install.sh
./pip-install.sh

You're all set!

Usage

On the first run, start geth with the following command and wait for it to sync:

geth --rpc --fast --cache 2048 --datadir /path/to/your/data

The initial sync can take multiple hours.

Subsequent runs can be done via:

geth --rpc --datadir /path/to/your/data
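
Once geth is running, you can verify that its RPC endpoint is reachable with a quick script. This is only a minimal sketch, assuming the web3 Python package and geth's default RPC port 8545:

# Sketch: check that geth's JSON-RPC endpoint responds.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider('http://localhost:8545'))
print(w3.eth.block_number)  # prints the latest synced block number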

Run an instance of MongoDB with:

mongod --dbpath /path/to/your/db

Basic data gathering

To download and save course data for the cryptocurrency's whole history, run:

node data-downloader.js course

To download and save the 100 blocks after block 10 of the blockchain, use:

node data-downloader.js blockchain 10 100

For more info, run:

node data-downloader.js help

All data is saved as .json files in the data folder, relative to the script.
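
These files can be inspected with standard tools; for example, a minimal sketch that loads one of them (the file name here is hypothetical):

# Sketch: inspect one of the downloaded .json files.
import json

with open('data/course.json') as f:  # hypothetical file name
    data = json.load(f)
print(type(data), len(data))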

Database usage and management

Storing everything as plain .json files is neither scalable nor optimal. This project implements a high-level database service that manages data gathering, processing, storage and retrieval.

To download and save historical course data in the database, run:

python3 arcticdb.py --course

To download, process and save blockchain data in the database, run:

python3 arcticdb.py --blockchain

For more info:

python3 arcticdb.py -h
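
Once stored, data can also be read back programmatically. The sketch below assumes the service is backed by the Arctic time-series store on top of MongoDB (as the tool's name suggests); the library and symbol names are hypothetical and may differ from what arcticdb.py actually uses:

# Sketch: reading stored course data back out of an Arctic store.
from arctic import Arctic

store = Arctic('localhost')     # connect to the local MongoDB instance
library = store['course']       # hypothetical library name
item = library.read('BTC_ETH')  # hypothetical symbol name
print(item.data.head())         # the stored series as a pandas DataFrame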

Generation of data properties

Data properties are calculated from the raw data. They are features that represent certain activity in a way better suited to the deep network. They are generated for each course tick (the time interval for which we have course data).
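
For illustration, here is a sketch of how a property like transactionCount could be computed per tick; it assumes the raw blocks are available as a pandas DataFrame with a timestamp column, and the one-hour tick size is an arbitrary assumption (the real property implementations live in this project's code):

# Sketch: aggregating raw block data into a per-tick property.
import pandas as pd

def transaction_count_per_tick(blocks: pd.DataFrame) -> pd.Series:
    # 'blocks' has a 'time' datetime column and a per-block 'txCount' column
    return blocks.resample('1H', on='time')['txCount'].sum()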

To generate all of the available properties for all downloaded data, run the following command:

python3 property-generator.py --action generate

To generate one or more properties for all downloaded data, run the following command:

python3 property-generator.py --action generate --properties openPrice,closePrice

To remove all generated properties:

python3 property-generator.py --action remove

For more info:

python3 property-generator.py -h

Generation of a dataset

After the needed data properties are generated, you can proceed with generating the actual dataset. The dataset is generated using a certain dataset model. There are multiple dataset models that "compile" the properties and structure the dataset in different ways. The default is matrix, which generates matrices from a moving window over all of the properties.

The correct labels for each entry in the dataset are also generated. The default label type is boolean, which represents the sign of the next course change. The full label type represents the actual value change.

The result of dataset generation is a binary file, saved by default inside a data folder relative to the tool.
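
Conceptually, the matrix model and the boolean labels amount to something like the following numpy sketch (a simplification, not the project's actual model, which has many more hyperparameters; the window size here is an arbitrary assumption):

# Sketch: moving-window matrices with boolean (sign-of-next-change) labels.
import numpy as np

def make_dataset(properties: np.ndarray, close: np.ndarray, window: int = 10):
    # properties has shape (n_ticks, n_properties); close is the price per tick
    X = np.stack([properties[i:i + window]
                  for i in range(len(properties) - window)])
    # label 1 if the course rises on the tick right after each window
    y = (close[window:] > close[window - 1:-1]).astype(int)
    return X, y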

To generate a dataset from all available data with default settings and save it in a different location, run:

python3 dataset_generator.py --filename /my/path/dataset.pickle

To generate a dataset for a certain period of time, use:

python3 dataset_generator.py --start 2017-03-14-03 --end 2017-07-03-21

To generate a dataset using a certain model and a set of properties, use:

python3 dataset_generator.py --model matrix --properties accountBalanceDistribution,transactionCount,gasUsed

In most cases when training neural networks, we will need two or three datasets: a train, a validation (optional) and a test dataset. These can be generated with separate calls to dataset_generator.py for different date ranges, but we recommend using one date interval that covers all of the data and then splitting the resulting dataset into the needed parts. In our tool, this is done the following way:

python3 dataset_generator.py --ratio 6:2:2

This splits the dataset into chunks with a ratio of 6:2:2, i.e. 60%, 20% and 20%. One can also generate only two datasets, for example with:

python3 dataset_generator.py --ratio 5:3
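
The split itself is chronological: the dataset is cut into consecutive chunks sized by the given ratio. A minimal sketch of the idea:

# Sketch: splitting a dataset chronologically by a ratio such as 6:2:2.
def split_by_ratio(data, ratios=(6, 2, 2)):
    total = sum(ratios)
    bounds, start = [0], 0
    for r in ratios[:-1]:
        start += len(data) * r // total
        bounds.append(start)
    bounds.append(len(data))  # the last chunk absorbs any rounding remainder
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]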

Please keep in mind that the matrix model has dozens of hyperparameters that have been tuned to suit most cases. If your case differs, you will need to change them in the source code of the matrix model.

For all available options, please see:

python3 dataset_generator.py -h

Training and evaluating the neural network

The generated dataset can be used to train neural networks. The supported networks depend on the chosen dataset model. The matrix model supports all networks.

To train our convolutional network on an already generated dataset, while also shuffling the training data, we can do the following:

python3 neural_trainer.py path/to/your/dataset.pickle --models CONV --shuffle

Training a neural network can't be that simple, right? Right! You can override the default network hyperparameters to suit your dataset and problem. This can be done via:

python3 neural_trainer.py data/test.pickle --models CONV --args epoch=5,batch=1,lr=0.0001,kernel=3

This example sets the number of training epochs, the batch size, the learning rate and the kernel size for the whole convolutional network. Each network architecture has its own set of hyperparameters, defined alongside the network specification itself.
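
The key=value list maps naturally onto a dictionary of overrides; here is a sketch of how such a string might be parsed (the actual trainer's parsing and its recognized keys may differ):

# Sketch: parsing an --args string into hyperparameter overrides.
def parse_overrides(arg_string: str) -> dict:
    overrides = {}
    for pair in arg_string.split(','):
        key, value = pair.split('=')
        overrides[key] = float(value) if '.' in value else int(value)
    return overrides

print(parse_overrides('epoch=5,batch=1,lr=0.0001,kernel=3'))
# {'epoch': 5, 'batch': 1, 'lr': 0.0001, 'kernel': 3}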

After training, the network's performance is evaluated on the test dataset and measured by four or more different accuracy/error scores. The performance on the train and test datasets is also visualized on a graph in a new window. If you do not wish training to be blocked by a graph window, you can save the graph to a file instead by passing the --quiet parameter. This is useful for automated training of multiple networks, as it lets you review the results afterwards.
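
For boolean labels, one natural score is directional (sign) accuracy; mean squared error is a natural fit for full labels. A sketch of two such scores (the project's exact metrics may differ):

# Sketch: two example accuracy/error scores for evaluating predictions.
import numpy as np

def sign_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # fraction of predictions on the correct side of 0.5
    return float(np.mean((y_pred > 0.5) == (y_true > 0.5)))

def mean_squared_error(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean((y_true - y_pred) ** 2))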

Our other neural models include CustomDeep, LSTM and more to come.
