# Deployment

Requirements:

1. BigQuery tables: `transactions`, `errors`, `dedupe_state`, `transaction_types`
2. PubSub topic for transactions
3. GCS bucket: used for Dataflow templates, staging, and as the temp location
4. ETL Pipeline from PubSub to BigQuery:
   1. PubSub subscription
   2. Service account with the following roles: BigQuery Data Editor, Dataflow Worker, Pub/Sub Subscriber, and Storage Admin
5. Deduplication Task
   1. Service account with the following roles: BigQuery Data Editor, BigQuery Job User, and Monitoring Metric Writer
6. Mirror Importer
   1. Service account with the following role: Pub/Sub Publisher
7. (Optional) ETL Pipeline from PubSub to GCS
   1. GCS bucket for the output of the pipeline
   2. Service account with the following roles: Dataflow Worker, Pub/Sub Editor (for creating the subscription), and Storage Admin

Resource creation can be automated using `setup-gcp-resources.sh`. The Google Cloud SDK is required to run the script.
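As a rough illustration of what the script automates, the following sketch shows roughly equivalent `gcloud`, `bq`, and `gsutil` commands. All names (project, dataset, topic, bucket, service account) are placeholders, table schemas are omitted, and only the ETL pipeline's service account is shown; refer to `setup-gcp-resources.sh` for the authoritative commands.

```bash
# Illustrative sketch only -- placeholder names; see setup-gcp-resources.sh for the real commands.
PROJECT_ID=my-project           # placeholder
DATASET=mirror_node             # placeholder
BUCKET=gs://${PROJECT_ID}-etl   # placeholder
TOPIC=transactions              # placeholder

# Pub/Sub topic for transactions
gcloud pubsub topics create ${TOPIC} --project=${PROJECT_ID}

# GCS bucket for Dataflow templates, staging, and temp files
gsutil mb -p ${PROJECT_ID} ${BUCKET}

# BigQuery dataset and tables (pass each table's schema file in practice)
bq mk --dataset ${PROJECT_ID}:${DATASET}
for TABLE in transactions errors dedupe_state transaction_types; do
  bq mk --table ${PROJECT_ID}:${DATASET}.${TABLE}
done

# Service account for the PubSub-to-BigQuery ETL pipeline, with the roles listed above
gcloud iam service-accounts create etl-pipeline --project=${PROJECT_ID}
for ROLE in roles/bigquery.dataEditor roles/dataflow.worker roles/pubsub.subscriber roles/storage.admin; do
  gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:etl-pipeline@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role="${ROLE}"
done
```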

## Steps

  1. Deploy ETL pipeline

Use the `deploy-etl-pipeline.sh` script to deploy the ETL pipeline to GCP Dataflow.
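The actual arguments are defined in the script; as a hedged sketch, launching a Beam pipeline on Dataflow from a pre-built template generally looks like the following. The job name, template path, region, and parameters (`inputSubscription`, `outputTable`) are placeholders and may not match the script's real arguments.

```bash
# Hedged sketch of launching a Dataflow job from a GCS-hosted template; all values are placeholders.
gcloud dataflow jobs run etl-pubsub-to-bigquery \
  --project=my-project \
  --region=us-central1 \
  --gcs-location=gs://my-project-etl/templates/etl-pipeline \
  --staging-location=gs://my-project-etl/staging \
  --parameters=inputSubscription=projects/my-project/subscriptions/transactions-sub,outputTable=my-project:mirror_node.transactions
```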

  2. Deploy Deduplication task

TODO

  3. Deploy Hedera Mirror Node Importer to publish transactions to the PubSub topic. See Mirror Nodes installation and configuration for more details.
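Once the importer is publishing, one way to confirm that messages are arriving is to pull a few from a subscription on the topic. The subscription name below is a placeholder, not one created by this repo's scripts.

```bash
# Pull a few messages to verify the importer is publishing (placeholder subscription name).
gcloud pubsub subscriptions pull transactions-sub --project=my-project --limit=3 --auto-ack
```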