Tag Engine lets you automate the process of creating and populating metadata tags with Google Cloud's Data Catalog. Tag Engine is licensed under the Apache 2 license terms. Please make sure to read, understand and agree to the terms of the LICENSE and CONTRIBUTING files before proceeding.

francescpuig7/datacatalog-tag-engine

Tag Engine

This branch contains the Tag Engine 1.0 application, which is hosted on App Engine. An early release of Tag Engine 2.0, which is hosted on Cloud Run instead of App Engine, is available from the cloud-run branch.

Tag Engine is an open-source extension to Google Cloud's Data Catalog. Tag Engine automates the tagging of BigQuery tables and views as well as data lake files in Cloud Storage. You create a configuration, which contains SQL expressions that define how to populate the fields in the tags. Tag Engine runs the configuration either on demand or on a pre-defined schedule.

Documentation

Deployment Procedure

Tag Engine 1.0 is a Flask application that runs on Google App Engine and uses Firestore as its database. It assumes that you will be tagging assets in BigQuery or Google Cloud Storage. Follow the steps below to deploy the Tag Engine application in your Google Cloud project.

Note: In the deployment procedure below, we use one GCP project for running Tag Engine and Data Catalog and another project for storing data assets in BigQuery. If this is your first time running Tag Engine, you may want to keep everything in one project for simplicity.

Step 1: Set the required environment variables

export TAG_ENGINE_PROJECT=tag-engine-vanilla-337221
export TAG_ENGINE_REGION=us-central
export TAG_ENGINE_SUB_REGION=us-central1
export BQ_PROJECT=warehouse-337221
export TAG_ENGINE_SA=${TAG_ENGINE_PROJECT}@appspot.gserviceaccount.com
gcloud config set project $TAG_ENGINE_PROJECT
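
Later steps interpolate these variables into files and gcloud flags, so an unset variable silently becomes an empty string. The following is a minimal sketch of a pre-flight check that covers every variable the later steps read, including TAG_ENGINE_SUB_REGION from Step 4. The function name check_tag_engine_env is hypothetical and not part of Tag Engine.

```shell
# Hypothetical helper (not part of Tag Engine): report each required
# variable that is unset or empty, and return non-zero if any is missing.
check_tag_engine_env() {
  missing=0
  for name in TAG_ENGINE_PROJECT TAG_ENGINE_REGION TAG_ENGINE_SUB_REGION \
              BQ_PROJECT TAG_ENGINE_SA; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

Running check_tag_engine_env before Step 4 prints one line per missing variable to standard error and returns a non-zero status if any are missing.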

Step 2: Enable the following Google Cloud APIs

gcloud services enable iam.googleapis.com
gcloud services enable appengine.googleapis.com

Step 3: Clone this code repository

git clone https://github.com/GoogleCloudPlatform/datacatalog-tag-engine.git

Step 4: Set the input variables

cat > datacatalog-tag-engine/deploy/variables.tfvars << EOL
tag_engine_project="${TAG_ENGINE_PROJECT}"
bigquery_project="${BQ_PROJECT}"
app_engine_region="${TAG_ENGINE_REGION}"
app_engine_subregion="${TAG_ENGINE_SUB_REGION}"
EOL
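
Because the shell expands the variables at the moment the heredoc is written, any variable that was unset leaves an empty value in the file, for example app_engine_subregion="". The following sketch flags such entries. The function name find_empty_tfvars is hypothetical, and it assumes the simple key="value" layout written above.

```shell
# Hypothetical helper: print every key in a tfvars file whose value is the
# empty string (key=""), and return non-zero if at least one is found.
# Usage: find_empty_tfvars FILE
find_empty_tfvars() {
  # Match lines of the form key="" with optional spaces around "=".
  empty=$(grep -E '^[A-Za-z_][A-Za-z0-9_]*[[:space:]]*=[[:space:]]*""[[:space:]]*$' "$1" \
            | cut -d= -f1 | tr -d ' ')
  if [ -n "${empty}" ]; then
    echo "${empty}"
    return 1
  fi
  return 0
}
```

Pass it the path of the variables.tfvars file written above. If it prints a key, export the corresponding variable and regenerate the file before running Terraform.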

Edit the five variables in datacatalog-tag-engine/tagengine.ini to match your deployment. The values below are examples:

[DEFAULT]
TAG_ENGINE_PROJECT = tag-engine-develop
QUEUE_REGION = us-central1
INJECTOR_QUEUE = tag-engine-injector-queue
WORK_QUEUE = tag-engine-work-queue
BIGQUERY_REGION = us-central1
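
To confirm an edit, or to reuse a value in a later command, a key can be read back from the shell. This is a minimal sketch using awk. The function name ini_get is hypothetical, and it assumes the plain "KEY = value" layout shown above, with no inline comments.

```shell
# Hypothetical helper: print the value of KEY from a "KEY = value" INI file.
# Usage: ini_get FILE KEY
ini_get() {
  awk -F'=' -v key="$2" '
    {
      k = $1
      gsub(/^[ \t]+|[ \t]+$/, "", k)            # trim spaces around the key
      if (k == key) {
        v = substr($0, index($0, "=") + 1)      # text after the first "="
        gsub(/^[ \t]+|[ \t]+$/, "", v)          # trim spaces around the value
        print v
        exit
      }
    }' "$1"
}
```

For example, ini_get datacatalog-tag-engine/tagengine.ini QUEUE_REGION prints the configured queue region. The function prints nothing if the key is absent.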

Step 5: Create the Firestore database and deploy the App Engine application

gcloud alpha firestore databases create --project=$TAG_ENGINE_PROJECT --region=$TAG_ENGINE_REGION
gcloud app create --project=$TAG_ENGINE_PROJECT --region=$TAG_ENGINE_REGION
gcloud app deploy datacatalog-tag-engine/app.yaml

Note: The deploy command assumes that you will be running Tag Engine using App Engine's default Service Account (SA). This SA gets created automatically when you run the deploy command and is assigned the 'Editor' role on the project. Verify that the SA has been assigned the Editor role before continuing with the deployment.

Step 6: Secure App Engine with firewall rules

gcloud app firewall-rules create 100 --action ALLOW --source-range [IP_RANGE]
gcloud app firewall-rules update default --action DENY

Alternatively, control access to App Engine by user identity (instead of IP address) with Identity-Aware Proxy (IAP).

Step 7: Run the Terraform scripts

gcloud auth application-default login
cd datacatalog-tag-engine/deploy
terraform init
terraform apply -var-file=variables.tfvars

Note: The deployment can take up to one hour because of the large number of index builds. The 27 Firestore indexes are created sequentially to avoid hitting concurrency limits in Firestore.

Step 8: Launch the Tag Engine UI

gcloud app browse

Hint: read this tutorial to learn about Tag Engine's various tag configuration options.

Common UI Commands:

  • Open the Tag Engine UI:
    gcloud app browse

Common API Commands:

  • Create a static asset config:
    curl -X POST [TAG ENGINE URL]/static_asset_tags -d @examples/static_asset_configs/static_asset_create_auto_bq.json

  • Create a dynamic table config:
    curl -X POST [TAG ENGINE URL]/dynamic_table_tags -d @examples/dynamic_table_configs/dynamic_table_create_auto.json

  • Create a dynamic column config:
    curl -X POST [TAG ENGINE URL]/dynamic_column_tags -d @examples/dynamic_column_configs/dynamic_column_create_auto.json

  • Create a glossary asset config:
    curl -X POST [TAG ENGINE URL]/glossary_asset_tags -d @examples/glossary_asset_configs/glossary_asset_create_ondemand_bq.json

  • Create a sensitive column config:
    curl -X POST [TAG ENGINE URL]/sensitive_column_tags -d @examples/sensitive_column_configs/sensitive_column_create_auto.json

  • Create a Data Catalog entry config:
    curl -X POST [TAG ENGINE URL]/entries -d @examples/entry_configs/entry_create_auto.json

  • Import tags from CSV files:
    curl -X POST [TAG ENGINE URL]/import_tags -d @examples/import_configs/import_column_tags.json

  • Export tags to BigQuery tables:
    curl -X POST [TAG ENGINE URL]/export_tags -d @examples/export_configs/export_tags_by_project.json

  • Restore tags from Data Catalog's metadata export:
    curl -X POST [TAG ENGINE URL]/restore_tags -d @examples/restore_configs/restore_table_tags.json

  • Get the status of a job:
    curl -X POST [TAG ENGINE URL]/get_job_status -d '{"job_uuid":"47aa9460fbac11ecb1a0190a014149c1"}'
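
The config-creation calls above share one shape: a POST to an endpoint under the Tag Engine URL with a JSON config file as the body. A small wrapper can cut down on repetition. This is only a sketch: the function name te_post and the TAG_ENGINE_URL variable are hypothetical, and TAG_ENGINE_URL is assumed to hold the base URL of your deployment.

```shell
# Hypothetical wrapper: POST a JSON config file to a Tag Engine endpoint.
# Usage: te_post ENDPOINT CONFIG_FILE
# Assumes TAG_ENGINE_URL is set to the base URL of the deployment.
te_post() {
  if [ ! -f "$2" ]; then
    echo "config file not found: $2" >&2
    return 1
  fi
  # ${TAG_ENGINE_URL%/} drops one trailing slash so the URL has no "//".
  curl -X POST "${TAG_ENGINE_URL%/}/$1" -d "@$2"
}
```

With this in place, te_post dynamic_table_tags examples/dynamic_table_configs/dynamic_table_create_auto.json issues the same request as the corresponding curl command above, and fails early if the config file path is wrong.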

Troubleshooting:

  • Consult the App Engine logs if you encounter any errors while using Tag Engine:
    gcloud app logs tail -s default
