Project Cleanup (#95)
* Setup CI/CD process around the se-demos-dev workspace (#85)

* cicd tests

* cicd tests2

* cicd tests3

* cicd tests4

* cicd tests5

* cicd tests5

* cicd tests6

* cicd tests7

* cicd tests8

* cicd tests9

* cicd dev branch

* cicd dev branch2

* formatting consistency

* setup dev vs demo workpools in CI

* Dev ci/cd (#86)

* cicd tests

* cicd tests2

* cicd tests3

* cicd tests4

* cicd tests5

* cicd tests5

* cicd tests6

* cicd tests7

* cicd tests8

* cicd tests9

* cicd dev branch

* cicd dev branch2

* formatting consistency

* setup dev vs demo workpools in CI

* remove extra env variable

* Adds a basic CI template to use for new projects/demos to the dev branch (#88)

* remove extra workspace env variable

* specifiy name on datalake ci job

* specifiy name on datalake ci job2

* initial ci template for new projects

* syntax edits

* move workflow template

* ci image templating

* use correct branch name for Dev ci process (#89)

* update ci env syntax (#90)

* update name in GHA (#91)

* remove Project Name env (#92)

* Update README.md (#93)

* remove legacy demos (#94)

---------

Co-authored-by: Jeff Hale <[email protected]>
masonmenges and discdiver authored May 21, 2024
1 parent 45b5c22 commit ea77ff8
Showing 20 changed files with 102 additions and 1,021 deletions.
68 changes: 68 additions & 0 deletions .github/ci_template.yaml
@@ -0,0 +1,68 @@
### Reference template for building new CI processes for
### new demo projects within the demos repo.
### Each demo project should be relatively self-contained
### within its own project directory, barring external requirements
### (i.e. work pools or other shared resources).

name: Build image and deploy Prefect flow - PROJECT_NAME

env:
  PROJECT_DIRECTORY: flows/PATH/TO/PROJECT
  PROD_WORKPOOL: PROD_WORKPOOL_NAME # Preconfigured work pool in the se-demos workspace
  DEV_WORKPOOL: DEV_WORKPOOL_NAME # Preconfigured work pool in the se-demos-dev workspace
  CLOUD_ENV: AWS # AWS, GCP, AZURE

on:
  push:
    branches:
      - main
      - Dev
    paths:
      # Note: the env context is not available in `on.push.paths`, so the
      # literal project path should be substituted here.
      - "$PROJECT_DIRECTORY/**"
  workflow_dispatch:

jobs:
  deploy:
    name: Deploy PROJECT_NAME flows
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      # Appropriate secrets should be set as GitHub secrets and referenced
      # here; this default targets an AWS ECR repository.
      - name: Log in to image registry
        uses: docker/login-action@v3
        if: env.CLOUD_ENV == 'AWS'
        with:
          registry: ${{ secrets.ECR_REPO }}
          username: ${{ secrets.AWS_ACCESS_KEY_ID }}
          password: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
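      # A possible login step for CLOUD_ENV == 'GCP', sketched from the
      # docker/login-action documentation and not part of the committed
      # template; the registry host and secret name are placeholders:
      #
      # - name: Log in to image registry (GCP)
      #   uses: docker/login-action@v3
      #   if: env.CLOUD_ENV == 'GCP'
      #   with:
      #     registry: us-docker.pkg.dev          # placeholder Artifact Registry host
      #     username: _json_key
      #     password: ${{ secrets.GCP_SA_KEY }}  # placeholder secret holding a service account key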

      - name: Get commit hash
        id: get-commit-hash
        run: echo "COMMIT_HASH=$(git rev-parse --short HEAD)" >> "$GITHUB_OUTPUT"

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: "pip"

      - name: Prefect Deploy
        # Environment variables referenced by the project's deploy.py script
        env:
          BRANCH: ${{ github.ref_name }}
          GITHUB_SHA: ${{ steps.get-commit-hash.outputs.COMMIT_HASH }}
          PREFECT_API_KEY: ${{ secrets.PREFECT_API_KEY }}
          IMG_REPO: ${{ secrets.ECR_REPO }} # the image registry secret should be referenced here
          WORKSPACE: ${{ github.ref == 'refs/heads/main' && 'se-demos' || 'se-demos-dev' }}
          WORK_POOL_NAME: ${{ github.ref == 'refs/heads/main' && env.PROD_WORKPOOL || env.DEV_WORKPOOL }}
          SCHEDULES_ACTIVE: ${{ github.ref == 'refs/heads/main' && 'True' || 'False' }}
        run: |
          cd $PROJECT_DIRECTORY
          pip install -r requirements-ci.txt
          prefect cloud workspace set -w sales-engineering/$WORKSPACE
          python deploy.py
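For reference, a minimal deploy.py that consumes the variables exported by the Prefect Deploy step might look like the following sketch. It is modeled on the datalake deploy script later in this diff; the flow module, flow name, deployment name, and cron string are placeholders, not part of this commit.

import os

from prefect import deploy
from prefect.client.schemas.schedules import CronSchedule
from prefect.deployments import DeploymentImage

from my_project.flow import my_flow  # placeholder import for the project's flow

# Values exported by the "Prefect Deploy" step of the CI workflow
image_repo = os.getenv("IMG_REPO")
image_tag = os.getenv("GITHUB_SHA")
work_pool_name = os.getenv("WORK_POOL_NAME")
schedules_active = os.getenv("SCHEDULES_ACTIVE") == "True"

my_flow_deployment = my_flow.to_deployment(
    name="my-flow",  # placeholder deployment name
    schedules=[
        {
            "schedule": CronSchedule(cron="0 9 * * *"),  # placeholder schedule
            "active": schedules_active,
        }
    ],
)

deploy(
    my_flow_deployment,
    image=DeploymentImage(
        name=image_repo,
        tag=image_tag,
        dockerfile="Dockerfile",
    ),
    work_pool_name=work_pool_name,
)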
25 changes: 17 additions & 8 deletions .github/workflows/aws_datalake.yaml
@@ -1,27 +1,32 @@
-name: Build image and deploy Prefect flow - Project 1
+name: Build image and deploy Prefect flow - S3 Datalake

 env:
-  PROJECT_NAME: flows/aws/datalake
+  PROJECT_DIRECTORY: flows/aws/datalake
+  PROD_WORKPOOL: Demo-ECS
+  DEV_WORKPOOL: Dev-ECS
+  CLOUD_ENV: AWS # AWS, GCP, AZURE

 on:
   push:
     branches:
       - main
+      - Dev
     paths:
-      - "flows/aws/datalake/**"
+      - "$PROJECT_DIRECTORY/**"
   workflow_dispatch:

 jobs:
   deploy:
-    name: Deploy AWS datalake flows
+    name: Deploy S3 flows
     runs-on: ubuntu-latest

     steps:
       - name: Checkout
         uses: actions/checkout@v4

-      - name: Log in to ECR
+      - name: Log in to image registry
         uses: docker/login-action@v3
+        if: env.CLOUD_ENV == 'AWS'
         with:
           registry: ${{ secrets.ECR_REPO }}
           username: ${{ secrets.AWS_ACCESS_KEY_ID }}
@@ -39,12 +44,16 @@ jobs:

       - name: Prefect Deploy
         env:
+          BRANCH: ${{ github.ref_name }}
           GITHUB_SHA: ${{ steps.get-commit-hash.outputs.COMMIT_HASH }}
           PREFECT_API_KEY: ${{ secrets.PREFECT_API_KEY }}
-          ECR_REPO: ${{ secrets.ECR_REPO }}
+          IMG_REPO: ${{ secrets.ECR_REPO }}
+          WORKSPACE: ${{ github.ref == 'refs/heads/main' && 'se-demos' || 'se-demos-dev' }}
+          WORK_POOL_NAME: ${{ github.ref == 'refs/heads/main' && env.PROD_WORKPOOL || env.DEV_WORKPOOL }}
+          SCHEDULES_ACTIVE: ${{ github.ref == 'refs/heads/main' && 'True' || 'False' }}
         run: |
-          cd flows/aws/datalake
+          cd $PROJECT_DIRECTORY
           pip install -r requirements-ci.txt
-          prefect cloud workspace set -w sales-engineering/se-datalake
+          prefect cloud workspace set -w sales-engineering/$WORKSPACE
           python deploy.py
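A note on the `${{ condition && value-if-true || value-if-false }}` expressions used for WORKSPACE, WORK_POOL_NAME, and SCHEDULES_ACTIVE above: GitHub Actions expressions have no true ternary operator, so this idiom only selects the middle operand when that operand is truthy. Using the non-empty strings 'True' and 'False' keeps the pattern safe. A hedged illustration (the BROKEN_FLAG variable is invented for contrast):

        env:
          # Selects 'True' on main and 'False' on any other branch; works
          # because the middle operand ('True') is a truthy, non-empty string.
          SCHEDULES_ACTIVE: ${{ github.ref == 'refs/heads/main' && 'True' || 'False' }}
          # Counter-example (hypothetical): with a falsy middle operand such as
          # '' the expression always falls through to the right-hand value.
          # BROKEN_FLAG: ${{ github.ref == 'refs/heads/main' && '' || 'fallback' }}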
52 changes: 0 additions & 52 deletions .github/workflows/development_deployment_action.yaml

This file was deleted.

40 changes: 2 additions & 38 deletions README.md
@@ -1,21 +1,12 @@
 # prefect-demos
 Welcome to our repository dedicated to showcasing a variety of Prefect demos.

-Here, you'll find an extensive collection of practical examples and workflows designed to demonstrate the versatility and power of Prefect as a modern data workflow automation tool. Whether you're new to Prefect or an experienced user seeking to enhance your workflow designs, this repository offers valuable insights and easy-to-follow examples. Dive into our demos to explore how Prefect seamlessly orchestrates complex data processes, ensuring efficient and reliable execution of your data tasks.
+These demos are meant to be end-to-end examples showcasing how multiple Prefect features can be combined to accommodate different use cases, demonstrating the versatility and power of Prefect as a workflow orchestration tool. The demos focus specifically on the code and the Prefect features needed to build each example; any external configuration required is assumed to have been completed separately. Dive into our demos to explore how Prefect seamlessly orchestrates complex data processes, ensuring efficient and reliable execution of your data tasks.

 Get inspired, learn best practices, and discover innovative ways to leverage Prefect in your data projects!

 # Flows
-We have broken down our flows into digestible one-off examples that can be easily plugged into your current implementation
-
-### AWS
-- [wave_data.py](flows/aws/wave_data.py)
-
-*Fetches wave height data via API, writes it to a file, reshapes it using pandas for analysis, and demonstrates Prefect's task caching and result storage capabilities using an AWS S3 bucket.*
-
-- [weather.py](flows/aws/weather.py)
-
-*Showcases error handling, task retries, conditional flows, result caching, notification alerts, and integration with AWS S3 for result storage*
+Demos are separated by project.

 #### Datalake Usage
 - [datalake_listener.py](flows/aws/datalake/datalake_listener.py)
@@ -30,34 +21,7 @@

 *Automates the deployment of two Prefect workflows for data processing: `datalake_listener`, which triggers on AWS S3 object creation, and `fetch_neo_by_date`, which fetches Near Earth Object data daily, using a Docker image from an ECR repository for execution on an ECS push work pool*

-### Dask
-- [partition_example.py](flows/simple_flows/partition_examples.py)
-
-*Demonstrates flexibility in deployment strategies with parallel and asynchronous data ingestion tasks for customers, payments, and orders within specified date ranges, utilizing Prefect with optional Dask for parallel execution*
-
-### Databricks
-- [consumer_flow.py](flows/simple_flows/consumer_flow.py)
-
-*Dynamically scales Databricks resources based on the count of unprocessed blocks, utilizing random values to simulate resource and workload metrics, and executing shell commands as part of the scaling process*
-
-### DBT
-- [dbt_snowflake_flow.py](flows/dbt/dbt_snowflake_flow.py)
-
-*Integrates Airbyte sync for data extraction, DBT Cloud for transformation jobs, Great Expectations for data quality checks, and Snowflake queries, with Slack notifications for task failures.*
-
-
-### Misc Flows
-- [hello.py](flows/simple_flows/hello.py)
-
-*Logs a hello, demonstrates task creation, logging, and optional tagging within a minimalist setup.*
-
-- [classic_flow.py](flows/simple_flows/classic_flow.py)
-
-*Concurrently fetch and report current temperatures for predefined cities, leveraging task caching and S3 for result persistence.*
-
 # Utilities
 Utilize utilities for any additional workflows necessary to keep Prefect-owned objects up to date.
6 changes: 3 additions & 3 deletions flows/aws/datalake/README.md
@@ -1,8 +1,8 @@
-# Datalake Workflow Automation
+# Data Lake Workflow Automation

 ## Overview

-This project automates the ingestion and processing of Near Earth Objects (NEO) data from NASA's API into an AWS S3 datalake using Prefect for orchestration. It comprises two main components: a data fetcher that retrieves and stores NEO data in S3, and a listener that processes this data upon arrival.
+This project automates the ingestion and processing of Near Earth Objects (NEO) data from NASA's API into an AWS S3 data lake using Prefect for orchestration. It comprises two main components: a data fetcher that retrieves and stores NEO data in S3, and a listener that processes this data upon arrival.

 ![image](/img/data_lake_diagram.png)

@@ -49,4 +49,4 @@
 ## Deployment
-Deployments are handled via the `deploy.py` script, which sets up the flows and configurations needed for execution in AWS. Ensure that you have the necessary AWS permissions and configurations in place.
+Deployments are handled via the `deploy.py` script, which sets up the flows and configurations needed for execution in AWS. Ensure that you have the necessary AWS permissions and configurations in place.
18 changes: 12 additions & 6 deletions flows/aws/datalake/deploy.py
@@ -7,6 +7,11 @@
 from prefect.deployments import DeploymentImage
 from prefect.events import DeploymentEventTrigger

+ecr_repo = os.getenv("IMG_REPO")
+image_tag = os.getenv("GITHUB_SHA")
+work_pool_name = os.getenv("WORK_POOL_NAME")
+schedules_active = os.getenv("SCHEDULES_ACTIVE")
+
 datalake_listener_deployment = datalake_listener.to_deployment(
     name="datalake_listener",
     triggers=[
@@ -25,13 +30,14 @@

 fetch_neo_by_date_deployment = fetch_neo_by_date.to_deployment(
     name="s3_nasa_fetch",
-    schedule=CronSchedule(cron="0 10 * * *"),
+    schedules=[
+        {
+            "schedule": CronSchedule(cron="0 10 * * *"),
+            "active": schedules_active,
+        }
+    ],
 )

-ecr_repo = os.getenv("ECR_REPO")
-image_tag = os.getenv("GITHUB_SHA")
-
-
 deploy(
     datalake_listener_deployment,
     fetch_neo_by_date_deployment,
@@ -40,5 +46,5 @@
         tag=image_tag,
         dockerfile="Dockerfile",
     ),
-    work_pool_name="Demo-ECS",
+    work_pool_name=work_pool_name,
 )
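One detail worth noting about the new schedule block: SCHEDULES_ACTIVE reaches deploy.py as the string "True" or "False" rather than a Python boolean. If an explicit boolean is preferred, it can be normalized before being passed to the schedule definition; a minimal sketch, not part of this commit:

import os

# SCHEDULES_ACTIVE is exported by the workflow as the string "True" (main)
# or "False" (Dev); compare against the expected value to get a real bool.
schedules_active = os.getenv("SCHEDULES_ACTIVE", "False").strip().lower() == "true"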
48 changes: 0 additions & 48 deletions flows/aws/wave_data.py

This file was deleted.

50 changes: 0 additions & 50 deletions flows/aws/weather.py

This file was deleted.
