Project Cleanup (#95)
* Setup CI/CD process around the se-demos-dev workspace (#85)

* cicd tests

* cicd tests2

* cicd tests3

* cicd tests4

* cicd tests5

* cicd tests5

* cicd tests6

* cicd tests7

* cicd tests8

* cicd tests9

* cicd dev branch

* cicd dev branch2

* formatting consistency

* setup dev vs demo workpools in CI

* Dev ci/cd (#86)

* cicd tests

* cicd tests2

* cicd tests3

* cicd tests4

* cicd tests5

* cicd tests5

* cicd tests6

* cicd tests7

* cicd tests8

* cicd tests9

* cicd dev branch

* cicd dev branch2

* formatting consistency

* setup dev vs demo workpools in CI

* remove extra env variable

* Adds a basic CI template to use for new projects/demos to the dev branch (#88)

* remove extra workspace env variable

* specifiy name on datalake ci job

* specifiy name on datalake ci job2

* initial ci template for new projects

* syntax edits

* move workflow template

* ci image templating

* use correct branch name for Dev ci process (#89)

* update ci env syntax (#90)

* update name in GHA (#91)

* remove Project Name env (#92)

* Update README.md (#93)

* remove legacy demos (#94)

---------

Co-authored-by: Jeff Hale <[email protected]>
masonmenges and discdiver authored May 21, 2024
1 parent 45b5c22 commit ea77ff8
Showing 20 changed files with 102 additions and 1,021 deletions.
68 changes: 68 additions & 0 deletions .github/ci_template.yaml
@@ -0,0 +1,68 @@
### Reference template for building new CI processes for
### new demo projects within the demos repo.
### Each demo project should be relatively self-contained
### within its own project directory, barring external requirements
### (i.e. work pools or other shared resources).

name: Build image and deploy Prefect flow - PROJECT_NAME

env:
  PROJECT_DIRECTORY: flows/PATH/TO/PROJECT
  PROD_WORKPOOL: PROD_WORKPOOL_NAME # Preconfigured work pool in the se-demos workspace
  DEV_WORKPOOL: DEV_WORKPOOL_NAME # Preconfigured work pool in the se-demos-dev workspace
  CLOUD_ENV: AWS # AWS, GCP, AZURE

on:
  push:
    branches:
      - main
      - Dev
    paths:
      # Note: the env context is not available in `on.push.paths`, so the
      # literal project path should be substituted here.
      - "$PROJECT_DIRECTORY/**"
  workflow_dispatch:

jobs:
  deploy:
    name: Deploy PROJECT_NAME flows
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      # Appropriate secrets should be set as GitHub secrets and referenced
      # here; this default targets an AWS ECR repository.
      - name: Log in to image registry
        uses: docker/login-action@v3
        if: env.CLOUD_ENV == 'AWS'
        with:
          registry: ${{ secrets.ECR_REPO }}
          username: ${{ secrets.AWS_ACCESS_KEY_ID }}
          password: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
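      # A possible login step for CLOUD_ENV == 'GCP', sketched from the
      # docker/login-action documentation and not part of the committed
      # template; the registry host and secret name are placeholders:
      #
      # - name: Log in to image registry (GCP)
      #   uses: docker/login-action@v3
      #   if: env.CLOUD_ENV == 'GCP'
      #   with:
      #     registry: us-docker.pkg.dev          # placeholder Artifact Registry host
      #     username: _json_key
      #     password: ${{ secrets.GCP_SA_KEY }}  # placeholder secret holding a service account key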

      - name: Get commit hash
        id: get-commit-hash
        run: echo "COMMIT_HASH=$(git rev-parse --short HEAD)" >> "$GITHUB_OUTPUT"

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: "pip"

      - name: Prefect Deploy
        # Environment variables referenced by the project's deploy.py script
        env:
          BRANCH: ${{ github.ref_name }}
          GITHUB_SHA: ${{ steps.get-commit-hash.outputs.COMMIT_HASH }}
          PREFECT_API_KEY: ${{ secrets.PREFECT_API_KEY }}
          IMG_REPO: ${{ secrets.ECR_REPO }} # the image registry secret should be referenced here
          WORKSPACE: ${{ github.ref == 'refs/heads/main' && 'se-demos' || 'se-demos-dev' }}
          WORK_POOL_NAME: ${{ github.ref == 'refs/heads/main' && env.PROD_WORKPOOL || env.DEV_WORKPOOL }}
          SCHEDULES_ACTIVE: ${{ github.ref == 'refs/heads/main' && 'True' || 'False' }}
        run: |
          cd $PROJECT_DIRECTORY
          pip install -r requirements-ci.txt
          prefect cloud workspace set -w sales-engineering/$WORKSPACE
          python deploy.py
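For reference, a minimal deploy.py that consumes the variables exported by the Prefect Deploy step might look like the following sketch. It is modeled on the datalake deploy script later in this diff; the flow module, flow name, deployment name, and cron string are placeholders, not part of this commit.

import os

from prefect import deploy
from prefect.client.schemas.schedules import CronSchedule
from prefect.deployments import DeploymentImage

from my_project.flow import my_flow  # placeholder import for the project's flow

# Values exported by the "Prefect Deploy" step of the CI workflow
image_repo = os.getenv("IMG_REPO")
image_tag = os.getenv("GITHUB_SHA")
work_pool_name = os.getenv("WORK_POOL_NAME")
schedules_active = os.getenv("SCHEDULES_ACTIVE") == "True"

my_flow_deployment = my_flow.to_deployment(
    name="my-flow",  # placeholder deployment name
    schedules=[
        {
            "schedule": CronSchedule(cron="0 9 * * *"),  # placeholder schedule
            "active": schedules_active,
        }
    ],
)

deploy(
    my_flow_deployment,
    image=DeploymentImage(
        name=image_repo,
        tag=image_tag,
        dockerfile="Dockerfile",
    ),
    work_pool_name=work_pool_name,
)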
25 changes: 17 additions & 8 deletions .github/workflows/aws_datalake.yaml
@@ -1,27 +1,32 @@
-name: Build image and deploy Prefect flow - Project 1
+name: Build image and deploy Prefect flow - S3 Datalake

 env:
-  PROJECT_NAME: flows/aws/datalake
+  PROJECT_DIRECTORY: flows/aws/datalake
+  PROD_WORKPOOL: Demo-ECS
+  DEV_WORKPOOL: Dev-ECS
+  CLOUD_ENV: AWS # AWS, GCP, AZURE

 on:
   push:
     branches:
       - main
+      - Dev
     paths:
-      - "flows/aws/datalake/**"
+      - "$PROJECT_DIRECTORY/**"
   workflow_dispatch:

 jobs:
   deploy:
-    name: Deploy AWS datalake flows
+    name: Deploy S3 flows
     runs-on: ubuntu-latest

     steps:
       - name: Checkout
         uses: actions/checkout@v4

-      - name: Log in to ECR
+      - name: Log in to image registry
         uses: docker/login-action@v3
+        if: env.CLOUD_ENV == 'AWS'
         with:
           registry: ${{ secrets.ECR_REPO }}
           username: ${{ secrets.AWS_ACCESS_KEY_ID }}
@@ -39,12 +44,16 @@ jobs:

       - name: Prefect Deploy
         env:
+          BRANCH: ${{ github.ref_name }}
           GITHUB_SHA: ${{ steps.get-commit-hash.outputs.COMMIT_HASH }}
           PREFECT_API_KEY: ${{ secrets.PREFECT_API_KEY }}
-          ECR_REPO: ${{ secrets.ECR_REPO }}
+          IMG_REPO: ${{ secrets.ECR_REPO }}
+          WORKSPACE: ${{ github.ref == 'refs/heads/main' && 'se-demos' || 'se-demos-dev' }}
+          WORK_POOL_NAME: ${{ github.ref == 'refs/heads/main' && env.PROD_WORKPOOL || env.DEV_WORKPOOL }}
+          SCHEDULES_ACTIVE: ${{ github.ref == 'refs/heads/main' && 'True' || 'False' }}
         run: |
-          cd flows/aws/datalake
+          cd $PROJECT_DIRECTORY
           pip install -r requirements-ci.txt
-          prefect cloud workspace set -w sales-engineering/se-datalake
+          prefect cloud workspace set -w sales-engineering/$WORKSPACE
           python deploy.py
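A note on the `${{ condition && value-if-true || value-if-false }}` expressions used for WORKSPACE, WORK_POOL_NAME, and SCHEDULES_ACTIVE above: GitHub Actions expressions have no true ternary operator, so this idiom only selects the middle operand when that operand is truthy. Using the non-empty strings 'True' and 'False' keeps the pattern safe. A hedged illustration (the BROKEN_FLAG variable is invented for contrast):

        env:
          # Selects 'True' on main and 'False' on any other branch; works
          # because the middle operand ('True') is a truthy, non-empty string.
          SCHEDULES_ACTIVE: ${{ github.ref == 'refs/heads/main' && 'True' || 'False' }}
          # Counter-example (hypothetical): with a falsy middle operand such as
          # '' the expression always falls through to the right-hand value.
          # BROKEN_FLAG: ${{ github.ref == 'refs/heads/main' && '' || 'fallback' }}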
52 changes: 0 additions & 52 deletions .github/workflows/development_deployment_action.yaml

This file was deleted.

40 changes: 2 additions & 38 deletions README.md
@@ -1,21 +1,12 @@
 # prefect-demos
 Welcome to our repository dedicated to showcasing a variety of Prefect demos.

-Here, you'll find an extensive collection of practical examples and workflows designed to demonstrate the versatility and power of Prefect as a modern data workflow automation tool. Whether you're new to Prefect or an experienced user seeking to enhance your workflow designs, this repository offers valuable insights and easy-to-follow examples. Dive into our demos to explore how Prefect seamlessly orchestrates complex data processes, ensuring efficient and reliable execution of your data tasks.
+These demos are meant to be end-to-end examples showcasing how multiple Prefect features can be combined to accommodate different use cases, demonstrating the versatility and power of Prefect as a workflow orchestration tool. The demos focus specifically on the code and the Prefect features needed to build each example; any external configuration required is assumed to have been completed separately. Dive into our demos to explore how Prefect seamlessly orchestrates complex data processes, ensuring efficient and reliable execution of your data tasks.

 Get inspired, learn best practices, and discover innovative ways to leverage Prefect in your data projects!

 # Flows
-We have broken down our flows into digestible one-off examples that can be easily plugged into your current implementation
-
-### AWS
-- [wave_data.py](flows/aws/wave_data.py)
-
-*Fetches wave height data via API, writes it to a file, reshapes it using pandas for analysis, and demonstrates Prefect's task caching and result storage capabilities using an AWS S3 bucket.*
-
-- [weather.py](flows/aws/weather.py)
-
-*Showcases error handling, task retries, conditional flows, result caching, notification alerts, and integration with AWS S3 for result storage*
+Demos are separated by project.

 #### Datalake Usage
 - [datalake_listener.py](flows/aws/datalake/datalake_listener.py)
@@ -30,34 +21,7 @@

 *Automates the deployment of two Prefect workflows for data processing: `datalake_listener`, which triggers on AWS S3 object creation, and `fetch_neo_by_date`, which fetches Near Earth Object data daily, using a Docker image from an ECR repository for execution on an ECS push work pool*

-### Dask
-- [partition_example.py](flows/simple_flows/partition_examples.py)
-
-*Demonstrates flexibility in deployment strategies with parallel and asynchronous data ingestion tasks for customers, payments, and orders within specified date ranges, utilizing Prefect with optional Dask for parallel execution*
-
-### Databricks
-- [consumer_flow.py](flows/simple_flows/consumer_flow.py)
-
-*Dynamically scales Databricks resources based on the count of unprocessed blocks, utilizing random values to simulate resource and workload metrics, and executing shell commands as part of the scaling process*
-
-### DBT
-- [dbt_snowflake_flow.py](flows/dbt/dbt_snowflake_flow.py)
-
-*Integrates Airbyte sync for data extraction, DBT Cloud for transformation jobs, Great Expectations for data quality checks, and Snowflake queries, with Slack notifications for task failures.*
-
-
-### Misc Flows
-- [hello.py](flows/simple_flows/hello.py)
-
-*Logs a hello, demonstrates task creation, logging, and optional tagging within a minimalist setup.*
-
-- [classic_flow.py](flows/simple_flows/classic_flow.py)
-
-*Concurrently fetch and report current temperatures for predefined cities, leveraging task caching and S3 for result persistence.*
-
 # Utilities
 Utilize utilities for any additional workflows necessary to keep Prefect-owned objects up to date.
6 changes: 3 additions & 3 deletions flows/aws/datalake/README.md
@@ -1,8 +1,8 @@
-# Datalake Workflow Automation
+# Data Lake Workflow Automation

 ## Overview

-This project automates the ingestion and processing of Near Earth Objects (NEO) data from NASA's API into an AWS S3 datalake using Prefect for orchestration. It comprises two main components: a data fetcher that retrieves and stores NEO data in S3, and a listener that processes this data upon arrival.
+This project automates the ingestion and processing of Near Earth Objects (NEO) data from NASA's API into an AWS S3 data lake using Prefect for orchestration. It comprises two main components: a data fetcher that retrieves and stores NEO data in S3, and a listener that processes this data upon arrival.

 ![image](/img/data_lake_diagram.png)

@@ -49,4 +49,4 @@
 ## Deployment
-Deployments are handled via the `deploy.py` script, which sets up the flows and configurations needed for execution in AWS. Ensure that you have the necessary AWS permissions and configurations in place.
+Deployments are handled via the `deploy.py` script, which sets up the flows and configurations needed for execution in AWS. Ensure that you have the necessary AWS permissions and configurations in place.
18 changes: 12 additions & 6 deletions flows/aws/datalake/deploy.py
@@ -7,6 +7,11 @@
 from prefect.deployments import DeploymentImage
 from prefect.events import DeploymentEventTrigger

+ecr_repo = os.getenv("IMG_REPO")
+image_tag = os.getenv("GITHUB_SHA")
+work_pool_name = os.getenv("WORK_POOL_NAME")
+schedules_active = os.getenv("SCHEDULES_ACTIVE")
+
 datalake_listener_deployment = datalake_listener.to_deployment(
     name="datalake_listener",
     triggers=[
@@ -25,13 +30,14 @@

 fetch_neo_by_date_deployment = fetch_neo_by_date.to_deployment(
     name="s3_nasa_fetch",
-    schedule=CronSchedule(cron="0 10 * * *"),
+    schedules=[
+        {
+            "schedule": CronSchedule(cron="0 10 * * *"),
+            "active": schedules_active,
+        }
+    ],
 )

-ecr_repo = os.getenv("ECR_REPO")
-image_tag = os.getenv("GITHUB_SHA")
-
-
 deploy(
     datalake_listener_deployment,
     fetch_neo_by_date_deployment,
@@ -40,5 +46,5 @@
         tag=image_tag,
         dockerfile="Dockerfile",
     ),
-    work_pool_name="Demo-ECS",
+    work_pool_name=work_pool_name,
 )
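One detail worth noting about the new schedule block: SCHEDULES_ACTIVE reaches deploy.py as the string "True" or "False" rather than a Python boolean. If an explicit boolean is preferred, it can be normalized before being passed to the schedule definition; a minimal sketch, not part of this commit:

import os

# SCHEDULES_ACTIVE is exported by the workflow as the string "True" (main)
# or "False" (Dev); compare against the expected value to get a real bool.
schedules_active = os.getenv("SCHEDULES_ACTIVE", "False").strip().lower() == "true"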
48 changes: 0 additions & 48 deletions flows/aws/wave_data.py

This file was deleted.

50 changes: 0 additions & 50 deletions flows/aws/weather.py

This file was deleted.
