Pipeline standards #17

RGilliard-Arch · 2024-03-27T18:56:34Z

No description provided.

janejuenyang

Thanks for putting this together, Reggie! I like what you've noted so far and have requested a few additional sections.

janejuenyang · 2024-03-29T03:28:01Z

pipeline_standards.md

+    - Establish clear metrics for a successful pipeline.
+
+## Choose the right tools and technologies
+Depending on the data type, volume, and velocity, choose appropriate tools and technologies. For example: 


It might be helpful to distinguish between the tools available when building GFE-based pipelines and those available for cloud-based pipelines.

janejuenyang · 2024-03-29T03:34:44Z

pipeline_standards.md

+- Data orchestration: Apache Airflow 
+
+## Scalability and flexibility 
+Where possible, design the pipeline to be easily scaled up or down and to adapt to changes in data types and data formats.


I agree, though there needs to be a balance between adaptability/scalability and the time it takes to deliver what's needed. I suggest adding a few example questions to guide people in this consideration. For instance:

How will implementing a certain scaling capability or data handling flexibility affect the project timeline and code complexity / maintainability?

What cost constraints are there?

In addition, are there any minimum guidelines on flexibility and scalability? For instance, anything about the use of regex, minimizing hard-coding, etc.?
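To make the hard-coding question concrete, here is a minimal sketch of what such a guideline could look like in practice. It is not taken from the PIR code base; `PipelineConfig`, `select_input_files`, and the `report_<year>` file naming are all hypothetical names chosen for illustration. The idea is to keep file patterns in one configurable place and match them with a regex rather than listing exact file names in the code.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical settings that would otherwise be hard-coded in the pipeline."""

    # Assumed naming scheme: report_<4-digit year>.csv or .xlsx
    file_pattern: str = r"^report_(\d{4})\.(csv|xlsx)$"


def select_input_files(filenames, config):
    """Return (filename, year) pairs for names matching the configured pattern."""
    pattern = re.compile(config.file_pattern, flags=re.IGNORECASE)
    matches = []
    for name in filenames:
        found = pattern.match(name)
        if found:
            matches.append((name, int(found.group(1))))
    # Sort by year so downstream steps process files in a stable order.
    return sorted(matches, key=lambda pair: pair[1])
```

With this shape, supporting a new file format or naming scheme means changing the config value rather than editing the selection logic, which is one way to weigh flexibility against added complexity.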

janejuenyang · 2024-03-29T03:37:10Z

pipeline_standards.md

+## Monitoring and optimizing
+Continuously monitor the performance of the pipeline and seek opportunities to optimize data processing times, reduce costs, and improve data quality. Implement monitoring and logging to track the performance and health of the pipeline. Alerts should be set up for failures or significant performance degradations. Logs can include assessments of data quality and any major errors or inconsistencies caught during data quality checks.
+
+## Ensure security and compliance


Let's include a link to the HHS approved software list (noting that the link is only accessible within HHS).

janejuenyang · 2024-03-29T03:39:10Z

pipeline_standards.md

+## Scalability and flexibility 
+Where possible, design the pipeline to be easily scaled up or down and to adapt to changes in data types and data formats.
+
+## Implement data quality checks


In this (and all of the subsequent sections), can you add a sub-section for Examples and start by linking to relevant parts of the PIR code base? For future projects, we can similarly add links -- though some will be to private repos, which is okay.
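Until the links are in place, a placeholder example could be as small as the sketch below. It is a hypothetical illustration only, not code from the PIR code base: `check_rows` and its parameters are made-up names, and it assumes records arrive as a list of dictionaries. It flags missing required fields and duplicate keys, and returns the findings so they can be logged.

```python
def check_rows(rows, required_fields, key_field):
    """Return a list of human-readable issues found in `rows` (a list of dicts)."""
    issues = []
    seen_keys = set()
    for index, row in enumerate(rows):
        # Flag required fields that are absent, None, or empty strings.
        for field in required_fields:
            if row.get(field) in (None, ""):
                issues.append(f"row {index}: missing {field}")
        # Flag repeated key values; missing keys are reported above, not here.
        key = row.get(key_field)
        if key in seen_keys:
            issues.append(f"row {index}: duplicate {key_field}={key!r}")
        elif key not in (None, ""):
            seen_keys.add(key)
    return issues
```

Returning the issues as a list, rather than raising on the first problem, lets the pipeline decide whether to log them, alert, or stop.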

janejuenyang · 2024-03-29T03:39:54Z

pipeline_standards.md

@@ -0,0 +1,42 @@
+# Pipeline Best Practices


Please also add to the README

janejuenyang · 2024-03-29T03:43:14Z

pipeline_standards.md

+- How will success be measured?
+    - Establish clear metrics for a successful pipeline.
+
+## Choose the right tools and technologies


Please add a section on building iteratively, with frequent demos and syncs with the client. You can adapt it from the lessons learned doc.

RGilliard-Arch · 2024-03-29T12:25:44Z

Thank you @janejuenyang! @skalaga-arch is spearheading the work here, so I'll let him take the lead on these changes, but I'll review and contribute, especially on the items from the lessons learned.

First draft of pipeline standards

e04b46f

RGilliard-Arch requested a review from janejuenyang March 27, 2024 18:56

RGilliard-Arch assigned skalaga-arch Mar 27, 2024

janejuenyang requested changes Mar 29, 2024
