Ben Darwin edited this page Jul 6, 2017 · 20 revisions

Overview

Pydpiper is a set of Python libraries and associated executables intended primarily for running image processing pipelines on compute grids. It provides a domain-specific language (DSL) for constructing pipelines out of smaller components, wrappers for numerous command-line tools (currently largely MINC-centric, but expanding to some NIfTI- and ITK-based tools), code for constructing common pipeline topologies, and command-line wrappers to run some core pipelines.

Conceptual overview

Pydpiper code can be used from within Python or packaged into an application and called from the shell. Roughly speaking, the process is as follows: first, pipeline

Monitoring an executing pipeline

Running the included check_pipeline_status.py script with a pipeline's <pipeline_name>_uri file as its argument prints a summary of running and finished stages, the number of running executors, and other information.
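If you want to poll a pipeline's status from another Python script, a small wrapper around the status script might look like the following. This is a minimal sketch, assuming that the <pipeline_name>_uri file lives in the pipeline's output directory and that check_pipeline_status.py is installed on your PATH; the helper names `status_command` and `check_status` are hypothetical and not part of Pydpiper.

```python
import subprocess
from pathlib import Path
from typing import List


def status_command(output_dir, pipeline_name) -> List[str]:
    """Build the command line for check_pipeline_status.py (hypothetical helper).

    ASSUMPTION: the <pipeline_name>_uri file is in the pipeline's output
    directory and check_pipeline_status.py is on the PATH.
    """
    uri_file = Path(output_dir) / f"{pipeline_name}_uri"
    return ["check_pipeline_status.py", str(uri_file)]


def check_status(output_dir, pipeline_name) -> str:
    """Run the status script and return the summary it prints (hypothetical helper)."""
    cmd = status_command(output_dir, pipeline_name)
    # Fail early with a clear message if the URI file has not been written yet.
    if not Path(cmd[1]).exists():
        raise FileNotFoundError(f"no URI file at {cmd[1]}; has the pipeline started?")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `print(check_status("my_output_dir", "my_pipeline"))` would print the same summary you get from running the script by hand.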

An important source of truth is the pipeline.log file created in the pipeline's output directory. You can control the logging level by setting the shell environment variable PYRO_LOGLEVEL to one of DEBUG, INFO (the default), WARN, or ERROR. INFO reports information about stages starting and finishing, while WARN and ERROR will only report potential problems with execution.
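When a pipeline is launched from Python rather than from an interactive shell, the same environment variable can be passed to the child process. The sketch below only sets PYRO_LOGLEVEL in a copy of the environment; `env_with_loglevel` and `run_pipeline` are hypothetical helper names, and `cmd` stands for whatever pipeline command you would normally type in the shell.

```python
import os
import subprocess
from typing import Dict, List, Optional

# The levels listed on this page for PYRO_LOGLEVEL.
VALID_LEVELS = ("DEBUG", "INFO", "WARN", "ERROR")


def env_with_loglevel(level: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with PYRO_LOGLEVEL set (hypothetical helper)."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"PYRO_LOGLEVEL must be one of {VALID_LEVELS}, got {level!r}")
    # Copy so the current process's own environment is left untouched.
    env = dict(os.environ if base is None else base)
    env["PYRO_LOGLEVEL"] = level
    return env


def run_pipeline(cmd: List[str], level: str = "INFO") -> subprocess.CompletedProcess:
    """Launch a pipeline command with the chosen log level (hypothetical helper)."""
    return subprocess.run(cmd, env=env_with_loglevel(level), check=True)
```

Passing `env=` to `subprocess.run` affects only the launched process, so several pipelines can be started from one script with different log levels.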

The <pipeline_name>_finished_stages file lists completed stages only by their number, which is not very informative on its own; to determine which commands have run, join it against the <pipeline_name>_stages.txt file (using, e.g., Python's pandas or R's tidyverse).
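The join can also be done with the standard library alone. This is a sketch under assumed file layouts that you should check against your own files: it assumes the finished-stages file has one stage number per line, and that each line of <pipeline_name>_stages.txt begins with a stage number followed by whitespace and the command. All function names here are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def parse_finished(lines: Iterable[str]) -> List[int]:
    # ASSUMPTION: one stage number per line (first token); blank lines skipped.
    return [int(line.split()[0]) for line in lines if line.strip()]


def parse_stages(lines: Iterable[str]) -> Dict[int, str]:
    # ASSUMPTION: each line is "<stage number><whitespace><command>".
    stages: Dict[int, str] = {}
    for line in lines:
        parts = line.split(maxsplit=1)
        if parts:
            stages[int(parts[0])] = parts[1].strip() if len(parts) > 1 else ""
    return stages


def join_finished(finished: List[int], stages: Dict[int, str]) -> List[Tuple[int, str]]:
    # Inner join on stage number: keep only finished stages that have a command.
    return [(n, stages[n]) for n in finished if n in stages]


def finished_commands(finished_file, stages_file) -> List[Tuple[int, str]]:
    """Read both files and return (stage number, command) for each finished stage."""
    finished = parse_finished(Path(finished_file).read_text().splitlines())
    stages = parse_stages(Path(stages_file).read_text().splitlines())
    return join_finished(finished, stages)
```

In pandas, the equivalent is loading both files into DataFrames with a shared stage-number column and calling `DataFrame.merge` on that column.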

Pydpiper 2

Pydpiper 1

The pydpiper (version 1) wiki currently lives here:

https://wiki.phenogenomics.ca/display/MICePub/Pydpiper
