Ben Darwin edited this page Jul 6, 2017 · 20 revisions

Pydpiper is a set of Python libraries and associated executables intended primarily for running image processing pipelines on compute grids. It provides a domain-specific language (DSL) for constructing pipelines out of smaller components, wrappers for numerous command-line tools (largely MINC-centric at present, but expanding to some NIFTI- and ITK-based tools), code for constructing common pipeline topologies, and command-line wrappers to run some core pipelines.

Conceptual overview

Pydpiper code can be used from within Python or packaged into an application and called from the shell. Roughly speaking, a pipeline runs in two phases. First, executing Pydpiper code determines the overall topology of the pipeline and the filenames of the input and output files of each step, compiling a graph of "stages" to be scheduled for execution. Second, the Pydpiper server spawns "executors" (either remote jobs on a compute grid or subprocesses on a local machine), which request stages (usually shell commands) from the server as each stage's dependencies are satisfied and then run them.
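The two phases can be illustrated with a toy scheduler. This is a minimal sketch under stated assumptions, not Pydpiper's implementation or API: the Stage class, the run_order function, and the placeholder echo commands are all hypothetical, and a single loop stands in for the server and its executors.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical stand-in for a pipeline stage (not a Pydpiper class)."""
    name: str
    command: str                      # usually a shell command
    depends_on: list = field(default_factory=list)


def run_order(stages):
    """Return stage names in an order that respects their dependencies.

    Each pass of the loop plays the role of the server handing out every
    stage whose dependencies have all finished.
    """
    pending = {s.name: s for s in stages}
    finished = set()
    order = []
    while pending:
        runnable = [s for s in pending.values()
                    if all(d in finished for d in s.depends_on)]
        if not runnable:
            raise ValueError("cycle or missing dependency in stage graph")
        for stage in runnable:
            # A real executor would run stage.command here.
            order.append(stage.name)
            finished.add(stage.name)
            del pending[stage.name]
    return order


# Phase 1: build the graph (placeholder commands, purely illustrative).
example_stages = [
    Stage("average", "echo average", depends_on=["blur_a", "blur_b"]),
    Stage("blur_a", "echo blur a"),
    Stage("blur_b", "echo blur b"),
]

# Phase 2: dispatch stages as their dependencies are satisfied.
example_order = run_order(example_stages)
print(example_order)
```

In this sketch the "average" stage is listed first but dispatched last, because it cannot start until both of the stages it depends on have finished.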

Monitoring an executing pipeline

Running the included check_pipeline_status.py script with a pipeline's <pipeline_name>_uri file as its argument provides a summary of running and finished stages, the number of running executors, and other information.

An important source of truth is the pipeline.log file created in the pipeline's output directory. You can control the logging level by setting the shell environment variable PYRO_LOGLEVEL (before the program starts) to one of DEBUG, INFO (the default), WARN, or ERROR. INFO reports stages starting and finishing, while WARN and ERROR report only potential problems with execution.
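Because the variable must be set before the program starts, one option when launching a pipeline from Python is to pass a modified environment to the child process. The following is a minimal sketch: the helper name env_with_loglevel is hypothetical and not part of Pydpiper, and the pipeline command itself is left for you to supply.

```python
import os

# The four levels named above.
VALID_LEVELS = ("DEBUG", "INFO", "WARN", "ERROR")


def env_with_loglevel(level, base_env=None):
    """Return a copy of the environment with PYRO_LOGLEVEL set.

    Hypothetical helper: it only prepares the environment and does not
    start a pipeline itself.
    """
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["PYRO_LOGLEVEL"] = level
    return env


debug_env = env_with_loglevel("DEBUG")
print(debug_env["PYRO_LOGLEVEL"])
```

The returned dictionary can be passed as the env argument of subprocess.run together with your own pipeline command, so that the variable is already in place when the pipeline process starts.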

The <pipeline_name>_finished_stages file lists completed stages only by number, which is not very informative on its own; to determine which commands have run, join it (using, e.g., Python's Pandas or R's tidyverse) with the <pipeline_name>_stages.txt file.
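Such a join can be sketched in plain Python. The file layouts here are assumptions, not documented formats: the sketch supposes that the finished-stages file holds one stage number per line and that each line of the stages file starts with a stage number followed by whitespace and the command. Check your own files and adjust the parsing if they differ. The function name finished_commands is hypothetical.

```python
def finished_commands(finished_lines, stage_lines):
    """Pair each finished stage number with its command.

    Assumed layouts (verify against your own files):
      finished_lines: one stage number per line
      stage_lines:    "<number><whitespace><command>" per line
    Returns a list of (number, command) tuples in finished order;
    command is None when the number is absent from stage_lines.
    """
    commands = {}
    for line in stage_lines:
        parts = line.strip().split(None, 1)
        if not parts:
            continue  # skip blank lines
        commands[int(parts[0])] = parts[1] if len(parts) > 1 else ""

    joined = []
    for line in finished_lines:
        line = line.strip()
        if line:
            number = int(line)
            joined.append((number, commands.get(number)))
    return joined


# In-memory example with placeholder commands (purely illustrative).
stages_example = ["0 echo first", "1 echo second", "2 echo third"]
finished_example = ["0", "2"]
joined_example = finished_commands(finished_example, stages_example)
for number, command in joined_example:
    print(number, command)
```

With real files, open the two files and pass the file objects in place of the lists, since iterating over an open text file yields its lines.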

Tips

  • Keep your pipeline name (--pipeline-name) and, ideally, your input filenames relatively short. Our filename propagation is currently rather unwieldy, and longer paths risk exceeding program-specific filename length limits, which prevents the pipeline from starting.
  • In principle, one can start additional executors (via pipeline_executor.py --uri-file ... --num-executors ... ) from the command line, but we rarely do this, so we're not certain how well it works.

Pydpiper 2

  • MBM.py
  • MAGeT.py
  • twolevel_model_building.py
  • registration_chain.py

Pydpiper 1

The pydpiper (version 1) wiki currently lives here:

https://wiki.phenogenomics.ca/display/MICePub/Pydpiper
