
SPENSER: full pipeline on single machine

A set of scripts for running the entire SPENSER pipeline on a single machine.

Install

Prerequisites

These scripts assume the following are installed and available:

  • conda: used to create the installation environment
  • pueue: a process queue used for running all LADs
  • a Nomisweb API key, exported as the environment variable $API_KEY (required for a successful installation)
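A minimal sketch for checking these prerequisites before installing (assumes a POSIX shell; the tool list is taken from the bullets above):

```shell
# Report whether each required tool is on PATH.
check_installed() {
  command -v "$1" >/dev/null 2>&1 && echo "found: $1" || echo "missing: $1"
}

for tool in conda pueue; do
  check_installed "$tool"
done

# The Nomisweb key must be exported in the current shell, e.g. `export API_KEY=...`
if [ -n "${API_KEY:-}" ]; then
  echo "API_KEY is set"
else
  echo "API_KEY is not set"
fi
```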

Submodule and environment set-up

From the repo root, run install.sh to set up the submodules and the conda environment (replace <CONDA_SPENSER> with a name for your conda environment):

./scripts/full_pipeline/install.sh <CONDA_SPENSER>

The installer uses conda to create the new virtual environment.

Pipeline

SPENSER with 2011 LAD codes

A single LAD can be run from the repo root with single_lad.sh (replace <LAD> with a specific 2011 LAD code):

./scripts/full_pipeline/single_lad.sh <CONDA_SPENSER> <LAD>
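For example, assuming the conda environment was named spenser (a hypothetical name) and using E08000035, the 2011 LAD code for Leeds:

```shell
# "spenser" is a hypothetical environment name; E08000035 is the 2011 LAD code for Leeds.
./scripts/full_pipeline/single_lad.sh spenser E08000035
```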

All LADs can be run with run_all_lads.sh from the repo root:

./scripts/full_pipeline/run_all_lads.sh <CONDA_SPENSER>

This script requires pueue to be installed. Running all 380 LADs on a single core will take several weeks.
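While run_all_lads.sh is queueing jobs, pueue's standard subcommands can be used to raise the parallelism and monitor progress (a sketch: 8 is an arbitrary worker count to suit your machine, and <task-id> is taken from the status output):

```shell
pueue parallel 8     # run up to 8 LADs concurrently instead of one
pueue status         # inspect queued, running and finished tasks
pueue log <task-id>  # view the output of a single task
```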

Postprocessing

Run the postprocessing script to merge 2011 LADs into 2020 LADs:

python scripts/postprocessing/spenser_to_2020_lads.py \
    --data_in <MICROSIMULATION_DATA_PATH> \
    --data_out <MICROSIMULATION_DATA_PATH>
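An example invocation with a hypothetical path; the command above uses the same placeholder for both arguments, which suggests the merged outputs may be written back alongside the inputs:

```shell
# data/microsimulation is a hypothetical path; adjust to your layout.
python scripts/postprocessing/spenser_to_2020_lads.py \
    --data_in data/microsimulation \
    --data_out data/microsimulation
```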

Collation

Finally, run collation.sh to reorganise the outputs for the SPC:

./scripts/full_pipeline/collation.sh \
    -s <MICROSIMULATION_DATA_PATH> \
    -t <COLLATED_SPENSER_PATH> \
    -d # dry-run flag
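A usage example with hypothetical paths: run with -d first to preview the reorganisation, then repeat without it to apply the changes:

```shell
# Preview only (dry run); data/microsimulation and data/collated are hypothetical paths.
./scripts/full_pipeline/collation.sh -s data/microsimulation -t data/collated -d
# Apply for real.
./scripts/full_pipeline/collation.sh -s data/microsimulation -t data/collated
```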

Test

A final check that all regions are covered may be performed with:

python scripts/postprocessing/final_check.py \
    --paths <COLLATED_SPENSER_PATHS>