A set of scripts for running the entire SPENSER pipeline on a single machine.
These scripts assume the following are installed:
- conda: for installation and environment management
- pueue: a process queue for running all LADs
- Nomisweb API key: required as the environment variable $API_KEY for successful installation
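The prerequisites above can be verified before installation with a short preflight check. This is a sketch, not part of the repo's scripts; the tool names and the $API_KEY variable come from the list above.

```shell
# Preflight sketch (hypothetical helper, not shipped with the repo):
# reports whether each prerequisite from the list above is available.

check_tool() {
  # Print "found" or "missing" for a given executable name.
  if command -v "$1" >/dev/null 2>&1; then
    echo "found"
  else
    echo "missing"
  fi
}

echo "conda: $(check_tool conda)"
echo "pueue: $(check_tool pueue)"

if [ -n "${API_KEY:-}" ]; then
  echo "API_KEY: set"
else
  echo "API_KEY: missing (export API_KEY=<your Nomisweb key>)"
fi
```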
Run install.sh from the repo root to set up the submodules and conda environment (replace <CONDA_SPENSER> with a name for your conda environment):
./scripts/full_pipeline/install.sh <CONDA_SPENSER>
Installation assumes conda is installed for creating the new virtual environment.
A single LAD can be run from the repo root with single_lad.sh (replace <LAD> with a specific LAD code):
./scripts/full_pipeline/single_lad.sh <CONDA_SPENSER> <LAD>
All LADs can be run with run_all_lads.sh from the repo root:
./scripts/full_pipeline/run_all_lads.sh <CONDA_SPENSER>
This script requires pueue to be installed. Running for all 380 LADs on a single core will take several weeks.
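Because pueue runs one task at a time by default, the multi-week single-core estimate can be shortened by raising pueue's parallel limit to match the available cores. A hedged sketch (`pueue parallel` and `pueue status` are standard pueue commands; the core-count detection is an assumption about your platform, and the daemon `pueued` must already be running):

```shell
# Sketch: spread the LAD tasks across cores via pueue's parallel limit.
# Guarded so the snippet is safe to paste even where pueue is not installed.

# Detect core count (Linux: nproc; macOS: sysctl); fall back to 1.
cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)
echo "Detected $cores cores"

if command -v pueue >/dev/null 2>&1; then
  pueue parallel "$cores"   # run up to $cores LAD tasks concurrently
  pueue status              # monitor queued/running/finished tasks
else
  echo "pueue not found; install it before running run_all_lads.sh"
fi
```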
Run the postprocessing script to merge 2011 LADs into 2020 LADs:
python scripts/postprocessing/spenser_to_2020_lads.py \
--data_in <MICROSIMULATION_DATA_PATH> \
--data_out <MICROSIMULATION_DATA_PATH>
Next, run collation.sh to reorganise the outputs for the SPC:
./scripts/full_pipeline/collation.sh \
-s <MICROSIMULATION_DATA_PATH> \
-t <COLLATED_SPENSER_PATH> \
-d # dry-run flag
A final check that all regions are covered may be performed with:
python scripts/postprocessing/final_check.py \
--paths <COLLATED_SPENSER_PATHS>