Integration of auxiliary data (metadata) in models (#109)
* Fixup bg percent check in no-bg cases

* Fixup non-bg/bg sample percent check

* Fixup sample script utils import for consistency

* Update sample class count check to remove extra data loop

* Update sample dataset resize func w/ generic version

* Add missing debug/scale to main test config

* Add missing loss/optim/ignoreidx vals to main test cfg

* Move sample metadata fields to parallel hdf5 datasets

The previous implementation overwrote the metadata attributes each time a new
raster was parsed; this version lets multiple versions exist in parallel. The
metadata itself is tied to each sample using an index that corresponds to the
position of the metadata string in the separate dataset. This implementation
also stores the entire raster YAML metadata dict as a single string that may
be eval'd and re-instantiated as needed at runtime.
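A minimal sketch of that parallel-dataset layout using h5py. The dataset
names ('metadata', 'meta_idx') and helper functions are illustrative, not the
actual GDL schema, and `ast.literal_eval` stands in for the eval step:

```python
import ast
import h5py
import numpy as np

str_dt = h5py.special_dtype(vlen=str)  # variable-length strings

with h5py.File("samples_trn.hdf5", "w") as f:
    meta_ds = f.create_dataset("metadata", shape=(0,), maxshape=(None,), dtype=str_dt)
    idx_ds = f.create_dataset("meta_idx", shape=(0,), maxshape=(None,), dtype=np.int64)

    def append_raster_metadata(meta_dict):
        # Store the whole YAML metadata dict as one eval-able string; each new
        # raster appends a new entry instead of overwriting shared attributes.
        meta_ds.resize((len(meta_ds) + 1,))
        meta_ds[-1] = repr(meta_dict)
        return len(meta_ds) - 1

    def tag_sample(raster_meta_index):
        # Tie each written sample to its raster's metadata via the index.
        idx_ds.resize((len(idx_ds) + 1,))
        idx_ds[-1] = raster_meta_index

    tag_sample(append_raster_metadata({"properties": {"eo:gsd": 0.4}}))
    # At runtime, re-instantiate the dict tied to sample 0:
    sample_meta = ast.literal_eval(meta_ds[int(idx_ds[0])])
```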

* Remove top-level trn/val/tst split config

* Remove useless class weight vector from test config

* Update segmentation dataset interface to load metadata

* Add metadata unpacking in segm dataset interface

* Fix parameter check to support zero-based values

The previous implementation used a truthiness check, so valid zero values
could never override the non-null defaults of some hyperparameters. For
example, when 'ignore_index' was set to '0' (which is perfectly valid), the
value would be skipped and the default of '-100' would remain.
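The gist of the fix, as a hedged sketch (the function name is assumed):
compare against None explicitly so that falsy-but-valid values survive.

```python
def set_hyperparameter(params, key, default):
    # Hypothetical helper illustrating the fix. The old check was effectively
    # `if params.get(key):`, which skipped 0 and left the default in place.
    value = params.get(key)
    return default if value is None else value

assert set_hyperparameter({"ignore_index": 0}, "ignore_index", -100) == 0      # fixed
assert set_hyperparameter({"ignore_index": None}, "ignore_index", -100) == -100
```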

* Update hdf5 label map dtypes to int16

* Add coordconv layers & utils in new module
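For context, the CoordConv idea (Liu et al., 2018) appends normalized x/y
coordinate channels to the input before a regular convolution. A rough
PyTorch sketch; names are illustrative rather than the new module's actual API:

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Conv2d preceded by concatenation of normalized coordinate channels."""
    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + 2, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        n, _, h, w = x.shape
        # Coordinates scaled to [-1, 1]; a 'coordconv_scale' factor would rescale these.
        yy = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(n, 1, h, w)
        xx = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(n, 1, h, w)
        return self.conv(torch.cat([x, yy, xx], dim=1))

out = CoordConv2d(3, 16, kernel_size=3, padding=1)(torch.randn(2, 3, 64, 64))
```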

* Add metadata-enabled segm dataset parsing interface

* Add util function for fetching vals in a dictionary
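Judging from the meta_map keys used further down (e.g. "properties/eo:gsd"),
the utility presumably walks a nested dict along a '/'-delimited path; a
hedged reconstruction:

```python
def get_key_recursive(key, config):
    # Hypothetical sketch: "a/b/c" -> config["a"]["b"]["c"]
    parts = key.split("/") if isinstance(key, str) else list(key)
    value = config[parts[0]]
    return get_key_recursive(parts[1:], value) if len(parts) > 1 else value

assert get_key_recursive("properties/eo:gsd", {"properties": {"eo:gsd": 0.4}}) == 0.4
```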

* Update model_choice to allow enabling coordconv via config

* Cleanup dataset creation util func w/ subset loop

* Refactor image reader & vector rasterizer utilities

These utility functions are now more generic than before. The rasterization
utility (vector_to_raster) now lives in the 'utils' package and supports burning
vectors into separate layers as well as into a single layer (the original
behavior). The new multi-layer behavior is used by the updated
'image_reader_as_array' utility to (optionally) append new layers to the raw imagery.

The refactoring also allowed a cleanup of the 'assert_band_number' utility function
and a simplification of the code in both the inference script ('inference.py') and
the dataset preparation script ('images_to_samples.py').
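A hedged sketch of the single-layer vs. multi-layer burning behavior, built
on rasterio.features.rasterize; the real vector_to_raster signature may differ:

```python
import numpy as np
from rasterio import features

def vector_to_raster(geoms_by_value, out_shape, transform, separate=False):
    if not separate:
        # Original behavior: burn every geometry into one layer, pixel = class value.
        shapes = [(g, val) for val, geoms in geoms_by_value.items() for g in geoms]
        return features.rasterize(shapes, out_shape=out_shape, transform=transform,
                                  fill=0, dtype="int16")
    # New behavior: one binary layer per value, e.g. to append to raw imagery.
    layers = [features.rasterize([(g, 1) for g in geoms], out_shape=out_shape,
                                 transform=transform, fill=0, dtype="int16")
              for _, geoms in sorted(geoms_by_value.items())]
    return np.stack(layers, axis=-1)
```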

* Update meta-segm dataset parser to make map optional

* Cleanup SegmDataset so that only the zero-dontcare case is handled differently

* Refactor 'create_dataloader' function in training script

The function now inspects the parameter dictionary to check whether a 'meta_map'
is provided. If so, the segmentation dataset parser is replaced by its upgraded
version, which can append extra (metadata) layers onto loaded tensors based on
that predefined mapping.

The refactoring also folds the 'get_num_samples' call directly into the
'create_dataloader' function.
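Schematically, the dispatch looks like this (the dataset classes below are
minimal stand-ins for the real hdf5-backed parsers, not their actual code):

```python
from torch.utils.data import DataLoader, Dataset

class SegmentationDataset(Dataset):
    # Minimal stand-in for the real parser.
    def __init__(self, tensors): self.tensors = tensors
    def __len__(self): return len(self.tensors)
    def __getitem__(self, i): return self.tensors[i]

class MetaSegmentationDataset(SegmentationDataset):
    # Upgraded parser that would append meta layers onto loaded tensors.
    def __init__(self, tensors, meta_map):
        super().__init__(tensors)
        self.meta_map = meta_map

def create_dataloader(tensors, params, batch_size=8):
    meta_map = params["global"].get("meta_map")
    dataset = (MetaSegmentationDataset(tensors, meta_map) if meta_map
               else SegmentationDataset(tensors))
    num_samples = len(dataset)  # the get_num_samples step now lives in here
    return DataLoader(dataset, batch_size=batch_size), num_samples

loader, n = create_dataloader(list(range(32)), {"global": {"meta_map": None}})
```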

* Update create_dataloader util to force-fix dontcare val

* Update read_csv to parse metadata config file with raster path

A metadata (YAML) file can now be associated with each raster file that will be
split into samples. The previous version only allowed a single global metadata
file to be parsed.
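The CSV diffs below show the new optional second column; a hedged sketch of
the per-row parsing (the dict keys are assumed, not the script's exact names):

```python
import csv

def read_csv(csv_file_name):
    # Columns: raster, optional per-raster metadata YAML, vector file,
    # attribute to burn, subset (trn/val/tst).
    list_values = []
    with open(csv_file_name, "r") as f:
        for row in csv.reader(f):
            list_values.append({"tif": row[0],
                                "meta": row[1] if row[1] else None,
                                "gpkg": row[2],
                                "attribute_name": row[3],
                                "dataset": row[4]})
    return list_values
```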

* Cleanup package imports & add missing import to utils

* Refactor meta-segm-dataset parser to expose meta layer append util
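A guess at what the exposed append utility does, assuming a '/'-delimited key
lookup and [0,1] scaling bounds: each mapped metadata value becomes a constant
extra channel on the sample.

```python
from functools import reduce
import numpy as np

def append_meta_layers(sample_hwc, meta, meta_map):
    # meta_map example: {"properties/eo:gsd": "scaled_channel"}
    layers = [sample_hwc]
    for key, mode in meta_map.items():
        value = float(reduce(lambda d, k: d[k], key.split("/"), meta))
        if mode == "scaled_channel":
            value = min(max(value, 0.0), 1.0)  # assumed clamp into [0, 1]
        layers.append(np.full(sample_hwc.shape[:2] + (1,), value,
                              dtype=sample_hwc.dtype))
    return np.concatenate(layers, axis=-1)

x = append_meta_layers(np.zeros((256, 256, 4), dtype=np.float32),
                       {"properties": {"eo:gsd": 0.4}},
                       {"properties/eo:gsd": "scaled_channel"})
assert x.shape == (256, 256, 5)
```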

* Move meta_map param from training cfg to global cfg

* Add meta-layer support to inference.py

* Move meta-layer concat to proper location in inference script

* Update meta-enabled config for unet tests

* Move meta-segm cfg to proper dir & add coordconv cfg

* Update csv column count check to allow extras

* Update i2s and inf band count checks to account for meta layers

* Fixup missing meta field in csv parsing output dicts

* Fixup band count in coordconv ex config

* Fixup image reader util to avoid double copies

* Cleanup vector rasterization utils & recursive key getter

* Update aux distmap computing to make target ids optional & add log maps
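A plausible sketch of the distance-map step (details assumed, with scipy as a
stand-in): Euclidean distance to the nearest burned aux-vector pixel, with the
optional log variant compressing the dynamic range.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def aux_distance_map(burned_mask, log_map=True):
    # Distance is 0 on the burned features (e.g. roads) and grows away from
    # them; log1p keeps far-away pixels from dominating the channel.
    dist = distance_transform_edt(burned_mask == 0)
    return np.log1p(dist) if log_map else dist

dm = aux_distance_map(np.eye(8, dtype=np.uint8))
```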

* Add canvec aux test config and cleanup aux params elsewhere

* Add download links for external (non-private) files

* Re-add previously deleted files from default gdl data dir

* Update i2s/train/inf scripts to support single class segm

* Fixup gpu stats display when gpu is disabled

* Add missing empty metadata fields in test CSVs

* Fixup improper device upload in classif inference

* Update travis to use recent pkgs in conda-forge
fmigneault authored and mpelchat04 committed Nov 6, 2019
1 parent 278fd9a commit 2d04470
Showing 17 changed files with 837 additions and 222 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
.idea/
.vscode/
__pycache__
8 changes: 5 additions & 3 deletions .travis.yml
@@ -6,11 +6,13 @@ install:
   - bash miniconda.sh -b -p $HOME/miniconda
   - export PATH="$HOME/miniconda/bin:$PATH"
   - hash -r
-  - conda config --set always_yes yes --set changeps1 no
+  - conda config --set always_yes yes
+  - conda config --set changeps1 no
+  - conda config --prepend channels conda-forge
+  - conda config --prepend channels pytorch
   - conda update -q conda
   - conda info -a
 
-  - conda create -q -n ci_env python=3.6 pytorch-cpu torchvision-cpu torchvision ruamel_yaml h5py scikit-image scikit-learn fiona rasterio tqdm -c pytorch
+  - conda create -q -n ci_env python=3.6 pytorch-cpu torchvision-cpu torchvision ruamel_yaml h5py>=2.10 scikit-image scikit-learn fiona rasterio tqdm
   - source activate ci_env
 before_script:
   - unzip ./data/massachusetts_buildings.zip -d ./data
69 changes: 69 additions & 0 deletions conf/config.canvecaux.yaml
@@ -0,0 +1,69 @@
# Deep learning configuration file ------------------------------------------------
# Five sections :
#   1) Global parameters; those are re-used amongst the next three operations (sampling, training and inference)
#   2) Sampling parameters
#   3) Training parameters
#   4) Inference parameters
#   5) Model parameters

# Global parameters

global:
  samples_size: 256
  num_classes: 5
  data_path: ./data/kingston_wv2_40cm/images
  number_of_bands: 4
  model_name: unet # One of unet, unetsmall, checkpointed_unet or ternausnet
  bucket_name: # name of the S3 bucket where data is stored. Leave blank if using local files
  task: segmentation # Task to perform. Either segmentation or classification
  num_gpus: 1
  aux_vector_file: ./data/canvec_191031_127357_roads.gpkg # https://drive.google.com/file/d/1PCxn2197NiOVKOxGgQIA__w69jAJmjXp
  aux_vector_dist_maps: true
  meta_map:
  scale_data: [0,1]
  debug_mode: True
  coordconv_convert:
  coordconv_scale:

# Sample parameters; used in images_to_samples.py -------------------

sample:
  prep_csv_file: ./data/trn_val_tst_kingston.csv # https://drive.google.com/file/d/1uNizOAToa-R_sik0DvBqDUVwjqYdOALJ
  samples_dist: 200
  min_annotated_percent: 10 # Min % of non background pixels in stored samples. Default: 0
  mask_reference: False

# Training parameters; used in train_model.py ----------------------

training:
  state_dict_path:
  output_path: ./data/output
  num_trn_samples:
  num_val_samples:
  num_tst_samples:
  batch_size: 8
  num_epochs: 100
  loss_fn: Lovasz # One of CrossEntropy, Lovasz, Focal, OhemCrossEntropy (*Lovasz for segmentation tasks only)
  optimizer: adam # One of adam, sgd or adabound
  learning_rate: 0.0001
  weight_decay: 0
  step_size: 4
  gamma: 0.9
  class_weights:
  batch_metrics: # (int) Metrics computed every (int) batches. If left blank, will not perform metrics. If (int)=1, metrics computed on all batches.
  ignore_index: 0 # Specifies a target value that is ignored and does not contribute to the input gradient. Default: None
  augmentation:
    rotate_limit: 45
    rotate_prob: 0.5
    hflip_prob: 0.5
  dropout:
  dropout_prob:

# Inference parameters; used in inference.py --------

inference:
  img_dir_or_csv_file: ./data/trn_val_tst_kingston.csv # https://drive.google.com/file/d/1uNizOAToa-R_sik0DvBqDUVwjqYdOALJ
  working_folder: ./data/output
  state_dict_path: ./data/output/checkpoint.pth.tar
  chunk_size: 256 # (int) Size (height and width) of each prediction patch. Default: 512
  overlap: 10 # (int) Percentage of overlap between 2 chunks. Default: 10
69 changes: 69 additions & 0 deletions conf/config.coordconv.yaml
@@ -0,0 +1,69 @@
# Deep learning configuration file ------------------------------------------------
# Five sections :
#   1) Global parameters; those are re-used amongst the next three operations (sampling, training and inference)
#   2) Sampling parameters
#   3) Training parameters
#   4) Inference parameters
#   5) Model parameters

# Global parameters

global:
  samples_size: 256
  num_classes: 5
  data_path: ./data/kingston_wv2_40cm/images
  number_of_bands: 3
  model_name: unet # One of unet, unetsmall, checkpointed_unet or ternausnet
  bucket_name: # name of the S3 bucket where data is stored. Leave blank if using local files
  task: segmentation # Task to perform. Either segmentation or classification
  num_gpus: 1
  aux_vector_file:
  aux_vector_dist_maps:
  meta_map:
  scale_data: [0,1]
  debug_mode: True
  coordconv_convert: true
  coordconv_scale: 0.4

# Sample parameters; used in images_to_samples.py -------------------

sample:
  prep_csv_file: ./data/trn_val_tst_kingston.csv # https://drive.google.com/file/d/1uNizOAToa-R_sik0DvBqDUVwjqYdOALJ
  samples_dist: 200
  min_annotated_percent: 10 # Min % of non background pixels in stored samples. Default: 0
  mask_reference: False

# Training parameters; used in train_model.py ----------------------

training:
  state_dict_path:
  output_path: ./data/output
  num_trn_samples:
  num_val_samples:
  num_tst_samples:
  batch_size: 8
  num_epochs: 100
  loss_fn: Lovasz # One of CrossEntropy, Lovasz, Focal, OhemCrossEntropy (*Lovasz for segmentation tasks only)
  optimizer: adam # One of adam, sgd or adabound
  learning_rate: 0.0001
  weight_decay: 0
  step_size: 4
  gamma: 0.9
  class_weights:
  batch_metrics: # (int) Metrics computed every (int) batches. If left blank, will not perform metrics. If (int)=1, metrics computed on all batches.
  ignore_index: 0 # Specifies a target value that is ignored and does not contribute to the input gradient. Default: None
  augmentation:
    rotate_limit: 45
    rotate_prob: 0.5
    hflip_prob: 0.5
  dropout:
  dropout_prob:

# Inference parameters; used in inference.py --------

inference:
  img_dir_or_csv_file: ./data/trn_val_tst_kingston.csv # https://drive.google.com/file/d/1uNizOAToa-R_sik0DvBqDUVwjqYdOALJ
  working_folder: ./data/output
  state_dict_path: ./data/output/checkpoint.pth.tar
  chunk_size: 256 # (int) Size (height and width) of each prediction patch. Default: 512
  overlap: 10 # (int) Percentage of overlap between 2 chunks. Default: 10
70 changes: 70 additions & 0 deletions conf/config.metasegm.yaml
@@ -0,0 +1,70 @@
# Deep learning configuration file ------------------------------------------------
# Five sections :
#   1) Global parameters; those are re-used amongst the next three operations (sampling, training and inference)
#   2) Sampling parameters
#   3) Training parameters
#   4) Inference parameters
#   5) Model parameters

# Global parameters

global:
  samples_size: 256
  num_classes: 5
  data_path: ./data/kingston_wv2_40cm/images
  number_of_bands: 5
  model_name: unet # One of unet, unetsmall, checkpointed_unet or ternausnet
  bucket_name: # name of the S3 bucket where data is stored. Leave blank if using local files
  task: segmentation # Task to perform. Either segmentation or classification
  num_gpus: 1
  aux_vector_file:
  aux_vector_dist_maps:
  meta_map:
    "properties/eo:gsd": "scaled_channel"
  scale_data: [0,1]
  debug_mode: True
  coordconv_convert:
  coordconv_scale:

# Sample parameters; used in images_to_samples.py -------------------

sample:
  prep_csv_file: ./data/trn_val_tst_kingston.csv # https://drive.google.com/file/d/1uNizOAToa-R_sik0DvBqDUVwjqYdOALJ
  samples_dist: 200
  min_annotated_percent: 10 # Min % of non background pixels in stored samples. Default: 0
  mask_reference: False

# Training parameters; used in train_model.py ----------------------

training:
  state_dict_path:
  output_path: ./data/output
  num_trn_samples:
  num_val_samples:
  num_tst_samples:
  batch_size: 8
  num_epochs: 100
  loss_fn: Lovasz # One of CrossEntropy, Lovasz, Focal, OhemCrossEntropy (*Lovasz for segmentation tasks only)
  optimizer: adam # One of adam, sgd or adabound
  learning_rate: 0.0001
  weight_decay: 0
  step_size: 4
  gamma: 0.9
  class_weights:
  batch_metrics: # (int) Metrics computed every (int) batches. If left blank, will not perform metrics. If (int)=1, metrics computed on all batches.
  ignore_index: 0 # Specifies a target value that is ignored and does not contribute to the input gradient. Default: None
  augmentation:
    rotate_limit: 45
    rotate_prob: 0.5
    hflip_prob: 0.5
  dropout:
  dropout_prob:

# Inference parameters; used in inference.py --------

inference:
  img_dir_or_csv_file: ./data/trn_val_tst_kingston.csv # https://drive.google.com/file/d/1uNizOAToa-R_sik0DvBqDUVwjqYdOALJ
  working_folder: ./data/output
  state_dict_path: ./data/output/checkpoint.pth.tar
  chunk_size: 256 # (int) Size (height and width) of each prediction patch. Default: 512
  overlap: 10 # (int) Percentage of overlap between 2 chunks. Default: 10
4 changes: 2 additions & 2 deletions conf/config_ci_segmentation_local.yaml
@@ -10,7 +10,7 @@
 
 global:
   samples_size: 256
-  num_classes: 2
+  num_classes: 1 # will automatically create a 'background' class
   data_path: ./data
   number_of_bands: 3
   model_name: checkpointed_unet # One of unet, unetsmall, checkpointed_unet, ternausnet, fcn_resnet101, deeplabv3_resnet101
@@ -47,7 +47,7 @@ training:
   dropout_prob: False # (float) Set dropout probability, e.g. 0.5
   class_weights: [1.0, 2.0]
   batch_metrics: 1
-  ignore_index: 0 # Specifies a target value that is ignored and does not contribute to the input gradient
+  ignore_index: # Specifies a target value that is ignored and does not contribute to the input gradient
   augmentation:
     rotate_limit: 45
     rotate_prob: 0.5
8 changes: 4 additions & 4 deletions data/images_to_samples_ci_csv.csv
@@ -1,4 +1,4 @@
-./data/22978945_15.tif,./data/massachusetts_buildings.gpkg,class,trn
-./data/23429155_15.tif,./data/massachusetts_buildings.gpkg,class,val
-./data/23429155_15.tif,./data/massachusetts_buildings.gpkg,class,val
-./data/23429155_15.tif,./data/massachusetts_buildings.gpkg,class,tst
+./data/22978945_15.tif,,./data/massachusetts_buildings.gpkg,properties/class,trn
+./data/23429155_15.tif,,./data/massachusetts_buildings.gpkg,properties/class,val
+./data/23429155_15.tif,,./data/massachusetts_buildings.gpkg,properties/class,val
+./data/23429155_15.tif,,./data/massachusetts_buildings.gpkg,properties/class,tst
6 changes: 3 additions & 3 deletions data/inference_classif_ci_csv.csv
@@ -1,3 +1,3 @@
-./data/classification/135.tif
-./data/classification/408.tif
-./data/classification/2533.tif
+./data/classification/135.tif,
+./data/classification/408.tif,
+./data/classification/2533.tif,
4 changes: 2 additions & 2 deletions data/inference_sem_seg_ci_csv.csv
@@ -1,2 +1,2 @@
-./data/22978945_15.tif
-./data/23429155_15.tif
+./data/22978945_15.tif,
+./data/23429155_15.tif,