Delete models parameter section, automatically manage output dir, etc. (#107)

* - conf/config.yaml: remove the models parameter section
- inference.py: optionally accept a directory of .tif files instead of a .csv
- inference.py: add a debug mode
- utils/model_choice.py: return the state_dict itself instead of a path to the state_dict
- utils/utils.py: add tolerance when assigning a GPU to a task, so that a device already showing some memory use and utilization (e.g. 10%) can still be selected (see the sketch after this message)
- images_to_samples.py, inference.py and train_model.py: create the output directory automatically with pathlib

* - fix to previous commit

* - fix to previous commit (2)

* - fix to previous commit (3)
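To make the GPU-tolerance change concrete, here is a minimal sketch of that kind of check, assuming the tolerance is a fraction of total memory. The function name and the use of `torch.cuda.mem_get_info` are illustrative assumptions, not the actual `utils/utils.py` code, which per the message above also considers current utilization.

```python
# Illustrative sketch only, not the actual utils/utils.py code: select GPUs
# whose current memory use falls under a tolerance (e.g. 10%). The real
# helper reportedly also checks GPU utilization, which is omitted here.
import torch

def pick_gpus(num_gpus: int, max_used_ram: float = 0.1) -> list:
    chosen = []
    for device_id in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(device_id)  # bytes
        if 1 - free / total <= max_used_ram:              # mostly idle device
            chosen.append(device_id)
        if len(chosen) == num_gpus:
            break
    return chosen
```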
remtav authored and ymoisan committed Nov 1, 2019
1 parent b2d765f commit 278fd9a
Showing 10 changed files with 294 additions and 311 deletions.
56 changes: 30 additions & 26 deletions README.md
@@ -68,7 +68,7 @@ After installing the required computing environment (see next section), one need
## config.yaml
The `config.yaml` file is located in the `conf` directory. It stores the values of all parameters needed by the deep learning algorithms for all phases. It contains the following 5 sections:
The `config.yaml` file is located in the `conf` directory. It stores the values of all parameters needed by the deep learning algorithms for all phases. It contains the following 4 sections:
```yaml
# Deep learning configuration file ------------------------------------------------
@@ -77,7 +77,6 @@ The `config.yaml` file is located in the `conf` directory. It stores the values
# 2) Sampling parameters
# 3) Training parameters
# 4) Inference parameters
# 5) Model parameters
```
Specific parameters in each section are shown below, where relevant. For more information about config.yaml, view the file directly: [conf/config.yaml](https://github.com/NRCan/geo-deep-learning/blob/master/conf/config.yaml)
@@ -91,18 +90,6 @@ Specific parameters in each section are shown below, where relevant. For more in
- [FCN (backbone: resnet101)](https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf)
- [Deeplabv3 (backbone: resnet101)](https://arxiv.org/abs/1706.05587)
The `config.yaml` contains parameters for each model. Here's an example:

```yaml
# Models parameters; used in train_model.py and inference.py
models:
  unet: &unet001
    dropout: False # Set dropout regularization
    probability: 0.2 # Set with dropout
    pretrained: /path/to/model/checkpoint.pth.tar # Optional
```

## `csv` preparation
The `csv` specifies the input images and the reference vector data that will be used during the training.
Each row in the `csv` file must contain 4 comma-separated items:
@@ -138,8 +125,8 @@ global:
  number_of_bands: 3 # Number of bands in input images
  model_name: unetsmall # One of unet, unetsmall, checkpointed_unet, ternausnet, or inception
  bucket_name: # name of the S3 bucket where data is stored. Leave blank if using local files
  scale_data: [0, 1] # Min and Max for input data rescaling. Default: [0, 1]. Enter False if no rescaling is desired.
  debug_mode: True # Prints detailed progress bar

sample:
  prep_csv_file: /path/to/csv/file_name.csv # Path to CSV file used in preparation.
@@ -175,28 +162,29 @@ Details on parameters used by this module:
global:
  samples_size: 256 # Size (in pixel) of the samples
  num_classes: 2 # Number of classes
  data_path: /path/to/data/folder # Path to folder containing samples
  data_path: /path/to/data/folder # Path to folder containing samples, model and log files
  number_of_bands: 3 # Number of bands in input images
  model_name: unetsmall # One of unet, unetsmall, checkpointed_unet, ternausnet, or inception
  bucket_name: # name of the S3 bucket where data is stored. Leave blank if using local files
  task: segmentation # Task to perform. Either segmentation or classification
  num_gpus: 0 # Number of GPU device(s) to use. Default: 0
  debug_mode: True # Prints detailed progress bar with sample loss, GPU stats (RAM, % of use) and information about current samples.

training:
  output_path: /path/to/output/weights/folder # Path to folder where files containing weights will be written
  state_dict_path: False # Pretrained model path as .pth.tar or .pth file. Optional.
  num_trn_samples: 4960 # Number of samples to use for training. (default: all samples in the hdf5 file are taken)
  num_val_samples: 2208 # Number of samples to use for validation. (default: all samples in the hdf5 file are taken)
  num_tst_samples: # Number of samples to use for test. (default: all samples in the hdf5 file are taken)
  batch_size: 32 # Size of each batch
  num_epochs: 150 # Number of epochs
  loss_fn: Lovasz # One of CrossEntropy, Lovasz, Focal, OhemCrossEntropy (*Lovasz for segmentation tasks only)
  optimizer: adabound # One of adam, sgd or adabound
  learning_rate: 0.0001 # Initial learning rate
  weight_decay: 0 # Value for weight decay (each epoch)
  step_size: 4 # Apply gamma every step_size
  gamma: 0.9 # Multiple for learning rate decay
  dropout: False # (bool) Use dropout or not. Applies to certain models only.
  dropout_prob: False # (float) Set dropout probability, e.g. 0.5
  class_weights: [1.0, 2.0] # Weights to apply to each class. A value > 1.0 will apply more weights to the learning of the class.
  batch_metrics: 2 # (int) Metrics computed every (int) batches. If left blank, will not perform metrics. If (int)=1, metrics computed on all batches.
  ignore_index: 0 # Specifies a target value that is ignored and does not contribute to the input gradient. Default: None
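As an aside, here is how class_weights and ignore_index map onto a standard PyTorch loss. This is a minimal sketch using `torch.nn.CrossEntropyLoss` for illustration only; the repo's Lovasz, Focal and OhemCrossEntropy options are separate implementations.

```python
# Minimal sketch: how class_weights and ignore_index translate to a plain
# PyTorch CrossEntropyLoss. Illustration only; the repo's Lovasz, Focal and
# OhemCrossEntropy losses are separate implementations.
import torch
import torch.nn as nn

class_weights = torch.tensor([1.0, 2.0])   # a value > 1.0 emphasizes that class
criterion = nn.CrossEntropyLoss(weight=class_weights, ignore_index=0)

logits = torch.randn(32, 2, 256, 256)           # (batch, classes, height, width)
target = torch.randint(0, 2, (32, 256, 256))    # per-pixel labels; class 0 is ignored
loss = criterion(logits, target)
```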
@@ -214,6 +202,8 @@ Inputs:
Output:
- Trained model weights
- checkpoint.pth.tar, corresponding to the training state where the validation loss was lowest during training.
- Model weights and log files are saved to: data_path / 'model' / name of the .yaml file.
- If multiple runs use the same data_path, a suffix containing the date and time is appended to the directory name (i.e. to the name of the .yaml file).
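A rough sketch of how such a run directory can be derived with pathlib; the directory layout follows the description above, but the timestamp format and function name are assumptions, not the actual train_model.py code.

```python
# Sketch (assumed timestamp format and function name): build the run
# directory as data_path / 'model' / <yaml file name>, appending a date-time
# suffix when the directory already exists from a previous run.
from datetime import datetime
from pathlib import Path

def make_run_dir(data_path: str, config_file: str) -> Path:
    run_dir = Path(data_path) / 'model' / Path(config_file).stem
    if run_dir.exists():  # same data_path reused: keep both runs
        suffix = datetime.now().strftime('_%Y-%m-%d_%H-%M-%S')
        run_dir = run_dir.with_name(run_dir.name + suffix)
    run_dir.mkdir(parents=True)
    return run_dir

print(make_run_dir('./data', 'conf/config.yaml'))  # e.g. data/model/config
```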
Process:
- The application loads the model
@@ -233,6 +223,11 @@ Optimizers:
- SGD (standard optimizer in [torch.optim](https://pytorch.org/docs/stable/optim.html))
- [Adabound/AdaboundW](https://openreview.net/forum?id=Bkg3g2R9FX)
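To make the optimizer choice concrete, here is a hedged sketch of mapping the optimizer parameter to an instance. The helper name and the hyperparameter defaults are illustrative assumptions; AdaBound comes from the third-party adabound package.

```python
# Sketch: map the `optimizer` config value to an optimizer instance.
# Helper name and hyperparameter defaults are illustrative, not the repo's.
import torch.optim as optim

def create_optimizer(name, model_params, lr, weight_decay=0):
    if name == 'adam':
        return optim.Adam(model_params, lr=lr, weight_decay=weight_decay)
    if name == 'sgd':
        return optim.SGD(model_params, lr=lr, weight_decay=weight_decay)
    if name == 'adabound':
        import adabound  # third-party package: pip install adabound
        return adabound.AdaBound(model_params, lr=lr, final_lr=0.1)
    raise ValueError(f'Unknown optimizer: {name}')
```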
Advanced features:
- To check how a pretrained model performs on the test split without fine-tuning (a sketch follows this list):
  1. Specify state_dict_path in the training parameters.
  2. In the same parameter section, set num_epochs to 0.
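The mechanism is simply that an empty training loop leaves the pretrained weights untouched; a toy sketch of the assumed control flow (not train_model.py verbatim):

```python
# Toy sketch of the assumed control flow, not train_model.py verbatim:
# with num_epochs == 0 the loop body never runs, so the test evaluation
# sees the weights loaded from state_dict_path unchanged.
def run(num_epochs: int, pretrained: bool) -> None:
    if pretrained:
        print('loading weights from state_dict_path')
    for epoch in range(num_epochs):       # range(0) is empty: no fine-tuning
        print(f'fine-tuning, epoch {epoch}')
    print('computing test metrics on current weights')

run(num_epochs=0, pretrained=True)        # evaluation only
```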
## inference.py
The final step in the process is to assign every pixel in the original image a value corresponding to the most probable class.
@@ -249,12 +244,12 @@ global:
  model_name: unetsmall # One of unet, unetsmall, checkpointed_unet, ternausnet, or inception
  bucket_name: # name of the S3 bucket where data is stored. Leave blank if using local files
  task: segmentation # Task to perform. Either segmentation or classification
  scale_data: [0, 1] # Min and Max for input data rescaling. Default: [0, 1]. Enter False if no rescaling is desired.
  debug_mode: True # Prints detailed progress bar

inference:
  img_csv_file: /path/to/csv/containing/images/list.csv # CSV file containing the list of all images to infer on
  img_dir_or_csv_file: /path/to/csv/containing/images/list.csv # Directory containing all images to infer on OR CSV file with list of images
  working_folder: /path/to/folder/with/resulting/images # Folder where all resulting images will be written
  state_dict_path: /path/to/model/weights/for/inference/checkpoint.pth.tar # File containing pre-trained weights
  chunk_size: 512 # (int) Size (height and width) of each prediction patch. Default: 512
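As a sketch of what accepting either a directory or a CSV could look like (the actual parsing in inference.py, and the CSV row layout assumed here, may differ):

```python
# Sketch of the dir-or-csv behaviour described above. The CSV row layout
# (image path in the first column) is an assumption; inference.py may differ.
import csv
from pathlib import Path

def list_inference_images(img_dir_or_csv_file: str) -> list:
    path = Path(img_dir_or_csv_file)
    if path.is_dir():                      # directory: take every .tif inside
        return sorted(str(p) for p in path.glob('*.tif'))
    with open(path, newline='') as f:      # otherwise: read image paths from CSV
        return [row[0] for row in csv.reader(f) if row]
```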
@@ -322,15 +317,22 @@ global:
  debug_mode: True # Prints detailed progress bar with sample loss, GPU stats (RAM, % of use) and information about current samples.

training:
  output_path: /path/to/output/weights/folder # Path to folder where files containing weights will be written
  state_dict_path: False # Pretrained model path as .pth.tar or .pth file. Optional.
  batch_size: 32 # Size of each batch
  num_epochs: 150 # Number of epochs
  learning_rate: 0.0001 # Initial learning rate
  weight_decay: 0 # Value for weight decay (each epoch)
  step_size: 4 # Apply gamma every step_size
  gamma: 0.9 # Multiple for learning rate decay
  dropout: False # (bool) Use dropout or not. Applies to certain models only.
  dropout_prob: False # (float) Set dropout probability, e.g. 0.5
  class_weights: [1.0, 2.0] # Weights to apply to each class. A value > 1.0 will apply more weights to the learning of the class.
  batch_metrics: 2 # (int) Metrics computed every (int) batches. If left blank, will not perform metrics. If (int)=1, metrics computed on all batches.
  ignore_index: 0 # Specifies a target value that is ignored and does not contribute to the input gradient. Default: None

augmentation:
  rotate_limit: 45
  rotate_prob: 0.5
  hflip_prob: 0.5
```
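For reference, a minimal sketch of what the augmentation parameters above imply: a random rotation within ±rotate_limit applied with probability rotate_prob, and a horizontal flip with probability hflip_prob. The repo's own transform code may differ.

```python
# Minimal sketch of the augmentation parameters above; the repo's own
# transform code may differ.
import random
import numpy as np
from scipy.ndimage import rotate

def augment(img: np.ndarray, rotate_limit=45, rotate_prob=0.5, hflip_prob=0.5):
    if random.random() < rotate_prob:      # rotate within ±rotate_limit degrees
        angle = random.uniform(-rotate_limit, rotate_limit)
        img = rotate(img, angle, reshape=False, mode='reflect')
    if random.random() < hflip_prob:       # mirror left-right
        img = np.fliplr(img)
    return img
```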
Note: ```data_path``` must always have a value for classification tasks
@@ -341,6 +343,8 @@ Output:
- Trained model weights
- checkpoint.pth.tar, corresponding to the training state where the validation loss was lowest during training.
- last_epoch.pth.tar, corresponding to the training state after the last epoch.
- Model weights and log files are saved to: data_path / 'model' / name of the .yaml file.
- If multiple runs use the same data_path, a suffix containing the date and time is appended to the directory name (i.e. to the name of the .yaml file).
Process:
- The application loads the model specified in the configuration file
@@ -378,7 +382,7 @@ global:
  debug_mode: True # Prints detailed progress bar

inference:
  img_csv_file: /path/to/csv/containing/images/list.csv # CSV file containing the list of all images to infer on
  img_dir_or_csv_file: /path/to/csv/containing/images/list.csv # Directory containing all images to infer on OR CSV file with list of images
  working_folder: /path/to/folder/with/resulting/images # Folder where all resulting images will be written
  state_dict_path: /path/to/model/weights/for/inference/checkpoint.pth.tar # File containing pre-trained weights
```
27 changes: 4 additions & 23 deletions conf/config.yaml
@@ -31,7 +31,7 @@ sample:
# Training parameters; used in train_model.py ----------------------

training:
  output_path: /path/to/model/weights/output/folder
  state_dict_path: path/to/pretrained/file/checkpoint.pth.tar # optional
  num_trn_samples: 4960
  num_val_samples: 2208
  num_tst_samples: 1000
@@ -43,6 +43,8 @@ training:
  weight_decay: 0
  step_size: 4
  gamma: 0.9
  dropout: False # (bool) Use dropout or not
  dropout_prob: # (float) Set dropout probability, e.g. 0.5
  class_weights: [1.0, 2.0]
  batch_metrics: # (int) Metrics computed every (int) batches. If left blank, will not perform metrics. If (int)=1, metrics computed on all batches.
  ignore_index: 0 # Specifies a target value that is ignored and does not contribute to the input gradient. Default: None
@@ -54,29 +56,8 @@ training:
# Inference parameters; used in inference.py --------

inference:
  img_csv_file: /path/to/csv/containing/images/list.csv
  img_dir_or_csv_file: /path/to/csv/containing/images/list.csv
  working_folder: /path/to/folder/with/resulting/images
  state_dict_path: /path/to/model/weights/for/inference/checkpoint.pth.tar
  chunk_size: 512 # (int) Size (height and width) of each prediction patch. Default: 512
  overlap: 10 # (int) Percentage of overlap between 2 chunks. Default: 10
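To make chunk_size and overlap concrete: consecutive prediction patches share overlap percent of their width/height, so the stride between patch origins is chunk_size minus that shared band. The arithmetic below is an illustrative assumption about inference.py, not its actual code.

```python
# Illustrative assumption about the tiling arithmetic, not inference.py's
# actual code: patches of chunk_size pixels whose stride leaves `overlap`
# percent shared between neighbours.
def chunk_origins(height, width, chunk_size=512, overlap=10):
    stride = chunk_size - int(chunk_size * overlap / 100)  # 512 - 51 = 461
    rows = range(0, max(height - chunk_size, 0) + 1, stride)
    cols = range(0, max(width - chunk_size, 0) + 1, stride)
    return [(r, c) for r in rows for c in cols]

print(len(chunk_origins(2048, 2048)))  # number of patch origins for a 2048x2048 image
```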

# Models parameters; used in train_model.py and inference.py

models:
  unet: &unet001
    dropout: False
    probability: 0.2 # Set with dropout
    pretrained: False # optional
  unetsmall:
    <<: *unet001
    pretrained:
  ternausnet:
    pretrained: ./models/TernausNet.pt # Mandatory
  checkpointed_unet:
    <<: *unet001
  fcn_resnet101: # pretrained on coco dataset. Use only for 3 band data.
    pretrained: # optional
  deeplabv3_resnet101: # pretrained on coco dataset. Use only for 3 band data.
    pretrained: # optional
  inception:
    pretrained: # optional
30 changes: 6 additions & 24 deletions conf/config_ci_classification_local.yaml
@@ -31,7 +31,7 @@ sample:
# Training parameters; used in train_model.py ----------------------

training:
  output_path: ./data
  state_dict_path: # optional
  num_trn_samples: 24
  num_val_samples: 24
  num_tst_samples:
@@ -43,35 +43,17 @@ training:
  weight_decay: 0
  step_size: 4
  gamma: 0.9
  dropout: False # (bool) Use dropout or not
  dropout_prob: False # (float) Set dropout probability, e.g. 0.5
  class_weights:
  batch_metrics: 1
  ignore_index: # Specifies a target value that is ignored and does not contribute to the input gradient

# Inference parameters; used in inference.py --------

inference:
  img_csv_file: ./data/inference_classif_ci_csv.csv
  img_dir_or_csv_file: ./data/inference_classif_ci_csv.csv
  working_folder: ./data/classification
  state_dict_path: ./data/checkpoint.pth.tar
  state_dict_path: ./data/model/config_ci_classification_local/checkpoint.pth.tar
  chunk_size:
  overlap:

# Models parameters; used in train_model.py and inference.py

models:
  unet: &unet001
    dropout: False
    probability: 0.2 # Set with dropout
    pretrained: False # optional
  unetsmall:
    <<: *unet001
  ternausnet:
    pretrained: ./models/TernausNet.pt # Mandatory
  checkpointed_unet:
    <<: *unet001
  fcn_resnet101: # only for 3 band data
    pretrained: # optional
  deeplabv3_resnet101: # only for 3 band data
    pretrained: # optional
  inception:
    pretrained: # optional
30 changes: 6 additions & 24 deletions conf/config_ci_segmentation_local.yaml
@@ -31,7 +31,7 @@ sample:
# Training parameters; used in train_model.py ----------------------

training:
  output_path: ./data
  state_dict_path: # optional
  num_trn_samples:
  num_val_samples:
  num_tst_samples:
@@ -43,6 +43,8 @@ training:
  weight_decay: 0
  step_size: 4
  gamma: 0.9
  dropout: False # (bool) Use dropout or not
  dropout_prob: False # (float) Set dropout probability, e.g. 0.5
  class_weights: [1.0, 2.0]
  batch_metrics: 1
  ignore_index: 0 # Specifies a target value that is ignored and does not contribute to the input gradient
@@ -54,28 +56,8 @@ training:
# Inference parameters; used in inference.py --------

inference:
  img_csv_file: ./data/inference_sem_seg_ci_csv.csv
  img_dir_or_csv_file: ./data/inference_sem_seg_ci_csv.csv
  working_folder: ./data
  state_dict_path: ./data/checkpoint.pth.tar
  state_dict_path: ./data/model/config_ci_segmentation_local/checkpoint.pth.tar
  chunk_size: 512 # (int) Size (height and width) of each prediction patch. Default: 512
  overlap: 10 # (int) Percentage of overlap between 2 chunks. Default: 10

# Models parameters; used in train_model.py and inference.py

models:
  unet: &unet001
    dropout: False
    probability: 0.2 # Set with dropout
    pretrained: False # optional
  unetsmall:
    <<: *unet001
  ternausnet:
    pretrained: ./models/TernausNet.pt # Mandatory
  checkpointed_unet:
    <<: *unet001
  fcn_resnet101: # only for 3 band data
    pretrained: # optional
  deeplabv3_resnet101: # only for 3 band data
    pretrained: /home/rtavon/Documents/kingston-test-deeplabv3-2/model/checkpoint.pth.tar # optional
  inception:
    pretrained: # optional
8 changes: 7 additions & 1 deletion images_to_samples.py
@@ -1,5 +1,7 @@
import argparse
import os
from pathlib import Path

import numpy as np
import warnings
import fiona
@@ -161,6 +163,7 @@ def main(params):
    gpkg_file = []
    bucket_name = params['global']['bucket_name']
    data_path = params['global']['data_path']
    Path.mkdir(Path(data_path), exist_ok=True)
    csv_file = params['sample']['prep_csv_file']

    if bucket_name:
@@ -199,7 +202,10 @@ def main(params):
            bucket.download_file(info['gpkg'], info['gpkg'].split('/')[-1])
            info['gpkg'] = info['gpkg'].split('/')[-1]

        assert_band_number(info['tif'], params['global']['number_of_bands'])
        if os.path.isfile(info['tif']):
            assert_band_number(info['tif'], params['global']['number_of_bands'])
        else:
            raise IOError(f'Could not locate "{info["tif"]}". Make sure file exists in this directory.')

        _tqdm.set_postfix(OrderedDict(file=f'{info["tif"]}', sample_size=params['global']['samples_size']))