Delete models parameter section, automatically manage output dir, etc. (#107)

* - conf/config.yaml: remove the models parameter section
- inference.py: optionally accept a directory of .tif files instead of a .csv
- inference.py: add a debug mode
- utils/model_choice.py: return the state_dict itself instead of a path to the state_dict
- utils/utils.py: add tolerance when assigning a GPU to a task, so that a device already showing some memory use and utilization (e.g. 10%) can still be selected (see the sketch after this message)
- images_to_samples.py, inference.py and train_model.py: create the output directory automatically with pathlib

* - fix to previous commit

* - fix to previous commit (2)

* - fix to previous commit (3)
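To make the GPU-tolerance change concrete, here is a minimal sketch of that kind of check, assuming the tolerance is a fraction of total memory. The function name and the use of `torch.cuda.mem_get_info` are illustrative assumptions, not the actual `utils/utils.py` code, which per the message above also considers current utilization.

```python
# Illustrative sketch only, not the actual utils/utils.py code: select GPUs
# whose current memory use falls under a tolerance (e.g. 10%). The real
# helper reportedly also checks GPU utilization, which is omitted here.
import torch

def pick_gpus(num_gpus: int, max_used_ram: float = 0.1) -> list:
    chosen = []
    for device_id in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(device_id)  # bytes
        if 1 - free / total <= max_used_ram:              # mostly idle device
            chosen.append(device_id)
        if len(chosen) == num_gpus:
            break
    return chosen
```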
remtav authored and ymoisan committed Nov 1, 2019
1 parent b2d765f commit 278fd9a
Showing 10 changed files with 294 additions and 311 deletions.
56 changes: 30 additions & 26 deletions README.md
@@ -68,7 +68,7 @@ After installing the required computing environment (see next section), one need
## config.yaml
The `config.yaml` file is located in the `conf` directory. It stores the values of all parameters needed by the deep learning algorithms for all phases. It contains the following 5 sections:
The `config.yaml` file is located in the `conf` directory. It stores the values of all parameters needed by the deep learning algorithms for all phases. It contains the following 4 sections:
```yaml
# Deep learning configuration file ------------------------------------------------
@@ -77,7 +77,6 @@ The `config.yaml` file is located in the `conf` directory. It stores the values
# 2) Sampling parameters
# 3) Training parameters
# 4) Inference parameters
# 5) Model parameters
```
Specific parameters in each section are shown below, where relevant. For more information about config.yaml, view the file directly: [conf/config.yaml](https://github.com/NRCan/geo-deep-learning/blob/master/conf/config.yaml)
@@ -91,18 +90,6 @@ Specific parameters in each section are shown below, where relevant. For more in
- [FCN (backbone: resnet101)](https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf)
- [Deeplabv3 (backbone: resnet101)](https://arxiv.org/abs/1706.05587)
The `config.yaml` contains parameters for each model. Here's an example:

```yaml
# Models parameters; used in train_model.py and inference.py
models:
  unet: &unet001
    dropout: False # Set dropout regularization
    probability: 0.2 # Set with dropout
    pretrained: /path/to/model/checkpoint.pth.tar # Optional
```

## `csv` preparation
The `csv` specifies the input images and the reference vector data that will be used during the training.
Each row in the `csv` file must contain 4 comma-separated items:
@@ -138,8 +125,8 @@ global:
  number_of_bands: 3 # Number of bands in input images
  model_name: unetsmall # One of unet, unetsmall, checkpointed_unet, ternausnet, or inception
  bucket_name: # name of the S3 bucket where data is stored. Leave blank if using local files
  scale_data: [0, 1] # Min and Max for input data rescaling. Default: [0, 1]. Enter False if no rescaling is desired.
  debug_mode: True # Prints detailed progress bar

sample:
  prep_csv_file: /path/to/csv/file_name.csv # Path to CSV file used in preparation.
@@ -175,28 +162,29 @@ Details on parameters used by this module:
global:
  samples_size: 256 # Size (in pixel) of the samples
  num_classes: 2 # Number of classes
  data_path: /path/to/data/folder # Path to folder containing samples
  data_path: /path/to/data/folder # Path to folder containing samples, model and log files
  number_of_bands: 3 # Number of bands in input images
  model_name: unetsmall # One of unet, unetsmall, checkpointed_unet, ternausnet, or inception
  bucket_name: # name of the S3 bucket where data is stored. Leave blank if using local files
  task: segmentation # Task to perform. Either segmentation or classification
  num_gpus: 0 # Number of GPU device(s) to use. Default: 0
  debug_mode: True # Prints detailed progress bar with sample loss, GPU stats (RAM, % of use) and information about current samples.

training:
  output_path: /path/to/output/weights/folder # Path to folder where files containing weights will be written
  state_dict_path: False # Pretrained model path as .pth.tar or .pth file. Optional.
  num_trn_samples: 4960 # Number of samples to use for training. (default: all samples in the hdf5 file are taken)
  num_val_samples: 2208 # Number of samples to use for validation. (default: all samples in the hdf5 file are taken)
  num_tst_samples: # Number of samples to use for test. (default: all samples in the hdf5 file are taken)
  batch_size: 32 # Size of each batch
  num_epochs: 150 # Number of epochs
  loss_fn: Lovasz # One of CrossEntropy, Lovasz, Focal, OhemCrossEntropy (*Lovasz for segmentation tasks only)
  optimizer: adabound # One of adam, sgd or adabound
  learning_rate: 0.0001 # Initial learning rate
  weight_decay: 0 # Value for weight decay (each epoch)
  step_size: 4 # Apply gamma every step_size
  gamma: 0.9 # Multiple for learning rate decay
  dropout: False # (bool) Use dropout or not. Applies to certain models only.
  dropout_prob: False # (float) Set dropout probability, e.g. 0.5
  class_weights: [1.0, 2.0] # Weights to apply to each class. A value > 1.0 will apply more weights to the learning of the class.
  batch_metrics: 2 # (int) Metrics computed every (int) batches. If left blank, will not perform metrics. If (int)=1, metrics computed on all batches.
  ignore_index: 0 # Specifies a target value that is ignored and does not contribute to the input gradient. Default: None
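As an aside, here is how class_weights and ignore_index map onto a standard PyTorch loss. This is a minimal sketch using `torch.nn.CrossEntropyLoss` for illustration only; the repo's Lovasz, Focal and OhemCrossEntropy options are separate implementations.

```python
# Minimal sketch: how class_weights and ignore_index translate to a plain
# PyTorch CrossEntropyLoss. Illustration only; the repo's Lovasz, Focal and
# OhemCrossEntropy losses are separate implementations.
import torch
import torch.nn as nn

class_weights = torch.tensor([1.0, 2.0])   # a value > 1.0 emphasizes that class
criterion = nn.CrossEntropyLoss(weight=class_weights, ignore_index=0)

logits = torch.randn(32, 2, 256, 256)           # (batch, classes, height, width)
target = torch.randint(0, 2, (32, 256, 256))    # per-pixel labels; class 0 is ignored
loss = criterion(logits, target)
```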
@@ -214,6 +202,8 @@ Inputs:
Output:
- Trained model weights
- checkpoint.pth.tar, corresponding to the training state where the validation loss was lowest during training.
- Model weights and log files are saved to: data_path / 'model' / name of the .yaml file.
- If multiple runs use the same data_path, a suffix containing the date and time is appended to the directory name (i.e. to the name of the .yaml file).
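A rough sketch of how such a run directory can be derived with pathlib; the directory layout follows the description above, but the timestamp format and function name are assumptions, not the actual train_model.py code.

```python
# Sketch (assumed timestamp format and function name): build the run
# directory as data_path / 'model' / <yaml file name>, appending a date-time
# suffix when the directory already exists from a previous run.
from datetime import datetime
from pathlib import Path

def make_run_dir(data_path: str, config_file: str) -> Path:
    run_dir = Path(data_path) / 'model' / Path(config_file).stem
    if run_dir.exists():  # same data_path reused: keep both runs
        suffix = datetime.now().strftime('_%Y-%m-%d_%H-%M-%S')
        run_dir = run_dir.with_name(run_dir.name + suffix)
    run_dir.mkdir(parents=True)
    return run_dir

print(make_run_dir('./data', 'conf/config.yaml'))  # e.g. data/model/config
```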
Process:
- The application loads the model
@@ -233,6 +223,11 @@ Optimizers:
- SGD (standard optimizer in [torch.optim](https://pytorch.org/docs/stable/optim.html))
- [Adabound/AdaboundW](https://openreview.net/forum?id=Bkg3g2R9FX)
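To make the optimizer choice concrete, here is a hedged sketch of mapping the optimizer parameter to an instance. The helper name and the hyperparameter defaults are illustrative assumptions; AdaBound comes from the third-party adabound package.

```python
# Sketch: map the `optimizer` config value to an optimizer instance.
# Helper name and hyperparameter defaults are illustrative, not the repo's.
import torch.optim as optim

def create_optimizer(name, model_params, lr, weight_decay=0):
    if name == 'adam':
        return optim.Adam(model_params, lr=lr, weight_decay=weight_decay)
    if name == 'sgd':
        return optim.SGD(model_params, lr=lr, weight_decay=weight_decay)
    if name == 'adabound':
        import adabound  # third-party package: pip install adabound
        return adabound.AdaBound(model_params, lr=lr, final_lr=0.1)
    raise ValueError(f'Unknown optimizer: {name}')
```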
Advanced features:
- To check how a pretrained model performs on the test split without fine-tuning (a sketch follows this list):
  1. Specify state_dict_path in the training parameters.
  2. In the same parameter section, set num_epochs to 0.
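The mechanism is simply that an empty training loop leaves the pretrained weights untouched; a toy sketch of the assumed control flow (not train_model.py verbatim):

```python
# Toy sketch of the assumed control flow, not train_model.py verbatim:
# with num_epochs == 0 the loop body never runs, so the test evaluation
# sees the weights loaded from state_dict_path unchanged.
def run(num_epochs: int, pretrained: bool) -> None:
    if pretrained:
        print('loading weights from state_dict_path')
    for epoch in range(num_epochs):       # range(0) is empty: no fine-tuning
        print(f'fine-tuning, epoch {epoch}')
    print('computing test metrics on current weights')

run(num_epochs=0, pretrained=True)        # evaluation only
```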
## inference.py
The final step in the process is to assign every pixel in the original image a value corresponding to the most probable class.
@@ -249,12 +244,12 @@ global:
  model_name: unetsmall # One of unet, unetsmall, checkpointed_unet, ternausnet, or inception
  bucket_name: # name of the S3 bucket where data is stored. Leave blank if using local files
  task: segmentation # Task to perform. Either segmentation or classification
  scale_data: [0, 1] # Min and Max for input data rescaling. Default: [0, 1]. Enter False if no rescaling is desired.
  debug_mode: True # Prints detailed progress bar

inference:
  img_csv_file: /path/to/csv/containing/images/list.csv # CSV file containing the list of all images to infer on
  img_dir_or_csv_file: /path/to/csv/containing/images/list.csv # Directory containing all images to infer on OR CSV file with list of images
  working_folder: /path/to/folder/with/resulting/images # Folder where all resulting images will be written
  state_dict_path: /path/to/model/weights/for/inference/checkpoint.pth.tar # File containing pre-trained weights
  chunk_size: 512 # (int) Size (height and width) of each prediction patch. Default: 512
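As a sketch of what accepting either a directory or a CSV could look like (the actual parsing in inference.py, and the CSV row layout assumed here, may differ):

```python
# Sketch of the dir-or-csv behaviour described above. The CSV row layout
# (image path in the first column) is an assumption; inference.py may differ.
import csv
from pathlib import Path

def list_inference_images(img_dir_or_csv_file: str) -> list:
    path = Path(img_dir_or_csv_file)
    if path.is_dir():                      # directory: take every .tif inside
        return sorted(str(p) for p in path.glob('*.tif'))
    with open(path, newline='') as f:      # otherwise: read image paths from CSV
        return [row[0] for row in csv.reader(f) if row]
```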
@@ -322,15 +317,22 @@ global:
  debug_mode: True # Prints detailed progress bar with sample loss, GPU stats (RAM, % of use) and information about current samples.

training:
  output_path: /path/to/output/weights/folder # Path to folder where files containing weights will be written
  state_dict_path: False # Pretrained model path as .pth.tar or .pth file. Optional.
  batch_size: 32 # Size of each batch
  num_epochs: 150 # Number of epochs
  learning_rate: 0.0001 # Initial learning rate
  weight_decay: 0 # Value for weight decay (each epoch)
  step_size: 4 # Apply gamma every step_size
  gamma: 0.9 # Multiple for learning rate decay
  dropout: False # (bool) Use dropout or not. Applies to certain models only.
  dropout_prob: False # (float) Set dropout probability, e.g. 0.5
  class_weights: [1.0, 2.0] # Weights to apply to each class. A value > 1.0 will apply more weights to the learning of the class.
  batch_metrics: 2 # (int) Metrics computed every (int) batches. If left blank, will not perform metrics. If (int)=1, metrics computed on all batches.
  ignore_index: 0 # Specifies a target value that is ignored and does not contribute to the input gradient. Default: None

augmentation:
  rotate_limit: 45
  rotate_prob: 0.5
  hflip_prob: 0.5
```
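For reference, a minimal sketch of what the augmentation parameters above imply: a random rotation within ±rotate_limit applied with probability rotate_prob, and a horizontal flip with probability hflip_prob. The repo's own transform code may differ.

```python
# Minimal sketch of the augmentation parameters above; the repo's own
# transform code may differ.
import random
import numpy as np
from scipy.ndimage import rotate

def augment(img: np.ndarray, rotate_limit=45, rotate_prob=0.5, hflip_prob=0.5):
    if random.random() < rotate_prob:      # rotate within ±rotate_limit degrees
        angle = random.uniform(-rotate_limit, rotate_limit)
        img = rotate(img, angle, reshape=False, mode='reflect')
    if random.random() < hflip_prob:       # mirror left-right
        img = np.fliplr(img)
    return img
```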
Note: ```data_path``` must always have a value for classification tasks
@@ -341,6 +343,8 @@ Output:
- Trained model weights
- checkpoint.pth.tar, corresponding to the training state where the validation loss was lowest during training.
- last_epoch.pth.tar, corresponding to the training state after the last epoch.
- Model weights and log files are saved to: data_path / 'model' / name of the .yaml file.
- If multiple runs use the same data_path, a suffix containing the date and time is appended to the directory name (i.e. to the name of the .yaml file).
Process:
- The application loads the model specified in the configuration file
@@ -378,7 +382,7 @@ global:
  debug_mode: True # Prints detailed progress bar

inference:
  img_csv_file: /path/to/csv/containing/images/list.csv # CSV file containing the list of all images to infer on
  img_dir_or_csv_file: /path/to/csv/containing/images/list.csv # Directory containing all images to infer on OR CSV file with list of images
  working_folder: /path/to/folder/with/resulting/images # Folder where all resulting images will be written
  state_dict_path: /path/to/model/weights/for/inference/checkpoint.pth.tar # File containing pre-trained weights
```
27 changes: 4 additions & 23 deletions conf/config.yaml
@@ -31,7 +31,7 @@ sample:
# Training parameters; used in train_model.py ----------------------

training:
  output_path: /path/to/model/weights/output/folder
  state_dict_path: path/to/pretrained/file/checkpoint.pth.tar # optional
  num_trn_samples: 4960
  num_val_samples: 2208
  num_tst_samples: 1000
@@ -43,6 +43,8 @@ training:
  weight_decay: 0
  step_size: 4
  gamma: 0.9
  dropout: False # (bool) Use dropout or not
  dropout_prob: # (float) Set dropout probability, e.g. 0.5
  class_weights: [1.0, 2.0]
  batch_metrics: # (int) Metrics computed every (int) batches. If left blank, will not perform metrics. If (int)=1, metrics computed on all batches.
  ignore_index: 0 # Specifies a target value that is ignored and does not contribute to the input gradient. Default: None
@@ -54,29 +56,8 @@ training:
# Inference parameters; used in inference.py --------

inference:
  img_csv_file: /path/to/csv/containing/images/list.csv
  img_dir_or_csv_file: /path/to/csv/containing/images/list.csv
  working_folder: /path/to/folder/with/resulting/images
  state_dict_path: /path/to/model/weights/for/inference/checkpoint.pth.tar
  chunk_size: 512 # (int) Size (height and width) of each prediction patch. Default: 512
  overlap: 10 # (int) Percentage of overlap between 2 chunks. Default: 10
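To make chunk_size and overlap concrete: consecutive prediction patches share overlap percent of their width/height, so the stride between patch origins is chunk_size minus that shared band. The arithmetic below is an illustrative assumption about inference.py, not its actual code.

```python
# Illustrative assumption about the tiling arithmetic, not inference.py's
# actual code: patches of chunk_size pixels whose stride leaves `overlap`
# percent shared between neighbours.
def chunk_origins(height, width, chunk_size=512, overlap=10):
    stride = chunk_size - int(chunk_size * overlap / 100)  # 512 - 51 = 461
    rows = range(0, max(height - chunk_size, 0) + 1, stride)
    cols = range(0, max(width - chunk_size, 0) + 1, stride)
    return [(r, c) for r in rows for c in cols]

print(len(chunk_origins(2048, 2048)))  # number of patch origins for a 2048x2048 image
```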

# Models parameters; used in train_model.py and inference.py

models:
  unet: &unet001
    dropout: False
    probability: 0.2 # Set with dropout
    pretrained: False # optional
  unetsmall:
    <<: *unet001
    pretrained:
  ternausnet:
    pretrained: ./models/TernausNet.pt # Mandatory
  checkpointed_unet:
    <<: *unet001
  fcn_resnet101: # pretrained on coco dataset. Use only for 3 band data.
    pretrained: # optional
  deeplabv3_resnet101: # pretrained on coco dataset. Use only for 3 band data.
    pretrained: # optional
  inception:
    pretrained: # optional
30 changes: 6 additions & 24 deletions conf/config_ci_classification_local.yaml
@@ -31,7 +31,7 @@ sample:
# Training parameters; used in train_model.py ----------------------

training:
  output_path: ./data
  state_dict_path: # optional
  num_trn_samples: 24
  num_val_samples: 24
  num_tst_samples:
@@ -43,35 +43,17 @@ training:
  weight_decay: 0
  step_size: 4
  gamma: 0.9
  dropout: False # (bool) Use dropout or not
  dropout_prob: False # (float) Set dropout probability, e.g. 0.5
  class_weights:
  batch_metrics: 1
  ignore_index: # Specifies a target value that is ignored and does not contribute to the input gradient

# Inference parameters; used in inference.py --------

inference:
  img_csv_file: ./data/inference_classif_ci_csv.csv
  img_dir_or_csv_file: ./data/inference_classif_ci_csv.csv
  working_folder: ./data/classification
  state_dict_path: ./data/checkpoint.pth.tar
  state_dict_path: ./data/model/config_ci_classification_local/checkpoint.pth.tar
  chunk_size:
  overlap:

# Models parameters; used in train_model.py and inference.py

models:
  unet: &unet001
    dropout: False
    probability: 0.2 # Set with dropout
    pretrained: False # optional
  unetsmall:
    <<: *unet001
  ternausnet:
    pretrained: ./models/TernausNet.pt # Mandatory
  checkpointed_unet:
    <<: *unet001
  fcn_resnet101: # only for 3 band data
    pretrained: # optional
  deeplabv3_resnet101: # only for 3 band data
    pretrained: # optional
  inception:
    pretrained: # optional
30 changes: 6 additions & 24 deletions conf/config_ci_segmentation_local.yaml
@@ -31,7 +31,7 @@ sample:
# Training parameters; used in train_model.py ----------------------

training:
  output_path: ./data
  state_dict_path: # optional
  num_trn_samples:
  num_val_samples:
  num_tst_samples:
@@ -43,6 +43,8 @@ training:
  weight_decay: 0
  step_size: 4
  gamma: 0.9
  dropout: False # (bool) Use dropout or not
  dropout_prob: False # (float) Set dropout probability, e.g. 0.5
  class_weights: [1.0, 2.0]
  batch_metrics: 1
  ignore_index: 0 # Specifies a target value that is ignored and does not contribute to the input gradient
@@ -54,28 +56,8 @@ training:
# Inference parameters; used in inference.py --------

inference:
  img_csv_file: ./data/inference_sem_seg_ci_csv.csv
  img_dir_or_csv_file: ./data/inference_sem_seg_ci_csv.csv
  working_folder: ./data
  state_dict_path: ./data/checkpoint.pth.tar
  state_dict_path: ./data/model/config_ci_segmentation_local/checkpoint.pth.tar
  chunk_size: 512 # (int) Size (height and width) of each prediction patch. Default: 512
  overlap: 10 # (int) Percentage of overlap between 2 chunks. Default: 10

# Models parameters; used in train_model.py and inference.py

models:
  unet: &unet001
    dropout: False
    probability: 0.2 # Set with dropout
    pretrained: False # optional
  unetsmall:
    <<: *unet001
  ternausnet:
    pretrained: ./models/TernausNet.pt # Mandatory
  checkpointed_unet:
    <<: *unet001
  fcn_resnet101: # only for 3 band data
    pretrained: # optional
  deeplabv3_resnet101: # only for 3 band data
    pretrained: /home/rtavon/Documents/kingston-test-deeplabv3-2/model/checkpoint.pth.tar # optional
  inception:
    pretrained: # optional
8 changes: 7 additions & 1 deletion images_to_samples.py
@@ -1,5 +1,7 @@
import argparse
import os
from pathlib import Path

import numpy as np
import warnings
import fiona
@@ -161,6 +163,7 @@ def main(params):
    gpkg_file = []
    bucket_name = params['global']['bucket_name']
    data_path = params['global']['data_path']
    Path.mkdir(Path(data_path), exist_ok=True)
    csv_file = params['sample']['prep_csv_file']

    if bucket_name:
@@ -199,7 +202,10 @@ def main(params):
            bucket.download_file(info['gpkg'], info['gpkg'].split('/')[-1])
            info['gpkg'] = info['gpkg'].split('/')[-1]

        assert_band_number(info['tif'], params['global']['number_of_bands'])
        if os.path.isfile(info['tif']):
            assert_band_number(info['tif'], params['global']['number_of_bands'])
        else:
            raise IOError(f'Could not locate "{info["tif"]}". Make sure file exists in this directory.')

        _tqdm.set_postfix(OrderedDict(file=f'{info["tif"]}', sample_size=params['global']['samples_size']))