Back | Next | Contents
Transfer Learning

Collecting your own Datasets

In order to collect your own datasets for training customized models to classify objects or scenes of your choosing, we've created an easy-to-use tool called camera-capture for capturing and labelling images on your Jetson from live video:

The tool will create datasets with the following directory structure on disk:

‣ train/
	• class-A/
	• class-B/
	• ...
‣ val/
	• class-A/
	• class-B/
	• ...
‣ test/
	• class-A/
	• class-B/
	• ...

where class-A, class-B, etc. will be subdirectories containing the data for each object class that you've defined in a class label file. The names of these class subdirectories will match the class label names that we'll create below. These subdirectories will automatically be populated by the tool for the train, val, and test sets from the classes listed in the label file, and a sequence of JPEG images will be saved under each.

Note that the structure above is the organization expected by the PyTorch training script that we've been using. If you inspect the Cat/Dog and PlantCLEF datasets, you'll see they're organized the same way.
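For reference, here's a minimal sketch of how a dataset laid out this way can be loaded in PyTorch with torchvision's ImageFolder class, which follows the same directory convention (the dataset path and transform below are just placeholders for illustration):

# Illustrative sketch only -- the dataset path and transform are placeholders.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# ImageFolder maps each class subdirectory to a class index in alphabetical order,
# which is why the class label file (created in the next section) is kept alphabetized to match.
train_dataset = datasets.ImageFolder('my-dataset/train', transform=transform)
print(train_dataset.classes)    # e.g. ['background', 'brontosaurus', 'tree', ...]

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=8, shuffle=True)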

Creating the Label File

First, create an empty directory for storing your dataset and a text file that will define the class labels (usually called labels.txt). The label file contains one class label per line, and is alphabetized (this is important so the ordering of the classes in the label file matches the ordering of the corresponding subdirectories on disk). As mentioned above, the camera-capture tool will automatically populate the necessary subdirectories for each class from this label file.

Here's an example labels.txt file with 5 classes:

background
brontosaurus
tree
triceratops
velociraptor

And here's the corresponding directory structure that the tool will create:

‣ train/
	• background/
	• brontosaurus/
	• tree/
	• triceratops/
	• velociraptor/
‣ val/
	• background/
	• brontosaurus/
	• tree/
	• triceratops/
	• velociraptor/
‣ test/
	• background/
	• brontosaurus/
	• tree/
	• triceratops/
	• velociraptor/
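The camera-capture tool creates these subdirectories for you, but if you want to pre-create or inspect the layout yourself, the equivalent steps look roughly like this (the dataset path is a placeholder):

# Illustrative sketch -- camera-capture populates this structure automatically.
import os

dataset_path = 'my-dataset'     # placeholder path to your dataset directory

# read the class labels and keep them alphabetized, matching the ordering on disk
with open(os.path.join(dataset_path, 'labels.txt')) as f:
    classes = sorted(line.strip() for line in f if line.strip())

# create a subdirectory for each class under the train, val, and test sets
for split in ('train', 'val', 'test'):
    for cls in classes:
        os.makedirs(os.path.join(dataset_path, split, cls), exist_ok=True)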

Next, we'll cover the command-line options for starting the tool.

Launching the Tool

The source for the camera-capture tool can be found under jetson-inference/tools/camera-capture/, and like the other programs from the repo, it gets built to the aarch64/bin directory and installed under /usr/local/bin/.

The camera-capture tool accepts 3 optional command-line arguments:

  • --camera flag setting the camera device to use
    • MIPI CSI cameras are used by specifying the sensor index (0 or 1, etc.)
    • V4L2 USB cameras are used by specifying their /dev/video node (/dev/video0, /dev/video1, etc.)
    • The default is to use MIPI CSI sensor 0 (--camera=0)
  • --width and --height flags setting the camera resolution (default is 1280x720)
    • The resolution should be set to a format that the camera supports.
    • Query the available formats with the following commands:
      $ sudo apt-get install v4l-utils
      $ v4l2-ctl --list-formats-ext
  • --fps flag setting the camera fps (default is 30)

Below are some example commands for launching the tool:

$ camera-capture                          # using default MIPI CSI camera (1280x720)
$ camera-capture --camera=/dev/video0     # using V4L2 camera /dev/video0 (1280x720)
$ camera-capture --width=640 --height=480 # using default MIPI CSI camera (640x480)

note: for example cameras to use, see these sections of the Jetson Wiki:
  • Nano:  https://eLinux.org/Jetson_Nano#Cameras
  • Xavier: https://eLinux.org/Jetson_AGX_Xavier#Ecosystem_Products_.26_Cameras
  • TX1/TX2: developer kits include an onboard MIPI CSI sensor module (OV5693)

Collecting Data

Below is the Data Capture Control window, which lets you pick the desired dataset path and load the class label file that you created above, and then presents options for selecting the current object class and the train/val/test set that you're currently collecting data for:

First, open the dataset path and class labels. The tool will then create the dataset structure discussed above (unless these subdirectories already exist), and you will see your object labels populated inside the Current Class drop-down.

Then position the camera at the object or scene you have currently selected in the drop-down, and click the Capture button (or press the spacebar) when you're ready to take an image. The images will be saved under that class subdirectory in the train, val, or test set. The status bar displays how many images have been saved under that category.

It's recommended to collect at least 100 training images per class before attempting training. A rule of thumb for the validation set is that it should be roughly 10-20% the size of the training set, and the size of the test set is simply dictated by how many static images you want to test on. You can also just run the camera to test your model if you'd like.
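If you want to check how your splits are shaping up against these guidelines, a quick script like the following (not part of the repo; the dataset path is a placeholder) will count the images saved per split and class:

# Illustrative sketch -- counts the JPEG images collected per split and class.
import os

dataset_path = 'my-dataset'     # placeholder path to your dataset directory

for split in ('train', 'val', 'test'):
    split_dir = os.path.join(dataset_path, split)
    if not os.path.isdir(split_dir):
        continue
    for cls in sorted(os.listdir(split_dir)):
        cls_dir = os.path.join(split_dir, cls)
        count = len([f for f in os.listdir(cls_dir) if f.lower().endswith('.jpg')])
        print(f'{split}/{cls}: {count} images')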

It's important that your data is collected from varying object orientations, camera viewpoints, lighting conditions, and ideally with different backgrounds, to create a model that is robust to noise and changes in environment. If you find that your model isn't performing as well as you'd like, try adding more training data and playing around with the conditions.

Training your Model

Once you've collected a bunch of data, you can try training a model on it, just like we've done before. The training process is the same as in the previous examples, and the same PyTorch scripts are used:

$ cd jetson-inference/python/training/classification
$ python train.py --model-dir=<YOUR-MODEL> <PATH-TO-YOUR-DATASET>

Like before, after training you'll need to convert your PyTorch model to ONNX:

$ python onnx_export.py --model-dir=<YOUR-MODEL>

The converted model will be saved under <YOUR-MODEL>/resnet18.onnx, which you can then load with the imagenet-console and imagenet-camera programs like we did in the previous examples:

DATASET=<PATH-TO-YOUR-DATASET>

# C++
imagenet-camera --model=<YOUR-MODEL>/resnet18.onnx --input_blob=input_0 --output_blob=output_0 --labels=$DATASET/labels.txt

# Python
imagenet-camera.py --model=<YOUR-MODEL>/resnet18.onnx --input_blob=input_0 --output_blob=output_0 --labels=$DATASET/labels.txt

If you need to, go back and collect more training data and re-train your model again. You can restart training and pick up where you left off using the --resume and --epoch-start flags (run python train.py --help for more info). Remember to re-export the model to ONNX after re-training.

What's Next

This is the last step of the Hello AI World tutorial, which covers inferencing and transfer learning on Jetson with TensorRT and PyTorch. To recap, together we've covered:

  • Using image recognition networks to classify images
  • Coding your own image recognition programs in Python and C++
  • Classifying video from a live camera stream
  • Performing object detection to locate object coordinates
  • Re-training models with PyTorch using transfer learning
  • Collecting your own datasets and training your own models

Next we encourage you to experiment and apply what you've learned to other projects, perhaps taking advantage of Jetson's embedded form-factor - for example an autonomous robot or intelligent camera-based system. Here are some example ideas that you could play around with:

  • use GPIO to trigger external actuators or LEDs when an object is detected
  • an autonomous robot that can find or follow an object
  • a handheld battery-powered camera + Jetson + mini-display
  • an interactive toy or treat dispenser for your pet
  • a smart doorbell camera that greets your guests

For more examples to inspire your creativity, see the Jetson Projects page. Have fun and good luck!

You can also follow our Two Days to a Demo tutorial, which covers training of even larger datasets in the cloud or on a PC using discrete NVIDIA GPU(s). Two Days to a Demo also covers semantic segmentation, which is like image classification, but on a per-pixel level instead of predicting one class for the entire image.

Back | Re-training on the PlantCLEF Dataset

© 2016-2019 NVIDIA | Table of Contents