C. Training Procedure

Training Package and Model Selection

The "FruitDetector" module employs Mask-RCNN to predict the classified masks overlapping a fruit in a given image. Mask-RCNN is proposed and developed by a team at Facebook AI Research (FAIR) as an extension to Faster-RCNN as an instance segmentation tool. There are several packages that help in training, prediction and evaluation of Mask-RCNN models such as torchvision, mmdetection and detectron2.

We selected detectron2 over the others because it comes from the originator and is well maintained. Several pretrained and baseline models for detectron2 are available at [1]. The pretrained models are trained on a selection of datasets that are native to the detectron2 package. To train on a new dataset, it must be added as a custom dataset and trained on top of one of the pretrained models.


FruitDetector Overview

The FruitDetector module has two modes of execution, which differ primarily in the required output. If visualisation of the prediction results is required together with a COCO JSON file, it is executed in debug mode. If low-latency execution with an in-memory JSON message is required, it runs in optimized mode. The module includes training, prediction and evaluation options, which are mainly controlled through the configuration file.

FruitDetector Overview: FruitDetector components and execution overview.

Installation

The dependencies are defined in the requirements file available at [2]. The packages are installed with a single command:

pip install -r fd_only_requirements.txt
Hardware Requirements

The minimum requirement for the host PC (x64 architecture) is 16 GB of RAM and a GPU capable of running CUDA 10. We configured the ResNet-101 FPN pretrained model, which requires a minimum of 5.2 GB of GPU memory. Its detectron2 configuration file is shown below:

_BASE_: "../Base-RetinaNet.yaml"
MODEL:
  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-101.pkl"
  RESNETS:
    DEPTH: 101
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 270000

Annotation

Annotation is performed by creating a labelled mask on each fruit. We use two different types of annotations: 1. fruit only, and 2. ripeness categories (ripe, unripe). Several annotation tools are available, such as Labelbox and V7 Darwin, which are paid platforms; CVAT, by contrast, is a free annotation platform from Intel. A screenshot of the CVAT annotation environment is shown below. The exported example that follows it shows image and metadata information along with the mask coordinates under the segmentation section.

CVAT Annotation environment.

The annotations are exported to COCO 1.0 format, which detectron2 can read. Part of an annotation file is shown here for illustration.

"info": {
    "description": "Exported from AOC_Json_Exporter",
    "url": "https://www.lincoln.ac.uk/home/liat/",
    "version": "1.0",
    "year": 2021,
    "contributor": "Lincoln Institute of Agri-food Technology",
    "date_created": "2024-11-16 15:02:16.135036"
  },
  "licenses": [
    {
      "url": "https://www.lincoln.ac.uk/home/liat/",
      "id": "1",
      "name": "placeholder license"
    }
  ],
  "images": [
    {
      "license": 0,
      "file_name": "20231128-150802.jpg",
      "coco_url": "",
      "height": 1080,
      "width": 1920,
      "date_captured": "",
      "flickr_url": "n/a",
      "darwin_url": "",
      "darwin_workview_url": "",
      "id": 1
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "segmentation": [
        [
          1583.0,
          545.5,
          1582.0,
          545.5,
          1581.0,
          545.5

If the annotation is performed as labelled masks only, a package such as MaskToCOCOJson, available at [3], can be used to convert the masks to COCO JSON format.
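For illustration only, the underlying idea of such a conversion is to trace each binary mask into COCO-style polygons; the sketch below does this with OpenCV contours, but MaskToCOCOJson's own implementation may differ.

# Illustrative sketch: convert a binary mask to a COCO-style polygon
# segmentation using OpenCV contours (not the MaskToCOCOJson code itself).
import cv2
import numpy as np

def mask_to_coco_segmentation(mask: np.ndarray) -> list:
    """Convert a binary mask (HxW) to a list of COCO polygon lists."""
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    segmentation = []
    for contour in contours:
        if contour.shape[0] < 3:  # a polygon needs at least 3 points
            continue
        # COCO stores each polygon as a flat [x1, y1, x2, y2, ...] list
        segmentation.append(contour.flatten().astype(float).tolist())
    return segmentation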

Configuration

After annotation, the image subsets and their respective annotation files should be placed in three folders (train, test, val). As user-defined datasets are not native to the detectron2 base, they must be registered as custom datasets. The configuration file comprises five categories, described below:

Datasets

This category comprises the user-defined train and test dataset names, together with download URLs for the train and test datasets in case they are not available in the file directories. If download_assets under the settings category is set to true, all of these datasets are downloaded to the data directory before training starts.
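A hypothetical sketch of this download step is shown below; the helper name download_asset is illustrative, while the URL entries correspond to the "datasets" category of the configuration file shown later.

# Hypothetical sketch of the asset download described above; the module's
# actual implementation may differ.
import os
import requests

def download_asset(url: str, dest_path: str) -> None:
    """Download a dataset archive to the data directory if not already present."""
    if os.path.exists(dest_path):
        return  # already downloaded
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    with open(dest_path, "wb") as f:
        f.write(response.content)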

Files

The files category has multiple file settings. The "pretrained_model_file" entry defines a user-provided pretrained model to train on top of. The "model_file" entry holds the model file name used for prediction; when training a model, it is recommended to leave this entry blank, as the output model from training will be placed there. If the pretrained model setting is an empty string, the base ResNet-101 defined in the "config_file" entry is used. The "train_metadata_catalog_file" and "test_metadata_catalog_file" entries define paths to detectron2 metadata catalog files; these files save class names, dataset information and colour descriptions for the annotation process, and the file names given in these entries are created during the training process. The train and test datasets' annotations are given in the "train_annotation_file" and "test_annotation_file" entries, respectively.

Directories

The directories containing the dataset images are defined in the "train_image_dir" and "test_image_dir" entries. The "training_output_dir" is the directory that holds all iterative outputs, saved every 5000 iterations, along with other statistical measures used for evaluation. The "prediction_output_dir" is the directory where annotated prediction images are saved, and the "prediction_json_dir" is the path where all predicted JSON files are saved. Both of these entries are used only in debug mode.

Training

The configuration related to training, such as the number of iterations, class labels and learning rate, is defined in the "epochs", "number_of_classes" and "learning_rate" entries. The user may select between SGD and Adam in the "optimizer" entry.
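As an illustrative sketch (not the module's actual code), these entries typically map onto detectron2's configuration as follows:

# Sketch of how the "training" entries map onto detectron2's config;
# the actual wiring inside detectron_trainer.py may differ.
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml")
)
cfg.SOLVER.MAX_ITER = 50000           # "epochs" entry (iterations)
cfg.SOLVER.BASE_LR = 0.0025           # "learning_rate" entry
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2   # "number_of_classes" entry
# The "optimizer" entry (SGD or Adam) would need a custom trainer, since
# detectron2's default trainer builds an SGD optimizer.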

Settings

The settings category holds general execution options. "download_assets" is enabled when the pretrained model and datasets are to be downloaded automatically by the module before the training process starts. The "segm_masks_only" and "bbox" entries are annotation output settings that select segmentation masks only or include the bounding box. The module also outputs the orientation of fruits and offers two methods for computing it: a. PCA and b. the LOG_POLAR transform. PCA is the default and is set in the "orientation_method" entry.

datasets:
  train_dataset_name: 'aoc_train_dataset'
  test_dataset_name: 'aoc_test_dataset'
  dataset_train_annotation_url: 'https://lncn.ac/aocanntrain'
  dataset_train_images_url: 'https://lncn.ac/aocdatatrain'
  dataset_test_annotation_url: 'https://lncn.ac/aocanntest'
  dataset_test_images_url: 'https://lncn.ac/aocdatatest'
files:
  # Pretrained model used as the training base; if empty, the ImageNet-trained
  # model defined in the config file is used as the base.
  # A model file is required for prediction; to continue training a previously
  # trained model, both model_file and pretrained_model_file should point to it.
  pretrained_model_file: './model/aoc_tomato_ripeness_151_90k.pth'
  model_file: ''
  config_file: 'COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml'
  test_metadata_catalog_file: './data/dataset_catalogs/tomato_ripeness_test_metadata_catalog.pkl'
  train_dataset_catalog_file: './data/dataset_catalogs/tomato_ripeness_train_dataset_catalog.pkl'
  train_annotation_file: './data/tomato_dataset/train/annotations/ripeness_class_annotations.json'
  test_annotation_file: './data/tomato_dataset/test/annotations/ripeness_class_annotations.json'
  model_url: 'https://lncn.ac/aocmodel'
  meta_catalog_url: 'https://lncn.ac/aocmeta'
  train_catalog_url: 'https://lncn.ac/aoccat'
directories:
  train_image_dir: './data/tomato_dataset/train/'
  test_image_dir: './data/tomato_dataset/test/'
  training_output_dir: './data/training_output/'
  prediction_output_dir: 'data/prediction_output/test_images/'
  prediction_json_dir: './data/annotations/predicted/'
training:
  epochs: 50000
  number_of_classes: 2
  optimizer: 'SGD'
  learning_rate: 0.0025
settings:
  download_assets: false # whether assets such as the model and datasets should be downloaded
  rename_pred_images: false # rename the predicted images to an img_000001.png-like format
  segm_masks_only: true
  bbox: true
  orientation_method: 'PCA' # choose between PCA or LOG_POLAR
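
A minimal sketch of reading such a configuration file is shown below, assuming it is stored as config.yaml and parsed with PyYAML; the module's own loader may differ in details such as the file name.

# Minimal sketch of loading the YAML configuration above (assumes PyYAML).
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

train_annotations = config["files"]["train_annotation_file"]
train_images = config["directories"]["train_image_dir"]
num_classes = config["training"]["number_of_classes"]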

Training and Evaluation

All folders defined above should be registered as separate datasets with unique names; in the following example they are "aoc_train_dataset" and "aoc_test_dataset". Each dataset associates an image folder with the annotation file belonging to it. The directory for the train dataset is defined in the "train_image_dir" entry under the "directories" category of the configuration file, and its annotation JSON file in the "train_annotation_file" entry. These three entries make up the custom dataset aoc_train_dataset in our example. The module is controlled by the configuration file, whose entries are explained in the Configuration section. The trainer and predictor modules are invoked with the following commands:

python detectron_trainer.py
python detectron_predictor.py

The detectron_predictor.py script also performs evaluation after the predictions.
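For reference, the dataset registration described above is typically done with detectron2's standard COCO registration helper, and evaluation with its COCO evaluator; the sketch below uses the paths from the configuration file, while the module performs equivalent steps internally.

# Sketch of custom-dataset registration and evaluation with detectron2's
# standard helpers; the module performs equivalent steps internally.
from detectron2.data.datasets import register_coco_instances
from detectron2.evaluation import COCOEvaluator

register_coco_instances(
    "aoc_train_dataset", {},
    "./data/tomato_dataset/train/annotations/ripeness_class_annotations.json",
    "./data/tomato_dataset/train/",
)
register_coco_instances(
    "aoc_test_dataset", {},
    "./data/tomato_dataset/test/annotations/ripeness_class_annotations.json",
    "./data/tomato_dataset/test/",
)

# Evaluation over the test dataset uses standard COCO metrics.
evaluator = COCOEvaluator("aoc_test_dataset", output_dir="./data/training_output/")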

Output

There are three types of output from prediction:

  1. Masks
  2. Confidence of prediction
  3. Orientation of fruit
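
These three outputs can be read from a detectron2 prediction roughly as follows; this is an illustrative sketch in which the orientation computation mirrors the default PCA method, while the module's own implementation may differ.

# Illustrative sketch of extracting the three prediction outputs from a
# detectron2 result dict; orientation is computed here via PCA on mask pixels.
import numpy as np

def summarise_predictions(outputs):
    instances = outputs["instances"].to("cpu")
    masks = instances.pred_masks.numpy()   # 1. masks (N x H x W booleans)
    scores = instances.scores.numpy()      # 2. confidence of prediction
    orientations = []
    for mask in masks:                     # 3. orientation of fruit (PCA)
        ys, xs = np.nonzero(mask)
        pts = np.stack([xs, ys], axis=1).astype(float)
        if pts.shape[0] < 2:
            orientations.append(0.0)       # degenerate mask; no orientation
            continue
        pts -= pts.mean(axis=0)
        # principal axis = eigenvector of the covariance with largest eigenvalue
        eigvals, eigvecs = np.linalg.eigh(np.cov(pts.T))
        major = eigvecs[:, np.argmax(eigvals)]
        orientations.append(np.degrees(np.arctan2(major[1], major[0])))
    return masks, scores, orientations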

The mask outputs of the strawberry and tomato models are shown in the figures below:

Predicted masks for strawberries

Predicted masks for tomatoes

References

[1] Detectron2, Facebook Artificial Intelligence Research Team. Model zoo for detectron2 package, https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md?plain=1
[2] Package dependencies for FruitDetector, https://github.com/LCAS/aoc_fruit_detector/blob/main/scripts/fd_only_requirements.txt
[3] Mask to COCO JSON converter, https://github.com/usmanzahidi/MaskToCOCOJson
