C. Training Procedure
The "FruitDetector" module employs Mask-RCNN to predict the classified masks overlapping a fruit in a given image. Mask-RCNN is proposed and developed by a team at Facebook AI Research (FAIR) as an extension to Faster-RCNN as an instance segmentation tool. There are several packages that help in training, prediction and evaluation of Mask-RCNN models such as torchvision, mmdetection and detectron2.
We selected detectron2 over the others because it comes from the originating team and is well maintained. Several pretrained and baseline models for detectron2 are available at [1]. The pretrained models are trained on a selection of datasets that are native to the detectron2 package. To train on a new dataset, it must be added as a custom dataset and trained starting from a chosen pretrained model.
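For illustration, a minimal sketch of pulling in a model zoo baseline with detectron2's documented model_zoo API; the config name is the one used later in this section:

# Minimal sketch: load a Model Zoo baseline config and its pretrained
# weights using detectron2's documented model_zoo API.
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml")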
The FruitDetector module has two modes of execution that differ primarily in the required output. If visualisation of the prediction results is required together with a COCO JSON file, the module is executed in debug mode. If low-latency execution with an in-memory JSON message is required, it runs in optimized mode. The module includes training, prediction and evaluation options, which are controlled through the configuration file.

FruitDetector overview: components and execution flow.
The dependencies are defined in the requirements file available at [2]. All packages can be installed with a single command:
pip install -r fd_only_requirements.txt

The minimum requirement for the host PC with x64 architecture is 16 GB of RAM and a GPU capable of running CUDA 10. We configured the ResNet-101 FPN pretrained model, which has a minimum GPU memory requirement of 5.2 GB.
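As a quick sanity check (illustrative, not part of the module), the following standard PyTorch snippet verifies that the GPU meets this memory requirement:

# Check that a CUDA GPU is present and has at least 5.2 GB of memory.
import torch

assert torch.cuda.is_available(), "A CUDA-capable GPU is required"
total_gb = torch.cuda.get_device_properties(0).total_memory / 2**30
print(f"GPU memory: {total_gb:.1f} GB")
assert total_gb >= 5.2, "At least 5.2 GB of GPU memory is required"

For reference, a baseline configuration file from the detectron2 model zoo (here a RetinaNet R-101 example) has the following form: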
_BASE_: "../Base-RetinaNet.yaml"
MODEL:
  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-101.pkl"
  RESNETS:
    DEPTH: 101
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 270000

Annotation is performed by creating a labelled mask on each fruit. We use two different annotation schemes: 1. fruit only and 2. ripeness categories (ripe, unripe). Several tools are available for annotation, such as Labelbox and V7 Darwin, which are paid platforms; CVAT, by contrast, is Intel's free annotation platform. A screenshot of the CVAT annotation environment is shown below.

CVAT Annotation environment.
The annotations are exported to COCO 1.0 format, which detectron2 can read. Parts of an annotation file are shown here for illustration; the example contains the image and metadata information along with the mask coordinates under the segmentation section.
"info": {
"description": "Exported from AOC_Json_Exporter",
"url": "https://www.lincoln.ac.uk/home/liat/",
"version": "1.0",
"year": 2021,
"contributor": "Lincoln Institute of Agri-food Technology",
"date_created": "2024-11-16 15:02:16.135036"
},
"licenses": [
{
"url": "https://www.lincoln.ac.uk/home/liat/",
"id": "1",
"name": "placeholder license"
}
],
"images": [
{
"license": 0,
"file_name": "20231128-150802.jpg",
"coco_url": "",
"height": 1080,
"width": 1920,
"date_captured": "",
"flickr_url": "n/a",
"darwin_url": "",
"darwin_workview_url": "",
"id": 1
}
],
"annotations": [
{
"id": 1,
"image_id": 1,
"category_id": 1,
"segmentation": [
[
1583.0,
545.5,
1582.0,
545.5,
1581.0,
        545.5,
        ...

If the annotation is available only as labelled masks, a package such as MaskToCOCOJson, available at [3], can be used to convert the masks to COCO JSON format.
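For illustration, a minimal sketch of the underlying mask-to-polygon conversion using OpenCV contour extraction; the function name is hypothetical and this is not the MaskToCOCOJson API:

# Hypothetical helper: convert a binary mask to COCO-style 'segmentation'
# polygons ([x1, y1, x2, y2, ...]) via OpenCV contour extraction.
import cv2
import numpy as np

def mask_to_coco_segmentation(mask: np.ndarray) -> list:
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for contour in contours:
        if len(contour) >= 3:  # a valid polygon needs at least 3 points
            polygons.append(contour.flatten().astype(float).tolist())
    return polygons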
After annotation, the image subsets and their respective annotation files should be placed in three folders (train, test, val), as in the illustrative layout below.
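An illustrative layout, assuming the paths used in the example configuration later in this section (the val subset mirrors train and test):

data/
└── tomato_dataset/
    ├── train/
    │   ├── <images>
    │   └── annotations/ripeness_class_annotations.json
    ├── test/
    │   ├── <images>
    │   └── annotations/ripeness_class_annotations.json
    └── val/
        └── ...

As user-defined datasets are non-native and custom to the detectron2 base, their registration is required (a registration sketch follows the configuration example below). The configuration file is organised into five categories, described below.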
The datasets category comprises the user-defined train and test dataset names, and download URLs for the train and test datasets in case they are not available in the file directories. If "download_assets" under the settings category is set to true, all of these datasets are downloaded to the data directory before training starts.
The files category holds several file settings. The "pretrained_model_file" entry defines a user-supplied pretrained model to train on top of. The "model_file" entry names the model file used for prediction; when training a model, it is recommended to leave "model_file" blank, as the output model from training will be placed there. If the pretrained model entry is an empty string, the base ResNet-101 model is used; this ResNet model is defined in the "config_file" entry. The "train_metadata_catalog_file" and "test_metadata_catalog_file" entries define paths to detectron2 metadata catalog files, which store class names, dataset information and the colour description used for annotation; the files named in these entries are created during training. The train and test dataset annotations are given in the "train_annotation_file" and "test_annotation_file" entries, respectively.
The image directories of the datasets are defined in the "train_image_dir" and "test_image_dir" entries. The "training_output_dir" is the directory that holds the iterative outputs, saved every 5000 iterations, along with other statistical measures used for evaluation. The "prediction_output_dir" is the directory where annotated prediction images are saved, and the "prediction_json_dir" is the path where all predicted JSON files are saved. The latter two entries take effect only in debug mode.
The training-related configuration, such as the number of iterations, class labels and learning rate, is defined in the "epochs", "number_of_classes" and "learning_rate" entries. The user may select between SGD and Adam in the "optimizer" entry.
The settings category covers general execution options: "download_assets" is enabled when assets such as the pretrained model and datasets should be downloaded automatically by the module before training starts, while "segm_masks_only" and "bbox" control the annotation output, i.e. whether to emit segmentation masks only or to include the bounding box. The module also outputs the orientation of fruits and offers a choice between two methods, a. PCA and b. log-polar transform; PCA is the default and is defined in the "orientation_method" entry.
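The PCA method can be understood as taking the principal axis of the mask's pixel distribution. The following is a minimal sketch of that idea, assuming orientation is reported as the angle of the principal axis; the module's actual implementation may differ:

# Sketch: estimate fruit orientation as the angle of a binary mask's
# principal axis, computed via PCA on the mask's pixel coordinates.
import numpy as np

def mask_orientation_pca(mask: np.ndarray) -> float:
    ys, xs = np.nonzero(mask)
    points = np.stack([xs, ys], axis=0).astype(float)
    points -= points.mean(axis=1, keepdims=True)  # centre the pixel cloud
    cov = np.cov(points)                          # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    major = eigvecs[:, -1]                        # axis of largest variance
    return float(np.arctan2(major[1], major[0]))  # angle in radians

A complete example configuration file is shown below: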
datasets:
  train_dataset_name: 'aoc_train_dataset'
  test_dataset_name: 'aoc_test_dataset'
  dataset_train_annotation_url: 'https://lncn.ac/aocanntrain'
  dataset_train_images_url: 'https://lncn.ac/aocdatatrain'
  dataset_test_annotation_url: 'https://lncn.ac/aocanntest'
  dataset_test_images_url: 'https://lncn.ac/aocdatatest'
files:
  # pretrained model used as a training base model; if set as empty, the config file will use the imagenet-trained model as base.
  # model file is required for prediction; if further training of a previously trained model is required, then model
  # and pretrained model should have the path to that model file.
  pretrained_model_file: './model/aoc_tomato_ripeness_151_90k.pth'
  model_file: ''
  config_file: 'COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml'
  test_metadata_catalog_file: './data/dataset_catalogs/tomato_ripeness_test_metadata_catalog.pkl'
  train_dataset_catalog_file: './data/dataset_catalogs/tomato_ripeness_train_dataset_catalog.pkl'
  train_annotation_file: './data/tomato_dataset/train/annotations/ripeness_class_annotations.json'
  test_annotation_file: './data/tomato_dataset/test/annotations/ripeness_class_annotations.json'
  model_url: 'https://lncn.ac/aocmodel'
  meta_catalog_url: 'https://lncn.ac/aocmeta'
  train_catalog_url: 'https://lncn.ac/aoccat'
directories:
  train_image_dir: './data/tomato_dataset/train/'
  test_image_dir: './data/tomato_dataset/test/'
  training_output_dir: './data/training_output/'
  prediction_output_dir: 'data/prediction_output/test_images/'
  prediction_json_dir: './data/annotations/predicted/'
training:
  epochs: 50000
  number_of_classes: 2
  optimizer: 'SGD'
  learning_rate: 0.0025
settings:
  download_assets: false # if assets such as the model and datasets should be downloaded
  rename_pred_images: false # rename the predicted images to an img_000001.png-like format
  segm_masks_only: true
  bbox: true
  orientation_method: 'PCA' # choose between PCA or LOG_POLAR

All datasets defined above must be registered under unique names; in this example, "aoc_train_dataset" and "aoc_test_dataset". Each dataset ties an image folder to its annotation file: the image directory is given in the "train_image_dir" entry under the "directories" category, and the annotation JSON in the "train_annotation_file" entry. Together with the dataset name, these entries define the custom aoc_train_dataset of our example.
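As a sketch, this registration can be done with detectron2's documented register_coco_instances helper, using the names and paths from the example configuration above:

# Register the custom train and test datasets with detectron2,
# pairing each annotation file with its image directory.
from detectron2.data.datasets import register_coco_instances

register_coco_instances(
    "aoc_train_dataset", {},
    "./data/tomato_dataset/train/annotations/ripeness_class_annotations.json",
    "./data/tomato_dataset/train/")
register_coco_instances(
    "aoc_test_dataset", {},
    "./data/tomato_dataset/test/annotations/ripeness_class_annotations.json",
    "./data/tomato_dataset/test/")

The module itself is driven by the configuration file whose entries are explained in the configuration section above. The trainer and predictor modules are invoked with the following commands: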
python detectron_trainer.py
python detectron_predictor.py

The detectron_predictor.py script also performs evaluation after the predictions.
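A minimal sketch, assuming these scripts wrap detectron2's standard DefaultTrainer / DefaultPredictor workflow; the values mirror the example configuration above, and the actual scripts read them from that file:

# Train on the registered custom dataset, then predict and evaluate.
import os
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer, DefaultPredictor
from detectron2.data import build_detection_test_loader
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))   # config_file
cfg.DATASETS.TRAIN = ("aoc_train_dataset",)                     # registered above
cfg.DATASETS.TEST = ("aoc_test_dataset",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2                             # number_of_classes
cfg.SOLVER.BASE_LR = 0.0025                                     # learning_rate
cfg.SOLVER.MAX_ITER = 50000                                     # epochs (iterations)
cfg.MODEL.WEIGHTS = "./model/aoc_tomato_ripeness_151_90k.pth"   # pretrained_model_file
cfg.OUTPUT_DIR = "./data/training_output/"                      # training_output_dir
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

# Prediction and COCO-style evaluation with the trained model.
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
predictor = DefaultPredictor(cfg)
evaluator = COCOEvaluator("aoc_test_dataset", output_dir=cfg.OUTPUT_DIR)
test_loader = build_detection_test_loader(cfg, "aoc_test_dataset")
print(inference_on_dataset(predictor.model, test_loader, evaluator))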
There are three types of output from prediction:
- Masks
- Confidence of prediction
- Orientation of fruit
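As a hedged sketch, the first two outputs correspond to the per-instance fields of detectron2's DefaultPredictor result, while orientation is derived from the masks (e.g. with the PCA sketch earlier); the image path here is illustrative:

# Read per-instance predictions; 'predictor' is the DefaultPredictor
# built in the previous sketch.
import cv2

outputs = predictor(cv2.imread("./data/tomato_dataset/test/20231128-150802.jpg"))
instances = outputs["instances"].to("cpu")
masks = instances.pred_masks.numpy()      # one boolean mask per detected fruit
scores = instances.scores.numpy()         # confidence of each prediction
classes = instances.pred_classes.numpy()  # class index, e.g. ripe / unripe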
The mask outputs of the strawberry and tomato models are shown in the figures below.

Predicted masks for strawberries

Predicted masks for tomatoes
[1] Facebook AI Research, Model Zoo for the detectron2 package. https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md?plain=1
[2] Package dependencies for FruitDetector. https://github.com/LCAS/aoc_fruit_detector/blob/main/scripts/fd_only_requirements.txt
[3] Mask to COCO JSON converter. https://github.com/usmanzahidi/MaskToCOCOJson