GIST-NJU/CMR

Composite Metamorphic Relations (CMRs) for DNN Testing

This repository is supplementary to the paper "How Composite Metamorphic Relations Enhance Test Effectiveness of DNN Testing: An Empirical Study".

📂 Project Structure

CMR
├── data
│   ├── source            # Source test images
│   ├── followup          # Follow-up test images
│   └── human_validation  # Follow-up test images sampled for human validation
├── models                # DNNs under test
├── figures               # Figures for RQs
├── requirements.txt
├── results
│   ├── predictions       # Original prediction results
│   ├── validity          # Automated validator and validation results
│   ├── errors            # Failure and fault revelation
│   ├── features          # Feature representations of test images
│   └── samples           # CMRs and source test images for evaluating augmented models
├── RQs.ipynb             # Code for analysing data and generating figures and tables
└── src                   # Code for reproducing the experiment

⚙️ Requirements

Run the following commands to create a conda environment and install the required dependencies:

conda create -n cmr python=3.8.18
conda activate cmr
pip install -r requirements.txt

Run the following command to download the additional files (about 75 GB), which are not included directly in this repository due to the file size limit:

python download_data.py

These include the trained DNNs (in the models/ directory), the test sets of the datasets (in the data/source/ directory), and the follow-up test images sampled for human validation (in the data/human_validation/ directory).

Alternatively, you can manually download the above files from Zenodo and cocodataset, and put them into their respective target directories.

📦 Subject DNNs and Component MRs

The experiment is performed on five DNNs, trained on five popular image recognition datasets:

  • AlexNet @ MNIST (single-label classification)
  • DenseNet @ Caltech256 (single-label classification)
  • MSRN @ VOC (multi-label classification)
  • MLD @ COCO (multi-label classification)
  • Faceptor @ UTKFace (regression-based facial image estimation)

A total of seven representative component MRs are used to create CMRs for investigation. The implementation of these MRs can be found in the src/mr_utils.py file.

MR    Image Transformation    Identifier Used
MR1   Brightness              1
MR2   Contrast                3
MR3   Sharpness               2
MR4   Blur (Gaussian)         4
MR5   Rotation                0
MR6   Shear (Horizontal)      5
MR7   Translation             6

In our implementation, each MR is associated with a unique identifier, and the above table gives the mapping between the MRs described in the paper and their corresponding identifiers. Accordingly, each CMR is denoted as the sequence of identifiers of its component MRs. For example, the CMR composed under the composition sequence <MR2, MR1> (i.e., first apply Contrast and then Brightness) is denoted as (3,1) or 31 in our code and raw data.
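The identifier scheme can be illustrated with a small lookup. This is a minimal sketch built only from the table above; `MR_BY_ID` and `decode_cmr` are hypothetical names and are not taken from src/mr_utils.py.

```python
# Identifier -> image transformation, copied from the table above.
MR_BY_ID = {
    "0": "Rotation",
    "1": "Brightness",
    "2": "Sharpness",
    "3": "Contrast",
    "4": "Blur (Gaussian)",
    "5": "Shear (Horizontal)",
    "6": "Translation",
}


def decode_cmr(cmr_id):
    """Return the transformations of a CMR in application order.

    Hypothetical helper: each character of the identifier is looked up
    as one component MR, from left to right.
    """
    return [MR_BY_ID[digit] for digit in str(cmr_id)]


print(decode_cmr("31"))  # ['Contrast', 'Brightness']
```

Identifiers are best kept as strings here: an integer such as 12 would lose the leading 0 of the CMR 012.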

📊 Experimental Data

Figures

The figures directory contains all figures presented in the paper:

  • RQ1: Proportion of valid follow-up test images generated by CMRs
  • RQ2: The overlap of fault types uncovered by CMR, the strongest MR, and all component MRs
  • RQ3.1: Difference in Failure Rate (DFR) and Fault Type (DFT) between CMR and its strongest and average-performing component MR
  • RQ3.2: Relationship between DFR/DFT and the observed ∆M values

Follow-up Test Images

The data/followup/ directory is used to store the follow-up test images generated by applying CMRs to source test images. Since our experiment produced an extremely large volume of follow-up test images (more than 200 million, approximately 78 TB), we do not provide these images directly. Instead, these images can be generated on demand (see Generate follow-up test images).

The data/human_validation/ directory contains the randomly selected follow-up test images for the human validation study (RQ1).

Original Prediction Outputs

The results/predictions/ directory contains the original prediction outputs of the DNNs on all source and follow-up test images, including:

  • [Dataset]/[Dataset]_[DNN]_source.npy/.csv: the prediction outputs of source test images. Each element in the NumPy array, or each row in the CSV, corresponds to a single source test image.
  • [Dataset]/[Dataset]_[DNN]_followup.npy: the prediction outputs of follow-up test images. The data is stored as a dict, where the key specifies the identifier of CMR, and the value is a list or dataframe containing prediction outputs of all follow-up test images generated by that CMR.
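A dict stored in a .npy file is typically read back as a zero-dimensional object array. The sketch below assumes the follow-up files were written with `np.save` on a Python dict, which is an assumption rather than something stated here; `load_followup_predictions` is a hypothetical helper name.

```python
import numpy as np


def load_followup_predictions(path):
    """Load a dict that was saved with np.save (assumed file layout).

    np.save wraps a dict in a 0-d object array, so allow_pickle=True is
    required on load and .item() unwraps the original dict.
    """
    wrapped = np.load(path, allow_pickle=True)
    return wrapped.item()


# Example call; the path follows the [Dataset]/[Dataset]_[DNN]_followup.npy pattern:
# predictions = load_followup_predictions("results/predictions/COCO/COCO_MLD_followup.npy")
# The key type (e.g. "31" versus (3, 1)) should be checked on the actual data.
```

Because `allow_pickle=True` can execute arbitrary code during unpickling, it should only be used on files from a trusted source.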

Validity of Test Inputs

The results/validity/ directory contains the files for running the SelfOracle method, and the results obtained, including:

  • [Dataset]_VAE.pth: the VAE trained for each dataset.
  • [Dataset]_threshold.txt: the threshold that is automatically derived based on the VAE and rate of false alarm (0.01%).
  • [Dataset]_validity.npy: the reconstruction errors of all follow-up test images obtained. Each element in the NumPy array corresponds to a CMR, and each value in the list corresponds to a follow-up test image generated by that CMR. A value exceeding the threshold indicates an invalid image.
  • human_validation.csv: the manual validation results. Each row indicates a follow-up test image that is manually determined as semantically invalid, and the three columns specify the dataset name, the validity determined by SelfOracle (1 indicates valid), and the filename of the follow-up test image. Here, the filename is formatted as [Identifier of CMR]_[Source image index]_[Label of source image], e.g., 012_2780_2.png indicates the follow-up test image generated by applying CMR 012 on source test image 2780, and the label of this source image is 2.
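The filename convention and the threshold rule above can be sketched as two small helpers. Both function names are hypothetical; the label is kept as a string because its format for the multi-label and regression datasets is not described here.

```python
from pathlib import Path


def parse_followup_name(filename):
    """Split '[CMR]_[source index]_[label].png' into its three parts."""
    stem = Path(filename).stem  # drop the extension
    # maxsplit=2 keeps any further underscores inside the label part
    cmr_id, source_index, label = stem.split("_", 2)
    return cmr_id, int(source_index), label


def is_valid(reconstruction_error, threshold):
    """An image is invalid when its error exceeds the threshold."""
    return reconstruction_error <= threshold


print(parse_followup_name("012_2780_2.png"))  # ('012', 2780, '2')
```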

Failure and Fault Revelation

The results/errors/ directory contains the failures and fault types observed, including:

  • failures_[DNN].pkl and faults_[DNN].pkl: the indices of failure-revealing MPs and the fault types uncovered for each MR and CMR. The data is organized in a dict, where the key specifies the identifier of an MR or CMR.
  • failure_rates.csv and fault_types.csv: the Failure Rate (FR) and Fault Type (FT) metrics calculated for each MR and CMR (as specified by the CMR column) under each DNN.
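As a rough illustration of how the per-CMR indices relate to a failure rate, the sketch below assumes FR is the fraction of metamorphic pairs (MPs) that reveal a failure. That definition is an assumption made for this example; the exact metric is defined in the paper and computed by src/count_failure_fault.py. The function name and the `total_mps` argument are hypothetical.

```python
def failure_rates(failures, total_mps):
    """Compute an assumed failure rate for each MR or CMR.

    failures:  dict mapping an MR/CMR identifier to the indices of its
               failure-revealing MPs (the layout described above).
    total_mps: number of MPs executed per MR/CMR (assumed constant).
    """
    if total_mps <= 0:
        raise ValueError("total_mps must be positive")
    # set() guards against an index being listed more than once
    return {cmr: len(set(indices)) / total_mps
            for cmr, indices in failures.items()}


# The real dict could be obtained with pickle.load on a failures_[DNN].pkl file.
print(failure_rates({"31": [0, 2], "1": []}, total_mps=4))  # {'31': 0.5, '1': 0.0}
```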

Complementary MRs

The results/features/ directory contains the extracted features of source and follow-up test images for analysing complementary MRs, as stored in [Extractor]/[Dataset].pt and [Extractor]/[Dataset]_[MR].pt. Here, [Extractor] indicates the feature extractor used, and [MR] is the identifier of the MR applied to generate the follow-up test images. These files are PyTorch tensors with shape (N,D), where N is the number of test images and D is the dimension of the extracted feature representation. Note that these files contain the feature representations as output by the feature extractors, before any dimensionality reduction. These representations were then reduced to eight dimensions using PCA, and the resulting vectors were used to compute ∆M in RQs.ipynb.
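The PCA reduction step can be sketched with plain NumPy. This is an illustrative implementation based on the singular value decomposition, not the code used in RQs.ipynb, which may rely on a different library and may differ in component signs; `pca_reduce` is a hypothetical name.

```python
import numpy as np


def pca_reduce(features, n_components=8):
    """Project an (N, D) feature matrix onto its top principal components."""
    x = np.asarray(features, dtype=np.float64)
    centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Random stand-in for a feature file with N=100 images and D=32 dimensions.
rng = np.random.default_rng(0)
demo = rng.normal(size=(100, 32))
print(pca_reduce(demo).shape)  # (100, 8)
```

A tensor read with `torch.load` could be passed in after converting it with `.cpu().numpy()`.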

The extractors used for feature extraction are pre-trained models available directly from the corresponding packages, except for lenet50, whose pre-trained model is provided at results/features/lenet50/lenet50_mnist.pth.

Influence of Data Augmentation

The results/samples/ directory contains the CMRs and source test images that are randomly selected for evaluating the performance of CMRs under DNNs trained without and with data augmentation, i.e., [Dataset]_[DNN]_cmr50.pkl and [Dataset]_1000.pkl.

🛠️ Reproducing Experiment

The following scripts can be used to reproduce the complete experiment and evaluation process. To generate tables and figures presented in the paper, run the commands in RQs.ipynb.

1. Generate follow-up test images

python src/generate_followup.py --dataset COCO --strength 2

The --dataset parameter specifies the dataset name, and --strength specifies the composition strength applied. The follow-up test images generated by employing the input mappings of the specified CMRs will be saved in the data/followup/[Dataset]/[CMR] directory (e.g., data/followup/COCO/31).
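To make the relation between composition strength and output directories concrete, the sketch below enumerates CMR identifiers under the assumption that a CMR of strength k is an ordered sequence of k distinct component MRs. That assumption is not confirmed here, and both helper names are hypothetical rather than part of src/generate_followup.py.

```python
from itertools import permutations
from pathlib import Path

MR_IDS = "0123456"  # identifiers of the seven component MRs


def enumerate_cmrs(strength):
    """List CMR identifiers of a given strength (assumes no repeated MR)."""
    return ["".join(seq) for seq in permutations(MR_IDS, strength)]


def followup_dir(dataset, cmr_id, root="data/followup"):
    """Directory that holds the follow-up images of one CMR."""
    return Path(root) / dataset / cmr_id


cmrs = enumerate_cmrs(2)
print(len(cmrs))                                # 42
print(followup_dir("COCO", "31").as_posix())    # data/followup/COCO/31
```

Order matters in this enumeration, so 31 (Contrast, then Brightness) and 13 (Brightness, then Contrast) are counted as different CMRs.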

2. Validate follow-up test images

python src/selforacle.py --dataset COCO

This will run the SelfOracle method to determine the validity of the test images in the data/followup/[Dataset]/[CMR] directory, and produce results as included in the results/validity/ directory.

3. Execute DNNs Under Test

python src/predict.py --dataset COCO            # execute source test images
python src/predict.py --dataset COCO --followup # execute follow-up test images

This will run the DNN to make predictions on the test images, and produce results as included in the results/predictions/ directory.

4. Evaluate Failure and Fault Revelation Capability

python src/count_failure_fault.py --dataset COCO

This will calculate the failure rate and fault type metrics based on the prediction outputs, and produce the files as included in the results/errors/ directory.

5. Feature Extraction

python src/extract_features.py

This will calculate the feature representation of each test image, and produce the results as included in the results/features/ directory. The calculation of the ∆M measure and the correlation analysis are performed in RQs.ipynb.

6. Data Augmentation

# train with data augmentation
python src/data_augment.py --dataset COCO --augment online
# sample source images and CMRs for evaluating augmented models
python src/data_augment/select_samples.py --cmr_num 50 --source_num 1000
# predict using augmented models
python src/predict.py --dataset COCO --augment online --cmr_num 50 --source_num 1000
# evaluate failure and fault revelation capability
python src/count_failure_fault.py --dataset COCO --cmr_num 50 --source_num 1000
python src/count_failure_fault.py --dataset COCO --augment online --cmr_num 50 --source_num 1000

This will first train the DNN using the augmented dataset and save the checkpoints (with the Aug_online suffix following the name of the DNN) in the models directory. Then, it will randomly sample a subset of CMRs and source test images (producing files in results/samples/), use them to test the augmented version of the DNN (producing files in results/predictions/), and evaluate the failure and fault revelation capability (producing files in results/errors/).

The --augment parameter specifies the data augmentation method used, with online indicating the online (i.e., on-the-fly) data augmentation method. The --cmr_num and --source_num parameters specify the number of CMRs and source test images sampled, respectively. If --source_num is set to 0, all source test images will be used.
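The sampling behaviour described above can be sketched as follows. The function names and the `seed` argument are hypothetical and are not taken from select_samples.py; only the rule that a --source_num of 0 keeps every source image comes from the description above.

```python
import random


def sample_cmrs(cmr_ids, cmr_num, seed=0):
    """Randomly pick cmr_num CMR identifiers without replacement."""
    return random.Random(seed).sample(list(cmr_ids), cmr_num)


def sample_source_indices(total_sources, source_num, seed=0):
    """Pick source image indices; source_num == 0 keeps every image."""
    if source_num == 0:
        return list(range(total_sources))
    picked = random.Random(seed).sample(range(total_sources), source_num)
    return sorted(picked)


print(len(sample_source_indices(5000, 1000)))  # 1000
print(sample_source_indices(5, 0))             # [0, 1, 2, 3, 4]
```

Fixing the seed makes the sampled subset repeatable, so the models trained with and without augmentation can be evaluated on the same CMRs and source images.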
