[SIT] EISW-204381: Add per output threshold#34352
[SIT] EISW-204381: Add per output threshold#34352svigh wants to merge 13 commits intoopenvinotoolkit:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds support for per-output (per-layer) threshold specification across all metric comparison modes in the single-image-test tool. Previously, all metrics used global threshold values; now users can specify different thresholds for different output layers using the syntax "layer1:value1;layer2:value2" or continue using single global values.
Changes:
- Extracted mAP metric helper functions to separate files (
map_metric_helpers.hpp/cpp) for better code organization - Implemented per-layer threshold parsing infrastructure in new
argument_parse_helpers.hpp/cppfiles - Changed NRMSE default threshold from 1.0 to 0.02
- Updated all metric test functions to support per-layer thresholds while maintaining backward compatibility with single global values
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| map_metric_helpers.hpp | Declares structures (Detection, MatchResult) and functions for mAP calculation, extracted from main.cpp |
| map_metric_helpers.cpp | Implements IoU calculation, detection parsing, matching, and average precision computation for mAP metric |
| argument_parse_helpers.hpp | Declares all command-line flags (previously in main.cpp), provides utility classes (CaselessEq), and declares per-layer value parsing functions |
| argument_parse_helpers.cpp | Defines all DEFINE_string/DEFINE_* flags with updated help text mentioning per-layer support, implements parsePerLayerValues and getValueForLayer functions, and moves parseCommandLine function from main.cpp |
| main.cpp | Integrates per-layer threshold parsing into all metric test functions (RAW, CoSim, RRMSE, NRMSE, L2Norm, PSNR, Classification, SSD, YOLO, MeanIoU), removes flag definitions and helper code now in argument_parse_helpers, and updates PSNR to handle multi-output models differently |
src/plugins/intel_npu/tools/single-image-test/map_metric_helpers.cpp
Outdated
Show resolved
Hide resolved
src/plugins/intel_npu/tools/single-image-test/map_metric_helpers.cpp
Outdated
Show resolved
Hide resolved
| // If there's only one output, use the global PSNR metric | ||
| if (outputs.size() == 1) { | ||
| auto refOutput = npu::utils::parseTensorsAsFP32(references); | ||
| auto actOutput = npu::utils::parseTensorsAsFP32(outputs); | ||
|
|
||
| auto result = utils::runPSNRMetric(actOutput, refOutput, dstHeight, dstWidth, scaleBorder, normalizedImage); | ||
| auto result = utils::runPSNRMetric(actOutput, refOutput, dstHeight, dstWidth, scaleBorder, normalizedImage); | ||
|
|
||
| if (FLAGS_psnr_reference - result > FLAGS_psnr_tolerance) { | ||
| std::cout << "Absolute difference between actual value " << result << " and reference value " | ||
| << FLAGS_psnr_reference << " larger then tolerance " << FLAGS_psnr_tolerance << std::endl; | ||
| return false; | ||
| // Get per-layer or global values | ||
| std::string layerName = outputs.begin()->first; | ||
| double psnrReference = utils::getValueForLayer(referenceMap, layerName); | ||
| double psnrTolerance = utils::getValueForLayer(toleranceMap, layerName); | ||
|
|
||
| if (psnrReference - result > psnrTolerance) { | ||
| std::cout << "Absolute difference between actual value " << result << " and reference value " | ||
| << psnrReference << " larger then tolerance " << psnrTolerance << std::endl; | ||
| return false; | ||
| } | ||
| } else { | ||
| // For multiple outputs, compute PSNR per layer | ||
| for (const auto& [layerName, outputTensor] : outputs) { | ||
| auto refIterator = references.find(layerName); | ||
| OPENVINO_ASSERT(refIterator != references.end()); | ||
|
|
||
| TensorMap singleOutput = {{layerName, outputTensor}}; | ||
| TensorMap singleReference = {{layerName, refIterator->second}}; | ||
|
|
||
| auto refOutput = npu::utils::parseTensorsAsFP32(singleReference); | ||
| auto actOutput = npu::utils::parseTensorsAsFP32(singleOutput); | ||
|
|
||
| auto result = utils::runPSNRMetric(actOutput, refOutput, dstHeight, dstWidth, scaleBorder, normalizedImage); | ||
|
|
||
| // Get per-layer values | ||
| double psnrReference = utils::getValueForLayer(referenceMap, layerName); | ||
| double psnrTolerance = utils::getValueForLayer(toleranceMap, layerName); | ||
|
|
||
| std::cout << "Layer: " << layerName << " - "; | ||
| if (psnrReference - result > psnrTolerance) { | ||
| std::cout << "Absolute difference between actual value " << result << " and reference value " | ||
| << psnrReference << " larger then tolerance " << psnrTolerance << std::endl; | ||
| allPassed = false; | ||
| } else { | ||
| std::cout << "PASSED (PSNR: " << result << ", Reference: " << psnrReference << ")" << std::endl; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The PSNR test behavior has changed in a potentially breaking way. Previously, for models with multiple outputs, all outputs were combined into a single PSNR calculation. Now, the code branches at line 1853: if there's exactly one output, it combines all tensors (which is just one), but if there are multiple outputs, it evaluates each layer separately at lines 1870-1895.
This is a functional change, not just adding per-layer threshold support. If this is intentional, it should be documented in the PR description. If not, the multi-output case should still combine all outputs like the original implementation did, while supporting per-layer thresholds for the combined result.
| } catch (...) { | ||
| // Not a single number, parse as key:value pairs |
There was a problem hiding this comment.
The catch-all exception handler at line 220 silently treats any invalid single-value input (e.g., "abc") as a key:value format. If the user mistakenly provides an invalid single value, it will be silently ignored and the default will be used without any warning. Consider adding a warning message inside the catch block to inform users that the single-value parse failed and it will be treated as key:value format, or validate that at least one colon exists before proceeding to key:value parsing.
| } catch (...) { | |
| // Not a single number, parse as key:value pairs | |
| } catch (const std::exception& e) { | |
| // Not a single number. If there is no colon, we cannot interpret this as "layer:value". | |
| if (str.find(':') == std::string::npos) { | |
| std::cerr << "Warning: Failed to parse per-layer value '" << str | |
| << "' as a single number and no 'layer:value' pairs were found. " | |
| << "Using default value " << defaultValue << "." << std::endl; | |
| return result; | |
| } | |
| // Otherwise, fall through and try to parse as key:value pairs. |
…rs.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…rs.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Details:
Tickets:
AI Assistance: