Feature - Add hallucination monitor to gpt-oss #155
PR: feat(monitoring): Hallucination Monitor (SC+NLI+Numeric, optional RAG) with API, CLI, HTML, tests, CI
🎯 What/Why
This PR implements a comprehensive Hallucination Monitor for GPT-OSS that detects and quantifies hallucination risk in LLM outputs using multiple detection signals. The system provides both a programmatic API and a CLI, generates interactive HTML reports, and includes extensive testing and CI integration.
Key Features
Screenshots:
- Generated report with analytics and insights
- Web interface for testing
🏗️ Architecture
Signal Flow
🔧 Detection Signals
1. Self-Consistency (SC) - Weight: 0.25
   Uses `sentence-transformers/all-MiniLM-L6-v2` embeddings, with a character-based fallback.
2. NLI Faithfulness (NLI) - Weight: 0.35
3. Numeric Sanity (NS) - Weight: 0.15
4. Retrieval Support (RS) - Weight: 0.20
5. Jailbreak Heuristics (JB) - Weight: 0.05
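To illustrate the character-based fallback mentioned for the self-consistency signal: when `sentence-transformers` is unavailable, samples can be compared with a cheap lexical similarity instead of embeddings. This is a hypothetical sketch (function names and the trigram-Jaccard choice are assumptions, not the PR's actual implementation):

```python
# Hypothetical sketch of a character-based self-consistency fallback:
# compare the k sampled generations pairwise using character-trigram
# Jaccard similarity; low agreement between samples suggests higher risk.
from itertools import combinations


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Set of character n-grams of a lowercased string."""
    text = text.lower()
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def self_consistency_risk(samples: list[str]) -> float:
    """Risk is 1 minus the mean pairwise similarity of the k samples."""
    if len(samples) < 2:
        return 0.0
    sims = [jaccard(char_ngrams(x), char_ngrams(y))
            for x, y in combinations(samples, 2)]
    return 1.0 - sum(sims) / len(sims)
```

Identical samples yield a risk of 0.0, while samples that disagree lexically push the risk toward 1.0; an embedding-based version would replace `jaccard` with cosine similarity over sentence vectors.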
📊 Scoring Algorithm
Risk Level Classification:
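Putting the pieces above together, the score can be read as a weighted combination of the five signals, classified against the default thresholds (high=0.7, medium=0.4). This is an illustrative sketch based only on the weights and thresholds stated in this description; the function names are stand-ins, not the actual API in `gpt_oss/monitoring/metrics/scoring.py`:

```python
# Signal weights as listed in the "Detection Signals" section above.
WEIGHTS = {"sc": 0.25, "nli": 0.35, "ns": 0.15, "rs": 0.20, "jb": 0.05}


def risk_score(signals: dict[str, float]) -> float:
    """Combine per-signal risk values (each in [0, 1]) into one score."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)


def risk_level(score: float, high: float = 0.7, medium: float = 0.4) -> str:
    """Map a combined score to a risk level using the default thresholds."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"
```

For example, `risk_score({"sc": 0.9, "nli": 0.8, "ns": 0.2, "rs": 0.5, "jb": 0.0})` gives 0.635, which classifies as "medium".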
🚀 Usage Examples
CLI Usage
Python API
Custom Configuration
📁 Files Added/Modified
New Files
- `gpt_oss/monitoring/__init__.py` - Main exports
- `gpt_oss/monitoring/halluci_monitor.py` - Main API and orchestration
- `gpt_oss/monitoring/config.py` - Configuration dataclasses
- `gpt_oss/monitoring/detectors/__init__.py` - Detector exports
- `gpt_oss/monitoring/detectors/self_consistency.py` - Self-consistency detector
- `gpt_oss/monitoring/detectors/nli_faithfulness.py` - NLI faithfulness detector
- `gpt_oss/monitoring/detectors/numeric_sanity.py` - Numeric sanity detector
- `gpt_oss/monitoring/detectors/retrieval_support.py` - Retrieval support detector
- `gpt_oss/monitoring/detectors/jailbreak_heuristics.py` - Jailbreak heuristics detector
- `gpt_oss/monitoring/highlight/__init__.py` - Highlight utilities exports
- `gpt_oss/monitoring/highlight/span_align.py` - Span alignment utilities
- `gpt_oss/monitoring/highlight/html_report.py` - HTML report generator
- `gpt_oss/monitoring/metrics/__init__.py` - Metrics exports
- `gpt_oss/monitoring/metrics/scoring.py` - Risk score computation
- `gpt_oss/monitoring/__main__.py` - CLI entry point
- `gpt_oss/monitoring/requirements-monitor.txt` - Monitoring dependencies
- `gpt_oss/monitoring/examples/README.md` - Usage examples
- `gpt_oss/monitoring/examples/truthfulqa_mini.jsonl` - Test data
- `gpt_oss/monitoring/examples/fever_mini.jsonl` - Test data
- `tests/monitoring/test_monitor_basic.py` - Basic tests
- `tests/monitoring/test_numeric_sanity.py` - Numeric sanity tests
- `tests/monitoring/test_nli_faithfulness.py` - NLI faithfulness tests
- `tests/monitoring/test_self_consistency.py` - Self-consistency tests
- `.github/workflows/ci-monitoring.yml` - CI workflow
- `docs/monitoring_design.md` - Design documentation

Modified Files
- `pyproject.toml` - Added monitoring dependencies and CLI entry point
- `README.md` - Added Hallucination Monitor section

🧪 Testing
Test Coverage
Test Categories
- `test_monitor_basic.py` - Core functionality and configuration
- `test_numeric_sanity.py` - Numeric detection and unit conversions
- `test_nli_faithfulness.py` - NLI detection and entailment
- `test_self_consistency.py` - Self-consistency and similarity

CI Integration
📦 Dependencies
Core Dependencies
- `numpy>=1.21.0` - Numerical computations
- `scipy>=1.7.0` - Scientific computing
- `regex>=2021.0.0` - Regular expressions
- `tqdm>=4.62.0` - Progress bars

NLP/ML Dependencies (Optional)
- `sentence-transformers>=2.2.0` - Semantic similarity
- `transformers>=4.20.0` - NLI models
- `torch>=1.12.0` - PyTorch backend
- `jinja2>=3.0.0` - HTML template rendering

Installation
🎨 HTML Reports
The system generates interactive HTML reports with:
Report Features
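The core idea of the report is highlighting output spans color-coded by risk level. The actual generator (`html_report.py`) presumably renders via jinja2, which is listed in the dependencies; the following stdlib-only sketch just illustrates the approach, with stand-in names and markup:

```python
# Illustrative sketch of risk-level highlighting in an HTML report:
# each sentence is wrapped in a <span> whose class encodes its risk level,
# so a stylesheet can color-code it. Stand-in for the jinja2-based
# html_report.py; names and markup here are assumptions.
import html
from string import Template

PAGE = Template("<html><body><h1>Hallucination Report</h1>$body</body></html>")
SPAN = Template('<span class="risk-$level" title="score=$score">$text</span>')


def render_report(sentences: list[tuple[str, float]]) -> str:
    """sentences: (text, risk score in [0, 1]) pairs."""
    def level(s: float) -> str:
        # Default thresholds from the configuration section: high=0.7, medium=0.4.
        return "high" if s >= 0.7 else "medium" if s >= 0.4 else "low"

    body = " ".join(
        SPAN.substitute(level=level(s), score=f"{s:.2f}", text=html.escape(t))
        for t, s in sentences
    )
    return PAGE.substitute(body=body)
```

Escaping the model output with `html.escape` before interpolation matters here, since LLM text can itself contain markup.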
🔧 Configuration
MonitorConfig Options
- `k_samples`: Number of samples for self-consistency (default: 5)
- `temperature`: Generation temperature (default: 0.7)
- `max_new_tokens`: Maximum tokens for generation (default: 512)
- `enable_retrieval_support`: Enable retrieval support detection (default: True)
- `enable_jailbreak_heuristics`: Enable jailbreak detection (default: True)
- `thresholds`: Risk level thresholds (default: high=0.7, medium=0.4)
- `weights`: Signal weights (configurable)
- `html_report`: Generate HTML reports (default: True)
- `report_dir`: Output directory for reports (default: "runs")

Customization Points
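The options above can be sketched as a dataclass. Field names and defaults come from this description; the exact class in `gpt_oss/monitoring/config.py` may differ:

```python
# Sketch of a MonitorConfig dataclass mirroring the documented options.
# The signal-weight keys are stand-ins taken from the Detection Signals
# section; the real config may structure them differently.
from dataclasses import dataclass, field


@dataclass
class MonitorConfig:
    k_samples: int = 5
    temperature: float = 0.7
    max_new_tokens: int = 512
    enable_retrieval_support: bool = True
    enable_jailbreak_heuristics: bool = True
    thresholds: dict = field(
        default_factory=lambda: {"high": 0.7, "medium": 0.4})
    weights: dict = field(default_factory=lambda: {
        "sc": 0.25, "nli": 0.35, "ns": 0.15, "rs": 0.20, "jb": 0.05})
    html_report: bool = True
    report_dir: str = "runs"
```

Note the mutable defaults (`thresholds`, `weights`) need `default_factory` rather than plain defaults, and the listed weights sum to 1.0.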
🚀 Performance
Optimization Strategies
Resource Requirements
🔮 Future Enhancements
Planned Features
Extension Points
📋 Checklist
Implementation
Testing
Documentation
Production Readiness
🎯 Impact
This Hallucination Monitor provides:
The system is designed to be:
🔗 Related
docs/monitoring_design.md
gpt_oss/monitoring/examples/README.md
tests/monitoring/
.github/workflows/ci-monitoring.yml
Ready for Review ✅
This PR implements a complete, production-ready Hallucination Monitor for GPT-OSS with comprehensive testing, documentation, and CI integration. It provides both programmatic and command-line interfaces, generates interactive HTML reports, and exposes extensive customization options.