Quick links: 💻 Demo
A modular system that extracts check-worthy claims from peer reviews, retrieves relevant evidence from manuscripts, and verifies claims through natural language inference to assess whether review statements are supported, partially supported, contradicted, or undetermined by the paper content.
- OpenReview Crawling: Extract papers and reviews from OpenReview URLs
- PDF Parsing: Multiple parsing methods (Docling, Nougat, PyPDF2 fallback)
- Markdown Cleaning: Remove artifacts and format content
- Text Chunking: Split papers into manageable chunks for retrieval
- Review Processing: Extract structured reviews from OpenReview data
- Claim Extraction: Extract claims from reviews using multiple methods (FENICE, rule-based)
- Evidence Retrieval: Find relevant evidence for claims using TF-IDF, BM25, or SBERT
- Claim Verification: Verify claims against evidence using LLMs served via vLLM
- Create a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

Run the pipeline on an OpenReview submission:

```bash
python main.py "https://openreview.net/forum?id=YOUR_SUBMISSION_ID"
```

With a custom configuration file:

```bash
python main.py "https://openreview.net/forum?id=YOUR_SUBMISSION_ID" --config config.json
```

With OpenReview credentials (needed for private reviews):

```bash
python main.py "https://openreview.net/forum?id=YOUR_SUBMISSION_ID" --username your_email --password your_password
```

Process a single review file independently:

```bash
python process_individual_review.py path/to/review.json --chunks path/to/chunks.jsonl --output results/
```

Process all review files in a directory:

```bash
python process_individual_review.py reviews/ --directory --chunks path/to/chunks.jsonl --submission-id SUBMISSION_ID
```

The app uses a JSON configuration file (`config.json`) with the following options:
```json
{
  "pdf_parser": "auto",
  "parser_kwargs": {
    "code_enrichment": false,
    "formula_enrichment": false,
    "model": "0.1.0-small"
  },
  "chunk_size": 512,
  "claim_extraction": "auto",
  "evidence_retrieval": "auto",
  "verification_model": "Qwen/Qwen3-4B-Instruct-2507-FP8",
  "top_k": 4,
  "output_dir": "outputs"
}
```

- pdf_parser: PDF parsing method (`"auto"`, `"docling_standard"`, `"nougat"`, `"pypdf2_fallback"`)
- parser_kwargs: Additional arguments for PDF parsing
- chunk_size: Maximum tokens per chunk (default: 512)
- claim_extraction: Claim extraction method (`"auto"`, `"fenice"`, `"rule_based"`)
- evidence_retrieval: Evidence retrieval method (`"auto"`, `"tfidf"`, `"bm25"`, `"sbert"`)
- verification_model: vLLM model for claim verification (default: Qwen/Qwen3-4B-Instruct-2507-FP8)
- top_k: Number of evidence chunks per claim (default: 4)
- output_dir: Output directory (default: "outputs")
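The options above can be merged over defaults when loading the file. The sketch below is illustrative only: `load_config` and `DEFAULTS` are hypothetical names, not part of the app's actual API, though the default values mirror the documented ones.

```python
import json
from pathlib import Path

# Documented defaults; DEFAULTS/load_config are hypothetical helpers.
DEFAULTS = {
    "pdf_parser": "auto",
    "chunk_size": 512,
    "claim_extraction": "auto",
    "evidence_retrieval": "auto",
    "verification_model": "Qwen/Qwen3-4B-Instruct-2507-FP8",
    "top_k": 4,
    "output_dir": "outputs",
}

def load_config(path="config.json"):
    """Return DEFAULTS overridden by any keys present in the JSON file."""
    cfg = dict(DEFAULTS)
    p = Path(path)
    if p.exists():
        cfg.update(json.loads(p.read_text()))
    return cfg
```

Keys missing from `config.json` fall back to their documented defaults, so a minimal config file only needs the options you want to change.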
The app creates the following directory structure:
```
outputs/
├── pdfs/              # Downloaded PDFs
├── markdown/          # Parsed and cleaned markdown
├── chunks/            # Chunked markdown for retrieval
├── reviews/           # Review files
│   ├── SUBMISSION_ID_all_reviews.json         # All reviews in one file
│   ├── SUBMISSION_ID_all_reviews.pkl          # Pickle backup of reviews
│   ├── SUBMISSION_ID_review_1_REVIEW_ID.json  # Individual review files
│   ├── SUBMISSION_ID_review_2_REVIEW_ID.json  # Individual review files
│   └── processed/     # Results from individual processing
│       ├── claims/        # Claims per review
│       ├── evidence/      # Evidence per review
│       └── verification/  # Verification per review
├── claims/            # Extracted claims (all reviews combined)
├── evidence/          # Evidence retrieval results (all reviews)
├── verification/      # Claim verification results (all reviews)
└── *.json             # Metadata and intermediate files
```
- Crawl OpenReview: Extract paper PDF and review data using improved API calls
- Parse PDF: Convert PDF to markdown using selected parser
- Clean Markdown: Remove artifacts and format content
- Chunk Text: Split paper into manageable chunks
- Extract Reviews: Structure review data with multiple fallback mechanisms
- Extract Claims: Identify claims in review text
- Retrieve Evidence: Find relevant paper sections for each claim
- Verify Claims: Use LLM to verify claims against evidence
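Two of the middle stages, chunking and claim extraction, can be sketched in a few lines. The function names and the simple whitespace/sentence heuristics below are stand-ins for illustration, not the app's actual implementation (which chunks by tokens and offers a FENICE extractor).

```python
# Hypothetical stand-ins for the chunking and claim-extraction stages.

def chunk_text(markdown, chunk_size=512):
    """Greedy whitespace chunking; the real pipeline counts tokens, not words."""
    words = markdown.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def extract_claims(review_text):
    """Rule-based stand-in: treat each sentence as a candidate claim."""
    return [s.strip() for s in review_text.split(".") if s.strip()]

paper_chunks = chunk_text("The method improves accuracy on benchmark X " * 200)
claims = extract_claims("The paper lacks ablations. Results seem strong.")
```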
- Load Review: Process individual review JSON files
- Extract Claims: Extract claims from the specific review
- Retrieve Evidence: Find relevant evidence for review claims
- Verify Claims: Verify claims independently per review
- Multiple API Methods: Uses both `get_all_notes()` and a `directReplies` fallback
- Comprehensive Fields: Extracts all available review fields (rating, confidence, strengths, weaknesses, etc.)
- Multiple Formats: Saves reviews as JSON, pickle, and individual files
- Independent Processing: Each review can be processed separately
- Robust Fallbacks: Multiple methods to load reviews if primary extraction fails
- Docling Standard: High-quality parsing with enrichment options
- Nougat: Meta's document understanding model
- PyPDF2 Fallback: Simple text extraction
- FENICE: Neural claim extraction model
- Rule-based: Keyword-based extraction
- TF-IDF: Traditional information retrieval
- BM25: Probabilistic retrieval model
- SBERT: Semantic similarity using sentence transformers
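The TF-IDF option can be sketched with scikit-learn (a listed dependency). The chunk texts and claim below are made-up examples, and this is a minimal sketch rather than the app's actual retriever interface.

```python
# Minimal TF-IDF retrieval sketch: rank paper chunks by cosine
# similarity to a review claim and keep the top_k best matches.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunks = [
    "We evaluate on three benchmarks and report mean accuracy.",
    "The ablation study removes each module in turn.",
    "Training uses a single A100 GPU for 12 hours.",
]
claim = "The paper includes an ablation study."

vectorizer = TfidfVectorizer()
chunk_vecs = vectorizer.fit_transform(chunks)      # fit on paper chunks
claim_vec = vectorizer.transform([claim])          # project the claim
scores = cosine_similarity(claim_vec, chunk_vecs)[0]

top_k = 2
ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:top_k]
```

BM25 and SBERT plug into the same shape: score every chunk against the claim, then keep the `top_k` highest-scoring chunks as evidence.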
- vLLM: Uses vLLM with OpenAI-compatible API (default: Qwen/Qwen3-4B-Instruct-2507-FP8)
- API URL: http://localhost:11435/v1
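Since the server speaks the OpenAI-compatible chat-completions protocol, a verification call can be sketched with `requests`. The endpoint URL and model name come from the configuration above; the prompt wording and label format are illustrative assumptions, not the app's actual prompt.

```python
# Sketch of a claim-verification request against the local vLLM server.
import requests

VLLM_URL = "http://localhost:11435/v1/chat/completions"
MODEL = "Qwen/Qwen3-4B-Instruct-2507-FP8"

def build_payload(claim, evidence):
    """Assemble an OpenAI-style chat payload (prompt text is illustrative)."""
    prompt = (
        "Evidence:\n" + "\n".join(evidence)
        + f"\n\nClaim: {claim}\n"
        + "Answer with one label: supported, partially_supported, "
          "contradicted, or undetermined."
    )
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
    }

def verify_claim(claim, evidence):
    """POST to the vLLM endpoint and return the model's label string."""
    resp = requests.post(VLLM_URL, json=build_payload(claim, evidence), timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()
```

`verify_claim` requires the vLLM server to be running at the configured URL; `build_payload` can be inspected offline.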
The app gracefully handles missing dependencies by falling back to available methods. Core dependencies include:
- openreview: OpenReview API access
- torch: PyTorch for ML models
- transformers: Hugging Face transformers
- scikit-learn: For TF-IDF retrieval
- rank-bm25: For BM25 retrieval
- sentence-transformers: For SBERT retrieval
- requests: For vLLM API calls (claim verification)
- tiktoken: For text tokenization
- PDF Parsing Fails: Try different parsing methods in config
- No Reviews Found:
- Check if the OpenReview URL has published reviews
- Try using OpenReview credentials for private reviews
  - Check individual review files in the `outputs/reviews/` directory
- No Claims Extracted: Check if reviews contain claim-like statements
- Evidence Retrieval Empty: Verify paper chunks were created successfully
- vLLM Connection Error: Ensure vLLM is running at http://localhost:11435/v1 and model is available
- Reviews Not Extracted: The app uses multiple methods to fetch reviews:
  - Primary: `get_all_notes()` API call
  - Fallback: `directReplies` from the submission object
  - Recovery: Load from saved JSON/pickle files
- Individual Review Processing: Use `process_individual_review.py` to process reviews independently
- Missing Review Data: Check the `*_all_reviews.json` and `*_all_reviews.pkl` files for raw review data
The app writes detailed logs to `app.log` for debugging.
After processing, you'll get:
- Structured review data
- Extracted claims from reviews
- Evidence chunks for each claim
- Verification results with confidence scores
- Human-readable verification report
Peer review is central to scientific publishing, yet reviewers frequently include claims that are subjective, rhetorical, or misaligned with the submitted work. Assessing whether review statements are factual and verifiable is crucial for fairness and accountability. At the scale of modern conferences and journals, manually inspecting the grounding of such claims is infeasible. We present Peerispect, an interactive system that operationalizes claim-level verification in peer reviews by extracting check-worthy claims from peer reviews, retrieving relevant evidence from the manuscript, and verifying the claims through natural language inference. Results are presented through a visual interface that highlights evidence directly in the paper, enabling rapid inspection and interpretation. Peerispect is designed as a modular Information Retrieval (IR) pipeline, supporting alternative retrievers, rerankers, and verifiers, and is intended for use by reviewers, authors, and program committees.