Quick links: 💻 Demo
A modular system that extracts check-worthy claims from peer reviews, retrieves relevant evidence from manuscripts, and verifies claims through natural language inference to assess whether review statements are supported, partially supported, contradicted, or undetermined by the paper content.
- OpenReview Crawling: Extract papers and reviews from OpenReview URLs
- PDF Parsing: Multiple parsing methods (Docling, Nougat, PyPDF2 fallback)
- Markdown Cleaning: Remove artifacts and format content
- Text Chunking: Split papers into manageable chunks for retrieval
- Review Processing: Extract structured reviews from OpenReview data
- Claim Extraction: Extract claims from reviews using multiple methods (FENICE, rule-based)
- Evidence Retrieval: Find relevant evidence for claims using TF-IDF, BM25, or SBERT
- Claim Verification: Verify claims against evidence using LLMs served via vLLM
- Create a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

Run the pipeline on an OpenReview submission:

```bash
python main.py "https://openreview.net/forum?id=YOUR_SUBMISSION_ID"
```

With a custom configuration file:

```bash
python main.py "https://openreview.net/forum?id=YOUR_SUBMISSION_ID" --config config.json
```

With OpenReview credentials (needed for private reviews):

```bash
python main.py "https://openreview.net/forum?id=YOUR_SUBMISSION_ID" --username your_email --password your_password
```

Process a single review file independently:

```bash
python process_individual_review.py path/to/review.json --chunks path/to/chunks.jsonl --output results/
```

Process all review files in a directory:

```bash
python process_individual_review.py reviews/ --directory --chunks path/to/chunks.jsonl --submission-id SUBMISSION_ID
```

The app uses a JSON configuration file (`config.json`) with the following options:
```json
{
  "pdf_parser": "auto",
  "parser_kwargs": {
    "code_enrichment": false,
    "formula_enrichment": false,
    "model": "0.1.0-small"
  },
  "chunk_size": 512,
  "claim_extraction": "auto",
  "evidence_retrieval": "auto",
  "verification_model": "Qwen/Qwen3-4B-Instruct-2507-FP8",
  "top_k": 4,
  "output_dir": "outputs"
}
```

- pdf_parser: PDF parsing method (`"auto"`, `"docling_standard"`, `"nougat"`, `"pypdf2_fallback"`)
- parser_kwargs: Additional arguments for PDF parsing
- chunk_size: Maximum tokens per chunk (default: 512)
- claim_extraction: Claim extraction method (`"auto"`, `"fenice"`, `"rule_based"`)
- evidence_retrieval: Evidence retrieval method (`"auto"`, `"tfidf"`, `"bm25"`, `"sbert"`)
- verification_model: vLLM model for claim verification (default: Qwen/Qwen3-4B-Instruct-2507-FP8)
- top_k: Number of evidence chunks per claim (default: 4)
- output_dir: Output directory (default: "outputs")
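The options above can be merged over defaults when loading the file. The sketch below is illustrative only: `load_config` and `DEFAULTS` are hypothetical names, not part of the app's actual API, though the default values mirror the documented ones.

```python
import json
from pathlib import Path

# Documented defaults; DEFAULTS/load_config are hypothetical helpers.
DEFAULTS = {
    "pdf_parser": "auto",
    "chunk_size": 512,
    "claim_extraction": "auto",
    "evidence_retrieval": "auto",
    "verification_model": "Qwen/Qwen3-4B-Instruct-2507-FP8",
    "top_k": 4,
    "output_dir": "outputs",
}

def load_config(path="config.json"):
    """Return DEFAULTS overridden by any keys present in the JSON file."""
    cfg = dict(DEFAULTS)
    p = Path(path)
    if p.exists():
        cfg.update(json.loads(p.read_text()))
    return cfg
```

Keys missing from `config.json` fall back to their documented defaults, so a minimal config file only needs the options you want to change.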
The app creates the following directory structure:
```
outputs/
├── pdfs/              # Downloaded PDFs
├── markdown/          # Parsed and cleaned markdown
├── chunks/            # Chunked markdown for retrieval
├── reviews/           # Review files
│   ├── SUBMISSION_ID_all_reviews.json         # All reviews in one file
│   ├── SUBMISSION_ID_all_reviews.pkl          # Pickle backup of reviews
│   ├── SUBMISSION_ID_review_1_REVIEW_ID.json  # Individual review files
│   ├── SUBMISSION_ID_review_2_REVIEW_ID.json  # Individual review files
│   └── processed/     # Results from individual processing
│       ├── claims/        # Claims per review
│       ├── evidence/      # Evidence per review
│       └── verification/  # Verification per review
├── claims/            # Extracted claims (all reviews combined)
├── evidence/          # Evidence retrieval results (all reviews)
├── verification/      # Claim verification results (all reviews)
└── *.json             # Metadata and intermediate files
```
- Crawl OpenReview: Extract paper PDF and review data using improved API calls
- Parse PDF: Convert PDF to markdown using selected parser
- Clean Markdown: Remove artifacts and format content
- Chunk Text: Split paper into manageable chunks
- Extract Reviews: Structure review data with multiple fallback mechanisms
- Extract Claims: Identify claims in review text
- Retrieve Evidence: Find relevant paper sections for each claim
- Verify Claims: Use LLM to verify claims against evidence
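Two of the middle stages, chunking and claim extraction, can be sketched in a few lines. The function names and the simple whitespace/sentence heuristics below are stand-ins for illustration, not the app's actual implementation (which chunks by tokens and offers a FENICE extractor).

```python
# Hypothetical stand-ins for the chunking and claim-extraction stages.

def chunk_text(markdown, chunk_size=512):
    """Greedy whitespace chunking; the real pipeline counts tokens, not words."""
    words = markdown.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def extract_claims(review_text):
    """Rule-based stand-in: treat each sentence as a candidate claim."""
    return [s.strip() for s in review_text.split(".") if s.strip()]

paper_chunks = chunk_text("The method improves accuracy on benchmark X " * 200)
claims = extract_claims("The paper lacks ablations. Results seem strong.")
```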
- Load Review: Process individual review JSON files
- Extract Claims: Extract claims from the specific review
- Retrieve Evidence: Find relevant evidence for review claims
- Verify Claims: Verify claims independently per review
- Multiple API Methods: Uses both `get_all_notes()` and a `directReplies` fallback
- Comprehensive Fields: Extracts all available review fields (rating, confidence, strengths, weaknesses, etc.)
- Multiple Formats: Saves reviews as JSON, pickle, and individual files
- Independent Processing: Each review can be processed separately
- Robust Fallbacks: Multiple methods to load reviews if primary extraction fails
- Docling Standard: High-quality parsing with enrichment options
- Nougat: Meta's document understanding model
- PyPDF2 Fallback: Simple text extraction
- FENICE: Neural claim extraction model
- Rule-based: Keyword-based extraction
- TF-IDF: Traditional information retrieval
- BM25: Probabilistic retrieval model
- SBERT: Semantic similarity using sentence transformers
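The TF-IDF option can be sketched with scikit-learn (a listed dependency). The chunk texts and claim below are made-up examples, and this is a minimal sketch rather than the app's actual retriever interface.

```python
# Minimal TF-IDF retrieval sketch: rank paper chunks by cosine
# similarity to a review claim and keep the top_k best matches.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunks = [
    "We evaluate on three benchmarks and report mean accuracy.",
    "The ablation study removes each module in turn.",
    "Training uses a single A100 GPU for 12 hours.",
]
claim = "The paper includes an ablation study."

vectorizer = TfidfVectorizer()
chunk_vecs = vectorizer.fit_transform(chunks)      # fit on paper chunks
claim_vec = vectorizer.transform([claim])          # project the claim
scores = cosine_similarity(claim_vec, chunk_vecs)[0]

top_k = 2
ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:top_k]
```

BM25 and SBERT plug into the same shape: score every chunk against the claim, then keep the `top_k` highest-scoring chunks as evidence.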
- vLLM: Uses vLLM with OpenAI-compatible API (default: Qwen/Qwen3-4B-Instruct-2507-FP8)
- API URL: http://localhost:11435/v1
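Since the server speaks the OpenAI-compatible chat-completions protocol, a verification call can be sketched with `requests`. The endpoint URL and model name come from the configuration above; the prompt wording and label format are illustrative assumptions, not the app's actual prompt.

```python
# Sketch of a claim-verification request against the local vLLM server.
import requests

VLLM_URL = "http://localhost:11435/v1/chat/completions"
MODEL = "Qwen/Qwen3-4B-Instruct-2507-FP8"

def build_payload(claim, evidence):
    """Assemble an OpenAI-style chat payload (prompt text is illustrative)."""
    prompt = (
        "Evidence:\n" + "\n".join(evidence)
        + f"\n\nClaim: {claim}\n"
        + "Answer with one label: supported, partially_supported, "
          "contradicted, or undetermined."
    )
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
    }

def verify_claim(claim, evidence):
    """POST to the vLLM endpoint and return the model's label string."""
    resp = requests.post(VLLM_URL, json=build_payload(claim, evidence), timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()
```

`verify_claim` requires the vLLM server to be running at the configured URL; `build_payload` can be inspected offline.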
The app gracefully handles missing dependencies by falling back to available methods. Core dependencies include:
- openreview: OpenReview API access
- torch: PyTorch for ML models
- transformers: Hugging Face transformers
- scikit-learn: For TF-IDF retrieval
- rank-bm25: For BM25 retrieval
- sentence-transformers: For SBERT retrieval
- requests: For vLLM API calls (claim verification)
- tiktoken: For text tokenization
- PDF Parsing Fails: Try different parsing methods in config
- No Reviews Found:
- Check if the OpenReview URL has published reviews
- Try using OpenReview credentials for private reviews
  - Check individual review files in the `outputs/reviews/` directory
- No Claims Extracted: Check if reviews contain claim-like statements
- Evidence Retrieval Empty: Verify paper chunks were created successfully
- vLLM Connection Error: Ensure vLLM is running at http://localhost:11435/v1 and model is available
- Reviews Not Extracted: The app uses multiple methods to fetch reviews:
  - Primary: `get_all_notes()` API call
  - Fallback: `directReplies` from the submission object
  - Recovery: Load from saved JSON/pickle files
- Individual Review Processing: Use `process_individual_review.py` to process reviews independently
- Missing Review Data: Check the `*_all_reviews.json` and `*_all_reviews.pkl` files for raw review data
The app writes detailed logs to `app.log` for debugging.
After processing, you'll get:
- Structured review data
- Extracted claims from reviews
- Evidence chunks for each claim
- Verification results with confidence scores
- Human-readable verification report
Peer review is central to scientific publishing, yet reviewers frequently include claims that are subjective, rhetorical, or misaligned with the submitted work. Assessing whether review statements are factual and verifiable is crucial for fairness and accountability. At the scale of modern conferences and journals, manually inspecting the grounding of such claims is infeasible. We present Peerispect, an interactive system that operationalizes claim-level verification in peer reviews by extracting check-worthy claims from peer reviews, retrieving relevant evidence from the manuscript, and verifying the claims through natural language inference. Results are presented through a visual interface that highlights evidence directly in the paper, enabling rapid inspection and interpretation. Peerispect is designed as a modular Information Retrieval (IR) pipeline, supporting alternative retrievers, rerankers, and verifiers, and is intended for use by reviewers, authors, and program committees.