Understand papers faster. Generate stronger ideas. Build research with confidence.
PaperLens AI is a comprehensive full-stack research assistant designed for students, researchers, and developers. It bridges the gap between raw research papers and actionable outputs by providing an intelligent, workflow-driven platform.
Instead of just chatting with a PDF, PaperLens AI provides structured workflows: extracting key insights from documents, planning experiments based on those insights, discovering research gaps, generating novel problem statements, finding relevant datasets, and running real-time citation intelligence.
- Memory-Safe Extraction: Uses `PyMuPDF` with generator-based extraction to parse massive PDFs without memory spikes.
- Persistent Memory: Chunks and embeddings are stored securely in Supabase pgvector, persisting beyond server reloads.
- Map-Reduce Summaries: Generates cohesive summaries for large documents via a map-reduce summarization pipeline (`GET /api/summarize/{paper_id}`).
- Intelligent Reference Matcher: Validates bibliography text against the Semantic Scholar API using a robust 4-strategy fallback search (DOI → Exact → Title → Loose).
- Live Streaming Progress: Uses Server-Sent Events (SSE) to drive an animated UI that shows, in real time, exactly which references are being processed and matched.
- Actionable AI Reading Paths: Uses citation volumes and contextual metadata to recommend which papers to read first.
- Step-by-Step Roadmaps: Generate detailed, step-wise execution plans by inputting a research topic and a target difficulty level.
- Practical Metrics: Provides parameter recommendations, risk assessments, and practical implementation details.
- Idea Generation: Input a domain, subdomain, and complexity level to generate novel research problems.
- Problem Expansion: Expand a selected surface-level idea into a deep execution brief and methodology.
- Critical Analysis: Detect logical flaws, missing literature, or methodological research gaps from uploaded files or pasted text.
- Actionable Advice: Returns severity scores (low/medium/high) alongside actionable suggestions for improvement.
- Intelligent Matching: Recommends the most suitable datasets, evaluation benchmarks, and common framework technologies.
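The reference matcher's 4-strategy fallback (DOI → Exact → Title → Loose) amounts to a first-hit-wins loop over ordered lookup strategies. The sketch below illustrates the idea with stub strategies standing in for real Semantic Scholar API calls; the function and field names are illustrative, not the project's actual code:

```python
from typing import Callable, Optional

# First-hit-wins fallback over an ordered list of (name, strategy) pairs.
# Each strategy takes a parsed reference dict and returns a match or None.
def match_reference(ref: dict,
                    strategies: list[tuple[str, Callable[[dict], Optional[dict]]]]
                    ) -> Optional[dict]:
    for name, strategy in strategies:
        hit = strategy(ref)
        if hit is not None:
            return {"matched_by": name, "paper": hit}
    return None  # every strategy missed

# Stub strategies standing in for Semantic Scholar lookups.
def by_doi(ref):          # e.g. GET /graph/v1/paper/DOI:{doi}
    return {"title": "Some Paper"} if ref.get("doi") else None

def by_exact_title(ref):  # exact-title search; pretend it missed
    return None

STRATEGIES = [("doi", by_doi), ("exact_title", by_exact_title)]
print(match_reference({"doi": "10.1000/xyz"}, STRATEGIES))
```

Ordering the strategies from most to least precise keeps false matches rare while still recovering references with messy bibliography text.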
PaperLens AI uses a decoupled client-server architecture, highly optimized for deployment environments with memory constraints (like Render's 500MB tier).
- Frontend: React/TypeScript + Tailwind + Framer Motion; authentication is handled by Clerk.
- API Gateway: The frontend calls the FastAPI backend using JWT Bearer tokens.
- Data Extraction: PDFs are streamed through `PyMuPDF` via generators, keeping memory overhead negligible.
- Vector Storage: Chunks are embedded (`all-MiniLM-L6-v2`) and upserted to remote Supabase pgvector immediately. The heavy `torch` engine is lazy-loaded to keep the idle server's memory footprint small.
- AI Inference: Retrieved RAG context is injected into the prompt and sent directly to Groq.
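The generator-based extraction step can be sketched as a lazy chunker: pages are consumed one at a time (for example from `(page.get_text() for page in fitz.open(path))` with PyMuPDF), so the full document text is never materialised in memory. The chunk size and overlap values below are illustrative, not the project's actual settings:

```python
def iter_chunks(pages, chunk_chars=800, overlap=100):
    """Yield overlapping text chunks from an iterable of page strings.

    Pages are consumed lazily, so a huge PDF streamed page-by-page
    (e.g. via PyMuPDF) never has to be held in memory all at once.
    """
    buf = ""
    for page_text in pages:
        buf += page_text
        # Emit full chunks as soon as enough text has accumulated.
        while len(buf) >= chunk_chars:
            yield buf[:chunk_chars]
            # Keep a small overlap so context spans chunk boundaries.
            buf = buf[chunk_chars - overlap:]
    if buf:
        yield buf  # trailing remainder
```

Each emitted chunk can be embedded and upserted immediately, so peak memory stays proportional to one chunk rather than the whole document.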
- Legacy analyzer: `POST /api/analyze` → returns `doc_id` → `POST /api/ask` with `doc_id` (in-memory; resets on backend restart)
- Persistent RAG: `POST /api/upload-paper` → returns `paper_id` → `GET /api/summarize/{paper_id}` and `POST /api/ask` with `paper_id` (persistent chunks in pgvector)
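A minimal Python client for the persistent RAG flow might look like the following. The JSON field names (`paper_id`, `summary`) are assumptions about the response shape; check the actual API:

```python
import requests

API = "http://localhost:8000"  # the FastAPI backend (VITE_API_URL target)

def auth_headers(jwt: str) -> dict:
    # Every backend call carries the Clerk-issued JWT as a Bearer token.
    return {"Authorization": f"Bearer {jwt}"}

def upload_paper(pdf_path: str, jwt: str) -> str:
    # POST /api/upload-paper chunks + embeds the PDF and returns a paper_id.
    with open(pdf_path, "rb") as f:
        r = requests.post(f"{API}/api/upload-paper",
                          headers=auth_headers(jwt),
                          files={"file": f})
    r.raise_for_status()
    return r.json()["paper_id"]  # assumed field name

def summarize(paper_id: str, jwt: str) -> str:
    # GET /api/summarize/{paper_id} runs the map-reduce summary pipeline.
    r = requests.get(f"{API}/api/summarize/{paper_id}",
                     headers=auth_headers(jwt))
    r.raise_for_status()
    return r.json()["summary"]  # assumed field name
```

Because chunks live in pgvector, the returned `paper_id` stays valid across backend restarts, unlike the legacy `doc_id` flow.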
| Layer | Technologies |
|---|---|
| Frontend | React, Vite, TypeScript, Tailwind CSS, shadcn/ui, framer-motion |
| Backend | Python 3.10+, FastAPI, Uvicorn, SQLAlchemy |
| PDF Extraction | PyMuPDF (fitz) |
| Authentication | Clerk JWT |
| LLM Orchestration | Groq (llama-3.1-8b-instant) |
| Retrieval (RAG) | Supabase pgvector Remote Storage + local FAISS fallback |
| Database | PostgreSQL |
- Python 3.10+
- Node.js 18+
- Supabase project (with `pgvector` enabled)
- Clerk, Groq, and Semantic Scholar API keys
Run the SQL definitions from `backend/supabase_migration.sql` in your Supabase SQL Editor. This creates the `paper_chunks` schema and the `match_chunks` RPC used for vector similarity search.
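Since the RPC is exposed through Supabase's PostgREST layer, a retrieval query can be issued as a plain HTTP call. Below is a sketch of building that request; the parameter names `query_embedding` and `match_count` are assumptions, so check the function signature in `backend/supabase_migration.sql`:

```python
def match_chunks_request(supabase_url: str, supabase_key: str,
                         query_embedding: list, match_count: int = 5) -> dict:
    # PostgREST exposes SQL functions at /rest/v1/rpc/<function_name>.
    # The JSON keys must match the match_chunks() parameter names
    # defined in the migration (assumed here).
    return {
        "url": f"{supabase_url}/rest/v1/rpc/match_chunks",
        "headers": {
            "apikey": supabase_key,
            "Authorization": f"Bearer {supabase_key}",
            "Content-Type": "application/json",
        },
        "json": {"query_embedding": query_embedding, "match_count": match_count},
    }

# all-MiniLM-L6-v2 produces 384-dimensional embeddings.
req = match_chunks_request("https://example.supabase.co", "service-key", [0.0] * 384)
print(req["url"])
```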
```
cd backend
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

Create `backend/.env` with your 6 critical keys:

```
DATABASE_URL=postgresql://postgres...
SUPABASE_URL=https://...
SUPABASE_KEY=...
CLERK_SECRET_KEY=sk_test_...
GROQ_API_KEY=gsk_...
SEMANTIC_SCHOLAR_API_KEY=...
```

Run the server: `uvicorn app.main:app --reload`
```
cd frontend
npm install
```

Create `frontend/.env.local`:

```
VITE_CLERK_PUBLISHABLE_KEY=pk_test_...
VITE_API_URL=http://localhost:8000
```

Run the client: `npm run dev`
- Backend: Designed for Render. ⚠️ Keep `ENABLE_VECTOR_RETRIEVAL=false` for idle memory safety.
- Frontend: Designed for Vercel.
MIT License. See LICENSE.
