OmniDoc AI: Smart Assistant for Research Summarization

An AI-powered tool for deep document understanding, Q&A, and logic-based challenge generation from user-uploaded research papers, reports, and technical documents.


Features

  • Upload PDF/TXT documents
  • Auto-summary (≤150 words)
  • Ask Anything: Free-form Q&A with references
  • Challenge Me: Logic-based questions, answer evaluation, and feedback
  • All answers are grounded in the uploaded document
  • Modern, responsive web UI (React)
  • Multi-provider LLM support (OpenAI, Gemini, Claude, Local)

Setup Instructions

1. Backend (FastAPI)

cd backend
pip install -r requirements.txt
python download_models.py  # Download required AI models
python main.py             # Start backend server (http://localhost:8000)

After starting the backend, you should see a startup confirmation message in the terminal.

Make calls to the backend only after this confirmation appears; requests sent before the server is ready will fail.
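
To verify readiness programmatically, a minimal sketch in Python (assuming the default port; FastAPI serves its interactive docs at /docs, which makes a convenient readiness probe):

import time

import requests  # pip install requests

BACKEND_URL = "http://localhost:8000"

def wait_for_backend(timeout: float = 60.0) -> bool:
    """Poll the backend until it responds or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            # FastAPI serves interactive docs at /docs by default.
            if requests.get(f"{BACKEND_URL}/docs", timeout=2).ok:
                return True
        except requests.ConnectionError:
            time.sleep(1)  # server not up yet; retry shortly
    return False

if __name__ == "__main__":
    print("backend ready" if wait_for_backend() else "backend did not start in time")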

2. Frontend (React)

cd frontend
npm install
npm run dev                # Start frontend (http://localhost:5173)

Demo Video

Watch the demo


Usage

  • Upload a PDF/TXT document
  • View the auto-generated summary
  • Use "Ask Anything" for Q&A
  • Use "Challenge Me" for logic-based questions and feedback
  • All answers include references to the document

API Keys

  • Add your LLM API keys in the sidebar (frontend) or in backend/.env
  • Supported: OpenAI, Google Gemini, Anthropic Claude, Local LLM
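
For illustration, this is how keys in backend/.env could be read in Python; a minimal sketch assuming python-dotenv, with hypothetical variable names (check backend/.env for the names the backend actually expects):

import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads key=value pairs from .env into the environment

# Hypothetical names for illustration; the backend's expected names may differ.
openai_key = os.getenv("OPENAI_API_KEY")
gemini_key = os.getenv("GEMINI_API_KEY")
claude_key = os.getenv("ANTHROPIC_API_KEY")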

Project Structure

OmniDoc-AI/
├── backend/
│   ├── api/
│   ├── services/
│   ├── chroma_db/
│   ├── main.py
│   └── requirements.txt
├── frontend/
│   ├── src/
│   └── package.json
├── tests/
├── README.md
└── QUICKSTART.md

Architecture / Reasoning Flow

System Overview

  • Frontend: React (with Tailwind CSS) for a modern, responsive UI. Handles document upload, mode selection ("Ask Anything" or "Challenge Me"), and displays answers, references, and reasoning.
  • Backend: FastAPI, responsible for document parsing, chunking, embedding, hybrid retrieval (ChromaDB), and LLM orchestration (OpenAI, Gemini, Claude, or local models).
  • Vector DB: ChromaDB stores semantic and keyword embeddings for all uploaded documents, enabling fast and accurate retrieval.
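
For context, storing and querying chunks in ChromaDB looks roughly like this; a minimal sketch using ChromaDB's default embedding function, not the repo's exact retrieval code (which also mixes in keyword search):

import chromadb

client = chromadb.PersistentClient(path="chroma_db")  # persists to the chroma_db/ folder
collection = client.get_or_create_collection("documents")

# Store chunks; ChromaDB embeds them with its default embedding function.
collection.add(
    ids=["chunk-0", "chunk-1"],
    documents=["The study finds that method X improves accuracy.",
               "Section 2.1 summarizes the main results."],
    metadatas=[{"page": 3}, {"page": 3}],
)

# Dense retrieval: embed the query and return the nearest chunks.
results = collection.query(query_texts=["What is the main finding?"], n_results=2)
print(results["documents"][0])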

Reasoning Flow

  1. Document Upload

    • User uploads a PDF or TXT file via the frontend.
    • Backend parses the document, extracts structure, splits it into semantic chunks, and stores embeddings in ChromaDB (a simple chunking sketch follows this list).
    • An auto-summary (≤150 words) is generated and displayed immediately.
  2. Interaction Modes

    • Ask Anything:
      • User asks a free-form question.
      • Backend performs hybrid retrieval (dense + keyword) to find the most relevant document chunks.
      • LLM is prompted with only the retrieved context.
      • The answer includes:
        • Direct response
        • Step-by-step reasoning (reasoning chain)
        • Specific references/snippets from the document
    • Challenge Me:
      • System generates three logic-based or comprehension questions from the document.
      • User answers each question.
      • Backend evaluates the answer, provides a score, feedback, and references to the supporting document content.
  3. Justification & Context

    • Every answer and evaluation includes:
      • A reference to the supporting document section (e.g., "Page 2, Section 1.3" or snippet preview)
      • A brief justification or reasoning chain
    • The system avoids hallucination by grounding all responses in retrieved document content.
  4. Session & Memory

    • The frontend maintains session context, allowing for follow-up questions and persistent chat history.
    • Each message can display the reasoning chain and supporting snippets.
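
To make the chunking in step 1 concrete, a deliberately simple sketch; it uses fixed-size overlapping windows, whereas the backend's semantic chunker presumably splits on structural boundaries:

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping fixed-size chunks.

    A real semantic chunker would respect section and paragraph
    boundaries; this only illustrates the chunk/overlap idea.
    """
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]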

Example Reasoning Flow (Ask Anything)

  1. User uploads a research paper.
  2. User asks: "What is the main finding of this paper?"
  3. Backend retrieves the most relevant chunks using hybrid search.
  4. LLM is prompted with:
    • The user's question
    • The retrieved context
    • Instructions to answer only using the provided context and to cite references
  5. LLM returns:
    • Answer: "The main finding is X."
    • References: "Page 3, Section 2.1"
    • Reasoning: "This is supported by the summary in Section 2.1, which states..."
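
A sketch of step 4 using the OpenAI client, one of the supported providers; the prompt wording and model name are illustrative, not the repo's actual prompt:

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_from_context(question: str, chunks: list[str]) -> str:
    """Prompt the LLM with only the retrieved context and ask for citations."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks))
    prompt = (
        "Answer ONLY from the context below and cite chunk numbers like [0].\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content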

Example Reasoning Flow (Challenge Me)

  1. User uploads a technical manual.
  2. User selects "Challenge Me."
  3. System generates three logic-based questions (e.g., "Explain the process described in Section 4.2").
  4. User answers; backend evaluates each answer, provides a score, feedback, and references.
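
Step 4's scoring can be framed as an LLM-as-judge call; a hedged sketch reusing the client from the previous example, with an illustrative rubric (the backend's real evaluation logic may differ):

import json

def evaluate_answer(question: str, user_answer: str, chunks: list[str]) -> dict:
    """Grade a user's answer against the retrieved document context."""
    context = "\n\n".join(chunks)
    prompt = (
        "Using ONLY the context, grade the answer from 0 to 10.\n"
        'Reply as JSON: {"score": int, "feedback": str, "reference": str}.\n'
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer: {user_answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # A production version would validate the JSON before trusting it.
    return json.loads(resp.choices[0].message.content)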

Troubleshooting

| Issue | Solution |
| --- | --- |
| Backend won't start | Run python download_models.py in backend |
| Frontend can't connect | Make sure the backend is running on port 8000 |
| Model download slow | Wait for the first run; models are cached |
| Port already in use | Change the port in main.py or the frontend config |
| ChromaDB errors | Delete chroma_db/ and restart the backend |

Tests

cd tests
pytest  # or python test_backend.py
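
For an extra end-to-end smoke test against a running backend, something like the following can be added; the /upload route name is hypothetical, so match it to the routes defined under backend/api/:

import requests

BACKEND_URL = "http://localhost:8000"

def test_upload_smoke():
    # "/upload" is a hypothetical route name used for illustration.
    with open("sample.txt", "rb") as f:
        resp = requests.post(f"{BACKEND_URL}/upload", files={"file": f})
    assert resp.status_code == 200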
