Docbot is an intelligent document assistant that processes and analyzes documents to provide accurate answers to user queries. It leverages multiple advanced language models and retrieval techniques to ensure high-quality responses.
- Multi-document processing and analysis
- Multi-route retrieval system for improved accuracy
- Advanced reranking mechanism
- Interactive chat interface
- Support for various document formats
- Streaming responses with real-time feedback
- Multiple Embedding Models:
  - GTE-large-zh: Optimized for Chinese text understanding
  - BGE-large-zh: Enhanced semantic comprehension
  - BM25: Classical information retrieval algorithm
 
- Reranking: Uses BGE-reranker-large for context optimization
- Large Language Model: Powered by GPT for natural language generation
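The dense routes above all reduce to the same operation: embed the query, score it against pre-computed chunk embeddings, and keep the top hits. A minimal sketch of that scoring step in pure Python, where `dense_route` and the plain-list vectors are illustrative stand-ins for the GTE/BGE model outputs:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def dense_route(query_vec: list[float],
                chunk_vecs: dict[str, list[float]],
                top_k: int = 3) -> list[str]:
    """Rank chunk ids by embedding similarity to the query."""
    ranked = sorted(chunk_vecs.items(),
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:top_k]]
```

Each embedding model gives one such route; BM25 contributes a lexical route alongside them.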
- Document Processing 📄:
  - Documents are loaded and split into manageable chunks
  - Each chunk is processed through multiple embedding models
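The load-and-split step can be sketched as a fixed-size splitter with overlap, so text straddling a chunk boundary appears in both neighbours. The actual splitter settings aren't specified here; `split_into_chunks` and its defaults are illustrative:

```python
def split_into_chunks(text: str, chunk_size: int = 500,
                      overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap by
    `overlap` characters with their predecessor."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```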
 
- Query Processing 🔎:
  - User queries are processed through multiple retrieval routes
  - Results are combined and reranked for relevance
  - Most relevant context is selected for the final response
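The README doesn't state how the routes' results are merged before reranking; reciprocal rank fusion is one common choice for combining ranked lists from heterogeneous retrievers. A hedged sketch:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk ids: each chunk earns
    1 / (k + rank) from every list it appears in, and chunks are
    sorted by their total score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The fused list would then be re-scored by the BGE reranker, which sees the full query-chunk text rather than just ranks.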
 
- Response Generation 💬:
  - Selected context is combined with the user query
  - LLM generates natural and accurate responses
  - Responses are streamed in real-time
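The generation step can be sketched as a prompt builder plus a generator that yields the growing answer as tokens arrive, which is what lets a UI repaint incrementally. `build_prompt` and `stream_tokens` are illustrative names, not the project's actual API:

```python
from typing import Iterable, Iterator

def build_prompt(context: list[str], question: str) -> str:
    """Combine the selected chunks with the user query."""
    joined = "\n\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield the partial answer after each token, so the caller
    can display it immediately instead of waiting for the end."""
    answer = ""
    for token in tokens:
        answer += token
        yield answer
```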
 
- Python 3.10+
- CUDA-capable GPU (recommended)
- uv package manager (recommended)
- OpenAI API key
- Clone Repository

  ```shell
  git clone https://github.com/AbyssSkb/Docbot
  cd Docbot
  ```

- Install Dependencies

  ```shell
  # Using uv (recommended)
  uv sync

  # Or using pip
  pip install -r requirements.txt
  ```

- Environment Setup
  - Create a `.env` file in the project root:

    ```
    OPENAI_API_KEY=your_api_key
    OPENAI_BASE_URL=your_base_url  # Optional
    OPENAI_LLM_MODEL=your_preferred_model  # Default: gpt-4o
    ```

- Document Setup
  - Create a `doc` folder in the project root
  - Place your documents in the `doc` folder
  - Generate document indexes:

    ```shell
    python create_index.py
    ```

- Launch the app:

  ```shell
  streamlit run main.py
  ```

- Open the provided URL in your web browser
- Enter your questions in the chat interface
- View real-time responses based on your documents
- Language Support 🌐:
  - Primary optimization for Chinese text
  - English support can be enabled by switching to English-language models
  - Consider language-specific requirements for your use case
 
- Text Processing 📝:
  - Jieba tokenizer is optimized for Chinese
  - Basic English tokenization support
  - May require adjustment for other languages
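The tokenizer matters most for the BM25 route, which can only match terms the tokenizer produces. A compact Okapi BM25 sketch with a pluggable tokenizer; swapping `tokenize` for `jieba.lcut` would give the Chinese behaviour described above (an assumption about how jieba is wired in, not confirmed by this README):

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Naive word tokenizer; Chinese text has no word-delimiting
    spaces, so jieba.lcut would replace this for Chinese."""
    return re.findall(r"\w+", text.lower())

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    doc_tokens = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in doc_tokens) / len(doc_tokens)
    n = len(docs)
    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for t in doc_tokens if term in t)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            freq = tf[term]
            denom = freq + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores
```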
 
- Document Compatibility 📄:
  - Uses LangChain's DirectoryLoader
  - Some document formats may have compatibility issues
  - Verify support for your specific document types
 
Contributions are welcome! Please feel free to submit pull requests or create issues for bugs and feature requests.
This project is licensed under the MIT License - see the LICENSE file for details.