This project implements an interactive AI assistant using Flask and SocketIO, integrated with the Llama model (via `llama_cpp`). The system lets users ask questions that are answered based on the content of local documents stored in a specified directory. The application splits these documents into smaller chunks, which are then used as context to provide more accurate and relevant responses.
- Interactive Web Interface: Built with Flask and SocketIO for real-time communication between the client and server.
- Integration with Llama Model: Utilizes the Llama model for AI-driven response generation.
- Document Processing: Supports `.txt`, `.pdf`, and `.docx` files, extracting and dividing content into searchable chunks.
- Intelligent Context Search: Identifies relevant chunks based on user query keywords to enhance response relevance.
- Dynamic Token Adjustment: Adjusts the number of tokens used in responses based on query complexity and relevant context.
- Detailed Logging: Implements comprehensive logging for monitoring and debugging purposes.
- Reading and Processing Documents (see the sketch below):
  - The system reads files from a specified directory (`c:\temp` by default).
  - Supported formats include `.txt`, `.pdf`, and `.docx`.
  - Uses `CharacterTextSplitter` from LangChain to divide document content into chunks of 1000 characters with an overlap of 200 characters.
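A minimal sketch of this step, assuming `pypdf` and `python-docx` for text extraction; the function names (`extract_text`, `load_chunks`) are illustrative and may differ from the ones used in this repository:

```python
import os
from typing import List

from langchain.text_splitter import CharacterTextSplitter  # newer LangChain versions expose this via langchain_text_splitters
from pypdf import PdfReader   # assumed PDF library
from docx import Document     # assumed python-docx for .docx files

DOCS_DIR = r"c:\temp"  # default directory scanned for documents


def extract_text(path: str) -> str:
    """Return the raw text of a .txt, .pdf, or .docx file (empty string otherwise)."""
    lower = path.lower()
    if lower.endswith(".txt"):
        with open(path, "r", encoding="utf-8", errors="ignore") as f:
            return f.read()
    if lower.endswith(".pdf"):
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    if lower.endswith(".docx"):
        return "\n".join(p.text for p in Document(path).paragraphs)
    return ""


def load_chunks(directory: str = DOCS_DIR) -> List[str]:
    """Split every supported document into 1000-character chunks with 200-character overlap."""
    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks: List[str] = []
    for name in os.listdir(directory):
        text = extract_text(os.path.join(directory, name))
        if text:
            chunks.extend(splitter.split_text(text))
    return chunks
```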
- Searching for Relevant Chunks (see the sketch below):
  - Upon receiving a user query, the system removes stopwords and identifies the key keywords.
  - It searches the document chunks for those that contain at least 50% of the query keywords.
  - If relevant chunks are found, they are used as additional context for generating a more informed response.
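A simplified sketch of the keyword search. The stopword list here is a small illustrative set (the project may use a fuller, possibly Portuguese, list), and `find_relevant_chunks` is a hypothetical name:

```python
from typing import List

# Toy stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on", "what", "how"}


def find_relevant_chunks(query: str, chunks: List[str], threshold: float = 0.5) -> List[str]:
    """Return chunks containing at least `threshold` (50% by default) of the query keywords."""
    keywords = {w for w in query.lower().split() if w not in STOPWORDS}
    if not keywords:
        return []
    relevant = []
    for chunk in chunks:
        chunk_lower = chunk.lower()
        hits = sum(1 for kw in keywords if kw in chunk_lower)
        if hits / len(keywords) >= threshold:
            relevant.append(chunk)
    return relevant
```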
- Generating Responses (see the sketch below):
  - Utilizes the Llama model to generate responses based on the provided context.
  - Dynamically adjusts the number of tokens (`max_response_tokens`) based on the complexity of the query and the amount of relevant context, ensuring responses are comprehensive yet concise.
  - Implements stop sequences (`stop=["Usuário:", "Assistente:"]`) to ensure responses are well-formatted and complete.
- Real-Time Communication (see the sketch below):
  - The web interface allows users to submit questions and receive answers in real time.
  - Utilizes SocketIO for bidirectional communication between the client and server.
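A minimal Flask-SocketIO wiring might look like the following. The `ask`/`answer` event names and the `index.html` template are assumptions, and `answer_question` stands in for the chunk search and Llama call sketched above:

```python
from flask import Flask, render_template
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app)


def answer_question(query: str) -> str:
    """Placeholder: the real application runs the chunk search and the Llama call here."""
    return f"(answer for: {query})"


@app.route("/")
def index():
    return render_template("index.html")


@socketio.on("ask")
def handle_ask(data):
    # Receive the question from the browser and push the answer back over the same socket.
    question = (data or {}).get("question", "")
    emit("answer", {"text": answer_question(question)})


if __name__ == "__main__":
    socketio.run(app, host="127.0.0.1", port=5000)
```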
- Python 3.8+: Ensure Python is installed on your system.
- Llama Model: Download the desired Llama model (`gemma-2-2b-it-Q4_K_M.gguf`) and place it in the project directory, or specify the correct path in the code.
- Documents: Place your `.txt`, `.pdf`, and `.docx` files in the designated directory (`c:\temp` by default).
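As a rough guide, the two paths the application depends on can be adjusted in one place; the variable names below are illustrative, not necessarily the ones used in the repository:

```python
# Illustrative configuration constants (names are assumptions).
MODEL_PATH = "gemma-2-2b-it-Q4_K_M.gguf"  # GGUF model file; adjust if stored elsewhere
DOCUMENTS_DIR = r"c:\temp"                # directory scanned for .txt, .pdf, and .docx files
```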
- Clone the Repository:
  - `git clone https://github.com/lucasnumaboa/Chat-model-locally.git`
  - `cd Chat-model-locally`