behroozazarkhalili
Summary

This PR adds a comprehensive RAG (Retrieval-Augmented Generation) notebook that demonstrates how to build an end-to-end RAG system using:

  • Milvus: Vector database for similarity search and storage
  • LangChain: Framework for document processing and text splitting
  • Anthropic Claude: Large language model for response generation
  • Sentence Transformers: For embedding generation

Features

  • ✅ Modular RAG pipeline design with configurable components
  • ✅ Support for multiple document formats (PDF, TXT, MD)
  • ✅ Interactive query sessions with real-time responses
  • ✅ Comprehensive error handling and logging
  • ✅ Complete setup instructions including Milvus installation
  • ✅ Sample data and example queries for testing
  • ✅ PDF-specific workflow with dedicated pipeline
  • ✅ Utility functions for collection statistics and cleanup

File Structure

  • RAG_Milvus_LangChain_Anthropic.ipynb: Complete notebook with step-by-step implementation
  • Includes prerequisites, setup instructions, and interactive examples
  • Compatible with both local development and Google Colab

Testing

  • Tested with sample documents and PDF files
  • Includes interactive sessions for real-time validation
  • Comprehensive error handling and resource cleanup

This notebook provides a production-ready foundation for developers looking to implement RAG systems with Milvus and Anthropic Claude.

jaelgu and others added 15 commits May 22, 2025 17:48
Signed-off-by: ChengZi <[email protected]>
- Complete RAG pipeline implementation using Milvus vector database
- Integration with LangChain framework for document processing
- Anthropic Claude for response generation
- Modular design with configurable components
- Support for PDF and text document processing
- Interactive query sessions and utility functions
- Comprehensive setup instructions and examples
@sre-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: behroozazarkhalili

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot
Collaborator

Welcome @behroozazarkhalili! It looks like this is your first PR to milvus-io/bootcamp 🎉

@@ -0,0 +1,1168 @@
{
Collaborator

@codingjaguar codingjaguar Sep 1, 2025

Note: If the repository is private, you'll need to make it public for the Colab badge to work, or manually upload the notebook to Colab.

We can remove this line :)


Author

made it public.

Author

Change repository visibility
For security reasons, you cannot change the visibility of a fork.

I removed the colab badge.

@@ -0,0 +1,1168 @@
{
Collaborator

@codingjaguar codingjaguar Sep 1, 2025

Line #26.                connections.connect("default", host=self.host, port=self.port)

This is a deprecated API. I recommend using MilvusClient() instead. It is also suggested to initialize MilvusClient() with an endpoint composed from the host and port.
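
A minimal sketch of what this suggestion could look like follows. It is not the notebook's actual code: `build_milvus_uri` and `open_client` are hypothetical helper names, and the default host and port (`localhost`, `19530`) are assumptions about a local Milvus deployment. The only pymilvus calls used are `MilvusClient(uri=...)` and, in the usage note, `client.list_collections()` and `client.close()`.

```python
def build_milvus_uri(host: str, port: int, secure: bool = False) -> str:
    """Compose an endpoint such as http://localhost:19530 (hypothetical helper)."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}:{port}"


def open_client(host: str = "localhost", port: int = 19530):
    """Return a MilvusClient for the composed endpoint.

    The caller is responsible for calling client.close() when done.
    """
    # Imported lazily so the URI helper can be used without pymilvus installed.
    from pymilvus import MilvusClient

    return MilvusClient(uri=build_milvus_uri(host, port))
```

With a Milvus server running, `client = open_client()` followed by `client.list_collections()` and a final `client.close()` would replace the `connections.connect("default", host=..., port=...)` call quoted above.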



@@ -0,0 +1,1168 @@
{
Collaborator

@codingjaguar codingjaguar Sep 1, 2025

Line #14.        print("📝 Expected files: ai_overview.txt, vector_databases.txt, rag_systems.txt")

Where do they get such expected files?



- Replace deprecated connections.connect() with MilvusClient(uri=endpoint)
- Implement URI endpoint composition pattern for MilvusClient initialization
- Update MilvusVectorStore class to use modern MilvusClient methods
- Simplify schema creation using MilvusClient's streamlined approach
- Fix vector search by adding anns_field parameter specification
- Add proper resource cleanup with client.close() method
- Remove deprecated utility imports and Collection-based operations
- Ensure compatibility with current Milvus Python SDK
- Enhanced API key configuration with environment variable fallback
- Improved error handling and resource management
- Added comprehensive setup instructions and prerequisites
- Included detailed Milvus installation guide with Docker commands
- Enhanced modular architecture with better separation of concerns
- Added support for both sample documents and PDF processing workflows
- Improved logging and debugging capabilities
- Added interactive session functions with better user experience
- Enhanced documentation with step-by-step explanations
- Optimized vector search and embedding generation processes
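
The "API key configuration with environment variable fallback" item above could be sketched roughly as below. This is an illustration, not the notebook's implementation: `resolve_api_key` is a hypothetical helper, and using `ANTHROPIC_API_KEY` as the variable name is an assumption based on the variable the Anthropic Python SDK reads by default.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    var_name: str = "ANTHROPIC_API_KEY",  # assumed variable name
) -> str:
    """Prefer an explicitly passed key, then fall back to the environment."""
    for candidate in (explicit_key, env.get(var_name)):
        # Skip missing or whitespace-only values.
        if candidate and candidate.strip():
            return candidate.strip()
    raise ValueError(f"No API key provided and {var_name} is not set")
```

Failing early with a clear error, rather than passing an empty key to the client, keeps a misconfigured notebook from reaching the first generation call before the problem surfaces.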
@behroozazarkhalili behroozazarkhalili force-pushed the add-anthropic-rag-notebook branch 2 times, most recently from bf9869d to b043afa Compare September 2, 2025 01:25
@behroozazarkhalili behroozazarkhalili force-pushed the add-anthropic-rag-notebook branch from 7fd4303 to a6d3789 Compare September 2, 2025 01:36
6 participants