🧠 NLP & RAG Engineering — A Practical Learning Path

A hands-on, end-to-end curriculum covering Retrieval-Augmented Generation (RAG) from fundamentals to production deployment and agentic systems.

14 modules · Jupyter Notebooks · LangChain · LangGraph · FastAPI · Qdrant · Pinecone

📖 About

This repository is a structured, practical guide to building production-grade NLP and RAG systems. Each module builds on the previous one, starting from raw document ingestion and ending with deployed agentic AI systems that can retrieve, reason, and produce reports.

Every concept is implemented from scratch in Jupyter Notebooks with working code, not just theory.

🗺️ Learning Path

Documents → Chunks → Embeddings → Vector DB → Retrieval → RAG Pipeline
    ↓
Evaluation → Debugging → Hybrid Search → Agentic RAG → Deployment → Capstones

📚 Modules

#	Module	Key Concepts
01	Document Loaders — Ingesting Data for RAG	PDF, web, CSV, Notion loaders; LangChain document loaders
02	Chunking Strategies — Text Splitters	Fixed-size, recursive, semantic, markdown splitters; overlap strategies
03	Embeddings Explained — Dense vs Sparse	OpenAI embeddings, sentence-transformers, TF-IDF, BM25
04	Vector Databases — Qdrant & Pinecone	Indexing, similarity search, namespaces, metadata filtering
05	Semantic Search & Retrievers	MMR, similarity threshold, contextual compression, multi-query
06	Building Vanilla RAG with LCEL	End-to-end RAG pipeline, prompt engineering, LangChain Expression Language
07	RAG Evaluation with RAGAS	Faithfulness, answer relevancy, context recall, context precision
08	Common RAG Issues & Solutions	Hallucination debugging, retrieval failures, chunk quality fixes
09	Hybrid Search — Dense + BM25 + RRF	Combining semantic and keyword search, Reciprocal Rank Fusion
10	Agentic RAG with LangGraph	Reflect & improve loops, self-correcting RAG, graph-based agents
11	CAG — Cache Augmented Generation	KV-cache injection, latency optimization, long-context strategies
12	Deploying RAG with FastAPI	Production API, async endpoints, Docker-ready RAG service
13	Capstone 1 — Enterprise Document Q&A	Production-grade RAG assistant, end-to-end build
14	Capstone 2 — Agentic Research Assistant	Retrieve, analyze & produce structured reports with LangGraph

🧰 Tech Stack

Category	Tools
Language	Python 3.10+
Notebooks	Jupyter Notebook / JupyterLab
LLM Framework	LangChain, LangGraph, LCEL
Embeddings	OpenAI `text-embedding-3-small`, Sentence-Transformers
Vector Databases	Qdrant, Pinecone, ChromaDB
Evaluation	RAGAS
Deployment	FastAPI, Uvicorn
Search	BM25, Dense retrieval, RRF (Reciprocal Rank Fusion)
Agent Framework	LangGraph (StateGraph, ToolNode, conditional edges)

⚙️ Setup

Prerequisites

Python 3.10+
Jupyter Notebook or JupyterLab
API keys for OpenAI / Gemini (used in notebooks)

1. Clone

git clone https://github.com/tashfeen786/NLP.git
cd NLP

2. Install Dependencies

pip install -r requirements.txt

Or install the core packages directly (some module folders also have their own requirements where needed):

pip install langchain langchain-community langgraph openai sentence-transformers \
            chromadb qdrant-client pinecone-client ragas fastapi uvicorn jupyter

3. Set API Keys

Create a .env file in the repository root:

OPENAI_API_KEY=your_openai_key
PINECONE_API_KEY=your_pinecone_key
QDRANT_API_KEY=your_qdrant_key     # if using cloud
QDRANT_URL=your_qdrant_url         # if using cloud
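
Before launching a notebook, it can help to confirm that the keys are actually set. The following is a minimal sketch using only the standard library; `parse_env`, `missing_keys`, and the `REQUIRED_KEYS` list are hypothetical helpers, not part of this repository, and the notebooks may load the file differently (for example with a dotenv package). It treats everything after a `#` as a comment, which is a simplification.

```python
from pathlib import Path

# Assumption: these two keys are always needed; the Qdrant keys are optional.
REQUIRED_KEYS = ["OPENAI_API_KEY", "PINECONE_API_KEY"]


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, ignoring blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env(env_path.read_text()) if env_path.exists() else {}
    print("Missing keys:", missing_keys(values) or "none")
```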

4. Run Notebooks

jupyter notebook
# or
jupyter lab

Open a module folder and run its notebook cell by cell.

🎯 What You'll Build

By the end of this curriculum you will have built:

Module 13 — Enterprise Document Q&A System

Upload any document (PDF, DOCX, TXT)
Chunking + embedding pipeline
Retrieval-augmented answer generation
Deployed as a production FastAPI service

Module 14 — Agentic Research Assistant

Multi-step research agent using LangGraph
Retrieves from multiple sources
Self-reflects and improves answers
Produces structured analytical reports

📂 Repository Structure

NLP/
├── 1 Document Loaders (Ingesting Data for RAG)/
│   └── notebook.ipynb
├── 2 Chunking Strategies for RAG Text Splitters/
│   └── notebook.ipynb
├── 3 Embeddings Explained Dense vs Sparse/
│   └── notebook.ipynb
├── 4 Vector Databases Qdrant & Pinecone/
│   └── notebook.ipynb
├── 5 Semantic Search & Retrievers How RAG Finds the Right Answers/
│   └── notebook.ipynb
├── 6 Building Vanilla RAG with LCEL From Query to Answer (End-to-End)/
│   └── notebook.ipynb
├── 7 RAG Evaluation with RAGAS Measure Quality, Fix Weak Areas/
│   └── notebook.ipynb
├── 8 Common RAG Issues & Solutions Debug Retrieval and Reduce Hallucinations/
│   └── notebook.ipynb
├── 9 Hybrid Search Dense + BM25 + RRF for Better Retrieval/
│   └── notebook.ipynb
├── 10 Agentic RAG with LangGraph Reflect Improve for Stronger Answers/
│   └── notebook.ipynb
├── 11 CAG (Cache Augmented Generation)/
│   └── notebook.ipynb
├── 12 Deploying RAG with FastAPI Turn Your RAG Into a Real Product API/
│   └── notebook.ipynb + app.py
├── 13 Capstone 1 — Enterprise Document Q&A Build a Production Grade RAG Assistant/
│   └── notebook.ipynb + full project
├── 14 Capstone 2 — Agentic Research Assistant Retrieve, Analyze & Produce Reports/
│   └── notebook.ipynb + full project
└── README.md

💡 Key Concepts Covered

RAG Fundamentals

Document ingestion from multiple sources
Chunking with overlap for context preservation
Dense vs sparse embeddings and when to use each
Vector similarity search (cosine, dot product, euclidean)
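
Two of the ideas above, chunking with overlap and the three similarity measures, can be sketched in plain Python. This is an illustrative sketch rather than the code used in the notebooks: `chunk_text` splits by characters (real splitters usually work on tokens or separators), and all function names and default sizes are assumptions.

```python
import math


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Fixed-size character chunks; each chunk repeats the last `overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def dot(a: list[float], b: list[float]) -> float:
    """Dot product: sensitive to both direction and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: direction only, in [-1, 1]."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
    print(cosine([1.0, 2.0], [2.0, 4.0]), euclidean([0.0, 0.0], [3.0, 4.0]))
```

Note that cosine similarity and dot product rank results identically when the embeddings are normalized to unit length.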

Advanced Retrieval

Hybrid search combining semantic + keyword (BM25 + RRF)
Multi-query retrieval for comprehensive coverage
Contextual compression to reduce noise
Metadata filtering for scoped retrieval
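
Reciprocal Rank Fusion merges several ranked lists by summing 1 / (k + rank) for each document across the lists, so it needs only ranks, not comparable scores. The sketch below is a minimal illustration; the function name and the document IDs are made up, and k = 60 is the commonly used default rather than a value taken from this repository.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs; a higher fused score means a better result."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, breaking ties by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    dense_hits = ["a", "b", "c"]  # hypothetical semantic search results
    bm25_hits = ["b", "d", "a"]   # hypothetical keyword search results
    for doc_id, score in reciprocal_rank_fusion([dense_hits, bm25_hits]):
        print(doc_id, round(score, 5))
```

Documents that appear in both lists ("a" and "b" here) rise above documents found by only one retriever.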

Quality & Debugging

RAGAS evaluation framework (faithfulness, answer relevancy, context recall and precision)
Common failure modes and systematic fixes
Hallucination detection and mitigation
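
When debugging retrieval failures, a quick check is whether the known-relevant chunks appear in the top results at all. The sketch below computes plain precision@k and recall@k over document IDs. It is not the RAGAS implementation (RAGAS metrics are LLM-based), and the function names and sample IDs are assumptions for illustration.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved IDs that are relevant."""
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant) / len(top)


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant IDs that appear in the top-k retrieved."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / len(relevant)


if __name__ == "__main__":
    retrieved = ["a", "b", "c", "d"]  # hypothetical retriever output, best first
    relevant = {"a", "d", "e"}        # hypothetical ground-truth labels
    print(precision_at_k(retrieved, relevant, k=4))
    print(recall_at_k(retrieved, relevant, k=4))
```

Low recall points at the retriever or the chunking; high recall with poor answers points at the prompt or the generation step.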

Agentic Systems

LangGraph state machines for multi-step agents
Self-reflection and answer improvement loops
Tool-calling agents with RAG as a tool
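
The control flow of a self-correcting RAG loop can be sketched without any framework: retrieve, generate, check the answer, and retry with a revised query until the check passes or the attempt budget runs out. This is plain Python, not the LangGraph API; `RagState`, `run_reflective_rag`, and the query-rewrite string are hypothetical, and the retriever, generator, and grader are passed in as ordinary functions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RagState:
    question: str
    context: list[str] = field(default_factory=list)
    answer: str = ""
    attempts: int = 0


def run_reflective_rag(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    is_good_enough: Callable[[RagState], bool],
    max_attempts: int = 3,
) -> RagState:
    """Retrieve -> generate -> reflect, looping until the check passes or attempts run out."""
    state = RagState(question=question)
    query = question
    while state.attempts < max_attempts:
        state.attempts += 1
        state.context = retrieve(query)
        state.answer = generate(question, state.context)
        if is_good_enough(state):  # plays the role of a conditional edge
            break
        # Assumption: a real system would ask an LLM to rewrite the query here.
        query = f"{question} (attempt {state.attempts + 1}: broaden the search)"
    return state


if __name__ == "__main__":
    # Toy stand-ins for a retriever, an LLM call, and a grader.
    docs = {"rrf": "RRF sums reciprocal ranks across result lists."}
    retrieve = lambda q: [text for key, text in docs.items() if key in q.lower()]
    generate = lambda q, ctx: ctx[0] if ctx else "I don't know."
    print(run_reflective_rag("What is RRF?", retrieve, generate, lambda s: bool(s.context)))
```

Capping the loop with `max_attempts` keeps a failing reflection step from running indefinitely.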

Production

FastAPI deployment with async endpoints
Cache Augmented Generation for low latency
End-to-end capstone projects

🚀 Recommended Learning Order

Follow the numbered modules in sequence — each one builds on the previous:

01 → 02 → 03 → 04 → 05 → 06   (Core RAG pipeline)
                              ↓
                    07 → 08 → 09   (Quality & advanced retrieval)
                                  ↓
                         10 → 11 → 12   (Agentic & deployment)
                                        ↓
                               13 → 14   (Capstone projects)

🤝 Contributing

Fork the repository
Create a branch: git checkout -b module/your-addition
Commit: git commit -m "Add: your module description"
Push and open a Pull Request

📄 License

This project is licensed under the MIT License.

👤 Author

Tashfeen Aziz
GitHub: @tashfeen786

From raw documents to production-grade agentic AI — built step by step.

🧠 LangChain · LangGraph · RAG · FastAPI · Qdrant · Pinecone · RAGAS
