ML Text Summarization System
This is an end-to-end application that uses a fine-tuned Transformer (T5-small) to summarize long-form text. It includes model training, evaluation, an interactive Streamlit UI, and Docker support with GPU acceleration.
Features:
- Text summarization using HuggingFace Transformers (T5)
- Training on CNN/DailyMail dataset using HuggingFace Trainer
- Evaluation with ROUGE metrics
- Streamlit-based interactive UI
- Docker support with GPU acceleration
- Unit tests included
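The training data preparation can be sketched roughly as follows. This is a minimal illustration, not the contents of train/train_model.py: the helper names `to_t5_example` and `load_training_pairs` are hypothetical, and prepending the `summarize: ` task prefix is an assumption based on how the stock T5 checkpoints were trained. The `article` and `highlights` field names come from the CNN/DailyMail dataset itself.

```python
from typing import Dict

# Task prefix used by the original T5 checkpoints (assumed to apply here too).
T5_PREFIX = "summarize: "


def to_t5_example(record: Dict[str, str]) -> Dict[str, str]:
    """Map one CNN/DailyMail record to a source/target text pair (hypothetical helper)."""
    return {
        "source": T5_PREFIX + record["article"].strip(),
        "target": record["highlights"].strip(),
    }


def load_training_pairs(split: str = "train[:1%]"):
    """Load a slice of CNN/DailyMail and convert it to text pairs.

    `datasets` is imported lazily so that `to_t5_example` can be used
    (and tested) without the library installed.
    """
    from datasets import load_dataset

    dataset = load_dataset("cnn_dailymail", "3.0.0", split=split)
    # Drop the original columns so only "source" and "target" remain.
    return dataset.map(to_t5_example, remove_columns=dataset.column_names)
```

The resulting pairs would still need to be tokenized before being handed to the HuggingFace Trainer.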
Quickstart:
- Install dependencies:
    pip install -r requirements.txt
    pip install datasets rouge-score
- Run the Streamlit app:
    streamlit run streamlit_app.py
- Train the model:
    python train/train_model.py
  The model will be saved to ./fine_tuned_model.
- Evaluate model performance: load the model and tokenizer from ./fine_tuned_model, then use train/evaluate.py to compare generated summaries with reference summaries using ROUGE.
- Run unit tests:
    python test_summarize.py
- Docker usage:
    docker build -t genai-summarizer-gpu .
    docker run --gpus all -p 8501:8501 genai-summarizer-gpu
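To give a feel for what the ROUGE comparison in the evaluation step measures, here is a simplified, pure-Python sketch of ROUGE-1 (unigram overlap). It is not the code in train/evaluate.py, and the function name `rouge1` is hypothetical. The rouge-score package installed above tokenizes more carefully and can apply stemming, so its numbers may differ slightly from this version.

```python
from collections import Counter
from typing import Dict


def rouge1(reference: str, candidate: str) -> Dict[str, float]:
    """Simplified ROUGE-1: unigram overlap on lowercased, whitespace-split tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((ref_counts & cand_counts).values())

    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
    print(scores)
```

In the example call, five of the six candidate tokens also appear in the reference, so precision, recall, and F1 all come out to 5/6.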
Project structure:
- summarize.py: Core summarization logic
- streamlit_app.py: Streamlit frontend
- api.py: FastAPI backend (optional)
- train/: Scripts for data loading, training, and evaluation
- test_summarize.py: Unit tests
- Dockerfile: GPU-enabled container config
- requirements.txt: Python dependencies
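As an illustration of the kind of logic summarize.py might contain, here is a minimal inference sketch. The function names `build_prompt` and `summarize`, the `summarize: ` prefix, and the generation settings (beam search with 4 beams, 512 input tokens, 128 new tokens) are assumptions chosen for the example, not values taken from the project. The `AutoTokenizer`, `AutoModelForSeq2SeqLM`, and `generate` calls are standard HuggingFace Transformers APIs.

```python
def build_prompt(text: str) -> str:
    """Collapse whitespace and add the T5 summarization prefix (assumed convention)."""
    return "summarize: " + " ".join(text.split())


def summarize(
    text: str,
    model_dir: str = "./fine_tuned_model",
    max_new_tokens: int = 128,
) -> str:
    """Generate a summary with the fine-tuned model saved by the training step."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

    # Truncate long articles to an input budget of 512 tokens (illustrative value).
    inputs = tokenizer(
        build_prompt(text),
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Loading the model on every call keeps the sketch short; a long-running app such as the Streamlit frontend would normally load it once and reuse it.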
Author: Yanming Luo