T5-based Generative AI text summarization system with training, evaluation, Streamlit UI, and Docker support

Yanming99/text-summarizer

ML Text Summarization System

This is an end-to-end application that uses a fine-tuned Transformer (T5-small) to summarize long-form text. It includes model training, evaluation, an interactive Streamlit UI, and Docker support with GPU acceleration.

Features:

  • Text summarization using HuggingFace Transformers (T5)
  • Training on CNN/DailyMail dataset using HuggingFace Trainer
  • Evaluation with ROUGE metrics
  • Streamlit-based interactive UI
  • Docker support with GPU acceleration
  • Unit tests included
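The summarization feature can be illustrated with a minimal sketch. It is not the repository's summarize.py: the function names, token limits, and beam count below are assumptions chosen for illustration. The only details taken from this README are the use of HuggingFace Transformers, T5, and the ./fine_tuned_model directory. T5 selects its task from a text prefix, so the input is prefixed with "summarize: ".

```python
TASK_PREFIX = "summarize: "  # T5 picks its task from a plain-text prefix


def build_t5_input(text: str) -> str:
    """Collapse runs of whitespace and prepend the T5 summarization prefix."""
    return TASK_PREFIX + " ".join(text.split())


def summarize(
    text: str,
    model_dir: str = "./fine_tuned_model",  # save path named in the Quickstart
    max_input_tokens: int = 512,            # assumed limit, not from the repo
    max_summary_tokens: int = 128,          # assumed limit, not from the repo
) -> str:
    """Generate a summary with a seq2seq model loaded from model_dir."""
    # Imported lazily so build_t5_input works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

    # Truncate long articles to the assumed input limit.
    inputs = tokenizer(
        build_t5_input(text),
        max_length=max_input_tokens,
        truncation=True,
        return_tensors="pt",
    )
    # Beam search with 4 beams is an illustrative choice.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_summary_tokens, num_beams=4
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling summarize(article_text) requires a trained model in ./fine_tuned_model. Passing model_dir="t5-small" would instead use the base checkpoint from the HuggingFace Hub.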

Quickstart:

  1. Install dependencies:

    pip install -r requirements.txt
    pip install datasets rouge-score

  2. Run the Streamlit app: streamlit run streamlit_app.py

  3. Train the model: python train/train_model.py

    The model will be saved to ./fine_tuned_model

  4. Evaluate model performance: load the model and tokenizer from ./fine_tuned_model, then use train/evaluate.py to compare generated summaries with reference summaries using ROUGE.

  5. Run unit tests: python test_summarize.py

  6. Docker usage:

    docker build -t genai-summarizer-gpu .
    docker run --gpus all -p 8501:8501 genai-summarizer-gpu
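To make the evaluation step concrete, the sketch below shows what a ROUGE-1 score measures: unigram overlap between a generated summary and its reference. It is a simplified pure-Python stand-in written for illustration, not the contents of train/evaluate.py. It uses plain whitespace tokenization, with none of the stemming or tokenization rules that the rouge-score package applies, so its numbers may differ from that package's output.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict:
    """Simplified ROUGE-1: clipped unigram overlap on whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count of each shared token,
    # so a repeated word cannot be credited more often than it occurs.
    overlap = sum((ref_counts & cand_counts).values())

    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
print(scores)  # 5 of 6 tokens overlap in both directions
```

For real evaluation runs, the rouge-score package installed in step 1 is the better choice, since it also reports ROUGE-2 and ROUGE-L.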

Project structure:

  • summarize.py: Core summarization logic
  • streamlit_app.py: Streamlit frontend
  • api.py: FastAPI backend (optional)
  • train/: Scripts for data loading, training, and evaluation
  • test_summarize.py: Unit tests
  • Dockerfile: GPU-enabled container config
  • requirements.txt: Python dependencies
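As a rough picture of the data-loading stage under train/, the sketch below tokenizes a batch of CNN/DailyMail records for sequence-to-sequence training. It assumes the dataset's standard "article" and "highlights" fields. The function name and the token limits are hypothetical and may not match the repository's scripts.

```python
def preprocess_batch(batch, tokenizer, max_input_tokens=512, max_target_tokens=128):
    """Turn raw articles and reference summaries into model inputs and labels.

    `tokenizer` is expected to behave like a HuggingFace tokenizer;
    the limits are assumed defaults, not values from the repository.
    """
    # T5 expects the task prefix on every source text.
    inputs = ["summarize: " + article for article in batch["article"]]
    model_inputs = tokenizer(inputs, max_length=max_input_tokens, truncation=True)

    # text_target tells the tokenizer to encode the summaries as targets.
    labels = tokenizer(
        text_target=batch["highlights"],
        max_length=max_target_tokens,
        truncation=True,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


# Possible usage with the datasets library (downloads data, so left commented):
# from datasets import load_dataset
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("t5-small")
# dataset = load_dataset("cnn_dailymail", "3.0.0")
# tokenized = dataset.map(lambda b: preprocess_batch(b, tokenizer), batched=True)
```

The tokenized dataset can then be handed to the HuggingFace Trainer. Padding is left to a data collator at batch time, which keeps the stored examples short.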

Author: Yanming Luo
