An automated system that collects AI content, summarizes it with LLMs, ranks the summaries against a user profile, and delivers a personalized daily email digest. It is fully deployed on the cloud.
graph TD
A[Sources: YouTube / OpenAI / Anthropic] --> B[Scrapers]
B --> C[Raw Data Stored in PostgreSQL]
C --> D[Content Processing]
D --> E[LLM Summarization]
E --> F[Digest Table]
F --> G[Curator Agent Ranking]
G --> H[Top-N Selection]
H --> I[Email Agent]
I --> J[HTML + Markdown Email]
J --> K[Gmail SMTP Delivery]
F --> L[Mark as Sent]
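The last stages of the graph (Email Agent, HTML + Markdown email, Gmail SMTP delivery) could look roughly like the sketch below. It is only an illustration built on the Python standard library: the function names `build_digest_email` and `send_via_gmail` are hypothetical, and it assumes the Markdown version is sent as the plain-text alternative and that Gmail is reached over implicit TLS on port 465 with an app password.

```python
import smtplib
from email.message import EmailMessage


def build_digest_email(sender: str, recipient: str, subject: str,
                       markdown_body: str, html_body: str) -> EmailMessage:
    """Build a multipart/alternative message with a plain-text and an HTML part."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # The Markdown text doubles as the text/plain fallback for simple clients.
    msg.set_content(markdown_body)
    # Adding an alternative converts the message to multipart/alternative.
    msg.add_alternative(html_body, subtype="html")
    return msg


def send_via_gmail(msg: EmailMessage, username: str, app_password: str) -> None:
    """Send the message through Gmail's SMTP server (assumed: implicit TLS, port 465)."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(username, app_password)
        server.send_message(msg)
```

Keeping message construction separate from delivery makes the formatting step easy to check without network access or credentials.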
The system runs as a daily cron job and performs:
- Multi-source scraping
- LLM-based summarization
- Personalized ranking
- Email delivery
- State tracking (`sent_at`) to prevent duplicates
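The `sent_at` state tracking can be pictured with a small sketch. The project itself uses PostgreSQL through SQLAlchemy; to stay self-contained this example swaps in the standard-library `sqlite3` module, and the `digests` table layout and the helper names (`create_schema`, `fetch_unsent`, `mark_sent`) are assumptions made for illustration only.

```python
import sqlite3
from datetime import datetime, timezone


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical digests table; sent_at stays NULL until emailed."""
    conn.execute(
        "CREATE TABLE digests ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " summary TEXT NOT NULL,"
        " sent_at TEXT)"
    )


def fetch_unsent(conn: sqlite3.Connection) -> list[tuple]:
    """Return only digests that have never been emailed."""
    return conn.execute(
        "SELECT id, title, summary FROM digests WHERE sent_at IS NULL ORDER BY id"
    ).fetchall()


def mark_sent(conn: sqlite3.Connection, digest_ids: list[int]) -> None:
    """Stamp the given digests with the current UTC time so later runs skip them."""
    now = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "UPDATE digests SET sent_at = ? WHERE id = ? AND sent_at IS NULL",
        [(now, digest_id) for digest_id in digest_ids],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO digests (title, summary) VALUES (?, ?)",
    [("Post A", "Summary A"), ("Post B", "Summary B")],
)
mark_sent(conn, [1])
remaining = fetch_unsent(conn)  # only "Post B" is still unsent
```

Because each run selects only rows where `sent_at` is NULL, a digest that has already been delivered is not picked up again on the next day's job.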
Sources → Scrapers → DB → LLM → Ranking → Email → User
- Python 3.12
- PostgreSQL (local + Render)
- SQLAlchemy
- OpenAI API
- Docker + Render
- uv (dependency management)
- Scrape latest AI content
- Extract transcripts / text
- Generate summaries using LLM
- Store digests in database
- Rank based on user profile
- Send email digest
- Mark digests as sent
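The ranking and top-N steps in this list can be sketched as follows. This is a deliberately simple stand-in: it scores digests by keyword overlap with a user's interests, whereas the project's curator agent may well use an LLM or a different scoring scheme. The `Digest` class and the `score` and `top_n` functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Digest:
    title: str
    summary: str


def score(digest: Digest, interests: list[str]) -> int:
    """Count how many interest keywords appear in the title or summary."""
    text = f"{digest.title} {digest.summary}".lower()
    return sum(1 for keyword in interests if keyword.lower() in text)


def top_n(digests: list[Digest], interests: list[str], n: int) -> list[Digest]:
    """Return the n highest-scoring digests; ties keep their original order."""
    ranked = sorted(digests, key=lambda d: score(d, interests), reverse=True)
    return ranked[:n]


digests = [
    Digest("Cooking tips", "How to make pasta"),
    Digest("New agents SDK", "Agents and evals in one toolkit"),
    Digest("Evals guide", "How to measure model quality"),
]
selected = top_n(digests, interests=["agents", "evals"], n=2)
```

Only the selected digests would then be passed on to the email step.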
| Environment | Database configuration |
|---|---|
| LOCAL | PostgreSQL via `POSTGRES_*` environment variables |
| PRODUCTION | `DATABASE_URL` (Render) |
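One way to implement the switch in the table above is to prefer `DATABASE_URL` when it is set and otherwise assemble a URL from the `POSTGRES_*` variables. The sketch below is an assumption about how this could work, not the project's actual code: the function name `resolve_database_url`, the exact variable names (`POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_HOST`, `POSTGRES_PORT`, `POSTGRES_DB`), and the defaults are all illustrative.

```python
import os
from collections.abc import Mapping
from urllib.parse import quote_plus


def resolve_database_url(env: Mapping[str, str] = os.environ) -> str:
    """Prefer DATABASE_URL (production); otherwise build one from POSTGRES_* (local)."""
    url = env.get("DATABASE_URL")
    if url:
        # Normalize the legacy "postgres://" scheme, which SQLAlchemy 1.4+ rejects.
        if url.startswith("postgres://"):
            url = "postgresql://" + url[len("postgres://"):]
        return url
    # Assumed variable names and defaults for a local PostgreSQL instance.
    user = env.get("POSTGRES_USER", "postgres")
    password = quote_plus(env.get("POSTGRES_PASSWORD", ""))  # escape special characters
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    name = env.get("POSTGRES_DB", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```

The resulting string could then be handed to SQLAlchemy's `create_engine`. Passing the environment as a parameter keeps the function easy to test with a plain dictionary.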
- ✅ End-to-end automated pipeline
- ✅ LLM-powered summarization
- ✅ Smart ranking system
- ✅ Duplicate prevention (`sent_at`)
- ✅ Cloud deployment with cron jobs
- ✅ Clean modular architecture
- Uses `render.yaml`
- Deploys:
  - PostgreSQL DB
  - Cron job (`daily-digest-job`)
app/
├── agent/
├── database/
├── scrapers/
├── services/
├── profiles/
├── daily_runner.py
└── runner.py
main.py
Dockerfile
render.yaml
Darsh Vora, MS Data Analytics Engineering, Northeastern University
This project demonstrates building and deploying a real-world GenAI system combining data pipelines, LLMs, and cloud infrastructure.
If you found this useful, consider giving the repo a ⭐!