CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Trigent (A Rich Issue MCP for GitHub Triaging at Scale) is an MCP server that provides enriched GitHub issue data so AI agents can triage thousands of issues in upstream projects such as JupyterLab. The system enriches raw issue data with semantic embeddings, computed metrics, and quartile rankings so that agents can compare and prioritize issues.

Architecture

The system consists of several Python modules under trigent/:

Core Modules

trigent/pull.py: Data pulling module that fetches raw issues from GitHub repositories using intelligent paging
- Uses gh CLI for GitHub API access with weekly chunking based on updatedAt timestamps
- Implements incremental updates to avoid refetching unchanged issues
- Stores issues in TinyDB database files and compares issues directly to detect updates
- Merges new and updated issues into existing data without discarding previously stored fields
trigent/enrich.py: Data enrichment module that processes raw issue data
- Adds embeddings for semantic search (via Mistral API)
- Computes metrics: reactions, comments, age, activity scores
- Assigns quartiles for all metrics using pandas qcut() with descriptive labels (Bottom25%, Bottom50%, Top50%, Top25%)
- Updates TinyDB database with enriched data
trigent/serve/mcp_server.py: FastMCP server providing database access tools
- Serves enriched issue data to AI agents
- Tools: get_issue, find_similar_issues, find_cross_referenced_issues, get_issue_metrics
trigent/cli.py: CLI orchestration module
- Unified trigent command with subcommands
- Orchestrates the entire workflow from pull to triaging
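
The quartile step described for trigent/enrich.py can be sketched as follows. This is a minimal illustration, not the project's code: the column names and the assign_quartiles helper are hypothetical, and ranking before binning is an assumption added here so that pandas qcut() does not fail on heavily tied values (for example, many issues with zero comments).

```python
import pandas as pd

# Labels taken from the description above, ordered lowest to highest quartile.
QUARTILE_LABELS = ["Bottom25%", "Bottom50%", "Top50%", "Top25%"]


def assign_quartiles(values: pd.Series) -> pd.Series:
    """Bin a numeric metric into four labeled quartiles (hypothetical helper)."""
    # rank(method="first") breaks ties by position, so qcut always sees
    # four distinct bin edges even when many values are identical.
    ranks = values.rank(method="first")
    return pd.qcut(ranks, q=4, labels=QUARTILE_LABELS)


# Toy data; "number" and "comments" are illustrative column names.
issues = pd.DataFrame(
    {
        "number": [1, 2, 3, 4, 5, 6, 7, 8],
        "comments": [0, 0, 1, 2, 3, 5, 8, 40],
    }
)
issues["comments_quartile"] = assign_quartiles(issues["comments"])
print(issues)
```

The same helper could be applied to each metric column in turn (reactions, age, activity score) to produce one quartile column per metric.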

Additional Modules

trigent/config.py: Configuration management and caching
trigent/database.py: Database utilities and operations

Workflow

# Install the package
pip install -e .

# 1. Initial repository setup (pulls data and enriches it)
trigent pull jupyterlab/jupyterlab --start-date 2025-01-01

# 2. Keep repository up to date (incremental updates)
trigent update jupyterlab/jupyterlab

# 3. Start MCP server for AI agent access
trigent serve jupyterlab/jupyterlab

# 4. Export data for analysis
trigent export jupyterlab/jupyterlab --csv --viz

# 5. Show collection statistics
trigent stats                         # Show all collections
trigent stats jupyterlab/jupyterlab   # Show specific repo

# 6. Clean repository data
trigent clean jupyterlab/jupyterlab

Development Commands

Setup

# Install with development dependencies
pip install -e ".[dev]"

# Configure Mistral API key in config.toml
cp config.toml.example config.toml
# Edit config.toml and add your Mistral API key

Code Quality

# Lint, format, and type check
ruff check trigent/ && ruff format trigent/ && mypy trigent/

Key Files

trigent/cli.py: Main CLI entry point with simplified commands
trigent/pull.py: Python module for fetching raw issues from GitHub
trigent/enrich.py: Python enrichment pipeline with embeddings/metrics
trigent/serve/mcp_server.py: FastMCP server for database access
trigent/database.py: Qdrant vector database operations
trigent/config.py: Configuration management and API key handling
config.toml: User configuration file (API keys, Qdrant settings)
pyproject.toml: Project configuration

Dependencies

Python 3.12+: Core language with modern type hints
pandas, numpy: Data processing and quartile calculations
requests: HTTP client for Mistral API
FastMCP: Minimal server for database access
scikit-learn: Machine learning utilities for k-nearest neighbors
diskcache: Persistent caching for API responses
toml: Configuration file parsing
ipython, ipdb: Interactive development and debugging
gh CLI: GitHub issue fetching (external dependency)
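
The k-nearest-neighbors lookup mentioned for scikit-learn can be illustrated without any dependencies. The following is a minimal sketch, not the project's implementation: it ranks issues by cosine similarity between embeddings in plain Python, whereas the project lists scikit-learn for this job. The function names and the "embedding" field are assumptions made for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar(issues: list[dict], target_number: int, k: int = 3) -> list[int]:
    """Return the numbers of the k issues most similar to the target issue."""
    target = next(i for i in issues if i["number"] == target_number)
    scored = [
        (cosine_similarity(target["embedding"], other["embedding"]), other["number"])
        for other in issues
        if other["number"] != target_number
    ]
    # Highest similarity first.
    scored.sort(reverse=True)
    return [number for _, number in scored[:k]]


# Toy two-dimensional embeddings; real embeddings have many more dimensions.
toy_issues = [
    {"number": 1, "embedding": [1.0, 0.0]},
    {"number": 2, "embedding": [0.9, 0.1]},
    {"number": 3, "embedding": [0.0, 1.0]},
    {"number": 4, "embedding": [-1.0, 0.0]},
]
print(find_similar(toy_issues, target_number=1, k=2))
```

A brute-force scan like this is linear in the number of issues per query, which is one reason a library index is preferable once a collection grows to thousands of issues.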

Architecture Notes

Unified Python: All components are integrated in a single Python package with clean module separation
Intelligent Paging: GitHub issues fetched via gh CLI with weekly chunking and incremental updates
State Management: Pull module tracks last fetch timestamps to enable efficient incremental updates
Issue Merging: Merge logic updates existing issues with newly fetched fields while preserving data that is already stored
Enriched Data: Pandas-based processing adds embeddings and quartiles (UMAP removed)
MCP Server: FastMCP provides database access tools for AI agents
Simplified CLI: Streamlined commands that combine operations (e.g., pull does fetch + enrich)
Direct Integration: No subprocess calls between internal modules - all use direct Python imports
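
The weekly chunking and issue merging described in these notes can be sketched as below. This is an illustration under stated assumptions rather than the code in trigent/pull.py: weekly_windows and merge_issues are hypothetical helpers, issues are assumed to be dicts keyed by "number", and the gh CLI call that would fetch each window is deliberately left out.

```python
from datetime import date, timedelta


def weekly_windows(start: date, end: date) -> list[tuple[date, date]]:
    """Split [start, end] into consecutive windows of at most seven days."""
    windows = []
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=6), end)
        windows.append((cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows


def merge_issues(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Merge freshly fetched issues into stored ones, keyed by issue number.

    Incoming fields overwrite stored fields of the same name; fields that
    exist only on the stored issue (for example an embedding) are kept.
    """
    merged = {issue["number"]: dict(issue) for issue in existing}
    for issue in incoming:
        current = merged.get(issue["number"], {})
        merged[issue["number"]] = {**current, **issue}
    return sorted(merged.values(), key=lambda issue: issue["number"])


# Each window would correspond to one fetch filtered on updatedAt.
for window_start, window_end in weekly_windows(date(2025, 1, 1), date(2025, 1, 16)):
    print(window_start.isoformat(), "to", window_end.isoformat())

stored = [{"number": 1, "title": "old title", "embedding": [0.1]}]
fetched = [{"number": 1, "title": "new title"}, {"number": 2, "title": "b"}]
print(merge_issues(stored, fetched))
```

An incremental update would then start the first window at the last fetch timestamp instead of the original start date, so only recently updated issues are requested.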

Project Structure

Trigent/
├── trigent/           # Main Python package
│   ├── __init__.py
│   ├── __main__.py          # Entry point for python -m trigent
│   ├── cli.py               # CLI orchestration with all subcommands
│   ├── clean.py             # Clean command implementation
│   ├── update.py            # Update command implementation
│   ├── stats.py             # Stats command implementation
│   ├── pull.py              # GitHub issue fetching via gh CLI
│   ├── enrich.py            # Data enrichment with embeddings/metrics
│   ├── database.py          # Qdrant operations and utilities
│   ├── config.py            # Configuration management and caching
│   ├── export/              # Export command + CSV/viz subdirectory
│   │   ├── __init__.py
│   │   ├── command.py       # Export command entry point
│   │   ├── csv.py           # CSV export functionality
│   │   └── visualize.py     # Visualization export
│   └── serve/               # Serve command + MCP server subdirectory
│       ├── __init__.py
│       ├── __main__.py      # Entry point for python -m trigent.serve
│       ├── command.py       # Serve command entry point
│       └── mcp_server.py    # FastMCP server implementation
├── data/                    # Data storage directory
│   └── issues-{repo}.db     # TinyDB database files (e.g., issues-jupyterlab-jupyterlab.db)
├── dcache/                  # Diskcache directory for API response caching
├── example/                 # Example implementations and agents
├── config.toml              # Configuration file (API keys, settings)
├── config.toml.example      # Example configuration template
├── pyproject.toml           # Python project configuration
├── README.md                # Project documentation
├── CLAUDE.md                # Development instructions for Claude Code
└── uv.lock                  # Dependency lock file

Development Notes

Loading Database for Testing

To test database functionality, load the database the same way as the MCP server:

from trigent.database import load_issues

def _get_repo_name(repo=None):
    """Get repository name, defaulting to jupyterlab/jupyterlab."""
    return repo or "jupyterlab/jupyterlab"

# Load exactly like MCP server  
repo = _get_repo_name()
issues = load_issues(repo)

# Find specific issue
issue_3224 = next((i for i in issues if i["number"] == 3224), None)

Note: The database must be populated first by running:

trigent pull jupyterlab/jupyterlab --mode create   # Fetch raw issues in create mode
trigent enrich jupyterlab/jupyterlab               # Add embeddings and metrics

Subsequent updates use:

trigent pull jupyterlab/jupyterlab --mode update
