A tool that automatically generates Swagger/OpenAPI documentation for Express.js APIs using NLP techniques and LLMs.
This project combines Natural Language Processing (NLP) techniques with Large Language Models (LLMs) to automatically generate high-quality API documentation. By preprocessing code with NLP before sending it to LLMs, we achieve:
- Better context understanding
- Reduced token usage
- More specific and higher quality responses
- Fine-grained control over the documentation pipeline
- Automated code base updates

Features:
- Automatic API route detection
- Intelligent parameter inference
- Response schema generation
- Validation rules detection
- Swagger/OpenAPI compliant output
- Support for Express.js routes
- Automated documentation insertion

The pipeline proceeds through the following stages:
- Git Diff Detection
  - Identify changed files
- Code Extraction
  - Extract route handlers
  - Extract existing comments
- Basic Preprocessing
  - Remove unnecessary whitespace
  - Normalize line endings
- CodeBERT Processing
  - Generate code embeddings
  - Semantic code understanding
- Token Classification
  - CRUD operation detection
  - API endpoint classification
- Named Entity Recognition (NER)
  - Route paths
  - HTTP methods
  - Variable names
  - Function names
  - Status codes
- Semantic Role Labeling
  - Action identification
  - Resource detection
  - Parameter role assignment
- Transformer Model Analysis
  - CRUD patterns
  - Authentication flows
  - Error handling
  - Data validation
- Information Extraction
  - Parameter types
  - Validation rules
  - Response formats
- Constraint Detection
  - Required fields
  - Validation rules
- Prompt Generation
- Documentation Generation
- Code Base Updates
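
The early, deterministic stages listed above (changed-file detection, basic preprocessing, route extraction) can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the names `changed_js_files`, `preprocess`, `extract_routes`, and `Route`, as well as the regular expression, are assumptions made for this example, and a regex only approximates what the NER and classification stages would do with a trained model.

```python
import re
from dataclasses import dataclass

# All names below are hypothetical; none are taken from the auto_swagger source.

JS_EXTENSIONS = (".js", ".ts")


def changed_js_files(name_only_output: str) -> list[str]:
    """Filter the text printed by `git diff --name-only` down to JS/TS files."""
    paths = [line.strip() for line in name_only_output.splitlines()]
    return [p for p in paths if p.endswith(JS_EXTENSIONS)]


def preprocess(source: str) -> str:
    """Normalize line endings, strip trailing whitespace, collapse blank-line runs."""
    text = source.replace("\r\n", "\n").replace("\r", "\n")
    cleaned: list[str] = []
    for line in (raw.rstrip() for raw in text.split("\n")):
        if line == "" and cleaned and cleaned[-1] == "":
            continue  # skip consecutive blank lines
        cleaned.append(line)
    return "\n".join(cleaned).strip("\n")


# Matches calls such as app.get('/users/:id', ...) or router.post("/users", ...).
# Requiring a leading "/" avoids matching Express settings lookups like app.get('title').
ROUTE_RE = re.compile(
    r"\b(?P<obj>\w+)\.(?P<method>get|post|put|patch|delete)\(\s*"
    r"(?P<quote>['\"`])(?P<path>/[^'\"`]*)(?P=quote)"
)


@dataclass
class Route:
    method: str        # HTTP method in upper case
    path: str          # Express route path, e.g. "/users/:id"
    params: list[str]  # path parameter names found in the path


def extract_routes(source: str) -> list[Route]:
    """Find Express-style route registrations and their path parameters."""
    routes = []
    for match in ROUTE_RE.finditer(source):
        path = match.group("path")
        params = re.findall(r":(\w+)", path)
        routes.append(Route(match.group("method").upper(), path, params))
    return routes


if __name__ == "__main__":
    sample = "router.get('/users/:id', getUser);\r\n\r\n\r\nrouter.post('/users', createUser);  \r\n"
    for route in extract_routes(preprocess(sample)):
        print(route)
```

Feeding `changed_js_files` the output of `git diff --name-only`, then passing each file's contents through `preprocess` and `extract_routes`, yields method, path, and parameter records that later stages could turn into prompts.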

- Clone the repository:
git clone https://github.com/yourusername/auto_swagger.git
cd auto_swagger
- Install UV if you haven't already:
curl -LsSf https://astral.sh/uv/install.sh | sh
- Create a virtual environment and install dependencies:
uv venv
- Run the auto-swagger tool:
# Run the main documentation generator
uv run auto-swagger --repo-path path/to/express/app
# Run the fine-tuning tool (if needed)
uv run finetune

auto_swagger/
├── src/
│   └── auto_swagger/           # Source code
│       ├── config/             # Configuration management
│       ├── finetune/           # Model fine-tuning utilities
│       ├── parser/             # Code parsing and analysis
│       └── swagger_generator/  # Documentation generation
├── data/                       # Project data
│   ├── jsdocs_finetune.jsonl   # Fine-tuning dataset
│   └── swagger_docs/           # Generated documentation
├── pyproject.toml              # Project configuration
└── README.md

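The model and its generation settings are defined in a config dataclass; edit its fields to switch models:

from dataclasses import dataclass

@dataclass
class LLMConfig:
    model_name: str = "deepseek-ai/deepseek-coder-1.3b-instruct"  # model identifier
    max_new_tokens: int = 8192   # upper bound on generated tokens
    temperature: float = 0.2     # sampling temperature
    top_k: int = 50              # top-k sampling cutoff
    top_p: float = 0.95          # nucleus (top-p) sampling cutoff
    max_retries: int = 3

- Support for additional backend frameworks beyond Express.js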
- Local CLI version without GitHub app dependency
- Enhanced pattern recognition
- Additional documentation formats
- Real-time documentation updates
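
Because `LLMConfig` is a plain dataclass, swapping models comes down to overriding fields. The sketch below repeats the dataclass from the configuration section so that it runs on its own; the model name `my-org/my-code-model` is a hypothetical placeholder, and how the project actually loads or passes its config is not shown in this README.

```python
from dataclasses import dataclass, replace


@dataclass
class LLMConfig:
    model_name: str = "deepseek-ai/deepseek-coder-1.3b-instruct"
    max_new_tokens: int = 8192
    temperature: float = 0.2
    top_k: int = 50
    top_p: float = 0.95
    max_retries: int = 3


# Override selected fields at construction time; the rest keep their defaults.
# "my-org/my-code-model" is a placeholder, not a real model.
config = LLMConfig(model_name="my-org/my-code-model", temperature=0.1)

# Derive a variant without mutating the original, e.g. for shorter outputs.
short_config = replace(config, max_new_tokens=1024)

if __name__ == "__main__":
    print(config)
    print(short_config)
```

Using `dataclasses.replace` keeps each configuration immutable in practice, which makes it easier to compare runs that differ in a single setting.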