This guide provides information for developers who want to contribute to or extend CodeWiki.
```text
codewiki/
├── codewiki/                 # Main package
│   ├── cli/                  # CLI implementation
│   │   ├── commands/         # CLI commands (config, generate)
│   │   ├── models/           # Data models
│   │   ├── utils/            # Utilities
│   │   └── adapters/         # External integrations
│   ├── src/                  # Web application
│   │   ├── be/               # Backend (dependency analysis, agents)
│   │   │   ├── agent_orchestrator.py
│   │   │   ├── agent_tools/
│   │   │   ├── cluster_modules.py
│   │   │   ├── dependency_analyzer/
│   │   │   ├── documentation_generator.py
│   │   │   └── llm_services.py
│   │   └── fe/               # Frontend (web interface)
│   │       ├── web_app.py
│   │       ├── routes.py
│   │       ├── github_processor.py
│   │       └── visualise_docs.py
│   ├── templates/            # HTML templates
│   └── run_web_app.py        # Web app entry point
├── docker/                   # Docker configuration
│   ├── Dockerfile
│   ├── docker-compose.yml
│   └── env.example
├── img/                      # Images and assets
├── paper/                    # Research paper source
├── tests/                    # Test suite
├── output/                   # Generated documentation output
├── pyproject.toml            # Project metadata
├── requirements.txt          # Python dependencies
└── README.md                 # Main documentation
```
- Python 3.12+
- Node.js (for Mermaid diagram validation)
- Git
- Tree-sitter language parsers
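Before running the setup steps, you can do a quick sanity check that the tools above are reachable. This is a minimal sketch and is not part of CodeWiki. `missing_prerequisites` is a hypothetical helper. It checks only the interpreter version and whether `node` and `git` are on `PATH`. Tree-sitter parsers are not checked here.

```python
import shutil
import sys
from typing import List, Tuple


def missing_prerequisites(min_python: Tuple[int, int] = (3, 12)) -> List[str]:
    """Return a list of prerequisites that appear to be missing."""
    missing: List[str] = []
    # Compare the running interpreter against the required minimum version.
    if sys.version_info[:2] < min_python:
        missing.append(f"Python {min_python[0]}.{min_python[1]}+")
    # Node.js and Git are expected to be available as executables on PATH.
    for executable, label in (("node", "Node.js"), ("git", "Git")):
        if shutil.which(executable) is None:
            missing.append(label)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All checked prerequisites found." if not problems else f"Missing: {', '.join(problems)}")
```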
```bash
# Clone the repository
git clone https://github.com/FSoft-AI4Code/CodeWiki.git
cd CodeWiki

# Create virtual environment
python3.12 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in development mode
pip install -e .

# Install development dependencies
pip install -r requirements.txt
```

- AST Parser: Tree-sitter based parsing for 7 languages
- Dependency Graph Builder: Constructs call graphs and dependency relationships
- Analyzers: Language-specific analyzers (Python, Java, JavaScript, TypeScript, C, C++, C#)
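The sketch below shows the kind of output a dependency graph builder produces: caller → callee edges extracted from Python source. It swaps in Python's built-in `ast` module for Tree-sitter so that it runs without extra packages. `build_call_graph` is a hypothetical name. CodeWiki's analyzers operate on Tree-sitter trees and are not reproduced here.

```python
import ast
from typing import Dict, Set


def build_call_graph(source: str) -> Dict[str, Set[str]]:
    """Map each top-level function to the module-local functions it calls.

    Uses the standard-library ``ast`` module as a stand-in for Tree-sitter.
    Only direct calls by bare name (e.g. ``helper()``) are recorded.
    """
    tree = ast.parse(source)
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    defined = {node.name for node in tree.body if isinstance(node, func_types)}
    graph: Dict[str, Set[str]] = {name: set() for name in defined}
    for node in tree.body:
        if not isinstance(node, func_types):
            continue
        for child in ast.walk(node):
            # Keep only calls that resolve to another function defined in this module.
            if (
                isinstance(child, ast.Call)
                and isinstance(child.func, ast.Name)
                and child.func.id in defined
            ):
                graph[node.name].add(child.func.id)
    return graph


example = """
def load(path):
    return parse(read(path))

def read(path):
    return open(path).read()

def parse(text):
    return text.split()
"""
print(build_call_graph(example))
```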
- Hierarchical decomposition of repository structure
- Feature-oriented module partitioning
- Topological sorting for dependency ordering
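The topological-sorting step can be sketched with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The sketch assumes that modules are documented leaves first, meaning each module comes after everything it depends on. `documentation_order` is a hypothetical helper, not the code in `cluster_modules.py`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def documentation_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order modules so each one appears after the modules it depends on."""
    # TopologicalSorter takes a mapping of node -> predecessors (here: its dependencies).
    return list(TopologicalSorter(deps).static_order())


deps = {
    "cli": {"core", "utils"},  # cli depends on core and utils
    "core": {"utils"},         # core depends on utils
    "utils": set(),            # utils has no dependencies
}
print(documentation_order(deps))
```

`static_order()` raises `graphlib.CycleError` when the graph contains a cycle, for example two mutually recursive modules. Real dependency graphs therefore need their cycles broken or collapsed before this step.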
- Recursive agent-based documentation generation
- Dynamic delegation for complex modules
- Cross-module reference management
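The recursive generation with dynamic delegation can be pictured as a toy model. Every name in it is hypothetical:

- `MAX_COMPONENTS` stands in for the complexity check.
- `max_depth` stands in for a delegation depth limit.
- Appending a string stands in for an LLM call.

The model shows only the control flow, not how CodeWiki's agents actually decide when to delegate.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical complexity threshold: more components than this triggers delegation.
MAX_COMPONENTS = 3


@dataclass
class Module:
    name: str
    components: List[str]
    children: List["Module"] = field(default_factory=list)


def generate_docs(module: Module, depth: int = 0, max_depth: int = 3) -> List[str]:
    """Produce placeholder doc sections bottom-up, delegating oversized modules."""
    sections: List[str] = []
    # Document children first so a parent can reference its sub-modules.
    for child in module.children:
        sections.extend(generate_docs(child, depth + 1, max_depth))
    parts = module.components
    if len(parts) > MAX_COMPONENTS and depth < max_depth:
        # Dynamic delegation: split the module into chunks handled as sub-modules.
        chunks = [parts[i:i + MAX_COMPONENTS] for i in range(0, len(parts), MAX_COMPONENTS)]
        for i, chunk in enumerate(chunks, start=1):
            sub = Module(f"{module.name}.part{i}", chunk)
            sections.extend(generate_docs(sub, depth + 1, max_depth))
        sections.append(f"{module.name}: overview of {len(chunks)} delegated parts")
    else:
        # Simple enough (or depth limit reached): one "LLM call" covers everything.
        sections.append(f"{module.name}: {', '.join(parts)}")
    return sections


root = Module("core", list("abcde"), children=[Module("utils", ["x"])])
for line in generate_docs(root):
    print(line)
```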
- `read_code_components.py`: Code reading utilities
- `generate_sub_module_documentations.py`: Sub-module documentation generation
- `str_replace_editor.py`: Documentation editing tools
- `deps.py`: Dependency traversal tools
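A common pattern for tools like `str_replace_editor.py` is a string-replace edit that succeeds only when the target text matches exactly once. This keeps an agent from silently editing the wrong spot. The function below is a hypothetical sketch of that pattern. The actual tool's interface and behavior may differ.

```python
def str_replace(text: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` in `text`, refusing ambiguous edits."""
    if not old:
        raise ValueError("old string must be non-empty")
    matches = text.count(old)
    if matches == 0:
        raise ValueError("old string not found")
    if matches > 1:
        # Ask the caller (e.g. an agent) for more surrounding context instead of guessing.
        raise ValueError(f"old string is ambiguous ({matches} matches)")
    return text.replace(old, new, 1)


doc = "# Core module\n\nTODO: describe the parser.\n"
print(str_replace(doc, "TODO: describe the parser.", "Parses source files into ASTs."))
```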
- FastAPI Backend: `web_app.py`, `routes.py`
- GitHub Integration: `github_processor.py`
- Documentation Viewer: `visualise_docs.py`
- Background Processing: `background_worker.py`
- `cli/commands/config.py`: Configuration management (API settings + agent instructions)
- `cli/commands/generate.py`: Documentation generation with customization options
- `cli/models/config.py`: Configuration data models, including `AgentInstructions`
- `cli/models/job.py`: Job tracking models
- `fs.py`: File system operations
- `validation.py`: Input validation
- `progress.py`: Progress tracking
- `logging.py`: Logging configuration
The `AgentInstructions` model (`cli/models/config.py`) enables customization:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AgentInstructions:
    include_patterns: Optional[List[str]] = None   # e.g., ["*.cs"]
    exclude_patterns: Optional[List[str]] = None   # e.g., ["*Tests*"]
    focus_modules: Optional[List[str]] = None      # e.g., ["src/core"]
    doc_type: Optional[str] = None                 # api, architecture, etc.
    custom_instructions: Optional[str] = None      # Free-form text
```

How it flows through the system:
- CLI Options (`generate.py`) → Runtime `AgentInstructions`
- Persistent Config (`~/.codewiki/config.json`) → Default `AgentInstructions`
- Backend Config (`src/config.py`) → `agent_instructions` dict
- Dependency Analyzer → Uses `include_patterns` and `exclude_patterns` for file filtering
- Agent Orchestrator → Injects `custom_instructions` into LLM prompts
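One plausible way to combine the persistent defaults with runtime CLI options is a field-by-field merge in which any value set on the command line wins. This precedence rule and the `merge_instructions` helper are assumptions for illustration; check `generate.py` for the actual behavior. The dataclass repeats the fields of `AgentInstructions` so the example runs on its own.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class AgentInstructions:
    include_patterns: Optional[List[str]] = None
    exclude_patterns: Optional[List[str]] = None
    focus_modules: Optional[List[str]] = None
    doc_type: Optional[str] = None
    custom_instructions: Optional[str] = None


def merge_instructions(defaults: AgentInstructions, runtime: AgentInstructions) -> AgentInstructions:
    """Hypothetical merge: a runtime (CLI) value wins whenever it is set."""
    merged = {}
    for f in fields(AgentInstructions):
        runtime_value = getattr(runtime, f.name)
        # Fall back to the persisted default only when the CLI left the field unset.
        merged[f.name] = runtime_value if runtime_value is not None else getattr(defaults, f.name)
    return AgentInstructions(**merged)


defaults = AgentInstructions(exclude_patterns=["*Tests*"], doc_type="architecture")
runtime = AgentInstructions(doc_type="api")
print(merge_instructions(defaults, runtime))
```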
To add new customization options to the agent instructions system:

- Update the model in `cli/models/config.py`:

  ```python
  @dataclass
  class AgentInstructions:
      # ... existing fields ...
      new_option: Optional[str] = None  # Add new field
  ```

- Update the serialization methods (`to_dict`, `from_dict`, `is_empty`, `get_prompt_addition`).
- Add CLI options in `cli/commands/generate.py` and `cli/commands/config.py`.
- Update the backend `Config` (`src/config.py`) if the option affects analysis.
- Use the new option in the relevant components:
  - File filtering → `dependency_analyzer/ast_parser.py`
  - Prompts → `be/prompt_template.py`
  - Agent creation → `be/agent_orchestrator.py`
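For the serialization step, here is a minimal sketch of the four methods named above. It uses a trimmed-down `AgentInstructions` that includes the hypothetical `new_option`. The method names come from this guide. The method bodies, including the prompt wording in `get_prompt_addition`, are assumptions rather than CodeWiki's code.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class AgentInstructions:
    # Trimmed to three fields for illustration; new_option is the hypothetical addition.
    doc_type: Optional[str] = None
    custom_instructions: Optional[str] = None
    new_option: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Drop unset fields so the persisted JSON stays minimal.
        return {k: v for k, v in asdict(self).items() if v is not None}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "AgentInstructions":
        # Ignore unknown keys so config files written by other versions still load.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    def is_empty(self) -> bool:
        return not self.to_dict()

    def get_prompt_addition(self) -> str:
        # Hypothetical prompt wording; one line per set option.
        lines: List[str] = []
        if self.doc_type:
            lines.append(f"Focus on {self.doc_type} documentation.")
        if self.new_option:
            lines.append(f"New option: {self.new_option}")
        if self.custom_instructions:
            lines.append(self.custom_instructions)
        return "\n".join(lines)


instructions = AgentInstructions(new_option="x")
print(instructions.to_dict(), instructions.get_prompt_addition())
```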
To add support for a new programming language:

- Add a language analyzer in `src/be/dependency_analyzer/analyzers/`:

  ```python
  # new_language.py
  from .base import BaseAnalyzer


  class NewLanguageAnalyzer(BaseAnalyzer):
      def __init__(self):
          super().__init__("new_language")

      def extract_dependencies(self, ast_node):
          # Implement dependency extraction
          pass

      def extract_components(self, ast_node):
          # Implement component extraction
          pass
  ```

- Register the analyzer in `src/be/dependency_analyzer/ast_parser.py`:

  ```python
  LANGUAGE_ANALYZERS = {
      # ... existing languages ...
      "new_language": NewLanguageAnalyzer,
  }
  ```

- Add the file extensions to the configuration.
- Add tests for the new language.
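The sketch below shows how registration and file extensions fit together. It maps a file extension to a language key, then maps that key to an analyzer class from `LANGUAGE_ANALYZERS`. `BaseAnalyzer` is redefined locally so the snippet runs on its own. The `.nl` extension, the `EXTENSION_TO_LANGUAGE` table, and `analyzer_for` are all hypothetical, and the real configuration may be structured differently.

```python
from pathlib import Path
from typing import Dict, Optional, Type


class BaseAnalyzer:
    """Local stand-in for the project's BaseAnalyzer so the sketch runs on its own."""

    def __init__(self, language: str):
        self.language = language


class NewLanguageAnalyzer(BaseAnalyzer):
    def __init__(self):
        super().__init__("new_language")


LANGUAGE_ANALYZERS: Dict[str, Type[BaseAnalyzer]] = {
    "new_language": NewLanguageAnalyzer,
}

# Hypothetical extension table; ".nl" is a made-up extension for the new language.
EXTENSION_TO_LANGUAGE: Dict[str, str] = {
    ".nl": "new_language",
}


def analyzer_for(path: str) -> Optional[BaseAnalyzer]:
    """Return an analyzer instance chosen by file extension, or None if unsupported."""
    language = EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())
    analyzer_cls = LANGUAGE_ANALYZERS.get(language) if language else None
    return analyzer_cls() if analyzer_cls else None


print(type(analyzer_for("src/main.nl")).__name__)
```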
```bash
# Run all tests
pytest

# Run specific test file
pytest tests/test_dependency_analyzer.py

# Run with coverage (requires the pytest-cov plugin)
pytest --cov=codewiki tests/
```

- Follow PEP 8 for Python code
- Use type hints where applicable
- Write docstrings for public functions and classes
- Keep functions focused and modular
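As an example of these guidelines, here is a small, typed, documented function with pytest-style tests. The tests are plain `assert` statements inside `test_*` functions, which `pytest` collects automatically from `test_*.py` files. `filter_files` is a hypothetical stand-in for include/exclude filtering based on `fnmatch` glob matching. CodeWiki's own filtering lives in the dependency analyzer and may match patterns differently.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional


def filter_files(
    paths: Iterable[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Keep paths that match any include pattern and no exclude pattern."""
    kept: List[str] = []
    for path in paths:
        if include and not any(fnmatch(path, pattern) for pattern in include):
            continue
        if exclude and any(fnmatch(path, pattern) for pattern in exclude):
            continue
        kept.append(path)
    return kept


def test_include_only_cs_files() -> None:
    assert filter_files(["a.cs", "b.py"], include=["*.cs"]) == ["a.cs"]


def test_exclude_test_files() -> None:
    paths = ["src/Core.cs", "src/CoreTests.cs"]
    assert filter_files(paths, include=["*.cs"], exclude=["*Tests*"]) == ["src/Core.cs"]
```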
- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Make your changes
- Write/update tests
- Ensure tests pass: `pytest`
- Commit your changes: `git commit -am 'Add new feature'` (`-a` stages only already-tracked files, so run `git add` on any new files first)
- Push to the branch: `git push origin feature/your-feature`
- Submit a pull request
```mermaid
graph TB
    A[Repository Input] --> B[Dependency Graph Construction]
    B --> C[Hierarchical Decomposition]
    C --> D[Module Tree]
    D --> E[Recursive Agent Processing]
    E --> F{Complexity Check}
    F -->|Complex| G[Dynamic Delegation]
    F -->|Simple| H[Generate Documentation]
    G --> E
    H --> I[Cross-Module References]
    I --> J[Hierarchical Assembly]
    J --> K[Comprehensive Documentation]
```
```bash
# CLI
codewiki generate --verbose

# Environment variable
export CODEWIKI_LOG_LEVEL=DEBUG
```

Tree-sitter parser errors:
- Ensure language parsers are properly installed
- Check file encoding (UTF-8 expected)
LLM API errors:
- Verify API keys and endpoints
- Check rate limits
- Retry transient failures (e.g., rate-limit errors), ideally with exponential backoff
Memory issues with large repositories:
- Adjust module decomposition threshold
- Increase delegation depth limit
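To follow the retry advice for LLM API errors above, you can wrap calls in a generic exponential-backoff helper. This is a sketch, not CodeWiki's `llm_services.py`. `with_retries` is hypothetical. `ConnectionError` and `TimeoutError` stand in for whatever rate-limit or network exceptions your LLM client actually raises; pass those via `retry_on`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call `call()`, retrying the listed exceptions with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            # Waits base_delay * 1, 2, 4, ... plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Jitter spreads out retries from concurrent workers, so they do not all hit a rate-limited endpoint at the same moment.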
- Caching: Results are cached to avoid redundant processing
- Parallel Processing: Multiple modules can be processed concurrently
- Incremental Updates: Only process changed modules (future work)
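The caching and incremental-update ideas can both rest on a content hash: if a module's source has not changed, its earlier documentation can be reused. The in-memory `DocCache` below is a hypothetical sketch of that idea, not how CodeWiki caches results. A real cache would also persist to disk and evict stale entries.

```python
import hashlib
from typing import Callable, Dict


class DocCache:
    """In-memory cache keyed by module name plus a hash of the module's source."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}
        self.misses = 0

    @staticmethod
    def _key(module_name: str, source: str) -> str:
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        return f"{module_name}:{digest}"

    def get_or_generate(self, module_name: str, source: str, generate: Callable[[str], str]) -> str:
        key = self._key(module_name, source)
        if key not in self._docs:
            # Only (re)generate when this exact source has not been seen before.
            self.misses += 1
            self._docs[key] = generate(source)
        return self._docs[key]


def fake_llm(source: str) -> str:
    return f"Docs for {len(source)} chars of code"


cache = DocCache()
cache.get_or_generate("core", "def f(): pass", fake_llm)
cache.get_or_generate("core", "def f(): pass", fake_llm)  # Unchanged source: cache hit.
print(cache.misses)
```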
For development questions:
- GitHub Issues: https://github.com/FSoft-AI4Code/CodeWiki/issues
- Main Documentation: `README.md`