:start-after: "## Overview"
:end-before: "## Installation"
graph TD
A[METAINFORMANT Documentation] --> B[Getting Started]
A --> C[User Guides]
A --> D[Module Documentation]
A --> E[Developer Resources]
A --> F[Reference]
B --> B1[Installation]
B --> B2[Quick Start]
B --> B3[Tutorials]
C --> C1[Workflow Guides]
C --> C2[Best Practices]
C --> C3[Troubleshooting]
C3 --> C3a[RNA Troubleshooting]
D --> D1[Core Modules]
D --> D2[Molecular Analysis]
D --> D3[Statistical Methods]
D --> D4[Systems Biology]
D --> D5["Annotation & Metadata"]
D --> D6[Utilities]
E --> E1[Architecture]
E --> E2[Contributing]
E --> E3[Testing]
E --> E4[API Reference]
F --> F1[CLI Reference]
F --> F2[Configuration]
F --> F3[Error Codes]
subgraph "Primary Entry Points"
G1[README.md] -.-> B
G2[QUICKSTART.md] -.-> B
G3[TUTORIALS.md] -.-> C
end
subgraph "Module Categories"
H1["core/"] -.-> D1
H2["dna/, rna/, protein/, epigenome/"] -.-> D2
H3["gwas/, math/, ml/, information/"] -.-> D3
H4["networks/, multiomics/, singlecell/, simulation/"] -.-> D4
H5["ontology/, phenotype/, ecology/, life_events/"] -.-> D5
H6["quality/, visualization/"] -.-> D6
H7["longread/, metagenomics/, structural_variants/, spatial/, pharmacogenomics/, metabolomics/, menu/"] -.-> D4
H8["cloud/"] -.-> D6
end
subgraph "Key Documents"
I1[architecture.md] -.-> E1
I2[testing.md] -.-> E3
I3[cli.md] -.-> F1
I4[UV_SETUP.md] -.-> B1
end
| Category | Module | Description | Key Features |
|---|---|---|---|
| Core | core | Shared utilities and infrastructure | Configuration, I/O, logging, parallel processing, caching |
| DNA | dna | Genomic sequence analysis | Sequences, alignment, phylogeny, population genetics |
| RNA | rna | Transcriptomic analysis | RNA-seq workflows, amalgkit integration |
| Protein | protein | Protein structure and function | Sequences, AlphaFold integration, proteomics |
| Epigenome | epigenome | Epigenetic modifications | Methylation, ChIP-seq, chromatin accessibility |
| GWAS | gwas | Genome-wide association studies | Association testing, quality control, visualization |
| Math | math | Mathematical biology | Population genetics theory, coalescent models |
| ML | ml | Machine learning pipelines | Classification, regression, feature selection |
| Information | information | Information theory | Entropy, mutual information, semantic similarity |
| Networks | networks | Biological networks | PPI, pathways, community detection |
| Multi-omics | multiomics | Multi-omic integration | Joint analysis, data harmonization |
| Single-cell | singlecell | Single-cell genomics | Preprocessing, clustering, trajectory analysis |
| Simulation | simulation | Synthetic data generation | Sequence simulation, agent-based models |
| Ontology | ontology | Functional annotation | Gene Ontology, semantic similarity |
| Phenotype | phenotype | Phenotypic data | Trait analysis, AntWiki integration |
| Ecology | ecology | Ecological analysis | Community diversity, environmental data |
| Life Events | life_events | Temporal event analysis | Life course modeling, embeddings |
| Quality | quality | Data quality assessment | FASTQ analysis, assembly validation |
| Visualization | visualization | Plotting and graphics | 70+ specialized plotting modules |
| Cloud | cloud | Cloud deployment | GCP VM lifecycle, Docker pipelines, genome prep |
| Long-Read | longread | Long-read sequencing | PacBio/ONT, assembly, error correction |
| Metagenomics | metagenomics | Metagenomic analysis | Taxonomic profiling, functional annotation |
| Structural Variants | structural_variants | SV/CNV analysis | Detection, breakpoint resolution |
| Spatial | spatial | Spatial transcriptomics | Tissue mapping, spatial statistics |
| Pharmacogenomics | pharmacogenomics | Clinical genomics | Drug-gene interactions, variant interpretation |
| Metabolomics | metabolomics | Metabolomic analysis | MS data processing, pathway mapping |
| eQTL | eqtl | eQTL integration (cross-cutting) | RNA×GWAS integration — logic in gwas and multiomics |
| MCP | mcp | Model Context Protocol | LLM tool integrations |
| Menu | menu | Interactive navigation | CLI menu system, workflow discovery |
flowchart TD
A[What are you analyzing?] --> B{Data Type}
B -->|Sequences<br/>FASTA/FASTQ| C[DNA? RNA? Proteins?]
B -->|Variants<br/>VCF| D[GWAS / Population Genetics]
B -->|Annotations<br/>GFF/GTF| E[Functional / Ontology]
B -->|Counts<br/>Matrix| F[Expression / Abundance]
B -->|Networks<br/>Edges| G[Pathways / Interactions]
C -->|DNA| H["dna<br/>Sequences, alignment, trees"]
C -->|RNA| I["rna<br/>Amalgkit, ENA/SRA"]
C -->|Proteins| J["protein<br/>Structure + function"]
C -->|Epigenetic| K["epigenome<br/>Bisulfite, ChIP-seq"]
D --> L["gwas<br/>Association testing"]
D --> M["multiomics<br/>eQTL, integration"]
E --> N["ontology<br/>GO, KEGG, enrichment"]
E --> O["phenotype<br/>Trait analysis"]
F --> P["singlecell<br/>scRNA-seq, clustering"]
F --> Q["metagenomics<br/>16S, metagenome"]
G --> R["networks<br/>Graph + pathway analysis"]
style H fill:#e1f5ff
style I fill:#e1f5ff
style J fill:#e1f5ff
style K fill:#e1f5ff
style L fill:#e1f5ff
style M fill:#e1f5ff
style N fill:#e1f5ff
style O fill:#e1f5ff
style P fill:#e1f5ff
style Q fill:#e1f5ff
style R fill:#e1f5ff
Click any module name for detailed documentation.
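The decision flowchart can also be read as a lookup table. The sketch below is a hypothetical helper, not part of the METAINFORMANT API: `suggest_modules`, `MODULES_BY_DATA_TYPE`, and the extension mapping are illustrative assumptions that restate the branches of the diagram. Counts matrices and network edge lists have no single standard extension, so they are reachable only by data-type name.

```python
from pathlib import Path

# Hypothetical tables restating the flowchart; not a METAINFORMANT API.
MODULES_BY_DATA_TYPE = {
    "sequences": ["dna", "rna", "protein", "epigenome"],
    "variants": ["gwas", "multiomics"],
    "annotations": ["ontology", "phenotype"],
    "counts": ["singlecell", "metagenomics"],
    "networks": ["networks"],
}

# Assumed extension-to-data-type mapping for the formats named in the diagram.
DATA_TYPE_BY_EXTENSION = {
    ".fasta": "sequences",
    ".fastq": "sequences",
    ".vcf": "variants",
    ".gff": "annotations",
    ".gtf": "annotations",
}


def suggest_modules(path: str) -> list[str]:
    """Return candidate module names for a file, or an empty list if unknown."""
    name = Path(path).name.lower()
    # Look through a trailing gzip suffix, e.g. reads.fastq.gz -> reads.fastq
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    data_type = DATA_TYPE_BY_EXTENSION.get(Path(name).suffix)
    if data_type is None:
        return []
    # Return a copy so callers cannot mutate the shared table.
    return list(MODULES_BY_DATA_TYPE[data_type])


if __name__ == "__main__":
    for example in ("reads.fastq.gz", "calls.vcf", "genes.gtf", "notes.txt"):
        print(example, "->", suggest_modules(example))
```

The result is a list of candidates, because the flowchart itself still asks a follow-up question (DNA, RNA, or protein?) after the file format is known.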
graph LR
A[Raw Data] --> B[Ingestion]
B --> C[Validation]
C --> D{Data Type}
D -->|Genomic| E[DNA Pipeline]
D -->|Transcriptomic| F[RNA Pipeline]
D -->|Proteomic| G[Protein Pipeline]
D -->|Phenotypic| H[Phenotype Pipeline]
E --> I[Quality Control]
F --> I
G --> I
H --> I
I --> J[Analysis]
J --> K{Integration Level}
K -->|Single-omic| L[Individual Results]
K -->|Multi-omic| M[Integrated Results]
L --> N[Visualization]
M --> N
N --> O[Publication]
O --> P[Scientific Insights]
subgraph "Data Sources"
Q[NCBI] -.-> E
R[SRA] -.-> F
S[PDB] -.-> G
T[AntWiki] -.-> H
end
subgraph "Analysis Types"
U[Sequence Analysis] -.-> E
V[Expression Analysis] -.-> F
W[Structure Analysis] -.-> G
X[Trait Analysis] -.-> H
end
subgraph "Integration Methods"
Y[GWAS] -.-> K
Z[Networks] -.-> K
AA[ML] -.-> K
BB[Systems Biology] -.-> K
end
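The first half of the data flow (validation, routing by data type, then a shared quality-control step) can be sketched as plain function composition. Every name below (`validate`, `PIPELINES`, `quality_control`, `run`) is hypothetical and mirrors a box in the diagram. None of them is a real METAINFORMANT function, and the pipeline bodies are placeholders.

```python
from typing import Callable

Record = dict  # Hypothetical record shape: {"data_type": str, "payload": str}


def validate(record: Record) -> Record:
    """Validation step: reject records missing the fields used for routing."""
    for key in ("data_type", "payload"):
        if key not in record:
            raise ValueError(f"record is missing required field {key!r}")
    return record


def make_pipeline(name: str) -> Callable[[Record], Record]:
    """Build a placeholder pipeline that only tags the record with its name."""
    def pipeline(record: Record) -> Record:
        return {**record, "pipeline": name}
    return pipeline


# Routing table for the "Data Type" decision node (placeholder pipelines).
PIPELINES = {
    "genomic": make_pipeline("dna"),
    "transcriptomic": make_pipeline("rna"),
    "proteomic": make_pipeline("protein"),
    "phenotypic": make_pipeline("phenotype"),
}


def quality_control(record: Record) -> Record:
    """Shared QC step; here it only checks that the payload is non-empty."""
    return {**record, "qc_passed": bool(record["payload"])}


def run(record: Record) -> Record:
    """Validate, route to the matching pipeline, then apply quality control."""
    record = validate(record)
    pipeline = PIPELINES.get(record["data_type"])
    if pipeline is None:
        raise ValueError(f"unsupported data type: {record['data_type']!r}")
    return quality_control(pipeline(record))


if __name__ == "__main__":
    print(run({"data_type": "genomic", "payload": "ATCGATCG"}))
```

Keeping the routing in a dictionary means a new data type can be added with one entry, without touching the validation or QC steps.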
METAINFORMANT uses uv for Python package management. Install the package with uv, or install from a source checkout in editable mode:
# Install into the active environment
uv pip install metainformant
# From source
git clone https://github.com/docxology/MetaInformAnt.git
cd MetaInformAnt
uv pip install -e .

from metainformant.dna.sequence import composition
from metainformant.rna.engine.workflow import AmalgkitWorkflowConfig, execute_workflow

# DNA: GC content of a short sequence
seq = "ATCGATCGATCG"
gc = composition.gc_content(seq)
print(f"GC content: {gc:.2f}")

# RNA: configure and run an amalgkit workflow
config = AmalgkitWorkflowConfig(
    work_dir="output/rna_analysis",
    species_list=["Apis_mellifera"],
)
results = execute_workflow(config)

For a production-grade end-to-end workflow covering genomic association studies, see the Integration Guide. It demonstrates how to:
- Parse & filter VCF files (dna.variants)
- Run mixed-model association (gwas.analysis)
- Construct fine-mapping credible sets (gwas.finemapping)
- Generate Manhattan, QQ, and LocusZoom plots (visualization)
- Orchestrate distributed processing with caching
The guide provides both minimal and full-featured implementations with performance benchmarks and troubleshooting tips.
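As background for the fine-mapping step, a credible set is commonly built by ranking variants by posterior inclusion probability (PIP) and keeping the smallest top-ranked group whose cumulative probability reaches a target coverage. The sketch below shows that textbook construction under a single-causal-variant assumption. The function `credible_set` and the example variant IDs are hypothetical, and this is not necessarily how `gwas.finemapping` implements it.

```python
def credible_set(pips: dict[str, float], coverage: float = 0.95) -> list[str]:
    """Return the smallest top-ranked set of variants reaching the coverage.

    `pips` maps variant IDs to posterior inclusion probabilities. Values are
    renormalized to sum to 1, which assumes exactly one causal variant.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(pips.values())
    if total <= 0.0:
        raise ValueError("PIPs must sum to a positive value")

    # Rank variants from most to least probable.
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)

    selected: list[str] = []
    cumulative = 0.0
    for variant, pip in ranked:
        selected.append(variant)
        cumulative += pip / total
        if cumulative >= coverage:
            break
    return selected


if __name__ == "__main__":
    # Made-up PIPs for illustration only.
    example = {"rs1": 0.70, "rs2": 0.20, "rs3": 0.06, "rs4": 0.04}
    print(credible_set(example))        # 95% credible set
    print(credible_set(example, 0.5))   # 50% credible set
```

A smaller credible set indicates a better-resolved signal. Methods that allow several causal variants per locus build one set per signal instead.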
The metainformant command exposes a small CLI (--version, --modules, protein, quality batch-detect, rna info, gwas info). RNA and GWAS pipelines use Python APIs or scripts/*/run_*.py. See cli.md.
uv run metainformant --help
uv run metainformant protein comp --fasta data/example.faa
python3 scripts/rna/run_workflow.py --config config/amalgkit/amalgkit_pogonomyrmex_barbatus.yaml
:maxdepth: 2
:caption: User Guides
setup
UV_SETUP
DOCUMENTATION_GUIDE
TUTORIALS
INTEGRATION
FAQ
DISK_SPACE_MANAGEMENT
LINUX_TRANSFER
rna/amalgkit/TROUBLESHOOTING
ERROR_HANDLING
:maxdepth: 2
:caption: Reference
architecture
cli
SPEC
ORCHESTRATION
COMPARISON_GUIDES
NO_MOCKING_POLICY
:maxdepth: 2
:caption: Modules
core/index
dna/index
rna/index
protein/index
gwas/index
epigenome/index
eqtl/README
ontology/index
phenotype/index
ecology/index
math/index
information/index
ml/index
networks/index
multiomics/index
singlecell/index
quality/index
visualization/index
simulation/index
life_events/index
longread/index
cloud/index
mcp/index
metagenomics/index
structural_variants/index
spatial/index
pharmacogenomics/index
metabolomics/index
menu/index
cloud/README
agents/rules/index
:maxdepth: 2
:caption: Development
testing
- DNA Analysis: Sequence composition, alignments, phylogenetics, population genetics
- RNA Analysis: Transcriptome quantification, differential expression, cross-species analysis
- Protein Analysis: Structure prediction, domain analysis, functional annotation
- GWAS: Genome-wide association studies with population structure correction
- Epigenomics: DNA methylation analysis, chromatin accessibility
- Systems Biology: Network analysis, pathway enrichment, multi-omics integration
- Real Implementations: No mocks or fakes - actual external API calls and tool integration
- Scalable: Parallel processing, memory-efficient algorithms for large datasets
- Robust: Comprehensive error handling and validation
- Tested: Extensive test suite with real-world validation
- Type Hints: Full type annotation throughout codebase
- Documentation: Comprehensive docstrings and API documentation
- CLI: Intuitive command-line interface with subcommands
- Modular: Clean separation of concerns, easy to extend
- Scientific Rigor: Algorithms validated against established methods
- Reproducible: Version-controlled configurations and deterministic workflows
- Standards Compliant: Follows bioinformatics best practices and data formats
graph TB
subgraph "User Interfaces"
CLI[Command-Line Interface]
API[Python API]
Scripts[Workflow Scripts]
end
subgraph Core["Core Framework"]
Workflow[Workflow Orchestration]
Config[Configuration Management]
IO["Input/Output Utilities"]
Logging[Structured Logging]
end
subgraph "Domain Modules"
DNA[DNA Analysis]
RNA[RNA Analysis]
PROT[Protein Analysis]
GWAS[GWAS Analysis]
EPI[Epigenomics Analysis]
ONT[Ontology Analysis]
PHENO[Phenotype Analysis]
ECOL[Ecology Analysis]
MATH[Math Biology]
ML[Machine Learning]
NET[Networks Analysis]
SC[Single-Cell Analysis]
QUAL[Quality Control]
VIZ[Visualization Module]
SIM[Simulation Module]
LE[Life Events Analysis]
end
CLI --> Workflow
API --> Workflow
Scripts --> Workflow
Workflow --> Config
Workflow --> IO
Workflow --> Logging
DNA --> Core
RNA --> Core
PROT --> Core
GWAS --> Core
EPI --> Core
ONT --> Core
PHENO --> Core
ECOL --> Core
MATH --> Core
ML --> Core
NET --> Core
SC --> Core
QUAL --> Core
VIZ --> Core
SIM --> Core
LE --> Core
LR[Long-Read Analysis] --> Core
METAG[Metagenomics Analysis] --> Core
SVA[Structural Variants] --> Core
SPAT[Spatial Transcriptomics] --> Core
PHARM[Pharmacogenomics] --> Core
METAB[Metabolomics] --> Core
MENUX[Menu System] --> Core
CLOUD[Cloud Deployment] --> Core
:maxdepth: 1
:caption: Task Reference
tasks/analyze_dna
tasks/run_rna_pipeline
tasks/run_gwas
tasks/deploy_cloud
tasks/visualize_results
tasks/mcp_integration
tasks/performance_tuning
tasks/data_conversion
If you'd like to contribute, see CONTRIBUTING.md.
- GitHub Issues: Report bugs and request features
- Discussions: Ask questions and share ideas
- Documentation: Comprehensive guides and API reference
- Contributing Guide: How to contribute to METAINFORMANT
- Development Setup: Setting up development environment
- Testing Guide: Running and writing tests
- API Design: Understanding the codebase architecture
METAINFORMANT is a comprehensive bioinformatics toolkit designed for modern multi-omics research. Whether you're analyzing genomes, transcriptomes, proteomes, or integrating multi-omics datasets, METAINFORMANT provides the tools and workflows you need.