METAINFORMANT Documentation


Documentation Navigation

graph TD
    A[METAINFORMANT Documentation] --> B[Getting Started]
    A --> C[User Guides]
    A --> D[Module Documentation]
    A --> E[Developer Resources]
    A --> F[Reference]

    B --> B1[Installation]
    B --> B2[Quick Start]
    B --> B3[Tutorials]

    C --> C1[Workflow Guides]
    C --> C2[Best Practices]
    C --> C3[Troubleshooting]
    C3 --> C3a[RNA Troubleshooting]

    D --> D1[Core Modules]
    D --> D2[Molecular Analysis]
    D --> D3[Statistical Methods]
    D --> D4[Systems Biology]
    D --> D5[Annotation & Metadata]
    D --> D6[Utilities]

    E --> E1[Architecture]
    E --> E2[Contributing]
    E --> E3[Testing]
    E --> E4[API Reference]

    F --> F1[CLI Reference]
    F --> F2[Configuration]
    F --> F3[Error Codes]


    subgraph "Primary Entry Points"
        G1[README.md] -.-> B
        G2[QUICKSTART.md] -.-> B
        G3[TUTORIALS.md] -.-> C
    end

    subgraph "Module Categories"
        H1["core/"] -.-> D1
        H2["dna/, rna/, protein/, epigenome/"] -.-> D2
        H3["gwas/, math/, ml/, information/"] -.-> D3
        H4["networks/, multiomics/, singlecell/, simulation/"] -.-> D4
        H5["ontology/, phenotype/, ecology/, life_events/"] -.-> D5
        H6["quality/, visualization/"] -.-> D6
        H7["longread/, metagenomics/, structural_variants/, spatial/, pharmacogenomics/, metabolomics/, menu/"] -.-> D4
        H8["cloud/"] -.-> D6
    end

    subgraph "Key Documents"
        I1[architecture.md] -.-> E1
        I2[testing.md] -.-> E3
        I3[cli.md] -.-> F1
        I4[UV_SETUP.md] -.-> B1
    end

Module Overview Matrix

Category	Module	Description	Key Features
Core	core	Shared utilities and infrastructure	Configuration, I/O, logging, parallel processing, caching
DNA	dna	Genomic sequence analysis	Sequences, alignment, phylogeny, population genetics
RNA	rna	Transcriptomic analysis	RNA-seq workflows, amalgkit integration
Protein	protein	Protein structure and function	Sequences, AlphaFold integration, proteomics
Epigenome	epigenome	Epigenetic modifications	Methylation, ChIP-seq, chromatin accessibility
GWAS	gwas	Genome-wide association studies	Association testing, quality control, visualization
Math	math	Mathematical biology	Population genetics theory, coalescent models
ML	ml	Machine learning pipelines	Classification, regression, feature selection
Information	information	Information theory	Entropy, mutual information, semantic similarity
Networks	networks	Biological networks	PPI, pathways, community detection
Multi-omics	multiomics	Multi-omic integration	Joint analysis, data harmonization
Single-cell	singlecell	Single-cell genomics	Preprocessing, clustering, trajectory analysis
Simulation	simulation	Synthetic data generation	Sequence simulation, agent-based models
Ontology	ontology	Functional annotation	Gene Ontology, semantic similarity
Phenotype	phenotype	Phenotypic data	Trait analysis, AntWiki integration
Ecology	ecology	Ecological analysis	Community diversity, environmental data
Life Events	life_events	Temporal event analysis	Life course modeling, embeddings
Quality	quality	Data quality assessment	FASTQ analysis, assembly validation
Visualization	visualization	Plotting and graphics	70+ specialized plotting modules
Cloud	cloud	Cloud deployment	GCP VM lifecycle, Docker pipelines, genome prep
Long-Read	longread	Long-read sequencing	PacBio/ONT, assembly, error correction
Metagenomics	metagenomics	Metagenomic analysis	Taxonomic profiling, functional annotation
Structural Variants	structural_variants	SV/CNV analysis	Detection, breakpoint resolution
Spatial	spatial	Spatial transcriptomics	Tissue mapping, spatial statistics
Pharmacogenomics	pharmacogenomics	Clinical genomics	Drug-gene interactions, variant interpretation
Metabolomics	metabolomics	Metabolomic analysis	MS data processing, pathway mapping
eQTL	eqtl	eQTL integration (cross-cutting)	RNA×GWAS integration — logic in `gwas` and `multiomics`
MCP	mcp	Model Context Protocol	LLM tool integrations
Menu	menu	Interactive navigation	CLI menu system, workflow discovery

Module Selection Decision Tree

flowchart TD
    A[What are you analyzing?] --> B{Data Type}
    
    B -->|Sequences<br/>FASTA/FASTQ| C[DNA? RNA? Proteins?]
    B -->|Variants<br/>VCF| D[GWAS / Population Genetics]
    B -->|Annotations<br/>GFF/GTF| E[Functional / Ontology]
    B -->|Counts<br/>Matrix| F[Expression / Abundance]
    B -->|Networks<br/>Edges| G[Pathways / Interactions]
    
    C -->|DNA| H["dna<br/>Sequences, alignment, trees"]
    C -->|RNA| I["rna<br/>Amalgkit, ENA/SRA"]
    C -->|Proteins| J["protein<br/>Structure + function"]
    C -->|Epigenetic| K["epigenome<br/>Bisulfite, ChIP-seq"]
    
    D --> L["gwas<br/>Association testing"]
    D --> M["multiomics<br/>eQTL, integration"]
    
    E --> N["ontology<br/>GO, KEGG, enrichment"]
    E --> O["phenotype<br/>Trait analysis"]
    
    F --> P["singlecell<br/>scRNA-seq, clustering"]
    F --> Q["metagenomics<br/>16S, metagenome"]
    
    G --> R["networks<br/>Graph + pathway analysis"]
    
    style H fill:#e1f5ff
    style I fill:#e1f5ff
    style J fill:#e1f5ff
    style K fill:#e1f5ff
    style L fill:#e1f5ff
    style M fill:#e1f5ff
    style N fill:#e1f5ff
    style O fill:#e1f5ff
    style P fill:#e1f5ff
    style Q fill:#e1f5ff
    style R fill:#e1f5ff

Click any module name for detailed documentation.
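
The file-format branches of the decision tree can be approximated with a small lookup. The sketch below is illustrative only: `SUFFIX_TO_MODULES` and `suggest_modules` are hypothetical names, not part of METAINFORMANT, and the suffix list is an assumption. Count matrices and network edge lists are left out because they have no distinctive file extension.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping that mirrors the decision tree above.
# It is NOT a METAINFORMANT API; the suffixes chosen are an assumption.
SUFFIX_TO_MODULES: Dict[str, List[str]] = {
    ".fasta": ["dna", "rna", "protein", "epigenome"],
    ".fa": ["dna", "rna", "protein", "epigenome"],
    ".fastq": ["dna", "rna", "protein", "epigenome"],
    ".vcf": ["gwas", "multiomics"],
    ".gff": ["ontology", "phenotype"],
    ".gtf": ["ontology", "phenotype"],
}

COMPRESSION_SUFFIXES = {".gz", ".bz2"}


def suggest_modules(path: str) -> List[str]:
    """Return candidate module names for a file, based on its extension."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    # Walk from the last suffix backwards, skipping compression suffixes,
    # so that "calls.vcf.gz" is treated as a VCF file.
    for suffix in reversed(suffixes):
        if suffix in COMPRESSION_SUFFIXES:
            continue
        return list(SUFFIX_TO_MODULES.get(suffix, []))
    return []


print(suggest_modules("calls.vcf.gz"))
print(suggest_modules("genes.gtf"))
```

An empty list means the extension alone does not identify a branch, in which case the data type has to be chosen by hand.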

Data Flow Architecture

graph LR
    A[Raw Data] --> B[Ingestion]
    B --> C[Validation]
    C --> D{Data Type}

    D -->|Genomic| E[DNA Pipeline]
    D -->|Transcriptomic| F[RNA Pipeline]
    D -->|Proteomic| G[Protein Pipeline]
    D -->|Phenotypic| H[Phenotype Pipeline]

    E --> I[Quality Control]
    F --> I
    G --> I
    H --> I

    I --> J[Analysis]
    J --> K{Integration Level}

    K -->|Single-omic| L[Individual Results]
    K -->|Multi-omic| M[Integrated Results]

    L --> N[Visualization]
    M --> N

    N --> O[Publication]
    O --> P[Scientific Insights]


    subgraph "Data Sources"
        Q[NCBI] -.-> E
        R[SRA] -.-> F
        S[PDB] -.-> G
        T[AntWiki] -.-> H
    end

    subgraph "Analysis Types"
        U[Sequence Analysis] -.-> E
        V[Expression Analysis] -.-> F
        W[Structure Analysis] -.-> G
        X[Trait Analysis] -.-> H
    end

    subgraph "Integration Methods"
        Y[GWAS] -.-> K
        Z[Networks] -.-> K
        AA[ML] -.-> K
        BB[Systems Biology] -.-> K
    end
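
The stage order in the diagram above can be read as a chain of functions: ingest, validate, route by data type, then the shared downstream stages. The following sketch only illustrates that ordering. Every function name and the `"kind: payload"` input convention are assumptions made for the example and do not correspond to METAINFORMANT code.

```python
from typing import Dict, List

# Data-type labels and pipeline names are taken from the diagram;
# everything else here is a hypothetical illustration.
PIPELINES: Dict[str, str] = {
    "genomic": "DNA Pipeline",
    "transcriptomic": "RNA Pipeline",
    "proteomic": "Protein Pipeline",
    "phenotypic": "Phenotype Pipeline",
}


def ingest(raw: str) -> Dict[str, str]:
    """Split a 'kind: payload' string into a record (assumed input format)."""
    kind, _, payload = raw.partition(":")
    return {"kind": kind.strip().lower(), "payload": payload.strip()}


def validate(record: Dict[str, str]) -> Dict[str, str]:
    """Reject records with no payload before any pipeline is chosen."""
    if not record["payload"]:
        raise ValueError("empty payload")
    return record


def route(record: Dict[str, str]) -> str:
    """Pick the pipeline that matches the record's data type."""
    try:
        return PIPELINES[record["kind"]]
    except KeyError:
        raise ValueError(f"unknown data type: {record['kind']!r}") from None


def plan(raw: str) -> List[str]:
    """Return the ordered list of stages a raw input would pass through."""
    record = validate(ingest(raw))
    return [
        "Ingestion",
        "Validation",
        route(record),
        "Quality Control",
        "Analysis",
        "Visualization",
    ]


print(plan("Genomic: ATCGATCG"))
```

Routing happens after validation, so malformed input fails before any type-specific pipeline is selected.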

Quick Start

Installation

METAINFORMANT uses uv for Python package management. Install the package into the active environment with uv pip install metainformant, or clone the repository and install from source with uv pip install -e .:

# Install into the active environment
uv pip install metainformant

# From source (git clone creates a directory named MetaInformAnt)
git clone https://github.com/docxology/MetaInformAnt.git
cd MetaInformAnt
uv pip install -e .

Basic Usage

from metainformant.dna.sequence import composition
from metainformant.rna.engine.workflow import AmalgkitWorkflowConfig, execute_workflow

# DNA: compute the GC content of a short sequence
seq = "ATCGATCGATCG"
gc = composition.gc_content(seq)
print(f"GC content: {gc:.2f}")

# RNA: configure and run an amalgkit workflow for a single species
config = AmalgkitWorkflowConfig(
    work_dir="output/rna_analysis",
    species_list=["Apis_mellifera"],
)
results = execute_workflow(config)
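
For readers who want to see what the GC content step computes, here is a dependency-free sketch. It assumes that `composition.gc_content` returns a fraction between 0 and 1, which the `:.2f` formatting above suggests but this page does not confirm. `gc_fraction` is a hypothetical stand-in, not the library function.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Same example sequence as above: 3 G and 3 C out of 12 bases.
print(f"GC content: {gc_fraction('ATCGATCGATCG'):.2f}")  # prints "GC content: 0.50"
```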

Complete Pipeline Example (DNA → GWAS → Visualization)

For a production-grade end-to-end workflow covering genomic association studies, see the Integration Guide. It demonstrates how to:

Parse & filter VCF files (dna.variants)
Run mixed-model association (gwas.analysis)
Construct fine-mapping credible sets (gwas.finemapping)
Generate Manhattan, QQ, and LocusZoom plots (visualization)
Orchestrate distributed processing with caching

The guide provides both minimal and full-featured implementations with performance benchmarks and troubleshooting tips.
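
As a rough idea of what the first step (parsing and filtering VCF records) involves, the sketch below filters plain-text VCF lines by the QUAL and FILTER columns using only the standard library. It is not the `dna.variants` API. The function name `filter_vcf_lines` and the default threshold of 30 are assumptions chosen for illustration.

```python
from typing import Iterable, Iterator


def filter_vcf_lines(lines: Iterable[str], min_qual: float = 30.0) -> Iterator[str]:
    """Yield header lines plus records that pass a simple quality filter.

    VCF records are tab-separated with fixed leading columns:
    CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the column header
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip malformed records
        qual, filt = fields[5], fields[6]
        if qual == ".":
            continue  # missing quality: treated as failing in this sketch
        if float(qual) >= min_qual and filt in ("PASS", "."):
            yield line


vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t200\t.\tC\tT\t10\tPASS\t.\n",
    "chr1\t300\t.\tG\tA\t99\tLowQual\t.\n",
]
kept = list(filter_vcf_lines(vcf))
print(len(kept))  # prints 3: two header lines and the chr1:100 record
```

Real pipelines usually also filter on INFO or per-sample fields, which this sketch does not attempt.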

Command Line Interface

The metainformant command exposes a small CLI (--version, --modules, protein, quality batch-detect, rna info, gwas info). Full RNA and GWAS pipelines are not run through the CLI; use the Python APIs or the scripts matching scripts/*/run_*.py instead. See cli.md for details.

uv run metainformant --help
uv run metainformant protein comp --fasta data/example.faa
python3 scripts/rna/run_workflow.py --config config/amalgkit/amalgkit_pogonomyrmex_barbatus.yaml

Documentation Contents

User Guides

:maxdepth: 2
:caption: User Guides

setup
UV_SETUP
DOCUMENTATION_GUIDE
TUTORIALS
INTEGRATION
FAQ
DISK_SPACE_MANAGEMENT
LINUX_TRANSFER
rna/amalgkit/TROUBLESHOOTING
ERROR_HANDLING

API Reference

:maxdepth: 2
:caption: Reference

architecture
cli
SPEC
ORCHESTRATION
COMPARISON_GUIDES
NO_MOCKING_POLICY

Module Documentation

:maxdepth: 2
:caption: Modules

core/index
dna/index
rna/index
protein/index
gwas/index
epigenome/index
eqtl/README
ontology/index
phenotype/index
ecology/index
math/index
information/index
ml/index
networks/index
multiomics/index
singlecell/index
quality/index
visualization/index
simulation/index
life_events/index
longread/index
cloud/index
mcp/index
metagenomics/index
structural_variants/index
spatial/index
pharmacogenomics/index
metabolomics/index
menu/index
cloud/README
agents/rules/index

Development

:maxdepth: 2
:caption: Development

testing

Key Features

Comprehensive Domain Coverage

DNA Analysis: Sequence composition, alignments, phylogenetics, population genetics
RNA Analysis: Transcriptome quantification, differential expression, cross-species analysis
Protein Analysis: Structure prediction, domain analysis, functional annotation
GWAS: Genome-wide association studies with population structure correction
Epigenomics: DNA methylation analysis, chromatin accessibility
Systems Biology: Network analysis, pathway enrichment, multi-omics integration

Production Ready

Real Implementations: No mocks or fakes - actual external API calls and tool integration
Scalable: Parallel processing, memory-efficient algorithms for large datasets
Robust: Comprehensive error handling and validation
Tested: Extensive test suite with real-world validation

Developer Friendly

Type Hints: Full type annotation throughout codebase
Documentation: Comprehensive docstrings and API documentation
CLI: Intuitive command-line interface with subcommands
Modular: Clean separation of concerns, easy to extend

Research Grade

Scientific Rigor: Algorithms validated against established methods
Reproducible: Version-controlled configurations and deterministic workflows
Standards Compliant: Follows bioinformatics best practices and data formats

Architecture

graph TB
    subgraph "User Interfaces"
        CLI[Command Line Interface]
        API[Python API]
        Scripts[Workflow Scripts]
    end

    subgraph Core["Core Framework"]
        Workflow[Workflow Orchestration]
        Config[Configuration Management]
        IO[Input/Output Utilities]
        Logging[Structured Logging]
    end

    subgraph "Domain Modules"
        DNA[DNA Analysis]
        RNA[RNA Analysis]
        PROT[Protein Analysis]
        GWAS[GWAS Analysis]
        EPI[Epigenomics Analysis]
        ONT[Ontology Analysis]
        PHENO[Phenotype Analysis]
        ECOL[Ecology Analysis]
        MATH[Math Biology]
        ML[Machine Learning]
        NET[Networks Analysis]
        SC[Single-Cell Analysis]
        QUAL[Quality Control]
        VIZ[Visualization Module]
        SIM[Simulation Module]
        LE[Life Events Analysis]
    end

    CLI --> Workflow
    API --> Workflow
    Scripts --> Workflow

    Workflow --> Config
    Workflow --> IO
    Workflow --> Logging

    DNA --> Core
    RNA --> Core
    PROT --> Core
    GWAS --> Core
    EPI --> Core
    ONT --> Core
    PHENO --> Core
    ECOL --> Core
    MATH --> Core
    ML --> Core
    NET --> Core
    SC --> Core
    QUAL --> Core
    VIZ --> Core
    SIM --> Core
    LE --> Core
    LR[Long-Read Analysis] --> Core
    METAG[Metagenomics Analysis] --> Core
    SVA[Structural Variants] --> Core
    SPAT[Spatial Transcriptomics] --> Core
    PHARM[Pharmacogenomics] --> Core
    METAB[Metabolomics] --> Core
    MENUX[Menu System] --> Core
    CLOUD[Cloud Deployment] --> Core

Common Tasks & Quick Commands

:maxdepth: 1
:caption: Task Reference

tasks/analyze_dna
tasks/run_rna_pipeline
tasks/run_gwas
tasks/deploy_cloud
tasks/visualize_results
tasks/mcp_integration
tasks/performance_tuning
tasks/data_conversion

Getting Help

If you'd like to contribute, see CONTRIBUTING.md.

Community Support

GitHub Issues: Report bugs and request features
Discussions: Ask questions and share ideas
Documentation: Comprehensive guides and API reference

Development

Contributing Guide: How to contribute to METAINFORMANT
Development Setup: Setting up development environment
Testing Guide: Running and writing tests
API Design: Understanding the codebase architecture

METAINFORMANT is a comprehensive bioinformatics toolkit designed for modern multi-omics research. Whether you're analyzing genomes, transcriptomes, proteomes, or integrating multi-omics datasets, METAINFORMANT provides the tools and workflows you need.
