A Python framework leveraging DNA-based genetic algorithms to discover and evolve robust algorithmic trading strategies.
The quest for profitable and reliable trading strategies is fraught with challenges. Manual design is constrained by human intuition and biases, while traditional optimization techniques often fall victim to overfitting, creating strategies that look great on historical data but fail dramatically in live markets.
This project tackles these issues using a bio-inspired approach: Genetic Algorithms (GAs). We represent trading strategies not as fixed parameter sets, but as digital DNA. This DNA encodes the logic, indicators, risk management rules, and other components of a strategy.
By simulating evolution over generations (selecting successful strategies, recombining their DNA through crossover, and introducing variation through mutation), the system automatically explores a large and complex strategy space. The goal is to discover novel, potentially non-intuitive strategies that perform robustly across different market conditions.
Key differentiators include:
- Rich Genetic Representation: Encoding complex rules and adaptive behaviors directly into the DNA.
- Multi-Objective Optimization: Explicitly optimizing for a balance between profitability and risk metrics (e.g., drawdown, Sortino ratio) using algorithms like NSGA-II, leading to a Pareto front of trade-off solutions rather than a single "best" strategy.
- Diversity Preservation: Employing techniques like Novelty Search and MAP-Elites to encourage exploration and prevent premature convergence on suboptimal solutions, fostering a population of distinct behavioral archetypes.
- Rigorous Validation: Integrating methods like Walk-Forward Optimization and Purged K-Fold Cross-Validation specifically designed for time-series data to combat overfitting.
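To make the multi-objective idea concrete, here is a minimal sketch of Pareto dominance and a brute-force non-dominated filter. It is not the project's NSGA-II code: the function names `dominates` and `pareto_front` and the sample numbers are hypothetical, and every objective is assumed to be maximized (drawdown is negated, as in `'-max_drawdown'`).

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere.

    Assumption: all objectives are maximized.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: list[tuple[float, ...]]) -> list[int]:
    """Return the indices of points that no other point dominates (O(n^2) scan)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical (net_profit, -max_drawdown) pairs for four strategies.
candidates = [(1000.0, -0.30), (800.0, -0.10), (700.0, -0.20), (1200.0, -0.35)]
front = pareto_front(candidates)
print(front)  # indices of the trade-off solutions
```

The third candidate is dropped because the second beats it on both profit and drawdown; the other three each win on at least one objective, so none of them can be called the single best.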
flowchart LR
A["DNA Sequence\n(ACGT...)"] -->|"Decode\nencoding.py"| B["Chromosome\nList of Genes"]
B --> C["Strategy Logic"]
C -->|"Apply to Data\nbacktesting"| D["Backtest Engine"]
D --> E["Performance Metrics\nProfit, Drawdown, etc."]
E -->|"Fitness Assignment\nfitness/evaluation.py"| F["Fitness Score / Objectives"]
F -->|"Selection\npopulation/selection.py"| G["Select Parents"]
G -->|"Crossover & Mutation\noperators/"| H["New DNA Sequence"]
H --> A
I["Population\npopulation/population.py"] --> G
H --> I
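The decode step in the diagram can be sketched as follows. This is an illustration under stated assumptions, not the contents of `encoding.py`: it assumes codons are numbered 0-63 in lexicographic order over A, C, G, T, that genes are read in a single reading frame starting at position 0, and that the helper names `extract_genes` and `scale` are hypothetical.

```python
from itertools import product

BASES = "ACGT"
# Assumed mapping: "AAA" -> 0, "AAC" -> 1, ..., "TTT" -> 63.
CODON_TO_INT = {"".join(c): i for i, c in enumerate(product(BASES, repeat=3))}
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extract_genes(dna: str) -> list[list[int]]:
    """Split DNA into triplets and collect the integer codons between START and STOP.

    Codons outside a gene are ignored; a gene with no STOP codon is discarded.
    """
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    genes: list[list[int]] = []
    current = None  # None means "not inside a gene"
    for codon in codons:
        if current is None:
            if codon == START_CODON:
                current = []
        elif codon in STOP_CODONS:
            genes.append(current)
            current = None
        else:
            current.append(CODON_TO_INT[codon])
    return genes


def scale(value: int, low: float, high: float) -> float:
    """Map a codon integer in 0..63 linearly onto [low, high]."""
    return low + (high - low) * value / 63


dna = "CCC" + "ATG" + "AAC" + "GGT" + "TAA" + "ATG" + "TTT" + "TGA"
genes = extract_genes(dna)
print(genes)
print(scale(genes[1][0], 5, 200))  # e.g. an indicator period between 5 and 200
```

A real decoder would also interpret the first codon of each gene as its type and apply a per-type parameter schema; that part is omitted here.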
- DNA Encoding (`encoding.py`, `core/structures.py`):
  - Strategies are represented as `DNASequence` (strings of A, C, G, T).
  - Triplet codons (e.g., `ATG`, `GCT`) map to integers (0-63).
  - These integers are scaled and mapped to specific strategy parameters (indicator periods, thresholds, operator types, risk settings) based on predefined schemas within `encoding.py`.
  - Genes are functional blocks defined by `START_CODON` (`ATG`) and `STOP_CODONS` (`TAA`, `TAG`, `TGA`).
  - A `Chromosome` groups multiple `Gene` objects decoded from a single `DNASequence`.
  - Regulatory Elements: Promoter sequences preceding genes can influence their `expression_level`, potentially modulating their impact.
- Gene Types (`encoding.py` decoding logic):
  - The first codon after `START` determines the `gene_type`.
  - Subsequent codons define parameters specific to that type (e.g., an `indicator` gene decodes name, period, source; a `risk_management` gene decodes SL/TP modes and values).
- Evolutionary Loop:
  1. Initialization (`population/population.py`): Create an initial `Population` of random `Chromosome` objects.
  2. Evaluation (`fitness/evaluation.py`): Run each `Chromosome` (strategy) through the `backtesting` engine against historical data. Calculate fitness objectives (e.g., net profit, max drawdown).
  3. Ranking (`fitness/ranking.py`): For multi-objective evolution, rank individuals using `fast_non_dominated_sort` and `calculate_crowding_distance` (NSGA-II).
  4. Selection (`population/selection.py`): Choose parent individuals for reproduction based on fitness/rank (e.g., `tournament_selection`, `nsga2_selection`, `lexicase_selection`).
  5. Variation (`operators/`): Create offspring `DNASequence` objects by applying `crossover` (e.g., `single_point_crossover`) and `mutation` (e.g., `point_mutation`, `insertion_mutation`, `gene_duplication_mutation`) operators to selected parents.
  6. Replacement: Form the next generation's population from parents and offspring (often involving elitism).
  7. Repeat: Go back to step 2 for a set number of generations.
- Diversity (`population/diversity.py`, `population/map_elites.py`):
  - Techniques like Fitness Sharing (penalizing genotypically similar individuals) or Novelty Search (rewarding unique behavioral characteristics) run alongside fitness evaluation to maintain diversity.
  - MAP-Elites maintains an archive (grid) of high-performing individuals across different behavioral niches (e.g., high-frequency vs. low-frequency strategies).
- Island Model (`population/island.py`): Evolves multiple populations in parallel, periodically migrating top individuals between islands to share genetic material and explore different search space regions.
- Validation (`validation.py`): Provides tools like `generate_wfo_splits` and `PurgedKFold` to create appropriate training/testing splits for time-series data, essential for out-of-sample testing during or after evolution.
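As a rough illustration of the validation idea, the sketch below generates rolling walk-forward windows with a purge gap between training and test data. It is not the project's `generate_wfo_splits`: the function name `walk_forward_splits`, its parameters, and the choice to advance the window by one test block are all assumptions made for the example.

```python
from typing import Iterator


def walk_forward_splits(
    n_samples: int, train_size: int, test_size: int, purge: int = 0
) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges for rolling walk-forward windows.

    `purge` bars between the end of training and the start of testing are
    left out, so labels that overlap the boundary cannot leak into the test set.
    """
    start = 0
    while start + train_size + purge + test_size <= n_samples:
        train = range(start, start + train_size)
        test_start = start + train_size + purge
        test = range(test_start, test_start + test_size)
        yield train, test
        start += test_size  # slide forward by one test block


splits = list(walk_forward_splits(n_samples=100, train_size=50, test_size=20, purge=5))
for train, test in splits:
    print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

Each test window lies strictly after its training window, which is the property that ordinary shuffled K-fold cross-validation breaks on time-series data.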
- Encoding: DNA/Codon/Gene system, Promoters.
- Genes: Rules (Entry/Exit), Risk (SL/TP), Indicators, Filters (Time/Regime), Order Types, Meta-params.
- Operators: Point/Insertion/Deletion/Codon/Inversion/Translocation/Duplication Mutation, Single-Point/Uniform Crossover.
- GA Techniques: MOO (NSGA-II), Selection (Tournament, NSGA-II, Lexicase), Diversity (Sharing, Novelty, MAP-Elites), Island Model, Adaptive Rates.
- Validation: WFO, Purged K-Fold CV.
- Analysis: Parameter Sensitivity, Gene Freq/Corr, Behavior PCA, Plotting (Pareto, History, QD Grid).
- Framework: Persistence, YAML Config, Packaging, Testing.
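To illustrate the operators listed above, here is a minimal sketch of point mutation, insertion mutation, and single-point crossover acting on plain DNA strings. The function names echo the operators this README mentions, but the signatures (plain `str` input and an explicit `random.Random` instance) are assumptions made for the example, not the package's API.

```python
import random

BASES = "ACGT"


def point_mutation(dna: str, rng: random.Random) -> str:
    """Replace one randomly chosen base with a different base."""
    if not dna:
        return dna
    pos = rng.randrange(len(dna))
    new_base = rng.choice([b for b in BASES if b != dna[pos]])
    return dna[:pos] + new_base + dna[pos + 1:]


def insertion_mutation(dna: str, rng: random.Random) -> str:
    """Insert one random base at a random position (length grows by one)."""
    pos = rng.randrange(len(dna) + 1)
    return dna[:pos] + rng.choice(BASES) + dna[pos:]


def single_point_crossover(a: str, b: str, rng: random.Random) -> tuple[str, str]:
    """Swap the tails of two parents at one shared cut point."""
    shortest = min(len(a), len(b))
    if shortest < 2:
        return a, b  # nothing to recombine
    cut = rng.randrange(1, shortest)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


demo_rng = random.Random(42)  # seeded so runs are repeatable
child1, child2 = single_point_crossover("AAAAAAAA", "TTTTTTTT", demo_rng)
print(child1, child2)
print(point_mutation(child1, demo_rng))
print(insertion_mutation(child2, demo_rng))
```

Because insertion shifts every downstream base by one position, it changes the reading frame of all later codons, which is one reason insertion and deletion rates are usually kept lower than the point-mutation rate.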
# Ensure you have Python 3.11+ installed
# Clone the repository
git clone https://github.com/ChristianJStarr/nucleotide_strategy_evolution.git
cd nucleotide_strategy_evolution
# Create and activate a virtual environment (recommended)
python -m venv .venv
# On Windows: .venv\Scripts\activate
# On macOS/Linux: source .venv/bin/activate
# Install the package and its core dependencies
# This uses the pyproject.toml file
pip install -e .
# To install development dependencies (for testing, linting, notebooks):
pip install -e ".[dev]"

Key Dependencies: numpy, pandas, pyyaml, matplotlib, plotly, seaborn, scikit-learn, backtesting (or your chosen engine).
# Conceptual example (see examples/ for runnable scripts)
import random
from unittest.mock import MagicMock  # For demonstration purposes only

from nucleotide_strategy_evolution.population import Population, IslandModel
from nucleotide_strategy_evolution.fitness import MultiObjectiveEvaluator
from nucleotide_strategy_evolution.operators import get_crossover_operator, get_mutation_operator
from nucleotide_strategy_evolution.population import get_selection_operator
from nucleotide_strategy_evolution.backtesting import setup_backtester
from nucleotide_strategy_evolution.utils import config_loader
# Chromosome is used when building offspring below; the module path is assumed
# from the project layout (core/structures.py)
from nucleotide_strategy_evolution.core.structures import Chromosome
# 1. Configuration
evo_params = {
    'population': {'size': 50, 'dna_length': 500},
    'evolution': {'generations': 20},
    'fitness': {'objectives': ['net_profit', '-max_drawdown']},
    'operators': {
        'crossover': {'type': 'single_point', 'rate': 0.7},
        'mutations': [
            {'type': 'point_mutation', 'rate': 0.05},
            {'type': 'insertion', 'rate': 0.01},
            {'type': 'deletion', 'rate': 0.01}
        ]
    },
    'selection': {'method': 'nsga2'}
}
compliance_rules = {'max_drawdown_pct': 0.2}
# 2. Setup Backtester (Replace with your actual setup)
# In a real implementation:
# backtester = setup_backtester("your_market_data.csv", initial_cash=100000)
# In this example, we use a mock:
backtester = MagicMock()
backtester.run = lambda chromo, rules: MagicMock(hard_rule_violation=False, stats={'net_profit': random.uniform(0,1000), 'max_drawdown': random.uniform(0,0.1)})
# 3. Initialize Population
population = Population(size=evo_params['population']['size'],
                        dna_length=evo_params['population']['dna_length'])
population.initialize()
# 4. Setup Evaluator, Operators, Selector
fitness_evaluator = MultiObjectiveEvaluator(backtester, objectives=evo_params['fitness']['objectives'])
crossover_op = get_crossover_operator(evo_params['operators']['crossover'])
mutation_ops = [(get_mutation_operator(m_conf), m_conf['rate']) for m_conf in evo_params['operators']['mutations']]
selection_method = get_selection_operator(evo_params['selection'])
# 5. Evolution Loop (Simplified)
num_generations = evo_params['evolution']['generations']
for gen in range(num_generations):
    print(f"\n--- Generation {gen+1}/{num_generations} ---")

    # Evaluate Population (using parallel evaluator method)
    fitness_results = fitness_evaluator.evaluate_population(
        {i: population.individuals[i] for i in range(len(population))},
        compliance_rules,
        n_jobs=-1  # Use parallel processing
    )

    # Update population fitness scores
    for idx, fitness in fitness_results.items():
        population.set_fitness(idx, fitness)

    # Selection
    parent_indices = selection_method(len(population), population.fitness_scores)

    # Create Offspring through crossover and mutation
    offspring = []
    for i in range(len(population) // 2):
        parent1 = population.individuals[parent_indices[i * 2]]
        parent2 = population.individuals[parent_indices[i * 2 + 1]]

        # Crossover
        child1_dna, child2_dna = crossover_op(parent1.raw_dna, parent2.raw_dna)

        # Mutation: each operator is applied to each child independently at its own rate
        for mutation_op, rate in mutation_ops:
            if random.random() < rate:
                child1_dna = mutation_op(child1_dna)
            if random.random() < rate:
                child2_dna = mutation_op(child2_dna)

        # Create chromosomes from DNA
        offspring.append(Chromosome(raw_dna=child1_dna))
        offspring.append(Chromosome(raw_dna=child2_dna))

    # Replacement: offspring replace the whole population
    # (this simplified loop does not apply elitism)
    population.individuals = offspring[:len(population)]

print("\nEvolution finished.")

See the examples/ directory for more complete and runnable scripts.
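The simplified loop above overwrites the population with offspring, so the best strategy found so far can be lost between generations. A minimal sketch of elitist replacement follows. It assumes one scalar score per individual where higher is better; for multi-objective runs one would rank by Pareto front and crowding distance instead. The function name `elitist_replacement` and its signature are hypothetical, not part of the package.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def elitist_replacement(
    parents: Sequence[T],
    parent_scores: Sequence[float],
    offspring: Sequence[T],
    n_elite: int,
) -> list[T]:
    """Keep the `n_elite` best parents and fill the remaining slots with offspring.

    Assumption: a single scalar score per parent, higher is better.
    The returned list has the same length as `parents`, provided enough
    offspring are supplied.
    """
    ranked = sorted(range(len(parents)), key=lambda i: parent_scores[i], reverse=True)
    elites = [parents[i] for i in ranked[:n_elite]]
    return elites + list(offspring[: len(parents) - n_elite])


# Strings stand in for Chromosome objects in this toy run.
next_gen = elitist_replacement(
    parents=["p0", "p1", "p2", "p3"],
    parent_scores=[3.0, 9.0, 1.0, 7.0],
    offspring=["c0", "c1", "c2", "c3"],
    n_elite=2,
)
print(next_gen)
```

With two elites, the two highest-scoring parents survive unchanged and the first two offspring fill the remaining slots.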
nucleotide_strategy_evolution/
├── nucleotide_strategy_evolution/ # Main package source code
│ ├── core/ # Core data structures (DNASequence, Gene, Chromosome)
│ ├── encoding.py # Functions for converting DNA <-> strategy parameters
│ ├── genes/ # Gene type definitions and factory functions
│ ├── operators/ # Genetic operators (mutation.py, crossover.py, adaptive.py)
│ ├── population/ # Population management (population.py, diversity.py, selection.py, island.py, map_elites.py)
│ ├── fitness/ # Fitness evaluation logic (evaluation.py, ranking.py)
│ ├── backtesting/ # Abstract interface definition for backtesting engines
│ ├── analysis/ # Tools for post-evolution analysis (robustness.py, gene_analysis.py, behavior_analysis.py)
│ ├── visualization/ # Plotting utilities (plotting.py)
│ ├── validation.py # WFO and PurgedKFold splitting logic
│ ├── serialization.py # Saving/loading GA state
│ ├── config_loader.py # Loading parameters from YAML files
│ └── utils/ # General helper functions, logging setup
├── tests/ # Contains unit and integration tests mirroring the package structure
├── examples/ # Runnable scripts demonstrating package features
├── notebooks/ # Jupyter notebooks for experimentation and visualization
├── config/ # Example YAML configuration files for evolution runs
├── data/ # Directory for market data
├── pyproject.toml # Project build/dependency configuration (PEP 621)
├── requirements.txt # Core dependencies list (alternative format)
├── requirements-dev.txt # Development dependencies list (alternative format)
└── README.md # This file
To run the test suite:
# Ensure development dependencies are installed: pip install -e ".[dev]"
# From the project root directory:
python -m pytest tests/

The project documentation is built using Sphinx. To build and view the documentation:
# Install documentation dependencies
pip install -e ".[dev]"
# Build documentation
# On Windows:
.\build_docs.ps1
# On Unix-based systems:
chmod +x build_docs.sh # Make the script executable (first time only)
./build_docs.sh
# View documentation
# Open docs/_build/html/index.html in your browser

The documentation includes:
- Installation and quickstart guides
- Core concepts explanations
- User guides for all major components
- API reference documentation
- Examples and tutorials
Contributions are welcome! Please feel free to submit bug reports, feature requests, or pull requests.
Before submitting a PR:
- Ensure your code follows the project's coding style (Black formatter)
- Add tests for new functionality
- Make sure existing tests pass
- Update documentation as needed
This project is licensed under the MIT License - see the LICENSE file for details.
Here are visualizations generated from sample runs of the framework:
- Pareto front: Shows the trade-off between different objectives (e.g., Profit vs. Drawdown) for the non-dominated solutions found.
- MAP-Elites (QD) grid: Visualizes the performance of the best individual found for different behavioral niches.
- Evolution history: Tracks the evolution of metrics like average fitness or diversity over generations.


