The first fully working LSTM + Deep Q-Network trading system implemented in Pine Script!
A self-learning trading agent that uses cutting-edge machine learning techniques to adapt to market conditions in real-time — no external libraries, no Python, just pure Pine Script.
This is NOT just another indicator. This is a complete reinforcement learning system that:
- ✅ Learns from experience using Deep Q-Learning
- ✅ Remembers patterns with LSTM neural networks
- ✅ Adapts in real-time without retraining
- ✅ Prioritizes important data with PER (Prioritized Experience Replay)
- ✅ Works in your browser — no GPU, no Python, no servers
| Traditional Indicators | ML SuperTrend Ultimate |
|---|---|
| Static parameters | Learns optimal parameters |
| Same for all markets | Adapts to each market |
| Looks at 1-2 bars | Analyzes 8-20 bars of history |
| Simple rules | Deep neural networks |
| No learning | Continuous learning |
This is an experimental research project for educational purposes.
- NOT financial advice
- NO profit guarantees
- Use at your own risk
- Author bears NO responsibility for any losses
This is a learning tool, not a production trading system. Always backtest thoroughly and use proper risk management.
Deep Q-Network (DQN): the "brain" that makes trading decisions (a minimal sketch follows the list below).
- 8 possible actions (ATR multipliers: 0.3 → 1.5)
- 4-layer MLP (Multi-Layer Perceptron): 24 → 16 → 8 → 4 neurons
- Q-values predict expected reward for each action
- Epsilon-greedy exploration (10% → 2% decay)
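A minimal Pine Script sketch of the epsilon-greedy action selection described above. The array names, sizes, and decay constant are placeholders for illustration, not the indicator's actual identifiers; in the real script the Q-values come from the MLP's forward pass.

```
//@version=5
indicator("Epsilon-greedy sketch", overlay=false)

// Placeholder Q-values and action set (ATR multipliers)
var float[] qValues  = array.new_float(8, 0.0)
var float[] atrMults = array.from(0.3, 0.4, 0.5, 0.7, 0.9, 1.0, 1.2, 1.5)
var float   epsilon  = 0.10

// Index of the highest Q-value (the greedy action)
greedyAction() =>
    int best = 0
    for i = 1 to array.size(qValues) - 1
        if array.get(qValues, i) > array.get(qValues, best)
            best := i
    best

// Explore with probability epsilon, otherwise exploit the current Q-estimates
int action = math.random(0, 1) < epsilon ? int(math.random(0, 8)) : greedyAction()
action  := math.min(action, 7)              // guard the random upper bound
epsilon := math.max(0.02, epsilon * 0.999)  // 0.10 -> 0.02 decay

plot(array.get(atrMults, action), "Selected ATR multiplier")
```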
LSTM layer: understands temporal patterns and market context (a single-cell sketch follows the list below).
- 24 hidden units (configurable)
- Dynamic timesteps (8-20 bars, adapts to volatility)
- 4 gates: Forget, Input, Cell, Output
- Backpropagation Through Time (BPTT)
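A minimal sketch of the four-gate update for a single LSTM unit, assuming placeholder weights; the indicator learns its own weights via BPTT and runs multiple hidden units over 8-20 timesteps. Pine Script has no built-in tanh, so one is defined here.

```
//@version=5
indicator("LSTM cell step sketch", overlay=false)

sigmoid(float x) => 1.0 / (1.0 + math.exp(-x))

tanh(float x) =>
    float e2 = math.exp(2.0 * x)
    (e2 - 1.0) / (e2 + 1.0)

// [input weight, recurrent weight, bias] per gate (placeholder values only)
var float[] wForget = array.from(0.5, 0.3, 0.0)
var float[] wInput  = array.from(0.5, 0.3, 0.0)
var float[] wCell   = array.from(0.5, 0.3, 0.0)
var float[] wOutput = array.from(0.5, 0.3, 0.0)

// Pre-activation of one gate: w_x * x + w_h * hPrev + bias
gate(w, x, hPrev) =>
    array.get(w, 0) * x + array.get(w, 1) * hPrev + array.get(w, 2)

var float h = 0.0                        // hidden state, carried across bars
var float c = 0.0                        // cell state

float x = nz(ta.roc(close, 1)) / 100.0   // one normalized input feature

float f    = sigmoid(gate(wForget, x, h))   // forget gate
float i    = sigmoid(gate(wInput,  x, h))   // input gate
float cHat = tanh(gate(wCell,   x, h))      // candidate cell state
float o    = sigmoid(gate(wOutput, x, h))   // output gate

c := f * c + i * cHat                    // keep part of old memory, add new
h := o * tanh(c)                         // expose a gated view of the memory

plot(h, "LSTM hidden state")
```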
Prioritized Experience Replay (PER): smart memory that focuses on the most important lessons (see the sketch after this list).
- 70,000 state buffer (replay memory)
- Prioritized sampling based on TD-error
- Importance sampling for bias correction
- Beta annealing (0.4 → 1.0)
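A sketch of the two formulas PER relies on: priority p = (|TD-error| + eps)^alpha and importance-sampling weight w = (N * P(i))^(-beta), using the constants quoted later in this README (alpha 0.6, beta 0.4, priority epsilon 1e-5). The buffer handling here is deliberately simplified.

```
//@version=5
indicator("PER priority sketch", overlay=false)

// Simplified priority store; the real buffer also keeps states, actions and rewards
var float[] priorities = array.new_float()

// Priority of one transition: p = (|tdError| + eps)^alpha
priorityOf(float tdError, float alpha, float eps) =>
    math.pow(math.abs(tdError) + eps, alpha)

// Importance-sampling weight: w = (n * P(i))^(-beta), corrects the sampling bias
isWeight(float pri, float priSum, int n, float beta) =>
    math.pow(n * (pri / priSum), -beta)

// Store a hypothetical transition and compute its sampling weight
float td = 0.25
array.push(priorities, priorityOf(td, 0.6, 1e-5))
if array.size(priorities) > 1000              // keep the sketch's buffer bounded
    array.shift(priorities)

float priSum = array.sum(priorities)
float w = isWeight(array.get(priorities, array.size(priorities) - 1), priSum, array.size(priorities), 0.4)
plot(w, "IS weight of newest transition")
```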
Adam optimizer: state-of-the-art optimization for the network weights (a single-parameter sketch follows the list).
- Adaptive learning rate (starts at 0.01)
- Momentum + RMSprop combined
- Gradient clipping for stability
- Per-parameter learning rates
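The per-parameter Adam step, written out for a single weight with the hyperparameters listed later in this README (learning rate 0.01, beta1 0.9, beta2 0.999, epsilon 1e-8, gradient clip 1.0). The gradient below is a stand-in for the demo; in the script it comes from backpropagation.

```
//@version=5
indicator("Adam update sketch", overlay=false)

// Adam state for one trainable parameter (the script keeps this per weight)
var float m = 0.0          // first-moment (momentum) estimate
var float v = 0.0          // second-moment (RMSprop) estimate
var float w = 0.10         // the parameter being trained
var int   t = 0            // update counter

float lr  = 0.01           // base learning rate
float b1  = 0.9            // momentum decay
float b2  = 0.999          // RMSprop decay
float eps = 1e-8

// Stand-in gradient for the demo, clipped to [-1, 1] for stability
float atrVal = nz(ta.atr(14), 1.0)
float grad   = (close - open) / math.max(atrVal, syminfo.mintick)
grad := math.max(-1.0, math.min(1.0, grad))

t += 1
m := b1 * m + (1 - b1) * grad                  // momentum term
v := b2 * v + (1 - b2) * grad * grad           // RMSprop term
float mHat = m / (1 - math.pow(b1, t))         // bias correction
float vHat = v / (1 - math.pow(b2, t))
w := w - lr * mHat / (math.sqrt(vHat) + eps)   // parameter step

plot(w, "Parameter value over time")
```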
20+ features extracted from market data (a few are sketched after this list):
- Technical: RSI, MACD, ATR, Stochastic
- Volume: OBV, Volume Rate of Change
- Advanced: Ichimoku, VWAP, Hurst proxy
- Volatility: Heidelberg index, ATR ratios
- Custom: NN confidence, entropy
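For illustration, a few of these inputs computed and normalized into the comparable ranges a network-friendly state vector needs. The feature names and scalings here are assumptions, not the script's exact pipeline.

```
//@version=5
indicator("Feature extraction sketch", overlay=false)

// A handful of the 20+ inputs, squashed into comparable ranges
float atrVal = nz(ta.atr(14), 1.0)

float rsiF   = ta.rsi(close, 14) / 100.0                          // RSI, 0..1
[macdLine, signalLine, histLine] = ta.macd(close, 12, 26, 9)
float macdF  = nz(macdLine) / math.max(atrVal, syminfo.mintick)   // ATR-normalized MACD
float atrF   = atrVal / close                                     // relative volatility
float stochF = nz(ta.stoch(close, high, low, 14)) / 100.0         // Stochastic %K, 0..1
float obvF   = nz(ta.roc(ta.obv, 10)) / 100.0                     // OBV rate of change
float vwapF  = (close - ta.vwap) / close                          // distance from VWAP

// Pack the values the way a state vector would be handed to the DQN
float[] state = array.from(rsiF, macdF, atrF, stochF, obvF, vwapF)
plot(array.get(state, 0), "RSI feature")
```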
Market Data
↓
[Feature Extraction] → 20 features
↓
[LSTM Layer] → Temporal patterns (8-20 timesteps)
↓
[MLP Network] → 24→16→8→4 neurons
↓
[Q-Values] → 8 actions (ATR multipliers)
↓
[Action Selection] → Epsilon-greedy
↓
[SuperTrend] → Adaptive coefficient
↓
Trading Signals
↓
[Reward] → (close - entry) / episode_length
↓
[Experience Replay] → Store in buffer (70k states)
↓
[PER Sampling] → Prioritize high TD-error
↓
[Backpropagation] → Update Q-network
↓
[LSTM BPTT] → Update LSTM weights
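To make the bottom half of this flow concrete, here is a sketch of how a chosen ATR multiplier would drive SuperTrend and how a per-episode reward of (close - entry) / episode_length could be measured. The fixed multiplier and the flip-based episode boundaries are placeholders; in the indicator the agent supplies the multiplier and manages episodes itself.

```
//@version=5
indicator("Adaptive SuperTrend sketch", overlay=true)

// Placeholder for the agent's chosen coefficient (action space 0.3 .. 1.5)
float agentMult = 0.9

// SuperTrend driven by the adaptive coefficient; dir < 0 means uptrend
[st, dir] = ta.supertrend(agentMult, 10)
plot(st, "SuperTrend", dir < 0 ? color.green : color.red)

// Reward bookkeeping: (close - entry) / episode_length, per long episode
var float entryPrice = na
var int   entryBar   = na
longFlip  = dir < 0 and nz(dir[1], 0) >= 0      // trend flips up: open episode
shortFlip = dir > 0 and nz(dir[1], 0) <= 0      // trend flips down: close episode

if longFlip
    entryPrice := close
    entryBar   := bar_index

float reward = na
if shortFlip and not na(entryPrice)
    reward := (close - entryPrice) / math.max(1, bar_index - entryBar)

plot(reward, "Episode reward", display=display.data_window)
```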
- No pre-training needed
- Learns continuously as market evolves
- TD-Error-driven updates
- ATR multiplier: 0.3 - 1.5 (agent selects)
- LSTM timesteps: 8-20 (volatility-based)
- Learning rate: adaptive (0.001 - 0.05)
- Priority Experience Replay (PER)
- Backpropagation Through Time (BPTT)
- Gradient clipping
- Adaptive Hinge Loss with L2 penalty
- Dual-kernel CNN filter
- Dropout (0.3) prevents overfitting
- L2 regularization (0.0008 MLP, 0.0003 LSTM)
- Leaky ReLU activation (no vanishing gradients; helper sketched after this list)
- Epsilon decay (0.10 → 0.02)
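Small helper functions for several of the stability tricks above (Leaky ReLU, gradient clipping, inverted dropout, L2 penalty). The function names and the demo input are illustrative, not the script's own helpers.

```
//@version=5
indicator("Stability helpers sketch", overlay=false)

// Leaky ReLU: keeps a small slope for negative inputs so units never go fully silent
leakyRelu(float x, float slope) =>
    x > 0 ? x : slope * x

// Clip a gradient to [-limit, limit] before it is applied
clipGrad(float g, float limit) =>
    math.max(-limit, math.min(limit, g))

// Inverted dropout: zero an activation with probability p, rescale survivors
dropout(float x, float p) =>
    math.random(0, 1) < p ? 0.0 : x / (1.0 - p)

// L2 penalty contributed by one weight to the loss
l2Penalty(float w, float lambda) =>
    lambda * w * w

// Demo: run a normalized return through the activation
plot(leakyRelu(nz(ta.roc(close, 1)) / 100.0, 0.01), "Leaky ReLU demo")
```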
State Space: 20-dimensional vector (5 features × 4 timesteps)
Action Space: 8 discrete actions [0.3, 0.4, 0.5, 0.7, 0.9, 1.0, 1.2, 1.5]
Reward Function: (close - entry_price) / episode_length
Discount Factor (γ): 0.99
Epsilon: 0.10 → 0.02 (decay: 0.999)
Training Frequency: Every 10 bars
LSTM:
Hidden Size: 8 (default, configurable)
Timesteps: 8-20 (dynamic)
Gates: Forget, Input, Cell, Output
Activation: sigmoid (gates), tanh (cell state)
MLP (DQN):
Input: 20 features
Layer 1: 24 neurons (Leaky ReLU)
Layer 2: 16 neurons (Leaky ReLU)
Layer 3: 8 neurons (Leaky ReLU)
Layer 4: 4 neurons (Leaky ReLU)
Output: 8 Q-values (linear)
Dropout: 0.3
L2 Lambda: 0.0008 (MLP), 0.0003 (LSTM)
Experience Replay (PER):
Buffer Size: 70,000 transitions
Batch Size: 6 samples
Priority Alpha (α): 0.6
Priority Beta (β): 0.4 → 1.0 (annealing)
Priority Epsilon: 1e-5
Optimizer:
Type: Adam
Learning Rate: 0.01 (adaptive: 0.001 - 0.05)
Beta1: 0.9 (momentum)
Beta2: 0.999 (RMSprop)
Epsilon: 1e-8
Gradient Clip: 1.0
- Open TradingView
- Navigate to Pine Editor (bottom panel)
- Create new indicator
- Copy-paste code from ml_supertrend_ultimate.pine
- Click "Add to Chart"
- Initial training: Wait for 200-500 updates
- Monitor EMA Error: Should decrease over time
- Watch TD-Error: Convergence indicator
- Enable debug panel: See learning metrics
Timeframe: H1 (1 hour) or H4 (4 hours)
Asset: BTC, ETH, major forex pairs
History: At least 1000 bars for initial training
Auto Optimize: Enabled
Show Debug Panel: Enabled (while learning)
The system tracks several metrics to show learning progress:
- TD-Error: Should decrease from ~0.5 to <0.1 (computed as sketched after this list)
- EMA Error: Smoothed error, should converge
- Update Count: Number of gradient updates
- Epsilon: Exploration rate (10% → 2%)
- Avg Max Q: Average of maximum Q-values
- Avg Old Q: Average of current Q-predictions
- Avg Target Q: Average of target Q-values
- Zero TD Count: How many samples have TD-error ≈ 0
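For reference, the TD-error that most of these metrics track is the gap between the Bellman target and the current Q-prediction. A minimal sketch with placeholder Q-values and the discount factor of 0.99 quoted above:

```
//@version=5
indicator("TD-error sketch", overlay=false)

float gamma   = 0.99                          // discount factor from the spec above
float reward  = nz(ta.roc(close, 1)) / 100.0  // stand-in reward for the demo
float qOld    = 0.20                          // Q(s, a) predicted for the taken action
float qNextMx = 0.25                          // max over a' of Q(s', a')

float targetQ = reward + gamma * qNextMx      // Bellman target
float tdError = targetQ - qOld                // what PER prioritizes and training minimizes

// Smoothed error like the "EMA Error" shown in the debug panel
var float emaError = 0.0
emaError := 0.99 * emaError + 0.01 * math.abs(tdError)

plot(math.abs(tdError), "|TD-error|")
plot(emaError, "EMA Error")
```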
Updates 0-500:
TD-Error: 0.5 → 0.3 (high, exploring)
EMA Error: 0.7 → 0.5 (decreasing)
Epsilon: 0.10 → 0.08 (still exploring)
Updates 500-2000:
TD-Error: 0.3 → 0.15 (converging)
EMA Error: 0.5 → 0.2 (good convergence)
Epsilon: 0.08 → 0.04 (exploitation phase)
Updates 2000+:
TD-Error: 0.15 → 0.05 (converged!)
EMA Error: 0.2 → 0.1 (stable)
Epsilon: 0.04 → 0.02 (minimal exploration)
Perfect for learning:
- How LSTM networks work
- Deep Q-Learning implementation from scratch
- Reinforcement Learning for trading
- Neural network training (Adam, BPTT)
- Experience Replay and prioritization
- Advanced ML techniques in constrained environment
📁 Project Root
├── 📄 ml_supertrend_ultimate.pine (Main indicator)
├── 📄 README.md (This file)
├── 📄 LICENSE (MIT)
├── 📄 CHANGELOG.md (Version history)
├── 📁 docs/
│ ├── 📄 ARCHITECTURE.md (Detailed architecture)
│ ├── 📄 TRAINING.md (Training guide)
│ ├── 📄 FAQ.md (Common questions)
│ └── 📄 RESEARCH.md (Research notes)
└── 📁 images/
├── 🖼️ screenshot_1.png (Trading signals)
├── 🖼️ screenshot_2.png (Debug panel)
└── 🖼️ architecture.png (System diagram)
Contributions are welcome! Please read CONTRIBUTING.md for guidelines.
- 🐛 Bug reports - Found an issue? Open an issue!
- 💡 Feature requests - Have an idea? Share it!
- 📝 Documentation - Improve README, add examples
- 🔧 Code - Submit pull requests
- ⭐ Star the repo - Show your support!
git clone https://github.com/YOUR_USERNAME/ml-supertrend-ultimate.git
cd ml-supertrend-ultimate
# Edit ml_supertrend_ultimate.pine
# Test on TradingView
# Submit pull request
This project implements techniques from cutting-edge research:
- Deep Q-Learning: Playing Atari with Deep Reinforcement Learning (Mnih et al., 2013)
- Prioritized Experience Replay: Prioritized Experience Replay (Schaul et al., 2015)
- LSTM Networks: Long Short-Term Memory (Hochreiter & Schmidhuber, 1997)
- Adam Optimizer: Adam: A Method for Stochastic Optimization (Kingma & Ba, 2014)
- GitHub Issues: Report bugs or request features
- GitHub Discussions: Ask questions, share ideas
- Email: [email protected]
If you find this project useful:
- ⭐ Star the repository
- 🔄 Share with others
- 📝 Write about it
- 🤝 Contribute
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2025 [Diogenov Pavel]
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
[Full MIT License text in LICENSE file]
- Created with: Claude Sonnet 4.5 by Anthropic 🤖
- Inspired by: DeepMind's DQN research
- Built in: Altai Krai, Barnaul, Russia 🇷🇺
- For: The trading & ML community 🌍
- ✅ LSTM + DQN implementation
- ✅ Prioritized Experience Replay
- ✅ Adam optimizer
- ✅ Real-time training
- Multi-asset support
- Improved reward shaping
- Advanced visualization
- Performance analytics
- Dueling DQN architecture
- Double Q-Learning
- Rainbow DQN
- Attention mechanisms