Reinforcement Learning for Optimal Quantum State Discrimination
Model-free calibration of quantum receivers through trial and error.
This repository provides a framework for applying reinforcement learning (RL) to optimal quantum state discrimination over unknown channels. The code supports real-time calibration and optimization of quantum devices, particularly coherent-state receivers, without requiring prior knowledge of system parameters.
Quantum devices are particularly challenging to operate: their functionality relies on precisely tuning parameters, yet environmental conditions constantly shift, causing detuning. Traditional approaches require detailed modeling of environmental behavior, which is often computationally unaffordable, while direct parameter measurements introduce extra noise.
We frame quantum receiver calibration as a reinforcement learning problem, where an agent learns optimal discrimination strategies through trial and error, without any prior knowledge of experimental details.
Q-Learning Framework
- ε-greedy exploration with configurable parameters
- Adaptive learning rates (1/N decay or fixed)
- Real-time Q-value updates for optimal policy discovery
- Support for change-point detection scenarios
Quantum Physics Engine
- Born rule probability calculations
- Coherent state displacement operations
- Kennedy receiver simulation
- Variable-loss optical channel modeling
Analysis & Visualization
- Learning curve generation
- Q-function landscape plotting
- Noise sensitivity analysis
- Comparative performance metrics
Dynamic Calibration
- Model-free control loops
- Continuous parameter re-calibration
- Adaptation to environmental drift
- Optimal β displacement learning
qrec/
├── qrec/                    # Core module
│   └── utils.py             # Q-learning utilities, physics functions
├── experiments/             # Experimental configurations
│   ├── 0/                   # Full exploration (ε=1.0)
│   ├── 1/                   # Low exploration, 1/N learning rate
│   ├── 2/                   # Change-point: α = 1.5 → 0.25
│   ├── 3/                   # Fixed lr = 0.005
│   ├── 4/                   # Fixed lr = 0.05
│   ├── 5/                   # Change-point with fixed lr (best)
│   └── 6/                   # Noise inspection
├── paper/                   # Publication figures
├── basic_inspection.py      # Error landscape visualization
├── index_experiments        # Experiment documentation
└── requirements.txt         # Dependencies
Related Repository: matibilkis/marek
marek/
├── main_programs/                # Core RL algorithms
├── dynamic_programming/          # DP optimization modules
├── bounds_optimals_and_limits/   # Theoretical bounds computation
├── plotting_programs/            # Visualization tools
├── appendix_A/                   # Supplementary materials
├── tests/                        # Validation suite
├── agent.py                      # RL agent implementation
├── environment.py                # Quantum channel simulation
├── training.py                   # Training loop
└── basics.py                     # Core physics functions
The goal is to discriminate between coherent states |±α⟩ using a Kennedy-like receiver with displacement β. For an on/off photodetector, the probability of outcome n given amplitude α is:
P(n|α) = exp(-|α|²) · δₙ₀ + (1 - exp(-|α|²)) · δₙ₁
The success probability for a given displacement β:
Pₛ(β) = ½ Σₙ max_{s∈{-1,+1}} P(n | sα + β)
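As a rough numerical illustration of the two formulas above, the following sketch evaluates the on/off detection probabilities and scans the success probability over a grid of displacements. The function names `onoff_prob` and `success_prob` are hypothetical and chosen for this example; they are not the repository's `p` or `Perr`, whose exact signatures may differ. Amplitudes are assumed real, and the grid and the value of α are illustrative.

```python
import numpy as np

def onoff_prob(n, amplitude):
    """P(n | amplitude) for an on/off detector: n=0 (no click) or n=1 (click)."""
    p_no_click = np.exp(-np.abs(amplitude) ** 2)
    return p_no_click if n == 0 else 1.0 - p_no_click

def success_prob(beta, alpha):
    """P_s(beta) = 1/2 * sum_n max_s P(n | s*alpha + beta), with s in {-1, +1}."""
    total = 0.0
    for n in (0, 1):
        # For each outcome, the best guess is the sign s that makes it most likely
        total += max(onoff_prob(n, s * alpha + beta) for s in (-1, +1))
    return 0.5 * total

alpha = 0.4                              # illustrative amplitude (assumption)
betas = np.linspace(0.0, 2.0, 201)       # illustrative displacement grid (assumption)
ps = np.array([success_prob(b, alpha) for b in betas])
best = int(np.argmax(ps))

print(f"best beta on grid: {betas[best]:.3f}, success probability: {ps[best]:.4f}")
print(f"Kennedy nulling (beta = alpha): {success_prob(alpha, alpha):.4f}")
```

With β = 0 both hypotheses give identical statistics, so the success probability is exactly ½. Scanning β shows how much the displacement helps, and the grid maximum is the model-aware benchmark that a learning agent can be compared against.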
Action-value updates (here α denotes the learning rate, not the coherent-state amplitude):
Q₁(β, n, g) ← Q₁(β, n, g) + α · [r - Q₁(β, n, g)]
Q₀(β) ← Q₀(β) + α · [max_g Q₁(β, n, g) - Q₀(β)]
Policy (ε-greedy):
π(β) = { random          with probability ε
       { argmax_β Q₀(β)  with probability 1-ε
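The update rules and policy above can be sketched with plain NumPy arrays. This is a minimal illustration under assumed table shapes: the table for the displacement choice is indexed by the displacement index, and the table for the guess is indexed by displacement index, outcome, and guess. The helper names `epsilon_greedy` and `q_update` are hypothetical and do not reproduce the repository's `ep_greedy` or its update code.

```python
import numpy as np

def epsilon_greedy(qvals, eps, rng):
    """Return a random index with probability eps, else a greedy one (ties broken randomly)."""
    qvals = np.asarray(qvals)
    if rng.random() < eps:
        return int(rng.integers(len(qvals)))
    best = np.flatnonzero(qvals == qvals.max())
    return int(rng.choice(best))

def q_update(q0, q1, ib, n, g, reward, lr):
    """Apply both updates in place: the guess table moves toward the reward,
    then the displacement table moves toward the best guess value."""
    q1[ib, n, g] += lr * (reward - q1[ib, n, g])
    q0[ib] += lr * (q1[ib, n, :].max() - q0[ib])

rng = np.random.default_rng(0)
nbetas = 5                                # illustrative grid size (assumption)
q0 = np.zeros(nbetas)                     # value of each displacement
q1 = np.zeros((nbetas, 2, 2))             # value of each guess given (displacement, outcome)

# One hand-picked transition: displacement index 1, outcome n=0, guess g=1, reward 1
q_update(q0, q1, ib=1, n=0, g=1, reward=1.0, lr=0.5)
print(q1[1, 0], q0)

# With eps=0 the policy is purely greedy, so it now prefers displacement index 1
print(epsilon_greedy(q0, eps=0.0, rng=rng))
```

Passing a constant `lr` corresponds to the fixed-rate experiments; a 1/N schedule would instead pass the inverse visit count of the entry being updated.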
git clone https://github.com/matibilkis/qrec.git
cd qrec
pip install -r requirements.txt

import numpy as np
from qrec.utils import *
# Initialize Q-tables with 25 discretized β values
betas_grid, [q0, q1, n0, n1] = define_q(nbetas=25)
# Find model-aware optimal (for comparison)
mmin, p_star, beta_star = model_aware_optimal(betas_grid, alpha=0.4)
# Run Q-learning episode
hidden_phase = np.random.choice([0, 1])  # Nature chooses ±α
indb, beta = ep_greedy(q0, betas_grid, ep=0.01)  # Agent chooses β
n = give_outcome(hidden_phase, beta, alpha=0.4)  # Photon detection
indg, guess = ep_greedy(q1[indb, n, :], [0, 1], ep=0.01)  # Agent guesses
reward = give_reward(guess, hidden_phase)  # Success/failure

cd experiments/5
python change_point.py

The RL agent successfully learns near-optimal receiver configurations:
| Experiment | Configuration | Key Finding |
|---|---|---|
| 0 | ε = 1.0 (full exploration) | Baseline uniform sampling |
| 1 | ε = 0.01, lr = 1/N | Convergent but slow adaptation |
| 2 | Change-point, lr = 1/N | Cannot adapt to α changes |
| 3 | ε = 0.01, lr = 0.005 | Stable but slow learning |
| 4 | ε = 0.01, lr = 0.05 | Good balance |
| 5 | Change-point, lr = 0.05 | Successful re-calibration |
Our agents in action: learning curves for sensor calibration
Key Insight: Fixed learning rates enable adaptation to changing channel conditions, while decaying rates (1/N) lock the agent to initial configurations.
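A toy calculation illustrates why this happens. With a 1/N learning rate the estimate is the running average of all rewards seen so far, so old data is never forgotten; a fixed rate gives an exponential moving average that follows recent rewards. The sketch below uses made-up success probabilities (0.9 before the change, 0.5 after) rather than values from the experiments, and the helper `track` is hypothetical.

```python
import numpy as np

def track(rewards, lr=None):
    """Running estimate of the mean reward; lr=None uses the 1/N schedule."""
    estimate, history = 0.0, []
    for count, r in enumerate(rewards, start=1):
        step = 1.0 / count if lr is None else lr
        estimate += step * (r - estimate)
        history.append(estimate)
    return np.array(history)

rng = np.random.default_rng(1)
steps = 2000
# Illustrative Bernoulli rewards: the success probability drops at the change point
rewards = np.concatenate(
    [rng.random(steps) < 0.9, rng.random(steps) < 0.5]
).astype(float)

decaying = track(rewards)            # 1/N: plain average over the whole history
fixed = track(rewards, lr=0.05)      # fixed rate: exponential moving average

print(f"1/N estimate at the end:      {decaying[-1]:.3f}")
print(f"fixed-lr estimate (tail avg): {fixed[-500:].mean():.3f}")
```

The 1/N estimate ends near the average of the two regimes, while the fixed-rate estimate settles around the post-change value. The price of a fixed rate is a noisier estimate, which is the trade-off between the lr = 0.005 and lr = 0.05 settings in the table.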
| Function | Description |
|---|---|
| p(alpha, n) | Born rule probability P(n\|α) |
| Perr(beta, alpha) | Error probability for displacement β |
| give_outcome(phase, beta, alpha) | Sample photon detection outcome |
| model_aware_optimal(betas, alpha) | Compute theoretical optimum |
| Function | Description |
|---|---|
| define_q(nbetas) | Initialize Q-tables and counters |
| ep_greedy(qvals, actions, ep) | ε-greedy action selection |
| greedy(arr) | Greedy selection (ties broken randomly) |
| give_reward(guess, phase) | Binary reward function |
| Psq(q0, q1, betas, alpha) | Evaluate current policy |
numpy
matplotlib
scipy
tqdm
numba
Contributions are welcome. Please open an issue first to discuss proposed changes.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License. See LICENSE for details.
This framework has enabled three peer-reviewed publications in quantum machine learning:
- T. Crosta, L. Rebón, F. Vilariño, J.M. Matera, M. Bilkis, arXiv:2404.10726 (2024). Model-free RL framework for continuous recalibration of quantum device parameters. Demonstrated on Kennedy receiver-based long-distance quantum communication.
- M. Bilkis, M. Fraas, A. Acín, G. Sentís, arXiv:2203.09807 (2022). Calibration of quantum receivers for optical coherent states over channels with variable transmissivity using reinforcement learning.
- M. Bilkis, M. Rosati, R. Muñoz-Tapia, J. Calsamiglia, Phys. Rev. Research 2, 033295 (2020). Foundational work: RL agents learn near-optimal coherent-state receivers through real-time trial-and-error experimentation.
If you use this code in your research, please cite:
@article{crosta2024automatic,
title={Automatic re-calibration of quantum devices by reinforcement learning},
author={Crosta, T. and Reb{\'o}n, L. and Vilari{\~n}o, F. and Matera, J. M. and Bilkis, M.},
journal={arXiv preprint arXiv:2404.10726},
year={2024}
}
@article{bilkis2022reinforcement,
title={Reinforcement-learning calibration of coherent-state receivers on variable-loss optical channels},
author={Bilkis, M. and Fraas, M. and Ac{\'i}n, A. and Sent{\'i}s, G.},
journal={arXiv preprint arXiv:2203.09807},
year={2022}
}
@article{bilkis2020realtime,
title={Real-time calibration of coherent-state receivers: Learning by trial and error},
author={Bilkis, M. and Rosati, M. and Mu{\~n}oz-Tapia, R. and Calsamiglia, J.},
journal={Physical Review Research},
volume={2},
pages={033295},
year={2020}
}
