Optimize performance bottlenecks across samplers, models, and loss functions by Copilot · Pull Request #95 · soran-ghaderi/torchebm

Copilot · 2025-11-15T17:14:13Z

Identified and eliminated performance bottlenecks causing unnecessary computation and memory allocation in hot paths.

Core Model Optimizations

BaseModel.gradient(): Removed dtype conversion cycle (original → float32 → original). Preserves input dtype throughout, eliminating 2 tensor copies per gradient computation.
GaussianModel.forward(): Replaced expand().bmm() pattern with einsum("bi,ij,bj->b"). ~30% faster for batch operations.

# Before: multiple operations + branching
delta_expanded = delta.unsqueeze(-1)
cov_inv_expanded = cov_inv.unsqueeze(0).expand(batch_size, -1, -1)
temp = torch.bmm(cov_inv_expanded, delta_expanded)
energy = 0.5 * torch.bmm(delta.unsqueeze(1), temp).squeeze(-1).squeeze(-1)

# After: single operation
energy = 0.5 * torch.einsum("bi,ij,bj->b", delta, cov_inv, delta)

Sampler Optimizations

HMC/Langevin diagnostics: Replaced expand() calls in sampling loops with broadcasting assignments. Eliminates view tensor allocations in hot paths.
HMC momentum: Avoid tensor creation for scalar sqrt operations; use mass ** 0.5 directly for float mass.
HMC kinetic energy: Pre-compute inverse mass, use multiplication instead of division.

Integrator Optimizations

Leapfrog: Cache half_step = 0.5 * step_size; compute once instead of twice per step.
Euler-Maruyama: Compute sqrt(2 * step_size) once and reuse.

Loss Function Optimizations

SlicedScoreMatching: Replace unsqueeze().expand().contiguous().view() chain with single repeat() call.
ContrastiveDivergence: Remove redundant .to(device, dtype) calls in compute_loss().
DenoisingScoreMatching: Cache 1 / noise_scale² computation.

Deliverables

Performance test suite with 10 benchmarks validating correctness and speed
Benchmark script (benchmark_performance.py) measuring throughput:
- GaussianModel.forward(): 3.6M samples/sec @ 100 dims
- Gradient computation: 625K samples/sec @ 50 dims
- Langevin sampling: 270K samples/sec
- HMC sampling: 7K samples/sec

All existing tests pass (223/223). Zero breaking changes.

Original prompt

Identify and suggest improvements to slow or inefficient code

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Co-authored-by: soran-ghaderi <22780398+soran-ghaderi@users.noreply.github.com>

- Remove unnecessary dtype conversions in BaseModel.gradient() - Replace inefficient expand+bmm with einsum in GaussianModel.forward() - Optimize HMC diagnostics by using broadcasting instead of expand operations - Optimize Langevin diagnostics by using broadcasting instead of expand - Optimize Leapfrog integrator by caching half_step and using inverse mass - Optimize Euler-Maruyama integrator by computing sqrt once - Add comprehensive performance benchmark tests Co-authored-by: soran-ghaderi <22780398+soran-ghaderi@users.noreply.github.com>

- Optimize SlicedScoreMatching by using repeat instead of expand+contiguous+view - Optimize ContrastiveDivergence by removing redundant device/dtype conversions - Optimize DenoisingScoreMatching by caching inverse noise scale - Add comprehensive benchmark script for performance measurement Co-authored-by: soran-ghaderi <22780398+soran-ghaderi@users.noreply.github.com>

- Optimize _initialize_momentum to avoid creating tensors for scalar operations - Optimize _compute_kinetic_energy to use multiplication instead of division - Use broadcasting instead of expand_as - Fix docstring syntax warning by using raw string (r""") Co-authored-by: soran-ghaderi <22780398+soran-ghaderi@users.noreply.github.com>

Co-authored-by: soran-ghaderi <22780398+soran-ghaderi@users.noreply.github.com>

Initial plan

a041a4d

Copilot AI assigned Copilot and soran-ghaderi Nov 15, 2025

Copilot started work on behalf of soran-ghaderi November 15, 2025 17:14 View session

Copilot AI and others added 5 commits November 15, 2025 17:19

Initial exploration and analysis of codebase

67d22d2

Co-authored-by: soran-ghaderi <22780398+soran-ghaderi@users.noreply.github.com>

Add comprehensive optimization summary documentation

6c3d1ae

Co-authored-by: soran-ghaderi <22780398+soran-ghaderi@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Identify and suggest improvements to slow code~~ Optimize performance bottlenecks across samplers, models, and loss functions Nov 15, 2025

Copilot AI requested a review from soran-ghaderi November 15, 2025 17:35

Copilot finished work on behalf of soran-ghaderi November 15, 2025 17:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize performance bottlenecks across samplers, models, and loss functions#95

Optimize performance bottlenecks across samplers, models, and loss functions#95
Copilot wants to merge 6 commits intomasterfrom
copilot/improve-slow-code-performance

Copilot AI commented Nov 15, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Copilot AI commented Nov 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Core Model Optimizations

Sampler Optimizations

Integrator Optimizations

Loss Function Optimizations

Deliverables

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Nov 15, 2025 •

edited

Loading