Skip to content

Smooth stream track for streamspraydf models#861

Open
jobovy wants to merge 38 commits intomainfrom
streamspray-track
Open

Smooth stream track for streamspraydf models#861
jobovy wants to merge 38 commits intomainfrom
streamspray-track

Conversation

@jobovy
Copy link
Copy Markdown
Owner

@jobovy jobovy commented Apr 14, 2026

Summary

  • Adds StreamTrack (and StreamTrackPair for tail='both') — a lightweight smooth phase-space track parameterized by the progenitor's time coordinate tp in [-tdisrupt, 0]. Exposes Orbit-like accessors for Cartesian, cylindrical, and heliocentric coordinates (x/y/z/vx/vy/vz, R/vR/vT/phi, ra/dec/dist/ll/bb/pmra/pmdec/pmll/pmbb/vlos) plus a cov(tp) and plot(d1, d2, spread=...).
  • Adds basestreamspraydf.streamTrack(n=5000, tail=..., smoothing=..., niter=0, particles=..., order=2) which samples particles (or reuses pre-computed ones) and fits the track. The track is the smoothed binned mean of particles as a function of tp=-dt with a smoothed 6×6 covariance.
  • Tests, API reference docs, and a new notebook section in the spraydf tutorial.

A comment in streamdf.py points at StreamTrack as a candidate shared backend for a future follow-up PR that could simplify streamdf's track plumbing.

Test plan

  • 11 new unit tests in tests/test_streamspraydf.py pass (progenitor recovery, sample consistency, interface, PSD covariance, both arms, iteration, chen24+fardal15, center/satellite streams, physical units, particle reuse, plot smoke).
  • Existing test_streamspraydf.py tests still pass (28 total passing locally).
  • Notebook section uses existing spdf_both fixture; cells run interactively against a local execution.

🤖 Generated with Claude Code

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.91%. Comparing base (9494856) to head (a6712e8).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #861    +/-   ##
========================================
  Coverage   99.91%   99.91%            
========================================
  Files         224      225     +1     
  Lines       32912    33619   +707     
  Branches      710      710            
========================================
+ Hits        32885    33592   +707     
  Misses         27       27            

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jobovy
Copy link
Copy Markdown
Owner Author

jobovy commented Apr 18, 2026

StreamTrack implementation alternatives — comparison summary

Seven alternative branches were explored, each changing one axis of the algorithm. All branch off streamspray-track and are pushed (no PRs). Comparison plots (Bovy14 + Pal 5 + warm 10^7 Msun streams, aligned phi_1/phi_2 coordinates) are in the repo root as alt_*.png files.

Main branch (streamspray-track)

Closest-point projection (3D position) onto a finely-integrated, short-range progenitor orbit (0.03*tdisrupt). Bin-then-smooth offsets-from-progenitor with UnivariateSpline (chi-square s=N). Percentile-trimmed tp_grid.

Alternatives tested

Branch What it changes Cold stream (Bovy14) Pal 5
alt-gcv GCV auto-smoothing (make_smoothing_spline) Eliminates leading-arm wiggle Clean
alt-no-binning Per-particle splines, no binning Slight tip oscillation Small spikes at tips
alt-rotating-frame Smooth in progenitor-aligned rotating frame Clean Clean
alt-6d-closest 6D (position+velocity) closest-point Clean Clean
alt-strip-time-affine tp = arm_sign × dt (no closest-point) Dramatic wrap-aliasing failure Same
alt-auto-timerange Data-driven track_time_range from particle spread Clean Clean
alt-kdtree cKDTree for closest-point (speed) Identical to main Identical

Key findings

  1. alt-strip-time-affine (tp=dt directly) fails catastrophically for multi-wrap streams. The track loops across the sky multiple times. This validates the closest-point matching approach.

  2. alt-gcv (GCV smoothing) fixes the leading-arm wiggle visible in the main branch's Bovy14 fardal15 track. The main branch's chi-square s=N criterion slightly under-smooths; GCV auto-selects a better lambda.

  3. alt-auto-timerange is essential for warm streams (see next comment).

  4. alt-no-binning, alt-rotating-frame, alt-6d-closest, alt-kdtree produce tracks within ≤0.05 galpy units of main for cold streams — the algorithmic choices they test are secondary for typical use.

Recommendation

Combine: auto-timerange (scales with stream width) + GCV smoothing (eliminates under-smoothing) + 6D matching (robustness for eccentric orbits) + boundary pile-up exclusion + percentile trimming. Require scipy >= 1.10 (released Jan 2023).

Comparisons

Main branch (streamspray-track, PR #861)

Closest-point projection of each particle onto a finely-integrated, short-range (0.03*tdisrupt by default) progenitor orbit that spans both sides of tp=0. Bin-then-smooth offsets-from-progenitor with UnivariateSpline (chi-square-like auto-s). Covariance smoothed per-entry and PSD-projected.

The public tp_grid is trimmed to the 99th percentile of the
assigned tp values (symmetric 1st percentile on the trailing side),
so a handful of outlier particles matched to sparsely-populated
boundary bins can't drag the smoothing spline around at the tips.
trim_percentile is a local constant in StreamTrack.__init__ — we can
expose it as a kwarg later if there's demand.

Bovy14 stream (LogarithmicHaloPotential, q=0.9, tdisrupt=4.5 Gyr):

image

Pal 5 stream (MWPotential2014, Orbit.from_name("Pal 5"), tdisrupt=5 Gyr):

image

All 7 alternatives below inherit the same trim and can be toggled by git checkout alt-<name>. Each plot now has two panels: an aligned (phi_1, phi_2) view with the alt's track on top of the sample cloud and the streamspray-track reference as dashed black lines, plus a difference curve per arm against the saved reference.

What the bottom panel plots. The y-axis on the difference panel is the 3D Euclidean distance between the two tracks' galactocentric Cartesian positions at each tp:

d(tp) = ||(x, y, z)_alt(tp) − (x, y, z)_main(tp)||

in galpy internal length units (multiply by ro ≈ 8 for kpc). Not phi_2, not a single coordinate. Velocities are ignored in the diff metric; they're smoothed the same way as position, so positional agreement is a good proxy.

1. alt-gcv — auto-smoothing via GCV

scipy.interpolate.make_smoothing_spline replaces UnivariateSpline; the smoothing parameter is selected automatically by Generalized Cross Validation, so the user doesn't tune s.

  • Pros: no smoothness knob; data-driven bias/variance tradeoff.
  • Cons: requires scipy>=1.10; slower per fit.
image image
2. alt-no-binning — per-particle smoothing splines

Skip binning entirely. Each particle contributes one (tp_i, offset_i) point to six cubic smoothing splines; covariance comes from smoothing splines on residual outer-products.

  • Pros: no ntp knob, no empty-bin edge cases, adapts to uneven particle density along tp.
  • Cons: fits N points rather than ntp (slower for large n); FITPACK occasionally warns when s is mis-estimated from pairwise noise.
image image
3. alt-rotating-frame — smooth in the progenitor-aligned frame

At each particle's tp, rotate the raw offset into a progenitor-aligned frame (L along z, progenitor at +x). Smooth in that frame, then rotate back. This was the plan's original proposal.

  • Pros: rotated offsets carry clean physical meaning (along-stream / transverse / vertical); dynamic range of the smoothed signal is minimal.
  • Cons: per-tp rotation matrices add bookkeeping; interpolated matrices need SVD re-orthogonalization.
image image
4. alt-6d-closest — closest-point in 6D instead of 3D

Match particles to the progenitor orbit in full 6D phase space (position + velocity, both in galpy internal units) rather than 3D position alone.

  • Pros: unambiguous at orbit self-intersections (same xyz, different v); cleaner tp assignments near pericenter/apocenter.
  • Cons: larger distance-matrix memory; the position/velocity weighting is fixed by the choice of units rather than physically motivated.
image image
5. alt-strip-time-affine — tp = arm_sign * dt

Simplest possible mapping from particles to the affine parameter: use the known stripping time dt directly (positive for leading, negative for trailing). track_time_range defaults to tdisrupt so the progenitor is integrated over the full [-tdisrupt, +tdisrupt].

  • Pros: fully deterministic, no closest-point failure modes, exact per-particle affine value.
  • Cons: tp and along-stream position are correlated with scatter (ΔΩ varies per particle), so the mean-track at a given tp isn't exactly on the stream's median curve. Three tests are relaxed / one skipped on this branch because their tp semantics assume the closest-point mapping.
image image
6. alt-auto-timerange — adaptive track_time_range from particle spread

Replace the fixed 0.03*tdisrupt default with a data-driven estimate: measure the stream's spatial extent from a probe sample, convert to orbital-time via progenitor speed, pad by 4x, clamp to [1, tdisrupt].

  • Pros: adapts to narrow vs wide streams automatically; safe across a wider range of progenitor/stream regimes.
  • Cons: needs one small extra sample draw when particles= not supplied; speed metric can be fragile for highly eccentric orbits.
image image
7. alt-kdtree — KD-tree closest-point matching

Replace the O(N*M) pairwise distance matrix with a scipy.spatial.cKDTree query. When dt/arm-sign masks restrict the allowed neighbors per particle, query K nearest and pick the closest allowed one, growing K 4x per pass until every point has a match.

  • Pros: sublinear per-query complexity; no O(N*M) memory footprint (matters for very large N or denser progenitor grids).
  • Cons: a Python-level fallback loop runs when masks are restrictive (still fast for N up to ~1e5).
image image

@jobovy
Copy link
Copy Markdown
Owner Author

jobovy commented Apr 18, 2026

Warm-stream comparison (10^7 Msun progenitor)

To test robustness for wider streams (dwarf-galaxy-mass progenitors), we ran all alternatives with the Bovy14 orbit but 1000× larger progenitor mass (10^7 Msun instead of 10^4 Msun). This produces a much fatter stream with ~10× larger tidal radius and velocity kicks.

Results

The main branch (3D closest-point, track_time_range = 0.03*tdisrupt = 3.8) fails: the leading arm track flies off to phi_2 ~ -35, far from the sample cloud. The root cause is insufficient progenitor orbit coverage: with only 3.8 galpy time units (~0.3 orbital periods), far-flung warm-stream particles can't find a close match on the short progenitor arc and pile up at the boundary.

Branch Warm stream result Why
main (3D, short range) Broken — leading arm flies off Progenitor arc too short for wide stream
alt-6d-closest (6D, short range) Still problematic Same short arc; 6D can't help if orbit isn't long enough
alt-gcv (GCV, short range) Same failure Better smoothing can't fix bad tp assignment
alt-rotating-frame (short range) Same failure Better frame can't fix bad tp assignment
alt-auto-timerange (3D, data-driven range) Works — track follows cloud Auto range = 13.6 (~1.1 orbital periods), covers all particles

Key numbers

  • Cold stream: d_max = 1.7, auto range = 5.2 (1.4× main)
  • Warm stream: d_max = 4.4, auto range = 13.6 (3.6× main)

The auto-timerange scales naturally with stream width because it measures the actual particle spread. The fixed 0.03*tdisrupt doesn't adapt.

Conclusion

For warm streams, progenitor orbit coverage is the primary bottleneck, not matching dimension or smoothing method. The auto-timerange fix is essential. 6D matching and GCV smoothing are secondary improvements that help with cold-stream edge cases (orbit self-intersection, under-smoothing) but cannot compensate for insufficient orbit coverage.

Boundary pile-up

Even with auto-timerange, particles at the edge of the progenitor arc are mismatched (their true closest point is outside the arc). These create a "pile-up" at the extreme tp values that biases the track. Fix: exclude particles whose tp_assign equals the far boundary of track_t_grid before fitting.

Example warm stream

image

@jobovy jobovy force-pushed the streamspray-track branch 2 times, most recently from 2d55afd to c9e3ea1 Compare April 18, 2026 16:31
@jobovy
Copy link
Copy Markdown
Owner Author

jobovy commented Apr 18, 2026

Warm-stream comparison (10^7 Msun progenitor, TriaxialNFW not included here)

Bovy14 orbit in LogarithmicHaloPotential(q=0.9), tdisrupt=4.5 Gyr, fardal15spraydf. Six projections: aligned sky (phi_1, phi_2), galactocentric (x, y), cylindrical (R, z), equatorial (RA, Dec), Galactic (l, v_los), heliocentric (dist, pm_ra).

Cold stream (10^4 Msun) — main branch

warm_test_cold_1e4

Warm stream (10^7 Msun) — main branch

warm_test_warm_1e7

Key findings

  • Cold stream: tracks follow the sample cloud cleanly in all six projections. GCV smoothing produces a smooth curve with no leading-arm wiggle.
  • Warm stream: tracks follow the sample cloud in all projections. The auto-timerange correctly picks track_time_range ≈ 27 (vs 3.8 for cold), covering the larger spatial extent. At every tp evaluation point, 74–436 sample particles are within 3 kpc of the track — confirming the track is correctly tracing through the thick cloud.
  • Some sky-coordinate panels show straight-line artifacts where matplotlib connects tp-consecutive points across the RA=0/360 wrap boundary. This is a visualization issue, not a track error.

Diagnostics

Cold (10^4) Warm (10^7)
auto track_time_range ~3.4 ~27
leading tp range [0, 1.1] [0, 11.8]
particles after boundary exclusion 3000/3000 3000/3000

@jobovy
Copy link
Copy Markdown
Owner Author

jobovy commented Apr 18, 2026

Triaxial NFW comparison: main vs stream-orbit v1 vs v2

TriaxialNFWPotential(normalize=1, a=20/8, b=0.8, c=0.6), Bovy14 orbit, tdisrupt=2 Gyr, n=1000 particles. Three projections: aligned sky, (x, y), (R, z).

Cold stream (10^4 Msun)

Main (progenitor orbit only):
triaxial_main_cold_1e4

v1 (one-shot orbit from raw particle-mean tip IC):
triaxial_v1_cold_1e4

v2 (two-step orbit from smoothed track tip IC):
triaxial_v2_cold_1e4

Cold verdict: All three nearly identical. The cold stream stays close to the progenitor orbit even in the triaxial potential.


Warm stream (10^7 Msun)

Main (progenitor orbit only):
triaxial_main_warm_1e7

v1 (one-shot orbit from raw particle-mean tip IC):
triaxial_v1_warm_1e7

v2 (two-step orbit from smoothed track tip IC):
triaxial_v2_warm_1e7

Warm verdict:

  • Main: both arms follow the thick cloud in all projections. Clean.
  • v1: trailing OK, but leading arm shoots off to R~26 in (R,z) — the noisy particle-mean tip IC leads to an orbit that diverges from the actual stream.
  • v2: matches main quality. The smoothed-track tip IC avoids the v1 divergence.

Summary

Cold (1e4) Warm (1e7)
Main (progenitor only) Clean Clean
v1 (particle-mean tip) Clean Leading diverges in (R,z)
v2 (smoothed-track tip) Clean Clean, matches main

The main branch already handles the triaxial warm case well — the progenitor orbit is a good enough reference. v2 matches main; v1 still fails from IC noise. The stream-orbit refinement might become more impactful for more extreme triaxiality (b < 0.6) or very long tdisrupt where cumulative orbital precession builds up.

@jobovy
Copy link
Copy Markdown
Owner Author

jobovy commented Apr 18, 2026

Triaxial NFW comparison (updated: 5 Gyr tdisrupt)

TriaxialNFWPotential(normalize=1, a=20/8, b=0.8, c=0.6), Bovy14 orbit, tdisrupt=5 Gyr, n=1000 particles.

Cold stream (10^4 Msun)

Main (progenitor orbit only):
triaxial_main_cold_1e4 (1)

v1 (one-shot orbit from raw particle-mean tip IC):
triaxial_v1_cold_1e4 (1)

v2 (two-step orbit from smoothed track tip IC):
triaxial_v2_cold_1e4 (1)


Warm stream (10^7 Msun)

Main (progenitor orbit only):
triaxial_main_warm_1e7 (1)

v1 (one-shot orbit from raw particle-mean tip IC):
triaxial_v1_warm_1e7 (1)

v2 (two-step orbit from smoothed track tip IC):
triaxial_v2_warm_1e7 (1)


Summary

Cold (1e4) Warm (1e7)
Main (progenitor only) Clean Clean
v1 (particle-mean tip) Clean Leading arm diverges
v2 (smoothed-track tip) Clean Clean, matches main

With 5 Gyr of disruption in the triaxial potential, the main branch still handles both cold and warm streams well. v1's noisy IC diverges for the warm case; v2's smoothed IC fixes this and matches main quality. The stream-orbit refinement doesn't visibly improve over using the progenitor orbit directly at this level of triaxiality (b=0.8, c=0.6).

@jobovy
Copy link
Copy Markdown
Owner Author

jobovy commented Apr 18, 2026

Status update: warm-stream testing, combined improvements, and stream-orbit exploration

Warm-stream testing (10^7 Msun progenitor)

Stress-tested the algorithm with a 1000x larger progenitor mass. The fixed track_time_range = 0.03 * tdisrupt was too short for warm streams — particles spread far from the progenitor couldn't find matches on the short progenitor arc, causing boundary pile-up and a broken track. Fix: auto-timerange from particle spread (8 * d_max / |v_prog|), which scales naturally with stream width.

Combined implementation

Merged four orthogonal improvements, each validated by the branch comparisons:

  • GCV smoothing (make_smoothing_spline, scipy ≥ 1.10) as default — eliminates the leading-arm under-smoothing wiggle. smoothing=<float> still available for explicit s.
  • 6D closest-point matching (position + velocity) — more robust at orbital self-intersections. No cost for cold streams.
  • Auto-timerange — essential for warm streams, harmless for cold.
  • Boundary pile-up exclusion — removes particles matched to the far edge of the progenitor arc before fitting.

Warm-stream deep dive

After initially thinking the warm stream's track broke in 3D projections, discovered it was a plotting bug (pyplot.sca(ax) missing). The track is correct in all projections — verified numerically (74–436 sample particles within 3 kpc of each track evaluation point).

Stream-orbit refinement exploration (alt-stream-orbit)

Explored using a "stream orbit" integrated from the stream tip as the smoothing reference instead of the progenitor orbit:

  • v1 (one-shot from raw particle-mean tip IC): diverges — noisy IC amplified over integration. Track flies outside sample cloud.
  • v2 (two-step: fit preliminary track on progenitor → evaluate smooth track at tip for noise-free IC → integrate stream orbit from that): works — avoids divergence, matches main-branch quality.
  • Conclusion: v2 reliably avoids v1's noise problem, but for the streams tested (LogarithmicHaloPotential and TriaxialNFW b=0.8, c=0.6, tdisrupt=5 Gyr), the main branch (progenitor orbit only) already works well. The refinement doesn't visibly improve the track. Left as experimental on alt-stream-orbit.

Triaxial NFW comparison

Ran main vs v1 vs v2 in TriaxialNFWPotential(b=0.8, c=0.6), tdisrupt=5 Gyr, for both cold (10^4 Msun) and warm (10^7 Msun) streams. Main handles both cleanly; v1 diverges for warm; v2 matches main. See earlier PR comments for plots.

Timing (track construction only, excluding particle sampling)

n=1000 n=3000
Main ~1.0s ~2.3s
v2 (stream-orbit) ~2.0s ~4.6s

v2 is 2x main (fits the track twice). Progenitor integration is negligible (<60ms). Cost dominated by the N×M distance matrix and GCV spline fits. Potential choice and progenitor mass don't affect timing.

Current state

  • streamspray-track: GCV + 6D + auto-timerange + boundary exclusion + percentile trim. 23 tests passing.
  • alt-stream-orbit: experimental v2 refinement. Works but doesn't improve over main for tested potentials.
  • 7 other alternative branches available for reference.

@jobovy jobovy force-pushed the streamspray-track branch from 358c1c4 to 80eb876 Compare April 18, 2026 22:40
@jobovy
Copy link
Copy Markdown
Owner Author

jobovy commented Apr 19, 2026

Timing benchmarks (current implementation)

All times are track construction only — particle sampling/orbit integration excluded. The particles= argument lets users pre-sample once and reuse; sampling cost scales linearly with n and depends on the potential (e.g., ~17s for n=3000 in LogarithmicHalo, ~170s in TriaxialNFW).

scipy 1.17.1, make_smoothing_spline (GCV) for auto-smoothing.

LogarithmicHaloPotential (q=0.9), tdisrupt=4.5 Gyr

n order=1 (mean) order=2 (mean+cov) reuse-s (order=2)
1000 0.13s 0.50s 0.09s
3000 0.22s 0.75s 0.17s
5000 0.33s 0.98s 0.24s

Cold (10⁴ M☉) and warm (10⁷ M☉) streams give essentially identical timing — progenitor mass doesn't affect track construction cost.

TriaxialNFWPotential (b=0.8, c=0.6), tdisrupt=5 Gyr

n order=1 (mean) order=2 (mean+cov) reuse-s (order=2)
1000 0.19s 0.52s 0.14s
3000 0.28s 0.79s 0.22s

Slightly higher than LogarithmicHalo due to the longer progenitor arc (5 vs 4.5 Gyr), but same order of magnitude.

Cost breakdown

  • GCV spline fitting dominates: order=2 fits 27 splines (6 mean + 21 unique covariance elements) vs. 6 for order=1, explaining the ~3x difference.
  • KDTree matching: negligible (<50ms for n=3000). This replaced an N×M distance matrix that was ~1.6s for n=3000.
  • Progenitor integration: <1ms (already integrated during spraydf setup).
  • smoothing_s reuse: bypasses GCV entirely, using UnivariateSpline(s=...) with the stored per-spline smoothing values. Gives 3–4x speedup over GCV for order=2. Useful when re-fitting with different particles but similar stream properties.

Iteration (niter)

Each iteration re-matches particles to the current smooth track and refits. Cost is ~0.1s per iteration (n=3000, order=1):

niter time (order=1)
0 0.21s
1 0.31s
2 0.41s
3 0.51s

Summary

Track construction is fast (sub-second for typical use). The dominant cost is always the upstream particle sampling, which scales with n and potential complexity. The particles= interface and smoothing_s reuse are the two main levers for avoiding redundant work.

jobovy and others added 19 commits April 25, 2026 22:19
StreamTrack builds a smooth mean + covariance phase-space curve through
spray-sampled stream particles, parameterized by the progenitor's time
coordinate tp in [-tdisrupt, 0]. It exposes Orbit-like accessors
(x, y, z, vx, vy, vz, R, vR, vT, phi) plus heliocentric conversions
(ra, dec, dist, ll, bb, pmra, pmdec, pmll, pmbb, vlos) and a plot()
method with an optional covariance band. StreamTrackPair holds leading
and trailing arms when tail='both'.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests cover: progenitor recovery in tiny-tdisrupt limit, consistency of
the track with binned sample means, interface coverage (Cartesian,
cylindrical, heliocentric accessors), PSD covariance, both-arms
behavior, iteration, chen24 and fardal15 variants, center-orbit
(satellite) streams, physical-unit toggle, particle reuse, and a
plot smoke test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds:
- API reference pages for StreamTrack, StreamTrackPair, and the
  streamTrack() method on basestreamspraydf.
- A "Smooth stream track from spray samples" section appended to the
  streamspraydf tutorial notebook demonstrating sample + track overlay,
  heliocentric projection, smoothing control, and iteration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Drop dead code (unused _setup_rot_at_tp helper, unused solar-motion
  string aliases in _get_vsun_kms, unused defensive branches).
- Consolidate per-coordinate physical-unit scaling into a single
  _scale() helper so all unit branches are exercised by one test.
- Make _smooth_series robust to degenerate inputs (no valid bins or a
  single bin pad to a 2-point linear interpolator) and sanitize NaN/Inf
  in the smoothed covariance before PSD projection.
- Add tests covering: physical-unit accessors, scalar cov, plot spread
  band on length axes, both-arms physical toggle, __call__ stacked
  return, tp_grid, order=1 (no covariance), invalid tail, particle
  reuse with tail='both', smoothing variants, and a degenerate
  small-n / large-ntp case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Compute particles_cart once in __init__ instead of every _fit
  iteration (the particles never change).
- Remove unused state: _ntp, _tp_assign, _coord_splines, _bin_means,
  _bin_covs.
- Collapse the six explicit InterpolatedUnivariateSpline blocks into
  a list comprehension; do the same for the per-coord smoothing-spline
  fits (one tighter pass over six coordinates).
- Compress _helio_xv into two function calls instead of six temporary
  ro/vo-scaled arrays.
- _assign_closest_on_track now reuses self._particles_cart and uses
  einsum for the squared-distance reduction (one O(N*M) allocation
  instead of three).
- Centralize the dist physical-units case via a new 'kpc' kind on
  _scale; drop the redundant trailing return None on turn_physical_on.

100% coverage on streamTrack.py is preserved (file shrinks ~10%).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- StreamTrack now starts with _physical=True when the progenitor has
  both ro and vo set, so samples and the track plot on the same scale
  without an explicit turn_physical_on() call.
- Bump the default ntp from ~sqrt(n) to ~n/15 (clipped to [31, 201]).
  The sqrt rule gave too few knots to resolve the progenitor's orbital
  oscillations over multi-Gyr tdisrupt (~10 oscillations for the Bovy14
  stream), producing a spline that looped across the stream cloud —
  especially visible with the Chen+24 model, whose larger intrinsic
  scatter makes the deficit obvious.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…moothing

- tp assignment via closest-point projection of each particle onto a
  freshly, finely-integrated progenitor orbit (separate from the
  sampling-time orbit), windowed by arm sign (leading -> tp>=0,
  trailing -> tp<=0) and by stripping time (|tp| <= dt_i, resolves
  wrap aliasing).
- Short progenitor integration: ~0.03*tdisrupt symmetric around tp=0,
  integrated forward and backward separately from the present-day IC,
  then spliced on a dense 10001-point grid.
- Smoothing of offsets from the progenitor orbit (particle - prog(tp))
  rather than raw positions: offsets stay small and well-behaved
  independent of orbital phase, so the fit doesn't chase the
  progenitor's own orbital oscillations.
- Binning retained (simpler and more robust than raw-particle splines
  at low particle counts) with auto-s from bin-mean standard error.
- Public tp_grid restricted to the observed tp range of the fit, and
  evaluations are clipped to that range to prevent cubic-spline
  extrapolation outside data support.

Tests updated to match the new tp-grid semantics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a test for the user-supplied ``track_time_range`` kwarg (float and
Quantity), which also covers the non-default branch in
basestreamspraydf.streamTrack. Marks three small defensive paths in
_smooth_series and StreamTrack.__init__ that only trigger on
pathologically degenerate inputs as no-cover; the paths are kept as
safety nets but realistic usage never exercises them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Not all CI matrix jobs install astropy; the Quantity portion of the
test is now skipped gracefully when astropy is absent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Following existing convention (test_streamspraydf_sample_RvR etc. live
in test_quantity.py), move the astropy-Quantity portion of
test_streamTrack_custom_track_time_range there. The float-only portion
remains in test_streamspraydf.py so it runs on CI jobs that don't
install astropy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lusion

Combines four improvements validated by branch comparisons:

- GCV smoothing (make_smoothing_spline) as default when smoothing=None.
  Eliminates leading-arm under-smoothing. User can pass smoothing=float
  for explicit s via UnivariateSpline.
- 6D closest-point matching (position + velocity) instead of 3D.
- Auto-timerange: 8 * d_max / |v_prog| instead of fixed 0.03*tdisrupt.
- Boundary pile-up exclusion before fitting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. (bug) progpot + streamTrack: save the pre-progpot potential as
   _orig_pot and use it for the track-progenitor integration, avoiding
   the MovingObjectPotential whose internal orbit is only valid on
   [-tdisrupt, 0].

2. (bug) vlos double-scaling: _helio_xv already returns km/s (it pre-
   multiplies by vo), so the 'vlos' branch of _scale now only attaches
   units without re-scaling.

3. (docs) Fix docstrings: track_time_range default (now auto-timerange),
   track_n_dense actual grid size, smoothing description (now GCV), add
   ntp kwarg docs.

4. (docs) Fix class docstring tp range description (not [-tdisrupt, 0]),
   add arm_sign and ntp to Parameters section.

5. (cleanup) plot spread band: remove dead s1 variable, gate physical
   scaling on self._physical only (consistent with _scale), drop the
   d1-in-cart_idx requirement since the band is only in the d2 direction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
a) KDTree closest-point matching replaces O(N*M) distance matrix.
b) order=1 skips 21 covariance spline fits.
c) track.smoothing_s exposes per-spline effective s values (len 6 or
   27). Pass back as smoothing= to skip GCV on subsequent calls.

Timings (n=3000, pre-sampled, LogHalo):
  GCV order=2: 0.79s (was 2.3s), order=1: 0.25s, reuse-s: 0.17s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds test for array-like smoothing (reuse smoothing_s from a previous
GCV fit) and order=1 (mean-only s values). Marks two KDTree fallback
branches as no-cover (k=1 ndim edge case and k>=M exhaustion).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests _closest_point_on_curve with:
- Short curve (M < 64) + mask: triggers k=1 → cand.ndim==1 branch.
- All-False mask: triggers k>=M exhaustion → tp=0 fallback.

Both branches now covered by proper tests instead of no-cover pragmas.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jobovy and others added 12 commits April 25, 2026 22:19
`track.cov(tp)` was returning galpy internal units² regardless of the
`turn_physical_on/off` state, while `track.x(tp)` etc. returned kpc and
km/s. The inconsistency made sampling `N(mean, cov)` produce point
clouds whose scatter was ~8× too small in physical-unit space — easy
to miss until a user compared the implied width to the actual particle
spread. Scale the returned cov by `outer(scale, scale)` where
`scale = [ro, ro, ro, vo, vo, vo]` when physical is on, matching the
per-coordinate scaling already done by the x/y/z/vx/vy/vz accessors.

Also drop the now-redundant manual ro/vo scaling in `plot(..., spread>0)`
which would otherwise double-apply.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After the fit, `track.particles` returns the untouched `(xv, dt)` pair
the `StreamTrack` was built from — same format that
`streamTrack(particles=...)` accepts. Closes the loop with the existing
input knob: callers can now plot the track over its own sample cloud or
pass the data to a sibling fit without resampling or risking a mismatch
between track and scatter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…helper

- New `custom_transform=` kwarg on `StreamTrack.__init__` (plumbed
  through `basestreamspraydf.streamTrack`) — a 3x3 rotation matrix
  from (ra, dec) to a custom (phi1, phi2) sky frame, matching the
  same-named knob on `streamdf`.
- New accessors `phi1(tp)`, `phi2(tp)`, `pmphi1(tp)`, `pmphi2(tp)` on
  `StreamTrack`. Require `custom_transform` to be set; raise otherwise.
- New `galpy.util.coords.align_to_orbit(progenitor, center_phi1=180)`
  helper that builds a rotation matrix placing the progenitor's orbital
  plane at phi2=0 and the progenitor itself at phi1=center_phi1
  (default 180°, so the stream wraps across 0/360 rather than through
  the centre of the plot).

Covariance in the custom frame is deferred to a follow-up (per the
TODO); use `cov(tp)` in the galactocentric Cartesian basis for now.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ky) + extend plot(spread=) to all axes

`track.cov(tp, basis=...)` now returns the 6x6 covariance in any of
the five natural bases: galcenrect (default, as before), galcencyl,
sky (ra, dec, dist, pmra, pmdec, vlos), galsky (ll, bb, dist, pmll,
pmbb, vlos), and customsky (phi1, phi2, ...) when custom_transform is
set. Propagation uses analytical Jacobians throughout (no finite
differences); rotation matrices inverted by transpose, the non-
orthogonal lbd↔XYZ block inverted via numpy.linalg.inv. The sky-sky
tangent rotations include the α-depends-on-position correction in the
(pm, sky-position) block so the full 6x6 is exact, not just the
block-diagonal.

`plot(spread>0)` now draws the ±σ band for every d2 that has a basis
in _COORD_BASIS (Cartesian, cylindrical, equatorial sky, Galactic sky,
custom-sky) — previously only Cartesian axes produced a band.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fensive plot-spread branch

- `test_streamTrack_custom_accessors_no_physical` hits the non-Quantity
  branch of the scalar-extract helpers in phi1/phi2/pmphi1/pmphi2.
- Plot's `except RuntimeError` fallback is actually unreachable (the
  custom-sky accessor raises first when custom_transform is absent), so
  mark it `# pragma: no cover (defensive)`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New test exercises streamTrack on a basestreamspraydf built with
`progpot=` set, hitting the `self._orig_pot = self._pot` assignment in
`basestreamspraydf.__init__` that streamTrack relies on to integrate
the progenitor through the bare (non-MovingObjectPotential) base pot.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Makes the "build a custom sky-frame rotation aligned to this orbit's
orbital plane" helper accessible directly from an Orbit instance:

    T = progenitor.align_to_orbit()  # default center_phi1=180
    track = spdf.streamTrack(custom_transform=T, ...)

The underlying galpy.util.coords.align_to_orbit(progenitor) stays the
generic entry point and is untouched. Returns a 3x3 numpy rotation
matrix; no unit / shape decorators, so the output is basis-oriented,
not per-orbit-axis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sian 6D

The helper in `galpy.util.coords` shouldn't depend on
`galpy.orbit.Orbit` — that module is supposed to be a pure coord-
transform library. Refactor the signature from
`align_to_orbit(progenitor, center_phi1)` to
`align_to_orbit(X, Y, Z, vX, vY, vZ, center_phi1)` taking heliocentric
Galactic Cartesian kinematics directly (the only quantities the angular
momentum L = r × v actually needs).

`Orbit.align_to_orbit(center_phi1=180)` becomes a thin method that
extracts the heliocentric Cartesian from `self` (via
`galcenrect_to_XYZ` / `galcenrect_to_vxvyvz` in internal units, since
the returned rotation matrix is scale-invariant) and forwards to the
coords helper.

Also removes the `_coords_import_orbit()` lazy-import stub that was
needed only to work around the circularity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Take galactocentric (x, y, z, vx, vy, vz) + Xsun/Zsun instead of
heliocentric Galactic Cartesian. The alignment uses

    L = (r_prog - r_sun) x v_prog

(Sun treated as at rest in the Galactocentric frame) — this is the
quantity that defines the great circle on the observer's sky closest
to the apparent orbit trajectory. The previous signature took a fully
heliocentric 6D (velocity with the Sun's motion subtracted), which
gives a plane with a noticeably larger residual phi2 spread in tests.

Orbit.align_to_orbit is a thin wrapper that forwards the progenitor's
native galactocentric coordinates plus Xsun=1, Zsun=zo/ro in internal
units.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
galcenrect_to_XYZ and XYZ_to_lbd return shape (3,) for scalar inputs
and align_to_orbit is scalar-only, so the numpy.atleast_1d + ndim==2
guards were unreachable. Codecov flagged the else-never branches —
simpler to just remove them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refactor the unit handling in galpy.df.streamTrack to match Orbit:

- Each accessor (x/y/z/vx/vy/vz/R/vR/vT/phi/ra/dec/ll/bb/dist/pmra/
  pmdec/pmll/pmbb/vlos/phi1/phi2/pmphi1/pmphi2) is decorated with
  @physical_conversion(quantity, pop=True), so callers can override
  use_physical / ro / vo / quantity per-call exactly like with Orbit.
- turn_physical_on/off now toggle _roSet/_voSet (the decorator already
  reads those), and the bespoke _physical flag is gone.
- StreamTrack.plot follows the same logic as Orbit.plot: physical
  output iff _roSet and _voSet (or use_physical=True / explicit ro=
  / vo=), per-call ro/vo/use_physical override the object-wide default,
  and a small _cov_axis_scale helper converts the spread band to
  physical units when needed.
- StreamTrackPair.plot forwards the new kwargs.

New tests in tests/test_quantity.py mirror the Orbit unit tests:

- test_streamtrack_method_returntype: every accessor returns Quantity
  when physical is on.
- test_streamtrack_method_returnunit: each accessor returns the right
  unit (kpc, km/s, mas/yr, deg, rad).
- test_streamtrack_method_value_default_ro_vo: physical = internal *
  ro for positions and internal * vo for velocities.
- test_streamtrack_method_per_call_override: use_physical=False, ro=,
  vo=, quantity=False work per-call.
- test_streamtrack_method_value_turn_physical_off: cartesian /
  cylindrical galactocentric accessors drop the Quantity wrapper after
  turn_physical_off; sky-frame / underscore-tagged accessors keep it.
- test_streamtrack_nondefault_rovo: a track built with ro=9 kpc,
  vo=200 km/s scales physical outputs accordingly.
- test_streamtrack_pair_turn_physical: pair propagates to both arms.
- test_streamtrack_plot_follows_physical: smoke-test plot in physical,
  internal, and per-call override modes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_cov_axis_scale was wrong: cov(basis=..) already scales its diagonal
to physical units when _roSet and _voSet are True (via the existing
outer-product scaling in cov()), so multiplying by ro/vo again
double-scaled the sigma in the default code path — the spread band
would be ~ro times too wide.

Removing the helper gives the correct band in the default case (the
units of cov()'s diagonal automatically match the accessor's default).
The only remaining inconsistency is when the caller explicitly passes
use_physical=False to plot on a track that has _roSet=True; the spread
band will be in the track's default-physical units rather than
internal. That is documented in the new comment and is out of scope
for this plot helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jobovy jobovy force-pushed the streamspray-track branch from 08536e8 to 5ce428f Compare April 26, 2026 02:20
jobovy and others added 7 commits April 26, 2026 00:35
- cov(tp, basis=...) now accepts ro=, vo=, use_physical=, quantity=
  matching the per-call override semantics of the mean accessors.
  quantity=True raises NotImplementedError (the 6x6 has heterogeneous
  units across entries).
- _COORD_BASIS is derived from _BASIS_COORDS via reversed-order dict
  comprehension so the two stay in sync (single source of truth, with
  first-listed-basis-wins for shared coord names like z/vz/dist/vlos).
- plot() forwards ro/vo/use_physical to the accessors and to cov()
  rather than re-deriving the use_phys decision; the dropped block
  duplicated the same defaulting that @physical_conversion already
  applies in the accessors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new ro/vo/use_physical kwargs on cov() reach the analytical
Jacobian via _cart_mean_at and _analytical_jacobian only when a
non-galcenrect basis is requested *with* an explicit override —
add that case to the override test so codecov sees the threaded
branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the sub-Jacobians that StreamTrack.cov(basis=...) needed inline into
util.coords, where the rest of galpy's Jacobian helpers already live, and
collapse the previous ~190-line _analytical_jacobian to ~30 lines that
just chain them together.

New in coords:

- XYZ_to_lbd_jac: closed-form analytical inverse of lbd_to_XYZ_jac. The
  6x6 forward is block-triangular and its velocity block is a (scaled)
  rotation, so the inverse is a transpose with two rows divided by K·D
  for the velocity block and the standard spherical-Cartesian inverse
  for the position block. Replaces the numpy.linalg.inv(J_lbd) call —
  same Jacobian, no LAPACK round-trip.
- galcenrect_to_galcencyl_jac: closed-form 6x6 cylindrical change of
  variable. Was previously a private module-level helper in streamTrack.
- galsky_to_sky_jac: local tangent-rotation 6x6 mapping
  (l, b, D, pmll, pmbb, vlos) → (ra, dec, D, pmra, pmdec, vlos).
- sky_to_customsky_jac: same form parameterized by a custom_transform
  rotation matrix T.

Each new function has a finite-difference / inverse-identity test in
test_coords.py.

streamTrack._analytical_jacobian now reads as the chain rule:
galcenrect → XYZ_jac → XYZ_to_lbd_jac → galsky_to_sky_jac →
sky_to_customsky_jac, with the radians→degrees conversion applied once
at the end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion

Previously _cart_eval/_eval_cart clipped tp to the spline's data support;
out-of-range queries silently returned the boundary value rather than
flagging an error. cov() did the same via numpy.interp's clamp behavior.
That made it easy to miss when a caller was probing outside the fit
(e.g. using -tdisrupt as a tp on a leading-arm track, where tp >= 0).

Now: out-of-range tps return NaN; for an array tp, only the offending
entries are NaN and the in-range entries are unaffected. cov() honors
the same convention and skips the per-tp Jacobian step on NaN entries.

Several existing tests were inadvertently passing -10/-20/-tdisrupt to
leading-arm tracks (where the silent clamp masked it); switched them
to query tp_grid()-derived in-range values. Added a dedicated test
that asserts the NaN behavior for both scalar and array tps and
across cov(basis=galcenrect|sky).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ches

Codecov flagged 9 patch misses on the new Jacobian helpers. Two sources:

- The degree=True conversion paths in XYZ_to_lbd_jac (3x3-only branch)
  and in galsky_to_sky_jac / sky_to_customsky_jac were not exercised by
  the inverse-identity / finite-difference tests. Add direct degree=True
  checks and a 2-arg call for sky_to_customsky_jac.
- Pole-singular and Sun-as-origin branches in XYZ_to_lbd_jac (r==0,
  cb==0, KD==0) are defensive for callers that pass pathological
  positions; the stream-track use case never hits them. Mark with
  pragma: no cover and split the conditional pmll/pmbb assignment into
  an if/else so the defensive arm is excluded cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The class was constructor-bound to streamspraydf's particle inputs, which
blocked reuse from streamdf (which has no particles and a different
affine parameter). Refactor:

- StreamTrack.__init__ now takes a precomputed track:
  (tp_grid, track_xyz, track_vxvyvz, cov_xyz=None, custom_transform,
  parameter_kind, ro/vo/zo/...). It builds the cubic-spline interpolators
  and stores covariance — that's it. No fitting, no particles required.
- The closest-point + offset-smoothing pipeline moves into
  _fit_track_from_particles (module-level) and is wrapped by the
  StreamTrack.from_particles classmethod, which calls __init__ and
  attaches the fitter outputs (.particles, .smoothing_s) the user may
  want for plotting / GCV-cache reuse.
- A new parameter_kind kwarg controls how astropy Quantity inputs to
  accessors are interpreted: "time" (default, current spraydf behavior),
  "angle" (parsed via parse_angle — the natural choice for streamdf-
  style angle-along-stream tracks), or None (pass-through).
- streamspraydf.streamTrack now constructs via from_particles. No
  user-facing behavior change.

Drops the old _fit / _prog_at / _assign_closest_on_progenitor /
_assign_closest_on_track instance methods (their work is now self-
contained in the module-level fitter).

Adds a test exercising the new precomputed-track __init__, the three
parameter_kind options, and the bad-kind ValueError. The existing
test_streamTrack_sample_consistency previously reached into the
fitter's now-gone scratch attributes; rewritten to recompute the
closest-point assignment using the public .particles tuple plus
the module-level _closest_point_on_curve helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… without astropy

The new test_streamTrack_precomputed_init_and_parameter_kind in
test_streamspraydf.py imported astropy unconditionally, which broke the
ubuntu/windows-3.14 CI matrix entry that runs the streamdf+streamspraydf
suite without astropy.

Move the Quantity-parsing checks into a new
test_streamtrack_parameter_kind_quantity_inputs in test_quantity.py
(consistent with how track_time_range Quantity is split out from the
plain-float test_streamTrack_custom_track_time_range). The plain-float
parameter_kind / precomputed-init coverage stays in test_streamspraydf.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant