Conversation

@maresb
Collaborator

@maresb maresb commented Sep 21, 2025

I'm not sure if it makes a difference, but we should be attenuating the coefficients in our GP according to the standard decay for the (in this case Matérn) GP prior. Namely, the variance should be the eigenvalue. @aseyboldt, for some reason this is running really slowly for me (also with main). Could you check if this works for you?

Before: all coefficients have std = 1 for elastic or std = 10 for coupling
After: coefficients have std = C sqrt(lambda), where C is chosen so that the top eigenvalue has the previous default std of 1 or 10.
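As a concrete sketch of the rescaling (illustrative names, not celeri's actual API):

```python
import numpy as np

def coefficient_stds(eigvals, target_std):
    """Attenuate coefficient stds as C * sqrt(lambda).

    C is chosen so the coefficient of the top (largest) eigenvalue keeps
    the previous default std (1 for elastic, 10 for coupling).
    `eigvals` are assumed to be the positive KL eigenvalues.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    C = target_std / np.sqrt(eigvals.max())
    return C * np.sqrt(eigvals)

# Lower modes are attenuated by sqrt(lambda / lambda_max):
stds = coefficient_stds([4.0, 1.0, 0.25], target_std=10.0)  # -> [10.0, 5.0, 2.5]
```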

@maresb maresb requested a review from aseyboldt September 21, 2025 14:48
@aseyboldt
Collaborator

Sorry for the slow review, but this looks great, I think we can merge this.


@maresb maresb marked this pull request as draft October 17, 2025 20:00
@maresb
Collaborator Author

maresb commented Oct 17, 2025

Converted to draft because it contains a bunch of results in notebooks that I wouldn't want in the main branch...

@maresb
Collaborator Author

maresb commented Oct 17, 2025

Preliminary results

My hypothesis for this experiment was a slight difference in this direction. I'm pretty surprised about the major difference, especially with "strike slip coupling". Maybe the higher modes are indeed prior-dominated. I'm wondering if @aseyboldt has any thoughts.

What changed?

"Before" we had implemented a simple white noise prior, where each coefficient had a standard normal prior. Here I was suspicious about the prominence of the high eigenfunctions.

"After" we implemented a Matérn $\nu=\tfrac{1}{2}$ prior. This dampens the coefficient priors to normals with variance equal to the corresponding eigenvalue. In other words, it makes the plots look less wobbly.

🤓 details: The kernel we use to compute the KL modes is $\exp(-d/\ell)$, which is the Matérn kernel with $\nu=\tfrac{1}{2}$. In contrast, the white noise prior is effectively $\nu=-1$, so in various technical senses it has 1.5 derivatives less. Note that in the limit of infinitely many eigenfunctions, $\nu=\tfrac{1}{2}$ samples are almost surely $\alpha$-Hölder continuous for $\alpha<\tfrac{1}{2}$, which means that as you crank up the number of eigenfunctions, samples will be continuous, and be very close to satisfying $|f(x+\delta) - f(x)| < C \sqrt\delta$. (A differentiable function will basically by definition satisfy this with $<C \left|\delta\right|$.) In contrast, with white noise, things will diverge when you crank up the number of eigenfunctions.
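A toy numerical check of that last claim, on a 1D grid with illustrative grid size and length scale: the pointwise prior variance of the truncated KL sum grows without bound under white-noise coefficients, but converges to $k(x,x)$ under eigenvalue-weighted coefficients.

```python
import numpy as np

# Expand the Matérn-1/2 kernel exp(-d/ell) on a 1D grid (toy example).
x = np.linspace(0.0, 1.0, 200)
K = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.1)
lam, V = np.linalg.eigh(K)       # eigh returns ascending order
lam, V = lam[::-1], V[:, ::-1]   # sort descending

def mean_prior_variance(n_modes, weight_by_eigenvalue):
    """Pointwise prior variance of the truncated KL sum, averaged over x."""
    w = lam[:n_modes] if weight_by_eigenvalue else np.ones(n_modes)
    return float((w * V[:, :n_modes] ** 2).sum(axis=1).mean())

white_10 = mean_prior_variance(10, weight_by_eigenvalue=False)    # n/200 = 0.05
white_100 = mean_prior_variance(100, weight_by_eigenvalue=False)  # grows to 0.5
matern_100 = mean_prior_variance(100, weight_by_eigenvalue=True)  # close to k(x, x) = 1
```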

It's a bit of an open question how to set the overall scale for the prior. In the code I made it so the top before/after eigenvalues agree, and those are hard-coded to 10 for coupling and 1 for elastic.

Samples from "before"

Source: run of the notebook without the Matérn prior: https://github.com/brendanjmeade/celeri/blob/461530e9a6ea93b12650fde83d56995dbee37eb3/notebooks/doc_solve_mcmc.ipynb


Samples from "after"

Source: run of the notebook with the Matérn prior: https://github.com/brendanjmeade/celeri/blob/f21cd2d9017c6f90bfec2906eccb4d830be0ae0a/notebooks/doc_solve_mcmc.ipynb


@aseyboldt
Collaborator

aseyboldt commented Oct 18, 2025 via email

@maresb
Collaborator Author

maresb commented Oct 18, 2025

Ya, all the other results are consistent with what I expected: toning down the excessive "leopard spots". I'm really surprised though that the strike slip coupling looks so smooth.

@maresb
Collaborator Author

maresb commented Oct 20, 2025

It's a bit of an open question how to set the overall scale for the prior. In the code I made it so the top before/after eigenvalues agree, and those are hard-coded to 10 for coupling and 1 for elastic.

This was the problem. It's better not to hard-code the scale of the top eigenvalue.

I'm still not very confident about how to choose the overall scale for the priors.

@maresb
Collaborator Author

maresb commented Oct 27, 2025

Let me outline how I plan to finish this out.

There are a lot of skeletons lurking in the closet with a GP like this. My goal is to bring them all out into the open, make them explicit, and ensure that we identify all sources of inaccuracy and make deliberate, principled choices for all the parameters. I will provide a notebook that illustrates all the choices involved and their effect on the priors. This way we can figure out which parameters are identifiable, correspondingly fix parameter values or parameter priors according to domain knowledge, and enumerate all our assumptions. The relevant factors include:

  • The covariance kernel, e.g. "Matérn" or "squared exponential" (the smoothness $\nu\to\infty$ limit of Matérn). Celeri currently uses Matérn with $\nu=\tfrac{1}{2}$ which is probably not as smooth as desired, and we made the tentative decision to switch to "squared exponential". The kernel has two more parameters: the amplitude scale $\sigma$ and the correlation length scale $\ell$ which is a distance between two mesh elements.
  • The cutoff effects of using finitely many eigenmodes. This introduces a Nyquist length scale.
  • The triangulation. There are two main issues here:
    1. There's a third distance scale for the resolution of the mesh: the distribution of distances between adjacent centroids.
    2. A weight must be chosen for each triangle. Currently celeri uses a constant weight, and this has the advantage of providing more detail along refined regions. The more geometrically principled approach is to weight by area, which plays nicely with continuum limits. To get the best of both worlds, we could default to weighting by area, and then when desired, have an optional "measure" factor on top of the area for detail control.
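A sketch of what area weighting could look like (hypothetical helper, not celeri's current code): discretize the KL integral equation with a per-triangle measure and symmetrize so a standard symmetric eigensolver applies.

```python
import numpy as np

def weighted_kl_modes(K, weights):
    """KL modes of kernel matrix K under a per-element measure.

    Discretizes  ∫ k(x, y) φ(y) dμ(y) = λ φ(x)  with dμ ≈ `weights`
    (e.g. triangle areas, optionally times an extra "measure" factor).
    Solving the symmetrized system W^(1/2) K W^(1/2) keeps np.linalg.eigh
    applicable; the eigenvectors are mapped back to function values and
    come out orthonormal with respect to the weighted inner product.
    """
    sw = np.sqrt(np.asarray(weights, dtype=float))
    lam, U = np.linalg.eigh(sw[:, None] * K * sw[None, :])
    lam, U = lam[::-1], U[:, ::-1]   # largest eigenvalue first
    Phi = U / sw[:, None]            # back to function values on elements
    return lam, Phi
```

With constant weights this reduces (up to an overall scale) to the plain eigendecomposition of K, i.e. the current behavior.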

In addition to the above-mentioned distance-based kernels, there are also neighbor-based Laplacian kernels. One of the wishes was to check "distance-along-the-mesh" kernels, and the Laplacians are effectively both the best and easiest way to achieve this. (The naive distance-along-the-mesh kernels are not even positive-definite!) It's pretty straightforward to swap out the KL modes for the Laplacian modes, so this provides another "model selection" check.

When it comes to modeling the coupling constant, we enforce the constraint with a logit transform, and thus in logit space we need to set the amplitude scale, and also the mean! (Usually when discussing GPs we set the mean to zero.) The amplitude scale in logit space determines how sharply we transition between the 0 and 1 extremes. The mean determines our default guess for the coupling. Based on previous discussion, rather than using mean 0 in logit space, corresponding to default coupling of 0.5, we want the default coupling to be closer to 0.8. Also, we want to choose the amplitude so that coupling transitions gradually rather than sharply.
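A minimal sketch of the logit-space setup (values illustrative): the mean pins the default coupling at 0.8, and the amplitude controls how sharply draws swing between the 0 and 1 extremes.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Nonzero mean in logit space: the default coupling is sigmoid(mu).
mu = logit(0.8)                       # default coupling of 0.8, not 0.5

rng = np.random.default_rng(0)
f = rng.normal(0.0, 1.0, size=1000)   # stand-in for a zero-mean GP draw

coupling_sharp = sigmoid(mu + 5.0 * f)  # large amplitude: near-binary coupling
coupling_soft = sigmoid(mu + 0.5 * f)   # small amplitude: gradual transitions
```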

So what does this look like in the end? I see a module that takes a mesh and a bunch of configuration parameters and:

  • computes the eigenmodes according to the configuration parameters
  • given the additional non-covariance parameters, visualizes a bunch of prior draws on the mesh
  • computes and compares various analytic heuristics (effective correlation distance, Nyquist length, Weyl expansion, mesh properties) and identifies, for example, which length scale dominates among correlation, Nyquist, and mesh.
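For the heuristics bullet, a crude sketch of the "which length scale dominates" comparison (all names, and the Nyquist formula in particular, are rough placeholders rather than settled definitions):

```python
import numpy as np

def dominant_length_scale(ell, n_modes, mesh_diameter, centroid_spacings):
    """Compare the three length scales discussed above and report the largest.

    - correlation: the kernel length scale ell
    - nyquist: roughly mesh_diameter / n_modes, a crude stand-in for the
      shortest wavelength the truncated eigenbasis can represent
    - mesh: typical spacing between adjacent triangle centroids
    """
    scales = {
        "correlation": float(ell),
        "nyquist": mesh_diameter / n_modes,
        "mesh": float(np.median(centroid_spacings)),
    }
    return max(scales, key=scales.get), scales
```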

@brendanjmeade
Owner

@maresb and @aseyboldt It took me a while to read through this. The investigations here are smart; they let us keep baking nice properties into the solution and improve the cavalier, welded-together approach that I brought.

  • The covariance kernel, e.g. "Matérn" or "squared exponential" (the smoothness $\nu\to\infty$ limit of Matérn). Celeri currently uses Matérn with $\nu=\tfrac{1}{2}$ which is probably not as smooth as desired, and we made the tentative decision to switch to "squared exponential". The kernel has two more parameters: the amplitude scale $\sigma$ and the correlation length scale $\ell$ which is a distance between two mesh elements.

I'd like to understand the relative merits of these. Is there a theory that suggests the appropriateness of one form or another?

  • The triangulation. There are two main issues here:
    1. There's a third distance scale for the resolution of the mesh: the distribution of distances between adjacent centroids.
    2. A weight must be chosen for each triangle. Currently celeri uses a constant weight, and this has the advantage of providing more detail along refined regions. The more geometrically principled approach is to weight by area, which plays nicely with continuum limits. To get the best of both worlds, we could default to weighting by area, and then when desired, have an optional "measure" factor on top of the area for detail control.

Very thoughtful and reasonable.

In addition to the above-mentioned distance-based kernels, there are also neighbor-based Laplacian kernels. One of the wishes was to check "distance-along-the-mesh" kernels, and the Laplacians are effectively both the best and easiest way to achieve this. (The naive distance-along-the-mesh kernels are not even positive-definite!!!) It's pretty straightforward to trade out the KL modes for the Laplacian modes, so this provides another "model selection" check.

I'm certainly open-minded about this. I found distance weighted eigenmodes, and they just seemed to work, so I went with it. Wise people have warned me off the geodesic distance!

When it comes to modeling the coupling constant, we enforce the constraint with a logit transform, and thus in logit space we need to set the amplitude scale, and also the mean! (Usually when discussing GPs we set the mean to zero.) The amplitude scale in logit space determines how sharply we transition between the 0 and 1 extremes. The mean determines our default guess for the coupling. Based on previous discussion, rather than using mean 0 in logit space, corresponding to default coupling of 0.5, we want the default coupling to be closer to 0.8. Also, we want to choose the amplitude so that coupling transitions gradually rather than sharply.

I agree with you on 0.8 as starting point. I'm less sure about how to tune the transitions.

@maresb
Collaborator Author

maresb commented Oct 29, 2025

Is there a theory that suggests the appropriateness of one form or another?

Short answer: if you're modeling stock prices, then $\nu=\tfrac{1}{2}$ is probably a good bet. If everything has a Taylor series, then $\nu=\infty$ is good. If things are "mostly smooth", then $\nu=\tfrac{3}{2}$ and $\nu=\tfrac{5}{2}$ are popular choices. (Half-integers are popular because the Bessel function drops out.) Due to a well-defined limit, there's not a huge difference between $\nu=\tfrac{5}{2}$ and $\nu=\infty$. Practically, there is even less difference due to the eigenmode cutoff.
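For reference, the standard half-integer closed forms behind that answer (the Bessel function "drops out"), plus the squared-exponential $\nu\to\infty$ limit; $\sigma$ and $\ell$ as in the kernel discussion above:

```python
import numpy as np

def matern(d, ell, nu, sigma=1.0):
    """Matérn kernel at distance d, for the popular half-integer cases."""
    r = np.abs(d) / ell
    if nu == 0.5:
        k = np.exp(-r)
    elif nu == 1.5:
        k = (1 + np.sqrt(3) * r) * np.exp(-np.sqrt(3) * r)
    elif nu == 2.5:
        k = (1 + np.sqrt(5) * r + 5 * r**2 / 3) * np.exp(-np.sqrt(5) * r)
    elif nu == np.inf:
        k = np.exp(-0.5 * r**2)   # squared-exponential limit
    else:
        raise ValueError("only half-integer cases sketched here")
    return sigma**2 * k
```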

Longer answer: there's a fairly elementary and straightforward intuition for thinking about this stuff, but all the literature I've seen totally butchers it. I'll send a notebook soon that will explain things and let you experiment with the parameters.

I found distance weighted eigenmodes, and they just seemed to work, so I went with it. Wise people have warned me off the geodesic distance!

To be clear, the Laplacian methods are not geodesic distance, but rather capture the notion of distance "along" the mesh (rather than ambient 3D distance) in the "right" way to avoid the pathologies of geodesic distance. They're firmly grounded in theory, and actually my favored tool.

I'm less sure about how to tune the transitions.

Me too, but we can lay out the various distance scales, figure out the analytic heuristics, and use this to calibrate to your expertise and make an informed choice.

@brendanjmeade
Owner

@maresb

Short answer: if you're modeling stock prices, then ν = 1/2 is probably a good bet. If everything has a Taylor series, then ν = ∞ is good. If things are "mostly smooth", then ν = 3/2 and ν = 5/2 are popular choices. (Half-integers are popular because the Bessel function drops out.) Due to a well-defined limit, there's not a huge difference between ν = 5/2 and ν = ∞. Practically, there is even less difference due to the eigenmode cutoff.

That helps. It's effectively a smoothing. Let's let this be a tuning parameter set in the *_config.json file.

To be clear, the Laplacian methods are not geodesic distance, but rather capture the notion of distance "along" the mesh (rather than ambient 3D distance) in the "right" way to avoid the pathologies of geodesic distance. They're firmly grounded in theory, and actually my favored tool.

I'm pretty sure I tried this with some meshing tool a long time ago. IIRC, it was a nightmare because it was highly sensitive to element geometry. We have a lot of meshes with occasional very thin and/or highly scalene triangles, and I want to spend 0% of my life cleaning up meshes. A huge advantage of the DWE approach is that it depends only on distances rather than on the full triangle geometry. With this in mind, unless there's an easy win here, I'd say we should pass on the Laplace modes because of the mesh-geometry dependency.

Summary:

  • Implement the Matérn prior and allow setting $\nu$ in a *_config.json file.
  • Do not pursue the Laplace modes because I think this is likely to become a mesh detail disaster zone.

I'm optimistic that these can be implemented cleanly. Thoughts?

Repository owner deleted a comment from brendanjmeade Nov 15, 2025
@github-actions

github-actions bot commented Nov 15, 2025

Note

Heads up, @maresb! main has updated pixi.lock since this branch diverged.

This PR doesn't modify pixi.lock, so no conflict exists yet. However, if you need to make any changes to pixi.lock, be sure to merge or rebase main first to avoid conflicts:

Remote upstream should point at brendanjmeade/celeri; your fork is maresb/celeri.

git checkout matern-prior
git fetch upstream
git merge upstream/main

Merge-base: 9532f7b, latest lockfile change on main: f374435.

This notice will be minimized automatically once pixi.lock on main matches the merge-base.


Generated automatically by .github/workflows/lockfile-alert.yaml.

Owner

@brendanjmeade brendanjmeade left a comment


Matern is happening!

@elc45
Collaborator

elc45 commented Jan 6, 2026

@brendanjmeade @maresb: I made some plots to show how this changes mesh 0 for the WNA model. For the original DWE with 50 modes and distance exponent = 1, we have priors that look like this:


When we instead give the coefficients variances equal to the eigenvalues of their eigenfunctions, making it a proper Matérn (1/2) kernel, we get smoother priors like


For the original, this leads to posteriors like


For the Matern:


Sorry for the colors being inconsistent... I was trying to balance being able to show the variation and not being misleading with it. This is just the first GP change: scaling the std of the coefficients by the square roots of their eigenvalues.

@brendanjmeade
Owner

@elc45 Thanks for the figures! I still have a few questions. I'm not yet clear whether something Matérn-like is replacing the DWEs or not. Where are these calculated? Or is it something like using Matérn to define the prior distribution for elastic slip, which is then constructed from some set of DWEs?

@elc45
Collaborator

elc45 commented Jan 7, 2026

This PR does not change the eigenfunctions themselves, only their coefficients. The current forward model first draws coefficients $\alpha_i$ from $N(0, 10)$ and then sets the mesh to $\sum_i \alpha_i v_i$, where the $v_i$ are the DWEs. What this PR changes is the forward model: it instead draws $\alpha_i \sim N(0, \lambda_i)$, where the $\lambda_i$ are the corresponding eigenvalues of the DWEs. (It turns out this is actually equivalent to a Matérn 1/2 kernel GP because of some functional analysis I don't fully grasp.) There are also some notes on this on the MCMC page of the wiki. Let me know if that makes sense @brendanjmeade

@brendanjmeade
Owner

Completely understood and thank you @elc45!

@elc45
Collaborator

elc45 commented Jan 11, 2026

@brendanjmeade @maresb I wanted to add just a couple more comments here. First, I updated the two posterior plots I included above in this thread. They were showing posterior draws of elastic velocities, not coupling values, which is why they were discontinuous. The GPs are on the coupling values, so I now display those, which is more appropriate.

Second, I wanted to show what happens when one increases the number of eigenmodes used in the model, which highlights an advantage of Matern. Using our current approach, this is what we get when we crank up n. eigenmodes:


discontinuous, white noise everywhere. Now for the Matérn prior:

We can see it retains autocorrelation but allows for increasing roughness!

I'm adding a final commit now to allow sigma to be a tunable parameter (to change the scale when appropriate), and then this will be ready to merge.

@elc45 elc45 marked this pull request as ready for review January 11, 2026 15:23
@brendanjmeade
Owner

@elc45 @maresb This is very interesting!

  1. Beginner question: Why do the modes depend on the prior?
  2. This isn't yet related to #362 (Recommendation: remove support for power-exponential kernel) ... but they will get connected.

@elc45
Collaborator

elc45 commented Jan 11, 2026

@brendanjmeade The modes don't depend on the prior; we're just changing the prior on the coefficients $\alpha_i$ by which we multiply each mode in the sum $\sum_i \alpha_i u_i$ plotted above. Each subplot in the first plot is generated by drawing $\alpha_i$ from $N(0, 10)$ (then multiplying by $u_i$) for $i$ from 1 to $n$. The subplots in the second plot draw $\alpha_i$ from $N(0,\lambda_i)$ for each $i$ from 1 to $n$. In both, the $u_i$ are the same top $n$ modes of the covariance matrix. The crucial difference is in the behavior of the sum as $n$ grows.

The actual parameters in the model are these $\alpha_i$ values; they are what receive priors and what inference is done over. In the scope of this PR, the eigenmodes can be considered attributes inherent to the mesh. If we shift to Matérn 5/2, changing the covariance kernel, these eigenmodes will change, but just once.

maresb and others added 17 commits February 3, 2026 18:26
Now we're on our own, with just matern_sigma.
…cients

Both parameterizations are mathematically equivalent but change the
geometry of the sampling space, which can affect HMC performance.

- "non_centered" (default): Sample white noise and mollify via eigenvalue scaling
- "centered": Sample directly with heterogeneous variances
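The two parameterizations described above, as a plain-numpy sketch (not the actual PyMC model; the eigenvalues are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
lam = np.array([4.0, 1.0, 0.25])   # illustrative eigenvalues
n_draws = 200_000

# "non_centered": sample white noise, then mollify via eigenvalue scaling
z = rng.standard_normal((n_draws, lam.size))
alpha_non_centered = z * np.sqrt(lam)

# "centered": sample directly with heterogeneous variances
alpha_centered = rng.normal(0.0, np.sqrt(lam), size=(n_draws, lam.size))

# Both give coefficients with variance lam; only the geometry that the
# HMC sampler sees differs.
```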
Define base_run_dir, centered_run_dir, non_centered_run_dir for easy
comparison between different MCMC runs.
Change from implicit else to explicit elif with ValueError for unknown kinds.

Co-authored-by: Cursor <[email protected]>
Add assertions to help type checker narrow None checks for
tde_to_velocities in solve_mcmc.py and test_solve_dense.py.

Co-authored-by: Cursor <[email protected]>
More precise naming: these are bounds, not limits.

Co-authored-by: Cursor <[email protected]>
- Add default_mcmc_*_sigma_* fields to Config
- Add apply_mcmc_prior_defaults validator to propagate defaults to meshes
- Replace coupling_sigma/elastic_sigma with ss/ds variants in MeshConfig
- Update operators.py cache hash exclusion list
- Update solve_mcmc.py to use new field names

Co-authored-by: Cursor <[email protected]>
- Add default_mcmc_*_mean_* and parameterization fields to Config
- Add mean fields to MeshConfig
- Update operators.py cache hash exclusion list
- Add _get_unconstrained_mean helper to solve_mcmc.py
- Update _coupling_component and _elastic_component to take mu_unconstrained
- Update _mesh_component to compute unconstrained mean from config

The mean can be specified in either constrained (bounded) or unconstrained
(transformed) space via the parameterization setting. Coupling defaults to
constrained mean 0.5, elastic defaults to unconstrained mean 0.0.

Co-authored-by: Cursor <[email protected]>
Remove separate strike_slip and dip_slip suffixed parameters for mean
and sigma. Use single coupling_mean, coupling_sigma, elastic_mean,
elastic_sigma values that apply to both slip directions.

Co-authored-by: Cursor <[email protected]>
Consistent naming with existing mcmc_default_mesh_* parameters.

Co-authored-by: Cursor <[email protected]>
…cale

These top-level defaults propagate to mesh configs that don't explicitly
set their own matern_nu/matern_length_scale values. Cache invalidation
is handled by the operator cache key which includes these parameters.

Co-authored-by: Cursor <[email protected]>
New CLI arguments:
- --mcmc-chains, --mcmc-backend
- --mcmc-station-velocity-method, --mcmc-station-weighting
- --mcmc-station-effective-area
- --mcmc-default-mesh-coupling-mean, --mcmc-default-mesh-coupling-sigma
- --mcmc-default-mesh-elastic-mean, --mcmc-default-mesh-elastic-sigma
- --mesh-default-eigenvector-algorithm
- --mcmc-default-mesh-matern-nu, --mcmc-default-mesh-matern-length-scale

Also handle "none"/"None" string for nullable fields like mcmc_station_weighting.

Co-authored-by: Cursor <[email protected]>
maresb and others added 7 commits February 9, 2026 10:13
MeshConfig had its own defaults (ν=0.5, ℓ=1.0, units="diameters",
algo="eigh") that silently overrode the Config-level defaults (ν=2.5,
ℓ=0.2) because the propagation code only ran when the top-level JSON
explicitly set these keys.

Fix by making MeshConfig.matern_nu, matern_length_scale,
matern_length_units, and eigenvector_algorithm default to None,
with a single source of truth in Config. Consolidate the two
separate propagation mechanisms (read_config + model_validator)
into one Config.apply_mesh_defaults validator. Add
mesh_default_matern_length_units to Config for completeness.

Co-authored-by: Cursor <[email protected]>
@maresb
Collaborator Author

maresb commented Feb 9, 2026

I wanted to do a final parameter scan around the defaults. I'm satisfied with the param scan of the priors in the new doc_gp_params.ipynb, so we could also just merge this now and do the param scan once the Mac Studio comes back online. @aseyboldt @elc45

@elc45 elc45 merged commit 0133606 into brendanjmeade:main Feb 9, 2026
7 checks passed
@maresb maresb deleted the matern-prior branch February 9, 2026 21:50