Use the eigenvalues to impose the standard Matérn prior for the GP #269
Conversation
---
Sorry for the slow review, but this looks great; I think we can merge this.
---
Converted to draft because it contains a bunch of results in notebooks that I wouldn't want in the
---
**Preliminary results**

My hypothesis for this experiment was a slight difference in this direction. I'm pretty surprised about the major difference, especially with "strike slip coupling". Maybe the higher modes are indeed prior-dominated. I'm wondering if @aseyboldt has any thoughts.

**What changed?**

"Before", we had implemented a simple white noise prior, where each coefficient had a standard normal prior. Here I was suspicious about the prominence of the high eigenfunctions.

"After", we implemented a Matérn $\nu=\tfrac{1}{2}$ prior. This dampens the eigenfunction priors to be normal with variance equal to the eigenvalue. In other words, it makes the plots look less wobbly.

🤓 Details: The kernel we use to compute the KL modes is $\exp(-d/\ell)$, which is the Matérn kernel with $\nu=\tfrac{1}{2}$. In contrast, the white noise prior is effectively $\nu=-1$, so in various technical senses it has 1.5 fewer derivatives. Note that in the limit of infinitely many eigenfunctions, $\nu=\tfrac{1}{2}$ samples are almost surely $\alpha$-Hölder continuous for $\alpha<\tfrac{1}{2}$, which means that as you crank up the number of eigenfunctions, samples will be continuous and come very close to satisfying $|f(x+\delta) - f(x)| < C\sqrt{\delta}$. (A differentiable function will basically by definition satisfy this with $<C\left|\delta\right|$.) In contrast, with white noise, things will diverge as you crank up the number of eigenfunctions.

It's a bit of an open question how to set the overall scale for the prior. In the code I made it so the top before/after eigenvalues agree, and those are hard-coded to 10 for coupling and 1 for elastic.

**Samples from "before"**

Source: "Run notebook without" (the Matérn prior) https://github.com/brendanjmeade/celeri/blob/461530e9a6ea93b12650fde83d56995dbee37eb3/notebooks/doc_solve_mcmc.ipynb

![image](https://github.com/user-attachments/assets/ee22eff6-640d-4a55-8550-ae2fb86e54d8)
![image](https://github.com/user-attachments/assets/29fd08ea-eb2c-4739-b9db-ac68fc98f778)
![image](https://github.com/user-attachments/assets/30d64279-7daf-489a-9472-0643f694fe2d)

**Samples from "after"**

Source: "Run notebook with" (the Matérn prior) https://github.com/brendanjmeade/celeri/blob/f21cd2d9017c6f90bfec2906eccb4d830be0ae0a/notebooks/doc_solve_mcmc.ipynb

![image](https://github.com/user-attachments/assets/decb8ef4-d7cf-4938-b03c-b20ec17f1d3d)
![image](https://github.com/user-attachments/assets/d1f6a164-17e2-4834-b92f-de254ecf9e2a)
![image](https://github.com/user-attachments/assets/d83847d5-197b-4bde-80b7-99c0596feba5)
---
I'm not sure yet, but to me that looks more like a bug than an actual difference due to the prior. How could the model possibly know that there are definitely no small-scale variations?

I haven't spotted a concrete problem in the code, but could it be related to how the eigenvectors are stored in celeri? Maybe we somehow ended up treating the eigenvectors of the strike slip component as higher-frequency components and scaling them down more?
---
Ya, all the other results are consistent with what I expected: toning down the excessive "leopard spots". I'm really surprised, though, that the strike slip coupling looks so smooth.
---
This was the problem. It's better not to hard-code the scale of the top eigenvalue. I'm still not very confident about how to choose the overall scale for the priors.
---
Let me outline how I plan to finish this out. There are a lot of skeletons lurking in the closet with a GP like this. My goal is to bring them all out into the open, make them explicit, and ensure that we identify all sources of inaccuracy and make deliberate, principled choices for all the parameters. I will provide a notebook that illustrates all the choices involved and their effect on the priors. This way we can figure out which parameters are identifiable, correspondingly fix parameter values or priors according to domain knowledge, and enumerate all our assumptions. The relevant factors include:

In addition to the above-mentioned distance-based kernels, there are also neighbor-based Laplacian kernels. One of the wishes was to check "distance-along-the-mesh" kernels, and the Laplacians are effectively both the best and easiest way to achieve this. (The naive distance-along-the-mesh kernels are not even positive-definite!) It's pretty straightforward to swap the KL modes for the Laplacian modes, so this provides another "model selection" check.

When it comes to modeling the coupling constant, we enforce the constraint with a logit transform, and thus in logit space we need to set the amplitude scale, and also the mean! (Usually when discussing GPs we set the mean to zero.) The amplitude scale in logit space determines how sharply we transition between the 0 and 1 extremes. The mean determines our default guess for the coupling. Based on previous discussion, rather than using mean 0 in logit space, corresponding to a default coupling of 0.5, we want the default coupling to be closer to 0.8. Also, we want to choose the amplitude so that coupling transitions gradually rather than sharply; a toy sketch follows below.

So what does this look like in the end? I see a module that takes a mesh and a bunch of configuration parameters and:
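Here is the toy sketch referenced above for the logit-space mean and amplitude (the numbers and the names `mu`/`sigma` are illustrative, not celeri parameters):

```python
import numpy as np
from scipy.special import expit, logit

mu = logit(0.8)   # default coupling of 0.8 instead of 0.5
sigma = 1.0       # logit-space amplitude: larger => sharper swings toward 0 and 1

rng = np.random.default_rng(0)
gp_draw = rng.normal(0.0, sigma, size=10_000)  # stand-in for a zero-mean GP sample
coupling = expit(mu + gp_draw)                 # constrained to (0, 1)
print(coupling.mean())  # near 0.8 for small sigma; drifts toward 0.5 as sigma grows
```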
---
@maresb and @aseyboldt It took me a while to read through this. The investigations here are smart: they keep baking nice properties into the solution, improving on the cavalier, welded-together approach that I brought.

I'd like to understand the relative merits of these. Is there a theory that suggests the appropriateness of one form or another?

Very thoughtful and reasonable.

I'm certainly open-minded about this. I found distance-weighted eigenmodes, and they just seemed to work, so I went with it. Wise people have warned me off the geodesic distance!

I agree with you on 0.8 as a starting point. I'm less sure about how to tune the transitions.
---
Short answer: if you're modeling stock prices, then $\nu=\tfrac{1}{2}$ is appropriate.

Longer answer: there's a fairly elementary and straightforward intuition for thinking about this stuff, but all the literature I've seen totally butchers it. I'll send a notebook soon that will explain things and let you experiment with the parameters.
To be clear, the Laplacian methods are not geodesic distance, but rather capture the notion of distance "along" the mesh (rather than ambient 3D distance) in the "right" way to avoid the pathologies of geodesic distance. They're firmly grounded in theory, and actually my favored tool.
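A minimal sketch of such Laplacian modes, using the combinatorial graph Laplacian of the triangulation (illustrative only; `triangles` is an assumed (n_triangles, 3) vertex-index array, and this is not celeri's implementation):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def laplacian_modes(triangles: np.ndarray, n_vertices: int, n_modes: int):
    """Smallest eigenpairs of the graph Laplacian L = D - A on the mesh."""
    # Collect the (undirected) edges of every triangle.
    edges = np.vstack(
        [triangles[:, [0, 1]], triangles[:, [1, 2]], triangles[:, [2, 0]]]
    )
    ones = np.ones(len(edges))
    adjacency = sp.coo_matrix(
        (ones, (edges[:, 0], edges[:, 1])), shape=(n_vertices, n_vertices)
    )
    adjacency = (adjacency + adjacency.T).tocsr()
    adjacency.data[:] = 1.0  # unweighted: count each shared edge once
    degree = sp.diags(np.asarray(adjacency.sum(axis=1)).ravel())
    laplacian = (degree - adjacency).tocsc()
    # Small Laplacian eigenvalues correspond to smooth, low-frequency modes.
    return eigsh(laplacian, k=n_modes, which="SM")
```

Note this variant depends only on connectivity; the cotangent-weighted Laplacian is more faithful to geometry but is also the one that picks up sensitivity to thin or scalene triangles.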
Me too, but we can lay out the various distance scales, figure out the analytic heuristics, and use this to calibrate to your expertise and make an informed choice.
---
That helps. It's effectively a smoothing. Let's let this be a tuning parameter set in the config.
I'm pretty sure I tried this with some meshing tool a long time ago. IIRC, it was a nightmare because it was highly sensitive to element geometry. We have a lot of meshes with occasional very thin and/or highly scalene triangles. I want to spend 0% of my life cleaning up meshes. A huge advantage of the DWE approach is that it depends only on pairwise distances rather than on the full triangle geometry. With this in mind, unless there's an easy win here, I'd say we should pass on the Laplace modes because of the mesh-geometry dependency. Summary:
I'm optimistic that these can be implemented cleanly. Thoughts?
---
brendanjmeade left a comment:
Matérn is happening!
---
@brendanjmeade @maresb: I made some plots to show how this changes mesh 0 for the WNA model. For the original DWE with 50 modes and distance exponent = 1, we have priors that look like this:

*(figure)*

When we instead give each coefficient a variance equal to its eigenvalue, making it a proper Matérn(1/2) kernel, we get smoother priors like:

*(figure)*

For the original, this leads to posteriors like:

*(figure)*

For the Matérn:

*(figure)*

Sorry for the colors being inconsistent... I was trying to balance showing the variation against being misleading with it. This is just for the first GP change: scaling the std of the coefficients by the square roots of their eigenvalues.
---
@elc45 Thanks for the figures! I still have a few questions. I'm still not clear whether something Matérn-like is replacing the DWEs or not. Where are these calculated? Or is it something like using Matérn to define the prior distribution for elastic slip, which is then constructed from some set of DWEs?
---
This PR does not change the eigenfunctions themselves, only their coefficients. The current forward model first draws coefficients
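A rough sketch of that forward-model step (hypothetical names; not the actual celeri code):

```python
import numpy as np

def draw_coupling(eigvecs, eigvals, mu_unconstrained, sigma, rng):
    """One prior draw of coupling on the mesh elements."""
    # Coefficient std = sigma * sqrt(eigenvalue): this scaling is the change in this PR.
    coeffs = rng.normal(size=eigvals.shape) * sigma * np.sqrt(eigvals)
    field = eigvecs @ coeffs                                  # GP sample in logit space
    return 1.0 / (1.0 + np.exp(-(mu_unconstrained + field)))  # inverse logit
```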
---
Completely understood, and thank you @elc45!
---
@brendanjmeade @maresb I wanted to add just a couple more comments here. First, I updated the two posterior plots I included above in this thread: they were showing posterior draws of elastic velocities, not coupling values, which is why they were discontinuous. The GPs are on the coupling values, so I now display those, which is more appropriate.

Second, I wanted to show what happens when one increases the number of eigenmodes used in the model, which highlights an advantage of the Matérn prior. Using our current approach, this is what we get when we crank up the number of eigenmodes:

*(figure)*

Discontinuous: white noise everywhere. Now for the Matérn prior:

*(figure)*

We can see it retains autocorrelation but allows for increasing roughness!

I'm adding a final commit now to make sigma a tunable parameter (to change the scale when appropriate), and then this will be ready to merge.
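A quick numerical illustration of why the two behave so differently as modes are added (toy grid and length scale assumed):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 400)
K = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)
eigvals = np.linalg.eigvalsh(K)[::-1]  # descending

# White noise gives every mode std 1, so mode 100 wiggles as hard as mode 1.
# The Matérn weighting sqrt(eigenvalue) decays fast, taming the high modes.
for i in (0, 9, 99, 299):
    print(f"mode {i + 1:3d}: white-noise std = 1.0, Matérn std = {np.sqrt(eigvals[i]):.4f}")
```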
---
@elc45 @maresb This is very interesting!
---
@brendanjmeade The modes don't depend on the prior; we're just changing the prior on the coefficients. The actual parameters in the model are these

Now we're on our own, with just `matern_sigma`.
---
…cients

Both parameterizations are mathematically equivalent but change the geometry of the sampling space, which can affect HMC performance.
- "non_centered" (default): sample white noise and mollify via eigenvalue scaling
- "centered": sample directly with heterogeneous variances
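As a sketch, the two parameterizations described in the commit above look like this in PyMC (illustrative scales and names, not the actual model code):

```python
import numpy as np
import pymc as pm

scales = 1.0 * np.sqrt(np.array([4.0, 1.0, 0.25]))  # sigma * sqrt(eigenvalues)

# "centered": sample the coefficients directly with heterogeneous variances.
with pm.Model():
    coeffs = pm.Normal("coeffs", mu=0.0, sigma=scales, shape=3)

# "non_centered" (default): sample white noise, then mollify by scaling.
with pm.Model():
    raw = pm.Normal("raw", mu=0.0, sigma=1.0, shape=3)
    coeffs = pm.Deterministic("coeffs", raw * scales)
```

Both define the same prior on `coeffs`; the non-centered form usually samples better when the data only weakly constrain the coefficients.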
Define base_run_dir, centered_run_dir, non_centered_run_dir for easy comparison between different MCMC runs.
Change from implicit else to explicit elif with ValueError for unknown kinds. Co-authored-by: Cursor <[email protected]>
Add assertions to help type checker narrow None checks for tde_to_velocities in solve_mcmc.py and test_solve_dense.py. Co-authored-by: Cursor <[email protected]>
More precise naming: these are bounds, not limits. Co-authored-by: Cursor <[email protected]>
- Add default_mcmc_*_sigma_* fields to Config
- Add apply_mcmc_prior_defaults validator to propagate defaults to meshes
- Replace coupling_sigma/elastic_sigma with ss/ds variants in MeshConfig
- Update operators.py cache hash exclusion list
- Update solve_mcmc.py to use new field names

Co-authored-by: Cursor <[email protected]>
- Add default_mcmc_*_mean_* and parameterization fields to Config
- Add mean fields to MeshConfig
- Update operators.py cache hash exclusion list
- Add _get_unconstrained_mean helper to solve_mcmc.py
- Update _coupling_component and _elastic_component to take mu_unconstrained
- Update _mesh_component to compute unconstrained mean from config

The mean can be specified in either constrained (bounded) or unconstrained (transformed) space via the parameterization setting. Coupling defaults to constrained mean 0.5, elastic defaults to unconstrained mean 0.0.

Co-authored-by: Cursor <[email protected]>
Remove separate strike_slip and dip_slip suffixed parameters for mean and sigma. Use single coupling_mean, coupling_sigma, elastic_mean, elastic_sigma values that apply to both slip directions. Co-authored-by: Cursor <[email protected]>
Consistent naming with existing mcmc_default_mesh_* parameters. Co-authored-by: Cursor <[email protected]>
Co-authored-by: Cursor <[email protected]>
Co-authored-by: Cursor <[email protected]>
…cale

These top-level defaults propagate to mesh configs that don't explicitly set their own matern_nu/matern_length_scale values. Cache invalidation is handled by the operator cache key, which includes these parameters.

Co-authored-by: Cursor <[email protected]>
New CLI arguments:
- --mcmc-chains, --mcmc-backend
- --mcmc-station-velocity-method, --mcmc-station-weighting
- --mcmc-station-effective-area
- --mcmc-default-mesh-coupling-mean, --mcmc-default-mesh-coupling-sigma
- --mcmc-default-mesh-elastic-mean, --mcmc-default-mesh-elastic-sigma
- --mesh-default-eigenvector-algorithm
- --mcmc-default-mesh-matern-nu, --mcmc-default-mesh-matern-length-scale

Also handle "none"/"None" strings for nullable fields like mcmc_station_weighting.

Co-authored-by: Cursor <[email protected]>
MeshConfig had its own defaults (ν=0.5, ℓ=1.0, units="diameters", algo="eigh") that silently overrode the Config-level defaults (ν=2.5, ℓ=0.2) because the propagation code only ran when the top-level JSON explicitly set these keys.

Fix by making MeshConfig.matern_nu, matern_length_scale, matern_length_units, and eigenvector_algorithm default to None, with a single source of truth in Config. Consolidate the two separate propagation mechanisms (read_config + model_validator) into one Config.apply_mesh_defaults validator. Add mesh_default_matern_length_units to Config for completeness.

Co-authored-by: Cursor <[email protected]>
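A simplified sketch of the single-source-of-truth pattern this commit describes (pydantic v2 assumed; field set trimmed to two parameters):

```python
from pydantic import BaseModel, model_validator

class MeshConfig(BaseModel):
    matern_nu: float | None = None            # None means "inherit from Config"
    matern_length_scale: float | None = None

class Config(BaseModel):
    mesh_default_matern_nu: float = 2.5
    mesh_default_matern_length_scale: float = 0.2
    meshes: list[MeshConfig] = []

    @model_validator(mode="after")
    def apply_mesh_defaults(self) -> "Config":
        # Propagate top-level defaults into any mesh that did not set its own.
        for mesh in self.meshes:
            if mesh.matern_nu is None:
                mesh.matern_nu = self.mesh_default_matern_nu
            if mesh.matern_length_scale is None:
                mesh.matern_length_scale = self.mesh_default_matern_length_scale
        return self
```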
---
I wanted to do a final parameter scan around the defaults. I'm satisfied with the param scan of the priors in the new doc_gp_params.ipynb, so we could also just merge this now and do the param scan once the Mac Studio comes back online. @aseyboldt @elc45 |












---
I'm not sure if it makes a difference, but we should be attenuating the coefficients in our GP according to the standard decay for the (in this case Matérn) GP prior. Namely, the variance should be the eigenvalue. @aseyboldt, for some reason this is running really slowly for me (also with main). Could you check if this works for you?

Before: all coefficients have std $=1$ for elastic or std $=10$ for coupling.

After: coefficients have std $=C\sqrt{\lambda}$, where $C$ is chosen so that the top mode keeps the previous default std of 1 or 10.
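A small illustration of that calibration (the eigenvalues here are made up):

```python
import numpy as np

eigvals = np.array([9.0, 4.0, 1.0])   # hypothetical eigenvalues, descending
old_std = 10.0                        # previous coupling default
C = old_std / np.sqrt(eigvals[0])     # calibrate the top mode to the old std
stds = C * np.sqrt(eigvals)           # array([10.0, 6.67, 3.33])
```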