Replies: 1 comment
Just some comments unrelated to your question about interpretability, but perhaps your parsimony (0.01) is too high for your loss?
-
I have a domain-specific question, but would like to gain general insight from anyone who might've encountered a similar puzzle. Put simply: how do I know when I’ve sufficiently reproduced a known functional form versus when I’ve truly discovered a new one?
Concretely, I have a sparse dataset (24 samples, each from molecular simulations) of calculated excess enthalpies of mixing, $\Delta H_{mix}$, as a function of the mole fraction of component A, $\chi_{A}$, within a mixture of A and B. Several semi-empirical models capture the mole-fraction dependence of $\Delta H_{mix}$, such as the Redlich-Kister polynomial series, van Laar, the two-parameter Margules model, Huron-Vidal, etc. These relations can work fine for weakly polar or non-polar mixtures, but for novel systems of interest to me, they (presumably) aren't as well defined.
Nonetheless, there is a list of well-defined traits of an expressive mixture model, as outlined nicely by Focke et al. For example, we generally want enthalpy of mixing to follow
so that it "reduces to the pure component value when any mole or mass fraction approaches unity". Similarly, I've tried to ensure that my functional search space encompasses existing models while remaining expansive enough to explore new forms. I've attached my data, a few of these model fits, and a functional form generated by symbolic regression (PySR inputs included at the bottom).
Note: I weight my data by inverse uncertainties (not plotted above); the uncertainties grow with increasing mole fraction (up to ~10%).
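For reference, a weighted fit of one of the classical forms can be sketched in a few lines. This is a minimal illustration, not my actual fitting script: the data are synthetic, the coefficient names `a12` and `a21` and the uncertainty model are assumptions, and the weights are taken as inverse uncertainties to mirror the note above (a chi-square fit would use inverse variances instead). Because the two-parameter Margules form is linear in its coefficients, the weighted least-squares solution follows from a 2x2 system.

```python
def margules(x, a12, a21):
    """Two-parameter Margules form; it vanishes at x = 0 and x = 1."""
    return x * (1.0 - x) * (a21 * x + a12 * (1.0 - x))


def fit_margules_weighted(xs, ys, weights):
    """Minimise sum_i w_i * (y_i - model_i)**2 via the 2x2 normal equations."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for x, y, w in zip(xs, ys, weights):
        p = x * (1.0 - x) ** 2  # basis function multiplying a12
        q = x ** 2 * (1.0 - x)  # basis function multiplying a21
        s11 += w * p * p
        s12 += w * p * q
        s22 += w * q * q
        b1 += w * p * y
        b2 += w * q * y
    det = s11 * s22 - s12 * s12
    a12 = (b1 * s22 - b2 * s12) / det
    a21 = (s11 * b2 - s12 * b1) / det
    return a12, a21


# Synthetic stand-in data (NOT the simulation results): 24 interior points.
xs = [i / 25 for i in range(1, 25)]
ys = [margules(x, 1.5, -0.8) for x in xs]

# Assumed uncertainty model: sigma grows with mole fraction.
sigmas = [0.02 + 0.08 * x for x in xs]
weights = [1.0 / s for s in sigmas]  # inverse-uncertainty weights

a12_fit, a21_fit = fit_margules_weighted(xs, ys, weights)
print(a12_fit, a21_fit)
```

With noise-free synthetic data the fit recovers the generating coefficients; with real data the choice of weights changes which region of composition dominates the fit.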
The more challenging task is computing derived properties from $\Delta H_{mix}$. For instance, the partial molar enthalpy, $h$, is defined as
and I've computed these below:
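To make the derived-property step concrete, here is a minimal sketch using the standard tangent-intercept relations for a binary mixture, $h_A = \Delta H_{mix} + (1-\chi_A)\, d\Delta H_{mix}/d\chi_A$ and $h_B = \Delta H_{mix} - \chi_A\, d\Delta H_{mix}/d\chi_A$, assuming $\Delta H_{mix}$ is expressed per mole of mixture. The function names and the symmetric test model are hypothetical, and the derivative is taken by central finite difference, which is one choice among several.

```python
def partial_molar_excess(model, x, step=1e-6):
    """Return (h_A, h_B) from the tangent-intercept relations.

    `model` maps the mole fraction of A to the molar excess enthalpy.
    The central difference evaluates the model at x - step and x + step,
    so x should stay strictly inside (0, 1).
    """
    dh_dx = (model(x + step) - model(x - step)) / (2.0 * step)
    h_mix = model(x)
    h_a = h_mix + (1.0 - x) * dh_dx
    h_b = h_mix - x * dh_dx
    return h_a, h_b


def symmetric_model(x, a=2.0):
    """Hypothetical one-parameter model a*x*(1-x), used only as a check."""
    return a * x * (1.0 - x)


h_a, h_b = partial_molar_excess(symmetric_model, 0.3)
# For a*x*(1-x) the analytic results are h_A = a*(1-x)**2 and h_B = a*x**2.
print(h_a, h_b)
```

Since the derivative amplifies small differences between fitted forms, two models that fit $\Delta H_{mix}$ almost equally well can still give visibly different partial molar curves.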
What I’m observing is a delicate balance between keeping the search space tractable and still allowing exploration beyond existing models. Interestingly, my PySR output identifies a form that can be rewritten into the van Laar model, while also suggesting a slightly better-performing alternative:
If symbolic regression reproduces an existing model, that suggests such a model is trustworthy. Conversely, if the search space is restricted to noncompetitive forms, then reproducing known models isn’t surprising. I’d greatly appreciate any rules of thumb or holistic insights on how one may interpret these results!
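One quick sanity check for "is this just a known model in disguise?" is to look for an exact parameter mapping and confirm it numerically. The sketch below uses a hypothetical symbolic-regression output of the form $\chi_A(1-\chi_A)/(c_0 + c_1\chi_A)$ (not my actual PySR expression) and shows that it coincides with a van Laar-type form $AB\,\chi_A(1-\chi_A)/(A\chi_A + B(1-\chi_A))$ when $A = 1/c_0$ and $B = 1/(c_0 + c_1)$. All names and constants are illustrative assumptions.

```python
def van_laar(x, a, b):
    """van Laar-type form: A*B*x*(1-x) / (A*x + B*(1-x))."""
    return a * b * x * (1.0 - x) / (a * x + b * (1.0 - x))


def candidate(x, c0, c1):
    """Hypothetical symbolic-regression output: x*(1-x) / (c0 + c1*x)."""
    return x * (1.0 - x) / (c0 + c1 * x)


def candidate_to_van_laar(c0, c1):
    """Map (c0, c1) to van Laar (A, B); requires c0 != 0 and c0 + c1 != 0."""
    return 1.0 / c0, 1.0 / (c0 + c1)


c0, c1 = 0.4, 0.25  # illustrative constants, not fitted values
a, b = candidate_to_van_laar(c0, c1)

# Compare the two forms on a grid spanning the full composition range.
grid = [i / 20 for i in range(21)]
max_gap = max(abs(candidate(x, c0, c1) - van_laar(x, a, b)) for x in grid)
print(a, b, max_gap)
```

If the gap is at rounding level for any constants, the expression is a reparameterisation rather than a new functional form; if no such mapping exists, the alternative is structurally different and worth testing on held-out points.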
PySR inputs:
My general PySR setup is as below: