In the original FlexCode paper, the authors show that minimizing the integrated squared loss on the densities is equivalent to running an ordinary squared-error regression for each of the J basis coefficients.
See Eq. (2.5) and the paragraph below it in
https://projecteuclid.org/journals/electronic-journal-of-statistics/volume-11/issue-2/Converting-high-dimensional-regression-to-high-dimensional-conditional-density-estimation/10.1214/17-EJS1302.full
In the cde_loss implementation here [and in the DeepCDE paper] this is not used directly; instead the loss is just the squared betas minus twice the inner product ⟨beta, basis⟩, plus 1. What's the rationale for this? Is it because z_basis is an orthonormal basis and thus z² = 1 [even though for any given batch that's not guaranteed to hold]?
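For concreteness, here is the expansion I have in mind (assuming, as in the paper, a fitted density $\hat f(z\mid x) = \sum_{j=1}^{J} \beta_j(x)\,\phi_j(z)$ with $\{\phi_j\}$ orthonormal):

$$
\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{J}\bigl(\beta_j(x_i)-\phi_j(z_i)\bigr)^2
= \frac{1}{n}\sum_{i=1}^{n}\Bigl[\sum_{j}\beta_j(x_i)^2 - 2\sum_{j}\beta_j(x_i)\,\phi_j(z_i)\Bigr]
+ \frac{1}{n}\sum_{i=1}^{n}\sum_{j}\phi_j(z_i)^2 ,
$$

where the last term does not depend on the betas at all, so the two objectives would seem to differ only by a constant offset.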
I'm asking because if I naively implemented FlexCode as a neural-network loss for a multivariate output layer, I would just use loss="mse" with y_true=z_basis and y_pred=betas.
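To make the comparison concrete, here is a small NumPy sketch (my own code, not the repo's; the batch size, J, and the dropped "+1" constant are illustrative assumptions) checking that the two objectives differ only by a term that is constant in the betas:

```python
import numpy as np

def cde_loss(betas, z_basis):
    """Loss as described in the question: beta^2 - 2 * <beta, basis>,
    averaged over the batch (the constant term is dropped here)."""
    return np.mean(np.sum(betas**2, axis=1) - 2.0 * np.sum(betas * z_basis, axis=1))

def mse_loss(betas, z_basis):
    """Naive alternative: squared error with y_true=z_basis, y_pred=betas,
    summed over the J coefficients and averaged over the batch."""
    return np.mean(np.sum((betas - z_basis)**2, axis=1))

rng = np.random.default_rng(0)
z_basis = rng.normal(size=(32, 8))  # basis functions evaluated at the observed z_i
betas = rng.normal(size=(32, 8))    # network outputs

# The two losses differ exactly by mean_i ||phi(z_i)||^2, which does not
# depend on betas -- so their gradients w.r.t. the network are identical.
offset = np.mean(np.sum(z_basis**2, axis=1))
print(np.isclose(mse_loss(betas, z_basis), cde_loss(betas, z_basis) + offset))  # True
```

(Note that Keras's built-in "mse" averages over the output axis rather than summing, but that only rescales the gradient by 1/J.)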