Question regarding the intuition of Geometry Consistency Loss Lg #94

@LabubuJ

Thank you for your impressive work on AnySplat! I have a question regarding the intuition behind the Geometry Consistency Loss ($\mathcal{L}_g$) introduced in Section 3.3.

Observation:
Your paper notes that depth predictions from the DPT head are often inconsistent across views, which shows up as "layered sheets" when the depths are lifted to 3D. This implies that, for the same physical point, the depth $D_i$ predicted from camera $i$ and the depth $D_j$ predicted from camera $j$ unproject to different 3D positions.

The Question:
In Eq. (6), the loss enforces alignment between the DPT-predicted depth ($D_i$) and the rendered 3DGS depth ($\hat{D}_i$):

$\mathcal{L}_g = \frac{1}{N} \sum_{i=1}^{N} (D_i[M] - \hat{D}_i[M])^2$
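
To make sure I am reading Eq. (6) correctly, here is a minimal NumPy sketch of how I understand it. This is my own interpretation, not code from the AnySplat repository: the function name `geometry_consistency_loss`, the array shapes, and the choice to average the squared error over the masked pixels of each view are all assumptions on my part, and I treat $M$ as a boolean per-pixel mask.

```python
import numpy as np

def geometry_consistency_loss(pred_depths, rendered_depths, masks):
    """Masked squared depth error, averaged over views (my reading of Eq. 6).

    pred_depths:     (N, H, W) depths from the DPT head, D_i
    rendered_depths: (N, H, W) depths rendered from the Gaussians, D_hat_i
    masks:           (N, H, W) boolean, True where a pixel is kept (assumed form of M)
    """
    pred_depths = np.asarray(pred_depths, dtype=np.float64)
    rendered_depths = np.asarray(rendered_depths, dtype=np.float64)
    masks = np.asarray(masks, dtype=bool)

    per_view = []
    for d, d_hat, m in zip(pred_depths, rendered_depths, masks):
        if not m.any():
            # Assumption: a view with an empty mask contributes zero loss.
            per_view.append(0.0)
            continue
        # Mean squared difference over the masked pixels of this view.
        per_view.append(np.mean((d[m] - d_hat[m]) ** 2))

    # Average over the N views, matching the 1/N factor in Eq. (6).
    return float(np.mean(per_view))
```

For example, with a single 2x2 view where the masked pixels differ by 0, 2, and 0, this returns 4/3. If the actual implementation normalizes differently (for instance, summing over pixels instead of averaging), please correct me.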

Since the DPT depth ($D_i$) is the primary source of the inconsistency (the "layers"), why does forcing the unified 3DGS representation to align with these inconsistent targets result in a more coherent surface geometry rather than simply propagating the "layering" error?

Is the optimization process essentially performing a "multi-view consensus", where the single 3DGS model effectively "averages" the conflicting $D_i$ targets to find a single, consistent surface that satisfies all views?
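
To make the "averaging" hypothesis concrete, here is a toy version under my own simplifying assumptions, none of which come from the paper: a single surface point is seen by all $N$ views, the per-view DPT targets for that point are expressed along a common axis as $d_1, \dots, d_N$ (hypothetical symbols), and one Gaussian determines the single rendered value $\hat{d}$ that every view sees. The squared loss then reduces to a one-dimensional least-squares problem:

$$
\mathcal{L}(\hat{d}) = \frac{1}{N} \sum_{i=1}^{N} (d_i - \hat{d})^2,
\qquad
\frac{\partial \mathcal{L}}{\partial \hat{d}} = -\frac{2}{N} \sum_{i=1}^{N} (d_i - \hat{d}) = 0
\;\Longrightarrow\;
\hat{d}^{\ast} = \frac{1}{N} \sum_{i=1}^{N} d_i .
$$

In this toy setting the optimum is the mean of the conflicting targets, and the residual loss $\mathcal{L}(\hat{d}^{\ast})$ equals their variance, so the layers would collapse onto one surface instead of being reproduced. What I cannot tell is whether this picture carries over to the real model, where each view has its own rays and the Gaussians are predicted by a network, not optimized freely.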

I would appreciate any insights on why this self-alignment loop is so effective at "smoothing out" inconsistencies that are present in the supervisor itself.
