
Yuvraj198920 commented Aug 26, 2025

ROOT CAUSE ANALYSIS:
The issue stems from how apply_dimension is designed:

  1. apply_dimension process: uses xr.apply_ufunc, which passes the underlying dask arrays
    to the process function instead of xr.DataArray objects (this is by design, not a bug)
  2. run_udf function: therefore receives a raw dask.array.core.Array instead of an xr.DataArray
  3. Dimension loss: wrapping the raw array with xr.DataArray(data) discards the dimension
    names, because the original dimension information is not passed along to run_udf
  4. Result: UDFs received DataArrays with generic dimension names (dim_0, dim_1, ...)
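The root cause can be reproduced with plain xarray. This is a minimal sketch, not code from the PR: `process` is a hypothetical stand-in for the process function, and the cube is numpy-backed for simplicity (with a dask-backed cube and `dask="allowed"`, the function would receive a `dask.array.core.Array` instead).

```python
import numpy as np
import xarray as xr

# A labeled cube, as apply_dimension would see it
cube = xr.DataArray(np.zeros((2, 3, 4)), dims=("time", "y", "x"))
captured = {}


def process(data):
    # apply_ufunc passes the underlying array (here numpy), not the DataArray
    captured["type"] = type(data)
    # Re-wrapping it without dims makes xarray fall back to default names
    captured["dims"] = xr.DataArray(data).dims
    return data


result = xr.apply_ufunc(process, cube)
print(captured["type"].__name__, captured["dims"])
```

The function only ever sees `('dim_0', 'dim_1', 'dim_2')`, while the returned `result` still carries `('time', 'y', 'x')`, because apply_ufunc re-attaches the dimensions on the way out. The names are lost only inside the call.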

CHANGES MADE:

  1. apply.py: apply_dimension now passes explicit dimension metadata to the process through the context
  2. udf.py: run_udf uses this dimension metadata, when available, to name the dimensions of the wrapped array
  3. Fallback Support: a fallback dimension naming strategy is kept for calls that provide no metadata in the context
  4. Backward Compatibility: Existing code continues to work unchanged
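The mechanism described in the changes above can be sketched as follows. This is an illustration under stated assumptions, not the PR's apply.py or udf.py: `DIMS_KEY`, `wrap_with_dims`, `run_udf_sketch` and `apply_dimension_sketch` are hypothetical names, and the real context key is not shown in this thread. One detail worth noting is that apply_ufunc moves core dimensions to the end, so the recorded order has to match what the wrapped function actually receives.

```python
import numpy as np
import xarray as xr

# Hypothetical context key, chosen for this sketch only.
DIMS_KEY = "dimension_names"


def wrap_with_dims(data, context):
    """Wrap a raw (numpy or dask) array, restoring names from the context if possible."""
    dims = (context or {}).get(DIMS_KEY)
    if dims is not None and len(dims) == np.ndim(data):
        return xr.DataArray(data, dims=tuple(dims))
    # Fallback: without usable metadata xarray assigns dim_0, dim_1, ...
    return xr.DataArray(data)


def run_udf_sketch(data, udf, context=None):
    cube = wrap_with_dims(data, context)
    # Hand the raw array back to apply_ufunc
    return udf(cube, context).data


def apply_dimension_sketch(cube, udf, dimension, context=None):
    # apply_ufunc moves core dimensions to the end, so record the order the
    # wrapped function will actually see rather than cube.dims.
    order = [d for d in cube.dims if d != dimension] + [dimension]
    ctx = dict(context or {})
    ctx[DIMS_KEY] = order
    return xr.apply_ufunc(
        run_udf_sketch,
        cube,
        input_core_dims=[[dimension]],
        output_core_dims=[[dimension]],
        kwargs={"udf": udf, "context": ctx},
    )


# Usage: the UDF sees semantic names even though it receives a raw array.
seen = []


def double(c, ctx):
    seen.append(c.dims)
    return c * 2


cube = xr.DataArray(np.arange(6.0).reshape(3, 2), dims=("bands", "time"))
out = apply_dimension_sketch(cube, double, "bands")
```

Here the UDF receives `('time', 'bands')`, not `('dim_0', 'dim_1')`. Calls that carry no metadata still work and simply get xarray's default names, which matches the fallback and backward-compatibility points.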

Yuvraj198920 marked this pull request as draft August 26, 2025 14:42
Yuvraj198920 self-assigned this Aug 26, 2025
Yuvraj198920 (Author)

cc @jzvolensky @clausmichele

- Add dimension_helper.py with metadata-driven dimension restoration
- Convert generic names (dim_0, dim_1, etc.) to semantic names (time, y, x, bands)
- Restore coordinate labels from the UDF context metadata
- Robust error handling: a failed restoration never breaks UDF execution
- Update __init__.py to export the helper functions

Resolves issue Open-EO#330: UDF dimension names problem
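The helper's contents are not shown in this thread, so the following is only a hypothetical stand-in for `restore_semantic_dimensions`, written to illustrate the behavior listed above. It assumes a context layout where `context["dimension_names"]` lists the semantic names in axis order and `context["dimension_labels"]` maps a name to its coordinate labels; both keys are assumptions.

```python
import logging

import numpy as np
import xarray as xr

logger = logging.getLogger(__name__)


def restore_dimensions_sketch(cube: xr.DataArray, context: dict) -> xr.DataArray:
    """Hypothetical stand-in for restore_semantic_dimensions.

    Assumes context["dimension_names"] lists semantic names in axis order and
    context["dimension_labels"] optionally maps a name to its labels.
    """
    try:
        ctx = context or {}
        names = list(ctx.get("dimension_names") or [])
        if len(names) != cube.ndim:
            return cube  # no usable metadata: leave the cube untouched
        # dim_0 -> names[0], dim_1 -> names[1], ...
        renamed = cube.rename(dict(zip(cube.dims, names)))
        labels = ctx.get("dimension_labels") or {}
        # Only attach labels whose length matches the dimension size
        coords = {
            name: list(labels[name])
            for name in names
            if name in labels and len(labels[name]) == renamed.sizes[name]
        }
        return renamed.assign_coords(coords)
    except Exception:
        # Never let a metadata problem break the UDF itself
        logger.warning("Could not restore dimensions; using cube as-is", exc_info=True)
        return cube


# Usage: a cube with generic names, plus metadata in the context
raw = xr.DataArray(np.zeros((2, 3)))
ctx = {"dimension_names": ["time", "bands"],
       "dimension_labels": {"bands": ["B02", "B03", "B04"]}}
restored = restore_dimensions_sketch(raw, ctx)
```

Returning the cube unchanged on any failure is one way to meet the "never breaks UDF execution" goal; the trade-off is that the UDF may still see generic names, so the warning is logged rather than silently swallowed.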
Yuvraj198920 (Author)

Hi @clausmichele,

Problem: UDFs were receiving generic dimension names (dim_0, dim_1, etc.) instead of semantic names, which made it difficult to select data by dimension labels inside the UDF.

Solution: Created a small helper function that a UDF calls at its beginning to re-assign dimension names and labels from the metadata in the context.

Usage:

import xarray as xr
from openeo_processes_dask.process_implementations.udf.dimension_helper import restore_semantic_dimensions

def apply_datacube(cube: xr.DataArray, context: dict) -> xr.DataArray:
    # Re-assign dimension names and labels at the beginning
    cube = restore_semantic_dimensions(cube, context)

    # Dimensions and labels can now be accessed by their semantic names
    return cube.sel(bands='B02')  # Works with real band names!

Key Benefits:

  • Converts ('dim_0', 'dim_1', 'dim_2', 'dim_3') to ('time', 'y', 'x', 'bands')
  • Restores real coordinate labels (band names, timestamps, spatial coords)
  • Metadata-driven (no hardcoded patterns)
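Being metadata-driven means the labels have to be recorded before the array is unwrapped. As a hypothetical sketch of that producer side (the function name and the `dimension_names` / `dimension_labels` keys are assumptions, not the PR's actual format), the metadata can be collected from the labeled cube:

```python
import numpy as np
import pandas as pd
import xarray as xr


def collect_dimension_metadata(cube: xr.DataArray, order=None) -> dict:
    """Hypothetical: record dimension names (in the axis order the UDF will see) and labels."""
    names = [str(d) for d in (order or cube.dims)]
    labels = {
        name: cube.indexes[name].tolist()
        for name in names
        if name in cube.indexes  # dimensions without coordinates have no labels
    }
    return {"dimension_names": names, "dimension_labels": labels}


# Usage: a 4-D cube where only time and bands carry labels
cube = xr.DataArray(
    np.zeros((2, 2, 2, 3)),
    dims=("time", "y", "x", "bands"),
    coords={"time": pd.date_range("2025-01-01", periods=2),
            "bands": ["B02", "B03", "B04"]},
)
meta = collect_dimension_metadata(cube)
```

Because the names and labels come from the cube itself, no naming pattern has to be hardcoded; the `order` argument exists for cases like apply_ufunc with core dimensions, where the axis order seen by the function differs from `cube.dims`.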

cc @jzvolensky

