Skip to content

added default values for field decoding (#788) #792

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 27, 2025

Conversation

kingkushal16
Copy link
Contributor

@kingkushal16 kingkushal16 commented Jul 23, 2025

fixed #788

Overview

Implements robust auto-defaulting for missing fields in the CocoIndex engine value decoder, improving error handling and data conversion flexibility. This addresses #788.


Changes

  • Auto-defaulting is always enabled for supported types (no user-facing option).
  • Type-aware defaults:
    • Only nullable types and table structures (LTable / KTable) are auto-defaulted.
  • Clear error messages: When auto-defaulting is not supported, errors include field names and type information.
  • Warning system: Users are notified via warnings when auto-defaults are applied.
  • Test coverage: New tests ensure correct auto-defaulting and error reporting.

Key Features

Auto-Default Rules

Type Default Value Notes
Optional[T] / T | None None Nullable types
LTable [] Defaults to an empty list
KTable {} Defaults to an empty dictionary
All other types ❌ No default Missing fields raise an error

Enhanced Error Handling

  • Field-specific error messages: Include field name and expected type.
  • Warnings are issued when auto-defaults are applied.

Implementation Details

  • Introduced _get_auto_default_for_type() helper:
    • Returns the default value.
    • Indicates whether auto-defaulting is supported.
  • All decoder logic updated to use this helper.
  • Removed auto_default_missing_fields parameter.
    • Auto-defaulting is always on for supported types and off for others.

Files Modified

  • python/cocoindex/convert.py: Core implementation and error/warning improvements
  • python/cocoindex/tests/test_convert.py: New tests for auto-defaulting and error reporting

class ExtraFieldSupported:
a: int
b: int
c: list[int] = field(default_factory=list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is using an explicit default value - which is not an auto default case we start to support by this PR. I think we want to test with the following cases: a field with int | None type, and a field with list[Base] type (as LTable), and a field with dict[str, Base] type (as KTable), and all shouldn't have explicit default value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dataclass
class Base:
    a: int

@dataclass
class NullableField:
    a: int
    b: int | None

@dataclass
class LTableField:
    a: int
    b: list[Base]

@dataclass
class KTableField:
    a: int
    b: dict[str, Base]

@dataclass
class UnsupportedField:
    a: int
    b: int

would these dataclasses be fine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look good. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the KtableField I am getting an error

TypeError: 'dict' object cannot be converted to 'Sequence' when using validate_full_roundtrip(KTableField(1, {}), KTableField) Can you guys help me out with this. Attaching the dumps below.

=================================== FAILURES ===================================
____________ test_auto_default_for_supported_and_unsupported_types _____________

def test_auto_default_for_supported_and_unsupported_types() -> None:
    @dataclass
    class Base:
        a: int

    @dataclass
    class NullableField:
        a: int
        b: int | None

    @dataclass
    class LTableField:
        a: int
        b: list[Base]

    @dataclass
    class KTableField:
        a: int
        b: dict[str, Base]

    @dataclass
    class UnsupportedField:
        a: int
        b: int

    engine_val = [1]

    validate_full_roundtrip(NullableField(1, None), NullableField)

    validate_full_roundtrip(LTableField(1, []), LTableField)

    decoder = build_engine_value_decoder(KTableField)
    result = decoder(engine_val)
    assert result == KTableField(1, {})
  validate_full_roundtrip(KTableField(1, {}), KTableField)

python/cocoindex/tests/test_convert.py:1508:


python/cocoindex/tests/test_convert.py:122: in validate_full_roundtrip
validate_full_roundtrip_to(


value = test_auto_default_for_supported_and_unsupported_types..KTableField(a=1, b={})
value_type = <class 'cocoindex.tests.test_convert.test_auto_default_for_supported_and_unsupported_types..KTableField'>
decoded_values = ((test_auto_default_for_supported_and_unsupported_types..KTableField(a=1, b={}), <class 'cocoindex.tests.test_convert.test_auto_default_for_supported_and_unsupported_types..KTableField'>),)
_engine = <module 'cocoindex._engine' from '/home/kushal/Desktop/Open-Source/cocoindex/python/cocoindex/_engine.cpython-312-x86_64-linux-gnu.so'>
eq = <function validate_full_roundtrip_to..eq at 0x7d7a6d6d9940>
encoded_value = [1, {}]

def validate_full_roundtrip_to(
    value: Any,
    value_type: Any,
    *decoded_values: tuple[Any, Any],
) -> None:
    """
    Validate the given value becomes specific values after encoding, sending to engine (using output_type), receiving back and decoding (using input_type).

    `decoded_values` is a tuple of (value, type) pairs.
    """
    from cocoindex import _engine  # type: ignore

    def eq(a: Any, b: Any) -> bool:
        if isinstance(a, np.ndarray) and isinstance(b, np.ndarray):
            return np.array_equal(a, b)
        return type(a) is type(b) and not not (a == b)

    encoded_value = encode_engine_value(value)
    value_type = value_type or type(value)
    encoded_output_type = encode_enriched_type(value_type)["type"]
  value_from_engine = _engine.testutil.seder_roundtrip(
        encoded_value, encoded_output_type
    )

E TypeError: 'dict' object cannot be converted to 'Sequence'

python/cocoindex/tests/test_convert.py:101: TypeError

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. The test case is validate_full_roundtrip(KTableField(1, {}), KTableField). Here the empty dict is being passed. When encoding struct fields that contained empty dicts (KTables), the function was preserving them as {}, but the expected engine format for KTables was [].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both! Since fix for the issue (PR #807) may still need more time to be resolved, we can disable this specific test case (e.g. comment out the specific line) and merge the PR first. After the fix is in, we can re-enable it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks guys, I've commented out the line for now and have added a test case just for the decoder.

Comment on lines 426 to 432
default_value = param.default
if default_value is not inspect.Parameter.empty:
return lambda _: default_value

auto_default = _handle_missing_field_with_auto_default(param, name, field_path)
return lambda _: auto_default

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these lines need to be indented under the with ChildFieldPath(field_path, f".{name}"):, because it's for this specific field (the .{name} need to be included in error messages)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have addressed this. Thanks.

Copy link
Member

@badmonster0 badmonster0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All look good! Thanks for the PR!

@badmonster0 badmonster0 merged commit 6e6d28f into cocoindex-io:main Jul 27, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Enhance field decoding by adding type-based defaults Validate output data matching expected schema from sources / functions
4 participants