Skip to content

Conversation

@redruin1
Copy link

@redruin1 redruin1 commented Apr 2, 2025

This PR adds an omit_if parameter to cattrs unstructure generators (as described here), which is similar in function to omit_if_default but evaluates a callable at unstructure invoke in order to determine whether an attribute should be omitted:

@define
class Example:
    a: int
    b: int | None
    c: int
    d: int = 123
    e: set = field(factory=set)
    f: int = field()

    @f.default
    def f_default(self) -> int:
        return self.d

    overridden: None = None

converter = Converter()

def default_or_none(instance: AttrsInstance, attribute: Attribute, value: Any) -> bool:
    if value is None:
        return True
    if isinstance(attribute.default, Factory):
        if attribute.default.takes_self:
            return value == attribute.default.factory(instance)
        return value == attribute.default.factory()
    return value == attribute.default    

converter.register_unstructure_hook(
    Example,
    make_dict_unstructure_fn(
        Example,
        converter,
        _cattrs_omit_if=default_or_none,
        c=override(omit_if=lambda inst, attr, value: value < 0),
        overridden=override(omit_if=False),
    ),
)

assert converter.unstructure(Example(a=100, b=None, c=-100, f=123)) == {"a": 100, "overridden": None}

This allows users to control how the unstructure function should behave at runtime (beyond the singular omit_if_default), wheras currently users can only robustly choose how/when to omit fields when the function is first generated.

omit_if is technically a superset of omit_if_default, hence their similar names. However, this implementation leaves omit_if_default in function signatures for 3 reasons:

  1. Backwards compatibility
  2. Performance. An apples-to-apples comparison using a lambda instead of generating in-situ is about twice as slow, at least according to the minimal benchmarks I wrote and performed. This could maybe be improved, but I don't see it ever being faster than embedding the checks directly into the function.
  3. attrs doesn't currently have a convenient get_default(instance, attribute) equivalent, which means they must write their own function if somebody just wants the old omit_if_default behavior with the new omit_if parameter - which will lead to dozens of slightly different implementations and (of course) bugs.

If both omit_if_default and omit_if are defined, then omit_if_default takes priority. However, I can also see a world where defining both simultaneously is forbidden, as which should take precedent is not intrinsically apparent.

This is a proof of concept PR, it is not currently complete. I am interested in completing it (ensuring coverage/writing docs/etc.) but I want the go-ahead before I invest serious amounts of time into it.

redruin1 added 6 commits April 1, 2025 04:08
coverage is deciding not to work on my machine for some reason
accidentally omitted part of another test
@Tinche
Copy link
Member

Tinche commented Apr 5, 2025

I think this is a good idea and we should do it. Let's brainstorm the design a little bit.

It would be great if at the end of this we could soft-deprecate omit and omit_if_default (don't think we can ever remove them because of backwards comp, but we can steer folks to the new API). It should definitely be an error to use both.

I think we can make omit_if="default" trigger the same code path as omit_if_default=True, and similarly for omit_if=True (or we could make it omit_if="always" for clarity?).

The use case you mention is a predicate for unstructuring time, and yeah that sounds great. Would it make sense to just have it receive the attribute value, instead of the three things? The other 2 parameters are known at hook generation time. If they are needed, users could wrap the hook factory, like today.

So the type for the omit_if= parameter could be Literal["default", "always"] | Callable[[Any], bool].

@redruin1
Copy link
Author

redruin1 commented Apr 5, 2025

It should definitely be an error to use both (omit_if and omit_if_default).

Roger, I agree.

I think we can make omit_if="default" trigger the same code path as omit_if_default=True, and similarly for omit_if=True (or we could make it omit_if="always" for clarity?).

I like omit_if="default" becoming the reserved literal for triggering the existing code path a lot; it reads very plainly and is self-documenting. The reason that I used a bool for the override versions of omit_if was because they are essentially reductions of predicates that always return true or false:

# To me, `omit_if=condition` is essentially saying "omit if `condition` evaluates to True"
# So:
override(omit_if=True) == override(omit_if=lambda _: True)
override(omit_if=False) == override(omit_if=lambda _: False)

I think if I had to pick between omit_if: bool and omit_if: Literal["always", "never"] I would probably prefer the former, as to me it just aligns a little more closely with the callable case. However, if you disagree there is no practical reason why we couldn't use whatever literals you wanted, and I suppose if we are set on using "default" then string literals would be overall more consistent.

The use case you mention is a predicate for unstructuring time, and yeah that sounds great. Would it make sense to just have it receive the attribute value, instead of the three things? The other 2 parameters are known at hook generation time. If they are needed, users could wrap the hook factory, like today.

The attribute(s) are absolutely available at generation time, but doesn't the constructed instance live entirely inside of make_dict_unstructure_fn? I believe you when you tell me it is possible, but could you show me exactly how? I'm still new to cattrs.

That being said, if it is possible to do so then it would make the predicates much more clear and intuitive to write, so I'm entirely in favor of minimizing the function signature.

Finally: what is the process for running coverage? I'm working on Windows, and I was having some trouble setting up the development environment. Is there some additional steps that I should take, or is the repo not quite suited for my OS currently? (I noticed most of the tooling was structured around make, for example.)

@redruin1
Copy link
Author

redruin1 commented Apr 6, 2025

Actually, I've thought about it a bit more. It seems to me that there are 2 different things that we're configuring; omitting fields at function generation time, and omitting fields at runtime. omit currently already entirely takes care of configuring at generation time, and users can wrap their generation function so that the value of omit for each field can be dynamically set on some condition or predicate:

def my_unstructure_wrapper(cls, converter, omit: Callable[[attrs.Attribute], bool])
    overrides = {}
    for attr in attrs.fields(cls):
        overrides[attr.name] = override(omit=omit(attr))
    return make_dict_unstructure_fn(cls, converter, **overrides)
    
converter.register_unstructure_hook(
    Example,
    my_unstructure_wrapper(Example, converter, lambda attr: isinstance(attr.type, set))
)

The other half is configuring at runtime, of which omit_if (and formerly omit_if_default) are used for. I'm wondering now if it makes more sense to have both omit and omit_if as valid (mutually exclusive) options for the two separate steps, only choosing to deprecate omit_if_default going forward:

make_dict_unstructure_fn(
    Example,
    converter,
    _cattrs_omit_if="default", # original code path
    a=override(omit=True), # GENERATION: Always removed
    b=override(omit=False), # GENERATION: Always included unconditionally
    c=override(omit_if="default"), # RUNTIME: Exact same as parent, equivalent to no override
    d=override(omit_if=custom_condition), # RUNTIME: Field is omitted if condition passes
    e=override(...), # `omit`/`omit_if` are both None, so we default to the converter's configuration
    #f=override(omit=False, omit_if=custom_condition) # Error: both cannot be specified as it is ambiguous
),

This way we could leave omit entirely unchanged, and the signature for omit_if would become Literal["default"] | Callable[[Any], bool]. Combined with documentation, this means that a user could quickly and plainly see omit and know "pregenerated check" and see omit_if and know "runtime check".

@Tinche
Copy link
Member

Tinche commented Apr 6, 2025

Finally: what is the process for running coverage?

You should be able to just run pytest under coverage (coverage run -m pytest I think, I'm on mobile currently). There shouldn't be anything OS-specific in the toolchain; make is just a convenience.

As for omit vs omit_if - I agree with your analysis but I still think it's worthwhile coalescing them into a single parameter, on the basis of making invalid states unrepresentable.

@redruin1
Copy link
Author

redruin1 commented Apr 9, 2025

I get the same issue on Linux Mint. What step am I not performing?

red@red-desktop:~/SourceCode/repos/Python/cattrs$ pdm -V
PDM, version 2.23.1
red@red-desktop:~/SourceCode/repos/Python/cattrs$ python -m venv .venv
red@red-desktop:~/SourceCode/repos/Python/cattrs$ source .venv/bin/activate
(.venv) red@red-desktop:~/SourceCode/repos/Python/cattrs$ pdm install -d -G :all
INFO: Inside an active virtualenv /home/red/SourceCode/repos/Python/cattrs/.venv, reusing it.
Set env var PDM_IGNORE_ACTIVE_VENV to ignore it.
Synchronizing working set with resolved packages: 76 to add, 0 to update, 0 to remove

  ✔ Install certifi 2025.1.31 successful
  ✔ Install colorama 0.4.6 successful
  ✔ Install alabaster 0.7.16 successful
  ✔ Install anyio 4.9.0 successful
  ✔ Install attrs 25.3.0 successful
  ✔ Install click 8.1.8 successful
  ✔ Install exceptiongroup 1.2.2 successful
  ✔ Install beautifulsoup4 4.13.3 successful
  ✔ Install execnet 2.1.1 successful
  ✔ Install h11 0.14.0 successful
  ✔ Install dnspython 2.7.0 successful
  ✔ Install imagesize 1.4.1 successful
  ✔ Install idna 3.10 successful
  ✔ Install iniconfig 2.1.0 successful
  ✔ Install furo 2024.8.6 successful
  ✔ Install jinja2 3.1.6 successful
  ✔ Install cbor2 5.6.5 successful
  ✔ Install markdown-it-py 3.0.0 successful
  ✔ Install mdit-py-plugins 0.4.2 successful
  ✔ Install docutils 0.21.2 successful
  ✔ Install mdurl 0.1.2 successful
  ✔ Install mypy-extensions 1.0.0 successful
  ✔ Install myst-parser 3.0.1 successful
  ✔ Install packaging 24.2 successful
  ✔ Install pathspec 0.12.1 successful
  ✔ Install black 25.1.0 successful
  ✔ Install pluggy 1.5.0 successful
  ✔ Install platformdirs 4.3.7 successful
  ✔ Install hypothesis 6.130.13 successful
  ✔ Install py-cpuinfo 9.0.0 successful
  ✔ Install immutables 0.21 successful
  ✔ Install pyperf 2.9.0 successful
  ✔ Install pytest 8.3.5 successful
  ✔ Install pygments 2.19.1 successful
  ✔ Install pytest-benchmark 5.1.0 successful
  ✔ Install pytest-xdist 3.6.1 successful
  ✔ Install python-dateutil 2.9.0.post0 successful
  ✔ Install charset-normalizer 3.4.1 successful
  ✔ Install markupsafe 3.0.2 successful
  ✔ Install babel 2.17.0 successful
  ✔ Install requests 2.32.3 successful
  ✔ Install six 1.17.0 successful
  ✔ Install sniffio 1.3.1 successful
  ✔ Install sortedcontainers 2.4.0 successful
  ✔ Install pendulum 3.0.0 successful
  ✔ Install snowballstemmer 2.2.0 successful
  ✔ Install soupsieve 2.6 successful
  ✔ Install msgspec 0.19.0 successful
  ✔ Install sphinx-basic-ng 1.0.0b2 successful
  ✔ Install sphinx-autobuild 2024.10.3 successful
  ✔ Install sphinx-copybutton 0.5.2 successful
  ✔ Install sphinxcontrib-applehelp 2.0.0 successful
  ✔ Install sphinxcontrib-jsmath 1.0.1 successful
  ✔ Install msgpack 1.1.0 successful
  ✔ Install sphinxcontrib-devhelp 2.0.0 successful
  ✔ Install sphinxcontrib-htmlhelp 2.1.0 successful
  ✔ Install starlette 0.46.1 successful
  ✔ Install tomli 2.2.1 successful
  ✔ Install sphinxcontrib-serializinghtml 2.0.0 successful
  ✔ Install sphinxcontrib-qthelp 2.0.0 successful
  ✔ Install tomlkit 0.13.2 successful
  ✔ Install typing-extensions 4.13.1 successful
  ✔ Install urllib3 2.3.0 successful
  ✔ Install pyyaml 6.0.2 successful
  ✔ Install uvicorn 0.34.0 successful
  ✔ Install tzdata 2025.2 successful
  ✔ Install sphinx 7.4.7 successful
  ✔ Install psutil 7.0.0 successful
  ✔ Install ujson 5.10.0 successful
  ✔ Install time-machine 2.16.0 successful
  ✔ Install watchfiles 1.0.5 successful
  ✔ Install websockets 15.0.1 successful
  ✔ Install orjson 3.10.16 successful
  ✔ Install coverage 7.8.0 successful
  ✔ Install pymongo 4.12.0 successful
  ✔ Install ruff 0.11.4 successful
  ✔ Install cattrs 24.1.4.dev40 successful

  0:00:16 🎉 All complete! 76/76
(.venv) red@red-desktop:~/SourceCode/repos/Python/cattrs$ make coverage
pdm run coverage run --source cattrs -m pytest -n auto tests
INFO: Inside an active virtualenv /home/red/SourceCode/repos/Python/cattrs/.venv, reusing it.
Set env var PDM_IGNORE_ACTIVE_VENV to ignore it.
/home/red/SourceCode/repos/Python/cattrs/.venv/lib/python3.10/site-packages/pytest_benchmark/logger.py:39: PytestBenchmarkWarning: Benchmarks are automatically disabled because xdist plugin is active.Benchmarks cannot be performed reliably in a parallelized environment.
  warner(PytestBenchmarkWarning(text))
======================================================================================== test session starts ========================================================================================
platform linux -- Python 3.10.12, pytest-8.3.5, pluggy-1.5.0
benchmark: 5.1.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=True warmup_iterations=5)
rootdir: /home/red/SourceCode/repos/Python/cattrs
configfile: pyproject.toml
plugins: anyio-4.9.0, xdist-3.6.1, hypothesis-6.130.13, benchmark-5.1.0, time-machine-2.16.0
4 workers [842 items]   
............................................................................................................................................................................................. [ 22%]
............................................................................................................................................................................................. [ 44%]
...........x.x.x.x.....xxx.....xxx.x............x.................................................xxx....................................................................................s... [ 67%]
............................................................................................................................................................................................. [ 89%]
...........sss...s....................................................................                                                                                                        [100%]
============================================================================ 822 passed, 5 skipped, 15 xfailed in 52.45s ============================================================================
/home/red/SourceCode/repos/Python/cattrs/.venv/lib/python3.10/site-packages/coverage/inorout.py:525: CoverageWarning: Module cattrs was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
pdm run coverage report -m
INFO: Inside an active virtualenv /home/red/SourceCode/repos/Python/cattrs/.venv, reusing it.
Set env var PDM_IGNORE_ACTIVE_VENV to ignore it.
No data to report.
make: *** [Makefile:63: coverage] Error 1

Allowing PDM to create the environment doesn't work, and neither does running the command inside the environment without PDM.

@Tinche
Copy link
Member

Tinche commented Apr 10, 2025

I think you need to run coverage combine before coverage report: https://coverage.readthedocs.io/en/latest/cmd.html#cmd-combine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants