test: add unit tests to execute transform flow e2e #737
"""Test the complex transform flow with child rows."""
input_data = Parent(children=[Child(1), Child(2), Child(3)])
result = for_each_transform.eval(input_data)
expected = Parent(children=[Child(1), Child(2), Child(3)])
This is not the value we really expect it to return: the new field isn't here.
The reason is that the annotated return type doesn't contain the new field.
(This will require defining another set of `Parent` + `Child` types for the additional field. It may look a little awkward for now, but that's OK as it's only a test, and after #758 is resolved we won't need to define types to annotate return values for transform functions.)
Right, that's a valid workaround. Aside from that, I have one more concern, if I understand this correctly.
Unlike regular `@flow_def` flows that use `DataCollector` to explicitly capture the output, our transform flows can only rely on the direct return value. The `row()` context creates a temporary scope for iteration, but changes within that scope aren't automatically merged back.
So, if we update it as follows, I don't think `new_field` is attached to the data.
@dataclass
class NewChild:
    value: int
    new_field: int = 0


@dataclass
class NewParent:
    children: list[NewChild]


@cocoindex.op.function()
def extract_value(value: int) -> int:
    return value


@cocoindex.transform_flow()
def for_each_transform(
    data: cocoindex.DataSlice[Parent],
) -> cocoindex.DataSlice[NewParent]:
    with data["children"].row() as child:
        child["new_field"] = child["value"].transform(extract_value)
    return data


@pytest.mark.asyncio
async def test_for_each_transform_flow_async() -> None:
    input_data = Parent(children=[Child(4), Child(5)])
    result = await for_each_transform.eval_async(input_data)
    expected = NewParent(
        children=[NewChild(value=4, new_field=4), NewChild(value=5, new_field=5)]
    )
    assert result == expected, f"Expected {expected}, got {result}"
> The `row()` context creates a temporary scope for iteration

This is not the intention of the design. In `with data["children"].row() as child`, think of `child` as a reference. When doing `child["new_field"] = ...`, it should add a new field to the original table. CocoInsight also shows this idea (it presents the entire table with all new fields merged).
Did you run the code with this change? If the returned table indeed doesn't have `new_field`, it's a bug. Maybe comment out this test for now and merge, and after we fix the bug we can uncomment it.
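To make the two readings of `row()` concrete, here is a plain-Python analogy using lists of dicts. It does not use cocoindex at all; the function names are hypothetical and only illustrate the difference between a row handle that is a reference to the original table and one that is a temporary copy.

```python
from copy import deepcopy


def add_field_by_reference(table: list[dict]) -> list[dict]:
    """Each `child` is the row object itself, so the new field lands in `table`."""
    for child in table:
        child["new_field"] = child["value"]
    return table


def add_field_in_temporary_scope(table: list[dict]) -> list[dict]:
    """Each `child` is a copy, so the new field is lost once the loop ends."""
    for child in deepcopy(table):
        child["new_field"] = child["value"]
    return table


by_reference = add_field_by_reference([{"value": 4}, {"value": 5}])
temporary = add_field_in_temporary_scope([{"value": 4}, {"value": 5}])
```

Under the intended design described above, the transform flow should behave like the first function; the behavior the test author suspected corresponds to the second.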
Thanks for the explanation! That's quite unusual; the `new_field` in the result from the above code snippet was always 0. I'd like to make some other minimal examples and delve into this deeper to see what's wrong with it.
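One thing that may be worth ruling out while building those minimal examples: since the inputs are 4 and 5, a computed `new_field` would not be 0, so a constant 0 could simply be the dataclass default filling in for a missing field. The sketch below is plain Python and only shows that masking effect; `build` and `StrictChild` are hypothetical and are not how cocoindex decodes results.

```python
from dataclasses import dataclass, fields


@dataclass
class NewChild:
    value: int
    new_field: int = 0  # default silently fills in when the field is absent


@dataclass
class StrictChild:
    value: int
    new_field: int  # no default: an absent field raises instead of becoming 0


def build(cls, row: dict):
    """Hypothetical helper: construct a dataclass from the keys a row provides."""
    names = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in row.items() if k in names})


row_without_field = {"value": 4}
masked = build(NewChild, row_without_field)

try:
    build(StrictChild, row_without_field)
    missing_detected = False
except TypeError:
    missing_detected = True
```

Dropping the default in the test's return type, if the library allows it, might therefore turn a silent 0 into an explicit error and make the bug easier to locate.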
resolve #618.