Fix nested column flattening in group_by to use underscores #1503

nablabits · 2025-12-10T20:38:38Z

This is an early attempt to fix #1329

I'm not very familiar with the code, but with what I know and some help from an LLM I managed to put together this solution. I'm aware that this could be quite disruptive for users, as it changes the output of well-established functions such as to_list (see the example in this test). People more familiar with the use cases can judge this.

If we want to preserve the nested.level1.name structure, I think the options are to change:

how the signals are extracted here
the to_pandas method, and more precisely the get_headers_with_length function (source), which is responsible for getting the columns that then become a MultiIndex.
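
To illustrate why nested columns end up as a MultiIndex, here is a minimal sketch of the idea, not the actual datachain code. The helper name `pad_header_paths` and the sample paths are hypothetical; the assumption is that each column is described by a path of name parts, and that shorter paths are padded so every header has the same depth.

```python
def pad_header_paths(paths):
    """Pad every column path to the depth of the deepest one.

    Hypothetical helper for illustration only: each path is a list of
    name parts, e.g. ["nested", "level1", "name"].
    """
    max_depth = max((len(path) for path in paths), default=0)
    # Equal-length tuples are what a multi-level column index is built from.
    padded = [tuple(path + [""] * (max_depth - len(path))) for path in paths]
    return padded, max_depth


headers, depth = pad_header_paths([["id"], ["nested", "level1", "name"]])
print(headers, depth)
```

As soon as one path has more than one part, every header becomes a tuple, and a tuple per column is what makes the resulting frame multi-level.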

Let me know what you think

(There are a couple of lint errors that we can address once a consensus on the approach is reached)

When nested columns are passed to `partition_by` in `group_by`, the output now flattens column names with double underscores (e.g., `nested__level1__name` instead of `nested.level1.name`) to avoid a MultiIndex in pandas output (datachain-ai#1329).
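
The renaming described above can be sketched roughly as follows. This is only an illustration of the naming convention, not the code in this PR; `flatten_column_name` and `flatten_row_keys` are hypothetical helpers, and the `"__"` separator is taken from the example in the description.

```python
def flatten_column_name(name: str, sep: str = "__") -> str:
    """Turn a dotted nested column name into a single flat name."""
    return name.replace(".", sep)


def flatten_row_keys(row: dict, sep: str = "__") -> dict:
    """Apply the same renaming to every key of a result row."""
    return {flatten_column_name(key, sep): value for key, value in row.items()}


flat_name = flatten_column_name("nested.level1.name")
flat_row = flatten_row_keys({"nested.level1.name": "a", "cnt": 2})
print(flat_name, flat_row)
```

With flat string names, each column has a single-level header, so no MultiIndex is needed on the pandas side.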

shcheklein

@nablabits PTAL https://github.com/datachain-ai/datachain/pull/1496/files . Could you try to run the use case on top of that PR?

nablabits · 2025-12-13T08:34:23Z

@shcheklein Yep, I have rebased the branch on top of #1496 and yes, it is working as expected, see: nablabits@1570d50

I will close this now, but feel free to reopen if something else comes up 🙂

Edit: wait, it's working because I didn't remove the flatten=True 🤦

shcheklein reviewed Dec 11, 2025

nablabits closed this Dec 13, 2025

nablabits reopened this Dec 13, 2025

nablabits mentioned this pull request Dec 13, 2025

Group by nested column doesn't work #1329

Open
